I am shipping Inky today. It is a digital product for interactive storytelling, but more importantly, it is a case study in how I run my studio. When you are building an AI story app, the temptation is to focus on the prompt. That is a mistake. The prompt is the easiest part of the stack. The system underneath (the state management, the agentic orchestration, and the feedback loops) is where the product actually lives.
I learned the hard way that a single-prompt architecture fails the moment a story gains complexity. If you want a narrative that holds together over twenty chapters, you cannot rely on a massive context window and a prayer. You need a system.
The Architecture of Coherence
Inky does not treat the LLM as a writer. It treats the LLM as a series of specialized workers. This is what I call agentic engineering. Instead of asking one model to 'write a story,' the system breaks the process into discrete phases: world-building, character mapping, plot outlining, and finally, prose generation.
Each phase is handled by a specific agent configuration. The world-building agent defines the constraints. The character agent ensures motivations are consistent. The prose agent only sees what it needs to see for the current scene, plus a compressed summary of the 'world state.'
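To make the phase split concrete, here is a minimal sketch of a pipeline in which each agent reads only the context keys it has been granted. This is not Inky's code: the agent functions are hypothetical stubs that return fixed data where a real system would call a model, and the names `PHASES`, `run_planning`, and `write_scene` are assumptions made for illustration.

```python
from typing import Callable

# Hypothetical agent signature: scoped context in, new context keys out.
Agent = Callable[[dict], dict]

def world_agent(ctx: dict) -> dict:
    # Stub: a real agent would ask a model to define the world's constraints.
    return {"world": {"setting": ctx["premise"], "rules": ["no magic after dark"]}}

def character_agent(ctx: dict) -> dict:
    # Stub: a real agent would derive characters consistent with the world.
    return {"characters": [{"name": "Mara", "motivation": "find her brother"}]}

def outline_agent(ctx: dict) -> dict:
    # Stub: a real agent would plan scenes from the world and characters.
    return {"outline": ["Mara leaves home", "Mara crosses the border"]}

def prose_agent(ctx: dict) -> dict:
    # Stub: a real agent would write the scene from this narrow context.
    return {"prose": f"[scene text for: {ctx['scene']}]"}

# Each phase lists the context keys it is allowed to read.
PHASES: list[tuple[str, Agent, list[str]]] = [
    ("world", world_agent, ["premise"]),
    ("characters", character_agent, ["premise", "world"]),
    ("outline", outline_agent, ["world", "characters"]),
]

def run_planning(premise: str) -> dict:
    """Run the planning phases in order, passing each one a scoped view."""
    ctx: dict = {"premise": premise}
    for _name, agent, visible in PHASES:
        scoped = {key: ctx[key] for key in visible}
        ctx.update(agent(scoped))
    return ctx

def write_scene(plan: dict, scene_index: int, state_summary: str) -> str:
    """Give the prose agent only the current scene and a state summary."""
    scoped = {"scene": plan["outline"][scene_index], "state_summary": state_summary}
    return prose_agent(scoped)["prose"]

plan = run_planning("a border town")
scene = write_scene(plan, 1, "Mara has no sword.")
```

The point of the scoping step is that the prose agent never receives the full plan, only the current scene and the summary, which mirrors the constraint described above.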
By decoupling the narrative logic from the prose generation, the system maintains coherence. If a character loses a sword in chapter two, the world state reflects that. The prose agent in chapter ten does not need to read chapter two; it just needs the current state object. This reduces token waste and prevents the 'hallucination drift' common in long-form AI generation.
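The sword example can be sketched as a small state object that is updated by events and then compressed into a summary for the prose agent. The `WorldState` class, its fields, and the event format below are illustrative assumptions, not the structure Inky actually uses.

```python
from dataclasses import dataclass, field

@dataclass
class WorldState:
    """Hypothetical world state: who carries what, plus recent facts."""
    chapter: int = 1
    inventory: dict[str, set[str]] = field(default_factory=dict)
    facts: list[str] = field(default_factory=list)

    def apply(self, event: dict) -> None:
        """Apply one narrative event (assumed format) to the state."""
        kind = event["type"]
        if kind not in ("gain_item", "lose_item"):
            raise ValueError(f"unknown event type: {kind!r}")
        items = self.inventory.setdefault(event["character"], set())
        if kind == "gain_item":
            items.add(event["item"])
        else:
            items.discard(event["item"])
        self.chapter = event["chapter"]
        self.facts.append(event["note"])

    def summary(self, max_facts: int = 3) -> str:
        """Compress the state into a few lines for the prose agent."""
        lines = []
        for name in sorted(self.inventory):
            carried = ", ".join(sorted(self.inventory[name])) or "nothing"
            lines.append(f"{name} carries: {carried}")
        lines.extend(self.facts[-max_facts:])
        return "\n".join(lines)

state = WorldState()
state.apply({"type": "gain_item", "character": "Mara", "item": "sword",
             "chapter": 1, "note": "Mara took her father's sword."})
state.apply({"type": "lose_item", "character": "Mara", "item": "sword",
             "chapter": 2, "note": "Mara lost the sword in the river."})

# What a chapter-ten prose agent would receive instead of chapter two's text.
context_for_prose = state.summary()
```

Under these assumptions, the chapter-ten prompt carries a few short lines of state rather than the full text of earlier chapters, which is where the token savings come from.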
Working in Public: The Stack
I build from a monorepo. It is the only way I can manage a multi-product studio as a solo operator. For Inky, the stack is chosen for speed and durability, not for the sake of using the latest framework.
- Backend: Firebase. I migrated 14 callables last week to reduce cold-start latency. I shaved 300 ms off the initial handshake by flattening the data structure.
- Orchestration: VERA. This is my internal agent orchestration layer. It handles the handoffs between Claude 3.5 Sonnet (for logic) and Gemini 1.5 Pro (for long-context retrieval).
- Frontend: A lean implementation that stays out of the way. The UI is a window into the system, not the system itself.
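The handoff idea in the orchestration bullet can be illustrated with a routing table that maps a task kind to a model and passes each step's output to the next step. VERA is internal, so this is only a guess at the general shape: `ROUTES`, `route`, `handoff`, and `echo_model` are hypothetical names, the model names are plain labels rather than API identifiers, and no real model API is called.

```python
from typing import Callable

# Task kinds mapped to the models named in the stack above (labels only).
ROUTES = {
    "logic": "Claude 3.5 Sonnet",
    "retrieval": "Gemini 1.5 Pro",
}

# Hypothetical model-call signature: (model, instruction, prior output) -> output.
ModelCall = Callable[[str, str, str], str]

def route(task_kind: str) -> str:
    """Pick the model for a task kind, failing loudly on unknown kinds."""
    if task_kind not in ROUTES:
        raise ValueError(f"no model configured for task kind {task_kind!r}")
    return ROUTES[task_kind]

def handoff(steps: list[tuple[str, str]], call_model: ModelCall) -> list[tuple[str, str]]:
    """Run steps in order, feeding each result into the next step."""
    previous = ""
    trace = []
    for kind, instruction in steps:
        model = route(kind)
        previous = call_model(model, instruction, previous)
        trace.append((model, previous))
    return trace

def echo_model(model: str, instruction: str, prior: str) -> str:
    # Stand-in for a real API call so the sketch runs offline.
    return f"{model} handled '{instruction}'"

trace = handoff(
    [("retrieval", "find facts from chapter 2"), ("logic", "check continuity")],
    echo_model,
)
```

Keeping the routing in a table rather than in the agents themselves means a model can be swapped by editing one line.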
Building an AI story app requires you to think about the 'money layer' early. High-quality output requires high-quality models, which are expensive. I engineered a caching layer that stores common narrative branches. If two users take a similar path in a shared world, the system retrieves the cached state rather than re-generating. This keeps margins healthy without sacrificing the user experience.
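One way to picture the branch cache is as a lookup keyed on the world, the current state, and the user's choice. The sketch below is an assumption-heavy simplification: it only matches paths that are identical after light normalization (trimmed, lowercased choice text and key-order-independent state), whereas matching merely "similar" paths would need something more. `BranchCache` and its methods are hypothetical names, and the state is assumed to be JSON-serializable.

```python
import hashlib
import json
from typing import Callable

class BranchCache:
    """Hypothetical in-memory cache of generated narrative branches."""

    def __init__(self) -> None:
        self._store: dict[str, dict] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def key(world_id: str, state: dict, choice: str) -> str:
        # sort_keys makes the key independent of dict insertion order.
        payload = json.dumps(
            {"world": world_id, "state": state, "choice": choice.strip().lower()},
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get_or_generate(self, world_id: str, state: dict, choice: str,
                        generate: Callable[[dict, str], dict]) -> dict:
        """Return the cached branch, or generate and store it on a miss."""
        k = self.key(world_id, state, choice)
        if k in self._store:
            self.hits += 1
            return self._store[k]
        self.misses += 1
        result = generate(state, choice)
        self._store[k] = result
        return result

model_calls = []

def expensive_generate(state: dict, choice: str) -> dict:
    # Stand-in for a paid model call; records that it was invoked.
    model_calls.append(choice)
    return {"scene": "The bridge sways as you step onto it."}

cache = BranchCache()
first = cache.get_or_generate(
    "world-1", {"chapter": 3, "has_sword": False}, "Cross the bridge", expensive_generate)
second = cache.get_or_generate(
    "world-1", {"has_sword": False, "chapter": 3}, "  cross the bridge ", expensive_generate)
```

In this example the second user triggers no model call, so the cost of that branch is paid once per world rather than once per user.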



