Building an AI Story App: Architecture and Lessons Learned
A technical deep dive into the systems architecture of Inky. No hype, just the reality of shipping agentic engineering and managing narrative state at scale.
I’ve spent the last few months working in public on Inky. It is one product in my multi-product studio, designed to solve a specific problem: AI-generated long-form fiction usually sucks. Most people think the solution is a better prompt. They are wrong. The solution is a better system.
Building an AI story app isn't about finding a magic string of text to send to Claude. It is about architecting a multi-agent system that can maintain state, character consistency, and narrative arc across 50,000 words without hallucinating. I am building this using agentic engineering: treating AI as the operating layer of the team rather than just a code-completion tool.
The Architecture of Narrative
When you are building an AI story app, you realize quickly that prose is the easy part. LLMs are excellent at generating a single scene. They are terrible at remembering that a character lost their keys in chapter two when they reach the front door in chapter twelve.
To solve this, I moved away from the 'one-shot' generation model. Inky operates on a decoupled architecture. The system is split into three distinct layers: the Planner, the Chronicler, and the Editor.
- The Planner: This agent doesn't write a single word of prose. Its only job is to maintain the 'Story Bible'—a structured JSON object containing character traits, plot beats, and world-building constraints.
- The Chronicler: This agent receives a specific beat from the Planner and the relevant context from the Story Bible. It generates the raw prose.
- The Editor: This agent reviews the output against the Story Bible to ensure no continuity errors were introduced.
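To make the Story Bible concrete, here is a minimal Python sketch of what such a structured object could look like. The class names, field names, and the example character are illustrative assumptions, not Inky's actual schema.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Character:
    name: str
    traits: list[str] = field(default_factory=list)
    # Facts that stay true until a later beat changes them (hypothetical field).
    state: dict[str, str] = field(default_factory=dict)


@dataclass
class PlotBeat:
    chapter: int
    summary: str
    characters: list[str] = field(default_factory=list)


@dataclass
class StoryBible:
    characters: dict[str, Character] = field(default_factory=dict)
    beats: list[PlotBeat] = field(default_factory=list)
    world_rules: list[str] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() recurses into nested dataclasses, dicts, and lists.
        return json.dumps(asdict(self), indent=2)


bible = StoryBible()
bible.characters["Mara"] = Character(
    "Mara", traits=["stubborn"], state={"keys": "lost in chapter 2"}
)
bible.beats.append(
    PlotBeat(chapter=12, summary="Mara reaches her front door", characters=["Mara"])
)
bible.world_rules.append("No magic; locks need keys.")
```

Keeping character state as explicit key-value facts is one way to let a later agent check a draft against the record instead of relying on the model to remember it.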
By separating these concerns, I’ve reduced narrative drift by roughly 70%. The system no longer 'forgets' who is in the room because the context is injected programmatically, not left to the model's fading memory.
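Programmatic context injection can be sketched in a few lines of Python. This is an illustration under assumptions: the dictionary layout and the `build_context` helper are hypothetical, and a real system would also have to trim the context to fit the model's window.

```python
def build_context(bible: dict, beat: dict) -> str:
    """Assemble the prompt context for one beat from a Story Bible dict."""
    lines = [f"Beat (chapter {beat['chapter']}): {beat['summary']}"]
    # Only characters present in this beat are injected.
    for name in beat["characters"]:
        character = bible["characters"][name]
        lines.append(f"{name}: traits = {', '.join(character['traits'])}")
        for fact, value in character["state"].items():
            lines.append(f"{name}: {fact} = {value}")
    for rule in bible["world_rules"]:
        lines.append(f"World rule: {rule}")
    return "\n".join(lines)


# Hypothetical example data.
bible = {
    "characters": {
        "Mara": {"traits": ["stubborn"], "state": {"keys": "lost in chapter 2"}},
        "Tomas": {"traits": ["anxious"], "state": {}},
    },
    "world_rules": ["No magic; locks need keys."],
}
beat = {"chapter": 12, "summary": "Mara reaches her front door", "characters": ["Mara"]}

context = build_context(bible, beat)
```

The drafting agent would receive `context` ahead of the beat instructions, so the fact about the lost keys is in front of the model at the moment it writes the front-door scene.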
Agentic Engineering: The VERA Layer
I run my studio using a custom agent orchestration layer I call VERA. For Inky, VERA manages the handoffs between the Planner and the Chronicler. This isn't a simple sequential chain. It is a feedback loop.
If the Chronicler decides, in the flow of writing, that a character should make a choice not originally in the outline, it sends a request back to the Planner to update the Story Bible. This allows for 'emergent storytelling' while maintaining a rigid system of record.
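The feedback loop might look something like the following Python sketch. It is not VERA's actual code: the `ChangeRequest` shape, the `Planner.review` method, and the "locked facts" policy are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ChangeRequest:
    character: str
    fact: str
    new_value: str
    reason: str


class Planner:
    """Owns the Story Bible; the only component allowed to write to it."""

    def __init__(self, bible: dict, locked_facts: set[tuple[str, str]]):
        self.bible = bible
        # Hypothetical policy: facts the outline depends on cannot be changed.
        self.locked_facts = locked_facts
        self.log: list[ChangeRequest] = []

    def review(self, request: ChangeRequest) -> bool:
        if request.character not in self.bible:
            return False
        if (request.character, request.fact) in self.locked_facts:
            return False
        self.bible[request.character][request.fact] = request.new_value
        self.log.append(request)  # keep an audit trail of accepted changes
        return True


planner = Planner(
    bible={"Mara": {"keys": "lost in chapter 2", "ally": "none"}},
    locked_facts={("Mara", "keys")},
)

accepted = planner.review(
    ChangeRequest("Mara", "ally", "Tomas", "She asks Tomas for help mid-scene")
)
rejected = planner.review(
    ChangeRequest("Mara", "keys", "in her pocket", "Would skip the locked-door beat")
)
```

Routing every change through one reviewer is what keeps the record rigid: the drafting agent can propose, but only the Planner writes.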
The Stack and the Monorepo
I build everything in a monorepo. As a solo operator running a multi-product studio, I don't have time to manage dependencies across ten different repositories. Inky shares a core logic library with my other products, which handles authentication, billing, and my MCP (Model Context Protocol) servers.
I use Claude 3.5 Sonnet for the heavy lifting of narrative generation because of its superior grasp of subtext. However, I use Gemini 1.5 Pro for long-context retrieval. When the Story Bible grows to 200,000 tokens, Gemini’s needle-in-a-haystack performance is the only thing that keeps the system from breaking.
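A model split like this can be expressed as a small routing table. The Python sketch below is an assumption-heavy illustration: the task names are made up, the model strings are plain labels rather than exact API identifiers, and the token budgets are placeholders to check against each provider's current limits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    model: str               # label only, not a verified API model ID
    max_context_tokens: int  # assumed budget, not an official limit


ROUTES = {
    "draft_prose": Route("claude-3.5-sonnet", 200_000),
    "bible_retrieval": Route("gemini-1.5-pro", 1_000_000),
}


def pick_route(task: str, context_tokens: int) -> Route:
    """Return the route for a task, failing loudly if the context won't fit."""
    route = ROUTES[task]
    if context_tokens > route.max_context_tokens:
        raise ValueError(
            f"{task}: {context_tokens} tokens exceeds budget for {route.model}"
        )
    return route
```

Failing loudly on an oversized context is a deliberate choice here: a silent truncation of the Story Bible would reintroduce exactly the continuity errors the architecture is meant to prevent.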
Latency vs. Quality Tradeoffs
I learned the hard way that users will wait for quality, but they won't wait forever. A full chapter generation can take 45 seconds because of the multi-agent verification loop. I had to build a streaming status indicator that shows the user exactly what the agents are doing: 'Planner is updating the character arc,' 'Chronicler is drafting scene 2,' etc.
This transparency isn't just a UI trick; it’s a necessity when you are shipping agentic engineering. It builds trust in the system's 'thinking' process.
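One simple way to produce status messages like these is to have the generation pipeline yield events as it moves between agents. The Python sketch below stubs out the actual model calls; the `StatusEvent` type and `generate_chapter` function are hypothetical names, and how the events reach the browser (for example, server-sent events) is left open.

```python
from collections.abc import Iterator
from dataclasses import dataclass


@dataclass
class StatusEvent:
    agent: str
    message: str


def generate_chapter(scene_count: int) -> Iterator[StatusEvent]:
    """Yield a status event before each (stubbed) agent step."""
    yield StatusEvent("Planner", "updating the character arc")
    for scene in range(1, scene_count + 1):
        yield StatusEvent("Chronicler", f"drafting scene {scene}")
        # A real pipeline would call the drafting model here.
        yield StatusEvent("Editor", f"checking scene {scene} against the Story Bible")
        # A real pipeline would run the continuity check here.
    yield StatusEvent("Editor", "finalizing the chapter")


def render(event: StatusEvent) -> str:
    return f"{event.agent} is {event.message}"


messages = [render(event) for event in generate_chapter(scene_count=2)]
```

Because `generate_chapter` is a generator, each message can be pushed to the user the moment the corresponding step starts, rather than after the whole chapter finishes.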
Written by
Founder, Total Ventures
Solo-founder building a multi-brand product studio with AI agents. Writing about building, operating, and shipping.


