Building an AI story app is often framed as a prompt engineering exercise. It isn't. If you are building a system that generates a cohesive, multi-chapter narrative with consistent characters and visual assets, you aren't writing prompts—you are architecting a state machine.
I am currently building Inky, a multi-product studio project designed to handle long-form storytelling. The goal was to move past the 'chatbot' interface and create a system that functions as a production house. This required moving away from simple API calls and toward what I call agentic engineering.
Here is the architecture, the stack, and the lessons I learned the hard way while shipping it.
The Shift from Prompts to Agentic Engineering
When you start building an AI story app, the temptation is to send a massive prompt to Claude or GPT-4 and ask for a story. This works for a few paragraphs. It fails for a book. As the context grows, the model loses track of earlier details, the narrative arc flattens, and the 'AI voice' becomes repetitive.
Inky uses VERA, my custom agent orchestration layer, to break the production into discrete roles. Instead of one prompt, the system runs a sequence of specialized agents:
- The Architect: Defines the narrative arc, themes, and world-building constraints.
- The Casting Director: Generates detailed character descriptions and visual 'seeds' to ensure consistency across chapters.
- The Scribe: Writes the actual prose, one scene at a time, constrained by the Architect’s outline.
- The Continuity Editor: Reviews the output against the global state to ensure a character who lost a sword in Chapter 2 doesn't suddenly have it in Chapter 4.
This modularity allows me to swap models based on the task. I might use Claude 3.5 Sonnet for the prose because of its nuance, but use a faster, cheaper model for the initial structural outlining. By treating AI as a team rather than a single oracle, the system becomes resilient: a weak result from one role can be fixed or rerun without redoing the rest.
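To make the role split concrete, here is a minimal sketch of a sequential pipeline where each role carries its own model label. This is not VERA's actual code: the `Agent` class, `run_pipeline`, the model labels, and the `fake_call` stand-in are all hypothetical names chosen for illustration, and a real system would replace `fake_call` with an actual LLM client.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical signature: (model_label, prompt) -> generated text.
ModelCall = Callable[[str, str], str]


@dataclass(frozen=True)
class Agent:
    role: str          # e.g. "architect", "scribe"
    model: str         # placeholder label, swappable per role
    instructions: str  # role-specific instructions


def run_pipeline(agents: list[Agent], brief: str, call_model: ModelCall) -> dict[str, str]:
    """Run agents in order; each one sees the brief plus all earlier outputs."""
    outputs: dict[str, str] = {}
    for agent in agents:
        prior = "\n".join(f"[{role}] {text}" for role, text in outputs.items())
        prompt = f"{agent.instructions}\n\nBrief: {brief}\n\nPrior work:\n{prior}"
        outputs[agent.role] = call_model(agent.model, prompt)
    return outputs


# Placeholder model labels (assumptions), not real API identifiers.
AGENTS = [
    Agent("architect", "cheap-fast-model", "Outline the arc, themes, and world rules."),
    Agent("casting_director", "cheap-fast-model", "Describe each character and their visual seeds."),
    Agent("scribe", "nuanced-prose-model", "Write the next scene, following the outline."),
    Agent("continuity_editor", "cheap-fast-model", "Flag contradictions with earlier work."),
]


def fake_call(model: str, prompt: str) -> str:
    """Stand-in for a real LLM client so the sketch runs offline."""
    return f"<{model} output, prompt was {len(prompt)} chars>"


if __name__ == "__main__":
    results = run_pipeline(AGENTS, "A knight loses her sword.", fake_call)
    for role, text in results.items():
        print(role, "->", text)
```

Because the model is just a field on each `Agent`, moving the Scribe to a different model is a one-line change that leaves the other roles untouched.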
Managing Narrative State Across Long Contexts
Consistency is the primary friction point when building an AI story app. If the reader is ten chapters deep, the system must remember the emotional weight of a previous scene without re-sending the entire book text in every API call, which is expensive and buries the relevant details in noise.
I solved this by implementing a 'Narrative Ledger.' This is a structured JSON object stored in a Postgres database that tracks:
- Character State: Current location, inventory, and relationship status.
- Plot Points: Resolved vs. unresolved threads.
- Visual Anchors: Specific physical descriptions used to generate consistent image prompts.
Before the Scribe agent writes a single word, the system queries the Ledger for the relevant context. This keeps the prompt size down and the focus sharp. I learned the hard way that relying on the LLM's 'memory' is a recipe for hallucinations. The database is the source of truth; the LLM is just the processor.
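As a rough illustration of the Ledger idea, the sketch below models the three tracked categories as dataclasses and pulls out only the slice a given scene needs. The field names, `scene_context`, and `continuity_violations` are my own assumptions for this example, not Inky's actual schema, and the ledger lives in memory here rather than in Postgres.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class CharacterState:
    location: str
    inventory: list[str] = field(default_factory=list)
    relationships: dict[str, str] = field(default_factory=dict)
    visual_anchor: str = ""  # physical description reused in image prompts


@dataclass
class Ledger:
    characters: dict[str, CharacterState] = field(default_factory=dict)
    open_threads: list[str] = field(default_factory=list)
    resolved_threads: list[str] = field(default_factory=list)


def scene_context(ledger: Ledger, cast: list[str]) -> str:
    """Serialize only the characters in this scene plus unresolved threads."""
    subset = {
        "characters": {
            name: asdict(ledger.characters[name])
            for name in cast
            if name in ledger.characters
        },
        "open_threads": ledger.open_threads,
    }
    return json.dumps(subset, indent=2)


def continuity_violations(ledger: Ledger, name: str, items_used: list[str]) -> list[str]:
    """Return items a scene has the character use that the ledger says they lack."""
    owned = set(ledger.characters[name].inventory)
    return [item for item in items_used if item not in owned]


if __name__ == "__main__":
    ledger = Ledger(
        characters={
            "Mira": CharacterState("Harbor", ["lantern"], {"Tobin": "rival"},
                                   "grey cloak, scarred left hand"),
            "Tobin": CharacterState("Capital", ["ledger book"]),
        },
        open_threads=["Who took Mira's sword?"],
        resolved_threads=["Mira escaped the keep"],
    )
    # Only Mira's state and the open threads go into the Scribe's prompt.
    print(scene_context(ledger, ["Mira"]))
    # The draft has Mira draw a sword she no longer owns.
    print(continuity_violations(ledger, "Mira", ["sword", "lantern"]))  # ['sword']
```

The same check covers the lost-sword case from the Continuity Editor's job description: the draft is compared against stored state, so the model is never trusted to remember what a character is carrying.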


