Building an AI story app is often framed as a prompt engineering exercise. The reality is different. When I started building Inky, my multi-agent storytelling engine, I realized quickly that the LLM is the least interesting part of the stack. The challenge isn't getting the AI to write; the challenge is building a system that maintains state, enforces narrative logic, and scales without turning into a token-burning mess.
I am building Inky in public as part of my multi-product studio. This isn't a side project; it is a testbed for agentic engineering, the practice of using AI as the operating layer of a product rather than just a feature. Here is how the system is architected and what I have learned the hard way while shipping it.
The Architecture of a Narrative Engine
Most people building an AI story app start with a single prompt. They ask a model to "write a story about a cat." This works for a paragraph, but it fails for a novel. To build something durable, you have to move away from the chat interface and toward a structured system of agents.
Inky lives in a monorepo, and its agents are coordinated by VERA, a custom orchestration layer I built. VERA doesn't just send prompts; it manages the lifecycle of a story through a series of specialized agents.
The Agentic Pipeline
Instead of one model doing everything, Inky uses four distinct agents:
- The Architect: This agent handles the high-level plot points and world-building rules. It doesn't write prose. It writes the "truth" of the world as JSON that follows a fixed schema.
- The Author: This agent takes the Architect's instructions and generates the narrative. It is constrained by the current state of the world.
- The Editor: This agent reviews the Author's output for consistency. If the Author says a character is wearing a red hat in chapter one and a blue hat in chapter two without explanation, the Editor flags it.
- The Archivist: This agent manages the long-term memory. It summarizes previous chapters and updates the global state to ensure the context window doesn't overflow.
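As a rough sketch of how these four roles could hand off to one another, the Python below models the world "truth" as a small JSON-serializable state object and runs one chapter through it. Everything here is an assumption made for illustration: `WorldState`, `Draft`, `run_chapter`, and the idea that each draft arrives with a dictionary of extracted `claims` are hypothetical names and shapes, not VERA's real interfaces. The Author is passed in as a plain function so that a model call could stand in its place, and the Archivist's "summary" is just a truncation.

```python
import json
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class WorldState:
    """Hypothetical container for the Architect's facts and the Archivist's memory."""
    facts: dict[str, str] = field(default_factory=dict)    # e.g. {"mira.hat": "red"}
    summaries: list[str] = field(default_factory=list)     # short per-chapter summaries

    def to_json(self) -> str:
        return json.dumps({"facts": self.facts, "summaries": self.summaries})


@dataclass
class Draft:
    """Author output: prose plus the facts that prose asserts (assumed to be extracted upstream)."""
    text: str
    claims: dict[str, str]


def architect(state: WorldState, new_facts: dict[str, str]) -> WorldState:
    """Record plot and world rules. Writes no prose."""
    state.facts.update(new_facts)
    return state


def editor(state: WorldState, draft: Draft) -> list[str]:
    """Flag every claim in the draft that contradicts an established fact."""
    return [
        f"{key}: draft says {value!r}, world state says {state.facts[key]!r}"
        for key, value in draft.claims.items()
        if key in state.facts and state.facts[key] != value
    ]


def archivist(state: WorldState, draft: Draft, keep: int = 3) -> WorldState:
    """Store a short summary and keep only the most recent `keep` (>= 1) entries.

    Truncating the text stands in for real summarization.
    """
    state.summaries.append(draft.text[:80])
    state.summaries = state.summaries[-keep:]
    return state


def run_chapter(
    state: WorldState, beat: str, write: Callable[[WorldState, str], Draft]
) -> tuple[Draft, list[str]]:
    """Author writes, Editor checks, Archivist commits only if the draft is consistent."""
    draft = write(state, beat)
    issues = editor(state, draft)
    if not issues:
        archivist(state, draft)
    return draft, issues


if __name__ == "__main__":
    state = architect(WorldState(), {"mira.hat": "red"})
    # A stub Author that contradicts the established hat color.
    stub = lambda s, beat: Draft("Mira adjusted her blue hat.", {"mira.hat": "blue"})
    draft, issues = run_chapter(state, "Mira enters the market", stub)
    print(issues)
```

In this sketch a flagged draft is never committed to memory, so a contradiction cannot leak into later chapters. What happens next (retry the Author, ask the Architect to update the fact, or escalate to a human) is left open.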
By separating these concerns, I can swap models for different tasks. I might use Claude 3.5 Sonnet for the Author because of its creative nuance, while using a faster, cheaper model for the Archivist's summarization tasks. This is the core of agentic engineering: picking the right tool for the specific sub-task.



