I am building Inky. It is an AI storytelling app designed to handle complex, multi-chapter narratives. Most people building an AI story app start with a single prompt and a text area. That works for a paragraph, but it breaks down for a book. I learned the hard way that narrative consistency requires more than a better model; it requires a better system.
Inky is part of my multi-product studio. It is not a side project; it is a test of the operating layer I have been refining. The goal is to move from simple text generation to a structured, agentic workflow that understands plot, character arcs, and world-building constraints.
The Architecture of Agentic Engineering
When you are building an AI story app, the first hurdle is realizing that a single LLM call is insufficient for long-form content. If you ask a model to write a 50,000-word novel in one pass, it tends to hallucinate, lose the plot by chapter three, and ignore the character development you established in the prologue.
I architected Inky using agentic engineering. Instead of one prompt, I use a network of specialized agents managed by VERA, my custom orchestration layer.
The Director-Writer-Editor Pattern
In Inky, the work is split across three primary roles:
- The Director: This agent holds the high-level outline. It manages the story Bible, a structured JSON object containing character traits, locations, and plot beats. It ensures that if a character loses their left arm in chapter two, they aren't playing piano with both hands in chapter ten.
- The Writer: This agent focuses on the prose. It receives a specific scene objective from the Director and a slice of the story Bible. Its only job is to produce high-quality narrative text.
- The Editor: This agent reviews the Writer’s output against the Director’s constraints. It looks for continuity errors and tone shifts. If the prose is off, the Editor sends the draft back to the Writer for a rewrite.
This system mimics a real production house. It is a feedback loop, not a linear pipeline.
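Here is a minimal sketch of that loop, assuming each agent is a plain Python callable. Every name in it (StoryBible, Review, run_scene, the stub writer and editor) is hypothetical and chosen for illustration; it is not Inky's or VERA's actual code. In a real system the Writer and Editor would wrap LLM calls, and the stubs below only imitate them.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class StoryBible:
    """Hypothetical story bible: the Director's source of truth."""
    characters: Dict[str, List[str]] = field(default_factory=dict)
    locations: List[str] = field(default_factory=list)
    plot_beats: List[str] = field(default_factory=list)

    def slice_for(self, names: List[str]) -> Dict[str, List[str]]:
        """Return only the character facts relevant to one scene."""
        return {n: self.characters[n] for n in names if n in self.characters}


@dataclass
class Review:
    approved: bool
    notes: str = ""


# Assumed agent signatures; real agents would wrap LLM calls.
Writer = Callable[[str, Dict[str, List[str]], str], str]
Editor = Callable[[str, Dict[str, List[str]]], Review]


def run_scene(objective: str, facts: Dict[str, List[str]],
              writer: Writer, editor: Editor, max_rewrites: int = 3) -> str:
    """Loop Writer -> Editor until a draft is approved or attempts run out."""
    notes = ""
    for _ in range(max_rewrites):
        draft = writer(objective, facts, notes)
        review = editor(draft, facts)
        if review.approved:
            return draft
        notes = review.notes  # feed the critique into the next attempt
    raise RuntimeError(f"No approved draft after {max_rewrites} attempts: {notes}")


def stub_writer(objective: str, facts: Dict[str, List[str]], notes: str) -> str:
    """Stand-in Writer: makes a continuity error until it receives notes."""
    if notes:
        return "Mara picked out the melody with her right hand."
    return "Mara played the piano with both hands."


def stub_editor(draft: str, facts: Dict[str, List[str]]) -> Review:
    """Stand-in Editor: checks one hard-coded continuity rule."""
    if "lost left arm" in facts.get("Mara", []) and "both hands" in draft:
        return Review(False, "Mara lost her left arm in chapter two.")
    return Review(True)


bible = StoryBible(characters={"Mara": ["pianist", "lost left arm"]})
final_draft = run_scene("Mara plays at the recital",
                        bible.slice_for(["Mara"]), stub_writer, stub_editor)
```

The cap on rewrites is a design choice I am assuming here: without it, a Writer and Editor that never agree would loop forever, so the sketch raises an error and leaves the decision to whoever called it.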
Solving the Context Window Problem
Context is the most expensive and volatile resource in building an AI story app. Even with 200k-token windows, you cannot simply dump an entire book into the prompt and expect the model to maintain focus. The signal-to-noise ratio degrades as the window fills.
I solved this by implementing a sliding window of "active memory" and a vector database for "latent memory."
- Active Memory: The last two chapters and the current scene outline. This stays in the immediate context.
- Latent Memory: The story Bible and previous plot points stored as embeddings. When the Writer agent needs to reference a character's backstory from ten chapters ago, VERA performs a similarity search and injects only the relevant snippet into the prompt.
This keeps the context lean and the output sharp. I found that overstuffing the context window leads to "lazy writing" from the model, where it starts summarizing instead of dramatizing.
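The two memory tiers can be sketched in a few lines, under loud assumptions: the embed function below is a toy bag-of-words counter standing in for a real embedding model, LatentMemory is an in-memory list standing in for a vector database, and build_context is a hypothetical name for the prompt-assembly step. None of this is VERA's actual code.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


class LatentMemory:
    """In-memory stand-in for a vector database."""

    def __init__(self) -> None:
        self._items: List[Tuple[str, Counter]] = []

    def add(self, snippet: str) -> None:
        self._items.append((snippet, embed(snippet)))

    def search(self, query: str, k: int = 1) -> List[str]:
        """Return the k stored snippets most similar to the query."""
        q = embed(query)
        ranked = sorted(self._items, key=lambda item: cosine(q, item[1]),
                        reverse=True)
        return [snippet for snippet, _ in ranked[:k]]


def build_context(chapters: List[str], scene_outline: str,
                  memory: LatentMemory, query: str, k: int = 1) -> str:
    """Combine active memory (last two chapters + outline) with recalled facts."""
    active = chapters[-2:]             # sliding window of recent chapters
    recalled = memory.search(query, k)  # only the relevant snippets
    parts = ["RELEVANT BACKSTORY:", *recalled,
             "RECENT CHAPTERS:", *active,
             "CURRENT SCENE:", scene_outline]
    return "\n\n".join(parts)


memory = LatentMemory()
memory.add("Mara lost her left arm in the mill fire.")
memory.add("The capital city of Oren sits on a salt lake.")
context = build_context(["ch1 text", "ch2 text", "ch3 text"],
                        "Mara sits at the piano.", memory, "Mara arm injury")
```

In this example the first chapter falls out of the window and only the arm snippet is recalled, so the prompt carries the one backstory fact the scene needs and nothing about the salt lake.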



