I am shipping Inky today. It is not a demo or a proof of concept. It is a functional digital product designed to solve a specific problem: the friction between human imagination and the blank page.
When you are building an AI story app, the temptation is to treat the LLM as a magic box. You send a prompt, you get a story, and you call it a product. That approach is brittle. It does not scale, and it does not produce a durable user experience. Inky is built differently. It is the result of architecting a system where AI is the operating layer, not just an autocomplete feature.
The Artifact: Beyond the Wrapper
Inky is a storytelling engine. The goal was to create a system that could maintain narrative consistency across long-form arcs while allowing for granular user intervention. Most people building an AI story app stop at single-prompt generation. They hit the context window limit, or the narrative drifts into nonsense by chapter three.
I learned the hard way that a single LLM call cannot handle a 50,000-word plot. The system requires a state machine. In Inky, the architecture separates the 'World Brain' from the 'Drafting Engine.' The World Brain is a structured database, defined by a set of JSON schemas, that tracks character traits, locations, and plot points. The Drafting Engine is the agentic layer that pulls from that state to write the prose.
This separation of concerns is what makes the app functional. If a character loses an arm in chapter two, the World Brain updates that state. When the Drafting Engine writes chapter ten, it queries the state first. This is not 'magic'; it is basic systems engineering applied to generative models.
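As a rough sketch of that separation, and not Inky's actual code: the types and function names below (WorldState, applyEvent, buildDraftContext) are hypothetical, and the post only confirms that the World Brain tracks character traits, locations, and plot points. The idea is that story events update a structured state, and the drafting step reads that state before any prose is written.

```typescript
// Hypothetical shapes for illustration; the real schemas are not published.
interface CharacterState {
  name: string;
  traits: string[];
  location: string;
}

interface WorldState {
  characters: Record<string, CharacterState>;
  plotPoints: { chapter: number; summary: string }[];
}

// Two example event kinds; a real system would likely have many more.
type WorldEvent =
  | { kind: "traitAdded"; characterId: string; trait: string; chapter: number; summary: string }
  | { kind: "moved"; characterId: string; location: string; chapter: number; summary: string };

// Pure update: returns a new state and leaves the previous one untouched.
function applyEvent(state: WorldState, event: WorldEvent): WorldState {
  const current = state.characters[event.characterId];
  if (!current) {
    throw new Error(`Unknown character: ${event.characterId}`);
  }
  const updated: CharacterState =
    event.kind === "traitAdded"
      ? { ...current, traits: [...current.traits, event.trait] }
      : { ...current, location: event.location };
  return {
    characters: { ...state.characters, [event.characterId]: updated },
    plotPoints: [...state.plotPoints, { chapter: event.chapter, summary: event.summary }],
  };
}

// Builds the context a drafting step would read before writing a chapter.
// Only events from earlier chapters are included.
function buildDraftContext(state: WorldState, chapter: number): string {
  const characterLines = Object.values(state.characters).map(
    (c) => `${c.name} (at ${c.location}): ${c.traits.join(", ") || "no notable traits"}`
  );
  const priorEvents = state.plotPoints
    .filter((p) => p.chapter < chapter)
    .map((p) => `Ch. ${p.chapter}: ${p.summary}`);
  return ["Characters:", ...characterLines, "Prior events:", ...priorEvents].join("\n");
}

// Usage: the chapter-two injury is recorded once, then surfaces in the
// context for chapter ten.
const initialWorld: WorldState = {
  characters: { mara: { name: "Mara", traits: [], location: "Harbor" } },
  plotPoints: [],
};
const afterChapterTwo = applyEvent(initialWorld, {
  kind: "traitAdded",
  characterId: "mara",
  trait: "missing left arm",
  chapter: 2,
  summary: "Mara loses her left arm",
});
const chapterTenContext = buildDraftContext(afterChapterTwo, 10);
```

Keeping the update function pure means every chapter can be drafted against a known snapshot of the world, which makes drift easier to spot and debug.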
Agentic Engineering as the Team
I run a multi-product studio where AI is the team. For Inky, this meant moving beyond simple API calls and into agentic engineering. I built a custom orchestration layer called VERA to handle the heavy lifting.
In this model, I am the architect, and the agents are the specialists. One agent is responsible for narrative pacing. Another handles dialogue consistency. A third monitors the infrastructure for latency spikes. They operate within a monorepo, sharing types and schemas, which allows me to maintain a high shipping velocity without a headcount of twenty developers.
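One way to picture specialists sharing types is a common agent interface that every reviewer implements. This is a sketch under assumptions, not VERA's API: the Agent, Draft, and Finding names are hypothetical, the checks are trivial stand-ins for model-backed agents, and everything is kept synchronous for brevity where real agents would make asynchronous model calls.

```typescript
// Shared types that every agent in the (hypothetical) monorepo would import.
interface Draft {
  chapter: number;
  text: string;
}

interface Finding {
  agent: string;
  message: string;
}

interface Agent {
  name: string;
  review(draft: Draft): Finding[];
}

// Arbitrary threshold chosen for the example only.
const MIN_WORDS = 200;

// Stand-in for a pacing specialist: flags very short chapters.
const pacingAgent: Agent = {
  name: "pacing",
  review(draft) {
    const words = draft.text.split(/\s+/).filter((w) => w.length > 0).length;
    return words < MIN_WORDS
      ? [{ agent: "pacing", message: `Chapter ${draft.chapter} has only ${words} words` }]
      : [];
  },
};

// Stand-in for a dialogue specialist: flags an odd number of straight quotes.
const dialogueAgent: Agent = {
  name: "dialogue",
  review(draft) {
    const quoteCount = draft.text.split('"').length - 1;
    return quoteCount % 2 !== 0
      ? [{ agent: "dialogue", message: `Chapter ${draft.chapter} has an unclosed quotation` }]
      : [];
  },
};

// Runs each specialist in order and collects their findings into one list.
function runAgents(agents: Agent[], draft: Draft): Finding[] {
  return agents.reduce<Finding[]>((all, agent) => all.concat(agent.review(draft)), []);
}
```

Because every specialist returns the same Finding shape, adding a new agent is a matter of implementing one interface rather than changing the orchestrator.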
Working in public means being honest about the stack. I am not an advocate for one specific framework. I use what works. For Inky, that means a TypeScript-heavy monorepo, a robust PostgreSQL backend for state persistence, and a mix of Claude and Gemini models depending on the task. Claude handles the creative prose; Gemini handles the long-context retrieval for the World Brain. The system is the priority, not the brand of the model.
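The task-based split between models can be expressed as a small routing table. The sketch below is illustrative only: the Task and Provider names, the ModelClient interface, and the stub clients are assumptions, and no real SDK calls or model identifiers are shown.

```typescript
// Task kinds taken from the post: creative prose and long-context retrieval.
type Task = "prose" | "retrieval";
type Provider = "claude" | "gemini";

// The routing decision lives in one place, so swapping a model is a
// one-line change rather than a refactor.
const ROUTES: Record<Task, Provider> = {
  prose: "claude",
  retrieval: "gemini",
};

// Hypothetical minimal client interface; a real client would wrap a vendor
// SDK and return a Promise.
interface ModelClient {
  complete(prompt: string): string;
}

// Picks the client for a task from whatever clients the caller supplies.
function clientFor(task: Task, clients: Record<Provider, ModelClient>): ModelClient {
  return clients[ROUTES[task]];
}

// Stub clients so the sketch runs without network access or API keys.
const stubClients: Record<Provider, ModelClient> = {
  claude: { complete: (prompt) => `claude:${prompt}` },
  gemini: { complete: (prompt) => `gemini:${prompt}` },
};

const proseResult = clientFor("prose", stubClients).complete("Write chapter ten");
const retrievalResult = clientFor("retrieval", stubClients).complete("Find Mara's injuries");
```

Typing ROUTES as Record<Task, Provider> means the compiler rejects a new task kind that has no assigned model, which keeps the system, rather than any one model brand, as the source of truth.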



