I am currently building Inky, an AI-driven storytelling platform. Many people assume that building an AI story app is a matter of finding the right prompt and wrapping it in a UI. I have learned the hard way that this approach fails as soon as the narrative gains any real complexity.
In my studio, I don't treat AI as a better version of autocomplete. I treat it as the operating layer of the team. When you are building an AI story app, you aren't just managing text generation; you are architecting a system that manages state, context, and narrative logic across multiple agents. This is agentic engineering in practice.
Architecture Over Autocomplete
The core challenge of building an AI story app is consistency. A standard LLM call is stateless: if you ask it to write chapter four, it has no memory of the character's eye color in chapter one unless you supply that fact in the prompt. But long contexts are expensive and noisy. If you dump 50,000 words into a prompt every time a user clicks 'next,' the model starts to lose the thread.
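The alternative to dumping the whole manuscript is to send only the facts relevant to the current scene. Below is a minimal sketch of that idea, not Inky's actual code. It assumes each stored fact already carries an embedding vector and a token count (the `StoryFact` shape and both fields are hypothetical; in practice the embeddings would come from an embedding model and live in a vector store). Facts are ranked by cosine similarity to the scene's query embedding, then packed greedily into a fixed token budget.

```typescript
// Hypothetical shape of a stored story fact; not Inky's real schema.
interface StoryFact {
  id: string;
  text: string;
  embedding: number[]; // assumed precomputed by some embedding model
  tokens: number;      // assumed precomputed token count
}

// Cosine similarity between two vectors of equal dimension.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same dimension");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank facts by relevance, then greedily keep those that fit the budget.
function selectContext(
  facts: StoryFact[],
  queryEmbedding: number[],
  tokenBudget: number,
): StoryFact[] {
  const ranked = facts
    .map((fact) => ({ fact, score: cosineSimilarity(fact.embedding, queryEmbedding) }))
    .sort((x, y) => y.score - x.score);

  const selected: StoryFact[] = [];
  let used = 0;
  for (const { fact } of ranked) {
    if (used + fact.tokens <= tokenBudget) {
      selected.push(fact);
      used += fact.tokens;
    }
  }
  return selected;
}

// Usage with toy 3-dimensional embeddings and made-up facts.
const facts: StoryFact[] = [
  { id: "eyes", text: "Mara has green eyes.", embedding: [1, 0, 0], tokens: 6 },
  { id: "peanuts", text: "Mara is allergic to peanuts.", embedding: [0.9, 0.1, 0], tokens: 7 },
  { id: "weather", text: "It rained in chapter two.", embedding: [0, 0, 1], tokens: 6 },
];
const context = selectContext(facts, [1, 0, 0], 13);
console.log(context.map((f) => f.id)); // [ 'eyes', 'peanuts' ]
```

The greedy pass keeps scanning after a fact is skipped, so a smaller but less relevant fact can still fill leftover budget. The prompt size stays bounded no matter how long the story grows.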
To solve this, I built a system I call VERA, my custom agent orchestration layer. Instead of one giant prompt, Inky uses a network of specialized agents:
- The Librarian: Manages the vector database where world-building facts and character arcs are stored.
- The Architect: Outlines the narrative structure and ensures the pacing follows established literary frameworks.
- The Weaver: Handles the actual prose generation, pulling only the necessary context from the Librarian and the Architect.
By decoupling these concerns, the system remains stable. This isn't about being a prompt engineer; it's about being a systems architect.
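The decoupling can be sketched as three narrow interfaces and one function that wires them together. This is an illustration under stated assumptions, not VERA's implementation: the method names `recall`, `planBeat`, and `write` are hypothetical, and the stubs stand in for LLM-backed agents. Real agents would be asynchronous; the sketch keeps everything synchronous for brevity.

```typescript
// Hypothetical agent contracts; the roles come from the post, the signatures do not.
interface Librarian {
  recall(query: string, limit: number): string[]; // relevant world facts
}

interface Architect {
  planBeat(chapter: number): string; // what this chapter must accomplish
}

interface Weaver {
  write(beat: string, facts: string[]): string; // prose for the beat
}

// Orchestration: each agent sees only the slice of state it needs.
function writeChapter(
  chapter: number,
  librarian: Librarian,
  architect: Architect,
  weaver: Weaver,
): string {
  const beat = architect.planBeat(chapter);
  const facts = librarian.recall(beat, 5);
  return weaver.write(beat, facts);
}

// Stub implementations standing in for LLM-backed agents.
const librarian: Librarian = {
  // This stub ignores the query and returns fixed facts.
  recall: (_query, limit) =>
    ["Mara has green eyes.", "Mara is allergic to peanuts."].slice(0, limit),
};
const architect: Architect = {
  planBeat: (chapter) => `Chapter ${chapter}: Mara is offered a pastry at the market.`,
};
const weaver: Weaver = {
  write: (beat, facts) => `[${beat}] using ${facts.length} facts`,
};

console.log(writeChapter(4, librarian, architect, weaver));
// [Chapter 4: Mara is offered a pastry at the market.] using 2 facts
```

Because the Weaver receives only a beat and a short list of facts, swapping the model behind it, or testing it with stubs like these, does not touch the other two agents.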
The Stack: Choosing Instruments
I am not loyal to any particular stack. I pick the instruments that allow me to ship today. For Inky, that means a monorepo structure that allows me to move fast as a solo operator supported by AI agents.
- Next.js & TypeScript: The frontend and API layer. TypeScript is essential here, not because I am an expert in it but because its type checks provide the guardrails I need when agents contribute to the codebase.
- Supabase: Handles the heavy lifting for the database, authentication, and edge functions.
- Pinecone: Used for vector embeddings. This is how the 'Librarian' agent remembers that a character is allergic to peanuts three chapters later.
- Claude API & Gemini: I use different models for different tasks. Claude excels at nuanced prose; Gemini is useful for processing massive amounts of research data due to its larger context window.
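Routing tasks to models is also where the TypeScript guardrails pay off. A minimal sketch, assuming a hypothetical `Task` union with only the two task kinds the post names: prose goes to Claude and bulk research goes to Gemini. The provider labels are plain strings here; no real SDK calls or model identifiers are implied. The `never` check in the default branch makes the compiler reject any new task kind that nobody has routed.

```typescript
// Hypothetical task types; only "prose" and "research" come from the post.
type Task =
  | { kind: "prose"; beat: string }
  | { kind: "research"; documents: string[] };

type Provider = "claude" | "gemini";

// Route each task to the model family the post assigns it to.
function routeTask(task: Task): Provider {
  switch (task.kind) {
    case "prose":
      return "claude"; // nuanced prose
    case "research":
      return "gemini"; // large inputs, larger context window
    default: {
      // Compile-time exhaustiveness check: adding a new Task kind without
      // handling it above makes this assignment a type error.
      const unhandled: never = task;
      throw new Error(`Unhandled task: ${JSON.stringify(unhandled)}`);
    }
  }
}

console.log(routeTask({ kind: "prose", beat: "Mara refuses the pastry." })); // claude
console.log(routeTask({ kind: "research", documents: ["notes.md"] })); // gemini
```

When an agent adds, say, a third task kind to `Task`, the build fails at the `never` assignment until the routing is updated, which is exactly the kind of guardrail that matters when agents edit the code.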
Working in public means admitting that this stack will likely evolve. But for now, it is the most efficient way to maintain a multi-product studio without a bloated headcount.



