Shipping AI Products: Lessons from Building a Game with AI

I built a turn-based starship combat game called Sektorium: Captains’ Duel. Two captains face off on a hex grid, taking four sequential phases each turn: allocate energy, plot movement, lock sensors, fire weapons. It is free to play in your browser with no download and no account, has six factions with distinct playstyles, and is fully open source. I was inspired by the tabletop games of my childhood — Star Fleet Battles, BattleTech. The joy was in the deliberation as much as the dice. I built most of it with heavy assistance from Claude.ai and a few other large language models. You can play it at sektorium.com or fork it on GitHub if you want to see how any of this actually works.

I want to be clear about what this was: a deliberate learning experiment. I broke most of my own rules about product management simply to see what Claude.ai could do. No phased gate reviews I actually enforced, no scope discipline, no “is this in the roadmap” conversation with myself. I let the tool run. The point was to find the edges — to learn firsthand where an LLM collaborator makes you faster and where it quietly makes you slower, by doing the thing rather than theorizing about it. The game was the laboratory. The findings were the real output.

What follows are the five lessons that survived. They are not really about game development. They are about working with LLM collaborators on something ambitious and long-lived, which is increasingly the job for anyone deploying AI inside a real product. I am sharing them the way I share findings with the founders and funds we work with at SproutVest: the misses are more useful than the wins, so I am going to be candid about what hurt.

Lesson 01

The leverage was in the documents, not the model

The question I get most often is some version of “how much did the AI actually write?” The honest answer is a lot. The useful answer is that the model was never the bottleneck. Context was.

LLMs do not remember anything between sessions by default. Every fresh conversation starts from zero. If you do not deliberately re-establish what the project is, what its rules are, and where things live, the model fills the gaps with reasonable-sounding guesses that sometimes contradict yesterday’s reasonable-sounding guesses. I solved this with a small set of documents loaded at the start of every session: a master brief that defined the non-negotiable rules and the layout of the work, a separate brief for each major phase with its own scope and acceptance criteria, and a single verification document that answered one question — “is this piece actually done?”

The insight that transfers most directly to product work: the documents around the code mattered more than the code itself. When they were good, the output followed. When they were stale or inconsistent, no amount of clever prompting recovered it. If you are a product manager standing up an AI feature, your spec, your invariants, and your definition of done are not paperwork. They are the structure everything else depends on, and they are the part that is most expensive to get wrong.

Lesson 02

A few non-negotiable rules are worth defending

At the very start I committed to three rules and refused to break them. The game logic had to be fully reproducible — the same inputs always produce the same result. The server was always in charge, and the player’s browser was just a window, never a source of truth. And every change to the game state got recorded as an entry in an append-only log, like a bank ledger you add to but never edit.

None of these were necessary to ship a playable game. I could have skipped all three. But months later, when I wanted replays, AI opponents, spectator mode, and a way to prove a competitive match had not been tampered with, every one of those features came almost for free because the rules were already holding. The replay was just the ledger played back. The AI opponent just produced the same kind of moves a human would, fed through the same engine.

The lesson is not “adopt these three rules.” It is that a small number of architectural commitments, made early and defended without exception, are worth more than ten nice-to-haves you bolt on later. The day you compromise one of them for a small convenience is the day a future feature quietly breaks and you do not notice for months. In product terms, this is the difference between a platform decision and a feature decision — and AI features have a way of turning yesterday’s convenient shortcut into today’s structural debt.

Lesson 03

LLM-assisted work creates real technical debt, and it compounds fast

This is the lesson I have to be honest about, because it is invisible from the outside and it is the failure mode anyone working this way will eventually hit. An LLM writes code that looks like the code it most recently saw. That is usually a gift, because it picks up your conventions and applies them consistently. It also means it propagates bad patterns exactly as fast as good ones, and occasionally invents patterns nobody asked for.

I ended up with three flavors of this. The model reinvented helpers that already existed, quietly recreating the same small utility in a dozen different files instead of importing the one canonical version. It left patterns that should have been shared helpers as copy-pasted blocks scattered across the codebase. And it accumulated detritus — duplicate files with “2” appended to the name, tests that never ran, old schemas lingering after replacements arrived. Each item was individually trivial. Together they added up to real cleanup work.

The deeper point for product leaders is that the model amplifies whatever direction you’re already moving. Shipping fast is not the same as making progress. The discipline that matters is not writing cleverer prompts — it is noticing duplication early and consolidating before the third instance becomes the thirtieth. If you are measuring an AI-assisted team only on how fast they ship, you are not seeing the debt accruing underneath, and it will surface as a slowdown right when you want to scale.

Lesson 04

A numbered roadmap does not stop scope creep

My plan had a tidy list of numbered phases. It looked disciplined. Then a phase 7.5 appeared mid-stream because an earlier phase had shipped a capability without the supporting piece it obviously needed. Later phases swelled to absorb accounts, social login, ranked play, cosmetic items, and a stack of integrations — most of which entered as “small additions.” None of them was a mistake on its own. Every one was defensible the day I added it. The cumulative effect was a project where the actual game, the part you play, is maybe a fifth of the work. The rest is identity, social features, economics, and scaffolding.

The numbered list created an illusion of bounded scope that the reality never honored. The roadmap’s real value turned out to be the pause points — the moments where I could stop and decide whether the next thing was worth doing. I was good at pausing. I was weak at deciding not to.

The transferable habit: plan no more than two phases ahead, and give every phase an explicit “what is not in scope” section sitting right next to the scope. The deferrals are as important as the inclusions. Speculative work — the integrations you build because they are interesting rather than because a real user needs them — should stay off the roadmap until there is a concrete problem they solve. Building the infrastructure first and the audience second is backwards, even when — especially when — the tooling makes the infrastructure cheap to build.

Lesson 05

Half-finished moves are worse than not starting them

The single most confusing thing in the whole Sektorium codebase was a pair of staging directories I created for a reorganization I planned and then never executed. The staging happened. The split did not. Those directories sat there for months holding stale copies of code that lived elsewhere, plus notes describing a future that may never arrive. Every time I searched the codebase (including every time the model searched it), the question “where does this actually live?” suddenly had three plausible answers instead of one.

The same shape showed up in smaller forms all over: legacy files that outlived their replacements, templates duplicated across three places, notes spread across five files that should have been one. In every case a change got started, the new artifact got created, the old one never got deleted, and the result was two sources of truth drifting apart.

The rule I wish I had held: either finish the move inside a defined window or revert it. No long-lived “we will finish this later” states. This matters more with an AI collaborator than without one, because the model reads everything you leave lying around and treats all of it as signal. Ambiguity you would mentally filter out, it faithfully amplifies. A clean workspace is not tidiness for its own sake. It is the context your collaborator depends on to be any good.

What I Would Carry Into the Next Build

If I started a similar project tomorrow, the changes would be mostly about discipline rather than design. Write the invariants and the definition of done before any real work begins, and load them every session. Pick a small number of architectural rules and defend them without exception. Review specifically for duplication, and consolidate a pattern the second time it appears rather than the thirtieth. Plan two phases ahead at most, with the deferrals written down as deliberately as the commitments. And when a change is worth starting, finish it or back it out — but never leave it staged.

The biggest surprise was how little of this was about the AI at all. The model was fast, capable, and tireless. What determined whether that speed turned into a product or into a mess was the same thing that has always determined it: clear rules, honest scope, and the discipline to keep the workspace clean enough that everyone — human or otherwise — knows what is true. The tooling changed. The job did not.

At SproutVest we help founders and funds turn deep technology into something commercial and trustworthy. More and more that means helping teams get the most out of AI collaborators without drowning in the debt they can quietly create. If that is the problem in front of you, I am happy to compare notes.