Game jam experiment

Orb Knight: how far can AI agents take a first-time game jam?

This is an experiment log: how far can I get in a large game jam, with almost no game development background, if I treat Codex and Claude Code as implementation agents and keep the feedback loop tight?

Result

Orb Knight placed 12th overall in Gamedev.js Jam 2026 and 6th in the Gameplay category. The public winners post lists the overall ranking, and the itch.io results page lists the Gameplay ranking. The game itself is a browser-only 3D action game about escaping a machine dungeon and pushing toward the castle road.

12th overall in Gamedev.js Jam 2026
6th in Gameplay on itch.io
Built with SvelteKit, Svelte 5, Threlte, Three.js and Rapier
Game systems, graphics direction, music, UI and gameplay were built through LLM-agent workflows
Used Codex and Claude Code as implementation agents under a fixed jam deadline
Storybook playgrounds supported focused iteration on combat, rooms, UI, player and models

What was built

The final demo has third-person combat, shooting, melee, procedural foundry rooms, room transitions, pickups, shops, treasure rooms, boss encounters, loadout modules, audio, settings, Storybook playgrounds and persistent run progress.

Why include it here?

It is not a normal portfolio project and it is not trying to pretend I became a game developer overnight. The useful signal is different: I took an unfamiliar domain, used AI agents as a serious engineering tool, decomposed the work into systems, and shipped a playable result that ranked well in a competitive public setting.

Lessons

Agents are strongest with tight feedback loops

The useful pattern was not one giant prompt. It was short implementation loops, browser checks, screenshots, playable states and quick corrections.

Unknown domains force better decomposition

Without game development experience, the work had to be split into clear systems: movement, combat, rooms, loadout, physics, camera, UI, audio and progression.
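To make the decomposition concrete, here is a minimal sketch of what "clear systems" could look like in code. It is not the code from Orb Knight: the `GameSystem` interface, the `SystemRegistry` class and the toy `MovementSystem` are hypothetical names made up for illustration. The idea is only that each system exposes one small entry point, so an agent can be asked to change one system without touching the others.

```typescript
// Hypothetical sketch, not the actual Orb Knight code.
// Each gameplay system is an isolated unit with a single update entry point.
interface GameSystem {
  readonly name: string;
  update(dtSeconds: number): void;
}

class SystemRegistry {
  private readonly systems: GameSystem[] = [];

  // Rejects duplicate names so two agents cannot silently register the same system twice.
  register(system: GameSystem): void {
    if (this.systems.some((s) => s.name === system.name)) {
      throw new Error(`System "${system.name}" is already registered`);
    }
    this.systems.push(system);
  }

  // Runs every system once, in registration order.
  tick(dtSeconds: number): void {
    for (const system of this.systems) {
      system.update(dtSeconds);
    }
  }

  names(): string[] {
    return this.systems.map((s) => s.name);
  }
}

// Toy system: moves a point along one axis at a constant speed.
class MovementSystem implements GameSystem {
  readonly name = "movement";
  x = 0;
  private readonly speed: number;

  constructor(speed: number) {
    this.speed = speed;
  }

  update(dtSeconds: number): void {
    this.x += this.speed * dtSeconds;
  }
}
```

With a shape like this, a prompt can be scoped to a single file and a single interface ("change how movement updates"), which keeps each implementation loop short.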

Visual quality needs verification, not vibes

Screenshots, Storybook scenes and repeated playtests mattered because visual regressions and game feel problems are hard to catch from code alone.
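One way to turn "verification, not vibes" into something mechanical is a pixel comparison between a reference screenshot and a new one. The sketch below is an assumption about how that could be done, not a description of the tooling used during the jam: `diffRatio` and its `tolerance` parameter are hypothetical, and the inputs are assumed to be raw RGBA buffers of equal size (for example, what a canvas `getImageData` call returns).

```typescript
// Hypothetical sketch: fraction of pixels that differ between two RGBA buffers.
// `tolerance` is the largest per-channel difference still treated as "unchanged".
function diffRatio(
  a: Uint8ClampedArray,
  b: Uint8ClampedArray,
  tolerance: number = 8
): number {
  if (a.length !== b.length || a.length % 4 !== 0) {
    throw new Error("Buffers must be RGBA data of identical size");
  }
  const pixelCount = a.length / 4;
  if (pixelCount === 0) {
    return 0;
  }
  let changed = 0;
  for (let i = 0; i < a.length; i += 4) {
    // Largest difference across the R, G, B and A channels of this pixel.
    const delta = Math.max(
      Math.abs(a[i] - b[i]),
      Math.abs(a[i + 1] - b[i + 1]),
      Math.abs(a[i + 2] - b[i + 2]),
      Math.abs(a[i + 3] - b[i + 3])
    );
    if (delta > tolerance) {
      changed++;
    }
  }
  return changed / pixelCount;
}
```

A check like this only flags that something changed. Whether the change is a regression or an improvement, and whether the game still feels right, still needs a human looking at the scene and playing it.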

The best result was learning velocity

The ranking was nice, but the real result was seeing how far a small team can get in a new domain with strong agent workflows and fast, taste-driven iteration.