A fan-made arcade game compilation by ProxiBlue, built around real GitHub repository data from Mage-OS.
Pull request and commit data from a GitHub repository is pre-fetched and cached in a local JSON data file rather than fetched in real time. This data shapes every aspect of the gameplay experience.
Each landing pad on the terrain represents a real merged pull request. The PR's classification determines the pad's colour, width, point value, and which mini-game triggers on landing.
| PR Type | Pad | Points | Width | Mini-Game |
|---|---|---|---|---|
| Security | Red | 200 (3x) | Narrow | Missile Command / Invaders |
| Bug Fix | Yellow | 100 (2x) | Medium | Bug Bombing Run |
| Feature | Cyan | 50 (1x) | Wide | Feature Drive |
| Chore | Purple | 75 (1x) | Medium | Code Breaker |
| Other | Grey | 100 (1x) | Medium | Tech Debt Blaster |
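
As a rough illustration, the table above could be held in a lookup object like the one below. The names `PAD_CONFIG` and `padForType`, the object shape, and the fallback to "other" for unknown types are assumptions made for this sketch, not taken from the game's source.

```javascript
// Pad attributes per PR type, copied from the table above.
// PAD_CONFIG and padForType are hypothetical names used only for illustration.
const PAD_CONFIG = {
  security: { colour: "red",    points: 200, width: "narrow", miniGames: ["Missile Command", "Invaders"] },
  bugfix:   { colour: "yellow", points: 100, width: "medium", miniGames: ["Bug Bombing Run"] },
  feature:  { colour: "cyan",   points: 50,  width: "wide",   miniGames: ["Feature Drive"] },
  chore:    { colour: "purple", points: 75,  width: "medium", miniGames: ["Code Breaker"] },
  other:    { colour: "grey",   points: 100, width: "medium", miniGames: ["Tech Debt Blaster"] },
};

// Look up the pad for a PR type.
// Assumption: unknown types fall back to the "other" row.
function padForType(type) {
  return Object.hasOwn(PAD_CONFIG, type) ? PAD_CONFIG[type] : PAD_CONFIG.other;
}
```

Keeping the mapping in one object means the terrain generator, the scoring code, and the mini-game launcher can all read from the same place.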
PRs are classified by scanning their labels, titles, and descriptions.
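
A minimal sketch of that kind of scan is shown here. The keyword patterns, the rule order, and the `classifyPr` name are all assumptions; the actual rules in `classify-prs.js` are not reproduced in this README.

```javascript
// Hypothetical keyword rules, checked in order; the first match wins.
// The real classifier's keywords are not shown in this document.
const RULES = [
  { type: "security", pattern: /\b(security|cve|vulnerab\w*|xss|csrf)\b/i },
  { type: "bugfix",   pattern: /\b(bug|fix(es|ed)?|hotfix|regression)\b/i },
  { type: "feature",  pattern: /\b(feature|feat|add(s|ed)?|implement\w*)\b/i },
  { type: "chore",    pattern: /\b(chore|ci|deps?|bump|refactor\w*|docs?)\b/i },
];

// Classify one PR from its labels, title, and description.
// Labels may be plain strings or GitHub-style objects with a `name` field.
function classifyPr(pr) {
  const labels = (pr.labels || []).map((l) => (typeof l === "string" ? l : l.name));
  const text = [...labels, pr.title || "", pr.body || ""].join("\n");
  for (const rule of RULES) {
    if (rule.pattern.test(text)) return rule.type;
  }
  return "other"; // nothing matched
}
```

Checking security first means a PR titled "Fix XSS in checkout" lands on a red pad rather than a yellow one, which seems the more useful outcome when a PR matches several categories.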
"Incoming threats detected — defend the codebase"
Incoming missiles labeled with conflict markers rain down toward buildings labeled with filenames from the PR. Fire interceptor missiles from defense batteries to protect your codebase.
"Defend against security threats"
Alien waves scroll in from the right. Shoot them for bonus points. The ship uses thruster-based physics, with retro thrusters for braking. Contact with an alien ends the mini-game.
"Squash the bugs — every fix counts"
Fly over terrain and drop bombs on scuttling bugs. Normal lander physics apply. Killing bugs earns points and restores fuel, with any overflow going to the extension tank.
"Ship the feature — deploy to production"
The M ship sprouts wheels and drives across side-scrolling terrain to a destination pad. Jump over gaps, dodge rocks, collect review approvals.
"Refactor: systematically clear the debris"
Classic Asteroids gameplay. Large asteroids labeled with tech debt split into smaller, faster pieces. Some contain hidden aliens that escape and shoot at you. ProxiBlue power-up asteroids grant a shield.
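
The "split into smaller, faster pieces" rule might look something like this sketch. The constants, the two-way split, and the 30-degree spread are illustrative assumptions, not values from the game.

```javascript
// Illustrative constants; the game's real tuning values are not documented here.
const MIN_RADIUS = 10;   // fragments that would be smaller than this are not spawned
const SPEED_BOOST = 1.5; // children travel faster than their parent

// Split an asteroid into two smaller, faster children that keep the parent's label.
// Returns an empty array when the asteroid is too small to split further.
function splitAsteroid(asteroid) {
  const childRadius = asteroid.radius / 2;
  if (childRadius < MIN_RADIUS) return [];

  const speed = Math.hypot(asteroid.vx, asteroid.vy) * SPEED_BOOST;
  const heading = Math.atan2(asteroid.vy, asteroid.vx);

  // The children veer 30 degrees to either side of the parent's heading.
  return [-1, 1].map((side) => {
    const angle = heading + side * (Math.PI / 6);
    return {
      x: asteroid.x,
      y: asteroid.y,
      radius: childRadius,
      label: asteroid.label,
      vx: Math.cos(angle) * speed,
      vy: Math.sin(angle) * speed,
    };
  });
}
```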
"Chore: methodically clear the backlog"
Breakout/Arkanoid gameplay. The M ship is the paddle, bouncing a ball into bricks labeled with code smells. Power-ups: wide paddle, multi-ball, fireball, shooting, extra ball.
| Key | Action |
|---|---|
| Up / W | Thrust |
| Left / A | Rotate left |
| Right / D | Rotate right |
| Down / S | Retro thrust (mini-games) |
| Space | Start / Continue / Shoot / Launch |
| R | Restart level |
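
One way to wire up the controls above is a lookup keyed on `KeyboardEvent.key`. The action names and the `actionForKey` helper are hypothetical; only the key-to-action pairs come from the table.

```javascript
// Maps KeyboardEvent.key values to the actions listed in the controls table.
// Action names such as "retroThrust" and "primary" are invented for this sketch.
const KEY_BINDINGS = {
  ArrowUp: "thrust",
  w: "thrust",
  ArrowLeft: "rotateLeft",
  a: "rotateLeft",
  ArrowRight: "rotateRight",
  d: "rotateRight",
  ArrowDown: "retroThrust", // mini-games only
  s: "retroThrust",
  " ": "primary",           // Start / Continue / Shoot / Launch
  r: "restart",
};

// Resolve a key to an action name, or null when the key is unbound.
function actionForKey(key) {
  // Single-character keys are lowercased so Shift or Caps Lock does not break input.
  const normalised = key.length === 1 ? key.toLowerCase() : key;
  return Object.hasOwn(KEY_BINDINGS, normalised) ? KEY_BINDINGS[normalised] : null;
}
```

In a browser this would typically be called from a `keydown` listener, passing `event.key`.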
```bash
# Fetch PR and commit data from any GitHub repository
node fetch-github-data.js owner/repo

# Classify PRs by type
node classify-prs.js data/<repo>.json
```
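
For readers curious what the fetch step might involve, here is a sketch built on the public GitHub REST API. It is not the contents of `fetch-github-data.js`: the function names, the output shape, and the page limit are assumptions, and it relies on the global `fetch` available in Node 18 and later.

```javascript
// Assumed behaviour only; the real fetch-github-data.js is not shown in this README.

// Build the "list pull requests" URL for a repository given as "owner/repo".
function pullsUrl(ownerRepo, page = 1) {
  return `https://api.github.com/repos/${ownerRepo}/pulls?state=closed&per_page=100&page=${page}`;
}

// Keep only merged PRs and trim each to the fields a classifier would need.
// In the GitHub REST API, closed-but-unmerged PRs have merged_at set to null.
function keepMerged(pulls) {
  return pulls
    .filter((pr) => Boolean(pr.merged_at))
    .map(({ number, title, body, labels, merged_at }) => ({
      number,
      title,
      body,
      mergedAt: merged_at,
      labels: labels.map((l) => l.name),
    }));
}

// Fetch up to maxPages pages of closed PRs and return the merged ones.
async function fetchMergedPulls(ownerRepo, maxPages = 3) {
  const merged = [];
  for (let page = 1; page <= maxPages; page++) {
    const res = await fetch(pullsUrl(ownerRepo, page), {
      headers: { Accept: "application/vnd.github+json" },
    });
    if (!res.ok) throw new Error(`GitHub API returned ${res.status}`);
    const pulls = await res.json();
    if (pulls.length === 0) break; // no more pages
    merged.push(...keepMerged(pulls));
  }
  return merged;
}
```

Unauthenticated requests to the GitHub API are rate-limited, which is one practical reason to cache the result in a local JSON file instead of fetching at play time.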
Open index.html in a browser. Select a repository if multiple data files exist, then press Space to start.
This game was built as a real-world test case for Chief — an autonomous PRD agent that runs Claude Code in a loop to build software from product requirements.
I'm enhancing Chief with an adversarial evaluation system in my fork (main branch). In the standard Chief loop, a single agent builds code and checks its own work. My fork adds independent adversarial evaluator agents and a dedicated security evaluator (OWASP Top 10) that review every completed story before it's allowed to pass. If the evaluators find missing features, broken acceptance criteria, or security issues, the story is failed and retried automatically.
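
The build, evaluate, and retry cycle described above can be sketched in a few lines. This is not Chief's code: `runStory`, the synchronous stand-in functions, the attempt cap, and the idea of passing the evaluators' issues back to the builder are all assumptions made for illustration.

```javascript
// Hypothetical sketch of an adversarial evaluation loop.
// buildStory(story, issues) stands in for the builder agent and returns its output.
// Each evaluator(story, output) stands in for a reviewer agent and returns a list of issues.
function runStory(story, buildStory, evaluators, maxAttempts = 3) {
  let issues = [];
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    // Assumption: issues from the previous round are handed back to the builder.
    const output = buildStory(story, issues);

    // Every evaluator reviews the output; the story passes only if none finds a problem.
    issues = evaluators.flatMap((evaluate) => evaluate(story, output));
    if (issues.length === 0) return { passed: true, attempts: attempt, output };
  }
  // Assumption: retries are capped so that a story cannot loop forever.
  return { passed: false, attempts: maxAttempts, issues };
}
```

The point of the structure is that the builder never grades its own work: a story passes only when the separate feature and security evaluators all return no issues.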
This game is the proving ground — a complex, multi-system project designed to stress-test the adversarial evaluation loop and surface implementation bugs. Every mini-game, every feature, every PR-data integration was built by the Chief agent loop and validated by adversarial reviewers. The chiefloopEVALexample repository contains side-by-side output comparing standard (no eval) vs adversarial (with eval) builds of the same game.
The mini-games — Lunar Lander, Asteroids, Missile Command, Space Invaders, Breakout, Moon Buggy — are all classics from the Atari 2600, my first ever game console. That console is part of what got me interested in coding in the first place. Building these games with AI felt like coming full circle.
Some aspects of each mini-game were refined manually outside the Chief loop after the initial build: gameplay tuning, physics adjustments, visual polish, and security hardening (CSP headers, input validation, JSONP removal). These refinements were fed back as PRD updates so that the non-adversarial comparison build has the same information about changes and tweaks.