This walkthrough is meant for engineers who are cautious about “vibe coding.” The point is not that AI can type fast. The point is that newer tooling can put specs, review gates, memory, and adversarial checks around that speed.
The skeptical reaction is reasonable. If the model is just generating code from a prompt, quality is mostly luck. This workflow is different because it introduces structure before and after code generation.
CLAUDE.md, so lessons from one session become defaults for the next instead of disappearing into chat history.
Before the pipeline starts, the developer creates a structured requirements document. Then the spec itself gets stress-tested through multi-model review before any implementation begins.
/dialogue, where Codex, Gemini, and Claude reviewed the same proposal from different angles. Claude moderated the flow, claims were checked against local code context, and the session forced convergence on concrete design decisions before execution.
Ship-phase begins. It reads the validated spec, scans the codebase for reusable assets, and produces context that downstream agents can use without repeatedly rediscovering the same facts.
/chat or /challenge needed.
Build.DEVICE). This shaped the entire test architecture: the PlayerCommands interface became the testability boundary.
player.stop() causes a black screen on Android TV hardware (Media3 issue #2941).
FOREGROUND_SERVICE_MEDIA_PLAYBACK permission or the service crashes at runtime.
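To make the "testability boundary" idea concrete, here is a minimal sketch of what such a seam could look like. Only the names PlayerCommands and setMediaItemAndPrepare appear in this walkthrough; the method signatures, PlaybackController, and RecordingPlayerCommands are hypothetical and chosen purely for illustration.

```kotlin
// Hypothetical sketch: PlayerCommands and setMediaItemAndPrepare are names from
// the walkthrough; all signatures and the other classes are assumptions.

/** Narrow seam between playback logic and the real player. */
interface PlayerCommands {
    fun setMediaItemAndPrepare(url: String)
    fun play()
}

/** Logic under test depends on the interface, never on a concrete player. */
class PlaybackController(private val commands: PlayerCommands) {
    fun start(url: String) {
        require(url.isNotBlank()) { "url must not be blank" }
        commands.setMediaItemAndPrepare(url)
        commands.play()
    }
}

/** Hand-written fake that records calls so a plain JVM test can assert on them. */
class RecordingPlayerCommands : PlayerCommands {
    val calls = mutableListOf<String>()

    override fun setMediaItemAndPrepare(url: String) {
        calls += "prepare:$url"
    }

    override fun play() {
        calls += "play"
    }
}
```

With a seam like this, a unit test can construct PlaybackController around the recording fake and assert on the order of calls without touching device-specific APIs.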
<read_first>, <acceptance_criteria>, and <action> blocks. Every requirement (PLAY-01 through PLAY-06) mapped to at least one plan. A plan-checker agent verified coverage.
/tracer wiring audit → G3 decision hardening → G4 external code review → G4b /test-audit → G4c /rubber-ducky blind spots → G4d /code-health SOLID/KISS → G5–G8 gap closure, simplify, verify, reconcile. This is the core pitch: speed plus repeatable review pressure.
After all code was written and all 27 tests passed, an external adversarial review still found bugs that sit at an abstraction layer unit tests do not reach. This is the part skeptical engineers usually care about most.
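The requirement-to-plan coverage check mentioned earlier (every requirement from PLAY-01 through PLAY-06 mapped to at least one plan) can be pictured as a simple set difference. The following is only a sketch under assumed names: the Plan data class, its fields, and both functions are hypothetical, and the real plan-checker agent's format is not shown in this walkthrough.

```kotlin
// Hypothetical data model; the actual plan format is not shown in the walkthrough.
data class Plan(val id: String, val covers: Set<String>)

/** Returns the requirement IDs that no plan claims to cover, in input order. */
fun uncoveredRequirements(requirements: List<String>, plans: List<Plan>): List<String> {
    val covered = plans.flatMap { it.covers }.toSet()
    return requirements.filterNot { it in covered }
}

/** Builds the IDs PLAY-01 through PLAY-06 named in the walkthrough. */
fun playRequirements(): List<String> = (1..6).map { "PLAY-%02d".format(it) }
```

An empty result means every requirement is claimed by at least one plan. It does not show that the plan actually satisfies the requirement, which is why the later review gates still matter.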
handleTokenExpired() launched on the IO dispatcher but called playerCommands.setMediaItemAndPrepare(), which must run on the main thread. The result was intermittent crashes on real devices.
mockk(relaxed = true) hid failure paths, and PlaybackService had zero unit tests. None are code bugs. They are integrity gaps between what the workflow said and what it truly proved.
Each skill is a specialized capability with a narrow responsibility. That is useful because it replaces vague prompting with named, repeatable operations.
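Returning to the dispatcher bug described above, one common way to address this class of problem is to keep blocking work on the IO dispatcher and switch to the main dispatcher before touching the player. The sketch below is not the project's actual fix. It assumes kotlinx-coroutines-core is on the classpath, and TokenRefresher, ThreadConfinedCommands, and the injected dispatchers are hypothetical; only handleTokenExpired and setMediaItemAndPrepare are names from the walkthrough.

```kotlin
import java.util.concurrent.Executors
import kotlinx.coroutines.CoroutineDispatcher
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.asCoroutineDispatcher
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.withContext

// Hypothetical sketch; assumes kotlinx-coroutines-core is available.

/** Fake that fails loudly when called from a thread other than the one it is confined to. */
class ThreadConfinedCommands(private val requiredThreadName: String) {
    var preparedUrl: String? = null
        private set

    fun setMediaItemAndPrepare(url: String) {
        val current = Thread.currentThread().name
        // startsWith tolerates the suffix coroutine debug mode appends to thread names.
        check(current.startsWith(requiredThreadName)) {
            "called on $current, expected $requiredThreadName"
        }
        preparedUrl = url
    }
}

/** Dispatchers are injected so a JVM test can substitute a stand-in for the main thread. */
class TokenRefresher(
    private val commands: ThreadConfinedCommands,
    private val io: CoroutineDispatcher,
    private val main: CoroutineDispatcher,
) {
    suspend fun handleTokenExpired(refresh: () -> String) {
        // The blocking refresh stays on the IO dispatcher.
        val freshUrl = withContext(io) { refresh() }
        // Hop to the main dispatcher before calling the main-thread-only API.
        withContext(main) { commands.setMediaItemAndPrepare(freshUrl) }
    }
}

/** Usage: a single named thread stands in for the Android main thread. */
fun demoTokenRefresh(): String? {
    val fakeMain = Executors.newSingleThreadExecutor { Thread(it, "fake-main") }
        .asCoroutineDispatcher()
    val commands = ThreadConfinedCommands("fake-main")
    try {
        runBlocking {
            TokenRefresher(commands, Dispatchers.IO, fakeMain)
                .handleTokenExpired { "fresh-url" }
        }
    } finally {
        fakeMain.close() // shuts down the underlying executor
    }
    return commands.preparedUrl
}
```

A thread-confined fake of this kind also speaks to the relaxed-mock concern: instead of silently accepting any call, it throws when the threading contract is violated, so the failure path becomes visible in a test.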