GSD-2: Get Shit Done
Autonomous Coding Agents
A purpose-built TypeScript CLI that orchestrates AI agents to plan, execute, verify, and ship multi-phase software projects — without human intervention.
What Is GSD-2 — and Why It Exists
GSD-2 is a standalone CLI application (npm package: gsd-pi) that enables autonomous AI agents to execute multi-phase software development projects end-to-end. It's the evolution of the original GSD prompt framework.
Version 1 was a prompt framework that relied on Claude Code slash commands. Version 2 is a complete rewrite — a purpose-built agent orchestration system written in TypeScript on the Pi SDK, giving it direct harness-level control over agent sessions, context windows, and execution pipelines.
v1 = prompt framework that guided an LLM. v2 = application that controls agent sessions directly. The LLM is a tool, not the orchestrator.
Work Hierarchy — Milestone → Slice → Task
GSD structures all work into a three-level hierarchy designed around context window constraints. Every unit of work has a clear definition and scope ceiling.
Milestone
A shippable version of the project. Contains 4–10 slices. This is what gets squash-merged to main and what you'd demo to stakeholders.
Slice
A demoable vertical capability. Contains 1–7 tasks. Each slice gets its own git branch with worktree isolation.
Task
A single context-window unit of work. The atomic execution unit. Each task gets a fresh context window with pre-inlined context.
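As a rough illustration, the hierarchy and its size limits could be modeled as below. This is a minimal sketch: the interface and field names are hypothetical and are not taken from GSD-2's source. Only the 4–10 and 1–7 ranges come from the description above.

```typescript
// Hypothetical shapes; field names are illustrative, not GSD-2's actual types.
interface Task { id: string; title: string; }
interface Slice { id: string; title: string; tasks: Task[]; }
interface Milestone { id: string; title: string; slices: Slice[]; }

// Returns a list of violations of the documented size limits
// (4-10 slices per milestone, 1-7 tasks per slice).
function checkScope(m: Milestone): string[] {
  const problems: string[] = [];
  if (m.slices.length < 4 || m.slices.length > 10) {
    problems.push(`${m.id}: expected 4-10 slices, found ${m.slices.length}`);
  }
  for (const s of m.slices) {
    if (s.tasks.length < 1 || s.tasks.length > 7) {
      problems.push(`${s.id}: expected 1-7 tasks, found ${s.tasks.length}`);
    }
  }
  return problems;
}
```

An empty result means the milestone fits the stated ceilings; each string names the unit that is too small or too large.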
Operational Modes — Three Ways to Run
GSD-2 offers three distinct execution modes, each targeting a different workflow — from fully hands-off automation to guided step-by-step execution to CI/CD integration.
Autonomous Mode
/gsd auto: Full autopilot. Researches, plans, executes, verifies, commits, and repeats, all without human intervention. Each phase gets a fresh context window with pre-inlined relevant context. The agent walks through: research → plan → execute (per task) → complete → reassess roadmap → next slice.
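The phase sequence above can be pictured as a small state machine. The sketch below is an assumption about how such a loop might look, not GSD-2's implementation: the `Cursor` type and `next` function are hypothetical names, and the task and slice counts are passed in rather than read from disk.

```typescript
type Phase = "research" | "plan" | "execute" | "complete" | "reassess";

// Position within a milestone: which slice, which phase, which task.
interface Cursor { slice: number; phase: Phase; task: number; }

// Advance by one unit of work. Returns null when the last slice is finished.
function next(c: Cursor, tasksInSlice: number, sliceCount: number): Cursor | null {
  switch (c.phase) {
    case "research":
      return { ...c, phase: "plan" };
    case "plan":
      return { ...c, phase: "execute", task: 0 };
    case "execute":
      // One execute unit per task, then close out the slice.
      return c.task + 1 < tasksInSlice
        ? { ...c, task: c.task + 1 }
        : { ...c, phase: "complete" };
    case "complete":
      return { ...c, phase: "reassess" };
    case "reassess":
      return c.slice + 1 < sliceCount
        ? { slice: c.slice + 1, phase: "research", task: 0 }
        : null;
  }
}
```

A driver that calls `next` in a loop corresponds to autopilot; one that waits for confirmation between calls would correspond to human-paced stepping over the same transitions.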
Step Mode
/gsd next: Same state machine, human pacing. Pauses between each unit with a wizard UI showing completed work and next steps. Ideal for developers who want to stay in the loop while the agent handles execution.
Headless Mode
gsd headless: No TUI, pure automation. Designed for CI pipelines, cron jobs, and scripted workflows. gsd headless query provides instant JSON snapshots (~50ms) without spawning LLM sessions.
Use the two-terminal workflow: run /gsd auto in one terminal, and steer with /gsd discuss or /gsd status in another. The agent adapts in real time.
Context Engineering — Zero Wasted Tool Calls
One of GSD-2's most powerful capabilities: it pre-inlines everything the agent needs into dispatch prompts. Task plans, slice summaries, prior outcomes, roadmap excerpts, and decision registers — all injected before the LLM session starts.
- Fresh context per task — no garbage accumulation from prior work
- Pre-inlined task plans — the agent starts with full orientation
- Prior outcomes — what was built, what succeeded, what failed
- Decision register excerpts — architectural choices already made
- Roadmap context — what's ahead, what's at risk
LLMs degrade as context fills up. By starting each task with a clean, purpose-built context window, GSD avoids the "confused agent" problem where prior noise causes hallucinated decisions.
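To make the idea of pre-inlining concrete, here is a minimal sketch of assembling a dispatch prompt from already-loaded pieces. The `DispatchContext` shape, the section titles, and `buildDispatchPrompt` are hypothetical; the real prompt layout is not described in this document.

```typescript
// Hypothetical inputs, named after the kinds of context listed above.
interface DispatchContext {
  taskPlan: string;
  priorOutcomes: string[];
  decisions: string[];
  roadmapExcerpt: string;
}

// Concatenate every piece into one prompt so the agent does not need
// tool calls to orient itself at the start of the session.
function buildDispatchPrompt(ctx: DispatchContext): string {
  const section = (title: string, body: string) => `## ${title}\n${body.trim()}`;
  const bullets = (items: string[]) => items.map((i) => `- ${i}`).join("\n");
  return [
    section("Task plan", ctx.taskPlan),
    section("Prior outcomes", bullets(ctx.priorOutcomes)),
    section("Decisions", bullets(ctx.decisions)),
    section("Roadmap", ctx.roadmapExcerpt),
  ].join("\n\n");
}
```

Because the prompt is rebuilt from files for every task, nothing from an earlier session's conversation carries over unless it was written down as an outcome or decision.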
Git Strategy — Bisect-Friendly by Default
GSD-2 manages git automatically with a strategy designed for clean history, safe isolation, and easy rollbacks.
Isolation mode: worktree (default), branch, or none.
Squash merges: each milestone is squash-merged to main, which keeps main clean and makes git bisect efficient. Each milestone is a single revertable unit.
Rollback: undo a milestone with a single git revert. No cherry-picking across dozens of commits. The granular per-task history is preserved on slice branches if you need it.
Verification System — Trust but Check
Every task goes through a multi-layer verification pipeline before it's considered complete. This is how GSD ensures agents actually build what they claim.
Must-Haves
Each task plan includes "must-haves" — mechanically checkable outcomes. These aren't vague descriptions; they're observable truths: implementation artifacts exist, key imports are wired, specific behaviors are present.
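A mechanically checkable must-have might look like the following. This is a sketch under assumptions: the two `MustHave` kinds are invented for illustration, and an in-memory map stands in for the file system so the example stays self-contained.

```typescript
// Hypothetical must-have kinds; GSD-2's real schema may differ.
type MustHave =
  | { kind: "file-exists"; path: string }
  | { kind: "contains"; path: string; text: string };

// files maps path -> contents and stands in for the real file system.
// Returns the must-haves that are NOT satisfied.
function checkMustHaves(mustHaves: MustHave[], files: Map<string, string>): MustHave[] {
  return mustHaves.filter((m) => {
    const content = files.get(m.path);
    if (content === undefined) return true; // a missing file fails either kind
    return m.kind === "contains" && !content.includes(m.text);
  });
}
```

A "contains" check is one simple way to express "key imports are wired": the expected import line either appears in the file or it does not, with no judgment call by the agent.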
Verification Pipeline
Static Checks
Must-have verification against the file system and code artifacts.
Shell Commands
Configurable lint, test, and build commands. Auto-fix retries on failure.
Behavioral Testing
Does the feature actually work? Validated against slice acceptance criteria.
Human Review
Only when necessary. GSD generates UAT scripts from slice outcomes so you know exactly what to test.
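The layered pipeline with auto-fix retries could be sketched roughly as follows. The function names and the `CheckResult` shape are hypothetical, and checks are plain callbacks here rather than real lint, test, or build invocations.

```typescript
interface CheckResult { ok: boolean; output: string; }
type Check = () => CheckResult;

// Run a check; on failure, apply a fix step and re-run, up to maxRetries times.
function runWithAutoFix(
  check: Check,
  fix: (failure: CheckResult) => void,
  maxRetries: number,
): CheckResult {
  let result = check();
  for (let attempt = 0; !result.ok && attempt < maxRetries; attempt++) {
    fix(result);
    result = check();
  }
  return result;
}

// Run stages in order and stop at the first one that fails.
function runPipeline(
  stages: { name: string; check: Check }[],
): { passed: boolean; failedAt?: string } {
  for (const stage of stages) {
    if (!stage.check().ok) return { passed: false, failedAt: stage.name };
  }
  return { passed: true };
}
```

In this sketch a shell-command stage would wrap its check in `runWithAutoFix`, while the other stages run once; a task counts as complete only when `runPipeline` reports `passed: true`.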
Crash Recovery — Sessions Die, State Survives
Agent sessions crash. Network connections drop. Machines restart. GSD-2 is built for all of it.
- Lock files track the current unit of work — survives session death
- Session forensics — reads surviving session files and synthesizes a recovery briefing from all completed tool calls
- Parallel state persistence — orchestrator state persisted with PID liveness detection
- Headless auto-restart — on crash, auto-restarts with exponential backoff (default 3 attempts)
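For the exponential backoff mentioned in the last item, a delay schedule might be computed as below. Only the default of 3 attempts comes from the text; the 1-second base delay and 60-second cap are assumptions chosen for illustration.

```typescript
// Delay before restart attempt n (0-based): baseMs * 2^n, capped at capMs.
// baseMs and capMs are illustrative values, not documented GSD-2 defaults.
function restartSchedule(maxAttempts = 3, baseMs = 1000, capMs = 60000): number[] {
  return Array.from({ length: maxAttempts }, (_, n) =>
    Math.min(capMs, baseMs * Math.pow(2, n)),
  );
}
```

With these assumed values, `restartSchedule()` yields waits of 1, 2, and 4 seconds before the three restart attempts.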
When GSD detects a crashed session, it reads the lock file and surviving artifacts. It builds a recovery context (what was completed, what was in progress, what remains) and injects that into the next agent session, so the agent resumes the interrupted unit instead of starting it over.
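A recovery briefing of this kind could be assembled roughly as follows. The `LockFile` and `ToolCall` shapes, the unit id format, and the wording of the briefing are all hypothetical; the sketch only shows the idea of splitting surviving tool calls into completed and interrupted work.

```typescript
// Hypothetical shapes for the lock file and the surviving tool-call records.
interface LockFile { unitId: string; pid: number; }
interface ToolCall { name: string; target: string; completed: boolean; }

// Summarize what finished and what was interrupted, for the next session.
function buildRecoveryBriefing(lock: LockFile, calls: ToolCall[]): string {
  const fmt = (c: ToolCall) => `- ${c.name} ${c.target}`;
  return [
    `Resuming unit ${lock.unitId} after a crashed session.`,
    "Completed:",
    ...calls.filter((c) => c.completed).map(fmt),
    "In progress at crash:",
    ...calls.filter((c) => !c.completed).map(fmt),
  ].join("\n");
}
```

The `pid` field is where a liveness probe would fit: in Node.js, `process.kill(pid, 0)` sends no signal and throws if the process does not exist, which is one common way to tell a crashed owner from a running one.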
Cost & Token Tracking — Know What You're Spending
Autonomous agents can burn through API budgets fast. GSD-2 includes a full cost management system built directly into the execution pipeline.
Open the dashboard with Ctrl+Alt+G or /gsd status. It shows live spend, token consumption, and progress metrics across the current milestone.
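Spend tracking ultimately reduces to multiplying token counts by per-model prices. The sketch below assumes a simple usage record and a price table in USD per million tokens; the names and the price values used with it are placeholders, not real provider rates or GSD-2 types.

```typescript
// Hypothetical usage record for one LLM session.
interface Usage { model: string; inputTokens: number; outputTokens: number; }
// Prices in USD per million tokens; values are supplied by the caller.
interface Price { inputPerM: number; outputPerM: number; }

// Sum the cost of all recorded sessions.
function totalCostUsd(usages: Usage[], prices: Record<string, Price>): number {
  return usages.reduce((sum, u) => {
    const p = prices[u.model];
    if (!p) throw new Error(`no price configured for ${u.model}`);
    return sum + (u.inputTokens * p.inputPerM + u.outputTokens * p.outputPerM) / 1e6;
  }, 0);
}
```

Throwing on an unknown model is a deliberate choice in this sketch: silently treating it as free would understate spend.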
Artifact Management — The .gsd/ Directory
All project state lives in a .gsd/ directory at the project root. These are human-readable Markdown files that double as the agent's source of truth.
| File | Purpose |
|---|---|
| STATE.md | Quick-glance dashboard — read this first for current status |
| DECISIONS.md | Append-only architectural decision register |
| PROJECT.md | Living doc of current project state and context |
| M###-ROADMAP.md | Slice checkboxes, risk levels, and dependency graph |
| S##-PLAN.md | Task decomposition with must-haves for each task |
| T##-SUMMARY.md | Outcome narrative with YAML frontmatter metadata |
| S##-UAT.md | Human acceptance test script generated from slice outcomes |
| reports/*.html | Self-contained HTML reports with DAG, charts, and changelog |
These files are both human-readable and machine-readable. You can review STATE.md in your editor to see exactly what the agent sees. Edit DECISIONS.md to steer architectural choices mid-run.
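Because the state is plain Markdown, it can be read with a few lines of code. The sketch below assumes slices appear in a roadmap file as task-list items such as `- [x] S01: Auth`; that line format is a guess for illustration, and the real M###-ROADMAP.md layout may differ.

```typescript
interface SliceStatus { id: string; title: string; done: boolean; }

// Extract slice checkboxes from roadmap Markdown.
// Assumed line format: "- [x] S01: Title" or "- [ ] S02: Title".
function parseRoadmap(markdown: string): SliceStatus[] {
  const items: SliceStatus[] = [];
  for (const line of markdown.split(/\r?\n/)) {
    const m = /^\s*[-*] \[( |x|X)\] (S\d+):\s*(.+)$/.exec(line);
    if (m) {
      items.push({ id: m[2], title: m[3].trim(), done: m[1].toLowerCase() === "x" });
    }
  }
  return items;
}
```

The first unchecked entry in the result would be the next slice to work on, which is the kind of decision an orchestrator can make from disk without asking the LLM.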
Command Reference — What You Can Run
GSD-2 exposes its functionality through slash commands (inside agent sessions) and CLI commands (for headless/external use).
Tech Stack — What Powers It
- Runtime — TypeScript application built on the Pi SDK
- Agent Control — Direct harness access, not prompt-based orchestration
- Version Control — Git with native worktree support
- LLM Providers — Anthropic, OpenAI, Google, OpenRouter, GitHub Copilot, and 15+ others
- Per-Phase Model Selection — different models for research vs. planning vs. execution, with fallback chains
- IDE Integration — VS Code extension with chat participant and dashboard
- Distribution — npm global CLI (gsd-pi)
- Headless Deployment — CI/CD compatible, auto-restart on crash
GSD supports per-phase model selection with fallback chains. Use a cheaper model for research, a powerful one for execution, and a fast one for verification. Configure in .gsd/preferences.md.
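A fallback chain can be resolved by walking an ordered list until a usable model is found. In the sketch below the phase names, the `ModelChains` shape, and the availability callback are hypothetical; the actual format of .gsd/preferences.md is not shown in this document.

```typescript
// Hypothetical phase names and config shape.
type PhaseName = "research" | "planning" | "execution" | "verification";
type ModelChains = Record<PhaseName, string[]>;

// Pick the first model in the phase's chain that is currently usable.
function resolveModel(
  chains: ModelChains,
  phase: PhaseName,
  isAvailable: (model: string) => boolean,
): string {
  const choice = chains[phase].find(isAvailable);
  if (choice === undefined) throw new Error(`no available model for ${phase}`);
  return choice;
}
```

Ordering each chain from preferred to last resort keeps the policy readable: the cheaper or faster model is listed first for phases where that matters, and stronger models are listed first for execution.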
Why GSD-2 Is Notable — The Key Differentiators
In a landscape of AI coding tools, GSD-2 stands out for solving the problems that actually kill agent-driven projects.
- No context degradation — fresh session per task, not accumulated garbage. The agent never gets confused by prior noise.
- True automation via state machine — reads .gsd/ files from disk, not LLM self-looping. Deterministic orchestration, not hope-based chaining.
- Built-in observability — cost tracking, token metrics, progress dashboards, stuck detection, and timeout supervision. You always know what's happening.
- Crash resilience — session forensics, lock files, PID liveness detection, auto-recovery with exponential backoff. Sessions die; state survives.
- Verification guarantees — configurable shell commands with auto-fix retries. Must-haves are mechanically checked, not self-assessed.
- Clean git history — squash-merged milestones, meaningful per-task commits, git bisect-friendly. Your repo stays professional.