GSD-2 — Autonomous Coding Agent System Explainer

01

What Is GSD-2 — and Why It Exists

GSD-2 is a standalone CLI application (npm package: gsd-pi) that enables autonomous AI agents to execute multi-phase software development projects end-to-end. It's the evolution of the original GSD prompt framework.

Version 1 was a prompt framework that relied on Claude Code slash commands. Version 2 is a complete rewrite — a purpose-built agent orchestration system written in TypeScript on the Pi SDK, giving it direct harness-level control over agent sessions, context windows, and execution pipelines.

Key Distinction

v1 = prompt framework that guided an LLM. v2 = application that controls agent sessions directly. The LLM is a tool, not the orchestrator.

npm install -g gsd-pi
gsd /login   # OAuth or API key for 20+ providers

02

Work Hierarchy — Milestone → Slice → Task

GSD structures all work into a three-level hierarchy designed around context window constraints. Every unit of work has a clear definition and scope ceiling.

Milestone

A shippable version of the project. Contains 4–10 slices. This is what gets squash-merged to main and what you'd demo to stakeholders.

Slice

A demoable vertical capability. Contains 1–7 tasks. Each slice gets its own git branch with worktree isolation.

Task

A single context-window unit of work. The atomic execution unit. Each task gets a fresh context window with pre-inlined context.
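The size limits above lend themselves to a mechanical check. The following is a minimal sketch, assuming hypothetical `Milestone`, `Slice`, and `Task` shapes; the field names are illustrative and are not taken from GSD-2's actual schema.

```typescript
// Hypothetical shapes; field names are illustrative, not GSD-2's schema.
interface Task { id: string; title: string }
interface Slice { id: string; title: string; tasks: Task[] }
interface Milestone { id: string; title: string; slices: Slice[] }

// Returns human-readable violations of the stated size limits
// (4-10 slices per milestone, 1-7 tasks per slice).
function checkHierarchy(m: Milestone): string[] {
  const problems: string[] = [];
  if (m.slices.length < 4 || m.slices.length > 10) {
    problems.push(`${m.id}: expected 4-10 slices, found ${m.slices.length}`);
  }
  for (const s of m.slices) {
    if (s.tasks.length < 1 || s.tasks.length > 7) {
      problems.push(`${s.id}: expected 1-7 tasks, found ${s.tasks.length}`);
    }
  }
  return problems;
}
```

An empty result means the plan respects both limits; it says nothing about whether each task really fits in one context window, which still needs judgment.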

"A task must fit in one context window. If it can't, it's two tasks."

03

Operational Modes — Three Ways to Run

GSD-2 offers three distinct execution modes, each targeting a different workflow — from fully hands-off automation to guided step-by-step execution to CI/CD integration.

1

Autonomous Mode

/gsd auto

Full autopilot. Researches, plans, executes, verifies, commits, and repeats — without human intervention. Each phase gets a fresh context window with pre-inlined relevant context. The agent walks through: research → plan → execute (per task) → complete → reassess roadmap → next slice.

2

Step Mode

/gsd next

Same state machine, human pacing. Pauses between each unit with a wizard UI showing completed work and next steps. Ideal for developers who want to stay in the loop while the agent handles execution.

3

Headless Mode

gsd headless

No TUI, pure automation. Designed for CI pipelines, cron jobs, and scripted workflows. gsd headless query provides instant JSON snapshots (~50ms) without spawning LLM sessions.

Pro Tip

Use the two-terminal workflow: run /gsd auto in one terminal, and steer with /gsd discuss or /gsd status in another. The agent adapts in real time.
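The phase sequence shared by autonomous and step mode (research, plan, execute per task, complete, reassess, next slice) can be pictured as a small state machine. This is a sketch under stated assumptions: the `Cursor` type and `nextUnit` function are hypothetical names, and GSD-2's real state machine reads its state from disk rather than from an in-memory array.

```typescript
type Phase = "research" | "plan" | "execute" | "complete" | "reassess";

interface Cursor {
  slice: number; // index of the current slice
  phase: Phase;
  task: number;  // index of the current task (meaningful during "execute")
}

// tasksPerSlice[i] is the number of tasks planned for slice i.
// Returns the next unit, or null once the last slice has been reassessed.
function nextUnit(c: Cursor, tasksPerSlice: number[]): Cursor | null {
  switch (c.phase) {
    case "research":
      return { ...c, phase: "plan" };
    case "plan":
      return { ...c, phase: "execute", task: 0 };
    case "execute":
      return c.task + 1 < (tasksPerSlice[c.slice] ?? 0)
        ? { ...c, task: c.task + 1 }
        : { ...c, phase: "complete" };
    case "complete":
      return { ...c, phase: "reassess" };
    case "reassess":
      return c.slice + 1 < tasksPerSlice.length
        ? { slice: c.slice + 1, phase: "research", task: 0 }
        : null;
  }
}
```

In this picture, autonomous mode calls `nextUnit` in a loop until it returns null, while step mode calls it once and waits for the user.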

04

Context Engineering — Zero Wasted Tool Calls

GSD-2 pre-inlines everything the agent needs into the dispatch prompt. Task plans, slice summaries, prior outcomes, roadmap excerpts, and decision register excerpts are all injected before the LLM session starts, so the agent does not spend tool calls orienting itself.

Fresh context per task — no garbage accumulation from prior work
Pre-inlined task plans — the agent starts with full orientation
Prior outcomes — what was built, what succeeded, what failed
Decision register excerpts — architectural choices already made
Roadmap context — what's ahead, what's at risk

Why This Matters

LLM output quality tends to degrade as the context window fills up. By starting each task with a clean, purpose-built context window, GSD avoids the "confused agent" problem, where leftover noise from earlier work leads to hallucinated decisions.
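To make the pre-inlining idea concrete, here is a minimal sketch of assembling a dispatch prompt from the kinds of context listed above. The `DispatchContext` shape, the section titles, and `buildDispatchPrompt` are assumptions for illustration; GSD-2's actual prompt layout is not described in this document.

```typescript
// Hypothetical input: one field per kind of context listed above.
interface DispatchContext {
  taskPlan: string;
  priorOutcomes: string[];
  decisions: string[];
  roadmapExcerpt: string;
}

// Builds one prompt string with every section inlined up front.
function buildDispatchPrompt(ctx: DispatchContext): string {
  const section = (title: string, body: string): string =>
    `## ${title}\n${body.trim() || "(none)"}`;
  const bullets = (items: string[]): string =>
    items.map((item) => `- ${item}`).join("\n");
  return [
    section("Task plan", ctx.taskPlan),
    section("Prior outcomes", bullets(ctx.priorOutcomes)),
    section("Decisions", bullets(ctx.decisions)),
    section("Roadmap", ctx.roadmapExcerpt),
  ].join("\n\n");
}
```

Because the prompt is rebuilt from scratch for every task, nothing from an earlier session leaks in unless it was written down as an outcome or decision.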

05

Git Strategy — Bisect-Friendly by Default

GSD-2 manages git automatically with a strategy designed for clean history, safe isolation, and easy rollbacks.

Every slice gets its own git branch. By default, GSD uses git worktrees so each slice works on an isolated copy of the repo. No stashing, no conflicts between parallel slices. Configurable: worktree (default), branch, or none.

Within a slice, each task produces a commit with a message derived from the task summary. Commits are sequential and meaningful — you can read the branch history and understand exactly what happened.

When a milestone completes, all slices are squash-merged to main (or the source branch) as a single milestone commit. This keeps main clean and makes git bisect efficient. Each milestone is a single revertable unit.

Because milestones squash-merge, you can revert an entire milestone with a single git revert. No cherry-picking across dozens of commits. The granular per-task history is preserved on slice branches if you need it.
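The git operations described above map onto a handful of standard git subcommands. The sketch below only builds argument lists for the git CLI. The branch prefix `gsd/` and the `../worktrees/` path are hypothetical, and how GSD-2 actually sequences these commands is not specified here.

```typescript
// Each function returns an argv array for the git CLI.
// Branch and path names are hypothetical; GSD-2's naming may differ.

// `git worktree add -b <branch> <path>` creates a new branch checked
// out in its own working directory.
function sliceWorktreeArgs(sliceId: string): string[] {
  return ["worktree", "add", "-b", `gsd/${sliceId}`, `../worktrees/${sliceId}`];
}

// One commit per task, with the message derived from the task summary.
function taskCommitArgs(summary: string): string[] {
  return ["commit", "-m", summary];
}

// `git merge --squash` stages the branch's combined changes without
// committing; a following `git commit` records them as one commit.
function squashMergeArgs(sliceId: string): string[] {
  return ["merge", "--squash", `gsd/${sliceId}`];
}

// Reverting the milestone commit undoes the whole milestone at once.
function revertMilestoneArgs(commitSha: string): string[] {
  return ["revert", "--no-edit", commitSha];
}
```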

06

Verification System — Trust but Check

Every task goes through a multi-layer verification pipeline before it's considered complete. This is how GSD ensures agents actually build what they claim.

Must-Haves

Each task plan includes "must-haves" — mechanically checkable outcomes. These aren't vague descriptions; they're observable truths: implementation artifacts exist, key imports are wired, specific behaviors are present.

Verification Pipeline

Static Checks

Must-have verification against the file system and code artifacts.

Shell Commands

Configurable lint, test, and build commands. Auto-fix retries on failure.

Behavioral Testing

Does the feature actually work? Validated against slice acceptance criteria.

Human Review

Only when necessary. GSD generates UAT scripts from slice outcomes so you know exactly what to test.
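The layered pipeline can be sketched as an ordered list of checks, each with an optional retry budget. This is an illustration under assumptions: `Check`, `runPipeline`, and the `fix` hook are hypothetical names, and the checks are plain synchronous functions, whereas real lint, test, and build steps would run shell commands.

```typescript
interface CheckResult { name: string; ok: boolean; attempts: number }

// `run` returns true on success. `fix` is a hypothetical auto-fix
// hook that runs between failed attempts.
interface Check {
  name: string;
  run: () => boolean;
  fix?: () => void;
  maxAttempts?: number; // defaults to 1, meaning no retry
}

// Runs checks in order and stops at the first one that still fails
// after its retries, so later layers never run on a broken task.
function runPipeline(checks: Check[]): CheckResult[] {
  const results: CheckResult[] = [];
  for (const check of checks) {
    const max = check.maxAttempts ?? 1;
    let attempts = 0;
    let ok = false;
    while (attempts < max && !ok) {
      attempts += 1;
      ok = check.run();
      if (!ok && attempts < max) check.fix?.();
    }
    results.push({ name: check.name, ok, attempts });
    if (!ok) break;
  }
  return results;
}
```

A task counts as verified only if every entry in the result has `ok` set to true and the result covers every configured check.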

07

Crash Recovery — Sessions Die, State Survives

Agent sessions crash. Network connections drop. Machines restart. GSD-2 is built for all of it.

Lock files track the current unit of work and survive session death
Session forensics — reads surviving session files and synthesizes a recovery briefing from all completed tool calls
Parallel state persistence — orchestrator state persisted with PID liveness detection
Headless auto-restart — on crash, auto-restarts with exponential backoff (default 3 attempts)
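As a worked example of exponential backoff with the default of 3 attempts, the sketch below doubles the wait before each restart. The 1000 ms base delay is an assumption; this document does not state the base GSD-2 uses.

```typescript
// Delay before restart number `attempt` (1-based): base, 2*base, 4*base, ...
// The 1000 ms default base is an assumption of this sketch.
function backoffDelayMs(attempt: number, baseMs = 1000): number {
  return baseMs * 2 ** (attempt - 1);
}

// Full delay schedule for a run that gives up after `maxAttempts` restarts.
function restartSchedule(maxAttempts = 3, baseMs = 1000): number[] {
  const delays: number[] = [];
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    delays.push(backoffDelayMs(attempt, baseMs));
  }
  return delays;
}
```

With these defaults, `restartSchedule()` yields waits of 1000, 2000, and 4000 ms before the run is abandoned.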

How Recovery Works

When GSD detects a crashed session, it reads the lock file and surviving artifacts. It builds a recovery context (what was completed, what was in progress, what remains) and injects that into the next agent session. The agent resumes from the interrupted unit instead of starting over.
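The three-part recovery context described above can be sketched as a small summarizer. The `UnitRecord` shape and the briefing wording are hypothetical; the real briefing is synthesized from session files whose format is not covered here.

```typescript
// Hypothetical record of what the lock file and surviving artifacts reveal.
interface UnitRecord {
  id: string;
  status: "done" | "in-progress" | "pending";
}

// Summarizes completed, interrupted, and remaining work as plain text
// that could be injected into the next session's prompt.
function buildRecoveryBriefing(units: UnitRecord[]): string {
  const ids = (status: UnitRecord["status"]): string => {
    const hits = units.filter((u) => u.status === status).map((u) => u.id);
    return hits.length > 0 ? hits.join(", ") : "none";
  };
  return [
    `Completed: ${ids("done")}`,
    `Interrupted: ${ids("in-progress")}`,
    `Remaining: ${ids("pending")}`,
  ].join("\n");
}
```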

08

Cost & Token Tracking — Know What You're Spending

Autonomous agents can burn through API budgets fast. GSD-2 includes a full cost management system built directly into the execution pipeline.

Every task, slice, and milestone tracks tokens used and cost incurred, broken down by phase and model. You can see exactly where your budget went — research vs. planning vs. execution vs. verification.

Open the cost dashboard with Ctrl+Alt+G or /gsd status. It shows live spend, token consumption, and progress metrics across the current milestone.

Set a USD budget ceiling in preferences. GSD pauses auto mode before overspending — it won't silently blow past your limit. You get a prompt to continue, adjust, or stop.

Based on completed work, GSD projects remaining cost for the milestone. Useful for estimating total project cost before committing to the full run.
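The budget ceiling and cost projection can be illustrated with simple arithmetic. This sketch assumes that remaining tasks cost the same on average as completed ones; that is one plausible projection method, not necessarily the one GSD-2 uses, and the `Spend` shape is hypothetical.

```typescript
interface Spend {
  completedTasks: number;
  totalTasks: number;
  spentUsd: number;
}

// Projects total milestone cost from the average cost per completed
// task. Returns null when there is nothing to extrapolate from.
function projectTotalUsd(s: Spend): number | null {
  if (s.completedTasks === 0) return null;
  return (s.spentUsd / s.completedTasks) * s.totalTasks;
}

// True when one more task at the average cost so far would push
// spend past the ceiling, so auto mode should pause and ask.
function shouldPause(s: Spend, ceilingUsd: number): boolean {
  if (s.completedTasks === 0) return s.spentUsd >= ceilingUsd;
  const averageUsd = s.spentUsd / s.completedTasks;
  return s.spentUsd + averageUsd > ceilingUsd;
}
```

For example, 3 of 12 tasks done at $1.50 spent gives an average of $0.50 per task and a projected total of $6.00.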

09

Artifact Management — The .gsd/ Directory

All project state lives in a .gsd/ directory at the project root. These are human-readable Markdown files that double as the agent's source of truth.

File	Purpose
STATE.md	Quick-glance dashboard — read this first for current status
DECISIONS.md	Append-only architectural decision register
PROJECT.md	Living doc of current project state and context
M###-ROADMAP.md	Slice checkboxes, risk levels, and dependency graph
S##-PLAN.md	Task decomposition with must-haves for each task
T##-SUMMARY.md	Outcome narrative with YAML frontmatter metadata
S##-UAT.md	Human acceptance test script generated from slice outcomes
reports/*.html	Self-contained HTML reports with DAG, charts, and changelog

Pro Tip

These files are both human-readable and machine-readable. You can review STATE.md in your editor to see exactly what the agent sees. Edit DECISIONS.md to steer architectural choices mid-run.
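The file-name patterns in the table can be generated mechanically. The sketch below assumes each `#` stands for one zero-padded digit (so `M###` becomes `M001`); that reading of the placeholders is an assumption, and the helper names are hypothetical. Only base names are built, since the layout inside .gsd/ is not described here.

```typescript
// Left-pads a non-negative integer with zeros to a fixed width.
function pad(n: number, width: number): string {
  let s = String(n);
  while (s.length < width) s = "0" + s;
  return s;
}

// Widths follow the placeholders in the table: M###, S##, T##.
const roadmapFile = (milestone: number): string => `M${pad(milestone, 3)}-ROADMAP.md`;
const planFile = (slice: number): string => `S${pad(slice, 2)}-PLAN.md`;
const summaryFile = (task: number): string => `T${pad(task, 2)}-SUMMARY.md`;
const uatFile = (slice: number): string => `S${pad(slice, 2)}-UAT.md`;
```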

10

Command Reference — What You Can Run

GSD-2 exposes its functionality through slash commands (inside agent sessions) and CLI commands (for headless/external use).

/gsd

Step mode — guided, pausing between each unit

/gsd auto

Autonomous execution — full autopilot

/gsd next

Execute the next unit in step mode

/gsd discuss

Architectural discussion (works alongside auto)

/gsd status

Progress dashboard with cost and token metrics

/gsd queue

Queue future milestones for execution

/gsd migrate

Convert v1 .planning/ to v2 .gsd/ format

/gsd export --html

Generate self-contained HTML report

/gsd doctor

Runtime health checks with auto-fix

gsd headless

No-TUI automation for CI/CD pipelines

gsd headless query

Instant JSON snapshot (~50ms, no LLM)

11

Tech Stack — What Powers It

Runtime — TypeScript application built on the Pi SDK
Agent Control — Direct harness access, not prompt-based orchestration
Version Control — Git with native worktree support
LLM Providers — Anthropic, OpenAI, Google, OpenRouter, GitHub Copilot, and 15+ others
Per-Phase Model Selection — different models for research vs. planning vs. execution, with fallback chains
IDE Integration — VS Code extension with chat participant and dashboard
Distribution — npm global CLI (gsd-pi)
Headless Deployment — CI/CD compatible, auto-restart on crash

Model Flexibility

GSD supports per-phase model selection with fallback chains. Use a cheaper model for research, a powerful one for execution, and a fast one for verification. Configure in .gsd/preferences.md.
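Per-phase selection with a fallback chain amounts to an ordered lookup. The sketch below is illustrative only: the `ModelChains` type, the `pickModel` function, and the availability callback are hypothetical, and the actual syntax of .gsd/preferences.md is not shown in this document.

```typescript
type Phase = "research" | "planning" | "execution" | "verification";

// Ordered preference list per phase; the first entry is the primary model.
type ModelChains = Record<Phase, string[]>;

// Returns the first model in the phase's chain that is currently
// available, or undefined when the whole chain is exhausted.
function pickModel(
  chains: ModelChains,
  phase: Phase,
  isAvailable: (model: string) => boolean,
): string | undefined {
  for (const model of chains[phase]) {
    if (isAvailable(model)) return model;
  }
  return undefined;
}
```

The model identifiers passed in are whatever the configured providers accept; the sketch treats them as opaque strings.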

12

Why GSD-2 Is Notable — The Key Differentiators

In a landscape of AI coding tools, GSD-2 stands out for solving the problems that actually kill agent-driven projects.

No context degradation — fresh session per task, not accumulated garbage. The agent never gets confused by prior noise.
True automation via state machine — reads .gsd/ files from disk, not LLM self-looping. Deterministic orchestration, not hope-based chaining.
Built-in observability — cost tracking, token metrics, progress dashboards, stuck detection, and timeout supervision. You always know what's happening.
Crash resilience — session forensics, lock files, PID liveness detection, auto-recovery with exponential backoff. Sessions die; state survives.
Verification guarantees — configurable shell commands with auto-fix retries. Must-haves are mechanically checked, not self-assessed.
Clean git history — squash-merged milestones, meaningful per-task commits, git bisect-friendly. Your repo stays professional.

"The agent is a tool, not the orchestrator. The state machine reads disk. The LLM executes tasks."

GSD-2: Get Shit Done
Autonomous Coding Agents