Open Source CLI Tool

GSD-2: Get Shit Done
Autonomous Coding Agents

A purpose-built TypeScript CLI that orchestrates AI agents to plan, execute, verify, and ship multi-phase software projects — without human intervention.

20+ LLM Providers
3 Execution Modes
50ms Headless Query
v2 Major Rewrite
01

What Is GSD-2 — and Why It Exists

GSD-2 is a standalone CLI application (npm package: gsd-pi) that enables autonomous AI agents to execute multi-phase software development projects end-to-end. It's the evolution of the original GSD prompt framework.

Version 1 was a prompt framework that relied on Claude Code slash commands. Version 2 is a complete rewrite — a purpose-built agent orchestration system written in TypeScript on the Pi SDK, giving it direct harness-level control over agent sessions, context windows, and execution pipelines.

Key Distinction

v1 = prompt framework that guided an LLM. v2 = application that controls agent sessions directly. The LLM is a tool, not the orchestrator.

npm install -g gsd-pi
gsd /login   # OAuth or API key for 20+ providers
02

Work Hierarchy — Milestone → Slice → Task

GSD structures all work into a three-level hierarchy designed around context window constraints. Every unit of work has a clear definition and scope ceiling.

Milestone

A shippable version of the project. Contains 4–10 slices. This is what gets squash-merged to main and what you'd demo to stakeholders.

Slice

A demoable vertical capability. Contains 1–7 tasks. Each slice gets its own git branch with worktree isolation.

Task

A single context-window unit of work. The atomic execution unit. Each task gets a fresh context window with pre-inlined context.
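The hierarchy and its size limits can be written down as plain types. This is a minimal sketch, not GSD-2's actual data model: the interface names, the ID formats, and the `validateMilestone` helper are all assumptions made for illustration. Only the 4 to 10 and 1 to 7 limits come from the descriptions above.

```typescript
// Hypothetical types; the real GSD-2 data model may differ.
interface Task {
  id: string; // illustrative, e.g. "T01"
  title: string;
}

interface Slice {
  id: string; // illustrative, e.g. "S01"
  title: string;
  tasks: Task[]; // 1 to 7 tasks per slice
}

interface Milestone {
  id: string; // illustrative, e.g. "M001"
  title: string;
  slices: Slice[]; // 4 to 10 slices per milestone
}

// Returns human-readable violations of the size limits described above.
function validateMilestone(m: Milestone): string[] {
  const problems: string[] = [];
  if (m.slices.length < 4 || m.slices.length > 10) {
    problems.push(`${m.id}: expected 4-10 slices, found ${m.slices.length}`);
  }
  for (const s of m.slices) {
    if (s.tasks.length < 1 || s.tasks.length > 7) {
      problems.push(`${s.id}: expected 1-7 tasks, found ${s.tasks.length}`);
    }
  }
  return problems;
}
```

An empty result means the plan respects both ceilings. Whether a single task fits in one context window still has to be judged separately.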

"A task must fit in one context window. If it can't, it's two tasks."
03

Operational Modes — Three Ways to Run

GSD-2 offers three distinct execution modes, each targeting a different workflow — from fully hands-off automation to guided step-by-step execution to CI/CD integration.

1

Autonomous Mode

/gsd auto

Full autopilot. Researches, plans, executes, verifies, commits, and repeats — without human intervention. Each phase gets a fresh context window with pre-inlined relevant context. The agent walks through: research → plan → execute (per task) → complete → reassess roadmap → next slice.

2

Step Mode

/gsd next

Same state machine, human pacing. Pauses between each unit with a wizard UI showing completed work and next steps. Ideal for developers who want to stay in the loop while the agent handles execution.

3

Headless Mode

gsd headless

No TUI, pure automation. Designed for CI pipelines, cron jobs, and scripted workflows. gsd headless query provides instant JSON snapshots (~50ms) without spawning LLM sessions.
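The phase loop described for autonomous mode, and reused with human pacing in step mode, can be sketched as a small state machine. This is an illustrative model only: the `Phase` names mirror the sequence listed above, while `SliceProgress`, `Next`, and `advance` are hypothetical names rather than GSD-2 internals.

```typescript
// Phases taken from the sequence above; all type and function names are hypothetical.
type Phase = "research" | "plan" | "execute" | "complete" | "reassess";

interface SliceProgress {
  phase: Phase;
  taskIndex: number; // index of the task currently being executed
  taskCount: number; // total tasks in the current slice
}

type Next =
  | { kind: "continue"; state: SliceProgress }
  | { kind: "next-slice" };

// Pure transition function: given the current state, return the next one.
function advance(s: SliceProgress): Next {
  switch (s.phase) {
    case "research":
      return { kind: "continue", state: { ...s, phase: "plan" } };
    case "plan":
      return { kind: "continue", state: { ...s, phase: "execute", taskIndex: 0 } };
    case "execute":
      // One execute step per task, then the slice is completed.
      return s.taskIndex + 1 < s.taskCount
        ? { kind: "continue", state: { ...s, taskIndex: s.taskIndex + 1 } }
        : { kind: "continue", state: { ...s, phase: "complete" } };
    case "complete":
      return { kind: "continue", state: { ...s, phase: "reassess" } };
    case "reassess":
      return { kind: "next-slice" };
  }
}
```

In this model, auto mode would call `advance` in a loop, while step mode would call it once per `/gsd next` and pause in between.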

Pro Tip

Use the two-terminal workflow: run /gsd auto in one terminal, and steer with /gsd discuss or /gsd status in another. The agent adapts in real time.

04

Context Engineering — Zero Wasted Tool Calls

GSD-2 pre-inlines everything the agent needs into the dispatch prompt. Task plans, slice summaries, prior outcomes, roadmap excerpts, and decision register excerpts are all injected before the LLM session starts, so the agent does not spend tool calls rediscovering project state.

  • Fresh context per task — no garbage accumulation from prior work
  • Pre-inlined task plans — the agent starts with full orientation
  • Prior outcomes — what was built, what succeeded, what failed
  • Decision register excerpts — architectural choices already made
  • Roadmap context — what's ahead, what's at risk
Why This Matters

LLM output quality tends to degrade as the context window fills up. By starting each task with a clean, purpose-built context window, GSD avoids the "confused agent" problem, where leftover noise from earlier work leads to hallucinated decisions.
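Pre-inlining amounts to assembling one prompt from several sources before the session starts. The sketch below shows the idea under stated assumptions: `TaskContext`, its field names, and the section titles are hypothetical, and the real dispatch prompt format is not documented here.

```typescript
// Hypothetical shape of the context gathered before a task is dispatched.
interface TaskContext {
  taskPlan: string;
  sliceSummary: string;
  priorOutcomes: string[];
  decisions: string[];
  roadmapExcerpt: string;
}

// Builds a single prompt string so the agent starts fully oriented.
function buildDispatchPrompt(ctx: TaskContext): string {
  const bullets = (items: string[]): string => items.map((i) => `- ${i}`).join("\n");
  const sections: Array<[string, string]> = [
    ["Task plan", ctx.taskPlan],
    ["Slice summary", ctx.sliceSummary],
    ["Prior outcomes", bullets(ctx.priorOutcomes)],
    ["Decisions", bullets(ctx.decisions)],
    ["Roadmap", ctx.roadmapExcerpt],
  ];
  return sections
    .filter(([, body]) => body.trim().length > 0) // skip sections with nothing to say
    .map(([title, body]) => `## ${title}\n${body.trim()}`)
    .join("\n\n");
}
```

Because the function takes only data and returns a string, each task gets a prompt built from disk state rather than from whatever the previous session happened to remember.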

05

Git Strategy — Bisect-Friendly by Default

GSD-2 manages git automatically with a strategy designed for clean history, safe isolation, and easy rollbacks.

Every slice gets its own git branch. By default, GSD uses git worktrees so each slice works in an isolated working directory, with no stashing and no conflicts between parallel slices. The isolation mode is configurable: worktree (default), branch, or none.
Within a slice, each task produces a commit with a message derived from the task summary. Commits are sequential and meaningful — you can read the branch history and understand exactly what happened.
When a milestone completes, all slices are squash-merged to main (or the source branch) as a single milestone commit. This keeps main clean and makes git bisect efficient. Each milestone is a single revertable unit.
Because milestones squash-merge, you can revert an entire milestone with a single git revert. No cherry-picking across dozens of commits. The granular per-task history is preserved on slice branches if you need it.
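The git flow above maps onto a handful of standard git commands. The sketch below only builds the argument lists and does not run them. The branch naming, the worktree path, and the idea of a single integration branch per milestone are assumptions for illustration, not GSD-2's documented behavior.

```typescript
// Each entry is the argument list for one `git` invocation.
type GitArgs = string[];

// Start a slice on its own branch in an isolated worktree.
function startSlice(sliceId: string, baseBranch: string): GitArgs[] {
  const branch = `gsd/${sliceId}`; // hypothetical branch naming
  const dir = `../worktrees/${sliceId}`; // hypothetical worktree location
  return [["worktree", "add", "-b", branch, dir, baseBranch]];
}

// One commit per task, with the message taken from the task summary.
function commitTask(taskSummary: string): GitArgs[] {
  return [["add", "--all"], ["commit", "-m", taskSummary]];
}

// Squash a milestone's work into a single commit on the target branch.
// Assumes slice work was collected on one integration branch first.
function shipMilestone(milestoneId: string, milestoneBranch: string, target = "main"): GitArgs[] {
  return [
    ["checkout", target],
    ["merge", "--squash", milestoneBranch],
    ["commit", "-m", `${milestoneId}: squash-merge milestone`],
  ];
}

// Roll back a whole milestone by reverting its single squash commit.
function revertMilestone(commitSha: string): GitArgs[] {
  return [["revert", "--no-edit", commitSha]];
}
```

Because `git merge --squash` stages the changes without committing, the explicit `commit` step is what produces the single revertable milestone commit.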
06

Verification System — Trust but Check

Every task goes through a multi-layer verification pipeline before it's considered complete. This is how GSD ensures agents actually build what they claim.

Must-Haves

Each task plan includes "must-haves" — mechanically checkable outcomes. These aren't vague descriptions; they're observable truths: implementation artifacts exist, key imports are wired, specific behaviors are present.

Verification Pipeline

Static Checks

Must-have verification against the file system and code artifacts.

Shell Commands

Configurable lint, test, and build commands. Auto-fix retries on failure.

Behavioral Testing

Does the feature actually work? Validated against slice acceptance criteria.

Human Review

Only when necessary. GSD generates UAT scripts from slice outcomes so you know exactly what to test.
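The shell-command stage with auto-fix retries can be sketched as a loop around two injected callbacks. Everything named here is hypothetical: `Run` stands in for executing a configured command, `Fix` stands in for asking the agent to repair a failure, and the default of two retries is an assumption rather than a documented GSD-2 setting.

```typescript
interface CheckResult {
  ok: boolean;
  output: string;
}

// Injected dependencies, so the sketch stays runnable without a shell or an agent.
type Run = (command: string) => CheckResult;
type Fix = (command: string, output: string) => void;

interface VerifyOutcome {
  passed: boolean;
  failedCommand?: string;
}

// Runs each command in order; on failure, requests a fix and re-runs,
// up to maxRetries times per command.
function verifyWithRetries(commands: string[], run: Run, fix: Fix, maxRetries = 2): VerifyOutcome {
  for (const cmd of commands) {
    let result = run(cmd);
    for (let attempt = 0; !result.ok && attempt < maxRetries; attempt++) {
      fix(cmd, result.output);
      result = run(cmd);
    }
    if (!result.ok) return { passed: false, failedCommand: cmd };
  }
  return { passed: true };
}
```

The pipeline stops at the first command that still fails after its retries, which is the point where a stage like human review could take over.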

07

Crash Recovery — Sessions Die, State Survives

Agent sessions crash. Network connections drop. Machines restart. GSD-2 is built for all of it.

  • Lock files track the current unit of work and survive session death
  • Session forensics — reads surviving session files and synthesizes a recovery briefing from all completed tool calls
  • Parallel state persistence — orchestrator state persisted with PID liveness detection
  • Headless auto-restart — on crash, auto-restarts with exponential backoff (default 3 attempts)
How Recovery Works

When GSD detects a crashed session, it reads the lock file and surviving artifacts. It builds a recovery context (what was completed, what was in progress, what remains) and injects that into the next agent session. The new session resumes from the point where the crashed one stopped.
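Two pieces of this flow are easy to sketch: the restart schedule and the recovery briefing. The code below is illustrative only. The 3-attempt default comes from the list above, but the base delay, the growth factor, and the `LockFile` and `ToolCall` shapes are assumptions.

```typescript
// Delay before each restart attempt: baseMs, baseMs * factor, baseMs * factor^2, ...
// Only the default of 3 attempts is from the text; baseMs and factor are assumed.
function backoffDelays(maxAttempts = 3, baseMs = 1000, factor = 2): number[] {
  return Array.from({ length: maxAttempts }, (_, i) => baseMs * factor ** i);
}

// Hypothetical shapes for what survives a crashed session.
interface LockFile {
  unitId: string; // the unit of work that was in progress
  pid: number; // process that held the lock
}

interface ToolCall {
  tool: string;
  completed: boolean;
}

// Summarizes what finished and what was interrupted for the next session.
function recoveryBriefing(lock: LockFile, calls: ToolCall[]): string {
  const done = calls.filter((c) => c.completed).map((c) => c.tool);
  const interrupted = calls.filter((c) => !c.completed).map((c) => c.tool);
  return [
    `Resuming ${lock.unitId} after a crashed session (pid ${lock.pid}).`,
    `Completed tool calls: ${done.join(", ") || "none"}`,
    `Interrupted tool calls: ${interrupted.join(", ") || "none"}`,
  ].join("\n");
}
```

With the assumed defaults, `backoffDelays()` yields waits of 1, 2, and 4 seconds. In Node.js, `process.kill(pid, 0)` is a common way to test whether the PID recorded in a lock file is still alive, though whether GSD-2 uses it is not stated here.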

08

Cost & Token Tracking — Know What You're Spending

Autonomous agents can burn through API budgets fast. GSD-2 includes a full cost management system built directly into the execution pipeline.

Every task, slice, and milestone tracks tokens used and cost incurred, broken down by phase and model. You can see exactly where your budget went — research vs. planning vs. execution vs. verification.
Access the cost dashboard with Ctrl+Alt+G or /gsd status. It shows live spend, token consumption, and progress metrics across the current milestone.
Set a USD budget ceiling in preferences. GSD pauses auto mode before overspending rather than silently blowing past your limit, and prompts you to continue, adjust, or stop.
Based on completed work, GSD projects the remaining cost for the milestone, which helps you estimate total project cost before committing to the full run.
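The arithmetic behind per-phase breakdowns, the budget ceiling, and cost projection is simple enough to sketch. The record shape and function names below are hypothetical, and the projection uses a plain average cost per completed unit, which is an assumption rather than GSD-2's documented estimator.

```typescript
// Hypothetical cost record for one completed unit of work.
interface UnitCost {
  phase: string; // e.g. "research", "execution"
  model: string;
  tokens: number;
  usd: number;
}

// Total spend grouped by phase.
function spendByPhase(units: UnitCost[]): Record<string, number> {
  const totals: Record<string, number> = {};
  for (const u of units) {
    totals[u.phase] = (totals[u.phase] ?? 0) + u.usd;
  }
  return totals;
}

// Naive projection: average cost so far times the number of units left.
function projectRemainingUsd(units: UnitCost[], remainingUnits: number): number {
  if (units.length === 0) return 0;
  const spent = units.reduce((sum, u) => sum + u.usd, 0);
  return (spent / units.length) * remainingUnits;
}

// Pause before dispatching a unit that would push spend past the ceiling.
function shouldPause(spentUsd: number, nextUnitEstimateUsd: number, ceilingUsd: number): boolean {
  return spentUsd + nextUnitEstimateUsd > ceilingUsd;
}
```

Checking the ceiling before dispatch, not after, is what lets a run stop short of the limit instead of discovering the overspend once the money is gone.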
09

Artifact Management — The .gsd/ Directory

All project state lives in a .gsd/ directory at the project root. These are human-readable Markdown files that double as the agent's source of truth.

File              Purpose
STATE.md          Quick-glance dashboard; read this first for current status
DECISIONS.md      Append-only architectural decision register
PROJECT.md        Living doc of current project state and context
M###-ROADMAP.md   Slice checkboxes, risk levels, and dependency graph
S##-PLAN.md       Task decomposition with must-haves for each task
T##-SUMMARY.md    Outcome narrative with YAML frontmatter metadata
S##-UAT.md        Human acceptance test script generated from slice outcomes
reports/*.html    Self-contained HTML reports with DAG, charts, and changelog
Pro Tip

These files are both human-readable and machine-readable. You can review STATE.md in your editor to see exactly what the agent sees. Edit DECISIONS.md to steer architectural choices mid-run.
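Because the state files are plain Markdown, a few lines of code can read them. The sketch below extracts slice checkboxes from a roadmap file. The exact line format (`- [x] S01: Title`) is an assumption for illustration. The real M###-ROADMAP.md layout may differ.

```typescript
interface SliceStatus {
  id: string;
  title: string;
  done: boolean;
}

// Parses lines such as "- [x] S01: Auth" or "- [ ] S02: Billing".
// The checkbox line format is assumed, not taken from GSD-2's documentation.
function parseRoadmap(markdown: string): SliceStatus[] {
  const pattern = /^\s*[-*] \[( |x|X)\] (S\d+)[:\s]+(.*)$/;
  const slices: SliceStatus[] = [];
  for (const line of markdown.split(/\r?\n/)) {
    const match = pattern.exec(line);
    if (!match) continue; // ignore headings, prose, and blank lines
    const [, mark = " ", id = "", title = ""] = match;
    slices.push({ id, title: title.trim(), done: mark.toLowerCase() === "x" });
  }
  return slices;
}
```

A parser like this is all a deterministic orchestrator needs to decide which slice comes next, without asking an LLM to remember progress.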

10

Command Reference — What You Can Run

GSD-2 exposes its functionality through slash commands (inside agent sessions) and CLI commands (for headless/external use).

/gsd                 Step mode: guided, pausing between each unit
/gsd auto            Autonomous execution: full autopilot
/gsd next            Execute the next unit in step mode
/gsd discuss         Architectural discussion (works alongside auto)
/gsd status          Progress dashboard with cost and token metrics
/gsd queue           Queue future milestones for execution
/gsd migrate         Convert v1 .planning/ to v2 .gsd/ format
/gsd export --html   Generate self-contained HTML report
/gsd doctor          Runtime health checks with auto-fix
gsd headless         No-TUI automation for CI/CD pipelines
gsd headless query   Instant JSON snapshot (~50ms, no LLM)
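A script or CI job can consume the `gsd headless query` snapshot by running the command and parsing its output. The sketch below injects the process runner so it works without GSD installed. The `Exec` type and `queryHeadless` name are hypothetical, and since the snapshot's fields are not documented here, the result is typed as an opaque JSON object.

```typescript
// Injected runner. In a real script this could wrap Node's
// child_process.execFileSync("gsd", ["headless", "query"], { encoding: "utf8" }).
type Exec = (file: string, args: string[]) => string;

// Runs `gsd headless query` and returns the parsed snapshot.
// Assumes the command prints a single JSON object to stdout.
function queryHeadless(exec: Exec): Record<string, unknown> {
  const stdout = exec("gsd", ["headless", "query"]);
  const parsed: unknown = JSON.parse(stdout);
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    throw new Error("expected a JSON object from `gsd headless query`");
  }
  return parsed as Record<string, unknown>;
}
```

Keeping the result as `Record<string, unknown>` forces callers to check each field before using it, which is a reasonable default when the schema is not pinned down.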
11

Tech Stack — What Powers It

  • Runtime — TypeScript application built on the Pi SDK
  • Agent Control — Direct harness access, not prompt-based orchestration
  • Version Control — Git with native worktree support
  • LLM Providers — Anthropic, OpenAI, Google, OpenRouter, GitHub Copilot, and 15+ others
  • Per-Phase Model Selection — different models for research vs. planning vs. execution, with fallback chains
  • IDE Integration — VS Code extension with chat participant and dashboard
  • Distribution — npm global CLI (gsd-pi)
  • Headless Deployment — CI/CD compatible, auto-restart on crash
Model Flexibility

GSD supports per-phase model selection with fallback chains. Use a cheaper model for research, a powerful one for execution, and a fast one for verification. Configure in .gsd/preferences.md.
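Per-phase selection with fallback chains boils down to picking the first usable model from an ordered list. The sketch below shows that logic with placeholder model names. The phase names, the `ModelPreferences` shape, and the `isAvailable` callback are assumptions, not the actual schema of .gsd/preferences.md.

```typescript
type ModelPhase = "research" | "planning" | "execution" | "verification";

// Ordered fallback chain per phase: first entry is preferred.
type ModelPreferences = Record<ModelPhase, string[]>;

// Returns the first model in the phase's chain that is currently usable.
function pickModel(
  prefs: ModelPreferences,
  phase: ModelPhase,
  isAvailable: (model: string) => boolean,
): string {
  const chain = prefs[phase];
  const choice = chain.find(isAvailable);
  if (choice === undefined) {
    throw new Error(`no available model for ${phase}; tried: ${chain.join(", ")}`);
  }
  return choice;
}

// Placeholder model names for illustration only.
const examplePrefs: ModelPreferences = {
  research: ["cheap-model", "fast-model"],
  planning: ["strong-model", "cheap-model"],
  execution: ["strong-model", "fast-model"],
  verification: ["fast-model", "cheap-model"],
};
```

If `strong-model` were rate-limited, `pickModel(examplePrefs, "execution", m => m !== "strong-model")` would fall through to `fast-model` instead of stalling the run.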

12

Why GSD-2 Is Notable — The Key Differentiators

In a landscape of AI coding tools, GSD-2 stands out for solving the problems that actually kill agent-driven projects.

  • No context degradation: a fresh session per task instead of accumulated garbage, so noise from earlier work does not carry into the next task.
  • True automation via state machine: the orchestrator reads .gsd/ files from disk rather than relying on LLM self-looping. Deterministic orchestration, not hope-based chaining.
  • Built-in observability: cost tracking, token metrics, progress dashboards, stuck detection, and timeout supervision show what the agent is doing at any point.
  • Crash resilience: session forensics, lock files, PID liveness detection, and auto-recovery with exponential backoff. Sessions die; state survives.
  • Verification guarantees: configurable shell commands with auto-fix retries. Must-haves are mechanically checked, not self-assessed.
  • Clean git history: squash-merged milestones, meaningful per-task commits, and a main branch that stays git bisect-friendly.
"The agent is a tool, not the orchestrator. The state machine reads disk. The LLM executes tasks."