Plugin

pilot-shell

by maxritter

Claude Code plugin for spec-driven development — enforces quality gates, persistent knowledge, and TDD for production-grade AI

Pilot Shell is a Claude Code plugin that elevates AI agent development to production standards. It provides a comprehensive framework for spec-driven development, enforcing quality gates like linting, formatting, and type checking, while maintaining persistent knowledge across sessions. The tool integrates with Claude Code's native features, offering a local web dashboard for session management and real-time notifications.

View on GitHub ↗

Key features

Spec-driven development with TDD for features and bug fixes
Enforces quality gates: linting, formatting, type checking, tests
Persistent context engineering for cross-session knowledge
Semantic code search and knowledge graph via Probe and CodeGraph
Local web dashboard for session management and notifications

Languages

TypeScript60%Python30%JavaScript8%Shell1%HTML0%

Top contributors

maxritter845

semantic-release-bot341

mavogel22

claude11

+74 stars since added

Started at 1.7k ★ when added to RepoDepot.

Topics

ai-agentsai-assistantai-codingai-coding-toolsai-engineeringai-toolsanthropicanthropic-claudeclaudeclaude-aiclaude-codeclaude-contextclaude-skillsclaudecodemodel-context-protocolspec-driven-development

README

View on GitHub ↗

Make Claude Code production-ready.

From requirement to production-grade code — planned, tested, verified.
Spec-driven plans. Enforced quality gates. Persistent knowledge.

Install • Features • Console • Docs • Website • Changelog

curl -fsSL https://raw.githubusercontent.com/maxritter/pilot-shell/main/install.sh | bash

macOS · Linux · Windows (WSL2) — installs in under 2 minutes.

Why Pilot Shell

Claude Code writes code fast — but without structure, it skips tests, loses context, and produces inconsistent results. Other frameworks add complexity (dozens of agents, thousands of lines of config) without meaningfully better output.

Pilot Shell is different. Every component solves a real problem:

/spec — plans, implements, and verifies features end-to-end with TDD
/prd — turns vague ideas into clear requirements with optional deep research
Quality hooks — enforce linting, formatting, type checking, and tests as gates (not suggestions)
Context engineering — preserves decisions and knowledge across sessions
Code intelligence — semantic search (Probe) + code knowledge graph (CodeGraph)
Token optimization — 60–90% cost reduction via RTK and context-mode
Extensions — reusable rules, skills, and MCP servers with team sharing and customization
Console — local web dashboard with real-time notifications and session management
Pilot Bot — persistent automation agent with scheduled tasks and background jobs

Run pilot for Spec-Driven Development with /spec, or pilot bot for 24/7 automations.

Getting Started

Prerequisites

Claude Code: Install Claude Code using the native installer before setting up Pilot Shell. If you have the npm or brew version installed, uninstall it first. If no Claude Code installation is detected, the Pilot installer will attempt to set it up for you.

Claude Subscription: Solo developers should choose Max 5x for moderate usage or Max 20x for heavy usage. Teams should use Team Premium (6.25x usage per member, SSO, admin tools, billing management). Companies with stricter compliance or procurement requirements should use Enterprise (API based pricing applies per usage).

Terminal (Recommended): cmux works great with Pilot Shell — its vertical tab layout lets you run multiple sessions side by side. Any modern terminal works: Ghostty, iTerm2, or the built-in macOS/Linux terminal.

Claude Chrome (Recommended): Install the Claude Code Chrome extension for browser automation and E2E testing. Pilot automatically detects it and uses it as the preferred tool. When the extension isn't available, Pilot falls back to Chrome DevTools MCP (direct CDP access, Lighthouse, performance tracing), then playwright-cli (persistent sessions, tracing) or agent-browser (lightweight, fast startup).

Codex Plugin (Included): The Codex plugin is installed automatically with Pilot. It provides adversarial code review powered by OpenAI Codex — an independent second opinion during /spec planning and verification. Run /codex:setup once to authenticate, then enable reviewers in Console Settings → Reviewers. A ChatGPT Plus subscription ($20/mo) covers the Codex API usage.

Installation

Works with any existing project. Pilot Shell is installed on top of Claude Code and uses its built-in concepts like commands, rules, hooks, skills, subagents, MCP, LSP and optimized settings to improve your experience:

curl -fsSL https://raw.githubusercontent.com/maxritter/pilot-shell/main/install.sh | bash

Installs globally on macOS, Linux, and Windows (WSL2). All tools and rules go to ~/.pilot/ and ~/.claude/. After installation, cd into any project and run pilot or ccp to start.

Downgrade

If you encounter an issue or unfixed bug in the latest version, you can always go back to a previous version (see releases):

export VERSION=8.2.3
curl -fsSL https://raw.githubusercontent.com/maxritter/pilot-shell/main/install.sh | bash

Uninstalling

Removes the Pilot binary, plugin files, managed commands/rules, settings and shell aliases:

curl -fsSL https://raw.githubusercontent.com/maxritter/pilot-shell/main/uninstall.sh | bash

Using a Dev Container

Pilot Shell works inside Dev Containers. Copy the .devcontainer folder from this repository into your project, adapt it to your needs (base image, extensions, dependencies), and run the installer inside the container. The installer auto-detects the container environment and skips system-level dependencies like Homebrew.

What the installer does

7-step installer with progress tracking, rollback on failure, and idempotent re-runs:

Prerequisites — Checks/installs Homebrew, Node.js, Python 3.12+, uv, git, jq
Claude files — Sets up ~/.claude/ plugin — rules, commands, hooks, MCP servers
Config files — Creates .nvmrc and project config
Dependencies — Installs Probe, RTK, CodeGraph, context-mode (better-sqlite3), Chrome DevTools MCP, playwright-cli, agent-browser, language servers
Shell integration — Auto-configures bash, fish, and zsh with pilot alias
VS Code extensions — Installs recommended extensions for your stack
Finalize — Success message with next steps

How It Works

export VERSION=8.2.3
curl -fsSL https://raw.githubusercontent.com/maxritter/pilot-shell/main/install.sh | bash

Just chat — no plan, no approval gate. Quality hooks and TDD enforcement still apply. Best for small tasks and exploration. For anything that needs a plan, use /spec — not Claude Code's built-in plan mode.

/spec — Spec-Driven Development

/spec replaces Claude Code's built-in plan mode (Shift+Tab). It provides a complete planning workflow with TDD, verification, and code review — use /spec instead of plan mode for all planned work.

Features, bug fixes, refactoring — describe it and /spec handles the rest. Auto-detects whether it's a feature or a bugfix and adapts the workflow. Specs are saved to docs/plans/ and visible in the Console's Specification tab.

pilot
> /spec "Add user authentication with OAuth and JWT tokens"   # → feature mode
> /spec "Fix the crash when deleting nodes with two children"  # → bugfix mode (auto-detected)

Discuss  →  Plan  →  Approve  →  Implement (TDD)  →  Verify  →  Done
                                                        ↑         ↓
                                                        └── Loop──┘

Feature Mode

Full exploration workflow for new functionality, refactoring, or architectural changes.

Plan: Explores codebase with semantic search → asks clarifying questions → writes detailed spec with scope, tasks, and definition of done → for UI features, writes E2E test scenarios (step-by-step, browser-executable) that become the verification contract → spec-review sub-agent validates completeness → waits for your approval. Optional Codex adversarial review provides an independent second opinion when enabled.

Implement: Creates an isolated git worktree → implements each task with strict TDD (RED → GREEN → REFACTOR) → quality hooks auto-lint, format, and type-check every edit → full test suite after each task.

Verify: Full test suite + actual program execution → unified review sub-agent (compliance + quality + goal) → for UI features, executes each E2E scenario step-by-step via browser automation (pass/fail tracked, results written to plan) → auto-fixes findings → squash merges to main on success.

Bugfix Mode

Investigation-first workflow for targeted fixes. Finds the root cause before touching any code.

Investigate: Reproduces the bug → traces backward through the call chain to find the root cause at a specific file:line → compares against working code patterns → states the fix with confidence level. If 3+ hypotheses fail, escalates as an architectural problem.

Test-Before-Fix: Writes a regression test that FAILS on current code → implements the minimal fix at the root cause → verifies all tests pass. Defense-in-depth validation at multiple layers when the bug involves data flowing through shared code paths.

Verify: Lightweight verification — regression test confirmation → full test suite → lint + type check → quality checks. No review sub-agents — the regression test proves the fix works, the full suite proves nothing else broke.

Why this matters: Root cause investigation prevents "fix one thing, break another." The regression test locks in the fix. No formal notation overhead — just trace, test, fix, verify.

Status Line

Pilot shell ships with its own advanced status line with real-time session metrics and spec progress:

All fields explained

Line 1 — Session Metrics (separated by |):

Widget	Description
Model	Active model in short form (`Opus 4.7 [1M]`, `Sonnet 4.6`)
Context	Effective context usage with progress bar, buffer indicator, and token count. Green < 80%, Yellow 80–95%, Red 95%+
Lines changed	`+added -removed` in session (hidden when usage API data available)
Git	Branch with staged (`+N`) / unstaged (`~N`) counts
Cost	Session cost in USD. Green < $1, Yellow $1–5, Red $5+
Savings	Token savings percentage from RTK proxy (`Savings: N%`), shown when no usage data

Line 2 — Mode:

Quick Mode: Quick Mode
Spec Mode: Plan name, type (feature/bugfix), phase (plan/implement/verify), progress bar, task count, and iteration count

Line 3 — Version & Session Info:

Pilot <version> (<tier>) · CC <version> (<subscription>) · sessions <N> · memories <N>

Pilot tier: Solo, Team, or Trial with time remaining. Claude subscription (Pro/Max/Team/Enterprise) detected via claude auth status and cached for 24 hours.

Pilot Shell Console

A local web dashboard with different views and real-time notifications when Claude needs your input:

All views

Each view with project-specific data has an inline Project Filter dropdown — switch projects without leaving the page. Dashboard stats tiles are clickable — navigate directly to the relevant view.

View	What it shows
Dashboard	Global command center — 8 clickable stat cards, 4 recent cards (Specifications, Requirements, Sessions, Memories) with "Show all" links. Active specs as pills in the top bar, notification bell in top right.
Sessions	Browse past sessions with search. Copy the session ID and use `/resume <session-id>` in Claude Code to jump back into any session.
Memories	Browsable observations — decisions, discoveries, bugfixes — with type filters and search. Each memory shows its session — click to navigate.
Requirements	PRD documents with view/annotate modes. Selected shown as a tab, others in a Previous dropdown.
Specifications	All spec plans with task progress, phase tracking, and iteration history. Annotate mode lets you mark up plans visually before approving — select text or click + on any block to write a note. Share with Teammate generates a compressed share link; Receive Feedback imports their annotations with accept/reject controls
Extensions	All extensions — local, plugin, and remote — with team sharing via git, diff view, push/pull, and color-coded categories.
Changes	Git diff viewer with staged/unstaged files, branch info, and worktree context. Review mode adds inline annotations on diff lines — the agent reads them directly before marking a spec as verified
Usage	Daily token costs, model routing breakdown, and usage trends
Help	Documentation, guides, and quick-start resources
Settings	Model selection per command/sub-agent, spec workflow toggles (worktree, questions, approval), reviewer toggles (spec review, changes review, optional Codex), extended context (1M) toggle with pricing info

Visual plan annotation

When a spec plan is pending approval, the Specifications tab defaults to Annotate mode. Two ways to annotate:

Select text — highlight any passage and write a note via the floating toolbar
Click + — hover over any block to add a note without selecting text

Annotations save automatically. The agent reads them at the next checkpoint, revises the plan accordingly, and asks for approval again. This turns plan review into a conversation — you mark what needs changing, the agent addresses it, you review again.

Spec sharing

Share specs and requirements with teammates for async review — no cloud service required:

Share — Click Share with Teammate to generate a compressed link. The entire spec and annotations are encoded into the URL fragment, so no data is transmitted to any server.
Review — Your colleague opens the link in their Console (or on pilot-shell.com), sees your annotations, and adds their own feedback.
Import — Click Receive Feedback to import their annotations. Each annotation has per-item accept/reject controls — accepted annotations merge into your plan, rejected ones are removed from both the sidebar and inline markers.

Everything works locally. The link is self-contained — the recipient doesn't need access to your machine, repository, or any shared service.

Code review

After a spec completes all automated checks, the agent prompts you to review the changes in the Changes tab:

Enable Review mode — toggle it in the Changes tab header
Annotate diffs — click + on any diff line to add an inline note. Annotations save automatically.
Agent addresses feedback — the agent reads every annotation and resolves them before marking the spec as verified

This gives you a final quality gate with direct, line-level feedback — the same workflow as a PR review, but before the code ever leaves your machine.

Extensions

Rules, commands, skills, and agents — all plain markdown files in .claude/ (project) or ~/.claude/ (global). The Console Extensions page lets you browse, edit, compare, and share everything from one place:

Extension categories

Extension	Location	When it loads
Skills	`.claude/skills/`	Automatically when relevant
Rules	`.claude/rules/`	Every session, or by file type
Commands	`.claude/commands/`	On demand via `/command-name`
Agents	`.claude/agents/`	Spawned as sub-agents for specialized tasks

Use /setup-rules to auto-generate rules from your codebase. Use /create-skill to capture workflows as reusable skills.

Scopes: Global, Project, Plugin

Project extensions live in .claude/ — commit them so teammates get them on git clone. Global extensions live in ~/.claude/ — personal and available across all projects. Move extensions between scopes with one click.

Plugin extensions come from installed Claude Code plugins (claude plugin install <name>). They appear as read-only items — visible but not editable.

Team sharing & APM (Team tier)

Connect a git repository to share extensions across your team:

Push local extensions to the team remote
Pull remote extensions to your machine (global or project scope)
Compare local vs remote with a built-in side-by-side diff view
Conflict detection — when local and remote differ, choose which version to keep

APM format — check one box and your remote becomes an APM package, directly installable via apm install owner/repo by anyone using Copilot, Cursor, OpenCode, or Claude. Extensions are automatically converted to APM conventions on push:

Pilot Shell	APM Remote
`rules/my-rule.md`	`instructions/my-rule.instructions.md`
`commands/my-cmd.md`	`prompts/my-cmd.prompt.md`
`skills/my-skill/SKILL.md`	`skills/my-skill/SKILL.md`
`agents/my-agent.md`	`agents/my-agent.agent.md`

APM-compatible frontmatter is injected automatically. An apm.yml manifest is generated. Toggling APM on/off migrates existing extensions in a single commit.

Customization

Customize everything Pilot auto-installs — tweak the built-in /spec workflow, modify existing rules, register additional hooks, add agents, and adjust the auto-applied settings.json, claude.json, .mcp.json, and .lsp.json. Source can be a git repo (team-wide) or a local directory (personal, no git needed). Team and Enterprise plans.

pilot customize install <git-url-or-path>    # Install and apply
pilot customize update                       # Pull latest (git) or re-read (local)
pilot customize status                       # Active source + drift warnings
pilot customize remove                       # Restore Pilot defaults

What you can customize

Target	How
Skills (built-in workflows like `/spec`, `/prd`)	Overlay ops in `customization.json`: `insert_after`, `insert_before`, `replace`, `disable` — or ship an entirely new skill folder
Rules	New rules are additive; same filename as a core rule overrides it
Hooks	Scripts copy to `~/.claude/pilot/hooks/`; ship `hooks.json` to register them alongside Pilot's built-ins
Agents	Drop `.md` files to add review/helper agents into the spec workflow
MCP servers	Top-level `.mcp.json` replaces the auto-configured server list
LSP servers	Top-level `.lsp.json` replaces the auto-configured server list
Claude settings	Top-level `settings.json` and `claude.json` deep-merge into the user's files — model prefs, permissions, env vars, etc. User state (oauth, projects) is preserved

Replaced skill fragments stay pinned to upstream by hash. pilot customize status surfaces drift when Pilot upgrades a replaced step; pilot customize diff <skill>/<step-id> shows what changed so you can port improvements. See the Customization guide for the full schema.

Repository / folder structure

The layout is the same whether you publish it as a git repo or keep it as a local folder. Directory names map 1:1 to ~/.claude/:

my-customization/
├── customization.json          # Required: metadata + optional skill overlays
├── settings.json               # Optional: deep-merges into ~/.claude/settings.json
├── claude.json                 # Optional: deep-merges into ~/.claude.json
├── .mcp.json                   # Optional: replaces ~/.claude/pilot/.mcp.json
├── .lsp.json                   # Optional: replaces ~/.claude/pilot/.lsp.json
├── skills/                     # → ~/.claude/skills/
│   ├── spec-plan/steps/
│   │   └── security-review.md  # New step injected into spec-plan
│   └── team-deploy/            # Brand-new skill
│       ├── manifest.json
│       ├── orchestrator.md
│       └── steps/01-stage.md
├── rules/                      # → ~/.claude/rules/
│   ├── team-standards.md       # Additive
│   └── testing.md              # Overrides core (same filename)
├── hooks/                      # → ~/.claude/pilot/hooks/
│   ├── team-lint-check.sh
│   └── hooks.json              # Registers team-lint-check.sh alongside Pilot's hooks
└── agents/                     # → ~/.claude/pilot/agents/
    └── team-reviewer.md

Only ship the files and directories you need. A repo with just rules/ is a valid customization; so is one with just .mcp.json.

Pilot Bot

Run Claude Code as a persistent 24/7 automation agent with scheduled tasks, background jobs and heartbeat monitoring:

pilot bot    # Launch automation session (auto-initializes on first run)

Pilot Bot defines scheduled jobs, automates recurring tasks, and monitor system health around the clock. If the Telegram Channels plugin is installed, the bot auto-detects it and enables bidirectional messaging. Similar to OpenClaw, but without the added complexity and costs.

Bot skills

Skill	Purpose
`bot-boot`	Boot sequence — health check, job registration, heartbeat setup
`bot-heartbeat`	Periodic health checks, notifies only when issues are detected
`bot-jobs`	Manage scheduled jobs — add, remove, pause, resume, edit, list
`bot-channel-task`	Channel task flow — acknowledge, execute, report (when Telegram is available)
`bot-defaults`	Standard bot behaviors (dedup, reporting, error handling)

/prd — Generate Product Requirements Documents

Use /prd before /spec when requirements are unclear. It's a strategic thought partner that turns vague ideas into concrete Product Requirements Documents (PRDs) through one-on-one conversation — with optional research, challenging assumptions, exploring trade-offs, and defining scope before you commit to building.

pilot
> /prd "Add real-time notifications for team updates"
> /prd "We need better onboarding — users drop off after signup"

Choose a research tier at the start: Quick (skip), Standard (web search for competitors, prior art, best practices), or Deep (parallel research agents for comprehensive findings). The conversation produces a PRD with problem statement, core user flows, scope boundaries, and technical context — then offers to hand off directly to /spec for implementation. PRDs are saved to docs/prd/ and visible in the Console's Requirements tab.

/setup-rules — Generate Modular Rules

Explores your codebase, discovers conventions, generates modular rules and documents MCP servers. Run once initially, then anytime your project changes significantly.

pilot
> /setup-rules

What /setup-rules Does

11 phases that read your codebase and produce comprehensive AI context:

Reference — load best practices for rule structure, path-scoping, and quality standards
Read existing rules — inventory all .claude/rules/ files, detect structure and path-scoping
Migrate unscoped assets — prefix with project slug for better sharing
Quality audit — check rules against best practices (size, specificity, stale references, conflicts)
Explore codebase — semantic search with Probe CLI, structural analysis with CodeGraph
Compare patterns — discovered vs documented conventions
Sync project rule — update {slug}-project.md with current tech stack, structure, commands
Sync MCP docs — smoke-test user MCP servers, document working tools
Discover new rules — find undocumented patterns worth capturing
Cross-check — validate all references, ensure consistency across generated files
Summary — report all changes made

For monorepos: Organizes rules in nested subdirectories by product and team, with paths frontmatter to scope rules to specific file types. Generates a README.md documenting the structure.

/create-skill — Reusable Skill Creator

Builds a reusable skill from any topic — explores the codebase and creates it interactively with you. If no topic is given, evaluates the current session for extractable knowledge.

pilot
> /create-skill "Automate the review and triaging of our PR Bot comments"

What /create-skill Does

6 phases that turn domain knowledge into a reusable skill:

Reference — load use case categories, complexity spectrum, file structure template, description formula, security restrictions
Understand — explore the codebase for relevant patterns, ask clarifying questions, or evaluate the current session for extractable knowledge
Check existing — search project and global skills to avoid duplicates
Create — write to .claude/skills/ (project) or ~/.claude/skills/ (global), apply portability and determinism checklists
Quality gates — structure checklist (SKILL.md naming, frontmatter fields), content checklist (error handling, examples, exclusions), triggering test (should/shouldn't trigger), iteration signals
Test & iterate — run test prompts with sub-agents, evaluate results, optimize description triggering

Use case categories:

Category	Best For
Document & Asset Creation	Consistent reports, designs, code with embedded style guides and templates
Workflow Automation	Multi-step processes with validation gates and iterative refinement
MCP Enhancement	Workflow guidance on top of MCP tool access, multi-MCP coordination

Skill structure: Each skill is a folder with a SKILL.md file (case-sensitive), optional scripts/, references/, and assets/ directories. The YAML frontmatter description determines when Claude loads the skill — it must include what the skill does, when to use it, and specific trigger phrases. Progressive disclosure keeps context lean: frontmatter loads always (~100 tokens), SKILL.md loads on activation, linked files load on demand.

Claude CLI Flag Passthrough

All Claude Code CLI flags work directly with pilot — current and future. Pilot forwards any flag it doesn't recognize to the Claude CLI automatically.

pilot --channels plugin:telegram@claude-plugins-official
pilot --model opus --verbose
pilot --resume

Headless Mode

Run Pilot non-interactively with -p for CI/CD pipelines, scripts, and automated workflows. All Claude Code CLI flags work — --output-format, --allowedTools, --channels, --continue, --bare, etc.

pilot -p "Run tests and fix failures" --allowedTools "Bash,Read,Edit"
pilot -p "Summarize this project" --o

Similar plugins

Plugin

superpowers

by obra

Claude Code plugin for agentic software development — automates TDD, planning, and subagent coordination

$/plugin install superpowers@claude-plugins-official

213k+53kv5.1.0· 1mo agoShell

GitHub ↗

Plugin

everything-claude-code

by affaan-m

Claude Code plugin: agent harness performance system with skills, memory, security, and continuous learning

$/plugin marketplace add https://github.com/affaan-m/everything-claude-code

199k+38kv1.10.0· 1mo agoJavaScript

GitHub ↗

Plugin

1 dev uses this

andrej-karpathy-skills

by forrestchang

Claude Code plugin for LLM coding guidelines — applies Karpathy's principles to prevent common AI coding pitfalls

$/plugin install andrej-karpathy-skills@karpathy-skills

161k+100kupdated 1mo ago

GitHub ↗

Plugin

1 dev uses this

claude-mem

by thedotmack

Persistent memory for Claude Code plugin — captures, compresses, and reinjects session context

$/plugin marketplace add thedotmack/claude-mem

79k+16kv13.3.0· 1w agoTypeScript

GitHub ↗

See all Plugins →