
context-engineering-kit

by NeoLabHQ
Claude Code plugin for agent reliability — improves LLM output quality and predictability with advanced context engineering techniques

A Claude Code plugin marketplace offering a collection of advanced context engineering techniques and patterns to enhance agent result quality and predictability. It provides modular plugins, each loading specific skills, commands, and subagents, designed for token efficiency and scientific validation. Install individual plugins to introduce feedback loops, structured development methods, and specialized code review agents into your Claude Code workflow.

View on GitHub ↗
Key features
  • Minimizes token footprint with command-oriented skills and sub-agents
  • Modular plugins for specific agent result improvements
  • Integrates scientifically proven context engineering techniques
  • Adds Reflexion for self-refinement and memory updates
  • Provides Spec-Driven Development for reliable code generation
Languages
TypeScript 54% · Shell 31% · Just 10% · Dockerfile 5%
Topics
agent · ai · claude · cline · cursor · llm · marketplace · opencode · windsurf

Context Engineering Kit - advanced context engineering techniques

License · agentskills.io · Mentioned in Awesome Claude Code

Advanced context engineering techniques and patterns for Claude Code, OpenCode, Cursor, Antigravity and more.

Quick Start · Plugins · Github Action · Reference · Docs

Context Engineering Kit

Hand-crafted collection of advanced context engineering techniques and patterns with minimal token footprint, focused on improving agent result quality and predictability.

The marketplace is built on prompts that our company's developers have used daily for a long time, supplemented by plugins based on benchmarked papers and high-quality projects.

Key Features

  • Simple to Use - Easy to install and use without any dependencies. Contains automatically used skills and self-explanatory commands.
  • Token-Efficient - Carefully crafted prompts and architecture, preferring command-oriented skills with sub-agents over general information skills when possible, to minimize populating context with unnecessary information.
  • Quality-Focused - Each plugin is focused on meaningfully improving agent results in a specific area.
  • Granular - Install only the plugins you need. Each plugin loads only its specific agents, commands, and skills, with no overlap or redundancy between plugins.
  • Scientifically proven - Plugins are based on techniques and patterns validated by well-trusted benchmarks and studies.
  • Open-Standards - Skills follow the agentskills.io specification. The SDD plugin is based on the arc42 standard for software architecture documentation.

News

Updates from key releases:

  • v2.0.0: Spec-Driven Development plugin was rewritten from scratch. It is now able to produce working code in 99% of cases on real-life production projects!
  • v2.1.0: Spec-Driven Development plugin agents include high-level code quality guidelines from DDD plugin.
  • v2.2.0: The Subagent-Driven Development plugin now works as a distilled version of the SDD plugin, using meta-judge and judge sub-agents to generate specifications on the fly, in parallel with implementation. The DDD plugin now includes Clean Architecture, DDD, SOLID, Functional Programming, and other pattern examples as rules that are automatically added to the context during code writing.

Quick Start

Step 1: Install Marketplace and Plugins

Claude Code

Open Claude Code and add the Context Engineering Kit marketplace

/plugin marketplace add NeoLabHQ/context-engineering-kit

This makes all plugins available for installation, but does not load any agents or skills into your context.

Install any plugin, for example reflexion:

/plugin install reflexion@NeoLabHQ/context-engineering-kit

Each installed plugin loads only its specific agents, commands, and skills into Claude's context.

Cursor, Antigravity, Codex, OpenCode and others

Run the vercel-labs/skills CLI in your terminal:

npx skills add NeoLabHQ/context-engineering-kit

You can pick which skills and agents to install.

Alternative installation methods

You can use OpenSkills to install skills by running the following commands:

npx openskills install NeoLabHQ/context-engineering-kit
npx openskills sync

Step 2: Use Plugin

> claude "implement user authentication"
# Claude implements user authentication, then you can ask it to reflect on implementation

> /reflexion:reflect
# It analyzes the results and suggests improvements
# If issues are obvious, it will fix them immediately
# If they are minor, it will suggest improvements that you can respond to
> fix the issues

# If you would like to prevent issues found during reflection from appearing again,
# ask Claude to extract resolution strategies and save the insights to project memory
> /reflexion:memorize

Alternatively, you can include the word reflect in the initial prompt:

> claude "implement user authentication, then reflect"
# Claude implements user authentication,
# then hook automatically runs /reflexion:reflect

To use this hook, you need to have bun installed; bun is not required for the /reflexion:reflect command itself.
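
For illustration, this kind of prompt-triggered hook plugs into Claude Code's hook mechanism roughly as shown below. This is a simplified sketch rather than the plugin's actual hook definition, and the script path is hypothetical; the plugin ships its own hook, so you do not need to add anything yourself.

{
  "hooks": {
    "UserPromptSubmit": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "bun run .claude/hooks/check-reflect.ts"
          }
        ]
      }
    ]
  }
}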

Documentation

You can find the complete Context Engineering Kit documentation here.

However, the main plugins we recommend starting with are Subagent-Driven Development and Spec-Driven Development.

Agent Reliability Engineering

Three plugins in this marketplace (Reflexion, Subagent-Driven Development, and Spec-Driven Development) are designed to improve how accurately and consistently the agent follows provided instructions, and to reduce hallucinations and bias toward incorrect solutions. They are not competitors but complements, because together they let you balance reliability against token cost. Here is a high-level comparison of different agent usage approaches and the probability of receiving fully accurate, hallucination-free results, depending on task complexity:

Probabilities show the chance of receiving fully accurate results for the given number of changed files.

| Approach | 1-3 files | 4-10 files | 10-20 files | 20+ files | Token overhead | What this means in practice |
|---|---|---|---|---|---|---|
| One-shot prompt | 60%-80% | 30%-50% | 5%-30% | 1%-20% | 0 | Accuracy depends on the model, but LLM quality degrades exponentially as context grows |
| /reflect | 68%-91% | 49%-71% | 13%-41% | 1%-30% | 1k-3k | Agent finds and fixes missed requirements on its own |
| /reflect + /memorize | 79%-87% | 60%-79% | 34%-42% | 5%-30% | 2k-5k | Agent extracts repeatable mistakes and avoids them during new tasks |
| /do-and-judge | 90% | 83% | 60% | 30% | 1.5x-3x | Mitigates context rot, bias, hallucinations, and missed requirements using a Judge sub-agent |
| /do-in-steps | 92% | 90% | 71% | 50% | 3x-5x | Resolves issues similarly to /do-and-judge, but separately per file group |
| /plan + /implement | 94% | 93% | 85% | 70% | 5x-20x | Performs the /do-in-steps flow, but the specification mitigates issues caused by inconsistent architecture and codebase size |
| /brainstorm + /plan + /implement | 95% | 95% | 90% | 80% | 5x-20x | Brainstorming decreases the number of incorrect decisions and missed requirements |
| /plan + human review + /implement | 99% | 99% | 99% | 95% | 5x-35x | Human review mitigates misunderstanding of requirements by the LLM |

Reliability metrics are based on real development usage on production projects for more than 6 months.

Plugins List

To view all available plugins:

/plugin
  • Reflexion - Introduces feedback and refinement loops to improve output quality.
  • Spec-Driven Development - Introduces commands for specification-driven development, based on Continuous Learning + LLM-as-Judge + Agent Swarm. Achieves development as compilation through reliable code generation.
  • Code Review - Introduces codebase and PR review commands and skills using multiple specialized agents.
  • Git - Introduces commands for commit and PR creation.
  • Test-Driven Development - Introduces commands for test-driven development, documentation of common anti-patterns, and skills for testing using subagents.
  • Subagent-Driven Development - Introduces skills for subagent-driven development, which dispatches a fresh subagent for each task with code review between tasks, enabling fast iteration with quality gates.
  • Domain-Driven Development - Introduces commands to update CLAUDE.md with best practices for domain-driven development, focused on code quality, and includes Clean Architecture, SOLID principles, and other design patterns.
  • FPF - First Principles Framework - Introduces structured reasoning using ADI cycle (Abduction-Deduction-Induction) with knowledge layer progression. Uses workflow command pattern with fpf-agent for hypothesis generation, verification, and auditable decision-making.
  • Kaizen - Inspired by the Japanese continuous improvement philosophy and by Agile and Lean development practices. Introduces commands for root cause analysis of issues and problems, including 5 Whys, Cause and Effect Analysis, and other techniques.
  • Customaize Agent - Commands and skills for writing and refining commands, hooks, and skills for Claude Code. Includes Anthropic Best Practices and Agent Persuasion Principles that can be useful for sub-agent workflows.
  • Docs - Commands for analyzing projects, writing and refining documentation.
  • Tech Stack - Commands for setting up or updating CLAUDE.md file with best practices for specific languages or frameworks.
  • MCP - Commands for setting up well-known MCP server integration if needed and updating CLAUDE.md file with requirements to use this MCP server for the current project.

Reflexion

Collection of commands that force the LLM to reflect on the previous response and output. Includes automatic reflection hooks that trigger when you include "reflect" in your prompt.

How to install

/plugin install reflexion@NeoLabHQ/context-engineering-kit

Commands

  • /reflexion:reflect - Reflect on the previous response and output, based on the Self-Refine framework for iterative improvement with complexity triage and verification
  • /reflexion:memorize - Curate insights from reflections and critiques into CLAUDE.md using Agentic Context Engineering, so the project memory retains this knowledge
  • /reflexion:critique - Comprehensive multi-perspective review using specialized judges with debate and consensus building

Hooks

  • Automatic Reflection Hook - Triggers /reflexion:reflect automatically when "reflect" appears in your prompt

Theoretical Foundation

The plugin is based on papers like Self-Refine and Reflexion. These techniques improve the output of large language models by introducing feedback and refinement loops.

They have been shown to increase output quality by 8–21%, based on both automatic metrics and human preferences across seven diverse tasks, including dialogue generation, coding, and mathematical reasoning, compared to standard one-step model outputs.

On top of that, the plugin draws on the Agentic Context Engineering paper, which uses memory updates after reflection and consistently outperforms strong baselines by 10.6% on agent benchmarks.
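
As a purely hypothetical illustration, an insight curated by /reflexion:memorize into CLAUDE.md might read something like the entry below; the exact structure the plugin writes may differ.

## Learned insights

- When adding an API route, also update the OpenAPI spec and the route-level integration test; both were missed requirements surfaced during reflection.
- Error handling: prefer the project's existing error helper over throwing raw Error objects in request handlers, so error responses keep a consistent shape.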

Code Review

Comprehensive code review commands using multiple specialized agents for thorough code quality evaluation.

How to install

/plugin install code-review@NeoLabHQ/context-engineering-kit

Commands

Agents

This plugin uses multiple specialized agents for comprehensive code quality analysis:

  • bug-hunter - Identifies potential bugs, edge cases, and error-prone patterns
  • code-quality-reviewer - Evaluates code structure, readability, and maintainability
  • contracts-reviewer - Reviews interfaces, API contracts, and data models
  • historical-context-reviewer - Analyzes changes in relation to codebase history and patterns
  • security-auditor - Identifies security vulnerabilities and potential attack vectors
  • test-coverage-reviewer - Evaluates test coverage and suggests missing test cases

You can use this plugin to review code in GitHub Actions; to do so, follow this guide.

Git

Commands and skills for streamlined Git operations including commits, pull request creation, and advanced workflow patterns.

How to install

/plugin install git@NeoLabHQ/context-engineering-kit

Commands

  • /git:commit - Create well-formatted commits with conventional commit messages and emoji
  • /git:create-pr - Create pull requests using GitHub CLI with proper templates and formatting
  • /git:analyze-issue - Analyze a GitHub issue and create a detailed technical specification
  • /git:load-issues - Load all open issues from GitHub and save them as markdown files
  • /git:create-worktree - Create git worktrees for parallel development with automatic dependency installation
  • /git:compare-worktrees - Compare files and directories between git worktrees
  • /git:merge-worktree - Merge changes from worktrees with selective checkout, cherry-picking, or patch selection

Skills

  • worktrees - Git worktree commands and workflow patterns for parallel branch development
  • notes - Git notes commands for attaching non-invasive metadata to commits

Test-Driven Development

Commands and skills for test-driven development with anti-pattern detection.

How to install

/plugin install tdd@NeoLabHQ/context-engineering-kit

Commands

  • /tdd:write-tests - Systematically add test coverage for local code changes using specialized review and development agents
  • /tdd:fix-tests - Fix failing tests after business logic changes or refactoring using orchestrated agents

Skills

  • test-driven-development - Introduces TDD methodology, best practices, and skills for testing using subagents

Subagent-Driven Development

Execution framework for competitive generation, multi-agent evaluation, and subagent-driven development with quality gates.

How to install

/plugin install sadd@NeoLabHQ/context-engineering-kit

Commands

  • /sadd:launch-sub-agent - Launch focused sub-agents with intelligent model selection, Zero-shot CoT reasoning, and self-critique verification
  • /sadd:do-and-judge - Execute a single task with an implementation sub-agent, independent judge verification, and an automatic retry loop until passing (see the sketch after this list)
  • /sadd:do-in-parallel - Execute the same task across multiple independent targets in parallel with context isolation
  • /sadd:do-in-steps - Execute complex tasks through sequential sub-agent orchestration with automatic decomposition and context passing
  • /sadd:do-competitively - Execute tasks through competitive generation, multi-judge evaluation, and evidence-based synthesis to produce superior results
  • /sadd:tree-of-thoughts - Execute complex reasoning through systematic exploration of solution space, pruning unpromising branches, and synthesizing the best solution
  • /sadd:judge-with-debate - Evaluate solutions through iterative multi-judge debate with consensus building or disagreement reporting
  • /sadd:judge - Evaluate completed work using LLM-as-Judge with structured rubrics and evidence-based scoring
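
Conceptually, the /sadd:do-and-judge flow referenced above is an implement, judge, retry loop. The TypeScript below is only an illustrative sketch of that control flow with hypothetical helper functions (runImplementer and runJudge stand in for sub-agent dispatches); the plugin itself orchestrates this through prompts and sub-agents, not application code.

interface JudgeVerdict {
  pass: boolean;     // did the work satisfy every rubric item?
  feedback: string;  // evidence-based notes for the next attempt
}

// Hypothetical stand-in for dispatching a fresh implementation sub-agent.
async function runImplementer(task: string, feedback?: string): Promise<string> {
  return `result for: ${task}${feedback ? ` (revised after: ${feedback})` : ""}`;
}

// Hypothetical stand-in for an independent LLM-as-Judge sub-agent scoring
// the result against a structured rubric.
async function runJudge(task: string, result: string): Promise<JudgeVerdict> {
  return { pass: result.length > 0, feedback: "all rubric items covered" };
}

async function doAndJudge(task: string, maxAttempts = 3): Promise<string> {
  let feedback: string | undefined;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const result = await runImplementer(task, feedback); // fresh context per attempt
    const verdict = await runJudge(task, result);        // independent verification
    if (verdict.pass) return result;                      // quality gate passed
    feedback = verdict.feedback;                          // retry with judge feedback
  }
  throw new Error(`"${task}" did not pass the judge after ${maxAttempts} attempts`);
}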

Skills

  • subagent-driven-development - Dispatches a fresh subagent for each task with code review between tasks, enabling fast iteration with quality gates
  • multi-agent-patterns - Design multi-agent architectures (supervisor, peer-to-peer, hierarchical) for complex tasks exceeding single-agent context limits

Spec-Driven Development

Comprehensive specification-driven development workflow plugin that transforms prompts into production-ready implementations through structured planning, architecture design, and quality-gated execution.

This plugin is designed to consistently produce working code. It was tested on real-life production projects by our team, and in 100% of cases, it generated working code aligned with the initial prompt. If you find a use case it cannot handle, please report it as an issue.

Key Features

  • Development as compilation — The plugin works like a "compilation" or "nightly build" for your development process: task specs → run /sdd:implement → working code. After writing your prompt, you can launch the plugin and expect a working result when you come back. The time it takes depends on task complexity — simple tasks may finish in 30 minutes, while complex ones can take a few days.
  • Benchmark-level quality in real life — Model benchmarks improve with each release, yet real-world results usually stay the same. That's because benchmarks reflect the best possible output a model can achieve, whereas in practice LLMs tend to drift toward sub-optimal solutions that can be wrong or non-functional. This plugin uses a variety of patterns to keep the model working at its peak performance.
  • Customizable — Balance result quality and process speed by adjusting command parameters. Learn more in the Customization section.
  • Developer time-efficient — The overall process is designed to minimize developer time and reduce the number of interactions, while still producing results better than what a model can generate from scratch. However, overall quality is highly proportional to the time you invest in iterating and refining the specification.
  • Industry-standard — The plugin's specification template is based on the arc42 standard, adjusted for LLM capabilities. arc42 is a widely adopted, high-quality standard for software architecture documentation used by many companies and organizations.
  • Works best in complex or large codebases — While most other frameworks work best for new projects and greenfield development, this plugin is designed to perform better the more existing code and well-structured architecture you have. At each planning phase it includes a codebase impact analysis step that evaluates which files may be affected and which patterns to follow to achieve the desired result.
  • Simple — This plugin avoids unnecessary complexity and mainly uses just 3 commands, offloading process complexity to the model via multi-agent orchestration. /sdd:implement is a single command that produces working code from a task specification. To create that specification, you run /sdd:add-task and /sdd:plan, which analyze your prompt and iteratively refine the specification until it meets the required quality.

Quick Start

/plugin install sdd@NeoLabHQ/context-engineering-kit

Then run the following commands:

# create .specs/tasks/draft/design-auth-middleware.feature.md file with initial prompt
/sdd:add-task "Design and implement authentication middleware with JWT support"

# write detailed specification for the task
/sdd:plan
# will move task to .specs/tasks/todo/ folder

Restart the Claude Code session to clear context and start fresh. Then run the following command:

# implement the task
/sdd:implement @.specs/tasks/todo/design-auth-middleware.feature.md
# produces working implementation and moves the task to .specs/tasks/done/ folder

Commands

  • /sdd:add-task - Create task template file with initial prompt
  • /sdd:plan - Analyze the prompt, generate required skills, and refine the task specification
  • /sdd:implement - Produce a working implementation of the task and verify it

Additional commands useful before creating a task:

  • /sdd:create-ideas - Generate diverse ideas on a given topic using creative sampling techniques
  • /sdd:brainstorm - Refine vague ideas into fully-formed designs through collaborative dialogue

Agents

| Agent | Description | Used by |
|---|---|---|
| researcher | Technology research, dependency analysis, best practices | /sdd:plan (Phase 2a) |
| code-explorer | Codebase analysis, pattern identification, architecture mapping | /sdd:plan (Phase 2b) |
| business-analyst | Requirements discovery, stakeholder analysis, specification writing | /sdd:plan (Phase 2c) |
| software-architect | Architecture design, component design, implementation planning | /sdd:plan (Phase 3) |
| tech-lead | Task decomposition, dependency mapping, risk analysis | /sdd:plan (Phase 4) |
| team-lead | Step parallelization, agent assignment, execution planning | /sdd:plan (Phase 5) |
| qa-engineer | Verification rubrics, quality gates, LLM-as-Judge definitions | /sdd:plan (Phase 6) |
| developer | Code implementation, TDD execution, quality review, verification | /sdd:implement |
| tech-writer | Technical documentation writing, API guides, architecture updates, lessons learned | /sdd:implement |

Patterns

Key patterns implemented in this plugin:

  • Structured reasoning templates — includes Zero-shot and Few-shot Chain of Thought, Tree of Thoughts, Problem Decomposition, and Self-Critique. Each is tailored to a specific agent and task, enabling sufficiently detailed decomposition so that isolated sub-agents can implement each step independently.
  • Multi-agent orchestration for context management — Context isolation of independent agents prevents the context rot problem, essentially keeping LLMs at optimal performance at each step of the process. The main agent acts as an orchestrator that launches sub-agents and controls their work.
  • Quality gates based on LLM-as-Judge — Evaluate the quality of each planning and implementation step using evidence-based scoring and predefined verification rubrics. This fully eliminates cases where an agent produces non-working or incorrect solutions.
  • Continuous learning — Builds skills that the agent needs to implement a specific task, which it would otherwise not be able to perform from scratch.
  • Spec-driven development pattern — Based on the arc42 specification standard, adjusted for LLM capabilities, to eliminate parts of the specification that add no value to implementation quality or that could degrade it.
  • MAKER — An agent reliability pattern introduced in Solving a Million-Step LLM Task with Zero Errors. It removes agent mistakes caused by accumulated context and hallucinations by utilizing clean-state agent launches, filesystem-based memory storage, and multi-agent voting during critical decision-making.

Vibe Coding vs. Specification-Driven Development

This plugin is not a "vibe coding" solution, but out of the box it works like one. By default it is designed to work from a single prompt through to the end of the task, making reasonable assumptions and evidence-based decisions instead of constantly asking for clarification. This is because developer time is more valuable than model time. As a result, the plugin is designed to allow the developer to decide how much time the task is worth. The plugin will always produce working results, but quality will be sub-optimal if no human feedback is provided.

To improve quality, after generating a specification you can correct it or leave comments using //, then run the /plan command again with the --refine flag. You can also verify each planning and implementation phase by adding the --human-in-the-loop flag. According to most known research, human feedback is the most effective way to improve results.
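
For example, with hypothetical comment content and assuming the flag is passed as an argument to the namespaced command, a refinement round could look like this:

# inside the generated specification file, leave corrections as // comments
// Use RS256 instead of HS256 for token signing
// Keep refresh-token rotation out of scope for the first iteration

# then re-run planning with the refine flag
/sdd:plan --refine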

Our tests showed that even when the initially generated specification was incorrect due to lack of information or task complexity, the agent was still able to self-correct until it reached a working solution. However, it usually takes much longer, and results in the agent spending time on wrong paths and stopping more frequently. To avoid this, we strongly advise decomposing tasks into smaller separate tasks with dependencies and reviewing the specification for each one independently. You can add dependencies between tasks as arguments to the /add-task command, and the agent will link them together by adding a depends_on section to the task file frontmatter.
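
For illustration, a task file created this way might carry frontmatter along these lines; depends_on is the field mentioned above, while the other field names and the dependency file are hypothetical, so check the generated files for the exact schema.

# .specs/tasks/todo/design-auth-middleware.feature.md
---
title: Design and implement authentication middleware with JWT support
status: todo
depends_on:
  - .specs/tasks/todo/add-user-model.feature.md
---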

Even if you don't want to spend much time on this process, you can still use the plugin for complex tasks without decomposition or human verification — but you will likely need tools like ralph-loop to keep the agent running for longer.

Learn more about available customization options in Customization.

Domain-Driven Development

Commands for setting up domain-driven development best practices focused on code quality.

How to install

/plugin install ddd@NeoLabHQ/context-engineering-kit

Commands

Rules

  • 15 composable rules covering Clean Architecture, SOLID principles, Command-Query Separation, Functional Core/Imperative Shell, Explicit Control Flow, Domain-Specific Naming, and more. See rules reference
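
For example, one of the 15 rules listed above, Command-Query Separation, encourages keeping state-changing commands separate from value-returning queries. A minimal TypeScript illustration of the principle (not taken from the plugin's rule text):

class ShoppingCart {
  private items: string[] = [];

  // Command: changes state and returns nothing
  addItem(item: string): void {
    this.items.push(item);
  }

  // Query: returns a value and changes nothing
  itemCount(): number {
    return this.items.length;
  }
}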

FPF - First Principles Framework

A structured reasoning plugin that implements the First Principles Framework (FPF) by Anatoly Levenchuk — a methodology for rigorous, auditable reasoning. The killer feature is turning the black box of AI reasoning into a transparent, evidence-backed audit trail, making AI decision-making traceable and reviewable.
