token-savior
MCP server for Claude Code — structural code navigation with 97% token savings and persistent cross-session memory
Token Savior Recall is an MCP server for Claude Code that provides structural code navigation and persistent memory, drastically reducing token usage and improving context retention. It indexes your codebase by symbol, allowing the model to navigate efficiently, and stores session context in SQLite with vector embeddings for recall. Features include contradiction detection, ROI tracking for observations, and a progressive disclosure memory contract.
- Indexes codebase by symbol for efficient code navigation
- Achieves 97% token savings on code context injection
- Persistent memory engine stores session context in SQLite
- Hybrid FTS + vector search with RRF for memory recall
- Progressive disclosure memory contract with 3 cost layers
⚡ Token Savior Recall
Structural code navigation + persistent memory engine for Claude Code. 97% fewer tokens. Nothing forgotten between sessions.
- 📖 mibayy.github.io/token-savior — project site + benchmark landing
- 🧪 github.com/Mibayy/tsbench — benchmark source + fixtures
Benchmark — 90 real coding tasks
| | Plain Claude Code | With Token Savior |
|---|---|---|
| Score | 141 / 180 (78.3%) | 180 / 180 (100.0%) |
| Active tokens | 1.55 M | 804 k (−48%) |
| Wall time | 166 min | 35 min (−79%) |
| Wins / Ties / Losses | — | 25 / 65 / 0 |
Perfect (100%) across all 11 categories: audit, bug_fixing,
code_generation, code_review, config_infra, documentation,
explanation, git, navigation, refactoring, writing_tests.
Zero losses against plain Claude — every task is a win or a tie.
Also validated on Sonnet 4.6 (ts 170/180 = 94.4% vs base 156/180 = 86.7%).
Model: Claude Opus 4.7 · Methodology + per-task breakdown: mibayy.github.io/token-savior.
What it does
Claude Code reads whole files to answer questions about three lines, and forgets
everything the moment a session ends. Token Savior Recall fixes both. It
indexes your codebase by symbol — functions, classes, imports, call graph — so
the model navigates by pointer instead of by cat. Measured reduction: 97%
fewer chars injected across 170+ real sessions.
On top of that sits a persistent memory engine. Every decision, bugfix, convention, guardrail and session rollup is stored in SQLite WAL + FTS5 + vector embeddings, ranked by Bayesian validity and ROI, and re-injected as a compact delta at the start of the next session. Contradictions are detected at save time; observations decay with explicit TTLs; a 3-layer progressive-disclosure contract keeps lookup cost bounded.
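As a sketch of how hybrid recall ranking of this kind works: the BM25 (FTS5) ranking and the vector-similarity ranking are fused with Reciprocal Rank Fusion, so an observation near the top of either list surfaces. This is an illustration of the RRF technique, not the server's actual code; the constant `k=60` is the commonly used default and is assumed here:

```python
def rrf_fuse(fts_ranking, vec_ranking, k=60):
    """Reciprocal Rank Fusion: each list contributes 1/(k + rank) per doc."""
    scores = {}
    for ranking in (fts_ranking, vec_ranking):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; docs in only one list still get a score
    return sorted(scores, key=scores.get, reverse=True)
```

If the vector index is unavailable, the fusion degrades gracefully to the FTS ranking alone, which matches the "FTS fallback graceful" behavior described below.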
Token savings
| Operation | Plain Claude | Token Savior | Reduction |
|---|---|---|---|
| `find_symbol("send_message")` | 41M chars (full read) | 67 chars | −99.9% |
| `get_function_source("compile")` | grep + cat chain | 4.5K chars | direct |
| `get_change_impact("LLMClient")` | impossible | 16K chars (154 direct + 492 transitive) | new capability |
| `get_backward_slice(var, line)` | 130 lines | 12 lines | −92% |
| `memory_index` (Layer 1) | n/a | ~15 tokens/result | Layer 1 shortlist |
| 60-task tsbench run | 1.43 M chars | 216 k chars | −85% |
| tsbench score | 67/120 (56%) | 118/120 (98%) | +42 pts |
Full benchmark methodology and per-task results: tsbench.
Memory engine
| Capability | How it works |
|---|---|
| Storage | SQLite WAL + FTS5 + sqlite-vec (optional), 12 observation types |
| Hybrid search | BM25 + vector (all-MiniLM-L6-v2, 384d) fused via RRF, FTS fallback graceful |
| Progressive disclosure | 3-layer contract: memory_index → memory_search → memory_get |
| Citation URIs | ts://obs/{id} — reusable across layers, agent-native pointers |
| Bayesian validity | Each obs carries a validity prior + update rule; stale obs are surfaced, not silently trusted |
| Contradiction detection | Triggered at save time against existing index; flagged in hook output |
| Decay + TTL | Per-type TTL (command 60d, research 90d, note 60d) + LRU scoring 0.4·recency + 0.3·access + 0.3·type |
| Symbol staleness | Obs linked to symbols are invalidated when the symbol's content hash changes |
| ROI tracking | Access count × context weight — unused obs age out, high-ROI obs are promoted |
| MDL distillation | Minimum Description Length grouping compresses redundant observations into conventions |
| Auto-promotion | note ×5 accesses → convention; warning ×5 → guardrail |
| Hooks | 8 Claude Code lifecycle hooks (SessionStart/Stop/End, PreCompact, PreToolUse ×2, UserPromptSubmit, PostToolUse) |
| Web viewer | 127.0.0.1:$TS_VIEWER_PORT — htmx + SSE, opt-in |
| LLM auto-extraction | Opt-in TS_AUTO_EXTRACT=1 — PostToolUse tool calls are distilled into 0–3 observations via a small-model call |
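The decay row above gives the scoring weights. The sketch below shows one way such a retention score could be computed: the per-type TTLs and the 0.4/0.3/0.3 weights come from the table, while the recency and access normalizations are assumptions made for illustration only:

```python
import time

TTL_DAYS = {"command": 60, "research": 90, "note": 60}  # per-type TTLs from the table

def retention_score(obs, now=None):
    """LRU-style score: 0.4*recency + 0.3*access + 0.3*type weight.
    Observations past their type TTL score 0 and age out."""
    now = time.time() if now is None else now
    age_days = (now - obs["last_seen"]) / 86400
    ttl = TTL_DAYS.get(obs["type"], 60)
    if age_days > ttl:
        return 0.0
    recency = max(0.0, 1.0 - age_days / ttl)          # linear decay (assumed)
    access = min(1.0, obs["access_count"] / 10)        # saturates at 10 hits (assumed)
    type_w = obs.get("type_weight", 0.5)
    return 0.4 * recency + 0.3 * access + 0.3 * type_w
```

A freshly seen, frequently accessed observation of a high-weight type scores near 1.0; an untouched note past its 60-day TTL scores 0 and is eligible for eviction.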
vs claude-mem
Two projects share the goal — persistent memory for Claude Code. The axes below are measured, not marketing.
| Axis | claude-mem | Token Savior Recall |
|---|---|---|
| Bayesian validity | no | yes |
| Contradiction detection at save | no | yes |
| Per-type decay + TTL | no | yes |
| Symbol staleness (content-hash linked obs) | no | yes |
| ROI tracking + auto-promotion | no | yes |
| MDL distillation into conventions | no | yes |
| Code graph / AST navigation | no | yes (90 tools, cross-language) |
| Progressive disclosure contract | no | yes (3 layers, ~15/60/200 tokens) |
| Hybrid FTS + vector search (RRF) | no | yes |
Token Savior Recall is a superset: it ships the memory engine plus the structural codebase server that gave the project its name.
Install
uvx (no venv, no clone)
```shell
uvx token-savior-recall
```
pip
```shell
pip install "token-savior-recall[mcp]"
# Optional hybrid vector search:
pip install "token-savior-recall[mcp,memory-vector]"
```
Claude Code one-liner
```shell
claude mcp add token-savior -- /path/to/venv/bin/token-savior
```
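The one-liner above assumes a venv install. If the server is run through uvx instead, the same registration can point at uvx (a sketch, assuming `uvx` is on PATH; the server name `token-savior` is unchanged):

```shell
# Variant: let uvx resolve and run the published package instead of a venv binary
claude mcp add token-savior -- uvx token-savior-recall
```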
Development
```shell
git clone https://github.com/Mibayy/token-savior
cd token-savior
python3 -m venv .venv
.venv/bin/pip install -e ".[mcp,dev]"
pytest tests/ -q
```
Configure
```json
{
  "mcpServers": {
    "token-savior-recall": {
      "command": "/path/to/venv/bin/token-savior",
      "env": {
        "WORKSPACE_ROOTS": "/path/to/project1,/path/to/project2",
        "TOKEN_SAVIOR_CLIENT": "claude-code"
      }
    }
  }
}
```
Optional env: TELEGRAM_BOT_TOKEN + TELEGRAM_CHAT_ID (critical-observation
feed), TS_VIEWER_PORT (web viewer), TS_AUTO_EXTRACT=1 + TS_API_KEY
(LLM auto-extraction), TOKEN_SAVIOR_PROFILE (full / core / nav / lean /
ultra — filters advertised tool set to shrink the per-turn MCP manifest).
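Optional variables go in the same `env` block. For example, a config enabling the web viewer and a slimmer profile might look like this (the port value and project path are illustrative, not defaults):

```json
{
  "mcpServers": {
    "token-savior-recall": {
      "command": "/path/to/venv/bin/token-savior",
      "env": {
        "WORKSPACE_ROOTS": "/path/to/project",
        "TOKEN_SAVIOR_CLIENT": "claude-code",
        "TOKEN_SAVIOR_PROFILE": "nav",
        "TS_VIEWER_PORT": "8765"
      }
    }
  }
}
```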
Tools (90)
Category counts — full catalog is served via MCP tools/list.
| Category | Count |
|---|---|
| Core navigation | 14 |
| Dependencies & graph | 9 |
| Git & diffs | 5 |
| Safe editing | 8 |
| Checkpoints | 6 |
| Test & run | 6 |
| Config & quality | 8 |
| Docker & multi-project | 2 |
| Advanced context (slicing, packing, RWR, prefetch, verify) | 6 |
| Memory engine | 21 |
| Reasoning (plan/decision traces) | 3 |
| Stats, budget, health | 10 |
| Project management | 7 |
Profiles
TOKEN_SAVIOR_PROFILE filters the advertised tools/list payload while
keeping handlers live.
| Profile | Advertised | ~Tokens | Use case |
|---|---|---|---|
| `full` (default) | 106 | ~10 950 | All capabilities |
| `core` | 54 | ~5 800 | Daily coding, no memory engine |
| `nav` | 28 | ~3 100 | Read-only exploration |
| `lean` | 59 | ~6 620 | Memory engine off — used in tsbench v2 |
| `ultra` | 17 | ~2 740 | Hot tools only + `ts_extended` meta-tool for lazy discovery |
Progressive disclosure — memory search
Three layers, increasing cost. Always start at Layer 1. Escalate only if the previous layer paid off. Full contract: docs/progressive-disclosure.md.
| Layer | Tool | Tokens/result | When |
|---|---|---|---|
| 1 | `memory_index` | ~15 | Always first |
| 2 | `memory_search` | ~60 | If Layer 1 matched |
| 3 | `memory_get` | ~200 | If Layer 2 confirmed |
Each Layer 1 row ends with [ts://obs/{id}] — pass it straight to Layer 3.
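From the client's side, the contract amounts to a short escalation loop. The stub below is a toy illustration of that flow only: the real layers are MCP tool calls, and the stub's substring matching and in-memory store are invented for the example:

```python
class MemoryStub:
    """Toy stand-in for the three memory tools (real ones are MCP calls)."""
    def __init__(self, store):
        self.store = store  # {ts://obs/{id} URI: observation text}

    def index(self, q):   # Layer 1: cheap shortlist of citation URIs
        return [uri for uri, text in self.store.items() if q in text]

    def search(self, q):  # Layer 2: ranked hits (here: same filter, for brevity)
        return self.index(q)

    def get(self, uri):   # Layer 3: full observation body
        return self.store[uri]

def recall(query, memory):
    shortlist = memory.index(query)        # Layer 1 — always first (~15 tok/result)
    if not shortlist:
        return None                        # no match: stop, never escalate
    ranked = memory.search(query)          # Layer 2 — only because Layer 1 matched
    return memory.get(ranked[0]) if ranked else None  # Layer 3 — full detail
```

The point of the contract: a miss costs only the Layer 1 shortlist, and the `ts://obs/{id}` URI from Layer 1 can skip straight to Layer 3 when the shortlist line is already conclusive.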
Links
- Site — https://mibayy.github.io/token-savior/
- Repo — https://github.com/Mibayy/token-savior
- PyPI — https://pypi.org/project/token-savior-recall/
- Benchmark — https://github.com/Mibayy/tsbench
- Changelog — CHANGELOG.md
- Progressive disclosure — docs/progressive-disclosure.md
License
MIT — see LICENSE.
Works with any MCP-compatible AI coding tool. Claude Code · Cursor · Windsurf · Cline · Continue · any custom MCP client