MCP middleware proxy that compresses tools/list, prompts/list, and resources/list responses to save tokens. Sister project to Brevix.

Install · Usage · Cost Routing · Accuracy Guard · Benchmarks · Roadmap · Contributing
Cuts response tokens 40–75% with a deterministic rule engine. Routes each task to the cheapest capable Claude tier — saving up to ~80% of API spend. Verifies meaning is preserved before emit. Works across 20+ AI coding tools.
Brevix is two cost-saving layers in one tool: a compression engine and a smart model router.
Works with Claude Code · Cursor · Windsurf · OpenAI Codex CLI · Google Antigravity · Gemini CLI · GitHub Copilot Chat · Aider · Continue.dev · Cline · Roo Code · Zed AI · Augment · Kilo · OpenHands · Tabnine · Warp · Replit · Sourcegraph Amp — plus any tool reading AGENTS.md.
What it does: Stops you from paying Opus prices for tasks Haiku can solve in a single shot.
Most LLM coding tools call the most expensive model for every prompt. A task like "classify this support ticket" hits Opus at $0.05 per call, when Haiku could answer it for $0.0002 — 250× cheaper, identical answer.
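The arithmetic behind that claim, using the per-call figures above (illustrative numbers, not live API pricing):

```python
# Per-call costs from the example above (illustrative, not live API pricing).
OPUS_COST_USD = 0.05
HAIKU_COST_USD = 0.0002

ratio = OPUS_COST_USD / HAIKU_COST_USD
print(f"Haiku is {ratio:.0f}x cheaper per classify call")
# -> Haiku is 250x cheaper per classify call
```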
Brevix Route is a thin layer in front of the Anthropic SDK that routes every call through a task → model rule table you control:

```python
from brevix import RoutedClient

client = RoutedClient(log_enabled=True)

# Cheap task -> auto-routed to Haiku.
r = client.call("Classify this ticket: app crashes on login")
print(r.text, r.model, r.cost_usd)
# > "bug" claude-haiku-4-5 0.000028

# Hard task -> auto-routed to Opus.
r = client.call("Architect a multi-region payment system with sub-100ms p99 latency")
print(r.model)
# > claude-opus-4-7

# Confidence-guarded mode: retries on the next tier only if the first answer hedges.
r = client.call("Review this auth middleware for token expiry bugs", confidence_check=True)
print(r.model, r.escalations, r.confidence)
# > claude-opus-4-7 1 0.94
```
```shell
brevix route --init                                     # one-time: write default config
brevix route "classify ticket: app crashes" --explain   # dry-run, show decision + cost
brevix route "..." --call                               # actually call the model
brevix route "..." --call --confidence                  # call + escalate on low conf
brevix route --budget-show                              # current spend
brevix stats --routing --since 7d                       # weekly cost-saved report
brevix route --learn-suggest                            # recommend rule changes
brevix route --learn-apply                              # apply them
```
After /plugin install brevix@brevix:
```
/brevix-route classify this support ticket: "app crashes on login"
↓
[brevix] task=classify tier=haiku escalated=0
bug

/brevix-route architect a multi-region payment system with sub-100ms p99 latency
↓
[brevix] task=architecture tier=opus escalated=0
<full Opus response>

/brevix-route-stats --since 7d   # cost saved vs Opus-only baseline
/brevix-learn                    # suggested rule changes
/brevix-learn apply              # apply them
```
Edit ~/.brevix/route.json:
```json
{
  "budget": { "tokens": 0, "cost_usd": 50.00 }
}
```
When the cap is hit, the next call raises BudgetExceededError instead of silently overspending.
Or per-run from the CLI:
```shell
brevix route --budget-cost 5.00 --call "your prompt"
```
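A minimal sketch of how such a cap can work — a hypothetical reimplementation for illustration, not Brevix's actual class:

```python
class BudgetExceededError(Exception):
    """Raised when cumulative spend would pass the configured cap."""

class BudgetGuard:
    def __init__(self, cost_cap_usd: float):
        self.cost_cap_usd = cost_cap_usd
        self.spent_usd = 0.0

    def charge(self, call_cost_usd: float) -> None:
        # Refuse the call instead of silently overspending.
        if self.spent_usd + call_cost_usd > self.cost_cap_usd:
            raise BudgetExceededError(
                f"${self.spent_usd + call_cost_usd:.2f} would exceed "
                f"cap ${self.cost_cap_usd:.2f}"
            )
        self.spent_usd += call_cost_usd

guard = BudgetGuard(cost_cap_usd=5.00)
guard.charge(4.99)           # fine
try:
    guard.charge(0.02)       # would cross the cap -> raises
except BudgetExceededError as e:
    print("blocked:", e)
```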
| Workload | Naive (Opus only) | Brevix routed | Saved |
|---|---|---|---|
| Solo dev, 50 calls/day, mostly easy tasks | $0.40/day | $0.06/day | 85% |
| Team of 5, 500 calls/day, mixed | $4.00/day | $0.80/day | 80% |
| Customer-support bot, 10k tickets/day, 90% classify | $80/day | $1.50/day | 98% |
Same answer quality — confidence guard ensures hard cases still hit Opus.
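The table's savings columns are easy to reproduce; for example, the solo-dev row (per-day figures taken from the table above):

```python
naive, routed = 0.40, 0.06   # $/day, solo-dev row from the table above
saved_pct = (naive - routed) / naive * 100
print(f"saved {saved_pct:.0f}%")  # -> saved 85%
```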
See the full Routing tutorial in the Usage section for advanced flags, force-tier overrides, and the auto-tuning learn loop.
Compression Engine · Safety & Verification · Integrations · Insights · 💰 Cost Management & Model Routing (new)
```shell
# macOS / Linux
curl -fsSL https://raw.githubusercontent.com/Yash-Koladiya30/brevix/main/install.sh | bash -s -- --all

# Windows (PowerShell)
irm https://raw.githubusercontent.com/Yash-Koladiya30/brevix/main/install.ps1 | iex
```
skills CLI — one command, 9 tools at once. Auto-installs Brevix skills into Antigravity, Claude Code, Cline, Codex, Cursor, Gemini CLI, GitHub Copilot, Kiro CLI, and Qoder simultaneously.

```shell
npx skills add https://github.com/Yash-Koladiya30/brevix
```
Pick a specific skill:
```shell
npx skills add https://github.com/Yash-Koladiya30/brevix --skill brevix
npx skills add https://github.com/Yash-Koladiya30/brevix --skill brevix-commit
npx skills add https://github.com/Yash-Koladiya30/brevix --skill brevix-stats
```
Listing → skills.sh/Yash-Koladiya30/brevix
```shell
pip install brevix            # core
pip install 'brevix[guard]'   # + semantic Accuracy Guard
pip install 'brevix[tokens]'  # + accurate tiktoken counts
pip install 'brevix[all]'     # everything
```
```shell
brevix install --list         # show all 20 targets
brevix install claude-code    # Claude Code plugin layout
brevix install cursor         # .cursor/rules/brevix.mdc
brevix install codex          # AGENTS.md + .codex/hooks.json
brevix install gemini         # gemini-extension.json + GEMINI.md
brevix install all            # write rule files for every tool
```
Idempotent — re-running updates the Brevix block, leaves your other content alone.
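A common way to get that idempotency is to rewrite only a delimited block and leave everything else alone. A sketch under assumed marker strings (Brevix's real markers may differ):

```python
import re

START, END = "# --- brevix:start ---", "# --- brevix:end ---"

def upsert_block(text: str, body: str) -> str:
    """Replace the marked Brevix block, or append one; other content is untouched."""
    block = f"{START}\n{body}\n{END}"
    pattern = re.compile(re.escape(START) + r".*?" + re.escape(END), re.DOTALL)
    if pattern.search(text):
        return pattern.sub(block, text)
    return text.rstrip("\n") + "\n\n" + block + "\n"

doc = "my rules\n"
doc = upsert_block(doc, "rule v1")
doc = upsert_block(doc, "rule v2")   # re-run: updates in place, no duplicate block
print(doc)
```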
```
/plugin marketplace add Yash-Koladiya30/brevix
/plugin install brevix@brevix
```
Compress upstream MCP server descriptions:
```shell
npm install -g brevix-shrink
```
Wrap any MCP server in your Claude config:
```json
{
  "mcpServers": {
    "fs-shrunk": {
      "command": "npx",
      "args": ["brevix-shrink", "npx", "-y",
               "@modelcontextprotocol/server-filesystem", "/tmp"]
    }
  }
}
```
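The proxy idea is simple: sit between client and server and shrink the verbose description fields in list responses before they reach the model. A toy sketch of that transformation — hypothetical truncation logic, not brevix-shrink's real ruleset:

```python
import json

def shrink_tools_list(response: dict, max_len: int = 60) -> dict:
    """Truncate tool descriptions in a tools/list result to save tokens."""
    for tool in response.get("result", {}).get("tools", []):
        desc = tool.get("description", "")
        if len(desc) > max_len:
            tool["description"] = desc[:max_len].rstrip() + "…"
    return response

resp = {"jsonrpc": "2.0", "id": 1, "result": {"tools": [
    {"name": "read_file",
     "description": "Reads the complete contents of a file from the filesystem, "
                    "supporting text encodings and returning the data as a string."}
]}}
print(json.dumps(shrink_tools_list(resp)))
```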
```shell
# Output compression
/brevix            # toggle on (full mode)
/brevix lite       # gentle compression
/brevix ultra      # max compression
/brevix auto       # pick best mode per response
/brevix off        # disable
/brevix-commit     # terse Conventional Commit message
/brevix-check      # run Accuracy Guard on a snippet
/brevix-stats      # show compression savings

# Smart model routing (saves API cost)
/brevix-route <task>     # route to cheapest capable tier (Haiku/Sonnet/Opus)
/brevix-route-stats      # cost saved vs Opus-only baseline
/brevix-learn            # suggested rule changes from observed escalations
/brevix-learn apply      # apply suggestions to ~/.brevix/route.json
```
For Codex CLI (no slash commands), use $brevix lite|full|ultra|auto|off.
```shell
# Output compression
brevix compress "Your verbose text here" --mode full
brevix compress -                        # stdin
brevix compress . --mode auto -v         # adaptive picks best
brevix compress . --guard --strict --threshold 0.85

# File compression (CLAUDE.md, AGENTS.md, project notes)
brevix compress-file CLAUDE.md           # writes .original.md backup
brevix compress-file CLAUDE.md --dry-run

# Stats
brevix stats                             # estimated, in-process
brevix stats --real --since 7d           # parsed from Claude Code session logs
brevix stats --share                     # tweet-ready one-liner
brevix stats --reset

# Verification
brevix check "original" "compressed"
brevix count "how many tokens?"

# Install rules into a project
brevix install cursor
brevix install --list

# Smart model routing
brevix route --init                      # write default config to ~/.brevix/route.json
brevix route "classify ticket: app crashes"   # print suggested model
brevix route "..." --explain             # task, model, est cost, reason
brevix route "..." --call                # actually call the chosen model
brevix route "..." --call --confidence   # also escalate on low-confidence answers
brevix route --budget-show               # current spend vs cap
brevix route --budget-tokens 1000000 --budget-cost 50.00 "..." --call
brevix stats --routing                   # cost saved, escalation rate, by model/task
brevix stats --routing --since 7d
brevix route --learn-suggest             # recommend rule changes
brevix route --learn-apply               # apply them
```
agents/ ships six focused subagents that emit ~60% smaller tool results than vanilla agents:
| Agent | Purpose | Output format |
|---|---|---|
| brevix-investigator | Read-only code locator | path:line — symbol — note |
| brevix-builder | Surgical 1–2 file edits with verification | Diff + verify status |
| brevix-reviewer | Bug-focused diff review | path:line: 🔴 bug: …. fix. |
| brevix-haiku | Cheap tier — classify, parse, format, rename | Brief, exact answer |
| brevix-sonnet | Mid tier — review, refactor, debug, explain | Direct, file:line cited |
| brevix-opus | Heavy tier — architecture, multi-agent, hard bugs | Decision + reasoning chain |
/brevix-route — save ~80% on Claude API cost without manually picking a model per task.
1. Install Brevix and the Anthropic SDK
```shell
pip install brevix anthropic
export ANTHROPIC_API_KEY=sk-ant-...
```
2. Install the Claude Code plugin
```
/plugin marketplace add Yash-Koladiya30/brevix
/plugin install brevix@brevix
```
This adds the routing-tier subagents (brevix-haiku / brevix-sonnet / brevix-opus) and the /brevix-route, /brevix-route-stats, /brevix-learn slash commands.
3. Initialize the routing config
```shell
brevix route --init
```
Writes ~/.brevix/route.json with sensible defaults: classify→Haiku, code review→Sonnet, architecture→Opus, escalation chain Haiku→Sonnet→Opus, no budget cap.
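The written file looks roughly like this — field names are an illustration inferred from the defaults described above, not the exact schema:

```json
{
  "rules": {
    "classify": "claude-haiku-4-5",
    "code_review": "claude-sonnet-4-6",
    "architecture": "claude-opus-4-7"
  },
  "escalation_chain": ["claude-haiku-4-5", "claude-sonnet-4-6", "claude-opus-4-7"],
  "budget": { "tokens": 0, "cost_usd": 0 }
}
```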
4. Use it from Claude Code
```
/brevix-route classify this support ticket: "app crashes on login"
↓
[brevix] task=classify tier=haiku escalated=0
bug

/brevix-route architect a multi-region payment system with sub-100ms p99 latency
↓
[brevix] task=architecture tier=opus escalated=0
<full Opus response>
```
5. Check what you saved
```
/brevix-route-stats --since 7d
```
How it works:
- /brevix-route <task> runs brevix route ... --explain under the hood.
- It delegates to the routing-tier subagents (brevix-haiku / brevix-sonnet / brevix-opus) via the Task tool.
- If an answer starts with escalate: <reason>, it retries on the next tier (capped at 1 retry per tier).
- Every decision is logged to ~/.brevix/routing_log.jsonl for stats.

Force a tier when you already know the right one:
```
/brevix-route --force-tier=sonnet refactor this 200-line module
```
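The escalate-and-retry behavior can be sketched as a plain loop. This is an illustrative reimplementation with a stand-in model function, not the plugin's actual code:

```python
TIERS = ["haiku", "sonnet", "opus"]

def fake_model(tier: str, prompt: str) -> str:
    # Stand-in for a real API call: the cheap tier hedges on hard prompts.
    if tier == "haiku" and "architecture" in prompt:
        return "escalate: needs deeper reasoning"
    return f"answer from {tier}"

def route(prompt: str, start_tier: str = "haiku") -> tuple[str, str, int]:
    escalations = 0
    i = TIERS.index(start_tier)
    while True:
        answer = fake_model(TIERS[i], prompt)
        # An "escalate:" reply bumps to the next tier, if one exists.
        if answer.startswith("escalate:") and i + 1 < len(TIERS):
            i += 1
            escalations += 1
            continue
        return answer, TIERS[i], escalations

print(route("design the architecture"))  # hard prompt: haiku hedges, sonnet answers
```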
/brevix-route-stats

```
/brevix-route-stats              # all-time
/brevix-route-stats --since 7d   # last week
```
Sample output:
```
Brevix Routing Stats
--------------------
Window: 7d
Calls: 412
Total cost: $1.84
Opus-only baseline: $9.67
Saved: $7.83 (80.9%)
Escalations: 18 (4.4% of calls)

By model:
  claude-haiku-4-5    289 (70.1%)   $0.08
  claude-sonnet-4-6   108 (26.2%)   $0.65
  claude-opus-4-7      15 ( 3.7%)   $1.11
```
/brevix-learn

After a few days, Brevix recommends rule changes from observed escalation patterns:
```
/brevix-learn

refactor: claude-sonnet-4-6 -> claude-opus-4-7
  samples=54  escalation_rate=68.5%
  reason: 37/54 escalated (69%); 32 of those landed on claude-opus-4-7
```
Apply with /brevix-learn apply to update ~/.brevix/route.json. User customizations and budget caps are preserved.
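Conceptually the learn step is just counting: group logged calls by task and flag rules whose escalation rate is high. A sketch over a hypothetical log shape (the real file is ~/.brevix/routing_log.jsonl; its actual fields may differ):

```python
from collections import defaultdict

# Hypothetical routing-log records for illustration.
log = [
    {"task": "refactor", "escalated": 1},
    {"task": "refactor", "escalated": 1},
    {"task": "refactor", "escalated": 0},
    {"task": "classify", "escalated": 0},
]

def suggest(entries, threshold=0.5):
    stats = defaultdict(lambda: [0, 0])          # task -> [escalations, calls]
    for e in entries:
        stats[e["task"]][0] += e["escalated"]
        stats[e["task"]][1] += 1
    # Suggest bumping the tier for tasks that escalate more often than the threshold.
    return [t for t, (esc, n) in stats.items() if esc / n > threshold]

print(suggest(log))  # -> ['refactor']
```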
Edit ~/.brevix/route.json:

```json
{
  "budget": { "tokens": 0, "cost_usd": 50.00 }
}
```

Or per-run from the CLI:

```shell
brevix route --budget-cost 5.00 --call "your prompt"
```

When the cap is hit, the next call raises BudgetExceededError instead of silently overspending.
- Similarity ≥ threshold (default 0.85) → emit compressed. Otherwise warn, or in --strict mode fall back to the original.
- No sentence-transformers installed → falls back to content-word containment (drops stopwords without penalty, fair to compression).

Result: compression you can trust on production code, specs, and contracts.
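The content-word containment fallback can be sketched as: drop stopwords from the original, then score the fraction of the remaining content words that survive in the compressed text. An illustrative version with a toy stopword list (not Brevix's actual scorer):

```python
STOPWORDS = {"the", "a", "an", "is", "are", "to", "of", "in", "on", "this", "that", "so"}

def containment(original: str, compressed: str) -> float:
    content = {w for w in original.lower().split() if w not in STOPWORDS}
    kept = set(compressed.lower().split())
    # Stopwords were excluded up front, so dropping them costs nothing.
    return len(content & kept) / len(content) if content else 1.0

score = containment("wrap the object in useMemo", "wrap object useMemo")
print(f"{score:.2f}")  # -> 1.00
```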
Before
The reason your React component is re-rendering on every parent update is that you are passing an inline object as a prop. In JavaScript, every render creates a new object reference, even if the contents are identical. To fix this, wrap the object in useMemo so the reference stays stable across renders.
After (full mode)
Inline object prop = new ref each render = re-render. Wrap in useMemo.
Tokens saved: ~75% · Meaning preserved: ✅ similarity 0.91
Reproducible three-arm A/B harness in evals/. Compares no-system-prompt vs "be terse" control vs Brevix on 10 developer prompts.
| arm | n | median | mean | total | vs baseline | vs control |
|---|---|---|---|---|---|---|
| baseline | 10 | 221 | 247.3 | 2473 | — | — |
| control | 10 | 178 | 191.6 | 1916 | 22.5% | — |
| brevix | 10 | 119 | 128.4 | 1284 | 48.1% | 33.0% |
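The percentage columns follow directly from the totals; checking the brevix row:

```python
baseline_total, control_total, brevix_total = 2473, 1916, 1284
vs_baseline = (baseline_total - brevix_total) / baseline_total * 100
vs_control = (control_total - brevix_total) / control_total * 100
print(f"vs baseline: {vs_baseline:.1f}%  vs control: {vs_control:.1f}%")
# -> vs baseline: 48.1%  vs control: 33.0%
```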
Run yourself:
```shell
pip install 'brevix[all]' anthropic
export ANTHROPIC_API_KEY=...
python evals/llm_run.py --model claude-sonnet-4-6
python evals/measure.py
```
The vs control column is the honest savings — what Brevix adds beyond "just be brief."
Shipped:
- File compression (brevix compress-file)
- MCP list-response compression proxy (brevix-shrink)
- Budget caps (BudgetExceededError)
- Routing cost reports (brevix stats --routing)
- Auto-tuning learn loop (brevix route --learn-suggest | --learn-apply)

MIT — free for personal and commercial use. Issues and PRs welcome — see docs/CONTRIBUTING.md.
Or add the shrink proxy via the Claude CLI:

```shell
claude mcp add brevix-shrink -- npx -y brevix-shrink
```