Scan codebases for LLM API calls and estimate monthly costs. Compare costs between git refs to catch cost regressions during code review.
Prevent LLM cost regressions before production.
CI PyPI version GitHub Marketplace License: MIT Python 3.10+ tokentoll MCP server
tokentoll is a CI gate for LLM cost. It statically analyzes Python, JavaScript, and TypeScript for LLM API calls, scores every pull request against a policy you control, and posts a PASS/WARN/FAIL verdict directly on the PR. Optionally, it fails the workflow when the policy is violated, so cost regressions cannot be merged.
Jwrede/tokentoll-demo is a small polyglot LLM app (Python + TypeScript) wired up to the tokentoll cost gate. Two PRs are already open against it; open each PR's conversation tab to see the verdict comment tokentoll actually posts.
When a PR violates your policy, tokentoll comments with a verdict and a blocking-findings list, then exits non-zero so the check fails. Example:
```markdown
## tokentoll verdict: FAIL

**Blocking findings (2):**

- `src/agent.py:42` - per-call cost grew 15.0x (threshold 5x)
- total monthly delta +$812.00 exceeds budget $250.00

> Required action: revert the regression, raise the threshold in `.tokentoll.yml`, or add an exemption.
```
When the PR is clean, the verdict is PASS and the comment shows only the cost delta table. When no policy is configured, tokentoll posts an informational delta comment with no verdict.
Add .github/workflows/tokentoll.yml:
```yaml
name: tokentoll
on:
  pull_request:
    paths:
      - "**.py"
      - "**.ts"
      - "**.tsx"
      - "**.js"
      - "**.jsx"
permissions:
  contents: read
  pull-requests: write
jobs:
  cost-gate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - uses: Jwrede/[email protected]
        with:
          fail-on-policy-violation: true
```
Then add .tokentoll.yml to your repo root:
```yaml
budgets:
  max_monthly_delta_usd: 250
  max_callsite_monthly_usd: 100
  max_relative_increase: 5.0
policies:
  block_unknown_models: true
  fail_on_policy_violation: true
```
Future PRs receive a verdict comment. PRs that exceed the thresholds fail the workflow.
For SHA-pinned installs and minimal-permissions setups, see docs/github-action.md. For the full policy schema, see docs/policy.md. For the security posture, see docs/security.md.
Python
| SDK | Patterns |
|---|---|
| OpenAI | chat.completions.create, responses.create |
| Anthropic | messages.create, messages.stream |
| Google GenAI | models.generate_content |
| LiteLLM | completion, acompletion |
| LangChain | ChatOpenAI, ChatAnthropic, init_chat_model |
| Zhipu AI | ZhipuAiClient, ZhipuAI (GLM models) |
JavaScript / TypeScript (parsed via tree-sitter, handles .js, .jsx, .ts, .tsx)
| SDK | Patterns |
|---|---|
| OpenAI Node SDK | client.chat.completions.create, client.responses.create, client.embeddings.create |
| Anthropic SDK | client.messages.create, client.messages.stream |
| Vercel AI SDK | generateText, streamText, generateObject, streamObject, embed, embedMany |
| LangChain.js | new ChatOpenAI, new ChatAnthropic, new ChatGoogleGenerativeAI, ... |
| OpenAI-compatible | same shape as OpenAI Node SDK, picked up automatically |
The policy block in .tokentoll.yml controls when a PR fails:
| Rule | Trigger |
|---|---|
| `budgets.max_monthly_delta_usd` | total estimated monthly delta exceeds the threshold |
| `budgets.max_callsite_monthly_usd` | any new or changed call site exceeds the threshold |
| `budgets.max_relative_increase` | per-call cost for any modified call site grows by more than this multiplier |
| `policies.block_unknown_models` | any new or modified call site uses an unpriced or unresolved model |
| `policies.fail_on_policy_violation` | `tokentoll diff` exits 1 on FAIL (CI gate behavior) |
Each rule is independent. Leave a field unset to disable that rule. Full reference in docs/policy.md.
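The rules above can be read as a set of independent checks over the changed call sites. The following is a minimal sketch of that idea, not tokentoll's actual implementation: the `CallSiteChange` and `Policy` classes and the `evaluate` function are hypothetical names introduced for illustration, and the WARN verdict is left out because its trigger is not described here.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class CallSiteChange:
    """One new or modified call site (hypothetical shape, for illustration)."""
    location: str
    old_cost_per_call: Optional[float]  # None for a newly added call site
    new_cost_per_call: float
    new_monthly_usd: float
    model_priced: bool = True


@dataclass
class Policy:
    """Mirrors the documented fields; None (or False) disables a rule."""
    max_monthly_delta_usd: Optional[float] = None
    max_callsite_monthly_usd: Optional[float] = None
    max_relative_increase: Optional[float] = None
    block_unknown_models: bool = False


def evaluate(changes: List[CallSiteChange], total_monthly_delta_usd: float,
             policy: Policy) -> Tuple[str, List[str]]:
    """Return (verdict, blocking findings); every rule is checked on its own."""
    findings = []
    for c in changes:
        limit = policy.max_callsite_monthly_usd
        if limit is not None and c.new_monthly_usd > limit:
            findings.append(
                f"{c.location} - estimated monthly cost ${c.new_monthly_usd:.2f} "
                f"exceeds call-site budget ${limit:.2f}"
            )
        factor = policy.max_relative_increase
        # Only modified call sites with a non-zero baseline have a ratio.
        if factor is not None and c.old_cost_per_call:
            ratio = c.new_cost_per_call / c.old_cost_per_call
            if ratio > factor:
                findings.append(
                    f"{c.location} - per-call cost grew {ratio:.1f}x "
                    f"(threshold {factor:g}x)"
                )
        if policy.block_unknown_models and not c.model_priced:
            findings.append(f"{c.location} - model is unpriced or unresolved")
    budget = policy.max_monthly_delta_usd
    if budget is not None and total_monthly_delta_usd > budget:
        findings.append(
            f"total monthly delta +${total_monthly_delta_usd:.2f} "
            f"exceeds budget ${budget:.2f}"
        )
    return ("FAIL" if findings else "PASS"), findings


# Made-up numbers chosen to mirror the example verdict comment.
policy = Policy(max_monthly_delta_usd=250, max_callsite_monthly_usd=100,
                max_relative_increase=5.0)
change = CallSiteChange("src/agent.py:42", old_cost_per_call=0.001,
                        new_cost_per_call=0.015, new_monthly_usd=75.0)
verdict, findings = evaluate([change], 812.0, policy)
print(verdict)
for finding in findings:
    print("-", finding)
```

With an empty `Policy()` every check is skipped, which corresponds to leaving all fields unset.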
```bash
pip install tokentoll

# Scan current directory for LLM API calls and their costs
tokentoll scan .

# Show cost impact of your last commit
tokentoll diff HEAD~1

# Compare two refs and fail on policy violation
tokentoll diff main..HEAD --fail-on-policy-violation
```
Subcommands:
```
tokentoll scan   [PATH...] [--format table|json|markdown] [--calls-per-month N] [--config PATH]
tokentoll diff   [REF] [--base REF] [--head REF] [--format table|json|markdown|github-comment]
                 [--config PATH] [--fail-on-policy-violation]
tokentoll update # refresh bundled pricing data from LiteLLM
```
.tokentoll.yml lives in the repo root and is auto-discovered. Beyond the policy block:
```yaml
# Per-SDK defaults for dynamic (runtime-resolved) model names
default_models:
  openai: gpt-4o-mini
  anthropic: claude-haiku-3-20240307

# Assumed monthly call volume per call site (used for dollar estimates)
calls_per_month: 5000

# Skip cost estimation for dynamic models entirely.
# Default false: dynamic calls are priced against the per-SDK default.
skip_dynamic_models: false

# Default excludes (tests/, examples/, docs/, cookbook/, benchmarks/, evals/,
# scripts/, notebooks/) are applied automatically. Opt out with:
use_default_excludes: false

# Additional excludes (prefix or glob)
exclude:
  - "*_test.py"
  - vendor/

# Per-path overrides (longest prefix match)
overrides:
  - path: src/agents/
    default_model: gpt-4o
    calls_per_month: 10000
  - path: src/azure/
    skip_dynamic_models: true
```
Resolution order for dynamic model defaults: default_models (per-SDK) > default_model (generic) > built-in SDK defaults.
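The resolution order and the longest-prefix rule for `overrides` can be sketched in a few lines. This is an illustrative sketch only: `resolve_default_model`, `longest_prefix_override`, and the placeholder entries in `BUILTIN_SDK_DEFAULTS` are hypothetical, and how tokentoll combines a per-path `default_model` with the top-level settings is not specified here, so the two helpers are kept separate.

```python
from typing import Optional

# Placeholder values; the real built-in SDK defaults are not listed in this document.
BUILTIN_SDK_DEFAULTS = {
    "openai": "builtin-openai-model",
    "anthropic": "builtin-anthropic-model",
}


def resolve_default_model(sdk: str, config: dict) -> Optional[str]:
    """Pick the model used to price a call whose model name is dynamic."""
    per_sdk = config.get("default_models", {})
    if sdk in per_sdk:                    # 1. per-SDK default
        return per_sdk[sdk]
    if config.get("default_model"):       # 2. generic default
        return config["default_model"]
    return BUILTIN_SDK_DEFAULTS.get(sdk)  # 3. built-in SDK default


def longest_prefix_override(path: str, overrides: list) -> dict:
    """Return the override whose `path` is the longest prefix of `path`."""
    matches = [o for o in overrides if path.startswith(o["path"])]
    return max(matches, key=lambda o: len(o["path"]), default={})


config = {"default_models": {"openai": "gpt-4o-mini"}, "default_model": "gpt-4o"}
print(resolve_default_model("openai", config))     # per-SDK entry wins
print(resolve_default_model("anthropic", config))  # falls back to the generic default
print(resolve_default_model("anthropic", {}))      # falls back to the built-in placeholder

overrides = [
    {"path": "src/", "calls_per_month": 1000},
    {"path": "src/agents/", "default_model": "gpt-4o", "calls_per_month": 10000},
]
print(longest_prefix_override("src/agents/planner.py", overrides))
```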
tokentoll requires no API keys, sends no telemetry, and runs entirely inside your CI environment. Pricing data ships with the package and updates from LiteLLM on demand. For the recommended permission set, SHA pinning, and fork PR risk, see docs/security.md.
tokentoll ships an MCP (Model Context Protocol) server so Claude Code and other MCP hosts can check the cost impact of LLM code changes from inside an agent conversation:
```bash
# Quote the extra so shells such as zsh do not expand the brackets
pip install "tokentoll[mcp]"
claude mcp add --transport stdio tokentoll -- tokentoll-mcp
```
Two tools are exposed: scan (estimate costs across a path) and diff (compare two refs). Both return JSON.
```
Source code (.py, .ts, .tsx, .js, .jsx)
        |
        v
+----------------+   +------------------+
| AST scanners   |-->| SDK detectors    |
| ast (Python) + |   | OpenAI, Anthropic|
| tree-sitter    |   | Google, LiteLLM, |
| (JS/TS)        |   | LangChain, Zhipu,|
+----------------+   | Vercel AI SDK    |
                     +------------------+
                              |
                              v
                     +------------------+
                     | Pricing engine   |
                     | 2200+ models     |
                     +------------------+
                              |
                              v
                     +------------------+
                     | Diff engine      |
                     | (old vs new)     |
                     +------------------+
                              |
                              v
                     +------------------+
                     | Policy evaluator |
                     | PASS/WARN/FAIL   |
                     +------------------+
                              |
                              v
                     +------------------+
                     | PR comment / CLI |
                     | output           |
                     +------------------+
```
A multi-pass constant propagation engine resolves model names through variable assignments, os.getenv() / process.env.X fallbacks, function defaults, class attributes, constructor arguments, dict and object literals, **kwargs unpacking, and Vercel AI SDK provider wrappers (openai("gpt-4o")), so real-world code with indirection still produces useful estimates.
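To make the idea concrete, here is a deliberately small sketch of constant propagation over Python's `ast` module. It is not tokentoll's engine: it handles only module-level string assignments, name-to-name chains, and `os.getenv("X", fallback)`, and it is flow-insensitive (a later reassignment simply overwrites an earlier one). The function names `resolve` and `find_models` are hypothetical.

```python
import ast

SOURCE = '''
import os
DEFAULT_MODEL = "gpt-4o-mini"
MODEL = os.getenv("LLM_MODEL", DEFAULT_MODEL)
client.chat.completions.create(model=MODEL, messages=[])
'''


def resolve(node, env):
    """Resolve an expression to a string constant, or None if it stays dynamic."""
    if isinstance(node, ast.Constant) and isinstance(node.value, str):
        return node.value
    if isinstance(node, ast.Name):
        return env.get(node.id)
    # something.getenv("X", fallback): use the fallback as the static value
    if (isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "getenv"
            and len(node.args) == 2):
        return resolve(node.args[1], env)
    return None


def find_models(source, max_passes=5):
    """Return the resolved `model=` argument of every call in `source`."""
    tree = ast.parse(source)  # parsed only, never executed
    env = {}
    # Repeat until no new names resolve, so chains like A -> B -> "literal" settle.
    for _ in range(max_passes):
        before = len(env)
        for node in ast.walk(tree):
            if (isinstance(node, ast.Assign)
                    and len(node.targets) == 1
                    and isinstance(node.targets[0], ast.Name)):
                value = resolve(node.value, env)
                if value is not None:
                    env[node.targets[0].id] = value
        if len(env) == before:
            break
    models = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            for kw in node.keywords:
                if kw.arg == "model":
                    models.append(resolve(kw.value, env))
    return models


print(find_models(SOURCE))
```

A call such as `f(model=pick_model())` resolves to `None` in this sketch, which is the case the `default_models` and `skip_dynamic_models` settings are meant to cover.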
Pricing is bundled and works offline. To refresh from LiteLLM:
```bash
tokentoll update
```
Coverage: 300+ models across OpenAI, Anthropic, Google, AWS Bedrock, Azure, and more, plus 2200+ entries from LiteLLM's combined catalog.
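As a rough illustration of how a per-token price turns into a monthly figure, the sketch below multiplies token counts by per-token prices and then by a monthly call volume. The field names `input_cost_per_token` and `output_cost_per_token` follow LiteLLM's public pricing JSON; whether tokentoll reads them exactly this way is an assumption, and the prices, token counts, and helper names are made-up placeholders.

```python
def cost_per_call(price_entry: dict, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one call for the given token counts."""
    return (input_tokens * price_entry["input_cost_per_token"]
            + output_tokens * price_entry["output_cost_per_token"])


def monthly_cost(per_call_usd: float, calls_per_month: int) -> float:
    """Scale a per-call estimate by the assumed monthly call volume."""
    return per_call_usd * calls_per_month


# Placeholder prices: $1 per million input tokens, $2 per million output tokens.
entry = {"input_cost_per_token": 1e-06, "output_cost_per_token": 2e-06}
per_call = cost_per_call(entry, input_tokens=1000, output_tokens=500)
print(f"per call: ${per_call:.4f}")
print(f"monthly:  ${monthly_cost(per_call, calls_per_month=5000):.2f}")
```

Because the monthly figure is a straight multiple of `calls_per_month`, the per-path overrides for call volume change the dollar estimate proportionally.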
- […] (default).
- […] pip install tokentoll[tiktoken]).
- […] calls_per_month or per-path with overrides.

License: MIT