Entroly is a high-performance context engineering engine that optimizes AI coding agent context windows through information-theoretic compression and reinforcement learning. It gives your AI 100% codebase visibility while reducing token consumption by an average of 78%. It learns fully autonomously.
🇨🇳 中文 • 🇯🇵 日本語 • 🇰🇷 한국어 • 🇧🇷 Português • 🇪🇸 Español • 🇩🇪 Deutsch • 🇫🇷 Français • 🇷🇺 Русский • 🇮🇳 हिन्दी • 🇹🇷 Türkçe
npm install entroly-wasm && npx entroly-wasm | pip install entroly && entroly go | 📊 Live Dashboard → | Live demo →
Every AI coding tool — Claude, Cursor, Copilot, Codex — has the same blind spot: it only sees 5–10 files at a time. The other 95% of your codebase is invisible. This causes hallucinated APIs, broken imports, missed dependencies, and wasted developer hours fixing AI-generated mistakes.
Models keep getting bigger — Claude Opus 4.7 just dropped with even more capability and even higher per-token costs. Larger context windows don't solve the problem; they make it worse. You're paying for 186,000 tokens per request — most of which is duplicated boilerplate.
Entroly fixes both problems in 30 seconds. It compresses your entire codebase into the AI context window at variable resolution, so your AI sees everything — and you pay for almost none of it.
| Metric | Before Entroly | After Entroly |
|---|---|---|
| Files visible to AI | 5–10 | Your entire codebase |
| Tokens per request | ~186,000 | 9,300 – 55,000 |
| Monthly AI spend (at 1K req/day) | ~$16,800 | $840 – $5,040 |
| AI answer accuracy | Incomplete, often hallucinated | Dependency-aware, correct |
| Developer time fixing AI mistakes | Hours/week | Near zero |
| Setup | Days of prompt engineering | 30 seconds |
ROI example: a 10-person team spending $15K/month on AI API calls saves $10K–$14K/month starting on day 1. And because Entroly is free and open-source, there is no cost to recoup; the savings are pure.
The teams adopting Entroly today aren't just saving money; they're compounding an advantage. The federation effect means early adopters get smarter faster, and every day you wait, the gap widens.
pip install entroly && entroly go
Or wrap your coding agent — one command:
entroly wrap claude # Claude Code
entroly wrap cursor # Cursor
entroly wrap codex # Codex CLI
entroly wrap aider # Aider
entroly wrap copilot # GitHub Copilot
Or use the proxy — zero code changes, any language:
entroly proxy --port 9377
ANTHROPIC_BASE_URL=http://localhost:9377 your-app
OPENAI_BASE_URL=http://localhost:9377/v1 your-app
Drop it into your own code — two lines:
from entroly import compress, compress_messages
# Compress any content (code, JSON, logs, prose)
compressed = compress(api_response, budget=2000)
# Or compress a full LLM conversation
messages = compress_messages(messages, budget=30000)
What happens under the hood:
Your AI now sees 100% of your codebase. You pay for 5–30% of the tokens.
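One ingredient of that variable-resolution idea can be sketched in a few lines. This is a toy illustration of the "Skeleton" resolution concept (keep signatures, drop bodies), not Entroly's actual Rust implementation; the `skeleton` helper is hypothetical:

```python
import ast

def skeleton(source: str) -> str:
    """Render Python source at 'Skeleton' resolution: signatures only, bodies elided."""
    tree = ast.parse(source)
    lines = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            args = ", ".join(a.arg for a in node.args.args)
            lines.append(f"def {node.name}({args}): ...")
    return "\n".join(lines)

src = "def login(user, pw):\n    token = hash(pw)\n    return token\n"
print(skeleton(src))  # -> def login(user, pw): ...
```

A three-line function shrinks to one line while the AI still sees that `login` exists and what it takes; the full body remains retrievable on demand.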
Most "self-improving" AI tools burn tokens to learn — your bill grows with their intelligence. Entroly's learning loop is provably token-negative: it cannot spend more on learning than it saves you.
The math is simple and auditable:
Learning budget ≤ 5% × Lifetime savings
Day 1: 70% token savings. Day 30: 85%+. Day 90: 90%+. The improvement costs you $0.
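A minimal sketch of that spending invariant (the function and names are illustrative, not Entroly's API):

```python
TAU = 0.05  # learning may consume at most 5% of lifetime savings

def can_spend(spent: float, proposed: float, lifetime_savings: float) -> bool:
    """Enforce the invariant C_spent <= tau * S(t) before any learning spend."""
    return spent + proposed <= TAU * lifetime_savings

# With $100 saved so far, at most $5 total may ever go to learning:
assert can_spend(0.0, 4.0, 100.0)      # allowed: 4 <= 5
assert not can_spend(4.0, 2.0, 100.0)  # blocked: 6 > 5
```

Because the cap is a fraction of realized savings, learning spend can never outrun what Entroly has already saved you.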
Now take the Dreaming Loop and multiply it by every developer on Earth who runs Entroly.
While you sleep, your daemon dreams — and so do 10,000 others. Each one discovers slightly different tricks for compressing code. Each one shares what it learned — anonymously, privately, no code ever leaves your machine. Each one absorbs what the others found.
You wake up. Your AI is smarter than when you left it. Not because of anything you did — because of what the swarm dreamed.
Your daemon dreams → discovers a better strategy → shares it (anonymously)
↓
10,000 other daemons did the same thing last night
↓
You open your laptop → your AI already absorbed all of it
Network effect:
# Opt-in — your choice, always
export ENTROLY_FEDERATION=1
LLM responses contain ~40% filler — "Sure, I'd be happy to help!", hedging, meta-commentary. Entroly strips it. Code blocks are never touched.
Before: "Sure! I'd be happy to help. Let me take a look at your code.
The issue is in the auth module. Hope this helps!"
After: "The issue is in the auth module."
→ 70% fewer output tokens
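A toy version of that filler-stripping pass, reproducing the before/after example above. The phrase list is illustrative; Entroly's distiller is more general, but the guarantee shown here is the same: fenced code blocks pass through untouched.

```python
import re

FILLER = [
    r"Sure[,!]? I'd be happy to help\.?\s*",
    r"Let me take a look at your code\.?\s*",
    r"Hope this helps!?\s*",
]

def distill(text: str) -> str:
    """Strip filler phrases from prose; never modify fenced code blocks."""
    parts = re.split(r"(```.*?```)", text, flags=re.DOTALL)
    out = []
    for part in parts:
        if part.startswith("```"):
            out.append(part)  # code blocks pass through untouched
        else:
            for pat in FILLER:
                part = re.sub(pat, "", part)
            out.append(part.strip())
    return " ".join(p for p in out if p)

print(distill("Sure! I'd be happy to help. "
              "The issue is in the auth module. Hope this helps!"))
# -> The issue is in the auth module.
```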
Three intensity levels: lite → full → ultra. Enable with one env var.
Zero cloud dependencies. Zero data exfiltration risk. Everything runs on your CPU in <10ms. Works in air-gapped and regulated environments — nothing ever phones home.
| Tool | Setup |
|---|---|
| Claude Code | entroly wrap claude or claude mcp add entroly -- entroly |
| Cursor | entroly wrap cursor → prints config, paste once |
| Codex CLI | entroly wrap codex |
| GitHub Copilot | entroly wrap copilot |
| Aider | entroly wrap aider |
| Windsurf / Cline / Cody | entroly init → MCP server |
| Any LLM API | entroly proxy → HTTP proxy on localhost:9377 |
| LangChain / LlamaIndex | from entroly import compress |
Also: OpenAI API • Anthropic API • Google Vertex • AWS Bedrock • Groq • Together • OpenRouter • Ollama • vLLM • 100+ models
These log lines come from this repo's vault, not from a roadmap:
[detect] gap observed → entity="auth", miss_count=3
[synthesize] StructuralSynthesizer ($0, deterministic, no LLM)
[benchmark] skill=ddb2e2969bb0 → fitness 1.0 (1 pass / 0 fail, 338 ms)
[promote] status: draft → promoted
[spend] $0.0000 — invariant C_spent ≤ τ·S(t) holds
Compression doesn't hurt accuracy — we measured it (n=100, gpt-4o-mini, Wilson 95% CIs):
| Benchmark | Baseline (95% CI) | With Entroly (95% CI) | Retention |
|---|---|---|---|
| NeedleInAHaystack | 100% [83.9–100%] | 100% [83.9–100%] | 100.0% |
| GSM8K | 85.0% [76.7–90.7%] | 86.0% [77.9–91.5%] | 101.2% |
| SQuAD 2.0 | 84.0% [75.6–89.9%] | 83.0% [74.5–89.1%] | 98.8% |
| MMLU | 82.0% [73.3–88.3%] | 85.0% [76.7–90.7%] | 103.7% |
| TruthfulQA (MC1) | 72.0% [62.5–79.9%] | 73.0% [63.6–80.7%] | 101.4% |
| LongBench (HotpotQA) | 57.0% [47.2–66.3%] | 59.8% [49.8–69.0%] | 104.9% |
| Berkeley Function Calling | 99.0% [94.5–99.8%] | 100.0% [96.3–100.0%] | 101.0% |
Confidence intervals overlap on every one of the 7 benchmarks — accuracy is statistically indistinguishable from baseline. LongBench (the only benchmark where context exceeds the budget) shows a 3.6% token saving with a small retention gain. Reproduce:
python -m bench.accuracy --benchmark all --model gpt-4o-mini --samples 100
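The Wilson intervals in the table can also be checked independently. For instance, the GSM8K baseline row (85 correct out of 100) reproduces as:

```python
from math import sqrt

def wilson_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion at ~95% confidence."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    margin = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - margin, center + margin

lo, hi = wilson_ci(85, 100)
print(f"85/100 -> [{lo:.1%}, {hi:.1%}]")  # -> [76.7%, 90.7%], matching the table
```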
Custom OpenAI-compatible providers (Groq, Together, OpenRouter, Ollama, vLLM, ...):
python -m bench.accuracy --benchmark gsm8k --model llama-3.1-70b-versatile \
--base-url https://api.groq.com/openai/v1 --api-key-env GROQ_API_KEY
Run token cost checks in every PR — catch regressions before they ship:
- uses: juyterman1000/entroly-cost-check@v1
→ entroly-cost-check GitHub Action
Entroly selects the right context. Other tools compress or truncate whatever you give them. Selection beats compression — always.
| | Entroly | Compression tools | Top-K / RAG | Raw truncation |
|---|---|---|---|---|
| Approach | Information-theoretic selection | Text compression | Embedding retrieval | Cut-off |
| Token savings | 94% | 50–70% | 30–50% | 0% |
| Quality loss | 0% (benchmark-verified) | 2–5% | Variable | High |
| Multi-resolution | Full / Skeleton / Reference | One-size | One-size | One-size |
| Learns over time | Yes (PRISM RL) | No | No | No |
| Latency | 12ms (Rust) | 50–200ms | 100–500ms | 0ms |
| Reversible | Yes — full content always retrievable | Varies | Yes | No |
| Runs locally | Yes | Varies | Varies | Yes |
Why selection > compression: Compressing a bad selection is still a bad selection. Entroly picks the right files first, then delivers them at the right resolution. The AI gets architectural understanding, not just fewer tokens.
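The difference can be made concrete with a toy budget-aware selector. The greedy relevance-per-token ranking and the sample scores below are hypothetical; Entroly's actual ranking is learned, but the shape of the idea is the same: choose the right files first, then spend the budget only on them.

```python
def select_context(files: dict, budget: int) -> list:
    """Greedily pick files by relevance-per-token until the token budget is spent.

    files maps filename -> (token_count, relevance_score).
    """
    ranked = sorted(files, key=lambda f: files[f][1] / files[f][0], reverse=True)
    chosen, spent = [], 0
    for name in ranked:
        tokens, _ = files[name]
        if spent + tokens <= budget:
            chosen.append(name)
            spent += tokens
    return chosen

repo = {"auth.py": (800, 9.0), "utils.py": (2000, 3.0), "README.md": (1500, 1.0)}
print(select_context(repo, budget=3000))  # -> ['auth.py', 'utils.py']
```

Truncation would have kept whatever came first; selection keeps what matters and skips the low-value file entirely.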
Three chat integrations ship in the box. See every gap detection, skill synthesis, and dream-cycle win in real-time:
export ENTROLY_TG_TOKEN=... # Telegram (2-way: /status /skills /gaps /dream)
export ENTROLY_DISCORD_WEBHOOK=... # Discord
export ENTROLY_SLACK_WEBHOOK=... # Slack
Skills Entroly creates aren't locked in. Export to the open agentskills.io v0.1 spec:
node node_modules/entroly-wasm/js/agentskills_export.js ./dist/agentskills
python -m entroly.integrations.agentskills ./dist/agentskills
Every exported skill carries origin.token_cost: 0.0 — the zero-cost provenance travels with it.
Both runtimes are feature-complete. Same engine, same vault, same learning loop:
| Capability | Python | Node.js (WASM) |
|---|---|---|
| Context compression | ✅ | ✅ |
| Self-evolution | ✅ | ✅ |
| Dreaming loop | ✅ | ✅ |
| Federation | ✅ | ✅ |
| Response distillation | ✅ | ✅ |
| Chat gateways | ✅ | ✅ |
| agentskills.io export | ✅ | ✅ |
Architecture, 21 Rust modules, 3-resolution compression, provenance guarantees, RAG comparison, full CLI reference, Python SDK, LangChain integration → docs/DETAILS.md
Stop paying for tokens your AI wastes. Start running an AI that teaches itself.
npm install entroly-wasm && npx entroly-wasm | pip install entroly && entroly go
Discussions • Issues • Apache-2.0 License
Add this to claude_desktop_config.json and restart Claude Desktop.
{
  "mcpServers": {
    "entroly-context-engine": {
      "command": "npx",
      "args": ["entroly-wasm"]
    }
  }
}