Big Indexer provides architecture-aware context (boundaries, seams, impact scope, and BGI-TWIN tools).
BGI is a static architecture analysis tool for large codebases.
It groups code units by behavioral role and emits explicit architectural boundaries.
Project domain: bigindexer.com
Big Indexer is published in the MCP Registry as io.github.ahmedxuhri/bigindexer.
```bash
pip install bigindexer==0.1.3
bgi mcp --graph bgi-graph.json --fuse-graph fuse-graph.json
```
Validation: https://bigindexer.com/validation
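To register the server with a desktop MCP client, the `bgi mcp` command above can be written into the client's config file. The sketch below emits the `mcpServers` JSON shape used by clients such as Claude Desktop. The server name `big-indexer` and the helper `bgi_mcp_config` are hypothetical; docs/MCP_SETUP.md remains the authoritative setup guide.

```python
import json


def bgi_mcp_config(graph="bgi-graph.json", fuse_graph="fuse-graph.json"):
    """Hypothetical helper: build an mcpServers entry that launches `bgi mcp`.

    The server name "big-indexer" is an assumption; the command and flags
    are the ones shown in the install snippet.
    """
    return {
        "mcpServers": {
            "big-indexer": {
                "command": "bgi",
                "args": ["mcp", "--graph", graph, "--fuse-graph", fuse_graph],
            }
        }
    }


if __name__ == "__main__":
    # Paste the printed JSON into your client's MCP config file.
    print(json.dumps(bgi_mcp_config(), indent=2))
```

A client may launch the server from a different working directory than your repo, so absolute artifact paths are likely safer than the relative defaults.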
Most architecture graphs fail at scale in two ways:
BGI is built to keep both under control, so the output remains usable on large repos.
bgi-graph.json, fuse-graph.json) plus optional human context (bigindexer.md).

task_fingerprint, behavioral_twins, twin_context) ground prompts in in-repo behavior patterns.

Run BGI on the included fixture repo:
```bash
git clone https://github.com/ahmedxuhri/bigindexer
cd bigindexer
pip install -e .
bgi scan tests/fixtures --lang python --out /tmp/bgi-example.json
head -50 /tmp/bgi-example.json
```
Observed result on this repository:
121426 units

One produced edge looks like:
```json
{
  "source": "auth_module.py::AuthService::__init__",
  "target": "auth_module.py::AuthService::__del__",
  "key": "COV.INIT",
  "lock": "COV.TEARDOWN",
  "type": "HARD"
}
```
Why this matters: instead of only raw syntax references, you get behavioral relationships plus a cluster structure that can drive architecture decisions.
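The edge above identifies units with `::`-separated IDs. Below is a minimal sketch for reading such edges, assuming IDs always follow a `file::Scope::member` shape as in this single example (the full schema is not given here); `UnitId`, `parse_unit_id`, and `describe_edge` are hypothetical helpers, not part of BGI.

```python
import json
from typing import NamedTuple, Tuple


class UnitId(NamedTuple):
    file: str
    scope: Tuple[str, ...]  # enclosing names (e.g. a class), outermost first
    name: str


def parse_unit_id(uid: str) -> UnitId:
    # Assumption: IDs look like "file::Scope::member", as in the example edge.
    parts = uid.split("::")
    if len(parts) == 1:
        return UnitId(parts[0], (), "")  # file-level ID with no member part
    return UnitId(parts[0], tuple(parts[1:-1]), parts[-1])


def describe_edge(edge: dict) -> str:
    # Summarize one key-lock edge and whether it stays inside one file.
    src = parse_unit_id(edge["source"])
    dst = parse_unit_id(edge["target"])
    where = "same-file" if src.file == dst.file else "cross-file"
    return f"{src.name} [{edge['key']}] -> {dst.name} [{edge['lock']}] ({edge['type']}, {where})"


EDGE = json.loads("""
{
  "source": "auth_module.py::AuthService::__init__",
  "target": "auth_module.py::AuthService::__del__",
  "key": "COV.INIT",
  "lock": "COV.TEARDOWN",
  "type": "HARD"
}
""")

if __name__ == "__main__":
    print(describe_edge(EDGE))  # __init__ [COV.INIT] -> __del__ [COV.TEARDOWN] (HARD, same-file)
```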
| BGI term | Plain meaning |
|---|---|
| COV token | A behavior label for a unit (for example: FETCH, PERSIST, AUTHENTICATE) |
| Key-Lock edge | A behavioral connection between two units with complementary roles |
| DRS cluster | A unit-level grouping by behavioral role. Mostly intra-file in practice. File-level architectural components are better expressed via the BGI edge graph or the fuse-graph boundary signal — see external benchmark |
| Fuse edge / fuse event | A refused merge because cluster growth hit the cap; treated as a boundary signal |
| Spectral masks | Scope rules that limit where matching is allowed (global, directory, file) |
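The Key-Lock and Spectral mask rows can be illustrated with a toy matcher: a unit whose token is a "key" pairs with a unit carrying the complementary "lock" token, but only if the pair passes that key's scope mask. Everything here is hypothetical and is not BGI's Gate 2 implementation: the `Unit` type, the `COMPLEMENTS` table (only INIT to TEARDOWN comes from the example edge; FETCH to PERSIST is invented), and the per-key `MASKS`.

```python
import posixpath
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Unit:
    uid: str    # "path/file.py::Name", as in the example edge IDs
    token: str  # COV token, e.g. "INIT"


# Hypothetical complement table: key token -> lock token it pairs with.
COMPLEMENTS = {"INIT": "TEARDOWN", "FETCH": "PERSIST"}
# Hypothetical scope mask per key token: "file", "directory", or "global".
MASKS = {"INIT": "file", "FETCH": "global"}


def file_of(uid: str) -> str:
    return uid.split("::", 1)[0]


def in_scope(a: Unit, b: Unit, mask: str) -> bool:
    fa, fb = file_of(a.uid), file_of(b.uid)
    if mask == "file":
        return fa == fb
    if mask == "directory":
        return posixpath.dirname(fa) == posixpath.dirname(fb)
    return True  # "global": matching allowed anywhere


def key_lock_edges(units):
    # Check every ordered pair; emit (source, target, key, lock) when tokens
    # are complementary and the key's mask allows the pair.
    edges = []
    for a, b in permutations(units, 2):
        lock = COMPLEMENTS.get(a.token)
        if lock == b.token and in_scope(a, b, MASKS.get(a.token, "file")):
            edges.append((a.uid, b.uid, a.token, lock))
    return edges


UNITS = [
    Unit("auth/session.py::Session::__init__", "INIT"),
    Unit("auth/session.py::Session::close", "TEARDOWN"),
    Unit("db/pool.py::Pool::close", "TEARDOWN"),  # out of INIT's file mask
    Unit("api/users.py::get_user", "FETCH"),
    Unit("db/repo.py::save_user", "PERSIST"),
]

if __name__ == "__main__":
    for edge in key_lock_edges(UNITS):
        print(edge)
```

The all-pairs loop is quadratic; narrower masks are one way to keep candidate pairs (and noise) down, which is the stated purpose of spectral masks.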
```text
Source files
  ->
Gate 1: fingerprint unit behavior (COV tokens)
  ->
Gate 2: create behavioral edges with scoped matching
  ->
Gate 3: cluster with hard size cap + boundary emission
  ->
Artifacts: bgi-graph.json, fuse-graph.json, bigindexer.md, optional routes/graphml/html
```
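Gate 3's hard size cap and boundary emission can be sketched with a union-find pass. This is a stand-in technique (greedy union-find over edges in input order), not BGI's DRS clustering, whose algorithm isn't described here; `capped_clusters` is hypothetical. What it shows: a merge that would push a cluster past the cap is refused and recorded, matching the glossary's description of a fuse event.

```python
def capped_clusters(units, edges, cap):
    """Greedy union-find over `edges` in the given order.

    A merge whose combined size would exceed `cap` is refused and recorded
    as a fuse event (a boundary signal). Returns (clusters, fuse_events).
    """
    parent = {u: u for u in units}
    size = {u: 1 for u in units}

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving keeps trees shallow
            u = parent[u]
        return u

    fuse_events = []
    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra == rb:
            continue  # already in the same cluster
        if size[ra] + size[rb] > cap:
            fuse_events.append((a, b))  # refused merge -> boundary signal
            continue
        if size[ra] < size[rb]:
            ra, rb = rb, ra  # attach the smaller tree under the larger one
        parent[rb] = ra
        size[ra] += size[rb]

    clusters = {}
    for u in units:
        clusters.setdefault(find(u), []).append(u)
    return list(clusters.values()), fuse_events


if __name__ == "__main__":
    units = ["a", "b", "c", "d", "e"]
    chain = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]
    clusters, fuses = capped_clusters(units, chain, cap=3)
    print(clusters)  # [['a', 'b', 'c'], ['d', 'e']]
    print(fuses)     # [('c', 'd')]
```

Edge order matters in a greedy pass: feeding the same edges in a different order can refuse different merges.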
Core approach:
.scm - single-pass query extraction path in Gate 1.

| Capability | LSP / SCIP index | Call-graph + generic community detection | BGI |
|---|---|---|---|
| Fast symbol lookup | Strong | Medium | Available (Phase 6 index) |
| Behavioral token model | No | Usually no | Yes |
| Hard-bounded clustering | No | Usually no | Yes (unit-level) |
| First-class boundary artifact | No | Usually no | Yes (fuse-graph.json) |
| Scope-constrained edge generation | Limited | Rare | Yes (spectral masks) |
External head-to-head benchmark (Louvain on BGI's edges vs Louvain on raw imports, scored against package layout): BGI's edges win on Python (django F1 0.38 vs 0.29, MoJoFM 0.45 vs 0.34) and currently tie/lose on Go due to lower cross-file edge density on tier-2 scanners. Full results and methodology in docs/VALIDATION_EVIDENCE.md.
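To make an F1 score against package layout concrete, here is one common definition: pairwise F1, where a pair of units counts as positive when both land in the same cluster. This is an assumed definition for illustration only; the benchmark's exact scoring is documented in docs/VALIDATION_EVIDENCE.md, and MoJoFM (a separate move/join-based metric) is not shown. `pairwise_f1` is a hypothetical helper.

```python
import posixpath
from itertools import combinations


def pairwise_f1(predicted: dict, reference: dict) -> float:
    """Pairwise F1 of a clustering against a reference partition.

    Both arguments map unit -> cluster label; only shared units are scored.
    """
    units = sorted(set(predicted) & set(reference))
    tp = fp = fn = 0
    for a, b in combinations(units, 2):
        same_pred = predicted[a] == predicted[b]
        same_ref = reference[a] == reference[b]
        if same_pred and same_ref:
            tp += 1
        elif same_pred:
            fp += 1  # clustered together, but in different packages
        elif same_ref:
            fn += 1  # same package, but split across clusters
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    files = ["auth/a.py", "auth/b.py", "db/c.py", "db/d.py"]
    package_layout = {f: posixpath.dirname(f) for f in files}  # reference
    clustering = {"auth/a.py": 0, "auth/b.py": 0, "db/c.py": 0, "db/d.py": 1}
    print(round(pairwise_f1(clustering, package_layout), 3))  # 0.4
```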
Comparable kubernetes sample (go comparable mode, 162,917 units):
- 141.964s
- 67.261s (historical comparable baseline: 138.869s)
- 9.359s
- 218.584s
- 1.113%
- 0

Artifact: output/validation/kubernetes-optionb-controlled-median-v21.json
- tests/test_gate2.py).
- tests/test_gate3.py).
- python3 -m pytest tests/ -x -q (project baseline target remains passing).
- task → COV → top-3 twins + seam + rubric) is complete: actionability 4.75/5 (p04 slice: 4.8/5), boundary 1.0, hallucinations 0.
- django/p02 miss.

BGI does not treat all languages equally; support is tiered:
- .scm): python, typescript, tsx, javascript, go, rust, java, csharp, php, ruby, kotlin, scala
- c, lua, elixir
- swift, r, dart, bash, nim, zig, haskell, ocaml, fsharp, clojure, erlang, matlab, vb, crystal, cobol, groovy

Use this as a reliability signal: query-backed and dedicated scanner tiers are stronger than generic fallback.
Cross-file edge density caveat: the language tiers above describe parser quality. A separate axis is cross-file behavioral edge density — how many key-lock pairs the scanner produces that link units in different files. Tier-1 (.scm-backed) languages produce dense cross-file edges. Tier-2 scanner-backed languages currently produce sparser cross-file edges because their token mix is dominated by structural tokens (INTAKE/OUTPUT/CONDITIONAL/LOOP) that gate-2 deliberately scopes to same-file to prevent O(N²) noise. The user-visible MCP product (boundary detection, twin retrieval, AI-assistant context) still works on tier-2 languages — see the validation evidence — but cluster-recovery benchmarks against import-graph baselines reflect this density gap. Concrete numbers in docs/VALIDATION_EVIDENCE.md.
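The density caveat can be measured directly on an edge list. Below is a minimal sketch, assuming edges carry `source`/`target`/`key`/`lock` fields as in the example edge and that tokens use a `COV.` prefix. `scope_allowed` only mirrors the described policy (edges involving a structural token stay within one file); treating "involving" as "key or lock" is an assumption, and none of these helpers are BGI's API.

```python
STRUCTURAL = {"INTAKE", "OUTPUT", "CONDITIONAL", "LOOP"}


def bare(token: str) -> str:
    # Strip an assumed "COV." prefix, e.g. "COV.LOOP" -> "LOOP".
    return token.rsplit(".", 1)[-1]


def file_of(uid: str) -> str:
    return uid.split("::", 1)[0]


def is_cross_file(edge: dict) -> bool:
    return file_of(edge["source"]) != file_of(edge["target"])


def scope_allowed(edge: dict) -> bool:
    # Edges involving a structural token are only kept within one file.
    if bare(edge["key"]) in STRUCTURAL or bare(edge["lock"]) in STRUCTURAL:
        return not is_cross_file(edge)
    return True


def cross_file_density(edges: list) -> float:
    # Fraction of edges linking units in different files.
    if not edges:
        return 0.0
    return sum(is_cross_file(e) for e in edges) / len(edges)


EDGES = [
    {"source": "a.py::f", "target": "a.py::g", "key": "COV.INTAKE", "lock": "COV.OUTPUT"},
    {"source": "a.py::f", "target": "b.py::h", "key": "COV.FETCH", "lock": "COV.PERSIST"},
    {"source": "a.py::f", "target": "c.py::k", "key": "COV.LOOP", "lock": "COV.CONDITIONAL"},
]

if __name__ == "__main__":
    kept = [e for e in EDGES if scope_allowed(e)]
    print(len(kept), cross_file_density(kept))  # 2 0.5
```

A repo whose tokens are mostly structural keeps mostly same-file edges under this policy, which is the density gap the caveat describes for tier-2 languages.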
```bash
pip install -e .

# scan
bgi scan /path/to/repo --lang auto --out bgi-graph.json

# optional outputs
bgi scan /path/to/repo --lang auto \
  --fuse-graph fuse-graph.json \
  --routes routes.json \
  --graphml graph.graphml \
  --html

# incremental
bgi scan /path/to/repo --lang auto --incremental --cache .bgi-cache.json

# diff
bgi diff /path/before /path/after --lang auto --out diff.json

# run MCP server over generated artifacts
bgi mcp --graph bgi-graph.json --fuse-graph fuse-graph.json
```
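When scripting scans (for example in CI), it can help to build the command in one place. The helper below is hypothetical and uses only flags that appear in the commands above; whether every combination (such as `--out` together with `--fuse-graph`, or `--incremental` alongside extra outputs) is accepted in a single invocation is an assumption worth checking against the CLI's help output.

```python
import shutil
import subprocess


def bgi_scan_argv(repo, out="bgi-graph.json", lang="auto", fuse_graph=None, cache=None):
    """Hypothetical helper: build a `bgi scan` argv from documented flags."""
    argv = ["bgi", "scan", str(repo), "--lang", lang, "--out", out]
    if fuse_graph:
        argv += ["--fuse-graph", fuse_graph]
    if cache:
        argv += ["--incremental", "--cache", cache]  # incremental mode
    return argv


if __name__ == "__main__":
    argv = bgi_scan_argv(".", fuse_graph="fuse-graph.json", cache=".bgi-cache.json")
    if shutil.which("bgi"):
        subprocess.run(argv, check=True)  # raises CalledProcessError on failure
    else:
        print("bgi not on PATH; would run:", " ".join(argv))
```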
Example MCP usage pattern (from your client prompt):
```text
Use MCP tool twin_context for:
"Add endpoint that validates input and persists data."
Return top twin candidate, seam suggestion, and rubric checklist.
```
BGI ships with opt-in, off-by-default anonymous telemetry. To enable:
```bash
export BGI_TELEMETRY=1
bgi mcp --graph bgi-graph.json --fuse-graph fuse-graph.json
```
What's collected when enabled: BGI version, OS, repo size bucket, and a 12-char hash of your repo's git remote (so we can deduplicate "same repo seen twice" without ever knowing which repo). What's never collected: file paths, source code, repo names, user identity, or IP addresses. Full schema and disable instructions in docs/TELEMETRY.md.
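The opt-in gate and the repo hash can be sketched as follows. Two details are assumptions, not documented behavior: that only the exact value `1` enables collection, and that the 12-character remote hash is a truncated SHA-256 hex digest. docs/TELEMETRY.md has the actual schema; `telemetry_enabled` and `repo_hash` are hypothetical names.

```python
import hashlib
import os


def telemetry_enabled(env=os.environ) -> bool:
    # Off by default; assumption: only BGI_TELEMETRY=1 turns it on.
    return env.get("BGI_TELEMETRY") == "1"


def repo_hash(remote_url: str, length: int = 12) -> str:
    # Assumption: SHA-256 of the git remote URL, truncated to 12 hex chars.
    # The same remote always yields the same value, enabling deduplication.
    return hashlib.sha256(remote_url.strip().encode("utf-8")).hexdigest()[:length]


if __name__ == "__main__":
    print(telemetry_enabled(), repo_hash("https://github.com/ahmedxuhri/bigindexer"))
```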
- MEMORANDUM.md - design contracts and invariants
- docs/LANGUAGE_SUPPORT.md - language implementation details
- docs/CONTRIBUTING_LANGUAGES.md - language contribution guide
- docs/INDEX_SCHEMA.md - interactive index schema
- docs/QUERY_PLANNER.md - query planner scoring
- docs/MCP_SETUP.md - MCP server setup and usage
- docs/MCP_WITH_CONTINUE.md - 5-minute Continue + BGI walkthrough
- docs/TELEMETRY.md - opt-in telemetry: what we collect and how to disable
- https://bigindexer.com/validation - public validation evidence
- docs/MCP_QUICKSTART_DEMO.md - 5-minute demo walkthrough
- docs/MCP_EXAMPLE_TRANSCRIPTS.md - real-world MCP tool invocation examples
- docs/MCP_REAL_TRANSCRIPT.md - unedited transcript from FastAPI analysis
- scripts/mcp-demo.sh - automated demo script for multiple CLIs and repositories
- LICENSE)
- DCO) enforced on pull requests

Run in the terminal:
claude mcp add big-indexer -- npx

Yes, Big Indexer MCP is free: one-click install through Unyly, no payment required.

No, Big Indexer works without API keys or environment variables.

Self-hosted: the server runs locally on your machine using the command from the installation section.

Open Big Indexer on unyly.org, choose your client's tab (Claude Desktop, Claude Code, Cursor), and click Install; the config is generated automatically, with no JSON editing.
CSA PROJECT - FZCO © 2026 IFZA Business Park, DDP, Premises Number 31174 - 001
Security

Low risk. Automated heuristic based on public data; not a security guarantee.