Big Indexer provides architecture-aware context (boundaries, seams, impact scope, and BGI-TWIN tools).
ahmedxuhri/bigindexer MCP server
Copyright (c) 2026 Ahmed Xuhri — Licensed under Apache-2.0.
BGI is a static architecture analysis tool for large codebases.
It groups code units by behavioral role and emits explicit architectural boundaries.
Project domain: bigindexer.com
Big Indexer is published in the MCP Registry as io.github.ahmedxuhri/bigindexer.
pip install bigindexer==0.1.2
bgi mcp --graph bgi-graph.json --fuse-graph fuse-graph.json
Validation: https://bigindexer.com/validation
Most architecture graphs fail at scale in two ways:
BGI is built to keep both under control, so the output remains usable on large repos.
- bgi-graph.json, fuse-graph.json) for automation and review.
- task_fingerprint, behavioral_twins, twin_context) so they start from proven in-repo patterns.

Run BGI on the included fixture repo:
git clone https://github.com/ahmedxuhri/bigindexer
cd bigindexer
pip install -e .
bgi scan tests/fixtures --lang python --out /tmp/bgi-example.json
head -50 /tmp/bgi-example.json
Observed result on this repository:
121426 units

One produced edge looks like:
{
  "source": "auth_module.py::AuthService::__init__",
  "target": "auth_module.py::AuthService::__del__",
  "key": "COV.INIT",
  "lock": "COV.TEARDOWN",
  "type": "HARD"
}
Why this matters: instead of raw syntax references only, you get behavioral relationships plus cluster structure that can drive architecture decisions.
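A minimal sketch of consuming such an edge in Python, assuming only the five fields shown in the example above (`source`, `target`, `key`, `lock`, `type`). The `Edge` class and the `parse_edge` and `split_unit_id` helpers are hypothetical names for illustration, not part of the BGI package, and nothing is assumed here about the overall layout of the scan output file.

```python
from __future__ import annotations

import json
from dataclasses import dataclass

# Field names taken from the example edge above.
EDGE_FIELDS = ("source", "target", "key", "lock", "type")


@dataclass(frozen=True)
class Edge:
    """Hypothetical container for one edge object."""
    source: str
    target: str
    key: str
    lock: str
    type: str


def parse_edge(raw: str) -> Edge:
    """Parse one JSON edge object; raises KeyError if a field is missing."""
    data = json.loads(raw)
    return Edge(**{name: data[name] for name in EDGE_FIELDS})


def split_unit_id(unit_id: str) -> tuple[str, list[str]]:
    """Split 'file.py::Class::method' into (file, [name parts])."""
    file_path, *parts = unit_id.split("::")
    return file_path, parts


sample = """
{
  "source": "auth_module.py::AuthService::__init__",
  "target": "auth_module.py::AuthService::__del__",
  "key": "COV.INIT",
  "lock": "COV.TEARDOWN",
  "type": "HARD"
}
"""

edge = parse_edge(sample)
source_file, source_parts = split_unit_id(edge.source)
print(f"{edge.key} -> {edge.lock} ({edge.type}) in {source_file}")
```

Keeping the edge as a frozen dataclass makes it hashable, which is convenient if edges are later collected into sets or used as dictionary keys during review.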
| BGI term | Plain meaning |
|---|---|
| COV token | A behavior label for a unit (for example: FETCH, PERSIST, AUTHENTICATE) |
| Key-Lock edge | A behavioral connection between two units with complementary roles |
| DRS cluster | A group of units likely belonging to one architectural component |
| Fuse edge / fuse event | A refused merge because cluster growth hit the cap; treated as boundary signal |
| Spectral masks | Scope rules that limit where matching is allowed (global, directory, file) |
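To make the "spectral masks" row concrete, here is a small sketch of one way a scope rule could restrict matching between two units. The reading of the three scope names (`global` allows any pair, `directory` requires the same parent directory, `file` requires the same file) is an assumption based only on the table above, and `same_scope` is a hypothetical helper, not a BGI function.

```python
from pathlib import PurePosixPath


def same_scope(mask: str, path_a: str, path_b: str) -> bool:
    """Return True if two unit file paths may be matched under `mask`.

    Assumed semantics (not confirmed by BGI docs):
      global    - any two units may match
      directory - units must live in the same directory
      file      - units must live in the same file
    """
    a, b = PurePosixPath(path_a), PurePosixPath(path_b)
    if mask == "global":
        return True
    if mask == "directory":
        return a.parent == b.parent
    if mask == "file":
        return a == b
    raise ValueError(f"unknown mask: {mask!r}")


print(same_scope("directory", "auth/service.py", "auth/tokens.py"))
print(same_scope("file", "auth/service.py", "auth/tokens.py"))
```

Narrower masks produce fewer candidate pairs, which is the point of scoping: it limits how many behavioral edges can be generated in the first place.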
Source files
->
Gate 1: fingerprint unit behavior (COV tokens)
->
Gate 2: create behavioral edges with scoped matching
->
Gate 3: cluster with hard size cap + boundary emission
->
Artifacts: bgi-graph.json, fuse-graph.json, optional routes/graphml/html
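The idea behind Gate 3 (a hard size cap, with refused merges kept as boundary signal) can be sketched with a capped union-find. This is an illustrative stand-in, not BGI's actual clustering algorithm: the `CappedClusters` class, its `connect` method, and the choice of union-find are all assumptions made for the example.

```python
class CappedClusters:
    """Union-find where a merge is refused if it would exceed `cap` units."""

    def __init__(self, units, cap):
        self.parent = {u: u for u in units}
        self.size = {u: 1 for u in units}
        self.cap = cap
        self.fuse_events = []  # refused merges, kept as boundary signal

    def find(self, u):
        # Path halving keeps lookups close to constant time.
        while self.parent[u] != u:
            self.parent[u] = self.parent[self.parent[u]]
            u = self.parent[u]
        return u

    def connect(self, a, b):
        """Try to merge the clusters of a and b; return False if refused."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return True
        if self.size[ra] + self.size[rb] > self.cap:
            self.fuse_events.append((a, b))
            return False
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]
        return True


clusters = CappedClusters(["a", "b", "c", "d"], cap=2)
for edge in [("a", "b"), ("b", "c"), ("c", "d")]:
    clusters.connect(*edge)

print(clusters.fuse_events)  # the refused merge between b and c
```

With a cap of 2, the edge between `b` and `c` is refused because merging would create a cluster of three units. That refusal is recorded instead of discarded, which mirrors the "fuse edge" row in the terminology table.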
Core approach:
- .scm - single-pass query extraction path in Gate 1.

| Capability | LSP / SCIP index | Call-graph + generic community detection | BGI |
|---|---|---|---|
| Fast symbol lookup | Strong | Medium | Available (Phase 6 index) |
| Behavioral token model | No | Usually no | Yes |
| Hard-bounded clustering | No | Usually no | Yes |
| First-class boundary artifact | No | Usually no | Yes (fuse-graph.json) |
| Scope-constrained edge generation | Limited | Rare | Yes (spectral masks) |
Comparable kubernetes sample (go comparable mode, 162,917 units):
- 141.964s
- 67.261s (historical comparable baseline: 138.869s)
- 9.359s
- 218.584s
- 1.113%
- 0

Artifact: output/validation/kubernetes-optionb-controlled-median-v21.json
- tests/test_gate2.py).
- tests/test_gate3.py).
- python3 -m pytest tests/ -x -q (project baseline target remains passing).
- task → COV → top-3 twins + seam + rubric) is complete: actionability 4.75/5 (p04 slice: 4.8/5), boundary 1.0, hallucinations 0.
- django/p02 miss.

BGI does not treat all languages equally; support is tiered:
- .scm): python, typescript
- javascript, java, go, rust, ruby, csharp, php, kotlin, c, scala, lua, elixir
- swift, r, dart, bash, nim, zig, haskell, ocaml, fsharp, clojure, erlang, matlab, vb, crystal, cobol, groovy

Use this as a reliability signal: query-backed and dedicated scanner tiers are stronger than generic fallback.
pip install -e .
# scan
bgi scan /path/to/repo --lang auto --out bgi-graph.json
# optional outputs
bgi scan /path/to/repo --lang auto \
--fuse-graph fuse-graph.json \
--routes routes.json \
--graphml graph.graphml \
--html
# incremental
bgi scan /path/to/repo --lang auto --incremental --cache .bgi-cache.json
# diff
bgi diff /path/before /path/after --lang auto --out diff.json
# run MCP server over generated artifacts
bgi mcp --graph bgi-graph.json --fuse-graph fuse-graph.json
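For scripted runs, the scan invocation can be assembled in Python before handing it to `subprocess.run`. The sketch below only builds the argument list and uses only flags that appear in the commands above. `build_scan_command` is a hypothetical helper, and combining `--out` with the optional output flags in a single call is an assumption.

```python
import shlex


def build_scan_command(repo, out="bgi-graph.json", lang="auto",
                       fuse_graph=None, routes=None, graphml=None, html=False):
    """Build an argv list for `bgi scan` from the flags shown above."""
    cmd = ["bgi", "scan", repo, "--lang", lang, "--out", out]
    if fuse_graph:
        cmd += ["--fuse-graph", fuse_graph]
    if routes:
        cmd += ["--routes", routes]
    if graphml:
        cmd += ["--graphml", graphml]
    if html:
        cmd.append("--html")  # shown above as a flag without a value
    return cmd


cmd = build_scan_command("/path/to/repo", fuse_graph="fuse-graph.json", html=True)
print(shlex.join(cmd))
# To execute (requires bgi on PATH): subprocess.run(cmd, check=True)
```

Passing a list rather than a shell string avoids quoting problems when repository paths contain spaces.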
Example MCP usage pattern (from your client prompt):
Use MCP tool twin_context for:
"Add endpoint that validates input and persists data."
Return top twin candidate, seam suggestion, and rubric checklist.
- MEMORANDUM.md - design contracts and invariants
- docs/LANGUAGE_SUPPORT.md - language implementation details
- docs/CONTRIBUTING_LANGUAGES.md - language contribution guide
- docs/INDEX_SCHEMA.md - interactive index schema
- docs/QUERY_PLANNER.md - query planner scoring
- docs/MCP_SETUP.md - MCP server setup and usage
- https://bigindexer.com/validation - public validation evidence
- docs/MCP_QUICKSTART_DEMO.md - 5-minute demo walkthrough
- docs/MCP_EXAMPLE_TRANSCRIPTS.md - real-world MCP tool invocation examples
- docs/MCP_REAL_TRANSCRIPT.md - unedited transcript from FastAPI analysis
- scripts/mcp-demo.sh - automated demo script for multiple CLIs and repositories
- LICENSE)
- DCO) enforced on pull requests

Run in your terminal:
claude mcp add big-indexer -- npx