Local RAG system for Claude Code with hybrid search (BM25 + semantic), cross-encoder reranking, markdown-aware chunking, query expansion, and 12 MCP tools. Runs entirely offline with zero external servers.
Drop your PDFs, markdown, code, notebooks — 1800+ files, 39K chunks, indexed in under 3 minutes.
Hybrid search (BM25 + semantic vectors + cross-encoder reranking) through 12 MCP tools.
Everything runs locally via ONNX. No Docker, no Ollama, no API keys, no data leaves your machine.
pip install knowledge-rag → restart Claude Code → search_knowledge("your query")
12 MCP Tools | Hybrid Search + Reranking | 20 File Formats | Optional NVIDIA GPU | 100% Local
What's New | Supported Formats | Installation | Configuration | API Reference | Architecture
Language-aware extraction for C, C++, JavaScript, TypeScript, and XML — functions, classes, structs, interfaces, imports, and namespaces are captured as searchable metadata. Total supported formats: 20.
npx -y knowledge-rag # NPM — zero setup, auto-manages Python venv
pip install knowledge-rag # PyPI — classic Python install
curl -fsSL .../install.sh | bash # One-line installer (Linux/macOS/Windows)
docker pull ghcr.io/lyonzin/knowledge-rag # Docker — models pre-downloaded
git clone ... && pip install -r ... # From source
All methods produce the same MCP server. See Installation for full instructions.
Latest release: GPU now disabled by default (`gpu: false`), BASE_DIR resolution fixed for editable installs, and the `<3.13` upper bound removed — Python 3.13 and 3.14 are now supported. See Changelog for full history.
| Format | Extension | Parser | Default | Notes |
|---|---|---|---|---|
| Markdown | .md | Section-aware (splits at ##) | Yes | Headers preserved as chunk boundaries |
| Plain Text | .txt | Fixed-size chunking | Yes | 1000 chars + 200 overlap |
| PDF | .pdf | PyMuPDF extraction | Yes | Text-based PDFs only (no OCR) |
| Python | .py | Code-aware parser | Yes | Functions/classes as chunks |
| JSON | .json | Structure-aware | Yes | Flattened key-value extraction |
| CSV | .csv | Row-based parser | Yes | Headers + rows as text |
| Word | .docx | python-docx | Yes | Headings preserved as markdown |
| Excel | .xlsx | openpyxl | Yes | Sheet-by-sheet extraction |
| PowerPoint | .pptx | python-pptx | Yes | Slide-by-slide extraction |
| Jupyter Notebook | .ipynb | Cell-aware parser | Yes | Markdown + code cells only, no outputs/base64 |
| C Source | .c | Code-aware parser | Yes | Functions/structs/includes extracted |
| C/C++ Header | .h | Code-aware parser | Yes | Function declarations/structs extracted |
| C++ Source | .cpp | Code-aware parser | Yes | Classes/structs/includes extracted |
| JavaScript | .js | Code-aware parser | Yes | Functions/classes/imports (ESM + CJS) |
| React JSX | .jsx | Code-aware parser | Yes | Same as JS parser |
| TypeScript | .ts | Code-aware parser | Yes | Functions/classes/interfaces/enums/imports |
| React TSX | .tsx | Code-aware parser | Yes | Same as TS parser |
| XML | .xml | XML parser | Yes | Root element and namespace extraction |
| MQL4 Header | .mqh | Code parser | No | MetaTrader — add to supported_formats to enable |
| MQL4 Source | .mq4 | Code parser | No | MetaTrader — add to supported_formats to enable |
Tip: The parser dispatch is extensible. Any format mapped in `_parsers` can be enabled via `supported_formats` in config.yaml.
| Feature | Description |
|---|---|
| Hybrid Search | Semantic + BM25 keyword search with Reciprocal Rank Fusion |
| Cross-Encoder Reranker | Xenova/ms-marco-MiniLM-L-6-v2 re-scores top candidates for precision |
| GPU Acceleration | Optional ONNX CUDA support for 5-10x faster indexing |
| YAML Configuration | Fully customizable via config.yaml with domain-specific presets |
| Query Expansion | Configurable synonym mappings (69 security-term defaults) |
| Markdown-Aware Chunking | .md files split by ##/### sections instead of fixed windows |
| In-Process Embeddings | FastEmbed ONNX Runtime (BAAI/bge-small-en-v1.5, 384D) |
| Keyword Routing | Word-boundary aware routing for domain-specific queries |
| 20 Format Parsers | MD, TXT, PDF, PY, C, H, CPP, JS, JSX, TS, TSX, JSON, XML, CSV, DOCX, XLSX, PPTX, IPYNB + opt-in MQH/MQ4 |
| Category Organization | Organize docs by folder, auto-tagged by path |
| Incremental Indexing | Change detection via mtime/size — only re-indexes modified files |
| Chunk Deduplication | SHA256 content hashing prevents duplicate chunks |
| Query Cache | LRU cache with 5-min TTL for instant repeat queries |
| Document CRUD | Add, update, remove documents via MCP tools |
| URL Ingestion | Fetch URLs, strip HTML, convert to markdown, index |
| Similarity Search | Find documents similar to a reference document |
| Retrieval Evaluation | Built-in MRR@5 and Recall@5 metrics |
| File Watcher | Auto-reindex on document changes via watchdog (5s debounce) |
| Exclude Patterns | Glob-based file/directory exclusion during indexing |
| MMR Diversification | Maximal Marginal Relevance reduces redundant results |
| Persistent Model Cache | Embedding models cached in models_cache/ — survives reboots |
| Auto-Migration | Detects embedding dimension mismatch and rebuilds automatically |
| 12 MCP Tools | Full CRUD + search + evaluation via Claude Code |
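Two features from the table above, incremental indexing via mtime/size and SHA256 chunk deduplication, can be sketched in a few lines. This is an illustrative sketch, not the project's actual implementation; the function names and metadata dict shape are assumptions.

```python
import hashlib

def needs_reindex(current, stored):
    """Incremental indexing: re-process a file only when its
    mtime or size changed since the last indexing run."""
    return (current["mtime"] != stored.get("mtime")
            or current["size"] != stored.get("size"))

def dedup_chunks(chunks, seen_hashes):
    """Chunk deduplication: skip any chunk whose SHA256
    content hash has already been indexed."""
    unique = []
    for chunk in chunks:
        digest = hashlib.sha256(chunk.encode("utf-8")).hexdigest()
        if digest not in seen_hashes:
            seen_hashes.add(digest)
            unique.append(chunk)
    return unique
```

Both checks are cheap, which is why a reindex over an unchanged corpus finishes almost instantly.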
flowchart TB
subgraph MCP["MCP SERVER (FastMCP)"]
direction TB
TOOLS["12 MCP Tools<br/>search | get | add | update | remove<br/>reindex | list | stats | url | similar | evaluate"]
end
subgraph SEARCH["HYBRID SEARCH ENGINE"]
direction LR
ROUTER["Keyword Router<br/>(word boundaries)"]
SEMANTIC["Semantic Search<br/>(ChromaDB)"]
BM25["BM25 Keyword<br/>(rank-bm25 + expansion)"]
RRF["Reciprocal Rank<br/>Fusion (RRF)"]
RERANK["Cross-Encoder<br/>Reranker"]
ROUTER --> SEMANTIC
ROUTER --> BM25
SEMANTIC --> RRF
BM25 --> RRF
RRF --> RERANK
end
subgraph STORAGE["STORAGE LAYER"]
direction LR
CHROMA[("ChromaDB<br/>Vector Database")]
COLLECTIONS["Collections<br/>security | ctf<br/>logscale | development"]
CHROMA --- COLLECTIONS
end
subgraph EMBED["EMBEDDINGS (In-Process)"]
FASTEMBED["FastEmbed ONNX<br/>BAAI/bge-small-en-v1.5<br/>(384D, CPU or GPU)"]
CROSSENC["Cross-Encoder<br/>ms-marco-MiniLM-L-6-v2"]
FASTEMBED --- CROSSENC
end
subgraph INGEST["DOCUMENT INGESTION"]
PARSERS["20 Parsers<br/>MD | PDF | TXT | PY | C | H | CPP | JS | JSX | TS | TSX | JSON | XML | CSV<br/>DOCX | XLSX | PPTX | IPYNB | MQH | MQ4"]
CHUNKER["Chunking<br/>MD: section-aware<br/>Other: 1000 chars + 200 overlap"]
PARSERS --> CHUNKER
end
CLAUDE["Claude Code"] --> MCP
MCP --> SEARCH
SEARCH --> STORAGE
STORAGE --> EMBED
INGEST --> EMBED
EMBED --> STORAGE
flowchart TB
QUERY["User Query<br/>'mimikatz credential dump'"] --> EXPAND
subgraph EXPANSION["Query Expansion"]
EXPAND["Synonym Expansion<br/>mimikatz -> mimikatz, sekurlsa, logonpasswords"]
end
EXPAND --> ROUTER
subgraph ROUTING["Keyword Routing"]
ROUTER["Keyword Router"]
MATCH{"Word Boundary<br/>Match?"}
CATEGORY["Filter: redteam"]
NOFILTER["No Filter"]
ROUTER --> MATCH
MATCH -->|Yes| CATEGORY
MATCH -->|No| NOFILTER
end
subgraph HYBRID["Hybrid Search"]
direction LR
SEMANTIC["Semantic Search<br/>(ChromaDB embeddings)<br/>Conceptual similarity"]
BM25["BM25 Search<br/>(expanded query)<br/>Exact term matching"]
end
subgraph FUSION["Result Fusion + Reranking"]
RRF["Reciprocal Rank Fusion<br/>score = alpha * 1/(k+rank_sem)<br/>+ (1-alpha) * 1/(k+rank_bm25)"]
RERANK["Cross-Encoder Reranker<br/>Re-scores top 3x candidates<br/>query+doc pair scoring"]
SORT["Sort by Reranker Score<br/>Normalize to 0-1"]
RRF --> RERANK --> SORT
end
CATEGORY --> HYBRID
NOFILTER --> HYBRID
SEMANTIC --> RRF
BM25 --> RRF
SORT --> RESULTS["Results<br/>search_method: hybrid|semantic|keyword<br/>score + reranker_score + raw_rrf_score"]
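The weighted RRF formula in the diagram can be sketched directly. A minimal, hypothetical implementation of the fusion step; the constant k=60 is the common RRF default, since the source doesn't state the value it actually uses:

```python
def rrf_fuse(semantic_ids, bm25_ids, alpha=0.3, k=60):
    """Weighted Reciprocal Rank Fusion over two ranked ID lists.
    Ranks are 1-based; a doc missing from one list simply
    contributes nothing from that list."""
    scores = {}
    for rank, doc_id in enumerate(semantic_ids, start=1):
        scores[doc_id] = scores.get(doc_id, 0.0) + alpha * (1.0 / (k + rank))
    for rank, doc_id in enumerate(bm25_ids, start=1):
        scores[doc_id] = scores.get(doc_id, 0.0) + (1 - alpha) * (1.0 / (k + rank))
    # highest fused score first
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

A document found by both engines accumulates both terms, which is why "hybrid" hits outrank single-engine hits even before reranking.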
flowchart LR
subgraph INPUT["Input"]
FILES["documents/<br/>├── security/<br/>├── development/<br/>├── ctf/<br/>└── general/"]
end
subgraph PARSE["Parse (20 formats)"]
MD["Markdown"]
PDF["PDF<br/>(PyMuPDF)"]
OFFICE["DOCX | XLSX<br/>PPTX | CSV"]
CODE["PY | C | H | CPP | JS | JSX<br/>TS | TSX | JSON | XML | IPYNB"]
end
subgraph CHUNK["Chunk"]
MDSPLIT["MD: Section-Aware<br/>Split at ## headers"]
TXTSPLIT["Other: Fixed-Size<br/>1000 chars + 200 overlap"]
DEDUP["SHA256 Dedup<br/>Skip duplicate content"]
end
subgraph EMBED["Embed"]
FASTEMBED["FastEmbed ONNX<br/>bge-small-en-v1.5<br/>(384D, CPU or GPU)"]
end
subgraph STORE["Store"]
CHROMADB[("ChromaDB")]
BM25IDX["BM25 Index"]
end
FILES --> MD & PDF & OFFICE & CODE
MD --> MDSPLIT
PDF & OFFICE & CODE --> TXTSPLIT
MDSPLIT --> DEDUP
TXTSPLIT --> DEDUP
DEDUP --> EMBED
EMBED --> STORE
flowchart LR
subgraph ALPHA["hybrid_alpha values"]
A0["0.0<br/>Pure BM25<br/>Instant"]
A3["0.3 (default)<br/>Keyword-heavy<br/>Fast"]
A5["0.5<br/>Balanced"]
A7["0.7<br/>Semantic-heavy"]
A10["1.0<br/>Pure Semantic"]
end
subgraph USE["Best For"]
U0["CVEs, tool names<br/>exact matches"]
U3["Technical queries<br/>specific terms"]
U5["General queries"]
U7["Conceptual queries<br/>related topics"]
U10["'How to...' questions<br/>conceptual search"]
end
A0 --- U0
A3 --- U3
A5 --- U5
A7 --- U7
A10 --- U10
Optional NVIDIA GPU acceleration is available (`pip install knowledge-rag[gpu]` + `models.embedding.gpu: true` in config).

Pick one — all produce the same running server.
Requires Node.js 16+. Handles Python venv, pip install, and version upgrades automatically.
claude mcp add knowledge-rag -s user -- npx -y knowledge-rag
That's it. On first run, npx creates a venv at ~/.knowledge-rag/, installs the PyPI package, and starts the MCP server. Subsequent runs reuse the cached venv.
# Linux/macOS:
curl -fsSL https://raw.githubusercontent.com/lyonzin/knowledge-rag/master/install.sh | bash
# Windows (PowerShell):
irm https://raw.githubusercontent.com/lyonzin/knowledge-rag/master/install.ps1 | iex
Then configure Claude Code:
claude mcp add knowledge-rag -s user -- ~/knowledge-rag/venv/bin/python -m mcp_server.server
Windows:
claude mcp add knowledge-rag -s user -- %USERPROFILE%\knowledge-rag\venv\Scripts\python.exe -m mcp_server.server
mkdir ~/knowledge-rag && cd ~/knowledge-rag
python3 -m venv venv && source venv/bin/activate
pip install knowledge-rag
knowledge-rag init # Exports config template, presets, creates documents/
Then configure Claude Code:
claude mcp add knowledge-rag -s user -- ~/knowledge-rag/venv/bin/python -m mcp_server.server
Windows users: Use `python` instead of `python3`, and `venv\Scripts\activate` instead of `source venv/bin/activate`. Windows path:
claude mcp add knowledge-rag -s user -- %USERPROFILE%\knowledge-rag\venv\Scripts\python.exe -m mcp_server.server
git clone https://github.com/lyonzin/knowledge-rag.git ~/knowledge-rag
cd ~/knowledge-rag
python3 -m venv venv && source venv/bin/activate
pip install -r requirements.txt
Then configure Claude Code:
claude mcp add knowledge-rag -s user -- ~/knowledge-rag/venv/bin/python -m mcp_server.server
docker pull ghcr.io/lyonzin/knowledge-rag:latest
claude mcp add knowledge-rag -s user -- \
docker run -i --rm \
-v ~/knowledge-rag/documents:/app/documents \
-v ~/knowledge-rag/data:/app/data \
ghcr.io/lyonzin/knowledge-rag:latest
Models are pre-downloaded in the image — no first-run delay.
Add to ~/.claude.json:
Windows:
{
"mcpServers": {
"knowledge-rag": {
"command": "C:\\Users\\YOUR_USER\\knowledge-rag\\venv\\Scripts\\python.exe",
"args": ["-m", "mcp_server.server"]
}
}
}
Linux / macOS:
{
"mcpServers": {
"knowledge-rag": {
"command": "/home/YOUR_USER/knowledge-rag/venv/bin/python",
"args": ["-m", "mcp_server.server"]
}
}
}
Replace `YOUR_USER` with your username, or use the full path from `echo $HOME`.
claude mcp list
On first start, the server will:
- Download the embedding models (cached in models_cache/)
- Index all documents in the documents/ directory

Place your documents in the documents/ directory, organized by category:
documents/
├── security/ # Pentest, exploit, vulnerability docs
├── development/ # Code, APIs, frameworks
├── ctf/ # CTF writeups and methodology
├── logscale/ # LogScale/LQL documentation
└── general/ # Everything else
Or add documents programmatically via MCP tools:
# Add from content
add_document(
content="# My Document\n\nContent here...",
filepath="security/my-technique.md",
category="security"
)
# Add from URL
add_from_url(
url="https://example.com/article",
category="security",
title="Custom Title"
)
Claude uses the RAG system automatically when configured. You can also control search behavior:
# Pure keyword search — instant, no embedding needed
search_knowledge("gtfobins suid", hybrid_alpha=0.0)
# Keyword-heavy (default) — fast, slight semantic boost
search_knowledge("mimikatz", hybrid_alpha=0.3)
# Balanced hybrid — both engines equally weighted
search_knowledge("SQL injection techniques", hybrid_alpha=0.5)
# Semantic-heavy — better for conceptual queries
search_knowledge("how to escalate privileges", hybrid_alpha=0.7)
# Pure semantic — embedding similarity only
search_knowledge("lateral movement strategies", hybrid_alpha=1.0)
Documents are automatically indexed on first startup. To manage the index:
# Incremental: only re-index changed files (fast)
reindex_documents()
# Smart reindex: detect changes + rebuild BM25
reindex_documents(force=True)
# Nuclear rebuild: delete everything, re-embed all (use after model change)
reindex_documents(full_rebuild=True)
evaluate_retrieval(test_cases='[
{"query": "sql injection", "expected_filepath": "security/sqli-guide.md"},
{"query": "privilege escalation", "expected_filepath": "security/privesc.md"}
]')
# Returns: MRR@5, Recall@5, per-query results
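The two metrics returned above can be computed in a few lines. A sketch for illustration only; `search_fn` is a stand-in for the real retrieval call, not part of the project's API:

```python
def evaluate(test_cases, search_fn, k=5):
    """MRR@k: mean of 1/rank of the expected file (0 if absent).
    Recall@k: fraction of queries whose expected file is in the top k.
    search_fn(query) returns a ranked list of filepaths."""
    rr_sum, hits = 0.0, 0
    for case in test_cases:
        top_k = search_fn(case["query"])[:k]
        if case["expected_filepath"] in top_k:
            hits += 1
            rr_sum += 1.0 / (top_k.index(case["expected_filepath"]) + 1)
    n = len(test_cases) or 1
    return {"mrr": rr_sum / n, "recall": hits / n}
```

MRR rewards putting the expected document near rank 1, so it is the more sensitive metric when tuning hybrid_alpha.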
`search_knowledge` — Hybrid search combining semantic search + BM25 keyword search with cross-encoder reranking.
| Parameter | Type | Default | Description |
|---|---|---|---|
| query | string | required | Search query text (1-3 keywords recommended) |
| max_results | int | 5 | Maximum results to return (1-20) |
| category | string | null | Filter by category |
| hybrid_alpha | float | 0.3 | Balance: 0.0 = keyword only, 1.0 = semantic only |
Returns:
{
"status": "success",
"query": "mimikatz credential dump",
"hybrid_alpha": 0.5,
"result_count": 3,
"cache_hit_rate": "0.0%",
"results": [
{
"content": "Mimikatz can extract credentials from memory...",
"source": "documents/security/credential-attacks.md",
"filename": "credential-attacks.md",
"category": "security",
"score": 0.9823,
"raw_rrf_score": 0.016393,
"reranker_score": 0.987654,
"semantic_rank": 2,
"bm25_rank": 1,
"search_method": "hybrid",
"keywords": ["mimikatz", "credential", "lsass"],
"routed_by": "redteam"
}
]
}
Search Method Values:
- hybrid: Found by both semantic and BM25 search (highest confidence)
- semantic: Found only by semantic search
- keyword: Found only by BM25 keyword search

`get_document` — Retrieve the full content of a specific document.
| Parameter | Type | Description |
|---|---|---|
| filepath | string | Path to the document file |
Returns: JSON with document content, metadata, keywords, and chunk count.
`reindex_documents` — Index or reindex all documents in the knowledge base.
| Parameter | Type | Default | Description |
|---|---|---|---|
| force | bool | false | Smart reindex: detects changes, rebuilds BM25. Fast. |
| full_rebuild | bool | false | Nuclear rebuild: deletes everything, re-embeds all documents. Use after model change. |
Returns: JSON with indexing statistics (indexed, updated, skipped, deleted, chunks_added, chunks_removed, dedup_skipped, elapsed_seconds).
`list_categories` — List all document categories with their document counts.
Returns:
{
"status": "success",
"categories": {
"security": 52,
"development": 8,
"ctf": 12,
"general": 3
},
"total_documents": 75
}
`list_documents` — List all indexed documents, optionally filtered by category.
| Parameter | Type | Description |
|---|---|---|
| category | string | Optional category filter |
Returns: JSON array of documents with id, source, category, format, chunks, and keywords.
`get_index_stats` — Get statistics about the knowledge base index.
Returns:
{
"status": "success",
"stats": {
"total_documents": 75,
"total_chunks": 9256,
"unique_content_hashes": 9100,
"categories": {"security": 52, "development": 8},
"supported_formats": [".md", ".txt", ".pdf", ".py", ".json", ".docx", ".xlsx", ".pptx", ".csv", ".ipynb"],
"embedding_model": "BAAI/bge-small-en-v1.5",
"embedding_dim": 384,
"reranker_model": "Xenova/ms-marco-MiniLM-L-6-v2",
"chunk_size": 1000,
"chunk_overlap": 200,
"query_cache": {
"size": 12,
"max_size": 100,
"ttl_seconds": 300,
"hits": 45,
"misses": 23,
"hit_rate": "66.2%"
}
}
}
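The query_cache stats in that response come from an LRU cache with a TTL. A minimal sketch of that pattern with the documented defaults (100 entries, 300s TTL); the class name and internals are illustrative, not the project's actual code:

```python
import time
from collections import OrderedDict

class QueryCache:
    """LRU cache with per-entry TTL expiry and hit/miss counters."""
    def __init__(self, max_size=100, ttl=300):
        self.max_size, self.ttl = max_size, ttl
        self.store = OrderedDict()  # query -> (timestamp, results)
        self.hits = self.misses = 0

    def get(self, query):
        entry = self.store.get(query)
        if entry and time.time() - entry[0] < self.ttl:
            self.store.move_to_end(query)   # refresh LRU position
            self.hits += 1
            return entry[1]
        self.store.pop(query, None)         # expired or absent
        self.misses += 1
        return None

    def put(self, query, results):
        self.store[query] = (time.time(), results)
        self.store.move_to_end(query)
        if len(self.store) > self.max_size:
            self.store.popitem(last=False)  # evict least-recently-used
```

The short TTL keeps repeat queries instant without serving stale results after a reindex for long.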
`add_document` — Add a new document to the knowledge base from raw content. Saves the file to the documents directory and indexes it immediately.
| Parameter | Type | Default | Description |
|---|---|---|---|
| content | string | required | Full text content of the document |
| filepath | string | required | Relative path within documents dir (e.g., security/new-technique.md) |
| category | string | "general" | Document category |
`update_document` — Update an existing document. Removes old chunks from the index and re-indexes with new content.
| Parameter | Type | Description |
|---|---|---|
| filepath | string | Full path to the document file |
| content | string | New content for the document |
`remove_document` — Remove a document from the knowledge base index. Optionally deletes the file from disk.
| Parameter | Type | Default | Description |
|---|---|---|---|
| filepath | string | required | Path to the document file |
| delete_file | bool | false | If true, also delete the file from disk |
`add_from_url` — Fetch content from a URL, strip HTML (scripts, styles, nav, footer, header), convert to markdown, and add to the knowledge base.
| Parameter | Type | Default | Description |
|---|---|---|---|
| url | string | required | URL to fetch content from |
| category | string | "general" | Document category |
| title | string | null | Custom title (auto-detected from the <title> tag if not provided) |
`search_similar` — Find documents similar to a given document using embedding similarity.
| Parameter | Type | Default | Description |
|---|---|---|---|
| filepath | string | required | Path to the reference document |
| max_results | int | 5 | Number of similar documents to return (1-20) |
`evaluate_retrieval` — Evaluate retrieval quality with test queries. Useful for tuning hybrid_alpha, testing query expansion effectiveness, or validating after reindexing.
| Parameter | Type | Description |
|---|---|---|
| test_cases | string (JSON) | Array of test cases: [{"query": "...", "expected_filepath": "..."}, ...] |
Metrics: MRR@5 (mean reciprocal rank of the expected document within the top 5) and Recall@5 (fraction of queries whose expected document appears in the top 5).
Knowledge RAG is fully configurable via a config.yaml file in the project root. If no config.yaml exists, sensible defaults are used — the system works out of the box with zero configuration.
# Option 1: Use a preset
cp presets/cybersecurity.yaml config.yaml # Offensive/defensive security, CTFs
cp presets/developer.yaml config.yaml # Software engineering, APIs, DevOps
cp presets/research.yaml config.yaml # Academic research, papers, studies
cp presets/general.yaml config.yaml # Blank slate, pure semantic search
# Option 2: Start from the documented template
cp config.example.yaml config.yaml
# Edit config.yaml to your needs
Restart Claude Code after changing config.yaml.
# Paths — where your documents live
paths:
documents_dir: "./documents" # Scanned recursively
data_dir: "./data" # Index storage
models_cache_dir: "./models_cache" # Persistent embedding model cache
# Documents — what gets indexed and how
documents:
supported_formats: # File types to index
- .md
- .txt
- .pdf
- .docx
- .ipynb
# - .py # Uncomment to index code
exclude_patterns: # Glob patterns to skip
- "node_modules"
- ".venv"
- "__pycache__"
chunking:
chunk_size: 1000 # Max chars per chunk
chunk_overlap: 200 # Shared chars between chunks
# Models — AI models for search (all run locally, no API keys)
models:
embedding:
model: "BAAI/bge-small-en-v1.5" # ONNX, ~33MB, auto-downloaded
dimensions: 384
gpu: false # Set true + pip install knowledge-rag[gpu]
reranker:
enabled: true # Set false on low-resource machines
model: "Xenova/ms-marco-MiniLM-L-6-v2"
top_k_multiplier: 3 # Candidates fetched before reranking
# Search — result limits and collection name
search:
default_results: 5
max_results: 20
collection_name: "knowledge_base" # Change for separate knowledge bases
# Categories — auto-tag documents by folder path
# Set to {} to disable categorization entirely
category_mappings:
"security/redteam": "redteam"
"security/blueteam": "blueteam"
"notes": "notes"
# Keyword routing — prioritize categories based on query keywords
# Set to {} for pure semantic search with no routing bias
keyword_routes:
redteam:
- pentest
- exploit
- privilege escalation
# Query expansion — expand abbreviations for better BM25 recall
# Set to {} for no expansion (search terms used as-is)
query_expansions:
sqli:
- sql injection
- sqli
privesc:
- privilege escalation
- privesc
See `config.example.yaml` for the fully documented template with explanations for every field.
Pre-built configurations for common use cases:
| Preset | File | Categories | Keywords | Expansions | Best For |
|---|---|---|---|---|---|
| Cybersecurity | presets/cybersecurity.yaml | 8 | 200+ | 69 | Red/Blue Team, CTFs, threat hunting, exploit dev |
| Developer | presets/developer.yaml | 9 | 150+ | 50+ | Full-stack dev, APIs, DevOps, cloud, databases |
| Research | presets/research.yaml | 9 | 100+ | 40+ | Academic papers, thesis, lab notebooks, datasets |
| General | presets/general.yaml | 0 | 0 | 0 | Blank slate — pure semantic search, no domain logic |
Creating your own preset: Copy config.example.yaml, fill in your categories/keywords/expansions, save to presets/your-domain.yaml.
| Field | Default | Description |
|---|---|---|
| paths.documents_dir | ./documents | Root folder scanned recursively for documents |
| paths.data_dir | ./data | Internal storage for ChromaDB and index metadata |
| paths.models_cache_dir | ./models_cache | Persistent cache for embedding models (~250MB). Survives reboots |
Relative paths resolve from the project root. Absolute paths work too.
| Field | Default | Description |
|---|---|---|
| documents.supported_formats | .md .txt .pdf .py .json .docx .xlsx .pptx .csv .ipynb | File extensions to index |
| documents.exclude_patterns | [] (empty) | Glob patterns for files/dirs to skip during indexing |
| documents.chunking.chunk_size | 1000 | Max characters per chunk |
| documents.chunking.chunk_overlap | 200 | Characters shared between consecutive chunks |
Chunking guidelines: Short notes → 500/100. General use → 1000/200. Long technical docs → 1500/300.
For .md files, chunking splits at ## and ### header boundaries first. Sections larger than chunk_size are sub-chunked with overlap. Non-markdown files use fixed-size chunking.
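The split-then-sub-chunk behavior described above can be sketched as follows. A simplified illustration only, not the project's actual chunker; the function name is hypothetical:

```python
import re

def chunk_markdown(text, chunk_size=1000, overlap=200):
    """Split at ##/### header boundaries first; sections larger
    than chunk_size are sub-chunked with overlap."""
    sections = re.split(r"(?m)^(?=#{2,3} )", text)
    chunks = []
    for section in sections:
        section = section.strip()
        if not section:
            continue
        if len(section) <= chunk_size:
            chunks.append(section)
        else:
            step = max(1, chunk_size - overlap)  # sliding window
            for start in range(0, len(section), step):
                chunks.append(section[start:start + chunk_size])
                if start + chunk_size >= len(section):
                    break
    return chunks
```

Splitting at headers first keeps each chunk topically coherent, which tends to improve both embedding quality and BM25 precision.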
| Field | Default | Description |
|---|---|---|
| models.embedding.model | BAAI/bge-small-en-v1.5 | Embedding model (ONNX, runs locally) |
| models.embedding.dimensions | 384 | Vector dimensions (must match model) |
| models.embedding.gpu | false | Enable CUDA GPU acceleration. Requires pip install knowledge-rag[gpu] |
| models.reranker.enabled | true | Enable cross-encoder reranking |
| models.reranker.model | Xenova/ms-marco-MiniLM-L-6-v2 | Reranker model |
| models.reranker.top_k_multiplier | 3 | Fetch N*multiplier candidates for reranking |
Embedding model options (fastest → most accurate):
- BAAI/bge-small-en-v1.5 — 384D, ~33MB (default)
- BAAI/bge-base-en-v1.5 — 768D, ~130MB
- BAAI/bge-large-en-v1.5 — 1024D, ~335MB
- intfloat/multilingual-e5-small — 384D, 100+ languages

Warning: Changing the embedding model after indexing requires `reindex_documents(full_rebuild=True)`.
| Field | Default | Description |
|---|---|---|
| search.default_results | 5 | Results returned when no limit specified |
| search.max_results | 20 | Hard cap even if client requests more |
| search.collection_name | knowledge_base | ChromaDB collection — change for separate KBs |
Map folder paths to category names. Documents in matching folders get auto-tagged, enabling filtered searches.
category_mappings:
"security/redteam": "redteam"
"security": "security"
Set category_mappings: {} to disable — documents are still searchable, just without category filters.
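One way to implement this folder-to-category auto-tagging is longest-prefix matching, so that "security/redteam" wins over "security" for files under both. A sketch under that assumption; the source doesn't specify how overlapping mappings resolve:

```python
from pathlib import PurePosixPath

CATEGORY_MAPPINGS = {
    "security/redteam": "redteam",
    "security": "security",
}

def category_for(relpath, mappings=CATEGORY_MAPPINGS, default="general"):
    """Pick the category whose folder prefix matches the most
    path components of the document's relative path."""
    parts = PurePosixPath(relpath).parts
    best, best_len = default, 0
    for prefix, category in mappings.items():
        prefix_parts = PurePosixPath(prefix).parts
        if parts[:len(prefix_parts)] == prefix_parts and len(prefix_parts) > best_len:
            best, best_len = category, len(prefix_parts)
    return best
```

Matching on whole path components (rather than raw string prefixes) avoids "security-archive/" accidentally matching "security".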
Route queries to categories based on keywords. When a query contains listed keywords, results from that category are prioritized (not filtered — other categories still appear, ranked lower).
keyword_routes:
redteam:
- pentest
- exploit
- sqli
Single-word keywords use regex word boundaries (\b) — "api" won't match "RAPID". Multi-word keywords use substring matching.
Set keyword_routes: {} for pure semantic search.
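The boundary rule above can be illustrated with a small matcher. A hypothetical helper mirroring the described behavior, not the project's actual code:

```python
import re

def matches_keyword(query, keyword):
    """Single-word keywords use regex word boundaries (\\b);
    multi-word keywords fall back to substring matching."""
    q = query.lower()
    kw = keyword.lower()
    if " " in kw:
        return kw in q
    return re.search(rf"\b{re.escape(kw)}\b", q) is not None
```

With this rule, the keyword "api" matches "rest api design" but not "rapid deployment", since "api" inside "rapid" has no word boundary on either side.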
Expand search terms with synonyms before BM25 search. Supports single tokens, bigrams, and full query matches.
query_expansions:
sqli:
- sql injection
- sqli
k8s:
- kubernetes
- k8s
Set query_expansions: {} for no expansion.
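The expansion step can be sketched as a token-level lookup. This simplified version handles single tokens only, whereas the real system also supports bigrams and full-query matches; names are illustrative:

```python
QUERY_EXPANSIONS = {
    "sqli": ["sql injection", "sqli"],
    "k8s": ["kubernetes", "k8s"],
}

def expand_query(query, expansions=QUERY_EXPANSIONS):
    """Replace known tokens with their synonym lists before BM25 search."""
    terms = []
    for token in query.lower().split():
        terms.extend(expansions.get(token, [token]))
    # deduplicate while preserving order
    seen, out = set(), []
    for term in terms:
        if term not in seen:
            seen.add(term)
            out.append(term)
    return " ".join(out)
```

Keeping the original token alongside its synonyms (e.g. "sqli" plus "sql injection") means documents using either phrasing still score on BM25.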
| hybrid_alpha | Behavior | Best For |
|---|---|---|
| 0.0 | Pure BM25 keyword | Exact terms, CVEs, tool names |
| 0.3 | Keyword-heavy (default) | Technical queries with specific terms |
| 0.5 | Balanced | General queries |
| 0.7 | Semantic-heavy | Conceptual queries, related topics |
| 1.0 | Pure semantic | "How to..." questions, abstract concepts |
knowledge-rag/
├── mcp_server/
│ ├── __init__.py # Stdout protection + version
│ ├── config.py # YAML config loader + defaults
│ ├── ingestion.py # 20 parsers, chunking, metadata extraction
│ └── server.py # MCP server, ChromaDB, BM25, reranker, 12 tools
├── config.example.yaml # Documented config template (copy to config.yaml)
├── config.yaml # Your active configuration (git-ignored)
├── presets/ # Ready-to-use domain configurations
│ ├── cybersecurity.yaml
│ ├── developer.yaml
│ ├── research.yaml
│ └── general.yaml
├── documents/ # Your documents (scanned recursively)
├── data/
│ ├── chroma_db/ # ChromaDB vector database
│ └── index_metadata.json # Incremental indexing state
├── models_cache/ # Persistent embedding model cache
├── tests/ # Test suite (82 tests)
├── install.sh # Linux/macOS installer
├── install.ps1 # Windows installer
├── venv/ # Python virtual environment
├── requirements.txt
├── pyproject.toml
├── LICENSE
└── README.md
Requires Python 3.11 or newer.
python --version # Must be 3.11+
On first run, FastEmbed downloads models to models_cache/. If the download fails:
# Clear cache and retry
# Windows:
rmdir /s /q models_cache
# Linux/macOS:
rm -rf models_cache
# Then restart the MCP server
# Check documents directory has files
ls documents/
# Force reindex via Claude Code:
# reindex_documents(force=True)
# Or nuclear rebuild if model changed:
# reindex_documents(full_rebuild=True)
- Verify that ~/.claude.json exists and has valid JSON in the mcpServers section
- Use escaped backslashes (\\) in paths on Windows
- Run claude mcp list to check connection status

The MCP server uses stdout for JSON-RPC communication. If a library prints to stdout during init, the stream gets corrupted. v3.4.3+ includes stdout protection that prevents this. If you're on an older version, upgrade:
pip install --upgrade knowledge-rag
The cross-encoder reranker model is lazy-loaded on the first query. This adds a one-time ~2-3 second delay for model download and loading. Subsequent queries are fast.
With ~200 documents, expect ~300-500MB RAM. The embedding model (~50MB) and reranker (~25MB) are loaded into memory. For very large knowledge bases (1000+ documents), consider enabling GPU acceleration and using exclude patterns to limit index scope.
Changelog highlights:
- Language-aware code parsers: C (.c), C++ (.cpp/.h), JavaScript (.js/.jsx), TypeScript (.ts/.tsx) with per-language function/class/import extraction
- XML support (.xml) — root element and namespace metadata extraction
- NPM package (npx knowledge-rag) + Docker image (ghcr.io/lyonzin/knowledge-rag)
- Correct language metadata per file type (was hardcoded to "python" for all code files)
- [WARN] log when CUDA init fails (missing drivers, wrong version, etc.)
- Forces CPUExecutionProvider when gpu: false — eliminates noisy CUDA probe errors in logs
- BASE_DIR resolution prefers directories containing config.yaml over those with only config.example.yaml (fixes editable installs)
- Python version bound relaxed (<3.13 → >=3.11). Python 3.13 and 3.14 now supported — onnxruntime ships wheels for both.
- Optional GPU acceleration: pip install knowledge-rag[gpu] + models.embedding.gpu: true in config. 5-10x faster indexing on NVIDIA GPUs with automatic CPU fallback.
- Stdout protection: __init__.py saves original stdout and redirects to stderr during init; server.py main() restores it before mcp.run(). v3.4.2's global redirect broke the MCP JSON-RPC response channel.
- pip install knowledge-rag now auto-detects project directory from venv location
- install.sh — Linux/macOS installer with pip and from-source modes
- models_cache_dir — persistent embedding model cache, prevents re-download after reboots
- exclude_patterns — glob-based file/directory exclusion during indexing
- knowledge-rag init CLI
- KNOWLEDGE_RAG_DIR env var
- hybrid_alpha=0 skips Ollama; default changed from 0.5 to 0.3
- hybrid_alpha parameter

Contributing:
1. Fork the repository
2. Create a feature branch (git checkout -b feature/amazing-feature)
3. Commit your changes
4. Push to the branch (git push origin feature/amazing-feature)
5. Open a Pull Request

This project is licensed under the MIT License - see the LICENSE file for details.
Lyon.
Security Researcher | Developer
Add this to claude_desktop_config.json and restart Claude Desktop.
{
  "mcpServers": {
    "lyonzin-knowledge-rag": {
      "command": "npx",
      "args": ["-y", "knowledge-rag"]
    }
  }
}