A local security knowledge base that indexes documentation like CVEs and CWEs using hybrid keyword and semantic search. It enables LLM agents to query indexed materials via MCP for accurate, offline retrieval during security audits and code reviews.
Your LLM re-reads the same reference docs every conversation. Grimoire indexes them once.
Your LLM agent needs to reference CWE-89 during a code review. Without Grimoire, it either hallucinates the details, or you paste 50 pages of NIST docs into the context window and hope it finds the right paragraph. Every conversation. Every time.
Grimoire indexes security reference material once — CVEs, CWEs, OWASP, audit findings, your internal standards — into a single SQLite file with both FTS5 keyword search and semantic embeddings. Your LLM agent searches it mid-conversation via MCP. Exact matches when you need "CWE-89". Conceptual recall when you need "authentication bypass techniques". Both in one query.
One SQLite file. Zero cloud. Instant retrieval via MCP.
              +------------------+
              |   Data Sources   |
              |  CVE  MD  CSV .. |
              +--------+---------+
                       |
                    ingest()
                       |
              +--------v---------+
              |    SQLite DB     |
              |  +------------+  |
              |  | documents  |  |
              |  +------------+  |
              |  | docs_fts5  |  |  <-- FTS5 keyword index
              |  +------------+  |
              |  | embeddings |  |  <-- semantic vectors
              |  +------------+  |
              +--------+---------+
                       |
              +--------v---------+
              |  Search Engine   |
              |                  |
              |  keyword (BM25)  |
              |  semantic (cos)  |
              |  hybrid (both)   |
              +--------+---------+
                       |
         +-------------+-------------+
         |                           |
  +------v------+           +--------v--------+
  | Python API  |           |   MCP Server    |
  |             |           |                 |
  | Grimoire()  |           | grimoire_search |
  | .search()   |           | grimoire_status |
  | .add_doc()  |           | grimoire_quality|
  +-------------+           +-----------------+
git clone https://github.com/tannernicol/grimoire.git
cd grimoire
pip install -e .
# Fetch and index real security data (NVD CVEs + CWE catalog + OWASP Top 10)
python scripts/fetch_sources.py all
# Search
python examples/search_demo.py "SQL injection"
python examples/search_demo.py "access control" --severity critical
python examples/search_demo.py --status
Grimoire fetches from reputable public sources — no manual downloads:
# Everything: NVD + CWE + OWASP
python scripts/fetch_sources.py all
# Recent CVEs from NIST NVD (last 90 days, critical only)
python scripts/fetch_sources.py nvd --days 90 --severity CRITICAL
# Full CWE catalog from MITRE
python scripts/fetch_sources.py cwe
# With embeddings for semantic search (requires Ollama)
python scripts/fetch_sources.py all --embeddings
Requires Ollama with nomic-embed-text:
ollama pull nomic-embed-text
python scripts/fetch_sources.py all --embeddings
python examples/search_demo.py "authentication bypass" --mode hybrid
Most RAG setups do one thing: chunk documents, embed them, vector search. That works until you need an exact CVE number, a specific NIST control ID, or a CWE by name. Vector search alone misses exact matches.
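Grimoire runs both: FTS5 keyword search ranked with BM25 for exact identifiers, and semantic search over embeddings ranked by cosine similarity for conceptual queries.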
Everything lives in a single SQLite file. No Postgres, no Pinecone, no cloud anything.
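To make the keyword half concrete, here is a minimal sketch of BM25-ranked exact matching with SQLite FTS5, using only Python's standard library. The table name `docs_fts` and its two columns are assumptions for illustration; Grimoire's actual schema may differ. It also assumes your Python build ships SQLite with FTS5 enabled, which most do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table layout for illustration; not Grimoire's real schema.
conn.execute("CREATE VIRTUAL TABLE docs_fts USING fts5(title, content)")
conn.executemany(
    "INSERT INTO docs_fts (title, content) VALUES (?, ?)",
    [
        ("CWE-89", "Improper neutralization of special elements used in an SQL command (SQL injection)."),
        ("CWE-79", "Improper neutralization of input during web page generation (cross-site scripting)."),
        ("CWE-287", "Improper authentication allows an actor to bypass identity checks."),
    ],
)


def keyword_search(query, limit=5):
    """Return (title, score) rows ranked by FTS5's bm25().

    FTS5 reports BM25 so that lower (more negative) values are better,
    so an ascending sort puts the best match first.
    """
    return conn.execute(
        "SELECT title, bm25(docs_fts) AS score FROM docs_fts "
        "WHERE docs_fts MATCH ? ORDER BY score LIMIT ?",
        (query, limit),
    ).fetchall()


# Identifiers containing '-' must be quoted as a phrase, otherwise FTS5
# treats the hyphen as query syntax.
exact_hits = keyword_search('"CWE-89"')
topic_hits = keyword_search("authentication")
```

The quoted phrase matches only the CWE-89 row, which is the kind of exact lookup a pure vector search tends to miss.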
Resume reviewers click through looking for real security signal. Grimoire makes it obvious:
python scripts/fetch_sources.py nvd --days 30 --severity CRITICAL
python scripts/fetch_sources.py cwe --embeddings
python examples/search_demo.py "JWT kid bypass" --mode hybrid --limit 5
The search demo will show hybrid BM25 + cosine hits with severity labels, while
the MCP server (pip install -e '.[mcp]') exposes identical results to your
agents without copying a single document into the prompt.
from grimoire.core import Grimoire
g = Grimoire("security_kb.db")
# Add documents
g.add_document(
    source="advisory",
    title="CVE-2024-1234",
    content="Buffer overflow in example library allows RCE via crafted input...",
    severity="critical",
    categories=["buffer-overflow", "RCE"],
)
# Search
results = g.search("buffer overflow", mode="hybrid", limit=10)
for r in results:
    print(f"[{r.score:.3f}] {r.title} ({r.severity})")
# Check index health
status = g.index_status()
health = g.health_check()
Built-in ingestors for common security data formats:
# CVE/NVD feeds (API 2.0, 1.1, or JSON array)
from grimoire.ingest.cve import CVEIngestor
CVEIngestor().ingest_to_grimoire(g, "cve_data.json")
# Markdown files (recursively scan directories)
from grimoire.ingest.markdown import MarkdownIngestor
MarkdownIngestor(source_label="audit-findings").ingest_to_grimoire(g, "findings/")
# CSV with column mapping
from grimoire.ingest.csv import CSVIngestor
CSVIngestor(
    source_label="vuln-db",
    column_map={"vuln_name": "title", "details": "content"},
).ingest_to_grimoire(g, "vulns.csv")
Add your own by subclassing BaseIngestor:
from grimoire.ingest.base import BaseIngestor
class MyIngestor(BaseIngestor):
    source_name = "my-source"

    def ingest(self, path):
        for item in read_my_data(path):
            yield {
                "source": self.source_name,
                "title": item["name"],
                "content": item["description"],
                "severity": item.get("severity"),
                "categories": item.get("tags"),
            }
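The `read_my_data` call in the ingestor above is a placeholder for your own loader. As one possible sketch, assuming the source is a JSON Lines file (one JSON object per line, a hypothetical format chosen for illustration), it could look like this:

```python
import json
from pathlib import Path


def read_my_data(path):
    """Yield one dict per non-empty line of a JSON Lines file.

    Hypothetical helper: the file format and the keys used by the
    ingestor ("name", "description", ...) are assumptions, not part
    of Grimoire itself.
    """
    with Path(path).open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # skip blank lines between records
                yield json.loads(line)
```

Because it is a generator, records stream through one at a time, so a large export does not need to fit in memory.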
Grimoire ships an MCP server so LLM agents can search your knowledge base mid-conversation.
Note: The MCP server is an optional dependency. Install it with:
pip install -e ".[mcp]"
# Start the server
grimoire-mcp --db security_kb.db
Add to Claude Code or Claude Desktop:
{
  "mcpServers": {
    "grimoire": {
      "command": "grimoire-mcp",
      "args": ["--db", "/path/to/security_kb.db"]
    }
  }
}
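If you would rather not hand-edit the client configuration, the entry above can be merged in with a few lines of Python. This is a sketch under assumptions: `add_grimoire_server` is a hypothetical helper, and you pass in the path to your client's config file yourself, since its location varies by client and operating system.

```python
import json
from pathlib import Path


def add_grimoire_server(config_path, db_path):
    """Add or overwrite the "grimoire" entry under "mcpServers".

    Other servers and unrelated settings in the file are left untouched.
    Hypothetical helper, not shipped with Grimoire.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    config.setdefault("mcpServers", {})["grimoire"] = {
        "command": "grimoire-mcp",
        "args": ["--db", str(db_path)],
    }
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Restart the client afterwards so it picks up the new server entry.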
| Tool | What it does |
|---|---|
| grimoire_search | Keyword, semantic, or hybrid search with severity/source filters |
| grimoire_index_status | Document count, embedding coverage, sources, last update |
| grimoire_quality | Health check; optionally test a query for result quality |
database:
  path: grimoire.db
ollama:
  url: http://localhost:11434
  model: nomic-embed-text
search:
  default_mode: hybrid
  semantic_weight: 0.6 # 60% semantic, 40% keyword
  default_limit: 20
  min_similarity: 0.3
quality:
  min_cases: 5
  gate_on_missing_eval: false
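The `semantic_weight: 0.6` setting above blends the two scores as 60% cosine similarity and 40% normalized BM25. The sketch below shows one way such a weighted fusion can be computed. The min-max normalization is an assumption chosen for illustration, since the configuration does not say how Grimoire normalizes BM25, and the function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def normalize_bm25(scores):
    """Min-max scale BM25 scores into [0, 1], treating higher as better.

    Assumption: SQLite FTS5's bm25() is lower-is-better, so negate its
    values before passing them in.
    """
    if not scores:
        return []
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [1.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def hybrid_scores(bm25, cosine_sims, semantic_weight=0.6):
    """Weighted sum: (1 - w) * normalized_bm25 + w * cosine_sim."""
    keyword = normalize_bm25(bm25)
    return [
        (1 - semantic_weight) * k + semantic_weight * c
        for k, c in zip(keyword, cosine_sims)
    ]


# Three candidate documents: the first wins on keywords, the second on meaning.
scores = hybrid_scores([4.0, 1.0, 2.5], [0.2, 0.9, 0.5])
best = max(range(len(scores)), key=scores.__getitem__)
```

With these inputs the second document ranks first, because its strong semantic match outweighs the first document's keyword lead at a 0.6 semantic weight.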
Semantic embeddings come from Ollama (nomic-embed-text).
Hybrid score: hybrid = (0.4 * normalized_bm25) + (0.6 * cosine_sim)
Storage is SQLite (sqlite3).
Ollama with nomic-embed-text is only needed for semantic/hybrid search; keyword search works without it.
pip install -e ".[dev]"
pytest
In scope — what Grimoire defends against:
Out of scope — what Grimoire intentionally does not defend against:
nomic-embed-text could influence ranking
flowchart TB
subgraph Data Sources
NVD[NVD / CVE Feeds]
CWE[MITRE CWE Catalog]
OWASP[OWASP Top 10]
Custom[Markdown / CSV\nAudit Findings]
end
NVD --> Ingest
CWE --> Ingest
OWASP --> Ingest
Custom --> Ingest
subgraph Grimoire Core
Ingest[Ingestor Pipeline]
Ingest -->|documents| DB[(SQLite DB)]
DB -->|FTS5 index| FTS[BM25 Keyword Search]
DB -->|embedding vectors| Sem[Cosine Semantic Search]
Ollama[Local Ollama\nnomic-embed-text] -->|embeddings| DB
FTS --> Fusion[Score Fusion\n40% keyword + 60% semantic]
Sem --> Fusion
Fusion --> Results[Ranked Results\nwith provenance]
end
Results --> API[Python API\nGrimoire.search]
Results --> MCP[MCP Server\ngrimoire_search\ngrimoire_status\ngrimoire_quality]
MCP --> Agent([LLM Agent\ne.g. Claude Code])
API --> Scripts([Scripts / Pipelines])
Tanner Nicol — tannner.com · GitHub · LinkedIn
MIT