A local-first Graph-RAG system that combines ChromaDB with metadata-based graph relationships and Gemini 2.5 Flash for intelligent Q&A over your Obsidian vault.
┌─────────────────┐    ┌─────────────────┐
│    Obsidian     │    │    ChromaDB     │
│     Vault       │───▶│ (Unified Store) │
│                 │    │ Vector + Graph  │
└─────────────────┘    │    Metadata     │
        │              └─────────────────┘
        │                       │
        │                       ▼
        │              ┌─────────────────┐
        │              │      DSPy       │
        └─────────────▶│   RAG Engine    │
                       │  + Gemini 2.5   │
                       │   (Multi-hop    │
                       │   Retrieval)    │
                       └─────────────────┘
                                │
                                ▼
                       ┌─────────────────┐
                       │   MCP Server    │
                       │    (FastMCP)    │
                       └─────────────────┘
                                │
                                ▼
                       ┌─────────────────┐
                       │ Claude Desktop  │
                       │   Integration   │
                       └─────────────────┘
The easiest way to get started with Claude Desktop, Cursor, or Raycast:
# Interactive setup wizard
uv run install.py
# Or non-interactive with your settings
uv run install.py --vault "/path/to/your/vault" --api-key "your_key"
The installer will:
If you prefer manual setup:
curl -LsSf https://astral.sh/uv/install.sh | sh # macOS/Linux
uv sync
cp configs/.env.example .env
# Edit .env and add your GEMINI_API_KEY and vault paths
# Full indexing (unified ChromaDB store)
uv run scripts/reindex.py all
# Check indexing status
uv run scripts/reindex.py status
Claude Desktop (stdio mode):
{
"mcpServers": {
"graph-rag-obsidian": {
"command": "uvx",
"args": ["--python", "3.13", "--from", ".", "graph-rag-mcp-stdio"],
"cwd": "/path/to/graph-rag-mcp-server",
"env": {
"GEMINI_API_KEY": "your_api_key_here",
"OBSIDIAN_RAG_VAULTS": "/path/to/your/vault"
}
}
}
}
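If you would rather script this entry than paste it by hand, the following is a minimal sketch that merges the server block above into an existing configuration dictionary. The helper name `add_graph_rag_server` is hypothetical and not part of this project; it only manipulates a dict, so locating and writing the actual Claude Desktop config file is left to you.

```python
import json


def add_graph_rag_server(config: dict, repo_dir: str, api_key: str, vault: str) -> dict:
    """Return a copy of `config` with the graph-rag-obsidian entry added.

    Hypothetical helper: the entry mirrors the JSON shown above.
    """
    # Round-trip through JSON as a simple deep copy, so the input is not mutated.
    merged = json.loads(json.dumps(config))
    servers = merged.setdefault("mcpServers", {})
    servers["graph-rag-obsidian"] = {
        "command": "uvx",
        "args": ["--python", "3.13", "--from", ".", "graph-rag-mcp-stdio"],
        "cwd": repo_dir,
        "env": {"GEMINI_API_KEY": api_key, "OBSIDIAN_RAG_VAULTS": vault},
    }
    return merged


# Example: an existing config that already declares another server.
existing = {"mcpServers": {"other-server": {"command": "node"}}}
updated = add_graph_rag_server(
    existing,
    repo_dir="/path/to/graph-rag-mcp-server",
    api_key="your_api_key_here",
    vault="/path/to/your/vault",
)
print(json.dumps(updated, indent=2))
```

Merging into a copy keeps any servers you have already configured and leaves the original dictionary untouched until you decide to write the result back.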
Cursor (HTTP mode):
# Start HTTP server
uv run graph-rag-mcp-http
# Configure Cursor MCP extension to use http://localhost:8765
Raycast (HTTP mode):
# Start HTTP server
uv run graph-rag-mcp-http
# Install generated Raycast extension
For detailed configuration instructions, see SETUP.md.
# Interactive installer
uv run install.py
# Claude Desktop (stdio mode)
uvx --python 3.13 --from . graph-rag-mcp-stdio
# Cursor/Raycast (HTTP mode)
uv run graph-rag-mcp-http
# HTTP with custom port
uv run graph-rag-mcp-http --port 9000
# Direct stdio runs (alternative)
uv run main.py # stdio mode
uv run src/mcp_server.py # stdio mode
# Full indexing
uv run scripts/reindex.py all
# ChromaDB unified store
uv run scripts/reindex.py unified
# Check status
uv run scripts/reindex.py status
# Start file watcher
uv run scripts/reindex_watch.py start
# Test file detection
uv run scripts/reindex_watch.py test
Enhance your vault with intelligent PARA system classification using DSPy:
# Analyze current vault taxonomy state
uv run scripts/enrich_para_taxonomy.py analyze --sample 100
# Preview enrichment (dry run) on sample notes
uv run scripts/enrich_para_taxonomy.py enrich --limit 10 --dry-run
# Apply enrichment to specific notes
uv run scripts/enrich_para_taxonomy.py enrich "path/to/note.md" --apply
# Bulk enrichment with filters
uv run scripts/enrich_para_taxonomy.py enrich --limit 50 --folder "Projects" --apply
# FULL VAULT ENRICHMENT (new!)
# Preview entire vault enrichment
uv run scripts/enrich_para_taxonomy.py enrich-all --dry-run
# Apply to entire vault (skips already enriched by default)
uv run scripts/enrich_para_taxonomy.py enrich-all --apply
# Force re-enrichment of entire vault
uv run scripts/enrich_para_taxonomy.py enrich-all --apply --force-all
# Customize batch size for large vaults
uv run scripts/enrich_para_taxonomy.py enrich-all --apply --batch-size 25
PARA Classification Features:
#para/project/ai/automation
The server exposes these tools for Claude:
search_notes: Vector search across vault
answer_question: RAG-powered Q&A with citations
graph_neighbors: Find related notes via graph
get_subgraph: Extract note subgraphs
create_note: Create new notes with auto-enriched frontmatter
list_notes: Browse vault contents
read_note: Get full note content
get_note_properties: Read frontmatter
update_note_properties: Modify frontmatter
add_content_to_note: Append content
get_backlinks: Find notes linking to target
get_notes_by_tag: Find notes by tag
archive_note: Move notes to archive
create_folder: Create directories
reindex_vault: Reindex unified ChromaDB store
enrich_notes: Apply PARA taxonomy enrichment to notes
Key settings in .env:
# Required
GEMINI_API_KEY=your_key_here
# ChromaDB configuration
OBSIDIAN_RAG_CHROMA_DIR=/custom/path/to/.chroma_db
OBSIDIAN_RAG_COLLECTION=vault_collection
# Optional customization
OBSIDIAN_RAG_EMBEDDING_MODEL=all-MiniLM-L6-v2
OBSIDIAN_RAG_GEMINI_MODEL=gemini-2.5-flash
# Semantic chunking configuration
OBSIDIAN_RAG_CHUNK_STRATEGY=semantic # or "character" for simple chunking
OBSIDIAN_RAG_SEMANTIC_MIN_CHUNK_SIZE=100
OBSIDIAN_RAG_SEMANTIC_MAX_CHUNK_SIZE=3000
OBSIDIAN_RAG_SEMANTIC_MERGE_THRESHOLD=200
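To illustrate how the chunking settings above could be consumed, here is a minimal standard-library sketch that reads them from the environment and checks them for consistency. The `ChunkSettings` class is hypothetical, the project's own `src/config.py` may load and validate these values differently, and the fallback values simply mirror the example above rather than any documented defaults.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ChunkSettings:
    """Hypothetical container for the chunking variables shown above."""

    strategy: str
    min_chunk_size: int
    max_chunk_size: int
    merge_threshold: int

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "ChunkSettings":
        env = os.environ if env is None else env
        # Fallbacks copy the example values; they are assumptions, not project defaults.
        settings = cls(
            strategy=env.get("OBSIDIAN_RAG_CHUNK_STRATEGY", "semantic"),
            min_chunk_size=int(env.get("OBSIDIAN_RAG_SEMANTIC_MIN_CHUNK_SIZE", "100")),
            max_chunk_size=int(env.get("OBSIDIAN_RAG_SEMANTIC_MAX_CHUNK_SIZE", "3000")),
            merge_threshold=int(env.get("OBSIDIAN_RAG_SEMANTIC_MERGE_THRESHOLD", "200")),
        )
        if settings.strategy not in ("semantic", "character"):
            raise ValueError(f"unknown chunk strategy: {settings.strategy!r}")
        if settings.min_chunk_size > settings.max_chunk_size:
            raise ValueError("minimum chunk size exceeds maximum chunk size")
        return settings


# Passing a plain dict makes the sketch easy to try without touching os.environ.
chunking = ChunkSettings.from_env({"OBSIDIAN_RAG_SEMANTIC_MAX_CHUNK_SIZE": "1500"})
print(chunking)
```

Failing fast on an unknown strategy or an inverted size range surfaces a mistyped .env entry before a long reindex starts.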
# Vector search
results = search_notes("machine learning algorithms", k=5)
# Q&A with context
answer = answer_question("What did I learn about transformers?")
# Find related notes
neighbors = graph_neighbors("Deep Learning", depth=2)
# Get backlinks
backlinks = get_backlinks("Neural Networks")
# Find by tag
tagged_notes = get_notes_by_tag("ai")
# Create a new note with auto-enrichment
note = create_note(
title="Machine Learning Breakthrough",
content="# Key Findings\n\nDiscovered new optimization technique...",
folder="Research",
tags=["ml", "optimization"],
para_type="project", # Hint for PARA classification
enrich=True # Apply AI enrichment
)
# Read note
content = read_note("Research/AI Progress.md")
# Update properties
update_note_properties("Research/AI Progress.md", {
"status": "completed",
"tags": ["ai", "research", "finished"]
})
The create_note tool creates properly formatted Obsidian notes with:
Example created note:
---
created: '2025-08-23T20:30:00.000000'
modified: '2025-08-23T20:30:00.000000'
para_type: project
para_category: ai/research
para_confidence: 0.85
key_concepts:
- Machine Learning Optimization
- Gradient Descent Improvements
- Performance Benchmarking
tags:
- ml
- optimization
- para/project
- para/project/ai/research
- tech/ai/ml/optimization
potential_links:
- '[[Optimization Techniques]]'
- '[[Research Log 2025]]'
enrichment_version: '1.0'
last_enriched: '2025-08-23T20:30:00.000000'
enrichment_model: gemini-2.5-flash
---
# Machine Learning Breakthrough
Your content here...
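The `para/...` tags in this example appear to follow from `para_type` and `para_category`. A minimal sketch of that derivation is shown here; the function name `para_tags` and the lowercasing step are assumptions made for illustration, and the enrichment code in this project may build its tags differently.

```python
def para_tags(para_type: str, para_category: str = "") -> list[str]:
    """Build hierarchical PARA tags from a type and an optional category.

    Hypothetical helper: lowercasing and slash-trimming are assumptions.
    """
    base = f"para/{para_type.strip().lower()}"
    tags = [base]
    category = para_category.strip().strip("/").lower()
    if category:
        # The category nests under the type, giving a more specific tag.
        tags.append(f"{base}/{category}")
    return tags


print(para_tags("project", "ai/research"))
# ['para/project', 'para/project/ai/research']
```

Emitting both the broad and the specific tag lets a tag query match at either level of the hierarchy.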
Step 1: Analyze your vault
uv run scripts/enrich_para_taxonomy.py analyze --sample 100
Shows current taxonomy state and enrichment potential.
Step 2: Test on subset (dry run)
uv run scripts/enrich_para_taxonomy.py enrich --limit 5 --dry-run
Preview classifications without making changes.
Step 3: Apply enrichment
# Start small
uv run scripts/enrich_para_taxonomy.py enrich --limit 20 --apply
# Scale up
uv run scripts/enrich_para_taxonomy.py enrich --limit 100 --apply
Example enriched note:
---
para_type: project
para_category: AI/Automation
para_confidence: 0.9
key_concepts:
- AI Agent Development
- Computer Use Automation
- Grounded AI Systems
tags:
- "#project/ai/automation"
- "#area/ai/development"
potential_links:
- "Related Project Name"
enrichment_version: "1.0"
last_enriched: "2025-08-23T17:59:32"
---
# Your original note content remains unchanged
#para/project/ai/automation
The system stores graph relationships as ChromaDB metadata:
# Note-level metadata
metadata = {
"note_id": "my_note",
"title": "My Note Title",
"path": "/path/to/note.md",
"tags": "important,ai,project",
"links_to": "other_note,related_note",
"backlinks_from": "source_note,another_note",
"vault": "my_vault"
}
# Chunk-level metadata (semantic chunking)
chunk_metadata = {
"chunk_id": "my_note#chunk_0",
"chunk_type": "section",
"header_text": "Introduction",
"header_level": 2,
"importance_score": 0.8,
"sequential_next": "my_note#chunk_1",
"sequential_prev": "",
"parent_chunk": "my_note#header_0",
"child_chunks": "my_note#chunk_1,my_note#chunk_2",
"sibling_chunks": "my_note#chunk_3",
"semantic_chunk": True
}
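Because links are stored as comma-separated strings, a neighbor lookup amounts to splitting those strings and walking outward. The following is a minimal in-memory sketch of a depth-limited traversal over `links_to` and `backlinks_from`; the function names are hypothetical, the notes dictionary stands in for metadata fetched from ChromaDB, and the project's `graph_neighbors` tool may work differently.

```python
from collections import deque


def split_ids(value: str) -> list[str]:
    """Split a comma-separated metadata string into note ids."""
    return [item.strip() for item in value.split(",") if item.strip()]


def neighbors_within(notes: dict[str, dict], start: str, depth: int) -> dict[str, int]:
    """Breadth-first walk over links and backlinks; returns note_id -> hop count."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        hops = seen[current]
        if hops == depth:
            continue  # do not expand past the requested depth
        meta = notes.get(current, {})
        linked = split_ids(meta.get("links_to", "")) + split_ids(meta.get("backlinks_from", ""))
        for other in linked:
            if other not in seen:
                seen[other] = hops + 1
                queue.append(other)
    del seen[start]  # the start note is not its own neighbor
    return seen


# Stand-in for note-level metadata; the ids here are illustrative only.
notes = {
    "my_note": {"links_to": "other_note,related_note", "backlinks_from": "source_note"},
    "other_note": {"links_to": "deep_note", "backlinks_from": "my_note"},
    "deep_note": {"links_to": "far_note", "backlinks_from": "other_note"},
}
print(neighbors_within(notes, "my_note", depth=2))
# {'other_note': 1, 'related_note': 1, 'source_note': 1, 'deep_note': 2}
```

Tracking the hop count per note makes it straightforward to rank closer neighbors ahead of more distant ones.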
Install test dependencies and run the suite:
# Using uv (recommended)
uv sync --extra test
uv run pytest -q
# Or using the local virtualenv
PYTHONPATH=. .venv/bin/pytest -q
Common invocations:
# Only unit / integration
PYTHONPATH=. .venv/bin/pytest tests/unit -q
PYTHONPATH=. .venv/bin/pytest tests/integration -q
# Coverage (threshold configured in pytest.ini)
PYTHONPATH=. .venv/bin/pytest --cov -q
# Markers
pytest -m unit
pytest -m integration
pytest -m "not slow"
Notes:
If you see ModuleNotFoundError: No module named 'tests', prefix commands with PYTHONPATH=. as shown above.
The DSPy cache is written under a …cache/ directory. Override with XDG_CACHE_HOME or DSPY_CACHEDIR if needed.
# Test indexing
uv run scripts/reindex.py status
# Test file watching
uv run scripts/reindex_watch.py test
# Test unified store
uv run python -c "
from src.unified_store import UnifiedStore
from src.config import settings
store = UnifiedStore(
client_dir=settings.chroma_dir,
collection_name=settings.collection,
embed_model=settings.embedding_model
)
stats = store.get_stats()
print(f'Store stats: {stats}')
"
The smart_search MCP tool now returns a typed SmartSearchResponse payload:
schema_version, status (ok/degraded/error), and composite confidence.
diagnostics with retrieval method, intent + confidence, mean distance, retry count, circuit-breaker state, and warnings.
recommendations (up to three actionable suggestions) whenever the response is degraded or fails.
Legacy fields (hits, total_results, strategy_used) remain for one release to ease migration.
Inspect responses locally with the updated CLI helper:
uv run scripts/dspy_mcp_client.py --vault /path/to/vault --query "My quarterly goals" --json
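A client consuming this payload would typically branch on status and surface recommendations only when something went wrong. The sketch below assumes the response arrives as a plain dictionary using the field names described above; the exact nesting, the `warnings` key inside `diagnostics`, and the sample values are assumptions for illustration, not the project's confirmed schema.

```python
def summarize_response(response: dict) -> str:
    """Render a short, human-readable summary of a smart_search response."""
    status = response.get("status", "error")
    confidence = response.get("confidence")
    lines = [f"status={status} confidence={confidence}"]
    if status != "ok":
        # Recommendations are described as "up to three", so cap defensively.
        for tip in response.get("recommendations", [])[:3]:
            lines.append(f"suggestion: {tip}")
        for warning in response.get("diagnostics", {}).get("warnings", []):
            lines.append(f"warning: {warning}")
    return "\n".join(lines)


# Hypothetical payload shape, for illustration only.
sample = {
    "schema_version": "1",
    "status": "degraded",
    "confidence": 0.42,
    "diagnostics": {"warnings": ["embedding fallback used"]},
    "recommendations": ["Try a more specific query"],
}
print(summarize_response(sample))
```

Treating a missing status as an error keeps the summary conservative if the payload is malformed.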
Additional MCP tools complement the contract:
health_check: runs registered probes, returning cache stats, rate limiter tokens, circuit-breaker state, and metrics counters.
get_dspy_optimization_status: surfaces optimizer schedule/lock state and whether a background run is pending.
force_dspy_optimization: triggers an optimization cycle (guarded by async/file locks, executed via a background thread).
Each smart-search invocation also emits a structured log line (SMART_SEARCH_RESULT { ... }) containing the query, status, confidence, retrieval method, retries, circuit-breaker state, warnings, and duration. Simple in-process counters (smart_search_ok/degraded/error, cb_open_events, embed_fallback_events) are exposed via health_check for quick instrumentation.
.chroma_db/ directory
uv run scripts/reindex.py unified --full
.chroma_db/
uv run scripts/reindex.py unified --full
uv run scripts/reindex.py status
--dry-run to preview changes first
--batch-size 10
graph-rag-mcp-server/
├── src/
│ ├── config.py # Configuration management
│ ├── fs_indexer.py # File parsing & metadata extraction
│ ├── semantic_chunker.py # Intelligent markdown-aware chunking
│ ├── chroma_store.py # Vector database operations
│ ├── unified_store.py # ChromaDB unified operations (vectors + graph metadata)
│ ├── dspy_rag.py # RAG engine with Gemini 2.5 Flash
│ └── mcp_server.py # FastMCP server & tool definitions
├── scripts/
│ ├── reindex.py # Database indexing utilities
│ ├── reindex_watch.py # Real-time file monitoring
│ ├── enrich_para_taxonomy.py # PARA classification & enrichment
│ └── migrate_rdf_store.py # (removed)
├── configs/
│ ├── claude-desktop.json # Claude Desktop MCP configuration template
│ ├── cursor-mcp.json # Cursor MCP configuration template
│ ├── raycast-config.json # Raycast extension configuration template
│ └── .env.example # Environment configuration template
├── install.py # Automated installer & configurator
├── main.py # Alternate MCP server entry point (stdio)
├── pyproject.toml # Dependencies & entry points (uv managed)
├── SETUP.md # Comprehensive setup guide
└── README.md # Project overview & quick start
MIT License - see LICENSE file for details.
Built with a modern Python stack: Pydantic, ChromaDB, DSPy, FastMCP, and the latest google-genai SDK.
This repo includes a comprehensive test framework with structured fixtures at tests/fixtures/content/ and an evaluation runner that executes deterministic evals for the MCP tools. The evals use distinctive phrases and low-temperature LLM configs so results are stable across embedding and Gemini model choices.
Running Tests:
uv run python -m pytest tests/unit/ -v
uv run python scripts/run_tests.py all
uv run python tests/evals/runner.py
Test Categories:
Unit tests (tests/unit/): Query intent detection, fuzzy matching, relationship weighting, URI generation
Integration tests (tests/integration/): Full MCP server component integration
Evals (tests/evals/): Performance metrics and quality assessment
Fixtures (tests/fixtures/): Structured test content (planets, health, projects)
The evaluation framework creates temporary test environments, reinitializes app state, reindexes content, and validates:
search_notes and smart_search with strategy routing
graph_neighbors, get_subgraph, get_backlinks, get_notes_by_tag
create_note, add_content_to_note, archive_note
read_note, get_note_properties, update_note_properties, list_notes
Test environments are automatically created and cleaned up. All test content uses structured Pydantic models with proper type validation.