Python MCP server for vector search using Qdrant vector database and Ollama embeddings, with advanced query techniques like query expansion, HyDE, and reranking
Status: ✅ Working with Qdrant vector search and Ollama embeddings + Advanced query techniques
Supports flexible output formats (Markdown or JSON) with configurable similarity thresholds and advanced search options.
# Using Docker (Recommended)
docker run -p 6333:6333 qdrant/qdrant:latest
Or install locally: Qdrant Quick Start
# Install: https://ollama.ai
ollama pull bge-m3
ollama pull mistral
ollama serve
Available embedding models:
- bge-m3 (1024 dims) - ⭐ recommended - best quality-speed balance
- nomic-embed-text (768 dims) - balanced, good for general use
- mxbai-embed-large (1024 dims) - highest quality
- all-minilm (384 dims) - ultra-lightweight, good for mobile

cd /path/to/w3-mcp-server-qdrant
# Remove old lockfile and venv
rm -rf uv.lock .venv venv
# Unset old environment variable
unset VIRTUAL_ENV
# Install all Python dependencies using uv
uv sync
That's it! uv sync installs all dependencies including MCP, pydantic, qdrant-client, and httpx.
Create a .env file from template:
cp .env.example .env
Edit .env:
# Qdrant Configuration
QDRANT_URL=http://localhost:6333
QDRANT_API_KEY= # Optional: set only if Qdrant requires API key auth
# Ollama Configuration
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_EMBED_MODEL=bge-m3:latest
OLLAMA_RERANK_MODEL=mistral # For query expansion and reranking
Or export environment variables:
export QDRANT_URL=http://localhost:6333
export OLLAMA_BASE_URL=http://localhost:11434
export OLLAMA_EMBED_MODEL=bge-m3:latest
export OLLAMA_RERANK_MODEL=mistral
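
As an illustration of how these variables might be read, here is a minimal sketch. The `Settings` class and `load_settings` function are hypothetical names, not taken from server.py, and the `http://localhost:6333` fallback for `QDRANT_URL` is an assumption based on the example values above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the four documented variables."""
    qdrant_url: str
    qdrant_api_key: Optional[str]
    ollama_base_url: str
    ollama_embed_model: str
    ollama_rerank_model: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from a mapping (os.environ by default), with fallbacks."""
    env = os.environ if env is None else env
    return Settings(
        # Assumed fallback; the README only shows this as an example value.
        qdrant_url=env.get("QDRANT_URL", "http://localhost:6333"),
        # An empty string is treated the same as "no API key".
        qdrant_api_key=env.get("QDRANT_API_KEY") or None,
        ollama_base_url=env.get("OLLAMA_BASE_URL", "http://localhost:11434"),
        ollama_embed_model=env.get("OLLAMA_EMBED_MODEL", "bge-m3:latest"),
        ollama_rerank_model=env.get("OLLAMA_RERANK_MODEL", "mistral"),
    )


settings = load_settings()
print(settings.qdrant_url, settings.ollama_embed_model)
```

Passing a plain dict instead of `os.environ` makes the fallbacks easy to check without touching the real environment.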
# Check Qdrant
curl http://localhost:6333/healthz
# Check Ollama
curl http://localhost:11434/api/tags
# Check Python env
uv run python -c "from mcp.server.fastmcp import FastMCP; print('✓ MCP ready')"
# Start MCP Inspector (interactive web UI)
uv run mcp dev server.py
Opens URL like:
http://localhost:6274/?MCP_PROXY_AUTH_TOKEN=...
Features:
Best way to test and debug:
cd /path/to/w3-mcp-server-qdrant
# Start inspector
uv run mcp dev server.py
Opens web UI at http://localhost:5173:
# Run server (stdio mode)
uv run python server.py
Edit ~/.claude/claude_config.json:
{
  "mcpServers": {
    "qdrant": {
      "type": "stdio",
      "command": "uv",
      "args": ["run", "server.py"],
      "cwd": "/path/to/w3-mcp-server-qdrant",
      "env": {
        "QDRANT_URL": "http://localhost:6333",
        "OLLAMA_BASE_URL": "http://localhost:11434",
        "OLLAMA_EMBED_MODEL": "bge-m3:latest",
        "OLLAMA_RERANK_MODEL": "mistral"
      }
    }
  }
}
Advantages:
Install from PyPI (always fetch latest version):
uv run --with w3-mcp-server-qdrant --refresh w3-mcp-server-qdrant
Edit ~/.claude/claude_config.json:
{
  "mcpServers": {
    "qdrant": {
      "type": "stdio",
      "command": "uv",
      "args": ["run", "--with", "w3-mcp-server-qdrant", "--refresh", "w3-mcp-server-qdrant"],
      "env": {
        "QDRANT_URL": "http://localhost:6333",
        "OLLAMA_BASE_URL": "http://localhost:11434",
        "OLLAMA_EMBED_MODEL": "bge-m3:latest",
        "OLLAMA_RERANK_MODEL": "mistral"
      }
    }
  }
}
Advantages:
Then restart Claude Code.
Search for similar documents in a collection using text query (auto-embedded via Ollama).
Supports advanced search techniques: query expansion, hypothetical document embeddings (HyDE), and LLM-based reranking.
| Parameter | Type | Default | Description |
|---|---|---|---|
| collection_name | string | required | Name of the collection to search |
| query_text | string | required | Text to search for (auto-embedded via Ollama) |
| limit | integer | 5 | Max results to return (1-100) |
| score_threshold | float | 0.0 | Minimum similarity threshold (0.0-1.0) |
| fields | string | "" | Comma-separated metadata fields to return (empty = all) |
| response_format | string | "markdown" | "markdown" or "json" |
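
The ranges and defaults in the table above can be expressed as a small validation step. This is only a sketch: `SearchParams` and `field_list` are hypothetical names, and the source does not say how server.py itself validates its input.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SearchParams:
    """Hypothetical model mirroring the documented parameters and defaults."""
    collection_name: str
    query_text: str
    limit: int = 5
    score_threshold: float = 0.0
    fields: str = ""
    response_format: str = "markdown"

    def __post_init__(self) -> None:
        # Enforce the ranges listed in the parameter table.
        if not 1 <= self.limit <= 100:
            raise ValueError("limit must be between 1 and 100")
        if not 0.0 <= self.score_threshold <= 1.0:
            raise ValueError("score_threshold must be between 0.0 and 1.0")
        if self.response_format not in ("markdown", "json"):
            raise ValueError('response_format must be "markdown" or "json"')

    def field_list(self) -> List[str]:
        """Split the comma-separated fields string; an empty list means all fields."""
        return [name.strip() for name in self.fields.split(",") if name.strip()]


params = SearchParams("docs", "machine learning", fields="title, author")
print(params.field_list())
```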
Generate N query variations, search all in parallel, merge results with Reciprocal Rank Fusion:
| Parameter | Type | Default | Description |
|---|---|---|---|
| expand_query | boolean | false | Enable query expansion |
| expand_query_count | integer | 3 | Number of variations to generate (1-10) |
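
For readers unfamiliar with Reciprocal Rank Fusion, the merge step can be sketched as below. The function name is hypothetical, and the constant `k=60` is the value conventionally used with RRF, not one confirmed by the source; server.py may use a different constant or tie-breaking rule.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def reciprocal_rank_fusion(
    rankings: Sequence[Sequence[Hashable]], k: int = 60
) -> List[Tuple[Hashable, float]]:
    """Merge several ranked ID lists into one list sorted by fused score.

    Each document receives 1 / (k + rank) from every list it appears in,
    where rank starts at 1 for the top result of that list.
    """
    scores: Dict[Hashable, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# One ranked list per query variation (IDs are made up for illustration).
fused = reciprocal_rank_fusion([
    ["a", "b", "c"],
    ["b", "a", "d"],
    ["b", "c"],
])
for doc_id, score in fused:
    print(doc_id, round(score, 4))
```

Because RRF uses only ranks, documents that appear near the top of several variations rise above documents that score highly in just one.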
Generate a hypothetical document matching the query intent, then embed it:
| Parameter | Type | Default | Description |
|---|---|---|---|
| use_hyde | boolean | false | Enable HyDE |
| hyde_combine_original | boolean | true | Also search original query + HyDE doc |
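
The HyDE flow described above can be sketched with the LLM and embedding calls passed in as plain functions. Everything here is illustrative: `hyde_query_vectors`, the prompt wording, and the `generate`/`embed` callables are assumptions, not the server's actual code.

```python
from typing import Callable, List

Vector = List[float]


def hyde_query_vectors(
    query_text: str,
    generate: Callable[[str], str],
    embed: Callable[[str], Vector],
    combine_original: bool = True,
) -> List[Vector]:
    """Return the vectors to search with when HyDE is enabled.

    generate: stand-in for an LLM call (for example the rerank model via Ollama).
    embed:    stand-in for the embedding call (for example bge-m3 via Ollama).
    """
    prompt = (
        "Write a short passage that would answer the following query.\n\n"
        f"Query: {query_text}\n\nPassage:"
    )
    hypothetical_doc = generate(prompt)

    vectors: List[Vector] = []
    if combine_original:
        # Mirrors hyde_combine_original=true: keep the raw query as well.
        vectors.append(embed(query_text))
    vectors.append(embed(hypothetical_doc))
    return vectors


# Stub callables so the sketch runs without Ollama.
def fake_generate(prompt: str) -> str:
    return "Machine learning is a field of AI that learns patterns from data."


def fake_embed(text: str) -> Vector:
    return [float(len(text))]


print(len(hyde_query_vectors("machine learning", fake_generate, fake_embed)))
```

When more than one vector is returned, each would be searched separately and the result lists merged, which is presumably where the `rrf+hyde` search method comes from.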
Use LLM to reorder results by relevance to the original query:
| Parameter | Type | Default | Description |
|---|---|---|---|
| rerank | boolean | false | Enable LLM reranking |
| rerank_top_n | integer | 10 | Number of results to rerank (1-100) |
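
One plausible reading of `rerank_top_n` is that only the first N candidates are rescored and the rest keep their original order. The sketch below assumes exactly that; `rerank_results` and `score_fn` are hypothetical, with `score_fn` standing in for whatever relevance judgment the LLM produces.

```python
from typing import Callable, Dict, List

Result = Dict[str, object]


def rerank_results(
    results: List[Result],
    score_fn: Callable[[Result], float],
    top_n: int = 10,
) -> List[Result]:
    """Reorder the first top_n results by score_fn, highest first.

    Results beyond top_n are appended unchanged. Ties keep their
    original relative order.
    """
    head, tail = results[:top_n], results[top_n:]
    scored = [(score_fn(result), position, result) for position, result in enumerate(head)]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [result for _, _, result in scored] + tail


# Made-up candidates and relevance scores for illustration.
candidates = [{"id": "a"}, {"id": "b"}, {"id": "c"}, {"id": "d"}]
relevance = {"a": 1.0, "b": 9.0, "c": 5.0, "d": 10.0}

reranked = rerank_results(candidates, lambda r: relevance[r["id"]], top_n=3)
print([r["id"] for r in reranked])
```

Limiting the rescoring to a prefix keeps the number of LLM calls bounded, which matters given the latency figures quoted later in this README.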
Example 1: Basic search
{
  "collection_name": "docs",
  "query_text": "machine learning",
  "limit": 5
}
Example 2: Query expansion (good recall)
{
  "collection_name": "docs",
  "query_text": "machine learning",
  "expand_query": true,
  "expand_query_count": 5,
  "limit": 5
}
Example 3: HyDE (semantic understanding)
{
  "collection_name": "docs",
  "query_text": "machine learning",
  "use_hyde": true,
  "hyde_combine_original": true,
  "limit": 5
}
Example 4: Full combo (best quality, slower)
{
  "collection_name": "docs",
  "query_text": "machine learning",
  "expand_query": true,
  "expand_query_count": 3,
  "use_hyde": true,
  "rerank": true,
  "rerank_top_n": 15,
  "limit": 5
}
Returns JSON with search metadata and ranked results:
{
  "query": "machine learning",
  "collection": "docs",
  "total": 3,
  "search_method": "rrf+hyde+expand+rerank",
  "results": [
    {
      "index": 1,
      "id": "doc_123",
      "score": 0.0273,
      "metadata": {
        "title": "Machine Learning Basics",
        "author": "Jane Doe"
      }
    }
  ]
}
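
A client that requests `response_format="json"` could consume this structure roughly as follows. `summarize_results` is a hypothetical helper, and the sample payload is copied from the output above.

```python
import json
from typing import List, Tuple

SAMPLE_RESPONSE = """
{
  "query": "machine learning",
  "collection": "docs",
  "total": 3,
  "search_method": "rrf+hyde+expand+rerank",
  "results": [
    {
      "index": 1,
      "id": "doc_123",
      "score": 0.0273,
      "metadata": {"title": "Machine Learning Basics", "author": "Jane Doe"}
    }
  ]
}
"""


def summarize_results(response_text: str, min_score: float = 0.0) -> List[Tuple[int, str, float, str]]:
    """Return (index, id, score, title) for each result scoring at least min_score."""
    payload = json.loads(response_text)
    rows = []
    for result in payload.get("results", []):
        if result["score"] < min_score:
            continue
        # Metadata keys depend on the collection, so fall back to an empty title.
        title = result.get("metadata", {}).get("title", "")
        rows.append((result["index"], result["id"], result["score"], title))
    return rows


for row in summarize_results(SAMPLE_RESPONSE):
    print(row)
```

The sample score of 0.0273 suggests that fused scores are not on the same 0.0-1.0 scale as raw similarity, so any client-side cutoff should be chosen with the reported `search_method` in mind.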
Note: search_method field indicates which techniques were applied:
- basic - simple vector search
- rrf - multiple searches merged with Reciprocal Rank Fusion
- rrf+hyde - RRF with HyDE
- rrf+expand - RRF with query expansion
- rrf+hyde+expand+rerank - all techniques combined

List all collections in Qdrant with metadata.
Parameters:
- response_format (string): "markdown" or "json" (default: "markdown")

Example:
{
  "response_format": "json"
}
Output:
{
  "collections": [
    {
      "name": "tech_docs",
      "points_count": 1250,
      "vector_size": 768
    },
    {
      "name": "papers",
      "points_count": 3840,
      "vector_size": 1024
    }
  ]
}
Specifies the URL of your Qdrant server.
Set via:
Environment variable:
export QDRANT_URL=http://localhost:6333
uv run python server.py
.env file:
QDRANT_URL=http://localhost:6333
In claude_config.json:
"env": {
  "QDRANT_URL": "http://localhost:6333"
}
Specifies the URL of your Ollama server.
Default: http://localhost:11434
Specifies which embedding model to use for embedding search queries and documents.
Default: bge-m3:latest
Recommended embedding models:
- bge-m3 (1024 dims) - ⭐ Recommended - best quality-to-speed ratio
- nomic-embed-text (768 dims) - balanced, good for most use cases
- all-minilm (384 dims) - fast, lightweight
- mxbai-embed-large (1024 dims) - highest quality but slower

Specifies which LLM to use for advanced features (query expansion, HyDE, reranking).
Default: mistral
Recommended models:
- mistral (7B) - ⭐ Recommended - good quality, reasonable speed
- qwen2.5-coder (7B) - high quality but optimized for code
- llama3.2 (3B) - smaller, faster but lower quality
- neural-chat (7B) - good for instruction-following

Note: Only used when expand_query=true, use_hyde=true, or rerank=true.
w3-mcp-server-qdrant/
├── server.py # MCP server entry point
├── pyproject.toml # Project config
├── .env.example # Environment variables template
├── README.md # This file
└── tests/
    └── test_mcp_server.py # Integration tests
MCP Client (Claude, IDE, etc.)
    ↓
MCP Server (server.py)
    ├── Ollama: text → embedding vector
    └── Qdrant: vector search
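
To make the two hops in this diagram concrete, the sketch below builds (but does not send) the HTTP requests involved, using only the standard library. It assumes Ollama's `/api/embed` endpoint and Qdrant's REST `/collections/{name}/points/search` endpoint; the source does not say whether server.py talks to these endpoints directly or goes through qdrant-client, and both helper names are hypothetical.

```python
import json
import urllib.request
from typing import List


def build_embed_request(base_url: str, model: str, text: str) -> urllib.request.Request:
    """Step 1: ask Ollama to turn text into an embedding vector."""
    body = json.dumps({"model": model, "input": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/embed",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_search_request(
    qdrant_url: str,
    collection: str,
    vector: List[float],
    limit: int = 5,
    score_threshold: float = 0.0,
) -> urllib.request.Request:
    """Step 2: ask Qdrant for the nearest points to that vector."""
    body = json.dumps({
        "vector": vector,
        "limit": limit,
        "score_threshold": score_threshold,
        "with_payload": True,  # return stored metadata with each hit
    }).encode("utf-8")
    return urllib.request.Request(
        f"{qdrant_url.rstrip('/')}/collections/{collection}/points/search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


embed_req = build_embed_request("http://localhost:11434", "bge-m3:latest", "machine learning")
search_req = build_search_request("http://localhost:6333", "tech_docs", [0.1, 0.2, 0.3], limit=5)
print(embed_req.full_url)
print(search_req.full_url)
```

Sending either request with `urllib.request.urlopen` requires the corresponding service to be running; the vector passed to the search must have the same dimension as the collection.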
# Via Claude/MCP interface
qdrant_search(
    collection_name="tech_docs",
    query_text="machine learning algorithms",
    limit=5,
    score_threshold=0.6,
    response_format="markdown"
)
# Via Claude/MCP interface
qdrant_list_collections(response_format="json")
uv run pytest tests/
uv run black server.py
uv run ruff check server.py
uv run mcp dev server.py
Web UI at http://localhost:5173 shows:
- Use score_threshold to filter low-relevance results and reduce noise
- Adjust the limit parameter (1-100) to balance quality vs. speed
- nomic-embed-text: balanced (recommended)
- all-minilm: fast, lightweight
- mxbai-embed-large: higher quality but slower

| Feature | Quality | Speed | Use Case |
|---|---|---|---|
| Basic search | ⭐⭐ | ⚡⚡⚡ | Clear, specific queries |
| Query expansion | ⭐⭐⭐ | ⚡⚡ | Ambiguous queries, high recall needed |
| HyDE | ⭐⭐⭐ | ⚡⚡ | Semantic understanding important |
| Reranking | ⭐⭐⭐⭐ | ⚡ | Precision critical, can wait 1-2s |
| All combined | ⭐⭐⭐⭐⭐ | ⚡ | Best quality, time not critical |

- limit=5
- expand_query=true, expand_query_count=3
- use_hyde=true
- rerank=true (slowest, ~5-10s)

# Check if Qdrant is running
curl http://localhost:6333/healthz
# Start Qdrant with Docker
docker run -p 6333:6333 qdrant/qdrant:latest
# Check if Ollama is running
curl http://localhost:11434/api/tags
# Pull embedding model
ollama pull nomic-embed-text
# Start Ollama
ollama serve
# Install dependencies with uv
uv sync
curl http://localhost:6333/healthz and curl http://localhost:11434/api/tags

MIT