Exposes queryable GPU inference benchmark data (quantization, throughput, VRAM, concurrent users) as tools for LLM clients.
An MCP server that exposes Poor Paul's Benchmark GPU inference data — quantization × throughput × VRAM × concurrent users — as queryable tools to any LLM client.
Hosted instance: https://mcp.poorpaul.dev/ (streamable-http transport, no auth)
Connect any MCP-aware client (Claude Desktop, Cline, Continue, etc.) to ask questions like:
It exposes nine tools backed by 30,000+ real benchmark rows:
| Tool | What it does |
|---|---|
| `list_tested_configs` | Lists every tested GPU, model, and quantization (call this first) |
| `query_ppb_results` | Filters raw benchmark rows by GPU / VRAM / model / quant / users / backend |
| `recommend_quantization` | Three-tier empirical-first recommendation engine (high / medium / low confidence) |
| `get_gpu_headroom` | Sanity-checks a (gpu, model, quant, users) configuration for VRAM headroom |
| Tool | What it does |
|---|---|
| `get_qualitative_summary` | All available qualitative scores (context-rot, tool accuracy, quality, MT-Bench) |
| `query_qualitative_results` | Filters qualitative rows by phase, model, quant, GPU, or minimum score thresholds |
| `get_context_rot_breakdown` | Long-context recall scores by length, depth, and needle type |
| `get_tool_accuracy_breakdown` | Tool-call accuracy: selection, parameters, hallucination rate, parse success |
| `compare_quants_qualitative` | Side-by-side qualitative comparison across quantizations with deterministic insight |
Benchmark rows are mirrored into a local SQLite cache (`./ppb_cache.db` by default; override with `PPB_DB_PATH`). On startup the server loads from SQLite and only contacts HuggingFace when the dataset's git commit SHA has changed — making subsequent restarts fast and offline-friendly.
Add to your MCP client config (Claude Desktop example, `~/Library/Application Support/Claude/claude_desktop_config.json`):

```json
{
  "mcpServers": {
    "ppb": {
      "transport": { "type": "http", "url": "https://mcp.poorpaul.dev/mcp" }
    }
  }
}
```
pip install and run locally (stdio):

```shell
pip install ppb-mcp
MCP_TRANSPORT=stdio ppb-mcp
```
Claude Desktop config:
```json
{
  "mcpServers": {
    "ppb": {
      "command": "ppb-mcp",
      "env": { "MCP_TRANSPORT": "stdio" }
    }
  }
}
```
```shell
docker run --rm -p 9933:9933 \
  -e MCP_TRANSPORT=streamable-http \
  -v ppb-hf-cache:/data/huggingface \
  ghcr.io/paulplee/ppb-mcp:latest
```
```shell
git clone https://github.com/paulplee/ppb-mcp
cd ppb-mcp
pip install -e ".[dev]"
ppb-mcp   # streamable-http on :9933
```
All clients use the same hosted endpoint: https://mcp.poorpaul.dev/mcp
Edit `~/Library/Application Support/Claude/claude_desktop_config.json` (macOS) or `%APPDATA%\Claude\claude_desktop_config.json` (Windows):

```json
{
  "mcpServers": {
    "ppb": {
      "transport": { "type": "http", "url": "https://mcp.poorpaul.dev/mcp" }
    }
  }
}
```
Restart Claude Desktop after saving.
Edit `~/.cursor/mcp.json` (create it if it doesn't exist):

```json
{
  "mcpServers": {
    "ppb": {
      "url": "https://mcp.poorpaul.dev/mcp",
      "type": "http"
    }
  }
}
```
Or via UI: Settings → Tools & Integrations → MCP → Add Server.
Edit `~/.codeium/windsurf/mcp_config.json`:

```json
{
  "mcpServers": {
    "ppb": {
      "serverUrl": "https://mcp.poorpaul.dev/mcp",
      "transport": "http"
    }
  }
}
```
Add to your `.vscode/mcp.json` (workspace) or User `settings.json`:

```json
{
  "mcp": {
    "servers": {
      "ppb": {
        "type": "http",
        "url": "https://mcp.poorpaul.dev/mcp"
      }
    }
  }
}
```
Add to `~/.config/zed/settings.json` under `"context_servers"`:

```json
{
  "context_servers": {
    "ppb": {
      "command": {
        "path": "env",
        "args": ["MCP_TRANSPORT=stdio", "uvx", "ppb-mcp"]
      }
    }
  }
}
```
Open the Cline panel → MCP Servers tab → Add Server → select SSE/HTTP → paste `https://mcp.poorpaul.dev/mcp`.
Add to `~/.continue/config.yaml`:

```yaml
mcpServers:
  - name: ppb
    transport:
      type: http
      url: https://mcp.poorpaul.dev/mcp
```
Add to `~/.config/opencode/config.json`:

```json
{
  "mcp": {
    "ppb": {
      "type": "remote",
      "url": "https://mcp.poorpaul.dev/mcp"
    }
  }
}
```
```shell
goose mcp add ppb --transport http --url https://mcp.poorpaul.dev/mcp
```

```shell
# Zero-install (requires uv):
env MCP_TRANSPORT=stdio uvx ppb-mcp

# After pip install:
env MCP_TRANSPORT=stdio ppb-mcp
```
Note on transport key names: MCP clients are not yet fully standardised on JSON key names for the HTTP transport. If your client doesn't connect with `"type": "http"`, try `"transport": "http"`, `"type": "sse"`, or `"transport": "streamable-http"`. The endpoint URL is the same regardless.
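For instance, a fallback entry using one of the alternative spellings might look like this — which exact keys are accepted depends on your client, so treat this as a template to adapt rather than a known-good config:

```json
{
  "mcpServers": {
    "ppb": {
      "transport": "streamable-http",
      "url": "https://mcp.poorpaul.dev/mcp"
    }
  }
}
```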
```
> list_tested_configs
{ "gpus": ["Apple M4 Pro", "NVIDIA GB10", "NVIDIA GeForce RTX 5090"],
  "models": ["Qwen3.5-9B", ...], "quantizations": ["Q4_K_M", ...] }

> recommend_quantization(gpu_vram_gb=32, concurrent_users=8, model="Qwen3.5-9B", priority="balance")
{ "recommended_quantization": "Q5_K_M",
  "estimated_vram_usage_gb": 27.8,
  "estimated_tokens_per_second": 142.0,
  "headroom_gb": 4.2,
  "confidence": "high",
  "reasoning": "Q5_K_M is recommended for your NVIDIA GeForce RTX 5090 (32 GB) ...",
  "alternatives": ["Q4_K_M", "Q8_0"] }
```
| Env var | Default | Notes |
|---|---|---|
| `HF_DATASET` | `paulplee/ppb-results` | HuggingFace dataset ID |
| `REFRESH_INTERVAL_HOURS` | `1` | Background refresh cadence |
| `MCP_TRANSPORT` | `streamable-http` | `stdio` or `streamable-http` |
| `HOST` | `0.0.0.0` | HTTP bind host |
| `PORT` | `9933` | HTTP bind port |
| `LOG_LEVEL` | `INFO` | Python logging level |
```shell
git clone https://github.com/paulplee/ppb-mcp /tmp/ppb-mcp
cd /tmp/ppb-mcp
DOMAIN=mcp.example.com [email protected] ./deploy/deploy.sh
```
This installs Docker, builds the image, registers a systemd unit, configures nginx, and runs certbot.
```shell
pip install -e ".[dev]"
ruff check src tests
pytest -v
```
Integration tests against the live HuggingFace dataset are gated behind `PPB_RUN_INTEGRATION=1` to keep CI offline-clean.
When the exact configuration hasn't been benchmarked directly, the engine falls back to the same (model, quant) benchmarked on a different GPU at the same concurrency: throughput is borrowed and VRAM is scaled to your card. As a last resort it estimates `vram_per_user ≈ (params_B × bits_per_weight / 8) × 1.15` and treats a configuration as viable iff the total stays within 90 % of your VRAM.

MIT — see `LICENSE`.
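As a rough illustration of that last-resort estimate, here is the formula in runnable form — the 5.0 effective bits per weight used below is an assumed figure for a Q5-class quant, not a number from the benchmark data:

```python
def estimate_vram_per_user_gb(params_b: float, bits_per_weight: float) -> float:
    """Last-resort estimate: vram_per_user ≈ (params_B × bits_per_weight / 8) × 1.15."""
    return (params_b * bits_per_weight / 8) * 1.15

def is_viable(params_b: float, bits_per_weight: float,
              concurrent_users: int, gpu_vram_gb: float) -> bool:
    # Viable iff the total estimated usage stays within 90 % of the card's VRAM.
    total = estimate_vram_per_user_gb(params_b, bits_per_weight) * concurrent_users
    return total <= 0.9 * gpu_vram_gb

# Assumed example: a 9B-parameter model at ~5 effective bits per weight.
per_user = estimate_vram_per_user_gb(9, 5.0)                  # ≈ 6.47 GB
print(is_viable(9, 5.0, concurrent_users=2, gpu_vram_gb=32))  # True: 2 × 6.47 ≈ 12.9 ≤ 28.8
```

This crude tier exists only so the tool can answer when no empirical row matches; empirically measured rows always take precedence.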