# mcp-turboquant

Self-contained Python MCP server for LLM quantization. Compress any HuggingFace model to GGUF, GPTQ, or AWQ format in a single tool call. Six tools: `info`, `check`, `recommend`, `quantize`, `evaluate`, `push`. No external CLI required; all quantization logic is embedded.
## Installation

```shell
pip install mcp-turboquant
```

Or run directly with uvx:

```shell
uvx mcp-turboquant
```
## Optional backends

The `info`, `check`, and `recommend` tools work out of the box. For actual quantization, install the backend you need (extras are quoted so the brackets survive shells like zsh):

```shell
# GGUF (Ollama, llama.cpp, LM Studio)
pip install "mcp-turboquant[gguf]"

# GPTQ (vLLM, TGI)
pip install "mcp-turboquant[gptq]"

# AWQ (vLLM, TGI)
pip install "mcp-turboquant[awq]"

# Everything
pip install "mcp-turboquant[all]"
```
## Claude Code

Add to `~/.claude/settings.json`:

```json
{
  "mcpServers": {
    "turboquant": {
      "command": "mcp-turboquant"
    }
  }
}
```
Or with uvx (no install needed):

```json
{
  "mcpServers": {
    "turboquant": {
      "command": "uvx",
      "args": ["mcp-turboquant"]
    }
  }
}
```
## Claude Desktop

Add to `claude_desktop_config.json`:

```json
{
  "mcpServers": {
    "turboquant": {
      "command": "uvx",
      "args": ["mcp-turboquant"]
    }
  }
}
```
## Tools

| Tool | Description | Heavy deps? |
|---|---|---|
| `info` | Get model info from HuggingFace (params, size, architecture) | No |
| `check` | Check available quantization backends on the system | No |
| `recommend` | Hardware-aware recommendation for the best format and bit width | No |
| `quantize` | Quantize a model to GGUF/GPTQ/AWQ | Yes |
| `evaluate` | Run perplexity evaluation on a quantized model | Yes |
| `push` | Push a quantized model to the HuggingFace Hub | No |
## Example prompts

Once configured, ask Claude:

- "Get info on meta-llama/Llama-3.1-8B-Instruct"
- "What quantization format should I use for Mistral-7B on my machine?"
- "Quantize meta-llama/Llama-3.1-8B to 4-bit GGUF"
- "Check which quantization backends I have installed"
- "Evaluate the perplexity of my quantized model at /path/to/model.gguf"
- "Push my quantized model to myuser/model-GGUF on HuggingFace"
## Architecture

```
Claude / Agent <--> MCP protocol (stdio) <--> mcp-turboquant (Python) <--> llama-cpp-python / auto-gptq / autoawq
```

All quantization logic runs in-process; no external CLI tools are needed.
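On the wire, a tool invocation is an ordinary MCP `tools/call` JSON-RPC request sent over stdio. A sketch of what a client might send (the `model_id` argument name is illustrative, not the server's documented schema):

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "info",
    "arguments": { "model_id": "meta-llama/Llama-3.1-8B-Instruct" }
  }
}
```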
## Running the server

```shell
# As a command
mcp-turboquant

# As a module
python -m mcp_turboquant
```
## License

MIT