LLM quantization via tool call. Convert models to GGUF, GPTQ, and AWQ formats. Recommend optimal quant settings, evaluate quality, and push to Hugging Face Hub.
Self-contained Python MCP server for LLM quantization. Compress any Hugging Face model to GGUF, GPTQ, or AWQ format in a single tool call.
No external CLI required -- all quantization logic is embedded.
pip install mcp-turboquant
Or run directly with uvx:
uvx mcp-turboquant
The info, check, and recommend tools work out of the box. For actual quantization, install the backend you need:
# GGUF (Ollama, llama.cpp, LM Studio)
pip install "mcp-turboquant[gguf]"
# GPTQ (vLLM, TGI)
pip install "mcp-turboquant[gptq]"
# AWQ (vLLM, TGI)
pip install "mcp-turboquant[awq]"
# Everything (quotes keep shells such as zsh from expanding the brackets)
pip install "mcp-turboquant[all]"
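To see which backends are importable before calling `quantize`, a minimal sketch along these lines may help. It is not the server's own `check` implementation, which is not shown here. It assumes the extras install the packages named in the architecture line (llama-cpp-python, auto-gptq, autoawq) and that they import as `llama_cpp`, `auto_gptq`, and `awq`. The names `BACKEND_MODULES` and `available_backends` are illustrative only.

```python
import importlib.util

# Assumed mapping from quantization format to the import name of its backend.
# The server's real detection logic may differ.
BACKEND_MODULES = {
    "gguf": "llama_cpp",   # from llama-cpp-python
    "gptq": "auto_gptq",   # from auto-gptq
    "awq": "awq",          # from autoawq
}


def available_backends() -> dict:
    """Return {format: bool}, True when the backend module can be located."""
    return {
        fmt: importlib.util.find_spec(module) is not None
        for fmt, module in BACKEND_MODULES.items()
    }


if __name__ == "__main__":
    for fmt, ok in available_backends().items():
        print(f"{fmt}: {'installed' if ok else 'missing'}")
```

`find_spec` only locates the module without importing it, so the check stays cheap even when the heavy backends are installed.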
Add to ~/.claude/settings.json:
{
  "mcpServers": {
    "turboquant": {
      "command": "mcp-turboquant"
    }
  }
}
Or with uvx (no install needed):
{
  "mcpServers": {
    "turboquant": {
      "command": "uvx",
      "args": ["mcp-turboquant"]
    }
  }
}
Add to claude_desktop_config.json:
{
  "mcpServers": {
    "turboquant": {
      "command": "uvx",
      "args": ["mcp-turboquant"]
    }
  }
}
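The config snippets above all share the same `mcpServers` shape, so the entry can also be merged into an existing file with a script instead of by hand. The following is a rough sketch, not part of mcp-turboquant. The helper name `add_mcp_server` is hypothetical, and it assumes the target file is either absent or valid JSON.

```python
import json
import tempfile
from pathlib import Path


def add_mcp_server(config_path, name, command, args=None):
    """Merge one server entry into the `mcpServers` object of a JSON config."""
    path = Path(config_path).expanduser()
    # Load the existing config, if any, so unrelated settings are preserved.
    config = json.loads(path.read_text()) if path.exists() else {}
    entry = {"command": command}
    if args:
        entry["args"] = list(args)
    config.setdefault("mcpServers", {})[name] = entry
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2) + "\n")
    return config


if __name__ == "__main__":
    # Demo against a throwaway file rather than a real config.
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "settings.json"
        add_mcp_server(target, "turboquant", "uvx", ["mcp-turboquant"])
        print(target.read_text())
```

Pointing `config_path` at `~/.claude/settings.json` would produce the uvx variant shown above. Back up the file first, since the script rewrites it.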
| Tool | Description | Heavy deps? |
|---|---|---|
| `info` | Get model info from HuggingFace (params, size, architecture) | No |
| `check` | Check available quantization backends on the system | No |
| `recommend` | Hardware-aware recommendation for best format + bits | No |
| `quantize` | Quantize a model to GGUF/GPTQ/AWQ | Yes |
| `evaluate` | Run perplexity evaluation on a quantized model | Yes |
| `push` | Push quantized model to HuggingFace Hub | No |
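The heuristics behind `recommend` are not documented here. As a rough illustration of the arithmetic a hardware-aware choice rests on, weight storage is approximately parameters × bits / 8 bytes. The sketch below uses hypothetical helper names and an assumed 80% memory headroom. It ignores the KV cache, activations, and per-format metadata such as scales, so real files come out somewhat larger.

```python
def estimated_weight_gb(num_params: float, bits: int) -> float:
    """Approximate weight storage in GB (10^9 bytes): params * bits / 8."""
    return num_params * bits / 8 / 1e9


def largest_fitting_bits(num_params, memory_gb, candidates=(8, 4, 3, 2), headroom=0.8):
    """Pick the highest bit width whose weights fit in a fraction of memory.

    `headroom` is an assumed safety factor, not a value taken from the server.
    Returns None when no candidate fits.
    """
    budget_gb = memory_gb * headroom
    for bits in sorted(candidates, reverse=True):
        if estimated_weight_gb(num_params, bits) <= budget_gb:
            return bits
    return None


if __name__ == "__main__":
    params = 8e9  # an 8B-parameter model
    print(estimated_weight_gb(params, 4))    # 4.0
    print(largest_fitting_bits(params, 8))   # 4
```

For an 8B model on a machine with 8 GB available, 8-bit weights (about 8 GB) exceed the 6.4 GB budget, so the sketch falls back to 4-bit (about 4 GB).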
Once configured, ask Claude:
"Get info on meta-llama/Llama-3.1-8B-Instruct"
"What quantization format should I use for Mistral-7B on my machine?"
"Quantize meta-llama/Llama-3.1-8B to 4-bit GGUF"
"Check which quantization backends I have installed"
"Evaluate the perplexity of my quantized model at /path/to/model.gguf"
"Push my quantized model to myuser/model-GGUF on HuggingFace"
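For context on what the `evaluate` prompt reports: perplexity is the exponential of the average negative log-likelihood per token, so lower is better. The dataset and method the tool uses are not described here. The sketch below only shows the formula, with a hypothetical `perplexity` helper that takes natural-log token probabilities.

```python
import math


def perplexity(token_logprobs):
    """Perplexity = exp(-mean(log p(token))) over natural-log probabilities."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


if __name__ == "__main__":
    # A model that assigns probability 0.25 to every token has perplexity ~4.
    print(perplexity([math.log(0.25)] * 4))
```

Comparing this number before and after quantization, on the same text, gives a rough measure of how much quality was lost.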
Claude / Agent <--> MCP Protocol (stdio) <--> mcp-turboquant (Python) <--> llama-cpp-python / auto-gptq / autoawq
All quantization logic runs in-process. No external CLI tools needed.
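On the wire, the stdio leg of that diagram carries JSON-RPC 2.0 messages, one per line. The sketch below shows what a `tools/call` request for the `info` tool might look like. The `tools/call` method and the `name`/`arguments` fields come from the MCP specification. The argument key `model` is an assumption, since the tool's input schema is not listed here, and the helper names are illustrative.

```python
import json


def tool_call_request(request_id, tool, arguments):
    """Build a JSON-RPC 2.0 request that invokes an MCP tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }


def encode_line(message):
    """Serialize for the stdio transport: one JSON message per line."""
    # json.dumps escapes newlines inside strings, so the only raw newline
    # in the result is the trailing delimiter.
    return json.dumps(message, separators=(",", ":")) + "\n"


if __name__ == "__main__":
    # "model" is a hypothetical argument name; check the tool's schema.
    request = tool_call_request(1, "info", {"model": "meta-llama/Llama-3.1-8B-Instruct"})
    print(encode_line(request), end="")
```

A real session starts with an `initialize` handshake before any tool call. An MCP client such as Claude handles that step, so this framing is mainly useful for debugging.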
# As a command
mcp-turboquant
# As a module
python -m mcp_turboquant
License: MIT