LLM quantization via tool call. Convert models to GGUF, GPTQ, and AWQ formats. Recommend optimal quant settings, evaluate quality, and push to Hugging Face Hub.
Self-contained Python MCP server for LLM quantization. Compress any HuggingFace model to GGUF, GPTQ, or AWQ format in a single tool call.
No external CLI required -- all quantization logic is embedded.
```bash
pip install mcp-turboquant
```

Or run directly with uvx:

```bash
uvx mcp-turboquant
```
The info, check, and recommend tools work out of the box. For actual quantization, install the backend you need:
```bash
# GGUF (Ollama, llama.cpp, LM Studio)
pip install mcp-turboquant[gguf]

# GPTQ (vLLM, TGI)
pip install mcp-turboquant[gptq]

# AWQ (vLLM, TGI)
pip install mcp-turboquant[awq]

# Everything
pip install mcp-turboquant[all]
```
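The snippet below mirrors the kind of check the `check` tool performs: it only verifies which optional backends are importable in the current environment. The import names are the standard ones for each package; the tool's actual report format is the server's own.

```python
# Check which optional quantization backends are importable.
import importlib.util

backends = {
    "GGUF (llama-cpp-python)": "llama_cpp",
    "GPTQ (auto-gptq)": "auto_gptq",
    "AWQ (autoawq)": "awq",
}
for label, module in backends.items():
    status = "available" if importlib.util.find_spec(module) else "missing"
    print(f"{label}: {status}")
```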
Add to ~/.claude/settings.json:
```json
{
  "mcpServers": {
    "turboquant": {
      "command": "mcp-turboquant"
    }
  }
}
```
Or with uvx (no install needed):
```json
{
  "mcpServers": {
    "turboquant": {
      "command": "uvx",
      "args": ["mcp-turboquant"]
    }
  }
}
```
Add to claude_desktop_config.json:
```json
{
  "mcpServers": {
    "turboquant": {
      "command": "uvx",
      "args": ["mcp-turboquant"]
    }
  }
}
```
| Tool | Description | Heavy deps? |
|---|---|---|
| `info` | Get model info from HuggingFace (params, size, architecture) | No |
| `check` | Check available quantization backends on the system | No |
| `recommend` | Hardware-aware recommendation for best format + bits | No |
| `quantize` | Quantize a model to GGUF/GPTQ/AWQ | Yes |
| `evaluate` | Run perplexity evaluation on a quantized model | Yes |
| `push` | Push a quantized model to the HuggingFace Hub | No |
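To see what these tools look like over the wire, here is a minimal client sketch using the official MCP Python SDK. It spawns the server over stdio the same way an MCP host would; the `model_id` argument name is an assumption, so check the schema returned by `list_tools()` for the real parameter names.

```python
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client


async def main() -> None:
    # Spawn mcp-turboquant as a stdio subprocess, just like an MCP host would.
    params = StdioServerParameters(command="mcp-turboquant", args=[])
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            tools = await session.list_tools()
            print("tools:", [t.name for t in tools.tools])

            # Argument names are illustrative; inspect each tool's inputSchema
            # from list_tools() for the real ones.
            result = await session.call_tool(
                "recommend",
                arguments={"model_id": "mistralai/Mistral-7B-Instruct-v0.3"},
            )
            print(result.content)


asyncio.run(main())
```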
Once configured, ask Claude:
"Get info on meta-llama/Llama-3.1-8B-Instruct"
"What quantization format should I use for Mistral-7B on my machine?"
"Quantize meta-llama/Llama-3.1-8B to 4-bit GGUF"
"Check which quantization backends I have installed"
"Evaluate the perplexity of my quantized model at /path/to/model.gguf"
"Push my quantized model to myuser/model-GGUF on HuggingFace"
Claude / Agent <--> MCP Protocol (stdio) <--> mcp-turboquant (Python) <--> llama-cpp-python / auto-gptq / autoawq
All quantization logic runs in-process. No external CLI tools needed.
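As a rough illustration of what "in-process" means for the GPTQ path, the sketch below does the same kind of work directly with auto-gptq. This is not mcp-turboquant's actual code; the model, settings, and single calibration sample are placeholders (a real run uses a few hundred calibration texts).

```python
# In-process GPTQ quantization sketch using auto-gptq (illustrative only).
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
from transformers import AutoTokenizer

model_id = "facebook/opt-125m"  # small model chosen purely for illustration
tokenizer = AutoTokenizer.from_pretrained(model_id)

quantize_config = BaseQuantizeConfig(
    bits=4,          # 4-bit weights
    group_size=128,  # common GPTQ grouping
    desc_act=False,
)

# A single calibration sample keeps the sketch short.
examples = [tokenizer("Quantization trades precision for memory.", return_tensors="pt")]

model = AutoGPTQForCausalLM.from_pretrained(model_id, quantize_config)
model.quantize(examples)
model.save_quantized("opt-125m-gptq-4bit")
```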
```bash
# As a command
mcp-turboquant

# As a module
python -m mcp_turboquant
```
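For the curious, an MCP server entry point of this shape can be wired up with the official Python SDK's FastMCP helper. The sketch below is hypothetical, not the package's source; only a `check`-style tool is shown.

```python
# Hypothetical stdio entry point built with the official MCP Python SDK.
import importlib.util

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("turboquant")


@mcp.tool()
def check() -> dict:
    """Report which quantization backends are importable."""
    modules = {"gguf": "llama_cpp", "gptq": "auto_gptq", "awq": "awq"}
    return {name: importlib.util.find_spec(mod) is not None for name, mod in modules.items()}


if __name__ == "__main__":
    mcp.run()  # stdio transport by default
```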
MIT