# mcp-turboquant

Self-contained Python MCP server for LLM quantization. Compress any HuggingFace model to GGUF, GPTQ, or AWQ format in a single tool call. Six tools: `info`, `check`, `recommend`, `quantize`, `evaluate`, `push`. No external CLI required; all quantization logic is embedded.
## Installation

```shell
pip install mcp-turboquant
```

Or run directly with uvx:

```shell
uvx mcp-turboquant
```
## Optional backends

The `info`, `check`, and `recommend` tools work out of the box. For actual quantization, install the backend you need (extras are quoted so the brackets survive shells like zsh):

```shell
# GGUF (Ollama, llama.cpp, LM Studio)
pip install "mcp-turboquant[gguf]"

# GPTQ (vLLM, TGI)
pip install "mcp-turboquant[gptq]"

# AWQ (vLLM, TGI)
pip install "mcp-turboquant[awq]"

# Everything
pip install "mcp-turboquant[all]"
```
## Claude Code

Add to `~/.claude/settings.json`:

```json
{
  "mcpServers": {
    "turboquant": {
      "command": "mcp-turboquant"
    }
  }
}
```
Or with uvx (no install needed):

```json
{
  "mcpServers": {
    "turboquant": {
      "command": "uvx",
      "args": ["mcp-turboquant"]
    }
  }
}
```
## Claude Desktop

Add to `claude_desktop_config.json`:

```json
{
  "mcpServers": {
    "turboquant": {
      "command": "uvx",
      "args": ["mcp-turboquant"]
    }
  }
}
```
## Tools

| Tool | Description | Heavy deps? |
|---|---|---|
| `info` | Get model info from HuggingFace (params, size, architecture) | No |
| `check` | Check available quantization backends on the system | No |
| `recommend` | Hardware-aware recommendation for the best format and bit width | No |
| `quantize` | Quantize a model to GGUF/GPTQ/AWQ | Yes |
| `evaluate` | Run perplexity evaluation on a quantized model | Yes |
| `push` | Push a quantized model to the HuggingFace Hub | No |
## Example prompts

Once configured, ask Claude:

- "Get info on meta-llama/Llama-3.1-8B-Instruct"
- "What quantization format should I use for Mistral-7B on my machine?"
- "Quantize meta-llama/Llama-3.1-8B to 4-bit GGUF"
- "Check which quantization backends I have installed"
- "Evaluate the perplexity of my quantized model at /path/to/model.gguf"
- "Push my quantized model to myuser/model-GGUF on HuggingFace"
## Architecture

```
Claude / Agent <--> MCP protocol (stdio) <--> mcp-turboquant (Python) <--> llama-cpp-python / auto-gptq / autoawq
```

All quantization logic runs in-process; no external CLI tools are needed.
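On the wire, a tool invocation is an ordinary MCP `tools/call` JSON-RPC request sent over stdio. A sketch of what a client might send (the `model_id` argument name is illustrative, not the server's documented schema):

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "info",
    "arguments": { "model_id": "meta-llama/Llama-3.1-8B-Instruct" }
  }
}
```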
## Running the server

```shell
# As a command
mcp-turboquant

# As a module
python -m mcp_turboquant
```
## License

MIT