MCP server wrapping local Ollama models for offload from API-priced orchestrators. Nine stdio tools: generation, summarisation, analysis, drafting, code tasks (docstring/test/explain/review/types/refactor-suggest), diff-driven tasks (commit-message/pr-description/changelog/summary/impact), mechanical transforms, and model management (list/pull). Apache-2.0.
Exposes nine tools that pass work to a local model (text generation, summarisation, code tasks, mechanical transforms, commit/PR/changelog drafting). The orchestrator decides what to route locally; this server does the routing.
Default model: `hermes3:8b` (override via `OLLAMA_MODEL`). Default host: `http://localhost:11434` (override via `OLLAMA_HOST`).

Orchestrators priced by the token (Claude Code, Cursor, the Anthropic API, Cline, Aider) pay for every classification, every docstring, every commit message. Most of that work doesn't need a frontier model. Routed to Ollama on the same machine, the same work is free and faster. mcp-ollama is the routing surface.
The orchestrating model decides what to route where. This server is plumbing — it does not try to be clever about task classification. Pick the right tool, pass the text, get a result back.
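As a sketch of what that routing looks like from the client side, using the official MCP TypeScript SDK — the server path assumes a built checkout (see installation below), and the `text` argument name is an illustrative assumption; the real schemas come from MCP introspection:

```typescript
// Sketch: routing a summarisation task to mcp-ollama over stdio.
// Assumes @modelcontextprotocol/sdk; argument names other than `model` and
// `max_tokens` are illustrative — read the real schemas via introspection.
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

const transport = new StdioClientTransport({
  command: "node",
  args: ["/absolute/path/to/mcp-ollama/dist/index.js"],
  env: { OLLAMA_HOST: "http://localhost:11434", OLLAMA_MODEL: "hermes3:8b" },
});

const client = new Client({ name: "routing-demo", version: "0.0.1" });
await client.connect(transport);

// Pass the text, get a result back — no task classification on the server side.
const result = await client.callTool({
  name: "local_summarize",
  arguments: { text: "…a long blob of text to condense…" },
});
console.log(result.content);

await client.close();
```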
```bash
git clone https://github.com/true-alter/mcp-ollama.git
cd mcp-ollama
npm install
npm run build
```
You also need a running Ollama instance with at least one model pulled:
```bash
# Default — 8B, fast, good for classifications and short generations
ollama pull hermes3:8b

# Optional — code-specialised, heavier, better for local_code tasks
ollama pull qwen2.5-coder:32b
```
```bash
docker build -t mcp-ollama .

docker run -i --rm \
  -e OLLAMA_HOST=http://host.docker.internal:11434 \
  -e OLLAMA_MODEL=hermes3:8b \
  mcp-ollama
```
The supplied Dockerfile points at host.docker.internal:11434 so the container reaches Ollama on the host.
```bash
node dist/index.js
```
Stdio servers are launched by the MCP client (Claude Code, Cursor, etc.) — running the server directly is only useful for debugging.
```bash
claude mcp add --transport stdio ollama -- node /absolute/path/to/mcp-ollama/dist/index.js
```
Or in ~/.claude/settings.json:
```json
{
  "mcpServers": {
    "ollama": {
      "transport": "stdio",
      "command": "node",
      "args": ["/absolute/path/to/mcp-ollama/dist/index.js"],
      "env": {
        "OLLAMA_HOST": "http://localhost:11434",
        "OLLAMA_MODEL": "hermes3:8b"
      }
    }
  }
}
```
| Tool | Purpose |
|---|---|
| `local_generate` | General-purpose generation with system + user prompt |
| `local_summarize` | Summarise a blob of text |
| `local_analyze` | Analyse text against a specific question |
| `local_draft` | Draft content in a given style |
| `local_code` | Code tasks: docstring / test / explain / review / types / refactor-suggest |
| `local_diff` | Diff-driven tasks: commit-message / pr-description / changelog / summary / impact |
| `local_transform` | Mechanical code transformations |
| `local_models` | List models available on the local Ollama host |
| `local_pull` | Pull a model onto the local Ollama host |
Full tool schemas are exposed over MCP introspection — any MCP-aware client will enumerate them automatically.
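If you want to inspect the schemas outside an editor, a small script against the MCP TypeScript SDK will dump them — a sketch, with the server path as a placeholder:

```typescript
// Sketch: enumerate mcp-ollama's tool schemas over stdio.
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

const client = new Client({ name: "schema-dump", version: "0.0.1" });
await client.connect(
  new StdioClientTransport({
    command: "node",
    args: ["/absolute/path/to/mcp-ollama/dist/index.js"],
  })
);

// Print each tool's name and its JSON Schema for inputs.
const { tools } = await client.listTools();
for (const tool of tools) {
  console.log(tool.name, JSON.stringify(tool.inputSchema, null, 2));
}
await client.close();
```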
| Variable | Default | Purpose |
|---|---|---|
| `OLLAMA_HOST` | `http://localhost:11434` | Ollama HTTP endpoint |
| `OLLAMA_MODEL` | `hermes3:8b` | Default model when a tool call omits `model` |
Any tool call may override model explicitly — the env default only applies when unset. local_code tends to work better with a code-specialised model passed per-call, while local_summarize and local_draft are fine on the default.
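A per-call override looks like this — a sketch reusing the `client` from the connection example above; `model` and `max_tokens` are documented parameters, while the other argument names are illustrative assumptions (the tool schema is authoritative):

```typescript
// Route a code-review task to a code-specialised model for this call only.
// `task` and `code` are assumed field names — check local_code's schema via introspection.
await client.callTool({
  name: "local_code",
  arguments: {
    task: "review",
    code: "export function add(a: number, b: number) { return a + b }",
    model: "qwen2.5-coder:32b", // per-call override; OLLAMA_MODEL still applies elsewhere
  },
});
```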
| Workload | Recommended model | Rationale |
|---|---|---|
| Classification, one-liners, tags | `hermes3:8b` | Fastest round-trip, cheap to run |
| Commit messages, changelogs, summaries | `qwen2.5-14b-instruct` | Higher quality, still comfortable on a 16GB GPU |
| Code review, docstrings, tests | `qwen2.5-coder:32b` | Code-specialised |
| Fallback / unknown model | whatever `local_models` returns | Inspect first, then route |
Use local_models at session start if you're unsure what's available on a host.
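For instance, reusing the `client` from the connection sketch above:

```typescript
// Check what's actually pulled on this host before deciding where to route.
const available = await client.callTool({ name: "local_models", arguments: {} });
console.log(available.content);
```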
**Ollama error 404 when calling a tool.** The model isn't pulled. Run `ollama pull <name>` or call `local_pull` from the client.

**`fetch failed` / connection refused.** Ollama isn't running, or `OLLAMA_HOST` points at the wrong endpoint. Verify with `curl $OLLAMA_HOST/api/tags`. Inside a container, `localhost` is the container itself — use `host.docker.internal` on macOS/Windows or a bridge IP on Linux.

**Tool calls feel slow.** The first call to a cold model incurs a load; subsequent calls within the same Ollama process are much faster. If the model is larger than available VRAM, Ollama falls back to CPU — watch `ollama ps` to confirm.

**Empty or truncated output.** `max_tokens` defaults to 2048 per tool. For long generations, pass `max_tokens` explicitly in the tool call.
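For example — a sketch reusing the `client` from the connection example above; the field names other than `max_tokens` are assumptions, so check the tool schema:

```typescript
// Raise the 2048-token default for a long generation.
await client.callTool({
  name: "local_generate",
  arguments: {
    system: "You are a changelog writer.",                       // assumed field name
    prompt: "Expand these bullet points into release notes: …",  // assumed field name
    max_tokens: 8192,                                             // documented per-tool cap
  },
});
```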
mcp-ollama makes no network call of its own beyond the configured OLLAMA_HOST. It ships no telemetry, no analytics, no auto-update pinger. Tool inputs are forwarded to Ollama's HTTP API verbatim and the response is relayed back; the server itself is stateless between calls.
If you run Ollama on localhost (the default) the entire loop stays on the host. If you point OLLAMA_HOST at a remote endpoint, treat that endpoint's security posture as authoritative — a typo sending prompts to a third-party host is trivially possible.
To report a security issue, see SECURITY.md.
Bug reports and small patches welcome — see CONTRIBUTING.md. Larger design changes: please open an issue first so we can talk about scope before you invest time.
mcp-ollama is maintained by ALTER as part of the identity infrastructure for the AI economy. The ALTER identity MCP server is hosted at mcp.truealter.com — see @truealter/sdk for the TypeScript client.
Apache License 2.0. See LICENSE for the full text. Copyright 2026 Alter Meridian Pty Ltd (ABN 54 696 662 049).
Add this to claude_desktop_config.json and restart Claude Desktop.
```json
{
  "mcpServers": {
    "mcp-ollama": {
      "command": "node",
      "args": ["/absolute/path/to/mcp-ollama/dist/index.js"],
      "env": {
        "OLLAMA_HOST": "http://localhost:11434",
        "OLLAMA_MODEL": "hermes3:8b"
      }
    }
  }
}
```