"Give brains to your small models."
CotForce enforces step-by-step Chain-of-Thought, turning 4B parameter models into methodical reasoners.
A 4-billion-parameter Gemma cannot solve SEND + MORE = MONEY. It's a classic cryptarithmetic puzzle — 8 unique digits, 5 columns, 4 carry values. A bare 4B model guesses randomly. It hallucinates digits. It loses track of carries after column 2.
The same model, with CotForce:
Step 1: Analyze the leftmost column. S+M+C3 = MO. Max sum is 19998. ∴ M=1.
Step 2: S+1+C3 = 10+O. With M=1 and carry, O must be 0.
Step 3: D+E = Y+10C1 → C1=1. Now R+C1=9 → C1=0→R=9 (used), C1=1→R=8.
...
Step 11: All digits assigned. 9567 + 1085 = 10652. Verified.
11 structured reasoning steps. Zero hallucinations. Correct answer.
CotForce doesn't make small models smarter. It forces them to think before they speak — which is often all they need.
CotForce uses the MCP sampling protocol (sampling/createMessage) to call LLMs. If your client supports it (Claude Desktop, Cursor), nothing extra is needed.
If not — or if you're using a local model like Gemma via LMStudio — switch to direct HTTP mode:
{
"mcpServers": {
"cotforce": {
"command": "node",
"args": ["node_modules/@slbdn/cotforce-mcp/index.js"],
"env": {
"MODE": "direct",
"API_BASE_URL": "http://localhost:1234/v1",
"MODEL": "gemma-4-e4b-it-mlx"
}
}
}
}
That's it. The same 4B Gemma that couldn't solve SEND+MORE=MONEY above — now with CotForce, working locally through LMStudio.
Key features:

- Structured `{reasoning, result}` output enforced via strict system prompts and few-shot examples.
- Pluggable parsing through the `CotParser` interface. Select parsers via the `COT_PARSERS` env var.
- Heuristic recovery of labeled output (`<reasoning>` tags, `Reasoning:` labels).
- Token counting with the `cl100k_base` encoding, with fallback to a character heuristic. Tweak via `REASONING_OVERHEAD`.
- `MODEL` environment variable to hint a specific model; leave unset for the host default.
- Direct HTTP mode: set `API_KEY` to use direct mode.
- Configurable logging (`LOG_LEVEL`) and truncation detection (`TRUNCATION_THRESHOLD`).
- An optional `resultSchema` parameter validates the `result` field against a type map; mismatches trigger a retry.

Install:

npm install @slbdn/cotforce-mcp
# or
git clone https://github.com/islobodan/cotforce-mcp
cd cotforce-mcp
npm install
npm run build
Requires Node.js ≥ 18.
Add to claude_desktop_config.json:
{
"mcpServers": {
"cotforce": {
"command": "npx",
"args": ["-y", "@slbdn/cotforce-mcp"],
"env": {
"MODEL": "claude-3-5-sonnet"
}
}
}
}
No clone, no build. npx -y pulls and runs directly from npm.
The server is configured via environment variables (all optional):
| Variable | Default | Description |
|---|---|---|
| `MODEL` | (not set) | Model name hint (e.g. `claude-3-5-sonnet`, `gpt-4o`). If empty, no hint is sent; the MCP host decides. |
| `MAX_RETRIES` | `2` | Number of retry attempts before returning raw output. |
| `BASE_TEMP` | `0.1` | Initial sampling temperature. |
| `TEMP_INCREMENT` | `0.2` | Temperature added per retry attempt. |
| `TIMEOUT` | `60000` / `120000` | Sampling timeout in ms (60s). Direct HTTP mode uses a longer default (120s) since local models are slower. |
| `CACHE_TTL` | `3600000` | Result cache TTL in ms (default 1 hour). Set to `0` to disable. |
| `CACHE_MAX_ENTRIES` | `100` | Maximum cached results before evicting the oldest. |
| `COT_PARSERS` | (all) | Comma-separated parser names to use (e.g. `direct-json,fenced-block`). Skips the others. |
| `TRUNCATION_THRESHOLD` | `0.95` | Ratio of output/budget that triggers truncation detection. Attempts truncated-JSON recovery first, then retries with 1.5x budget. |
| `REASONING_OVERHEAD` | `800` | Fixed token overhead added to the budget formula. Increase for verbose models. |
| `FALLBACK_MODELS` | (not set) | Comma-separated list of fallback models (e.g. `gpt-4o,claude-3-5-sonnet`). Cycled on failure. |
| `MODE` | `auto` | `auto`, `sampling`, or `direct`. `auto` uses direct HTTP when `API_KEY` is set and the client lacks sampling support. |
| `API_KEY` | (not set) | LLM API key for direct HTTP mode. Optional for local endpoints (LMStudio, Ollama). Required for remote providers (OpenAI, Anthropic, etc.). |
| `API_BASE_URL` | `https://api.openai.com` | Base URL for direct HTTP mode. Change for LMStudio (`http://localhost:1234/v1`) or other providers. |
| `LOG_LEVEL` | `INFO` | One of `DEBUG`, `INFO`, `WARN`, `ERROR`. |
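The two cache variables describe a TTL cache with oldest-first eviction. A minimal sketch of that behavior (illustrative; the function names are assumptions, not the actual source):

// Illustrative result cache: TTL expiry plus oldest-first eviction (not CotForce's code).
const TTL = Number(process.env.CACHE_TTL ?? "3600000");
const MAX = Number(process.env.CACHE_MAX_ENTRIES ?? "100");
const cache = new Map<string, { value: string; expires: number }>();

function cachePut(key: string, value: string): void {
  if (TTL === 0) return; // CACHE_TTL=0 disables caching
  if (cache.size >= MAX) {
    const oldest = cache.keys().next().value; // Map iterates in insertion order
    if (oldest !== undefined) cache.delete(oldest);
  }
  cache.set(key, { value, expires: Date.now() + TTL });
}

function cacheGet(key: string): string | undefined {
  const hit = cache.get(key);
  if (!hit || hit.expires < Date.now()) return undefined; // miss or expired
  return hit.value;
}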
MODEL=gpt-4o MAX_RETRIES=3 BASE_TEMP=0.2 TEMP_INCREMENT=0.15 LOG_LEVEL=DEBUG npx @slbdn/cotforce-mcp
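The retry knobs in that command compose: attempt 0 samples at BASE_TEMP, and each retry adds TEMP_INCREMENT. A minimal sketch of that escalation (illustrative; the function name is an assumption, not the actual source):

// Sketch of the documented temperature escalation, not CotForce's code.
// Attempt 0 runs at BASE_TEMP; each retry adds TEMP_INCREMENT,
// up to MAX_RETRIES retries before the raw output is returned.
function temperatureFor(attempt: number): number {
  const baseTemp = Number(process.env.BASE_TEMP ?? "0.1");
  const increment = Number(process.env.TEMP_INCREMENT ?? "0.2");
  return baseTemp + attempt * increment; // defaults give 0.1, 0.3, 0.5
}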
Add to your MCP client configuration. A .mcp.json file is included in the package for auto-discovery by clients like Cursor, VS Code, and Windsurf. Copy the relevant config below to your client's settings:
With MCP sampling (Claude Desktop):
{
"mcpServers": {
"cotforce": {
"command": "node",
"args": ["/path/to/cotforce-mcp/index.js"],
"env": {
"MODEL": "claude-3-5-sonnet",
"MAX_RETRIES": "2"
}
}
}
}
With direct LLM HTTP (LMStudio, OpenAI, Ollama):
{
"mcpServers": {
"cotforce": {
"command": "node",
"args": ["/path/to/cotforce-mcp/index.js"],
"env": {
"MODE": "direct",
"API_BASE_URL": "http://localhost:1234/v1",
"MODEL": "local-model",
"MAX_RETRIES": "2"
}
}
}
}
Note: `API_KEY` is optional for local endpoints like LMStudio or Ollama. It is required for remote providers like OpenAI or Anthropic.
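For a remote provider, the same direct-mode config applies. A sketch with a placeholder key (the key and model values here are examples, not required defaults):

{
  "mcpServers": {
    "cotforce": {
      "command": "npx",
      "args": ["-y", "@slbdn/cotforce-mcp"],
      "env": {
        "MODE": "direct",
        "API_BASE_URL": "https://api.openai.com",
        "API_KEY": "sk-...",
        "MODEL": "gpt-4o"
      }
    }
  }
}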
The root `index.js` is a launcher that delegates to `dist/index.js`. It guards against missing builds with a helpful error message.
What you see: finish_reason: "length" in the LLM response. The reasoning cuts off before the result field.
Why: The token budget is too tight. Complex reasoning (like SEND+MORE=MONEY) can need 3000+ output tokens, but the default minimum is 4096 — while the default model-level cap can vary.
Fix: Increase the budget overhead:
REASONING_OVERHEAD=1600 # default is 800, raise for verbose models
Or skip token-heavy parser layers to save budget for reasoning:
COT_PARSERS=direct-json,fenced-block # skip heuristic and brace-balanced
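For intuition, the budget formula described later in this README (overhead + inputTokens × 4, clamped between 4096 and 8192) can be sketched like this; the function name is an assumption, not the actual source:

// Sketch of the documented budget formula, not CotForce's code.
function outputBudget(inputTokens: number): number {
  const overhead = Number(process.env.REASONING_OVERHEAD ?? "800");
  const raw = overhead + inputTokens * 4;
  return Math.min(8192, Math.max(4096, raw)); // clamp to [4096, 8192]
}
// With defaults, a 200-token prompt gets max(4096, 800 + 800) = 4096 output tokens,
// which is why raising REASONING_OVERHEAD helps verbose reasoning fit the budget.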
What you see: MCP error -32001: Request timed out before the solution appears.
Why: Complex CoT reasoning takes time — 60-90 seconds for local models like Gemma. This error can come from two places:

- CotForce's own sampling timeout (`TIMEOUT` env var).
- Your MCP client's request timeout.

Fix — check both sides:
Increase CotForce's timeout:
TIMEOUT=180000 # 3 minutes
Check your MCP client's timeout setting:
LM Studio — add "timeout" to mcp.json (milliseconds):
{
"mcpServers": {
"cotforce": {
"command": "node",
"args": ["index.js"],
"env": {
"TIMEOUT": "180000"
},
"timeout": 300000
}
}
}
Claude Desktop — the tool call timeout is not directly configurable. A workaround is to adjust CotForce's TIMEOUT so requests complete within the client's window, or to use a faster model.
Cursor / VS Code — check the MCP extension or .vscode/mcp.json for a timeout or requestTimeout setting.
{
"name": "solve_problem",
"arguments": {
"prompt": "What is 7 * 8 + 2?"
}
}
{
"name": "solve_problem",
"arguments": {
"prompt": "List the prime numbers between 10 and 20",
"resultSchema": {
"primes": "object",
"count": "number"
}
}
}
If the result field doesn't match the schema, the server retries with a correction hint.
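A minimal sketch of how such a type-map check could work (illustrative; `matchesSchema` is a hypothetical name, not the server's validator):

import { strict as assert } from "node:assert";

// Each schema entry maps a result key to an expected typeof string.
function matchesSchema(result: Record<string, unknown>, schema: Record<string, string>): boolean {
  return Object.entries(schema).every(
    ([key, expected]) => key in result && typeof result[key] === expected
  );
}

// typeof [11, 13, 17, 19] === "object", so the "primes": "object" entry above accepts an array.
assert.equal(
  matchesSchema({ primes: [11, 13, 17, 19], count: 4 }, { primes: "object", count: "number" }),
  true
);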
See EXAMPLES.md for 16 diverse examples. A successful call returns a response like:
{
"content": [{
"type": "text",
"text": "🤖 Agentic CoT Result:\n\n**Reasoning:** Step 1: Multiply 7 * 8 = 56. Step 2: Add 2 to get 58.\n\n**Answer:** 58\n\n📊 Token Usage: 42 in / 150 out / 4096 budget"
}]
}
If parsing fails after all retries, the server returns the raw LLM output with a warning.
The parser is a priority-sorted pipeline of plugins. Five built-in parsers run in order:
| Priority | Name | What it does |
|---|---|---|
| 10 | `direct-json` | Parses the whole output as JSON (strips ```json fences) |
| 20 | `fenced-block` | Extracts JSON from markdown code blocks |
| 30 | `heuristic` | Looks for `<reasoning>`/`<result>` XML tags or `Reasoning:`/`Result:` labels |
| 40 | `brace-balanced` | Finds the first balanced `{}` in arbitrary text |
| 50 | `truncated-recovery` | Salvages reasoning from truncated JSON (hit token limit) |
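Lower priority values run first. A minimal sketch of how such a priority-sorted pipeline iterates (illustrative; `parseWithPipeline` is a hypothetical helper, not the package's API):

import { CotParser } from "@slbdn/cotforce-mcp";

function parseWithPipeline(parsers: CotParser[], raw: string) {
  // Sort ascending: direct-json (10) gets the first shot, truncated-recovery (50) the last.
  const ordered = [...parsers].sort((a, b) => a.priority - b.priority);
  for (const parser of ordered) {
    const parsed = parser.parse(raw); // null means "not my format"; try the next parser
    if (parsed !== null) return { parser: parser.name, ...parsed };
  }
  return null; // every parser failed; the server falls back to raw output with a warning
}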
Filter parsers via COT_PARSERS env var:
COT_PARSERS=direct-json,fenced-block node index.js
Write a custom parser:
import { CotParser, AgenticCotSchema } from "@slbdn/cotforce-mcp";
class YamlParser implements CotParser {
name = "yaml";
priority = 35; // runs after heuristic, before brace-balanced
parse(raw: string): { reasoning: string; result: unknown } | null {
// Custom YAML parsing logic here
return null; // return null if this output isn't YAML
}
}
Then register it programmatically:
import { defaultParserPipeline, ParserPipeline } from "@slbdn/cotforce-mcp";
const pipeline = defaultParserPipeline();
pipeline.addParser(new YamlParser());
const result = pipeline.parse(rawText);
Tool: `solve_problem` — `{ prompt: string }`, the problem to solve (plus the optional `resultSchema` described above).

CotForce supports two modes for calling the LLM:

MCP Sampling (default with compatible clients):
- Uses `sampling/createMessage`

Direct HTTP (for clients without sampling support):
- Calls `/v1/chat/completions` directly
- Chosen automatically by `MODE=auto` when `API_KEY` is set and the client lacks sampling
- Forced with `MODE=direct`

Both modes use the same system prompt with few-shot examples and strict schema constraints.
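That mode decision reduces to something like the following sketch (`resolveMode` and its signature are assumptions, not the actual source):

type Mode = "sampling" | "direct";

function resolveMode(env: NodeJS.ProcessEnv, clientSupportsSampling: boolean): Mode {
  const mode = env.MODE ?? "auto";
  if (mode === "direct") return "direct";
  if (mode === "sampling") return "sampling";
  // auto: use direct HTTP only when an API key is configured
  // and the connected client cannot serve sampling/createMessage.
  return env.API_KEY && !clientSupportsSampling ? "direct" : "sampling";
}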
cotforce-mcp/
├── src/
│ ├── index.ts # MCP server, tool handlers, routing logic
│ └── lib/
│ ├── parser.ts # Parser pipeline: CotParser interface + 5 plugin parsers + Zod schemas
│ ├── tokens.ts # tiktoken integration + budget computation
│ ├── prompts.ts # Model-specific system prompts
│ ├── metrics.ts # In-memory request/performance counters
│ └── llm.ts # Direct HTTP LLM client (OpenAI-compatible)
├── tests/
│ ├── cache.test.ts # 10 unit tests for result caching
│ ├── parser.test.ts # 47 unit tests for parser layers
│ ├── tokens.test.ts # 23 unit tests for token budgeting
│ ├── schema.test.ts # 8 unit tests for result schema validation
│ ├── metrics.test.ts # 9 unit tests for metrics tracking
│ ├── prompts.test.ts # 12 unit tests for model-specific prompts
│ ├── llm.test.ts # 6 tests for direct mode detection
│ ├── retry.test.ts # 4 integration tests for retry loop
│ ├── progress.test.ts # 5 unit tests for progress notifications
│ └── server.test.ts # 9 integration tests via @slbdn/mcp-tester
├── index.js # Root launcher (delegates to dist/)
├── dist/ # Compiled TypeScript output
└── package.json
Under the hood:

- System prompts enforce `reasoning` and `result`. Model-specific variants tuned for Claude, GPT-4, Gemini, Grok.
- Pluggable parsing via the `COT_PARSERS` env var and the `CotParser` interface.
- Model fallback (`FALLBACK_MODELS`) when the primary model refuses.
- Token budgeting uses `estimateTokens()` (lightweight heuristic) for budget math and `countTokens()` (tiktoken) for exact counts. Sets `maxTokens` dynamically (between 4096 and 8192) via the formula `overhead + inputTokens × 4`. Detects truncation via `finish_reason: "length"` and attempts JSON recovery before retrying.

To develop locally:

git clone https://github.com/islobodan/cotforce-mcp
cd cotforce-mcp
npm install
npm run build # compile TypeScript to dist/
npm run dev # tsc --watch
npm run typecheck # type-check src/ and tests/
| Script | Purpose |
|---|---|
| `npm run build` | Compile TypeScript (`src/` → `dist/`) |
| `npm run dev` | Watch-mode compilation |
| `npm run typecheck` | TypeScript type-checking for source and tests |
| `npm test` | Run full Jest test suite (133 tests) |
| `npm run test:smoke` | Quick smoke test via mcp-tester CLI |
| `npm run test:tools` | List available tools via mcp-tester CLI |
The test suite uses Jest with ts-jest (ESM) and @slbdn/mcp-tester for MCP server integration testing:
- Parser pipeline (`tests/parser.test.ts`) — 47 unit tests covering all 5 parser plugins, edge cases, and `AgenticCotSchema` validation.
- Token budgeting (`tests/tokens.test.ts`) — 23 unit tests for tiktoken integration, budget computation, and `REASONING_OVERHEAD` tuning.
- Result schema (`tests/schema.test.ts`) — 8 unit tests for user-supplied `resultSchema` validation.
- Metrics (`tests/metrics.test.ts`) — 9 unit tests for request counters, latency tracking, and token usage averages.
- Prompts (`tests/prompts.test.ts`) — 12 unit tests for model-specific prompt selection.
- Direct LLM mode (`tests/llm.test.ts`) — 6 unit tests for direct HTTP mode detection.
- Caching (`tests/cache.test.ts`) — 10 unit tests for result caching.
- Retry loop (`tests/retry.test.ts`) — 4 integration tests for the retry loop.
- Progress (`tests/progress.test.ts`) — 5 unit tests for progress notifications.
- Server (`tests/server.test.ts`) — 9 integration tests for tool discovery, argument validation, server lifecycle, and concurrent calls.

Custom Jest matchers are available via @slbdn/mcp-tester:
expect(tools).toHaveTool("solve_problem");
expect(tools).toHaveToolWithSchema("solve_problem");
expect(result).toReturnTextContaining("Reasoning:");
MIT © Slobodan Ivkovic
If you find CotForce-MCP useful, consider starring the repo and sharing your feedback!
Add this to claude_desktop_config.json and restart Claude Desktop.
{
"mcpServers": {
"cotforce-mcp": {
"command": "npx",
"args": []
}
}
}