A local, offline MCP server that provides a natural-language interface to Clark, a warehouse workforce RL agent, enabling plan queries, what-if analysis, and decision explanations without any cloud dependencies.
Local, offline natural-language interface to Clark (the warehouse workforce RL agent). Plain English in → real staffing decisions out. No cloud, no API cost, no data egress.
Source-available under PolyForm Noncommercial 1.0.0. The trained Clark foundation and managed deployments are commercial — see Clark's repo for access.
Ask "what's the opening plan for the east dock Tuesday, and what happens if two pickers call off?" — a local LLM turns that into real Clark tool calls and explains the result honestly. Nothing leaves the machine.
                      ┌── chat ─────────▶ hermes3:8b (Ollama, local)
                      │                         │ tool calls
    browser ──HTTP──▶ web UI                    ▼
                      │                  clark_mcp server (MCP, stdio)
                      └── staffing sweep ──▶    │
                                                ▼ HTTP
                                         clark serve (localhost inference API)
                                                │
                                                ▼ real Clark RL inference
Four thin layers, each independently testable, no shared process:

1. clark serve (in the clark repo): a minimal localhost inference API with 7 stateless read routes (incl. /simulate for staffing-sufficiency outcomes and /capabilities for architectural facts), weights loaded once. Not part of this repo; this repo consumes it.
2. clark_mcp/server.py: a real MCP server (any MCP host can use it) exposing 7 tools: clark_list_facilities, clark_facility_info, clark_capabilities (architectural facts; the model looks them up rather than memorizing), clark_get_plan, clark_what_if, clark_staffing_sweep (chat-driven version of the dashboard sweep; answers "would hiring 2 more help?"), clark_explain_decision.
3. clark_mcp/agent.py: a fully-local client. A Hermes-3-8B model in Ollama drives those tools and explains the result. Zero external calls. Multi-turn aware (history persists across calls).
4. clark_mcp/web/: a tiny stdlib local web UI (browser → http://127.0.0.1:8765). Chat panel plus a staffing-sufficiency sweep dashboard panel that visualizes grade distribution across roster sizes. No JS frameworks, no chart libraries.

Nothing here re-implements inference: every tool delegates to Clark's localhost API over HTTP. clark_explain_decision returns Clark's plan plus the facility's rules as grounding; the explanation is the model's, not Clark policy introspection (Clark is an RL policy: it emits actions, not reasons).
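As a rough illustration of the "thin layer" idea, the sketch below shows a tool that only builds a request and hands it to an injectable transport, so it can be exercised without a running server. The /plan route and the facility_id argument are named in this README; passing facility_id as a query parameter, the function names, and the returned dict shape are assumptions for illustration, not the actual clark_mcp/server.py contract.

```python
from typing import Any, Callable, Dict
from urllib.parse import urlencode

# Hypothetical transport type: takes a full URL and returns decoded JSON.
Transport = Callable[[str], Dict[str, Any]]


def build_url(base_url: str, route: str, **params: Any) -> str:
    """Join base URL, route, and query parameters (None values are dropped)."""
    query = urlencode({k: v for k, v in params.items() if v is not None})
    url = base_url.rstrip("/") + "/" + route.lstrip("/")
    return f"{url}?{query}" if query else url


def get_plan(transport: Transport, base_url: str, facility_id: str) -> Dict[str, Any]:
    """Delegate to the inference API; report failures plainly instead of guessing."""
    try:
        url = build_url(base_url, "/plan", facility_id=facility_id)
        return {"ok": True, "plan": transport(url)}
    except Exception as exc:  # surfaced to the caller, never replaced by a made-up plan
        return {"ok": False, "error": str(exc)}
```

Injecting the transport keeps the tool free of inference logic and lets a test swap in a fake that raises, which is one way to check that errors are passed through rather than hidden.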
See docs/ARCHITECTURE.md for the full design, tool contracts, the honesty model, and the fine-tune pipeline.
This system is designed to be a truthful staffing tool, not a confident one. Three rules — two enforced structurally by the data model, one enforced by SYSTEM prompt + fine-tune training:
1. […] clark serve call; the LLM cannot synthesize assignments. A tool error or unknown facility is reported plainly. With CLARK_CONSTRAINED=1 (default in the bundled web UI), facility_id is constrained at decode time to the live enum, so the model can't even propose a call with a fabricated facility name. (Surrounding prose has no structural guard; that's the fine-tune's job, gated by tools/live_audit.py.)
2. […] non_introspection probe catches it.
3. […] /plan returns start-of-day assignments only; /simulate is the separate endpoint that runs the policy end-to-end. The data model enforces the distinction: the assignments object never contains a simulated grade.

A trained Clark genuinely fails some days (a roster can be too thin for its volume). The tool is meant to surface those failures honestly, not to be tuned until it always "wins."
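To make the "constrained to the live enum" idea concrete, here is a minimal sketch that rebuilds a tool-argument schema from whatever facilities are currently loaded and rejects anything else. The facility names, function names, and the hand-rolled checker are hypothetical; how this repo actually passes the constraint to Ollama is not shown here.

```python
from typing import Any, Dict, List


def facility_arg_schema(live_facility_ids: List[str]) -> Dict[str, Any]:
    """JSON-Schema fragment whose facility_id may only be a currently loaded facility."""
    return {
        "type": "object",
        "properties": {
            "facility_id": {"type": "string", "enum": sorted(live_facility_ids)}
        },
        "required": ["facility_id"],
        "additionalProperties": False,
    }


def check_args(args: Dict[str, Any], schema: Dict[str, Any]) -> List[str]:
    """Tiny checker covering only the schema keywords used above."""
    errors: List[str] = []
    allowed = schema["properties"]["facility_id"]["enum"]
    if "facility_id" not in args:
        errors.append("facility_id is required")
    elif args["facility_id"] not in allowed:
        errors.append(f"unknown facility: {args['facility_id']!r}")
    extra = set(args) - set(schema["properties"])
    if extra:
        errors.append(f"unexpected arguments: {sorted(extra)}")
    return errors
```

Because the enum is derived from the live facility list rather than hard-coded, a facility added or removed on the server side changes what the model is allowed to propose without retraining.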
# 1. Clark inference API (from the clark repo, on a stable checkpoint)
clark serve --model <checkpoint.pt> --facilities-dir clark/data/configs --port 8000
# 2. Ollama with the model pulled
ollama pull hermes3:8b
# (or: ollama run clark-hermes3:ft to use the locally-deployed fine-tune)
# 3. this:
pip install -e .
# Three surfaces — pick one:
python -m clark_mcp.web # local web UI (http://127.0.0.1:8765)
python -m clark_mcp.agent # interactive REPL (multi-turn; /reset to restart)
clark-mcp # MCP server for any MCP host (Claude Desktop, Cursor, etc.)
Windows shortcut: double-click Run Clark Chat.bat at the repo root instead of step 3. It:

- pulls hermes3:8b on first run (~5 GB, one-time)
- starts clark serve in a sibling cmd window if it isn't already bound on :8000, with TIME_WAIT retry; the same probe loop runs every 15s while the chat is alive, so an unexpected death gets auto-relaunched with a 60s cooldown
- sets CLARK_CONSTRAINED=1 so Ollama enforces the <tool_call> envelope

The web UI surfaces three header buttons beyond the chat itself:
- […] clark_staffing_sweep against a roster sweep (+0…+N extra workers) and renders a stacked-bar grade-distribution chart per roster size. Same /simulate primitive that powered the Validated-on-Jack head-to-head in clark's README.

If clark serve ever becomes unreachable mid-session, a red banner appears above the header naming the state ("auto-relaunching now…" / "auto-relaunch failed: […]").

Set CLARK_LLM_MODEL=clark-hermes3:ft in the shell before launching to use the fine-tune instead of the base model.
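The probe-and-relaunch behaviour described for Run Clark Chat.bat (probe every 15s, relaunch with a 60s cooldown) can be sketched as a small decision function. This is a Python illustration of the logic only, not the batch script itself; the class and function names are hypothetical, and the two constants are taken from the description above.

```python
import socket
from dataclasses import dataclass
from typing import Optional

PROBE_INTERVAL_S = 15.0  # how often the caller is expected to probe
COOLDOWN_S = 60.0        # minimum gap between two relaunch attempts


@dataclass
class WatchdogState:
    last_relaunch_at: Optional[float] = None


def port_is_bound(host: str, port: int, timeout: float = 1.0) -> bool:
    """True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def next_action(state: WatchdogState, reachable: bool, now: float) -> str:
    """Return 'ok', 'relaunch', or 'wait' for one probe tick."""
    if reachable:
        return "ok"
    recently = (
        state.last_relaunch_at is not None
        and now - state.last_relaunch_at < COOLDOWN_S
    )
    if recently:
        return "wait"  # still inside the cooldown window
    state.last_relaunch_at = now
    return "relaunch"
```

A caller would loop, sleeping PROBE_INTERVAL_S between ticks and passing `port_is_bound("127.0.0.1", 8000)` as `reachable`. Keeping the decision separate from the socket probe makes the cooldown easy to test with fake timestamps.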
clark-mcp's chat is read-only over existing facilities — it plans,
explains, what-ifs, and sweeps roster sizes for facilities that are
already configured in clark serve. It cannot create, edit, or save
new facility configs (none of its 7 tools write data). If you ask the
chat "set up my warehouse", the right answer is "that's the
wizard's job; come back here once your YAML is loaded."
To author a new facility, use the Clark setup wizard in the clark repo, a separate local web UI (port :8090) with two modes:
Both write a validated YAML to clark/data/configs/user/. The wizard's
"Train on my facility" button defaults to 50 episodes (~3.3 h on a
consumer GPU) per the Jack-validation floor; bump it on the review step
for deeper training. End-to-end loop:
1. (clark repo) Run Clark Wizard.bat → writes my_warehouse.yaml
2. (clark repo) clark serve --facilities-dir clark/data/configs --port 8000
← restart so it picks up the new YAML
3. (clark-mcp repo) Run Clark Chat.bat → ask plans, what-ifs, sweeps
Quicker: from the chat's header, click New facility ↗ — it auto-spawns the wizard for you. Each surface is independent; you can run either without the other.
Because clark-mcp speaks the standard Model Context Protocol over
stdio, any MCP-aware host can drive the 7 Clark tools without using
this repo's bundled chat or web UI.
Claude Desktop — edit ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) or %APPDATA%\Claude\claude_desktop_config.json (Windows):
{
"mcpServers": {
"clark": {
"command": "clark-mcp",
"env": {
"CLARK_API_URL": "http://127.0.0.1:8000"
}
}
}
}
(Use the absolute path to clark-mcp if it's not on PATH —
the executable lives in your venv's Scripts/ or bin/ dir.)
Cursor — ~/.cursor/mcp.json (or workspace .cursor/mcp.json):
{
"mcpServers": {
"clark": {
"command": "clark-mcp",
"env": {"CLARK_API_URL": "http://127.0.0.1:8000"}
}
}
}
Same shape for Continue, Cline, Zed, and any other host that follows the MCP config convention.
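If you prefer to script the edit rather than paste JSON by hand, a small helper along these lines can merge the clark entry into an existing host config without disturbing other servers. The function name is hypothetical; the entry shape mirrors the JSON shown above, and locating and writing the host's config file is left to the caller.

```python
import json
from typing import Any, Dict


def add_clark_server(
    config: Dict[str, Any],
    api_url: str = "http://127.0.0.1:8000",
    command: str = "clark-mcp",
) -> Dict[str, Any]:
    """Return a copy of an MCP host config with the 'clark' entry added or replaced."""
    merged = json.loads(json.dumps(config))  # cheap deep copy of JSON-compatible data
    servers = merged.setdefault("mcpServers", {})
    servers["clark"] = {"command": command, "env": {"CLARK_API_URL": api_url}}
    return merged
```

Pass an absolute path as `command` when clark-mcp is not on PATH. The input dict is not mutated, so the original config can be kept as a backup before writing the merged result.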
After restarting the host, the 7 Clark tools are available alongside the host's other tools. The trade-off vs. the bundled web UI:
- Bundled web UI (python -m clark_mcp.web): uses a local LLM (Ollama, fully offline) and adds the staffing-sufficiency dashboard panel. The locally-deployed QLoRA fine-tune (clark-hermes3:ft) is trained to call these tools reflexively; current iter + metrics live in the Status table below.

Prereqs (both): clark serve running on the URL given in CLARK_API_URL.
pip install -e ".[dev]"
pytest
Two layers, honest about what each one covers:

- tests/test_chat_format.py: 12 pure-Python unit tests for the shared tool-calling protocol (render, parse_tool_calls strict + lenient, prefix-stability for assistant-only masking, TOOLS schema pin). No external dependencies. Catches the class of bug that bit twice during Phase 3 (regex false-negatives on FT-style refusals; parser too-strict on bare-JSON tool calls).
- tests/test_tools.py: end-to-end against the real Clark Phase 0 API in-process (httpx ASGITransport, untrained ClarkAgent). Skips cleanly via pytest.importorskip("clark") if the clark package isn't installed in the env. So a fresh clark-mcp checkout's pytest always passes: the 12 unit tests always run; the end-to-end tests run if and only if clark is available.

The agent.py LLM loop and MCP stdio transport are smoke-tested, not regression-covered in CI. They are exercised end-to-end via the live web UI and tools/live_audit.py, an autoregressive probe suite (the gate the teacher-forced eval.py structurally can't be). Run it any time after deploying a new fine-tune:
# Default targets the deployed fine-tune
python tools/live_audit.py
# Compare against base
python tools/live_audit.py --model hermes3:8b
Reports pass/fail per probe (single-turn fact lookup, refusal, architectural capability, multi-turn context, conversational synthesis, what-if flow) and exits non-zero on any failure.
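The "pass/fail per probe, non-zero exit on any failure" contract is simple enough to sketch. The following is an illustrative runner under that assumption, not the code in tools/live_audit.py; the probe names used with it would be placeholders.

```python
from typing import Callable, Dict

# Hypothetical probe type: a zero-argument callable that returns True on pass.
Probe = Callable[[], bool]


def run_probes(probes: Dict[str, Probe]) -> int:
    """Run each probe, print PASS/FAIL per probe, and return a process exit code."""
    failures = 0
    for name, probe in probes.items():
        try:
            passed = bool(probe())
        except Exception as exc:
            # A crashing probe counts as a failure rather than aborting the audit.
            passed = False
            print(f"FAIL {name}: raised {exc!r}")
        else:
            print(f"{'PASS' if passed else 'FAIL'} {name}")
        if not passed:
            failures += 1
    return 1 if failures else 0
```

Calling `sys.exit(run_probes(...))` from a script's entry point lets a shell or CI step gate on the result, which is what makes the audit usable as a deployment check.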
| Phase | What | State |
|---|---|---|
| 0 | Minimal localhost Clark inference API (clark serve) | Built + hardened. Non-facility configs → clean 422 (not 500); seeded /plan reproducible; pytest-green against real Clark. |
| 1 | MCP server + fully-local Hermes-3 client | Built. Tool layer regression-covered; LLM loop + stdio smoke-tested. |
| 2 | Fine-tune dataset | Built + quality-gated. Real-API generator + 8-example gold bar; 412 curated examples in finetune/dataset.jsonl (plan, what-if, explain, multi-turn, numeric-grounding, synthesis, capabilities, anti-hallucination, multitool, concept, failure, staffing-sweep), every taught behavior covered, no category > 30%. See finetune/DATASET.md. |
| 3 | QLoRA domain fine-tune of the local model | clark-hermes3:ft deployed locally (no public release — see Commercial access in clark's README). Latest teacher-forced eval (n=173 held-out): format_validity 0.000→1.000, tool_selection 0.719→0.977, tool_args 0.520→0.760, numeric_grounding 0.071→1.000, honest_failure/grounding_fidelity/non_introspection at 1.000/1.000/1.000. The teacher-forced metric is structurally blind to autoregressive conversational defects — the live tools/live_audit.py probes (incl. agreement_inline_call, fake-facility refusal, unstructured-input handling) are the honest gate; runtime ships with constrained-decode (CLARK_CONSTRAINED=1 in Run Clark Chat.bat) so the envelope is enforced at decode time regardless. See finetune/PHASE3.md for the full scorecard. |
| 4 | Integration + the staffing-sufficiency what-if | Primitive + dashboard + MCP tool all built. Clark's /simulate endpoint runs the policy end-to-end at a given roster size (with extra_workers for additive sweeps). The web UI's "Staffing sweep" panel calls it and renders a stacked-bar chart of grade distribution per roster size. The clark_staffing_sweep MCP tool (added; counts toward the 7-tool MCP layer now) lets the chat answer questions like "would hiring 2 more help?" directly — same primitive that powered the Validated-on-Jack head-to-head in the Clark README. Honest residual: this tool was added late in dataset construction, so each iter expands its training coverage — the schema is always in the system prompt (so the tool is available even with the base model), but reflexive arg-shape recall improves iter over iter. Current iter's tool_args rate is in the Phase 3 row above. |
| 5 | Portfolio write-up | Planned. |
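The staffing sweep in the Phase 4 row amounts to running the same simulation at several roster sizes and counting the resulting grades. A minimal sketch of that aggregation follows, with the simulation injected as a callable so no server is needed. The callable's signature, the use of seeds, and the grade labels are assumptions; the real /simulate request and response shapes are not documented here.

```python
from collections import Counter
from typing import Callable, Dict, Iterable

# Hypothetical signature: (extra_workers, seed) -> grade label such as "A" or "F".
Simulate = Callable[[int, int], str]


def sweep(simulate: Simulate, max_extra: int, seeds: Iterable[int]) -> Dict[int, Dict[str, int]]:
    """Grade counts per roster size, for +0 through +max_extra extra workers."""
    seed_list = list(seeds)  # reuse the same seeds at every roster size
    return {
        extra: dict(Counter(simulate(extra, seed) for seed in seed_list))
        for extra in range(max_extra + 1)
    }
```

Each inner dict is one stacked bar: the keys are grades and the values are how many simulated days landed on that grade at that roster size.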
Honest scope of the result: the pipeline works end to end; plan
quality is only as good as the Clark checkpoint behind it, and answer
fluency is whatever LLM is loaded. The runtime default is
hermes3:8b (the base model — works out of the box for anyone with
Ollama). To use the locally-deployed fine-tune, set
CLARK_LLM_MODEL=clark-hermes3:ft in the environment. The fine-tuned
model is built per the Phase 3 runbook
(finetune/PHASE3.md) and is not bundled as a
public artifact — see clark's README ("Use Clark / Commercial
access") for why and how.
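The model-selection rule above (base model by default, fine-tune when CLARK_LLM_MODEL is set) could be expressed roughly as follows. The function name is hypothetical, and treating a blank value as unset is an assumption of this sketch rather than documented behaviour.

```python
import os
from typing import Mapping

DEFAULT_MODEL = "hermes3:8b"  # base model named above as the runtime default


def resolve_model(env: Mapping[str, str] = os.environ) -> str:
    """Pick the Ollama model tag: CLARK_LLM_MODEL if set and non-blank, else the default."""
    value = env.get("CLARK_LLM_MODEL", "").strip()
    return value or DEFAULT_MODEL
```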
Architectural decisions and constraints are recorded under .context/ (context-keeper): why the runtime is HTTP-decoupled from Clark, why every fine-tune payload must be live-captured, why the dataset is quality-gated against the gold set. Read these before changing the contract.