loading…
Search for a command to run...
loading…
White-box MCP workflow orchestrator for Claude Code. Sexpr workflows stored in sqlite that you can cat, diff, patch, and replay across crashes; lets smaller / c
White-box MCP workflow orchestrator for Claude Code. Sexpr workflows stored in sqlite that you can cat, diff, patch, and replay across crashes; lets smaller / cheaper models handle multi-step agent tasks via deterministic orchestration instead of CLAUDE.md interpretation.
A white-box MCP server for Claude Code agent workflows. Survive a kill -9 mid-task. Read what your subagent did in plain text. Replay any step without rerunning the rest.
Claude Code is powerful in one session. The 4 things that break in production are paired with what NeuroLisp does about each:
| The pain | NeuroLisp's answer |
|---|---|
| You crash 8 hours in and lose all state. | Every step's cursor, prompt, and result lives in sqlite. Reconnect and nl_workflow_replay. |
You cannot audit what your subagent did. LangGraph objects inspect to <Node 0x...>. |
Every step is sexpr text in workflow_runs.steps_sexpr. Readable with cat. |
| Your patterns repeat but the agent forgets each time. | After 3 consistent observations, the pattern auto-crystallizes into a reusable named skill. |
| You want to fix one step without rerunning everything. | nl_workflow_patch_step <id> <step> <new-prompt>, then replay from that step only. |
The file you executed is the file you patch. The same sexpr is workflow, plan, template, and macroexpansion result. That is homoiconicity, and the value is concrete: you can read, diff, edit, and replay without ever leaving plain text.
CLAUDE.md and similar instruction files are an interpretive runtime. Every step, the model re-reads your rules, decides which skill to load, recalls past context, and improvises what to do next. That requires a very capable model. The cost grows with workflow complexity.
NeuroLisp's sexpr workflow is a deterministic runtime. Step order, tool whitelists, lexical scope, retry guards, pre-loaded skills, the briefing kit — all encoded in the sexpr before the model is invoked. The model only does the leaf work: draft this paragraph, classify this review, summarize these sources. The orchestration is the program, not the prompt.
CLAUDE.md interpretation NeuroLisp deterministic orchestration
───────────────────────── ────────────────────────────────────
Big model reads instructions Small model receives a fully-formed
Big model decides next step briefing for ONE leaf task
Big model loads context Workflow already loaded the context
Big model picks the tool Workflow already locked the tools
Big model improvises Sexpr executes deterministically
Cost scales with model + complexity Cost scales with leaf-task count only
Practical effect: a 5-step essay pipeline that needs a top-tier model to coordinate via instruction-file interpretation runs on a smaller, cheaper model under NeuroLisp because the coordination is in the sexpr. Same output quality, often two orders of magnitude cheaper per LLM call:
DeepSeek v4-flash $0.07 / 1M input tokens $0.27 / 1M output tokens
Claude Opus 4.x $15 / 1M input tokens $75 / 1M output tokens
~214× input ~278× output
Public list prices as of 2026-05. The 5-step essay benchmark in this repo lands at ~$0.003 per essay on DeepSeek v4-flash; the same workflow under a top-tier model would be in the $0.50 - $1.00 range.
This compounds. As you accumulate dozens of reusable pipelines and skills, you have a personal orchestration layer that any cheap model can drive. The intelligence migrates from the model into your workflow library, where it is inspectable, diff-able, and version-controlled.
Run a real workflow. One macro expands into 5 LLM steps: planner → 2 parallel researchers → 2 writers → reflect-revise editor.
(essay-atom-pipeline-scoped "GraphQL vs REST in 2026" "research-team")
;; → 4000+ word essay, ~$0.002-0.005 on DeepSeek v4-flash
Inspect what ran. Workflow + steps are sqlite rows, not opaque objects.
sqlite3 ~/.neurolisp_mcp.sqlite \
"SELECT id, status, cursor, length(steps_sexpr) FROM workflow_runs ORDER BY id DESC LIMIT 1"
# wf-9b2e... | complete | 5/5 | 842 chars of plain sexpr
Survive a crash. Pull the plug at step 4 of 5. Reconnect Claude Code and ask it to:
nl_workflow_replay(workflow_id="wf-9b2e...", from_step="draft-section-2")
# resumes at draft-section-2; outline + research-1 + research-2 reuse
# cached results from sqlite, cost: $0 for the first 3 steps
Hand-edit a step then replay.
nl_workflow_patch_step(workflow_id="wf-9b2e...",
step_name="reflect-revise",
new_prompt="Reflect using 7 quality dimensions, then revise.")
nl_workflow_replay(workflow_id="wf-9b2e...", from_step="reflect-revise")
No framework rerun. No LangGraph rebuild. Patch the sexpr and go.
Take the essay-atom-pipeline from the example above. It is intentionally complex enough to exercise every NeuroLisp mechanism: :auto T (server-side LLM), :tools deferred subagents (brain-side dispatch), a parallel group, a :retry-validate quality guard, a :sink length backstop, and :scope lexical-tool whitelisting through a NodeProfile. After the one-line macro call, here is what actually happens.
USER
│ "Write me an essay on GraphQL vs REST."
▼
┌──────────────────┐
│ Claude Code │ main agent in user's terminal
│ (the brain) │ decides to use NeuroLisp
└────────┬─────────┘
│
│ nl_eval_sexpr(
│ '(essay-atom-pipeline-scoped
│ "GraphQL vs REST in 2026"
│ "research-team")')
▼
┌──────────────────┐
│ NeuroLisp MCP │ parses sexpr, expands macro,
│ server │ walks workflow groups, persists
│ (Python) │ state to ~/.neurolisp_mcp.sqlite
└─┬────────────┬───┘
│ │
┌────────────┘ └────────────┐
│ for :auto T steps for :tools │
│ server calls LLM server │
│ provider directly hands back │
▼ deferred ▼
┌─────────────────┐ token ┌────────────────┐
│ LLM provider │ │ Claude Code │
│ DeepSeek / │◀──────────────────────│ Agent tool │
│ OpenAI-compat / │ HTTP request │ dispatches a │
│ Anthropic │ includes briefing │ fresh subagent │
│ (urllib only) │ │ with the │
└────────┬────────┘ │ briefing kit │
│ text response └───────┬────────┘
▼ │ subagent
apply :sink │ runs WebSearch
apply :retry-validate │ + WebFetch
write to wf.results │ + reasoning
write to corpus row in sqlite ▼
│ result text
│ ┌─────────────────────────┘
│ │ nl_resolve_subagent(token, result)
▼ ▼
workflow advances cursor
next group fires
auto-chain runs all :auto steps in one server call
│
▼
workflow status = complete
final result in wf.results["reflect-revise"]
│
▼
┌──────────────────┐
│ USER reads │
│ the essay │
└──────────────────┘
| # | Phase | Who acts | What actually happens |
|---|---|---|---|
| 1 | Macro expansion | Server | (essay-atom-pipeline-scoped ...) → 30-line (workflow (quote ...) (quote (5 steps))) AST in memory. No LLM yet. |
| 2 | Group 0: outline | Server + DeepSeek | :auto T planner step. Server builds briefing from essay-outline-architect skill + topic, HTTP-POSTs DeepSeek, gets 400-word outline, applies sink (none), writes wf.results["outline"] + corpus row auto-step:planner. |
| 3 | Group 1: research × 2 (parallel + deferred) | Brain → 2 subagents | :tools (WebSearch WebFetch) steps emit 2 deferred tokens. Brain receives parallel_steps payload, dispatches 2 Claude Code subagents via the Agent tool, each with its own briefing kit (Role / Task / Upstream Artifacts / SOP / Tools Available). Subagents call WebSearch + WebFetch independently. Brain receives 2 result strings, calls nl_resolve_subagent(token, result) twice. |
| 4 | Auto chain: groups 2-4 | Server + DeepSeek | After the 2nd resolve, server sees the next 3 groups (draft-section-1, draft-section-2, reflect-revise) are all :auto T. It runs them back-to-back in a single server-side loop (auto chain, v7.62), no brain round-trips. Each step's prompt references upstream step names which the env resolves to actual text. |
| 5 | Final guard | Server | reflect-revise has :sink (cond ((< (string-length result) 2000) (str "WARNING short essay..." result)) (T result)). If LLM truncates, the sink prepends a WARNING header before storing. wf.summaries["reflect-revise"] also stored for downstream brevity. |
| 6 | Return | Server → Brain → User | Server returns {complete: true, results: {6 step keys}}. Claude Code reads results["reflect-revise"] and shows the essay to the user. |
That separation is what lets a small, cheap leaf-model do the same end-to-end work that a single big-model session would otherwise need: every step is pre-decided in the sexpr, so the model is never asked to be smart about the plan.
sqlite3 ~/.neurolisp_mcp.sqlite "
SELECT primitive, success, length(output) AS out_chars, cost
FROM corpus
ORDER BY row_id DESC LIMIT 6"
invoke-subagent:editor | 1 | 4149 | 0.00097 -- reflect-revise (:auto T)
invoke-subagent:writer | 1 | 2104 | 0.00031 -- draft-section-2 (:auto T)
invoke-subagent:writer | 1 | 2087 | 0.00029 -- draft-section-1 (:auto T)
invoke-subagent:general-purpose | 1 | 1856 | 0.0 -- research-2 (:tools, brain-side)
invoke-subagent:general-purpose | 1 | 1734 | 0.0 -- research-1 (:tools, brain-side)
invoke-subagent:planner | 1 | 412 | 0.00018 -- outline (:auto T)
Every step writes a row with primitive prefix invoke-subagent:<agent> regardless of execution path. :auto T rows carry the server-side LLM cost; :tools rows show 0.0 because the cost lives on the brain side (subagent dispatch). :pure T steps use a different prefix pure-step:<agent>.
sqlite3 ~/.neurolisp_mcp.sqlite "
SELECT id, status, cursor, length(steps_sexpr) AS plan_chars
FROM workflow_runs ORDER BY rowid DESC LIMIT 1"
wf-9b2e... | complete | 5/5 | 842
6 corpus rows are append-only audit trail; the workflow_runs row is the resumable snapshot. Both are plain SQL. Both are diff-able. Nothing about this run is opaque.
pip install neurolisp
python -m neurolisp.health
Or install from source (for contributors):
git clone https://github.com/KevinBangbang/NeuroLisp.git
cd NeuroLisp
pip install -e ".[dev]"
python -m pytest -q
Wire into Claude Code by editing ~/.claude.json:
{
"mcpServers": {
"neurolisp": {
"command": "python",
"args": ["-m", "mcp_server.server"]
}
}
}
Restart Claude Code, run /mcp. All 45 nl_* tools appear.
For real LLM steps, export an API key:
export DEEPSEEK_API_KEY=sk-...
# or OPENAI_API_KEY / ANTHROPIC_API_KEY
OpenAI-compatible endpoints (Groq, Together, Cerebras, local vLLM, etc.) are supported by swapping base_url. See docs/00_quickstart.md for the 10-minute walkthrough.
kill -9. Step results live in sqlite, not RAM.cat-ing a sexpr. The execution plan is the same file you can hand-edit.nl_workflow_replay(workflow_id, from_step="step-name"). Cached upstream results stay valid.nl_workflow_patch_step updates the plan; the next replay picks up the change.stakes-route. Empirically 89% cheaper than the full pipeline (v0.34 routine vs full bench).urllib, no SDK lock-in.| If you want to... | Read this |
|---|---|
| Try it in 10 minutes | docs/00_quickstart.md |
| Understand why each layer exists | docs/01_concepts/00_first_principles.md |
| Follow a build-up tutorial | docs/07_tutorial-book/ 17 chapters |
| Browse all 45 MCP tools | docs/02_reference/mcp-tools.md |
| Read the 8 invariants and anti-goals | NORTH_STAR.md |
| See empirical benchmarks | docs/BENCHMARKS.md |
| Run example scripts locally | examples/ 3 standalone demos |
| Browse version history | CHANGELOG.md |
This is a 1-maintainer project. Realistic response time: a few days for bug reports with reproducers, longer for open-ended discussions.
NeuroLisp is small by design. We review every PR with Occam's razor. New atoms, modules, or workflows must pass real-LLM end-to-end A/B validation before landing on main. See CONTRIBUTING.md for setup, conventions, and the in-scope / out-of-scope list.
Apache 2.0. Copyright Bangcheng Wang and NeuroLisp contributors.
Run in your terminal:
claude mcp add neurolisp -- npx Security
Low riskAutomated heuristic from public metadata — not a security guarantee.