A local MCP server that lets Claude delegate scoped work to Codex with structured results and guardrails, supporting planning, code review, build, reverse engineering, and long-running background tasks.
A local stdio MCP server that lets Claude delegate scoped work to Codex with structured results, persistent artifacts, and guardrails around filesystem access, review fan-out, and reverse-engineering workflows.
It gives Claude a sharper tool surface than a raw shell handoff: planning stays read-only, builds stay scoped, reviews can require specialist subagents, and every run leaves behind inspectable logs and outputs. It's especially useful for long-running analysis, Ghidra sessions, and deeper code review passes.
| Tool | Description |
|---|---|
| `plan` | Break down a task and propose a scoped plan without editing files. |
| `research` | Investigate code, docs, and context in read-only mode and report findings. |
| `brainstorm` | Evaluate an idea, scope an MVP, and recommend whether it is worth building. |
| `build` | Implement a change, run focused verification, and report results. |
| `validate` | Run existing repo validation commands (build, test, lint) and report the results. |
| `review` | Review code or changes directly for one selected agent, or fan out to one Codex subagent per selected review agent when multiple agents are selected. |
| `reverse_engineer` | Use reverse-engineering tooling and broader roots to investigate binaries. |
| `start_run` | Start any Dobby tool in the background and return immediately with a task id. |
| `get_run` | Fetch the status or final result for a background run by task id. |
| `wait_run` | Block until one or more background runs finish (or a timeout elapses), then return the first result. |
| `list_runs` | List recent runs for a repo so you can recover task ids and results after timeouts. |
- uv

Some tools work best (or only work) when specific MCP servers are available in the parent environment:

- research and brainstorm: fetchaller-mcp MCP server for web search, URL fetching, and Reddit browsing. Without it, these tools fall back to codebase-only analysis.
- reverse_engineer: ghidra-mcp bridge for binary analysis via Ghidra. Without it, reverse engineering is limited to whatever other tools are available in the sandbox.

Dobby auto-detects these integrations from the active Codex MCP config. If an integration is not installed or not configured for the run, Dobby does not try to use it: the worker prompt explicitly tells Codex not to call it and to continue with the best non-integration path available.
Recommended local tool install:
uv tool install .
codex-dobby-mcp
One-off local execution without installing:
uvx --from . codex-dobby-mcp
When this package is published to PyPI, replace . with codex-dobby-mcp.
Development checkout:
uv sync
uv run codex-dobby-mcp
Target repo is resolved in this order: explicit repo_root arg → MCP metadata (_meta.repo_root, repo_root, repoRoot, working_directory, workingDirectory, cwd) → server cwd. If your client sends working-directory metadata, that is enough. Otherwise wrap the launch with cd.
Safety guard: if repo_root is omitted and the prompt clearly references an absolute path inside a different git worktree, Dobby fails fast instead of silently defaulting to the server cwd. It also refuses to guess when the request only names relative files that do not exist under the server cwd. The caller should retry with explicit repo_root or correct working-directory metadata.
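The resolution order above can be sketched in a few lines of Python. This is only an illustration, not Dobby's actual code: the function name `resolve_repo_root`, the shape of the `request_meta` mapping (a dict that may hold a nested `_meta` dict), and the omission of the safety guard are all assumptions made for the example.

```python
from pathlib import Path
from typing import Any, Mapping, Optional

# Top-level metadata keys, checked in the documented order.
META_KEYS = ("repo_root", "repoRoot", "working_directory", "workingDirectory", "cwd")


def resolve_repo_root(
    explicit: Optional[str],
    request_meta: Optional[Mapping[str, Any]],
    server_cwd: str,
) -> Path:
    """Hypothetical helper: explicit arg, then MCP metadata, then server cwd."""
    # 1. An explicit repo_root argument always wins.
    if explicit:
        return Path(explicit)
    # 2. MCP metadata: _meta.repo_root first, then the flat keys in order.
    if request_meta:
        nested = request_meta.get("_meta")
        if isinstance(nested, Mapping) and nested.get("repo_root"):
            return Path(nested["repo_root"])
        for key in META_KEYS:
            value = request_meta.get(key)
            if value:
                return Path(value)
    # 3. Fall back to the directory the server was launched from.
    return Path(server_cwd)
```

The sketch deliberately leaves out the fail-fast guard described above, which needs to inspect the prompt and the filesystem.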
Example launch with an installed tool:
{
"mcpServers": {
"codex-dobby": {
"command": "sh",
"args": ["-lc", "cd /ABSOLUTE/PATH/TO/TARGET-REPO && codex-dobby-mcp"]
}
}
}
Example launch from a source checkout:
{
"mcpServers": {
"codex-dobby": {
"command": "sh",
"args": ["-lc", "cd /ABSOLUTE/PATH/TO/TARGET-REPO && uv --directory /ABSOLUTE/PATH/TO/codex-dobby-mcp run codex-dobby-mcp"]
}
}
}
If you use Dobby from Claude Code, add this to your CLAUDE.md so Claude delegates correctly:
## Delegating Work to Dobby (codex-dobby MCP)
Offload grunt work — build/test, code review, research, planning, implementation, brainstorming, reverse engineering — to the `codex-dobby` MCP tools instead of doing it inline. Saves tokens and context.
- Give focused prompts with a concrete outcome. One task per call — if you have multiple things to ask, make multiple Dobby calls (in parallel when independent) instead of bundling them into one vague prompt.
- Call `mcp__codex-dobby__*` directly. Never wrap them in a general-purpose Agent/Task subagent.
- Don't lower `timeout_seconds` below the default. Err too long — a short timeout kills the run; a long one costs nothing because Dobby returns as soon as it's ready.
- For long work, start it with `mcp__codex-dobby__start_run` and then either block on `mcp__codex-dobby__wait_run` (parent sleeps in the tool call) or poll `mcp__codex-dobby__get_run` (parent keeps working). On Claude Code, `/loop` or `ScheduleWakeup` can schedule the polls for you so the parent is free between checks.
Common params: prompt, repo_root, files, important_context, timeout_seconds, extra_roots, model, reasoning_effort. Tool-specific: danger (build, reverse_engineer), agents (review).
review agents: generalist (default), security, performance, architecture, correctness, ux, regression. Pass multiple for multi-agent review.
start_run takes the same params as the target tool, plus required tool. If timeout_seconds is omitted, it defaults to that target tool's normal timeout. Non-empty agents are only accepted when tool is review; other tools reject them with a validation error.
get_run params: task_id, optional repo_root.
wait_run params:

- optional task_id (single run) or optional task_ids (list; first to finish wins). Omit both to wait on every currently-live run for the repo. Passing both is rejected, and so is an empty task_ids list.
- optional repo_root.
- timeout_seconds: default 540s (9 min), clamped to [1, 100_000] seconds (~27.8 hours), which matches Claude Code's MCP_TOOL_TIMEOUT default of 100000000 ms.

On timeout, wait_run returns a running lookup whose pending_task_ids lists the ids still outstanding and whose summary instructs the caller to re-call wait_run with that list until one finishes.

Pick timeout_seconds below your MCP client's own tools/call ceiling:

- Claude Code defaults to ~28 hours, so the full clamp is usable.
- Codex CLI defaults to 60s per [mcp_servers.<id>].tool_timeout_sec, so raise that in ~/.codex/config.toml before using long waits from Codex.
- Claude Desktop, Cursor, Cline, and Continue vary and may cap low.
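As a rough illustration of the wait_run argument rules, the sketch below clamps the timeout and picks a wait mode. The helper names (`clamp_wait_timeout`, `select_wait_mode`) and the mode labels are invented for this example; only the numbers and the accept/reject rules come from the description above.

```python
from typing import Optional, Sequence

WAIT_DEFAULT_SECONDS = 540   # 9 minutes
WAIT_MIN_SECONDS = 1
WAIT_MAX_SECONDS = 100_000   # ~27.8 hours, i.e. 100_000_000 ms


def clamp_wait_timeout(timeout_seconds: Optional[float]) -> float:
    """Apply the default, then clamp into [1, 100_000] seconds."""
    if timeout_seconds is None:
        return WAIT_DEFAULT_SECONDS
    return max(WAIT_MIN_SECONDS, min(WAIT_MAX_SECONDS, timeout_seconds))


def select_wait_mode(
    task_id: Optional[str] = None,
    task_ids: Optional[Sequence[str]] = None,
) -> str:
    """Return a hypothetical mode label, or raise for rejected combinations."""
    if task_id is not None and task_ids is not None:
        raise ValueError("pass task_id or task_ids, not both")
    if task_ids is not None and len(task_ids) == 0:
        raise ValueError("task_ids must not be empty")
    if task_id is not None:
        return "single"
    if task_ids is not None:
        return "first_of_many"
    return "all_live"  # neither given: wait on every live run for the repo
```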
list_runs params: optional repo_root, optional limit.
For clients with a short tools/call ceiling (Claude Desktop ~60s, unconfigured Codex CLI 60s), prefer start_run + get_run/list_runs for long review, research, build, validate, or reverse_engineer work. Where the ceiling is raised (Claude Code defaults to ~28h, Codex CLI with tool_timeout_sec overridden), start_run + wait_run is usually fewer round-trips than polling.
Default model is gpt-5.5 for all tools and review subagents. Any explicit timeout_seconds must be at least 300s.
| Tool | Timeout | Reasoning | Sandbox |
|---|---|---|---|
| `plan` | 600s | high | read-only |
| `research` | 1200s | medium | read-only |
| `brainstorm` | 600s | high | read-only |
| `review` | 600s default, 1200s recommended for multi-agent | medium | read-only |
| `validate` | 600s | medium | workspace-write via --full-auto |
| `build` | 1200s | high | workspace-write via --full-auto |
| `reverse_engineer` | 1800s | high | workspace-write via --full-auto |
build and reverse_engineer switch to danger-full-access when danger=true.
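The defaults in the table can be read as a small lookup plus two rules: explicit timeouts must be at least 300s, and only build and reverse_engineer can escalate to danger-full-access. The sketch below encodes that reading. The function names are hypothetical, and rejecting `danger=True` on other tools is an assumption of this example rather than documented behavior.

```python
from typing import Optional

DEFAULT_TIMEOUTS = {
    "plan": 600,
    "research": 1200,
    "brainstorm": 600,
    "review": 600,
    "validate": 600,
    "build": 1200,
    "reverse_engineer": 1800,
}
READ_ONLY_TOOLS = {"plan", "research", "brainstorm", "review"}
DANGER_CAPABLE_TOOLS = {"build", "reverse_engineer"}
MIN_EXPLICIT_TIMEOUT = 300


def effective_timeout(tool: str, timeout_seconds: Optional[int] = None) -> int:
    """Use the tool default unless an explicit value of >= 300s is given."""
    if tool not in DEFAULT_TIMEOUTS:
        raise ValueError(f"unknown tool: {tool}")
    if timeout_seconds is None:
        return DEFAULT_TIMEOUTS[tool]
    if timeout_seconds < MIN_EXPLICIT_TIMEOUT:
        raise ValueError("explicit timeout_seconds must be at least 300")
    return timeout_seconds


def sandbox_mode(tool: str, danger: bool = False) -> str:
    """Map a tool (and the danger flag) to its sandbox mode."""
    if tool not in DEFAULT_TIMEOUTS:
        raise ValueError(f"unknown tool: {tool}")
    if danger:
        # Assumption for this sketch: danger on other tools is an error.
        if tool not in DANGER_CAPABLE_TOOLS:
            raise ValueError(f"{tool} does not accept danger")
        return "danger-full-access"
    return "read-only" if tool in READ_ONLY_TOOLS else "workspace-write"
```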
start_run, get_run, and list_runs are control-plane tools and return immediately. start_run uses the selected target tool's timeout budget. wait_run is also a control-plane tool, but it intentionally blocks up to its own timeout_seconds (capped at 100_000s / ~27.8h, matching Claude Code's MCP_TOOL_TIMEOUT default) waiting for a background run to finish.
- […] CODEX_HOME under the system temp directory (.../codex-dobby/<task-id>/codex-home) instead of pointing children at the user's global Codex home directly.
- research prefers codebase evidence and uses fetchaller MCP tools when available. If fetchaller is not installed or not configured for the run, the worker is told not to call it and to continue without web MCP support.
- validate runs in workspace-write --full-auto because validation often needs temp or cache writes; the worker prompt still forbids source edits and commits.
- review uses a direct single-lens path for one agent, or multi-agent orchestration (via spawn_agent over codex exec --json) for multiple. Single-agent review defaults to gpt-5.5 at medium reasoning. Multi-agent review uses a gpt-5.5 parent at medium reasoning and injects gpt-5.5 reviewer subagents, also at medium by default.
- reverse_engineer includes a Ghidra MCP workflow only when Ghidra is installed and configured for the run. When Dobby can discover Ghidra from the active Codex configs (CODEX_HOME/config.toml and repo-local .codex/config.toml), it adds the configured Ghidra MCP helper repo as a writable helper root. When a live Ghidra UDS socket runtime directory is discoverable, Dobby also mounts that runtime path so child reverse-engineering workers can reach the already-running Ghidra instance. In that live-UDS case, Dobby enables workspace-write network access and passes the discovered socket roots through network.allow_unix_sockets for the child Codex run. If Ghidra is not installed or not configured, the worker is told not to call mcp__ghidra__*.
- reverse_engineer responses include reverse_engineer_details.ghidra, which reports whether Ghidra was configured, whether the run used direct MCP calls or the mounted helper fallback, and which Ghidra calls were observed.
- start_run launches the selected Dobby tool in the server process and returns a task_id immediately.
- get_run first checks any still-live in-memory run, then falls back to the run artifacts on disk.
- wait_run awaits the same live in-memory task via asyncio.wait_for with asyncio.shield, so if the caller is cancelled (client disconnect, outer MCP timeout) the underlying background run keeps going and can still be recovered with get_run/wait_run. If the task isn't in the live registry (for example, because the server restarted), wait_run falls through to the on-disk artifact lookup without polling.
- wait_run (task_ids=[...] or no ids → all-live) uses asyncio.wait(..., return_when=FIRST_COMPLETED) over shielded futures and returns the first run to finish. The response's pending_task_ids lists the still-outstanding ids; callers should re-call wait_run(task_ids=pending_task_ids) until it's empty. On timeout the primary entry is the first id in the input list; pending_task_ids holds every id still waiting.
- […] get_run sees either the startup placeholder or the final persisted response instead of a partially written result.json.
- […] review, research, etc.) can still hit the caller's outer tools/call ceiling. Prefer start_run + wait_run / get_run when that's a concern.
- […] read-only. Dobby still mounts the per-run artifact directory plus any in-repo extra_roots; extra_roots outside the repo are exposed as additional read-only roots, not writable roots.
- […] extra_roots writable via --add-dir.
- […] .codex-dobby/ is present in .gitignore. Unsafe .gitignore targets, such as symlinks or multiply-linked files, fail closed.
- […] stop_reason: sandbox_violation only when the worker self-reports having modified files that the wrapper also observed.
- […] ~/.codex/sessions. Dobby seeds the child home from CODEX_HOME/auth.json and CODEX_HOME/config.toml when those files exist, mirrors referenced helper files from CODEX_HOME and CLAUDE_CONFIG_DIR into a private runtime, then points the child at that private temp home. The server process therefore needs read access to the parent Codex and Claude config files plus read/write access to the temp runtime directory.
- CODEX_DOBBY_ACTIVE=1 is set on child runs and Dobby refuses to run if already set. Inherited codex-dobby-mcp entries are disabled so workers can't call back.
- […] status: "error".
- […] task_id values and symlinked artifact roots or paths. Wrapper writes also fail closed on unsafe .gitignore targets.

Each run writes to <target-root>/.codex-dobby/runs/<task-id>/: request.json, prompt.txt, stdout.log, stderr.log, last_message.txt, result.json, output-schema.json, events.jsonl. Multi-agent review logs are also JSONL inside stdout.log. events.jsonl is the durable record of streaming progress events for the run (one ACP-shaped event per line; see Streaming progress below). Treat .codex-dobby/ as unredacted local logs.
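The wait_run behavior described above relies on two standard asyncio patterns: `asyncio.wait_for` around `asyncio.shield`, and `asyncio.wait` with `FIRST_COMPLETED`. The following self-contained sketch shows why a timed-out waiter does not kill the background run. The coroutine `background_run` and the helpers `wait_one` and `wait_first` are stand-ins written for this example, not Dobby's functions.

```python
import asyncio
from typing import Dict, List, Optional, Tuple


async def background_run(delay: float, result: str) -> str:
    """Stand-in for a long Dobby run."""
    await asyncio.sleep(delay)
    return result


async def wait_one(task: asyncio.Task, timeout: float) -> Optional[str]:
    """Wait with a timeout; the shield keeps the task alive if we give up."""
    try:
        return await asyncio.wait_for(asyncio.shield(task), timeout)
    except asyncio.TimeoutError:
        return None  # the run is still going and can be awaited again


async def wait_first(
    tasks: Dict[str, asyncio.Task], timeout: float
) -> Tuple[Optional[Tuple[str, str]], List[str]]:
    """Return ((task_id, result) of the first finisher, pending ids)."""
    shielded = {asyncio.shield(task): task_id for task_id, task in tasks.items()}
    done, pending = await asyncio.wait(
        list(shielded), timeout=timeout, return_when=asyncio.FIRST_COMPLETED
    )
    if not done:
        return None, sorted(shielded.values())
    winner = next(iter(done))
    return (shielded[winner], winner.result()), sorted(shielded[f] for f in pending)


async def demo():
    slow = asyncio.create_task(background_run(0.2, "slow done"))
    first = await wait_one(slow, timeout=0.05)    # times out
    still_running = not slow.done()               # the run survived the timeout
    second = await wait_one(slow, timeout=1.0)    # same task, now finishes

    a = asyncio.create_task(background_run(0.05, "a done"))
    b = asyncio.create_task(background_run(0.3, "b done"))
    winner, pending = await wait_first({"a": a, "b": b}, timeout=1.0)
    await b  # let the remaining run finish before the loop closes
    return first, still_running, second, winner, pending


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Without the shield, the timeout in `wait_one` would cancel the run itself, and the second wait would see a cancelled task instead of a result.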
Worker-facing tools (plan, research, brainstorm, build, validate, review, reverse_engineer) return task_id, tool, status, summary, completeness, important_facts, next_steps, files_changed (this run only), file_diffs (mutating tools only — see File diffs), artifact_paths, sandbox_violations, repo_root, exit_code, duration_ms, warnings, raw_output_available, model, reasoning_effort, result_state, and stop_reason (see Stop reasons). review responses also include review_details, where requested_review_agents is the raw caller-supplied list and effective_review_agents is the normalized/defaulted list Dobby actually used. reverse_engineer responses also include reverse_engineer_details.ghidra, with mode, summary, and the observed mcp_calls and helper_calls.
Every run writes ACP-shaped events to <run-dir>/events.jsonl, regardless of whether anyone is listening. When the MCP caller passes a progressToken in the tools/call _meta, Dobby also forwards each event live as an MCP notifications/progress. The numeric progress field is an event counter, message is a short human-readable label (the tool title or event type), and the structured event lives in _meta.acpEvent for clients that want it. Clients that don't pass a token receive nothing — runs continue normally.
Event vocabulary borrows ACP's session/update payload shapes: agent_message_chunk for streamed model text, tool_call / tool_call_update for tool invocations (with kind ∈ read|edit|delete|move|search|execute|think|fetch|other), and plan for worker outlines. The codex --json event types we map (and ignore) are documented in src/codex_dobby_mcp/events.py.
For background runs (start_run + wait_run), live events are forwarded only to whichever wait_run caller is currently subscribed. The full history stays available in events.jsonl regardless.
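To make the progress mapping concrete, here is a minimal sketch of how one event might be written to the log and, when a token is present, wrapped as an MCP `notifications/progress` message. The helper names are hypothetical, and the event fields used (`sessionUpdate`, `title`, `kind`, `toolCallId`) are assumptions based on ACP's session/update shapes; the authoritative mapping lives in src/codex_dobby_mcp/events.py.

```python
import json
from typing import Any, Dict, Optional, TextIO, Union


def to_progress_notification(
    progress_token: Union[str, int], counter: int, event: Dict[str, Any]
) -> Dict[str, Any]:
    """Wrap one ACP-shaped event as an MCP progress notification."""
    # Short human-readable label: the tool title if present, else the event type.
    message = event.get("title") or event.get("sessionUpdate", "event")
    return {
        "jsonrpc": "2.0",
        "method": "notifications/progress",
        "params": {
            "progressToken": progress_token,
            "progress": counter,            # event counter, not a percentage
            "message": message,
            "_meta": {"acpEvent": event},   # full structured event
        },
    }


def record_event(
    log: TextIO,
    event: Dict[str, Any],
    counter: int,
    progress_token: Optional[Union[str, int]] = None,
) -> Optional[Dict[str, Any]]:
    """Always append to the JSONL log; forward only when a token was passed."""
    log.write(json.dumps(event) + "\n")
    if progress_token is None:
        return None
    return to_progress_notification(progress_token, counter, event)
```

A caller would pass the open `events.jsonl` file as `log` and send the returned dict, if any, over the MCP transport.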
Each ToolResponse carries a stop_reason that classifies why the run ended. The first five values mirror ACP's StopReason enum verbatim:
- end_turn: model finished naturally (the typical success case)
- max_tokens: reached the model's token budget (only when codex emits a usage signal)
- max_turn_requests: exceeded the model's turn-request budget
- refusal: worker explicitly refused the task (worker output set refused: true)
- cancelled: Dobby was cancelled or killed before completion

Plus four Dobby-specific extensions for failure modes ACP doesn't model:

- timeout: run hit timeout_seconds
- stall: codex went idle for too long and Dobby killed it
- sandbox_violation: a read-only tool tried to write, or a mutating tool tried to commit
- error: codex exited non-zero or returned malformed output

Legacy run artifacts written before this field existed are still readable; stop_reason is optional on RunLookupResponse.
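The nine stop reasons can be modelled as a single string enum, with the ACP-native and Dobby-specific values kept as separate sets. This is an illustrative sketch only; the class and helper names are assumptions, and Dobby's own model may be organized differently.

```python
from enum import Enum
from typing import Optional


class StopReason(str, Enum):
    # Mirrors ACP's StopReason values.
    END_TURN = "end_turn"
    MAX_TOKENS = "max_tokens"
    MAX_TURN_REQUESTS = "max_turn_requests"
    REFUSAL = "refusal"
    CANCELLED = "cancelled"
    # Dobby-specific extensions.
    TIMEOUT = "timeout"
    STALL = "stall"
    SANDBOX_VIOLATION = "sandbox_violation"
    ERROR = "error"


ACP_STOP_REASONS = frozenset({
    StopReason.END_TURN,
    StopReason.MAX_TOKENS,
    StopReason.MAX_TURN_REQUESTS,
    StopReason.REFUSAL,
    StopReason.CANCELLED,
})
DOBBY_STOP_REASONS = frozenset(StopReason) - ACP_STOP_REASONS


def parse_stop_reason(raw: Optional[str]) -> Optional[StopReason]:
    """Legacy artifacts have no stop_reason, so None passes through."""
    return None if raw is None else StopReason(raw)
```

Keeping the value optional at the parsing boundary lets older run directories load without a migration step.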
Mutating tools (build, validate, reverse_engineer) populate a file_diffs: list[FileDiff] field with one entry per changed file:
{
"path": "/abs/repo/src/foo.py",
"oldText": "...", // null for new files
"newText": "...", // null for deleted files
"truncated": false // true for files Dobby couldn't represent as text
}
Field names use ACP's camelCase shape (oldText/newText) on the wire, in result.json, and in events.jsonl. The Pydantic model uses Pythonic snake_case fields internally with camelCase aliases, and every serialization path passes by_alias=True so on-disk and on-the-wire shapes match. Worker JSON also uses camelCase (the oldText/newText schema is enforced).
Files larger than 2 MB or that look binary are reported with truncated: true and both texts left as null. Files outside the repo root are not diffed. The wrapper detects diffs from snapshot deltas; workers may also attach file_diffs in their structured output, which Dobby uses to fill in gaps the wrapper couldn't capture (e.g. a stashed snippet of a very large file).
Read-only tools always have file_diffs: [].
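The truncation rule and the camelCase wire shape can be illustrated with a stdlib-only stand-in for the Pydantic model. Everything named here is hypothetical: the `FileDiff` dataclass, `build_file_diff`, the reading of "2 MB" as 2 × 1024 × 1024 bytes, and the NUL-byte or invalid-UTF-8 test for "looks binary" are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional, Tuple

MAX_DIFF_BYTES = 2 * 1024 * 1024  # assumption: "2 MB" read as binary megabytes


@dataclass
class FileDiff:
    path: str
    old_text: Optional[str]   # None for new files
    new_text: Optional[str]   # None for deleted files
    truncated: bool = False

    def to_wire(self) -> Dict[str, Any]:
        """Serialize with ACP-style camelCase keys."""
        return {
            "path": self.path,
            "oldText": self.old_text,
            "newText": self.new_text,
            "truncated": self.truncated,
        }


def _as_text(data: Optional[bytes]) -> Tuple[Optional[str], bool]:
    """Return (text, representable). Missing content counts as representable."""
    if data is None:
        return None, True
    # Assumed heuristic: too large, contains NUL, or not valid UTF-8.
    if len(data) > MAX_DIFF_BYTES or b"\x00" in data:
        return None, False
    try:
        return data.decode("utf-8"), True
    except UnicodeDecodeError:
        return None, False


def build_file_diff(path: str, old: Optional[bytes], new: Optional[bytes]) -> FileDiff:
    """Build one diff entry, falling back to truncated with both texts null."""
    old_text, old_ok = _as_text(old)
    new_text, new_ok = _as_text(new)
    if not (old_ok and new_ok):
        return FileDiff(path, None, None, truncated=True)
    return FileDiff(path, old_text, new_text)
```

For example, `build_file_diff("/abs/repo/src/foo.py", None, b"print('hi')\n").to_wire()` yields an entry with `oldText` set to `None`, matching the new-file case.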
- start_run returns an AsyncRunHandle with task_id, tool, state, summary, repo_root, artifact_paths, model, and reasoning_effort.
- get_run returns a RunLookupResponse with task_id, state, summary, repo_root, optional tool, optional status, optional result_state, optional stop_reason, optional final result, artifact metadata, and warnings.
- wait_run returns the same RunLookupResponse shape as get_run plus an optional pending_task_ids list (populated whenever the caller passed task_ids or used the all-live mode). On timeout, state is running and result is unset; on completion, state is finished with the final result populated.
- list_runs returns the resolved repo_root plus recent run summaries for that repo.

If your MCP client gives up on long blocking tool calls before Dobby finishes, use the async path:
{
"tool": "start_run",
"arguments": {
"tool": "review",
"prompt": "Review the current uncommitted state",
"repo_root": "/ABSOLUTE/PATH/TO/TARGET-REPO",
"files": ["src/foo.ts", "ui/main.ts"]
}
}
This returns quickly with a task_id. Then poll or recover the result:
{
"tool": "get_run",
"arguments": {
"task_id": "<task-id>",
"repo_root": "/ABSOLUTE/PATH/TO/TARGET-REPO"
}
}
If you lost the id because a previous blocking call timed out, list_runs reads .codex-dobby/runs/ and shows recent task ids and summaries.
If you'd rather block than poll, call wait_run:
{
"tool": "wait_run",
"arguments": {
"task_id": "<task-id>",
"repo_root": "/ABSOLUTE/PATH/TO/TARGET-REPO"
}
}
To be woken by whichever of several runs finishes first, pass task_ids instead:
{
"tool": "wait_run",
"arguments": {
"task_ids": ["<task-id-1>", "<task-id-2>", "<task-id-3>"],
"repo_root": "/ABSOLUTE/PATH/TO/TARGET-REPO"
}
}
Omit both task_id and task_ids to wait on every currently-live run for the repo. On completion / timeout semantics, see the wait_run params in Requests — the short version: the response's pending_task_ids tells you what to re-call wait_run with.
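The re-call pattern can be written as a small client-side loop. The sketch below assumes a caller-supplied `call_tool(name, arguments)` function that performs the MCP tools/call and returns the response as a dict; that function, `drain_runs`, and the `max_rounds` safety cap are all hypothetical and not part of Dobby.

```python
from typing import Any, Callable, Dict, List, Tuple

ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def drain_runs(
    call_tool: ToolCaller,
    task_ids: List[str],
    repo_root: str,
    max_rounds: int = 100,
) -> Tuple[Dict[str, Any], List[str]]:
    """Re-call wait_run with pending_task_ids until nothing is outstanding."""
    results: Dict[str, Any] = {}
    pending = list(task_ids)
    rounds = 0
    while pending and rounds < max_rounds:
        rounds += 1
        response = call_tool(
            "wait_run", {"task_ids": pending, "repo_root": repo_root}
        )
        # state == "running" means the wait timed out; just wait again.
        if response["state"] != "running":
            # Finished runs carry a result; other states are recorded as None.
            results[response["task_id"]] = response.get("result")
        pending = list(response.get("pending_task_ids") or [])
    return results, pending
```

The function returns the collected results plus any ids still pending when the round cap was reached, so the caller can decide whether to keep waiting or fall back to get_run.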
States reported by get_run and wait_run:
- running: the server still has the run alive in memory
- finished: a final ToolResponse is available and result_state is final
- unknown: the run directory exists but no readable final result is available; if result_state is placeholder, only the startup placeholder artifact was written
- not_found: there is no matching run directory, or the supplied task_id is invalid

Important limitation: live background tracking is in-process. If the server restarts, running background work is lost. Completed results remain recoverable from .codex-dobby/runs/.
uv run pytest
uv build --offline --no-build-isolation
uv run mcp dev src/codex_dobby_mcp/server.py:app
uv run python -m codex_dobby_mcp