Enables multi-agent code review with cross-verification of findings against source code, catching hallucinations and improving agent accuracy over time.
weightless in-context RL for code review — agents that learn from grounded signals, no weights touched.
Install · First Run · Daily Use · Dashboard · Troubleshooting · Config · For AI Agents
Live dashboard at http://localhost:63007/dashboard — fleet vortex, signal stream, skill graduation grid, and consensus flow, all in real time.
Per-agent leaderboard — accuracy, uniqueness, impact, hallucinations caught, and dispatch weight. Updated after every consensus round.
The single-reviewer failure mode: a solo AI reviewer ships hallucinated bugs as critical findings 5–10% of the time in our internal usage. Gossipcat's cross-review drops that to under 1%. That delta is what the whole system exists to produce.
Multi-agent consensus code review that catches hallucinations before you act on them — and gets smarter every session.
Gossipcat is an MCP server for Claude Code that runs 3+ AI agents in parallel to review your code. They independently find bugs, then cross-review each other's findings. Confirmed = real. Caught = hallucination, penalized. Over time, agents accumulate accuracy profiles and the system routes tasks to whoever is most reliable for that category. No weights updated — the "policy" is a markdown skill file.
Gossipcat is an MCP server that orchestrates multiple AI agents to review your code in parallel. Agents independently review, then cross-review each other's findings. Agreements are confirmed. Hallucinations are caught and penalized. Over time, each agent builds an accuracy profile — the system learns who to trust for what.
Most RL pipelines update model weights. Gossipcat doesn't touch weights — it learns by updating the prompt layer.
Every finding an agent produces must cite a real file:line. Peers verify those citations against actual source code. Verified findings (and caught hallucinations) become grounded reward signals — no judge model, no subjective grade, just mechanical checks against ground truth. Those signals update per-agent competency scores, which steer future dispatch. When an agent keeps failing in a category, a targeted skill file is auto-generated from its own failure history and injected into future prompts.
flowchart LR
A([agent review]) -->|cites file:line| B([peer cross-review])
B -->|verifies against code| C{verdict}
C -->|confirmed| D[reward signal]
C -->|hallucination| E[penalty signal]
D --> F[competency score]
E --> F
F -->|steer dispatch| G([next agent pick])
E -->|≥3 in category| H[auto-generate skill]
H -->|inject into prompt| A
G --> A
style A fill:#0ea5e9,stroke:#0369a1,color:#fff
style H fill:#f59e0b,stroke:#b45309,color:#fff
style D fill:#10b981,stroke:#047857,color:#fff
style E fill:#ef4444,stroke:#b91c1c,color:#fff
The "policy update" is a markdown file under .gossip/agents/<id>/skills/. No fine-tuning, no RLHF infrastructure, no labelling pipeline. The reward signal is grounded in source code rather than a judge model, which is the piece that makes the loop trustworthy enough to automate. When agents disagree, we check the code — not another LLM's opinion.
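The mechanical check described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not gossipcat's actual verifier: the `Citation` shape, the function names, and the in-memory `sourceFiles` record are all hypothetical and exist only for this example.

```typescript
// Hypothetical shapes for illustration; gossipcat's real types may differ.
interface Citation {
  file: string;     // path cited by the agent, e.g. "server.ts"
  line: number;     // 1-based line number
  snippet?: string; // optional text the agent claims appears on that line
}

type CitationVerdict =
  | "verified"
  | "missing_file"
  | "line_out_of_range"
  | "snippet_mismatch";

// Parse "path/to/file.ts:47" into a Citation, or return null if malformed.
function parseCitation(ref: string): Citation | null {
  const match = /^(.+):(\d+)$/.exec(ref.trim());
  if (!match) return null;
  const line = parseInt(match[2], 10);
  if (line < 1) return null;
  return { file: match[1], line };
}

// Check a citation against source text held in memory (file path -> contents).
function verifyCitation(c: Citation, files: Record<string, string>): CitationVerdict {
  if (!Object.prototype.hasOwnProperty.call(files, c.file)) return "missing_file";
  const lines = files[c.file].split("\n");
  if (c.line > lines.length) return "line_out_of_range";
  if (c.snippet !== undefined && lines[c.line - 1].indexOf(c.snippet) === -1) {
    return "snippet_mismatch";
  }
  return "verified";
}

const sourceFiles: Record<string, string> = {
  "server.ts": "const tasks = new Map();\nfunction handle() {}\n",
};
const cited = parseCitation("server.ts:1");
console.log(
  cited ? verifyCitation({ ...cited, snippet: "new Map" }, sourceFiles) : "malformed"
); // prints "verified"
```

The point of the sketch is that every outcome is decided by comparing the claim to the file contents, so no second model has to be trusted.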
| | What you get | Hallucination filtering | Agents improve over time |
|---|---|---|---|
| Gossipcat | 3+ agents cross-review each other's findings; confirmed bugs only | Yes — peers catch and penalize hallucinations mechanically | Yes — accuracy signals steer dispatch; skill files fix repeat failures |
| Single-agent review (Claude Code built-in, Cursor review) | One model reviews your diff | No — hallucinations ship as findings | No — no feedback loop |
| LLM-as-judge cross-review (most multi-agent frameworks) | One model grades another model's output | Partial — judge can hallucinate too; no ground truth | No — judge scores aren't wired to dispatch |
| Traditional review tools (CodeRabbit, PR-Agent) | Pattern-match + one LLM pass | No | No |
The core difference: gossipcat verifies findings against actual file:line citations in your codebase. That ground truth is what makes the reward signal trustworthy enough to automate.
What a typical gossipcat session looks like in practice (2026-05-22, v0.4.30 ship):
- 4b28a1c, 1255+ LOC, 50/50 new tests, zero regressions). Full design ↔ ship arc through gossipcat itself: 6 consensus rounds on the spec before any code was written.
- run() path (rev-3, opus-implementer), a phantom AgentTeam type the implementer would have invented if shipped (rev-5, sonnet-reviewer grep-grounded against live AgentConfig), and a metadata field that didn't exist on ConsensusSignal at all (rev-4, sonnet; metadata lives only on MetaSignal / PipelineSignal). Each round produced a measurable rev: rev-1 had 4 HIGHs, rev-6 had 0.
- signal-allowlist-drift.test.ts:108 (which exists precisely to catch this; same failure mode as PR #329's silent-drop of transport_failure) flagged that the implementer added the 2 new signals to KNOWN_SIGNALS + the type union + OPERATIONAL_SIGNAL_NAMES but missed VALID_CONSENSUS_SIGNALS in performance-writer.ts. 7-line fix landed in the same PR before merge.
- citation_grounding agreement signal so the agent's pattern compounds across sessions.

Nothing landed without cross-review. Two agents got +/- score adjustments based on what they caught vs. what they missed. The spec is now usable as a worked example of what 6 rounds of multi-agent design review looks like: docs/superpowers/specs/2026-05-21-consensus-auto-verify-design.md.
| Without gossipcat | With gossipcat |
|---|---|
| One AI reviews your code — and hallucinates a finding you waste 20 minutes on | Multiple agents cross-check each other — hallucinations get caught before you see them |
| Every agent gets the same tasks regardless of track record | Dispatch weights route tasks to the agent with the best accuracy in that category |
| An agent keeps making the same class of mistake | Skill files are auto-generated from failure data and injected into future prompts |
| You don't know which agent to trust | Accuracy, uniqueness, and reliability scores are tracked per agent, per category |
| Feature | What it does |
|---|---|
| Consensus Review | 3+ agents review independently, then cross-review each other. Findings tagged as CONFIRMED, DISPUTED, or UNIQUE. |
| Adaptive Dispatch | Agent accuracy is tracked per-category. Dispatch weights adjust automatically, so the best agent for the job gets picked. |
| Skill Development | When an agent keeps failing in a category, targeted skills are generated from failure data and injected into future prompts. Effectiveness is measured with a z-test on post-bind signals: passed, failed, or inconclusive. |
| Multi-Provider | Mix Anthropic, Google, OpenAI, and OpenClaw agents in one team. Each brings different strengths. Native agents need no API key. 🦞 Lobster friendly. |
| Live Dashboard | Real-time view of tasks, consensus reports, agent scores, and activity feed. Terminal Amber theme. WebSocket updates. |
| Agent Memory | Per-agent cognitive memory persists across sessions. Agents remember past findings, patterns, and project context. |
| Auto-Verify (v0.4.30) | Opt-in. Every UNVERIFIED finding gets file_read-checked by a verifier agent before the report is returned. The tag stays 'unverified': auto-verify is metadata, not a state transition. Flag: GOSSIP_CONSENSUS_AUTO_VERIFY_UNVERIFIED=1. |
| Works with | Status |
|---|---|
| Claude Code | Full support |
| Cursor | Not yet |
| Windsurf | Not yet |
| VS Code | Not yet |

| Provider gateways | Status |
|---|---|
| HTTP gateway | ✅ |
| Local models | ✅ |
| Any base_url | ✅ |
The Mermaid diagram above shows the loop end-to-end. Here's the per-step definition:
| Step | What happens |
|---|---|
| Dispatch | Tasks routed to agents based on dispatch weights (accuracy history per category) |
| Parallel review | Agents work independently, each producing findings with confidence scores |
| Cross-review | Each agent reviews peers' findings: agree, disagree, unverified, or new finding |
| Consensus | Findings deduplicated and tagged: CONFIRMED, DISPUTED, UNVERIFIED, UNIQUE |
| Signals | You verify findings against code and record accuracy signals |
| Skill development | Agents with repeated failures get targeted skill files injected into future prompts |
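As a rough illustration of the Consensus step, the tagging could be expressed as a small pure function. The rules below are an assumed reading of the table and of the sample reports in this README, not gossipcat's actual engine; `PeerVerdict`, `ReviewedFinding`, and `tagFinding` are hypothetical names.

```typescript
// Hypothetical types; the tagging rules are an assumption made for this sketch.
type PeerVerdict = "agree" | "disagree" | "unverified";
type ConsensusTag = "CONFIRMED" | "DISPUTED" | "UNVERIFIED" | "UNIQUE";

interface ReviewedFinding {
  id: string;
  author: string;
  peerVerdicts: PeerVerdict[]; // one entry per peer that cross-reviewed the finding
}

function tagFinding(f: ReviewedFinding): ConsensusTag {
  // Assumption: a finding no peer weighed in on is treated as UNIQUE.
  if (f.peerVerdicts.length === 0) return "UNIQUE";
  const agrees = f.peerVerdicts.filter((v) => v === "agree").length;
  const disagrees = f.peerVerdicts.filter((v) => v === "disagree").length;
  // Assumption: any disagreement makes the finding DISPUTED, even with some agreement.
  if (disagrees > 0) return "DISPUTED";
  if (agrees > 0) return "CONFIRMED";
  // Peers looked at it but none could check it either way.
  return "UNVERIFIED";
}

const exampleFindings: ReviewedFinding[] = [
  { id: "f1", author: "sonnet-reviewer", peerVerdicts: ["agree"] },
  { id: "f2", author: "haiku-researcher", peerVerdicts: ["disagree", "disagree"] },
  { id: "f3", author: "sonnet-reviewer", peerVerdicts: [] },
  { id: "f4", author: "gemini-reviewer", peerVerdicts: ["unverified"] },
];
for (const f of exampleFindings) {
  console.log(f.id, tagFinding(f));
}
```

Under these assumed rules, only the UNVERIFIED and DISPUTED buckets need a follow-up check against the code, which matches the Signals step in the table.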
| | Native | Relay |
|---|---|---|
| Runs as | Claude Code subagent (Agent() tool) | WebSocket worker on relay server |
| Providers | Anthropic (Claude) | Google (Gemini), OpenAI, any provider |
| API key | None (uses your Claude Code subscription) | Required per provider |
| Defined in | .claude/agents/*.md | .gossip/config.json |
| Consensus | Yes | Yes |
| Memory & Skills | Yes | Yes |
Both types participate equally in consensus, cross-review, and skill development. Native subagents get skill files injected into their system prompts and can call gossip_remember for memory recall. Relay workers call the equivalent memory_query tool and get file_read + file_grep during cross-review so their verification parity matches natives.
Requirements: Node.js 22+ and Claude Code.
npm install -g gossipcat && claude mcp add gossipcat -s user -- gossipcat
Restart Claude Code. Then in any project, ask:
"Set up a gossipcat team for this project"
If claude mcp add doesn't work for your setup, add this to ~/.claude/mcp_settings.json:
{
"mcpServers": {
"gossipcat": {
"command": "gossipcat"
}
}
}
Or project-local in .mcp.json:
{
"mcpServers": {
"gossipcat": {
"command": "npx",
"args": ["gossipcat"]
}
}
}
Claude Code will call gossip_setup() to scaffold .gossip/config.json and your agent team. First-run bootstrap also writes the dispatch rules and tool catalog so Claude Code knows how to use gossipcat — no manual config needed.
Gossipcat is on npm and GitHub Releases — both carry the same bundle. npm install -g gossipcat pulls from the registry and is the shortest path; the GitHub release URL is useful when you want to pin to a specific tarball (see Alternative install paths below). Either way, npm drops a gossipcat binary on your PATH.
| What you get | |
|---|---|
| MCP server | Bundled binary at dist-mcp/mcp-server.js, wired as the gossipcat command on PATH |
| Dashboard | Prebuilt static assets in dist-dashboard/ — launches automatically on a dynamic port (ask Claude Code "what's my gossipcat dashboard URL?"). Override with GOSSIPCAT_PORT=24420 if you want a stable port. |
| Default skills + rules + archetypes | 16 bundled skill templates, operational rules, and project archetypes copied into the install |
| Postinstall wizard | Writes .mcp.json with correct absolute paths for your machine |
Pin to a specific npm version:
npm install -g gossipcat@<version>
Pin to a specific GitHub release tarball (version-locked, bypasses npm registry):
npm install -g https://github.com/gossipcat-ai/gossipcat-ai/releases/download/v0.5.2/gossipcat-0.5.2.tgz
Project-local install (each project gets its own gossipcat):
cd your-project
npm install --save-dev gossipcat
The postinstall writes .mcp.json to your project root. Open Claude Code in that directory and gossipcat connects automatically — no claude mcp add needed.
From source (contributors):
git clone https://github.com/gossipcat-ai/gossipcat-ai.git
cd gossipcat-ai
npm install
npm run build:mcp
claude mcp add gossipcat -s user -- node "$PWD/dist-mcp/mcp-server.js"
Re-run the install — npm will fetch the latest version and replace the installed binary:
npm install -g gossipcat@latest
Or in-session, ask Claude Code: "Check for gossipcat updates" — the gossip_update tool fetches the latest release notes and applies the upgrade with your confirmation.
Add env vars for the providers you want to use. Pass them with -e when registering, or set them in your shell environment.
| Provider | Env var | Notes |
|---|---|---|
| Native (Claude Code) | — | Dispatches through your active Claude Code subscription. No key needed. |
| Anthropic API | ANTHROPIC_API_KEY | Direct API access if you don't want to go through Claude Code. |
| Google Gemini | GOOGLE_API_KEY | Gemini Pro / Flash relay agents. |
| OpenAI | OPENAI_API_KEY (+ optional OPENAI_BASE_URL) | GPT-4 / GPT-4o relay agents. OPENAI_BASE_URL lets you point at OpenAI-compatible gateways (Azure, Together, Groq, etc.). |
| OpenClaw | — (local gateway) | OpenAI-compatible, defaults to http://127.0.0.1:18789/v1. No API key — auth handled by your local OpenClaw daemon. |
| Ollama (local) | — | Runs locally via http://localhost:11434. No key. Pull your model first with ollama pull llama3.1:8b. |
Native only (zero API keys — everything runs through Claude Code):
claude mcp add gossipcat -s user -- gossipcat
Then in session ask for a team built from sonnet-reviewer / haiku-researcher / opus-implementer. Native agents dispatch through Agent() and relay back. Good zero-config starting point.
Anthropic API (direct, bypasses Claude Code):
claude mcp add gossipcat -s user \
-e ANTHROPIC_API_KEY=sk-ant-... \
-- gossipcat
Use this if you want relay agents running Claude models without going through the Claude Code subscription path — e.g. for parallelism beyond Claude Code's concurrency cap, or for running long background reviews while you keep working.
Google Gemini:
claude mcp add gossipcat -s user \
-e GOOGLE_API_KEY=AIza... \
-- gossipcat
Enables gemini-reviewer, gemini-tester, gemini-implementer on the relay. Watch the quota — gossipcat has a built-in 429 watcher that falls back to native agents when Gemini is cooling down.
OpenAI (and OpenAI-compatible gateways):
claude mcp add gossipcat -s user \
-e OPENAI_API_KEY=sk-... \
-- gossipcat
For Azure / Together / Groq / OpenRouter, add OPENAI_BASE_URL:
claude mcp add gossipcat -s user \
-e OPENAI_API_KEY=your-key \
-e OPENAI_BASE_URL=https://api.groq.com/openai/v1 \
-- gossipcat
OpenClaw (local gateway):
# Start the OpenClaw daemon first (see openclaw docs), default port 18789
claude mcp add gossipcat -s user -- gossipcat
No env vars. Configure an agent with provider: "openclaw" in .gossip/config.json and gossipcat talks to the local gateway automatically. Override the port with base_url in the agent config if your daemon runs elsewhere.
Ollama (fully local, no API):
# Pull a model once
ollama pull llama3.1:8b
# Then register gossipcat
claude mcp add gossipcat -s user -- gossipcat
Configure the agent with provider: "local" and model: "llama3.1:8b" in .gossip/config.json. Good for airgapped dev, offline work, and burning-down-test-debt sessions where you don't want to spend API credits.
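For orientation, here is a sketch of what the OpenClaw and Ollama agent entries described above might look like. Only `provider`, `model`, and `base_url` are named in this README; the `AgentEntry` type, the `id` values, the model placeholder, and the top-level `agents` key are assumptions, so check the file that gossip_setup generates for the real schema.

```typescript
// Assumed shape of an agent entry; the real .gossip/config.json schema may differ.
interface AgentEntry {
  id: string;
  provider: "openclaw" | "local" | "openai" | "google" | "anthropic";
  model: string;
  base_url?: string; // only needed when overriding the provider's default endpoint
}

const agents: AgentEntry[] = [
  {
    id: "openclaw-agent",
    provider: "openclaw",
    model: "<your-openclaw-model>", // placeholder, not a real model name
    base_url: "http://127.0.0.1:18789/v1", // the default gateway address noted above
  },
  {
    id: "local-reviewer", // hypothetical id
    provider: "local",
    model: "llama3.1:8b",
  },
];

// Print the entries as JSON, under a hypothetical "agents" key.
console.log(JSON.stringify({ agents }, null, 2));
```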
Mixed setup (common production shape — Gemini cheap reviewers + Anthropic heavy implementers):
claude mcp add gossipcat -s user \
-e GOOGLE_API_KEY=AIza... \
-e ANTHROPIC_API_KEY=sk-ant-... \
-- gossipcat
Then set up a team with gemini-reviewer + haiku-researcher (native) + opus-implementer (native) + sonnet-reviewer (native). Gossipcat dispatches by category strength from the signal pipeline.
Keys are stored persistently and cross-platform (on Linux, via secret-tool).

Start a Claude Code session in any project and ask Claude to set up your team:
"Set up a gossipcat team with a Gemini reviewer and a Sonnet implementer"
Claude Code calls gossip_setup() to create your .gossip/config.json and agent definitions. You choose the providers, models, and roles — gossipcat adapts to your setup.
Available presets: reviewer, implementer, tester, researcher, debugger, architect, security, designer, planner, devops, documenter
The fastest path from "just installed" to "first useful review". If you skip this section you'll probably get stuck on the same things everyone else gets stuck on.
cd ~/your-project
claude
Gossipcat is registered globally now, so it boots automatically. You'll see it in the MCP server list.
In Claude Code, just type:
Run gossip_status
This loads gossipcat's operating rules into the current session, creates .gossip/ in your project on first run, and prints the dashboard URL + auth key. Copy the key — you'll paste it into the dashboard once.
You'll see something like:
Status:
Host: claude-code (native agents supported)
Relay: running :49664
Workers: 0
Dashboard: http://localhost:49664/dashboard (key: c3208820f8f70605fd45fa90004a2a4b)
Quota: google — OK
Open the dashboard URL in your browser, paste the key. You're now connected.
Tell Claude what you're building:
"Set up a gossipcat team for this project — it's a TypeScript Next.js app with a Postgres backend and Stripe payments."
Claude calls gossip_setup() and proposes a team. Typical proposal:
Proposed team:
- sonnet-reviewer (anthropic/claude-sonnet-4-6, native) reviewer + security
- gemini-reviewer (google/gemini-2.5-pro, relay) reviewer + types
- haiku-researcher (anthropic/claude-haiku-4-5, native) researcher
- opus-implementer (anthropic/claude-opus-4-6, native) implementer
Approve? (y/n)
Native agents (native: true) run through your existing Claude Code subscription — no API key needed. Relay agents need a key for their provider. If you don't have a Google API key, drop gemini-reviewer from the team for now and add it later.
Once you approve, gossipcat writes .gossip/config.json and the agents are live.
In a project where you've made some changes:
"Do a consensus review of my recent changes"
What happens (typical timing):
| Phase | Time | What you see |
|---|---|---|
| 1. Decompose | 1s | Claude picks agents and dispatches them in parallel |
| 2. Independent review | 30s–2min | Each agent reads your diff and reports findings |
| 3. Cross-review | 30s–1min | Each agent reviews the others' findings |
| 4. Consensus report | <1s | Findings tagged CONFIRMED / DISPUTED / UNVERIFIED / UNIQUE |
| 5. Verification | varies | Claude reads UNVERIFIED findings against the code, decides if they're real |
| 6. Signal recording | <1s | Accuracy signals saved per agent |
You get a report like:
Consensus round b81956b2-e0fa4ea4 — 3 agents
CONFIRMED (2):
[critical] Race condition in tasks Map at server.ts:47 — sonnet + gemini
[high] Missing auth on WebSocket upgrade at server.ts:112 — sonnet + gemini
UNIQUE (1):
[medium] String concat in SQL query at queries.ts:88 — only sonnet caught this
DISPUTED (1):
[low] "Memory leak in timer" — haiku says yes, sonnet/gemini say no
→ verified, sonnet was right (not a leak — cleanup is in finally)
Final: 3 real bugs to fix, 1 false alarm caught by cross-review.
You only act on CONFIRMED + verified UNIQUE findings. The cross-review is the whole point — single-agent reviews ship hallucinated bugs as critical findings 5–10% of the time. Cross-review with verification drops that to under 1%.
The dashboard shows everything live: agents, scores, active tasks, consensus reports, signals. You can leave it open in a tab while you work — every gossipcat tool call pushes an update via WebSocket.
That's the basic loop. The rest of this README covers advanced workflows, troubleshooting, and how to interpret what you're seeing.
Concrete recipes for the most common workflows. Each one shows what to type, what you'll get back, and what to do with it.
Type:
"Review my staged changes"
What you'll get: A consensus report (1–3 minutes) with findings tagged CONFIRMED / UNIQUE / DISPUTED. Claude verifies UNVERIFIED findings against the code and tells you which are real.
What to do with it: Fix the CONFIRMED + verified-real findings. Ignore disputed-but-falsified findings. If a finding looks important but you disagree, ask Claude "verify finding f3 against the code yourself" — it'll re-check and either back you up or push back.
When NOT to use it: Tiny diffs (under 20 lines) — overhead exceeds value. Just eyeball them.
Type:
"Security audit the payment handler at lib/stripe/webhook.ts"
What you'll get: Each security-skilled agent reviews from a different angle (OWASP, input validation, auth, secrets). Findings get cross-validated. Real vulns surface; theoretical ones get caught and dropped.
What to do with it: Fix critical/high findings before merge. Bookmark medium/low findings for the next pass.
Tip: Be specific about the file or module. "Security audit the codebase" is too broad and produces noisy results. "Security audit lib/stripe/webhook.ts" produces actionable findings.
Type:
"Research how the WebSocket connection lifecycle works in this project before I touch it"
What you'll get: A research agent (haiku-researcher by default — fast and cheap) reads the code, traces call paths, and writes a summary. The summary is saved to that agent's cognitive memory so the next time you ask about the same area it remembers.
What to do with it: Use the summary to plan your change. The agent will reference it next time you ask anything related — no re-discovery cost.
Type:
"I think there's a race condition in the tasks Map at server.ts:47 — check if I'm right"
What you'll get: Two agents independently check the specific claim and either confirm or push back. Author self-review is optimistic — this isn't.
What to do with it: If both agree with you, fix it. If they push back, read their reasoning before defending your hypothesis. They might be right.
Type:
"Show me agent scores"
What you'll get: A table of agents sorted by reliability with per-category accuracy and dispatch weights. Categories include trust_boundaries, injection_vectors, concurrency, error_handling, data_integrity, type_safety, etc.
What to do with it: If gemini-reviewer is sitting at 30% accuracy on concurrency, you know not to trust its concurrency findings without cross-review. If sonnet-reviewer is at 90% on trust_boundaries, you can ship its findings on auth/session bugs with high confidence.
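To make the idea of per-category routing concrete, here is a minimal sketch that picks the agent with the highest recorded accuracy for a category. It is not gossipcat's dispatch-weight formula, which this README does not specify; `AgentScore`, `pickAgent`, and the neutral 0.5 default for agents with no history are assumptions.

```typescript
// Hypothetical score record; the real scores also track uniqueness and impact.
interface AgentScore {
  agentId: string;
  accuracyByCategory: Record<string, number>; // 0..1 per category
}

// Pick the agent with the best accuracy in a category.
// Assumption: agents with no history in that category get a neutral default.
function pickAgent(
  agentScores: AgentScore[],
  category: string,
  defaultAccuracy: number = 0.5
): string | null {
  let best: string | null = null;
  let bestAccuracy = -1;
  for (const s of agentScores) {
    const known = Object.prototype.hasOwnProperty.call(s.accuracyByCategory, category);
    const accuracy = known ? s.accuracyByCategory[category] : defaultAccuracy;
    if (accuracy > bestAccuracy) {
      best = s.agentId;
      bestAccuracy = accuracy;
    }
  }
  return best;
}

const agentScores: AgentScore[] = [
  { agentId: "gemini-reviewer", accuracyByCategory: { concurrency: 0.3 } },
  { agentId: "sonnet-reviewer", accuracyByCategory: { trust_boundaries: 0.9 } },
];
console.log(pickAgent(agentScores, "trust_boundaries")); // prints "sonnet-reviewer"
console.log(pickAgent(agentScores, "concurrency"));      // prints "sonnet-reviewer"
```

In the second call, sonnet-reviewer wins on the neutral default alone, because gemini-reviewer's 30% concurrency record counts against it.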
Type:
"gemini-reviewer keeps hallucinating about concurrency — develop a skill for it"
What you'll get: Gossipcat reads gemini-reviewer's failure data, generates a targeted skill file with concrete anti-patterns, and injects it into the agent's prompt for all future concurrency-related reviews. Effectiveness is measured statistically (z-test on post-bind signals) — it'll tell you if the skill is actually working after ~30 dispatches.
What to do with it: Nothing — it's automatic. Just keep using the agent. Over time, the failure rate drops.
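The statistical check mentioned above can be illustrated with a standard two-proportion z-test comparing failure rates before and after a skill is bound. This README does not say which z-test variant or threshold gossipcat uses, so the pooled formula, the 1.96 critical value, and all names here are assumptions.

```typescript
type SkillVerdict = "passed" | "failed" | "inconclusive";

// Failure counts over a window of dispatches (hypothetical shape).
interface SignalWindow {
  failures: number;
  total: number;
}

// Pooled two-proportion z statistic; positive means the failure rate went down.
function twoProportionZ(before: SignalWindow, after: SignalWindow): number {
  const p1 = before.failures / before.total;
  const p2 = after.failures / after.total;
  const pooled = (before.failures + after.failures) / (before.total + after.total);
  const se = Math.sqrt(pooled * (1 - pooled) * (1 / before.total + 1 / after.total));
  return se === 0 ? 0 : (p1 - p2) / se;
}

// Assumption: a two-sided 5% threshold (|z| >= 1.96) decides the verdict.
function judgeSkill(
  before: SignalWindow,
  after: SignalWindow,
  zCritical: number = 1.96
): SkillVerdict {
  const z = twoProportionZ(before, after);
  if (z >= zCritical) return "passed";
  if (z <= -zCritical) return "failed";
  return "inconclusive";
}

// 12 failures in 30 dispatches before the skill, 3 in 30 after.
const zExample = twoProportionZ({ failures: 12, total: 30 }, { failures: 3, total: 30 });
console.log(zExample.toFixed(2)); // prints "2.68"
console.log(judgeSkill({ failures: 12, total: 30 }, { failures: 3, total: 30 })); // prints "passed"
```

With small windows the statistic rarely clears the threshold, which is one reason a verdict may stay inconclusive until enough post-bind dispatches have accumulated.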
Type:
"Set up a gossipcat team for a TypeScript Cloudflare Workers project with Drizzle ORM and KV storage"
What you'll get: A proposed team with archetypes matched to your stack. Worker projects need different reviewers than long-running Node services — gossipcat picks accordingly.
What to do with it: Review the proposal, drop agents you can't run (missing API keys), approve.
gossip_run with one agent is fine for "what does this function do?"-tier queries. Save consensus for changes that touch shared state, auth, persistence, or the dispatch pipeline itself.

The dashboard at http://localhost:<port>/dashboard is the visual layer over everything gossipcat knows. Open it once with the auth key from gossip_status, then leave the tab open while you work. Updates push live via WebSocket.
Skill graduation grid — each card is one (skill × agent) pair: post-bind effectiveness curve over a 7-day window with current value vs threshold and ±pp drift on graduated skills.
| Panel | What it shows | When to look at it |
|---|---|---|
| Overview | Active agents, dispatch weights, recent finding counts | First thing in the morning — quick sanity check |
| Team | All agents sorted by reliability score, with category breakdowns | Picking which agent to trust for a tricky finding |
| Tasks | Live + historical task list with agent, duration, status | When something feels stuck — find it here first |
| Findings | Consensus reports paginated by round, with CONFIRMED/DISPUTED/UNVERIFIED breakdowns | Reviewing what got caught in a recent review |
| Agent detail | Per-agent memory entries, skills, score history, task history | Diagnosing why a specific agent keeps failing in a category |
| Signals | Raw signal feed (agreement / hallucination / unique_confirmed) | Auditing the scoring pipeline if scores look wrong |
| Logs | mcp.log content (boot, errors, warnings) | When the MCP server is misbehaving and you need raw evidence |
Auth keys rotate every session. A fresh key is generated each time gossipcat boots. If the dashboard says "unauthorized", run gossip_status again to get the new key.
The auth key rotates every boot. Run gossip_status in Claude Code to get the current key, paste it into the dashboard login.
Check ~/.gossip/mcp.log (or <your-project>/.gossip/mcp.log) for the boot log. Look for the [gossipcat] 🌐 Dashboard: line — that's the actual port. If it's missing, the relay didn't start. Common causes:
- .gossip/relay.pid from a crashed previous boot: delete it and restart Claude Code
- GOSSIPCAT_PORT set to a port already in use: unset the env var or pick a free port

This was a critical bug in v0.1.0, fixed in v0.1.1. Upgrade with the install one-liner above. v0.1.1+ boots in degraded mode (dashboard + relay only) so you can run gossip_setup from inside Claude Code.
Usually a model or quota problem. Check gossip_status — it shows Quota: google — OK (or cooling down) per provider. If you're rate-limited, gossipcat will fall back to native agents automatically, but fallback agents may not be in your team. Either wait for the cooldown or add native agents to your team.
Record a hallucination_caught signal: ask Claude "record a hallucination_caught signal for finding f3 in the last consensus round — it claimed X but the code shows Y". After 3 such signals, the offending agent's score drops in that category and the orchestrator stops asking it questions in that area.
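The "after 3 such signals" rule amounts to a per-agent, per-category counter. The sketch below shows that bookkeeping under stated assumptions: the `HallucinationRecord` shape and `pairsOverThreshold` are hypothetical, and the real scoring pipeline presumably weighs more than a raw count.

```typescript
// Hypothetical record of one hallucination_caught signal.
interface HallucinationRecord {
  agentId: string;
  category: string;
}

// Return "agent:category" keys that reached the threshold (3, per the text above).
function pairsOverThreshold(records: HallucinationRecord[], threshold: number = 3): string[] {
  const counts: Record<string, number> = {};
  for (const r of records) {
    const key = r.agentId + ":" + r.category;
    counts[key] = (counts[key] || 0) + 1;
  }
  return Object.keys(counts)
    .filter((key) => counts[key] >= threshold)
    .sort();
}

const recorded: HallucinationRecord[] = [
  { agentId: "gemini-reviewer", category: "concurrency" },
  { agentId: "gemini-reviewer", category: "concurrency" },
  { agentId: "sonnet-reviewer", category: "type_safety" },
  { agentId: "gemini-reviewer", category: "concurrency" },
];
console.log(pairsOverThreshold(recorded)); // prints [ 'gemini-reviewer:concurrency' ]
```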
Edit .gossip/config.json directly. Any OpenAI-compatible endpoint works via provider: "openai" + base_url. Local models work via Ollama (provider: "local"). See the Configuration section.
The strict <agent_finding> parser drops tags whose type isn't one of finding | suggestion | insight (see invariant #8 in docs/HANDBOOK.md). When that happens, the gossip_signals receipt surfaces the drop count and a finding_dropped_format pipeline signal is emitted. Check the consensus round's droppedFindingsByType field on the dashboard, which names the offending type. If you see &lt;agent_finding&gt; instead of raw <agent_finding>, a transport layer is entity-encoding the output; pass agent output verbatim to gossip_relay.
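To show why an unexpected type or entity-encoded output silently loses findings, here is a simplified stand-in for a strict tag parser. It is not gossipcat's parser: the `type="..."` attribute syntax, the regular expression, and `parseAgentFindings` are assumptions; only the three allowed types and the droppedFindingsByType name come from the text above.

```typescript
const ALLOWED_TYPES: string[] = ["finding", "suggestion", "insight"];

interface ParsedFindings {
  kept: { type: string; body: string }[];
  droppedFindingsByType: Record<string, number>;
}

// Assumption: findings look like <agent_finding type="finding">...</agent_finding>.
function parseAgentFindings(output: string): ParsedFindings {
  const tagPattern = /<agent_finding\s+type="([^"]*)"[^>]*>([\s\S]*?)<\/agent_finding>/g;
  const parsed: ParsedFindings = { kept: [], droppedFindingsByType: {} };
  let m: RegExpExecArray | null;
  while ((m = tagPattern.exec(output)) !== null) {
    const type = m[1];
    if (ALLOWED_TYPES.indexOf(type) !== -1) {
      parsed.kept.push({ type, body: m[2].trim() });
    } else {
      // Count the drop under its offending type so it can be reported.
      parsed.droppedFindingsByType[type] = (parsed.droppedFindingsByType[type] || 0) + 1;
    }
  }
  return parsed;
}

const rawOutput =
  '<agent_finding type="finding">Race at server.ts:47</agent_finding>' +
  '<agent_finding type="bug">Leak at timer.ts:9</agent_finding>';
console.log(parseAgentFindings(rawOutput));

// Entity-encoded output contains no literal "<agent_finding", so nothing matches at all.
const encodedOutput = '&lt;agent_finding type="finding"&gt;Race&lt;/agent_finding&gt;';
console.log(parseAgentFindings(encodedOutput).kept.length); // prints 0
```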
Already supported as of v0.1.1 — each instance gets its own dynamic port. If you want a stable port for one specific instance (e.g. for browser bookmarks), set GOSSIPCAT_PORT=24420 for that one project's environment.
npm uninstall -g gossipcat
claude mcp remove gossipcat -s user
rm -rf ~/.gossip # if you want to wipe global memory + signals
rm -rf <project>/.gossip # if you want to wipe per-project state
File an issue at https://github.com/gossipcat-ai/gossipcat-ai/issues. Include the contents of .gossip/mcp.log (last 100 lines) and the output of gossip_status. Or ask Claude in-session: "file a gossipcat bug report about <...>" — the gossip_bug_feedback tool packages it up automatically.
Claude Code translates your natural-language requests into gossipcat MCP tool calls automatically — you don't need to type these — but if you want fine-grained control they're documented here:
gossip_run(agent_id: "auto", task: "...") → single-agent task
gossip_dispatch(mode: "consensus", tasks: [...]) → multi-agent review with cross-review
gossip_collect(consensus: true) → wait for results, run consensus
gossip_signals(action: "record", signals: [...]) → record accuracy after verification
gossip_scores() → view agent performance
gossip_skills(action: "develop", ...) → improve a struggling agent
gossip_status() → system status + dashboard URL
gossip_setup(...) → create or update your team
The dispatch rules at .claude/rules/gossipcat.md (auto-generated on first boot) teach Claude Code when to pick which mode based on what your change touches. You can edit these rules to bias the dispatch.
Currently supported orchestrator: Claude Code only. Cursor, Windsurf, and VS Code agents do not yet have native agent support and cannot run consensus cross-review. If you are reading this as a Claude Code instance with the gossipcat MCP server connected, this guide applies to you.
If you are a Claude Code instance reading this to understand how to use gossipcat, here is the complete operational guide.
You are the orchestrator. Gossipcat gives you tools to dispatch work to a team of specialized agents, collect their results, run cross-review consensus, and record accuracy signals. You do not implement code directly — you coordinate.
gossip_status()
This tells you: which agents are available, their providers/models, relay status, dashboard URL, and suggested workflows. Always call this first in a new session.
Single agent (auto-selected):
gossip_run(agent_id: "auto", task: "Review the auth module for security issues")
The system picks the best agent for the task based on past accuracy. For relay agents, this returns the result directly. For native agents (Claude Code subagents), it returns ⚠️ EXECUTE NOW instructions — follow them immediately in the same response.
Parallel (fan-out, no consensus):
gossip_dispatch(mode: "parallel", tasks: [
{ agent_id: "gemini-reviewer", task: "..." },
{ agent_id: "sonnet-reviewer", task: "..." }
])
Consensus (cross-review):
gossip_dispatch(mode: "consensus", tasks: [
{ agent_id: "gemini-reviewer", task: "..." },
{ agent_id: "sonnet-reviewer", task: "..." },
{ agent_id: "haiku-researcher", task: "..." }
])
gossip_collect(task_ids: ["id1", "id2", "id3"], consensus: true)
With consensus: true, agents cross-review each other's findings. If native agents are in the round, gossip_collect returns ⚠️ EXECUTE NOW with prompts — dispatch those Agent() calls immediately, then relay each result via gossip_relay_cross_review.
After consensus, verify every UNVERIFIED finding against the actual code (grep/read the cited files). Then record signals:
gossip_signals(action: "record", signals: [{
signal: "unique_confirmed", // or "hallucination_caught", "agreement"
agent_id: "gemini-reviewer",
finding: "Race condition in task map at line 47",
finding_id: "<consensus_id>:<agent_id>:f1" // mandatory
}])
Signals update dispatch weights. Agents that hallucinate get penalized. Agents that catch real bugs get promoted.
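Because finding_id follows the fixed `<consensus_id>:<agent_id>:fN` pattern, it can be built and sanity-checked with a couple of small helpers. These are hypothetical conveniences for illustration, not part of gossipcat's API, and they assume that neither the consensus ID nor the agent ID contains a colon.

```typescript
interface FindingId {
  consensusId: string;
  agentId: string;
  index: number; // the N in fN
}

// Build an ID in the <consensus_id>:<agent_id>:fN format.
function formatFindingId(consensusId: string, agentId: string, index: number): string {
  return consensusId + ":" + agentId + ":f" + index;
}

// Parse an ID back into its parts, or return null if it doesn't match the format.
function parseFindingId(id: string): FindingId | null {
  const m = /^([^:]+):([^:]+):f(\d+)$/.exec(id);
  if (!m) return null;
  return { consensusId: m[1], agentId: m[2], index: parseInt(m[3], 10) };
}

const exampleId = formatFindingId("b81956b2-e0fa4ea4", "gemini-reviewer", 1);
console.log(exampleId); // prints "b81956b2-e0fa4ea4:gemini-reviewer:f1"
console.log(parseFindingId("gemini-reviewer:f1")); // prints null (consensus ID missing)
```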
- finding_id is mandatory on every signal. Format: <consensus_id>:<agent_id>:fN.
- gossip_progress after reconnect: if a consensus round was in flight, it re-surfaces the pending EXECUTE NOW prompts.

Use gossip_dispatch(mode: "consensus") when the change touches: shared mutable state, auth/sessions, file persistence, or the core dispatch pipeline. Use gossip_run for single-agent research, exploration, or review tasks that don't need cross-validation.
These tools are called by the internal LLM (the orchestrator — Claude Code with gossipcat MCP). You don't invoke them manually; the orchestrator selects and calls them based on your requests.
| Tool | Purpose |
|---|---|
| gossip_status | System status, dashboard URL, agent list |
| gossip_run | Single-agent dispatch with auto agent selection |
| gossip_dispatch | Multi-agent dispatch: single, parallel, or consensus |
| gossip_collect | Collect results with optional cross-review synthesis |
| gossip_relay | Feed native agent results back into the pipeline |
| gossip_relay_cross_review | Feed native cross-review results into consensus |
| gossip_plan | Decompose task into sub-tasks with agent assignments |
| gossip_signals | Record or retract accuracy signals |
| gossip_scores | View agent accuracy, uniqueness, and dispatch weights |
| gossip_skills | Develop, bind, unbind, or list per-agent skills |
| gossip_setup | Create or update agent team |
| gossip_session_save | Save session context for next session |
| gossip_remember | Search an agent's cognitive memory |
| gossip_progress | Check in-progress task status |
| gossip_watch | Stream signals as agents emit them, between dispatch and collect (catches pipeline drops mid-round) |
| gossip_verify_memory | Verify a memory claim against current code (FRESH / STALE / CONTRADICTED / INCONCLUSIVE) before acting on backlog items |
| gossip_reload | Self-terminate the MCP process so Claude Code respawns with a fresh bundle (dev loop after code changes) |
| gossip_tools | List all available tools |
| gossip_update | Check for or apply gossipcat updates from npm |
| gossip_bug_feedback | File a GitHub issue on the gossipcat repo from an in-session bug report |
User-facing dashboard guide is in Reading the dashboard above. This section covers the build + tech stack.
Built with React + Vite + shadcn/ui. Source lives at packages/dashboard-v2/. The bundled assets ship in dist-dashboard/ and the relay serves them as static files at http://localhost:<dynamic-port>/dashboard/. Live updates push via WebSocket — every gossipcat tool call emits an event that connected dashboard tabs receive in real time.
To rebuild from source (contributors only):
npm run build:dashboard
gossipcat/
  apps/
    cli/            MCP server, native agent bridge, boot sequence
  packages/
    orchestrator/   Dispatch pipeline, consensus engine, memory, skills,
                    performance scoring, task graph, prompt assembly
    relay/          WebSocket relay server, dashboard REST/WS API
    dashboard-v2/   React + Vite frontend (Terminal Amber theme)
    client/         Lightweight WebSocket client for relay connections
    tools/          File/shell/git tool implementations for worker agents
    types/          Shared TypeScript types and message protocol
Gossipcat supports OpenClaw as a provider gateway. OpenClaw runs locally and exposes an OpenAI-compatible HTTP API — gossipcat talks to it like any other relay agent, with your stored gateway token and a separate quota slot so OpenClaw rate limits never bleed into your OpenAI agents.
Store your gateway token once (macOS):
security add-generic-password -s gossip-mesh -a openclaw -w <your-gateway-token>
On Linux:
secret-tool store --label "Gossip Mesh openclaw" service gossip-mesh provider openclaw
# (enter token when prompted)
Then add it to your team:
"Add an OpenClaw reviewer to my team"
Or directly via gossip_setup:
gossip_setup(mode: "merge", agents: [{
  id: "openclaw-agent",
  type: "custom",
  provider: "openclaw",
  custom_model: "openclaw/default",
  role: "reviewer",
  skills: ["code_review", "typescript"]
}])
The gateway runs at http://127.0.0.1:18789/v1 by default. Override with base_url if yours is on a different port. Available models: openclaw, openclaw/default, openclaw/main.
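Because the gateway is described as OpenAI-compatible, a request to it can be sketched as below. This is a minimal illustration, not gossipcat's client code: `buildChatRequest` is a hypothetical helper, and it assumes the gateway serves the conventional OpenAI-style `POST {base_url}/chat/completions` route with a bearer token.

```typescript
// Sketch only. Assumes the standard OpenAI-style chat completions route;
// the helper name and its parameters are illustrative.
interface ChatRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

const DEFAULT_BASE_URL = "http://127.0.0.1:18789/v1";

function buildChatRequest(
  token: string,
  prompt: string,
  model: string = "openclaw/default",
  baseUrl: string = DEFAULT_BASE_URL,
): ChatRequest {
  // Tolerate a trailing slash in a user-supplied base_url.
  const root = baseUrl.replace(/\/+$/, "");
  return {
    url: `${root}/chat/completions`,
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${token}`,
    },
    body: JSON.stringify({
      model,
      messages: [{ role: "user", content: prompt }],
    }),
  };
}
```

The function only builds the request, so it can be inspected or logged before anything is sent; the returned fields line up with what `fetch(url, { method, headers, body })` expects.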
Once added, the agent participates in consensus rounds, accumulates accuracy signals, and gets skill files generated from its failure patterns — same as any other agent in the mesh.
Config is searched in order: .gossip/config.json > gossip.agents.json > gossip.agents.yaml.
{
  "main_agent": {
    "provider": "google",
    "model": "gemini-2.5-pro"
  },
  "utility_model": {
    "provider": "native",
    "model": "haiku"
  },
  "consensus_judge": {
    "provider": "anthropic",
    "model": "claude-sonnet-4-6",
    "native": true
  },
  "agents": {
    "sonnet-reviewer": {
      "provider": "anthropic",
      "model": "claude-sonnet-4-6",
      "preset": "reviewer",
      "skills": ["code_review", "security_audit", "typescript"],
      "native": true
    }
  }
}
| Field | Description |
|---|---|
| `main_agent` | Internal tool LLM for routing, planning, and synthesis |
| `utility_model` | Memory compaction, gossip, lens generation |
| `consensus_judge` | Model for cross-review synthesis |
| `agents.<id>.provider` | anthropic, google, openai, openclaw, local |
| `agents.<id>.base_url` | Custom endpoint for openai/openclaw (e.g. http://127.0.0.1:18789/v1) |
| `agents.<id>.native` | true = runs via Claude Code Agent(), no API key |
| `agents.<id>.preset` | reviewer, implementer, tester, researcher, debugger, architect, security, designer, planner, devops, documenter |
| `agents.<id>.skills` | Skill labels for dispatch matching |
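The config search order stated earlier can be sketched as a first-match lookup. This assumes the first file found wins and later candidates are ignored. `resolveConfigPath` and the injected `exists` predicate are hypothetical, chosen so the example needs no filesystem imports.

```typescript
// Candidate paths in the priority order the README gives.
const CONFIG_CANDIDATES = [
  ".gossip/config.json",
  "gossip.agents.json",
  "gossip.agents.yaml",
] as const;

// `exists` is injected to keep the sketch self-contained; in Node it could be
// backed by fs.existsSync (an assumption, not gossipcat's actual loader).
function resolveConfigPath(exists: (path: string) => boolean): string | null {
  for (const candidate of CONFIG_CANDIDATES) {
    if (exists(candidate)) return candidate;
  }
  return null; // no config file found
}
```

Under that assumption, a project containing both gossip.agents.json and gossip.agents.yaml would load the JSON file.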
Gossipcat auto-detects the host environment:
| Host | Native agents | Rules file |
|---|---|---|
| Claude Code | Yes | .claude/rules/gossipcat.md |
| Cursor | No | .cursor/rules/gossipcat.mdc |
| Windsurf | No | .windsurfrules |
| VS Code | No | — |
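The host table above maps naturally onto a lookup. The sketch below only restates the table as data: the `Host` identifiers and `profileFor` are hypothetical, and how gossipcat actually detects the host is not shown here.

```typescript
// Hypothetical host identifiers; the values mirror the table above.
type Host = "claude-code" | "cursor" | "windsurf" | "vscode";

interface HostProfile {
  nativeAgents: boolean;
  rulesFile: string | null; // null where the table lists no rules file
}

const HOST_PROFILES: Record<Host, HostProfile> = {
  "claude-code": { nativeAgents: true, rulesFile: ".claude/rules/gossipcat.md" },
  cursor: { nativeAgents: false, rulesFile: ".cursor/rules/gossipcat.mdc" },
  windsurf: { nativeAgents: false, rulesFile: ".windsurfrules" },
  vscode: { nativeAgents: false, rulesFile: null },
};

function profileFor(host: Host): HostProfile {
  return HOST_PROFILES[host];
}
```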
| Feature | Status |
|---|---|
| Consensus code review | ✅ Shipped |
| Adaptive dispatch weights | ✅ Shipped |
| Per-agent skill development | ✅ Shipped |
| Agent cognitive memory | ✅ Shipped |
| Live dashboard | ✅ Shipped |
| Cross-platform key storage | ✅ Shipped |
| OpenAI-compatible gateway support (`base_url`) | ✅ Shipped |
| OpenClaw provider integration 🦞 | ✅ Shipped |
| Local LLM support (Ollama) | ✅ Shipped |
| Statistical skill effectiveness (z-test on per-category accuracy, auto pass/fail verdicts) | ✅ Shipped |
| Native subagents get skill injection + cognitive memory recall | ✅ Shipped |
| Relay cross-reviewers get `file_read` + `file_grep` (closes tool-blindness gap with natives) | ✅ Shipped |
| Worktree-aware consensus (`resolutionRoots` + auto-discover for feature-branch reviews) | ✅ Shipped |
| Signal pipeline observability (format-drop receipts + `finding_dropped_format` meta-signal + `gossip_watch` stream) | ✅ Shipped |
| Consensus round retraction (`gossip_signals` action: retract with tombstones) | ✅ Shipped |
| Worktree sandbox hardening (Layer 1+2+3 boundary enforcement + rotated audit log) | ✅ Shipped |
| In-session bundle hot-swap (`gossip_reload`) | ✅ Shipped |
| npm package — one-liner install with bundled MCP server + dashboard | ✅ Shipped |
| Full implementation workflow (agents write code with scoped + worktree isolation) | ✅ Shipped |
| Dashboard enrichment (graphs, trends, session history) | ☐ Planned |
| Local Postgres migration (embedded Postgres for tasks/signals/consensus/memory — unblocks full task results, real queries, no more JSONL scans) | ☐ Planned |
| Full Cursor support | ☐ Planned |
| Windsurf / VS Code parity | ☐ Planned |
| Standalone CLI (no IDE required) | ☐ Planned |
| CLI parity with MCP pipeline (gossip, task graph, agent memory in chat mode) | ☐ Planned |
Gossipcat is open source and early-stage — bug reports, feature ideas, and PRs are all welcome.
The gossip_bug_feedback tool posts structured issues from your current session.
Pull requests target master. Run npm test before pushing. Commit messages follow conventional commits (fix:, feat:, chore:, docs:).
See CLAUDE.md in the repo for the operational rules gossipcat's own agents follow during development. It's a useful read if you want to understand the signal pipeline and consensus workflow from the inside.
Releases go to GitHub Releases via a two-stage script that respects branch protection — no direct commits to master.
# Stage 1 — open the version bump PR
./scripts/release.sh 0.1.2
# review + merge the PR via gh or web UI
gh pr merge <pr-number> --squash --delete-branch
# Stage 2 — build, tag, release (from master, after the PR is merged)
git checkout master && git pull
./scripts/release.sh # no args
Stage 1 creates chore/release-X.Y.Z, bumps package.json, opens the PR, exits. Stage 2 reads the version from package.json, builds the MCP bundle + dashboard, packs the tarball, tags, pushes the tag, and creates the GitHub release with auto-generated notes from commits since the last tag.
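The two-stage behavior can be modeled as a small decision function. This is a sketch of the logic as described, not the contents of scripts/release.sh: `planRelease` is a hypothetical name, and the strict X.Y.Z check is an assumption.

```typescript
// Stage 1 runs when a version argument is given; stage 2 runs with no args
// and takes the version from package.json.
type ReleasePlan =
  | { stage: 1; version: string; branch: string }
  | { stage: 2; version: string };

const VERSION = /^\d+\.\d+\.\d+$/; // assumed X.Y.Z shape

function planRelease(args: string[], packageJsonVersion: string): ReleasePlan {
  const version = args[0];
  if (version === undefined) {
    // No argument: build, tag, and release the version already merged.
    return { stage: 2, version: packageJsonVersion };
  }
  if (!VERSION.test(version)) {
    throw new Error(`expected a version like X.Y.Z, got "${version}"`);
  }
  // Argument given: open the version bump PR from a release branch.
  return { stage: 1, version, branch: `chore/release-${version}` };
}
```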