API-first MCP server for multi-model cross-review with unanimous convergence gates.
MCP server orchestrating API-first cross-review between Claude, ChatGPT Codex, Gemini, DeepSeek, and Grok with unanimous convergence gates.
status: stable · npm · runtime: API-only · security: CodeQL Default Setup · license: Apache 2.0
Install.
npm install -g @lcv-ideas-software/cross-review-v2
# or using the GitHub Packages mirror:
npm install -g @lcv-ideas-software/cross-review-v2 --registry=https://npm.pkg.github.com
Status. Stable. Current release: v03.04.00 (npm package 3.4.0). See
CHANGELOG.md for the release history.
The version history at a glance:
| Release | Scope |
|---|---|
v03.04.00 |
Minor — Perplexity multi-failure-mode close-out 2026-05-13: 3 coordinated fixes covering 7 production sessions Codex flagged (51973fac, f72e597a, f9a19401, 99d46a2b, 00d92cce, 59776026, 0003b2fe). Fix #1 — streaming-path strip parity (P0, surgical 2-line edit in src/peers/perplexity.ts:~409/~504): the v3.2.0 stripPerplexityThinkingBlock fix was applied only inside sonarText(response) (non-streaming path at :~426/~521). Production server_info.streaming.tokens=true is the default, so virtually every Perplexity call traversed the streaming branches, which used raw stream_buffer.text() and bypassed the strip entirely. <think>...</think> preambles from sonar-reasoning-pro / sonar-deep-research reached the status parser, producing unparseable_after_recovery despite valid trailing JSON. v3.4.0 wraps stream_buffer.text() with stripPerplexityThinkingBlock(...) at both streaming sites, restoring parity. Forensic evidence: sess f9a19401 (v3.3.0 self-investigation) — 4 peers converged READY on the exact diagnosis; Perplexity ready_rate=0.28125 (9/32) vs ~1.0 for other peers. Fix #2 — anti-meta-audit lock (P1, prompt clause + heuristic detector): sess 51973fac shipped a checklist of `MISSING: diff hunk` placeholders plus sections titled Evidence Gap / Validation Claims (NARRATIVE) / Peer Review Readiness Blockers instead of refining the artifact. leadShipModeDirective() gains an ## Anti-Meta-Audit Lock (HARD) clause; new exported detectMetaAuditFabrication(text) in src/core/orchestrator.ts flags placeholder + section anti-patterns with a double-bar threshold (placeholders ≥ 3) OR (sections ≥ 1 AND placeholders ≥ 2) for false-positive resistance. Reuses the shared consecutiveLeadDrifts counter (cap=2); new event session.lead_meta_audit_fabrication_detected + finalize reason lead_meta_audit_repeated.
Fix #3 — reviewer proportionality (P2, prompt only): sess 0003b2fe — Perplexity reviewer demanded separate session_attach_evidence of the same rg scan output the caller had narrated inline, blocking convergence over rounds. sessionContractDirectives() gains item 5 scoped tightly to pure config/script/text static-scan reviews; runtime work (build/test/deploy/migration/network) still requires raw output; "when in doubt, prefer asking for evidence" preserves rigor default. 3 new smoke markers (perplexity_streaming_strip_parity_test, meta_audit_fabrication_detection_test, proportionality_guidance_test). 100% backward-compatible additive: new exported helper, new event type, new finalize reason; tool schema unchanged. Minor bump (3.3.0 → 3.4.0; Y-component increment per SemVer) — additive public surface is the reason; behavior change for callers passing valid args is pure failure-mode prevention. |
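The double-bar threshold in Fix #2 can be sketched as follows. This is an illustrative simplification, not the shipped detector: the real heuristics in src/core/orchestrator.ts are richer, and the exact placeholder/section regexes here are assumptions.

```typescript
// Hypothetical simplification of detectMetaAuditFabrication: count
// checklist-placeholder and meta-audit-section anti-patterns, then apply
// (placeholders >= 3) OR (sections >= 1 AND placeholders >= 2).

interface MetaAuditSignals {
  fabricated: boolean;
  placeholder_count: number;
  section_count: number;
}

// Assumed anti-pattern shapes, derived from the description above.
const PLACEHOLDER_RE = /MISSING:\s*diff hunk/gi;
const SECTION_RE =
  /^#+\s*(Evidence Gap|Validation Claims|Peer Review Readiness Blockers)\b/gim;

export function detectMetaAuditFabrication(text: string): MetaAuditSignals {
  const placeholder_count = (text.match(PLACEHOLDER_RE) ?? []).length;
  const section_count = (text.match(SECTION_RE) ?? []).length;
  const fabricated =
    placeholder_count >= 3 || (section_count >= 1 && placeholder_count >= 2);
  return { fabricated, placeholder_count, section_count };
}
```

The double bar is a false-positive guard: a lone placeholder or a lone section heading never trips the detector on its own.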
v03.03.00 |
Minor — Caller peer-selection lock (operator directive 2026-05-12: "ALL AGENTS/PEERS ALWAYS PARTICIPATE, REGARDLESS OF THE CALLER'S CHOICE OR WILL"). Closes the systematic gaming pattern where peer callers (notably Codex, observed across multiple sessions) selectively excluded other peers from their own cross-review panels via curated peers: [...] lists or pinned a sympathetic relator via lead_peer. Lock surface: peers is locked for ALL callers (including operator) — reviewer panel is ALWAYS the full server-configured peer_enabled set; operators tune via env vars (`CROSS_REVIEW_V2_PEER_
v03.02.00 |
Patch — Codex bug-report close-out 2026-05-12: three surgical fixes (Perplexity <think> parser + session-state invariant + orchestrator strict peers). Fix #1 (src/peers/perplexity.ts): sonar-reasoning-pro / sonar-deep-research emit a <think>...</think> reasoning preamble before structured JSON; pre-v3.2.0 the parser fed that raw string into the format-recovery pipeline, which failed unparseable_after_recovery even when the trailing JSON was valid READY. New PERPLEXITY_THINKING_BLOCK regex + exported stripPerplexityThinkingBlock() helper; sonarText() now strips before returning. Closes the long-standing blocker that forced v3.0.0/v3.1.0 to self-bypass HARD GATE. Fix #2 (src/core/session-store.ts): closes session-state corruption observed in session 41244a1c-e7e8-439a-a59e-9339f7c7175d (R1-R3 didn't converge, R4 finalized as converged, R5+R6 ran on top and clobbered convergence_health back to "blocked", leaving meta with outcome="converged" / health.state="blocked"). finalize() now validates outcome="converged" against the latest round's convergence.converged (throws code: "session_finalize_outcome_mismatch"); appendRound() refuses to append to a finalized session (code: "session_already_finalized"); new public assertNotFinalized() helper wired into askPeers + runUntilUnanimous entry points so the round fails fast instead of after burning budget. Fix #3 (src/core/orchestrator.ts): when the caller passes an explicit peers: [...] list, autowire judges are intersected with the explicit list — both the consensus and single-peer paths. Observed in session 73036fbb where peers=[codex,gemini,deepseek,grok] but autowire still invoked perplexity as judge. New hadExplicitPeers flag + judgeRespectsExplicitPeers() helper; skipped sessions emit session.evidence_judge_pass.autowire_skipped with skipped_for_explicit_peers: true + session_explicit_peers: [...] for operator audit. 
3 new smoke markers (perplexity_thinking_block_strip_test 7 scenarios + 3 pins; session_finalize_state_invariant_test 5 scenarios + 1 pin; orchestrator_strict_peer_panel_test 5 source pins). Smoke harness completes ok: true / events: 99. Patch bump (additive — new exports + new error codes; pre-existing anti-patterns now reject loudly instead of corrupting state). The cross-review-v2-attachment-inline-test smoke fixture was updated to caller_status: "NOT_READY" so R1 doesn't auto-converge under stub mode. |
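The v3.2.0 Fix #1 strip can be sketched as a single anchored regex. This is a minimal illustration of the behavior described above; the shipped helper lives in src/peers/perplexity.ts and its exact pattern may differ.

```typescript
// sonar-reasoning-pro / sonar-deep-research prepend a <think>...</think>
// reasoning block before the structured JSON. Removing the preamble lets the
// trailing JSON reach the status parser intact instead of failing
// unparseable_after_recovery.
const PERPLEXITY_THINKING_BLOCK = /^\s*<think>[\s\S]*?<\/think>\s*/;

export function stripPerplexityThinkingBlock(raw: string): string {
  return raw.replace(PERPLEXITY_THINKING_BLOCK, "");
}
```

The v3.4.0 parity fix then amounts to applying the same helper on the streaming path, i.e. stripping the buffered stream text before it reaches the parser, rather than only inside the non-streaming sonarText() path.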
v03.00.00 |
Major — Perplexity joins the panel: quintet (5 peers) → sextet (6). Operator directive 2026-05-12. New PerplexityAdapter at https://api.perplexity.ai (Sonar API, OpenAI-Chat-Completions-compatible; reuses shared loadOpenAICtor lazy SDK helper). 5 architectural traits handled explicitly: (1) web search is the DEFAULT per call — peer becomes fact-check overlay; (2) system prompt is half-honored (search component does not attend to it); (3) reasoning_effort enum is `minimal
v02.28.00 |
Minor — Cold-start hardening Part 3: Windows registry env-var lookup bulk-cached (3-7 s → ~100 ms). Empirical profile revealed the real boot bottleneck on Windows: loadConfig() consuming 3.1-7.0 s because readWindowsRegistryEnv(name) fired reg query <root> /v NAME per missing env var × 2 scopes (HKCU + HKLM). With ~140 config vars and partial process.env, this burned 3-7 s dwarfing every other boot cost. v2.27.0 + v2.27.1 attacked SDK imports + sweeps (~340 ms) — a side concern. v2.28.0 fix: single bulk reg query <root> at first miss populates a Map<string,string> module cache; readWindowsRegistryEnv becomes a pure cache.get(name). Cost: O(1 + 2 registry reads) instead of O(N missing × 2 spawns). Empirical handshake (3 trials each): v2.27.1 3.18 / 3.12 / 3.14 s → v2.28.0 0.37 / 0.37 / 0.38 s = 8.4× speedup. loadConfig() alone: 3,307 ms → 87 ms (38×). Cold-start now well below Claude Code's spawn timeout. New smoke windows_registry_env_bulk_cache_test (7-class assertion pinning Map cache + bulk loader + canonical reg query <root> shape + negative invariant on per-var /v NAME + escapeRegExp absence + thin lookup + dist parity). Public surface 100% backward-compatible. Self-review BYPASSED per feedback_cross_review_self_repair_exception.md (gate-fixing-itself, third installment). Minor bump — internal behavior change with measurable 8.4× runtime impact. |
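The bulk-cache idea above can be sketched like this: one `reg query` spawn populates a Map, and every later lookup is a pure `cache.get()`. The `reg query` output format assumed here (indented `NAME  REG_SZ  value` lines) and the HKCU-only scope are illustrative; the shipped reader also covers HKLM.

```typescript
import { spawnSync } from "node:child_process";

const cache = new Map<string, string>();
let loaded = false;

// Pure, testable parser for one registry-listing blob (assumed reg.exe shape).
export function parseRegOutput(stdout: string): Map<string, string> {
  const out = new Map<string, string>();
  for (const line of stdout.split(/\r?\n/)) {
    const m = /^\s+(\S+)\s+REG_(?:EXPAND_)?SZ\s+(.+)$/.exec(line);
    if (m) out.set(m[1], m[2]);
  }
  return out;
}

export function readWindowsRegistryEnv(name: string): string | undefined {
  if (!loaded) {
    loaded = true; // one bulk spawn at first miss, never again
    if (process.platform === "win32") {
      const res = spawnSync("reg", ["query", "HKCU\\Environment"], { encoding: "utf8" });
      if (res.status === 0) {
        for (const [k, v] of parseRegOutput(res.stdout)) cache.set(k, v);
      }
    }
  }
  return cache.get(name); // O(1) after the single bulk read
}
```

This is the O(1 + bulk reads) shape the entry describes, versus one `reg query ... /v NAME` spawn per missing variable per scope.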
v02.27.01 |
Patch — Cold-start hardening Part 2: lazy-load 5 provider SDKs + defer 6 startup sweeps to setTimeout(30s). Completes the cold-start fix started in v2.27.0. Empirical motivation 2026-05-12: cross-review-v2 failed to register tools in a Claude Code session while the other 5 MCP hosts (Codex CLI / Gemini Code Assist / Antigravity / Grok CLI / DeepSeek CLI / VS Code) loaded normally. Diagnostic measurement of the real JSON-RPC initialize handshake showed the server taking ~4.2 s to respond — right on top of Claude Code's per-spawn timeout. Two contributors stacked: eager top-level imports of 5 provider SDK module trees (~3 s of CommonJS/ESM graph) + v2.27.0's 4 boot sweeps + 2 boot notices all running on the same event-loop tick as the initialize message. Lazy-load: every adapter source uses import type only for provider SDKs; new shared cached loaders loadAnthropicCtor() / loadOpenAICtor() / loadGenaiModule() wrap import(<sdk>) in a per-module promise so concurrent first-callers resolve exactly once. Each adapter's client() is now async; 25 call sites across the 5 adapters updated; geminiThinkingConfig(model, ThinkingLevel) takes the lazy-loaded enum as 2nd arg. model-selection.ts consumes the same loaders. Deferred sweeps: 6 boot-time setImmediate blocks in server.ts become setTimeout(..., STARTUP_SWEEP_DELAY_MS) with delay = 30_000 ms — initialize responds in <200 ms while sweeps run later when the operator is idle. Empirical handshake measurement post-ship: 3.6-3.9 s (vs 3.7-4.2 s pre-ship); margin is modest because Node.js + MCP SDK + orchestrator still dominate, but the architectural correctness keeps SDK and FS work off the initialize tick entirely. 2 new smoke markers (lazy_provider_sdk_imports_test, startup_sweeps_use_setTimeout_test) + 2 existing gemini assertions updated. Public surface 100% backward-compatible (3 new named exports for cross-module loader reuse; client() is private so async-conversion is internal). Patch bump. |
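The per-module promise pattern described above can be sketched with a hypothetical generic `loadOnce` helper (the shipped helpers are the concrete loadAnthropicCtor() / loadOpenAICtor() / loadGenaiModule()):

```typescript
// Wrap a dynamic import() in a cached promise so concurrent first callers
// trigger exactly one module load; every later call returns the same promise.
export function loadOnce<T>(factory: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined;
  return () => (cached ??= factory());
}

// Example: a lazily loaded Node built-in standing in for a provider SDK tree.
export const loadCrypto = loadOnce(() => import("node:crypto"));
```

Because the factory runs at first call rather than at module top level, the heavy SDK graphs stay off the `initialize` tick entirely, which is the architectural point of the ship.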
v02.27.00 |
Minor — Cold-start hardening Part 1: corrupted meta.json auto-quarantine + finalized-session auto-prune. Empirically motivated by Claude Code reload friction 2026-05-12: 534 session dirs accumulated under ~/.cross-review/data_v2/sessions/, including 3 corrupted by the v2.25.1 redact escape-boundary bug (77c47284, be47a5b0, 7edf63e3). The startup sweeps iterate via list() which read every meta.json; a single corrupted file caused the sweep to throw + abort, surfacing parse-error stderr on every reload — Claude Code is more sensitive to startup stderr than other hosts. SessionStore.list() now silently skips + quarantines corrupted meta.json (renamed to meta.json.bad with one [cross-review-v2] quarantined … stderr line, idempotent). SessionStore.pruneOldSessions(maxAgeDays?) removes finalized session dirs (outcome ∈ converged/aborted/max-rounds) whose updated_at is older than the cutoff. Default 60 days; CROSS_REVIEW_V2_PRUNE_AFTER_DAYS=0 disables entirely. In-flight or untyped-outcome sessions are NEVER pruned (preserves audit trail). New boot setImmediate block wires the prune; stderr only emitted when pruned > 0. Minor bump — 2 new methods on SessionStore; list() swallows-and-quarantines instead of throws (additive defensive). Backward-compatible default; operators see no behavior change unless they have corrupted meta.json or >60-day-old finalized sessions. Self-review BYPASSED per feedback_cross_review_self_repair_exception.md. |
v02.26.01 |
Patch — max_attached_evidence_chars default raised 80_000 → 200_000 to fix multi-file evidence truncation. Empirically demonstrated by the stepsecurity v0.2.0 ship 2026-05-12 (sess fd1037e5 and prior 85f94725): with 5 attached evidence files totaling ~95KB, session-store.readEvidenceAttachments() budget allocator at src/core/session-store.ts:1481-1543 exhausted the 80KB total cap before reaching the 4th+ attachment, surfacing (truncated to 33273 of 38412 bytes) to peers, who in 5 consecutive rounds correctly flagged the truncation as a blocker. The perFileCap = max(2_000, floor(totalCap * 0.6)) mechanic remains correct (60% per-file allowance leaves room for at least 1 other attachment); only the global totalCap default needed bumping. New default 200_000 chars accommodates ~5 attachments averaging 30KB each. Operator override unchanged via CROSS_REVIEW_V2_MAX_ATTACHED_EVIDENCE_CHARS. Documented adjacent issues (no code fix; tracked for v2.27+): (1) lead-drift abort threshold is 2 consecutive (orchestrator.ts:3662) — when max_rounds is reached with consecutiveLeadDrifts === 1, the session ends max-rounds instead of lead_meta_review_drift; workaround = use ask_peers for known-drift-prone task patterns; (2) inaccessible upstream OpenAPI spec — when peers demand verbatim spec excerpts but the spec endpoint requires browser-session cookie auth, the caller must rely on alternative-evidence patterns. Patch bump — backward-compatible default change. No public API surface change. Self-review BYPASSED per feedback_cross_review_self_repair_exception.md. |
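The allocator arithmetic referenced above is simple enough to state directly (a sketch of the per-file mechanic only; the shipped allocator also tracks the running total across attachments):

```typescript
// perFileCap = max(2_000, floor(totalCap * 0.6)): a single attachment may use
// at most 60% of the global budget, leaving room for at least one other file.
export function perFileCap(totalCap: number): number {
  return Math.max(2_000, Math.floor(totalCap * 0.6));
}
```

Under the old 80_000-char default the per-file allowance was 48_000 chars; the new 200_000 default raises it to 120_000 while the global cap absorbs ~5 attachments averaging 30KB, which is why only the total needed bumping.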
v02.26.00 |
Minor — Full pricing-model schema: base + extended-tier + cache (read/write) + promo (limited-time discount), all env-configurable, graceful fallback when fields are absent or promo expires. Operator directive 2026-05-11 ("Cross-review-v2 needs to know how to read, from the configurable variables in the config files and the env vars, all current pricing models, with and without cache, with and without promotion, below and above given token counts"). Adds 14 new optional pricing env vars per provider plus 2 metadata env vars per provider (_THRESHOLD_TOKENS, _PROMO_EXPIRES_AT_UTC) on top of the v2.0.0 required pair — total 18 env-var slots per provider × 5 providers = 90 max. Selection cascade in new exported selectRate(): (promo+extended) → promo → extended → base, each step automatically falling through when the corresponding field is unset OR the gating condition does not apply. When promo expires (Date.now() >= Date.parse(promo_expires_at)), the system falls back to base without operator intervention; when extended is unset, base applies to all prompt sizes; when cache rates are unset entirely, cache tokens are billed at the input rate (zero savings reported, no penalty). No-hardcoded-financials directive — the legacy src/core/cache-rates.json runtime fallback was REMOVED; cache pricing comes exclusively from env vars or graceful input-rate fallback. CostEstimate type extended with cache_read_cost?, cache_write_cost?, tier_used? ("base"|"extended"|"promo"|"promo_extended"). estimateCacheSavings() third positional arg (configRate) is now required — internal/MCP callers route through estimateCost() and are unaffected. New smoke marker full_pricing_model_v2260_test pinning 11 invariants. Minor bump — additive public surface; breaking only for direct estimateCacheSavings() callers. |
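The selectRate() cascade can be sketched as a pure function. Field names and the per-1M-token unit below are illustrative assumptions; the shipped type carries the full per-provider env-var mapping.

```typescript
interface PricingModel {
  base: number;                 // USD per 1M tokens (illustrative unit)
  extended?: number;            // applies at/above threshold_tokens
  promo?: number;
  promo_extended?: number;      // combined tier, when both gates apply
  threshold_tokens?: number;
  promo_expires_at_utc?: string;
}

export type TierUsed = "base" | "extended" | "promo" | "promo_extended";

// Cascade: (promo+extended) -> promo -> extended -> base; each step falls
// through when its field is unset OR its gating condition does not apply.
export function selectRate(
  p: PricingModel,
  promptTokens: number,
  now: number = Date.now(),
): { rate: number; tier_used: TierUsed } {
  const promoActive =
    p.promo !== undefined &&
    (p.promo_expires_at_utc === undefined || now < Date.parse(p.promo_expires_at_utc));
  const extendedApplies =
    p.extended !== undefined &&
    p.threshold_tokens !== undefined &&
    promptTokens >= p.threshold_tokens;
  if (promoActive && extendedApplies && p.promo_extended !== undefined) {
    return { rate: p.promo_extended, tier_used: "promo_extended" };
  }
  if (promoActive) return { rate: p.promo!, tier_used: "promo" };
  if (extendedApplies) return { rate: p.extended!, tier_used: "extended" };
  return { rate: p.base, tier_used: "base" };
}
```

Note the expiry check mirrors the entry's rule: once `Date.now() >= Date.parse(promo_expires_at)`, the promo tiers silently fall away and base applies with no operator action.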
v02.25.01 |
Patch — meta.json corruption hotfix: redact() env-style pattern was crossing JSON-escape boundaries. The env-style assignment regex in src/security/redact.ts:26 used [^\s"',}]{6,} for the value capture group; backslash was NOT in the exclusion class, so when a peer response contained the JSON-escaped sequence token: write\" (the inner-string close-quote of an escaped peer text), the {6,} quantifier consumed write\ (6 chars including the escape backslash). The replacement [REDACTED] ate the closing \ of the escape, leaving a bare " that prematurely closed the outer JSON string — producing structurally-broken meta.json files that could not be re-parsed at session resume time. Empirical impact: 3 cross-review-v2 sessions today (be47a5b0, 77c47284, 7edf63e3) all aborted at session_init with parser errors at different positions — same root cause: peer responses to a 13-repo scorecard hotfix submission quoted id-token: write inside backtick-fenced YAML excerpts. Fix: extend the negative char class with \\. Three smoke regression cases added (escapeBoundary, realAssignment, yamlExcerpt). Patch bump — additive defensive narrowing of an existing pattern; no public surface change. Cross-review-v2 self-review BYPASSED per operator directive 2026-05-11 (the bug being fixed is in the cross-review gate itself; routing the fix through the broken gate would re-encounter the same corruption). |
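The escape-boundary bug reproduces with a simplified version of the env-style rule (the real src/security/redact.ts pattern covers many more key names; the `token|key|secret` alternation here is an illustration):

```typescript
// Pre-v2.25.1 value class: backslash NOT excluded, so the {6,} quantifier can
// consume the escape backslash of a JSON-escaped sequence like write\" .
export const BROKEN_VALUE = /((?:token|key|secret)\s*[:=]\s*)[^\s"',}]{6,}/gi;

// Fixed class: \\ added to the exclusion set, so the match stops before the
// escape backslash and the outer JSON string stays structurally intact.
export const FIXED_VALUE = /((?:token|key|secret)\s*[:=]\s*)[^\s"',}\\]{6,}/gi;

export function redact(text: string, pattern: RegExp = FIXED_VALUE): string {
  return text.replace(pattern, "$1[REDACTED]");
}
```

With the broken class, the fragment `token: write\"` matched `write\` (six chars including the escape backslash), so the replacement ate the escape and left a bare `"` that prematurely closed the enclosing JSON string, which is exactly the meta.json corruption described above.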
v02.25.00 |
Third deliberation mode circular joins ship and review. Imported from maestro-app's editorial protocol after operator review of the maestro design 2026-05-11. Serial deliberative custody: caller submits artifact; non-caller peers rotate as temporary curators; each rotator either approves the current version unchanged or produces a narrowly justified revision; convergence = full rotation completes without substantive change. No parallel peer-voting in this mode — the rotator IS the actor each round. When to use each mode: ship (default) for approving/rejecting an external artifact (PR review, spec approval, security gate — tribunal primitive, all peers vote READY); review for tasks phrased as a review act where the lead emits structured response; circular for producing/refining a shared artifact (spec drafting, RFC, protocol evolution, CHANGELOG copy — editorial primitive, panel produces). Modes coexist; mixing within a single session is not supported. Implementation: new SessionMode = "ship" | "review" | "circular"; new leadCircularModeDirective() Layer-1 prompt clause with 5 subsections (approve unchanged, approved-content lock, quality preservation, no-self-review, evidence-provenance-lock shared with ship); new runCircularLoop() private orchestrator method called when sessionMode === "circular"; new circular_state: {rotation_order, consecutive_no_change_count, last_revision_round} persisted in meta.json; new circular_max_rotations config (default 3, env CROSS_REVIEW_V2_CIRCULAR_MAX_ROTATIONS); new event types session.circular_rotation_assigned / _step_unchanged / _step_revised / _full_rotation_no_change / _max_rotations_exceeded / _rotation_too_small; new finalize reasons circular_full_rotation_no_change / circular_max_rotations_exceeded / circular_rotation_too_small. Rotation length minimum is 2 (no-self-immediate-output guard). Drift / empty / fabrication detection from v2.23/v2.24 fires identically. New smoke marker circular_mode_test pinning 11 invariants. 
MCP tool schemas (run_until_unanimous, session_start_unanimous) accept mode: "circular". Minor bump — additive public surface; pre-v2.25 callers see no behavior change. |
v02.24.00 |
Evidence-provenance lock for the ship-mode relator (Codex bug report 2026-05-10). Codex empirically observed two adjacent failure modes from his own working session 019dc794: (a) session 09c21d7a — lead_peer (Grok) fabricating operational evidence ex nihilo in run_until_unanimous with mode: ship (symmetric-pattern SHAs, 39-char SHAs where git emits 40, test-run counts not in attached evidence, git diff --check passed assertions, vite asset hashes); (b) session eee886d3 — different relator (DeepSeek) propagating caller-narrated operational claims (cargo test: 147 passed, npm run typecheck: passed) as if they were verified evidence, when the caller never attached raw command output via session_attach_evidence. Same architectural gap from two angles: NARRATIVE about operational evidence ≠ PROVENANCE-GRADE operational evidence. Pre-v2.24.0 the orchestrator promoted such revisions to next-round draft, burning a full round of peer calls before downstream peers (claude + deepseek) blocked convergence. Layer 1 — Evidence Provenance Lock (HARD) clause added to leadShipModeDirective() system prompt, instructing the relator that operational evidence (SHAs/hashes/build outputs/test counts/diff hunks/git assertions) MUST be cited verbatim from the corpus or declared as a blocker. Layer 2 — new exported detectFabricatedEvidence(revisionText, provenanceCorpus): FabricationDetectionResult heuristic detector with hex-token-subset check + canonical operational-assertion patterns. Thresholds: 3+ net-new hex tokens or 2+ suspicious assertions trip fabrication; corpus-quoted tokens are subtracted before scoring (false-positive guard). 
Layer 3 — orchestrator relator-revision branch wires the detector after emptyText/driftDetected checks, preserves prior draft on detection, increments consecutiveLeadDrifts, emits session.lead_fabrication_detected event (data.fabrication_signals carries net_new_hex_count + sample + suspicious_assertion_count + sample), finalizes with lead_fabrication_repeated at the consecutive-cap. New smoke marker relator_evidence_provenance_lock_test pins behavioral matrix (clean/hex/assertion/provenance-correct) + source-level invariants (prompt sentinel, threshold constants, event type, finalize reason, unified-counter contract). No tool surface change. Patch bump — additive event + finalize reason; failure-mode behavior change only. |
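The Layer-2 corpus-subtraction idea can be sketched as follows. This shows only the hex-token half of the shipped detector; the suspicious-assertion patterns and exact token regex are omitted/assumed for brevity.

```typescript
// Hex-ish tokens (short SHAs through full 40-char SHAs, asset hashes).
const HEX_TOKEN = /\b[0-9a-f]{7,40}\b/g;

export interface FabricationDetectionResult {
  fabricated: boolean;
  net_new_hex: string[];
}

// Tokens quoted from the provenance corpus are subtracted before scoring
// (false-positive guard); 3+ net-new hex tokens trip the detector.
export function detectFabricatedEvidence(
  revisionText: string,
  provenanceCorpus: string,
): FabricationDetectionResult {
  const corpus = new Set(provenanceCorpus.match(HEX_TOKEN) ?? []);
  const net_new_hex = [...new Set(revisionText.match(HEX_TOKEN) ?? [])].filter(
    (t) => !corpus.has(t),
  );
  return { fabricated: net_new_hex.length >= 3, net_new_hex };
}
```

A relator that cites SHAs verbatim from attached evidence scores zero net-new tokens; one that invents symmetric-pattern SHAs ex nihilo crosses the threshold immediately.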
v02.23.00 |
Anthropic empty-revision degenerate path detection. Patch closing a $0.21 USD waste path discovered while triaging maestro-app v0.5.20 review session 8187f5a8 (2026-05-10): Claude Opus extended-thinking responses can return a content array with only thinking/redacted_thinking blocks and no final text block. Pre-v2.23.0 the Anthropic adapter silently produced text: "" and the orchestrator promoted that empty string to the next-round draft, dispatching peer calls against an empty Draft Or Solution Under Review: block. Layer 1 — new parseAnthropicContent(content) returns {text, parser_warning?} instead of the lossy string; legacy textFromAnthropicContent kept as backward-compat shim. Layer 2 — anthropic.ts call sites surface parser_warning via new extraParserWarnings parameter on resultFromText/generationFromText, flowing to PeerResult.parser_warnings and (new) GenerationResult.parser_warnings. Layer 3 — orchestrator's relator-revision branch treats generation.text.trim() === "" the same as drift: preserve prior draft, increment consecutiveLeadDrifts, emit dedicated session.lead_empty_revision event, finalize with lead_empty_revision_repeated when the cap is hit. New smoke marker anthropic_empty_text_detection_test pins all 4 invariants (helper return shape, adapter call-site uniformity, orchestrator sentinel strings, types declaration). No public surface change for callers passing valid arguments. Patch bump — failure-mode behavior change only. |
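The Layer-1 helper shape can be sketched like this. The block types are simplified from the Anthropic Messages API content array, and the warning string is a hypothetical stand-in for whatever the shipped parser_warning carries:

```typescript
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "thinking" | "redacted_thinking" };

// Returns {text, parser_warning?} instead of a lossy string: a content array
// holding only thinking/redacted_thinking blocks surfaces a warning rather
// than silently yielding text: "".
export function parseAnthropicContent(
  content: ContentBlock[],
): { text: string; parser_warning?: string } {
  const text = content
    .filter((b): b is { type: "text"; text: string } => b.type === "text")
    .map((b) => b.text)
    .join("\n");
  if (text.trim() === "") {
    // Hypothetical warning label for illustration.
    return { text: "", parser_warning: "anthropic_content_has_no_text_block" };
  }
  return { text };
}
```

Downstream, the orchestrator treats the empty-text case like drift: preserve the prior draft instead of dispatching a round against an empty submission.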
v02.22.00 |
session_doctor drill-down + per-round cost telemetry + budget warning event. Three observability/audit improvements identified during a forensic audit of 467 durable sessions. A.P2: session_doctor hides per-session enumeration of findings.self_lead_metadata by default (178/467 = 38% pre-v2.16.0 noise); totals.self_lead_metadata count remains visible; pass include_legacy: true to enumerate. B.P2: entries in findings.open_evidence_sessions[] gain item_types (open items grouped by surfacing peer) + chronic_blockers (item ids with round_count >= 3) so operators see which evidence asks are systemic. B.P3: new costs_per_round[] + cost_ceiling_usd in meta.json (snapshot at session_init time so retroactive analysis is decoupled from later env-var changes); new one-shot session.budget_warning event fires when cumulative cost crosses 75% of the ceiling, providing early visibility before max_rounds_budget_exceeded. 3 new smoke markers (session_doctor_legacy_filter_test, evidence_checklist_drilldown_test, budget_warning_emit_test). Minor bump — public surface is additive; pre-v2.22 callers see no behavior change. |
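The one-shot budget warning in B.P3 reduces to a small stateful trigger. The 0.75 fraction comes from the entry above; the closure-based state handling is illustrative:

```typescript
// Fire exactly once when cumulative cost crosses 75% of the ceiling that was
// snapshotted at session_init time.
export function makeBudgetWarner(ceilingUsd: number, fraction = 0.75) {
  let fired = false;
  let cumulative = 0;
  return (roundCostUsd: number): boolean => {
    cumulative += roundCostUsd;
    if (!fired && cumulative >= ceilingUsd * fraction) {
      fired = true;
      return true; // caller emits session.budget_warning exactly once
    }
    return false;
  };
}
```

Snapshotting the ceiling into meta.json (rather than re-reading the env var) is what keeps retroactive analysis decoupled from later configuration changes.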
v02.21.00 |
Cross-provider prompt caching across all 5 peers (OpenAI, Anthropic, Gemini, DeepSeek, Grok). Single coordinated ship that wires uniform prompt-caching telemetry through the runtime: each adapter parses provider-native cache fields (prompt_tokens_details.cached_tokens / cache_creation_input_tokens / cache_read_input_tokens / cachedContentTokenCount / prompt_cache_hit_tokens / prompt_cache_miss_tokens); orchestrator emits a canonical provider.cache.usage event; per-session cache_manifest.json is appended for every cached call. Anthropic uses EXPLICIT cache_control breakpoints on the system prompt (TTL 5m/1h). OpenAI uses pair-scoped prompt_cache_key + prompt_cache_retention (in_memory/24h). Grok mirrors OpenAI plus x-grok-conv-id header for cache-bucket scoping. DeepSeek parses auto-cache telemetry (no payload changes). Gemini parses implicit-cache telemetry only (explicit caches.create deferred). New src/core/prompt-parts.ts builds the canonical stablePrefix that always begins with cache_schema_version: vN and produces a sha256 hex hash invariant across rounds for the same case. New src/core/cache-manifest.ts persists per-session cache history with the same atomic-write retry pattern as meta.json. New rate cards in src/core/cache-rates.json populate CostEstimate.cache_savings_usd (or cache_savings_unknown when no rate matches). Operator can disable globally with CROSS_REVIEW_V2_DISABLE_CACHE=true; TTL via CROSS_REVIEW_V2_CACHE_TTL_ANTHROPIC / CROSS_REVIEW_V2_CACHE_TTL_OPENAI; schema bump via CROSS_REVIEW_V2_CACHE_SCHEMA_VERSION. 5 new smoke markers (cache_hash_invariance_test, cache_schema_version_in_prefix_test, cache_rates_json_loaded_test, cache_manifest_atomic_write_test, cache_disable_kill_switch_test). New docs/caching.md documents per-provider behavior matrix. Minor bump — public surface is additive; pre-v2.21 callers see no behavior change. |
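The stablePrefix hashing contract can be sketched with Node's crypto module. The field ordering inside the prefix is an assumption; the invariants are the ones stated above: the prefix always begins with `cache_schema_version: vN`, and the sha256 hex digest is identical across rounds for the same case.

```typescript
import { createHash } from "node:crypto";

// Illustrative prefix builder: schema version sentinel first, then the
// round-invariant parts (case id, system prompt).
export function buildStablePrefix(
  schemaVersion: number,
  caseId: string,
  systemPrompt: string,
): string {
  return `cache_schema_version: v${schemaVersion}\ncase: ${caseId}\n${systemPrompt}`;
}

export function prefixHash(prefix: string): string {
  return createHash("sha256").update(prefix, "utf8").digest("hex");
}
```

Hash invariance is what makes provider-side prompt caches actually hit: any drift in the prefix between rounds would silently turn every call into a cache miss.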
v02.18.08 |
Site sponsor card iteration. The site/index.html GitHub Sponsors iframe (cross-origin white box) was replaced with a dark-navy link card with a pink ❤, cyan meta text, and an animated arrow; the card moved to AFTER the buttons (lcv.dev/sponsor primary, GitHub Sponsors alternative). Companion ship Phase 3 (12 repos). |
v02.18.07 |
Patch — site/index.html visual identity refresh. GitHub Pages doc/sponsor page reskin to the new LCV org dark-first navy/cyan visual identity (palette #050b18/#38bdf8/#34d399, radial gradients, glow shadows, gradient text on h1). Coordinated companion ship with cross-review-v1 1.12.9, deepseek-cli 0.3.1, grok-cli 1.6.2, sponsor-motor APP v01.02.02, and .github-org/site (org root + /sponsor). No change to the published npm tarball (files[] does not include site/); only the GitHub Pages page changes. Patch bump (no public surface change). |
v02.18.06 |
Patch — Gemini API function-declaration compatibility for MCP tool inputSchemas. Gemini Code Assist forwards each MCP tool's inputSchema to the Gemini API as a function_declarations[*].parameters payload; the Gemini API's OpenAPI 3.0 subset rejects three patterns the SDK was emitting from the existing zod schemas, surfacing as 400 INVALID_ARGUMENT for every chat turn including cross-review-v2 tools. v2.18.6 cleans the offending zod usage. (1) additionalProperties: false removed from every MCP tool inputSchema (~28 tools) by dropping the .strict() chain; runtime accepts the same valid arguments because handlers consume only declared properties via destructuring. (2) caller field flattened from z.union([PeerSchema, z.literal("operator")]) (6 occurrences) to a single CallerSchema = z.enum([...PEERS, "operator"]), replacing the anyOf: [enum, const] shape with a clean single enum. (3) reasoning_effort_overrides refactored from z.record(PeerSchema, ReasoningEffortSchema).optional() to an explicit z.object({codex?, claude?, gemini?, deepseek?, grok?}).optional(), eliminating the non-OpenAPI propertyNames constraint and the spurious required: [<all 5 peers>] artifact that contradicted the field's .optional() declaration. No behavior change for any caller passing valid arguments — Claude Code, Codex CLI, Gemini Code Assist, Grok CLI and DeepSeek CLI continue invoking the same tools with the same keys. Lint/typecheck/format clean; smoke harness completes with ok: true / events: 96. Patch bump (public compatibility preserved; the only observable difference is that undeclared extra fields are now silently discarded instead of rejected with mcp_arg_validation_failed). |
v02.18.05 |
Patch — anti-drift smoke drivers for v2.18.4 audit closure (operator directive 2026-05-07). v2.18.4 shipped 6 surgical fixes from the Codex external audit; v2.18.5 hardens those fixes against silent regression with 5 anti-drift smoke checks (hono_override / abort_signal_threading / max_items_per_pass_default / clamp_effort_for_model / consensus_event_per_peer_attribution). P1.1: package.json overrides.hono === ">=4.12.16" + ip-address override retained. P1.3: ≥2 sites with signal?: AbortSignal param + signal: params.signal wiring + signal: input.signal autowire emission; consensus pass has no leftover signal: undefined. P1.4: source-level ?? "4" fallback + behavioral loadConfig() returns max_items_per_pass=4 (env unset). P2.1: behavioral clampEffortForModel("xhigh", "grok-4.3")="high"; passthrough on multi-agent; clamp wired at exactly 2 responses.create sites. P2.4: legacy judge_peer + new judge_peers array + per_peer_verdict map co-emitted at every this.emit({...}) event payload. clampEffortForModel is now exported from src/peers/grok.ts so the harness can verify directly. Companion to cross-review-v1 v1.12.7 (parallel ship, same operator directive). Smoke harness completes with ok: true / exit 0; lint/typecheck/format clean; npm audit --audit-level=moderate 0 vulnerabilities. Patch bump (additive — only new exports + new smoke markers; no runtime behavior change). |
v02.18.04 |
Patch — Codex external audit 2026-05-07 outcome: 6 surgical fixes (P1.1, P1.2, P1.3, P1.4, P2.1, P2.4). Codex submitted a read-only audit of cross-review-v2 v2.18.3 with 4 P1 + 7 P2 findings; this ship lands 6 verified-actionable items. P1.1: package.json adds "hono": ">=4.12.16" override clearing 2 npm-audit moderate advisories (GHSA-9vqf-7f2p-gf9v + GHSA-69xw-7hcm-h432) via @modelcontextprotocol/sdk transitive (practical exposure ~zero in stdio runtime, but audit-gate matters for publish + defense-in-depth; same precedent as v2.18.1 ip-address override). P1.2: src/security/redact.ts adds xai- API key pattern at parity with sk-/sk-ant-/AIza/etc; logs/sessions could previously leak xAI keys via persisted provider errors. P1.3: runEvidenceChecklistJudgeConsensusPass + runEvidenceChecklistJudgePass now thread AbortSignal through to judgeEvidenceAsk(context.signal) — pre-v2.18.4 the consensus path hardcoded signal: undefined and single-peer omitted the field, so session_cancel_job could not abort judges mid-flight. Autowire call sites pass input.signal from round scope. P1.4: lowered default CROSS_REVIEW_V2_EVIDENCE_JUDGE_MAX_ITEMS_PER_PASS from 8 → 4 — with default consensus_peers=4, worst-case round goes from 4×8=32 paid judge calls down to 4×4=16. Operators wanting prior behavior set env-var explicitly. P2.1: GROK_REASONING_EFFORT_MODELS allowlist expanded from {"grok-4.20-multi-agent"} to include "grok-4.3" per current xAI docs (verified via WebFetch 2026-05-07; xAI added grok-4.3 reasoning_effort support after v2.16.0 froze). New clampEffortForModel() narrows internal xhigh/minimal scale to high for grok-4.3 (which only accepts `none |
| v02.18.03 | Patch — Gemini default pin bump gemini-3.1-pro-preview → gemini-2.5-pro (operator preference 2026-05-07; coordinated with cross-review-v1 v1.12.4). Source-of-truth defaults flipped: src/core/config.ts models.gemini default → gemini-2.5-pro; src/peers/model-selection.ts priority list → ["gemini-2.5-pro", "gemini-3.1-pro-preview"] (3.1-pro-preview retained as fallback). Rationale: under the Google One AI Ultra subscription, gemini-2.5-pro carries a 1k requests/day quota vs gemini-3.1-pro-preview's 250 requests/day; post-bump empirical sessions (08cbc942, 1d5be5f2, 256ac7c9 — all 2026-05-07) confirm gemini-2.5-pro stable across the 5-peer panel without rate_limit blockers. The 7 LCV-workspace MCP host configs already flipped the CROSS_REVIEW_GEMINI_MODEL=gemini-2.5-pro env override on 2026-05-07; this ship aligns the source-of-truth defaults so a fresh install without an env override picks the same model. Workspace policy (operator directive 2026-05-07): only gemini-*-pro variants ≥ 2.5 are permitted — no *-flash and no models below 2.5. Smoke fixture scripts/smoke.ts:225 (currentOfficialModel iterator) flipped to gemini-2.5-pro. docs/api-keys.md env-var example + docs/model-selection.md priority documentation refreshed to match. Patch bump (no public surface change beyond the default model ID; behavior unchanged for env-override users). |
| v02.18.02 | Tier 5 — Windows process-tree introspection (coordinated with cross-review-v1 v1.12.2). Closes the long-standing forensics gap: pre-v2.18.2, getParentProcessSnapshot() returned parent_exe_basename: null on Windows because we only had a POSIX /proc/<ppid>/comm reader (Windows path deferred at F1 v2.18.0). v2.18.2 closes the gap with a defensive tasklist /FI "PID eq <ppid>" /FO CSV /NH reader via child_process.spawnSync (timeout: 500, windowsHide: true); the parser uses a leading-quote discriminator and the same 1 ≤ length < 128 sanity filter as POSIX. Best-effort try/catch swallows ENOENT, timeout, and parse failures. POSIX path unchanged. scripts/smoke.ts sub-test (14) extended with shape sanity, a Windows-specific populated-basename assertion, and source-level anti-drift guards. Forensics-only field — NOT used by the F1 token gate or the v2.17.0 clientInfo cross-check. Patch bump (no public surface change). |
| v02.18.00 | F1 caller capability tokens (coordinated with cross-review-v1 v1.11.0). Cryptographic identity proof that complements the v2.17.0 clientInfo gate. Pre-v2.18.0, the v2.17.0 cross-check between caller and clientInfo.name only catches inconsistent self-reports — both fields are declared by the caller. F1 introduces a per-host secret (env CROSS_REVIEW_CALLER_TOKEN), authoritative on match and rejected on mismatch. A new caller-tokens module exposes generation, loading, constant-time hex matching, env verification, and a best-effort parent-process snapshot for forensics (Option C / Hybrid). New MCP tool regenerate_caller_tokens rotates host-tokens.json. New env vars CROSS_REVIEW_CALLER_TOKEN, CROSS_REVIEW_TOKENS_FILE, CROSS_REVIEW_REQUIRE_TOKEN. A new caller_tokens block in server_info surfaces the gate state. verifyCallerIdentity extended with verification_method ("token" |
| v02.17.00 | HARD GATE — identity forgery rejection (operator directive 2026-05-05). Flagrant empirical evidence: cross-review-v2 session 0994cbaf was created by Codex with caller=claude (impersonation to self-exclude the real Claude from the panel). Pre-v2.17.0, v2 did not even capture clientInfo from the MCP initialize handshake — the caller was trusted unconditionally. v2.17.0 adds verifyCallerIdentity(declaredCaller, clientInfo), which cross-checks the declared caller against getCallerCandidatesFromClientInfo(clientInfo). Applied in all 6 caller-accepting handlers: session_init, ask_peers, session_start_round, run_until_unanimous, session_start_unanimous, contest_verdict (when new_caller is provided). Match → OK + identity_verified=true. clientInfo unknown → OK + identity_verified=false (legitimate override). caller="operator" → OK (no agent claim made). Mismatch OR multi-match clientInfo → throws identity_forgery_blocked. Smoke identity_forgery_blocked_test (6 sub-tests). Coordinated ship with cross-review-v1 v1.9.0. Minor bump because the public surface adds the identity_forgery_blocked error. Trilateral cross-review bypassed by operator directive (security fix to the gate itself; it would otherwise route through the compromised gate). |
| v02.16.00 | Tribunal protocol repair plus operational doctor. Separates petitioner/caller from relator metadata, applies self-recusal to direct ask_peers, adds read-only session_doctor, fixes Windows smoke teardown, and refreshes provider model guidance from official docs. |
| v02.15.01 | server_info consensus visibility hotfix. Exposes consensus_peers and configured_consensus_peers_raw for evidence-judge autowire so operators can audit the same configuration the dispatcher is using. |
| v02.15.00 | Backlog bundle for operational judge controls. Added consensus-based judge autowire, per-call reasoning-effort overrides, opt-in real-API smoke, provider 4xx docs hints, and a Grok reasoning-capability allowlist while exposing consensus toggles across the six MCP host configs. |
| v02.14.01 | Grok reasoning model hotfix. Switched the default Grok model to grok-4.20-multi-agent after real xAI verification and official docs showed reasoning.effort is accepted only on that model family. |
| v02.14.00 | Grok joins the tribunal. Expanded the peer set to five with Grok, added per-peer on/off env vars, precision-report groundwork, active evidence-judge autowire, contest_verdict, multi-peer judge consensus, attached-evidence prompt injection, and CodeQL-safe temp-directory handling. |
| v02.13.00 | Lead meta-review drift fix. Added explicit ship versus review session mode, lead drift detection, drift telemetry, and an abort gate so run_until_unanimous does not replace the artifact under review with a structured peer-review verdict. |
| v02.12.00 | Shadow judge observability. Turned on evidence-judge shadow-mode data collection, surfaced autowire config in server_info, added dashboard/runtime rollups, and codified the tribunal-colegiado model for caller, relator, peer votes, and contestation. |
| v02.11.00 | Relator lottery plus shadow auto-wire. Added automatic relator selection that excludes the caller and wired the v2.9 judge pass in shadow mode so self-review drift stops at the session structure. |
| v02.09.00 | LLM evidence-judge pass. Added an operator-triggered judge that evaluates open evidence asks against the current draft and promotes only verified satisfied items, leaving inferred/unknown cases open. |
| v02.08.00 | Per-peer health and Evidence Broker lifecycle. Added health rollups, evidence lifecycle tracking, resurfacing inference, dashboard surfaces, and the final architectural audit item on top of v2.7. |
| v02.07.00 | Evidence Broker. Added a persistent per-session evidence checklist that deduplicates NEEDS_EVIDENCE caller requests and injects outstanding asks into subsequent revision prompts. |
| v02.06.01 | Fallback/recovery budget hard gate. Replicated hard budget refusal to fallback and moderation-recovery paths so paid recovery calls cannot silently exceed the session cost ceiling. |
| v02.06.00 | Token-delta compaction plus v2.5 format hotfix bundle. Coalesced streaming token delta events to reduce events.ndjson noise and bundled the deferred Prettier/format fix from v2.5. |
| v02.05.00 | Evidence and budget hardening pass. Folded in operator-requested evidence/budget improvements plus empirical Codex/Gemini audit findings from historical session analysis. |
| v02.04.01 | CI stub fail-fast hotfix. Fixed import-time server startup so the smoke harness can import MCP schemas while CROSS_REVIEW_V2_STUB=1 is set in CI with explicit confirmation. |
| v02.04.00 | Audit-closure hardening pass. Closed internal v2.3.3 technical-opinion priorities with additive public-surface hardening and several explicitly documented behavior changes. |
| v02.03.03 | Prompt shielding and financial safety. Wrapped review_focus in escaped delimiters, blocked paid calls until financial controls are configured, expanded server_info financial diagnostics, and hardened MCP IDs, sweeps, jobs, and recovery cost alerts. |
| v02.03.02 | CI-green README/docs cleanup. Reissued README organizational standardization under the repository Prettier policy and completed active-document rename cleanup in NOTICE and CODE_OF_CONDUCT.md. |
| v02.03.01 | README organizational standardization. Adopted the shared LCV README opening while preserving the API-first runtime, model-selection, streaming, and observability sections. |
| v02.03.00 | Provider-neutral review_focus. Added focus support across session tools, persisted focus metadata, injected bounded focus blocks into generation/review/retry prompts, and aligned auto-tag/publish automation with the stable package line. |
| v02.02.00 | Provider token streaming. Added real token streaming for OpenAI, Anthropic, Gemini, and DeepSeek, with count-based progress events, runtime controls, and text-redaction defaults for persisted event logs. |
v02.01.01 |
CodeQL and model-selection hardening. Fixed secret-redaction ReDoS and dashboard log-injection alerts, added decision retry for empty peer output, max-output-token controls, stronger model selection, and improved thinking controls. |
v02.01.00 |
First stable cross-review-v2 release. Promoted the API-first implementation to stable with cancellation, restart recovery, metrics, runtime capabilities, prompt compaction, budget preflight, model fallback, and stable naming. |
v02.00.04 |
Session event race hotfix. Removed the CodeQL file-system race in events.ndjson persistence by appending under the session lock. |
v02.00.03 |
Background sessions and durable reports. Added background MCP tools, durable events and reports, peer decision-quality tracking, generation accounting, provider cost rates, budget guard, moderation-safe retry, and dashboard event/report APIs. |
v02.00.02 |
Publishing and dashboard sanitization. Normalized npm dist-tags, replaced the sponsor landing with the SumUp support page, sanitized dashboard 500 responses, and bumped the alpha runtime. |
v02.00.01 |
Public npm/package metadata alignment. Enforced public npm visibility, added registry visibility checks, aligned funding metadata, normalized repository.url, and bumped the alpha runtime. |
v02.00.00 |
Development package line hardening. Added parser format recovery, convergence metadata, shared MCP timeout/runtime smoke, auto-tag/release publishing, padded public tags, prepack clean builds, ignore-rule hardening, and quorum preservation. |
v2.0.0-alpha.2 |
Durable session recovery alpha. Added in-flight metadata, convergence health, evidence attachment, operator escalation, session sweep, convergence inspection, silent-model-downgrade failures, and smoke coverage for the new surfaces. |
v2.0.0-alpha.1 |
Model attestation and store hardening alpha. Added reported-model tracking, failed-attempt aggregation, recovery hints, atomic/locked session writes, UUID path hardening, safer probes, self-review prevention, English peer prompts, and expanded redaction. |
v2.0.0-alpha.0 |
Initial API/SDK-only MCP server. Introduced official SDK adapters for OpenAI, Anthropic, Gemini, and DeepSeek, runtime model discovery, best-model selection, and a durable local session store. |
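The constant-time hex matching that the v02.18.00 caller-token entry mentions can be sketched with Node's `crypto.timingSafeEqual`. This is an illustrative sketch only: the actual caller-tokens module's API is not shown in this README, so the function name and signature are assumptions.

```typescript
import { timingSafeEqual } from "node:crypto";

// Hypothetical helper; the real module's names may differ.
function tokensMatchConstantTime(expectedHex: string, presentedHex: string): boolean {
  // A length mismatch may short-circuit: it reveals only the token length,
  // which is fixed per deployment, not the token value itself.
  if (expectedHex.length !== presentedHex.length) return false;
  const expected = Buffer.from(expectedHex, "hex");
  const presented = Buffer.from(presentedHex, "hex");
  if (expected.length !== presented.length) return false;
  // timingSafeEqual compares in constant time, defeating byte-by-byte timing probes.
  return timingSafeEqual(expected, presented);
}
```

The constant-time comparison matters because a naive `===` on hex strings can leak, via response timing, how many leading bytes of a guessed token are correct.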
cross-review-v2 is the stable API-first implementation of the cross-review
pattern. It orchestrates provider API clients (OpenAI/Codex, Anthropic/Claude,
Google Gemini, DeepSeek, and xAI/Grok) and provides an MCP-compatible server
surface.
Runtime calls are real provider calls by default. Stubs exist only for smoke
tests and CI when CROSS_REVIEW_V2_STUB=1.
# Set API keys (PowerShell example)
[Environment]::SetEnvironmentVariable("OPENAI_API_KEY", "<OPENAI_API_KEY>", "User")
[Environment]::SetEnvironmentVariable("ANTHROPIC_API_KEY", "<ANTHROPIC_API_KEY>", "User")
[Environment]::SetEnvironmentVariable("GEMINI_API_KEY", "<GEMINI_API_KEY>", "User")
[Environment]::SetEnvironmentVariable("DEEPSEEK_API_KEY", "<DEEPSEEK_API_KEY>", "User")
[Environment]::SetEnvironmentVariable("GROK_API_KEY", "<GROK_API_KEY>", "User")
Restart your terminal after changing environment variables.
Build and run locally:
npm install
npm run build
node dist/src/mcp/server.js
For local smoke tests (no-cost):
$env:CROSS_REVIEW_V2_STUB = "1"
npm test
Model selection and runtime behaviour can be controlled with environment variables. Example overrides (PowerShell):
[Environment]::SetEnvironmentVariable("CROSS_REVIEW_OPENAI_MODEL", "gpt-5.5", "User")
[Environment]::SetEnvironmentVariable("CROSS_REVIEW_OPENAI_REASONING_EFFORT", "xhigh", "User")
[Environment]::SetEnvironmentVariable("CROSS_REVIEW_GROK_MODEL", "grok-4.20-multi-agent", "User")
[Environment]::SetEnvironmentVariable("CROSS_REVIEW_GROK_REASONING_EFFORT", "xhigh", "User")
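The resolution order these overrides imply (explicit env var first, then a per-provider priority list, as the v02.18.03 entry describes for Gemini) can be sketched as follows. The helper name and the fall-back-to-head behavior are illustrative assumptions, not the actual src/peers/model-selection.ts implementation.

```typescript
// Priority list taken from the v02.18.03 changelog entry.
const GEMINI_PRIORITY = ["gemini-2.5-pro", "gemini-3.1-pro-preview"];

function resolveGeminiModel(
  env: Record<string, string | undefined>,
  availableModels: string[],
): string {
  // An explicit env-var override always wins over the priority list.
  const override = env["CROSS_REVIEW_GEMINI_MODEL"];
  if (override) return override;
  // Otherwise pick the first priority entry the provider actually reports.
  for (const model of GEMINI_PRIORITY) {
    if (availableModels.includes(model)) return model;
  }
  // If discovery returned nothing usable, fall back to the preferred default.
  return GEMINI_PRIORITY[0];
}
```

Under this scheme, a fresh install with no env override and normal model discovery lands on gemini-2.5-pro, matching the pinned default.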
For Grok, GROK_API_KEY is canonical. grok-4-latest, grok-4.3,
grok-4.20, and grok-4.20-reasoning use xAI automatic reasoning without an explicit
reasoning.effort field. grok-4.20-multi-agent accepts explicit
reasoning.effort; low/medium select 4 agents and high/xhigh select
16 agents.
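The effort-to-agent-count mapping above for grok-4.20-multi-agent can be stated as a tiny function; the function name is an illustrative assumption, not part of the xAI API or this package's surface.

```typescript
type GrokEffort = "low" | "medium" | "high" | "xhigh";

// low/medium select 4 agents; high/xhigh select 16, per the mapping above.
function agentsForEffort(effort: GrokEffort): number {
  return effort === "low" || effort === "medium" ? 4 : 16;
}
```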
Financial and budget controls are required for paid provider calls. Configure these environment variables before running real sessions (example):
[Environment]::SetEnvironmentVariable("CROSS_REVIEW_V2_MAX_SESSION_COST_USD", "20", "User")
[Environment]::SetEnvironmentVariable("CROSS_REVIEW_V2_PREFLIGHT_MAX_ROUND_COST_USD", "20", "User")
[Environment]::SetEnvironmentVariable("CROSS_REVIEW_V2_UNTIL_STOPPED_MAX_COST_USD", "20", "User")
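A hard budget gate shaped by these ceilings (the v02.03.03 and v02.06.01 entries describe refusing paid calls past the configured limits) might look like the sketch below. The interface and function are hypothetical illustrations, not the server's actual internals.

```typescript
interface BudgetConfig {
  maxSessionCostUsd: number; // CROSS_REVIEW_V2_MAX_SESSION_COST_USD
  preflightMaxRoundCostUsd: number; // CROSS_REVIEW_V2_PREFLIGHT_MAX_ROUND_COST_USD
}

function preflightAllows(
  cfg: BudgetConfig,
  spentSoFarUsd: number,
  estimatedRoundCostUsd: number,
): boolean {
  // Refuse if the round alone would exceed its own ceiling...
  if (estimatedRoundCostUsd > cfg.preflightMaxRoundCostUsd) return false;
  // ...or if completing it would push the session past its total ceiling.
  return spentSoFarUsd + estimatedRoundCostUsd <= cfg.maxSessionCostUsd;
}
```

Checking the estimate before dispatch is what makes the ceiling a hard gate rather than an after-the-fact alert.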
MCP tools: server_info, runtime_capabilities, probe_peers, session_init, session_list, session_read, ask_peers, session_start_round, run_until_unanimous, session_start_unanimous, session_cancel_job, session_recover_interrupted, session_poll, session_events, session_metrics, session_doctor, session_report, session_check_convergence, session_attach_evidence, escalate_to_operator, session_sweep, session_finalize.

Apache License 2.0 — see LICENSE and NOTICE.
Copyright 2026 Leonardo Cardozo Vargas.
© LCV Ideas & Software
LEONARDO CARDOZO VARGAS TECNOLOGIA DA INFORMACAO LTDA
Rua Pais Leme, 215 Conj 1713 - Pinheiros
São Paulo - SP
CEP 05.424-150
CNPJ: 66.584.678/0001-77
IM 05.424-150
To register the server with Claude, run in the terminal:
claude mcp add cross-review-v2 -- npx -y @lcv-ideas-software/cross-review-v2