ContextLattice is an HTTP-first, MCP-compatible memory/context/task orchestrator that persists writes and returns fused recall from specialized stores with local-first defaults.
Primary URL: https://contextlattice.io/
Install: https://contextlattice.io/installation.html
Troubleshooting: https://contextlattice.io/troubleshooting.html
Local-first memory orchestration for AI systems with durable writes, multi-sink fanout, retrieval learning loops, and operator-grade controls.
Overview | Architecture | Wiki | V3 Roadmap | Installation | Integrations | Troubleshooting | Updates
Context Lattice is built for teams running high-volume memory writes where durability and retrieval quality matter more than prompt bloat.
- Writes land via POST /memory/write with validated + normalized payloads.
- Specialized stores (topic_rollups, postgres_pgvector) serve the staged read lane.
Use the new operator wiki as the canonical “best tools + graphics” runtime manual for public/main.
Wiki: https://contextlattice.io/wiki.html (source: docs/wiki/README.md)
Prerequisites:
- A container runtime with Compose v2 (docker compose), such as Docker Desktop or Docker Engine
- A machine sized for your profile (lite vs full) with enough CPU, RAM, and disk
- CLI tools: gmake, jq, rg, python3, curl
Paid-release gate checklist: docs/private/commercialization/v4_paid_release_gate_checklist.md
Installer downloads:
- macOS: https://github.com/sheawinkler/ContextLattice/releases/latest/download/ContextLattice-macOS-universal.dmg
- Windows: https://github.com/sheawinkler/ContextLattice/releases/latest/download/ContextLattice-windows-x64.msi
- Linux: https://github.com/sheawinkler/ContextLattice/releases/latest/download/ContextLattice-linux-bootstrap.tar.gz
Quickstart: gmake quickstart writes ~/ContextLattice/setup/agent_contextlattice_instructions.md (copied to clipboard) plus ~/ContextLattice/setup/agent_smoke_write_read.md for immediate write/read verification.
Hugging Face Space: use Dockerfile.hf-lite for a single-container deployment on port 7860 (copy it to root Dockerfile in the Space repo before build); see docs/huggingface-space-lite.md. The lite image keeps topic_rollups retrieval and disables mongo/mindsdb/pgvector for predictable startup in a single container.
Release operator note:
gmake dmg-build
# output: dist/ContextLattice-macOS-universal.dmg
gmake msi-build
# output: dist/ContextLattice-windows-x64.msi
gmake linux-bundle-build
# output: dist/ContextLattice-linux-bootstrap.tar.gz
# attach this file to the latest GitHub release
| Lane | Runtime profile | CPU | RAM | Storage |
|---|---|---|---|---|
| Public v3.3.x | Hugging Face / Glama lite (single container) | 2-4 vCPU | 4-8 GB | 20-50 GB SSD |
| Public v3.3.x | Local Lite compose (core lane) | 2-4 vCPU | 8-12 GB | 25-80 GB SSD |
| Public v3.3.x | Local Full compose (no spike-lab) | 6-8 vCPU | 12-20 GB | 100-180 GB SSD |
| Public v3.3.x | Local Full + spike-lab adapters | 8-12 vCPU | 24-32 GB | 180-300 GB SSD/NVMe |
| Public-paid / private v4 | Local premium tuning lane | 8-12 vCPU | 24-48 GB | 250 GB-1 TB SSD/NVMe (external strongly recommended) |
| Private v4 hosted | Multi-node baseline | 16+ vCPU host + GPU lane | 64+ GB host RAM | 1-2 TB NVMe for indexes/snapshots/logs |
Operational notes:
- Measured ~16.39 GiB container RSS with spike-lab adapters; Full baseline (excluding spike-lab adapters) measured ~7.70 GiB.
- 20-28 GB is a safe starting range; raise only when running spike-lab.
- Keep at least 40 GB free at the storage-governance root (ORCH_STORAGE_GOVERNANCE_MIN_FREE_GB=40 default).
- GO_TELEMETRY_RETENTION_DAYS=75, blob compression enabled, blob GC enabled.
- Retention is telemetry-only (ORCH_RETENTION_TELEMETRY_ONLY=true with protected topic/file rules).
cp .env.example .env
ln -svf ../../.env infra/compose/.env
Strict runtime lock (prevents tuning drift across restarts):
gmake env-lock-apply
gmake env-lock-check
config/env/strict_runtime.env is the single source of truth for critical runtime/tuning keys.
gmake up, gmake mem-up, and release/lite launch targets auto-apply this lock before compose starts.
Canonical config layout:
- config/env/ -> runtime/tuning lockfiles
- config/mcp/ -> MCP hub/proxy/client config files
Optional Letta backlog auto-prune tuning in .env:
LETTA_AUTO_PRUNE_ENABLED=true
LETTA_AUTO_PRUNE_INTERVAL_SECS=75
LETTA_AUTO_PRUNE_BACKLOG_TRIGGER=1000
LETTA_AUTO_PRUNE_LIMIT=20000
LETTA_AUTO_PRUNE_TIMEOUT_SECS=45
LETTA_AUTO_PRUNE_STATUSES=pending,retrying
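A minimal sketch of how these thresholds might combine, assuming (as the variable names suggest) a prune pass fires only when the backlog reaches LETTA_AUTO_PRUNE_BACKLOG_TRIGGER and LETTA_AUTO_PRUNE_INTERVAL_SECS has elapsed since the previous pass; the function is illustrative, not the shipped implementation:

```python
def should_auto_prune(backlog, last_run_ts, now,
                      trigger=1000, interval_secs=75):
    """Fire a prune pass only when the backlog is large enough AND the
    configured interval has elapsed since the previous pass."""
    if backlog < trigger:
        return False
    return (now - last_run_ts) >= interval_secs

# Backlog below the trigger: never prune, regardless of elapsed time.
print(should_auto_prune(backlog=500, last_run_ts=0, now=1000))   # → False
# Backlog over the trigger, but interval not yet elapsed.
print(should_auto_prune(backlog=1500, last_run_ts=0, now=10))    # → False
# Both conditions satisfied.
print(should_auto_prune(backlog=1500, last_run_ts=0, now=100))   # → True
```

LETTA_AUTO_PRUNE_STATUSES would then select which task statuses (pending, retrying) are eligible for pruning in that pass.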
Optional code-context and agent capability surfaces:
ORCH_CODE_CONTEXT_ENRICH_ENABLED=true
ORCH_MCP_CAPABILITY_MAP_ENABLED=true
ORCH_BROWSER_CONTEXT_INGEST_ENABLED=true
Fastembed adapter runtime (service-backed):
ORCH_ADAPTER_FASTEMBED_RS_ENABLED=true
ORCH_FASTEMBED_RS_BASE_URL=http://fastembed-sidecar:8080
ORCH_FASTEMBED_RS_ROUTE=/embed
ORCH_FASTEMBED_RS_MODEL=BAAI/bge-small-en-v1.5
ORCH_FASTEMBED_RS_TIMEOUT_SECS=2.5
ORCH_ADAPTER_FASTEMBED_RS_REQUIRE_GATE=true
ORCH_ADAPTER_FASTEMBED_RS_GATE_FILE=/app/data/gates/fastembed_gate_latest.json
ORCH_ADAPTER_FASTEMBED_RS_GATE_MAX_AGE_SECS=172800
ORCH_ADAPTER_FASTEMBED_RS_PROMOTE_OVERRIDE=true
ORCH_ADAPTER_FASTEMBED_RS_PROMOTE_REASON=manual_16pct_promotion_2026-03-16
FASTEMBED_DEFAULT_MODEL=BAAI/bge-small-en-v1.5
FASTEMBED_MAX_BATCH=256
When enabled, orchestrator Qdrant write fanout uses batched embeddings (embed_text_batch) to reduce per-item adapter overhead.
If gate mode is enabled, fastembed activates only when the benchmark gate artifact reports passed=true.
Manual promotion override is available for explicitly approved cases; telemetry still reports the raw gate result and marks override activation separately.
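The gate decision described above can be sketched as follows. The `passed` field comes from the text; the `generated_at` field name and the artifact layout are assumptions for illustration only:

```python
import json, tempfile, time

def fastembed_gate_allows(path, max_age_secs=172800,
                          promote_override=False, now=None):
    """Activate fastembed only when the gate artifact exists, is fresh,
    and reports passed=true -- unless a manual promotion override is set.
    (Telemetry should still record the raw gate result separately.)"""
    if promote_override:
        return True
    try:
        with open(path) as f:
            gate = json.load(f)
    except (OSError, json.JSONDecodeError):
        return False
    now = time.time() if now is None else now
    age = now - gate.get("generated_at", 0)  # "generated_at" is hypothetical
    return bool(gate.get("passed")) and age <= max_age_secs

# Build a sample artifact to illustrate the check.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump({"passed": True, "generated_at": time.time()}, f)
    gate_path = f.name

print(fastembed_gate_allows(gate_path))                  # fresh + passed → True
print(fastembed_gate_allows("/nonexistent/gate.json"))   # missing artifact → False
print(fastembed_gate_allows("/nonexistent/gate.json",
                            promote_override=True))      # override wins → True
```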
fastembed-gate-refresh now runs this refresh loop automatically in compose; manual command remains available:
python3 bench/perf_shortlist_matrix.py \
--api-key "$ORCH_KEY" \
--runs 12 \
--gate-warmups 1 \
--gate-repeats 3 \
--gate-aggregate median \
--baseline bench/results/perf_shortlist_matrix_baseline.json \
--gate-output /app/data/gates/fastembed_gate_latest.json
If the gate refresher starts before orchestrator readiness, it retries quickly via:
GATE_REFRESH_FAILURE_RETRY_SECS=45
Gateway staged retrieval now returns continuation_async.events_url when slow-source continuation is scheduled. Subscribe via SSE to get non-blocking completion updates:
GET /memory/search/continuations/{token}/events
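A subscriber to the events URL only needs standard Server-Sent Events framing (`data:` lines, frames separated by blank lines). A minimal frame parser; the payload shapes in the sample are placeholders, not the real event schema:

```python
import json

def parse_sse(stream_text):
    """Yield the JSON payload of each SSE frame: 'data:' lines,
    frames separated by blank lines, multi-line data joined with '\n'."""
    events = []
    for frame in stream_text.split("\n\n"):
        data_lines = [ln[5:].strip() for ln in frame.splitlines()
                      if ln.startswith("data:")]
        if data_lines:
            events.append(json.loads("\n".join(data_lines)))
    return events

sample = 'data: {"status": "running"}\n\ndata: {"status": "succeeded"}\n\n'
print([e["status"] for e in parse_sse(sample)])  # → ['running', 'succeeded']
```

In practice the stream would be read incrementally from the continuation endpoint (e.g. with `curl -N` or a streaming HTTP client) rather than from a string.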
Optional lexical guard for staged retrieval (policy-aware slow-source deferral):
GO_RETRIEVAL_LEXICAL_GUARD_ENABLED=true
GO_RETRIEVAL_LEXICAL_GUARD_MIN_COVERAGE=0.55
GO_RETRIEVAL_LEXICAL_GUARD_MIN_RESULTS=1
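One way to read these knobs: slow sources are deferred only when the fast lexical pass already covers the query well enough. A sketch under assumed semantics (simple query-term overlap as the coverage metric; the real guard's scoring may differ):

```python
def lexical_guard_satisfied(query_terms, result_texts,
                            min_coverage=0.55, min_results=1):
    """Return True when fast lexical results are good enough to defer
    slow-source continuation: enough results AND enough term coverage."""
    if len(result_texts) < min_results:
        return False
    terms = {t.lower() for t in query_terms}
    if not terms:
        return False
    hit = set()
    for text in result_texts:
        hit |= terms & set(text.lower().split())
    return len(hit) / len(terms) >= min_coverage

# Both query terms appear in the fast results: guard satisfied, defer slow lane.
print(lexical_guard_satisfied(["deployment", "notes"],
                              ["deployment notes for v3"]))  # → True
# Only 1 of 3 terms covered (0.33 < 0.55): run slow sources.
print(lexical_guard_satisfied(["alpha", "beta", "gamma"],
                              ["alpha only"]))               # → False
```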
Optional mode-aware Qdrant tuning:
ORCH_QDRANT_SEARCH_MODE_HNSW_EF={"fast":48,"balanced":96,"deep":128}
ORCH_QDRANT_SEARCH_MODE_LIMIT_CAPS={"fast":80,"balanced":120,"deep":180}
ORCH_QDRANT_FILTERLESS_LIMIT_CAP=96
ORCH_QDRANT_WARMUP_ENABLED=true
ORCH_QDRANT_WARMUP_DELAY_SECS=2
ORCH_QDRANT_WARMUP_TIMEOUT_SECS=20
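The two mode maps above are JSON objects stored in env vars. A sketch of how an operator script might resolve a per-mode value, with the fallback defaults being assumptions for illustration:

```python
import json, os

def mode_setting(env_name, mode, default):
    """Resolve a per-mode value from a JSON-map env var such as
    ORCH_QDRANT_SEARCH_MODE_HNSW_EF={"fast":48,"balanced":96,"deep":128}."""
    raw = os.environ.get(env_name)
    if not raw:
        return default
    try:
        return json.loads(raw).get(mode, default)
    except json.JSONDecodeError:
        return default  # malformed map: fall back rather than fail

os.environ["ORCH_QDRANT_SEARCH_MODE_HNSW_EF"] = '{"fast":48,"balanced":96,"deep":128}'
print(mode_setting("ORCH_QDRANT_SEARCH_MODE_HNSW_EF", "deep", 64))    # → 128
# Unset map: the caller's default applies.
print(mode_setting("ORCH_QDRANT_SEARCH_MODE_LIMIT_CAPS", "fast", 80)) # → 80
```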
Deep async durability + telemetry store routing:
ORCH_RECALL_DEEP_ASYNC_PERSIST_ENABLED=true
ORCH_RECALL_DEEP_ASYNC_STORE_BACKEND=mongo
ORCH_RECALL_DEEP_ASYNC_MONGO_DB=contextlattice_raw
ORCH_RECALL_DEEP_ASYNC_MONGO_COLLECTION=recall_deep_async_jobs
ORCH_TELEMETRY_DB=contextlattice_raw
ORCH_TELEMETRY_COLLECTION=retrieval_telemetry
ORCH_TELEMETRY_PERSIST_ENABLED=true
ORCH_RETRIEVAL_MEMORY_BANK_DEFAULT_ENABLED=true
ORCH_MEMORY_BANK_SEARCH_BACKEND=shodh_spike
ORCH_MEMORY_BANK_SPIKE_FALLBACK_BACKEND=surrealdb_spike
ORCH_MEMORY_BANK_SPIKE_FALLBACK_BACKENDS=surrealdb_spike,memvid_spike,icm_spike,quickwit_spike
ORCH_MEMORY_BANK_SPIKE_HTTP_URL=http://memory-bank-spike-rs:8096
ORCH_MEMORY_BANK_SPIKE_SEARCH_ROUTE=/search
ORCH_MEMORY_BANK_SPIKE_MAX_CHAIN_BACKENDS=3
ORCH_MEMORY_BANK_SPIKE_HEDGE_ENABLED=false
ORCH_MEMORY_BANK_SPIKE_HEDGE_MAX_PARALLEL=2
ORCH_MEMORY_BANK_SPIKE_HEDGE_BACKENDS=shodh_spike,surrealdb_spike
MEMORY_BANK_SPIKE_RS_MEILI_URL=http://meilisearch:7700
MEMORY_BANK_SPIKE_RS_MEILI_INDEX=contextlattice_memory
MEMORY_BANK_SPIKE_RS_MEILI_TASK_TIMEOUT_SECS=30
MEMORY_BANK_SPIKE_RS_PORT=8096
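Given the settings above, the effective search chain is the primary backend followed by the fallback list, de-duplicated and truncated to the chain cap. A sketch of that resolution (the function name is hypothetical):

```python
def resolve_backend_chain(primary, fallbacks, max_chain=3):
    """Build the ordered, de-duplicated backend chain, capped by
    ORCH_MEMORY_BANK_SPIKE_MAX_CHAIN_BACKENDS for RAM-safe operation."""
    chain = [primary]
    for backend in fallbacks:
        if backend not in chain:
            chain.append(backend)
    return chain[:max_chain]

chain = resolve_backend_chain(
    "shodh_spike",
    ["surrealdb_spike", "memvid_spike", "icm_spike", "quickwit_spike"],
    max_chain=3,
)
print(chain)  # → ['shodh_spike', 'surrealdb_spike', 'memvid_spike']
```

With the defaults above, icm_spike and quickwit_spike remain configured but fall outside the capped chain.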
gmake quickstart
This command:
- creates .env if missing
- prompts for a profile (lite vs full) with CPU/RAM/storage guidance (interactive shells)
Non-interactive profile selection:
QUICKSTART_PROFILE_PROMPT=0 QUICKSTART_PROFILE_DEFAULT=lite gmake quickstart
# or
BOOTSTRAP=1 scripts/first_run.sh --profile full --no-profile-prompt
Easy monitoring after launch:
gmake monitor-open
# CLI-only checks:
gmake monitor-check
ORCH_KEY="$(awk -F= '/^CONTEXTLATTICE_ORCHESTRATOR_API_KEY=/{print substr($0,index($0,"=")+1)}' .env)"
curl -fsS http://127.0.0.1:8075/health | jq
curl -fsS -H "x-api-key: ${ORCH_KEY}" http://127.0.0.1:8075/status | jq '.service,.sinks'
curl -fsS -H "x-api-key: ${ORCH_KEY}" http://127.0.0.1:8075/ops/capabilities | jq
Expected:
- /health returns {"ok": true, ...}
- /status returns service and sink states (with API key)
BOOTSTRAP=1 scripts/first_run.sh
MINDSDB_REQUIRED now defaults automatically from COMPOSE_PROFILES.
# launch using current COMPOSE_PROFILES from .env
gmake mem-up
# explicit modes
gmake mem-up-lite
gmake mem-up-full
gmake mem-up-core
# persist profile mode for future gmake mem-up
gmake mem-mode-full
gmake mem-mode-core
ORCH_KEY="$(awk -F= '/^CONTEXTLATTICE_ORCHESTRATOR_API_KEY=/{print substr($0,index($0,"=")+1)}' .env)"
curl -fsS http://127.0.0.1:8075/health | jq
curl -fsS -H "x-api-key: ${ORCH_KEY}" http://127.0.0.1:8075/status | jq
curl -fsS -H "x-api-key: ${ORCH_KEY}" http://127.0.0.1:8075/telemetry/fanout | jq
curl -fsS -H "x-api-key: ${ORCH_KEY}" http://127.0.0.1:8075/telemetry/fanout | jq '.lettaAutoPrune'
curl -fsS -H "x-api-key: ${ORCH_KEY}" http://127.0.0.1:8075/telemetry/retention | jq
curl -fsS -X POST -H "x-api-key: ${ORCH_KEY}" \
"http://127.0.0.1:8075/telemetry/memory/cleanup-low-value/chunked?dry_run=true&project_batch=10&per_project_limit=250" | jq
curl -fsS -X POST -H "x-api-key: ${ORCH_KEY}" \
"http://127.0.0.1:8075/telemetry/fanout/letta/auto-prune/run?force=false" | jq
curl -fsS -X POST -H "x-api-key: ${ORCH_KEY}" \
"http://127.0.0.1:8075/maintenance/telemetry/purge?dry_run=true&include_qdrant=true&include_mindsdb=true&include_letta=true" | jq
scripts/first_run.sh --allow-secrets-storage
scripts/first_run.sh --block-secrets-storage
scripts/first_run.sh --insecure-local
scripts/first_run.sh --security-mode strict
scripts/first_run.sh now enforces secure local-first defaults unless explicitly overridden:
- Localhost-only bind (HOST_BIND_ADDRESS=127.0.0.1)
- Production security mode (CONTEXTLATTICE_ENV=production, API key optional by default)
- Strict mode (CONTEXTLATTICE_ENV=strict, API key required)
- Secret redaction (SECRETS_STORAGE_MODE=redact)
Security toggles:
- --allow-secrets-storage
- --block-secrets-storage
- --insecure-local (explicit opt-out)
- --security-mode development|production|strict
Paste this into any new agent session (ChatGPT app, Claude chat apps, Claude Code, Codex):
You must use Context Lattice as the memory/context layer.
Runtime:
- Orchestrator: http://127.0.0.1:8075
- API key: CONTEXTLATTICE_ORCHESTRATOR_API_KEY from my local .env
Required behavior:
1) Before planning, call POST /memory/search with compact query + project/topic filters.
2) During long tasks, checkpoint major decisions/outcomes via POST /memory/write.
2.1) Submit outcome feedback with POST /tools/feedback_submit (include idempotencyKey).
3) Before final answer, run one more POST /memory/search for recency.
4) Keep writes compact (summary, decisions, diffs), never full transcripts.
5) If memory endpoints fail, continue task and report degraded-memory mode explicitly.
6) Use read-call timeouts that match retrieval mode:
- fast: 25s
- balanced: 60s
- deep (blocking reads): 75s
Fast/balanced modes keep slow sources async by default.
Explicit `sources=[...]` does not force blocking; use `blocking=true` (or `sync_slow_sources=true`) when you intentionally want blocking slow-source completion.
Deep mode now defaults to async completion: you get immediate partial results plus `job_id`/`poll_url`/`events_url`, then fetch final results from `GET /memory/search/jobs/{job_id}` (or `/memory/search/async/{job_id}`) or stream updates from `GET /memory/search/jobs/{job_id}/events`.
Read responses expose `retrieval_lifecycle` for explicit status (`queued|running|partial|succeeded|failed`) and source availability.
If a deep read returns partials, show those immediately and poll once after 5-15s for warmed slow-source completion.
7) Set endpoint vars explicitly at session start:
- `export CONTEXTLATTICE_ORCHESTRATOR_URL=http://127.0.0.1:8075`
- `export MEMMCP_ORCHESTRATOR_URL=http://127.0.0.1:8075`
8) Set a stable agent identity for profile defaults:
- `export CONTEXTLATTICE_AGENT_ID=codex_gpt5`
- `export MEMMCP_AGENT_ID=codex_gpt5`
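Point 6's deep-mode behavior can be sketched as a small decide-next-step helper. The lifecycle statuses and endpoints come from the contract above; the helper itself is a hypothetical client-side convenience, not part of the API:

```python
def next_action(lifecycle_status, has_partials):
    """Map a retrieval_lifecycle status to a client action, following the
    deep-mode guidance: show partials immediately, then poll once after
    5-15s (GET /memory/search/jobs/{job_id}) for warmed slow sources."""
    if lifecycle_status in ("succeeded", "failed"):
        return "done"
    if has_partials:
        return "show_partials_then_poll_once"
    return "poll"  # queued/running with nothing usable yet

print(next_action("partial", True))    # → show_partials_then_poll_once
print(next_action("queued", False))    # → poll
print(next_action("succeeded", True))  # → done
```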
Detailed playbook: docs/human_agent_instruction_playbook.md
Expected user/agent access pattern:
- Call POST /memory/search (fast or balanced) with project, optional topic_path, and include_grounding=true.
- If the response includes continuation_async, read partials immediately and either:
  - subscribe to GET /memory/search/continuations/{token}/events, or
  - retry with blocking=true (or sync_slow_sources=true) and keep a longer caller timeout.
- Use POST /memory/context-pack for broad synthesis and POST /v1/memory/neighbors for graph-neighbor exploration.
Lifecycle-aware local helper:
./scripts/agent_orchestration.sh search-lifecycle \
"profitability tuning baseline ladder" \
contextlattice \
deep \
wait
Codex-first preflight helper:
./scripts/agent_orchestration.sh preflight contextlattice runbooks/codex-integration
# If the agent is not running from repo root:
REPO_ROOT="$(git rev-parse --show-toplevel 2>/dev/null || pwd)"
python3 "$REPO_ROOT/scripts/agent_orchestration.py" preflight contextlattice runbooks/codex-integration
Profile-aware preflight helpers:
./scripts/agent_orchestration.sh preflight-agent claude-code contextlattice
./scripts/agent_orchestration.sh preflight-agent opencode contextlattice
./scripts/agent_orchestration.sh preflight-agent hermes-agent contextlattice
ORCH_KEY="$(awk -F= '/^CONTEXTLATTICE_ORCHESTRATOR_API_KEY=/{print substr($0,index($0,"=")+1)}' .env)"
curl -fsS -H "content-type: application/json" -H "x-api-key: ${ORCH_KEY}" \
-d '{"agent":"chatgpt-web","project":"contextlattice"}' \
http://127.0.0.1:8075/v1/agents/preflight | jq
- The canonical runtime endpoint is http://127.0.0.1:8075; Python helpers are compatibility shims for operator scripts only.
- Helper client: scripts/contextlattice_client.py (legacy shim: scripts/orchestrator_helper.py).
- Tool calls are allowed by default (GO_TOOL_CALLS_ALLOW_ALL=true) to prevent startup friction.
- CONTEXTLATTICE_ORCHESTRATOR_API_KEY: orchestrator/admin lane.
- CONTEXTLATTICE_WORKER_API_KEY: worker lane.
- GO_TOOL_CALLS_ROLE_SPLIT_AUTO=true enables role split automatically only when both keys are present and distinct.
- Example worker-lane policy: allow capability_map,ops_queue_status; deny memory_write_batch,feedback_submit.
Agent-specific template blocks:
- docs/public_overview/templates/agents/universal.md (canonical contract for all agents)
- docs/public_overview/templates/agents/codex.md
- docs/public_overview/templates/agents/claude-code.md
- docs/public_overview/templates/agents/opencode.md
- docs/public_overview/templates/agents/hermes-agent.md
- docs/public_overview/templates/agents/chatgpt-web-desktop.md
- docs/public_overview/templates/agents/claude-web-desktop.md
Agent profile defaults source: config/agents/agent_profiles.json
Context Lattice can queue and route tasks to external runners (Codex, OpenCode, Claude Code) and still supports internal application workers.
- Set agent to the external runner id (codex, opencode, claude-code, or any custom worker name).
- Set agent=internal or leave unassigned (agent empty / any) for orchestrator workers.
ORCH_KEY="$(awk -F= '/^CONTEXTLATTICE_ORCHESTRATOR_API_KEY=/{print substr($0,index($0,"=")+1)}' .env)"
# 1) Create a task targeted to any external runner id.
curl -fsS -X POST http://127.0.0.1:8075/agents/tasks \
-H "content-type: application/json" \
-H "x-api-key: ${ORCH_KEY}" \
-d '{
"title":"summarize deployment notes",
"project":"default",
"agent":"codex",
"priority":3,
"payload":{
"action":"memory_search",
"query":"deployment notes",
"project":"default",
"limit":8
}
}'
# 2) Runner claims only tasks assigned to its worker id (plus unassigned/any tasks).
curl -fsS -X POST "http://127.0.0.1:8075/agents/tasks/next?worker=codex" \
-H "x-api-key: ${ORCH_KEY}"
# 3) Runner reports completion.
curl -fsS -X POST http://127.0.0.1:8075/agents/tasks/<TASK_ID>/status \
-H "content-type: application/json" \
-H "x-api-key: ${ORCH_KEY}" \
-d '{"status":"succeeded","message":"completed by external runner","metadata":{"worker":"codex"}}'
- 100+ messages/second for typical memory payloads on modern laptop-class hardware.
- Durable qdrant/mindsdb/letta fanout.
- Memory-bank retrieval is on by default (ORCH_RETRIEVAL_MEMORY_BANK_DEFAULT_ENABLED=true) with default shodh_spike, deterministic fallback chain surrealdb_spike,memvid_spike,icm_spike,quickwit_spike, and chain breadth cap (ORCH_MEMORY_BANK_SPIKE_MAX_CHAIN_BACKENDS=3) for RAM-safe operation.
Memory-bank profiles:
- balanced (default): shodh_spike with deterministic fallback chain, capped to 3 backends.
- low-ram: icm_spike only, chain cap 1, hedge disabled.
- quality-hedge (opt-in): 2-way parallel hedge across shodh_spike,surrealdb_spike.
Preset details: docs/private/cutover/memory-bank-b2-b3-presets-2026-03-31.md.
v3.3 (public) and v4 (private) are intentionally different lanes:
| Area | Public v3.3 | Private v4 |
|---|---|---|
| Runtime frontdoor | gateway-go on :8075 | gateway-go on :8075 |
| Fallback lane | Python orchestrator on :18075 | Python orchestrator on :18075 |
| Rust/Go posture | Enabled by default | Enabled by default |
| Retrieval policy | staged fast-return + async slow continuation | staged + aggressive adaptive experiments |
| Memory-bank default | shodh_spike (with bounded fallback chain) | shodh_spike with deterministic fallback chain and optional hedge mode |
| Release intent | stable public baseline | experimental/tuning lane behind hard gates |
| Promotion rule | benchmark + parity proof in release notes | benchmark + parity + operational soak before public sync |
Telemetry routing/cleanup toggles:
ORCH_MEMORY_BANK_TELEMETRY_GUARD_ENABLED=true
ORCH_MEMORY_BANK_TELEMETRY_TOPIC_PREFIXES=telemetry,metrics,signals,overrides
ORCH_MEMORY_BANK_TELEMETRY_MARKERS=telemetry,metrics,__state__,__stats__,__snapshots__,__health__,__allocations__,_agg-,queue__
ORCH_QDRANT_TELEMETRY_GUARD_ENABLED=true
ORCH_MINDSDB_TELEMETRY_GUARD_ENABLED=true
ORCH_LETTA_TELEMETRY_GUARD_ENABLED=true
MINDSDB_LOW_VALUE_RETENTION_HOURS=48
Live A/B benchmark on POST /memory/search using bench/phase1_runtime_comparison.py with 8 requests and 20s timeout:
- Rust/Go on (USE_RUST_* = true, USE_GO_ORCHESTRATOR = true): mean 3557ms, p50 2334ms, p95 8494ms, errors 0/8
- Rust/Go off (USE_RUST_* = false, USE_GO_ORCHESTRATOR = false): mean 17565ms, p50 20006ms, p95 20008ms, errors 7/8 (timeouts)
- Result: mean 4.94x faster (about 5x), p50 8.57x faster, p95 2.36x faster
Artifacts:
- bench/results/phase1_ab_rustgo_on_fast_20260304T182812Z.json
- bench/results/phase1_ab_rustgo_off_fast_20260304T182916Z.json
V3 is focused on application efficacy, not speed in isolation:
Roadmap documents:
- docs/v3-roadmap.md
- docs/perf-candidate-notes/ultra_db_stack_recommendation_2026-03-16.md
- https://contextlattice.io/roadmap.html
Program graph:
V3 Objective: Context Efficacy at Scale
├─ Track A (Issues #69 + #72): performance + deep-read stability
├─ Track B (Issues #70 + #72): recall quality + memory semantics
└─ Track C (Issues #68 + #71): runner interop + compute backend
-> unified security/benchmark/recall gates -> staged cutover
The orchestrator now runs Rust+Go as the default runtime path. Python remains in place as a legacy fallback when a proxy is unavailable.
- Codec, MemoryStore, Retriever, Scheduler, StateDelta
- GET /migration/runtime
- USE_RUST_CODEC
- USE_RUST_MEMORY
- USE_RUST_RETRIEVAL
- ORCH_RUST_RETRIEVAL_VECTOR_BACKEND (auto|qdrant_remote|usearch_ann)
- ORCH_RUST_RETRIEVAL_LEXICAL_BACKEND (auto|none|tantivy_lexical)
- ORCH_RUST_RETRIEVAL_BACKEND_STRICT
- ORCH_MEMORY_BANK_SEARCH_BACKEND (native|disabled|meilisearch_spike|quickwit_spike|tantivy_spike|lancedb_spike|trieve_spike|helixdb_spike|icm_spike|shodh_spike|memvid_spike|surrealdb_spike)
- ORCH_MEMORY_BANK_SPIKE_FALLBACK_BACKEND
- ORCH_MEMORY_BANK_SPIKE_FALLBACK_BACKENDS
- ORCH_MEMORY_BANK_SPIKE_MAX_CHAIN_BACKENDS
- ORCH_MEMORY_BANK_SPIKE_HEDGE_ENABLED
- ORCH_MEMORY_BANK_SPIKE_HEDGE_MAX_PARALLEL
- ORCH_MEMORY_BANK_SPIKE_HEDGE_BACKENDS
- ORCH_MEMORY_BANK_SPIKE_HTTP_URL
- MEMORY_BANK_SPIKE_RS_MEILI_URL
- MEMORY_BANK_SPIKE_RS_MEILI_INDEX
- MEMORY_BANK_SPIKE_RS_MEILI_TASK_TIMEOUT_SECS
- GO_RETRIEVAL_LEXICAL_GUARD_ENABLED
- GO_RETRIEVAL_LEXICAL_GUARD_MIN_COVERAGE
- GO_RETRIEVAL_LEXICAL_GUARD_MIN_RESULTS
- ORCH_RETRIEVAL_SYNC_ASYNC_MIN_FAST_RESULTS_BY_MODE (JSON map, e.g. {"fast":1,"balanced":2,"deep":3})
- GO_RETRIEVAL_DISABLE_SYNC_SLOW_FALLBACK
- GO_RETRIEVAL_SLOW_SYNC_TIMEOUT_CAP_SECS
- GO_RETRIEVAL_RUST_LANE_PROMOTION_ENABLED
- GO_RETRIEVAL_TOPIC_PREFILTER_ENABLED
V4 stack reference:
- docs/perf-candidate-notes/v4_stack_and_rust_exploration_plan_2026-03-16.md
- USE_GO_ORCHESTRATOR
- CONTEXTLATTICE_ENGINE_MODE (embedded or service)
- CONTEXTLATTICE_ENGINE_URL
- CONTEXTLATTICE_GO_ORCHESTRATOR_URL
- MIGRATION_SHADOW_DUAL_RUN
- MIGRATION_CANARY_ENABLED
Migration scaffolding:
- crates/context_codec, crates/context_engine, crates/context_retrieval
- proto/contextlattice_engine.proto
- services/orchestrator-go, services/gateway-go
- docs/engine-api.md, docs/migration-phase-status.md
Default cutover toggles:
USE_RUST_CODEC=true
USE_RUST_MEMORY=true
USE_RUST_RETRIEVAL=true
USE_GO_ORCHESTRATOR=true
CONTEXTLATTICE_ENGINE_MODE=service
CONTEXTLATTICE_ENGINE_URL=http://contextlattice-orchestrator:8075
CONTEXTLATTICE_GO_ORCHESTRATOR_URL=http://orchestrator-go:8090
MIGRATION_SHADOW_DUAL_RUN=true
MIGRATION_CANARY_ENABLED=true
Rollback/legacy toggles (temporary fallback only):
USE_RUST_CODEC=false
USE_RUST_MEMORY=false
USE_RUST_RETRIEVAL=false
USE_GO_ORCHESTRATOR=false
Pathway cache backend modes:
- ORCH_RETRIEVAL_PATHWAY_CACHE_BACKEND=memory (in-memory only)
- ORCH_RETRIEVAL_PATHWAY_CACHE_BACKEND=redis (read/write Redis backend)
- ORCH_RETRIEVAL_PATHWAY_CACHE_BACKEND=redis_mirror (write-through mirror only; read path stays in-memory)
Dashboard retrieval observability:
- contextlattice-dashboard status page now includes a retrieval flow panel (including /v1/memory/get).
Balanced compose launcher:
- scripts/compose_v4_balanced.sh now keeps observability enabled by default.
- Pass --without-observability only when you intentionally want a lighter runtime.
Console + paid-public endpoint verification:
- Run scripts/check_paid_public_endpoints.sh after UI/API route changes.
Inference and security defaults:
- Default local model (qwen3.5:9b via Ollama).
- Provider modes: ollama, auto:ollama/coreml (ORCH_INFER_PROVIDER=auto + ORCH_ANE_SIDECAR_ENABLED=true) with automatic fallback to Ollama.
- SECRETS_STORAGE_MODE=redact redacts secret-like material before memory persistence/fanout.
- SECRETS_STORAGE_MODE=block rejects writes containing secret-like material (422).
- SECRETS_STORAGE_MODE=allow stores write payloads as-is (operator opt-in).
- Default bind: HOST_BIND_ADDRESS=127.0.0.1.
- API key: CONTEXTLATTICE_ORCHESTRATOR_API_KEY.
Enforce PR-only merges on main with CODEOWNERS approval (.github/CODEOWNERS is * @sheawinkler):
scripts/enable_main_branch_protection.sh main 1
If GitHub returns "Upgrade to GitHub Pro or make this repository public", switch the repo visibility or plan, then rerun the command.
# optional IronClaw bridge
IRONCLAW_INTEGRATION_ENABLED=true
IRONCLAW_DEFAULT_PROJECT=messaging
# strict secret guard for openclaw/zeroclaw/ironclaw messaging surfaces
MESSAGING_OPENCLAW_STRICT_SECURITY=true
Ingress endpoints:
- POST /integrations/messaging/openclaw
- POST /integrations/messaging/ironclaw
- POST /integrations/messaging/command
- @ContextLattice task create|status|list|approve|replay|deadletter|runtime
Memory and tool endpoints:
- POST /memory/write
- POST /memory/search
- POST /memory/context-pack
- POST /v1/memory/neighbors
- GET|POST /v1/skills/quarantine/search
- POST /v1/skills/quarantine/reindex (opt-in; disabled by default)
- GET|POST /v1/skills/index/search (alias)
- POST /v1/skills/index/reindex (alias; opt-in)
- GET /memory/search/continuations/{token}/events
- POST /tools/feedback_submit
- GET|POST /tools/skills_quarantine_search
- POST /tools/skills_quarantine_reindex (opt-in; disabled by default)
- GET|POST /tools/skills_index_search (alias)
- POST /tools/skills_index_reindex (alias; opt-in)
Integration endpoints:
- POST /integrations/messaging/command
- POST /integrations/messaging/openclaw
- POST /integrations/messaging/ironclaw
- POST /integrations/telegram/webhook
- POST /integrations/slack/events
Task endpoints:
- POST /agents/tasks
- GET /agents/tasks
- GET /agents/tasks/runtime
- GET /agents/tasks/deadletter
- POST /agents/tasks/{task_id}/replay
- POST /agents/tasks/recover-leases
Telemetry and maintenance endpoints:
- GET /telemetry/memory
- GET /telemetry/fanout
- POST /telemetry/fanout/letta/auto-prune/run
- GET /telemetry/retention
- POST /telemetry/retention/run
- POST /maintenance/telemetry/purge
Task workers and generic agent runners now execute a context-expansion loop by default:
- POST /memory/context-pack preflight.
- L0 factual snippets
- L1 topic rollups
- L2 raw file refs for detail dives
- TASK_TOOL_CONTEXT_SLICES.
- agent/checkpoints fallback).
Tune with:
CONTEXT_EXPANSION_ENABLED=true
CONTEXT_EXPANSION_L0_BUDGET_TOKENS=1200
CONTEXT_EXPANSION_L1_BUDGET_TOKENS=800
CONTEXT_EXPANSION_L2_BUDGET_TOKENS=400
CONTEXT_EXPANSION_DEEP_ESCALATION_ENABLED=true
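The L0/L1/L2 budgets above bound how much of each layer a worker packs into task context. A sketch of greedy budget-bounded assembly; whitespace word counting stands in for a real tokenizer, and the function is illustrative only:

```python
def pack_layer(snippets, budget_tokens):
    """Greedily take snippets in order until the layer's token budget
    (e.g. CONTEXT_EXPANSION_L0_BUDGET_TOKENS) would be exceeded."""
    packed, used = [], 0
    for snippet in snippets:
        cost = len(snippet.split())  # crude token estimate for the sketch
        if used + cost > budget_tokens:
            break
        packed.append(snippet)
        used += cost
    return packed

l0 = pack_layer(["fact one", "fact two is longer here", "fact three"],
                budget_tokens=7)
print(l0)  # first two fit (2 + 5 = 7 tokens); the third would overflow
```

Deep escalation (CONTEXT_EXPANSION_DEEP_ESCALATION_ENABLED) would then only fire when the packed layers leave the task under-grounded.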
ContextLattice now exposes quarantined-skill candidate discovery as a native Go route. This lane is read-only discovery and does not auto-load any quarantined skills.
- GET|POST /v1/skills/quarantine/search
- GET|POST /tools/skills_quarantine_search
- GET|POST /v1/skills/index/search
- GET|POST /tools/skills_index_search
- POST /v1/skills/quarantine/reindex (off by default; enable explicitly)
Runtime knobs:
ORCH_SKILLS_QUARANTINE_ENABLED=true
ORCH_SKILLS_QUARANTINE_HOST_BIN_DIR=${HOME}/.local/bin
ORCH_SKILLS_QUARANTINE_HOST_ROOT_DIR=${HOME}/.codex/skills_quarantine
ORCH_SKILLS_QUARANTINE_SEARCH_CMD=/opt/contextlattice/skills/bin/codex-skills-quarantine-search
ORCH_SKILLS_QUARANTINE_REINDEX_CMD=/opt/contextlattice/skills/bin/codex-skills-quarantine-reindex
ORCH_SKILLS_QUARANTINE_TIMEOUT_SECS=8
ORCH_SKILLS_QUARANTINE_DEFAULT_LIMIT=20
ORCH_SKILLS_QUARANTINE_MAX_LIMIT=100
ORCH_SKILLS_QUARANTINE_REINDEX_ENABLED=false
CODEX_SKILLS_QUARANTINE_ROOT=/opt/contextlattice/skills_quarantine
CODEX_SKILLS_QUARANTINE_INDEX_DIR=/opt/contextlattice/skills_quarantine/index
CODEX_SKILLS_QUARANTINE_INDEX=/opt/contextlattice/skills_quarantine/index/skills_index.jsonl
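The index file above is JSONL. A sketch of the read-only discovery lane over such an index; the record fields (`name`, `description`) are assumptions about the schema, and nothing here loads or executes a quarantined skill:

```python
import json, tempfile

def search_quarantine_index(index_path, query, limit=20, max_limit=100):
    """Scan a JSONL skills index and return matching records, capped like
    ORCH_SKILLS_QUARANTINE_DEFAULT_LIMIT / ORCH_SKILLS_QUARANTINE_MAX_LIMIT."""
    limit = min(limit, max_limit)
    q = query.lower()
    hits = []
    with open(index_path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            rec = json.loads(line)
            haystack = f"{rec.get('name', '')} {rec.get('description', '')}".lower()
            if q in haystack:
                hits.append(rec)
                if len(hits) >= limit:
                    break
    return hits

# Hypothetical two-record index for illustration.
with tempfile.NamedTemporaryFile("w", suffix=".jsonl", delete=False) as f:
    f.write(json.dumps({"name": "pdf-extract", "description": "pull text from PDFs"}) + "\n")
    f.write(json.dumps({"name": "csv-merge", "description": "join CSV files"}) + "\n")
    idx = f.name

print([r["name"] for r in search_quarantine_index(idx, "pdf")])  # → ['pdf-extract']
```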
- docs/releases/v3.2.13.md (Glama-lite sqlite acceleration lane + capability detection)
- docs/releases/v3.2.3.md (final install/deployment docs alignment for staged runtime lanes)
- docs/releases/v3.2.2.md (README/website graphics + runtime ownership alignment)
- docs/releases/v3.2.1.md (config canonicalization + Python fallback audit)
- docs/releases/v3.2.0.md (public V3 Go-first cutover; Python removed from primary read path; includes A/B benchmark)
- docs/releases/v3.1.0.md (post-v3.0.0 public, non-V4 integration/runtime updates)
- docs/audits/python_fallback_audit_v3.2.1.md (fallback-critical vs utility Python validation)
- docs/perf-baseline.md
- docs/migration-plan.md
- docs/migration-interfaces.md
- bench/README.md
- docs/public_overview/README.md
- docs/legal/README.md
Pre-submit verifier:
gmake submission-preflight
python3 scripts/submission_preflight.py --online
gmake launch-lock
gmake launch-lock-public
This repository (sheawinkler/ContextLattice) is the primary codebase.
Public landing collateral publishes from sheawinkler/ContextLattice branch gh-pages.
- docs/public_overview/scripts/sync_public_overview.sh
- https://contextlattice.io/
- https://sheawinkler.github.io/ContextLattice/
- sheawinkler/memmcp-overview is archived and not used for live hosting.
Business Source License 1.1 with change-date transition to Apache-2.0.
Additional Use Grant allows personal/non-production and internal production use
up to 2M JSON-RPC requests/month/organization; usage outside grant requires a
separate commercial license. See LICENSE and docs/legal/README.md.
Add this to claude_desktop_config.json and restart Claude Desktop.
{
"mcpServers": {
"contextlattice": {
"command": "npx",
"args": []
}
}
}