Self-improving documentation engine that ingests Slack and GitHub PRs to detect knowledge gaps and improve documentation automatically.
Stop writing docs after the fact. Capture knowledge where it happens.
DocBrain intercepts knowledge at the moment of creation — from PRs, Slack threads, CI pipelines, and IDE sessions — then scores, reviews, and publishes it before anyone has to ask "where's the doc for this?"
Website • Docs • Quickstart • The Problem • How It Works • Features • Architecture • Security
Project Status: DocBrain is currently distributed as pre-built Docker images and deployment artifacts (Helm charts, configuration, documentation). Source code is not yet published. Contributions are welcome for documentation, configuration, and bug reports.
You know this cycle. Every engineering team does.
Monday: Senior engineer explains the retry logic in a PR review. Three people learn it. The knowledge lives in a GitHub comment thread that nobody will ever find again.
Wednesday: New hire asks "how do I deploy to staging?" in Slack. Someone writes a 4-paragraph answer. It's accurate today. In three months it'll be wrong, and nobody will update it.
Friday: Incident war room. The team discovers that the runbook is 6 months stale. The person who wrote it left the company. Tribal knowledge saves the day, but only because the right people were online.
Next quarter: Leadership says "we need to invest in documentation." You schedule a doc sprint. Engineers write docs for two weeks. Six months later, 40% of those docs are stale. The ones that aren't stale are the ones nobody needed to change because nothing changed.
The root cause isn't laziness. It's timing.
Documentation written after the work is done is documentation written from memory, without context, under competing priorities. It's a tax that nobody wants to pay — and when they do pay it, the result decays immediately.
Every tool in the market solves the wrong problem. They index your existing docs and build a chatbot on top. Great — now you have a chatbot that surfaces your stale, incomplete, scattered documentation slightly faster.
The actual problem is that the knowledge was never captured in the first place.
DocBrain doesn't wait for someone to write a doc. It intercepts knowledge at the point of creation and turns it into documentation automatically. We call this shift-left documentation — the same principle that made shift-left testing work. Move the capture upstream, to where the knowledge actually exists.
WHERE KNOWLEDGE IS CREATED
─────────────────────────
Developer merges a PR ──→ DocBrain extracts decisions, caveats, procedures
Team discusses in Slack ──→ DocBrain distills fragments from the conversation
CI pipeline deploys ──→ DocBrain captures deployment context and changes
Engineer codes in IDE ──→ DocBrain links knowledge to the exact code location
On-call resolves incident ──→ DocBrain captures resolution steps and root cause
│
▼
HOW KNOWLEDGE BECOMES DOCS
─────────────────────────
┌─────────┐ ┌──────────┐ ┌───────────┐ ┌──────────┐
│ Capture │───→│ Quality │───→│ Cluster & │───→│ Review & │
│ & Route │ │ Score │ │ Compose │ │ Publish │
└─────────┘ └──────────┘ └───────────┘ └──────────┘
Confidence-based 3-layer scoring Similar fragments Multi-stage
routing: auto-index (structural + grouped by DBSCAN approval with
high-confidence, style + semantic) → auto-composed threaded comments
queue low for review, on every fragment into full docs → published
discard noise and document when cluster ready to your wiki
This is what makes DocBrain different. Other tools index existing docs and answer questions about them. DocBrain captures the knowledge that was never written down — the PR decisions, the Slack explanations, the deployment gotchas, the incident resolutions — and turns it into documentation that meets your team's quality standards.
The result: documentation that's born from real work, not written from memory. Documentation that's quality-scored the moment it exists, not left to rot. Documentation that gets better as your team works, not worse.
git clone https://github.com/docbrain-ai/docbrain.git && cd docbrain
./scripts/setup.sh # interactive wizard — picks provider, sets keys, starts services
Or manually:
cp .env.example .env # set LLM_PROVIDER and API keys
docker compose up -d
# Get the auto-generated admin API key
docker compose exec server cat /app/admin-bootstrap-key.txt
# Open the web dashboard
open http://localhost:3001
# Or ask a question via API
curl -H "Authorization: Bearer <key>" \
-H "Content-Type: application/json" \
-d '{"question":"How do I deploy to production?"}' \
http://localhost:3000/api/v1/ask
The Web UI at http://localhost:3001 gives you the full experience — dashboard, knowledge capture, governance, quality scores, review workflows, predictive analytics, and more. Full setup guide: docs/quickstart.md
docbrain_annotate, docbrain_suggest_capture, and docbrain_commit_capture via MCP — works in Claude Code, Cursor, and any MCP-compatible editor.
POST /api/v1/quality/lint plugs into any CI pipeline.
/api/docs: auto-generated OpenAPI 3.1 spec, 150+ API endpoints.
The core of DocBrain: every integration point captures knowledge where it's created — before anyone has to remember to document it.
| Capture Point | How It Works |
|---|---|
| Merged PRs | POST /api/v1/ci/analyze — LLM extracts decisions, facts, caveats, and procedures from diffs and commit messages. Hook it into GitHub Actions or GitLab CI. |
| Deployments | POST /api/v1/ci/deploy-capture — Captures deployment context, environment changes, and rollback procedures. |
| Slack & Teams | Capture threads via message shortcut, @DocBrain capture mention, or /docbrain capture — distills conversations into knowledge fragments with confidence scoring. |
| IDE (MCP) | docbrain_annotate links knowledge to exact code locations. docbrain_commit_capture captures intent at commit time. 10 MCP tools total. |
| Conversations | Auto-distillation extracts fragments from Q&A sessions. When someone asks a question and gets a good answer, that answer becomes a fragment automatically. |
| Manual | POST /api/v1/fragments — Teams can submit fragments directly. CLI: docbrain capture. |
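As a sketch, a CI job could post a merged PR to the analyze endpoint like this. The payload field names (repo, commit_message, diff) are illustrative assumptions, not the documented schema:

```shell
# Sketch: send a merged PR to DocBrain for extraction from a CI job.
# The payload field names below are illustrative assumptions.
PAYLOAD='{"repo":"acme/api","commit_message":"add retry backoff","diff":"--- a/retry.rs ..."}'
# Sanity-check the JSON locally before wiring it into the pipeline:
echo "$PAYLOAD" | python3 -m json.tool > /dev/null && echo "payload valid"
# In CI, post it to the documented endpoint:
# curl -sf -H "Authorization: Bearer $DOCBRAIN_KEY" \
#      -H "Content-Type: application/json" \
#      -d "$PAYLOAD" "$DOCBRAIN_URL/api/v1/ci/analyze"
```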
What happens after capture: Every fragment is confidence-scored and routed automatically:
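For illustration, confidence routing can be pictured as a threshold check. The 0.8 / 0.4 cutoffs below are invented for this sketch; DocBrain's real thresholds may differ:

```shell
# Illustration of confidence routing. The 0.8 / 0.4 cutoffs are
# invented for this sketch, not DocBrain's actual values.
score=0.85
route=$(awk -v s="$score" 'BEGIN {
  r = (s >= 0.8) ? "auto-index" : ((s >= 0.4) ? "review-queue" : "discard")
  print r
}')
echo "fragment routed to: $route"
```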
Every fragment and document is scored across three independent layers — no unscored content enters the system:
| Layer | Method | What It Measures |
|---|---|---|
| Structural | Deterministic (no LLM cost) | Heading structure, section completeness, code examples, link density, readability |
| Style | Rule engine | Banned terms, heading depth, sentence length, required sections, custom regex |
| Semantic | LLM-assessed (budget-controlled) | Accuracy, clarity, completeness, actionability |
Composite score: structural × 0.4 + style × 0.3 + semantic × 0.3
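A worked example of the composite formula, with made-up layer scores of 0.82 (structural), 0.70 (style), and 0.90 (semantic):

```shell
# Worked example with made-up layer scores:
#   structural 0.82, style 0.70, semantic 0.90
composite=$(awk 'BEGIN { printf "%.3f", 0.82*0.4 + 0.70*0.3 + 0.90*0.3 }')
echo "$composite"
```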
Quality scores drive automation: low-scoring docs trigger maintenance suggestions, stale docs trigger freshness alerts, and contradictions between docs are flagged automatically.
Every team has a style guide. Nobody follows it. DocBrain enforces it on every document and draft:
# Export your rules as YAML, version-control them, import across spaces
- rule_type: terminology
name: no-simple
description: "Don't assume expertise — avoid 'simple' and 'easy'"
config:
wrong: "simple"
right: "straightforward"
match_whole_word: true
severity: warning
- rule_type: formatting
name: short-sentences
description: "Keep sentences under 40 words for readability"
config:
max_words: 40
severity: info
- rule_type: structure
name: require-intro
description: "Every doc needs an introduction before the first heading"
config:
min_words_before_first_heading: 10
severity: warning
- rule_type: custom_pattern
name: no-internal-urls
description: "Don't leak internal URLs in public docs"
config:
pattern: "https?://internal\\."
message: "Remove internal URL before publishing"
severity: error
Four rule types: terminology (banned/preferred terms), formatting (heading depth, sentence length), structure (required sections, intro paragraphs), and custom_pattern (regex for anything else).
Per-space scoping: Different rules for API docs vs. runbooks vs. onboarding guides.
YAML import/export: Version-control your rules. GET /api/v1/style-rules/export → commit to git → POST /api/v1/style-rules/import on deploy.
Lint any text on demand: POST /api/v1/quality/lint with raw text → get violations with line numbers, severity, and fix suggestions. Wire it into CI to block PRs that break your style guide.
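A sketch of what gating CI on the lint response could look like. The response shape here (a "violations" array with per-item "severity") is an assumption based on the description above:

```shell
# Sketch of a CI gate on lint results. The response shape below
# (a "violations" array with per-item "severity") is an assumption.
RESPONSE='{"violations":[{"line":3,"severity":"error","message":"Remove internal URL before publishing"}]}'
ERRORS=$(echo "$RESPONSE" | python3 -c 'import json,sys; v = json.load(sys.stdin)["violations"]; print(sum(1 for x in v if x["severity"] == "error"))')
echo "error-severity violations: $ERRORS"
# In CI: exit non-zero when $ERRORS is greater than 0 to block the PR.
```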
Documentation without ownership decays. DocBrain makes ownership and accountability explicit:
See Governance Guide for setup and configuration.
Configurable multi-stage review pipelines for documentation drafts:
See Review Workflows Guide for configuration and API details.
The autonomous documentation engine that finds and fills gaps without human intervention:
See Autopilot Guide for configuration and tuning.
The full journey from captured knowledge to published documentation:
Capture → Confidence routing → Auto-index / Review queue / Discard
│
Semantic clustering (DBSCAN on embeddings)
│
Auto-composition when cluster is ready
(3+ fragments, 2+ sources, shared topic)
│
Quality scoring (structural + style + semantic)
│
Review workflow (configurable stages)
│
Published documentation
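The auto-composition trigger above can be sketched as a simple readiness check (illustrative logic only, not DocBrain's implementation):

```shell
# Illustrative check of the auto-composition trigger
# (3+ fragments from 2+ sources); this logic is a sketch.
fragments=4; sources=2
if [ "$fragments" -ge 3 ] && [ "$sources" -ge 2 ]; then
  status="compose"
else
  status="wait"
fi
echo "$status"
```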
DocBrain doesn't just document what exists — it predicts what's about to break:
See Knowledge Intelligence Guide for details.
Build a connector for any knowledge source in any language. DocBrain handles scheduling, retries, circuit breaking, and ingestion — your connector just serves three HTTP endpoints:
GET /health → { "status": "ok", "connector_name": "notion" }
POST /documents/list → Return document IDs (paginated, incremental via "since")
POST /documents/fetch → Return full document content for given source IDs
Register it in DocBrain, set a cron schedule, and every document flows through the same quality pipeline as built-in sources. Includes SSRF protection, circuit breaker (auto-disable after 5 failures), and incremental sync. Connector Protocol Docs →
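A self-contained smoke test of the /health contract: a stub connector serving the documented health response, probed once. The stub and port 8080 are illustrative; a real connector also implements the two POST routes:

```shell
# Self-contained smoke test of the /health contract. The stub and
# port 8080 are illustrative; a real connector also serves
# /documents/list and /documents/fetch.
python3 - <<'EOF' &
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class Health(BaseHTTPRequestHandler):
    def do_GET(self):
        body = json.dumps({"status": "ok", "connector_name": "notion"}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
    def log_message(self, *args):  # keep the demo quiet
        pass

HTTPServer(("127.0.0.1", 8080), Health).handle_request()  # serve one request
EOF
sleep 1
# Equivalent to: curl -s http://127.0.0.1:8080/health
RESP=$(python3 -c 'import urllib.request; print(urllib.request.urlopen("http://127.0.0.1:8080/health").read().decode())')
echo "$RESP"
```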
10 tools for Claude Code, Cursor, and any MCP-compatible editor:
docbrain_annotate — Link knowledge to exact code locations
docbrain_suggest_capture — AI suggests what to capture from your current context
docbrain_commit_capture — Capture intent and decisions at commit time
docbrain_ask — Query your knowledge base without leaving the IDE
Webhook events: fragment.captured, gap.detected, draft.created, sla.breached, quality.scored
DocBrain ships with a full web application — not a thin wrapper, but a complete management interface:
| Integration | Type |
|---|---|
| Slack | /docbrain ask, /docbrain incident, thread capture (shortcut or @DocBrain capture) |
| MCP (IDE) | 10 tools for Claude Code, Cursor, and any MCP-compatible editor |
| CLI | docbrain ask, docbrain login, docbrain capture, docbrain freshness |
| GitHub | PR capture via Actions or webhooks, discussion capture |
| GitLab | MR discussion capture, webhook-driven indexing |
| Jira | Issue and comment capture for decision tracking |
| Confluence | Bidirectional — ingest from Confluence, publish drafts back to Confluence |
| PagerDuty / OpsGenie | Incident resolution capture |
| HTTP Connector | Stateless protocol for custom source ingestion |
| OpenAPI | Swagger UI at /api/docs, auto-generated spec at /api/docs/openapi.json |
graph TB
subgraph "Capture Layer"
CI["CI/CD Pipelines"]
IDE["IDE (MCP)"]
SLACK["Slack / Teams"]
WEB["Web UI"]
CLI["CLI"]
API_EXT["External APIs"]
end
subgraph "DocBrain Server (Rust / Axum)"
FRAG["Fragment Router"]
QUAL["Quality Pipeline<br/><i>structural + style + semantic</i>"]
CLUST["Clustering Engine"]
COMP["Composition Engine"]
REV["Review Workflows"]
RAG["RAG Pipeline<br/><i>intent → search → memory → generate</i>"]
AUTO["Autopilot<br/><i>gap detection + draft generation</i>"]
GOV["Governance<br/><i>ownership + SLAs + notifications</i>"]
PRED["Predictive Intelligence<br/><i>cascade + seasonal + onboarding</i>"]
EVT["Event Bus + Webhooks"]
end
subgraph "Storage"
PG["PostgreSQL<br/><i>fragments · scores · workflows<br/>SLAs · memory · entities · events</i>"]
OS["OpenSearch<br/><i>vector (k-NN) + keyword (BM25)</i>"]
RD["Redis<br/><i>sessions · cache</i>"]
end
subgraph "LLM Providers"
PROVIDERS["Anthropic · OpenAI · Bedrock<br/>Ollama · Gemini · Vertex AI<br/>DeepSeek · Groq · Mistral · xAI<br/>Azure OpenAI · OpenRouter<br/>Together AI · Cohere"]
end
CI & IDE & SLACK & WEB & CLI & API_EXT --> FRAG
FRAG --> QUAL --> CLUST --> COMP --> REV
WEB & CLI & SLACK --> RAG
RAG & AUTO & GOV & PRED --> PG & OS
RAG & AUTO & COMP & QUAL --> PROVIDERS
EVT --> PG
GOV --> EVT
| Component | Technology | Role |
|---|---|---|
| API Server | Rust, Axum, Tower | HTTP/SSE, auth, RBAC, rate limiting |
| Quality Pipeline | Structural + Rule Engine + LLM | 3-layer document and fragment scoring |
| Fragment Engine | DBSCAN clustering, LLM composition | Capture, route, cluster, compose |
| Review System | Multi-stage state machine | Configurable approval workflows |
| Governance | SLA checker, breach detection | Ownership, accountability, notifications |
| RAG Pipeline | Hybrid search, 4-tier memory | Intent classification, generation |
| Autopilot | Gap analysis, severity scoring | Autonomous gap detection and draft generation |
| Predictive | Graph analysis, pattern detection | Cascade staleness, seasonal, onboarding |
| Storage | PostgreSQL 17, OpenSearch 2.19, Redis 7 | Metadata, vectors, sessions |
DocBrain runs entirely in your infrastructure. No data leaves your network unless you configure an external LLM provider.
YOUR NETWORK BOUNDARY
┌──────────────────────────────────────────────────────────────────┐
│ │
│ ┌─────────────┐ TLS + Bearer Token ┌────────────────-┐ │
│ │ Users │ ──────────────────────────▶ │ DocBrain │ │
│ │ (Browser, │ │ Server │ │
│ │ CLI, Slack,│ ◀────── JSON / SSE ──────── │ (Rust/Axum) │ │
│ │ MCP IDE) │ │ │ │
│ └─────────────┘ │ • RBAC (4 roles│ │
│ │ • Argon2 keys │ │
│ │ • Rate limiting│ │
│ │ • Audit logging│ │
│ └──┬──┬──┬──┬────┘ │
│ │ │ │ │ │
│ ┌──────────────────────────────────┘ │ │ │ │
│ ▼ ▼ ▼ │ │ │
│ ┌───────────────┐ ┌──────────────────┐ ┌────────────┐│ │ │
│ │ PostgreSQL │ │ OpenSearch │ │ Redis ││ │ │
│ │ │ │ │ │ ││ │ │
│ │ • Users/keys │ │ • Document │ │ • Sessions ││ │ │
│ │ • Episodes │ │ chunks + │ │ • Rate ││ │ │
│ │ • Fragments │ │ embeddings │ │ counters ││ │ │
│ │ • Gap clusters│ │ • BM25 + k-NN │ │ • Working ││ │ │
│ │ • Audit log │ │ hybrid search │ │ memory ││ │ │
│ └───────────────┘ └──────────────────┘ └────────────┘│ │ │
│ │ │ │
│ All storage is self-hosted. No credentials leave. │ │ │
│ │ │ │
│ ┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ - -│ │ │
│ OPTION A: LLM stays inside your network │ │ │
│ │ ┌───────────┘ │ │
│ ▼ │ │
│ │ ┌──────────────────┐ │ │
│ │ Ollama │ │ │
│ │ │ (local model) │ │ │
│ │ Nothing leaves. │ │ │
│ │ └──────────────────┘ │ │
│ └ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │─ ┘
└───────────────────────────────────────────────────────────│──────┘
│
OPTION B: LLM in your cloud account ───────────────│──────
│
┌──────────────────────────────────────────────┘
▼
┌────────────────────────┐ Only query text + relevant chunk
│ AWS Bedrock │ context is sent. Your cloud account,
│ Azure OpenAI │ your data policies, your encryption
│ Google Vertex AI │ keys. No data shared with third
└────────────────────────┘ parties.
OPTION C: Third-party LLM API ─────────────────────────────
│
▼
┌────────────────────────┐ Query text + relevant chunk context
│ Anthropic API │ sent via TLS. Subject to provider's
│ OpenAI API │ data policies. No bulk export —
│ Groq / Mistral / etc. │ only per-request context.
└────────────────────────┘
The LLM is required — it powers RAG, intent classification, quality scoring, and draft generation. You choose where it runs:
| Option | Data leaves your network? | Best for |
|---|---|---|
| Ollama (local) | No. Zero egress. | Air-gapped, regulated, maximum control |
| Bedrock / Azure / Vertex | Stays in your cloud account | Enterprise — your KMS, your VPC, your audit trail |
| Anthropic / OpenAI / etc. | Query + chunk context sent via TLS | Fastest setup, best model quality |
What data goes where:
| Data | Stays in your infra | Sent to LLM |
|---|---|---|
| Documents, embeddings, indexes | Yes (PostgreSQL + OpenSearch) | No |
| User queries | Yes (episodes table) | Yes — needed for answer generation |
| API keys, passwords | Yes (Argon2 hashed) | No |
| Chunk context for answers | Yes (OpenSearch) | Yes — relevant chunks only, not full corpus |
| Analytics, gap clusters, feedback | Yes (PostgreSQL) | No |
Security controls:
| Control | Implementation |
|---|---|
| Authentication | API keys with Argon2 hashing, OIDC/SSO (GitHub, GitLab, generic OIDC) |
| Authorization | 4-tier RBAC (Viewer → Editor → Analyst → Admin) enforced on every endpoint |
| Space isolation | Per-key allowed_spaces hard-filters search results — users only see their team's docs |
| Rate limiting | Per-key RPM limits with sliding window |
| Secrets | Keys shown once at creation, stored as hashes. Bootstrap key written to file with 0600 permissions |
| Audit | All admin actions logged with user, action, timestamp, and target |
| SQL injection | Compile-time verified parameterized queries (sqlx) — no string interpolation |
| Prompt injection | XML delimiter sanitization on all untrusted content entering LLM context |
| Webhook verification | HMAC-SHA256 signed payloads for inbound webhooks (Confluence, GitHub, GitLab) |
For the full threat model with 10 analyzed attack vectors and an operator security checklist, see THREAT_MODEL.md.
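Inbound webhook signing can be reproduced for local testing with a standard HMAC-SHA256 over the raw body. The secret and body below are made up, and the exact header and signature format DocBrain expects is an assumption; check the integration docs for the real scheme:

```shell
# Local sketch of HMAC-SHA256 webhook signing. The secret, body, and
# header format are made up; consult the integration docs for the
# exact scheme DocBrain verifies.
SECRET='whsec_example'
BODY='{"event":"fragment.captured"}'
SIG=$(printf '%s' "$BODY" | openssl dgst -sha256 -hmac "$SECRET" | awk '{print $NF}')
echo "X-Signature: sha256=$SIG"
```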
| Provider | Config |
|---|---|
| Anthropic | LLM_PROVIDER=anthropic |
| OpenAI | LLM_PROVIDER=openai |
| AWS Bedrock | LLM_PROVIDER=bedrock |
| Ollama | LLM_PROVIDER=ollama — 100% local, no data leaves your machine |
| Google Gemini | LLM_PROVIDER=gemini |
| Vertex AI | LLM_PROVIDER=vertex_ai |
| DeepSeek | LLM_PROVIDER=deepseek |
| Groq | LLM_PROVIDER=groq |
| Mistral | LLM_PROVIDER=mistral |
| xAI (Grok) | LLM_PROVIDER=xai |
| Azure OpenAI | LLM_PROVIDER=azure_openai |
| OpenRouter | LLM_PROVIDER=openrouter |
| Together AI | LLM_PROVIDER=together |
| Cohere | LLM_PROVIDER=cohere |
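For example, a zero-egress deployment needs only the documented Ollama switch from the table above in your .env (any model-selection variables are covered in Provider Setup):

```shell
# .env sketch for a fully local, zero-egress setup:
LLM_PROVIDER=ollama
```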
See Provider Setup for detailed configuration including model selection guidance.
docker compose up -d
Starts the API server (localhost:3000), web UI (localhost:3001), PostgreSQL, OpenSearch, and Redis. Migrations run automatically on first boot.
helm install docbrain ./helm/docbrain \
--set llm.provider=anthropic \
--set llm.anthropicApiKey=sk-ant-...
See Kubernetes Guide for production configuration, scaling, and monitoring.
DocBrain uses a config-first architecture:
| File | Purpose |
|---|---|
| config/default.yaml | Non-secret defaults — all features, thresholds, intervals |
| config/local.yaml | Credentials and local overrides (gitignored) |
| .env | Infrastructure secrets: DATABASE_URL, LLM API keys |
Environment variables always override config files. See Configuration Guide.
| Guide | What it covers |
|---|---|
| Quickstart | Running locally in 5 minutes |
| Configuration | All environment variables and options |
| Provider Setup | LLM and embedding provider configuration |
| Architecture | System design, data flow, memory, freshness |
| Ingestion Guide | Connecting 13+ knowledge sources |
| External Connectors | Build custom connectors for any knowledge source |
| Governance | Ownership, SLAs, breach detection, dashboards |
| Review Workflows | Multi-stage approval pipelines |
| Knowledge Intelligence | Graph, analytics, predictive intelligence |
| Autopilot | Gap detection, draft generation, feedback loop |
| Learning Pipeline | Embedding fine-tuning (opt-in) |
| API Reference | Full REST API documentation |
| RBAC | Role-based access control and SSO |
| Slack Integration | Slash commands, message shortcuts, and thread capture |
| GitLab Capture | MR discussion indexing |
| Kubernetes | Helm chart deployment |
| | |
|---|---|
| What is DocBrain? — 5-min overview | Deep Dive Podcast — 20-min deep dive |
| MCP Preview — 30-sec IDE demo | Full Proof Demo — Downvote → Gap → Draft |
We welcome contributions. Since source code is not yet published, current contributions focus on documentation, configuration, and feedback. See Contributing Guide.
To report a security vulnerability, see SECURITY.md. Do not file a public issue.
Business Source License 1.1 (BSL 1.1). Production use is permitted, except offering DocBrain as a hosted service. Converts to Apache 2.0 on the earlier of January 1, 2028, or 5,000 GitHub stars. For alternative licensing: [email protected].
Contributor Covenant Code of Conduct. Report concerns to [email protected].
Add this to claude_desktop_config.json and restart Claude Desktop.
{
"mcpServers": {
"docbrain": {
"command": "npx",
"args": []
}
}
}