An observational memory server that uses LLM agents to capture, compress, and recall project-specific decisions and context across AI coding sessions. It stores structured observations in SQLite to maintain long-term session continuity and architectural awareness.
Code-aware observational memory MCP server for AI coding assistants.
AI coding assistants (Claude Code, Cursor, Windsurf) suffer from session amnesia — context is lost on compaction or between sessions. Existing solutions are either framework-locked (Mastra's Observational Memory requires their agent framework) or simplistic (mcp-memory-keeper stores key-value pairs without intelligent compression).
codewatch-memory is an MCP-native server that implements observational memory specifically for coding workflows. It uses cheap LLMs (Groq, Gemini Flash) as Observer/Reflector agents to compress conversation context into a structured observation log, stored in SQLite, scoped per git branch.
┌─────────────────────────────────────────────────────────────────┐
│                 Claude Code / Cursor / Windsurf                  │
│                          (MCP Client)                            │
└──────┬──────────────────┬──────────────────────┬────────────────┘
       │                  │                      │
 [Stdout/Stdin]  [Hook: Stop/PreCompact]  [Hook: UserPromptSubmit
  (MCP Tools)     (save observations)      /SessionStart]
       │                  │                  (recall context)
       v                  v                      │
┌────────────┐   ┌──────────────────┐            v
│ MCP Server │   │ Hook Subprocess  │   ┌──────────────────┐
│(stdio mode)│   │   --hook mode    │   │ Recall Subprocess│
└─────┬──────┘   └────────┬─────────┘   │  --recall mode   │
      │                   │             └────────┬─────────┘
      │ 5 Tools           │ Transcript           │ FTS5 search
      │ observe/recall/   │ parsing &            │ keyword extraction
      │ reflect/          │ observation          │ (no LLM, ~50ms)
      │ get_session_info  │ extraction           │
      │ switch_context    │                      │
      v                   v                      v
┌──────────────────────────────────────────────────────┐
│                     Agents Layer                     │
│   ┌──────────┐   ┌──────────┐   ┌──────────────┐     │
│   │ Observer │   │Reflector │   │ Categorizer  │     │
│   │ extract  │   │ compress │   │   classify   │     │
│   └────┬─────┘   └────┬─────┘   └──────┬───────┘     │
└────────┼──────────────┼────────────────┼─────────────┘
         │              │                │
         v              v                v
┌──────────────┐   ┌────────────────────────────┐
│ LLM Provider │   │      SQLite Database       │
│ Groq (free)  │   │  sessions / observations   │
│ Google       │   │  reflections / tasks       │
│ OpenAI       │   │  FTS5 full-text search     │
└──────────────┘   └────────────────────────────┘
Capture → Categorize → Store → Search → Compress
Three modes of operation:
Hook mode (--hook) — Fires automatically after every AI response (Stop event) and before context compaction (PreCompact). Reads the last 20 messages from the conversation transcript, sends them to a cheap LLM, and extracts structured observations. No manual tool calls needed.
Recall mode (--recall) — Fires on UserPromptSubmit (every user prompt) and SessionStart (session start/resume/compact). Searches stored observations via FTS5 keyword extraction and injects relevant context. Pure database queries, no LLM — adds ~50ms latency.
MCP server mode — Runs as an MCP server with 5 tools that the AI assistant can call directly for manual observation, recall, and compression.
Hook fires on Stop/PreCompact
→ Read last 20 transcript messages (JSONL)
→ Skip if < 50 tokens (trivial turn)
→ Skip if already processed (hash dedup)
→ Send to Observer LLM agent
→ Extract observations with priority + category
→ Store each in SQLite (FTS5 auto-indexed)
→ Update session stats
→ Check auto-reflect threshold (default 40K tokens)
→ If over: run Reflector with escalating compression
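In code, the save pipeline looks roughly like the sketch below. Every identifier here (`parseTranscript`, `observeWithLLM`, `storeObservation`, and so on) is illustrative, not the actual internals of src/hook.ts:

```ts
import { createHash } from "node:crypto";

// Assumed helpers, named for illustration only.
declare function parseTranscript(path: string): { content: string }[];
declare function alreadyProcessed(hash: string): Promise<boolean>;
declare function observeWithLLM(msgs: { content: string }[]): Promise<object[]>;
declare function storeObservation(obs: object): Promise<void>;
declare function sessionTokens(): Promise<number>;
declare function reflect(): Promise<void>;

const REFLECT_THRESHOLD = 40_000; // default auto-reflect trigger
const estimateTokens = (s: string) => Math.ceil(s.length / 4); // chars/4 heuristic

async function runStopHook(transcriptPath: string): Promise<void> {
  const messages = parseTranscript(transcriptPath).slice(-20); // last 20 messages
  const text = messages.map((m) => m.content).join("\n");
  if (estimateTokens(text) < 50) return; // trivial turn, skip

  const hash = createHash("sha256").update(text).digest("hex");
  if (await alreadyProcessed(hash)) return; // hash dedup, skip

  for (const obs of await observeWithLLM(messages)) {
    await storeObservation(obs); // SQLite insert; FTS5 index updated alongside
  }
  if ((await sessionTokens()) > REFLECT_THRESHOLD) {
    await reflect(); // Reflector runs with escalating compression
  }
}
```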
User types a prompt
→ UserPromptSubmit hook fires
→ Extract keywords from prompt (stop-word filtering, file paths, quoted phrases)
→ FTS5 search with OR query for broad recall
→ Fallback: individual keyword search → category heuristic
→ Inject matching observations as context (max ~1K tokens)
→ Claude sees relevant memories before processing the prompt
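A minimal sketch of that keyword extraction (the stop-word list and regexes are illustrative; the shipped logic lives in src/recall-hook.ts):

```ts
// Illustrative stop-word list; the real one is larger.
const STOP_WORDS = new Set(["the", "a", "an", "and", "is", "to", "of", "in", "it", "for"]);

function extractKeywords(prompt: string): string[] {
  const quoted = [...prompt.matchAll(/"([^"]+)"/g)].map((m) => m[1]); // quoted phrases
  const paths = prompt.match(/[\w./-]+\.\w{1,4}/g) ?? [];             // file paths
  const words = prompt
    .toLowerCase()
    .split(/\W+/)
    .filter((w) => w.length > 2 && !STOP_WORDS.has(w));
  return [...new Set([...quoted, ...paths, ...words])];
}

// Broad recall: any keyword may match (FTS5 OR query).
const toFtsQuery = (keywords: string[]) =>
  keywords.map((k) => `"${k.replace(/"/g, '""')}"`).join(" OR ");
```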
AI calls recall(query="authentication")
→ FTS5 full-text search on observations
→ Filter by category / priority / files / branch
→ Group by date with priority emojis
→ Include compressed reflections if requested
→ Include current task context
→ Return formatted observation log
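Under assumed table names (an `observations` table plus an `observations_fts` FTS5 index over its content), the core query might look like this with better-sqlite3:

```ts
import Database from "better-sqlite3";

const db = new Database("codewatch.db");

// Full-text match scoped to the current branch, best-ranked first.
const rows = db
  .prepare(
    `SELECT o.content, o.category, o.priority, o.created_at
       FROM observations_fts
       JOIN observations o ON o.id = observations_fts.rowid
      WHERE observations_fts MATCH ?
        AND o.branch = ?
        AND o.priority IN ('high', 'medium')   -- priority_min: "medium"
      ORDER BY observations_fts.rank
      LIMIT 50`
  )
  .all('"authentication"', "feature/auth");
```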
Extracts facts and decisions from AI-developer conversations. Runs frequently (every Stop event via hooks).
Priority system: each observation is tagged high, medium, or low; default priorities per category are listed in the table below. The Observer also follows a set of critical extraction rules defined in its prompt template.
Compresses observations while preserving critical information. Runs rarely (only when tokens exceed threshold).
4 compression levels with escalation:
| Level | Detail | Reduction | Strategy |
|---|---|---|---|
| 0 | 10/10 | 0% | Reorganize only — merge duplicates, fix formatting |
| 1 | 8/10 | ~20% | Remove low-priority, consolidate conventions |
| 2 | 6/10 | ~40% | Drop all low-priority, merge by file/module |
| 3 | 4/10 | ~60% | Paragraph summaries, only recent dates keep individual entries |
Certain sacred content survives all compression levels. Escalation: if compression doesn't reach the target token count, the Reflector automatically tries the next level, up to level 3 (see the sketch below).
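The escalation loop can be pictured like this (a sketch; `runReflector` and `looksDegenerate` stand in for the real agent call and the degenerate-output checks described later):

```ts
declare function runReflector(level: number): Promise<string>;
declare function looksDegenerate(text: string): boolean;

const estimateTokens = (s: string) => Math.ceil(s.length / 4);

// Try each compression level until the summary fits the token budget.
async function reflectWithEscalation(targetTokens: number, maxLevel = 3): Promise<string> {
  let summary = "";
  for (let level = 0; level <= maxLevel; level++) {
    summary = await runReflector(level);
    if (looksDegenerate(summary)) continue;              // broken output: escalate
    if (estimateTokens(summary) <= targetTokens) break;  // target reached: stop
  }
  return summary;
}
```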
Classifies observations into one of eight categories, using heuristic keyword matching first (free, instant) and falling back to an LLM only when confidence is below 0.7; a sketch follows the table.
| Category | Description | Default Priority |
|---|---|---|
| architecture | Design decisions, patterns chosen | high |
| user_preference | User's coding preferences, workflow choices | high |
| bugfix | Bugs found, root causes, fixes | medium |
| convention | Code conventions, naming patterns | medium |
| dependency | Package choices, version decisions | medium |
| file_pattern | Important file locations, project structure | medium |
| task_context | Current task goals, progress | medium |
| learning | Things learned about the codebase | medium |
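A sketch of that heuristic-first flow (the keyword lists and scoring are illustrative, not the shipped heuristics in src/agents/categorizer.ts):

```ts
type Category =
  | "architecture" | "user_preference" | "bugfix" | "convention"
  | "dependency" | "file_pattern" | "task_context" | "learning";

// Assumed LLM fallback, named for illustration only.
declare function categorizeWithLLM(content: string): Promise<Category>;

const KEYWORDS: Partial<Record<Category, string[]>> = {
  architecture: ["pattern", "design", "architecture", "module"],
  bugfix: ["bug", "fix", "root cause", "regression"],
  dependency: ["package", "version", "dependency", "upgrade"],
};

async function categorize(content: string): Promise<Category> {
  const text = content.toLowerCase();
  let best: { cat: Category; score: number } = { cat: "learning", score: 0 };
  for (const [cat, words] of Object.entries(KEYWORDS) as [Category, string[]][]) {
    const hits = words.filter((w) => text.includes(w)).length;
    const score = hits / words.length; // crude confidence in [0, 1]
    if (score > best.score) best = { cat, score };
  }
  // Heuristic is free and instant; only pay for an LLM call when unsure.
  return best.score >= 0.7 ? best.cat : categorizeWithLLM(content);
}
```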
# 1. Add the MCP server
claude mcp add codewatch -- npx codewatch-memory
# 2. Set up automatic hooks (saves + recalls memory without manual tool calls)
npx codewatch-memory --setup
# 3. Set at least one LLM API key for the Observer agent
export GROQ_API_KEY=your-key # recommended (free tier)
That's it. Observations are saved after every response and recalled automatically when you type a prompt.
claude mcp add codewatch -- pnpx codewatch-memory
npx codewatch-memory --setup
Add to .vscode/mcp.json:
{
"servers": {
"codewatch-memory": {
"command": "pnpx",
"args": ["codewatch-memory"],
"env": {
"GROQ_API_KEY": "${input:groqApiKey}",
"CODEWATCH_LLM_PROVIDER": "groq"
}
}
}
}
Then run npx codewatch-memory --setup in your project to configure automatic hooks.
Add to your MCP config (.cursor/mcp.json):
{
"mcpServers": {
"codewatch-memory": {
"command": "pnpx",
"args": ["codewatch-memory"],
"env": {
"GROQ_API_KEY": "your-key",
"CODEWATCH_LLM_PROVIDER": "groq"
}
}
}
}
Note: automatic hooks (--setup) are currently Claude Code only. Cursor/Windsurf users can still use the MCP tools manually.
git clone https://github.com/KlausAndrade/codewatch.git
cd codewatch
npm install
npm run build
claude mcp add codewatch -- node /path/to/dist/index.js
npx codewatch-memory --setup
The most powerful mode — observations are saved automatically and recalled automatically. No manual tool calls needed.
Add to .claude/settings.local.json:
{
  "hooks": {
    "Stop": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "npx codewatch-memory --hook",
            "timeout": 30,
            "async": true
          }
        ]
      }
    ],
    "PreCompact": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "npx codewatch-memory --hook",
            "timeout": 60,
            "async": true
          }
        ]
      }
    ],
    "UserPromptSubmit": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "npx codewatch-memory --recall"
          }
        ]
      }
    ],
    "SessionStart": [
      {
        "matcher": "startup|resume|compact",
        "hooks": [
          {
            "type": "command",
            "command": "npx codewatch-memory --recall"
          }
        ]
      }
    ]
  }
}
Saving (automatic):

- Stop fires after every Claude response — the hook reads the transcript and extracts observations via a cheap LLM
- PreCompact fires before context compaction — captures context before it's lost

Recall (automatic):

- UserPromptSubmit fires when you type a prompt — keywords are extracted from your message and matched against stored observations via FTS5 full-text search. Relevant observations are injected as context before Claude processes your prompt.
- SessionStart fires on session start, resume, or after context compaction — injects a briefing with your current task, high-priority observations, and the latest reflection summary.

At minimum, one LLM API key is required for the Observer/Reflector agents:
# Groq (recommended — fast, free tier available)
export GROQ_API_KEY=your-key
# Google Gemini (cheapest paid option)
export GOOGLE_GENERATIVE_AI_API_KEY=your-key
# OpenAI
export OPENAI_API_KEY=your-key
observe — Store an observation

content: "Chose repository pattern over active record for User module because team needs to swap DB later"
category?: "architecture" # auto-detected if omitted
priority?: "high" # auto-assigned if omitted
files?: ["src/repositories/UserRepository.ts"]
source_summary?: "user asked to refactor data access"
recall — Retrieve relevant observations

query?: "authentication"   # FTS5 full-text search
categories?: ["bugfix", "architecture"]
files?: ["src/auth/middleware.ts"]
priority_min?: "medium" # high, medium, or low
limit?: 50
include_reflections?: true
branch?: "feature/auth" # defaults to current branch
reflect — Manual compression trigger

compression_level?: 0   # 0=reorganize, 1=light, 2=aggressive, 3=critical
branch?: "main"
get_session_info — Session statistics

Returns observation count, token usage, categories breakdown, current task, and compression history.
switch_context — Change branch scope

branch: "feature/new-ui"   # or "auto" to re-detect from git
carry_task?: true # carry current task description to new branch
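From an MCP client's side, calling one of these tools looks like the sketch below, using the official TypeScript SDK. The tool name and arguments follow the lists above; everything else is client boilerplate:

```ts
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

const client = new Client({ name: "example-client", version: "1.0.0" });
await client.connect(
  new StdioClientTransport({ command: "npx", args: ["codewatch-memory"] })
);

// Ask the memory server for everything it knows about "authentication".
const result = await client.callTool({
  name: "recall",
  arguments: { query: "authentication", priority_min: "medium", limit: 20 },
});
console.log(result.content); // formatted observation log
```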
| Env Var | Default | Description |
|---|---|---|
| CODEWATCH_LLM_PROVIDER | google | Primary LLM (google, openai, groq) |
| CODEWATCH_FALLBACK_PROVIDER | openai | Fallback LLM (google, openai, groq, none) |
| CODEWATCH_GOOGLE_MODEL | gemini-2.5-flash | Google model |
| CODEWATCH_OPENAI_MODEL | gpt-4o-mini | OpenAI model |
| CODEWATCH_GROQ_MODEL | llama-3.3-70b-versatile | Groq model |
| GROQ_API_KEY | — | Groq API key |
| GOOGLE_GENERATIVE_AI_API_KEY | — | Google API key |
| OPENAI_API_KEY | — | OpenAI API key |
| CODEWATCH_REFLECT_THRESHOLD | 40000 | Auto-reflect trigger (tokens) |
| CODEWATCH_DATA_DIR | ~/mcp-data/codewatch-memory/ | SQLite storage location |
| CODEWATCH_AUTO_REFLECT | true | Enable auto-reflection |
| CODEWATCH_MAX_COMPRESSION | 3 | Max compression level (0-3) |
| CODEWATCH_LOG_LEVEL | info | Logging verbosity |
The Observer and Reflector agents use cheap/fast LLMs — not your main coding model. Here's what to consider:
| Provider | Cost | Speed | Quality for this task |
|---|---|---|---|
| Groq (llama-3.3-70b) | Free tier | Fastest | Good — handles structured extraction well |
| Google (gemini-2.5-flash) | ~$0.001/call | Fast | Good — cheapest paid option |
| OpenAI (gpt-4o-mini) | ~$0.003/call | Moderate | Good — reliable fallback |
Recommendation: Use Groq as primary (free, fast) with Google or OpenAI as fallback.
A more expensive model (GPT-4o, Claude Sonnet) would provide marginal improvement for the Observer — structured extraction works well with smaller models. The Reflector benefits more from model quality, but it runs rarely. The real quality gains come from prompt tuning, not model upgrades.
Data is stored in ~/mcp-data/codewatch-memory/codewatch.db (SQLite with WAL mode). The database includes sessions, observations, reflections, and tasks tables, plus an FTS5 index for full-text search.
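A minimal sketch of that schema, with assumed table and column names (the actual DDL is in src/storage/database.ts):

```ts
import Database from "better-sqlite3";

const db = new Database("codewatch.db");
db.pragma("journal_mode = WAL"); // WAL mode, as noted above

db.exec(`
  CREATE TABLE IF NOT EXISTS observations (
    id         INTEGER PRIMARY KEY,
    session_id INTEGER,
    branch     TEXT NOT NULL,
    category   TEXT NOT NULL,
    priority   TEXT NOT NULL CHECK (priority IN ('high', 'medium', 'low')),
    content    TEXT NOT NULL,
    created_at TEXT DEFAULT (datetime('now'))
  );

  -- External-content FTS5 index over observation text.
  CREATE VIRTUAL TABLE IF NOT EXISTS observations_fts
    USING fts5(content, content='observations', content_rowid='id');
`);
```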
The Reflector includes three strategies to detect broken LLM output (repetitive or degenerate text). When degenerate output is detected, it automatically retries at the next compression level.
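The three strategies aren't spelled out here, but a typical check of this kind flags text dominated by repeated n-grams; a sketch:

```ts
// One plausible degenerate-output check: flag text whose n-grams repeat
// far more often than natural prose would.
function looksDegenerate(text: string, n = 4, threshold = 0.5): boolean {
  const words = text.toLowerCase().split(/\s+/).filter(Boolean);
  if (words.length < n * 2) return false;
  const grams = new Map<string, number>();
  let total = 0;
  for (let i = 0; i + n <= words.length; i++) {
    const g = words.slice(i, i + n).join(" ");
    grams.set(g, (grams.get(g) ?? 0) + 1);
    total++;
  }
  const repeated = [...grams.values()].filter((c) => c > 1).reduce((a, c) => a + c, 0);
  return repeated / total > threshold; // mostly repeated n-grams: degenerate
}
```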
| Feature | codewatch-memory | Mastra Observational Memory |
|---|---|---|
| Protocol | MCP (works with any MCP client) | Mastra framework only |
| Capture | Hook-based (last 20 messages per trigger) | Middleware (100% of messages) |
| Branch scoping | Per git branch | Not git-aware |
| Categories | 8 code-specific categories | Generic |
| Storage | Local SQLite + FTS5 | Configurable (Postgres, etc.) |
| Compression | 4-level escalation with sacred rules | 2-agent compression |
| Cost | Free (Groq) or near-free | Depends on provider |
| Setup | npx codewatch-memory | Requires Mastra framework |
Trade-off: Mastra's middleware approach captures 100% of messages automatically. codewatch-memory's hook approach captures the last 20 messages per trigger — effective for most coding sessions but can miss observations from very early in long sessions.
npm install
npm run build # Compile TypeScript
npm run dev # Watch mode
npm test # Run tests
npm run test:watch # Watch mode tests
src/
├── index.ts # Entry point (MCP server, hook, or recall mode)
├── server.ts # MCP tool registration
├── config.ts # Environment config loading
├── hook.ts # Claude Code hook integration (save observations)
├── recall-hook.ts # Claude Code hook integration (auto-recall)
├── transcript.ts # JSONL transcript parser
├── agents/
│ ├── observer.ts # Observer agent (extract observations)
│ ├── reflector.ts # Reflector agent (compress observations)
│ ├── categorizer.ts # Heuristic + LLM categorization
│ └── prompts.ts # All agent prompt templates
├── storage/
│ ├── database.ts # SQLite schema + initialization
│ ├── observations.ts # Observation CRUD + FTS5 queries
│ ├── reflections.ts # Reflection storage
│ ├── sessions.ts # Session management
│ └── queries.ts # Current task queries
├── tools/
│ ├── observe.ts # MCP tool handler
│ ├── recall.ts # MCP tool handler
│ ├── reflect.ts # MCP tool handler
│ ├── get-session-info.ts
│ └── switch-context.ts
├── llm/
│ └── provider.ts # Multi-provider LLM with fallback
├── git/
│ └── branch.ts # Branch detection with 10s cache
└── utils/
├── tokens.ts # Token estimation (chars/4)
├── stdin.ts # Shared stdin reader for hooks
├── sanitize.ts # XML parsing, line truncation
└── repetition.ts # Degenerate output detection
MIT
Add this to claude_desktop_config.json and restart Claude Desktop.
{
  "mcpServers": {
    "codewatch-memory": {
      "command": "npx",
      "args": ["codewatch-memory"],
      "env": {
        "GROQ_API_KEY": "your-key",
        "CODEWATCH_LLM_PROVIDER": "groq"
      }
    }
  }
}