Persistent AI memory server with 3-layer hybrid search (vector + FTS5 + keyword), confidence scoring via Reciprocal Rank Fusion, episodic/profile memory, and 21 tools. Zero LLM dependency. Works standalone with Claude Desktop and Claude Code. MIT licensed.
Give Claude persistent memory across sessions. Single SQLite file. 21 tools. Zero LLM dependency.
Quick Start · Features · Architecture · All Tools · Zenn Book (JP)
Standalone repository — This is the standalone version for use with Claude Desktop, Claude Code, and any MCP client. If you are a ClotoCore user, use the version in cloto-mcp-servers instead.
Claude forgets everything between sessions. Every conversation starts from zero — no context about your project, your preferences, or what you discussed yesterday.
cpersona fixes this. It's an MCP server that stores memories in a local SQLite file and retrieves them through hybrid search. Claude remembers you.
Prerequisites: Python 3.10+, Git
git clone https://github.com/Cloto-dev/cpersona.git
cd cpersona
python -m venv .venv
# Windows
.venv\Scripts\activate
# macOS / Linux
# source .venv/bin/activate
pip install .
cpersona's hybrid search works best with an embedding server for vector similarity. We recommend using cloto-mcp-servers/embedding with the jina-v5-nano model (33M params, 768d, runs locally on CPU):
git clone https://github.com/Cloto-dev/cloto-mcp-servers.git
cd cloto-mcp-servers/servers
pip install ./embedding
Without an embedding server, cpersona falls back to FTS5 + keyword search only. Vector search (the strongest retrieval layer) will be disabled.
Claude Desktop — add to claude_desktop_config.json:
{
  "mcpServers": {
    "embedding": {
      "command": "/path/to/.venv/bin/python",
      "args": ["/path/to/servers/embedding/server.py"],
      "env": {
        "EMBEDDING_PROVIDER": "onnx_jina_v5_nano",
        "EMBEDDING_HTTP_PORT": "8401"
      }
    },
    "cpersona": {
      "command": "/path/to/.venv/bin/python",
      "args": ["/path/to/cpersona/server.py"],
      "env": {
        "CPERSONA_DB_PATH": "/home/you/.claude/cpersona.db",
        "CPERSONA_EMBEDDING_MODE": "http",
        "CPERSONA_EMBEDDING_URL": "http://127.0.0.1:8401/embed"
      }
    }
  }
}
Windows: use .venv/Scripts/python.exe and C:/Users/you/.claude/cpersona.db
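The Windows and macOS/Linux variants differ only in the interpreter path and the database location. As a rough sketch, the config block can be generated for either platform; `venv_python` and `build_mcp_config` are hypothetical helpers (not part of cpersona), the example paths are placeholders, and the sketch assumes both servers run from the same virtual environment.

```python
import json
from pathlib import PurePosixPath, PureWindowsPath


def venv_python(venv_dir: str, windows: bool = False) -> str:
    """Return the interpreter path inside a virtual environment."""
    if windows:
        # Windows venvs keep the interpreter under Scripts/; forward slashes
        # avoid backslash escaping inside JSON.
        return (PureWindowsPath(venv_dir) / "Scripts" / "python.exe").as_posix()
    return str(PurePosixPath(venv_dir) / "bin" / "python")


def build_mcp_config(python: str, embedding_server: str,
                     cpersona_server: str, db_path: str) -> dict:
    """Assemble the mcpServers block with the values used in this README."""
    return {
        "mcpServers": {
            "embedding": {
                "command": python,
                "args": [embedding_server],
                "env": {
                    "EMBEDDING_PROVIDER": "onnx_jina_v5_nano",
                    "EMBEDDING_HTTP_PORT": "8401",
                },
            },
            "cpersona": {
                "command": python,
                "args": [cpersona_server],
                "env": {
                    "CPERSONA_DB_PATH": db_path,
                    "CPERSONA_EMBEDDING_MODE": "http",
                    "CPERSONA_EMBEDDING_URL": "http://127.0.0.1:8401/embed",
                },
            },
        }
    }


if __name__ == "__main__":
    # Placeholder paths; substitute your own clone locations.
    config = build_mcp_config(
        python=venv_python("C:/Users/you/cpersona/.venv", windows=True),
        embedding_server="C:/Users/you/cloto-mcp-servers/servers/embedding/server.py",
        cpersona_server="C:/Users/you/cpersona/server.py",
        db_path="C:/Users/you/.claude/cpersona.db",
    )
    print(json.dumps(config, indent=2))
```

The script only prints the JSON; paste the output into claude_desktop_config.json yourself.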
Claude Code:
claude mcp add-json embedding '{"type":"stdio","command":"/path/to/.venv/bin/python","args":["/path/to/servers/embedding/server.py"],"env":{"EMBEDDING_PROVIDER":"onnx_jina_v5_nano","EMBEDDING_HTTP_PORT":"8401"}}' -s user
claude mcp add-json cpersona '{"type":"stdio","command":"/path/to/.venv/bin/python","args":["/path/to/cpersona/server.py"],"env":{"CPERSONA_DB_PATH":"/home/you/.claude/cpersona.db","CPERSONA_EMBEDDING_MODE":"http","CPERSONA_EMBEDDING_URL":"http://127.0.0.1:8401/embed"}}' -s user
That's it. Claude now has persistent memory. Ask it to store something and recall it in a later session.
Hybrid Search — Three independent retrieval strategies run in parallel and merge results via Reciprocal Rank Fusion (RRF):
| Layer | Method | Strength |
|---|---|---|
| Vector | Cosine similarity (jina-v5-nano, 768d) | Semantic meaning |
| FTS5 | SQLite full-text search with trigram tokenizer | Exact terms, names, IDs |
| Keyword | Fallback pattern matching | Edge cases, partial matches |
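To illustrate why the FTS5 layer handles exact terms and IDs well, here is a minimal sketch of a trigram-tokenized FTS5 index using Python's built-in sqlite3 module. The single-column schema and the `fts_search` helper are assumptions for illustration; cpersona's actual schema and queries may differ. The trigram tokenizer needs SQLite 3.34 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical single-column index; cpersona's real memories_fts table may differ.
conn.execute("CREATE VIRTUAL TABLE memories_fts USING fts5(content, tokenize='trigram')")
conn.executemany(
    "INSERT INTO memories_fts(content) VALUES (?)",
    [
        ("Deploy ticket ABC-1234 to staging on Friday",),
        ("User prefers dark mode in the editor",),
        ("Staging database password was rotated",),
    ],
)


def fts_search(conn, query, limit=5):
    """Return rows containing `query` as a substring (3+ characters)."""
    # Wrap the query in double quotes so punctuation such as '-' is matched
    # literally instead of being parsed as FTS5 query syntax.
    phrase = '"' + query.replace('"', '""') + '"'
    rows = conn.execute(
        "SELECT content FROM memories_fts WHERE memories_fts MATCH ? "
        "ORDER BY rank LIMIT ?",
        (phrase, limit),
    ).fetchall()
    return [row[0] for row in rows]


print(fts_search(conn, "ABC-1234"))  # exact ID lookup
print(fts_search(conn, "stag"))      # partial word, case-insensitive by default
```

Because trigram indexing matches arbitrary substrings, it finds identifiers and name fragments that a word-based tokenizer would split or miss.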
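Memory Types:
| Type | Written by | Stored in |
|---|---|---|
| Memory | store | memories (content + embedding) |
| Episode | archive_episode | episodes (summaries) |
| Profile | update_profile | profiles (attributes) |
Confidence Scoring — Each recalled memory gets a confidence score combining: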
Zero LLM Dependency — cpersona is a pure data server. It never calls an LLM internally. All summarization and extraction is performed by the calling agent. This means zero API costs from cpersona itself, deterministic behavior, and no hidden latency.
Additional capabilities:
┌─────────────────────────────────────┐
│ MCP Host │
│ (Claude Desktop / Claude Code) │
└──────────────┬──────────────────────┘
│ MCP (JSON-RPC)
┌──────────────▼──────────────────────┐
│ cpersona │
│ (server.py) │
│ │
│ ┌─────────┐ ┌─────────┐ │
│ │ store │ │ recall │ ... │
│ └────┬────┘ └────┬────┘ │
│ │ │ │
│ ┌────▼─────────────▼────────────┐ │
│ │ SQLite DB │ │
│ │ │ │
│ │ memories (content + embed) │ │
│ │ episodes (summaries) │ │
│ │ profiles (attributes) │ │
│ │ memories_fts (FTS5 index) │ │
│ │ episodes_fts (FTS5 index) │ │
│ │ task_queue (async jobs) │ │
│ └────────────────────────────────┘ │
│ │
└──────────────┬───────────────────────┘
│ HTTP
┌──────────────▼──────────────────────┐
│ Embedding Server │
│ (jina-v5-nano ONNX, 768d) │
└─────────────────────────────────────┘
Recall flow (RRF mode):
Query → ┌── Vector search (cosine similarity) ──┐
├── FTS5 search (episodes + memories) ──┼── RRF merge → Confidence scoring → Top-K
└── Keyword fallback ──┘
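The RRF merge step in the flow above can be sketched in a few lines. Each layer contributes 1 / (k + rank) for every result it returns, and the contributions are summed per memory. The function name `rrf_merge`, the memory IDs, and the tie-breaking rule are illustrative assumptions; k = 60 matches the documented CPERSONA_RRF_K default, and cpersona's confidence scoring is not modelled here.

```python
from collections import defaultdict


def rrf_merge(rankings, k=60, top_k=5):
    """Fuse several ranked ID lists with Reciprocal Rank Fusion."""
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: the best hit of each layer adds 1 / (k + 1).
        for rank, memory_id in enumerate(ranking, start=1):
            scores[memory_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for determinism.
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ordered[:top_k]


# Hypothetical per-layer results, best match first.
vector_hits = ["m3", "m1", "m7"]
fts_hits = ["m1", "m9"]
keyword_hits = ["m1", "m3"]

merged = rrf_merge([vector_hits, fts_hits, keyword_hits])
for memory_id, score in merged:
    print(f"{memory_id}: {score:.4f}")
```

A memory that appears in several layers (m1 here) outranks one that tops only a single layer, which is the main reason to fuse ranks rather than raw scores from incomparable scales.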
Tested on LMEB (Long-term Memory Evaluation Benchmark, results) — 22 evaluation tasks measuring memory retrieval quality:
| Embedding Model | Params | Dimensions | Mean NDCG@10 |
|---|---|---|---|
| MiniLM-L6-v2 | 22M | 384 | 36.88 |
| e5-small | 33M | 384 | 46.36 |
| jina-v5-nano | 33M | 768 | 54.14 |
jina-v5-nano scores 47% higher than the MiniLM-L6-v2 baseline in mean NDCG@10 (54.14 vs. 36.88).
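For readers unfamiliar with the metric, the following sketch shows how NDCG@10 is commonly computed and how the relative improvement figure follows from the table. It uses the linear-gain form of DCG and computes the ideal ranking from the retrieved list only; whether LMEB uses exactly this variant is an assumption, and the function names are illustrative.

```python
import math


def dcg(relevances, k=10):
    """Discounted cumulative gain over the top-k results (linear gain)."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg(relevances, k=10):
    """DCG normalized by the DCG of the ideal (sorted) ordering."""
    # Simplification: the ideal ordering is built from the same list, so
    # relevant items that were never retrieved are not counted.
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0


def relative_improvement(new, baseline):
    """Percentage gain of `new` over `baseline`."""
    return (new - baseline) / baseline * 100.0


# Relevance of each retrieved result, in ranked order (1 = relevant).
print(round(ndcg([0, 1, 0, 1]), 4))
# Mean NDCG@10 values taken from the table above.
print(round(relative_improvement(54.14, 36.88), 1))
```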
| Tool | Description |
|---|---|
| store | Store a message in agent memory |
| recall | Recall relevant memories (vector + FTS5 + keyword, RRF merge) |
| get_profile | Get current agent profile |
| update_profile | Save pre-computed agent profile |
| archive_episode | Archive conversation episode with summary and keywords |
| list_memories | List recent memories |
| list_episodes | List archived episodes |
| delete_memory | Delete a single memory (ownership enforced) |
| delete_episode | Delete a single episode (ownership enforced) |
| delete_agent_data | Delete all data for an agent |
| calibrate_threshold | Auto-calibrate vector search threshold via z-score |
| export_memories | Export to JSONL (memories, episodes, profiles) |
| import_memories | Import from JSONL (idempotent via msg_id dedup) |
| merge_memories | Merge one agent's data into another (atomic, with dedup) |
| get_queue_status | Background task queue status |
| recall_with_context | Recall with external conversation context (auto-dedup) |
| update_memory | Update memory content (rejects if locked) |
| lock_memory | Lock memory to prevent deletion/editing |
| unlock_memory | Unlock memory to allow deletion/editing |
| check_health | 16-point database health check with auto-repair |
| deep_check | Deep semantic data quality analysis with auto-repair |
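The "idempotent via msg_id dedup" behavior of import_memories can be approximated with a uniqueness constraint and INSERT OR IGNORE. This is a sketch under assumptions, not cpersona's implementation: the `memories` columns, the JSONL field names, and the `import_jsonl` helper are all hypothetical.

```python
import json
import sqlite3


def import_jsonl(conn, lines):
    """Insert JSONL records, skipping any msg_id already stored.

    Returns the number of rows actually added.
    """
    added = 0
    with conn:  # single transaction: commit on success, roll back on error
        for line in lines:
            if not line.strip():
                continue  # tolerate blank lines in the export
            record = json.loads(line)
            cursor = conn.execute(
                "INSERT OR IGNORE INTO memories (msg_id, agent_id, content) "
                "VALUES (?, ?, ?)",
                (record["msg_id"], record["agent_id"], record["content"]),
            )
            added += cursor.rowcount  # 1 if inserted, 0 if ignored as duplicate
    return added


conn = sqlite3.connect(":memory:")
# Hypothetical schema: the primary key on msg_id is what makes re-imports safe.
conn.execute(
    "CREATE TABLE memories ("
    "msg_id TEXT PRIMARY KEY, agent_id TEXT NOT NULL, content TEXT NOT NULL)"
)

export = [
    '{"msg_id": "a1", "agent_id": "claude", "content": "Prefers tabs over spaces"}',
    '{"msg_id": "a2", "agent_id": "claude", "content": "Project uses Python 3.10"}',
]
first_run = import_jsonl(conn, export)
second_run = import_jsonl(conn, export)  # same file again: nothing new is added
print(first_run, second_run)
```

Pushing deduplication into the database constraint keeps the import safe to retry after a partial failure.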
All settings via environment variables with sensible defaults:
| Variable | Default | Description |
|---|---|---|
| CPERSONA_DB_PATH | ./cpersona.db | SQLite database path |
| CPERSONA_EMBEDDING_MODE | http | Embedding mode (http or disabled) |
| CPERSONA_EMBEDDING_URL | http://127.0.0.1:8401/embed | Embedding server URL |
| CPERSONA_VECTOR_SEARCH_MODE | remote | Vector search mode |
| CPERSONA_SEARCH_MODE | rrf | Search strategy (rrf or cascade) |
| CPERSONA_RRF_K | 60 | RRF smoothing parameter |
| CPERSONA_CONFIDENCE_ENABLED | false | Include confidence metadata in results |
| CPERSONA_AUTO_CALIBRATE | false | Auto-calibrate on startup |
| CPERSONA_TASK_QUEUE_ENABLED | false | Enable background task queue |
| CPERSONA_RECENT_RECALL_PENALTY | 0.7 | Penalty for recently recalled memories |
| CPERSONA_RECENT_RECALL_WINDOW_MIN | 5 | Window (minutes) for recent recall penalty |
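As an illustration of how these variables and defaults fit together, the sketch below reads a subset of them into a typed settings object. The `Settings` class, `load_settings` function, and the accepted boolean spellings are assumptions made for this example; cpersona's own parsing may differ.

```python
import os
from dataclasses import dataclass


def _flag(value: str) -> bool:
    # Assumed truthy spellings; the server's actual parsing is not documented here.
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    db_path: str
    embedding_mode: str
    embedding_url: str
    search_mode: str
    rrf_k: int
    confidence_enabled: bool
    recent_recall_penalty: float
    recent_recall_window_min: int


def load_settings(env=None) -> Settings:
    """Read CPERSONA_* variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    search_mode = env.get("CPERSONA_SEARCH_MODE", "rrf")
    if search_mode not in {"rrf", "cascade"}:
        raise ValueError(f"unsupported CPERSONA_SEARCH_MODE: {search_mode!r}")
    return Settings(
        db_path=env.get("CPERSONA_DB_PATH", "./cpersona.db"),
        embedding_mode=env.get("CPERSONA_EMBEDDING_MODE", "http"),
        embedding_url=env.get("CPERSONA_EMBEDDING_URL", "http://127.0.0.1:8401/embed"),
        search_mode=search_mode,
        rrf_k=int(env.get("CPERSONA_RRF_K", "60")),
        confidence_enabled=_flag(env.get("CPERSONA_CONFIDENCE_ENABLED", "false")),
        recent_recall_penalty=float(env.get("CPERSONA_RECENT_RECALL_PENALTY", "0.7")),
        recent_recall_window_min=int(env.get("CPERSONA_RECENT_RECALL_WINDOW_MIN", "5")),
    )


if __name__ == "__main__":
    print(load_settings())
```

Passing a plain dict instead of os.environ makes the loader easy to exercise in tests, for example `load_settings({"CPERSONA_SEARCH_MODE": "cascade"})`.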
cpersona is an MCP server, so it works with any MCP-compatible host, including Claude Desktop and Claude Code.
cpersona is the memory layer of ClotoCore, an open-source AI agent platform written in Rust. While cpersona is fully standalone (MIT license), it was designed to give AI agents persistent, searchable memory within the ClotoCore ecosystem.
MIT — free to use from any MCP host without restriction.