A self-contained Memory server with single-binary architecture (embedded DB & models, no dependencies). Provides persistent semantic and graph-based memory for AI agents.
A high-performance, pure Rust Model Context Protocol (MCP) server that provides persistent, semantic, and graph-based memory for AI agents.
Works with any MCP-compatible client, including Claude Code, Claude Desktop, Cursor, Cline, OpenCode, and Gemini CLI.
Unlike other memory solutions that require a complex stack (Python + Vector DB + Graph DB), this project is a single, self-contained executable.
It combines a semantic embedding engine, a knowledge graph, and AST-based code indexing in a single process, all backed by an embedded SurrealDB:
```mermaid
graph TD
    User[AI Agent / IDE]
    subgraph "Memory MCP Server"
        MS[MCP Server]
        subgraph "Core Engines"
            ES[Embedding Service]
            GS[Graph Service]
            CS[Codebase Service]
        end
        MS -- "Store / Search" --> ES
        MS -- "Relate Entities" --> GS
        MS -- "Index" --> CS
        ES -- "Vectorize Text" --> SDB[(SurrealDB Embedded)]
        GS -- "Knowledge Graph" --> SDB
        CS -- "AST Chunks" --> SDB
    end
    User -- "MCP Protocol" --> MS
```
Memory is useless if your agent doesn't check it. To get the "Long-Term Memory" effect, you must instruct your agent to follow a strict protocol.
We provide a battle-tested Memory Protocol (AGENTS.md) that you can adapt.
The protocol implements specific flows to handle Context Window Compaction and Session Restarts:
- On session start, search for `TASK: in_progress` immediately. This restores the full context of what was happening before the last session ended or the context was compacted.
- Prefix memories with a type tag (`TASK:`, `DECISION:`, `RESEARCH:`) so semantic search can precisely target the right type of information, reducing noise.

These workflows turn the agent from a "stateless chatbot" into a "stateful worker" that survives restarts and context clearing.
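As an illustration, a task stored under this convention might look like the following `store_memory` arguments. The `content` and `metadata` fields come from the tool description below; the specific metadata keys are assumptions, not a fixed schema:

```jsonc
// Hypothetical store_memory arguments following the tagging convention.
// The metadata keys ("type", "status") are illustrative only.
{
  "content": "TASK: in_progress - Migrate auth module to the new session API. Next step: update the refresh-token handler.",
  "metadata": {
    "type": "task",
    "status": "in_progress"
  }
}
```

On the next session start, `search_text("TASK: in_progress")` surfaces this record and the agent resumes the task instead of asking the user what to do.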
Instead of scattering instructions across IDE-specific files (like .cursorrules), establish AGENTS.md as the Single Source of Truth.
Instruct your agent (in its base system prompt) to read `AGENTS.md` at the start of every session. Here is a minimal reference prompt to bootstrap this behavior:
# 🧠 Memory & Protocol
You have access to a persistent memory server and a protocol definition file.
1. **Protocol Adherence**:
- READ `AGENTS.md` immediately upon starting.
- Strictly follow the "Session Startup" and "Sync" protocols defined there.
2. **Context Restoration**:
- Run `search_text("TASK: in_progress")` to restore context.
- Do NOT ask the user "what should I do?" if a task is already in progress.
Without this protocol, the agent loses context after compaction or session restarts. With this protocol, it maintains the full context of the current task, ensuring no steps or details are lost, even when the chat history is cleared.
To use this MCP server with any client (Claude Code, OpenCode, Cline, etc.), use the following Docker command structure.
Key Requirements:
- `-v mcp-data:/data` — persists your graph, embeddings, and cached model weights.
- `-v $(pwd):/project:ro` — allows the server to read and index your code.
- `--init` — ensures the server shuts down cleanly.

> [!TIP]
> **One volume persists everything.** The single `-v mcp-data:/data` mount covers both the SurrealDB database and the ~1.2 GB embedding model (stored under `/data/models/`). There is no need for a separate volume for `/data/models`; it is already a subdirectory of `/data` and is preserved automatically. Without a named volume, Docker creates a new anonymous volume on each `docker run`, causing the model to re-download (~1.2 GB) every time.
Add this to your configuration file (e.g., claude_desktop_config.json):
```json
{
  "mcpServers": {
    "memory": {
      "command": "docker",
      "args": [
        "run",
        "--init",
        "-i",
        "--rm",
        "--memory=3g",
        "-v", "mcp-data:/data",
        "-v", "/absolute/path/to/your/project:/project:ro",
        "ghcr.io/pomazanbohdan/memory-mcp-1file:latest"
      ]
    }
  }
}
```
Note: Replace `/absolute/path/to/your/project` with the actual path you want to index. In some environments (like Cursor or VSCode extensions), you might be able to use variables like `${workspaceFolder}`, but absolute paths are most reliable for Docker.
For clients configured through a settings UI, use transport `stdio`, server name `memory`, and the command:

```bash
docker run --init -i --rm --memory=3g -v mcp-data:/data -v "/Users/yourname/projects/current:/project:ro" ghcr.io/pomazanbohdan/memory-mcp-1file:latest
```

(Remember to update the project path when switching workspaces if you need code indexing.)

Or run it directly from a shell:

```bash
docker run --init -i --rm --memory=3g \
  -v mcp-data:/data \
  -v $(pwd):/project:ro \
  ghcr.io/pomazanbohdan/memory-mcp-1file:latest
```
You can run the server directly via npx or bunx. The npm package automatically downloads the correct pre-compiled binary for your platform.
Add to claude_desktop_config.json:
```json
{
  "mcpServers": {
    "memory": {
      "command": "npx",
      "args": ["-y", "memory-mcp-1file"]
    }
  }
}
```
```bash
claude mcp add memory -- npx -y memory-mcp-1file
```
In Cursor's MCP settings UI, set the type to `command`, the name to `memory`, and the command to `npx -y memory-mcp-1file`. Or add to `.cursor/mcp.json`:
```json
{
  "mcpServers": {
    "memory": {
      "command": "npx",
      "args": ["-y", "memory-mcp-1file"]
    }
  }
}
```
Add to your MCP settings:
```json
{
  "mcpServers": {
    "memory": {
      "command": "npx",
      "args": ["-y", "memory-mcp-1file"]
    }
  }
}
```
Or with `bunx`:

```json
{
  "mcpServers": {
    "memory": {
      "command": "bunx",
      "args": ["memory-mcp-1file"]
    }
  }
}
```
Note: Unlike Docker, `npx`/`bunx` runs the binary locally, so it already has access to your filesystem and no directory mounting is needed. To customize the data storage path, pass `--data-dir` via args: `"args": ["-y", "memory-mcp-1file", "--", "--data-dir", "/path/to/data"]`
Add to your `~/.gemini/settings.json`:

```json
{
  "mcpServers": {
    "memory": {
      "command": "npx",
      "args": ["-y", "memory-mcp-1file"]
    }
  }
}
```
Or with Docker:
```json
{
  "mcpServers": {
    "memory": {
      "command": "docker",
      "args": [
        "run", "--init", "-i", "--rm", "--memory=3g",
        "-v", "mcp-data:/data",
        "-v", "${workspaceFolder}:/project:ro",
        "ghcr.io/pomazanbohdan/memory-mcp-1file:latest"
      ]
    }
  }
}
```
The server combines three core capabilities:

- Semantic search: local embedding models (`qwen3` by default) for "vibe-based" retrieval.
- Knowledge graph: entities (`User`, `Project`, `Tech`) and their relations (`uses`, `likes`). Supports PageRank-based traversal.
- Temporal validity: memories carry `valid_from` and `valid_until` dates.

The server exposes 18 tools to the AI model, organized into logical categories.
| Tool | Description |
|---|---|
| `store_memory` | Store a new memory with content and optional metadata. |
| `update_memory` | Update memory fields. |
| `delete_memory` | Delete memory by ID. |
| `list_memories` | List memories (newest first). |
| `get_memory` | Get full memory by ID. |
| `invalidate` | Soft-delete memory, optionally linking a replacement. |
| `get_valid` | Get valid memories. Optional timestamp (ISO 8601) for point-in-time queries. |
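A sketch of a typical supersede-and-query flow. Only the behaviors come from the table above; the argument names (`id`, `replacement_id`, `timestamp`) are illustrative assumptions:

```jsonc
// invalidate: soft-delete an outdated memory and link its replacement
// (argument names are assumptions for illustration).
{ "id": "memory:old_decision", "replacement_id": "memory:new_decision" }

// get_valid: ask what was valid at a given moment, using an
// ISO 8601 timestamp as stated in the tool description.
{ "timestamp": "2024-06-01T12:00:00Z" }
```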
| Tool | Description |
|---|---|
| `recall` | Hybrid search (vector + keyword + graph via RRF). Default for memories. |
| `search_memory` | Search memories. `mode`: `vector` (default) or `bm25`. |
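For example (the `query` parameter name is an assumption; the `mode` values come from the table):

```jsonc
// recall: hybrid retrieval over stored memories
{ "query": "decisions about the auth module" }

// search_memory: force pure keyword (BM25) matching
{ "query": "refresh token", "mode": "bm25" }
```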
| Tool | Description |
|---|---|
| `knowledge_graph` | Unified KG operations. `action`: `create_entity` \| `create_relation` \| `get_related` \| `detect_communities`. |
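A sketch of building and querying the graph. The `action` values come from the table; the field names (`name`, `type`, `from`, `to`, `relation`) are assumptions for illustration, reusing the entity and relation types named earlier (`User`, `Project`, `uses`):

```jsonc
// Four successive knowledge_graph calls: create two entities,
// relate them, then fetch neighbors of one of them.
{ "action": "create_entity", "name": "Alice", "type": "User" }
{ "action": "create_entity", "name": "memory-mcp", "type": "Project" }
{ "action": "create_relation", "from": "Alice", "to": "memory-mcp", "relation": "uses" }
{ "action": "get_related", "name": "Alice" }
```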
| Tool | Description |
|---|---|
| `index_project` | Index a codebase directory for code search. |
| `delete_project` | Delete an indexed project. |
| `recall_code` | Code retrieval. `mode`: `vector` or `hybrid` (default). Hybrid uses vector + BM25 + graph fusion. |
| `search_symbols` | Search code symbols by name. |
| `symbol_graph` | Navigate the symbol graph. `action`: `callers` \| `callees` \| `related`. |
| `project_info` | Project info. `action`: `list` \| `status` \| `stats`. |
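An index-then-navigate sketch. The `action` values come from the table; the `path` and `symbol` parameter names, and the symbol itself, are illustrative assumptions:

```jsonc
// index_project: index the mounted project directory
// (inside the Docker setup above this is /project).
{ "path": "/project" }

// symbol_graph: find call sites of a function after indexing.
{ "symbol": "store_memory", "action": "callers" }
```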
| Tool | Description |
|---|---|
| `get_status` | Get system status and startup progress. |
| `reset_all_memory` | DANGER: Reset all database data (requires `confirm=true`). |
Environment variables or CLI args:
| Arg | Env | Default | Description |
|---|---|---|---|
| `--data-dir` | `DATA_DIR` | `./data` | DB location |
| `--model` | `EMBEDDING_MODEL` | `qwen3` | Embedding model (`qwen3`, `gemma`, `bge_m3`, `nomic`, `e5_multi`, `e5_small`) |
| `--mrl-dim` | `MRL_DIM` | (native) | Output dimension for MRL-supported models (e.g. 64, 128, 256, 512, 1024 for Qwen3). Defaults to the model's native maximum dimension (1024 for Qwen3). |
| `--batch-size` | `BATCH_SIZE` | `8` | Maximum batch size for embedding inference |
| `--cache-size` | `CACHE_SIZE` | `1000` | LRU cache capacity for embeddings |
| `--timeout` | `TIMEOUT_MS` | `30000` | Timeout in milliseconds |
| `--idle-timeout` | `IDLE_TIMEOUT` | `0` | Idle timeout in minutes. 0 = disabled |
| `--log-level` | `LOG_LEVEL` | `info` | Verbosity |
| (None) | `HF_TOKEN` | (None) | HuggingFace Token (ONLY required for gated models like `gemma`) |
| (None) | `EMBEDDING_QUEUE_CAPACITY` | `256` | Max size of the background embedding queue |
| (None) | `EMBEDDING_BATCH_SIZE` | `8` | How many files to process in one embedding chunk |
| (None) | `INDEX_BATCH_SIZE` | `20` | How many files to process in one incremental chunk |
| (None) | `INDEX_DEBOUNCE_MS` | `2000` | Milliseconds to wait before flushing index events (debounce) |
| (None) | `MANIFEST_DIFF_INTERVAL_MINS` | `10` | Minutes between periodic missing-file checks |
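As a sketch, here is how these settings might be combined in an npx-based client config. The flags and env vars come from the table above; the paths are placeholders, and the `env` field is how clients such as Claude Desktop pass environment variables to MCP servers:

```json
{
  "mcpServers": {
    "memory": {
      "command": "npx",
      "args": ["-y", "memory-mcp-1file", "--", "--model", "e5_small", "--log-level", "debug"],
      "env": { "DATA_DIR": "/path/to/data" }
    }
  }
}
```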
You can switch the embedding model using the `--model` arg or the `EMBEDDING_MODEL` env var.
| Argument Value | HuggingFace Repo | Dimensions | Size | Use Case |
|---|---|---|---|---|
| `qwen3` | Qwen/Qwen3-Embedding-0.6B | 1024 (MRL) | 1.2 GB | Default. Top open-source 2026 model, 32K context, MRL support. |
| `gemma` | onnx-community/embeddinggemma-300m-ONNX | 768 (MRL) | ~195 MB | Lighter alternative with MRL support. (Requires a proprietary license agreement.) |
| `bge_m3` | BAAI/bge-m3 | 1024 | 2.3 GB | State-of-the-art multilingual hybrid retrieval. Heavy. |
| `nomic` | nomic-ai/nomic-embed-text-v1.5 | 768 | 1.9 GB | High-quality long-context, BERT-compatible. |
| `e5_multi` | intfloat/multilingual-e5-base | 768 | 1.1 GB | Legacy; kept for backward compatibility. |
| `e5_small` | intfloat/multilingual-e5-small | 384 | 134 MB | Fastest, minimal RAM. Good for dev/testing. |
Models marked with (MRL) support dynamically truncating the output embedding vector to a smaller dimension (e.g., 512, 256, 128) with minimal loss of accuracy. This saves database storage and speeds up vector search.
Use the `--mrl-dim` argument to specify the desired size. If omitted, the default is the model's native base dimension (e.g., 1024 for Qwen3).
Warning: Once your database is created with a specific dimension, you cannot change it without wiping the data directory.
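For instance, a config that pins a reduced dimension at first launch might look like this (flags from the table above; the setup mirrors the npx configs shown earlier):

```json
{
  "mcpServers": {
    "memory": {
      "command": "npx",
      "args": ["-y", "memory-mcp-1file", "--", "--model", "qwen3", "--mrl-dim", "512"]
    }
  }
}
```

Because the dimension is baked into the database at creation, keep this flag stable across restarts, per the warning above.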
By default, the server uses Qwen3, which is fully open-source and downloads automatically without any authentication.
However, if you choose to use Gemma (`--model gemma`), you must authenticate, because it is a "Gated Model" with a proprietary license.
To use Gemma:
```bash
# Using an environment variable
HF_TOKEN="hf_your_token_here" memory-mcp --model gemma

# Or via a .env file (see .env.example)
```
> [!WARNING]
> **Changing Models & Data Compatibility**
> If you switch to a model with different dimensions (e.g., from `e5_small` to `e5_multi`), your existing database will be incompatible. You must delete the data directory (volume) and re-index your data.
> Switching between models with the same dimensions (e.g., `e5_multi` <-> `nomic`) is theoretically possible but not recommended, as their semantic spaces differ.
Based on analysis of advanced memory systems like Hindsight (see their documentation for details on these mechanisms), we are exploring these "Cognitive Architecture" features for future releases:
- A `reflect` background process (or tool) that periodically scans recent memories.
- `namespace` or `project_id` scoping.
- A `confidence` score (0.0 - 1.0) added to memory schemas.

License: MIT
Add this to `claude_desktop_config.json` and restart Claude Desktop.
```json
{
  "mcpServers": {
    "pomazanbohdan-memory-mcp-1file": {
      "command": "npx",
      "args": ["-y", "memory-mcp-1file"]
    }
  }
}
```