Persistent cognitive memory for AI agents. Sub-millisecond recall, fully offline, encrypted. 8 tools: recall, recall_structured, store, store_code, store_decision, search, insights, consolidate.
Local Cognitive Runtime For Frozen AI Models
Deterministic · No fine-tuning · No cloud training · <1ms recall · ~3 MB
Your AI model is smart. But it forgets everything after every conversation.
AuraSDK is a local cognitive runtime that runs alongside any frozen model. It gives agents durable memory, explainability, governed correction, bounded recall reranking, and bounded self-adaptation through experience — all locally, without fine-tuning or cloud training.
pip install aura-memory
from aura import Aura, Level
brain = Aura("./agent_memory")
brain.enable_full_cognitive_stack() # activate all four bounded reranking overlays
# store what happens
brain.store("User always deploys to staging first", level=Level.Domain, tags=["workflow"])
brain.store("Staging deploy prevented 3 production incidents", level=Level.Domain, tags=["workflow"])
# recall — local retrieval with optional bounded cognitive reranking
context = brain.recall("deployment decision") # <1ms, no API call
# inspect advisory hints produced from stored evidence
hints = brain.get_surfaced_policy_hints()
# → [{"action": "Prefer", "domain": "workflow", "description": "deploy to staging first"}]
No API keys. No embeddings required. No cloud. The model stays the same — the cognitive layer becomes more structured, more inspectable, and more useful over time.
⭐ If AuraSDK is useful to you, a GitHub star helps us get funding to continue development from Kyiv.
| | Aura | Mem0 | Zep | Cognee | Letta/MemGPT |
|---|---|---|---|---|---|
| Architecture | 5-layer cognitive engine | Vector + LLM | Vector + LLM | Graph + LLM | LLM orchestration |
| Derived cognitive layers without LLM | Yes — Belief→Concept→Causal→Policy | No | No | No | No |
| Advisory policy hints from experience | Yes — bounded and non-executing | No | No | No | No |
| Learns from agent's own responses | Yes — bounded, auditable, no fine-tuning | No | No | No | No |
| Salience weighting | Yes — what matters persists longer | No | No | No | No |
| Contradiction governance | Yes — explicit, operator-visible | No | No | No | No |
| LLM required | No | Yes | Yes | Yes | Yes |
| Recall latency | <1ms | ~200ms+ | ~200ms | LLM-bound | LLM-bound |
| Works offline | Fully | Partial | No | No | With local LLM |
| Cost per operation | $0 | API billing | Credit-based | LLM + DB cost | LLM cost |
| Binary size | ~3 MB | ~50 MB+ | Cloud service | Heavy (Neo4j+) | Python pkg |
| Memory decay & promotion | Built-in | Via LLM | Via LLM | No | Via LLM |
| Trust & provenance | Built-in | No | No | No | No |
| Encryption at rest | ChaCha20 + Argon2 | No | No | No | No |
| Language | Rust | Python | Proprietary | Python | Python |
Fine-tuning costs thousands of dollars and weeks of work. RAG requires embeddings and a vector database. Context windows are expensive per token.
Aura gives you a third path: a local cognitive runtime that accumulates structured experience between conversations — free, local, sub-millisecond.
| | GPT-4o-mini + Aura | GPT-4 alone |
|---|---|---|
| Week 1 | → average answers | → average answers |
| Week 4 | → recalls your workflow<br>→ surfaces patterns you repeat<br>→ exposes explainability + correction<br>→ boundedly adapts from experience<br>→ $0 compute cost | → still forgets everything<br>→ same cost per token<br>→ no improvement<br>→ no durable learning<br>→ still billing per call |
The model stays the same. The cognitive layer gets stronger. That's Aura.
Benchmarked on 1,000 records (Windows 10 / Ryzen 7):
| Operation | Latency | vs Mem0 |
|---|---|---|
| Store | 0.09 ms | ~same |
| Recall (structured) | 0.74 ms | ~270× faster |
| Recall (cached) | 0.48 µs | ~400,000× faster |
| Maintenance cycle | 1.1 ms | No equivalent |
Mem0 recall requires an embedding API call (~200ms+) + vector search. Aura recall is pure local computation.
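To sanity-check recall latency on your own hardware, here is a minimal timing sketch (this is not the benchmark harness used above; the seeded data and single-shot timing are illustrative only):

```python
import time

from aura import Aura, Level

brain = Aura("./bench_memory")

# Seed ~1,000 records using the documented store() API
for i in range(1000):
    brain.store(f"Record {i}: staging deploy checklist item", level=Level.Domain, tags=["bench"])

start = time.perf_counter()
results = brain.recall_structured("staging deploy", top_k=5)
elapsed_ms = (time.perf_counter() - start) * 1000
# Assumes recall_structured returns a list-like of scored records
print(f"recall_structured: {elapsed_ms:.2f} ms, {len(results)} results")
```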
Aura's full cognitive recall pipeline is active and bounded:
Record → Belief (±5%) → Concept (±4%) → Causal (±3%) → Policy (±2%)
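Assuming the four per-phase bounds compose multiplicatively (an assumption; the runtime may clamp them differently), the combined reranking adjustment on any record stays within roughly -13% to +15% of its raw RRF baseline score:

```python
# Worst-case combined adjustment if the four per-phase bounds multiply
upper = 1.05 * 1.04 * 1.03 * 1.02   # ~1.147: at most ~+15% over the RRF baseline
lower = 0.95 * 0.96 * 0.97 * 0.98   # ~0.867: at most ~-13% under the RRF baseline
print(f"combined rerank envelope: x{lower:.3f} .. x{upper:.3f}")
```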
Enable everything in one call:
brain.enable_full_cognitive_stack() # activates all four bounded reranking phases
brain.disable_full_cognitive_stack() # back to raw RRF baseline
Or configure individual phases:
brain.set_belief_rerank_mode("limited") # belief-aware ranking
brain.set_concept_surface_mode("limited") # concept annotations + bounded concept reranking
brain.set_causal_rerank_mode("limited") # causal chain boost
brain.set_policy_rerank_mode("limited") # policy hint shaping
Higher layers also expose advisory surfaced output:
- `get_surfaced_concepts()` — stable concept abstractions over repeated beliefs
- `get_surfaced_causal_patterns()` — learned cause→effect patterns
- `get_surfaced_policy_hints()` — advisory recommendations (Prefer / Avoid / Warn)

Aura also ships operator-facing and plasticity-facing surfaces:
- `explain_recall()`
- `explain_record()`
- `provenance_chain()`
- `explainability_bundle()`
- `capture_experience()`
- `ingest_experience_batch()`
- `mark_record_salience()`
- `get_high_salience_records()`
- `get_salience_summary()`
- `get_reflection_summaries()`
- `get_latest_reflection_digest()`
- `get_reflection_digest()`
- `get_belief_instability_summary()`
- `get_contradiction_clusters()`
- `get_contradiction_review_queue()`

Aura organizes memories into 4 levels across 2 tiers. Important memories persist, trivial ones decay naturally:
CORE TIER (slow decay — weeks to months)
Identity [0.99] Who the user is. Preferences. Personality.
Domain [0.95] Learned facts. Domain knowledge.
COGNITIVE TIER (fast decay — hours to days)
Decisions [0.90] Choices made. Action items.
Working [0.80] Current tasks. Recent context.
SEMANTIC TYPES (modulate decay & promotion)
fact Default knowledge record.
decision More persistent than a standard fact. Promotes earlier.
preference Long-lived user or agent preference.
contradiction Preserved longer for conflict analysis.
trend Time-sensitive pattern tracked over repeated activation.
serendipity Cross-domain discovery record.
One call runs the lifecycle — decay, promotion, consolidation, and archival:
report = brain.run_maintenance() # background memory maintenance
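In a long-running agent this is typically invoked on a schedule; a minimal background-loop sketch (the hourly interval is illustrative; see examples/maintenance_daemon.py for a fuller version):

```python
import time

from aura import Aura

brain = Aura("./agent_memory")

while True:
    report = brain.run_maintenance()  # decay + promotion + consolidation + archival
    print("maintenance report:", report)
    time.sleep(3600)  # once per hour; tune to your agent's activity
```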
Core Cognitive Runtime
- Semantic types (`fact`, `decision`, `trend`, `preference`, `contradiction`, `serendipity`) that influence memory behavior and insighting
- `namespace="sandbox"` keeps test data invisible to production recall

Trust & Safety
- Provenance types: `recorded`, `retrieved`, `inferred`, `generated`

Adaptive Memory
- `brain.feedback(id, useful=True)` boosts useful memories, weakens noise
- `brain.supersede(old_id, new_content)` with full version chains
- `brain.snapshot("v1")` / `brain.rollback("v1")` / `brain.diff("v1","v2")`
- `export_context()` / `import_context()` with trust metadata

Enterprise & Integrations
- `store_image()` / `store_audio_transcript()` with media provenance
- `/metrics` endpoint with 10+ business-level counters and histograms
- `telemetry` feature flag with OTLP export and 17 instrumented spans
- `StorageBackend` trait abstraction (FsBackend + MemoryBackend)

Advisory Cognitive Overlays
Explainability & Governed Adaptation
- `explain_recall()`, `explain_record()`, `provenance_chain()`, `explainability_bundle()`

Cognitive Guidance
from aura import Aura, TrustConfig
brain = Aura("./data")
tc = TrustConfig()
tc.source_trust = {"user": 1.0, "api": 0.8, "web_scrape": 0.5}
brain.set_trust_config(tc)
# User facts always rank higher than scraped data in recall
brain.store("User is vegan", channel="user")
brain.store("User might like steak restaurants", channel="web_scrape")
results = brain.recall_structured("food preferences", top_k=5)
# -> "User is vegan" scores higher, always
from aura import Aura
brain = Aura("./data")
# Plug in any embedding function: OpenAI, Ollama, sentence-transformers, etc.
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("all-MiniLM-L6-v2")
brain.set_embedding_fn(lambda text: model.encode(text).tolist())
# Now "login problems" matches "Authentication failed" via semantic similarity
brain.store("Authentication failed for user admin")
results = brain.recall_structured("login problems", top_k=5)
Without embeddings, Aura continues to use its local recall pipeline: still fast, still effective.
brain = Aura("./secret_data", password="my-secure-password")
brain.store("Top secret information")
assert brain.is_encrypted() # ChaCha20-Poly1305 + Argon2id
brain = Aura("./data")
# Decisions are treated as higher-value memory
brain.store("Use PostgreSQL over MySQL", semantic_type="decision", tags=["db"])
# Preferences persist longer than generic working notes
brain.store("User prefers dark mode", semantic_type="preference", tags=["ui"])
# Contradictions are preserved for conflict analysis
brain.store("User said vegan but ordered steak", semantic_type="contradiction")
# Search by semantic type
decisions = brain.search(semantic_type="decision")
# Cross-domain insights surface higher-level patterns
insights = brain.insights(phase=2)
# Example:
# [{'insight_type': 'preference_pattern', 'description': 'Preference cluster around ui', ...}]
brain = Aura("./data")
brain.store("Real preference: dark mode", namespace="default")
brain.store("Test: user likes light mode", namespace="sandbox")
# Recall only sees "default" namespace — sandbox is invisible
results = brain.recall_structured("user preference", top_k=5)
Use this when you need inspection-only analytics across isolated namespaces without changing recall behavior.
brain = Aura("./data")
digest = brain.cross_namespace_digest(
namespaces=["default", "sandbox"],
top_concepts_limit=3,
)
# Top concepts per namespace
print(digest["namespaces"][0]["top_concepts"])
# Pairwise overlap
print(digest["pairs"][0]["shared_tags"])
print(digest["pairs"][0]["shared_concept_signatures"])
print(digest["pairs"][0]["shared_causal_signatures"])
HTTP server:
GET /cross-namespace-digest?namespaces=default,sandbox&top_concepts_limit=3
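The same endpoint can be called from Python; a minimal sketch, assuming the Aura HTTP server is reachable at localhost:8000 (host and port are assumptions; adjust to your deployment):

```python
import requests  # third-party HTTP client; any client works

resp = requests.get(
    "http://localhost:8000/cross-namespace-digest",  # base URL is an assumption
    params={"namespaces": "default,sandbox", "top_concepts_limit": 3},
)
digest = resp.json()
print(digest["namespaces"][0]["top_concepts"])  # same shape as the Python digest above
```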
MCP tool:
{
"tool": "cross_namespace_digest",
"arguments": {
"namespaces": ["default", "sandbox"],
"top_concepts_limit": 3
}
}
The digest is read-only. It does not bypass namespace isolation in recall and does not feed training or inference by default.
For richer operator-facing workflows, see examples/V3_OPERATOR_WORKFLOWS.md.
Aura can also observe model output and feed bounded experience back into the cognitive substrate, without retraining the model.
from aura import Aura
brain = Aura("./data")
brain.set_plasticity_mode("limited")
capture = brain.capture_experience(
prompt="How should we deploy this release?",
retrieved_context=[],
model_response="Deploy to staging first, then verify health checks before production.",
session_id="deploy-session-1",
source="model_inference",
)
brain.ingest_experience_batch([capture])
brain.run_maintenance() # queued experience enters the normal cognitive pipeline
This stays bounded and operator-visible.
Recent operator HTTP endpoints:
- `GET /explain-record`
- `GET /explain-recall`
- `GET /explainability-bundle`
- `GET /correction-log`
- `GET /cross-namespace-digest`
- `GET /memory-health`
- `GET /belief-instability`
- `GET /policy-lifecycle`
- `GET /correction-review-queue`
- `GET /suggested-corrections`
- `GET /namespace-governance-status`

The killer use case: an agent that remembers your preferences after a week offline, with zero API calls.
See examples/personal_assistant.py for the full runnable script.
from aura import Aura, Level
brain = Aura("./assistant_memory")
# Day 1: User tells the agent about themselves
brain.store("User is vegan", level=Level.Identity, tags=["diet"])
brain.store("User loves jazz music", level=Level.Identity, tags=["music"])
brain.store("User works 10am-6pm", level=Level.Identity, tags=["schedule"])
brain.store("Discuss quarterly report tomorrow", level=Level.Working, tags=["task"])
# Simulate a week passing — run maintenance cycles
for _ in range(7):
brain.run_maintenance() # decay + reflect + consolidate + archive
# Day 8: What does the agent remember?
context = brain.recall("user preferences and personality")
# -> Still remembers: vegan, jazz, schedule (Identity, strength ~0.93)
# -> "quarterly report" decayed heavily (Working, strength ~0.21)
Identity persists. Tasks fade. Important patterns get promoted. Like a real brain.
Give any MCP-compatible AI persistent, self-organizing memory:
pip install aura-memory
Claude Desktop — Settings → Developer → Edit Config:
{
"mcpServers": {
"aura": {
"command": "python",
"args": ["-m", "aura", "mcp", "C:\\Users\\YOUR_NAME\\aura_brain"]
}
}
}
Cursor / VS Code — .cursor/mcp.json or .vscode/mcp.json:
{
"servers": {
"aura": {
"command": "python",
"args": ["-m", "aura", "mcp", "./aura_brain"],
"type": "stdio"
}
}
}
macOS / Linux path:
python -m aura mcp ~/aura_brain
Once connected, Claude automatically has 11 tools:
| Tool | Purpose |
|---|---|
| `recall` | Retrieve relevant memories before answering |
| `recall_structured` | Get memories with scores and metadata |
| `store` | Save a fact, note, or context |
| `store_code` | Save a code snippet at Domain level |
| `store_decision` | Save a decision with reasoning |
| `search` | Filter memories by level or tags |
| `insights` | Memory health stats |
| `consolidate` | Merge similar records |
| `get` | Fetch a specific record by ID |
| `delete` | Remove a record by ID |
| `maintain` | Run a full maintenance cycle |
After connecting, tell Claude: "Before answering, always recall relevant context from memory. After our conversation, store key facts."
If cargo test intermittently fails on Windows with LNK1104 for target\debug\deps\aura-...exe, a stale test process is usually holding the file open. Run:
powershell -ExecutionPolicy Bypass -File .\scripts\cleanup_windows_test_lock.ps1
Then rerun the test command.
Aura includes a standalone web dashboard for visual memory management. Download from GitHub Releases.
./aura-dashboard ./my_brain --port 8000
Features: Analytics · Memory Explorer with filtering · Recall Console with live scoring · Batch ingest
| Platform | Binary |
|---|---|
| Windows x64 | aura-dashboard-windows-x64.exe |
| Linux x64 | aura-dashboard-linux-x64 |
| macOS ARM | aura-dashboard-macos-arm64 |
| macOS x64 | aura-dashboard-macos-x64 |
Try now: Open In Colab — zero install, runs in browser
| Integration | Description | Link |
|---|---|---|
| Ollama | Fully local AI assistant, no API key needed | ollama_agent.py |
| LangChain | Drop-in Memory class + prompt injection | langchain_agent.py |
| LlamaIndex | Chat engine with persistent memory recall | llamaindex_agent.py |
| OpenAI Agents | Dynamic instructions with persistent memory | openai_agents.py |
| Claude SDK | System prompt injection + tool use patterns | claude_sdk_agent.py |
| CrewAI | Tool-based recall/store for crew agents | crewai_agent.py |
| AutoGen | Memory protocol implementation | autogen_agent.py |
| FastAPI | Per-user memory middleware with namespace isolation | fastapi_middleware.py |
FFI (C/Go/C#): aura.h · go/main.go · csharp/Program.cs
More examples: basic_usage.py · encryption.py · agent_memory.py · edge_device.py · maintenance_daemon.py · research_bot.py
Aura uses a Rust core with Python bindings and a local-first memory runtime.
Publicly documented concepts are the memory levels and tiers, semantic types, decay and promotion lifecycle, trust and provenance model, and namespace isolation described above.
Higher cognitive layers may be present in the SDK as bounded reranking overlays and advisory inspection surfaces. They are not default runtime decision-making or behavior control.
The public repository documents the user-facing behavior and integration surface. Detailed internal architecture, tuning, and research notes are intentionally not published.
Contributions welcome! See CONTRIBUTING.md for setup instructions and guidelines, or check the open issues.
⭐ If Aura saves you time, a GitHub star helps others discover it and helps us continue development.
Built in Kyiv, Ukraine 🇺🇦 — including during power outages.
Solo developer project. If you find this useful, your star means more than you think.