# Compare
Agent memory solutions differ in where memories live, how they’re shared, and what you have to run.
## Landscape
| Solution | How it works | Trade-offs |
|---|---|---|
| Claude Code auto-memory, Windsurf, Cursor | Markdown files on disk, scoped to one client | No setup. No search, no sharing between clients |
| claude-mem, claude-memory | Records tool calls into local SQLite | Rich session history. Local to one machine |
| Nexus, Obsidian MCP | Markdown in an Obsidian vault with optional embeddings | Human-readable notes. Needs Obsidian running as a bridge |
| Agent Zero | FAISS vector search with LLM extraction, four memory areas | Automatic capture. Tied to the Agent Zero framework |
| Mem0 / OpenMemory | LLM extracts memories, Qdrant + Postgres, optional Neo4j graph | Automatic extraction. Cloud or self-hosted, free tier has usage limits |
| Cognee | LLM pipeline builds knowledge graphs from unstructured data | Self-improving graph. LLM on every ingestion, cloud from $35/month |
| MuninnDB | Single binary with ACT-R decay and Bayesian confidence | Sub-20ms queries, no LLM. New project, runs its own binary |
| QMD | Local hybrid search (BM25 + vector + LLM rerank) over markdown files | Fast local search, no cloud. Single machine unless you share files via NAS or sync |
| Ogham MCP | MCP server backed by PostgreSQL + pgvector | Shared database, no LLM. Needs Postgres and an embedding provider |
## Local-only
The simplest options keep memories on your machine. Claude Code, Windsurf, and Cursor write Markdown files scoped to a single client – no setup, but no search and no sharing. claude-mem and claude-memory go further with SQLite session recording, giving you tool-call history across sessions. Obsidian MCP and Nexus use an Obsidian vault as the backing store, which gives you a readable graph of linked notes but requires Obsidian running as a bridge.
QMD takes the local approach further. It runs BM25 keyword search, vector search, and LLM reranking across your markdown files – all locally, using small GGUF models (~2GB). It has an MCP server, so any client can search your notes. If your markdown lives on a NAS or a synced folder, multiple machines can search the same files. That gets you surprisingly far without a database. The trade-off is concurrency – two agents writing to the same file at the same time is where file-based storage gets uncomfortable. But for a single user searching their own notes from different machines, it works well.
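The fusion step can be sketched in a few lines. QMD's exact fusion method isn't specified here; reciprocal rank fusion (RRF) is one common way to merge a BM25 ranking with a vector-similarity ranking before an optional rerank, shown below with made-up file names:

```python
# Illustrative sketch of hybrid-search result fusion, not QMD's actual code.
# Reciprocal rank fusion: a document's score is the sum of 1/(k + rank)
# across all rankings it appears in, so agreement between BM25 and the
# vector search outweighs a high rank in either list alone.

def rrf_fuse(rankings, k=60):
    """Merge several ranked lists of doc ids into one fused ranking."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits   = ["notes/pg.md", "notes/mem0.md", "notes/qmd.md"]
vector_hits = ["notes/qmd.md", "notes/pg.md", "notes/agents.md"]

fused = rrf_fuse([bm25_hits, vector_hits])
print(fused[0])  # "notes/pg.md" — ranked highly in both lists
```

The fused top results are what an LLM reranker would then reorder.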
All of these hit the same wall eventually: file-based storage wasn’t designed for concurrent writes from multiple agents. Sharing via sync or NAS works for reads, less so when multiple clients are storing memories at the same time.
## LLM-powered
Mem0, Cognee, and Agent Zero use an LLM to extract and organise memories automatically. You don’t call store_memory – the LLM decides what to keep. Mem0 offers a managed cloud platform or a self-hosted version with Qdrant and Postgres, plus optional Neo4j for a knowledge graph. Cognee builds entity-relationship graphs from unstructured data with an LLM pipeline on every ingestion step. Agent Zero bakes memory into its own agent framework with FAISS for local vector search.
The trade-off across all three: you need an LLM running in the memory pipeline, not just for chat.
## Database-backed (no LLM)
MuninnDB is a single Go binary with cognitive scoring (ACT-R decay, Bayesian confidence) and sub-20ms queries. It runs its own embedded storage.
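The decay side of that scoring follows the textbook ACT-R base-level activation formula; this is the standard model, not MuninnDB's implementation. A memory retrieved recently and often scores higher than one retrieved once, long ago:

```python
# Hedged sketch of ACT-R-style base-level activation (the textbook
# formula, not MuninnDB's code): B = ln(sum over past retrievals of
# t^-d), where t is the time since each retrieval and d is the decay
# rate (0.5 is the conventional default).
import math

def base_level_activation(ages_in_hours, d=0.5):
    return math.log(sum(t ** -d for t in ages_in_hours))

fresh = base_level_activation([1, 5, 24])  # used three times, recently
stale = base_level_activation([500])       # used once, ~3 weeks ago
print(fresh > stale)  # True: recency and frequency both raise activation
```

Queries then rank memories by activation, letting stale entries fade without ever deleting them.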
Ogham MCP takes a different route to the same problem. Instead of a dedicated binary, it pushes cognitive scoring, graph traversal, and hybrid search into PostgreSQL – stored procedures, recursive CTEs, and pgvector indexes do the work that would otherwise need a standalone application. Any Postgres instance becomes the memory engine: Supabase, Neon, a VPS, a machine under your desk.
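The traversal pattern is plain SQL. The sketch below uses stdlib `sqlite3` (which also supports `WITH RECURSIVE`) so it runs anywhere; the Postgres-specific pieces (pgvector, stored procedures) are omitted, and the table and column names are illustrative, not Ogham's actual schema:

```python
# Recursive-CTE graph walk: find every memory reachable from a starting
# memory within 3 hops, entirely inside the database. Demonstrated with
# sqlite3; the same SQL shape works in PostgreSQL.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE memory_links (src TEXT, dst TEXT);
INSERT INTO memory_links VALUES
  ('postgres', 'pgvector'), ('pgvector', 'embeddings'),
  ('embeddings', 'ollama'), ('unrelated', 'island');
""")

rows = db.execute("""
WITH RECURSIVE reachable(id, depth) AS (
    SELECT 'postgres', 0
  UNION
    SELECT l.dst, r.depth + 1
    FROM memory_links l JOIN reachable r ON l.src = r.id
    WHERE r.depth < 3
)
SELECT id FROM reachable ORDER BY depth;
""").fetchall()

print([r[0] for r in rows])  # ['postgres', 'pgvector', 'embeddings', 'ollama']
```

The disconnected `island` row never appears: the walk only follows links outward from the start node, which is what makes "related memories" a single query rather than application code.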
| | Local-only | LLM-powered | Database-backed |
|---|---|---|---|
| LLM needed | No | Yes | No |
| Shared across clients | No | Yes (Mem0, Cognee) | Yes |
| Shared across machines | Possible via NAS or sync | Yes | Yes |
| Automatic extraction | No | Yes | No |
| Infrastructure | None | 3+ services | 1 database + embedding provider |
## Ogham vs Mem0 vs Cognee vs Agent Zero
Four approaches to the same problem, with different trade-offs on infrastructure and where your data lives.
Mem0 extracts and deduplicates memories using an LLM, with optional knowledge graphs via Neo4j. There’s a managed cloud platform and a self-hosted open-source version. The cloud handles infrastructure for you; self-hosting means running an API server with Qdrant and Postgres. Free tier: 10,000 memories, 1,000 retrieval calls per month.
Cognee builds knowledge graphs from your data – an LLM pipeline extracts entities and relationships, then refines the graph over time. An LLM runs on every ingestion step, and self-hosting is recommended with 32B+ models. The free tier covers basic workflows. $35/month gets you 1,000 documents, 10,000 API calls, and hosted infrastructure.
Agent Zero bakes memory into its own agent framework. It extracts conversation fragments and problem-solving patterns via LLM, stores them in four memory areas, and uses FAISS for local vector search. The catch: it only works inside Agent Zero, and memories live in local project directories.
Ogham MCP skips the LLM for memory processing entirely. It embeds and indexes what you give it, ranks results with cognitive scoring on top of hybrid search, and discovers relationships by embedding similarity. Everything runs as stored procedures in PostgreSQL – no extra services.
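Relationship discovery by embedding similarity can be sketched as a pairwise cosine check. The vectors and the 0.8 threshold below are made up for illustration; in practice the embeddings come from the configured provider and the comparison runs in the database:

```python
# Toy sketch of LLM-free graph building: auto-link every pair of
# memories whose embeddings are close enough. Vectors and threshold
# are illustrative, not Ogham's actual values.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

memories = {
    "pg-tuning":  [0.9, 0.1, 0.0],
    "pg-backups": [0.8, 0.2, 0.1],
    "recipes":    [0.0, 0.1, 0.9],
}

links = [
    (a, b)
    for i, (a, va) in enumerate(memories.items())
    for b, vb in list(memories.items())[i + 1:]
    if cosine(va, vb) > 0.8
]
print(links)  # [('pg-tuning', 'pg-backups')]
```

The two Postgres notes get linked; the recipe note stays unconnected. No LLM ever sees the content.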
| | Mem0 / OpenMemory | Cognee | Agent Zero | Ogham MCP |
|---|---|---|---|---|
| Architecture | MCP server + 3 containers | MCP server + graph/vector backends | Built into agent framework | MCP server + PostgreSQL (pgvector) |
| Vector store | Qdrant | Qdrant, LanceDB, Milvus, pgvector, or others | FAISS (local files) | pgvector (any PostgreSQL) |
| Graph store | Neo4j (optional) | Neo4j, Kuzu, FalkorDB, or NetworkX | None | PostgreSQL (recursive CTEs) |
| LLM required | Yes, for memory extraction | Yes, for entity/relationship extraction | Yes, for extraction + consolidation | No |
| Embeddings | OpenAI (default) or self-hosted | OpenAI, Ollama, or others | 100+ providers via LiteLLM | OpenAI, Mistral, Voyage AI, or Ollama (local) |
| Memory creation | Automatic (LLM extracts) | Automatic (LLM builds graph) | Automatic (LLM extracts) | Explicit (store_memory, or hooks/skills) |
| Ranking | Semantic similarity | Graph traversal + vector search | Cosine similarity + metadata | Hybrid search + cognitive scoring (ACT-R + confidence + graph centrality) |
| Graph building | LLM entity extraction (optional) | LLM pipeline (required) | None | Embedding similarity — auto-linked, no LLM |
| Cross-client sharing | Yes (MCP server) | Yes (MCP server) | No (framework-bound) | Yes (MCP server, shared database) |
| Cross-machine sharing | Yes (cloud or self-hosted) | Yes (cloud or self-hosted) | No (sync manually) | Yes (any PostgreSQL — Supabase, Neon, self-hosted, or managed) |
| Wiki / topic synthesis | No | Knowledge graph (entity-level) | No | Yes — compile_wiki synthesizes a tag’s memories into a markdown page (any LLM, cached) |
| Obsidian / markdown export | No | No | No | Yes — ogham export-obsidian snapshots the wiki layer to a folder of plain .md files |
| Managed cloud | Yes (mem0.ai) | Yes (free tier, paid from $35/month) | No | No (use Supabase or Neon free tier) |
| Memory limits | 10k memories free, 1k retrieval calls/month | 1k documents at $35/month, 10k API calls | No limit (local) | No limit (your database) |
| Cost at scale | Paid after free tier limits | $35/month per developer, top-up packs beyond | Free (framework-bound) | Free (MIT) – you pay for Postgres hosting |
## Why PostgreSQL
Postgres has been around for over 30 years. OpenAI, Anthropic, Supabase, Neon – they all run Postgres. It’s not a bet on something unproven.
The early LLM wave sent everyone scrambling to specialized vector databases. Reasonable at the time. But those setups mean syncing data between two systems, and sync means drift. Delete a memory in your main database but forget to remove the embedding, and your AI starts referencing things that no longer exist. With Postgres and pgvector, your embeddings live next to your data in one ACID-compliant database. Nothing gets out of sync.
All the heavy lifting happens inside that database. Scoring, search, graph traversal, relationship discovery – stored procedures and recursive CTEs, not a separate service. The MCP server calls those functions and passes back results. Hybrid search (semantic + keyword + relational filters) runs in a single SQL query instead of glue code stitching three services together.
Row-Level Security applies to your AI memories the same way it protects your app data. One policy, one place.
So the database is the only thing you need to keep running. Where you put it is your call:
- Supabase or Neon free tier if you don’t want to manage anything
- Hetzner or DigitalOcean if you want a $5–10/month box you control
- Your existing Postgres on AWS RDS, Azure, or GCP if you’re already running one
- A machine under your desk if you don’t want anything leaving your network
Pair the last option with Ollama for local embeddings and nothing touches the internet at all.
Embeddings come from OpenAI, Mistral, Voyage AI, or Ollama. Swap providers with a config change and re-embed.
The trade-off is real: you need a PostgreSQL instance and an embedding provider, where local-only solutions need neither. But the database is shared from day one. Point a new machine at the same Postgres instance and it just works – no file sync, no export/import.