Compare

Agent memory solutions differ in where memories live, how they’re shared, and what you have to run.

Landscape #

| Solution | How it works | Trade-offs |
|---|---|---|
| Claude Code auto-memory, Windsurf, Cursor | Markdown files on disk, scoped to one client | No setup. No search, no sharing between clients |
| claude-mem, claude-memory | Records tool calls into local SQLite | Rich session history. Local to one machine |
| Nexus, Obsidian MCP | Markdown in an Obsidian vault with optional embeddings | Human-readable notes. Needs Obsidian running as a bridge |
| Agent Zero | FAISS vector search with LLM extraction, four memory areas | Automatic capture. Tied to the Agent Zero framework |
| Mem0 / OpenMemory | LLM extracts memories, Qdrant + Postgres, optional Neo4j graph | Automatic extraction. Cloud or self-hosted; free tier has usage limits |
| Cognee | LLM pipeline builds knowledge graphs from unstructured data | Self-improving graph. LLM on every ingestion; cloud from $35/month |
| MuninnDB | Single binary with ACT-R decay and Bayesian confidence | Sub-20ms queries, no LLM. New project; runs its own binary |
| QMD | Local hybrid search (BM25 + vector + LLM rerank) over markdown files | Fast local search, no cloud. Single machine unless you share files via NAS or sync |
| Ogham MCP | MCP server backed by PostgreSQL + pgvector | Shared database, no LLM. Needs Postgres and an embedding provider |

Local-only #

The simplest options keep memories on your machine. Claude Code, Windsurf, and Cursor write Markdown files scoped to a single client – no setup, but no search and no sharing. claude-mem and claude-memory go further with SQLite session recording, giving you tool-call history across sessions. Obsidian MCP and Nexus use an Obsidian vault as the backing store, which gives you a readable graph of linked notes but requires Obsidian running as a bridge.

QMD takes the local approach further. It runs BM25 keyword search, vector search, and LLM reranking across your markdown files – all locally, using small GGUF models (~2GB). It has an MCP server, so any client can search your notes. If your markdown lives on a NAS or a synced folder, multiple machines can search the same files. That gets you surprisingly far without a database. The trade-off is concurrency – two agents writing to the same file at the same time is where file-based storage gets uncomfortable. But for a single user searching their own notes from different machines, it works well.
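
Under the hood, the usual way to merge a keyword ranking with a vector ranking is reciprocal rank fusion. QMD's exact fusion step isn't documented here, so treat the sketch below as a generic illustration of the idea, not its implementation.

```python
# Generic reciprocal rank fusion (RRF) over a keyword ranking and a
# vector ranking. Illustrative only -- QMD's internal fusion may differ.

def rrf(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Combine several ranked lists of document ids into one ranking.

    Each document scores the sum of 1 / (k + rank) over every list it
    appears in, so documents ranked highly by either signal rise.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical rankings from a BM25 pass and a vector pass over the same notes.
bm25_ranking = ["notes/postgres.md", "notes/pgvector.md", "notes/mcp.md"]
vector_ranking = ["notes/pgvector.md", "notes/embeddings.md", "notes/postgres.md"]

fused = rrf([bm25_ranking, vector_ranking])
print(fused)  # the top few fused hits would then go to the LLM reranker
```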

All of these hit the same wall eventually: file-based storage wasn’t designed for concurrent writes from multiple agents. Sharing via sync or NAS works for reads, but breaks down once multiple clients store memories at the same time.

LLM-powered #

Mem0, Cognee, and Agent Zero use an LLM to extract and organise memories automatically. You don’t call store_memory – the LLM decides what to keep. Mem0 offers a managed cloud platform or a self-hosted version with Qdrant and Postgres, plus optional Neo4j for a knowledge graph. Cognee builds entity-relationship graphs from unstructured data with an LLM pipeline on every ingestion step. Agent Zero bakes memory into its own agent framework with FAISS for local vector search.
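
What "the LLM decides what to keep" looks like in practice is a prompt over the conversation that returns candidate facts. The sketch below is a generic version of that step, not Mem0's, Cognee's, or Agent Zero's actual prompt or pipeline, and the model name is just an example.

```python
# Generic sketch of LLM-side memory extraction -- the model, not the
# user, decides what is worth storing. Prompt and model are examples.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

conversation = (
    "User: we moved the staging database to Neon last week.\n"
    "Assistant: noted, I'll use the Neon connection string for staging."
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system",
         "content": "Extract durable facts worth remembering from the "
                    "conversation. Return one short fact per line, or "
                    "NONE if nothing is worth keeping."},
        {"role": "user", "content": conversation},
    ],
)

print(response.choices[0].message.content)
```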

The trade-off across all three: you need an LLM running in the memory pipeline, not just for chat.

Database-backed (no LLM) #

MuninnDB is a single Go binary with cognitive scoring (ACT-R decay, Bayesian confidence) and sub-20ms queries. It runs its own embedded storage.
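
ACT-R decay comes from cognitive psychology: a memory's base-level activation is the log of a sum of decaying traces of its past uses, so recently and frequently touched memories score higher. How MuninnDB weighs that against its Bayesian confidence isn't shown here; the sketch below is just the standard base-level activation formula.

```python
import math

def base_level_activation(ages_in_seconds: list[float], decay: float = 0.5) -> float:
    """Standard ACT-R base-level activation: B = ln(sum(t_j ** -d)).

    ages_in_seconds: time since each past access of the memory.
    decay: the ACT-R decay parameter, conventionally 0.5.
    """
    return math.log(sum(age ** -decay for age in ages_in_seconds))

# A memory accessed an hour ago and a week ago scores lower than one
# accessed five minutes ago and an hour ago, despite equal access counts.
print(base_level_activation([3600, 7 * 24 * 3600]))   # older accesses
print(base_level_activation([300, 3600]))             # recent accesses
```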

Ogham MCP takes a different route to the same problem. Instead of a dedicated binary, it pushes cognitive scoring, graph traversal, and hybrid search into PostgreSQL – stored procedures, recursive CTEs, and pgvector indexes do the work that would otherwise need a standalone application. Any Postgres instance becomes the memory engine: Supabase, Neon, a VPS, a machine under your desk.
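
The graph-traversal piece, for example, can be a plain recursive CTE. The table and column names below are hypothetical, not Ogham's shipped schema, and Ogham wraps this kind of logic in stored procedures rather than inline SQL; the sketch only shows the shape of the approach.

```python
# Hypothetical sketch: walk memory links with a recursive CTE.
# Table and column names are illustrative, not Ogham MCP's real schema.
import psycopg

RELATED_SQL = """
WITH RECURSIVE related AS (
    SELECT id, 0 AS depth
    FROM memories
    WHERE id = %(start)s
  UNION
    SELECT l.target_id, related.depth + 1
    FROM memory_links l
    JOIN related ON l.source_id = related.id
    WHERE related.depth < %(max_depth)s
)
SELECT m.id, m.content, related.depth
FROM related
JOIN memories m ON m.id = related.id
WHERE m.id <> %(start)s
ORDER BY related.depth;
"""

with psycopg.connect("postgresql://localhost/memories") as conn:
    rows = conn.execute(RELATED_SQL, {"start": 42, "max_depth": 2}).fetchall()
    for memory_id, content, depth in rows:
        print(depth, memory_id, content[:60])
```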

| | Local-only | LLM-powered | Database-backed |
|---|---|---|---|
| LLM needed | No | Yes | No |
| Shared across clients | No | Yes (Mem0, Cognee) | Yes |
| Shared across machines | Possible via NAS or sync | Yes | Yes |
| Automatic extraction | No | Yes | No |
| Infrastructure | None | 3+ services | 1 database + embedding provider |

Ogham vs Mem0 vs Cognee vs Agent Zero #

Four approaches to the same problem, different trade-offs on infrastructure and where your data lives.

Mem0 extracts and deduplicates memories using an LLM, with optional knowledge graphs via Neo4j. There’s a managed cloud platform and a self-hosted open-source version. The cloud handles infrastructure for you; self-hosting means running an API server with Qdrant and Postgres. Free tier: 10,000 memories, 1,000 retrieval calls per month.

Cognee builds knowledge graphs from your data – an LLM pipeline extracts entities and relationships, then refines the graph over time. An LLM runs on every ingestion step, and if you self-host, 32B+ models are recommended. The free tier covers basic workflows; $35/month gets you 1,000 documents, 10,000 API calls, and hosted infrastructure.

Agent Zero bakes memory into its own agent framework. It extracts conversation fragments and problem-solving patterns via LLM, stores them in four memory areas, and uses FAISS for local vector search. The catch: it only works inside Agent Zero, and memories live in local project directories.

Ogham MCP skips the LLM for memory processing entirely. It embeds and indexes what you give it, ranks results with cognitive scoring on top of hybrid search, and discovers relationships by embedding similarity. Everything runs as stored procedures in PostgreSQL – no extra services.
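
Relationship discovery by embedding similarity is the simplest of those mechanisms: if two memories' vectors are close enough, link them. The threshold and toy vectors below are made up; the sketch shows the mechanism, not Ogham's actual linking rule.

```python
import numpy as np

def similarity_links(embeddings: np.ndarray, threshold: float = 0.8) -> list[tuple[int, int]]:
    """Link every pair of memories whose cosine similarity exceeds the threshold.

    embeddings: one row per memory. Returns (i, j) index pairs; a real
    system would persist these as graph edges.
    """
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = normed @ normed.T                      # cosine similarity matrix
    links = []
    for i in range(len(sims)):
        for j in range(i + 1, len(sims)):
            if sims[i, j] >= threshold:
                links.append((i, j))
    return links

# Three toy embeddings: the first two are near-duplicates, the third is unrelated.
vectors = np.array([[0.9, 0.1, 0.0],
                    [0.85, 0.15, 0.05],
                    [0.0, 0.1, 0.95]])
print(similarity_links(vectors))   # -> [(0, 1)]
```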

| | Mem0 / OpenMemory | Cognee | Agent Zero | Ogham MCP |
|---|---|---|---|---|
| Architecture | MCP server + 3 containers | MCP server + graph/vector backends | Built into agent framework | MCP server + PostgreSQL (pgvector) |
| Vector store | Qdrant | Qdrant, LanceDB, Milvus, pgvector, or others | FAISS (local files) | pgvector (any PostgreSQL) |
| Graph store | Neo4j (optional) | Neo4j, Kuzu, FalkorDB, or NetworkX | None | PostgreSQL (recursive CTEs) |
| LLM required | Yes, for memory extraction | Yes, for entity/relationship extraction | Yes, for extraction + consolidation | No |
| Embeddings | OpenAI (default) or self-hosted | OpenAI, Ollama, or others | 100+ providers via LiteLLM | OpenAI, Mistral, Voyage AI, or Ollama (local) |
| Memory creation | Automatic (LLM extracts) | Automatic (LLM builds graph) | Automatic (LLM extracts) | Explicit (store_memory, or hooks/skills) |
| Ranking | Semantic similarity | Graph traversal + vector search | Cosine similarity + metadata | Hybrid search + cognitive scoring (ACT-R + confidence + graph centrality) |
| Graph building | LLM entity extraction (optional) | LLM pipeline (required) | None | Embedding similarity, auto-linked, no LLM |
| Cross-client sharing | Yes (MCP server) | Yes (MCP server) | No (framework-bound) | Yes (MCP server, shared database) |
| Cross-machine sharing | Yes (cloud or self-hosted) | Yes (cloud or self-hosted) | No (sync manually) | Yes (any PostgreSQL: Supabase, Neon, self-hosted, or managed) |
| Wiki / topic synthesis | No | Knowledge graph (entity-level) | No | Yes: compile_wiki synthesizes a tag’s memories into a markdown page (any LLM, cached) |
| Obsidian / markdown export | No | No | No | Yes: ogham export-obsidian snapshots the wiki layer to a folder of plain .md files |
| Managed cloud | Yes (mem0.ai) | Yes (free tier, paid from $35/month) | No | No (use Supabase or Neon free tier) |
| Memory limits | 10k memories free, 1k retrieval calls/month | 1k documents at $35/month, 10k API calls | No limit (local) | No limit (your database) |
| Cost at scale | Paid after free tier limits | $35/month per developer, top-up packs beyond | Free (framework-bound) | Free (MIT); you pay for Postgres hosting |

Why PostgreSQL #

Postgres has been around for over 30 years. OpenAI, Anthropic, Supabase, Neon – they all run Postgres. It’s not a bet on something unproven.

The early LLM wave sent everyone scrambling to specialized vector databases. Reasonable at the time. But those setups mean syncing data between two systems, and sync means drift. Delete a memory in your main database but forget to remove the embedding, and your AI starts referencing things that no longer exist. With Postgres and pgvector, your embeddings live next to your data in one ACID-compliant database. Nothing gets out of sync.

All the heavy lifting happens inside that database. Scoring, search, graph traversal, relationship discovery – stored procedures and recursive CTEs, not a separate service. The MCP server calls those functions and passes back results. Hybrid search (semantic + keyword + relational filters) runs in a single SQL query instead of glue code stitching three services together.
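
A minimal version of that single query might look like the sketch below: a pgvector distance, a full-text rank, and a plain WHERE filter combined with made-up weights. The table name, column names, and weights are illustrative, not Ogham's stored procedures.

```python
# Hypothetical single-query hybrid search: vector distance + full-text rank
# + a relational filter, all in one statement. Not Ogham MCP's real schema.
import psycopg

HYBRID_SQL = """
SELECT id,
       content,
       0.7 * (1 - (embedding <=> %(query_vec)s::vector))
     + 0.3 * ts_rank(search_tsv, plainto_tsquery('english', %(query_text)s))
       AS score
FROM memories
WHERE project = %(project)s
ORDER BY score DESC
LIMIT 10;
"""

# Stand-in query embedding; in practice this comes from your embedding provider.
query_vec = "[" + ",".join(["0.01"] * 1536) + "]"

with psycopg.connect("postgresql://localhost/memories") as conn:
    rows = conn.execute(HYBRID_SQL, {
        "query_vec": query_vec,
        "query_text": "postgres connection pooling",
        "project": "ogham",
    }).fetchall()
    for memory_id, content, score in rows:
        print(round(score, 3), content[:60])
```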

Row-Level Security applies to your AI memories the same way it protects your app data. One policy, one place.
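
As a sketch of what that looks like, here is a hypothetical per-user policy on a memories table; the table and column names are illustrative, not Ogham's shipped schema.

```python
# Hypothetical row-level security on a memories table: each database role
# only sees its own rows. Names are illustrative, not Ogham MCP's schema.
import psycopg

with psycopg.connect("postgresql://localhost/memories") as conn:
    conn.execute("ALTER TABLE memories ENABLE ROW LEVEL SECURITY")
    conn.execute("""
        CREATE POLICY memories_per_owner ON memories
        USING (owner = current_user)
    """)
    # the connection context manager commits on clean exit
```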

So the database is the only thing you need to keep running. Where you put it is your call:

  • Supabase or Neon free tier if you don’t want to manage anything
  • Hetzner or DigitalOcean if you want a $5-10/month box you control
  • Your existing Postgres on AWS RDS, Azure, or GCP if you’re already running one
  • A machine under your desk if you don’t want anything leaving your network

Pair the last option with Ollama for local embeddings and nothing touches the internet at all.

Embeddings come from OpenAI, Mistral, Voyage AI, or Ollama. Swap providers with a config change and re-embed.
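
For the fully local setup, the embedding call is just an HTTP request to the Ollama instance on your machine. The model name below is an example, and how you point Ogham at it depends on its config; the sketch only shows that nothing leaves localhost.

```python
# Local embeddings via Ollama's HTTP API (http://localhost:11434).
# The model name is an example; any embedding model pulled into Ollama works.
import requests

def embed_locally(text: str, model: str = "nomic-embed-text") -> list[float]:
    """Return an embedding vector from a locally running Ollama instance."""
    response = requests.post(
        "http://localhost:11434/api/embeddings",
        json={"model": model, "prompt": text},
        timeout=60,
    )
    response.raise_for_status()
    return response.json()["embedding"]

vector = embed_locally("Postgres connection pooling notes")
print(len(vector))   # dimensionality depends on the model
```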

The trade-off is real: you need a PostgreSQL instance and an embedding provider, where local-only solutions need neither. But the database is shared from day one. Point a new machine at the same Postgres instance and it just works – no file sync, no export/import.