Compare

Agent memory solutions differ in where memories live, how they’re shared, and what you have to run.

Landscape #

| Solution | How it works | Trade-offs |
|---|---|---|
| Claude Code auto-memory, Windsurf, Cursor | Markdown files on disk, scoped to one client | No setup. No search, no sharing between clients |
| claude-mem, claude-memory | Records tool calls into local SQLite | Rich session history. Local to one machine |
| Nexus, Obsidian MCP | Markdown in an Obsidian vault with optional embeddings | Human-readable notes. Needs Obsidian running as a bridge |
| Agent Zero | FAISS vector search with LLM extraction, four memory areas | Automatic capture. Tied to the Agent Zero framework |
| Mem0 / OpenMemory | LLM extracts memories, Qdrant + Postgres, optional Neo4j graph | Automatic extraction. Cloud or self-hosted; free tier has usage limits |
| Cognee | LLM pipeline builds knowledge graphs from unstructured data | Self-improving graph. LLM on every ingestion; cloud from $35/month |
| MuninnDB | Single binary with ACT-R decay and Bayesian confidence | Sub-20ms queries, no LLM. New project; runs its own binary |
| QMD | Local hybrid search (BM25 + vector + LLM rerank) over markdown files | Fast local search, no cloud. Single machine unless you share files via NAS or sync |
| Ogham MCP | MCP server backed by PostgreSQL + pgvector | Shared database, no LLM. Needs Postgres and an embedding provider |

Local-only #

The simplest options keep memories on your machine. Claude Code, Windsurf, and Cursor write Markdown files scoped to a single client – no setup, but no search and no sharing. claude-mem and claude-memory go further with SQLite session recording, giving you tool-call history across sessions. Obsidian MCP and Nexus use an Obsidian vault as the backing store, which gives you a readable graph of linked notes but requires Obsidian running as a bridge.

QMD takes the local approach further. It runs BM25 keyword search, vector search, and LLM reranking across your markdown files – all locally, using small GGUF models (~2GB). It has an MCP server, so any client can search your notes. If your markdown lives on a NAS or a synced folder, multiple machines can search the same files. That gets you surprisingly far without a database. The trade-off is concurrency – two agents writing to the same file at the same time is where file-based storage gets uncomfortable. But for a single user searching their own notes from different machines, it works well.
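
Under the hood, the usual way to merge a keyword ranking with a vector ranking is reciprocal rank fusion. QMD's exact fusion step isn't documented here, so treat the sketch below as a generic illustration of the idea, not its implementation.

```python
# Generic reciprocal rank fusion (RRF) over a keyword ranking and a
# vector ranking. Illustrative only -- QMD's internal fusion may differ.

def rrf(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Combine several ranked lists of document ids into one ranking.

    Each document scores the sum of 1 / (k + rank) over every list it
    appears in, so documents ranked highly by either signal rise.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical rankings from a BM25 pass and a vector pass over the same notes.
bm25_ranking = ["notes/postgres.md", "notes/pgvector.md", "notes/mcp.md"]
vector_ranking = ["notes/pgvector.md", "notes/embeddings.md", "notes/postgres.md"]

fused = rrf([bm25_ranking, vector_ranking])
print(fused)  # the top few fused hits would then go to the LLM reranker
```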

All of these hit the same wall eventually: file-based storage wasn’t designed for concurrent writes from multiple agents. Sharing via sync or NAS works for reads, but breaks down once multiple clients store memories at the same time.

LLM-powered #

Mem0, Cognee, and Agent Zero use an LLM to extract and organise memories automatically. You don’t call store_memory – the LLM decides what to keep. Mem0 offers a managed cloud platform or a self-hosted version with Qdrant and Postgres, plus optional Neo4j for a knowledge graph. Cognee builds entity-relationship graphs from unstructured data with an LLM pipeline on every ingestion step. Agent Zero bakes memory into its own agent framework with FAISS for local vector search.
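
What "the LLM decides what to keep" looks like in practice is a prompt over the conversation that returns candidate facts. The sketch below is a generic version of that step, not Mem0's, Cognee's, or Agent Zero's actual prompt or pipeline, and the model name is just an example.

```python
# Generic sketch of LLM-side memory extraction -- the model, not the
# user, decides what is worth storing. Prompt and model are examples.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

conversation = (
    "User: we moved the staging database to Neon last week.\n"
    "Assistant: noted, I'll use the Neon connection string for staging."
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system",
         "content": "Extract durable facts worth remembering from the "
                    "conversation. Return one short fact per line, or "
                    "NONE if nothing is worth keeping."},
        {"role": "user", "content": conversation},
    ],
)

print(response.choices[0].message.content)
```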

The trade-off across all three: you need an LLM running in the memory pipeline, not just for chat.

Database-backed (no LLM) #

MuninnDB is a single Go binary with cognitive scoring (ACT-R decay, Bayesian confidence) and sub-20ms queries. It runs its own embedded storage.
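
ACT-R decay comes from cognitive psychology: a memory's base-level activation is the log of a sum of decaying traces of its past uses, so recently and frequently touched memories score higher. How MuninnDB weighs that against its Bayesian confidence isn't shown here; the sketch below is just the standard base-level activation formula.

```python
import math

def base_level_activation(ages_in_seconds: list[float], decay: float = 0.5) -> float:
    """Standard ACT-R base-level activation: B = ln(sum(t_j ** -d)).

    ages_in_seconds: time since each past access of the memory.
    decay: the ACT-R decay parameter, conventionally 0.5.
    """
    return math.log(sum(age ** -decay for age in ages_in_seconds))

# A memory accessed an hour ago and a week ago scores lower than one
# accessed five minutes ago and an hour ago, despite equal access counts.
print(base_level_activation([3600, 7 * 24 * 3600]))   # older accesses
print(base_level_activation([300, 3600]))             # recent accesses
```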

Ogham MCP takes a different route to the same problem. Instead of a dedicated binary, it pushes cognitive scoring, graph traversal, and hybrid search into PostgreSQL – stored procedures, recursive CTEs, and pgvector indexes do the work that would otherwise need a standalone application. Any Postgres instance becomes the memory engine: Supabase, Neon, a VPS, a machine under your desk.
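
The graph-traversal piece, for example, can be a plain recursive CTE. The table and column names below are hypothetical, not Ogham's shipped schema, and Ogham wraps this kind of logic in stored procedures rather than inline SQL; the sketch only shows the shape of the approach.

```python
# Hypothetical sketch: walk memory links with a recursive CTE.
# Table and column names are illustrative, not Ogham MCP's real schema.
import psycopg

RELATED_SQL = """
WITH RECURSIVE related AS (
    SELECT id, 0 AS depth
    FROM memories
    WHERE id = %(start)s
  UNION
    SELECT l.target_id, related.depth + 1
    FROM memory_links l
    JOIN related ON l.source_id = related.id
    WHERE related.depth < %(max_depth)s
)
SELECT m.id, m.content, related.depth
FROM related
JOIN memories m ON m.id = related.id
WHERE m.id <> %(start)s
ORDER BY related.depth;
"""

with psycopg.connect("postgresql://localhost/memories") as conn:
    rows = conn.execute(RELATED_SQL, {"start": 42, "max_depth": 2}).fetchall()
    for memory_id, content, depth in rows:
        print(depth, memory_id, content[:60])
```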

| | Local-only | LLM-powered | Database-backed |
|---|---|---|---|
| LLM needed | No | Yes | No |
| Shared across clients | No | Yes (Mem0, Cognee) | Yes |
| Shared across machines | Possible via NAS or sync | Yes | Yes |
| Automatic extraction | No | Yes | No |
| Infrastructure | None | 3+ services | 1 database + embedding provider |

Ogham vs Mem0 vs Cognee vs Agent Zero #

Four approaches to the same problem, different trade-offs on infrastructure and where your data lives.

Mem0 extracts and deduplicates memories using an LLM, with optional knowledge graphs via Neo4j. There’s a managed cloud platform and a self-hosted open-source version. The cloud handles infrastructure for you; self-hosting means running an API server with Qdrant and Postgres. Free tier: 10,000 memories, 1,000 retrieval calls per month.

Cognee builds knowledge graphs from your data – an LLM pipeline extracts entities and relationships, then refines the graph over time. An LLM runs on every ingestion step, and if you self-host, 32B+ models are recommended. The free tier covers basic workflows; $35/month gets you 1,000 documents, 10,000 API calls, and hosted infrastructure.

Agent Zero bakes memory into its own agent framework. It extracts conversation fragments and problem-solving patterns via LLM, stores them in four memory areas, and uses FAISS for local vector search. The catch: it only works inside Agent Zero, and memories live in local project directories.

Ogham MCP skips the LLM for memory processing entirely. It embeds and indexes what you give it, ranks results with cognitive scoring on top of hybrid search, and discovers relationships by embedding similarity. Everything runs as stored procedures in PostgreSQL – no extra services.
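
Relationship discovery by embedding similarity is the simplest of those mechanisms: if two memories' vectors are close enough, link them. The threshold and toy vectors below are made up; the sketch shows the mechanism, not Ogham's actual linking rule.

```python
import numpy as np

def similarity_links(embeddings: np.ndarray, threshold: float = 0.8) -> list[tuple[int, int]]:
    """Link every pair of memories whose cosine similarity exceeds the threshold.

    embeddings: one row per memory. Returns (i, j) index pairs; a real
    system would persist these as graph edges.
    """
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = normed @ normed.T                      # cosine similarity matrix
    links = []
    for i in range(len(sims)):
        for j in range(i + 1, len(sims)):
            if sims[i, j] >= threshold:
                links.append((i, j))
    return links

# Three toy embeddings: the first two are near-duplicates, the third is unrelated.
vectors = np.array([[0.9, 0.1, 0.0],
                    [0.85, 0.15, 0.05],
                    [0.0, 0.1, 0.95]])
print(similarity_links(vectors))   # -> [(0, 1)]
```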

| | Mem0 / OpenMemory | Cognee | Agent Zero | Ogham MCP |
|---|---|---|---|---|
| Architecture | MCP server + 3 containers | MCP server + graph/vector backends | Built into agent framework | MCP server + PostgreSQL (pgvector) |
| Vector store | Qdrant | Qdrant, LanceDB, Milvus, pgvector, or others | FAISS (local files) | pgvector (any PostgreSQL) |
| Graph store | Neo4j (optional) | Neo4j, Kuzu, FalkorDB, or NetworkX | None | PostgreSQL (recursive CTEs) |
| LLM required | Yes, for memory extraction | Yes, for entity/relationship extraction | Yes, for extraction + consolidation | No |
| Embeddings | OpenAI (default) or self-hosted | OpenAI, Ollama, or others | 100+ providers via LiteLLM | OpenAI, Mistral, Voyage AI, or Ollama (local) |
| Memory creation | Automatic (LLM extracts) | Automatic (LLM builds graph) | Automatic (LLM extracts) | Explicit (store_memory, or hooks/skills) |
| Ranking | Semantic similarity | Graph traversal + vector search | Cosine similarity + metadata | Hybrid search + cognitive scoring (ACT-R + confidence + graph centrality) |
| Graph building | LLM entity extraction (optional) | LLM pipeline (required) | None | Embedding similarity, auto-linked, no LLM |
| Cross-client sharing | Yes (MCP server) | Yes (MCP server) | No (framework-bound) | Yes (MCP server, shared database) |
| Cross-machine sharing | Yes (cloud or self-hosted) | Yes (cloud or self-hosted) | No (sync manually) | Yes (any PostgreSQL: Supabase, Neon, self-hosted, or managed) |
| Wiki / topic synthesis | No | Knowledge graph (entity-level) | No | Yes: compile_wiki synthesizes a tag’s memories into a markdown page (any LLM, cached) |
| Obsidian / markdown export | No | No | No | Yes: ogham export-obsidian snapshots the wiki layer to a folder of plain .md files |
| Managed cloud | Yes (mem0.ai) | Yes (free tier, paid from $35/month) | No | No (use Supabase or Neon free tier) |
| Memory limits | 10k memories free, 1k retrieval calls/month | 1k documents at $35/month, 10k API calls | No limit (local) | No limit (your database) |
| Cost at scale | Paid after free tier limits | $35/month per developer, top-up packs beyond | Free (framework-bound) | Free (MIT); you pay for Postgres hosting |

Why PostgreSQL #

Postgres has been around for over 30 years. OpenAI, Anthropic, Supabase, Neon – they all run Postgres. It’s not a bet on something unproven.

The early LLM wave sent everyone scrambling to specialized vector databases. Reasonable at the time. But those setups mean syncing data between two systems, and sync means drift. Delete a memory in your main database but forget to remove the embedding, and your AI starts referencing things that no longer exist. With Postgres and pgvector, your embeddings live next to your data in one ACID-compliant database. Nothing gets out of sync.

All the heavy lifting happens inside that database. Scoring, search, graph traversal, relationship discovery – stored procedures and recursive CTEs, not a separate service. The MCP server calls those functions and passes back results. Hybrid search (semantic + keyword + relational filters) runs in a single SQL query instead of glue code stitching three services together.
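
A minimal version of that single query might look like the sketch below: a pgvector distance, a full-text rank, and a plain WHERE filter combined with made-up weights. The table name, column names, and weights are illustrative, not Ogham's stored procedures.

```python
# Hypothetical single-query hybrid search: vector distance + full-text rank
# + a relational filter, all in one statement. Not Ogham MCP's real schema.
import psycopg

HYBRID_SQL = """
SELECT id,
       content,
       0.7 * (1 - (embedding <=> %(query_vec)s::vector))
     + 0.3 * ts_rank(search_tsv, plainto_tsquery('english', %(query_text)s))
       AS score
FROM memories
WHERE project = %(project)s
ORDER BY score DESC
LIMIT 10;
"""

# Stand-in query embedding; in practice this comes from your embedding provider.
query_vec = "[" + ",".join(["0.01"] * 1536) + "]"

with psycopg.connect("postgresql://localhost/memories") as conn:
    rows = conn.execute(HYBRID_SQL, {
        "query_vec": query_vec,
        "query_text": "postgres connection pooling",
        "project": "ogham",
    }).fetchall()
    for memory_id, content, score in rows:
        print(round(score, 3), content[:60])
```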

Row-Level Security applies to your AI memories the same way it protects your app data. One policy, one place.
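
As a sketch of what that looks like, here is a hypothetical per-user policy on a memories table; the table and column names are illustrative, not Ogham's shipped schema.

```python
# Hypothetical row-level security on a memories table: each database role
# only sees its own rows. Names are illustrative, not Ogham MCP's schema.
import psycopg

with psycopg.connect("postgresql://localhost/memories") as conn:
    conn.execute("ALTER TABLE memories ENABLE ROW LEVEL SECURITY")
    conn.execute("""
        CREATE POLICY memories_per_owner ON memories
        USING (owner = current_user)
    """)
    # the connection context manager commits on clean exit
```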

So the database is the only thing you need to keep running. Where you put it is your call:

  • Supabase or Neon free tier if you don’t want to manage anything
  • Hetzner or DigitalOcean if you want a $5-10/month box you control
  • Your existing Postgres on AWS RDS, Azure, or GCP if you’re already running one
  • A machine under your desk if you don’t want anything leaving your network

Pair the last option with Ollama for local embeddings and nothing touches the internet at all.

Embeddings come from OpenAI, Mistral, Voyage AI, or Ollama. Swap providers with a config change and re-embed.
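
For the fully local setup, the embedding call is just an HTTP request to the Ollama instance on your machine. The model name below is an example, and how you point Ogham at it depends on its config; the sketch only shows that nothing leaves localhost.

```python
# Local embeddings via Ollama's HTTP API (http://localhost:11434).
# The model name is an example; any embedding model pulled into Ollama works.
import requests

def embed_locally(text: str, model: str = "nomic-embed-text") -> list[float]:
    """Return an embedding vector from a locally running Ollama instance."""
    response = requests.post(
        "http://localhost:11434/api/embeddings",
        json={"model": model, "prompt": text},
        timeout=60,
    )
    response.raise_for_status()
    return response.json()["embedding"]

vector = embed_locally("Postgres connection pooling notes")
print(len(vector))   # dimensionality depends on the model
```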

The trade-off is real: you need a PostgreSQL instance and an embedding provider, where local-only solutions need neither. But the database is shared from day one. Point a new machine at the same Postgres instance and it just works – no file sync, no export/import.