Blog

The missing primitive: why AI agents need persistent memory

3 mins

Nate B Jones put out a video yesterday breaking down three tools Anthropic just shipped – Dispatch, Computer Use, and Scheduled Tasks. The tools are worth knowing about. But the more interesting thread is something Nate keeps circling back to between the demos.

He calls it “Open Brain.” A database you control. Cheap. Almost free. Where your AI stores what works and what doesn’t, and that knowledge compounds over time.

We’ve been building that. It’s called Ogham.

We ripped out litellm in one afternoon

3 mins

On Monday, litellm versions 1.82.7 and 1.82.8 hit PyPI with credential-stealing malware baked in. The payload collected SSH keys, .env files, cloud credentials, Kubernetes configs, and shell history, encrypted everything with RSA-4096, and exfiltrated it to an attacker-controlled domain. Then it tried to plant persistent backdoors in your kube-system namespace and at ~/.config/sysmon/sysmon.py.

We had litellm in our API gateway. Version 1.82.4 – safe, but one careless pip install --upgrade away from compromise.
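If you want a belt-and-braces guard on top of pinning, a startup version check is cheap. A minimal sketch, assuming a pinned-version constant you maintain yourself (the "1.82.4" mirrors the version mentioned above; nothing below is litellm API):

```python
from importlib.metadata import version

# The exact litellm version we audited. A careless
# `pip install --upgrade litellm` now fails loudly at startup
# instead of silently running unreviewed code.
PINNED_LITELLM = "1.82.4"

installed = version("litellm")
if installed != PINNED_LITELLM:
    raise RuntimeError(
        f"litellm {installed} installed, expected {PINNED_LITELLM}; "
        "refusing to start with an unaudited version"
    )
```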

Claude now remembers things. Here's what it doesn't remember.

5 mins

Anthropic shipped three features recently that matter if you care about AI memory: auto-memory for session-to-session learning, auto-dream for background memory consolidation, and auto mode for more autonomous tool use.

Auto-memory landed in v2.1.59. Claude now writes notes to itself between sessions – build commands it discovered, debugging patterns, your code style preferences. These live as markdown files in ~/.claude/projects/<project>/memory/, and the first 200 lines get loaded at the start of every conversation.
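Mechanically, the load step is easy to picture. A sketch of the equivalent logic, reading the path layout from the post – whether the 200-line budget applies per file or in total is our guess, not documented behaviour:

```python
from pathlib import Path

def load_project_memory(project: str, max_lines: int = 200) -> str:
    """Gather memory notes for a project, truncated to the same
    200-line budget applied at conversation start."""
    memory_dir = Path.home() / ".claude" / "projects" / project / "memory"
    chunks = []
    for note in sorted(memory_dir.glob("*.md")):
        lines = note.read_text().splitlines()[:max_lines]
        chunks.append("\n".join(lines))
    return "\n\n".join(chunks)
```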

Gemini Embedding 2 at 512 dims – does multimodal survive compression?

4 mins

This week, an article on enterprise knowledge-base search caught my interest: it demonstrated Gemini Embedding 2 bridging power-curve images and CSV maintenance data in a single embedding call. The Python code was minimal. Most of the intelligence was in the model.

It got me thinking about a practical question: Gemini Embedding 2 outputs 3072-dimensional vectors. That’s expensive to index. Google says it supports Matryoshka Representation Learning (MRL) for truncation to lower dimensions. But does the cross-modal magic survive the compression?
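The truncation itself is mechanical: keep the leading dimensions and re-normalise, which is what MRL training is supposed to make safe. A minimal numpy sketch (the random vector stands in for a real embedding response):

```python
import numpy as np

def mrl_truncate(embedding: np.ndarray, dims: int = 512) -> np.ndarray:
    """Keep the first `dims` components of an MRL-trained embedding
    and L2-renormalise so cosine similarity still behaves."""
    head = embedding[:dims]
    return head / np.linalg.norm(head)

full = np.random.default_rng(0).normal(size=3072)  # stand-in for a 3072-d vector
small = mrl_truncate(full, 512)
assert small.shape == (512,)
```

The open question is not whether this runs – it always runs – but how much cross-modal retrieval quality the leading 512 dimensions actually retain.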

We ran 400 questions against pgvector. Here's the retrieval curve.

4 mins

Most memory benchmarks ask one thing: can you find a fact? The BEAM benchmark (Tavakoli et al., ICLR 2026) asks something harder. Can your retrieval engine piece together a timeline from scattered mentions? Can it find both sides of a contradiction? Can it summarise a project that evolved across dozens of conversation turns?

We ran all 400 BEAM probing questions against Ogham. No knowledge graphs, no external rerankers, no LLM calls at search time. Postgres, pgvector, tsvector, and Voyage embeddings at 512 dimensions.
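For flavour, here is the shape of hybrid query that setup implies: cosine distance from pgvector fused with lexical rank from tsvector, all inside Postgres. The table, columns, and weights below are hypothetical stand-ins, not Ogham's actual schema:

```python
import psycopg

# Hypothetical schema: memories(id, content, tsv tsvector, embedding vector(512)).
HYBRID_SQL = """
SELECT id,
       content,
       1 - (embedding <=> %(vec)s::vector)                 AS vec_score,
       ts_rank(tsv, plainto_tsquery('english', %(query)s)) AS lex_score
FROM memories
ORDER BY 0.7 * (1 - (embedding <=> %(vec)s::vector))
       + 0.3 * ts_rank(tsv, plainto_tsquery('english', %(query)s)) DESC
LIMIT 20;
"""

def hybrid_search(conn: psycopg.Connection, query: str, query_vec: list[float]):
    # pgvector accepts a '[x,y,...]' text literal cast to vector,
    # so no extra client-side adapter is needed.
    vec_literal = "[" + ",".join(f"{x:.6f}" for x in query_vec) + "]"
    with conn.cursor() as cur:
        cur.execute(HYBRID_SQL, {"query": query, "vec": vec_literal})
        return cur.fetchall()
```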

Give your AI agent a memory that survives compaction

3 mins

I keep running into the same problem. Start a session, spend twenty minutes explaining the codebase, make some progress, and then the context window fills up. Claude compacts the conversation. Half the context is gone. The next question gets a confused response because the agent forgot about the database migration we discussed ten minutes ago.

Switching clients makes it worse. Move from Claude Code to Kiro or Cursor and you’re starting from zero.