
Blog

Gemini Embedding 2 at 512 dims - does multimodal survive compression?

4 mins

This week an article on enterprise knowledge base search caught my interest. It demonstrated Gemini Embedding 2 bridging power curve images with CSV maintenance data in a single embedding call. The Python code was minimal. Most of the intelligence was in the model.

It got me thinking about a practical question: Gemini Embedding 2 outputs 3072-dimensional vectors. That’s expensive to index. Google says it supports Matryoshka Representation Learning (MRL) for truncation to lower dimensions. But does the cross-modal magic survive the compression?
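To make the truncation step concrete, here is a minimal sketch of what MRL-style truncation usually looks like on the client side: keep the leading components, then re-normalise so cosine similarity still behaves. The function names `truncate_embedding` and `cosine` are hypothetical helpers for illustration, not part of any Google SDK, and the sketch assumes the vector is already in hand as a plain list of floats.

```python
import math


def truncate_embedding(vec, dims=512):
    """Keep the first `dims` components and L2-normalise the result.

    Assumption: the model was trained with MRL, so the leading
    components carry most of the signal. This helper is illustrative only.
    """
    if dims <= 0 or dims > len(vec):
        raise ValueError("dims must be between 1 and len(vec)")
    head = list(vec[:dims])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # an all-zero vector cannot be normalised
    return [x / norm for x in head]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Stand-in for a 3072-dimensional embedding (synthetic values, not model output).
full = [float(i % 7 - 3) for i in range(3072)]
small = truncate_embedding(full, 512)

# Rough storage arithmetic, assuming 4-byte float32 components.
bytes_full = 3072 * 4   # 12288 bytes per vector
bytes_small = 512 * 4   # 2048 bytes per vector
```

Under the float32 assumption, the truncated vector is one sixth the size of the full one, which is where the indexing savings come from. Whether retrieval quality holds up at that size is the empirical question.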

We ran 400 questions against pgvector. Here's the retrieval curve.

4 mins

Most memory benchmarks ask one thing: can you find a fact? The BEAM benchmark (Tavakoli et al., ICLR 2026) asks something harder. Can your retrieval engine piece together a timeline from scattered mentions? Can it find both sides of a contradiction? Can it summarise a project that evolved across dozens of conversation turns?

We ran all 400 BEAM probing questions against Ogham. No knowledge graphs, no external rerankers, no LLM calls at search time. Postgres, pgvector, tsvector, and Voyage embeddings at 512 dimensions.

Give your AI agent a memory that survives compaction

3 mins

I keep running into the same problem. Start a session, spend twenty minutes explaining the codebase, make some progress, and then the context window fills up. Claude compacts the conversation. Half the context is gone. The next question gets a confused response because the agent forgot about the database migration we discussed ten minutes ago.

Switching clients makes it worse. Move from Claude Code to Kiro or Cursor and you’re starting from zero.