Latent Space: The AI Engineer Podcast
Episode: Retrieval After RAG: Hybrid Search, Agents, and Database Design
Guest: Simon Hørup Eskildsen, Co-Founder of Turbopuffer
Date: March 12, 2026
Episode Overview
This episode explores the evolution of retrieval systems after the popularity of Retrieval-Augmented Generation (RAG). Simon Hørup Eskildsen, co-founder of Turbopuffer, offers a deep dive into the shifting landscape of hybrid search, AI agent workloads, and database design for modern, AI-driven applications. He shares the technical motivations behind Turbopuffer’s architecture, lessons from scaling at Shopify, the realities of startup building, and his principles for engineering excellence.
Key Discussion Points and Insights
1. Turbopuffer’s Mission and Architecture
[02:33 - 06:25]
- Turbopuffer is positioned as a search engine, focusing on full-text and vector search over unstructured data.
- Eskildsen frames the goal:
"We can compress into a few terabytes of weights how to reason with the world, but we have to somehow connect it to something external that actually holds that like in full fidelity and truth. And that's the thing that we intend to become."
- The platform leverages new cloud primitives, namely object storage (now strongly consistent) and NVMe SSDs, to enable a fundamentally different architecture.
- Three foundational conditions to build a modern database company:
- A new workload: AI-connected data search in every company.
- Storage architecture leap: All-in on fast SSDs and object storage, no consensus layer.
- Comprehensive query support: A big database company must eventually serve a wide variety of query plans for evolving customer needs.
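The storage leap above boils down to one pattern: object storage is the durable source of truth, and local NVMe acts as a disposable read-through cache. A minimal sketch of that pattern is below; `ObjectStore`, `SsdCache`, and `read_through` are illustrative names and a toy in-memory/on-disk stand-in, not Turbopuffer's actual code.

```python
import os
import tempfile  # used in the usage example to create a throwaway cache directory


class ObjectStore:
    """Stand-in for an object store such as S3: the durable source of truth."""

    def __init__(self):
        self._blobs = {}

    def put(self, key, data):
        self._blobs[key] = data

    def get(self, key):
        return self._blobs[key]


class SsdCache:
    """Stand-in for a local NVMe cache: fast to read, safe to lose."""

    def __init__(self, root):
        self.root = root
        self.hits = 0

    def _path(self, key):
        # Flatten the object key into a single filename on the cache disk.
        return os.path.join(self.root, key.replace("/", "_"))

    def get(self, key):
        path = self._path(key)
        if os.path.exists(path):
            self.hits += 1
            with open(path, "rb") as f:
                return f.read()
        return None

    def put(self, key, data):
        with open(self._path(key), "wb") as f:
            f.write(data)


def read_through(store, cache, key):
    """Serve from the SSD cache when possible; otherwise fetch from
    object storage and backfill the cache for the next reader."""
    data = cache.get(key)
    if data is None:
        data = store.get(key)
        cache.put(key, data)
    return data
```

Because every byte in the cache can be re-fetched from object storage, losing a cache node (or, as the later quote puts it, turning off all the servers) loses no data.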
2. Origin Story: From Shopify to Angel Engineering to Turbopuffer
[06:26 - 13:32]
- Draws on nearly a decade of experience at Shopify, especially the pain of scaling and maintaining Elasticsearch clusters.
- Describes his post-Shopify phase as "angel engineering," consulting on infra for startups like Readwise, Replicate, Causal.
- The "aha" for Turbopuffer: While helping Readwise, found vector search could improve recommendations, but the infrastructure cost was "insane" (30k/month just for vectors vs a 5k/month infra budget).
"If the cost had been a tenth, we would have shipped it. And this was really the only data point that I had, right?... It haunted me."
- Spurred to build a fully object-storage-centric, cost-effective search database optimized for cloud primitives.
3. Technical Deep Dive: Cloud Storage, Consensus, and Database Design
[13:32 - 22:58]
- Details how object storage as primary storage and NVMe SSDs as cache open new possibilities.
- On S3’s evolution:
"S3 only became consistent in December of 2020." ([17:19])
- Compare-and-swap support was a gating factor for metadata management across clusters; Google Cloud Storage had it first, and AWS S3 only added it in late 2024.
- The low-latency needs of early customers (Notion, Cursor) led to elaborate routing: buying dark fiber between AWS and GCP regions rather than introducing extra consensus and state-management layers.
- Technical stance: “I don't want state in two systems. The worst outages are the ones where you have state in multiple places that's not syncing up.” ([22:01])
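The compare-and-swap primitive discussed above is what lets a database keep its metadata in a single system: a writer only commits if nobody else has changed the metadata since it last read it. A minimal sketch of the semantics is below; `MetadataStore` and its version counter are an illustrative in-process toy, whereas real object stores expose the same idea via ETags and conditional writes.

```python
import threading


class MetadataStore:
    """Toy versioned store illustrating compare-and-swap on a metadata object.

    The lock stands in for whatever atomicity the backing store provides;
    the version number plays the role of an ETag or object generation."""

    def __init__(self):
        self._lock = threading.Lock()
        self._value = None
        self._version = 0

    def read(self):
        """Return the current value together with its version."""
        with self._lock:
            return self._value, self._version

    def compare_and_swap(self, new_value, expected_version):
        """Commit new_value only if nobody has written since expected_version.

        Returns True on success; False means the caller raced with another
        writer and must re-read before retrying."""
        with self._lock:
            if self._version != expected_version:
                return False
            self._value = new_value
            self._version += 1
            return True
```

The key property is that a stale writer cannot silently clobber a newer write: it fails, re-reads, and retries, so the metadata never forks into two diverging copies, which is exactly the "state in two systems" failure mode the quote warns about.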
4. Early Customer Stories: Cursor & Notion
[24:58 - 30:57]
- Cursor, the second significant customer (after Readwise), credits Turbopuffer with making its per-user economics work by cutting search-related infra costs by 95%.
- Cursor’s technical use case:
“Cursor's workload is basically they will embed the entire codebase... They have their own embedding model... very good at semantic search... And so it's very good at: can you find me code that's similar to this or code that does this?”
- Cursor ensures security/privacy by encrypting codebase embeddings and obscuring file paths inside Turbopuffer.
- Notion’s case: technical credibility and cost vs. egress/latency tradeoffs mattered more than Turbopuffer’s architecture per se.
5. Evolution of Retrieval Workloads: From RAG to Agent-Led Hybrid Search
[31:55 - 34:22]
- Modern workloads: Shift from single RAG-style context window search to agentic patterns where LLM agents drive many concurrent searches, and retrieval is dynamic and fine-grained.
- “Now you have agentic search, where the model is both writing and changing the code and it's searching it again later.”
- Hybrid search (semantic + keyword/regex/SQL):
“All workloads are hybrid. You want the semantic, you want the text, you want the regex... It's silly to be all in on one particular query pattern.” ([31:17])
- Engineering response: dropped query pricing 5x to support emerging, high-concurrency agentic use.
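The "all workloads are hybrid" point above can be made concrete with a toy ranker that blends semantic similarity with keyword matching. This is a minimal sketch of the idea only: `hybrid_search`, the hand-rolled cosine similarity, and the term-overlap stand-in for BM25 are illustrative simplifications, not Turbopuffer's scoring.

```python
import math


def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def keyword_score(query, text):
    """Fraction of query terms present in the document.

    A crude stand-in for a real lexical scorer such as BM25."""
    terms = query.lower().split()
    words = set(text.lower().split())
    return sum(t in words for t in terms) / len(terms) if terms else 0.0


def hybrid_search(query, query_vec, docs, alpha=0.5):
    """Rank docs by a weighted blend of semantic and keyword relevance.

    docs is a list of (text, embedding) pairs; alpha weights the
    semantic side, (1 - alpha) the keyword side."""
    scored = []
    for text, vec in docs:
        score = alpha * cosine(query_vec, vec) + (1 - alpha) * keyword_score(query, text)
        scored.append((score, text))
    return sorted(scored, reverse=True)
```

An agent issuing many concurrent searches would call something like this repeatedly with different alpha settings, or with regex and SQL filters layered on top, which is why committing to a single query pattern is, as the quote says, "silly."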
6. Startup Journey: Pricing, Funding, and Picking Investors
[34:22 - 41:22]
- Early pricing was "vibe pricing": rough estimates based on infra costs plus a bit of margin, later refined as real usage revealed the cost structure.
- Bootstrapped beginnings—ran on founders’ credit cards until after Cursor joined as a customer.
- On choosing first investor Locky (despite others being “database experts”):
"I just called Locky... if this doesn't have PMF by the end of the year, we'll just return all the money to you... I just play with open cards. Locky was the only person that didn't freak out. He was like, I've never heard anyone say that before.” ([39:13])
7. Team Building, P99 Engineers, and Company Culture
[41:48 - 48:26]
- “Joining vs. starting”: Simon did not set out to found a company; Turbopuffer evolved from a blog post and a proof of concept.
- “P99 engineer” concept: Internal framework for ultra-high-talent density, emphasizing high agency, track record of "bending the computer to their will," and a natural obsession with systems/things (e.g., maps, trains).
- Traits: High agency, trade-off thinking, proven ability to deliver breakthroughs.
- Hiring bar: “The default should be, we're definitely not hiring this person. And, you know, if everyone was like, I'll maybe throw a punch, then this is not the right team.”
- Example:
“ANN v3 can search 100 billion vectors with a p50 of around 40 milliseconds and a p99 of 200 milliseconds... the Chief Architect of Turbopuffer... just bent the software... in a six to eight week period.” ([46:34])
8. Future Vision: Roadmap and Beyond
[51:12 - 57:06]
- Act One: Vector search only.
- Act Two: Adding full-text search, optimizing for new, LLM-driven workloads.
- Current Focus: More full-text features and improved scale: serving “Common Crawl-level” datasets, improving feature coverage and dashboard UX, and supporting organizations migrating from legacy search platforms.
- Looking Ahead:
- Will expand to additional workloads (OLAP, logging, time series) only as customer demand becomes clear—fierce focus over feature sprawl.
- Possible support for graph query patterns as evolving customer need.
"If you want to build a big database company, the database over time has to implement more or less every query plan... But when you're a startup, your only moat is really just focus." ([54:41])
9. Personal Touches: Tea, Maps, and Pokémon-Grade Nerdiness
[57:06 - end]
- Simon’s obsession with tea (keeps an Airtable of 200+ teas), draws parallel to P99 engineering mentality—total, lifelong curiosity and obsession.
- Maps as a metaphor for good developer relations/devrel—mapping journeys, showing boundaries and paths.
Notable Quotes (with Timestamps)
- On honesty with funders:
  "If this doesn't have PMF by the end of the year, we'll just return all the money to you... When I don't know how to play a game, I just play with open cards."
  — Simon ([39:13])
- On architecture and the cloud:
  "You could turn off all the servers that Turbopuffer has and we would not lose any data because we are completely all in on object storage. And this means that our architecture is just so simple."
  — Simon ([03:14])
- On team standards:
  "The default should be, we're definitely not hiring this person. And, you know, if everyone was like, I'll maybe throw a punch, then this is not the right team."
  — Simon ([45:12])
- On first principles and technology timing:
  "S3 only became consistent in December of 2020... NVMe SSDs were also not in the cloud until around 2017... Compare and swap was not in S3 until late 2024."
  — Simon ([17:19-21:04])
- On focus and sequencing:
  "When you're a startup, your only moat is really just focus. So you have to lay out the facts and you have to not get overeager."
  — Simon ([54:44])
Key Timestamps
- [02:33] – Turbopuffer’s product definition and philosophy
- [06:26] – Simon’s background, Shopify experience, and angel engineering
- [13:32] – Technical shift: object storage & cloud architecture
- [22:01] – State management philosophy (avoiding Zookeeper, consensus pain)
- [24:58] – Notion and Cursor case studies, cost savings, practical impacts
- [31:17] – Discussion of hybrid/agentic workloads, evolution beyond RAG
- [34:44] – Pricing philosophy, infra cost realities, and bootstrapping
- [39:13] – Choosing Locky as investor; the open-cards approach
- [45:12] – P99 engineer philosophy, team culture, hiring standards
- [51:12] – Roadmap vision: acts, feature growth, and caution against overreaching
- [57:06] – Tea stories, quirks, and obsessive nerd curiosities
Memorable Moments
- Simon’s "Vibe pricing" and running a cloud infra bill off his credit card.
- The story of physically buying dark fiber for a startup database to support multi-cloud low-latency workloads.
- The “P99 engineer” standard and the cultural bar for hiring.
- Parallels between obsession with tea, maps, and the kind of talent that shapes foundational infrastructure.
Conclusion
This episode offers an in-depth, candid, and highly technical portrait of the next generation of retrieval systems—where “search after RAG” blends hybrid search methods, high concurrency, and data architectures native to modern cloud primitives. Simon Hørup Eskildsen pulls back the curtain on the real tradeoffs, the founding struggles, and the uncompromising standards behind Turbopuffer’s journey—setting a new bar for AI-driven infrastructure.
