Software Engineering Daily
Episode: Turbopuffer with Simon Hørup Eskildsen
Date: September 30, 2025
Host: Gregor Vand | Guest: Simon Hørup Eskildsen
Episode Overview
This episode explores the creation and technology behind Turbopuffer, a next-generation vector database engineered for speed, cost efficiency, and scalability. Host Gregor Vand interviews co-founder Simon Hørup Eskildsen, delving into the problems with existing vector databases, the technical and economic innovations that Turbopuffer introduces, real-world use cases, and broader architectural decisions that shape its development. The conversation is highly technical and insightful for developers and engineering leaders interested in AI infrastructure and database scalability.
Simon Hørup Eskildsen: Path to Turbopuffer
[02:02]
- Simon’s background includes nearly a decade at Shopify, where he led scaling of their infrastructure from hundreds of requests per second to over a million.
- Cites persistent database bottlenecks as core challenges at scale, fueling his deep focus on databases and storage.
- After Shopify, he consulted for smaller SaaS companies, “tuning Postgres” and solving infrastructure issues, most often around Postgres autovacuum.
- The gap year(s) allowed him to observe changes in tech, especially new demands for scalable data to support AI workloads.
- Simon realized: “A new type of database was ready to be built…” driven by the needs of AI, new storage architectures, and the high costs of vector-first databases.
Quote:
"I spent almost the entire time working on that layer between the Rails app and the databases, and sometimes inside the databases themselves, but mostly on top of them. So yeah, almost 10 years working on every single aspect of database scalability at Shopify." — Simon [02:13]
Turbopuffer Origin Story & Market Need
[05:26]
- The idea arose during work with Readwise, a company that needed advanced search and recommendations for ‘Readwise Reader’.
- Initial attempts to use vector indexes with existing databases (e.g., Postgres, third-party SaaS vector DBs) revealed eye-watering cost increases—30x amplification or more in storage and compute.
- Example: for a $3,000/month Postgres deployment, equivalent vector DB solutions projected roughly $30,000/month.
- Vectors inflate storage needs immensely: 1KB of text can become 20–30KB after chunking and embedding into vectors.
Quote:
"For vector indexes, we're now talking about 30x [storage amplification], right? Just to store the vectors, let alone build an index." — Simon [07:26]
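The amplification above can be sanity-checked with back-of-envelope arithmetic. The figures below are assumptions for illustration (1536-dimensional float32 embeddings, roughly three overlapping chunks per kilobyte of text), not numbers from the episode:

```python
# Back-of-envelope sketch of vector storage amplification.
# Assumed parameters: 1536-dim float32 embeddings, ~3 overlapping
# chunks per 1 KB of source text.
DIM = 1536
BYTES_PER_FLOAT = 4
CHUNKS_PER_KB = 3

bytes_per_vector = DIM * BYTES_PER_FLOAT            # 6144 bytes per embedding
vector_bytes_per_kb_text = CHUNKS_PER_KB * bytes_per_vector
amplification = vector_bytes_per_kb_text / 1024     # raw vectors only

print(f"{bytes_per_vector} B/vector, ~{amplification:.0f}x amplification")
```

Even under these conservative assumptions the raw vectors alone are ~18x the source text, before any index structures are built on top, which is how the 20-30x range in the episode arises.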
Technical Innovations Behind Turbopuffer
Leveraging Modern Storage Primitives
[09:28]
- Turbopuffer’s architecture is enabled by two ecosystem shifts:
- NVMe SSDs: Orders of magnitude cheaper than RAM, yet fast enough to serve most workloads.
- Cloud Object Storage with New APIs: S3’s strong consistency (2020) and compare-and-swap primitives (2024/2025) allow direct, database-grade storage and coordination without intermediary services.
- As a result, Turbopuffer stores vectors in object storage, using DRAM and SSD as local caches.
Quote:
"Per gigabyte [NVMe SSDs are] about 100 times cheaper than DRAM... most databases haven't built around getting all that bandwidth through. You have to bypass the Linux page cache to get the maximum." — Simon [09:49]
Vector Indexing Algorithms: Graph vs Cluster
[12:34]
- Outlines two primary approaches to approximate nearest neighbor (ANN) vector search:
- Graph-based: Excellent for in-memory performance, but poor on disk/object storage due to high roundtrip costs.
- Cluster-based: Better for disk/cloud environments. Fetches entire centroids/clusters in a couple of efficient bulk I/Os, minimizing slow random seeks.
- Turbopuffer opts for cluster-based indexing, crucial for enabling cheap, scalable cold storage retrieval.
Quote:
"Graphs just don't work that well for it... they are amazing for in-memory and it's almost impossible to beat the performance in memory. But when things are on disk or they're on S3, you have roundtrip latencies into hundreds of milliseconds..." — Simon [17:26]
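The cluster-based approach Simon describes can be sketched in a few lines: compare the query against a small set of centroids, then read only the nearest clusters in bulk. Everything below is an illustrative IVF-style toy, not Turbopuffer's implementation; cluster assignment here is a crude nearest-centroid pass rather than proper k-means:

```python
import numpy as np

# Toy cluster-based ANN search: rank centroids, fetch a few whole
# clusters (a couple of large sequential reads on object storage),
# then scan exactly within them.
rng = np.random.default_rng(0)
vectors = rng.normal(size=(10_000, 64)).astype(np.float32)

# Offline: assign each vector to its nearest of k centroids.
k = 32
centroids = vectors[rng.choice(len(vectors), k, replace=False)]
assign = np.argmin(((vectors[:, None] - centroids[None]) ** 2).sum(-1), axis=1)

def search(query, nprobe=4, top=5):
    # 1) Rank centroids by distance to the query (tiny, cache-resident).
    order = np.argsort(((centroids - query) ** 2).sum(-1))[:nprobe]
    # 2) Fetch the chosen clusters in bulk instead of chasing graph
    #    edges one round trip at a time.
    ids = np.flatnonzero(np.isin(assign, order))
    # 3) Exact distance scan within the fetched clusters.
    d = ((vectors[ids] - query) ** 2).sum(-1)
    return ids[np.argsort(d)[:top]]

print(search(vectors[42]))
```

The key property for object storage is step 2: a handful of large reads replaces the hundreds of dependent point lookups a graph traversal would issue.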
Cold vs Warm Data Access and Performance
[20:32]
- Cold queries (data not in cache) may involve up to three object storage roundtrips at roughly 200ms each, still delivering “sub-second” cold access even for large, rarely accessed datasets.
- Warm queries (cached in RAM or SSD) are as fast as top in-memory solutions.
- Turbopuffer is designed to behave like a pufferfish: it rests entirely in object storage when idle, then inflates up through the SSD and RAM cache hierarchy when actively queried.
Quote:
"The canonical source of truth for all data in TurboPuffer is object storage... As you query the namespace more, TurboPuffer gets faster." — Simon [20:32, 23:00]
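The "pufferfish" read path reduces to a read-through cache over three tiers. The sketch below is illustrative; tier names, latencies, and the promotion policy are assumptions, with object storage as the authoritative source as described above:

```python
# Illustrative tiered read path: object storage is the source of truth;
# hot keys are promoted into faster tiers so repeat reads get cheaper.
TIERS = [("ram", 0.0001), ("ssd", 0.001), ("s3", 0.2)]  # rough seconds/read

caches = {"ram": {}, "ssd": {}}
source = {"doc:1": b"vector block"}  # object storage, always authoritative

def read(key):
    for tier, latency in TIERS:
        store = caches.get(tier, source)
        if key in store:
            # Promote into every faster tier above the one that hit.
            for upper, _ in TIERS:
                if upper == tier:
                    break
                caches[upper][key] = store[key]
            return store[key], tier
    raise KeyError(key)

print(read("doc:1")[1])  # first read falls through to object storage
print(read("doc:1")[1])  # second read is served from RAM
```

This is the mechanism behind "as you query the namespace more, Turbopuffer gets faster": the first, cold read pays the object storage roundtrips, and subsequent reads hit RAM or SSD.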
Developer Experience & Visualization
[23:40]
- The console is operational and straightforward; visualization/playground features are sparse for now, with customers typically exporting namespaces to visualize embeddings.
- Focus remains on robust database infrastructure versus flashy GUIs.
Quote:
"Turbopuffer's console is still fairly simple and operational... we've been very focused on the database itself." — Simon [23:46]
Measuring and Ensuring Quality: Recall
[24:49]
- Recall—the percentage match between approximate result sets from ANN versus exact (brute-force) search—is tracked obsessively.
- Turbopuffer samples ~1% of production queries for recall tests and flags anything below 90% (goal: >95%).
- Real-world production benchmarks matter most, as “academic benchmarks” don’t capture messy customer reality.
Quote:
"We will get big red dots if someone's recall is below 90%. ...nothing matters other than production and it's going to be the same for accuracy." — Simon [25:40, 26:18]
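The recall metric itself is simple set overlap between approximate and brute-force results. The helper names below are illustrative; the 1% sampling rate and the 90%/95% thresholds are the figures from the episode:

```python
import random

# Sketch of production recall auditing: sample a fraction of live
# queries, replay them with exact brute-force search, and compare.
def recall(approx_ids, exact_ids):
    """Fraction of the exact top-k that the approximate search returned."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)

def should_audit(sample_rate=0.01):
    """Select ~1% of production queries for a shadow brute-force run."""
    return random.random() < sample_rate

r = recall(approx_ids=[1, 2, 3, 5, 9], exact_ids=[1, 2, 3, 4, 5])
print(f"recall={r:.0%}")  # 80% -> below the 90% alert threshold
```

Because the audit runs against real production queries rather than benchmark datasets, it captures exactly the "messy customer reality" Simon emphasizes.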
Native Filtering
[28:18]
- Filtering is a core challenge: post-filtering works well when filters are loose (e.g., omit only 1% of items), but tight filters dramatically impact recall/precision unless the planner is clever.
- Turbopuffer’s planner dynamically adjusts fetch size and filter order on a per-query basis to ensure high recall.
Quote:
"The query planner that plans how much data the query needs to look at to get high recall needs to be very aware of both the vector index and also the filtered index to get high recall." — Simon [30:36]
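One concrete way a planner compensates for tight filters is to over-fetch from the vector index so that enough candidates survive post-filtering. The estimator and growth policy below are purely illustrative, not Turbopuffer's actual planner:

```python
# Illustrative adaptive over-fetch for post-filtered vector search:
# the tighter the filter, the deeper the vector index scan must go
# to fill top-k with passing results at high recall.
def plan_fetch_size(top_k, filter_selectivity, safety=2.0, cap=100_000):
    """filter_selectivity: estimated fraction of rows passing the filter."""
    need = top_k / max(filter_selectivity, 1e-6)   # expected rows to inspect
    return min(int(need * safety), cap)            # padded, but bounded

# A loose filter (99% of rows pass) barely over-fetches...
print(plan_fetch_size(10, 0.99))    # 20 candidates
# ...while a tight filter (0.1% pass) forces a much deeper scan.
print(plan_fetch_size(10, 0.001))   # 20,000 candidates
```

This is why the planner must see both indexes at once: the selectivity estimate comes from the filter index, while the fetch depth drives how many clusters the vector index reads.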
Core Architecture: the “SPFresh” Index
[32:05]
- “SPFresh” is Turbopuffer’s custom, cluster-based vector index, supporting incremental writes with heuristics that maintain cluster sizes and quality.
- Handles complex cases like merging, splitting, and recalculating centroids efficiently, even as data grows to billions of records.
- Most complex portion of the Turbopuffer codebase.
Quote:
"This is probably the most complicated part of the entire turbopuffer code base. To make this work at very, very large scale and make it work with recall and make it work with filters..." — Simon [33:54]
Real-World Scalability: Namespacing
[37:13]
- Turbopuffer supports massive multi-tenancy via explicit namespacing: a direct one-to-one mapping from namespace to storage prefix, enabling strong isolation and security for per-user data.
- Over 100 million namespaces in production; encryption can be handled per namespace.
- Simon credits Shopify experience for recognizing sharding as the only way to scale massively multi-tenant systems.
Quote:
"We decided at TurboPuffer to make namespacing a core sharding primitive that we expose to the user... this works great because it also means that we can encrypt every individual namespace differently..." — Simon [37:13]
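The one-to-one namespace-to-prefix mapping can be sketched directly. The bucket name and key layout below are assumptions for illustration, not Turbopuffer's real storage layout:

```python
# Illustrative namespace -> object-storage prefix mapping. Because every
# namespace owns a distinct prefix, per-tenant isolation, deletion, and
# encryption scoping all reduce to prefix-level operations.
BUCKET = "example-vector-bucket"  # hypothetical bucket name

def namespace_prefix(namespace: str) -> str:
    """Return the storage prefix that exclusively holds this namespace."""
    assert "/" not in namespace, "namespace must not contain path separators"
    return f"s3://{BUCKET}/namespaces/{namespace}/"

print(namespace_prefix("user-12345"))
```

With this scheme, a per-user namespace can be listed, dropped, or encrypted with its own key without touching any other tenant's data, which is the isolation property Simon highlights.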
Real-World Adoption: Notion and Cursor
[39:51]
- Both Notion and Cursor are flagship customers, using Turbopuffer for large-scale semantic code/document search and context injection into AI assistants.
- Cursor: Leveraged S3-based storage to reduce costs by 95% vs in-memory vector stores; only active codebases are “inflated” into memory.
- Notion: Similar story of outgrowing in-memory vector DBs as workloads scaled.
Quote:
"For them, the storage architecture of having everything in memory, which was the previous solution that they were on, just didn't make a lot of sense. ... Their first bill was reduced by 95% to move on to TurboPuffer." — Simon [41:47]
Pricing Model
[44:19]
- No free plan—focus is on being a premium, high-value infrastructure product for customers with serious scale and needs.
- The model is instead “try it for 30 days”; cancel if dissatisfied.
- The economics make sense only as workloads reach scale (tens of millions of vectors); smaller use cases may continue to use add-ons for traditional DBs.
Quote:
"It's easier to provide people really good experience when you have a commercial only offering. And we can put behind the necessary staff to support people when they have questions and give them a really good experience. So that you get the feeling that we're part of your team." — Simon [44:44]
Organization and Future Direction
[45:38]
- Team: fewer than 20 people; the majority are high-caliber database engineers.
- Short-term roadmap: Add more “classic” database features (e.g., conditional writes), mature full-text search, keep driving generic storage engine performance.
- Values engineering density and high standards.
Quote:
"We really want to have a dense like P99 engineering environment and so we've tried to hold the standards high on team that we put in front of our customers that are developing this product." — Simon [45:47]
Visual Language and Branding
[47:37]
- Turbopuffer’s branding embraces a “neo-retro pixel art” aesthetic, chosen to infuse fun and make key product information immediately accessible.
- Simon wanted a database website that’s direct: “What does it cost? What are the trade-offs? Who are the customers?” The rest is just distraction.
- The pufferfish metaphor (inflating as data is queried) is both whimsical and functional, echoing database design.
Quote:
"Any database has trade-offs. Like, is this the right set of trade-offs for me? What's the architecture, what are the guarantees? ... This 'show, don't tell' just was very important to me. And then we wanted to breathe a little bit of fun into it." — Simon [48:22]
Where to Learn More
- Website: turbopuffer.com
- Simon on X (Twitter): @sirupsin
- Social media links and blog posts available on the official website.
Memorable Quotes
- "Per gigabyte [NVMe SSDs are] about 100 times cheaper than DRAM... most databases haven't built around getting all that bandwidth through." — Simon [09:49]
- "Graphs just don't work that well for it... they are amazing for in-memory and it's almost impossible to beat the performance in memory. But when things are on disk or they're on S3, you have roundtrip latencies into hundreds of milliseconds..." — Simon [17:26]
- "We will get big red dots if someone's recall is below 90%... nothing matters other than production." — Simon [25:40, 26:18]
- "We have more than 100 million namespaces. And this works great because it also means we can encrypt every individual namespace differently..." — Simon [37:13]
- "Cursor's first bill was reduced by 95% to move on to TurboPuffer." — Simon [41:47]
- "We really want to have a dense like P99 engineering environment..." — Simon [45:47]
- "Any database has trade-offs... this 'show, don't tell' just was very important to me." — Simon [48:22]
Key Timestamps
- [02:02] Simon’s Shopify background and lessons on scaling
- [05:26] Readwise and the economic motivation for Turbopuffer
- [09:28] NVMe, cloud object storage, and enabling tech shifts
- [12:34]/[17:26] Comparing cluster vs. graph vector index approaches
- [20:32] Cold/warm query strategies; sub-second cold latency
- [24:49] Recall, real-world measurement, and filtering challenges
- [32:05] Turbopuffer’s SPFresh index and cluster management
- [37:13] Namespacing for massive scale and security
- [39:51] Real-world adoption: Notion and Cursor case studies
- [44:19] Pricing rationale and target customer
- [45:38] Team composition, future plans, and product direction
- [47:37] Branding, visual language, and the pufferfish metaphor
Episode Tone
Highly technical, candid, and engineering-driven; Simon is practical and transparent about trade-offs, mistakes, and real customer needs, with an undercurrent of developer fun and enthusiasm.
For listeners: If you want to understand the modern challenges and solutions in AI-first infrastructure and how storage economics shape what’s possible for vector search at scale, this episode delivers a masterclass.
