The Pragmatic Engineer Podcast: How AWS S3 is Built
Host: Gergely Orosz
Guest: Mylan, VP of Data and Analytics at AWS
Release Date: January 21, 2026
Overview
This episode is a rare, behind-the-scenes exploration of Amazon Web Services Simple Storage Service (AWS S3)—one of the world's largest and most critical cloud storage platforms. Host Gergely Orosz is joined by Mylan, VP of Data and Analytics at AWS and long-time leader of S3, to discuss how S3 is engineered for reliability, durability, and innovation at a mind-boggling scale. Software engineers and tech leaders will find deep dives into distributed systems, strong and eventual consistency, formal verification, emerging use cases (like vector storage), and S3's unique engineering culture.
Key Discussion Points & Insights
1. The Sheer Scale of S3 ([01:04]–[03:27])
- S3 Stats:
- Over 500 trillion objects
- Hundreds of exabytes of data
- Handles hundreds of millions of requests per second
- Processes over a quadrillion requests yearly
- Runs on tens of millions of hard drives across millions of servers in 120 availability zones across 38 regions
- Memorable Visualization:
"If you imagine stacking all of our drives one on top of another, it would go all the way to the International Space Station and just about back."
— Mylan [02:27]
- Customers often take this scale for granted because "it just works" for them.
2. Origins and Evolution of S3 ([03:57]–[09:45])
- Launch and Motivation:
- Development started in 2005; S3 launched in 2006 as AWS's first service, motivated by developers' frustration at repeatedly rebuilding storage infrastructure.
- Original focus: simple, economic storage for unstructured data (e.g., images, backups).
- Designed for eventual consistency—good enough for e-commerce where minor delays were acceptable.
- Rise of Data Lakes:
- Early adopters (Netflix, Pinterest) paired S3 with Hadoop.
- Shifted from unstructured to structured/tabular data over the years.
- Massive adoption of formats like Parquet and, more recently, Apache Iceberg.
- Introduction of S3 Tables (Dec 2024) and S3 Vectors (July 2025 preview, GA soon after).
3. S3 Fundamentals: Primitives & Operations ([09:45]–[12:14])
- Primitives:
Put, Get, Delete, List, Copy (akin to HTTP verbs).
- Building Blocks:
- Buckets, Objects, Keys: The basic units for organizing data.
- Tables & Vectors: Newer native data types—tables (for tabular data) and vectors (for AI embeddings).
- Engineering Philosophy: Keep the developer experience simple; every feature is scrutinized for simplicity.
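To make the five primitives concrete, here is a toy in-memory model of a bucket. It is only a sketch: the class `ToyBucket` and its method names are hypothetical and are not the S3 API. It assumes a flat key namespace in which "folders" are just key prefixes.

```python
# Toy in-memory model of the five primitives (Put, Get, Delete, List, Copy).
# Illustrative only: ToyBucket and its methods are hypothetical names.
class ToyBucket:
    def __init__(self):
        self._objects = {}  # flat namespace: key -> bytes

    def put(self, key, data):
        self._objects[key] = bytes(data)

    def get(self, key):
        return self._objects[key]  # raises KeyError if the key is absent

    def delete(self, key):
        self._objects.pop(key, None)  # deleting a missing key is a no-op here

    def list_keys(self, prefix=""):
        # "Folders" are modelled as nothing more than shared key prefixes.
        return sorted(k for k in self._objects if k.startswith(prefix))

    def copy(self, src_key, dst_key):
        self._objects[dst_key] = self._objects[src_key]


bucket = ToyBucket()
bucket.put("photos/cat.jpg", b"\xff\xd8")
bucket.put("photos/dog.jpg", b"\xff\xd9")
bucket.put("logs/2026-01-21.txt", b"hello")
bucket.copy("photos/cat.jpg", "backup/cat.jpg")
bucket.delete("photos/dog.jpg")
photo_keys = bucket.list_keys(prefix="photos/")
print(photo_keys)  # ['photos/cat.jpg']
```

The small surface area is the point: everything else in this sketch (prefix listing, copies) is built from one dictionary and five operations.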
4. S3’s Pricing Philosophy & Economic Growth ([13:36]–[17:16])
- Revolutionary Pricing:
- Launched at $0.15/GB/month—3–5x cheaper than alternatives.
- Continual price cuts; now closer to $0.02/GB/month.
- New storage tiers (e.g., Intelligent Tiering for auto-discounts, Glacier for archiving) help customers grow data without cost anxiety.
- Quote:
"You don't have that conversation with S3 customers because... you can grow the data that you need to grow."
— Mylan [15:16]
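The two price points quoted above can be compared with a little arithmetic. This sketch assumes a hypothetical 10 TB workload (taking 1 TB = 1,000 GB) and covers storage only, ignoring request charges, data transfer, and storage tiers.

```python
from decimal import Decimal

LAUNCH_PRICE = Decimal("0.15")   # USD per GB-month at launch, per the episode
CURRENT_PRICE = Decimal("0.02")  # USD per GB-month, approximate ("closer to")


def monthly_cost(gigabytes, price_per_gb_month):
    """Storage-only cost for one month; other charges are ignored."""
    return Decimal(gigabytes) * price_per_gb_month


stored_gb = 10_000  # hypothetical workload: 10 TB
cost_then = monthly_cost(stored_gb, LAUNCH_PRICE)   # 1500 USD
cost_now = monthly_cost(stored_gb, CURRENT_PRICE)   # 200 USD
reduction = LAUNCH_PRICE / CURRENT_PRICE            # 7.5x cheaper per GB
print(cost_then, cost_now, reduction)
```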
5. Engineering Tradeoffs: From Eventual to Strong Consistency ([19:37]–[28:51])
- Eventual Consistency Rationale:
- Favored early for high availability; durability was less of a focus in that decision.
- Indexing subsystem uses quorum-based algorithms and cross-AZ replication for resilience.
- Eventual consistency: Higher availability, lower cost/risk.
- Strong Consistency Transition:
- Built a replicated journal (distributed, sequential data structure) and cache coherency protocols.
- Key challenge: Achieve strong consistency without sacrificing availability or increasing costs to customers.
- Tradeoffs: Increased architectural complexity and hardware cost, but AWS decided not to pass this cost to customers or impact latency.
- Notable Moment:
"We made that explicit decision not to [raise prices]. ...Strong consistency... should just work for any request that comes into S3."
— Mylan [27:10]
- Engineering Feat:
Rollout of strong consistency at S3 scale, with no customer-facing changes, is highlighted as a next-level accomplishment.
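The quorum idea mentioned under the eventual-consistency rationale can be illustrated with the textbook overlap rule: with N replicas, if every write reaches W of them and every read consults R of them, then R + W > N guarantees each read sees the latest write. This is a generic sketch under those assumptions, not a description of S3's index. The names `Replica`, `quorum_write`, and `quorum_read` are hypothetical.

```python
import random


class Replica:
    def __init__(self):
        self.version = 0
        self.value = None


def quorum_write(replicas, w, version, value, rng):
    # Simulate a write that reaches only W replicas; the others lag behind.
    for replica in rng.sample(replicas, w):
        replica.version, replica.value = version, value


def quorum_read(replicas, r, rng):
    # Consult R replicas and return the value carrying the highest version.
    newest = max(rng.sample(replicas, r), key=lambda rep: rep.version)
    return newest.value


N, W, R = 3, 2, 2  # R + W = 4 > N = 3, so read and write sets always overlap
rng = random.Random(0)
replicas = [Replica() for _ in range(N)]
quorum_write(replicas, W, version=1, value="v1", rng=rng)
quorum_write(replicas, W, version=2, value="v2", rng=rng)
reads = [quorum_read(replicas, R, rng) for _ in range(100)]
print(set(reads))  # {'v2'}
```

With R + W <= N the same code could return the stale "v1", which is the gap that eventually consistent designs accept in exchange for availability.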
6. Correctness & Formal Methods in S3 ([28:51]–[34:53])
- Automated Reasoning & Formal Methods:
- Formal proofs are integrated into code check-ins for the index/consistency subsystems.
- Used to mathematically verify properties like consistency, cross-region replication correctness, and API correctness at S3's scale.
- Quote:
"At a certain scale, math has to save you."
— Mylan [33:16]
- Ongoing Verification: Every check-in for critical subsystems is formally verified.
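A back-of-the-envelope count shows why exhaustive testing runs out of road. If `clients` concurrent clients each perform `steps_per_client` ordered steps, the number of distinct interleavings is (clients × steps)! / (steps!)^clients. This is only an illustration of the combinatorics, not S3's verification tooling, and the function name is hypothetical.

```python
from math import factorial


def interleavings(clients, steps_per_client):
    """Number of ways to interleave the steps while keeping each client's order."""
    total_steps = clients * steps_per_client
    return factorial(total_steps) // factorial(steps_per_client) ** clients


small = interleavings(2, 2)    # 6
medium = interleavings(3, 3)   # 1680
large = interleavings(10, 5)   # far beyond what any test suite could enumerate
print(small, medium, large)
```

A proof covers all of these orderings at once, which is why it scales where enumeration does not.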
7. Durability Guarantees & Real-World Validation ([34:53]–[39:49])
- "11-nines" Durability:
- S3 aims for 99.999999999% durability (i.e., 11 nines), a level rarely claimed elsewhere.
- Hardware failures are expected and proactively handled.
- Auditor subsystems: S3 is made up of over 200 microservices; a subset of them constantly audits and repairs data, tracking durability in real time.
- Reality at Scale: Failures are routine, but the system is designed to repair and recover without customer impact.
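What 11 nines means in practice can be worked out directly. The sketch below assumes the durability figure is read as an independent per-object annual loss probability of 1 in 10^11. The 10-million-object customer is a hypothetical example.

```python
from fractions import Fraction

DURABILITY_NINES = 11
# 1 - 0.99999999999, kept exact by using a fraction instead of a float.
annual_loss_probability = Fraction(1, 10 ** DURABILITY_NINES)


def expected_annual_losses(object_count):
    return object_count * annual_loss_probability


def years_per_expected_loss(object_count):
    return 1 / expected_annual_losses(object_count)


stored_objects = 10_000_000  # hypothetical customer
losses_per_year = expected_annual_losses(stored_objects)  # 1/10000
years = years_per_expected_loss(stored_objects)           # 10000
print(losses_per_year, years)
```

Under these assumptions, a customer storing ten million objects would expect to lose one object roughly every 10,000 years.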
8. Distributed Systems Challenges: Failure, Correlation, and Crash Consistency ([39:49]–[44:58])
- Correlated Failures:
- Real risk isn't single machine failure, but failures in a shared domain (rack, AZ) impacting many replicas at once.
- Crash Consistency:
- Ensuring the system recovers to a consistent state after a "fail-stop" event.
- Failure Allowance:
- Sizing system components to tolerate routine, expected failures, measured via extensive metrics and automated checks.
- S3's scale means applications running on top of it benefit from these massive, systemic safeguards.
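One common defence against correlated failure is to never place two replicas in the same failure domain. The sketch below assumes nodes are described as hypothetical `(name, zone)` pairs and picks one node per zone. It is a generic placement rule, not S3's actual placement logic.

```python
from collections import defaultdict


def place_replicas(nodes, replica_count):
    """Pick replica_count nodes such that no two share a failure domain."""
    by_zone = defaultdict(list)
    for name, zone in nodes:
        by_zone[zone].append(name)
    if len(by_zone) < replica_count:
        raise ValueError("not enough distinct failure domains")
    # One node from each of the first replica_count zones (sorted for determinism).
    return [(by_zone[zone][0], zone) for zone in sorted(by_zone)[:replica_count]]


def survivors_after_zone_loss(placement, failed_zone):
    return [name for name, zone in placement if zone != failed_zone]


nodes = [("n1", "az-a"), ("n2", "az-a"), ("n3", "az-b"), ("n4", "az-c")]
placement = place_replicas(nodes, 3)
print(placement)                                     # one replica per zone
print(survivors_after_zone_loss(placement, "az-a"))  # ['n3', 'n4']
```

Had two replicas landed on `n1` and `n2`, losing `az-a` alone would have removed both at once, which is exactly the correlated case described above.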
9. Engineering Culture: Balancing Conservatism and Boldness ([46:04]–[49:07])
- Cultural Tenets:
- "Respect what came before"—deeply value reliability and conservative evolution.
- "Be technically fearless"—push boundaries, invent boldly (new data types, new primitives).
- Quote:
"You have to respect what came before... But then there's also this tenet... called Be Technically Fearless. And I believe that the S3 engineers are just amazing at this."
— Mylan [46:56]
10. S3 as a Living, Evolving Platform ([49:07]–[67:26])
- Continuous Evolution:
- S3 adapts based on customer use—observing how customers use tables, vectors, object sizes, etc.
- Over 1,000 new capabilities since 2020; both customer-driven and visionary "product shapes."
- "Product Shape":
- Describes S3's organic, plant-like evolution—new features and types build on a core of fundamental expectations (durability, availability, simplicity).
- Simplicity as a Principle:
- Both in public API (easy to use) and in disciplined, focused microservice design behind the scenes ("do one or two things very well").
- Complexity is carefully managed to avoid system sprawl.
11. Vectors and the Future of Data Storage ([49:54]–[59:48])
- Emergence of Vectors:
- Need for storing billions of AI embeddings led to the creation of S3 Vectors—a foundational new primitive.
- Not just object storage: now enables low-latency vector search (e.g., semantic queries over unstructured data), building on pre-computed vector neighborhoods for speed.
- Designed for extreme scale and predictable sub-100 ms queries for massive indexes.
- Broader Vision:
- S3 is being extended from object storage to a universal, scalable, cost-effective platform for any data format or workload.
- Quote:
"Scale is to your advantage. It just changes how you design."
— Mylan [60:02]
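The "pre-computed neighborhoods" idea from the vectors discussion can be sketched as a generic cluster-then-scan search: vectors are grouped around centroids ahead of time, and a query scans only the nearest group instead of the whole index. This is an assumption-laden illustration, not how S3 Vectors is implemented. The centroids are fixed by hand here, and all function names are hypothetical.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def build_neighborhoods(vectors, centroids):
    # Offline step: assign each vector to its most similar centroid.
    neighborhoods = {i: [] for i in range(len(centroids))}
    for key, vec in vectors.items():
        best = max(range(len(centroids)), key=lambda i: cosine(vec, centroids[i]))
        neighborhoods[best].append(key)
    return neighborhoods


def search(query, vectors, centroids, neighborhoods, top_k=1, probes=1):
    # Query time: rank centroids, then scan only the closest `probes` neighborhoods.
    ranked = sorted(range(len(centroids)),
                    key=lambda i: cosine(query, centroids[i]), reverse=True)
    candidates = [key for i in ranked[:probes] for key in neighborhoods[i]]
    candidates.sort(key=lambda key: cosine(query, vectors[key]), reverse=True)
    return candidates[:top_k]


vectors = {"cat": [0.9, 0.1], "kitten": [0.8, 0.2],
           "truck": [0.1, 0.9], "van": [0.2, 0.8]}
centroids = [[1.0, 0.0], [0.0, 1.0]]  # hand-picked; normally learned by clustering
neighborhoods = build_neighborhoods(vectors, centroids)
result = search([0.85, 0.15], vectors, centroids, neighborhoods, top_k=2)
print(result)  # ['cat', 'kitten']
```

The query never touches the "truck" and "van" vectors, which is where the speed comes from. The trade-off is that a true neighbor sitting in an unprobed neighborhood can be missed.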
12. Working on S3: Engineering, Hiring & Advice ([70:10]–[73:32])
- Hiring Philosophy:
- S3 attracts both early-career and veteran engineers.
- Core trait sought: Deep sense of ownership and relentless curiosity.
- Advice for Prospective Engineers:
- Stay curious and creative; look beyond boundaries and be willing to redraw them.
- Embrace large-scale system thinking and always seek to simplify.
13. Looking Forward: AI, Multimodal Data, and Recommendations ([74:21]–[75:48])
- Future Trends:
- Mylan is particularly enthusiastic about multimodal embedding models—AI models that can understand data across multiple modalities (images, text, video).
- Encourages engineers to explore this expanding field, as it's likely to shape next-wave data systems.
- Book Recommendation:
- On supporting bees and native insects—a nod to Mylan’s background in forestry and the importance of ecological awareness. (A little off-topic, but highlights the holistic worldview of seasoned engineering leaders.)
Notable Quotes
- On scale:
"It’s really hard to get your brain around the scale of S3... but for our customers, they just focus on what S3 is to them, which is: It just works."
— Mylan [02:27]
- On strong consistency:
"We made that explicit decision not to pass [the extra cost] along to customers... strong consistency should just work for any request that comes into S3."
— Mylan [27:10]
- On formal methods:
"At a certain scale, math has to save you, right? Because you can’t do all the combinatorics of every single edge case. But math can save you and help you on this at S3 scale."
— Mylan [33:16]
- On engineering culture:
"You have to respect what came before... but then there’s also this tenet called Be Technically Fearless."
— Mylan [46:56]
- On evolution:
"When I think about S3, I think of it as almost like this living, breathing organism where the shape of the product is evolving."
— Mylan [66:35]
- On vectors and the future:
"I think the next world of data lakes... is going to be on metadata, it’s going to be on the semantic understanding of our data."
— Mylan [74:21]
Timestamps for Key Segments
- Introduction & Scale: [01:04]–[03:27]
- S3 Origins & Data Lake Evolution: [03:57]–[09:45]
- S3 API & Primitives: [09:45]–[12:14]
- Pricing & Cost Philosophy: [13:36]–[17:16]
- Eventual vs. Strong Consistency: [19:37]–[28:51]
- Formal Methods & Proofs: [28:51]–[34:53]
- Durability in Practice: [34:53]–[39:49]
- Correlated Failure & Failure Allowances: [39:49]–[44:58]
- Engineering Culture: [46:04]–[49:07]
- Table & Vector Evolution: [49:07]–[59:48]
- Simplicity & Hiring at S3: [68:50]–[73:32]
- Future of Data & Book Recommendations: [74:21]–[75:48]
Conclusion
This episode offers an enlightening, deeply technical look into the engineering that powers AWS S3. From how the world's data is reliably stored and accessed at cosmic scale, to the advanced methods for ensuring consistency and durability, and the cultural values that drive the team, listeners get both practical and philosophical insights. For anyone building large-scale, distributed systems or interested in the cutting edge of cloud infrastructure, this conversation is invaluable.
