AWS Podcast #741 Summary
Modernizing Edge Infrastructure: Booking.com's Journey with AWS CloudFront and Lambda@Edge
Release Date: October 13, 2025
Guests: Ali (Networking & Traffic Management, Booking.com), Sarah (Principal Solutions Architect, AWS)
Host: Gillian Ford
Episode Overview
This episode explores how Booking.com modernized its global infrastructure by migrating edge networking to AWS CloudFront and Lambda@Edge. The conversation covers architectural considerations, observability strategies, resilience through chaos engineering, granular business logic at the edge, and cost optimizations, including a single line of configuration that saved more than $500,000 a year. It is a case study in large-scale, data-driven transformation, with clear lessons for organizations of any size.
Key Discussion Points & Insights
1. Booking.com: Platform Evolution and the Drive to Modernize
- Introduction to Booking.com
- Initially a Dutch start-up focused on hotel bookings, now a comprehensive, global travel platform spanning hotels, homes, flights, cars, attractions, and airport taxis.
- “It’s a full end-to-end booking platform with the mission of making it easier for everyone to experience the world.” — Ali [02:17]
- Legacy Infrastructure and Challenges in 2022
- Nearly 30 years of diverse infrastructure: bare metal, monolithic, private cloud, containerized, and serverless.
- Europe-centric data centers and content delivery made it challenging to serve a rapidly growing global customer base.
- Experimented with “miniPOPs” (mini data centers) to improve global access, but scaling was operationally and financially unsustainable.
- Key migration motivator: the need for a globally distributed, manageable, and secure edge solution.
[02:59–07:00]
2. Choosing CloudFront & Lambda@Edge: Requirements and Decision-Making
- Core Requirements:
- Global presence (CloudFront’s 700+ points of presence)
- Seamless AWS integration
- Measurable, data-driven performance improvements
- Managed operations and reduced toil
- Strengthened, unified security posture
- CloudFront as the Chosen Solution:
- Native AWS integration, security features (WAF, Shield, Bot Control), and long-lived connections met all requirements.
- “CloudFront… fit like a piece of a puzzle in the remaining part of our infrastructure.” — Ali [07:22]
[07:00–10:51]
3. Security Transformation at the Edge
- Old Model: Many entry points; complex, fragmented effort to enforce perimeter security globally.
- CloudFront Model: Security focus unified at the Edge, with centralized tooling and easier policy enforcement.
- “Now… our security teams can put all their focus on the Edge.” — Ali [12:18]
[10:51–12:25]
4. Technical Overview: CloudFront and Lambda@Edge Basics
- What is CloudFront?
- AWS’s CDN, globally caching and delivering content via Edge locations.
- What is Lambda@Edge?
- Serverless compute that runs code at Edge locations, allowing real-time content and request/response customization.
- “That opens to a variety of use cases as well.” — Sarah [13:44]
- Common Lambda@Edge Scenarios:
- Request/response manipulation, A/B testing, HTTP header changes, SPA routing, video manifest manipulation.
[12:25–14:28]
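Two of the scenarios listed above, A/B testing and HTTP header changes, can be combined in one viewer-request function. The following is a minimal sketch and not Booking.com's implementation: the `visitor_id` cookie, the `x-experiment-cohort` header, and the 50% split are assumptions chosen for illustration, and the event types are a hand-written subset of the Lambda@Edge event shape.

```typescript
// Minimal subset of the Lambda@Edge viewer-request event, declared inline
// so the sketch has no dependencies.
type CfHeaders = Record<string, { key?: string; value: string }[]>;
interface CfRequest {
  uri: string;
  method: string;
  clientIp: string;
  headers: CfHeaders;
}
interface CfEvent {
  Records: { cf: { request: CfRequest } }[];
}

// 32-bit FNV-1a hash: cheap and deterministic, so a visitor stays in one cohort.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

// percentB is the share of visitors (0-100) routed to variant "B".
function assignCohort(visitorId: string, percentB: number): "A" | "B" {
  return fnv1a(visitorId) % 100 < percentB ? "B" : "A";
}

// Returns the value of a cookie from the CloudFront header structure, if present.
function readCookie(headers: CfHeaders, name: string): string | undefined {
  for (const header of headers["cookie"] ?? []) {
    for (const part of header.value.split(";")) {
      const [key, ...rest] = part.trim().split("=");
      if (key === name) return rest.join("=");
    }
  }
  return undefined;
}

// Tags the request with a cohort header (hypothetical name) for the origin to read.
function tagRequest(request: CfRequest, percentB: number): CfRequest {
  // Assumption: fall back to the client IP when no visitor cookie exists.
  const visitorId = readCookie(request.headers, "visitor_id") ?? request.clientIp;
  const cohort = assignCohort(visitorId, percentB);
  request.headers["x-experiment-cohort"] = [
    { key: "X-Experiment-Cohort", value: cohort },
  ];
  return request;
}

// Viewer-request entry point; a real deployment would export this function.
async function handler(event: CfEvent): Promise<CfRequest> {
  return tagRequest(event.Records[0].cf.request, 50);
}
```

Hashing a stable identifier, rather than drawing a random number per request, keeps each visitor in the same variant across requests without storing any state at the edge.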
5. Booking.com’s Architecture: 100% Traffic Through the Edge
- Dynamic Content and Edge Reverse Proxy:
- Not just static content: 100% of traffic, including dynamic calls and APIs, is routed through CloudFront, which acts as a reverse proxy holding long-lived connections to the origin.
- Massive latency improvement for remote users, up to 30% faster request times.
- Technical Details:
- Pre-established connections between Edge locations and origin data centers keep latency low.
- “99.7% of our traffic comes through a pre-established connection… massively improved the latency.” — Ali [17:18]
[14:28–19:23]
6. Data-Driven Experimentation & Observability Culture
- Advanced Experimentation:
- Every change, from button color to CDN provider, is A/B tested for 2–3 weeks.
- Obsessed with measuring impact on business and technical metrics.
- Observability Practices:
- Comprehensive end-to-end logging and analysis with S3, Athena (later Elasticsearch/OpenSearch), Kinesis, Envoy/HAProxy metrics, and CloudFront server timing headers.
- Emphasis on cost management within observability, fine-tuning sample rates for cost-effective insight.
- “As a rule, you only enhance what you measure.” — Ali [19:23]
- Advice:
- Build observability first, but monitor its cost closely—log volume and API charges scale rapidly.
[19:23–24:54]
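The advice on tuning sample rates can be illustrated with a small sampling filter. This is a sketch under stated assumptions, not the pipeline described in the episode: the `LogRecord` shape, the rule of always keeping 5xx responses, and the hash-based selection are all illustrative choices.

```typescript
// Hypothetical log record; real edge logs carry many more fields.
interface LogRecord {
  requestId: string;
  status: number;
}

// Maps a string to a number in [0, 1) using a 32-bit FNV-1a hash.
function hashToUnit(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash / 4294967296; // divide by 2^32
}

// Decides whether a record is shipped to the logging backend.
// sampleRate is a fraction between 0 (drop everything) and 1 (keep everything).
function shouldKeep(record: LogRecord, sampleRate: number): boolean {
  // Assumption: server errors are rare and valuable, so they bypass sampling.
  if (record.status >= 500) return true;
  // Hashing the request ID keeps or drops every line of a request together.
  return hashToUnit(record.requestId) < sampleRate;
}
```

Lowering `sampleRate` reduces stored volume roughly in proportion, which is the lever the discussion refers to when it links sample rates to observability cost.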
7. Architecture Deep Dive: End-to-End Resilience
- Request Journey:
- DNS (Route 53) geo-routes the user to the nearest CloudFront POP, where the viewer connection is established; the edge then maintains secure, long-lived connections to the origin.
- Extra security via secret headers; redundancy and failover built into every layer, including DNS, load balancing, and data centers.
- Chaos Engineering:
- Frequent, regular failovers (minimum once per quarter per critical component)
- Unannounced drills to verify redundancy and raise organization-wide reliability standards.
- “The only way for you to be able to absorb failure reliably is to fail all the time. And that's exactly what Chaos Engineering does.” — Ali [29:58]
- Advice for Adoption:
- Start with planned drills, then increase frequency and surprise element as confidence grows.
- “You start by planning the drill… then more often… you slowly get to a point where failover becomes as well business as usual.” — Ali [33:02]
[25:03–35:20]
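The secret-header safeguard in the request journey above can be sketched from the origin's side. The episode does not say how the header is checked, so everything here is an assumption: the `x-origin-verify` header name, the idea of accepting more than one secret to allow rotation, and the hand-written constant-time comparison. CloudFront itself can attach custom headers to the requests it sends to an origin; the origin then rejects anything that arrives without a matching value.

```typescript
// Compares two strings without returning early on the first mismatch,
// so response timing does not reveal how much of the secret matched.
function constantTimeEqual(a: string, b: string): boolean {
  if (a.length !== b.length) return false;
  let diff = 0;
  for (let i = 0; i < a.length; i++) {
    diff |= a.charCodeAt(i) ^ b.charCodeAt(i);
  }
  return diff === 0;
}

// Returns true when the request carries a secret the edge is known to send.
// headers: lower-cased header names mapped to their values.
// acceptedSecrets: current secret plus, during rotation, the previous one.
function isFromEdge(
  headers: Record<string, string | undefined>,
  acceptedSecrets: string[],
): boolean {
  const presented = headers["x-origin-verify"]; // hypothetical header name
  if (presented === undefined) return false;
  let matched = false;
  for (const secret of acceptedSecrets) {
    // No break: every candidate is checked regardless of earlier matches.
    if (constantTimeEqual(presented, secret)) matched = true;
  }
  return matched;
}
```

Accepting two secrets at once lets the edge and origin rotate the value without a window in which legitimate traffic is rejected.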
8. Scalability: Startups, Scale-ups, and Enterprises
- CloudFront and Lambda@Edge are for Everyone:
- Suitable for businesses of any size—from day one.
- Startups benefit from immediate global performance, availability, and reduced operational complexity.
- As businesses grow, architecture scales with them.
- Shoutout:
- “Ali’s team really moved with the agility of a startup.” — Sarah [38:17]
[36:19–39:13]
9. Building a Platform on Lambda@Edge
- Platformization:
- Multiple teams wanted computing at the edge; Booking.com built a TypeScript-based Lambda@Edge platform supporting modularization, parallel execution, fail-safe isolation, and granular config/firefighting.
- Example Use Cases:
- A/B experimentation system now runs at the edge, for tests between legacy and modern stacks.
- Unified authentication logic and bot protection, easing migration from legacy to modern services.
- “By moving this into the edge… that really freed our service owners to easily move between architectures.” — Ali [45:52]
[39:13–46:38]
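The platform properties named above (modularization, parallel execution, fail-safe isolation, and per-module configuration) can be pictured with a small runner. This is a hedged sketch of the general pattern, not Booking.com's code: the `EdgeModule` interface, the `enabled` flag standing in for a kill switch, and the rule that modules contribute headers are all assumptions.

```typescript
type HeaderMap = Record<string, string>;

interface EdgeRequest {
  uri: string;
  headers: HeaderMap;
}

// A module is one team's unit of edge logic (hypothetical contract).
interface EdgeModule {
  name: string;
  enabled: boolean; // stands in for a per-module kill switch
  run(request: Readonly<EdgeRequest>): Promise<HeaderMap>;
}

interface ModuleResult {
  name: string;
  ok: boolean;
  headers: HeaderMap;
  error?: string;
}

// Runs all enabled modules concurrently. A module that throws is recorded
// as failed and contributes nothing, so it cannot break the others.
async function runModules(
  request: EdgeRequest,
  modules: EdgeModule[],
): Promise<{ request: EdgeRequest; outcomes: ModuleResult[] }> {
  const active = modules.filter((m) => m.enabled);
  const outcomes = await Promise.all(
    active.map(async (m): Promise<ModuleResult> => {
      try {
        return { name: m.name, ok: true, headers: await m.run(request) };
      } catch (err) {
        const message = err instanceof Error ? err.message : String(err);
        return { name: m.name, ok: false, headers: {}, error: message };
      }
    }),
  );
  // Merge in module order; later modules win on conflicting header names.
  const merged: HeaderMap = { ...request.headers };
  for (const outcome of outcomes) {
    Object.assign(merged, outcome.headers);
  }
  return { request: { uri: request.uri, headers: merged }, outcomes };
}
```

Catching errors inside each module's own promise, instead of around the whole `Promise.all`, is what gives the isolation: one rejected module does not discard the results of the rest.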
10. Business and Performance Impact of Edge Logic
- Accelerated Modernization:
- Freed teams to migrate services without re-implementing shared middleware.
- Performance Optimizations:
- Lean code, careful sizing, and parallel modules brought the average Lambda@Edge response time down to about 5 ms per request.
[45:41–55:04]
11. Best Practices & Advice for Edge Architecture
- General Guidance:
- Apply serverless best practices: tight cost controls, structured log management, sampling, and lean code.
- Be intentional with Lambda triggers (origin vs viewer requests) to optimize performance and save costs.
- "If you’re using Node.js, use tree shaking… keep your code as lean as possible. Your Lambda will be more performant; you'll reduce your cold start and your duration." — Sarah [53:38]
- Invest in monitoring, observability, and regular performance fine-tuning.
- Cost Optimization Wisdom:
- Be wary of hidden observability/logging costs (CloudWatch, etc.).
- Major savings came from disabling default Lambda logging at the edge: a single configuration change saved over $500,000/year.
- “It literally took us one little line in our Terraform model to disable those standard logs… and it really saved us… more than half a million a year.” — Ali [58:51]
[56:15–59:06]
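The guidance on choosing between viewer and origin triggers comes down to how often each one runs: a viewer-request function executes for every request, while an origin-request function executes only when CloudFront has to go to the origin, that is, on a cache miss. A rough estimate can be sketched as follows. The traffic volume and cache hit ratio in the usage note are made-up inputs, and no prices are included because none were quoted in the episode.

```typescript
type Trigger = "viewer-request" | "origin-request";

// Estimates how many times a Lambda@Edge function runs for a given trigger.
// cacheHitRatio is the fraction of requests served from the CloudFront cache.
function estimateInvocations(
  totalRequests: number,
  cacheHitRatio: number,
  trigger: Trigger,
): number {
  if (cacheHitRatio < 0 || cacheHitRatio > 1) {
    throw new RangeError("cacheHitRatio must be between 0 and 1");
  }
  if (trigger === "viewer-request") {
    return totalRequests; // runs before the cache lookup, on every request
  }
  return Math.round(totalRequests * (1 - cacheHitRatio)); // cache misses only
}

// Usage with illustrative numbers: 1,000,000 requests at a 75% hit ratio.
const viewerRuns = estimateInvocations(1000000, 0.75, "viewer-request");
const originRuns = estimateInvocations(1000000, 0.75, "origin-request");
```

With these inputs the origin trigger runs a quarter as often as the viewer trigger, so logic that does not need to see cached responses is cheaper to place on the origin side. For fully dynamic, uncacheable traffic the two counts converge.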
12. Parting Advice and Reflections
- Engage Deeply with AWS:
- Don’t treat AWS as a black box; interact with solution architects, ask for features, and shape the product roadmap.
- “Don't be afraid to nag over AWS and ask for exactly what you need.” — Ali [60:05]
- Edge Technology as an Enabler:
- Use edge and serverless not just for speed and scale but to unlock new business models and customer opportunities.
- “Think about which markets you can unlock, that you haven’t unlocked as a business… partner with your business stakeholders.” — Sarah [62:22]
[59:06–63:27]
Notable Quotes & Moments
- How Edge Transforms Security:
- “Moving to CloudFront and using the standard WAF and Shield solutions allowed our security teams to really focus their perimeter security work on the edge.” — Ali [11:13]
- On Observability:
- “As a rule, you only enhance what you measure.” — Ali [19:23]
- On Chaos Engineering:
- “The only way for you to be able to absorb failure reliably is to fail all the time. And that's exactly what Chaos Engineering does.” — Ali [29:58]
- On Migration Impact:
- “Last time we checked, after we moved to CloudFront, 99.7% of our traffic comes through a pre-established connection…” — Ali [17:18]
- On Cost Optimization:
- “It literally took us one little line in our Terraform model to disable those standard logs… and it really saved us… more than half a million a year.” — Ali [58:51]
- On Engaging AWS:
- “Don't be afraid to nag over AWS and ask for exactly what you need… if they say yes you would get exactly what you want.” — Ali [60:05]
- On Business Value of Edge:
- “Think about how… you can unlock new business opportunities…” — Sarah [62:22]
Timestamps for Key Segments
- [02:13] Booking.com background and legacy infrastructure
- [07:22] Requirements and reasons for selecting CloudFront
- [10:51] Security improvements at the edge
- [12:39] CloudFront & Lambda@Edge explained for beginners (Sarah)
- [14:45] 100% traffic (static & dynamic) through CloudFront
- [19:23] Observability pipeline and best practices for measurement
- [29:58] Chaos engineering explained and Booking.com’s practice
- [36:19] Suitability of these architectures for startups and smaller orgs (Sarah)
- [39:26] Lambda@Edge modular platform & example use cases (Ali)
- [56:43] Cost optimization stories and logging savings
- [59:30] Final advice for customers moving to edge-based architecture
Final Takeaways
- CloudFront and Lambda@Edge can improve performance, scalability, and business agility, not only for enterprises like Booking.com but for organizations of any size.
- Data-driven experimentation, deep observability, and an openness to regular failure are foundational to Booking.com’s success with edge modernization.
- Effective cost management—including small config changes—can yield massive financial savings at scale.
- Don’t hesitate to engage AWS service teams; customer feedback directly influences product evolution.
This summary captures the technical depth, hands-on lessons, and strategic guidance shared by Booking.com and AWS during this episode. Whether for architects, engineers, or business stakeholders, the discussion covers actionable strategies for adopting and optimizing global edge infrastructure.
