AWS Podcast #735: The Frugal Architect w/ Werner Vogels — Zillow's Chief Architect on Why Cheap ≠ Frugal
Release Date: September 1, 2025
Guests: Craig Link (Chief Cloud Architect, Zillow), Werner Vogels (CTO, Amazon)
Hosts: Simon Elisha, Hawn Nguyen-Loughren
Overview
This episode dives into the nuanced difference between being cheap and being frugal in cloud architecture and engineering, emphasizing how informed, intentional choices drive effective innovation and business value. Craig Link shares a narrative arc from formative childhood lessons on frugality to real-world scenarios optimizing large-scale, cost-effective cloud infrastructure at Zillow and previous ventures. The panel, with Werner Vogels’ perspective, explores concrete technical strategies, hard-won lessons, and their broader implications for teams balancing innovation, resilience, and cost.
Key Discussion Points & Insights
1. Origin Story: Frugality vs. Cheapness
⏩ 01:14 – 03:20
- Craig Link recounts family road trips where his father's approach prioritized spending on experiences, not logistics, and measured efficiency (like miles per gallon):
“We'd minimize how many stops we'd have [and] really maximize…the things that we could do at our destination…A certain frugality of how you're choosing to spend your money and your time in the places you want it to be.” (Craig, 02:12)
- The experience instilled a mindset of focusing resources on what truly matters — an early form of cost observability.
- Werner Vogels distinguishes being frugal (intentional, value-driven) from being cheap (mindless cuts):
"Cheap and frugal are two very different things. Where frugal is a conscious decision to spend your money on those things that really matter to you for sure. Where cheap is for whatever reason." (Werner, 03:20)
2. Innovation Born of Constraints
⏩ 04:26 – 09:13
- Early Microsoft Gaming Zone:
- Craig describes technical constraints of 90s dial-up gaming—every byte counted.
- Innovations included virtual LAN drivers and bit-packing to minimize network load:
“Bytes were almost too big at that point…you really bitpack things…how do we really optimize the amount of traffic because the band was slow…as you mentioned, bytes were almost too big at that point.” (Craig, 05:05)
- Pooling and reduction strategies reduced bandwidth while improving user experience.
- Werner draws parallels to the Kindle’s networking design, where Amazon ate connectivity costs but had to engineer carefully to avoid business-impacting overruns.
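The bit-packing idea from this segment can be illustrated with a minimal sketch. The field layout below (a 6-bit x, a 6-bit y, a 3-bit heading, and a 1-bit firing flag) is a hypothetical example, not the Gaming Zone's actual wire format; it only shows how several small values can share one 16-bit word instead of taking a byte or more each.

```python
import struct

# Hypothetical layout (an assumption for illustration only):
#   bits 15-10: x (0-63), bits 9-4: y (0-63), bits 3-1: heading (0-7), bit 0: firing
def pack_update(x: int, y: int, heading: int, firing: bool) -> bytes:
    """Pack four small fields into a single 16-bit big-endian word."""
    if not (0 <= x < 64 and 0 <= y < 64 and 0 <= heading < 8):
        raise ValueError("field out of range for its bit width")
    word = (x << 10) | (y << 4) | (heading << 1) | int(firing)
    return struct.pack(">H", word)


def unpack_update(data: bytes) -> tuple:
    """Reverse pack_update: return (x, y, heading, firing)."""
    (word,) = struct.unpack(">H", data)
    return (word >> 10) & 0x3F, (word >> 4) & 0x3F, (word >> 1) & 0x07, bool(word & 1)


packed = pack_update(42, 17, 5, True)
restored = unpack_update(packed)
```

Here the update fits in two bytes, where giving each field its own byte would take four. On a slow link that difference applies to every message sent.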
3. Scaling: Elasticity & Pragmatic Overprovisioning
⏩ 09:13 – 12:23
- FigurePrints Startup:
- Craig’s team brought 3D-printed World of Warcraft figurines to market, facing sporadic bursts of massive traffic.
- Used early AWS EC2 instances to scale rendering during blog promotions, and learned to "throw capacity" at critical problems first and right-size afterward:
“…Sometimes it makes sense to over provision…It's better to solve it, reduce the customer impact…get things stable and then right size it appropriately.” (Craig, 11:14)
4. Deep Technical Optimization at Glimpse
⏩ 12:51 – 16:19
- As a location-sharing startup, Glimpse needed maximum efficiency per AWS instance.
- Profiling revealed JSON serialization was a bottleneck, prompting a custom, highly-optimized serializer:
“We wrote our own custom JSON serializer…set up to know…where could we reduce memory copies…measured to be about 2.8 times faster than any other open source or built in JSON serializer…” (Craig, 15:08)
- Key takeaway: Identify and solve for the bottleneck that has the biggest business impact.
- Werner adds: Amazon shifted from bloated libraries to minimal, optimized code for significant savings (“We wrote an open source version … purely focusing on performance and minimizing bytes on the wire … that saves us a significant amount of money…” (Werner, 16:19)).
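The "measure first, then optimize" habit described in this segment can be sketched with the Python standard library. This is not the custom serializer discussed in the episode. It is only a small, assumed example that compares two encodings of a hypothetical location update by bytes on the wire and by encode time, so that a decision rests on measurements.

```python
import json
import timeit

# Hypothetical location update; the field names are illustrative only.
update = {"user_id": 12345, "lat": 47.6062, "lon": -122.3321, "speed_mph": 31.5}


def encode_default(obj) -> bytes:
    """Encode with json.dumps defaults, which add a space after ',' and ':'."""
    return json.dumps(obj).encode("utf-8")


def encode_compact(obj) -> bytes:
    """Encode without the optional whitespace to shrink each message."""
    return json.dumps(obj, separators=(",", ":")).encode("utf-8")


def measure(fn, obj, runs: int = 10_000):
    """Return (bytes per message, seconds taken for `runs` encodes)."""
    return len(fn(obj)), timeit.timeit(lambda: fn(obj), number=runs)


if __name__ == "__main__":
    for fn in (encode_default, encode_compact):
        size, seconds = measure(fn, update)
        print(f"{fn.__name__}: {size} bytes, {seconds:.4f}s for 10,000 encodes")
```

Timings vary by machine, so the sketch prints them rather than assuming a result. The byte counts are deterministic and show the per-message saving directly.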
5. Zillow’s Cloud Evolution: Visibility & Automation
⏩ 16:58 – 23:51
- About Zillow:
- Real-estate marketplace focused in North America; highly variable and regional workloads.
- Craig led the on-prem-to-cloud migration, emphasizing repeatable infrastructure and cost visibility.
- FinOps, Tagging, and Automation:
- Tagging is critical for cost allocation, accountability, and clarity — but spelling errors and lack of uniformity present challenges.
“You need to understand…not just the account basis or the whole bill…we do slice and dice that based on these kind of business lines and even down to the team level.” (Craig, 22:55)
- Zillow built internal tools (service catalog, ETL pipelines) to create, normalize, and maintain high-fidelity tagging even in legacy/’lift-and-shift’ environments.
- “Guardrails” system (inspired by Amazon’s own approaches) flags best practice, security, and cost issues, balancing autonomy and compliance.
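The tag-normalization problem mentioned above (spelling errors and inconsistent naming) might be approached along these lines. This is a sketch under stated assumptions, not Zillow's tooling: the team names, the `team` tag key, and the `unmatched:` prefix are all hypothetical, and fuzzy matching with `difflib` is just one possible technique.

```python
from difflib import get_close_matches

# Hypothetical canonical values; a real list would come from a service catalog.
KNOWN_TEAMS = ["search", "listings", "mortgages"]


def normalize_key(key: str) -> str:
    """Lowercase a tag key and unify separators, e.g. 'Cost_Center ' -> 'cost-center'."""
    return key.strip().lower().replace("_", "-").replace(" ", "-")


def normalize_team(value: str):
    """Map a possibly misspelled team name to a known one, or None if nothing is close."""
    matches = get_close_matches(value.strip().lower(), KNOWN_TEAMS, n=1, cutoff=0.8)
    return matches[0] if matches else None


def normalize_tags(raw: dict) -> dict:
    """Return a cleaned copy of a resource's tags."""
    clean = {}
    for key, value in raw.items():
        key, value = normalize_key(key), value.strip()
        if key == "team":
            # Keep unmatched values visible instead of guessing an owner.
            value = normalize_team(value) or f"unmatched:{value}"
        clean[key] = value
    return clean


tags = normalize_tags({"Team ": "Serch", "Cost_Center": " 4410 "})
```

Marking unmatched values explicitly, rather than dropping them, keeps the spend attributable to someone who can fix the tag at its source.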
6. Real-time Cost Feedback & Guardrails
⏩ 23:51 – 27:29
- Engineers get real-time dashboards and JIRA tickets for policy violations or spend anomalies.
“With this internal service catalog…you can actually see kind of what your spend is based on those tags…” (Craig, 25:03)
- “Guardrails not gates”—the system alerts and suggests, rather than blocks, allowing teams to move fast but correct course when necessary.
- Automation is evolving: integration with IaC and CI/CD is planned, and trade-offs between business features and infrastructure improvements are ongoing.
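The "guardrails not gates" pattern can be sketched as a check that returns advisory findings instead of raising an error or blocking a deployment. Everything named here is an assumption for illustration: the required tags, the spend threshold, and the shape of the resource record are hypothetical and do not describe Zillow's Guardrails system.

```python
from dataclasses import dataclass

REQUIRED_TAGS = {"team", "business-line"}  # hypothetical policy
MAX_MONTHLY_COST_USD = 5_000.0             # hypothetical threshold


@dataclass
class Finding:
    resource_id: str
    rule: str
    suggestion: str


def check_resource(resource: dict) -> list:
    """Return advisory findings for one resource; never raise or block."""
    findings = []
    missing = REQUIRED_TAGS - set(resource.get("tags", {}))
    if missing:
        findings.append(Finding(resource["id"], "missing-tags",
                                f"add tags: {', '.join(sorted(missing))}"))
    if resource.get("monthly_cost_usd", 0.0) > MAX_MONTHLY_COST_USD:
        findings.append(Finding(resource["id"], "high-spend",
                                "review sizing or confirm the spend is expected"))
    return findings


findings = check_resource(
    {"id": "i-123", "tags": {"team": "search"}, "monthly_cost_usd": 7200.0}
)
```

Each finding could then feed a dashboard entry or a ticket, which matches the alert-and-suggest behavior described in this segment.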
7. Balancing Optimization, Experimentation & Practical Lessons
⏩ 28:00 – 39:18
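- Premature optimization can backfire (Knuth’s warning that it is “the root of all evil”), as with a NAT setup for Kubernetes clusters that was built for an anticipated need and inadvertently raised costs; removing it was simple and paid off: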
“…It basically saved us around $15K a month…very simple change to get there of something that we thought we’d need that we ended up really not needing.” (Craig, 31:00)
- Observability:
- Make systems observable across metrics: cost, logs, traffic, resource utilization.
- Rapid iteration and pivoting are now viable because cloud infrastructure is managed as code.
- Cost spikes are “the canary in the coal mine”—often the first sign of deeper inefficiency.
- Werner: Historical example—Amazon reduced search cost 4x by switching from 32-bit to 64-bit instances after benchmarking.
8. Aligning Cost with Business Activity
⏩ 33:30 – 35:07
- Monitor cost relative to usage; increases driven by growth are good, unexplained increases are red flags:
“If cost is going up and the number of transactions, users…what have you is going up as well…that's a happy problem.” (Craig, 33:48)
- Always ask why—is the cost justified, or is it a sign of waste? Cultural curiosity is key.
“If nobody asks the question why, you won’t catch that.” (Craig, 34:27)
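One way to make the "cost relative to business activity" check concrete is to track cost per transaction and flag periods where spend rises faster than usage. The sketch below is an assumption-laden example: the 10% tolerance, the function names, and the labels are hypothetical choices, not figures from the episode.

```python
def unit_cost(cost_usd: float, transactions: int) -> float:
    """Cost per transaction for one period."""
    if transactions <= 0:
        raise ValueError("transactions must be positive")
    return cost_usd / transactions


def classify_change(prev: tuple, curr: tuple, tolerance: float = 0.10) -> str:
    """Compare two periods, each given as (cost_usd, transactions).

    Returns 'cost flat or down', 'growth-driven' (the "happy problem"),
    or 'investigate' when unit cost rose by more than `tolerance`.
    """
    if curr[0] <= prev[0]:
        return "cost flat or down"
    if unit_cost(*curr) <= unit_cost(*prev) * (1 + tolerance):
        return "growth-driven"
    return "investigate"


happy = classify_change((10_000, 1_000_000), (15_000, 1_500_000))
red_flag = classify_change((10_000, 1_000_000), (15_000, 1_000_000))
```

In the first comparison cost and transactions both grow by half, so the unit cost is unchanged. In the second, cost grows with flat usage, which is the case that should prompt someone to ask why.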
9. Empowering Engineers: “Think Big, Move Fast”—Without Waste
⏩ 35:41 – 38:59
- Zillow promotes rapid prototyping and experimentation, but with conscious awareness of cost and value.
“We definitely encourage people to experiment…think big. And move fast…but also want to empower our engineers…It’s that shared ownership…being aware of your spending.” (Craig, 36:09)
- With generative AI and new services, trade-offs (like choosing between heavyweight and lightweight foundation models) can be significant.
“Trading off quality versus cost there is important, I think.” (Werner, 37:16)
- Guardrails provide safety, rapid feedback, and course correction—empowerment, not restriction.
10. Final Reflections and Advice
⏩ 39:18 – End
- Craig Link’s Advice:
“Make sure that you really democratize the data and get it to any and everybody’s hands…You want everybody working on it and kind of being a shared ownership model.” (Craig, 39:39)
- Lower barriers to observability, empower all engineering teams with actionable data & accountability.
- Constraint breeds creativity:
“Constraints can breed creativity. I mean, it forces our human brains to live. I mean, AI can’t fix this. This is something that only we as humans can do.” (Werner, 41:09)
Notable Quotes
- “Frugality is a conscious decision to spend your money on those things that really matter to you…where cheap is for whatever reason.” (Werner Vogels, 03:20)
- “Bytes were almost too big at that point…how do we really optimize the amount of traffic because the band was slow…” (Craig Link, 05:05)
- “Sometimes it makes sense to over provision…It’s better to solve it, reduce the customer impact…and then right size it appropriately.” (Craig Link, 11:14)
- “We wrote our own custom JSON serializer…measured to be about 2.8 times faster than any other … out there.” (Craig Link, 15:08)
- “Cost is the canary in the coal mine…If you see true spikes in your cost that are not related to your business activity, you have a benchmark to put this against.” (Werner Vogels, 32:12)
- “If nobody asks the question why, you won’t catch that.” (Craig Link, 34:27)
- “Constraints can breed creativity. I mean, it forces our human brains to live. I mean, AI can’t fix this.” (Werner Vogels, 41:09)
Key Takeaways
- Frugality ≠ Cheapness: Strategic investment and intentional constraints empower greater creativity and ultimate business value.
- Embrace Observability: Democratize access to cost and performance data across engineering teams—let those closest to the work make informed decisions.
- Guardrails, Not Gates: Enable autonomy while enforcing organizational objectives through transparent, automated feedback systems.
- Technical Optimization: Bottlenecks are context-dependent—profile, measure, and surgically optimize.
- Business Context: Link technical metrics (especially cost) tightly with business outcomes—never view in isolation.
- Iterative Mindset: Be prepared to pivot, learn from anomalies, and promote a culture of curiosity and shared responsibility.
For further feedback and to engage with the show, visit awspodcast.com.
