Podcast Summary
Episode Overview
Podcast: To The Point - Cybersecurity
Episode: Cloud War Games: Building Disaster Muscle Memory and Collaborative Resilience in DevOps Teams
Host(s): Rachael Lyon and Jonathan Knepher
Guest: Matt Lea, Creator of Cloud War Games
Date: December 16, 2025
This episode explores incident response in cloud environments, emphasizing the benefits of realistic disaster simulations for DevOps and cloud teams. Matt Lea shares how his Cloud War Games platform helps organizations build “disaster muscle memory” and foster collaborative resilience. The conversation ranges from hands-on team drills and business continuity to the security implications of AI and automation in cloud ecosystems.
Key Discussion Points & Insights
1. Origin of Cloud War Games
- Simulating Outages for Real Training ([01:54])
- Matt saw junior engineers freeze during outages due to pressure and uncertainty, especially with high-stakes clients (e.g., “If they go down, they lose about $100,000 an hour”).
- Developed tabletop and later real-infrastructure simulations: “I actually have a stack of all these 3 a.m. problems… figured we might as well make some lemonade.”
- Platform evolved from whiteboard “Dungeons & Dragons”-style diagrams to realistic, replayable cloud disaster scenarios.
2. Collaborative vs. Competitive Response: Team Dynamics
- Building Collaborative Skills ([04:13])
- Simulations emphasize teamwork: “I try and make [scenarios] more collaborative… One person checks DNS, another database metrics.”
- Drills help teams coordinate intuitively, a skill that traditional, sporadic tabletop exercises rarely build.
- Revealing Knowledge Silos ([05:38])
- “Take your lead engineer, have them sit on their hands, and see how the rest of the team handles an incident. You find out where the knowledge gaps are.”
- Encourages organizations to address “single point of failure” risks.
3. Culture of Quick Iteration and Calculated Risk
- Combating Fear of Failure and Indecision ([07:19])
- “A lot of times I see a fear of failure, which is interesting… you can iterate extremely fast.”
- Teaching which actions are reversible in cloud environments is crucial: “It's good to know which switches to flip quickly, and which are irreversible. Deleting a production database—irreversible. Scaling down Docker tasks—recoverable.”
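The reversible-versus-irreversible distinction can be sketched as a small guard in a runbook tool. This is only an illustration, not something described in the episode: the `ACTIONS` catalog, the `may_run` function, and the second-engineer confirmation rule are all hypothetical, and only the two example actions come from Matt's quote.

```python
from enum import Enum


class Reversibility(Enum):
    RECOVERABLE = "recoverable"
    IRREVERSIBLE = "irreversible"


# Hypothetical action catalog; the two entries mirror the examples in the quote.
ACTIONS = {
    "scale_down_docker_tasks": Reversibility.RECOVERABLE,
    "delete_production_database": Reversibility.IRREVERSIBLE,
}


def may_run(action: str, confirmed_by_second_engineer: bool = False) -> bool:
    """Return True if the action may proceed during an incident.

    Recoverable actions run immediately. Irreversible actions need a second
    engineer's confirmation (an assumed policy). Unknown actions are treated
    as irreversible, which is the safer default.
    """
    kind = ACTIONS.get(action, Reversibility.IRREVERSIBLE)
    if kind is Reversibility.RECOVERABLE:
        return True
    return confirmed_by_second_engineer
```

With this sketch, `may_run("scale_down_docker_tasks")` returns `True`, while `may_run("delete_production_database")` returns `False` until a second engineer confirms.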
4. Executive Buy-In and Real-Time Drills
- Urgency Arrives After an Incident ([09:25])
- “If I approach a company that hasn't seen a cyber incident, it's not a priority. The day after it is, that's when I get a call.”
- Real-time, surprise drills drive the lessons home more effectively than scheduled tabletop exercises.
5. Differentiating Between Attacks & Outages
- Diagnostic Approaches ([10:53])
- “I create dashboards starting at the external layer... looking for spikes at every layer: Route 53, API Gateway, ALBs, ECS tasks, CPU usage.”
- Emphasizes diagramming: “Cloud is intangible, but if you can see the map and metric discrepancies, you can track the issue.”
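The outside-in walk through the layers might look roughly like the sketch below. It is a simplification under stated assumptions: the layer names, the single number per layer, and the 2x spike threshold are all hypothetical, and a real dashboard would pull these values from a monitoring service rather than from dictionaries.

```python
from typing import Dict, List, Optional

# Hypothetical layer names, ordered from the external layer inward.
LAYERS = ["dns", "api_gateway", "load_balancer", "container_tasks"]


def spiking_layers(
    baseline: Dict[str, float], current: Dict[str, float], factor: float = 2.0
) -> List[str]:
    """Return the layers, outside-in, whose current value is at least
    `factor` times the baseline value."""
    return [
        layer
        for layer in LAYERS
        if baseline[layer] > 0 and current[layer] >= factor * baseline[layer]
    ]


def outermost_spike(
    baseline: Dict[str, float], current: Dict[str, float], factor: float = 2.0
) -> Optional[str]:
    """Return the first (most external) spiking layer, or None if none spike."""
    hits = spiking_layers(baseline, current, factor)
    return hits[0] if hits else None
```

If the edge layers look normal and only the innermost layer spikes, that discrepancy is where the investigation would start. Whether it points to an attack or an ordinary outage still takes human judgment.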
6. Containment Strategies for Credential Leaks
- Damage Control, Not Panic ([13:34])
- “We disabled the credentials—not delete, because that could kill off major third-party services.”
- Engineers must “do the math” on business continuity: “Engineers think in ones and zeros, C suite in dollars and cents.”
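As a rough illustration of "doing the math", the sketch below turns an outage duration into dollars. The function name is hypothetical, and the hourly figure is the $100,000 per hour Matt quotes for one high-stakes client, used here only as an example input.

```python
def downtime_cost(hourly_loss_usd: float, minutes_down: float) -> float:
    """Estimate lost revenue for an outage, assuming losses accrue linearly."""
    return hourly_loss_usd * minutes_down / 60


# Example figure from the episode: roughly $100,000 lost per hour of downtime.
HOURLY_LOSS_USD = 100_000

print(downtime_cost(HOURLY_LOSS_USD, 45))  # 75000.0
```

A number like this lets an engineer explain to the C-suite why disabling a credential, rather than deleting it and breaking dependent services, can be the cheaper containment step.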
Memorable Quotes & Moments (with Timestamps)
- On Simulation-Driven Learning
“Wouldn’t it be great if we could simulate this disaster… in our staging environment… rerun the same disaster we just hit.”
Matt Lea ([02:22])
- On Team Collaboration
“More efficiency: one person attacks from the back, another from the front end, another at the application layer… it’s not really intuitive off the bat, but if you coordinate and rehearse this, it becomes much more intuitive.”
Matt Lea ([04:30])
- On Knowledge Silos
“Many customers have single point of failures around a key person and don’t even realize how bad it is. I strongly suggest you take their keyboard away and break something, and see how long it takes to come back.”
Matt Lea ([06:08])
- On Mistaken Engineer Mindset
“Engineers think in ones and zeros; the C suite, the language of business is dollars and cents. If you can think in both, you'll go far.”
Matt Lea ([14:39])
- On Credential Exposure
“Literally had to fly my guy across the country with a hard drive to get code… only to find it public on GitHub a month later with keys intact!”
Matt Lea ([16:30])
- On Internal Security
“You should never have an API key for API access with admin rights. That’s insane. Lock down public access, use multiple security group/firewall layers, and private subnets.”
Matt Lea ([17:28])
- On AI and Agentic Bots
“Right now the big trend is bot traffic… it used to be easy to tell humans from bots, but now with agentic bots, it’s murky; sometimes we want to let them buy products.”
Matt Lea ([19:18])
- On Risks of Generative AI in Business Processes
“The bot that can issue a refund is a dangerous bot. I’d never let a bot issue a refund without a human in the loop. They hallucinate; you need hefty logging and restriction.”
Matt Lea ([21:20])
- On Interns & LLM-based Bots
“After working with LLMs, I came to the conclusion they're basically like having an intern that lies to you.”
Matt Lea ([24:21])
Notable Segment Timestamps
- [01:26] Introduction to Matt Lea and Cloud War Games platform
- [01:54] Real-world inspiration for disaster simulations
- [04:13] Empowering teams via collaborative troubleshooting
- [05:38] Exposing knowledge silos and single points of failure
- [07:19] Team culture, learning from mistakes, fear of failure
- [09:25] When organizations start prioritizing resilience training
- [10:53] Differentiating downtime from cyberattacks: metrics and dashboards
- [13:34] Business calculus in incident response (credential leaks)
- [15:03] Security guardrails and principle of least privilege stories
- [17:28] Hardening internal network security
- [19:18] Modern trends: bot/agentic traffic and AI implications
- [21:01] Security & business process concerns with generative AI
- [24:21] LLMs: “interns that lie to you”
- [26:26] Custom vs. off-the-shelf AI—guidance for startups
- [28:14] Multi-cloud/multi-region as startups grow
- [31:56] Matt’s personal path into cybersecurity (“started by taking apart video games”)
- [34:02] Finding Matt Lea & Schematical online
Additional Insights
Trends in Cloud Security and AI (19:18–25:47)
- Surge in bot and agentic traffic blurring the distinction between legitimate and malicious automation.
- AI becoming embedded in business processes—Matt warns against over-automation, especially with functions like issuing refunds.
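Matt's advice on refunds (a human in the loop, plus heavy logging and restriction) could be sketched as an approval queue like the one below. This is a minimal illustration, not a design from the episode: `RefundRequest`, `RefundQueue`, and the logger name are all hypothetical, and a real system would also need authentication, amount limits, and persistent storage.

```python
import logging
from dataclasses import dataclass, field
from typing import List

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("refunds")


@dataclass
class RefundRequest:
    order_id: str
    amount_usd: float
    requested_by: str  # identifier of the bot (or person) asking for the refund
    approved: bool = False


@dataclass
class RefundQueue:
    pending: List[RefundRequest] = field(default_factory=list)
    issued: List[RefundRequest] = field(default_factory=list)

    def request(self, order_id: str, amount_usd: float, requested_by: str) -> RefundRequest:
        """A bot may only queue a refund; nothing is issued at this point."""
        req = RefundRequest(order_id, amount_usd, requested_by)
        self.pending.append(req)
        log.info("refund requested: %s", req)
        return req

    def approve(self, req: RefundRequest, human_reviewer: str) -> None:
        """Only a named human reviewer can move a request from pending to issued."""
        if not human_reviewer:
            raise ValueError("a named human reviewer is required")
        self.pending.remove(req)
        req.approved = True
        self.issued.append(req)
        log.info("refund approved by %s: %s", human_reviewer, req)
```

The bot gets access to `request` only. `approve` sits behind a human, and both steps are logged so a hallucinated refund leaves an audit trail.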
Multi-Cloud and Technical Debt (28:14–31:26)
- For early-stage companies, prioritize delivery over premature optimization.
- Consider multi-region or multi-cloud setups only once the business has scaled enough to justify the complexity and cost.
- Technical debt should be understood as a form of leverage—okay if growth outpaces its “interest.”
Career Insights
- Matt’s journey: From hacking video games in 6th grade, to wearing many hats at startups, to cybersecurity and DevOps consulting at the executive level.
- Advocates cross-training and “knowing the language of business.”
Community Resources
- Matt’s YouTube and comics: Schematical (YouTube.com/schematical and schematical.com).
Takeaway Lessons
- Simulated disaster exercises foster team resilience and expose hidden weaknesses before they become crises.
- Incident response must balance technical rigor with business realities—understand the cost of downtime and the importance of communication.
- Guardrails: apply the principle of least privilege, never give API keys admin rights, and use layered security.
- AI is a tool but not a magic bullet—deploy with caution, especially in automated decision-making.
- Even the best security protocols can be upended by human error—train, repeat, and foster a learning culture.
- Career path: Blend curiosity, technical skill, and business sense for cybersecurity leadership.
For More
- Visit schematical.com and YouTube.com/schematical for Matt Lea’s comics and cloud training content.
- Listen to more episodes on Apple Podcasts or Spotify, and visit forcepoint.com/podcast for show notes.
Summary prepared for listeners and cybersecurity professionals seeking practical insights in cloud resilience, team preparedness, and adapting to the fast-evolving threat landscape.
