Podcast Summary
AI + a16z
Episode: The AI That Found A Bug In The World’s Most Audited Code
Date: December 10, 2025
Host: Joel de la Garza (a16z)
Guest: Matt Knight (VP of Security Products and Research, OpenAI; former CISO, now heading Aardvark)
Episode Overview
This episode explores how advances in artificial intelligence—particularly large language models (LLMs)—are transforming cybersecurity. The conversation centers on Aardvark, an AI agent developed at OpenAI that autonomously hunts zero-day vulnerabilities, even in heavily audited open-source code such as OpenSSH. Joel and Matt discuss the rapid evolution from early models like GPT-3 to today’s powerful iterations, how AI is empowering defenders, and what this means for open-source maintainers, enterprises, and national security as AI tools become more integral to code defense and analysis.
Key Discussion Points & Insights
1. AI’s Early Limitations in Security (GPT-3 Era) [04:34 – 06:08]
- Initial Aspirations & Shortcomings:
Matt joined OpenAI in 2020, eager to leverage AI for security. However, GPT-3 fell short for practical automation:
“Spoiler alert. Nothing. The model just was not good enough for real security automation or operational tasks or things that would actually create impact for a security program.”
— Matt Knight [05:09]
- Reasons: A limited context window, a poor token vocabulary, and insufficient world knowledge and reasoning led to unreliable results.
2. Breakthroughs with Advanced Language Models (GPT-4/5) [06:10 – 10:58]
- GPT-4 as a Turning Point:
The team experienced a paradigm shift with GPT-4’s improved reasoning and instruction-following capabilities—unlocking the ability to triage security logs and detect nuanced behaviors like reverse-shell activity.
“There were two tests we ran that wowed us… we took security logs, prompted the model as an expert analyst, and it just kind of got it right… You start to turn up the heat, open a reverse shell or something, and the model said, ‘that right there, you should have somebody look at that.’”
— Matt Knight [08:19 – 10:02]
- Threat Intelligence in Multilingual Contexts:
GPT-4 processed 60,000+ chat logs of a cybercriminal group (written in Russian “shorthand”—not just Russian), extracting meaningful intelligence in hours that would have taken a diverse human team days or weeks.
“We just suddenly had this alien intelligence that could just do it all day, which was great.”
— Matt Knight [12:04]
3. AI as a Security Force Multiplier [13:57 – 15:28]
- Efficiency Gains:
Early models primarily reduced toil by automating repetitive but necessary workflows (e.g., gathering employee information during investigations).
- Human-in-the-Loop Model:
AI augmented human analysts, making work both more efficient and more satisfying.
- From Efficiency to New Capabilities:
Modern LLMs enable previously impossible feats, such as continuous and comprehensive code review at scale.
4. The Rise of Aardvark: An Agentic Security Researcher [16:08 – 23:19]
- How Aardvark Works [16:08 – 19:18]:
- Hooks into codebases, builds a threat model, identifies vulnerabilities, validates them in a sandbox, auto-generates patches, and re-validates the patches for completeness.
- Mimics human vulnerability research: reads, tests, hypothesizes, and fixes, but at machine speed and scale.
“It does so by reading the code, by analyzing... It’ll write and run tests, actually try to stand up scaffolding and explore it... Aardvark’s workflow is simple—you hook it up to your codebase and it generates a threat model first, just like a good security engineer would.”
— Matt Knight [16:37 – 18:08]
- Real-World Impact:
Aardvark found memory-corruption bugs in highly audited code, including OpenSSH, surprising even Matt and the OpenSSH maintainers.
“Anytime you’re finding memory corruption in OpenSSH, that’s super interesting.”
— Matt Knight [22:05]
5. Patching as a Native AI Capability [22:52 – 23:19]
- Aardvark not only finds bugs but also auto-generates and suggests validated patches, radically streamlining code defense.
- This blurs the line between vulnerability identification and remediation, making secure coding accessible for time-strapped teams.
6. Value for Open Source & Democratizing Security [33:54 – 37:29]
- Open Source Under Siege:
Maintainer fatigue, social engineering, and sophisticated attacks (like the xz utils backdoor incident) illustrate how outgunned solo or small-team open-source maintainers are.
“We got lucky as hell. If you’re listening—thank you.”
— Matt Knight [33:50]
- AI’s Empowering Role:
Language models like Aardvark can scale security intelligence, giving open-source projects “a fighting chance”—a level of defense previously practical only for resource-rich organizations.
“With language models, we can scale security intelligence to all the places that need it... to give developers a fighting chance.”
— Matt Knight [35:33]
- Call for Beta Testers:
OpenAI is inviting open-source maintainers to join Aardvark’s private beta.
7. Offense vs. Defense—Who Wins in the AI Era? [24:51 – 32:53]
- Not a Labor Replacement—A Labor Multiplier:
AI augments human analysts; the shortage of skilled defenders is so acute that there is no near-term risk of replacement.
- Security remains a “specialization within a specialization.”
- Arms Race Dynamics:
Both attackers and defenders can now automate and scale their attacks and defenses, but defenders may finally have the edge.
“Attackers get to, you know, run all these Mechanical Turk shots on goal. So do we.”
— Matt Knight [29:02]
- Continuous Testing as the Future:
Aardvark can provide always-on code auditing, moving from snapshot-in-time analysis to genuinely continuous defense.
8. On Demystifying & Democratizing Security [36:10 – End]
- Security Shouldn’t Be “the Rich Person’s Game”:
“It’d be great if my local mom-and-pop dentist had the same security profile as my investment bank. Right? ... That’s a much better world.”
— Joel de la Garza [37:03]
- AI Makes Security Scalable and Equitable:
Matt’s personal passion: enabling the next generation of tinkerers and open-source volunteers to secure the global software supply chain.
Notable Quotes & Memorable Moments
- On the leap from GPT-3 to GPT-4:
“You give it a section of code and ask it to spot the bug—it couldn’t do it. Or it would make something up.”
— Matt Knight [06:08]
- On the xz utils backdoor saga:
“We got lucky as hell. If you’re listening—thank you. … Think about the blast radius had that made it into Linux distributions; that backdoors, what, half the Internet?”
— Matt Knight [33:54]
- On open-source maintainers fighting nation-state attackers:
“What chance do these open-source developers ... have against the full force of a foreign intelligence service?”
— Matt Knight [35:06]
- On the big picture:
“With language models and the tools that we can build on them, we actually have the ability to scale security intelligence to all the places that need it. To give these developers a fighting chance.”
— Matt Knight [35:33]
- On democratizing security:
“Security has always been kind of the rich person’s game… It just seems like this is a wave where we’re going to start to democratize some of this stuff.”
— Joel de la Garza [37:03]
Timestamps for Important Segments
- [04:34] - Matt recounts his arrival at OpenAI & early limitations of AI in security
- [08:07] - First successful GPT-4 security log triage tests
- [12:00] - AI processes non-English, non-standard threat intelligence data
- [15:14] - Shift from efficiency to enabling new defensive capabilities
- [16:08] - How Aardvark works: agentic, autonomous, code analysis + patching
- [22:05] - Aardvark finds a zero-day memory corruption bug in OpenSSH
- [24:51] - Are AI agents replacing security analysts? The labor market perspective
- [29:02 – 32:53] - Offense v. defense in the AI era; continuous testing as the new paradigm
- [33:54 – 37:29] - xz utils incident; why open source needs scalable AI-powered security
- [37:03] - Vision for democratizing security for everyone, not just well-resourced orgs
Conclusion
This episode compellingly illustrates AI’s shift from security “helper” to unprecedented force multiplier—capable of continuous, autonomous code review, intelligence analysis, and patching at scale. As the gap closes between attacker capability and defender resources, tools like Aardvark represent a new hope for the broad software ecosystem: defense democratized, open source empowered, and defenders at every tier newly equipped for an era of rapidly evolving threats.
