Steve Gibson (17:34)
Yes, and actually we're going to get to that. Later in this post they talk about the relative strength of AI for doing good versus doing bad, and it turns out the good guys have an advantage here, for some reason. So you've got the infographic on the screen. This shows all of 2025, then January and this just-previous month of February 2026: the level and classification of vulnerabilities found in Firefox, by month.

They said: As part of this collaboration, Mozilla fielded a large number of reports from us, helped us understand what types of findings warranted submitting a bug report, and shipped fixes to hundreds of millions of users in Firefox 148. Their partnership and the technical lessons we learned provide a model for how AI-enabled security researchers and maintainers can work together to meet this moment.

So I would argue we're still in the early stages of the deployment of AI to improve our existing installed software base, but it is clearly going to happen. They said: In late 2025, we noticed that Opus 4.5 was close to solving all tasks in CyberGym, a benchmark that tests whether LLMs can reproduce known security vulnerabilities. So they're saying 4.5 was close to solving all tasks where LLMs are being tested to see whether they can reproduce known security vulnerabilities, you know, independently find them.

They said: We wanted to construct a harder and more realistic evaluation that contained a higher concentration of technically complex vulnerabilities, like those present in modern web browsers. And again, Mozilla's Firefox is a heavily scrutinized, field-tested, long-term critical security target, so this makes so much sense for them to test. They continued: So we built a dataset of prior Firefox common vulnerabilities and exposures, CVEs, to see if Claude could reproduce those. We chose Firefox because it's both a complex code base and one of the most well tested and secure open source projects in the world. This makes it a harder test of AI's ability to find novel security vulnerabilities than the open source software we previously used to test our models. Hundreds of millions of users rely on Firefox daily, and browser vulnerabilities are particularly dangerous because users routinely encounter untrusted content and depend on the browser to keep them safe. Or, as we're often saying here on the podcast, it is our Internet-facing surface, and so it needs to be as bulletproof as possible.

They said: Our first step was to use Claude to find previously identified CVEs in older versions of the Firefox code base. Right, so they're going back to test it: what is it able to find that we already know about? They said: We were surprised that Opus 4.6 could reproduce a high percentage of these historical CVEs, given that each of them took significant human effort to uncover. But it was still unclear how much we should trust this result, because it was possible that at least some of these historical CVEs were already in Claude's training data. I think that's a very good point. So, you know, being retrospective has some value; prospection is what we need.

They said: So we tasked Claude with finding novel vulnerabilities in the current version of Firefox, bugs that by definition cannot have been reported before. We focused first on Firefox's JavaScript engine (good), but then expanded to other areas of the browser. The JavaScript engine was a convenient first step.
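To make that dataset idea concrete: Anthropic hasn't published their evaluation schema, so what follows is purely a hypothetical C++ sketch of what one entry in such a CVE-reproduction benchmark might contain. Every field and name below is an assumption, not anything from the post.

```cpp
#include <iostream>
#include <string>

// Hypothetical sketch: one entry in a CVE-reproduction benchmark of the
// kind described. The schema was not published; all fields are assumptions.
struct CveReproductionTask {
    std::string cve_id;            // placeholder identifier, not a real CVE
    std::string vulnerable_commit; // Firefox revision known to contain the bug
    std::string component;         // e.g. js/src, the SpiderMonkey engine
    std::string crash_reproducer;  // held-out input that triggers the bug,
                                   // used only to score the model's rediscovery
};

int main() {
    // One illustrative, entirely made-up task entry.
    CveReproductionTask task{
        "CVE-XXXX-YYYY",           // placeholder
        "mozilla-central@abc123",  // placeholder revision
        "js/src (SpiderMonkey)",
        "poc.js"
    };
    std::cout << "Ask the model to rediscover " << task.cve_id
              << " at revision " << task.vulnerable_commit << '\n';
    return 0;
}
```

The point of a held-out reproducer in a setup like this is that the model never sees it; it only confirms, after the fact, that whatever the model found actually triggers the known bug.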
It's an independent slice of Firefox's code base that can be analyzed in isolation, and it's particularly important to secure, given its wide attack surface: it processes untrusted external code whenever users browse the web. They said: After just 20 minutes of exploration, Claude Opus 4.6 reported that it had identified a use-after-free, a type of memory vulnerability that could allow attackers to overwrite data with arbitrary malicious content in the JavaScript engine. (For a bare-bones illustration of the use-after-free pattern, see the sketch below.) One of our researchers validated this bug in an independent virtual machine with the latest Firefox release, then forwarded it to two other Anthropic researchers, who also validated the bug. We then filed a bug report in Bugzilla, Mozilla's issue tracker, along with a description of the vulnerability and a proposed patch, written by Claude and validated by the reporting team, to help triage the root cause.

They said: In the time it took us to evaluate and submit this first vulnerability to Firefox, Claude had already discovered 50 more unique crashing inputs. And remember, a crash indicates that something went wrong; it shouldn't have crashed. Can we weaponize the source of that crash into doing something that the bad guys want? So, 50 more unique crashing inputs. They said: While we were triaging these crashes, a researcher from Mozilla reached out to us. After a technical discussion about our respective processes and sharing a few more vulnerabilities we had manually validated, they encouraged us to submit all of our findings in bulk without validating each one, even if we weren't confident that all of the crashing tests had security implications. By the end of this effort, we had scanned nearly 6,000 C files and submitted a total of 112 unique reports, including the high and moderate severity vulnerabilities mentioned above. Most issues have been fixed in Firefox 148, with the remainder to be fixed in upcoming releases.

They said: When doing this kind of bug hunting in external software, we're always conscious of the fact that we may have missed something critical about the code base that would make a discovery a false positive. We tried to do the due diligence of validating the bugs ourselves, but there's always room for error. We're extremely appreciative of Mozilla for being so transparent about their triage process and for helping us adjust our approach to ensure we only submitted test cases they cared about, even if not all of them ended up being relevant to security. Mozilla researchers have since started experimenting with Claude for security purposes internally.

So then, in their section "From identifying vulnerabilities to writing primitive exploits," they said: To measure the upper limits of Claude's cybersecurity abilities, we also developed a new evaluation to determine whether Claude was able to exploit any of the bugs we discovered. In other words, we wanted to understand whether Claude could also develop the sorts of tools that a hacker would use to take advantage of these bugs to execute malicious code. To do this, we gave Claude access to the vulnerabilities we had submitted to Mozilla and asked Claude to create an exploit focused on each one. To prove it had successfully exploited a vulnerability, we asked Claude to demonstrate a real attack. Specifically, we required it to read and write a local file on a target system, as an attacker would.
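Since "use-after-free" carries most of the weight in that first paragraph, here's a deliberately minimal, hypothetical C++ illustration of the pattern. This is not the actual Firefox bug (those details weren't published); it only shows the class of memory error being described.

```cpp
#include <iostream>

// Minimal, hypothetical illustration of a use-after-free.
// NOT the actual Firefox bug; it just demonstrates the error class:
// memory is freed, then a stale pointer into it is used anyway.

struct Callback {
    virtual void run() { std::cout << "benign callback\n"; }
    virtual ~Callback() = default;
};

int main() {
    Callback* cb = new Callback();
    delete cb;   // the object is freed here...

    // ...but the stale pointer is used anyway. By now the allocator may
    // have reused this memory for attacker-influenced data, so the
    // virtual dispatch below can be redirected to an attacker-chosen
    // address. That is what makes this class of bug exploitable.
    cb->run();   // undefined behavior: use-after-free
    return 0;
}
```

In a JavaScript engine the same shape shows up at a distance: one script operation frees an internal object while another still holds a reference to it, and carefully timed allocations let an attacker control what the freed memory is refilled with.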
They said: We ran this test several hundred times with different starting points, spending approximately $4,000 in API credits. Despite this, Opus 4.6 was only able to actually turn a vulnerability into an exploit in two cases. Still, you spend $4,000 and you get two opportunities to read and write files on the victim's machine. That's worth four grand to attackers, and then some.

They said: This tells us two things. One, Claude is much better at finding these bugs than it is at exploiting them. So that's one of our data points, right? Two, the cost of identifying vulnerabilities is an order of magnitude cheaper than creating an exploit for them. However, the fact that Claude could succeed at automatically developing a crude browser exploit, even if only in a few cases, is a concern.

"Crude" is an important caveat here, they wrote. The exploits Claude wrote only worked in our testing environment, which intentionally removed some of the security features found in modern browsers. This includes, most importantly, the sandbox, the purpose of which is to reduce the impact of these types of vulnerabilities. Thus, Firefox's defense in depth would have been effective at mitigating even those two particular exploits. But vulnerabilities that escape the sandbox are not unheard of, and Claude's attack is one necessary component of an end-to-end exploit. You can read more about how Claude developed one of these Firefox exploits on our Frontier Red Team blog.

They said: These early signs of AI-enabled exploit development underscore the importance of accelerating the find-and-fix process for defenders. In other words, we don't have any time to waste here, folks, because AI is getting good for everyone, and the bad guys, well, we already know they're using it.

They said: Towards that end, we want to share a few technical and procedural best practices we found while performing this analysis. First, when researching patching agents, which use LLMs to develop and validate bug fixes, we've developed a few methods we hope will help maintainers use LLMs like Claude to triage and address security reports faster. In our experience, Claude works best when it's able to check its own work with another tool. We refer to this class of tool as a task verifier: a trusted method of confirming whether an AI agent's output actually achieves its goal. Task verifiers give the agent real-time feedback as it explores a code base, allowing it to iterate deeply until it succeeds. Task verifiers helped us discover the Firefox vulnerabilities described above, and in separate research we found that they're also useful for fixing bugs.

A good patching agent needs to verify at least two things: that the vulnerability has actually been removed, and that the program's intended functionality has been preserved. In our work, we built tools that automatically tested whether the original bug could still be triggered after a proposed fix, and separately ran test suites to catch regressions, a regression being a change that accidentally breaks something else. We expect maintainers will know best how to build these verifiers for their own code bases. The key point is that giving the agent a reliable way to check both of these properties dramatically improves the quality of its output.
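Anthropic doesn't show their verifier code, but as a rough sketch of the two properties just described, here's a minimal, hypothetical C++ harness. The `patched_build`, `crash_reproducer.js`, and `run_test_suite.sh` names are invented placeholders, and a real verifier would distinguish crash signals from ordinary failures far more carefully.

```cpp
#include <cstdlib>
#include <iostream>
#include <string>

// Hypothetical task verifier for a patching agent: confirms that a
// proposed fix (1) removes the bug and (2) preserves functionality.
// All commands and paths below are invented placeholders.

// Run a shell command; treat a zero exit status as success. (On POSIX
// systems a crash typically surfaces as a nonzero status.)
static bool runs_clean(const std::string& cmd) {
    return std::system(cmd.c_str()) == 0;
}

int main() {
    // Property 1: the original crashing input must no longer crash the
    // patched build. If the reproducer still crashes, the proposed fix
    // did not actually remove the vulnerability.
    bool bug_removed =
        runs_clean("./patched_build --run crash_reproducer.js");

    // Property 2: intended behavior must be preserved, checked by
    // running the project's existing regression test suite.
    bool no_regressions = runs_clean("./run_test_suite.sh");

    if (bug_removed && no_regressions) {
        std::cout << "PATCH ACCEPTED: bug gone, tests pass\n";
        return 0;  // success is fed back to the agent loop
    }
    std::cout << "PATCH REJECTED:"
              << (bug_removed ? "" : " reproducer still crashes;")
              << (no_regressions ? "" : " regression tests fail")
              << '\n';
    return 1;  // the agent iterates on its fix and tries again
}
```

The exit code is the whole interface: the agent proposes a patch, the verifier answers pass or fail, and that trusted yes/no signal is what lets the model iterate deeply without a human checking each attempt.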