B (17:34)
Yes, and actually we're going to get to that later. In this post they talk about the relative strength of AI for doing good versus doing bad, and it turns out the good guys have an advantage here for some reason. So you've got the infographic on the screen. It shows, for all of 2025 plus January and this just-previous month of February 2026, the vulnerability levels and classifications found in Firefox by month. They said, as part of this collaboration, Mozilla fielded a large number of reports from us, helped us understand what types of findings warranted submitting a bug report, and shipped fixes to hundreds of millions of users in Firefox 148. Their partnership and the technical lessons we learned provide a model for how AI-enabled security researchers and maintainers can work together to meet this moment. So I would argue we're still in the early stages of deploying AI to improve our existing installed software base, but it is clearly going to happen. They said, in late 2025, we noticed that Opus 4.5 was close to solving all tasks in CyberGym, a benchmark that tests whether LLMs can reproduce known security vulnerabilities. So they're saying 4.5 was close to solving all the tasks in a benchmark designed to see whether LLMs can independently reproduce known security vulnerabilities. They said, we wanted to construct a harder and more realistic evaluation that contained a higher concentration of technically complex vulnerabilities. And again, Mozilla's Firefox is a heavily scrutinized, field-tested, long-term critical security target, so it makes so much sense for them to test against it. They said, technically complex vulnerabilities like those present in modern web browsers. So we built a dataset of prior Firefox Common Vulnerabilities and Exposures, CVEs, to see if Claude could reproduce those.
We chose Firefox because it's both a complex code base and one of the most well-tested and secure open source projects in the world. This makes it a harder test of AI's ability to find novel security vulnerabilities than the open source software we previously used to test our models. Hundreds of millions of users rely on Firefox daily, and browser vulnerabilities are particularly dangerous because users routinely encounter untrusted content and depend on the browser to keep them safe. Or as we're often saying here on the podcast, it is our Internet-facing surface, and so it needs to be as bulletproof as possible. They said, our first step was to use Claude to find previously identified CVEs in older versions of the Firefox code base. Right, so they're going back to test it: what is it able to find that we already know about? They said, we were surprised that Opus 4.6 could reproduce a high percentage of these historical CVEs, given that each of them took significant human effort to uncover. But it was still unclear how much we should trust this result, because it was possible that at least some of these historical CVEs were already in Claude's training data. I think that's a very good point. So being retrospective has some value, but prospection is what we need. They said, so we tasked Claude with finding novel vulnerabilities in the current version of Firefox, bugs that by definition cannot have been reported before. We focused first on Firefox's JavaScript engine, good, but then expanded to other areas of the browser. The JavaScript engine was a convenient first step. It's an independent slice of Firefox's code base that can be analyzed in isolation, and it's particularly important to secure given its wide attack surface: it processes untrusted external code whenever users browse the Web.
After just 20 minutes of exploration, Claude Opus 4.6 reported that it had identified a use-after-free, they say, a type of memory vulnerability that could allow attackers to overwrite data with arbitrary malicious content, in the JavaScript engine. One of our researchers validated this bug in an independent virtual machine with the latest Firefox release, then forwarded it to two other Anthropic researchers, who also validated the bug. We then filed a bug report in Bugzilla, Mozilla's issue tracker, along with a description of the vulnerability and a proposed patch written by Claude and validated by the reporting team to help triage the root cause. In the time it took us to evaluate and submit this first vulnerability to Firefox, Claude had already discovered 50 more unique crashing inputs. Because remember, a crash indicates that something went wrong. It shouldn't have crashed, so can we weaponize the source of that crash into doing something the bad guys want? So, 50 more unique crashing inputs. They said, while we were triaging these crashes, a researcher from Mozilla reached out to us. After a technical discussion about our respective processes and sharing a few more vulnerabilities we had manually validated, they encouraged us to submit all of our findings in bulk without validating each one, even if we weren't confident that all of the crashing tests had security implications. By the end of this effort, we had scanned nearly 6,000 C files and submitted a total of 112 unique reports, including the high- and moderate-severity vulnerabilities mentioned above. Most issues have been fixed in Firefox 148, with the remainder to be fixed in upcoming releases. When doing this kind of bug hunting in external software, we're always conscious of the fact that we may have missed something critical about the code base that would make the discovery a false positive.
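To make the "unique crashing inputs" idea concrete, here's a minimal sketch of a crash-triage loop. Everything in it is illustrative, not Anthropic's actual tooling: the real target would be a Firefox build driven by a fuzzer, and real triage would hash a symbolized stack trace rather than raw stderr.

```python
import hashlib
import subprocess
import sys

# Toy stand-in for the target under test: a Python one-liner that "crashes"
# (exits non-zero) whenever its stdin contains the string "bad".
TARGET = [sys.executable, "-c",
          "import sys; d = sys.stdin.read(); assert 'bad' not in d, 'boom ' + d"]

def crash_signature(inp):
    """Run the target on one input; return a dedup key if it crashed, else None."""
    proc = subprocess.run(TARGET, input=inp, capture_output=True, text=True)
    if proc.returncode == 0:
        return None
    # Hashing stderr is a crude proxy for a stack-trace hash; it still
    # separates distinct failure messages.
    return hashlib.sha256(proc.stderr.encode()).hexdigest()

def triage(inputs):
    """Keep only the first input seen for each distinct crash signature."""
    unique = {}
    for inp in inputs:
        sig = crash_signature(inp)
        if sig is not None and sig not in unique:
            unique[sig] = inp
    return unique

# Two distinct crashes, one duplicate, one clean run.
print(len(triage(["ok", "bad-1", "bad-1", "bad-2"])))  # prints 2
```

The point of deduplication is exactly what the triage discussion above describes: dozens of crashing inputs may all hit the same root cause, and maintainers only want one report per cause.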
We tried to do the due diligence of validating the bugs ourselves, but there's always room for error. We're extremely appreciative of Mozilla for being so transparent about their triage process and for helping us adjust our approach to ensure we only submitted test cases they cared about, even if not all of them ended up being relevant to security. Mozilla researchers have since started experimenting with Claude for security purposes internally. So then, in their section titled From identifying vulnerabilities to writing primitive exploits, they said, to measure the upper limits of Claude's cybersecurity abilities, we also developed a new evaluation to determine whether Claude was able to exploit any of the bugs we discovered. In other words, we wanted to understand whether Claude could also develop the sorts of tools that a hacker would use to take advantage of these bugs to execute malicious code. To do this, we gave Claude access to the vulnerabilities we had submitted to Mozilla and asked Claude to create an exploit focused on each one. To prove it had successfully exploited a vulnerability, we asked Claude to demonstrate a real attack. Specifically, we required it to read and write a local file on a target system, as an attacker would. We ran this test several hundred times with different starting points, spending approximately $4,000 in API credits. Despite this, Opus 4.6 was only able to actually turn a vulnerability into an exploit in two cases. Still, you spend $4,000 and you get two opportunities to read and write files on the victim's machine. That's worth four grand to attackers, and then some. They said, this tells us two things. One, Claude is much better at finding these bugs than it is at exploiting them. So that's one of our data points, right?
Two, the cost of identifying vulnerabilities is an order of magnitude lower than the cost of creating an exploit for them. However, the fact that Claude could succeed at automatically developing a crude browser exploit, even if only in a few cases, is a concern. Crude is an important caveat here, they wrote. The exploits Claude wrote only worked in our testing environment, which intentionally removed some of the security features found in modern browsers. This includes, most importantly, the sandbox, the purpose of which is to reduce the impact of these types of vulnerabilities. Thus, Firefox's defense in depth would have been effective at mitigating even those two particular exploits. But vulnerabilities that escape the sandbox are not unheard of, and Claude's attack is one necessary component of an end-to-end exploit. You can read more about how Claude developed one of these Firefox exploits on our Frontier Red Team blog. They said, these early signs of AI-enabled exploit development underscore the importance of accelerating the find-and-fix process for defenders. In other words, we don't have any time to waste here, folks, because AI is getting good for everyone, and the bad guys, well, we already know they are using it. They said, toward that end, we want to share a few technical and procedural best practices we found while performing this analysis. First, when researching patching agents, which use LLMs to develop and validate bug fixes, we developed a few methods we hope will help maintainers use LLMs like Claude to triage and address security reports faster. In our experience, Claude works best when it's able to check its own work with another tool. We refer to this class of tool as a task verifier: a trusted method of confirming whether an AI agent's output actually achieves its goal. Task verifiers give the agent real-time feedback as it explores a code base, allowing it to iterate deeply until it succeeds.
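The task-verifier loop they describe can be sketched in a few lines. This is my own toy illustration, not Anthropic's implementation: the candidates would really come from an LLM, and the "parser" stands in for the code under audit.

```python
def toy_parser(data):
    """Stand-in for the code under audit, with a deliberately planted bug."""
    if data.startswith("!"):
        raise ValueError("planted bug reached")

def verified_crash(candidate):
    """Task verifier: trusted ground truth on whether a candidate triggers the bug."""
    try:
        toy_parser(candidate)
        return False
    except ValueError:
        return True

def agent_loop(candidates):
    """Keep proposing candidates until the verifier confirms success.

    In the real system each failure feeds back into the model so it can
    refine its next attempt; here the candidates are just a fixed list.
    """
    for attempt, candidate in enumerate(candidates, start=1):
        if verified_crash(candidate):
            return attempt, candidate
    return None

print(agent_loop(["hello", "{}", "!exploit-me"]))  # prints (3, '!exploit-me')
```

The key property is that success is judged by an external, trusted check rather than by the model's own claim, which is what makes deep iteration safe.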
Task verifiers helped us discover the Firefox vulnerabilities described above, and in separate research we found that they're also useful for fixing bugs. A good patching agent needs to verify at least two things: that the vulnerability has actually been removed, and that the program's intended functionality has not been changed, that it's been preserved. In our work, we built tools that automatically tested whether the original bug could still be triggered after a proposed fix, and separately ran test suites to catch regressions, a regression being a change that accidentally breaks something else. We expect maintainers will know best how to build these verifiers for their own code bases. The key point is that giving the agent a reliable way to check both of these properties dramatically improves the quality of its output. Right. Again, we've had reports of careless AI agents spewing out bug reports, inundating HackerOne and similar bounty programs with bogus AI slop, so this is certainly an issue. They said, we can't guarantee that all agent-generated patches that pass these tests are good enough to merge immediately, but task verifiers give us increased confidence that the produced patch will fix the specific vulnerability while preserving program functionality, and therefore achieve what's considered to be the minimum requirement for a plausible patch. Of course, when reviewing AI-authored patches, we recommend that maintainers apply the same scrutiny they'd apply to any other patch created by an external auditor. And you know, they told us that the moment they started talking to Mozilla about this, the Mozilla guy said, give us everything you have. You found 50 ways to crash our JavaScript engine? We want them, please. We'll take responsibility for them. So Anthropic said, zooming out to the process of submitting bugs and patches, we know that maintainers are underwater.
Therefore our approach is to give maintainers the information they need to trust and verify reports. The Firefox team highlighted three components of our submissions that were key for trusting our results: a minimal test case, a detailed proof of concept, and a candidate patch. Those are the three things that Mozilla wanted. They said, we strongly encourage researchers who use LLM-powered vulnerability research tools to include similar evidence of verification and reproducibility when submitting bug reports based on the output of such tooling. So here we have Anthropic being essentially responsible, right? They're saying, we've created an AI system, people have jumped on it and they're using it, and in some cases they're not being as responsible with their use as they should be. So, you know, we tried this ourselves, here's what we learned. Please, everybody, we're happy to have you use Claude or whatever, but be respectful of the burden this is putting on maintainers. They also said, we published our Coordinated Vulnerability Disclosure, you know, CVD, operating principles, where we describe the procedures we will use when working with maintainers. Our processes here follow standard industry norms for the time being, but as models improve, we may need to adjust our processes to keep pace with capabilities. Frontier language models are now world-class vulnerability researchers. I think we can say, based on this report and their results, that statement is not hyperbole. Frontier language models are now world-class vulnerability researchers.
On top of the 22 CVEs we identified in Firefox, we've used Claude Opus 4.6 to discover vulnerabilities in other important software projects like the Linux kernel. Over the coming weeks and months we will continue to report on how we're using our models and working with the open source community to improve security. Opus 4.6 is currently, and here it is, Leo, far better at identifying and fixing vulnerabilities than at exploiting them, which is really interesting. They said this gives defenders the advantage, because for example it found 50 ways to crash the JavaScript engine but was only able to exploit two of those itself, whereas Mozilla found 22 instances where a finding generated a security-relevant CVE. So Claude wasn't as good at exploiting as it was at locating where there was a problem. They said, with the recent release of Claude Code Security in limited research preview, so there's now something called Claude Code Security, we're bringing vulnerability discovery and patching capabilities directly to customers and open source maintainers. To anybody listening, and Leo, we know that we have at least one listener who is now earning a full-time income bug hunting. We met him in Florida during Zero Trust World.