A
This is Casey Ellis for the Risky Business podcast. Today I am talking to Dylan Airey, the founder and CEO of Truffle Security. For those of you who don't know, Truffle is really designed to make secret management and exposure management around secrets suck less. They've been at this for a really long time. Dylan is one of the premier experts in the space. And yeah, they've had a busy couple of weeks. So it's good to see you, man.
B
Good to see you too. Excited to be here.
A
So let's kick off with the whole idea of attacking repos, you know, repos as kind of the center point of a lot of the attacks that we're seeing going on around the Internet, especially ramping up over the past six months. That's definitely the topic du jour at the moment. But in the middle of that, and kind of underpinning a lot of it, is basically the fact that we still collectively suck at secret management. And there's a lot of stuff getting left out around the Internet. And you guys have played a pretty interesting role in quite a dramatic version of that over the past couple of weeks. So tell us what happened.
B
Yeah, I mean, TruffleHog has always been one of, if not the most popular open source secret scanning tools out there. I independently authored it way back in 2016. As a company, we've only existed since 2021, so for a while it was just an independent project, but now we have a whole team of people working on it. And then the company has branched out into other things, into scanning the public Internet, looking for secrets. Not just building the scanner, but also building some of the monitoring and alerting and stuff like that around the keys. And then we scan people's internal environments as well. But a big consequence of scanning the external Internet, not just for our customers' keys (we scan every event that goes into GitHub looking for any live key that happens to land there), is that we end up with a big data lake of live keys. And it's very, very difficult to do disclosures at the scale we're talking about. We're talking about millions of unique live keys, many of which are students' keys that have access to little or nothing. But many of them are incredibly sensitive keys that could take down the federal government. So a little while ago, back in November, a contractor for CISA, the branch of the federal government, exposed a repository that had a bunch of sensitive keys in it. An independent security researcher found that and reported it to CISA and to Brian Krebs, who thankfully got it taken down. About a week ago, Brian Krebs ran a story on it and we realized, wait a second, between the keys that were reported in the story and the keys that we had observed from the events, there was a mismatch. We'd seen more keys than were reported. And so we wanted to know if all those keys got revoked too. And anybody who might have gotten a copy of the data in the meantime, obviously that was of concern. And so what we found was, sure enough, no, not all the keys got revoked.
And so days after Brian Krebs ran the story, there were still a bunch of live keys from that repository, the worst of which was an administrative key that was fully installed on the CISA IT organization on GitHub and had administrative rights on that organization. So you can imagine all of the consequences from a GitHub app key that had that level of privilege. And so we scrambled to get it disclosed to CISA and get it revoked as fast as possible. And thankfully we were able to get that key revoked. But there was a lot in that leak. It was more than just a couple of keys. I mean, it was tons and tons of Terraform. There were all kinds of passwords to databases that are behind private networks. There were TLS certificates. And so, you know, to the extent that we kind of poked through it, we found that there was more work for CISA to do. And I think they're maybe still working through some of it now. It was a pretty significant blast radius from what was exposed.
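As a rough illustration of the kind of scanning described above, here is a minimal sketch that looks for candidate secrets in a blob of text. It is not TruffleHog's implementation. The function name and the password heuristic are hypothetical, and the only real-world detail assumed is the widely documented shape of an AWS access key ID (`AKIA` followed by 16 uppercase letters or digits). A match is only a candidate: nothing here checks whether a key is live.

```python
import re

# Illustrative patterns only (assumptions, not any scanner's real rule set).
PATTERNS = {
    # Widely documented shape of an AWS access key ID.
    "aws_access_key_id": re.compile(r"\b(AKIA[0-9A-Z]{16})\b"),
    # Hypothetical heuristic: a quoted value of 8+ chars assigned to "password".
    "password_assignment": re.compile(
        r"(?i)password\s*[:=]\s*[\"']([^\"']{8,})[\"']"
    ),
}


def find_candidate_secrets(text):
    """Return (line_number, kind, value) for every pattern match in text."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for kind, pattern in PATTERNS.items():
            for match in pattern.finditer(line):
                findings.append((lineno, kind, match.group(1)))
    return findings


if __name__ == "__main__":
    sample = (
        'aws_key = "AKIAIOSFODNN7EXAMPLE"\n'
        'db_password = "hunter2hunter2"\n'
    )
    for finding in find_candidate_secrets(sample):
        print(finding)
```

A scanner built this way would still need a separate verification step per key type before treating a finding as a live credential.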
A
Yeah, I mean, it's extraordinary. When you think about the chain of events there, you've got a contractor that leaves a repo open or does something silly with permissions on a repo, you've got the disclosure of that, and then you've got you guys kind of scanning the Internet or listening to the Internet in the background, in a position to then cross-correlate those credentials. I guess the question I've got there is, why is it so hard to take this stuff down? And we'll get to the second part, which is why it's so difficult for you guys to disclose that stuff, because that's something that would be great to touch on as well, just in terms of what a day inside Truffle looks like when this sort of thing happens. But on the receiving side, what's the deal there? Why is it so difficult to get this stuff sorted out?
B
Well, when you leak a key out, first of all, it has to be noticed. This was posted to the contractor's personal repo, so it went unnoticed for months and months. After it gets noticed, we have to define what success looks like. And in their case, success was: take the repo down. That's not enough. Think about everybody who got a copy of it in the meantime. If your grandmother tweeted out her Twitter password, you wouldn't be happy until she changed that password, right? Not just deleted the tweet. And so in this case, not all of the keys in that repo were actually revoked. And that aligns with data that we've observed. We did a huge case study a little while ago where we notified, I think, 10,000 developers of live keys that they had exposed, and found that after about a month, only about 24% of them had been revoked. About half the time they had taken the repository down. And so you can see there's a mismatch in definitions of success. The developer might think taking the repo down is success, but, you know, the reality is you're not done until that key gets revoked. There was a mismatch in this case, and so that's why we stayed on it. It's very difficult to automate revocation of exposed keys because it tends to take production systems down when you do that. Some providers have drawn that line in the sand and said, we don't care. OpenAI is a great example of that. If you post an OpenAI key somewhere and OpenAI finds out about it, they'll just revoke it. They don't care if your production system goes down. They've kind of taken that stance in their MSA. If you sign up for those services and you leak the key out, that's part of what you're signing up for. Interestingly enough, that's also in the CA/Browser Forum specification for exposed TLS certificates. If you expose a TLS certificate, the Internet gods will revoke your certificate on your behalf. It's part of the rules. But Amazon does not subscribe to that philosophy.
If you expose an Amazon key and Amazon finds out about it, they will put a policy on that key that restricts what it can do, but they will not revoke it, because they are worried they're going to cause multimillion dollar outages if they go and revoke everybody's keys. And so that's kind of the unfortunate, weird gray area that we live in. GitHub runs this great partnership program where they notify all these vendors of their customers that expose keys. And the vast majority of providers choose not to revoke keys because of the risk that they could take down production services. Sometimes they will notify their customers and say, hey, you leaked the key out. But I mean, Casey, think about all the SaaS providers Bugcrowd used. What email address was the one that signed up for those? And who is monitoring that email address? It's easy to miss those things. And then there's the long tail of things that leak outside of GitHub. So for example, we've just cracked open Hugging Face and started scanning Hugging Face datasets, and have found that there are actually probably more secrets in Hugging Face datasets than there are in GitHub. Why? Well, because a scrape of GitHub is a subset of Hugging Face datasets. Hugging Face datasets by definition are meant to be scrapes of everything you can possibly imagine. Imagine training an LLM. We say that the pre-training phase is literally training on all of human knowledge. All of human knowledge includes all of humanity's API keys. So you know, you have Telegram that was scraped, and we found a bunch of secrets that were in public Telegram channels in a Hugging Face dataset. You have GitHub scrapes, so you have secrets in Hugging Face from GitHub scrapes.
Again, another reason to not just delete the repo, because there's a copy of it in Hugging Face. And then you have scrapes of Reddit, scrapes of the entire Internet, scrapes of Stack Overflow, all these many datasets that have been uploaded, to a total of over a million datasets on Hugging Face. This is just one of many places that...
A
So cleanup is going to be a bit of work on something like that, because at that point the blast radius is all over the floor. It's a cleanup on aisle nine, but it's made its way down to aisle one in the process, right?
B
Yeah, exactly. Things can kind of propagate out. But also the point I was making more generally is that the program GitHub runs with Amazon and OpenAI about these keys that are leaking out is very specific to GitHub, and developers are leaking keys in more places than just GitHub. And so it's a very complex problem. Truffle is doing our best to monitor as many places as we can, maintain this big data lake, and do all these notifications: figuring out who to notify and actually getting the keys revoked. It's a very hard problem to do at scale.
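The distinction drawn in this exchange, that taking a repo down is not the same as revoking the key, can be written down as a tiny classification. The sketch below is only an illustration of that definition of success. The `LeakReport` type, the status labels, and the helper names are all hypothetical and are not part of any Truffle Security product.

```python
from dataclasses import dataclass


@dataclass
class LeakReport:
    """One exposed key, tracked after the owner has been notified (hypothetical)."""
    repo_removed: bool  # the public copy was deleted or made private
    key_revoked: bool   # the credential itself no longer works


def remediation_status(report):
    """Classify a leak; only revocation counts as done."""
    if report.key_revoked:
        return "remediated"
    if report.repo_removed:
        # Anyone who copied the key in the meantime can still use it.
        return "hidden_but_live"
    return "exposed_and_live"


def revocation_rate(reports):
    """Fraction of reported keys that were actually revoked."""
    if not reports:
        return 0.0
    return sum(r.key_revoked for r in reports) / len(reports)


if __name__ == "__main__":
    reports = [
        LeakReport(repo_removed=True, key_revoked=False),
        LeakReport(repo_removed=True, key_revoked=True),
        LeakReport(repo_removed=False, key_revoked=False),
        LeakReport(repo_removed=False, key_revoked=False),
    ]
    print([remediation_status(r) for r in reports])
    print(revocation_rate(reports))
```

Tracking the two flags separately is the point of the design: a dashboard that only records "repo gone" would report success for keys that are still usable.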
A
Yeah, and it's really interesting hearing you talk about the different vendors' default responses. As you know, vulnerability disclosure and risk disclosure are near and dear to my heart in terms of the overall system-level mechanics of it. And this whole idea of vulnerability equities has really gone from being a backroom conversation to something that everyone needs to be at least thinking about, if not talking about. This is one example of that, because I can sort of see OpenAI's position on it, and I like the way that they're super upfront about it. It's like, if this happens, then this will happen to you. So they've at least created an expectation around that. But I can also kind of argue the toss for AWS not wanting to shut down a GovCloud client, for example, with a revocation.
B
Yeah, I mean, it's a hard problem. I remember Google put out a statement a little while ago saying, hey, we're going to revoke keys that leak out. Well, guess what? Kubernetes has a Google Cloud key hard coded into its source code that's used for pulling public container images, and you want to be able to pull public container images. And so I think they walked that back. I don't think Google is actually hard revoking every Google Cloud key that leaks out. It's easy to kind of take that superficial stance, but then it's hard to actually walk the walk. I remember there was a Stripe key that we got taken down. At the time, Stripe tried to reach out to the company, and their method for doing that was they just reached out to the contact on the account, which was literally the CEO. And of course the CEO doesn't see the email and doesn't respond to it. So Stripe says, all right, screw it, this is a really sensitive key, we're just going to take it down. Well, the key was processing something like tens of millions of dollars a day. And so there was real material financial impact to getting that key revoked. It caused an outage, it caused a scramble. But on the other side of it, the consequence for Stripe of not doing it is, well, a key that has the ability to look at the financial data for tens of millions of dollars' worth of transactions a day, and that has its own cost associated with it. So every vendor has to kind of draw that line and make that call for themselves. Some have decided to hard revoke, but that's really the minority. Slack, OpenAI. It's a small number of providers that do that.
A
Yeah, no, definitely. It's a good thing to call out, I think, as a set of equities, that downstream consequence piece that you were talking about as well. The whole idea of a developer that receives notification of key leakage just removing the repo instead of actually trying to go through and do downstream user protection through key rotation, all that kind of stuff. It's been interesting watching variations of that same problem pop up in vulnerability disclosure land lately. There was a researcher that found a bug. It was an exposed API or a leaked API key, I think, in an app called Reframe. They couldn't get in touch with the company, so they decided to go effectively full disclosure on sharing the bug, to actually get the company's attention so it could be fixed. But in the process they leaked the key. So this was an inexperienced researcher that didn't know or wasn't necessarily thinking through the user blast radius and just wanted to try to make sure the thing got fixed. They weren't thinking that part through. They just thought, okay, job done, they've gone and taken care of it now, I'm finished. But in the meantime there was a three or four day exposure window where every user of that application was fundamentally exposed to anyone who wanted to do whatever they wanted with that API key.
B
Yeah, I mean, that's a really good point. On your point around the developer taking the repo down after you notify them about the key, we found that's actually not the most effective way to do disclosures. If a developer leaks a key out and you first reach out to the developer, a lot of times they will just reflexively try to delete the repository. Now you have to tell the story to a security engineer, and the story involves a 404 link: hey, this thing was exposed at this GitHub URL. Trust me, this happened in the past. I know I don't have any evidence of it anymore because the developer just reflexively deleted it, and the key is still alive. I need you to trust me on that too. Here's a copy of the key. And then they're like, how the heck did you get this? Do I believe that what they're saying happened actually happened? It can cause a little bit of chaos. And so we found it's more effective to go directly to the security team. Attribution mapping is also tough, trying to figure out which keys line up with which security team. It's a hard problem. And, you know, another thing is the developer who leaks the key is often different from the one who manufactured the key, right? So imagine a scenario where a developer gets a key from their co-worker five years ago, and they've been sharing it securely for five years. Now all of a sudden this developer leaks that developer's key out. They're not going to remember who made it five years ago. And if that person's the only person in the universe who can log in and get it revoked, that's a very difficult thing for that developer: to have the clairvoyance to say, that's Sally, who doesn't work at the company anymore, she made it out of whatever account, and Sally has to be the one to log in, let me reach out to her. It's a hard problem.
We do build technology for that in particular. We've meticulously researched API endpoints for the keys that we find that tell you who manufactured them originally and just reveal metadata about the key without leaking sensitive data. But we had to go key by key, API by API, to find the right endpoints to hit that aren't too sensitive, that aren't going to take anything down, to be able to surface the right metadata to do the disclosures. It's a very hard problem.
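To make the "metadata without sensitive data" idea concrete, here is a small sketch. The conversation does not name which endpoints Truffle uses, so the choice of example is an assumption: AWS STS `GetCallerIdentity` is a documented call that returns `UserId`, `Account`, and `Arn` for whichever credentials made the request. The code below does not call AWS. It only parses a response-shaped dictionary, and `summarize_identity` and the sample values are hypothetical.

```python
def summarize_identity(identity):
    """Pull owner-attribution hints out of a GetCallerIdentity-shaped dict.

    Expects the documented keys "Account" and "Arn". No network calls are
    made here; fetching the response is out of scope for this sketch.
    """
    arn = identity["Arn"]
    # ARN layout: arn:partition:service:region:account-id:resource
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"not an ARN: {arn!r}")
    resource_type, _, name = parts[5].partition("/")
    return {
        "account": identity["Account"],
        "principal_type": resource_type,  # e.g. "user" for an IAM user
        "principal_name": name,           # who the key was issued to
    }


if __name__ == "__main__":
    # Hypothetical sample response, not real data.
    sample = {
        "UserId": "AIDAEXAMPLEID",
        "Account": "123456789012",
        "Arn": "arn:aws:iam::123456789012:user/sally",
    }
    print(summarize_identity(sample))
```

The account ID and principal name are the sort of detail that could point a disclosure at the right security team, or at "Sally" from the example above, without reading any data the key protects.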
A
Yeah, it's an interesting one. And again, with the versions of this that I've worked with personally on the vulnerability disclosure side, it's almost the same set of problems that you're dealing with downstream, but in this case you're talking specifically about secrets and keys. All of the downstream math and attribution archeology that you've got to do, the finger pointing and storytelling and all that other stuff. It's almost like a machine-speed version of that that you guys are bumping into. So, I mean, for starters, thank you. I guess listening to the Internet is hard, and trying to tell it to fix its stuff is even more difficult. And you guys are taking that on, which is pretty cool.
B
Yeah, we're doing the best we can with keys. But the thing is, with every single breach, when you crack the details open on it, I mean, sometimes there is an exploitation step, but there's always a key step. You know, there's always like, yeah, this is how we got our foothold, but then the next step was we just grabbed the password that was lying there and we logged in with it. And so, you know, some people say we focus on a pretty narrow problem. I think that's true, but it's also not true. It is true that secrets is a subset of AppSec, but I also think it's a part of just about every breach. And so that's why it's so important that we get these keys in the places that they belong.
A
It's maybe a superset in a lot of ways as well.
B
Yeah, yeah, exactly.
A
Totally. Totally. Well, look, Dylan, it's been great to catch up. What should we be keeping our eye out for from Truffle over the next little bit?
B
I think attribution mapping is going to continue to be a prominent theme. If we find a key somewhere, we're going to figure out whose it is, and then that is going to lead to more and more disclosures that we do. The other thing is, you touched briefly on the supply chain attacks that have been happening. Those supply chain attacks overwhelmingly feature harvesting credentials and then using those credentials to worm into multiple providers. And I think we are just at the early stages of what those could look like. A lot of those worms, I think, are actually relatively primitive compared to what they could be. What I mean by that is they will extract two or three different credential types, like maybe NPM and GitHub, and use them to worm within those systems. But I could imagine a much more powerful worm that looked for S3 and then downloaded all the S3 data and then looked for more credentials in there, that, you know, used Hugging...
A
Like recursive searching, in a sense.
B
Exactly. It used many, many different providers. And so I'm expecting, based on the data that we're seeing, that those attacks are not going to slow down. They're going to speed up, they're going to become more advanced, they're going to involve more providers, and it's going to become even more important to get the keys cleaned up. And so that's kind of the forefront of what we're focused on.
A
So pretty much, give Truffle Security a call and get your keys cleaned up before the bad guys come along and do it for real, I guess, is what I'm hearing.
B
That's right. I mean, we help out with internal monitoring and external monitoring, making sure the keys inside your environment and outside your environment are picked up, inventoried, classified, analyzed and ultimately revoked.
A
Well, look, yeah, this has been super interesting. It's a fascinating dataset that you guys are sitting on, getting to watch the patterns of behavior that exist across the Internet. Obviously the CISA version is kind of the noisy one that we've gotten to hear about in public recently. But it's kind of any day that ends with "y" within Truffle Security, as far as I know. So, again, thank you for that. Thanks for sharing your experience on it. And, yeah, we'll stay in touch. In terms of getting secrets sorted out, I think everyone in the viewership who's thinking about this for the first time, and people that are looking to up their game, that are thinking about the ability they have to actually mitigate supply chain attacks, check out Truffle Security, because they're doing a lot of stuff that might be able to help you out with that. So cheers, Dylan. I appreciate it. Thanks, man.
B
Thanks, Casey. It's great to be on.
Risky Bulletin Podcast – Detailed Episode Summary
Episode Title: Sponsored: Inside CISA's Disastrous Secrets Leak
Date: May 31, 2026
Host: Casey Ellis (Risky Business Media)
Guest: Dylan Airey (Founder & CEO, Truffle Security)
This episode dives deep into a recent high-profile incident involving the exposure of sensitive keys from a CISA contractor’s public GitHub repository. Guest Dylan Airey, an expert in secret management and founder of Truffle Security, discusses the technical, organizational, and systemic challenges of secret exposure, detection, notification, and remediation at Internet scale. The conversation highlights why secrets leaks remain so difficult to fully resolve, the varying responses from major vendors, and what it actually looks like to manage these incidents in practice.
Truffle Security’s Mission:
Timeline & Discovery:
Remediation:
Awareness & Definition of Success:
Vendor Response Differences:
Secrets Beyond GitHub:
Persistent Propagation:
Scale of Monitoring:
Vendor “Equities”:
Memorable Example:
Unintended Exposure by Researchers:
Summary Note:
This episode is essential listening for security practitioners, developers, or anyone navigating the challenges of secrets management in a world of persistent, large-scale leaks and increasingly complex supply chain attacks. Dylan Airey brings practical insight and war stories from the front lines, making it clear: there's still a lot of work ahead for the industry to properly secure secrets and contain their inevitable exposures.