
A
Usually when you think about whistleblowing, most people think of an Edward Snowden type category, but that's not necessarily what we want to talk about. What we really care about is insiders being able to spot issues, evaluate them, and if there's concern, have them addressed. And that's what we care about. Whistleblowing is more important because if companies hold back certain knowledge or they maybe prevent eval organizations from saying certain things, then again, who do you go to? Who's the fallback? That's going to be the insider or the whistleblowing side. If you feel like, okay, we're actually not hitting certain alignment things right now and we're using misaligned models to try and align the models of the future, it's probably good to speak up now. If you violate the California state whistleblower protection provisions, you have to pay a fine of $10,000 as a company, which, you know, is probably like a 5 second burn for most companies these days. Not a terrible deterrent, right?
B
Carl, welcome to the Future of Life Institute podcast.
A
Thank you very much for having me, guys.
B
All right, let's start with why did you decide to begin the AI whistleblower initiative?
A
Big question at the start. So I've been involved with the AI safety community, or however you want to call it these days, since 2016. I was a volunteer researcher at the Future of Humanity Institute, then did an AI safety research camp. I had very different timelines back in the day and worked in consulting setups for a while. And yeah, I was kind of surprised by the timelines, as I think many were. And then as ChatGPT rolled around, I felt, okay, now it's really time to move back into the space and do something tractable that seems to be robustly good across a wide variety of scenarios and futures. I relatively quickly settled on the topic of transparency as just a thing that, if we have more of it, would be very good. Looked into compute traceability for a while. Time did some great research on that. Now I think Lucid Computing is looking more into that as well. But the whistleblowing topic came up quite quickly. So this was mid '23, I believe, when we started looking more seriously into the topic. People had been thinking about it since the early days, I think for the same reason that we now think whistleblowing is extremely important in the AI space. People had already been thinking about it in 2017, 2018, but we noticed nobody was uniquely focused on this exact topic. In the beginning, we thought we were way too early, as the world still seemed a bit more rosy back in the day, with, I think, the Superalignment team being set up on the OpenAI side, for example. But then the first cracks started to show around the OpenAI board drama, where it seemed, okay, maybe there is dissent here internally and this is not being resolved super well. And then of course with Daniel's big disclosure around the Right to Warn in mid '24, and also earlier in '24 with Leopold Aschenbrenner's departure, it became clear that this is an extremely important topic to work on. And that's how we got to it.
B
Yeah. So you can do transparency in many ways. Why whistleblowing? What are the unique advantages of whistleblowing specifically?
A
Yeah, so I think one way to think about it is probably as a backstop mechanism. There's the Swiss cheese model of control, where you have a bunch of different mechanisms to make sure the things that we want to go well do go well. Transparency is certainly somewhere along those lines, whether that's self-imposed transparency obligations or whether that's on the regulator side. There's of course a bunch of other things you can do as well to make sure technologies are developed in a responsible manner. And whistleblowing is often used as the final layer. If you picture a bunch of slices of Swiss cheese with holes in them where things can get through, it often sits towards the end: if a bunch of other control mechanisms fail, then you have to rely on that. That's probably one angle to think about it. Another thought here is how much you actually believe, for example, that self-imposed transparency commitments actually hold up. And then, from a systemic level, another reason why we should really care about whistleblowing in AI in particular is just: how feasible are the other mechanisms, really? If it's not necessarily clear what sort of risks will arise over the coming months, years, decades, depending on your timelines, how confident can we be that we're going to be able to catch all of those things? And if, let's say, a majority of the most highly skilled people work in those private companies themselves, what sort of regulatory capacity would you need to actually be able, for example, to check whether things are going in the direction we want them to go? And so you have this massive information asymmetry, and risks kind of pop up in areas where you don't expect them, whether that's, for example, on the child safety side with the horrible ChatGPT suicide story recently, or whether it's around situational awareness of models, or to what extent models in practice carry over misalignment into future models that they help work on. Who can spot those things best? That's most likely going to be insiders. So it really matters that companies internally are really strong at spotting issues and rectifying them, but for that to happen to a strong extent, you probably also need strong whistleblower protections, making sure individuals who want to speak up in the public interest are empowered.
B
And what's the state of whistleblower protections today in the AI industry? Where are we?
A
Another big question. So probably a good way to think about this is in terms of framing it into different channels. Usually when you think about whistleblowing, most people think of an Edward Snowden type category, but that's not necessarily what we want to talk about. What we really care about is insiders being able to spot issues, evaluate them, and if there's concern, have them addressed. And that's what we care about. So there is the internal side: to what extent can people speak up internally and see those issues addressed? There's then the external side, to regulators: how much regulatory capacity is there? What protections are there for people speaking up? And then there's the public side: speaking up to the public about risk concerns and misbehavior. On the internal side, we actually launched a campaign somewhat earlier this year at the National Whistleblower Day event in Washington, D.C. We called upon frontier AI companies to publish their internal whistleblowing policies, to make it clear what protections are actually there. At the moment we only have OpenAI, who has published their policy, following them trying to suppress speaking up via their extensive non-disclosure, actually non-disparagement, agreements, which Daniel uncovered. After that they published their policy. Maybe we can go a bit more into detail on that later. We'll actually publish a pretty in-depth evaluation of their policy in a while. It doesn't look fantastic. On the one hand, I think it's very commendable that OpenAI has published this, and maybe there's a chance they just haven't had much time to go too deep into that process yet, or it's not a high priority. I think companies have a strong interest in improving these policies for themselves, because there are plenty of empirically proven benefits of having really strong internal whistleblowing channels, for a variety of reasons.
But for example, on the OpenAI side, a little bit that I can maybe already share is that the dedicated team for evaluating internal whistleblowing claims, as it sounds at the moment (it's a bit confusing), is the legal team, which also, as it seems, is not directly governed by the board, which would be best practice for independence reasons. And the legal team is generally considered worse practice. They also provide a bunch of other channels you can go to, but it's not really clear which case will go to whom. And the legal team is bad because there are plenty of examples in the past where a whistleblower goes internally to the legal team, whose job is to protect the company, and the legal team immediately starts an attorney-client privileged case against the whistleblower. So basically the attorney-client privilege is between the company and the attorney of the company, treating that whistleblower as a litigation risk. And then if, for example, a whistleblower claims retaliation down the line, in discovery you cannot actually see what the internal conversation within the company about that whistleblowing claim was, because it's attorney-client privileged. So that doesn't look great. It might again just be the case that they thought, okay, let's quickly throw something together here, and this is the most obvious thing they came up with. We'd be super happy to work with them as well and make sure they do this better. But that doesn't look great. For the other companies, we simply don't really know the state. We've run a survey with insiders in the past trying to understand it. Of course we talk to insiders quite a bit here and there to understand how they feel. And of course you can look at previous precedent and how well concerns are handled.
I can do maybe one more example, on the Google side. When it comes to internal whistleblowing, we've seen Trillium Asset Management, which is like an activist investor, specifically call upon Google to improve its internal whistleblowing, under the claim that, basically, whistleblowers protect shareholders. They don't necessarily protect executives, depending on how well you handle rectification of concerns and misconduct. There are also cases where Google has retaliated against whistleblowers. There's a somewhat contested case of Satrajit Chatterjee, who raised concerns internally around research practices, who was then let go, and the wrongful termination case was settled. Make of that what you will. And for the other organizations, I could share stuff that I hear from people. I think there are definitely degrees to which companies openly have conversations internally, and that is the biggest impact factor: how comfortable do people feel with saying, I disagree with this? And avid listeners of your podcast can probably imagine which companies at the frontier have really strong internal dissent cultures and actually celebrate openly disagreeing with, for example, leadership, and having leadership address concerns (at least that's the way it's perceived by many insiders), and which companies maybe don't. So in terms of the current state, it's not looking too great on the internal side.
B
Yeah. So you mentioned that whistleblowing can be in the interest of the company's mission at large, even though it can be contrary to the interests of specific executives. How are the AI companies reacting to these calls for whistleblower policies? Are they perceiving it as something that's in their interest or are they perceiving it as something that's aggressive? Because it seems like becoming a whistleblower is an adversarial action against your own company.
A
It is to an extent, especially if you have an internal process and leadership, for example, says no, this is fine, and then you say, actually no, I think this is not fine and I'm potentially even going to go to a regulator with it. Yeah, that's surely an adversarial process to an extent, if that's the mindset of the company. The reception is again probably mixed across companies. We definitely know that there was pushback on SB53, on the whistleblower protection side. Companies were not happy to have that be as broad as possible. Probably the usual suspects the listeners can imagine were opposed to it, and maybe the usual other suspects were okay with it. When it comes to the actual reaction to our call, we haven't seen a terrible amount yet. We've heard from some companies. It's even been posted on internal Slack channels. It's been discussed openly. Even from leadership, at least in one instance, there was positive reception, saying, oh yeah, this seems like a good thing to do. We haven't really seen more publications of whistleblowing policies since then. Maybe that's a function of just priorities, and maybe a function of: to our employees we say this sounds great, but behind closed doors, let's maybe rather not. Not sure if that answers the question.
B
No, it does, it does. How far would you say we are from the optimal state? So where the optimal state would be, say, your preferred policies both internally and externally surrounding AI whistleblowers?
A
So I think the most important thing is to have strong legal protections, which should be harmonized, very clear, and covering a really broad range of risks. Ideally, this applies both to regular business and to the national security side. There's an element here of also moving culture towards viewing whistleblowing as people acting in our interests. For one, obviously the clearest implication of having really strong whistleblower protections is that you can go to a regulator without having to worry too much about, at least, losing your job, and that, in case that happens, you have really strong protections against retaliation. We can go a bit more into depth in a second on what exactly that means, but on a high level, that's it. Then there's also having strong incentives beyond protections. We can talk about bounties a bit more later as well. This has been extremely successful in the SEC whistleblowing program. Then you also want to have really strong enforcement. So if companies actually violate whistleblower protections, for example, if they try to prevent people internally from speaking up freely or going to a regulator, you want to have high fines for that. This is especially an issue in California, for example: if you violate the California state whistleblower protection provisions, you have to pay a fine of $10,000 as a company, which, you know, is probably like a 5 second burn for most companies these days. Not a terrible deterrent, right? I think SB53, for example, has the $1 million fine for violations of the main body of SB53. But the whistleblower protections are part of the labor code again, so they don't really fall under that. That would be something we'd really want to see, I think. And then there is, I think, something really important.
So you have: what does the actual program look like, how strong are your protections, what channels can you go through, and what can you report on, which is all extremely important. But then there's also the case handling itself. We ran a survey with insiders, and I think the most unanimous answer came on the question: how strong is your trust in government to understand and act well on concerns that are brought to it? And that was extremely low. I of course don't want to, you know, I'm not quite sure what the English term is, speak this into existence either.
B
Right.
A
There is plenty of evidence of regulators, also in new fields, being able to handle cases, but I think there's some legit concern here. Say we are seeing a threshold being potentially crossed in an internal evaluation and there's just disagreement internally. Maybe there isn't really a clear regulation yet around what is acceptable and what is not, which is the big issue in the space. Can you go to a regulator with this and say, this seems really bad to me, and will they actually help you understand whether that is actually bad or not? And ideally you also have that knowledge gathered somewhere. There is a case we've made for having plenty of different channels that insiders can go to, just so you can pick and choose and feel like this is going more in the direction you want, or not. But ideally, of course, you also want to collect cases in one place, where a picture then starts to form. Are we maybe seeing emergent risks across a bunch of different companies, where the individual risk wouldn't raise too many concerns? But on the regulator side, if you see, okay, wow, there's four, five, six people, maybe from different companies, raising this sort of concern, we should really be concerned. That's on the regulator side. I think that's what we want to see. Maybe a bit more context on the national security side, because the moment you work with classified information, on DoD contracts, for example, things all get a lot messier. Here we want to see something similar, especially with strong congressional oversight on national security relevant cases. And that is not looking good at all at the moment. But maybe stepping back a little bit: let's say you have all these strong legal protections. That will also lead to internal channels becoming a lot stronger. Ideally that's part of, for example, a regulation putting this into place.
The proposed federal AI Whistleblower Protection Act does have provisions for internal channels. So did SB53, implicitly. It's already in the California state whistleblower protection law, because you're protected when speaking up internally, but it doesn't actually mandate what a good internal process looks like. So, for example, you get problems like the legal counsel being the recipient.
B
You think this is the direction it will go? So first you get the strong legal protections and then those protections change internal culture. Because the story we've sometimes heard from the companies is: we'll begin with internal mechanisms that will then serve as a test bed, then we'll see what actually works, and then we'll have actual legal requirements. But you think the other way is more plausible, or better perhaps?
A
Yes. So I think one is better than the other, but they're not exclusive. Of course, even if you're not legally mandated, you can have strong internal whistleblower protections already and you can have a really strong process. And it's in the interest of the companies to have such very strong processes, right? For example, and this is speaking from the interest of a company now, to not have a lot of leaks happen outside the organization where the organization would have been like, if you had told us this more clearly internally, we would have actually fixed it. But if there's no trust, if you think that speaking up internally will immediately get you flagged, then that's not going to happen. So there is an extremely strong business case to be made for having strong internal speak-up cultures. And of course we want to see more of that. That's also why we're pushing for stronger internal policies even in the absence of stronger legal protections. But to tie it together a little bit: take internal whistleblowing policies, especially in the U.S., given that at-will employment is prevalent. Let's take the OpenAI policy as an example. They say, yes, you can report on all of these various forms of risks and we promise not to retaliate. That maybe reads really well to the average employee: okay, maybe this is even like a contract that we're entering here. They told me they're not going to do this. However, anything reported in an internal channel that goes beyond what is already protected by the law, so what's not a violation of law, is purely a voluntary commitment by the company. And given that essentially all companies in California use disclaimers in their employee handbooks and in their contracts, basically saying nothing outside of this employment contract is in any way an agreement between us, it basically means it doesn't hold.
And insiders keep potentially running into this sort of trap, thinking, oh cool, I'm protected now, when in fact they're not. So what can you do to counter this? For one, at the minimum, I think companies should be transparent in their whistleblowing policy, saying: FYI, be aware this is not actually a binding contract. The next level is they could turn it into a binding contract. That said, it's not market standard practice. But this would be an extremely strong signal that a company could send to say, we actually care about speaking up. And then obviously the strongest way to really improve internal protections is to have strong external protections. I think we've seen plenty of empirical evidence here. In Europe, we've seen it since the introduction of the whistleblowing directive. But we've also seen it in the U.S., for example: the moment you have pressure and you have a really good channel that goes external, the companies will move into a mode of, okay, now we have to really step up our game internally. And there's strong empirical evidence from asking companies to what extent this introduction has improved things for them. It dramatically has, both in terms of detecting more misconduct and in terms of preventing more misconduct in the future. This was a survey with more than a thousand companies or something in Europe answering this, where a lot of opposition was there before. People were like, oh no, this is going to, you know, destroy our sharing culture internally. That is not actually the case. One way you can go is of course to fight it, to try to isolate people and isolate knowledge. Or you embrace it and say, okay, this is going to become really important for us, and then you actually handle it well internally.
And if you don't handle it well, yes, there's going to be consequences. But basically, if you go with the incentives, it makes things better for everybody. We've seen the same in the U.S., I think, with the SEC. There's been a pretty good paper out on the deterrence effect of the SEC whistleblowing program. Maybe in a nutshell: it's probably one of the strongest whistleblowing programs in the world. It allows people to report violations of SEC regulation, but also, for example, cases where companies try to prevent people from speaking to the SEC. It has extremely strong anonymity and confidentiality provisions, which is critically important, with a strong track record. They're also very strong at investigating well, so not alerting companies too much that there's been a whistleblower report, which is very important. And then they also give out bounties, so percentages of the SEC's recovery if there's actually a case. And you qualify for protections under the SEC program relatively quickly. You just need to have some reasonable, good faith belief, so just a feeling that there could be something here. You don't need to prove anything, which is sort of the gold standard. And there was a great paper out that basically showed that there was a significant decrease in misconduct and violations of SEC law through this, so a strong deterrence effect, primarily via improved internal functions, but also via the SEC actually becoming active and issuing fines. I think they handed out more than $6.3 billion in fines since, I think, 2010 or something. So an extremely successful program. There's the intervention effect and, importantly, also the deterrence effect. And I think going by "all companies will do this voluntarily" is not the right path.
B
No. So if you're a whistleblower, or a potential one, who is looking at some eval metric, say, and you have a disagreement with leadership about whether this is important, whether this is something the public needs to know about, where do you begin? Do you begin by contacting outside counsel? Do you find a lawyer who can talk to you about this under confidential terms? Where do you start?
A
Yeah, good question. So it's a very tricky situation. It always depends highly on the specific situation of the individual. A big indicator is probably: are you alone in this? Are there other people who share the same concern? And then how comfortable do you feel about sharing your concern internally? It's probably a bit of a function of that. If you're the only person that's concerned about this, it's probably not going to make it more difficult for the company to figure out that you are the one raising the concern. If it's a group of people, that spreads the knowledge, and the potential for who took action here, across a wider variety of people. But this very much depends on the individual situation and the trust. So it's definitely not a blanket recommendation to say, talk to all of your colleagues about this, or address this internally first. It depends highly on the situation. How much do you feel you can trust leadership? Are you already in a situation where it's clear that there's definitely something bad going on, where there's maybe an internal eval result that says A, but your company needs to relaunch this and therefore they hide it? That seems pretty bad. Generally what we always recommend is: much, much earlier than you would possibly think about it, get legal counsel. Speaking directly to other people in the community can be very risky. There have been cases where something then comes out, maybe as part of a discovery, so it can be quite risky. So find legal counsel. People are often put off by this for various reasons. One is of course on the cost side, one is on the knowledge side: will those people actually understand what my concern is? One thing I can maybe share here, one offering that we have that would help in this situation, is something called Third Opinion.
So this is something where insiders can approach us directly, or their attorneys can approach us. The concept here is basically that instead of coming out with confidential information directly, an insider who is maybe still trying to clarify whether they even have a concern can reach out anonymously to us. We have a Tor-hosted contact form. On the FAQ page of the Third Opinion subpage on our website, aiwi.org, there are also details on what the technology architecture looks like and how we try to make it as safe as possible. They can reach out anonymously with a question around their concern. So without actually divulging any confidential information, they describe the question where the answer would help them understand whether their concern is actually legitimate or not. What we then do is collaborate with the insider via this tool. Basically the insider gets a code to log back in, because we don't collect any data or emails or anything like that. And then we try to identify who would be the right independent experts, for example from academia, to help provide an opinion on this question. Then we provide this anonymity shield. We go out with the questions to the experts, the experts confirm terms and conditions around confidentiality, and we bring back the answers to the insider. If they're not concerned after this, great, then everything can go on. If they are still concerned, then we connect them to whistleblower support organizations and pro bono legal counsel. Another extremely important message I'd want to place here with listeners is that there are great support organizations out there. For example, The Signals Network is one of them, or Psst.org, who I'd recommend every listener of this podcast take a look at. They've got many, many years of experience supporting whistleblowers, also in tech, like Frances Haugen, for example.
A very impactful case in the late 2010s has been supported by them, both on the legal side with pro bono legal advice, but also, for example, with media training in case it does actually move towards a public disclosure, strategizing around the smartest ways to disclose, getting psychological support, maybe even safe housing if it's in a really, really sensitive domain. And we can basically either go through the Third Opinion flow, put together the expert evaluation, and then also supplement the lawyers down the line with these experts, under attorney-client privilege with that attorney, to make sure they actually understand the case and have the context that they need. That's as far as that's feasible; there are some limitations on the extent to which you can involve outside parties and experts. So that's one angle. And the other angle is that one can also just reach out directly to us anonymously. If you want consultations on who the best support organization might be, you can also check out the contact hub on our website for profiles of different organizations: why do they care about AI whistleblowing, and what is their experience, roughly, in the space? That helps you find the right support organization for you. Because I think most insiders realize way too late that they may already be in a whistleblowing situation. Technically, for example under California law, the moment you raise a concern internally, you are potentially already protected, and most people think, oh, you know, I'm just raising questions internally, or maybe I'm sending an email to a superior. Potentially you're already in that space the moment you do so. So if you start exposing yourself to risk, you're already in the space, and it may be wise to think about things carefully.
B
Yeah. How do you evaluate whether whistleblowing is the right choice? Of course that's a broad question, but I could easily see people being both overconfident and underconfident that whistleblowing is the right choice. You might think this is a high stakes situation. Maybe you think your boss is smarter than you or knows more than you and so maybe you think you're wrong about this even though you're right. You can also imagine that you see something and perhaps you don't have the broader context of why this actually isn't an issue and you want to avoid both failure modes. Do you have resources? Do you have this kind of wisdom of how to think about the situation?
A
It's a very good question. It's a very tricky one, going case by case. I mean, by default the most trivial framework is: what's the impact of disclosure versus the personal risk you're exposed to, right? So for example, on the personal risk side, to what extent is it feasible for you to stay anonymous is the big one. Anonymity is just the greatest protection. And to what extent would that affect, for example, the impact of the story? A good example here would probably be the recent Meta releases around their GenAI policies and their ethical framework, if you can call it that, around explicit conversations with minors. I don't have the details here; I can only assume that this was a document that was widely circulated and then anonymously shared with news outlets. Impactful story, hopefully no dramatic consequences for the individual that provided it. So generally the framework is that if you can stay anonymous, that is the best way to go. Whether that's feasible in the individual situation is of course extremely context dependent. You know, are you maybe even part of the leadership team? Maybe you're part of the top 10 people within your company, and only those top 10 people have access to the knowledge, and you're the only one out. That could be very difficult.
B
Right.
A
There is a fair consideration around: if there's going to be retaliation down the line, to what extent do I want to retain my position to maybe make a positive impact on the margin over the coming months or years, or is this the one where I take that risk? It's a fair consideration. I think it's extremely difficult, unfortunately, to give a blanket answer. I'd love to give you one. By the way, because you mentioned the wisdom piece, this is actually one that we have on our roadmap for Q1, to hopefully publish something there and find the right level, neither too micro nor too macro, to provide something.
B
Do you think it's realistic to stay employed at a company after you've become a whistleblower? Do you think you could be impactful? Do you think you could do good work and push the company in a good direction? Or do you think that perhaps you'd still be formally employed, but you'd be, you know, put on some team that does nothing? It would be formal employment without any actual impact.
A
Is it possible? Yes.
B
Yeah.
A
How likely is it? Challenging, right? It depends again massively on the case. So if you are, for example, uncovering financial fraud and the individuals responsible for it are let go, hopefully, or there's at least some disciplinary action, yes, that seems feasible. If you're, for example, in a direct conflict with a senior executive and that executive stays on, it's probably going to be quite difficult. This is under the assumption that there is retaliation and you don't manage to stay anonymous throughout the path. By default, unfortunately, I think the recommendation is for people to seriously consider that they're not going to be able to stay anonymous. Yeah, I think this is unfortunately sort of the reality you have to consider. I think, for example, for the SEC program there is an extremely strong track record of maintaining anonymity and confidentiality, at least from the SEC side. But there are also plenty of examples where, even though the SEC didn't link the name in any way, shape or form to the company, the company still kind of figured it out. Maybe an alternative way to phrase this question is: can you have a career after your whistleblower activity? And there the answer again is context dependent but points more to yes. So for example, previous Google whistleblowers, at least some of them: yes, they then left Google, but this was a wrongful termination case anyway for internally speaking up, and they're still in great employment now. On the other hand you have, like, Timnit Gebru, for example; it's quite common for people to move more into the research space afterwards, or maybe onto the advocacy side. But for example, with the recent case, the Right to Warn one with the OpenAI employees, we've seen people go to Anthropic, I think multiple of them by now, or for example Daniel, obviously, starting the AI Futures Project, which as far as I'm aware he's also quite happy with.
So I don't think we consider this a step down, although I do not know, it's just my impression. What's important here, I think, for the community, and what we also want to see more of, is making sure that the support ecosystem is there to catch people afterwards, and culturally to say, hey, we want to support these people, celebrating whistleblowers who speak up in the public interest, to make sure they do fine afterwards.
B
Yeah, because in a general sense we probably want more transparency and more insight into what's happening at these companies, and therefore, I think, on the margin we want to encourage more people to at least consider whistleblowing. How do you gather the courage to actually do that if you're in a cushy job, if you're potentially going to become a person that's known to the public, if you're going to lose your job? Maybe your network is in the company, maybe you feel like you're betraying people. On the emotional side of this question, even if you're convinced it's the right thing to do, how do you find the courage?
A
Again, very tough question. So for one, I'd probably try to reframe it a little bit, as in: we don't want to rely on courage. We want to build the right systems so that barriers drop as much as possible. We talked before about the legal side of things: the moment something becomes codified into law, it does also immediately become more of a standard process and more of a standard thing. Actually, I can give another example here, of somebody who worked in finance on Wall Street before and then moved into a big tech company, and was shocked. Because, for example, SEC programs are so strong, compliance is just such an integral part of the culture there that using something like an SEC whistleblowing channel would not be seen as completely abhorrent. While in big tech that's maybe not really the culture; there is just not that strong a compliance culture of making sure that, yes, we do actually care about adhering to the law to the same extent. And that may not be the case. So for one, I think again, the legal protection side is going to do a lot, and also transforming that culture, making people see: okay, this is what is expected of me. This is in fact societally approved. People want me to do this. So I think there's a strong angle here. Support ecosystem strengthening, I think, is another big one. Talking about it with colleagues is another thing. Maybe something I didn't mention before, as a recommendation for insiders: start talking about it now already, and start taking notes. As an insider, start thinking about: do I see people raising concerns? How is that going for them? Talk to other people, well before you come into the concrete situation, about how their experiences have been. How do you feel at the moment? When you raise dissent, is it being taken seriously or not?
If you're a manager of a team, make sure that happens, as in talk to them about whether they feel comfortable raising concerns. It's also just going to make your team happier if you do that, as a side note. So I think building the right systems is the most important thing. As a practical matter, I think the people that do become whistleblowers are just primarily driven by feeling that this is really important, and either the public needs to know or this issue has to be rectified in some way or another. And, thinking about the consequences, they may not be as bad as potentially calculated. This is not advice, right? I would always say, yes, be prepared for the worst. But there are support ecosystems out there. I can actually share something else here. There's going to be an AI whistleblower defense fund launched in the coming weeks, which we are going to promote as well. There are already smaller funds by The Signals Network and Psst. But this one is going to be offered by Legal Advocates for Safe Science and Technology, LASST they're called, run by Tyler Whitmer, an incredible organization. And they're going to be focused specifically on funding defense and strategic litigation for AI whistleblowers. So again, I think this is another example of bringing down the barriers a bit, to make sure people don't have to rely only on courage to do it.
B
If we see whistleblower policies from all of the frontier companies, how do we know whether these policies are sincere or whether they're published for PR reasons? When we evaluate them, how do we know whether they are fake, so to speak, or whether they will actually protect whistleblowers in the end?
A
Yeah. So probably a few different elements. For one, on the internal whistleblowing system in general, and this is going beyond the policy: what you really want to be looking for is to what extent the company manages the internal whistleblowing process as an actual business process. So, you know, if you really care about any sort of business process, like marketing, you measure, improve, and repeat all the time. That's what you expect to see when you care about a process. If we see companies not doing that, then that probably tells us to an extent how much they care. At the moment, I don't think we have evidence of any of the frontier companies doing this. This is maybe also our call to publish your policies. Level one is just the policy, but that only tells you so much. Level two is really much more: how much do they manage this process, ideally also publicly? So do they share, for example, how many requests they actually got? How did that number develop? What percentage is anonymous, as maybe an indicator also for trust? How many retaliation claims were there internally? How were those resolved? What actions have they taken? Things like that really matter. So that's sort of number one: do we see credible evidence of companies taking this seriously by measuring and by being transparent around this measurement? The policy itself sort of has two, three elements here. For one, it's the degree of effort that has been put in. So if something looks like it's kind of slapped together, that probably tells you again how seriously they took this. If it looks like, okay, they've actually thought about this, they've run stakeholder sessions to shape this, there's transparency around how they developed this, that tells you they take it seriously. Of course, you can still kind of fake that, at least on the policy side, to an outsider.
On the internal side, you'll probably notice: does this seem like a PR thing, or do they actually care? Does leadership actually promote this on a regular basis? Do they celebrate people internally who speak up? Do they maybe take a minute or two in their town halls to say: we want to quickly highlight person ABC from department XYZ, who, of course with consent, raised this thing through the internal whistleblowing hotline. We found there actually was a problem, and this led to us changing our ways. This is a great thing. Or maybe even saying: this person came up and we didn't agree. We evaluated it and we thought, actually no, this was wrong, and therefore we didn't take those actions. And maybe: here you go, here's the floor to the person that reported, and you can make your case again if you want. You know, that would be really great to see, and that's something you can only see as an insider. When it comes to the policies themselves and the structure, I think the most important thing is probably the governance setup, outside of the measuring and monitoring. So what you really care about is having this function be independent, and you want to make sure that there's actually a person there that you can trust to act in your interest as a whistleblower, and that they are actually competent and trustworthy. So this is probably the most important thing. One way you can do that is by a good governance setup. So does the whistleblowing function report to the chief executive? Maybe not ideal, right? You'd probably rather want to look at the audit committee as a classic; the board audit committee is pretty classic. Maybe if you have a more exotic governance setup like Anthropic, maybe the Long-Term Benefit Trust would be a really good host for something like that. On the OpenAI side, probably under the nonprofit, if it's still going to exist in the future. Let's see.
B
Right.
A
Those are things you probably want to be looking for. Then you want to look at who the recipients are. Again, coming from the independence side, right? Like, an internal legal team is probably the worst you could possibly go for for something like that. You want to look at both as a signal for: do they care, and is this a trap?
B
Right. The ultimate level of independence would be actual law, such that it is fully independent from the company. And you could imagine, say, a government service where you can.
A
Oh yeah. I mean that would be on the external regulator side. Yeah, yeah, absolutely. I mean, if you want to, we can also talk a little bit more about sort of what that would look like or what like ideal policy would look like on the external side.
B
Yeah, I think that's an interesting question. So how different would that look from internal policies? Imagine a service you can go to and say: I am looking at this email and I'm worried, and my concerns are perhaps not being taken seriously. Where would that fit into, say, US law or EU law?
A
Yeah. So ideally that should be sort of the core of the external regulator setup as well. Maybe we can start on the EU side. We'd be very excited to see the EU AI Office establish a centralized whistleblowing reporting channel for people to report, for example, violations of the EU AI Act, where that should exactly be the function. So the function should be first of all educating insiders about their rights and then making very clear to them what the process looks like and what the confidentiality provisions are. That matters a lot, obviously: over-communicating on that side, making sure they feel comfortable, and then being in touch with the whistleblower. There's ideally a hotline where they can understand the process. What does it look like? When does it maybe start to move outside of my control? At which point in time is it going to be: okay, we are going to start enforcement now? And up to what point can you pull back and say, actually no, never mind? Is that even feasible? Explaining things like that really matters, and then coming back and staying in constant communication. You can do this anonymously as well. That really, really matters. And this is exactly how we want to see it. So as an insider, you can go, for example, to the EU AI Office and say: I think there is something here. Can you help me understand? Do you think there is also something here? And then working with consent to make sure, you know, at what point do you want to proceed. In the Whistleblowing Directive, there are actually already requirements on Member States to at least come back within seven days with an acknowledgment of receipt and then provide updates at the latest every three months. In fact, if they don't do that, there is a right to go public and you're still going to be protected from retaliation. In Europe.
We don't have that in the States, unfortunately. So if I could put a few things on my wish list for EU regulation on AI and US regulation on AI, that would be one thing. On the US side, I think the feedback is not as strong, unfortunately. There are no commitments, no guarantees that regulators have to come back to the whistleblower, which again matters a lot for the peace of mind of the whistleblower, because they risk this because they care a lot; otherwise they wouldn't even risk this. And it also prevents, if the regulator thinks there's maybe not really much here, the whistleblower going to the public and causing, for example, pain for no reason. Although I think overall I'm less concerned with that. Usually, the moment people are willing to speak up about something and risk their career, they probably have really good reasons to do so, but that's not always the case. And so this probably also relates to the capability to evaluate cases. As mentioned, I think, for example, in the proposed federal AI Whistleblower Protection Act, which we endorse, there's a multitude of channels, which can be good, but you probably want to have some center of excellence or some way that all of these different recipients understand how to evaluate these reports, if you're going to go that route. Or you can go the European route and say we're going to channel everything in one place, which has the expertise benefit of course as well. But then maybe you don't have as many channels you can try out as an insider to see whether one of them thinks this is something. That's the centralized route.
On the US side, because it's more decentralized and there's a bunch of different laws, because it's fragmented, maybe you fall under this one, maybe under that one. So it would be important that there's some strong way that recipients of such whistleblowing reports can access knowledge and quickly evaluate whether there's actually a case here or not.
B
Yeah. What do you think of alternatives to whistleblowing? And here I'm thinking about evaluations done by external organizations, and red teaming in particular. So what are the strengths and weaknesses of those alternatives, and what is it that whistleblowing provides in addition to those methods?
A
Yeah, so both of these are very important. We do not have mandatory third-party testing in the U.S., mostly. I think in the EU it's going to come with the introduction of the AI Act. Naturally that means eval providers, for example, are in a bit of a strange position where of course they have to oversee, but they're still, for example, bound by NDAs, and the companies let them see whatever they want them to see. This is still a great thing. I mean, I'd much rather live in a world like right now where we do have, like, a METR or an Apollo that work with the companies, uncover new things, maybe almost like an extended workbench at times, but also, for example, sometimes have freedoms. This is not from them, by the way. This is just my musings here, not quoting anybody. They sometimes have freedoms to publish things as part of system cards, sometimes have freedoms to talk about things they see, but also sometimes don't, because they're still bound by NDAs. So it's good to have them. Fantastic to have them, in fact. It would be better if they were actually protected or had some stronger rights. In the sort of world that we're probably in right now, whistleblowing is more important, because if companies hold back certain knowledge or maybe prevent eval organizations from saying certain things, then, again, who do you go to? Who's the fallback? That's going to be the insider, the whistleblowing side. Maybe an important note here as well: under European law, evaluation providers would be covered under whistleblowing protections if they, for example, spot a violation of the EU AI Act. Under US law, they're not, at least in the vast majority of cases. Also, for example, now in SB 53, that only extends to employees. In fact, for any violation, California state whistleblower protection only extends to employees. I think the argument companies may be making here is probably: oh, yes, we're working with them voluntarily.
The moment they have whistleblower protections, we're just not going to involve them anymore. I don't really think that holds, given a bunch of precedents, again, from the past. There is value that these organizations provide. They will work with them. And again, you can basically say: which way do you want to go? Do you want to go the way of being responsible and compliant, or do you want to go the other way? There's also another option here, which we would really favor if that is really the breaking point: you can still, for example, provide protections to evaluation organizations to collaborate in investigations, and prevent retaliation against cooperating with investigations, without giving them, for example, the full rights to report under, let's say, an SB 53. Something like that we would have really liked to see.
B
Yeah.
A
But maybe that's a bit too in depth now. If we move into a future where things like these do become more mandatory, yes, then there hopefully is less of a need for whistleblowing, because we're going to catch more and more risks earlier in, you know, the Swiss cheese model. So ideally we'd like to not have a single whistleblower ever, you know, over the next 10 years. We'd want everything to be caught well before. Red teaming, whether it's internal or external red teaming, probably falls in the same category of concerns. Not sure if people are aware of the Nathan Labenz case, where he was part of the red teaming. He did speak up then about his concerns and was excluded from the red teaming. So just not a great indicator here either.
B
It seems like a tempting option for companies because in some sense it solves their problem. This is like when a legal department immediately begins treating an employee who's thinking about whistleblowing, for example, as an adversary. So this is a kind of tension where companies are trying to solve one problem, but they might be undermining valuable information by trying to effectively protect the company. I know I've asked this before, but is that a tension you see resolving? What would you say to the companies to convince them that this is at least partly in their interest also?
A
So I think the business case for stronger internal speak-up cultures is relatively clear, at least empirically speaking. There have been great studies done on this. If you want to have really productive researchers, they probably want to have a lot of context. They want to feel comfortable voicing their opinions. And this is how you get to a strong internal speak-up culture. Yes, you can try and brute force it from the top, but, for example, having structures in place does really help. And uncovering misconduct, to make sure, for example, that it doesn't reach the regulator and doesn't lead to a major PR crisis, also really matters if you want to address it well. There is a business case to be made here. Of course, we also have to stay realistic. There are just plenty of companies, especially in the big tech space, that do not particularly care whether they're violating the law, and maybe are not erring on the side of responsibility and safety, but primarily want to, you know, boost their share price. And they just see anybody speaking up against what executives have decided as, at minimum, a nuisance and, at maximum, somebody to be actively fought. I think there have been plenty of cases where this went quite terribly. I mean, maybe the readers can familiarize themselves with the case of Ashley Gjøvik, the Apple whistleblower, and the retaliation she's experienced since her whistleblowing. I think Meta has been quite aggressive in coming after people speaking up in the public interest, for example trying to block Sarah Wynn-Williams's book from being promoted. I think there are plenty of benefits to improving internal speak-up culture, but once company leadership has set their mind that this is not the path they want to go down, then that's the reality we have to work with.
B
Yeah. If we imagine that the world is going to be moving at a faster and faster pace, so the pace of research, model releases, everything might be moving quite quickly, and it might happen perhaps sooner than the mainstream thinks it will. How do you think about whistleblowing in short timeline scenarios? Say you're working in one of these companies on the safety team, if there is such a safety team, and you begin noticing some problem, and then perhaps another problem, and then perhaps a bigger problem. When should you spend your political capital or your whistleblower capital? How should you decide when to blow the whistle?
A
Okay, so I'll probably treat this as a scenario, not the status quo. Where, for example, there's been a major breakthrough announced maybe a month ago, maybe on an architecture change, maybe it's about recursive self-improvement in some way, shape or form. Maybe we're noticing some step change happening around the METR line, which keeps going up on length of task completion, and we're noticing that a large amount of problems on the AI research side are maybe not at 20 hours of work, but maybe at 8 or something like that. And so suddenly we can unlock a bunch of capability gain and we just move much faster. That's the sort of scenario that I'm picturing right now. Generally, I think it depends on how intense the race is at this point already. So are we in full-blown race mode already, or is there still significant doubt and it's mostly your organization that feels like, okay, we've cracked it now? If you're in full-on race mode, for example, and everybody's already all in, so no lab that has any chance of getting there has any problems anymore raising any sort of funding, we're talking hundreds of billions potentially now that flow into whatever champions investors have picked. Then, I would think at the moment (and this relates to what I mentioned before, around a research piece we want to do; this is an upcoming research piece), there might be a reason to be less concerned about the sort of information you're publishing having impacts on accelerating arms races, because you're already there. While on the way to that, you would probably want to think twice about publishing information, at least, let's say, to the public or to other racers, that says: okay, we're actually accelerating dramatically at the moment.
For example, information on safety risks, or on things not going well, is probably usually decelerationist. Caveats here: as I said, there are probably cases where this is not the case. For example, if the leader publishes something on not getting ahead, then maybe others think: okay, now we have to really race to catch up, we have a fighting chance here now. Right. So there's probably something around the content of the information where your considerations may change. There might be something around how much trust you should have in the information being accurate and true, where today you probably really care about avoiding boy-who-cried-wolf situations. Well, if you're in full-on race mode, maybe that's less of a concern. If the capabilities grow accordingly, then the public is also going to notice, regulators are going to notice. Everybody's probably going to start getting more and more concerned. And lastly, maybe in combination with this: what's your forecast of how well we're tracking? So do you think we actually solved a bunch of alignment problems, for example, and the leadership of the company that is developing this, as well as the oversight of the company on a government level, is actually trustworthy, and you trust them to make the transition well? Okay, if this is roughly your expected value, then you have the probability mass of your disclosure: what impact is it going to have on what the future looks like? You probably want to be a bit more cautious. If you think we're heading straight for catastrophe, then you may be interested in rolling the dice a bit more on what sort of information you publish. Especially if you're in full-on arms race momentum already. Again, caveat: this is sort of a current state of thinking rather than what we'll have after we've spent a lot more time on it in, like, Q1, Q2 next year.
B
Yeah.
A
Another consideration on the individual side, in terms of, I think you mentioned, the capital: in general, I think there's no major change from the situation right now. A few considerations are probably: if you're maybe not extremely senior and you see more of your research being automated, maybe you're a lot more powerful today than you will be in three months. If you feel like, okay, we're actually not hitting certain alignment things right now and we're using misaligned models to try and align the models of the future, it's probably good to speak up now. But that's sort of a core content consideration. On the individual support side, you could probably expect support from, let's say, the responsible AI or safety community to be even stronger then than it is today. So I mentioned the defense fund, for example, before. But I think if we live in a world where the writing's now on the wall, I would expect a lot more funding to also flow into the support ecosystem, whether it's defending against legal cases brought forward by a company or getting you really great safe housing, these sorts of things. Another consideration here, and this is probably the last one on the personal side for the insider: already now, of course, frontier companies have internal security teams who are only tasked with making sure no information gets out. That is going to step up again, especially the moment we move in a national security classification sort of direction. If this is a path that we're going to go down, then yeah, things of course get hairier.
B
Yeah, yeah. How difficult would you expect things to get for a potential whistleblower? If we're moving into a world in which AI is recognized by governments to be a national security concern and, say, you have some form of semi-nationalization of the frontier companies, then it suddenly becomes a very difficult situation to be a whistleblower. That would be my guess. What do you think? Or potentially it makes it easier, because the government might also implement a process for whistleblowing. What do you think is most likely?
A
So also here we have a research project slowly starting at the moment, especially around classification creep: what sort of areas we would be most concerned about if certain kinds of projects were classified, and what the implications would be for whistleblower protections there. As I said, we're only kicking off now. Overall: significantly more difficult.
B
Okay.
A
So I think it depends very much on what the future state of that administration is. For example, somewhat recently Tulsi Gabbard removed the acting counsel of the Intelligence Community Inspector General, who is meant to receive whistleblowing reports on classified information, which is congressional oversight, basically. And the move, as it looks here, is that this acting counsel was replaced with an advisor that reports directly to Gabbard. So it's meant to be an independent oversight mechanism, but now it's going back into the executive branch. This does not look good. Talking also to members of the intelligence community, I think there's been a bit of a gutting, both in terms of actual individuals looking at internal whistleblowing claims within the intelligence community or on classified information, and in terms of dismantling the independence of oversight mechanisms. If something like this were to continue, that would not be good in many different ways. Already before, raising concerns based on classified information did not go super well, and it also kind of only went well if it was in the interest of the program. There's a reason why we got an Edward Snowden disclosure which went to the public. I think it would be quite concerning if we saw massive overclassification of, for example, frontier research and deployment. And there are things to be done here, hopefully. But to an extent we just have to hope that the people handling reports and classified information are going to do it.
B
Well, is there anything we can do to prepare for that? Is there anything you could put in place such that the information that's potentially relevant for the public to know is not classified? Is there any like this seems a bit. Yeah, this is of course difficult to do, but is there anything we can do to prepare for such a situation?
A
I will tell you after our research project.
B
Yeah, yeah, that makes sense. I want to raise a hypothetical here. Say that we have AI models that are increasingly capable of automating AI research. Would it be possible to implement whistleblower policies for the AI researchers themselves, perhaps in their training, their post-training, their model spec, or anything like that, so that the AI researchers themselves could inform the public if necessary?
A
Sounds interesting. Not quite sure on the technical side. It's a bit of an "okay, if you trust that the model actually does what it is meant to do."
B
Yeah.
A
And if there's something like that in the model spec and we have good proof that it is in fact going to do something like that, that seems like a good thing. Maybe it could go to the public, go to a regulator, or alert other relevant recipients. It would be able to take some kind of qualified action.
B
Yeah, I guess one problem with this suggestion is that to some extent it assumes that alignment is solved. It assumes that we can get our values around whistleblowing into these models. But it would be an interesting additional layer. We talked about the Swiss cheese model: an additional layer of security, potentially, if we could get it to work.
A
So I think on the technical side I'm probably not sufficiently qualified to speak on it. Also, to what extent do models actually have introspection capabilities, right?
B
It wouldn't necessarily have to be introspection. You could imagine one generation of models working on the next generation, and so you would have the full code available.
A
True, true. Fair point. Sounds like a nice thing to have. Probably a bit outside of our scope. I'd probably, as per usual, say not to rely too much on it, but that seems like, yeah, sure, a good additional layer. I think somebody else has written on this before.
B
Yeah, fantastic, Carl. That's all of the questions I had for you. Thanks a lot for chatting with me.
A
Thank you very much, guys.
Episode: What Happens When Insiders Sound the Alarm on AI? (with Karl Koch)
Date: November 7, 2025
This episode explores the crucial but complex topic of AI whistleblowing: how insiders at leading artificial intelligence companies can spot, escalate, and act on risks and misbehavior that may otherwise go unaddressed. The host and guest Karl Koch, a leading figure in AI safety advocacy, examine what whistleblowing systems exist (and where they are lacking), why robust protections matter, how insiders can evaluate difficult choices, and why a culture shift is urgently needed for the future of responsible AI governance.
“Whistleblowing is often used as like the final sort of...if a bunch of other control mechanisms fail, then you have to rely on that.” (Karl, 03:10)
“Legal team is sort of considered generally worse practice. There's plenty of examples...where a whistleblower goes internally to the legal team...and the legal team immediately starts a client attorney privileged case against the whistleblower.” (Karl, 08:35)
“Going by 'all companies will do this voluntarily' is not the right path.” (Karl, 22:58)
“The people that do become whistleblowers are just primarily driven by...they feel this is just really important and...the public needs to know or this has to be rectified.” (Karl, 36:08)
“What you really want to be looking for is to what extent does the company manage the internal whistleblowing process as an actual business process.” (Karl, 39:52)
“It would be quite concerning if we saw like a massive over classification on, for example, frontier research and deployment.” (Karl, 63:29)
“If the vast majority of the most highly skilled people ... work in those companies... what regulatory capacity would you need to check whether things are actually going in the direction we want them to go?” (Karl, 04:51)
“Anything reported in an internal channel that goes beyond what is protected by the law ... is therefore purely a voluntary commitment. ... It doesn’t hold.” (Karl, 19:47)
“We don't want to rely on courage. We want to build the right systems that barriers drop as much as possible.” (Karl, 36:18)
The conversation is earnest, analytical, and candid—balancing legal, systemic, and practical considerations. There is a healthy skepticism about current protections and a pragmatic optimism that building better systems is both necessary and achievable. Koch’s tone is knowledgeable, occasionally dryly humorous, and always direct.
Summary prepared for those seeking to understand the state and challenges of AI whistleblowing, why it matters, and how insiders can safely navigate these high-stakes decisions.