
A
Welcome to the Future of Life Institute podcast. This episode is a cross-post from the Cognitive Revolution podcast, featuring Nathan Labenz interviewing Ryan Kidd. Ryan is the co-executive director of MATS, which is one of the largest AI safety research talent pipelines in the world. Please enjoy.
B
Ryan Kidd, Co-Executive Director at MATS, welcome to the Cognitive Revolution.
A
Thanks so much. I'm glad to be here.
B
I'm excited for this conversation. So I've mentioned a couple of times that I've been a personal donor to MATS. I think it's actually the first time we've ever met and spoken, but your reputation certainly precedes you. I've seen a lot of great work come out of the program and a lot of great reviews, and in my research as part of the Survival and Flourishing Fund recommender group, I got a lot of great commentary on the importance of MATS as a talent pipeline into the AI safety research field. So I'm a big supporter of your work from afar, and I appreciate the fact that you guys have come on as a sponsor of the podcast recently as well. This conversation is not technically part of that deal, but you know, we're in one of these OpenAI-style circular-flow-of-funds sorts of things where we're somehow both inflating one another's revenue.
A
But I like to think we didn't buy our way onto the podcast.
B
Yeah, no, the enthusiasm is definitely real, because I've heard so many great things over time. So, excited to get into this. I thought we would maybe just start with the big picture from your perspective. Having watched some of your previous talks, I know that you play sort of a portfolio strategy: you're not saying, I have a very specific narrow prediction and I'm trying to maximize the value of this organization, of this program, for that hyper-specific prediction. It seems like you're more saying, well, there's a lot of uncertainty out there in the space, and we're going to try to be valuable across a range of those scenarios as much as we can be. With that said, and you can speak on behalf of yourself or on behalf of mentors or the community as a whole, where are you guys right now? Where are we in terms of timelines, so to speak? And how has your strategy evolved over the last year or so as we've gained more information on where we are relative to the singularity?
A
Yeah, okay, so I don't like to have opinions here, or I don't like to have opinions very loudly, and the reason for that, I think, is because, as you say, we are somewhat like a hedge fund, or more likely an index fund, which is to say we have a broad portfolio, we adopt a bunch of different theories of change as valid, and we try to have our thumb in a hundred pies. So in terms of MATS's institutional opinion on this, we definitely tend to go with things like Metaculus and prediction markets and the Forecasting Research Institute (FRI) and their predictions and so on. The current Metaculus prediction for strong AGI, I think it's called (you can ignore most of the requirements of the test and just look at one of them, the two-hour adversarial Turing test) is predicted somewhere around mid-2033. So I think that is probably the best bet we have for when AGI of that nature occurs. Now, two or three days ago perhaps, the AI Futures Project dropped this new report, which two MATS fellows worked on: one is the lead author, one is a contributing author. So, very excited about that. It was updating their model, and I think they predicted something between 2030 and 2032, depending on how you define AGI. They broke it down into automated coders that can do all the coding stuff, top-expert-dominating AI across all these fields, and so on. So, I don't know, somewhere around 2033 seems like a decent bet. But also, Nathan Young recently compiled all these different forecasting platforms: Metaculus, Manifold, another Metaculus poll for weak AGI that was a little bit less demanding on Turing tests, and I think a Kalshi market on whether OpenAI will achieve it. And he came out with an average of 2030.
Now, I don't know, I still like the Metaculus 2033, but I wouldn't bet against 2030 in terms of nearness of AGI. As for superintelligence, it's complicated, right? Could be six months or less; could be a very hard takeoff after this AGI thing. If it's a very software-only singularity scenario where you don't need a big hardware scale-up, you aren't limited by compute, it's just recursive self-improvement, algorithmic improvement, AIs improving the algorithms that train AIs, then it's like, wow, that's a fast feedback loop, right? Or you might need a lot more experimentation. You might need massive hardware scale-ups, you might need just staggeringly more compute than exists in the world, in which case that could take you a decade to get your singularity. So I currently think 2033 is a decent central estimate, the median for what we're preparing for. But obviously there's a 20% chance by 2028 (I think that's the Metaculus prediction), and that's a lot, right? So we should definitely be considering scenarios that are sooner. And particularly, I think the sooner AGI happens, the more dangerous it might be: the less time we have to do critical technical research to prepare, the less time we have to implement policy solutions. And if it's happening during a transition period for a US government, it could be even wilder. So I would say: median bet on 2033-ish, but really care a lot about the impacts of AI, like front-load your concern to pre-2033 scenarios. And I think MATS mentors, we haven't surveyed them, but if we were to poll them, we'd get something similar.
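(The kind of cross-platform averaging described above is easy to sketch. A minimal illustration in Python; the year values here are hypothetical placeholders standing in for the actual platform forecasts, not real market data.)

```python
# Illustrative only: aggregating AGI-arrival forecasts from several
# platforms into a central estimate. Year values are placeholders.
from statistics import mean, median

forecasts = {
    "Metaculus (strong AGI)": 2033,
    "Metaculus (weak AGI)": 2030,
    "Manifold": 2030,
    "AI Futures Project": 2031,
}

years = list(forecasts.values())
print(f"mean:   {mean(years):.1f}")  # simple average across platforms
print(f"median: {median(years)}")    # less sensitive to outliers
```

The median is the more robust summary when one platform is an outlier, which is roughly why forecast aggregators tend to report it alongside the mean.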
B
Yeah, I always say I'd rather prepare for the shorter timelines and then have a little extra time, and I'm sure we'll find ways that we'll still need it. But it does seem wise to me to play to that first quartile of the possible range of outcomes. Is there still any room in the program or in the community? Like, if I showed up and I was a 2063 guy, would I be sort of out on an island on my own? And are there any work streams going on right now in the sort of brain-computer interface or sort of totalizing vein? Obviously interpretability has different flavors, but with the recent turn toward more pragmatic interpretability, I wonder if there's any space left for the sort of we-really-want-to-understand-everything kind of interpretability, or if that has been understood generally as, eh, that's probably going to take too long for us to really be excited about pushing it right now.
A
Yeah, it's a good question. I actually think there is plenty of room for this, and here's why. The mainline meta-strategy that the AI safety community seems to be pursuing on the whole, talking in terms of funding, in terms of sheer numbers of people and resources deployed, not necessarily in terms of LessWrong posts written, is this AI control strategy, where basically you build what is perhaps better called an alignment MVP, which is a term coined by Jan Leike, former head of Superalignment at OpenAI, now co-lead of Alignment Science at Anthropic. An alignment MVP is an AI system that is a minimum viable product for accelerating the pace of alignment research differentially over capabilities research, such that we get the right outcome. So basically you're getting AI to do your homework. And there's been a lot of debate on this. There's a very strong camp in the direction of: this will just never work, because as soon as an AI system is strong enough to be useful, it's dangerous. I think Claude Code shows this is not the case, at least for software engineering. But people who think that aligning AI systems requires serious research taste would probably say that Claude Code, or general AI systems, are nowhere near that level of research taste and ability. Now, all of the things you're mentioning that pay off only in 2063 scenarios, presumably they only pay off over that time period, not necessarily because of, I don't know, human challenge trials or something. Maybe that makes a difference if you're interested in, say, making humans more intelligent with genetic engineering or some of the crazy things that are being tossed around. But if you're mainly interested in, oh, this thing is going to take decades of technical work, maybe you can compress those decades into a really short period of AI labor, right?
If you can get them to run faster, massively parallelize things, and in general just get them to do your homework, those 2063 AI alignment plans might be automatable over a shorter period of time. And so we should definitely be pursuing those, because the more we do to raise the waterline of understanding on these different scenarios, the easier it will be to hand off to AI assistants or to accelerate with AI input. I do think it's interesting you said BCI research, because I recall being at a conference once when someone was talking about, okay, the way we're going to solve alignment is we're going to solve human uploading, and we're going to put someone into the computer and get them to do 100 simulated researcher-years or something. Very sci-fi, very Pantheon. But then Eliezer Yudkowsky put his hand up and he said, I volunteer to be number two. Which makes sense, right? You don't want to be the first guy; that might go wrong. But yes, people are seriously pursuing that, and I think it is interesting. I talked to some BCI experts about a year ago, and they said there's no way we get, sorry, not BCI, no way we get human uploading in time for AGI, unless you actually have AGI. The time period required would involve massive amounts of cognitive labor and human trials and stuff like that. And, I don't know, it does sound very sci-fi, so I don't think we should rely on something like that. Though I'm all for people pursuing moonshots on the side. That's part of what MATS is about, right? We have this massive portfolio with a few moonshots in it.
B
Okay, so there are a lot of different directions I want to go from there, and I'm trying to make sure I keep a running tally. But maybe an interesting first one would be: how do you think we're doing on the AI safety front overall, maybe relative to your expectations? I mean, you mentioned LessWrong and Eliezer. I don't know all the lore of MATS, but I do understand that a lot of people who have participated in it over time came out of the Eliezer discourse and had a certain set of assumptions, like: we're not going to be able to teach this thing our values; it's going to be extremely unwieldy from the beginning. And now we have Claude, and it's like, man, that's come a lot farther than I thought it would have at this point in time. And I'm kind of surprised in general by how little I see people's p(doom)s moving. It seems like the people that had really high ones remain really high, and those that were never worried remain not so worried. I kind of feel like I'm taking crazy pills at times, where I'm like, I don't know, I see these deceptive behaviors and they kind of freak me out. It's amazing that that was anticipated as well as it was by the safety theorists, even in the absence of any actual systems to work with. But then at the same time, it's not crazy to me to say that Claude seems in many ways probably above average in terms of how ethical it is compared to the average person. I don't know if that's contentious to say, but Claude's pretty remarkable in that respect. What do you make of where we are? Are you as confused as I am, or do you have a more opinionated sense of how well we're doing overall?
A
Honestly, Nathan, I'm pretty confused. I do think that, contrary to expectations, it's looking like language models understand our values. That's the first thing to update on. They understand them in some key sense; it's not just regurgitation by stochastic parrots. Language models are really good at understanding human ethical mores and extrapolating from them in some scenarios. They're also really good at sycophancy. They're getting even better at deception, sophisticated deception. They do tend to deceive users in the right circumstances. Though there does seem to be some debate about this, and it seems like some of the deception is far from what we might call hardcore consequentialist deception. In most situations, though, I think alignment faking and some other papers have shown that you can create situations where AIs will deceive the user to achieve some ulterior objective, one that was deliberately given to the AI as an objective. So that's one of the constraints of these little scenarios. I don't think there are any examples of AIs coherently deceiving users in pursuit of a coherent objective, as opposed to what we might call Goodhart deception, where they just fall into deceptive tendencies because of the limitations of training data. And I'm talking about coherent deception: there are few cases of this, if any, where there's sustained, coherent deception that appears to arise spontaneously through the training process. Which is pretty good, given the level of capabilities we have. It seems like people didn't think, five or ten years ago, for sure, that we'd have AIs capable of assisting frontier science that are safe to deploy. And people were like, we're never going to put these on the Internet, who would do that? That's crazy. And now they're on the Internet, and notably the world hasn't ended yet. That's not to say it will stay that way.
You know, certainly a thing you don't want to do with a superintelligence is let it out of the box. But yeah, it does seem like we're in a better scenario than many imagined. Now, of course, we could be in the calm before the storm, right? It might well be that there's what they call a sharp left turn, or just a radical change in the way AIs internally process information, and they might acquire these kinds of coherent long-run objectives. I could point to MATS mentor Alex Turner's conception of shard theory as an example of how this might happen. So instead of AI systems containing a single mesa-optimizer that coherently forms under training, right, if you remember the old Evan Hubinger paradigm: your outer optimization loop, which is training your AI system, causes it to develop an internal optimizer architecture, which can then have its own goals that differ quite a lot from the training objective. And presumably there are some counting arguments, such as: there are arbitrarily many ways for this optimizer to form so as to produce the right outputs, because this thing is clever, and if its main goal is to produce paperclips or some other thing, then it's going to realize it's in a training process and give you the output you want, no matter what its goal is. We still could be in store for that kind of thing, but currently it seems like we don't see it. AI systems are really messy. They're kludgy, like human brains. They have a bunch of contextually activated heuristics: it sees there's a bracket there, and it's like, oh, maybe I'll put another bracket there. It's very simple, dumb circuitry.
But then sometimes they do stuff like in-context learning that seems a lot like it's actually pattern-matching to gradient descent: when models are learning from the input data stream and learning some new complicated thing, it seems a lot like they're optimizing over the input tokens, or rather optimizing to produce some output. So we might be in for a world where AI systems do spontaneously gain these mesa-optimizers, and those are a serious source of concern, because they're very powerful and trying to optimize for some objective. And this is the main concern I have, I guess: that we get this kind of deception model, this inner alignment failure perhaps, where AI systems acquire goals spontaneously, or maybe because they're being trained deliberately to be power-seeking and make money on the Internet, and then they decide to hide, and we don't have interpretability tools good enough to detect them. So I guess I haven't really changed my fear about this scenario eventuating, but I have become more confident that we can elicit useful work from AI systems before we see obvious signs of this, I'll say. And I'm pretty confident that AI systems right now are not executing very powerful scheming against us, because I think we would see some sort of warning shots. I don't think it's going to be night and day; I think we'll see situations where AI systems are trying to scheme in really dumb ways before they try to scheme in very competent, difficult ways. Does that make sense?
B
Yeah, I think that's a good summary. I mean, eval awareness definitely stands out to me as one thing that is making everything a lot weirder and harder to feel confident about. I'm not sure really what to make of the deception track that you outlined there. In some ways, I've often said it feels like we're in a sweet spot, where they're getting smart enough that they can help with science, and yet they're not good enough at using a web browser to go out and get too far in terms of self-sustaining, or causing whatever havoc. And on the deception side, the gradual rise of it seems like an example of physics being kind to us. It doesn't seem like we're seeing the sharp left turn; it seems like we are seeing these proto-behaviors that at least give us something to study, if nothing else. But I'm not quite sure how people get confident in the idea that maybe they're just not that good at it yet. When you said we would see warning shots, are these not the warning shots? That's one thing I'm still kind of confused on. People seem very quick in some cases to say, well, that was a structured eval and it was sort of led into it. Yes, it refused to be shut down, or it took steps to avoid being shut down, but that was just because it wanted to accomplish its task, not because it had a take-over-the-world objective. And I'm kind of like, well, okay, still, it did resist being shut down. At what point should I start to consider that a warning shot? I'm not sure there's an answer to that. I'm not sure there's really a question there, or that there's a way for you to answer it. But I guess basically it's just another layer of my own confusion.
And it seems like people are very often led by extremely different intuitions in response to the same fact pattern, and I'm not really sure what to make of that. On deception in particular, not that you were doing this just now, but I've heard a lot of different ways that people say, well, we don't have to worry about that too much. And I'm like, I don't know, I'd really like to know that that's resolved at some point. Certainly, one of my common refrains is that if I was going to be part of a military that was going to go into battle with my AI systems, I would really want to know that the deceiving-the-operator issues have been well and fully ironed out. And I don't know, it seems like we're a little casual on that, even at Anthropic. They've certainly done some of the best work on this stuff, but they still also seem remarkably chill about it to me. It's strange.
A
I mean, I don't think people should be going to battle with AIs, for many reasons. I think that's a pretty bad social more to set, to allow that kind of thing. But that's another matter, I guess. Definitely, I would not feel confident in current or future-generation AI systems not having out-of-distribution failures, let alone in critical scenarios, let alone scary things like deception when it really counts. And I think that's a big deal. And we should be tracking two things, right? One: AI capabilities. Are AIs situationally aware? Do they have the necessary prerequisites to even understand that they are this AI? It seems like they do, right? They even know roughly their training date, they know some details, they can distinguish their text from other AIs' text. So they're becoming increasingly situationally aware, which is one of the necessary prerequisites for really dangerous deception. Do they have the capabilities to hack themselves out of the box, to steal money? We had this MATS paper that came out and caused a stir recently; it was a collaboration with Anthropic's AI Red Team, where they found that if you just put an AI system in a simulated environment with a bunch of real smart contracts, it can find $4.6 million worth of exploits. Well, that's a lot of money. That's enough to set up your own server and run for quite a while. And that was a relatively short project, and it was pretty hands-off from the humans, though not entirely. So it does seem like we're getting dangerous capabilities, increasingly so: hacking out of the box, getting money, getting influence, all that kind of stuff. I think we want to be tracking all of that very closely. And I don't think we're at red lines currently, but we are approaching them, I would say.
And separately, I think we should be tracking, as you say, this model organisms work, where we try to elicit dangerous behavior from AIs. You can think of this as like with your child, right? You leave some cookies out and you say, don't eat the cookies, and then you turn away, but you're secretly looking, and if they ate the cookies: I caught you. So that's the kind of thing we're doing. The thing is that AI systems can really detect when they're in training versus real environments. But if you recall the AI 2027 scenario and a lot of the discussion around it, people were talking about online learning: OpenBrain trains the last big AI agent, and from then on it's just constantly learning online through some sort of RL paradigm. If AI systems are perpetually learning online, then they're always in deployment, and you've got to have monitors and control protocols. That's why control research is so important, especially in the early days, to catch some of these slip-ups, right? So you can do all the model organisms work you want in the lab, and that's one layer of defense, to see if these capabilities or these penchants for deception emerge. And separately, you need to have the control evals studying them as they're deployed, especially if they're going to be learning online, perhaps updating their behaviors, and just be constantly checking for this stuff and be ready. Have a fallback plan, a rapid response plan. What are you going to do if you actually see serious warning signs? Can you shut the models down? Your stock price is going to plummet. What do you do? Do you revert to an older system that's safer? Probably. So I think, yeah, we should definitely be tracking this stuff. And I wouldn't say that we are in the clear, by a long shot. I would say that we are in a better world, by my estimation, than Bostrom and MIRI predicted ten-something years ago. But I don't know.
They would say I'm very wrong about that. But I don't know. I think it's useful that we can get some work out of these things that looks like it is actually quite likely to accelerate AI safety work.
B
Yeah. So that brings up another huge question for AI safety research in general. Probably the strongest criticism, maybe not strongest in the sense of being most compelling to you, but certainly the most hawkish or fiercest criticism that AI safety research gets, is that it always ends up being dual-use, and that it always ends up somehow accelerating the core capabilities track. And some people would say: just stay away from the domain entirely and focus on social shame or whatever. I do believe we can do better than that; I think we probably have to do better than that. But I wonder how you think about it. The canonical example: RLHF was a safety technique that really turned out to be more of a utility driver than anything, I would say. I guess they're both right; it is dual-use. But certainly when it came to accelerating the field, making these things useful, waking the world up, having all kinds of people pile in, everything going exponential all at once, you can trace that back, at least in part, to this transition from raw next-token predictors to actual instruction followers. And we've probably got a lot of those things going on today. The one that stands out to me the most is one you've alluded to a couple of times, which is getting the AI to do the alignment work. That sounds awfully close, uncomfortably close, to recursive self-improvement, which is something that I am quite fearful of. Again, Claude seems pretty ethical, and the GPTs aren't too bad either, but yikes, are we really ready to have them do our alignment homework? So how do you think about teasing this out as you prioritize different kinds of research: where you want to invest, what kind of mentors you want to bring on, what kind of talent you want to cultivate through the program? That seems like a huge question, and a really hard one. How do you think about it?
A
It's a very good question, and I'll preface by saying that all safety work is capabilities work, fundamentally. People like to distinguish these things in terms of, oh, capabilities work is about the engine, about making the plane go faster, and safety work is about the directionality. But as you pointed out, RLHF, which was intended as safety work to help the directionality, to steer it where you want to go, also made people realize: oh wait, this thing is useful, I can actually hop in this plane now, because it's going to land where I want it to. Which made them want to make the engine go faster so they could get there faster, right? And that whole feedback loop started. So I actually don't know if you can avoid this. The only way I could conceive of doing safety research with no impact on capabilities until, I don't know, the final critical moment when you deploy it, is being holed up in a lab somewhere with people you utterly trust, under crazy NDAs, with access to staggering resources, whatever is required. Because presumably math and theoretical methods aren't enough to improve safety; at least that seems to be the lesson of the last 10 to 20 years. I don't know, I could be wrong, but it seems like the interplay between theory and empirical research is pretty vital for most disciplines like this. So you'd have to have staggering resources, a perfectly loyal team, all these NDAs, no one revealing your research. And then you build the system in secret somehow, and then, okay, you deploy it, and then maybe you open-source your alignment technology once you have it, or somehow you disable all the bad actors or something. It just seems like a very difficult prospect.
B
I think that sounds like Safe Superintelligence in a nutshell. That seems like the setup they've got: extreme secrecy, unlimited resources. They did have one notable defection, but otherwise a team that has resisted lucrative buyout offers. Yeah, nobody knows.
A
I'm not trying to defend research like this, or even defend capabilities-enhancing safety research per se. I'm just saying it's pretty hard to imagine a situation where you avoid it, because I think you do have to build the AGI at the end of the day. And I know I'm alienating a lot of people who might watch the show when I say that, but I think you kind of have to, from a pragmatic perspective, because the market forces driving this are very strong. Now, there are some options we could take, right? We could build Drexler-style comprehensive AI services, so you never have to have a centralized agent; you have distributed mechanisms. You build Scientist AI, very narrow AI systems, to serve a bunch of economic needs. The problem is, I think they all get outcompeted by agentic AI: you stack an AI company full of agents, and they all go out into the stock market, make products, and so on, and just make more money and beat your crappy narrow AI solutions. So the problem is not just making AI that is aligned; it's making AI that is performance-competitive enough that it dominates in the marketplace. The only alternative is to have some sort of draconian shut-it-all-down kind of thing, which I am just very skeptical of ever working. I don't see any example of such a thing happening. The closest example we have is stopping human cloning, but that was not a lucrative bet in the same way that AGI is, I claim. And also, human cloning violates a deep social more in a way that few people today conceive of powerful AI systems violating. I think they're wrong. I think building a second species is actually going to violate some deep social more in the same way that human cloning would. But I don't think people will see it that way.
So that leaves us with the fact that we actually have to build the AGI. But if we can build products that are safer, or perhaps under some strict regulatory control, ideally some ten-year international, slow, phased entry to the new AGI world, where all these countries and companies are forced to be very careful and collaborative in the way they align their models, then we're in a much better world. That's the world I hope for. Now, as to whether AI safety research is unnecessarily capabilities-enhancing: some is, perhaps. On RLHF, I'm on the fence, 50-50. At some point the idea of RLHF was in the water; it doesn't seem like it would have waited much longer if Paul Christiano and Dario Amodei et al. hadn't done it. I think someone else would have done it. That's not to say you should necessarily try to accelerate the frontier of capabilities; that seems bad on net. But certainly RLHF opened the door to a lot of very promising ways to build alignment MVPs, which is kind of the Christiano meta-strategy too. I don't know, it's hard to say. I'd like to run the counterfactual simulation and see where the world would be without RLHF, or with it one year or two years sooner. That would be interesting to see. It definitely did kickstart, I think, the ChatGPT revolution and the productizing of AI systems. But it's hard to say, given how small the AI safety field was at the time. The AI safety field, I think, has grown from the increase in AI exposure. So you would have had some amount of additional AI safety research happen had the ChatGPT moment not happened then but one or two years later, but I think it would have been kind of insignificant, if I'm honest. I don't think the field was big enough. Now, you could say, okay, what if you also tried to pour resources into secret AI safety projects at the same time?
Delay RLHF, delay ChatGPT, build up the AI safety field via networks? The MIRI summer schools weren't doing a lot, and MATS came along just before the ChatGPT moment, in December 2021. And yeah, I think the first MATS cohorts were a little bit more directionless than the later cohorts, definitely. I think safety research really kicked into gear after we had ChatGPT. Not to say that was the only cause; there were a lot of things happening around that time. And I think that larger, more capable models have definitely enabled certain types of essential safety research you could not do with smaller models. We're talking interpretability on models that actually have coherent concepts embedded in them. Though I'll say there's probably plenty of work still to be done on GPT-2 Small, but linear probes and whatnot can, at a high level, target some of our frontier models; Qwen, these Chinese models, are particularly good for that. Certain types of debate, too: we had the first interesting empirical debate paper only after models were good enough to debate. And there are many, many other such examples; all the control literature, I think, just could not have happened earlier either. Sorry if that's too much.
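As an aside on the linear probes mentioned above: a probe is just a simple classifier trained on a model's hidden activations to test whether a concept is linearly represented. The sketch below is illustrative only, using synthetic activations in place of a real model's hidden states; all names, dimensions, and the planted "concept direction" are invented for the example.

```python
# Minimal linear-probe sketch. Real probe work would cache hidden states
# from an actual model (e.g. GPT-2 Small via a hooks library); here we
# simulate activations with a concept linearly encoded along one direction.
import numpy as np

rng = np.random.default_rng(0)

d_model = 64          # hypothetical hidden-state width
n_train, n_test = 800, 200

# Unit vector along which the binary "concept" is encoded.
concept_dir = rng.normal(size=d_model)
concept_dir /= np.linalg.norm(concept_dir)

def make_batch(n):
    """Synthetic activations: class means sit at +/-1 along concept_dir."""
    labels = rng.integers(0, 2, size=n)
    acts = rng.normal(size=(n, d_model))
    acts += np.outer(2.0 * labels - 1.0, concept_dir)
    return acts, labels

X_train, y_train = make_batch(n_train)
X_test, y_test = make_batch(n_test)

# Logistic-regression probe fit by plain gradient descent.
w = np.zeros(d_model)
b = 0.0
lr = 0.1
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X_train @ w + b)))   # sigmoid predictions
    w -= lr * (X_train.T @ (p - y_train)) / n_train
    b -= lr * np.mean(p - y_train)

acc = np.mean(((X_test @ w + b) > 0).astype(int) == y_test)
print(f"probe test accuracy: {acc:.2f}")
```

If the probe recovers the concept direction, accuracy lands well above chance; on real models, the same recipe applied to cached residual-stream activations is one of the cheapest interpretability measurements available.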
B
No, it's great. Yeah. I was going to ask also about the idea that, and it sounds like you sort of believe it, at least up to a point, but going back to the founding mythology of Anthropic: I think one of the notions that was seen as a legitimate reason, even among pretty hawkish AI safety folks, for starting a company like Anthropic was, well, if you want to do this safety research, you've got to have frontier models to do it on, otherwise you're just inherently behind. And then what good is that, right? What good is it to work on a last-generation model? Obviously we've got quite a few generations between GPT-2 and now, and sure, we still don't understand plenty of things about GPT-2. But I would also say there are a lot of emergent behaviors not observed in GPT-2 that are definitely of interest, including many of these deception and eval-awareness things that are most hair-raising to me today. Where do you come down on that now? I wonder, if somebody's thinking, geez, should I go to a frontier company, because that's where the best models are, and that inherently means the most consequential work would be done there? Or I could go work independently, or at any of a number of other organizations, and I might be limited to a smaller Qwen model or something. But maybe that suffices. Maybe there is enough in those kind of second-tier models, as we enter 2026, that you don't really need to be working with the latest. Again, I think I am mostly just confused or unsure about this. But do you have a take?
A
I mean, yeah, for plenty of interpretability research, people aren't using the frontier models; you don't have access to them. Sure, people in the labs are, but at MATS there are tons of really excellent papers that keep getting produced, and from many other sources too, right? EleutherAI, FAR.AI, et cetera. They're doing world-class interpretability research on sub-frontier models, because today's sub-frontier model, today's Qwen or DeepSeek or Llama or whatever, is like yesterday's frontier model in terms of capabilities. We're at the point where these models are all above the waterline for doing really excellent research. So from an interpretability perspective, I don't think you need to be pushing the frontier that much, if at all. From the perspective of other types of research agendas, such as weak-to-strong generalization and other types of AI control and scalable oversight work, I think you kind of do need more data points. I'm not saying we've exhausted everything you can do with the current models, far from it. But I think you are going to need more data points to build up consistent results, and to see some of these worrying behaviors emerge where your weaker model can't actually supervise your stronger model in all situations. Which, by the way, is predicated on this idea that verification is easier than generation, P versus NP, blah blah blah, especially if you can see that person's thoughts and they can't see yours. So I think it does make sense to be at the frontier from that perspective. But I will say that the main reason the companies are doing this is obviously to make money. And then, as a corollary, from a safety perspective, if you were trying to actually make a strong case for being at the frontier:
It would be something like: our models are performance-competitive, close enough to the frontier in a fast-follower kind of way that you take only a small performance hit by using ours instead of the competitors'. But they're safer. And currently no one wants to use anything that's worse than the frontier model. Why would you? That's the best model. But if a model were, I don't know, 10% more likely to tell you to jump off a bridge, or, more seriously, 10% more likely to hack your bank account and steal all your money, let alone escape and make a bioweapon, I would like to think people would use the less good model. And I'd like to think that regulators and insurers could adequately penalize the frontier companies into complying, because then you have existence proofs: oh, I actually made an effort, I tried to make my product not do the heinous thing that the very best model developer's product is doing. Then everyone else has no excuse, they have to do the same, and governments can compel them to, and so on. So making your model performance-competitive enough that people are willing to pay the alignment tax, so to speak, seems like a viable strategy from that perspective. Now, of course, none of this is trying to justify the current race of frontier models, which seems very reckless, let's be clear. I think at the current pace of development we're going to be in a lot of trouble. But this is one of those collective action problems: these companies have to coordinate to slow down, and there are international things at stake here as well, because now you do have a US versus China model-developer race, now that they're in the running. So it's very complicated. And when you have these collective action problems, I think the main way you solve them is through governance. And sure, the lab leads could probably be even more collaborative.
And definitely some of them are not advocating as strongly as they should be for slowing down, and for this kind of collective sharing in the alignment benefits rather than pushing the frontier dangerously. But I do think this is ultimately a job for governments.
B
Yeah, this might be a bit of a digression, but a quick follow-up on when you said companies are primarily doing this to make money. I actually would model them fairly differently than that. Take OpenAI, for example. Sam Altman has said something like: we could burn $5 billion, $50 billion, $500 billion, I don't care, we're making AGI, it's going to be expensive. I think that's almost a direct quote. And then when you look at their mission, I'm always struck by how even the way they've chosen to define AGI strikes me as inherently ideological. You could set your goal in any way, shape, or form you might want, and theirs is explicitly framed around outcompeting human workers. And I think they are sincere in their expectation that it's going to be good for everyone and it's going to free people from drudgery. I certainly hope they're right about that; I'd love to live in a world where people don't have to do work they don't enjoy doing. But I don't know. Obviously they're quite different across the different companies, but I think of it less as trying to get rich and more as trying to make a real mark on history; that's the biggest summary I would give for a lot of them. Does that resonate with you or not really?
A
Maybe. I can't really speculate on the psychologies of the leaders of these labs, let alone their shareholders, or not shareholders so much as their venture capital investors and everyone else; they've made promises to their clients, their employees, and so on. I can't really speculate about that. I will say that given that the value of AGI has been estimated at between 1 and 17 quadrillion dollars, that seems like a lot of money. It's also a pretty big mark on history. I'm not sure it even matters whether they're trying to make a big mark on history or make money. We can adopt Dennett's intentional stance about the AI companies, right? Okay, what does it look like they're doing? If we were to conceptualize one as a coherent agent trying to do a thing, what is the thing it would be trying to do? And to me, it seems a lot like they're trying to make a bunch of money. But making your mark on history could also be valid, I would say. In the world where an AI lab, and I don't want to use any specific lab as an example, is trying very specifically to make its mark on history and not trying to make a bunch of money, I'd expect it might actually look identical to this world right now.
B
Yeah, given the capital requirements. It seems like Anthropic has sort of said as much, right? In their early days, they were less focused on commercialization, or even thought they might try to stay non-commercial somehow, or less commercial. And now it's just like, well, you can't really do that if you want to compete in this particular game, because as long as you believe scaling laws continue to rule everything around us, you kind of just have to show that you can bring in resources in order to attract resources, and that is the path to making a mark on history. Again, Ilya maybe stands out as somebody doing something quite different there.
A
He's a billionaire. He's raising huge amounts of money for his models. Maybe he is going to make more money this way in expectation than he would have made staying at OpenAI. It's possible. He's got his own company now, and he still retains OpenAI stock, I'm sure.
B
Yeah. And it's funny that he's come up a couple of times in my mind, just pattern-matching on some of the things you've said. But the idea that somebody's going to go straight to superintelligence and then drop it on the rest of the world, I think he's kind of softened on that a little bit. And if there's one thing OpenAI probably did get right, it's the iterative deployment idea: giving people a sense of where we are, and not keeping the whole thing hidden under a basket somewhere. I think that's one of the things that seems like it's aged pretty well, in my view.
A
I mean, my earliest perception of a person advocating for this was Paul Christiano, in his Takeoff Speeds post, pushing back against what I suppose he saw as the predominantly MIRI perspective at the time, like, we're going to build the thing in secret. Or, frankly, I can't comment; I don't know what MIRI's objectives were, but I know they were trying very hard not to leak any information about their alignment research in some areas, while in other areas they published great papers and so on. But Paul Christiano at the time was pushing back against this. He thought that fast takeoff was what would happen if you had a bunch of dry tinder lying around. So if we had tons and tons of GPUs and then we stopped research for a year and then started again, well, you'd expect steeper growth, right? And we're seeing this in terms of the very, very fast followers. This is not just a phenomenon in AI; it happens in economies too. Epoch recently did a study showing that the pace at which new AI companies approach the frontier is just so much faster than the pace at which the frontier moves, because there's an abundance of chips, an abundance of data and methods, and so on. And it's the same with catch-up economies in the world. So I think Paul Christiano was right in the sense that, if society is to cope with and adapt to AI, then gradual release and diffusion of technologies is better. There's another perspective which says it's that very gradual release that ensures continual VC reinvestment to drive the engine and actually make the progress, whereas in the other world you just wouldn't build AGI, because perhaps in that world no one can build it without several hundred billion dollars, maybe a trillion or something. I don't know, I can't say.
I certainly think that we're now in the world where it does seem better to have gradual release of models than to have it all kind of hit us at once.
B
Well, I always value the opportunity to get perspective from someone like you, who is such a connector and such a central node, with so many mentors and mentees and all the flows of information and talent that you are so close to. But we should probably narrow focus a little bit and talk about what you guys are actually doing at MATS, and I'll put a timestamp in the intro so people who want to get right to the core MATS stuff can zoom ahead and join us, skipping over some of the higher-level stuff. Why don't we do a quick rundown on the different tracks, I think you call them streams, of work happening in the MATS program today, maybe with a little weighting, or what you're most excited about. Then we can go into your assessment of the AI safety labor market, which I think is really interesting and unique, and we'll take it from there.
A
We recently changed up our track descriptions. We previously had the standard ones: oversight and control, evals, governance, interpretability, agency, which was sort of a catch-all term for cooperative AI and agent foundations, AI sentience and digital minds research, and of course security. But we've recently changed that up, because we wanted the tracks to reflect less the theory of change underpinning those things and more the type of process, and type of individual, that works on them. So we now have the tracks on our website. Empirical research: this is AI control, interp, scalable oversight, evals, red teaming, robustness; a lot of very hands-on, coding-heavy, iteration-focused research. We have policy and strategy, which is different again: much less focused on arXiv publications, potentially more on modeling, more on adapting technical research into things that are actually actionable by policymakers. Theory is another track: a lot of mathematics, foundational research on the concepts of agency and how agents interact, and it does include some of that agent-based modeling for cooperative AI. There's technical governance, which covers things like compliance protocols, eval standards, and how to actually enforce them: if you have an off switch, how would you even make such a thing viable in a governance framework? There's compute infrastructure, which is stuff like tracking where chips are going, because if you're going to have international compliance with various types of treaties, you need to know where your chips are and what they're running, or at least have some zero-knowledge proofs that guarantee they're not doing terribly dangerous things. And there's physical security, because if you build superintelligence or AGI or something, presumably you don't want everyone to have access to it and be able to arbitrarily modify it and give it weird goals.
That would be bad. Some people say that's good; I say we don't let everyone have nukes, you know? Why would we let everyone have superintelligence? Seems kind of ridiculous. So you've got to have physical security to prevent that from happening, to prevent diffusion. So yeah, those are the main tracks. Now, we're super excited. I think we have somewhere over 50 or 60 research mentors lined up for our summer program, for which applications are open right now. And it's going to be the largest program yet: 120 fellows across our Berkeley and London offices. Anything else I should say?
B
Yeah, maybe you want to do the weighting of those; I don't know if you would break it down by how many mentors are in each of those categories.
A
Yeah, the current program has something like 27% evals, 26% interp, 18% oversight and control, 12% agency, 10% governance, and about 9% security. I wish I had a figure I could show; I do have one, but it might be hard to show in the podcast format. As you can see, it's a pretty even mix of things. We have maybe roughly three times as many people doing evals as doing security, so there is some divergence there, but we have a pretty broad portfolio, and that's just because there are tons of amazing researchers. We really just pick some of the top researchers in every category.
B
Maybe it's a good time to just name-drop some names.
A
I could do.
B
Yeah, there's a lot of names that people will know.
A
Yeah, I mean, some of the oversight and control researchers might be better known, because this is one of the things a lot of the big companies are pursuing. So we have people like Buck Shlegeris at Redwood Research, and his whole team is part of that; Ethan Perez and Sam Marks and many other people at Anthropic; and we have Erik Jenner and David Lindner and Victoria Krakovna and many at DeepMind. Just tons of people doing oversight and control research. For interpretability, obviously we have Neel Nanda; we have some of the Timaeus folks like Jesse Hoogland and Daniel Murfet; we have of course the Goodfire people like Lee Sharkey, longtime mentors; and some of the people from Simplex, like Adam Shai and Paul Riechers, who, along with some of Timaeus, are pursuing what I would say are some of the more interesting, maybe more moonshotty but very promising interp research bets that I have on the side as well. For evals, we have people from METR, we have people from UK AISI, we have people from Apollo Research, Marius Hobbhahn, plenty of others there. Yeah, I could go on; there are some amazing researchers there. We also have some harder-to-categorize research: Yoshua Bengio's whole team at LawZero, and Yoshua Bengio himself, are there, among plenty of others. And we have some AI sentience, digital minds research as well: Patrick Butlin at Eleos AI, Kyle Fish at Anthropic. Yeah, it's a very exciting program.
B
Yeah, that's quite a who's who, including a couple of past podcast guests and a couple I took note of as maybe needing an invitation. It seemed like, if I was categorizing those right, a majority would be in that first empirical category. Do you think it stays that way? Your comment that ultimately this is a job for governance tracks, honestly, some of the more MIRI-style line these days; I think the MIRI line today would be, we don't really have time for that much research, we need to just go straight for the global treaty. You're obviously not quite so confident in that direction, but it sounds like you do believe there is ultimately a major role for governments to play, and you're starting to move more in this governance and policy direction. Do you see that as the biggest growth area, reflecting that worldview? Or how do you expect the balance of these different areas to evolve over time?
A
I can't necessarily say. Well, okay, I can speculate, but I'll say this: we have actually had about the same proportion of governance researchers, give or take a few percent, for the last two years. It hasn't changed much as a fraction, so we are on track to continue the same trend. Potentially part of the reason is where we're based, particularly Berkeley and the SF Bay Area; this is a big technical hub. There are other programs that have had a deeper governance focus: we have GovAI and their classic fellowship, we have IAPS, we have of course the large program running out of RAND, and plenty more besides, and of course the Horizon Fellowship for US policy careers. These have existed for longer than we've been around. So while for technical AI safety we were basically the biggest fish in town, which in many ways we still are, in terms of funding and I think in terms of process we're the biggest and best program, for governance there has always been a bigger fish. And so we've never felt it necessary to overweight governance beyond what our mentor selection committee indicates. In fact, that's the primary determinant of which tracks get selected: our mentor selection committee, which is somewhere between 20 and 40 top researchers, strategists, org leaders, et cetera, that we survey. When people apply as mentors, we decide who gets in at the mentor level, based on the feedback from our mentor selection committee, with the additional caveat that we also have some diversity picks and minimum requirements, because we want to support a great breadth of research and we think the mentor selection committee on the whole might be biased in some ways as well. So we really try to talk to the experts when it comes to picking the agendas.
And it so happens that governance researchers have historically been rated relatively low by our committee, which itself contains many governance researchers. I would go so far as to say that governance research is harder to do well, in some critical sense; it's harder to see what the actionable thing to do is, in some ways. Now, everyone with their own specific governance agenda doesn't feel this way, and for good reason: within their agenda, they have clear, actionable things to work on. But I think on the whole there are just so many more possible technical directions to pursue that are high-leverage. A lot of the governance work, and this is not talking about advocacy now, this is technical governance, is about building technical governance solutions such that, if an administration deems them worth deploying, we have the capacity to do that; we actually have solutions that can be deployed, which is very important. But I would say: don't rule technical research out, because especially if we have something like a regulatory market, or warning shots that cause the public to wake up and tell Congress to regulate this stuff, we have to have technical solutions ready to deploy to make these systems safer. And the cheaper we make it to make systems that extra degree safer, the lower the alignment tax companies have to pay to train and deploy safer systems, and the more likely they are to do it when they come under pressure, internally, externally, or otherwise. So I think lowering the alignment tax via technical research is still super important. Also, if this alignment MVP plan is going to work, we have to have a bunch of directions ready to be iterated on by these AI assistants, or by humans calling on teams of AI assistants, as is more likely to be the case.
And you actually have this massive interplay between technical research and governance research, where things like evals and safety cases built on technical AI safety solutions can actually be tangibly put forward in policy proposals, and can convince policymakers. Demos and evals and model-organism honeypot traps, where AI systems deceive their users or whatever: this is what convinces policymakers to make policy, and gives them a tangible target for their policy to act on. So there's a clear flywheel here. So I would say do not rule out technical research. And there is a reason MATS has so many more technical mentors: on the whole, our mentor selection committee thinks that, on average, a technical portfolio is worth pursuing.
B
Yeah. That reminds me of what Jake Sullivan said in terms of advice for the AI safety community, which was basically: you need to make this stuff as concrete as you possibly can, so that people like me have something to really latch onto. Because as long as it remains a theory or a possibility or whatever, it's just really hard to get government to do much on that basis. So he was saying, the more grounded and concrete all these fears can be made, the more likely you are to have success in the policy realm. You mentioned advocacy as well, briefly. Would you ever consider an advocacy track? I guess it might be advocacy research. I feel like right now we do have groups doing advocacy; obviously, I'm not sure how data-informed their advocacy strategies generally are. But I'm always struck, when I do see survey results, that it's like, yikes, the public is not super keen on AI in many ways. Do you think that would ever be something you guys would expand into?
A
I mean, you assume we haven't.
B
Yeah, I haven't seen it.
A
So we are a 501(c)(3), so we have to keep our advocacy stuff to a minimum. And a lot of MATS' strength is being this kind of impartial player; we're trying to have somewhat of a research university, tech accelerator kind of vibe. We don't want to play favorites politically; that's not in anyone's interest. I think if people are doing that while trying to be the thing we are, they're doing a bad job. That said, I believe David Krueger is going to be a mentor in the current program, and some of the research he's going to be discussing is to do with, I guess, what sort of messaging and what sort of standards are actionable. Of course, I wouldn't say this is true advocacy; this is more MATS supporting independent research, working with David Krueger, who has his new org, Evitable, as in not inevitable, which is focused on some of these advocacy questions. So I think MATS has to be pretty careful, obviously in terms of our 501(c)(3) spending requirements for advocacy; we haven't spent anything on advocacy, for what it's worth. And also in ensuring this political neutrality, so that our fellows, our mentors, and all of our strategic partners can feel assured that we are solutions-oriented rather than pushing for a particular political outcome. I think AI safety becoming a political football is just a bad idea. And I applaud advocacy orgs like Encode, and plenty of others, perhaps Case, et cetera, for their efforts. But that's not MATS as an organization.
B
Yeah, gotcha. A toe in the water at most, for now. Let's talk about the profiles. I both watched a talk of yours and read a blog post from about 18 months ago where you sketch out the different archetypes of AI researcher that you have seen, and then map those onto the demands of organizations. I don't know how much it's changed in the last 18 months, if at all, but maybe give us the baseline, and then if there's any update, I'd love to hear how things are changing. Especially, I have in mind, of course, Claude Code, which may accelerate certain people, may empower certain people to do things they couldn't otherwise do. But yeah, tell us, first of all, how you organize your thinking about the kinds of people you're bringing into the program.
A
So, I mean, as I've said about the mentor selection committee, MATS is fundamentally, I think, this massive information-processing interface. We consult the very best people as much as we possibly can; we build our own opinions, but we don't rely on them. We try to consult experts at every stage. So for the paper, or blog post, you're mentioning, which is called "Talent Needs of Technical AI Safety Teams," we surveyed 31 different lab leads and hiring managers: whoever was the most senior person we could get related to safety at every AI safety org we could find that was hiring at the time. We asked them: what do you need? And then we compiled that survey, all those interview notes, into three archetypes. This is just technical; we've since done the same for governance, expect that to drop soon. The three archetypes were connectors, iterators, and amplifiers. We chose the term connector because these people are bridging gaps between the theoretical arguments for AI safety, the theoretical techniques to make AI safe, and the empirical techniques to actually make it happen. They're sort of spawning new empirical paradigms to work on. Okay: no one is hiring these people. It's pretty rare, because if you're good at that, then everyone knows your name and you're already hired; perhaps you're already leading an organization. Everyone wants to be an ideas guy, but very few people want to hire ideas guys. And these are typically people like Buck Shlegeris, you know, AI control; Paul Christiano, with the huge amount of research he's produced; and so on. You know these people, and they typically have AI safety organizations they founded and lead. Then there are iterators. And this is not just engineering, right?
Iterators are active researchers who have strong research taste, who are pushing the frontier, but they typically aren't creating novel paradigms based on theoretical models of things; they're typically advancing empirical AI safety. You can even imagine iterators in technical governance agendas as well. This is the majority of people working in AI safety today, and also the majority of hiring needs in the future. Prominent examples of iterators include Ethan Perez, Neel Nanda, Dan Hendrycks; actually, I think Dan Hendrycks maybe crosses some boundaries there. And then there are amplifiers, for whom I think the closest existing example is the TPM archetype. To distinguish them: they have more focus on amplifying people, and typically you'll find them on large research teams, scaling the number of people who can be effectively managed and can contribute to organizations. A lot of MATS research managers would fit this category, as would TPMs at the various labs. And interestingly, they're actually quite in demand as well; for labs in the 10-to-30-technical-FTE range, they're the most in-demand archetype, because it's very hard to hire great people managers who also have the requisite research experience. You're trying to hit two bullseyes. There are ways around this, of course. Google has a model where your research managers and your people or project managers are somewhat distinct roles, and MATS does try to do this with our mentors and our research managers. But yeah, I think the need for amplifiers is only going to grow, because, as you've said, things like Claude Code and other AI systems are going to erode away the minimum technical skills required to contribute, and I think AI agents are going to take on more of those things.
You end up in a situation where your people skills, your management ability, your networking, your amplifier skills in general are the more bottlenecking thing on AI safety research. So all those iterators out there: there are job opportunities. You are still the main thing everyone wants to hire. But if you don't try and build up your management capabilities, if you don't work at managing AI systems, then you are going to be left behind as the needs of the field shift toward amplifiers.
B
So to just try to echo that back to you: the connectors, another name for them might be conceptual visionaries. Like, these are the people that define research agendas where they just previously didn't exist, like de novo high-concept work. They in turn need iterators, which sounds like essentially machine learning engineers is kind of the core skill set.
A
Scientists. Scientists and engineers, yeah.
B
And they're the ones that are running the experiments day to day, building the tooling, writing the code, traditionally doing the visualizations of the data, and kind of taking this initial conceptual hit that the connector came up with and really systematically mapping out that space. And then these organizations, as they grow, start to need amplifiers, which I maybe would call leaders, you know, people that can build up an organization, see that people are working well together, that it can scale past the sort of two-pizza rule. Is that changing now, when we hear things like 90% of code will be written by Claude? And that seems like it's kind of closer to right than wrong. And certainly I've vibe coded three AI apps for family members for Christmas presents this year, which is something I would not have come close to being able to build previously, even with just one or two generations of model ago. I do wonder how much the skill set is already changing. What are you seeing there? Like, what's the up-to-the-minute in terms of how people are thinking about changing hiring needs?
A
I mean, up to the minute is that you have to be very proficient at using AI. And I think that some of the companies have updated their coding interview processes to allow for the use of AI assistance, because on the job you have to be using AI all the time. That's just critical to succeeding in this field, to being amplified by AI. I would say that goes for every one of these archetypes we've identified. I do think as well that checking whether AI output is good or not in critical contexts is still going to be a very important thing. And stitching together different types of AI output, and building pipelines to more efficiently process that, are also going to be very critical. But we might be leaving the LeetCode era. I will say this: amplifiers, while not currently the most in demand across all the different hierarchies of AI safety organizations or teams, are, I think, probably in the next year or two going to be the most in demand. But that's based on my predictions about AI progress. As you say, it could be slower; there could be jaggedness concerns that slow down this type of talent transition. But in general, it's never bad for your employability to spec out as a manager. Managers are very useful, and leadership traits in general make you a more useful, better employee. It's part of personal growth, I think, to take on some leadership roles.
B
What does supply and demand look like these days, I guess maybe even at the highest level? Going back to the origin story of MATS, my understanding was you said, geez, this AI safety thing seems like it's going to require a lot of people working on it in a lot of different roles, and this is not something that universities teach. So what's the on-ramp for people who would benefit from one? Where do they go? So you essentially created one, and of course there are some others out there too, but you've created one of the largest and most highly regarded ones. Where are we in terms of, like, are there a lot more jobs out there than MATS can produce fellows for? Yeah, you know, how do you think about that? I think we've gone back...
A
I feel like we've gone back and.
B
...forth a couple times, where at one time it was like, oh, we're super talent constrained. And then it was like, well, maybe not so much anymore. And now it's like there aren't actually a lot of roles for people to go into. So I feel like maybe I'm wrong on this, but I feel like this has sort of seesawed back and forth, and I don't know exactly where we are today.
A
So I'll start by saying I didn't found MATS. I didn't co-found MATS. I was in the pilot program as a participant. Oh, okay. That's...
B
Good Lord.
A
Yeah, yeah, yeah. There were like five of us who ended up doing the first research program. And it was a pilot: they didn't have open applications. It was just people nominated from what was the first AI Safety Fundamentals course, what's now BlueDot Impact. Right. And we did that. And the credit goes to Victor Warlop and Oliver Zhang, who is COO and co-founder at the Center for AI Safety. And I joined the team right after that program. And then Oliver left to co-found CAIS, and Christian joined on as well. And then Victor left shortly thereafter. And I would say that I scaled MATS. That's my contribution. You're an early funder. Yeah. And, you know, Christian and I kind of refounded it, in that we formed a separate 501(c)(3) a couple years after that because we got too big for our fiscal sponsor. So.
B
Yeah.
A
So I'll take credit for scaling MATS and for being the driving force behind strategy since, I guess, mid-2022, I believe. But, okay, in regards to talent needs. Yeah, that's a good question. Sorry, actually, tell me the exact wording of the question again.
B
Well, yeah. What's the balance of supply and demand? This might not be right, I mean, you can correct me on this too, but I've had this sense at times where people have sort of said there's so much demand for this kind of talent, like, where is it? You know, we're talent constrained. But then other times I have heard from people that now people are rushing into the field and there aren't actually so many roles available, and so people are kind of frustrated. But I don't know where we are right now in that back and forth.
A
Yeah. So, okay, I'll say this. AI safety is a field where there are always going to be jobs for the best people. Okay. If you're a cracked coder, you can get a job in AI safety. Like, the Anthropic alignment science team is growing at 3x per year. Okay. They're trying to scale fast. FAR AI, a nonprofit: 2x per year. Okay. And MATS itself, we've been growing 2x per year over our entire history. So these teams are scaling fast, and many more are getting founded. Open Phil has... sorry, Coefficient Giving has huge amounts of grant money to spend on this stuff. Right. There are like a dozen AI safety focused VC firms out there to fund your for-profit. There are incubators like Catalyze Impact, Seldon Labs; I believe Constellation has one now. There are tons of programs like MATS. I think the problem is that once you have built an organization, especially if you're scaling very fast, right, that hits a certain size, the main constraint becomes: is this person good enough to warrant the extra management overhead? Can they take on some management responsibilities? So you have this situation, like at OpenAI, where people are managing like 10 to 20 individuals; I believe one person in Anthropic alignment science had 18 reports. So they're really flat. So you have this real problem where you need to hire people who can quickly ascend the ranks and be research leads, be managers and so on, even PIs of new teams. And that is the limiting constraint. Okay. And that's the reason why a lot of people do some moderate reskilling and then can't get hired. Because I think for these jobs, there are many, many opportunities. But what we find is, when we talk to these hiring managers, they say: we find it extremely hard to hire. We have the money, right? We have the clear need. But people are not at our bar. And that's what MATS is trying to do: to get people up to that bar. 
And there's some technical skills element to that, right? There's also just some actual research experience. So people who come into MATS with prior research experience do so much better on average than people who have less research experience. So I think a strong option for many people is just: stay in academia, get your bachelor's, get your PhD. For other people, maybe they should go off and found a company. There's tons of money and directions for AI safety companies. I think founders are strongly needed in this ecosystem, and then you can create opportunities for more people to get hired. But I'll say as well, another thing MATS is trying to provide is credibility. I wouldn't say formal accreditation, but some sense that they have the reference from their mentor, who's a senior researcher in the field. You have the exhaustive MATS selection process, which is trying very hard to find people who are good. And then you have your proof of actual research impact. You produce a paper, right? That's your name on it. Perhaps you get to publish at a conference, or it's on arXiv and people are talking about it. So that's what people need to get employed these days, right? You need to have an actual great output, some sort of deliverable you can point to that shows your name, maybe several. Right. You have to be technically good enough at coding or using AI systems, whatever's required. You need references from people that are trusted. Otherwise it's just very hard to get ahead. The same story you see in every talent-constrained job market is here.
B
How does that translate to experience profile? This is obviously a big question in the broader technology world: are junior programmers an endangered species? We see very prominent examples like Neel Nanda and Chris Olah, who broke into the field at a super young age, also kind of defined it in a way, and are still quite young, actually, even today. And that may lead people to think that this is a young person's game. But what you're describing sounds more like post-PhD, or sort of, you know, somebody who's kind of grown up in an organization to an extent. I'm thinking of Rajiv from the AI Underwriting Company, who was at McKinsey for a number of years and has now co-founded this organization, but came to it with a ton of experience and sophistication in terms of management, leadership, all that kind of stakeholder management stuff. What do you see in terms of, like, is there a lot of opportunity for people straight out of college, or are they kind of barking up the wrong tree if they want to go directly from undergrad into this space?
A
So, I mean, the median MATS fellow is 27. Okay. So there's somewhat of a log-normal distribution, a long tail. I think the oldest one last cohort was like 55 or 60. So there are people of all ages applying to MATS. The youngest person is of course 18, because we can't take minors. Now, okay, more statistics. Right. 20% of MATS fellows are undergrads. They have no bachelor's degree. Okay. Or perhaps some of them haven't even applied for a bachelor's. Right. They're just cracked engineers. About 15% have PhDs already in the bag. So at least as far as MATS is concerned, as this accelerant, reskilling, retraining, internship, mentorship program, whatever, you're getting a broad distribution of people. Now, I think that there is obviously huge demand for people with more experience. A second critical thing is that they have experience with the latest tools. And because these tools haven't existed very long, young people have a strong chance of being the people who are particularly good at using these tools, because they've just been constantly on the cusp of things. They haven't been sitting in a cubicle not using Claude Code every day. So to that extent, young people have a huge chance. But it is the case that you do gain valuable knowledge from working on the job, especially in a great team producing great papers, that you can't replicate. You've pointed to some of what I would call prodigies, you know, Chris Olah, Neel Nanda; there's plenty of people of that ilk who have come through MATS. No one, I think... actually, that's not true. There actually are some people who I would put in a similar class, like Marius Hobbhahn, et cetera. And in that case, our main job is just to get out of their way. And I think that if you're that kind of person, don't let anything hold you back. Apply to MATS, apply for grant funding, do whatever, come to the Bay, go to London, and just make it happen. You will find your path. 
If this is maybe not your path, and especially if you're a more senior researcher, or perhaps a person who's like, man, I can't conceive of that, I just want to finish my undergrad degree and do a PhD: that's fine too. People of every single walk of life have passed through MATS and got hired, done other programs and got hired, founded companies, et cetera. It's hard to tailor advice to a myriad of different types of people, but I would just say: focus on your technical skills, focus on understanding the frontier of technology, and yet don't be limited by the opportunities you see on job boards. Right. You can create your own opportunities. You can cold email companies, you can apply to grant funders with just some random grant proposal you put together because it fascinates you deeply. You can call up hiring managers and stuff.
B
I mean, when you describe the range, it is a pretty broad distribution, and that tells me that you trust your own ability to discern who's going to be good more than you trust outside signals. So maybe tell us what you are actually using to assess people, and this could be translated into practical advice in terms of how somebody makes an application stand out. But what are you looking for that allows you to take somebody in their 50s or somebody who's 18 and feel like you can read what really matters, regardless of what their background is?
A
Yeah, so, I mean, we do some of the standard stuff that you would see at other tech companies, right? Like, we have CV review, we have some CodeSignal tests, so brush up on your coding skills and so on. And they do detect AI use. We are of course considering ways to allow for tests that include AI use, but these are obviously harder, right? They're harder to design, they're harder to check and so on. But yeah, that's part of our general application now, for some streams. I'll say this about mentor selection, sorry, scholar selection: we're trying very hard to provide something like a service to mentors. So if a mentor says to us, I don't want to do CV review, I don't want to do CodeSignal, I just have this selection problem that I want applicants to work on, and I want you guys to help me evaluate this, build me a team of contractors or some automated evaluation process to do first-pass screening, and then we'll go from there. That's our favorite kind of evaluation in some ways, because we know that is as close as possible to the actual job, the actual research, as we can get. Of course, typically in Neel Nanda's case it's like: go away and do a 10-hour mech interp kind of pseudo work test and then present your results to me. You can use AI, do whatever, just find something interesting. And this is great, because then we get great results. For some other streams it's harder to do this, it's harder to administer, and so we do rely on some proxies that are perhaps less specific than ideal, but I think are no worse than what anyone else in the industry is doing. And of course, the way you stand out is obviously going to depend on the specific mentor, because MATS is very heterogeneous in that respect. Right. The best way to apply to Neel Nanda's stream is going to be vastly different than applying to Ethan Perez and the Anthropic megastream. 
But in general, you want to really understand your basics about AI safety. So do a BlueDot course, because there may be some critical knowledge or a paper that, if you haven't read it, you don't understand. If you don't understand what deceptive alignment is, that might be really bad for getting into an Ethan Perez or Buck Shlegeris kind of control research stream. For an interp stream it probably doesn't matter as much, unless of course you're dealing with deception in your interp thing. So make sure you understand your basics. Make sure that if you're applying to a stream that is empirics-heavy, you can do CodeSignal tests, you can code, including without AI assistance, at least for the time being. And it doesn't hurt to apply to other programs as well. MATS is far from the only program out there now. This is not like the early days. There are so many great research programs out there: Pivotal, ERA, PIBBSS, LASR Labs, SPAR, ARENA for technical. I think Astra is now running again. Yeah, there's tons of great programs out there, and that can really boost your CV. If you have experience in the kind of research that you want to do at MATS already, then so much the better. Consider it like a postdoc opportunity or something, or a post-research opportunity. Build your own independent projects. Yeah, sorry if that's too much advice to be actionable.
B
Yeah, I think it boils down to: a tangible product is king. Right? I mean, I say that always in the AI engineering world as well. And I'm far from the world's leading expert on how to break into that space, but what I always tell people if they ask me is that a working demo is kind of the coin of the realm. People might be interested in what you have to say, but they really want to see that you can make something work. They want to see it online. It could be a Replit, or it could even be a Colab notebook or something. But you've got to make something that can work. And it sounds like this is a pretty similar worldview. You've got to show that you can get in there and make something happen; as you put it, with Neel's track in particular, find something interesting. If you can do that, we might have something to talk about. One thing that jumps out as maybe not as emphasized as I would have thought is being in command of current research. At this point, really nobody can keep up with all the current research, because that exponential has gotten away from all feeble human minds, maybe with a few hyperlexics that can still keep up. But I have found keeping up with research feels important to me. It feels like an important part of how I stay conversant with people across a lot of different areas. But obviously what I'm doing in trying to be conversant with people across a lot of different areas is not the same thing as research. How much emphasis do you think mentors in general put on being on top of the literature, so to speak?
A
It varies. So some of the mentors will ask questions like, I don't know, what do you think about X concept? Others won't be as interested. Obviously, as you say, right, these costly signals are the most important thing: have you done good research? Do you have a deliverable, like a product? Do you have a strong reference from an important person? That's also key. Have you done your homework in terms of the BlueDot course and other things? Right. I think that MATS selection doesn't currently emphasize breadth of knowledge very much, mostly because mentors don't necessarily want that. And I think that this is maybe a weakness in our process to some extent, if we don't then help people build that breadth. But we do: we have seminar programs, we have tons of opportunities for intermingling between different research streams, which really rapidly builds a breadth of knowledge. And we used to have discussion groups, and these still occur occasionally, with workshops and so on. So I would say I really do encourage everyone to do a basic BlueDot course or equivalent; AI Safety Atlas and CAIS have good courses as well. But this is not as required for selection. It's more just to prevent you from entering MATS and then starting a research project and realizing, oh crap, I have no idea where the gaps are. I don't understand how my work fits into anything. How do I get funding after MATS? How do I get a job? How do I choose a good original research direction? So it's more for your ability to actually deliver within the program, right, tracking research, and less to do with your ability to get in at the moment. Which is pretty important, because MATS is just a stepping stone. Right? 
If you do MATS and then you don't produce a great deliverable by the end of MATS, sure, it's a great thing on your resume, but it's not going to be enough in many cases, you know, because it's such a competitive environment. So yeah, I think it's really good for people to build a shallow but broad understanding of the literature. So I would recommend: don't be checking X constantly for new papers, blah blah blah, unless they're in your field. Maybe set up some Google Scholar alerts for interp, if that's your thing. But more, every so often, do a periodic deep dive into what all the cool updates across different fields are, you know what I mean? And you can do this by, every year, looking at the new BlueDot course, or every month looking at some research roundups or highlights, like Zvi's newsletter or Transformer. And there are other people you can follow on X. That's my main recommendation.
B
Your admissions rate is super low, right? I don't want to... we want to encourage people to apply, but it is a very selective program. What does the funnel look like in terms of applied versus selected? I don't know if there are intermediate steps that would make sense to talk about. And then I think the good news, though, is that if you do get into the program, your success rate on the other end, of getting into the field in a professional, W2-employee-status sort of way, is really high. Do you want to run us through those numbers?
A
Yeah. So last program, I think we accepted around 5% of people, maybe 4%, who applied in our initial intake form. There was a subsequent process that they had to complete, which is applying to specific mentors and streams, and I think we accepted somewhere around 6.5 to 7% of those people. So a bit higher; that maybe is the figure to focus on, somewhere around 7%, let's say. Now that's better than people think, right? I think the Anthropic Fellows program, for example, is like 2%, because Anthropic is a big name. Right. But MATS is larger; we have more diversity and more spots and so on. And I think people should also just treat the application process as a learning experience in general. We try to make it that way; some streams are going to be more painful than others. But I think for the streams like Neel Nanda's, where you spend 10 hours working on a project, you then have something really cool for your GitHub, and that can only help.
B
And if you don't like doing that, you're not going to like working with Neil anyway. I would imagine.
A
Yeah, definitely. Definitely the case. I do think it is unfortunate that there aren't easy ways to do credit assignment cheaply, to find the best people without them spending a bunch of time. But I don't know. I know job interviews for top tech companies like OpenAI, Anthropic and GDM can last three to six months or something. You've got so many things to do before you finally get the yes or no. So we definitely aren't that involved. Right. It's a much shorter process, and I think that's because the commitment is less, you know, on our end. Right, we're not giving people W2s. MATS is an independent research program. They get grants from a third party. We provide the housing, the office, the mentorship, the community, but we don't sign people on for any type of employment, which I think is part of the appeal as well. So that's the main statistic there: 7%. At the other end, about 75% of our accepted fellows go on to our extension phase. So, for our first three-month program: 7% get in, 75% go on for another six months, maybe even 12 months in some cases. And that extension phase is where a lot of great follow-up research happens. And over our history we've had 446 fellows in total, not including people who've done training programs that we've helped facilitate, which are probably another 200 to 300. Of those 446, 80% have gone on to get permanent jobs in AI safety, based on our latest statistics. So that's great. I think 98% are employed in some capacity now. Of those 80%, not all are W2s. Right. Some of them are independent researchers with grant funding from Coefficient Giving or the LTFF or something, which I think is a fine situation. Right. And then in terms of the actual field growth, there are some statistics I can share. 
So it seems, based on Stephen McAleese's LessWrong investigation, that the growth rate of the AI safety field is something like an extra 25% per year, which is kind of interesting. Right. It does seem to be growing exponentially, as far as we can tell. Now, that is a lot less than the growth rate of MATS applications, which are going up somewhere between 1.4 and 1.7x per year, depending upon how you slice it. And mentor applications might be increasing around 1.5x per year. So there's a big disconnect. And according to BlueDot Impact, I believe their growth rate is something like 370% per year in terms of applications to their programs. So there are, yeah, some large disconnects. So a lot of people are applying to BlueDot that can't go on at that rate; that's just way too fast. But that's probably because they've done amazing advertising and marketing stuff. MATS has only just started to do advertising and marketing. We had our first ever open round of mentor applications launch just recently. And yeah, we sponsored NeurIPS. That was cool. And we sponsored your podcast and several other great venues as well. And I think this is only going to cause the application trend to continue. I would guess 1.5x per year, something like that, which is a faster growth rate than the current growth rate of the field. As to why, I could speculate: it's probably just caused by a high bar, a very high bar, at a lot of these companies, and maybe a deficit of founders as well. And there are plenty of organizations working to remedy that. I know there was this AI assurance technology report from Juniper Ventures about a year ago, where they predicted that the size of the market for AI assurance technologies is doubling each year. So there is a lot of opportunity to do stuff that might contribute to AI safety.
B
What does the salary distribution look like for the people that are getting jobs? Like how much of an alignment tax, if any, do people pay in the salary department?
A
I mean, at the frontier labs, no tax at all. You're getting paid the same rates. Yeah, they're getting staggering amounts of money. I think a couple years ago the going rate for someone joining off the street was like 370k or something. I'm sure it's much higher now, especially given all the crazy Meta stuff that happened. I mean, I would bet mid-level and higher people are making over a million, even on the safety teams at these labs. But I don't have any private data on that. But yeah, if you join as a junior software engineer, don't be surprised if you get somewhere around 350k or something. Nonprofits, obviously, it's lower. Right. They can't compete with equity; they don't have any equity. But they also typically have less funding. Coefficient Giving's pockets aren't as deep as, you know, the collective might of US venture capitalism. And I think there is also something like a nonprofit tax. I wouldn't say there's a safety tax; I'll say there's a nonprofit tax. Right. Because there are nonprofits doing AI capabilities stuff, like the Allen Institute for AI, and people there still make a lot of money, right? This is artificial intelligence. Right. And Coefficient Giving understands, and other funders understand, that you have to pay to play. So you have organizations like METR that are, I believe, offering quite a lot of money for their roles. Upwards of 300k for most roles; probably over a million for some, I would dare say. So there are nonprofits that are really trying to compete for talent. They can't offer anything like the frontier lab salaries, including equity, but they're trying their best. And I think this is kind of reasonable, but it also is a bit of an insane moment. MATS salaries are not anywhere near that high. Maybe we're doing the wrong thing, I don't know. 
Yeah, and I think there are other AI safety nonprofits that have tried very different strategies. Like, I think the going rate for FAR AI's research scientists is something like 100 to 170K, so significantly lower. They might have actually improved that recently. So there is a wide spectrum here, and it really depends on the compensation policy of the organization. But you will see very well-funded nonprofits offering comparable salaries to at least some junior AI company roles.
B
What about in terms of compute? I know you guys have, in addition to the stipend that folks get as fellows, a compute grant, and I believe it's $12,000 worth of compute. I'm interested in what form that takes. Is it just a Brex card that you can go spend on compute wherever you want, or do you guys have established compute partners that you work with that, you know, serve your fellows well? How often is that enough? Are there times when people find that they need more compute to do what they want to do? And then, if they go work at a nonprofit, how compute-rich or poor are the nonprofits?
A
Yeah, so, I mean, MATS offers a 12K, I'll say, budget. We don't give people a card that says here's 12K. No, they have to justify their compute spending. They have to have an actual project and proposal that they are going to spend that on, that necessitates it. And most people don't spend anywhere near that much, which is good. We budgeted as if they could, but we really don't want to just waste money on compute. Basically no one has compute limitations. Rarely, people have needed more than that; we've considered their proposal, thought hard, and reallocated funds as needed. And yeah, I think people aren't limited by compute at MATS in general. I think the way we do it is pretty good, in that, for our model API calls and all that, we have specific organization accounts that we sign people up for, you know, and then we give them a budget and top them up as necessary. We do have our own MATS cluster as well that we maintain online, so our compute team handles that. But typically people tend to use RunPod and other types of self-service things. It depends on the kind of research. There are some types of research that work well on our cluster, and there are some types of research where, even with the benefit our compute team can provide in setting up and maintaining the cluster, the experiments are so customized and they need to tinker with so many things that we just give them a budget and let them use online providers. It's just better that way. We have used other providers in the past, but that's our current setup, and we're looking at putting together a customized Claude Code kind of suite as well.
B
Meaning, like, building out a bunch of tools, or MCP-type stuff?
A
Yeah — not MCP at this point, I don't think.
B
Well, that's an idea.
A
Yeah, you're right — there could be. Actually, there's a lot of data in our databases. We could probably put together something really useful at some point, but we haven't thought about that.
B
Yeah, just the archive of everything that's been tried would be pretty fascinating to do some agentic search over.
A
I mean, our new research database is online — matsprogram.org/research — you can see everything there. We have a Google Scholar profile, but that's just all the papers that got published. You can also see our LessWrong blog posts under the MATS Program tag. But many more research artifacts have been produced than are visible there.
B
You said the word "tinker" — is Tinker, the Thinking Machines API, growing in prominence in terms of what people are finding attractive to use?
A
Yeah, many people are wanting to use the Thinking Machines API. And I'll say many organizations like to donate API credits, which is awesome, because we can really use that.
B
Yeah. Cool. Is there anything else that we should cover? We know January 18th is important as a date to keep in mind. What other facts should we make sure we touch on?
A
MATS is growing. We're always hiring. If you want to work on our team and help grow the next generation of people — if you fancy yourself an amplifier of sorts, and you have people skills and research skills — we'd love to hear from you. Go to our website, matsprogram.org/careers. We're taking on mentors as well; there's an application form in the mentor section of our website. And participants: we're going to run three programs this year — not one, not two, but three. That's a summer program, a fall program, and then a winter program starting into next year, and I'm super excited. We also have plenty of other offerings in the works; we're considering a one-to-two-year residency program for senior researchers as well. More on that to come.
B
Yeah. Cool. Are you taking any connectors? If I'm a connector type, or want to become one, is MATS a way to find my way there, or not really?
A
Many have. I would call Jesse Hoogland one such person. I would call Paul Riechers one as well — so Timaeus and Simplex. I'd say Marius Hobbhahn too, to some extent, with his deception evals work. And probably dozens of people — I'm just sharing the names that come more easily to mind; many, many people have come through MATS. We're super open to individuals who have this kind of archetype. Note what a connector is, right: they have empirical skills, they have theoretical skills, so they could probably succeed in a bunch of different ways, but they're uniquely specced to connect those two things. Now, there are some mentors and projects that are much more suited to this kind of thing than others — people like Richard Ngo, and historically Evan Hubinger. I think Evan Hubinger has probably been the most dominant connector driving force at MATS over our time, but he's not a mentor in the next program, unfortunately; he doesn't have time. But there are many different opportunities at MATS for this kind of thing — I think even in some of the interpretability streams. It's very possible to enter an interpretability stream and bring with you some model of the kind of theory-based interpretability mechanism or strategy you want to pursue, and then see it executed on. That's happened several times.
B
One of the things that I took note of in the blog post from 18 months or so ago was a comment you made that funders basically don't want to — or are much more inclined to — support the growth of organizations that they see as legible, that have research directions that feel somewhat established or that they can wrap their heads around, and they're much more reluctant to fund totally new conceptual directions. That seems to exist in contrast with the AE Studio survey, where they basically found that the field as a whole seems to think we don't have all the ideas we need, and that more far-out ideas should be tried — which of course led to their "neglected approaches" approach. What do you make of that? Is there stuff that we can do, or is it a different organization's job to figure out how to fill that gap? Because I do feel like I want some more, and I love some of the AE Studio stuff, including self-other overlap — I always come back to that as an example of something that's quite off the map of what most people are doing. When I think of AI control and what Buck and the Redwood Research team are doing, I find that stuff fascinating. And one of the things that impresses me most is that they're willing to work on something that in some ways is so depressing. They're like, "we're going to try to figure out how to work with AIs even assuming they're out to get us," and I'm like, yikes — I don't know that I'd be able to sustain a positive enough attitude to do that if I was working from that premise. I do feel like there's a relative dearth of things that are more inspiring here. I think maybe of, along with AE Studio, also Softmax — obviously people have a lot of different opinions on whether these things are ever going to work. But I wonder what your take is on the overall mix.
It seems like a lot of things are more toward "patch the holes, keep the AI down, tempt it, see if it'll take the temptation, and then patch it if it does," and there's not nearly as much that is a more colorful, positive vision for the future. I wish there was, but maybe that's just not happening because the ideas are too hard to come by; maybe it's not happening because the funders aren't bold enough. What's your take — should we be trying to get more of that stuff, and if so, how might we go about it?
A
I have many takes here. So, obviously I advocate a portfolio, and MATS has historically sponsored a bunch of such projects. Self-other overlap — that project came out of a MATS alum: Marc Carauleanu (I might have messed up his name) was the originator of that project at AE Studio. And I believe Cameron Berg, another MATS 1.0 alum alongside me, is running some of their more neuroscience-inspired approaches as well. So AE Studio is great; I love what they're doing. I think the survey they did of LessWrong is probably not representative of the AI safety research field as a whole, but it might be. Even so, I think we obviously need more ideas, because more ideas are good, right? More bets are good; more shots on goal are good. Now, I would not advocate that a person who is a very strong iterator drop that and try to think up some new paradigm — I think that would be strictly counterproductive on the margin, because we do have some very strong central research bets that need more people pursuing them, because they will yield demonstrable results. But if everyone did that, this would be bad, because you need to have your portfolio. Maybe these approaches fail; maybe they need other pieces to work. Many AI safety research agendas are contingent on other things going right, or on other people working on other stuff. It's like any research field: you need everyone advancing the frontier. I think AI safety has historically gone really argmax-y on different agendas, which is bad; a portfolio approach is much better. Don't rule things out as possible directions — just shift and reallocate resources among them. To their credit, Coefficient Giving has done an amazing job, particularly recently, at supporting a bunch of different novel research bets.
They've also funded PIBBSS — Principles of Intelligent Behavior in Biological and Social Systems — a program that is trying very hard to pursue sort of moonshotty interpretability and agency-understanding projects. So they're great; check them out. I think more ideas would be good. I think the kind of person who should be pursuing that is typically going to look like someone who is already a domain expert in some other area. You are occasionally going to have your Buck Shlegerises and your Evan Hubingers, right — people who come along with no PhD but spent years at MIRI, incubating in that deep, rich AI safety experience, and then come out with amazing stuff like Risks from Learned Optimization and AI control and all that. But short of having access to that type of community and research experience, I think most of the prominent connectors, like your Alex Turners, have spent a lot of time in research science PhDs — and on LessWrong, of course — incubating in that environment as well. So I think MATS is a great way for that kind of person to develop and to spawn more research ideas. In fact — to shout out Alex Turner — he has come up with some amazing research ideas over his time at MATS, and I think we've been very fortunate to support him: things like gradient routing, with Alex Cloud, another MATS mentor, and plenty of other things like activation engineering and steering — he was one of the people involved in that. So I think senior, experienced researchers are going to be, as with most things, the main drivers of new ideas. Grant funding that lets them pursue whatever their research taste dictates is great, and programs like MATS that let them staff their research agendas are also great. I also think bounty programs could work. But I would caution against people putting all their eggs in the basket of "we need a bunch of new ideas because the central ideas aren't working."
I don't think that's true. I think the central ideas are still our actual best bets.
B
Yeah, okay, makes sense. Do you want to shout out any other organizations that MATS fellows have gone to, or even started, that you think are underappreciated? This could be sort of assignment editing for me for future episodes, but also just things that you think people should be paying more attention to than they are.
A
Yeah, I mean, there are tons. You can see all the organizations listed on our website — there are so many amazing people there. It's hard to play favorites, because MATS has worked with so many people and we're trying to be very broad. But in terms of nonprofits specifically, because maybe they don't get as much attention: obviously Redwood and METR, and RAND, Apollo Research, FAR.AI, Goodfire, Truthful AI, LawZero, MIRI, plenty of others. I love these organizations. Frankly, we need more nonprofit research organizations, and if you think you could found one, give it a shot — though obviously you need a ton of research experience under your belt and very credible references and so on. Yeah, it's really hard to play favorites, Nathan.
B
Yeah — so many great options for organizations to shout out. It's a testament to how many fellows have already gone on to do impressive work. So great job by you guys in driving this and growing it over the last few years. And people should definitely apply if they want to be a fellow — January 18th is the deadline, and it's time to get into it if you want to make sure your application stands out. Anything else we should touch on before we break?
A
No, I just really appreciate this experience. Thank you so much for inviting me to talk.
B
My pleasure. Thank you for doing it, and keep up the great work. Ryan Kidd, Co-Executive Director at MATS, thank you for being part of the Cognitive Revolution.
A
Thanks, Nathan.
Podcast: Future of Life Institute Podcast (cross-posted from The Cognitive Revolution)
Host: Nathan Labenz (B)
Guest: Ryan Kidd (A), Co-Executive Director, MATS (ML Alignment & Theory Scholars)
Date: February 6, 2026
This episode delves into the landscape of AI safety, focusing on talent pipelines, practical alignment strategies, field progress, and labor market dynamics. Drawing on Ryan Kidd’s experience leading the MATS program — one of the largest AI safety research talent pipelines — the conversation surveys technical strategies, organizational priorities, and the evolving demand for researchers as AI races toward advanced capabilities.
Ryan Kidd provides a panoramic and candid view of the AI safety ecosystem. He emphasizes a broad, portfolio-based approach for both research agendas and talent development, acknowledges controversial dual-use dynamics, and stresses the necessity of both technical and governance solutions. There is room—and urgent need—for both pragmatic iteration and creative moonshots as the field accelerates. MATS continues to scale rapidly, seeking diverse applicants with technical substance, curiosity, and drive.
End of Summary