
Anthropic's co-founder and chief scientist Jared Kaplan discusses AI's rapid evolution, the shorter-than-expected timeline to human-level AI, and how Claude's "thinking time" feature represents a new frontier in AI reasoning capabilities.
Loading summary
A
You last year put forward the prospect of human level artificial intelligence by 2030.
B
If anything, I expect it probably sooner than 2030, probably more like in the next two to three years.
A
What would need to be true in terms of deep seat for it to propel itself beyond the capabilities of US frontier models?
B
There's so much low hanging fruit to collect that it's unpredictable. Who's going to sort of find which advances first? There's no reason why they can't be very competitive algorithmically.
A
What does it mean to be interpretable? If the machines are operating in spaces that make us look not so much like silverback gorillas, but like hamsters, one.
B
Of the directions we're moving is where you could have AI systems that think about what another version of Claude is doing in order to sort of monitor it and steer it in a good direction. So by the time you're at a point where AI is sort of as smart as people or beyond, you're able to leverage those smarter than human AIs.
A
The way in which these models impact economic productivity and the labor market could be much faster than the canal, electricity or the iPhone. What is the debate that ought to be happening that would most help us prepare for that kind of fast deployment scenario?
B
Is it really safe to have AI that is smarter than you? And I think that is a real question, like should we be having these super intelligent AI aliens kind of invading the Earth or should we decide not to?
A
I'm delighted that I've got Jared Kaplan, who's the co founder and chief scientist of Anthropic. Jared, it's great to have you here.
B
Thanks so much for having me. It's great to be here.
A
You last year put forward the prospect of human level artificial intelligence by 2030. Given everything that's happened since then, what's your current assessment?
B
I mean, I think if anything, and I mean, maybe I'm drinking too much of Aya Kool Aid, but I mean, if anything, I expect it probably sooner than 2030, probably more like in the next two to three years. But what is human level AI exactly? I mean, it's not something like an objective measure that you either cross the line or you don't. I think AI is just going to keep getting better in a lot of different ways.
A
So you've raised a really important question there, which is what is it? You know, it's not like landing two astronauts on the moon and bringing them back safely. That was very, very clear and well understood. What is the purpose of having a test that says human level AI?
B
Yeah, well, we don't really have any tests. I mean, at Anthropic, I mean, we have probably tens, hundreds of different tests and evaluations that we're running on Claude. And as time goes on, I think honestly the experience of working with Claude, collaborating with Claude and like, kind of getting productivity benefits from that is in some ways a better measure for how useful Claude is, I think, than any given test. I guess the way that I think about sort of how capable AI is is sort of on two, two axes. There's what environments can an AI actually go out and act in? So I go, I go back to sort of AlphaGo, which was this superhuman go playing program, better than any human, smarter than any person, but it was restricted to be on a go board, on a very, very restricted little grid that you can, you can, you can act on a very specific game. And as we sort of developed large language models, large multimodal models, the different environments that AI could interact in has grown. So, I mean, it grew a lot where you could just talk to chatbots like Claude, I think it's grown further where AI can understand images, it goes further when AI can use computers. And eventually, obviously the thing that we all imagine, the sort of sci fi thing, is AI being embodied in a robot that can, that can go out into the world. So I guess that's one. One of the directions I think of the other is just like, how complex are the things that AI can do? Like, can it do something that would take me a minute or 10 minutes or an hour or a day? And I think we're just going to keep moving in that direction. And that's where sort of AI has to sort of actually take actions in the world and learn things the way that we do to, to be useful in that way.
A
Yeah, I think that's a really helpful way of thinking about it. So if I play that back, one is like, what is the range and breadth of domains in which it can operate? And you know, of course we've gone from text to multimodal images, and of course that next boundary being the physical world. Although I always think that we do a lot of our most useful work in our heads anyway. And then the second one I think is so interesting. It's this idea of what is that unit of human time that the machine is able to operate on? Because the very early large language models, if you go back to, I guess it was Bert, right, they, they did tasks that were a second look at a sentence and find a noun. And then when you got the first versions of, I guess, GPT3, you had tasks that were. That could last maybe 10 seconds, look at a paragraph and pull out a sentence. And of course, if you go to Sonnet 3.7, I mean, I can give Sonnet 3.7 tasks that might take me hours. So here is 20,000 words. Distill out eight or nine of the key arguments, identify where they are coherent with each other and where they don't agree with each other. And that's a job that would take a graduate student half a day. And it's quite interesting to see the rapid progression of that sort of duration of task from these models. So I suppose one question I would have for you is, is that something that you track and you can forecast in your headspace 3.7, which is the latest Claude, how long can it operate for?
B
It's a great question. Yeah, no, that this is something that I track and it's definitely something, certainly it's something that I think about very actively and a lot of our research is oriented around this. I think there's a sort of. We talk about it as sort of the horizon that CLAUDE can operate on. I think the way that at least if you're a developer, and obviously not everyone, is that maybe this is most viscerable is with something like CLAUDE code, or you can ask CLAUDE to sort of search through a repository of code, make changes across all sorts of different features, and maybe iterate and test the code itself. So I think those kinds of capabilities feel like they're the most complex. As you said, a lot of what we do happens in our heads, and that's, that's true for me too. But I think that, like, the way that we really sort of get a purchase on the world is by trying things and see what works and see what doesn't. And so I think that's what sort of really allows you to sort of extend this horizon. It's definitely something I track. I mean, it's something that also. I remember people years ago who were AI enthusiasts talking about, well, maybe AI won't be able to do things that take longer and longer. And I think we are seeing this horizon expand and so the utility of AI goes up.
A
Why does the horizon expand? Is it more memory in the GPUs? Is it some bit of magic that you're tweaking?
B
It's a great question. So I think it's a few things. I think one. One aspect is just sort of the model intelligence, kind of, in a general sense, intelligence is going up. So the Model's able to attend to more different issues, to track more things. Another is sort of the context length. So the context length of our models keeps going up and we find that we can extrapolate it much further than anything that we've shipped yet. And so it should be possible for AI to understand more and more. Like to go from understanding like a paragraph to a chapter to a book to something much, much longer that's helping it. And then finally, I think that we're training AI using reinforcement learning to do more complex tasks in a sort of useful way. So we're training AI to do more complex coding tasks, to study longer documents exactly like the example you gave, and distill out more information. We're always sort of trying to find the tasks that push the envelope on what Claude can do and train it to get better, better there just as like in our own educations as people like, we're always trying to solve harder and harder problems as we get older and we progress from elementary school to high school to university. So I think it's all of those things together that are sort of pushing this envelope. Right.
A
But you know, when we think about large language models, a lot of the emphasis is on that first l, the large. And we, we lived in this regime of, of scaling laws where to some degree there was a predictability that if you 10x'd the size of a model, which would mean 10 times as much data, 10 times as much compute, 10 times more complexity, in the end you got this sort of predictable linear improvement in how well the model worked. And the argument has been that that kind of scaling is getting harder and harder. Either it's that we're running out of data, or it's really expensive, or it's actually just really complex. I mean, you've been at the front line of that. When you look at that pre training scaling, what has been the bit that has started to put the brakes on the rate at which we're seeing results from it?
B
Maybe, maybe zooming out for a second. I think there's sort of scaling laws as a quite precise. And that was what was so surprising about them, very precise empirical finding just from studying AI and AI training. And that was you, you, you put it beautifully that if you increase the size of neural networks, the number of parameters they have, if you increase the amount of data, if you increase the amount of compute you use to train, then you get these stunningly predictive curves for how well the AI can model its data, can make its quote, unquote loss go down what this really means is how well can large language models predict the next word in a sentence, paragraph, document, et cetera. And that just very, very precisely improves as you, as you scale up. And we haven't seen any limits to that. I mean we are seeing that. I think as you make models bigger, as long as you have all of these ingredients that you mentioned, model size, compute and data, you still get improvements. I think probably the limiting factor that people talk about the most for good reason, is data. Eventually one is going to run out of data. I actually don't know that we have reached that point yet. I guess we will see, but I do think eventually in the next couple of years we'll reach that point. Certainly cost also matters, but there are all these different other ingredients I think that are driving costs down. I think we're finding algorithmic improvements that make model training much more efficient. And we're also seeing that hardware is improving very quickly as well. And so costs are going down for those reasons. So I think the sort of scaling is going to continue. Now there's a separate question. There's this very nice empirical statement that the AI can model its data better, but that doesn't necessarily mean it's more useful for you. That doesn't necessarily mean it's more useful as, as Claude. I think generally, generally it has that implication, but it's much less precise. And so it's possible that sort of the gains that we get in what AI can do for you will come more from training it to do useful tasks after pre training rather than pure scale. Right, Right.
A
Okay, I want to talk about the after, after pre training in a second, but it's been, I think one year and ten days since Claud 3 was released. So belated happy birthday to Claude 3, which I think became everyone's most personable LLM a year ago. One of the things that we've seen happen in the full AI stack is that the generation time has got shorter and shorter. So semiconductors used to be on a three year cycle. Jensen Huang has put them on a one year cycle. AMD is responding in a similar sort of way and there has been an acceleration, but we're over a year since Claude 3 came out. So what is the right time between generations for these large language models and what should we expect as consumers on the other end?
B
I think that the generation time for models has been really, really fast, at least to me it feels fast. And I think that's, that's basically going to continue. So I think that we should expect a new generation of Claude models in not, not too long, certainly in the next six months or so. And I think that basically that's going to continue. And it's both because we're improving sort of post training or reinforcement learning training plot on more tests and because I think we're, we're, we're able to improve the efficiency and intelligence from, from pre training. So I think that's not slowing down anytime soon. I think in some ways the model cycle is even faster than the hardware cycle. We'll see if the hardware cycle is really one year, but it's definitely moving quickly and we're getting new chips sort of as we speak.
A
There was that very, very fascinating and challenging paper written by a young man called Leopold Aschenbrenner last year, which I'm sure you would have read. And in that he had a two year generation time between these sort of order of magnitude improvements in models. And I remember reading that and I was thinking, I don't think it can be two years because honestly it takes time to build a data center to get the chips from Nvidia and to find the power and to generate the data. Assuming you needed synthetic data in some cases. When you look back at that now with a bit of distance, you know, how would you communicate that sense of practical generations in terms of how we should expect as consumers, as members of society, these models to improve? Is it a three year clock cycle or is it going to be a two year one?
B
I guess I think of it as being smaller than two years, but that's maybe because we're iterating very, very quickly. So I would say that there's some pre training life cycle, but that's usually measured in months rather than years. Now there's a question of research. How quickly can researchers come up with new innovations that are sort of worthwhile with shipping, but I think of that as being, being quite, quite a bit shorter than a year and then with reinforcement learning. I think that historically that's been much less compute, much less resource intensive than training. Although that may be maybe changing now. And for that I think we can iterate much more quickly. So I think that there's sort of a desire at least as we develop Claude to every time we think that there's like a significant improvement that we can deliver in Claude and that might be for any of those reasons because of pre training improvements or because we've just realized that we can train Claude on some new task like what goes into Claude code and that will be useful to people. I think we're expecting to ship it. So I think it's really, it's more of a continuum. It's more like. Like you could ask how quickly does does Moore's Law develop? And it's really more of a continuum where it's like, I don't know, it's like maybe it doubles every 18 months or something like that. I think with AI it's faster than that. I don't know how viscerally that feels when you're playing with each generation of models. But I mean, you can tell me. I mean, you've been playing with Claude 3.7 sonnet. You played with Claude 3 opus. Like I don't know what you feel is the biggest thing I would know.
A
I mean, it's completely wild. The rate with which we have to update our behaviors as somebody who uses these tools is really, really astonishing. And I find myself coming up with something. You guys introduced something called Claude projects a while ago. And so I built lots of projects to help in my research work then as the models got better. And of course I use every family of the models. I use the OpenAI ones and the perplexity and u.com and lots of other ones all at the same time. And I also often use them adversarially because they all have slightly different flavors. I used to think of Claude as being that really super smart history grad student when you were an undergrad, kind of charming, knew a lot of stuff and was never cleverer than thou. And I would think of another model being a bit like that nerdy mathematician who always had to get it right, you know which one I'm referring to there. And so there was this difference in personalities. But I have found that the rate with which they feel like they're getting better means that I almost don't document my changes in behavior. I'm just literally living it through practice. And yes, you're right. So I think it feels very fast. So I want to come to this other question, which is this phrase that Satya, Nadella and of course Jensen has now started to use quite a lot from the middle of last year, which was test time scaling or inference time scaling. What is it and how big a deal is it?
B
I think it's a big deal. So the, the claim here is that as you let an AI model think for longer, then you can get predictable improvements in the accuracy when it's doing a hard task where just say pure thought, improved performance. So I think the classic example is like solving a really hard math contest problem or a Competition coding problem. What we see is that as you let literally like say Claude 307 son think for say a thousand words or 2000 words or 4000 words or 8000 words or 16,000 words, you get sort of predictable improvements in which each doubling of the amount of time Claude can think, you get sort of a constant increase, increase in performance. And you can actually also see that extended in other ways. Like you can train a separate AI model or you can even just ask plot itself to analyze the solution. But you can, you can train it to decide sort of which possible solution to a problem is best. And then in parallel you could generate, you could generate one solution, two solutions, four, eight, et cetera, and you can ask it to choose which is the best of those of the ones that have been generated in parallel after the fact. And again, I think you tend to see pretty clean scaling where you can get better and better performance that way. And so I think that for very difficult tasks, this means that you can either choose to have a smarter model, sort of solve it in a shorter amount of time, or you could ask a smaller model to work longer and maybe get the same performance. And so I think this is exciting because for the very, very hard tasks that you might want AI to solve, maybe if you just kind of throw enough test time, compute at it, you can solve them. I mean, I think the things that we imagine in the future, things like helping to cure diseases or making new breakthroughs in theoretical physics, things like that.
A
Right. So the additional thinking, this doubling of time spent thinking, gives you this predictable improvement. I noticed in the new version of Claude 3.7 Sonnet, you've got this, this thinking time and it's just a little feature that I flag that I tick. Is there some way of looking at my query? And then you make a judgment of how much thinking should, should go into that type of a query to give me a sufficiently improved response rather than have me sit around waiting for forever.
B
Yeah, indeed. So, I mean, Claude 3.7 sonnet is the first hybrid sort of reasoning model where it can act just like Claude 3.6 sonnet or 3.5 sod it new. I mean, we're great at naming. Your naming is.
A
Yes. Have you thought of using AI to.
B
Help with your naming a very, very primitive or like. Yeah, we're using like GPT2 to name our models. But yeah, yeah, Claude 3.7 Sonnet can behave very similarly to prior generations where it doesn't think at all, or you can ask it to think and it tries to sort of decide from its training how much thinking to do. And sometimes you wish it would think a little bit more, sometimes a little bit less, but, but basically based on sort of the difficulty of the task you assign to it, it will think the amount that it expects is best. And that is something that we're working on and we think will get better over time, is that it should become possible, kind of let it think as much as it wants, but kind of get a response in whatever the most reasonable timeframe is. And I guess the question is it just like, I don't know, if you start a new job and your boss gives you something hard to do, you might really want to spend a lot of time thinking because you really want to get the right answer, you don't want to get fired. But on the other hand, in some situations, maybe once you're comfortable at your new job, you might, you might feel like, oh, I'm just going to give a quick answer, like, we have a good rel. So I think Claude is sort of in the same space where it doesn't know. Like, am I expected to try really, really hard to get the best answer or should I just, am I wasting someone's time? And so that's something that we want Claude to sort of learn from context over time, try to bake a little bit of that in. But, but hopefully that will improve with, with future generations.
A
But architecturally, is it a pre processing step where the query comes in? You do some kind of a process, you make some kind of judgment, you send a parameter to the underlying model to say, think for x thousand tokens or think for 2x thousand tokens. Or is it all in the single model?
B
It's all in the single model. If you're a developer, you can specify precisely sort of what budget Claude gets, right? Sort of 99 plus percent of the time it will sort of stay within that budget. And some of a lot of the time it will actually undershoot that budget quite a bit. So you'll say you can think for 16,000 words, but it'll only think for 4,000 or something. And it's using its own judgment in deciding, should I use all the space I have or not? As a user on cloud AI, I think there are sort of fewer parameters that you can set just to keep things simple. But it's all one model that's sort of deciding based on what you've asked, how much to think.
A
For a lot of people, this thinking time experience exploded into their understanding. Inauguration day this year, January 20, because of deep seq R1. How big a deal was that in Anthropic?
B
I had been following Deepseek's progress for at least sort of a year, year and a half because they've been writing papers and improving their models. So it wasn't actually very surprising. Very surprising to me or to Anthropic. It was interesting to see though, like the reaction sort of globally of, wow, China has this, this, this great model. And there are people that I talked to in the US who, who've thought historically maybe China's many years behind. And seeing Deep Sea's progress in the papers they were writing, I kind of thought, well, they're like, I don't know, maybe they're six months behind, but they're, they're not very far behind.
A
Yeah, yeah, I mean it was, it was apparent with what was called V3, which was I think October 2024, where they were achieving GPT four level capability at 30, 50 times cheaper. And then I think the earlier paper was maybe November 2023 with version 2, and you were starting to see some pretty good results from the outside. I suppose, partly. We're often looking at that chart that shows the gap between the top ELO rated models. So an ELO rating is like a chess like rating for how good models are in chat. I know you know this, Jared, just for people listening and you know, you see the Frontier is a bunch of American firms and the best Chinese model is quite far behind. And over the course of the months the Delta has got shorter and shorter. And so you're looking at that and you're saying, well, is it just going to be a Delta where the Chinese are slightly behind the best US models? Or is there a real momentum within those Chinese firms that will take them past the frontier US models? But what would need to be the true in terms of the kind of research breakthroughs that DeepSeq might need for it to propel itself beyond the capabilities of US frontier models?
B
Yeah, I mean research breakthroughs are happening very, very quickly. I mean, maybe in a certain sense they're not even breakthroughs. I mean, one thing that I always say is that when you see really rapid progress in science, it's not because the scientists suddenly got much smarter, they're not superhuman. It's because people have found an area where there's just a lot of like very low hanging fruit, a lot of iteration that that can be used to improve. And so I think that's what's been happening in AI, I mean maybe for 10, 15 years. Certainly, certainly the last five years. And I think there's so much low hanging fruit to collect that it's unpredictable who's going to sort of find which advances first. My expectation, though I don't know for sure, is that going forward, I think that there are these sort of export controls that are in place that I think mean that that sort of Western firms will probably have an advantage in terms of the amount of compute available and I think that that will probably make it more difficult for Deep Seq and others to be competitive. But I think that in terms of the basic algorithms, all of the sort of leading AI companies I think are finding ways to do very simple things that work well and scale well. And there's no reason why, I mean, I think Deep Seq based on their papers has also found a lot of these ideas and techniques and there's no reason why they can't be very competitive algorithmically.
A
But it does seem that there's been a tone shift in the discussion of AI and AI development. And I know that you're your co founder, Dario Amodai has been speaking quite a lot about increasingly the importance of some sort of export control or licensing regime around chips as regards getting them out to China. And I even detected, and please correct me if I've got this wrong, a shift in Anthropic's approach to how quickly development needs to happen. I got a subtle sense that maybe Anthropic, which has often argued very strongly in favor of a sort of a safety first approach, had slightly changed its gait and said we just need to go faster than we have. Have I misread that?
B
The way that Anthropic thinks about sort of speed of development and its interplay with safety is primarily through our responsible scaling policies. So it's true that in those sort of very early days when we were, when we founded Anthropic, I think there was a general sense among us, although I think this was not something that, I mean, AI was kind of not a big deal in the wider world. Our sense was that AI was going to make very, very rapid progress and that that was primarily going to be very beneficial to the world. But there were also a lot of risks associated with it. And we did have some sense that this powerful technology being developed slightly more slowly could, could actually, actually be better in terms of sort of getting things right. Now the way that we've sort of, we kind of figured out, and I think this was kind of a breakthrough of its own, was that sort of we created this responsible scaling policy as Kind of a way to help coordinate with other labs to make sure that AI development is beneficial and isn't creating harms. So the idea there was that we would have, would think carefully about what kinds of real risks from AI exist that we want to actually take seriously. And we would measure the capability of our systems to sort of actually do harm, to be risky. And then if we crossed certain thresholds, we would basically commit to have mitigations in place to avoid those problems. And so in a certain sense, this meant that when we were thinking in these terms, we were more free to move quickly, because the idea would be we have this framework in place where we can move as fast as we can, both in terms of AI capabilities and also safety research and risk mitigations. And we would be sort of bound by, to sort of not cross certain lines until we were ready. And so I think that's something that we've sort of alluded to with, with even the release of Claude 3.7 Sonnet and the way that we've sort of discussed it in its system card, that we think that we are approaching some of these thresholds and therefore future models may need more protections. But at the same time, in terms of our research, we are getting those protections in place. So we had this both research release that we called Constitutional Classifiers and an associated sort of jailbreaking demo where we asked people, sort of anyone on the Internet, to try to sort of jailbreak this new system and in order to sort of test out this, this method. And so basically what we, what we think is that we want to move as quickly as we can for a variety of reasons, but we want to make sure we have these systems in place. And so that's kind of how we're kind of coordinating to, we hope, scale responsibly.
A
Yeah, I mean, this gets to the heart, I think, of some of the paradoxical ideas that you have to hold in your head, you and your, your colleagues. On the one hand, the way in which we're building AI today has many critics, and the critics say this system is uncontrollable. Right. You don't have any verifiable safety, and we don't have the science for how we build safety around this method of making AI. And you have a great comment. I'm just going to read it here. Maybe supervising a thing that's smarter than us is hard, maybe not. But once you make a thing that's broadly much smarter than you, and given that it'd be easy to run millions of copies of that thing, once you have one, you're going to lose and be disempowered. If there's a conflict, given the stakes, being 90% sure it'll work out is very far from okay. I think that was you. Was that you?
B
That sounds like me.
A
Yeah, it sounds like you.
B
Right.
A
Okay, good, good. So I guess one thing I'm curious about is how do you personally manage the. That paradoxical piece of the sort of risks that you've yourself articulated and the work that you are doing and the speed with which it's moving generation times that are, you know, less than 18 months? I mean, is there some internal cognitive mechanism that you've built for yourself?
B
Yeah. So, I mean, I guess I can talk about all of the different kinds of research we're doing to try to meet the moment. I mean, I think very broadly. I mean, you mentioned deep sea. There are all these AI labs in China that are near the frontier, or almost at the frontier. At the frontier, there are a lot of different AI researchers and labs in the US Scattered across the world. Everyone's sort of pushing forward this technology. And obviously, I mean, you've mentioned the potential for risk and critics that say that this technology is, is. Is. Is. Is dangerous. On the other side, obviously there are a lot of people saying that's. That's really silly. That's totally unnecessary. What's most important is delivering the benefits of this technology, because other things that I've said that Dario have said, maybe this technology can help us to cure cancer 10 times faster than we would otherwise. Um, and so people say things like, how could we possibly slow down when. When we're going to be able to deliver those kinds of broad improvements for human welfare? So there are a lot of competitors. There's a broad capitalist ecosystem moving very quickly. There are a lot of benefits to this technology. But also if it's really as powerful as millions of geniuses in a data center, et cetera, then there are also risks. And so it's. It's a. It's a difficult, difficult thing to juggle. I think that as AI becomes more capable, I think the stakes go up, and I think the confidence level you. You would like to have before you develop or deploy such systems also goes up. The way that we're thinking about this is through a lot of different research directions. Interpretability is one we talk about a lot, and it's something that even I was unsure about. I would say, I don't know. Three or four years ago, I wasn't really sure whether we'd be able to get any benefits. But I do think that we're starting.
A
To see to say, what's the benefit of interoperability?
B
The benefit of interpretability is basically, if you have a really advanced AI and you're not sure if you trust it, it sure would be useful if you could read its mind, and it sure would be useful if you could not even just read its mind, but understand how it puts its thoughts together to take actions in the world. So interpretability, I think first and foremost, at least I think could be a very, very powerful way of checking whether AI really is doing what you want it to do.
A
Right? But as AI gets better on this continuum, the way in which it's going to make its decisions are going to become less and less interpretable to us. You are a theoretical physicist, you are somebody who understands tensors, you know, multi hundred dimensional spaces, and you have formalisms that allow you to navigate and make sense of those. I can say the words in English, but all of the work you do as a theoretical physicist is not interpretable to me. And so at some point, maybe it's 18 months, maybe it's 36, maybe it's 54 on your trajectory, we will have systems that will be smarter than everyone. We have the Astronomer Royal in the uk, Lord Martin Rees, watching this right now, they'll be smarter even than him. And the number of people who could interpret that then drops. It gets to a world where only Terence Tao, the mathematician, can interpret it, and then he can't. So what does it mean to be interpretable if the machines are operating in spaces that make us look not so much like silverback gorillas, but like hamsters.
B
Yeah, it's a good question. So there are a couple of things I would say. So one is that if you want to understand everything that the AI is doing, I agree that's going to be very difficult. But you might be able to study a bunch of specific examples where you can understand, for example, what is the goal that the AI is seeking. Maybe you don't fully understand all of the steps in the process. Maybe you can break it down and study it intensively and you can, maybe you can use AI to help you to analyze what's going on inside. Maybe you can use a dumber AI to help you to understand a smarter AI, but generally there may be components of what the AI is doing, specifically its goals, or the way that it might respond in particularly dicey situations that you can set up and analyze that might give you insight into whether the AI is sort of ultimately aligned or not. The other thing I would say is that interpretability is really just. Just one tool. So I think that it. It hopefully will give us insight and some way of auditing AI and checking what's going on. There's a few other things that, that we're doing simultaneously. One is trying to figure out better and better ways to use AI to help supervise and monitor AI. I think both of those. So constitutional AI, which sort of anthropic, developed a few years ago, I think was like the very first example of this, where we used AI systems to check whether other AI systems were obeying a constitution and to guide them towards behaviors that were in accord with a list of principles that we call the constitution. So what we're trying to develop now, one of the directions we're moving is kind of like constitutional AI souped up, where you could have AI systems that think using the reasoning we were talking about earlier about what another version of Claude is doing in order to sort of monitor it and steer it in a good direction. And so the goal there is that you want your monitoring and the supervision you have of AI to improve with the intelligence of AI. So by the time you're at a point where AI AI is sort of as smart as people or beyond, you're able to leverage those smarter than human AIs for alignment. So.
A
Right. So can I jump in with a historical parallel? So that's a little bit like an enlightenment idea, right? It's an enlightenment idea in the sense that you. You layer on top of what's gone before. And I have a sense of, when I use these large language models that they do embed that idea already, it's really, really difficult to imagine a large language model that will believe the earth is flat, because in order for it to believe the earth is flat, it can't then be helpful in other ways. Right. Because so many of the internal relationships have to be broken that it's not going to help. So there is some sense, it almost feels that this is like a teleological argument, which is that if each subsequent generation is roughly aligned, there is an arrow of progress that suggests that you build one on top of another. And I mean, is it that there's a kind of a selection pressure that goes on, that models that end up diverging from that path become less useful and therefore they don't benefit from economic incentives? Or is that. Am I just drawing too strong a parallel with history?
B
Well, I think there's no. I think it's a good parallel. I would Say there's two things. So one is we got extremely lucky. And I remember talking to folks concerned about safety who were excited about this back. I don't know, in 2017, 2018, there was this vision that AI was going to look like AlphaGo, where it starts off as a completely blank slate and it trains just to optimize to do a particular task. We got, I think, lucky, or maybe we make good choices in developing large language models, because fundamentally, large language models, the very first thing they learn is to understand human writing, human, human ideas, the way humans use words to conceptualize the world. And obviously that has encoded a lot of our biases. It has a lot of the intuitions we have about ethics and morality, common sense, human history, the things that have gone well, the things that have gone wrong. And furthermore, it means that it's very. The very first thing we got these models to do was to chat with us. I mean, I think it's not a coincidence that Dialogue Chat was one of the very first breakout applications, because these models are trained on language. And so I think that is sort of exactly in sync with what you were saying, where there's a very, very strong bias that these models will at least understand and be able to communicate, and their basis for thinking will be very much a mirror to our own. Now, it might be a strange mirror in a lot of ways, it might be a funhouse mirror, but nevertheless, they really need to sort of understand all of those ideas. And so I think that does give us a pretty strong foundation. And it also means that at least some informal senses of interpretability, like being able to look at the chain of thought, thinking these models are using and understand it, I think is sort of baked into the model. So I think that is a major advantage of the way that AI has developed. It didn't necessarily have to be that way. A lot of people, no one really predicted it five or six years ago, but I do think that it makes our task with alignment a little bit easier.
A
Right. But you know, you have, within this idea that constitutional AI, you have character training. These are all decisions that you make about how the model is going to behave and what its sense of, you know, goodness is. Right? I mean, goodness being, giving us things in ways that are honest and helpful and harmless. But those are choices of technologies that could end up being quite infrastructural.
B
Right.
A
You know, I think many people now believe that the. If the 19th century infrastructure was canals and railways and then sewage systems, 21st century infrastructure will be, you know, a layer of AI and the nature of infrastructure is that it is a, it is a social product in of itself. But the way that we currently guide the success of these models is actually through the interactions in the market and it's through the interactions in the market with lots of different SKUs. So if you're building a based model like Grok, you're not competing on a fair basis with Claude because you have 200 million people a month on X that you can pour at that model or if you're at OpenAI, you've developed some forward momentum. So these models are not going to compete in a way that's necessarily aligned with public interest. Is that a problem?
B
It's definitely, I think for any advanced technology it's a problem. I think it's very much a social problem. That's fundamental to sort of capitalism though, in the sense that with any new technology there are certain capitalist incentives. There's a sense in which those often are not completely anti aligned. I mean people are buying products, they, they put their dollars where their opinions are. But I do think that like there are externalities, there can be problems introduced with, with, with, with new technologies. Capitalism is very, very, very far from perfect in terms of making sure human welfare is improved, is broadly available. But I think it's sort of a lot of the problems are similar to problems we already have in the world. They're just potentially if the, if the technology moves very quickly, then maybe we'll have to face them more quickly than we do with other technologies. But I think they're broadly similar also. I mean we do try to bake in some flexibility. Like you could ask Claude to role play in a lot of different ways. Instead of sort of sounding just like Claude, it's happy to sort of sound different. And I think we tried to set some guardrails of sort of basic harmlessness on some of this. But you can change Claude. And I think part of that is that we do think that it shouldn't be up to us to completely dictate the value. I think AI is fundamentally, is very empowering and should be sort of broadly accessible and we want people to sort of be able to benefit in whatever way they see. I mean, I think freedom to use the technology in the way that you want is really important.
A
I think one discontinuity is between what's happening in the Bay Area, then what's happening within the Frontier Labs and Main street for sake of argument, and a sense of how quickly or not things are moving. And I get a sense from you that you really feel things are moving very, very quickly, that the generations between advances is quick, quicker than perhaps we feel outside. There's a contrary perspective, which is that short term equilibriation is often constrained by social organizational inertia, by the fact that you have to build power stations or just get people to change their behaviors. I mean, I've been using large language models now for, you know, regularly since November 30, 2022. I'm still doing searches on Google. I mean, not so many, but I'm still doing a few. So there is this sort of contrary view that says paradigmatic change still takes quite a bit of time. But that still leaves the possibility that the way in which these models impact economic productivity and the labor market could be much faster than the canal or electricity or the iPhone. What is the debate that ought to be happening that would most help us prepare for that kind of fast deployment scenario?
B
That's a great question. I mean, so I was a theoretical physicist until pretty recently and often when I talk to physicists I sort of say if you believe me. And I mean, obviously people should be skeptical about AI. People should think about it carefully. I, if you'd asked me six or seven years ago, didn't in any way expect this. I sort of slowly became more and more convinced that AI was going to make very rapid progress. Now I'm pretty convinced, but I might still be wrong. So people should be, should have, have appropriate levels of skepticism. I certainly question is AI really going to be on this trajectory? But if you accept that, I mean a thing that I say to my fellow physicists is, look, we really need people with all different kinds of intellectual training and experience to work on this. Because if this is really happening, it's pretty revolutionary and we want sort of our best and brightest to be paying attention. We want, want everyone to be paying attention and thinking about what it, what it means in terms of what, what should we be thinking about? I think there's a lot of interesting debates about what the economic impacts of AI really will be. I think it's different from a lot of other technologies. I think something that's very interesting and I don't know what's going to happen with it is that in terms of where is AI most most useful, most productivity enhancing, it's really sort of more educated, sort of white collar jobs that, that might be impacted. And so I think that's, that's a difference, I think from the way we often think about automation. So I think parsing the consequences of that is pretty Interesting. We ourselves, I guess are, we're trying to study empirically, we're very focused on empirical approaches for AI and otherwise how AI is getting used. So we have this tool clio that allows us to, in a privacy preserving way, sort of aggregate how CLAUDE is being used. And we're studying things like is that complementary, is it productivity enhancing, to what extent is it, is it potentially replacing tasks that people would otherwise do? And we're sort of opening that data set up to economists to try to study. So I think that's an example of a place where I think there's a lot of interesting work to do to try to understand like what is the, the progress of this going to be. Because right now we see a huge uptick in, in usage of AI for software engineering. Um, and I think that's sort of the perfect area because software engineers love to adopt new technology. It's very exciting.
A
And also software is verifiable, right? So if CLAUDE produces something that doesn't work, it won't execute, it won't pass its unit tests. So you can go back and say, listen, this didn't do the thing I needed to do.
B
Do it again. Exactly, exactly. So I think there's a lot of reasons why software engineering is a natural place for AI to be adopted. But I think it's a great question, like, is something like what we see in software engineering with, with, with so many, so many folks using, using AI, is that going to happen all across knowledge work or is it going to be much slower and then what, how is it going to kind of permeate, permeate our day to day lives? I think it's interesting to think about that. I think it's interesting to think about do we really want AI that's at or beyond human level. You were asking me all these questions about is it really safe to have AI that is smarter than you? And I think that is a real question. Like I think it's a society level question. Should we be sort of having these super intelligent AI aliens kind of invading the earth or should we, should we decide not to? And how from an international perspective do we decide how we roll out this technology?
A
You know, but I frame it slightly differently to that, to that because I, I see you and your colleagues are evolving, right? Each subsequent AI is really impressive. And if you had shown me Claude 3.7 sonnet thinking five years ago, I would have 100% said we have AGI, we have it. And what you're doing in a sense is you're sort of progressively disappointing us because we're getting so used to it. It's like, oh, wait a second, 3.7 sonnets two weeks old, right. I'm not interested in that anymore. But one of the reasons I slightly disagree with the framing of a super intelligence that appears out of nowhere is because it's not appearing out of nowhere, it's appearing in an evolving way in our hands. We're also starting to recognize that as with any system, there's an efficient frontier. No model is better than every other model in every way. Right. There are trade offs and the way it will get embedded across our economy will be, to be honest, the model that's going to run the OCR on a scanner in a warehouse is just not going to be as smart as Claude. 5 Sonnet why would you do that? Why would you run up the cost? Why would you have all the latency? And so I think for me part of the challenge actually is really about what is that evolving governance of this machine intelligence systems where there will be thousands or millions of these models. I think some of the AI risk debate emerged from Nick Bostrom's excellent book a decade ago where the view was you'd have a singleton, right? You'd have a solitary all powerful AI rather than many, many thousands or millions or tens of millions of them all interacting in slightly different ways, which I think creates a different control problem, to be honest.
B
I completely agree. I think being able to continue to iteratively deploy improvements to Claude I think is valuable because we can sort of see where it's headed, we can sort of identify problems. If, if the next model has, has an issue, people can complain to us, we can discuss it to the extent that it's that important. We can kind of discuss it as a society and then, then we can, we can, we can remedy it. So I agree superintelligence is not going to be one specific moment where we hit super intelligence. It is very much a continuum. AI is evolving in, in that way. And yeah, I agree, like we're going to be using AI systems that are not very sophisticated basically as tools forever. I mean, yeah, there's no reason to use an expensive model for ocr. So I agree we're going to have this ecosystem, it's a good point that this ecosystem itself, especially if AI is interacting with AI to do a lot of the work we do now, like maybe your AI doctor interacts with your AI pharmacist and negotiates with like the AI health insurance to deny you your cover, no doubt. But something won't change.
A
Right?
B
Yeah, Yeah. I think that, like, that ecosystem could have a lot of problems, that maybe Claude is safe and it's aligned, but the ecosystem has problems. And that's something that is so new that no one's really studied it, but. But it's something to worry about. I think there's this general worry that the way that AI development will cause harm eventually is that things kind of go off the rails in the modern world. I don't understand how most things work. Like, do I understand how my car works? Do I understand how I understand how my iPad works? So when you get more and more systems that no one really understands, things can kind of go off the rails in ways that are really hard to predict and are kind of due to the interaction with the ecosystem. So that's definitely a risk.
A
That is another conversation, though. I mean, and you've tickled my brain now, and there's so many places to explore that one. You've been super generous with your time. I wanted to ask just one last question, really, which is about the things that we should be excited about. So if you look out over this next 12 months and you look at Claude, you look at anthropic, what are you most excited about?
B
I'll say something that has nothing to do with Claude. I think I'm most excited about getting what I described in terms of sort of scalable supervision of AI, helping us to sort of make sure that AI is useful, is doing good things for us. I'm very excited about kind of getting that really working, going beyond constitutional AI, because I think that's really the lever for becoming more confident that we can continue to improve the capabilities of AI and make it more useful and make it beneficial for all of us.
A
Well, I look forward to that as well. I will be stopping this call and going straight into Claude and asking people about all the questions I should have asked you. Jared Kaplan, thanks so much this morning for chatting to us.
B
Thanks so much for having me. It was a lot of fun.
Azeem Azhar's Exponential View — April 1, 2025
Host: Azeem Azhar
Guest: Jared Kaplan, Co-founder & Chief Scientist, Anthropic
In this wide-ranging and insightful conversation, Azeem Azhar sits down with Anthropic Co-founder and Chief Scientist Jared Kaplan to discuss the imminent arrival of human-level artificial intelligence, its measurement, and its far-reaching societal implications. The discussion explores the mechanics of AI improvement, the global race, safety strategies, interpretability, economic impacts, and the evolving governance of powerful AI systems.
Key Points:
Insightful Moments:
Key Points:
Notable Quotes:
Key Points:
Key Points:
Key Points:
Key Points:
Notable Quotes:
Key Points:
Key Points:
Key Points:
Key Points:
Closing Quote:
The conversation is candid, intellectually rigorous, and reflective. Both host and guest speak accessibly but with technical depth and honesty about the uncertainties and risks ahead. Kaplan emphasizes both humility (“I might still be wrong”) and a sense of collective responsibility, while Azhar frames questions to provoke deeper reflection about society’s readiness and values.
This episode is essential listening for anyone interested in the present and imminent future of AI, combining technical insight with thoughtful consideration of societal context and looming challenges.