
In this episode, Peter McCrory, Head of Economics at Anthropic, unpacks the company's new Economic Index report. His team analyzed millions of real Claude conversations to map exactly where AI is augmenting human work today and where it isn't. We explore the striking divergence between API and chat usage, why businesses need to extract tacit knowledge to unlock AI's potential, the "hollow ladder" risk for junior workers, and Anthropic's estimate that AI could add 1.0-1.8% to annual productivity growth over the next decade.
A
So Anthropic, the leading AI lab behind Claude, has just released the next edition of their Economic Index report. They have analyzed millions of real conversations with Claude to map exactly where AI is augmenting human work today and where it isn't. I'm with Peter McCrory, who is the head of economics at Anthropic, and he's the one who led this research. I think it's the best empirical window we have into how AI could be shaping work right now. So, Peter, thank you so much for breaking away from your computer and joining us today.
B
It's a privilege to be here and so glad to be able to share this work with the world and all of the underlying data, which, I mean, I'll talk about, but everything that we do is based on open source data. So we hope that others will join us in making sense of what's on the horizon.
A
You know, that is such an important point, because in this moment of the investment boom and the prospect of artificial intelligence really changing the way we live, the quality of data, I've found, has been really, really poor. It's sort of scuttlebutt and, you know, survey a few mates and slap a logo on it and change the way the market thinks. So when you get data from, you know, Anthropic or Epoch or others, it's really good to be able to hold onto it. Your data shows something quite striking, which is two completely different use patterns emerging at the same time. So if you're using Claude, you can do it like most of us do, through the chat interface. You know, tippity tappity, tippity, Claude goes away and thinks and comes back with an answer. Or you can use it through the API, which is a programming interface, which means that perhaps you're accessing it through another piece of software, maybe one that you have written, or more likely one your IT department has written. So what I found really interesting, and I think your previous report showed this as well, is that use cases through the API are about 75% in what you call automation of tasks, but they have lower success rates. Whereas if you look at the interactions on Claude.ai, the task mix is much more around augmentation, but it's also likely to result in more success. So these aren't just two channels, they're two different stories about how AI integrates into the workplace and the future of work. Which one is more indicative of what the future is?
B
That's a really great question. And I think of this in one of two ways. One, broadly speaking, the usage patterns on Claude.ai, which is this sort of chatbot interaction, do at a high level look very similar to the API deployment. So dominant usage for coding-related tasks, as well as the other overrepresented categories, with a little bit more tilt toward programmatic deployment when businesses choose to embed Claude's capabilities. So at a high level, I tend to see these as both capturing where capabilities are strongest and where they are providing economic value, whether that's through iterative back and forth with a user through the chat window or through the API deployment. But to your point, and this was a point that we made in our last report, so much of the labor market and productivity implications of this technology, much like past technologies, will hinge on how businesses choose to embed and deploy the tool. And so the sharp relative increase in automated use, where Claude is given a straightforward directive and expected to produce an output that feeds directly into a service that's provided to a consumer or into some internal business operation, is where I think the productivity effects will begin to materialize. This matches the historical pattern of general purpose technologies, where businesses need to figure out how to embed the capability, maybe even in invisible ways. So the analogy that I use is with electricity. When I go to a coffee shop and I order a latte, I don't often think, except in this conversation, about the power of the electricity that's required to provide this service. That general purpose technology is invisible to me. And it will take time for businesses to figure out what those use cases are. The Claude.ai usage patterns might be an early window into what's on the horizon. So early adopters use the chatbot to complete very sophisticated tasks through multi-turn interactions. Businesses learn that there's immense economic value there, should they be able to provide the right contextual information. And then over time that gets embedded in business workflows.
A
We're going to investigate all those ideas, like multi-step and over time. I'm going to be a bit cheeky. So when I looked at this data after your previous economic report, the way I took this away was: human employees like you and I, we think of ourselves and what our job entails in the round. And so when we get a superpower tool like Claude.ai, we think about how we do that job better, make it more interesting, make it more challenging, get through not just our must-do list, but our coulds and woulds, where the real value and enjoyment may lie. Whereas it turns out that companies actually see their employees as bundles of tasks that you can discretize. You don't ask the question, what does Peter really need to manifest himself fully as the head of economics at Anthropic, which is how you think about your job in your head. They think, what are the 17 tasks Peter does? And which are the ones we can automate discretely? And I felt it actually said something a little bit more about that different perspective, the relationality that an individual has with their own work and the honesty with which their boss actually thinks about that individual.
B
Yeah, I think that's an interesting observation. One implication of this report was the uneven impact across different sorts of jobs, and the fact that some jobs are likely to be fundamentally transformed and maybe even have greater risk of displacement. So if you take into account task reliability, like what is Claude really good at doing, and you look at which tasks are very time-intensive for certain types of workers, and you plug that into the data we actually see on our platform, you get a nuanced picture of who's exposed to AI. One example here would be data entry workers. These are tasks where, in our analysis, Claude is pretty reliable at extracting information in standardized ways from natural language reports. That's the most essential task in that job. And maybe to your point, that's where it might be more straightforward for businesses to figure out how to automate that specific workflow. In our data in this report, we actually saw a jump in office and administrative usage as a share of our total API traffic, suggesting that businesses are broadening out beyond just coding-related tasks in how they choose to deploy the tool, into these back-office administrative support tasks. It's not obvious, I don't think, that this means your job will become less meaningful or more meaningful, but it may mean that your job fundamentally changes. For data entry workers, maybe that type of role becomes more, and this is very speculative, so you can push back, but maybe someone needs to manage more the relaying of information that Claude is providing.
And it's not quite a data entry worker focused on plugging in the data points from a report into a spreadsheet, but there's nevertheless this important role that a human will play in translating and engaging with sort of people within the organization.
A
Well, I mean, what we're doing there is we're going back and we're playing through what happened with previous waves of general purpose technologies and automation. And I think the observation you made that maybe what people do on the chat interface is a little bit of an early warning about or early signal rather than warning about what you can do when you start to automate it. Because I think one of the observations would be that if all you do is automate a task like data entry that had fundamentally been gated by the marginal value of the nth item that was entered into data by a human who you're paying per hour, you had already taken a scarcity mindset to that and you were only going to look at the minimal amount of data that you needed to achieve the outcome. Now if that cost drops by a factor of say 1000, you are not just talking about looking at five more customer records, you are talking about looking at a thousand times more, which is a regime shift. Right. Even if the fundamental kind of economic organizational model remains the same, a thousand x is a regime shift, which means new behaviors emerge. And some of the things that I see when I talk to businesses is, you know, a large part of them are doing the sorts of things we've discussed by automating data entry. But what they've done is they've automated the horse. They've automated the fact that they used to look at 10,000 pieces of data a day by humans and now they look at 10,000 by machines. Rather than saying, well, what if we looked at 10 billion and how would that change our business downstream from that? I mean, you know, I feel that there's something there.
B
Yeah, you know, I think this reminds me of one of the more complicated insights from our last report. But I actually think it's quite relevant here, which is the set of things that businesses will do over time to restructure, modernizing their data tech stack or building new organizational workflows, to unlock the productivity. It's not just doing what you had been doing before, but fundamentally restructuring how the business operates. The historical analog here would be again electricity, where factories shifted from centralizing power on the factory floor to more distributed power. That changed fundamentally how factories operated. So what do we see in the data? We actually look at how much contextual information businesses provide when they use Claude through the API, and how much output the model produces. And the tasks that generate more output tokens tend to be the most complex ones. It turns out that in order to get those most complex implementations, so moving beyond just straightforward data entry to something that's much more sophisticated, maybe like automating biological research and analysis...
A
Oh, wow. We've taken a big step.
B
Yeah, that's a big step. And you know, that is arguably something that is maybe on the horizon. I think a lot about how even our productivity analysis might be conservative if we're failing to account for the automation of innovation itself. Maybe we can return to that; I'm kind of curious to hear your perspective. But it turns out that for those most complex tasks that we see in our data, businesses need to provide disproportionately more contextual information. So even if the capabilities are there, if you don't have the relevant information to deploy that capability, business adoption will maybe be constrained or bottlenecked.
A
You've got to run around. You've got to run around and get all your tacit knowledge written down and stuck in markdown files.
B
Exactly. The tacit knowledge piece is so crucial. I think about developing a sales strategy. It might be that your coworker has the relevant information not even in the customer management software that you're using, but in their brain or in some document that they've written out. And if the business is not thinking about how to elicit that tacit information in a way that can make the model effective, you might fail to see tasks in our API deployment for which Claude and other frontier models would actually be capable if they had that information.
A
I mean, it's a really fantastic point and it shows how quickly the field is moving, because no one was saying this a couple of years ago. Right. And your report came out before Claude Opus 4.5.
B
Yeah. The underlying data that we sampled was from just before Opus 4.5.
A
And Opus 4.5 has really changed the world, at least for certain early adopters. It's particularly good at coding, for example, and it has absolutely cut into my sleep. You can see my Oura scores just collapse over the last month. And, you know, I'm getting up at 4am, I'm firing off a bunch of parallel agents. Then I'm going for my walk, and then I'm coming back to see what they've done. This year, and we're recording this on January 15th, I have written, or rather I've commanded Opus 4.5 to write, about 10,000 lines of code per day. So I'm up at about 150,000 lines of code, according to some measure that I did. I'm just curious about how that kind of capability jump changes the way you think people might behave, based on what you had seen in this very, very extensive empirical study you conducted.
B
This touches on one of the findings, which concerns the most complex tasks, and I presume you're asking Opus 4.5 to take on very ambitious exercises. In our data, those most complex tasks are where the model struggles. So the success rate is somewhat lower. And that suggests that human expertise to evaluate the quality of the output is more important, and you need more human delegation and direction and managerial oversight. I have found that even with Opus 4.5, there's still some level of oversight and quality attention that I need to maintain. But I actually worry a lot less about the implementation steps. And so I'll give a somewhat concrete example, an analysis that I've been wanting to do for a while and haven't had the time to do. It touches on this idea of how AI exposure across jobs, as commonly constructed, relates to business cycle sensitivity. Is it the case that workers who have high AI exposure are the ones who might have higher unemployment when the labor market slows down? For some time now I've been giving Claude, on a separate server to run autonomously, this exercise of downloading some research papers, replicating some prior reports and then coming up with this new analysis that I wanted to do. Before Opus 4.5, I had low confidence that this would work. And then when I gave the same task to Opus 4.5, I discovered that it was able to move reasonably proficiently at implementing even somewhat ambiguous directives. But I still had a lot of back and forth. It was like having a very capable research assistant, much like the job that I had when I was fresh out of college working at the St. Louis Fed. That type of implementation capacity, the ability to understand how to run and set up regressions, how to download data through writing a new API: Opus 4.5 was able to do all of that and even write up a methodological documentation report. I suspect that we will see the level of managerial oversight move to a higher level as people begin to delegate lower-level, more straightforward, but nevertheless sometimes ambiguous implementation steps.
A
This is such an important point. So let me go back to before Opus 4.5. So in what you looked at in the report, one observation was that when people used the API, the system could do a 3.5-hour task, in other words a task a human might take 3.5 hours to do, with a 50% success rate. But if they were using Claude.ai, that 50% success threshold was a 17-hour task, which is two workdays in Europe and of course one workday in the Bay Area. I'm just quite curious about a few details there. Right, so what is a 17-hour human task, and how much time were the humans having to spend on it? Because it's really different if I just, you know, bark a command at Claude, "who will rid me of this turbulent priest," and it goes off and does it for 17 hours, versus I have to sit over the machine and every 30 minutes give it feedback, because in that case I'm context switching and perhaps I can't really do other things. So is it a 17-hour task, or am I just pressing shift-tab, accept, accept on my keyboard for, you know, the better part of a day? And what is it about these multi-turn conversations that single shot can't achieve? So I'm just curious: what is that task, and how much is a human sitting over the machine pressing a button?
B
It's a great question. And you know, in the report, the 50% at 17 hours, or around 20 hours, is an extrapolation of the linear fit that we document. So I don't think we actually see those very long tasks that you're describing. It's just the implication of the data that we do see. And so I think this is similar to the type of analyses that METR has put out, looking at task duration or task horizon and relating that to the reliability of frontier models. But it is conceptually different along two important dimensions. One, it's not just a laboratory-controlled analysis. People are choosing which tasks they bring to Claude, and they are bringing tasks where they believe Claude will be most capable. Two, especially on Claude.ai, there's this multi-turn oversight interaction where you give Claude a task, you see the output, you give it some feedback, and you have this oversight that can increase the reliability of what the model is able to complete in a short amount of time, relative to how long it would take if you had to do it alone. I think compiling a substantive literature review is an example of a task that would take a research assistant, and I'm speaking a bit from my own experience working at the Fed when I was younger, multiple days to put together and compile nicely for the senior economist I was working for. Claude is able to do that sort of thing reasonably proficiently, very quickly. And Opus 4.5 will be able to do that even better.
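As a rough illustration of what "an extrapolation of the linear fit" means here, the sketch below fits success rate against the logarithm of task duration and solves for the duration at which the fitted line crosses 50%. The data points are made up for illustration and are not figures from the report, and the choice of a log-duration axis and the function names `fit_line` and `horizon_at` are assumptions of this sketch.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def horizon_at(success_target, slope, intercept):
    """Task duration (hours) at which the fitted line hits success_target."""
    log2_hours = (success_target - intercept) / slope
    return 2 ** log2_hours

# Hypothetical observations: human-equivalent task length and success rate.
hours = [0.25, 0.5, 1.0, 2.0, 4.0]
success = [0.80, 0.75, 0.70, 0.65, 0.60]

slope, intercept = fit_line([math.log2(h) for h in hours], success)
horizon = horizon_at(0.5, slope, intercept)

print(f"longest observed task: {max(hours)} hours")
print(f"extrapolated 50% horizon: {horizon:.1f} hours")
```

With these invented numbers the 50% crossing falls well beyond the longest observed task, which is the sense in which such a horizon is implied by the fit rather than observed directly.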
A
It's quite a complicated picture, because only a certain type of person with certain experience can command a 17-hour or longer task. Only a certain experienced person can make sense of whether a literature review is good or not. It's why PhD students have supervisors. And so this is a tension in your findings that I think is worth exploring. Models are improving and people are trusting them with longer and harder tasks. But judging those outputs does require real expertise. And there is this hollowing out, because many of the junior tasks that have built that expertise are at risk of being, if not already being, automated away. And I'm curious, when you look at that and you look in the data and you extrapolate, are we creating a long-term institutional fragility problem? A lot of what's being automated is what used to be the apprenticeship work, as you discovered doing all those literature reviews, you know, a few years ago.
B
I think that this is a very insightful observation about how the nature of work might be changing and how the capabilities unequally affect early-career workers versus later-career workers. And you know, when I worked as a research assistant at the Fed, to continue this example from my life, I learned a lot on the job from doing very basic tasks. Sometimes it was more involved, like a literature review. I actually worked with an economic historian to transcribe historical records, exactly this data entry job but in the context of economic research. And Claude and other frontier models are increasingly capable of doing that type of work. And the question is, will businesses continue to invest in younger workers to equip them with those tacit skills that they had previously learned from this basic set of tasks? I think it's not entirely clear. That's partially why we're putting out this data, so that we can track in real time what the actual effects in the labor market are. We have some suggestive evidence, of course, from this nice paper, "Canaries in the Coal Mine," from researchers at Stanford, who document that early-career workers in highly AI-exposed roles have had worse employment trajectories in recent years. But you know, we have alternative evidence as well. And so we're still in the early stages.
A
A quick note: if you want to support us in bringing more of these conversations to the world, please consider subscribing to the show. Let's go back, though, to the personal, because I'm going to go outside of the wheelhouse. But the beauty of doing this in my own conversation is that I can and no one can stop me. So here I go. You think about London taxi drivers, which is something a bit like Henry Ford's factory that you and I have to talk about a lot. The London taxi driver's hippocampus grows because they spend a lot of time studying the maps and essentially becoming homing pigeons in London. You know, when you read a book, the book changes you, and it's the act of doing that that changes a person. That sort of judgment that, you know, a biologist in their 60s has, or a shopkeeper in their 60s who's done it for a long time, to look at the stock and say, well, we're low on chewing gum, is a function of judgment that is almost certainly reflected in, you know, connections in the brain and changes there. There is something about the skill that you developed, the skills I have developed, which involved late nights and mundane, repetitive work, that allows us to work at this higher level. So I'm thinking less about this from a skills acquisition point. I'm thinking about this from the perspective of an individual and how they establish certain classes of mastery. My team will tell you that there are certain things that I just will not notice and I'm kind of rubbish at them. And there are certain things where I have such acute mastery that I can spot the crack from two miles away. And that has come not because of genes, but from banging my head against the wall for 25 years. So when you look at that and you think about your own personal experience, how do you address this hollowing out?
B
So I think I'll respond to this through the lens of one of the primitives that we introduced. And it was to try to get at this idea, where we effectively ask Claude to estimate how many years of formal education someone would need to have to understand the prompt, and how many years of formal education you would need to understand Claude's response. It turns out that the most complex, high-education tasks in our data typically coincide with very sophisticated prompts provided by the human. So I think in my example that I mentioned before, where I'm asking Claude to do this regression analysis and somewhat sophisticated implementation, at present you need a PhD in economics to understand and formulate and have the taste to know how to prompt the model, much less evaluate the output that it produces. And so there is this question of, what's the best way to acquire those skills? You know, there is evidence on this front that it's not just acquiring the skill, but it's also developing the cognitive endurance. A shout out to one of my good friends at the University of Chicago, Christina Brown, who has a nice paper on this front in a developing-world educational context.
A
I think this is a really, really interesting point, and I just want to dig into it. Anthropic is one of the leading firms in this space. So you've obviously got this realization internally ahead of many other companies. So what is your sense of things that are working in practice? When you look at your junior team members and you think, how do we get around the missing rungs in this ladder? How do I help, you know, junior Jenny and starter Steve get those skills that I got through sweating it out in late nights?
B
That's a really great question. And you know, maybe I'll have a harder time concretely pointing to something here at Anthropic, but I do believe that it is exactly as you're describing: to have the taste and discernment of what is good writing, you need to spend a lot of time actually writing. And so I would encourage someone to not let Claude take the first pass at a draft. That's how you develop your voice, that's how you develop your argument, that's how you can discern whether the writing that Claude produces is actually of sufficient quality. And you know, in previous roles I've seen examples where people have made missteps, relying too heavily on language models in developing their own voice. I think that there's a similar implication for reading papers. You should just sit down and work your way through it. You know, setting aside the question of AI, for very challenging technical papers I would feel a lot of trepidation: oh man, I'm never going to be able to understand that, how to do some macro dynamic DSGE modeling, for example. But sitting down and reading the methodological paper and just forcing yourself to work through and understand it, it turns out that maybe it's not as scary as you first thought. The great hope of this technology is that it can actually help to facilitate and accelerate the acquisition of skills. So once you've done that first pass of something that's very challenging, go back and interact with Claude to help clarify things that don't make much sense. So actually one of the primitives that we introduce in the report is whether people are using it for professional, personal or coursework purposes. And we actually see that coursework is relatively common, around a quarter of all usage globally and more so in lower-income countries. People are using Claude to complement their education, for educational purposes and skill and expertise acquisition.
A
But you know, it does require some novel thinking. And you know, in the corporate setting, in a world where you as a business can get something done quickly, to stop and say to your employees, no, you've got to spend 10 hours reading slowly, requires a certain type of corporate culture. And I'll share with you a couple of things that we've done. So this is my fountain pen. I've not written as much by hand for 25 years as I have in the last year. We got the team fountain pens. I do hope they're all using them, because it slows you down with your writing. They're really voracious readers. And in fact just yesterday one of my researchers caught me out on the very famous Paul Romer economic growth paper, which I have not read for, you know, 10 years. And you know, he was sitting there reading it on a printout and I was thinking, gosh, I'm amazed he's even printed it out. And of course the detail that you forget when you just read the summary is the detail that did catch me out. So there are some behaviors that we have found to be perhaps, you know, a little bit more effective. Slow yourself down, make sure you're reading. But you need to have a certain culture. And I think that a lot of businesses, a lot of American businesses really, you know, they're thinking about short-term productivity, they are thinking about volume being more valuable than that insight and that depth. Would you be able to see in your data whether anyone was designing around this, or are we just going to see this quarter-by-quarter optimization?
B
It's an interesting question and it's sparking interest in a number of threads. I'll say something briefly, and then I want to return to this Paul Romer point about endogenous growth theory and related growth theory, because I think it's a really important point for thinking about what's on the horizon with this technology. So one thing that we do see in our data, when you compare the API versus the Claude.ai interactions, is that businesses give less autonomy to Claude to complete the task, like less decision-making ability. It's kind of related to this idea that you don't want to fully automate the process; you want to make sure that there are appropriate guardrails. In the context of reading a paper, if you only ever read the summary, you're missing out on crucially important information and you need a mechanism. Maybe that's a process where a peer or your manager is reading over very carefully whatever you've written. Or in the business context, in the API context, you need to have structure and guardrails to ensure that the quality of the output meets whatever threshold you need. And we see some evidence of that in how businesses deploy the tool. I'm curious to know what the detail in the Paul Romer paper was, and whether or not it relates to this question of AI's impact on long-run growth.
A
You've caught me out as well, because I can now explain the mistake I made. And it does absolutely, as you rightly say, relate to this question of long-run growth. Just to remind the audience who are not up on neo endogenous growth theory on a Thursday morning, let me take us to where that is. This is economists trying to figure out how economies grow. And until the early 90s there was this sense that you needed capital and labor. And then amazingly, from somewhere, like a deus ex machina in a Greek play, technology would appear. And that didn't reflect our intuitions, because we sort of know that we make the technologies, and countries that have good human capital and institutions make the technology. So where does that come from? And that's the idea of endogenous growth, that it just sort of arises from investments and interactions and incentives in the economy. So Romer writes this great paper in, I think, 1990, 91 or something. He gets a Nobel Prize, quite rightly, a few years later. But one of the challenges that emerged in the 20 years after that was that we weren't seeing productivity, total factor productivity, grow exponentially in the way that you might in certain types of endogenous growth. And the argument seemed to be, and Peter, if I'm giving this summary to listeners wrongly, you should jump in, the argument seemed to be that the human researchers were just adding knowledge to an already enormous stock of knowledge. And so it was just a small incremental amount. And then the AI folks show up and they say, well, AI can massively increase the amount of knowledge being produced and we can get an exponential takeoff. So the mistake I made, long and short of it, was I said, but Romer doesn't accommodate for that. And Nathan, my researcher who's actually bothering to read the paper in detail, says, no, that's not right. He does, he's got this coefficient or variable called phi, I think it is. And you know, if phi is one or above, you get exponential takeoff, but in general, phi has been below 1. So that's where I got caught out. "Endogenous Technological Change," 1990, was the paper. I had just read my summary and of course not the detail, and thank God I had Nathan there to set me right. But then my question to you is, looking at that and at what you see, do you see a path where we get over the burden of knowledge and we do get to see that exponential?
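One way to make the role of phi concrete is the idea production function commonly used in the growth literature. The notation below is an assumption of this sketch rather than something quoted in the conversation: A is the stock of knowledge, L_A the number of researchers, and delta, lambda and phi are parameters. This general form is usually associated with the later semi-endogenous growth literature (for example Jones, 1995), with Romer's 1990 specification corresponding to the special case phi = 1.

```latex
\dot{A} = \delta \, L_A^{\lambda} \, A^{\phi},
\qquad
g_A \equiv \frac{\dot{A}}{A} = \delta \, L_A^{\lambda} \, A^{\phi - 1}
```

Under these assumptions, with phi = 1 and a constant number of researchers the growth rate g_A is constant, so knowledge grows exponentially. With phi < 1 the growth rate falls as A rises unless research effort L_A keeps growing, which is one way of expressing the burden of knowledge. With phi > 1 growth accelerates as the stock of knowledge grows.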
B
Yeah, it's such a great phrase. So much of our analysis in this report focuses on sort of task-level efficiency gains that we see in our data. You use Claude to write a report and you do it X times faster. And we do this exercise towards the end of the paper where we say, okay, imagine that all of those efficiency gains across tasks that we see in Claude.ai and in our API traffic, imagine that that materializes fully within the economy over the next decade. How much would that increase labor productivity growth each year? We come up with a number in the baseline analysis of 1.8 percentage points, using standard macro growth accounting. If you're interested, read about Hulten's theorem; there are some great papers by David Baqaee on that front. But that is all focused on this question of what if we're more productive at the things we're doing right now? I think the long-run question is exactly as you said: to what extent might this technology automate the process of innovation itself? A great turn of phrase that I've heard Jonathan Haskel use is that AI might very well be an innovation in the method of innovation itself. Overcoming the burden of knowledge. To become an expert economist, to get to the frontier, you need to spend many years sort of getting that narrow expertise. But in principle it might be the case that these large language models, they've read the corpus of human knowledge, they can maybe span the space where new ideas, productive ideas, both for scientific applications and for business applications, exist. So I think that is a pressing question that we don't currently analyze in our report, and something that's very, very much on my mind. I think another way to think about it is, again, another shout-out to Ben Jones, this paper from late last year where he explores this idea of what if there are tasks in the innovation process that are not automated by AI?
And there you have this tension where it might be the case that you have unbelievably capable artificial intelligence. So the depth of capability is just growing immensely. But if it is constrained to a subset of tasks that are crucially involved in the innovation process, that will limit the extent to which you could have takeoff and the extent to which productivity growth might rise above 2%, 3%. Anything higher would be historically unimaginable in many ways. And I think we have to think about these things with some level of humility. The long 20th century, from 1870 to the early 2000s, was, you know, a period of immense technological transformation, immense automation, the decline in the share of workers in agriculture from above 80% to around 3% today. And if you look over the long sweep of history, in the US at least, over that time it's about 2 percentage points of real GDP growth each year. So I think the future is very uncertain. And again, that's the big motivation that I have in doing this work: trying to help us and others, policymakers, researchers, to see a bit further into the future, to see a bit more clearly.
A
So look, let's do this, let's stay on this future track just for a second and then I'll bring us back to ground, because people are living lives in 2026 and we need to help throw some light on that. So, you know, it does feel to me that this is something that is on a knife-edge balance, which is that if you are able to automate research end to end, and we're starting to see a number of companies out there funded by private capital trying to do this, you could in principle get to a point where discoveries happen faster and faster. One of my favorite examples is, you know, why didn't your parents use LED light bulbs when you were a kid? LED light bulbs are better. They use less energy, they're more controllable. And the reason they didn't was a problem of knowledge. We didn't know how to make them. It was a 100-year journey from the basic physics through to the first transistors 50 years later, in the 1950s and 60s, through to the first red LED in 1960. I was explaining to my team that the reason the Cylons in Battlestar Galactica in the 1970s and Knight Rider, the KITT car, both have red LED things around their face is because those were the only LEDs we could make at the time. And that was the future. And we didn't get blue LEDs until, you know, the late 1990s. And now it took five years for everyone to transition. If you can compress any part of that process, you can deliver enormous consumer benefit. But it may not show up in GDP, it may just show up in better services.
B
Sort of the mismeasured consumer surplus. I mean, that's also been a lesson of recent information technologies. The advent of Google, the ability to get information at your fingertips, didn't show up in GDP. For most of us, it's free in some sense. And yeah, I mean, I think this is a really interesting point, that the greatest value might ultimately be unmeasured, at least from the standpoint of GDP. I'm sympathetic to the critique of GDP as a measure of sort of technological progress and prosperity, but it is also the case that over time and across countries, choose your favorite measure of human prosperity and it tends to correlate very strongly with GDP. So it might not be the case that it's capturing everything, but it might be pointing us in the right direction.
A
Well, I gave you the excitable bull case there, but there is also the bear case that you alluded to through the Ben Jones paper, which makes me think of a year or so ago. Microsoft had a foundation model look at catalysts, and I think they came up with 35 million materials in one afternoon. And I thought back to Mittasch, who was Fritz Haber's co-worker on the Haber-Bosch process, who painstakingly looked at two and a half thousand compounds over the course of, you know, many, many months. And so we moved from a discovery process to actually a sifting and verification bottleneck, and those bottlenecks maybe require new skills. They might require new regulation, they might require sociological changes. And so it might take quite a long time to work your way through that system. You know, you just essentially move the bottleneck from discovery or work done to sorting or verification. And that might just, you know, slow us down again, and it becomes not so much the burden of knowledge, but the burden of sifting.
B
I'm a bit familiar with the empirical Bayes, large-scale inference techniques that can sort of allow you to sift out some signals when you do large-scale testing in this way. But I totally agree with the point in general that we may be overwhelmed, and this is not just in the scientific domain, we may be overwhelmed by an immense amount of information that is very challenging to process. And you might be able to use large language models to help you process that information as well. I'm bouncing around in my mind, but one of the things that you said reminded me of something in the paper. We do this exercise where we want to come up with an effective AI coverage measure that takes into account task reliability and how much time you are spending on different tasks. One of the examples that we call out where this seems to matter is microbiologists. If you just count up the tasks that microbiologists do, it looks like about half of their tasks are things that Claude can be, and actually is being, used for. But when you focus on the most time-intensive tasks for that role, like hands-on research with specialized lab equipment, large language models are not at the point where we are able to automate that process. And maybe we need more general-purpose robotics that can interact with this intelligence. But until we figure out that bottleneck, we will be constrained in how much we can achieve, even if we have the most brilliant microbiologist embedded in Claude's capabilities.
A
I think that this is such a good point for me, if I may, to bring in a question that came from Claude Opus 4.5 on specifically this issue. So I'm just going to scroll up to my notes and pull it out. So this was a super interesting point that it made, and it said, look, if you take a lawyer and you automate 90% of a lawyer's tasks, the 10% of the tasks that remain has all the liability sitting on it. And so the question is who really benefits in those circumstances. I mean, you know, I never wanted to be a lawyer, but a job where the only thing you do is put your neck on the line for the bit that the machine can't do doesn't feel very nice. So Claude's question was really relating to that, which was: if we end up getting 90% of the way through, you know, automation, don't we basically leave a lawyer facing 10 times the exposure? I mean, is that productivity or, it says after its em dash, is that fragility?
B
Yeah, I mean, I guess it might be leveraging the burden that is placed on one person as you are taking on more responsibility. To your point, especially in the legal domain. We don't talk specifically about that. And I don't have super crystal-clear thoughts. I mean, one thing that jumps to my mind is we do see Claude being used for complex legal-related tasks. That was something that we pointed out in the productivity report that came out just before Thanksgiving last year, in November. And I think it hints at the direction that there may be uneven impact across legal occupations. So paralegals might historically take on more of the reading and compiling and analyzing of legal documents in support of the lawyers at the firm. Claude is capable, and frontier models are increasingly capable, of doing that type of work. And you might see, and this is sort of a punchline of the report, that for some jobs there might be deskilling, where Claude's taking over the most complex tasks in your job, and that could lead to a greater risk of job displacement or lower wages for that type of role. I think the complement of that would be that lawyers are doing more than just looking at legal documents. There's a lot of interpersonal negotiation. Exactly, bearing the burden of the liability when you sign your name on the document. And so there you might actually see wages rise as this technology complements that work; it's like another form of skill-biased technical change. Two, five, ten years out from now, that's going to be much harder to anticipate given how quickly capabilities are developing. But that might be something that's on the near-term horizon.
A
I'm delighted to hear that Claude will allow lawyers to bill at higher rates. I'm not sure that's a net good to humanity, but there it is. Listen, I'm teasing you, but "it depends," in a way, shows up quite a lot in the report. It depends on what a person does. It depends on the tasks. It depends on the company's choices. It depends. It's a report we're all depending on, and it depends. And one of the things that struck me about the nuance that you brought to that was that it felt to me like a recipe for an economy-wide staircase implementation rather than a smooth curve. Not the smooth exponential curves that I love so much, but more a lumpy, punctuated deployment, because each step is dependent on a bunch of "it depends" issues that might relate to a firm or an industry or a sector, or regulated jobs, protected jobs, powerful classes of work. And if that's the case, some will hold out for longer. You know, it might shape the way that deployment actually happens. Perhaps it's going to be at the same time a bit slower than some people might imagine, but also, you know, jerkier as we hit each part of the staircase.
B
I think the way that I think about the jaggedness, it would be along two fronts. One, there's sort of the jagged frontier of capability: you know, we're improving along many dimensions, but we're proceeding more quickly along some domains, coding more so than others. And so you get this jagged baseline capability that is proceeding. And then you also have the question of adoption, exactly to your point. And historically, the adoption and diffusion of new technologies requires firms to make costly investments in new capital or new organizational workflows. That was kind of a point from our last report. And so that suggests that there will be this jagged adoption, maybe. Coding, again, is a great example, because we had very, very swift adoption here because the contextual information is right there for the model. With Claude Code, it can traverse the repository to figure out how to complete the task that it's actually capable of completing. We just introduced Claude Cowork, which suggests that there will be maybe another jump for knowledge domains that have similar properties through that type of interface, given the underlying capabilities. But I definitely agree with the point that it's not this inexorable and smooth march. And I think in some sense, as an economist, it's actually somewhat helpful that it is jagged, because it suggests that we can tease out some of these impacts along the way by comparing where this jagged adoption or these jagged capabilities first materialize. In some sense, that's the approach of the work that I alluded to before, the "Canaries in the Coal Mine" paper, or the folks at the Yale Budget Lab who've looked at this question from a different angle but incorporating AI exposure. You can use these differences in exposure and adoption to try to tease out not just where we might expect effects to materialize, but ultimately what those effects are.
And I think that is like a big question for the next year.
A
It's super important. We think back to the typewriter. We're going back to the 1890s, 1920s, an awful lot. Right. But it was 25 years for the typewriter to make its way through. And I think there was some other work by economic historians saying that companies, with the exception of startups like Ford, didn't fully transition to electricity until one generation of managers retired. Right. And a new generation emerged. And I guess, you know, what we see from other parts of history is that it's often through diffusion rather than direct innovation that we start to see broad-based benefits. But of course, you know, at Anthropic, Dario has said it's all about a data center of Nobel Prize winners that we'll be able to deliver in a couple of years. And one of the observations I would make, and, you know, respond how you like, is that I think most people I know would really struggle to get the most out of Claude Opus 4.1, which you're just about to turn off because it's, you know, a model that's a few months old, let alone 4.5. And that perhaps the thing that turns "it depends" into "we can do it" is a particular class of skills or confidence or attitude that may cluster quite a lot in Silicon Valley, but is much, much less profuse in other parts of the world. And I think it's that observation, which is: what is the tension between the exponential rate of improvement of these technologies, and they're not slowing down, and what it actually takes to diffuse them? Because my observation is that these more powerful models are in some senses harder for people to get the best out of.
B
I mean, I almost just agree with this point right off the bat, that the capability doesn't instantaneously deliver adoption. You need to identify new ways to deploy those capabilities in more accessible ways. I think Claude Code is a good example here, where software engineers and developers and those who are very comfortable working at the command-line interface were able to just jump in and use it. But if you don't have that background, it is entirely inaccessible to you, even when what you can use Claude Code for is actually much more than coding. And that kind of illustrates the value of Claude Cowork, which is a much more accessible entry point for folks who may feel some trepidation about downloading Claude Code or working with a command-line interface, even though the underlying mechanics of this agentic technology are very similar. And so you have this question of awareness. You need to identify what the bottlenecks are to user adoption versus business adoption. And it relies both on capabilities as well as product functionality. And then also people need to, I mean, I don't think I fully appreciated what Claude was capable of until I came to Anthropic. I hadn't been using frontier models in the same way that I have since I joined just six months ago. So it also requires experimenting and trying it out.
A
You had this number, 1 to 1.8% of annual productivity growth over the next decade, which on the one hand is impressive, right? It's cumulative. It's trillions of dollars, actually, by 2030, 2031. If you were a betting man, would you expect to be surprised to the upside or the downside?
B
I think I'd be surprised by a downside number. I mean, I think there's this open question of whether markets are currently pricing in the productivity. And that's this question of whether financial markets are appropriately aggregating information, and whether current valuations match future cash flows. But I think it's undeniable that this is a transformative technology. It's improving very fast. Our analysis is of current usage of current models, and it's being adopted very quickly, and I would be surprised if it was less than a 1% contribution over the next decade. And then you add on this question of the extent to which it may automate the process of innovation itself. I mean, Claude Cowork is actually an interesting example, where Claude Code accelerated the time it took the team here to produce it. And that type of fast iteration, fast innovation can point in the direction of even larger effects.
A
Peter, you have summed it up so beautifully and neatly skirted around the bubble-or-boom question without saying either word, which is really, really wonderful. That is all we have time for today. Thank you so much for one of the most rigorous attempts to answer some of the questions that we are all asking and many people are hand-waving about.
B
Thank you. What a privilege.
A
Thanks for listening all the way to the end. If you want to know when the next conversation is released, just hit subscribe wherever you're listening. That's all for now and I'll catch you next time.
Episode: Anthropic’s Head of Economics on AI Adoption Data, Claude Code, the Burden of Knowledge & the Next Generation of Experts
Host: Azeem Azhar
Guest: Peter McCrory, Head of Economics at Anthropic
Release Date: January 21, 2026
This episode features a deep dive into Anthropic's latest Economic Index report, an empirical look at real-world AI adoption based on millions of user interactions with Claude, Anthropic's large language model. Azeem and Peter discuss how businesses and individuals are using AI now, the emerging divide between automation and augmentation, the evolving requirements for expertise, and the organizational and societal implications of AI-driven change. They explore the tension between accelerating productivity and possible risks to skill development — especially for early-career workers — as well as the broader macroeconomic impacts.
| Timestamp | Segment |
|-----------|---------|
| 00:36 | Anthropic’s open data approach |
| 01:32 | Dual channels: Augmentation vs. Automation |
| 03:31 | Invisible general-purpose tech (electricity) |
| 05:13 | Task discretization: human vs. business view |
| 07:26 | Automation moving beyond coding |
| 11:35 | Bottleneck: Providing context for complex AI |
| 12:57 | Impact of Opus 4.5: jump in AI capability |
| 14:14 | Human oversight still essential |
| 19:30 | Loss of “apprenticeship” work: fragility risk |
| 26:09 | Importance of direct skill acquisition |
| 28:19 | Cultural fixes: slow writing & deep reading |
| 35:10 | AI as "innovation in the method of innovation" |
| 41:17 | Automation bottlenecks (e.g., in microbiology) |
| 43:05 | Fragmented work & liability for humans |
| 51:09 | Bottlenecks in capability vs. adoption |
| 53:03 | Productivity forecast: risk to upside/downside |
For listeners who missed the episode, this summary captures its empirical rigor, nuance, and lively, intellectually honest exchange between two leading thinkers at the intersection of economics and artificial intelligence.