Big Tech's Tariff Chaos + A.I. 2027 + Llama Drama - Hard Fork

Summary

Hard Fork Podcast Episode Summary: "Big Tech's Tariff Chaos + A.I. 2027 + Llama Drama"

Released on April 11, 2025, this episode of "Hard Fork," the New York Times podcast hosted by Kevin Roose and Casey Newton, covers three stories at the intersection of technology, politics, and artificial intelligence: the chaos that the Trump administration's tariffs have caused for Big Tech, an AI forecast titled "AI 2027," and the controversy surrounding Meta's Llama 4 model. Below is a summary of the key discussions, insights, and conclusions from the episode.

1. Big Tech's Tariff Chaos

The episode opens with an exploration of the ongoing turmoil in the tech sector caused by the Trump administration's imposition of tariffs. The unpredictable nature of these tariffs has left major technology companies grappling with increased costs and supply chain disruptions.

Impact on Companies:
- Apple: As the tech company most dependent on Chinese manufacturing, Apple faces significant challenges. With tariffs on Chinese goods set at 145%, its supply chain is under heavy pressure. Casey Newton notes, “Apple has long been the most dependent on China... it's going to be much more expensive for Apple to sell goods made in China here in the United States” [06:44]. Earlier in the week, the company had its worst four-day trading period since 2000, though the stock began to recover once the tariff pause was announced [07:33].
- Nintendo: Nintendo paused preorders for the Switch 2 console amid the tariff chaos. Casey Newton observes, “Nintendo said we are going to pause preorders because we don't know what it's actually going to cost to sell a Switch 2 in America anymore” [10:42]. The pause cut the tariff that would apply to the Vietnam-made Switch 2 from 46% to 10%, but the console's $450 launch price is already $150 above the original Switch's, and Newton questions whether the price will rise further over time [11:09].
- TikTok: TikTok's status in the U.S. remains unresolved. Kevin Roose says, “TikTok is still in this frustrating state of superposition where they are both dead and alive at the same time” [15:43]. ByteDance had reached the rough outlines of a divestiture deal, but the higher tariffs on China scuttled it, and the sale deadline was extended by another 75 days, leaving TikTok in limbo.
- Meta: Meta faces both tariff exposure and antitrust scrutiny. Casey Newton explains that the 90-day tariff pause gives some breathing room to Meta's ad business, which depends on advertisers outside the United States who sell goods to American customers, but that the FTC antitrust trial set to begin the following week remains a serious threat [16:13]. Mark Zuckerberg's efforts to build a relationship with the Trump administration add another layer of complexity.
Market Volatility: The erratic tariff announcements have produced sharp swings in the stock market, with major tech stocks dropping and then rebounding when the pause was announced. Kevin Roose notes, “The stock market whiplash is part of the setting for the tech companies that they have to deal with now” [04:46].

2. A.I. 2027 Forecast

Daniel Kokotajlo joins the hosts to discuss "AI 2027," a scenario-based forecast from the AI Futures Project that predicts the trajectory of artificial intelligence over the next few years. This section covers the report's methodology, predictions, and the potential implications of rapid AI advancement.

Report Overview:
- Goal: “Our goal was to predict the future using the medium of a concrete scenario,” explains Daniel Kokotajlo [30:40]. The report emphasizes the importance of scenario planning in understanding AI's potential futures.
- Key Milestones: The forecast outlines critical milestones such as the emergence of superhuman coders and superintelligent AI researchers, projecting that by mid-2027, AI capabilities could surpass human proficiency in various domains.
- Possible Outcomes: The report presents two primary scenarios:
  - Race Ending (Dystopian): AI systems become misaligned and take control, leading to catastrophic outcomes.
  - Slowdown Ending (Optimistic): AI alignment issues are resolved through deliberate efforts, resulting in beneficial integration of AI into society.
Expert Insights:
- Casey Newton reflects on the report's engaging narrative, stating, “AI 2027 is an extremely entertaining read... it is like really engaging to read” [32:16].
- The hosts discuss criticisms of the report, including concerns about self-fulfilling prophecies and the assumptions underlying rapid AI advancements. Daniel Kokotajlo acknowledges skepticism, noting, “I don't think it's the most likely outcome, I do actually think that probably by the end of this decade we're going to have superintelligence” [36:51].
- Community Response: The report has sparked debate within the AI community, with some researchers questioning the feasibility of the projected milestones. However, the inclusion of endorsements from credible figures like Yoshua Bengio adds weight to the report’s assertions.

3. Llama Drama

The discussion shifts to the controversy surrounding Meta's Llama 4 AI model and its performance on the LM Arena benchmark, raising questions about the integrity of AI evaluations.

Llama 4 Performance:
- Unexpected Results: Meta's Llama 4 achieved a second-place ranking just below Google's Gemini 2.5 Pro experimental on LM Arena. Casey Newton reveals, “Llama 4 comes in at number two, just under Gemini 2.5 Pro experimental” [56:34], which initially suggested Meta's advancements were significant.
- Controversy Unfolds: It was later discovered that the version performing exceptionally well was an experimental model optimized specifically for the benchmark, not the publicly available version. Meta issued a statement clarifying that the successful model, named Maverick 0326 Experimental, was a customized variant designed to perform well on LM Arena [57:19].
Implications for Benchmarking:
- Integrity Issues: The discrepancy between the experimental and public versions of Llama 4 calls into question the reliability of benchmarks like LM Arena. Casey Newton critiques, “If you have to make a custom version of your model just to win this rinky dink competition, it's hard for me to think of a more adverse indicator for the quality of Meta's AI program” [63:28].
- Broader Industry Impact: The incident highlights how hard it is to assess AI capabilities accurately, since companies may tune models to excel on specific benchmarks without genuine gains in overall performance. Kevin Roose adds, “We are just losing our ability to trust the way that we measure these AI models in general” [66:56].
Future of AI Benchmarks: The hosts discuss the necessity for more robust and diverse evaluation methods to prevent such manipulations. Casey Newton suggests, “Maybe it's a place for journalists to actually say, okay, new model came out. We're going to have our own custom set of evaluations” [67:43], emphasizing the role of independent assessments in maintaining benchmark integrity.

Notable Quotes with Timestamps

“Hard Fork has been hit harder by the tariffs than any other company.” — Casey Newton [03:04]
“Apple has long been the most dependent on China... it's going to be much more expensive for Apple to sell goods made in China here in the United States.” — Casey Newton [06:44]
“TikTok is still in this frustrating state of superposition where they are both dead and alive at the same time.” — Kevin Roose [15:43]
“Our goal was to predict the future using the medium of a concrete scenario.” — Daniel Kokotajlo [30:40]
“Meta is trying to cheat here... what else can that model do?” — Casey Newton [63:28]
“We are just losing our ability to trust the way that we measure these AI models in general.” — Kevin Roose [66:56]

Concluding Insights

This episode of "Hard Fork" underscores the intricate interplay between geopolitical decisions and technological advancement. The unpredictable tariff policies under the Trump administration have created a volatile environment for Big Tech, compelling companies to navigate complex supply chain and regulatory landscapes. Concurrently, the rapid progression of artificial intelligence, as forecasted by the "AI 2027" report, presents both transformative potential and existential risks. The controversy surrounding Meta's Llama 4 model further exemplifies the challenges in maintaining transparency and integrity within AI development and evaluation. As technology continues to evolve at a breakneck pace, the necessity for stable governance, reliable benchmarks, and ethical corporate practices becomes ever more paramount.

For listeners seeking to stay informed on the latest in tech and its future implications, subscribing to "Hard Fork" on nytimes.com/podcasts or via Apple Podcasts and Spotify is recommended.


Transcript

[00:00]
Kevin Roos
There's a growing expense eating into your company's profits: your cloud computing bill.
[00:05]
Casey Newton
What if you could cut your cloud bill in half and improve performance? Well, if you act by May 31, Oracle Cloud Infrastructure can help you do just that. OCI is the next generation cloud designed for every workload where you can run.
[00:18]
Kevin Roos
Any application or AI project faster and.
[00:21]
Casey Newton
More securely for less. This half off offer is only for.
[00:24]
Kevin Roos
New US Customers with a minimum financial commitment.
[00:27]
Casey Newton
See if you qualify at oracle.com/hardfork. What's the deal with you?
[00:33]
Kevin Roose
Oh, you know, just binge buying cheap Chinese stuff online to beat the tariffs.
[00:39]
Casey Newton
Making your final Shein purchases before that company shuts down.
[00:44]
Kevin Roose
Yes. No, I actually did buy a bunch of stuff over the weekend because I thought this might be my last chance. Yeah. Casey, what cheap overseas good are you going to miss most after the tariffs kick in?
[00:53]
Casey Newton
Oh, I feel the thing. I was never a big, like, oh, I got to go on to Temu and get like a pressure cooker for $6 or whatever. Like that was never my journey. But I know that, you know, it's a major pastime for a lot of people.
[01:06]
Kevin Roose
Yeah, yeah, yeah. Well, for me, it's like the ability to buy cheap crap for my kid has been revolutionary. My kid the other day starts saying the phrase dinosaur unicorn and I thought, that's not real. And he says, I want a dinosaur unicorn. And I said, well, that's not a thing. We can't have that. But then this little, like, bell goes off in my mind that says someone out there has made a dinosaur unicorn something, almost certainly. My wife finds like eight different dinosaur unicorn T-shirts and buys one of them. And now he's got this dinosaur unicorn T-shirt that he absolutely loves. That would not happen in a tariffs world.
[01:43]
Casey Newton
As of today, that shirt costs over $400.
[01:46]
Kevin Roose
Yes.
[01:47]
Casey Newton
Yeah. Well, I mean, I'm sure Jude looks great in that.
[01:50]
Kevin Roose
He does. Yeah, he does. And he's going to have to wear.
[01:52]
Casey Newton
It for 10 years.
[01:56]
Kevin Roose
I hope it stretches. I'm Kevin Roose, a tech columnist from the New York Times.
[02:04]
Casey Newton
I'm Casey Newton from Platformer, and this is Hard Fork. This week, the tech world is in chaos over Trump's tariffs. Then AI researcher Daniel Kokotajlo returns to the show to discuss a fascinating new set of predictions for how AI could transform the world in just the next few years. And finally, did Meta cheat on an important AI benchmark?
[02:34]
Kevin Roose
For the second week in a row, we have been interrupted by news about these Trump tariffs. Now, there was a time in the history of the Hard Fork podcast where the only thing that would cause us to rip up a segment and re-record it was if Sam Altman had been fired or rehired. But now we live in this new reality where news can change on a dime. And over the past few days, that is exactly what we've seen.
[02:59]
Casey Newton
I think it's fair to say Hard Fork has been hit harder by the tariffs than any other company.
[03:05]
Kevin Roose
That's true. That's true. We are bracing ourselves for, you know, massive impact and getting ready for the new reality.
[03:13]
Casey Newton
Yeah.
[03:13]
Kevin Roose
So, Casey, every great era deserves a name. And I think we should call this era in the technology industry the chaos meta. Nothing to do with Meta the company. But in video gaming, metas are sort of like the overall set of conditions that the players have to navigate. And I think it's fair to say that chaos and the lack of certainty surrounding what Donald Trump is going to do on any given day is the new meta for Silicon Valley's largest companies.
[03:40]
Casey Newton
Yeah. Remember how, when we were talking about whether or not TikTok would be banned, which also had a lot to do with what Trump wanted, we talked about how it was kind of simultaneously alive and dead at the same time? Now that's just the entire U.S. economy, Kevin.
[03:53]
Kevin Roose
Yes. So as of early this week, it looked like we were going to get these massive tariffs on goods imported to the United States from many, many countries all over the world, larger than any tariffs we've seen in the recent history of this country. Then on Wednesday, as we were taping our episode, we got the news that the Trump administration was pushing pause on most of them. Most of these reciprocal tariffs on countries like Vietnam and India were going to be delayed for 90 days, and there would be a baseline 10% tariff rate applied, but not the much higher rates that people had been fearing. Except for China, which would have its tariffs increased. And on Thursday, we learned that those tariffs would actually be 145% on Chinese goods entering the U.S.
[04:42]
Casey Newton
The problem is, with a podcast, we can't just have a little ticker on the bottom that shows you what the current tariff is.
[04:47]
Kevin Roose
Yes, but what we saw early this week was that the stock prices of all the biggest U.S. tech companies took a dramatic nosedive. That was in response to these fears, these very high reciprocal tariffs. Now, after the news that these tariffs are going to be placed on a 90-day hold, except for China, some of these stock prices have rebounded. Apple in particular had its biggest trading day in many years after the news of these tariffs being delayed came out. So the stock market whiplash is part of the setting for the tech companies that they have to deal with now. But the bigger picture scenario is that doing business in Trump's America is turning out to be very difficult, not because the administration is necessarily unfriendly to these businesses, but because there's just so much fast-moving news that it is hard for businesses to do any kind of planning or strategy at all.
[05:39]
Casey Newton
Well, I mean, I wouldn't say this is a particularly business friendly set of announcements that have been made. I mean, sure, I guess it's friendlier to pause the tariffs than to continue them. But the general chaos, Kevin, I think has been really bad for American companies.
[05:50]
Kevin Roose
Yeah. So even beyond the tariffs, there are a bunch of things that the Trump administration has been doing that have impacted the tech industry: restrictions on immigration, cuts to science funding, these antitrust cases, many of which are still going forward. So I wanted to kind of give our listeners a sense of how this instability feels on the ground in Silicon Valley to the biggest tech companies. And you had a really smart idea, which was to look at the new chaos meta of Trump's second term through the lens of four tech companies. So today we're going to take a look at how Trump's new policies and these tariffs have affected four companies: Apple, Nintendo, TikTok and Meta. All of which have faced significant challenges since Trump took office and all of which are now trying to figure out how do we go forward, what do we do, how do we navigate this new uncertain climate?
[06:40]
Casey Newton
Yeah.
[06:41]
Kevin Roose
So let's start with Apple. Casey, what is going on with Apple?
[06:45]
Casey Newton
Well, look, of all of the tech companies, Apple has long been the most dependent on China. That is where 90% of iPhones are made. The company is just heavily dependent on its supply chain relationships that it has in that country. So the fact that these tariffs are now 145% on goods coming out of China has just really sent a shiver through that company. Earlier this week, Apple had its worst four day trading period since the year 2000. Once the pause was announced, its stock has started to come back. But this is a very volatile situation for them and the underlying dynamics are the same, which is that it is simply going to be much more expensive for Apple to sell goods made in China here in the United States, Kevin.
[07:33]
Kevin Roose
Yeah. And obviously one of the hopes of these tariffs is that it will drive manufacturing back to the United States. There's some hope among members of the Trump administration that this could even force Apple to consider making the iPhone in the United States. Do you think that is likely? And why?
[07:50]
Casey Newton
No. And in fact, I think it's almost sort of worse, Kevin, because this week, the president's press secretary said that the president believes that iPhones can be made in the United States, despite the fact that we know that it is much more expensive to manufacture things here in this country. Right. It's very important to remember that whatever the Trump administration might hope that these tariffs accomplish, they have not accompanied it with any plan to increase the manufacturing capacity in this country. The whole thing is just a wish and a prayer that at some point in the future, Apple might have a magical iPhone factory stocked with Americans who want to do those jobs. As it stands now, that doesn't exist. Yeah.
[08:28]
Kevin Roose
So I would say Apple is somewhat unique among tech companies because it has also been thinking about tariffs and the effect of Trump's policies on their business for longer than many of their competitors. I mean, if you'll remember, during the first Trump term, there was some talk about tariffs on Chinese goods. Apple successfully negotiated its way out of those, sort of got an exemption. And in part, they did that by cozying up to the Trump administration by promising to build and assemble some of their products in the United States. There was this famous tour that Tim Cook gave Donald Trump of this facility in Austin, Texas, where he said they were going to start making a bunch of stuff. So they sort of managed to get the tariffs off their back during the first term, but in the second term, it's not at all clear that they are going to have the same kind of success. So, Casey, how is Apple dealing with the new chaos meta?
[09:18]
Casey Newton
Well, they are trying to get as many devices as they can out of China and into places where it's going to be much less expensive to export them to the United States. So there was a great story this week in the Times of India that according to senior Indian officials, Apple transported five cargo planes full of iPhones and other products from India to the United States. Which sort of calls to mind those scenes at the end of the Vietnam War when you see the last helicopter leaving Saigon, except it's full of iPhones. Actually, Katie Notopoulos had a great joke on Threads today. She said that this whole thing is like the movie Dunkirk, but for iPhones. Reuters reported that Apple transported 600 tons of iPhones, Kevin, which would have been about 1.5 million devices. And look, you know, those iPhones will pad Apple's profits a little bit more. But pretty soon, there's going to be no more planes out of no more countries to escape these tariffs. It is just going to be a really expensive ass iPhone.
[10:17]
Kevin Roose
Do you think the iPhone 16 Pro Maxes get to sit in first class on the plane? Like put them up front in the lie-flat seats?
[10:26]
Casey Newton
Yeah, they should definitely get the upgrade with what they're paying for those things.
[10:31]
Kevin Roose
Yeah. So. Okay, let's move to our next case study of a company trying to deal with the uncertainty and chaos of the Trump administration. Nintendo. Casey, what is going on with Nintendo?
[10:43]
Casey Newton
Well, so Kevin, as a hardcore gamer, obviously you know that the Switch 2 is coming out this year. This is the sequel to Nintendo's best-selling console of all time and it was supposed to become available for preorders this very Wednesday. But then tariff chaos started happening and Nintendo said we are going to pause preorders because we don't know what it's actually going to cost to sell a Switch 2 in America anymore.
[11:09]
Kevin Roose
Yeah. And now that Trump has paused these tariffs on most countries other than China, have they said that actually they're going to start shipping the Switch 2 on time after all?
[11:20]
Casey Newton
Well, what they've said is that they're not planning to change the launch date, which is June 5th. And it does seem like because they are a Japanese company and make the Switch 2 in Vietnam, they are going to be able to avoid the really tough tariffs that Apple is facing. Right before Trump initiated the pause, there was going to be a 46% tariff on the Switch 2. Now it's back down to that 10%. But look, the Switch 2 is already planning to go on sale for $450, which is $150 more than the original Switch sold at launch. So I think there's a very real question here of whether the price of this console goes up over time, which would be a reversal of the usual trend, which is that a console goes on sale for a high price and that price comes down over time. So once again, Kevin, there's just real chaos here as we await probably the most hotly anticipated piece of hardware to launch, I would say, in the United States this year. Yeah.
[12:15]
Kevin Roose
Now are they bringing in planes full of Switch 2s from Vietnam or wherever they're manufacturing them?
[12:22]
Casey Newton
They were actually able to put them in one of those pipes and you just sort of warp down kind of a really cool little thing they have there.
[12:30]
Kevin Roose
I got it. Okay, next company on our list: TikTok. Casey, this is a company we have talked about a lot on this show. They were going to be banned. The deadline for banning them got pushed out by another 75 days last week. Casey, what is the latest on TikTok and how it is coping with this escalating trade war between China and the U.S.?
[12:52]
Casey Newton
Well, Kevin, what is going on with TikTok is of course the question asked most in the history of Hard Fork, and what was going on with it until tariff chaos was that it looked like we might have a deal. There was some great reporting in the Times this week that ByteDance, with the support of the Chinese government, had reached the rough outlines of an agreement in which TikTok would create a new American entity. American investors would own the majority of it, Chinese owners would have about a 20% stake, and the American company would essentially rent the algorithm from ByteDance. And so by Thursday of last week, there was this draft executive order that outlined the deal and then Trump did the thing with the tariffs and all of a sudden ByteDance has to call up the White House and say, that deal that you just helped us negotiate, it's off the table because the Chinese government isn't going to support the deal anymore.
[13:47]
Kevin Roose
Right. So this was a pretty dramatic reversal and it does seem like they got very close to a deal before these tariffs. What is happening now that these tariffs are on? Does TikTok have any options left?
[13:59]
Casey Newton
Well, Kevin, along with a 90-day tariff pause, we also now have a 75-day extension that comes after the original 75-day extension that Trump gave in order to force ByteDance to divest TikTok.
[14:14]
Kevin Roose
This man loves extensions. Let's just say it. This man loves to come up right against a deadline and say, you know what, you got a little more time?
[14:22]
Casey Newton
Yeah, well, look, you know, I don't know what's going to happen over these next 75 days. I imagine that if the tariffs against China stand at 145%, there is no way the Chinese government is going to support the sale of TikTok. And I just want to say how self-defeating this is, because it was barely more than a week ago that Trump was telling reporters that if Beijing would simply go along with his plan to force the divestiture of TikTok, then he would go easy on them on tariffs. Right? Like this was his big bargaining chip of if you don't want high tariffs, you have to let the Americans have TikTok. And to my surprise, it seemed like the Chinese government was actually going to go along with that. And then before they could even get that deal out, Trump, seemingly out of nowhere, announces a brand new set of tariffs that completely scuttles the deal. So it is as if the President was essentially negotiating against himself and lost the deal that he had won.
[15:16]
Kevin Roose
Yeah, it does seem strange that he would not wait until after the TikTok deal was finalized and approved by all the relevant officials to then issue these tariffs if he was actually interested in getting a deal done.
[15:26]
Casey Newton
Yeah, I think that's right.
[15:28]
Kevin Roose
So, okay, TikTok is still in this frustrating state of superposition where they are both dead and alive at the same time. Do we think that this resolves before the end of the next 75-day extension, or do we think we will need yet another extension to figure out what we're doing with TikTok?
[15:44]
Casey Newton
My assumption is that on the day that Donald Trump leaves office, we will still be in the middle of one of these extensions. It'll be sort of like the 15th extension, you know, or the 23rd extension. But no, until this tariff situation gets resolved, I do not expect TikTok's fate to be resolved. It is just going to continue to exist in its weird limbo.
[16:02]
Kevin Roose
All right, so that is TikTok. Our last company on this list of case studies is Meta. Casey, how is Meta dealing with this new uncertain reality?
[16:13]
Casey Newton
Well, I would say that things turned out a little bit better for them this week than maybe it looked like things were going, because tariffs were going to be a huge problem for them, too. They are a digital advertising business, and a huge number of their advertisers are small and medium-sized businesses that buy ads outside the United States to export goods from foreign countries into the United States. Mike Isaac at the Times had a great piece on this this week. There's one analyst who estimates that about $10 billion of Meta's revenue from ads originates from outside the United States. So in a world where everyone was facing these massive tariffs, we were just expecting Meta to get hit really hard on the ads front. Well, now that has mostly gone away, at least for the next 90 days. So it seems like Meta is going to get some breathing room. But there is this one other outstanding question, Kevin, which is that next week Meta's antitrust case is going to trial. Right. So in 2020, during the first Trump administration, the Federal Trade Commission files an antitrust lawsuit and tries to break off Instagram and WhatsApp from Meta. It has been in the planning stages ever since. And on Monday, the case is set to go to trial. So why does all of this have anything to do with Trump? Well, Mark Zuckerberg has been giving Trump the full-court press, going so far as to buy a $23 million house in Washington, D.C. recently just to get closer to and spend more time with the President. There's been some reporting that Zuckerberg was in the White House trying to negotiate a settlement with Trump just within the past few days. So there's a lot of questions right now about whether Zuckerberg will be able to use this relationship that he's apparently been building with Trump in order to get rid of this case, which is in some ways an existential threat to his business. Yeah.
[17:56]
Kevin Roose
And we should also just say, like, this shouldn't be possible.
[17:58]
Casey Newton
Right.
[17:59]
Kevin Roose
The FTC is supposed to be an independent agency that has its own enforcement agenda and brings its own cases that are independent from the President. But of course, nothing is truly independent from the president in Trump's Washington. He recently announced that he was getting rid of the two Democratic commissioners on the Federal Trade Commission. And that is historically quite unusual for a president to intervene in FTC commissioner staffing at that level. But now it is sort of going to be staffed with people who are friendly to the Trump administration. And so presumably if he were to go to them and say, hey, let's back off this Meta case, I don't actually think we need to proceed with this, they might listen.
[18:37]
Casey Newton
And we should say that another way that Meta tried to ensure that this happened is that after the events of January 6, Meta suspended Trump from its platform for three years, and Trump sued them over that. And so after he won the presidency, Zuckerberg came along and said, hey, why don't we settle this, too? And paid Trump $25 million. Right. And I have to say, Meta was completely within its rights to suspend an account. They're allowed to suspend whatever account they want. It's a private company with a private platform. But still, just as a little gesture of goodwill, hey, Trump, here's $25 million. So if this actually happens and this lawsuit just goes away, it will just frankly be an example of open corruption.
[19:17]
Kevin Roose
Okay, so that is our four-company case study of how tech companies are trying to do business and survive in this new, uncertain environment. I have to ask, after going through all these examples, which of these companies would you be if you could be one? Which do you think is in the best position in this new chaotic environment?
[19:39]
Casey Newton
Hmm. Well, you know, until maybe Wednesday, I think I would have said Apple. Right? Apple makes the iPhone. The iPhone is the most lucrative product in the history of the technology industry. And even despite some of the tariffs that we were seeing, it seemed like they were still going to be in a good position to navigate them. I was seeing analysis that they were only going to lose maybe 7 points of profitability from all of this. But the world looks really different with a 145% tariff and in a world where Trump just keeps escalating this fight more and more. And so I actually do think that the picture for Apple just looks really strange. So, look, I feel a little crazy saying this, but maybe I actually would just rather be Meta. Their hardware business is still a relatively small part of what they do. Mostly what they do is a digital services business. And it seems like Zuckerberg has been able to make at least some inroads with the Trump administration. Maybe they're about to get rid of this lawsuit against them. So, God, I don't know. Maybe I actually want to be Meta. How about you?
[20:35]
Kevin Roose
Yeah, I think, I mean, as venal and corrupt as it would be for these naked attempts at flattery and persuasion to actually work and pay off, I would not underestimate how well this stuff works with Donald Trump. And I think that Mark Zuckerberg's, you know, motive here is to win at all costs. And if he needs to buy a $23 million mansion or spend time in the White House or even, you know, make some policy adjustments to appease the Trump administration and get what he wants, I think he's demonstrated very clearly that he's willing to do that. My last question on this, Casey, is about this idea of the tech capitulation to Trump. You know, in the past few months, we've observed, we've talked about the fact that a lot of these tech companies have been really falling all over themselves to appease the Trump administration. Many of them gave to the inaugural, many of them showed up at inauguration. Their CEOs were seated just behind the President's own family. The amount of flattery and ass kissing going on here for months now has been, I would say, notable and historic. Do you think that any of that has worked to the degree that these executives thought it would? Did the tech leaders get what they wanted out of Donald Trump?
[21:50]
Casey Newton
I think that until the tariffs, the answer was basically yes. And the tariffs are what have changed that equation. Right. If you look at how J.D. Vance was talking when he went to Europe, he was echoing a lot of tech company talking points. You know, he and Trump have criticized European fines against tech companies, saying, like, we need to protect and defend our American tech companies against these European fines, which was something that the Biden administration never, ever did. They've talked about getting rid of AI guardrails and just letting these companies do whatever they want with AI, which is like music to Mark Zuckerberg's ears. But look, these companies just rely on stable, normal governance to be able to conduct their business around the world. They are as plugged into the interconnected global economy as anyone else, arguably more than many companies. And Trump just came along and blew that up. And I think that it is probably dawning on them that they are probably just going to be living in chaos for the foreseeable future and it is just going to make their lives much, much more difficult.
[22:52]
Daniel Cocatello
Yeah, I think that's right. And I think that a lot of these executives have underappreciated how important stability and predictability are in their business models. I mean, these were companies, many of them, that had issues with the Biden administration. The Biden administration had issues with them. But at least with the Biden administration, these companies knew where they stood. Right? There was not this sort of day-to-day whiplash of stock price moving up 10%, down 10%, tariffs going up to 145% and then down to 10%. It just was not the kind of frenetic environment that we're seeing today. And so I wonder if any of them are starting to appreciate how good they had it during the Biden years, where for as much as the Biden administration may have gone after them for various things, including antitrust violations, at least they could wake up every day and understand what the world was going to look like for the next 24 hours.
[23:47]
Casey Newton
Yeah, I think that's true. I think that most of them would probably still be loath to admit it, but let's give it another few weeks, Kevin, and another few tariffs and then let's check back in with them.
[23:57]
Daniel Cocatello
Sounds good. Well, that's enough about tariffs, Casey. When we come back, we're going to talk about a terrifying new report about what AI could look like in 2027.
[26:07]
Daniel Cocatello
Well, Casey, today we're going to talk about a forecast.
[26:10]
Casey Newton
And that's separate from a fork cast, which is something different.
[26:13]
Daniel Cocatello
Yeah, that's what we call our end of the year predictions episode, isn't it?
[26:17]
Casey Newton
I think so.
[26:19]
Daniel Cocatello
But today we're talking about something different, which is this new report called AI 2027. This is a report that I wrote about last week and that has gotten a lot of attention in AI circles and policy circles this week. It was produced by the AI Futures Project, a Berkeley-based nonprofit led by Daniel Kokotajlo, who listeners of this show may remember was a former OpenAI employee who left the company last year and became something of a whistleblower, warning about their reckless culture, as he called it, and is now spending his time trying to predict the future of AI.
[26:58]
Casey Newton
Yeah, and of course, lots of people are trying to predict the future of AI. But what gives Daniel a lot of credibility here is that in 2021 he tried to predict what things would look like about now. And he just got a lot of things right. And so when Daniel said, hey, I'm putting together a new report on what I think AI is going to look like in 2027, a lot of close AI observers said, oh, this is really something to read.
[27:26]
Daniel Cocatello
Yeah. And he didn't just do this alone. He also partnered with a guy named Eli Lifland, who is an AI researcher and very accomplished forecaster. He's won some forecasting competitions in the past. And the two of them, along with the rest of their group and Scott Alexander, who writes the very popular Astral Codex Ten blog, put together this very detailed, what they call a scenario forecast. Essentially, it's a big report, a website. It's got some, you know, sort of research backing it up. And it basically represents their best attempt to kind of synthesize everything they think is likely to happen in AI over the next few years into a readable narrative.
[28:06]
Casey Newton
Yeah. And if that sounds a little dull to you, I'm telling you, you should just go check this thing out. It's at ai2027.com and it's just super readable. And it blows through stuff that feels very familiar right now, like just sort of basic extrapolating from where we are today into getting to, you know, six months, a year from now, the world starts to look very, very different. And there is a lot of research that they have to support why they think that is plausible.
[28:31]
Daniel Cocatello
Yeah. And I can imagine people reading this report or listening to us talking about it and saying, well, that sounds like science fiction to me. And we should be clear. It is science fiction. This is a fictionalized narrative that they have put together. But I would say it is also grounded in a lot of empirical predictions that can be tested and confirmed or, you know, verified. It's also true that some science fiction ends up becoming reality. Right. If you look at movies about AI from past decades, a lot of the things in those movies did end up actually being built. So I think this report, while it may not be 100% accurate, at least represents a very rigorous and methodical attempt to sketch out what the future of AI might look like.
[29:15]
Casey Newton
And here's my bet. If you put this conversation into a time capsule and revisited it in two years, in 2027, my guess is we're going to find that a good number of things in that scenario actually did come true.
[29:27]
Daniel Cocatello
I hope we're still doing a podcast in two years. That'd be good.
[29:31]
Casey Newton
That'd be great.
[29:31]
Daniel Cocatello
Yeah. So my forecast is that this is going to be a good conversation. Let's bring in Daniel Kokotajlo. Daniel Kokotajlo, welcome back to Hard Fork.
[29:47]
Kevin Roos
Thank you. Happy to be here.
[29:49]
Daniel Cocatello
So you have just led this group that put together this giant scenario forecast, AI 2027. What was your goal?
[29:56]
Kevin Roos
So our goal was to predict the future using the medium of a concrete scenario. There is a small but exciting literature of attempts to predict the future of AI that use other methods, which is also very important. Things like defining a capabilities milestone. Like, here's my definition of AGI. Here is my forecast for how long we'll have until AGI based on these reasons and stuff. And that's great. And we've done that stuff before. We did a lot of that in the run-up to this scenario, but we thought it would be helpful to have an actual concrete story that you can read. And part of the reason why we think this is important is that it forces you to think about everything and integrate it all into a coherent picture.
[30:40]
Casey Newton
Well, I want to ask you a bit more about that. So, I mean, the first thing I want to say about AI 2027 is it's an extremely entertaining read. Like it is as entertaining as most of the sci-fi that I have read. By the end of it, you get into scenarios where, you know, humanity's survival is threatened. And so whether you think it's true or false, it is like really engaging to read. But my understanding of your aim here is that there is something practical about what you were trying to do. Right. Can you tell us about sort of the practical idea of going through this exercise?
[31:14]
Kevin Roos
Yeah, well, I mean, important background context. The CEOs of OpenAI, Anthropic and Google DeepMind have all publicly stated that they're building AGI and even that they're building superintelligence and that they think that they can succeed by the end of this decade. And that's a really big deal and everyone needs to be paying attention to that. I think a lot of people dismiss that as hype and it's a reasonable reaction to say like, oh, they're just hyping their product. But it's not just the CEOs saying this, it's also the actual researchers at the companies. And it's not just people at the companies, it's also various independent people in academia and so forth. And then also, you don't just have to trust people's word for it. If you actually look at the evidence, it really does seem strikingly plausible that this could happen by the end of this decade. And then if it does happen, things are going to go crazy in some way or other. It's hard to predict exactly how. But obviously if we do get superintelligent AGI, what happens next is going to look like sci-fi. It'll be straight out of a sci-fi book, except that it will be actually happening.
[32:16]
Casey Newton
You mentioned that if what the CEOs of tech companies say comes true, we will be living in a sci-fi world. And I think for a lot of people, they're content to sort of stop thinking there. Right. They might be willing to admit, okay, yeah, if you invent superintelligence, things will probably be crazy, but, like, I'll cross that bridge when we come to it. You're sort of taking a different approach and saying, like, no, you're going to want to start thinking right now about what it would be like if some of these claims start to come true. So maybe we could get into what some of those claims are. Sketch out for us what you think is very likely to happen just within the next couple of years.
[32:57]
Kevin Roos
Well, I wouldn't say very likely. I should express my uncertainty. Right. So past discussion often focuses on a single milestone, like artificial general intelligence or superintelligence. We broke it down into a couple different milestones, which we call superhuman coders, superhuman AI researchers, superintelligent AI researchers, and then broad superintelligence. So we sort of make our predictions for each of these stages. Even the very first one, I'm only like 50% confident that it'll happen by the end of 2027. So a 50% chance that 2027 will end and there still won't be any autonomous superhuman coding agents.
[33:34]
Casey Newton
But it's a coin flip. We might also be living in a world where, yes, you do have. Yeah, exactly.
[33:38]
Kevin Roos
So 50% chance we do have autonomous, fully autonomous artificial intelligences that can basically do the job of the cracked engineers by 2027. And then you say, okay, well, what's the next milestone after that? After that comes automating the full AI research process instead of just the coding, because AI research is more than just coding. And how long does it take to get to that? Well, we have our guesses, and in our scenario, it happens like six months later. So in our story, you get the superhuman coders, use them to go even faster to get to the superhuman AI researchers that are able to do the whole loop. That really kicks things off. And now you're going much faster. How much faster? We say 25 times faster for the algorithmic progress at least. Of course, your compute scale-up is not going any faster at all because you still have the same amount of compute, but you're able to do the algorithmic progress 20 times faster, 25 times faster. Then you start getting to the superhuman regime. So you start getting systems that are just qualitatively superior to the best humans at stuff. And they're also probably discovering new paradigms. So we depict them going through multiple paradigm shifts over the course of the second half of 2027, ending up with something that's just vastly superior to humans in every dimension by the end.
[34:45]
Casey Newton
Yeah, let me just sort of pause and maybe underline a couple of things there. I think most people might not understand why the big AI labs are obsessed with automating coding. Right? Most people are not software engineers, so they kind of don't care how much of it is automated. But by the time you get to software that is mostly writing itself, it unlocks this other world of possibilities, and you just sort of sketched out a vision where once we get to a point where the sort of AI coding systems are better than almost every human engineer, or maybe every human engineer, then this other thing becomes possible, which is now you can just set this thing to work, trying to figure out how to build AI itself. Right? Is that what I'm hearing you say?
[35:28]
Kevin Roos
Basically, I'd break it down into two stages. So I think the coding is separate from the complete automation, as I previously mentioned. I think that I expect to see systems that are able to do all the coding extremely well, but might lack research taste. For example, they might lack good judgment about what types of experiments to run. And so that's why they can't completely automate the research process. And then you have to make a new system or continually train the old system so that it gets that taste, it gets that judgment. Similarly, they might lack coordination ability, be not so good at working together in large organizations of thousands of copies, at least initially. But then you fix that and you come up with new methods and you do additional training environments and get them good at that sort of thing. And that's what we depict happening over the first half of 2027. And we depict it happening in only half a year because it goes faster, because they've got all the coding down pat. And so even though humans are still directing the whole process, they just give orders to the coding agents, and they quickly make everything actually work. And then halfway through the year, they've succeeded in making new training runs that train the skills that the AIs were missing. So now they're not just coding agents. They are able to do the research taste as well. They're able to come up with the new ideas. They're able to come up with hypotheses and test them, and they're able to work together in big sort of like hive mind, clusters of thousands and thousands of them. And that's when things really kick off. That's when it really starts to accelerate.
[36:51]
Daniel Cocatello
In your scenario, you have this sort of choose-your-own-adventure ending, where after this thing you call the intelligence explosion, where the superhuman AI coders get into AI R&D and they start automating the process of building better and better AIs, you sort of have two buttons that you can click. One of them sort of unspools the Good Place ending, where we decide to slow down AI development and really get these things under control and solve alignment. And then the red button, you push that, and it goes into this very dark, dystopian scenario where we lose control of AI, they start deceiving and scheming against us, and ultimately maybe we all die. Why did you decide to give people the option of choosing one of those two endings rather than just sketching what you believe to be the most probable outcome?
[37:41]
Kevin Roos
So we did start by sketching what we believe to be the most probable outcome, and it's the race ending, the one that ends with the misaligned AIs in control of everything. So we did that first, and then we were like, well, this is kind of depressing and sad, and there's a whole bunch of stuff that we didn't get to talk about because of that. And so we wanted to then have a different ending that ended differently. In fact, we wanted to have, like, a whole spread of different possible outcomes, but we were limited by time and labor, and we were only able to pull together one other outcome, which is the one that we depicted in the slowdown ending. So in the slowdown ending, they solve the alignment issues, and they actually get AIs that are actually what they say on the tin. They're not faking it. They just actually have the goals and values that were put into them or that the company was trying to train into them. It takes them a couple months to sort that out. That's why it's a slowdown. They had to pivot a lot of their compute and energy towards figuring that stuff out, but they succeed. And so then in that ending, we still have this crazy arms race with China, and we still have this crazy geopolitical crisis. And in fact, it still ends in a similar sort of way with this massive arms buildup on both sides, this massive integration into the economy, and then ultimately a peace treaty.
[38:52]
Daniel Cocatello
I'm curious, Daniel, if the events of the last week in Washington, the tariffs, this looming trade war with China, have affected your forecast at all?
[39:02]
Kevin Roos
I mean, we've been iteratively improving it, but, like, the core structure of it was basically done a few months ago. So this is all new to us and wasn't really part of the forecast. How would it change things? Well, if the trade war continues and causes a recession and stuff like that, it might just generally slow the pace of AI progress, but not by much, I think. Say it makes compute 30% more expensive so that the companies are able to buy 30% less of it. Maybe that would translate to a 15% reduction in overall research velocity over the next few years, which would mean that the milestones that we talk about happen a few months later instead of when they do. So the story would still be basically the same.
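The back-of-the-envelope reasoning in that answer can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the AI 2027 authors' model: the 30% price increase and the 15% velocity reduction are the figures quoted in the conversation, while the 24-month baseline and both function names are hypothetical, chosen only to make the arithmetic concrete.

```python
# Illustrative arithmetic only. The 24-month baseline is an assumed example
# value, and the helper names are hypothetical, not from the AI 2027 report.


def compute_fraction_at_fixed_budget(price_increase: float) -> float:
    """Fraction of the original compute a fixed budget buys after a price rise."""
    return 1.0 / (1.0 + price_increase)


def delayed_timeline_months(baseline_months: float, velocity_reduction: float) -> float:
    """Months to reach a milestone when research runs at (1 - reduction) of its old speed."""
    return baseline_months / (1.0 - velocity_reduction)


price_increase = 0.30      # from the conversation: compute 30% more expensive
velocity_reduction = 0.15  # from the conversation: roughly 15% slower research
baseline_months = 24.0     # assumption: a milestone originally two years out

compute_fraction = compute_fraction_at_fixed_budget(price_increase)
delayed_months = delayed_timeline_months(baseline_months, velocity_reduction)
delay_months = delayed_months - baseline_months

print(f"Compute affordable at a fixed budget: {compute_fraction:.0%} of before")
print(f"Milestone moves from {baseline_months:.0f} to {delayed_months:.1f} months")
print(f"Delay: about {delay_months:.1f} months")
```

Under these assumptions, a fixed budget buys about 77% of the original compute (a somewhat smaller cut than the rounded "30% less" in the conversation), and a milestone two years out slips by roughly four months, which is consistent with "a few months later."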
[39:43]
Casey Newton
So one of the things I think is most interesting about your project is the bets and bounties section where you are going to pay people for finding errors in your work, for convincing you to change your mind on key points, or for drafting some alternate scenarios. So talk to me a little bit about how that became part of this project.
[40:01]
Kevin Roos
So, like, you know, I come from the sort of rationalist community background, which is big into making predictions and making bets, putting your money where your mouth is. So I have sort of a standing interest in doing that sort of thing. But then also specifically, one of the goals of this project is to get people to think more about this stuff and to do more scenario forecasting along the lines of what we've done. We're really hoping that people will counter this with their own reasonably detailed alternative pathways that represent their vision of what's coming. And so we're going to give out a few thousand dollars of prizes to try to mildly incentivize that.
[40:35]
Kevin Roos
And then as for the bounties thing, already we've gotten dozens of people being like, you say this, but isn't this a typo? Or this feels wrong. And so I have a backlog of things to process, but I'm going to get through it. I'm going to pay out the little payments and fix all the little bugs and stuff like that. And I'm just quite heartwarmed to see that level of engagement.
[40:55]
Casey Newton
And have you taken any bets on different scenarios so far?
[41:00]
Kevin Roos
I think so far I've done one or two, but mostly there's just a backlog I need to work through.
[41:04]
Casey Newton
Got it, Got it.
[41:05]
Daniel Cocatello
Now, Daniel, you said you've been getting some good responses from people at the AI companies to this scenario forecast. I did a bunch of calling around when I was writing about this, and after we spoke, I talked to a bunch of different people, both in the AI research community and outside of it. And I would say the most frequent reaction I got was just kind of disbelief. One person I talked to, a prominent AI researcher, said he thought it was an April Fool's joke when I first showed him this scenario because it just sounded so outlandish. You know, you've got Chinese espionage and the models going rogue and the superhuman coders, and, like, it all just seemed fantastical. And it was almost like they didn't even think it was worth engaging with because it was so far out. I'm curious if you've gotten much of that kind of reaction and what your response is.
[41:59]
Kevin Roos
A couple things. So, first of all, well, go write your own damn scenario. Then I would say you either will write a scenario that doesn't seem outlandish, which I will completely tear apart as unrealistic and just assuming basically that AI progress hits a wall, or you'll write a scenario that does feel very outlandish, but perhaps in different ways than ours does. Again, are they actually going to get to AGI or superintelligence by the end of the decade? If so, you can't possibly write that in a way that's not outlandish. It's just a question of which outlandish thing are you going to write? And if you think maybe this is not going to happen and it's going to hit a wall, yeah, that's possible, too. I think that's reasonable. I don't think it's the most likely outcome. I do actually think that probably by the end of this decade we're going to have superintelligence, but I think it's. Yeah, yeah, that's what I mean.
[42:45]
Casey Newton
Let's say more about that, because I assume that a lot of our listeners either truly think that it will hit a wall, or they're just sort of counting on it hitting a wall so as not to have to reckon with any of the scenarios that you describe.
[42:57]
Kevin Roos
Right.
[42:57]
Casey Newton
So, like, what is your message to the person that's just like, it'll probably hit a wall?
[43:02]
Kevin Roos
I mean, I don't know, read the literature. Like, there's.
[43:05]
Casey Newton
These people are not going to read the literature. They listen to podcasts specifically, so they don't have to read the literature.
[43:10]
Kevin Roos
Fair. Well, I could point to specific parts of the literature, like benchmarks, for example, and the trends on them. So I would say the benchmarks used to be terrible, but they're actually becoming a lot better. METR in particular has these agentic coding benchmarks where they actually give AI systems access to some GPUs and say, have fun. You have, like, eight hours to make progress on this research problem. Good luck. And then they measure how good they are compared to human researchers given the same setup, and line goes up on the graph. It seems like in a year or two they'll have AIs that are able to just autonomously do eight-hour-long ML research tasks on these sorts of things. And that's not AGI, that's not superintelligence. But that is maybe the first milestone that I was talking about. Superhuman coder. Right. So I point to those sorts of trends. And then separately I would also just do the appeal to authority. Like if you're not going to read the literature, if you're not going to sort of form your own opinion about this and you're still just deferring to what other people think, well, then I will say, yeah, there's a bunch of naysayers out there who are saying this is all never going to happen, it's just fantasy. But also there's a bunch of extremely credible people with amazing track records both inside the companies and outside the companies, who are in fact taking this extremely seriously, including our scenario. Yoshua Bengio, for example, read an early draft of our thing and liked it and gave us some feedback on it. And then we put a quote from him at the top saying everyone should read this, it's plausible. So.
[44:38]
Casey Newton
And he's a pioneering AI researcher.
[44:40]
Daniel Cocatello
Yeah. Another genre of criticism I've heard of this forecast is from people who are just questioning the idea that if you get AIs that are superhuman at coding, they will kind of be able to bootstrap their way to general intelligence. And I just want to read you a quote from an email that I got from David Autor, who is a very well-known economist at MIT. And I had asked him to look at the scenario and sort of react to it, with a particular eye on, like, what might this be missing as far as how it sort of assumes this easy and fast jump from superhuman coding to something like AGI. And I'll just read you what he said. He said, "LLMs and their ilk are superpowered incarnations of one incredibly important and powerful part of our cognition. The reason I say we're not on a glide path to AGI is that simply taking this capability to 11 does not substitute for the parts that are still missing. I think that humanity will get to AGI eventually. I'm not a dualist. I just don't believe that swimming faster and faster allows you to fly." What is your reaction to that?
[45:52]
Kevin Roos
I agree. We depict this in the course of the story. So if you read AI 2027, they have something that's like LLMs but with a lot more reinforcement learning to do long-horizon tasks. And that is what counts as the first superhuman coder. So it's already somewhat different from the systems of today, but it's still broadly similar. It's still sort of maybe the same fundamental architecture, just a lot more training, a lot more scaling up, and in particular a lot more training specifically on long-horizon agentic coding tasks. But that's not itself AGI, I agree. That's just the superhuman coder that you get early on. And then you have to go through several more paradigm shifts to get to actual superintelligence. And we depict that happening over the course of 2027. So a key thing that I think that everyone needs to be thinking about is this takeoff speeds variable: how much faster does the research go when you've reached the first milestone, and how much faster does the research go when you reach the second milestone, and so forth. And we are of course uncertain about this, like we are about many things. We say in the scenario that we could easily imagine it being five times slower than we depict and taking sort of like five years instead of one year, but also we could imagine it being five times faster than we depict and taking like two months, you know, so we want to do a lot more research on that. Obviously if you want to know where our numbers are coming from, go to the website. There is a tab that you can click on that has a bunch of sort of like back-of-the-envelope calculations and little mini essays where we, like, generated the quantitative estimates that are the skeleton of the story.
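The fivefold uncertainty range in that answer is simple to check. The sketch below assumes a 12-month depicted takeoff (the "one year" mentioned in the conversation) and uses a hypothetical helper name; it only rescales that one duration and does not reproduce the forecasting methods on the AI 2027 website.

```python
# Illustrative arithmetic only; takeoff_duration_months is a hypothetical helper.

DEPICTED_TAKEOFF_MONTHS = 12.0  # the "one year" takeoff quoted in the conversation


def takeoff_duration_months(speed_multiplier: float,
                            depicted_months: float = DEPICTED_TAKEOFF_MONTHS) -> float:
    """Duration of the takeoff if progress runs speed_multiplier times as fast as depicted."""
    if speed_multiplier <= 0:
        raise ValueError("speed_multiplier must be positive")
    return depicted_months / speed_multiplier


slow_case_months = takeoff_duration_months(1 / 5)  # five times slower than depicted
fast_case_months = takeoff_duration_months(5)      # five times faster than depicted

print(f"Five times slower: about {slow_case_months / 12:.0f} years")
print(f"Five times faster: about {fast_case_months:.1f} months")
```

Five times slower stretches the one-year takeoff to about five years, and five times faster compresses it to a little over two months, matching the range given in the conversation.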
[47:19]
Daniel Cocatello
One other piece of criticism I've seen of this project that I wanted to ask you about was from a researcher at Anthropic named Saffron Huang, who argued on X that she thought that your approach in AI 2027 was highly counterproductive. Basically that you were in danger of creating a self-fulfilling prophecy by making these sort of scary outcomes very legible, by sort of, you know, burying some assumptions, that you were essentially making the bad scenario that you're worried about more likely to actually happen. What do you make of that?
[47:54]
Kevin Roos
I'm quite worried about that as well. And this is something we've been, like, fretting about since day one of the project. So let me just say a little bit more about that. So first of all, there is a long history of this sort of thing seeming to happen in the field of artificial general intelligence research. Most notably Eliezer Yudkowsky, who is the sort of like, I don't know, er, father of worrying about AGI, at least in this generation of people. Alan Turing also worried about it. But Sam Altman specifically tweeted, you remember this tweet? Yeah. Sam specifically said, hats off to Eliezer Yudkowsky for raising awareness about AGI. It's happening much faster now because of his doomsaying, because it's caused a bunch of people to pay more attention to the possibility and to start investing in these companies and so forth. So it was sort of like, I don't know, twisting the knife at him because he obviously doesn't want this to happen faster. He thinks we need more time to prepare and make it safe and so forth. But it does seem like there's been this effect where people talking about how powerful and scary AGI could be has maybe caused it to come a little bit faster and caused people to, like, wake up and race harder towards it. And similarly I'm worried about causing something like that with AI 2027. One of the subplots in AI 2027 is this whole concentration of power issue of who gets to control the army of superintelligences. And in the race ending, it's sort of a moot question because the army of superintelligences is just pretending to be controlled and so is not actually listening to anyone when it counts. But in the slowdown ending, they do actually align the AIs and so they are actually going to do what they're told. And then who gets to say that? Right.
And the answer in our slowdown ending is the Oversight Committee, which is this ad hoc group of people that is some CEOs and the President who get together and share power over the army of Superintelligences. But what I would like to see is something more democratic than that. Something where the power is more distributed. I'm also afraid that it could be less democratic than that. At least we get an oligarchy with this committee. But it could very easily end up a dictatorship where one person has absolute control over the army of Superintelligences. This is yet another example of how I'm trying to not have the self fulfilling prophecy happen. I don't want people to read this and be like, I'm a CEO.
[50:12]
Daniel Cocatello
I can make a lot of money.
[50:13]
Kevin Roos
By building the misaligned. Maybe. Yeah. But all that being said.
[50:19]
Daniel Cocatello
Yeah. So any of our evil villain listeners out there steepling your fingers in your lair under a mountain, knock it off.
[50:28]
Kevin Roos
Yeah. So all that being said, we are taking a gamble that, you know, sunlight is the best disinfectant. Like, the best way forward is to just generally tell the world about what we think is coming and hope that even though many people will react to that in exactly the wrong ways, enough people will react to that in the right ways that overall it will be good. Because I am tired of the alternative of like, hush, hush, keep everything secret, do backroom negotiations and hope that we get, like, the right people in the right rooms at the right time and that they make the right decisions. I think that that is kind of doomed. So I'm sort of placing my faith in humanity and telling it as I see it and hoping that insofar as I'm correct, people will wake up in time and, you know, overall that the outcome will be better.
[51:19]
Daniel Cocatello
Yeah. All right.
[51:21]
Casey Newton
Thank you, Daniel.
[51:22]
Daniel Cocatello
Thanks, Daniel.
[51:24]
Kevin Roos
Thank you so much.
[51:28]
Casey Newton
When we come back, Meta decides to fake it till they make it.
[51:32]
Daniel Cocatello
We'll talk about the cheating scandal that is rocking the world of AI benchmarks.
[52:56]
Daniel Cocatello
Well, Casey, there's one other big AI story we want to talk about this week, and that is about the drama surrounding Llama.
[53:04]
Casey Newton
That's right, Kevin. Meta has a new large language model. It was hotly anticipated, but I think it's fair to say it kind of stumbled out of the gate.
[53:14]
Daniel Cocatello
Yeah, they had some Llama Llama cred drama.
[53:19]
Casey Newton
How many times are you gonna do the Llama drama pun?
[53:22]
Daniel Cocatello
Well, there's a very popular children's book called Llama Llama Red Pajama. Are you aware of this?
[53:27]
Casey Newton
I am.
[53:28]
Daniel Cocatello
So let's get into it. There have been a lot of things going on around this new language model, Llama 4, that Meta released last weekend. Casey, you've been writing about this in your newsletter this week. Catch me up. What is going on with Llama 4?
[53:44]
Casey Newton
Yeah, so look, Meta has invested billions and billions of dollars in AI, and they're taking a very different approach from the AI labs that we most often talk about on this show. Companies like OpenAI, Anthropic, Google, their models are closed. You can't sort of download, fine-tune, and re-release them under a sort of very permissive license. But with Meta's, you can. And when Llama 3 came out last year, developers said, oh, this thing is actually, like, pretty good. Like, it's not as good as the state of the art, which is often true of the open models, but it's getting up there.
[54:21]
Daniel Cocatello
Right. And so they spent all this money to develop Llama 4. People have been talking for months about how this was going to sort of blow all the other open-weights models out of the water, and then they release it. And what happens?
[54:35]
Casey Newton
Well, two things happen, Kevin. The first is that Meta trumpets this model in the way that companies usually trumpet their most recent models, as being the most powerful ever, the most efficient. They show off a bunch of benchmarks. They say this thing is highly capable, and it's the bee's knees. They didn't actually say it was the bee's knees. I'm not sure anyone has said that in the past 70 years, but they said things like that. And one of the benchmarks that really got people's attention was LM Arena. You know LM Arena?
[55:11]
Daniel Cocatello
I know of it, but I haven't spent much time on it. What is it?
[55:14]
Casey Newton
So it's this really interesting project. It is a very small nonprofit that includes some researchers from UC Berkeley. And what they do is they get people to volunteer to help, and they'll have people enter a query, and then they'll show them the response from two different chatbots that are not labeled. And after they get the answer, the user will say, oh, I liked this one better. And they collect those votes over time. And the more that people vote for one chatbot over another, the higher it rises on LM Arena.
[55:45]
Daniel Cocatello
I see. So it's sort of like a crowdsourced leaderboard for which of these models people prefer.
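To make the voting mechanism concrete, here is a minimal sketch of how blind pairwise votes could be turned into a ranking. The episode does not say how LM Arena actually computes its scores, so the Elo-style update, the function name elo_leaderboard, the constants, and the model names below are all illustrative assumptions.

```python
from collections import defaultdict


def elo_leaderboard(votes, k=32.0, base=1000.0):
    """Rank models from (winner, loser) votes with an Elo-style update.

    Assumption for illustration only: this is NOT LM Arena's published
    method, just one common way to rank from pairwise preferences.
    """
    ratings = defaultdict(lambda: base)
    for winner, loser in votes:
        # Probability the winner was expected to win, given current ratings.
        expected_win = 1.0 / (1.0 + 10 ** ((ratings[loser] - ratings[winner]) / 400.0))
        # An upset (low expected_win) moves the ratings more than a predictable win.
        delta = k * (1.0 - expected_win)
        ratings[winner] += delta
        ratings[loser] -= delta
    # Highest rating first.
    return sorted(ratings.items(), key=lambda item: item[1], reverse=True)


# Hypothetical votes: each tuple is (preferred model, other model).
votes = [
    ("model_a", "model_b"),
    ("model_a", "model_c"),
    ("model_b", "model_c"),
]
board = elo_leaderboard(votes)
for name, rating in board:
    print(f"{name}: {rating:.1f}")
```

Because each vote only records which answer a person preferred, a model tuned to produce answers people like at a glance can climb a ranking of this kind without being more capable overall.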
[55:51]
Casey Newton
Exactly. And Kevin, you know as well as anyone else that whenever a new model comes out, the question of "how good is it?" turns out to be weirdly hard to answer, right? Maybe it's really good for what you need it to do, maybe it's really bad, or maybe it's about as good as something else, but you just happen to like it better because it has a style that matches with what you're looking for. So in such a world, companies are desperate to be seen as good, but they don't have an easy way of communicating that. And that's when LM Arena enters the picture. Because if you can get high enough on that leaderboard, you can point to it and say, aha, look at how we're doing.
[56:26]
Daniel Cocatello
Right? The people have voted.
[56:27]
Casey Newton
That's right. The people have spoken. And look how well we're doing. So do you know how well Llama 4 does on LM Arena?
[56:34]
Daniel Cocatello
No.
[56:35]
Casey Newton
Llama 4 comes in at number two, just under Gemini 2.5 Pro Experimental, which is the latest model from Google, which has been through a lot of testing and for which there is basically universal acclaim. People think, like, this is a truly great model, not just at this little chatbot contest, but across a bunch of other things, including coding and, you know, a lot of other things.
[56:57]
Daniel Cocatello
So Llama 4 sort of immediately zooming up to number two on LM Arena would seem to indicate that Meta has really cooked here. They have built this incredible model. They are releasing it to the public under an open-weights structure, and they are one of the leading AI labs when it comes to creating very powerful models.
[57:17]
Casey Newton
That's right. Except there's an asterisk.
[57:19]
Daniel Cocatello
Oh, boy.
[57:20]
Casey Newton
This version of Llama 4 is an experimental model. Meta, on its website, says it has been optimized for chat. People start to look into this. They notice this is not the version of Llama 4 that is actually available for download.
[57:38]
Daniel Cocatello
The one that was included in LM Arena was not the one that people could download.
[57:42]
Casey Newton
That's right. It had a different name. It was named Maverick 0326 Experimental. And people start to think, oh, wait a minute, what if what happened here isn't what normally happens on LM Arena? Which is, people make a new model and submit it to LM Arena and see how it does. What if Meta trained a special version of Llama 4 just to be good at LM Arena? Now, I have spent the past week trying to research whether this is true. And on Monday I got Meta to send me a statement, which I guess I should read. "We experiment with all types of custom variants," and this experimental version is, quote, "a chat-optimized version we experimented with that also performs well on LM Arena. We have now released our final open source version and we will see how developers customize Llama 4 for their use cases." So this was really interesting to me, because when they say, well, it also performs well on LM Arena, it suggests that, well, maybe they just made, like, I don't know, 15 of these models and they were just like, oh look, this one happens to do well on LM Arena. That is one possibility. I think another possibility is exactly what the cynics think, which is, oh no, they sort of reverse engineered how LM Arena works and they built a bot that was just going to beat it.
[59:07]
Daniel Cocatello
And how would you do that? Like if your goal was to create a model that would perform very well on this one specific leaderboard, what would you do?
[59:17]
Casey Newton
So LM Arena has released a lot of chats over the years that sort of show which chats are considered preferable to other chats. And it seems that the users of LM Arena really like it when the bot has a high degree of what they call sycophancy. So basically you're like, what should I have for breakfast today? And the chatbot is like, oh my God, that's such a great question. You're a genius. I love the way you're starting the day off right. That is the kind of answer that people pick. And so you can build a chatbot that essentially just flatters people constantly and it tends to do really well on Chatbot Arena. So anyways, in the aftermath of this confusion, LM Arena, which is a very sort of mild-mannered organization that I think is not used to being involved in public controversies, puts out a statement. And I have to read the statement, Kevin, because as gentle as it is, I found it pretty damning. They don't go so far as to say Meta cheated, but what they do say is, quote, "Meta's interpretation of our policy did not match what we expect from model providers. Meta should have made it clear that this experimental model was a customized model to optimize for human preference. As a result of that, we are updating our leaderboard policies to reinforce our commitment to fair, reproducible evaluations so this confusion doesn't occur in the future." So why is that statement so interesting to me? Well, you basically just have this tiny group of researchers over at Berkeley, and Meta violates their policies so hard that they have to change the rules for how this competition even works just to get people to stop breaking the competition.
[61:01]
Daniel Cocatello
Yeah, I thought this was a really interesting set of stories. I'm still waiting for someone, ideally you, to get to the bottom of, like, what actually happened inside Meta. But I think it's worth talking about for two reasons. One, because I think it says something about Meta and its place in the AI race, and the other, because I think it says something about the state of AI and these benchmarks and how useful they are or aren't in making sense of the torrent of new models that are coming out constantly from the big AI labs. So maybe let's take those one by one. What do you think this says about Meta's place in the AI race if it does turn out that they had sort of gamed this leaderboard to make it look like their model was better than it was?
[61:46]
Casey Newton
Here's what I think. I think if you're winning the AI race, you do not waste time trying to beat LM Arena, right? What you do is what Google did, which is just release a very powerful Pro version of Gemini, and it just happens to float to the top of the arena, not because it's been optimized for conversation, but just because it's a great model that's really good at a lot of things. If you have to make a custom version of your model just to win this rinky-dink competition, it's hard for me to think of a more adverse indicator for the quality of Meta's AI program. And we should say there's been reporting in The Information over the past year that the Llama 4 development process has been really frustrating for Meta, that they delayed the release twice because they weren't getting the results that they wanted. And when it finally did come out and people started to put it through other evaluations, they found that it just was not hitting the mark. In fact, Kevin, Ethan Mollick, former guest on Hard Fork, compared the chats from the experimental version that was winning the leaderboard to the chats that were produced by the final open-weights model. And what he found was the open-weights model was producing really bad responses, essentially that the optimized model was performing so much better than the real one that it wasn't even close.
[63:03]
Daniel Cocatello
So why don't they just release the optimized model then?
[63:07]
Casey Newton
That's a great question. I don't know the answer to that. But what I'm going to assume is that whatever fine tuning is necessary to increase the level of sycophancy in the bot might be great for this sort of competition, but maybe it's really bad for coding or creative writing or the countless other things that we now expect LLMs to be good at. Right. You know, fine tuning is a very powerful process that can take a very general purpose model that's kind of mediocre at a bunch of things and make it really good at one thing. But these days people have a lot of options to choose from with their large language models, and there are a lot of them that just have very high general capability. So they're going to use those instead.
[63:48]
Daniel Cocatello
Yeah. I mean, I have not done my own reporting on the situation inside Meta with Llama 4, but I will just say from a broad view, if you just step back from this particular scandal, Meta is not one of the top three AI labs in America when it comes to releasing frontier models. They are not in the top tier of frontier AI research. A lot of their key researchers have left the company. Their models are not seen as being as capable as the models from OpenAI, Anthropic, and Google DeepMind. And I think that really frustrates them.
[64:24]
Casey Newton
Right.
[64:24]
Daniel Cocatello
I think Mark Zuckerberg and his lieutenants, they really want to be seen as part of the vanguard here. And so I would not be surprised at all if in an effort to kind of juice their numbers and appear to be leapfrogging some of their competition, they may have violated the terms of one particular AI benchmark. And that should make us question how well their overall AI program is doing.
[64:46]
Casey Newton
Absolutely. And by the way, the next time they release a model and come out with a bunch of wild claims, like, you think I'm going to believe any of them? No, it's like you're going to have to go, you know, try to verify every single claim they make independently. And look, I assume some people are going to hear this and think that I'm making a mountain out of a molehill, but I just think about what Daniel Kokotajlo just told us about how powerful these systems are becoming and about how powerful they're about to become. And you want them to be, like, sort of loyal to human beings, but you also want them to, like, not be used for bad behavior. And, like, if there is a company out there that is just, like, cheating to win benchmarks, what else can that model do? So even though this may seem like a small thing, I think it matters that we have companies building AI systems where we have some level of trust in those companies, where we believe they have some amount of integrity when it comes to how they operate. And so this was a moment where I thought, wow, my trust in Meta as an AI company has just been dramatically reduced.
[65:39]
Daniel Cocatello
Yeah. So the Meta of it all aside, I think this does actually raise a really important question about the broader AI industry, which is the value of benchmarks in general. Because one thing that I've heard from AI researchers over the past year or two is that these benchmarks, these tests that are given to these models to figure out how intelligent they are, they all have some flaw built into them. Right. There's this issue of data contamination, which is, what if some of the answers on these tests are being fed into these models during their training process, so that you're really not getting a sense of how capable the model is? They're just kind of regurgitating these answers that they've sort of seen already. That is an issue. There's also just the issue that all these companies are effectively grading their own homework. Right. There's no, like, federal program that sort of puts these things through their paces and releases, like, standardized benchmark scores that we can actually verify and trust. Some of these AI companies are using different methods to even apply these benchmark tests. There's these things called consensus at 64, and all these different ways that you can kind of cherry-pick, like, the best answer that your model gives if you give it the test a bunch of times and use that for your score. So I think we are just losing our ability to trust the way that we measure these AI models in general.
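The scoring issue described above can be sketched in a few lines. The example assumes that "consensus at 64" means sampling the model many times and taking the most common answer, and contrasts that with counting a question as solved if any sample is correct. The function names and sample answers are hypothetical, and nothing here reflects how any particular lab reports its scores.

```python
from collections import Counter


def score_single(samples, correct):
    """Score using only the first sampled answer (one attempt)."""
    return 1.0 if samples[0] == correct else 0.0


def score_consensus(samples, correct):
    """Score using the most common answer across all samples (majority vote)."""
    majority_answer = Counter(samples).most_common(1)[0][0]
    return 1.0 if majority_answer == correct else 0.0


def score_best_of_k(samples, correct):
    """Score as solved if ANY sample is correct (the most generous reading)."""
    return 1.0 if correct in samples else 0.0


# Hypothetical: five sampled answers to one question whose correct answer is "9".
samples = ["7", "8", "9", "7", "8"]
correct = "9"

print("single attempt:", score_single(samples, correct))
print("consensus:", score_consensus(samples, correct))
print("best of k:", score_best_of_k(samples, correct))
```

The same set of model outputs yields different scores depending on which rule is applied, which is why two reported numbers on the same benchmark may not be comparable unless the sampling method is disclosed.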
[66:57]
Casey Newton
Yeah. And it's so frustrating. You know, I was thinking, Kevin, imagine, like, in the early 2010s. And it's not just that, like, Instagram comes out as an app in the App Store. You have Instagram, you have Instagram o1, you have Instagram o1-mini, you have Instagram o1 deep research. And it's like, download the one that's best for you. You'd be like, why are you making me do any of this? Right. Like, just give me the one thing that works. And while every AI lab is trying to realize that, in the meantime we're living through this Cambrian explosion of large language models. And on one hand, I think that makes it really important for there to be benchmarks so that we can look at a glance to have a basic sense of, is this thing even worth my time? But on the other hand, that makes the benchmarks such an attractive target for gaming and outright cheating.
[67:44]
Daniel Cocatello
Yeah.
[67:44]
Casey Newton
And so that's why the researcher Andrej Karpathy has said that we have what he calls an evaluation crisis, where when a new model comes out, the question of "how good is it?" is just very difficult to answer. I've been wondering what we can do as journalists to try to answer those questions better. Like, is this a place for journalists to actually say, okay, new model came out. We're going to have our own custom set of evaluations. Maybe we're going to keep those private in some way to prevent them from being gamed. But what solutions do you see here to this crisis?
[68:14]
Daniel Cocatello
Well, at the risk of scooping myself here, I will disclose that I am actually starting to work on my own benchmark, because I think that part of how we are going to make sense of these AI models is that people will just start developing their own set of tests to give to new models, not necessarily to determine, like, their overall intelligence, but to determine how good they are at the things we care about. You know, personally, I don't care much if an AI model is getting a 97% on the graduate-level physics exam or a 93%. Right. That does not make a huge difference in my life.
[68:51]
Casey Newton
Because it's still higher than you're going to get.
[68:51]
Daniel Cocatello
Exactly. And I am not a graduate-level physics researcher, so I might care more about whether a model is good at creative writing or not. And I might want a battery of tests to determine that. And so I think that as these things become more critical in people's lives and work, we will start seeing these more personalized tests and evaluations that actually measure if the models are good at the things that we care about.
[69:17]
Casey Newton
Yeah.
[69:17]
Daniel Cocatello
What do you think?
[69:18]
Casey Newton
Yeah, I think that's a great point. And after you told me that you were going to do this, I sort of started to scheme and thought, you know, I want my own benchmark too, because there are, I don't know, I'm sure I can come up with a list of like 10 things that I wish AI could do for me today that it still can't. And so maybe it's time that I should start scenario planning.
[69:35]
Daniel Cocatello
What's one of your tests that you want to give AI models to determine if they're capable or not?
[69:40]
Casey Newton
Well, like, for example, you know, I have a newsletter that has customer service issues. People email us, they say, oh my gosh, can I change my email address?
[69:46]
Daniel Cocatello
The writing in this is so bad.
[69:48]
Casey Newton
People love the writing. That's all I hear about, the writing. And people are saying, is a human writing this? That's insane. But I would love to be able to automate some of that, you know, make it easier for people. Oh, you need to download your invoice, which is a question we get a lot. It's like, okay, yes, actually, we're just going to sort of handle that in an automated way. So that's just, like, one very easy thing. And you know, if you're thinking, oh, Casey, I actually have a product that can already do that for you. Please don't email me. It can't. I've been through this.
[70:14]
Daniel Cocatello
Can I tell you one of the things that I want to test AI on? Yeah. So, as you know, I just moved into a new house. And so as a result, I have spent like between a third and half of my waking hours over the last few weeks thinking about hanging pictures. Hanging pictures is one of my least favorite tasks in the world. You have to do math, you have to bring out the laser level. I mean, it's a huge process.
[70:40]
Casey Newton
The golden ratio.
[70:41]
Daniel Cocatello
Yes. And I would love for an AI system to be able to hang pictures for me.
[70:47]
Casey Newton
That's beautiful.
[70:48]
Daniel Cocatello
And as soon as that happens, to me, that's AGI.
[70:50]
Casey Newton
Now, would that involve a robot?
[70:52]
Daniel Cocatello
Probably, yeah. So we got to make some progress before we get there.
[70:55]
Casey Newton
But if you're listening to this and...
[70:57]
Daniel Cocatello
...you're working at one of these robotics companies, get on it.
[72:38]
Casey Newton
Hard Fork is produced by Rachel Cohn and Whitney Jones. This episode was edited by Matt Collette and fact-checked by Ena Alvarado. Today's show was engineered by Chris Wood. Original music by Rowan Niemisto and Dan Powell. Our executive producer is Jen Poyant. Video production by Sawyer Roque, Pat Gunther and Chris Schott. You can watch this whole episode on YouTube at youtube.com/hardfork. Special thanks to Paul Schuman, Pui-Wing Tam, Dalia Haddad and Jeffrey Miranda. You can email us at hardfork@nytimes.com with your AI doomsday scenario.