
Kevin Roose
Casey, what's going on?
Casey Newton
Oh, my gosh. So the other day I'm walking down Market Street, and for context, this is, you know, maybe one of the main thoroughfares in San Francisco. And over the past year, this is a little bit obnoxious, but I would say four or five times someone has recognized me from the podcast and stopped me and wanted to take a picture. It always makes my day. Hard Fork listeners are the best. It had happened to me just the previous week. Well, then this weekend, I'm coming home from the gym, and you know how you are when you're coming home from the gym. Your face is flushed.
Kevin Roose
Yeah, you're sweaty.
Casey Newton
Sweaty. Your hair's, you know, all over the place. And this very sweet young woman comes up to me and asks for a picture. And of course I'm thinking I kind of look, you know, gross right now, but anything for a Hard Fork listener, right? And she's there with a guy who I assume is, you know, her boyfriend or her husband. And so I, you know, I put on a show and I'm introducing myself, you know, hey, and, you know, what's your name and all that. She hands me her phone and they go and they stand up against the street with their backs turned, you know, so they can get kind of San Francisco in the background. And that's when I realized these people have no idea who I am. They are just tourists and they want a picture of themselves in San Francisco.
Kevin Roose
I'm Kevin Roose, a tech columnist at The New York Times.
Casey Newton
I'm Casey Newton from Platformer, and this is Hard Fork. This week, Google's crazy new plan to build data centers in space. Is this the final frontier of the AI bubble? Then former Trump White House policy advisor Dean Ball tells us what Republicans really think about AI. And finally, it's a history mystery. Professor Mark Humphries is here to talk about how an unidentified new Gemini model offered mind-blowing results on a challenging research problem. It was about Canada.
Kevin Roose
It was not about Canada.
Casey Newton
It was basically about Canada.
Kevin Roose
It was about sugar.
Casey Newton
It was about the sugar trade in Canada.
Kevin Roose
Well, Casey, today we are gonna start by talking about space.
Casey Newton
Finally. The final frontier, some call it.
Kevin Roose
Yes. Because I have been looking into this story that I have become obsessed with, which is that we are gonna build freaking data centers and put em in space.
Casey Newton
I'm very excited to talk to you about this. I would say I have sort of been skimming the headlines, so I have a lot of questions for you about this. But I think whenever we can start an episode in space, that is a great place to start. Because I don't know if you've looked lately, but who wants to be on planet Earth right now? Okay, I'd like an alternative, I'll say that much.
Kevin Roose
Yes. So this has been a thing that has been quietly percolating in the tech industry. Obviously we have this giant data center build out going on here on Earth. Every company wants to build these giant data centers, fill them with these GPUs, use them to train their AI models and do things like that. As you may have noticed, it is not easy to build data centers here on Earth.
Casey Newton
No, I've tried. I got nowhere. I mean, I felt like I was building IKEA furniture. It's like, you want me to do what?
Kevin Roose
And you need land, you need permits, you need energy to power the data center. You need to do all of this relatively quickly. And people sometimes get mad when you try to put up a data center where they live. Also, we are facing an energy crunch for these data centers. There is literally like not enough capacity on our terrestrial energy grid to power everything. That may get worse as people demand more and more AI and the growth continues exponentially.
Casey Newton
Yes.
Kevin Roose
So a couple companies, including just recently Google, have now announced that they are exploring a data center in space.
Casey Newton
Which sounds like a joke when you say it. Like, building anything in space seems so impractical, so expensive, so doomed to failure that it truly does just sound like a joke. But what you're saying to me right now, Kevin, is that there is a legitimate, serious plan to try to do this.
Kevin Roose
Yes. I also thought this was like some kind of crazy science fiction moonshot thing. And it is like an experimental thing. No one is doing this like today. But Google has put out a paper on what it calls Project Suncatcher.
Casey Newton
Yes, Suncatcher, which sounds like a lost Led Zeppelin single, but is somehow a project to build data centers in space.
Kevin Roose
Yes. So they're calling this a moonshot. They're saying, you know, this might not happen for several more years, but this is an active area of research for them. There are a couple other companies that have been doing this. Jeff Bezos, Eric Schmidt, other sort of big tech folks are really interested in this idea, and I think we should talk about it today just to kind of give people a sense of what the future may hold if we continue to demand all of this power and all of these data centers to run these giant AI models.
Casey Newton
Yes, I think it is so worth talking about because, among other things, it indicates that we are at the stage of this bubble where people have come to feel like we cannot provide enough electricity for the future we want to build on the planet where we live. We actually have to get off the planet to realize our ambitions. So if nothing else, that just tells you how ambitious these companies are getting and the crazy big swings that they're about to take.
Kevin Roose
Totally. So yeah. Where should we begin?
Casey Newton
Well, let's talk about Project Suncatcher first. What exactly is Google proposing to do and what did it say about it last week?
Kevin Roose
So this was a blog post and a paper that came out last week. They are calling this a future space-based, highly scalable AI infrastructure system design. And basically they have started doing some testing to figure out if a space-based data center would actually be possible. And the problem that they're trying to solve here is twofold. One, as we mentioned, it's very hard to build stuff here on Earth. You need all the permits and approvals and energy. The second is, like, the sun is a really frickin' good source of energy. Right? It emits something like 100 trillion times as much energy as the entire output of humanity. But building solar panels on Earth has some issues. Mainly the sun sets for half the day, so you can only get power for half the day.
Casey Newton
Which has long been one of people's primary criticisms of the sun.
Kevin Roose
Yes, but if you put the solar panels and the data centers into low Earth orbit and you put them on something called the dawn-dusk orbit path, which I did not just look up this week, I definitely knew what that was from my high school astronomy class, you can effectively give them nearly constant sunlight and the solar panels can be much more productive, up to eight times as productive as solar panels here on Earth.
Casey Newton
So let me ask you this, because when you say data center, I picture one of these like giant anonymous, you know, office complexes that's like the size of six, you know, football fields that they're, you know, building all over the heartland right now. I assume that they are not going to build something like that in space.
Kevin Roose
No. There's another company called Starcloud, a startup that's got some funding from Nvidia, and if you look at the mock-up that they have made, it kind of looks like a giant bird, but the wings are these very thin solar panels, these sort of arrays of solar panels. And the center of it is these clusters of computers, essentially. And it's just kind of out there orbiting in space. And the wings are catching all of the sun and they're feeding that energy into the computers at the center of the cluster.
Casey Newton
Got it. So we're in one of these giant terrifying bird like structures that are sort of swarming over the Earth in this future and they're getting so much energy from the sun and it's so efficient and that is sort of driving all of the compute that's happening inside the computers. How does whatever is happening inside the giant terrifying bird get back to us down here on Earth in a timely fashion?
Kevin Roose
That's a great question. And I asked this of a couple people I talked to over the past week or so who've been working on this stuff, and what they told me is this is actually not that much different from something like Starlink, right? You're sending data from a satellite or a series of satellites back to Earth. It's not that far away, right? It's not like these are light years away. It might take, you know, a couple more milliseconds than it would take to transmit something here on Earth. And that is something we know how to do.
Casey Newton
Got it. Okay, Kevin, so last week Google puts out a blog post about this. Give us a sense of where they are in this experiment.
Kevin Roose
So I would say they feel like they are pretty early in this process. There are still some technical barriers to overcome and we can talk about those. But they have started actually running tests to figure out things like, well, if we send our TPUs, our AI training chips, out into space, will they just sort of fall apart because of all the radiation out there? And they actually did an experiment that they described in this paper where they took just a normal TPU, like the kind that they would put in their data centers here on Earth, and they took it to a lab and they hit it with a proton beam that was supposed to simulate the very intense kind of radiation that these chips would experience if they were floating out in space. And they found that their newer TPUs actually withstood radiation much better than they thought. So these things can apparently handle radiation well beyond what's expected of them in a five-year mission.
Casey Newton
Hmm.
Casey Newton
Now, if you watched The Fantastic Four: First Steps earlier this year, you know that cosmic radiation is what transformed the Richards family and Ben Grimm into the Fantastic Four. Has Google addressed any of those concerns at all?
Kevin Roose
They did not address that, to my knowledge. They did address some other potential hurdles. One of them is, like, if these chips glitch out or break, how do you fix them if they're in space? And I asked a couple people who have worked on similar projects and they basically said, yeah, we've got to figure out how to get robots up there to fix the data centers.
Casey Newton
Got it. So they'll focus on using robots for that. I guess that makes sense. Now, am I right that Google is actually planning to do some kind of like test launch within the next couple years on this?
Kevin Roose
Yeah, they are planning to test this in 2027 by launching two prototype satellites in partnership with Planet, which is a company that sends up these little tiny satellites into space for mapping and things. That is their plan. There are also other companies, including Starcloud, which is also planning to send up some prototypes pretty soon. So they are moving forward with testing on this. I will say I think this is probably not going to happen in any real way for at least a couple of years, in part because things are still very expensive to send up into space. It is not right now economically feasible to send a whole bunch of chips and a whole bunch of satellites up into space. It costs many times more than what you would need to build a comparable data center here on Earth.
Casey Newton
Yeah. And people here on Earth are saying that building the data centers that we're building here on Earth is not economically feasible. Right? So I can't imagine how much more out of control the costs are going to be once you leave orbit. One thing I thought was interesting in the Google blog post was that the company tried to place Suncatcher in the lineage of self-driving cars, which is now Waymo, and quantum computing, which hasn't quite become a mainstream technology yet but has made a lot of strides just within the past year. We did an episode on it not all that long ago. And they're sort of saying Suncatcher is kind of one of those, where we are willing to work on this for 8, 10, 12, 15 years to make it into a mainstream technology. And so I took that as Google saying, like, hey, this is not just some crazy little experiment that a couple engineers are working on in their spare time. Like, it seems like they're serious about this.
Kevin Roose
I think they're serious about this. And I think they are looking out to a future, you know, 5, 10, 15 years away where the demand for AI and AI-related tasks is just essentially infinite. Right? It's like this is not something that 10% of people are using every day. This is something that 100% of people are using constantly, and there are entire companies or sectors of the economy that have been fully turned over to AI. And maybe that happens and maybe it doesn't. But if it does happen, we're going to need a lot of energy and a lot of data centers, and we may run out of land and power here on Earth.
Casey Newton
Now, something that I did not realize until after I had read about Suncatcher is just how many other companies are looking at doing the same thing. Can you kind of give me a high level overview of like who else is playing here and does it seem like anyone else is further along than Google is right now?
Kevin Roose
Yes. So as I mentioned, there's this company, Starcloud, which is a Y Combinator startup that got some funding from Nvidia. They are sort of the main ones here doing this. There's also a company called Axiom Space that is doing this. And we think that there are some Chinese companies or at least one Chinese effort to do a space-based data center, although they've been a little bit vague about the details there. And then The Information had an article about some comments that Eric Schmidt and Jeff Bezos have made suggesting that maybe they are also interested in or looking at doing something like this.
Casey Newton
Well, you know, Jeff Bezos just put Lauren Sanchez into space, so you have to wonder if that was kind of a first step towards something in this vein.
Kevin Roose
Yes.
Casey Newton
You know, one thing I think is interesting about this approach, Kevin, is that, as you know, we've seen an increasing amount of resistance from people in local communities to having data centers put in their towns or near their towns. They're worried about how it's going to affect the cost of energy for them. Right? They're worried about water usage or the environmental impact. And so I think that, you know, if this sort of thing comes to pass, we'll have gone from the NIMBYs saying not in my backyard to this new group of people that I'm calling the NOMPs, who are saying, not on my planet, you know, and they want all the data centers just built up in the sky. So do you think NOMPs are going to become a sort of major political force?
Kevin Roose
I do. Although I also think that eventually people may sort of start to not want them in space either, but it's going to be harder for them to protest. You got to get in a rocket, go up there into low Earth orbit. It's very inconvenient now.
Casey Newton
Why wouldn't people want them in space?
Kevin Roose
Well, there are various people who think that this is going to create a lot of, like, space debris and things like that, which would eventually be bad. I talked to some folks who, you know, work on this stuff, and they don't think that's really going to be a big deal. There's all kinds of stuff up in space now. We generally don't pay much attention to it. But I can see this sort of sounding to people like Elon Musk, you know, proposing to build colonies on Mars or something. Like, it's too futuristic. It's too sci-fi. And it sounds like these very, you know, rich companies and individuals trying to kind of flee from their problems here on Earth by sending stuff into space.
Casey Newton
Here's what I would say. I would love to be living at a time when one of the top 10 concerns I had in my life was space debris. If I ever get there, Kevin, I will be in heaven.
Kevin Roose
Well, you'll be in low Earth orbit.
Casey Newton
Exactly.
Kevin Roose
Now, I have a question for you.
Casey Newton
Yeah, yeah.
Kevin Roose
Would you go to space?
Casey Newton
Yes, absolutely.
Kevin Roose
Would you go to space to fix a data center?
Casey Newton
I mean, what is the salary for that job?
Kevin Roose
Very high.
Casey Newton
I mean, there's probably a certain price for which I would do it. But here's the thing, you know, I'm not handy around the house. Yeah. It's like, if ChatGPT doesn't know what to do, I'm calling the handyman. Okay?
Kevin Roose
I will just say, yeah, that I think we should make an offer to Google, which is, if you guys get this Project Suncatcher up into low Earth orbit, we will do a podcast episode where we go up there and cut the ribbon.
Casey Newton
You were just dying to be exposed to massive levels of solar radiation.
Kevin Roose
You know, I just think it'd be fun.
Casey Newton
When we come back, the ball is in our court. Dean Ball talks about how he crafted the AI action plan.
Kevin Roose
Well, Casey, recently we've been talking about some state-level AI regulations that have been passed and signed into law. But today we're gonna have a discussion about national AI policy.
Casey Newton
Yeah, I think that the states have been acting because the federal government has not really passed any legislation related to AI just yet. And that's left us with a lot of questions around how the administration has been thinking about AI.
Kevin Roose
It's been a little confusing, I think, especially, you know, in this administration. It has not been particularly clear to me what President Trump and his allies believe about things like whether we are headed towards some kind of an AGI moment or how the federal government should try to protect against some of the risks of very powerful AI systems. So the conversation that we're going to have today I think will help us answer some of these questions and just kind of get a better sense of like what is happening in Washington, especially on the right, when it comes to AI and AI policy. Yeah, so earlier this year, Dean Ball spent several months working as the White House's senior policy advisor for artificial intelligence and emerging technology. He was brought into the White House in order to lead the drafting of the White House's AI action plan. And in that role in the White House, Dean not only got to see how the AI policy sausage was made at the highest levels of government, he actually got to make the sausage himself. He was sort of responsible for taking all these different ideas from the various parts of government and putting them together into a document that would represent the administration's sort of official view on AI.
Casey Newton
Yeah. And while he was there, Dean also got a good sense of who are the various factions on the right when it comes to AI policy? What do they believe? What are the competing incentives? Who has whose ear? And I think if you want to understand the likely path forward for AI regulation over the next few years, that's a really important part of the conversation.
Kevin Roose
Yeah. So Dean left the White House in August after the AI Action Plan was released. And since then he's become a senior fellow at the Foundation for American Innovation and the author of Hyperdimensional, a newsletter about AI and policy.
Casey Newton
And because we're going to be spending a lot of time in this segment talking about AI, let's do our disclosures.
Kevin Roose
I work for The New York Times, which is suing OpenAI and Microsoft over alleged copyright violations.
Casey Newton
And my boyfriend works at Anthropic.
Kevin Roose
Let's bring him in. Dean Ball, welcome to Hard Fork.
Dean Ball
Thank you both for having me. It's so good to be here.
Kevin Roose
So how did you end up at the White House earlier this year working on AI policy? What was your background before that?
Dean Ball
I was a think tanker. A lot of it was not tech policy. A lot of what I did was state and local policy. But I was always very interested in tech. And basically when the AI policy conversation really took off, sort of early 2023, I made the decision to start writing about AI basically as a part-time gig, just purely on the side. I wasn't being paid for it or anything. And then eventually I decided I really liked it and I was finding my voice, and I was hired by the Mercatus Center at George Mason University to go spend some time there. I spent about a year there and then was recruited to the White House on the basis of primarily my writing on Substack. And my Substack is called Hyperdimensional. It's where I talk about AI stuff.
Kevin Roose
The Substack-to-White-House pipeline. I feel like you are not the only person who has posted their way into a job in the federal government.
Dean Ball
You can post your way to the federal government. It's really true. And a big chunk of it was probably my posts on X, really, which is maybe even more scary. But yeah.
Kevin Roose
So, okay, you get this call, you go to the White House. What did you find there with respect to AI policy? Was there like a coherent single view of how AI should be governed and regulated?
Dean Ball
I would say there are coherent intuitions, but the field is so nascent, and there haven't been a lot of fights where dividing lines have really firmed up yet. I think, by the way, this is true on the left as well. I don't think that those intuitions have formed yet into a lot of different, very specific policy positions. I don't think they've concretized yet, is really what I'm saying. I think, though, there's a combination of excitement and some worry and some confusion, probably in equal parts, which is, you know, in a macro sense probably roughly where I am too, actually. And that sounds about right to me.
Casey Newton
You say there were some coherent intuitions about AI in the administration. What were those intuitions?
Dean Ball
I think coherent intuition number one is AI is the most important technological, economic, scientific opportunity that this country and probably the world at large has seen in decades and quite possibly ever. I think basically everyone shares the assessment that this is gonna be extremely powerful and it's gonna be really important. And the second intuition that directly follows is there are gonna be some risks associated with this that are sort of familiar to us, things that are cognizable under existing policy frameworks, and others which might be more alien and might be risks that we don't really even have concepts for as clearly yet. And then maybe the third intuition is, regardless of those risks, it feels like AI is going to play a very big role in the future of American global leadership.
Kevin Roose
Yeah, that's really helpful and kind of helps me get a sense of the lay of the land when you arrived. I'm wondering if you can help me understand the intra-right factions when it comes to AI, because I think I've identified at least two different views of AI that I've heard coming from prominent Republicans, and maybe you could call them the David Sacks view and the Steve Bannon view. David Sacks, the President's AI czar, is constantly talking online and on his podcast about, you know, these AI doomers who he thinks are sort of ridiculous and are overhyping the risks of AI, calling them woke, implying that they're trumping up, no pun intended, these fears of job loss and things like that to get their way when it comes to policy. Then there's Steve Bannon, who has been, you know, out there talking about the existential risks from AI. And you and I were both at this Curve conference, actually, all three of us were there a few weeks ago, where one of Steve Bannon's guys was there and gave this very fascinating talk about how he thought he was sort of in league with the so-called doomers who believe that this could all go very badly very soon. Are there more views on the right than those two? Are those sort of the primary camps?
Dean Ball
No, I think that there's a whole spectrum. I can't speak for either David or Steve, of course, but I would put them at roughly polar opposites in terms of how conservatives talk about this issue. But I think there's a whole spectrum in between. So first of all, you've got national security people. You have national security people who don't actually know a ton about AI, and this is, again, on both sides. They just think of this as a strategic technology that's important for US competition with China and other things. And also maybe they think there are some national security risks, but they're not really thinking about the domestic policy. They're not really thinking about regulation. They're not thinking EA versus doomer. So that would be one. I think also, you know, related to the sort of Bannon viewpoint, but maybe more toward the middle, would be people that are worried about kids' safety primarily. There are a lot of conservatives who would distance themselves from the AI doomer view, but who would also distance themselves from the pure accelerationist view. And they would use the lessons we've had with social media as an example. So that's sort of the kids' safety viewpoint. For these people, very often the issues are things like LLM psychosis and, of course, teen suicidality with chatbots, which is another very salient issue for this group, and for everyone, I hope. But yeah, so there are others in between, and I guess I would put myself somewhere in kind of the middle, in a weird fusion.
Casey Newton
Where does industry fit into that spectrum? Like, my sense from the outside is that industry groups and lobbyists have had a lot of success in this administration in getting what they want. Where are they in those conversations?
Dean Ball
I think it really depends on incentives. People in policy conversations very often will refer to industry as being this kind of monolithic, coherent entity. It's of course not. And there are different people that have different incentives. So if you're a US hyperscaler, you don't hate the export controls. You don't want more competition for the same chips that you're trying...
Kevin Roose
Meaning like a Microsoft or Google or an Amazon.
Dean Ball
Yes, Microsoft, Google, Amazon Web Services, etc. You don't hate that because, A, you don't want Chinese firms competing for your chips, but even if it's not the same chips you're competing over, you don't want to be implicitly competing over space at TSMC fabs to make the chips. So hyperscalers will definitely have nuanced positions on export controls, but by and large their incentives are not to hate them, and they largely don't. Frontier labs, I mean, they want to make money selling tokens to people, so they want access to chips. But I think there are some people who believe, and from a political theory perspective it's not wrong to believe, that ultimately they want to create moats. And I think there are a lot of ways you can make moats. It seems to me like the main way they're trying to make moats right now is through infrastructure. That's what they've basically all come to. Anthropic today announced a $50 billion commitment to build their own data centers. Google obviously does this. OpenAI does this through Stargate. Meta does this. xAI does this. Everyone does this. Everyone's building infrastructure. And the basic view is, well, the models maybe are not your moat per se. The parameters of the model are not your moat, but perhaps the infrastructure is. And so, you know, these are all competing interests. And no one's making illegitimate arguments here. Everyone's operating from incentives. And of course, the job of government is to sort of solve for the equilibrium.
Kevin Roose
Dean, is there. Is there a MAGA view of AGI?
Dean Ball
Not yet, no, not really. I don't know that there's any political persuasion view of AGI. I think MAGA might actually be the closest to having one. And I think it's at the moment, maybe the persuasion, at least from what I see online, is like, maybe it's sort of more doomery.
Casey Newton
I believe we saw a bipartisan bill introduced over the past week that would require reports of job losses due to automation, which suggests that there is some increasing attention to that likelihood.
Dean Ball
Yeah, well, I mean, so there's this big question in the AI field. At places like the Curve, in places like Lighthaven, there are these gatherings of various doyens of the AI community, and they get together, and the main question that people talk about is, when are the pitchforks going to be out for this technology? And what is gonna cause the pitchforks to come out? And I have come to the conclusion that rather than it being a singular issue, it's going to be this kind of miasma of issues. It's gonna be like, you know, it's sloppification, it's not safe for kids, it's driving up your electricity prices, it's using all the water, and it's taking your job. And also it's going to kill everyone. And also, by the way, it's fake. It'll be all those things in kind of this weird sort of vichyssoise.
Casey Newton
The aspect of the AI Action Plan that I find the most annoying is the attention on the ideology of the chatbots and the suggestion, you know, that they should be able to respond in some ways but not in other ways. Can you kind of illuminate the discussions that were being had and what the administration actually wants out of these models?
Dean Ball
Yeah. So I think the main point here, first of all, the most important thing: what you're talking about is the woke AI executive order, as it's traditionally phrased. This is an executive order that deals with federal procurement policy. In other words, it is not a regulation on the versions of AI models that a company like Anthropic or OpenAI or any other company ships to consumers or private businesses. This is purely about the versions of their models that they ship to the government. And the government is saying, in this case, we do not want to procure models which have top-down ideological biases engineered into them. We would like our government employees to have access to models which are, you know, I think objective is a really hard word. Obviously we've been debating about, like, what is truth, you know, since there was language. Right. So I don't think we're going to resolve that. I have a feeling the General Services Administration guidelines will not resolve that issue. You know, I think it's folly to even try, and I think the executive order doesn't try. The executive order steers clear of doing so. The executive order says instead, we just don't want you as the developer imposing some sort of worldview on top of the model.
Casey Newton
Mm, well, good luck with that, I guess.
Kevin Roose
Well, I want to ask one follow-up on that, because my sense is that, you know, the Trump administration and Republicans in Congress have been very upset with how the Biden administration sort of jawboned, how they applied pressure to social media companies to take down, you know, misinformation, or what they considered misinformation, about the COVID vaccines or things like that. That was seen as, like, very inappropriate. In fact, there are, like, ongoing investigations of the contacts between the Biden White House and the social media companies over this issue.
Dean Ball
Yes.
Kevin Roose
And then we turn around and we see this, like, Woke AI executive order, where it's like, I understand the subtle point you're making about, you know, this is not regulating the models that the companies are releasing to the public. It's just the ones that they're selling to the government. But, like, we all know that there's.
Casey Newton
There.
Kevin Roose
There's one set of models, right? And they get built and they get sold to various customers. And I think, you know, it's reasonable to see that and think, okay, this is the Trump administration doing exactly what it got so mad at the Biden administration for doing, which is to contact the tech companies and tell them, hey, this is how your product should be working. This is the kind of thing it should be allowing and not allowing. And I don't know, does that seem at all hypocritical to you?
Dean Ball
Well, so look, I think that there is an inherent tension here, and this is a tension that has existed on the right, and it's particularly existed sort of post-Trump 45, post President Trump's first term. There is this argument that exists of, should we stick to our principles that the government shouldn't be doing this kind of jawboning, or should we accept that the government has this power and now we need to throw it back at the left? Right. I can tell you that I personally have always definitively been on one side of that argument, which is, in my view, we should stick to principles. We should not fight. We should not.
Kevin Roose
No jawboning from anyone.
Dean Ball
Yeah, you shouldn't do that. I mean, like, you know, you shouldn't do that. At the same time, I think the government totally has a right to say, and again, what we're talking about here, I wouldn't think of this as like a model training thing. I would think of this as the sort of thing that can be relatively, like, trivially easily changed by the developer. Right. So models that are sold to the government already have compliance burdens that are significantly higher than this executive order. Right. They have to comply with the Freedom of Information Act. They have to comply with the Presidential Records Act if they're sold to the White House. There's all sorts of data stewardship laws that are way more difficult than anything in the woke AI executive order. The woke AI executive order basically says, like, you need to disclose in the procurement process to the agency from whom you're procuring. You need to disclose, like, what the system prompt is. You can change a system prompt for a specific customer. It's not that hard. And I would only point out, I will just say it here right now, that if you did try to use federal law to compel a developer to change the way they train the models that they serve to the public, that is unambiguously unconstitutional. It is a violation of the First Amendment. You are violating that company's speech rights and you are violating the speech rights of the American citizens who might use that model. So it would be quite dire and grave for the government to do that. And I am confident that the woke AI executive order was not intended to do that.
Casey Newton
So, Dean, I really enjoy your newsletter. I've been reading it since before you joined the government. I continue to read it today. One point of view that you advocate for with great frequency is that most, if not all AI regulation should be done at the federal level. And you spend a lot of very valuable time looking into how states are attempting to regulate AI in ways that I think you believe are mostly bad. Could you kind of give us a high level overview of your interest in this subject and what you see states doing that concerns you so much?
Dean Ball
Yeah, so I come from a state and local policy background, I should say. And so my view is that a lot of the real governance in this country happens at the state and local level. And now that I live in DC, I mostly say, thank God that that's the case. That being said, there are some things that inherently implicate interstate commerce. And I think that for models which are trained to be served to the entire world, which cost a billion dollars to train, the standards by which those models are trained and evaluated and measured have to be federal standards, because you can't have competing standards. Now, maybe we don't end up having competing standards. Maybe what happens is the biggest state regulates. And that happens all the time in America. There's many, many technologies where the state of California or the state of New York or somewhere like that, Texas sometimes, has an implicitly federal effect. One state doing the lawmaking, I think that's a failure mode. I think it's a structural issue of our Constitution that the founders couldn't really possibly have contemplated, because the notion of economies of scale didn't quite exist for them. And so I think it's a really, really difficult issue of Supreme Court jurisprudence. Right now, it's the case that California, by default, is the central regulator of AI in America. Thus far, I think they've done a better job than I would have guessed, but still not a great job. So I was broadly supportive of their flagship AI bill from this year, which was called SB 53. It is a transparency bill that applies only to the largest developers of AI models. And to me, it seems rather reasonable overall.
Casey Newton
Let me bring it back to maybe some more contemporary AI concerns, though, which is, earlier when you were describing some of the kind of, you know, landscape in Washington and who's concerned about what, you mentioned there's this group of Republicans who are very concerned about chatbot psychosis, child safety, teen suicidality. Those are all harms that are present today that seem to be encouraged on some level by products that are out on the market. And we have a Congress that is very loath to pass really any regulation at all when it comes to the tech industry, whether that's for ideological reasons or just logistically. It's very difficult to get Republicans and Democrats to agree.
Kevin Roose
Or the government's shut down half the time.
Casey Newton
That's also been increasingly an issue. And so in such a world, I can very much understand the point of view of a state lawmaker who says, well, I don't want the kids in my state to kill themselves. Like, we're going to do something about this right now, and we're not as dysfunctional as the federal government, so we're going to get in there and we're going to try to do something. So how do you view that dynamic and is your desire truly that the states would just say, hey, we're not going to get involved, and that's on Congress?
Dean Ball
No. So, I mean, look, I understand the incentives of the state lawmakers, like, for sure. I think Congress needs to act. Like, my, my view is more proactive. My view is like, Congress needs to deal with this. This is a problem that Congress needs to deal with. I don't blame the state lawmakers. I blame. Sometimes I do. Sometimes I blame them for poor statute drafting. There's no excuse for that. Right. Their job, like, and I say this sometimes to legislators and they're like, well, we'll let the courts figure that out. And I say, no. You took an oath to the Constitution too. Not just the judges, but in the general case of like, I want to protect kids in my state. No, of course I don't blame them for that.
Kevin Roose
Yeah. I want to zoom out a little bit and ask a question about AI and polarization. It feels to me right now like AI is kind of in this weird, confusing, pre-polarized state. Like, there's this sort of machine where, when an issue gets important enough or salient enough to enough people, it kind of gets run through the polarization machine, and it comes out the other side and Republicans take one position and Democrats take another. Do you think something similar is going to happen with AI, where it will become very predictable which view you hold on AI based on which party you vote for?
Dean Ball
I think what's more likely is that over time it splinters and there's, like, different things that people talk about. So there's going to be data centers, and there's going to be, you know, China competition. That'll be an issue. And there'll be, like, the software side regulation, there'll be the kids issues. Just like today, you know, we don't talk about computer policy or Internet policy. Internet policy used to be a thing in the '90s. But now it's like social media, privacy, whatever else. I think it'll splinter in that way. Will those issues themselves be polarized? Yeah, I mean, in some ways they will be. I do hope, though, and this is a very important part of the action plan in my view too, that not every single aspect of an issue has to be polarized. There are legitimate tail risk type events, national security issues, that I think it is the obligation of the federal government to deal with in a mature and responsible way. I've heard Ezra Klein before, I love this turn of phrase of his.
Kevin Roose
Who I've never heard of.
Casey Newton
Yeah, we're not familiar with his work.
Kevin Roose
Yeah.
Dean Ball
I've heard him describe government as a grand enterprise in risk management. I think that's true in a fundamental sense. I think that's very true. And so there are certain things that we just do need to deal with. And the action plan tries to make some incremental progress on some of those things. And of course there's a lot of things we need to do to embrace the technology and let it grow and all that too. And I think that's an important part as well. But that's less controversial to say as a Republican. I think the maybe more controversial thing right now to say is like, yeah, there are legitimate risks, and I hope that dealing with those risks can be bipartisan. Because really, if we can't deal with catastrophic tail risks, then we do not have a legitimate government. The whole point of government is to deal with this issue, and we should, just as Michael Dell said about Apple in the '90s, throw the thing out and return the money to the shareholders if we can't manage these things. I really do believe that.
Casey Newton
So let's talk about that point specifically. When I look at AI policy in America today, I mostly see the big frontier labs getting just about everything they want. Right. Like, it seems like there is a high degree of alignment between the labs and the government. And when it comes to, like, safety restrictions, for example, I don't see a lot that is holding them back from, you know, building their next two or three frontier models. So there are components of the AI Action Plan that are meant to address some of those catastrophic risks that you mentioned. Tell us how you envision that actually working. Where is the moment where the industry stops getting everything that it wants?
Dean Ball
Well, I would say there's so much you can say here. I think the first thing is that many of the people who work at the frontier labs, I can't speak for the labs, of course, but knowing a lot of them personally, including up to very senior levels, I can say that they have an earnest desire to deal with these problems, and they invest real resources as companies. And part of the reason they do that is because they have incentives, because their companies would be bankrupt if they caused a pandemic. Right. And the other thing is that a lot of these problems are super tractable. Like, we don't have to act as though these things are the hardest problems we've ever dealt with. To me, as someone with experience in public policy, and by the way, this is the posture of people that I met in government who are 30-year veterans of thinking about tail risks. To them, you bring up, like, AI biorisk or AI cyber risk, and they're like, yeah, sounds like a serious risk. Okay, there's a hurricane that's tracking toward Florida. Let me go deal with that. Right. Like, these things come across your desk every day when you're in government. These are eminently tractable problems in the near term with current technology and technology that I think we're going to have in the near future, without spending a ton of money. There's a lot of traction you can get on them that doesn't involve, really in any meaningful way, slowing down AI development. I want to push back on the idea that there's this trade-off between sort of mitigating tail risks and slowing down AI development. Now, will that always be the case? No. At some point there will be trade-offs. We'll have to make those trade-offs and they'll be hard. And it's hard for me to know where I'll come down on that because it'll depend on the particulars. But right now we have this great opportunity of like, oh, we can accelerate AI development and we can also have better biosecurity, which, by the way, was a problem before ChatGPT existed. There was a whole pandemic about it.
Kevin Roose
So, like, yeah, sometimes I talk to people who work on AI policy or just, you know, work on AI and think about policy, and they'll say things like, you know, I don't think we're going to get any meaningful AI regulation until there's a catastrophe. Do you, Dean, think that it will take something like that to really catalyze significant movement on AI policy in Congress?
Dean Ball
Possibly. I mean, like, I can't say that I like that. Certainly a catastrophe is plausible and could catalyze movement in Congress, for sure. I think there are other ways to achieve this. I really do. Like, I think you can make incremental advancements in the absence of a catastrophe. Now, a lot of people in the AI safety community will say this, or people that are at labs who care about AI safety, they will say this. That's like a very Anthropic type of position. And I don't say that as a pejorative, by the way.
Kevin Roose
To be totally transparent, like, I've heard this from people at lots of different labs where they're sort of like, yeah, I don't really think, like, we're capable of. And it's not so much a knock on, like, this particular Congress or anything. It's just like, I don't think the government is capable of regulating things in advance.
Dean Ball
I am okay with government being in a mostly reactive posture, particularly with respect to things that aren't tail risks. Tail risks are the one exception because, you know, those things can be very, very damaging. And so you want to do some stuff in advance to mitigate that. But when it comes to, like, most other harms from AI, I'm comfortable with government just really reacting to realized harms in areas where it's like, okay, well, it's a realized harm that we've seen. We think that's going to continue happening. It doesn't appear to be resolved adequately by the existing system of common law liability that allows people harmed to sue the people who harmed them. And it can be meaningfully addressed through a targeted law. And if all those conditions are satisfied, then we should totally pass that law. I think kid safety is in this category.
Kevin Roose
Yeah. Yeah. Well, Dean, thanks so much for coming. Really fascinating conversation, and people should check out your writing. Your website is Hyperdimensional.
Dean Ball
It was. It was a real pleasure, guys. Thank you.
Casey Newton
Thank you.
Kevin Roose
Thanks, Dean.
Casey Newton
When we come back, we'll have more to say about the Canadian fur trade than we've ever said before.
Kevin Roose
It was not the Canadian fur trade. It was the upstate New York sugar trade.
Casey Newton
They're related in ways I don't understand.
Kevin Roose
Well, Scooby gang, it's time to get in the old mystery machine. Because today we've got a mystery.
Casey Newton
That's right, gumshoes. Grab your notebook and your magnifying glass because there are a few clues and we're about to crack the case wide open.
Kevin Roose
And this one is a history mystery. It involves an experiment that a historian ran using an AI model. And we're going to talk about it all with the historian in just a second. But Casey, to set the scene here a little bit, there are a lot of rumors going around right now about this new Google Gemini 3 model.
Casey Newton
There really are. Gemini 2 came out almost exactly a year ago, came out last December. And while Google has updated it throughout the year. We have been hearing an increasing number of whispers this fall about Gemini 3 and rumors that it really is pretty great. So Alex Heath reported a few weeks back that he expected Gemini 3 to come out in December. And one thing that happens in the run up to the release of new models is that companies quietly test them. And that brings us to our story today.
Kevin Roose
Yes. So Mark Humphries is a history professor at Wilfrid Laurier University in Ontario, Canada. He does research involving a lot of old documents and trying to decipher the handwriting on these documents. And he is also kind of an AI early adopter. He's got a Substack called Generative History where he's been writing about his experiments using AI to solve some of his research problems. And recently he had a post that really caught our attention called "Has Google Quietly Solved Two of AI's Oldest Problems?" in which he explained a really fascinating experiment that he ran using one of these kind of test models inside Google's AI Studio, which is a Google product where you can kind of experiment with different models. And he says that the responses that he got back from this mystery model made the hair on the back of his neck stand up. Like, this was so astounding to him, not just because they were very good, but because they seemed like a different kind of capability than ones he had seen in any other AI model.
Casey Newton
Yeah. And so the mystery is what model was Mark using? But I think the bigger story is what does it mean that this historian was as impressed as he was with this very unusual thing that he found a large language model doing?
Kevin Roose
Yes. And we should say, like, it is very hard to determine exactly which model anyone is sort of being shown at any given time. The way these pre-release tests go, companies will, you know, show 1% of users one model and another 1% of users a different model and kind of ask them to compare the two.
Casey Newton
And they give them weird code names. They don't tell you what you're using.
Kevin Roose
Exactly. So there's still some uncertainty around this. This may have just kind of been a one-off. We will obviously need to see what Gemini 3 actually does when it comes out. But for now, I think this is a very interesting story because it points to the way that these AI models are starting to do things that surprise even experts in their fields.
Casey Newton
Yes. And so for those reasons, it's time to bring in Mark Humphries and talk about what he found. Kevin, you know the difference between an American and a Canadian historian?
Kevin Roose
What's that?
Casey Newton
Canadian historians process data while American historians process data.
Kevin Roose
Is that true?
Casey Newton
Yeah, that's true.
Kevin Roose
Well, let's talk to Mark, and he can pronounce it however he wants.
Casey Newton
Hell yeah, brother.
Kevin Roose
Mark Humphries, welcome to Hard Fork.
Mark Humphries
Thanks for having me.
Kevin Roose
Where are we catching you today? Are you up in Canada? What's going on up there?
Mark Humphries
I am. I'm in Waterloo, Ontario, in Canada, in my office at Wilfrid Laurier University.
Casey Newton
So Waterloo. So you must just be surrounded by AI computer scientists at all times.
Mark Humphries
There are a lot of startups and a lot of AI researchers and a lot of computer companies in Waterloo, yes.
Kevin Roose
Home of the BlackBerry.
Casey Newton
That's right.
Mark Humphries
That's right, yes. RIM Park.
Kevin Roose
So before we get into the specifics of your most recent brush with this new mystery AI model, can you just tell us, like, how you've been using AI in your history research over the last year or so?
Mark Humphries
Sure. So my research partner and I, Leanne Leddy, whose lab this all kind of comes out of as well, have been working on trying to develop ways of processing huge amounts of data, mostly handwritten, related to the fur trade. And that involves a couple of things. It involves trying to recognize the handwriting accurately, but it also involves trying to basically generate metadata for all of, you know, tens of thousands of records, to try and understand what's in those records and make connections between them. So we're kind of operating at tasks that are just at the threshold of what AI models are capable of doing. So it's been kind of interesting to watch, over the last couple of years, the models get better and become capable of doing some of these things, and then finding out new limitations as we go along.
Casey Newton
And tell us a little bit about the kind of work that you do. In general, I know you're really focused on using older documents in your work. What kind of stories are you trying to put together?
Mark Humphries
Yeah, so I've always been really interested in stories of ordinary people. So in the fur trade, when you're trying to understand what happened to ordinary people in the 18th and the 19th centuries, the problem is many of them were illiterate, didn't write. And although they kind of appear in a lot of documents that are generated in the course of living, these are marriage and death records, account books, stuff like that. It's a lot of detective work. It's a lot of trying to piece together stories from fragmented documents: what somebody bought in one place, a contract they signed somewhere else, a baptismal record somewhere else. And so a lot of this is trying to do that, and that's what Dr. Leddy and I have been trying to do with our graduate students, to try and piece together what these stories about ordinary people can tell us in the fur trade and in the western part of North America, from the period of about 1760 through until the early 19th century.
Casey Newton
You know, it's interesting, Kevin, because every time I go to a Starbucks and they try to give me a receipt, I think, I don't need any paperwork about what just happened here. Right. I'm just going to take my mocha and get out of here. But what Mark is saying is that that document could be of huge value to a future historian in understanding our lives.
Kevin Roose
Exactly.
Mark Humphries
Yes.
Kevin Roose
They will want to know.
Casey Newton
Let's get into it.
Kevin Roose
So, Mark, tell us about this, this experience that you had with Gemini, the AI model that you were trying to use for this transcription, basically taking this very old document about the fur trade and plugging it in and saying, transcribe this. Tell me what this says.
Mark Humphries
Yeah. So, you know, I think to understand why this is, or looks like it could be, a fairly significant development, it's important to understand kind of where we've come from in the last two years on this. Right? So when GPT-4 first came out in 2023, it could kind of sort of read handwritten documents. It would be mostly errors, but you could kind of see that it was beginning to be able to do this. And it's been really easy for companies and systems to get up to about 90% accuracy. And then everything above 90% has been pretty difficult. And the problem is that that last 10% is the most important part. Right. So if you're interested in people's names, you're interested in amounts of money, you're interested in where they were, you've got to get that stuff right in order to make it useful. And up till about, I guess, you know, when Gemini 2.5 Pro came out back last spring, we were kind of still in that era, and Gemini 2.5 Pro got up to about 95% accuracy. And that's really good. So what I was interested in is, when we began to see reports on X that there were new models being tested by Google in AI Studio, which is their kind of playground app, I was just curious, how much better would this get?
Kevin Roose
So, okay, you are hearing these rumors that there's this new mystery model inside the AI Studio that Google tests new models in before they're released. What do you do?
Mark Humphries
Yeah. So Dr. Leddy and I have a corpus of 50 different documents that we've been using to benchmark how these models improve over time. They're all documents that we are pretty sure are not in the training data, because we've either taken them ourselves or they've been from sources that are not typically online. And you can't be 100% sure, but it seems to be the case. So I started to put a few of those documents in. And for your listeners who are not maybe aware, the way that the testing of these types of models often works is you kind of have to put in the document dozens of times before you get the hit on the model you're hoping to test, because it kind of randomly pops up. So it's not an easy thing to do. I managed to test about five of our 50 examples, about a thousand words, and the results were impressive, to say the least, in the sense that the error rate again declined by about 50% from where it had been with Gemini 2.5 Pro. And it got to about a 1% word error rate, which means one in every 100 words, obviously, you're getting wrong, but that can include capitalization errors, punctuation, stuff like that. So that in itself is really significant. No models come close to that. Human experts who do transcription for a living offer about a 1% error rate, so that itself is fairly important.
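The word error rate mentioned here can be sketched in a few lines of Python. This is only an illustration of the standard definition (word-level edit distance divided by the number of reference words), not the benchmark code Humphries and Leddy actually used; the function name and the synthetic 100-word example are assumptions made for the sketch. Because words are compared exactly, a capitalization or punctuation slip counts as an error, which matches the caveat in the passage.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance (substitutions, insertions, deletions)
    divided by the number of words in the reference transcription."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcription must not be empty")

    # Classic dynamic-programming edit distance, kept to one row at a time.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical 100-word reference with a single wrong word in the hypothesis.
reference = " ".join(f"w{i}" for i in range(100))
hypothesis = reference.replace("w50", "X")
print(word_error_rate(reference, hypothesis))
```

On a roughly thousand-word sample like the one described, a 1% rate would correspond to about ten such word-level slips.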
Casey Newton
And your sense of having used this new experimental model, did that just come from, you're inputting, you know, dozens and dozens of queries, and every once in a while you would just get a result that was radically better than the others, and you thought, aha, I must be getting the new one? Or were there any other signs about what Google was showing you?
Mark Humphries
Well, it's A/B testing. So what that means is, normally in AI Studio, you put in a query and you get a response, and when you get the A/B test, you get two responses, and it asks you to rate which one's better. Right. And the labs do this in order to get feedback on, you know, is a model actually better on specific types of tasks than other ones? Right. So you might have to do that 20 or 30 times until you get one of those two responses. And then the differences were pretty notable.
Kevin Roose
So you said the overall error rate fell by about 50%, but that was not actually what impressed you the most about this new model. What impressed you the most?
Mark Humphries
Yeah. So first of all, that was, you know, impressive. And then I was curious, okay, if it's gotten to this point, how's it going to do on tabular data? And as historians, one of the things you work with, to go back to your Starbucks example, are receipts and ledgers that come from merchants in the past. And a lot of that's fairly boring. But if you want to know where somebody is, where they bought their coffee one morning, and you want to trace that person's movements, you can use these types of documents to do that. You can see what they bought and all of those types of things. The thing is, to this point, models have been pretty bad on tabular data. It's kept kind of like a cash register receipt system is kept. So it's kind of just on the fly. Nobody's expecting people to necessarily read it down the road, so it's difficult to interpret just by looking at it. It's also sometimes quickly written, so it's even worse handwriting than people are used to. And because it's historical documents, in this case I'm dealing with records from 18th-century New York State, upstate New York, in Albany, those records are written in pounds, shillings and pence. So it's a different base than we're used to using, in which you have basically a different form of currency measurement. And so when I dropped in a page just kind of at random from this ledger, I was just curious to see what I'd get back. And suddenly it not only came back in a near-perfect transcription, which itself was kind of remarkable given how difficult it is to make sense of what's actually on the page, but as I started to go through it, I was looking for errors, I was trying to find errors. And I began to realize that some of the things that I was seeing on there that looked like errors were actually clarifications. And they required the model to do some really interesting things.
Kevin Roose
Give us an example.
Mark Humphries
Sure. So in the actual ledger document, right, what we're dealing with is a series of kind of entries that are made in a daybook. So this is as people come into a store, they're buying things, and it's being recorded just like on a cash register sheet. And in the one case that I was in particular looking at here, what it basically says in one of the entries is, Samuel Slit came in on the 27th of March. And it says, to one loaf of sugar, 145 at 1 4, 0 19 1. And what that means when you actually break it all out is that this guy named Samuel Slit came into the store and he bought one loaf of sugar. If you're not aware, in the 18th century, sugar comes in hard conical shapes and they break off pieces and they sell it to you. And it says 145, sold at 1 shilling, 4 pence per pound. And then the total is 0 pounds, 19 shillings and 1 pence. And this is the old kind of notation, right? And what I saw in the actual model's response, though, was that it had figured out that in fact it was one loaf of sugar measured out at 14 pounds, 5 ounces of sugar, sold at 1 shilling, 4 pence, and then the total, right? And what's significant about that is that in order to figure out that what was written on the page, just the random number 1, 4, 5, was 14 pounds and 5 ounces, the model had to be able to work backwards from a different currency system with a different base. The thing that makes that important is that models shouldn't be able to do that, right? The way these models are trained is basically in pattern recognition. What they're trying to do is they're trying to predict the next token. And so the first problem here is that predicting numbers is actually very difficult for models to do, right? In the sense that the model has no idea whether Samuel Slit is buying 14 pounds, 5 ounces or 13 pounds, 6 ounces. Right? I mean, that's a random number, effectively. It's not probabilistic.
The other problem is that although there would be, you know, a lot of material in the training data that would relate to this kind of old currency system, the reality is there's not that much of it in terms of the actual percentage of material, because there's so little of this out there relative to the overall sum total of all the records that exist. And so when we're thinking about it, the model's having to do some interesting things there. What it looks like to me is it's a form of symbolic reasoning. I have to know in my head that I'm dealing with different units of measurement which don't have a common kind of base to multiply or divide by. And then I have to kind of abstractly realize that these units of measurement are, in fact, comparable as long as we do some conversions, and we then have to, you know, move them around in our heads to figure it out. This is something that I had to think about for a second before I realized that, in fact, the model had done something that was mathematically correct and unexpected.
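The arithmetic Humphries describes can be checked directly. The sketch below is illustrative and not from the episode: it assumes the standard pre-decimal conversions (12 pence to the shilling, 20 shillings to the pound) and 16 ounces to the pound of weight, and the function names are hypothetical. It tests two readings of the bare digits "1 4 5" against the recorded total and then works backwards from the total to the weight.

```python
from fractions import Fraction

# Assumed standard conversions for pre-decimal currency and avoirdupois weight.
PENCE_PER_SHILLING = 12
SHILLINGS_PER_POUND = 20
OUNCES_PER_POUND = 16


def to_pence(pounds, shillings, pence):
    """Collapse a pounds/shillings/pence amount into pence."""
    return (pounds * SHILLINGS_PER_POUND + shillings) * PENCE_PER_SHILLING + pence


def from_pence(total_pence):
    """Split a whole number of pence into (pounds, shillings, pence)."""
    shillings, pence = divmod(total_pence, PENCE_PER_SHILLING)
    pounds, shillings = divmod(shillings, SHILLINGS_PER_POUND)
    return pounds, shillings, pence


def cost_in_pence(weight_lb, weight_oz, price_pence_per_lb):
    """Exact cost of a weighed item, kept as a Fraction to avoid rounding."""
    ounces = weight_lb * OUNCES_PER_POUND + weight_oz
    return Fraction(ounces * price_pence_per_lb, OUNCES_PER_POUND)


def weight_from_total(total_pence, price_pence_per_lb):
    """Work backwards from a total to (pounds, ounces) of weight."""
    total_ounces = Fraction(total_pence, price_pence_per_lb) * OUNCES_PER_POUND
    return divmod(total_ounces, OUNCES_PER_POUND)


price = to_pence(0, 1, 4)   # 1 shilling 4 pence per pound of sugar
total = to_pence(0, 19, 1)  # recorded total: 0 pounds, 19 shillings, 1 penny

# Two possible readings of "1 4 5": 145 lb, or 14 lb 5 oz.
readings = [(145, 0), (14, 5)]
matches = [r for r in readings if cost_in_pence(*r, price) == total]
print(matches)

lb, oz = weight_from_total(total, price)
print(f"{lb} lb {oz} oz")
```

Only the 14 pounds, 5 ounces reading reproduces the ledger total, which is the consistency check the model would have needed to make implicitly.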
Casey Newton
So what are the implications for you in your work of a model being able to do this kind of abstract reasoning?
Mark Humphries
Yeah. And so as an historian, what it means is that assuming that this replicates, once we start to see the actual model come out, you're going to be able to trust the models to do a lot of stuff that historians would normally need to do. Right. So it's one thing to transcribe a document. It would be another to say, here's a ledger, go through and add up all the sugar that was bought and sold in the store ledger. And right now you can't trust a model to do anything like that. Right. You can't trust it to necessarily recognize sugar, come up with quantities, do that type of math. If we're getting to a point where models can begin to do that, you can begin to get them to do tasks that would take humans a very long time.
Kevin Roose
Right. It sort of sounds like the equivalent of the moment where like, AI coding tools went from being a useful assistant for a person who's a professional programmer to like, actually being able to go out and program things just on their own with very minimal instruction. Yeah, it's like that for history, right?
Mark Humphries
Yeah. And I think that's a really good example. But I think that the interesting thing about history here is that I think it's a very typical kind of knowledge work kind of area. Right. In the sense that a lot of the stuff we're doing is pretty esoteric. And your listeners will probably be wondering, you know, who's really interested in how much sugar people bought in Albany in the 18th century.
Kevin Roose
Well, Casey is, but he's a special case.
Mark Humphries
Yeah, yeah, that's fair.
Casey Newton
I'm really interested in this Samuel Slit and why he needed 14 pounds of.
Kevin Roose
Of sugar.
Casey Newton
Like, take it easy, Sam.
Mark Humphries
It's true. Well, he's a merchant. He also wants to go and sell it to other people, Right?
Casey Newton
Oh, he's a dealer. There we go.
Mark Humphries
Now he is a sugar dealer. But the interesting thing about this, I think, right, is that the stuff we do as historians with these historical records, this is what all knowledge work people do, right? You take information and you synthesize it. You take it from one format, you put it into another. You realize the implications of the things you're reading and you draw conclusions and analysis based on that. Right. And it can be 18th century sugar, but it can very easily be any other kind of widget that a knowledge worker uses. So what I'm seeing turning on here for historians is highly likely to start turning on in other areas as well. Up to this point, with the models, you know, we've been getting this sense that they're starting to get good enough where you can feel like, yeah, I think I can trust the output on this. But you're getting to the point where it just, it works. And as somebody who uses coding assistants all the time now, it's a very similar situation, where you used to have to cut and paste back and forth and it would, you know, it would never run the first time. You'd have to run it, you know, three or four times, paste the errors back and forth, and eventually it would work. And now you can just kind of hit the button and it almost always works. Right. And that's what we're going to see here with knowledge work.
Casey Newton
So I want to zero in on what makes this so interesting. So we don't know at this moment that this is Gemini 3, but I think Kevin and I feel like it's highly likely to be Gemini 3. Right. And we also don't know a lot about how, if it is Gemini 3, exactly how it was trained. But I think we can assume that it was trained in a way that its predecessors were, which was in part by just feeding it lots more data, lots more compute.
Kevin Roose
Right.
Casey Newton
Just sort of following the scaling laws. And there's been so much debate over the past year about are we seeing diminishing returns? Right. Have we sort of figured out the limits of what we can get out of these scaling laws? The story that you're telling us, Mark, is a suggestion that, no, we have not gotten everything that there is to be gotten out of this increased scaling. And in fact, we should expect to see continued emerging properties from this ongoing scaling. And you've just given us an example of it right there. So that's why I think this is so fascinating.
Kevin Roose
Yeah. And I was fascinated by this experiment and I wanted to see if I could actually get to the bottom of what happened here. So I asked some folks who would be in a position to know, like, hey, there's this, you know, history professor in Canada. He thinks he, like, stumbled onto this, like, unreleased Gemini 3 A/B test and it was really good. And they said, lose my number. No, they were very tight-lipped. They did not want to talk about it. They are keeping things very secretive over there. But I was able to confirm that Google does test new models in AI Studio before they sort of appear elsewhere. And so I think if I were a betting man, it's a pretty good bet that what you experienced was in fact an unreleased model, probably Gemini 3.
Casey Newton
So, Kevin, I have not been in the AI studio myself recently to see if I could try this model. Have you made any efforts to try to access whatever this model is?
Kevin Roose
Yes. So I use AI Studio. People don't know this, but like, Google has like, you know, 800 AI products right now. There are like a billion ways to use Gemini. Um, and the most effective way, the best way to use Gemini is inside this product that basically no one except, you know, developers and nerds like us uses, which is called the Google AI Studio. And if you go in there, I don't know, for whatever reason, Mark, do you find this too? But like, the version of Gemini in AI Studio is better than the one, like, on the web. I don't know why, but I'm consistently able to get AI Studio to do things, like transcribing long interviews, that the regular old Gemini won't do. So anyway, I was in there this morning actually doing some research for our segment about Suncatcher, this, like, Google, you know, project about putting AI stuff in space. And I was trying to have it summarize this research paper and give me some ideas and comparisons to what other companies are doing. And I got this A/B test, this, like, you know, choose between these two answers. And I am looking at it right now. It says, which response do you prefer? And it has these two side by side things, and they basically both look pretty good. I think the problem I'm identifying, Mark, is that unlike you, I am not smart enough to come up with, like, problems that are challenging enough where the difference between one pretty good model and a very good model is readily apparent. So maybe you can help me with that.
Casey Newton
Well, I mean, here's an idea. I know, you know, Mark really focuses on the 1700s and the 1800s in the fur trade. What about the 1500s? Yes. I bet you can make a dent.
Kevin Roose
Yeah, well, I will. I'll look into that. All right, well, totally fascinating experience. And I can't wait to hear more about what you're doing with AI and history. This is a really interesting mystery that I hope we've shed some light on.
Casey Newton
Thank you, Mark.
Mark Humphries
Thank you very much for having me.
Casey Newton
Hard Fork is produced by Rachel Cohn and Whitney Jones. We're edited by Jen Poyant. Today's show was fact-checked by Will Peischel and was engineered by Chris Wood. Original music by Elisheba Ittoop, Marion Lozano and Dan Powell. Video production by Sawyer Roque, Pat Gunther, Jake Nicol and Chris Schott. You can watch this whole episode on YouTube at youtube.com/hardfork. Special thanks to Paula Szuchman, Pui-Wing Tam, Dahlia Haddad and Jeffrey Miranda. You can email us at hardfork@nytimes.com with what else you think we should build up in space.
Kevin Roose
If your small business is booming, you.
Mark Humphries
Might say Cha Ching but you it.
Kevin Roose
Should say like a good neighbor, State Farm is there and we'll help your growing business. Like a good neighbor, State Farm is there.
The New York Times · November 14, 2025
Hosts: Kevin Roose, Casey Newton
Guests: Dean Ball (AI policy advisor), Mark Humphries (history professor)
This episode of Hard Fork explores three remarkable trends at the cutting edge of tech:
Timestamps: 02:28–17:24
The Problem with Earth-based Data Centers:
Project Suncatcher:
Overcoming Challenges:
Testing Roadmap & Competition:
Social & Political Reactions:
Memorable Exchange:
Timestamps: 19:08–49:10
Dean Ball’s Background:
The Landscape in DC:
Right-wing AI Policy Factions:
The “Woke AI” Executive Order:
Federal vs State AI Policy:
Big Tech and Catastrophic Risk:
Notable Quotes:
Timestamps: 51:14–73:04
The Setup:
The Mystery Experiment:
Significance:
Broader Importance:
Memorable Moments:
On Ambition & Power Crunch:
On Regulatory Approaches:
On Historical Reasoning in AI: