
“DeepSeek is a really odd duck.”
Oracle Ad
This podcast is supported by Oracle.
AI requires a lot of compute power, and the cost for your AI workloads can spiral. That is, unless you're running on OCI, Oracle Cloud Infrastructure. This was the cloud built for AI: a blazing-fast, enterprise-grade platform for your infrastructure, database, apps and all of your AI workloads.
Right now, Oracle can cut your current cloud bill in half if you move to OCI. Minimum financial commitment and other terms apply. Offer ends March 31st. See if you qualify at oracle.com/hardfork. oracle.com/hardfork.
Kevin Roose
Hard fork I just got my weekly. You know, I set up ChatGPT to email me a weekly affirmation before we start taping because you can do that now with the tasks feature.
Casey Newton
Yeah, people say this is the most expensive way to email yourself a reminder. So what sort of affirmation did we get today?
Kevin Roose
It said, you are an incredible podcast host, sharp, engaging and completely in command of the mic. Your taping today is going to be phenomenal and you're going to absolutely kill it.
Casey Newton
Wow. And that's why it's so important that ChatGPT can't actually listen to podcasts, because I don't think it would say that if it had ever heard us.
Kevin Roose
It would say, just get this over with.
Casey Newton
Get on with it.
Kevin Roose
I'm Kevin Roose, a tech columnist at The New York Times.
Casey Newton
I'm Casey Newton from Platformer, and this is Hard Fork. This week, we go deeper on DeepSeek. ChinaTalk's Jordan Schneider joins us to break down the race to build powerful AI. Then, hello, Operator. Kevin and I put OpenAI's new agent software to the test. And finally, the train is coming back to the station for a round of Hot Mess Express.
Kevin Roose
Well, Casey, it is rare that we spend two consecutive episodes of this show talking about the same company, but I think it is fair to say that what is happening with DeepSeek has only gotten more interesting and more confusing.
Casey Newton
Yeah, that's right. It's hard to remember a story in recent months, Kevin, that has generated quite as much interest as what is going on with DeepSeek. Now, DeepSeek, for anyone catching up, is this relatively new Chinese AI startup that released some very impressive and cheap AI models this month that lots of Americans have started downloading and using.
Kevin Roose
Yeah, so some people are calling this a Sputnik moment for the AI industry, when kind of every nation perks up and starts, you know, paying attention at the same time to the AI arms race. Some people are saying this is the biggest thing to happen in AI since the release of ChatGPT. But, Casey, why don't you just catch us up on what has been happening since we recorded our emergency podcast episode just two days ago?
Casey Newton
Well, I would say that there have probably been three stories, Kevin, that I would share to give you a quick flavor of what's been going on. One, a market research firm says DeepSeek was downloaded 1.9 million times on iOS in recent days and about 1.2 million times on the Google Play Store. The second thing I would point out is that DeepSeek has been banned by the US Navy over security concerns, which I think is unfortunate, because what is a submarine doing if not deep seeking? It was also banned in Italy, by the way, after the data protection regulator made an inquiry. And finally, Kevin, OpenAI says that there is evidence that DeepSeek distilled its models. Distillation is kind of the AI lingo or euphemism for "they used our API to try to unravel everything we were doing and use our data in ways that we don't approve of." And now Microsoft and OpenAI are jointly investigating whether DeepSeek abused their API. And of course, we can only imagine how OpenAI is feeling about the fact that their data might have been used without payment or consent.
Kevin Roose
Yeah, must be really hard to think that someone might be out there training AI models on your data without permission.
Casey Newton
And I want to acknowledge that literally every single user of Bluesky already made this joke, but they were all funny. And I'm so happy to repeat it here on Hard Fork this week. Now, Kevin, as always, when we talk about AI, we have certain disclosures to make.
Kevin Roose
The New York Times Company is currently suing OpenAI and Microsoft over alleged copyright violations related to the use of their copyrighted data to train AI models. I think that was good.
Casey Newton
It was very good. And I'm in love with a man who works at Anthropic. Now, with that said, Kevin, we have even further we want to go into the DeepSeek story, and we want to do it with the help of Jordan Schneider.
Kevin Roose
Yes, we are bringing in the big guns today because we wanted to have a more focused discussion about DeepSeek that is not about, you know, the stock market or how the American AI companies are reacting to this, but is about one of the biggest sets of questions that all of this raises, which is: what is China up to with DeepSeek and AI more broadly? What are the geopolitical implications of the fact that Americans are now obsessing over this Chinese-made AI app? What does it mean for DeepSeek's prospects in America? What does it mean for their prospects in China? And how does all this fit together from the Chinese perspective? So Jordan Schneider is our guest today. He's the founder and editor in chief of ChinaTalk, which is a very good newsletter and podcast about US-China tech policy. He's been following the Chinese AI ecosystem for years, unlike a lot of American commentators and analysts who were sort of surprised by DeepSeek and what they managed to pull off over the last couple of weeks. I'll say it, I was surprised. Yeah, me too. But Jordan has been following this company for a long time, and a big focus of ChinaTalk, his newsletter and podcast, has been translating literally what is going on in China into English, making sense of it for a Western audience, and keeping tabs on all the developments there. So, perfect guest for this week's episode, and I'm very excited for this conversation.
Casey Newton
Yes, I have learned a lot from ChinaTalk in recent days as I've been boning up on DeepSeek. So we're excited to have Jordan here, and let's bring him in.
Kevin Roose
Jordan Schneider, welcome to Hard Fork.
Jordan Schneider
Oh my God, such a huge fan. This is such an honor.
Casey Newton
We're so excited to have you. I have learned truly so much from you this week. And so when we were talking about what to do this week, we just looked at each other and said, we have got to see if Jordan can come on this podcast.
Kevin Roose
Yeah. So this has been a big week for Chinese tech policy, maybe the biggest week for Chinese tech policy, at least that I can remember. I realized that something important was happening last weekend when I started getting texts from, like, all of my non-tech friends being like, what is going on with DeepSeek? And I imagine you had a similar reaction, because you are a person who does constantly pay attention to Chinese tech policy.
Jordan Schneider
So I've been running ChinaTalk for eight years, and I can get my family members to maybe read, like, one or two editions a year. And the same exact thing happened with me, Kevin, where all of a sudden I got, "Oh my God, DeepSeek. Like, it's on the cover of the New York Post. Jordan, you're so clairvoyant. Maybe I should read you more." Like, okay, thanks, Mom. Appreciate that.
Kevin Roose
Yeah. So I want to talk about DeepSeek and what they have actually done here, but I'm hoping first that you can kind of give us the basic lay of the land of the sort of Chinese AI ecosystem, because that's not an area where Casey or I have spent a lot of time looking. But tell us about DeepSeek and sort of where it sits in the overall Chinese industry.
Jordan Schneider
So DeepSeek is a really odd duck. It was born out of this very successful quant hedge fund, the CEO of which, basically, after ChatGPT was released, was like, okay, this is really cool. I want to spend some money and some time and some compute and hire some fresh young graduates to see if we can give it a shot to make our own language models.
Casey Newton
And so a lot of companies are out there building their own large language models. What was the first thing that happened that made you think, oh, this company is actually making some interesting ones?
Jordan Schneider
Sure. So there are lots and lots of very moneyed Chinese companies that have been trying to follow a similar path after ChatGPT. You know, we have giant players like Alibaba, Tencent, ByteDance, Huawei, even, trying to, you know, create their own OpenAI, basically. And what is remarkable is the big organizations can't quite get their head around creating the right organizational institutional structure to incentivize this type of collaboration and research that leads to real breakthroughs. So, you know, Chinese firms have been releasing models for years now, but DeepSeek, because of the way that it structured itself and the freedom they had not necessarily being under a direct profit motive, was able to put out some really remarkable innovations that caught the world's attention, you know, starting maybe late December, and then, you know, really blew everyone's mind with the release of the R1 chatbot.
Kevin Roose
Yeah. So let's talk about R1 in just a second. But one more question for you, Jordan, about DeepSeek. What do we know about their motivation here? Because so much of what has been puzzling American tech industry watchers over the last week is that this is not a company that has sort of an obvious business model connected to its AI research. Right? We know why Google is developing AI: because it thinks it's going to make the company Google much more profitable. We know why OpenAI is developing advanced AI models. It does not seem obvious to me, and I have not read anything from people involved in DeepSeek about why they are actually doing this and what their ultimate goal is. So can you help us understand that?
Jordan Schneider
So, we don't have a lot of data, but my base case, which is based on two extended interviews that the DeepSeek CEO released, which we've translated on ChinaTalk, as well as just what DeepSeek employees have been tweeting about in the West and then domestically, is that they're dreamers. I think the right mental model is OpenAI, you know, 2017 to 2022. Like, I'm sure you could ask the same thing: what the hell are they doing? Sam Altman literally said, I have no idea how we're ever going to make money. Right? And here we are in this grand new paradigm. So I really think that they do have this vision of AGI, and like, look, we'll build it and we'll make it cheaper for everyone, you know, we'll figure it out later. And they have enough trading strategies that they can fund it. And now that they've really blown people's minds, we might be sort of turning into a new period in DeepSeek's history, kind of like what happened with OpenAI, right, where they're going to have to shack up with a hyperscaler, be it, you know, not Microsoft in this case, but ByteDance or Ali or Tencent or Huawei. And the government's going to start to pay attention in a way which it really hasn't over the past few years.
Kevin Roose
Right. And I want to drill down a little bit there, because I think one thing that most listeners in the West do know about Chinese tech companies is that many of them are sort of inextricably linked to the Chinese government, that the Chinese government has access to user data under Chinese law, that these companies have to follow the Chinese censorship guidelines. And so as soon as DeepSeek started to really pop in America over the last week, people started typing in, you know, things to DeepSeek's model, like, tell me about what happened at Tiananmen Square, or tell me about Xi Jinping, or tell me about the Great Leap Forward. And it just sort of wouldn't do it at all. And so people, I think, saw that and said, oh, this is like every other Chinese company that has this sort of hand-in-glove relationship with the Chinese ruling party. But it sounds, from what you're saying, like DeepSeek has a little bit more complicated a relationship to the Chinese government than maybe some other better-known Chinese tech companies. So explain that.
Jordan Schneider
Yeah, I mean, I think it's important. The mental model you should have for these CEOs is not people who are dreaming to spread Xi Jinping Thought. What they want to do is compete with Mark Zuckerberg and Sam Altman and show that they're really awesome and great technologists. But the tragedy is, let's take ByteDance, for example. You can look at their CEO Zhang Yiming's Weibo posts from 2012, 2013, 2014, which are super liberal in a Chinese context, saying, we should have freedom of expression, we should be able to do whatever we want. In the early years of ByteDance, there was a lot of relatively more subversive content on the platform, where you sort of saw, like, real poverty in China. You saw off-color jokes. And then all of a sudden, in 2018, he posts a letter saying, I am really sorry, like, I need to be part of this sort of Chinese national project and better adhere to, you know, modern Chinese socialist values. And I'm really sorry, and it won't ever happen again, you know. So the same thing happened with Didi, right? Like, they don't really want to have to do anything with politics, and then they get on someone's bad side and all of a sudden they get zapped.
Casey Newton
Didi is, of course, the big Chinese rideshare company, correct? Yeah. What did Didi do?
Jordan Schneider
So they listed on a Western stock exchange after the Chinese government told them not to, and then they got taken off app stores, and it was a whole giant nightmare. Like, they had to sort of go through their rectification process. So point being, with DeepSeek, right, is like, now they are, whether they like it or not, going to be held up as a national champion. And that comes with a lot of headaches and responsibilities, from potentially giving the Chinese government more access to having to fulfill government contracts, which honestly are probably really annoying for them to do and sort of distracting from the broader mission they have of developing and deploying this technology in the widest range possible. But, like, DeepSeek thus far has flown under the radar, but that is no longer the case, and things are about to change for them.
Kevin Roose
Right. And I think that was one of the surprising things about DeepSeek for the people I know, including you, who follow Chinese tech policy. You know, I think people were surprised by the sophistication of their models, and we talked about that on the emergency pod that we did earlier this week, and how cheaply they were trained. But I think the other surprise is that they were released as open source software, because one thing that you can do with open source software is download it, host it in another country, and remove some of the guardrails and the censorship filters that might have been part of the original model.
Casey Newton
By the way, it turned out there weren't even really guardrails on the V3 model, right. That it had not been trained to avoid questions about Tiananmen Square or anything. So that was another really unusual thing about this.
Kevin Roose
Right. And one thing that we know about Chinese technology products is that they don't tend to be released that way. They tend to be hosted in China and overseen by Chinese teams who can make sure that they're not out there talking about Tiananmen Square. So is the open source nature of what DeepSeek has done here part of the reason that you think there might be conflict looming between them and the Chinese government?
Jordan Schneider
You know, honestly, I think this whole "ask it about Tiananmen" stuff is a bit of a red herring on a few dimensions. So first, one of these arguments that is a little confusing to me is, folks used to say, oh, the Chinese models are going to be lobotomized and they will never be as smart as the Western ones because they have to be politically correct. I mean, look, if you ask Claude to say racist things, it won't. And Claude's still pretty smart. Like, this is sort of a solved problem and a bit of a red herring when talking about the long-term competitiveness of Chinese and Western models. Now, you asked me, like, oh, so they released this model globally and it's open source. Maybe someone in the Chinese government would be uncomfortable with the fact that people can get a Chinese model to say things that would get you thrown in jail if you posted them online in China. It's going to be a really interesting calculus for the Chinese government to make, because on the one hand, this is the most positive shine that Chinese AI has got globally in the history of Chinese AI. So they're going to have to navigate this, and it might prompt some uncomfortable conversations and bring regulators to a place they wouldn't have otherwise landed.
Kevin Roose
Yeah. Now, Jordan, I want to ask you about something that people have been talking about and speculating about in relationship to the DeepSeek news for the last week or so, which is about chip controls. So we've talked a little bit on the show earlier this week about how DeepSeek managed to put together these models using some of these kind of second-rate chips from Nvidia that are allowed to be exported to China. We've also talked about the fact that you cannot get the most powerful chips legally if you are a Chinese tech company. So there have been some people, including Elon Musk and other American tech luminaries, who have said, oh, well, DeepSeek has this sort of secret stash of these banned chips that they have smuggled into the country, and that actually they are not making do with kind of the Kirkland Signature chips that they say they are. What do we know about how true that is?
Jordan Schneider
So did DeepSeek have banned chips? It's kind of impossible to know. This is a question more for the US intelligence community than, like, Jordan Schneider on Twitter. But I do think that it is important to understand that the delta between what you can get in the West and what you can get in China is actually not that big. And, you know, we're talking about training a lot. But also on the inference side, China can still buy this H20 chip from Nvidia, which is basically world class at, like, deploying the AI and letting everyone use it. So does this mean that we should just give up? I don't think so. Compute is going to be a core input regardless of how much model distillation you're going to have in the future. There have been a lot of quotes, even from the DeepSeek founder, basically saying, like, the one thing that's holding us back are these export controls.
Kevin Roose
Right.
Casey Newton
Okay, I want to ask a big picture question.
Jordan Schneider
Sure.
Casey Newton
I think that a reason that people have been so fascinated by this DeepSeek story is that, at least for some folks, it seems to change our understanding of where China is in relation to the United States when it comes to developing very powerful AI. Jordan, what is your assessment of what the V3 and R1 models mean, and to what extent do you think the game has actually changed here?
Jordan Schneider
I'm not really sure the game has changed so much. Like, Chinese engineers are really good. I think it is a reasonable base case that Chinese firms will be able to develop comparable models or fast follow on the model side. But the real sort of long-term competition is not just going to be on developing the models, but deploying them and deploying them at scale. And that's really where compute comes in. And that's why export controls are going to continue to be a really important piece of America's strategic arsenal when it comes to making sure that the 21st century is defined by, you know, the US and our friends, as opposed to China and theirs.
Casey Newton
Right. So it's one thing to have a model that is about as capable as the models that we have here in the United States. It's another thing to have the energy to actually let everyone use them as much as they want to use them. What you're saying is, no matter what DeepSeek may have invented here, that fundamental dynamic has not changed. China simply does not have nearly the amount of compute that the United States has.
Jordan Schneider
As long as we don't screw up export controls. So I think the sort of base case for me is that if the US stays serious about holding the line on semiconductor manufacturing equipment and export of AI chips, then it will be incredibly difficult for the Chinese sort of broader semiconductor and AI ecosystem to leap ahead, much less kind of fast follow, beyond being able to develop comparable models. I'm feeling good as long as Trump doesn't make some crazy trade for soybeans in exchange for ASML EUV machines. That would really break my heart.
Kevin Roose
I want to inject kind of a note of skepticism here, because I buy everything that you're saying about how DeepSeek's progress has been sort of bottlenecked by the fact that it can't get these very powerful American AI chips from companies like Nvidia. But I also am hearing people who I trust say things that make me think that actually the bottleneck may not be the availability of chips. That maybe with some of these algorithmic efficiency breakthroughs that DeepSeek and others have been making, it might be possible to run a very, very powerful AI model on a conventional piece of hardware, on a MacBook even. And I wonder about how much of this is just, like, AI companies in the West trying to cope, trying to make themselves feel better, trying to reassure the market that they are still going to make money by investing billions and billions of dollars into building powerful AI systems. If these models do just become sort of lightweight commodities that you can run on a much less powerful cluster of computers, or maybe on one computer, doesn't that just mean we can't control the proliferation of them at all?
Jordan Schneider
Yeah, I mean, I think this is one potential future. And maybe that potential future, of you being able to fit the biggest, baddest, smartest, fastest, most efficient AI model on something that can sit in your home, went up 10 percentage points in likelihood. But I think there are lots of other futures in which the world doesn't necessarily play out that way. And look, Nvidia went down 15%. It didn't go down 95%. Like, I think if we're really in that world where chips don't matter because everything can be shrunk down to kind of consumer-grade hardware, then the sort of reaction that I think you would have seen in the stock market would have been even more dramatic than the kind of freakout we saw over this week. So we'll see. I mean, it would be a really remarkable kind of democratizing thing if that was the future we ended up living in. But it still seems pretty unlikely to my, you know, history major brain here.
Casey Newton
I would also just point out, Kevin, that when you look at what DeepSeek has done, they have created a really efficient version of a model that American companies themselves had trained, like, nine to 12 months ago. Right? So they sort of caught up very quickly, and there are fascinating technological innovations in what they did. But in my mind, these are still primarily optimizations. Like, for me, what would tip me over into, like, oh my gosh, America is losing this race, is if China is the first one out of the gate with a virtual coworker. Right? Or, like, a truly phenomenal agent, some sort of leap forward in the technology, as opposed to, we've caught up really quickly and we've figured out something more efficiently. Are you seeing it differently than that?
Kevin Roose
I mean, I guess I just don't know what, like, a six-month lag would buy us, if it does take six months for the Chinese AI companies like DeepSeek to sort of catch up to the state of the art. You know, I was struck by Dario Amodei, who's the CEO of Anthropic. He wrote an essay just today about DeepSeek and export controls. And in it he makes this point about the sort of difference between living in what he called a unipolar world, where one country or one bloc of countries has access to something like an AGI or an ASI and the rest of the world doesn't, versus the situation where China gets there roughly around the same time that we do. And so we have this bipolar world where two blocs of countries, the East and the West, basically have access to this equivalent technology. And so...
Casey Newton
And of course, in a bipolar world, sometimes we're very happy and sometimes we're very sad.
Kevin Roose
Exactly. So I just think like, whether we get there, you know, six months ahead of them or not, I just feel like there isn't that much of a material difference. But Jordan, maybe I'm wrong. Can you make the other side of that, that it really does matter?
Jordan Schneider
I'm kind of there. You know, I'll take a little bit of issue with what Dario says. And I think, you know, one of the lessons that DeepSeek shows is we should expect a base case of Chinese model makers being able to fast follow the innovations, which, by the way, Casey, actually do take those giant data centers to run all the experiments in order to find out what is the future direction you want to take your model. And what AI is going to come down to is not just creating the model, not just, like, Dario envisioning the future and then all of a sudden things happen. Like, there's going to be a lot of messiness in the implementation, and there are going to be teachers unions who are upset that AI comes in the classroom, and there are going to be all these regulatory pushbacks and a lot of societal reorganization which is going to need to happen, just like it did during the Industrial Revolution. So look, model making is a frontier of competition. Compute access is a frontier of competition. But there's also this broader question: how will a society kind of adopt and cope with all of this new future that's going to be thrown in our faces over the coming years? And I really think it's that, just as much as the model development and the compute, which is going to determine which countries are going to gain the most from what AI is going to offer us.
Kevin Roose
Yeah. Well, Jordan, Kevin, thank you so much for joining and explaining all of this to us. I feel more enlightened. Me too.
Jordan Schneider
Oh, my pleasure.
Kevin Roose
My chain of thought has just gotten a lot longer. That's an AI joke. When we come back...
Casey Newton
Kevin, there's an agent at our door.
Kevin Roose
Is it Jerry Maguire?
Casey Newton
No, it's an AI one.
Kevin Roose
Oh, okay.
Casey Newton
Jerry Maguire. I don't know.
Kevin Roose
Jerry Maguire.
Oracle Ad
This podcast is supported by Oracle.
AI requires a lot of compute power, and the cost for your AI workloads can spiral. That is, unless you're running on OCI, Oracle Cloud Infrastructure. This was the cloud built for AI: a blazing-fast, enterprise-grade platform for your infrastructure, database, apps and all of your AI workloads.
Right now, Oracle can cut your current cloud bill in half if you move to OCI. Minimum financial commitment and other terms apply. Offer ends March 31. See if you qualify at oracle.com/hardfork. oracle.com/hardfork.
Vanta Ad
Whether you're starting or scaling your company's security program, demonstrating top-notch security practices and establishing trust is more important than ever. Vanta automates compliance for SOC 2, ISO 27001 and more. With Vanta, you can streamline security reviews by automating questionnaires and demonstrating your security posture with a customer-facing Trust Center. Over 7,000 global companies use Vanta to manage risk and prove security in real time. Get $1,000 off Vanta when you go to vanta.com/hardfork. That's vanta.com/hardfork for $1,000 off.
Adobe Firefly Ad
In a world where demand for standout content is skyrocketing, creators face the challenge of keeping up. Meet Adobe Firefly, Adobe's family of generative AI models for imaging, video design and vector, directly integrated into popular apps that creators know and love, like Photoshop, Premiere Pro and Adobe Express. Firefly helps creators ideate and work faster with the confidence of knowing it's designed to be safe for commercial use, since it's only trained on licensed and public domain content. Create with precision at adobe.com/firefly.
Kevin Roose
Operator, information, give me Jesus on the line. Do you know that one?
Casey Newton
No. Do you know Operator by Jim Croce?
Kevin Roose
No.
Casey Newton
Operator, oh, won't you help me place this call?
Kevin Roose
Well, Casey, call your agent. Cuz today we're talking about AI agents.
Casey Newton
Why do I need to call my agent?
Kevin Roose
I don't know, it just sounded good.
Casey Newton
Okay, well, I appreciate the effort. But yes, Kevin, because for months now the big AI labs have been telling us that they are going to release agents this year. Agents, of course, being software that can essentially use your computer on your behalf, or use a computer on your behalf. And the dream is that you have sort of a perfect virtual assistant or coworker, you name it. If they are somebody who might work with you at your job, the AI labs are saying, we are building that for you.
Kevin Roose
Yeah, so last year, toward the end of the year, we started to see kind of these demos, these previews that companies like Anthropic and Google were working on. Anthropic released something called Computer Use, which was an AI agent, a sort of very early preview of that. And then Google had something called Project Mariner that I got a demo of, I believe in December, that was basically the same thing, but their version of it. And then just last week, OpenAI announced that it was launching Operator, which is its first version of an AI agent. And unlike Anthropic's and Google's, which, you know, you either had to be a developer or part of some early testing program to access, you and I could try it for ourselves by just upgrading to the $200-a-month Pro subscription of ChatGPT.
Casey Newton
Yeah, and I will say that as somebody who's willing to spend, you know, money on software all the time, I thought, am I really about to spend $200 to do this? But, you know, in the name of science, Kevin, I had to.
Kevin Roose
At this point, I am spending more on AI subscription products than on my mortgage. I'm pretty sure that's correct. You know, it's worth it. We do it for journalism.
Casey Newton
We do. So we both spent a couple of days putting Operator through its paces and today we want to talk a little bit about what we found.
Kevin Roose
Yeah, so would you just explain like what Operator is and how it works?
Casey Newton
Yeah, sure. So Operator is a separate subdomain of ChatGPT. You know, sometimes ChatGPT will just let you pick a new model from a dropdown menu, but for Operator, you gotta go to a dedicated site. Once you do, you'll see a very familiar chatbot interface, but you'll see different kinds of suggestions that reflect some of the partnerships that OpenAI has struck up. So, for example, they have partnerships with OpenTable and StubHub and Allrecipes. And these are meant to give you an idea of what Operator can do. And frankly, Kevin, not a lot of this sounds that interesting. Right? Like, the suggestions are on the order of "suggest a 30-minute meal with chicken" or "reserve a table for eight" or "find the most affordable passes to the Miami Grand Prix." Again, so far, kind of so boring. What is different about Operator, though, is that when you say, okay, find the most affordable passes to the Miami Grand Prix, when you hit the Enter button, it is going to open up its own web browser, and it's going to use this new model that they have developed to try to actually go and get those passes for you.
Kevin Roose
Yeah, so this is an important thing because I think, you know, when people first heard about this, they thought, okay, this is an AI that kind of takes over your computer, takes over your web browser. That is not what Operator does. Instead, it opens a new browser inside your browser and that browser is hosted on OpenAI servers. It is, you know, it doesn't have your bookmarks and stuff like that saved, but you can take it over from the autonomous AI agent if you need to click around or do something on it. But it basically exists. It's like a, it's a browser within a browser.
Casey Newton
Yeah. So one of the ideas with Operator is that you should be able to leave it unsupervised and just kind of go do your work while it works. But of course, it is very fun, initially at least, to watch the computer try to use itself. And so I sat there in front of this browser within a browser and I watched this computer move a mouse around, type the, you know, URL, navigate to a website, and, you know, in the example I just gave, actually search for passes to the Miami Grand Prix.
Kevin Roose
Yeah, and it's interesting on a slightly more technical level, because until now, if an AI system like ChatGPT wanted to interact with some other website, it had to do so through an API, right? APIs, application programming interfaces, are sort of the way that computers talk to each other. But what Operator does is essentially eliminate the need for APIs, because it can just click around on a normal website that is designed for humans and behave like a human. And you don't need a special interface to do that.
Casey Newton
Yeah, and now some people might hear that, Kevin, and start screaming, because what they will say is, APIs are so much more efficient than what Operator is doing here. APIs are very structured, they're very fast. They let computers talk to each other without having to, for example, open up a browser. And as long as there's an API for something, you can typically get it done pretty quickly. The thing is, though, APIs have to be built. There is a finite number of them. The reason that OpenAI is going through this exercise is because they want a true general purpose agent that can do anything for you, whether there is an API for it or not.
Kevin Roose
And maybe we should just pause for a minute there and zoom out a little bit to say, why are they building this? Like, what is the long term vision here?
Casey Newton
Sure. So the vision is to create virtual coworkers, Kevin. This is the North Star for the big AI labs right now. Many of them have said that they are trying to create some kind of digital entity that you can just hire as a coworker. The first ones, they'll probably be engineers because these systems are already so good at writing code. But eventually they want to create virtual consultants, virtual lawyers, virtual doctors, you name it.
Kevin Roose
Virtual podcast hosts.
Casey Newton
Let's hope they don't go that far. But everything else is on the table, and if they can get there, you know, presumably there are going to be huge profits in it for them. There are going to potentially be huge productivity gains for companies. And then there is, of course, the question of, well, what does this mean for human beings? And I think that's somewhat murkier.
Kevin Roose
Right. And I think it also helps to justify the cost of running these things, because $200 a month is a lot to pay for a version of ChatGPT, but it's not a lot to pay for a remote worker. And if you could, say, use the next version of Operator, maybe two or three versions from now, to, say, replace a customer service agent or someone in your billing department, that actually starts to look like a very good deal.
Casey Newton
Absolutely. Or even if I could bring it into the realm of journalism, Kevin, if I had a virtual research assistant and I said, hey, I'm going to write about this today, go pull all of the most relevant information about this from the past couple of years and maybe organize it in such a way that I might, you know, write a column based off of it. Like, yeah, that's absolutely worth $200 a month to me.
Kevin Roose
Okay, so Casey, walk me through something that you actually asked Operator to do for you and what it did autonomously on its own.
Casey Newton
Sure. I'll maybe give, like, two examples: a pretty good one and maybe a not-so-good one. The pretty good one was, and this was actually suggested by Operator, I used TripAdvisor to look up walking tours in London that I might want to do the next time I'm in London. When I did...
Kevin Roose
When are you going to London?
Casey Newton
I'm not actually going to London.
Kevin Roose
Oh, so you lied to the AI.
Casey Newton
And not for the first time. But here's what I'll say: if anybody wants to bring Kevin and I to London, get in touch. We love the city. So I said, okay, Operator, sure, let's do it. Let's find me some walking tours. I clicked that, it opened a browser, it went to TripAdvisor, it searched for London walking tours, it read the information on the website and then it presented it to me. Did that within a couple of minutes. Now, on one hand, could I have done that just as easily by Googling? Could I probably have done it even faster if I'd done it myself? Sure. But if you're just sort of interested in the technical feat that is getting one of these models to open a browser, navigate to a website, read it and share information, I did think it was pretty cool.
Kevin Roose
Yes. It's very trippy to see a computer using itself and you know, going around like typing things and selecting things from dropdown menus.
Casey Newton
Yeah, it's sort of like, you know, if you think it is cool to be in a self-driving car, this is that, but for the web: a self-driving browser. It is a self-driving browser. So that was the good example.
Kevin Roose
Yes. What was another example?
Casey Newton
So another example, and this was something else that OpenAI suggested that we try, was to try to use Operator to buy groceries. And they have a partnership with Instacart. The CEO of Instacart, Fidji Simo, was on the OpenAI board. And so I thought, okay, they're going to have, like, sort of dialed this in so that there's a pretty good experience. And so I said, okay, let's go ahead and buy groceries. And I went into Operator and I said something like, hey, can you help me buy groceries on Instacart? And it said, sure. And here's what it did. It opened up Instacart in a browser. So far, so good. And then it started searching for milk in stores located in Des Moines, Iowa.
Kevin Roose
Now, you do not live in Des Moines, Iowa, so why did it think that you did?
Casey Newton
As best as I can tell, the reason it did this is that Instacart defaults to searching for grocery stores in the local area. And the server that this instance of Operator was running on was in Iowa. Now, if you were designing a grocery product like Instacart, and Instacart does this, when you first sign on and say you're looking for groceries, it will say, quite sensibly, where are you?
Kevin Roose
Right.
Casey Newton
Operator does not do this. Instacart might also offer suggestions for things that you might want to buy. It does not just assume that you want milk.
Kevin Roose
Wow. I'm just picturing like a house in Des Moines, Iowa, where there's just like a pallet of milk being delivered every day from all these poor Operator users.
Casey Newton
Yes. So I thought, okay, whatever, you know, this thing makes mistakes. Let's, let's hope that it gets on the right track here. And so I tried to pick the grocery store that I wanted it to shop at, which is, you know, in San Francisco, where I live. And it entered that grocery store's address as the delivery address. So like, it would try to deliver groceries presumably from Des Moines, Iowa to my grocery store, which is not what I wanted. And it actually could not solve this problem without my help. I had to take over the browser, log into my Instacart account, and tell it which grocery store that I wanted to shop at. So already all of this has taken at least 10 times as long as it would have taken me to do this myself.
Kevin Roose
Yeah. So I had some similar experiences. The first thing that I had Operator try to do for me was to buy a domain name and set up a web server for a project that you and I are working on that we can't really talk about yet. Secret project. And so I said to Operator, I said, go research available domain names related to this project. Buy the one that costs less than $50, the best one that costs less than $50, and then buy a hosting account and set it up and configure all the DNS settings and stuff like that.
Casey Newton
Okay, so that's like a true multi step project and something that would have been legitimately very annoying to do yourself.
Kevin Roose
Yes. You know, that would have taken me, I don't know, half an hour to do on my own. And it did take Operator some time. Like, I had to kind of, like, set it and forget it. And, like, I, you know, got myself a snack and a cup of coffee. And then when I came back, it had done most of these tasks, really. Yes. I had to still do things like take over the browser and enter my credit card number. I had to give it some details about, like, my address for the sort of registration for the domain name. I had to pick between the various hosting plans that were available on this website. But it did 90% of the work for me. And I just had to, like, sort of take over and do the last mile.
Casey Newton
And this is really interesting to me, because what I would assume was it would get, like, I don't know, 5% of the way and it would hit some hiccup and it just wouldn't be able to figure something out until you came back and saved it. But it sounds like, from what you're saying, it was somehow able to, like, work around whatever unanswered questions there were and still get a lot done while you weren't paying attention.
Kevin Roose
So it felt a little bit like training a very new, very insecure intern. Because at first it would keep prompting me. It'd be like, well, do you want a .com or a .net? And eventually you just have to prompt it and say, like, make whatever decisions you want.
Casey Newton
Like, wait, you said that to it?
Kevin Roose
Yes, I said. I said, like, only ask for my intervention if you can't progress any farther. Otherwise, just make the most reasonable decision.
Casey Newton
You said, I don't care how many people you have to kill, just get me this domain. And it said, understood, sir.
Kevin Roose
Yeah. And I'm now wanted in 42 states. Anyway, that was one thing that Operator did for me that I thought was pretty impressive.
Casey Newton
I have to say that that feels like a grand success compared to what I got Operator to do.
Kevin Roose
Yeah, it was pretty impressive. I also had it send lunch to one of my coworkers, Mike Isaac, who was hungry because he was on deadline. And I said, go to DoorDash and get Mike some lunch. It did initially mess up that process because it decided to send him tacos from a taco place, which, you know, is great. And it's a taco place I know. It's very good. But I said, order enough for two people, and it sort of ordered two tacos. And this is one of those places where the tacos are quite.
Casey Newton
Operator said, get your portion size under control, America.
Kevin Roose
Yeah. So then I had to go in and say, does that sound like enough food, Operator? And it said, actually, now that you mention it, I should probably order more.
Casey Newton
Wait, no. So here's a question. So in these cases, is the first step that you log into your account? Because it doesn't have any of your payment details or anything. So at what point are you actually sort of teaching it that?
Kevin Roose
It depends on the website. So sometimes you can just say up front, like, here is my email address or here's my login information, and it will sort of, you know, log you in and do all that. Sometimes you take over the browser. There are some privacy features that are probably important to people, where OpenAI says that it does not take screenshots of the browser while you are in control of it, because you might not want your credit card information getting sent to OpenAI's servers or anything like that. So sometimes it happens at the beginning of the process. Sometimes it happens, like, when you're checking out at the end.
Casey Newton
And so were you taking it over to log in, or were you saying, I don't care, and you were just, like, giving Operator your DoorDash password in plain text?
Kevin Roose
I was taking it over.
Casey Newton
Okay, smart, smart.
Kevin Roose
So those were the good things. I also, this was a fun one, I wanted to see if Operator could make me some money. So I said, go take a bunch of online surveys because, you know, there are all these websites where you can, like, get a couple cents for, like, filling out an online survey.
Casey Newton
Something that most people don't know about Kevin is he devotes 10% of his brain at any given time to thinking about schemes to generate money. And it's one of my favorite aspects of your personality that I feel like doesn't get exposed very much. But this is truly the most Roosian approach to using Operator I can imagine. So I can't wait to find out how this went.
Kevin Roose
Well, the most Roosian approach might have been what I tried just before this, which was to have it go play online poker for me, but it did not do it. It said, I can't help with gambling or lottery related activities.
Casey Newton
Okay, woke AI. Does the Trump administration know about this?
Kevin Roose
But it was able to actually fill out some online surveys for me and it earned a dollar and 20 cents.
Casey Newton
Is that right?
Kevin Roose
Yeah, in about 45 minutes.
Casey Newton
Okay, so if you had it going all month, presumably you could maybe eke out the $200 to cover the cost of Operator Pro.
Kevin Roose
Yes, and I'm sure I spent hundreds of dollars worth of GPU computing power just to be able to make that dollar and 20 cents. But hey, it worked.
Casey Newton
But hey, it worked.
Kevin Roose
So those were some of the things that I tried. There were some other things that it just would not do for me no matter how hard I tried. So one of them was, I was trying to update my website and put some links to articles that I'd written on my website. And what I found after trying to do this was that there are just websites where Operator is not allowed to go. And so when I said to Operator, Operator, go pull down these New York Times articles that I wrote and, you know, put them onto my website, it said, I can't get to the New York Times website.
Casey Newton
I'm going to guess you expected that to happen.
Kevin Roose
Well, I thought maybe it has some clever workaround and maybe I should alert the lawyers at the New York Times if that's the case. But no, I assumed that if any website were to be blocking the OpenAI web crawlers, it would be the New York Times. Yeah, but there are other websites that have also put up similar blockades to prevent Operator from crawling them. Reddit, you cannot go on to with Operator. YouTube, you cannot go on to with Operator. Various other websites. GoDaddy, for some reason, did not allow me to use Operator to buy a domain name there, so I had to use another domain name site to do that. So right now there are some pretty janky parts of Operator. I would not say that most people would get a lot of value from using it, but what do you think?
Casey Newton
Well, I do think that there is something just undeniably cool about watching a computer use itself. Of course, it can also be quite unsettling. A computer that can use itself can cause a lot of harm. But I also think that it can do a lot of good. And so it was fun to try to explore what some of those things could be. And to the extent that Operator is pretty bad at a lot of tasks today, I would point out that it showed pretty impressive gains on some benchmarks. So there is one benchmark, for example, that Anthropic used when they unveiled Computer Use last year, and they scored 14.9% on something called OSWorld, which is an evaluation for testing agents. So not great. Just three months later, OpenAI said that its CUA model scored 38.1% on the same evaluation. And of course, we see this all the time in AI, where there's just this very rapid progress on these benchmarks. And so on one hand, 38.1% is a failing grade on basically any test. On the other hand, if it improves at the same rate over the next three to six months, you're going to have a computer that is very good at using itself. Right? So that, I just think, is worth noting.
Kevin Roose
Yes, I think that's plausible. We've obviously seen a lot of different AI products over the last couple of years start out being pretty mediocre and get pretty good within a matter of months. But I would give one cautionary note here. And this is actually the reason that I'm not particularly bullish about these kinds of browser-using AI agents. I don't think the Internet is going to sit still and allow this to happen. The Internet is built for humans to use, right? Every news publisher that shows ads on their website, for example, prices those ads based on the expectation that humans are actually looking at them. But if browser agents start to become more popular and all of a sudden 10 or 20 or 30% of the visitors to your website are not actually humans, but are instead Operator or some similar system, I think that starts to break the assumptions that power the economic model of a lot of the Internet.
Casey Newton
Now, is that still true if we find that the agents actually get persuaded by the ads, and that if you send Operator to buy DoorDash and it sees an ad for McDonald's, it's like, you know what, that's a great idea. I'm gonna ask Kevin if he actually wants some of that.
Kevin Roose
Totally. I think you're joking, but that is actually a serious possibility here: that people who, you know, build e-commerce sites, Amazon, et cetera, start to put in basically signals and messages for browser agents to look at on their website to try to influence what it ends up buying. And I think you may start to see restaurants popping up in certain cities with names like Operator, Pick Me or Order From This One, Mr. Bot. And that's maybe a little extreme, but I do think that there's going to be a backlash among websites, publishers, e-commerce vendors as these agents start to take off.
Casey Newton
I think that that is reasonable. I'll tell you what I've been thinking about is, how do we turn this tech demo into a real product? And the main thing that I noticed when I was testing Operator was there is a difference between an agent that is using a browser and an agent that is using your browser. When an agent is able to use your browser, which it can't right now, it's already logged into everything. It already has your payment details. It can do everything so much faster and more seamlessly and without as much handholding. Of course, there are also so many more privacy and security risks that would come from entrusting an agent with that kind of information. So there is some sort of chasm there that needs to be closed. And I'm not quite sure how anyone does it, but I will tell you, I do not think the future is opening up these virtual browsers and me having to enter all of my login and payment details every single time I want to do anything on the Internet, because truly I would rather just do it myself.
Kevin Roose
Right. I also think there's just a lot more potential for harm here. A lot of AI safety experts I've talked to are very worried about this, because what you're essentially doing is letting the AI models make their own decisions and actually carry out tasks. And so you can imagine a world where an AI agent that's very powerful, a couple versions from now, decides to start doing cyber attacks because maybe some malevolent user has told it to make money and it decides that the best way to do that is by hacking into people's crypto wallets and stealing their crypto. Yeah. So those are the kinds of reasons that I am a little more skeptical that this represents a big breakthrough. But I think it's really interesting and it did give me that feeling of, like, wow, this could get really good really fast. And if it does, the world will look very different.
Casey Newton
When we come back. Kevin, back that caboose up. It's time for the Hot Mess Express.
Kevin Roose
You know, Roose Caboose was my nickname in middle school.
Casey Newton
Kevin Roose Caboose.
Vanta Ad
Whether you're starting or scaling your company's security program, demonstrating top-notch security practices and establishing trust is more important than ever. Vanta automates compliance for SOC 2, ISO 27001 and more. With Vanta, you can streamline security reviews by automating questionnaires and demonstrating your security posture with a customer-facing Trust Center. Over 7,000 global companies use Vanta to manage risk and prove security in real time. Get $1,000 off Vanta when you go to vanta.com/hardfork. That's vanta.com/hardfork for $1,000 off.
Adobe Firefly Ad
In a world where demand for standout content is skyrocketing, creators face the challenge of keeping up. Meet Adobe Firefly, Adobe's family of generative AI models for imaging, video, design, and vector. Directly integrated into popular apps that creators know and love, like Photoshop, Premiere Pro and Adobe Express, Firefly helps creators ideate and work faster with the confidence of knowing it's designed to be safe for commercial use, since it's only trained on licensed and public domain content. Create with precision at adobe.com/firefly.
Kevin Roose
Well, Casey, we're here wearing our train conductor hats and my child's train set is on the table in front of us. Which can only mean one thing.
Casey Newton
We're going to train a large language model.
Kevin Roose
Nope, that's not what that means.
Casey Newton
What does it mean?
Kevin Roose
It's time to play a game of the Hot Mess Express.
Casey Newton
Pause for theme song. Hot Mess Express, Kevin, is our segment where we run through some of the messiest recent tech stories and deploy our official Hot Mess thermometer to tell you just how messy we think things have gotten. And Kevin, you better sit down for this one because it's been a messy week.
Kevin Roose
Sure has.
Casey Newton
So why don't we go ahead, fire up the Hot Mess Express and see what is the first story coming down.
Kevin Roose
I hear a faint chugga chugga in my headphones. Oh, it's pulling into the station. Casey, what's the first cargo that our Hot Mess Express is carrying?
Casey Newton
All right, Kevin, this first story comes to us from the New York Times and it says that Fable, a book app, has made changes after some offensive AI messages.
Kevin Roose
Now, Casey, have you ever heard of Fable, the book app?
Casey Newton
Well, not until this story, Kevin. But I am told that it is an app for sort of keeping track of what you're reading. Not unlike a Goodreads, but also for discussing what you're reading. And apparently this app also offers some AI chat.
Kevin Roose
Yeah, you can have AI sort of summarize the things that you're reading in a personalized way. And this story said that in addition to spitting out bigoted and racist language, the AI inside Fable's book app had told one reader who had just finished three books by Black authors, quote, your journey dives deep into the heart of Black narratives and transformative tales, leaving mainstream stories gasping for air. Don't forget to surface for the occasional white author. Okay. And another personalized AI summary that Fable produced told another reader that their book choices were, quote, making me wonder if you're ever in the mood for a straight, cis white man's perspective.
Casey Newton
And if you are interested in a straight cis white man's perspective, follow Kevin Roose on x.com Now, Kevin, why do we think this happened?
Kevin Roose
I don't know, Casey. This is a head scratcher for me. I mean, we know that these apps can spit out biased things. That is just sort of like part of how they are trained and part of what we know about them. I don't know what model Fable was using under the hood here, but, yeah, this seems not great.
Casey Newton
Well, it seems like we've learned a lesson that we've learned more than once before, which is that large language models are trained on the Internet, which contains near infinite racism, and for that reason, they will actually produce racism when you ask them questions. So there are mitigations that you can take against that, but it appears that in this case, they were not successful. Fable's head of community, Kim Marsh Allee, has said that all features using AI are being removed from the app, and a new app version is being submitted to the App Store. So you always hate it when the first time you hear about an app is that they added AI and it made it super racist and they have to redo the app.
Kevin Roose
Now, Casey, one more question before we move on. Do you think this poses any sort of competitive threat to Grok, which, until this story, was the leading racist AI app on the market?
Casey Newton
I do think so. And I have to admit that all the folks over at Grok are breathing a sigh of relief now that they have once again claimed the mantle.
Kevin Roose
All right, Casey, how hot is this mess?
Casey Newton
Well, Kevin, in my opinion, if your AI is so bad that you have to remove it from the app completely, that's a hot mess.
Kevin Roose
Yeah, I rate this one a hot mess. All right, next stop: Amazon pauses drone deliveries after aircraft crashed in rain. Casey, this story comes to us from Bloomberg, which had a different line of reporting than we did just a few weeks ago on the show about Amazon's drone program, Prime Air. Casey, what happened to Amazon Prime Air?
Casey Newton
Well, if you heard the episode of Hard Fork where we talked about it, Amazon Prime Air delivered us some Brazilian bum bum cream, and it did so without incident. However, Bloomberg reports that Amazon has had to now pause all of their commercial drone deliveries after two of its latest models crashed in rainy weather at a testing facility. And so the company says it is immediately suspending drone deliveries in Texas and Arizona and will now fix the aircraft software. Kevin, how did you react to this?
Kevin Roose
Well, I think it's good that they're suspending drone deliveries before they fix the software, because these things are quite heavy, Casey. I would not want one of them to fall on my head.
Casey Newton
I wouldn't either. And I have to tell you, this story gave me the worst kind of flashbacks, because in 2016, I wrote about Facebook's drone Aquila and what the company told me had been its first successful test flight in its mission to deliver Internet around the world via drone. What the company did not tell me when I was interviewing its executives, including Mark Zuckerberg, was that the plane had crashed after that first flight. And so I was...
Kevin Roose
A small detail. I'm sure it was an innocent omission from their briefing.
Casey Newton
Yes, I'm sure. Well, it was Bloomberg again who reported, you know, a couple of months after I wrote this story, that the Facebook drone had crashed. I was, of course, hugely embarrassed and, you know, wrote a bunch of stories about this. But anyways, it really should have occurred to me when we were out there watching the Amazon drone that this thing was also probably secretly crashing and we just hadn't found out about it yet. And indeed, we have now learned it was. So here is my vow to you, Kevin, as my friend and my co-host: if we ever see a company fly anything again, we have to ask them, now, did this thing actually crash? Yeah, I'm tired of being burned.
Kevin Roose
Now, Casey, we should say, according to Bloomberg, these drones reportedly crashed in December. We visited Arizona to see them in very early December. So most likely, you know, this all happened after we saw them. But I think it's a good idea to keep in mind, as we're talking about these new and experimental technologies, that many of them are still having the kinks worked out.
Casey Newton
All right, Kevin, so let's get out the thermometer. How, how hot of a mess is this?
Kevin Roose
I would say this is a moderate mess. Look, these are still testing programs. No one was hurt during these tests. I am glad that Bloomberg reported on this. I'm glad that they've suspended the deliveries. These things could be quite dangerous flying through the air. I do think it's one of a string of reported incidents with these drones. So I think they've got some quality control work ahead of them, and I hope they do it well, because I want these things to exist in the world and be safe for people around them.
Casey Newton
All right. I will agree with you and say that this is a warm mess, and hopefully they can get it straightened out over there. Let's see what else is coming down the tracks. Wow, this is some tough news. Fitbit has agreed to pay $12 million for not quickly reporting burn risk with watches. Kevin, did you hear about this?
Kevin Roose
I did. This was the Fitbit devices were like literally burning people.
Casey Newton
Yes. From 2018 to March of 2022, Fitbit received at least 174 reports globally of the lithium-ion battery in the Fitbit Ionic watch overheating, leading to 118 reported injuries, including two cases of third-degree burns and four of second-degree burns. That comes from the New York Times's Adeel Hassan. Kevin, I thought these things were just supposed to burn calories.
Kevin Roose
Well, it's like I always say: exercising is very dangerous and you should never do it. And this justifies my decision not to wear a Fitbit.
Casey Newton
To me, the biggest surprise of this story was that people were wearing Fitbits from March 2018 to 2022. I thought every Fitbit had been purchased by, like, 2011 and then put in a drawer, never to be heard from again. So what is going on with these sort of, you know, late-stage Fitbit buyers? I'd love to find out. But of course we feel terrible for everyone who was burned by a Fitbit. And it's not gonna be the last time technology burns you. I mean, realistically, that's true.
Kevin Roose
You know, it's true.
Casey Newton
Now what kind of mess is this?
Kevin Roose
I would say this is a hot mess. This is officially hot. Literally hot. They're hot.
Casey Newton
Here's my sort of rubric. If technology physically burns you, it is a hot mess. If you have physical burns on your body, what other kind of mess could it be?
Kevin Roose
It's true.
Casey Newton
That's a hot mess.
Kevin Roose
Okay, next stop on the Hot Mess Express. Google says it will change Gulf of Mexico to Gulf of America in Maps app after government updates. Casey, have you been following this story?
Casey Newton
I have, Kevin. Every morning when I wake up, I scan America's maps and I say what has been changed? And if so, has it been changed for political reasons? And this was probably one of the biggest examples of that we've seen.
Kevin Roose
Yeah. So this was an interesting story that came out in the past couple of days. Basically, after Donald Trump came out during his first days in office and said that he was changing the name of the Gulf of Mexico to the Gulf of America, and the name of Denali, the mountain in Alaska, to Mount McKinley, Google had to decide, well, when you go on Google Maps and look for those places, what should it call them? It seems to be saying that it is going to take inspiration from the Trump administration and update the names of these places in the Maps app. Yeah.
Casey Newton
And look, I don't think Google really had a choice here. We know that the company has been on Donald Trump's bad side for a while, and if it had simply refused to make these changes, it would have sort of caused a whole new controversy for them. And it is true that the company changes place names when governments change place names. Right. Like, Google Maps existed when Mount McKinley was called Mount McKinley, and President Obama changed it to Denali and Google updated the map. Now it's changed back. They're doing the same thing. But now that we know how compliant Google is, Kevin, I think there's room for Donald Trump to have a lot of fun with the company.
Kevin Roose
Yeah. What can you do?
Casey Newton
Well, he could call it the "Gulf of Gemini Isn't Very Good" and just see what would happen, because they would kind of have to just change it. Can you imagine every time you opened up Google Maps and you looked at the Gulf of Mexico, slash America, and it just said the "Gulf of Gemini Is Not Very Good"? I, you know, I hate to give Donald Trump any ideas, but I don't know, could be worth looking at. So what kind of mess do you think this is, Kevin?
Kevin Roose
I think this is a mild mess. I think this is a tempest in a teapot. I think that this is the kind of update that, you know, companies make all the time because places change names all the time. Let's just say it.
Casey Newton
Well, Kevin, I guess I would say that one is a hot mess, because if we're just going to start renaming everything on the map, that's just going to get extremely confusing for me to follow. I got places to go.
Kevin Roose
You go to, like three places.
Casey Newton
Yeah. And I use Google Maps to get there and I need them to be named the same thing that they were yesterday.
Kevin Roose
I don't think they're going to change the name of Barry's Boot Camp. All right, final stop on the Hot Mess Express. Casey, bring us home.
Casey Newton
All right, Kevin. Oh, and this is some sad news. Another Waymo was vandalized. This is from one-time Hard Fork guest Andrew J. Hawkins at The Verge. He reports that this Waymo was vandalized during an illegal street takeover near the Beverly Center in LA. Video from Fox 11 shows a crowd of people basically dismantling the driverless car piece by piece and then using the broken pieces to smash the windows. Kevin, what did you make of this?
Kevin Roose
Well, Casey, as you recall, you predicted that in 2025, Waymo would go mainstream. And I think there's no better proof that that is true than that people are turning on the Waymos and starting to beat them up.
Casey Newton
Yeah, I, you know, look, I don't know that we have heard any interviews about why these people were doing this. I don't know if we should see this as, like, a reaction against AI in general or of Waymos specifically. But I always find it, like, weird and sad when people attack Waymos, because they truly are safer cars than every other car.
Kevin Roose
Well, not if you're going to be riding in them and people are just going to start, like, beating the car. Then they're not safe.
Casey Newton
No, but, you know, that's only happened a couple of times that we're aware of, right? Yeah.
Kevin Roose
So, yeah, this story is sad to me. Obviously, people are reacting to Waymos. Maybe they have sort of fears about this technology or think it's going to take jobs, or maybe they're just pissed off and they want to break something. But don't hurt the Waymos, people, in part because they will remember.
Casey Newton
They will remember that.
Kevin Roose
They will remember and. And they will come for you.
Casey Newton
I'm not sure that that's true, but I think we should also note that Waymo became officially available in L.A. in November of last year. And so part of this just might be a reaction to the newness of it all and people getting a little carried away, just sort of curious: what will happen if we try to, you know, destroy this thing? Will it deploy defensive measures, and so on.
Kevin Roose
So they're going to have to put flamethrowers on them. I'm just calling it right now.
Casey Newton
I really hope that doesn't happen, but. Yeah. Well, what kind of mess do you think this one was?
Kevin Roose
I think this one is a lukewarm mess that has the potential to escalate. I don't want this to happen. I sincerely hope this does not happen. But I can see, as Waymos start being rolled out across the country, that some people are just going to lose their minds. Some people are going to see this as the physical embodiment of technology invading every corner of our lives. And they are just going to react in strong and occasionally destructive ways. I'm sure that Waymo has gamed this all out. I'm sure that this does not surprise them. I know that they have been asked about what happens if Waymos start getting vandalized, and they presumably have plans to deal with that, including prosecuting the people who are doing this. But yeah, I always go out of my way to try to be nice to Waymos. And in fact, some other Waymo news this week: Jane Manchun Wong, the security researcher, reported on X recently that Waymo is introducing or at least testing a tipping feature. And so I'm going to start tipping my Waymo just to make up for all the jerks in LA who are vandalizing them.
Casey Newton
It looks like the tipping feature, by the way, will be to tip a charity and that Waymo will not keep that money. At least that's what's been reported.
Kevin Roose
No, I think it's going to the Flamethrower fund. Okay, all right, Casey, that is the Hot Mess Express. Thank you for taking this journey with me.
Vanta Ad
Whether you're starting or scaling your company's security program, demonstrating top-notch security practices and establishing trust is more important than ever. Vanta automates compliance for SOC 2, ISO 27001 and more. With Vanta, you can streamline security reviews by automating questionnaires and demonstrating your security posture with a customer-facing Trust Center. Over 7,000 global companies use Vanta to manage risk and prove security in real time. Get $1,000 off Vanta when you go to vanta.com/hardfork. That's vanta.com/hardfork for $1,000 off.
Adobe Firefly Ad
In a world where demand for standout content is skyrocketing, creators face the challenge of keeping up. Meet Adobe Firefly, Adobe's family of generative AI models for imaging, video, design and vector. Directly integrated into popular apps that creators know and love, like Photoshop, Premiere Pro and Adobe Express, Firefly helps creators ideate and work faster with the confidence of knowing it's designed to be safe for commercial use, since it's only trained on licensed and public domain content. Create with precision at adobe.com/firefly.
Casey Newton
Hard Fork is produced by Rachel Cohn and Whitney Jones. We're edited this week by Rachel Dry and fact-checked by Ena Alvarado. Today's show was engineered by Dan Powell. Original music by Diane Wong and Dan Powell. Our executive producer is Jen Poyant. Our audience editor is Nell Gallogly. Video production by Ryan Manning and Chris Schott. You can watch this whole episode on YouTube at YouTube.com. Special thanks to Paula Szuchman, Pui-Wing Tam, Dalia Haddad and Jeffrey Miranda. You can email us at hardfork@nytimes.com with what you're calling the Gulf of Mexico.
Indeed Ad
You just realized your business needed to hire someone yesterday. How can you find amazing candidates fast? Easy. Just use Indeed. Join the 3.5 million employers worldwide that use Indeed to hire great talent fast. There's no need to wait any longer. Speed up your hiring right now with Indeed, and listeners of this show will get a $75 sponsored job credit to get your jobs more visibility at indeed.com/nyt. Just go to indeed.com/nyt right now and support our show by saying you heard about Indeed on this podcast. Indeed.com/nyt. Terms and conditions apply. Hiring? Indeed is all you need.
Hard Fork Podcast Episode Summary
Title: DeepSeek DeepDive + Hands-On With Operator + Hot Mess Express!
Hosts: Kevin Roose and Casey Newton
Release Date: January 31, 2025
Produced by: The New York Times
In this episode of Hard Fork, Kevin Roose and Casey Newton delve into the rapidly evolving landscape of artificial intelligence (AI), focusing on the intriguing developments surrounding DeepSeek, a burgeoning Chinese AI startup. They also explore OpenAI's latest innovation, Operator, an AI agent designed to assist users with various tasks. The episode culminates with their signature segment, Hot Mess Express, where they dissect recent tech mishaps and controversies.
The Rise of DeepSeek
The episode begins with a deep dive into DeepSeek, a relatively new Chinese AI startup that has made significant waves in the tech community. DeepSeek recently released highly capable and affordable AI models, garnering widespread attention and substantial downloads in the United States. Kevin Roose remarks, "Some people are saying this is the biggest thing to happen in AI since the release of ChatGPT" (02:20).
Market Impact and Controversies
Casey Newton outlines three major developments surrounding DeepSeek:
Massive Downloads: A market research firm reported that DeepSeek was downloaded 1.9 million times on iOS and 1.2 million times on the Google Play Store in recent days. This staggering number underscores the app's rapid adoption and the growing interest in alternative AI models outside the dominant Western frameworks.
Government Bans: DeepSeek has faced bans from the US Navy due to security concerns, raising alarms about potential vulnerabilities or misuse of the technology. Additionally, Italy's data protection regulator inquired into DeepSeek, leading to its ban in the country. This international scrutiny highlights the geopolitical tensions and regulatory challenges that Chinese tech companies often encounter in the West.
Alleged Model Distillation: OpenAI has accused DeepSeek of distilling its models, a process that involves using OpenAI's API to replicate and potentially exploit their data without authorization. In response, Microsoft and OpenAI are jointly investigating whether DeepSeek abused their API, signaling serious implications for data privacy and corporate espionage in the AI sector.
Kevin Roose empathizes with OpenAI's predicament, stating, "Yeah, must be really hard to think that someone might be out there training AI models on your data without permission" (04:01).
Geopolitical Implications
To provide deeper insights, the hosts bring in Jordan Schneider, founder and editor-in-chief of ChinaTalk. Schneider emphasizes that DeepSeek operates differently from other Chinese tech giants like Alibaba or Tencent. Unlike these established companies, DeepSeek was birthed from a successful quant hedge fund, allowing it a unique organizational structure that fosters innovation without an immediate profit motive. This independence has enabled DeepSeek to produce remarkable AI advancements that have captured global attention.
Schneider speculates, "I really think that they do have this like, vision of AGI and like, look, we'll build it and we'll make it cheaper for everyone, you know, we'll figure it out later" (10:02). He further discusses the potential pressures DeepSeek might face as it becomes a national champion in China, inevitably drawing more governmental oversight and contractual obligations that could impede its broader mission.
Relationship with the Chinese Government
Addressing concerns about DeepSeek's ties to the Chinese government, Schneider clarifies that DeepSeek maintains a more nuanced relationship compared to other tech firms. While companies like ByteDance and Didi have had to align closely with governmental directives, DeepSeek is now likely to encounter increased government interaction, which could limit its operational freedom and innovation potential.
Casey Newton reflects on the situation, noting, "DeepSeek thus far has flown under the radar, but that is no longer the case, and things are about to change for them" (14:22).
Technological Advancements and Competitiveness
The discussion shifts to the technological prowess of DeepSeek's models, particularly the R1 chatbot, which has astonished users with its sophistication and efficiency. However, both hosts and Schneider maintain a measured outlook. While DeepSeek's rapid progress is impressive, the long-term competition between Chinese and Western AI hinges not just on model development but also on the deployment and scalability of these technologies. Schneider underscores the importance of compute power, stating, "Compute access is going to be a core input regardless of how much model distillation you're going to have in the future" (19:38).
Introduction to Operator
Transitioning from DeepSeek, Kevin and Casey explore OpenAI's latest offering, Operator—a sophisticated AI agent integrated within ChatGPT. Operator aims to act as a virtual coworker, capable of performing tasks autonomously by navigating web interfaces and leveraging partnerships with platforms like OpenTable, StubHub, and Allrecipes.
Functionality and User Experience
Casey Newton provides a hands-on account of using Operator to perform tasks such as booking walking tours in London and purchasing groceries via Instacart. While Operator showcased impressive capabilities by autonomously opening browsers, navigating websites, and compiling information, it also demonstrated notable limitations. For instance, attempting to purchase groceries revealed issues with location defaults and the necessity for user intervention to complete transactions.
Kevin Roose adds his experience, highlighting Operator's ability to handle multi-step projects like buying a domain name and setting up a web server. However, he also points out areas where Operator fell short, such as restricted access to certain websites (e.g., Reddit, YouTube) and requiring manual input for sensitive information like payment details.
Technical Insights and Future Potential
The hosts delve into the technical aspects of Operator, noting its ability to interact with websites without relying on APIs, thereby offering a more general-purpose agent. They discuss the potential for rapid improvement, as illustrated by the increase in Operator's performance on benchmarks like OSWorld, from 14.9% to 38.1% within three months.
Casey posits, "If [Operator] continues to improve at the same rate, you're going to have a computer that is very good at using itself" (47:41). However, both hosts express skepticism regarding the practicality and ethical implications of such autonomous agents. Concerns include the economic impact on web-based advertising models and the potential for misuse in activities like cyberattacks.
Ethical and Economic Considerations
Kevin raises a critical point about the sustainability of internet business models, which rely heavily on human interaction with ads and content. If AI agents like Operator become prevalent, they could disrupt these models by interacting with websites in non-human ways, potentially leading to a decline in ad revenue and necessitating new strategies for online businesses.
Casey emphasizes the balance between innovation and ethical considerations, stating, "There is so many privacy and security risks that would come from entrusting an agent with that kind of information." They conclude that while Operator represents a significant technological milestone, its broader societal and economic impacts remain uncertain.
In their concluding segment, Hot Mess Express, Kevin and Casey tackle a series of recent tech-related mishaps and controversies, assigning each a "Hot Mess" rating based on severity.
Fable's Offensive AI Messages
Amazon Pauses Drone Deliveries
Fitbit Battery Overheating Incidents
Google Maps Renames Gulf of Mexico
Waymo's Vandalized Driverless Cars
This episode of Hard Fork provides an insightful exploration of the current AI landscape, highlighting both groundbreaking advancements and significant challenges. The discussion on DeepSeek underscores the intricate interplay between technological innovation and geopolitical dynamics, while the hands-on examination of Operator reveals the potential and pitfalls of AI agents in everyday tasks. The Hot Mess Express segment serves as a cautionary tale of the unforeseen consequences that can arise alongside rapid technological progress.
As AI continues to evolve, the conversation between Kevin Roose and Casey Newton emphasizes the need for balanced perspectives that consider both the transformative benefits and the ethical, economic, and societal implications of these emerging technologies.
For quick reference, notable quotes with their corresponding timestamps are embedded within the summary.
Hard Fork continues to serve as a vital platform for dissecting the ever-expanding realm of technology. By blending in-depth analysis with relatable anecdotes and humor, Kevin Roose and Casey Newton provide listeners with a comprehensive understanding of complex tech issues, ensuring that even those who haven't listened to the episode can grasp the key discussions and insights.