
Invesco QQQ
Over the last two decades, the world has witnessed incredible progress: from dial-up modems to 5G connectivity, from massive PC towers to AI-enabled microchips. Innovators are rethinking possibilities every day. Through it all, Invesco QQQ ETF has provided investors access to the world of innovation with a single investment. Invesco QQQ. Let's rethink possibility. There are risks when investing in ETFs, including possible loss of money. ETFs' risks are similar to those of stocks. Investments in the tech sector are subject to greater risk and more volatility than more diversified investments. Before investing, carefully read and consider fund investment objectives, risks, charges, expenses and more in the prospectus at invesco.com. Invesco Distributors, Incorporated.
Kevin Roose
I had a first this week.
Casey Newton
What's that?
Kevin Roose
I had my first experience with smelling salts.
Casey Newton
Wait, did you faint?
Kevin Roose
Yes. So I had to get a blood draw at the doctor, and I am a big baby when it comes to getting blood taken. About half the time when I get blood taken, I pass out. And this time I not only passed out, but I vomited and had to be brought back with smelling salts, which, Casey, if you have never experienced smelling salts: they're not messing around.
Casey Newton
They are not. Okay, so I cannot believe that I'm just learning this information about you, because I am also a fainter.
Kevin Roose
You're a fainter.
Casey Newton
I am a fainter.
Kevin Roose
We are Legion.
Casey Newton
In 12th grade, we went to go see a cadaver for my AP Biology class. And intellectually, I was, like, so fascinated by all the systems of the body. So me and all the other kids are standing around the cadaver, and the person is sort of explaining, well, you know, this is the liver and this is the spleen. And then I just got a whiff of something. I don't know if it was embalming fluid or formaldehyde or something, but something triggered in my brain and said, this is against nature. You should not be this close to an opened-up dead body. And I truly spun around and took a header off a whiteboard that was against the wall. And I wake up, and I'm staring at the ceiling, and the first thing I hear is my AP Bio teacher, Ms. Oliver, saying, do we have an emergency contact for this kid? Obviously, I don't want to tell people that they should faint, but it is one of the most amazing experiences you can have as a human. Do you know what I mean? The moment when your consciousness just sort of leaves you.
Kevin Roose
Yes.
Casey Newton
Crazy.
Kevin Roose
And when I was brought back with the smelling salts, it felt very like Victorian, you know, like I was like, no, the vapors.
Casey Newton
The vapors.
Kevin Roose
Call Mr. Darcy. I'm Kevin Roose, a tech columnist at the New York Times.
Casey Newton
I'm Casey Newton from Platformer, and this is Hard Fork. This week: give me five! GPT5. We'll tell you all about OpenAI's latest frontier model. Then Kevin and I get access to the new Alexa Plus. We found a few minuses, and we're bringing in Amazon's VP of Alexa to talk about it. Alexa, prepare my interview questions.
Kevin Roose
Well, Casey, it has been another busy week in the world of AI.
Casey Newton
Boy, has it.
Kevin Roose
And because we are going to talk about OpenAI, I should add my disclosure that the New York Times company is suing them and Microsoft for copyright violations.
Casey Newton
And my boyfriend works at Anthropic.
Kevin Roose
So we've gotten a bunch of new AI releases and announcements this week. We're not going to go through all of them, but some of the highlights: we got something called a world model from Google DeepMind. Genie 3 is this kind of interactive game engine where you can just describe a game that you want to play and it can build it in real time. Pretty cool. We can't use it yet; that was just a demo, or research preview. But that was early in the week. And then we got a new Claude version: Opus 4.1 is out, so I've been playing around with that. Not too different, but a newer update from them. We also got open source models from OpenAI, putting the open back in their name. They released two open source models this week. And Casey, have you played around with either of those?
Casey Newton
I have not yet downloaded them. Have you?
Kevin Roose
I have not. One of them is apparently small enough that you can run it on a MacBook. The other one, you kind of need a dedicated GPU for. But these are basically OpenAI's first open source models since GPT2, many years ago. People have been hounding them: you guys are betraying the founding spirit of OpenAI by not making these things open and accessible through open source. And they said, well, here you go, here are some models. They're not their top-of-the-line models, but people are finding various uses for them. And this is designed to compete with the open source models coming out of China from companies like DeepSeek.
Casey Newton
Yeah. And the early word on these is that they're pretty good, and that they're competitive with O3 mini and O4 mini, which are proprietary models. But yeah, the early reviews I was reading of the open source models were that they were powerful and good.
Kevin Roose
Yeah, so those are some of the announcements that we got earlier in the week. But the big one is that this week OpenAI released GPT5, their long-awaited flagship model. People have been asking Sam Altman about this, including us, for many months now. There were lots of hype and rumors flying around about it, and we just got off a press briefing, a Zoom call with Sam Altman and some of the other leaders of OpenAI. And Casey, what did we learn?
Casey Newton
Well, it probably won't surprise people to learn that what they told us during this briefing was that GPT5 is their best model ever. You know, Sam Altman in his remarks said that this is a major upgrade. He called it a significant step along the path to AGI, but he also said that we're not at AGI yet. Among other things, he said, look, this model does not continuously learn, and in his view, AGI will continuously learn. So I actually thought it was cool that he said that, because now we have one thing to hang on to: maybe when a model can do that, we'll feel like we really are getting close to AGI. But the one other thing he said that struck me, that I thought was kind of funny, was that after they had put GPT5 together, he went back to using GPT4, and he said, quote, it was quite miserable. He said he never wants to go back to using GPT4 ever again. That's how good he says GPT5 is, Kevin.
Kevin Roose
Yeah, so he compared it to the previous models. He said GPT3 felt like talking to a high school student, GPT4 felt like talking to a college student, and GPT5, he said, is the first time it feels like talking to an expert, someone who has a PhD in a subject. I think we should caveat all this by saying that as of our taping this week, GPT5 had not yet been rolled out, and we hadn't been able to put it through its paces. But it will be rolled out this week, including to free users of ChatGPT, who have not previously had access to the top-of-the-line models.
Casey Newton
Yeah, and I think that's important, because OpenAI's best models at the moment have been reserved for paying users. The chatbot that I use the most is O3, which is a reasoning model that OpenAI makes that's not accessible to people on the free plan. So I do think it's really notable that even free users, which I think is going to include a lot of high school and college students out there, are now going to have access to what, at least they are saying, is PhD-level intelligence and reasoning.
Kevin Roose
Yeah, so one of the most annoying features of ChatGPT for years now has been this model selector, where you go in and it gives you this little drop-down menu, and it defaults to GPT4o now. But you can pick your own model if you want something more powerful than that and you're a paying user. For GPT5, OpenAI is doing away with the model picker, or at least making it less necessary, because they have built what they called a router that will essentially analyze your request, how much computation it needs, and whether it's a simple query or something more involved, and direct it to the correct model. And I think for a lot of people, this is going to be their first experience with a reasoning model, because OpenAI does not make that the default right now in ChatGPT. So I think that will be a big update for people. Regardless of whether GPT5 is actually better than previous models, just the ability to use these reasoning models for free seems like a pretty big deal.
Casey Newton
Yeah, getting rid of the model picker could cut both ways. And we should say that all of the big labs have a model picker: Gemini has one, and Anthropic has one in Claude. Sometimes I will ask an easy question of one of these models that is set to reasoning mode, and then I'll think, oh gosh, I probably didn't need that much computational power. On the other hand, I do feel like it sets up an incentive for OpenAI, which wants to save as much money as it can, to always try to route you to the absolute least compute that you need. So I will be curious to see whether that affects the quality of my experience, now that I maybe can't go in and actually say, hey, let me use the good stuff.
Kevin Roose
Yeah. So on this briefing, OpenAI said all the expected things about how GPT5 is, you know, better at everything than previous models. But they also spent a lot of time talking about what they called the vibes of the model, which they believe are quite good.
Casey Newton
They also gave a series of demos, and one that I thought was interesting introduced this concept of what I believe Sam Altman called software on demand. So GPT5 can instantaneously create a piece of software for you. In the demo that we saw, one of the employees there built a tool to let his girlfriend learn French. And it did this in some fun ways. In one case, it created a series of flashcards for her. In another, it created a little snake game with a mouse and a piece of cheese, so every time the mouse caught a piece of cheese, it would show her a new word to learn. And he was able to do all of that just via a text-based prompt. And it actually looked pretty good. I mean, I think five years ago, if you turned that in in an Intro to Computer Programming class, you probably would have gotten an A.
Kevin Roose
Yeah, it's pretty impressive. But those are also things that other models can do today. So I'm going to need to really drive this thing myself to figure out what it can do, and I'm going to put it through my usual bevy of tests, known as RooseBench, and see how it does. I confess during this briefing I zoned out a little bit. I've been to a bunch of these. Everyone says their model is the latest and greatest, and it's so good at coding, and it's got all these agentic capabilities, and it all starts to sound a little bit like marketing hype to me. For me, the interesting question to ask about new models these days is not how much better is it, or how does it score on these benchmarks. It's: what is possible for me now that wasn't before? And I still don't have a really good answer to that from GPT5, although I'm going to investigate.
Casey Newton
Yeah, and that might be a good point to bring up two of the questions that got asked of the GPT5 team during the briefing we were on that I think would be of interest to our listeners. One was: hey, are you starting to run into the limits here? Are the scaling laws holding? And Sam Altman said, quote, they absolutely still hold, and we keep finding new dimensions to scale on. He said that they're still finding new paradigms that will let them scale in new ways. So he very much tried to give the impression that, no, we are not struggling at all to figure out how to build better models. I suspect, though, that that might get some pushback as people start to use this thing and observe that, yes, it is clearly better in a handful of ways. But to your point, Kevin, can you really do anything that you couldn't do before? It doesn't seem like that's the case; it's just that it can do what it used to do a little bit better. I'm curious what you made of that.
Kevin Roose
Yeah, I think that's reasonable. I mean, I wonder if they are starting to finesse their definitions of the scaling laws to account for the new reasoning models. Because people for months now have been saying, well, the models may have gotten about as good as they're going to get in the pre-training phase, but the way to get them to be more intelligent is through the post-training phase: through these reinforcement learning cycles, these reasoning environments that they're trained on and put into. So I suspect that when they say the scaling laws have not broken, they're also referring to this reinforcement learning, reasoning approach. And I believe them. I've talked to people who say they think there's still a long way to go on that. But this was a really big model. We don't know exactly how big; we don't know exactly how many GPUs it was trained on or how much data it was fed. But it's safe to assume that they did everything they could to max out the scale of the model, at least in pre-training. And from what we saw in the demos, it doesn't look like it's that much smarter. Maybe it's a little better at some things, but it did not come out of the box superintelligent or anything like that.
Casey Newton
Yeah. The other big question that I'm always interested in when these big new models come out is: what was the safety testing experience like? Is this model going to be sycophantic, and what sorts of very intense relationships are people going to form with it? Nick Turley addressed that one. He noted that earlier this week, OpenAI put out a blog post, which I actually wrote about in Platformer this week, that is all about their approach here. They say they're working with physicians, really trying to bring in a lot of outside expertise to help them understand how people are interacting with these models and make them safer. And he said that they are absolutely not optimizing for engagement here; they just want to make a useful tool that sends you on your way, and essentially they're going to have more to communicate about this soon. So we didn't get a ton of detail there, but they have said that, at least in some ways, they think they have improved these models to make them less sycophantic. In addition to that, they said they did 5,000 hours of red teaming, and I believe they shared these models with some external experts for advice on that. They also said they rate this model as high on the scale of whether it could be used to create novel biorisks, so they're building in a bunch of protections around that. That doesn't seem great, but anyway, that was the sort of safety report we got in advance of the launch.
Kevin Roose
So they also said that in addition to all these new capabilities, GPT5 is much more reliable than previous models. They claim it hallucinates less, and it does this interesting thing called safe completions, where basically, if the model doesn't want to accommodate some request or carry out some task because it's against the guidelines, instead of just refusing, it will complete a safer version of the request instead. So it'll be interesting to see how people use that. But yes, this is the claim they make: it's more reliable, less deceptive, and gives these safe completions.
Casey Newton
Well, that actually gets into something interesting, though, Kevin, which is: what is it that makes OpenAI say this is GPT5? The big number releases have this amazing marketing power now, I think, because the leap from GPT2 to GPT3 was so big, and the leap from 3 to 4 was also pretty big. And so that creates a lot of expectations for 5. But in the background, OpenAI is just trying a bunch of things, building a bunch of new models and then stapling various things together. And eventually they get to something and say, we're going to call this one 5. But it's not quite as linear as it looks from the outside, right?
Kevin Roose
Yes. And all of the labs have had experiences where they thought they were training a new model, and then it didn't work out as well as they wanted it to, and so they assigned it some lower number. That's happened at a number of big labs that I know about, including OpenAI. They had a previous big model that they were building, which ended up becoming 4.5; I believe it was supposed to be GPT5 at one point, and it just didn't turn out as well as they'd wanted. So yes, they are playing games with the numbering of the models and the marketing around that. But I think calling this GPT5 signals that they want it to be viewed as a step similar to the one people saw from GPT3 to GPT4.
Casey Newton
Yeah. To me, that is one of the most interesting things about this release: whatever GPT5 turns out to be, this is the thing that they thought was the next big step forward, and I think we should evaluate it on those grounds.
Kevin Roose
I think the big picture here is that OpenAI is trying really, really hard to stay at the head of the pack. This is a company that has been racing very hard toward AGI, or something that it can claim is AGI, and they are still going. I actually find their execution to be quite impressive. This is now a very large company; they've got a lot of different competing teams and priorities, and they had all this board drama. I think it was reasonable to expect, and I certainly expected, that in the wake of all that they would slow down and maybe allow some competitors to catch up. But they showed this week that they are not slowing down, that they are in fact accelerating, and they want to get there before anyone else.
Casey Newton
True. Although they have also experienced a lot of poaching in recent weeks and months. One thing I'll have my eye on over the next several months is: are they able to continue iterating very quickly, or have some of the losses they've experienced over the past few weeks really hurt them? You know, incidentally, Kevin, I'm told that in response to the GPT5 launch, inside Meta headquarters the superintelligence researchers have moved their desks even closer to Mark Zuckerberg. So that's how seriously they're starting to take this over there.
Kevin Roose
They are now sitting on top of Mark.
Casey Newton
There are now two researchers who are sitting at Mark Zuckerberg's desk with him. And we'll have to see how that plays out.
Kevin Roose
Yeah, those are some of our initial impressions, but we are going to come back tomorrow, after we have had a little time to play with the model, and give some first impressions there, too.
Casey Newton
Let's travel to the future now, Kevin.
Kevin Roose
All right, Casey. It is now Thursday. GPT5 has been officially released for a few hours. I still do not have access to it for some reason, but I gather that you do. So give me your day one vibe check. What are you seeing? What is GPT5 like and what do you make of the reaction to it?
Casey Newton
Well, this is a very significant moment in the history of Hard Fork, Kevin, because for the first time I'm having a conversation with you while vibe coding something.
Kevin Roose
What are you vibe coding?
Casey Newton
Well, ChatGPT is currently hard at work building a to-do list app for me. I said I wanted it to have the aesthetic of The Fantastic Four: First Steps, the movie that just came out. I didn't love the movie, but I did love the production design. So I was like, make me a to-do list app that looks like that. Let's see how it goes.
Kevin Roose
I can't believe we're building giant gigawatt data centers for your stupid to-do apps. This is so wasteful. God.
Casey Newton
Listen, you need to send over RooseBench, your proprietary suite of evals, so I can really put this thing through its paces. But look, let me give you some high-level notes, Kevin, on what I'm seeing and on what others are seeing. The headline here is that this does seem like a really meaningful improvement to ChatGPT. And I think in particular, if you are a free user of ChatGPT, you're going to have a great day, because for the first time now, in addition to the standard ChatGPT model, you're going to have some reasoning capability. So essentially, if you're cheating your way through high school, you're just going to have a lot easier time of it now, because this thing can do some really extended work on long problems.
Kevin Roose
Yes, you can now cheat your way through an entire semester with just one press of a button. Exactly. This is good.
Casey Newton
Yeah.
Kevin Roose
I mean, the way I saw people talking about it online was that OpenAI had not raised the ceiling of the AI frontier by a lot with GPT5, but they had raised the floor. All the free users who previously got defaulted into the less powerful models are now going to be using the more powerful ones, which could be a big perceptual shift, even if not a shift in frontier capabilities.
Casey Newton
And I do think it has some things that are not quite capabilities but will still meaningfully affect how people use these AI systems. For example, this thing really is just a lot faster than its predecessor. Over the past couple of hours, I took some editing work that I sometimes ask ChatGPT to do. I know about how long it takes using the O3 model. I put it through GPT5, and sure enough, it blazed through it. It did just as good a job as it had done before. So if you're the sort of person who's using ChatGPT a lot, I think that's really going to stand out to you.
Kevin Roose
Yeah. What about the pricing? I saw some people saying that GPT5 was much cheaper than they expected it to be. That's not cheaper for the ChatGPT subscriber; those subscription prices are staying the same. But for developers who are building on top of it, my understanding is it's a lot cheaper than models from other AI labs.
Casey Newton
That's right. It came in at $1.25 per 1 million input tokens, which is the same as Google's Gemini 2.5 Pro. Google, of course, has also been pricing really aggressively to try to box out the competition. What makes that figure interesting, I think, Kevin, is that it is a lot smaller than Anthropic's Claude Opus 4 API, which comes in at $15 per million input tokens. So I think some of these really well-capitalized AI labs are taking this moment to say, hey, we're going to put a lot of pricing pressure on some of our competitors.
Kevin Roose
Yeah, it very much reminds me of the moment, like 10 years ago, when Ubers were $4 because the venture capitalists were subsidizing artificially cheap prices. We're in sort of that moment for AI tokens now.
Casey Newton
Yeah.
Kevin Roose
What else can we say about GPT5, now that it has been out for a couple of hours?
Casey Newton
Yeah, well, you know, I sometimes like to joke that the worst insult you can pay anyone who has just released a new AI model is: my timelines are now longer. And it does seem like that is something people are saying about the new GPT5. What they mean by that is: I now think it's going to take a little bit longer until we reach AGI, some sort of very, very powerful AI system. In fact, some people were posting screenshots online of prediction markets that, until today, when asked who will have the most powerful AI model at the end of August, were showing OpenAI in the lead. And almost instantaneously after the live stream on Thursday, OpenAI collapsed, and Google ascended and is now assumed to have the best model by the end of this month. So I don't want to overstate what that means, necessarily, but it does seem like there was a huge contingent of people who thought that GPT5 was going to be this revolutionary new model, and it seems instead like a more evolutionary one.
Kevin Roose
Yeah, that makes a lot of sense to me. One other thing that stuck out to me, and I wonder if it stuck out to you too: OpenAI released some benchmarks and some data about GPT5, and one of the things they showed was that hallucinations, the rate of GPT5 just making stuff up while answering questions, have gone way down. For some types of questions, it's now around a 1 percent hallucination rate. And that was interesting to me, because this was clearly a problem with earlier versions. In fact, there was some speculation, and some indication, that these newer reasoning models were hallucinating at higher rates than the previous generation of models, and there was a lot of concern about that. It seems like they have figured out a way to get the hallucinations under control with GPT5, although, as with everything, I don't totally trust these benchmarks. I'm going to have to see it for myself.
Casey Newton
Yeah, everyone's mileage is going to vary on this one. I will say I have already caught it hallucinating a couple of times, somewhat disappointingly. As always, don't trust these things for anything mission critical. You're always going to want to double-check your facts.
Kevin Roose
Yeah, only use it to build stupid to-do apps with the Fantastic Four aesthetic.
Casey Newton
The Fantastic Four have a very cool aesthetic and I think you need to open up your mind a little bit.
Kevin Roose
Okay, that is our day one vibe check of GPT5, and we will continue to play around with this and tell you anything cool or interesting or strange or upsetting that we find.
Casey Newton
Sounds good.
Kevin Roose
All right, that's enough about GPT5. When we come back, we'll talk about another AI system we got our hands on this week: Alexa Plus.
Indeed
Why do tech leaders trust Indeed to help them find game-changing candidates? Because they know that it takes an innovator to find innovators. When it comes to hiring, Indeed is paving the way. Indeed's AI-powered solutions analyze information from millions of job seeker data points to match potential candidates to employers' jobs. You'll find quality matches faster than ever, meaning less time hiring and more time innovating. Learn more at Indeed.com/hire.
MultiCare
Our state has changed a lot in the last 140 years. We know, because MultiCare has been here, guided by a single mission: making our communities healthier. That comes from making courageous decisions, partnering with local communities to grow programs and services, and expanding healthcare access to those who need it most. Together, we're building a healthier future. Learn more at multicare.org.
The New York Times
The New York Times app has all this stuff that you may not have seen. The way the tabs are at the top with all of the different sections. I can immediately navigate to something that matches what I'm feeling. I go to Games always, doing the Mini, doing the Wordle. I loved how much content it exposed me to, things that I never would have thought to turn to a news app for. This app is essential. The New York Times app. All of the Times, all in one place. Download it now at nytimes.com/app.
Kevin Roose
Now, Casey, are you an Alexa user?
Casey Newton
I have been an Alexa user for a long time. I still have one of the original Amazon Echoes in my house and to Amazon's credit, it still works.
Kevin Roose
The Pringles can.
Casey Newton
Yeah, I have the big old Pringles-can Echo.
Kevin Roose
Yeah, me too. I am a heavy user of this product; I have probably five of them in my house, in various rooms. So I'm very excited for our conversation today, which is going to be about the new AI-ified Alexa Plus. And before we get into our experiences using this thing, and our interview with the guy who runs it, we should make a couple of disclosures. One is that the New York Times company has recently agreed to a licensing deal with Amazon that will allow Amazon access to Times content for its AI platforms, including Alexa. We have nothing to do with that, obviously, but it is going on in the background in another part of the company. The second thing we should say is that if you have an Alexa device, it is going to be going off constantly during this segment unless you go over right now and hit the little button that mutes it. So sorry in advance to Alexa owners, but we'll give you a little time right now to pause this, go over, hit the mute button on your Alexa, and come back.
Casey Newton
Or alternatively, just find the circuit breaker at whatever house you're in right now, shut them all off, and run on battery power for the rest of this episode.
Kevin Roose
Alexa, order 14 bags of dog food. I wonder if that actually works.
Casey Newton
Wait. And I should also probably disclose that my boyfriend works for Anthropic, because I'm pretty sure that Anthropic is providing APIs that are being used in Alexa.
Kevin Roose
Wow, we've got so many disclosures today.
Casey Newton
Yeah.
Kevin Roose
Okay. All right, let's get started. So Alexa is one of the most puzzling technology products that I have ever encountered. And like you, I have been an Alexa user since the very early days. People don't realize this product was released in 2014; Alexa is 11 years old.
Casey Newton
Yeah.
Kevin Roose
And when it came out, I was very excited. I thought, I'm going to put this smart speaker in my house and I'm going to ask it to do things for me, and it's going to be like having a little assistant right there on my kitchen counter. And Alexa has added dozens of features, maybe hundreds, since 2014. And I use zero of them, because the three things that I use Alexa for are setting timers, choosing music to play in my house, and telling me the weather before I leave for the day.
Casey Newton
Absolutely.
Kevin Roose
Are those like, similar to what you use?
Casey Newton
Those are the exact three things that I use Alexa for. Have I tried to use it for other things? Yes, but the experience, frankly, has just never been that great. So I always come back to those three.
Kevin Roose
Yes, those are the big three in my house. Same use cases, same limitations. But when generative AI started to get good a couple years ago, I think people naturally started to ask, well, when is Alexa going to start using this new generative AI technology?
Kevin Roose
It's sort of built on this older, more deterministic kind of system. But it seemed like a natural thing to expect that Alexa would start to incorporate some of this technology to be able to answer maybe more open-ended questions, to give longer, more detailed responses, to do more than just set timers and tell you the weather.
Casey Newton
Yeah. I mean, once OpenAI released voice mode for ChatGPT, it just immediately seemed so much more interesting and powerful than Alexa and Siri, which is Apple's very similar system that it makes for its devices. And so, yeah, I think both of us were like, okay, well, when are we going to get that OpenAI-style voice mode in these smart devices that we have in our homes? Yeah.
Kevin Roose
And so it's taken a while. We should say that it has not been a smooth or simple process. And part of what I'm so excited to talk with Daniel Rausch, the VP of Alexa, about later in this show is just why it's been so hard to sort of shove an LLM-based generative AI technology into this preexisting assistant product. But let's just talk briefly about what Alexa Plus is and then our experiences with it, because both you and I have gotten to try this over the past few days.
Casey Newton
Yeah. So, Kevin, tell us a little bit about Alexa Plus.
Kevin Roose
So Alexa Plus is the name for the most recent overhaul of the Alexa virtual assistant. It's powered by generative AI. We don't know exactly which model or models, but it seems to be sort of a mix of Amazon's proprietary AI models and then maybe some of Claude; Amazon has a deal with Anthropic and has been using Claude inside of its AI products for a number of months now. And Amazon claims that the new Alexa Plus is able to do much more. It's able to be much more conversational, more personalized. It can do things like book reservations at a restaurant or order you an Uber. It can answer questions that aren't just pure lookups, where you're looking for, you know, what time is the baseball game tonight. It can actually do more complex things for you. It can control the smart devices and appliances in your house, and it can purchase things for you online. So this new Alexa is not out to everyone yet. They've been rolling it out slowly. They are now in what they call the early access period. But we were able to get this on some new devices that we ordered. It also doesn't work on every kind of Echo device. You have to have sort of one of the newer ones to be able to run it.
Casey Newton
And, Kevin, when you say that this has been rolling out slowly: it has been rolling out extremely slowly. It was only on June 23 that Amazon said that 1 million people had Alexa Plus, across presumably hundreds of millions of Echo devices out there. Yeah.
Kevin Roose
So you and I both got the new Echoes that can run the Alexa Plus early access program and turned it on and set it up. And a few things stick out to me right away. One is the voice on this new Alexa is just way better than the old one.
Casey Newton
Yes, I would agree with that.
Kevin Roose
It is way more fluid. It sounds more like something you'd hear out of ChatGPT voice mode. They have managed to sort of overhaul the actual voice part of the voice assistant, so it sounds much more like a human.
Casey Newton
And there are a bunch of different voices. I think I saw eight of them inside the app. Half of them are masculine, half of them are feminine. So, yeah, you can change that to your liking.
Kevin Roose
Yeah. The other big difference I noticed right away is that the new Alexa Plus does not require you to say the wake word, like "Alexa," between every question-and-answer pair. Right. With the old Alexa, if you wanted to ask a follow-up question, you had to say "Alexa" again. With the new one, you can just kind of leave it, and it will sort of intuit or pick up that you have a follow-up question, and it will listen for a while longer, so you can actually have these more extended, multi-turn conversations.
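That follow-up behavior can be modeled as a short listening window that stays open after each response; here is a minimal sketch, with a purely invented timeout (Amazon's actual timings and implementation are not public):

```python
import time

# Illustrative only: after the assistant responds, keep a short window
# open during which the user can speak without saying the wake word.
FOLLOW_UP_WINDOW = 8.0  # seconds, an invented value

class WakeWordSession:
    def __init__(self):
        self.last_response_at = None  # no conversation has happened yet

    def respond(self):
        # Called whenever the assistant finishes answering a question.
        self.last_response_at = time.monotonic()

    def needs_wake_word(self) -> bool:
        # Require the wake word unless we're inside the follow-up window.
        if self.last_response_at is None:
            return True
        return time.monotonic() - self.last_response_at > FOLLOW_UP_WINDOW

session = WakeWordSession()
print(session.needs_wake_word())  # True: nothing has been said yet
session.respond()
print(session.needs_wake_word())  # False: follow-up window is open
```

The real system presumably gates this on voice activity detection as well, but the core idea is just a timer that resets on each assistant turn.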
Casey Newton
Yeah. And that lets it do different kinds of things. Like, one of the first things that I did with Alexa Plus was, it said, hey, would you like to try to solve a riddle? And I thought, what are you, the Sphinx? But I said, well, sure, what the heck? And, you know, it gave me a series of clues, and within, you know, three clues, Kevin, I was actually able to solve the riddle.
Kevin Roose
Wow, good for you.
Casey Newton
I felt really smart.
Kevin Roose
I'm so proud of you. Thank you.
Casey Newton
Thank you so much. So yeah, what else were you doing with this thing?
Kevin Roose
So another thing it can do is just give you longer answers. Like the original Alexa was sort of limited to a sentence or two. Maybe you could ask it to look something up on Wikipedia and it would sort of spit out a few sentences. But it was really limited beyond that. But I can now ask it to make up a story and read it to my kids. So we had some fun doing that the other night as a family. You can ask it to suggest a recipe for dinner based on, you know, what's in your fridge and it will sort of help you with that. I used that last night. So these are some of the new features that I was excited to try. I also tried some of their integrations. Like they have an integration with OpenTable and with Uber and a bunch of other companies.
Casey Newton
Tell me about this, because I set this up, but I did not actually use it. So how did that work?
Kevin Roose
So basically you scan a little QR code on your phone and link your Uber account or your OpenTable account to your Alexa account. Takes, you know, a minute or so. And then you can just kind of say, like, order me an Uber from this place to this place, or, I want a table at a restaurant in downtown San Francisco near the Ferry Building for two people at 6:30 tomorrow. And it will sort of pull up a couple options, you choose what you want, and then it can go book the table for you. So that I thought was cool.
Casey Newton
And that actually worked when you tried it.
Kevin Roose
So I did not actually follow through with the booking, but I did order an Uber for myself and it did work.
Casey Newton
Okay, cool. Yeah, I mean, that actually seems truly useful. To just be able to say to the thing on your desk, hey, I need an Uber to the airport, and it pulls one up. That's great.
Kevin Roose
Yeah. And it can do other cool sort of multi-step things too. Like, I needed a new thing for my kitchen, a box grater. And I was able to kind of go to Alexa and say, hey, look up on Wirecutter what the best-rated box grater is and add it to my Amazon cart.
Casey Newton
Now can I guess why you needed a new box grater?
Kevin Roose
Why is that?
Casey Newton
You used it to grate ginger and it dulled the edges.
Kevin Roose
No.
Casey Newton
Okay. What was the reason?
Kevin Roose
I left it in an Airbnb.
Casey Newton
Okay, I should have seen that coming. Anyways, go ahead.
Kevin Roose
So anyway, those were some of the good things about this product, but we have to talk about some of the limitations as well. Casey, what was your experience with Alexa Plus?
Casey Newton
Okay, so I have to say, I did not have a good experience with this thing. First of all, I bought an Echo Show 5. There's a big banner on the page that says it works with Alexa Plus. And so the thing shows up at my house. And basically what I've come to understand is that an Echo Show is a device that just constantly invites you to spend money with Amazon. And I found it honestly infuriating, because I plugged this thing in, and when you set it up, it's like, what kind of background do you want? I was like, show me some art; you know, that's one of the options. And I would say for about 4 seconds per minute, it would show me some Renaissance masterpiece or something. And then it would be like, hey, do you want aspirin? You want paper towels? You want to buy paper towels? You can actually buy paper towels right now. Just say, hey, Alexa, buy paper towels. And it just went on like this, forever. So I eventually just unplugged the thing, because I was like, why did I just spend $90 to have a permanent rotating advertisement for household products on my desk? That is so weird. It just put such a bad taste in my mouth about the whole thing. And then a day later, I get the Echo Show 15. Okay. And for some reason, Amazon sent me two of them. I truly don't know why; I did not need two of them. And so I unbox the thing.
And the thing is meant to be mounted on a wall. Now, there are a lot of things I'm willing to do for a podcast, but mounting an Echo Show on my wall?
Kevin Roose
You're not willing to do a construction process?
Casey Newton
No, I was not going to do that. Also, the thing can't stand up on its own, so I just had, like, a 15-inch screen sitting on my desk for a day while I'm talking to it. This whole thing is very silly. So that's the hardware side of it. Okay. You may have a better experience because, I don't know, you like mounting things to your wall. And so you did that, and so you're having a good time. But that was just kind of all of the precursor steps I needed to take to even be able to engage with this thing. And so then I finally have it set up and I start to try to put it through its paces. So I go through the little riddle game, and it's like, hey, I could help you with a personalized meal plan. All right, great. Yeah, set me up with a personalized meal plan. And, you know, it's like, well, we could do this or that. And it showed me a row of recipes that it could cook for me. And so I swipe through with my finger and I see a lemon pasta. And I say, okay, show me the lemon pasta. And it says, sorry, I didn't get that. And I said, Alexa, the lemon pasta right there. Could you make me this lemon pasta from this website that you're showing me right now? Dead silence. And I was like, oh my God. And right here we have just landed in the exact spot that has been bedeviling Apple for the last year and that is bedeviling Alexa right now. These systems are just very hard to make reliable. Now, I will say the device was sort of having trouble connecting to my Internet. Everything else in my house was connected to the Internet and working fine, but this was just sort of every once in a while being like, you're not connected to the Internet. So was that an issue with me? Was that an issue with the hardware? I'm not totally sure.
Maybe that was why it wasn't able to perfectly answer my question. I do want to say that, in case this was not actually an AI issue. But, oh man, within five minutes I was like, get this thing out of my house. And again, I wanted to like it; I was excited about it. But after two days of ads for paper towels and one day of I'm-not-going-to-show-you-the-lemon-pasta, I thought, what am I doing with my life?
Kevin Roose
Yeah, I should say I have also had a bunch of very bizarre and frustrating experiences with this thing. Okay, so we've said what we like about this thing.
Casey Newton
Yeah. Remind me what that is again?
Kevin Roose
Many of the new capabilities are quite cool.
Casey Newton
Yeah.
Kevin Roose
Unfortunately, many of the old capabilities I relied on as the reason I used Alexa at all have become broken as a result of this update.
Casey Newton
Okay, so tell me about this.
Kevin Roose
So one of the things you also notice very quickly when you're using this thing is that the latency is just, like, a problem. It's a little slow to respond to questions. It's not as zippy as the old, pre-LLM Alexa. I understand that these things have to go to the cloud, they're processing more complex instructions, it's all going to take a little time. I assume that will get better. The basic things that it gets wrong now include alarms, which is actually a thing that I use Alexa for every day.
Casey Newton
Wait, so tell me how it got it wrong.
Kevin Roose
So the new Alexa update seems to have broken Alexa's ability to reliably set and cancel alarms, which is a core thing that I use this product for. This morning, I woke up on my own a little bit earlier than my alarm, like 10 minutes before my alarm was supposed to go off. And so I said, Alexa, cancel the alarm. Silence. Nothing. This is a command that I have issued probably a thousand times.
Casey Newton
And Alexa Plus is a little smarter now, and she's giving you the cold shoulder. Yes.
Kevin Roose
And she's saying, actually, I'm gonna wake you up anyway in 10 minutes. So that was not good. I also experienced some hallucinations when I would ask it questions about things happening in the world, things happening in the news. I asked it about a tennis tournament that's going on right now. I said, who's the top seed in this tennis tournament? It gave me the name of a player who's not even playing in this tournament. And it also has trouble orchestrating the different tasks. So here's one of the things that would happen: I gave it a research project for a dinner playlist. I was looking for some new music to put on our dinner playlist, and instead of doing that research project, it just started searching on Spotify. Like, it routed the query to Spotify within the Alexa thing and started playing the music, when what I had asked was, do some research for me. So it seems to have a little trouble figuring out what exactly the user wants and orchestrating the commands.
Casey Newton
That case seems a little borderline to me. I can imagine some people asking for that and, you know, maybe being happy if it played some music. But I had this almost opposite issue where, again, as I'm sort of going through, okay, what can this thing actually do? It's like, ask me what I can do. So I asked it, and one of the things it said was, I can help you explore Gen Z music trends. There was just something funny about the way it said it to me. So I was like, yeah, sure, why don't you help me explore Gen Z music trends? And, you know, it thinks for a second. And then it goes, well, I found some podcasts about it on Amazon Music. And I was like, I sort of assumed you were either gonna tell me something about Gen Z music or you were gonna play Gen Z music. You're trying to sell me Amazon Music, which I feel like is very consistent with how Alexa Plus handles everything, which is: could we sell you a service right now? Could we sell you a product? So I want to say two things. One is, I have not used this product all that long, and so I don't want people to think about anything I'm saying as anything other than first impressions. Like, I have not truly had a chance to do the amount of reviewing that I would like to do. Two, I'm very confident that lots of other people are probably having much better experiences with this thing, because I think if most people were having experiences as bad as mine, I just would have heard about this before now. But all of that said, Alexa Plus did not make a great first impression on me. And the Echo family of devices, which are just little windows that let you send money to Amazon.com? They're not for me.
Kevin Roose
Yeah, I had, I think, a slightly more positive experience than you. I did actually enjoy some of my interactions with Alexa Plus, but it just seems like it is not quite there yet. And I think Amazon knows this, which is why it's in this early access program. If you open it up, it says, you know, Alexa may make mistakes. So they're sort of doing all of the careful rollout that you would expect from a product that is not fully baked. But some of the features just don't seem to work. There's another feature that I tried where you can email a document to this email address, and it will sort of ingest it into your Alexa, and then you can have it summarize it. So I was very excited. I was like, I can learn about new papers in AI while I'm doing the dishes. And so I emailed the paper to the Alexa email address, and I say, summarize the paper I just sent you, and it says, I did not receive a document. So I'm just like, I think they need to spend a little more time in the kitchen cooking this one. But my overall impression is that the Alexa that you have now in this early access program is a little like having a kind of GPT-3.5-class model inside of a smart speaker, which I think is a valuable thing and one that I would like them to continue to build on. But it is not the state of the art in either the language model or the kind of basic tasks. And actually, it seems to be regressing on some of the basic tasks. So I would say this is two steps forward, one step back.
Casey Newton
I think the most powerful thing that the new Alexa Plus has done for me is it has made me forgive Apple for not shipping anything with the new Siri. I get it now, Apple. I talked a lot of mess about you on this podcast for not shipping this thing. But now, having used one of your close rivals' attempts to do the same thing that you're doing, I get it. I think the finest minds in the world who are working on this stuff actually don't know how to do this yet. That's my big takeaway.
Kevin Roose
Yeah, I think what's happening with Alexa and Siri right now is sort of a symbol of what's happening in the American economy writ large, which is that we are trying to jam these new AI technologies into legacy systems and processes, and it's just kind of a messy fit. These things are weird. They are not deterministic; they are not reliable in the ways that a sort of older, more rule-based thing could be. And they have these amazing capabilities. But when you try to make these hybrid Frankenstein things, with the old system and the new brain, it just doesn't really work. And I think that's happening not just in these virtual assistants, but in a lot of places throughout the economy.
Casey Newton
Absolutely. I also just think that when I'm using a chatbot on my laptop and it gives me something that's, like, 80 or 85% right, that's much more useful to me than an Alexa response that's 85% right. Because in a chatbot setting, I can just sort of take what I need. I can edit or modify. I can maybe ask the same question of another chatbot and see if I get a slightly different or better result. I feel much more in control of my own destiny. I can take the stuff that works and leave behind the stuff that doesn't. When you're doing this thing with a smart speaker, if it doesn't work, you say, why did I spend 90 bucks on this piece of junk, you know? And what I learned about myself was that I have so much less patience for this sort of thing when it is a piece of hardware in my home that has made some really big promises about how it's going to help me with all my routines and everything. If it's kind of hard to set up and it doesn't work the vast majority of the time, it all just kind of feels like a waste.
Kevin Roose
Right. So I'm very glad we got to exchange first impressions of Alexa Plus. We should also have a conversation with someone at Amazon who has been involved in this overhaul of their flagship voice assistant. So when we come back, we're going to be joined in the studio by Daniel Rausch. He's the Vice President of Alexa and Echo at Amazon, and we're going to get into all this with him. And you could ask about those ads.
Casey Newton
Oh, I'm going to. But first, some ads.
Kevin Roose
Daniel Rausch, welcome to Hard Fork.
Daniel Rausch
Thanks so much for having me.
Kevin Roose
So Casey and I have both spent the past few days playing around with the new Alexa and I'd like to just start by asking about the technology that powers this thing.
Daniel Rausch
Yeah.
Kevin Roose
How much of it is a new LLM-based system versus the old, more deterministic model that powered the old Alexa?
Daniel Rausch
Yeah, well, from an AI and model perspective, everything is entirely new. There are some legacy deterministic systems downstream, but really it's a complete re-architecture of everything that you would say Alexa is, from the way you have a conversation and engage with the experience at a very basic level, all the way through Alexa acknowledging you or just maintaining a chat. So there's a lot of new under the hood.
Casey Newton
Yeah. Talk about the challenge of moving from this deterministic system to something that is very powerful but also much less reliable.
Daniel Rausch
Yeah, I would say, well, hopefully you're not seeing much less reliable. I would say, you know, we've got some edges to sand, and we're in early access. I'm sure we'll get to talk about the nature of the rollout.
Casey Newton
I just mean that, in general, LLMs are not as reliable as a deterministic system.
Daniel Rausch
I get it. So, you know, we want to capture all the benefits of that non-deterministic, we'd call it stochastic, system in the space. It has the elegance of really engaging in human conversation, but we want the predictable outcomes. Now, large language models don't support interfaces out of the box to classic systems. So getting those capabilities to interface, we would talk about it as APIs across these interfaces for other systems, is quite hard. They speak natural language; APIs don't speak natural language. They speak clunky computer-science language, but it's very predictable, and it gets a lot of things done. So I would say if you had to list the technical challenges, well, the many millions of things, we stopped counting at some point, but the many millions of things that original Alexa could do, marrying that with the power of LLMs is definitely the first and most prominent on the list.
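A minimal sketch of the interface problem being described here: the LLM emits natural-language-adjacent output, while the downstream system accepts only a rigid, typed call. One common bridge is having the model emit JSON and validating it against a strict schema before anything executes. All names and shapes below are invented for illustration; this is not Amazon's actual interface:

```python
import json

# A hypothetical rigid API schema, of the kind a legacy assistant
# backend might expose: exact fields, exact types, no flexibility.
TIMER_SCHEMA = {"action": str, "duration_seconds": int}

def validate_call(raw: str, schema: dict) -> dict:
    """Check an LLM-emitted JSON tool call against a strict schema."""
    call = json.loads(raw)
    for field, ftype in schema.items():
        if field not in call:
            raise ValueError(f"missing field: {field}")
        if not isinstance(call[field], ftype):
            raise TypeError(f"{field} must be {ftype.__name__}")
    return call

# The model spoke natural language to the user, but the API needs
# exactly this shape before the timer system will act on it.
llm_output = '{"action": "set_timer", "duration_seconds": 600}'
call = validate_call(llm_output, TIMER_SCHEMA)
print(call["duration_seconds"])  # 600
```

The hard part Rausch alludes to is doing this reliably across millions of capabilities, but the basic pattern is this kind of strict validation layer between the stochastic model and the deterministic API.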
Casey Newton
So take us back to when LLMs are first coming out. You guys are starting to play around with them; it's sparking ideas for you of, gosh, if we could marry this to Alexa, we could have something really cool. What are some of the uses that you're thinking about? What are the kinds of dreams that you have for this that you're hoping you can bring into reality?
Daniel Rausch
I mean, we think of the capabilities in two buckets. I would say: take everything that the original Alexa can do and just make it way better, you know, just picking up from what customers are already doing with Alexa. Then you start brainstorming, and where I think you were really headed was, what are all the new things that we can do? And the depth of conversation that you can have with the new Alexa experience just opens whole vistas of new kinds of things we can get done. We can help you plan a trip and then follow through on it. We can watch for concert tickets for you. We can not just help you brainstorm about cuisine, but pick a recipe and get some groceries and invite the neighbors, or let your partner know it's date night and we're going out, and book a table. So I think the kinds of journeys and the kinds of tasks we can get done for customers are just so much more expansive.
Kevin Roose
So Casey and I have spent the past couple days trying out Alexa Plus, and we have some feedback, which we can share with you now or later. We've talked about it on the show just before this. I think it's fair to say we both had some things that impressed us about the new Alexa and some things that were challenging, including some of the basic stuff that Alexa seemed to be very good at before, or at least that I knew how to get reliable performance out of Alexa for, that no longer seem to work as well. But what I actually want to know is, why has it been so hard to do this? Because back in 2023, when Amazon announced that it was going to revamp Alexa, sort of give it this brain upgrade with these new AI capabilities, they said this was going to be ready in 2024, and then that got pushed back a couple different times. So walk us through the journey that you all have been on over there, trying to sort of shoehorn this new technology into this existing product, and maybe some of the challenges that you encountered along the way.
Daniel Rausch
We should definitely get to some of the feedback; we can cover as much as you like here on the show. So if you rewind the tape, and actually you were asking about this too: as we're starting to experiment, what can we imagine doing? If you go back to 2023 and the models that were available then, the state of the art: very little instruction following or reasoning, low ability to execute on these interfaces with other systems. We announced something called Let's Chat, which was sort of a mode of Alexa. So think about flipping a switch on Alexa and turning on a chat interface, so that you can do some basic question-and-answer and have a discussion about a topic, mostly about knowledge native to the model's training data, versus bringing something in at runtime, the way that modern chatbots answer questions by going out on the Internet. I think what we mostly learned from that announcement and the customers that we rolled out to is that we just had to increase our vision and do something more audacious. Basically, customers really wanted, and we all really wanted, to pick up from where Alexa is and was and extend all of those capabilities. That is many millions of things that Alexa can do. And when you count the tens of thousands of services and devices that are integrated with Alexa, the space of the interfaces and the systems that you need to integrate with is incredibly large. So that's the first sort of technical challenge I mentioned before, sort of the first and probably most important bucket. Second is really grounding it in authoritative sources. I think, as all of us know, you can sit there and fiddle with a chatbot long enough to press it into being smarmy, or responding in ways that we don't believe are the way Alexa might act, for example, or press it to give you wrong information that's from some unauthoritative source, or a mistake in its training data when it shifts back to its native training.
So getting Alexa to speak confidently in her personality, with authority and answer questions.
Casey Newton
Right.
Daniel Rausch
It's another key challenge. Personalizing an experience of this depth so that Alexa is always learning from her interactions with you and extending your interactions so they get more delightful over time. I think this is something you probably wouldn't have seen in a weekend's worth of fiddling with the experience. You'll see it gets more personalized. That's another big technical challenge, because the surface area is so much bigger. So those are a few of the reasons why it sort of took so long. And I think if you rewind the tape to 2023, it's really about learning how big a project Alexa would be and then starting to put one foot in front of the other, really inventing the space of creating those integrations, because it just hasn't been done.
Kevin Roose
What's an example of some early failure mode that you all had to overcome? I mean, I've heard some stories from folks who have worked on Alexa or worked with suppliers who provide models to Alexa. They would tell me stories about, you know, you'd ask Alexa to set a timer for you, and it would write you an essay about the history of timers. It was just sort of misunderstanding the request in the way that a large language model might. So tell us some of those stories.
Daniel Rausch
Well, that's a good one. I mean, verbosity was definitely an early issue, and, you know, it continues to.
Casey Newton
Be an issue on our podcast, by the way. We still haven't solved it.
Daniel Rausch
Yeah, I've got some training ideas.
Casey Newton
Okay, good.
Daniel Rausch
Verbosity. These models want to give you an extensive answer. Customers don't want an extensive answer read out, and they certainly don't want a disquisition on the nature of timers. Right. What they want is an interface that sets a spaghetti timer.
Kevin Roose
And how do you get them to do that? Is it just as simple as putting in the system prompt, like, if a customer asks for a timer, don't give them an essay on the history of timers, or, be concise? Or how do you actually solve that problem?
Daniel Rausch
I would love it if it were that easy. You need a set of models. There are over 70 models in Alexa. It's a vast space. There are different models specialized in different tasks. There are different corpuses of training data we use on different models to get them to complete instruction sets for us and really follow the rules of the road in interfacing with something. You always need to loop back to central systems that are maintaining context in the conversation and picking up from references and pronouns that you've used to refer back in time, and sort of cascade those forward. But the amount of work that went into just the interface between a large language model and the downstream systems that complete tasks is, I mean, the biggest body of work that we've put in. And without a whiteboard here, it'd be too much to even try to explain to you and your listeners, I think, the technical depth that went into it. We've got a great team working on it, and it's hard.
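A toy sketch of the routing-plus-context idea described here: a classifier routes each utterance to a specialized handler, while a central context store carries state forward for follow-up turns. Everything below (the intents, the handlers, the context dictionary) is invented for illustration and is not Amazon's actual architecture:

```python
# Stand-in for a routing model: in a real system this would itself
# be a trained classifier, not keyword matching.
def classify(utterance: str) -> str:
    text = utterance.lower()
    if "timer" in text:
        return "timers"
    if "play" in text or "music" in text:
        return "music"
    return "chat"

# Stand-ins for specialized downstream models/systems.
SPECIALISTS = {
    "timers": lambda u, ctx: f"timer set ({u})",
    "music": lambda u, ctx: f"playing ({u})",
    "chat": lambda u, ctx: f"chatting about ({u})",
}

def handle(utterance: str, context: dict) -> str:
    """Route one turn to a specialist and record context for follow-ups."""
    intent = classify(utterance)
    context["last_intent"] = intent  # carried forward across turns
    return SPECIALISTS[intent](utterance, context)

ctx = {}
print(handle("set a spaghetti timer for 10 minutes", ctx))
print(ctx["last_intent"])  # timers
```

The real difficulty, as the interview suggests, is that the central context system also has to resolve pronouns and references ("cancel it", "play that one") against earlier turns, which this sketch only gestures at with the shared `ctx` dictionary.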
Kevin Roose
Of those 70 models in Alexa, how many are Amazon's own in-house models versus models like Claude that you all get from external companies?
Daniel Rausch
There's a mix. So, you know, the best way to know what models are in Alexa is just to go to the Bedrock webpage and look at an update on the latest in there. We use the best tools that we have available to us for the job, and we've got great partners over in AWS helping make sure we've got the right, best tools for the job. Most of our traffic does flow through Amazon Nova models. We have the most control over how those get trained and tuned and post-trained. I think over 80% of traffic on sort of the main big inferences within the system flows through Nova models. But there are many different reasons to use many different models. I think you guys know better than most that models are special. They specialize in different things. So we use the best tool for the job.
Casey Newton
Can you give us a sense of how big the team is that's working on Alexa? How big of a priority is this within Amazon?
Daniel Rausch
It's thousands of people. Okay, yeah. Now, that's building hardware, it's building Alexa Plus, it's integrating with all those systems, it's adding new integrations and new things that Alexa can do. It's a pretty vast scope. So it takes a big team.
Casey Newton
Yeah.
Kevin Roose
There was a former machine learning scientist at Alexa AI, Mihail Eric, who did a long post on X last year, sort of his version of a postmortem or retrospective on what was happening with Alexa. And he wrote that Amazon had, quote, all the resources, talent and momentum to become the unequivocal market leader in conversational AI. But then he said that Amazon and Alexa had fumbled the ball, because Alexa was, quote, riddled with technical and bureaucratic problems. He sort of made it seem like the problem was not just that the technology was an uneasy fit, but that there were also some organizational and bureaucracy problems that had to be solved. Can you talk a little bit about that?
Daniel Rauch
I won't comment on that post in particular; honestly, I don't remember it. But there is definitely, I would say, sort of a startup culture transformation happening within the Alexa team. The life cycle of any product that's been around for 10 years has ups and downs, but I think our rate of innovation had slowed down. And I think coming through for customers on integrating these new powerful tools is something that's really quickened and inspired the team. I don't identify with the bureaucratic comment. Maybe it's a comment about me, so maybe I won't identify with it, I don't know. But I do think the team is inspired. It's inspired by the vision, executing at an unbelievable pace and really creating a lot of invention, because there are a lot of really hard problems.
Kevin Roose
I'm curious where the new Alexa sits in relation to Amazon's overall AI ambitions. You know, this is a company that has offered a lot of AI models through AWS, with a big market share in cloud based AI. It also recently started an AGI lab at Amazon that is going to be pushing toward something like an artificial general intelligence. Is Alexa part of that overall effort to create and serve more capable AI systems, or is this a consumer targeted sort of spin off of those efforts?
Daniel Rauch
I would say we do believe, and I share this belief with the leadership team at Amazon, that this generation of generative AI is going to transform every customer experience we have. And we have a lot of different types of customers. You mentioned AWS; we have enterprise business customers, we have consumer customers, we offer a very big landscape of services. I know that at some point within the last year we counted, and there are over a thousand different AI efforts going on within consumer applications alone. So if you look at the scale and scope of what Amazon does and, you know, assume our belief that every experience will be transformed with generative AI, it's as big as Amazon is at that point. I would also say, just internally, it's part of how we work now. To be as productive as you can be in this day and age, and get as much done for customers as we aspire to, you have to build AI into how you're working. You both do this, I know that, and I'm sure many of your listeners do too, but it's certainly part of what's going on at Amazon as well.
Casey Newton
Yeah.
Kevin Roose
Okay. Well, Daniel, we have some product feedback for you.
Daniel Rauch
Let's do it. As they say, feedback is a gift, always.
Casey Newton
So we'd like to give you Christmas.
Kevin Roose
Casey, why don't you start?
Casey Newton
All right, well, let's see. I feel like most of my feedback is less about Alexa as an AI than it is about the actual hardware that I got. I first started with the Echo Show 5, which does say on the website that it is Alexa Plus enabled. But then some of your folks were like, no, to get the full experience, you should get the 15. So I sort of had the two experiences.
Daniel Rauch
Okay.
Casey Newton
On the 5, my first observation was that after I told it I would like to see art, I felt like every time I looked over at it, it was asking me if I wanted to buy paper towels or Advil or something. That was a little bit less the case once I got the 15. I don't know why that might have been, but I felt like the Alexa AI thinks of me primarily as a person who might send more money to Amazon if you just gave me a few more ideas for how I might do that. And what I would love is if it evolved to treat me like a person who isn't constantly looking to buy paper towels. You know what I mean? So that, I think, is actually my biggest piece of feedback: I wanted fewer ads, fewer reminders that Amazon Music exists, fewer reminders that Amazon Prime Video exists. Just, like, get to know me as a person a little bit. That's my big feedback.
Daniel Rauch
Subject line:
Casey Newton
Yeah.
Daniel Rauch
Enough with the paper towels.
Casey Newton
Enough. Enough with the paper towels. If I say I want to see art, I really mean it. So I get it: you want to show everything that your hardware can do. You worked very hard on it. It can do many things, and you want to showcase all of those things. But I do think it comes across as a kind of insecurity in the device, as if, unless we're constantly showing you everything we've built into this thing, you'll never discover it and you'll put it in a drawer. I understand the pressures that you're under, and I understand why it has evolved this way. But I have to say, when I unplugged it, I felt more relaxed, because it wasn't giving me a list of things to do. And I didn't feel that way about my original Alexa, which is, like, great at the things that it does. I know that's a lot, but those were my emotions.
Daniel Rauch
The first one, to me, the Echo Show 5 feedback, sounds like a bug. I don't know what state it got into, but if you asked for artwork and that's not what it was showing you, that one sounds like a bug.
Casey Newton
Okay.
Daniel Rauch
The latter part might just be that you have a different reaction than most of our customers do to the onboarding experience, or you're just looking for more diverse things. I will be curious to follow up with you in a week and find out if your use has helped shape the nature of what we're showing you. That is certainly our intention: when you're onboarding to the new experience, the types of things you're asking for are the types of things we're showing you, and that could be anything. One of my most delightful examples: we have a new element called For You, which is a place where we post little notifications about things we think you might be interested in. And I had been helping my daughter study the periodic table for part of her chemistry final, and I wasn't ever great at remembering, in particular, the elements that you need a mnemonic for, like lead, where the name and the symbol don't match.
Kevin Roose
Pb.
Daniel Rauch
Pb. Very good. So you were good at chemistry, obviously.
Kevin Roose
Yeah, nailed it.
Casey Newton
Very low latency on this.
Daniel Rauch
You don't need the mnemonics. But I had done that the night before, and when I came in in the morning, my For You said, you know, should we make a chemistry quiz for Ellie, or something like that. It was like, write a chemistry quiz for Ellie. And with the generative content capabilities of Alexa, I just said, yeah, let's try that. Can we make a sheet of all of the elements that aren't intuitive?
Casey Newton
Did it also ask you if you wanted to buy lead?
Daniel Rauch
It didn't ask me that, which, you know, is a product safety thing, so I'm glad we ticked that box. We will have to just look and see the extent of the Amazon services being shown to you. But I will tell you that the body of feedback we get from customers doesn't accord with that specific version of it. Definitely, customers want to learn what they can do; that is one of the biggest things we hear from customers. I want to come back to what you said about unplugging the device and plugging it back in. We made the Alexa experience incredibly easy to get out of and get back into, which is not true for something like an OS update, right? It's very, very hard to go backwards, and we worked very hard to make it possible, because we knew there would be so much change. The share of customers who stick with the new experience is in the very high 90s percent.
Casey Newton
That makes sense to me. I mean, it's clearly much more capable, it can do more stuff, and I know it's going to evolve and presumably improve over time. So no, no part of me was like, I want to go back to the old experience. I was just like, wow, this is very intense. Honestly, I think the bigger shift that I experienced was going from just a pure speaker to something with a screen. That actually feels bigger than the change itself.
Daniel Rauch
I understand that.
Kevin Roose
To piggyback on Casey's question: one of the big questions about the Alexa business model is whether you see this as something that is going to make money on its own, or whether it's primarily a way of increasing the amount of money that people spend on Amazon. You know, I spend an ungodly amount of money on Amazon.
Daniel Rauch
Thank you for your business. Thank you for your business.
Kevin Roose
A large fraction of my income is spent on various things on Amazon, and so I'm well aware of the many products that exist on Amazon.com, the website. I do not need ads cascading on my screen telling me to buy more stuff on Amazon, but it does seem like this is primarily going to be an ad supported product. Andy Jassy recently said on the earnings call for the most recent quarter that you all were trying to bring more advertising experiences to Alexa. So talk to us about that. Are we just going to inevitably be more annoyed at the number of ads showing up on these devices?
Daniel Rauch
I definitely don't think you'll inevitably be more annoyed. I would say advertising is definitely part of the business plan, but it's not the biggest part; it's actually probably the smallest part. The most important decision we made on the business side with Alexa was bringing it into Prime. Putting it into Prime brings together all of a customer's Prime benefits, because you might watch a video, or listen to a song from Amazon Music, or use your Amazon Photos benefit, which is awesome with an Echo Show, to review your family photos. I use that all the time to look back at the kids in particular. But you have this long list of Prime benefits, and Alexa is a great place where they come together. Putting the value of having the world's best personal assistant into Prime just turns the Prime flywheel. And we know every time we've added a benefit to Prime, customers use their Prime benefits more: it's stickier for them, it provides them more value, and it turns into a great business.
Kevin Roose
That's the goal. Okay, so Casey's feedback was about advertising. Mine is about some of these new features that don't work, and some of the old features that don't work anymore either. Some of the more complicated stuff that I tried with Alexa, such as setting up routines that involve multiple steps, or emailing documents and research papers to the Alexa email address and trying to get it to summarize them, just didn't work for me. The routines didn't run; the papers didn't show up to be summarized. I assume this is just growing pains and beta testing bugs and things like that. What I found more frustrating, and wanted to ask you about because I'm actually not sure why it happens, was that some of the basic features that Alexa had previously been good at and reliable at for me were less reliable with Alexa Plus. This morning, for example, I tried to cancel an alarm that was about 10 minutes from going off, and Alexa just didn't listen, didn't hear me. The alarm went off anyway. So help me understand why that is. Is that, like, a hallucination of the model? Is it a problem related to the orchestration of the various tasks and sending the request to the right place? What is going on there?
Daniel Rauch
Honestly, we'd have to dive deep into each of those to figure it out. You know, early access is here as a program to cover off on these kinds of issues and to make sure customers know that they can opt into Alexa Plus, and they can opt out if they want. Again, the vast majority of customers stick with it. The key challenge in probably everything you said is that interface between large language models and these more predictable, rule based systems that communicate through APIs. For something like canceling an alarm, we have to find out the exact intent of what you were looking for, translate that into a set of commands, and then issue those commands to an API. Sometimes it does fail. At this point it's rarely because of hallucination; we've got so much going on to monitor for model hallucinations. It is sometimes because of incorrect use of an API, or just misunderstanding exactly where to send those commands. So that's more likely the cause in each of these cases.
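The failure mode Rauch describes, where a correctly understood request still fails at the boundary between the language model and a rule based API, can be sketched roughly as follows. Everything here (the handler names, the intent shape, the alarm service) is a hypothetical illustration, not Amazon's actual interfaces:

```python
# Hypothetical sketch of the LLM-to-API handoff Rauch describes.
# The model's only job is to emit a structured intent; a deterministic
# layer then maps that intent onto a rule-based API call.

alarms = {"morning": "7:00"}  # stands in for a rule-based alarm service

def cancel_alarm(alarm_id: str) -> str:
    # Deterministic API: either the id matches exactly or the call fails.
    if alarm_id in alarms:
        del alarms[alarm_id]
        return f"canceled {alarm_id}"
    return f"error: no alarm named {alarm_id}"

HANDLERS = {"cancel_alarm": cancel_alarm}

def dispatch(intent: dict) -> str:
    # The orchestration step: even a correctly understood request fails
    # if the model picks the wrong handler or the wrong arguments.
    handler = HANDLERS.get(intent.get("action"))
    if handler is None:
        return "error: unknown action"
    return handler(intent["alarm_id"])

# A well-formed intent succeeds...
print(dispatch({"action": "cancel_alarm", "alarm_id": "morning"}))
# ...while a slightly wrong argument fails at the API boundary, the
# kind of miss Rauch attributes to incorrect API use, not hallucination.
print(dispatch({"action": "cancel_alarm", "alarm_id": "Morning"}))
```

The sketch makes the distinction Rauch draws concrete: a hallucination would be the model inventing an action; the more common failure is a valid-looking intent that the deterministic layer cannot match to a real resource.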
Casey Newton
Got it.
Kevin Roose
I will give you one more piece of feedback, which is actually not from me. This is from my 3 year old son, who is our household's most active Alexa user. He talks to Alexa all the time, probably more than he talks to us. Should I be concerned about that? Maybe, but we'll save that for a later episode. But he was doing story time with it, because he constantly wants more stories about various vehicles, various dinosaurs. And so we were doing a story time about a super tow truck that rescues cars from the water, and he asked for another one, and it gave him a totally different set of characters. So if there's some way for kids to have, like, a kind of, you know.
Casey Newton
Their own private cinematic universe.
Kevin Roose
Persistent cinematic universes for super tow trucks. I know at least one 3 year old who would really appreciate it.
Daniel Rauch
I got it. Excellent product description, by the way. I like that. And I agree: for children, as they explore, it doesn't even have to be an imaginary friend, but they do love themes and they love to continue them. That's great feedback. I'll take that to the team.
Kevin Roose
Yeah. For all of our feedback, I actually am very glad I got to try this. I'm going to keep testing it. We are very active Alexa users in my household, so we'll keep sending you our feedback.
Daniel Rauch
That's awesome.
Casey Newton
Yeah, we like trying new things around here. Yeah.
Kevin Roose
Daniel, thanks so much for coming.
Casey Newton
Thanks, Daniel.
Daniel Rauch
Really appreciate your time guys. Thanks a lot.
Casey Newton
Oh wait, what was that?
Kevin Roose
Did you just set off your Alexa?
Casey Newton
Oh, Siri, stay out of this. Gosh, she's got a lot of nerve coming into this podcast recording. Wow.
MultiCare
For 140 years, MultiCare has been in Washington, prioritizing long term solutions, partnering with local communities and expanding access to care. Together, we're building a healthier future. Learn more at multicare.org.
Kevin Roose
Well, Casey, we've got some good news and some bad news. The bad news is that our hat promo is coming to an end. As of this week, the limited time offer to get a free Hard Fork hat along with a new annual New York Times Audio subscription is running out. This is your last chance to get this very cool limited edition hat as a thank you for subscribing to New York Times Audio, and these Hard Fork hats are only available to subscribers in the United States. The good news is that we are going to have more hats available for sale. A different hat, with a slightly different design, is going to be available in the New York Times store pretty soon, we're told, so look out for that. If you already subscribe to New York Times Audio, or you just want the hat without the subscription, we are told that the next wave of Hard Fork hats will be available internationally as well.
Casey Newton
Hard Fork is produced by Whitney Jones and Rachel Cohn. We're edited by Jen Poyant. We're fact checked by Caitlin Love. Today's show was engineered by Chris Wood. Original music by Marion Lozano, Diane Wong, Rowan Niemisto and Dan Powell. Video production by Sawyer Roque, Pat Gunther, Jake Nicol and Chris Schott. You can watch this whole episode on YouTube at youtube.com/hardfork. Special thanks to Paula Szuchman, Pui-Wing Tam, Dalia Haddad and Jeffrey Miranda. You can email us at hardfork@nytimes.com with a story about when you fainted.
Podcast Information:
In this episode of "Hard Fork," hosts Kevin Roose and Casey Newton explore the latest advancements in artificial intelligence, focusing on the release of OpenAI's GPT-5 and Amazon's newly enhanced virtual assistant, Alexa Plus. They provide personal anecdotes, expert insights, and critical evaluations of these cutting-edge technologies.
Overview of GPT-5 Release
The episode begins with excitement as Kevin Roose announces the launch of GPT-5, OpenAI's highly anticipated language model. "GPT5 will tell you all about OpenAI's latest Frontier model" [00:32]. Both hosts express their enthusiasm for the advancements this new model brings to the AI landscape.
Features and Improvements
Casey Newton highlights the significant upgrades in GPT-5, referencing Sam Altman's remarks during the press briefing: "This is their best model ever... a significant step along the path to AGI, but we're not at AGI yet" [05:30]. GPT-5 boasts enhanced reasoning abilities and a deeper conversational aptitude compared to its predecessors.
Enhanced Capabilities
Sam Altman described GPT-5 as "the first time it feels like talking to an expert, someone who has a PhD in a subject" [06:27]. This marks a leap from GPT-3’s high school level to GPT-4’s college student level, positioning GPT-5 as a more knowledgeable and reliable assistant.
Accessibility and Availability
A notable change with GPT-5 is the removal of the model picker for free users, making GPT-5 the default experience. Kevin mentions, "GPT5 had not yet been rolled out... it will be rolled out... including to free users" [07:02]. This democratizes access to advanced AI, previously reserved for paying subscribers.
Reactions and Early Impressions
Casey reflects on initial user reactions: "GPT5 does seem like a really meaningful improvement to ChatGPT" [19:06]. Users have noted faster response times and improved reasoning capabilities, though some remain cautious about its revolutionary impact.
Reduced Hallucinations
OpenAI claims GPT-5 has significantly reduced hallucinations, with only about a 1% rate for certain question types: "The rate of GPT5 just making stuff up... has gone way down" [23:44]. However, the hosts emphasize the importance of personal testing to verify these claims.
Pricing and Market Impact
GPT-5 is priced competitively at $1.25 per 1 million input tokens, matching Google's Gemini 2.5 Pro and significantly lower than Anthropic's Claude 4 Opus API at $15 per million input tokens [21:43]. This aggressive pricing strategy aims to increase accessibility and put pressure on competitors.
Implications for AGI and AI Development
The hosts discuss whether GPT-5 represents a step closer to Artificial General Intelligence (AGI). Casey notes, “We still don't have a really good answer to [what's possible now that wasn't before]" [10:22], highlighting the ongoing debate about GPT-5's true transformative capabilities.
Conclusion on GPT-5
GPT-5 is portrayed as an evolutionary advancement rather than a revolutionary leap, raising the floor rather than the ceiling of AI capabilities for free users. The episode concludes this segment with plans for deeper analysis after hands-on experience.
Introduction to Alexa Plus
Transitioning from GPT-5, the hosts introduce Amazon's Alexa Plus. Kevin notes its rollout in the early access phase and its enhanced generative AI capabilities: “The new Alexa is not the old Alexa, it's an entirely new architecture” [49:05]. They aim to explore how these changes impact user experience.
Features and Improvements
Both hosts share their initial experiences with Alexa Plus. Casey describes "vibe coding"—using GPT-like prompts to build custom applications: "ChatGPT is currently hard at work building a To do list app for me" [18:57]. They highlight improvements such as more natural voice interactions and multi-turn conversations without needing to repeat the wake word: “The new Alexa plus does not require you to say the wake word” [32:56].
New Capabilities
Kevin outlines new features, including the ability to book reservations, order Uber, and control smart home devices. “I can ask it to make up a story and read it to my kids” [34:13]. Alexa Plus can also suggest recipes based on available ingredients and integrate with services like OpenTable and Uber for seamless task execution [34:13-35:22].
Limitations and Challenges
Despite the advancements, the hosts report significant limitations. Casey shares frustrating interactions where Alexa Plus fails to execute basic commands reliably, such as canceling alarms or providing accurate information on current events: “Alexa, cancel the alarm. Nothing” [41:26]. Kevin echoes similar points, noting increased latency and reduced reliability on previously stable functions: “The latency is just, like, a problem” [40:07].
Advertising and User Experience
The integration of advertising features in Alexa Plus sparked criticism. Casey expressed irritation with Alexa's persistent product suggestions, like paper towels, which detracted from his user experience: "It was like, why did I just spend $90 to have a permanent rotating advertisement for household products" [37:42]. Amazon's Daniel Rauch addressed the feedback regarding advertising, emphasizing Alexa's role in enhancing Amazon Prime benefits and balancing personalization without overstepping.
Feedback Session with Amazon's Daniel Rauch
The hosts engage with Daniel Rauch, Amazon's VP of Alexa. They discuss the technical challenges of integrating LLMs with deterministic systems, leading to occasional misinterpretations and failures in executing commands. Rauch explains, "We have to increase our vision and do something more audacious... integrating LLMs with over 70 models in Alexa is very challenging" [50:05].
Future Prospects and Roadmap
Daniel Rauch outlines Amazon’s vision for Alexa Plus, which includes more personalized and authoritative responses, deeper integrations with Amazon services, and continuous learning from user interactions. However, the early access phase is still ironing out reliability issues and optimizing user experience: “We’ve got some training ideas” [56:27].
Business Model and Monetization
The conversation touches on Alexa’s business model, with Kevin inquiring about the balance between monetization through advertising and maintaining a positive user experience. Daniel Rauch asserts that while advertising is part of the plan, it's not the primary focus: “Advertising is definitely part of the business plan, but it's not the biggest part” [68:33].
Conclusion on Alexa Plus
The episode concludes with mixed impressions of Alexa Plus. While acknowledging its enhanced capabilities and potential, the hosts highlight the current reliability issues and user experience challenges, emphasizing that Alexa Plus is still a work in progress, poised for further development and improvements.
Kevin and Casey wrap up the episode by reflecting on the rapid advancements and growing pains in the AI landscape. They express optimism for continued improvements in GPT-5 and Alexa Plus, while acknowledging the challenges that come with integrating complex AI systems into everyday devices.
Notable Closing Remarks:
The hosts encourage listeners to stay tuned for further updates and their ongoing evaluations of GPT-5 and Alexa Plus in future episodes.
Kevin Roose [05:30]: "GPT5 is their best model ever. It's a significant step along the path to AGI."
Sam Altman (Paraphrased) [06:27]: "GPT5 feels like talking to an expert, someone with a PhD in a subject."
Casey Newton [20:19]: "You can now cheat your way through an entire semester with just one press of a button."
Daniel Rauch [49:05]: "Everything is entirely new. There's a complete re-architecture of everything that Alexa is."
Casey Newton [36:15]: "Alexa Plus did not make a great first impression on me."
Kevin Roose [73:00]: "Should I be concerned? Maybe, but we'll save that for a later episode."