
A
Hey everyone. I'm super excited to be sitting down with the co-founder of Siri, Adam Cheyer. Siri was the original AI agent, and long before Apple bought it, Adam played a key role in creating one of the most important and widely used digital services of all time. What's cool about Adam is that he's dedicated his career to building and designing a series of well-received AI tools, and he has unique insight into which products will be winners and losers. I want to ask him where AI assistants go from here, whether the current wave is overhyped or underhyped, and how we can design the future we want. Let's find out. Adam, thanks so much for joining today. You're the co-founder of Siri, which in many ways feels like the mother or grandmother of what we've seen in generative AI. I can very easily draw a straight line between what Siri was doing and what's happening now. So in terms of your world now, when you look over the tech horizon, what catches your interest, what excites you, and what's your impact assessment for all things AI and tech?
B
Yeah, thanks. So I've been in this industry a long time; next year will be 40 years in AI professionally for me. And I would say the first thing is that this latest generation of AI exceeds my expectations in some dimensions. It's beyond what I thought I would ever see in my lifetime, which is a pretty strong statement. However, there are glaring missing parts to it that I don't think most of the industry realizes yet, and that no one is doing well. So I'm equally amazed and frustrated at the current crop of systems. As an example, a lot of people come up and say, oh, ChatGPT is a hundred times better than Siri. I'm like, oh really? With Siri, I can say, hey Siri, tell my wife I'm going to be late. Can ChatGPT do that? No, it can't send a message to my wife. With Siri, I can say, play my workout playlist, or make me a reservation tomorrow night at 7pm at a restaurant. Can ChatGPT? No, it can't do that. So one of the glaring missing pieces: when I first started my first version of Siri 32 years ago, in 1993, before I ever saw a web browser, I said, someday everyone will have an assistant that you can say, I want to know this, or I want to do that, and the assistant's job will be to work with all the services and applications around the world to help you get a job done, learning as you go. And it's amazing: on the knowing side, ChatGPT is incredible, but it's the architecture and all the agentic, orchestration-type things we hear about. Operator from OpenAI, to me, feels like what you get when you try to force-fit doing use cases into a knowing architecture, and it just feels wrong and broken, the wrong direction. The approach is not right. So I'm both amazed and sometimes frustrated.
A
That's really interesting, and I like how you framed that in terms of the force fit. So I'll ask you directly: is this something you think is just going to be solved in the coming months, or is there something fundamentally mismatched about how we're trying to solve this problem?
B
So I've spoken about this a bit in the past. I think there are three major topics that are missing with the current crop of AI and LLMs. All three of them are hard individually, and in order to get it right, by my definition of right, you need to solve all three together. Because of the complexity of each, and the complexity of doing it all together, I'm hopeful it can happen in the next 10 years. But in my estimation, it's not going to happen anytime soon, not in the next few months or even years. So what are the three things? The first is that the user interface is all wrong, in my opinion. There's a reason why most websites and most apps do not have a chat-based interface, whereas today AI most often comes in a form where you're having a textual back-and-forth conversation. Graphical interfaces are important. Take travel: travel is emotional, it's visual, you want to work with maps and timelines; you need visual information. Having a text chat to do that type of task is just not satisfying. It's a bad experience. So what is the right mix of language and graphical interface? That's problem one, and I don't think people have solved it. Problem two is this issue of knowing versus doing. Of course, everyone wants both. Just yesterday I saw a post: OpenAI just ordered a pizza. I'm like, yeah, Siri did that in 2008, good. But if you think about what a knowing architecture is, it's usually based on a cached index of information; the P in GPT stands for pre-trained, so it's been built out in advance. Whereas doing requires APIs, right? Live calls so you can buy the ticket and book the taxi or whatever. You need authentication. There are a number of things. If you're going to use an API and not the web-scraping approach everyone's trying, you need participation from the service provider, and no one has really figured out how to do that. And what's the right combination of knowing and doing?
You want to be able to say, oh, book me a reservation at the restaurant that, you know, that Heisman Trophy winner from Pittsburgh owns, or whatever. Knowing and doing; users don't make a distinction. So what's the right architecture that combines both? And the third is that we need an ecosystem and an app store, since doing requires API calls and participation. Service providers want things like brand. They want their brand to show up; they don't want to just be disintermediated. They want to own the user, they want to control the user interface. They could do that on the web, they could do that on mobile. What's the AI equivalent, where an OpenTable or a Yelp or an Uber can plug in and have the things they need to feel successful and differentiated as a business? And on the user side, users want different things. Maybe I like Lyft and not Uber, or vice versa. I have preferences. I care about some tasks and domains but not others. So how do you have a marketplace or an ecosystem? I think it's a big problem today with IP rights, and people are trying to use scrapers to go around working with the partners directly. It feels slow and broken and just the wrong way to do it. So for me, those are the three things that are missing: the right interface, an architecture that cleanly supports knowing and doing, and an ecosystem that gives a service provider the ability to have knowing, doing, and the right interface under their control, meeting the needs of both sides of the marketplace. That's my view of what's missing.
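[Editor's note: to make the knowing-versus-doing distinction concrete, here is a minimal, hypothetical sketch of an assistant that routes action requests to live service providers registered in an ecosystem and falls back to cached knowledge otherwise. This is an illustration of the idea only, not Siri's, Viv's, or anyone's actual architecture; all names here (Service, Assistant, the intent strings) are invented.]

```python
from dataclasses import dataclass, field

@dataclass
class Service:
    """A provider that plugs into the ecosystem with its own brand and API."""
    name: str
    intents: set  # the "doing" intents this provider handles

@dataclass
class Assistant:
    knowledge: dict                       # stand-in for cached/pre-trained knowledge
    services: list = field(default_factory=list)

    def handle(self, intent: str, query: str) -> str:
        # "Doing": route to a live service that registered this intent.
        for svc in self.services:
            if intent in svc.intents:
                return f"[{svc.name}] executing '{intent}' for: {query}"
        # "Knowing": fall back to the cached index.
        answer = self.knowledge.get(query)
        if answer is not None:
            return answer
        return "Sorry, that's out of scope."

assistant = Assistant(knowledge={"capital of France": "Paris"})
assistant.services.append(Service("OpenTable", {"book_restaurant"}))

print(assistant.handle("book_restaurant", "tomorrow 7pm"))  # routed to OpenTable
print(assistant.handle("ask", "capital of France"))         # answered from knowledge
```

The point of the sketch is the third missing piece: the "doing" path only works because a provider chose to register, with its brand carried through to the user.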
A
It's a really interesting list. The reason I say that is because I hear these three, and it's so obvious to me, hearing each one: yes, that is something missing from the current crop of generative AI tools, and yes, that is something Siri got right 10 or 15 years ago. So it's interesting to compare and contrast the two.
B
The irony with Siri is that we got the mix of graphical interface and language, and we got the access to APIs and apps for doing. We were not very good at knowing, so we had to call out to Wolfram Alpha at the time; now we have LLMs. But the third piece, the ecosystem, which was always, for me, the whole point of Siri, got left on the cutting room floor after Steve Jobs died. You know that Siri launched October 4, 2011, and Steve died the very next day. His admin wrote to me and said he was literally clinging to life to see the launch of Siri, and he passed the next day. Oh my gosh. And we had agreed beforehand that, just like the iPhone came out with 15 great apps to set the standard and then the next year the App Store opened so that every developer in the world, in every industry, could plug and play following the standard Apple set, that was always the whole idea of Siri. We were going to start with the same 15 built-in apps, you know, timers and weather and stocks and maps, but then you need to open it up as an ecosystem so that it becomes as important as the web and as mobile. And that unfortunately never happened. So that third vision, and we were trying to get all three as far back as 2010, 2011, with me going back 32 years, was never achieved.
A
I didn't know that story offhand, about the timing and the missing piece there with the vision. I'm curious, and I'm sure you have an interesting perspective on this: to me, the App Store was one of the pillars of the success of the iPhone. That pillar was one of Steve Jobs' genius ideas that made the whole thing take off. And it kind of seems like nobody's really nailed that ecosystem idea since then. He nailed it, and yes, Google did the same thing with Android, but I don't know, it feels, and I'm curious what you think, Adam, like there's been a decline of that, and we haven't seen a really good ecosystem like that in the past handful of years. First of all, do you buy that premise? And second, why do you think that is, and do we need to get back to it?
B
I do think we need to get back to it. I think it's a huge missing piece. For me, there are paradigm-shift moments, and I came up with what I call the 10-plus theory, or conjecture. Everyone says things are going faster and faster; I say things are actually slowing down a little bit. Let me explain at a macro level, not a micro level; of course, there are a lot of micro things. So let's start with how people interact with computers. 1984, the Mac: the mouse and windows go mainstream, and there was an ecosystem for it, here's how you build applications with a graphical interface that supports the mouse and windows and all that. Ten plus one years later, 1995: Internet Explorer and Netscape, and now the web takes off, and again there's an interface paradigm and an ecosystem; you just have to install this web server, blah, blah, blah. Ten plus two years later, and you can see where I'm going with this: 2007 was the iPhone, and the following year they split the iPhone and the App Store into two, so call it 2008. So I've been saying for more than a decade that ten plus three years later there would be a new paradigm for interaction, as important as the web and mobile, and it would be the conversational assistant. And if you take GPT-3, which was June 2020, and 3.5, or ChatGPT as we call it, which was November 30, 2022, and split the difference, 2021 is a pretty good prediction. But in order to be a paradigm-scale system, you need the right interface, you need knowing and doing, and you need an ecosystem. All of those paradigms had that; the web, mobile, even computing itself wouldn't exist without those three things. And so I just feel like the potential for what we're seeing today has not been fully realized. It exploded in some dimensions, but it has these gaping holes. And then my next prediction: ten plus four years later from 2021.
So 2035 is my prediction for the next major paradigm shift in how we interact with computers. And it is... any guesses?
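[Editor's note: the "10 plus N" timeline above works out as simple arithmetic. Dates are as given in the conversation; the gaps are only approximately 10+1, 10+2, 10+3, 10+4, since the speaker anchors mobile at the 2008 App Store rather than the 2007 iPhone.]

```python
# Paradigm shifts and the years between them, per the conversation.
paradigms = [
    ("GUI (Mac)", 1984),
    ("Web (Netscape / Internet Explorer)", 1995),
    ("Mobile (iPhone + App Store)", 2008),
    ("Conversational AI (between GPT-3 and ChatGPT)", 2021),
    ("Augmented reality (predicted)", 2035),
]
gaps = [later[1] - earlier[1] for earlier, later in zip(paradigms, paradigms[1:])]
print(gaps)  # [11, 13, 13, 14] -- roughly 10+N years per shift
```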
A
I don't know... spatial superintelligence? Where are you going, Adam?
B
Close. I think it's augmented reality. I think by 2035, battery life, the headsets, whether it's contact lenses or glasses, the application ecosystem, all of those things will come together so that we can live in our world but augment all the pixels we see through our visual prosthetics. We'll have augmented reality, the physics, et cetera. And for me, the same way that Siri was the harbinger that opened up the need, the desire, the possibility of conversational AI some 10, 15 years early, things like the Apple Vision Pro are doing that now. It was not meant for consumers, and Apple knew it wasn't meant for the mass market; the battery doesn't even last long enough to watch a movie. But it introduced the possibility, the desire. I think by 2035 the next breakthrough interface paradigm will happen. But again, you need the same three things: knowing and doing, the right interface, and an ecosystem that every developer, in every industry, can build for.
A
I like that prediction, and the reason I like it is because it feels realistic based on the gaps I'm seeing in augmented reality right now. Just very briefly, I've been a skeptic of augmented reality and the metaverse for a while, and I feel like every year we get sold some bill of goods about why the technology is right here and everybody should be super excited. And I still haven't seen the killer app for it. I still haven't seen the confluence of ideas and what can be unlocked that actually makes me say, okay, people are going to want to wear this and engage here. So I'm curious, from your perspective: what's missing? What could be here in ten plus four years, or call it 10 years from now, that we don't have today, and what are people getting wrong?
B
Well, I never think about a killer app. There are sometimes killer apps, like the spreadsheet; Lotus 1-2-3 might have been the killer app that put a PC on everyone's desk in some form. But what's the killer app for the graphical interface and mouse? What's the killer app for the web? Maybe knowledge search. It changes things, but it's not just one app. I always think about the killer OS: a framework, a paradigm for doing things that is better than the way we can do computing today. Think about the paradigms at this scale. With the graphical interface, we were chained to our desks; we had to load our machines with software from CD-ROMs and floppy disks, and we could compute using only what we had. The web let us magically compute using software from around the world, but you were still chained to your desk. The mobile interface freed you from the desk, so you could walk down the street, bump into lampposts while staring at your phone; you could now compute in moments you couldn't before. Conversational AI extends that again. There are still moments you can't do computing: if you're driving in your car, and I forget the statistic, I think it's something like 10 billion hours of commute time in the US, at least before the pandemic, you can't be looking at your screen, but you can be asking questions. You can learn things, you can do things. It gets you moments, like when you're washing dishes, when a conversational assistant can help you compute. So what's missing for AR? What's the killer app? Today we're bound to screens everywhere. Someone said we lived in a world for 10,000 years without screens, and we will live 10,000 more years in a world without screens. Why? Because it's still somewhat inconvenient to have to pull out this phone, and it's not in the right place, and oh, this computer...
I've got a desktop monitor this big, and if I go to a movie theater, it's that big. Screens are so important today. And someday the physical screens will vanish; everything we use screens for, we'll be able to do with AR. But you need to be able to interact, you need interface paradigms, you need to be able to get all the information, et cetera. And that needs to be solved, right?
A
And it seems like, getting back to it, right now we're quite far away from having figured out the design principles to get that right and to get people engaged in it. Maybe that's a compute issue, maybe it's raw horsepower, maybe we're not asking the question properly. But I'm curious, Adam: in creating Siri, I'm sure you learned an awful lot about principles of design and principles of human psychology. What were some of the big lessons you took away from that, and how do they inform the way you parse out whether a technology is going to take off or not?
B
Well, as I mentioned, we haven't even figured out the design principles for AI, let alone augmented reality. Though in many ways, we may be closer on the design aspects of AR than we are with AI in some cases. Now, there's a lot to figure out: occlusion, real-world physics issues. Pointing is a little tough, you can't type easily, so how do you get information in? Maybe it's through speech. So let me tell a story: how did I meet Steve Jobs? We had launched a free app in the App Store named Siri. We were a little startup, 20 or 22 people. He called our office unannounced two weeks after launch and said, hey, what are you doing? Want to come over to my house tomorrow? We're like, Steve Jobs is calling us? How did you get this phone number? Because people don't know, one of the meanings of Siri is secret in Swahili, and we had no phone number on the website, no sign on our door, and yet Steve Jobs was calling us. That first day we went to his house, I had a discussion with him. I said, people often think voice will replace the GUI, replace the keyboard. That's not how it works at all. So here's my first design principle, both for AI and AR. If it's on the, I'll say, screen or visual field, the best way is to directly manipulate it; that used to be through a mouse, and the iPhone introduced multitouch and pinch and zoom. But despite designers' best attempts to put everything you need to know and everything you need to do on that screen, however big the screen is, you can't fit it all. So if it's on the screen, the best way is to interact with it directly. If it's off the screen, the best way is to ask for it. And the perfect interface is a beautiful combination of both that keeps the task and conversational context, whether you're clicking or talking or scrolling. It needs to be seamless. It's multimodal, in the way that people are multimodal all the time. So for me, that's one big design principle when you look at devices.
And when I sold Viv to Samsung, we launched our AI technology on half a billion devices. We had an open developer ecosystem, and I think we solved the three problems I described, to some extent. It wasn't a commercial success, for various marketing reasons, but if people want to know how I think those things should be solved, there are clues: go look at our work there. We shipped on phones, tablets, smartwatches, giant televisions, speakers. And in each context you have to think about the capabilities for input and for output. Screens are super giant if you're on a big TV, but input is really a challenge. A speaker has to be voice in, voice out. On a smartwatch you can actually touch and do some things, but you can't type. If I'm sitting at a desk, I can type; typing is a great, fast way to enter information, and so is pointing. So, combining those two principles: you really need a seamless model that can dynamically adapt to the input and output requirements of the underlying device. If it's on the screen, the best way is to touch it; if it's off the screen, the best way is to ask for it. And you need a seamless transition, where users don't have to think about which one they're doing.
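[Editor's note: the two principles above can be sketched as a tiny modality-selection rule over a device-capability table. This is an illustration only, with an invented capability table; it is not Viv's or Samsung's actual design.]

```python
# Hypothetical capability table: device -> (input modalities, output modalities).
DEVICE_CAPS = {
    "phone":      ({"touch", "type", "voice"}, {"screen", "audio"}),
    "smartwatch": ({"touch", "voice"},         {"small_screen", "audio"}),
    "speaker":    ({"voice"},                  {"audio"}),
    "tv":         ({"voice", "remote"},        {"big_screen", "audio"}),
}

def choose_modality(device: str, item_on_screen: bool) -> str:
    """If it's on the screen, touch it; if it's off the screen, ask for it."""
    inputs, _outputs = DEVICE_CAPS[device]
    if item_on_screen and "touch" in inputs:
        return "touch"
    # No touch available (speaker, TV) or the item isn't visible: fall back to voice.
    return "voice"

print(choose_modality("phone", item_on_screen=True))     # touch
print(choose_modality("speaker", item_on_screen=False))  # voice
```

A real system would blend modalities and carry conversational context across them; the table-driven lookup just shows the "dynamically adapt to the device" part of the idea.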
A
So for anyone who's watching this and is in the business of design, or thinking about how to solve these challenges, or how to create the next workflow or doer or app that's going to help with this: what's your approach in terms of the balance between, I guess, abstractly figuring out the multimodal journey the user is trying to solve for, versus just iterative feedback, getting something into people's hands and playing with it from there? Because Siri is kind of a long story, right? As you said, it started in the early 90s. What's the recipe you've learned and reused in your career?
B
Yeah, there are different ways to do entrepreneurship. When we took Siri around, we talked to every VC. We had a working prototype for certain use cases; Siri was very domain-focused, it did restaurants and movies and so on. And some VCs said, oh great, just put whatever you have up as an MVP, users will tell you what they want to do, and you'll iterate with them. I said, that's great for many products. When I was a founding member and the first developer at Change.org, it was a rapid iteration cycle; we tried so many product ideas until, four years later, it finally took off virally. So that was iteration. But with Siri, you only get one chance to make a first impression. Steve Jobs wouldn't have called our office over the early prototypes, right? We wouldn't have changed the world in quite the same way. So I wanted to take the time and work with some of the best designers and people to really think through these issues. You can't just throw it up; users aren't going to know what they want if it doesn't exist yet. They'll tell you the problems with whatever you give them, but they won't tell you the solutions. So user testing is important, but I think you also need deep thought about the possible space. I'm so proud of many of the design approaches; I think there are things we solved in the original version of Siri that got lost. I'll give you one example. When Siri came out on the iPhone, I had arguments with Steve Jobs about this. We argued about lots of things; we had different opinions. I loved the process, by the way, and we can talk more about how we debated solutions. But he said, I only want a voice interface, no typing to Siri. The original Siri had three modes of input: typing, tapping, or talking, and they were kind of equal. And I claim the original Siri, launched in 2010, had a 95% task-completion success rate.
I don't think any voice assistant since then has matched that. And why? Because until we got LLM technology, which can say something about anything even though it can't do very much, every assistant, Alexa, et cetera, had domains of knowledge. If the user asked something in domain, make a reservation tomorrow at 7 o'clock, and it was in scope, it would work pretty well. But if they asked something out of scope, it would fail. You'd say, how many hairs are on the leg of a grasshopper? And it would say, looking for restaurants named Grasshopper near you. That's not a question Siri could handle, but a user doesn't know that, and with a spoken interface you can't discover it. Speech is an expert modality, in a sense; it's not for beginners. You can't browse, you can't discover. So we spent a lot of time looking at the beginner experience and the intermediate experience. Maybe you know it does restaurants, but you don't know: can I say, find me a restaurant with a view of the Golden Gate Bridge? Maybe that's in scope, maybe it's not; you're not sure. So there's beginner, intermediate, expert. Oh, I know it can make a reservation at a French restaurant, boom, I can do it faster with voice than with any other medium. We looked at the different use cases and optimized for all three. Many voice and conversational assistants today only say, you know, say anything. They don't give any support for beginner or intermediate modes, and language is often an expert modality. If it can do the thing, great, but if it can't, and there are many things it still can't do, users don't have a way to discover that. And users hate to feel stupid. If it says, looking for a restaurant named Grasshoppers near me, the user will attribute the failure to Siri, rightfully so, but they will still feel bad.
They don't want to feel stupid. They want to feel smart and successful and empowered. And so they stop using Siri, except for the one or two expert use cases they know, set a timer and whatever; everyone has their two. They don't confidently expand their scope, and as a result usage stays limited. So anyway, those are some retrospective thoughts about how to approach the problem. And if you go back and find old Siri videos from before the iPhone launch, you'll see our solutions to those.
A
So was there more of a graphical user interface then? Because that was one of the shortcomings we talked about, and to me the most obvious place for a beginner to learn more and be able to finesse what they're asking is a graphical one. Is that what it was, or was there a different approach?
B
So we had type, tap, or talk. Let me say a few words on that. In the beginning, it would give you a list of the most popular suggestions by category, so you could just scroll: oh, it has restaurants, it has movies, and so on. As soon as you clicked on restaurants, it would pivot into a menu. For movies, you could say, movies starring, and it would show: starring an actor, rated PG. You'd go, oh, these are the things it knows about movies. You could browse and build a model of what it did. So for the beginner, you could browse; menus are great for browsing. Then there was typing, where we had what we called semantic autocomplete. Say you wanted to find a romantic comedy: you would type R-O, and romantic comedy would now show up in the list. But you would also see rodeos and tea rooms. Rodeos! I didn't know Siri knew about rodeos. So it would expand your understanding of what was in scope. Even though you're looking for romantic comedies, you file it away for later: hey, cool, it knows about rodeos and tea rooms. So you would type R-O, see the romantic comedies, and then you could say, starring, and click Tom Cruise. It was an autocomplete that both expanded your horizons of what was in there and contracted toward what was doable. And you could leap in: if you knew it did restaurants but didn't know about the Golden Gate view, you would type restaurants and see a menu of the possibilities. So for me, yes, it was a mix. If you didn't type anything or ask for anything, it gave you suggestions you could browse. If you knew roughly what you wanted, it would help you get there, both expanding and contracting. And if you knew it worked, French restaurants tomorrow, you would just say it. Boom. Speech is an expert modality.
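[Editor's note: the semantic autocomplete described here can be sketched as a word-prefix match over the assistant's in-scope vocabulary. This is a toy reconstruction with an invented vocabulary, not the original implementation; note how typing "ro" both narrows toward "romantic comedy" and surfaces capabilities you didn't know existed, like rodeos and tea rooms.]

```python
# Hypothetical in-scope vocabulary of concepts the assistant understands.
VOCABULARY = [
    "romantic comedy", "rodeos", "tea rooms",
    "restaurants", "reservations", "movies starring",
]

def semantic_autocomplete(prefix: str) -> list:
    """Return every in-scope phrase containing a word that starts with the prefix."""
    p = prefix.lower()
    # Match the prefix at the start of any word, so "ro" finds both
    # "romantic comedy" and "tea rooms" (via "rooms").
    return sorted(
        phrase for phrase in VOCABULARY
        if any(word.startswith(p) for word in phrase.split())
    )

print(semantic_autocomplete("ro"))
# ['rodeos', 'romantic comedy', 'tea rooms']
```

In the real product, selecting a suggestion then pivoted into that domain's menu (starring, rated PG, ...), so browsing and typing fed the same underlying model of what was doable.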
A
I'm just reflecting on that, and I keep coming back to this current wave of AI tools: what they can and can't do, and their modalities. They're getting better at modalities, as you know, but they're not perfect. When somebody gets it, quote, right, in your mind, is it basically the meeting point of Siri and these conversational agents? Is that where we're going? And does that look something like the movie Her, with Samantha, or HAL? We've got these sci-fi examples. Is that where we're heading, or does the road take us somewhere else?
B
Yeah, I think it's pretty close. I think it'll be woven into the computing we use. We'll still use graphical interfaces, so it's not Samantha from Her, which was voice only. As I said before, voice is great when you're driving in your car or washing your dishes, but it's not great for a lot of applications. Plan your trip to Paris: would you go to Paris just because Samantha said, oh, you should stay at this hotel? No, I want to see photos of the hotel I'm going to stay in. Voice is useful, but you need more. So I think it will exist on screens. You'll use a mix of GUIs and language in a seamless, natural way. You might browse through options and then ask the AI, hey, is this one good at that? Or how far is this? You'll ask a knowledge question: are there any dog parks nearby? That would be hard to do on the screen; there's no good way to ask for it in a graphical interface, so ask for it with language. That's where I think the interface paradigm takes us. We're still living on screens until AR replaces those screens. And I hope this conversational AI, just like the Internet evolved, will evolve to have my three missing pieces. If you draw a parallel: the Internet was 1995, but remember what we had in 1995? Static web pages with horrible interfaces; the blink tag was really popular. It wasn't dynamic; you couldn't do anything. Bad UI, static, couldn't do anything. That sounds a lot like GPT today to me: bad UI, great knowledge, can't do anything, no real ecosystem. But over time we got JavaScript, and the web got interactive. We had databases and middleware tiers and could now pull in APIs and real-time data and doing. Okay, it's getting richer; the interface is getting richer.
Access to live, dynamic websites with real data pulled from an API; commerce starts to happen; Amazon takes off. Eventually we got standards, and now every business in the world knows how to build a good user experience that is dynamic, pulls live data, and can both present information and do things. It's kind of mature. So I see the time between now and 2035 as evolving the maturity of AI, just like we did with the Internet, right?
A
As I think about that, and about these technologies, whether it's AR or AI now, you introduced the idea that the killer app is not the right way to think about some of this stuff, which is totally fair and interesting to me. The other thing that comes to mind, Adam, whether it's the web, Siri, or the conversational AI we've got here, is the word magic. And I know you're a guy who has magic in your background, and it's certainly a word I've heard you use before. To what degree should we be planning for this stuff, and, as you said, that wow moment of a first impression? What's at the intersection of human psychology, magic, and technology? How should we be thinking about this, and what implications does it have for design and future product roadmaps?
B
Yeah, those are great questions, and I have a few possible answers. One form of magic is presentation. Magic makes you feel. Take Steve Jobs as an example: when he did his product reveals, it was never about stats and here's how much memory it has, et cetera. He literally did a magic show, and he used magic principles in his demonstrations. I'll give an example. He once started one of his keynotes by saying the iPod has always been about a thousand songs in your pocket, and he showed a picture on the screen of a pair of jeans with a pocket. But jeans pockets have a little pocket at the top, and he goes, have you ever wondered what that little pocket is for? And everyone in the audience went, no way. That's a magic thing, right? It's like the card trick: that card over there, I haven't come anywhere near it, it's across the table or on the other side of the room; no way that's going to be my signed card, right? Same exact mechanism. And then he pulled out the iPod nano, this little thing that fit in that pocket. So one, there are lessons in magic about how to present your products and your tech systems in a way that makes people feel, that builds anticipation, that creates wonder. When's the last time you were really excited about a technology? Whatever it was, someone was using magic principles to make you desire it. The second thing I would say is that AI and magic are very similar because both are moving targets. In '96, I believe, AI was IBM's Deep Blue beating the world chess champion. We're like, oh my gosh, chess was an intellectual activity, the highest form of human intellect, and now a machine could do it. That was AI. But then we're like, ah, it's just brute-force search; it just looked ahead, right?
Siri even came out and people were like, oh my gosh, I can talk to a computer to know things. If we had said 25 years ago, you're going to have a device in your pocket that knows not only where you are but who you are, and you can have a conversation with it and it will do things and answer things on your behalf, it would have been magic, right? It would have been like science fiction. No way. Siri was that. And Apple, if you remember, in the six months after Siri launched, Apple's stock price nearly doubled. It surpassed Exxon to become the most valuable company in the world, market cap wise, just selling the iPhone 4S, which literally was the iPhone 4 with a slightly better camera and Siri. So magic, it made you feel. But it's a moving target. Today you're like, ah, it's just Siri, I set a timer, right? And now AI is not deep learning; that was last year's definition of AI. Now it's large language models. So sometimes people say, oh my gosh, it's changing so fast, what will AI be in 10 years? I'm like, oh, it'll be amazing, you know, all this large language model stuff. But look, that won't be AI, because it's a moving target. And it's the same with magic. I give a definition of magic: what is magic? Magic is, you know it when you feel it. So I might do a trick, but if you don't feel anything, it's not magic for you. There may be two people: for one person, it's the most incredible thing they've ever seen; for the other, it's like, oh, I've seen that before, right? Magic is only happening for one person. It's the same with AI. So for me, there are lessons to be learned. I also say entrepreneurs and magicians are exactly the same. They imagine an impossible, desirable future. That's the hard part: what should exist. And then once you have the vision, you work backwards from there to solve the math and the science to make it come true.
Whether it's David Copperfield flying over a stage, or you've created Siri, or the next great breakthrough in AI, you need the vision, and then you solve the math and science, and then you hopefully make people feel and care. If not, it's not magic.
A
I really like that. I've never heard that before, that comparison between entrepreneurship and magicians. And it got me thinking, Adam, that an awful lot of entrepreneurship, especially in tech, but in everything, is trying to do these magic tricks. It's performative, it's trying to convince people: I have magic, this is magic. And frankly, a lot more of it seems to be telling versus showing these days. We're all so bombarded these days with stories about, oh, this is going to be the next thing in tech, or that's going to be the next thing in tech. I'm curious, from your perspective, what have you been hearing about recently, or what are some of the tech trends where you look at it and say, that's not magic, that's not going to be the next big thing? I don't care how many times you tell me, that's not it. That's BS for right now.
B
Yeah. So if I have one superpower, I think it's seeing where the world is going to go and then timing the market to get there. I had been working on versions of Siri since 1993. If I had started it sooner, it would have failed, too early. If I'd started it later, maybe Alexa gets it right. So I built the first voice assistant; many came after. I was a founding member and the first developer of Change.org, which now has more than half a billion members. It was the first social network for social activism; many came after. I was a founder of Sentient, which was the first large-scale machine learning platform company. In 2009 we had more than 2 million CPUs and GPUs, just as deep learning was starting to come out, for neural nets and for genetic algorithms. With Viv Labs, which I sold to Samsung, it was the first open AI assistant ecosystem, and it wrote code on the fly in 50 milliseconds for every user request. It would dynamically write a program to solve complex tasks on the fly, and now, almost 10 years later, we see AI can code. So timing is everything. I believe that we will not see magic in the current crop of AI. I mean, we've seen incredible magic, you see these moments that are fantastic, but to some extent, what's going to blow you away now? It's tough. What I want AI to do is mature, as we talked about with the web: to get the knowing and the doing and a better interface and ecosystem. But none of those are magical in the same way as the first time you saw ChatGPT, right? Or the voice-to-voice modes. I actually think the next big magic place will be to the side. It'll be something new. I have some thoughts about where the next big magic moment might happen, but I don't think it will be in AI. I think AI will mature and progress and become this kind of boring infrastructure for everything. It will hugely change our society, but we won't notice, just like the Internet has.
Who goes to a library anymore? But no one's like, whoa, the Internet. Oh my God, you know, it's just the Internet. Oh, it's just a mobile phone. It's just AI.
A
So, yeah, what can you tell me or what are you willing to tell me about where you think the magic is going to happen next?
B
I'll tell you two things. The first I already gave you, which is my timing prediction: I do think AR, for 2035. There's a lot to do to get there, but I think it will go mass market in 2035. And to do that you need a magic moment, right? You're not going to get mass adoption if you can't create desire and frenzy around it. There's another which is more subtle, but it's important. Like it or not, the world right now is faced with lots of complex problems. We went through the pandemic, which slapped everyone in the face, right? It made us feel a little bit of pain when you couldn't fly on an airplane, et cetera. But there's hunger, there's poverty, there's war, our country is divided, there's water pollution, climate change. There are big systemic issues in the world, and AI is not going to solve them for us. We are going to have to build new tools, and of course AI will be part of that, that allow us to collectively solve problems better, big and small. I think it's a huge need that's getting ever more present, and in terms of my trends-and-triggers kind of framework, we need something new: new breakthroughs in human collaboration that augment our intellect and our ability to solve big problems. AI is a little piece of it, but it's not enough. AI is not going to solve our problems; we, working together, will solve our problems. So it's not as sexy as AR and AI. I call it CI, collective intelligence. And I didn't invent the word. Doug Engelbart, who also invented the mouse, was the one to coin the phrase collective IQ, and the whole concept of augmenting human intellect comes from him. But it is time now. I think the world is going to need that, and I think that's where the big breakthroughs are. If entrepreneurs are listening to this, go there, because the world needs it, or will need it soon. And the first company that stands up and shows a real breakthrough there, I think, will have huge adoption.
A
Interesting. Where it sent me, Adam, is when I think about the story of Siri again. You talk about the company that's going to solve this. With Siri, and I know a lot less about this than you do, you had a front row seat to it and created it, but as I was learning Siri's story, sure, there's Apple and there's Steve Jobs, but I don't know if most people know that SRI, the Stanford Research Institute, was kind of a key chapter for Siri, and that DARPA and some of the initiatives there intersected with it. It seems like there's kind of a who's who of institutions that at some point crossed paths with Siri. Was that cross pollination one of the keys in your mind that actually enabled Siri to be the success that it is? And does that have implications for the future that we build, and for collective intelligence being able to have this kind of collective institutional approach?
B
Yeah, I just want to shout out a little bit the role that government funding and research play. We all take the Internet as standard; it has brought huge prosperity to the world, right? And we might ascribe it to Tim Berners-Lee coming up with the web browser, but all of the work on ARPANET and the Internet was government funded early on, and it laid all of the foundations. Licklider and many of the government people funding those projects were huge. If you go back to the mid-2000s, around 2005, there was a director of DARPA named Tony Tether, and he funded a thousand projects a year. But there were two that he cared about more than any others and invested more in. One was the CALO PAL project. What was it? It was an intelligent assistant that was based on machine learning but could also interact and communicate with its user. It would build models of understanding of the user's world and their work life, the tasks and the projects and who works on what and the roles, and then perform support tasks on top of it. It was a real early working AI assistant. And the other one was the DARPA Grand Challenge. What was that? It was self-driving cars. So those were the two big projects that Tony Tether was funding, to universities and students and grad students doing research. Fast forward 10 years, 2015: Tesla comes out with its first attempt at a self-driving module. Amazing. Over-the-air update. Crazy. Never thought I'd see that in my lifetime. Siri launches in 2011, and then there's Cortana and Alexa and Google Assistant and on and on, and it lays the foundation for the AI that we're living with today. So I do think about SRI, which was a nonprofit research institute that received funding, some commercial, but largely from government research funding.
That funding, even though it's just like sprinkling seeds, you can see a direct payoff 10 years later, to our economy, to everything, right? So I think it's important, and something to consider in this day as we're wrestling with budgets and allocations and how we bring researchers from around the world to universities. That system has contributed hugely to the world's prosperity, and created problems too. I'm not saying there are no problems. Sure.
A
No, that's great. It's not always the sexiest topic, but it's so interesting, and it's so powerful to remember what's been unlocked when we have that kind of government support. I've had a few conversations recently that have been reiterating that, which is awesome. Adam, just closing out here: do you have any parting words of wisdom for leaders who are building the digital products and services of the future, on the best way to do that or approach that, based on some of the lessons you've learned over time?
B
Yeah. So for leaders right now: AI is a thing. It is real, and it is as important and transformative as the web and mobile. Every business will get involved; every employee's role will change. So lesson one is we will have to embrace it. Just like the Internet, you can't ignore it, right? It's there; you have to have a website; it's something that will have importance. It is still early in its maturity. AI is great at some things. I think language education is amazing; summarization, translation, general knowledge are great. But it's also a couple of years out from really being viable for some tasks that you would just think would be easy. Take customer support. If we have ChatGPT that can answer every question about anything in the world, on every topic, well, obviously it should be able to answer every question for just one company. No brainer. But if you look around, where's a great generative AI customer support system? It's not quite there yet. Even OpenAI doesn't have a gen AI support system. Why is that, right? So I want to balance the hype. It is important, you have to be learning and embracing it, but you should also know it's still immature and not quite there for some applications. So proceed with some caution. I say use it internally as much as you can, with the right procedures and processes in place. Externally it's hard: guardrails are still challenging, and there are regulations. But use it for your employees with the right procedures, find the applications where it works really well, and stay on top of things. But be aware: it's not magic for every application.
A
Adam, I want to say a big thank you for joining today. I really appreciate the conversation and love your insights; it was super interesting.
B
Thank you so much. It was great to be here, Sam.
Date: August 25, 2025
This episode features Adam Cheyer, the pioneering co-founder of Siri, in a probing conversation about the present and future of artificial intelligence, digital assistants, and their impact on technology and society. Cheyer reflects on how AI’s trajectory deviated from early visions, what’s fundamentally missing in today’s AI products, and where the true “magic moments” of technology still lie ahead. He offers candid insights on design, ecosystem, the role of government research, timing, and the next paradigms of computing—touching on everything from AR to collective intelligence.
Cheyer identifies three core challenges that must be solved together for AI assistants to achieve their promise:
A. User Interface is All Wrong
B. Knowing vs. Doing
C. The Ecosystem/App Store Analogy
Adam Cheyer offers a nuanced, historically rich perspective on the ongoing digital disruption. He urges leaders to think deeply about interface, architecture, and ecosystem—not just to chase the latest tech hype. The real breakthroughs, he contends, will come from blending human-centric design, collaborative intelligence, and timing—the true ‘magic’ of technology innovation.