
Nathan Benaich, Founder and General Partner of Air Street Capital, joins me to discuss AI in 2025. From runaway consumer adoption to evolving enterprise moats, from still-elusive AI-driven drug breakthroughs to the renewed vigour in robotics, several core themes stood out.
A
Hi. Happy holidays. While I'm away this holiday season, I wanted to share a few of my conversations with leading experts in AI. I discussed how AI might surprise us in 2025 with Ethan Mollick, Dylan Patel, Nathan Benaich, and Kai-Fu Lee. The conversations were initially released for members of Exponential View. Here's my discussion with Nathan Benaich. Enjoy. Today I'm super excited to have Nathan Benaich, the founding general partner of Air Street Capital. He has been committed to backing AI entrepreneurs for, I would say, close to a decade now, and he's very well known for his annual report, the State of AI. Nathan, it's great to see you. You're in a wonderful, glamorous location today as well.
B
Yeah, thanks for having me. It's going to be a fun discussion. I mean, so much of what we wrote in the report is probably due for an update already.
A
Yes, well, I'm giving a speech tomorrow and I'm thinking, how much do I have to update because of Google's latest announcement, which happened 24 hours ago?
B
Yeah. Hopefully you're not making any fancy graphs that you have to go back into and update with painful formatting.
A
Yeah, we've done a little bit of that. Now, I always grade my predictions. I do a horizon scan at the end of every year. And I know that you and your colleagues on the State of AI report also grade your predictions. And you know what? We both graded ourselves pretty well, almost too well. So are we being too unambitious with our predictions about the AI world, or is it just more predictable than we let on?
B
Or we're just insiders playing inside baseball? I don't know. There have certainly been predictions we've made that have just not hit for several years. The main one has been around competition in AI semiconductors. So many new chip companies have been started over the years, and I was originally excited about them as a prospect to compete with Nvidia, but it just looks like a losing battle over and over again. That prompted the slide we put in the report this year: a stylized portfolio of $6 billion invested in all these chip companies versus $6 billion in Nvidia, at the same points in time that the startups got their money. The delta is basically that $6 billion in startups would have turned into $31 billion today, versus $200 billion if you had put it in Nvidia.
A
So it's Nvidia.
B
Yeah. So that one we got wrong. But some other stuff we got right, and that was pretty punchy.
A
Yeah, the semiconductor one is also one that I got wrong. I got really excited back in 2015 and 2016 about companies like Graphcore. Then Jonathan spun out of Google to set up Groq, there was Mythic, Mike Henry's business, and there was Cerebras. And I thought these companies were going to do really incredible things. I think really only Cerebras is in the game now, with its big single chip. And as you say, everything has flooded back to Nvidia, which is something we definitely got wrong. This has happened against a remarkable landscape of scale and adoption. When we look at consumer adoption, I wouldn't have guessed 305 million weekly users for ChatGPT if you'd asked me at the start of this year, or that it would be the 8th busiest website in the world. And then we see this in the revenue numbers. Menlo Ventures just released their survey of enterprise spending on generative AI. It was running at 13 billion for 2024, up from about 2.5 to 3 billion the year before. And at the same time, Andreessen Horowitz is saying there's no money in foundation models, that it's a race to the bottom. So when we look at scale, speed and the business model bundled together, which is in a way the shape of the industry, what do you read into what happened in 2024? What do you think is emerging? And were you surprised by where we've ended up as we come to the end of the year?
B
Yeah, I'm not going to lie, I was surprised by the revenue prints of OpenAI and Anthropic. I remember making these slides a year or two ago, when OpenAI said they were going to make billions in revenue, and I don't know anybody who predicts they can go from zero to billions that quickly. It's hard to put your money against that. But they've done it. So I've been surprised by that. I've also been surprised by how open source has competed with closed models. In some sense I was enthusiastic that there was going to be a big universe of open source models specialized for all sorts of things, and that it makes pragmatic sense for customers and users not to use the Ferrari for every task but to have a more specialized part for the job. To some degree that might still be true in enterprise, but I'm not sure consumers care. On the consumer side, it really seems to be ChatGPT equals AI, and that's it. In enterprise, those who really care about privacy and self-hosting can maybe do open source. But at the same time, the devil's advocate argument to me is: if open source is so great and really growing in adoption, why are OpenAI, Anthropic and others seeing their revenue grow so much?
A
Right.
B
So that leads me to some analogies that are hopefully helpful. At the moment, because the market structure is so nascent and these products are so nascent, there's experimentation in both. In the same way, when smartphones launched you had iOS and then at some point you had Android. Android was cool for a while because you could do funky things with it: you could fork it, change things, whatever. And people liked this open and permissive garden. But as you roll the years forward, it seems to me that iOS just increasingly crushes Android. At least among the people I work with, in or outside of technology, I don't think I've added a phone number in the last year that's not an iPhone. And I add a lot of phone numbers, based on the people I'm meeting all the time. So I think that at the limit, users and businesses just want a service that's dependable, powerful, updated whenever new things happen, and at a good price.
A
I mean, it feels like it's much deeper than "you don't get fired for buying IBM." It's deeper in the sense that you have an account manager, you have a roadmap that's visible, you have a developer community that is emerging, you have well-run developer days.
B
Yeah.
A
And all of these things come together and if you're an enterprise, that is really beneficial for you.
B
Yeah, yeah. And then I think you have a bit of a challenge if you're going to work on open source, which is: which cloud is going to reliably update and serve these models? On one hand, you have Google, Microsoft and Amazon, but all of those companies have their own internal large model divisions and are probably biased towards serving those. I can't remember if Google serves open source models, but they sure as hell don't promote them in Google Workspace.
A
Right, right.
B
So as a large enterprise buyer of Google services, I think you're just going to end up nudged towards their proprietary stuff. Then you're left with, maybe, players like Databricks or Snowflake that are sort of neutral ground, because neither of them is really committing to building its own large system. So maybe they can be the preferred partner to serve open source, but then they have different business incentives.
A
So, look, this feels a lot like the traditional enterprise cloud market, right? These are the decisions people make. The buying is often: we have a great MSA, master services agreement, with AWS, Amazon's cloud, and therefore it makes sense for us to use these models because they are just good enough. But I feel that underneath, there are some really remarkable differences emerging. I've been surprised by two things in about two weeks that have already changed the way I work. Of course, OpenAI released the Pro version of o1 at $200 a month, and I started to move lots of my workflows onto o1 pro. And then this morning at Bayern, one of my colleagues got me access to Gemini Deep Research, and I've now done half a day's work using Gemini Deep Research, which has been available for 36 hours. We'll dig into each of those. But there is something quite exciting here, and it feels like there is momentum in the core technology beyond the sort of standard playout of clandestine.
B
Yeah, I think because they're quite general purpose, and therefore their utility is quite diverse, they're less fungible. It's not like buying a deterministic API that does authentication with text messages, where it doesn't matter what type of company you are: you need it, you pay for volume, and it is what it is. Here, I think there's competitive advantage in creativity and in how you formulate your tasks.
A
Yeah. So let's look at the models from those big companies. The three that really stick in mind are Anthropic's Claude; OpenAI's, what is it, one model or two? Right, GPT-4 and o1; and what Google has done with Gemini. Each of them feels to me distinctly different, qualitatively and quantitatively, from GPT-4, which was the best we had at the start of the year. Do you see them in a similar way, as qualitatively different?
B
They certainly have different product experiences and vibes, and they're good at different things, I think. The main line in the sand at the moment seems to be coding abilities, where developers love Anthropic's systems for coding. Now it looks like Google's service is going to rival it, but that's one thing. And then with OpenAI you get voice and voice mode. You don't really have that in other models. I still think the Google experience is terrible. Like that AI dev... I can't even remember what the URL is.
A
Visual Studio or something.
B
Yeah, yeah, AI Studio. But then there's Vertex, and if you're an enterprise, it's just very confusing. Pick a domain and stick with it. And then the other big difference is probably that Anthropic's systems are mostly consumed via API, or at least that's where most of the revenue comes from, whereas OpenAI is more of a consumer company.
A
Yeah. And we don't know what's happening with Google's consumption there. So if we look at voice mode: I don't really get voice mode on OpenAI. Is this something that you use? Do you have personal use cases? I don't understand what to do with it. How do you use it?
B
Yeah, really for consumer use cases. Like, I'm in the kitchen and stuff's going all over the place, and I'm like, how do I do this? It's so much better to listen to the agent and talk to it than to write or read.
A
Well, personal use cases are valuable. I mean, the iPhone is a multi-hundred-billion-dollar business based on sending pictures of cats to strangers, right?
B
They still haven't gotten me to convert to paying. I'm still on the street. I think these things are just kind of fun to talk to, to be honest. Companion is the wrong term, because that sort of implies you're lonely or something. But I think just getting advice is pretty cool. Another mutual investor tweeted recently that they had a long discussion for hours with their colleagues about something, fed the audio transcript in, and then the agent told them what the logical flaws were in what different people in the room had said.
A
Isn't it going to make us insufferable at Christmas parties? We'll have our little earpieces.
B
Yeah, yeah, true. But it's also cool to use this as a way of asking, you know, how would this person react if I said this? Or what's this person's vantage point, without offending them by asking? So I think that's very powerful and interesting. Plus, I just love voice. I think it's super uncanny valley right now.
A
Oh, yeah, it's really interesting. I use the Claude voice mode, and I'll show you my power trick here, for listeners as well. With Claude voice mode, you press the button and talk into it, and you can talk for up to 10 minutes. So when I've dropped my daughters at school and I'm on the way back, I'll just hit voice mode and say, hey, Claude, it's me, it's Azeem Azhar, A-Z-E-E-M, from Exponential View. It doesn't have memory yet, so I set this up: I'm going to talk to you about what I'm thinking, and I want you to give me notes for when I get back to my computer. I will sketch out an email I need to send. I will talk about a company I'm maybe looking at investing in, and I want to be reminded about things I don't know. I might have a note for my team. I'll just have this Ulysses-like, James Joyce stream of consciousness. Then when I get back to my computer, I fire up Claude, and a second after I've stopped talking, it's processed all of this, I have structured notes, and I've made use of those 10 minutes that would otherwise have been dead time. So that's been a real shift in my behavior. It's something I now do every day, actually, to the detriment of various podcasts I used to listen to on my drive back. That has been pretty surprising.
B
That's sick. I like how the Apple Store has geniuses and little boot camps that teach people how to use its services. I think we really need that. Unless you have the volition to really go and read about it yourself or learn it, or come up with this as inspiration, how does somebody even think about using this tool that way?
A
Yeah, I agree with you. I felt really out of touch about three months ago when I started to play around with one of these agentic workflow tools. This one's called Lindy. It requires a little bit more, not so much coding, but computational thinking, than just talking into a text box. But it's really powerful. One of the things we've built is personas in Lindy for four of our different reader types, plus another persona who's an expert editor. We put essays through it and get them to argue until they believe the thesis is really strong and distinctive. Then the editor sums up their recommendations, and it comes back to us. With something like that, I felt, wow, I need to spend four or five hours figuring out how to do this. Now, I'm quite lucky. Like you, I can get hold of the founders. So I phoned up Flo, the founder, and said, explain to me how I get this to work. And he did. And you're right, we need geniuses to help us make sense of the tools. But then I wonder: you've invested in these founders quite a lot. To what extent do you look for the ones who have that insight into how to make the application really work and sing for the customer?
B
It's tough, because I think the default answer is yes, you do. But when you look at the n of one or n of two that created this artifact, it was absolutely not. So I think the future is bright for new AI products, because the ones we have today are basically quite naive. They're models exposed as products. And I think, really, models are not products, because people need help, they need guidance, and they need affordances in the product and the UI to get things done, prompts and things. And then, B, the products were all designed by AI research people who, not in any detrimental way, generally just don't have as much user affinity as somebody who's been designing mobile apps their whole life. The individuals who built consumer internet companies, e-commerce, media and all the rest of it have been flooding into AI. They bring vignettes like the one you described and can make beautiful product experiences out of them. So I think there's so much more to come as a result of the phenotypes of contributors moving into AI. But similar to you, once you've experienced some workflows that really work, I don't think you can go back. For example, before, when you wanted to generate a narration for text, you would read it and maybe use a recording app. Then the next step was speech-to-text, where you could edit your audio file as words. That was an amazing unlock. But it still took you an hour for a few thousand words, plus the ums and things. Once you have a voice clone, though, it's game over. Why would you ever read a piece of text aloud?
A
Are you using voice clones?
B
Yeah.
A
Yeah. How do you use them?
B
Well, for any piece of text I write that I think is worth generating audio for, I take 10 minutes, generate the audio, and put it in the same medium where we share the text. And people who want to consume it that way do. I've even tried to do discussions. For example, if we did this one in writing, like a written interview, you could have my voice clone and your voice clone and just go mine, yours, mine, yours.
A
So one of the things you said, I think I read it in one of your essays, was that the consumer market around AI has incredible potential over the next year. Could you explain why you think that is?
B
Well, I think the most exciting technologies are ones that unlock new user experiences and new product experiences that you couldn't have before. In enterprise, we're still stuck in this mindset of: there's a lot of unstructured data, it needs to be structured, it needs to be summarized, you need search and you need workflow automation. That theme has generally held for about a decade. Yes, the capabilities are getting better, but everybody knows what the problems are. Whereas in consumer land, you have these uncanny valleys that, once crossed, unlock entirely new use cases. And it's not like...
A
So just explain the uncanny valley for people who might not be familiar with it.
B
I mean, the way I interpret it, and maybe it's not the textbook definition, is that you have some tech capability, for example voice cloning, which is not new. It first came out around 2017 or 2018. There was a business in Canada called Lyrebird that let you clone your voice, and they had this Obama voice clone. It sounded like him, but a bit robotic. It's not good enough to represent you in your business persona, reading your newsletter or giving a talk or something. So it's good, but not good enough. But once you get audio cloning of a quality where, if you were to share it, 98% of people would think it's you, with maybe one or two things off around the edges, that is good enough. It's basically indistinguishable, and therefore it's now usable and I can deploy it.
A
So coming back to why you thought consumer was going to work, you were saying that voice cloning was one of the things crossing the uncanny valley. Let's understand a bit more about where you see these opportunities playing out.
B
Well, I think people then realize that, for voice, there are a lot of scenarios where you're speaking things you already wrote, or that are better written, but you want a bigger audience to be able to consume them in other languages, and that's painful. Or the post-production of changing something you've recorded is pretty excruciating, particularly if it's video. I record something, I make a mess-up, and now it's dark when it was light before; that's game over, as you know. Now you have this capability where, oh, I said something wrong, I just go into the transcript, rewrite it, regenerate it, and the job's done.
A
I'm quite excited about the prospect of things that are simply too hard to do economically without AI. That lends itself very much to the consumer market and the small business market. One of the things we now have, and I'm sure the same is true at Air Street, is exceptional notes for every meeting, like a conversation with my insurance broker about my home insurance. I get great notes. You would never have invested in doing that previously. And I think that does change our experience of the world as individuals, because we've got that layer of quality.
B
Yeah. And I think it's going to expand a bit further. For example, we have a company that's building an AI secretary over the phone for small businesses. I think that's quite cool, because frequently you're calling your service provider, whether it's for home remodeling or it's your lawyer or your electrician, and they're not there. You get the voicemail, please call me back, whatever. Now an AI system can talk to you in multiple languages, resolve your query, and probably pass your message to the individual. But the cool thing, I think, is that you get accountability, because you can get the transcript back with the key points. You could maybe even get a text message saying the electrician said you should call them back in 48 hours, and then a text in 48 hours saying, hey, it's been 48 hours, you should call back. So I think people are going to really like this stuff, because there's more transparency and more accountability, and they get the information they need. And I think people's reactions to these systems are improving because they realize the systems aren't dumb anymore. In so many cases I would prefer frontline customer service to be an AI agent or an LLM, because it's way better. And then if I actually need to talk to a person, that person can get on the phone and not behave like a robot, which is what they currently do.
A
Which is what they currently do. The other problem is that people are rushing these agents out and they're not good enough. I've got a couple of consumer products in mind, and because I'm sufficiently frustrated with them, and I wouldn't normally do this, I'm going to name them: Oura, the smart ring, and SoundCloud, the music service. For their premium customers, both have replaced support ticketing with a bot, and neither bot works.
B
Yeah.
A
At all. In fact, Oura's was so bad that I had to get a friend, who could get Oura's support chatbot to work, to put my support query in for me, and he's now handling it. It's a weird delegation, but it was the only way through. But I do think that as these products get better, customer service can improve, and what you can then do is free the humans, as you say, to have much more discretion. I wonder whether you look into that space. Within customer service, the reason people don't have discretion is that they are working through decision trees, and it's a principal-agent problem: they're given no freedom to go left or right. Could you imagine that supporting an agent like that with an LLM actually gives them much more discretion in how they behave? Is that possible, given what you know of customer service technologies?
B
I think the benefit is that the language model has access to a massive corpus very quickly, so the human doesn't need to somehow figure out how some weird thing on your phone works in order to fix it. Also, you don't have the multilingual problem, which I think is quite big. And you can potentially handle a bunch of people at the same time.
A
Mm.
B
But also, because language models are quite good at outputting bullets and stepwise plans, this aligns quite well with customer service, which usually involves some kind of stepwise plan: did you try this? Did you try that? If that didn't work, try this.
A
But it could be more dynamic. So let's turn this into a 2025 prediction. What do you think we will see from consumer meets AI in 2025 that hasn't gotten any real traction or success this year?
B
I still want to see real-time speech-to-speech translation, where we're doing this, but I'm talking in French and you're talking in English, and it's instant, with zero-latency translation. That would be so epic.
A
Do you think that's something we could get in 2025, or are there still technical hurdles?
B
I think we can get there, yeah. For example, OpenAI recently hired the guy who did WebRTC, right?
A
Which was the foundation of Google Talk, Google's voice chat back in the day, 15 or 20 years ago. Yeah, okay. So they're thinking about that issue. I'll give you one of my thoughts on consumer apps. There's a lot of noise around consumer companions. Mustafa Suleyman from Microsoft, ex-Inflection and ex-DeepMind, has talked about them. And the new Google LLMs have very large context windows of a million tokens, which is, you know, roughly 800,000 or 900,000 words. A human speaks about a million words a year. So you've now got the capability to run something that records your life in some passive way. I've seen a few of these things on Kickstarter: little devices, pendants that you wear, that record your life. I think you might start to see people figuring out some kind of product like that from the ground up. Maybe it becomes what we used Yahoo for back in the mid-90s and what we use our home screen for today, which is the way into the services that we need. I think we might see something like that come out in '25.
B
Relatedly, I think there could be something cool around a personalized, generative autobiography.
A
Right.
B
Because you could be speaking all these thoughts into some model, or you certainly could do that. Based on all the artifacts you leave online and things you could tell it every day, it can figure out gaps and so on, and then over the next few years it can write the story of your life. And guess what? It could probably predict how things are going to go in the future if you continue on the same course.
A
I find that really compelling. I find it compelling as an idea because it sounds so outlandish and slightly ridiculous. But so too does "I'm going to take 500 photos a day and share photos of doorknobs, cats and cheese with my friends."
B
Yeah.
A
And yet I can see that moment where people can have that sense of self-reflection, because it's easy to do. It can be presented in lots of different ways and, as you say, it could start to deliver.
B
Yeah.
A
Predictions. It's a cool idea. Nathan, are you looking for investment for that?
B
I might just end up having to incubate it, because I've pitched it a bunch and no one wants to do it. But relatedly, you kind of have this already for kids. They have these books, here's your two-year-old's book or your four-year-old's book, with pictures of their life and things that happened. I'm not saying you should have an LLM for a four-year-old, but it's kind of cool if, when they're six or so and a bit more human, they can talk to a system and it can tell them about their early life. I would love to know some of these things. You only learn these little artifacts through random conversations with your parents, or by picking up the picture books at the holidays.
A
For people who are not as excited about these prospects as you and I are, it's worth noting that there are obviously lots of edge cases with bad counterexamples: the kid, I think it was with Character.AI, where there was a horrible situation, and what was happening with Replika. But there are also some good examples. There's a robot toy called Moxie, designed by a combination of child psychologists and paediatric educationalists as an emotional companion for kids with autism spectrum disorder. It's present and patient the whole time, and it actually predates the LLM wave. I saw it three years ago, and they had pretty good results from their evidence of its positive impact on children's well-being. These are anchor points that say, hey, this can really work. What will happen, of course, is that founders will rush out there and figure out what drives engagement and use, and maybe something will emerge.
B
Yeah. And then everything I said is probably all wrong.
A
Everything else is wrong. Yeah, okay, great. So that's consumer. Let's flip to some other spots I'm curious about. You're a scientist by training; you have a PhD in biology and machine learning, and there's so much promise about science and AI. Demis and his collaborators got the Nobel Prize for their work on AlphaFold. And every day there's something new: hey, we've accelerated materials discovery; hey, we're doing something in multiomics predictions; and so on. Has anything real been delivered from all of this wonderful work?
B
You're hitting me with the tough questions. I think yes, but the problem is the bar is just so damn high, at least in biopharma. Until we see a drug that can unanimously be claimed not to have existed without some advanced AI technique, that gets approved, really works and ideally is a blockbuster, I think you'll always have the Debbie Downers who poo-poo the space. Maybe that's because the narrative has been framed as AI being this magic thing for drug discovery: these 10-year development cycles are going to become far shorter, we won't have to spend billions of dollars, et cetera. I think that was maybe too grandiose a framing. Similar to our discussion about how, once you discover a workflow that can be better solved with AI, you're not going back, I think the same is true in biology and empirical science. When I look at 10 or 11 years ago, when I was finishing my PhD, if you were, for example, taking pictures of cells with a microscope and wanted to enumerate the number of cells that glowed or expressed a certain protein, you'd still have to take the image and basically go click, click, click, click, click. There was no automated computer vision software that you could just show this to and it would be done. That's pretty nuts, because now this is a trivial problem. And the reason I think this stuff is important and quite impactful is that enumerating experimental data in science is such a big-scale problem, one that a lot of humans are currently doing the analysis for, and humans make mistakes, and reproducibility is a real issue. That's probably one of the biggest issues plaguing most of science, particularly biomedicine.
Most biopharma or biotech companies are spun out of academic research, which has largely been done by humans doing work that is not reproducible. Then, go figure why a lot of biotechs don't work out.
A
Right.
B
So I think if you start like automating experimentation, automating data analysis, having it be far more robust, I think you can make quite a big dent into the irreproducibility of science. And I don't think you can go back. Like why would you ever go back to going click, click?
A
Yeah, absolutely. And you know, I have spoken to a number of scientists and read a lot of papers that have come out, even in peer-reviewed journals, and I get a sense that once people start to move their labs onto these tools, they don't really come back from them. In the same way that once you start recording something in a Jupyter notebook, or doing your analysis even in Excel, frankly, or SPSS, you don't go back.
B
Yeah.
A
And it just, it takes, it takes a bit of time.
B
Yeah. The other fun one I heard was around grant writing. I remember there was a study in Nature, I think it was Nature Reviews or something, 10 or 15 years ago. It basically said the academic grant writing process is completely broken, because the majority of applicants spend so much time writing the grant that, once you add in the reviews and how bloody slow the whole thing is, it costs more than the amount of money they're applying for. I have a friend who's still in academic science, and he's using ChatGPT to help him brainstorm grants.
A
Right.
B
So he's like, hey, I had this idea in gene therapy, and I think I could take this approach by, I don't know, considering this pathway and the way a molecule could bind. What's the prior art? What do you think? Do you think this could work? Blah, blah, blah. And then in like half a day, the guy has a nice grant that...
A
He can go. Okay, so that is acceleration of science, right? Because he's spending 90% less time grant writing, he's getting the grant earlier, and he's able to get working faster. So that's already a return.
B
Yeah, yeah.
A
I mean, what do you think? What would need to happen in 2025? Is it a technical breakthrough, a model breakthrough? Is it a company's products just crossing the chasm, so that we'd say, oh, wow, this AI science thing is now turning from promise into real evidence?
B
I think for bio you'd need a really good readout, readout being industry parlance for clinical performance. Probably phase two; phase three is not going to happen based on the drugs that are in the pipeline. But you need a really good readout that people are wowed by, for a molecule that was substantially designed with AI.
A
So we have to just be patient, I think is the answer there. Because that process just takes the time it takes, right?
B
Yeah, yeah. I mean, the problem is that the OG companies here, Recursion and Exscientia, have been at this for 10 years. They are making good progress, but it just takes a lot of time, also because a lot of the AI companies in bio have been focusing on early discovery (hey, there are like a billion molecules in the universe; how do we sift through them and find which one binds to the protein that causes a disease?) far more than on the clinical side of things. So in a way we have this engine to generate lots of molecules, but you still have to give them to people and run human trials. And even that is not nearly as advanced as it could be in terms of who's the right patient to give this drug to, where they are, and what to measure when they're actually in the clinic.
A
Right. And there's even a phase before that: the process of synthesizing it the first time and then figuring out how you can deliver it. Is what you synthesized soluble? If it isn't, you have to go back through that process again. That's another 30 days and a hundred thousand dollars each time. So there are lots of places where you could squeeze out some time once you get these tools working. But it won't be a single model to end all other models. Okay, we have about 10 or 15 minutes left, so I want to whip through some other thoughts with you. One area where I hugely revised my view of the world was self-driving cars, in July this year. I looked at the data coming out of both Wuhan and San Francisco, and wherever else Waymo was operating. The disengagement trends had improved significantly, and the number of customers using these services regularly kept climbing. I know the chairman of Apollo, which operates in Wuhan; I spoke with him and got the sense that they're very, very close to, if not already past, the break-even point there. So I changed my view. I then sat in a Waymo in San Francisco in October, and I was just Waymo-ing my way around the city and loving it. It's an incredible experience, which I'm sure you've had as well. That got me thinking about humanoid robots, which I had also thought of as a bit of a technology seeking an application. But I've started to shift on that, and next year I will likely shift further once I've done more thinking. How do you interpret the robotics space, and where might it go in 2025?
B
It's been amazing to see the vibe shift in robotics, because for the last decade it was where you went to burn a lot of money and get nowhere. Yeah, there were some businesses that got to scale, like AutoStore and Kardex; Berkshire Grey did a SPAC and then tanked; and a few others. But it's been really hard. Generally speaking, the mobile robotics and manipulation spaces were gatekept by the fact that models were not robust enough to adapt to all these customer sites, so you basically had to build a custom system for each customer. But coming out of COVID, and then with all these general-purpose AI systems, that really changed, because now you can basically have a single system. Now nobody wants to work in facilities, and you had a decade's worth of companies throwing money at trying to tell businesses that they need robots, and now they kind of believe it. So it's magical to see this vibe shift to a renaissance of robotics, which has really only happened in the last year, year and a half at most. I don't know how fast it's going to go, but it's going to go faster in warehousing, logistics and manufacturing, probably more in warehousing and logistics, because those segments serve online commerce and there's big pressure there. Manufacturing is maybe a bit more challenging. As for humanoids, I probably have to think a bit more about it, but at the moment I feel like it's going to be like self-driving cars: it's going to take a while, because I keep hearing that there's a lot of teleop, a lot of demos where it's not robust. Many of the cases that are shown are better done with just, quote unquote, normal robots. And while I'm sympathetic to the argument that the world is designed around humans and therefore a humanoid form factor makes the most sense, I do feel like quite a lot of economically useful tasks don't really need a humanoid form factor.
A
Yeah, I'm not thinking about humanoid robots in industry. Take the Ocado warehouse (Ocado is a big online grocer in the UK): the warehouse is essentially a pixel system, and these little robots on tracks run around and pick and pack what they need. JD.com's warehouses in China are similar. You don't need to weaken the capabilities of the machines by fitting them into a human world. But where I was slightly persuaded on humanoid robots was the idea of them being a luxury product, initially in homes, or maybe in offices where there still are people. I was actually persuaded by one of my readers, who said, look, I bet you spend, like me, five hours a week loading and unloading a dishwasher. How much would you pay for a robot to just do that overnight while you're asleep? And I'm thinking, you know what, I'd probably pay something for that. Maybe not the 10,000 bucks the Chinese Unitree robot goes for, but if it were half that price, maybe I'd be thinking this is five hours of my week worth getting back. And I can imagine there being some people like that. So I've gone from completely not believing that people would want them in their homes to starting to think maybe they would, because maybe that would solve the dishwasher thing.
B
Yeah, but it builds character.
A
It builds character. My kids are not doing it at 9pm; to be fair, they do it on the weekends. But yeah, I'm generally the loader. For the last few minutes, let's switch to broader predictions and thoughts for the next year. I have just three sets of questions here. Set number one: what are your top three predictions for the AI space for next year? We've covered a lot of ground. You can go where you like: chips, China, foundation models, reasoning, AGI, risk, wherever.
B
I'm quite excited about these like fully generative world games. Like I think there's going to be a pretty cool video game that's based around interacting with gen elements. Like the whole thing is generative. And you can see where this is heading with Genie 2, this work from Google DeepMind where you can kind of throw a picture of an environment and then the thing will just generate.
A
Explore it for infinity.
B
Yeah, it'll just keep generating, and it has some intuitions around game worlds, levels and characters that you can move around. I think this is pretty epic. I'd love to see some blockbuster game come out of that. Similarly, I think there'll probably be an app or website created by somebody with no coding ability that actually goes viral and hits maybe the App Store top 100.
A
What's your third prediction?
B
I think, I mean, I feel like there are going to be some national security implications, or a line that governments are going to have to draw with regard to these big labs. And there's not really any country other than the US that has these big labs, where it's like, okay, the next step is we're going to have to raise, I don't know, tens or hundreds of billions. It's probably going to come from some nation states or sovereign funds. Is that okay? Is that a line we want to cross, or do we want the US government to fund it, as it relates to critical assets and...
A
They're critical assets because we don't want the weights and the know-how to be exfiltrated? Or is there some other thing that makes them critical and requires that protection?
B
I mean, I think realistically, even if you have a foreign government that wires $10 billion to a big lab, if they have no special rights, how is that $10 billion going to give them any ability to extract information, other than soft power? So I do think it's a bit of a soft-power vibes, positioning kind of thing.
A
Right.
B
Where, yeah, the company is on a lifeline for massive sums of money, and the only player willing to front the money is a nation state that is now adversarial. What are you going to do?
A
Yeah, okay, that's great. So let's go to the second section. Let's try to understand what you think the most determinant thing is that will impact the whole AI ecosystem. We'll play this as a game: if you could ask the Nathan of December 2025 one single thing, and that future version of you could only send back a short message, what would you ask to best understand what will happen in 2025?
B
Whether, at the end of the day, if you're not a frontier model, then nothing else matters. Yeah, I think that's the biggest question. I think it's heading that way. I originally thought it wasn't, but it sort of feels like it is.
A
Yeah, the sense that it's the big frontier models, and they'll just keep expanding their capability. Okay, that's a great one. And then the final question: when you look out to 2025, and think about both the growth in capability we've seen in 2024 and the growth in usage, consumer and enterprise, will that rate be faster, slower, or the same next year?
B
I'm going to say the same, because I can't choose. The reason is that we sort of know what a 10,000-GPU cluster gave us: GPT-4, GPT-4o, et cetera. We don't yet know what a 100,000-GPU cluster will give us. And I think we debated this briefly on Twitter.
A
Yeah.
B
So let's see. But then I also think there is so much more to run. However, I don't know what I'd have to see next to have that holy shit moment again. We've normalized so many of these capabilities, which is probably a good thing. Society has to adjust and adapt and figure this thing out, so that's probably good. But at the same time, the bar is so high now. Like, oh, cool, you have a better voice model, you have a better thing. Yeah, I've seen that already.
A
Who cares, right?
B
Whereas when you and I were starting out (I mean, you were earlier than me, with your startup and machine learning), Jesus Christ. If you back then had seen what you have today, you wouldn't believe it.
A
I wouldn't believe it. Sometimes I barely believe what I have today.
B
Yeah.
A
I'm sitting there using it.
B
Yeah, it's literally magic.
A
It is magic. And I agree with you. I think it gets very difficult once you have... As we've seen, there's one benchmark of PhD-level science questions, and in 18 months the best models have gone from random, as good as flipping a coin, to better than a typical PhD human expert. And that's in 15 months. Once you're at that level, I'm not actually qualified to judge whether future models are better than the previous ones, because I don't have that expertise. So I think it gets harder to judge. My prediction on this question is that the fact that it's harder to judge, and that the benchmarks won't move in such big lumps, will create a little bit of negative media pressure. But I think usage will grow faster next year than it did this year, for exactly the reason you said: we are starting to normalize this and come up with a language around it, and I'm more likely to be surprised to the upside of my predictions than to the downside. In other words, I think 2025 is going to be a pretty exciting year, and I slightly fear for you and your colleagues producing the next State of AI report. I do hope you enlist some AI to help you write it.
B
We tried this year. It turns out it was challenging to get them to reason through papers. But yeah, if more PhDs get paid for data annotation, that'll probably be to our benefit.
A
Nathan, thank you so much for taking the time to join me today. It's been wonderful, and happy New Year to you.
B
Thanks, you too.
Podcast: Azeem Azhar's Exponential View
Host: Azeem Azhar
Guest: Nathan Benaich (Founding General Partner, Air Street Capital)
Date: December 26, 2024
In this engaging, forward-looking conversation, Azeem Azhar and Nathan Benaich dissect the future trajectory of artificial intelligence, reflecting on surprises and missed predictions from 2024 and looking ahead to normalization and impact in 2025. They cover everything from business models and the power dynamics among AI companies to consumer and enterprise adoption, breakthroughs in biotech, the future of robotics, and bold forecasts for the coming year. The dialogue is grounded in tangible examples, candid self-evaluation, and a keen sense of both optimism and realism about the disruptive power of AI.
“We both graded ourselves pretty well, so almost too well. So are we being too unambitious with our predictions of the state of the AI world or is it just more predictable than we let on?” — Azeem [01:16]
“I was originally excited about this as a prospect to compete with Nvidia, but it just looks like it’s a losing battle over and over again.” — Nathan [01:46]
“I don’t know anybody who predicts they can go from zero to billions like that quickly… but like, they’ve done it.” — Nathan [04:13]
“It feels like it’s much deeper than ‘you don’t get fired for buying IBM.’ It’s deeper… account manager, roadmap, developer community…” — Azeem [06:28]
“Each of them feel to me to be distinctly different qualitatively and quantitatively than GPT, which was the best we had at the start of the year.” — Azeem [09:57]
“Once you get like the quality of audio cloning that if you were to share it, 98% of people would think it’s you … it is now usable and I can deploy it.” — Nathan [18:33]
“I think if you start automating experimentation, automating data analysis, having it be far more robust, …you can make quite a big dent into the irreproducibility of science.” — Nathan [31:31]
Three predictions from Nathan Benaich:
“I think there’s going to be a pretty cool video game that’s based around interacting with gen elements. Like the whole thing is generative.” — Nathan [40:25]
Singular determinant for 2025:
“If you’re not a frontier model, then nothing else matters.” — Nathan [42:52]
Growth & normalisation:
“The bar is so high now, like, oh, cool, you have a better voice model… Yeah, I’ve seen that already.” — Nathan [44:07]
The conversation is candid, insightful, and often self-reflective with a balance of technical acumen and everyday pragmatism. Both speakers blend optimism about AI’s possibilities with a healthy skepticism and willingness to revisit and revise views as the field rapidly evolves.
This summary captures the essential content and flavor of the episode, preserving speaker insights, humor, and thought-provoking forecasts for AI in 2025.