
Kai-Fu Lee joins me to discuss AI in 2025. Kai-Fu is a storied AI researcher, investor, inventor and entrepreneur based in Taiwan. As one of the leading AI experts based in Asia, I wanted to get his take on this particular market.
A
Hi. Happy holidays. While I'm away this holiday season, I wanted to drop here a few of my conversations with leading experts in AI. I discussed how AI might surprise us in 2025 with Ethan Mollick, Dylan Patel, Nathan Benaich and Kai-Fu Lee. The conversations were initially released for members of Exponential View. Here's my discussion with Kai-Fu Lee. I'm delighted to be chatting to my friend Kai-Fu Lee, a long-term friend of Exponential View. He's a storied artificial intelligence researcher, venture operator and investor based in Taiwan. He runs Sinovation Ventures and has really been busy in the last few years with a series of generative AI efforts. Kai-Fu, thanks for joining us this evening.
B
Yes. Hi Azeem, glad to be here again.
A
Now generative AI has really swept the US and Europe. The term generative AI, it's become synonymous with artificial intelligence, much to the annoyance of many long standing AI researchers who see the world of course as broader than that. Has that been the same in your part of the world?
B
Yeah, I think overall in Asia, specifically in China, generative AI is making a lot of waves. It's probably not at the same height as in the US, partly because when ChatGPT came out it was one of a kind, and it was not believable that one could have that kind of user experience and quality. So it went viral. Whereas in much of Asia, I think there were a number of competitors by the time the term became hot, American, Chinese and from other origins. So people are feeling, oh, a number of companies can build it, and they are free. So it didn't have that ChatGPT moment. But certainly, more slowly, it's taking off and people are aware of it. Certainly in the research community and in the product and investment community, it is the most interesting and exciting area.
A
Right. I mean, I agree there was this incredible ChatGPT moment. It must be the most commonly used headline on slides when people give their presentations on AI in the last year and a half, you know, the ChatGPT moment. Right. And the market research data showed very, very rapid uptake of the tools by employees right across the board. In particular, I saw data coming out of India where the rates of use within a year were extremely high. OpenAI has said 305 million people are using ChatGPT every week. And Menlo Ventures, which is a venture capital firm, had some data showing that the amount that US enterprises were spending on Gen AI products, mostly on foundation model access, had gone up sixfold to about 13 and a half billion dollars in their forecast for 2024. I mean, do you have any way of quantifying something similar in your part of the world and in China in particular? What do you think the numbers look like there?
B
Well, it's probably not those numbers, not at that level yet. The market penetration is still smaller and I think the enterprise spending is smaller. But there are some interesting phenomena we see in China. The first is that access to good models came a bit later, because OpenAI didn't make their models or ChatGPT available in China. So there was a lag in the beginning, but now the Chinese models are good enough and they're actually very cheap. And so that's enabling a lot of applications to come out. And we see more and more interesting products in the consumer space, both in China and also going overseas. The other thing we see with the businesses in China is a double-edged sword. There isn't as well established a layered approach to SaaS and enterprise software. So companies tend to roll their own, which is of course bad for entrepreneurs who want to develop a new space in SaaS. But the good thing is these enterprises are getting their hands dirty. Dirtier than the American companies. Right. The American companies rely on stacks of APIs and technologies and products, and the Chinese companies are going deeper. And actually, at this formative stage of generative AI technology, one probably has to go deep to extract the most value and to develop the greatest economic benefit for the company. So we're seeing interesting implementations that are helping individual companies but are maybe not as easily expandable throughout the space.
A
Well, I mean, the ramifications of those two points that you've made, which are cheaper and cheaper models that are getting up to GPT-4 capability and the fact that enterprises have to roll their own, are really, really significant. So we're definitely going to come back to those two points. I think they are super, super interesting and we should get there. Let's turn to how you and I are both using these tools. How has your day-to-day work changed in the last 18 months or two years, now that these tools are a little bit more mature?
B
Yeah, of course I don't do anything without generative AI tools, but I want to put a particular emphasis on AI-first tools. That is, to flip things on their head, where AI becomes central and the human is sort of behind the scenes, backseat driving.
A
Can you give an example?
B
Yeah, absolutely. One example is Microsoft Copilot. It is able to help you create documents. It's a wonderful product. But the human is still in the driver's seat and you've got a really nice coach sitting next to you saying, oh, I can fix that, I can improve that. But I think the right tool should recognize that AI is writing better than most of us. Not better than you, because you're a well-known author. But AI writes better than most of us for most types of things we write, from letters to poetry to business applications to research reports to PowerPoints. So we've invested in the company that built a product called PopAi, P-O-P-A-I, and it is an AI-first application in the sense that when you go in, it doesn't expect you to write anything. It expects you to continuously prompt, improve, feed in documents and then inform the AI how to make a better document for you. When you're all done, of course you can edit it, but it's an edit. The key difference with Microsoft Copilot is that AI does the writing, human does the tweaking.
A
That's really interesting. And one of the implications of what you've said is that when we think about writing, we often think about the words that just get produced one after another. And I noticed a significant difference between GPT-3 and my go-to model now, which is the new Claude 3.5 Sonnet (by the way, they need better product names than they currently have), which is that it has a conceptual understanding. Because writing, like any creative endeavor, starts with the structure. The infrastructural ideas come first, and the words that represent those ideas flow from that structure. What I found with GPT-3 and 3.5 was that it was pretty superficial. When I now work with Claude, I think it's getting deeper into the fundamental ideas and how they interact. Which tells me something about an LLM feeling to me like more than just something that produces the next token. Does that make sense to you?
B
Yeah, I think Claude 3.5 is special in that way, both in its better writing in more complex scenarios and in its better code. One other interesting aspect of the PopAi product I mentioned is that it has a feature called Humanize. It's used to address an issue where people are starting to be able to detect, hey, this is written by ChatGPT, and teachers would check with students, hey, did you write this with ChatGPT? So Humanize is a special model that takes any LLM output, checks whether it's likely to have been written by an LLM, and then modifies it so that it feels more human-written. So I think, you know, two years after the launch of ChatGPT we're starting to figure out ways to fix a lot of problems. Recency was a problem, hallucination was a problem, and the recognizable non-humanness is a third problem. Now we're sort of patching each one, so it's getting to a higher level of usability.
A
And so in your everyday use, how would you use PopAi?
B
Yeah, so if I want to study something like, is the scaling law reaching its limits? I would find a bunch of documents that are from credible people and I would feed all of them in, not one but all 20. And then I would converse with it and understand it. And then I would use my search tool to find out more stuff. And then when I'm done, I would have it summarize and write something for me with my proper prompt, and then I would tweak the prompt until the result looks good. Then I would humanize it. And now you know the secret of how I do my writing now.
A
Well, I think what you're doing there as well is the kind of consistent interaction with the system. And something that struck me about the distinction between these AI tools, and PopAi sounds like a good example of this, is that with something like Google search, the difference between someone who knew how to do it well and the average person was not very significant. But with the generative AI tools that we're starting to use, whether it's a more traditional LLM chatbot interface like Claude, or PopAi, which is obviously more developed, it feels like the gap between a 10x or a 100x human and a 1x human is more significant. Right. It's a combination of the domain expertise you have and also your understanding of how the tool works.
B
Yeah, and it's interesting you mentioned Google, because the other tool I use, which I've introduced to you, is called BeaGo, B-E-A-G-O, another one of our venture-built companies. It's a search engine that gives you one answer. I remember fondly, when I went to Google, that Larry Page said Google is not the best way to do search, because when you search you don't want to type a bunch of words and get a bunch of links, you want to ask one question and get one correct answer. Of course that's been elusive because getting the correct answer has been difficult. But I think we're very close now. There are a number of search overviews by Google and others, but BeaGo has found an interesting way to significantly improve factuality, with a very content-rich flow that is also interspersed with images. So it kind of brings a different kind of search experience, and I think at a grand level it's a small step towards Larry Page's vision of one question, one answer. So that's what I use now when I want to know something, especially something complex, right? If you want to do something simple, any search engine will do. In fact, other search engines will do better for things like navigation tasks. But if you want to ask something complex like, why would Elon Musk want to serve in the government when he's such a brilliant businessman and an inventor? What does he get out of it? Right? These are the exact words you type into a one question, one answer, answer engine. You should call it Search Engine. You get an elaborate answer with all the possibilities, combining multiple sources. And again, you know, why couldn't early ChatGPT have done it? Because they hadn't solved the recency problem, which was solved by the recent advance of RAG and the use of vector databases.
And also, if you could have a search engine that has, you know, an up-to-the-minute crawl of all news, and then a full repository of truth like Wikipedia, WebMD and things that are held to be true, and a bunch of social stuff that's not necessarily true but recent and interesting and provocative. If you very quickly crawl these things and then combine them with a full search engine and an LLM, that's how you build a product like BeaGo. So it couldn't be built a year and a half ago, but now, you know, ChatGPT also has a search capability. Perplexity is pretty popular. So I would bet that, regardless of who wins, in two or three years we will mostly move out of the current search engine into what I would consider answer engines. Yeah.
A
So it's interesting that both of the examples that you've given are more sophisticated applications, right? We think about the foundation model, you access it via an API. Then we threw a chat interface over the top. But we haven't perhaps, outside of image and video generation, seen a lot of applications. My sense of 2025 is that one of the words that will be common will be the idea of agents, right. Agent-based systems emerging. But I'm curious that you've said applications a few times. So when you look out at 2025, do you think it's more about applications?
B
Well, I think we will have mature text tools like the tool I described. I think we will have near-mature multimodal tools. I think text-to-image is now getting mature. So I would say for text and text-to-image, we're going to see lots of applications and products. They will come out everywhere because inference cost is getting lower. Text-to-video, that's still a bit expensive. It's still rough around the edges. So I would say that's maybe a year behind. With agents, I think we'll find some really exciting vertical applications, but for pervasive use I think there are still rough edges to work out, right. If ChatGPT was an indication, it shocked the world but didn't change the way people use tools until the main problems like hallucination, recency or non-humanness were fixed. So my prediction is that, just as we've taken time to fix text and text-to-image, we'll need to take time to fix text-to-video and we'll need to take time to fix agents. But for those two technologies we will see vertical applications that are less vulnerable to their current problems. These technologies just came out and are exciting, but they are either too expensive or still have rough edges to be smoothed over.
A
So the point being that for a broad application, right, you need a degree of robustness, because a broad application is all about edge cases, whereas in a vertical industry application, you know, everyone's use is a bit more constrained.
B
Yeah, I'm also very, very excited about agents. I love the Anthropic demo of controlling a PC. There's a gen AI company in China that's built something that controls a phone, and they're both very nice demos, incredibly impressive because of the extent to which they can control things. But there are a couple of issues, I think. One is, you know, still the speed, and then the rough edges that would take time to fix. But the more fundamental issue is, hey, these two products were designed not to be driven by voice. Why are you gluing a voice interface onto something not designed to be voice-driven? This was something I discovered like three decades ago when I tried to build speech into the Mac when I worked at Apple. And then later I've seen other people try to build speech into the phone, and even today it's not that common. On the PC, people have a big interface called the keyboard and the trackpad or the mouse. So that is often the default way that people use it. But more importantly, the apps were built to be driven by keyboard and mouse. So now you're using voice to drive the PC when there is a keyboard and mouse sitting in front of you.
A
Yeah, yeah. It's very inefficient, right?
B
Yeah, yeah. You start with a handicap, right? With a phone, it's better because the phone, I think, has not as good an interface and it's, it's something that has a microphone very close to you. So driving a phone is better, but much better would be driving something like, you know, a task like planning a trip or something like that. That was by definition voice driven.
A
I mean, what I love about the voice apps: I use Claude on the phone in the voice mode, where I can just, on my commute home after dropping my kids at school, talk into it for 10 minutes about anything from how I have to respond to the insurance company, to the new idea I've got, to notes on a company I'm looking at investing in. And I get home, open the app up and it's structured into four sections of, here were my thoughts. Now what it can't yet do is turn those into actions. I want it to turn them into the letter to the insurance company and start to sketch out my diligence document for the investment. But it hasn't got that far. But I can absolutely see that happening. So I think voice is pretty interesting. I want to turn to the core technologies at this moment. So a couple of weeks ago, Ilya Sutskever, an absolute grandee of deep learning even though he's a young man, was talking at the big flagship conference and he made some comments about the scaling laws. So the scaling laws have been the thing that has really driven the investment levels. Throw more data, throw more compute, make a more complicated model, get a better model at the end of it. And he says, look, pre-training, which is a type of scaling, as we know it will unquestionably end. We've reached peak data and there will be no more. We're going to have to deal with the data that we have. Do you agree with him?
B
If I were working at OpenAI, I would surely agree with him, because they have the most advanced models. If they say so, it must be so. And we also hear about issues with GPT-5 not scaling as well as before. So it's definitely reaching diminishing returns, if not reaching its limits. But I would say, because we're not OpenAI, I wouldn't fully agree that we're done, right. OpenAI may be done. We're not done, because what these giant models do is set a ceiling, below which you can basically fine-tune your model to be close to that ceiling. And if I'm not OpenAI, my ceiling's lower, so I want to raise the ceiling so I can fine-tune towards it. That's the first point. The other is that when you have a great model, it becomes a teacher model. So whether you call it reinforcement learning with AI feedback or using a better model to improve smaller models, you still need a bigger model. It doesn't have to be yours. But I think what he states is definitely true, especially knowing what he knows. But I would also add a few more minor points. First is that we haven't fully exhausted the data. There's a little bit more. My experience at 01.AI is that when we added Chinese to English, not only did our Chinese get really good, but our English got better. And our Chinese data is very, very high quality, because we're a poor little startup so we can't afford to train on a lot of data. So we find really good data. And we've done it for Chinese. A number of other Chinese companies have done it for Chinese, but I bet nobody has done Japanese, Korean, Spanish, Arabic. And when you go deep, there's the Internet data, there's also the book data. So I think we can probably still increase the high-quality data by 2x. Of course that's not the next order of magnitude, but it also means we haven't reached the limit. That's the one difference. And the other thing, which I'm sure he would agree with, is that there are other scaling laws like, you know, o1 scaling on the inference.
A
This is ChatGPT's o1. Yeah, yeah, that's right. So he's talking, I think, about the big pre-training scaling, where the engine chops up and processes material and produces this final product. But what we've seen now is what people call test-time scaling or inference scaling where, you know, you ask the machine the question and it generates lots of different answers and evaluates them after the fact, goes through an iterative process and then presents a final answer to you. And sometimes with ChatGPT's o1 Pro, I have had it think for a few minutes before it gives me a response. And in that time, that's a computational problem. That's a lot of scale that's being thrown at it. So my interpretation is that the demand for computing capacity is still going to go up, but the ratio moves in favor of compute that's required for inference, not just for the training up front.
B
Yeah, yeah, I think that's right. So the amount of compute is still a lot. Jensen will still be very happy. He can sell more inference chips. One other point I would make is that super fast inference speed becomes really important. Right. Before, I think super fast training was important, and at 01.AI one of our special results was that we were able to pre-train a very good model for just $3 million. And that was, you know, 2,000 H100s running for a month. And the model is currently ranked number seven in the world. So that was something we focused on. We're glad we did it. Otherwise we would have spent hundreds of millions perhaps, or at least a hundred million, which we don't have in the bank. So we're glad we did that. But also, about nine months ago we said, okay, making inference fast is more important, because inference was too expensive. If you make it fast it also becomes cheaper. And that was important. Our other accomplishment was that our inference is around $0.10 per million tokens, which is pretty cheap.
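As a rough check on the training figures mentioned here, the arithmetic can be sketched as follows. This is only a back-of-the-envelope sketch: it assumes "a month" means 30 days and that the $3 million covers GPU time alone, neither of which is stated in the conversation, and the function names are made up for illustration.

```python
def gpu_hours(num_gpus: int, days: float) -> float:
    """Total GPU-hours for a cluster running continuously."""
    return num_gpus * days * 24


def implied_rate_per_gpu_hour(total_cost_usd: float, num_gpus: int, days: float) -> float:
    """Cost per GPU-hour implied by a total budget (assumes the budget is all GPU time)."""
    return total_cost_usd / gpu_hours(num_gpus, days)


# Figures quoted in the conversation: 2,000 H100s, one month, $3 million.
# The 30-day month is an assumption.
hours = gpu_hours(2000, 30)
rate = implied_rate_per_gpu_hour(3_000_000, 2000, 30)
print(f"{hours:,.0f} GPU-hours, about ${rate:.2f} per GPU-hour")
```

Under those assumptions the quoted budget works out to roughly two dollars per GPU-hour, which is the kind of sanity check a listener might want before comparing it with the much larger clusters discussed next.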
A
Okay, let's put some framing around this for people. 01.AI is one of your startups, focusing on building foundation models.
B
Yeah, yeah.
A
And you said 2,000 H100s. So the H100 is one of Nvidia's GPUs. I think the general view is that some of the big models were trained on forty or fifty or a hundred thousand H100s. And Elon Musk is building, has built, this 100,000 cluster. The other thing you said was that inference cost is 10 cents per million tokens. So a million tokens is roughly 800,000 words. It's what I might say in a year. And at 10 cents per million tokens you've just forced me to update one of my databases, because we had $0.24 per million tokens as the cheapest GPT-4 quality model. And to contextualize that for listeners, we were up at $200 a million tokens a couple of years ago. Right. So the price has absolutely dramatically come down.
B
I think, you know, if people don't know what a million tokens is, I would say if you were building an AI search, a million tokens will give you about a hundred searches. Because we build an AI search.
A
Right.
B
And so if you think you're building a search engine, 10 cents per hundred searches is 0.1 cent per search, which sounds really low, which it is. If you were back in the early days of GPT-4, it would have been $75 per million tokens. That would be 75 cents per search, which would bankrupt any company, including Google. Especially Google, if they were.
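The per-search arithmetic above can be written out in a few lines. This is a sketch using only the figures quoted in the conversation (about a hundred searches per million tokens, and prices of $0.10, $0.24 and $75 per million tokens); the function name is hypothetical.

```python
def cost_per_search(price_per_million_tokens: float,
                    searches_per_million_tokens: float = 100) -> float:
    """Dollar cost of one AI search, given a token price and a rough token budget.

    The default of 100 searches per million tokens is the rule of thumb
    quoted in the conversation, i.e. about 10,000 tokens per search.
    """
    tokens_per_search = 1_000_000 / searches_per_million_tokens
    return price_per_million_tokens * tokens_per_search / 1_000_000


# Prices per million tokens mentioned in the conversation.
for label, price in [("01.AI today", 0.10),
                     ("cheapest GPT-4 quality model previously tracked", 0.24),
                     ("early GPT-4", 75.0)]:
    print(f"{label}: ${cost_per_search(price):.4f} per search")
```

At $0.10 per million tokens this gives a tenth of a cent per search, and at $75 it gives 75 cents, matching the numbers in the discussion.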
A
Yes, because they have the most searches.
B
Yeah, yeah, yeah. So that's a good comparison. But the other point I want to make is that we're now focused on inference-time or test-time scaling, which means thinking longer and longer at test time, so the inference cost becomes so important. When I'm talking about, you know, $0.10 per million tokens, in the GPT o1 search it actually uses a lot of tokens, and it charges you for them, but it doesn't show them to you. Those are its chain-of-thought tokens. So that's one aspect. The other aspect is that when it's thinking for three minutes, it's crunching on GPU. So if you have a faster inference engine, it would crunch less. Right. If your inference engine were 20 times faster, you wouldn't need three minutes. Right.
A
How do you speed up that inference? Is it, is it architectural? So is it about the chips you end up using? Is it about designing particular chips that are good for inference? Or is it things that you do algorithmically, Distillations, optimizations, other things that you can do in the model?
B
Yeah, I think about a third of it is algorithmic. Like, we use a very unique MoE model, that's mixture of experts. Mixture of experts, yeah. We have, you know, many, many experts. You know, some models, like Mistral, have eight experts. We have, you know, more than 10 times their number of experts. So.
A
Right.
B
They're finer-grained, so they're able to capture information better. And then at runtime, we activate fewer parameters in total. So our average total active parameter count is 22B. So roughly it's 22.
A
So just to clarify, that's 22 billion parameters that you activate through the network. I think people think that the Claude model is about a 400 billion parameter model, and GPT-3 was 175 billion. So you're activating only a small portion of that, which reduces the computational cost.
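To make the idea of "activating only a small portion" concrete, here is a toy sketch of top-k routing in a mixture-of-experts layer. It is not 01.AI's architecture: the experts are stand-in scalar functions, the router scores are supplied by hand, and every name is hypothetical. It only illustrates that a router picks a few experts per input and the rest are never evaluated.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_scores, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    top = sorted(range(len(router_scores)),
                 key=lambda i: router_scores[i], reverse=True)[:k]
    weights = softmax([router_scores[i] for i in top])
    return list(zip(top, weights))


def moe_forward(x, experts, router_scores, k):
    """Weighted sum of only the selected experts; unselected experts never run."""
    return sum(w * experts[i](x) for i, w in route_top_k(router_scores, k))


def active_fraction(active_params_b, total_params_b):
    """Share of parameters used per token, given counts in billions."""
    return active_params_b / total_params_b


# Toy experts: expert i just scales its input by (i + 1).
experts = [lambda x, s=s: s * x for s in range(1, 9)]
scores = [0.1, 2.0, -1.0, 0.5, 1.5, -0.3, 0.0, 0.7]  # made-up router outputs
print(route_top_k(scores, k=2))
print(moe_forward(1.0, experts, scores, k=2))
# 22B active out of a total stated as "over 200", so 200 is a lower bound.
print(active_fraction(22, 200))
```

With 22 billion active parameters out of a total of more than 200 billion, at most about 11% of the weights take part in any single token's computation, which is where the speed and cost saving comes from.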
B
Yeah. Our total parameter size is also over 200 billion, but because our active count is smaller, it's faster. That's a third of the reason. Another third is that we optimize our inference engine to build on a special cache memory architecture, because GPU is expensive. So how do you save money? You have less GPU computing. How do you have less GPU computing? You use cache. Where's the cache? There's the HBM on the GPU, but there's also RAM and SSD, and potentially SRAM you could have on the CPU. So while the GPU is crunching, the CPU is orchestrating data to the right places and remembering pre-computed numbers so they don't have to be computed again.
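The caching idea described here, keeping pre-computed results in progressively slower and larger memory so the GPU does not recompute them, can be sketched as a small tiered cache. This is an illustration only, under heavy assumptions: the tier names and capacities are invented, real inference engines cache tensors rather than Python values, and nothing here reflects how 01.AI's engine is actually built.

```python
from collections import OrderedDict


class TieredCache:
    """Toy multi-tier LRU cache: fastest tier first, evictions cascade to slower tiers."""

    def __init__(self, capacities):
        # capacities: list of (tier_name, max_entries), ordered fastest to slowest.
        self.tiers = [(name, cap, OrderedDict()) for name, cap in capacities]
        self.computations = 0  # how many times we had to recompute from scratch

    def _put(self, level, key, value):
        if level >= len(self.tiers):
            return  # evicted from the slowest tier: the value is dropped
        _, cap, store = self.tiers[level]
        store[key] = value
        store.move_to_end(key)  # mark as most recently used
        if len(store) > cap:
            old_key, old_value = store.popitem(last=False)  # least recently used
            self._put(level + 1, old_key, old_value)        # demote to next tier

    def get(self, key, compute):
        """Return (value, source), where source is a tier name or "computed"."""
        for name, _, store in self.tiers:
            if key in store:
                value = store.pop(key)
                self._put(0, key, value)  # promote a hit back to the fastest tier
                return value, name
        self.computations += 1
        value = compute(key)
        self._put(0, key, value)
        return value, "computed"


# Hypothetical tiers and sizes, fastest to slowest.
cache = TieredCache([("hbm", 2), ("ram", 4), ("ssd", 8)])
for prompt_prefix in ["a", "b", "c", "a", "d", "a"]:
    value, source = cache.get(prompt_prefix, compute=lambda k: k.upper())
    print(prompt_prefix, "->", source)
print("expensive computations:", cache.computations)
```

The design choice worth noticing is that a miss in the fast tier is not necessarily a recomputation: the value may still be sitting in a cheaper tier, which is the trade described in the conversation between GPU time and plentiful memory.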
A
So I'm going to pull out some acronyms again for the audience. HBM is the high-bandwidth memory on the Nvidia GPUs. SSDs are solid-state drives. SRAM is RAM that's close to the chip. The CPU is the non-GPU chip, and for most of us it's what we have thought of as a computer chip. And if you're watching this on video, over the back of my shoulder is a BBC B Micro which has got a 6502 CPU sitting in it. I think what's interesting about what you've just described, of course, is how the mighty CPU has fallen. Now all it does is orchestrate data flows from different memory, with different latency, for a GPU. Whereas, you know, five years ago it was the thing that mattered.
B
Yeah, yeah, exactly. Right. So building that memory cache architecture and designing a special computer with the right memory and parts was critical for about a third. Now, the computer is not a custom computer, it uses standard parts, but it just has a ton of memory and runs our software, which uses the CPU to orchestrate the data. So that's about a third. And the last third is just a bunch of engineering tricks, like how many bits do you need to represent things? How do you deal with context windows? And can you avoid using the full 22B? Can you sometimes answer the question with 4B, sometimes 10B, sometimes 2B? The fewer you use, the more money you save. So basically penny pinching and squeezing out everything possible. So three things.
A
But this is super interesting, because what you're describing to me here is the layer below what we talked about, which was AI-native applications. Here what we're understanding is that the engineering of the infrastructure and those architectural choices is really, really important, and is a new set of disciplines that perhaps we've only got two or three years' experience of. And then it makes me think about 2025 and beyond, about the class of applications that we might have and the choices that we might make. So if you speed up inference, and instead of having to wait three minutes you get the same quality of response in nine seconds, as you said, a 20 times improvement, that sounds great. Except what if I really want the best quality response I can get, and I'm willing to wait three minutes?
B
Right.
A
So don't we end up having a kind of Pareto frontier set of trade-offs, where I can say, listen, this is important enough, I'm willing to go off and make a cup of tea while it thinks about it for three minutes? And then I'm sitting there thinking, how do you make that easily accessible as a choice for an end user? Otherwise I'm constantly faced with the cognitive load of where I want to put that slider. So that actually feels like a UX, user experience and design problem more than an engineering problem that we need to address. Because there's any number of tasks where I want responses in a second, but there are ones which are so important I'm willing to wait half a day if necessary. How do you think about that?
B
Yeah, I think clearly you lose a little bit of precision in doing the speed improvement that we have. It turns out not all that much, but you do lose some. But in some cases, if you're trying to win a Nobel Prize and you have o1 Pro being your research assistant, you'd rather it take longer to get the right answer, right. But if I'm doing a search, I would rather get an answer instantaneously than wait a minute. So you have to really give that flexibility, I think, to the application developer. So they can have super fast and not perfect, or really slow or somewhat slow but as good as it can be. Yeah.
A
And I think some of this ends up being about perceived time for the, for the end user. One of the things that I notice now with, with Google searches is Google searches are very fast, but I may have to do six or eight to get the result I want. But the total time ends up therefore being a few, a few minutes. And in some sense I might be happier with a response that takes two minutes. But the trouble is what am I going to do during those two minutes while I, while I sit around and wait?
B
Right.
A
That feels like an eternity compared to just doing another Google search. So this all, you know, creates great opportunities for UX designers and, you know, product people, by the sounds of things.
B
Well, Azeem, you just made a very insightful point. I have to point out it's something we weren't ready to publish before, but I talked about our BeaGo search, and that's in fact what it does. You ask a long question, and it decomposes it into multiple searches. So it calls Google search not once but a number of times, and then the results come back and are aggregated. So then you can get the answer pretty quickly. But it's actually mentally modeling your process. And on your other point, about the interface designer having to really understand the user to design the right interface, I think it's going to be hard to be a UI designer who doesn't understand something about the technology. So now they have to know more about that as well, and give feedback to the people who are building the technologies.
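The decompose, search and aggregate flow described here can be sketched in a few lines. This is a hypothetical illustration, not BeaGo's implementation: in a real system the decomposition and the final synthesis would be done by an LLM and the searches would hit a real search backend, whereas here all three are stand-in functions passed in by the caller.

```python
from concurrent.futures import ThreadPoolExecutor


def answer(question, decompose, search, synthesize, max_workers=4):
    """Split a question into sub-queries, search them in parallel, combine the results."""
    sub_queries = decompose(question)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with sub_queries.
        results = list(pool.map(search, sub_queries))
    return synthesize(question, list(zip(sub_queries, results)))


# Stand-ins for illustration only.
def toy_decompose(question):
    return [part.strip() for part in question.split(" and ")]


def toy_search(query):
    return f"top result for '{query}'"


def toy_synthesize(question, findings):
    lines = [f"- {q}: {r}" for q, r in findings]
    return f"Q: {question}\n" + "\n".join(lines)


print(answer("why do GPU prices vary and how are tokens priced",
             toy_decompose, toy_search, toy_synthesize))
```

Running the sub-queries in parallel is one plausible reason the combined answer can come back quickly even though several searches happen behind the scenes, which is the perceived-time point raised just before.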
A
I want to come back to the two really interesting points you made at the start of our discussion. I said I was going to hold them back so that people would get through to this point. So you said a couple of things. You said that, you know, models are getting cheaper and cheaper, and particularly we're seeing from China and the region, both from 01.AI but also DeepSeek and Qwen, models that are extremely good and extremely cheap. And it suggests a path. And one of the beauties of these models being cheap is that it's easier for people to, you know, run experiments and make mistakes. The second thing that you said was that Chinese enterprises are not used to the kind of clean layers of SaaS and infrastructure that American firms are. And so they're having to get their hands a lot dirtier. And getting your hands dirty is really about experimentation and about learning.
B
Right?
A
That's what it involves. And the reason I find these two trends so interesting is, is that as I've talked to enterprises across Europe, in the US and some parts of Asia, the thing that I've said is you have to build your learning muscle because the technology is changing so quickly. You can't just pick up the phone to Salesforce and the thing is delivered. And the only way to do that is actually to get close to the technology and to build and to experiment and to make mistakes. And that is where commercial velocity will ultimately come from. So when you made those, those two points, I was sitting there thinking, wait, does this actually somehow benefit an ecosystem that doesn't have a really mature SaaS environment in the SaaS stack?
B
I'm hesitant to reach that conclusion, but it's a plausible consequence. To give you an example, we're working with a very large patent company in China. They have a lot of patent experts who are engineering plus legal experts, who understand the invention disclosure, work with the inventor, write something and file a patent. That, as you know, is a very expensive process that very highly paid people do. It's very hard to get in the door. So if you could somehow improve the efficiency of that job, you can either make a lot more money, or save a lot of money, or get more patents done. So in a typical American case, I think people would say, okay, well, I would use the Claude API or the GPT-4o API, and I would bring in Accenture or whoever to come and implement something for me. But I think the system integrators aren't going to be domain experts. They're not going to be the best at the LLM stuff. And then the patent lawyer people don't know the technology either. So you have three people with discrete skill sets who may take a long time to reach the right solution. And then they insist on building it on top of whatever they have. They may have Databricks, they may have SAP, they may have Salesforce, whatever, combined together. So in China, this company just called us. It's a rather large company with thousands of these lawyers and engineers. They just said, well, we have, I can't disclose the number, but hypothetically, 2,000 lawyers who are engineering people. Can you help us do 4,000 lawyers' worth of work? And we go in and say, well, we can, but it's going to take a lot of tweaking. They say, well, you tweak it with us. Sure. And we say, if we tweak it and you get 4,000 lawyers of output, you give us X percent, let's say 10%, of your savings. And then they would say, yeah.
And then we might even say, I'm hypothetically speaking again, maybe we can form a joint venture and develop software that will give you a hundred percent improvement in efficiency. Then we can sell it to other patent firms, who can get similar benefits. And then this company might be able to go public. So it's a completely different business model. We feel that if we do the joint venture, there's a lot of upside. So we're happy to send 5 experts and 10 engineers to do this task. And OpenAI would never do this. Right. So we'll see how it works out. But it's a very different business model, a very different type of upside, and with open-minded traditional companies like this patent firm, we might be able to hit something interesting. I would not quite be ready to say that it would disrupt the American default. I think that's going to be difficult, because if you ask me today whether I would rather play in the American ecosystem or the Chinese ecosystem, I think I'd pick the American one, because there are many, many low-hanging fruits for building a good business.
A
Yeah, yeah. I mean, definitely I'm seeing a number of startups in the US market that are taking a ground-up approach of the kind that you're talking about. And I think that you've got a classic innovator's dilemma if you're a traditional SaaS company, because the way in which you build, the way in which your products work, the way in which your teams are organized doesn't necessarily sit nicely with an AI model. But I think they can market more aggressively, of course, than any of these startups.
B
So I was just thinking of a paper, I think Sequoia wrote it. They said it used to be software as a service, now it's service as software. Right, right. So when we go in and help this patent company, it's flipping it on its head.
A
Yeah. And you're charging a share of the combined increase in value, which means everyone's aligned. And I think finding pricing that supports that alignment for these tools over the next year, 2025, is going to be quite interesting. Should you be charging per API call? That feels a bit naive. Should you be charging per seat? Well, that also doesn't feel right in an autonomous, self-running system. So I think that'll be interesting to explore in 2025. Let's come back to where we think we might end up in 2025 by looking back at 2024. If you cast your mind back to the Kai-Fu Lee of December 2023 and you look at where we are in 2024, have you in general been surprised to the upside? Things have gone faster, deployment has been quicker, there have been more breakthroughs. Or is it where you thought it would be, or are you surprised to the downside?
B
I think the technological pace is faster than I thought, and the barrier to entry was lower than I thought. The fact that a number of Chinese companies are now entering the first tier on a shoestring, the fact that a number of startups like Mistral and Cohere are able to build interesting models and technologies, and the fact that Anthropic could challenge OpenAI by taking away its market share: it is really more dynamic than I have seen. So that's a bit of a surprise. The total amount of innovation, piecemeal, I think in aggregate is very good. Maybe roughly equivalent to what I thought. The number of killer apps is probably a bit lower than I thought. But I would attribute that to inference costs having been an inhibitor, because the cost was prohibitive for apps to be built on expensive APIs. But now that has changed. And I'm a little disappointed that multimodal is still more of a demo than something adding real value, with text-to-image being one notable exception. But I think that will change next year.
A
Why? Why is multimodality important?
B
Well, it's clearly important ultimately, because we live in a world that is three-dimensional, and we don't learn by closing our eyes and just inputting text and tokens. We learn faster and better because we have a world in which we live, and we have functions well beyond cognition. We also manipulate and move and do spatial reasoning. And ultimately, if we had AI that was really smart, it would need to understand the world around us, it would need to move around, it would need to manipulate things, it would need to get physical tasks done. So when those things are possible, it's hugely beneficial, maybe even bigger than text. But I think we are limited in a number of ways. One is just that we have a history of text data, right? Thousands of years of text that we've kept. We didn't have thousands of years of images, and certainly not video. And what little video we have is either movies or TikTok, and those don't represent videos of the real world. So our data is less good. So it's understandable why it took longer. And also, video understanding has come a long way, but where are the apps? So I think hopefully next year we'll find a lot more apps, because you can't let researchers come up with the apps. It's like, you know, good piece.
A
Yeah, yeah, yeah. Okay. So coming back to 2025, you talked about multimodality and this multimodal world we live in. That has an implication for robotics. I have changed my mind this year on self-driving cars. I had a chance to chat with people from the Apollo project. And of course I went to San Francisco and I could not get out of Waymo cars. I just wanted to get in them. And so I really realized that they're changing. I think Apollo in Wuhan is going to be profitable this year. Waymo is already as big as Lyft in San Francisco and is getting access to more and more cities. So I think on self-driving cars, 2025 will be a story of the start of that part of the S-curve where things really take off. The other thing that I'm starting to think about is humanoid robots. I didn't really believe the thesis of humanoid robots in industry, because I think that, as JD.com has with their warehouses and Ocado in their warehouses, you just redesign the warehouses to be roboticized and automated. And with that, I was uncertain about whether consumers would want humanoid robots. But even there, while I haven't changed my mind, I'm open to the possibility that they would start as luxury products. I'm looking at what's happened with Unitree's robots coming out of China, and then Figure and Sanctuary and so on in North America. And so my mind changed slightly in 2024, thinking about robotics, self-driving cars or humanoids. What's your view of where it is, and how might your mind change in the next couple of years?
B
Yeah, actually my VC, Sinovation Ventures, invested in about five autonomous vehicle companies. We're happy to see they're all going IPO now. So that's great. I think what you describe is very accurate. I'm quite bullish. But I think we still have a couple of hurdles to overcome. One is basically that they need to go from their current technology to more of an end-to-end technology. That is, they're still glued together with old AI technologies. They sort of work. But even Waymo, I don't know if they're end-to-end. Tesla is close to end-to-end, but then they don't have all the sensors. So I think a lot of these smaller autonomous vehicle companies have to basically revamp their technology stack. That will be a challenge. And also, as this technology matures, I think it may be the first one where professional drivers have various complaints. Right. So jobs will be displaced in large quantity. In this case truck drivers, for example, are highly paid and not easily able to take on a new job, because they're so good at what they do and they haven't done anything else for perhaps their whole careers, because it's a highly paying job. And also I think accidents and deaths may create more issues. So we'll have to see. Those are not technical. The only technical issue is the upgrade to end-to-end or LLM-like capabilities. So that's how I feel: generally bullish. On embodied AI, I'm also fairly bullish, but a little bit cautious about immediate applications. Embodied AI means, basically, that the AI is able to capture data and integrate into the real world. It doesn't have to be humanoid. On that aspect I'm bullish, and I think it's critical to taking us to the true multimodal application. Now, the humanoid: I still reserve some conservatism about that, because I think the difficulties with mechanics are not necessarily solvable by these fast-improving AI algorithms. There are still mechanical parts.
Also, I think there's one thing very special about humanoid robots: you're not benchmarking it against a robot dog or a robot car. You're benchmarking it against a human. So where it falls short of the human, people are going to get disappointed. You set a very high bar, which is why early efforts to deploy natural language processing for customer service have been difficult: people assume it's got to masquerade as a human, and when it can't, you are unhappy. When your first-generation smart speakers from Amazon start to talk to you, you're very happy when they can just play back a song. But once they entice you to talk to them, you think it's human. Then you realize, hey, you're not human. You don't know all these things. Now LLMs change that. But now the question is, you build this humanoid that dances. Oh, that looks cool. And then it enters your home. Then you're saying, hey, why are you falling down here? Why can't you pick up the clock like I told you? Why can't you shut off the TV like I told you? Why can't you cook me a meal? Aren't you a human? So given the high price that you pay, you expect a high bar, which is human-like, and that might create a wave of disappointment as well.
A
Interesting. Interesting. Yeah, I can imagine that that high bar, that set of expectations, is pretty tough for the humanoid robot. But we are in the expectations game during this conversation. So let's turn in the last couple of minutes to 2025. In general, we'll go for quick answers because we're at the end of the call. What do you most hope will emerge in the field in 2025? What is the thing that would make it a dream year?
B
Agents, a real multimodal app, and seeing every app being replaced with an AI-native app. Those are the three I look forward to.
A
You've got a big wish list there. I mean, I know we're near Christmas and I've got a beard, but I'm not as good as Santa Claus when it comes to presents. When you look out to 2025, are you more likely to be surprised to the upside as we get to the end of 2025, or to the downside?
B
On apps, I won't be, but I think most people will be surprised to the upside by how fast and furious these will come. On technology and models, it's hard to say, because Ilya just turned a whole world that was optimistic about scaling laws into one that is not so optimistic. So I think there will be disappointments. Also on the technology side, OpenAI: I respect them greatly as a company. They're brilliant, they're industry-leading. But I think Sam makes a lot of promises, and when those promises are not delivered, that would cause a huge downside. I hope that doesn't happen. Elon Musk somehow has had the ability to make big predictions. Some come true, some don't. He's still hugely respected, as he should be. I think Sam has built up larger expectations, and Ilya's comment, I think, really hurt the expectations OpenAI had set. I worry that more of those promises, driven by their need to grow, may cause some downside on the technology side.
A
Either way, it's going to be another year of a lot of reading, a lot of rewriting, rethinking, redrafting. I mean, it's an amazing time for us to be in this space. Kai-Fu Lee, thank you so much for your time today.
B
Yeah, thank you, Azeem. Great.
Date: January 2, 2025
Host: Azeem Azhar
Guest: Kai-Fu Lee, AI researcher, investor, and CEO of Sinovation Ventures
This episode features a wide-ranging conversation between host Azeem Azhar and renowned AI researcher and investor Kai-Fu Lee. They discuss how AI—particularly generative AI—is evolving globally, focusing on developments in China and Asia compared to the US and Europe. The conversation covers emerging AI applications, shifting business models, infrastructure and cost reductions, the future of search, multimodal AI, robotics, and key developments expected in 2025.
“Chinese companies are going deeper and actually at this formative stage of generative AI technology, one probably has to go deep to extract the most value…”
—Kai-Fu Lee, (03:52)
“The key difference with Microsoft Copilot is AI does the writing, human does the tweaking.”
—Kai-Fu Lee, (06:51)
“If you were back in the early days of GPT-4, it would have been $75 per million tokens. That would be 75 cents per search, which would bankrupt any company, including Google.”
—Kai-Fu Lee, (24:21)
“It used to be software as a service, now it’s service as software…”
—Kai-Fu Lee, referencing a Sequoia paper, (38:41)
“With humanoid robots…people are going to get disappointed…you set a very high bar, which is why…early efforts to deploy natural language processing for customer service [failed].”
—Kai-Fu Lee, (47:01)
This in-depth conversation between Azeem Azhar and Kai-Fu Lee provides a rich, global perspective on the dramatic shifts and new frontiers in AI expected in 2025. The dialogue highlights the rapid democratization and cost reduction in model building, the importance of application innovation, emerging business models, and the coming push for true multimodal and agentic AI systems. While both are optimistic, they stress the importance of flexibility, experimentation, and realistic expectations in navigating the exponential pace of change.