
Andrew Feldman is the Co-Founder and CEO @ Cerebras, the fastest AI inference + training platform in the world. In Sept 2024 the company filed to go public off the back of a rumoured $1BN deal with G42 in the UAE. Andrew is the leading expert for all...
Andrew Feldman
Our AI algorithms today are not particularly efficient. In a GPU, most of the time it's doing inference, it's 5 or 7% utilized. That means it's 95 or 93% wasted. We won't be as dependent on transformers in three years or five years as we are now. 100%. The fundamental architecture of the GPU with off chip memory is not great for inference. Now, they will continue to do well in inference, but it can be beaten, and I think they know it.
Harry Stebbings
This is 20VC with me, Harry Stebbings. Now, we did a show with Jonathan Ross at Groq and it blew all numbers out of the water. Millions of plays. Everyone loved it, and everyone said that we had to get Andrew Feldman from Cerebras on the show. So I'm so excited to make this episode happen today. Joining us in the hot seat is Andrew Feldman, co-founder and CEO of Cerebras, the fastest AI inference and training platform in the world. Now, in September 2024, the company filed to go public off the back of a rumoured $1 billion deal with G42 in the UAE. They challenge Nvidia in the inference market. Andrew is the leading expert for all things inference. This show was incredible. I have the best job in the world. I sit down with the smartest people and learn from them. And this show is exactly that.
But before we dive in today, turning your back-of-a-napkin idea into a billion-dollar startup requires countless hours of collaboration and teamwork. It can be really difficult to build a team that's aligned on everything from values to workflow. But that's exactly what Coda was made to do. Coda is an all-in-one collaborative workspace that started as a napkin sketch. Now, just five years since launching in beta, Coda has helped 50,000 teams all over the world get on the same page. Now, at 20VC, we've used Coda to bring structure to our content planning and episode prep, and it's made a huge difference. Instead of bouncing between different tools, we can keep everything from guest research to scheduling and notes all in one place, which saves us so much time. With Coda, you get the flexibility of docs, the structure of spreadsheets, and the power of applications, all built for enterprise. And it's got the intelligence of AI, which makes it even more awesome. If you're a startup team looking to increase alignment and agility, Coda can help you move from planning to execution in record time. To try it for yourself, go to coda.io/20vc today and get six free months of the team plan for startups. That's coda.io/20vc to get started for free and get six free months of the team plan. Now that your team is aligned and collaborating, let's tackle those messy expense reports. You know, those receipts that seem to multiply like rabbits in your wallet. The endless email chains asking, "Can you approve this?" Don't even get me started on the month-end panic when you realise you have to reconcile it all. Well, Pleo offers smart company cards, physical, virtual and vendor-specific, so teams can buy what they need while finance stays in control. Automate your expense reports, process invoices seamlessly and manage reimbursements effortlessly, all in one platform.
With integrations to tools like Xero, QuickBooks and NetSuite, Pleo fits right into your workflow, saving time and giving you full visibility over every entity, payment and subscription. Join over 37,000 companies already using Pleo to streamline their finances. Try Pleo today. It's like magic, but with fewer rabbits. Find out more at pleo.io/20vc. And don't forget to revolutionize how your team works together with Roam. A Company of Tomorrow runs at hyperspeed with quick drop-in meetings. A Company of Tomorrow is globally distributed and fully digitized. The Company of Tomorrow instantly connects human and AI workers. A Company of Tomorrow is in a Roam virtual office. See a visualization of your whole company: the live presence, the drop-in meetings, the AI summaries, the chats. It's an incredible view to see. Roam is a breakthrough workplace experience loved by over 500 companies of tomorrow, for a fraction of the cost of Zoom and Slack. Visit ro.am, that's R-O dot A-M, for an instant demo of Roam today. Nobody knows what the future holds, but I do know this: it's going to be built in a Roam virtual office. Hopefully by you. That's ro.am for an instant demo.
Harry Stebbings
You have now arrived at your destination. Andrew, it is such a pleasure to meet you, man. I've wanted to do this one for a while. I've heard so many good things from Eric for a long time. So thank you so much for joining me.
Andrew Feldman
Harry, thank you for having me. I appreciate it.
Harry Stebbings
Not at all. This will be a fantastic conversation. I have my pen ready. I feel like this is going to be a learning experience for me. I want to go back to 2015. What did you and the team see in the AI landscape in 2015 that led to the founding of Cerebras?
Andrew Feldman
We saw the rise of a new workload. This is every computer architect's dream. We saw a new problem to solve. What that means is maybe you can build a new machine better suited to that problem. And so in 2015, and credit goes to Gary and Sean and JP and Michael, my co-founders, they saw on the horizon the rise of AI. What that meant was there'd be a new problem for computers: what the AI software would ask from the underlying chip processor would be different. We came to believe that we could build a better machine for that problem. That's what we saw. You know, obviously we didn't see it exactly right. I underestimated it. You know, this is my fifth startup and the first time I underestimated the size of the market by a lot. But what we did get right was that this was going to be big, that it would put a different type of pressure on a processor, that it would put pressure on the memory bandwidth, and that it would put pressure on the communication structure. That's what we saw. We dove in. It's been an extraordinary nine years.
Harry Stebbings
How does the movement into an age of AI change the requirements from a chip perspective of what is needed for a provider and how that then resulted in how you built Cerebras?
Andrew Feldman
The way to think about a chip is it does two things. It does calculations and it moves data. This is what a chip does. Sometimes, along the way, it stores data. And so what AI presented was a very unusual combination of challenges. First, the underlying calculation is trivial. It's a matrix multiplication, and an FMAC can be developed by any second-year electrical engineering student. So you say to yourself, holy cow, this has a huge number of very, very simple calculations. The hard part with AI work is that results and intermediate results have to be moved a lot. Therein is the most complicated part. They have to be moved to memory and from memory, and they have to be broken up and moved among GPUs. And what we saw was that this was going to be the hard problem, and that if we could solve for that problem, we would build an AI computer that was faster and used less power.
Harry Stebbings
When we think about how we're going to build and what we're building for, to me, kind of a couple of core elements which is like where you're going to focus. Are you focusing on fine tuning? Are you focusing on training? Are you focusing on inference? Three. You chose all three?
Andrew Feldman
Yeah.
Harry Stebbings
Why? And I'm sorry for my base questions, but I thought like GPUs were specialized towards training and they weren't specialized towards inference. Can you have a mono architecture that does three best?
Andrew Feldman
The first step in computer architecture is deciding what you're not going to do. What are we not going to be good at is really the first important question. To answer your question, you ask: is the computational work for training from scratch different from fine-tuning? And the answer is it's not different. It's approximately the same. Now, inference and training have some different requirements, and generative inference in particular has some very challenging requirements on exactly the communication dimension that I mentioned. In generative inference, you have to move all the weights from memory to compute to generate a single word, and you have to move them again to generate the next word, and again. So if you have a 70 billion parameter model, not a giant model, and each weight is 16 bits, you're moving what, 140 gigabytes of data to generate one word. This is an enormous amount of data movement across memory. And what that consumes, what it needs, is memory bandwidth. If you have an architecture like we saw in the GPU, that is your fundamental limitation. It's a fundamental architectural limitation. That was what we went to wafer scale to solve. They use a memory called HBM, a type of DRAM. It is phenomenal memory, but it's slow and high capacity. And when they set the architecture for graphics, that's what you wanted; you didn't have to go back and forth to memory very often. SRAM, on the other hand, is unbelievably fast, but has low capacity. And so we wanted to use SRAM, but if you build a normal-sized chip, you can't hold a model. And so by going to wafer scale, we were able to put down a huge amount of SRAM and get the benefits of speed and enough capacity. If you build a normal-sized chip with SRAM and you want to do a 400 billion parameter model in inference, you might need 4,000 chips. Or if you want to do a DeepSeek 671B, you might need 6,000 or 8,000 chips. What an administrative nightmare.
And if you can keep as much as you can on one wafer, two wafers, or four or 10, you get all the benefit of the SRAM. And because you've been able to use the wafer, you get this tremendous capacity as well.
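The arithmetic in this answer can be sketched in a few lines of Python. The 70 billion parameters and 16-bit weights come from the conversation; the 1 TB/s bandwidth used at the end is a made-up placeholder chosen for easy division, not a figure for any real part, and the function names are illustrative only. The sketch assumes a dense model generating one token at a time for a single user.

```python
def bytes_moved_per_token(num_params: int, bits_per_weight: int) -> int:
    """Bytes of weights read from memory to generate one token (dense model, batch size 1)."""
    return num_params * bits_per_weight // 8


def max_tokens_per_second(num_params: int, bits_per_weight: int,
                          bandwidth_bytes_per_s: float) -> float:
    """Upper bound on decode speed if moving weights were the only limit."""
    return bandwidth_bytes_per_s / bytes_moved_per_token(num_params, bits_per_weight)


weights_bytes = bytes_moved_per_token(70_000_000_000, 16)
print(weights_bytes / 1e9)  # 140.0 (gigabytes moved per generated token)

# Hypothetical memory bandwidth of 1 TB/s, an assumption for illustration only.
print(max_tokens_per_second(70_000_000_000, 16, 1e12))
```

Under these assumptions the ceiling on tokens per second is simply bandwidth divided by model size in bytes, which is why faster memory translates so directly into faster generation.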
Harry Stebbings
Can I ask you first? I totally get you on HBM and kind of the slowness of it. Why is it then that, bluntly, so much of the market just continues to use it, and 40% of Nvidia's revenue comes from using those chips for inference?
Andrew Feldman
Unless you went to wafer scale, there isn't really a credible other choice. This is the way GPUs had always been made. It's called a graphics processing unit. That's the way they were built. It was part of their advantage against a CPU. They were built this way, but now there are dedicated chips like ours. What used to be their advantage is now their weakness. That's a fun market to be in, when over a very short period of time what you're good at becomes your weakness.
Harry Stebbings
With a market cap like they have, and with Jensen as good as he is, which I'm sure we both agree with, they must know, right?
Andrew Feldman
They do know this. A, they don't make memory. So they're a consumer of other people's memory. And that's SK Hynix, or Samsung, or Micron. There are only three or four or five companies that make huge amounts of memory. Not many choices. But it's part of a complex architectural trade-off. The flip side, you could say it's worked really well for them, right? Look at where it's taken them. But those of us who do wafer scale, and it's a small set, a set of one, us, have a real advantage against them on inference.
Harry Stebbings
How do LPUs fit into this? We've got HBM, we've got SRAM with you and Botany having many more of them to make it work at scale. Where do LPUs fit into this mix?
Andrew Feldman
In our business there are a lot of ways to skin a cat. Our way is different than Nvidia's way. It's different than the TPU, it's different from Trainium. They're all different right now. And every day since August 26th, when we launched inference, our way has been the fastest way across a whole set of models tested by Artificial Analysis and others.
Harry Stebbings
Can I ask, when we think about kind of that speed, I am interested. You said that you're one of one with wafer and the architecture associated. What does that mean in terms of cost? With such efficiency, is it inherently more expensive, and what does that look like from a cost profile?
Andrew Feldman
This isn't our first dance. We've been building computers for a long time. When you make a choice like wafer scale, you have to weigh the trade-offs. We use less power because some of the most power-hungry things on a chip are the I/Os, moving data off chip. And so if you are moving data off chip frequently, you're using more power than if you can keep it in the silicon domain, on chip. So we knew we would use less power. We knew if you went to wafer scale, you had to solve some problems that people said were impossible to solve, like yield. So we had to invent techniques that allowed us to yield wafers. In fact, we invented techniques that allow us to yield as well as or better than others who are building much smaller chips.
Harry Stebbings
What is yield and why is it impossible to solve?
Andrew Feldman
A wafer begins as a 12-inch-diameter circle, a slice of silicon, and your chip is punched out of this the way your mother might take a cookie cutter and cut out cookie dough. During the process, at some point, just like your mom might have done, she lifts up the edges and all the little bits are removed, and what's left are just the cookies. Those are your chips. Now what happens is there are a set of naturally occurring flaws, and that's like your mother closing her eyes and throwing up a handful of M&Ms. Now, the bigger the cookie, the higher the probability you hit an M&M. The bigger the chip, the higher the probability that you have a flaw. And traditionally, what you did when you had a flaw was you threw away the chip, or you shut down part of the chip and sold it as a less valuable part, something called binning. So every wafer is going to have flaws. The bigger your chip, the higher the probability you hit a flaw, and the more silicon is wasted when you throw it away. This is what everybody thought was known truth. And one of the things our team realized was that there are other ways to handle flaws. What if instead you built your processor out of hundreds of thousands of identical tiles, and say there was a flaw? Say you just shut down that tile and worked around it. Say you had a row or a column of redundant tiles that, when you needed them, you could just pull in. Now, that had traditionally been the technique used in memory making, and memory yields are extraordinary. And so it occurred to us that if we could build a processor out of hundreds of thousands of identical tiles, we could use redundancy such that when there was a flaw, we could just leave it there, shut it down, work around it, and pull in one of the redundant tiles. That had never been done in a computer before, and that's at the heart of our architecture. It allowed us to yield and deliver whole wafers.
Nobody had ever been able to do that in the 70-year history of our industry. Really, really smart people struggled. I mean, Gene Amdahl, one of the fathers of our industry, had a company called Trilogy that crashed and burned trying to do this. And we figured it out.
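The cookie-and-M&M picture can be put into numbers with a textbook Poisson defect model. This is a generic illustration, not Cerebras's actual yield method: the defect density, die areas, tile count and spare count below are all invented for the example, and the model assumes defects land independently and that any defective tile can be swapped for a spare.

```python
import math


def poisson_yield(defect_density: float, area: float) -> float:
    """Probability that a die of the given area has zero defects (simple Poisson model)."""
    return math.exp(-defect_density * area)


def yield_with_spares(defect_density: float, tile_area: float,
                      num_tiles: int, num_spares: int) -> float:
    """Probability that at most `num_spares` of `num_tiles` identical tiles are defective."""
    p_bad = 1.0 - poisson_yield(defect_density, tile_area)
    return sum(
        math.comb(num_tiles, k) * p_bad**k * (1.0 - p_bad) ** (num_tiles - k)
        for k in range(num_spares + 1)
    )


D = 0.1  # hypothetical defects per square centimetre
small_die = poisson_yield(D, 1.0)    # a 1 cm^2 chip
big_die = poisson_yield(D, 100.0)    # a 100 cm^2 chip with no redundancy
tiled = yield_with_spares(D, 0.1, 1000, 30)  # the same 100 cm^2 as 1000 tiles, tolerating 30 bad ones
print(small_die, big_die, tiled)
```

With these made-up inputs the monolithic large die almost never comes out flawless, while the tiled design with a few percent of spare tiles almost always does, which is the intuition behind working around flaws instead of discarding silicon.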
Harry Stebbings
When you speak about kind of being the fastest and across all benchmarks, being the fastest, what matters the most? Is it being the fastest, Is it being the most efficient, Is it being the least costly? How do you think about the stack of prioritization for your customers?
Andrew Feldman
I think it varies. If you go to get a cancer diagnosis for, God forbid, your mother or your wife, I think 93% accuracy is just plain not as good as 94% accuracy. And you'd pay a lot and wait another week to understand what the accuracy is. Right? You'd pay a lot. Now, on the other hand, if you want Llama 405B to generate data to help you tune Llama 70B, maybe you can wait a few days, three days, a week more. There's no urgency there. On the other hand, if you want an answer from Perplexity, you don't want to wait 45 seconds for a search answer. You don't want to wait in a chat. You don't want to wait three minutes for R1 on GPUs to give you an answer. What we know is that in interactive mode, milliseconds matter. In interactive mode, what Urs Hölzle over at Google showed years ago was that you can destroy your user's attention with milliseconds of delay. So being the fastest matters more than anything in that domain. So I think what you have to do is be thoughtful and say, in some cases, being the fastest doesn't matter. We'll call those batch; maybe cheapest matters. In other domains, there is no search if you've got to wait eight minutes to get an answer. That's not a product. When you go fast, a whole set of new opportunities open up. Netflix used to mail DVDs. That's what happened when the Internet was slow: they'd mail DVDs. I remember.
Harry Stebbings
I look young, Andrew. I'm not that young. I remember Blockbuster.
Andrew Feldman
Yeah, well, if you remember Blockbuster, right, I mean, let's look at the history of that. You're exactly right. First we used to drive to Blockbuster to get a DVD. Then Netflix was mailing them to us, and then we got broadband and suddenly Amazon's a studio. Right? It changed everything. And speed in inference does the same thing.
Harry Stebbings
When we chatted before, you gave this great equation for inference. What was the equation that you gave for inference? Because it was really helpful for me in understanding.
Andrew Feldman
It begins with the following. Training makes AI. That's how we make AI. And inference is how we use or consume AI. And so understanding how big the inference market is means understanding the number of people who are going to use it, times how often they're going to use it, times how much compute each use takes. And right now we are in this rare time where the number of people using AI is growing, the frequency with which they use it is growing, and the amount of compute used in each instance of use is growing. That's why you're getting this extraordinary growth and that's why it's off the charts right now.
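Written out, the equation is a product of three factors, so growth in each one compounds. A minimal sketch follows; the growth multiples are invented purely to show the compounding and are not forecasts from the conversation.

```python
def inference_demand(users: float, uses_per_user: float, compute_per_use: float) -> float:
    """Total inference compute = number of users x uses per user x compute per use."""
    return users * uses_per_user * compute_per_use


baseline = inference_demand(1.0, 1.0, 1.0)
# Hypothetical multiples: 5x the users, 4x the frequency, 5x the compute per use.
grown = inference_demand(5.0, 4.0, 5.0)
print(grown / baseline)  # 100.0
```

Because the factors multiply, modest growth in each dimension is enough to produce a very large change in total demand.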
Harry Stebbings
When we think about the distribution of resources between training and inference, what will that look like in 5 years time? Because we've seen all focus go to training, not all, but a lot of focus go to training and not as much go to inference. What does that look like?
Andrew Feldman
What we made until the middle of 2024, what we made in AI was a novelty. It wasn't very useful. Late in 2024, what we made began to be useful.
Harry Stebbings
What was the turning point?
Andrew Feldman
If you look at the models, I mean, ChatGPT was not really a technical innovation. It was a user interface invention. It gave more people access, but we didn't really know right away what to do with it. It was cool, right? That's what I mean by novelty. It was like, whoa, this is cool. Now, if each person on your marketing team isn't on an LLM several times a day, they're not doing their jobs. That difference between novelty, "it's cool", and "this is part of everyday workflow", that's what changed. Starting sometime in Q4 last year and running into this year, AI became useful not just to a select group in Silicon Valley, but to my dad, to my brothers, the doctors, to ordinary people who aren't buried in the Silicon Valley discussion. And when you get them, then the market is ripping.
Harry Stebbings
Do you not still think we are so incredibly early though? Going back to your point of like how many in five years time, then where are we? Are we a hundred times bigger? Are we a thousand times bigger in terms of.
Andrew Feldman
I think we're way over 100 times bigger. Yeah.
Harry Stebbings
What does that mean in terms of what we need to equip ourselves to deliver? These are incredibly energy utilizing. It is incredibly difficult.
Andrew Feldman
Our industry consumes a lot of power.
Harry Stebbings
Yeah. And a lot of water. And we're seeing that come down. But are we equipped from an energy and a data center standpoint to deliver the inference requirements for a population that is as AI hungry as we are?
Andrew Feldman
I think a couple things. I think the first thing is to admit that this is a power-intensive problem. Our industry consumes an enormous amount of power. The second thing to say is, therefore, the burden is on us to deliver exceptional value as an industry. You take both the good and the bad, right? In order to make it worthwhile from a societal perspective to expend all this power, you better deliver the goods. We better use AI to find cures for diseases. We better use AI to solve a bunch of different societal problems. That's the macro view. Do I think that we are equipped? I think we are in a very unusual situation in the US where we have plenty of power, but it's in all the wrong places. We have power in Niagara. What we don't have is power where you want to build data centers, where we have good fiber. What we don't have is a national way to relax the local regulations that make getting power difficult. And so when you go to Silicon Valley, if you want to build a data center, you're dealing with local government and installed interests. And that is not an efficient way to decide if you want to build a power plant or put a new data center in, especially if it's large. I think those places that have ripped out some of that burden, Texas, for example, are getting a huge amount of data centers built.
Harry Stebbings
You know, when I spoke to Jonathan at Groq before, he said there were a huge amount of data centers being built that were not actually really equipped properly, and that we're seeing this massive supply of data centers that are really kind of done by tourists, so to speak. And that is a massive problem, and the provisioning of these data centers isn't there. Do you agree?
Andrew Feldman
A data center is a construction project to begin with. It's access to power, and then it's a construction project, and it's got a design engineering component. I think there's been a huge push for new construction data centers. We will see. We don't know if they're going to be good enough. I think many of them will be fine. The guys who were there early were some of the Bitcoin mining companies, TeraWulf, the guys at Crusoe and guys in Europe. They were early in building buildings near low-cost power in order to run compute that used a lot of power. And they are some of the leaders now in some of the largest projects. Now those are certainly not tourists. Those are extremely sophisticated data center builders. Sure, there are some tourists, but there are a lot of very, very knowledgeable data center builders building huge facilities right now. I mean gigawatt-scale facilities, both domestically and internationally.
Harry Stebbings
How do you think about how the cost of inference goes down? With the surge of demand that we mentioned over 100x, does the price reduce 100x? Does it follow Moore's Law continuously? How do we think about the ever reducing price of inference?
Andrew Feldman
The cost of inference is built up of several pieces, right? There's the power and space that is consumed to generate the response. That's a data center cost, that's an OPEX item. That's number one. Number two, there's the cost of the computer. We can drive down the cost of the computers with each generation by driving up their performance, et cetera. But the other thing we can do is develop more efficient algorithms. Our AI algorithms today are not particularly efficient. There's a tremendous amount of room. In a GPU, most of the time it's doing inference, it's 5 or 7% utilized. That means it's 95 or 93% wasted. Over time, I think as an industry, we get better at things. We can drive the cost of compute down. We can build more efficient data centers with lower PUEs, and our algorithms will get more efficient so that our utilizations on our now cheaper computers are higher. So you get a higher percentage of the maximum number of FLOPS. You get more tokens per unit time for the same power.
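The utilization point is simple arithmetic, sketched below. The 5% and 7% figures are from the conversation; the 50% target is a hypothetical, and the sketch assumes, as a simplification, that token throughput on fixed hardware and power scales linearly with utilization.

```python
def wasted_percent(utilized_percent: float) -> float:
    """Share of peak compute left idle at a given utilization."""
    return 100.0 - utilized_percent


def throughput_multiplier(old_utilized_percent: float, new_utilized_percent: float) -> float:
    """How many times more tokens per unit time, assuming throughput scales with utilization."""
    return new_utilized_percent / old_utilized_percent


print(wasted_percent(5.0), wasted_percent(7.0))  # 95.0 93.0

# Hypothetical improvement from 5% to 50% utilization on the same hardware and power.
print(throughput_multiplier(5.0, 50.0))  # 10.0
```

Under that linear assumption, the cost per token falls by the same factor the throughput rises, before any gains from cheaper hardware or more efficient data centers.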
Harry Stebbings
When you look at the inefficiency of the algorithms, as you mentioned there, and what that means for the utilization of the chips, why are people suggesting that we're at the limit of scaling laws already? That seems to suggest that there is so much room for improvement. How do you think about what you just said in conjunction with the idea that, with scaling laws, we're hitting this asymptote point? How do you reconcile the two?
Andrew Feldman
I don't think there's a lot of debate among senior ML thinkers that we have tremendous room for algorithmic improvement. I don't think there's a lot of debate there. There's even debate about whether the scaling laws are over, or whether we ran out of mojo to keep making data or gathering data to fill these ever bigger models. But OpenAI's work on o1 shows me that the scaling laws, certainly for inference, are fully functional. Right? The more compute you put on inference, the better answer you get. Many of the leading models are now MoEs. They're not presenting all of the weights to each token. And that's one way to do it: present the important stuff, not the unimportant stuff. There are other ways to do it that we will invent and learn over time. But we have human models that aren't all-to-all connected. Many of our models today are all-to-all connected. That's a lot of unnecessary connections, connections that don't produce anything that we still end up doing math over.
Harry Stebbings
I'm sorry, what does all to all connected mean?
Andrew Feldman
In many of the layers in a neural network, every element is connected to every other one. That's not actually the way the learning happens. Some are more valuable and some are not valuable at all. Imagine you're going to read 50 books. You want to learn something. You can read all 50 books. Or you could read three books that are really important. Or you could read summaries of the three books that are the most important. The problem is we don't know which they are at the beginning. And there's a process by which you could learn. There are things called dropout and all these other techniques to use sparsity to help solve these problems. That we are early in the evolution of AI plays right into this point, that we'll get better at these algorithms. Transformers aren't the end of the world. We'll get better. Better will mean faster, more accurate and more efficient. That's what's exciting about an ever-changing industry. That's why I'm not in all these other industries that don't change quickly, that are the same today as they were nine years ago.
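To make "all-to-all connected" concrete: a fully connected layer has one weight for every input-output pair, while a mixture-of-experts (MoE) layer only runs a few of its experts for each token. The sketch below is a simplified count with hypothetical sizes (a 4096-wide layer, 8 equal-sized experts with 2 active per token); it ignores shared weights such as attention and is not a description of any specific model.

```python
def dense_layer_connections(n_in: int, n_out: int) -> int:
    """Weights in a fully connected (all-to-all) layer: one per input-output pair."""
    return n_in * n_out


def moe_active_fraction(num_experts: int, experts_per_token: int) -> float:
    """Fraction of expert weights used per token when only some experts are routed to."""
    return experts_per_token / num_experts


# Hypothetical sizes, chosen only for illustration.
print(dense_layer_connections(4096, 4096))  # 16777216
print(moe_active_fraction(8, 2))            # 0.25
```

In the dense case every one of those weights takes part in the math for every token; in the MoE case, under these assumptions, three quarters of the expert weights are skipped for any given token.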
Harry Stebbings
But this show is kind of strange for me because I speak to a lot of people and they think about the three pillars and they're like compute, algorithms and data. A lot of the common refrain is that actually we're very far along in all of them and that has been the refrain. And when I hear you it's like actually it's very exciting. Sounds like they're wrong.
Andrew Feldman
I think they're wrong. I don't think we're very far along, and it's very difficult to say that we are early in an industry but far along in all its underpinnings. I think we are early in all of them.
Harry Stebbings
If we just take them one by one: in five years' time, how much synthetic versus human data will be used to train models? If you were to put a percent on it? Almost all synthetic? And the utility value of synthetic is the same as human?
Andrew Feldman
When you teach a pilot to fly in a simulator, there is a lot of potential data that isn't very useful in teaching them to fly. They spend a lot of time going straight, doing nothing as a pilot. Now, takeoffs and landings are where you want to spend your time. And that's why when we put them in simulators, that's what we have them doing. And in simulators we can create data where engines blow, where there are a whole set of problems, where learning can take place. That's simulated data. And in the same way, as we think about creating data, whether it's for self-driving or for other forms of AI, what we want is the data that's hard to gather, right? Otherwise we just have a bunch of data of people driving straight on a freeway. Not difficult. We've been able to do that for a decade. What we want is an unprotected left turn in the snow. It's snowing, it's hard to see. You've got an unprotected left turn, that's a difficult thing. And you want that thousands of different ways, millions of different ways. That's where the synthetic data comes along: you use it to fill in the empty parts, where it's really expensive or painful to get that type of data. Think of the pilot. You want them spending a huge amount of time in their training on things that are rare. Same with a surgeon. A huge amount of time on things that are rare. Most of the time it's carpentry, but their expertise only matters when something rare happens, when the unexpected occurs. That's when their mettle is shown. And we will get better at synthetic data by a great deal.
Harry Stebbings
I love it. I get it. From a consumer perspective and from an expectations perspective. If we move the needle on compute algorithms and data, what does that mean for the experience of AI?
Andrew Feldman
Faster and cheaper is the first answer. The second is, when things become faster and cheaper, new applications emerge. It's used everywhere, right? When computers became faster and cheaper, suddenly they were in cars, and then they were in your pocket, and then they were in your dishwasher and in your TV. That's what happens. I mean, 30 years ago you're like, I need a computer in my TV? You kidding me? I need one in my pocket? Now you've got powerful computers in your pocket. You've got them in your TV, you've got them in your kids' toys, you've got them in the car. That's what happens. Diffusion of innovation accelerates when you make things faster and cheaper.
Harry Stebbings
This is Jevons Paradox and Satya's belief there.
Andrew Feldman
No, yeah, I know. In the VC community, you got to cite 19th century English economists.
Harry Stebbings
I'm English. Come on. If I'm not allowed to cite an English philosopher, what am I here for? Are you just like, oh, these fucking VCs just being like, oh, Jevons Paradox.
Andrew Feldman
That's right. It's like, look, make stuff cheaper and faster. There are very few examples in our industry, actually none in compute in 50 years, in which by making things cheaper and faster the market got smaller. The market always gets bigger.
Harry Stebbings
Always. Can I ask, from an architectural standpoint: you mentioned Transformers there. Is there a world where we move past Transformers?
Andrew Feldman
There is a world, 100%. We won't be as dependent on transformers in three years or five years as we are now. 100%. They're not the end all, be all.
Harry Stebbings
Why is it? What will replace it and what does that look like?
Andrew Feldman
I don't know. I don't know whether they're going to be state space models. I don't know whether they're going to be other types of models. But what I know for sure is that innovation doesn't stop. The transformer has some weaknesses that people are desperate to overcome. There's a quadratic effect in the attention head. There's all sorts of things that could be improved, but it's pretty darn good now. It's the best we have. And that's what you run with. You run with the best you have, and the minute it's not the best you have, you drop it in favor of the best you have. I mean, the number of innovative companies designing models is large. And what DeepSeek showed us is you don't need 5,000 people and billions of dollars of gear. You can do it with 200 smart people. More gear than DeepSeek said they had, but less gear than others had.
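The "quadratic effect" refers to standard self-attention comparing every token with every other token, so the number of attention scores per head grows with the square of the sequence length. A minimal sketch, with sequence lengths chosen only for illustration:

```python
def attention_score_entries(seq_len: int) -> int:
    """Entries in one head's attention score matrix: every token scored against every token."""
    return seq_len * seq_len


# Doubling the sequence length quadruples the number of scores to compute.
print(attention_score_entries(2048) / attention_score_entries(1024))  # 4.0
```

This is the cost that alternative architectures, whatever they turn out to be, are trying to reduce for long inputs.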
Harry Stebbings
Were you very impressed with DeepSeek, and what impressed you most?
Andrew Feldman
I think it was a result of focused engineering, and that impressed me. It was designed to be better. They weren't confused about being model intellectuals, and they weren't confused about whether it was important to break new ground. They were interested in being better. From an invention standpoint, that's a little boring. But from an engineering standpoint, that was a sweet effort. They really built a model that was just plain better at many, many things. And that's cool. I like good engineering projects. Now, that they chose to announce it right around Trump's inauguration, and the politics of it, that's all a separate matter. And we can talk about that later.
Harry Stebbings
But is distillation wrong?
Andrew Feldman
No, I don't think distillation is wrong. Is summarization wrong?
Harry Stebbings
I'm a VC. Are you kidding me? That's what we do.
Andrew Feldman
If you didn't summarize, you wouldn't know anything, right? That's right, that's exactly right. I don't think distillation is wrong. And if distillation is wrong, then certainly using people's copyrighted data is wrong. That's the problem. The problem is you got to be a little bit consistent.
Harry Stebbings
Well, Sam's been a guest many times and we hope he will be again.
Andrew Feldman
I think neither is wrong, actually. But I think you have to be consistent.
Harry Stebbings
Well, the thing is, bluntly, DeepSeek is open. So everything that they did innovate on, OpenAI can learn from and take too.
Andrew Feldman
I think there are few examples of an open source anything having the sort of immediate impact that model had. I mean, that model had a giant impact in a technical community of really smart people. And there are very few examples of other open source software projects that had that type of impact in that amount of time. You're in the business of betting on these guys. They ramp up and they oh look, 10,000, now it's 100,000 users, it's now a million users. We better start a company around that, get those grad students. But this had a loud boom in the industry. Immediately I was like, whoa.
Harry Stebbings
The thing I have to think about as a venture investor is, simply, where is enduring and defensible value? And how do I get in early and build that over time?
Andrew Feldman
In hardware, Harry.
Harry Stebbings
Well, this was my question, which is like, well, I mean, you have to be a very smart investor like Eric Vishria to do hardware, to be clear. But on the model side, do you think there is value when you look at the sheer number of players with relatively comparable models?
Andrew Feldman
To demonstrate enduring value, you need both immediate value and a trajectory for more. I think the problem is in some industries you are capable of demonstrating a leadership position for a short period of time. And then someone else, maybe the next generation, they generate the next, and the next generation the next. And I think that ends up, in the software world, being that you're competing against other people's release cadences: you're four months ahead, they're six months. If that's really where you are, there's not a lot of value. But if you can stay at the top over years, right, even if you're not the best, even if you're top decile over years and the people above you are changing constantly. Very large Silicon Valley companies have been built with not the most compelling technology. It might have started as the most compelling technology and then it got to a point where it was good enough, it was easy enough to use. That's when you're at the mature market. But we're a long way from there right now. Right now we are in the early phases. You characterized my position exactly right. Data, compute, algorithm. I think we have a ton of room for improvement on all of them.
Harry Stebbings
When we look, you said that compute and hardware, that's where the value is. How does that value distribution shake out? You know, we've obviously got the 800 pound gorilla that is Nvidia. How do you think about how the distribution of value shakes out in hardware and in compute over the next five years?
Andrew Feldman
Historically, one of the barriers to entry was sort of the capital intensity of a project. And in the world of building chips, there are both scarce resources in expertise and it's very expensive. Historically, it hasn't fit very comfortably in a software company. And the things that modern software companies value are not entirely conducive to chip making. So when I look down the road, who has endured in much of infrastructure tech? People who build systems: Cisco, Juniper. Chip makers have endured. There's a reason that Apple and Nvidia are among the most valuable companies on earth. What they do is hard. That's why it's worth challenging. If it weren't hard, if it wasn't enormous and difficult, why spend time being the underdog and challenging it?
Harry Stebbings
A lot of people place defensibility around Nvidia's kind of CUDA lock-in. To what extent is that real versus hype in inference?
Andrew Feldman
It's not real at all. There's no CUDA lock-in in inference. None. You can move from OpenAI on an Nvidia GPU to Cerebras, to Fireworks' service on something else, to Together, to Perplexity, with 10 keystrokes. Anybody who actually uses AI knows there's no CUDA lock-in. I think there was a fundamental effort to disintermediate CUDA, first by some grad students with Caffe and some of these early efforts, but later by Google with TensorFlow and then Facebook, or Meta, with PyTorch. I think today most AI is written in PyTorch and you ought to be able to compile it and run it on your hardware. Nvidia has many moats. When you are a dominant market share leader, that in itself is a moat. That you're the default solution is a moat. That everybody learns to think about AI in your structures, those are moats. The software compilers are hard, but they're tractable.
Harry Stebbings
I completely agree with you in terms of, kind of, being the leader is the moat in itself.
Andrew Feldman
It's never talked about that way.
Harry Stebbings
Would you put OpenAI in that same category? It is the leader. Everyone's mother knows ChatGPT.
Andrew Feldman
Let's look at Intel, right? Until hiring Lip-Bu, Intel had made nearly a decade of catastrophic decisions, and they still own 80% of the x86 market, 75% of the market. AMD has worked up to like 25% or 30% after a decade of Intel screwing up, and you ask yourself: that's a moat. How big is my moat? I can make a bunch of bad decisions for a decade and only lose 20% share. That's extraordinary. The moat was just unbelievable. We'll see. I mean, I'm a huge fan of Lip-Bu's. He's an investor in our company, I wish him well, and I think if anybody can change that company, he can. But I think we rarely talk about what being the market share leader means in terms of a moat in the right context. As a challenger we have to think about it exactly, because it's exactly that that we need a bridge for. It's exactly these characteristics of the moat that we need to get over.
Harry Stebbings
In five years time though. Is it Uber or is it like AWS and cloud? And what I mean by that is like cloud is an interesting market where like a couple of players or several players have relative segments, 25, 30% and it's shared relatively evenly between them. Not exactly, but relatively. Or is it one like Uber where Uber has 90%, Lyft has five and then there's alternative providers with the other five.
Andrew Feldman
Right. I think it's going to be between those two. In five years from now Nvidia is going to have 60%. Right? I think right now they have approximately all of it. I think they will come down over time.
Harry Stebbings
Of Nvidia's usage, what percent will be training versus inference.
Andrew Feldman
I think they will continue to have a meaningful business on both sides. I think they're exceptional at training. They will not roll over and play dead in inference. I think they're a world class company. I mean, they've had one of the great decades of any company in history. Right? I mean, from 2014, when they were worth what, 10 billion, to where they are right now? It's one of the great decades in corporate history. I don't think they're going to roll over and say, oh yeah, we're not going to be in the inference market. That's not going to happen. They can have meaningful share, but the market's growing and we'll have a piece. I think others will have a piece. There'll be some very big companies made in this 100x growth.
Harry Stebbings
Do you think chip providers will be far larger than model providers in terms of enterprise value?
Andrew Feldman
In the five-year time frame? Yes.
Harry Stebbings
How does that prediction change in a different timeline?
Andrew Feldman
I think in a shorter timeline, when you price an option, variance and uncertainty increase the option's value. If you look at the way Black-Scholes works, or if you look at any option pricing model, uncertainty is a friend, variability is a friend of the value of the option. And when people are paying these extraordinarily high prices for model companies right now, I think part of that is this extraordinary uncertainty, this wild variance. And so in the shorter run, it might not be the case. But in the longer run, as markets mature, as we begin to understand the value of these models, we understand what their businesses look like, what their long term net profitability looks like. What did Warren Buffett say about markets? In the short term they're a voting mechanism and in the long term they're a weighing mechanism. At some point the weighing kicks in. Usually it's in the public markets, and then investors say, which is likely to give me better growth in the future?
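[Editor's note: the point about variance can be checked against the standard Black-Scholes formula for a European call option. The sketch below is illustrative only and is not from the conversation. The spot, strike, rate, and volatility inputs are made-up assumptions, not figures for any company discussed.]

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(spot: float, strike: float, rate: float, sigma: float, years: float) -> float:
    """Black-Scholes price of a European call on a non-dividend-paying asset."""
    d1 = (math.log(spot / strike) + (rate + 0.5 * sigma ** 2) * years) / (
        sigma * math.sqrt(years)
    )
    d2 = d1 - sigma * math.sqrt(years)
    return spot * norm_cdf(d1) - strike * math.exp(-rate * years) * norm_cdf(d2)


if __name__ == "__main__":
    # Hypothetical inputs: same option, only the volatility changes.
    for sigma in (0.2, 0.4, 0.8):
        price = bs_call(spot=100.0, strike=100.0, rate=0.05, sigma=sigma, years=1.0)
        print(f"sigma={sigma:.1f} call={price:.2f}")
```

[With everything else held fixed, the call price rises as the volatility input rises, which is the sense in which uncertainty is "a friend" of the option's value.]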
Harry Stebbings
I mean, listen, you mentioned the word public there. I do want to just hone in on your business. You're cash flow positive in a world where everyone else literally bleeds cash. Help me understand, what do you do to make your cash flow positive when everyone else is bleeding or hemorrhaging cash?
Andrew Feldman
Traditionally, your gross margins were a measure of your technical differentiation, right? If you're running a negative gross margin business, it speaks for itself. You're selling commodity. Your value creation isn't being recognized in the market. And so I think our technology is creating an opportunity for us to maintain margins where some others can't.
Harry Stebbings
A lot of your revenue is concentrated to the G42 deal. To what extent is that a strength or a weakness?
Andrew Feldman
It's both. The way you catch three large customers is to catch one first. The way you build three large strategic partners is learn to be a strategic partner. That's a learned skill. We didn't arrive knowing how to be a strategic partner at G42. Now that we've worked at it and worked at it, it's a muscle we can replicate. We could be a better partner to any of a dozen different companies in the world.
Harry Stebbings
What have you learned in the G42 relationship-building process that makes you a good partner in a way that you weren't?
Andrew Feldman
We've deployed tens of exaflops of compute, vastly more than anybody else that isn't AMD or Nvidia. Right. I mean, a huge amount of compute. Our software has been hardened on some of the largest AI clusters in the world. We've gone through the growing pains of increasing manufacturing 2x and 5x and 2x again, through unbelievable growth in manufacturing. We've worked with our supply chain partners to be sure that they're ready for this extraordinary growth. When you work with a strategic partner of this size, your organization comes out different on the other side. There are things you've learned and there are mistakes you've made. And I hadn't done a big relationship in the Middle East. There was a huge amount to learn. I think you come out a much better company and much better prepared to do business with a hyperscaler, to do business with another massive partner, to do business with another sovereign. It takes real work and your team has to learn.
Harry Stebbings
You said you'd come out better. Why go public when you did? When this happened, I was like, it seemed preemptive, respectfully. And my question now to companies is why go public at all? There is so much private capital. The Collisons have shown, I think very clearly, that you can stay private for a lot longer than you plan to.
Andrew Feldman
Databricks has certainly shown that. Right? I mean, those were historically public market valuations. You know, the valuations that Anthropic and OpenAI and some of the others are getting are historically public-market-only valuations.
Harry Stebbings
And like you said, the S-1's live. Anyone can read it. I wouldn't want people reading mine.
Andrew Feldman
We have nothing to hide.
Harry Stebbings
I mean, no, but your competitors have got asymmetric information.
Andrew Feldman
Yeah, we've got asymmetric technology. To be public, you have to be ready organizationally, be ready with your processes. You need to be ready to forecast and predict, to be held accountable in a way that private companies historically haven't been. We think that there's tremendous value. We think that we will be among the first in the category. We think that some of our largest targets would have a stated preference for doing business with public companies. Large enterprises in the US have done that historically. Those were some of the reasons that led us to.
Harry Stebbings
How many G42 relationships a la G42 will you have in the next 24 months? How fast can you ramp them?
Andrew Feldman
That's a good question. Several. Those are big numbers.
Harry Stebbings
Sorry, remind me, how big is the G42? It's 87% of revenue. I know that it was big.
Andrew Feldman
I mean, when we announced it, some estimated it was north of a billion.
Harry Stebbings
Well done. That must be a bit of a high five, wasn't it?
Andrew Feldman
Look, I think, come on. First, yeah, there's tremendous excitement, and then there's sort of every entrepreneur's reality, which is: I've got to make a lot more gear. And you make a list of your top 10 vendors and you fly out to them all and say, big orders are coming, be ready. Right? You work with all your partners to get ready because you need to make a great deal more stuff. And that's one of the real differences between hardware and software: when we grow fast, the number of people you need to work with in your supply chain and the amount of collaboration that needs to happen is truly extraordinary.
Harry Stebbings
Are Nvidia going to have a clusterfuck of unhappy customers who bluntly have waited so long for chips? By the time they get them, the chips are outdated and they're going what?
Andrew Feldman
All of that's an opportunity for us and others. That's opportunity. I think being a market share leader isn't easy either. But when you're late, when the bully falls, everybody wants to give him a kick. I mean, a lot of that happened at Intel. They'd been the dominant player and when they fell, everybody was happy to jump in and kick them when they were down. I think there is a real opportunity in the potential for Nvidia customer unhappiness, for sure, for those of us who are competing with them. I mean, if you can't get your gear, you may as well test somebody else's. That's a huge opening.
Harry Stebbings
Head over to Cerebras and use the promo code Harry20 for your chips.
Andrew Feldman
Do that.
Harry Stebbings
There we go. I'm here for you, baby. Influencer mode turned on. Yeah, yeah, no no worries. It's fine. I hope if we could do a 20 take and on the billion deal, that's fine.
Andrew Feldman
I know this venture business hasn't been so good to you, Harry. And you got to get shoes for your kids and the like and. And yeah, we're happy to donate to.
Harry Stebbings
400 million fund and fees. That's right. When I have no kids as well.
Andrew Feldman
Two and 20 is a rough way to make a living, Harry.
Harry Stebbings
Dude, you don't get it. Okay, in hardware, you talked about the complexity of hardware. Are export controls being implemented properly? Do you think that is a good idea? You know, everyone was going, with DeepSeek, wow, how did this happen? They must have stolen chips. How could this be?
Andrew Feldman
It turns out that they probably did use chips in Singapore. I think the following. I think managing software and managing hardware compliance are extremely different things because their vector of diffusion is different. There's different weights. If you sell a server that weighs five or six hundred pounds and arrives on a pallet, you can go visit it. You want to deploy it in Kazakhstan, you can put a data center in, you can have somebody from the embassy visit it, take photos of it once a month. It's not going anywhere. You can keep track of who uses it and provide logs, and that's much, much harder with software. And open source is a whole nother thing. That's the first observation. The second is that I got to know the leadership in Commerce in the previous administration. I didn't always agree with their policies, but it is a world of unintended consequences. You sought to limit Chinese access to EDA tools to delay the growth of a Chinese chip market. And so US venture capitalists backed tons of Chinese companies in Shenzhen to build EDA tools. Right. This is an unbelievably slippery, dynamic, challenging problem. I don't know if it's a tractable problem. To delay another nation's progress on a technical trajectory is an enormously challenging thing. I certainly came to appreciate just how difficult it was for well-meaning people to predict the impact of policy during the last two years. For sure.
Harry Stebbings
Do you think this administration is better for AI than the prior administration?
Andrew Feldman
I don't think there's any doubt that's the case. I think the past administration lined itself up against Big Tech. That was a mistake. AI is also in a different place. So it's easier to be for it. It's less scary now than it was. We sort of have a better picture of the trajectory, both the risks and the benefits. This administration sort of had the foresight to put in place an AI czar or leader to be a focal point for discussions. Yeah, I think it's probably net a fair bit better.
Harry Stebbings
You said it's very challenging to kind of hinder a nation's development, adoption, progression of a technology. Respectfully, you chose to not sell to China.
Andrew Feldman
Yeah.
Harry Stebbings
Why was that? And does that not go against the difficulty in hindering progression?
Andrew Feldman
No. I have a very simple rule and I encourage a team to use it. I mean, you don't need a big handbook to help you make good decisions in a company. Just ask yourself, would my mother be proud if I did this? Would she be proud if I explained exactly the situation, and would she look at me and say, I'm proud you're doing this, son? And I asked myself that. And I came to believe that the deal on the table wouldn't be used for good. And I wasn't comfortable with that. And I wouldn't have been able to explain it to my mother. And that's a moral compass.
Harry Stebbings
What do you mean it wouldn't have been used for good?
Andrew Feldman
To do facial recognition to identify minorities for persecution, to build military equipment, to do things that I either couldn't see, or what I saw didn't feel right. It's more important than money.
Harry Stebbings
Do you think we fundamentally underestimate the Chinese's capabilities?
Andrew Feldman
100%. And it is one of the most obvious and frequent errors in judgment, that you underestimate the other side. You have to look carefully at what they're doing. And their investment in infrastructure has been extraordinary. The rate at which they generate engineering talent is exceptional. The government's ability to have a policy and implement it. That's not a democracy. They weren't designed to have checks and balances there. The funding that flowed into the development of AI technology, where their venture capitalists were backstopped by their government. They have national champion companies. They've developed a Belt and Road strategy to sort of make much of the third world dependent on them and their technologies. I think they absolutely should not be underestimated. They have a lot of people and we see a tiny fraction of it. They have produced industrial policy that has moved their nation forward.
Harry Stebbings
What was the most significant do you think?
Andrew Feldman
The creation of economic zones like Shenzhen was clearly a visionary move. They knew that their own system was in the way they created zones that relaxed their own system.
Harry Stebbings
Could the US learn from them in that way?
Andrew Feldman
We did some of the same things in the Trump 1 administration. Right? What did we do? We relaxed our own rules in the development of vaccines. We knew that in this time it would be very difficult to go through the steps that we always go through. And we tried to implement some thoughtful workarounds. I think that, you know, why are they committed to trains as a mode of transportation, and we can't build a decent train system in the US or in California? Or why do we have three different standards for train rails, and the rest of the world can build extraordinary high speed trains linking important cities? What are we doing wrong in the building of our infrastructure, that our bridges and our freeways are in disarray? Those are questions we got to ask ourselves when we see other people doing it differently. If you watch a good football team and you say, whoa, that's an interesting offense, and you're not thinking to yourself, how could our team learn? What could we do? Why did that work? What was it about the people they had, or the talent, or the structure, or something, that made that a successful series of plays? And what can I take away from that? How can that inspire me to do better? I'm always looking for inspiration in others, in competitors and partners. We have some of our partners at G42. I mean, the work ethic is unbelievable. It inspires me, and the scope of the challenges that are taken on inspires me, and I think I'm always looking for that.
Harry Stebbings
Andrew, I could talk to you all day. I do want to do a quick fire with you. So I say a short statement. You ready?
Andrew Feldman
Yeah, sure.
Harry Stebbings
What do you believe that most around you disbelieve?
Andrew Feldman
I think we're closer to peace in the Middle East than people believe. There is a rise of a moderate, business-focused Arab state that wasn't there 25 or 30 years ago. If you visit the UAE or Qatar or even KSA, what you see is amazing transformation. A desire to be included in the West in their own way, but also to enjoy the benefits of it. We are closer than people think.
Harry Stebbings
What's the most underrated threat to Nvidia's market share dominance?
Andrew Feldman
The fundamental architecture of the GPU with off chip memory is not great for inference. Now they will continue to do well in inference, but it can be beaten and I think they know it.
Harry Stebbings
What's a crazy AI prediction you have that most people would call science fiction? Dario at Anthropic said we'll live to 150.
Andrew Feldman
I don't think we're going to live to 150. I don't think, what, 90% of our code will be written by machines in this year. But I do think that within a year or two, most people in the US will engage with an AI every single day in one form or another, whether they know it or not. That AI might be in their mapping program that helps them pick a better route to work. It might be any number of different things. Within a year or two, AI's penetration will be approximately the same as telephones.
Harry Stebbings
What have you changed your mind on in the last 12 months?
Andrew Feldman
Many decisions I made turned out to be wrong.
Harry Stebbings
What was the most wrong decision?
Andrew Feldman
There are two ways you can be wrong. You can actively be wrong or you can fight against what was right. In 2016, JP, one of our co-founders and chief system architect, laid out a plan that would have us doing water cooling for our systems. Nobody else was doing it. And I fought so hard and I was so wrong. JP was right. About a year or two later, Google announced that the TPUs were going to be water cooled. We were first, and now Nvidia is only selling water cooled parts. I mean, I was dead wrong and JP was right. There are many, many instances, when you make a lot of decisions every day, where you're wrong. I've been wrong about people. People I thought were pretty good turned out to be extraordinary. People I thought would be extraordinary were really smart but couldn't finish projects and get stuff done. If you're not prepared to be wrong a fair bit, you ought not to be making a lot of decisions, because it comes with the territory.
Harry Stebbings
As a venture capitalist I'm never wrong, so I don't know what.
Andrew Feldman
As a venture capitalist you're wrong nine times in ten and everybody forgets, as long as you're really right.
Harry Stebbings
And I get a picture of you signing the term sheet with me.
Andrew Feldman
I think yours is a perfect industry in which nobody cares about the average. On average, you're wrong all the time. And what they care about is the occasional time you're really right. That's what moves a fund. That's different than being a CEO. I think we got to be mostly right most of the time. But if you're making a lot of decisions, you're still making a ton of mistakes.
Harry Stebbings
This is your fifth startup. I mean, you are a sucker for punishment, aren't you? I mean really, like five times, like Christ, Andrew, did you not get beaten alive enough? My question to you though is like, I believe in the value of serial entrepreneurship, right? I've spoken to many who don't. How do you think about the inherent benefits that you have, having done it four times before?
Andrew Feldman
I think if you are in a business in which running a business is a benefit, then experience matters a great deal. I think if you are in a business in which you look like your customer, it's different. There was a reason why social networks were started by people right out of college or in college: because dating is top of their mind and they look like their customers, and that was more important than knowing anything about running a business in that environment. It will certainly select for people who are of the demographic that their customers are. They know that backwards and forwards. But if you want to have a business that has manufacturing in it, that has a supply chain, that has you managing hundreds or thousands of engineers to a timeline, to a schedule, I don't think anybody would turn around your statement and with a straight face say, you know, what I'm looking for is an engineering leader with no experience. Right? No. I don't want somebody who's led a team of 400 or 500, who has experienced the challenges of growth. What I'm looking for is somebody with no experience.
Harry Stebbings
Naivety is a bonus here.
Andrew Feldman
That's right. I think the people who sell that sometimes are consultants, right? Oh, look, my guys have no experience in your industry. They're not biased. Right. Maybe a little bit of experience in the industry would help. Right? Come on.
Harry Stebbings
Where are people investing today in AI, across the stack, you can choose any part, where you're like, why is so much cash going to that part? I'm not saying that company. I don't.
Andrew Feldman
I think part of the dynamic in your industry is sometimes money needs to find a home. Some guys have raised really, really big funds and they got to find a home for their money. And some people don't like to be left out, and they're willing to make investments maybe for some status purposes or other reasons that don't seem to make sense. There are some underappreciated places of investment, I'd say, in the chip world: the sub-milliwatt, really tiny little chips that live next to sensors and do inference. These are tiny little things that will only send back useful data. It's an extremely interesting market and they will sell enormous volume. Now, it's not a part of the market I love to play in. I like to build bigger things and sell them to the data center. But I think that part is extremely interesting. I think they'll be fundamental for robotics. That's an area that's extremely underappreciated.
Harry Stebbings
Final one, if we think about Cerebras in 10 years' time, where do you envision the business? If everything goes well, where are we in business having that conversation?
Andrew Feldman
So 10 years ago, Nvidia was worth $10 billion. That's a long run in our world right now. I think in three to five years, I would like our technology to have been used to solve two important societal problems. I would like it to have been used to find a therapeutic for an affliction that impacts more than a million people a year. I would like our inference to be powering a collection of apps that don't exist today. And I would like a meaningful portion of the population in the US and in Europe to inadvertently use our technology, so they use something that we power and they don't even know it. I think those are things that would make me really happy.
Harry Stebbings
I've wanted to make this show happen for a long time. As I said, I heard so many good things from Eric for many years. There's been so many requests to have you on the show. My team is just like, just get Andrew on the show, Harry. I'm like, okay, okay. I, like, tweeted it obviously, which is how we got this, but thank you for joining.
Andrew Feldman
You tweeted it and like 40 people sent me notes saying, how come you're avoiding Harry? How come he has to go tweet it? I was just like, all right, just call me. It's good. Send me a note. Happy to come on. Really thoughtful questions, Harry. Really thoughtful and interesting. A really fun conversation.
Harry Stebbings
So I have wanted to do that.
Ross
Show for a while, but frankly I was just blown away by Andrew's humility, his no BS approach. He was incredible to work with in the process and I just so appreciate his time today. If you want to watch the full episode, you can find it on YouTube by searching for 20VC. That's 20VC on YouTube. But before we leave you today, turning your back of a napkin idea into a billion dollar startup requires countless hours of collapse, collaboration and teamwork. It can be really difficult to build a team that's aligned on everything from values to workflow. But that's exactly what Coda was made to do. Coda is an all in one collaborative workspace that started as a napkin sketch. Now, just five years since launching in beta, Coda has helped 50,000 teams all over the world get on the same page. Now at 20 VC, we've used Coda to bring structure to our content planning and episode prep, and it's made a huge difference. Instead of bouncing between different tools, we can keep everything from guest research to scheduling and notes all in one place, which saves us so much time. With Kodi, you get the flexibility of docs, the structure of spreadsheets, and the power of applications, all built for enterprise. And it's got the intelligence of AI, which makes it even more awesome. If you're a startup team looking to increase alignment and agility code Coda can help you move from planning to execution in record time. To try it for Yourself, go to Coder io20VC today and get six free months of the team plan. For startups, that's Coda io20VC to get started for free and get six free months of the team plan now that your team is aligned and collaborating. Let's tackle those messy expense reports. You know those receipts that seem to multiply like rabbits in your wallet? The endless email chains asking can you approve this? Don't even get me started on a month end panic when you realize you have to reconcile it all. 
Well, Pleo offers smart company cards, physical, virtual and vendor-specific, so teams can buy what they need while finance stays in control. Automate your expense reports, process invoices seamlessly, and manage reimbursements effortlessly, all in one platform. With integrations to tools like Xero, QuickBooks and NetSuite, Pleo fits right into your workflow, saving time and giving you full visibility over every entity, payment and subscription. Join over 37,000 companies already using Pleo to streamline their finances. Try Pleo today. It's like magic, but with fewer rabbits. Find out more at pleo.io/20VC. And don't forget to revolutionize how your team works together: Roam. A Company of Tomorrow runs at hyperspeed with quick drop-in meetings. A Company of Tomorrow is globally distributed and fully digitized. The Company of Tomorrow instantly connects human and AI workers. A Company of Tomorrow is in a Roam virtual office. See a visualization of your whole company: the live presence, the drop-in meetings, the AI summaries, the chats. It's an incredible view to see. Roam is a breakthrough workplace experience loved by over 500 companies of tomorrow, for a fraction of the cost of Zoom and Slack. Visit ro.am, that's R-O dot A-M, for an instant demo of Roam today. Nobody knows what the future holds, but I do know this: it's going to be built in a Roam virtual office. Hopefully by you. That's Roam, ro.am, for an instant demo. As always, I so appreciate all your support and stay tuned for a fantastic episode coming on Wednesday with, I think, one of the most under-discussed firms in venture capital, Lead Edge Capital and their founder Mitchell.
Podcast Summary: The Twenty Minute VC (20VC)
Episode: AI Chip Wars: How Cerebras Plans to Topple NVIDIA's Dominance | Why We Have Not Reached Scaling Laws in AI | What Happens to the Cost of Inference | How We Underestimate China and Shouldn't Sell To Them with Andrew Feldman
Release Date: March 24, 2025
Host: Harry Stebbings
Guest: Andrew Feldman, Co-founder and CEO of Cerebras
In this episode of The Twenty Minute VC, host Harry Stebbings welcomes Andrew Feldman, the co-founder and CEO of Cerebras, renowned for developing the fastest AI inference and training platform globally. Andrew shares insights into Cerebras' strategy to challenge NVIDIA's dominance in the AI chip market, the inefficiencies in current AI algorithms, the future of AI scaling laws, and the geopolitical implications of AI technology.
Timestamp: [04:32]
Andrew Feldman discusses the inception of Cerebras in 2015, driven by the recognition of evolving AI workloads that traditional GPUs couldn't efficiently handle. He emphasizes the inefficiency in current AI algorithms, noting that GPUs are only about 5-7% utilized during inference tasks, resulting in significant wasted computational resources. This inefficiency stems from the fundamental architecture of GPUs, which rely heavily on off-chip memory, making them suboptimal for inference operations.
Notable Quote:
“We won’t be as dependent on transformers in three years or five years as we are now. 100%. The fundamental architecture of the GPU with off-chip memory is not great for inference.” — Andrew Feldman [00:00]
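The utilization figures quoted in this segment reduce to simple arithmetic. As a minimal Python sketch, assuming a hypothetical hourly accelerator price (the episode gives no price), the following shows how 5-7% utilization translates into idle capacity and into the effective price of an hour of fully used compute. The function names are illustrative only.

```python
def wasted_fraction(utilization: float) -> float:
    """Fraction of capacity left idle at a given utilization (0 < u <= 1)."""
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    return 1.0 - utilization


def cost_per_useful_hour(hourly_cost: float, utilization: float) -> float:
    """Effective cost of one hour of fully used compute at a given utilization."""
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    return hourly_cost / utilization


if __name__ == "__main__":
    # Assumption: a made-up price, used only to show the scaling effect.
    HYPOTHETICAL_HOURLY_COST = 2.00
    for u in (0.05, 0.07):  # the 5% and 7% figures quoted in the episode
        print(
            f"utilization={u:.0%} "
            f"wasted={wasted_fraction(u):.0%} "
            f"effective cost per useful hour={cost_per_useful_hour(HYPOTHETICAL_HOURLY_COST, u):.2f}"
        )
```

The point of the second function is that low utilization multiplies the effective price: at 5% utilization, each useful hour costs twenty times the nominal hourly rate.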
Timestamp: [06:08]
Andrew elaborates on Cerebras' innovative wafer-scale architecture, contrasting it with traditional GPU designs. While GPUs excel in handling large-scale training tasks, their architecture poses limitations for inference due to high memory bandwidth requirements. Cerebras addresses this by utilizing SRAM over HBM (High Bandwidth Memory), enabling faster data movement and higher efficiency during inference. This approach allows Cerebras to maintain high performance without the administrative complexities of scaling across thousands of traditional GPUs.
Notable Quote:
“By going to wafer scale, we were able to put down a huge amount of SRAM and get the benefits of speed and enough capacity.” — Andrew Feldman [07:39]
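One common back-of-envelope way to see why memory bandwidth matters for inference: when a model generates one token at a time, its weights typically have to be read from memory once per token, so memory bandwidth puts a ceiling on the token rate. The sketch below illustrates that reasoning only. It is not a model published by Cerebras or NVIDIA, and the function name, the model size, and both bandwidth figures are hypothetical placeholders.

```python
def max_tokens_per_second(weight_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Upper bound on single-stream decode speed if every weight byte
    must be read from memory once per generated token."""
    if weight_bytes <= 0 or bandwidth_bytes_per_s <= 0:
        raise ValueError("both arguments must be positive")
    return bandwidth_bytes_per_s / weight_bytes


if __name__ == "__main__":
    # Assumptions: a 70-billion-parameter model stored at 2 bytes per parameter.
    weight_bytes = 70e9 * 2

    # Assumptions: placeholder bandwidths, not vendor specifications.
    memories = {
        "slower memory (placeholder)": 3e12,
        "100x faster memory (placeholder)": 3e14,
    }
    for label, bandwidth in memories.items():
        bound = max_tokens_per_second(weight_bytes, bandwidth)
        print(f"{label}: at most {bound:.1f} tokens/s per stream")
```

Under this simplified model the bound scales linearly with bandwidth, which is the intuition behind preferring faster memory for inference.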
Timestamp: [12:02]
The discussion shifts to the cost dynamics of AI inference. Andrew highlights that the primary components driving inference costs are power consumption and the physical space of data centers. Cerebras' wafer-scale chips consume less power by minimizing off-chip data movement, a significant drain on energy resources. Additionally, they tackle the traditional yield issues associated with large chip designs by implementing redundancy across thousands of tiles, ensuring higher yields and cost-effectiveness.
Notable Quote:
“We use less power because one of the most power-hungry things on a chip are the I/Os moving data off chip.” — Andrew Feldman [12:43]
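The redundancy argument in this segment can be illustrated with a simple binomial model. This is a sketch under stated assumptions (independent tile defects, a fixed pool of spare tiles, made-up numbers), not Cerebras' actual yield model: a chip counts as usable if the number of defective tiles does not exceed the number of spares.

```python
from math import comb


def chip_yield(total_tiles: int, spare_tiles: int, defect_prob: float) -> float:
    """Probability that at most `spare_tiles` of `total_tiles` are defective,
    assuming each tile fails independently with probability `defect_prob`."""
    if not 0 <= spare_tiles <= total_tiles:
        raise ValueError("spare_tiles must be between 0 and total_tiles")
    if not 0.0 <= defect_prob <= 1.0:
        raise ValueError("defect_prob must be in [0, 1]")
    good_prob = 1.0 - defect_prob
    # Sum the binomial probabilities of 0, 1, ..., spare_tiles defective tiles.
    return sum(
        comb(total_tiles, k) * defect_prob**k * good_prob ** (total_tiles - k)
        for k in range(spare_tiles + 1)
    )


if __name__ == "__main__":
    # Assumptions: hypothetical tile count and defect rate, for illustration only.
    TOTAL_TILES = 1000
    DEFECT_PROB = 0.001
    for spares in (0, 2, 5, 10):
        print(f"spares={spares:2d} yield={chip_yield(TOTAL_TILES, spares, DEFECT_PROB):.4f}")
```

With zero spares, a single defective tile ruins the chip, so yield falls quickly as the tile count grows. Tolerating even a handful of bad tiles recovers most of that loss in this model.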
Timestamp: [23:45]
Addressing the debate around AI scaling laws, Andrew asserts that there's ample room for algorithmic advancements. He challenges the notion that we've hit an asymptote with current scaling laws, emphasizing that many AI algorithms, especially those governing transformers, are far from optimal. Cerebras aims to enhance efficiency by developing more effective algorithms, thereby increasing chip utilization and reducing inference costs.
Notable Quote:
“I don't think there's a lot of debate among senior ML thinkers that we have tremendous room for algorithmic improvement.” — Andrew Feldman [24:06]
Timestamp: [35:24]
The conversation delves into NVIDIA's stronghold in the AI chip market, particularly through its CUDA platform. Andrew contends that NVIDIA's dominance, while formidable, isn't insurmountable. He argues that CUDA lock-in isn't a significant barrier for inference applications, as many AI frameworks like PyTorch are hardware-agnostic and can be compiled for different architectures. This perspective suggests that challengers like Cerebras can effectively compete by offering superior architecture tailored for AI inference.
Notable Quote:
“There's no CUDA locking in inference. None.” — Andrew Feldman [35:24]
Timestamp: [46:20]
Andrew discusses the complexities of export controls, especially concerning sales to China. He underscores the challenges in preventing the progression of another nation's technological capabilities despite stringent policies. Cerebras' decision to refrain from selling to China stems from ethical considerations, particularly the potential misuse of AI technology in areas like facial recognition and military applications.
Notable Quote:
“The deal on the table wouldn't be used for good. And I wasn't comfortable with that.” — Andrew Feldman [48:54]
Timestamp: [53:08]
Looking ahead, Andrew predicts that AI's integration into daily life will mirror the ubiquity of telephones within a couple of years. He anticipates that AI will become seamlessly embedded in various applications, enhancing user experiences without overt acknowledgment. Furthermore, he foresees significant growth in the AI chip market, with Cerebras aiming to drive transformative societal advancements through their technology.
Notable Quote:
“Within a year or two, AI's penetration will be approximately the same as telephones.” — Andrew Feldman [53:27]
Timestamp: [55:10]
Andrew reflects on the lessons learned from past mistakes, emphasizing the importance of adaptability and humility in leadership. He shares his experience of initially resisting water cooling for Cerebras' systems, only to later adopt the approach successfully as industry standards shifted. This narrative highlights the necessity of being open to change and continuously iterating on strategies to align with evolving technological landscapes.
Notable Quote:
“If you're not prepared to be wrong a fair bit, you ought not to be making a lot of decisions because it comes with the territory.” — Andrew Feldman [55:07]
In the concluding segments, Andrew outlines Cerebras' vision for the next decade, aiming to solve significant societal challenges and integrate their AI technology into everyday applications. He underscores the company's commitment to ethical considerations, technological excellence, and strategic partnerships to navigate the competitive AI landscape.
Notable Quote:
“I would like our inference to be powering a collection of apps that don't exist today. And I would like that a meaningful portion of the population in the US and in Europe inadvertently uses our technology.” — Andrew Feldman [58:28]
For More Information:
To explore more episodes and resources from The Twenty Minute VC, visit www.20vc.com.