
Can we make science as fast as software? In this episode, Erik Torenberg talks with Patrick Hsu (cofounder of Arc Institute) and a16z general partner Jorge Conde about Arc’s “virtual cells” moonshot, which uses foundation models to simulate biology and guide experiments.
A
I want to make science faster. Our moonshot is really to make virtual cells at Arc and simulate human biology with foundation models. Why are we so worried about modeling entire bodies over time when we can't do it for an individual cell?
B
If we can figure out how to model the fundamental unit of biology, the cell, then from that we should be able to build.
A
My goal is to really try to figure out ways that we can improve the human experience in our lifetime. There are a few things that if we get them right in our lifetime, will fundamentally change the world.
C
Today we're talking about making science move faster. My guests are Patrick Hsu, cofounder of the Arc Institute, and a16z general partner Jorge Conde. We get into virtual cells and foundation models for biology, why science gets stuck in incentive knots, what an AlphaFold-level moment for cell biology could look like, and how breakthroughs translate into actual drugs and business outcomes. Let's get into it.
D
Patrick, welcome to the podcast. Thanks for joining.
A
Thanks for having me on.
D
I've been trying to have you on for years, but finally I could get your time here.
A
I am, I'm excited to do it. It's going to be great.
D
For some of the audience who aren't familiar with you and your work at Arc and beyond, how do you describe your moonshot? What is it you're trying to do?
A
I want to make science faster.
E
Right.
A
You know, we can frame this in high level philosophical goals like accelerating scientific progress. Maybe that's not so tangible for people. I think the most important thing is science happens in the real world. If it's not AI research, which moves as quickly as you can iterate on GPUs, you have to actually move things around: atoms, clear liquids from tube to tube, to actually make life changing medicines. And these are things that take place in real time. You have to actually grow cells, tissues and animals. And I think the promise of what we're doing today with machine learning in biology is that we could actually accelerate and massively parallelize this. And so our moonshot is really to make virtual cells at Arc and simulate human biology with foundation models. And you know, we'd like to figure out something that feels useful for experimentalists, people who are skeptical about technology. You know, they just want to see the data and see the results, so that it's actually the default tool that they go to use when they want to do something with cell biology.
B
Okay, well, hold on, let's back up. Why is science so slow in the first place? Like whose fault is that?
A
Whose fault is that? Now that is a, that is a long one. We should get into it. We should get into it. It's really multifactorial.
E
Okay, Right.
A
It's this weird Gordian knot that ultimately comes down to incentives.
E
Right?
A
Comes down to, you know, people talk a lot about science funding and how science funding can be better, but it's also about how the training system works, how we incentivize long term career growth, how we try to separate basic science work from commercially viable work, and generally the space of problems that people are able to work on. Today, I think things are increasingly multidisciplinary. It's very hard for individual research groups or individual companies to be good at more than two things.
E
Right.
A
You might be able to do computational biology and genomics, or chemical biology and molecular glues, but doing five things at once is increasingly hard. And we really built Arc as an organizational experiment to try to see what happens when you bring together neuroscience and immunology and machine learning and chemical biology and genomics all under one physical roof.
E
Right.
A
If you increase the collision frequency across these five distinct domains, there would hopefully be a huge space of problems that you could work on that you wouldn't be able to otherwise. Now, obviously in any university or any kind of geographical region, you have all of these individual fields represented at large, right, across these different campuses, but people are distributed and you want everyone together.
B
Okay, but if I may. So a universe. I would have thought a university was an attempt to bring in multiple disciplines under one roof. You're saying it's not, it's too diffuse.
A
It's across an entire campus.
B
Okay, so the physic, like literally the physical distance creates inefficiency.
A
That's part of it. And I think the other part is folks have their own incentive structures, right. They need to publish their own papers, they need to do their own thing and, you know, make their own discovery, and you're not really incentivized to work together, I think, in many ways in the current academic system. And a lot of what we've done is to try to have people work on bigger flagship projects that require much more than any individual person or group or idea.
B
That's cool. So like sort of the original hypothesis for the Arc Institute is if you can bring multiple disciplines together to increase the collision frequency, as you said, and if one could remove some of the cross incentives that may exist in sort of traditional structures, the combination of those two things will make science faster.
A
Yeah, these are absolutely part of it.
E
Right.
A
We have two flagship projects, one trying to find Alzheimer's disease drug targets, the other to make these virtual cells. And I think it's not just the people and the infrastructure, but also the models that will hopefully literally make science faster: you could do experiments at the speed of forward passes of a neural network if these models could become accurate and useful.
B
Yeah. So that, that will be one thing that solves the length of discovery is you compress the time discovery takes naturally by just throwing technology at the problem. At the risk of oversimplifying.
A
Well, we're techno optimists here.
B
No, we are.
A
Yeah.
D
Why has AI progressed so much faster in sort of image generation and language models than biology? And if we could wave a wand, like, where are we excited to speed certain things up?
A
To be honest, it's a lot easier.
E
Right.
A
Maybe that's a hot take.
E
Right.
A
But technology is easier than biology. Natural language and video modeling is easier than modeling biology.
E
Right.
A
And to some degree, if you understand and learn machine learning and how to train these models, you have already learned how to speak, you already know how to look at pictures, and so your ability to evaluate the generations or the predictions of these models is very native.
E
Right.
A
We don't speak the language of biology.
E
Right.
A
You know, at very best, with an incredibly thick accent.
E
Right.
A
So when you're training these DNA foundation models, I don't speak DNA natively, so I only have a sense of the types of tokens that I'm feeding into the model and what's actually coming out.
E
Right.
A
Similarly, with these virtual cell models, you know, I think a lot of the goal is to figure out ways that you can actually interpret the weird fuzzy outputs that the model is giving you. And I think that's what slows down the iteration cycle: you have to do these lab in the loop things where you have to run actual experiments to test against experimental ground truth. And I think increasing the speed and dimensionality of that is going to be really important.
B
How much of this is the fact that you talk about we speak biology poorly or with a very thick accent? How much of this is if you're training on an image, we can see the image and so we can see how good the output is. What about all the things in biology that we can't see or don't even know exist yet? How can we create a virtual cell? And maybe we should come back to what a virtual cell model is, by the way, for the lay audience. But how can we create a virtual cell model when we're not even sure if we understand all of the components that are in a cell and how they function.
A
People talked a lot about this in NLP as well. There's this long academic tradition in natural language processing. And then it was just weird and non intuitive and intensely controversial that you could just feed all this unstructured data into a transformer and it would just work. Now, we're not saying this will just work in all the other domains, including in biology, but I think there is this controversy around what does it mean to be an accurate biological simulator. What does it mean to be a virtual cell? It's true. We can't measure everything, right. We can't measure, I think, things like metabolites in really high throughput with spatial resolution. And there are going to be different phases of capability where initially they model individual cells, then they model pairs of cells, then they model cells in a tissue and then in a broader physiologically intact animal environment. And those are length scales and kind of layers of complexity that we'll aggregate and improve upon over time. And I think the other kind of non intuitive thing in many ways are the scaling laws that you get in data and in modeling. I'll give you an example. There's a lot of discussion in molecular biology about how RNAs don't reflect protein and protein function. And we don't have proteomic measurement technologies that are nearly as scalable as transcriptomic measurement technologies today, at single cell resolution, certainly, but we're getting there. And you can layer on certain nodes of protein information that you can add on top of the RNA information. But in many ways the RNA representation is a mirror. Right. It might be a lower resolution mirror for what's happening at the protein layer, but eventually what is happening in protein signaling will get reflected in a transcriptional state for an individual cell. This may not be very accurate, but when you imagine the massive data scale that we're generating in genomics and functional genomics.
E
Right.
A
You start to gather tremendous amounts of RNA data that will read out kind of what's happening at the protein level as some sort of mirror echo.
E
Right.
A
And then that can, you know, be the case for metabolic information as well and so on.
E
Yeah.
B
So it's a low pixel image, but if we can get sort of zoomed out far enough, we'll get a sense of what's going on.
A
You have to bet on what you can scale today.
E
Right.
A
We're able to, you know, scale single cell and transcriptional information today. We're able to add on, you know, protein level information over time. We'll need spatial information, spatial tokens, and we'll need temporal dynamics as well. And we'll, you know, I kind of bucket things into three tiers. There's invention, engineering and scaling. And there are certain things today, biotechnologically that are scale ready. And then there are things that we still need to invent.
E
Right.
A
And that's part of why we felt like we needed a research institute to be able to tackle these types of problems. That we weren't just going to be an engineering shop that's just trying to scale single cell perturbation screens.
E
Right.
A
That, you know, would be interesting, but in three years would feel very dated, I think.
E
Right.
A
And so there's a lot of novel technology investment that we're making that we think will bear fruit over time.
E
Yeah.
D
Can we flesh out the virtual cell concept? Why? That's the ambition we've landed on. What it's going to take to get there or what are the bottlenecks?
A
I would say the most famous success of ML in biology is AlphaFold.
E
Right.
A
And this solved the protein folding problem of, you know, when you take any sequence of amino acids, what does the protein look like?
E
Right.
A
And you know, it's pretty good. It's not perfect, it certainly doesn't simulate the biophysics and the molecular dynamics, but it gives you a sense of what the end state is with 90% plus accuracy.
E
Right.
A
And that's the AlphaFold moment that people talk about.
E
Right.
A
Where anytime you want to, you know, work with a protein, if you don't have an experimentally solved structure, you're just going to fold it with this algorithm. And we kind of want to get to that point with virtual cells as well. And the way that at Arc we're operationalizing this is to do perturbation prediction, where the idea is you have some manifold of cell types and cell states that can be a heart cell, a blood cell, a lung cell and so on. And you know that you can kind of move cells across this manifold.
E
Right.
A
Sometimes they become inflamed, sometimes they become apoptotic, sometimes they become cell cycle arrested, they become stressed, they're metabolically starved, they're hungry in some way. And so if you have this sort of, this representation of universal sort of cell space.
E
Right.
A
Can you figure out what are the perturbations that you need to move cells around this manifold? And this is fundamentally what we do in making drugs, right. Whether we have small molecules, which started out as natural products from, you know, boiling leaves, or antibodies, when we injected proteins into cows and rabbits and sheep and took their blood to get those antibodies, we were basically trying to get to more and more specific probes, right. And we had experimental ways to kind of cook these up. Now we have computational ways to zero shot these binders. But ultimately what you're trying to do with these binders is to inhibit something and then by doing so kind of click and drag the cell from a toxic, gain of function, disease causing state to a more quiescent, homeostatic, healthy one.
E
Right.
A
And the thing that is very clear in complex diseases, right, where you don't have a single cause of that disease, is there is some complex set of changes. There's a combination of perturbations, if you will, that you would want to make to be able to move things around. Now you know, people talk about this classically as things like polypharmacology, right. But you know, I think we're moving from, oh, this thing happens to have, you know, a whole bunch of different targets kind of by accident, to we have the ability to manipulate these things combinatorially in a purposeful way.
E
Right.
A
That to go from cell state A to cell state B, there are these three changes I need to make. First this, then these two changes and then these six changes over time. And we kind of want models to be able to suggest this. And the reason why we scoped virtual cells this way is because we felt it was just experimentally very practical. You want something that's going to be a copilot for a wet lab biologist to decide what am I going to do in the lab?
E
Right.
A
We're not trying to do something that's like a theory paper that's really interesting to read, where the numbers go up on a ML benchmark. But you know, you practically can decide what are the 12 things that you're going to do in the lab in 12 different conditions.
E
Right.
A
And actually just test them.
E
Right.
A
And then that's how we kind of enter the lab in the loop aspect, from model predictions to experimental measurements to, you know, kind of improved or RL'd or whatever model predictions again. And the goal is to be able to do in silico target ID where you can basically figure out new drug targets, then figure out the drug compositions you would need to actually make those changes. I think if we could do that, we could make a new, like, vertically integrated AI enabled pharma company.
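As a rough illustration of the "cell state A to cell state B" framing above, here is a minimal toy sketch in Python. Everything in it is hypothetical: the gene vectors, the perturbation names, and the assumption that perturbation effects add linearly are all made up for illustration, and none of it reflects how Arc's models actually work. It simply scores every small combination of perturbations by how close the predicted state lands to the target, which is the kind of ranked shortlist a wet lab biologist could then test.

```python
from itertools import combinations

# Hypothetical toy data: each cell state is a vector of expression levels
# for three made-up genes. These numbers are illustrative only.
STATE_A = (5.0, 1.0, 3.0)  # starting ("diseased") state
STATE_B = (2.0, 4.0, 3.0)  # target ("healthy") state

# Hypothetical perturbations, each modeled as a fixed shift in expression.
PERTURBATION_EFFECTS = {
    "knock_down_gene_x": (-3.0, 0.0, 0.0),
    "activate_gene_y": (0.0, 3.0, 0.0),
    "knock_down_gene_z": (0.0, 0.0, -2.0),
    "inhibit_pathway_w": (-1.0, 1.0, 0.5),
}


def apply_perturbations(state, names):
    """Predict the new state, assuming effects add linearly (a strong simplification)."""
    result = list(state)
    for name in names:
        for i, shift in enumerate(PERTURBATION_EFFECTS[name]):
            result[i] += shift
    return tuple(result)


def distance(a, b):
    """Euclidean distance between two state vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def rank_combinations(start, target, max_size=2):
    """Score every combination of up to max_size perturbations, closest first."""
    scored = []
    for size in range(1, max_size + 1):
        for combo in combinations(sorted(PERTURBATION_EFFECTS), size):
            predicted = apply_perturbations(start, combo)
            scored.append((distance(predicted, target), combo))
    return sorted(scored)


ranked = rank_combinations(STATE_A, STATE_B)
best_distance, best_combo = ranked[0]
print(best_combo, best_distance)
```

In a lab in the loop setting, the top few entries of `ranked` would be run as real experiments and the measured outcomes fed back to improve the predictor; the brute force search here only stands in for whatever a real model would propose.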
E
Right.
A
Which I think is obviously a very exciting idea today. But I think in many ways the pitch and the framing of these companies precedes the fundamental research capability breakthroughs. And that's what we're really invested in at Arc, just making that happen, along with many other amazing colleagues in the field, to make this possible for, you know, the community.
B
So if the goal is, and I'm oversimplifying for you, like if we wanted to get to the AlphaFold moment where, you know, it kind of gives you a useful folded structure 90% of the time, to use your data point, and we wanted to take that comparison to the virtual cell model. And we said, okay, I ask the model, I want to shift the cell from cell state A to cell state B, and it's going to give me a list of perturbations. And let's say that 90% of the time, those perturbations in fact result, experimentally, in the shifting from cell state A to cell state B. How far away are we from that AlphaFold moment for virtual cells?
A
I find it helpful to frame these in terms of like GPT 1, 2, 3, 4, 5 capabilities. Right? And I think most people would agree that with GPT 1 and 2, a lot of the excitement was that we could achieve GPT 1 in the first place, that you could see a path with scaling laws of some kind to make successive generations where capabilities would improve. But with our Evo DNA foundation models, which we developed at Arc with Brian Hie, one of the things that we've seen is that these genome generations are really, quote unquote, blurry pictures of life.
E
Right?
A
We don't think if you synthesize these novel genomes, they would be alive. But you know, we don't think that's actually impossibly far away either. We'll just have to kind of follow these capabilities we're generating. We're taking a very integrated approach to attack this problem, right, where you need to curate public data, generate massive amounts of internal private data, build the benchmarks, train new models and build new architectures, doing these things full stack, and we'll just kind of attack this hill climb over time.
D
What's the GPT, I'll say GPT3, moment going to look like? And by that I mean sort of a public release that alters the public's conception of just what's possible here from a capabilities perspective and also inspires a whole new generation of talent to, like, rush into biology.
A
Well, the good thing with biology is we have a lot of ground truth, right? There are entire textbooks, right, that describe cell signaling and cell biology and how these things work. And so, you know, even without a virtual cell, right, if you went into ChatGPT or Claude and you asked it some question about, you know, like receptor tyrosine kinase signaling, it would have an opinion on how that works, right? And so I think you would want the model to be able to predict perturbations that are kind of famous canonical examples of biological discovery. So I'll give you an example. If you've loaded into the model an iPSC, an induced pluripotent stem cell state or human embryonic stem cell state, and a fibroblast cell state, could it predict that the four Yamanaka factors would reprogram the fibroblast into a stem like state and essentially rediscover from the model something that won the Nobel Prize in 2012? That would be one really classic example. And then you could go do the inverse. If you have a stem cell, can it discover Neurogenin 2, ASCL1, MYOD, can it find differentiation factors that will turn that into a neuron or into a muscle cell and so on? And these are kind of classic examples in developmental biology. But you could also use this to try to discover or recapitulate the mechanism of action of FDA approved drugs, right? And so you could say, for example, if you inhibit HER2 in breast cancer cell states, you would get this type of response. Or it could predict the certain clones that will be more metastatic or more resistant and lead to minimal residual disease. I think there are lots of biological evals that you can add onto these models over time that are really tangible textbook examples, as opposed to, I think, what the early generation of models do today, which is, you know, very quantitative things like mean absolute error over, you know, the differentially expressed genes and stuff like that. You know, those are ML benchmarks. And we want to increase the sophistication into something that you could explain to an old professor who has, you know, never touched a terminal in their life.
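For readers unfamiliar with the ML benchmark mentioned above, here is a minimal Python sketch of mean absolute error restricted to differentially expressed genes. The gene names and numbers are hypothetical, and "differentially expressed" is approximated here as the genes with the largest observed change from control, whereas real pipelines typically use a statistical test. It is an illustration of the metric, not any specific benchmark's implementation.

```python
def mean_absolute_error_on_de_genes(control, observed, predicted, top_k=2):
    """MAE between predicted and observed expression, restricted to the top_k
    genes whose observed expression changed most relative to control.

    Note: picking genes by largest absolute change is a simplification;
    real analyses usually select them with a statistical test.
    """
    observed_change = {gene: abs(observed[gene] - control[gene]) for gene in control}
    de_genes = sorted(observed_change, key=observed_change.get, reverse=True)[:top_k]
    errors = [abs(predicted[gene] - observed[gene]) for gene in de_genes]
    return sum(errors) / len(errors), de_genes


# Hypothetical expression values for four made-up genes.
control = {"GENE_A": 10.0, "GENE_B": 5.0, "GENE_C": 8.0, "GENE_D": 2.0}
observed = {"GENE_A": 4.0, "GENE_B": 5.5, "GENE_C": 12.0, "GENE_D": 2.0}   # measured after perturbation
predicted = {"GENE_A": 5.0, "GENE_B": 5.0, "GENE_C": 11.0, "GENE_D": 3.0}  # model output

mae, de_genes = mean_absolute_error_on_de_genes(control, observed, predicted)
print(de_genes, mae)
```

A low score on a metric like this says the numbers are close, but not whether the model recovered a recognizable piece of biology, which is the gap the textbook style evals described above are meant to fill.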
B
By the way you talk about textbooks as ground truth, do you think we're going to find that a lot of the textbooks are wrong?
A
I would say textbooks are compressed, right? So for example, when you look at these kind of classic cell signaling Diagrams of A signals to B which inhibits C. Right. That's a very kind of two dimensional.
B
Representation of our understanding of a complex.
E
Right, right, right.
A
I mean, yes, textbooks are what they are. They represent the corpus of reliable knowledge. But everyone knows that there are incredible number of exceptions. And part of what discovery is is to find new exceptions.
E
Right.
D
Why don't you talk about the difference between simulation of biology and the actual understanding? And what would it take to actually be able to model the extremely complex human body?
A
You know, some people don't like the phrase virtual cells because it sounds too media friendly. It's not rigorous enough.
E
Right.
A
But I've always found it funny that, you know, many people are okay with like digital twins and digital avatars, which talk about modeling biology at a way higher level of abstraction. I think virtual cells, if anything, is actually way more scoped and rigorous than modeling a digital twin or avatar. But I think these are useful words because they describe the goal and the ambition. No, in the long run, we don't care about predicting the, you know, perturbation responses of an individual cell at all, actually.
E
Right.
A
Obviously we want to be able to predict drug toxicity, we want to be able to predict aging, we want to be able to predict why a liver cell becomes cirrhotic when you repeatedly challenge it with ethanol molecules or whatever.
E
Right.
A
And you know, these sort of chemical or environmental perturbations should be predictable. I think you just kind of have to layer on the complexity.
E
Right.
A
Like why are we so worried about modeling entire bodies over time when we can't do it for an individual cell?
E
Right.
A
Where we sort of accept or broadly believe that this is a kind of fundamental unit of biological computation, if you will. And let's just kind of start there. Just like you kind of have to start with things like math and code and language modeling.
E
Right.
A
And things that are just sort of easier to check. You can build a super intelligence over time.
B
Yeah, I think that makes sense. Right. That's a very sort of laudable, ambitious goal. If we can figure out how to model the fundamental unit of biology, the cell, then from that we should be able to build.
A
Like in early AI, we just started with language translation, just basic NLP tasks.
E
Right.
A
This is long before the tremendous ambitious scope that we have today. And I think we hopefully can mirror that type of trajectory if we're lucky.
D
It seems that biotech and pharma has been a shrinking industry in terms of the rate of growth. What's it going to take for these innovations in the science to reflect themselves in business models and growth for the industry?
A
A lot of these biotech startups would try to initially sell software to pharma companies and then they would kind of realize, oh wow, we're like competing for SaaS budgets, which aren't very large. And now they're realizing, oh, we have to compete for R and D budgets. And I think there is this narrative from the current generation of these companies that biological agents will compete for R and D budgets and replace headcount or something like that, just like we're seeing with agents across different verticals. Whether or not that will, I think, pan out depends on just whether or not these things meaningfully allow us to, you know, build drugs more effectively in the pharma context.
E
Right.
A
And I think that's just sort of the most important thing in this industry. And so I think we believe in virtual cells not just because we think it will be a fountain of fundamental mechanistic insights for discovery, but also because, in the case of success, it could be industrially really useful. But we'll have to see over time, right. If we have 90% of drugs failing in clinical trials, that kind of means two things, and you're not sure what percent of which. One is we're targeting the wrong target in the first place. The second is the composition, the drug matter that we're using, doesn't do the job.
E
Right.
A
It's not clear for each individual failure which one it is, or if it's both, or what proportion of each. And we'll have to kind of sort that out over time. Like you can imagine, even in the case of success, when we have 90% accurate virtual cells, you'll probably end up with suggestions like, okay, now you need to target this GPCR only in heart, but not in literally any other tissue, right? We don't have the drug matter that can do that today. And so that's also why, again, you probably need research to figure out novel chemical biology matter that allows you to drug pleiotropic targets in a tissue or cell type specific way. And so I think part of why biology is slow is because there's just this Russian nesting doll of complexity in terms of understanding, in terms of perturbation, in terms of safety. And the crazy thing is the progress in just the short time that I've been doing this is insane. I did my PhD at the Broad Institute in the heyday of developing single cell genomics, human genetics, CRISPR gene editing and so many other things. And I think the early 2010s papers on single cell sequencing would have, like, 20 cells or 40 cells.
E
Right.
A
And at Arc in the next, you know, like, I don't know, relatively short amount of time, we're going to generate a billion perturbed single cells.
E
Right.
A
That's, I mean, how's that for Moore's Law?
B
Yeah, that's remarkable.
E
Yeah. Yeah.
D
Jorge, I want to hear your answers to a couple of these questions too, as the lead of our bio practice, both on the GPT3 moment, what that could look like, and also, like, I'm curious if you think it's GLP-1s or sort of building off that, or if it's going to be something different. And also, what's it going to take for the science to kind of reflect itself in the business, for the industry to grow?
B
Yeah. So I'll take the second one first if I could. So I think, you know, in terms of where the industry is right now, I think one of the big challenges we have is, as Patrick describes very nicely, like, you know, discovery's hard and it takes time. And, you know, the fail modes are exactly as you described. Oftentimes when drugs fail, which they do 90% of the time in clinical trials, it's because we're going after the wrong thing or we made the wrong thing to go after the right thing.
E
Right.
B
Like, those are the two fail modes. And that happens all too often. And so I think a lot of the stuff that Patrick is describing is going to basically improve our hit rate or our batting average on figuring out what to go after and then making the right thing to go after said thing. The challenge we have, I think, in the industry is that the bottlenecks still are the bottlenecks. And the biggest bottleneck we have, which is a necessary one, is we have to prove that whatever we make, that we have the right thing to go after the right thing, so to speak. And that when we have it, that it's going to be as, you know, de risked as possible before you put it into humans.
A
And we have to be good at making them in the first place.
B
And we got to make it, too. Yeah, exactly. And so that bottleneck is a necessarily important one. That bottleneck should exist. I'm not suggesting we've got to remove it. But are there ways to reduce the cost and time associated with getting through the bottleneck of human clinical trials? And, you know, it's interesting because, you know, we talk about, you know, all of the various stakeholders when you're making a drug. There are the companies. There's, of course, the science that supported the company that's trying to commercialize a product, and there are the regulatory agencies, you know, and everyone is trying to ensure, again, that what's, you know, first and foremost is the ability to discover and commercialize drugs that are safe and effective for humans. That middle part of actually getting through that bottleneck is hard to speed up in a very obvious way. Like, you can increase the rate at which you enroll clinical trials. You can use better technology to change the way we design these clinical trials. So maybe they can be faster or shorter, et cetera, but some of them just have a natural timeline you have to go through. Like, if you want to demonstrate that a cancer drug promotes survival, guess what, it's going to take some time to demonstrate a survival benefit. Or if, you know, you want to do a longevity drug, that by definition is a lifetime of a trial in terms of length. So there's a lot of these bottlenecks that are really hard to get through. So what helps the industry? I think there are a couple of things that help the industry. One is capital intensity will hopefully at some point go down over time as technology gets better. Capital intensity is something that our industry faces. In some ways, it looks a little bit like AI now, right, in terms of the cost of training these models. But the capital intensity is very, very high. That has not come down. 
So we got to get the success rates up to impact capital intensity, to get it down. The second thing is, where can we compress time? Good models can help us compress early discovery time. We still haven't seen, and I think it's coming, but it hasn't happened yet, artificial intelligence or other technologies massively compress the amount of time it takes us to do the clinical development, the clinical trials, the enrollment of patients, all those things. We're seeing some interesting things coming. We haven't seen sort of the payoff there yet. And the third thing is, if we can make better drugs going after better things, the effect size should be higher. So therefore the answer should be obvious sooner. If we can get those three things right, reduce capital intensity, compress timelines, and effectively increase effect size in some very tough, sort of intractable diseases, that is what I think fixes the industry. And from where we sit as early stage investors, the reason why that helps us is if the capital intensity goes down and the value creation goes up, it becomes easier to invest in these companies in the early days because you get rewarded for coming in early. The problem we have right now is that, for most companies, you're not seeing rewards happening when there's value inflection. So you come in early, you bear the brunt of the capital intensity. And even if a company's successful, that success isn't reflected in the valuation. So we're not seeing the step ups that you see in other parts of the industry. And that's just really, really hard from an investment standpoint. So I think we need to see those various factors addressed for this space to really get fixed, to use your word.
A
Yeah, that was great. I have a lot to add on to this.
B
Please, add away.
A
Just a few simple observations. The first is the amount of market cap added to Lilly and Novo based on the development of GLP-1s, like over a trillion dollars, or, you know, I mean, Novo stock has decreased a lot, so, you know, a trillion dollars, let's say, is more than the market cap of all biotech companies started over the last 40 years combined.
E
Right.
A
And I think one of the interesting corollaries of this is that when we have a 10% clinical trial success rate for preclinical drug matter, you tend to circle the wagons a bit and try to manage your risk.
E
Right.
A
And so the way that you do this is you try to go after really well established disease mechanisms, where if I develop new drugs that go after well understood biology, it should work the way that I hope it will in the trial, which is really, really expensive and costs a lot more in many ways than the preclinical research. The problem with this is that if you go after very well validated disease mechanisms, but with really small patient populations, then the expected value is actually relatively low. One of the things that we've seen with GLP-1s is just the value that you can create when you go after really large patient populations. And I think that has culturally really increased the ambition of the industry, both from the investor and from the drug developer side. And I think that's something that we should keep our foot on the gas for.
E
Yeah.
B
And look, I would argue the trend on that is positive. You're absolutely right. The demonstration of the value that has been created with the increase in use of GLP-1s, and the value transfer that's gone to companies like Lilly and Novo, I would argue is very merited, right? Because they've cracked an endemic social problem in terms of managing diabetes and eventually helping manage obesity. And so I think that's remarkable, and there's a lot of value that goes to that because they cracked a very, very challenging problem for society beyond just science. So that's great. And I agree with you, the juice needs to be worth the squeeze. Right? You're right. A lot of biotech has been around, like, go after the low-hanging fruit because it's low risk and we've got to eat today.
E
Right.
B
So you go get it, and you sort of push off the big ambitious indication, the large population or the really tough to crack disease. But I do think we're seeing more and more of that. And by the way, we can get into some of these genetic medicines, but some of them are going after some of the hardest problems, the things that you quite literally couldn't address but for editing DNA. And I think that's incredibly remarkable and laudable and frankly inspiring. But the fundamental elements of the industry have to work so the capital formation is there to support those kinds of things. And right now it's hard because of the issues we talked about before.
D
Fifteen years from now we're back in this room, we've barely escaped being part of the permanent underclass, and we're reflecting on the GPT-3 moment, or maybe the legacy of GLP-1s beyond where they are now. What do you think it could be? I'm curious to get your take: what do you think is going to be the technological breakthrough that we're going to point back to and say, oh, this is really what set it all off? Or do you think it's going to be a multifactor combination?
B
Yeah, look, I think it's going to go back to where we started this conversation. GLP-1s as a drug are, what, four decades in the making or something like that. These are not overnight successes. But I do think what we are going to see more of, and our hope, is that we're getting better at understanding what to target and getting better at designing medicines to hit those targets, by the way, in a whole array of new creative ways. So we have small molecules, the natural products that we got from boiling leaves, as you said earlier. We're getting really good at designing smarter and better small molecules that do new things, that function in ways that they didn't before. We've gotten quite good at designing biologics or proteins with a lot of help from things like AlphaFold that help us understand how proteins fold. We're going to get a lot better at designing some of the more complex modalities, like the gene therapies of the world or the gene editors of the world. And when you can do that and combine that with our ability to hopefully use things like virtual cell models to really understand what to go after, I would hope, and I would expect, that the industry will continue to bring forward drugs that have very large effect size for very difficult diseases that hopefully affect a lot of patients. If that's true, then we'll start to see some of these really, really difficult diseases that affect all of society get tackled, hopefully, one by one by one. And so we have obesity, we have metabolic disorder, we're dealing with cardiometabolic disease. We're starting to see interesting, promising things happening in neurodegenerative diseases.
If we can tackle cancer, or at least several cancers that have now begun to be treated more like a chronic condition than the death sentence they were in the past, the more we see of that, I think that value to society will accrete over time. And I think this should be an industry that is extraordinarily valued by society and, candidly, by the markets. We have to deliver.
A
If we play this out, and let's say these AI models work and you can make a trillion binders in silico, that will be exquisite drug matter. We still need to make these things physically and test them in animals and hopefully predictive models and then actually in people.
E
Right.
A
And I think that will increasingly be the bottleneck in many ways. Right. And my friend Dan Wang recently released a book called Breakneck, which talks about the US and China and the difference between the two countries and their philosophy, the way they approach markets.
D
We're a country of lawyers or a country of engineers.
A
Exactly. That's right.
E
Right.
A
China is an engineering state.
E
Right.
A
Its Politburo is, you know, folks who have engineering degrees. You need to build bridges and roads and buildings, and these are the ways that we solve our problems. Whereas, I think, of the first 13 American presidents, 10 of them practiced law. From 1980 to 2020, all Democratic presidential candidates, both VP and president, went to law school.
E
Right.
A
And so you kind of see the echoes of that in the FDA and the regulatory regime and all the bottlenecks that people talk about in developing drugs stateside. And increasingly you see folks thinking about how we can run phase ones overseas.
E
Right.
A
Build data packages that we can bring back domestically for phase 2 efficacy trials. I think that's interesting directionally, but it's not enough.
E
Right.
A
And, you know, I think we need to figure out these two bottlenecks, the making and the testing, even if we can solve the designing part.
B
Oh, I agree. Yeah. Yeah, that, that's the bottleneck.
E
Yeah.
B
You know, we joke about it. What you have to do is get a molecule that can go first in mice and then in mutts and then in monkeys and then in man. It takes a long time and it's just so hard to compress that. And so when you do, you should make the journey worth it.
E
Right, yeah.
B
So when you fail on the other end of that, that's obviously horrible. And so finding ways to make sure that when you walk that path, it'll be a successful journey as often as possible is what this industry desperately needs.
D
AlphaFold solved the protein folding problem. But why didn't it solve drug discovery? Or more broadly, what would it take to get there? What is the bottleneck, on the tech side at least?
A
On the tech side.
E
Yeah.
B
Maybe another way to ask the question is that because I always ask the founders a version of this question, like the AI ones.
A
Sure.
B
That are like, oh, we're going to do AI for drug discovery. So the question that I always like to ask founders is: give me examples where you think AI is hyped, potentially overly hyped; where there's real hope, like what do we expect, what's next; and where we already see real heft.
E
Yeah.
B
So, like, if I asked you, like, in AI, we know, where is there hype, where is there hope and where are we seeing heft today?
A
I would say there's hype in toxicity prediction models.
E
Okay.
B
So that's the idea that I'm going to show you a molecule and the model is going to tell me if it's going to be toxic or not.
A
That's right.
E
Right.
A
There's heft in anything to do with proteins.
E
Right.
A
Obviously protein binding, but increasingly in protein design, I think there is real heft there. And then where there's hype is in multimodal biological models, whatever that means. And I think pick your favorite layers. It could be molecular layers, it could be spatial layers. That could be. Actually, I would say there's also heft in the pathology AI prediction models, you know, like, you know, automating the work of pathologists and radiologists. That's, that's, yeah, that's a powerful use case for sure.
E
Yeah.
A
And there's a lot of stuff where you don't have to train, you know, weird biology foundation models and you can write, you know, regulatory filings and reports and things like that. That's impactful and important.
E
Yeah.
B
So now, going back to Erik's question: why hasn't AI turned out drugs yet? I think that was your question.
E
Right.
A
You know, AI for drugs is one of these weird things where everyone who works in the industry is trying to claim that their drug is the first AI-designed molecule.
E
Right.
A
I feel like, I mean, increasingly in just a few years, this will just be a native part of the stack.
E
Right.
A
Just like we use the Internet and we use phones, we're going to have AI in all parts of the stack.
E
Right.
A
And so it's just going to become a native part of everything that we do. And so, why hasn't it worked yet? It's this long multifactorial process that we've been talking about today: there's the designing, there's the making, there's the testing, there's the approvals side of it. And I do think safety and efficacy, as the two pillars in the industry, are the two things that we need to get right. We need to be able to figure out faster ways to predict whether or not a molecule will work and if it's going to be safe or not. And there are ways that AI can operationalize this. If you designed a small molecule, you can now computationally dock it to every protein in the proteome and see if it's likely to bind to off-target molecules. You can use this to tune binding selectivity and affinity. Those might be ways to predict safety and efficacy. And how well will that work? Well, that's a feedback loop that we'll have to actually test in the lab. And that's part of what's slow: the testing takes real hours, days, months, years. And that's really why we've picked, at Arc, the virtual cell models as our initial wedge, because we think it can integrate a lot of these different pieces.
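The off-target check described in this answer can be sketched as a toy calculation. This is only a minimal illustration, assuming some docking tool has already produced one predicted binding score per protein. The protein names, the scores, the `selectivity_report` helper, and the 2.0 margin are all hypothetical and are not output from any real docking program.

```python
from typing import Dict, List, Tuple


def selectivity_report(
    scores: Dict[str, float],  # hypothetical predicted binding energies; lower = tighter binding
    target: str,
    margin: float = 2.0,       # assumed cutoff: how close to the target score counts as a risk
) -> Tuple[float, List[str]]:
    """Return the selectivity gap and the off-target proteins within `margin` of the target."""
    target_score = scores[target]
    off_targets = {name: s for name, s in scores.items() if name != target}
    # Gap between the intended target and the strongest predicted off-target binder.
    # A positive gap means the target is predicted to bind tighter than any off-target.
    gap = min(off_targets.values()) - target_score
    # Flag every off-target whose predicted score is within `margin` of the target's.
    flagged = sorted(n for n, s in off_targets.items() if s - target_score < margin)
    return gap, flagged


# Made-up scores, for illustration only.
toy_scores = {"TARGET_A": -10.0, "PROT_B": -9.0, "PROT_C": -6.5, "PROT_D": -4.0}
gap, flagged = selectivity_report(toy_scores, "TARGET_A")
print(gap, flagged)
```

Whether scores like these actually predict safety is exactly the feedback loop the speaker says still has to be tested in the lab.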
D
In Dario Amodei's essay, Machines of Loving Grace, he predicts, among other things, the prevention of many infectious diseases and the doubling of lifespans, perhaps as soon as the next decade. What's your reaction to his essay's bullishness and some of his predictions?
A
I think the core intuition that Dario had was the idea that important scientific discoveries are independent, or largely independent. And if they are statistically independent, then it would stand to reason that we could massively parallelize. And so if we had models that were sufficiently predictive and useful, you could have not just 100 of them, but millions, billions of these discovery agents or processes running at a time, which should compress the timeline to new discoveries and turn it into a computation problem.
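The compression argument can be made concrete with some idealized arithmetic. This is a minimal sketch, assuming every discovery task is fully independent and takes the same amount of time, which is the very assumption the argument rests on. The function name and all numbers are hypothetical.

```python
import math


def wall_clock_years(n_tasks: int, years_each: float, n_parallel: int) -> float:
    """Idealized time to finish independent, equal-length tasks run in parallel batches."""
    batches = math.ceil(n_tasks / n_parallel)  # how many rounds are needed
    return batches * years_each


# Hypothetical: 1,000 independent discovery tasks, half a year each.
serial = wall_clock_years(1000, 0.5, n_parallel=1)      # one at a time
parallel = wall_clock_years(1000, 0.5, n_parallel=100)  # 100 running at once
print(serial, parallel)
```

Any dependency between tasks, or any step that has to wait on a physical experiment, breaks this simple division, which is why the rest of the conversation keeps returning to the making and testing bottlenecks.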
E
Right.
A
I think that is a very futuristic framing for something that is actually very tangible today. And if we can have virtual cell models at work, for example, that can start to do these kinds of things that we've been talking about help us. We can have molecular design models, we can have docking models. You can then have, you know, when you bind to this thing in this cell versus all the other off target proteins, will a cell kind of be corrected in the right way?
E
Right.
A
These kind of layers of abstraction and complexity start to get to things that feel very tangible to drug discovery. If you could actually traverse these steps reliably and in sequence, you could start to see how you can get the compression.
E
Right.
A
And so I think in the long run of time this should be possible.
B
One of the core suppositions in building a good virtual cell model is that we are feeding it all the relevant data.
A
The right data?
B
Yeah, the right data. And so whether it's gene expression data or it's DNA data, or any number of factors, protein-protein interactions, all the things you described, what if we're missing a core element? What if we just haven't discovered the quark or whatever? We just don't know what we don't know, and therefore what we're feeding the model is fundamentally or importantly incomplete.
A
I think that's almost certainly true.
E
Right.
A
Like, it seems almost obvious that we're not measuring many of the most important things in biology.
E
Right.
A
And you can of course find many important exceptions for any of these measurement technologies in biology. We ultimately have two ways to study it in high throughput: imaging and sequencing. But there are so many other types of things that you would care about that those things aren't necessarily going to do at scale. That's really why I think the stuff that we're talking about, the RNA layer as a mirror for other layers of biology, is one that we spent a lot of time thinking about. And there's a difference between a mechanistic model and a meteorological simulation type of model. So, for example, if you want to predict the weather, you can build AI models that will predict whether or not it will rain next Tuesday. It won't explain physically or geologically or whatever why and how that happens, but as long as it knows if it's going to rain next Tuesday, you're probably happy. And I would say similarly with a virtual cell model, it may not tell me literally why, just like AlphaFold doesn't tell me literally why the protein folded this way and how. But if it just told me the end state and it was reasonably accurate, I think that would already be very important.
D
Shifting gears a little bit, we've been talking about science and biotech, but in addition, you're an elite AI investor more broadly. So I want to talk about where your investment focus is right now, just as it relates to AI more broadly. Where are you excited? Where are you spending time? What are you looking forward to?
A
Oh, yeah. My goal is to really try to figure out ways that we can improve the human experience in our lifetime. If I think about the future that we're going to leave to our children, there are a few things that, if we get them right in our lifetime, will fundamentally change the world and how we live in it. I think synthetic biology is obviously one. Think GLP-1s, things that improve sleep, things that can improve longevity. These are all things that are kind of easy to get excited about. I think brain computer interfaces is another area where we're going to see really important breakthroughs over the decades to come. And then I think the third is in robotics, both industrial and consumer robotics that allow us to basically scale physical labor in interesting ways. And you can kind of see how each of these three things, even in the medium cases of success, really changes the world. And so I'm very interested in helping make these kinds of things possible.
E
Right.
A
And so there is sort of, you know, in the kind of techno optimist sort of vision of the world, right? There's a few different types of scarcity, right? There's, you know, it's very easy when you do research to come up with important ideas. The hard thing is to tackle them in the right time frame.
E
Right.
A
It's like, you know, writing futuristic sci fi things is not that hard. Being able to actually execute on it in the next five years or eight years, much, much harder. And I would say academic discovery is littered with plenty of ideas that are interesting and important, but kind of long before their time. And in many ways the story of technology development is trying to use new technologies to solve old tricks. Most of our tools are for productivity.
E
Right.
A
In many ways, whether that's the industrial revolution or the computing revolution or the current AI revolution, we're trying to kind of do the same stuff. And so I think there's a relatively small set of very powerful ideas. New technologies give us new opportunities to attack them. And there's a set of people and teams that are going to be positioned to be able to do that. They need to have technical innovation and then an intuition about product and business. It's kind of like the RPG dice roll of the skills that you get: in these three domains, people start at different base levels. You might have an incredibly technical founder who doesn't know how to think commercially, or someone who's just natively a very commercial thinker who doesn't have very strong product sense even though they could sell the crap out of it. I think these are three broad categories of capabilities you need to bring together, in a way that you can allocate capital to at the right times, in order to make these ideas possible in a really differentiated way. Like, this thing literally wouldn't happen if we didn't get these people together and fund it at the right time in the right way.
E
Right.
A
And that's really what motivates me. And these are the kinds of things that I've been excited about backing. Longevity companies like NewLimit.
E
Right.
A
BCI companies like Nudge.
E
Right.
A
Robotics companies like The Bot Company.
E
Right.
A
You know, these are some examples of things that I think must happen in the world and therefore should happen. And how do we actually find the right people at the right time to go on the Fellowship of the Ring hunt?
E
Yeah.
D
If it's not too difficult, I want to ask Jorge's question adapted to these additional spaces: robotics, BCIs, and longevity, if appropriate. The three questions, I believe, were: what's overhyped, where do you see opportunity or a path, and what's got heft already?
A
I think the cool thing about agents generally is that they do real work. Right. Compared to the SaaS companies that came before, agents replace real productivity.
E
Right.
A
And I think, you know, they have a lot of errors today. And I would say the computer use agents will probably trail the coding agents by maybe a year.
E
Right.
A
But, but it's coming and we'll follow the trajectory as these go from doing, you know, minutes of work without error to hours to days.
E
Right.
A
And I think you're going to get a completely different product shape as we march through that across legal, BPO, medicine, healthcare, whatever.
E
Right.
A
And we'll kind of follow that as an industry and that's going to be really exciting. And I think that's where we're going to see real heft, because most of the economy is services spend, it's not software spend. And the reason why we're all excited about this stuff is that it can attack the services economy. And I would say, where is there hype? There's a tremendous amount, no doubt. The hype is in the model capabilities. We're working with an architecture that dates back to 2017. And if you look at the history of deep learning, it's like every eight years there's something really different. And it feels like in 2025 we're really overdue for some net new architecture. And I think there are lots of really interesting research ideas that are bubbling up that could do that thing. And in many ways there's a set of really interesting academic ideas, especially from the golden age of machine learning research from, I don't know, like 2009 to 2015.
E
Right.
A
There are so many interesting ideas, little arXiv papers that have like 30 citations or less. And as the marginal cost of compute goes down year on year, I think you're going to be able to take all of these ideas and actually scale them up.
E
Right.
A
Where you don't see the scaling laws when you're training them at 100 million or 650 million parameters like back then. But if you can scale them up to 1B, 7B, 35B, 70B, you start to see whether or not these ideas will pop.
E
Right.
A
And I think that's very exciting because there's just going to be a lot of opportunity for new superintelligence labs to do things beyond what the kind of, you know, established foundation model companies are doing today.
E
Right.
A
As they kind of, you know, in addition to these research teams.
E
Right.
A
You know, these are in many ways becoming applied AI companies.
E
Right.
A
They need to build product shape in all kinds of different enterprises and do RL for businesses and make money, right? Or build coding agents and make API revenue. And that's important, and I think a timely race to survive today. But I'm just very bullish on the research of, say, a Sakana AI, which was founded by one of the authors of Attention Is All You Need, Llion Jones. And they're doing incredibly interesting stuff on model merging and how you can have evolutionary selection of different models in MoE. And I think there are opportunities here in the long run to move beyond just RL gyms, for example. Figuring out new ways to learn and find reward signal is going to be really exciting.
D
I think it's a great place to wrap. Gearing towards closing, is there anything upcoming for Arc that you'd like us to know about? Anything you want to tease? For people who want to learn more, what should they know about?
A
So AlphaFold in many ways came out of a protein folding competition called CASP, right? Critical assessment of the structure of proteins. And we created our own Virtual Cell Challenge at virtualcellchallenge.org, where we have $100,000 prizes sponsored by Nvidia and 10x Genomics and Ultima and others. And it's an open competition that anyone can enter, where you can train perturbation prediction models and we can openly and transparently assess these model capabilities, both today and in subsequent years, and follow them to get to that ChatGPT moment.
E
Right.
A
And so I'm extremely excited about this. We'd like more people to train models and apply, both bio ML experts and engineers in any other domain. And, you know, I just want this thing to exist in the world. Hopefully we're important parts of making that happen, but I'd just be happy that someone does it.
D
Yeah, that's an inspiring note to wrap on. Patrick, Jorge, thanks so much for the conversation.
A
Thanks so much, guys. Appreciate it.
B
Thanks for having me.
C
Thanks for listening to the A16Z podcast. If you enjoyed the episode, let us know by leaving a review at ratethispodcast.com. We've got more great conversations coming your way. See you next time. As a reminder, the content here is for informational purposes only, should not be taken as legal, business, tax or investment advice, or be used to evaluate any investment or security, and is not directed at any investors or potential investors in any A16Z fund. Please note that A16Z and its affiliates may also maintain investments in the companies discussed in this podcast. For more details, including a link to our investments, please see a16z.com/disclosures.
a16z Podcast – “Faster Science, Better Drugs”
September 15, 2025
Host: Andreessen Horowitz | Guests: Patrick Hsu (Co-founder, ARC Institute), Jorge Conde (a16z General Partner)
This episode dives deep into the current frontiers and bottlenecks in biological science and drug discovery, focusing on how advanced AI and interdisciplinary collaboration could fundamentally accelerate progress. The conversation explores the moonshot goal of simulating “virtual cells,” the translation of breakthroughs into real medical and business results, and the incentive structures slowing science. The guests also reflect on what a transformative “AlphaFold moment” would look like for cell biology, and what’s truly required to compress the painfully slow timelines of drug development.
“Our moonshot is really to make virtual cells at ARC and simulate human biology with foundation models.” (A, 00:00)
“It’s this weird Gordian knot that ultimately comes down to incentives.” (A, 02:30)
“Technology is easier than biology. Natural language and video modeling is easier than modeling biology.” (A, 05:49)
“We kind of want to get to that point with virtual cells as well... to do perturbation prediction.” (A, 11:16)
“You have to bet on what you can scale today.” (A, 09:51)
“There’s designing, there’s the making, there’s the testing, there’s the approvals... testing takes real hours, days, months, years.” (A, 41:04)
“That bottleneck should exist. I’m not suggesting we’ve got to remove it. But are there ways to reduce the cost and time associated with getting through the bottleneck of human clinical trials?” (B, 27:06)
“We believe in virtual cells not just because we think it will be a fountain of fundamental mechanistic insights... it could be industrially really useful.” (A, 23:32)
“The market cap added to Lilly and Novo... more than the market cap of all biotech companies combined over the last 40 years.” (A, 31:01)
“If the capital intensity goes down and the value creation goes up, it becomes easier to invest in these companies in the early days because you get rewarded for coming in early.” (B, 29:03)
“Could it predict that the four Yamanaka factors would reprogram the fibroblast into a stem like state... something that won the Nobel Prize in 2009?” (A, 17:33)
“If you want to predict the weather, right, you can build AI models that will predict whether or not it will rain next Tuesday... Similarly with a virtual cell model, it may not tell me literally why...” (A, 45:00)
“We created our own virtual cell challenge... an open competition that anyone can enter where you can train perturbation prediction models and we can openly and transparently assess these model capabilities.” (A, 54:34)
“If we can figure out how to model the fundamental unit of biology, the cell, then from that we should be able to build.” (B, 00:14)
“Science is slow…it's this weird Gordian knot that comes down to incentives.” (A, 02:30)
“Technology is easier than biology…we don’t speak the language of biology. At very best, with an incredibly thick accent.” (A, 05:49–06:18)
“Anytime you want to, you know, work with a protein, if you don’t have an experimentally solved structure, you’re just going to fold it with this algorithm. We kind of want to get to that point with virtual cells as well.” (A, 11:16)
“RNA representation is a mirror…it might be a lower resolution mirror for what’s happening at the protein layer, but eventually…at some sort of mirror echo.” (A, 09:31)
“That bottleneck should exist. I’m not suggesting we’ve got to remove it. But are there ways to reduce the cost and time associated with getting through…human clinical trials?” (B, 27:06)
“In just a few years, [AI for drugs] will just be a native part of the stack. Just like we use the Internet and phones…we’re going to have AI in all parts of the stack.” (A, 40:50)
“If we could do that, we could make a new AI, like vertically integrated AI enabled pharma company.” (A, 14:38)
“I just, I want this thing to exist in the world…I'd just be happy that someone does it.” (A, 55:12)
This episode offers an unvarnished look at both the promise and practical hurdles of integrating AI into the biology and drug discovery pipeline. While massive modeling advances—like AlphaFold for proteins—signal what’s possible, the stubborn pace of progress in clinical development means the next big breakthrough will require both technical and organizational innovation. ARC’s Virtual Cell Challenge and similar efforts aim to catalyze this transition from dream to reality—and listeners are invited to join the mission.
To learn more or enter the Virtual Cell Challenge, visit: virtualcellchallenge.org