Summary

NVIDIA AI Podcast

Episode 275: How CytoReason is Bridging the Data Insight Gap to Accelerate Healthcare Breakthroughs
Host: Noah Kravitz
Guest: Shai Shen-Orr (Co-founder & Chief Scientist, CytoReason; Professor, Technion – Israel Institute of Technology)
Date: October 8, 2025

Episode Overview

This episode explores how CytoReason leverages AI and agentic workflows to bridge the growing gap between biological data and actionable insight in the life sciences, focusing primarily on drug discovery and development. Shai Shen-Orr discusses the unique challenges and profound opportunities in computational biology, the necessity of integrating diverse molecular data and literature, and how CytoReason’s platform empowers pharma and biotech companies to make data-driven decisions more effectively. The episode is forward-looking, addressing the future landscape of biomedical research and the increasing role of AI.

Key Discussion Points

1. Shai Shen-Orr's Background & The Birth of CytoReason

Shai’s Journey into Computational Biology:
- Began in the late 1990s as the Human Genome Project unraveled new data-driven challenges in biology.
- Early fascination with applying AI methods to biological systems and recognizing the switch from "1 tube, 1 result" to "1 tube, a million results."
- Quote:
  
  "I often call it deep more than big. A big experiment is a million measurements on 100 people. So there's way more features... a P is greater than N type problem." (03:10, Shai)
The Data-Insight Gap:
- The rate at which new biological data is generated is exponential, while insight extraction is only linear; the majority of biological data remains underutilized.
- Quote:
  
  "The gap between data and insight... data is exponential, insight is linear. Every day, percent data utilized to give insight is lower." (04:01, Shai)
Founding CytoReason (2016):
- CytoReason was built as an "AI for pharma" company—not to develop drugs, but to develop an advanced analytical platform bridging this insight gap.

2. What CytoReason Offers & Its Core Users

Clients & Use Cases:
- Serves both major pharmaceutical companies (e.g., Pfizer, Sanofi) and biotechs.
- Platform is used across the drug development lifecycle: target prioritization, indication selection, clinical trial design (including subpopulation selection).
User Base:
- Data scientists (whose workloads keep growing as data scales)
- Biologists (increasingly computationally enabled)
- Heads of therapeutic areas, portfolio managers, and strategic decision-makers.
Platform Value:
- Integrates virtually all available human molecular data into a single, unified disease model.
- Provides "a yardstick to all the science, the molecular science that's out there." (09:44, Shai)

3. Architecture, Agentic Workflows, and Automation

Why Agentic Workflows?
- The velocity and volume of data (e.g., "every two minutes a new paper comes out" in immunology) make manual analysis unsustainable.
- Quote (Red Queen effect):
  
  "You have to run just to stay in place." (12:28, Shai)
Implementation:
- Automation is vital; employees are encouraged to spend 20% of their time thinking about how to automate their jobs.
- Agentic AI is used for data ingestion, curation, QC, and increasingly complex decision-support roles.
Unique Challenge in Biomedicine:
- Biological data are "deep, not just big," with more features than samples—a tough challenge for traditional machine learning methods.
- New measurement technologies continually emerge, requiring hybrid modeling approaches (combining deep learning, LLMs, statistics, rules).
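
The "more features than samples" problem can be made concrete with a small simulation. The sketch below is illustrative only and is not CytoReason's method: it draws pure noise for a hypothetical cohort of 20 samples and reports the strongest correlation that any single feature shows with an equally random outcome. The function names, sample sizes, and seed are assumptions chosen for the example (the episode's own example is a million measurements on 100 people).

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def strongest_chance_correlation(n_samples, n_features, seed=0):
    """Largest |correlation| between a random outcome and any random feature.

    Both the outcome and the features are independent Gaussian noise, so any
    correlation found here is spurious by construction.
    """
    rng = random.Random(seed)
    outcome = [rng.gauss(0, 1) for _ in range(n_samples)]
    best = 0.0
    for _ in range(n_features):
        feature = [rng.gauss(0, 1) for _ in range(n_samples)]
        best = max(best, abs(pearson(feature, outcome)))
    return best


if __name__ == "__main__":
    # Hypothetical sizes: same 20 samples, narrow vs. wide feature sets.
    few = strongest_chance_correlation(n_samples=20, n_features=5)
    many = strongest_chance_correlation(n_samples=20, n_features=5000)
    print(f"5 noise features:    strongest |r| = {few:.2f}")
    print(f"5000 noise features: strongest |r| = {many:.2f}")
```

Because nothing in this data carries real signal, a strong correlation in the wide case is a chance artifact. That is the overfitting risk the episode describes, and the reason prior knowledge is used to narrow the search space before fitting.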

4. Integrating Medical Literature with Molecular Data

Why Literature Matters:
- Scientific literature is prior knowledge, not just data. Integrating it narrows the search space and boosts model robustness.
Building Trust and Explainability:
- Biomedicine demands not only predictive accuracy but mechanistic explanations—users need to understand why a model predicts what it does.
- CytoReason uses LLMs with retrieval-augmented generation, confidence scoring, and "biocredibility" validators.
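
One way to picture a literature-derived confidence score is as the share of sampled papers that support a claim, discounted when only a few papers mention it. The sketch below is an assumption for illustration, not CytoReason's scoring method: it uses the Wilson score lower bound, a standard statistical technique swapped in here, and the function name and the counts in the usage lines are hypothetical.

```python
import math


def wilson_lower_bound(supporting, total, z=1.96):
    """Lower bound of the Wilson score interval for a support rate.

    supporting: number of sampled papers that agree with the claim
    total:      number of sampled papers that address the claim at all
    z:          normal quantile (1.96 corresponds to roughly 95% confidence)
    """
    if total < 0 or not 0 <= supporting <= max(total, 0):
        raise ValueError("need 0 <= supporting <= total")
    if total == 0:
        return 0.0  # no evidence sampled, so no confidence
    p = supporting / total
    denominator = 1 + z * z / total
    centre = p + z * z / (2 * total)
    margin = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total))
    return (centre - margin) / denominator


if __name__ == "__main__":
    # Hypothetical counts: both claims have 100% raw support in the sample,
    # but the one backed by more papers earns the higher score.
    print(f"3 of 3 papers:   {wilson_lower_bound(3, 3):.2f}")
    print(f"30 of 30 papers: {wilson_lower_bound(30, 30):.2f}")
```

A bound of this kind reflects the point made in the episode that there are often few instances of any one biological event in the literature, so raw agreement alone would overstate confidence.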
Guardrails & Confidence:
- High trust thresholds are set for models, leveraging confidence scores derived from literature sampling and AI techniques.
- Quote:
  
  "People are seeking ... it to be a mechanistic model. Explain to me why that prediction makes sense. And give me trust in it." (17:08, Shai)

5. Distinction Between CytoReason and Other AI Healthcare Approaches

Focus on the ‘Biology’ Layer:
- While AI has made huge strides in chemistry (e.g., protein structure prediction) and clinical data (EHR, recruitment), biology remains the most "unsolved" and crucial level.
- Major pharma challenges: unknown disease biology (causing Phase II trial failures) and human biological diversity.
Workflow Integration:
- CytoReason helps users identify the best drug targets, mechanism rationales, and new disease indications.
- Platform enables “small tests” (simulated or in-vitro) to validate and build confidence in its AI-generated hypotheses before major investments.

6. User Feedback, Challenges, and Platform Evolution

User Demands:
- Some users want deep granularity on specific drugs; others demand breadth across pipelines—the platform must support both.
- As new molecular measurement modalities emerge, CytoReason rapidly integrates them, despite initially small datasets.
Hybrid Model Approach:
- Combines deep learning, LLMs, rule-based systems, and traditional statistics.
- Emphasizes flexibility to best address each type of data and modeling challenge.

7. The Future of Biomedicine & The Researcher’s Role

Shai’s Outlook:
- Computational biologists, life scientists, and clinicians will increasingly automate current workflows, freeing them to tackle new, even more challenging problems.
- The future is not about fear of automation but excitement about continually advancing the boundaries of knowledge.
- Quote:
  
  "It's a field of unknown unknowns... The necessity for us, the obligation... to bring in AI... to actually bring cures to people. I see this as an obligation and I'm not afraid of... suddenly a machine doing what it is because there's always the next gang." (31:11, Shai)

Notable Quotes & Memorable Moments

On the “Red Queen Effect” in Biomedical Data Work:

"You have to run just to stay in place." (12:28, Shai)
On AI-Driven Automation:

"Eighty percent of your time you spend on whatever your job is. Twenty percent you have to spend on how do I make my job obsolete and automated." (11:32, Shai)
On the Deep Data Challenge:

"There's very few places in biology today you can just stick them into a deep learning model and you'll get good performance... Everywhere else, there's just not enough data, and you need to somehow overcome these things." (28:13, Shai)
On the Value of Meaningful Work:

"Using the tools to get the old ones [problems] done faster so we can get to the new stuff." (32:04, Noah)

Timestamps for Important Segments

[01:31] Shai’s background and the origins of computational biology
[04:01] The exponential growth of biological data vs. linear insight
[05:41] CytoReason’s business model and user base
[09:43–13:25] The Red Queen effect, automation, and agentic workflows
[14:56–18:25] Role of literature, knowledge integration, and confidence
[20:37] How CytoReason fits into pharma’s workflows, compared to other AI approaches
[25:31] User feedback, breadth vs. depth, and hybrid modeling
[28:51] Vision for the future of biomedicine and the evolving role of researchers
[32:38] Shai’s “Tech on Drugs” podcast plug

Additional Resources

CytoReason: cytoreason.com
Tech on Drugs Podcast: Available on Spotify and other platforms
CytoReason LinkedIn: Actively maintained for updates and news

Summary compiled to capture the full depth and insight of the episode while preserving speaker intent and tone. Advertisements, initial greetings, and outros have been omitted.

Transcript

[00:11]
B
Hello and welcome to the NVIDIA AI Podcast. I'm your host, Noah Kravitz. You've heard of language models, video models, reasoning models and foundational models. And here on the podcast we've talked a lot about healthcare-specific AI models for things like protein structure prediction. Well, today we're exploring disease models. The CytoReason disease model is a comprehensive model of human diseases that models and compares treatments and patient groups, helping researchers of all levels make data-driven decisions across the drug development lifecycle. That brief description doesn't really do justice to what disease models and CytoReason as a company are all about, but our guest is here to help. Shai Shen-Orr is co-founder and chief scientist at CytoReason and professor of systems immunology and precision medicine at the Technion, the Israel Institute of Technology. Shai's here to tell us all about CytoReason, how it got started, why the technology is so important and what they're trying to do. And we're grateful to have you here. So, Shai, welcome and thanks for joining the AI podcast.
[01:12]
A
Oh, thanks, Noah. Pleasure to be here and thank you for inviting me to speak about my favorite subject, I guess.
[01:19]
B
Please, and just take it, take it right from there. I don't even need a more pointed question. Tell us about, tell us, tell us about your favorite subject. Maybe start with your background a little bit and then tell us how CytoReason came to be and what it's all about.
[01:31]
A
Sure. So, yeah, I'll go back. I guess at this point I can think of myself a bit of a dinosaur in the face of doing computational biology, data science. I started back in the, I guess late 20th century, as they say, with the idea where basically we're just starting the human genome was getting sequenced and the realization that biology is making a leap from a one tube, one result type field to one tube, a million results type field, and suddenly there's room for what evolved. What I think now we think about is data science and AI in the context of medicine and life sciences and healthcare. And for me, that discovery of falling in love in biology you can do. I was kind of doing a lot of stuff around AI in the late 90s, as they say, very different space eons ago.
[02:29]
B
Yeah, yes.
[02:30]
A
But discovering that you can actually use the same kind of, the same type of thinking but in a space such as life sciences and healthcare was to me profound and kind of changed my life course. Realizing that in this space actually not only, you know, is the data interesting and there's a good, I think humanitarian cause. But also the AI challenges are profound because this data is, is, I often call it deep more than big. Right. I have, you know, a big experiment is a million measurements on 100 people. Right. And so there's way more, you know, way more features, a P is greater than N type problem.
[03:11]
B
Right.
[03:11]
A
And that brings really interesting problems. And how do you build machine learning models that actually overcome this when there's not that much of a repeat kind of information to learn from and that brings in a lot of prior knowledge and we'll get to talking about that I guess later. Yeah, so that's kind of. I came from this systems immunology. I'm a faculty member at the Technion and I realized, I guess in the 20 years I've been doing this type of work, I realized that as biology, we're kind of fascinated with that one tube, a million results. I realized that we're actually in this amazing times where data is exploding but value insight from it does not explode at the same rate. Right. The gap between data and insight actually you can think about like data is exponential, insight is linear. Every day, every day percent data utilized to give insight is lower.
[04:07]
B
Right.
[04:08]
A
And the question is, how do you overcome this and how do you develop these techniques to ultimately bridge what I call the data-insight gap? In other words, biologists, molecular biologists have been investing a huge amount in making these amazing tools that can measure basically now every layer of human biology comprehensively. The analytical side of this and the AI solutions for this have been missing. The field is still largely a manual field where you give people some data, they sit in front of their computer, they try to figure out, they make some value and insight from this. And I figured that's not a sustainable solution. This field needs to move to ultimately build much larger integrative solutions that bring in many different angles of machine learning, AI, statistics and so forth to ultimately bridge this. And it needs to be done in a way that's ultimately reproducible and productized and that's kind of what launched CytoReason. So we founded CytoReason in 2016 with the aim of basically building a pharma AI company that is not a biotech company, that does not develop drugs. It develops an analytical platform, an integrated AI solution to bridge the data-insight gap. So that's I guess our origin story.
[05:26]
B
Got it. And so then if you're not, if CytoReason isn't a biotech company, do you serve biotech companies? Who are some of the customers? Or maybe if that's not the right way to get into it, tell us a little bit about what CytoReason offers.
[05:42]
A
So I think it's actually a great place to get into it because this data-insight gap exists throughout life sciences, everywhere. Now biology is a big or deep data field, as I said. And the question now is, you know, okay, well you know, where is it going to matter the most to bridge the gap between the data and the insight? And nowhere is it more important or cost effective and I think ultimately brings the right utility to humanity than to close it in drug development. I don't know if you're familiar with the numbers. They're horrendous. Right. If I, if you know a drug today costs $2.5 billion to develop, most of that cost is actually it needs to overcome the failures. It's a failure business. Most drugs that you try to develop, even if from the first drug that you put into a human, the likelihood of actually it failing is 90% to ultimately not making it right. And if you're talking about many sub drug, if you're a pharma company and you need to be developing and you have many different assets that's being developed, well you need some kind of a scalable solution to ensure that your success rate over time grows. And it's, you know, it's a no brainer that it needs to be a data-driven solution. So CytoReason customers are some of the world's largest pharma companies. Pfizer, Sanofi are, you know, examples of companies that CytoReason has long standing relationships with. But the same problems that happen with Pfizer, you know, who's developing hundreds of molecules of laser happens in a biotech. So anybody who's developing a drug, and I would even argue diagnostics and so forth has this problem of how do I make decisions that are data driven at scale.
[07:22]
B
Right. And so do the CytoReason models allow researchers, pharmacists to predict the effects of a drug they're working on? How does it, how does that work? Just kind of in lay terms.
[07:36]
A
Sure. So, you know, in terms of the user base, it's really, the CytoReason platform is an enterprise solution. You know, we're trying to address the needs of data scientists whose work, because of the data-insight gap, keeps growing like crazy.
[07:52]
B
Right. And as you said, you're working with some of the biggest pharma companies and institutions in the world. So this is a lot of data, big gap getting bigger.
[08:00]
A
Correct, correct. And so what a data scientist needed to do, you know, two, three years ago in a pharma company, it's like, keeps growing, it's like 10x because the data keeps exploding and nobody wants to make a decision.
[08:13]
B
Yeah.
[08:14]
A
Having just suffered from the problem that they didn't have time to take the most appropriate data set and analyze it and figure out what it is that they need to be doing. Right. So it's data scientists, it's biologists who are not necessarily programming, though this is becoming less of an issue, I think now you can touch on that, but who are ultimately driving their particular drug programs and need to make decisions around those in the context of the competition, the standard of care, what other pharma are doing. And it then goes up to, you know, heads of therapeutic areas who need to choose what is the, you know, not only do I want to develop this drug, what's the right disease to go after? Well, you know, there's many diseases. They're only going to give me so many shots on goal to, to fail or succeed, I need to make those choices. Right. Some of those considerations are commercial, but many of them are scientific and that's what CytoReason brings to the table and it goes on and on in that space. You can think about portfolio management, people who make strategic decisions. Those are the user base. And CytoReason basically brings in all the world's molecular data in humans right now, integrates it into a single model that allows us to learn from this and ultimately support decisions, use cases such as how do I prioritize a target, which target or combination of targets do I prioritize? Which diseases do I prioritize for my next trial or what subpopulation should we be excluding or including from the trial? Because they'll succeed.
[09:44]
B
Right.
[09:45]
A
So those are really expensive decisions, complicated ones. And we basically try to bring a yardstick to all the science, the molecular science that's out there.
[09:56]
B
Right. It's a big yardstick. I want to get to how you decided to start building agentic workflows and if there was something specific, a specific kind of challenge, whether on the science end of things or in wrangling different types of data and that kind of thing. But maybe walk us through a little bit kind of how you architected, how you built CytoReason and then, you know, describe now the agentic workflows you're using and go into why a little bit.
[10:20]
A
Yeah, sure. I mean, I think in some ways I already gave a bit of a clue.
[10:24]
B
Yeah.
[10:24]
A
Because I described the data-insight gap. Right. So imagine you live in a field and I'll give you an example just, just to make this kind of real. If you talk about the scientific literature, my field I mentioned in the beginning, I'm a systems immunologist. I work on the immune system. In immunology, every two minutes a new paper comes out.
[10:45]
B
Wow, okay.
[10:46]
A
It's kind of humbling, right?
[10:48]
B
Every day.
[10:49]
A
And even if you're going to argue that half of them are not worth reading, you're still going to run out of time. Now similarly, if you're talking about single-cell data, RNA-seq data, gene expression, proteomics, all, you know, things that, you know, our audience may be familiar with or not, doesn't matter. That data is like, while I'm sitting here working, while we're talking, there's data coming out that could be very valuable to make, to drive my decision. So CytoReason as a company, by its own definition of its goal and vision, needs to somehow beat this exponential growth in data. Right? So immediately, I say this to every employee at CytoReason, you know: 80% of your time you spend on whatever your job is. 20% you have to spend on how do I make my job obsolete and automated. So, because I have the next challenge to do, because that data, if we don't, we're going to be beaten by the, you know, avalanche of data coming. So that if you think about this as a pitch for agentic AI, you know, it doesn't get better than that.
[11:59]
B
I'm just, I'm seeing a commercial in my head with like agents in doctor scrubs and track shoes, just running as fast as they can to stay ahead of the data just piling up.
[12:09]
A
Right, right.
[12:09]
B
But, yeah, that's. Yeah, exactly.
[12:11]
A
So you basically are constantly in a game in which you need to make it faster. Just, you know, it's actually what's called, you know, the, an evolution. And then you remember Alice in Wonderland? The Red Queen. Sure, right. Where she said to Alice, you have to run just to stay in place.
[12:27]
B
Yeah, Yep.
[12:28]
A
Right. And it's also an evolutionary principle of how, you know, viruses and the immune system combat. This is another topic. I can talk to you about this another time. But, but the Red Queen effect. So this, this need for us to continuously run is a huge driver for automation, acceleration. And I would even say the cognitive meta analysis that we as humans need to do to somehow describe to a machine how we make decisions so that we can automate them. And so with that in mind, I think almost, CytoReason had the thought that we needed agentic AI even before agentic AI was there. And of course when it came around we jumped on the bandwagon. So it starts at the earliest stages. I need to bring the data in. Right. To bring the data in, you could, you, you could go the manual route. Right. Which is like to have people bring it in one by one. Totally unsustainable given the data keeps growing.
[13:25]
B
A paper every four minutes, if we include the bad half. Yeah, it's a lot of, a lot of data. Yeah.
[13:30]
A
Or you know, data sets and so forth, like at every molecular level that. So you, you really cannot do this manually and strive to do to get that level. Right. You can build pipelines and automate that. You want to process this. As soon as you do this more and more with kind of molecular level data, you realize there's in biology has a lot of these exceptions and outliers and so forth. So ultimately a more appropriate solution is to teach a machine a workflow that may be very complicated where humans make decision, but you can see it and then start that automating process. Now would it work perfectly from day one? Depends on the complexity of the data that you're going to. But if you then, you know, you put a QC process that you start with manually and then you make that automated as well and so forth, you can build processes that really accelerate your data intake. And that's just the most obvious place where the agentic AI comes in. Right. It can come in in other places around.
[14:26]
B
Sure.
[14:27]
A
You know, decision support. We'll talk about this more.
[14:31]
B
So thinking about keeping up with the data, the literature in particular, were there specific techniques or, you know, there are obvious advantages. We've talked about just agents being able to go out and do the research and grab the data kind of obviously is a, you know, a game changer. But other things that you discovered about working with agents to curate and review the medical literature in particular that jump out at you.
[14:57]
A
Well, I think it's a wonderful question. I'll answer, I think on two angles. So first of all, just to say why in such a data-rich field somebody needs literature, which you can think about as data. You can say, well that's data. Right. But I actually want to uniquely identify that data from other data because I would argue that literature is already at a stage of knowledge and biology has a lot of data that isn't yet knowledge. CytoReason deals with all the data, but also the literature. And the reason we need it is because of, well, A, people want the knowledge. Right. But the other side which is more interesting is when I discover that this is a deep data field where there's way more features than there are kind of measurements or samples and so forth. The way that people make decisions, you basically cannot just stick this into, you know, a machine learning model and it's going to be, it's going to be basically an overfit. Right. And the way you kind of deal with this is actually by the integration of prior data which comes from the literature that allows you to narrow down the search space in a variable.
[16:03]
B
Right. Okay.
[16:04]
A
It has two advantages. One advantage is that you do make, it's easier for you to make discoveries. And there's another advantage which is, relates to our customer base and I think in general and how people make decisions, large decisions in the face of uncertainty, which is that they want to stand on, on the shoulders of giants or at least stand on some level of confidence. Right. So being able to connect new novel discoveries, emerging phenomenas and so forth that the, that the AI model produced to knowledge that I, I, you know, I solidly believe in a firm is actually an important thing for our customer base and for any scientist to actually make the leap. Right. Because it's going to. The next stage could be, you know, it usually would be an experiment. It can sometimes be a very expensive experiment. And people, it's not enough just to have a predictive model. People are seeking from our, our customers are seeking from us and where we started for it to be a mechanistic model. Explain to me why that prediction makes sense.
[17:09]
B
Right.
[17:09]
A
And give me trust in it. And the literature brings that piece in. Right. From an agentic AI perspective, that means also, for instance, as an example, confidence scores on the literature are a really key thing for us. Right. Because this literature is complicated. There's not a huge amount of instances of any one event. How sure are we that this particular kind of description of a biological event is actually correct? And that for us was a huge piece of entering of how we kind of been pushing the LLMs within CytoReason and then the agentic AI workflows and it goes, I kind of mentioned, it goes everywhere here. We need to be really sure. You know, GenAI is awesome, and agentic AI, but we need to be, we need to have the high quality and so we've been putting a lot of these guardrails, if you like.
[17:58]
B
Yeah. Where do the confidence scores come from? Are you, CytoReason, generating them? Are they in the literature?
[18:03]
A
So you basically can come up with a variety of different techniques in which by sampling the literature. Right. And also fitting you, you can, you can, you know, sampling the literature, you can build that confidence on one hand by putting an LLM RAG component. Right. So you're actually doing retrieval-augmented generation and kind of querying this to be more certain about what it is that I'm looking for.
[18:25]
B
Right.
[18:25]
A
All of those. And there's a variety of other techniques. There's also the kind of the, what we call biological expectations or biocredibility in the end, to check ourselves on this and so that it's a loop that keeps improving. All of those are techniques that allow us to basically build the confidence that we need for these heavy decisions. On one hand, the necessity to leverage AI and agentic AI to basically move forward and do this on large scale. And on the other hand, to ensure the confidence is high.
[18:51]
B
I'm speaking with Shai Shen-Orr. Shai is co-founder and chief scientist at CytoReason, the company we've been talking about, and he's also professor of systems immunology and precision medicine at the Technion, the Israel Institute of Technology. We've been talking about CytoReason and just recently the importance of building trust in the model's output. And you know, it's something that applies to generative AI in any situation. But as you were saying, Shai, in biology, in precision medicine and drug discovery and pharma and everything, these decisions are both, you know, literally can be life and death for lots of people, but also quite expensive and involve, you know, saying go involves a lot of resources being put to use, a lot of money being spent. And it made me think, Shai, we've had a couple of conversations on the podcast with folks in the protein sequence prediction and generation space and other drug discovery related spaces, and I'm wondering about CytoReason's place in the workflow, in the researcher's workflow or the end user, whoever's using it. And you know, when you mentioned about making these decisions and experiments being expensive, I've talked to folks before. I've read about folks using AI models, generative AI generation, to do sort of simulated experiments right before moving to the wet lab, being able to run and kind of narrow down which ones are worth the cost and the effort to do. Are your customers using CytoReason kind of in the same way or what does a workflow look like? And then out of that, I wanted to ask you if there's a time you can share with us where CytoReason's workflow enabled something really unique from an end user. So you talk a little bit about the workflow, if you could.
[20:38]
A
Sure. So from a workflow perspective, there's maybe two points to say and it sounds like you have been talking to some interesting people doing interesting stuff around Protein Sarcoz. I'll differentiate ourselves from it. So CytoReason is a company and this is also interesting, I think, almost from the NVIDIA kind of marketplace and kind of the company, CytoReason is a company that NVIDIA invested in. And I think we stand out uniquely within those. Because if I, if I look at the healthcare flow, right, there's the chemistry of it. Right. There is the biology of it and there's the kind of clinical side of it.
[21:15]
B
Okay, Right.
[21:16]
A
And if you look where AI has been playing a big role at this point, it's certainly been in the chemistry space, chemical structure, and I would put protein structures there as well.
[21:25]
B
Right.
[21:26]
A
From small molecules libraries to protein structure, there's a huge amount that's happening with kind of, you know, NVIDIA GPUs and you know, generative AI and so forth to basically build those molecules. And of course, anything that gets built there, there's a simulation test. But ultimately somebody puts in experimental tests. And experimental tests are usually, I would say, at early stages. They are what would be called an in vitro experiment. There's no animal, there's no human. You know, you're just testing to see, well, okay, it was this antibody that I just kind of simulated when I generated. Does it hold the properties and can it be a good, kind of good direction? On the, on the clinical side, there's also a lot of, I think, agentic AI happening with, with a lot of kind of shortening, say kind of recruitment for clinical trials and so forth. Right. There's a lot of space happening in the electronic and medical record. It's a relatively well defined space. There's a huge amount of almost like human operations that goes there. And then I think agentic AI has been playing a big role. CytoReason is quite unique in that we're focused on the biology side of things. And biology, if you compare them to the two, is actually the biggest unsolved problem.
[22:39]
B
Yeah.
[22:39]
A
I would say today if I look at pharma, the two big problems from a science perspective are, one is that we don't have a good understanding of the, of the biology. And you see it in clinical trials that phase two, which is the first time we tested in humans, is where the biggest failure Rates are right.
[22:58]
B
Okay, okay.
[22:58]
A
So it tells you. And then the second piece is human diversity. So the biology can be, you know, you and I may not have, you know, we may have the same indication and so forth. We actually may look very different and could be for different causes. We still don't have a good understanding of this. That's where CytoReason is playing and bringing AI there is, you know, the search space is way, way bigger than the chemistry. And so it's an early stage to build on, but it's clearly the biggest problem. And that I think where we'll see companies going on and certainly that's kind of where we've been kind of leading and NVIDIA kind of putting their, you know, I think, trust in us has been a huge thing for us. So, you know, I think if I look at that space, how are the users behaving there? Well, first of all, they need to explore the disease biology and then they need to think about their use cases. And again, the use cases is what would be a good target to choose from given I need to have it work in this particular disease. And given that I know, you know, this is, you know, this disease already has a bunch of standard cures that I need to beat.
[24:04]
B
Right.
[24:05]
A
And I know that there's people who are not responding, and what is it about them that maybe I can target? And there's other companies that are developing and may be, you know, out to market before me. So all of these commercial questions need to come into a scientist thinking about disease biology and saying, where's the niche that I can come in? And so whether it's target prioritization or, bigger than this, indication choice for the next clinical trial: is it happening in RA? Is it happening in Crohn's disease? Is it happening in, you know, Alzheimer's? That's not an easy one. Those are the use cases. And so if you think about what CytoReason brings, it tells you this particular target is the best priority to go for in this disease. Here's a bunch of mechanisms why we think this is the case. And the users can go and do small tests. It's very different from the protein structure ones I mentioned before. But small tests that actually validate the top hypotheses, build confidence in the AI prediction, and then you go and you execute on them.
[25:10]
B
What's the feedback been like from your users? And I'm wondering, I mean, go anywhere you want with this. I'm wondering if there are certain areas that have been brought to your attention to focus in on whether it's that the users have been kind of poking at a certain area and wanting more functionality or if maybe something you didn't expect popped up and it's a different path to look at.
[25:32]
A
Sure. So I think in general in this. So it's a very interesting question. There's a lot of these. Right. So in general, the users want a lot from.
[25:42]
B
What does any user want right now?
[25:43]
A
Yeah, but I'll just mention a few directions so you'll see how they themselves struggle. Right. So on one hand, you could think about this as: I'm invested in a particular asset. I just invested hundreds of millions of dollars to manufacture a drug. And what I want to do is go deep. I want to study that, get every possible layer and model every possible layer, so that my prediction is the best. And this is, you know, obviously a person who is a program lead for that drug. If you ask them, that's what they want to do. On the other hand, orthogonal to this, you go to somebody who's in charge of AI strategy for a pharma company, or is the head of a therapeutic area, and they say, well, that's one drug. Obviously I care about it, but I have 100 drugs that I'm actually simultaneously developing, and I need to evaluate them across tens, if not sometimes hundreds, of diseases. We need scale. Right. Those two are orthogonal, and you need to basically care about both. Because the science is always in the depth.
[26:52]
B
Right.
[26:52]
A
And the commercial problems are often in the breadth. Right. And you need to do them right. Other challenges come from new data types. Biology, or biologists, keep inventing new measurement modalities. So I can model a tissue and say, here is the mRNA in it. Or I can model a tissue and say, well, I took the mRNA, and I've developed methodologies to describe that this biopsy is actually made up of cells. And now new technology allows me to say, well, I can tell you the geographical position of every cell and how they interact. So as new technologies come out, well, let's add them into the model. And they never come out with a huge amount of data. In this field, that's deep data. It's like, we have 10 samples that we generated with a new technology, and each file is a gigabyte. And so again it comes to this: how do I enter a prior, so that, you know, on one hand I'm aware of the fact that I only have 10 samples and the world's population diversity is bigger than 10 patients, and on the other hand I use this new technology to [unclear]. And CytoReason's models, from that perspective, and the word "model" here is deceptive in some sense: we develop what I would call hybrid models. So on one hand we have services that are deep learning and LLMs and so forth. And on the other hand we have places where it would be standard traditional statistics and statistical learning and rule-based. Because the richness of the data is so big. Like, you know, there's very few places in biology today where you can just stick the data into a deep learning model and you'll get good performance. Right. Maybe it's images and genetics and protein structure. Everywhere else, there's just not enough data, and you need to somehow overcome these things. And so our model is ultimately an integrated framework that calls a lot of different services, that has many different solutions, each tailored for the different components, and then integrates them together.
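
[Editor's sketch, not part of the conversation. The point above about bringing a prior to a 10-sample dataset can be illustrated with a textbook normal-normal shrinkage estimate. This is not CytoReason's method; the function name, parameters, and numbers are all hypothetical and chosen only for illustration.]

```python
def shrink_toward_prior(samples, prior_mean, prior_variance, noise_variance):
    """Posterior mean of a normal-normal model (illustrative only).

    samples        : measurements from the new, small experiment
    prior_mean     : what earlier data suggested (hypothetical value)
    prior_variance : how uncertain that earlier belief is
    noise_variance : assumed per-sample measurement variance
    """
    n = len(samples)
    if n == 0:
        # With no new data, the estimate is just the prior.
        return prior_mean

    sample_mean = sum(samples) / n

    # Precision (inverse variance) carried by the data and by the prior.
    data_precision = n / noise_variance
    prior_precision = 1.0 / prior_variance

    # The weight on the data grows with sample count and shrinks with noise.
    weight = data_precision / (data_precision + prior_precision)
    return weight * sample_mean + (1.0 - weight) * prior_mean


# Ten noisy samples pull the estimate only partway from the prior.
few = shrink_toward_prior([2.0] * 10, prior_mean=0.0,
                          prior_variance=1.0, noise_variance=10.0)

# A thousand samples let the data dominate.
many = shrink_toward_prior([2.0] * 1000, prior_mean=0.0,
                           prior_variance=1.0, noise_variance=10.0)
print(few, many)
```

With only ten samples the estimate sits between the prior and the sample mean; as the sample count grows, it moves toward the sample mean.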
[28:51]
B
What do you see as the future of biomedicine? What do you see the researcher's, the scientist's job looking like, and specifically the tools? Right. In a few years, whatever the timeframe is, two, three years, five years, 10 years, whatever timeframe makes sense to you from what you've seen. What do you see that role looking like, and what do you see the technology component looking like in a couple of years? And I'm thinking about both, you know, everything you described in the industry and balancing research and science and all of these, you know, the data and everything. But also something you said about what you say to your own employees, I don't know if you said when they start, but, you know, like you need to figure out how to automate, you know, make yourself obsolete, automate what you're doing away, because there's so much more we have to do.
[29:40]
A
Yeah.
[29:41]
B
So. Yeah. Where are we headed?
[29:43]
A
I think it's a wonderful question. Obviously I will only claim this as my viewpoint and.
[29:49]
B
Exactly. Yeah.
[29:50]
A
I feel like, you know, I personally feel I've been blessed in that I encountered biology when I did.
[29:56]
B
Yeah.
[29:57]
A
And that I ended up in what is simultaneously an infinite field. Right. As in, we will not solve all of biology in my lifetime, even with agentic AI and so forth. And on the other hand, a field that has been ripe to start thinking in a more, I often call it engineering fashion, basically building principles on which you can actually teach machines to help you. So from my perspective, as I look at this, and, you know, I think about the job of computational biologists and the job of biologists and the job of clinicians, all of which are critical to ultimately bring that healthcare to patients, I think all of these people have been blessed now with solutions that allow them to take yesterday's thing, automate it to a level they could never imagine, it was like a science fiction thing, and then get busy with the next cool thing that they couldn't even imagine. Yeah, right. And, you know, I'll give you another example. In biology, you know, you keep discovering new things. It's a field of unknown unknowns. Oftentimes when I bring in data scientists who never had any exposure to biology, one of the things they struggle with at CytoReason is they expect to have a gold standard, like, "I know what the truth is." And I'm like, we don't know. We're continuously getting glimpses. We're in an unknown-unknown space. Right. And so I think the challenge is, it's not the only field in science in which this is the case. But I think those challenges are amazing. And actually the necessity for us, the obligation that we have, I think, is to bring in AI and machine learning to accelerate our ability to actually bring cures to people. I see this as an obligation, and I'm not afraid of the situation of suddenly a machine doing what I do, because there's always the next thing. And it's actually why I got into this. Right. It's the fascination with the discovery. And so I think that's a good way of giving hope to the folks.
[32:05]
B
Absolutely. Yeah. No, no, no. Absolutely, Shai. I think that was a great place to end. Kind of an uplifting, I don't want to say hopeful because it implies, you know, a lack of hope in other situations. But like you said, there's no end of hard problems and cool things to do. And so, you know, using the tools to get the old ones done faster so we can get to the new stuff, it's a great way of looking at it. Usually, Shai, I kind of wrap these episodes by asking the guest where listeners can go to find out more about everything we've been talking about. And I definitely want to do that with you. But first, I understand you've got a podcast to plug.
[32:39]
A
I. I do, I do, I do.
[32:41]
B
You play the host role.
[32:42]
A
Yeah.
[32:42]
B
Tell us about it.
[32:43]
A
Yeah, thanks for mentioning it. It's called Tech on Drugs. Nice. And I basically interview interesting people from all walks of life, mostly scientists and clinicians, I would say, who are coming up with new innovative technologies, whether computational or sometimes experimental as well, that allow us to, you know, bring drug development to the next stage. And there's a huge amount there about AI from all.
[33:11]
B
Well, like I was saying, I'd heard of, you know, protein structure prediction before. Right. So we've talked a little bit about it on the pod. So I imagine you have plenty of fertile ground to cover there. Tech on Drugs.
[33:24]
A
Tech on Drugs, yes. On Spotify.
[33:26]
B
Okay. And it's available on Spotify, all the regular channels. Fantastic. So check that out as well. For more information about CytoReason, the website is Cytoreason.com. Is there a research blog, other social channels? Do you cover all that on your podcast?
[33:41]
A
Right. So not on the podcast, but there's a website. We're on LinkedIn actively. And so those are probably the best resources to get in touch with folks at CytoReason.
[33:52]
B
Great. Well, Shai, again, thank you so much for making the time to talk with us. Like I said, it's vital work, as you kind of alluded to and we both mentioned, coming up with ways to extend and improve people's lives. But the energy you bring to it and that sense of, like, yeah, let's get this done, the next cool thing's around the corner, I think it's awesome. It's really inspiring. And for me, you know, personally, I'll carry that with me. So thanks again for taking the time. All the best of luck to you and your teams.
[34:19]
A
Thank you so much. Nice.