S1, E28 - Adam Blum: AI-Powered Clinical Trial Matching - Practical AI in Healthcare

Podcast Summary: Practical AI in Healthcare

Episode: S1, E28 – Adam Blum: AI-Powered Clinical Trial Matching
Date: March 15, 2026
Hosts: Steven Labkoff, MD & Leon Rozenblit, JD, PhD
Guest: Adam Blum (Founder, Cancerbot)

Episode Overview

This episode focuses on the persistent and critical challenge of clinical trial matching, and how AI can finally provide scalable, meaningful solutions for patients, clinicians, and life sciences companies. The hosts are joined by Adam Blum, an AI entrepreneur and lymphoma patient, who founded Cancerbot to translate his own grueling experience with trial search into a robust AI-driven platform. The conversation candidly explores the technical, clinical, and practical realities of trial-matching systems, the inherent complexity of eligibility logic, and what it actually takes to make these approaches safe, accurate, and patient-centric.

Key Discussion Points & Insights

1. Adam Blum’s Origin Story and Motivation

Background in AI and Startup Experience
- Adam built early NLP and AI tools (e.g., Microsoft English Query) and founded Skillmore (LLMs for employee skills).
- Personal context: Diagnosed with follicular lymphoma (UK, 2024); direct encounter with clinical trial search prompted invention of Cancerbot.
- Quote:
  
  "When I got a cancer diagnosis, I said, I think I have some tools to go solve the hard problems of clinical trial matching." (03:23, Adam)

2. The Patient’s Perspective: Clinical Trial Search Is Broken

Navigating the Maze of Registries & Matchers
- Adam spent weeks reading through the major registries (ClinicalTrials.gov, ISRCTN, EUCTR, WHO ICTRP).
- Commercial matching services proved superficial—asking only basic questions and yielding inaccurate "matches."
- Quote:
  
  "Most patients are not in a condition to, to go, you know, spend several weeks doing nothing but read trials." (08:22, Adam)
Barriers for Both Patients & Docs
- Eligibility described as “a wall of text,” filled with inconsistent terminology, complex logic, and non-computable formats.
- Even oncologists struggle to do a comprehensive search.

3. What’s Wrong with Existing Solutions?

Lack of Computability, Standardization, and Depth
- No standard for encoding eligibility logic; criteria are free-form, inconsistent, and difficult for AI or humans to parse.
- Commercial matchers ask only for high-level attributes, rarely drill into prior therapies, comorbidities, or nuanced exclusion/inclusion criteria.
- Quote:
  
  "They're not really trying to flesh it out to that point...the trials that were sent to me that I was told I was a match, I was unequivocally not a match for." (15:48, Adam)
Patients’ Needs Are Complex and Diverse
- Each patient has unique balancing of “benefit,” “burden,” and “risk”. Superficial matching ignores what really matters to individuals.
- Quote:
  
  "Every patient has their own computation of what's best, has their own framework for what's best, even though they might not express it that way." (18:49, Adam)

4. Cancerbot’s Approach: Deep Structure, Custom Prompts, SME Collaboration

Deep Attribute Extraction
- Disease-specific schema – e.g., 84 common attributes just for follicular lymphoma.
- Manually engineered, disease-specific prompts for LLMs, refined by medical subject matter experts (SMEs).
- Quote:
  
  "You really need these hand tuned prompts for every attribute...the prompt will go on for pages, potentially. Now, to reassure you, of course, this is just done once for every disease." (25:42, Adam)
Guardrails Against LLM Errors & Hallucinations
- Human-in-the-loop: SMEs tune prompts and validate attributes extracted from trial texts.
- Complex eligibility logic is transformed into conjunctive normal form (“laundry list” of AND/OR conditions) for computability and user clarity.
- Quote:
  
  "Transforming the logic into a consistent representation...makes it very understandable for patients what they have to hit or doctors...It makes it much clearer." (34:15, Adam)

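The conjunctive-normal-form idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Cancerbot's implementation: the attribute names, the two example criteria, and the helper functions (`at_least`, `had_therapy`, `eval_cnf`) are all hypothetical. Each criterion is an OR-clause, the trial is the AND of its clauses, and a missing patient value yields "unknown" rather than a failure.

```python
from typing import Callable, Optional

# A patient record is a plain dict; a missing key means "not yet known".
Patient = dict
# A check inspects the patient and returns True, False, or None (unknown).
Check = Callable[[Patient], Optional[bool]]


def at_least(attr: str, minimum: float) -> Check:
    """Check: patient[attr] >= minimum, or None if the value is missing."""
    def check(p: Patient) -> Optional[bool]:
        return None if attr not in p else p[attr] >= minimum
    return check


def had_therapy(name: str) -> Check:
    """Check: name is among the prior therapies, or None if none are recorded."""
    def check(p: Patient) -> Optional[bool]:
        return None if "prior_therapies" not in p else name in p["prior_therapies"]
    return check


def eval_clause(clause: list, p: Patient) -> Optional[bool]:
    """OR over checks: True if any is True, None if still undecided, else False."""
    results = [check(p) for check in clause]
    if any(r is True for r in results):
        return True
    return None if any(r is None for r in results) else False


def eval_cnf(clauses: list, p: Patient) -> Optional[bool]:
    """AND over clauses: False if any clause fails, None if any is undecided."""
    results = [eval_clause(c, p) for c in clauses]
    if any(r is False for r in results):
        return False
    return None if any(r is None for r in results) else True


# Hypothetical criteria: age >= 18 AND (prior proteasome inhibitor OR prior IMiD)
criteria = [
    [at_least("age", 18)],
    [had_therapy("proteasome inhibitor"), had_therapy("immunomodulatory drug")],
]

print(eval_cnf(criteria, {"age": 50}))  # undecided: prior therapies not recorded
```

Keeping "unknown" separate from "not met" is a design choice made for this sketch; it lets a tool list what still needs to be answered instead of silently excluding a trial.
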
5. Cancerbot in Practice: User Experience

Patient Workflow
- Patients either upload their EHR (Epic MyChart supported; other EHRs forthcoming) or answer a customized set of questions (beyond just age/gender/stage).
- Matches are ordered by “goodness” (user-selected: risk, benefit, or burden); missing eligibility items are highlighted for easy follow-up.
- Users are directly connected to appropriate trial coordinators, drastically reducing wasted effort for both patients and sites.
- Quote:
  
  "The patient sees...the list of trials that will show it to you in the order of patient goodness...we actually show you what things are missing." (29:00, Adam)

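The workflow above (filter on eligibility, report what is missing, then order by the patient's chosen priority) might look roughly like the following. It is a minimal sketch, not the product's code: the `Trial` class, the `goodness` ratings, the trial names, and the attribute names are all hypothetical, and the ratings are assumed to be scaled so that higher is better on every dimension.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Trial:
    name: str
    # attribute name -> predicate on the patient's value (hypothetical criteria)
    requirements: dict[str, Callable[[float], bool]]
    # hypothetical ratings per dimension; higher is assumed to be better
    goodness: dict[str, float]


def match(trial: Trial, patient: dict) -> Optional[tuple[float, list[str]]]:
    """Return (fraction of criteria confirmed, missing attributes).

    Returns None as soon as a known patient value fails a criterion.
    """
    met, missing = 0, []
    for attr, ok in trial.requirements.items():
        if attr not in patient:
            missing.append(attr)
        elif ok(patient[attr]):
            met += 1
        else:
            return None
    total = len(trial.requirements)
    return (met / total if total else 1.0), missing


def rank(trials: list[Trial], patient: dict, priority: str) -> list[tuple]:
    """Drop trials with a failed criterion, then sort by the chosen dimension."""
    rows = []
    for trial in trials:
        result = match(trial, patient)
        if result is not None:
            rows.append((trial, *result))
    rows.sort(key=lambda row: row[0].goodness[priority], reverse=True)
    return [(t.name, score, missing) for t, score, missing in rows]


trials = [
    Trial("Trial A",
          {"age": lambda v: v >= 18, "prior_lines": lambda v: v == 0, "anc": lambda v: v >= 1.0},
          {"benefit": 0.9, "burden": 0.3, "risk": 0.4}),
    Trial("Trial B",
          {"age": lambda v: v >= 18, "prior_lines": lambda v: v >= 1},
          {"benefit": 0.7, "burden": 0.6, "risk": 0.5}),
    Trial("Trial C",
          {"age": lambda v: v >= 18},
          {"benefit": 0.5, "burden": 0.8, "risk": 0.9}),
]
patient = {"age": 50, "prior_lines": 0}  # partial record: no lab values yet

for name, score, missing in rank(trials, patient, "benefit"):
    print(name, f"{score:.0%}", "missing:", missing)
```

In this sketch a trial is dropped only when a known value contradicts a criterion, so a partial match stays visible together with the questions that would complete it.
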
6. Technical and Community Collaboration

Open Source & Standards (OMOP+)
- Cancerbot’s extended OMOP schema (“OMOP Plus”/ctomop) fills gaps needed for precise eligibility matching, like assay names.
- All code and extensions released as open source.
- Collaboration with Harvard DCI Network to build test bed for matcher evaluation.
- Quote:
  
  "OMOP is an absolutely stunning accomplishment and we wouldn't have gotten as far as we did without being able to...stand on the shoulder of those giants." (44:31, Adam)
Accuracy Validation: Results So Far
- Precision for automatic attribute extraction:
  - Follicular lymphoma: 85% (ASH 2025)
  - Multiple myeloma: 90% (BSH 2026)
  - Breast cancer: early results >90%
  - Harvard’s DCI Network independently validating.

7. Looking Ahead: Roadmap & Community Involvement

Expansion to More Diseases
- Partnered with Blood Cancer UK: CLL (chronic lymphocytic leukemia) and mantle cell lymphoma next.
- Open to adapting roadmap based on demand, especially where standard of care lags trial innovation.
Role-Based Access & Engagement
- Supports roles for patients, clinicians, nurse navigators, and trialists/researchers.
- Encourages direct involvement from users and patient foundations.
- Contact:
  - adam@cancerbot.org, www.cancerbot.org

Notable Quotes & Memorable Moments

On Patient-Driven Innovation:

"You have two things that most patients don't. You had deep AI and IT expertise. Right. And you had the stubbornness of an ultramarathoner or maybe a pit bull, whichever covers you better. Um, but the system shouldn't require a tech CEO to navigate it, right?" (08:25, Leon Rozenblit)
On Complex Logic & Human Cognition:

"The eligibility logic is so frightfully complex it boggles the human mind...let me, I gotta like draw a diagram to understand what the hell this just said." (37:36, Leon Rozenblit)
On Generalizing Innovation Patterns:

"You've developed an iterative process where SMEs inject disease specific knowledge into prompts. That's a really elegant design pattern and one that...is essential for anybody building disease specific clinical, clinical or research systems." (37:17, Leon Rozenblit)
On Ethical Commitment:

"Adam's doing this as a not for profit. So if you guys want to donate to this effort, I'm sure he'd be happy to have the help, whether it's in kind or whether it's actual dollars." (48:38, Steven Labkoff)

Timestamps for Key Segments

Origin Story & Patient Journey – 03:06–08:25
Clinical Trial Matching: What’s Broken – 08:25–15:16
Why Commercial Matchers Fall Short – 15:16–20:05
Cancerbot Architecture & Technical Deep Dive – 24:37–32:03
User Experience: Patient Workflow – 28:57–32:03
Guardrails & Logic Representation – 32:03–38:30
Open Source, OMOP+, and DCI Collaboration – 39:53–44:44
Accuracy, Outcomes, and Expansion – 44:00–46:26
Contact and Community Involvement – 46:36–47:49
Closing Reflections and Takeaways – 48:58–50:15

Final Takeaways

AI-powered clinical trial matching can only work at scale with careful, disease-specific engineering, close involvement of subject matter experts, and deep commitment to transparency and usability.
Transforming eligibility logic into a format that's “computable” and cognitively tractable benefits both automation and people—patients and clinicians alike.
Building on existing standards and sharing tools as open source accelerates progress and improves adoption across the healthcare ecosystem.
Patient-driven, not-for-profit innovation can set new standards for both accuracy and patient empowerment in a notoriously difficult domain.

For more information or collaboration, visit cancerbot.org or contact Adam Blum at adam@cancerbot.org.

Transcript

[00:04]
A
Welcome to Practical AI in Healthcare, the podcast that cuts through the noise to spotlight real world solutions delivering real world value. From patient care to clinical research, from life sciences to patient engagement, we focus on what's truly moving the needle in healthcare. No hype, no theory, just practical insights where AI is making a true impact. Welcome aboard and let's get to it.
[00:35]
A
Hello and welcome to this rich edition of Practical AI in Healthcare. My name is Dr. Steven Labkoff and I'm here with my partner, Dr. Leon Rozenblit. How you doing, Leon?
[00:43]
C
I'm great, Steve. How you holding up?
[00:45]
A
Oh, dealing with the snow and trying to dig out. We had 18 inches this week and now another two. It's been tough on the snow front, but enough about the weather. You know, I wanted to reflect a little bit on last week's episode. We had Charlie Harp from Clinical Architecture and I found his insights around data quality and the framework he's building to be super interesting.
[01:06]
C
Yeah, I really liked a lot about what he's doing. I think he's approaching the problem at a novel level of abstraction by looking at developing a data quality taxonomy instead of just developing data quality tooling, which a lot of people have tried before, and also by looking at an open source, open standards approach. First they're going to build a commercial product. Actually, I think they already did build a commercial product on top of it, but that's fine because other people can build on top of the open source work. So there's lots of stuff about that that's really interesting and it's going to make a big difference when we talk about how AI is going to be using high quality data down the road.
[01:49]
A
Yeah, I agree with you and I hope our listeners get the opportunity to tune into that episode. I think this PIQI framework that he's built is going to really, as it gets adopted and voted on at HL7, it's going to become something really important to pay attention to and hopefully will change the nature of how data quality is measured and used as a front end tool for building training sets. So what do you think about this week's episode, Leon?
[02:18]
C
You know, I love clinical trial recruitment as a problem. I know you're probably closer to it because you've worked both sides of that particular market, but it's a very compelling problem. I think it's something we dealt with on the registry side because often there was an intended use case for registry data is to identify clinical trial cohorts. And our guest today is someone who's not only thought about this deeply, but is a patient who just encountered the problem and went out and started building. And the stuff he's doing is really, really cool.
[02:53]
A
Yeah. Our guest is named Adam Blum and we'll bring him on in just a moment. And he has done something that is remarkable. And we'll hear that story in just a second. So let's welcome Adam to the show. Adam, how you doing? Welcome to the show.
[03:07]
B
Happy to be here. Thanks for having me.
[03:09]
A
So, Adam, the first question we usually ask our guests is basically, what's your origin story? What did you do? Or what brought you to this point in your career? And maybe touch base on some of the things you've done in the past, like, I know you wrote a book, that kind of thing.
[03:24]
B
Yeah. So I've done various startups. I guess the common thread has been what's now AI or neural networks or deep learning intersecting with structured data. So built the first tool that used natural language to query structured databases at Microsoft, Microsoft English Query. And the most recent startup was a startup called Skillmore that infers skills for employees using, using large language models. Prior to that, a company called OpenEd that served 12% of U.S. classrooms by putting skills on, putting skills on educational resources. So that's, that's sort of. And so I was running Skillmore and that's some context for, you know, why, when I got a cancer diagnosis, I said, I think I have some tools to go solve the hard problems of clinical trial matching.
[04:40]
A
So let's go right there. You know, you've shared with me on more than one occasion that you're in the midst of a cancer journey, and that cancer journey involved a disease called follicular lymphoma, a disease I've worked on. I've worked with the Follicular Lymphoma Foundation. I know the folks over there real well. What happened? You were, you know, this is a recent diagnosis too. It's not that old in your life. So what happened and how did it change your life and how did AI get involved?
[05:10]
B
So I was diagnosed with follicular lymphoma in August of 2024. And I happen to live in the UK and my doctor said, okay, well, standard of care under the NICE framework in the UK is watch and wait or radiotherapy. Even though it was actually a rather large tumor, a bulky tumor in my inguinal lymph node, and it was also fast growing. Even though follicular lymphoma is supposedly indolent, it wasn't there. And then a month later, it was there. So, so when I, when I encountered this, you know, of course I did my reading on follicular lymphoma and saw that there were, if I was in another geography like the U.S., I probably wouldn't have gotten watch and wait. And radiotherapy would have been probably something like R squared or rituximab plus or sorry, radiotherapy plus rituximab or radiotherapy plus bendamustine. And so I said, well, it's okay, I'll, I'll, I'll find a clinical trial so I can get some, some better care than just plain radiotherapy. And. And so thus began my adventure of trying to find clinical trials. I started looking through all of the registers, ClinicalTrials.gov, ISRCTN here in the UK, EUCTR in the EU and the WHO's ICTRP that tries to aggregate from all of those things. And so I took three weeks off of work and just went through reading the unstructured text and trying to find a trial is sort of mind numbing experience doing that. And then at the same time there, of course, in Googling, you know, I found that there were services, commercial clinical trial matchers that said, hey, we'll do, we'll do a match for you. And I engaged with them and answered the five questions. Age, stage age, gender, disease, stage, grade. And then they say, okay, well we found you a match and you work with a consultant who's generally not a medical person. And then they say we found you a match and they try to broker you to pharma. But none of the, the matches that I was proposed by those sources were actually matches. And I was able to know that because I had spent so much time reading essentially all the follicular lymphoma trials that has existed, at least certainly the ones in the US and the UK that would have been a fit for me. So I just, from that I said, okay, there's, there needs to be a tool for this because it's just a completely unmet need. It's just not. Most patients are not in a condition to, to go, you know, spend several weeks doing nothing but read trials.
[08:26]
C
So Adam, this is, it's kind of cool that you had this really unusual skill set that allowed you to turn what's really a really scary diagnosis and a difficult, intractable trial search problem into something you get your hands on. You have two things that most patients don't. You had deep AI and IT expertise. Right. And you had the stubbornness of an ultramarathoner or maybe a pit bull, whichever covers you better. Um, but the system shouldn't require a tech CEO to navigate it, right? I mean, if, if the only people who can succeed are folks with your exact background, boy, are most patients in trouble. So let's dig into what's actually broken. Why is matching patients to trials so hard?
[09:11]
B
Because you've got this, this wall of text for eligibility. And it's quite daunting. It can go on for pages. And it's not the, it's not just, you know, hey, here, look, sort of a tick list of the common things. It's. It's written almost like free form, you know, it's almost stream of consciousness the way these doctors are writing it. And that's not just in, you know, the, the way they refer to a particular patient attribute. It's also the values used in patient attributes. And it's even things like describing. Very often trial eligibility is about, like previous therapy. And so sometimes when people will say, like, you know, can't have had R-CHOP in the past or must have had it. And then sometimes they'll say, must have had a proteasome inhibitor and an immunomodulatory drug, or sometimes they'll say, must have had a PI and an IMiD. And you know, this is like there's so many dimensions where it's just not written anything close to something that a patient could interpret. And then even the logic that's expressed, you can often have, you can say, well, you must have had a PI or an IMiD or you must have had a corticosteroid, or you must have had this type of drug, but not this. So you have these like arbitrarily complex logical expressions and just reading them. Just to be clear, like, I'm not a medical practitioner. It was difficult for me. It was lots of time spent with a large language model, you know, reading the text, looking up the stuff, trying to see what the large language model would say was a summary of it. And even that was typically not very good. So I would say that, that really it's difficult to impossible for a patient and it's even for a doctor, it's no matter which way you slice it, even if you're an oncologist, you're going to spend, if you're going to do a truly comprehensive search, you're going to spend a lot of time doing it.
[11:33]
C
So, Adam, you're describing two kinds of closely related problems. One is the definitions that currently exist, although they seem to be in a single repository, right? There's ClinicalTrials.gov, and it's all there. They're not computable and they're very difficult to interpret, even certainly for a novice and I would say even for an expert. Right. I mean, I've tried reading some of them when necessary. And like you, I do a lot of querying in my life and it's not easy. I think that, you know, you shared it with us, what you had to go through. So give us a sense what a typical patient has to do. So they're faced with this diagnosis, they're looking for a trial. What are they doing in practice?
[12:14]
B
Well, so it's interesting, you know, we talked about one of the things that, that we've done since we've started Cancerbot is give it away to a bunch of foundations, patient support orgs. So I've gotten to know a lot of them. And, and it's interesting because I was on a call from a lymphoma patient support org that was a fantastic presentation from a doctor at Stanford, and he did an amazing job. And at the end, people said, okay, well these drugs sound really interesting. How do I go find the relevant trial? And he actually said, go to ClinicalTrials.gov and the problem is that's just, it's. It's still incredibly daunting. Even if you sort of know the therapy that you want, because you don't necessarily. There's so many different synonyms for these things, like do they call it R-benda, you know, or do they call it rituximab bendamustine, or is it some. Or do they call it BR? And so you as a patient, you can't even really do the structured search because there's all these synonyms for the same thing. So I guess failing something like Cancerbot it is something like reading through ClinicalTrials.gov but reading all of the. Handling the different attributes, handling the different values that can be on the attributes and handling all the complex logic is still pretty daunting. And even if you're a doctor, I'm not saying doctors can't do it. You're just still going to spend a lot of time if you really want to do a truly comprehensive search. So what I've learned after starting Cancerbot and talking to a lot of doctors and patients and the nurse navigators, these wonderful nurse navigators that are in the patient support orgs that it typically is around things that they know, right? They know that there's this trial here. Is it the absolute best trial for that patient? Not necessarily, but they know it's probably as long as they're eligible, it's probably better than standard of care. So then they go to that trial and then they do their best to, you know, especially with the patient's support orgs, or if a doctor knows about it, they can go do their best to like, look at that one trial and just spend a fair amount of time figuring out, okay, this is what I know about the patient or the patient knows about themselves, and this is what's in the trial. And that works if you have this very small set that you know about. I don't think it actually works at all if you're trying to find, you know, the best option. Yeah.
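The synonym problem Adam describes (the same regimen or drug class written several different ways) is the kind of thing a lookup table can illustrate. The sketch below is an assumption-laden example, not anything described in the episode: the `SYNONYMS` table and the canonical labels are hypothetical, and a real table would need clinical curation, since short abbreviations such as "PI" are ambiguous outside a therapy context.

```python
import re

# Hypothetical synonym table mapping spelled-out and abbreviated forms
# to one canonical label. Keys are lowercase with single spaces.
SYNONYMS = {
    "pi": "proteasome inhibitor",
    "imid": "immunomodulatory drug",
    "r-benda": "bendamustine + rituximab",
    "r benda": "bendamustine + rituximab",
    "br": "bendamustine + rituximab",
    "rituximab bendamustine": "bendamustine + rituximab",
}


def normalize(term: str) -> str:
    """Lowercase, collapse whitespace, then map known synonyms to a canonical label."""
    key = re.sub(r"\s+", " ", term.strip().lower())
    return SYNONYMS.get(key, key)


# Three spellings of the same regimen collapse to one searchable label.
print({normalize(t) for t in ["BR", "R-benda", "Rituximab  Bendamustine"]})
```

Unknown terms pass through unchanged in this sketch, so a gap in the table shows up as an unmatched string rather than a wrong mapping.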
[15:03]
C
So, Adam, the. You've described very vividly that the eligibility criteria themselves are a mess and are not designed to be computable. And in fact, they're not standardized to any particular standard, which I think a lot of people would find surprising.
[15:16]
B
Right.
[15:17]
C
There is no standard for describing a clinical trial protocol. You know, I would agree with you that that's just a fundamental problem and that that sort of gap in standardization computability underlies a lot of the issues you're seeing. Now, some folks who claim they've solved the problem are various commercial trial matchers, right? So there are people who will call you up and say, I found a clinical trial for you or there are, you know, software vendors who will claim to do that for organizations. What's wrong with using them? Why, why don't they work?
[15:48]
B
So the, the, the. I, I'll just tell you my experience. So I tried over a dozen, and all of the suspects that you know, you probably know about, they simply don't. They generally only ask age, gender, disease, stage grade. Some of them, not all. Some of them do ask about prior therapy. The best of them ask about prior therapy, and that gets you so far. But they're generally not asking about all your lab values. They're not asking you about all the things that are there. And they're not asking necessarily, even if they are asking about prior therapy in the level of detail that you would need to know, especially things like what are the adjuvant and neoadjuvant therapies that are given with the primary therapy. So they're not really trying to flesh it out to that point. Now the way they handle it is they generally, and most of these have consultants that once you've established this, they say, okay, we'll get back to you. They get back to you and they say you have a match. The trials that were sent to me that I was told I was a match, I was unequivocally not a match for. So luckily, I didn't spend a lot of time, you know, going through that disappointing process where they, you know, then they present me to bring me to the trial owner or the clinical research coordinator, who then, you know, re-establishes, tries to fully establish my match. And I wouldn't have been, because I was able to know from my background, from my knowledge of my labs, diagnostics, biomarkers, that I wouldn't have fit those things. So that, that is another option. But they're not, because they're not doing that full match. They're relying upon this person in the loop, which maybe could theoretically work, but that person is going to have that same challenge, right? You've got this handful of attributes that, you know, there's this much longer list. Then they're going to have that same challenge of, you know, if they're really going to do a comprehensive search, finding it.
The other thing I want to emphasize in terms of, like, you could say, well, why do you need a comprehensive search? You know, I had radiotherapy, my standard of care in the UK, it was radiotherapy. And at some level you could say, if you get something better, if you get radiotherapy plus rituximab and radiotherapy plus bendamustine, isn't that better? The issue is that every patient has their own standard for what's good for them. There's many dimensions of what's a good trial. For some patients, it's benefit, which, to be honest, was most important to me. I felt like I had an obligation to my family to just do whatever I needed to, fly wherever I need to fly, take whatever patient burden I needed to to do it. Another patient is going to have a completely different idea of what's best. So, interestingly, my grandmother also died of lymphoma and, well, had lymphoma. She ended up dying. She had one round of chemotherapy she did not want to do any more. She did one round of chemo, she was in her 80s, she didn't want to do any more. So for her it would have been, is there something that's better than standard of care, which was chemotherapy at the time, that would have had low patient burden, and then there'll be another patient that says, okay, well, I just don't want to feel like a guinea pig, so I want the lowest risk. So every patient has their own computation of what's best, has their own framework for what's best, even though they might not express it that way. So that's why I think it's really important that patient, patients get a more comprehensive picture of what's available to them.
[19:51]
A
Yeah.
[19:51]
C
So Adam, I heard you describe part of a similar distinction as the difference between ranking trials and precision matching. Is that a same concept that you just described or is it, is there more, is there more to that framework? Right.
[20:06]
B
So it's interesting because. And that brings up sort of like a third option. There are several open source AI based matchers and they, they attempt to rank the trial for what's best for the patients. Most of them are using these small Data sets, generally TREC 2022, which is a set of trials across, you know, all conditions and diseases, but it's rather small. And so they'll take a patient vignette, so like 50 year old man, follicular lymphoma, bulky tumor, otherwise healthy. Give me rank these trials which are across all kinds of symptoms, on what's best. Now of course, what happens is the follicular lymphoma goes to the top and better, right? And then they say that the recommendation worked. If the highest, sorry, if the doctor labeled gold standard, which in this case, of course a doctor would have gone in and said for this vignette, the best one is the follicular lymphoma trial, of course. And then they'll say that the way to assess accuracy is if the doctor labeled gold standard is in the top N. Generally they do top five of the suggested trials. The problem is that doesn't say anything at all about whether you're actually eligible. It just doesn't. It, it's just saying like of these things, this is the highest rank. Now one would hope that a large language model that does this also, you know, if it gets two follicular lymphoma trials, it could look at, you know, which one are they actually eligible for? So like hopefully for me it would have done the treatment naive one. And the answer is it probably did, but it's not actually assessing true eligibility. And then back to your question of like, so is a dimension of that also like goodness? Ideally, if you separate out eligibility matching from patient goodness, you can do a better job at both. Right. You can say let's first find everything you're fully eligible for, right. And now let's rank or sort within that based upon what matters to you as a patient. 
Is that, is that risk, is it benefit or is it patient burden? So what's interesting is you could say, well, eligibility matching is sort of like the other approach than this ranking, I actually think if you do the eligibility matching to sort of call it to the things that you're absolutely eligible for, then you can now do the ranking in a more interesting way. You know, you're eligible for it, and you can say, well, let's just sort it by what matters for the patient.
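The distinction Adam draws here, between a top-N ranking metric and actual eligibility, can be made concrete with a small sketch. Everything in it is hypothetical (the trial identifiers, the gold label, the eligibility set); it only illustrates that a ranker can score a "hit" while still placing a trial the patient cannot join at the top, and that filtering on eligibility first leaves a cleaner list to sort.

```python
def hit_at_k(ranked_ids: list[str], gold_id: str, k: int = 5) -> bool:
    """Ranking-style metric: is the doctor-labelled gold trial in the top k?"""
    return gold_id in ranked_ids[:k]


def eligible_only(ranked_ids: list[str], eligible_ids: set[str]) -> list[str]:
    """Keep the ranker's order but drop trials the patient cannot enter."""
    return [t for t in ranked_ids if t in eligible_ids]


# Hypothetical ranker output for a treatment-naive follicular lymphoma patient.
ranked = ["FL-relapsed-01", "FL-naive-02", "DLBCL-03", "MM-04", "CLL-05"]
gold = "FL-naive-02"        # hypothetical doctor-labelled best trial
eligible = {"FL-naive-02"}  # hypothetical result of full eligibility matching

print(hit_at_k(ranked, gold))           # the metric counts this ranking as a success
print(ranked[0] in eligible)            # yet the top-ranked trial is not one the patient can join
print(eligible_only(ranked, eligible))  # eligibility first, then rank what remains
```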
[23:07]
A
So, Adam, you know, everything you're saying, you know, you know that I've worked in the life sciences for many, many years, and we've talked at length about the fact that the life science companies are equally frustrated with this system. You know, clinical trial. It's not that infrequent that a clinical trial will fail because they can't recruit enough patients to the trial. So there's this problem that has evolved into this system, which is the clinical trialists need to find patients, the patients want to find the trials. And that intersection is not overlapping nearly as well as it should. And that means that 20 years ago, in a trial recruitment statistic, which is called per patients per trial, per site, patients per month per site, how many patients are recruited into the trial at a given site? A tough trial used to be about 0.5 psm. However, with the advent of clinical trials in the personalized medicine space, where genomics are involved and things like that, those statistics have dropped down to something well south of that in the 0.001 or 0.0015 or 0.002. And when you get down into that kind of a range, that means everybody's having trouble and everybody's frustrated. So that takes us right into, like, the nuts and bolts of what Cancerbot does. Why don't we unpack that a bit and tell us, how does Cancerbot try to bridge that gap? Because it's a big, wide gap right now.
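To put the recruitment rates quoted above in perspective, here is a back-of-the-envelope calculation. The trial size (100 patients) and site count (50) are hypothetical, chosen only for illustration, and the model assumes every site recruits at the same steady rate, which real trials do not.

```python
def months_to_enroll(target_patients: int, sites: int, rate_per_site_per_month: float) -> float:
    """Months to reach the enrollment target if all sites recruit at one steady rate."""
    return target_patients / (sites * rate_per_site_per_month)


# Hypothetical trial: 100 patients across 50 sites.
older_rate = months_to_enroll(100, 50, 0.5)      # 0.5 patients per site per month
current_rate = months_to_enroll(100, 50, 0.002)  # upper end of the quoted current range

print(f"{older_rate:.0f} months vs {current_rate:.0f} months")
```

Under these assumptions the same trial goes from roughly 4 months of enrollment to roughly 1,000 months, which is why rates in that range translate into trials that cannot complete recruitment.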
[24:38]
B
So in terms of the mechanics for, like, how Cancerbot's doing this, of course, the, the core of it is this instead of just a database with these trials that have a few structured attributes, like, you know, what's the disease and what's the city that it's in, and what's the basic intervention. We want much deeper structure that gets to all the patients. So what we want is a database for all, for, for. For each disease, because each disease is actually going to have a different set of eligibility attributes. So it's not like one size fits all, like every disease has all the same attributes. So we do some analysis, sort of a heat map of trial eligibility for each disease, and then effectively we have a schema for each disease and what makes it eligible. And so what we need to have is a database of all the world's follicular lymphoma trials. There's 84 attributes that are commonly used in follicular lymphoma eligibility. And, and so the challenge then is to get this unstructured eligibility text, the wall of text, and take apart from that wall of text, fill in each of the 84 attributes. Now the sort of naive solution is let's just take the trial text and give it a prompt to an LLM and say to the LLM, like here, fill in all these attributes. And it turns out it doesn't work very well. In fact, it's not particularly accurate at all. Like it's probably in the 60% accuracy range, little better than random. So the way that we do it, in full disclosure, we've already got our utility patent on this, so I'm comfortable explaining this. We have this thing called a prompt workbench. So we have these, a team of subject matter experts. And to get these 84 attributes for follicular lymphoma, for each one of the attributes there's sort of a default prompt like what is the minimum serum bilirubin level? Or what's the, you know, what is the minimum absolute neutrophil count? 
Or, or like you actually look at the potential prior therapies and you could say, and this is actually not just one attribute, you're going to have multiple questions that you ask around prior therapy like, is it required to have a proteasome inhibitor? Is it excluded to have a proteasome inhibitor? Is it required to have a immunomodulatory? And you have to do that at the therapy types level. You have to do it potentially at the, it could require specific components. So you could say, is it required to have bendamustine? Is it excluded to have bendamustine? Is it required to have rituximab? So it's actually way more than 84 questions. So we have a set of like default prompts that we ask for all of these things. But then it turns out to handle all the variability of vocabulary that the LLM is not able to like handle it itself. You really need these hand tuned prompts for every attribute. And so the prompt will go on for pages potentially now to reassure you, of course, this is just done once for every disease and we give the subject matter experts, the medical subject matter experts, the ability to immediately test that against the trial at hand or a potentially a small subset of trials. So you're not, you know, engineering these prompts for every Single trial, right. It's just once per disease. And then for each one of the attributes and each one of the questions that you have, and then the subject matter experts can say, okay, go ahead and extract all the attributes here, they can see the results and then fine tune the prompt further and that yields above 90% accuracy.
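The way one therapy attribute fans out into many required/excluded questions, each with its own tuned prompt, can be sketched as follows. This is a hypothetical illustration, not the patented prompt workbench: the function names, the question wording, and the guidance text are invented, and no LLM call is shown because the episode does not say which model or API is used.

```python
def therapy_questions(names: list[str]) -> list[str]:
    """Expand each therapy type or drug into a 'required' and an 'excluded' question."""
    return [
        f"Is prior {name} {mode} by this trial?"
        for name in names
        for mode in ("required", "excluded")
    ]


def build_prompt(question: str, tuned_guidance: str, eligibility_text: str) -> str:
    """Combine the default question, expert-written guidance, and the trial text."""
    return (
        f"Question: {question}\n"
        f"Guidance from subject matter experts: {tuned_guidance}\n"
        f"Eligibility text:\n{eligibility_text}\n"
        "Answer yes, no, or not stated."
    )


# Hypothetical inputs: two therapy types and two specific drugs.
names = ["proteasome inhibitor", "immunomodulatory drug", "bendamustine", "rituximab"]
questions = therapy_questions(names)
print(len(questions))  # four names expand to eight questions

prompt = build_prompt(
    questions[0],
    "Treat 'PI' as a synonym for proteasome inhibitor.",  # hypothetical tuning note
    "Patients must have received a PI and an IMiD.",      # hypothetical trial text
)
print(prompt)
```

Keeping the guidance separate from the default question mirrors the once-per-disease tuning described above: the question list is generated mechanically, while the guidance is the part experts would revise and re-test.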
[28:48]
A
So, sorry, as a patient, what is the... So you're getting into the nits of how it works on the back end. Let's say...
[28:56]
B
So how does the patient use it?
[28:58]
A
What does the patient see? I mean, the patient doesn't see all that complexity.
[29:00]
B
Hopefully not. So what the patient does is either they upload their EHR (we support Epic MyChart, and we're in the process of adding other EHRs), or we ask a handful of questions. More than just, you know, age, gender, disease, stage, grade. So it's age, gender, disease, stage, grade, but also things like what was the prior therapy. Roughly a dozen questions, right? And it's going to vary per disease. And that's it. Then we have a partial patient record. And from that we show you the list of trials, in order of patient goodness, because first we ask you what's important to you: risk, benefit, or burden. But we also have match eligibility scores for each one, because you may not be fully eligible from just those dozen questions, and we actually show you what things are missing. So typically what happens is it's sorted by goodness by default, and you go to the best trial for you, you look at it, and it shows you all the things that you're matching on in green, right? And then it shows you the stuff you're missing. And when you say, okay, I'm interested in the trial, then the first thing we do is say, okay, can you answer these three questions, so that we can make you 100% eligible? But even if you don't, you can still register interest. Let's say that you're missing three questions and you're an 85% match. At least we're not showing you anything that you're not eligible for, so you're potentially eligible for it. And once you express interest, we introduce you to the trial owner or the clinical research coordinator of the trial. They of course will, you know, re-ask you those questions, which they need to do. But the chances of you being eligible are much higher, even if you didn't quite get everything.
Maybe you were missing a lab; you didn't have your LDH levels and it's a follicular lymphoma trial. You drastically reduce the chances of both sides being disappointed, right? You're disappointed because you've delayed your standard of care because you've looked at the trial. The trial owner is disappointed because they spent all this time to re-vet you and it turns out you're grossly not a fit. So the experience is a much better experience for both sides, even in the case where you didn't quite fill out absolutely everything that you needed to get that match.
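The patient-side flow described here (a partial record, trials the patient clearly fails are hidden, the rest sorted by goodness with an eligibility score and a list of what is missing) could be sketched as follows. The `Criterion` class, the numeric thresholds, and the per-trial goodness numbers are assumptions made up for the example; the episode does not say how Cancerbot computes goodness or scores.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Criterion:
    attribute: str
    minimum: Optional[float] = None
    maximum: Optional[float] = None


def check(criterion: Criterion, patient: dict) -> str:
    """Classify one criterion as 'met', 'failed', or 'missing' for this patient."""
    value = patient.get(criterion.attribute)
    if value is None:
        return "missing"
    if criterion.minimum is not None and value < criterion.minimum:
        return "failed"
    if criterion.maximum is not None and value > criterion.maximum:
        return "failed"
    return "met"


def match(criteria: list, patient: dict):
    """Return (score, missing attributes), or None if any criterion clearly fails."""
    statuses = {c.attribute: check(c, patient) for c in criteria}
    if "failed" in statuses.values():
        return None
    met = sum(status == "met" for status in statuses.values())
    missing = [name for name, status in statuses.items() if status == "missing"]
    return met / len(criteria), missing


def rank(trials: dict, patient: dict) -> list:
    """Drop trials the patient fails, then sort the rest by goodness, best first."""
    results = []
    for trial_id, (goodness, criteria) in trials.items():
        outcome = match(criteria, patient)
        if outcome is not None:
            score, missing = outcome
            results.append((trial_id, goodness, score, missing))
    return sorted(results, key=lambda row: row[1], reverse=True)


# Partial patient record: no LDH value yet.
patient = {"age": 62, "anc": 1.8}
trials = {
    # trial id: (hypothetical goodness, eligibility criteria)
    "trial_a": (0.9, [Criterion("age", minimum=18),
                      Criterion("anc", minimum=1.0),
                      Criterion("ldh", maximum=250)]),
    "trial_b": (0.95, [Criterion("age", maximum=60)]),  # patient is too old: hidden
    "trial_c": (0.5, [Criterion("age", minimum=18)]),
}
ranked = rank(trials, patient)
for trial_id, goodness, score, missing in ranked:
    print(trial_id, f"match={score:.0%}", "missing:", missing)
```

The `missing` list is what would drive the follow-up questions: answering the LDH question would move `trial_a` from a partial match to a full one, or rule it out.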
[32:04]
A
So as you're building this scaffolding and the patients are interacting with it, you know, you mentioned there's these really long prompts that are very complicated that are on the back end. What are the guardrails you're putting in place to ensure that there's no errors made, no hallucinations coming out? I mean, one of the challenges here is that, you know, many of these models are geared to, you know, kind of be a little sycophantic and they're trying to always please. And if it doesn't have information on a given topic, it can serve something that it thinks might be the next best thing or whatever it would want. But you've actually put some guardrails in place. Maybe you can unpack that a little bit.
[32:43]
B
So yeah, there's two areas. One is, because you have this differing vocabulary that the large language model is often not even aware of the nuances of, you need these subject matter experts to engineer the prompts. This is why these prompts are so long. You need to engineer, like, when I say it must have had an immunomodulatory drug, by the way, you can also look for an IMiD, or you can look for any of these things, right? And so you are drastically reducing the chance of hallucinating on the vocabulary. Then there's a second aspect, of trial complexity. So one is the language, right? The language of the attribute, the language of the attribute values. The second one is the logic of the trial. These trials can be very complex, especially in the example of prior therapies, where they can say, well, you had to have this and this and this, or you can have this and not this, but you must have this as well. And as that logic gets really complex, this is still an Achilles heel of LLMs: complex reasoning. So what we do is we transform the logic. The jargon for it would be conjunctive normal form, but that's a fancy term for transforming the logic using De Morgan's law into essentially a laundry list that's actually longer. So it's a laundry list where everything at the top level is ANDed together, and you push any OR conditions down to, I call them named OR conditions, a piece of jargon that expresses the OR condition. So in multiple myeloma, and Steve, we did some work before in this area when you were at MMRF, the two big acronyms are meets CRAB and meets SLiM. And it turns out those are named OR conditions: in the case of meets CRAB, hypercalcemia, renal insufficiency, anemia, and bone lesions. Meets SLiM is an OR condition that includes having multiple lesions and puts a bar on the serum light chain ratio. What was interesting, though, is when we said, okay, we're going to always transform the logic to essentially a laundry list of AND conditions,
and then any OR conditions get pushed below that and we'll just have a piece of jargon, a name for what this OR condition is, every time, there was already a medical term for those OR conditions. Meets CRAB, meets SLiM, even things like, do you have renal or hepatic or hematological sufficiency. If you drill into what any of those things mean, there's usually an OR condition within there. So transforming the logic into a consistent representation, one, makes it very understandable for patients what they have to hit, or for doctors, you know, what the criteria are. And so we show: this is what the trial requires, this and this and this and this. And then we show what the patient has, and it makes it much clearer. So those two things: having these elaborate prompts that sort of over-communicate what you need to be true to fit an attribute, and the second thing is transforming the logic that's in there to be, as I've heard you call it many times, Steve, computable, which may mean making it longer. So it turns out you can make it longer by always transforming it to a set of AND conditions. But the key thing is, even though it's longer, it's still more understandable.
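The transformation described in this answer is the textbook conversion to conjunctive normal form: push negations inward with De Morgan's laws, then distribute OR over AND, leaving a top-level list of clauses that are all ANDed together, each clause being a small OR group. The sketch below is a generic version of that conversion, not Cancerbot's code; the tuple encoding of expressions and the example criteria are assumptions chosen for illustration.

```python
from itertools import product

# Expressions are atoms (strings) or tuples: ("not", e), ("and", e1, ...), ("or", e1, ...).


def to_nnf(expr, negate=False):
    """Push negations down to the atoms using De Morgan's laws."""
    if isinstance(expr, str):
        return ("not", expr) if negate else expr
    op, *args = expr
    if op == "not":
        return to_nnf(args[0], not negate)
    if negate:  # not (a and b) == (not a) or (not b), and vice versa
        op = "or" if op == "and" else "and"
    return (op, *[to_nnf(arg, negate) for arg in args])


def _clauses(expr):
    """Turn a negation-normal expression into a list of OR-clauses."""
    if isinstance(expr, str) or expr[0] == "not":
        return [[expr]]
    op, *args = expr
    parts = [_clauses(arg) for arg in args]
    if op == "and":
        return [clause for part in parts for clause in part]
    # "or": distribute over the ANDs, one merged clause per combination.
    return [sum(combo, []) for combo in product(*parts)]


def to_cnf(expr):
    """Return the 'laundry list': clauses ANDed together, literals ORed within each."""
    return _clauses(to_nnf(expr))


def holds(literal, facts):
    """Evaluate one literal; unknown facts count as False in this sketch."""
    if isinstance(literal, str):
        return facts.get(literal, False)
    return not facts.get(literal[1], False)


def unmet(clauses, facts):
    """Clauses in the checklist that the patient does not satisfy."""
    return [clause for clause in clauses if not any(holds(lit, facts) for lit in clause)]


# Hypothetical eligibility logic for illustration only.
criteria = ("and",
            "measurable_disease",
            ("or", ("and", "prior_rituximab", "prior_bendamustine"), "prior_lenalidomide"),
            ("not", ("or", "cns_involvement", "active_infection")))

checklist = to_cnf(criteria)
for clause in checklist:
    print(clause)

patient = {"measurable_disease": True, "prior_lenalidomide": True, "active_infection": True}
print("unmet:", unmet(checklist, patient))

# A named OR condition is just one clause with a label, e.g. the CRAB criteria.
meets_crab = ["hypercalcemia", "renal_insufficiency", "anemia", "bone_lesions"]
print("meets CRAB:", unmet([meets_crab], {"anemia": True}) == [])
```

The three nested criteria become five flat clauses, which is the "longer but easier to check" trade-off discussed in the episode: each row can be ticked off independently, and any multi-literal row is a candidate for a name such as meets CRAB.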
[36:48]
A
Sure.
[36:50]
C
So, Adam, this was a very rich description. I want to make sure we unpack it just a little bit for our audience. There are two things you said that are worth noting. One is that you've developed an iterative process where SMEs inject disease-specific knowledge into prompts. That's a really elegant design pattern and one that I think is essential to understand for anybody who's building disease-specific clinical or research systems. Right. You just have to.
[37:17]
B
It probably applies way beyond that. That technique could apply way beyond clinical trials.
[37:21]
C
Yeah, yeah, I mean, I think it's an architectural pattern that's really essential to understand. And then the second piece is the observation that the eligibility logic is so frightfully complex it boggles the human mind.
[37:37]
A
Right.
[37:37]
C
Which certainly has been my experience trying to read it. You're like, I gotta draw a diagram to understand what the hell this just said.
[37:44]
B
Right.
[37:45]
C
But the beautiful solution is, you know, you've relied on De Morgan's law, which for our audience says that you can unpack any complex Boolean statement into a series of AND and OR statements that are like a laundry list that you can check off, that just becomes longer.
[38:02]
B
Right.
[38:02]
C
That's a trade-off. And, by the way, that observation is used in other places in basic clinical decision support. But I think where your contribution is unique and really worth noticing is you've noted that that's a UI advantage, that's a user experience advantage, not just a computability advantage. Right. Because people usually say it makes it easier to process, right? But it also makes it easier for humans and LLMs to think about, and that's really important.
[38:30]
B
Right.
[38:31]
C
Like, and the other really interesting observation that you made is that most complex conditionals, right, the rows in the conjunctive normal form, wind up being nominalized in domains, which is really consistent with the way cognitive scientists think about chunking. Right. If you have a concept that's very complex but it's used often, you're going to wind up with a noun phrase that just describes it and people will be like, oh, it's that thing, right?
[38:54]
B
Yeah.
[38:54]
C
And then we can talk about it and everybody understands what you mean, and you can treat it as one item in the laundry list. I just think that's really, really important. And it's one of the pieces I think is missing on the clinical trial side: the trialists that Steve works with on the pharma side don't understand how to take that very complex logic and translate it into the laundry list. So I just wanted to unpack that a little bit, and let's tie it back to the work that you're doing at Harvard. All three of us are involved in the DCI Network. And by the way, I just want to call out and thank you for contributing. Everything that you've done is open source that's now available for public use, and contributing to public good projects, including the one you're doing in DCI. There we're looking at goodness of fit, right? Not just is this a good match, but is this good for you? Can you talk a little bit about the project and the work that's going on at DCI?
[39:54]
B
Yeah, I think the first step is... So we want to use this software for clinical trial matching, including encouraging other matchers to start doing precision approaches. And so we're trying to create a test bed really for all matchers, not just for Cancerbot. The first step has really been this: let's create a long-term patient database that has room to store all of this stuff. So we extracted the patient info database that we had in Cancerbot as its own project. It's actually called ctomop, and that's open source. And OMOP is an incredible achievement from the OHDSI group, to have this database for analyzing patient conditions. It's a fantastic basis for doing this kind of matching. It's missing two things. It's missing some of the things that show up in trial eligibility. As an example, even though there's a genomics extension and an oncology extension, where they've done an amazing job, it doesn't have, like, the name of the assay. And a lot of trials rely on the name of the assay. So there's this gap analysis of, you know, what needs to be added to that. So that's in ctomop, these sort of missing things. The other thing that's there is, in order to do fast matching, doing all the joins in the normalized OMOP database is just too slow. So there's a flat version, we call it the patient info table, so that when you are trying to compute those matches, you can do it really quickly. But that's all out there, open source. And, you know, I commend the vision that you two and Yuri Quintana had, that you really need to have this long-term patient database to do a good job of it. It certainly influenced what we built at Cancerbot, and so we were happy to take that part of it. We also sometimes refer to it as the OMOP plus database. Take the OMOP plus database and make it available to anyone.
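The point about a flat patient info table can be illustrated with a toy example using Python's built-in `sqlite3` module. The `person` and `measurement` tables below are a heavily reduced subset of OMOP-style tables; the concept IDs 9001 and 9002 are placeholders rather than real OMOP concepts, and the `patient_info` columns (`anc`, `ldh`) are hypothetical and not taken from the ctomop project. The sketch only shows the general idea of paying for the joins once, at load time, so that matching becomes a single-table filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Reduced, OMOP-style normalized tables (most columns omitted).
CREATE TABLE person (person_id INTEGER PRIMARY KEY, year_of_birth INTEGER);
CREATE TABLE measurement (
    person_id INTEGER,
    measurement_concept_id INTEGER,
    value_as_number REAL
);

INSERT INTO person VALUES (1, 1964), (2, 1980);
-- 9001 = absolute neutrophil count, 9002 = LDH (placeholder concept ids).
INSERT INTO measurement VALUES (1, 9001, 1.8), (1, 9002, 210.0), (2, 9001, 0.7);

-- Flat table: one row per patient, one column per eligibility attribute.
CREATE TABLE patient_info AS
SELECT p.person_id,
       p.year_of_birth,
       MAX(CASE WHEN m.measurement_concept_id = 9001 THEN m.value_as_number END) AS anc,
       MAX(CASE WHEN m.measurement_concept_id = 9002 THEN m.value_as_number END) AS ldh
FROM person AS p
LEFT JOIN measurement AS m ON m.person_id = p.person_id
GROUP BY p.person_id, p.year_of_birth;
""")

# Matching is now a filter on one table, with no joins at query time.
eligible = conn.execute(
    "SELECT person_id FROM patient_info WHERE anc >= 1.0 ORDER BY person_id"
).fetchall()
missing_ldh = conn.execute(
    "SELECT person_id FROM patient_info WHERE ldh IS NULL ORDER BY person_id"
).fetchall()
print("eligible on ANC:", eligible)
print("missing LDH:", missing_ldh)
```

A gap such as the assay name mentioned above would, in a layout like this, simply be an extra column on the flat table, alongside whatever extension the normalized side needs.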
[42:17]
C
Yeah, for our audience: OMOP stands for Observational Medical Outcomes Partnership. It's a data standard that's used extensively in clinical research and for data interchange in clinical research. So what you've built sounds really exciting and sensible. Do we have any idea how accurate it is? I know you presented at ASH in 2025 and BSH has a paper out recently. So what are the numbers showing so far?
[42:44]
B
Yeah, so what we do, of course, is we're looking at all the attributes and we're saying, okay, as we infer an attribute in a trial, how often is that correct? So that measures our precision. And then out of everything that's in the eligibility text, how much of it did we get? So we did a paper for ASH 2025; we were 85% accurate. But of course, we're constantly improving and getting better at formulating these prompts with the subject matter experts. And so we did another paper and that was assessing follicular lymphoma. And then we did another paper for BSH, the British Society for Haematology, 2026, that was just accepted and will be presented in April. And that was for multiple myeloma. It was 90%. And now we're doing another effort with breast cancer trials and I think it's looking like it'll be above 90%. And then of course, DCI Network is kind enough to do their own assessment in parallel. And one would hope, you know, I would assume that the accuracy numbers will be relatively close. And we think that both of those studies will come in above 90%. Don't know exactly what that number is yet.
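The two measurements described here correspond to the standard definitions of precision (of the attributes the system extracted, how many are correct) and recall (of the attributes actually present in the eligibility text, how many were captured). A minimal sketch, assuming attributes are stored as simple key/value dictionaries; the attribute names and values are invented, and this is not the evaluation code behind the ASH or BSH figures.

```python
def precision_recall(extracted: dict, gold: dict) -> tuple[float, float]:
    """Compare extracted attribute values with expert-labeled (gold) values."""
    correct = sum(1 for name, value in extracted.items() if gold.get(name) == value)
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Made-up example: the extractor got one value wrong and missed one attribute.
gold = {
    "min_age": 18,
    "min_anc": 1.5,
    "max_bilirubin": 2.0,
    "prior_rituximab_required": True,
}
extracted = {"min_age": 18, "min_anc": 1.0, "max_bilirubin": 2.0}

precision, recall = precision_recall(extracted, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Reporting both numbers matters because a cautious extractor can score high precision while leaving many attributes unfilled, which would show up only in recall.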
[44:01]
C
Yeah, but it's fantastic that you guys are measuring it this quickly and getting rapid results. It's very encouraging. You know, we're all applauding and, you know, keeping our fingers crossed for good results. I just want to call out that you guys made a really important choice in settling on OMOP as the base model, and, you know, all indicators are that's a really good practice. Right. You start with something that already exists and build on top of it or extend it as necessary.
[44:31]
B
Something that doesn't get enough airplay: OMOP is an absolutely stunning accomplishment, and we wouldn't have gotten as far as we did without being able to sort of stand on the shoulders of those giants, so to speak, of the OMOP creators.
[44:45]
C
Yeah. So shout out to the OMOP and the OHDSI teams, and also to the general principle, which is very difficult to implement: don't invent your own standard, dude. Just don't. There's an XKCD cartoon that everybody shows at every conference where informaticists are involved. Right. That I can't do on air. But there's a big advantage to actually researching what's out there and figuring out how to build on top of it. So you guys are already supporting follicular lymphoma, you've got multiple myeloma, you've got breast cancer now. What else is on the roadmap? How do you see this expanding?
[45:23]
B
Well, we have a partnership with Blood Cancer UK and we've committed to them to just work our way through all the blood cancers. So we're doing CLL next, so that should maybe, by the time this airs, that should be on the site. And then we're doing mantle cell lymphoma, both at the request of Blood Cancer UK and another foundation that is about to start a teaser there very soon that is focused on mantle cell lymphoma. So mostly blood cancers. But we definitely want to hear from patient support orgs that want this. And if you are supporting a lot of patients and maybe you're in an area that doesn't have particularly good standard of care and there's a lot of clinical trial innovation, you know, if you sort of. If you're a patient support org or just a clinician and you want to make the case that maybe we change the order up a little bit because there's poor standard of care, but interesting trials, we're definitely open to it.
[46:27]
A
So to that end, Adam, why don't you let the audience know how they can find you and how they can actually affect that order, if you will.
[46:37]
B
Yeah, just adam@cancerbot.org. Would love to hear from you.
[46:40]
A
And cancerbot.org is also the website, right?
[46:43]
B
Yes, yes, absolutely. Yes.
[46:45]
A
And you know, for researchers who want to get involved with this, they also use the same entry point.
[46:52]
B
Yeah, we actually have different roles. Today, we have the role of patient, but you can also sign up as a physician, so as a clinician you can manage your patients. You can also sign up as a nurse navigator and sort of manage your patients from there. And I just recently found out, we've talked to lots of researchers about how we can help them fill their trials, and realized that we also need to have a researcher role so we don't have to have the live conversation. And then, oh, sorry, there is the researcher role. If you sign up for the researcher role, then you will see all the trials that you are the registered owner of. But an enhancement that we're also doing is, maybe you aren't listed as the registered owner and you just want to say, look, I want to see what's going on in this trial, so we'll be adding that capability as well.
[47:50]
A
So, Adam, I just want to say that you and I met from a whole different world. I was at the MMRF and you were working on this, and you're solving a problem that both sides of the equation really need solved. From the clinical trial perspective, this remains one of the worst challenges in the whole stack and the ability to get patients both interested and to navigate the complexity to get, you know, to know that they're actually highly likely to be a good match, and then connect them right up to the clinical trial manager to finish that last mile and do it well with a high degree of robustness. That's really changing the way this is working very much to the positive. I hope that you can continue to get all the input that you need and the support that you need to bring this forward. And by the way, to our audience, Adam's doing this as a not for profit. So if you guys want to donate to this effort, I'm sure he'd be happy to have the help, whether it's in kind or whether it's actual dollars. But feel free to contact him because I think he's doing God's work here. He really, really is. Leon, I'm going to hand it back to you to land.
[48:59]
C
Thanks, Steve. And Adam, it's just been a wonderful conversation, and I really appreciate you sharing both your personal journey as a patient and your journey as an entrepreneur and innovator in the space. You decided to take a problem life threw on top of you and turn it into something that can help humanity and, you know, help all of us. So I really admire what you've done. And I also feel like we've pulled out some gems in the course of this conversation. We talked about scaffolding around AI for safety. We talked about making complex logic cognitively tractable for humans and LLMs, which I think is a really important problem to solve. And we talked about a design pattern of, you know, an innovation flywheel of injecting clinical expertise into prompts that I think is an architectural pattern that everybody building systems should understand. So, thank you so much for joining us today. I really thought it was terrific to hear from you and we look forward to hearing more about the successes of Cancerbot. Thanks also to our audience. We hope you enjoyed the episode, and join us again next week for another exciting episode of Practical AI in Healthcare.
[50:15]
A
Thank you for joining us this week on Practical AI in Healthcare. If you're ready to go beyond buzzwords and hype and explore how AI is truly transforming healthcare, stay tuned for more conversations that get us to what works. Until next time, stay practical.