
B
Hello and welcome to the Emerging Litigation Podcast. I'm your host, Tom Hagy. This is part one of a series that we are calling Agentic AI on Trial: AI Liability and Legal Risks. And our first one is going to deal with insurance claims: efficiency, denials and ethical challenges. AI, of course, as everybody knows, this is no headline: AI is transforming a lot of businesses, certainly changing quite a bit. There's a lot of legitimately scary stuff out there, I would say, in the use of AI. Just watch your social media feed. But it's also just powerful and does so many good things. With insurance, it's no longer optional for insurance companies; it's an imperative to remain competitive. AI now drives claims processing, fraud detection, and even coverage decisions. Intelligent document processing and natural language processing allow insurers to scan and interpret claims in minutes. Predictive analytics tailor policies to customer behavior. But speed comes with controversy. Algorithms at one very large insurance company reportedly denied 300,000 claims in two months. That's a lot of denials. And another insurance company, they're going to remain nameless here just for the heck of it, allegedly terminated care prematurely with a 90% error rate. Who would accept a 90% error rate? No teacher I ever had would. And I know lawsuits and new state laws are pushing for human review as a requirement. I think that's a given. Certainly in my business, you have to have a human looking it over. It's powerful, but you want to keep your eyeballs on things. Today we're going to explore efficiency versus ethics, transparency, and liability in automated denials. Those are issues that come up in almost every aspect of the use of artificial intelligence. Our panel comprises Galina Datzkovsky, PhD, a business strategy advisor. With her is Marina Kaganovich, an attorney and compliance advisor at Google.
B
And then we have an actual judge, Judge Lisa Walsh of Florida's 11th Judicial Circuit. I want to thank Katherine Ratigan, who is a partner with Robinson and Cole in Providence, Rhode Island. I've had the pleasure of working with her on legal issues with regard to emerging technologies like drones and other things, and so it's a pleasure to work with her again. She pulled this podcast series together for us, and she's also going to give a more detailed introduction of our guests, which is necessary. So with that, here is episode one of Agentic AI on Trial: AI Liability and Legal Risks. I should also say that the opinions expressed here are those of the presenters, not their clients, and have nothing to do with any cases that may be before them. It's strictly their opinions. So with that disclaimer, I hope you enjoy it.
C
We have three leaders in legal, compliance and technology. We first have Galina Datzkovsky. She's an internationally recognized authority on compliance, information governance, AI and data analytics. She has a PhD in computer science and really deep expertise in AI. She advises a lot of organizations on business strategy and serves on a lot of boards. It's really interesting to hear her background, and she has a lot of great knowledge to share with this community. She's joined by Marina Kaganovich, who's an attorney and compliance advisor at Google. She also specializes in AI governance, cybersecurity, risk management and data privacy. She works with a lot of executive leadership teams on how to secure cloud migration in a compliant way in an evolving regulatory environment. She's really knowledgeable in that. She runs global cross-functional programs for a lot of different organizations. As an advisor, she's part of the ARMA International board. She's really a thought leader in this space as well. And then we're also joined by Judge Lisa Walsh, who is a circuit court judge in the 11th Judicial Circuit of Miami-Dade County, Florida, where she's an administrative judge of the appellate division, and she participates in the international arbitration division as well. An interesting fact: she's done over 100 jury trials, both criminal and civil. She has a lot of appellate decisions out there, lots of experience, and she brings her judicial experience and insight into this discussion. So they're really a great group, and it's kind of interesting because they converge on law, technology, governance and justice. All their perspectives are really interesting to put together in this You Be the Judge podcast.
D
Okay, so I'd like to introduce our session and discuss a little bit the evolution of AI, which is rapidly progressing beyond assistants and copilots toward autonomous agents and, ultimately, interconnected ecosystems of specialized agents operating without direct human intervention. In this session we will explore these evolving risks from a legal perspective, focusing on key areas including ecosystem risks, user reliance and trust, legal liability, security, privacy, accountability and other aspects. In the format of You Be the Judge, we'll present hypothetical cases to our judge and cover the issues as if they were actual court cases. But before we do that, I would like to define agentic AI. For those of you who know, this may be a little bit repetitive, but agentic AI refers specifically to artificial intelligence systems that possess agency, which is really the ability to act independently, make decisions and pursue goals with minimal human oversight. To dig into that a little: they set and pursue goals autonomously. These agents adapt to changing environments by themselves. They make decisions based on context and feedback. They might also use external tools to execute complex tasks. And finally, if I had to summarize it, agentic AI is proactive, strategic and action oriented. Now, in comparison, it's kind of funny to talk about traditional AI versus agentic AI, because what's traditional AI? We haven't been doing AI for that many years. But at any rate, it's worth noting that AI has been in use for quite a while, and we've had what we would call traditional AI and machine learning. Traditional machine learning tends to be task specific, static and rule based. And where have we seen that? For example, in various robotic systems, which we've had for quite a while, and in expert systems in specific areas. So we've had quite a bit of experience with AI in the traditional sense.
Now, what's changed is generative AI models. And what does that mean? Generative AI models respond to prompts. They can actually be creative; they can write their own things. They cover not just text or just vision or specific areas; they actually work with text, images, code generation, et cetera. So generative AI really pushed the field of AI forward, beyond specific vertical uses and into the mainstream. Now with agentic AI, as we already defined, we're taking it a step further and making AI actually autonomous, able to act independently. And this is a very important thing to keep in mind for the rest of this podcast and for the cases we are going to be looking at: the AI agent is acting independently, potentially making decisions, taking actions, et cetera. Keep that in mind. I'm sure the judge will be commenting quite a bit on that. And with that, let me turn it over to my colleague Marina for the next segment and our first scenario.
E
Thanks so much, Galina.
A
Hi everyone. Building on Galina's introduction, it's really an exciting opportunity for us to talk about the evolution of AI. As she mentioned, machine learning has been around for quite some time and has been used in the medical field, particularly in the radiology context, quite extensively up till now. So one of the scenarios that we wanted to explore a little bit further is how this changes with the use and integration of AI agents into a workflow. Look at an example, or scenario, where we have an autonomous AI agent that screens mammography scans and then ranks them as either positive or negative based on its training. The agent is programmed to only send, let's say, positive screens for secondary review to a physician, a radiologist in this case. Those patients who do have a positive screen would also receive a letter from the agent indicating that a follow-up with the physician is needed, and the agent would then proceed to schedule the follow-up appointment. The patients, on the other hand, who receive negative results from the AI agent would just receive a letter saying, hey, you're all clear, your scans have been reviewed, no issues have been identified, and then no further action is taken. This type of scenario is quite realistic, because we already have AI quite heavily involved in mammography screenings and assessments. But we're taking it one step further and integrating an autonomous agent into the mix. So that, at a very high level, sets out the scenario. Let's now assume that we have a situation where this AI agent, for whatever reason, misreads a patient result, and the result is indicated as being negative and no further action is taken, when in reality the patient's scan is positive. And so no follow-up is scheduled.
And according to the workflow that I set out a moment ago, the patient's scans are not sent to a radiologist for review, because when the AI identifies that a scan is negative, it determines that no further action is required, and the patient is essentially advised of that in the form of a letter. So in this particular scenario, the patient receives the all-clear letter and determines not to take any further action. At some point later, the patient discovers that they have cancer, but it's identified only at a much later stage, and at that point it's metastasized. So we find ourselves in a situation where, because of the use of AI in this workflow and the lack of physician interaction in this case, the patient files suit. We wanted to break it down and look at the scenario from a few different perspectives and, of course, hear from Lisa in terms of the legal aspects of how this would be assessed at trial. So the first question I would pose, keeping in mind that these scenarios are intentionally crafted at a high level and the outcomes will of course be very dependent on facts and circumstances: for the purposes of our discussion, let's assume that the AI agent missed identifying the positive result because the screening approach that was used was under-calibrated. Had the sensitivity been set up to over-screen, it would have caught negative screens as false positives, so there would be more noise in the system, but on the flip side, positive results would also have been less likely to be missed. With the patient in this type of scenario pursuing legal action, who do we think they would go after, and who would be liable? Lisa, what are your thoughts?
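The routing rule in this hypothetical can be sketched in a few lines of Python. This is purely an illustrative sketch of the workflow as described in the episode; the `Scan` and `triage` names are invented for this example and not taken from any real clinical system.

```python
from dataclasses import dataclass

# Illustrative sketch only: "Scan" and "triage" are hypothetical names.

@dataclass
class Scan:
    patient_id: str
    positive: bool  # the agent's (possibly wrong) read of the mammogram

def triage(scan: Scan) -> dict:
    """Route a scan according to the agent's read.

    Positives go to a radiologist and trigger a follow-up letter and
    appointment; negatives get an all-clear letter with no human review,
    which is the single point of failure the panel goes on to discuss.
    """
    if scan.positive:
        return {
            "radiologist_review": True,
            "letter": "follow-up needed",
            "appointment_scheduled": True,
        }
    return {
        "radiologist_review": False,  # nobody double-checks a negative
        "letter": "all clear",
        "appointment_scheduled": False,
    }
```

The sketch makes the failure mode concrete: a false negative never reaches a human, because review is gated entirely on the agent's own read.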
E
So first of all, it's a pleasure to be together with both of you again. We've had past live interactions on this, and it's a fun exercise for non-legal practitioners, non-judges, to think about this stuff: think about it as development is happening, as products are being rolled out, and as decisions are being made. So who might be liable? The options are: could the developer who developed the system be liable? Could the person who decides how to calibrate the system be liable? Or what about the doctor who buys the system and implements it into their practice? Or maybe the hospital itself, or the health network that the patient is ultimately part of? And finally, the provider who actually trains the healthcare system or the doctor or the practice on using the system. So which of any or all of these potential actors might be liable? For starters, the whole concept of agentic AI is very interesting from a legal point of view, because the idea of an agent, from the developer of an AI product, may mean something very, very different than it means in the context of the legal system. The word agent has a very particular meaning if you're talking about the law of agency. Typically the focus is on something called the principal. The principal in this case is whoever it is who holds an agent out as having the authority to do something, and then the agent who does something may bind the principal. So let's talk about the developer, the developer who created the system. They are not holding the agent out to the patient for any action of the developer itself. So the traditional notions of agency law, I think, are a little muddled when it comes to the developer here. It isn't necessarily the system that is wrong from beginning to end, as the prompt that you just explained indicates. You can calibrate the system many different ways.
You could calibrate it to be so sensitive that you might capture false positives, and you will miss very, very few false negatives. So calibration is an option here. I would imagine that if a system like this went awry, it is probably likely that there will be an attempt to bring the developer into a lawsuit. As to whether they are liable, I think it may depend on: is there anything about the system itself that is created in a reckless way? Is there anything about the creation of how the agent acts autonomously that is prone to error itself, without any interaction of the user, meaning in this case the medical practice or the hospital or the doctor? So you always look to who is the principal when you're asking a question about an agent's liability under traditional agency law. If you talk about the person responsible for setting calibration at the hospital, then it gets interesting, because that person may have made a decision that we want to have as few medical appointments with the doctor as possible: we only want to focus on positives, we want to be sure that we don't capture any false positives, and maybe they're not terribly sensitive to false negatives. So depending upon the acts of the individual or entity responsible for calibrating the system, they may very well be pulled into a lawsuit, depending upon what they did or didn't do and the choices they made. And what Galina and Marina have taught me over the years is: what are the licensing provisions in the product? Is the user abiding by the licensing agreement when they bought that particular system? Perhaps in the agreement or contract under which the system was bought, there is a very clear warning that says: be advised that wherever you calibrate this system, here are the risks that are involved.
So the person who made a decision on calibration, if they were not risk sensitive, that may land them in some hot water. As for the doctor who uses the innovation, I imagine a doctor is likely to be named, because ultimately it's the radiologist who diagnoses. I think it depends on how the system interacts with the practice. What is the entity that screens? In this case, for mammography, you'd be going to your gynecologist or your general practitioner. Might that doctor be responsible where it's ultimately the radiologist who isn't the one who caught it? All of those facts, all of those nuances, I think, would go into that. And finally, the hospital or health network and the provider of training data. For the hospital or health network, it depends on their nexus to the decision making: purchasing the system, calibrating the system, rolling it out and following up. And as far as the provider of training data, if the data is murky or unable to be understood, and didn't adequately advise the purchaser of the potential risks, maybe there's an issue there as well. But I'm going to bounce it back to Marina and Galina for any further insight.
D
Yeah, sorry, Marina.
E
Sorry, go ahead.
D
So I do want to raise one question, Lisa, if you don't mind, and that is in terms of the training data, because a lot of times these kinds of systems are trained on data from previous scans and reads. So could an argument be raised that the data used to train the agent to recognize problems to begin with wasn't sufficient, wasn't diverse enough, et cetera? And could that change your opinion of the liability?
E
It very well might. If the data that the system was trained on is out of date, isn't sensitive enough, doesn't alert the system to be able to recognize a malignancy or an abnormality, especially if it lulls the purchaser into a sense of security that this system is so much more sensitive than human eyes in doing an initial screen of a mammogram, there very well might be problems. That's a fundamental fly in the ointment, a fundamental flaw in the system. And this is real stuff right now. I mean, the most recent medical scan that I had was read by an AI system. I think this is growing to be universal across the board. But what we believe, the urban myth that we patients believe, is that it's better: that it's more sensitive, that it's not subject to fatigue and boredom, that it's not going to overlook anything, because it will be just as fresh when it reads my screen as when it read the prior thousand. But everything is only as good as the components that go into it. And if the training to recognize is not sufficient, that in and of itself might be a subset of potential liability, because it's a flawed system.
A
I'm wondering if we can add in another wrinkle here and talk about standard of care. We've all had these conversations before, right, in a different context, but certainly I think it's quite relevant in this one. In the scenario we set out, we have a referring physician, we have a radiologist, and there's also a radiology tech who actually helps take the scans. So what role does standard of care play? If this type of scenario was being litigated in front of a jury, what are your thoughts?
E
This is interesting, because agency law you look at from the point of view of the individual or entity that is holding out the agent as representative of what it's allowed to do. Medical malpractice law introduces another layer: a specialized standard of care. In other words, a medical malpractice claim, whether it's against a hospital for its nursing care or its tech, or against a physician for the standard of care in medical practice, that's a very specialized definition of standard of care. But this is machine learning. This is not a physician. So there is a question in my mind as to whether there will be a development in the law of an entirely new body of law of how you evaluate standard of care, and that you would have to look back to the creation, the development, the input, the training data, all of that, and how it operates in the field. Are there any hiccups or glitches in the way that it actually carries out its function? It is medical care, but it's not the standard of professional care of a physician, a human physician or medical technician. So I am not altogether sure. I mean, please don't quote me on this, because I'm just thinking out loud here, and these things might come before me someday. I certainly don't want to be accused of prejudging them. But thinking out loud about it, it may or may not be the appropriate lens to look at this stuff, because it's not a trained professional. And one other wrinkle on the issue of standard of care: the standard of care of a human cytologist, for example, who screens and looks at slides and determines, are there any abnormal cells that I'm seeing on the slides? The standard of care is not perfection. It is never perfection. It is what is a reasonable standard of care in this particular profession. And occasionally missing a positive is part of a reasonable standard of care. It happens.
But I'm wondering whether we are going to have an even higher, stricter expectation of the way a system will work when it's not subject to the limitations of a human being. So this may go in different directions, even toward greater scrutiny, perhaps, of a system that is supposed to never miss.
D
Yeah, this is definitely a fascinating new area to explore. I think we have more questions. Right.
E
Yeah.
A
So, sorry, to your standard of care point: machine learning has been used here for quite some time.
C
Right.
A
And I think what's interesting is that in our workflow, typically the way it's used today, machine learning applied to scans still goes out to a radiologist to ultimately sign off and review. Right. And so our agent here takes some of that away, because we're saying that there's a presumably large subset of scans that are never seen by the radiologists, in favor of efficiency, but also as a nod toward the efficacy of the way the agent and the machine learning function. There's obviously testing that's done to make sure that certain percentage thresholds are met. So I'm wondering, for both of you really, what are your thoughts on what's gone wrong in this case, and in terms of the developer and deployer implementation, how could this have been avoided? Because I keep thinking that we can keep sending all scans to radiologists for review, but that kind of negates the benefit of trying to implement an agent in a workflow where there's a very high level of confidence. So what do you think? What are some of the other mitigants that should have been considered?
E
It seems like the fulcrum here was: if the system reads it as negative, there's no follow-up. So there's two things. One is, what do you mean the system read it as negative? Does that mean that the system saw zero abnormality? How many points of comparison were there? How many areas did it look at? There could be many things fed in to ensure that negative means negative. There is a gray area in every medical scan which is questionable. Is it a cyst? Is it a mole? Is it something benign? Is it potentially a flaw in how the tech positioned the patient to take this particular scan? So there's the initial threshold of negative means negative. In other words, are you looking at enough points of data that you can clearly clear something out? To have the screens themselves reviewed by a physician obviates any efficiency whatsoever, but perhaps to have the report reviewed might be one stopgap. And I think that's what happens now with patients: at least the report that's generated by the system will give you all of those points of data, what it observed, and that could be reviewed. That would sort of be splitting the baby here. And then the other thought is the automatically sent letter that you're clear: is there anything else that can go to the patient, or maybe the patient can opt in and say, although I don't know whether insurance would cover it, I want my doctor to review it, or maybe have a tech review of some sort. So in that workflow, I think it really lies in: negative must mean negative. Is there sufficient data to truly screen someone out without overlooking anything? Can you ever get that?
A
Good.
E
What are your thoughts?
D
Yeah, I wanted to add a little bit to that and maybe take a slightly different tack, because at the end of the day, if the review goes to a human being, to a doctor, for example, they could miss exactly the same things as the system, to your point before. So when you ask what it means that negative is negative, you come back to that idea of training the agent to understand what human beings consider negative, on some variety of scans that were deemed negative. And by the way, that's where flaws could come in: they could have been deemed negative incorrectly. Your data could be coming from the very same hospital where some things were read as negative but actually weren't, right? So unless those were screened out of the data pile, which is not necessarily the case, that's where your flaws could come in. Now, in my view, if you have good provenance of where the data came from and how you trained it on what's negative, the whole point of agency is that it should be able to do that: send the letters and act autonomously. Of course, it should continue to learn, so you want to give it as much of a feedback loop as possible. If it does miss something, it will learn what it missed and how it missed it, which it probably will never forget, unlike a human individual who could overlook or forget or be tired, to your point. So there are all these questions about it. Ultimately the idea is that, done right, it should be more efficient than a human being. And of course, if there's a true gray area, meaning the system really can't decide, you could set a threshold of certainty. Maybe if it's 87% certain, it would say it's negative. And if it's below that, it'll go to a human being, and if it's below something else, then it would be considered positive.
Right. Or whatever. So you could set up different thresholds, and that would be another question I would ask then: okay, so what were the thresholds? Where were the decisions made? Because, like you said, nothing is necessarily a hundred percent.
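The confidence bands Galina describes can be sketched as a simple routing function. The 0.87 cutoff comes from her example; the lower cutoff is a hypothetical value chosen purely for illustration.

```python
# Illustrative sketch of confidence-banded routing. The 0.87 cutoff is
# taken from the example above; POSITIVE_CUTOFF is an assumed value.

NEGATIVE_CUTOFF = 0.87  # at or above: confident enough to auto-clear
POSITIVE_CUTOFF = 0.30  # below: treat as positive (hypothetical value)

def route(p_negative: float) -> str:
    """Return a disposition given the model's estimated probability
    that the scan is negative."""
    if p_negative >= NEGATIVE_CUTOFF:
        return "auto-negative"  # all-clear letter, no human review
    if p_negative < POSITIVE_CUTOFF:
        return "positive"       # radiologist review and follow-up
    return "human-review"       # gray area: secondary screening
```

Where those two numbers are set is exactly the calibration decision the panel suggests a court would probe.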
E
So what about quality control? Even after the system is rolled out, you need to know what the real result is, the way it's working. So for example, I'm going to pull eight of a hundred negatives and have independent review. There has to be some manner of determining that, to ensure that it's doing what it says it should be doing.
D
Yep, that's one way to do it. The other way: take all your gray areas, maybe set your threshold low, so anything that's 50% to whatever, 90%, still has to go through a secondary screening, and you keep feeding that back in. So your system will get better and better. There are different things you could do to mitigate and train your agents, which are not atypical. All of that will depend. And I'm sure in a court case, if there was such a court case, you would be asking them exactly what those protocols were.
E
So my question, to kind of bounce off that, is from the point of view of the creator or the developer themselves: what could they implement at the front end, before rolling it out to end users, to avoid any of this from happening?
D
Yeah, so from the developer's perspective, it's again using the best possible training data, using the best workflows, and allowing users to set thresholds and other variable conditions, so each given user, each given hospital, each given health system, each given insurance provider can calibrate it the way that's most appropriate to them. Giving that kind of flexibility, I think, is where the developer comes in, along with making sure that it's not vulnerable to hallucinations or forgetfulness. Those are things a developer has to ensure. Given all of those, the rest is really on the installer and user, in my view. I don't know if you agree with that, Marina.
A
I think you have a good point there. And I think a lot of this would also be covered in a vendor onboarding review, right? Because the hospital or the healthcare provider, whoever's onboarding the use of this tool, would presumably, or should, be asking these questions.
E
And I'm the layperson here. My understanding of hallucinations, again, maybe urban myth, is that they will ordinarily occur because the system is guessing in an attempt to please the prompt. Is there a way to create a prompt so that you're looking for positives? You're not looking to rule out cancer, you're actually looking for positives?
D
Yeah. In this case, though, maybe I would take it slightly differently, because you're not engineering a prompt. So when I say eliminate hallucinations, I mean you would use the right models. Whether they created the model or used some open source model, or however they designed the system, they're using those models that will be very precise on the data and wouldn't be specifically looking to get creative around the data. And different models, as we look at them, offer different strengths. So a lot of times developers will use multiple models for multiple pieces of their systems, short of training their own, which is expensive. I know many software products that use multiple models for different pieces of their product to minimize the impact, depending on what the product is meant to do. Yes, there's a lot a developer can do, but at the end of the day, even if the developer did everything right, we could still end up with the result we just described.
Episode: Agentic AI on Trial: You Be The Judge Part 1 – Medical Diagnostics
Date: January 21, 2026
Host: Tom Hagy
Panelists: Galina Datzkovsky (AI/Compliance Strategist), Marina Kaganovich (Attorney & Google Compliance Advisor), Judge Lisa Walsh (11th Judicial Circuit of Florida)
This episode inaugurates a special "You Be The Judge" series on the legal and ethical risks posed by agentic AI—AI systems capable of acting autonomously—in high-stakes environments. Focusing on medical diagnostics, the discussion explores emerging litigation issues, liability, standards of care, and workflow design in cases where AI makes diagnostic decisions with minimal or no human oversight. The panel evaluates a hypothetical legal scenario of an autonomous AI agent in mammography, probing who or what might be liable when things go wrong.
(04:55 – 08:24)
Notable Quote — Galina Datzkovsky (07:03):
“Agentic AI refers specifically to artificial intelligence systems that possess agency, which is really the ability to act independently, make decisions and pursue goals with minimal human oversight ... Agentic AI is proactive, strategic and action oriented.”
(08:24 – 12:16)
(12:16 – 18:14)
Notable Quote — Judge Lisa Walsh (13:10):
“Typically what the focus is on is on something called the principal. So the principal ... is whoever it is who holds an agent out as having the authority to do something, and then the agent who does something may bind the principal.”
(18:14 – 20:21)
Notable Quote — Judge Lisa Walsh (18:50):
“If the data that the system was trained on is out of date, isn’t sensitive enough, doesn’t alert the system to be able to recognize a malignancy or an abnormality ... that in and of itself might be a ... subset of potential liability because it’s a flawed system.”
(20:21 – 23:40)
Notable Quote — Judge Lisa Walsh (21:38):
“There is a question in my mind as to whether there will be a development in the law of an entirely new body of law of how you evaluate standard of care ... It is medical care, but it’s not the standard of professional care of a physician, you know, a human physician or medical technician.”
(23:48 – 25:07)
(25:07 – 29:33)
Notable Quote — Judge Lisa Walsh (29:33):
“So what about quality control? Even after the system is rolled out, you need to know what is the real result for the way it's working... there has to be some manner of determining that to ensure that it’s doing what it says it should be doing.”
(29:58 – 31:54)
(31:54 – 32:23)
Notable Quote — Galina Datzkovsky (32:23):
“When I say eliminate hallucinations, you would use the right models ... you’re using those models that will be very precise on the data and wouldn’t be specifically looking to get creative around the data.”
The discussion is collegial yet rigorous. Judge Walsh approaches legal questions analytically but with a clear caveat that the law is evolving in this space. Galina and Marina balance technical and compliance perspectives, frequently looping back to the importance of context, reasonable expectations, and practical best practices for both developers and users.
End of Summary