#754: Accelerating Healthcare Decisions with AI Agents

AWS Podcast

Fri Mar 20 2026

In this episode, Jillian speaks with Cohere Health®, a clinical intelligence company focused on stre

Summary

AWS Podcast #754: Accelerating Healthcare Decisions with AI Agents

Date: March 20, 2026
Host: Gillian Ford
Guests: Gigi Yuen (Chief Data and AI Officer, Cohere Health)
Kenji Fujita (Staff AI Platform Engineer, Cohere Health)

Episode Overview

This episode looks at how Cohere Health, a clinical intelligence company, is transforming healthcare decision-making with AI agents, specifically using Amazon Bedrock and Amazon Bedrock AgentCore. The discussion covers the business and technical challenges of implementing AI in a highly regulated space, what it takes to build trustworthy and reliable AI applications, and practical advice for organizations considering similar technologies.

Key Discussion Points & Insights

1. The Healthcare Problem Space & Cohere Health’s Mission

Timestamps: 01:21 – 03:41

Gigi Yuen describes Cohere Health’s aim to streamline complex transactions between “payers” (insurance, government financiers) and “providers” (hospitals, doctors) in US healthcare.
- “About 20 to 30% of healthcare spend in the States are spent on administrative tasks. Half of these are low value...that’s about half a trillion dollars.” (03:16)
- Mission: eliminate administrative waste so patients get faster care, providers focus on medicine, and payers finance efficiently.
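
As a rough check on the "half a trillion dollars" figure, the arithmetic can be sketched as below. The 20 to 30% administrative share and the "about half is low value" split come from Gigi's remarks. The total US health spend of roughly $4.9 trillion is an assumption that is not stated in the episode, so treat the result as an order-of-magnitude estimate only.

```python
# Back-of-envelope check of the "half a trillion dollars" figure.
# ASSUMPTION: total US health spend (~$4.9 trillion) is NOT from the episode.
TOTAL_US_HEALTH_SPEND = 4.9e12


def low_value_admin_spend(total, admin_share, low_value_fraction=0.5):
    """Dollars spent on low-value administrative work.

    admin_share: share of total spend that is administrative (0.20 to 0.30).
    low_value_fraction: share of administrative spend considered low value.
    """
    return total * admin_share * low_value_fraction


low = low_value_admin_spend(TOTAL_US_HEALTH_SPEND, 0.20)
high = low_value_admin_spend(TOTAL_US_HEALTH_SPEND, 0.30)
print(f"low estimate:  ${low / 1e12:.2f} trillion")
print(f"high estimate: ${high / 1e12:.2f} trillion")
```

Under that assumed total, the low end of the range lands at about half a trillion dollars, which is consistent with the figure quoted in the episode.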

2. Trust, Reliability, and Hallucination-Free AI

Timestamps: 04:02 – 09:33

Gigi: In healthcare, trust and transparency are paramount. “There’s no tolerance for hallucination, like none.” (05:22)
Cohere Health ensures clinicians are engaged from the very start of product development—not just for post-hoc validation.
Emphasis on:
- Fairness and reliability.
- Domain expertise (clinicians) in early and continuous involvement.
- Explicit evaluation and monitoring to catch errors before they impact patients.
- “We really have to move to the mindset of evaluation driven development where eval comes first.” (08:39)
Kenji: The right guidelines ensure all patients get consistent care while allowing some flexibility for user preferences.
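
One way to picture "eval comes first" is to write the metric and its pass threshold before any agent code exists, then run every candidate through that gate. The sketch below is a minimal illustration, not Cohere Health's method: `EvalCase`, `hallucination_rate`, `passes_gate`, and the idea of comparing claimed facts against a clinician-approved set are all hypothetical names and simplifications.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalCase:
    """One evaluation case (hypothetical structure)."""
    prompt: str
    supported_facts: frozenset  # facts a clinician confirmed the record supports


def hallucination_rate(cases, generate):
    """Share of cases where the output claims at least one unsupported fact.

    `generate` is any callable that maps a prompt to an iterable of claimed
    facts; in a real system this would wrap the agent under test.
    """
    if not cases:
        raise ValueError("at least one eval case is required")
    flagged = 0
    for case in cases:
        claimed = set(generate(case.prompt))
        if claimed - case.supported_facts:  # anything outside the supported set
            flagged += 1
    return flagged / len(cases)


def passes_gate(cases, generate, max_hallucination_rate=0.0):
    """Release gate: the threshold is fixed before the agent is built."""
    return hallucination_rate(cases, generate) <= max_hallucination_rate
```

The default threshold of zero mirrors the "no tolerance for hallucination" stance. The episode notes that it took the team a month to settle on how to quantify hallucination in a given clinical setting, so the set-difference check here stands in for a much more careful definition.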

3. Incorporating Domain-Specific Data

Timestamps: 09:54 – 12:43

The discussion covers why to incorporate domain-specific data, not just how:
- The goal may be accuracy, context, compliance, or output control; the reason determines which data to use and how to use it.
- Importance of data use rights: “What rights does Cohere Health and our partners have to use the data—for what purpose— is really, really critical.” (12:22)
The need to distinguish between operational, learning, and training purposes for private data.
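
The distinction between purposes can be made explicit in code by recording, per dataset, which purposes the data rights allow and denying everything else by default. This is a minimal sketch under stated assumptions: the dataset names and grants are invented for illustration, and the episode does not describe how Cohere Health actually encodes these rights.

```python
from enum import Enum


class Purpose(Enum):
    OPERATIONAL = "operational"
    LEARNING = "learning"
    TRAINING = "training"


# ASSUMPTION: hypothetical datasets and grants, for illustration only.
GRANTS = {
    "partner_a_clinical_notes": {Purpose.OPERATIONAL},
    "partner_b_claims": {Purpose.OPERATIONAL, Purpose.LEARNING, Purpose.TRAINING},
}


def may_use(dataset, purpose, grants=GRANTS):
    """Return True only if the dataset has an explicit grant for the purpose.

    Unknown datasets and unlisted purposes are denied by default.
    """
    return purpose in grants.get(dataset, set())
```

A deny-by-default lookup keeps the question "for what purpose?" in front of every pipeline that touches private data, rather than leaving it implicit.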

4. Human-in-the-Loop vs. Automation and Risk Management

Timestamps: 12:43 – 17:09

Gigi: The “million dollar question” is when to automate versus when to involve humans.
- High-risk/high-reward = human in the loop. Cohere Health never uses AI to deny care.
- Lower-risk tasks (e.g., imaging) may be safely automated.
Example: Surgical approvals always get human review; diagnostic imaging may proceed automatically provided risks are low and guidelines are met.
Use of a structured framework to assess risk and automate safely.
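
The automation boundary described above can be sketched as a small routing function. Everything specific here is an assumption for illustration: the `risk_score` field, the 0.2 threshold, and the service-type names are hypothetical, and the episode does not detail Cohere Health's actual framework. What the sketch does take from the episode is the shape of the policy: some categories always go to a human, low-risk requests that meet guidelines may be approved automatically, and there is no automated denial path at all.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    # Deliberately no DENY member: the AI can approve or escalate, never deny.
    AUTO_APPROVE = "auto_approve"
    HUMAN_REVIEW = "human_review"


@dataclass(frozen=True)
class Request:
    service_type: str       # e.g. "surgery", "diagnostic_imaging" (hypothetical)
    meets_guidelines: bool  # result of a guideline check done upstream
    risk_score: float       # 0.0 (low) to 1.0 (high); hypothetical scale


# ASSUMPTION: categories that always require a clinician, for illustration.
ALWAYS_REVIEW = {"surgery"}


def route(request, max_auto_risk=0.2):
    """Decide whether a request may be auto-approved or needs human review."""
    if request.service_type in ALWAYS_REVIEW:
        return Route.HUMAN_REVIEW
    if request.meets_guidelines and request.risk_score <= max_auto_risk:
        return Route.AUTO_APPROVE
    # Anything uncertain, risky, or off-guideline is escalated, not denied.
    return Route.HUMAN_REVIEW
```

Making escalation the fallback branch means a new or unrecognized request type ends up in front of a clinician by default.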

5. Amazon Bedrock and AgentCore: Technical Rationale

Timestamps: 17:45 – 25:34

Kenji: Cohere Health chose Bedrock and AgentCore for:
- Speed of innovation.
- Out-of-the-box solutions for memory, tenancy, and security (critical for regulated industries).
- “The tenancy concerns that we have in a highly regulated space like Healthcare are covered with some of the components of Memory Client out of the box...implementing them only takes a couple lines of code.” (18:39)
Workshops and close support from AWS teams expedited their implementation and experimentation.
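
To make the tenancy concern concrete, the sketch below shows the property a multi-tenant agent memory has to guarantee: one tenant's conversation history is never returned for another tenant. This is a plain in-memory illustration written for these notes. It is not the AgentCore Memory API, and the class and method names are hypothetical.

```python
from collections import defaultdict


class TenantScopedMemory:
    """Toy conversation memory keyed by (tenant_id, session_id).

    NOT the AgentCore API: a conceptual stand-in for tenant isolation.
    """

    def __init__(self):
        self._store = defaultdict(list)

    def append(self, tenant_id, session_id, message):
        """Record a message under exactly one tenant and session."""
        self._store[(tenant_id, session_id)].append(message)

    def history(self, tenant_id, session_id):
        """Return a copy of the history for this tenant and session only."""
        return list(self._store.get((tenant_id, session_id), []))
```

Because every read and write requires the tenant identifier, there is no code path that lists messages across tenants. Per Kenji's comments, a managed service that provides this kind of scoping out of the box is part of why the team did not build it themselves.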

6. Lessons in Building vs. Buying and Model Evaluation

Timestamps: 25:34 – 33:22

Gigi: Cohere Health has debated internally between building custom models vs. using/fine-tuning third-party models.
- “Change is constant...so the best thing I could do as a leader...is to set very clear metrics: accuracy, cost, latency, reliability...” (27:04)
- Leaderboards are used to track model performance and inform switching decisions.
Kenji: Rich labeling by clinicians—over 50–60K new labels daily—supports robust, domain-specific evaluation and fine-tuning.
Flexibility: “It’s not always one size fits all...some use cases rely on smaller models we host internally; others on frontier models.” (31:56)
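
A leaderboard built on the four metrics Gigi names (accuracy, cost, latency, reliability) might look like the sketch below. The field names, the hard floors, and the ranking rule (filter on minimums, then sort by accuracy and break ties on cost) are assumptions chosen for illustration, since the episode does not describe how Cohere Health weights the metrics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelResult:
    name: str
    accuracy: float       # share of eval cases judged correct; higher is better
    cost_per_1k: float    # USD per 1,000 requests; lower is better
    p95_latency_s: float  # 95th percentile latency in seconds; lower is better
    reliability: float    # share of calls that completed successfully


def leaderboard(results, min_accuracy=0.95, max_latency_s=5.0,
                min_reliability=0.99):
    """Rank models that clear every floor: best accuracy first, then cheapest."""
    eligible = [
        r for r in results
        if r.accuracy >= min_accuracy
        and r.p95_latency_s <= max_latency_s
        and r.reliability >= min_reliability
    ]
    return sorted(eligible, key=lambda r: (-r.accuracy, r.cost_per_1k))
```

Rerunning the same function whenever a new model is evaluated gives a consistent basis for a switching decision, and it allows a smaller internally hosted model to win a use case when it clears the floors at lower cost.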

7. Business Impact and Measurable Outcomes

Timestamps: 33:22 – 36:43

85% of prior authorization decisions now automated—decisions made in minutes.
Human review for the remaining 15% saw a 30–40% productivity increase.
“It helps with job satisfaction because [clinicians] spend their time making clinical decisions...everything is surfaced, all the deep research is done.” (34:49)

8. Advice for Building and Scaling Agentic AI Solutions

Timestamps: 36:43 – 43:07

Gigi’s three essentials:
1. Evaluation-driven development: Let metrics guide from day one.
2. Define success criteria for agents at scale: Beyond POC, think through integration, training, user adoption, and supporting personas.
3. Don’t underestimate data integration: AI is only as good as the data/process/people readiness.
Debated team structures: Central agent team vs. distributed among dev teams—no one-size-fits-all answer, still an open topic. (40:52)
Kenji’s advice:
“Just start testing out and trying the tools...asking the questions early and often to the service teams...has helped uncover solutions we would have spent more time trying to figure out ourselves.” (42:34)
Gigi: Start with business need—not FOMO; carefully choose a cross-functional team including technical domain experts: “They are gold in the team.” (44:36)

Notable Quotes & Memorable Moments

“There’s no tolerance for hallucination, like none.” — Gigi Yuen (05:22)
“We really have to move to the mindset of evaluation driven development where eval comes first.” — Gigi Yuen (08:39)
“The tenancy concerns that we have in a highly regulated space like Healthcare are covered with some of the components of Memory Client out of the box.” — Kenji Fujita (18:39)
“Change is constant... so the best thing I could do as a leader... is to set very clear metrics: accuracy, cost, latency, reliability.” — Gigi Yuen (27:04)
“It helps with job satisfaction because [clinicians] spend their time making clinical decisions.” — Gigi Yuen (34:49)
“Don’t let FOMO get in the way. Really always start with the why.” — Gigi Yuen (43:07)
“Agentic work truly does require a new profile of developers and a development team...hardcore platform developer working side by side with a data scientist and clinical MD.” — Gigi Yuen (44:23)

Timestamps for Important Segments

01:21 – 03:41 — Healthcare admin burden: quantifying the problem and Cohere’s mission
04:02 – 09:33 — AI in healthcare: trust, no-hallucination, human-in-the-loop
09:54 – 12:43 — The necessity and risk of domain-specific data
13:41 – 17:09 — Automation/human review boundaries, illustrative examples
17:45 – 25:34 — Technical journey: Amazon Bedrock and AgentCore
25:34 – 33:22 — Model evaluation, business metrics, and model choice
33:22 – 36:43 — Automation outcomes: measurable business impact
36:43 – 43:07 — Scaling, advice for AI agent adoption, team structure insights

Actionable Takeaways

Start with the Why: Know the business problem and success metrics before choosing technology.
Involve Domain Experts Early: “Clinician in the loop” is crucial from design to deployment.
Emphasize Evaluation from the Outset: Build robust evaluation and monitoring frameworks that combine both automated and human review.
Leverage Existing Tools: Using AWS services like Amazon Bedrock AgentCore can dramatically accelerate adoption while supporting compliance and scalability.
Invest in Team Diversity: Successful agentic AI development requires platform engineers, data scientists, and domain (clinical) experts working closely together.
Don’t Underestimate Change Management: Process and people readiness are as critical as technology.
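
The takeaway on combining automated and human review echoes Gigi's remark in the transcript about "regular, intelligently sampled" human audits. One simple reading of that idea is stratified sampling, where riskier decisions are audited at a higher rate. The strata names and rates below are assumptions for illustration, not figures from the episode.

```python
import random

# ASSUMPTION: stratum names and audit rates are illustrative only.
AUDIT_RATES = {"high_risk": 1.0, "medium_risk": 0.10, "low_risk": 0.01}


def sample_for_audit(decisions, rates=AUDIT_RATES, default_rate=0.05, seed=0):
    """Pick decisions for human audit, sampling riskier strata more heavily.

    Each decision is a dict with a "stratum" key. A rate of 1.0 audits every
    decision in that stratum; a rate of 0.0 audits none.
    """
    rng = random.Random(seed)  # seeded so an audit batch can be reproduced
    picked = []
    for decision in decisions:
        rate = rates.get(decision["stratum"], default_rate)
        if rng.random() < rate:
            picked.append(decision)
    return picked
```

An automated judge can screen every decision around the clock, while a sampler like this decides which ones a clinician also reviews.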

This episode is a practical guide for organizations in any industry looking to build, scale, and measure the value of agentic AI systems in high-stakes environments. Cohere Health’s hands-on insights and clear frameworks offer a roadmap from early AI deployment to production and beyond.

Transcript

A (0:00)

This is episode 754 of the AWS podcast released on March 20, 2026.

B (0:09)

Welcome everyone to the AWS Podcast. I am your host, Gillian Ford. And this episode today I am super excited about. I think there's going to be something for everyone here. I know agents are really top of mind for, I mean, let's face it, every single person on the planet is probably thinking about this right now. And you get to learn from two people who have been in the trenches at a company that has not only been thinking about this, but actually has business-critical applications that are using agents today. So I'm really excited to talk to Gigi Yuen and Kenji Fujita from Cohere Health. Gigi Yuen is the Chief Data and AI Officer and Kenji Fujita is the Staff AI Platform Engineer at Cohere Health. So there's something here for everyone, whether you are someone who is thinking about agents and how to apply them, or maybe you're in a highly regulated industry, or maybe you want to understand how to use them in AWS. Some advice from these two: we're going to cover all of this.

A (1:20)

That.

B (1:21)

All right, let's get started. So Gigi, I'd love to understand first, if you can tell our listeners, what is Cohere Health and what are the specific biggest business problems that you were thinking about within the healthcare industry?

A (1:38)

Well, first of all, thank you for having us on your podcast, Gillian. What a privilege. So Cohere Health, we are a clinical intelligence company and our mission is to streamline the payer and provider connectivity and collaboration. So just take a second. When I say payer and provider, what do I mean by that? Payers are insurance companies or nonprofits or government entities that finance healthcare services, whereas providers are entities like a doctor's office, hospital systems, provider groups who provide care. So I guess providers provide care and payers pay for the care. So we have millions of providers in the United States and hundreds of payers. So you can imagine with these two entities, in order to support the whole healthcare ecosystem, there are a lot, a lot of transactions, a lot of administrative tasks, from prior auth, claims processing, payment quality, care coordination, and unfortunately, the fraud, waste and abuse that comes along the way when you have all this back and forth. And if you look at different studies, most recently Health Affairs published an article where about 20 to 30% of the healthcare spend in the States is spent on administrative tasks. 20 to 30%. And some of them are necessary, some of them are avoidable, or I would say low value, and depending on which studies you read, about half of this 20 to 30% is low-value administrative tasks. So that's about half a trillion dollars. So that's the problem space Cohere Health is set up to solve. We want to eliminate the waste. When you do that, what does that mean? Right. Patients can get the right care faster, providers can actually focus on what they do best, and the payer can really, really do a good job financing their care. So that's it in a nutshell.

A (6:51)

I think let's put it in terms of the software development life cycle. Yeah. So when we are doing design and kind of the reference architecture, it's easy to have the technologists in the room. I think especially in healthcare, and I imagine in many, many, you know, nuanced domains, we must have the experts in the room from the get-go. It is common for many healthcare startups to talk about performance, talk about, you know, clinicians in the loop. But I think there is a difference when you engage your domain experts in the beginning versus at the end, when we simply ask them to do validation. At Cohere Health, every single development project has clinicians on the team and in the loop, as opposed to waiting until we have already built the prototype or are already about to launch the solution and asking them to validate. And I think that domain expertise is key to ensure that we're measuring the right things. I think that leads to my second point. I grew up in an era when we talked about test-driven development, and I think now, especially with a lot of these agentic solutions, we really have to move to the mindset of evaluation-driven development where eval comes first. What are the metrics that are important? How are we going to track them? We talk about no tolerance for hallucination. It took us a month to iterate on exactly how we quantify hallucination in particular clinical settings. All that upfront work is more important than ever. I think that's key. Once the solution is launched, now we are at the, you know, monitoring and tracking phase. Obviously having 24/7 monitoring is key, potentially using, you know, the gen AI itself to help as a judge. But I do believe in the importance of human audits. Having that regular, intelligently sampled human audit, it's really, really critical.
And I'm going to say one last thing, and Kenji, you may have something to add too, because we've been working together on this: we're learning that there are different personas that will use AI solutions, especially since healthcare is such a personal space, and even learning about how you roll out to different persona groups over time helps us build more trustworthy and useful applications. So

A (36:43)

And a little context, right? I've been doing this line of work for 20 years. So agents or not, right, the movement from a proof of concept or a pilot to a large-scale solution, I think McKinsey says that 95% of AI prototypes and POCs never even go into production, and only half of the ones that go into production actually stay in production. So this has been an age-old problem regardless of agents or not. I think agents do create an extra layer of pressure, because on one hand you can innovate and experiment faster, but on the other hand there are more unknowns that you have to manage. So you actually exemplify the challenge, and you're right, we're going to start with evaluation-driven development. If there's one thing you want to take home from this podcast, that's the one line that I will really encourage us. And as we think about these metrics, to Kenji's point, having the right experts to label, provide the labeled data, provide the ground truth, even if it's weak ground truth, right? They are valuable, but make sure that they tie back to your business and operational metrics. So they're not purely functional or non-functional evals, but tie back to the overall company or your client strategy. The other piece I've personally seen a lot of struggles going from POC to scale is not having that, not investing that time. Let me put it the other way, I should put it in a positive way. Let me try again. Another part where I've seen successes in taking a POC to large-scale deployment is taking the time to define, and letting everyone know, what must be true for the agent to be successful at scale. Because the nature of a POC is to simplify, right, is to not consider certain edge cases and to assume a certain level of integration and operational efficiency. So in order to kind of flip that switch, we must be very, very clear on what those important criteria are for it to be successful. And they're not usually AI related, they're usually about the people who are going to be using the tool.
Are you going to give them the right training? Are you going to give them the right transition plan? Are you going to bring in the right advocates? Because as I mentioned earlier, everyone looks at technology differently. There are different personas. Who do you bring on board to help you evangelize? And oftentimes things fail because of processes. Because if you don't change your processes but you get a new tech, you're not going to see the benefits and you can quickly fold, the impact can quickly be minimized. And then I think the third thing is classic system build, right? We always assume data integration is easy and it's never easy. So I think that's the piece where we always have to take a step back and say, amazing AI system, let's make sure the people, the process and the data are ready, and having that clarity so that everyone's on the same page and marching to a single. It's key.