AWS Podcast #754: Accelerating Healthcare Decisions with AI Agents
Date: March 20, 2026
Host: Gillian Ford
Guests: Gigi Yuen (Chief Data and AI Officer, Cohere Health)
Kenji Fujita (Staff AI Platform Engineer, Cohere Health)
Episode Overview
This episode looks at how Cohere Health, a clinical intelligence company, uses AI agents built on Amazon Bedrock and AgentCore to speed up healthcare decision-making. The discussion covers the business and technical challenges of implementing AI in a highly regulated space, what it takes to build trustworthy and reliable AI applications, and practical advice for organizations considering similar technologies.
Key Discussion Points & Insights
1. The Healthcare Problem Space & Cohere Health’s Mission
Timestamps: 01:21 – 03:41
- Gigi Yuen describes Cohere Health’s aim to streamline complex transactions between “payers” (insurance, government financiers) and “providers” (hospitals, doctors) in US healthcare.
- “About 20 to 30% of healthcare spend in the States are spent on administrative tasks. Half of these are low value...that’s about half a trillion dollars.” (03:16)
- Mission: eliminate administrative waste so patients get faster care, providers focus on medicine, and payers finance efficiently.
2. Trust, Reliability, and Hallucination-Free AI
Timestamps: 04:02 – 09:33
- Gigi: In healthcare, trust and transparency are paramount. “There’s no tolerance for hallucination, like none.” (05:22)
- Cohere Health ensures clinicians are engaged from the very start of product development—not just for post-hoc validation.
- Emphasis on:
- Fairness and reliability.
- Domain expertise (clinicians) in early and continuous involvement.
- Explicit evaluation and monitoring to catch errors before they impact patients.
- “We really have to move to the mindset of evaluation driven development where eval comes first.” (08:39)
- Kenji: The right guidelines ensure all patients get consistent care while allowing some flexibility for user preferences.
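The "eval comes first" idea above can be sketched as a release gate that runs before any change ships. This is a minimal illustration, not Cohere Health's actual harness: the `EvalCase` structure, the toy `keyword_model`, the made-up cases and labels, and the 0.95 threshold are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class EvalCase:
    """One clinician-labeled example (hypothetical structure)."""
    case_id: str
    input_text: str
    expected_label: str


def run_eval(predict: Callable[[str], str], cases: List[EvalCase]) -> float:
    """Return the fraction of cases where the prediction matches the label."""
    if not cases:
        raise ValueError("evaluation set must not be empty")
    correct = sum(1 for c in cases if predict(c.input_text) == c.expected_label)
    return correct / len(cases)


def passes_gate(accuracy: float, threshold: float = 0.95) -> bool:
    """Release gate; the default threshold is an illustrative assumption."""
    return accuracy >= threshold


def keyword_model(text: str) -> str:
    """Toy stand-in for the model under test (hypothetical)."""
    return "meets_guideline" if "mri" in text.lower() else "needs_review"


# Made-up cases and labels, for illustration only.
CASES = [
    EvalCase("c1", "MRI of the knee after failed physical therapy", "meets_guideline"),
    EvalCase("c2", "Spinal fusion surgery request", "needs_review"),
    EvalCase("c3", "Brain MRI for persistent headaches", "meets_guideline"),
    EvalCase("c4", "CT scan of the abdomen", "meets_guideline"),
]

accuracy = run_eval(keyword_model, CASES)
print(f"accuracy={accuracy:.2f}, release={passes_gate(accuracy)}")
```

Writing the cases and the gate before the model logic is what makes the process evaluation-driven: the toy model misses one of the four cases here, so the gate blocks the release until the miss is addressed.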
3. Incorporating Domain-Specific Data
Timestamps: 09:54 – 12:43
- The reason for incorporating domain-specific data matters as much as the method:
- The goal (accuracy, context, compliance, or output control) determines which data to use and how to use it.
- Importance of data use rights: “What rights does Cohere Health and our partners have to use the data—for what purpose— is really, really critical.” (12:22)
- The need to distinguish between operational, learning, and training purposes for private data.
4. Human-in-the-Loop vs. Automation and Risk Management
Timestamps: 12:43 – 17:09
- Gigi: The “million dollar question” is when to automate versus when to involve humans.
- High-risk, high-reward decisions keep a human in the loop. Cohere Health never uses AI to deny care.
- Lower-risk tasks (e.g., imaging) may be safely automated.
- Example: Surgical approvals always get human review; diagnostic imaging may proceed automatically provided risks are low and guidelines are met.
- Use of a structured framework to assess risk and automate safely.
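The routing rules described above can be sketched as a small decision function. This is an illustrative sketch only, since the episode does not describe the actual framework: the category names, the `risk_score` field, and the 0.2 threshold are hypothetical. The one property taken directly from the discussion is that there is no automated denial path, so anything that is not clearly low risk goes to a human.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    # Deliberately no DENY member: AI is never used to deny care.
    AUTO_APPROVE = "auto_approve"
    HUMAN_REVIEW = "human_review"


@dataclass
class Request:
    category: str         # e.g. "surgery" or "diagnostic_imaging" (hypothetical labels)
    risk_score: float     # 0.0 (low) to 1.0 (high); how it is computed is assumed
    guidelines_met: bool  # whether the request satisfies the clinical guidelines


LOW_RISK_CATEGORIES = {"diagnostic_imaging"}  # assumption for illustration
RISK_THRESHOLD = 0.2                          # assumption for illustration


def route(req: Request) -> Route:
    """Auto-approve only low-risk, guideline-conforming requests."""
    if (
        req.category in LOW_RISK_CATEGORIES
        and req.guidelines_met
        and req.risk_score <= RISK_THRESHOLD
    ):
        return Route.AUTO_APPROVE
    # Everything else, including all surgical requests, goes to a clinician.
    return Route.HUMAN_REVIEW


print(route(Request("surgery", 0.05, True)))
print(route(Request("diagnostic_imaging", 0.10, True)))
```

Defaulting to human review keeps the failure mode safe: an unknown category or a missing guideline match results in more clinician work, not an automated decision.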
5. Amazon Bedrock and AgentCore: Technical Rationale
Timestamps: 17:45 – 25:34
- Kenji: Chose Bedrock and AgentCore for:
- Speed of innovation.
- Out-of-the-box solutions for memory, tenancy, and security (critical for regulated industries).
- “The tenancy concerns that we have in a highly regulated space like Healthcare are covered with some of the components of Memory Client out of the box...implementing them only takes a couple lines of code.” (18:39)
- Workshops and close support from AWS teams expedited their implementation and experimentation.
6. Lessons in Building vs. Buying and Model Evaluation
Timestamps: 25:34 – 33:22
- Gigi: Cohere Health has debated internally between building custom models vs. using/fine-tuning third-party models.
- “Change is constant...so the best thing I could do as a leader...is to set very clear metrics: accuracy, cost, latency, reliability...” (27:04)
- Leaderboards are used to track model performance and inform switching decisions.
- Kenji: Rich labeling by clinicians, on the order of 50–60K new labels daily, supports robust, domain-specific evaluation and fine-tuning.
- Flexibility: “It’s not always one size fits all...some use cases rely on smaller models we host internally; others on frontier models.” (31:56)
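A leaderboard over the four metrics named above (accuracy, cost, latency, reliability) might look like the following sketch. The model names, the numbers, and the ranking rule (filter on reliability and latency floors, then sort by accuracy with cost as the tie-breaker) are all assumptions for illustration; the episode does not describe how Cohere Health scores its candidates.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ModelResult:
    name: str
    accuracy: float     # fraction correct on the eval set, higher is better
    cost_per_1k: float  # dollars per 1,000 requests, lower is better
    latency_ms: float   # typical response time, lower is better
    reliability: float  # fraction of calls that succeed, higher is better


def leaderboard(
    results: List[ModelResult],
    min_reliability: float = 0.99,   # illustrative floor
    max_latency_ms: float = 2000.0,  # illustrative ceiling
) -> List[ModelResult]:
    """Drop models that miss the hard constraints, then rank the rest."""
    eligible = [
        r for r in results
        if r.reliability >= min_reliability and r.latency_ms <= max_latency_ms
    ]
    # Highest accuracy first; cheaper model wins a tie.
    return sorted(eligible, key=lambda r: (-r.accuracy, r.cost_per_1k))


# Made-up candidates and numbers, for illustration only.
RESULTS = [
    ModelResult("small-hosted", 0.91, 0.40, 300.0, 0.999),
    ModelResult("frontier-a", 0.95, 6.00, 1500.0, 0.995),
    ModelResult("frontier-b", 0.96, 9.00, 3200.0, 0.997),
]

board = leaderboard(RESULTS)
for rank, r in enumerate(board, start=1):
    print(rank, r.name, r.accuracy, r.cost_per_1k)
```

Treating latency and reliability as hard constraints and not as weighted terms is one possible design choice; it means a slightly more accurate model that is too slow for the use case never tops the board, which fits the point that different use cases call for different models.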
7. Business Impact and Measurable Outcomes
Timestamps: 33:22 – 36:43
- 85% of prior authorization decisions are now automated and made in minutes.
- Human review for the remaining 15% saw a 30–40% productivity increase.
- “It helps with job satisfaction because [clinicians] spend their time making clinical decisions...everything is surfaced, all the deep research is done.” (34:49)
8. Advice for Building and Scaling Agentic AI Solutions
Timestamps: 36:43 – 43:07
- Gigi’s three essentials:
- Evaluation-driven development: Let metrics guide from day one.
- Define success criteria for agents at scale: Beyond POC, think through integration, training, user adoption, and supporting personas.
- Don’t underestimate data integration: AI is only as good as the data/process/people readiness.
- Debated team structures: Central agent team vs. distributed among dev teams—no one-size-fits-all answer, still an open topic. (40:52)
- Kenji’s advice:
- “Just start testing out and trying the tools...asking the questions early and often to the service teams...has helped uncover solutions we would have spent more time trying to figure out ourselves.” (42:34)
- Gigi: Start with business need—not FOMO; carefully choose a cross-functional team including technical domain experts: “They are gold in the team.” (44:36)
Notable Quotes & Memorable Moments
- “There’s no tolerance for hallucination, like none.” — Gigi Yuen (05:22)
- “We really have to move to the mindset of evaluation driven development where eval comes first.” — Gigi Yuen (08:39)
- “The tenancy concerns that we have in a highly regulated space like Healthcare are covered with some of the components of Memory Client out of the box.” — Kenji Fujita (18:39)
- “Change is constant... so the best thing I could do as a leader... is to set very clear metrics: accuracy, cost, latency, reliability.” — Gigi Yuen (27:04)
- “It helps with job satisfaction because [clinicians] spend their time making clinical decisions.” — Gigi Yuen (34:49)
- “Don’t let FOMO get in the way. Really always start with the why.” — Gigi Yuen (43:07)
- “Agentic work truly does require a new profile of developers and a development team...hardcore platform developer working side by side with a data scientist and clinical MD.” — Gigi Yuen (44:23)
Timestamps for Important Segments
- 01:21 – 03:41 — Healthcare admin burden: quantifying the problem and Cohere’s mission
- 04:02 – 09:33 — AI in healthcare: trust, no-hallucination, human-in-the-loop
- 09:54 – 12:43 — The necessity and risk of domain-specific data
- 12:43 – 17:09 — Automation/human review boundaries, illustrative examples
- 17:45 – 25:34 — Technical journey: Amazon Bedrock and AgentCore
- 25:34 – 33:22 — Model evaluation, business metrics, and model choice
- 33:22 – 36:43 — Automation outcomes: measurable business impact
- 36:43 – 43:07 — Scaling, advice for AI agent adoption, team structure insights
Actionable Takeaways
- Start with the Why: Know the business problem and success metrics before choosing technology.
- Involve Domain Experts Early: “Clinician in the loop” is crucial from design to deployment.
- Emphasize Evaluation from the Outset: Build robust evaluation and monitoring frameworks that combine both automated and human review.
- Leverage Existing Tools: AWS services such as Amazon Bedrock and AgentCore can accelerate adoption while supporting compliance and scalability.
- Invest in Team Diversity: Successful agentic AI development requires platform engineers, data scientists, and domain (clinical) experts working closely together.
- Don’t Underestimate Change Management: Process and people readiness are as critical as technology.
This episode is a practical guide for organizations in any industry that want to build, scale, and measure the value of agentic AI systems in high-stakes environments. Cohere Health’s hands-on insights and clear frameworks offer a roadmap from early AI deployment to production and beyond.