Building Agent Studio: How Medable Is Using Agentic AI to Accelerate Clinical Trials - Just Now Possible

JUST NOW POSSIBLE with Teresa Torres
Episode: Building Agent Studio – How Medable Is Using Agentic AI to Accelerate Clinical Trials
Recorded: February 2026

Episode Overview

This episode dives deep into how Medable, a clinical trials technology company, is building and deploying an agentic AI platform—Agent Studio—to solve significant workflow and data challenges across clinical trials. Host Teresa Torres is joined by the Medable team: Luke Bates (Product Leader, Agent Studio), Jen (Product Management), Matt Schofield (Product Designer), and Fikra Matthews (Principal Architect). Together, they cover the journey from spotting mammoth customer pain points to designing, testing, and iterating on a flexible agent platform that accelerates clinical trials globally—discussing everything from workflow orchestration to data layers, retrieval strategies, evaluation methods, regulatory compliance, and product vision.

Key Discussion Points & Insights

1. What Medable Does and the Clinical Trial Challenge

[01:34] Jen frames Medable's mission: “Our mission and vision is to bring effective therapies to patients faster.”
Current clinical trial processes take over a decade and generate a staggering volume of documentation.
Medable focuses on reducing these timeframes, especially with tools for remote patient engagement (eConsent, electronic assessments) and, more recently, agentic AI solutions that automate and optimize clinical trial workflows.

“There are single studies that produce tens of thousands of documents per month. It's a lot of documentation.”
— Fikra Matthews [14:25]

2. Evolution Toward Agent Studio and Platform Thinking

Medable tackles two major pain points: immense cognitive load on staff and complex, fragmented data sources.
[15:19] Luke: “It was a complete natural fit… this high cognitive load... that type of problem is really well suited to be able to bear RAG knowledge for a protocol and provide easy answers versus having to depend on a CRA.”
The Agent Studio platform allows both internal teams and customers to build, deploy, and customize AI agents tailored to their own environments, leveraging pluggable models, workflow automations, and connectors (MCPs).

“We take a platform approach to every solution… so that the next solution that comes around will have that capability already baked in.”
— Luke Bates [05:21]

3. The Flagship Agentic Applications: ETMF and CRA Agents

Electronic Trial Master File (ETMF) Agent

[06:37] Jen: The ETMF system manages the tens of thousands of trial documents required for regulatory review.
Users upload documents and assign metadata manually—a process ripe for automation given over 350 classification categories.
The ETMF agent automates document classification and metadata assignment, starting with human-in-the-loop to ensure trust and accuracy.

Clinical Research Associate (CRA) Agent

CRAs monitor data across 13+ disparate systems, which is time-consuming and error-prone.
The CRA agent aggregates these sources, surfaces insights, and can recommend—and sometimes even take—actions, all with human oversight.

“Legacy systems today can surface a potential signal... but the AI can actually go one step further and take the action for you on behalf of the human. Again, with that human in the loop.”
— Jen [09:53]

4. Inside the Agent Studio Platform

Enables both engineers and non-engineers to configure agents using any model (“bring your own model”), connect to multiple data sources, and orchestrate workflows via agent skills (a pattern Anthropic helped develop), triggers, and MCP connectors.
“Our platform allows users, not just engineering users, to be able to configure their own agents using agnostic models.” — Luke [03:50]
Broad deployment patterns: user-facing apps, internal augmentations, and custom builds for clients.

User Experience: Customization Spectrum

Medable supports a range of customer technical capabilities:
- Some clients want "out of the box" solutions (pre-built agents).
- More advanced users assemble their own multi-agent systems with platform Legos.
- Middle ground: easy configuration of parameters without coding.

“You can just use it in other ways. You're saying, here's the Lego blocks, make whatever you need... and then maybe there's even this middle ground, like, [you] can pick the color, you can pick how big the thing is... offering that full range. Is that a fair analogy?”
— Teresa Torres [28:08]
“Yeah, exactly.” — Matt [28:40]

5. Data Retrieval, Ontologies & Context Window Management

Massive, heterogeneous data is a consistent obstacle.
The team invests in an AI-powered data layer, aligning all sources to a “common ontology”—making sure synonyms across systems (e.g., 'participant') map correctly for agents.

“We use the phrase RAG a lot, but RAG is just so... people tend to align it with vector DBs and embeddings... we allow customers to create simple, identifiable sort of little vector pools, but we also have more complex systems where you’re talking about data hierarchies and layers of summarization.”
— Fikra Matthews [33:16]

“We need that common layer that allows us to understand… when we talk about the 13 different systems that a clinical research associate uses… you need that common layer.”
— Jen [36:25]

Platform enables multiple retrieval strategies (embeddings, keyword search, file traversal) and aims to automate smart selection of these as the field matures.
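
The "common ontology" idea above can be illustrated with a small sketch. This is not Medable's implementation: the `ONTOLOGY` table, the field labels, and the sample rows are all hypothetical, chosen only to show how synonyms from different source systems might be mapped to one canonical term before an agent sees the data.

```python
# Hypothetical synonym table: canonical term -> labels seen in source systems.
ONTOLOGY = {
    "participant": {"participant", "subject", "patient"},
    "site": {"site", "clinic", "study_site"},
}

# Reverse index so each source label resolves to its canonical term.
SYNONYM_TO_CANONICAL = {
    synonym: canonical
    for canonical, synonyms in ONTOLOGY.items()
    for synonym in synonyms
}


def normalize_record(record: dict) -> dict:
    """Rename keys to canonical terms; unknown keys pass through unchanged."""
    return {
        SYNONYM_TO_CANONICAL.get(key.lower(), key): value
        for key, value in record.items()
    }


# Two made-up rows describing the same participant in two systems.
edc_row = {"Subject": "P-001", "clinic": "S-12"}
econsent_row = {"patient": "P-001", "site": "S-12"}

normalized_edc = normalize_record(edc_row)
normalized_econsent = normalize_record(econsent_row)
```

After normalization both rows share the keys `participant` and `site`, so downstream retrieval or agent prompts can treat them as the same kind of record.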

6. Orchestration, MCP Connectors & Tool/Skill Management

MCP connectors are the main mechanism for plugging in both internal and third-party data/services, layered with custom authentication.
Key lesson: API/connector quality is often dictated by external vendors’ capabilities.
Sub-agent architectures and manual/automatic tool filtering are used to avoid context window bloat and improve efficiency.

“When you’re building MCPs for some of these external tools, you’re at the mercy of the other vendors’ API mechanisms... we’ve used agent skills to help prime MCPs.”
— Luke [45:22]

“We also have a sub-agent that will filter tools before passing to the main agent to avoid context rot... all these options work well in different circumstances.”
— Fikra [47:59]
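
A minimal sketch of the tool-filtering step described above, under stated assumptions. In the episode the filter is a sub-agent; here a simple word-overlap score stands in for it so the example stays self-contained. The `Tool` class, the tool names, and the `max_tools` parameter are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str
    description: str


def filter_tools(task: str, tools: list[Tool], max_tools: int = 2) -> list[Tool]:
    """Keep at most max_tools tools whose descriptions share words with the task.

    Word overlap is a stand-in for the relevance judgment a sub-agent would make.
    """
    task_words = set(task.lower().split())
    scored = []
    for tool in tools:
        overlap = len(task_words & set(tool.description.lower().split()))
        if overlap > 0:
            scored.append((overlap, tool))
    # Sort by overlap only, highest first; ties keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [tool for _, tool in scored[:max_tools]]


TOOLS = [
    Tool("classify_document", "classify a trial document into a category"),
    Tool("query_edc", "query electronic data capture records for a site"),
    Tool("send_message", "send a chat message to a user"),
]

# Only the tools that survive the filter would be placed in the main agent's context.
selected = filter_tools("classify this uploaded trial document", TOOLS)
```

The point of the design is that the main agent's context only carries the descriptions of tools likely to matter for the task, rather than every connector the platform exposes.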

7. Evaluation Strategies & Ensuring Reliability

All agent configurations pass through a rigorous evaluation phase before launch—both at the Agent Studio/platform level and product (ETMF, CRA) level.
Golden datasets (e.g., 2,000 documents for ETMF) are used for pre-launch accuracy assessment, with ongoing human-in-the-loop review for continuous improvement.

“We’re able to monitor... which recommendations the human’s agreeing with and which they aren’t. Humans aren’t always 100% correct… there’s a continuous kind of evaluation process we go through.”
— Jen [54:14]

Human-in-the-loop corrections aren’t always ground truth; the team collaborates with clients to resolve ambiguous cases and improve both agent and human accuracy.
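
The two measurements described in this section can be sketched as follows. This is an illustrative example, not the team's evaluation harness: the document IDs, labels, and function names are made up, and a real golden dataset would hold thousands of documents rather than four.

```python
def accuracy(predictions: dict[str, str], golden: dict[str, str]) -> float:
    """Fraction of golden documents whose predicted label matches the golden label."""
    if not golden:
        raise ValueError("golden dataset is empty")
    correct = sum(
        1 for doc_id, label in golden.items() if predictions.get(doc_id) == label
    )
    return correct / len(golden)


def agreement_rate(reviews: list[tuple[str, str]]) -> float:
    """Fraction of reviews where the human's final label equals the agent's label.

    A disagreement is a case to investigate, not automatically an agent error,
    since the human reviewer can also be wrong.
    """
    if not reviews:
        raise ValueError("no reviews to score")
    return sum(1 for agent, human in reviews if agent == human) / len(reviews)


# Pre-launch: compare agent output against a (tiny, made-up) golden dataset.
golden = {
    "doc-1": "protocol",
    "doc-2": "consent_form",
    "doc-3": "monitoring_report",
    "doc-4": "protocol",
}
predictions = {
    "doc-1": "protocol",
    "doc-2": "consent_form",
    "doc-3": "protocol",
    "doc-4": "protocol",
}
prelaunch_accuracy = accuracy(predictions, golden)

# Post-launch: (agent recommendation, human decision) pairs from review.
reviews = [
    ("protocol", "protocol"),
    ("consent_form", "consent_form"),
    ("protocol", "monitoring_report"),
    ("protocol", "protocol"),
]
human_agreement = agreement_rate(reviews)
```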

8. Regulatory (GXP) Compliance & Industry Requirements

Health and drug trial industries are heavily regulated; Medable’s platform is built to meet and document specific intent, design, and test traceability in line with GXP (Good Clinical Practice and similar frameworks).
Evals and documentation aren’t just QA; they’re foundational to regulatory acceptance.

“We need to be able to also prove… that we’re building these agents to a specific intent… evals become really important in that part.”
— Luke [60:16]

9. Notable Quotes & Memorable Moments

“Our ambitious, big hairy goal is one year [for clinical trials].”
— Jen [11:14]
“There’s no one right way of [data retrieval]. Everything we’ve tried has been good in some way and bad in another.”
— Fikra [33:16]
“We want to enable that full, end-to-end clinical trial process that has these agent workers helping along the way. Today there’s 10,000 uncured illnesses; at current pace, it’ll take 200 years to get to market.”
— Jen [63:50]
“You can’t even read books about the stuff that’s being advanced right now. You got to stay on top of the latest papers and YouTube videos... It’s a different way to learn than traditional SaaS platform technologies.”
— Luke [19:30]
“Why isn’t this doing the exact thing my traditional system should do? ...We shouldn’t be just comparing these agents to these systems. We should be comparing them to humans. Humans make errors, too.”
— Luke [62:38]

Timestamps for Key Segments

[01:34] – Medable’s mission and industry challenge
[05:21] – Agent Studio platform approach overview
[06:37] – ETMF agent: document/metadata automation problem
[08:56] – CRA agent: unifying trial data & workflow support
[13:33] – Clinical trial timeframes and manual processes
[15:19] – Why a platform over point solutions?
[18:14] – Team’s AI/ML learning curve and backgrounds
[22:03] – Timeline: building Agent Studio amid AI/agent revolution
[23:24] – Modular agent and sub-agent system explained
[25:39, 28:08] – Customer customization spectrum; LEGO analogy
[30:23] – Skills protocol & platform evolution over time
[32:34] – Data layer, ontologies, and RAG/retrieval strategies
[42:41] – MCP server implementation, security, and design
[45:22] – Context window management strategies
[50:47] – Latency, cost, and UX tradeoffs in agentic workflows
[50:57] – Evaluation harnesses: pre- and post-deployment
[54:14] – Continuous evaluation, handling human disagreement
[58:06] – GXP regulatory compliance challenges
[63:48] – The vision: “full self-driving” clinical trials

Final Thoughts & Vision

Medable’s north star is end-to-end automation of clinical operations—reducing time-to-market for therapies from years to months.
The Agent Studio approach blends robust platform thinking, cutting-edge AI techniques, regulatory rigor, and deep customer empathy.
Key theme: Shift from AI being a “black box” to building reliable, transparent, customizable, and compliant agentic solutions for some of humanity’s hardest problems.

Listen If...

You want practical insights into building and shipping real-world AI/agentic products.
You’re interested in how cutting-edge AI techniques intersect with highly regulated industries.
You want to learn from builders at the intersection of product, engineering, design, and compliance.

Memorable closing moment:

“It’s very refreshing to hear people use especially this type of technology for genuinely hard human problems. So keep it up.”
— Teresa Torres [65:32]

Transcript

[00:04]
A
Welcome to Just Now Possible with Teresa Torres.
[00:09]
B
Hello, I'm Luke Bates. I'm the product leader responsible for Agent Studio at Medable. I spend a lot of time working with product teams, the rest of the company, and our customers to understand their problems and address them with a platform approach to delivering agents.
[00:23]
C
My name's Jen. I'm also in the product management space here at Medable. I've been with Medable for about five years now, so I've gone through lots of variations of product roles here. Most recently I am overseeing our product teams, where we're looking to solve customer problems with our agentic-powered solutions.
[00:44]
D
I'm Matt Schofield. I'm a product designer for Medable. I work on both the site experience and patient experience teams in addition to our agentic platform and our design system. Lots of hats in that role. I've been with Medable for a little over three years now.
[01:01]
E
My name is Fikra Matthews. I am principal architect here at Medable and I'm on the engineering side of Agent Studio. So I would have initially started with helping build out the initial core infrastructure, agent runtimes, tools and MCPs, and evaluation layers. Right now I'm focusing specifically on helping build out the ETMF agent work. So it's, yeah, mainly the engineering side.
[01:23]
A
Before we get into your agent platform, which I'm really excited to learn about because I've heard great things about it. Let's just start with what does Medable do overall? For folks who haven't heard of it,
[01:34]
C
I would say that overall, Medable's mission and vision is to bring effective therapies to patients faster. So we've had a journey throughout our over 10 years of life where we're really looking to solve the problem that it currently takes over 10 years to get a drug to market. Our bread and butter the last few years has been in the clinical assessments and eConsent space, where patients are able to complete electronic questionnaires and complete an electronic consent. And this is really looking at solving the overall problem that, today, a lot of patients don't live near a clinical site, and therefore they might not be able to go into a site to take part in a clinical trial to understand if this could help them cure their disease, essentially. So ultimately all the things that we try to attack are trying to enable patients to get effective therapies faster. And as we've evolved into the agentic AI space, we're taking a whole new kind of lens at that problem space because of what the technology unlocks in order to help solve that problem even more effectively.
[03:00]
A
Yeah, amazing. Okay. I'm a little bit familiar with just how many rounds and how long it takes and the money involved to run a clinical trial to get a drug approval. I know about this from the U.S. standpoint. Is this primarily a U.S. challenge for your customers, or is it global?
[03:18]
C
It is a global challenge.
[03:19]
A
Okay.
[03:20]
C
Yeah. So we support over 100 different languages, and we support clinical sites all over the world.
[03:29]
A
Oh, wow. Okay. Excellent. Yeah, we tend to have a global audience, so I always like to set the scope of what we're talking about. Okay, so now I understand you have an agent platform that's then driving a lot of other agentic products. Does one of you want to give me the overview of that product and then we can dig into how it came about?
[03:51]
B
Sure. The platform that we have allows users, common users, not just engineering users, to configure their own agents using agnostic models. So you can bring your own model. You can use any flagship model that's out there. It pairs with RAG knowledge. We have a breadth of MCP connectors to connect with different data systems. We support workflow functionality within Agent Studio, and agent skills, which is a common pattern that Anthropic has helped to develop to be more precise and manage context windows for agents. We also support multiple triggers. A lot of people are familiar with chatting with an agent, but that's not the only way you can interact with an agent. We focus on how you can interact with an agent through Microsoft Teams or through Slack, or how it can be triggered based off of a webhook or another system, without a human triggering it from the beginning. We have a couple of different deployment patterns for how we deploy Agent Studio. We're talking a little bit about ETMF today, but ETMF is an application that we built on top of the agent platform, and the agent platform is the engine for it. We also augment some of our more traditional products with agents. And we also allow our customers to build agents directly on our platform. So it's both a platform and an experience for our users. And in that case, we have a services arm that helps to provide, like, a forward deployed engineer to help build out agents to solve customer problems. So it's pretty broad, and we take a platform approach to every solution that we provide. We start with: does the platform capability exist for these solutions, so that the next solution that comes around will have that capability already baked in and easy for somebody to configure?
[05:37]
A
Yeah. Okay, so I heard two products that I want to dig into more. So one is this Agent Studio platform that is allowing your internal teams to build agentic products, but it's also enabling your customers to build their own agentic products. And then ETMF. What is the acronym?
[06:00]
C
Electronic Trial Master File. And we actually do have a third agentic-powered application, I would call it, which we call a CRA Agent. CRA stands for Clinical Research Associate. So that's really focused on a specific persona that we're developing the agentic application for. So do you want me to go into each of those and the problems that they're solving?
[06:25]
A
Yeah, and you said there were three. So I heard two.
[06:27]
C
Oh, I added to the agent platform.
[06:30]
A
Okay, gotcha. Yeah, yeah. Give me a quick overview of the two. So you have two products today that build on the agent platform.
[06:38]
C
Yeah, exactly. So you could build lots of agents right on the agent platform, just out of the box, leveraging the forward deployed engineers, or even if customers want to build their own. But the ones that we've been focused on are what we're calling agentic-powered applications, which actually have multiple agents powering them. Essentially the ETMF, the Electronic Trial Master File, is a system that has been around in the clinical trial business for many years now. What it does is allow you to store all the relevant documents for your clinical trial so that when the FDA or someone comes in to review that clinical trial, they're able to easily find all the relevant information associated with it in order to determine whether the trial was conducted properly. What we've discovered in our problem discovery is that today there are over 80,000 documents a year uploaded to this system just for a given customer. And it takes the users that have to upload these documents at least five minutes per document to assign the associated classification and metadata that allows the auditors, or the people that need to review the documents at the end of it, to more easily find them and do their analysis. So this was an obvious problem, we felt, for AI to be able to solve. Right? There are over 350 classifications that the user has to understand and assign associated metadata to. So leveraging AI to solve this problem just made sense, where we would be able to completely take the human out of that process of the classification and the metadata. Obviously we look to start with human in the loop, always, to ensure that we're feeling the trust and the accuracy of the agent results. But that's an overview of the first problem area that we're looking at with this agentic-powered application.
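
As a rough back-of-the-envelope check on the figures Jen cites (over 80,000 documents a year, at least five minutes each), the manual effort can be worked out as below. The two input figures come from the conversation; the 2,000 working hours per person per year used for the last line is an assumption added here purely for scale.

```python
# Figures stated in the episode (both are lower bounds: "over" and "at least").
DOCS_PER_YEAR = 80_000
MINUTES_PER_DOC = 5

# Assumption for illustration only, not from the episode.
WORKING_HOURS_PER_PERSON_YEAR = 2_000

total_minutes = DOCS_PER_YEAR * MINUTES_PER_DOC  # 400,000 minutes
total_hours = total_minutes / 60                 # about 6,667 hours
person_years = total_hours / WORKING_HOURS_PER_PERSON_YEAR

print(f"{total_minutes:,} minutes = {total_hours:,.0f} hours per year")
print(f"roughly {person_years:.1f} person-years of manual classification")
```

Under those assumptions, manual classification and metadata entry for a single customer comes to at least a few full-time people per year.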
[08:52]
A
Okay, great. And then do you want to give the same overview for the second product?
[08:56]
C
Yeah. A clinical research associate is someone, and there's lots of them, probably sometimes over 1,300 clinical research associates at a given sponsor, the people that are actually funding the trial, or a CRO. Essentially their job is to monitor the data coming through from the clinical research site and from the participants throughout the life cycle of the trial. To do this today, there are about 13 or more different systems where data is coming in throughout the life of the clinical trial. And so they need to navigate through all 13 of these different systems just to understand what's going on in the trial. Is the data of high quality? Are there any patient safety risks? And it's very time consuming for them just to get to the point of understanding the data because of the problem of having to sort through these 13 different systems. So what we've done, leveraging the agentic platform, is connect those 13 different data sources together, surface the data to them in a more easily understandable way, and then, one of the key things, provide recommended actions. And this again is where AI is suitable and is doing a better job than some of the legacy systems, as the legacy systems today can surface, like, a potential signal of something, but there's no recommended action on how to actually move forward. And the AI can actually go one step further and take the action for you on behalf of the human. Again, with that human in the loop.
[10:49]
A
Okay. And then just to give listeners some context, does one of you want to give, like, a high-level structure or outline of how a clinical trial works? I want to just give people a sense of the scale of time, the scale of people. Because Jen, you just described two very clear problems, but I want to also make sure people have the context to understand them.
[11:14]
C
Yeah. What's interesting is I think a lot of people think the development of the drug is the long tail, but it's actually not. It's the actual clinical operations. And so that's where we really focus here at Medable: trying to reduce the time for that clinical operation so that 10 years could potentially be reduced down. Our goal, our ambitious, big hairy goal, is one year. Essentially it starts from creation of a protocol, which is the rules of the clinical trial, and actually ensuring that the science team comes together and understands what they need to assess to determine if this drug is effective. And then it goes to sourcing a bunch of sites, clinical sites, to find participants and determine who can participate in the trial. They're recruited, and there's an enrollment process to determine if they qualify to participate based on the eligibility criteria. And then from there it's a matter of, depending on the protocol, the procedures. So if it's a vaccine, you get the vaccine and then you monitor the symptoms of that vaccine for typically around seven days, and then you monitor any ongoing symptoms from there. And all of this, because it is global and it's taking place across the world, needs to come together into one kind of streamlined data source. So one of the major problems today is that each of the different sites has a different kind of source system. Oftentimes it's paper still. And then they have people that they hire at the sites that actually look at the paper and re-enter that data into what is called electronic data capture, EDC, today. And then you have the CRAs, for example, reviewing the data. They'll go on site and they'll look at the source and they'll look at the EDC and they'll make sure that they're the same. So there are just so many manual things that still occur today in this industry that technology in general can help solve. But we're really seeing a lot of cool new opportunities with our agent platform as well.
[13:33]
A
Yeah, this helps. So when we talk about your ETMF product, you're talking about tens of thousands of documents being entered over 10 years and making sure that all those documents end up with the right metadata. And when you're talking about the CRA product, it's really helping, across a 10-year period, to make sure that data coming in is getting to the right place and being seen by the right person.
[13:59]
C
Yeah. And you can imagine if you find an issue a month after it's been entered, no one remembers the reason that issue occurred in the first place. So proactive data monitoring, being able to find issues when they happen and not one month later is super valuable.
[14:18]
A
So you mean, like, a symptom, and then they need to follow up with more information. You can't do that a month later. Yeah, yeah.
[14:25]
E
I also think it's interesting to note the actual scale of the documents. There are single studies that produce tens of thousands of documents per month. It's not tens of thousands of documents over 10 years, it's tens of thousands of documents a month per study. It's a lot of documentation.
[14:43]
A
Wow, okay. I'm hearing clear customer problems. Like, I know that clinical trials are extremely expensive and a lot of it, it sounds, is because of this administrative cost and just the volume of data and having to get it right. One thing I'm really curious about is how did you decide to start with an agent platform? First of all, is that where you started? Did you start building one of these agents and then realized there'd be more? Tell me a little bit about just the beginning of this platform. How did it come about? How did you identify a platform was the right approach?
[15:20]
B
So I think we saw that there was this human, like, capacity limit to the problems here. And we'd been trying to tackle this for a number of years prior to adopting agents and AI ourselves in our own operations. And it seemed like a complete natural fit to be able to look at this and say, look at this high cognitive load that we're putting on these individuals. They have to learn about a protocol. This protocol could be over 200 pages of material, scientific evidence and safety and guidelines that tell them how to do the job. And they have to be experts at knowing that document. And there are hundreds of these people supporting this across sites globally. And we look at a problem like that, and that type of problem is really well suited to bringing RAG knowledge to bear for a protocol and providing easy answers, versus having to depend on a CRA who helps to be the arbiter of asking questions and getting information from a handful of scientists that are supporting the clinical trial. So that's number one. Number two is really this messy data problem. There's data in multiple systems and you need to move that data around into very structured formats. And that part is painful. And with the advancement of AI, it does that kind of stuff really well. And people do not. I would say there's a huge accuracy problem that leads to all this burden, which agents help to address. So it was a natural fit for us to say AI is the right way to solve this problem. But we've been a SaaS platform company from the very beginning. We've always approached our solutions from a platform perspective: we build isolated environments for our customers, and we are the data processor for their environments. And we wanted to follow that same practice with the way that we build agents. We want to build agents that are purpose built for our customers. They're not going to be all out of the box; they're all going to have their nuances. Every trial is different, every customer is different. It allows them to manage those isolated concepts. And as we build on the agent platform, we're able to accelerate our ability to deliver more solutions much faster. A key part of that is that it feeds into our product development life cycle.
[17:31]
A
Yeah, I love that you're building a platform that you're both using internally and exposing to your customers to use for their own needs as well. I think that's just a win-win. I also love this mindset of, okay, all of our customers have really bespoke needs, and I could see how in clinical trials that's definitely the case. And so you already had developed this platform mindset, and when AI came along you were like, okay, let's take the same platform approach. What's that platform component that enables these types of solutions? I'm curious about your background as a team. Did any of you have an ML background? How did you get up to speed with "let's build an agent platform"?
[18:14]
E
I don't have a machine learning background. I started in academia, around biomedical engineering, and then moved on from that after my postdoc work. I did work just generally in social media, but then brought it back to life sciences. So primarily my background is in engineering and life sciences and things like that. Machine learning was something I had touched on in my research, but not directly. And so it was nice to have come around to it, to see this growth and this kind of surge in this different approach to AI. And I was fascinated by it. You couldn't keep me away from reading and learning and trying to do things with it and experimenting with it. And especially at Medable, the environment for that kind of experimentation was incredibly encouraging. Right? Because we wanted to see what we could do, and we wanted to understand the limits and levels of what we could provide to our customers with this. And it required a lot of experimentation, but always with that eye. I think, as Luke said, we are a platform company and we wanted to provide solutions that would allow our customers to solve the problems that they had. We would provide solutions to them, but they also could provide solutions for themselves through our systems. And so that was what brought me here. That was my experience.
[19:30]
B
Fikra was a very early adopter in us advancing our AI and agentic capabilities because of his researcher mindset. So I'm just pointing that out. I did not myself come from machine learning. A lot of it is self-learned. Like, I think, everyone in the room here. You can't even read books about the stuff that's being advanced right now. You've got to stay on top of the latest papers and the latest YouTube videos to see what was released yesterday, to learn how quickly you can adopt it and learn about it. It's a different way to learn about something than the traditional SaaS platform technologies, which is where my background is primarily.
[20:03]
C
I did work in SaaS technologies in the early part of my career with GE Digital. They acquired a company called Wise.io, who were actually big experts in the machine learning space. They hired a lot of really smart data scientists. And essentially what we were doing was more like the industrial internet of things. So we were mining through gas turbine data, windmills, and we were making recommendations, more to GE internal users, to be able to say, hey, it looks like there's a problem with this piece of equipment and this specific aspect of the machine, and providing a recommendation to the user as to how they should act on that, and then having that sort of continuous learning and feedback mechanism to continuously improve. But it definitely feels like a different world here with the agentic platform. Sweet.
[21:02]
A
Yeah. I think what's pretty fun about this moment in time is there's just so many teams that are in similar situations. They see the potential of this new technology and everybody's just diving in and trying to figure it out. And it's actually why this podcast exists. It's just how do we help level everybody up by sharing stories and what's working and finding patterns and it's a lot of fun. Matt, is there anything you wanted to add?
[21:25]
B
Sure.
[21:25]
D
I look at it as I'm learning on the fly as well. And as a designer, I'm always looking for different tool sets to tell the story, and things like agentic and AI and vibe coding have really taken center stage in our process, and that's allowed me to learn pretty fast. Our team is pretty small and nimble, so that's a good thing to have in our toolbox. And yeah, just continuous learning on it.
[21:55]
A
Excellent. Okay. And then Give me a sense for when did you start working on the Agent Studio?
[22:04]
B
I want to say it was about two years ago. Is that right, Fikra?
[22:08]
E
Yeah, to different degrees. About two years ago we would have started. The Agent Studio software we have today, the specific thing, I'd say it's probably about a year since we really started building. So yeah, depending on how you count, about two years.
[22:25]
A
Okay. The reason I ask this question, for listeners: we are recording in February of 2026, and I feel like, Luke, when you first described the platform, it sounds like almost any agent harness that exists today. Right? You've got skills and MCP connectors, and today these are everyday things that everybody's playing with, and there's a million SDKs. But it sounds like when you started, almost none of that existed. So this was very much: we're going to build this ourselves because it enables all the rest of this functionality we can see down the road. So I want to dig in a little bit, because you have a platform that you're then building a product on top of. Maybe we can look at these in parallel: how does the agent platform work, and then what does it enable in one of those two products, so we can get an understanding of both the platform and what it's unlocking. How does that sound?
[23:25]
B
The way it works? Yeah, that's a good question. I'm going to work backwards from one of our apps, eTMF or CRA; that'll help it make more sense. Those are not just single agents; those are ecosystems of agents, with orchestrating agents. There's also a front-end experience, and there are databases that are not necessarily agentic that are also plugged in. Those are more agentic-powered applications. But if you work back from that, the smaller components that make it up are built with very specific jobs. So you might have an agent that just focuses on: I'm going to be really good at classifying this document for eTMF. The way you would build that agent is you would make sure we've ingested the right knowledge, the document repositories that are representative of what we'd want to classify against. You'd create that agent and add that knowledge to it, to say this is the knowledge that's relevant to this agent. You'd configure the model parameters, like which model do we want to use, and define a system prompt. And we do everything in a versioned way. Every time you create a new version of an agent, you work through a draft process, build out evaluations, and then publish it and share it out with the end users. For an agent that's supporting one of these applications, the end user is really that app; it's not a user going and working on this specific agent. But you're able to use it in many different ways. You can interact directly with that agent or sub-agent, or you can work with it through the application.
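Editor's sketch: the draft, evaluate, publish cycle Luke describes could look roughly like the following. This is a minimal illustration, not Medable's implementation; the class, field, and function names (`AgentVersion`, `new_draft`, `record_evals`, `publish`) are all hypothetical.

```python
from dataclasses import dataclass, replace
from enum import Enum


class Status(Enum):
    DRAFT = "draft"
    PUBLISHED = "published"


@dataclass(frozen=True)
class AgentVersion:
    name: str
    version: int
    model: str                      # which model the agent calls
    system_prompt: str
    knowledge_sources: tuple = ()   # document repositories attached as knowledge
    status: Status = Status.DRAFT
    evals_passed: bool = False


def new_draft(previous: AgentVersion, **changes) -> AgentVersion:
    """Any change yields a new draft version; the previous version is untouched."""
    return replace(previous, version=previous.version + 1,
                   status=Status.DRAFT, evals_passed=False, **changes)


def record_evals(agent: AgentVersion, passed: bool) -> AgentVersion:
    """Attach the evaluation outcome to this exact version."""
    return replace(agent, evals_passed=passed)


def publish(agent: AgentVersion) -> AgentVersion:
    """Publishing is gated on evaluations having passed for this version."""
    if not agent.evals_passed:
        raise ValueError("cannot publish a version whose evaluations have not passed")
    return replace(agent, status=Status.PUBLISHED)
```

Because each version is immutable, a published agent keeps serving its application while a new draft is being evaluated.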
[25:01]
A
Okay, so I'm a heavy Claude Code user, and as you were describing that, I can imagine: I've got my agent .md file, I'm picking my model, I'm giving it instructions. I know that might work for you internally, building agents for your other products. I suspect it doesn't work for your customers, who don't really tinker with things like Claude Code. So maybe this is a question for Matt. Are your customers actually creating agents and then building their own solutions that involve multi-agent systems? And if so, how are you making that easy for them to use?
[25:39]
D
Yeah, it's a pretty hard task to accomplish. I wouldn't say that all of our customers are building their own agents. If you use the example of the CRA agent, through our research we found that for the most part we want the agent portion of that to power what they see on the interface.
[25:59]
B
Right.
[25:59]
D
Having the ability to adjust things like when a trigger happens or how often you get notifications, for example, is more on the human-readable side. You're not getting too much into the technical implementation of it. So it was a challenge to balance that with the output of what they actually see on the UI.
[26:25]
B
I think what you're pointing to is there's a different way that people work now when it comes to working with AI and agents, Claude Code and Cursor, and it requires a different type of mindset. Not all of our customers and users have that mindset. That's why we have two deployment mechanisms when we provide direct access to Agent Studio for our customers. We can say, hey, we are there to be your sherpa; we're there to guide you through the process, to help you adopt this culture change and build out the agents that support your business need. But we also allow you to do this yourself once you've jumped on this journey. We had a recent example where we gave an onboarding to one of our customers, a user who's maybe more agent-aware, more AI-aware. He's probably spent some time using n8n or Cursor, and I don't think he's done Claude Code. He had some context, but he wasn't deeply clinical; he was mostly on the business side. We gave him an onboarding of about 40 minutes, and he stepped away for a month and came back and built his own ecosystem of agents. He envisioned it in a different way than we had shown him. He said, I actually want to bring this value into Microsoft Teams, because that's where my users are, and I want to find a way to have them interact with Teams and work with this ecosystem of agents that use the connectors and data that we provided for him. Within that instance, he was able to do this within a 70-minute session. He built his own agentic system. He didn't find any bugs, and there's no bugs. But he did it all just off of guidance that he had heard three or four weeks before, and he did it through the user experience, which I think was big. It speaks a lot about how easy it is to use once you have that right agentic mindset.
[28:02]
A
I just had this analogy in my head. It's almost like you're building LEGO blocks. In some ways you're shipping products for them, where you've said, here's your LEGO kit, I've already put it together for you, you can just use it. In other ways you're saying, here's the LEGO blocks, make whatever you need, for the people who are those kind of builders and want to customize. And then maybe there's even this middle ground: when we give you the kit, Matt, to your point, we're starting to look at what are the things you might customize. You can pick the color, you can pick how big the thing is, and we'll make some smart decisions for you so you don't have to be the builder, but you can still tweak it. And you're offering that full range. Is that a fair analogy?
[28:41]
D
Yeah, I think, yeah, exactly. It's that platform-first approach. We had started with a lot of specific use cases, but to be able to scale it, and to have a large user base be able to use it in the same way and get the same value, that's what the challenge was.
[29:03]
A
Yeah. Okay.
[29:04]
C
We also have different customers, and we have different personas at the customers. So you might have one of the larger customers who has a big technology organization, and they want to leverage the platform in the way that we've described for that one specific user. But any of the smaller customers might want an out-of-the-box agentic experience that's hosted by Medable and maintained by Medable. But I do think there is a next paradigm, which is how do we enable what I think of as an AI, I don't know, admin is maybe not the right word, but someone who is able to take recommended improvements from the agent while still keeping that human in the loop: oh yeah, I like this improvement that you recommend we make, and add that into the agent-powered application.
[29:56]
A
You mean a process improvement, like maybe how the agent is defined? So a self-improving agent?
[30:03]
C
Exactly.
[30:04]
A
Yeah, yeah. It's just so cool to see where things are headed. Okay, so let's get back to Agent Studio and the components. So you can create agents. I already heard earlier that you support the skills protocol. Is it literally the same skills protocol that Anthropic is popularizing?
[30:24]
B
Okay, absolutely. We even have a feature and function that's built around their create a skill.
[30:28]
A
Okay, okay.
[30:29]
B
So you can generate a skill on platform that aligns.
[30:32]
A
So even though you started long before these tools were available, your platform has evolved to integrate the latest and greatest as more agentic functionality has become available.
[30:42]
B
And the intent is to follow these same standards so that it helps solve the onboarding problem for anybody who comes to our platform with some familiarity with MCPs or skills, and to implement them in a platform way so that these things are potentially interoperable. You could say, I created this skill over here, I'm going to load this skill into Medable Studio, Agent Studio.
[31:02]
E
Prior to that, we were building systems to allow us to integrate generic tooling into our agents some time ago, before the MCP protocol. When the MCP protocol came out, again from Anthropic, that was something we got on board with and adopted. It allowed us to move a lot faster in our tool development, allowed us to integrate other systems, and allowed us not to be the only ones who could add to that, which was great. We had started doing something similar, trying to platform how our tools would integrate with our agent platform. We saw this protocol, it allowed us to accelerate what we were doing, and we adopted it. It was great to be able to see these advancements and take them on board as they become useful.
[31:41]
A
The next area I want to dig into: several times it's already come up, just the volume of data that you're working with. And I think, Luke, you've mentioned RAG steps. We're talking about MCP connectors. Luke, you already mentioned that when you're defining an agent, you have to look at what context it needs. I feel like this problem of large data retrieval, getting the right context to the agent, is not an easy problem. And I'm really curious: it sounds like you're connecting to other services via MCP, but I suspect you're also designing MCP servers. And I feel like there's just this whole art we're learning around context window management, retrieval, even just tool design in MCP servers. Is there stuff you want to share there about what you're doing and how you're making this work with such large data?
[32:34]
B
I think the thing that we're working on, and we're active in development on it, is building out an AI data layer that helps to align all of these systems into a common ontology so that it's much easier to work with. And I think the balance we're trying to strike is: how do we vectorize the right data when it's static, and how do we defer to MCPs to call data just in time? And there's a little bit in the middle of that where we need to determine the best way to manage the context window going forward. There's a lot we all need to learn, I think, about that process. I don't know if you want to add to that, Jen or Fikra.
[33:17]
E
Yeah, I think what Luke says is correct. As you identified, there are so many different ways of doing this. We use the phrase RAG a lot, and a lot of people tend to align it with vector DBs and embeddings. So we use that. We use it in a simple way: we allow our customers to create simple, identifiable little vector pools that they can attach to an agent. So you've got a bunch of documents, you can dump them in here, connect it to your agent, and now you've got an expert on something. That's a simple use; they could create one simple agent to do that. But we also have much more complex systems where you're trying to build things into data hierarchies. You start off with low-level pages, and you're talking about layers of summarization and how the agent works its way down through it, or how we access that data. And you've got things like agentic RAG. So, I don't know, take your pick. If you want, just give it a bunch of markdown files and say: here's some high-level markdown, what does it describe? Work your way through it, find the information you want, come back to me. Those are different approaches, and there's just no one right way of doing it. Everything we've tried has been good in some way and bad in another. So we've tried to broaden out our data access. And this is what the data layer, as we describe it, essentially is: a collection of how we bring our data together into one unified portal so the agents can get at it best.
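Editor's sketch: one way to picture the "layers of summarization" approach is a tree whose inner nodes hold summaries and whose leaves hold the low-level pages, with the agent descending toward the most relevant leaf. The word-overlap score below is only a stand-in for whatever relevance judgment (an LLM call, embeddings) a real system would use, and every name here is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    summary: str
    children: list = field(default_factory=list)
    page_text: str = ""   # only set on leaf nodes (the low-level pages)


def overlap(query: str, text: str) -> int:
    """Toy relevance score: number of shared lowercase words."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def drill_down(root: Node, query: str) -> str:
    """Follow the best-matching summary at each layer until a page is reached."""
    node = root
    while node.children:
        node = max(node.children, key=lambda child: overlap(query, child.summary))
    return node.page_text
```

Only one summary per layer enters the context at each step, which is the appeal of the hierarchy over loading every page at once.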
[34:38]
B
It seems like the pattern that we're chasing here is not unlike what we did with skills, or what Anthropic did with skills. We're trying to chase after: how do you remove unnecessary context and only provide the right context, so that you can get really accurate and effective responses?
[34:58]
A
Yeah. So am I understanding right that this data layer is meant to be, as Fikra kept saying, like a portal? I know a lot of teams have embeddings retrieval steps, they have keyword search retrieval steps, maybe they have, like you mentioned, grepping or searching through markdown files as retrieval steps. And then they're creating almost an agent that sits in front of that. Based on what is needed by the retrieving agent, that agent chooses the right tool and returns the data. Is that kind of the idea behind this data layer?
[35:33]
C
Yeah, and the way I see it, it's similar. When you create that connector, that connection to the data source, you essentially leverage AI to map it to an ontology that is pre-built. You might need some human in the loop there for anything that the LLM doesn't do or isn't as certain about. But once you have that mapping to the ontology, you can proceed with the steps that you mentioned, querying that data source more accurately and in a human-understandable way, so that you're not grabbing the data and trying to understand it as you do the query. You already have that layer that allows you to understand it.
[36:20]
A
Yeah. Help me understand what the ontology step is doing. What's the benefit of having that ontology?
[36:25]
C
Because we are working across so many different systems, and a lot of the same concepts are used across these systems, but under a different alias, let's say. So a participant is a common term used throughout a clinical trial, but every system will have a different word that means participant. So as we're connecting to these data sources, we need that common layer that allows us, and allows the LLM, to understand: here's what a participant is in this data source.
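Editor's sketch: the alias problem Jen describes can be illustrated with a small mapping from source-specific field names to one canonical term. The source names and field names below are invented for illustration and do not come from any real system.

```python
# Hypothetical per-source field names; real systems will differ.
ONTOLOGY = {
    "participant_id": {
        "ecoa": "subject_number",
        "edc": "patient_id",
        "etmf": "subject_ref",
    },
}


def to_canonical(source: str, record: dict) -> dict:
    """Rename source-specific fields to their canonical ontology names."""
    rename = {aliases[source]: canonical
              for canonical, aliases in ONTOLOGY.items() if source in aliases}
    return {rename.get(key, key): value for key, value in record.items()}


def join_on(canonical_field: str, *tagged_records) -> dict:
    """Merge (source, record) pairs that share the same canonical field value."""
    merged = {}
    for source, record in tagged_records:
        row = to_canonical(source, record)
        merged.setdefault(row[canonical_field], {}).update(row)
    return merged
```

Once every source is expressed in the canonical vocabulary, joining data across systems no longer requires the agent to rediscover each system's naming in its context window.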
[36:58]
A
Oh, so it's like a mapping. You have lots of data sources, and you have an agent that might be using any number of those terms. How do you make sure that when you retrieve from the different sources, you're using the right language?
[37:13]
C
Yeah. So when we talk about the 13 different systems that a clinical research associate uses, obviously as a human you learn those things over time. But to enable AI to help that human, to join the data across these systems and make it effective for the human, you need that common layer.
[37:34]
A
Yeah. Okay, this is great. And I really want to dig into this, because I've had so many episodes where a number of the things you're touching on have come up: these data layers, really just how do we structure data to make it useful for the LLM, and challenges around retrieval. I think what I like about what you're building is that you're building this Agent Studio platform, so it has to work in a lot of different environments. You're not building a retrieval step for one very specific purpose; you're trying to build retrieval enablement across a wide variety of use cases. So I'm curious how you decide. If you're building an agent in your system, do you have all these building blocks, where you could choose embeddings, you could choose markdown files, you could choose this other system? And then, Fikra, I heard you say you're giving that to your customers; they can create a RAG step. What I'm curious about there is: are you telling them to create an embeddings step? Do they choose that? Do they have the knowledge to choose embeddings over markdown? This feels very complex. This feels like a very engineering decision. So I'm curious about the platform approach, how you're enabling this across all use cases, and I'm also curious about what you're exposing to customers and how you're helping them make good decisions around this.
[39:09]
B
So we have a couple of things that can help with that process, but you're pinpointing one of the problems that we're actively working to solve. People will look at a new agent and think: okay, I need this agent to do this thing. Am I going to use the system prompt to guide everything that agent's going to do? Am I going to build a skill and attach that skill? Does that mean I'm not going to put this part in the system prompt? Am I going to put some of this in the knowledge base? These are all questions that come up for our end users. And where we want to get to is a place where the users don't have to make that decision all the time. You describe the problem that you're looking to solve, and we propose the right recommendation for the configuration of an agent, to the point of actually building that agent for you with our agents. It's like agents building agents. So I think that's where we want to get. But while we're getting there, we're trying to make sure we're grasping these customer problems first. And we have all of these core components available. So I would say right now a lot of the solutions we're building for customers probably have more of us involved, making sure the solutions are working and providing feedback on their experiments on the system. But I think it needs to evolve from there for sure.
[40:21]
A
I'm almost imagining you might need, like, rules. If your data is structured this way, if it lives in this form, then we're going to ask you some questions about your data and we're going to recommend: oh, that's good for embeddings, or oh, that's good for keyword search, or oh, that's good for unstructured, maybe markdown. Because it seems like every team I talk to, for their specific use case, goes through these cycles. Everybody starts with embeddings because it's the sexy RAG step. Then they realize that's not really the right solution for them and they jump to the next thing. It's almost like, as an industry, we're discovering these rules: depending on the nature of the data, this is the right way to retrieve it. But it seems like for your platform, in the long run, you almost have to codify these rules so your customers don't have to think about it.
[41:14]
B
I think you're probably going to go here too. If we're focusing on the outcomes of what these agents should be doing, we're trying to evaluate whether the agents are actually accomplishing those needs with the configurations that we've provided. And we have something built into the platform that allows us to evaluate various configurations. So you could build a set of evaluations and benchmark based on different configurations. Maybe I want to use this one with a model from GPT and this other one with a model from Claude. Or maybe you want to change the system prompt altogether. Maybe in one you want to use a knowledge embedding versus some other mechanism. So I think we're creating all these tools that allow you to experiment and land on the right outcome. And I think you're pinpointing correctly that we need to have something that helps to close that gap and make it a lot easier for the users.
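Editor's sketch: benchmarking several configurations against one evaluation set can be reduced to scoring each candidate on the same cases. Here each configuration is represented by a plain callable standing in for "run this agent with this model, prompt, and knowledge setup"; exact-match scoring and all names are simplifying assumptions, not the platform's actual method.

```python
from typing import Callable

EvalCase = tuple[str, str]   # (input, expected output)


def benchmark(configs: dict[str, Callable[[str], str]],
              cases: list[EvalCase]) -> dict[str, float]:
    """Return the fraction of cases each configuration answers correctly."""
    scores = {}
    for name, run_agent in configs.items():
        correct = sum(run_agent(question) == expected for question, expected in cases)
        scores[name] = correct / len(cases)
    return scores


def best_config(scores: dict[str, float]) -> str:
    """Pick the configuration with the highest score."""
    return max(scores, key=scores.get)
```

Holding the cases fixed while varying one thing at a time (model, system prompt, retrieval mechanism) is what makes the resulting scores comparable.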
[42:03]
A
Yeah. I think building a studio, like an agent studio, while also building your own products helps create this awareness. I feel like so many teams are focused on the specific use case they're building for, but because you're trying to build the platform, it's almost like you're getting a step ahead on what the generalized rules are, which is pretty cool. All right, I could nerd out about this forever. I want to talk a little bit about how you're using MCP. Is this primarily to connect to third-party services? Are you creating MCP servers for your customers? Tell me a little bit about what you're doing there.
[42:41]
E
We're building most of our own MCPs. We're using a couple of third-party ones, but we're building most of them ourselves. We've built a layer around it that manages our own auth systems. Essentially, our internal authentication mechanism is wrapped around all the MCPs. So all our MCPs, when they're invoked, pass through our auth mechanisms, and then those services allow us to retrieve credentials that our users can add. If they want to access a third-party system that they have, they add their credentials to our systems, and so they will only ever get the access to those systems that they already have themselves. We don't do super-user access to external systems. When somebody is using one of our agents, they are accessing third-party systems as themselves from a credentials perspective. So at that stage we've taken the MCP system and layered our own auth on top of it, to make sure that all our authentication, and everything where we're passing credentials around, is managed by ourselves. So MCP turned out to be, for us at least, a very convenient protocol for calling these tools. And we just built on top of it to help manage all of it and make sure it was all very secure.
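Editor's sketch: the "no super-user" rule Fikra describes means every tool call resolves the calling user's own credential before reaching the external system. The vault and wrapper below are hypothetical and use an in-memory dictionary purely for illustration; they are not part of the MCP specification or of Medable's code.

```python
class MissingCredential(Exception):
    """Raised when a user has not added a credential for a system."""


class CredentialVault:
    """Stores the credential each user added for each external system."""

    def __init__(self):
        self._store = {}

    def add(self, user_id: str, system: str, token: str) -> None:
        self._store[(user_id, system)] = token

    def get(self, user_id: str, system: str) -> str:
        try:
            return self._store[(user_id, system)]
        except KeyError:
            raise MissingCredential(f"{user_id} has no credential for {system}") from None


def invoke_tool(vault: CredentialVault, user_id: str, system: str, tool, **arguments):
    """Resolve the caller's own credential, then call the tool with it."""
    token = vault.get(user_id, system)
    return tool(token=token, **arguments)
```

With this shape, a user without access to a system simply cannot reach it through an agent, because there is no shared credential to fall back on.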
[43:56]
A
So it sounds like you're using MCP for internal retrieval. I know, Jen, you mentioned 13 sources. Some of those are internal sources, and you're building MCP servers for each of those. Some of them might be third-party, outside-the-building sources, and you're using their MCP servers for those. Is that an accurate picture?
[44:16]
C
Yeah, exactly. So we have what we call an eCOA platform, which I mentioned, that collects patient questionnaires throughout the trial as well as consent. So that one is internal data. But we also have systems that are external, like the electronic data capture or the TMF that we've talked about as well, where we're leveraging the external systems and the available documentation that they have to create the relevant MCP.
[44:46]
A
Okay, so one challenge I see come up a lot, and actually these might be two different challenges I see come up a lot with MCP: it's 13 external sources, so that's a lot of MCP servers. I'm assuming they have a lot of tools. So already in the back of my head I have a concern about context window bloat. And then the second challenge that comes up a lot with MCP servers is just the design: are they designed to be token-efficient, and what are you doing to ensure that? So, any things you've learned as you've designed MCP servers, and how you're managing that context window, that other people could learn from?
[45:23]
B
One of the things that we've learned is that when you're building MCPs for some of these external tools, you're at the mercy of the other vendors' API mechanisms and their wrappers. If they don't have a strong and robust mechanism for querying their data, we're going to have challenges with that too, and the agent will have challenges with it. And if you just build an MCP from scratch and say, hey, go in and do stuff with that data in that other system, and we don't understand the structure of the data in that system, you're going to spend a lot of the context window going back and forth trying to figure out the right query to get access to the data. So we've had a couple of different patterns that we've explored. The data ontology will definitely be a big part of this. But right now we've used, for instance, agent skills to help prime MCPs when we set them up for a new customer. For context, we set up an MCP server and then we create instances of that server that can be deployed per customer. So once you create one for a system, any customer can configure their own with their own security, client ID and secret, that type of thing, and their own credential mechanisms. But then we also augment that with a skill. We build a skill based on knowledge of that system that says: whenever you use this specific connector, apply this skill, use this query structure, which helps navigate the system in a much more effective way. And when it comes to using 13 systems, I would advise against building an agent with all 13 systems in that one agent. I would probably use a sub-agent structure where you're managing a bunch of smaller context windows.
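Editor's sketch: the pattern of one connector definition, many per-customer instances, plus a skill that primes the agent can be illustrated as below. The types, the bracketed context layout, and the sample skill text are all invented for this example and should not be read as the platform's real format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConnectorTemplate:
    system: str    # e.g. "edc"
    skill: str     # query guidance written once from knowledge of that system


@dataclass(frozen=True)
class ConnectorInstance:
    template: ConnectorTemplate
    customer: str
    client_id: str   # each customer supplies their own credentials


def instantiate(template: ConnectorTemplate, customer: str, client_id: str) -> ConnectorInstance:
    """Create a per-customer instance that shares the template's skill."""
    return ConnectorInstance(template, customer, client_id)


def build_context(instance: ConnectorInstance, task: str) -> str:
    """Prepend the connector's skill so the agent starts from a known query pattern."""
    system = instance.template.system
    return f"[skill:{system}]\n{instance.template.skill}\n\n[task]\n{task}"
```

The skill is written once per external system, so the agent does not spend context rediscovering the query structure for every customer.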
[46:47]
A
Okay, so no agent probably has all 13 MCP servers, but you might have task specific agents that have one or two. And then you're expanding the context window that way.
[46:59]
E
Yeah, and just to add to that: from a user perspective, when you're configuring your agent and adding your MCP server, you can filter the tools manually. You can say, I have an MCP server with 20 tools in it, but I know this agent's only going to want three of them, and those three will be the only tools that appear in the context for that agent. So from a practical, manual perspective, that's one way. On the other side, when we get over a certain number of tools attached to any single agent, we also have an automatic sub-agent that takes the request and the context, looks at just the tools, and passes only what we believe to be the valid tools to the main agent, so it can decide which tools to use. So we try to pre-filter the tools if they go above a certain number, if it's going to start leading to context rot, just too much in there. We pre-filter with a smaller sub-model that filters them for us. All of these things work very well in different circumstances. And sometimes that works up to a point, and then we do have to go in and say, look, there are still too many tools, we need to filter manually. But we have a lot of options, put it that way.
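Editor's sketch: the two filtering stages Fikra describes (a manual allowlist, then an automatic pre-filter once the tool count passes a threshold) might look like this. The threshold value is an assumption, since no number is given, and the word-overlap ranking is only a stand-in for the small model that does the filtering in their system.

```python
MAX_TOOLS = 5  # assumed threshold; the real cutoff is not stated


def manual_filter(tools: dict[str, str], allowed: set[str]) -> dict[str, str]:
    """Keep only the tools the agent builder explicitly selected."""
    return {name: desc for name, desc in tools.items() if name in allowed}


def prefilter(tools: dict[str, str], request: str, limit: int = MAX_TOOLS) -> dict[str, str]:
    """Pass everything through under the limit; otherwise keep the best matches."""
    if len(tools) <= limit:
        return dict(tools)
    words = set(request.lower().split())

    def score(item):
        _name, description = item
        return len(words & set(description.lower().split()))

    ranked = sorted(tools.items(), key=score, reverse=True)
    return dict(ranked[:limit])
```

Both stages reduce what reaches the main agent's context; the manual one is fixed at configuration time, while the automatic one depends on each request.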
[48:11]
A
Yeah. Okay, so I like the fact that, first of all, when you're configuring the server at the customer level, they can say, I don't need all of these tools. It makes me wonder: are they capable of that decision? Do they know enough about tools to make it? Or is there a way for you to talk through their use cases and figure out what tools should be configured? That feels hard. Let me pause there and let you react to that.
[48:35]
E
One of the things we try to do in our UI is provide an agent in our agent-building screen. So users can talk to an agent about the agent they're building, and that agent can try to provide them with enough information to make the harder decisions. That's something we are experimenting with as well. But you're right, a lot of the customers may not have the expertise to understand which ones to choose, but we give them the option to. And that's where our forward-deployed engineers help people make those kinds of decisions.
[49:13]
A
Yeah, I've talked to a few companies that have taken this platform approach. They basically sell digital workers or customer service agents, and they're trying to build these UXs where their non-technical customers are configuring agents. It's such a fun UX space to play with: how do we make this accessible while we're still learning it ourselves? It's just wild. I love it. Okay. And then it also sounds like there's this filter where, when you set up the agent or the MCP server, you can already filter the tools at that step. But then, Fikra, it sounds like there's also this double check: if an agent gets to a point where the tool list is just too long, there's an intermediary agent that may filter it and say, you just need these tools. So there's agents all the way down, is what I'm hearing.
[50:03]
E
Yeah, essentially that's it. We try to use these tools at every point because of how good they are at certain things, like understanding the description of a tool and telling you which one is good or bad. That's something that can be done by a small, quick model that you can stick in the middle, one that isn't going to increase your latency too much, which is something you have to be careful of. We're getting more and more used to slower responses and these things not being instantaneous. But you still have to keep in mind that people are going to get impatient. These things need to respond quickly, so you can't push in too many inter-agent LLM calls in the middle of things while a user is waiting for something to happen. You have to be conscious of it.
[50:47]
A
I imagine latency and cost are both a concern there. Okay, then let's get into evals. Tell me a little bit about how do you know any of this is working?
[50:58]
B
We have an evaluation mechanism built into the platform that allows us to define probably a lot of what you've commonly seen with evaluation harnesses. But we tie it to our benchmarking mechanisms, which allow us not only to use these evaluations to test the accuracy and success of an agent's response, and whether the responses are relevant; we're also evaluating whether we're calling the correct tools and calling them in the right order. And we're using that as a mechanism in the deployment process of agents so that they can be validatable in the clinical trial industry, to be able to say: do we have the right configuration to support this? And have we considered the multiple options that we could use to meet a better performance metric with these evaluations? I would predict your next question is, how manual is this process of creating these evaluations? And I think that there are tiers. There are evaluations that are potentially global, that we provide out of the box, that support what we would expect out of an agent. There are evaluations that are probably very specific to a customer, to specific rules they're trying to enforce. So if they have a specific rule, like I want to be notified when 10 of these records exist, you might want to generate an evaluation that checks for that at that point in time. Right now it's manual, but we have active development underway to generate some of these things based on the rules that we're aware of from our customers and our agents.
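Editor's sketch: checking "the correct tools, in the right order" alongside the answer can be expressed as a subsequence test over the recorded tool calls. This assumes extra tool calls between the expected ones are acceptable, which is a design choice made for the example rather than something stated in the conversation; the function names are hypothetical.

```python
def is_subsequence(expected: list[str], actual: list[str]) -> bool:
    """True if the expected tools appear in `actual` in the same relative order."""
    remaining = iter(actual)
    # `tool in remaining` advances the iterator, so order is enforced.
    return all(tool in remaining for tool in expected)


def evaluate_trace(expected_tools: list[str], actual_tools: list[str],
                   expected_answer: str, actual_answer: str) -> dict[str, bool]:
    """Score one agent run on tool selection, tool order, and final answer."""
    return {
        "correct_tools": set(expected_tools) <= set(actual_tools),
        "correct_order": is_subsequence(expected_tools, actual_tools),
        "correct_answer": expected_answer == actual_answer,
    }
```

Reporting the three checks separately shows whether a failing run picked the wrong tools, sequenced them badly, or reached a wrong answer despite a sound trace.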
[52:27]
A
I can imagine you have to eval at the Agent Studio platform level, but then you also might have to eval at the eTMF level. Did I get that right? Yep. Okay. And at the CRA level, right? You probably have evals there. And then I'm really curious about what you do for your customers that build agents. Do they have evals? So maybe you could just talk about how you think about what goes at what level. What are you evaluating, and where?
[52:57]
B
I think part of it is just from the agent perspective. And I'll pass to Jen in a second to talk about what we did for eTMF, because I think that's a really compelling example. But we have the evals as part of the agent development process that precedes the publishing of that agent out to its end users. So it's a key stage gate that we go through to help make sure that we can build agents that are validatable and GxP compliant for our customers. And then with evaluations, they're not just fire and forget, like, we've launched the agent, we've shared it out with people. We also run them on a regular frequency to determine whether we're getting different responses over time that we need to monitor. Then when it comes to the eTMF agent, I'll pass to you, Jen, for that.
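Editor's sketch: the two uses of evals mentioned here, a stage gate before publishing and a recurring re-run afterward, could look roughly like this. It is a minimal sketch, assuming each eval case reduces to pass or fail. The function names and both thresholds are hypothetical, not values from the conversation.

```python
def pass_rate(results: list[bool]) -> float:
    """Fraction of eval cases that passed."""
    return sum(results) / len(results) if results else 0.0


def can_publish(results: list[bool], min_pass_rate: float = 0.95) -> bool:
    """Stage gate: block publishing unless enough eval cases pass.

    The 0.95 default is an assumed threshold, for illustration only.
    """
    return pass_rate(results) >= min_pass_rate


def has_drifted(baseline: list[bool], latest: list[bool], max_drop: float = 0.05) -> bool:
    """Recurring check: flag the agent if its pass rate fell by more than `max_drop`."""
    return pass_rate(baseline) - pass_rate(latest) > max_drop


# Results captured at publish time versus a later scheduled run.
baseline_run = [True] * 20
later_run = [True] * 17 + [False] * 3

print(can_publish(baseline_run))
print(has_drifted(baseline_run, later_run))
```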
[53:44]
C
Yeah. So for the eTMF specifically, we were able to leverage a kind of golden data set. We were able to take over 2,000 documents where we knew the correct answer and run the agent against them. So from there, I won't say it made it easy for Fikra, but it allowed us to monitor the accuracy of this agent ahead of launching and build that trust with the customer. And then obviously, once we have users leveraging the platform, because we start with human in the loop, we're able to actively monitor which recommendations the human is agreeing with and which ones they aren't. And there's a continuous evaluation process we go through, because humans aren't always 100% correct. So we make sure that we're looking at how accurate the agent is, or how often the agent is disagreeing with the human, and then from there which one is the most accurate. So we're continuously evolving and improving our accuracy.
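Editor's sketch: the two measurements described here, accuracy against a golden data set before launch and agent-versus-human disagreement after launch, can be written down compactly. This is a minimal sketch with made-up document IDs and labels. The function names are hypothetical, and the real data set is described as containing over 2,000 documents.

```python
def golden_set_accuracy(predictions: dict[str, str], golden: dict[str, str]) -> float:
    """Fraction of golden documents where the agent's label matches the known answer."""
    if not golden:
        raise ValueError("golden set is empty")
    correct = sum(
        1 for doc_id, answer in golden.items() if predictions.get(doc_id) == answer
    )
    return correct / len(golden)


def disagreements(agent: dict[str, str], human: dict[str, str]) -> list[str]:
    """Document IDs where the human reviewer's label differs from the agent's."""
    return sorted(doc_id for doc_id in human if agent.get(doc_id) != human[doc_id])


# Made-up labels, for illustration only.
golden = {"doc-1": "protocol", "doc-2": "consent_form", "doc-3": "lab_report", "doc-4": "protocol"}
agent = {"doc-1": "protocol", "doc-2": "consent_form", "doc-3": "protocol", "doc-4": "protocol"}
human = {"doc-1": "protocol", "doc-2": "lab_report", "doc-3": "lab_report"}

print(golden_set_accuracy(agent, golden))  # 0.75
print(disagreements(agent, human))         # ['doc-2', 'doc-3']
```

In this toy data the human is wrong on doc-2 and the agent is wrong on doc-3, which is why a disagreement is treated as a delta to investigate and not as an automatic correction.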
[54:50]
A
You're describing something I'm wrestling with right now, which is when you have a human in the loop giving feedback, but the human may not be right. So tell me what you do there.
[55:01]
C
I think we're fortunate to have close partnerships with our customers at this point in time. And they understand this problem because they've actually dealt with, 10 years down the line, seeing that a user put the wrong classification on a document. So they understand that the human isn't 100% accurate. So we're able to work closely with them to understand, at this point in time, which one is correct and to modify the agent on the basis of that. Whether or not that will scale, I think we'll need to work through the right way to scale this.
[55:40]
A
So it sounds like, when a human corrects it, you don't by default assume the human is correct. You think about it as: there's a delta.
[55:48]
C
Yeah.
[55:49]
A
And then what happens? So you're like, okay, we have a document where the metadata the agent added doesn't match what the human wants to add. What happens next?
[56:01]
C
Yeah. So from there we would take a look at the rules that are applying, based on whichever one was correct. So let's say the human was correct. From there we would take a look at all of the context that the agent is given and look at how we can improve it so that next time around the agent can get that recommendation right.
[56:23]
A
And you're able to look at it and see that the human was correct because there are clear rules to follow?
[56:29]
C
That's where the process of working with the customer comes in, right? Now, I think if we look at this from a platform approach, one thing we can look at is some sort of review experience that allows that AI admin role or business expert role to actually do that and then provide that automatic feedback into the agent once we know which one is truly accurate.
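Editor's sketch: one way to picture the review experience described here is a record per disagreement that a business expert adjudicates, followed by a tally of who turned out to be right. This is a hypothetical data model for illustration, not a description of a shipped Medable feature.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Disagreement:
    """One case where agent and human labels differ, resolved by an expert reviewer."""
    doc_id: str
    agent_label: str
    human_label: str
    adjudicated_label: str  # decided by the (hypothetical) AI admin or business expert


def tally(cases: list[Disagreement]) -> Counter:
    """Count how often the agent, the human, or neither matched the adjudicated label."""
    outcome: Counter = Counter()
    for case in cases:
        if case.adjudicated_label == case.agent_label:
            outcome["agent_correct"] += 1
        elif case.adjudicated_label == case.human_label:
            outcome["human_correct"] += 1
        else:
            outcome["both_wrong"] += 1
    return outcome


# Made-up cases, for illustration only.
cases = [
    Disagreement("doc-2", "consent_form", "lab_report", "consent_form"),
    Disagreement("doc-3", "protocol", "lab_report", "lab_report"),
    Disagreement("doc-7", "protocol", "lab_report", "budget"),
]

print(tally(cases))
```

Cases counted as `human_correct` or `both_wrong` are the ones that would feed back into the agent's rules or context.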
[56:57]
A
Yeah, okay. It's so easy to default to the human in the loop being the ground truth. But I think there are lots of cases where the human in the loop shouldn't be the ground truth. And I don't know that this has come up before on our podcast. I find it fascinating because I am actually wrestling with this myself. I'm working on extracting opportunities from interview transcripts. And sometimes the interviewer can correct the opportunity. But I have a lot of people I work with that don't really know what an opportunity is, so I can't take their correction as truth. And so it's very messy. It's very hard to know, okay, what do we do with this? And I'm basically doing what you described, Jen. I have to do a lot of manual review to figure out, what are the implied rules here? How do I decide? Is the agent better? Is the human better? Then there's a UX challenge of how you communicate to the human that they might be wrong. Yeah. Okay, Luke, I want to go back to something you said. You used an acronym I'm not familiar with. It was something about being compliant with something that maybe is in the medical or health care space.
[58:06]
B
We have that problem a lot. We have a thousand acronyms for everything, and we just assume that everybody knows what they all mean. You can call us on them at any point in time. GxP just refers to the wealth of practices that exist within good practice. So it could be good manufacturing practice or good clinical practice. And this is a set of guidelines that regulatory bodies adhere to when they assess and review systems that are used in clinical trial contexts. In this case, it will also apply to agents that will be used in clinical contexts. And that's probably a big differentiator for our agent platform and how we're delivering specifically to the clinical trial industry: we're focused on the regulatory needs of this specific industry.
[58:50]
A
Yeah, I wanted to ask about this because I suspect a lot of teams out there aren't taking on some of these challenges because of the regulatory environment and how scary and hard it feels. And so I'm happy to hear that that hasn't kept you from doing this. And I imagine a lot of it comes back to where we started, which is that you have big, hairy, real customer problems to go after that this technology seems really well suited for. But when you went into this, when you said, we're going to build agents, were you aware of how you were going to work through these regulatory challenges? Or was this just a big unknown, a hairball you had to unravel? Tell me a little bit about that.
[59:29]
B
We've been battling the regulatory challenges for our entire existence.
[59:34]
A
Okay.
[59:34]
B
We've been refining. We had a very challenging process that kept us slow in the way that we delivered products. Sometimes it would take us a whole quarter to get new features and capabilities out there because we have all of these documentation requirements to meet those standards. We've spent a lot of time refining that process to make it a lot easier for ourselves so that we can move a lot quicker. And with the agent platform we're finding opportunities to accelerate that even more, to be able to deliver products within this environment.
[60:02]
A
Okay, and then you mentioned this in the context of evals at the Agent Studio level. So give me a sense of what you're doing at the platform level to make sure you stay within these regulatory rules.
[60:16]
B
So there's a couple of things. What regulatory bodies look for in systems is that we follow a very specific practice of understanding the intent of a specific software product, understanding how that traces to a design specification, and how that design specification traces to actual test evidence that it worked the way you'd expect. That common pattern exists for GxP systems, and every feature and capability that we deliver on the platform follows that pattern. So you can trace the way that we've built all of these components, so it has the right foundation, and that allows us to deliver our documentation in a way that's no different from how we deliver the rest of our software products. What's nuanced is when you build agents on top of this platform. The platform itself would already be validated. But when you build agents on top of it, we need to be able to prove those same things: that we're building these agents to a specific intent, that each has a design specification, and that you can prove it's doing the thing that you'd expect. And evals become really important in that part, proving it's doing the thing you'd expect it to do. It's a key part of what helps to address that problem.
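Editor's sketch: the traceability pattern described here (intent, to design specification, to test evidence) amounts to a completeness check over linked records. This is a minimal sketch under stated assumptions. The `Requirement` type, its field names, and the example IDs are hypothetical, and real GxP validation involves far more than this.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Requirement:
    """One stated intent, the spec it traces to, and the evidence for that spec."""
    intent: str
    design_spec: Optional[str] = None
    # Pass/fail results of the tests or evals linked to this requirement.
    test_evidence: list[bool] = field(default_factory=list)


def trace_gaps(requirements: list[Requirement]) -> list[str]:
    """List every requirement whose trace from intent to passing evidence is broken."""
    gaps = []
    for req in requirements:
        if not req.design_spec:
            gaps.append(f"{req.intent}: no design specification")
        elif not req.test_evidence:
            gaps.append(f"{req.intent}: no test evidence")
        elif not all(req.test_evidence):
            gaps.append(f"{req.intent}: failing test evidence")
    return gaps


# Hypothetical requirements for an agent, for illustration only.
requirements = [
    Requirement("Classify uploaded documents", "DS-001", [True, True]),
    Requirement("Notify when 10 records exist", "DS-002", []),
    Requirement("Summarize protocol deviations"),
]

for gap in trace_gaps(requirements):
    print(gap)
```

For an agent, the entries in `test_evidence` would plausibly come from eval runs, which is where evaluations connect to validation.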
[61:25]
A
Yeah, I gotta say, kudos to you. I can imagine this is the kind of thing that a lot of companies would look at and go, this is non-deterministic. We can't prove it's doing the thing it was intended to do. We're just going to shy away from it. But you're finding ways to just do it, which is amazing.
[61:41]
B
I was thinking about this before. I think when people see AI and they've experienced AI and they've been spending time on GPT, they get weird responses sometimes or they get hallucinations, and they think you can't just sprinkle AI magic dust and solve these problems. There's a lot more involved in a proper AI solution, where the whole solution isn't AI. A bunch of these components are purpose built and intended to be, in many cases, deterministic. And the probabilistic nature that comes with the AI helps to be the connective tissue between all of those components, to make the translation layer for the human a lot easier and the output a lot more consistent across these systems.
[62:18]
A
Yeah, I saw someone post on LinkedIn. They said something like, how is everybody convincing their customers it's okay to get unreliable responses from an LLM? And I was like, whoa, that's not your job. Your job is to make the responses from the LLM reliable. I was like, I think you're missing AI product development skills here.
[62:39]
B
Sorry to interrupt you, but I think the other trap that people fall into is, why isn't this doing the exact thing my traditional systems should do, where I'm getting specific answers? We shouldn't be just comparing these agents to these systems. We should be comparing them to humans. Humans make errors too. And the idea is, how can you get an agent to have less variance in its errors than a human would have doing that same job? How can you accelerate their ability to be more accurate with agents?
[63:08]
A
I think as an industry, there are obviously teams that are building this stuff and using this stuff and learning these techniques of evaluations and measurement and feedback loops and guardrails. But we still have a huge amount of the industry whose notion of AI is still just chatting with ChatGPT, and they don't understand these mechanisms exist. And that's actually one of the big reasons why I wanted to start this podcast: how do we just educate everybody? You don't have to just release the first AI answer. There's more we can do to make this reliable. All right, let me ask you this. What is next for Medable and your AI Agent Studio?
[63:49]
C
Yeah. So I think you've heard throughout the podcast a lot of what's next in terms of tactical development and evolving the platform and the solutions. But our CEO just released a paper called Full Self Driving, and essentially that is where we're trying to go for Medable, which is the full self driving of clinical trials. So we really want to re-envision the space and we want to leverage the technology available to do it. Essentially, instead of having all of these manual operations that require masses of humans to monitor single data points, we want to enable that full end-to-end clinical trial process that has these agent workers helping them along the way. Today there are 10,000 uncured illnesses, and treatments for them will take 200 years to get to market at the current pace. So I don't think it's a matter of having fewer humans to do these clinical operations. It's a matter of enabling the humans to do more clinical operations so that we can get treatments to patients faster. There's an evolution to get here, and we're starting with specific use cases where we can see obvious problems that are a fit for our agent platform. But we want to evolve that as we understand the key problems in this space and build agentic-powered applications that have a full end-to-end workflow for each of the relevant personas, one that helps them get to that place of full self driving for clinical trials.
[65:33]
A
Yeah, I like that. It's such a good problem space, and it's creating value for humanity, which I really appreciate. It's very refreshing to hear people use this type of technology, especially, for genuinely hard human problems. So keep it up. I have really enjoyed learning about your product and what you're building, and I appreciate you taking the time to talk with me.
[65:58]
C
Thank you.
[66:00]
A
If you enjoyed this conversation, please subscribe in your favorite podcast app and give us a rating as it helps others find the show. Thanks, I appreciate it.