Summary

The Digital Executive Podcast – Episode 1153

Scaling Agentic AI with Chirag Agrawal

Host: Brian (Coruzant Technologies)
Guest: Chirag Agrawal
Release Date: November 16, 2025

Episode Overview

This episode explores the technical and operational challenges of deploying AI agents at scale, featuring insights from Chirag Agrawal, a senior engineer with extensive experience in large-scale AI platforms and multi-agent orchestration. The conversation delves into best practices for bridging the gap between AI research and resilient production systems, managing developer freedom, optimizing for operational metrics like latency and cost, and building foundational layers for ethical and interoperable agentic AI.

Key Discussion Points & Insights

1. Bridging the Gap: From Models to Production Systems

Model as Dependency, Not the Product:
Chirag advocates for treating AI models as dependencies instead of the central product. The focus should be on the overall system that surrounds the model.
- Notably, building custom agents from scratch leads to teams re-inventing undifferentiated components such as retrieval, orchestration, caching, and evaluation, which is inefficient.
Recommendation for Teams:
- Build agents from scratch as a prototype to learn, but transition to established frameworks for production.
- Evaluation and guardrails need to be integrated from the start, not as afterthoughts.
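One way to make evaluation a repeatable part of the development cycle, rather than a final step, is a small harness that can be rerun after every prompt change. The sketch below is illustrative only: `EvalCase`, `run_eval`, and `toy_agent` are hypothetical names, and the keyword-routing agent stands in for a real model call.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class EvalCase:
    user_input: str
    expected_tool: str
    expected_args: dict


# Hypothetical agent signature: user text in, (tool name, arguments) out.
Agent = Callable[[str], tuple[str, dict]]


def run_eval(agent: Agent, cases: list[EvalCase]) -> dict[str, float]:
    """Score an agent on tool selection and argument filling."""
    if not cases:
        raise ValueError("need at least one evaluation case")
    tool_hits = 0
    arg_hits = 0
    for case in cases:
        tool, args = agent(case.user_input)
        if tool == case.expected_tool:
            tool_hits += 1
            # Arguments only count when the right tool was chosen.
            if args == case.expected_args:
                arg_hits += 1
    return {
        "tool_selection_accuracy": tool_hits / len(cases),
        "argument_accuracy": arg_hits / len(cases),
    }


def toy_agent(text: str) -> tuple[str, dict]:
    # Stand-in for a real model call: simple keyword routing.
    if "weather" in text.lower():
        return "get_weather", {"city": text.split()[-1]}
    return "search", {"query": text}


CASES = [
    EvalCase("weather in Seattle", "get_weather", {"city": "Seattle"}),
    EvalCase("weather for Pune", "get_weather", {"city": "Pune"}),
    EvalCase("set a timer", "set_timer", {"minutes": 5}),
    EvalCase("latest AI news", "search", {"query": "latest AI news"}),
]

scores = run_eval(toy_agent, CASES)
print(scores)
```

Keeping the cases and the scoring function in version control alongside the prompt means any prompt tweak can be checked against the same ground truth in seconds.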
Quote:

"Product teams should think about not the model, but the system around it... use a framework going forward to ship the agent in production systems."
– Chirag Agrawal [02:05]

2. Developer Freedom vs. Architectural Discipline

No Need for Trade-off:
Chirag stresses that well-designed developer tooling gives developers freedom to experiment within disciplined architectural boundaries, improving velocity without compromising reliability.
Effective Tooling:
- Provides abstractions for developers (like binding model outputs to APIs) while handling schema validation, error management, context compression, and safety guardrails at the platform level.
- This leverages organization-wide improvements—one platform change uplifts all dependent products.
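As a rough illustration of platform-level schema validation, the sketch below checks a model's JSON tool call against a declared schema before invoking the bound API. Everything here is an assumption made for the example (the `TOOLS` registry, the `dispatch` function, the `get_weather` handler, and the JSON shape with `tool` and `arguments` keys); it is not the API of any specific framework.

```python
import json
from typing import Any, Callable


def get_weather(city: str, days: int) -> str:
    # Hypothetical API that a developer binds to model output.
    return f"forecast for {city}, {days} day(s)"


# Hypothetical registry: tool name -> (argument schema, handler).
TOOLS: dict[str, tuple[dict[str, type], Callable[..., Any]]] = {
    "get_weather": ({"city": str, "days": int}, get_weather),
}


class ToolCallError(ValueError):
    """Raised when model output cannot be safely bound to a tool."""


def dispatch(raw_model_output: str) -> Any:
    """Validate a model-produced tool call, then invoke the handler."""
    try:
        call = json.loads(raw_model_output)
    except json.JSONDecodeError as exc:
        raise ToolCallError(f"output is not valid JSON: {exc}") from exc
    if not isinstance(call, dict):
        raise ToolCallError("output must be a JSON object")

    name = call.get("tool")
    if not isinstance(name, str) or name not in TOOLS:
        raise ToolCallError(f"unknown tool: {name!r}")

    schema, handler = TOOLS[name]
    args = call.get("arguments", {})
    if not isinstance(args, dict) or set(args) != set(schema):
        raise ToolCallError(f"expected arguments {sorted(schema)}")
    for key, expected_type in schema.items():
        # Exact type match, so a string "2" is rejected for an int field.
        if type(args[key]) is not expected_type:
            raise ToolCallError(f"{key!r} must be {expected_type.__name__}")
    return handler(**args)


result = dispatch(
    '{"tool": "get_weather", "arguments": {"city": "Seattle", "days": 2}}'
)
print(result)
```

The developer writes only `get_weather` and its schema entry; parsing, validation, and error reporting live in one shared place, which is where a platform-wide improvement would land.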
Quote:

"Developer tooling provides good abstractions... The goal of the architectural discipline is to provide a safe playground where developers can move fast but without breaking the larger system."
– Chirag Agrawal [05:01]

3. Monitoring and Optimizing Operational Metrics

Key Metrics:
- Latency: Track each stage separately: prompt construction time, time from sending the prompt to receiving the first token, and time until the first word is rendered to the user.
- Token Usage: Monitor input/output token counts, cache usage, and dynamic prompt segments to optimize both cost and speed.
- Quality: Harder to define, often task-dependent. Key metrics include tool selection accuracy, argument filling, and truthfulness, typically tracked offline.
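The latency stages listed above can be instrumented with a few timestamps around a streaming call. This is a minimal sketch: `fake_model_stream` is a stand-in for a real streaming client, `measure_request` is a hypothetical helper, and it covers only prompt construction and time to first token (rendering time on the user's device would need client-side instrumentation).

```python
import time
from typing import Callable, Iterator, Optional


def fake_model_stream(prompt: str) -> Iterator[str]:
    # Stand-in for a streaming model client; delays are simulated.
    time.sleep(0.05)  # simulated wait before the first token
    for token in ["Hello", ",", " world"]:
        yield token
        time.sleep(0.01)


def measure_request(
    build_prompt: Callable[[], str],
    stream_fn: Callable[[str], Iterator[str]],
) -> dict:
    """Time prompt construction, time to first token, and the full request."""
    t_start = time.perf_counter()
    prompt = build_prompt()  # includes any context gathering
    t_prompt_ready = time.perf_counter()

    first_token_at: Optional[float] = None
    output_tokens = 0
    for _token in stream_fn(prompt):
        if first_token_at is None:
            first_token_at = time.perf_counter()
        output_tokens += 1
    t_done = time.perf_counter()

    return {
        "prompt_build_s": t_prompt_ready - t_start,
        "time_to_first_token_s": (
            None if first_token_at is None else first_token_at - t_prompt_ready
        ),
        "total_s": t_done - t_start,
        "output_tokens": output_tokens,
    }


metrics = measure_request(lambda: "Say hello", fake_model_stream)
print(metrics)
```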
Balancing Trade-offs:
Chirag likens managing latency, cost, and quality to balancing a triangle: improving one often impacts the others, requiring continual system tuning as per user feedback.
Quote:

"If you try to improve latency too aggressively, you might compromise the intelligence or the quality of your product. And if you try to chase the quality too aggressively, then your token cost and latency will explode."
– Chirag Agrawal [11:15]
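The token-usage point about cached input tokens can be illustrated by comparing two prompt layouts across consecutive turns. The sketch assumes the provider reuses cached tokens by matching the prompt prefix, as Chirag describes, and uses whitespace splitting as a crude stand-in for a real tokenizer. The names `SYSTEM`, `TOOL_SPEC`, `stable_first`, and `dynamic_first` are hypothetical.

```python
def shared_prefix_len(a: list[str], b: list[str]) -> int:
    """Count leading tokens that two prompts have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


SYSTEM = "You are a helpful assistant. Use the tools provided."
TOOL_SPEC = "Tools: get_weather(city), search(query)."


def stable_first(history: list[str]) -> list[str]:
    # Unchanging parts at the top, conversation history at the bottom.
    return " ".join([SYSTEM, TOOL_SPEC, *history]).split()


def dynamic_first(history: list[str]) -> list[str]:
    # Conversation history at the top, unchanging parts at the bottom.
    return " ".join([*history, SYSTEM, TOOL_SPEC]).split()


turn1 = ["User: hi"]
turn2 = ["User: hi", "Assistant: hello", "User: weather in Pune?"]

for layout in (stable_first, dynamic_first):
    previous, current = layout(turn1), layout(turn2)
    reusable = shared_prefix_len(previous, current)
    print(f"{layout.__name__}: {reusable} of {len(current)} tokens reusable")
```

With the stable parts first, the whole previous prompt is a prefix of the next one; with the history first, the prefix diverges as soon as the conversation grows, so almost nothing could be served from a prefix cache.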

4. Ethics, Bias, and Interoperability: Building Trustworthy and Connected Agents

Foundational Ethics & Transparency:
Ethical considerations, mitigation of bias, and transparency must be built into the core platform, not bolted on afterwards.
- All agent requests should be auditable, observable, and traceable.
- Evaluation hooks for real-time monitoring and reflection are essential to prevent or quickly mitigate unsafe behavior.
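A minimal sketch of what auditable and traceable could mean in practice: every request passes through a wrapper that records a trace ID, the agent name, the outcome, and the duration. The `audited` wrapper and the in-memory `AUDIT_LOG` are assumptions made for illustration; a real platform would write to a durable, append-only store.

```python
import json
import time
import uuid
from typing import Callable

AUDIT_LOG: list[str] = []  # stand-in for a durable, append-only store


def audited(agent_name: str, handler: Callable[[str], str]) -> Callable[[str], str]:
    """Wrap an agent handler so that every request leaves an audit record."""

    def wrapper(request: str) -> str:
        started = time.time()
        record = {
            "trace_id": str(uuid.uuid4()),
            "agent": agent_name,
            "request": request,
            "started_at": started,
        }
        try:
            response = handler(request)
            record["status"] = "ok"
            record["response"] = response
            return response
        except Exception as exc:
            record["status"] = "error"
            record["error"] = repr(exc)
            raise
        finally:
            # Runs on success and failure, so no request goes unrecorded.
            record["duration_s"] = time.time() - started
            AUDIT_LOG.append(json.dumps(record))

    return wrapper


echo_agent = audited("echo-agent", lambda text: text.upper())
reply = echo_agent("hello")
print(reply, AUDIT_LOG[-1])
```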
Interoperability via Emerging Standards:
- Agents must communicate across system boundaries using open and typed protocols, inspired by standards like HTTP.
- Chirag is excited about emerging standards such as MCP (Model Context Protocol) and A2A (Agent2Agent), which define how agents discover each other, authenticate unknown agents, and collaborate safely across system boundaries.
The Next Frontier:
Chirag envisions “an Internet of agents”—multi-agent systems that share capabilities yet remain governed independently, akin to the progression of early mobile apps to rich, interconnected ecosystems.
Memorable Quote:

"Bias and transparency, these are things that should be built at the foundational layer... All the requests that flow through these agentic platforms, they should be auditable, observable and traceable."
– Chirag Agrawal [12:55]

"Looking ahead, I think the next frontier in production AI enterprise is an Internet of agent or multi-agent systems where agents built by different teams can share capabilities but still operate under their own governance."
– Chirag Agrawal [14:17]

Notable Quotes & Timestamps

On frameworks vs. custom builds:
“Instead what they should do is probably like build the agent once for a prototype... but then use a framework going forward to ship the agent in production systems.”
– Chirag Agrawal [02:25]
On developer freedom & system discipline:
“A developer platform provides general solutions to essential problems. It doesn't really curb developers’ freedom at all.”
– Chirag Agrawal [06:18]
On prompt engineering for efficiency:
“Cached input tokens can be really valuable to monitor... you will end up utilizing a lot of cached input tokens which will reduce your cost and latency.”
– Chirag Agrawal [09:29]
Analogy to Mobile App Evolution:
"This current scenario reminds me of the early days of Android and iOS... I think same thing is going to happen with these agents. I think they are sort of in a nascent stage right now, but they're going to improve dramatically over next few years."
– Chirag Agrawal [14:00]

Important Segment Timestamps

AI infrastructure beyond models: [01:51]–[04:02]
Developer empowerment and tooling: [04:52]–[06:33]
Operational metrics and trade-offs: [07:19]–[11:57]
Ethics, bias, transparency, and the future: [12:45]–[14:46]

Conclusion

Chirag Agrawal provides a pragmatic and nuanced view of scaling agentic AI—emphasizing the importance of robust system architecture, disciplined experimentation, and foundational ethics. He advocates for leveraging shared frameworks, monitoring key operational metrics, and building for interoperability and transparency at every step. The vision for AI’s future? A world of interconnected agents—each robust, trustworthy, and able to collaborate across domains.

Transcript

[00:00]
A
Welcome to Coruzant Technologies, home of the Digital Executive Podcast. Do you work in emerging tech? Working on something innovative? Maybe an entrepreneur? Apply to be a guest at www.coruzant.com/brand. Welcome to the Digital Executive. Today's guest is Chirag Agrawal. Chirag Agrawal is a seasoned technology professional with over a decade of experience in building large-scale AI platforms, distributed systems and developer tooling. As a senior engineer and tech lead, he specializes in LLM infrastructure, advanced AI-powered conversational systems and multi-agent orchestration, including agent execution systems, prompt engineering, AI tool use and AI memory, along with software development kits and compilers. Chirag leads cross-functional architectural initiatives, driving lower latency, reliability and scale for Alexa and its developer ecosystem. Well, good afternoon, Chirag. Welcome to the show.
[01:11]
B
Thank you for having me, Brian. It's great to be here.
[01:14]
A
Absolutely, my friend. I appreciate it. You are currently in India via Seattle, Washington, but I appreciate you making the time. It's hard to traverse time zones in this busy world that we have, so I appreciate that. Chirag, let's jump into your first question. You've focused on the infrastructure between models and applications: runtimes, orchestration, memory, execution graphs, et cetera, right? How should product teams think about the gap between "we have a model" and "we have a system running it reliably in production," and what are the most common pitfalls you see?
[01:51]
B
That's a great question. So I think the model should be treated as a dependency, not the main product. So I would say product teams should think about not the model, but the system around it. And one of the common pitfalls I usually see is that teams try to build their AI agents from scratch, and then what happens is that they spend most of their time doing the undifferentiated work. And this is because they often overlook the complexity of the system around the model that is required in order to ship the agent in production. There are many concerns related to retrieval, tool calling, orchestration, like you said, context management, compression, caching, evaluation, all of that. And everybody is doing it. Every team that's building an agent has to do all of this work. So it's really undifferentiated, and it's often handled by frameworks or developer tooling provided by platforms. So instead, what they should probably do is build the agent once for a prototype, because it does serve as a useful learning exercise, but then use a framework going forward to ship the agent in production systems. I think another pitfall I usually see is teams not treating evaluation and guardrails as first-class citizens in their development life cycle. They are often thought of as this last step: after we have built our agent, we will just evaluate it on some data set. But that is usually a mistake. And when I'm saying evaluation, I don't just mean the data set. I'm also including the framework required to run the evaluation and make it repeatable for your system, so that you can quickly fine-tune your prompt or other behaviors of your agent over the course of time without it bogging you down.
[04:02]
A
Great. Thank you so much, and I appreciate you unpacking that for us, especially in this development world we live in, building agents. I took a couple of highlights away. Your recommendation is you build that agent as a prototype, but you stick to a framework going forward, and teams shouldn't be so focused on the models. You said they should be treated as a dependency, I believe, and really the application is the main focus. So again, I appreciate your insights. And Chirag, one of your focus areas is developer tooling: typed function-calling SDKs, binding model outputs to real APIs. How do you strike the balance between giving developers freedom to experiment and enforcing disciplined architecture, so that systems scale, remain manageable, and don't fragment?
[04:52]
B
So developer freedom and architectural discipline, they can be seen as opposite forces, but they are actually not. In fact, developer tooling provides good abstractions to developers, and it often lowers the barrier to experimentation and speeds up the velocity of development. So the goal of the architectural discipline is to provide a safe playground where developers can move fast but without breaking the larger system. An example is, as you said, binding model output to real APIs. So good, well-architected developer tooling will provide you abstractions so that you as a developer can design your API and bind it to model outputs the way you see fit. But concerns like schema validation or error handling or safety guardrails or progressive context compression, all of these things that are required to keep the user experience smooth and to manage latency and concerns like cost, they are all handled by the platform for you. So a developer platform provides general solutions to essential problems. It doesn't really curb developers' freedom at all. And this type of tooling or platform can create incredible leverage for an organization. For example, a single change that you make in the platform can improve latency or cost for all teams or products running on the platform.
[06:33]
A
Thank you, I appreciate that. I do believe that there is such a thing as developer freedom while remaining in that architectural discipline, right? They can work in harmony, although sometimes that can be challenging. But over time I've seen a lot of improvements in the way we work on that system design life cycle and some of the tools that we use today to keep those guardrails in place. So I appreciate that. And Chirag, many organizations adopt AI with enthusiasm. But when you push into latency constraints, cost optimization, and real-world performance, you hit friction. What are the operational metrics you monitor closely in agentic platforms, and how do you trade off quality versus cost versus speed?
[07:20]
B
There are three big buckets of metrics we monitor. First is latency, second is number of tokens, and third is quality. And within these three buckets, you can define the metrics depending on which part of the system you're monitoring. For example, for latency, you can start with how much time it took to construct the prompt. And this would include all the time you spend gathering context. Another one could be how much time it took from the moment you send the prompt to the model to the time you received the first token. And that tells you how much latency you're adding to the system by virtue of your prompt, because usually that metric is related to the size of the prompt and the size of the model you're using. And then the other one that you can monitor is how much time it took to render the first word of the response, or the first element of the response, to the user. And that's because it's generative AI, and thankfully users don't have to wait for the entire model response to be available. As soon as the first word of the response is ready, you can start streaming it right away to the user. So those are just a few examples of the metrics you can monitor for latency. But of course you can go much deeper than that depending on your problem space. For number of tokens, of course, the obvious metrics to monitor are how many input tokens you're sending in and how many tokens the model is producing per user request, because those two things drive the cost that you're going to pay for these tokens and the latency you're going to incur for that user request. But within those, you can also monitor the number of cached input tokens. And you can measure the number of tokens for different parts of the prompt that are dynamic in nature, for example, conversation history, which is built up over the course of the back and forth you do with the AI. Cached input tokens can be really valuable to monitor because they can essentially guide your prompting strategies. If you design your prompt in a way where the top parts of the prompt are unchanged over the course of the conversation and the most dynamic parts of the prompt are towards the bottom, then you will end up utilizing a lot of cached input tokens, which will reduce your cost and latency. And you can do this for different parts of the prompt. So that really guides your prompt and context engineering. Quality, which is our last bucket, is much harder to define, and it is often dependent on the domain or the task you're solving with AI. But some of the high-level metrics which I believe should be common across most agents are accuracy of tool selection, accuracy of argument filling, and truthfulness of the model relative to the context. And these are often metrics you cannot monitor online, because you need the ground truth for it. So this last bucket is often monitored offline through your test harness. And you can model the behavior of the agent through online reflection, but I think that spills into more of product quality metrics rather than operational metrics. The second part of your question, which was related to the trade-off between these three, is also quite interesting. So these are kind of arranged in a triangle. If you try to lean on one of them too much, then you would have to give up the others. For example, if you try to improve latency too aggressively, you might compromise the intelligence or the quality of your product. And if you try to chase the quality too aggressively, then your token cost and number of tokens will explode, which will add to latency. So it's really an art. You have to constantly tune the system as per the user feedback.
[11:58]
A
Great, I really appreciate that, Chirag. I know there's a lot of metrics that you would like to monitor during this whole process. Latency has always been a big one. You talked about some of the other constraints, you know, cost optimization, token usage, caching, and again kind of breaking out the trade-off with that quality versus cost versus speed. So I appreciate that. Chirag, last question of the day. As you build the foundational layers of agentic AI systems, how are you thinking about ethics, bias, transparency and interoperability, so agents from different teams and systems can coordinate? And looking ahead, what do you see as the next frontier in production AI infrastructure, whether that's plug-in marketplaces, agent networks or open protocols and things?
[12:46]
B
Bias and transparency, these are things that should be built at the foundational layer of the system itself. They cannot be bolted on later on. All the requests that flow through these agentic platforms, they should be auditable, observable and traceable. And like I said earlier, evaluation hooks should be built into the runtime platform itself so that we can monitor agents' behavior in real time through reflection. And you can essentially devise ways to prevent unsafe behavior if you have that, or at least mitigate it very quickly. Interoperability is the other side of it. So agents built by different teams need to communicate through open and typed protocols, much like how systems are integrated through HTTP. And that's why I'm very excited about the emerging standards like MCP and A2A, which kind of define how agents discover each other, authenticate other unknown agents and collaborate safely across system boundaries. The current scenario reminds me of the early days of Android and iOS, where we had mobile apps and they were useful, functional mobile apps, but they were not that good. And over the course of the next decade they became really, really good. I think the same thing is going to happen with these agents. I think they are sort of in a nascent stage right now, but they're going to improve dramatically over the next few years. Yeah. So looking ahead, I think the next frontier in production AI enterprise is an Internet of agents, or multi-agent systems, where agents built by different teams can share capabilities but still operate under their own governance.
[14:46]
A
Thank you. And I think that's really important. We talk a lot about guardrails, governance, and ethics, and you talked about how it's important to build those into the systems and make sure that these systems are auditable and observable. And then moving into interoperability, I think it's important that agents can interact with other agents that were built by different teams. You talked about MCP, the Model Context Protocol, which is so important today, and that's been a really hot topic, obviously, for that interoperability. So I appreciate that. And Chirag, it was such a pleasure having you on today, and I look forward to speaking with you real soon.
[15:26]
B
Yeah, thank you for having me. It was a pleasure talking to you as well.
[15:30]
A
Bye for now.