A
A year from now, where do you think the platform will be?
B
We'd want to experiment with directions where Claude actually gets so good at understanding itself that it figures out what model you should be using and figures out how to spin up all the sub-agents. You don't have to think so much about what kinds of architectures are out there, because Claude understands itself well enough that it can write itself on the fly.
C
In that world, if Claude, or agents built on the fly, are becoming whatever they need to become in order for you to do what you're trying to do, the platform has to seriously scale.
A
How close are we to "Claude, make me a billion dollars"? That's really what I'm asking. Angela, Caitlin, welcome to the show.
C
Thanks for having us.
B
Yeah, thank you.
A
So, for people who don't know, you both work on the platform at Anthropic. Angela, you're the head of product for the Claude platform, and Caitlin, you're the head of engineering for the Claude platform. I'm really psyched to talk to you because, a, you've been launching a bunch of stuff. You have Claude managed agents, which came out recently, and you've been launching new features for it. And it comes at this really interesting time, because it makes me think about what a platform actually is in AI for a model company. In the GPT-3 days, the platform was a completion endpoint: you just send a prompt and get a response. After that, it was a completion endpoint with tool calling and chat sessions, that kind of stuff. And now, with Claude managed agents, you're essentially getting a Claude on a computer, with memory and all this other stuff. So I'd love for you to help me unpack that trajectory and what it means to build a platform in AI.
B
Yeah, I think your characterization is very accurate. A lot of these technologies have evolved, starting with the LLM, and putting that behind an API was very fun. A lot of people at the time were like, wow, look at what I can do. It was very cool. Now we'll probably look back at it and think, oh, that was really basic. Then we've moved more and more towards a slightly more stateful world, where you want to persist the session state to make sure the performance of the model gets better and better. That's probably the through line: as we make improvements to Claude, and as it continues to get better and more autonomous, we find ourselves needing to evolve the platform into higher and higher order abstractions. But it's all in pursuit of helping you get the best outcomes. In the very beginning, everyone was very exploratory. You had no idea what people were going to build with these LLMs, and you wanted to leave as much possibility open as you could. Then, as the use cases started to narrow down, people started building products with it, and now they're building agents with it. More and more of that is customers coming to us and asking, how do I get the best out of Claude? How do I set up my tools? How do I run the loop? And so on. You have some people who are really experimenting, out on the edges, and that's great. And then you have a whole host of other folks coming in who want a lot of this stuff out of the box. In our pursuit of making sure Claude produces the best outcomes, we find ourselves making the platform richer and richer.
Contained in that is both the state and the tools that you start to see us adding. It contains a lot of the cloud components of these types of things. But it's in pursuit of the same mission of making things as easy as possible. And in the forward state of all this, in terms of the philosophy of what a platform ultimately ends up being, it's probably the set of primitives and infrastructure that enables you to get the outcome as fast as possible, with as little work as possible. And that tends to follow a certain form factor, at least at this current stage.
A
How would you characterize what the primitives are today? Maybe that's just asking, what are the primitives in Claude managed agents?
C
Yeah, so Claude managed agents is built on all the same primitives that you could otherwise build on directly. So the Messages API, and within the Messages API we've built a whole bunch of, I guess, innovations around the API. You could just get tokens in and out if you really wanted to, but you can also use some of our built-in tools. You can use code execution, which spawns a sandbox and executes work; you can use web search and all these sorts of different things. And so we've taken what we see as the most powerful of those things and put them together into a harness and a set of infrastructure that is, we think, just the way to get the best outcomes out of Claude.
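The shape described here can be sketched as a plain request body: tokens in and out, with built-in tools declared alongside. This is a hedged illustration, not the exact API schema; the model name and tool type strings below are placeholders, not real product identifiers.

```python
import json

def build_messages_request(prompt, use_web_search=False):
    """Sketch of a Messages-API-style request: messages in, tokens out,
    with optional built-in (server-side) tools declared in the payload."""
    request = {
        "model": "claude-example",  # placeholder model name, not a real id
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": prompt}],
    }
    if use_web_search:
        # Built-in tools sit next to any custom tools you define yourself.
        request["tools"] = [{"type": "web_search", "name": "web_search"}]
    return request

req = build_messages_request("Summarize today's AI news", use_web_search=True)
print(json.dumps(req, indent=2))
```

The point of the sketch is the layering: the raw endpoint only needs `messages`, and everything else discussed in this episode (tools, sandboxes, harnesses) is built up from that base.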
A
So I'm sitting here feeling this sense of, I've been thinking of it as time deflation: my time gets more valuable in the future, as opposed to the opposite, my time getting less valuable. And the reason is, for example, internally we're building some agent products, agents that do specific things for us internally and then hopefully for customers. And to do that, we have a couple of Mac Minis with Claude running in a loop, driven by a thousand-line Python file or whatever. And a lot of that mirrors what you're building in Claude managed agents. So for me, and I think for a lot of people building on Claude or the Claude platform or ecosystem, there's this feeling of, maybe we should just wait for you to build it. But then I don't know where the lines are. So I'm wondering: if I want to build an agent, what is the best path to do that in a way that aligns with what you're doing?
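The "Claude running in a loop" setup described here follows a common agent pattern: send the conversation, execute whatever tool the model asks for, append the result, and repeat until the model stops calling tools. This is a minimal sketch under that assumption, with the model stubbed out so it is self-contained; a real version would call the Messages API in place of `stub_model`.

```python
def stub_model(messages):
    """Stand-in for a model call: ask for one tool, then finish."""
    if not any(m["role"] == "tool" for m in messages):
        return {"type": "tool_call", "name": "read_file", "input": "notes.txt"}
    return {"type": "final", "text": "done"}

# Illustrative tool registry; a real harness would expose file system,
# browser, shell, and so on.
TOOLS = {"read_file": lambda path: f"contents of {path}"}

def run_agent(prompt, model=stub_model, max_turns=10):
    messages = [{"role": "user", "content": prompt}]
    for _ in range(max_turns):
        reply = model(messages)
        if reply["type"] == "final":
            return reply["text"]
        # Execute the requested tool and feed the result back to the model.
        result = TOOLS[reply["name"]](reply["input"])
        messages.append({"role": "tool", "content": result})
    raise RuntimeError("agent did not finish within max_turns")

print(run_agent("summarize my notes"))  # → done
```

The whole "thousand-line Python file" version is largely this loop plus error handling, caching, and infrastructure, which is exactly the part the managed product aims to absorb.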
B
Yeah, I think this part of the platform business is actually similar to any other platform business, where you have customers like yourself who are building, and you're thinking, should I go ahead and do it myself, because I have this immediate need? But at the same time, you don't want to repeat work you could have just gotten for free out of the platform.
A
And also infrastructure sucks.
B
It does.
A
It sucks so much to spin up servers. I can't believe you do that all the time.
B
It's a huge part of the job.
C
But I will actually say, part of why we ended up building Claude managed agents was because Anthropic ourselves had gone through enough of these iterations, building products that were agents you could run autonomously in the cloud, and standing up the infrastructure so it works well, enough times that we were like, okay, we're done building this for ourselves. We're doing it once, in a way that's really going to work from everything we've learned, but also for all the people who are doing it. Like, you can run whatever you're running on a couple of Mac Minis, maybe, right? And for a lot of people that could work. But if you're building agents into your product and you're running something really at scale, that's where it starts to become more and more challenging to get that infrastructure right.
A
That's really interesting.
B
Yeah. And then maybe to answer the other part of your question, I think we have two pieces of philosophy here. One is in the way we design managed agents, which is that we try to have it be modular enough. We want to be opinionated about some pieces that we feel should be very well married to the Claude model. For example, we want Claude to use the file system in a very particular way.
A
Or just file systems in general.
B
Just file systems in general. We also really want to lean into skills. I know a lot of folks like skills, and that's something we want to be really opinionated about. So we're particular about those primitives: use the file system, use skills. They're really basic, but we still find people trying other methodologies, and we want to help you start off on the best foot when you build. So that's one piece, the more opinionated one. But with each of the endpoints or APIs we have as part of the suite, we try to open them up a little bit in certain areas. There are things we're looking forward to that maybe aren't available today, but in our design we're trying to make it flexible enough for people to add in different pieces, because we recognize that this API, or suite of APIs, isn't necessarily going to solve everything in its original construct, and there are going to be pieces that need to open up. The second bit, and we're public about this, is that when we design a lot of these things, we put out blog posts and reference implementations. So if you wanted to at least be inspired by that construct, but still build your own on the Messages API, you can definitely do that.
A
To the point you just made, that's something that's coming up for us. Again, we have Claude running on a Mac Mini with a Python file, and a couple of other bigger, more serious implementations on cloud infrastructure that we're trying to figure out what to do with. I told the team we were talking today, and one of the questions, or one of the feelings of consternation, they have about using Claude managed agents for this, for spinning up agents for our customers, is that right now we have a playground. We have a server, a Mac Mini, that we can just pipe stuff to. It can do anything Claude Code can do. It has a file system, it has a browser, it has all this stuff. If we want to switch it out to GPT 5.5 or Gemini or whatever, it's pretty easy to do that. So they feel like if we use Claude managed agents, we're going to get locked in, and we're not going to have the flexibility to do all the stuff we want. There's also a worry that features will come to Claude Code itself that won't be in Claude managed agents for a little while, and that will prevent us from being at the edge, which is what we promise our customers and really ourselves. We just love doing whatever the new thing is. How do you think about that?
C
Yeah, so I think what's nice is the way we work internally. We run the platform, and the platform, as most people think of it, is our externally facing suite of APIs. The rest of what our team does is the internal platform, in the sense that all of our first-party products are built directly on the same platform as everybody else. What's cool about that is we spend a lot of our time working with the teams internally who are building on top of the platform, enabling the features they'll build, sharing ideas, these sorts of things. So I think over time you'll see less and less divergence between what's available in Claude managed agents and what's available in products like Claude Code that might sit on top of the same infrastructure. That's one way to think about it.
B
Yeah. And then on your point, or your team's point, about fear of model lock-in: I think that's valid. Many folks have that consternation. And we're at a place where there's a bit of an evolution happening. If you look back even just a couple of months ago, it was very standard to build a very, very generic harness, and then you could hot-swap models across all of those things. For an older generation of models across labs, that worked okay; things were moving at a pace where it was mildly reasonable. Now, for the next generation of models, and as we see it going forward, you see this a little bit from every lab: everyone's taking slightly different techniques and perspectives on how they want to advance their particular form of the model. In theory, you could build the superset of all those things. But more often than not, when you build agents for your company or your customers, you want to deliver an outcome for them. So the level of abstraction at which you hot-swap stops being a really generic harness with swappable models; the harness and the model get tightly paired. You still need redundancy, and you still might want to use other models for things, but you probably do it at the layer of the agent, meaning the harness plus the model, rather than the architecture of a really generic harness with everything hot-swapped underneath.
A
That's really interesting. Is that how, I don't know, the Cursors of the world are doing things? Do they have a separate harness for each model, or is it a generic harness that they're hot-swapping the models in and out of? Do you know?
B
I'm not entirely sure. I don't know about Cursor in particular, but there have been teams we've talked to who have landed on similar perspectives. And it's mostly because they're trying to squeeze the most out of each model, to harness-engineer every single nuance. One example we have, not an external customer, but something we've done a lot internally: we recently launched memory with managed agents, and we tried a bunch of different harnesses ourselves. We tried the one we ended up launching, and a bunch of others using different techniques. And personally, when I saw the eval suite from the team, each of these harnesses performed drastically differently. So even something like that shows you can hill-climb a tremendous amount just by harness-engineering the right pieces together. And if you were to take that forward across all model combinations, all the different labs and providers, there's a lot of alpha in that construct. So I wouldn't be surprised if more than just ourselves have experimented with that level of pairing.
A
It's really interesting that there's this path dependence, where you make some choice for how you do requests and responses, or how you do tool calls, or whether you have the model use file systems or not, and that changes the trajectory of all these different models.
B
Yeah. And at the time it feels like such a small footnote, but it ends up becoming very big.
A
Do you think that will end up affecting the models' generalizability, in the sense that at some point they'll have these locked-in lanes of stuff they're good at, because Claude is really good at file systems and OpenAI's GPT is good at some other things? How is that going to flow through a model's personality and behavior if it's locked into a specific way of doing things?
B
I do think it tends to lock the model in. So what we treat as the right path and the right primitives needs to be very carefully thought through. In some eras, models became really, really good at reasoning and then almost over-optimized on that level of reasoning. And there are other perspectives: okay, yes, we want it to be really good at using a computer; maybe the computer part is the interesting part. So if you think through some of the primitives, which we could get right or wrong, at least going through that thought process will lead us down one path or the other. It's hard to say which direction will ultimately be true, but there's a lot of path dependency, so being really thoughtful about what you choose to include or give the model natively is really important.
A
Are there any of those path dependencies that you've had to undo?
C
Hmm,
B
Probably. I can't speak much to that at the Anthropic level; I've only been here a couple of months, but I have to imagine that has been the case. I mean, we've experimented even at other labs. The primitives we have to look at are constantly changing, and you do hit a little local maximum and rethink: okay, maybe there's a more generic approach to doing it.
A
Yeah, interesting. I want to take a step back and ask something I maybe should have asked at the beginning, which is: who are Claude managed agents for? I set one up earlier today. We've got some people already using it in production inside of Every, and I just did one today. I really loved the getting-started chat experience and some of the examples, and it felt like even if I were not technical, I might want to use this to set up an agent. It might be a little complicated, but what I actually did, and I'm sorry to say this, is I did it in the Codex in-app browser. So I had Codex driving the managed agent setup, and I had a Slack bot working pretty quickly. It was really cool. So when you're designing Claude managed agents, how do you think about who it's for?
C
Yeah, it's interesting, because you're right, especially with that quick-start experience, which we felt pretty strongly about launching. Not specifically so that non-technical people could build agents, but so that anybody, technical or not, could wrap their head around the primitives: here's what it can do, and here's how it fits together. The education portion of it. But when we think about who it's for, we think about a couple of different things. One is that we're seeing people internally within companies build automation, or build really powerful platforms or systems. We've seen people say, I want a full end-to-end software development platform, right? And managed agents is a perfect solution for something like that. Or, I want to automate a little process over here, where legal has to review my marketing copy, things like that.
A
And so you shouldn't have to reimplement memory and all that stuff every time.
C
Every time you're doing that, right. You can get started really quickly and get something running quickly. The other user that's top of mind for us is people building agents into the products they expose to their customers. And that's the other one where, yes, you do still want a lot of customization, you do still want to make something that's going to be really powerful for your product. But we definitely believe that not spending your engineering resources on the infrastructure and on all the little harness-engineering tweaking sort of stuff is...
A
Why couldn't we have talked like a month ago? You would have saved us so much time.
B
We'll just need to talk more.
A
But I am sort of curious. Okay, so maybe infrastructure is one of these things. But when you see people setting up agents, what do they think the hard thing is, and what ends up actually being the hard thing? And are they the same?
B
Good question.
C
Maybe this is, I don't know, spicy, I'm not sure. But I think people think the harness-engineering part is the hard part. In the past, we launched the Agent SDK, which is what you guys, I think, are using on your Mac Minis. And for a lot of people, it was like, okay, great, I don't have to do the harness-engineering part, where I have to do prompt caching and maximize my context window and all these sorts of things.
A
I think we're actually just using Claude in batch, like the claude -p command. Oh, wow.
C
Yeah.
B
Okay.
A
It's. It's pretty good.
C
Yes.
B
Cool.
C
Nice.
B
Okay, cool.
C
But regardless, you did that because it takes building the harness off your hands. But I do think what we saw with a lot of customers was: okay, now I want to take that thing, get it into production, and scale it, and everybody hits an infrastructure wall. Everyone hits the same problem of, oh, wow, I either need to keep a server constantly running, or I need infrastructure that will spin up and spin down; I need to store the transcript data; I need secure sandboxing; and all these sorts of things. And if you boot a Claude Code session or the Agent SDK in a sandbox, and that's the thing you have running, but your sandbox loses connection and dies, your whole agent dies, right? So the infrastructure part especially is the wall most people end up hitting, but they're expecting that the harness engineering, getting the most out of the model, is the part that's going to be harder.
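One concrete piece of the infrastructure wall mentioned here is transcript storage: if the sandbox dies, the conversation transcript effectively is the agent's state, so checkpointing it after every turn lets a fresh process resume instead of losing the whole agent. A minimal sketch, using local files to stand in for whatever durable store real infrastructure would use:

```python
import json
import os
import tempfile

def save_transcript(path, messages):
    """Checkpoint the conversation. Write to a temp file and rename so a
    crash mid-write can never leave a corrupted checkpoint behind."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        json.dump(messages, f)
    os.replace(tmp, path)  # atomic on POSIX and Windows

def load_transcript(path):
    """Resume from the last checkpoint, or start fresh if none exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return []

path = os.path.join(tempfile.gettempdir(), "agent_transcript.json")
save_transcript(path, [{"role": "user", "content": "start the task"}])
resumed = load_transcript(path)  # a new process picks up where the old one died
```

This is only one of the problems listed (alongside spin-up/spin-down and sandboxing), but it illustrates why "keep the loop alive on one machine" stops being enough at production scale.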
B
Yeah, I totally agree with that. We talk to so many people who are now at a place where they're prototyping really quickly, and they're super excited, it's doing the thing. And there's a class of people who are really pushing: okay, I do want to hill-climb, I really want to edit the harness. But once you have that thing, productionizing is just a friggin' nightmare, especially for the more interesting long-running, async agents that you want to run a bit more remotely, that are a bit more autonomous. Everyone runs into that wall, and it was a big inspiration for why we built what we built.
A
I feel like one example of the shape of an agent is OpenClaw, and in particular, the thing it has brought us internally is an always-on agent in Slack that has its own personality and its own part of the world that it ends up working on. Is that a possible future: a one-click agent that lives in my Slack, where yes, I can go set up all the internals, but I don't have to really think about all the technical infrastructure stuff? I think you have the beginnings of that, but it's still a lot of steps from the current managed agents to something always-on in my Slack that I can set up and customize. Does that fall within the platform's job, or is it too far in the product direction?
B
No, it's definitely something we really want to do. We focused a lot on the infrastructure piece to start, because that's where we see a lot of the pain points. But yes, I don't want to exactly say its final shape, but in its advanced shape, we want to make it so you can deploy these agents really, really easily. We've made some light steps in this direction. For example, we included vaults as one of the primitives.
A
And vaults store your keys and stuff, like your OAuth keys and credentials.
B
Yeah, as solving some of the lower-level pieces as a starting point. But once you wrap some of these agent-identity-type primitives in a more secure way, and you can handle it really easily and it works with the whole system, then I think it's very natural for us to get to a place where maybe you're either one-clicking a Slack integration, or even just telling Claude, add Slack, and it handles absolutely everything. And before you know it, your little bot is just pinging you on Slack.
A
I love it. I can't wait for that world. What are the best internal use cases of agents? Because there's this big question happening right now: okay, everyone's in Codex or Claude Code, but now we have these agents out in the cloud. Now everyone inside a company can have their own agent; there are team agents, there are company-wide agents. So what are the patterns you see when people make really useful internal agents? What do they do, and what do they look like?
C
Yeah, we've actually seen a few examples of these in some of the more AI-pilled, AGI-pilled companies. Stripe built Minions, and they've talked about that a lot as their end-to-end development platform that their engineers can use. I think Ramp did something similar, and we've done similar things as well: we've built platforms internally where I have agents running that I can talk to from Slack or from wherever, right? And at a certain point, that becomes a pretty thin layer on top of managed agents. You don't have to do very much to accomplish that.
A
That's what I was thinking. I looked at Minions, or whatever Ramp does, and I was like, why? Is it actually useful to have a thin coding agent that anyone in the company can use, or why not just install the Claude app in Slack?
C
Yeah, I would say the difference between a platform like that and some of the things we've done internally is that there's a lot of customization you might want to do on the development environment where an agent is actually running and able to verify its changes, right? Things like that.
A
So, here's how our CI/CD works.
C
Yeah, exactly. And so for lots and lots of people, Claude Code is an excellent tool, right? And you can run Claude agents with Claude Code, and that's really great. But if you're trying to do a bit more end-to-end development, and you maybe want to bake in more custom things, then you could start with something like managed agents, build a layer on top of that, and end up with something that's closer to that end-to-end experience.
A
It also seems to me like there's something in particular about having a team you need to work with that makes the managed agent shape important, as opposed to it all just working in Claude Code. I guess technically you could sync the skills between everyone's Claude Code, but there's something about "we all have one agent that does this thing" that seems to work.
B
Yeah, I'm really glad you brought that up, because I think that's actually one of the more common areas where we see a lot of the opportunity. To your point, there's a lot of individual productivity happening, whether you're a developer or a non-developer. There are so many tools you're using to make yourself more automated, more high-leverage. But when you get to the team layer, suddenly everything gets massively more complex. Number one, it obviously can't sit on your laptop, and yes, you could put it in the cloud, but that's again more for yourself, so it runs with your laptop closed. But then you go to, okay, now the three of us want a couple of agents that interface with each other and work with each other, and maybe we're automating a process end to end. And especially for some of the more complex processes that you envision being really transformed with AI,
B
you do need that kind of team orientation, and that needs to happen at a layer of slightly higher abstraction than a single agent. I think some of the teams exploring multi-agent architectures and things like that are really exciting, but it needs to be built on top of a bit of a platform that everyone can spin up and down and control. And I think Guillermo from Vercel had a really good perspective on this. His company, Vercel, is obviously incredibly AI-pilled, and he describes it as an AI software factory internally. I think that's exactly the right mindset, and it produces an extremely high-leverage organization that's creating a tremendous amount of productivity, not just for individuals, but for every single process in the company.
A
And I really want to go back to agent use cases. We've got coding agents that anyone in the company can use. What are the other ones you see people standing up that are really useful?
C
We've seen a few. One of the fun things we get to do is work with our internal teams across different functions and help them identify use cases, because we actually get to learn a lot by doing that. The silly example I brought up earlier, of the legal team needing to review marketing copy, was one of the ones that...
B
Very real.
C
Yeah, extremely real. It blew people's minds, with very basic agents that just give people the right setup to be able to do that. So you've seen that.
A
Well, what does that actually do? Is it like, there's marketing copy, and there's a legal agent that's just watching everything marketing does and saying, stop? No, seriously, how does it work?
C
Okay, I'm a marketer and I've written some copy, right? In the past, maybe you would have opened a ticket and said, can you please review this copy? Instead, you submit it to this little app that we built on top of agents. An agent reviews it first and then puts it in legal's inbox with a first-pass review already done. And maybe the agent is clear enough that it can say, okay, marketing, you're good, right? Or maybe it says, no, this needs an extra human review. And that's the sort of thing where, again, it's just a thin layer on top, but you have access, I have access, we can both see the outputs, and we can work together on that.
A
Okay, but then, so for example, why is that not a skill?
C
So it very much can be a skill. You would probably build that agent as a legal-reviewer agent, right? So you would have MCP servers, or whatever it is, that help it access external context. You would have skills that help it understand, here are the rules we have to follow and not follow, right? And all those things. You put all those things together, and then you can just fire off a session with that agent. And then the last piece you need, and this is why I'm saying it's a really thin layer, is the form factor on top, where different people can collaborate and work with that agent, and multiple agents can be involved in the system. So it goes a little broader than a skill, because you still need the right form factor for the agent to go run, and for people to be able to interact with it.
B
Another core bit on why it's not a skill, or not exclusively a skill, is that you actually do need a human in the loop. If you were to automate the whole thing, just taking the skill and doing the legal review yourself with it, for example, in that world, of course, you could have done a pure skill. But if you need a human in the loop to be like, okay, I want to review and I do want to check, and we're looking at legal things here, then there's a bit of, you know, authentication that's necessary in order to automate that entire process. You need agents to go do the thing. And because you need to spin up separate sessions for that to happen, some sort of stitching is necessary that can't be instantiated in a single skill.
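To make the composition described above concrete, here is a minimal, hedged sketch in Python. Everything in it is hypothetical: the real system would call Claude via its API with skills and MCP servers attached, whereas this toy uses a hardcoded phrase list as a stand-in for the legal "skill". What it shows is the routing shape: auto-approve clean copy, escalate flagged copy to a human with the first pass attached.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for a "legal review" skill. In a real system this
# knowledge would live in a skill the model consults, not a phrase list.
BANNED_PHRASES = {"guaranteed returns", "risk-free"}

@dataclass
class ReviewResult:
    approved: bool          # marketing is good to go
    needs_human: bool       # escalate to legal's inbox
    notes: list = field(default_factory=list)

def legal_review_agent(copy: str) -> ReviewResult:
    """First-pass review: auto-approve clean copy, escalate anything flagged."""
    flags = [p for p in BANNED_PHRASES if p in copy.lower()]
    if not flags:
        return ReviewResult(approved=True, needs_human=False)
    # Flagged copy goes to a human reviewer with the first-pass notes attached.
    return ReviewResult(approved=False, needs_human=True,
                        notes=[f"flagged: {p}" for p in flags])
```

The "stitching" mentioned above is the branch on `needs_human`: the session either closes with an approval or drops its first-pass notes into the human reviewer's queue.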
A
That's really interesting. Yeah. Okay, so just to push on that a little bit. You create an agent whose job is to make sure that when marketing is writing something, they can get it approved really quickly by legal. Sometimes it'll approve things immediately, sometimes it sends stuff to legal, and ideally it's getting better all the time so it can do more and more. Right. What is the best practice for who owns that agent once it's built? Because one of the things that we've found is that if you don't have a human who's responsible for the agent, it gets stale very quickly, and then it ends up being this kind of dead thing that's out there doing stuff, but it's not necessarily good. And even if it kind of works, there are going to be all these times where legal's like, you asked me to approve this, but I don't really need to approve this thing. Like, let's update your prompt. So how does that all work?
C
When it works well, it's actually really interesting, because of the form factor thing, the app that sits on top of that. One of our teams originally built it, sitting with these teams and understanding what they needed, and then they were like, okay, here you go, we're going to go do other stuff now, let us know how this goes for you. Then a really cool thing actually ended up happening, where people on those teams who were using the tool were like, oh, I wish this little thing could get tweaked, or this thing could get better. And they popped open Claude Code and made some of the changes to the actual app themselves.
A
And then is your team responsible for approving the PR? Does it just go in?
C
Usually my team's responsible for reviewing the PR if it's a system that we actually own. But yeah, people can kind of self-serve making changes to those things, which I think is really cool. I do think we're still in a stage, for a lot of teams and a lot of companies, where, going back to, you know, Stripe has Minions, right? Stripe has a large developer productivity team. We used to work at Stripe, so we spend a lot of time with them. They're awesome, and they're obviously putting a lot of work and energy into building platforms and tools like this. So I think we're definitely still in a place where something like managed agents, or being able to build on top of our platform, is really powerful, but you still kind of need the AI-pilled people and technical people within a business to then go create something really excellent on top of that, something that works well for whatever you're trying to do.
A
That's interesting. Yeah, I love that anyone can open a PR to do this, because everyone's using Claude Code. One of the things that I find talking to people in infrastructure roles at companies where this is starting to happen: you know the meme where there's a person with daggers in his back, trying to take cover? Infrastructure people are that now that anyone can submit PRs. How do you deal with that, and how do you do it well? Because obviously, in an ideal world, you would love for legal to be able to submit PRs to improve this agent, and also sometimes they're probably going to submit stupid stuff that wastes time. So what are the right ways, organizationally, culturally, or technically, to make that possible without ruining your lives?
B
For this particular one that we've constructed, that Caitlin's given as an example, we actually have a couple layers of abstraction away from that PR layer. At the very beginning it kind of started that way, and to basically prevent users from foot-gunning themselves a little bit, they get to a place where, oftentimes, the way they interact with the agent that they own, whether it's the marketing team owning the agent that makes the requests, or the legal team owning the agent that does the review, is that they actually engage with those agents through Claude itself. So they actually spend more of their time talking directly to Claude, and then Claude will oftentimes figure out the right way for them to go handle it, so that they're not hopping straight down to the absolute core bit and doing something that may result in some complications.
A
And they're talking to Claude, or Claude Code? Like Claude chat, or Claude Code, or a coworker?
B
It's a different instantiation of Claude that we made that actually is a managed agent in and of itself. So it's managed agents all the way down in that construct. But we found that if we tune and prompt each variant of the managed agent, each layer helps to solve a different part of the problem for users. So the end state for that marketing person or that legal person is a really simple interface, where the way that we tell them is, you're just talking to Claude. But under the hood, it's many, many Claudes engaging with each other to get to the part where the Claudes themselves are doing the more complex work that the human doesn't really need to interpret.
A
Interesting. You just launched multi-agent orchestration. What are the coolest things that people are doing with that?
B
One of the more interesting ones is that people are using it to construct different harness techniques, and that one I'm personally very excited by, because there are different techniques that people have experimented with. For example, we recently did the advisor strategy one, but really, if you were to genericize it, you just separate execution from advice. There's also one where you have two modes: one is generating something and the other is adversarial to it. There could also be one where you split the work into a bunch of tiny pieces and then they recombine, and ones where maybe it's something closer to a best-of-n style of thing, and there are so many more. And each of these different architectures, or strategies, is good for very specific use cases. Some of them are much better for deep research or wide research style use cases, right? And others, the kind where they all swarm together, are better for bug hunting, for example. So that's really cool to see: if we can make the primitives very Lego-like, then people can put them together to solve things at a slightly higher form factor, which is more like an architecture or a strategy, and they get much more interesting results out of that. And that's really exciting, because it also suggests that you can actually hill climb at multiple layers of abstraction.
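One of the strategies named above, best-of-n, fits in a few lines. The sketch below is purely illustrative: `run_subagent` is a hypothetical stand-in that simulates a scored attempt, where a real harness would spin up a separate Claude session per candidate and grade each result.

```python
import random

def run_subagent(task: str, seed: int):
    """Stand-in for one subagent session: returns (candidate, score).
    A real harness would run the model here and grade its output."""
    rng = random.Random(seed)          # deterministic per seed, for the demo
    return f"{task}-attempt-{seed}", rng.random()

def best_of_n(task: str, n: int = 4) -> str:
    """Fan the same task out to n subagents; keep the highest-scoring candidate."""
    results = [run_subagent(task, seed) for seed in range(n)]
    return max(results, key=lambda r: r[1])[0]
```

Swapping out the combine step is what yields the other strategies the answer mentions: pair a generator with an adversarial critic, or split the task into pieces and recombine, while the fan-out primitive stays the same.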
A
How do you know if an agent is successful? How do you measure success for an agent?
B
Yeah, I mean, there are evals and stuff like that, which everyone has talked about ad nauseam. One direction that we really like is this kind of verifiable outcome; we've been somewhat opinionated on that one. We talked a little bit about what a platform is at the end of things, and going from that philosophy, our principle is that maybe the end state is that everything should compress down to an outcome and a budget, and that's probably about it. Everything else should be figured out for you, resolved exactly across those parameters. So yes, we still have evals, and we have a lot of other domain-specific things that we measure. For some coding evals, you might want to measure just the actual PR getting merged; those are more verifiable. But getting to the place where an outcome is a spec that you, as a human, are able to define, and where Claude can interpret that spec and re-grade itself over and over, is closer to what we care about.
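The "outcome and a budget" framing can be made concrete with a toy loop. All names here are illustrative, not a real Claude API: the caller supplies only a spec, a budget, a way to attempt the work, and a verifier, and the harness retries until the verifiable outcome passes or the budget is spent.

```python
def pursue_outcome(spec, budget, attempt, verify):
    """Retry `attempt` until `verify` accepts the result or the budget runs out.
    Returns (result, spent); result is None if the budget was exhausted."""
    spent = 0
    while spent < budget:
        result, cost = attempt(spec)   # e.g. one agent session
        spent += cost
        if verify(spec, result):       # e.g. "did the PR actually merge?"
            return result, spent
    return None, spent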
A
Claude, make me a billion dollars. Your budget is $10.
B
Exactly. And then say, no mistakes.
A
Go, go.
B
Exactly.
A
Maybe Mythos could do that. And then one of the things that we've been running into, that I'm curious if you have a solution for, is that agents get outdated pretty quickly. Sometimes there's no human attached to them, sometimes they're just running an old model, or they're on an old architecture or whatever. It feels like there needs to be an end-of-life cycle for agents. We've talked about having a little funeral for them, and having a little page on our website that's like, here are all the decommissioned agents. So how do you manage, especially in a really big company, all the agents that are out there? Maybe they're in Slack pinging stuff once a week, but you're like, this is super stale. How do you make sure that you retire them as quickly as you're making them?
C
So one of the things we've actually done is make skills that help you do things like upgrade to a new model when a new model comes out. Right. We've put a good amount of work into making it easier to do exactly what you're talking about. And I think maybe some of the most AGI-pilled people are running agents that monitor their agents to see if their agents are outdated and in need of that sort of thing. But for the customers who ask us this question, I do think the most interesting instantiation is: there's a new model, and now I need to go upgrade my agents, or maybe be done with those agents, because the new model enables me to build agents that are way more powerful and do more interesting things than the old agents did. Right. That upgrade and migration process is something people have had to wrap their heads around: it's like a breaking change, and I have to put actual energy into making it work. And obviously, sorry to talk about evals again, but if you have evals, this process is easier. So that's one of the things we've tried to do: give you skills and the right tools to make that process easier. And then you could go be AGI-pilled and choose to automate more of it with more agents.
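A minimal sketch of that upgrade-with-evals flow, with the config shape and model names made up for illustration: swap in the new model, keep it only if every eval still passes, and otherwise roll back to the old configuration.

```python
def upgrade_agent_model(config: dict, new_model: str, evals) -> dict:
    """Try the new model; keep it only if all evals pass, else roll back.
    `evals` is a list of callables taking a candidate config and returning bool."""
    candidate = {**config, "model": new_model}   # copy; original stays intact
    if all(check(candidate) for check in evals):
        return candidate
    return config  # a regression somewhere: keep the old model
```

This is the "breaking change" discipline in miniature: the evals are what turn a risky migration into a mechanical one, which is why the answer above keeps circling back to them.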
B
Yeah.
A
So a year from now, we're back at Code with Claude. Where do you think the platform will be? What will I be able to do, and how will it be different from what I can do today?
B
Do you want to go first?
C
You can go first. A year is a long time, in this industry especially.
A
How close are we to Claude, make me a billion dollars? That's really what I'm asking.
C
Probably won't be sitting here.
B
Yes, yes. We will be asking Claude for this. I mean, yeah, we want to get closer and closer to that state. Okay, so a couple things, a year from now. One thing that we'd love to get really, really close to is actually that kind of simplicity. And this might be a significantly higher order of abstraction. I don't know what the form factor will look like, but the parameters we'll care about from users will be that outcome, which of course has to be verifiable, some parameters that have to be restrictive, and the budget. And I think we'd want to experiment with directions where Claude actually gets so good at understanding itself that it figures out what model you should be using and figures out how to spin up all the sub-agents. I actually don't think you need to think so much about harness engineering in that world. Even today, you don't have to think so aggressively about tool construction, for example; we've made that a little easier, so you get to leave behind a little of that scaffolding, less prompt engineering. And if you just keep going up that stack: today, a lot of the innovation is happening at this really high level, almost at the harness-architecture level, which is really fun. But I think a lot of that honestly also goes away, where you almost don't have to think about model selection, you don't have to think about what kind of architectures are there, because we probably would have gone through enough iterations with Claude that Claude is able to understand itself well enough to almost write itself on the fly, to figure out what's necessary in that two-parameter world of outcome and budget.
I don't know that we'll get there in a year, but I feel we might be able to do the outcome part of that, with maybe some error bars on the budget side.
A
Really cool.
C
Yeah. Okay, that was really cool. I'm going to give you a slightly more boring answer, which is: in that world, if Claude on the fly, or agents on the fly, are becoming what they need to become in order for you to do what you're trying to do, the platform has to seriously scale. And so I do think some of this will be figuring out the right abstractions that actually enable that, somewhere on the primitive-to-higher-order spectrum, right? But I think so much of what our team is going to be doing is making sure that the tokens people want to move in and out of Claude are actually able to move in and out of Claude, because our system is scaled to meet not just the demand, but that world where you have agents that are literally constantly running and recreating themselves and doing this sort of work. You just need a system that can handle long-running requests and a bunch of differently shaped things. For us, I never want the platform's ability to scale to get in the way of what people would otherwise be able to accomplish with these things. So I think that's something that's going to be very front of mind when we're talking in a year.
A
Awesome. I'm excited. Thank you so much for joining. I really learned a lot.
B
Thanks for having us.
D
Oh my gosh, folks, you absolutely, positively have to smash that like button and subscribe to AI and I. Why? Because this show is the epitome of awesomeness. It's like finding a treasure chest in your backyard, but instead of gold, it's filled with pure, unadulterated knowledge bombs about ChatGPT. Every episode is a rollercoaster of emotions and insights and laughter that will leave you on the edge of your seat, craving more. It's not just a show, it's a journey into the future, with Dan Shipper as the captain of the spaceship. So do yourself a favor: hit like, smash subscribe, and strap in for the ride of your life. And now, without any further ado, let me just say, Dan, I'm absolutely, hopelessly in love with you.
Episode Title: The Secrets of Claude's Platform From the Team Who Built It
Host: Dan Shipper
Guests: Angela (Head of Product, Claude Platform, Anthropic), Caitlin (Head of Engineering, Claude Platform, Anthropic)
Date: May 8, 2026
This episode delves deep into the evolving landscape of AI development platforms, focusing on Anthropic’s Claude platform and its new managed agents. Dan Shipper interviews Angela and Caitlin from the Claude Platform team at Anthropic, unpacking the philosophy, infrastructure, and user experiences behind building, deploying, and scaling agents. The conversation provides an inside look at how Claude is poised to shape the next generation of AI-powered productivity tools and collaborative workflows—and the challenges and opportunities in making flexible, scalable, and user-friendly AI platforms.
“As we make improvements to Claude... we find ourselves basically needing to evolve the platform to be higher and higher order abstraction... to help you get the best outcomes out of something.” (01:47)
“We’ve taken what we see as all the most powerful of those things and put them together into a harness and a set of infrastructure...” (04:16)
“We want Claude to very specifically use file systems... to be really opinionated about that. But... we are trying to make it flexible enough for people to add in different pieces...” (07:33)
“All of our first party products are built directly on the same platform as everybody else.” (10:37)
“Each one of these harnesses performed drastically differently... you can actually hill climb a tremendous amount by just harness engineering the right pieces together.” (13:05)
“What we end up treating as the right path and the right primitives need to be very carefully thought through.” (15:03)
“We actually felt pretty strongly about launching [quick start], not specifically for non-technical people, but for anybody to be able to wrap their head around the primitives.” (17:15)
“Everybody hits an infrastructure wall... your whole agent dies, right? And so... the infrastructure part especially is the wall that most people end up hitting.” (19:36)
“When you get to the team layer, suddenly everything gets massively more complex... it needs to happen at a slightly higher bit of abstraction than just a single agent.” (25:07)
“The agent reviews first and then puts it in legal’s inbox as a first pass review done... sometimes it’s clear enough to approve, other times it needs extra human review.” (27:25)
“The way that we tell them is like you’re just talking to Claude, but under the hood, it’s many, many Claudes engaging with each other to get to the part where then they [the Claudes] are doing the more complex work...” (33:43)
“If we can make the primitives very Lego-like, then people can put them together to solve things at a slightly higher form factor... and that’s really exciting to see...” (35:50)
“Some of the most AGI-pill people are running agents that monitor their agents to see if their agents are outdated…” (37:50)
“We want to get closer... to that state where the parameters we care for from users will be that outcome... and the budget. Everything else should be figured out for you.” (39:36)
“I never want the ability of the platform to scale to get in the way of what people would otherwise be able to accomplish.” (42:24)
Conversational, insightful, and technical, with the guests blending forward-looking speculation, real implementation details, and a practical sense of humor about the joys and pains of building at the cutting edge of AI platforms.
This episode offers a rare peek behind the curtains of Anthropic’s Claude platform as it moves from being a basic LLM API to a full-featured, modular platform underpinning a new wave of agents and collaborative AI. Angela and Caitlin articulate the philosophy of building not just for developers, but building infrastructure that liberates teams—from legal to engineering—to quickly automate, innovate, and collaborate with AI, all while candidly navigating the balance between abstract flexibility, production reliability, and the relentless pace of AI progress.
If you’re building (or planning to build) with Claude, this is essential listening—laying out the technical roadmap, design patterns, and cultural philosophies that will shape what you can (and should) do with AI in the next year and beyond.