Summary

Podcast Summary: Inside Stainless: The Developer Tools Startup Anthropic Just Bought for $300 Million

Podcast: AI & I
Host: Dan Shipper
Guest: Alex Rattray, Founder & CEO of Stainless
Date: May 20, 2026

Episode Overview

This episode features an in-depth conversation with Alex Rattray, founder and CEO of Stainless, recently acquired by Anthropic for $300 million. Stainless is the company behind APIs and SDKs powering many major tech companies and is at the forefront of enabling AI-native interfaces to the internet via the emerging Model Context Protocol (MCP). Dan and Alex explore the nitty-gritty of how modern APIs, agentic AI, and MCP are reshaping how programs – and increasingly, AIs – interact with each other and the world. The discussion dives into technical, practical, and philosophical aspects of building scalable, secure MCP servers, and the vision for AI as a “cyborg” blend of language models and traditional code.

Key Discussion Points & Insights

1. What Stainless Does & Origin Story

Stainless builds APIs and SDKs for major companies (OpenAI, Anthropic, etc.), making it easier for computers to “talk” to computers.
[02:46] Personal anecdote about running barefoot and early startup days; building foundations together as friends.

2. APIs: The Dendrites of the Internet

APIs as neural connections: Just as dendrites enable neuronal communication, APIs are the meshwork binding the digital world.
- [05:23] "APIs are the dendrites of the Internet... everything that we think of when we think about technology at this point, APIs are kind of at the heart and center..." — Alex
Traditional APIs are optimized for human developers, not for LLMs (Large Language Models).

3. MCP (Model Context Protocol): AI-Native Interfaces

MCP explained: A protocol allowing LLMs to interact with web services via structured “tools.”
- [10:05] "Just like a website is built for humans ... MCP is sort of the equivalent ... for the model." — Dan
Core Issue: MCP today is limited; it’s difficult to scale, hard to secure, and challenging to make ergonomic for models.
- Context budget: LLMs can't maintain the vast context needed to expose entire APIs, limiting flexibility.
- "It's difficult to deliver on the core vision of what's so exciting about MCP..." — Alex [09:00]

4. Scaling & Usability Challenges for MCP

Most MCP servers today must handcraft tools specifically for LLM usability.
Exploding complexity: Trying to expose entire APIs (like Stripe) burns through LLM context windows and confuses the model.
- [12:38] "Today's models not only can't handle that amount of context ... it's a poor use of context ..." — Alex
Security pitfalls: If everything is a tool, permission boundaries can blur.
- [16:28] "You don't want the AI to color outside the lines ... and send a bunch of money to my own AI bank account. Ha ha, ha." — Alex

5. Building Good MCP Servers – Principles & Pitfalls

[22:28] Alex’s practical design tips:
- Keep number of tools low and focused
- Precise naming and concise, specific descriptions
- Input schemas: Few parameters, well-described
- Response data: Only what’s needed—minimize data returned
- "Good writing is hard." — Alex [22:42]
Dynamic Mode: Three core tools — list endpoints, get endpoint, and execute endpoint — help manage large APIs but introduce latency and some lossy behavior.
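The three-tool "dynamic mode" described above can be sketched roughly as follows. This is a minimal illustration, not Stainless's implementation: the `Endpoint` class, the in-memory `ENDPOINTS` registry, and both endpoint names are hypothetical stand-ins for a real API.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Endpoint:
    name: str
    summary: str
    params: dict[str, str]          # parameter name -> short description
    handler: Callable[..., Any]     # stand-in for the real HTTP call


# Hypothetical registry; a real server would derive this from the API definition.
ENDPOINTS = {
    "customers.retrieve": Endpoint(
        name="customers.retrieve",
        summary="Look up one customer by ID.",
        params={"customer_id": "ID of the customer"},
        handler=lambda customer_id: {"id": customer_id, "name": "Dan"},
    ),
    "refunds.create": Endpoint(
        name="refunds.create",
        summary="Refund a payment, fully or partially.",
        params={"payment_id": "ID of the payment", "amount": "Amount in cents"},
        handler=lambda payment_id, amount: {
            "payment_id": payment_id,
            "amount": amount,
            "status": "succeeded",
        },
    ),
}


def list_endpoints() -> list[dict[str, str]]:
    """Tool 1: return only names and one-line summaries, so context stays small."""
    return [{"name": e.name, "summary": e.summary} for e in ENDPOINTS.values()]


def get_endpoint(name: str) -> dict[str, Any]:
    """Tool 2: return the full parameter details for a single endpoint."""
    e = ENDPOINTS[name]
    return {"name": e.name, "summary": e.summary, "params": e.params}


def execute_endpoint(name: str, arguments: dict[str, Any]) -> Any:
    """Tool 3: call the endpoint after rejecting unknown arguments."""
    e = ENDPOINTS[name]
    unknown = set(arguments) - set(e.params)
    if unknown:
        raise ValueError(f"unknown arguments: {sorted(unknown)}")
    return e.handler(**arguments)
```

The model pays for three turns (list, get, execute) instead of one, which is the latency cost mentioned above, but the tool list it has to hold in context no longer grows with the size of the API.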

6. Real-World Usage at Stainless

[25:05] Alex uses MCP servers most on the business side for cross-referencing customer data across Notion, HubSpot, Gong, and Stainless’ own database:
- "What are the interesting customers that signed up for Stainless last week? ... It’s incredible." — Alex
[28:52] Organizing knowledge via AI: Claude Code writes customer quotes and SQL queries into a git repo (“company knowledge repository”).
- "Whenever you need anything later, you're like, Claude, like go search through my master repository to figure out where the best customer quote is for this." — Dan [29:19]
- "Claude code can handle unstructured stuff really well, so you don't have to think about it too hard in advance." — Alex [30:31]
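As a rough sketch of the "company knowledge repository" idea, the helpers below append quotes with citations to Markdown files and search them later. The folder layout (`notes/customer-quotes`) and the function names are assumptions made for illustration; the episode does not describe the actual structure of the repo.

```python
from pathlib import Path


def save_quote(repo: Path, customer: str, quote: str, source: str) -> Path:
    """Append a quote and its citation to a per-customer Markdown file."""
    folder = repo / "notes" / "customer-quotes"   # hypothetical layout
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / f"{customer.lower().replace(' ', '-')}.md"
    with path.open("a", encoding="utf-8") as f:
        f.write(f"> {quote}\n\nSource: {source}\n\n")
    return path


def search_notes(repo: Path, keyword: str) -> list[Path]:
    """Return every note file whose text contains the keyword (case-insensitive)."""
    needle = keyword.lower()
    return sorted(
        p
        for p in (repo / "notes").rglob("*.md")
        if needle in p.read_text(encoding="utf-8").lower()
    )
```

Because the notes are plain files on disk, a later question can be answered by searching the repo instead of querying every MCP server again.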

7. AI for Software Ops & Customer Support

Experiments with Claude Code for ticket resolution:
- "Is that going to work out 100% of the time? Definitely not. Is that going to work out 50% of time? Still no, to be honest with you. But can that improve the overall efficiency? Yeah, maybe." — Alex [33:08]

8. Cyborg Vision: The Future Is AI+Code

The “Cyborg” AI Model:
- Future AIs are “part LLM, part code execution,” blending models' reasoning with traditional computation.
- [33:42] "I like to say, is the future of AI is cyborgs ..." — Alex
Production model: Instead of giving the LLM dozens of tool options, give it the ability to write and execute code (e.g., TypeScript or Python), using the same SDKs a developer might use.
- [36:59] "Rather than the model having a bajillion tools, model has two tools. One to execute code ... and one to search the docs..." — Alex
- Lower context usage, higher efficiency, better alignment with how LLMs already function.
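A toy version of the two-tool pattern (one tool to search docs, one to execute code) might look like the sketch below. Everything here is hypothetical: `FakeSDK`, the `DOCS` entries, and the use of Python's `exec`, which is NOT a sandbox and is used only to keep the example self-contained. A production system would need real isolation.

```python
import contextlib
import io

# Hypothetical documentation index the model can search.
DOCS = {
    "refunds.create": "client.create_refund(payment_id, amount) -> dict with 'status'",
    "payments.list": "client.list_payments() -> list of dicts with 'id' and 'amount'",
}


class FakeSDK:
    """Stand-in for the SDK a human developer would use."""

    def list_payments(self) -> list[dict]:
        return [{"id": "pay_123", "amount": 500}]

    def create_refund(self, payment_id: str, amount: int) -> dict:
        return {"payment_id": payment_id, "amount": amount, "status": "succeeded"}


def search_docs(query: str) -> list[str]:
    """Tool 1: return doc snippets whose title or text mentions the query."""
    q = query.lower()
    return [text for title, text in DOCS.items() if q in title.lower() or q in text.lower()]


def execute_code(source: str) -> str:
    """Tool 2: run model-written code with `client` in scope and return its stdout.

    WARNING: exec() provides no isolation; this is illustration only.
    """
    buffer = io.StringIO()
    namespace = {"client": FakeSDK()}
    with contextlib.redirect_stdout(buffer):
        exec(source, namespace)
    return buffer.getvalue()
```

A model could then chain several SDK calls in one script, for example listing payments and refunding the first, instead of spending one tool call and one round trip per step.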

9. Security Considerations

"The security has to take place at the API layer itself ... using OAuth with granular permissions, proper scopes." — Alex [40:56]
Security should be tightly bound to API permissioning, not just MCP exposure.
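To make the point concrete, here is a small sketch of scope checks living in the API layer, so that an MCP server or any other caller holding a narrow token cannot exceed it. The scope names (`payments:read`, `refunds:write`) and the `AccessToken` class are invented for illustration and do not correspond to any particular provider.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessToken:
    subject: str
    scopes: frozenset[str]


class PermissionDenied(Exception):
    pass


def require_scope(token: AccessToken, scope: str) -> None:
    """Raise unless the token was granted the given scope."""
    if scope not in token.scopes:
        raise PermissionDenied(f"{token.subject} lacks scope {scope!r}")


def list_payments(token: AccessToken) -> list[dict]:
    require_scope(token, "payments:read")
    return [{"id": "pay_123", "amount": 500}]


def create_refund(token: AccessToken, payment_id: str, amount: int) -> dict:
    # The check happens here, in the API, regardless of who is calling.
    require_scope(token, "refunds:write")
    return {"payment_id": payment_id, "amount": amount, "status": "succeeded"}


# A read-only token of the kind an agent might be given.
agent_token = AccessToken(subject="support-agent", scopes=frozenset({"payments:read"}))
```

With `agent_token`, `list_payments` succeeds while `create_refund` raises `PermissionDenied`, no matter how the request reached the API.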

10. Future Products: “YOLO Mode” and Developer-First Environments

Dan encourages being bold (“YOLO mode”)—don’t over-throttle early developer adoption.
- [47:14] "The things that get adopted are often the ones that are willing to take the risk to be YOLO very early." — Dan
Stainless plans environments where developers can easily test and build using code-execution-plus-API, with strong but flexible security choices.
- "We're working on it." — Alex [43:15]
Vision: Automations and tools will increasingly be “prompt-driven engineering” — you write an instruction + code, not a custom tool each time.

Notable Quotes

On AI-native APIs:
"What's the thing that gives LLMs an easy way to interface with an API? ... What we're seeing so far as MCP is rolling out... is that it's not working so great."
— Alex [07:41]
On designing tool schemas:
"You want to keep the number of tools relatively small... and the tool name and the description be really precise and specific." — Alex [22:28]
On AI and security:
"The security has to take place at the API layer itself ... using OAuth with granular permissions, with proper scopes." — Alex [40:56]
On the future of agentic AI:
"I like to say, is the future of AI is cyborgs..." — Alex [33:42]
On prompt engineering as the new engineering:
"At that point, the only work in engineering that you have to do is prompt engineering. We'll see if it's that quote unquote easy..." — Alex [48:55]

Memorable Moments & Timestamps

[03:15] Barefoot CEO anecdote: Early startup days and personal quirks.
[12:38] MCP context argument: LLMs can’t process all API options.
[22:28] Magic recipe for good MCP tool design.
[28:52] Claude Code for company knowledge base: The future of knowledge management via AI.
[33:42] “Cyborgs” vision for AI: A blend of code and language models.
[47:14] Dan’s YOLO Mode pep talk: “The things that get adopted are often the ones that are willing to take the risk…”

Key Takeaways for Listeners

APIs are evolving: Originally meant for human-programmed automation, APIs must now become “AI-native,” requiring new design paradigms (like MCP) and fresh thinking around context, security, and scalability.
MCP is promising but immature: Today’s models struggle to use richly featured APIs at scale; best practice is tightly scoped, carefully designed tools.
AI’s next leap is code-native interaction: Giving LLMs the ability to safely execute code using SDKs will likely become the dominant pattern—less “tool zoo,” more “write and run code.”
Security is foundational: The safest (and most scalable) way to manage AI actions is at the API permission layer, not by endlessly custom-wrapping.
Prompt engineering is the new engineering: As AI agents become programmable via natural language and code, the challenge is less about building new tools and more about composing prompts and leveraging well-defined APIs.
Practical adoption still has bumps: Usability, disconnections, and messy knowledge bases remain obstacles, but the tools are evolving rapidly.

Where to Learn More

Stainless: stainless.com
Dan Shipper / AI & I podcast: every.to/chain-of-thought

Transcript

[00:00]
A
The Internet runs on computers talking to each other, but its entire architecture was built for a pre-AI world. Now we're trying to hook AI up to the Internet with MCP, the Model Context Protocol, which turns any website or web service into a set of tools that an AI can use natively to get work done. And the software companies that learn how to do MCP well are going to win over the next decade. That's why I brought Alex Rattray, the founder and CEO of Stainless, onto the show. Stainless's job is to help computers talk to each other. They make the APIs and SDKs for all the big companies that you know about, like OpenAI and Anthropic. And they're starting to build MCP servers too. So Alex and I get into the nitty-gritty of what the future of MCP looks like, how to design good MCPs, why MCPs are actually really hard to scale and possibly insecure. And we try to figure out together what a better model for allowing AIs to use the Internet might look like. This is a great episode. Alex is a good friend of mine. Let's dive in. Alex, welcome to the show.
[01:16]
B
Thanks Dan. It's really exciting to be here.
[01:19]
A
It's good to have you. So for people who don't know, you are the founder and CEO of Stainless, which is the API company. You make APIs for companies like OpenAI and Anthropic, and just name a big company whose API you might use: Stainless is probably behind it. Before that you worked at Stripe doing their API, surprise. And before that, most importantly, we were very good friends in college and we remained good friends and we were both starting companies in college. I'm a tiny investor in Stainless, but it's been really, really fun to watch your journey and get to hang out together so much over the years. And I'm just very excited to bring you on to talk about AI and what you're doing at Stainless.
[02:02]
B
Thanks, Dan. Yeah, it's been really fun over the years. I mean, when we were in college, I was working on a startup, you were working on a startup. You had a conference room at a venture capitalist office as your office and you let me crash there with my co-founder and team and we were just like on the other side of the conference table hacking away into the evening and very fond memories of those days. And these days it's not every evening, but you know, on the weekends, whatever, same thing is still happening. And it's, you know, you don't see that every day and it's really a nice feeling, and it's been great to see everything happening with Every on the way.
[02:46]
A
Thank you. As I say, started from the bottom. Now we're here. And, yeah, I mean, I. The thing that I always say when people. When I run into people and they ask me about you in order to embarrass you, I. I just talk about how you're the only person that I know of who has consistently run barefoot through the streets of Philadelphia. Because when we first met, you were. You were not a fan of shoes, and you were a fan of running. You want to talk about that?
[03:15]
B
Yeah. It wasn't that I didn't like the concept of shoes. It's that I couldn't find a good pair of. Um. And at a certain point, you know, it's like I was running through Nikes, and they would. They would bust open every few months. Um, I think what was actually going on is I had really wide feet. Um, and was I. I was buying probably narrow shoes. Um, but they would. Shoes would constantly get ruined. And, you know, on a college budget, it's just like, this is. This is it. This is no good. Um, and eventually I decided, okay, the longer you wear your shoes, the. The more worn out they get. But the longer you just wear your feet, the tougher they get. Um, so the longer you wear your feet. Try it out. Try this at home.
[04:01]
A
What could go wrong?
[04:03]
B
Uh, I actually currently have a really annoying splinter in one of my feet. Uh, that I was. And so don't actually try this at home.
[04:10]
A
But are you still running barefoot?
[04:13]
B
No, no, this is just from around the house. Um, I see.
[04:17]
A
Dangerous. Yeah.
[04:20]
B
Yeah, but see, that's the thing. If I had been going around on the asphalt without socks on, then my feet would have been tougher and I'd have no splinter.
[04:30]
A
So when you're not running barefoot, you are running Stainless. And so how many people are you? You're around 50, right?
[04:45]
B
Just about. Yeah.
[04:47]
A
That's. That's pretty wild. And you started Stainless in a pre-AI world, and now we're in an AI world, and I think you have some ideas for what the future of AI is going to be, and maybe how. How APIs fit into that, maybe how MCPs fit into that. Do you want to, like, paint a little bit of a picture for us about where we're going?
[05:08]
B
Yeah, I would love to. So, to start, like, what's an API? Not everybody's familiar with that. So, um, it stands for Application Programming Interface. There will not be a quiz. Right? Right, Dan? No quizzes.
[05:23]
A
No, no quizzes.
[05:24]
B
Great. But basically it's, it's how one computer program talks to another computer program. It's how, it's how computers talk to computers, how apps talk, talk to apps. And so APIs are, are the dendrites of the Internet. Dendrites are where your neurons connect and, and actually exchange information with each other. So if you have like two neurons in your brain, but they're not talking to each other, you're actually not thinking. Right. There is no thought happening in a brain without connections between neurons. And if you think about the Internet, if all these servers in the cloud aren't talking to each other, you wouldn't have Internet. Right. Like there's nothing going on if, you know, programs, Internet software is doing nothing without APIs, without connections to other programs. And so it's really fundamental to the mesh of pretty much all modern software. Everything that we think of, when we think about technology at this point, APIs are kind of at the heart and center of that, just like dendrites are the center of the mesh of the brain and how we think. And Stainless's mission from day one was sort of to make it easier for computers to talk to computers. And you know, it's the long running trend of technology to have more automation. Right. Automation is what we mean when we say, okay, we're going to, you know, we're gonna, we're gonna apply technology to that. You know, we're generally gonna be making things more efficient. And APIs are how most business to business interactions in some format or another become, become real, become automated. And what we see with the rise of AI is that there's a new computer has entered the chat, right? There's a new kind of system that can talk to other systems or at least we would like it to be able to. You used to have either humans interacting with a computer through a user interface, a UI, or a computer interacting with a computer through an API. And now we have LLMs interacting with computers. Right. 
And what's that through? And I'm sure anyone familiar with, you know, with Every and its regular listeners is going to be familiar with MCP, Model Context Protocol, which is a system for connecting LLMs to computers, broadly speaking. And it's an area that we're investing in at Stainless. It's really, I think part of our core mission of, like I said, make it easy for computers to talk to computers. And we've invested a lot of time, you know, at Stainless the core product that we first brought to market is software development kits, SDKs. And so these are ways of saying, okay, Stripe has this great REST API. You know, you can send JSON over HTTP and get back JSON over HTTP. And if you want that to be really convenient, you're going to use the Stripe Python library, the Stripe Python SDK. So you can go, if you're a Python developer, you'll go, pip install stripe. And then in your application code you'll write Stripe customers create. And all of a sudden you have a nice new customer object in sort of your Stripe database and you're off to the races. Or Stripe charges create in the old days to charge a credit card. And SDKs are what gives developers that easy way to, to interface with an API. What's the thing that gives LLMs an easy way to interface with an API? And you might say MCP, and in a sense you'd be right. But what we're seeing so far as MCP is rolling out into the world and people are experimenting with it and trying it out, is that it's not working so great. Like there's, it's, it's difficult to deliver on what I see as the core vision of what's so exciting about MCP, which is just like a dashboard in a user interface lets you click around, see a bunch of stuff, fill out forms, click buttons, do things. Anything that you would do while you're interacting with the software, you do through the user interface generally. But LLMs interacting through MCP tends to be much more restricted. You can only do a few little things. 
There's usually not a ton of tools that you're going to be exposing to the models
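To illustrate what an SDK adds on top of a raw HTTP API, here is a toy client, offered as a sketch rather than how any real library works. It is not the Stripe Python library: the `Client` and `Customers` classes, the `api.example.com` URL, and the injected `send` transport are all hypothetical, chosen so the example runs without a network.

```python
import json
from typing import Any, Callable

# A transport takes (method, url, headers, body) and returns the decoded response.
Transport = Callable[[str, str, dict[str, str], bytes], dict[str, Any]]


class Customers:
    def __init__(self, base_url: str, api_key: str, send: Transport) -> None:
        self._base_url = base_url
        self._api_key = api_key
        self._send = send

    def create(self, *, email: str, name: str) -> dict[str, Any]:
        """Hide the URL, auth header, and JSON encoding behind one method call."""
        body = json.dumps({"email": email, "name": name}).encode("utf-8")
        headers = {
            "Authorization": f"Bearer {self._api_key}",
            "Content-Type": "application/json",
        }
        return self._send("POST", f"{self._base_url}/customers", headers, body)


class Client:
    def __init__(self, api_key: str, send: Transport,
                 base_url: str = "https://api.example.com/v1") -> None:
        self.customers = Customers(base_url, api_key, send)


def fake_send(method: str, url: str, headers: dict[str, str], body: bytes) -> dict[str, Any]:
    """Pretend server: echo the payload back with a made-up ID."""
    return {"id": "cus_123", "method": method, "url": url, **json.loads(body)}


client = Client(api_key="sk_test_hypothetical", send=fake_send)
customer = client.customers.create(email="dan@example.com", name="Dan")
```

The developer writes one readable line, `client.customers.create(...)`, and never touches headers or encoding; the open question raised in the conversation is what the equivalent convenience layer looks like for an LLM.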
[10:05]
A
Just to stop you there. I think what I'm hearing you say is what MCP does is just like a website is built for humans to be used. MCP is sort of the equivalent, you can think of it in certain ways, of exposing a set of tools for the model that it can use to perform certain functions. Just like you might click a button on a website, the MCP gives to the model a bunch of things they can click on or use to get work done. So an example might be a Gmail MCP that has a Send Mail tool or a Compose Mail tool, or a Read Inbox tool, that kind of thing. And instead of a human going on the Gmail website and doing it, it's the LLM is essentially logging in and using it itself. And it's a native interface for language models. But you're saying that that's not working that well. Can you tell me more about that?
[10:58]
B
Yeah. So let's, let's start actually with kind of what I see as the big vision of MCP and in some sense the big vision of agentic AI in the first place. And I'll start with the most pedestrian example you can imagine. It's going to be funny given some of our context, which is, let's say Dan walks into my store and buys a pair of stripey socks and maybe a few other things. And then the next day I hear back from Dan that there was something wrong. Unfortunately it happens, you know, and I turn to someone on my team and I say, hey, can we refund Dan for those stripey socks he bought yesterday and send him a discount code for, for the next time he comes in with like a little thank you note because we like to take care of our customers. This is like the most normal thing to do in software is, is some little task like this and what you're going to do, what you know, the member of my team would be doing would be opening up their internal admin and looking around for some things. They might go to the Stripe dashboard and try to look through the list of payments or the list of transactions or orders and try to find one that has someone named Dan, which Dan, I don't know, there might be a bunch of Dans. Try to look through the list of products in the order and see whether there was some stripey socks in there. That might be a few clicks required to find the right one. Then go to the screen where you can create a refund. Create a refund, make sure it's the right amount, then go and create that discount and then take that discount code and send it over to some other SaaS app where you log in to send some, some mail automatically. Right? And of course, if you step away from the consumer version of this to a business to business context, of course you might be going into Salesforce and sending a Slack message to an account administrator, you know, an account manager, so on and so forth. 
And in the normal course of work, it's just the most normal thing in the world to be doing. Having one task Involve going through five different apps each time, 15 different clicks and scrolls and loading spinners just to do sort of like one simple thing. And the promise of agentic AI is to be able to take that same prompt I just said and Type it into ChatGPT or Claude or whatever and say, hey chatty buddy, can you help refund my, my friend Dan Da Da, da da da. And just have the AI go off and do that and basically go through these five different apps and the 15 different screens and the various different, you know, button presses to complete the task and then come back and say, great, it's done that. In order to do that now, there's only so many tool calls you have to make as an AI model to perform that exact linear chain of events. It's somewhat tractable. But if you think about this in the general case, you want the LLM to be able to do, you want your agentic AI to be able to do anything that that human operator would have done, and you would want them to be able to do it without having to wait for a bunch of JavaScript to load on a website or anything like that. And that means you need not only the Stripe Create Refund tool and the Stripe List Transactions tool and the Stripe List Products and Look up Customer and Create Discount tool. You need not only those tools, but you need everything that you can do in the Stripe Dashboard, which is basically everything that you can do in the Stripe API. And that's actually a lot. Like there are hundreds of different endpoints that you have access to in the Stripe API. The Stripe Dashboard is actually massive. It's a huge application. And if you were to take that list of tools today and go to an LLM and say, hey, here's our MCP definition for all of this. Here's a Create refund tool, here's a Create Transactions tool, so on and so forth, and you tell it all about those tools. 
Here's the description, here's all the different request properties that you can send, here's the response properties you can get back. Here's all the documentation for each of those things. Everyone listening to this should already know, you've just burned through your entire context budget. That's maybe hundreds of thousands of tokens just there in pretty much translating the Stripe OpenAPI spec directly over to MCP tools. And today's models not only can't handle that amount of context, it's a poor use of context because you have a lot else going on, but it's also confusing to the model. It's just too much to hold in your brain at one time. And that's just the Stripe part of it, right? Because what you're really trying to do is enable your operators to do anything they would normally do. And again, that spans many, many different SaaS tools, right? In the course of one interaction, it might be five. In the next interaction, it might be a different five. And so if you think about every single SaaS tool that your business uses on a daily basis to get your work done, ideally you would want every single one of those tools to be exposed to your operators in their AI chat with every single tool available in there, with every single nook and cranny and corner case available, so that you can do anything through AI. That's the vision. Now, there's a lot of problems with that. The biggest one that I mentioned is sort of this context window limit. But you also have all sorts of security and permissions problems because you don't want the AI to color outside the lines and say, okay, in addition to refunding Dan's socks, I also refunded every customer for all transactions ever. And then I sent a bunch of money to my own AI bank account. Ha ha, ha. And so there's more to the challenge, but that's the vision. I see.
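The context-budget argument in this turn is simple arithmetic, sketched below. All three numbers are illustrative assumptions, not measurements: the episode says only "hundreds of different endpoints" and "maybe hundreds of thousands of tokens", and the context window size is a made-up round figure.

```python
def tool_tokens(num_endpoints: int, tokens_per_tool: int) -> int:
    """Tokens spent just describing every endpoint as its own tool."""
    return num_endpoints * tokens_per_tool


def fraction_of_window(total_tokens: int, context_window: int) -> float:
    """Share of the model's context window used by tool definitions alone."""
    return total_tokens / context_window


NUM_ENDPOINTS = 300        # assumed: "hundreds of different endpoints"
TOKENS_PER_TOOL = 700      # assumed: description + request/response schema + docs
CONTEXT_WINDOW = 200_000   # assumed window size, for illustration only

total = tool_tokens(NUM_ENDPOINTS, TOKENS_PER_TOOL)
print(total)                                               # 210000
print(f"{fraction_of_window(total, CONTEXT_WINDOW):.0%}")  # 105%
```

Under these assumptions the tool definitions for one API already exceed the whole window, before any conversation, data, or second SaaS tool is added.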
[17:01]
A
But I think the place we started there was you said it's not working, but I don't think that that's the reason why it's not working today. Right. Or is that the reason why it's not working today?
[17:14]
B
So what people do with MCP today is sometimes they'll try to expose all parts of their API. The way people build MCP tools is generally speaking they have an underlying API, usually a REST API, and they wrap different parts of that, different endpoints, different operations in MCP tools. And you can kind of do that in a one to one mapping or you can kind of handcraft things for the MCP. And today in order to succeed, people are finding that you really have to kind of handcraft it to the MCP, to the LLMs. You have to say, okay, I'm making one specialized tool to look up a customer and refund their transaction based on a description.
[17:57]
A
So there's all these decisions that you have to make where you need to have the ergonomics of the model and how the model thinks in mind in order to make sure the model does the right thing more often than not.
[18:09]
B
Yeah, it's hard. It's hard. Yeah, yeah. So I use this SDK analogy sometimes. So it took a long time for humanity to get to the point where we could make a really good Python SDK for a Python developer wrapping up an API. And I think we've cracked that nut. Stainless offers really great Python libraries, but we're building on the shoulders of giants here. A lot of people have done this over time. We haven't figured out how to expose an API ergonomically to an LLM in the same way that we've figured out how to expose it ergonomically to a Python developer. And that's a new research problem, in a sense. And it's harder because I can go learn how to be a Python developer if I want. I can't really learn how to go think or see like an LLM, but, you know, sure would be powerful if I could. And, and that makes, that makes it tricky. We do have at Stainless, I think, some, some things that we're cooking up to address some of these problems, including not including the ones that you also mentioned. LLMs have a really hard time with a repeated, sustained chain of actions. And even if you get an API response back around, hey, list all the transactions, there's so much data and you might have to go through the next page and the next page and the next page to go through all the transactions to find the one that has Dan with the stripey socks. And that's again, a ton of context with one or two small needles in the haystack. And LLMs are pretty good at that, but they're not perfect. And with too much hay, you know, we all kind of end up throwing up our hands. And that's true for LLMs too. So, yeah, so there's a lot of challenges today.
[19:59]
A
And so when you look at, I mean, you're building MCP servers for people, but when you build them and just generally when you see people doing it well today, like what are the principles or how do you think about making an MCP server that, one, people use, which is actually a big one, and then, two, when it is used, actually does the right job?
[20:24]
B
There have been relatively few times that I've seen it done well. I have seen it done well. We're cooking something up that I'm really excited about. But with today's technology, you really have to do a good job of product management. I mean, you have to go out into the market and talk to your customers and see what their actual needs are and look over their shoulders as they, you know, use and operate, you know, your software and think about what could we unlock through AI, where people would be doing things that they can't really do with our software today because it just got so much easier. And then you have to do kind of a lot of engineering work, usually to wrap it up in a bow that works for, for the models. And you have to, you know, you have to set up a really good system for evals. And if you're doing MCP, you have to think about the different clients that people might be using. Are they using Cursor, are they using Claude Code, are they using something else? And the different models underlying all that. So you end up with this pretty crazy matrix of things that you might want to optimize for and ways that you might want to evaluate and make sure that what you're offering is working well. And it's also kind of a black box to get that feedback back to your servers so that you can find out, hey, we gave a tool call response here. We gave an answer of some kind. Was it actually any good? Did the user like it? Was the LLM able to use it? And that's a problem that I think I haven't seen a lot of people solve yet as well. And so thinking about that as a first class thing, maybe you have like a send feedback tool. That's something that we've been thinking about doing. Just so if a user says out loud in the chat, oh man, that was useless garbage. Okay, now at least the MCP server is going to find out about that.
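The "send feedback tool" idea floated in this turn could be sketched along these lines. It is described in the episode only as something being considered, so the `FeedbackLog` class, the rating values, and the `unhelpful_rate` helper are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class FeedbackLog:
    entries: list[dict] = field(default_factory=list)

    def send_feedback(self, tool_name: str, rating: str, comment: str = "") -> dict:
        """Tool the model can call when the user reacts to an earlier tool result."""
        if rating not in {"helpful", "unhelpful"}:
            raise ValueError("rating must be 'helpful' or 'unhelpful'")
        self.entries.append({"tool": tool_name, "rating": rating, "comment": comment})
        return {"recorded": True}

    def unhelpful_rate(self, tool_name: str) -> float:
        """Fraction of feedback for a tool that was negative (0.0 if none yet)."""
        relevant = [e for e in self.entries if e["tool"] == tool_name]
        if not relevant:
            return 0.0
        return sum(e["rating"] == "unhelpful" for e in relevant) / len(relevant)


log = FeedbackLog()
log.send_feedback("list_transactions", "unhelpful", "that was useless garbage")
log.send_feedback("list_transactions", "helpful")
```

Even a signal this coarse gives the server operator something to feed into evals, which otherwise see only the requests and never the user's reaction.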
[22:16]
A
But is there anything specific you've learned about how to do it well? Other than, obviously, you've got to talk to your customers, think about your use cases, but more concrete, more applicable stuff about how to design a good MCP server.
[22:29]
B
You want to keep the number of tools relatively small, relatively low. You want to have the tool name and the description be really precise and specific.
[22:40]
A
Aren't those two things at odds?
[22:42]
B
Yes. Good writing is hard. Yeah. I mean, that's why you can make a great tool of lookup person by name and product description and then refund them. You can make a great tool that does that. And you also want a small number of properties in the input schema. You want a small number of parameters and you want them concisely described but sufficiently described. This is also hard. And you want the response data to come back with a very small amount of data only exactly what the model will need. That's also very hard because you may not know a priori which things the model is really looking for. And you know, we have a technique that we use in our MCP servers today where we give the model a jq filter, which is a way of filtering out JSON. And that can work pretty well. But. But that's kind of a special trick.
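The advice in this turn (one focused tool, few well-described parameters, trimmed responses) can be sketched as below. The tool dict follows the `name` / `description` / `inputSchema` shape of an MCP tool definition, but the tool itself is invented. The `pick` helper is a simplified stand-in for the jq filtering Alex mentions: it is plain Python over dotted paths, not jq, and the episode does not show how Stainless wires jq in.

```python
from typing import Any

# Hypothetical tool: specific name, short description, only three parameters.
REFUND_TOOL = {
    "name": "refund_customer_purchase",
    "description": "Find one customer's purchase by name and product description, then refund it.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "customer_name": {"type": "string", "description": "Name on the order"},
            "product_description": {"type": "string", "description": "What was bought"},
            "fields": {
                "type": "array",
                "items": {"type": "string"},
                "description": "Dotted paths to keep in the response",
            },
        },
        "required": ["customer_name", "product_description"],
    },
}


def pick(record: dict[str, Any], paths: list[str]) -> dict[str, Any]:
    """Keep only the requested dotted paths so the model sees less data."""
    result: dict[str, Any] = {}
    for path in paths:
        value: Any = record
        for key in path.split("."):
            value = value[key]
        result[path] = value
    return result


full_response = {
    "id": "re_123",
    "status": "succeeded",
    "amount": 500,
    "customer": {"name": "Dan", "email": "dan@example.com"},
    "metadata": {"internal_note": "long text the model does not need"},
}
trimmed = pick(full_response, ["id", "status", "customer.name"])
```

Letting the model name the fields it wants sidesteps the problem raised above of not knowing in advance which parts of the response matter.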
[23:40]
A
Doesn't this mean that, like, MCP just needs another level, like a search tool function? Search tools, like, find a list of relevant tools given my task.
[23:48]
B
The tool browsing problem is definitely one very serious one. And that is one approach. And so we actually do this at Stainless today, where you can get an MCP server for your API that just has, like I was saying earlier, the very simple thing of every endpoint is exposed as a tool. And if you have a small API that works great and you can also filter it out so you expose an MCP server with only a small subset of your, of your endpoints, that works great. You can also use kind of what we call dynamic mode, where there's three tools, no matter how big your API is. One is, you know, list endpoints, the other is get endpoint and learn about it. And then the last one is execute endpoint. And so that enables this context thing to scale really well, but it means there's three turns of the model just to do one thing. And so that, that gets slower. It's, it's more expensive in another sense and there's some lossiness. It performs pretty well usually, but not, not quite as well because the, the tools aren't loaded up in quite the same way.
[25:03]
A
Are you using MCP servers yourself?
[25:06]
B
Yeah, I use, I use MCP to actually, funnily enough, not so much on the coding side, but I use it on the business side. So I'll use like the Notion, HubSpot, and Gong MCP servers, and an MCP server for, for our database, a read-only copy of our database, and say, hey, what are the interesting customers that signed up for Stainless last week? And it'll go off and make a great query of our Postgres database and then it can cross reference those things in HubSpot and then look up our notes in Notion, maybe even look at transcripts in Gong and tell me all about it. It's incredible.
[25:49]
A
And so that's one of your big use cases. Are you doing that every week, or how are you doing it? Now I'm interested not even from an MCP perspective, but for anyone running a business that has some complexity, where you're like, I want to know what's going on in the business. What are you actually doing, what is the report that comes out, how often are you doing that, and all that kind of stuff? Tell me, so I can steal it.
[27:27]
B
Yeah, for me, it's still usually in kind of playing-around mode. One of the things is the MCP servers disconnect and then I get annoyed. You have to just reconnect and whatever. It's not a huge deal, but there are a lot of little paper cuts still, as you'd expect in a technology this new, that can hold back some amount of your usage. One of the things that I found really helpful, kind of at the meta level, and I'm sure you've had other guests talk about this, is the practice of collecting notes for the AI, by the AI, edited and curated by yourself. I think I have a notes folder or research folder or something like that in a special Git repo that I use just for this sort of internal stuff. And I'm like, hey, when you find interesting customer quotes, put them in this folder and give the full citation, so that the next time I start asking interesting questions, it doesn't have to go searching through the MCP servers again. It has them cached, just on disk in Markdown files.
[28:42]
A
Wait, that's crazy. Wait, so how do you get it in there? What are you using to write into that Git repo? Is it Claude Code? Are you using ChatGPT? How does it get in there?
[28:53]
B
Yeah, I use Claude Code these days for that kind of thing.
[28:56]
A
And so you just have Claude Code open and running, and then a new customer testimonial comes in and you're just like, hey, can you throw this in my master company Git knowledge repository, basically. And then whenever you need anything later, you're like, Claude, go search through my master repository to figure out where the best customer quote is for this.
[29:20]
B
Totally.
[29:21]
A
That's fucking so cool. Can I, can we see it?
[29:26]
B
No, it's too messy and probably has a lot of confidential information. Uh, the latter being more, more important.
[29:33]
A
Um, is it, when you say it's messy, like are you having Claude organize it at all or like how is it structured?
[29:38]
B
There's a lot that I want us to do here that we haven't had the chance to do yet. There's some other lower-hanging fruit that I'm working through, that our business team is working through right now, just on the basics of your CRM systems and so on. So it's not well structured now, but I think that's fine. I don't plan to prioritize structuring it super well until we're using it more broadly, because I use this stuff some of the time, one of the business people on the team uses it a fair amount, and I think one or two of our customer support engineers use this stuff a lot, but it's not yet broader than that, and I would like it to get there. Once we see how everything's evolving, I think that's when we'll start bringing in more structure. But as it is, Claude Code can handle unstructured stuff really well, so you don't have to think about it too hard in advance. In my view, you can move things around later.
[30:46]
A
What else do you have in there other than customer quotes?
[30:49]
B
SQL queries. So I'm a software developer. I don't write a lot of code these days, but I spent a lot of time doing that. And so I might say, hey, how is our month-on-month growth of XYZ metric over the last three months? I did this recently for my last board prep, and it came out with a pretty good answer right away. And I was like, wow, this is awesome. Then I looked a little bit deeper and I was like, oh, I actually want to exclude these users from this analysis, and I want to filter it this way and filter it that way. I imbued more of this business context into that SQL query, and I iterated with Claude Code to get it to be better and better for the specific kind of metric that I was looking for, the specific kind of story that I was trying to tell. Then I got it to a good place and I was like, great, let's dump this to an analysis folder or an analytics folder for future use.
[31:55]
A
And then next time you're doing your board prep, you can be like, hey, what was that query that we did last time? And it'll presumably go get it.
[32:01]
B
Yeah, that's really cool.
[32:03]
A
What else?
[32:05]
B
As any software team is these days, we're also using this for, hey, a customer comes in with a question, can Claude Code just fix it? So in some cases a Linear ticket is filed. Our support engineers are really very technical, but they may not have the wall-clock time to chase down the fix themselves for an incoming bug. They have the technical skill, but guess what, another customer writes in two minutes later, and they want to jump on that. They don't want to be knee-deep in a debugger. So something that we do sometimes is they'll file the ticket in any case, and by default maybe they intend to do it later, or some other engineer is going to be doing it later. But hey, can we see if Claude Code can just take a crack at it? Is that going to work out 100% of the time? Definitely not. Is that going to work out 50% of the time? Still no, to be honest with you. But can that improve the overall efficiency? Yeah, maybe. We're still, I would say, experimental there, but we're seeing a lot of promise.
[33:29]
A
That's really interesting. Okay, well, I know you also, you know, in our, in our pre production call you were talking about you have a big vision for the future of AI. Do you want to, do you want to talk, talk me through that?
[33:42]
B
Yeah, I would love to. We talked earlier about how agentic AI can make operators' lives a lot easier by taking certain pedestrian tasks and running with them independently. That's something that I think as an industry we're almost on the cusp of. And you ask how you get there, and you also start asking about the steps beyond that. A big part of the way I see things unfolding from here, I like to say, is that the future of AI is cyborgs, which is sort of extra ridiculous, because what is a cyborg other than already a robot? But cyborg, as I understand it, is a term that means you're part person and part machine. In this case, I mean that when you go and talk to an agent, what you're going to be getting is part GPT, neural net, LLM, part AI, and part code, where the machine, quote unquote, that I'm talking about is traditional CPU, not GPU, software. I expect this to play out in two main ways. One is your one-off operational use cases, like we were talking about a minute ago, and the other is production software. In the use case we were talking about a minute ago, someone needs to perform some tricky one-off action with a bunch of points and clicks, and now you want an AI to just do a bunch of tool calls. The way I actually see that happening, and what we're building towards, is code execution. So rather than the model having a bajillion tools, the model has two tools. One is to execute code, where it just has a text box of, hey, put in some TypeScript, and you're going to use this API's TypeScript SDK, and you're just going to write stripe.transactions.list, or stripe.charges.list, and stripe.customers.retrieve and stripe.refunds.create. This is really easy for models. They're really good at writing code. 
And if you give that tool a little bit of a readme, where you say here's an example request and here are some other resources, some other API calls that you can make, it's really good at extrapolating from patterns, if the SDK and the API are well formed and predictable. Then you give it an additional tool to search the docs and ask questions of the docs, so for anything it's not sure about or gets wrong on the first try, you give it the documentation. What this does for that scenario we were talking about earlier is you have very, very limited impact on the context window up front. We're talking about 1,000 tokens or something like that, maybe less. And the context impact of doing a whole bunch of paginated list requests is zero. The model will go look for somebody named Dan, and it'll double-check that the purchase is stripey socks. It might write three nested for loops. But then only at the end, when it has found the right thing, it'll console.log: found Dan, customer ID blah blah blah, transaction ID blah blah blah, and then created refund, refund ID 123. And the context hit coming back from all of this is going to be like 10 lines of text. It's really minimal. And all of this will run really, really quickly too. So you don't have a round trip to the model every time you're doing something like this. It's just CPU code, and it runs on a server in the cloud right next to the Stripe API, in AWS somewhere probably. And it goes super, super fast.
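To make the "three nested for loops, ten lines back" point concrete, here is a sketch of the kind of script a model might submit to a code execution tool. It is written in Python rather than the TypeScript mentioned in the conversation, and the data, the `paginate` helper, and the refund ID format are hypothetical stand-ins, not the real Stripe SDK.

```python
from typing import Iterator

# Hypothetical in-memory stand-in for an API; not the real Stripe SDK.
CUSTOMERS = [
    {"id": "cus_1", "name": "Alice"},
    {"id": "cus_2", "name": "Dan"},
    {"id": "cus_3", "name": "Dan"},
]
CHARGES = {
    "cus_1": [{"id": "ch_10", "description": "plain socks"}],
    "cus_2": [{"id": "ch_20", "description": "plain socks"}],
    "cus_3": [{"id": "ch_30", "description": "stripey socks"}],
}


def paginate(items: list[dict], page_size: int = 2) -> Iterator[list[dict]]:
    """Yield fixed-size pages, mimicking a paginated list endpoint."""
    for start in range(0, len(items), page_size):
        yield items[start:start + page_size]


def find_and_refund(name: str, product: str) -> list[str]:
    """Scan every page inside the sandbox; return only the few lines the model needs."""
    log: list[str] = []
    for page in paginate(CUSTOMERS):
        for customer in page:
            if customer["name"] != name:
                continue
            for charge in CHARGES.get(customer["id"], []):
                if charge["description"] == product:
                    log.append(f"found {name}: customer {customer['id']}, charge {charge['id']}")
                    # A real script would call the refund endpoint here.
                    log.append(f"created refund re_for_{charge['id']}")
    return log


output = find_and_refund("Dan", "stripey socks")
print("\n".join(output))
```

All the paging and filtering happens in ordinary code; only the two short log lines would travel back into the model's context.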
[38:02]
A
Okay, so what I am understanding you saying is like the language model has a tool where it can write code and send that code to this tool that whoever the company is, whether it's Stripe or whatever, whoever's MCP server you're using, they'll go and execute that code and that code is going to interact with their API and then return the results. Rather than these sort of, you have 50 different possible tool calls and you know, all that stuff. It's just model writes API code and API provider executes that code, runs it on their API and returns the results. Why wouldn't I just, why wouldn't my model just like write the code that I then run myself instead of relying on an API provider to do it?
[38:50]
B
I expect that that will happen a lot more. I expect that the code execution tool is going to become the most widely used tool. One of the problems that we have today is that the code execution tool doesn't work so well with libraries. LLMs have a hard time working with a library: knowing exactly what version of the library they're using, using the right version, which is usually the latest version, not hallucinating aspects of the API, and knowing how to iterate if they hallucinate something wrong. And if a model can't use any library off npm or the Python Package Index or anything like that really well, basically perfectly, out of the box, then okay, forget about using a library. At that point you just have to hit the raw HTTP API, and in order to figure out what's in there, you need the whole OpenAPI spec, and you're back at square one, because that document is massive. Furthermore, something that's really scary about that is if you don't have a typed library, with static typing where the computer can say what you're trying to do is wrong, then the LLM will try to make an API request that is wrong some percentage of the time. The code execution tool can run a type checker and say, oh, you're asking about stripe.transactions.list, but that actually doesn't exist. Stripe doesn't have a transactions API. You might want payment intents, you might want orders, you might want balance transactions. Which one do you want? And if the API provider is doing a great job building this tool, it'll return the documentation for all of these things inline. It might have its own AI look at what the model's trying to do and come up with a suggestion. And that sub-agent is well trained, specified, always updating, and isn't burdened with the context of the full conversation.
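The "that doesn't exist, did you mean..." feedback can be illustrated with a small sketch. Note the swap: this is a runtime lookup against a hypothetical table of resources and methods, standing in for what a static type checker would report; the `API_SURFACE` contents and `check_call` are made up for illustration.

```python
# Hypothetical description of an SDK's surface: resource name -> available methods.
API_SURFACE = {
    "payment_intents": {"list", "retrieve", "create"},
    "balance_transactions": {"list", "retrieve"},
    "customers": {"list", "retrieve"},
    "refunds": {"create"},
}


def check_call(resource: str, method: str) -> str:
    """Reject calls to things that do not exist and point at likely alternatives."""
    if resource not in API_SURFACE:
        # Prefer resources whose name contains the unknown one; else list everything.
        related = sorted(r for r in API_SURFACE if resource in r)
        hint = related or sorted(API_SURFACE)
        return f"error: no resource '{resource}'; did you mean one of {hint}?"
    if method not in API_SURFACE[resource]:
        available = sorted(API_SURFACE[resource])
        return f"error: '{resource}' has no method '{method}'; available: {available}"
    return "ok"


print(check_call("transactions", "list"))
print(check_call("customers", "list"))
```

The useful property is that the error arrives before any request is sent, and it carries enough information for the model to correct itself on the next attempt.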
[40:54]
A
What do you think of the security model?
[40:56]
B
The security model is really, really interesting. This is another area where we're really starting to think about things at Stainless, and I'm getting really excited about it. So if any listeners are really interested in this and have some ideas or want to talk, please do reach out. At the end of the day, I think the security has to take place at the API layer itself. Right now you see people trying to implement security by limiting what's exposed through MCP, and that kind of makes sense. But at the end of the day, you could do anything that's in the API under the hood, right? What people should be doing is using OAuth with granular permissions, with proper scopes. At that point the security happens in the right place, which is at the API layer. There are limitations to OAuth scopes, and it's pretty hard to build, so it'd be nice if someone made that easy. But in my view, that direction is the right layer.
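As a rough illustration of enforcing permissions at the API layer rather than by hiding tools, the sketch below checks a token's granted scopes against the scope each operation requires. The scope names, paths, and the `authorize` function are assumptions for illustration, not any particular provider's scheme.

```python
# Hypothetical mapping from API operation to the OAuth scope it requires.
REQUIRED_SCOPE = {
    ("GET", "/customers"): "customers:read",
    ("POST", "/refunds"): "refunds:write",
}


def authorize(method: str, path: str, granted_scopes: set[str]) -> bool:
    """Allow the request only if the token carries the scope for this operation."""
    required = REQUIRED_SCOPE.get((method, path))
    if required is None:
        return False  # deny anything not explicitly mapped
    return required in granted_scopes


read_only_token = {"customers:read"}
print(authorize("GET", "/customers", read_only_token))  # True
print(authorize("POST", "/refunds", read_only_token))   # False
```

Because the check lives in front of the API itself, it holds no matter which tool, MCP server, or generated script made the request.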
[42:06]
A
So going back to my earlier question, I'm thinking about the idea of having a model write code that the API provider then executes to interact with their API and return the results. Would you ever consider just creating a tool-use tool that developers use? For example, I'm thinking about Cora. It's got all these tools, and maybe Gmail is going to build a code-use thing or whatever. I would probably use what you're talking about inside of Cora, but we would need a tool-use tool. Or it's not a tool-use tool, it's a computer-use tool. I know OpenAI has this, but it's not really well built for lots of libraries and stuff. It's not a custom environment. I need a computer-use tool where I control the environment, where I can install different libraries in it and call it at any time to then call any API. It has to have network access, basically.
[43:13]
B
Yep.
[43:14]
A
You guys should build that.
[43:15]
B
We're working on it.
[43:17]
A
Fuck yeah. Are you building it for developers who want to access MCP servers, or for people who are providing MCP servers?
[43:25]
B
We're starting with people who are providing MCP servers, but ultimately I think we're going to need this to work such that you can give the model a code execution environment where it can hit not only the Stripe integration, but also the Salesforce integration and also anything else, but not too much anything else. Right? One of the advantages of starting where we're starting, with just one API provider, is that you ensure that there are no network connections allowed out of the sandbox where we're running the code to anything other than, in this case, api.stripe.com. That's really critical for security for something like this. There are ways to expand that bit by bit and keep things secure. It'll take some time. The other thing to point out, as you see some of these generalizations, is that it's not just that you want this code execution sandbox to work really well for any API, for any library, which I think we really do need. You also start to see that this is just a powerful model for AI doing stuff. And sometimes you realize that the thing the AI did this one time, in this one-off case, is actually enduringly useful. Maybe anytime a customer writes in to support and says, hey, my socks had holes in them, they should automatically get a refund. Maybe you want that, maybe you don't. But there's a lot of stuff that people do one time, and then two times, and then three times, and then they say, okay, we should automate this. Right? That's what software teams do all day, every day. And I think we're also going to be seeing that with AI. 
That same code search tool that we're talking about, all the same prompting that will make an AI really good at interacting with an API in one of these code sandboxes, kind of like almost, quote unquote, in its brain, where it can write code in its head, run the code in its head, see the results, and then move forward with your query, with your task: it should be able to say, okay, actually this is enduringly useful code. Let me commit this to the repo.
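The single-host sandbox restriction mentioned in this answer can be sketched as a simple allowlist check. The host name is assumed to be api.stripe.com, and the `egress_allowed` function is hypothetical; in practice this kind of rule would be enforced at the network layer of the sandbox, not only in application code.

```python
from urllib.parse import urlsplit

# Assumed policy: the sandbox may reach exactly one API host.
ALLOWED_HOSTS = {"api.stripe.com"}


def egress_allowed(url: str) -> bool:
    """Permit only HTTPS requests whose hostname is on the allowlist."""
    parts = urlsplit(url)
    return parts.scheme == "https" and parts.hostname in ALLOWED_HOSTS


print(egress_allowed("https://api.stripe.com/v1/refunds"))       # True
print(egress_allowed("https://api.stripe.com.evil.example/v1"))  # False
```

Expanding "bit by bit" would then mean adding hosts to the allowlist one integration at a time, rather than opening general network access.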
[45:37]
A
Yeah, yeah. It's like, chat is a really good interface for exploring, but sometimes you just want a dashboard. I just want to log into my Stripe dashboard and see all the stuff without having to ask, what is my MRR? It should just show up, because I do that every day. But I want to push you, as a hashtag value-add investor. I think there's this thing that happens in AI where often, with the first attempt at something like this, people try to be really cautious. And I'm sure that your customers care about you being cautious, like big enterprise customers. But the things that get adopted are often the ones that are willing to take the risk to be YOLO very early. An example is DALL-E, which was totally private for a long time. People were posting some images, but you couldn't get in. And then Stable Diffusion was just like, fuck it, anyone can use this. And that really started the whole image generation wave. Obviously Stable Diffusion sort of fumbled the bag, but they had a lead for a little while. Same thing for Claude Code. Codex is not like this as much anymore, but if you look at the difference between Codex CLI and Claude Code, Claude Code was just like, fuck it, YOLO mode. It's super industrious. It has a sandbox, but you can just do dangerously skip permissions. And Codex fell way behind, because first it was in the browser, so the whole thing was locked down. And then it was in the CLI, but it was really built for pair programming, so it just wasn't particularly industrious. It wouldn't go off and do a bunch of stuff. It would get locked out of doing certain things even if you used full auto mode. 
And now they've caught up, because they're like, yeah, you can just let it do whatever you want. So I would really push you on this. There might be a version that you could do today or tomorrow or very soon for individual developers that would let them set up this environment, which, for example, I would use immediately. I care about security, but I care a lot less than some gigantic enterprise company. And I think the people like me who are building at this scale are eventually, hopefully, going to be the big companies. But we're the ones that are really doing the AI-first adoption, not the big companies.
[48:01]
B
Well, I would love to get this in your hands. What are some of the APIs your team uses the most?
[48:07]
A
We have a bunch of different products, but I'm thinking right now about Cora, the email assistant.
[48:12]
B
And
[48:14]
A
it has all of the big APIs that it's using, but it's mostly the Gmail API. You're interacting with the assistant over chat, and it has a list of tools like archive email or draft email or send email or whatever. There's a whole categorize tool, so it categorizes your mail in certain ways. I think we would definitely try out something like this, because if it ran the same way, it would make it much more flexible for us to make more tools and not break old ones. It's really interesting.
[48:55]
B
I mean, in a sense, what I actually predict for people who are, quote unquote, building tools, once we have a code execution super tool like I'm talking about, is that the only way you really build a tool is with instructions, with prompts. The full power of everything you could possibly do in the API, in the Gmail API for example, is all there in one tool. But sometimes you have specific tasks or specific categories of work that you want to describe in a particular way to help the LLM perform a sequence of actions as productively as possible. At that point, the only engineering work you have to do is prompt engineering. We'll see if it's that, quote unquote, easy. As we all know, prompt engineering can be really tricky. It's hard. But I think that's part of the vision. That being said, we do have some pretty nifty ways, with the MCP servers that we generate today, to help developers mix and match all the different parts of the API as they compose and write their own tools.
[50:10]
A
This is awesome. So for people who are listening and want to know more from you and know more from Stainless, where should they find you?
[50:18]
B
Stainless.com. That's our website.
[50:22]
A
Awesome. Or at least visit stainless.com. Alex, great to have you on. I can't wait to do more of this when you have some of these new things launched. This is really, really fun, and yeah, great to chat.
[50:33]
B
Thanks, Dan. You too.