
Anthropic's David Soria Parra — who created MCP (Model Context Protocol) along with Justin Spahr-Summers — sits down with a16z's Yoko Li to discuss the project's inception, exciting use cases for connecting LLMs to external sources, and what's coming next for the project.
Yoko Li
I have to ask the question here. How do you define an agent?
David Soria Parra
I'm not going to get into that. What do you think an agent is? How do you define it?
Yoko Li
I think it's a multi-step LLM reasoning chain. It's very simple for me.
David Soria Parra
Okay, yeah, I can't get behind that. For me, "agent" is more about this word "agency": something that does some form of autonomous orchestration, autonomous task solving. Usually anything that's a multi-step thing is, for me, already an agent. The moment it does two steps and reacts to the first step, it's basically an agent, because it now has some agency over what it's doing.
Podcast Host
Welcome back to the a16z AI podcast. It's been a while, but here we are again with another great discussion about the fast-moving AI space. This time it's MCP, or Model Context Protocol, which has been a major topic of conversation this year as it aims to open up new LLM use cases and agentic behaviors by connecting models to any number of new tools, datasets, and external applications. And here to talk about it are a16z infra partner Yoko Li and Anthropic's David Soria Parra, who created MCP along with his colleague Justin Spahr-Summers. Among other topics, Yoko and David discuss the MCP origin story, early and popular use cases, important work still to be done (for example, around authentication), and what the right level of abstraction is for carrying out certain types of workflows. It's an insightful and timely conversation that you'll hear after these disclosures. As a reminder, please note that the content here is for informational purposes only, should not be taken as legal, business, tax or investment advice, or be used to evaluate any investment or security, and is not directed at any investors or potential investors in any a16z fund. For more details, please see a16z.com/disclosures.
David Soria Parra
So MCP is, first and foremost, an open protocol, and that alone doesn't say much yet. What it really tries to do is enable building AI applications in such a way that they can be extended by everyone who is not part of the original development team, through these MCP servers, and really bring the workflows you care about, the things you want to do, to these AI applications. For that, it's a protocol that just defines how whatever you're building as a developer, that integration piece, and the AI application talk to each other. That's really what it is. It's a very boring specification. But what it enables, hopefully, at least in my best-case scenario, is something that looks like the current API ecosystem, but for LLM interactions, with some form of context providers or agents in any form or shape.
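What that boring specification boils down to on the wire is JSON-RPC 2.0. A minimal sketch of a client asking a server what tools it offers (the `tools/list` method comes from the published MCP spec; the `send_email` tool and its schema are invented for illustration):

```python
import json

# A client asking an MCP server what tools it offers (JSON-RPC 2.0).
list_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/list",
}

# A typical server reply: each tool carries a name, a description, and
# a JSON Schema describing its inputs. "send_email" is a made-up example.
list_response = {
    "jsonrpc": "2.0",
    "id": 1,  # echoes the request id
    "result": {
        "tools": [
            {
                "name": "send_email",
                "description": "Send an email on the user's behalf",
                "inputSchema": {
                    "type": "object",
                    "properties": {
                        "to": {"type": "string"},
                        "body": {"type": "string"},
                    },
                    "required": ["to", "body"],
                },
            }
        ]
    },
}

# Both sides just serialize these over whatever transport they share.
wire = json.dumps(list_request)
print(wire)
```

Nothing here is specific to any one model vendor, which is exactly the point: the AI application and the integration only have to agree on these message shapes.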
Yoko Li
Yeah, I really love the analogy with the API ecosystem, because it gives people a mental model of how the ecosystem evolves. When APIs first came out, they felt like an abstraction on top of a set of things you can do on different servers and services. Before, you might need a different spec to query Salesforce versus query HubSpot; now you can use similarly defined API schemas to do that. Not exactly the same, because everyone defines query parameters differently. And when I saw MCP earlier in the year, when I was building something with it, it was very interesting: it almost felt like a standard interface for the agent to interface with LLMs. What is the set of things the agent wants to execute that it has never seen before? What kind of context does it need to make these things happen? When I tried it out, it was just super powerful, and I no longer have to build one tool per client. I can now build just one MCP server, for example for sending emails, and use it for everything: in Cursor, in Claude Desktop, in Goose. So I'm curious, what's the behind-the-scenes story? What inspired you when you first realized, oh, we need a protocol for this, and how did you create it?
David Soria Parra
Thank you, that's an interesting question. With all of these types of ideas, I think they never happen in a vacuum. I joined Anthropic about a year ago, pretty much exactly a year ago actually, and I was working mostly on how we can use Claude more internally to accelerate ourselves. As part of that, one of the original ideas I was thinking through is: I cannot be the person who builds everyone's specific workflow for them. I need to enable them to build for themselves, because they know best what they need and how their workflow, the agentic bits they want to build, fit into the system and ecosystem they're working in. So that was one aspect. The second aspect was that at the time I was using Claude Desktop, which was amazing with its artifacts that really let you visualize things, but it had this limitation that there was basically no interaction with anything outside of the text box. You couldn't add Google files yet or anything like that. And at the same time I was using a code editor, which was amazing because it had access to all my code and all these cool things, but it couldn't visualize anything as nicely as Claude Desktop. I was very frustrated by copying things from Claude Desktop back into the editor and back and forth. I thought: there needs to be a better way across these two applications. And if you take these two things together: I need some way of enabling people to build something, so some form of API, but at the same time I want this to work across multiple applications, like a code editor (for me that was the Zed code editor, which I really like) and Claude Desktop, which is obviously my favorite desktop application.
You look at how to solve this classic M-times-N problem: I have M clients and I need N providers. And the answer is a protocol. It's always been protocols for these types of things, and there are many patterns in the past that match this. That's how I came to think: hey, I would really love some form of protocol that enables me to tell Claude, to tell Zed, to tell Cursor the workflow I care about and the things I miss from it, because I'm a developer, I want to build for this, and I know how to build for this. Just let me do it. That was really the origin, and that was just the idea. I took this idea to Justin Spahr-Summers, who is the co-creator of MCP with me. He took a real liking to the idea, was one of the key people to prototype the initial version and really make it work within the product side of Anthropic, and played a really big role in making this a rather big thing inside Anthropic initially. And so we basically co-created this together until we released it into the open in November 2024.
Yoko Li
I love that. I love the creative partnership here. With a framework or protocol, I have to ask, it's kind of a chicken-and-egg problem: do you create a concrete instance you can implement with the protocol first, or do you have the protocol in mind first? And if you created a concrete instance or example of it, what was the first MCP server or client you created?
David Soria Parra
Yeah, that's a very good observation. It's a very classic chicken-and-egg problem. The way we usually do this internally (and Justin is amazing at this) is very rapid prototyping. So we had a very intense few weeks of writing prototypes: very simple things that just demo it, for the most part initially. One of the first ones we wrote was the Puppeteer server, the ability to control a Chrome instance. One of the reasons you do this is because it's a very active process. There's something happening on the screen, and it makes people go "wow," which is the effect you want: how do I convince people that there's a lot of possibility here? Hey, I can control your browser, I can do things you couldn't do before, and Claude is the one doing it, not you doing it manually. While we were doing this, we were refining the concept, and we spent a lot of time discussing, I wouldn't say fighting, but definitely having an interesting discourse about which primitives to put in and leave out. There were a lot of changes to the way things worked in the first few weeks. The first MCP client, I think Justin wrote it into Claude Desktop and I wrote it into Zed, so that happened in parallel. And then the real use-case MCP server we had for ourselves was one of these very boring ones, maybe a GitHub integration or something like that to help me do my work better, or a Postgres server. Nothing super fun, nothing super creative. Just the most obvious thing you would want to do.
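A toy version of one of those "boring" servers can be sketched with nothing but the standard library: a dispatcher that answers `tools/list` and `tools/call`, plus a loop that would speak newline-delimited JSON-RPC over stdio. This is a sketch of the shape, not Anthropic's actual SDKs, and the `query_db` tool is hypothetical:

```python
import json
import sys

# Hypothetical tool table: name -> (description, handler).
TOOLS = {
    "query_db": ("Run a read-only SQL query", lambda args: f"rows for: {args['sql']}"),
}

def handle(request: dict) -> dict:
    """Dispatch a single JSON-RPC request to a JSON-RPC response."""
    rid, method = request["id"], request["method"]
    if method == "tools/list":
        result = {"tools": [{"name": n, "description": d} for n, (d, _) in TOOLS.items()]}
    elif method == "tools/call":
        params = request["params"]
        _, fn = TOOLS[params["name"]]
        result = {"content": [{"type": "text", "text": fn(params["arguments"])}]}
    else:
        return {"jsonrpc": "2.0", "id": rid,
                "error": {"code": -32601, "message": "unknown method"}}
    return {"jsonrpc": "2.0", "id": rid, "result": result}

def serve(stdin=sys.stdin, stdout=sys.stdout):
    """stdio transport: read one JSON-RPC message per line, answer in kind."""
    for line in stdin:
        if line.strip():
            stdout.write(json.dumps(handle(json.loads(line))) + "\n")
            stdout.flush()

# Demo: what a client would get back for tools/list.
print(handle({"jsonrpc": "2.0", "id": 1, "method": "tools/list"}))
```

The client that spawned the process owns its lifecycle, which is one of the niceties of the stdio transport discussed later in the conversation.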
Yoko Li
I would say your example is super creative, because you can really have the agent do anything for you. Recently I've seen an example of the Ghiblification process: someone had an MCP server that controls the browser to ask models to generate Ghiblified images, so they don't have to implement API endpoints. That blew my mind.
David Soria Parra
Very cool. That is very good.
Yoko Li
Yeah. I guess outside of your initial use cases, since you've probably seen every MCP client and server out there in the community, what are some of the most interesting implementations of MCP servers or clients you've seen?
David Soria Parra
I like when people get creative. I think it's great that people build a lot of these integrations that are very sensible and quite straightforward, again, the Postgres servers, the GitHub servers, the Asana servers of the world. But what I really like is when people get creative. One of the things that made me just laugh was this person, very early on around Christmas, who hooked up Claude Desktop to their Amazon account and just had Claude buy their Christmas gifts. I always thought this was hilarious. It's so funny and so creative.
Yoko Li
How is that implemented? Does it have payments?
David Soria Parra
I forget the exact details, but I think it was some combination of Playwright or Puppeteer controlling the browser. But it was deliberately built around "here's something from Amazon that I want to buy" or "select a set of gifts." I love these types of things a lot. And I liked it when I saw your Morse code MCP server. I love these kinds of things: playful engagement with technology. Years and years ago I was a pretty active member of local hackerspaces and that kind of thing in Germany, and I love the creative ways people interact with technology and try to build things. Every time I see these kinds of combinations, they're just beautiful. And we'll talk about this a bit later, right, when people deal with synthesizers, Unity, and Blender. But then there's obviously a fun, interesting technology part to it. I thought JetBrains did a really good job with an MCP server that can control their IDE; that's a bit more of a complex setup, and I love that part. And then there are fields I didn't even think about. There's a somewhat famous YouTuber called Laurie who's a reverse engineer, and they used Claude to help with reverse-engineering some files using MCP. I thought that was kind of cool, because nobody would ever build a reverse-engineering tool first-party into their desktop app, but that person can just go and build it themselves, because of course they have the ability and the skill to do that. That's the kind of stuff I love.
Yoko Li
I just love it when a protocol unlocks the long tail, when the long tail is really long. Because, as you said, no one else will build it first-party, but now everyone can build the software for one.
David Soria Parra
Yeah, I'm actually a little bit curious: what are one or two examples you found quite funny and interesting yourself?
Yoko Li
Yeah, so there was one I built. It's actually a very practical use case. Sometimes I'm so into coding that I skip dinner, so obviously my husband will be texting me: where are you? Are you home for dinner? And then I just used an MCP server I'd recently built. Well, this is another beauty of it: with the same MCP server, you can unlock very different experiences by entering different prompts. So instead of sending him an email, I asked the Cursor agent: can you text my husband at this number and explain why we're late for dinner? Because the Cursor agent was doing most of the coding; I was just reviewing. And it texted my husband, and it's a number my husband can reply to, too. So it's a very practical use case.
David Soria Parra
That is a good use case, right?
Yoko Li
It explained, like: we got stuck here, I couldn't debug this. I felt really bad for the agent.
David Soria Parra
I love this. This is so creative. This is exactly the kind of little bit of magic that people get out of using it.
Yoko Li
Yeah. And then the Morse code example, it was so much fun to build. There was someone on Twitter who asked: I want the coding agent to notify me when it finishes the task, because sometimes it takes five, ten minutes. So I thought, what's a really funny way for it to communicate with a human? Obviously it can text you, it can play some music, but we have a lot of Philips Hue light bulbs at home. So I thought, what does it take for the agent to get access to my local network, since it's on the same network, and just control my lights? And how do you speak with the lights? Through Morse code. So I kind of picked up Morse code that week. There's a lot to debug: what's long, what's short, what's the interval? In the end, the experience is that when Cursor or Claude Desktop finishes the task, it starts a Morse code sequence of whatever it has to say, and you just need to listen to or watch it very closely. That was a lot of fun to build. And that week, our three cats at home were all freaking out because the lights kept turning on and off. Another one: since I started using MCP as a developer, I've been going back to previous projects I built just for fun and thinking about how I can rewrite them as an MCP client, so I can plug any MCP server into them. As an example, last year I built this Raspberry Pi cat narration project, using the Raspberry Pi camera to detect if my cat is jumping on the kitchen counter and then narrate what the cat is doing, or yell at the cat. I'm actually in the process of converting that agent loop into an MCP client, so it can use an ElevenLabs MCP server to actually yell at the cat. It just unlocks net-new examples like this. I just love building and playing on the side.
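The timing Yoko had to debug ("what's long, what's short, what's the interval") is standardized: a dot is one unit, a dash three; gaps are one unit within a letter, three between letters, seven between words. A sketch of turning text into the on/off schedule a smart bulb could play back (the `unit` duration is an arbitrary choice):

```python
# International Morse code for letters; timings in "units": dot=1, dash=3,
# gap within a letter=1, between letters=3, between words=7.
MORSE = {
    "A": ".-", "B": "-...", "C": "-.-.", "D": "-..", "E": ".",
    "F": "..-.", "G": "--.", "H": "....", "I": "..", "J": ".---",
    "K": "-.-", "L": ".-..", "M": "--", "N": "-.", "O": "---",
    "P": ".--.", "Q": "--.-", "R": ".-.", "S": "...", "T": "-",
    "U": "..-", "V": "...-", "W": ".--", "X": "-..-", "Y": "-.--",
    "Z": "--..",
}

def to_blinks(text: str, unit: float = 0.2) -> list:
    """Return ("on"/"off", seconds) pairs a light bulb could play back."""
    seq = []
    for wi, word in enumerate(text.upper().split()):
        if wi:
            seq.append(("off", 7 * unit))          # gap between words
        for li, letter in enumerate(word):
            if li:
                seq.append(("off", 3 * unit))      # gap between letters
            for si, symbol in enumerate(MORSE[letter]):
                if si:
                    seq.append(("off", 1 * unit))  # gap within a letter
                seq.append(("on", (1 if symbol == "." else 3) * unit))
    return seq

# "SOS" -> ... --- ...
print(to_blinks("SOS")[:5])
```

An MCP server wrapping this would only need one tool that takes the text and drives the bulb through the schedule.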
David Soria Parra
I need that version for my dog.
Yoko Li
I'll send you a Raspberry Pi later. Most LLMs nowadays are still too big to run on-device, so I still have to call Claude or some other model to make that happen. But the fact that I can now make the cat detector extensible is very interesting to me. So now, not only can I call the ElevenLabs MCP to yell at the cat, but one undersold feature of MCP I've found is that the client can chain together different tool calls. So not only can it use ElevenLabs to yell at the cat, it can also send me an email saying what the cat is doing. So I guess, speaking of underutilized protocol features: most people today are implementing MCP servers with tool calls, but we know there are so many other features to be unlocked. Curious about your thoughts here. What are some underutilized features that you feel people should start experimenting with?
David Soria Parra
Yeah, this is an interesting one, because when you're creating a specification, you have all these use cases in mind and you think about it in a very principled way, and out of that comes a set of primitives that you want people to use. And then reality hits you and people use it very differently. Obviously people use it, as you said, for tools. But there are, I think, two or three things that are quite underutilized, and I wish people would use them more, though I think there's a problem, particularly around client support initially. The one thing I really love in the protocol is a very poorly named feature called sampling, because it's quite confusing what it does when you read the name.
Yoko Li
You'd never guess what it does from the name.
David Soria Parra
Yeah. When you really think about what you're trying to do, it makes a lot of sense. What sampling is: it's a way for the MCP server to say, I want to call an LLM, but because I'm an MCP server, I don't know which LLM the client is using. I could bring my own SDK, but then I'm binding myself to that SDK. It might be an Anthropic SDK, it could be an OpenAI SDK, but now I'm expecting an OpenAI API key or a Claude API key from the user, and that's really not great. And maybe they use a different model in Cursor. So sampling is a way for the MCP server to go back to the client and ask: hey, with the currently selected model, can you give me a completion, a sample from the LLM? That's where the name comes from. Give that back to me, and that way I can build MCP servers that go and summarize a Reddit post, or summarize whatever I might want, or even have their own agentic loops, while the controller of the LLM inference is still the client. That's the really cool bit: you can build these MCP servers that are very rich, that go way beyond tool calling, and have them be completely model-independent. That's really what it's for. And we can talk later about how, if you combine them in the right way, this has a lot of cool properties. So that's one of the features I would love to see more people use. But again, it's a matter of clients not supporting this very well, or at all. I wish more clients would support it, so more people can build these richer things that go beyond just tool calling: agent loops, summarization bits, and so on and so forth. So yeah, that's one of these features.
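On the wire, sampling is a request flowing in the reverse direction: the server sends the client a `sampling/createMessage` call, and the client replies with a completion from whatever model the user has selected. A sketch of the two message shapes (the method name follows the published spec; the Reddit-summary content and the model string are invented):

```python
# Server -> client: "please run this through YOUR model for me".
sampling_request = {
    "jsonrpc": "2.0",
    "id": 7,
    "method": "sampling/createMessage",
    "params": {
        "messages": [
            {
                "role": "user",
                "content": {"type": "text",
                            "text": "Summarize this Reddit thread: ..."},
            }
        ],
        "maxTokens": 300,
    },
}

# Client -> server: the completion, plus which model actually produced it.
sampling_response = {
    "jsonrpc": "2.0",
    "id": 7,
    "result": {
        "role": "assistant",
        "content": {"type": "text", "text": "The thread argues that ..."},
        "model": "whatever-the-user-selected",
    },
}

# Note what is absent: the server holds no API key and never names a vendor;
# the client stayed in control of inference.
print(sampling_request["method"])
```

This inversion is why a sampling-based server stays model-independent: swap the model in the client and the server's code does not change.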
Yoko Li
This is so interesting. One very concrete example I've always wanted to build with the model you mentioned, with sampling, is actually a code review agent. In this case I would want to build a server that does code review, but it may want an LLM to do the completions, since it doesn't want to bring its own LLM. So it feels like a very natural jumping-off point. What does it take for clients to support this?
David Soria Parra
They just need to do it. There are obviously reasons why certain clients wouldn't want to: clients with fixed subscriptions in particular might prefer not to do this, because it suddenly becomes an API. But other than that, I think it's just a matter of client support and priorities. Obviously clients support what people do, and so they're mostly focused on tool calling. There's so much going on in the spec that needs to be added, and the heavy lifting in all of MCP land is, very deliberately, on the client side, because we expected way fewer clients than servers. We wanted to make it very trivial to build a server, so every bit of complexity that we could shift away from the server, we put on the client. As a result, it's just hard to build a really good, fully spec-compatible MCP client, whereas it's very trivial to use any feature you want on the MCP server side. So clients are just a little bit behind, and it will probably take time. For some of them it might just not make sense, given the way they deal with inference in general. But at the end of the day, it's just a matter of waiting and seeing some clients implement it. That's the end of it, right?
Yoko Li
Sampling is such an interesting concept. At least when I first saw it, I thought, oh, this is so powerful, because the divide between client and server is less a physical one and more a logical one. So technically you could write a server that does sampling against another process that's both a client and a server. I know it sounds complex when you describe it, but can you give us an example of how to best use this kind of chained server-client combo? And how does that relate to sampling?
David Soria Parra
Yeah, I think you're alluding to a very interesting piece. Interestingly enough, we built prototypes of what you describe very early in the process, before we even released it to the public. What you're describing is: you take an application that is an MCP server, exposing tools to an MCP client, but within that MCP server you are also using an MCP client, so you can use other MCP servers downstream. So you have this little program which is an MCP client and an MCP server at the same time. I think about these as upstream and downstream connections. And now you can chain these things indefinitely. It's probably not very practical to chain them indefinitely, but you can definitely think about a few chains, and you can even go as far as creating whole graphs out of this. You can very quickly envision worlds where there's an MCP server with an agentic loop that orchestrates two or three other MCP servers and their tools in a really good agentic loop. Then you can take this entity made out of three or four servers and give it to a client like Cursor. I think that's a very interesting concept that feels very agentic, particularly if you then use additional primitives that go beyond tool calling, such as resources or prompts, which are additional data streams, basically, that MCP servers can expose upwards and downwards. Then you can model quite rich interactions. I would love to see people play around with more of that: use an AI framework like Pydantic AI or LangChain, whatever, to build a connection of client upwards, client downwards, server upwards, chain these things, and see what happens. Then you're suddenly free, and you can go to a user and say: hey, which five MCP servers do you want this agent to control?
You might have a very general agent loop, and people can go and experiment, and suddenly they have, you know, cat monitoring software connected to an agent that also speaks email, WhatsApp, whatever it might be.
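That upstream/downstream idea can be illustrated with a toy node that acts as a server to whoever calls it and as a client of the servers below it, namespacing and merging their tools. This is a conceptual sketch, not the real SDK; the cat-cam and email servers are hypothetical:

```python
class Node:
    """An MCP-ish node: a server upwards, a client downwards."""

    def __init__(self, name, own_tools, downstream=()):
        self.name = name
        self.own_tools = own_tools          # tool name -> handler
        self.downstream = list(downstream)  # servers this node is a client of

    def list_tools(self):
        # Expose our own tools plus everything reachable below us,
        # namespaced by the child's name.
        tools = list(self.own_tools)
        for child in self.downstream:
            tools += [f"{child.name}/{t}" for t in child.list_tools()]
        return tools

    def call(self, tool, args):
        # Route "child/tool" calls downstream, handle the rest locally.
        if "/" in tool:
            child_name, rest = tool.split("/", 1)
            child = next(c for c in self.downstream if c.name == child_name)
            return child.call(rest, args)
        return self.own_tools[tool](args)

# A hypothetical chain: an orchestrator fronting a cat-cam and an email server.
cat_cam = Node("catcam", {"snapshot": lambda a: "cat on counter"})
email = Node("email", {"send": lambda a: f"sent: {a['body']}"})
agent = Node("agent", {}, downstream=[cat_cam, email])

print(agent.list_tools())  # -> ['catcam/snapshot', 'email/send']
```

Chains and graphs fall out for free: any `Node` can sit in another node's `downstream` list, which is the "entity out of three or four servers" handed to a client like Cursor.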
Yoko Li
Right.
David Soria Parra
And as you mentioned before, there's a lot of power in using LLMs for these orchestration tasks. And so you can build these complex systems, these complex agent graphs using that technique you described quite quickly.
Yoko Li
Yeah. Since you also mentioned resources and prompts, which are the other two very powerful and underutilized functionalities in the spec today, I really think these are the sleeper hits of MCP. Could you briefly explain how a developer leverages resources, and what prompts are as a concept?
David Soria Parra
Yeah, happy to do that. One of the things to understand when we think about MCP is that MCP is focused on how the primitive you're exposing interacts with the other side, usually the user, but it could be an agent. Prompts are meant to be driven by the user: for example, the user explicitly adds one to the context of a call. So prompts are templates that people can insert. The interesting bit is that on one hand they can be very static templates, say an example of how to use this MCP server, but they can also be very dynamic. They can just as well be API calls under the hood. We had, for example, an MCP server that exposed prompts which download a stack trace from something like the Sentry API. So that goes into the prompt, but I, as the human on the other side, say: I want this in context. I don't let the model decide; I decide. That's the difference between a prompt and, for example, a tool. Resources, on the other hand, are quite unique, because resources are just blobs of data. They can, for example, be very easily used to model something like a file system toward the MCP client. And in this interaction model I described, with tools being model-driven and prompts being user-driven, resources sit in between by being application-driven, whatever that might mean. So an application, for example Cursor, could choose to let a resource be added to an agent, similar to how you can add a file to an agent. But it could also, for example, ingest a resource into a RAG system first and do retrieval, because these resources can be arbitrarily long. One of the things we thought about very early on in MCP is: do you actually need to build something for retrieval into this? And we came to the conclusion: hey, if the client controls the retrieval bit, resources can just go into that retrieval system.
It can be used that way. And if you wanted to do it on the server side, you would use a tool. So those are distinctions that I think people haven't really caught on to yet. These things are also fairly rich: both tools and resources can be audio in the new spec, they can be images.
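The three primitives end up as three different method families on the wire. A sketch of the request a client would send for each, annotated with who drives the interaction (the method names follow the spec; the tool, prompt, and resource URI are invented examples):

```python
# Model-driven: the LLM decides to invoke a tool.
call_tool = {
    "jsonrpc": "2.0", "id": 1, "method": "tools/call",
    "params": {"name": "query_db", "arguments": {"sql": "SELECT 1"}},
}

# User-driven: the human picks a prompt template to insert into context;
# under the hood the server may hit an API (e.g. fetch a stack trace).
get_prompt = {
    "jsonrpc": "2.0", "id": 2, "method": "prompts/get",
    "params": {"name": "debug_stacktrace", "arguments": {"issue": "PROJ-123"}},
}

# Application-driven: the client decides what to do with the blob of data,
# attach it to context directly or ingest it into its own RAG index first.
read_resource = {
    "jsonrpc": "2.0", "id": 3, "method": "resources/read",
    "params": {"uri": "file:///var/log/app.log"},
}

for msg in (call_tool, get_prompt, read_resource):
    print(msg["method"])
```

Same envelope, three different drivers: that interaction model, rather than the payloads, is what separates tools, prompts, and resources.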
Yoko Li
Right.
David Soria Parra
So there's a lot people could do. You could expose your current screenshot as a resource, these types of things, which I think leave a lot more use cases open to explore in what MCP has to offer. But I understand people do tools, because it's the most obvious thing to do.
Yoko Li
This is such an interesting point. When I first looked at resources, it almost felt like a mindset shift. Traditionally, as a developer, I always thought resources would be on the client side: the client would expose resources and query them locally. But in this case, it's almost like the MCP server is exposing a file system that the client can query.
David Soria Parra
Yeah.
Yoko Lee
Curious about your thoughts behind how did you think of the model? How did you decide that it's going to be like a server side versus a client side thing and what does it entail for the transport layer?
David Soriapara
So for the, I think the initial model was like MCP was like, how do I provide context in these different user interaction models? And so for that, resources came quite naturally actually out of the need of like, how do I actually enable an MCP client that doesn't have access to the local file system by itself, but I want to give it access to the local file system soon. And now a bit of history. Looking back into July, August 2024, Cloud Desktop would not have access. You can upload files and these type of things. But it's not as natural to add a file system to this and similar to some agents that we might have internally. And so it felt very natural to have something like that. That was really the genesis of this of how do these servers are supposed to provide context. And so there's some of these. And now for the transport layer. MCP in the end of the day is just transport independent, which was quite important for us. So initially that date came out of the local use case where I wanted to use standard IO which has a lot of niceties of the lifecycle of the MCP server is controlled by the client automatically. There's a lot of things it can do but it also means you just, you can't really speak. You could technically speak HTTP, but really realistically you're speaking something that is like lion based and say you're speaking something like JSON rpc. And that's very heavily inspired by how the language server protocol does this, which is very, very similar. It has an interesting property that I'm somewhat ambivalent nowadays today about because it has some drawbacks and requires certain things that probably would be better in a more classic API like way on the HTTP layer, but it still enables people to go and at the same time and implement MCP over other transports. You can if you like. 
You know, I used to work at Facebook for 10 years and there you use these thrift RPC mechanisms internally and that's all these security infrastructure is built around this and you could just build MCP over this and there would be no, you know, no change required. You just do a different transport and both sides are still happy. I mean, so that's why. That's one of these reasons we chose it for that flexibility and partially also because it was an evolution from standard IO to HTTP.
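The stdio transport really is that simple: newline-delimited JSON-RPC, one message per line, with the server's lifecycle owned by the client that spawned it. A minimal framing sketch, round-tripped through an in-memory pipe:

```python
import io
import json

def write_message(stream, message: dict) -> None:
    """stdio framing: serialize to one line of JSON (no embedded newlines)."""
    stream.write(json.dumps(message) + "\n")

def read_messages(stream):
    """Yield one parsed JSON-RPC message per non-empty line."""
    for line in stream:
        if line.strip():
            yield json.loads(line)

# Round-trip two messages through an in-memory "pipe".
pipe = io.StringIO()
write_message(pipe, {"jsonrpc": "2.0", "id": 1, "method": "initialize"})
write_message(pipe, {"jsonrpc": "2.0", "id": 2, "method": "tools/list"})
pipe.seek(0)
messages = list(read_messages(pipe))
print([m["method"] for m in messages])  # -> ['initialize', 'tools/list']
```

Swapping in another transport (HTTP, or the Thrift-style RPC David mentions) means replacing only these two functions; the messages themselves don't change, which is the transport independence being described.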
Yoko Li
Yeah, that's so interesting. One of the top questions from talking to a lot of developers building on MCP is: how do I authenticate MCP, both from client to server and from server to tools? I know there are so many different ways to make it happen, and the spec is also evolving. So what are your thoughts on how auth will shape up around MCP in general?
David Soriapara
Oh, that's such an interesting and deep topic. I think the interesting bit is that everybody wants authorization. I think it's clear that the current implementation that people effectively use is local MCP server which is just give me an API key or some form of token via an environment variable. Is is usable but it's not exactly great. And particularly for the case where servers will be remote, it's impossible. And so we have an early part of the specification around authorization, which just uses OAuth. There's some caveats to that and we're working very closely with the original OAuth authors and experts in the field to make this really go well. But I think there will be. There is an initial focus on how the user authenticates, which is different potentially, but not sure yet. It's potentially different how agents will interact with each other and authenticate with each other. And for now we want to solve the user and the human server problem. And for that we would just use whatever the OAuth spec in the best possible way because it turns out when you innovate on the levels of primitives and other things, you want to stay as boring as possible for everything else. But what authorization does, of course it enables a very different set of MCP servers because it enables MCP servers that are remote, that are bound to a company account, that are really driven by a professional service offering something for you that you have a subscription to. You can envision. I think PayPal has an MCP server. For example. You can see I want to use this MCP server. I log in with my PayPal account, now I can use this MCP server and now I'm authorized and it opens like this company and corporate ecosystem that I think will be super important in our day to day lives. While at the same time, you know, MCP still retains this like a bottom up hacker mentality that it had originally from for developers. 
But the authorization piece is the key step toward this much, much richer ecosystem of professionally developed MCP servers at the end of the day.
Yoko Li
I guess when we talk about auth, there are two layers. One is authentication: do you get access to this thing at all? And then authorization: what are you scoped to get access to? It's very interesting because I see these concepts sprinkled across different layers in MCP. For example, you could scope access to certain resources, say, I can only access resources in this specific folder. And then there's also, obviously, third-party authentication from the server to all the API providers. How do you think about authentication versus authorization? And what would you want to see from auth providers in the wild? What needs the most help to make developers' lives easier?
David Soria Parra
It's a good distinction, good question. What we're focusing on at the moment is mostly authorization: am I allowed to access this resource? Because that's what people want. I have yet to see a lot of use cases for the authentication part, like which identity am I, who am I. There are other parts to that, who is acting on behalf of whom, which will be particularly important in an agent world, but I think that will come later. For now we're tackling one thing at a time, the biggest boulder in the way first, which at the moment is: how can I get access to something that is behind some form of authorization? So the focus right now is 100% on authorization, and from there we will potentially, in the future, go on to authentication, identity, and those kinds of aspects. Now, for auth providers, the thing I want, and luckily a lot of the big ones are doing this, is just to engage with us and tell us what the common denominator is that everyone has, so we can build on it and developers feel they have some safety, rather than, oh, you can only use this with this one provider. Talk to us: what are you willing to implement, and where, in this agentic world, are pieces potentially missing on the authorization side? And luckily they do this. The authorization specification development that's currently going on is driven by a combination of very engaged people on the security and identity side from Microsoft, from Okta, from AWS. The right people are already in the room, people who are in many ways far better suited to make these decisions than I am, because I'm not an identity and authorization expert. I just want experts in the field to tell me the right way to do this so we can all figure it out together.
And that's really what I want from people. Right?
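The two layers Yoko separates can be shown with a toy request handler: authentication asks "is this a caller we recognize?", authorization asks "is this caller scoped to the thing requested?". The token store and scope strings below are invented for illustration and are not the MCP authorization spec itself:

```python
# Toy sketch of authentication vs. authorization on the server side.
# Token contents and scope naming are made up for this example.
TOKENS = {"tok-abc": {"user": "yoko", "scopes": {"resources:read:/notes"}}}

def handle_tool_call(token: str, resource_path: str) -> str:
    # Authentication: do we recognize this caller at all?
    identity = TOKENS.get(token)
    if identity is None:
        return "401 unauthenticated"
    # Authorization: is this identity scoped to the requested resource?
    needed = f"resources:read:{resource_path}"
    if needed not in identity["scopes"]:
        return "403 forbidden"
    return f"200 ok for {identity['user']}"

print(handle_tool_call("tok-abc", "/notes"))  # authenticated and in scope
print(handle_tool_call("tok-abc", "/mail"))   # authenticated, out of scope
print(handle_tool_call("tok-xyz", "/notes"))  # unknown token
```

Scoping a server to a specific folder, as in Yoko's example, is exactly an authorization decision: the caller's identity checks out, but the requested resource is outside the granted scope.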
Yoko Li
Yeah, amazing. I love this community-driven development and iteration on the spec, too. Every time I check out MCP's spec repo there are hundreds of issues, so a lot of respect for how you all groom the issues day in, day out. Another topic I want to dive into, which we touched on a little, is the creative field: how does MCP work there, and what are some use cases? Today most of the clients we've seen are very developer focused, which is natural in a new technology's adoption cycle, because as developers we know how to configure things, we know how to put in a JSON blob. But recently I've started to see very cool use cases with creative tools: with Blender you can now use words to create a 3D model, there are MCP servers around a Unity instance, you can have your own synthesizer, and so on and so forth. What are some of the top creative use cases you've seen, or are most excited about and want people to build more of?
David Soria Parra
I'm actually curious about your take later, because you're a very creative person. For me, it goes back to what I love about MCP: this ability to bridge gaps between what you care about in the world and what you care about in your life. When I saw, for example, the Blender MCP server, which I think was one of the first original big ones, or the one where a person connects Claude to Ableton, I just found it so fascinating and really cool. On the one hand, I'm astonished that LLMs are actually really good at this; it's a side of LLMs you would never have seen without MCP. On the other hand, I just love the creativity of connecting these tools and actually getting something useful out of it for a creative process. And, as a creative person yourself, you know there's an aspect of control that every artist wants to have. LLMs and MCP don't give you that, but they give you a different set of interfaces to something. I think it's very interesting and creative to play around with how you can describe, for example, a 3D environment. An environment artist in Blender has probably never had the ability to really express themselves in words: how can you write a poem and have it translated into a 3D environment? That's super fun. Then, of course, you want to go back into Blender because you need control. But I think it's a great, fun exercise and bit of experimentation that helps creatives look at things in different ways, if anything.
Yoko Li
Yeah.
David Soria Parra
And then, of course, I love synthesizers. I'm a terrible musician myself, but I love them. And I love this idea where people used Claude, for example, to program patches onto physical synthesizers. It's fascinating to me that Claude can do it, but also just cool to see that people have thought about connecting the LLM to a physical thing in the world that makes a sound afterwards. So I love that part. But I'm curious what you think about this, because you're a very creative person.
Yoko Li
You know, I've been thinking a lot along the lines of what you mentioned: the input side of clients. Today the input is mostly words, so we describe what we want to see, but we know that words and actions or visuals are never one to one. It's very cool to have a starter template described by words, but the later iterations have to be dictated by the artist's choices. For example, I'm a huge user of Procreate, and in Procreate you don't describe what you want to see, you just draw it. So much of that is controlled by the latent space in my brain. My brain isn't describing what I should be drawing; that's not how my model works. It's more about controlling the muscles, deciding how to draw the curve of a line, what color looks good to me. So to some extent, I almost feel the MCP client really, severely dictates what the whole experience will be. For example, the client could send bezier curves to the server and have the server decide whether this looks good to you; that's not something we've seen very often yet. Today the input is either code or language, which is very common. But I wonder what kind of experiences we'll have if every design tool becomes an MCP client.
David Soria Parra
I don't know. I have no clue what this is going to look like, but I think it's a very interesting thought exercise.
Yoko Li
Yeah. Here comes a philosophical question on agents, based on everything we've talked about: what do you think is the ultimate communication mechanism or modality for agents? On one side we have natural language; on the other we have programming languages. Technically, we could frame every problem in the world in a programming language, if that language supports it. And then on yet another side we have input modalities like pixels, screenshots, sometimes video. Based on what you've seen of MCP server and client interaction, what would be the abstraction layer where you'd say: this is the right way, or a great way, to provide all the necessary context to agents?
David Soria Parra
Such an interesting question. I think "I don't know" is one part of the answer, but the real answer is that there's probably merit in a combination of them. Programming languages are a very good interaction pattern between agents, because there's a lot to be said for a dense, mathematical, slightly different form of syntax that is very clear about its intent and very constrained, which programming languages are. And then there's the very free form of natural language. Personally, I think natural language alone would not be good enough; a combination of them might be the right thing. But I don't really have an answer, because I feel it's a bit too early to tell, and I want to see this space explored a little more. When I look at development in this field, I feel it's too early to really tell what the right abstraction is. But things like MCP enable people to experiment with different approaches, and of course other language frameworks and a bunch of other things in the space enable experimentation too. There's a lot more experimentation to be done before we really understand what the general abstraction should look like. If you think about MCP, under the assumption that MCP sticks around, as I hope it will, MCP arrived two or three years into tool calling existing. We had already seen a lot of those interactions, so we have a somewhat general abstraction there. For agents, I think we're a bit too early to see what that's going to look like. But your observation stands: there are so many different modalities and options, and I only talked about the text side of things, while you already had pixels and other bits in there.
So I think there's so much interesting space for communication. Who knows, maybe models really like to talk about things over video streams. Maybe that's the modality we end up with: video streams everywhere, because they just like watching pictures of things.
Yoko Li
That's so interesting how these modalities blend into each other. I do a lot of random projects on the side; one of them is called AI Tamagotchi. It's basically an AI-driven, stateful Tamagotchi, so instead of just eating one thing, the Tamagotchi can request 10, 20, 50 things, whatever the LLM's state will let it do. One thing I realized is that I could use most of today's models to generate ASCII art and even ASCII animation. When I was thinking about it, it almost felt like a visual task, but a language model still generates a sequence of tokens, whereas if I give the task to, say, a diffusion model, it doesn't generate tokens, it generates pixels. So the question is: what is the better way to generate a sequence of images, or a sequence of ASCII characters, to animate something like this?
David Soria Parra
What have you found? What do you think of this?
Yoko Li
I'm actually more on the language model side today for these stateful, very predictable animation sequences. It almost felt like a modality I didn't think would work, but it did, because predicting the next token, it turns out, also works for predicting the next ASCII character.
David Soria Parra
There are a lot of things that fit transformer models and attention if you think about it; sequential things like that are probably a good match for them. Smart observation.
Yoko Li
Yeah. The funny thing is, I tried a lot of different generation tasks, and it's best at generating cats. That's because, when I searched the Internet, ASCII cats turned out to be really well represented in the data set. Which brings me, from our agent chat, to this other high-level question: when you think about the future of MCP, what do you want to solve, what do you want to keep evolving, and what do you not want to solve? It feels like a lot of things could become the MCP spec's problem. You could implement RAG, you could implement the database, you could implement anything in the world. So how do you think about it? What do you want to keep executing on, and what tasks do you feel are just not something the spec should take care of?
David Soria Parra
Yeah, that's such an interesting question. Everyone who builds a spec faces this kind of problem: you need to stick to your guns, so to speak, focus on the area you want to be good at, and not try to boil the ocean. For MCP, there are a few things. There's evolution of the current parts of MCP; I think there's a very clear path for evolution around authorization and other parts like that. And then there's potentially still a place for a bit more abstraction around agents. But that's a very low-conviction opinion so far, because, again, I need to watch this a little longer and I really want to explore the space.
Yoko Li
I want to ask the question here. How do you define an agent?
David Soria Parra
Yeah, I'm not going to get into that. What do you think is an agent?
Yoko Li
I think it's a multi-step LLM reasoning chain. It's very simple for me.
David Soria Parra
Okay, yeah, I think I can get behind that. For me, agents are more about what's in the word itself: agency. Something that does some form of autonomous orchestration or autonomous task solving. Anything multi-step is, for me, already an agent: the moment it does two steps and reacts to the first step, it's basically an agent, because it now has some agency over what it's doing. At the end of the day, that's my definition for the most part, but there are a lot of definitions of agents out there, so there's potential to think about this. I think MCP is in a somewhat good position in the sense that it allows for these graphs, and some of the graph pieces MCP inherently, indirectly enables can also be dynamic, which is a very interesting and unique part of it. So maybe there's a little bit to do around agents; I'm not fully sure yet, but it's something I'm definitely taking a look at. Beyond that, the rest at the moment is just evolution: streaming and other modalities. There are other interesting questions for MCP, like how something like it fits model types that are not purely text based. What does this look like for video, audio, images, whatever it might be? I don't know if there's a use case for this, and it doesn't have to be MCP, but it's an interesting long-term question to think about different modalities. So for the most part it's modalities and evolution, and then maybe a big question mark next to: do we need more for agents, or can agents already be formulated very well in the MCP abstraction? And again, that's back to experimentation.
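David's two-step test can be made concrete with a toy loop: act once, then let the model react to what happened. Everything here is hypothetical; `fake_llm` stands in for a real model call, and the tool and task strings are invented:

```python
# Minimal illustration of David's definition of agency: the moment a
# system takes a second step that reacts to the result of the first,
# it has some agency over what it's doing.
def fake_llm(prompt: str) -> str:
    # Stand-in for a model call: picks a follow-up based on what it saw.
    return "retry with smaller batch" if "error" in prompt else "done"

def run_agent(task: str, tool) -> list:
    steps = []
    observation = tool(task)          # step one: act on the task
    steps.append(observation)
    decision = fake_llm(observation)  # step two: react to what happened
    steps.append(decision)
    return steps

# A tool that fails on large inputs, forcing the second step to adapt.
trace = run_agent(
    "import 10k rows",
    lambda t: "error: timeout" if "10k" in t else "ok",
)
print(trace)  # ['error: timeout', 'retry with smaller batch']
```

By Yoko's definition this is a multi-step reasoning chain; by David's, the second call having seen the first call's outcome is what makes it an agent rather than a script.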
Yoko Li
That sounds like such a fun experimentation.
David Soria Parra
It is a lot of fun.
Yoko Li
Yeah. I often try to refactor my code, taking a single agent and splitting it into multiple agents, say five, by which I just mean multiple calls making decisions along the chain. Interestingly, most of the time, for the tasks I'm trying to do, which are very simple, like sending an email or pinging someone, very transactional workloads, a single agent works just fine. So I haven't really come across a use case myself that requires multi-agent collaboration on a very complex task. What's your view there? Do you feel we're going to go pretty deep on a single agent, almost as a technical detail, a single call graph with an LLM? Or do you feel it will be multiple processes working together?
David Soria Parra
For me, one of the observations is that agents are less a function of the task and more a matter of trust boundaries. If you have a travel agent that needs access to your bank, or whatever it might be, there are interesting places where trust boundaries exist, and that's where you want a protocol in between rather than everything living in the same framework. So I do think there will be some form of composability based on these trust boundaries, because you will probably, eventually, want to use whatever interface your bank gives you for agents and nothing else. There's a boundary where this needs to interact with something else, and these things will happen in the parts of the world that require a bit more trust. Beyond that, it's a bit tricky to see how things will work out. I can totally see a single agent or agent framework being quite powerful. But composability, the ability to switch things out for users who are not developers, can be very useful. There's also the question of whether there will be two or three meta-agents driving other pieces that are MCP shaped, or whether everything will be very specialized, with developers building these different agents. It's a complex question that comes back to experimentation. For my use cases at the moment, a single agent with a few interactions does everything I need, which I think is similar to what you're saying. But we're also very, very early in exploring agents, and the models are just reaching the point where these things become very powerful. So we'll see what this looks like in a year. Again, the trust boundary is an interesting bit that I look at, and then how an agent acts on behalf of another agent in these kinds of settings.
And I think there might be a protocol needed there.
Yoko Li
That's awesome. Well, last question. I've loved that MCP has been an open protocol from day one, and as a result you've amassed a huge community contributing and giving suggestions. When you think about where you need help the most in the next phase of MCP development, can you talk about where you'd want more contributors? How do people reach you, and how do people collaborate on the spec or other things related to it?
David Soria Parra
Yeah. For contributions, at the moment we run this as a very traditional open source project. What we're looking for is people maintaining and helping: writing issues, reviewing issues, reviewing PRs, writing PRs, and building trust with us as maintainers so they can hopefully help us longer term. We're looking for people who just want to be active in the community, whether that's driven by companies or by individuals; it really doesn't matter to us. A big part is just going through, say, the Python SDK issues, helping people there, reproducing some of the reported bugs to see if they're actually a problem and to get more detailed information, reviewing PRs when necessary, and, even better, writing PRs and fixing bugs. Those are great starting points. When it comes to the specification itself, the lift is a bit higher and the bar is a bit higher. There it's probably good if you either address a very specific need or write a very detailed RFC for it. It might sit there for a while; if you're a company, you might rally up some support for it and come to us together. That helps quite a bit. So it works very much like a traditional project. We're looking into governance models that are a bit more sustainable in the long run, a bit more consensus driven, and we're going to work towards that. But besides that, just come help out on the code. The specification is a bit harder to work with, but if you feel strongly, go for it as well. And build trust. We have a lot of people helping us: the Pydantic people, for example, do a great job on the Python SDK, the Microsoft people did a great job with the authorization specification, and the same goes for the Okta and AWS people. There's a lot already happening, and we have people helping us so much with the Inspector.
There are community contributors I really, highly appreciate. So just go and help and work with us; that's really what we need at the moment, for the most part.
Yoko Li
This is awesome. I really enjoyed the conversation. It's been so fun chatting about everything from Tamagotchis to cat monitoring apps to the MCP protocol and its future. Thank you so much for making the time, David, and until next time.
Podcast Host
And with that, another episode is in the books. Thanks for listening all the way through. If you enjoyed this episode, please do rate, review, and share the podcast among your friends and colleagues. And keep listening for more exciting discussions about agents and more, we promise, as well as more insightful interviews with founders and builders across the AI space.
Podcast: AI + a16z
Guests: Yoko Li (a16z Infra Partner), David Soria Parra (Anthropic, MCP Co-Creator)
Release Date: May 2, 2025
This episode explores the Model Context Protocol (MCP), an open standard designed to make large language model (LLM) applications extensible, composable, and interoperable with a wide variety of tools, datasets, and creative workflows. The conversation digs into MCP's origin story, how it unlocks “agentic” behaviors, key protocol features, developer use cases, underappreciated capabilities, and the future direction of MCP as open infrastructure.
The episode is playful, technical, and deeply collaborative. Both guests share a hackers’ delight in creative misuses and edge cases, while honestly addressing technical hurdles (e.g., client support bottlenecks, auth challenges). The spirit is one of open experimentation and empowerment—“unlocking the long tail” for everyone, from power users to professional dev teams, artists, and tinkerers.
For listeners wanting to dive deeper into building with or contributing to MCP, the episode encourages direct involvement, creative experimentation, and collaboration.