
In this episode, a16z GP Martin Casado sits down with Sherwin Wu, Head of Engineering for the OpenAI Platform, to break down how OpenAI organizes its platform across models, pricing, and infrastructure, and how it is shifting from a single general-purpose model to a portfolio of specialized systems, custom fine-tuning options, and node-based agent workflows. They get into why developers tend to stick with a trusted model family, what builds that trust, and why the industry moved past the idea of one model that can do everything. Sherwin also explains the evolution from prompt engineering to context design and how companies use OpenAI’s fine-tuning and RFT APIs to shape model behavior with their own data.
Loading summary
Sherman Wu
We want ChatGPT as a first party app. First party app is a really great way to get 800 million wows or whatever.
Martin Casado
Now a tenth of the globe, right?
Sherman Wu
Yeah, yeah. 10% of the globe uses it every week. Every week, yeah. Even within OpenAI, the thinking was that there would be like one model that rules them all. It's like definitely completely changed. It's becoming increasingly clear that there will be room for a bunch of specialized models. There will likely be a proliferation of other types of models. Companies just have giant treasure troves of data that they are sitting on. The big unlock that has happened recently is with the reinforcement, fine tuning. With that setup, we're now letting you actually run rl, which is a lot you to leverage your data way more.
Podcast Host / Narrator
OpenAI sells weapons to its own enemies. Every day, thousands of startups build on OpenAI's API, many trying to compete directly with ChatGPT. It's the ultimate platform paradox. Enable your competitors or lose the ecosystem. Sherman Wu runs this Highwire act. He Leads Engineering for OpenAI's developer platform, the API that powers half of Silicon Valley's AI ambitions. Before OpenAI, he spent six years at OpenDoor teaching machines to price houses where a single wrong prediction could cost millions. Today, Sherwin sits down with a 16Z general partner Martin Casado to explore something nobody expected. That the models themselves are becoming anti distintermediation technology. You can't abstract them away and every attempt to hide them behind software fails because users already know and care which model they're using. It's changing everything about how platforms work. Sherwin and Martim talk about why OpenAI abandoned the dream of one more model to rule them all. How they price access to intelligence, and why deterministic workflows might matter more than pure AI agents.
Martin Casado
Sherwin, thanks very much for joining. So we're being joined by Sherman Wu. It'd be great actually if you provided the long form of your background as we get into this, just for those that may not know you, I mean, I've used Sherman as one of the top AI thought leaders, so I'm really looking forward to this.
Sherman Wu
Yeah, yeah, thanks for having me. I'm really excited to be on the podcast. Yeah, so a little bit more of my background, so maybe we can start from present day and go backwards. So I currently lead the engineering team for OpenAI's developer platform. So the biggest product in there, of course, is the API.
Martin Casado
Is there more for the developer platform than the API? Just kind of assume that it's anonymous.
Sherman Wu
Well, so I also think about other things that we put into our platform side. So technically our government work is also like offering. Deploying this into different areas. Yeah, like I've talked about.
Martin Casado
Oh, like, so if you have like a local deployment.
Sherman Wu
Yeah, yeah. So we actually do have a local deployment at Los Alamos National Labs. It's super cool. I went to visit it. It's very different than what I'm used to, but. Yeah, in a classified supercompute with our model running there. So there's that, but like, mostly the API because.
Martin Casado
Did you go to Los Alamos?
Sherman Wu
We did, yeah, I did go to Los Alamos. It's great. They showed us around. They showed us some of the historic sites. Yeah, I used to work at Livermore, man.
Martin Casado
So I've got like.
Sherman Wu
Oh, yeah, yeah.
Martin Casado
First job out of college, so.
Sherman Wu
Right, right, right.
Martin Casado
You saw that next.
Sherman Wu
Yeah, we hope to. Yeah. So I work on the developer platform. I've been working on it for around three years now. So I joined in 2022. I was basically hired to work on the API product, which at the time was the only product that OpenAI had. And I've basically just worked on it the entire time. I've always been super interested in the developer side and kind of like the startup story of this technology. And so it's been really, really cool to kind of see this evolve. And so that's my time in OpenAI. Before OpenAI, I was at Opendoor for around six years. I was working on the pricing side. My general background before pricing is such a dissonant. Yeah, yeah.
Martin Casado
Pricing and open or to like running.
Sherman Wu
API, it's such a different. It's been fascinating actually for me to see the differences between the companies. Like, they're run so differently. They both have Open in the name, so there's some overlap, but that's pretty much it. But yeah, I was there for around six years working on the pricing. So our team basically would run the ML models.
Martin Casado
This is actually pricing the assets on Opendoor, the inventory.
Sherman Wu
Exactly. So, yeah, Opendoor would buy and sell homes, and their main product was buying homes directly from people, selling them with all cash offers. And so my team was responsible for how much we would pay for them. And so it was a really fun, like, ML challenge. It had a huge operational element to it as well, because not everything was automated, obviously, but it was a really fascinating technical challenge.
Martin Casado
And is there any sense of that on the API side, like GPU capacity buying, or is it just totally unrelated.
Sherman Wu
On the API side? There is a small bit of like how we price the models, but I don't think we do anything as sophisticated as Opendoor. Opendor is just like such a hard problem. It's like such a like expensive asset. The holding costs are very expensive. You're like holding onto it for like months at a time. There's like a variability in the holding.
Martin Casado
Time and massive long tail of potential things that could.
Sherman Wu
Long tail, yes. And like try to think about it from a portfolio perspective and like if one of them just like you're holding on it for two years, it blows. Everything like goes negative. So it's a very, very different six years, different challenge. Yeah, yeah, six years there. Lots of up and down. Saw a lot of the booms, saw a lot of the struggles and then we IPO'd before I left. But yeah, just in general it was a very great experience. I think for me it was also just had such a very like business operations and like a very like by the book type of culture. Whereas OpenAI is like very different.
Martin Casado
What's so interesting, I was just thinking about it now. It's like even for a company like that, like you don't think about it as a tech company, but if there is a deep technology problem, it actually is the pricing. Right. It's actually an ML problem.
Sherman Wu
Yeah, that's what it tracks.
Martin Casado
It's not like the website, it's not the platform, it's not the API. It's literally that.
Sherman Wu
Yep, yep, yep. And that's what attracted me to it. I think that's what was interesting. It's also a way like lower margin business than OpenAI because you're like making a tiny spread on these homes. They would talk about like basis points, like eating bits for breakfast and all that. So anyways, I was at Opendoor for around six years and then before that was my first job out of college which was at Quora at MDN Group there. Yeah. So I was working on the newsfeed, so worked on Newsfeed ranking for a bit. Worked on the product side. That was actually my first exposure to like actual ML and industry and learned a lot from the engineers at core. We basically hired a lot of the early Feed engineers.
Martin Casado
Was Charlie still there when you were there?
Sherman Wu
Charlie was not there when I see.
Martin Casado
Like right after you.
Sherman Wu
Yeah, yeah, yeah.
Martin Casado
That was a really legendary team. It's still known to be kind of this super iconic founding team.
Sherman Wu
Yeah, yeah, the early founding team was really solid. I still think that even while I was there, I still like amazed at the quality of the talent that we had. I think there's like one. The company's like 50 to 100 people. But yeah, like a bunch of the Perplexity team was there. Dennis was on the feed team with me. Johnny Ho, Jerry, Ma.
Martin Casado
Yeah, that's right.
Sherman Wu
And then, and then Alexander the Scale. Now I was there between high school and college. It was an incredible team. I think I kind of took it for granted while I was there. I was a good group.
Martin Casado
How did you get to Quora? What did you study in an undergrad?
Sherman Wu
Yeah, so before that I was at MIT for undergrad. I studied computer science. Did like one of those, like computer science and the master's degree kind of like crammed it in. I ended up at core because I got in what we call an externship there. So at mit you actually get January off. So there's like the fall semester and then January's off and then you have the spring semester and so it's called independent activities period. So some people just like take classes, some people just do nothing. But some people will do like month long internships and some crazy companies will offer a month long internship to a college student. And it really is just kind of like a way to get people into.
Martin Casado
Did you come out here from Boston?
Sherman Wu
Yeah, yeah, it was crazy. So you had to apply, I remember. Yeah, this is, I think 2013, January or something. You had to apply. And I remember the core internship was the one that just paid the most. They paid, I think it was like 8,000, $9,000. And I was like, wow, that's like over a month and you're kind of ramping up.
Martin Casado
Like half the time I can eat for a year.
Sherman Wu
Yeah, yeah. As a college student, it's like great. And yeah, they would kind of like fly you out here. So I did the interviews and then luckily got an offer. And so, yeah, it came out for a January. That was right when they moved into their new Mountain View office. And I basically honestly just ramped up for like two weeks and then have two weeks of good productivity working on the Feed team.
Martin Casado
So that was that, like user facing, like. Yeah, User facing product work.
Sherman Wu
Yeah, yeah. I distinctly remember my externship project for those two weeks was just to like add a couple features to our feature store.
Martin Casado
Yeah.
Sherman Wu
And that would make its way into the model. I remember my mentor there was is Tudor, who's now running, I think it's called Harmonic Labs. Yeah, yeah. Crazy team. Crazy.
Martin Casado
I mean, by the way, I think it's one of the untold stories of Silicon Valley is like how good that original team ended. I mean, a lot of them are still there and still good, but the diaspora from Quora is everywhere.
Sherman Wu
Yeah. Yeah, that's actually how I ended up at OpenAI to kind of fast forwarding from there because OpenAI kind of kept a quiet profile. Ish. I had always kind of kept tabs on them because a bunch of the core people I knew kind of like ended up there. It's kind of like checking in on it and they were like, yeah, something crazy is happening here. You should definitely check it out. So, yeah, I definitely owe a lot to Quora. But yeah, part of the reason why I went there versus other options as a new grad was the team was just so incredible and I just felt like I could learn a ton from them. I didn't think about everything afterwards. I was just like, man, if I could just absorb some knowledge from this group of people, it'd be great.
Martin Casado
Awesome.
Sherman Wu
Yeah.
Martin Casado
So one place I wanted to start is something that I find very unique about OpenAI is it's both a pretty horizontal company. Like it's got an API. Like I would say we've got this massive portfolio of companies, right? And I would say good fraction of them use the API. And then it's also a vertical company in that you've got full on apps, right? Like everybody uses ChatGPT for example. And so you're responsible for the API and kind of the dev tool side. So maybe just to begin with, is there an internal tension between the two? Like, is that a discussion like, like the API may, whatever, it may help a competitor to like the vertical version or is it not? These things are just growing so fast. It's not an issue. I'd just love to how you think about that. By the way. It's very unusual for companies to have both of that, these two things this early. It's very unusual.
Sherman Wu
Yeah, yeah, I completely agree. I think there is some amount of tension. I think one thing that really helps here is Sam and Greg, just from a founder perspective, have since day one just been very principled in the way in which we approach this. They've always have kind of told us we want ChatGPT as a first party app. We also want the API. And the nice thing is I think they're able to do this because at the end of day it kind of comes back to the mission of OpenAI, which is to create AGI and then to distribute the benefits as broadly as possible. And so if you interpret this, you want it in as many surfaces as you want and the first party app is a really great way to get, you know, I don't know, it's like 800 million wows or whatever now and.
Martin Casado
100 million wows.
Sherman Wu
Yeah, yeah, it's pretty, it's actually mind boggling to think about.
Martin Casado
I don't think many people listening to this don't understand how big that is.
Sherman Wu
But that is, I mean, it's crazy.
Martin Casado
Yeah, that's gotta be like actually historic for the time it's taken to get to 800 million.
Sherman Wu
It's historic. It's also just like, yeah, the amount of time and just like how much we've had to scale up a tenth of the globe. Yeah, yeah. 10% of the globe uses it every week. Every week, yeah, yeah. And it's growing, and it's growing so like at some point, you know, it'll hit like, you know, it'll go even higher than that. And so, so yeah, like, obviously the reach there is unmatched, but then also just like being able to have a platform where we can reach even more than just that. Like, one thing we talk about internally sometimes is like, what does our end user reach from the API? Like, it's actually, it's like really, really, it's really broad. It might, might even. It's hard because ChatGPT is growing so quickly, but like it, like at some points it was definitely larger than, than ChatGPT and the fact that we're able to get tap in all this and, and, and get the reach that we want. But yeah, I mean there's definitely some tension sometimes. I think the, I think it's come up in a couple of places. I think one of them is on the product side. So as you mentioned, you know, sometimes there are competitors kind of like building on our, on our platform who, you know, might not be happy if ChatGPT launches something that competes with them.
Martin Casado
Yeah, I mean that, you know, that's the tale of the old as the cloud or operating systems or whatever. So like that's, you know, I think it's More like does ChatGPT worry about the competitor? Yeah, you know, type thing like, you know, you enabling a competitor.
Sherman Wu
Yeah, yeah. I mean the interesting thing is like I would say not particularly mostly just because we've been growing so quickly.
Martin Casado
It's like such a force right now.
Sherman Wu
Yeah, yeah. Growth solves so many different things and like, and the other way we think about it is like everyone's kind of building, building around AGI, building towards AGI. Of course there's going to be some overlap here. So Yeah, I mean, but I would say, like, at least in my position, I feel more of this tension from the customer, like the API customers themselves. Right. It's like, oh my gosh, you know, you're like, are you going to build this thing that I'm working on?
Martin Casado
Yeah. That story is as old as computer system. There's never not been computer platform that didn't have that problem. Okay. So I kind of go back and forth on this one. I want to try one out on you. Which is the problem historically with offering a core service as an API is you can get disintermediated.
Sherman Wu
Right.
Martin Casado
And so I can build on top of it, but then the user doesn't know, like, whatever. I build on top of the cloud, but I disintermediate from the cloud and then I can switch to another cloud or whatever. And it occurs to me that that's kind of hard to do with these models because the models are so hard to abstract away. Like, they're just, they're just unruly. Right. If you try to like have traditional software drive them, they just don't kind of manage very well. So part of me thinks that it's almost like this, like anti disintermediation technology that you kind of have to expose it to the, to the user directly.
Sherman Wu
Yep.
Martin Casado
Does that make sense? And so I'm wondering if, like. So even If I think ChatGPT is really just trying to expose the model to the user, the API is kind of just trying to expose the model to the use. So I think there's almost this argument. It's like, if the real value is in the models, it doesn't really matter how you get it to them because it's gonna be very tough for someone's gonna to abstract it away in the classic sense of computer science of like, they don't know that they're using the model. Like you always know you're using GPT5.
Sherman Wu
Yeah. And the interesting thing is, I think like the entire industry kind of has slowly changed their mind around this too. I think like in the beginning we kind of thought like, oh, these are all gonna be interchangeable.
Martin Casado
It's just like software.
Sherman Wu
Yeah, yeah, exactly. So the piece of infrared you can just swap out. Yeah. But I think we're learning this on the product side with like, you know, the GPT5 launch and like 4.0 and like how so many people liked O3 and 4.0 and all of that.
Martin Casado
I found that I felt that when it changed, I'm like, I'm like you're not as nice to me. Like, I like the validation.
Sherman Wu
Yeah, it's actually fun because I really loved GPT5's personality. But I think it's like the way I used, you know, chat. GPT was very utilitarian. Like it's like, you know, mostly for work or just like information.
Martin Casado
Yeah, I've definitely come around just, you know, but like I actually felt the dissonance when it changed. It's like, it's like there's this emotional thing that goes on, but it's almost like it's an anti, you know, disintermediation technology. Like you kind of have to show this to the user.
Sherman Wu
Yeah, yeah. And then you see a lot of like, you know, more successful products like Cursor, like do this directly, especially the coding products where users want more control. We've even seen some like, you know, like more general consumer products do this. And so it's definitely been true on the consumer side. The interesting thing is I think it's also been true on the API side and that's also something that I think.
Martin Casado
No, exactly, that's exactly what I'm saying. The argument could be that I could use the API to disintermediate you, but like you don't see that happening because it's so hard to put a layer of software between a model and a person. You almost have to expose the model.
Sherman Wu
Yes, yes. And I think, if anything, I think the models are like almost like diverging in terms of like what they're good at and like their specific use case. And I think there's going to be more and more of this. But yeah, basically it's been surprisingly hard for or like the retention of people building on our API is like surprisingly high, especially when people thought you could just kind of swap things around. You might have, you know, like even tools that help you swap things around. But yeah, the stickiness of the model itself has been surprising.
Martin Casado
And do you think that is because of a relationship between the user and the model or do you think it's more of a technical thing which is like my evals work for OpenAI and the correctness maintains.
Sherman Wu
Yeah, yeah, I think it's both. So I think there's definitely an end user piece here, which is what we've heard from some of our customers. Like they just get familiar with the model itself. But I also think there's a technical piece which is like the. Also, as a developer, especially with startups, you're really going deep with these models and really iterating on it trying to get it really good within your particular harness. You're iterating on your harness itself, you're giving it different tools here and there. And so you really do end up building a product around the model. And so there is a technical piece where as you kind of keep building with a particular product like GPT5, you're actually building more around it so that your product worked uniquely well with that, with that model.
Martin Casado
So, so I, I use, I use cursor and a lot of, just for like a lot of stuff like, like writing blogs and like, you know, you know we're investors and I use it for sometimes for coding and it's remarkable how many models I use in cursor. So like literally my go to model is GPT5. I love GPT5. I think it's a phenomenal like you know, and then like I use like max mode with GPT5 for planning and then. But you know, like, I mean I like the tab complete model that's in cursor and like, you know, the new mod dropped is for like some basic, you know, some stuff is like, yeah, the composer one's good.
Sherman Wu
Yep.
Martin Casado
And so like, you know, and I.
Sherman Wu
Think that like kind of reflects this too because it's like, it's a particular model for each particular use case. Like I've talked to a bunch of people who've used the new composer model and it's just really good for like fast like first pass. Exactly. Like keep you in flow kind of thing and then you kind of like bubble out to another model if you want like, you know, deeper things.
Martin Casado
I mean I literally sit down, I literally sit down and ask GPT5 to help me plan something out and it's really good at that. And then you know, like when I'm coding and I'm doing like the quick chat thing, then I'll use composer and then if there's like whatever, there's like some crazy bug or something like that. Like so you know, do you remember like in the early days of all of this, we're like there's going to be one model and like, I mean like, like even like investors, like we will never invest in a model company because like there will only be one model and it's going to be AGI. But like the reality, it feels like there's this massive proliferation of models. Like you said before, they're doing many things. And so maybe two questions, maybe too blunt or too crass, but the first one is what does that mean to ape for AGI? And the second one is, what does that mean for OpenAI? Does that mean that you end up with a model portfolio? Do you select a subset? Do you think this all gets superseded by some God model in the future? How does that play out? Because it's against what most people thought. Most people thought this is all going towards one large model that does everything.
Sherman Wu
Yeah, I think the crazy thing about all this is just how everyone's thinking has just changed over time. I distinctly remember. And the crazy thing is not that long ago, it's just like two or three years ago. I remember even within OpenAI, the thinking was that there would be one model that rules them all. And it's like, why would you. I mean like this kind of goes to the fine tuning API product. There's like, why would you even have a fine tuning product? Why would you even want to iterate on it? There's going to be this one model that subsumes everything. And that was also kind of the. That is also like the most simplistic like view of what the AGI will look like. And yeah, it's like definitely completely changed since then. I think one. But then the other thing to keep in mind is it might continue to change even from where we are today. But it's becoming increasingly clear, I think, that there will be room for a bunch of specialized models. There will likely be a proliferation of other types of models. I mean, you see us do this with the Codex model itself. We have GPT 4.1 and 4.0 and 5 and all of this. And so I definitely think there's room for all this. I don't think that's bad. For what it's worth, if anything, I think, you know, as we've tried to move towards AGI, things have just been very unexpected and I think the market just evolved and the product portfolio evolves because of that. So I don't think it's a bad thing at all. What I do think it means, you.
Martin Casado
Could easily argue it's very good for OpenAI and very good for like the model companies to like. Yeah, because not have like, you know, winner take all, consolidated dynamics. Right. I mean you just have a healthier ecosystem, a lot more solutions. You can provide a lot.
Sherman Wu
Yeah.
Martin Casado
You know.
Sherman Wu
Yeah. And as the ecosystem grows, it generally is helpful. Like this is one thing we actually think about a lot too is, is as the general AI ecosystem grows, OpenAI just stands to benefit a lot from this. And this is also why some of our parts, we even started opening up to other models like Our Evals product now allows you to bring in other models to all this. We think it's like any rising tide generally helps us here. But yeah, I think as we move into a world where there'll be a bunch more models, this is why we've kind of invested in our model customization product with fine tuning API with the reinforcement fine tuning, opening that up as well. It's also part of why we open sourced GPT oss as well because we want to be able to facilitate.
Martin Casado
I want to talk about that in just a bit because the open source is actually very interesting and I mean actually I thought the open source model was great, but clearly something that companies have to be careful with. But before that I want to talk a little bit about the fine tuning API. I've noticed that you are moving towards more sophisticated use of things like fine tuning, which you know, in a way you could read that as a bit of a capitulation that like, you know, there is product specific data and there's product specific use cases that a general model won't do. To your point. Right. So like as opposed to proliferation model, you do that. It seems like a lot of that data is actually very, very valuable. Right. And so you know, to what extent is there like interest in almost a tit for tat where you can like expose, you know, the ability to get product data into fine tuning and then you also benefit from that data because the vendors provide it to you versus like this is 100%, you know, like they keep their own data and there's kind of no interest in that because it feels to me like the next level of scaling this is kind of where we're at. And so I'm just kind of curious how.
Sherman Wu
Yeah, So I mean, maybe even like taking a step back. The main reason why we even invested in a fine tuning API in the very beginning is one, there's been huge demand from people to be able to customize the models a bit more. It kind of goes into like prompt engineering and also like I think the industry's changed their mind on that as well. Like it's evolved. But the second thing is exactly what you said, which is the companies just have giant treasure troves of data that they are sitting on that they would like to utilize in some fashion in this AI wave. And you can, you know, the simple thing is to put it in like, you know, some like vector, like do rag with it or something. But there's also, you know, if they have a more technical team, they do want to see how they can use it to customize the models. And so that is actually the main reason why we've invested in this. The interesting thing was way back, kind of back in like 22, 23, our fine tuning offering was I'd say like too limited so that it was very difficult for people to tap into and use this data. So it was just like an sf, like a supervised fine tuning PI and we're like, oh, you can kind of use it. But in practice it really is only useful for. It's honestly just like instruction following. You kind of change the tone and you're just really instructing it. But I think the big unlock that has happened recently is with the reinforcement fine tuning model because with that setup we're now letting you actually run rl, which is more finicky and it's harder and you need to invest more in it, but it allows you to leverage your data way more.
Martin Casado
By the way, this is just a naive question for me, which is it feels from just my understanding, from my own portfolio, it feels like there's two modalities of use. One of them is I've got a treasure trove of data that I've had for a long time and I create my model on that treasure trove of data and all that happens offline and then I deploy that. There's another one which is like I actually have the product being used in real time. I've got a bunch of users.
Sherman Wu
Yeah.
Martin Casado
And like I can actually get much closer to the user. I can kind of a B test and decide which data and like it's kind of more of a near real time thing is, is like, is this focus on like more product stuff or more treasure trove or.
Sherman Wu
So the dream with the fine tuning API was that we should be able to handle both. Right. It's like, it's like we actually had this dream and we have this whole like Lora setup with the fine tuning inference where we should just be able to scale to like millions and millions of these fine tuned models. Which would is usually what would happen if you have like this online lear learning thing. Exactly. In practice it's mostly been the format. In practice it's mostly been like the offline data that they've already created or they are creating with experts or something and using their product that they're able to use here. But the main thing I was trying to say around the reinforcement fine tuning APIs, it kind of changes the paradigm away from just small incremental tone improvements, which is what SFT did to Actually improving the model to potentially soda level on a particular use case that you know about. That's where people have really started using the reinforcement fine tuning API. And that's why it's gotten more, more, more uptake. Because if the discussion is less like, hey, I can make this model, you know, not like speak in a certain way better, it's less compelling. But if it's like, hey, for like, you know, medical insurance coding or for like coding planning, agentic planning or something, you can create the world's best model using your data set with rft, then it becomes a lot more.
Martin Casado
And will you, will you ever like, or maybe do you, will you ever like find ways to get access to that data? Like, yeah, so listen, if I, if I had the data and I wanted cheap GPUs, I'd trade you for it. Like, I don't.
Sherman Wu
Yeah, I mean we've talked about this and we've actually been piloting some pricing here too where it's like, because this data is like really helpful and it's kind of hard to get and if you actually build with the reinforcement fine tuning API, you can actually get discounted inference and potentially free training too. If you're willing to share the data. It's always kind of, it's up to the customer there, but if they do, it is helpful for us and there'll be benefits for the customer as well. That's awesome.
Martin Casado
Okay, you said that views on prompt engineering have changed. Yeah, actually I wasn't aware of that. All the other things I was aware. This one I wasn't.
Sherman Wu
Yeah, I mean I think the prevailing view, this is back in 2022, I remember I was talking to so many people and they're basically, I mean this is similar to like the single model AGI view as well, which is like, like prompt engineering is just not going to be a thing and you're just not going to have to think about what you're putting in the in the context window in the future. Like the model would just be good enough and it'll just like know. It'll know what you need to do.
Martin Casado
Yeah, that's definitely not a thing.
Sherman Wu
Yeah, but like that like, I don't know, maybe people forget it but like that was like a very common belief back then because like scaling laws or whatever, something with the scaling laws and like you'll just mind meld with the model and like you just like prompting and like instruction following will be so good that you won't really need to do it. And if anything like, yeah, it's like clearly been wrong. But it is interesting because I think it's a slightly different world that we're in now where the models have gotten really, really good at instruction following relative to the GB35 or something. But I think the name of the game now is less on prompt engineering as we had thought about it two years ago. It's more of like, it's like the context engineering side where it's like what are the tools you give it it? What is like the data that it pulls in? When does it pull in the right data?
Martin Casado
Well, this is very interesting. I mean, I mean to reduce it to like an almost absurdly simplistic level. Like the weird thing about rag, for example, the classic use of RAG is like you're using like cosine similarity to choose something that you're going to feed into a super intelligence. You know, you're like, I'm the random. I'm like randomly grab this thing based on like fucking embedding space. It doesn't really, you know, and like, and then you know, when you want the super intelligence, decide the thing to do. And so it's like pushing intelligence in that retrie retrieval clearly is something that.
Sherman Wu
Makes a lot of sense.
Martin Casado
And to be like the pushing the intelligence out in a way.
Sherman Wu
Exactly. And to be fair, I think like RAG was kind of introduced when the models were like, it's like pre reasoning models. It was like you only had kind of like one shot to like do this and it wasn't that smart. But now that we do have the reasoning models, now that we have, I mean if you like. One of my favorite models is actually 03 because it was like one of the most diligent models. It would just like do all these tool calls and it's like really the intelligence itself trying to like do the, you know, tool calls or RAG or anything like that, or write the code to execute. And so the paradigm has shifted there. But yeah, because of that I think like context engineering, prompt engineering, what you put, what you give the model is like extra important.
Martin Casado
Okay, so you have API, so you have the API which is horizontal. You've got ChatGPT and other products which are vertical. We haven't even talked about pixels. This is all just language. Are agents a new modality? Is that something else? Like codecs or.
Sherman Wu
What do you mean by modality here?
Martin Casado
Like they feel both vertical and horizontal to me in a way. Like to me ChatGPT is a product, right? It's like it's a product. And like my mom uses it, right?
Sherman Wu
Yep.
Martin Casado
And an API is a dev thing. You kind of give it to a developer and like a CLI is kind of somewhere in between to me. It's like, is it a product? Is it like it is horizontal? Like.
Sherman Wu
Yeah.
Martin Casado
How is it handled internally? Is that a totally separate team that does agents or.
Sherman Wu
No. So it's. Yeah, it's interesting because, like, I think the way that I, the way that you frame it just now almost seemed like agents was like this like singular concept that like, you know, might or like might have its own particular.
Martin Casado
Maybe a better question is what is an agent to you?
Sherman Wu
Yeah, yeah, yeah, yeah, yeah.
Martin Casado
Even getting a language is like important for this conversation. Yeah.
Sherman Wu
So I don't know. I actually don't even know if it'd be helpful for me to share. But my general take on agents is it's a, it's an, it's an AI that will take actions on your behalf that can work over long time horizons. And I think that's the, that's the most pretty general utilitarian.
Martin Casado
Yeah, yeah, definitely.
Sherman Wu
But like, if you think about it that way. Yeah, I mean, maybe this is what you mean by modality, but it is just a like, way of like using AI and it is a. I guess it could be viewed as a modality, but we don't view it as like a separate thing, separate from API. And let me just try and kind.
Martin Casado
Of, you know, give you a sense of where this question's coming from. Like, I know how to build a product. Like, and we know how to do, go to market for products. We know how to do like, you know, we know the implications of turning them into platforms. Like, it's just, we've been doing this for a very long time. Right. We know how to do the same thing for APIs. Right. We know how to do billing. We know like the tension of like people build on top of it and all of that stuff. And like what I've been trying to. And this is just maybe a personal inquiry, it's just not clear for me. For an agent, if you, if it, if it sits in one of those two camps, is it more like the product camp? Is it more like the. Or because it's kind of both. Like, I could like literally give you coding.
Sherman Wu
Yeah, yeah.
Martin Casado
And like as a user and then you just talk to it. Or I could like build in a way, kind of embed it in like my app. And so like. But then that means something to you as far as like, you know, how do you price it? And what does it mean for ecosystem? Like, like, for example, like would you be fine if I started a company and just like built it around Codex? Is that a thing?
Sherman Wu
Starting a company and building it around Codex? I actually think that'd be great. Like it's a. We, we like really like the Codex SDK and we like want people to be able to and hack on it. Yeah, actually I think this might be what you're getting at, which is, and this is like kind of a unique thing about OpenAI and kind of reflects on how it's run, which is at the end, like at the end of the day, OpenAI is like an AGI company, it's like an intelligence company. And so agents are just like one way in which this intelligence kind of be manifested. And so the way that I'd say we actually think about internally is all of our different product lines. Sora, Codex API, ChatGPT are just different interfaces and different ways of deploying this. So you don't really, so there's no like single teams. Like this is, you know, like thinking about agents. I would say the way that it manifests itself more is like each product area thinks about like what is, you know, this intelligence is actually turning into a form where like it can actually agentic behavior is more possible. What would that look like in a first party product like ChatGPT? What would that look like? This is actually why Codex ended up becoming its own products. Like what would it look like in a coding style product? Like we explored it and chatgpt like kind of worked there. But like actually the CLI interface actually makes a lot more sense. That's another interface to deploy it. And then if you look about the API itself, it's like this is another interface to deploy it. You're thinking about it in a slightly different way because it's the developer first mindset, we're helping other people build it. The pricing is slightly different, but it's all these different manifestations of this core intelligence that is the agent behavior.
Martin Casado
It is so remarkable how much of this entire economy is basically just token laundering in a sense. Right? It's literally like anything I can do to get English in or a natural language in and then the intelligence out. And I mean it's because these things are so resistant to layering. It's so hard to layer a language out. Like, you know, like I could even do it easily, pretty easily with like codecs. I could just like use it, you know, as a component of a program and just you know, basically launder intelligence. I mean of course, you know, I'd be charged to do that. So yes, I actually, my view of this and having seen now so many kind of launches of different products, I've seen agent launches in the definition that you have. I've definitely seen APIs and I've seen products on these is like they're actually quite different than like what we're used to. Like the cogs is different, the defensibility is different like all. So we're kind of rewriting it. And so it's kind of like you know, you came from a kind of pricing background. I mean you're working on model for pricing now you have the API. So I just love your thoughts on like, I mean how, how have you evolved your thinking and how do you price these? You know, access to intelligence where you know, you don't know how many people are going to use it. It's almost certainly usage based billing, not something else. Like can you talk just a bit about like philosophy around pricing on these things? Is it different for product versus API?
Sherman Wu
Like yeah, I think the, the, the, the honest truth here is like it's evolved over time as well. And, and like I actually think the simplest, like the reason why we've done usage based pricing on the API honestly is because it's been like it's closest to how it's actually being used. And so that's kind of how we, how we started. I actually think usage based pricing on the API has surprisingly held strong and I actually think this might be something that we'll keep doing for quite a long time. Mostly because the cogs are so high.
Martin Casado
I don't know how you don't do usage based. Yeah, I just don't know how that.
Sherman Wu
Yeah. And then there's also the strategy of how we price it and internally one thing we do is we always make sure that we actually price our usage based pricing from a cost plus perspective. We're actually just trying to make sure that we're being responsible from a margin perspective.
Martin Casado
By the way, this is a huge shift in the industry in general just because like I remember the shift from on prem to, to recurring. Yeah, that was a big, big deal. Like that created zora. Like it like created whole companies like books on it. Like a bunch of consultants on how you do this. It changed like you know and like I think the shift to, to usage is, is as bigger, bigger and it's also even a really hard technical problem.
Sherman Wu
Yeah.
Martin Casado
Like I can't even imagine. 800 million.
Sherman Wu
Wow.
Martin Casado
Like how do you build.
Sherman Wu
Yeah. Well, well, well. 800 million. Wow. Is a little easier because it's, it's, it's not usage based price. Missing its subscription. So it's like that was like wait, that's way. But I mean there's still like, like a lot of users on the API that we need to like you know, manage all the billing side.
Martin Casado
There's some like overages or stuff you've got to deal with on that or.
Sherman Wu
What do you mean by overages?
Martin Casado
Like I don't know, I guess most.
Sherman Wu
People have quotas and then we'll kind of like oh, they're like max quotas that we don't let people go over. But like in practice these quotas are like pretty, pretty massive.
Martin Casado
That would literally be like one of the most complex systems somebody's ever built if you would do usage base at like that scale. I mean these are very, very, very. And like you have to be correct. Like these are very hard systems. Scale.
Sherman Wu
Yep, yep, yep. Yeah, yeah. I mean whole team thinking about this now internally. Yeah, I mean users prepricing is also interesting. So there's. We acquired this company called Rockset a while ago. A founder, his name is Ben Cot.
Martin Casado
He's right now. Awesome.
Sherman Wu
Ben's incredible. He's one of the best.
Martin Casado
Like Ben Cot, if you're listening, we're huge fans. I'm a huge fan.
Sherman Wu
He's going to love this.
Martin Casado
He's great man. He's a legend.
Sherman Wu
Anyways, I was talking to him about pricing as well and his take is that pricing is kind of like a one way ratio and basically once you get a taste of usage based pricing, you're never going to go back to the per se, per deployment type pricing. And I think that's definitely true and I think it's just because it gets closer and closer to your true utility. You're getting all this thing. The main pain point is you have to maintain all this infra to get it to work well. But if you do have it, he thinks it's like a one way ratchet where there's just no going back. And I think the hot new thing now is like oh, with AI you can now kind of measure out outcomes and so that's like another, you know, like step forward and if that works like maybe it's a one way ratchet. So we thought about that. It's like, you know, is there some type of like outcome based pricing? This is more on the first party side on an API it's kind of. Yeah, that's hard to measure.
Martin Casado
Very hard I mean that's hard because you end up having to price and value non computer science infrastructure. Right. Like you're literally going into verticalization now. Like you're like, I mean listen, if it's like porting a code base, maybe you'd have some expertise. But if it's like whatever, like increasing crop yields some level you need to like.
Sherman Wu
But there could be a world where like the AI is like good enough, where it can like actually, you know, make judgments of these and do it in an accurate enough way where you can tie it to billing.
Martin Casado
I think this is a problem with AI conversations because like at any point in time you're like, but it could get good enough. It's not a problem anymore.
Sherman Wu
Yeah, at some point it'll be solved. It's so much like the prompt engineering and the single AGI thing from before. Yeah, yeah. It's like when you, when you reach that level of, when you push it that far, everything's kind of solved. On outcome based pricing it sounds very appealing. Like if it can work, it can work. But one thing that, that we've started realizing is it actually ends up correlating quite a bit with usage based pricing, especially with test time compute. Like if the thing is just like thinking quite a bit. Like actually if you charge just by usage based and not outcome based, you're like basically approximating outcome based at this point. If the thing is thinking for so long, it's like highly correlated with what it's doing.
Martin Casado
It's just adding more value.
Sherman Wu
Yeah, yeah, exactly. And so maybe at the end of the day usage based pricing is all you need and it's like, which is just going to live in this world forever. But yeah, I don't know, it's constantly evolving. I think our thinking has evolved here as well. I personally am keeping track of if the outcome based pricing setups can actually work here. But at least on the API side I think it's such a usage based setup we have the good infrastructure around this and so I think we'll probably stay with that for a while.
Martin Casado
So how do you think about open source? I mean I think you're the owner, only big lab that's releasing open source is that.
Sherman Wu
No, Google has some of theirs.
Martin Casado
Okay.
Sherman Wu
Yeah, mostly smaller models on their side.
Martin Casado
So how do you think about open source vis a vis competition? Cannibalization? What's the strategic goal? What's the complexity?
Sherman Wu
Yeah, yeah. So I personally love open source. I think it's great that there's a.
Martin Casado
All of us grew up with it, right?
Sherman Wu
Yeah. All of us grew up with it. Like the Internet wouldn't exist without it. Like so much of the world, world is built built on top of it.
Martin Casado
Cloud wouldn't exist without of it. Basically nothing would exist without of it. Except for maybe Windows.
Sherman Wu
And so it was interesting because like I felt like over the last three years before we launched the open source model, I know Sam feels this way as well. It's like there's this like weird like, you know, mindset where because OpenAI hadn't launched anything, it just seemed like it was super like anti, like OpenAI was like super anti open source. But I'd actually been having conversations with Sam ever since I joined about open sourcing a model. We were just trying to think about, about like how can we sequence it? What? Compute is always a hard thing. It's like, do we have the compute to kind of like train this thing? So we've always wanted to kind of do this. I'm really glad that we were able to finally do it. I think it was, was it earlier this year. I like lost sense of time.
Martin Casado
AI time is so good.
Sherman Wu
Yeah, I was like, was it last year? No, it was this year. Yeah. When GPT OSS came out. And so I was just really glad that we did that. The way that I generally think about it is one, I think as a. This is also particularly true for OpenAI because as you said, we are a vertical and a horizontal company. It's like to continue investing in the ecosystem and just from a brand perspective, I think it's good. But then also I think from OpenAI's perspective, if the AI ecosystem grows more and more, it's like a rising cost. This is all really helpful for us. And if we can launch an open source model and it helps unlock a whole bunch of other use cases in other industries, I think that's actually not good for us.
Martin Casado
I'll say what people don't talk about a lot is how well these open source AI business models actually work. Because this is very, very like, like the cannibalization risk is actually very low.
Sherman Wu
Yeah.
Martin Casado
And like you don't really enable competitors a lot because I mean, when we say open source you really mean open weights.
Sherman Wu
Right.
Martin Casado
It's not like they could recreate it. Right. You know, and like if I can distill your API as well as I can distill like you giving me the weights in some way, like. And so like it doesn't really change that dynamic a lot.
Sherman Wu
But yeah, yeah, I mean to be, to be clear like we have not seen cannibalization at all. Yeah, of course it's like, seems like a very different set of use cases. The customers tend to be like slightly different. The use cases are very different.
Martin Casado
And by the way, it turns out inference is super hard. Like to actually have scalable, fast performant.
Sherman Wu
That's a hard, hard problem. Yeah, so. So like I'd say the way that I personally think about open source in relation to the API business in particular is well one, it hasn't shown cannibalization risk. So you know, I'm not particularly worried about that. But also like, especially for all these major labs, like they're usually like two or three models where like that is where you're making, making all of your impact, all of your revenue. And those are the ones where we're throwing a bunch of resources into improving the model. And these tend to be the larger ones that are like extremely hard to inference. We have a really cracked inference team at OpenAI and my sense is like even if we just open source them, if we just literally open sourced GPT5 or something, it would be really, really hard to inference it at the level that we are able to get it to do. There's also by the way a feedback loop between the inference team and the training team too, so we can optimize.
Martin Casado
Is it possible to verticalize models for products? Products?
Sherman Wu
You like train models specifically for products?
Martin Casado
Yeah, I mean to actually, yeah, I.
Sherman Wu
I think, I mean we've kind of done this with GPT5 codecs, right? Or do you mean like even more verticalization?
Martin Casado
Like verticalization, like, like deep, deep, deep verticalization where like, you know, like the, like, like the, the released model wouldn't, you know, it's like actually part of a product.
Sherman Wu
I think we're like basically starting to move in that direction. I think there's a question of how deeply you verticalize. I think most of what we've done is mostly at like the post training, like the tool use level. Like Codex is particularly good at using the, sorry, GPT5, Codex is particularly good at using the Codex harness. But there's like even deeper verticalization you can do like that. And that one I think is more of an open question.
Martin Casado
Yes, well like a lot of my, I mean a lot of my mental model, this comes from the pixel space which is like you, you know, you can laura a bunch of image models, right? And you can do a bunch of stuff to make it better, more suitable for some products for examp. But like these open source models are really really good. And like I, you would believe that you could like verticalize a model for like editing or cut and paste or this or that, you know, like that's actually part of this but you actually don't see that happen. Yeah, it's almost always like you're just kind of exposing like a model, not something like specific to a product.
Sherman Wu
Yeah, I think, I think so. I think there's a distinction to be made between the like the, the image model space and the text model space.
Martin Casado
Yeah.
Sherman Wu
Also because the image models tend to be way smaller and like you can iterate on it a lot faster. That's why you get that crazy cool proliferation of the image model side. Whereas I don't know for the text models there's always going to be this really big fat pre training step that you have to invest in. And then even the post training side is not the easiest thing. Just from a compute perspective. Obviously it's much smaller but it's still pretty heavy to do a full mid train or a post training run. And so I actually think that's one of the bigger bottlenecks because I think you are right that on the image side. Yeah you can fine tune a image diffusion model to be extremely good at editing faces. Yeah.
Martin Casado
Something very specific and then you build.
Sherman Wu
A product around that and it's like yeah, you can just kind of put all these resources into and iterate on that one specific model. Whereas it's a much heavier motion on the tech side.
Martin Casado
I got to say it is a bit of an anti pattern to do both languages like language based models and diffusion like pixel models in the same company. Most that have tried like it sounded very clunky to do it but I mean you and Google are the two kind of counter examples for this and so like is it possible to even like converge the infrastructures on these things? Like I mean is it totally different orgs, is it shared infrastructure? Like how do you operationalize?
Sherman Wu
Yeah, I think, I think you're totally right. Synanti patterns, it's pretty tough to pull off. I think honestly like props to Mark on our research team for like you know structuring things in a way where we're able to do it. From my perspective I think the biggest thing is I think our like image like our, our I think called like the world simulation team like the team that builds Sora and all that under Aditya is just extremely solid. Like they are probably it's like the highest concentration of like talent that I've.
Martin Casado
Seen in a while. But is it the Same, like, is it the same? Is it like, are they like totally separate infrastructure? Do they use the same infrastructure?
Sherman Wu
Yeah, yeah, yeah. So it's, it's, it's actually like pretty separate. So. And I think that's part of the reason why we're able to kind of do this. Well, it's like one is like the team needs to be extremely strong, which they are. And then two is um. They're, they're, they're, they're run very separately. They're kind of like thinking about their own particular roadmap. They think about productization very separately as well. Right. Which is how like the Sora app kind of came out of that as well. And then, yeah, even like the inference stacks are slightly different. Are kind of like different. They own a lot more around their inference stack and they optimize their inference stack pretty separately. And so I think that contributes to helping us run things in parallel. But it's pretty hard to pull off for sure.
Martin Casado
Maybe you can educate this on me. Like, so I think about APIs as mostly text based for OpenAI. Do you guys do actual, do you do actual pixel based stuff?
Sherman Wu
Yeah, yeah, we do. We have a bunch. So Dolly, Dolly 2 is in the API. The OG OG model. Dall E2 is in the API Dolly.
Martin Casado
That was like the first real text image model, right?
Sherman Wu
Yeah, yeah, yeah, yeah. That was actually the model that got me to go to OpenAI because it was this summer when I was looking for, I was thinking about something new. It's when Dall E2 came out and it just completely blew my mind. Wow. And I distinctly remember I was like asking it to do the simplest thing, like draw a picture of a duck or something. Like the simplest thing now. And it's like it generated a picture of a, you know, like a white duck. And so that, that was actually the thing that kind of got me to OpenAI in the first place. But yeah, we have a bunch in our, in our API, the Image Gen model as well as in our API. And then Sora 2 is in our API. We launched it at Dev day. It's actually been a huge hit. I've been very, very surprised. Need more GPUs for that. But the amount of use cases and.
Martin Casado
Then from your standpoint, like you can converge that like the API infrastructure. Probably like that.
Sherman Wu
Yeah. So there's, yeah, I'd say on the API side a lot of the infrastructure is shared for those, but once you reach the inference level, they're separate. Right. Because you got to inference them differently and it is that team that has just been really laser focused on making that side particularly efficient and work well separate from the text models. But yeah, we have ImageGen, we have VideoGen, and we'll continue adding more to the API there.
Martin Casado
So it feels like we've been evolving our thinking as an industry on a bunch of stuff. Sure. Is like the models like we've talked about. The other one is like context engineering. It seems to me that like actually how you build agents and expose them has evolved too. So maybe you can talk a bit about that.
Sherman Wu
Yeah, yeah, I think so. At Dev day this year when we launched our agent builder, I got a bunch of questions around this because the agent builder is like the bunch of different nodes and it's like the deterministic thing. And I was like, oh, is this really like the future future of agents? And we obviously put a lot of thought into this when we were thinking about building that product. But the way I think about it.
Martin Casado
Is do you think they came from a point of being constrained by the way they're like, oh, this is too constraining.
Sherman Wu
And like, yeah, I think people are like, it's too constraining. It's not like AGI forward, you know, like at the end of the, again, at the end of the day the AGI will be able to do everything. And so like, why not? Why have nodes in this like node builder thing?
Martin Casado
Just tell it what to do.
Sherman Wu
Yeah. And so I think there's like two things at play here. One of them is like, there is a like practicality component. And then the other thing is I think there are actually like different types of work that exist out there that could be automated into agents. And so on the practicality side is, yeah, like the models today, just like, maybe in some future world instruction following would be so good that you just like ask it to do this four step process and it like always does the four step process. Exactly. We're still not there yet. And in the meantime, you know, this entire industry being born and a lot of, you know, people still want to use these models and what can you build for them? So there's a practicality component of it. When did you launch that Dev day? So it feels like forever ago. Earlier this month. October. It was like October 6th or something. Yeah, yeah. So less than a month ago actually. Okay. It's been crazy seeing the reception to it. By the way, I think the video where Christina on my team demos agent builder is like one of the most viewed videos on our YouTube channel now.
Martin Casado
I will say just Anecdotally, from kind of my perspective, people love it.
Sherman Wu
That's great.
Martin Casado
But I also saw the dissonance too. Like I saw when it came out, people were like, wait, what is this?
Sherman Wu
Yeah, exactly.
Martin Casado
Low code. Low code.
Sherman Wu
Yeah, exactly. That's another low code thing.
Martin Casado
And now people love it. Yeah, yeah, yeah.
Sherman Wu
So there's a predicament mentality piece. There's another piece which is like when we were talking to our customers, we've realized that there's like. Because at the end of the day a lot of this, the agent work is just trying to automate work and like what people do in their day to day jobs realize there's like actually like two different types of work. There's the work that we think about which is like maybe what like software engineers do, which like it's very undirected, there's like a high level goal and then you have like, you know, you have your cursor and you're just like writing, writing code and you're kind of like exploring things and going towards an objective that's like, I don't know, more like knowledge based work, like data analysis, maybe like that like coding is kind of like this. Um, but then there's another type of work which is actually what we realize is like maybe even more prevalent in industry than, than, than, than software. We're just, we're just not aware of it, which is work tends to be very procedural, very like SOP oriented. Like customer support is a good example of this. Like customer support, there's like very clear policy that these agents and people have to follow. Yeah. And it is actually not great for them to deviate from this and like try something else. It's like the, the team really, the, the, the people running these teams just really want these sops to be followed. Yeah, yeah. And this pattern actually generalizes what is an s different work. A standard operating procedure. Yeah, sorry. So it's just like the way in which you need to operate the, the, the, the support team. But like this extends to like marketing, this extends to like sales extends to like a bunch way more than it has any right to. And what we realized is like there's a huge need on that side to have determinism here of which an agent builder with nodes that kind of like helps enforce this thing ends up being very, very helpful. But I think a lot of us, especially in Silicon Valley don't really appreciate that. There's like a ton of work that.
Martin Casado
Actually falls into this camp, I gotta say. Like there's a pattern that's similar to this. I'm wondering if you've seen it that I've seen where some regulated industries actually can't let any generated content go to a user.
Sherman Wu
Yeah, right.
Martin Casado
And so what they do is, I think it's so interesting, they'll like either pass in a conversation tree and you can choose something from here.
Sherman Wu
Yep. So there's some human element.
Martin Casado
So. So as part of the prompt they're like, here are the viable things you can say, choose which one to say. So the language reasoning has happened by the model, but nothing generated comes out.
Sherman Wu
Interesting, interesting.
Martin Casado
Does that make sense?
Sherman Wu
Yeah, yeah, yeah, yeah, yeah.
Martin Casado
And then another one I've seen is like actual pseudocodes I'll pass in like a Python function and then we'll ask.
Sherman Wu
A human to like, like use the pseudocode to write actual code that, that makes it in.
Martin Casado
Or the. It actually has a response catalog as part of it and it has like the logic to apply. And then.
Sherman Wu
Interesting.
Martin Casado
And so like the model takes the language in from the, Takes the langu from the human user and then. Well, like, you know, the logic of how to respond is I can Python code because it just turns out that like, there's been a lot of code written for these types of things and then it actually includes the responses that you would send out. Does that make sense? Actually, a lot of NPCs are done this way. Like actually video game NPC. So. Yeah, so, because the way that I think about it is like, you know.
Sherman Wu
So that way with the NPCs, it's. The actual code being generated by the model is not what ends up making it to the, to the end user, just to the.
Martin Casado
That's. It's not the code is not being generated by the, the model, it's the prompt has the code. So like, so let's say, let's say that I have an NPC and I want the npc. Like let's say you're the gamer and so you're coming in and you're talking to my npc. But my NPC has some logic that it needs to do. Like if you say a certain thing, I'll give you a key or maybe it'll barter, like describing the game logic in English just doesn't work actually if you try and do it. And then like actually scripting the output doesn't work either. If you needed to use it in a game context, like, you would have to know, like, give like a specific direction or a specific this or that. So how do you make these Things behave in a more constrained way. People pass in functions. They'll actually describe the logic in Python. So my prompt will be like, you're an NPC in a video game. The user just asked you a question. Here's the logic you should go through. If the user says this, then do this. It's like the pseudocode. Like, if the user has this in the belt, do this, whatever, whatever, whatever. And then here are the set of valid responses. And so you're almost constraining.
Sherman Wu
I see, I see.
Martin Casado
And then when it actually does do a response, you can validate that it's one of those responses.
Sherman Wu
I see. Highly structured. Yeah. Okay. So the NPC still only exists in that, like the space that it can act in is still only within the space of the program that you.
Martin Casado
So the, the. Yeah, well, the logic is in there so it can have a normal conversation. But like, in as much as you're trying to guide the logic for like, like, like, like game design or game logic. And so like, so you see this with NPCs, but you also see this with regulated industries. For like, I literally can't have it. Like.
Sherman Wu
Yeah, I was gonna say what you described kind of sounds like, you know, Giving the, the SOPs to like, your set of human operators. To like.
Martin Casado
Yeah, yeah, yeah.
Sherman Wu
To stick to it, please.
Martin Casado
Yeah. You, you must say these three things. And here's like the decision.
Sherman Wu
A refund if it's like less than this amount. Yeah, yeah, yeah, yeah. Very interesting. Yeah, yeah. I mean, I mean, yeah, I don't want to equate them to NPCs, but like, this is very similar to similar.
Martin Casado
I'm just saying it's actually like, if you want, if you want to really guarantee what happens, you have, there's like a set of techniques that you do. And like, there's some situations where you want to constrain what they do. It could be from a regulatory standpoint. It could be because you want it to run for a long time. And it also could because I actually have game logic, and my game logic is a traditional program. Like, I have, have like a monetary system, I have an item system, I have a battle system. Like, you can't describe that in English. Like, you have to kind of give it to them so it can behave within that.
Sherman Wu
Yes, and that is, that is exactly the problem I think we're trying to solve here. Right. Just like, if you do not give it any of this, like, it can just kind of go off and do, do whatever. And yet there are like regular regulatory concerns around this and that is the exact use case that I think we're trying to target with Asian Builder.
Martin Casado
That's awesome. Well, listen, we're running out of time and there's a million more things I want to ask you. But listen, I really appreciate your time to come in. It was a great kind of surveying like what's going on and particularly like teasing apart horizontal versus vertical in this patient.
Sherman Wu
Yeah.
Martin Casado
Which I really want to do. So thank you so much.
Sherman Wu
Yeah, thank you.
Podcast Host / Narrator
Thanks for listening to this episode of the A16Z podcast. If you like this episode, be sure to like, comment, subscribe, leave us a rating, or review and share it with your friends and family. For more episodes, go to YouTube, Apple Podcasts and Spotify. Follow us on X1 6Z and subscribe to our substack@a16z.substack.com thanks again for listening and I'll see you in the next episode. As a reminder, the content here is for informational purposes only, should not be taken as legal, business, tax or investment advice, or be used to evaluate any investment or security, and is not directed at any investors or potential investors in any A16Z fund. Please note that A16Z and its affiliates may also maintain investments in the companies discussed in this podcast. For more details, including a link to our investments, please see a16z.com disclosures.
Date: November 28, 2025
Host: Andreessen Horowitz (a16z), featuring Martin Casado
Guest: Sherman Wu, Head of Engineering for OpenAI's Developer Platform
This episode digs deep into OpenAI’s simultaneous development of horizontal (API/developer platform) and vertical (first-party apps like ChatGPT) products for a user base approaching 800 million weekly. Martin Casado and Sherman Wu discuss the paradigm shift away from the “one model to rule them all” vision, challenges of balancing platform and product ecosystems, the stickiness and differentiation of AI models, the growing world of model fine-tuning, open source strategy, and the changing infrastructure of AI development.
Throughout, the conversation highlights OpenAI’s adaptation in the face of both technical and market realities, including pricing models, enterprise demands, and practical agent-building.
Sherman Wu recounts his journey:
On his move to OpenAI:
Company culture contrasts:
“A tenth of the globe uses it every week. Every week.” – Sherman Wu ([09:43])
“Retention of people building on our API is like surprisingly high, especially when people thought you could just kind of swap things around.” – Sherman Wu ([13:57])
“The crazy thing about all this is just how everyone’s thinking has just changed... Even within OpenAI, the thinking was that there would be one model that rules them all... It’s like definitely completely changed since then.” – Sherman Wu ([17:04])
“It actually ends up correlating quite a bit with usage based pricing... Maybe at the end of the day usage based pricing is all you need.” – Sherman Wu ([35:34])
“It was interesting because... OpenAI hadn’t launched anything, it just seemed like it was super anti open source. But... we were just trying to think, how can we sequence it?” — Sherman Wu ([36:41])
“To be clear, like we have not seen cannibalization at all. ... The use cases are very different.” – Sherman Wu ([38:25])
“There’s a huge need on that side to have determinism, of which an agent builder with nodes... ends up being very, very helpful.” – Sherman Wu ([48:20])
“If you do not give it any of this, like, it can just kind of go off and do whatever. And yet there are like regulatory concerns around this and that is the exact use case that I think we’re trying to target with Agent Builder.” – Sherman Wu ([51:54])
Casado and Wu’s conversation is open, highly technical, and continually reflects a real-world pragmatism. They debate not only strategic decisions but industry-wide evolutions, all the while acknowledging the rapid pace of change—and the unpredictability—of building AI both as infrastructure and end-user product.