
Mike Krieger is the Co-Founder of Instagram and now CPO @ Anthropic. In Today’s Episode with Mike Krieger We Discuss: 03:07 Where Will Value Be Created and Sustained in a World of AI? 04:59 Are Foundation Models Commoditised Today? 08:36...
Mike Krieger
I think models over time get more different rather than more similar. I still think we are in, like, day one around: is AI an indispensable part of most people's work? And I think the answer is no. On the DeepSeek piece, people seemed surprised that there were cutting-edge research teams there, and if you were paying attention, that part should not have been the surprising piece. I think we've, if anything, underinvested a bit in two things. One is just having a faster iteration speed on first-party products, and then the second part is on the API side.
Harry Stebbings
This is 20VC with me, Harry Stebbings, and today our guest is the incredible Mike Krieger, co-founder of Instagram and now CPO at Anthropic, one of the best-placed individuals to speak about the state of models today and where they are going. This was an incredible discussion for me to have and I so appreciate Mike being so open.
But before we dive in today, turning your back-of-a-napkin idea into a billion-dollar startup requires countless hours of collaboration and teamwork. It can be really difficult to build a team that's aligned on everything from values to workflow, but that's exactly what Coda was made to do. Coda is an all-in-one collaborative workspace that started as a napkin sketch. Now, just five years since launching in beta, Coda has helped 50,000 teams all over the world get on the same page. Now at 20VC, we've used Coda to bring structure to our content planning and episode prep, and it's made a huge difference. Instead of bouncing between different tools, we can keep everything from guest research to scheduling and notes all in one place, which saves us so much time. With Coda, you get the flexibility of docs, the structure of spreadsheets, and the power of applications, all built for enterprise. And it's got the intelligence of AI, which makes it even more awesome. If you're a startup team looking to increase alignment and agility, Coda can help you move from planning to execution in record time. To try it for yourself, go to coda.io/20vc today and get six free months of the team plan for startups. That's coda.io/20vc to get started for free and get six free months of the team plan. Now that your team is aligned and collaborating, let's tackle those messy expense reports. You know, those receipts that seem to multiply like rabbits in your wallet, the endless email chains asking, can you approve this? Don't even get me started on the month-end panic when you realize you have to reconcile it all. Well, Pleo offers smart company cards, physical, virtual and vendor-specific, so teams can buy what they need while finance stays in control. Automate your expense reports, process invoices seamlessly, and manage reimbursements effortlessly, all in one platform.
With integrations to tools like Xero, QuickBooks and NetSuite, Pleo fits right into your workflow, saving time and giving you full visibility over every entity, payment and subscription. Join over 37,000 companies already using Pleo to streamline their finances. Try Pleo today. It's like magic, but with fewer rabbits. Find out more at pleo.io/20vc. Don't forget to secure trust with your customers. Trust isn't just earned though, it's demanded. That's why over 9,000 companies, including Atlassian, Quora and Factory, rely on Vanta to automate their security compliance. So Vanta helps businesses achieve certifications like SOC 2 and ISO 27001, turning months of tedious work into this beautifully fast and straightforward process. Their platform automates compliance across over 35 frameworks, it centralizes workflows, and it proactively manages risk, all while saving you time with automation and AI. So whether you're just starting or scaling your security program, Vanta connects you with auditors and experts to get audit-ready quickly and build trust with your customers. Get $1,000 off your first year by visiting vanta.com/20vc. That's v-a-n-t-a.com/20vc.
You have now arrived at your destination.
Harry Stebbings
Mike, dude, I am so excited for this. I've literally just been out for a walk and I've been listening to every show that you've done in the last year. And so, as I told you before, I don't want to start with, oh, how did you get into tech and all the normal rubbish. I want to start with a very challenging first question, which is: I, as a venture investor today, have to determine where value is in the future, and I look at the world today and I don't know. And so my question to you is, when we look forward, where will value be generated in the AI-driven decade that we have ahead of us?
Mike Krieger
Yeah, I think it's an awesome question. I get a version of this question often from entrepreneurs, as I went from, you know, purely building startups myself to now running a company that is partly enabling new startups to get created or helping boost their fortunes. And the question I get often is, well, what can I build that is not going to be in the lane of an Anthropic or, you know, another one of these labs? And I don't have a perfect answer, because I don't have the crystal ball. But my sense of where it ends up being most valuable to exist is a place where you have some differentiated go-to-market, some differentiated knowledge of some particular industry, or some special data that only you have access to, ideally two or even three of those as well. So companies that are, you know, within a financial sector, within a legal sector, within healthcare. I mean, healthcare I've gotten exposed to and it is, you know, a tremendously complex ball of yarn. And the work up front, it's not the sexy work, it's actually not the work that you're going to be able to really do in an accelerator or, you know, a short amount of time. But it is the work, the legwork, that you've put in. I think those are durable places to generate value, and then, you know, you can sit in a place where you can pull on what's great from the foundation models. You can do your own fine-tuning if you need it, you can do your own AI specialization if needed. But the thing that's going to give you legs and be durable over the long run is being able to sell into those places, have something that you understand about those places uniquely, and then get better for being deployed there over time.
Harry Stebbings
When you talk about the legwork there, and you mentioned differentiated GTM and differentiated data pools or data sources, does this next-generation wave of AI benefit existing vertical SaaS companies who have those already and can implement AI, or does it benefit bottoms-up, newly created companies in those spaces? Which one more so?
Mike Krieger
That's a great question. I think it can be both. At the highest level, the way I think about AI and product design is you have to dance this very delicate dance of showing the future and dreaming up what the models are currently capable of at their edges, you know, because you want to design for where they'll be, gosh, three months from now, which is how quickly things are moving, but not overpromise and underdeliver, because that's a very trust-breaking piece. And now if you're a startup, you can do a little bit more of the overpromising because people are kicking your tires. The early adopters, they have a little bit more of that sort of willingness to engage. It's much harder if you're an existing verticalized SaaS company and you say we've added AI and people try it and it's like, it's not that good, or, oh, I thought it was going to do all these things, or you said it could do these 30 things and it does, like, two of them well. I think that each of those two groups has a very different challenge. On the former, it's: you have established products, you have established behaviors, you want to skate to where the puck is going without alienating your existing customers. I think we can dive in. I think there's some good patterns for doing that. And on the startup front, you probably don't yet have the data, and it's about landing the initial sort of lighthouse customers. Or you don't have the relationship, but you have some hypothesis about where AI will have an impact on a given industry or a given vertical. And then your differentiation is not the established relationships, it's painting the future and finding ways of delivering that value quickly within a company that might be willing to take that bet on you.
Harry Stebbings
You mentioned there about kind of startups building for where models will be. It's a very challenging time where startup products are so determined, quality-wise, by the quality of the models. And a change in model can seismically change a startup's output, be it a coding software or a legal platform, whatever that is. Should startups build for what we have today or should we build for what we can project forward in time?
Mike Krieger
Really good question. I've heard from multiple people that say, like, my startup was not a startup until Claude 3.5 Sonnet or the second Claude 3.5 Sonnet. I hear that from entrepreneurs: this company was not a company until this model breakthrough, where now, you know, the accuracy went up from 95 to 99 and now that's, you know, close enough for this industry. Or sometimes it's from 70 to 90; sometimes you get those kind of generational leaps as well. So how to figure out where that is? There's times where entrepreneurs have been knocking their heads against the wall within a particular space, whether it's helping people code, whether it's helping with legal analysis, whether it's, I mentioned healthcare, or something in that space. And the lovingly assembled version of what they did, which probably involved multiple tools and was, say, price-uncompetitive because it required an Opus-class model that was not going to be supported by the underlying business, is still worth doing, because when the model arrives, you're not starting from square zero. And so often the companies that do benefit from those model generation shifts are not the ones that suddenly start that day, like, gosh, you know, it sounds like Claude 3.7 Sonnet can do that. It's the ones that have been beating against the wall. I take Cursor as an example. Somebody showed me a list of Hacker News front page submissions from the Cursor founders over time, and it finally broke through. But that was not their first product or their first sort of iteration on it. They'd been trying and going. I don't know exactly how long it was, but it was, you know, not just quickly enabled by the model. It came from that sort of building context, building knowledge, building sort of experience about what has gone wrong or gone well within that space so that the model can unlock you.
So I guess, to be more succinct: don't wait around for the models to be perfect. Be exploring in this space, be frustrated by the current generation of models, and then be very aggressively trying the next one so that you can feel like you can now finally deliver on the thing that you saw in your head, if only the models were just a bit more capable.
Harry Stebbings
I have to ask: you talked about differentiated GTM and differentiated data, and then you said, wow, you know, there are so many different releases and they come so thick and fast. I don't know how to say this. Is there value in the model layer? If it's not a differentiated data game, is it a differentiated GTM game? How do you think about that?
Mike Krieger
I think it's a couple of different pieces. On the model layer, and on the foundation model layer especially, I think about three places where it's worth investing for sort of a long-term place in the market. One is talent. I know it's hard to quantify exactly what talent means, what talent density means. But talent begets talent, right? You become an attractor, especially for talent around sort of a cohesive mission or a story about why you're building what you're building. And I've absolutely seen that at Anthropic, where I love our research team and feel like monthly we get some new significant hire that has come from potentially another lab, potentially academia, and has joined. And so it's an advantage you have to cultivate and also maintain, because people are obviously free agents and they can do what they want to do. So you have to maintain whatever was attractive in the first place. But that is important because staying at the frontier requires more than just more of the same. It requires also figuring out what the right breakthroughs are. So that's one. The second one is, I think models over time get more different rather than more similar. Of course, there's a lot of similar benchmarks that people are looking towards. But there is something Claude-y about Claude and I think there is something GPT about GPT, and they have their pros and cons, and that's both from a character and tone side of things. But then there's also sort of the places where those models really excel. And for us it's clearly been coding as one really big vertical that we've gone after, and it wasn't an accident. And it's also not a thing where we just say, great, it's good at code, let's just continue to be kind of good at code. Seeing that traction and seeing how many companies are now relying on Claude models for code, for example, or for agentic planning, inspires the next generation of what you want to do from a reinforcement learning perspective.
So the first one's talent, the second one is focus and model characteristics that you sort of develop deeper over time. And then the third one: I got this question a bunch with DeepSeek. When DeepSeek came out, it was like, all right, what does DeepSeek mean for you? And I think there's things that we learned from on the tech side just looking at what they were doing. But from a go-to-market and place-in-the-market perspective it has almost no impact. And that's because the relationships we end up having with companies are not: they sign up for the API, they want to just exchange their input tokens for output tokens at some rate. It's actually: hey, I want to be your long-term AI partner. I want to help co-design products with your applied AI team. I want to dream big with you. I want to think about not just your API but also Claude for Work. And so it looks more like being a company, which I know sounds trite, but what you're providing people is a partnership, not just AI models. Maybe it's good inverting that all to see what the failure mode looks like. I think it is resting on your laurels or not retaining your best people, just believing that making the models incrementally better on every benchmark is enough, and then treating the API as just a way of exchanging money for intelligence without figuring out how to be more of that AI partner. If you can't do all three of those, I think you're in trouble.
Harry Stebbings
I do want to go into the coding element in a minute. I do just have to ask, when we look at kind of blockers or barriers to progression: when you look today, what do you think the biggest blockers are? Because this is one where I hear completely disparate opinions from different people, whether it's Alex Wang or whether it's Jonathan Ross at Groq. What are the blockers: compute, data, algorithms?
Mike Krieger
It's getting the environments in which the models get trained to better and better match real-world challenges that aren't sort of single-shot. I know Alex has been thinking about this problem as well, because we talked about evals for agentic behavior as one sort of very specific version of the broader thing that I'm talking about, which is, even within the realm of software engineering, the work of a software engineer is not just to produce code. It's to understand what needs to get produced, to work out the timelines with their product management counterparts, to deeply understand the requirements and deeply understand the user use case that they're building for, and then also to deliver whatever they built in a way that it can be tested and iterated on, and then take in user feedback at the other end if they're building some kind of public-facing product. There's no eval for that. Right. It's interesting that we call the sort of most common software engineering thing SWE-bench. Right. To actually be a SWE is a lot more than just, you know, I looked at a pull request, I produced this diff, and then you're going to accept it or not. So building environments and evaluations that better mirror that. We think a lot about office professionals at Anthropic in terms of one of the use cases that is going to be potentially really multiplied by these models in the future. Nobody's really evaluating that well. There's something around research where we're starting to get a bit better around evaluations. There's extremely convoluted, and I mean that in the best way, evals like Humanity's Last Exam, which are very much, okay, multi-step reasoning.
But there's yet to be the sort of: I show up to a new job, I quickly understand what my role is, who is who in the organization, what are the relationships that are being mapped, where to go find extra information if I need it, and then be in the sort of run loop of, you know, the functioning of the business. That's a hard environment to sort of capture. And so to me, figuring out how we better either break that down into component parts, which is probably part of the story, but also think about it holistically, is the biggest blocker to at least one slice of progress, which is how do models go from being extremely good at extreme slices of things to being more generally helpful collaborators.
Harry Stebbings
Before we dive into those kind of specialized products: on the data side, I had Adarsh on from Mercor recently. I asked him the question, and I'd love your thoughts, which is, when we look at the future of data within models, will there be more synthetic data that compounds on top of each other, or will human data continue to be the predominant data source that drives model progression? How do you think about that?
Mike Krieger
I think for the models to improve, you do need a story around how you perhaps seed it with original human data, but then can generate all these synthetic environments in which it can sort of pathfind and explore. Claude's been having fun playing Pokemon this week, which has been a good but kind of funny distraction for our own research and engineering teams. Like, what is everybody doing? They're like, oh, we're watching the Claude Plays Pokemon livestream. But I think games are an interesting example where you can imagine a lot of different runs through the same game within some constraints and rules. That gets a lot harder when the problem space is less well defined than, did you make it out of the Viridian Forest? I never played Pokemon. I'm learning just watching this livestream. But it's still important to be able to take sort of golden paths, but also synthesize a variety of approaches through it so that you can still think about how the model can progress in the face of uncertainty. So I think it absolutely has to be a mix. And I think the best models will come from that combination. Like, for code, it's having good foundational understanding of code and good examples, but then also being able to explore a really wide variety of paths through that. The other part that is still, I think, underappreciated is how do you measure and evaluate and get data in for character? And I'm going to use a very loose word, which is vibes. What exactly is the feel of using a model? We don't really know until we actually sit down and play with it. Which is in some ways kind of a nice property of it, because it means there's almost this very qualitative, human-like aspect to it. But it also means that you don't have good regression testing on it. Sometimes we'll go from Claude 3.5 to 3.7 and people will say, oh, Claude seems friendlier but more terse, or Claude seems more willing to answer my questions, but I wish it was better at creative writing.
These things are not easily evaluable. This goes to the data question. And so I think it is important to both be able to have the data in there around these softer skills, but then also have the evaluations for them.
Harry Stebbings
You know what I find bizarre? I find it bizarre that we're able to choose models and you may go, well, duh, you will do because there's specializations within them. But I think when you project yourself forward three to five years, you will not be selecting which model you use. It's like selecting which Google you use. Am I completely wrong or do I completely miss the point?
Mike Krieger
No. There's a concept that I love. My background was in human-computer interaction, and you might have heard this term of leaky abstractions, right? Which is, as software builders, we try to do a perfect job of sort of encapsulating all the complexity under some little shell, and then the users should not have to think about any of these things. And the reality is the current state of most AI product design is an extraordinarily leaky abstraction. Take having to choose the model. Why should you choose between Opus, Haiku or Sonnet? Most people don't understand the difference, right? Or if you go to the OpenAI dropdown selector, there's a lot of models in there and every single one of them has a good reason for being there. And yet the overall experience is one of, why would I choose one over the other? Oh, this capability is available here but not there. We suffer from this problem as well. So that's one: model selection. The second one is, once you understand how these models are built, they build up context. They have turns. Every turn actually has the full context replayed to it. That's how it's able to make the next inference. What that leads to is this experience where every chat is different, which I always think of in terms of talking to a coworker. You might have different email threads, but it's still one coworker behind all of that. And if you reference their favorite sports team or you reference a project you've worked on together, it's not like, oh, I don't know what you're talking about, or, I'm going to have to go retrieve my memory. It's sort of like a shared underlying piece. So that's another: it's forcing people into an understanding of the models that I don't feel like we should need people to have. And the last one is prompting. As much as things have evolved, and we've done a bunch of work around how we take simple human prompts and then translate them into ones that are very model-optimal,
I want to make that absolutely transparent to people, where it's not something that they're engaging with. And if the model has a lack of clarity on the problem or needs help understanding it better, then it engages in conversation, rather than you seeing the difference between somebody who's an extremely good prompter versus not. Now, that gap closes generation to generation, but we need to collapse it even further.
Harry Stebbings
How do you think about model quality versus product and UX, and how to prioritize and think about those two and the relationship between the two?
Mike Krieger
You can't separate the two anymore. And I think about being a UX designer: I was just in a product review right before our call and I was thinking about Instagram product design sessions. It was pixels, some synthetic data or maybe real data. We took my feed and then reformatted it to this UX that we were proposing. But there's not a lot of non-determinism there. You're going to put it out to the world and maybe people will use it in some ways. But designers and product managers and definitely engineers today need to think, all right, what I'm actually doing is I'm designing a scaffold and a product around a fundamentally non-deterministic system, which means the evaluation, the model quality, the prompting on the back end are all part of the product design, and it's going to have direct implications. So one example is you can prompt Claude to ask follow-up questions or not. And that might be what you want in one part of the product, but not another part of the product. Right. You might prompt Claude to go and think longer about a problem and do more reasoning, or not. And again, these are all decisions that you are making upfront in product design, and they're going to have this manifestation in the actual product. And then the other piece: we talked a little bit earlier about how, as a startup founder or as somebody who's doing maybe classic B2B SaaS, you need to triangulate where the models are, where they're going and what the user needs are, together. That's going to be the case in your product design as well, where you're doing the evaluations hopefully upfront to see if what you're doing is even possible with the current models, or at least having an eye out for where they might be. But models change over time, products change over time.
If you don't have a good framework around evaluation, even regression testing those evaluations, you might end up launching a product where three months later people are like, oh, the product used to be good, but something else has happened where it's no longer serving that purpose. And you're like, but I'm not sure which of these three things changed. Is it the model? Is it the product design? Is it the introduction of a different feature? The system prompt got longer? It's in many ways the most complex product development work I'll ever do.
Harry Stebbings
I interviewed Sam in London from OpenAI and he said one of the joys that they have as a startup is that they can just release things much quicker and it doesn't have to be perfect. And actually the challenge is as they've got bigger, you have more and more weight and pressure placed on every release. How do you think about that release? And it doesn't have to be perfect. Let's get it in the hands of users versus now. Anthropic is a massive company with millions of users. It does. How do you think about that as. As the product leader?
Mike Krieger
I think about this a lot, especially because you have different surfaces and different audiences that have different expectations of stability or sort of desire to be on the cutting edge. And so in an API product, what people value is predictability and stability and the ability to opt in to something that's more future-facing. Right. And so it can be a very opt-in thing. So I remember we launched prompt caching, which is a big cost savings for people. Initially we did that via a beta header that you had to opt into. And a lot of what we do on the API is in that bar. If you do that for our more consumer-facing stuff, that's really lame, to have people opt in. Really you want to be able to sort of iteratively release and be experimental with folks and, you know, you don't have to totally break their experience, but you've got a little bit more of that permission. And then we have all these enterprise customers that are using Claude for Work in an enterprise. Now, I think AI adoption in the enterprise is still an early-adopter product. So you can get away with more than, you know, I don't know how many releases Salesforce does a year, but I know a lot of these companies do, like, two, right? Or three. And it's usually oriented around some big event that they can do. We're really far from that. We're still launching pretty quickly, but we're honestly still finding the balance there. You know, is it a monthly drop? Is it, you know, you ship as often as you can, but there's an admin opt-in on each kind of thing? That adds complexity as well. And so it's a great question.
I would say it's an active topic of conversation, how raw or how quickly we can ship, knowing that we want to bring things out to the world and you don't know how they're going to be received and you want to learn. But as you accumulate sort of notoriety or, you know, people start depending on you for workflows, you can't treat that completely sort of wantonly.
Harry Stebbings
Are we in a product marketing nightmare? What I mean by that is: we have DeepSeek release something this week. We have OpenAI release something this week. We have Anthropic release something this week. We have Mistral release something 10 days ago. Bluntly, every single day there's a new release, such that the world maybe gets apathetic. How do you think about that, and how does that inform how you think about product launches and messaging?
Mike Krieger
Yeah, I mean, it is much more complicated. At Instagram, the things that you had to watch out for, the big rocks, were very known in advance. It's like, don't launch anything during WWDC week; that's going to be a flurry of announcements. Same with the September iOS event. You know, there might be some other big rock, like holidays. So much easier from a product marketing perspective. Whereas here, it reminds me a little bit of Crossy Road, where you're like, okay, the car's going by. All right, there's a gap in the cars. Like, launch tomorrow, or, like, now it's good. But, oh, now we hear there's a rumor. It's so much harder. And I've heard from folks at other labs as well that everybody's kind of trying to read the tea leaves and be like, all right, is it quiet? Is it okay to launch now? Or, like, I think we can do it next Tuesday. So it's much harder.
Harry Stebbings
Go, go, ship it.
Mike Krieger
You know, it requires a completely different approach. And, you know, I give credit to our product marketing team, because they've had to orient from a point where, you know, with Claude 3.7 Sonnet, we launched on Monday and we locked the blog post for that Sunday night at 9pm, which is not best practice from a marketing perspective. You know, we were briefing press that day, on Sunday. Thank you to folks that hopped on the phone with us on Sunday. But that's the point where everything is done and ready and locked and we can go. And so it does involve that sort of ability to react quickly and be nimble. I mean, even things like, when we release a model, there's a model card and there's evaluations and a comparison table. There are things in that comparison table that were released the week before, right. Like, Grok 3 was just a week prior.
Harry Stebbings
So what happens when those are released, when Grok 3 is released? Like, jokes aside, does everyone at Anthropic and OpenAI go, oh shit, they beat us again? Or like, oh shit, we won? Yeah.
Mike Krieger
One of the things I try to do to support the team there is remind them that the model releases are going to happen, and at any given point you are going to be in the "it's so over, we're so back" cycle. You have to live that in AI, and you can't get too down about one release because, yeah, for sure, it is inevitable. And sometimes you're lucky and there's two or three months where the model that you launched or the product that you launched is still state of the art across all the things you really care about. Sometimes it lasts a week, and you can't over-rotate on either of those. You can't rest on your laurels, you can't be all that. I think the thing that's really useful too is a chart I show on almost every sales call, which is just mapping from Anthropic's founding to where we are today and the milestones. And at any given point you could say, wow, Claude 2, that's pretty far behind. Oh, Claude 3, state of the art. And then, no, it's not. And you've got to look at the trajectory and trust that you are going to continue to make improvements; that's number one. And then number two, remind yourself that if everybody switched every single day purely due to evals changing, one, that would be an insane thing to do to your user base as a provider of software. But two, that would make for an even crazier industry. Over time you start learning that people don't just deploy models; they're doing fine-tunes, or they're deploying models plus they've done a lot of really bespoke work to make that model be great for that use case. It's not a thing that's going to switch overnight. Or you're one of three or four options within a model selector, right, for example in a coding environment, so you're still in the mix and you still have a chance. But I'm not sure if it's finding the meditative zoom-out angle of it, or just getting used to the bumpy ride, or some combination of the two.
But it is for sure a thing that, every time there's a model launch, I assume every one of those labs is either watching the launch stream or looking at the evals, and then it's, all right, now we've got work to do.
Harry Stebbings
I would argue that brand is the most important thing. To your point, people aren't switching every day. They're kind of like, oh, I'm a Claude person or oh, I'm a ChatGPT person. And they kind of identify already with their models. Do you agree with that statement or do you think that's too glib?
Mike Krieger
I don't. I think that is right. I think especially on the consumer front. You know, I was just reading Ben Thompson. He has Nat Friedman and Daniel Gross on there pretty often, and they're talking about some people being Claude people and some people being ChatGPT people. I think that definitely happens, where you like the personality, you like the interface design, you like the vibe. Again, it actually reminds me a lot. You know, we had this interesting back and forth with Snapchat over the years with Instagram, and then even before that people would launch a new product that's like Instagram but just for super high-end photographers, or with this additional twist, or one photo a day, you know, it's BeReal. And I had this fake formula. I'm not the mathematician, clearly, at Anthropic, but it was, you know, social networks are made up of format (or formats that you have in your product), audience, and vibes. Format: for Instagram you've got stories, we had feed, and then eventually we had video. Audience: initially it was sort of hipstery photographers, and eventually it grew to be anybody that's really interested in sort of visual storytelling or visual media. But the vibes of Instagram, even when we had more product similarities to a Snapchat, even to a Facebook, the vibes were very different. And I don't know what that fake formula is for AI products yet, but I think it's some version of that, where model personality is probably one of them. There's likely something around the scaffolding prescriptiveness of the product that you're working around. And then there's vibes, and again, hard to measure, but absolutely there's.
Harry Stebbings
Can I ask you a hard one? When we have so many different models and so many different providers, open source is a very viable possible route, and distillation is looked at in a shady way. Is distillation really wrong if it ultimately propels the space forward?
Mike Krieger
Well, even, let's take within the labs. I assume every single one of the labs is using it even within themselves. It is very valuable to be able to take the knowledge of your highest-end model and then be able to make it lower latency, more affordable, et cetera. So there's that loop. Overall, I think the places where this gets interesting are, one, do we want any nation to be able to distill models from any other ones? Personal answer is no. I think that there's value, as AI gains in capabilities, in being really thoughtful about that from a national security perspective. And then the other piece is, to have the advancements happen at the rate that they happen and be sustainable long term, you do need the labs to be able to commercialize all of that training and innovation, et cetera. And I think finding the right models for that long term is important. I think the open source models, take Llama for example, they've been able to do that from their own research perspective in data ingestion and training. And so I guess I would say distillation does not feel essential in order to unlock those things and poses other issues, even just from a terms of service perspective.
Harry Stebbings
Does Llama show that there is no value in the model and all the value is in the data? If Facebook are willing to give it away for free because they know that no one can copy the data that they have, is that what that shows?
Mike Krieger
I think it's a good, interesting question, whether the quality of Llama is due to the fact that they can (I don't know if they've said that they do, but they clearly can) train on Instagram and Facebook, et cetera, data. Or is Gemini better for being able to train on YouTube? It's actually clearer to me that Gemini benefits from that. Whenever they have a good video understanding demo, for example, I'm like, well, somebody has probably the largest repository of video in the world and can likely train on a lot of those pieces. Less clear on the Facebook front. I've never heard from people, gosh, you know what Llama does extremely well is generate good content that would work well on social media. It just seems like a good general purpose model. So I'd actually go back to: the value is all in how good is your team, do you have the underlying data that you need to do it, but then also how useful is your model in actual use cases? And that is the highest order bit. I almost wish I'd started with that, because, evals aside (evals are really useful for hill climbing and for internal research), they don't tell the story of, is the model going to be excellent at what it needs to be excellent at or deployed for? Or even if it is excellent at that thing, is it only excellent at that thing in very narrow situations? Or is it something where, as an entrepreneur outside the labs, you can rely on the model to be your representative, I guess, in that product? So yeah, I think the value for the labs, the value's in the team, it's in the model's ability to actually perform the right actions in the real world without so much non-determinism that it becomes sort of unreliable.
Harry Stebbings
I'm going to ask one question on this. It's not a trap to go down, but I've spoken to Alex Wang about it on the show and Eiso at Poolside on the show, and they said we deeply underestimate China's ability in AI. Do you agree that we underestimate it?
Mike Krieger
Yeah, I think the DeepSeek piece, people seem surprised that there were sort of cutting-edge research teams there, and if you were paying attention, that part should not have been the surprising piece. Instagram was blocked in China fairly early, and then we saw the sort of emergence of a parallel world of startups: if you take out Facebook and Instagram, what happens and what emerges? And those products were often very high quality. They demonstrated a lot of creative thinking and were built at scale too. They were solving problems. People love talking about the super app and WeChat, and there were some technical challenges solved by those at scale that were of the same scale of challenges that Facebook was doing. So it would absolutely be a mistake to have underestimated or continue to underestimate China's ability to both train at the frontier, especially if they get access to compute, and then continue to innovate there too. So I think it's a pretty Western-centric view that I've definitely seen happen in more traditional software, around, like, well, maybe it's caught in this 90s, early 2000s view of, oh, all they're doing is replicating what's already been working elsewhere. There's been products that I think take a differentiated view and grow internal to the Chinese market and then sometimes make it over externally. I mean, TikTok being an interesting example of that on the other side.
Harry Stebbings
Final one, before we move into verdict products: did DeepSeek cause you to rethink anything or change anything about the way that you progress?
Mike Krieger
There's some architectural pieces, and I won't speak for the research team because they're definitely the deep experts, where they're like, oh, interesting, that's worth us considering. Or some ideas that had been considered and maybe were worth reevaluating. So I think there's that piece there as well. It's interesting. Our plan was already to show the chain of thought when we launched our reasoning model. So that was not a reconsideration. But maybe it was interesting to see somebody else do that. And there's some user interface kind of details in there, and I think Grok does as well now on theirs. So I'd be curious to see how that evolves. To your distillation question, that might be a reason why more labs either choose to not show or otherwise obscure the chain of thought down the line. The other piece, from a product perspective, there were two. I think that's the under-talked-about piece of DeepSeek. I think they were able to go from nobody knowing about them to them being, frankly, in many circles better known than Claude. My great aunt was calling me about DeepSeek. I'm not even joking. It was like the cliche was actually happening. It was like, what do you think about DeepSeek? I'm like, great, it's broken through. And that, to me...
Harry Stebbings
What do you think they did to break through that maybe Claude hadn't?
Mike Krieger
I think there is a lot of interest, of course, in world politics right now, and having the narrative be that this was much cheaper. And whether that was exactly true or they were able to figure something out, it's the story. And frankly, and I've had this conversation with our marketing team as well, I don't think we tell the Claude story well enough externally yet, around what is different or what is notable about the fact that, you know, at Claude 3 we were training a model at the frontier that was state of the art with a team that was much, much, much smaller than any other lab. Right. And I think we've always been very efficient with our compute as we train. So I don't know whether that was a story that they told or was just told for them by the media because it legitimately was a really compelling story. The sort of uniqueness of the moment was a big, big piece there. And I think especially, like, it's January, you know, new presidency, China relations, it fed into the moment really, really, really well. So I think that worked well. And the second part, on the product: they went from not having a product to having an iOS app that actually had a lot of good details. And for me it was a good, let's say nudge, but it was stronger than that, like a shove, around: we need to be getting some of the ideas out to market quicker without, to your earlier question, focusing as much on exactly the polish that needs to happen in every situation, and instead be willing to put it out there and learn. Because sometimes the novelty of experience is itself valuable. Right. It was the first time most people experienced the live chain of thought. That's interesting. And I wish we had done that sooner, because it would have been novel for people to experience that.
Harry Stebbings
When you look at usage, you see emerging markets, usage retains and you see western markets not really at all. How do you think about them as a standing credible threat?
Mike Krieger
I think that they already have this sort of... they're known at a level where that has some ability to generate that ongoing staying pattern, et cetera, on a retention front. I think if all we're doing in these AI-first, sort of lab-generated products, even six months from now or a year from now, is asking questions, maybe sometimes having slight proactivity, I don't think that's differentiated or interesting in the long run. It should be, wow, I can now do something uniquely because I am using Claude or I'm using DeepSeek or any one of these products. And it unlocked hours of work for me, and it made me smarter, and it made me a better partner to whoever are the important people in my life. It has to transcend beyond surface-level utility. Some people find the deeper level, don't get me wrong, and those are the people that are your DAUs right now. But for a lot of people, they'll try it. They generate a poem with it, they write a letter to their son. There's all this stuff that they can do that provides some value in the moment. But I still think we are in day one around, is AI an indispensable part of most people's work? And I think the answer is no for most of them. And so I think DeepSeek's and, honestly, all of our product staying power will come from who can get there and do that sustainably over time and have the right product design, the right integrations, and the right deployment of that to actually succeed.
Unknown
And who can build those products?
Harry Stebbings
My big question as an investor, often, is when does a model provider move into being an application provider? And I'm just fascinated to hear your thoughts around what is attractive enough where you dedicate the resources to become an application provider, not just a model provider enabling.
Mike Krieger
I think the two main criteria that I look at... because our team, for all of Anthropic being big, you know, I think we crossed a thousand people, our product team is, you know, maybe a tenth of that. It's by Instagram year-two standards very large, but by large SaaS company standards very small. We're somewhere in between all of those, and we're supporting, like, we have Claude Code now, we have the API, we have Claude.ai, we have Claude for Work. So it is across a lot of different surfaces. So I think generalizability is really important. Even if we pick a persona or a vertical to go after, we are going to be building things that are general purpose as a rule, with maybe some specialization at the user level, but not at the... I don't anticipate us building a lot of verticalized experiences that are fairly bespoke to a given model workflow or use case. So I think that's one piece, like.
Harry Stebbings
Translation, transcription, customer service, quite horizontal kind of homogeneous things. That seems like right in the pathway.
Mike Krieger
I think it does, except for the fact that I think that there's a lot of valuable workflow knowledge that means that you can retain a differentiated product over time.
Harry Stebbings
Like if you're a power user, yes, perhaps. But if you're not a translator and you're your mum who maybe uses it once a month for that odd thing that she needs.
Mike Krieger
Yes. Yeah. I think the idea of, great, we can help you translate this, and that, for an individual user, will get you to pay a $10 monthly subscription, that feels iffy, because I think that the models are quite good at that already. Right. And maybe you're right, there's not the... like, if you play with ElevenLabs' console and workbench, a lot of the features that they've built are very clearly for people that are translating hours, or voicing hours, of content with a reliable voice across a whole workstream. Descript, I think Descript is some of the best product design in AI, and they've clearly put so much time into the workflow of it. I had to use it once for a personal podcast. It was like, oh, this has clearly been built by people who are day in, day out sitting in this workflow and understanding it. So yeah, I think that maybe we've come to some synthesis of our views, which is there's value in the more professional use cases and the workflows that are unlocked by that. And I think on the consumer and maybe even prosumer side, it gets good enough from a basic AI product perspective.
Harry Stebbings
When you look at what you're brilliant at today, you do so well. As we said on the code front, is there a roadmap here to put your own IDE in, code agent in? How do you think about that?
Mike Krieger
Again, with the product-focused lens, I think we have to pick our bets carefully. We built Claude Code, which we just released, as a sort of command-line agentic coding tool internally first, because we just wanted to accelerate our own team. And after seeing it play out for a couple of months, we're like, this is good. It's not a solution to all coding problems and doesn't obviate the IDE, but it's useful enough to us in enough cases that we want to see people use it out in the real world. And shipping is never free, right? You've got to name it something externally, we've got to find the right packaging around it. There's a go-to-market piece. We do it carefully. I think my view of where the models are today is you still need hands on keyboard, and you still need that exchange of, hey, I did this. Is this right? Well, let's pursue this direction. Yes, this is great, let's put up a pull request. Or, we went down kind of a false trail, let's unwind the stack, metaphorically and maybe in actual usage, and then keep going. That's why I think that there is a role for this sort of in-between of the IDE and the full-on Cognition Devin, full-on delegation of tasks, which can be used for a certain category of tasks. Our product engineers love Claude Code because a lot of product engineering is, all right, we've got to update the backend, we've got to create the front end, we've got to submit these things for translation. Oh, this still doesn't work, let me do this. And it's that sort of build-the-product end-to-end workflow that does well with a thing that can work agentically across a lot of different things. I did two pull requests last week. I hadn't coded since joining Anthropic, which made me sad. And so I got to finally use Claude Code. I had not opened our code base before, so I don't really know how it's even structured.
But Claude Code is very good at finding the file that has the right piece and then going on and making edits. And obviously not everybody's in the same situation I'm in. But it is really valuable for those use cases. So when I think about the coding space and where we can play and add value, it really is on the agentic side. It's not on the IDE side. There are other companies that wake up and go to bed every night thinking about how do we make a great IDE. And that involves things like low-latency autocomplete, the right integrations, figuring out how you play with the VS Code plugin ecosystem and all of that complexity. Right. There's a bunch of work there that is valuable and different than what we're doing. I think we can really play in, let's be talking to these models and be doing real work with them in that agentic loop. But recognize that they're not yet at the place where for many use cases you can let them kind of run free for hours. You need that more human-in-the-loop piece.
Harry Stebbings
You power and you work with Cursor, Codeium, StackBlitz. My question to you is, when you look at, bluntly, as you said, that being the first time you've coded since joining Anthropic, and the changes that we see in developer behavior, what will the role of a software developer be in three to five years' time, do you think?
Mike Krieger
Yeah, I mean, I think it starts to look different already. I was a huge early proponent of GitHub Copilot. I think my quote was on the homepage for a while, I don't know if it still is, because I saw the potential. And then even when GPT-4 came out, before they had multimodal, I was trying to do Swift with it. I would draw ASCII art of the screens I was trying to build for Artifact and then go make coffee, because it was at that time quite slow, and come back and it had like an 80% version. Obviously now it would be a 95 to 99% version with something like 3.7 Sonnet. I think the skills that become important: one, I think it becomes multi... what am I looking for? Like, multidisciplinary, where it's knowing what to build as much as it is knowing the exact implementation that you want. I love that about our engineers. Many, maybe even most, of our good product ideas come from our engineers and come from them prototyping, and I think that's what the role ends up looking like for a lot of them. The second piece is, code review really changes when all of a sudden you're mostly evaluating AI-generated code. I even experienced this. I put up a pull request and some of the comments that came back were, yeah, Claude Code does this sometimes. We don't actually use default arguments in this case. And I was like, oh, well, damn it. So I was sheepish. If I was coding it, I would have probably noticed those patterns a little bit better. And so there's kind of two sides that need to happen. One, models and just the infrastructure on models need to learn from code bases and code reviews better so that they can produce code that feels idiomatic to that company. But then also, how do we evolve from being mostly code writers to mostly delegators to the models and code reviewers? That's what I think the work looks like three years from now.
It's coming up with the right ideas, doing the right user interaction design, figuring out how to delegate work correctly, and then figuring out how to review things at scale. And that's probably some combination of maybe a comeback of some static analysis or maybe AI-driven analysis tools of what was actually produced. Like, is there a security vulnerability? Is there some other flaw? Is there a bug? Computer use plays a part, so, you can tell I get very excited about the space, automated testing of UI. So what would be great is you delegate the task. A year from now, three years is crazy, let's even take a year from now, you delegate a task to it. When you come back to it, it says, I evaluated these three approaches. I tested them all out. I had a different agent actually try them out in a browser. This one is the one that worked best. I've run it through this additional agent that did a vulnerability test. It all looks good. All we need to do is help you resolve this one question. Let's review this particular critical section of code to make sure it's what you really wanted. That feels like you're suddenly empowered to be more of a manager and delegator to these things rather than just a partner in the loop.
Harry Stebbings
You said three years sounds ridiculous. A year would be much more realistic. I agree. And I get you. When we look at the speed of scaling, do we think that we hit a plateau or an asymptote in product releases, the speed of development because it feels so fast. Now to our point earlier, do we hit that plateau or do we continue in this exponential progression movement?
Mike Krieger
That's a question I think a lot about. I started the year by looking at our product development process and looking at where we are Claude-ified, like where are we using Claude and where we're not. And you look at it and say, okay, Claude can be useful in sort of taking an initial idea and creating a PRD out of it. And Claude can be useful, obviously, on the coding side. Claude can be useful in synthesizing a lot of conversations that people are having about a product and kind of finding the thorny issues of disagreement. Driving alignment and actually figuring out what to build is still the hardest part, right? That is actually the only thing that is still best resolved by just getting together in a room and talking through the pros and cons, or going off and exploring it in Figma and coming back. And so, like any dynamic system, if you optimize one piece, all of a sudden something else becomes the blocker or the critical path. And alignment, deciding what to build, solving real user problems, and figuring out a cohesive product strategy are still very hard, and probably the models are more than a year away from solving that. That is the constraint. It's why I'm really bullish on at least startups being able to explore the space, because I remember this from both Instagram and Artifact days: when it's just a couple of you, alignment is a coffee conversation in an afternoon rather than steering the ship of a large company that has commitments to customers and all of those things. That's still a very human problem that I think we're at least three years away from the models solving at that level of abstraction.
Harry Stebbings
Final one, I just have to ask before we do a quick fire. We mentioned some end products there and building them. When you think about building end products for consumers versus building the API division of the company, which is very significant, how do you think about the balance and the trade-offs there between building an API business and building an end-user consumer business?
Mike Krieger
There's what we get out of each, and I think about that trade-off. So I think we learn a lot more quickly with first-party products. As a specific example, with Claude Code, within a week of it being deployed internally, we had found a way in which one of the tools that it has access to, the model wasn't using as well as it could have, and that made its way directly into 3.7 Sonnet. That's a way in which internal dogfooding of the first-party tool directly led to a model improvement in the next generation. There's a few other places where we've hit that, even building first-party products. Much harder with a third-party product. Right. Like, they'll tell you if something's wrong, but it's a bit more... Even though we work really closely, including with some of those coding startups that you mentioned, it's still not the same. So there's a lot of value in what we learn there. Then there's the sort of stickiness, and we talked about brand and loyalty. I think it's easier with a consumer if you can build a brand around a product than just an API. The fact that we power a lot of these coding products is visible to people. It's often the default in the drop-down selector. If you're in the know, you know, but not everybody does. And it's still, you know, not the thing that they downloaded, not the thing that they installed, that they're going to tell their folks about. But yet it's also a place where we've gotten tremendous distribution, and we're not going to invent every company. And, you know, this way we can kind of play this sort of... it reminds me of my investing days, where you get to see a lot more and there's more than one shot on goal, and it's not all of those things. And so it's actually been, from a resource allocation perspective, a fairly even split. I think we've, if anything, under-invested a bit in two things. One is just having a faster iteration speed on first-party products.
It's like my current obsession. And then the second part, on the API side: how do we build abstractions beyond tokens in, tokens out? And every time we do that we get great feedback from people. So whether that's helping the model plan and work agentically, whether it's having the model build more knowledge graphs and repositories of how companies operate internally, if you're using the API to build more of an internal knowledge product, whether it's perfecting tool use, whether it's understanding very large bits of context and having memory that transcends conversations, those are problems that I think are worth us solving on the API, because those are places where we can take what we learned on the training side and directly map it to the API and build good products around it. So that's how I think about those. But it's a new problem. Instagram was easy. It was like 95% product, 5% API, and, you know, that's all we really needed to do.
Harry Stebbings
What can and will you do to increase product speed on the first-party consumer side?
Mike Krieger
I think there's two things. One is recognizing that we were running, I think, a larger-company playbook for what is actually, like, we're still a startup. Our products, even if the company has good traction, and the API business is doing really well, and people are using Claude.ai and upgrading to Claude.ai Pro, it's still early days, and it's still do or die, or make it or break it. So we need to operate in that way. And so, getting the right people together sooner, faster, and ignoring organizational boundaries. We got too calcified, I think. Like, oh well, this is on this team's plate versus this team's plate, and oh, you can't get this done this quarter because it's not on this team. I mean, I get why organizations evolve, and some of that is natural, but we can't afford that right now. So it's been a lot more, who are the right people? Let's get them together. Let's clear out all the other distractions. And then, like, let's clear out my calendar so that I spend more of my time in product review and design review than I do in administration.
Harry Stebbings
DeepSeek showed the benefits of constraints. Do Western companies, respectfully, you and OpenAI, have too much money?
Mike Krieger
The way I would put it is, the adoption that we've gotten of our products is ahead of their actual, true product-market fit, because they are still the best ways of getting the models, and I don't think that's durable over time. So I think that's not a thing to rest on. And two, I just think we're underserving people, because I don't think we've gotten the right products yet. So it's what I wake up stressed out about every morning, or inspired by, depending on the day. I think we've got so much work to do on that side.
Harry Stebbings
Listen, I want to do a quick fire round. So I say a short statement, you give me your immediate thoughts. Does that sound okay?
Mike Krieger
That sounds great.
Harry Stebbings
What's OpenAI done better than you on?
Mike Krieger
They've moved faster at shipping V1s, even ahead of where the model is sometimes.
Harry Stebbings
What have they done worse than you on?
Mike Krieger
Probably personality and having the features they built be cohesive.
Harry Stebbings
Which alternate model provider do you most respect?
Mike Krieger
OpenAI. I think that they've balanced first-party product development and an API that people use at scale as well. But we had an Instagram principle that was do the simple thing first. And I think they often do the simple thing first.
Harry Stebbings
If you could rebuild the Anthropic product and stack from scratch, what would you do differently?
Mike Krieger
I love this question.
Harry Stebbings
I do too. It's a good one, isn't it?
Mike Krieger
Yeah, it's a really, really good one. I think the things that we built that were actually very valuable last year are now feeling like they're having... this is a long answer rather than quickfire, I'm sorry... some cost to the information architecture, which almost sounds like a very nerdy way of describing it, but basically people should not have to think about projects versus artifacts versus chats and how they all relate. And I think tearing it all down and being like, what actually matters is, do you have the right context in the right conversations? Do you feel like you can always know where to go next in the product? And is Anthropic, and Claude itself, being a helpful sort of guide to what work is most important to do next? That is a different paradigm than, I know to create a project, and then, if you get good at that, it's an amazing product, but there's a lot of steps along the way. So that was that on the product side. I think that's the fundamental thing. On the stack, I mean, Claude.ai and probably chatgpt.com were very much initially just built to be sort of showcases of the models and not really built in a lot of ways to be the sort of foundation for a much more complex, multi-product sort of thing. And I think we have an active effort right now around tearing down some of that and rebuilding the core UX to just feel good. It doesn't feel great right now. It feels a little bit like it's been an evolution of a product that served a purpose at the time but now is being asked to do way more things, such that the incremental thing is now both harder to add and getting slow.
Harry Stebbings
What have you changed your mind on in the last 12 months?
Mike Krieger
How important first-party stuff is. I think I saw the growth in the API and I was like, this is what we should just invest a lot more of our time in. And I think that you'll miss out and not have enough of a durable moat if you're not equally investing, maybe even investing more, on the first-party side of things.
Harry Stebbings
How much did it hurt you being late to that?
Mike Krieger
I think significantly. If you take the DeepSeek moment, right, ideally the story of, oh, there's more than one leading edge AI product to be used, is a narrative that we should have captured. I think it hurt us there.
Harry Stebbings
What's a major technical or product challenge on the horizon in AI that no one's talking about that you think is critical?
Mike Krieger
The models, as they get more... I'll give you the headline, which is basically discernment and privacy. So as the models get more capable, they'll also become more knowledgeable, right? You'll be in conversations with them about everything from something that might be quite intimate to something that's quite sensitive from a company perspective. Or they'll have access to all of your particular company's things. And then everybody loves to talk about agent-to-agent interaction, right? The intersection of those two is something not enough people think or talk about, I think, which is: do you trust your Mike agent or your Harry agent to be out in the world and then not be jailbreakable or reveal something that it knows that is quite personal or sensitive? I think my metaphor is my five year old. It's great watching her with somebody that she's just met, because she doesn't quite differentiate between stuff that's secret and private to our family and stuff that is okay to talk about with a new friend or somebody at the checkout aisle. So that discernment is something you acquire over time, for people. And I think for models, this is very underappreciated and probably under-researched as well from a model capabilities perspective. Because models fundamentally want to be helpful, and that is not always what you want them to be. And there's a safety case for that. But then I think there's also a privacy and data security case for it too.
Harry Stebbings
Do you worry about your 5 year old becoming more comfortable talking to models and agents than they are with humans?
Mike Krieger
I've had so many conversations with Alex Wang about this because he has this whole thing about how in the future most friends will be AI friends. And I don't think he's wrong. And I think that there's ways in which that's already starting to be the case, with people having lots of online game experiences, and some of those are NPCs, and you might just have more of a comfortable sort of existence in there as well, even if you're not breaking through. So do I worry? She is so gregarious that I'm not actually worried in her particular case. But let's abstract to the broader sense. There is a lot you can learn from what it feels like. Here's the bull case. I was a fairly awkward teenager and I probably could have benefited from some practice mode AI interactions around some of these things to build it up. And at the same time that's not the real thing. It doesn't feel like it's totally closing the loop around the consequences of real interaction. It's the difference between reading about what it's like to have your first really hard argument with your high school girlfriend and then actually having it. And when you're in that moment... this is now the classic, is it the Chinese room experiment? No, it's not the Chinese room experiment, it's a different thought experiment, where somebody's in a black and white room only reading about red, and they go into the world and they see red, and you're like, is there something qualitatively different about that? Absolutely. And is there something different between talking to a model and engaging a model, even in emotional roleplay, and then having that same interaction with a real human? Absolutely. And so it is probably a helpful piece of future human interaction and absolutely insufficient as the whole.
Harry Stebbings
Does Europe become more or less relevant in an AI driven decade?
Mike Krieger
Europe, I want them to do well because I love a lot of Europe and I lived in Portugal growing up as well. I saw a funny, maybe somewhat defeatist argument where if real world experiences and human interaction become more valued, Europe becomes more valuable itself as perhaps the world capital of sensory experiences. That feels weird if that's all you're resting on. That feels a little limited as well. What I think will be really interesting from a European perspective is this: a thing I really respect about Europe is there's often been the case that there are things about the lifestyle or the society that they hold very, very strongly that then they, not always elegantly, but at least attempt to enshrine in either best practices or even laws. And so even as we think about doing our product design and data privacy and selling to German users or German companies, there's a different set of questions that get asked that are often very helpful questions. And so maybe the bull case there is that those are actually questions that are relevant to everybody, and they will just be at the leading edge of asking some of those questions. I think from a labs perspective it's a much harder question to answer. I think there's maybe some combination of access to compute, and maybe they move further up the value chain, if it is the case that building applications on top of these models is a lot easier and you can go from 0 to 1 and you can be more nimble than even these labs that are going to all have tens or hundreds of millions of users, where you have to move slowly at that pace. Can innovation happen there? Probably, but it probably involves a different regulatory and startup ecosystem environment to really make that actually the case.
Harry Stebbings
Final one. Dario has said that this will be the generation that could live to 150. I'm slightly butchering and summarizing his quote, obviously, but this could be the generation (I'm very optimistic, my mother has multiple sclerosis) that will find cures for diseases like MS with AI. Do you agree with his optimism and how do you think about AI increasing longevity and human lifespan?
Mike Krieger
Yeah, I think the potential is huge. I think there's everything from today where AI is helping, which is in closing the loop on drug discovery and closing the loop on clinical trials. Novo Nordisk used to take, I think it was something like 15 weeks to do their clinical trial reports. Now they use Claude and get it done in 20 minutes. And that's a step change. Now, there are years of research that preceded that. So I'm not saying that we've cut years to weeks or years to minutes, but that's a point of the process that we can make faster. And that's with the models today. Then you see Arc, which is this science and research institute that Patrick Collison and some others have started and funded. They're working on foundational models for cells, where you have all of a sudden a real cell model that you can run experiments on. And that kind of thing should also accelerate drug discovery and experimentation there tremendously, because all of a sudden you're cutting the loop there. So I'm very optimistic. There's a lot of places where AI is, I think, underutilized relative to its potential. And I think some of the smartest people in the field and the smartest minds of my generation were working on serving more targeted ads. Maybe that was true at one point. I think a lot of them today are working on how do you make models that are tremendously useful and valuable and intelligent across a lot of domains.
Unknown
Mike, you've been fantastic.
Harry Stebbings
Thank you so much for letting me just completely unpack all of my questions on you without warning. But you've been amazing.
Mike Krieger
My pleasure. Really fun to do this.
Unknown
I just love doing that show with Mike. And if you wanted to see more from the episode, you can find it on YouTube by searching for 20VC. That's 20VC on YouTube.
As always, I so appreciate all your support and stay tuned for an incredible episode this coming Wednesday with Anton at Lovable, the fastest growing company in Europe.
The Twenty Minute VC (20VC)
Episode: Anthropic CPO Mike Krieger: Where Will Value Be Created in a World of AI | Have Foundation Models Commoditized | When Do Model Providers Become Application Providers | What Anthropic Learned from Deepseek
Release Date: March 3, 2025
Host: Harry Stebbings
Guest: Mike Krieger, Co-founder of Instagram and Chief Product Officer at Anthropic
In this episode of The Twenty Minute VC (20VC), host Harry Stebbings welcomes Mike Krieger, co-founder of Instagram and current Chief Product Officer at Anthropic. The discussion covers where future value will be generated in an AI-driven decade, the commoditization of foundation models, the transition from model providers to application developers, and what Anthropic has learned from competitors like DeepSeek.
Mike Krieger opens the conversation by addressing the critical question of where value will be created as AI continues to advance. He asserts that the most durable value resides in areas where companies possess differentiated go-to-market strategies, specialized industry knowledge, or unique data access—ideally a combination of these factors. Mike emphasizes sectors like finance, legal, and healthcare as prime candidates due to their complexity and the substantial groundwork required to excel within them.
“The thing that's going to give you legs and be durable over the long run is being able to sell into those places, have something that you understand about those places uniquely, and then get better at being deployed there over time.”
— Mike Krieger [05:22]
Harry probes whether the next wave of AI will benefit existing vertical SaaS companies or foster entirely new startups. Mike responds by highlighting the dual potential:
Existing Vertical SaaS Companies: These companies can integrate AI to enhance their offerings without alienating their current customer base. However, they must be cautious not to overpromise capabilities that current models cannot deliver reliably.
New Startups: Agile startups can experiment with AI innovations more freely, capitalizing on advancements as models improve. Mike underscores the importance of startups continuously iterating and leveraging AI advancements to unlock new value propositions.
“Don't wait around for the models to be perfect, be exploring in this space, be frustrated by the current generation of models, and then be very aggressively trying the next one so that you can feel like you can now finally deliver on the thing that you saw in your head.”
— Mike Krieger [07:56]
The discussion shifts to the intricate relationship between model quality and product design. Mike argues that AI models and product UX are now inextricably linked. Designers and product managers must account for the non-deterministic nature of AI, ensuring that the integration of models enhances rather than disrupts user experience.
“You can't separate the two anymore. Designing a scaffold and a product around a fundamentally non-deterministic system means that model quality, prompting, and backend optimizations are direct components of product design.”
— Mike Krieger [20:29]
When exploring AI's impact on software development, Mike shares his experiences with Anthropic’s Claude Code, a tool designed to assist in coding by finding and editing code files efficiently. He envisions a future where developers transition from writing code to delegating tasks and overseeing AI-generated solutions, emphasizing the evolving role of software engineers as managers and collaborators with AI.
“The role ends up looking like being more of a manager and delegator to these things rather than just a partner in the loop.”
— Mike Krieger [44:54]
Mike identifies the primary technical challenge as creating training environments that accurately reflect real-world, multi-step processes. Current models excel in narrow tasks but struggle with broader, more complex interactions that require deeper understanding and adaptability.
“Figuring out how to better break down complex environments and think about them holistically is the biggest blocker.”
— Mike Krieger [13:31]
The conversation touches on the future of data in model training. Mike advocates for a hybrid approach, combining original human data with synthetic environments that allow models to explore diverse scenarios. He cites examples from gaming, such as Pokémon, to illustrate how synthetic data can enhance a model's ability to handle uncertainty and varied approaches.
“It absolutely has to be a mix. The best models will come from a combination of great human data and the ability to generate diverse synthetic environments.”
— Mike Krieger [16:02]
Harry raises concerns about the increasing complexity of model selection for end-users, likening it to choosing between different versions of Google. Mike agrees, highlighting the concept of "leaky abstractions" where the underlying complexities of AI models expose themselves to users. He emphasizes the need to simplify user interactions with AI, avoiding the necessity for users to understand model differences.
“The overall experience is one of why would I choose one over the other? We suffer from this problem as well. So model selection needs to be collapsed further.”
— Mike Krieger [18:22]
The rapid pace of AI model releases creates a "product marketing nightmare," with constant updates making it challenging to maintain stable product messaging. Mike discusses Anthropic’s strategies to balance rapid iteration with the need for stability, such as opting features into beta and carefully managing enterprise customer expectations.
“Every time there's a model launch, I assume every one of those labs is either watching the launch stream or evaluating their next steps. It's about getting used to the bumpy ride.”
— Mike Krieger [26:35]
A thought-provoking segment explores the potential for AI to become integral to human relationships, particularly among younger generations. Mike expresses both optimism and caution, acknowledging AI’s role in augmenting social interactions while recognizing the irreplaceable value of genuine human connections.
“There is a lot you can learn from what it feels like, but it is absolutely insufficient as the whole. AI can be a helpful piece of future human interaction but not the entirety.”
— Mike Krieger [57:46]
Mike addresses the underestimated capabilities of China in AI development, stressing the importance of recognizing China's advanced research teams and innovative startups. He also highlights Europe's unique strengths, such as stringent data privacy laws and societal values, which can influence global AI practices.
“It's a mistake to underestimate China's ability to train at the frontier, especially with access to massive compute resources and innovative startup ecosystems.”
— Mike Krieger [33:33]
In response to a question about AI's impact on human longevity, Mike expresses optimism about AI accelerating medical research, particularly in drug discovery and clinical trials. He cites examples where AI has significantly reduced the time needed for research processes, underscoring AI's potential to contribute to medical breakthroughs.
“AI is helping close the loop on drug discovery and clinical trials. For instance, Novo Nordisk uses Claude to complete clinical trial reports in 20 minutes instead of weeks.”
— Mike Krieger [61:22]
In a rapid-fire segment, Harry poses several concise questions to Mike, eliciting candid responses:
What has OpenAI done better than you?
“They've moved faster at shipping V1s, even ahead of where the model sometimes is.”
What have they done worse than you?
“Probably personality and having the features they built be cohesive.”
Which alternate model provider do you respect most?
“They’ve balanced first-party product development and an API that people use at scale as well.”
If you could rebuild the Anthropic product stack from scratch, what would you do differently?
“Simplify the information architecture so users don’t have to differentiate between projects, artifacts, and chats.”
What have you changed your mind on in the last 12 months?
“The importance of investing more in first-party products alongside our API offerings.”
What is a major technical or product challenge on the horizon in AI that no one’s talking about?
“Ensuring discernment and privacy in increasingly knowledgeable models, preventing them from revealing sensitive information.”
The episode with Mike Krieger provides invaluable insights into the future trajectory of AI, emphasizing the importance of specialized knowledge, thoughtful product design, and the delicate balance between rapid innovation and user stability. Mike's reflections on Anthropic's strategies, challenges, and the broader AI landscape offer a nuanced perspective for investors, entrepreneurs, and technologists navigating the rapidly evolving world of artificial intelligence.
Notable Quotes:
Mike Krieger [05:22]:
“The thing that's going to give you legs and be durable over the long run is being able to sell into those places, have something that you understand about those places uniquely, and then get better at being deployed there over time.”
Mike Krieger [07:56]:
“Don't wait around for the models to be perfect, be exploring in this space, be frustrated by the current generation of models, and then be very aggressively trying the next one so that you can feel like you can now finally deliver on the thing that you saw in your head.”
Mike Krieger [20:29]:
“You can't separate the two anymore. Designing a scaffold and a product around a fundamentally non-deterministic system means that model quality, prompting, and backend optimizations are direct components of product design.”
Mike Krieger [44:54]:
“The role ends up looking like being more of a manager and delegator to these things rather than just a partner in the loop.”
Mike Krieger [26:35]:
“Every time there's a model launch, I assume every one of those labs is either watching the launch stream or evaluating their next steps. It's about getting used to the bumpy ride.”
Mike Krieger [61:22]:
“AI is helping close the loop on drug discovery and clinical trials. For instance, Novo Nordisk uses Claude to complete clinical trial reports in 20 minutes instead of weeks.”
Note: Advertisements and non-content sections have been omitted to focus solely on the substantive discussion between Harry Stebbings and Mike Krieger.