Dan Shipper
Mike, welcome to the show.
Mike
Great to be here. Dan. Good to see you.
Dan Shipper
So for people who don't know you, you're the head of Anthropic Labs and you're the co-founder of Instagram. And today what I want to talk to you about is Fable 5. So Fable 5 is dropping tomorrow; we're recording this the day before, and this will come out after it drops. But what I really wanted to do is bring you on the show to tell me about what it's like to use this model beyond the first day. I think when a model this powerful drops, it's so useful to have someone who's using it day in and day out to tell you: this is where it's powerful, this is what it actually changes, this is what it doesn't change. So that you don't get the same AI psychosis type thing, and you can actually think, okay, this is how it fits into my life.
Mike
Yeah, absolutely. And it's also just been interesting: we've had some models in this mythos class leading up to the Fable release for a couple of months now. I think it's very exciting to see how people will build with us externally. But I think you're also right about day one impressions; the real picture comes from getting to use this over a couple of weeks. I think we've seen that even with previous models, like the December into January usage of Opus 4.5 or Opus 4.6, which was really important because people spent extended time on the model and then figured out, oh, actually I wasn't pushing hard enough. I've got to go further and I've got to rethink what's even possible with this generation.
Dan Shipper
Totally. I mean, I don't know, I feel like there are people internally at Every who have been using it who have been like, oh my God, I think I kind of need a new set of skills to use this model. And I think you can especially see this with people who are maybe more nontechnical internally and who are more on the knowledge work side of things, where they're like, I don't even know what I would use this for. And the people who are orchestrating agents are like, holy shit, I feel like there's so many new things I need to learn. So I'm curious for you: tell us about the difference between your impression when you first tried it and now.
Mike
Yeah, I think your point on adapting workflows is a really good one. Quite literally workflows; I'll talk about that in a second. But also just in terms of how I think about usage of the model. The timing was interesting because it kind of coincided with me transitioning from CPO into Labs and going really back into builder mode. I think it was about a month and a half or two months into that that we first had one of these models available internally. And I sat there and I was like, I feel like a total newbie again, because the way that I am prompting or even thinking about decomposing a task is really out of date now with this model. Even the time horizon or the interactivity model, I think, has to evolve as well. Going from, early on, I have an idea for this feature, can we start by... absolutely not, right? To: great, let me express more of the intent. And I remember around March, April, being like, wow, on the one shot it's already incredibly impressive. But then it also understands the intent around how we're going to evolve this and understands the global context as well. So I think that's been a really interesting evolution till now. It was funny, I was talking to somebody this morning about how I think about doing work. I had a flight and I was like, okay, I can do most of this work remotely, and I don't even worry that the Wi-Fi is going to drop out, because I know that if I set up the right context, instructions, like flash loop, it'll see it through.
And I think my last two months have been full of a lot of times where I will wish Claude a good night, set it off on a pretty complex task with something of this model class, and wake up to, well, actually, it's usually done by two in the morning, and I guess it just twiddles its thumbs for the next four hours. But it has a really impressive ability to complete the swing and get itself out of a situation where it's like, okay, Mike asked me to do this complex task overnight. I got stuck because this remote service went down. I'm going to write a scaffolded backend for it for now. I'll document that, I'll go all the way through, I have a good mental model of how far that's going to get me, and then when it comes back online, I'll fix it. I'll keep track of that fact. I think the most impressive thing for me is just being able to delegate that level of task and trust that the right thing will happen by the end. And of course you'll review the result, and there's still a whole verification thing that we can and should talk about, because I think that's an important part of still completing the swing there. But it's really forced me to rethink what being productive with one of these models looks like. We've talked for a while about what it's like when these models are more of a companion or a coworker, and it really feels like now it's a teammate that I can delegate a lot of work to.
Dan Shipper
Hmm. And what is your day to day flow like right now? Because one of the things I notice is if you just give it a big task and you monologue into it and you let it go for a few hours or overnight, it's the most impressive model that I've ever tried. But it's so slow and it's so expensive that I feel like I don't want to use it for day to day tasks. So what is your actual flow like in terms of how you use it day to day, and where does it slot in versus other models?
Mike
Yeah, I've ended up having a lot more architectural planning conversations up front with it as well. That's been another interesting change, and I think this is an area where all models need to continue to improve. I'm really grateful for the Instagram experience of having to start from our initial version that was duct taped on a server in LA to being able to scale it and eventually integrate it with all of the Facebook infrastructure, because you develop a sense of what infra abstractions and complexity are appropriate for each stage. And I still sometimes go back and forth with Fable where it'll be like, this is a good implementation, and I'm like, well, I do plan on shipping this fairly soon, I think we should probably think about more than one server. That kind of back and forth is important. A thing I've realized is that Fable can be so complete in its thinking when you are planning with it that often just saying, can you make an HTML page that represents what we just talked about so I can share it with the team, is actually valuable. Or even just a markdown document, but I like having diagrams. So that's been an interesting use: let's plan with it, let's think it through, and then let's have some sort of document that we can align the team on. Because, and this is a dynamic I've seen in Labs and in teams beyond Anthropic, you can build a lot very quickly, and forcing more of that early alignment, even if you do an initial prototype and then back it out into more of a plan or architecture, is really key and ends up being the place where the human to human interaction still stays very much part of the process.
And then from then on, I think either overnight or during the day, having it execute on those chunks of tasks is really important. And it just means having a lot more concurrent sessions than I did before. I go back and forth between having one very long running Claude Code session and asking it to do everything in the background, in forked subagents, so the main thread stays responsive, and other times just embracing that it's one of those days where I'm going to have five or six tabs tackling long, comprehensive work. But I do think that there's something to this long horizon mode, like, don't worry, I'm on it, it's going to take me a while, alongside more of this back and forth. And that modality I think is something that we'll have to figure out in our products as well. I think you want to preserve both, and they interact with each other in interesting ways. My preference is usually to always have at least one Claude that is high context but also very, very fast to respond, where its instinct is: I'm going to answer you, I'll kick something off if I need to, and if not, I'm just going to hang tight and wait for the next loop. I do think you're right that for the case where I'm just trying to fix some interaction question, or something that's very fine and detailed, Fable will go off and think very hard about those things. And I think Fable is the first model where I've actually played more with the effort levels for that reason, where I've been like, okay, I just need to tweak some UI, I'm actually going to put it to medium or something and see how that plays out. I didn't find myself doing that as much with Opus, maybe because the range felt less wide, where it really can feel quite wide with Fable.
Dan Shipper
What about a quick question? Like, you're on the go: are you asking Fable random questions as they come to you? Because it feels like you're using a rocket launcher to kill a mosquito or something. Or are you flipping back and forth?
Mike
It's so funny you asked that, because I had been. And you're like, it's thinking, it's thinking really hard about it. Then last week I was asking it something that, truly, I felt embarrassed asking Fable about. It was probably something NBA Finals related. And I was like, okay, I switched my iOS app to Sonnet and I was like, oh yeah, I use this all the time for fast questions. It's like an order of magnitude difference in feel. And it's actually not even the tokens per second; it's probably more about how much thinking goes into the answer. And sometimes the answer does not need to be fully thought through. So I'm thinking it through myself, and I think this is a good product question for us too, which is that in general you don't want people to have to be thinking so much about these choices. So ideally, what we can coalesce around in the longer run is maybe some more bucketable use cases that are really grokable to people. Or maybe it varies by surface, where it's actually unlikely that most of the time on iOS I'm doing Fable type tasks, and having a sticky model selection per surface might be the way to do that. We'll have to explore what that means from a product perspective. But I for sure have had the feeling of, this is not a Fable-worthy question. I should ask Sonnet this.
Dan Shipper
Can you show us something that you've built with it?
Mike
Yeah. So one of the things that we did this go around is we encouraged personal account usage for us, especially on the weekends, which was really fun because we have, you can imagine, a lot of Anthropic-specific tooling, et cetera. But it was really good to step back and be pure Claude Code. Let's work on something over the weekend.
Dan Shipper
And you're in the terminal app or you're in the desktop app?
Mike
That's a great question. I am mostly still in the terminal app. It's interesting watching my wife, who's not a professional engineer and more of a UX designer/PM, really fall in love with Claude Code via the desktop app. I think it's simplified some of the abstractions for her in that way. But for this one I was still in, is it Ghostty or Ghost-tty? Ghostty as the terminal app. But let me show you. This is one of those where everybody has some bespoke need. I wanted a good media tracker experience. I'm playing games, I'm watching TV shows, I get all these recommendations, and I just wanted to build something that was personal to me and fit some of the use cases that I had. The two biggest criteria that I started with were, one, really easy to add things, so you can talk to Claude, Claude does the agentic search over everything and then puts the right things in, and two, proactive, so if there's a new season or a new sequel to a game, it could go off and research those things. Most of the UI was a Fable one-shot, which was already impressive. But then the thread I've been pulling on a lot in Labs this year is how do you bring the software team, which is Claude these days, closer to the software itself. And so this was maybe Saturday morning. I had a full weekend with kids stuff, so a lot of this was kick off work, go for a hike with the kids, come back, continue to do the work, sometimes check in on the work on the hike. I probably shouldn't, but it was nice to pop into remote mode and see what was going on there. I try not to do that too much. But I had this idea around, hey, could we do a spike on... I say spike a lot with these models.
Like, can we do a spike on, what if you could actually modify the software from within itself? I built both: there was a React Native version and then this version, which is just the web version. So I already had a chat type thing where you could ask Claude to add things by URL, which, you know, I want every piece of software to have, where I should never have to navigate a menu to do anything ever again. And in many ways, Dan, I was trying to distill the agent-native architecture to its fullest degree, which is to also have the agent be able to modify the app. So maybe phase one of agent-native architecture is that every single thing in this product is accessible from the agent and has tool calls, et cetera. That's hopefully becoming table stakes. It's sadly not in a lot of software. And it's great, because I was like, what's that show? Somebody had recommended a Brazilian show about radioactive stuff in Goiânia. I did not remember what it was called, and Claude was able to figure it out. It was so much better than me trying to figure that out on my own. But then the next step I was interested in is what it would mean to actually be able to modify the software from itself on the go. And so if you long press this little chat thing... So what I built, what Claude built, was a way where it used our managed agents to basically take on edit requests, and then you can preview them. And I used the Vercel live preview thing here. This whole feature was also a one-shot, which was really cool, and I just added to it over time. It actually does a little diff view if you want it. You can go into the managed agent conversation and see what it did.
Although I almost never do, because again, I don't particularly care about the code quality or the long term maintainability of this software. You can see that it had a session in here too. But it's been really fun because I'll be using it on the go and say, you know, I had a feature request the other day: the floating action button was too low on native iOS but it was okay on there, can you go fix it? It did it. It was really fun with some of the Expo tooling; it actually live reloaded on my phone, which was also a really cool kind of feeling. Does this thing need to be a production level thing that's going to go to a million users? No, but it felt really good to have something where it didn't have to stop at just the weekend and I could keep working on it just by using it, having this kind of end to end closed loop. So I felt like this was a good manifestation of both Fable's building ability and a lot of what both you and I have been thinking about, which is how Claude embeds itself into software beyond just the usage side of things.
Dan Shipper
This is really cool and I want people to understand: you could have built something like this, maybe not the self modifying part, for the last 10 or 20 years or something like that. But the cost to build has gotten dramatically lower. So think about how much it would have cost to do this in the Instagram days versus now. Can you help us understand how that has changed?
Mike
Yeah, and I think about this a lot when I think back to that time as well. I thought of myself as a very productive programmer in the early Instagram days. I was really into mobile development and we had good clarity on things, and for the gap from idea to fully realized version of some complete product, you were still looking at four-ish days of my all nighters, which was my natural state: up till 4, sleep until noon. Not conducive to family life, so I've had to shift. But that was my building thing. I'd call it Instagram v1, which probably had more features than this thing did, but not by an order of magnitude, and it was like five days of all nighters, me working on the front end and back end and Kevin working on the initial filters to get that out. And that was also built on the many years that I had already been working on iOS pieces. And then the iteration: I think a lot about what we were gated on after that launch. When things went well, we had all these ideas for where to take it, but we were just trying to keep the site up, or we were just trying to add the one incremental feature. Hashtags take a week to build, but then there's all the things that you want to continue doing on it as well. And so I think it's both that shortening of time (there's still the time required for the idea and the concept and the iteration) and then the other piece, which is the way you can then iterate on what you have, in a really fun but also very in-the-flow kind of way. Now, this is me as a professional software engineer, startup founder.
Beyond that, if you had that idea, and I saw multiple people go through this, it was like, well, I'll try to find maybe a consultancy that will take this on. But it's a really lossy process of, like, what I wanted, you know. Yeah, don't raise money for it. And the thing that I think is the most exciting part about these models getting not just more autonomous, but again, closing that gap between intent and execution, is what I've seen it do to the ability to build for people who are not builders. And the trajectory of these models has been that something like Fable, of this general mythos class, is in that class of models, and eventually models that are cheaper and more accessible to other folks become available too. And as that process happens, I just think it is opening up so much. I got a note the other day (I get very excited about this stuff, if you can't tell) from somebody internally. We had built them an internal tool that combined Fable and access to some internal MCPs. She works in recruiting, and she said, it's the first time in my life where I feel like the thing that's in my head and the thing that exists in the world are right next to each other. Like, I can just do it. And it was a very meaningful moment to her, because prior to that, and I remember these days, these days were five years ago or four years ago, that person, if they wanted a tool, would have to either make do or try to get an internal tools engineer who was probably overloaded with 50 other requirements. But instead, now they are just having the time of their lives building. And I think that's cause for a lot of hope, because I think that human capacity for creativity and what's possible is enormous.
And I think at our best, we are basically expanding the number of people who can then see that through to something that feels real.
Dan Shipper
I totally agree, but I do think that there's a question in the back of my mind and I think it's probably going to be in the back of the minds of some of the people listening. So I want to ask you, given everything you just said, is software engineering over?
Mike
Yeah, I think software engineering is different. It has dramatically changed. As I probably would have defined it if you had asked me around the Instagram time, what is software engineering, I'd probably say: thinking through the hard problems and thinking about an architecture, and then spending a lot of time in, you know, TextMate, I don't know, whatever text editor you're going to edit those things in, or Xcode, and watching Railscasts, you know. Yeah, exactly. Right, exactly. And understanding the intricacies of Django's ORM layer, and then 15 bugs after you deploy it. So much of that is radically different and collapsing into other parts of product management. That PM/eng split, I see it even in our teams, has become much more diffuse. That's radically changed. But maybe zoom out from software engineering and think about software production or software development, and not in just a pure developer case. I think that is alive and well and essential still. So that is the moment that I feel like we are in. I think Fable is another step in that direction, and I'm not going to call it the final step, of course a lot will still happen, but it's a pretty significant step in that the trust I end up placing in the model, in terms of its capacity to see things through and even architect things reasonably, is quite high. So that part feels like it is not ever going to be done, but it is pretty, pretty done, right? It's gone really far. But I think the overall craft of what needs do you have, what are you putting out, is it actually good, is still a very human endeavor. But I can also see that this is not a transition that is pain free.
Like, I think there are plenty of people who love the craft of actually putting it together. And I used to love this. I'd be like, I solved that problem so elegantly. You dream about code, if you've had the experience where you dream about the thing that you're working on and then wake up in the morning like, I figured out how to solve this thing really elegantly. And that for sure has passed. And I think there is a feeling of loss in some of the better engineers that I talk to, as well as the feeling of, oh my God, but I can do insane amounts of work now, at the same time. So we're holding both ideas in our heads at once, I guess.
Dan Shipper
Which I think is the most important part of this. Like, it's normal to feel sadness for that kind of thing and excitement. But I'm curious, let's just take the thesis that software engineering is alive and well. What does that actually look like inside of Anthropic?
Mike
Yeah, I think there's a few pieces. I could take it from the full software development cycle or from what I see on a day to day basis; maybe I'll do a little bit of both. I think there's still a lot of: we all got together, we talked about the next way we want to evolve Cowork, and now we've broken it down into areas of ownership. I think that ends up still being quite important because there is still context that you hold as a person that is beyond Claude, right? Like, what is the actual intent of this product? How's it going? What do we need to know about the other products coming down the pipeline that are going to be integrated in some interesting way? So I think that aspect is really important still. And so though we have many Claudes to each human, each human, at least the way we've been working at Anthropic, still has, we call them DRIs, directly responsible individuals, still has a DRI-ship over some part of the product or some area. I think that'll be the case for a while, because there is value in not just this distributed, like, we should all make Cowork better, but instead, all right, I'm thinking through how Cowork does at this particular task. We try to keep meetings minimal, but they still emerge, and you still have these alignment conversations. Then there's a lot of asynchronous delegation. What many engineers here have now found is they've all built, and I think we should solve this at some point at a broader product level, some version of: I'm going to create a dashboard of what all my Claudes are doing, what's waiting for me, and which pull requests need my attention because either a human or a Claude code reviewer got back to me.
So there is a lot of that meta maintenance of the work. I think we'll standardize some of it, but some of it will always be a little bit bespoke to the way each individual likes to work. Just in the way that people organize their windows, now they organize their work. And then there is also understanding how things work in production. There's a few next frontiers for the models, and one of them, where Fable does make significant strides but more work is needed, is understanding what happens to code after it gets deployed. Because there's incidents. This was all working well, but this network link got cut, which is not in your usual failure mode. So much of Instagram from 2012 to 2016 was dealing with that and scaling things up. And so that role of the engineer still remains really key. Getting the reps in around incident response and understanding how to stay calm, gather data, remediate what's immediate, but then go off and work on longer term fixes, is still a necessary part of it. And then I'm trying to think if there are any other pieces that are notable. Maybe the last thing to say is I really like the role that the engineering prototype now plays. You have to be clear when it's a prototype versus not. The old phrase was "code wins arguments," and I never loved that, because the person that could code could go do it, but why should they necessarily win an argument by default? But it's been really cool now where sometimes we will have some disagreement or debate about where to take a product, and often it's the PM that will say, all right, I just tried it, and it's janky in these eight ways.
But look, it actually shows how this could work, and that can open up some interesting pieces of conversation. So almost all of that is quite different than it was six months ago, I think especially at the level of parallelism and the level of need for these higher order abstractions of work. But I think what hasn't changed is that ownership.
Dan Shipper
Lots of us are shipping AI to production, which is great for productivity, but it also comes with anxiety. You tweak a prompt, swap models, adjust parameters, and everything looks fine in testing. So you merge, and then three days later or even sooner, the support tickets start rolling in. The AI is giving your customers unexpected answers and you have no idea when it happened or why. Braintrust is the AI observability platform that fixes this. It connects evals and observability in one workflow. That way you see what actually happened in production and can measure whether changes made things better or worse. Traces show the full execution path, evals define what good looks like, and experiments let you compare prompts and models side by side before shipping. Production traces feed directly into your eval datasets. Every failure becomes a test case. You catch regressions in CI before they reach users, and teams at Notion, Stripe, Zapier, Vercel and Ramp use it to ship quality AI at scale. Braintrust is designed for teams building production AI systems where silent regressions are expensive. It's built for any stack. They have SDKs for Python, TypeScript, Go, Ruby C. There's no framework lock-in or vendor dependencies. It's SOC 2 Type 2 certified and GDPR and HIPAA compliant. Get started at braintrust.dev. That's braintrust.dev. And now back to the episode.
Fable is also very expensive, and because of that, when I was testing it, I felt kind of like I was a kid in a candy shop and I was just like, I'll do this and I'll do this and I'll do that. But now that there's going to be a bill, I'm going to be thinking about it, because I have to pause before I do it to be like, is this going to cost me 100 bucks or whatever? And I do think that's going to limit who gets to use it and for what. So how do you think about that?
Mike
Yeah, I think it's most clear cut on the professional software side, the classic company doing work. It'll be really interesting. A lot of process goes into pricing as well. It's both more expensive than Opus, and also, in many ways it's really cheap if you think about how much incredible work it's doing. But of course everybody has their own economics around what they're working with. So anyway, it's most clear cut, I think, for most software teams. As an industry, phase one was companies even struggling to get some of their employees to adopt AI coding, when models were early and maybe the tooling wasn't there. Phase two was, great, we'll create leaderboards and see who can use the most, which as you can imagine also creates some not ideal incentives. And phase three is where people are like, okay, now we're just trying to figure out who's using it effectively and letting them spend as much as possible, having a clear process for that, but making sure we're not doing things wastefully. Which to me in general makes sense, although I think you could also over rotate that way too. Something of Fable class should hopefully fit well into that, where if you're demonstrating results and you're getting use out of the model, then hopefully there's a flywheel even inside companies that perpetuates that. On the personal use side, it's a really good question. I've seen it even in my personal testing on our personal accounts, which is funny, paying the company I work at. But you do become more thoughtful about it. Something that was interesting was that the app I built over the weekend actually fit in with only a bit of extra usage.
So it wasn't like, you know, thousands of dollars to build this thing that is a personal thing for myself. But it was also spaced out a little bit more. Probably the in-between of that, what we'll have to do the most thinking about, is the hobbyist or independent who's not within the larger company but is also thoughtful about the pricing. I think my overall advice is just give it a try and see how much it can do without you having to then do a lot of follow-ups. I think measuring cost has gotten so multifaceted now, because there is the per-turn cost, and then there's what it cost you not just to do the task, but to complete the task to your satisfaction. And I think that's where Fable really shines for me, which is that it actually just does it right, so that I don't have to go spend the 9 or 10 subsequent turns being like, no, that was not quite what I meant, can you also do this piece?
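[Editor's illustration] The distinction Mike draws between per-turn cost and the cost of completing a task to your satisfaction can be made concrete with a little arithmetic. The sketch below is illustrative only: the labels, dollar figures, and turn counts are invented for the example and are not real pricing or measurements.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """One way of getting a task done. All numbers here are hypothetical."""
    label: str
    cost_per_turn: float  # assumed average dollars spent per turn
    turns_to_done: int    # turns until the result is actually acceptable

    def cost_to_satisfaction(self) -> float:
        # Total spend to reach an acceptable result, not just one turn.
        return self.cost_per_turn * self.turns_to_done


def cheaper_overall(a: Run, b: Run) -> Run:
    """Return the run with the lower total cost (ties go to `a`)."""
    return a if a.cost_to_satisfaction() <= b.cost_to_satisfaction() else b


# Made-up figures: a cheaper model that needs many corrective follow-ups
# versus a pricier one that gets it right on the first turn.
many_follow_ups = Run("cheaper per turn", cost_per_turn=2.0, turns_to_done=10)
one_shot = Run("pricier per turn", cost_per_turn=8.0, turns_to_done=1)

if __name__ == "__main__":
    for run in (many_follow_ups, one_shot):
        print(f"{run.label}: ${run.cost_to_satisfaction():.2f} to done")
    print("lower total:", cheaper_overall(many_follow_ups, one_shot).label)
```

With these invented numbers the pricier-per-turn run is the cheaper one overall, which is the point being made: the comparison that matters is total spend to an acceptable result, and it depends entirely on how many follow-up turns each option really needs.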
Dan Shipper
It's been really impressive for me because you ask it to go do something and then it just does the thing, and you're like, wow, you thought through all the little details of this thing in a way that I've never seen another model do. I don't know how much you can reveal about the training process, but what makes the model different?
Mike
I mean, I think in many ways it's a continuation of a lot of the work that the team has done. And I bow down in total awe of our teams, both on the pre-training and on the RL side. I think the piece where it has evolved that I, at least, noticed the most is kind of adjacent to that as well, which is a sense of the system more than just the individual piece of the work. I will often be very positively surprised when it will write something and say, all right, but I know that in production, this needs to be different. And then it will keep bugging you. Like, have you turned on that feature flag yet? It's not going to work until you do. And sometimes, in sessions that have gone on for days, it'll be like, look, you still haven't done that thing, you'd better. And I was like, you're right, I didn't turn on that feature flag, I should go off and do that. Or, if we change this, the contract will change over there. One of my favorite times of seeing it in action, where I think it demonstrates some of the training, is watching it respond to code review feedback, either from people or from other Claude reviewers, where it doesn't just say, oh yeah, that's an issue, I'm going to go fix it, but is actually really thoughtful around, hey, for this level of fidelity of what we're building, I'm going to accept this risk. Or, yeah, I see what you mean, other code reviewer, which is often just another Fable model talking to it. I see what you mean, but I'm actually going to push back. I think that's actually not right. Getting the model to have that judgment is really important. And trying to pinpoint an area where I feel like it's really progressed, it is that: not just the immediate knee-jerk, yeah, yeah, that's right, I gotta go fix it, and more, huh, I'll think about that for a minute. 
No, I thought about it and I still disagree, you know. And I think that's a very useful sort of ability. It's so valuable to have products like Claude Code out there, because you now have a living, breathing thing where people are like, this is where the model is doing well. And, you know, we have people who test it. I count the Every folks as very, very high on the list, where we really trust the feedback because it is being put through its paces on repeated, multi-day, hard tasks. And that also very much feeds into how we think about what we need to improve next. Like, what are the tasks that we need to specifically think about the model being better at?
Dan Shipper
Is chat the right interface for this model? Because it's not very turn by turn. It's very much like I'm delegating something to you. So how does that change how you should use it or how you think about the interface?
Mike
I don't think the fundamental, like, you are sending messages and it is giving you a message back, is totally wrong. I think there are ways we need to evolve it. There are maybe three that come to mind. One is, is your laptop the right place for it? So I think that's number one, where I mentioned with the side project I was working on how useful it was to have the mobile side. Boris, who created Claude Code, he's always ahead of the curve on how these models get used. Almost a year ago, maybe nine months, I was talking to him and he's like, yeah, I've moved a lot of my Claude Code work to mobile. I was like, no way. And it took me a while to get there. But especially with the Fable class, there are oftentimes where, you know, because it can keep the session going and we use kind of remote dev boxes at Anthropic, I'll have a thought and be like, okay, can you keep going and do that. So number one is decoupling where the work is happening from where I'm talking to it about the work. The second one touches a little bit on what I was mentioning earlier: how do you take everything that Fable has discussed or decided or proposed about something and make it comprehensible? And that's an area that we're thinking a lot about. There are some skills that are out there, or that we've used, around, all right, can you diagram this? Can you do that? So that's a place where the current chat UI, I think, is insufficient. You'll experience this, I think, where it will give you a lot of text and you're like, I need to take a walk to fully understand this. And something I'll do with Fable is say, okay, you have a lot more context on this than I do. Can we back it up? Let's do more progressive disclosure of the complexity here. 
So I think that's that piece. The last one, which I think we're still early on, is thinking through multiplayer. At some level, because we have this sort of DRI and ownership area, a chunk of significant work is usually a human and a couple of Claudes, and that is still flowing together. But there are other cases where that is less the case. Maybe it's an incident response where multiple people are thinking about it. Maybe it's a project where there are multiple, not competing, but conjoining areas that are coming together. And thinking through what that would mean, you know, we have chat sharing, which gets you a little bit of the way there. But I think there is going to be a need for more. Like, all right, you've got an independent Claude that's doing a lot of work that was kicked off by somebody. But can it be keeping up with all the other work happening on the team? I think that is an interesting and underexplored next frontier for how this work ends up happening. But I think it's really exciting, because again, it's the level of teammate and collaborator that the models are now capable of being, and we're almost holding them back by not having the right abstractions around them for that to happen.
Dan Shipper
Yeah, it makes me think. I've mostly been using this for my own vibe-coded stuff, so I haven't really had to think about this. But there's a problem when you're using this inside of an organization, which is, do I really understand every part of this? And therefore, how do I transfer the context of what the model just did into my brain? That's one of the big bottlenecks. How do you think about drawing the line, especially with a model like this, around how much you actually need to understand, and how to make sure that you have enough context on what it's done to feel comfortable?
Mike
I think there are two big pieces here. The first is verification. I became fully verification-pilled earlier this year. And actually it connects to a thing I used to do when I was typing code more full time, which is try to find the tightest dev loop that you can around the idea that you're trying to develop. Sometimes with Instagram, that meant actually making a new build target in Xcode that was just that screen with some synthetic data, and just doing that dev loop. And when I would mentor newer engineers, I'd be like, if there's one thing that I can impart to you, it is try to get that for any project you're working on, and things will go much more quickly. I think that is no longer exactly the case here, but what is the case now, anytime I set it up, is: how do I get it so that for every pull request that Claude is putting up, there is an attached photo or video, whether that's an iOS PR or something in the UI. I think that helps you gain a lot of confidence, because even now you might have Fable go off and do work for a couple of hours and be like, I'm done. And it's really useful for it to say, and here's the full screenshot gallery, right, because you might say, oh, you know what, on screenshot 8, that error state, I've never actually seen it, but I can see how a person might hit it. Let's actually make that different. And so getting that comprehensive verification is something we've been working on a lot internally, and publishing more and more skills and knowledge about. I think it's a really key piece there. And then the second one is, I think you ultimately, as a person, still need to stand behind the work that you are doing, especially if you're putting it into a production system. Like, a lot of people use Claude every day. 
There's still the accountability of, oh no, Claude might have written it, but you need to understand at least the general decisions that were made on these pieces as well. And so I have seen a fair number of engineers actually adopt this practice where Claude will have done the work, but then there is the follow-up conversation around, well, can I make sure I deeply understand all the trade-offs that you made? And whatever lowercase-a artifacts need to be produced in order to make that comprehensible are important. It is really interesting, though, to be in meetings where somebody will say, oh yeah, and I have this PR ready. And somebody else asks, oh, that's interesting, did you do X or Y? And they have that moment of pause and they're like, you know what, I'm not entirely sure, I'll check before we merge this PR. And I think adapting to that norm and figuring out how to work with that is something we'll have to do.
Dan Shipper
Tell me more about the verification loops.
Mike
Such a.
Dan Shipper
It's such a hot topic right now. Sounds like one way that you do that is with screenshots and screen shares. But what are the other ways that you think about that?
Mike
I think part of it starts with: can you get to a place where you are exercising real flows that aren't just a static injected piece? And as the system gets more complex, that gets more and more complicated. So we've invested a bunch into even just getting it so that the iOS app can log in to staging on a real account and have real data. But then you don't want it to go through an eight-stage onboarding process every time when you're just trying to test the second part of the screen. So there's a lot of work around, is there a special affordance, is there some shared secret, whatever that is, around getting the app to really feel as close to a human using the product as possible. So that's one aspect of it. The second is this mix of well-known paths versus the things you're exercising in the exact moment, the former being really useful for regression testing. And so you can think of places where we've expressed ideal workflows in text, basically, and Claude can repeatedly check that. And then there's also, and Claude does a really good job of this, expressing the intent of the current change at hand, so that gets really, really deeply exercised. So I think the combination of those two things is important. There's the visual verification I mentioned as well. Video has been really cool to see, actually. Video is a very underexplored tool to give Claude as well. A thing I've been prototyping is just giving Claude video captures of the thing that it has built and then giving it basically just ffmpeg. And you'll watch it scrub through and say, oh, this animation has some jank in it, I'm going to go fix that. And it never would be able to do that with a screenshot sort of latency capture, because it will have missed the moment. So I think that's another piece that's really, really important. 
And then for the pieces that aren't easily testable end to end because there is some more complex system, getting Claude to go and build as robust a mock backend as possible, or use ones off the shelf, has also been really interesting. When I think about Artifact, we had really comprehensive tests. This is kind of pre-LLM. And one of the ways that we were able to do that really robustly was that basically every piece of infra we had, whether it was Postgres, Redis, you know, all the AWS things, had a really good in-memory implementation that you could just run really quickly in unit tests. And kind of extending that to Claude land now, I was working on something where it had a pretty robust backend, and for kind of complicated reasons, it was hard to spin that up on my dev server. But it was able to, again, one-shot a really good proxy for that. By proxy, I mean a substitute for that. And that was so valuable. And over time it's been interesting how that substitute has evolved as the rest of the code has evolved. Which is the thing that, if you had pitched that idea to me before, I'd be like, well, that's going to be really hard because the upstream is going to change. How are you going to keep it in sync? And I don't think about that anymore. I'm like, yeah, Claude will read the changes and it'll adapt the thing and it'll keep the two in sync, and that's fine.
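[Editor's illustration] The in-memory substitute Mike describes is a common testing pattern, and a minimal sketch may help readers who have not used it. Everything named here (`KeyValueStore`, `InMemoryStore`, `record_visit`) is hypothetical and invented for the example; it is not code from Artifact or Anthropic. The idea is that application code depends on a small interface, and unit tests pass in a dictionary-backed stand-in instead of a real networked store.

```python
from typing import Dict, Optional, Protocol


class KeyValueStore(Protocol):
    """Hypothetical interface the application code depends on."""

    def get(self, key: str) -> Optional[str]: ...

    def set(self, key: str, value: str) -> None: ...


class InMemoryStore:
    """Dictionary-backed stand-in for a real store, for fast unit tests."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def set(self, key: str, value: str) -> None:
        self._data[key] = value


def record_visit(store: KeyValueStore, user_id: str) -> int:
    """Increment and return a per-user visit counter kept in the store."""
    key = f"visits:{user_id}"
    count = int(store.get(key) or "0") + 1
    store.set(key, str(count))
    return count


if __name__ == "__main__":
    store = InMemoryStore()
    print(record_visit(store, "u1"))
    print(record_visit(store, "u1"))
```

The cost of this pattern is the one Mike names: the substitute has to be kept in step with the real backend's behavior as it changes, which is the maintenance work he says he now leaves to Claude.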
Dan Shipper
There are some really interesting architectures around, when you get a bug, it just automatically goes out and closes it. You know, the agent just gets kicked off, it closes it, and then it sends a message to the customer being like, it's fixed. Are you noticing with Fable any change in how that process works?
Mike
Yeah, I think there are a couple of things. On a very human-to-human or human-to-Claude level, one of the things that I've seen it do, and I've seen it on other models, but with Fable I've seen it do it really consistently, is if the bug report, for example, came from somebody mentioning something in our feedback channel in Slack, and then the thing that got fed into the Claude Code session is like, oh, there's this. And because of the Slack MCP, you can actually pull the thread, and have it then actually post back, you know, as me. It'll be like, hey, this is Mike's Claude, I fixed it, here's the pull request. But then, and previous Claudes could have done it, but what I think it does really well is then say, but hold tight, it's not in production yet, I'll follow up when it actually is. And then maybe a few hours later, like, oh, this deploy went out, you should go test it, is it fixed now? That level of follow-through I think is new on the closing-the-loop piece. I definitely have these long-running Claude Code sessions that are basically interacting as me. I guess let's put some disclaimer in there too. And the second goes back to that taste and discernment piece that we were talking about, which is, it's one thing to say there was a bug report, therefore I must go fix this thing. And it's another to say, you know what, and I hit this over the weekend, one of our internal systems basically had been running without restarting for a while. There was a memory leak, and it had good discernment, saying, all right, Mike, it's the weekend, just bounce the server. It's going to solve it for now, and we'll work asynchronously to get the PR going to fix this more long term. So I think if you're going to have Claude in the loop in this kind of close-the-loop flow, from bug report or system issue to change, 
I think you really want it to understand, as any good SRE or engineer in the loop would: great, let's solve the problem at hand. Let's defer the question of, do we need to re-architect on top of a completely different language? And understanding that balance is really important.
Dan Shipper
One of the things that's most exciting to me about new models is that it raises the floor, so that everyone can kind of go build apps in one shot. But it also raises the ceiling for experts. So if you're a software engineer or a founder, you can just go do things that you never would have been able to before, because you have access to this really powerful model. So for me, I built this one-shot version of Borges's infinite library. It's like a 3D game version of the library. It's wild. It runs right in the browser. It's so good. I can find any Every essay inside of it. I'll send you the link. It's sick. But I think there's going to be this flowering of people doing things, like, oh, I made a game, or maybe I trained a new model, or whatever, that they couldn't do before. And I'd love to give people some inspiration, some examples of things that they might be able to do that they might not be thinking to do with this model. What are some ideas that come to you?
Mike
Yeah, I think a few. Maybe I'll start with the fun side, riffing off the game piece. I think people have a lot of creative ideas for how they express the complexity of their world. Everybody has the thing that they know really, really well, and there's probably some level of, how do I then explain that to somebody else, or how do I apply techniques from elsewhere that I could then go off and use. My wife is studying environmental engineering, studying geothermal, with very complex math and simulations. And I've seen, as the models have gotten better, she has been able to apply even more complex techniques from even outside of that domain in that work. And I think with Fable she should be able to do full-on PyTorch end-to-end simulations of that work in a way that wouldn't have been possible. So I think that maybe is one: bring the beautiful complexity of what you have and either show it to other people, maybe by making a game or a visualization, which I've seen her do as well, or at least bring other techniques to bear. And the second piece is its ability to compose software that solves a problem really unique to you. I've seen that internally. A lot of the work that we've been doing is, how do we get as many of our internal systems as possible MCP-ified, with the right permissioning structure and the right deployment setup. Although externally you have good options around some of these platform-as-a-service pieces, and you can just ask Claude about them and it'll help you set things up. But I love that feeling of, that thing that you always wished you had. And what has blown my mind there was a person who works in our go-to-market organization who has been building this really deeply thought-out integration of Claude into every part of her whole process. 
And you don't have to stop at that one shot. She's been working on it for months now and she can keep going. And I think one of the things that is maybe underappreciated about the models is that in previous generations it would eventually get to a complexity level where it was hard to iterate on it without feeling like you would break the thing that they had, you know, under- or over-abstracted. Whereas she's had access to something Fable or Fable-like for a couple of months, and you've just seen it keep growing and growing and growing. And now she's deploying it to the whole GTM org. And I think that is really cool. The ceiling of complexity that a person who does not start out as technical can now build for solving problems within their domain is impressive.
Dan Shipper
I agree. It writes great code. Like, my benchmark that I have is called the Senior Engineer benchmark. I just have it see if it can rewrite a code base from first principles. And the nearest model, the previous top was like a 62 or 63 out of 100. And this model got 90 on the benchmark, or 91, which is human senior engineer level. You can just keep going with this thing in a way that it's really fantastic. I'm curious though, one other thing that's really powerful that you mentioned is dynamic workflows. Tell us about that.
Mike
We'll build things internally sometimes, and I will go really aggressively bug the engineer who built it and be like, when are we shipping this publicly? Because I think people are going to really like it. There's good reason why it was built internally, but we try to ship as many of these as possible. And dynamic workflows was definitely that to me. It was built by this engineer named Sid, who's awesome. And I was like, Sid, I want to get this out to the world because it's so good. But I think it's especially good with a model like Fable for two really big reasons. One, it helps create the scaffold for deep, meaningful work. The craziest dynamic workflow I did and used Fable for was, I had an internal project that we had written in Python, but we needed it in TypeScript for a really specific deployment reason. And having been internal at Instagram when we were like, should we rewrite the whole thing into Hack and port it to the PHP engine at Facebook? I was like, we never would have done that. Maybe they can now with the model, but at the time it seemed impossible. But here I had a pretty complex codebase, and I was like, I'm just going to set up a dynamic workflow and let it run over the weekend. And it did. And the workflow was so cool. It was like, all right, I'm going to do a deep understanding of the work. I'm going to create almost like a spec of how everything works. I'm going to go module by module. I'm going to translate these pieces. I'm going to test it incrementally. I'm going to do another adversarial test. I'm going to go check for anything that I missed. And it was this really cool series of steps that the workflow was able to orchestrate. And I came back and it was like, yeah, this thing is a TypeScript and Bun port of that thing, and it's actually better in these ways. And it was very documented. 
Like, these are the things I couldn't port, but most of these were very specific to this specific implementation and weren't worth porting. And I do not think you could have done that, A, with previous models at that level of success, and B, without the kind of scaffolding that workflows provide. So I think that is an extremely exciting combination of model capabilities and our own ability to orchestrate them over a longer and longer time horizon, with that feeling of, you had a goal, you broke it down effectively, and then you were able to make it work. The other piece is, I think over time we'll also be able to have the model be tuned to the level of complexity of each of those subtasks. So you can imagine that some parts of a dynamic workflow don't need extra-high thinking. They could use medium thinking to get it done, or even a smaller model. And I think that's really the future of where these things are going. So yeah, I'm a huge workflows fan.
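[Editor's illustration] The sequence of stages Mike describes for the port, along with his idea of matching thinking effort to each subtask, can be sketched as a list of steps run in order. This is not the dynamic workflows product or its API, which the conversation does not show. `Step`, `run_workflow`, the effort labels, and the placeholder step bodies are all assumptions made up for the example; in a real workflow each step would hand work to a model rather than set a flag.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, object]


@dataclass
class Step:
    name: str
    effort: str                   # hypothetical label, e.g. "high" or "medium"
    run: Callable[[State], bool]  # returns True if the step succeeded


def run_workflow(steps: List[Step], state: State) -> List[str]:
    """Run steps in order; stop at the first failure. Returns finished names."""
    completed: List[str] = []
    for step in steps:
        if not step.run(state):
            break
        completed.append(step.name)
    return completed


def _mark(key: str) -> Callable[[State], bool]:
    """Placeholder step body: records that the stage ran and reports success."""
    def run(state: State) -> bool:
        state[key] = True
        return True
    return run


# Stages paraphrased from the conversation; effort labels are invented.
PORT_STEPS = [
    Step("understand the codebase", "high", _mark("understood")),
    Step("write a spec of how everything works", "high", _mark("spec")),
    Step("translate module by module", "medium", _mark("translated")),
    Step("test incrementally", "medium", _mark("tested")),
    Step("adversarial test", "high", _mark("adversarial")),
    Step("check for anything missed", "medium", _mark("audited")),
]

if __name__ == "__main__":
    for name in run_workflow(PORT_STEPS, {}):
        print("done:", name)
```

Stopping at the first failed step is a deliberate simplification here; a longer-running workflow would more plausibly retry or escalate, and nothing in the conversation says which the real system does.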
Dan Shipper
For people who haven't used it before, Tell me about how you got that workflow made. How did you design it? How did you make sure it was good?
Mike
Yeah, it was pretty iterative, but I just started with Claude Code, like, hey, I have this complex kind of task, let's design a workflow to go and do it. It kind of showed me the plan. I was like, oh, this is close to what I want. I want to make sure that you do these three or four levels of additional verification for missed features. It's like, here's what you have, are you ready to go? And it expresses the workflows in code, which I think is really valuable for seeing what it was about to do. And then what was interesting is it did the full port, and then I had a couple of follow-up questions or little tweaks. And I did those as mini workflows that built off the previous one as well. We talked a little bit about whether chat was the right interface. We've had that conversation over the last year, and I think workflows are a good middle ground: you can compose them using chat, but they're expressed using code, and then they're executed with, I think, a nice clean UI around what's happening at every stage. And I think we'll start bridging longer-horizon work with chat in ways like that over time.
Dan Shipper
Mike, this is such a great conversation. Thank you so much for joining and telling us all about this new model.
Mike
I'm really excited to get to spend time with you and really, really look forward to what people think outside too.
Podcast Host
Oh my gosh, folks, you absolutely positively have to smash that like button and subscribe to AI and I. Why? Because this show is the epitome of awesomeness. It's like finding a treasure chest in your backyard, but instead of gold, it's filled with pure, unadulterated knowledge bombs about ChatGPT. Every episode is a roller coaster of emotions, insights and laughter that will leave you on the edge of your seat craving more. It's not just a show, it's a journey into the future with Dan Shipper as the captain of the spaceship. So do yourself a favor. Hit like, smash subscribe, and strap in for the ride of your life. And now, without any further ado, let me just say, Dan, I'm absolutely, hopelessly in love with you.
Release Date: June 10, 2026
Guest: Mike Krieger, Head of Anthropic Labs & Instagram Co-founder
In this episode, Dan Shipper sits down with Mike Krieger to dive deep into the launch, real-world capabilities, and impact of Anthropic's latest language model, Claude Fable 5 ("Fable"). They discuss its practical use over several weeks, how workflows and expectations around AI are evolving, what Fable changes for technical and non-technical users, why software engineering isn't "over," and how advanced AI is expanding who gets to build, create, and automate. The conversation is both hands-on—sharing examples of projects built with Fable—and philosophical, exploring the changing craft of software and the role of verification, human oversight, and collaboration.
Timestamps: 00:07–04:46
Initial Impact: Mike describes the shift from CPO into “builder mode” and feeling "like a total newbie again" with Fable, realizing the way he decomposes and prompts tasks is now outdated.
Overnight Delegation: Mike highlights Fable’s ability to execute complex, long-running tasks effectively.
Evolving Relationships with AI Agents: Fable feels "like a teammate" for complex work, not just a tool.
Timestamps: 04:46–08:34
"Fable is the first model where I've actually played more with the effort levels… where it really can feel quite wide with Fable." – Mike (08:25)
Timestamps: 08:34–10:02
Timestamps: 10:02–14:49
Timestamps: 14:49–19:04
Timestamps: 19:04–21:47
Evolution, Not Extinction
Emotional Ambivalence: Acknowledges feelings of loss for "the craft" vs. excitement at new possibilities.
"There is a feeling of loss, I think in some of the better engineers that I talk to, as well as the feeling of oh my God, but I can do insane amounts of work now at the same time. We're holding both ideas in our heads at once." – Mike (21:16)
Timestamps: 21:47–25:34
Timestamps: 25:34–29:51
Timestamps: 29:51–32:33
Timestamps: 32:33–35:38
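Three Frontiers: decoupling where the work runs from where you talk to the model about it (for example, mobile), making what the model has decided or proposed comprehensible through progressive disclosure, and multiplayer collaboration across a team.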
"We're almost holding them back by not having the right abstractions around them for that to happen." – Mike (35:32)
Timestamps: 35:38–41:43
Timestamps: 41:43–43:48
Timestamps: 43:48–46:55
Building for Experts and "Non-Builders"
"The ceiling of complexity that a person that does not start out as technical can now build ... is impressive." – Mike (46:41)
Timestamps: 46:55–51:05
Example: Full Codebase Porting
Workflow Design: Iterative, via chat, then codified in reusable code—this combination is seen as a future mainstay.
"Workflows are a good middle ground: you can compose them using chat but they're expressed using code and then they're executed with a nice, clean UI around what's happening at every stage." – Mike (50:46)