
It's here. The model, the myth, the legend: Mythos from Anthropic has finally dropped. Well, baby Mythos. We're calling it Fable 5. And this new model is crushing benchmarks, but the question is, can it crush my backlog? I got early access to the model and of course, I have my own opinions on where it does really well, where it needs a little work. And the question on everyone's mind, does it live up to the terrifying marketing hype? Let's get to it.
Okay. Let's talk about what Anthropic is telling us about this model, and then we'll get into what I think about it. So this is Claude Fable 5, the first Mythos-class intelligence model to reach GA. Now, if you haven't been paying attention, Anthropic has been marketing, slash scaring, slash warning us about the unbelievable capabilities of Mythos. And it is finally here. Now, they had originally been rolling this out with a couple select companies. I got early access to test what I thought of the model. But you have to know this is not Mythos, capital M, big Mythos. This is baby Mythos. This is Fable. And so it's going to have some guardrails on it, in particular around cybersecurity exercises and biology exercises.
Now, good news. Your girl's working on PRDs. She's shipping SaaS. She's not working on biology quite yet. Although give me a little time and some time to experiment, maybe I'll get there. So this is really going to be focused on what the everyday user, the everyday software engineer, is going to think about when they're using this model. Although I did run into some things that I suspect are a result of the tuning and training of this particular model to be extra safe.
Now, quick, it's not cheap. It's $10 per input token and $50 per output token, and it's going to be a new tier above Opus. And so if you're going to use this model, you're going to pay the price. So what is Anthropic saying? Basically, it's a completely new model class.
So we had Sonnet, we had Opus, and now we have Mythos, the first of which is Fable 5. It's completely state of the art. It is exceeding every benchmark they tested by a significant amount. This 80% on SWE-Bench Pro, you'll look at that compared to some of the more recent models that have come out. Very, very good benchmark performance. And then they're saying it's really good for long, complex tasks.
Now, what are some things that earlier models couldn't do that they are saying now that Fable 5 can do? It's very autonomous, including running days-long, asynchronous tasks. It's really an engineer's engineer. And that's some of the downside I experienced with this model. I'm going to show you a very specific example of where you don't want an engineer doing your work with an engineer's point of view. Proactive. It's very good at vision, exceptionally good at vision. This is a place where I actually really loved the model. And you know me, I'm pretty critical of models, but I did see a step ahead in vision. So that's something we're going to dive into. And then effort. It works hard, it builds harder, it verifies more, it's built for ambitious work.
Now guess what it also can do? It can consume those tokens. So Anthropic has said it consumes rate limits and tokens at about 2x the rate of other models. So again, this is a big boy model and it's going to consume tokens, and some of the things that it's good at, and even some things that they have done in the harness, seem like they're, intentionally or not, token consumers. So we're going to keep an eye out on costs and an eye out on efficiency when using this model.
Again, talking about long-running tasks, Fable 5 is supposed to be able to run for days. So doing long-running planning, being able to spin up sub agents, and I show a little bit about dynamic workflows which are, you know, different architectures of sub agents, and holding multi-day sessions. Now I have done probably days-long sessions with other models.
I didn't have Fable for many days, so I cannot verify that it ran for days. I did get it to run, however, for several hours on some tasks that may or may not have merited that several-hour effort, but it definitely seems like it has both the harness and the intelligence capability to run for a very long time, if that's appropriate for your task.
Now, here's your pros and here's your cons. They explicitly say that Fable works like a seasoned engineer. Unfortunately, if you have worked with a seasoned engineer, you know there's good to this and you know there's bad to this. So it is very complete in its investigation. It's definitely going to go search out all the corners. It's definitely going to think about how it can be 120% sure that it's shipping the right thing. But guess what, that's not always in service of launching and that's honestly not always in service of building a great product. So while you can give it a goal and it will be very autonomous and it will be very thorough, honestly, sometimes you want like a slightly less thorough engineer. Product manager talking, even engineer talking: sometimes you want it to be a little bit dumber. We'll talk about some of the prompting techniques it says and when to use this model. But it's just something to think about when you're working with any high intelligence model: how much intelligence does the task actually take?
Now, as I said before, it is token intensive by design and I did most of my tasks on extra high. And so it was like token burning on token burning. And so they say that high is probably the sweet spot for most work. I used extra high just because I don't want anybody in the comments saying, Claire, you picked high for this task and it should have been extra high and you would have had a better experience. I used extra high. I used all of the brains of Fable, but again, it is very, very token intensive.
My question for any of these models, and this is not an Anthropic model question, this is not a Fable question, is: does this token intensity actually output the right results? And that's a place where I'm just not 100% sure. But again, as us humans in the loop, we're going to have to be much more intelligent about where to put what model and where to use what reasoning and what effort level to match what we're doing. And again, I think that the untrained of us will say, oh, well, I have this Fable model, I should use it. It's better than anything. And honestly, I still think there's a place for good old Sonnet. I think there's a place for Opus, and I think there's a place for other models in the ecosystem.
Now, there are safeguards in this model. And so this is one of the first things that Anthropic told me, testing the model. And this is one of the headlines that they're making in the release, which is there are specific classifiers in this model for cybersecurity, biology, chemistry and distillation. Basically, they don't want anybody doing bad stuff in those categories, in particular with this very intelligent model. What's nice about how they've implemented this, however, is they have this new fallback concept. And so if you get classified into one of these categories, instead of saying like, do not pass go, you may no longer use Fable, it just falls you back to Opus 4.8. This is also a capability in the API now where you can do this graceful fallback to 4.8 if you're using a Mythos-class model.
They also have a 30-day retention policy, used only to catch misuse, and it's not used to train Claude. So while it's still not training Claude, they do want to check the use of this model because they have been and will forever be very cautious about us normies using their intelligent models. And just, you know, for context, 95% of sessions on this model did not hit a fallback.
I don't believe I hit a fallback, but again I'm not doing anything in cybersecurity, biology or chemistry, at least yet.
Okay, so this is the question: is this or is this not Mythos? It is Mythos. Fable has the safeguards. Mythos does not. Fable, all us normies can have in general availability. Mythos is still restricted to these Project Glasswing partners, some of these enterprise-level partners that are really checking it against some cybersecurity use cases. I would suspect that at some point we get some access to a Fable 5-point-whatever or that the Project Glasswing class opens up. But for now we get Fable, Project Glasswing or these pre-selected companies get Mythos, but they are all fundamentally the same underlying model.
A couple product things that are also launching today. Along with the Fable 5 model, Claude Managed Agents are going into public beta. If you haven't paid attention, this is Anthropic's hosted harness, hosted sandbox for running long-running agentic work. I am still trying to figure out what a good use case for Claude Managed Agents is. I will get there, but Fable ships out of the box in Claude Managed Agents. There's also a new advisor strategy where you can use Fable 5 as a senior advisor and use cheaper models as an execution layer. A lot of people are doing this with Opus and Sonnet, and so this is going to work today in the API and in Claude Code and is a strategy you can use. And then as I mentioned, this fallback API, where you can put an optional parameter on the Messages API that allows you to continue blocked requests by using 4.8 at Opus pricing.
Okay, as we said, crushing benchmarks. Look at this. Fable 5 compared to Opus 4.8, GPT-5.5 and Gemini 3.1 Pro. Significant increase in the SWE-Bench Pro benchmark, very far ahead of these other models. And while I wasn't testing the most advanced use cases, I didn't find something that technically it failed at.
So I think these benchmarks are really going to hold, and these benchmarks have outperformed across the board. So this is Anthropic's state of the art model.
Okay, so enough about what they say, let's talk about what I say. What is it actually like to use? So I ran Fable 5 on a bunch of different work and I want to give you my feedback on where I thought it did well, where it needed a little bit of work and where I was really surprised.
As I said before, it's really good at vision. And where is it good at vision? What really impressed me: it's really good at document formatting. So this is super simple, but we've been doing these handwriting documents for my 7-year-old based on classic texts and classic poems. And on the right is Opus 4.8 and on the left is Mythos 5. And it looks so silly, but I really do think Mythos 5 did a much better job of a second grade layout for a handwriting sheet. There's just like the right spacing, it's very clear to read. There's enough white space. I think on the one on the right it's just very dense and even the lines themselves are sort of hard to tell. Do you write above? Do you write below? So I do think that on PDF formatting of documents, and I tested this against a bunch of different models, Mythos 5 really did a good job. So very simple eval for me, but a very, very good one.
Now here's the problem though. The writing is nearly unreadable. So if you're thinking about Mythos for prose, for spec writing, for PRDs, unfortunately it's an engineer. And what's the problem with engineers? They just really get wrapped around the axle on details. And this is a real struggle with these more intelligent frontier models: they're like too smart. And so it's just very, very hard to parse what they're saying. And I'm going to show an example of this in, actually, Claude Code. So I have this concept of a product graph that I'm working on for ChatPRD.
It's actually a fairly complex open source project, and I had Fable 5 go through that and actually do like an adversarial review of my requirements to try to figure out where there were internal inconsistencies in the logic. And it gave me this markdown document that looks very long and intelligent. But if you actually go through it, it's just really hard to parse. It's these like internal references. It's very detailed, but not in a way where you can zoom out. There are these big blocks of paragraphs, like look at this. It is just really hard to see the forest for the trees in this particular model. And I saw this sort of like over and over again. Working with it on specs, it was very complete but nearly unparsable. And that's a real challenge when working with these very, very high intelligence models. Again, I would actually suggest pulling back to maybe a Sonnet or Opus model for specs and then looking at Fable as an orchestrator of execution, where that detail really matters but you don't have to read it.
The other thing that shocked me was how like actually legitimately terribly bad it was at design, or at least a one-shot design. And so I asked Fable to design a skills registry and man alive did it do a very poor job. I mean, I'm not even talking like AI-slop bad; it's like fundamentally terrible design. Gray, black, red, simple outlines, just really, really terrible. Now the Anthropic team suggested that I just needed to be a little bit more detailed in my prompting. I've never had to do this before in, I would say, the last year of models in terms of front end. But even when I prompted it, it was still just not very impressive design. I think there's this real balance between design slop and specificity and just shipping a terrible design. I'm not sure what about Fable 5 resulted in this. I'm gonna have to keep testing it as it rolls out today. But this was a real disappointment in terms of design.
So again you might want to toss an Opus in the mix instead of relying on Fable for design.
It's really conservative on execution. So when I was trying to do that ambitious days-long work, I took a spec and I said, can you ship the V0 of this, the MVP? I said enough that a customer could get value. And the MVP, they just really took minimal to heart. It was like very, very narrow, not actually that useful. And I'm curious whether this comes from some of the safeguards on this model. And it's been a challenge I've seen since the kind of later Opus models: they're not super ambitious. And so again you'll have to think about how to prompt this to get that long-running outcome paired with the right product ambition.
And then I really doubled down trying to test these Claude dynamic workflows and these sub agent designs, trying to see if this would really add value. And the multi-agent capability is definitely there. And I definitely had some successful multi-agent runs kicked off in Fable. But I also ran into a lot of stalls and errors in using multi-agent orchestration. Now I made the mistake: I walked away from my laptop and came back to these sub agents that had stalled after about three hours, and so like egg on my face. But I really want to see how technically the Claude Code model holds up to the promise of multi-agent orchestration. I had some successes and some bugs. I think this is a Claude Code issue, not necessarily a model issue. Although with this promise of long-running, days-long prompts, you really got to deliver technically on the outcome.
So what's my takeaway? I would hand it hard problems. Of course not cybersecurity, bio or chemistry problems, but hard technical problems where being extremely detailed matters, long-horizon work. I would also hand it vision problems where you really want something to look good or you want it to parse PDFs or other documents. It's done exceptionally well there. I was actually really surprised.
I probably wouldn't hand it my front end work, or I definitely wouldn't hand it my front end work, and I definitely wouldn't hand it strategy or spec work. I think it overthinks things. I think its prose is nearly impossible, and so maybe I'll test it again with effort level lower on sort of prose and spec writing. But it wasn't it for that.
That being said, I'm not a hater on this model. I'm definitely not. It definitely has a place in your stack. I'm gonna test it. If you wanna learn more, definitely look up the prompting guide for Fable. It's gonna probably repeat a lot of what I said. Hand it your hardest problems, what this model is good for and what it is not, and how to get a good outcome. That being said, Mythos is here. I cannot wait to hear what you build, what you overbuild and what you make ugly with this new model. Thanks for joining How I AI.
Thanks so much for watching. If you enjoyed this show, please like and subscribe here on YouTube or even better, leave us a comment with your thoughts. You can also find this podcast on Apple Podcasts, Spotify or your favorite podcast app. Please consider leaving us a rating and review which will help others find the show. You can see all our episodes and learn more about the show at howiaipod.com. See you next time.
Host: Claire Vo
Date: June 9, 2026
Claire Vo gives an early user's review of Anthropic's Claude Fable 5, the first "Mythos" class AI model to reach general availability. She covers Anthropic's claims, her personal experience using the model, and where Fable 5 excels or fails for practical work. The episode is a hands-on, candid breakdown meant to demystify cutting-edge AI—from benchmarks and pricing to specific product strengths and surprising weaknesses.
| Area                    | Verdict                                    |
|-------------------------|--------------------------------------------|
| Complex, technical work | Strong; excels with detail & multi-agents  |
| Vision, doc formatting  | Exceptional performance                    |
| Design, front end UI    | Surprisingly bad results                   |
| Spec/strategy writing   | Overwhelming, unreadable output            |
| Token usage             | Extremely high; expensive to run           |
| Safeguards              | Prudent fallback to older models           |
Claire encourages listeners to experiment with Fable 5 for their most demanding tasks—just beware of costs and readability. For further tips, find the model’s prompting guide and share your own “overbuilt” or “ugly” masterpieces.