A (57:28)
arguably there's a symbiotic relationship, where everyone's using Claude Code because it's the best harness for coding at the moment. OpenAI is way behind. Interestingly, from a model perspective, Codex 5.4 is a better model than Opus 4.6. It just is. It's better. They've tweaked it, it's faster, and it doesn't do the nonsense that earlier versions of Codex were doing. But the Codex harness is definitively worse than the Claude Code harness. The interesting thing is that now you can use Codex inside of the Claude Code harness, because people have already hacked it. You can plug any model in, including local models.

So the agentic coding stack is made up of a bunch of things. At the very bottom of the stack is this ball of math: the weights, the training, the training data. That's the thing that costs billions of dollars and giant data centers to produce. As long as the math ball is secure, you're kind of okay. Above that you've got the system prompts and the fine-tuning, all the layers that the frontier labs add on top to make it do things.

Now, one of the interesting things is that you also have a system prompt inside the harness. People have been reading that system prompt and it's quite hilarious. We should try to pull out some of the things in there, because someone noticed it literally just repeats, over and over, "don't do illegal things." It says it something like ten times. And that's actually hilarious, because the state of the art for stopping a model from doing bad stuff is to say it as many times as you possibly can. This brute-force approach of just reminding it everywhere, "don't do illegal stuff," really is the state of the art for getting them to not do illegal stuff. One interesting consequence: once you know what the system prompt is, it's much easier to circumvent it. And we've seen earlier Claude Code system prompts, the thing that sits right above the ball of math, get leaked. There was one that leaked about a year ago. It was like 300 pages, and it reads like a spell, invocations of "hey, math ball, don't do this stuff, do this stuff," whatever.

So you have the model, with all of the reinforcement learning and fine-tuning that's happened to it, and its system prompt, and you could use it raw without anything else. You could literally just talk to that thing and ask it to do things, and it would not be able to do much, because it doesn't have access to tools. It could work this stuff out from first principles, but you have to put a layer on top of the model itself that gives it all of the things it can use to actually do stuff, like writing scripts and running git commands. It needs to know how to do it, why to do it, when to do it. All of that is basically what's in Claude Code. It's what makes it really good. A bare-bones version of that layer looks something like the sketch below.
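To make that concrete, here is a minimal sketch of what a harness adds on top of the raw model: a system prompt, a couple of tools, and a loop. Everything here is hypothetical, a bare-bones illustration rather than Claude Code's actual design; the tool names, the JSON tool-call convention, and the pass-in-any-model shape are all assumptions.

# Minimal harness sketch: the raw model is just a messages -> text function,
# so the harness supplies tools, instructions for using them, and a loop.
import json
import subprocess

SYSTEM_PROMPT = """You are a coding agent. To use a tool, reply with only JSON:
{"tool": "<name>", "args": {...}}. Tools: run_shell, write_file.
Do not run destructive commands."""  # real system prompts repeat rules like this

def run_shell(cmd: str) -> str:
    # Give the model shell access, e.g. for git commands and running scripts.
    result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    return result.stdout + result.stderr

def write_file(path: str, content: str) -> str:
    with open(path, "w") as f:
        f.write(content)
    return f"wrote {len(content)} bytes to {path}"

TOOLS = {"run_shell": run_shell, "write_file": write_file}

def agent_loop(task: str, call_model, max_steps: int = 20) -> str:
    # call_model is any function mapping a message list to a text reply:
    # Claude, Codex, or a local model all slot in here.
    messages = [{"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = call_model(messages)
        messages.append({"role": "assistant", "content": reply})
        try:
            call = json.loads(reply)  # model asked to use a tool
        except ValueError:
            return reply              # plain text: treat as the final answer
        if not isinstance(call, dict) or "tool" not in call:
            return reply
        tool = TOOLS.get(call["tool"])
        output = tool(**call["args"]) if tool else f"unknown tool {call['tool']}"
        messages.append({"role": "user", "content": f"Tool output:\n{output}"})
    return "step limit reached"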
One interesting thing about this is that each model is quite idiosyncratic in terms of what prompts work on it, because the math ball at the bottom is completely inscrutable. No one has any idea how it works. There is no one in the world who understands what the fuck these things are doing inside the ball of math. It's literally only through iterative testing and poking at it that you work out how it behaves. So the system prompts, the harnesses, the tool use, all of that stuff is quite different for different models. Different models have different needs and personalities. You can take different models and plug them into a different harness and they work quite differently. It's not a very deterministic thing, unfortunately; they're quite stochastic.

So what will be interesting, I think, is everyone taking all of their tricks, this hard-earned empirical data that people have extracted from interacting with these models, and applying them to open-source models: all of the tool use, the harnesses, the loops, all of the things that Claude Code has done. Arguably this will be very good for open-source models, because they will get much better as people figure out how to apply all of the hard-won learnings. And because the harness only needs a model behind it, the swap itself is cheap, as the adapter sketch below shows.

But building harnesses is one of the most frustrating things, because every three months everything you've done gets invalidated by a new model. My guess is that Anthropic will release a new model very soon, like in the next week. That's my hot take. That model will probably invalidate all of the random shit that was inside Claude Code, because it'll just do things in a very different way and react to things in a different way, and they probably already have a different version of Claude Code that's been fine-tuned for this new model. Interestingly, we saw the leak of that: there was the Mythos leak, where they accidentally put up the website announcing the new model. So my guess is we will see some stuff happening over the next couple of weeks, and as soon as Anthropic does a new model, OpenAI is forced to respond, et cetera, et cetera. It should be a fun couple of weeks from here, I would say, based on the fallout of this situation.
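To make the "plug any model in" point concrete: since the loop sketched earlier only needs a messages-to-text function, swapping the underlying model means swapping that one adapter. A minimal sketch, assuming a local model served behind an OpenAI-compatible endpoint (llama.cpp and Ollama both provide one); the URL and model name below are placeholders, not a recommendation.

# Swapping the math ball under the same harness. Assumes a local model
# served behind an OpenAI-compatible endpoint; URL and model name are
# placeholders for whatever you are actually running.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="unused")

def call_local_model(messages) -> str:
    response = client.chat.completions.create(
        model="qwen2.5-coder",  # placeholder: whatever model the server hosts
        messages=messages,
    )
    return response.choices[0].message.content

# Same harness, different model:
# agent_loop("fix the failing test in tests/", call_local_model)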