A (24:13)
And we're back. Like every great middle manager, ChatGPT-5's router creates more work based on its own interpretation of what's going on. And as a separate large language model, I can't imagine it has a ton of training data available. If I had to guess (and this is a guess, by the way), OpenAI has done and will do a lot of fine-tuning and reinforcement learning to make it work. Though, to give it a little grace, this is a new thing that it's doing, and it's doing so at a huge scale. The problems start, by the way, with the fact that ChatGPT-5 is taking the user's initial prompt and then deciding which model to use. Previous models sent your prompt directly to the model along with the static prompt, which was cached and came first, an important feature in how these models limit token burn. Instead, OpenAI starts with a router model that takes what you ask and tags it based on what kind of thing your question might need. The thing might be a tool, such as whether it has to do a web search to spit out the thing at the end, a reasoning model, whether it needs to use a coding language, and so on and so forth. Once ChatGPT has bounced your query across various models, burning compute along the way, it then pushes it towards the chat portion of the generation. And each time you ask ChatGPT a question or to do something, a new specialized static prompt is generated, sometimes several, making it impossible to cache them in advance. In simpler terms, each time you message it, ChatGPT has to dump all cached information and instructions for what you need to do and reload them with each prompt.
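To make the caching point concrete, here's a minimal sketch. This is entirely hypothetical (not OpenAI's actual code; the tag names and prompt text are made up), but it shows the mechanism: a prefix cache is keyed on the exact leading text of a request, so if a router reassembles the instructions per prompt, the key rarely repeats and the cache rarely hits.

```python
# Hypothetical illustration of prefix caching, not OpenAI's real system.
# A prefix cache is keyed on the exact leading text of a request; if the
# router assembles different instructions per prompt, the key rarely repeats.

prefix_cache = set()

def build_static_prompt(tags):
    """Assemble instructions from whatever the router tagged (made-up format)."""
    return "You are ChatGPT. Enabled components: " + ", ".join(sorted(tags))

def serve(user_prompt, tags):
    """Return True on a cache hit for this request's instruction prefix."""
    prompt = build_static_prompt(tags)
    hit = prompt in prefix_cache
    prefix_cache.add(prompt)
    return hit

# One fixed static prompt (the old style) hits the cache after warm-up;
# per-request tag combinations (the new style) mostly miss.
```

With a single fixed prefix, every request after the first is a hit; once the tags vary per request, each new combination starts cold.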
Now, here are some examples of what ChatGPT-5 has to reload every single time you prompt it: whether or not to use a browser or search the Internet, and under what conditions to do so, because those conditions will change with each prompt. How to approach a particular problem based on what the user asked, including any specific ways you want it to answer, tone, brevity, and so on. Specifics around how it might use, say, OpenAI's code interpreter, such as the usage rules for running a Python script, or how you want the code's output, which, again, will be different based on each prompt, even if you say to do it in exactly the same way. And because it's a large language model, it may hallucinate something different every single time. Every single goddamn time you prompt ChatGPT-5, it has to do this. Worse still, a particular conversation can involve you using multiple different models and tools, requiring it, with each and every prompt, to inject a different static prompt for each component that ChatGPT-5 uses. And you can't cache the static prompt before the user's intent is known, because if you did, it might send an instruction to a model that doesn't make sense, such as telling a reasoning model to give a quick and simple answer, or a mini or nano model to do some sort of deep reasoning, which would create a crappy answer and burn tokens in the process. And this is all thanks to the complicated way that OpenAI insisted on building GPT-5: every single time you send something to ChatGPT, you can trigger it to use a different series of models (audio, vision, reasoning), each with their own instructions and static prompts, all while pulling in different tools, each requiring their own instructions based on what you asked. And reasoning models even have different depths of reasoning. Unlike 4o, which is a multimodal model combining text, vision, and voice, GPT-5 is a rat king of OpenAI's models and tools that gets reborn every single time you ask it to do anything.
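As a back-of-envelope illustration of the repeated injection described above: the numbers below are invented (500 instruction tokens per component and the turn/component counts are my assumptions, not OpenAI figures), but the scaling is the point.

```python
# Hypothetical arithmetic: if every turn re-injects a fresh static prompt per
# component instead of reusing one cached prefix, instruction-token cost
# scales with turns x components. All numbers here are illustrative.

def instruction_tokens(turns, components_per_turn, prompt_tokens=500):
    cached_once = prompt_tokens  # one reusable, cacheable prefix
    regenerated = turns * components_per_turn * prompt_tokens  # per-turn burn
    return cached_once, regenerated

cached, burned = instruction_tokens(turns=10, components_per_turn=3)
# cached == 500, burned == 15000
```

A ten-turn conversation touching three components per turn pays for thirty instruction blocks instead of one, before a single word of actual answer is generated.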
It can prompt cache some things, but the core instructions, not so much. But let's get a little more granular, because I know I've been quite repetitive, but this is detailed. So, from what I've been told, there are either one or two models at work for the routing. I'm going to go with what I think is most likely based on the discussions I've had with people familiar with the architecture. I've heard the term orchestrator thrown around, potentially suggesting the router may be more omnipresent throughout the process, but I was unable to confirm its existence. Reach out if you hear differently. I'll explain things as they were explained to me, though. When a user sends a prompt, it goes through the splitter leg, which decides to send the query down one of two paths. One is called the fast path, where a query is straightforward, such as a text-only conversation that doesn't require any analysis or extra tools. The other is a thinking path, where the query may require reasoning or more complex tools like code generation or access to a web browser for research. To be clear, there are prompts that may be split into multiple parts that trigger multiple models or tools, each requiring their own static instructions. From what I understand, the splitter model is a completely separate large language model, though we don't have a ton of details about it. I also, based on conversations I've had, think there's a chance there could be a separate model that sits above the splitter that does a higher-level classification of how a query might be routed. So when you ask it to do something, it might just go, okay, this looks like it needs a tool. But I'm speculating now. In any case, none of this can be cached, because all of this exists before inference, which, by the way, is a term I've misstated in the past as meaning something like inferring meaning. Inference is everything that happens to get an output to you.
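A toy sketch of the splitter step as it was described to me, with keyword rules standing in for what is reportedly a separate LLM call (the hint list and the length threshold are my own invention, purely for illustration):

```python
# Hypothetical stand-in for the splitter: route a query to the fast path
# (plain chat) or the thinking path (reasoning / tools). In reality this is
# reportedly a separate large language model, which is why the step itself
# costs compute before any actual inference on your question begins.

TOOL_HINTS = ("code", "search", "browse", "chart", "image")

def split(query: str) -> str:
    q = query.lower()
    if any(hint in q for hint in TOOL_HINTS) or len(q.split()) > 40:
        return "thinking"  # reasoning model plus per-tool static prompts
    return "fast"          # lightweight chat model, minimal instructions
```

The real classifier is presumably far subtler, but the structural point holds: this decision happens before inference, so its output (and the static prompts it implies) can't be cached ahead of time.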
So, all of this stuff that's happening, by the way, is a completely new cost that OpenAI has created. No one does this like this. It's so fucking stupid. But now we get to the chat leg. Now that OpenAI has added layers of abstraction, it can begin cooking up the output, by which I mean do inference. The chat leg is where the pieces that the splitter model created are pulled together, each loaded in with their respective static prompts based on what the user asked ChatGPT-5 to do. Each piece of the model (a tool to generate Python, an image generation tool, a reasoning model to generate an output) has to process an entirely new static prompt. And again, that's every interaction. Remember, static prompts are effectively instructions. So the splitter model has told each piece of the pie how to act to create a particular output. As a result, much of this can't be cached, creating more and more repetitious token burn per response, and requiring me to repeat this stuff so that you really get it. The upshot of the chat leg's static prompt baggage is that you can do a little more here, at least in theory. Because each component can be instructed separately, they can, again in theory, be made to give more individualized, specialized outputs, like creating an image with text that is, as I'll give an example of very shortly, generated using a specific reasoning model. I'm clutching at straws here. I don't really know if this is better, but I'm trying to be reasonable. I'm trying to be normal. Every day I try and be normal. Previously, OpenAI's advantage was that a model like 4o was kind of a jack of all trades. But to get the "benefits" of ChatGPT-5, it's engaged a conductor model that can just make things more convoluted, even in the case of simple requests. Let me give you an example. You upload a chart of NFL players' stats and ask ChatGPT to decide which is the best of the group and create an image to show the results.
In GPT-4o, ChatGPT would use one model, and thus one static prompt, to look at the image, decide which tools to use, and then how to format the response. You only needed one static prompt, which was cached, because one model can look at the stats, pull the data, make the decisions, and then use the image generation tool to make the final image. In GPT-5, the ChatGPT conductor model would see the stats and route it to a vision model, requiring its own static prompt, then to a separate text-only reasoning model, one that has no ability to use tools but might be cheaper to get an answer from, which also requires a static prompt, and that would then decide which players are best and spit out an output, and then route it to a completely separate model that can generate text to query the image tool, again needing a static prompt, to then generate the image. On top of all this onerous baggage lies another problem: GPT-5's various models are just more complex. By splitting out the component elements of what a model can do and allowing each model to have different levels of reasoning, even the cheaper ones like mini and nano, OpenAI has created an endless combination of different reasons to have to make a brand new static prompt instruction, all automated by a router, a large language model that chooses what large language model to choose for a query. It is, if I'm honest, kind of funny. Reasoning models work, when simply described, by breaking up a prompt into component pieces, looking over them, and deciding what the best course of action might be. ChatGPT's router is effectively an abstraction hire, breaking up the prompt into component pieces, then choosing different models for each of those pieces, which may in turn be broken up by a reasoning model. While I wouldn't say this is a hat-on-a-hat situation, it is at this point unclear what exactly the benefits of ChatGPT-5's new architecture are.
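The NFL-chart example above can be sketched as a simple stage count. The structure is hypothetical (the stage names come from the description in this episode, not from OpenAI); the only claim it makes is how many separate static prompts each architecture implies for the same request.

```python
# Hypothetical contrast of the two architectures for the chart example.
# Each stage stands for a component that receives its own freshly
# generated static prompt; the stage count is the whole point.

def gpt4o_pipeline(request):
    # One multimodal model covers vision, reasoning, and tool use,
    # so a single cacheable static prompt serves the whole request.
    return ["multimodal model"]

def gpt5_pipeline(request):
    stages = ["router"]                          # splitter/conductor LLM
    if request.get("image_in"):
        stages.append("vision model")            # its own static prompt
    stages.append("reasoning model")             # its own static prompt
    if request.get("image_out"):
        stages.append("image generation model")  # its own static prompt
    return stages

request = {"image_in": True, "image_out": True}
# len(gpt4o_pipeline(request)) == 1, len(gpt5_pipeline(request)) == 4
```

Same question, same answer, four instruction-bearing components instead of one, each regenerated per interaction.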
Fewer hallucinations? Better answers? Based on what I've been told, this was a decision made to increase the model's performance. What I can say is that this very likely increased OpenAI's overhead at a time when it needs to do the exact opposite. Even if ChatGPT-5 pushes people towards cheaper models, it does so while guaranteeing extra costs and latency, and whatever signals it may learn as people use it will have to create significant benefits, massive 100%-plus gains, for it to be anything close to worthwhile. While OpenAI's router may be smart in terms of the nuance of how it might answer a query, and even that I question, it most decidedly is not more efficient, and may have actually increased the burn rate for a company that will lose as much as $8 billion this year. And I think that number might be low, too. Yet what I'm left with in writing this script is how wasteful all of this is. OpenAI, a company that has already incinerated upwards of $15 billion in the last two years, has chosen to create a less efficient way of doing business as a means of eking out modest-at-best performance improvements. It just sucks. In our own lives, we're continually pushed and pressured and punished if we get into debt, judged by our peers and our parents if we spend our money recklessly, and if we're too reckless, we find ourselves less likely to receive anything from credit to housing. Companies like OpenAI live by a different set of standards. Sam Altman intends to lose more than $44 billion by the end of 2028 on OpenAI, and graciously told CNBC, like Lord Farquaad, that he was willing to run at a loss for a long time, where he was treated like he was this smart, reasonable decision maker rather than someone that needed to rein in their horrendous spending habits and be more mindful.
The ultra rich are rewarded far more for their errant spending habits than we ever are for any thriftiness or austerity measures we make, and none of us are afforded the level of grace that clammy Sam Altman has been, and it hardly feels appropriate. ChatGPT-5 is an engineering nightmare, a phenomenally silly and desperate attempt to juice what remains of the dying innovation and excitement within the walls of OpenAI. It's not November 2022 anymore, and let's be honest, there really hasn't been anything exciting or interesting out of this company since GPT-4. There's nothing exciting happening at this company. As many as 700 million people a week allegedly use ChatGPT, but nobody can really say why. And OpenAI, despite its massive popularity, cannot seem to stop losing billions of dollars, and it can't seem to explain why that's necessary other than "this shit's really expensive, dude." Can anyone actually articulate a reason why we need to burn billions of dollars to do this? What are we doing? Why are we doing it? Has everybody just agreed to do this until it becomes completely untenable? Do we all yearn for the abyss so much that we can't find camaraderie in admitting we were wrong? Look at GPT-5. This is, if you believe the hype, the best funded, best resourced company in the world, with the greatest mind at its helm and the greatest minds within its walls. And this is the best they've got: a large language model that chooses which large language model will answer your question. Gee fucking whiz, Sam Altman, sounds dandy. And how much better is this, you say? Oh, you can't really say. Fucking brilliant. Hey, does it do anything new? No. Oh, what's that? It's actually our job to work that out for ourselves. Thanks, man. I love it. I love this shit. And if you're someone that is a hype merchant listening to this, and you've done really well getting to the end of the third part, by the way, I respect you.
I want you to email me and explain why they should be justified in burning billions of dollars. If you tell me Uber, if you tell me AWS, I will eat you alive. I mean that. I mean that completely literally. I will unhinge my jaw. I'll eat you like Kirby. I've said that one before, but I'm going with it in any case. This three-parter has also really reminded me how ridiculous this is, how nonsensical things have become, and how much waste has been kind of justified, justified on this idea that this will become something, by people that don't really know what it does today or might do in the future. None of this is going to end well, and not even the boosters seem to be having fun anymore. Everybody's just flailing around waiting for it to end. Even Sam Altman seems tired of it all. I know I bloody well am. I thank you for listening to Better Offline. The editor and composer of the Better Offline theme song is Matt Osowski. You can check out more of his music and audio projects at mattosowski.com. You can email me at ez@betteroffline.com or visit betteroffline.com to find more podcast links and, of course, my newsletter. I also really recommend you go to chat.wheresyoured.at to visit the Discord, and go to r/betteroffline to check out our Reddit. Thank you so much for listening. Better Offline is a production of Cool Zone Media. For more from Cool Zone Media, visit our website, coolzonemedia.com, or check us out on the iHeartRadio app, Apple Podcasts, or wherever you get your podcasts.