Loading summary
Host or Presenter
Breaking news out of OpenAI. A brand new model, or really an update of a very popular model has just been released. And Sam Altman just tweeted out and said, good new model out. And he's referencing an update that OpenAI just tweeted out where they said, GPT4O got an update, have a little party emoji. They said the model's creative writing ability has leveled up. More natural, engaging and tailored writing to improve relevant relevance and readability. It's also better at working with uploaded files, providing deeper insights and more thorough responses. Okay, I'm going to unpack everything going on here because this is absolutely fascinating and there's a lot of interesting things people have criticized, but also people are sort of applauding with this. Before I do that, I will say if you haven't already joined the AI Box waitlist over on AI Box AI for my AI tool that is coming out shortly, I would love you to join. And if you want to get daily AI news straight into your inbox with just pure signal, check out the bottom of AI Box AI. There's a little newsletter thing you can fill out, you can get on the waitlist, get the newsletter and we'll be sharing every single day. My top three AI stories. I try to make this short, concise, useful. So check it out. I hope this is really valuable to you. Link is in the description. All right, so over on. Over with everything happening over on OpenAI, what I think is really fascinating here is they haven't really announced if you know, per se. They went and did like a retrain of this model and people are betting that they haven't. This is just some essentially almost like software updates or tweaks they were able to do to make this a little bit better, maybe even better prompting as we feel like is happening with O1 preview. So this is really interesting. Um, Regist, a lot of the comments on this update are people, you know being disappointed. This isn't GPT5. There's someone that said can we get pinned chats folders in the chat history sidebar? Thanks. Absolutely. This is something that I've been begging them for for like, for like two years now. So I think this is really interesting. Someone said lol. Everyone is waiting for O1 OpenAI another update for 4.0. Okay, so it gets deeper into what is so interesting that's going on here. So there was lmarena AI that just tweeted. So by the way, this is formerly lmsys.org, which is kind of like this benchmark platform that everybody uses. It's now called Chatbot Arena. Chatbot arena is kind of the name, but anyways, it's LM Arena AI is the website. You can see that all these websites are getting more popular. They're buying better domain names than limsys.org now. Lmarena AI makes a lot more sense. Anyways, this is what they said in a recent tweet about this whole announcement and what came out and if it's actually better, they said over the past week, oh, and by the way, I'll just explain how this Chatbot arena works is essentially users can go on there and they're given like two different chat responses and they pick which one they like better. And so it's the Chatbot arena, it's anonymous. You don't know which chat bot is giving you there, giving you the response. There's know hundreds of chatbots on there. You go talk with it and just pick which responses you like better. It pits the best AI models against each other. And whoever gets the highest score by the end essentially rises in the rankings. I love this because it's, it's not just like voting on, oh, I like OpenAI better than I like Google. It's Anonymous. So it's essentially just kind of like a blind test on what one has the best responses. Okay, so this is what they said. They said over the last week. The latest OpenAI ChatGPT420, 2024, 1120 completed anonymously or competed anonymously as, quote, anonymous chatbots. They just called it Anonymous chatbot. It gathered 8,000 plus community votes. The result, OpenAI has now reclaimed the number one spot, surpassing Gemini XP exp 1114 with an impressive 1361 score. Okay, what this means is just a week ago, Gemini came out with a model that shot to the top of the rankings on the Chatbot Arena. And essentially everyone was like, oh my gosh, OpenAI anthropic. All these other companies are toast. Google's coming and smoking them with Gemini. I mean, to be fair, like it did for a week. But it's funny because you could probably assume the first day that Gemini hit that top chart, OpenAI put in this new model. Sometimes I wonder if Open Eyes just sitting on the sidelines with models ready to go in case something big happens. I just pushed this. Or maybe it's not even the best thing they got, but they're like, oh, this is significantly better. It was just like they don't want to get bumped down from the number one spot on this, right? They, they never want to be seen as number two. They always want to be number one. So they were number two for about a whole week while this new chatbot was competing anonymously. They. And then this is what LM arena tweeted about it. They said, latest GPT4O shows remarkable improvements. We see a leap in creative writing. It went from a score of 1356-1402. So up, you know, almost 40, they said, as well as technical domain coding and math. So this is the exact rankings they bumped from. Overall. They bumped from, of course, the number two chatbot to number one on style control. They went from number two to number one on creative writing, number two to number one on coding, number two to number one on math. They went from number four to number three. So they still have some work to do on math. On hard problems, which a lot of people call logic reasoning, I believe they went from number two to number one. So then they said, huge congrats to OpenAI. More analysis below. So this is fascinating because I love when we have these models come out and beyond, just like looking at Reddit or something where people are complaining about which one's better or worse for different use cases. This, the Chatbot arena is a pretty, in my opinion, it's a pretty good take on this. Now, this isn't without criticism. Some people are saying OpenAI is playing chatbot arena and they're not making their models better. They're just figuring out how to work game the system a little bit. Right. Okay, so I want to break you down into this analogy, but first things first. I mean, I do want to give a kudo to OpenAI for continuing to push. Some people have complained that, yeah, they're focusing on GPT 4.0 when 01 preview is still this beta thing and everyone wants it to actually be live and available. So LM arena posts essentially a bunch of screenshots of how much better the GPT 41120 is. And it's pretty impressive when you are comparing this thing to other AI models. For example, you know, it's, it's the only model that has cracked above, above 1400. And you can, you can see like on the very far end, you know, below, below 1300, you have models like the Llama 3.1 405B, which is essentially Meta's very, very best model. So Meta's very, very best model, llama 3.1405 B, 400. Yeah, whatever. 405 billion parameters. That is somewhere around, you know, 1250, 1260 and we're already having this one up in the 1400. You know, it's past 1400. Really, really impressive now. Are they just gaming, you know, the benchmark? What's going on here? I mean, side note, funny they had a typo when they first released this where they said it was called 1111, but it's 2011. Whatever, they fixed it. So in any case, some people are looking at all of these benchmarks. They're saying they're shocked that Claude 3.6 sonnet is number eight in the arena. So sometimes it's shocking. Right? I know a lot of people that say they love Claude, it's got a better tone and style, they use it for everything. And so yeah, to me this is kind of funny. Um, but this is. And then, you know, in response to that someone said, I'm starting to think that Claude Bros are paid shills. They swear up and down it's the best model ever. But the benchmarks say otherwise. So you know, there's, there's some fighting words in the comments here. This is, this is what I think is interesting though. John Howell said it appears they released it anonymously to gather data, fine tune and over fit on these leaderboard evaluations. Don't trust this leaderboard too much as it's being gamed by OpenAI and others. So someone thinks it's being, someone thinks it's being gamed, which I think is interesting. And that was a AI analytics consultant that said that. Not, I don't think anyone too, too famous. But this is interesting. This is a big point that I've seen in a bunch of these different comments. Someone said, I'm still puzzled how O1 Mini continues to beat 01 Preview at coding. In my experience, if I get, I'll use 01 mini coding on the web and when it gets stuck on something, there's usually a good chance o1 preview can solve the issue. Okay, this is other people talking about other models because they're all in the same screenshot. So bottom line, 4.0 is getting better but at the same time most people are still waiting for, you know, the O1 preview to come out. They're saying, you know, if they're complaining that maybe it's not as good and maybe that's the reason it hasn't come out fully is because it's just not as good at everything because yeah, in the benchmarks people really, really are loving the, you know, the GPT4 and it's ranking higher than 01 preview, which is at like spot number three. Just under Google Gemini. So Google Gemini's newest model is beating like this 01 preview, this brand new model that's supposed to come out and crush everything. So yeah, yeah, that's. That's pretty, pretty crazy. And somehow the O1 Mini, meaning like the smaller, condensed, faster model is doing better or has. Is. Yeah, scoring higher, unlike for coding, than 01 preview. So there's a lot of things that are a little bit perplexing here. Overall, you got to give a big kudos out to the whole team over at OpenAI. Just everything that they have been able to put out and all of the work that they've done that you know that the benchmark speaks for itself. This thing is really accelerating. So if you haven't tried it already, I would encourage you to go over to Chat GPT and try out the new model, try out some of the updates. You know, you creative, you know, article about Egypt, which I just did to test it out, and you can see how much, you know, I don't know, like how interesting it is. Okay, so I just said write a creative article about Egypt. I was just testing it. Alleged creativity. I'll give you the first blurb and you can, you could decide where we're at. Said Discovering the wonders of Egypt. A timeless journey through history and culture. Egypt, a land of timeless allure, stands as a bridge between ancient civilization and modern life. Renowned for its iconic landmarks, vibrant culture, and profound historical significance, Egypt continues to captivate travelers and historians alike. Okay, it sounds a little bit Chat GPT. It honestly sounds a little bit less strange words, but somehow timeless allure sounds kind of funny to me, even. I don't know, some things that just sound a little too cliche. Vibrant culture, I don't know, that's just me. But overall, I am noticing some of the words that they're saying. It seems a little bit less chatgpt. I haven't heard it say the word delve. Delve is a clear giveaway if it says let's delve into this, that it's chatgpt. So sometimes I wonder if they just like manually turned off some of those words or if they've, you know, put some extra prompts in here to try to make this thing sound a little bit more natural. So overall, you know, creativity is up. How, how the sounds is up, the tone is up a little bit. Up to you to decide if this really competes with Anthropic Claude. But overall I'm really excited and I'll keep you up to date on everything happening with this story. Seems like OpenAI is not dead in the water. This is something you got to love about a small startup. Well, people come that you know that like Google or others are like, oh, they got stagnant. Which by the way, I don't think that's the case with Google right now. They're really cooking with a lot of these new models. But while you can claim that some of the bigger companies are stagnant, OpenAI, you have to admit, is just doing things so scrappy. Like a startup like throwing out anonymous try to rank and outrank Google. You gotta love to see it. So they definitely have a lot of fight left in them. Absolutely fascinating. Again, if you're interested in getting the waitlist for AI Box, go over to AIBox AI and check out my newsletter that we send out every single day of our top three three AI Stories. It's at the bottom of the AI Box website and you can get those straight to your inbox. And hope you thanks so much for tuning into the podcast today. I will catch you next time.
Episode: ChatGPT Got an Update, 40 Beats Google Again
Release Date: December 1, 2024
The episode kicks off with the host breaking down significant news from OpenAI regarding the latest update to their popular model, GPT4O. Announced via a tweet from Sam Altman, the update enhances GPT4O's creative writing capabilities, making it more natural, engaging, and tailored for improved relevance and readability. Additionally, the model now excels in handling uploaded files, offering deeper insights and more thorough responses.
Notable Quote:
“GPT4O got an update... the model's creative writing ability has leveled up... it's better at working with uploaded files, providing deeper insights and more thorough responses.”
[00:00]
A significant portion of the discussion centers around the Chatbot Arena, previously known as lmsys.org, now rebranded to LM Arena AI. This platform anonymously pits various AI models against each other, allowing users to vote on the best responses without knowing which chatbot is generating them. Recently, GPT4O reclaimed the number one spot, surpassing Google’s Gemini XP 1114 with a score of 1361, after Gemini had briefly led the rankings.
Notable Quote:
“OpenAI has now reclaimed the number one spot, surpassing Gemini XP 1114 with an impressive 1361 score.”
[Various timestamps]
The host speculates whether OpenAI strategically timed the GPT4O update to counter Gemini’s rise, maintaining their dominance in the arena. This maneuver showcases OpenAI's commitment to staying at the forefront of AI advancements.
The update has sparked a mix of applause and criticism within the AI community. Some users express disappointment, expecting a more substantial update like GPT5 instead of what appears to be incremental improvements. There are also calls for features such as pinned chats and folders in the chat history sidebar, which have been long requested by users.
Notable Quote:
“This isn't GPT5. There's someone that said can we get pinned chats folders in the chat history sidebar? Thanks. Absolutely.”
[Early in the transcript]
Moreover, skepticism arises regarding the integrity of the Chatbot Arena rankings. Critics argue that OpenAI might be optimizing GPT4O specifically to perform well in these benchmarks rather than genuinely improving the model's capabilities.
Notable Quote:
“John Howell said it appears they released it anonymously to gather data, fine tune and over fit on these leaderboard evaluations. Don’t trust this leaderboard too much as it's being gamed by OpenAI and others.”
[Mid-transcript]
The host delves into the possible strategies behind OpenAI’s swift response to competitor advancements. He suggests that OpenAI might be strategically holding back improvements, ready to release significant updates when needed to maintain their top position. This proactive approach reflects OpenAI’s agility and competitive spirit in the rapidly evolving AI landscape.
Notable Quote:
“Sometimes I wonder if OpenAI’s just sitting on the sidelines with models ready to go in case something big happens.”
[Mid-transcript]
Transitioning to a more personal note, the host shares his experience testing the new GPT4O model by prompting it to write a creative article about Egypt. He observes improvements in creativity and tone but notes some clichéd phrases, suggesting that while the model has advanced, there’s still room for refinement.
Notable Quote:
“Overall, I am noticing some of the words that they're saying. It seems a little bit less ChatGPT. I haven't heard it say the word 'delve.'”
[Later in the transcript]
This hands-on testing underscores the advancements in GPT4O’s ability to produce more natural and engaging content, aligning with the update’s advertised enhancements.
In wrapping up, the host commends OpenAI for their relentless drive and innovative spirit, highlighting their ability to compete fiercely with giants like Google. He emphasizes the importance of platforms like Chatbot Arena in providing unbiased evaluations of AI models, despite some controversy over potential gaming of the system. The episode concludes with an encouragement for listeners to explore the updated GPT4O model and stay informed through AI Box’s waitlist and newsletter.
Notable Quote:
“Overall, GPT4O is getting better... a big kudos out to the whole team over at OpenAI.”
[Final segments of the transcript]
This episode provides an in-depth analysis of the latest developments in AI, particularly focusing on OpenAI’s GPT4O update and its implications in the competitive AI landscape. Through detailed discussion and personal insights, listeners gain a comprehensive understanding of the evolving dynamics and future prospects in artificial intelligence.