Transcript
John Doe (0:01)
Welcome to the AI Chat podcast. Today we're going to be talking about some breaking news out of xAI, the company behind Grok. There's been a ton of beef going on between OpenAI and Elon Musk, between Sam Altman and Grok, and all that kind of stuff. And today the new flagship model, Grok 3, has just launched. I actually stayed up last night to watch the live stream, and it was pretty interesting. They unveiled a bunch of new metrics that pretty much have Grok 3 beating ChatGPT and every other model, not by an insane leap, but by some significant numbers. I'll be breaking down all of that and also showing you a live demo, because I have the X Premium subscription, or whatever it is you need, to access Grok 3. Before we get into the episode, I wanted to mention: if you've ever wanted to start an online business or use AI tools to grow and scale your current company, I have an exclusive Skool community called AI Hustle, where every single week I record videos that I don't post anywhere else, showing you the AI tools I'm using to grow and scale my companies and the different side hustles I'm doing. My co-host Jamie made over $25,000 last year on a side hustle with Amazon, and this year he's using AI to scale that up. We break that down, along with dozens of other videos, in a classroom section, and there are over 300 members who all talk and share their ideas. I'd love to have you as a member of the community. It's $19 a month; in the past it was $100, so it's at a discount right now, and if you lock that in, the price won't be raised on you. The link is in the description if you want to check it out, and I'd love to help you take your business to the next level using AI. All right, let's get into the episode. So what's happening with Grok? This is, of course, their latest flagship model.
They did this whole live stream last night, which, okay, side tangent: whenever they do these live streams, they always say they'll start at a specific time, and I've just noticed with Elon Musk and all of his companies, they never do. For the Tesla live stream, I know they had some sort of issues, but I waited on the stream for like 50 minutes before it actually started. For this one, I only had to wait about 20 minutes beyond when they said it would start. It always drives me crazy. I will say, maybe it's a good marketing thing, because the number of viewers on the stream went from a hundred thousand to 200,000 to 400,000, and 20 minutes in there were like a million people watching a live stream that hadn't even started yet. So maybe there's a marketing strategy involved in all of that. But anyway, that's my only criticism of the whole thing. This thing is really impressive, with a bunch of new capabilities. The big one, which I think they pushed the launch back a little for, was reasoning. When DeepSeek came out and totally swamped the whole field about a month ago, OpenAI and Google Gemini both, within like two weeks, released their own updates to their reasoning models and deep research models. Grok obviously couldn't launch without that when all the other top players had it, so they've launched that as well, which has been pretty interesting. Now, if you go over to grok.com or the mobile app, these are the two places where it's updated. You'll see a dropdown where you can switch to Grok 3, their latest model, and they have a mode called Think. They're going to release their deep research feature later, I believe. I was actually testing out Grok 3 today, and I'll admit I had mixed results with it.
Now, I'll show you from my history here that it does some pretty impressive things. So I was like, I've got to test this out. I was going to Walmart this morning to get some stuff for my car, because my wife got pulled over last night; one of the brake lights on our truck is out. A traumatic experience for her. It's actually the first time she's been pulled over in her life, which is hilarious because she was driving my truck, so, you know, go figure. So I was testing Grok out, and I asked it what kind of blades I need for the windshield wipers on my truck. It said for a 2006 Toyota Tundra you'll need 19-inch windshield wiper blades. All right, I took it on faith, and guess what? It lied to me. I bought them, came back to the car, and they were way too short. I needed 26 inches. So it was definitely off on that. And the annoying thing was, while I was in the store I also asked it what type of brake light bulb I needed, and it gave me one. I was kind of doubting it, because it's just this random bulb that says 7443 on it, and I'm like, ah, I don't know. So I googled it, and it was right. And I'm like, okay, it was right about the bulb, so it was probably right about the blades. Oh man, I picked the wrong one to verify on Google. Turns out I needed to go back and get different blades. Here's the thing I did find quite impressive, though. First I asked, I have this truck, what kind of blades do I need? Then I just said, what type of brake light bulb do I need? And it automatically jumped to the right assumption.
It's like, I'm assuming you're talking about the same truck you just referred to, so this is what you need. It tells me the bulb type. It also tells me what wattage and voltage to look for, which was pretty useful. Then it told me I'd probably want to get two of them, because the passenger side needs one too. It also listed common brands, which was useful, because I think I actually ended up buying a Sylvania, and I knew from what it said that it was probably the right one. Then it gives me a bunch of other information, like if you want to replace it yourself, here are all the steps. This was cool because these weren't questions I was even asking. I just asked what bulb I needed; it guessed what kind of truck from my last question, and then it's like, if you're doing this, you're probably going to want to change it yourself, so here are the steps. I guess I could have changed my prompt to say, only tell me the name of the bulb, no other information, and probably gotten a faster response. But for me, as someone actually using it, it was very useful to get all these additional details. And by the way, if you're just listening on Apple, I'm explaining everything, but if you're on Spotify or YouTube, I'm sharing my screen with a video to break it all down. Anyway, I wanted to give you the anecdote of me actually testing this thing out. One other thing I did with it: I tried the image upload. I was in Walmart, and there were these two different windshield wipers, and I never know if I'm getting scammed by these companies; with anything in auto mechanics, I swear there's always a gimmick. There were two kinds of windshield wipers from the same brand, one like 15 bucks and the other like 7 bucks. The expensive one was Optimum Plus.
And I'm like, is there any difference between these in reality? Grok was a pretty good salesman and told me that apparently one of them has equal pressure throughout the whole blade and is less likely to get wrecked, and blah, blah, blah. So I ended up buying the more expensive one because Grok told me it was good. But it's kind of useful, and it was cool that I was able to just snap a photo on my phone while in Walmart, upload it, and get a really quick response, and I don't think the internet in Walmart was super fast. So anyway, that was my actual use-case test of Grok. Let's go into what the updates were and why I think this is impressive. You're probably all like, okay, that's enough of your stupid trip to Walmart for your car; what is this thing actually capable of? So I'll break it all down for you. The first thing that was really impressive is how they actually trained this. They said they wanted to start from first principles, which is something xAI has been really good at. They essentially said, hey, we have to build a facility with enough GPUs to train this AI model. So they went to all these data center companies and asked, how long would it take you to build us a data center? And they're like, yeah, we can build you a data center; it's going to take about 24 months. And they're like, okay, well, we'll be screwed, because that's two years, and if that's how long the data center takes, then we have to do the training on top of that. Where is ChatGPT going to be in two years? They'd be completely smoked. So they said, screw it, we're just going to buy a pre-built factory. So this wasn't something built to be a data center.
They went and found a factory that was new enough to still be good, whose owner had just gone out of business or moved locations. They grabbed it; it wasn't big enough, so I think they actually had to add on to it. And they were using every hack in the book to get this thing built faster. The first thing they did was install a hundred thousand GPUs. Everyone said this was impossible, but with some crazy engineering feats they were able to connect a hundred thousand GPUs, which I think took them about 120 days. Then halfway through training they added another hundred thousand GPUs, which took them about another 90 days. So in a matter of months they had this entire thing up and running. And people are like, how the heck did you pull this off in a factory that was not built for data centers? Because data centers are notorious for a bunch of reasons. Number one, they're absolute power hogs. We're talking 200,000 GPUs, and when you think of a GPU here, this isn't the little one in your computer; it's a massive honking brick, and 200,000 of those is a complete power hog. In addition, cooling that many GPUs is insanity. So what they said they ended up doing: they didn't have enough power from the grid while it was getting hooked up, so in the meantime they bought thousands and thousands of generators and lined them up along an entire side of the factory. And for cooling, they said they literally purchased 25% of the entire United States' mobile cooling capacity.
So pretty much all of this has to be liquid cooled, with water going through pipes and circulating through everything, and there are trucks that do liquid cooling for big events or concerts and things like that. But there aren't that many of them, so they literally had to get 25% of the capacity of the entire United States. That was probably a great business to be in, cooling this whole thing. They said they had so many problems, like one cable getting disconnected, because what they did differently was connect all 200,000 GPUs together, and they had to make them redundant so that if one cable got pulled out, or there was an issue with one GPU, all the rest would keep working. There are a lot of really impressive things they were able to pull off for this. All of this to say: Grok 3, this current model, was trained on 10 times more compute than Grok 2, the earlier one, and it might be one of the biggest amounts of compute used on any AI model. So what was the output? I know you heard me at the beginning complaining that it told me the wrong length for my windshield wipers, which ChatGPT or any other model could probably also get wrong, and I'll go research why that was. But overall, on the other questions I've asked and tested, it's very thorough, it's very in-depth, it shows you its reasoning process, and it can do a lot of really impressive things. So what are the benchmarks? How did this actually perform? On the math benchmark, AIME '24, Grok 3 scored 52. Grok 3 mini, their smaller version, scored 40. This is really impressive: the only model that got close was Claude at like 39, and that's still worse than their mini model.
It still completely beat GPT-4o, which I think scored the worst, and DeepSeek, and Gemini I guess also did fairly well. But anyway, they completely beat everyone on the math one by a long shot with that 52. Then on science, they scored 75, and the next runner-up was 65; a bunch of models scored 65, so they're a solid 10 ahead. And when it comes to coding, they completely crushed it again, scoring 52, and the next best model that wasn't from Grok was around 40. So they really, really crushed it on math, science, and coding. And I've heard from a lot of people that ChatGPT notoriously sort of struggles in this area. Claude does really well; most of the developers I talk to use Claude, even though they haven't shipped an update in forever, because they just say it's better at coding. Sometimes you find these use cases where a model had better training data, or was trained or fine-tuned better, and it seems like Grok might now be the winner on code. In the live stream I watched last night, they told it, build a game for us that's a cross between Bejeweled and Tetris. It literally wrote all the code, they ran it, and it was an actual functioning game: you had these Tetris blocks where each piece was a different color, and if you got three of them in a row, it would, like Bejeweled, destroy that line of blocks. It was interesting, and it was able to spit it out pretty quick. So that was pretty impressive. On reasoning and test-time compute, it crushed it. You're essentially able to tell it, think longer about this prompt, and if you just put that in your prompt, or use the button for it, it bumps its score up from like 78 to 93. So if you tell it, use more compute, think longer about this, it does better, and we kind of saw the same thing with ChatGPT.
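That "think longer, try many times" idea is often implemented as self-consistency, or majority voting: sample many independent answers to the same question and return the most common one. Here's a minimal sketch in Python, where the hypothetical `solve()` function is just a noisy stand-in for a single model call (the real Grok or OpenAI internals aren't public, so this only illustrates the voting technique, not their actual implementation):

```python
import random
from collections import Counter

def solve(question: str, rng: random.Random) -> str:
    """Hypothetical stand-in for one model call.

    Simulates a model that answers correctly ~80% of the time
    and gives a plausible wrong answer the rest of the time.
    """
    return "26 in" if rng.random() < 0.8 else "19 in"

def majority_vote(question: str, n_samples: int = 100, seed: int = 0) -> str:
    """Sample the same question n_samples times and return the consensus.

    Even an individually unreliable solver becomes very reliable once
    you take the most common answer across many independent samples.
    """
    rng = random.Random(seed)
    answers = [solve(question, rng) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]

print(majority_vote("What wiper blade length fits a 2006 Toyota Tundra?"))
```

The trade-off is cost: a hundred samples means a hundred times the compute for one question, which is exactly why this mode is gated behind a "think longer" button rather than being the default.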
They ran similar experiments with similar results. If you tell it to think longer and use more compute, it'll essentially try to solve the same problem ten, fifteen, or a hundred times, and then take something like the consensus of all those attempts. So if I had said, what windshield blade do I need for my truck, think as long as you want about this, then instead of just grabbing the first couple of results from a search, it probably would have looked at a hundred results, tried to solve it a hundred times, and realized, oh, actually you need the 26-inch blade. So maybe that was user error on my part; I needed to tell it to do that. Now, Grok 3 isn't rolling out to everyone at once: subscribers to the Premium+ tier, which is like 50 bucks a month, get Grok 3 first. Although I think I'm paying like $17 a month, maybe because I'm grandfathered in from paying for a couple of years, and I'm getting Grok 3 on that tier. The one thing that did not release is voice mode. Elon Musk said the voice is a little spotty and should come out in about a week. That's where you can talk to it, and apparently, just like OpenAI's voice mode, which is phenomenal and really dynamic, you can tell it to talk really fast, talk like you're running on a treadmill, talk like you're singing, talk like you're yodeling, all this crazy stuff. The voice mode should be good, but it's not coming for a little bit, probably the next few weeks. The Grok 3 models are also going to be available via their API, which I'm stoked about, because I can then integrate them into AI Box, my software startup. So a lot of really cool things. But the biggest thing, okay, the biggest W of the entire night: once Grok 3 is fully rolled out and everyone can use it, everyone's like, what happens to Grok 2?
They did a Q&A on Twitter with people responding, and Elon Musk said that once the new version is fully rolled out, the older version will get completely open sourced so anyone can use it. This is amazing. And I think OpenAI could solve a lot of their controversial problems, going from a nonprofit to a for-profit and everyone hating them for it, if they did this. Sam Altman did what was, in my opinion, a sneaky poll on Twitter yesterday, where he asked, what do you guys want: an o3-mini level model open sourced, since they're about to come out with their new model, or the best phone-sized model we can possibly make? And the way he phrased it, everyone, even me, said, oh, I want the best phone model, because I thought it would be cool to have an open-source model on my phone. But what I'm realizing is that people can take the best model and make phone-sized versions of it afterward. Really what we want is the best possible model open sourced. They're not going to do that with their flagship model, because that's how they make their money, but they could do it with their older model. Because now that Grok 3 is out, as a consumer I'm never going to open my xAI app and choose the older model; I'm always going to use Grok 3. But Grok 2 is still capable of a lot, and for developers it saves a ton of money if it's open sourced: you don't have to pay API fees, and you can host it yourself or run it locally on your own computer. Super, super cool. So I think the biggest win of this entire announcement, beyond the fact that they made a model that beat everyone on the benchmarks, which is cool, is that they're setting a precedent where the older model will always be open sourced. They're giving that to the public for free. That is really cool.
I would love to see OpenAI do that, since the original purpose of their company was to be an open-source AI company, and now they're closed source. I would love to see them follow suit, and I think this will put some pressure on them to potentially do that. You already see Sam Altman kind of talking about it, and if this becomes the precedent for Grok, I think they'll essentially be forced to, which I'd be thrilled about. Overall, I'm super excited about everything happening, and I'll keep you updated on all the latest news from xAI. Thanks so much for tuning into the podcast. And again, if you want to grow and scale your current business or side hustle using AI tools, make sure to check out the link in the description to the AI Hustle community. Thanks so much for tuning in, and I'll catch you next time.
