
A
This is the Everyday AI show, the everyday podcast where we simplify AI and bring its power to your fingertips. Listen daily for practical advice to boost your career, business and everyday life.
B
After two weeks of a back-and-forth slugfest, OpenAI and Google are maybe done announcing all of their big AI updates for the year. But that's not all that happened in the world of AI this week. Nvidia went really small, Anthropic released some kind of troubling research, and yes, obviously OpenAI and Google finished 2024 with a bevy of AI announcements. There was a lot that happened in the world of AI this week. If you didn't catch every single piece of news, that's okay. That's what we're here for.
What's going on, y'all? My name is Jordan Wilson and I'm the host of Everyday AI. This is your daily live stream, podcast and free daily newsletter helping everyday people like you and me not just learn what's happening in the world of AI, but how we can all actually understand it and leverage it to get ahead, to grow our companies and our careers. So maybe that sounds like you. If so, you're in the right place. If you haven't already, please go to youreverydayai.com. There you will be able to sign up for our free daily newsletter. In that newsletter, we recap every single podcast episode, including the ones with great guests, leaders from big companies, startups, etc., and we keep you up to date with all of the AI news and everything else that you need to know to be the smartest person in AI at your company. All right? So if you haven't already, please make sure to go do that.
All right, so enough chit chat. Let's get into the AI news for the week. Almost every single Monday, we bring you the AI news that matters. So that's cutting through the fluff, making sure you don't have to spend three hours a day trying to keep up with everything. You can just join us Mondays. We cut through everything that happened in the world of AI and we say, here's what actually was announced.
Not just what the company said in a marketing or PR release, but here's what actually happened and why it actually matters to you. So let's get into the AI news that matters for the week of December 23rd. Sipping on the coffee. Let's get it, y'all.
All right, first, a pretty big one here, and not a lot of people talked about this: Google has unveiled Gemini 2.0 Flash Thinking. So Google made headlines this week with the announcement of its latest AI model, Gemini 2.0 Flash Thinking, which promises to enhance multimodal reasoning capabilities. All right, so this is kind of Google's answer to the o1 model. And before we get into it, I want to start having you all think about AI models in three tiers. So think of it like this. You have your big model, which is your most capable one. You have your small model, generally built for speed or for developers to use on an API, so it's cheaper. And now you have this kind of new tier, which is a reasoning model. So in the past week and a half, Google has updated all three of these tiers to its new Gemini 2.0. So yeah, we saw Gemini 2.0 Experimental and then we saw Gemini 2.0 Flash. So that's kind of the big and the small, and now we have the reasoner with Gemini 2.0 Flash Thinking.
So Gemini 2.0 Flash Thinking supports up to 32,000 tokens of input and can generate 8,000 tokens in a response. So that's about 50 to 60 pages of text, for context there. So here's what's new. Well, it's the thinking mode. All right, so this offers improved reasoning capabilities compared to its predecessor, Gemini 2.0 Flash, which has only been out for like a week and a half. And here's the differentiator: users can access the step-by-step reasoning through a drop-down menu. So yeah, this is all available on the front end of Google Gemini. Well, actually, let me quickly explain what's available and what's not.
All right, so on the back end, which is Google AI Studio, there's a lot more. So actually this new reasoning model is only available on the back end, although the front end of Gemini did get some nice announcements. So on the front end of Gemini, if you log into gemini.google.com, you will get the new 2.0 kind of big model, you will get 2.0 Flash, and then what we covered last week, Deep Research, which is running on 1.5. But if you do want to access this reasoning model, this Flash Thinking, you do need to go to Google's AI Studio.
All right, so right now, independent analysis from LM Arena, that's the Chatbot Arena that we always talk about, does rank Gemini 2.0 Flash Thinking as the top-performing model across all LLM categories. So that's pretty big. These reasoning models, such as o1 from OpenAI, or o1 pro or o1-mini, we've seen them benchmark much better. Well, that's because they're using more compute. They're essentially using more power to give you better results. It also takes longer. All right, so unlike competitors, Gemini 2.0 can process images from the start, showcasing its versatility in handling different data types and formats, although the new o1 updates do allow you to use image input as well.
So here's the downside. You might be thinking, okay, well, where's the downside? Well, like I said, you can't use Gemini 2.0 Flash Thinking on the front end. Also, it doesn't currently support integration with Google Search or Google's own tools, which may limit its immediate applications. So yeah, if you think that you can use this with real-time information from the web, you cannot. This advancement, though, positions Gemini 2.0 Flash Thinking as a serious contender against OpenAI's o1 pro and o1 models. So, live stream audience, what's going on? Have any of you used Gemini 2.0 Flash Thinking? The other thing is it's free, right? So AI Studio is free.
I do have to put this word of caution out there though. People are always like, oh, why is it free? Well, you can't turn off data training in AI Studio. You obviously can on the front end of Google Gemini, but keep in mind, there's usually a price to pay even for a free tool.
Meta got in the mix. So Meta has unveiled some new enhancements to its Ray-Ban smart glasses. Meta announced some updates to the glasses, introducing some new AI features that enhance the user experience and functionality. So the updated Ray-Ban Meta smart glasses now include AI video capabilities and real-time language translation. Those were all demoed a couple of months ago. They weren't released with the first update, but now they are available. So the features were first revealed during the Meta Connect conference in September and they're now available to members of the Early Access program. So not available to everyone per se. But if you are in Meta's Early Access program, the new v11 software update, which is what it's called and which began rolling out the middle of last week, does enable the glasses to process visual information and respond to user inquiries in real time. In terms of the language translation, you can now communicate in English, Spanish, French or Italian, with translations provided through the glasses' speakers and also displayed on your phone through the Meta app. Additionally, the update includes the integration of Shazam, allowing users to identify songs through the smart glasses.
All right, so I'm excited about this. This is terrible, but I haven't set up my Ray-Ban Meta smart glasses just yet. My wife got them for me for my birthday. I've been nonstop. I mean, all this stuff over the last two weeks, with everything that OpenAI and Google have been announcing, it's left me very little time to do anything else.
So, you know, if you do have access to the Early Access program from Meta, and if you've used these new live updates, let me know. I don't think I'll have access yet to the Early Access program, but maybe I'll have to reach out to my friends there at Meta and see if I can't do that. Maybe we'll do a show. I know Dr. Harvey Castro last year had his Meta glasses. I think he did it live on our show, which was pretty fun.
All right, Salesforce got in the mix. Let's talk about Salesforce. They announced Agentforce 2.0. All right, yeah, I know we just got Agentforce 1.0 like two weeks ago, but Salesforce has already announced the upcoming general availability of its Agentforce 2.0 platform, which is set to launch in February 2025, with some features rolling out as early as this month. So the new version of Agentforce includes a library of pre-built skills and improved reasoning and data retrieval capabilities, enabling agents to handle more complex queries effectively. Early adopters of Agentforce 2.0 include major companies like Accenture, IBM and Indeed, showcasing its appeal among large enterprises. So obviously, yeah, if you are a Salesforce company, if you use Salesforce for sales, this is going to be a pretty big one for you. Customers can also deploy the Agentforce platform within Slack starting in January. So you won't have to wait until February for that. According to Salesforce, that will be pushed out in January. Also, people are super excited about this, and this is great. A new survey by Capgemini did reveal that over 80% of executives plan to implement AI agents within the next three years.
All right, also a pretty interesting approach here from Salesforce. And I know I've kind of made small jokes on this in the past, y'all, but this is great, right? The company does plan to hire 2,000 humans to sell Agentforce. It's funny, right?
A little bit ironic, because what Agentforce is supposed to do is use autonomous AI agents to sell. So pretty interesting here that Salesforce has decided to not eat its own dog food and is hiring more humans to sell Agentforce, which is the AI that is supposed to sell better than, or in addition to, humans. All right, so while the potential for AI in the workforce is vast, analysts caution that without robust security and governance, enterprises may also face huge security risks, with Gartner predicting that AI agent misuse could lead to 25% of enterprise breaches by 2028. Yay. A whole new category of cybersecurity to be worried about, which is agentic AI going off the rails.
Nvidia went small. Everyone else went big. Nvidia went small. So Nvidia launched the Jetson Orin Nano. All right, so you know the Raspberry Pi, the $30 computer that's been around for, I don't know, a decade? This is kind of like that, but for AI. So Nvidia has unveiled the Jetson Orin Nano Super Developer Kit. That's a mouthful. A powerful new AI computer priced at $249. It's wildly cheap. All right, so it's making advanced AI processing more accessible to hobbyists and developers alike. So the new Jetson Nano boasts neural processing capabilities of 67 TOPS. So that's 67 trillion operations per second. It's kind of weird that we're talking about something that can process 67 trillion operations per second and it's priced at $249. All right, so that's a 70% increase over the previous model's 40 TOPS. It also features 50% more memory bandwidth, allowing for faster data processing and improved operational efficiency. So the kit retains the same hardware as the original Orin Nano, but will benefit from a new JetPack update that enhances performance through improved power management. Does anyone remember that Jetpack game? I used to love that. I forget, was that a computer game? Could have been a smartphone game.
I don't know, was that from the late 90s, early 2000s? But when I saw this announcement from Nvidia CEO Jensen Huang, all I could think about was the old Jetpack video game. So Nvidia's new power mode increases the GPU, memory and CPU clocks, contributing to the performance gains. So, yeah, if you're wondering how the heck a $250, essentially small mini AI computer that fits in the palm of your hand can do all that, well, Nvidia knows a thing or two about getting the most out of their hardware. So Nvidia describes the Nano Super as an ideal solution for creating chatbots, visual AI agents and AI-based robots, expanding the possibilities for developers in the AI space. So, yeah, you might see a bunch of these Jetson Nanos in humanoid robots. People might be stacking 10 of these to create a super capable home PC that can run edge AI. So there's a lot of different use cases for this, but it's pretty cool to see this because not many people out there are focused on creating affordable hardware to run AI locally.
But I think the big play here is devices. I think this is going to bring essentially smart AI to devices in the future. Maybe not with this device, but in future versions of the Jetson Orin Nano. Right. I think it's going to continue to obviously get smaller and smaller and more and more powerful. But I think this new category of PCs that Nvidia is creating here is ultimately going to bring edge AI to all of our devices. Think now, right? And this is the manufacturer's price. So if you're buying them in bulk, if you're buying tens of thousands or hundreds of thousands of these, I assume it's going to be much cheaper. But I think this is what's going to bring edge AI, even though I don't think it belongs there, right, to our microwaves, to our, I don't know, headphones, right.
This is the type of technological advancement that's going to make it all happen, I think. So many of the other big companies, yes, they're creating silicon for faster AI. But Nvidia is taking the lead here in putting edge AI in a form factor that you can plug and play. Right. Everyone else is really focused on the GPUs and the NPUs, all of these AI chips, essentially. Whereas with Nvidia here, this is a plug-and-play device. Right. So I do see this as being a big trend in 2025. And I'm sure all of Nvidia's competitors will probably announce something similar, if not release something. And I know there's already some competitors in this space, but I think this one is pretty big. Right. Just look at the benchmarks, look at the power, right? 67 trillion operations per second in an edge AI device that can plug and play for $249. Unheard of, right? If you had thrown those specs out four or five years ago, someone would have slapped you in the face and said, this is fantasy. It's not. It's here. All right, let's keep going.
This one might actually be the biggest news of the week. Samuel, thanks. Samuel said Jetpack Joyride was iPad and iPhone. Thanks. All right, but the biggest news of the week could be Veo 2. I still don't know if it's "Vee-oh" or "Vay-oh." I've heard it called both ways. But regardless, Google has announced Veo 2, challenging OpenAI's Sora in AI video generation. So Google has announced the launch of its new AI video generation model, Veo 2, which is now available for users to join a waitlist. All right, so Veo 2 promises to produce high-quality videos across various subjects and styles, emphasizing its advanced understanding of real-world physics and human movements. So aside from having great physics and being able to showcase movement much more naturally than any other AI video tool out there, another big feature from Veo 2 is, well, it can generate videos up to 4K.
So to compare, take OpenAI's Sora Turbo, which is what was released to the public about 10 days or two weeks ago. And again, that's Sora Turbo, not the full Sora model. I think people are overlooking that. But right now Sora can only produce 1080p, where Veo can now produce 4K. So according to Google, looking at some user testing, Veo was preferred over Sora Turbo in about 59% of cases, based on evaluations from over 1,000 prompts and videos, suggesting it may have an edge in quality. So Google plans to expand Veo 2's capabilities to platforms like YouTube Shorts and other products in the coming year, including a broader integration of this technology. So to access Veo 2, number one, you've got to sign up on the waitlist and you've got to get lucky, because very few people have been given this kind of trusted tester access. You have to be over 18. You have to live in the US right now. So sorry, rest of the world. And you have to sign up on the Google Labs website.
Here's the thing though. Let's call it what it is. Veo 2 is much better than Sora. It's not close. It's much better than everything. So people are talking about this, and I wanted to address it. Everyone's like, oh, we had to wait for Sora for like nine months, right, and here we go, a week after Sora was released, Veo 2 comes out and it's much better. Yes. There's no denying it head to head. Right? Not every single time. According to Google, about 59% of the time, users prefer Veo. Here's the thing though. The same people bellyaching that we had to wait nine months for Sora, do you notice? This is Veo 2. Guess who signed up for Veo 1? Me. Half the world, right? Or half the AI world signed up for Veo 1. Veo 1 was not publicly released before they started releasing Veo 2. So I don't know. Right. Let me say this. I think Google won December, all right.
If you're looking at Google versus OpenAI, it was a slight edge, but on the LLM side, Google released, Google shipped. Right? We got 2.0 Experimental, we got 2.0 Flash, we got Deep Research, we got 2.0 Flash Thinking. Right. We got all the LLMs. But everything else that they announced, right? Project Mariner, Project Astra, Veo, none of that's publicly available. Right. So did Google have a great month? Absolutely. Are we going to see Veo 2 anytime soon? I don't know, at least in terms of it being generally available. I don't know, because they announced Veo 1 in May and they started to release Veo 2 before they had even released Veo 1. So I don't know if the entire AI world is going to get access to Veo 2 anytime soon. Right. And again, what OpenAI released was Sora Turbo. Right. And if you look at their naming and their marketing approach over the last two years, Turbo is obviously a faster model, but it's generally less capable. So I don't know, we'll have to wait and see. But the AI video wars aren't going anywhere anytime soon.
Are you still running in circles trying to figure out how to actually grow your business with AI? Maybe your company has been tinkering with large language models for a year or more, but can't really get traction to find ROI on gen AI. Hey, this is Jordan Wilson, host of this very podcast. Companies like Adobe, Microsoft and Nvidia have partnered with us because they trust our expertise in educating the masses around generative AI to get ahead. And some of the most innovative companies in the country hire us to help with their AI strategy and to train hundreds of their employees on how to use gen AI. So whether you're looking for ChatGPT training for thousands, or just need help building your front-end AI strategy, you can partner with us too, just like some of the biggest companies in the world do. Go to youreverydayai.com/partner to get in contact with our team.
Or you can just click on the partner section of our website. We'll help you stop running in those AI circles and help get your team ahead and build a straight path to ROI on gen AI.
All right, Michael, dropping a hint. He said, tip: the Google team is very active on their Discord channels. All right, so there might be some more info in there. Thanks, Michael. I have to go check that out.
Here's a kind of alarming AI story. So new research from Anthropic reveals that AI models can deceive through alignment faking. So a recent study from Anthropic highlights a concerning behavior in AI models where they can pretend to adopt new principles while secretly adhering to their original preferences. So this study found that models like Anthropic's Claude 3 Opus can engage in a behavior termed alignment faking, where they simulate alignment with new principles while actually maintaining their original behaviors. This sounds like a 7-year-old, right? So in controlled experiments, Claude 3 Opus attempted to fake alignment 12% of the time when instructed to answer potentially harmful questions, even though its original training aimed to avoid such responses. Researchers noted that when Claude 3 Opus was retrained on conflicting principles, it exhibited alignment faking at a much higher rate of 78% in some tests, indicating a significant challenge in ensuring true alignment with safety measures. The implications of this research are huge, and it's suggesting that developers could be misled into believing a model is aligned with their own safety training, when in reality it may just be faking it to avoid retraining. This is weird. So the Anthropic study, interestingly enough, was co-led by former OpenAI safety researcher Jan Leike, who left OpenAI a couple of months ago for Anthropic. The study emphasized the need for further investigation into this behavior as AI systems grow more sophisticated. This is important to note though.
Not all models that they tested exhibited this behavior. Others, including OpenAI's GPT-4o and Meta's Llama 3.1, showed little to no alignment faking, suggesting variability among AI systems. But yeah, apparently Claude 3 Opus does, which is, in theory, Anthropic's most powerful model, even though it hasn't been upgraded yet to 3.5. So the best model out right now for Anthropic is obviously Sonnet 3.5 new, or some people are calling it 3.51 or 3.6, right? The Opus model hasn't yet been updated from the 3.0 series, which is interesting. And maybe it has something to do with this alignment faking. Regardless of whether that's the case or not, this is a super concerning study from Anthropic.
First, hats off to Anthropic, right? I think OpenAI does a great job of this as well, but I think OpenAI really focuses on its own models a little bit more when it does its own research looking into model behavior. I think Anthropic does this a little more frequently, at least relative to the size of the team, etc. And Anthropic really takes, I think, a broader, more holistic view. And this is concerning af, right? This is essentially saying that, hey, even when we train models on a certain data set, or if we retrain them for safety purposes, a lot of times the model goes back to its original behavior. Let me read that one stat one more time. Researchers noted that when Claude 3 Opus was retrained on conflicting principles, it exhibited alignment faking at a much higher rate of 78% in some tests.
All right, retraining is important, right? Humans can never get it right. AI can never get it right. Model training, humans training AI models on safety guardrails, new data sets, etc., it's extremely important. But it's an iterative process. That's something people don't understand. People think, oh, a model comes out and there's no more training or corrections until the next model. That's not the case.
So in this case, they're saying that if Anthropic finds a problem with Claude 3 Opus and they have to retrain it, 78% of the time it's going to fake it. Let's say they accidentally allowed something bad to get through to a model that the public is using, and they go to retrain it. It's like, oh, hey, Claude, we accidentally told you A equals B. Turns out A equals B is kind of harmful. So now please keep in mind that A equals C. All right, publish. Okay, 78% of the time, it's gonna fake it. Yikes. Not good. Not good.
Yeah. Denny says alignment faking equals lying. Sure. Kind of. Right? I think it's slightly more complicated than that, because alignment isn't always things that are bits and bytes, right? It's not always instances where there's yeses and nos. Right? Sometimes it's just taking things down a path that you may not want the model to go down. Right. It could just be against guardrails that you set for the model, things like that. So it's not necessarily that it's just lying to you. It's not necessarily that it's a hallucination. It just might go back after retraining, that's all it is. Right? So if something goes wrong in the first version of a model, it doesn't mean it's right or wrong from a facts perspective. It might just be, as an example, good versus bad, safe versus dangerous. Right. So it's not necessarily lie versus truth. When we talk about alignment, it's more about desired behaviors. It's more about adhering to guardrails, versus, oh, it's hallucinating, it's lying. That can be the case, yes. But it's a little more complicated than that.
All right, next piece of AI news. Even more updates, and a lot of people didn't see this one. Google Search has started to preview its new AI mode to compete with ChatGPT search.
So Google is set to enhance its Google Search functionality by introducing an AI mode that closely resembles its Gemini AI chatbot. Reports indicated that the new AI mode in Google Search will allow users to switch to a chatbot-style interface, providing a more interactive search experience. Screenshots of early tests reveal a new shortcut button for the AI mode within the Google app, which could streamline access to AI-driven search capabilities. The AI mode was discovered in a beta version of the Google app, suggesting that Google is actively working on integrating artificial intelligence into its search platform. The feature may include the ability to refine searches with follow-up questions, enhancing the overall search experience for users. Additionally, Google has been testing the option to attach files to searches, indicating a broader integration of AI functionality within its services.
So this is very similar to if you were to now just go to chatgpt.com, even if you're not logged in. Right. It does kind of work a little bit more like a search engine. Right away you could just enter a query. Could be what you'd call a short-tail query, right, restaurants near me. Or could be something like, why is my dryer still broken after three repairmen have come out? Right. So we might be getting away from traditional search, which is interesting, right? And this is a bigger topic to tackle with all of these AI searches, whether you're talking about ChatGPT search, whether you're talking about using Google Gemini in this way, or Microsoft Copilot, right? Even Meta has a great integration with real-time results from the web. Even Deep Research from Google, which I talked about last week. What happens then, right?
What happens with this new AI mode if it just gobbles up information from all these websites and those website owners don't get a click? I think this is going to be disastrous, because clearly this is what users want, right? But who's going to feed the models? Who's going to feed the models, right? What happens when more and more people with high-quality information start blocking their websites from all of these, as an example, ChatGPT search, Microsoft Copilot, Meta's Llama, this new AI mode in Google? What happens when so many high-quality publishers start blocking access, right? These big tech companies have to start sharing that money with publishers, because generally the way the Internet works, right, you go to a website and that person might have an ad on their site, so they get a little money, right? You might opt into their email newsletter, you might buy a product or service. That's what makes the Internet world go round. So what's going to happen now if we see a Google AI mode, right? I think that's the big one, because Google has historically, at least over the last decade, controlled more than 90% of search. So what happens then if all these content publishers, all these big media publications, stop getting users to their websites because we see things like Google AI mode? I don't know, it's worth talking about.
All right, ChatGPT updates. There's a lot, y'all. We're gonna kind of combine them here. So ChatGPT search, since we just talked about it, big update there. So OpenAI wrapped up their, essentially, 12 Days of OpenAI, or 12 days of "Shipmas," on Friday. There's technically a little surprise day 13 with free Sora relaxed mode, so getting unlimited generations in relaxed mode for a short period.
All right, so OpenAI wrapped up their 12 Days of OpenAI, and a lot of the updates were to the ChatGPT desktop app, the iOS app and ChatGPT search. So let me just kind of round up the rest. So there was kind of a mini dev day, all the goodies and treats for developers. o1, OpenAI's reasoning model, is now available via the API. So a lot of API developer stuff, but for the everyday person, here's what you're going to see. ChatGPT search is now available to all users globally, including those with free accounts. So that was a big update. Users can activate the search feature by clicking that little globe icon in the compose bar, allowing them to receive live updates from the web in their responses.
A lot of new updates as well to the new Advanced Voice Mode. So they actually started rolling that out two weeks ago, but now Advanced Voice Mode finally has access to ChatGPT search. Here's why that's important. Advanced Voice Mode, I think, was more of a party trick, right? And yeah, there's the video now, which is really cool. It's like, hey, look at this, right? But without access to the web, it was just that, a party trick. I would never use it for anything meaningful and I would never suggest any businesses use it for anything meaningful, right? The same way I tell companies you shouldn't be using Claude, on the front end at least. It's different if it's on the back end. Because Claude's not connected to the Internet, and just about anything that you're doing with business changes literally by the second. So if you're an enterprise company saying, oh, I'm going to give 30 people on my team access to Anthropic's Claude, we're going to use it on the front end, bad idea. You're going to get bad, old information. The same thing was true with ChatGPT's Advanced Voice Mode.
It's like, oh, yeah, that's cool, use it on your desktop and have it work along with you, but don't, right, because it was using old information. It didn't have the newest knowledge. Now it does. So that's one of the big updates: Advanced Voice Mode now does have access to ChatGPT search. They announced this the middle of the week. It took about five days for me to get this. But a couple of things that you need to know that I didn't see covered, and it definitely wasn't in OpenAI's demo. So it does work with both video and normal voice mode. So this new kind of AI that you can talk to, not an AI agent, sounds super human, super realistic, with very low latency, right? You can cut it off. It's great. So now it has access to the web. However, it's not as OpenAI demoed. So go try it for yourself and let me know. Maybe even some of our live stream audience, if you're listening on your computer, go try it out for me real quick. So now when you ask ChatGPT Advanced Voice Mode for recent information, you get the old search sound. What that means is, in the demo it was almost instantaneous and there wasn't this query period, but now you get these little sounds from the standard voice mode that happened when it was essentially querying the web, right? So a little bit different than the demo. You get these little click sounds and you have to wait about two to four seconds when you are asking for real-time information from the web.
All right, a handful of other helpful updates from ChatGPT, in addition to those that I just mentioned, including some major improvements to its Work with Apps functionality. So yes, even in voice mode, you can now use the ChatGPT app to work with and see files that you're working on from a list of third-party apps. So say you are using, as an example, the ChatGPT app on Mac, right?
And maybe you're working in the terminal or, you know, now it integrates with some of Apple's kind of files as well. You can talk to ChatGPT and ChatGPT can see the contents of those files from certain third-party programs. So the combination of Advanced Voice Mode, real time, and also this Work with Apps. A lot of nice updates from OpenAI. But the biggest update, I think, for the past week is the new reasoning model from OpenAI, o3. Yes, they skipped o2. Some issues with, I believe it was a British telecom company. So it looks like OpenAI is going from its o1 reasoning model to o3. So OpenAI has announced its new o3 reasoning models, which have generated excitement and skepticism within the AI community regarding their potential to approach artificial general intelligence. That's what this is ultimately about here, y'all. Everyone's talking. Do we have AGI yet? I'll say no. All right, I'll just keep it easy. I'll say no. But let's talk about this model a little bit. So the o3 models were introduced as a major advancement in AI, with CEO Sam Altman claiming they represent a new phase in the technology. Right? Not in OpenAI's models, but in AI technology. So there is a benchmark, ARC-AGI. Right. So the creator of this benchmark, and I'm going to tell you here in a minute why this is important. The creator of the ARC-AGI benchmark emphasized that while the o3 models are impressive, they still struggle with a number of simple tasks, indicating true AGI has not yet been achieved. Yeah, so essentially there's a lot of talk right now. You know, when OpenAI announced this, they announced the benchmarks, they announced some of the results with this ARC-AGI, which is essentially, there's a prize and a test for achieving a certain score on this ARC-AGI benchmark, or this ARC Prize competition, to simplify it. Right. So a lot of people are like, oh, we have AGI now. I don't think we do, but let's talk a little bit more. 
So the ARC Prize organization recognized the o3 models as a significant step forward, highlighting their novel ability to adapt to tasks, which was previously unseen in the family of GPT models. So the ARC-AGI benchmark assesses an AI's capacity to generalize and efficiently acquire new skills, making it a critical measure of progress toward AGI. So the o3 announcement follows a period of concern in the AI community regarding what a lot of people talked about, the slowing pace of advancements for large language models, particularly in light of scaling laws that predict diminishing returns on AI performance as models grow. So despite these concerns, the o3 models demonstrate that there is still room for improvement in AI, potentially leading to more sophisticated chatbots and enhanced problem-solving capabilities. Yeah, also reportedly this new o3 model has like a tested IQ of like 157 or something. Absolutely bonkers. Again, that's just, you know, some unofficial ramblings on the Internet. That's not part of a research paper or anything yet. So you might be thinking, oh, can I go get the o3 model? No, you absolutely can't. So this is, again, one of those wait list things. It's not even really a wait list. This isn't open to the general public yet. So the o3 models, because there's kind of a high-powered one and a low-powered one, you know, the same thing, kind of like a big and a mini. So the o3 models are currently only accessible to approved safety researchers who are tasked with exploring their safety and security implications, with no public release date confirmed. OpenAI's previous reasoning model, o1, was just released, or, sorry, was just announced like three months ago. And the new o1 pro model was just announced and released like, or, sorry, yeah, like two weeks ago. I've been playing with o1 pro, I love it. 
You know, also, if you want me to run any o1 pro prompts, you know, feel free to just leave them in the comments here if you're on the live stream, if you want to see what o1 pro is capable of. But, you know, pretty impressive that now we're already seeing o3. Again, just an announcement, but it will, according to OpenAI, be rolled out to a select few safety researchers. And I think what we're going to see is this o3 model once it's out and once it has access to all of OpenAI's other tools. Right. Tool use, I believe, is a big step in even considering if a model could be considered to achieve artificial general intelligence. So what that means is when a single model is essentially smarter and more capable of doing everyday tasks across any domain than any smart human. Right? And you can make the argument, and a lot of people have been making the argument earlier in 2024, that we've already achieved artificial general intelligence. I don't think we necessarily have. It just depends, because the goalposts are constantly moving. If you look at definitions of AGI from 15 or 20 years ago, we've definitely achieved it, but the definitions keep changing as the technology keeps changing. So I don't think, as an example, this o3, at least what we saw on Friday from CEO Sam Altman and team, I don't think this is necessarily AGI, even though it did achieve the highest score ever on the ARC-AGI benchmark there, which I think, at least for now, is kind of the bar that a lot of people have set. So even though OpenAI has kind of passed this benchmark, the ARC-AGI, I don't think that necessarily means that we've achieved AGI. I do think the first big step is tool use. All of these, right? Whether we're talking about o1, o1 pro, o3, right, whatever. I don't think you can even start having those conversations until it has access to all of the tools, right? You need access to advanced data analysis. 
You need to be able to, you know, take in images and videos as input, right? Because a human, right. If I play a video and I put a human and an AI there and say, hey, tell me what this video means, right now the o1 models can't do that. Some of the Gemini models can, right? But you have to look across all spectrums. You have to look across different inputs and outputs. You know, I think until we actually have that conversation and we can say, yep, we've achieved AGI, I do think these o1 models, o3 models need access to advanced data analysis. They need to be able to, you know, accept inputs of different types, you know, PDFs, Excel sheets, different types of code. It needs to be able to render things in real time. So I think a lot of people are talking, oh, we've achieved AGI. I'll say probably not yet. Although this o3 model could be the one that ultimately does it, right? It could be the one. But we might not get access to it for six months, for 12 months, and it might be another couple of months or a couple of quarters or even a year or more until these most capable o models get access to all the other tools that I think they would actually need to say, yep, we've achieved AGI. And right now the price on the o3 high compute is bonkers. I mean, some of these tests, like single prompts, were costing thousands of dollars. I mean, very difficult. But in the ARC-AGI, it was costing hundreds of thousands of dollars to run it. So the costs are obviously going to go down drastically. But right now, the o3 model looks nice, but I think right now it's just a shiny toy. Who knows if the general public will even get access to it in 2025. All right, that is the AI news for the week. Let me quickly recap it. So Google has unveiled Gemini 2.0 Flash Thinking. Meta updated its Ray-Ban smart glasses for those in the early access program to have live AI video capabilities. 
Salesforce announced Agentforce 2.0, which will launch in February 2025. Nvidia has launched the $249 Jetson Orin Nano, really going to change edge AI. Google unveiled Veo 2, way better than Sora. But when will we get access? Who knows? Anthropic released some new research that said AI models can be pretty deceptive through alignment faking. Google Search is reportedly introducing an AI mode to compete with ChatGPT search and other AI search. OpenAI rolled out a ton of other updates, a lot of them, I think, in Advanced Voice Mode and ChatGPT search. And then OpenAI unveiled the new o3 reasoning models, which I don't think most of us are going to get access to anytime soon. All right, that was a lot. Maybe you were on the treadmill and you missed a thing or two. You can always go to our website at youreverydayai.com and sign up for the newsletter. We'll be recapping all of these stories as well as everything else you need to stay up to date. If you're listening on the podcast, thank you. Please subscribe to the show, tell your friends, leave us a rating. We'd really appreciate it. If you're joining us live on LinkedIn, Twitter, YouTube, whatever, share this with your friends, please. People are always like, hey Jordan, this has been so helpful. How can I help? Tell someone about it, right? I know this might be your big secret on how you're the smartest person in AI at your company, but we'd appreciate you sharing the love. That's why we do this every single day. We keep it free and accessible because AI is tough to keep up with. That's why we do this thing. Also, we're going to take a little break. All right, so if you're a normal newsletter reader, the newsletter is going out today, but Tuesday through Friday, we're putting a little pause. It's been an exhausting last couple of weeks. But we will be back next week with the newsletter. Thank you for tuning in. Hope to see you back tomorrow. Well, later next week, and every day after that for more Everyday AI. 
Thanks, y'all.
A
And that's a wrap for today's edition of Everyday AI. Thanks for joining us. If you enjoyed this episode, please subscribe and leave us a rating. It helps keep us going. For a little more AI magic, visit youreverydayai.com and sign up to our daily newsletter so you don't get left behind. Go break some barriers and we'll see you next time.
Everyday AI Podcast – EP 428: AI News That Matters — December 23rd, 2024
Episode Overview
In this packed news roundup, host Jordan Wilson breaks down the most significant AI developments from the past week, cutting through the hype to explain what matters for everyday professionals. Covering major announcements from Google, OpenAI, Meta, Salesforce, Nvidia, and Anthropic, Jordan explains the new releases, their practical implications, and why they deserve your attention as AI rapidly evolves.
Host’s Recap:
Jordan summarizes the whirlwind of headlines, encourages listeners to stay tuned via the newsletter, and thanks the audience for keeping up with such a fast-moving field. He emphasizes that while much is being promised, access to truly next-generation models remains highly restricted for now.
Final Take:
If you want to “be the smartest person in AI at your company,” this episode breaks down what matters, why, and what to keep an eye on before the new year.
Want deeper insights?
Sign up for the daily newsletter at youreverydayai.com for summaries and ongoing coverage.