
A
This is the Everyday AI show, the everyday podcast where we simplify AI and bring its power to your fingertips. Listen daily for practical advice to boost your career, business and everyday life.
B
Did AI end up being a political force this year? Did AI start to kill off influencers? And maybe more importantly, did we achieve AGI in 2025? Well, more than 11 months ago, we laid out 25 kind of bold predictions for 2025 that covered those questions and a lot more. So now, as the year comes to a close, we're fact-checking ourselves. Why? Because when we give predictions, they're much more than that. We think of them more as a roadmap for busy professionals who are trying to make sense of AI and grow their careers. And we're lucky enough on this podcast to be heard by millions of people. So usually around this time each year is when business leaders reflect on what's worked that year and then plan for the year ahead. So on today's show, that's exactly what we're going to be doing. We're going to reflect on the advice that we gave you and fact-check ourselves in part two of our AI Roadmap Rewind. All right, I hope you're excited. I am too. Let's get into it. Welcome to Everyday AI. If you're new here, what's going on? My name is Jordan Wilson, and Everyday AI, it's for you. It's a daily livestream, podcast and free daily newsletter helping everyday business leaders like you and me make sense of these AI updates and how we can extract the important insights to grow our companies and our careers. So this is a little bit of a special episode. Like I said, in January of 2025 we laid out our AI prediction and roadmap series, and now this is part two of fact-checking ourselves. So if you missed part one, you can go click back two episodes, go listen to episode 674, and check out part one. In part one we went in reverse order, predictions 25 through 13. And on today's show, we're going to go predictions 12 through one. And we're going to be recapping it all in today's newsletter.
So if you miss a certain fact or stat, maybe you're out walking your dog or running on the treadmill or running your dog on a treadmill and you miss something, it's all going to be recapped in our newsletter, as well as all the other AI news that you need to know for today. Let me set the stage here. Maybe you didn't listen to part one, or maybe this is your first episode ever. Well, welcome. I've been lucky enough over the past three years to talk to some of the smartest people in the world when it comes to AI. Everyone from people who work at the big companies, you know, like Google, Microsoft, et cetera, to small business owners who are making use of this technology, to Fortune 100 CEOs. You hear a lot of these conversations on the podcast, but there's a lot more that happen off air that you don't hear about. So I have been lucky enough over the past three years to talk to hundreds of the smartest people in the world. So when I put together these prediction shows in January, like I said, I take a lot of care in them. At the time they may seem kind of bold and crazy, but as maybe you'll find out in part one and in part two, they actually ended up being not too crazy. But you should go back and listen to them anyways. I still think that they're really valuable. And I want you to fact-check me, because I'm going to be coming out with our 2026 AI prediction and roadmap series here in a couple weeks. So if you want to go listen to the full episodes, go listen to episodes 443 through 447. All right, let's get straight into it. Like I said, we already went through 25 through 13 in part one, so part two we're picking up with number 12. Number 12 was that the first big copyright case would be settled. So here's what it was. Well, yes, that happened. It was a $1.5 billion settlement, where Anthropic had a class action lawsuit regarding allegedly just stealing information from books, right?
That's what was alleged against them, and there was a $1.5 billion settlement. So this is, I think, one of the first big pieces of a move toward what will eventually be a pay-to-train model, right? And another good example of that was from, well, this week, when OpenAI and Disney entered an agreement where essentially Disney gets some equity in OpenAI, they pay a billion dollars, and OpenAI gets to use their IP. So this was the first big copyright case that was settled. Obviously the biggest domino that I've talked about has still been going on for more than two years, the New York Times v. OpenAI case. And the Anthropic case, technically called Bartz v. Anthropic, the class action lawsuit, wasn't the only one brought by authors of these books. But we got the first big copyright case to be settled, and there have been a lot of other lawsuits filed since, very similar to that one. And for whatever reason, I think at the time Anthropic was maybe a sitting duck, right? Technically you could probably make a similar case against some of the other big tech giants. But Anthropic is technically a smaller company, even though they're enormous, technically one of the biggest private companies in the world. They were still small enough, right? Anthropic is smaller than OpenAI, smaller than Microsoft, smaller than Google. So they may have been an easier target, per se, for this first class action lawsuit. Okay, I also need to grade the prediction, whether it came true or not. So this one, I'll say yes, it came true. I didn't think that New York Times versus OpenAI would be that big case, but this still ended up being a pretty big one: a $1.5 billion settlement to the hundreds of authors whose books were allegedly used for Anthropic's training data. All right, let's go to number 11.
AI influencers are going to start killing off human UGC content. So my actual prediction was that AI influencers or avatars are going to start replacing human user-generated content at scale. I'm talking specifically UGC influencer-type videos. So although there isn't a huge study on this, I think it's happening. And the crazy thing is, most people don't know. Here's the thing: I'm not a big person on social media, right? Everyday AI gets livestreamed to LinkedIn and YouTube, but I'm not on Instagram or Snapchat, whatever those things are, right? However, I do know, right, I have another company, we do marketing for small businesses, and I know the tools that are out there, I know people that are using them, and big companies are now using just AI influencers for UGC campaigns. User-generated content, right? It's those videos, if you're scrolling your feed, it's just a random person pushing a beauty product or a new workout, whatever it is, right? You don't know who it is, but someone saying, oh my gosh, this thing, like, you got to check it out, right? So many of those are fake. And AI influencers are becoming a huge thing. As one example, hopefully I get this right, there's an AI influencer, Lil Miquela, who has nearly 3 million followers across different social media networks, earns up to $10 million a year, and has partnerships with real brands like Samsung, Calvin Klein, Prada and a lot of others. So these companies obviously know that they're working with an AI influencer, right? A lot of the fans or the followers know as well, but some don't. And I'm telling you, y'all, this is going to become the norm. I actually had a friend text me earlier this week asking me about this, because she saw an article about something like this in the media, right? Like, is AI going to kill off a certain breed of influencers?
And I said, hey, probably by the end of next year, I'd say maybe 80% of what you might see online is going to be AI, right? And the platforms are also pushing this and enabling it, whether you think that's a good thing or a bad thing, right? As an example, TikTok expanded their Symphony gen AI ad tools, so essentially TikTok is providing a similar service. Meta similarly has introduced and expanded their AI ad tools, right? So these big social media platforms are giving brands the tools to use AI in their ads. Not full-blown avatars just yet, but it's getting there. And those full-blown AI avatar companies, HeyGen, Synthesia, Hour One, there's a lot of them, they're improving vastly, right? And then you have tools like Veo, like Sora, that are primed now for making UGC content. I think cameos in Sora are a great way of making UGC content, maybe of yourself, because I think that's all you have the rights to do. But that's a whole other issue. So I would say this has definitely been a fact: AI influencers are starting to kill off human UGC content. There was actually a Yahoo Finance article, it said: Gen Z job warning as new AI trend set to destroy 80% of influencer industry. All right, our next one. And sorry, I'm a little hoarse, been getting randomly sick here at the end of the year. It's the Chicago weather. Today it's going to be like 50 degrees, but like four days ago it was negative five. So yeah, sorry if I sound a little hoarse, it's because I am. All right, so number 10. I said non-techies are going to be able to build on-the-fly software, y'all. This is how fast time flies in the AI world. This was before vibe coding, right? Vibe coding was literally not a thing, and in January I just kind of predicted there's going to be this thing called vibe coding, right?
Except I said it's going to be non-technical people building on-the-fly software. I think vibe coding is a much better term, right? The term vibe coding was actually coined by Andrej Karpathy in February, a month after this AI prediction. But at the time I'm like, these tools are coming out, people are just going to start building their own software, and it's not going to be hard. Obviously this one is extremely true, and it's now going to be baked into browsers, right? A recent announcement, Google's Disco experiment: you can generate mini web apps just from the tabs that you have open in your browser, and you don't even have to prompt anything, you just click a button. It's an experimental feature that's coming out. I mean, talk about things like Gemini Canvas, ChatGPT Canvas, Claude Artifacts, let alone all the higher-level vibe coding IDEs, you know, Cursor, Windsurf, the new Antigravity from Google, Codex, right? There are great vibe coding platforms, but then there are things that are, I'd say, more like disposable apps that you can just create in a front-end large language model. And that's what so many people are doing. I actually had a great conversation a couple months ago with Paige Bailey, one of the heads there at Google, and she said, you know, the people that are winning hackathons are non-technical people. In the case that she was talking about, they're just going into AI Studio. AI Studio has a great build feature. You don't need to know anything about coding. You say, make me an app that does A, B and C, and it just happens, and you can just use it, and it just works, right? So the reality is, a recent study said 70% of new enterprise apps in 2025 were built using low-code AI tools. All right, so yeah, crushed that one. Number nine: reasoner wrappers will hit the scene.
So my prediction back in January was that tools are going to emerge that wrap reasoning models in enterprise data to drive decisions. So essentially, the same way old-school transformer models work on structured data, I said there's going to be a kind of movement for reasoning data, for how a company makes decisions, right? So I'll say this one didn't hit, right? A little bit maybe, but it didn't hit as hard as the other predictions. In this case, though, I'll say that I'm definitely not wrong. It is starting; maybe I was just a little early on this prediction. But there is probably this new agency layer, right, which is kind of what I and other people call it. This is a business's agency, a business's decision making, their expertise, their subject matter experts, what happens in their heads, right? So my thought back in January was that there's going to be a big push for collecting and curating, essentially, a company's IP, their decision-making process. And there is, a little bit. There are tools like Letta that have become pretty essential for parsing and structuring that hidden thought process, pairing these reasoning models with human or company reasoning. I still think it's going to be a multi-billion-dollar industry, whether it's next year or the year after. But there is this kind of enterprise-layer growth, and I think the strongest public evidence is the explosion of agent frameworks and observability or tracing, which is exactly what these wrappers need to function reliably in business settings. So maybe I didn't hit on this one. Kind of close, but I don't think it's going anywhere. As reasoning models become the default, I think smart companies are going to eventually realize, wait, we need to feed these reasoning models with our company reasoning. Number eight: virtual machines will become all the rage again.
All right, so as a reminder, this is before there was even a ChatGPT Operator. Poor ChatGPT Operator, it was announced and it's already dead, right? But my prediction was that because agents need a computer, virtual machines or virtual desktops would become trendy, or at least they'd become a thing again. And that's definitely happened, right? Even ChatGPT's agent uses a virtual machine. So maybe companies aren't all out there putting out virtual machines yet, but it's starting to happen, and it has. There was a recent announcement, right? Like some of these predictions, there's big news that came in November and December, and as I've been planning this follow-up show, I'm like, yeah, I knew something was coming. So obviously there's Google's Project Mariner, that's positioned as a browser-based agent, but the big one actually came in November, when Microsoft announced Windows 365 for Agents at Ignite. Essentially they now have cloud PCs that are specifically designed for AI agents. That's it. So yeah, it happened. It took a while, but this was true. It kind of seemed like a niche prediction, because at the time there were no general-use agents from the big companies. You had Copilot, you had Copilot Studio, and that was it. And Copilot Studio was kind of brand new. No one was talking about virtual machines, and now they're definitely a thing. Number seven: AI becomes overly political. I said AI is going to become deeply entangled in politics and policy conflict. Yeah, that's happened. And I'm based in the US, our biggest listenership is here in the US, so I'm looking at this mainly through the US's point of view.
But in December, so this week, President Trump signed an executive order blocking states from regulating AI, framing deregulation of AI as a critical weapon in the AI race against China. There is literally no bigger way for AI to become political than the federal level saying, hey, states, you can't make laws on this. Which is very much not in the same vein as most everything else from President Trump and the Republican Party, right? They like to hand as much power as possible over to the states; that's been one of their big talking points. And with AI it's completely the opposite, right? So if nothing else, this went against the grain of what President Trump has traditionally done in his second term in office so far. So yeah, AI has definitely become, I'd say, way too political, because even outside of the geopolitical, just domestically, it's become political, right? In July of 2025, the Trump administration signed America's AI Action Plan, and in there was an executive order preventing the federal government from procuring AI models that include, quote unquote, ideological biases or social agendas, or woke AI. So essentially the federal government said, yeah, we're not going to do business with anything that we don't like, anything that we say is woke. So y'all, I already know, just from what I said there, I'm going to get people who are Democrats angry at what I said, and people who are Republicans angry at what I said. I know I'm going to get hate mail from both sides on this, right? I'm just speaking facts. This is literally what happened. All right, so just FYI, it also doesn't get more political in AI than the cozying up of big tech leaders with the federal government, or President Trump. As an example, for the president's inauguration, everyone donated like a million dollars, right? So Meta,
Mark Zuckerberg personally, this is according to reports. Nvidia, Google, Amazon, Apple with Tim Cook, Microsoft, OpenAI with Sam Altman, Broadcom, Adobe, Perplexity. Essentially everyone, just about any big tech company, donated to President Trump's inauguration. Not really normal. So yeah, another sign that AI became extremely political in 2025. All right, prediction number six was: global AI regulations tighten, just not in the U.S. And yeah, that happened, and it honestly impacted how users worldwide were able to interact, or not interact, with their favorite large language model of choice. Obviously the EU is very restrictive with the EU AI Act, and a lot of features, like OpenAI's long-term memory upgrade, got really delayed in the EU and the UK because of these strict AI regulations. There are rules for general-purpose AI models that became applicable in the EU starting in August, with penalties that can reach up to €35 million or 7% of global turnover. Also, Singapore launched its Global AI Assurance Pilot in February, and Italy had a national AI law that took effect in October. So you have a lot of big nations, or groups of nations, that have different laws, right? I'm not going to spend time going through them all, because it would take hours, but there are a lot of very strict regulations on how AI can be used, and in many cases certain features or certain AI tools just can't be used in certain countries, which obviously stifles innovation. So yeah, the EU AI Act went into full enforcement in 2025. Meanwhile, the US did, for the most part, absolutely nothing to block innovation; there's no real regulation in the US. Are you still running in circles trying to figure out how to actually grow your business with AI? Maybe your company has been tinkering with large language models for a year or more, but can't really get traction to find ROI on GenAI.
Hey, this is Jordan Wilson, host of this very podcast. Companies like Adobe, Microsoft and Nvidia have partnered with us because they trust our expertise in educating the masses around generative AI to get ahead. And some of the most innovative companies in the country hire us to help with their AI strategy and to train hundreds of their employees on how to use Gen AI. So whether you're looking for ChatGPT training for thousands or just need help building your front-end AI strategy, you can partner with us too, just like some of the biggest companies in the world do. Go to youreverydayai.com/partner to get in contact with our team, or you can just click on the partner section of our website. We'll help you stop running in those AI circles and help get your team ahead and build a straight path to ROI on Gen AI. All right, number five: narrow AI agents, or narrow AGI, achieved. Anyways, my prediction when it came to agents was that there wasn't going to be a runaway general-purpose agent. General-purpose agents were not going to be good, but narrow agents would absolutely dominate, right? And that prediction came true, because there's no one great agent out there. Look at the big players, right? Microsoft Copilot, I'd say, is probably the closest to having a general agent, but even those are built on a company's certain use cases, right? So they're narrow; they're not general for the most part. OpenAI's agent mode? Not really good, right? I'll be honest, sorry, friends at OpenAI who are listening. It's not very good. It's slow, it's clunky. For certain tasks it's okay, it's manageable, but general agents aren't there yet. They will be, I think, especially with some of the recent model updates that just came out, like (looks at watch) in the last 10 days or the last couple of weeks specifically: Gemini 3, Gemini 3 Flash, GPT-5.2 and then Claude Opus 4.5. On the agent side, that really changes what's possible. But we haven't really realized that yet.
But the agents that have been making the biggest splashes, I'd say, are narrow agents, agents built around a certain vertical, a certain task. So as an example, Salesforce's Agentforce 2.0, you know, handles end-to-end CRM workflows. Oracle deployed 50 role-based AI agents in Fusion Cloud apps. GitHub Copilot coding agents, and obviously Anthropic's Claude Code, right? A lot of coding agents with narrow use cases. And a Menlo Ventures report shows that there's $7.3 billion in enterprise investment in departmental AI, which is essentially narrow AI use cases. All right, prediction number four: LLM memory becomes a major focus. This prediction obviously came true. I didn't want to harp on this on every single one; I thought it was funny to talk about the vibe coding one, because it feels like vibe coding's been around for like five years, but it wasn't even talked about until February, which is after this. But think back to January. ChatGPT had a very early version of memory for some users, but aside from that, LLM memory wasn't really a thing. Some people talked about it, right, I wasn't the only one, but it obviously came true. So OpenAI had a handful of big updates to their memory and chat history. Google Gemini rolled out a memory upgrade to its users. And Claude, right, they were the last of the big three that finally rolled out chat memory. So yeah, now the three big front-end LLMs, ChatGPT, Gemini and Claude, finally have memory. Like I said, it was going to be a major focus, and it obviously happened. Also, context caching on the developer side has become a standard API feature, whereas before it wasn't really a thing. Context caching is a form of memory; it's a form of not having to pay over and over for things that have already been processed.
It's more of a technical memory mechanism. And then also, with Gemini 3, they introduced an architecture that allows agents to recall user preferences across months of sessions. So memory is taking shape and form in many different ways. But if you just look at the progress on the front end, one of the biggest things has been memory. It's been using more of the context window and making more of it, if that makes sense, because outside of the context window it doesn't really matter much. But within that context window, being able to layer in memory, and layer in personalization, Gemini has a level of personalization that can work with your search history, right? And even just as we get connectors, being able to work with that data, it's huge. So I think context plus connectors plus memory was definitely the biggest leap forward when it came to front-end large language models in 2025. All right, our top three, our last three predictions, here we go. Number three: large language models become small language models. What I meant, my prediction there, was that smaller, efficient models would become more prominent and capable than January 2025's big models. That's what I was alluding to. Essentially I said, hey, in a year there are going to be small models that are way better than big models, and people are going to be using small models a lot more than they're using them now. Because back in January, people weren't really talking about or using small models, and in some cases there weren't a lot of them available. OpenAI had their mini series, but that was about it. So let's get to the reality. The reality is, yes, it obviously happened. There's been a new category of models, right?
Like, as an example, Gemini now has their Flash, but then they have their Flash-Lite, an even smaller version relative to their behemoth model, Gemini 3 Pro. But it's not just that. It's not just the GPT-5 mini or Gemini 3 Flash. Some of the models, like GPT-OSS, OpenAI's open-source model, a 20-billion-parameter model, are more powerful than GPT-4o, right? So think about a year ago, give or take: GPT-4o was the world's best model. Now you have an open-source 20-billion-parameter model that, when you look at third-party analytics like Artificial Analysis, is better. Okay, I'm going to do the math, for those of us that aren't great at math. Models are judged in their size by parameters, for the most part. With closed proprietary models, you don't know how many parameters they have. With open-source models, you do. That's why with GPT-OSS-20B, we know it's a 20-billion-parameter model. You can think of it kind of like a hard drive, right? According to most reports, the earlier versions of GPT-4 were around 2 trillion parameters. So GPT-OSS-20B, OpenAI's open-source model, was about 1% of the size of the model that was the most powerful model in the world about one year prior, and it was better. So yeah, the models got smaller. And then there's a whole category: the Phi series from Microsoft, really good; the Gemma series, Gemma 3 from Google, which we might see more of here in the next couple of days, so keep your eye out. Small language models, I've always been bullish on them, and I'm even more bullish today than I have ever been, because that's the direction we've seen. Even domain-specific models: OpenAI with their science models, Anthropic with their financial models, right? We don't talk about them a lot unless you're one of the few companies that use them.
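The back-of-the-envelope math above works out like this. Keep in mind the ~2-trillion figure for early GPT-4 is a third-party estimate, not an official number:

```python
# Rough size comparison between GPT-OSS-20B and early GPT-4.
gpt_oss_20b_params = 20e9  # 20 billion parameters (open weights, published)
gpt4_early_params = 2e12   # ~2 trillion parameters (reported estimate)

ratio = gpt_oss_20b_params / gpt4_early_params
print(f"{ratio:.0%}")  # prints 1%
```

So the "about 1% of the size" claim is just 20 billion divided by 2 trillion.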
But I've said this all along: the future of large language models is using many small, domain-specific models. All right, number two, speaking of models: mixture of models becomes a thing. A year ago, no one was talking about mixture of models. We talk about mixture of experts, which is a little different. But my prediction was that there would be systems that would orchestrate multiple models in parallel as a bundle. So let me explain the difference between mixture of models and mixture of experts. At the time last year, when I was building my prediction show, I don't even know if mixture of models was talked about; I might have made up the mechanism. But now people are talking about it, and there are other names for it, right? So, mixture of experts, or MoE, is essentially a single sparse model that uses a gating mechanism to activate only a specific subset of parameters, or experts, for each input. So with a mixture-of-experts model, out of all those parameters, it only activates certain ones, and that's how it can run faster and better, right? Whereas a mixture of models is more of an ensemble of independent, fully active models that all process the same input, aggregating their distinct outputs to improve overall accuracy. So this is another one where, man, I was planning the show last week, and I'm like, there hasn't really been one big story where someone went with this mixture of models. And then Zoom did, of all companies. Zoom, right? Yeah, the video calling company. Zoom put together an essentially mixture-of-models approach and got the highest score on Humanity's Last Exam.
But critics were not really happy, and a lot of people are like, I don't know if this counts, because of how they did it, right? So Humanity's Last Exam, I'm not going to go into it, because it would take a long time to explain, but it's essentially a very hard AI test, with questions that make models really think and reason; the answers aren't in their training data. But usually you give it to one model. What Zoom did is they created essentially a mixture of models. They didn't build a model, they didn't fine-tune an open-source model; they just used off-the-shelf models, kind of duct-taped them together, and then they beat the best score. So yeah, they essentially used a mixture of models. And there's actually a lot of movement in this space, similar to model routing but a little different. Google's Interactions API from this month now dynamically routes queries and shares the information between Flash, Pro and thinking models in real time. So it's not just model routing, right? Because with routing, usually a router sends your prompt or your query to one model. This is a little different; it shares it between all three. So even Google's Interactions API for developers is using kind of this mixture of models. Also some other signals: IDC reported in November that 70% of top AI-driven enterprises will use multi-model architectures by 2028. And another good example, Airia, they're a recent sponsor on this show, and they're a key player here, essentially. They have a mixture-of-models architecture, and they recently raised a hundred million dollars. They're a newer company; they had a hundred-million-dollar round in September, I believe. So it is a space that is growing: essentially, not having to use one model.
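To make the distinction concrete, here's a minimal sketch of the mixture-of-models idea: several independent, fully active models all answer the same query, and their outputs are aggregated, here by simple majority vote. The three model functions are hypothetical stand-ins, not real APIs; in a mixture of experts, by contrast, the gating happens inside a single model before most of its parameters ever run:

```python
from collections import Counter

# Hypothetical stand-ins for three independent, fully active models.
def model_a(query: str) -> str:
    return "Paris" if "France" in query else "unknown"

def model_b(query: str) -> str:
    return "Paris" if "France" in query else "unsure"

def model_c(query: str) -> str:
    return "Lyon"  # a weaker model that answers incorrectly

def mixture_of_models(query: str, models) -> str:
    """Every model processes the same input; the distinct outputs
    are aggregated (majority vote here) into one final answer."""
    answers = [model(query) for model in models]
    return Counter(answers).most_common(1)[0][0]

print(mixture_of_models("What is the capital of France?",
                        [model_a, model_b, model_c]))  # prints Paris
```

The aggregation step is where real systems vary: majority vote, a judge model, or confidence-weighted merging. But the core idea, many whole models instead of one internally gated model, is the same.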
I think in the future this is going to become much more commonplace than you would think, because I think the smartest companies are taking a modular approach when it comes to their AI strategy. Because sometimes, and I think we saw this with GPT-5, right, going from GPT-4 to GPT-5: for companies that didn't have a modular approach, that had a singular approach, if they were just using GPT-4o, whether on the front end or the back end, for maybe all of their processes, then with a big switch like that, maybe some things work better, you hope things work better, but a lot of things are going to work worse. So that's why I think having a mixture-of-models approach is really smart. That's what the smartest companies are doing. All right, and then last but not least, the big prediction I made in January: AGI is going to be achieved, but no one notices. So my prediction was that AGI, artificial general intelligence, would arrive in 2025, but daily life wouldn't feel any different, and there'd be no bold proclamation of, you know, hey, AGI has been achieved, we're going to throw up the AGI flag on the federal buildings, and now the humans take a nap. I think most people have said that AGI is just going to happen and no one's really going to notice. So here's the reality of what happened, and I'm not going to fact-check myself on this one too hard; I'm going to let you, all right? So livestream audience, let me know: say AGI yes, AGI no, or AGI maybe, right? Spotify people, in the comments, let me know, I'm going to read them. But here's some of the reality. In 2025, large language models started surpassing the smartest humans in the world on almost all elite thinking tests. On almost any test that humans take and a single AI model takes, AI models beat the smartest humans in the world. Okay, but AGI still doesn't have one agreed meaning.
And even the optimistic voices note that today's AI systems can look brilliant on some tasks, like, wow, this is definitely AGI level, but on other tasks they can seem very dumb, right? And let me be very honest: a lot of that's a skill issue, FYI. But one of the biggest things, a little dorkier unless you're a math geek, was the IMO gold medal performance. That was maybe the standout AGI-moment example that a lot of people pointed to in 2025. So what is the IMO? That is the International Mathematical Olympiad, essentially one of the most difficult math competitions in the world. And OpenAI and Google DeepMind, for the first time ever, won gold. They solved all the problems. And then this week you had Boris Power, the head of applied research at OpenAI, and a lot of people would conclude that Boris is probably one of the smartest people in the world when it comes to AI and research. He said that winning the IMO gold was seen as an AGI-level difficulty problem, right? It happened, it passed, and no one was like, all right, life changes. I don't think there will ever be a standard or agreed-upon definition of AGI, right? But there are a lot of signs, and I want to talk about two of them. One would be the AI IQ test, right? This is from Tracking AI, a good resource. So I have it on my screen here, and if you ever want to see the video version, it's always on our website at youreverydayai.com, but I'll describe it. Essentially, Tracking AI gives offline IQ tests to large language models, so the answers aren't in their training data. They do an offline test and a Mensa Norway test, right? So now it's at the point, right, a year ago the models had about an average IQ. You know, the average person has an IQ of about 100, and there are different classifications. 
I guess a bright person is around 115, gifted is in the 120s, and I think you get to a genius-level IQ somewhere between like 135 and 140. So now you have Gemini 3 and Gemini 3 Pro, and not too far behind, GPT-5 Pro, GPT-5.2 Thinking, and GPT-5.2 Pro, at about the 130 mark on these IQ tests. So they are near genius level; they're in the top 1.5% of humans. Okay, so again, AGI is not an IQ test, but if you know what you're doing, an AI model is smarter than almost any human in the world. And normally those humans are smart in their one field. All right, and then I will throw this out; I did a whole show on this, I don't know, maybe a year ago. I looked at older definitions of artificial general intelligence, from, you know, 2000, 2005, 2010, 2015. And when you look at all those old definitions and where we're at today, we've achieved every single one of them, right? But as AI gets smarter, the goalposts keep moving. So I don't know if AGI has been achieved, right? At what point will it be achieved? I don't know; I guess when we have a definitive definition. For me, I would say yeah, probably, and one reason why, I'm going to show you here in a second. Sam Altman in 2018, in the OpenAI Charter, defined AGI as highly autonomous systems that outperform humans at most economically valuable work. Let me repeat that last part again: outperform humans at most economically valuable work. So benchmarks have never really measured this, right? And I think that AI companies may have gotten kind of smart at overfitting their models to score really well on the benchmarks. That doesn't always help them when it comes to human preferences, like LM Arena. But now there's a new benchmark that's judged by humans. 
And I'll probably do a dedicated show on this at some point, because I think it's really interesting. This is an OpenAI benchmark called GDPval; I did talk about this a couple days ago. Right now their model GPT-5.2 Pro has the best score on it, but when they released it in September, it didn't. A lot of people are like, oh yeah, of course OpenAI is going to have the best score on their own benchmark. Well, when they released GDPval, they didn't; it was Claude 4.5. But GDPval tests models on real jobs with economic impact, which is exactly Sam Altman's definition: it's when autonomous systems can outperform humans at most economically valuable work. So GDPval tests models on real job tasks from 44 jobs in nine big US economic sectors. Tasks use real work files like spreadsheets, slides, PDFs, and images, and the models produce those actual products. Previous versions, even of GPT-5, weren't any good at producing spreadsheets or slides; well, now GPT-5.2 Thinking and Pro really are. So how does this work? Essentially, a human who's an expert in a field judges it, comparing the model's work to a human's without knowing who did what. So it's like, hand in a spreadsheet: a smart, expert human hands in a spreadsheet, the model hands in a spreadsheet, and then a domain expert, who I think they said had on average about 14 years of experience, judged the two without knowing who did what. Then the scores are averaged to get the model's overall win rate, and the human grader bases the grading on usefulness, format, and quality. So here's an actual example from GDPval. In a retail task, they're given past sales and a promo calendar in Excel, and they have to produce a forecast and a one-slide summary. 
Then the graders compare the model's spreadsheet and PowerPoint slide to the human's for math, readability, assumptions, and presentation. So what happened? Well, let me first read OpenAI's kind of definition here. They said GPT-5.2 Thinking is the best model yet for real-world professional use on GDPval, an eval measuring well-specified knowledge work tasks across 44 occupations. GPT-5.2 Thinking sets a new state-of-the-art score and is our first model that performs at or above human expert level. Specifically, GPT-5.2 Thinking beats or ties industry professionals on 70.9% of comparisons. All right, and then if you look at GPT-5.2 Pro, the win-tie rate is actually 74.1%. Okay, let me repeat that. On economically valuable work judged by experts, one model versus a smart human, where they have to produce spreadsheets, PowerPoints, writing, etc., the AI did better three-quarters of the time. Wait, what was that definition that Sam Altman, the CEO of that same company, gave? AGI, by which we mean highly autonomous systems, yes, that outperform humans at most economically valuable work. Forty-four real jobs across nine major US sectors, and the AI model wins or ties three-quarters of the time. I don't know. To me, my prediction that AGI is achieved but no one really notices kind of seems like it came true. But guess what? As the models get better, we're just going to keep moving the goalposts. But don't worry, as the models get better, we're still going to be here every single day, keeping you up to date. All right, that is a wrap on our 2025 AI Roadmap Rewind, volume two. Like I said, if you missed volume one, make sure to go check it out; that's episode 674. I hope this was helpful. And yes, we are going to be announcing dates soon for our prediction series for 2026. And we got something else special kicking off here real soon. 
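The blind win/tie scoring described in this segment can be sketched in a few lines of Python. This is only an illustration of the mechanic, not OpenAI's actual GDPval pipeline; the `judge` function, the sample deliverables, and the verdict labels are all invented for the example.

```python
import random

def blind_compare(model_work, human_work, judge):
    """Show the two deliverables to the judge in random order so they
    can't tell which is the model's; return 'model', 'human', or 'tie'."""
    pair = [("model", model_work), ("human", human_work)]
    random.shuffle(pair)
    verdict = judge(pair[0][1], pair[1][1])  # judge sees only the work itself
    return "tie" if verdict is None else pair[verdict][0]

def win_tie_rate(verdicts):
    """Fraction of comparisons the model wins or ties -- the headline
    number (e.g. the 70.9% cited for GPT-5.2 Thinking)."""
    return sum(v in ("model", "tie") for v in verdicts) / len(verdicts)

# Toy judge: prefers the longer write-up, calls equal lengths a tie.
def judge(a, b):
    if len(a) == len(b):
        return None
    return 0 if len(a) > len(b) else 1

verdicts = [blind_compare("forecast + slide + notes", "forecast + slide", judge)
            for _ in range(20)]
print(f"model win/tie rate: {win_tie_rate(verdicts):.1%}")
```

The real benchmark averages verdicts like these over tasks drawn from 44 occupations and grades on usefulness, format, and quality; the sketch just shows the blind-comparison step.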
So if one of your personal goals, or one of your company's goals, is to double down and go all in on generative AI and large language models, trust me, we have something special coming. So if you haven't already, please make sure, number one, you go subscribe to the podcast. If you're listening on Apple or Spotify, thank you for that. Number two, you're going to want to go to our website, trust me: youreverydayai.com. Go sign up for the free daily newsletter. We're going to be recapping today's show and a whole lot more. Thank you for tuning in to our recap, our Rewind Series for 2025, even though the year's not over. It's been a great one. Thank you for your support. Thank you for tuning in. Hope to see you back tomorrow and every day for more Everyday AI. Thanks, y'all.
A
And that's a wrap for today's edition of Everyday AI. Thanks for joining us. If you enjoyed this episode, please subscribe and leave us a rating. It helps keep us going for a little more AI magic. Visit your everydayai.com and sign up to our daily newsletter so you don't get left behind. Go break some barriers and we'll see you next time.
Everyday AI Podcast – 2025 AI Roadmap Rewind: Human vs Machine, AI Models Shrink, and AGI No One Noticed
Host: Jordan Wilson
Date: December 18, 2025
Episode Focus:
A fact-checking review of bold AI predictions made for 2025, examining what came true, what missed the mark, and what it means for professionals navigating AI’s rapid evolution. This episode covers predictions 12 through 1, looking at copyright lawsuits, AI influencers, “vibe coding,” the politics of AI, small models, AGI, and more.
Jordan Wilson closes out the year by revisiting the second half of his "25 for 2025" AI predictions, grading their accuracy with real-world examples and industry news. Listeners get an honest appraisal of what 2025’s most significant AI shifts were, how AI quietly changed society, and whether we might have crossed the line into AGI (artificial general intelligence) without fanfare.
[03:35]
“This is, I think, one of the first big pieces that eventually we’re going to move to a pay-to-train model.” [04:55]
[07:43]
“Probably by the end of next year, I’d say maybe 80% of what you might see online is going to be AI.” [10:46]
[13:00]
“The people that are winning hackathons are non-technical people.” (Citing a conversation with Paige Bailey at Google) [15:49]
[17:51]
“Maybe just a little early on this prediction…but there is probably this new agency layer.” [18:30]
[20:30]
“It kind of seemed like a niche prediction…But no one was talking about virtual machines. And now they’re definitely a thing.” [22:53]
[24:00]
“There is literally no bigger way for AI to become political than at the federal level, saying: hey states, you can't make laws on this.” [25:15]
[29:29]
[32:05]
[34:56]
“Context plus connectors plus memory was definitely the biggest leap forward…in 2025.” [37:15]
[39:12]
“The future of large language models is using many small domain-specific models.” [42:07]
[43:11]
“The smartest companies are having a modular approach when it comes to their AI strategy.” [46:01]
[47:08]
“On economically valuable work judged by experts… the AI did better three-quarters of the time… and no one really noticed.” [53:11]
“As AI gets smarter, the goalposts keep moving.” [54:00]
On the speed of AI progress:
“This is how fast time flies in AI world. This was before vibe coding... And now, vibe coding is baked into browsers.” [13:06]
On political influence:
“It doesn’t get more political in AI than the cozying up of big tech leaders with the federal government or President Trump.” [27:43]
On narrow agents:
“There’s no one great agent out there... The agents that have been making the biggest splashes are just narrow agents, agents built around a certain vertical, a certain task.” [32:59]
On AGI’s fuzzy line:
“If you were just to look at those old definitions [of AGI], we've achieved every single old definition. But as AI gets smarter, the goalposts keep moving.” [52:35]
Jordan candidly appraises the accuracy of his predictions and offers both industry insight (“this is what the smartest companies are doing”) and practical encouragement for AI professionals and businesses moving into 2026. The episode is an accessible, honest, and sometimes humorous guide for anyone seeking to keep up with AI’s dizzying pace—reminding us that with each step, the boundaries (and benchmarks) continue to shift.
Essential Call to Action:
Subscribe to the podcast, sign up for the newsletter (youreverydayai.com), and stay tuned for the upcoming 2026 Roadmap and prediction series.
For full segment breakdowns and more AI news, subscribe at youreverydayai.com.