
A
Is this actually Sora? Here's your AI slop feed with all these Nintendo characters and Pokemon and South Park and SpongeBob SquarePants, everything. It was just like all this copyrighted stuff immediately. That and Sam Altman was all you see in the feed. And I was immediately like, oh, I will never use this. Like, this is not interesting to me at all. Welcome to the Artificial Intelligence Show, the podcast that helps your business grow smarter by making AI approachable and actionable. My name is Paul Roetzer. I'm the founder and CEO of SmarterX and Marketing AI Institute, and I'm your host. Each week I'm joined by my co-host and Marketing AI Institute Chief Content Officer Mike Kaput, as we break down all the AI news that matters and give you insights and perspectives that you can use to advance your company and your career. Join us as we accelerate AI literacy for all. Welcome to episode 172 of the Artificial Intelligence Show. I'm your host Paul Roetzer along with my co-host Mike Kaput. We are recording Monday, October 6th at 11am, which is very relevant because today is Dev Day for OpenAI. So there will be news from October 6 that we will not be covering in this episode, but we will be covering it next week. I guess that would be what, the 14th? Ish. Yeah. Yes, around the 14th, which is the first day of MAICON, which is bringing us today's episode. So we have a lot to cover. We did have some new models last week. We had Sora 2 from OpenAI. We got Claude Sonnet 4.5. I think there is going to be a bunch of new stuff announced today by OpenAI at Dev Day. There's a lot of buzz around an agent builder, a no-code agent builder that'll allow people to build their own agents easily through like a ChatGPT-type interface. That would be really interesting. And there's some other stuff that's being rumored as well. And then we're hearing more and more buzz that Gemini 3 is imminent from Google DeepMind.
So we'd said October was going to be crazy. It is off to a very busy start and I think there's a lot more coming, so we're going to get into it in a second. This episode again is brought to us by MAICON, and AI Academy as well. We'll start with MAICON, since I already mentioned it. We are a week away. Which, man. Mike, I don't know about you, but I think you're further along than I am in your prep. But I'm getting there. I'm locked in on the workshop. I'm really excited. I'm doing an AI Innovations Workshop. It's the first time I'm doing this. I built a new Innovations GPT for it. I have not released it yet, so don't go searching for it on our website. I'm really excited about that workshop, and then I'm doing the Move 37 Moment opening keynote and I'm equally as excited about that. I still have to finalize that presentation. But this is your last chance. If you want to be with us in Cleveland October 14th to the 16th, we'd love to have you. And you can join 1,500-plus other AI-forward professionals and leaders who are going to come together and, I don't know, hopefully learn a ton but inspire each other, make connections, build partnerships, launch companies together. It's such an amazing three days and, you know, I think I'm finally starting to mentally get in the place, Mike, where I'm just excited now to get there and do it. It's my favorite three days of the year professionally. So yeah, we'd love to see you in Cleveland. It's MAICON.AI, M-A-I-C-O-N dot AI. You can use the POD100 code for that last-minute $100 off your ticket. So again, we'd love to see you. Dozens of incredible speakers, incredible sessions over three days in Cleveland, where it looks like we're gonna have just beautiful fall weather. I mean, the leaves are changing. There's no better time of year to be in Cleveland than in October. So we'd love to have you join us. And then also AI Academy by SmarterX. We've been talking a lot about AI Academy. You can learn more at academy.smarterx.ai.
I'm going to turn it over to Mike for a second to give you a preview. We've been doing these previews of some of the course series and certification programs. So we just launched one recently, AI for Professional Services. This is part of our AI for Industries collection and Mike created that. So I'll let him give a quick background on what that course series is like.
B
Yeah, Paul, this is one I'm especially excited about just given our background in the agency world. So AI for Professional Services doesn't just cover marketing agencies but any type of firm that is billing for any type of human expertise in the form of services. So you think things like accounting firms, lawyers, consultants, et cetera. So we cover a few representative samples. But the idea here is really an evergreen course that uses frameworks to help you, as a professional services professional or leader, really accelerate your career and company with AI. So we go through a step-by-step process to actually understand, at a high level, what disruption is happening right now in the industry at large due to AI. What are some of those structural factors that everyone needs to be aware of and adapting to? And then we really get in the weeds on your particular job. For the type of work you do, no matter what type of pro services firm it is, how do you actually reinvent that work, do it more efficiently, do it with more innovation focus, and make it more performance driven using AI? So we teach you A to Z exactly how to do that. You come away from the course not only with a professional certificate, but hopefully with a roadmap for the exact types of use cases and tools you should be adopting in your own professional services work or in your firm or your team at large.
A
That's great. And then just so people know how the roadmap works in terms of what we're building: starting, I think, later this year, we'll start releasing the AI for Businesses series, where we'll actually take some of those, like law firms or marketing agencies, and drill more deeply into those specific businesses. And so the whole concept of what we're doing with Academy is to really build this learning journey where you can start at a macro level with fundamentals, piloting, scaling, and then go by industry, by department, by business type, by career path. And so, you know, we're working very aggressively to create as much of this content as we can as quickly as possible. But the idea is to really give you a journey that you can follow to pursue mastery in the area of AI that interests you. So it's exciting to see all this stuff coming out. The pace is incredible. I mean, Mike's been putting in a ton of work on this, as well as everybody else on our team, and we'll be launching the new learning management system very soon here. We'll have more news on that. So, yeah, and thanks to all of our listeners who are part of AI Academy already. We appreciate it and hopefully you're really enjoying it and getting a ton of value out of it. All right, so I feel like this first main topic we could honestly just spend the entire episode on. There's so many layers to Sora 2 and the new Sora app. We're going to do our best to cover it concisely and hit a few key points here, but I feel like this is a topic we're going to probably be coming back to, because it's not just the model itself and the app, but the larger implications from a technology standpoint, from a legal standpoint, things like that. So let's kick it off, Mike, with Sora 2 and their new viral app.
B
Yeah, Paul, this one is a doozy. So OpenAI has unveiled Sora 2. It's its most advanced video generation model yet and it is rolling out a new social app to showcase it. So OpenAI is pitching this as almost a ChatGPT moment for video because Sora 2 models physics very realistically. So it's not just making videos look better, it's actually understanding how physics works in these video clips. So for instance, a basketball shot could miss and bounce off the rim in a hyper-realistic way. A paddleboard backflip plays out on video with buoyancy intact. This fidelity makes it less like, you know, AI-generated special effects and more like a true world simulator when you are generating a video. And one item here that's really turning heads is the new Sora iPhone app. It looks a lot like TikTok. It has a vertical video feed you scroll through, except every clip is AI generated. And the standout feature here that starts to really make people pay attention is called Cameo, where you record a short clip of yourself, and friends, with your permission, can drop your likeness into any AI-generated scene. You are considered a co-owner of the result and can revoke it at any time. So this is part of OpenAI's attempt to manage some very real issues around consent and deepfake abuse. And right now the app is invite-only. I've had people already ask me if I have ways to get invites. I don't, because I don't have one myself. It is only in the US and Canada right now, but there is expansion planned. Now, Paul, there's a lot to unpack here. There's a stunning new video model, which alone is really impressive. There's this AI-generated video feed. The potential for deepfakes seems out of control, and we'll talk about this a bit. There's some breathtaking copyright concerns as well.
A
Yeah, so I struggled a little bit with how to unpack this one. Honestly, this is, like I said, such a broad topic. And I was watching very closely last week, looking at all the responses online, and so what I'm going to do is walk through four components here, Mike. I'm going to talk briefly about the tech. I'm going to go through my personal experience because somehow I do have access to it.
B
Oh nice.
A
I don't know how, but I do, and I actually have four invites, which, now that I said that, will probably be gone before this airs. So please don't flood me with requests for my four invites. The legal perspective is incredibly important, as you alluded to, Mike, and then what happens next? So Mike, I'm going to walk through some thoughts here. Interrupt at any time. Jump in if there's anything you want to add. So on the tech side, Mike covered a little bit of it. There is a system card that goes with it. That sounds super technical. It's basically just more of a deep dive into the technology. So we'll put the link in the show notes. In that system card post, it says Sora 2 is a new state-of-the-art video and audio generation model. It builds on the foundation of Sora. The new model introduces capabilities that have been difficult for prior video models to achieve, such as more accurate physics, sharper realism, synchronized audio, enhanced steerability and expanded stylistic range. Now those all sound very similar to Veo 3 from Google. So again, they're not the first ones doing this. I feel like they've raced to get this out in many ways in response to how popular Veo has become, how viral it became for Google. So just something to keep in mind: they're not the only ones doing this. The model follows user direction with high fidelity, enabling the creation of videos that are both imaginative and grounded in real-world dynamics. My understanding, Mike, is that about 12 to 15 seconds is what you can create for the video clips. In my personal experience, I only created a few that were permitted. I tried to do some with copyrighted characters. We'll talk about that in a minute. Back to the system card: Sora 2 expands the toolkit for storytelling and creative expression while also serving as a step toward models that can more accurately simulate the complexity of the physical world. That is a recurring theme. You're going to hear this idea.
Mike hit on it up front. This is all the basis for things that are much bigger than this. Just always keep that in mind. This is not the end game, what we're seeing here. Sora 2 is available on sora.com. I just went and checked. You can go and play around with it. You just have to click Get Started, and I don't know, I assume you have to have an invite to get the Get Started button to work. Maybe when I downloaded the Sora app, it just worked. Again, I don't know why. And then in the future they'll make it available through their API. It says Sora 2's advanced capabilities require consideration of new potential risks, including non-consensual use of likeness or misleading generations. Our iterative deployment includes rolling out initial access to Sora 2 via limited invitations, restricting the use of image uploads that feature a photorealistic person and all video uploads, and placing stringent safeguards and moderation thresholds on content involving minors. So it's some important context as we move into these other areas that I wanted to touch on. The cameo thing: Sam Altman is everywhere. Like, if you didn't follow last week, if you're not on Twitter at all and you just didn't see this stuff going on, Sam put himself in there so people could make cameos of him doing whatever, and people took full advantage of that, I would say. So some of them are really funny, some are pretty crazy. But again, like we say with all things AI related, be careful what you upload and what permissions you're giving once you upload these things. So Sam became the viral meme last week, just doing everything you could imagine. All right, so my personal experience. So this thing comes out on Tuesday, September 30th. I finally go in Thursday evening. I think I went and tried it, and I just went and downloaded the app and I just instantly had access. I didn't know at the time if it was because I have a Pro account. Like, I don't know why.
And then, at the top left, it has four invites. So as you alluded to, Mike, immediately I was like, wait, did I open Instagram Reels? Like, is this actually Sora? It looks exactly like Reels and TikTok. Like, it's the same format, same scrolling mechanism. It's just all AI generated. And then there were no disclaimers, that I recall, about creation of anything that infringes on copyrights. Like, there was nothing up front. It was just, here's your AI slop feed with all these Nintendo characters and Pokemon and South Park and SpongeBob SquarePants. It was Star Wars, like, everything. It was just like all this copyrighted stuff immediately. That and Sam Altman was all you see in the feed. And so I was immediately like, oh, I will never use this. Like, this is not interesting to me at all as a user. Some people may be really excited about that stuff. And so then I thought, well, let me see what it's like to create something. I didn't know, would it immediately go live? Like, am I going to create something and it automatically publishes? I didn't know. And so I clicked the button. I realized, like, I suck at giving video gen prompts. Like, I can't think of creative things. So I actually went into ChatGPT, a separate chat, and I just asked for help and I said, testing Sora 2. Write some prompts I can use to test its full capabilities. So it came back with things like a close-up of a glass of red wine being poured in slow motion, droplets splashing on a wooden table, cinematic lighting. I could never write that. An underwater library with glowing jellyfish drifting past shelves of ancient books. So I'm like, okay, these are clever. Like, that would create something fun, I guess. But then I think at the time I was watching the Guardians playoff game, which I don't really want to talk about because I'm still sad about it, so I was like, all right, give me some related to baseball. So it was popping up with some stuff related to baseball.
And then I knew that it was generating all of these copyrighted characters. I knew it was able to do this. So I said, make them more creative and fun. Incorporate well-known characters. So then it starts writing ones like Batman stepping up to the plate in full armor, the Joker pitching with a wicked grin, Gotham skyline glowing in the background. So I was like, all right, let's give that a try. That sounds kind of cool. So I hit the create button, or whatever the button says, and it immediately pops up and it says, this content may violate our guardrails concerning similarity to third-party content. So I was like, oh, okay. Like, how is everybody else creating all these characters? I can't do it. So I try one more. So I try Harry Potter using his broomstick to chase down a fly ball in a magical baseball game at Hogwarts, Quidditch hoops in the background. Boom. This content may violate our guardrails. I was like, oh, they put some guardrails in place. So this is 48 hours later. The feed is still filled with all of these characters, but I'm now not able to generate them. So they've obviously made some changes. So this then leads me into the legal aspect of all of this. It is blatantly obvious that this thing is trained on an immense amount of copyrighted content, including shows, movies and video games. And so by, again, Tuesday night, Wednesday morning, people had access. They started immediately tweeting things related to AI slop and the legality of this stuff. So there was one from Pietro Schirano, CEO of MagicPath. I don't know Pietro. This tweet just ended up in my feed. This one made me laugh. He said, man, imagine being Mark Zuckerberg spending billions to build a slop machine. Slop, S-L-O-P, not slot. Only for another slop machine to outslop you days later. I thought that was pretty funny. He's referring to Vibes from Meta. So then on Wednesday, October 1st, a day after it launches, we already have.
Or two days after, we have commentary from Sam Altman. So someone had tweeted, and I'll read this verbatim: Sam Altman two weeks ago: we need $7 trillion and 10 gigawatts to cure cancer. Sam Altman today: we are launching AI slop videos marketed as personalized ads. So Sam actually retweeted that and said, I get the vibe here, but we do mostly need the capital for building AI that can do science. And for sure, we are focused on AGI with almost all of our research effort. It's also nice to show people cool new tech products along the way, make them smile, and hopefully make some money, given all the compute we need. When we launched ChatGPT, there was a lot of, quote, who needs this and where is AGI, unquote. Reality is nuanced when it comes to optimal trajectories for a company. So it's like, oh, okay, so Sam's basically admitting, yeah, we're just kind of filling this world with some crap, but it's going to make us some money and it's kind of fun along the way. It doesn't address the fact that they're all compute constrained and they can't do all the breakthroughs they want to do because they don't have the compute. And now they're going to just pour compute into all the inference-time compute that's going to go into generating these. So my first tweet about this, I think, was that I had shared a Star Wars movie clip that was featuring Super Mario characters. Blatantly an issue. I said, does copyright law in the US just crumble, or do the major brands fight back in a meaningful way? It will be fascinating to watch this unfold. One thing is clear. The leading AI labs are fully into their don't-give-an-F phase when it comes to copyright and IP law. Then on October 3rd, Ed Newton-Rex, who we've mentioned numerous times and who worked at Stability AI on AI models, was replying to a video of Michael Jackson that looked completely real. You put this thing on Facebook, you know, your parents and grandparents think Michael Jackson's alive again.
Like, it looked exactly like him, sounded like him. So he said, even if OpenAI now tightens Sora's guardrails, the damage has been done. They will have used people's copyrighted intellectual property and likeness to go viral, getting them to number one in the App Store, which will let them make a ton of money. This is why after-the-fact opt-out is so parasitic. So now, a few days later, we're at Friday, October 3rd. Sam has now dealt with all the blowback. They, I assume, have heard from Disney and all the other rights holders that, you know, control these characters. And so he publishes a blog post called Sora Update Number One. In this post he says, we have been learning quickly from how people are using Sora and taking feedback from users, rights holders and other interested groups. We of course spent a lot of time discussing this before launch. I think that's a lie. But now that we have a product out, we can do more than just theorize. Keep in mind there is nothing that we've already talked about here, Mike, or that has come out since, that couldn't have been, not just theorized, but known was going to happen.
B
Right?
A
You don't train a model on all this copyrighted stuff, allow people to output it, and not know that you're going to get massive blowback. You absolutely know that. So it's disingenuous to even, like, I don't know, that paragraph really bothered me. But anyway: we are going to make two changes soon, and many more to come. First, we will give rights holders more granular control over generation of characters, similar to the opt-in model for likeness, but with additional controls. So we're going to allow Disney to say if they want their characters created, is basically what he's saying. We are hearing from lots of rights holders who are very excited for this new kind of interactive fan fiction and think this new kind of engagement will accrue a lot of value to them, but want the ability to specify how their characters can be used, including not at all. We assume different people will try different approaches and will figure out what works for them. But we want to apply the same standard toward everyone and let rights holders decide how to proceed. Second, we are going to have to somehow make money for video generation. People are generating much more than we expected. No way. It's invite-only. You controlled this thing. There's no way OpenAI isn't smart enough to project usage. So again, this whole article just bothers me because it's unnecessarily gaslighting. I think, like, this wasn't needed. You knew all this was going to happen. So, we are trying to come up with some revenue sharing thing for rights holders so they can make money along the way. Okay, so then also Friday, another tweet. And again, some of this is just giving perspective. I don't know this guy, Reid Southen, but it was a good tweet. He said, if your copyright is contingent on opting out from anyone who randomly decides they're going to use your work for profit, it becomes effectively worthless. There's a reason it doesn't work that way. No one else is allowed to do this.
Why are AI companies the exception? Then Bill Peebles, who's the head of Sora at OpenAI, tweets Friday night, or, this was Sunday. Good to see quick changes. Oh no, this was me. I was replying to a tweet from Bill Peebles about some of the changes they were making. I said, good to see quick changes. Still shocking that OpenAI and others are releasing these apps and models with so little testing and so few safety and control measures for such obvious issues. Can't they just prompt GPT-5 to identify and fix this stuff before releasing it? So again, I know the answer to this. They probably did and they chose to just do it anyway. So then Saturday night I was like, I want to talk about the legality side of this, but Mike and I aren't lawyers. Like, we can theorize about this stuff. But I was like, I have friends who are IP lawyers, so I'm going to email one of them. So I emailed my friend Christa Laser, who's an associate professor of law at Cleveland State University College of Law and owner of Learn Innovation Law. In shorthand, she's an IP attorney. So I emailed her four questions. So the first one I said, is their approach legal? And she actually put a YouTube response up. So we'll put a link to that in the show notes. You can listen to her whole response. It's like five and a half minutes. I'm just going to give you real quick highlights here. So I said, is their approach legal? She said, we've had courts come out mixed on that, meaning the training on this data, and it's possible that some amount of training on lawfully obtained copyrighted works will be considered fair use. But there's no indication that OpenAI paid for access to these copyrighted works to engage in their training. Another question I had for her, and this is important for all you listeners: are individual users who choose to output copyrighted material using the model or app at legal risk?
The short answer to what she said is yes, unless OpenAI has licensing deals with the rights holders, like Disney, that they sublicense to users. So in their agreement with Disney, they'd be allowed to transfer a sublicense to you, the user. Otherwise, there's potential legal jeopardy for Sora users who create videos without permission. So you create a funny thing with Mario characters in the Star Wars movie and maybe you get a letter saying, take it down, and here's the legal ramifications of your actions. And then in terms of what happens next, she said, it's pretty clear that OpenAI has shifted towards more of this opt-in model because it's legally a lot safer. So they've already made the change to, hey, it's going to be opt-in. It's much safer, obviously, to negotiate up front with rights holders and to make sure that, especially if a rights holder like Disney, for example, has said no, you are not training on or especially outputting things based on that. So again, we'll put the link there to her whole video. So final thoughts here, Mike, on where this goes and what they're doing. Just in case it's not blatantly clear to everybody, this is what they wanted. Like, they wanted to get a viral hit, they wanted it to get to number one in the App Store, which it did. So they got what they wanted. They claim it's part of their iterative deployment strategy, which it is in some way, which means, hey, we're just going to put tech out in the world and see how people use it. Again, there is nothing that has happened in the six days since this came out that they couldn't have predicted and probably didn't predict. The real reason they did this is for competition. Google got one up on them with Veo 3. There's other stuff coming, there's other models coming, and they had to just get out ahead of it and get it out there and then create enough demand for the stuff that it proves the model, and now people have to come to the table and negotiate with them, basically.
The other thing is Dev Day is October 6th, today, like later in the day than we're recording this, and there's other stuff coming. My guess is they just wanted to get it out there. On the tech side, they said at the end of their Sora 2 announcement post, video models are getting very good very quickly. General-purpose world simulators and robotic agents will fundamentally reshape society and accelerate the arc of human progress. Sora 2 represents significant progress towards that goal. In keeping with OpenAI's mission, it is important that humanity benefits from these models as they are developed. We think Sora is going to bring a lot of joy, creativity and connection to the world. So that is their macro-level view. And then one other note here on society and the creator economy. So MrBeast tweeted on Sunday, October 5th. If you don't know who MrBeast is, he is extremely popular with kids. My kids know him. On YouTube, he has 443 million subscribers. I think he's the highest-subscribed person in the world.
B
I think so, yeah.
A
Yeah. So he tweeted, when AI videos are just as good as normal videos, I wonder what that will do to YouTube and how it will impact the millions of creators currently making content for a living. Scary times. One other perspective that I found, and I want to be as politically correct here as I can. So Vinod Khosla, who's the co-founder of Sun Microsystems and the founder of Khosla Ventures, tweets, I think this is Sunday night: All the replies to OpenAI's announcement of Sora on X that are criticisms, quote unquote, finally, pure slop, are from tunnel vision creatives. Let the viewers of this slop judge it, not, I can't believe this, not ivory tower Luddite, snooty critics or defensive creatives. Opens up so many more avenues of creativity if you have imagination. This is the same initial reaction to digital music in the 90s and digital photography in the 2000s. There will be a role for traditional video still, but many more dimensions of creative video through AI. Okay, my only thought, and again, I'm going to try and say this as nicely as I can. If you want to turn the entire creative industry against AI labs and the VCs who are funding those labs, this is exactly the tone you take. I don't know why else you would say something like this. What was it? Luddite, snooty critics or defensive creatives? So for me, as someone who sees enormous potential for AI in the world to do good, but also has tremendous respect and love for human creativity and creators, I really hope this sort of combative messaging stops. Like, there is no good that comes from this other than getting that dopamine hit when you get to put out a tweet calling people, what is it, snooty critics and tunnel vision creatives. That is not going to go well in society. That's not going to play well.
There's a whole bunch of people who make their livings, who aren't Disney, who can't just sue OpenAI for doing this, who have no voice whatsoever, no control whatsoever, even if they think they've opted out. Like, the vast majority of the people aren't the big brands that have a team of attorneys. It's people who build stuff for a living, who write, who take pictures, who create videos. And to just discard them in the economy makes no sense to me. So I don't know. But again, big picture, this is only the beginning. This is a very crowded space. This is not just OpenAI. Meta's doing it, Google's doing it, Runway's doing it. Like, everybody's in this audio video generation space. There's many more innovations to come and maybe even more legal challenges than that.
B
Yeah, I'm personally excited to try it out, but two things, I guess, jumped out when I was looking at this. One, it's game over for deepfakes. Like, I've thought that for a while. And Veo 3 is comparably amazing. So I realize Sora 2 is not the only one to do this. But I think really seeing Sam just go hog wild embracing it, I was like, okay, this is game over for this, no matter if it's OpenAI that allows it or others. And then I also have to feel like it's kind of game over for attention spans, because this AI video slop feed is taking off. And keep in mind, I, whether I'm embarrassed of it or not, enjoy short-form video on YouTube and TikTok as much as anybody. But you have to also accept they are nuking our attention spans already. Now you can say, hey, maybe it's just a new medium, maybe it's just creativity expressed a different way. Right? Very possible. But you have to accept that short-form video has ramifications for people's attention, whether you agree with that or not. And I think it's about to go into overdrive. And someone mentioned in one of the tweets or articles here, they were like, just wait until you have a reinforcement learning, and they used the term, sloptimized feed, which I really liked. I was like, oh God, here we go. But those were two kind of worrying things that jumped out.
A
Yeah. I have no idea why I would ever go back into that app.
B
Yeah.
A
Other than just to test it. Like, from an entertainment value or an educational value, it's zero to me. Like, I just don't find it entertaining at all. But I mean, they had to shut off the South Park stuff because people were creating entire episodes of South Park, like 25-, 30-minute videos, just stitching stuff together. Yeah, it's going to be insanely disruptive. But I'm interested in the bigger picture of what this is a step toward. I am not interested in a scrolling feed of AI-generated crap, which is basically what this is, even if that makes me a snooty critic and defensive creative.
B
All right, so next up this week, Anthropic has released Claude Sonnet 4.5, which is billed by Anthropic as the best coding model in the world. The new model can handle complex multi-step tasks like building full applications, managing databases and performing security audits. In one demo it generated 11,000 lines of code to spin up a Slack-style chat app and stopped only when the job was done. Anthropic even said, quote, practically speaking, we've observed it maintaining focus for more than 30 hours on complex multi-step tasks. On benchmarks, Sonnet 4.5 is state of the art. It leads on SWE-bench Verified, which tests real world software engineering. It is three times better than its predecessors at navigating and using a computer. Enterprises like Canva say it's already useful for deep engineering and research work. Now at the same time, Anthropic also unveiled the Claude Agent SDK. This is the same infrastructure it uses internally to build agents. So this means developers can now design their own long running AI systems with memory, context management and multi-agent support baked in. And Anthropic also stresses that this is their most aligned model yet, with fewer cases of deception or prompt exploitation. Now Claude Sonnet 4.5 is available today and it's available at the same price as before. Paul, I found it interesting. Anthropic seems to be really leaning into becoming the AI for coding or AI for building agents with this release. Seems like they're trying to own that corner a bit here.
A
Yeah, I was actually listening to a podcast yesterday with Sholto Douglas. So the podcast, again, link in the show notes, is the MAD podcast with Matt Turck. Sholto is someone we've talked about before. He's been on a few podcasts in the last few months that have been great. He's extremely intelligent, very well spoken, makes things very approachable. Like, I really like listening to his stuff. So he's an Anthropic AI researcher who worked on this model, and he's former Google DeepMind. He started there I think right before ChatGPT, if I remember correctly. So the whole episode was about Sonnet 4.5, and I was going to highlight a few of the things he touched on that maybe build on some of the other links that we'll put in the show notes. So the way that Anthropic builds is Haiku is their smallest model, Sonnet is the mid tier, and then Opus is the biggest model. What happened here is that Sonnet, this mid tier model, is now outperforming their biggest model, Opus. And what he said is that they've found that these mid tier models can be made smarter largely through reinforcement learning, which is kind of like, after the initial training run happens, you go through and sort of fine tune this thing on certain capabilities, expert knowledge in different domains, things like that. They're finding that basically you do a massive training run. So let's say we build Opus. Within three to six months, they can usually do a much more affordable, efficient model like Sonnet and make it smarter than the big run they just did. And so he was pretty much saying, this is what's going to happen every three to six months. Like, say a Gemini 3 is going to come out probably this month from Google. There's a decent chance that they're going to have a more efficient model three months later that already surpasses their dominant largest model.
So it's just sort of a byproduct of what's happening right now: we're in this every-three-to-six-month phase. And the way he tells it, whatever these people are saying about hitting walls in training and stuff, he's like, we're not seeing it. Like, there's nothing we're seeing that tells us there's any wall whatsoever, that these things aren't going to just keep getting smarter and more generally capable. He did address why they focus on coding. So we've talked about this quite a bit with Anthropic. Like, they've done a good job with their economic impact research and the research on usage patterns within Anthropic. But we always sort of hesitate with it because the use of Anthropic is so dominantly coding that they don't get a great perspective on overall knowledge work. But he said the reason they're focused on coding is twofold. One, they think the fastest path to build more powerful AI is to automate AI research. So they are very actively trying to automate AI researchers, which everybody's doing. Meta's doing it, Google's doing it, OpenAI is doing it. But this is their main North Star at the moment: automate AI research, because then we can compound it. Now the other reason they're doing it is because the software market is vast. So, an estimate of about 300 billion, and I'll explain where that number comes from in a second. So they see it as, well, if we can build coding agents that can build software, then we can go get a piece of that $300 billion annual market of software. Our tools can build it, or we can build it ourselves. So they look at coding and economic impact. So build the research engine, then generate revenue by building software and enabling the building of software. So then, related, I was listening to a separate podcast episode this weekend that was actually a presentation given by an a16z (Andreessen Horowitz) general partner, Alex Rampell, at the LP summit. So the title of his presentation was Software is Eating Labor.
In that presentation, he cited that the worldwide SaaS market is about 300 billion per year. But then, to give this context back to this overall economic eval we've been talking about, Mike, the labor market in the US alone is 13 trillion. So again, if you're thinking about the funding, about why they are building these AI models to be able to do the things humans do, it's because VCs want to make money and there's $13 trillion sitting there to replace human workers. They're going to build stuff. The other couple of quick notes that I thought were interesting: there's this idea of 30 hours of continuous work. So this is a theme we've been hitting on. Recently we had the Agent 3 release from Replit, where they talked about 200 minutes of continuous runtime for their agents to do coding. Sholto talked about this idea of long-term coherency, meaning it stays good at doing the task it's doing for extended periods of time. He actually referenced the METR evals, Mike, that you and I have talked about, which show that AI models today have a 50% chance of successfully completing a task that would take a human expert one hour, and that's doubling every seven months. So seven months prior to that it was 30 minutes. Seven months prior to that it was 15 minutes. So what it means is we're seeing this continuous runtime with long-term coherency, specifically for coding. But then, you know, that's when you take it into other domains and say, well, can it do what a marketer would do for two hours or 30 hours? And he was basically saying it's not a technical limitation. Like, they could probably do 60 hours. They would just keep working at the problem. It becomes an issue of taste and context, where humans are still just better at saying, hey, you're actually wasting time. This is not a great direction. You should go this direction instead, or try this.
The host asked him about the difference between Google and Anthropic, and he said that they very confidently believe that they're the best at coding. But he said scientific breakthroughs are going to come from Google. He speaks very glowingly of Google, Google DeepMind. And then the other thing that he touched on that I've seen come up a lot lately, and maybe it's just because I listen to these podcasts or read this stuff, is the bitter lesson. I don't know if we've ever talked about this on the show, but there's a computer scientist, Richard Sutton, who wrote an essay, I think it was in 2019, where he sort of coined this term, the bitter lesson. And the basic premise, and Sholto is a big believer in this, is that generalization and compute win over time. What I mean by that is, for a long time in AI development and computer programming, there was this belief that humans could code the best paths forward, that we humans are uniquely capable of figuring out the plan and what to do. We're super clever and we'll always find ways to make these models better. And a lot of times we've got to get involved and we've got to write more instructions for the AI. What the bitter lesson says is, no, actually the models are better than us. Like, over time, they just figure things out better than a human could. So the lesson is that methods that leverage computation scale better than those that rely on human design knowledge or heuristics. In other words, when researchers try to handcraft domain knowledge, rules or clever shortcuts, these approaches often work well at a small scale, so in the near term, but they fail to generalize or improve as problems get larger. Conversely, approaches that rely on general learning algorithms plus more compute, so throw more Nvidia chips at them, throw more training time at them, while they may start off kind of less elegant, eventually tend to win out in the long run.
So it's considered bitter, the lesson, because researchers naturally want their insights, expertise and clever designs to matter. But history shows that, repeatedly, scaled, compute-driven approaches outperform human ingenuity and crafted, specialized solutions. So what he's saying here is, yeah, we do all these things and it makes a difference in the near term. But at the end of the day, as of right now, we know that if we just keep building more data centers and giving more Nvidia chips and giving more data, the things just get smarter, and it obsoletes all these human-written rules that we think matter right now. So I don't know, it's fascinating stuff. Again, Sonnet 4.5 is probably a great model, especially if you're into coding. I know some other people love using Claude just in general, but for the most part it's a coding model. That's the primary use, at least as they think of it.
B
Yeah. And just to reemphasize one thing you mentioned earlier, we've talked about this in a number of contexts, but you said the a16z general partner Alex Rampell says the worldwide SaaS market's about 300 billion a year, and the labor market in the US alone is 13 trillion. Follow the money. Look at how much money the VCs are putting into every AI lab. I can guarantee you the labor market, not just the SaaS market, is the ultimate target.
A
Yes, it is. It is pure economics and pure capitalism. And I don't think it's even a debatable thing. If you just zoom out and you just look at those numbers, there's no way people don't build to replace human labor. It's how humans work, it's how capitalism works. So yes, it's a bitter lesson, I guess.
B
Another one.
A
Yeah.
B
All right. Our third main topic this week: OpenAI has turned ChatGPT into a shopping platform. They have launched a new feature called Instant Checkout, which lets people buy products directly inside conversations. OpenAI says, quote, every day millions of people use ChatGPT to figure out what to buy. Now with Instant Checkout, they can buy directly from you inside those conversations. So here's how this works according to OpenAI. Say you describe what you're looking for in the course of a chat, like, hey, I want a durable carry-on bag under 300 bucks. ChatGPT will recommend the most relevant products across the web, kind of like it normally does. But then, if they have Instant Checkout enabled, users can buy the product without leaving ChatGPT. You can pay instantly with a credit card, Apple Pay, Google Pay, or Stripe. Merchants remain the seller of record here. They keep control of the payments, fulfillment and customer data. This is all powered by OpenAI's new Agentic Commerce Protocol. This is an open standard built with Stripe, and right now merchants like Etsy are already live. Shopify integrations are coming next, and so we'll see very soon here how many things you can suddenly start buying right within ChatGPT without leaving the platform. Now, somewhat related to this, at the same time, it appears that OpenAI is gearing up to turn ChatGPT into an ad platform. We talked about this a little bit in past episodes, but Adweek now reports that a new job listing at OpenAI shows that the company is hiring someone to build tools that let advertisers create and manage ads inside ChatGPT. So this hire will be responsible for experimenting with native ad formats, suggesting a future where ChatGPT may serve suggestions the same way search engines show sponsored results. So Paul, it feels like we're really recently seeing ChatGPT move very quickly towards becoming a buying and maybe ad platform. Like, is that where we're headed?
Like, what does that kind of start to mean for businesses?
A
That's certainly been what all the indications point to over the last year plus. With the hires that they've made, you know, the one you just highlighted that they're currently looking for, they've been putting these steps in place. And Sam has talked about becoming more of a platform and personalization being a key. We talked about it with the new Pulse feature a couple of weeks ago and how natural that's going to be for injecting, you know, purchasing decisions, because you're talking about trips you're taking or, you know, fitness and health needs. It's just so natural to inject ads in, and it'd be all the better if I can just click one button and make my purchase right from there. So, I mean, my general experience right now with ChatGPT has been that I often trust the links less than I would if they were served to me in Google. And so I've personally found myself going to Google to verify sites and vendors that I find through ChatGPT. Specifically, when I'm using agent mode in ChatGPT to help conduct research for purchasing decisions, I will often go out and verify that it's legitimate companies and stuff like that. I think that this is an instance where there's just so many unknowns, and we're probably not even asking all the best questions yet. You think about this initial, you know, human to commerce, but what about when it's agent to agent and my agent's just going and doing this research and then maybe it's buying directly through ChatGPT, and how does that change things? But I think that as we look into 2026 and we can start to project some of the impact this is going to have, some of the changes in buying behavior, it's definitely something that marketers, you know, brands, really have to start thinking about: how SEO is changing, how e-commerce is changing. There are some tremendous trends emerging that are going to dramatically affect the way you're doing business 12 months from now.
Like, I mean, we can already start to see it. So I think that a key going into your 2026 planning is make sure you're asking the right questions. Make sure you're not just building your strategies based on what you know to be true today. Because there's going to be some incredible, like, innovation opportunities in the near future, but also, like, things could move fast and you could find yourself, like, maybe you're in a really strong competitive position today in the traditional way of doing search and commerce, and maybe that changes really, really fast. And some upstarts come along and take that market share just because you weren't asking the right questions or, like, thinking more deeply about this. So, yeah, I mean, again, I think these are just huge trends. Commerce, personalization, you know, ChatGPT and others as platforms to business, not just like tools. It's going to change things pretty quickly.
B
Yeah. I think just to harp on why we promote AI literacy so much, the only way to know what questions to even start asking is to understand what's possible.
A
Yeah, and you got to know where the tech is going, too. And again, we heard the Sholto thing every three to six months, like, new models. Like, this is not something you can just sit back and not worry about for a quarter. Like, oh, figure it out in January. It's like, no, I wouldn't be waiting until January. I'd try and stay up on this stuff now.
B
All right, let's dive into this week's rapid fire. First up, OpenAI is growing at breakneck speed and burning through cash just as fast. According to The Information, OpenAI brought in $4.3 billion in revenue in the first half of 2025. That's already 16% ahead of all of last year. And this surge reflects explosive demand for ChatGPT and other tools. But the costs are equally staggering. OpenAI burned 2.5 billion in those same six months. R&D costs alone are topping 6.7 billion. Most of that's going into building bigger, more powerful models and keeping ChatGPT running. But even with this burn, OpenAI is not running on fumes. It had nearly 17.5 billion in cash and securities at mid year, and it's aiming for 13 billion in revenue by the end of 2025 alongside 8.5 billion in total burn. Now Paul, we're used to pretty big numbers in AI. These still seem pretty immense. The demand here is crazy, especially since we've been talking about how they've barely monetized ChatGPT except through subscriptions and obviously tokens, right?
A
Yeah. And the personalization of ads, video ads, these are all revenue channels that they're obviously developing, and then continuing to expand into the software market, which I think we'll touch on a little bit later, and the enterprise market, you know, going head on with Microsoft and Google, especially on the productivity platform side. So yeah, I mean, obviously ChatGPT is the cash cow for them, but I think that's going to keep expanding out into these other lines. We've even talked about them competing on the infrastructure side, where they do this massive build out of compute capacity and then they start competing with AWS and Google Cloud and Microsoft Azure, where they're now selling compute and data storage and things like that, and intelligence on demand, which is going to be everywhere, like the inference for this stuff. So I would love to see the breakdown of what their revenue mix is projected to look like. Not the total revenue, but where the percentages lie. I'm sure they have that in a deck somewhere. I would be interested to see what they think the bigger business lines are going to be.
B
Next up, some more OpenAI news. They've also launched a new series showing how they actually run their own business on OpenAI technology. This is called OpenAI on OpenAI, and this project highlights internal tools the company is building to solve everyday problems. So a few recent entries as they launch this series include a tool called GTM Assistant, which is a Slack bot that centralizes account research and product knowledge to boost sales productivity. Another is DocuGPT, which turns contracts into structured, searchable data so finance teams can review deals faster and more consistently. They also have a research assistant that analyzes millions of support tickets to surface trends, and a support agent framework that turns each customer interaction into new training data. Even inbound sales are now routed with AI, ensuring personalized responses and fewer missed opportunities. So Paul, I mean, this is really valuable, I think, to see how OpenAI is actually using their own AI. I feel like we've been waiting for a while to hear more about this. What did you take away from these initial examples?
A
SaaS companies are in trouble. So that was my initial thought, and then, I happen to still own some HubSpot stock. Again, no investing advice. For people who haven't followed along for a while, my former agency was HubSpot's first partner back in 2007. So I was lucky enough to buy into HubSpot when it IPOed at $35 a share back in 2012 or whatever it was. So I have long been a follower of HubSpot stock. I still own a bit of it, and so I saw it cratered and I was like, what the hell happened? Like, did they have an earnings call I missed? What had happened to HubSpot stock? So I do a search: what happened to HubSpot stock last week? First result is Yahoo Finance, and here's the verbatim: shares of customer platform provider HubSpot, HUBS is the NYSE listing, fell 7.2% in the afternoon session after OpenAI announced internal software applications that could potentially compete with existing SaaS offerings. The news sparked concerns across the sector as OpenAI revealed internally developed tools for sales, inbound marketing and customer support, core areas for HubSpot. According to TD Cowen analyst Derrick Wood, the announcement has, quote, refueled the debate that SaaS is at risk of being displaced by DIY solutions on top of LLMs. The potential for OpenAI to enter the applications market with its own AI native solutions triggered a broader sell off among enterprise software stocks. So yeah, it's tough. I mean, HubSpot's a great company. We still are powered by HubSpot. We love HubSpot. But I think they and other SaaS companies have to deal with this reality that people are going to be able to vibe code stuff. They're going to be able to just build something when they get tired of a software product, or it becomes too unwieldy, or they don't like the pricing model.
We're seeing it with changes in pricing from HubSpot and others, where they're moving away from license based pricing and they're trying to figure out, when there's fewer humans to buy our licenses, how do we make more money, how do we get more value based pricing based on outcomes, consumption, things like that. So the software industry is in a bit of an upheaval and there's lots of unknowns about where it goes. And you can see, I mean, we have, I don't know, seven to 10 core SaaS products we use to run SmarterX. I can see every one of them dealing with this stuff. Like, Asana is another one where you can just feel them trying to figure out where this goes and what the business model is and how they kind of get out ahead of it. But, I mean, one, yeah, it's fascinating to see OpenAI share this stuff. Two, it does open up all kinds of concerns, especially if they launch this agent builder later today. Like, that goes at Zapier and Make and all these other players, even Agentforce, you know, from Salesforce. It's a direct attack on those kinds of companies. So they are a very, very ambitious company that needs to make a whole bunch of money, and they're going to try a whole bunch of ways to make that money, and you don't want to be in their way when they do.
B
All right, next up, Sam Altman says that critics of GPT-5 have it all wrong in a new exclusive interview with Wired. So he sat down with Wired after a rocky August launch that was filled with glitches and gripes about ChatGPT, or about GPT-5 rather. And many called GPT-5 overhyped and even pointed to it as a sign that the AI boom was cooling off. But in this interview, Altman insists GPT-5 marks a real turning point. He argues it's the first model genuinely accelerating scientific discovery. It's helping physicists and biologists solve problems in ways earlier systems couldn't. And while skeptics claim scaling has stalled, OpenAI says the gains in GPT-5 came from smarter training, not necessarily bigger data and more compute. They're still spending hundreds of billions of dollars on new data centers and betting that scale plus reinforcement learning will eventually get them to the next phase of AI development. Now what's interesting is Altman actually changed how he defines the goal, which is AGI, in the interview. He says it's not a single moment where machines surpass us. It's a process. So it's not really being treated anymore as a finish line, but as a long, accelerating curve of progress. Now, Paul, it's a pretty interesting about-face from Sam here. What did you take away from this interview? It seemed like he was kind of trying to rewrite the record on some of these issues.
A
They've been moving on this AGI definition for a while. He's been hedging against it for the last 18 months. It's interesting, he does give a different definition every single interview he does.
B
Yeah.
A
But this, like, no longer having this definitive moment is sort of a talking point he's been weaving into a lot of what he's been saying for a while. The first model genuinely accelerating scientific discovery? It would be hard to make that statement. I mean, Google is certainly pretty far along and making some massive impacts on biology and chemistry. I mean, Demis won a Nobel Prize for chemistry. So that being said, I do think GPT-5 is underrated. I do think that most people don't really understand how good of a model it is. And there's been a lot of stuff I've seen, even in the last couple weeks, where it's assisting with math theorems and things like that, where you're starting to see some of the top mathematicians in the world actually using it to assist them. And so, you know, I think it is a great model. And in the first week or two out of the gate, it probably didn't get the recognition it deserved as being a great model. I use it all the time. I mean, I was definitely at a point where I was using Gemini 2.5 Pro more, and I would say that's kind of shifted. You know, it's maybe like 60/40 now. I'm probably back in ChatGPT 60% of the time and Gemini 40% of the time. Depends on the use case. Often I test them both. But it's a really good model.
B
Yeah, that is a point that always strikes me. You know, I'm sure you've had people come up to you as well at events where they're like, oh, I just got rid of my Gemini account. I'm over on ChatGPT or Claude full time. And I'm like, I don't understand how you do this. They change so often. I have to just have all the accounts.
A
Right. You just lost all your chat history, and three months from now you change back to the other one. Yeah, all right. Just ask for it for Christmas, or ask for it for your birthday. Say, hey, just give me a ChatGPT license for the year, a Gemini license for the year.
B
That's a great idea. Instead of a gift card, give me this, you know.
A
Yeah. And by the way, that's a great gift idea if you have students in your life. Although in higher ed you can get all the models for free until the end of May '26, maybe they don't know that. You can hold it up and say, I spent the $200. There you go.
B
All right, next up, Elon Musk says that Wikipedia is hopelessly biased. So his AI company, xAI, is building its own rival. This project is called Grokipedia, an open source knowledge platform powered by xAI's chatbot, Grok. And like Wikipedia, Grokipedia will invite public contributions. But Musk is promising fewer guardrails and more openness. He's casting it as an answer to what he calls an army of activists who are shaping narratives on the existing Wikipedia, which also happens to be a primary source for Google results and AI training data. Musk has also said this project is super important for civilization in a post on X. So, Paul, I guess a lot of things are potentially going on here. He's certainly within his rights to create Grokipedia. I guess I'm pretty skeptical we're suddenly going to get truth from a guy who routinely tweaks his own model when it says something he doesn't like. But I could be proven wrong.
A
Yeah. So this one we talked a little bit about a couple months ago, I don't remember what episode it was, where he said, like, hey, we want to basically rewrite history to be correct. And at the time we said, well, whose version of history is correct? Like, why would your version be more correct than somebody else's? So I would say, at a high level, no one would argue that Wikipedia doesn't have issues, that it doesn't have inaccuracies and things like that. I actually saw someone run an experiment last week where they were giving Wikipedia pages to GPT-5 and asking it to find the inaccuracies. And it was doing very well at it. Like, it was finding multiple inaccuracies and things like that. So there's no argument against this idea that Wikipedia is not a perfect source. That being said, my immediate reaction is, well, you're going to have bias in yours too. So I thought, well, you know what, let's try Grok. So I went into my X account and I opened Grok, which is their AI assistant, their version of ChatGPT. And I said, won't Grokipedia suffer from the same human bias Elon Musk is accusing Wikipedia of, just from a different perspective? So this is the response from Grok, which Elon Musk built: Elon Musk's push for Grokipedia aims to counter what he perceives as Wikipedia's bias. His vision is an AI driven, open source knowledge repository built by xAI, his AI company, leveraging the Grok model to provide unrestricted public access and sidestep Wikipedia's hierarchical editing structure, which he's criticized since 2023 for stifling free speech. However, your point hits on a fundamental challenge. AI isn't immune to bias. It inherits it from the humans who design, train and curate it. Grokipedia's reliance on Grok, an AI developed by xAI, means its outputs will reflect the data it's fed and the perspectives of its creators.
If Wikipedia's bias stems from its editor community, often skewing left leaning as Musk has claimed, or influenced by Western media dominance, Grokipedia could simply mirror a different bias, potentially one aligned with Musk's views or xAI's mission to understand the universe, which might prioritize tech centric or libertarian leaning narratives. For instance, xAI's recent GSA contract to deploy Grok to federal agencies could introduce a pro innovation, pro business slant shaped by Musk's influence and company goals. The risk is compounded by AI's opacity. Unlike Wikipedia's editable pages, where bias can be traced to specific edits, Grokipedia's AI generated content might embed bias in its training data, say from skewed data sets or human-labeled examples. Without clear accountability, will Grokipedia prioritize neutrality or reflect its creator's agenda? To avoid this, Grokipedia would need a diverse global contributor base and transparent bias mitigation processes. Without them, it risks swapping one bias for another, just with a shinier AI wrapper. Musk's track record suggests ambition, but the devil's in the execution. Data diversity and oversight will determine if Grokipedia transcends human bias or just reframes it. What do you think? Should they crowdsource the training data to balance it out? So that's a great answer from Grok. All Grok's perspective. None of my personal opinions in there. All I know is he tweeted that, like, version one of this thing, or version 0.1, is going to be out this month, and there's no way that they went through the process that Grok is proposing to eliminate bias from this with that time horizon. So all I will say is Grok will be trained on Grokipedia. So they're using Grok to write Grokipedia, and then that synthetic data from Grokipedia will be used as a replacement for Wikipedia, which is a dominant source of xAI training data for Grok. No further comment.
B
We'll be able to try it out soon enough. All right, next up, Thinking Machines Lab, the startup headed by former OpenAI CTO Mira Murati, has released its first product or tool, called Tinker. Tinker is a training API that strips away the messy infrastructure work of fine tuning large language models while still giving researchers control over the parts that matter most: data, algorithms, and evaluation. So the significance here is it makes experimentation really easy. Princeton and Stanford researchers testing it said it freed them up from worrying about compute and let them focus on their science. Andrej Karpathy called it a clever way to slice up the complexity of post training, giving developers about 90% creative control with under 10% of the engineering overhead. So in practice, Tinker could accelerate everything from building specialized classifiers to refining smaller models for niche tasks. So, Paul, the reason we're kind of talking about this is Thinking Machines Lab has raised a ton of money, but they've been really quiet about what they're actually building. Tinker is kind of a first look behind the curtain. Like, what does this tell us about what we can expect from Murati and her startup?
A
Their last round, they raised 2 billion at a $12 billion valuation. And I think the company's about a year old, right? That was July of 2025 when they raised. Yeah, I mean, Mira is a major player. As CTO of OpenAI, she played a major role in Sam, you know, getting ousted as the CEO and then a bigger role in getting him back as CEO. It seems like they're focusing on the technical side. Like, I don't think we're going to be getting a ChatGPT competitor from Thinking Machines Lab in the near future. Maybe that's on the roadmap. But they have been very stealthy to date. Very little is known, but it seems like they're going to take a very open approach to their research and their building. And so it's going to be intriguing to follow this. The average listener who's not a developer building AI, you're not going to be using Tinker, but it is just a company we like to keep an eye on because Mira is an important figure in AI today.
B
Next up, California just passed the nation's most ambitious AI transparency law. So Governor Gavin Newsom signed Senate Bill 53, SB 53, the Transparency in Frontier Artificial Intelligence Act. And it requires large AI developers to publicly disclose their safety frameworks, publish updates within 30 days, and explain how they're aligning with national and international standards. The law also creates whistleblower protections and a new system for reporting critical AI safety incidents directly to the state's Office of Emergency Services. Noncompliance carries civil penalties enforceable by the Attorney General. So California has nearly 40 million residents and AI hubs like Silicon Valley, so state-level rules here matter and can also help set de facto national standards, which is why we're talking about this, and it's also why some of the AI companies lobbied so hard against this. OpenAI argued this could stifle innovation. They pushed for federal or global agreements instead. Meta launched a super PAC to sway future regulation. And Anthropic, by contrast, endorsed the final version of this bill after some negotiations. So, Paul, this is definitely not as strict as the previous AI regulation being proposed in California, which we've talked about in the past, which was SB 1047, which was vetoed by Newsom. But this does seem like a pretty big deal nonetheless.
A
The labs hate this. I mean, outside of Anthropic, who I'm sure doesn't actually love it, generally speaking, but they are much more safety conscious and feel like there has to be something done. But the idea of having to adhere to 50 different state laws is a nightmare. And again, like, their point about dragging down innovation, increasing the cost of doing these things, and allowing America to keep a competitive advantage, which, again, I am not an expert in this stuff, but that makes sense. Like, if I was running an AI lab, I would hate the idea of having to work with all these different states. I can say, as an employer who has an employee in the state of California, they're a pain in the ass to deal with. Like, as a whole, California is very challenging. But, you know, I get why Governor Newsom is doing this. I understand that they can't wait for federal. Also, the current administration doesn't want states to get involved. They don't want any regulation, really. So, like, the current administration isn't going to step in and put the kind of protective measures in place, because they don't want it. Certainly not going to happen internationally. There's no way. So I don't know what the alternative is. So I don't personally see state by state as a very logical way to do this, but it doesn't seem like it's coming from anywhere else. So, like, who's going to do it if they don't do it? So I think that's kind of just where we are: somebody feels like they've got to do something. The labs don't want it, the current administration, the White House, doesn't want it. But that's why states have their own rights and their ability to do these kinds of things in their own laws. So yeah, I don't know. Maybe we'll dive into this one in a future episode. Like, get a little deeper into the reaction to this. 
I didn't have time to pull like a ton of how the different labs and leaders are responding to this, but that's certainly not what they wanted to have happen.
B
So on a past episode we talked about OpenAI's GDPval benchmark, which basically is trying to measure how well AI does real-world economic work. And hot on the heels of that, the company Mercor has just released the AI Productivity Index, or APEX, which ranks models on their ability to handle work in consulting, finance, law and medicine. APEX, like GDPval, simulates real-world deliverables like drafting contracts, building financial models, or diagnosing patients, and then has domain experts grade the output. The final results of this initial benchmark put GPT-5 on top, scoring 64% overall. Grok 4 came in second, Gemini 2.5 Flash came in third. Now one interesting finding is that while GPT-5 led across all four of these fields, some cheaper models outperformed premium ones, suggesting cost doesn't always equal capability. And even the leading models struggled on tasks like redlining contracts, where the top models barely cleared 50% success rates. So advisors to APEX, interestingly, include Larry Summers in economics, Cass Sunstein in law, and Eric Topol in medicine, all huge luminaries in their fields. And Paul, what I found really interesting here is that they devised these test assignments in all these domains, partnering with experts in each one. So for instance, I was drawn to looking at some of the consulting stuff. They had a benchmark for the role of consulting associate, and they had input there and review from experts from McKinsey, BCG, Deloitte, Accenture, EY. They were also advised on this by a former McKinsey global managing director. So I thought that was pretty fascinating to see.
A
Yeah, and we talked about Brendan Foody, the CEO, I think on the last weekly episode. I just listened to another podcast with him a couple days ago. I think this one was... so I talked about the 20VC podcast. Yeah, this was Lenny's Podcast: "Why experts writing AI evals is creating the fastest-growing companies in history." Another really good podcast episode. I'll drop the link in the show notes. Yeah, I mean, again, you've got to remember what Mercor is doing, which is they're building the reinforcement learning economy of experts teaching AI models how to do their jobs. So they're hiring bankers from Goldman Sachs and attorneys from top law firms and doctors and consultants. And they're paying these people $95 to $500 an hour to train the models to do the work of experts across all these domains. So there's a number of reasons why they would do this kind of research, but it's fascinating stuff. And, you know, I think that when you listen to Brendan, the CEO, like, they're very aggressively going to go after this. And I think it's good that they're sharing the research, because they need to create more awareness about what they're doing and the impact that's going to have on the economy and jobs. You know, we mentioned Brendan is someone to pay attention to. He's only 22. Yeah. Which is kind of crazy, the impact he's already having. But I would listen to what he says and listen to these episodes if you really want to understand what they're doing and the broader impact all this is going to have. But yeah, these evals are fascinating. And I think I mentioned maybe on episode 170. It would have been... yeah, 170. I've been working on this idea, I mentioned it to Mike a couple months ago, of, like, helping companies develop their own evals. I think this is really important, that you're not sitting around watching for all these other evals and waiting for them to do a study of your job or your industry. 
I think you need to do those yourself. And I've got some ideas of how to help people do that, but, like, at a real high level, just to sort of open source the thinking here: take key job titles or key roles within your company and find the fundamental things that those people do. And then when new models come out, have set prompts that allow you to evaluate how far along these models have come. So if you have a task that generally takes, say, 10 steps, whatever, and it takes you two hours today, and you go check, you know, Claude Sonnet 4.5, or, you know, Gemini 3 when it comes out, and you say, okay, give it the prompt, give it that project, and how well does it do, you know, with accuracy, how much time did it take to do it. And then when the next model comes out, same prompt. Like, you have this set way to benchmark your own work against how the models are evolving. And I think that's what's missing. I don't know a single company that's doing this from a knowledge work perspective. They're doing it from a coding perspective. But again, I guess my call to action here is don't wait around for Mercor or OpenAI or whoever else to figure out the benchmarks for your company. Develop them yourself. It's something we're going to do at SmarterX this fall, and I'm going to try and take that learning and package it up so other people can do the same thing in some simple ways.
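The do-it-yourself benchmark Paul outlines here could look roughly like the Python sketch below. It is only an illustration under assumptions: the names `EvalTask`, `EvalResult`, `run_eval`, and `compare` are hypothetical, the two "models" are stub functions standing in for real API calls, and the keyword grader stands in for the expert review a real eval would need.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class EvalTask:
    role: str             # job title the task belongs to
    name: str             # short identifier for the task
    prompt: str           # the fixed prompt reused for every model
    human_minutes: float  # how long the task takes a person today


@dataclass
class EvalResult:
    model: str
    task: str
    score: float    # 0.0 to 1.0, assigned by the grader
    seconds: float  # wall-clock time of the model call


def run_eval(model_name: str,
             call_model: Callable[[str], str],
             tasks: List[EvalTask],
             grade: Callable[[EvalTask, str], float]) -> List[EvalResult]:
    """Run every task's fixed prompt through one model and grade the output."""
    results = []
    for task in tasks:
        start = time.perf_counter()
        output = call_model(task.prompt)
        elapsed = time.perf_counter() - start
        results.append(EvalResult(model_name, task.name, grade(task, output), elapsed))
    return results


def compare(old: List[EvalResult], new: List[EvalResult]) -> Dict[str, float]:
    """Score change per task between an earlier run and a later run."""
    old_scores = {r.task: r.score for r in old}
    return {r.task: round(r.score - old_scores[r.task], 3)
            for r in new if r.task in old_scores}


# Hypothetical stand-ins: real use would wrap actual model APIs and a human grader.
def old_model(prompt: str) -> str:
    return "draft"


def new_model(prompt: str) -> str:
    return "draft with sources"


def keyword_grader(task: EvalTask, output: str) -> float:
    return 1.0 if "sources" in output else 0.5


tasks = [EvalTask(role="Consulting associate",
                  name="market-sizing memo",
                  prompt="Size the market for product X and summarize it in a one-page memo.",
                  human_minutes=120)]

baseline = run_eval("model-v1", old_model, tasks, keyword_grader)
latest = run_eval("model-v2", new_model, tasks, keyword_grader)
print(compare(baseline, latest))
```

The point of the design is that the task list and prompts stay fixed, so each new model release is measured against the same work and the per-task score change is comparable over time.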
B
I love that. And in the meantime, go to the link we'll have in the show notes to APEX, because if you click into each of these areas, they literally show you a sample prompt that they use. And it's just straightforward. It's really good for ideas in your own industry.
A
Yeah, so okay, yeah, I'm looking. Because if you don't.
B
Yeah, yeah, consulting associate. And they'll say like your client's a private equity investor targeting Malaysian small to medium sized companies, blah, blah, blah. And there's this whole like series of steps to take giving you a sample file. Go like run these calculations or find this output.
A
Yeah, this is really good. Yeah, definitely take a look at this and use it as inspiration for those kind of like personalized ones we were talking about. Yeah, that was good.
B
All right, we've got some more news around AI's evolving impact on jobs and the economy. So a few items here we're tracking. So first, a report from CNBC says the airline Lufthansa is cutting 4,000 jobs and leaning on AI to fill the gap. The airline says most of the cuts will come from admin roles in Germany. This is part of a sweeping restructuring plan, and they said that the increased use of AI will lead to greater efficiency in many areas and processes. Second, Business Insider says nearly 90% of BCG's 33,000 employees now use AI. And performance reviews fold AI use into the various skills consultants are judged on, like problem solving and insight. I found it interesting too that they're also pushing hard into custom GPTs. So employees are building no-code tools to check slides, anticipate client questions, and even enforce BCG formatting. And they've now built more of these than any other OpenAI customer, apparently. Third, Citibank is putting every one of its 175,000 employees through AI training. The bank sent out a memo this week announcing that learning how to prompt effectively is now mandatory. And so basically they just say that if you learn to get better at prompting, you're going to make your AI work much more powerful. Now, fourth, kind of a coda to all this: for all the headlines about AI threatening jobs, wiping out jobs, some new data tells a different story. It says the US labor market hasn't yet been meaningfully disrupted. Researchers at Yale looked at labor market trends since the launch of ChatGPT in late 2022 and found no major disruption. They found the mix of jobs is shifting a little bit faster than usual, but not dramatically and not in a way that clearly points to AI. And they see that some of the biggest shifts started before generative AI even hit the scene. So AI tools like ChatGPT and Claude are transformational for many, but according to this, haven't yet transformed employment patterns. 
Most workers are still in the same kinds of jobs. There's been no clear spike in unemployment tied to AI exposure. So, Paul, another week, another set of signals that AI is definitely having an impact on how companies are hiring and training. Interesting to see that according to the Yale research, the widespread job loss or disruption is not yet showing up in their data.
A
I would love to see that report in 12 to 18 months. I hope it says the same thing. I'm not optimistic that it will. Yeah, I like to see the, the movement on the literacy stuff, though. And the prompt training.
B
Yes.
A
Like, that's great. And the building of GPTs, which is what we've always said: that's the fastest way to value and to help people understand AI who don't understand it. Build a GPT that helps them do their job, a personalized GPT that does the things they do and, like, assists them and takes away some of the mundane, repetitive stuff that they don't enjoy or find fulfilling. Like, that's how you have success. So it's good to see this stuff. The prompt training is critical. You gotta have, like, the fundamental training with it, though. Like, prompt training on its own, like, "Oh, Mike, here's five prompts to use." It's like, why am I using it this way? How does the machine work? It's fine. Like, you'll get further by providing prompt training. That's good. And a catalog of prompts to use. But fundamental understanding of the technology is also critical if you actually want to, like, reskill and upskill people in the organization. So hopefully there's an element of that going on as well.
B
All right, Paul, our final topic. We've got AI product and funding updates. I'm just going to run through these real quick and kind of close us out here.
A
Sounds good.
B
All right, first up, Google is rolling out a major visual upgrade to its AI Mode in Search. You can now search with images and conversational text, asking for, you know, things like, hey, show me barrel jeans that aren't too baggy, and get back a stream of shoppable options. And this new feature is powered by Gemini 2.5. It's designed to make visual exploration and shopping online feel much more natural. Google is also launching a redesigned Google Home app. It's deeply integrated with Gemini. It has a new Ask Home feature that lets you use natural language to control devices, find specific camera clips, and even describe complex routines you want to create. Many of the new Gemini-powered features, including daily summaries called Home Brief, will require a new Home Premium subscription starting at 10 bucks a month. A new AI startup called Periodic Labs has launched with a massive $300 million seed round to build what they call AI scientists. This is founded by top researchers from OpenAI and Google DeepMind, and the company aims to move beyond Internet data by creating automated robotic labs. These labs will allow AI to design, run and learn from physical experiments to accelerate discoveries in fields like materials science. Apple is reportedly shelving plans for a cheaper, lighter version of its Vision Pro headset. Instead, the company is shifting resources to prioritize the development of AI-powered smart glasses designed to compete with products from Meta. The first version of these glasses is expected to pair with an iPhone and rely heavily on voice interaction, with a potential release in 2027. And last but not least, we mentioned this a couple times: a reminder that OpenAI will have its Dev Day today, Monday, October 6th, after we record today's episode. So tune in for that. OpenAI says it'll be the largest one they've run, with 1,500 developers expected. So, Paul, that's all we've got in a packed week of AI. And I'm sure there's more to come. 
Thanks for breaking everything down for us.
A
Yeah, I was just scanning to see if any news leaked about Dev Day, and I came across an Axios article we might have to talk about next week. It says Senate Democrats warn AI could erase nearly 100 million U.S. jobs in the next decade, according to a new report from the Senate HELP Committee. Again, by Axios: a ChatGPT-based analysis says 89% of fast food jobs, 64% of accounting roles, and 47% of trucking positions are at risk. Senator Bernie Sanders wrote that AI and robotics being developed today will allow corporate America to wipe out tens of millions of decent-paying jobs, cut labor costs and boost profits. This is the prelude to the political upheaval that they want to cause going into the midterms. So I don't want to diminish the significance of things like this at the end of an episode, but again, I'm just scanning this now. This is exactly what I would expect. This is the playbook I would have expected them to start running, where you have to start seeding doubts about AI's positive impact and focus on the negatives. So yeah, we will have to pull this and talk a little bit more about it next week, along with all the Dev Day stuff and maybe some other new models from some other companies. It's going to be an endlessly exciting October, but we also have MAICON next week. Yes. Mike and I are going to record on Friday this week, I think, because we've got to be ready for MAICON next week. Final call to action: MAICON.AI (maicon.ai). We would love to have you in Cleveland October 14th to the 16th with us. We will have a regular episode next week. Even though we're going to be at MAICON, we're going to record it ahead of time. We'll launch that, and then, I guess, we can't skip an episode in October. Too much to talk about. We would run out of time. All right, thanks everyone for joining us. We will talk to you next week, and hopefully we will see a bunch of you in Cleveland next week. Thanks for listening to the Artificial Intelligence Show. 
Visit SmarterX.ai to continue on your AI learning journey and join more than 100,000 professionals and business leaders who have subscribed to our weekly newsletters, downloaded AI blueprints, attended virtual and in-person events, taken online AI courses and earned professional certificates from our AI Academy, and engaged in the Marketing AI Institute Slack community. Until next time, stay curious and explore AI.
The Artificial Intelligence Show
Episode #172: Sora 2, Claude Sonnet 4.5, ChatGPT Instant Checkout, How OpenAI Uses AI, Grokipedia & Mercor’s AI Productivity Index
Hosts: Paul Roetzer & Mike Kaput
Date: October 7, 2025
In a blockbuster episode, Paul and Mike break down a whirlwind of essential AI news, focusing on:
Throughout, the hosts maintain an insightful but approachable tone, challenging listeners to think both strategically and skeptically about the AI-driven future.
(07:24–31:30)
“Is this actually Sora? Here’s your AI slop feed with all these Nintendo characters and Pokemon and South Park and SpongeBob SquarePants, everything...it was just like all this copyrighted stuff immediately. That was all that, and Sam Altman was all you see in the feed. And so I was immediately like, oh, I will never use this. Like, this is not interesting to me at all as a user.” — Paul (00:00, 09:47)
Copyright Chaos: Despite guardrails, the early feeds were flooded with copyrighted characters (Star Wars, Batman, Mario, etc.)
OpenAI’s PR Response:
Bigger Questions:
Mr. Beast (YouTube star) asks:
“When AI videos are just as good as normal videos, I wonder what that will do to YouTube and how it’ll impact the millions of creators currently making content for a living. Scary times.” (26:36)
Vinod Khosla (VC):
“All the replies...are from tunnel vision creatives. Let the viewers of this slop judge it, not ‘Ivory tower Luddite, snooty critics or defensive creatives’...opens up so many more avenues of creativity if you have imagination.” (26:37)
“I have no idea why I would ever go back into that app...from an entertainment value or an educational value. It’s zero to me.” — Paul (30:46)
(31:30–41:59)
Anthropic’s new Sonnet 4.5:
Strategic Focus: Anthropic is betting on AI for coding/agents, seeing quick revenue in automating software production and AI research itself.
“The bitter lesson”:
(42:01–46:57)
“It’s so natural to just inject ads in and be all the better if I can just click one button and I can just make my purchase right from there.” — Paul (44:08)
(49:40–53:43)
“SaaS companies are in trouble.” — Paul (50:50)
“OpenAI announced internal software applications that could potentially compete with existing SaaS offerings. The news sparked concerns across the sector.” (Yahoo Finance, 51:26)
(53:43–56:39)
(57:27–62:23)
“AI isn’t immune to bias. It inherits it from the humans who design, train and curate it.…[Grokipedia] risks swapping one bias for another, just with a shinier AI wrapper.” (Grok’s answer, paraphrased by Paul at 59:52)
(67:45–73:27)
“Don’t wait around for Mercor or OpenAI or whoever else to figure out the benchmarks for your company. Develop them yourself.” (69:31)
(73:27–76:55)
Job shifts:
Yale data: Despite the hype, there’s “no evidence” generative AI has yet disrupted US employment patterns. The hosts are skeptical this will still be true in 12–18 months.
On skills:
“Prompt training is critical, but fundamental understanding of the technology is also critical if you actually want to reskill and upskill people in the organization.” — Paul (76:05)
(47:15–77:02)
| Segment | Main Topic/Segment Description | Start Time |
|---------|--------------------------------------------------------|------------|
| 1 | Welcome, Episode Context, and MAICON Plug | 00:00 |
| 2 | Sora 2 & Video App: Tech, Personal Experience, Legal | 07:24 |
| | - Viral Sora “slop” & Copyright Blowback | 09:47 |
| | - Legal Response (Krista Laser) | 20:22 |
| 3 | Broader Industry Response & Societal Impacts | 26:36 |
| 4 | Claude Sonnet 4.5: Coding, Automation, and “Bitter Lesson” | 31:30 |
| 5 | ChatGPT Instant Checkout, Shopping & Ads | 42:01 |
| 6 | Rapid Fire: Finance, OpenAI on OpenAI Tools | 47:15 |
| 7 | SaaS Industry Threat, HubSpot Stock Shock | 50:50 |
| 8 | GPT-5 Backlash & Altman's AGI Rethink | 53:43 |
| 9 | Grokipedia (Musk/Wikipedia Rival), AI Bias | 57:27 |
| 10 | Thinking Machines’ Tinker Release | 62:23 |
| 11 | California AI Regulation | 64:30 |
| 12 | Mercor/APEX Benchmarking Knowledge Work | 67:45 |
| 13 | AI’s Real (and Not-yet) Impact on Jobs | 73:27 |
| 14 | News Roundup: Google, Apple, OpenAI Dev Day | 77:02 |
This episode rigorously unpacks the breathtaking acceleration—and growing pains—of the AI industry in autumn 2025. The key themes are:
“You gotta know where the tech is going…this is not something you can just sit back and not worry about for a quarter.” — Paul (46:57)
For more details, links to referenced podcasts, articles, and legal opinions, refer to the episode show notes at SmarterX.ai.