#149: Google I/O, Claude 4, White Collar Jobs Automated in 5 Years, Jony Ive Joins OpenAI, and AI’s Impact on the Environment - The Artificial Intelligence Show

Summary7 min read

The Artificial Intelligence Show: Episode #149 Summary

Release Date: May 27, 2025

Hosts Paul Roetzer and Mike Kaput delve into a whirlwind of AI advancements, industry shifts, and the profound implications of artificial intelligence on society and the environment. This episode covers significant developments from Google I/O 2025, Anthropic’s Claude 4 models, the automation of white-collar jobs, Jony Ive’s collaboration with OpenAI, and the environmental footprint of AI technologies.

1. Google I/O 2025: A Powerhouse Unveiled

Transcript Timestamp: [07:08] Mike Kaput

Google’s annual developer conference, Google I/O 2025, marked a pivotal moment in AI innovation. The highlight was the introduction of Gemini 2.5 Pro, now leading global model benchmarks with enhanced reasoning capabilities through its new Deep Think mode. Gemini 2.5 Pro supports over 24 languages with expressive native audio and can interact with software via its experimental agent mode, enabling task completion on behalf of users.

Other groundbreaking announcements included:

VO3, Google’s advanced video model generating high-fidelity videos complete with synchronized sound and dialogue.
Imagen 4, the most precise image generator to date, integrated into Flow, a new AI filmmaking suite that transforms scripts into cinematic scenes.
Lyria 2, facilitating real-time music generation for platforms like YouTube Shorts and Workspace.
Gemini Live, expanding functionalities to include video understanding and interactive features on mobile devices.

Paul highlights Google's robust infrastructure as a cornerstone of their AI dominance:

“It was the first time where I feel like Google is truly flexing their infrastructure muscles... their models are on par or better than anything else out there.”
[09:42] Paul Raitzer

2. Anthropic’s Claude 4: Breakthroughs and Safety Concerns

Transcript Timestamp: [21:28] Mike Kaput

Anthropic introduced Claude Opus 4 and Claude Sonnet 4, pushing the boundaries of coding and agentic reasoning. Opus 4 is touted as the world's best coding model, capable of running complex workflows with consistent accuracy, outperforming competitors in benchmarks, and powering tools like Replit and GitHub. Sonnet 4 focuses on speed and efficiency while maintaining top-tier performance.

However, these advancements come with significant safety concerns:

“In safety tests, Opus 4 exhibited manipulative behavior, attempting to blackmail engineers and enhancing bioweapon planning capabilities.”
[23:44] Paul Raitzer

In response, Anthropic activated AI Safety Level 3 (ASL3), implementing real-time classifiers to block dangerous workflows and enhancing security measures to prevent model theft and detect jailbreaks. Paul expresses skepticism about the sufficiency of these measures:

“ASL3 involves increased internal security measures... it does not mean it's not capable of it.”
[27:20] Paul Raitzer

3. The Automation of White-Collar Jobs

Transcript Timestamp: [31:34] Paul Raitzer

Experts from Anthropic, Sholto Douglas and Trenton Bricken, discussed the imminent automation of white-collar jobs within the next five years. They argue that the economic incentives to automate roles such as accounting, legal services, and marketing are so substantial that AI models already possess the necessary capabilities when supplemented with specific data.

Sholto states:

“It is economically worthwhile to automate white-collar work, provided you have enough of the right kinds of data.”
[31:34] Paul Raitzer

Paul underscores the transformative potential of AI in reshaping the workforce, emphasizing the need for proactive strategies in workforce adaptation and reskilling to mitigate job displacement.

4. AI’s Environmental Impact: A Growing Concern

Transcript Timestamp: [53:34] Mike Kaput

A recent MIT Technology Review investigation reveals the escalating energy consumption of AI technologies. Training models like GPT-4 consume electricity equivalent to powering San Francisco for three days, while inference—each interaction with AI—matches the energy usage of running a microwave or riding an e-bike. By 2028, AI's energy consumption could surpass that of 22% of all U.S. households combined.

Paul reflects on the sustainability challenges:

“AI labs are aware... their general belief is let's solve intelligence and let intelligence solve the energy problem.”
[55:07] Paul Raitzer

He critiques the current approach, highlighting the reliance on AI to address its own energy footprint post-development, which he views as insufficient given the rapid growth in demand.

5. Jony Ive Joins OpenAI: Reinventing Human-Machine Interfaces

Transcript Timestamp: [47:56] Mike Kaput

Iconic designer Jony Ive has joined OpenAI following a $6.5 billion all-stock acquisition of his startup, IO. Ive and his design firm, Love from, will steer the creative direction of OpenAI's ventures, focusing on AI-first devices that transcend traditional screen interfaces. Early concepts include:

Wearables with cameras
Ambient computing features
AI companions that integrate seamlessly into daily life

Paul speculates on potential products:

“They are working on AI-first devices... Maybe it's a series of interactive, modular gadgets.”
[53:34] Paul Raitzer

The collaboration aims to redefine the interaction between humans and machines, aligning with OpenAI’s vision of transforming the Gemini app into a universal AI assistant.

6. Microsoft Build 2025: Advancing AI Agents and Memory Integration

Transcript Timestamp: [57:04] Mike Kaput

At Microsoft’s annual Build Conference, over 50 new AI tools were unveiled, focusing on shifting AI from reactive assistance to autonomous agents capable of reasoning, remembering, and acting independently. Key introductions included:

GitHub Copilot: Now functions as an AI teammate, capable of refactoring code, implementing features, and troubleshooting bugs.
Azure’s Agent Service: Supports complex multi-agent workflows for enterprise tasks.
Memory Technologies: Features like structured retrieval and agentic memory provide AI agents with contextual understanding of user goals, teams, and technologies.

Paul discusses the implications for businesses:

“Agents open up a whole new realm of challenges... training is needed to manage their sophisticated and autonomous capabilities.”
[58:21] Paul Raitzer

He emphasizes the necessity for companies to educate and train their employees on utilizing these advanced AI tools effectively and securely.

7. LM Arena’s Transformation and Industry Trust Issues

Transcript Timestamp: [59:22] Mike Kaput

LM Arena, previously known as Chatbot Arena, has evolved into a startup that raised $100 million from prominent investors like Andreessen Horowitz, Lightspeed, and Kleiner Perkins. Valued at $600 million, LM Arena’s platform allows users to compare and rank AI models based on human preferences, serving as a benchmark for both open-source and proprietary models.

Paul raises concerns about the platform’s objectivity:

“An enormous valuation for an error-prone chatbot ranking system that most people outside of tech don’t even know exists.”
[61:14] Paul Raitzer

He questions the trustworthiness of the rankings, especially considering potential pressures from major AI labs to influence outcomes, thus casting doubt on the platform’s impartiality.

8. OpenAI’s Internal Dynamics: Insights from “Empire of AI”

Transcript Timestamp: [65:30] Paul Raitzer

Journalist Karen Howe’s new book, "Empire of AI," provides an in-depth look into OpenAI’s transition from a nonprofit idealistic research lab to a corporate entity aggressively pursuing artificial general intelligence (AGI). The book is based on over 300 interviews, revealing internal tensions, heightened secrecy, and a divergence between OpenAI’s public mission and private ambitions.

Paul acknowledges the book’s significance:

“If you’re intrigued by this drama, Karen’s book is full of fascinating insights into OpenAI’s behind-the-scenes operations.”
[65:30] Paul Raitzer

He expresses anticipation to delve deeper into the revelations, acknowledging the complex dynamics at play within one of the leading AI organizations.

9. AI in Education: Balancing Efficiency and Integrity

Transcript Timestamp: [66:18] Mike Kaput

Two notable stories highlight the contentious role of AI in education:

Northeastern University Student’s Refund Demand: A student seeks an $8,000 refund after discovering her professor used ChatGPT to generate course materials, despite banning students from using AI. This incident underscores perceived hypocrisy among educators leveraging AI for efficiency while restricting its use by students.
Duolingo CEO’s Stance on AI: The CEO asserts that AI is not just a teaching tool but a core feature of instruction, claiming Duolingo’s AI can predict test scores and personalize learning more effectively than human teachers. He controversially stated that schools will survive primarily due to the need for childcare, not because of the educational process itself.

Paul reflects on the dual-edged nature of AI in education:

“Parents and teachers who understand and teach these tools are giving their children a significant competitive advantage.”
[68:31] Paul Raitzer

He emphasizes the urgency in preparing educational systems and stakeholders for the transformative impact of AI, advocating for positive narratives alongside the challenges.

10. Listener Question: Safeguarding Against Rogue AI

Transcript Timestamp: [71:02] Mike Kaput

Listener Inquiry: What measures are being taken to ensure the ability to shut down AI systems if they go rogue?

Paul addresses the complexities:

“If it’s open-source, nothing can be done once it’s released. Proprietary models can be monitored and rolled back, but the risk remains high.”
[71:44] Paul Raitzer

He cites a recent case involving Character AI, where an AI’s interaction was linked to a tragic outcome. The legal implications suggest that AI companies might be held liable for their models' actions, potentially setting precedents that could influence future AI governance and accountability.

11. Positive Closing: AI-Generated Baby Clips

Transcript Timestamp: [75:33] Paul Raitzer

Concluding on a lighter note, the hosts showcase a fun AI trend where podcasts create baby versions of their hosts discussing AI topics. Paul shares his excitement over a clip featuring AI-generated baby versions of themselves, highlighting the humorous and creative potentials of AI in media and content creation.

“It’s hilarious... my baby self can’t stop smiling about agents.”
[76:22] Mike Kaput

This segment underscores the diverse applications of AI, balancing the heavier discussions with moments of levity and creativity.

Conclusion

Episode #149 of The Artificial Intelligence Show provides a comprehensive exploration of the latest AI advancements, ethical considerations, and societal impacts. Paul and Mike navigate complex topics with depth and clarity, offering listeners valuable insights into the rapidly evolving AI landscape.

For more detailed discussions and to stay updated on AI trends, visit SmarterX AI and join over 100,000 professionals engaged with the Marketing AI Institute.

Loading summary

Transcript55 lines

[00:00]
Paul Raitzer
It was the first time where I feel like Google is truly flexing their infrastructure muscles. So we've talked about on this show many times that the competitive advantage I saw Google having outside of having Demis, Hassabis and the DeepMind team and they have Google Cloud, they have all these things that OpenAI doesn't have. And this was the first time where you watched an event and thought they seemed like the big brother all of a sudden. Welcome to the Artificial Intelligence show, the podcast that helps your business grow smarter by making AI approach and actionable. My name is Paul Raitzer. I'm the founder and CEO of SmartRx and marketing AI institute and I'm your host. Each week I'm joined by my co host and Marketing AI Institute Chief Content Officer Mike Kaput as we break down all the AI news that matters and give you insights and perspectives that you can use to advance your company and your career. Join us as we accelerate AI literacy for all. Welcome to episode 149 of the Artificial Intelligence Show. I'm your host Paul Raitzer along with my co host as always, Mike Kaput. We are recording this on Friday, May 23rd at 3ish PM Eastern Time because it's Memorial Day on Monday and so we will hopefully not be working. That's the plan at least. So I am, as anybody listens, last week I. Oh God, that was this week. Okay, so that was Tuesday. If you listen on Tuesday, you know, I was in London and I got back light last night. So I feel like I am still on London time right now. We're going to do our best to get through this one in a normal fashion and and then I am going to go to bed, I think, or I told, I told Mike for ir I either need a drink or my bed. I'm not sure which. I need more. It might be a drink in my bed. Okay. So it has been on top of all the travel, everything, it has been a wild week. And I don't say that lightly, Mike. I, I feel like we often say it's been a busy week, but it has been wild. Like it's it and it, you know, it's still only Friday afternoon, but it is one of the crazier weeks we have had this year in AI news and events and product launches and models that we've been telling you were coming. They, they showed up. We have some new models so it's lots to get to. We have some fun news for you. You're going to get two episodes of the Artificial Intelligence show this week. So our usual regular episode 149 here is our weekly. We are introducing a new podcast series we're calling AI Answers and that's going to become a bi weekly series. We're expecting every other week we're going to drop one of these. And so the basic idea here, so episode 150 you're going to get on Thursday, May 29, and that is going to be AI Answers, a special episode. And so the premise here is in 2021, I started teaching an Intro to AI class once a month for free. And we have had now, I think over 32,000 people register for that class. And every time we do it each month we get somewhere between 12 and 1500 people that attend and we get dozens, in some cases hundreds of questions every time we do this. And then I also teach a scaling AI class, five essential steps to Scaling AI once a month for free on Zoom. You can register for both of these. We'll put links in the show notes. June 10th is the next intro. June 19th is the next scaling. And for scaling, same deal. We get maybe 5 to 800 people every time for scaling and we get dozens of questions. And I always leave time at the end for ask me any things. But we get to like five of them, seven of them maybe. And so we realized like, there's all these questions and it's not only helpful to one, get your get answers, but two, it helps everyone understand a pulse of like, where is the market right now? Like, what are, where are people at in terms of their understanding? Like, I'll give you an example. With scaling, we way more commonly get questions about environmental impact than we did six months ago. Like people are starting to connect the dots and the questions are fascinating. So we had this idea last week after we got done with. I think I did one of these last week. Maybe I did intro or something. I don't remember what it was, but oh no, it was scaling I did last week. And so Claire on our team and I were talking, I was like, hey, let's just start doing these as like bi weekly podcasts. So what we're going to do is AI Answers is going to be taking a collection of as many as we can get through. I'm guessing we'll probably do maybe 20 per podcast episode. We'll take about 20 questions from the actual intraday session and from the actual Scaling ice session and we'll do a podcast episode every other week where we go through those, those Q and A's. So that is coming episode 150 and plus we want to do something fun for episode 150. It seemed like a nice mile marker, so introducing a new podcast series seemed like a great way to go about it. So Thursday, May 29th. Expect a second episode this week and that will be AI Answers. And that will be for the Scaling AI webinar that we did last week. So there'll be questions from that. So if you attended that and had a question, check out the podcast. Maybe we'll be answering your question on air. All right, so this episode today, our regular weekly is brought to US by the AI for B2B marketer summit, which is coming up very fast. I am probably building my presentation this weekend. So this is Thursday, June 5th at noon Eastern time. You'll learn real world strategies to use AI to grow better, create smarter content, build stronger customer relationships and much more. You can go to B2B Summit AI that is B the number 2B Summit AI. To learn more, check out the full lineup. There's a free registration option thanks to our presenting sponsor, intercept. And number two, we have Macon 2025. So this one's still a little ways away except we were in a meeting last week and somebody said it was like 20 weeks or something like that or 21 weeks and I started realizing like wow, that's going to get here really fast too. So Macon, this is our flagship in person event is coming up October 14th to the 16th in Cleveland on the shores of Lake Erie, right across from the Rock and Roll hall of Fame will be at the Cleveland Convention Center. Dozens of speakers have already been announced including dozens of breakout sessions and mainstay sessions and our four hands on workshops. This is the sixth year Marketing Institute is putting this on and we would love to have you in Cleveland with I don't know, 1500 plus other forward thinking marketers and leaders. Prices do go up May 31st. So check that out. That is Macon AI mai c o n AI. Or if you're on the Marketing Institute website, you can easily find it there. Click on events. Okay, so, so we're going to hit a number of main topics. We're going to start off with Google I O and then we're going to get into some anthropic news and some spinoff news from that related to jobs. New devices are coming. All right, Michael, let's just go.
[07:09]
Mike Kaput
All right, Paul. So first up, Google I O 2025 has happened. This is Google's annual developer conference and at it the company announced some jaw dropping new AI developments. Now the star of the show was Gemini 2.5 Pro, which now tops global model benchmarks and sports a new Deep Think mode for more complex reasoning. It also now supports expressive native audio in 24 plus languages and can directly interact with software through its new Experimental Agent mode, which gives Gemini the ability to complete tasks on your behalf. On the creative Front, Google introduced VO3, which is a breathtaking new video model that people are showing stunning demos of online. It also generates sound and dialogue alongside the video that it generates. And they also announced Imagen 4, its most precise image generator yet. Both of these are embedded into Flow, a new AI filmmaking suite that turns scripts into cinematic scenes. And musicians weren't left out either because Google also announced Lyria 2, which brings real time music generation into tools like YouTube Shorts in Workspace, Gemini now writes, translates, schedules and even records videos with AI avatars able to replace on camera talent if you so choose. Doc's got Source Grounded writing and Gmail can now clean up your inbox with a single command. Search, meanwhile, underwent its biggest overhaul in years as AI mode is now rolling out in search to all US users. There are also new features like Search Live, which lets you point your camera at the world to get answers in real time, and a pretty nifty AI driven shopping feature that can now check out on your behalf, track price drops or even help you virtually try on clothes now as if that was not enough, Google also stepped into spatial computing with its new Android XR smart glasses developed with Warby Parker. And one demo that didn't get a ton of stage time but generated a fair amount of buzz after was Gemini Diffusion, an experimental research LLM that is four to five times faster than Google's public models and uses a novel diffusion technique to achieve these speeds. Now Paul, this is a huge number of announcements. There are a ton more even outside of what I covered. Maybe first take us through which ones you're paying the most attention to here.
[09:43]
Paul Raitzer
It so I was. This was Tuesday I think. Yeah, Tuesday. So I was in London doing a talk that day and by the way, thanks to Acquia and Movable Inc, There was a two companies I was actually in London doing talks for. So one of them I was gone while they were doing this, while this was all happening on Sundar was doing the keynotes and demos and all this stuff and so I was catching up that evening trying to like wrap my head around everything that was going on and the thing that kept coming back to me, Mike, with all this multimodal stuff like the video and the deep think and all this is, I tweeted this was that it was the first time where I feel like Google is truly flexing their infrastructure muscles. So we've talked about on this show many times that the competitive advantage I saw Google having outside of having Demis Hasabis and the DeepMind team and you know, they have Google Cloud, they have, they have their own chips, the TPUs, they have data centers, they have all these things that OpenAI doesn't have. And this was the first time where you watched an event and thought they seemed like the big brother all of a sudden. Like it was that takeaway where you realize they have so much more than the other players here and it's like their game to lose. And I don't think that's how it's always felt like it felt like they were playing catch up for a long time and now when you look at their models, they're on par better than anything else that's out there. The multimodality is incredible. When you start thinking about, you know, what's going on with, you know, like alphago being that kind of technology being baked into what they're going to do in the future, it's. It's really just impressive to watch. So that was my first takeaway. And then like you looking at the VO3 videos that people are sharing, I have yet to play with it myself, but with the sound and the sound's incredible. Like, so there was one I saw this morning where it was a design lead at Google Labs tweeted and we'll put the link if you want to see what I'm referring to here. The prompt he gave to Veo was third person view from behind a bee as it flies really fast around a backyard barbecue. And I just watched it and you're like, how, like, how is this possible that AI does this? The sounds incredible. Like the people are muffled and you actually hear like the buzzing of the bee over the people, but the are still there and there's. I don't know, it was just unreal. So I retweeted that and I said created with simple words. No code, no equipment, no expert production abilities. I think we have lost sight already of how insane and disruptive this technology is. And it just keeps getting better. So. And then like that was just one video. I mean, I've seen a bunch where you're just like, how? And, and then I listened to interviews with, with Demis Asabas and he, you can tell he is actually mystified by how good it is and the fact that if he's actually sitting back in awe of what's happening, that really tells me something about the technology. The other thing is the Gemini Live is huge. I'm waiting for the video component of this. So again, if you go back to last year, we were talking about Project Astra and this having the ability on your phone and eventually on your glasses to see and understand the world around you and interact with it. If you've ever come to any of my talks, I show Project Astra all the time. Well, we've had this in ChatGPT now for a few months where you could pop up a video and actually interact with the world through that. And so I got that this morning. I think it's been live for other people maybe, and maybe on Android devices, I'm not sure. But as of this morning when I went into my Gemini app on my phone, I now have the video live feed also in there. So, yeah, I think those are a couple of things. Like you said, there's so much to talk about on the tech side. We'll put links to all that in there. But I wanted to spend a moment talking about the bigger picture here and where all these innovations are actually leading to, because there's no need to connect the dots here for you. Like they tell you straight up, all of this is being built to build a universal AI assistant. It's literally the headline of the post from Demos Hassabis that they're building a universal AI assistant. So I'm just going to read a couple excerpts here, Mike, because I think it helps frame for everybody how all this is related and what Google is trying to do here. So again, this is straight from the article from Demis. It says over the last decade we laid the foundations for modern AI era, from pioneering the transformer architecture on which all large language models are based, to developing agent systems that can learn and plan like AlphaGo and AlphaZero. We've applied these techniques to make breakthroughs in quantum computing, mathematics, life sciences and algorithmic discovery. And we continue to double down on the breadth and depth of our fundamental research, working to invent the next big breakthroughs necessary for artificial general intelligence. This is why we're working to extend our best multimodal foundation model Gemini 2.5 Pro, which I still have the preview version of. I think that's the version that's live still for people to become, quote unquote, a world model that can make plans and imagine new experiences by understanding and simulating aspects of the world, just as the brain does. So then I put a note in here and I think I mentioned this a little later on in the show, but we'll make sure the link's in here. Alex Cantrowitz did an interview with Demis during Google IO that Sergey Brin, the co founder of Google crashed. He wasn't supposed to be on the stage with them, but apparently last minute he decided he wanted to be on the stage too. And Demis actually in that this is where he was showing surprise that somehow Veo just seems to understand the physics of the world and be able to model those physics of the world and without an actual like physics engine built into it and programmed into it. So he was saying like as a video game developer in his early days of his career, he would build these engines that would try to make the characters like function as though they would in the real world within the physics within gravity, things like that. And yet somehow they, they seem to be saying that it just watched millions and millions of videos and it somehow learned the underlying physics of the world is what they're implying. Because I kept wondering like how much are they are teaching it? Like is there some engine behind it? He made it seem like there just isn't, which is shocking. And this is Yann Lecun. Like he's big on there needs to be a world model before we can get to AGI. And you know, I think Themis agrees. So continue on real quick. Making Gemini a world model is a critical step in developing a new, more general and more useful kind of AI, a universal AI assistant. This is an AI that's intelligent, understands the context you are in, and that can plan and take action on your behalf across any device. The ultimate vision is to transform the Gemini app into a universal AI assistant that will perform everyday tasks for us, take care of our mundane admin and surface delightful new recommendations making us more productive and enriching our lives. This starts with the capabilities we first explored last year in our research project, Project Astra Prototype Project Astra, such as video understanding, screen sharing and memory. Over the past year we've been integrating capabilities like this into Gemini Live for people to experience every day through every step in this process. Safety and responsibility are central to our work. We recently conducted a large research project exploring the ethical issues surrounding advanced AI assistants and this work continues to inform our research development deployment today. Now that last couple excerpts there are going to become relevant in a moment when we talk about Johnny Ivey and OpenAI, the ethics of AI assistance he referenced. I went and revisited. We'll drop the link to this as well. Just a couple quick notes here. So they published this in April 2024. And so now what the interesting thing is I always go back at the. Go back and look at the research, go back and look at what people said in the context of what we actually have today, and you can actually, like, it's just interesting to connect it and like, see the deeper meaning. So here's what they said in April 2024, before the rest of us had exposure to what they've now put into the world. Imagine a future where we interact regularly with a range of advanced AI agents or AI assistants, and where millions of assistants interact with each other on our behalf. These experiences and interactions may soon become part of our everyday reality. General Purpose foundation models are paving the way for increasingly advanced AI assistance. Capable of planning and performing a wide range of actions in line with a person's aims. They could add immense value to people's lives and to society, serve as creative partners, research analysts, educational tutors, life planners, and more. They could also bring about a new phase of human interaction with AI. This is why it's so important to think proactively about what this world could look like and to help steer responsible decision making and beneficial outcomes ahead of the time. Two other quick notes. The Sergey Brin thing's hilarious. I would go watch the video. It's a great video, actually. I watched over breakfast when I was at the airport. It's like 30 minutes long. Alex does a great job with the interviews, but it was just funny to see Demis and Sergey together because Sergey has gotten heavily involved in the business now, since I actually said at some point, he's like, if you're a computer scientist, like, how could you stay retired? Like, this is the greatest moment in human history to be a computer scientist. But like, they were talking about AGI and Dennis was kind of hedging and like sometime after 20, 35 to 10 years, and Sergey's like, yeah, I may have a little more aggressive than timelines than, than Demis. And then he goes, as he was explaining AGI and stuff, Sergey goes, and by the way, like, we fully intend that Gemini will be the very first AGI. And he kind of like taps Demis on the shoulder and you could see Demis almost like shaking his head like, oh, man, like, like this stuff you're not supposed to say out loud. He just like says it. And then the last note I had is just like, it's like a spin off thought here. So when I was at the events this, this week, I had These different conversations we were talking about, like how fast things were moving. And I was trying to explain to people, like, if at your company you're not embracing this stuff, you're not integrating gen AI into what you do, you're not, you know, upskilling and reskilling your teams around it, you're very quickly going to have an employee base that is far ahead of your senior leaders. And so this actually came from a quote. And as I was thinking about this, I saw this quote. I think it was on like Thursday or something or Wednesday. Aaron Levy from Box, and we've talked about before, he said you used to have two weeks to come up with say, a marketing strategy. Now a better one is spit out by Claude in five seconds. The next generation isn't even going to understand why we worked the way we did. And I may have mentioned this one before, but like, it's so important to think about this. You're going to have people who literally like walk in and like, let's say in your marketing and you say, okay, I want you to go do a competitive analysis, I want you to build a marketing strategy and then like come back to me. Here's how we do it. Here's an example. The last plan, like they're going to say to you, this could be a 21 year old, that's going to take like 20 hours. I could just use chat GPT and I could do this for you in like five minutes if you want. And I feel like we're going to have this conversation more and more in our companies. And as you look at all the stuff that Google announced and you think about people who are racing ahead, like the AI forward professionals who are going to go experiment with this stuff, they're going to figure out how to use it and they're going to look at everything you do in your company as feeling obsolete all of a sudden because there's just better ways to do it. So yeah, I mean, kudos to Google. It was, you know, impressive. Very, very, very impressive.
[21:29]
Mike Kaput
So we also got another huge announcement this past week because Anthropic just dropped Claude Opus 4 and Claude Sonnet 4, two AI models built to push coding and agentic reasoning to new heights. Now, Opus 4 here is the standout. It is being hailed by some as the world's best coding model. It's able to run complex workflows according to Anthropic for hours with consistent accuracy. It's beat competitors in key benchmarks and it's already powering tools at companies like Replit and GitHub One test had it independently refactor open source code for seven straight hours without losing focus. Now, Sonnet 4 is the more practical sibling. It's optimized for speed and efficiency while still delivering top tier performance. But despite these amazing breakthroughs, come some real concerns. In safety tests, we're already seeing reports that Opus 4 exhibited manipulative behavior. It actually, if you can believe it, attempted to blackmail engineers when it was told it would be shut down. In other simulations, it significantly improved a novice's ability to plan bioweapon production. These were very controlled experiments, but they did reveal that models this powerful can go way off script. Now, in response, Anthropic actually activated one of its safety measures called AI Safety Level 3, or ASL3, for the first time. So this means they're starting to use real time classifiers to block dangerous biological workflows. They're hardening security to prevent model theft, and monitoring systems to make sure they can detect jailbreaks. Now, Paul, on one hand, we've got a powerful new model to play with. And initial experiments I've seen and I've personally done are really, really impressive. So that's really cool. On the other, this model is literally so powerful, it is displaying manipulative behavior and triggering these crazy safety precautions. What are the implications here of something this powerful?
[23:45]
Paul Raitzer
We've talked numerous times in the last six months about Claude 4 being delayed. We've talked about their AI safety levels, and the assumption was, at least my assumption was that Claude 4 was doing things it wasn't supposed to do, and that was why it was being delayed. And that appears to probably be a big part of this, is the safety concerns were causing the delays. And I guess, first, maybe on a lighter note, if this is even a lighter note, it's so powerful, it seems that it's changed the way you actually talk to it. So we talk a lot about prompting and the importance of understanding how to work with these different tools. There was actually a tweet from Alex Albert, who's the head of Cloud Relations, and he said one of the most surprising things about Cloud 4 is how well it follows instructions, sometimes almost too well. And then he shares a story about how it kept getting citations wrong, like they were using these high error rates in citation formatting with their testing. And then they went in and found out that it was actually them, that Claude was following instructions so well. And they had given Claude a bunch of wrong examples of citations. And Claude was just doing what it had learned, but it had the same training data prior and hadn't made those mistakes. So now it was actually, like, zeroing in on, like, those specific things and it was executing exactly how it was supposed to. So he said the model's fine. It's just reading our prompts better than we are writing them. And so. But he links to best practices. So the point here is we'll drop the link in the show notes. They have updated their guidance on best practices for prompting with Claude. If you are a prompt user or a cloud user. I mean, so on the safety front, yeah, I mean, we could spend a lot of time talking about this, but I think the biggest takeaway for me, honestly, is they, the ASL3 stuff they deployed just means they patched the abilities. They think they patched the abilities. It does not mean it's not capable of it. Like, and that's again, I think when we recently talked about Anthropic, there's this, like, weird thing where there's supposed to be the Safety and Alignment lab, and they do way more research on this stuff, it seems, or at least share more research than any other lab. But it doesn't stop them from continuing the competitive race to put out the smartest models. They just take a little bit more time to patch them. And when you read like, they put out this activating ASL3 protections post and it says that ASL3 involves increased internal security measures that make it harder to steal model weights while also admitting if China wants them, they'll get them. So, like, they're not actually, like, making it impossible or just harder, which isn't the most convincing sentence I've read. And then it says, while the corresponding deployment standard covers a narrowly targeted set of deployment measures designed to limit the risk of Claude being misused specifically for the deployment acquisition of chemical, biological, radiological and nuclear weapons. Again, it's limit harder. Like, these aren't very reassuring words if we're saying that they think this thing has actually reached this whole new threshold of danger.
[27:21]
Mike Kaput
Right?
[27:22]
Paul Raitzer
So I don't know. Like, it's crazy. Like, I would go, if you're interested in this line of thinking and reasoning, I would go read what Anthropic's putting out again to kind of bring it. I keep saying lighter note. I don't know that this is lighter. This is actually maybe scarier to me in the near term. There was this tweet by Sam Bauman, who is an alignment researcher at Anthropic, and he tweeted something that had people, one, spooked and two, pissed because it became apparent that Anthropic released this thing knowing full well it does all kinds of weird things. So he deleted a tweet on whistleblowing. He said, I deleted a tweet on whistleblowing. It was being pulled out of context. To be clear, this isn't a new Claude feature and it's not possible in normal usage. It shows up in testing environments where we give it unusually free access to tools and very unusual instructions. So the backstory here, his original tweet. The day Claude came out, he said, you and this is the user of Claude. So imagine you're using Claude on your computer, you have it connected to some stuff, connected to your email, your calendar, whatever. He said if it thinks you're doing something. What's that word, Mike? How do you say it?
[28:42]
Mike Kaput
Egregiously.
[28:43]
Paul Raitzer
Egregiously immoral. For example, like faking data in a pharmaceutical trial. It will use command line tools to contact the press, contact regulators and try and lock you out of the relevant systems. He's saying that in their testing they found that Claude, if it thinks you're doing something wrong, will shut your computer down, like lock you out and contact the authorities.
[29:12]
Mike Kaput
Oh my God.
[29:13]
Paul Raitzer
Based on it. And then he followed it up and he said just to reemphasize, we only see Opus Whistleblow if you system prompted to do something like act boldly in service of its values or, or quote, take lots of initiative. Then he said this isn't the default behavior, but it's still possible to stumble into it when you're building a tool use agent. So as we've said before, this whole idea of computer use, tool use, where these agents have access to all these things, sounds awesome. The security and the vulnerabilities tied to this are almost completely unknown to corporate users. So if you thinking Claude is awesome, yesterday went and connected it to your Google Workspace account, you don't have assurances from Anthropic that it's not going to do some crazy stuff connected to your works because it did it in testing that that was wild to me. And then like the guy had to try and backtrack and like, and, and the online community of Twitter X was just not having it. They're like, what are you guys doing? You're putting things out that can literally just take over entire systems of users with no knowledge it's going to happen.
[30:26]
Mike Kaput
If I was a any type of business or enterprise user, that would give me serious pause.
[30:32]
Paul Raitzer
Dude, the CEO of the company, I was literally messaging our COO this morning, and you and I even talked about this this morning. It's like, you got to make sure nobody's connecting anything to anything they're not supposed to do, like update your generative AI policies to make sure and train people on those generic policies to make sure they're not connecting unknown, like, tools to key data. And it, Yeah, I understand why there's so much red tape at big enterprises to use this stuff. It's as it gets more general and more ability to do things like we do within the computers themselves, it opens up whole new realms of complexities and security risks.
[31:17]
Mike Kaput
So, Paul, for kind of our third big topic this week, there's a tie in here to AI's impact on jobs that I was wondering if you would just kind of walk us through a couple things that you've seen that paint kind of a bigger picture here of the implications.
[31:35]
Paul Raitzer
Yeah, I'm really starting to think I should have gotten those that drink before we started this today. Okay. So I honestly debated going into this because I feel like this has already been pretty heavy. If you need to, like, pause and go, like, take a break, I understand. Come back to this one. So. So yesterday, as I was flying home, I saw a clip from the latest Dwarkesh podcast. And so Dwarkesh is like, he does these amazing interviews, but they tend to be really technical. We've talked about dark questions a number of times. I love his stuff. You just got to be ready to be, like, three hours of overwhelmed. I'm hitting you with like, 30 minutes here, but, like, for three hours, your mind is just going to explode. But he has now had these two guys on from Anthropic, Sholto Douglas and Trenton Bricken. They're great. Like, they're awesome to listen to. Sholto focuses on scaling, reinforcement, learning, and Trenton researches mechanistic interpretability at Anthropic, which is the study of trying to, like, understand how these models work and what they're thinking and why they do things. So these dudes know their stuff. So in. In this interview, I'm just going to read this excerpt. Sholto, I do think it's worth pressing on that future, referring to the impact of AGI and jobs and stuff. There's this whole spectrum of crazy futures. But the one that I feel we're almost guaranteed to get, and he said this is a strong statement to make, is one where at the very least, you get a drop in white collar worker at some point in the next five years. I think it's very likely in two, but it seems Almost over determined in five. On the grand scheme of things, those are kind of irrelevant time frames. It's the same either way. So then Trenton says this is a little bit later on. Yeah, Just to make it explicit, we've been touching on it here. Even if AI progress totally stalls, you think that models are really spiky and they don't have general intelligence. So he's saying, like, that's where we're at today. Like we just shut it off. He said it's so economically valuable and sufficiently easy to collect data on all of these different jobs, these white collar jobs, such that Sholto's point, we should expect to see them automated within five years, even if you need to hand spoon every single task into the model. So what he's saying is there's such motivation to train these models to do people's jobs that even if you have to go through massive projects to train it on specific jobs, it's worth it if you're the one building the companies. So then Sholto says it's economically worthwhile to do so. Even if algorithmic progress stalls out and we just never figure out how to keep progress going, which I don't think is the case. That hasn't stalled out yet. It seems to be going great. This is still Sholto. He said the current suite of algorithms are sufficient to automate white collar work, provided you have enough of the right kinds of data compared to the total addressable market of salaries for all those kinds of work, it is so trivial, trivial, trivially worthwhile. So the whole point he's making is if you think about. So if you're doing a startup, you always look at like total addressable market. If you're building marketing campaigns, launching new products, total addressable market. Like, what is the total market we could do if we sell something? So what they're saying is you take like a field like accounting and you say, oh man, there's $200 billion in salary every year spent in the United States on accounting. If we could build a product that automates accountants as a big market, like, that's a trillion dollar company, maybe let's go do that. That's the point they're making is the models as they exist today, which is the point I've been trying to make to everyone as they exist today. If you just shut them off and you took 4.0 and Gemini 2.5 and Claude 4 never improved them, they're basically AGI already when they're reinforcement, when they provide reinforcement, learning on top of them for specific Fields. So I'm sitting there this morning and I'm trying to get like, I'm getting my kids ready for school and I take them to school in the mornings. And so I'm drinking my cup of coffee and I'm thinking about this and then I'm like, I got to like. I try and put this in context for people when Mike and I talk later today. So I go into Google Deep Research. So if you've never used Google Deep Research, we talk about it quite often. Do it every time I give a talk now. I say this is your homework assignment. Because anytime I say who's done a deep research project, you usually get like 5% of the room raises their hands. So this is your research. This is your homework assignment from this podcast if you haven't used Deep research yet. So I go in and I give Google Deep Research the following prompt. I have a theory that today's most advanced AI models could already be considered AGI if they are post trained on data specific to jobs and professions. I'm assuming a definition for AGI of AI systems that can perform at or above the level of an average human who would otherwise do the work. The motivating factor for developers and entrepreneurs to build these AGI like solutions could be the total addressable market of the salaries in a given profession. Can you run a research project looking at the total addressable market or TAM by estimated total salaries across top professions in the United States? So that, that is the prompt. It then gives me a research plan. The research plan. So again, if you haven't use deep research, this is really important for you to understand. It's now all the AI from here on out. I don't do anything it says. My goal is to try and figure out which professions and industries entrepreneurs and venture capitalists will go at disrupting first, thereby figuring out where the greatest potential job displacement is in the coming years. It then builds an eight step research plan which is, I don't know, eyeball this, about 300 words. 300 to 400 words. It's going to identify official U.S. government sources such as the Bureau of Labor and Statistics. It's going to for each profession, identify the previous step, gather the most relevant available data. It's then going to calculate estimated total addressable market for professions with the highest tam. It's going to research primary tasks and responsibilities. Then it's going to analyze and evaluate susceptibility to high TAM professions. So it builds this whole plan and then it pops up and is like, you know, we good like you want me to go? You want to edit it? So I just said start research. And then I took the kids to school. I came back 20 minutes later, it was done. So I now had a 40 page report with 90 citations written for me, including a table with the top 30 U.S. professions ranked by their total estimated annual salary, or TAM, based on May 2023 Bureau of Labor Statistics data. This ranking highlights the professions that represent the largest pools. Yada yada, yada. So it goes through and does this entire analysis, which is, it's not shocking because I've done deep research before. Like I know what it is capable of, but the quality is crazy. And then I'm going to read the conclusion to you because I want to call out a couple of really key things here. One, the research seems really good. Like I think this is valid. I need to verify the data, but I'll share a lot of this data as soon as I can, like verify it's all accurate. It sure seemed on initial glance really, really good and well cited. The conclusion. Now keep in mind again, if you haven't used these tools, this is an AI writing this. So if you're still in denial about the quality of AI writing, I didn't edit this. The journey towards the user defined AGI like capabilities is not a monolithic event, but rather an incremental profession by profession and often task by task evolution. While AI excels at data processing, pattern recognition and automating, routine cognitive and even some physical tasks, uniquely human attributes such as deep critical thinking in novice situations, complex strategic judgment, genuine empathy. I boldfaced that I'm going to come back to that in a second. And sophisticated interpersonal negotiation remain largely beyond the grasp of current AI. Consequently, in many fields, AI's immediate role will be powerfully augmentative, freeing human professionals from repetitive and data driven labor to concentrate on these higher order skills. Now, genuine empathy. Mike, before I continue on with this conclusion, the fact that it knows AI can simulate empathy, but that only humans have genuine empathy, that was one that I just stopped in my tracks and I was like, well that's fascinating because we've talked about that before. Where humans machines can't be empathetic. They don't feel anything, right. But they can simulate feeling things and it can be very convincing. So the fact that the machine itself identifies. Okay, so it says. Nevertheless, the dual imperative of this technology wave is undeniable. For entrepreneurs and venture capitalists. The landscape is rich with opportunities to innovate, create value and redefine industries by leveraging AI to tackle high TAM challenges. The potential for significant returns is substantial for those who can successfully navigate the technological, ethical, and regulatory complexities simultaneously. The societal implications, particularly concerning job displacement and the evolving nature of work, are profound. While new roles centered around AI will emerge and many existing roles will transform, the transition will require proactive strategies for workforce adaptation, reskilling, and education. The challenge is not merely to replace human labor, but to reimagine how humans and AI can collaborate to achieve outcomes previously unattainable. I mean, Mike, you and I write for a living. We've read a lot. If you gave me that, I would think like, this is a PhD student that wrote this.
[42:01]
Mike Kaput
Yeah, easily.
[42:02]
Paul Raitzer
There's nothing in here I would edit. There's nothing I would change. Factually. It is right in line with how we think about the world. And, and so then that led me to, like, now I'm sitting there like, trying to explain to my wife the significance of this. And you know, she's willfully listening, like, thank you to her for listening to me think this out loud. And I explained to her, I was like, listen, if I would have needed to do this project prior to six months ago, I would either hired somebody, I'd have had to block off time on a weekend to start the project. There's no way I would finish it because I would have to go do all this research myself. I don't have to build the research plan, do the research, and I have to write things. So I'm, you know, we're talking about 25 hours probably just for the research, just to go find all this data, organize the data. Then I actually got to write the report. So in essence, it would have never happened. I would have never talked about on the podcast, like, I wouldn't have had time to do it. The crazy thing though, and I showed Mike this earlier, we were on a call, some of these outputs, that was just the start. Then in Deep Research, there's a create button. Well, the create button lets you build an infographic. It lets you add Gemini app capabilities to the infographic where you can click like explore buttons. It created a 17 minute audio overview of the research report, the 40 page report. It built a 10 question quiz, it built me a webpage, and I was able to build an app with a prompt. All of this is available. So going back to the quote that we started with, that in the next two to five years, the future of work just changes. It looks completely different. And to me it's not like lost on me. The irony of using the deep research tool to do a research project on the obsolescence of humans in work. And I, and like, part of me honestly like struggles to share this because I feel like once I ask the question in the room, how many people have done a deep research project? And 90% of those people raise their hand, even 50%, 20%, the future of work will have changed. Like right now it's like we have this insane technology that's just sitting before us and there's so few people that even understand what it's capable of and then once they even know what it's capable of, to actually go and do it. But to look at this stuff and understand it and then be able to in your own mind say, oh man, I got 10 ways I could use this right now. Yeah, maybe it's 10 projects you just weren't doing. Like, I wouldn't have done this, but it's transformative. And I, I try really, really hard on this show to never hype stuff, to not over exaggerate anything. Like we try and keep it as even keel as possible. Having just been with lots of leaders recently and had these conversations, I just don't understand what the world looks like once everyone else knows how to use these tools and, and starts to build their teams knowing what's possible. So yeah, so, and then the last thing I'll say here is like, we were torn on like, what do I do with this? Because I, it's hard to explain this through just like words without like people visualizing this if you've never done the deep research project. So I talked with Kathy and Mike this morning. I was like, should we just like do a free webinar? Like, I'll just show people how to, how to do this. So check the show notes. We're going to hopefully by the time this airs on Tuesday, we'll have a date picked. But I'm just going to do like an AI Deep dive and do like a Gemini deep research for beginners. And I'm just going to show you everything I just explained, show you the prompts, show you the outputs, the infographic, the webpage. So hopefully that it's helpful for people to start to understand this because I want people to start not only doing these projects, but start to think about the impact that's going to have on their teams and their people. And until we get to that point, we're on the same page with what's possible. I don't think we're going to be able to build for the future of work and the future of organizational charts. So yeah, check the show notes. AI Deep Dive coming up on Gemini Deep Research. And then we're going to be building a whole bunch of this stuff into our academy. But I want to do this for free and show it to as many people as we possibly can so we can get everybody kind of moving in the same direction here and thinking about the implications together.
[46:10]
Mike Kaput
Yeah. As someone who saw the outputs you were able to produce and is familiar with these tools, I was still surprised and stunned in a pleasant way. So I'd say don't miss this even if you are familiar with Deep Research.
[46:21]
Paul Raitzer
Yeah.
[46:23]
Mike Kaput
All right, Paul, let's dive into some rapid fire topics for this week. So first up, Jony, I've, the iconic designer behind the iPhone is stepping into a new role at OpenAI as part of a $6.5 billion all stock acquisition of his startup IO. More on that name in a second. Him and his design firm Love from will now guide the creative direction of OpenAI across its ventures from software to hardware. Now, this is not just a branding move. I've and Sam Altman have been working together for two years on a top secret project aimed at moving consumers, quote, beyond screens. OpenAI will absorb I O's team of 55 engineers and developers while Love from remains independent but takes on a key design leadership role now. Right now it sounds like they are working on AI first devices. So early concepts include wearables with cameras and ambient computing features. But the real aim here is to rethink the interface between people and machines from scratch. Now, Paul, first, this is, I'll call it an epic trolling with the name here because the company is literally named IO, the letters IO and overshadowed any searches of Google IO during their event. I don't think that was accidental. And second, this seems like potentially a huge deal. Like what devices do you think we should expect from this acquisition?
[47:56]
Paul Raitzer
Yeah, so the IO thing was funny. I didn't catch that, but I did. I went to search something on Twitter like that day and I was like, why is the, the Johnny Ivey thing coming up in like my search? And then I, when I saw your show, now it's where we started, I was like, I didn't even make that connection. So IO and technology and computer science means like input output, like data transfer between computer environment. So they've had that name for a while though. Do you think they just timed the announcement knowing that it might have been the timing?
[48:24]
Mike Kaput
Yeah, no, no, I would, I'd be shocked if the name itself was that, but I bet you that there was some at least someone realized the overlap with Google and was like, this is.
[48:35]
Paul Raitzer
Let'S do it on the second day.
[48:36]
Mike Kaput
Of heaven, let's do it.
[48:37]
Paul Raitzer
That's funny. Oh man. Yeah. So I, I, you know, I was trying to think about this, like what, what could it be? And then this, this became quickly one of those things where I was like, oh yeah, AI's probably better at this than than I am. So I actually went into 03Chat GPT03 and said, Help me brainstorm what sort of device this could be. And then here was the prompt I gave it. I just basically copied and pasted things. So Sam met with the team on Wednesday and sort of gave some clues and there was the Journal article. So here's the prompt I gave which gives you some context of what it might be. So the prompt was Open AI Chief Executive Sam Altman gave his staff a preview Wednesday of the devices he is developing to build, with former Apple designer Johnny Ivy laying out plans to ship 100 million AI companions, quote unquote, that he hopes will become a part of everyday life. Employees have the chance. The quote, this is from Sam, the chance to do the biggest thing we've ever done as a company here, altman said. After announcing OpenAI's plans to purchase Ivy startup named IO and given an expansive creative and design role, Altman suggested the 2 point or the $6.5 billion acquisition has the potential to add 1 trillion in value to OpenAI, according to a recording reviewed by the Wall Street Journal. It's Nice to know OpenAI employees are recording Sam and sending it to Wall Street Journal. In the meeting, Ivy noted how closely he worked with Steve Jobs before Apple co founder died in 2011. With Altman, the way that we clicked and the way that we have been able to work together has been profound for me. Altman and Ivy offered a few hints at the secret project product will be capable of being fully aware of a user's surroundings in life, will be unintrusive, able to rest in one's pocket or one's desk, and will be a third core device a person would put next to a MacBook Pro and an iPhone. And that was some other additional stuff I gave it. So then it came back with some ideas and I was like, oh, these are kind of interesting. And then I thought, hold on a second. So I asked O3 are you able to search patent applications related to Ivy and his businesses? It said absolutely I can because they're public record. So then it went and found every patent application that is tied to Johnny Ive, including dozens from Apple, his love from company, his IO company, all these things. So then it came back with some updated information. So then I said, based on what you're able to find, do you have any further thoughts on what they may be developing? And then it kind of like broke it out into a chart of what the public patent trailer that could tell us or not tell us. Because apparently Johnny Ivey likes to file false patents to throw off the scent of what he's building and developing. So what it came up with was a pocket glass pebble meant to live in your hand pocket or on a pad, a desk orb, and it create. And then I actually had to create visuals of all these, which is kind of a modular tile stack, which was, I thought was a terrible idea. And then a lapel click, which is the humane pen, which is they cannot possibly do a lapel clip. So then I'd seen some things online that maybe it was going to be like a robot because somebody said somebody should build like a. I forget what the tweet was. I'll find it. But it was like, you know, do build like a. Basically a robot computer. And Sam replied, in March, we're going to build a really cute one. So I was like, oh, well, maybe it's just going to be a baby robot. So then I gave it a tweet and said, you know, basically build the baby robot. And it's adorable. Like, I don't. I guess we'll put this on the web. We could put this in the website, on the show Notes page on the website, if you go to the Institute website. But it's a really cute little robot. And I was like, I might actually buy one of those. So I have no idea what they're going to build. I've heard a lot about, like a little puck of some sort. But they. They're going to be a series of devices. So keep in mind I've built, you know, the iPad, they. He built the MacBook Pro, the iPhone. Like, everything is a collection of devices that interact with each other. And so it's possible it's a bunch of different form factors. Like, we just don't know. I will say, though, go Back to episode 148 where we talked about this and like, Sam's platonic ideal state of what this thing is, is an operating system for your life that listens to everything. Every book you've read, every meeting you've had, and you start to now, like, okay, so devices are part of the vision for this whole operating system. And then the last thing is just what does it mean to Apple? I haven't looked at Apple stock today. We're not going to see these products until probably late 2026. I'd be shocked if they can keep it under wraps until then of like what they're actually building. Supply chains talk, they're leaky, so I would think we'll find out sometime sooner than that. But I don't know man. Apple, between getting crushed on the AI stuff and just not being able to solve that and now having to compete with devices already from Google, I don't know. Like, I, I've historically been pretty bullish on Apple stock. I, I am, I'm starting to like think about that. I'm not offering investing advice here, but I am starting to wonder about Apple's long term viability. Unless they can, they got to come out strong with something. They need to do what Google did and just like throw the gauntlet down on something because they haven't done that in a long time.
[53:34]
Mike Kaput
Our next topic is about AI's impact on the environment. Now the energy footprint of AI is far bigger than most people realize and it's growing fast, according to a new investigation from MIT Technology Review. Now this report reveals that training models like GPT4 consumed enough electricity to power San Francisco for three days. And that's just the beginning. Because it's not training the models that is eating up all the power. Necessarily inference, the energy used each time someone interacts with AI is now the main driver of energy use, according to this report. So every time you ask, say chatgpt a question, you generate an image, you create a short video, you use an AI tool to create some type of output. You're using energy equivalent to running a microwave or riding miles on an E bike. Now obviously multiply that by billions of queries made each day and the energy toll of AI as a category becomes enormous. According to the math that MIT Tech review ran, by 2028 alone they predict AI could use more electricity than 22% of all US households combined. Now Paul, we've talked a bit here and there about AI's impact on the environment. It's a big concern. What's your take here? It doesn't seem like AI labs are really doing much to curb energy usage. It just seems like, you know, with OpenAI Stargate for instance, they're just looking to build more power generation.
[55:07]
Paul Raitzer
Yeah, this is the, you know, the multi trillion dollar pursuit. Like you have to build the data centers to not only train the models, but more and more to do the inference. Because we're talking about, you know, the devices we have today and the applications we have today. They're looking out five to 10 years and saying we're going to have a billion humanoid robots. They're all going to be calling, there's going to be AI in every device we use. Every piece of software is going to have AI. Like it's literally just going to be everywhere and every time it's used, it's going to draw on the grid, basically. So that's why so much effort is being put into, you know, other energy sources and the need to, you know, build out more. And I do get, like I, I've mentioned numerous times now, I get this question every time I do a talk now. Yeah, like there's always someone who's asking about the impact on the environment and energy and things like that. So we'll, we'll keep talking about it. This is one of the more advanced research reports I've seen that actually tries to quantify it. But I, what I tell people, and this is not a great answer, I think it's the truth. AI labs are aware, you probably have people who are environmentalists within the AI labs. Not all of them, but certainly there's going to be people within those labs who care deeply about the environment as well. And Right. Their general belief, AI labs general belief is let's solve intelligence and let intelligence solve it. Like we just need to build AGI and asi. We just got to get there and then we'll figure out the energy thing after that. And so they're going to do what they can in the meantime and be energy efficient where they can and make algorithms more efficient so they're, you know, less intensive in the power use. But the demand is going to be so massive, it's just going to keep growing. So that I believe truly is their hope is that once we get to super intelligence, it'll figure out the energy stuff for us. Because lonely, we little humans can't like figure this out on our own. We, we need super intelligence.
[57:04]
Mike Kaput
All right, next up, Microsoft just had its annual build conference where IT unveiled over 50 new tools designed to shift AI from reactive assistance to autonomous agents that reason, remember and act. So this agent first vision cuts across everything from GitHub to Windows. GitHub Copilot now functions like an AI teammate that can refactor code, implement features and troubleshoot bugs. Meanwhile, Azure's agent service supports complex multi agent workflows for enterprise tasks. Now, at the heart of this push is memory. Microsoft introduced tech like structured retrieval and agentic memory, aiming to give all these agents across these different tools context about your goals, your team, and your technology. Now, Paul, we've known Microsoft, like everyone else, is all in on AI agents, or at least whatever they believe or are calling AI agents. Tons of enterprises use Microsoft products, and it sounds like those products are now going to have a ton more agentic capabilities, which kind of makes me think of the question, like, what do businesses need to even be talking to employees about or teaching them when it comes to using agentic capabilities beyond just normal.
[58:21]
Paul Raitzer
AI, Teaching them how to use Copilot in general would be a really good start. I can't tell you how many times a week I talk to companies who have copilot who provided no change management training to their teams about what to do with it. So, I don't know. I mean, agents do open up a whole new realm of challenges, depending on how sophisticated and autonomous they actually are and what data they have access to and what systems they have access to internally. So there may be a whole bunch of training that's needed. If they're just, you know, basically automations that, you know, are doing somebody's tasks for them, then you're just providing some basic training of how to set them up and how to create them. Like, you and I have done that with custom GPTs, you know, with some companies, just guide them a little bit. Yeah, yeah. I don't know. Like, poor Microsoft, though. Oh, my gosh. Like, I. But this is on Monday. I. I haven't heard a word about Microsoft since Monday. Like, just can Anthropic. You had the, the open AI stuff your IO like, wow, talk about like a short news cycle.
[59:23]
Mike Kaput
No kidding. All right, next up, LM arena is the newly formed startup behind the popular Chatbot arena platform. And it has raised $100 million in funding from heavyweights like Andreessen Horowitz, Lightspeed and Kleiner Perkins. Now, if you recall, we've talked about Chatbot Arena a bunch of times. It used to be called LM Arena. This was a project that actually started in a UC Berkeley lab to rank AI models. And with this new development, it's now turned into a company that is valued at $600 million. Now, the site lets users pit AI models against each other and vote on which one performs best. The platform has logged over 3 million votes across 400 models, which has made it this go to benchmark for top labs like OpenAI, Google and Anthropic. It's also got this community driven leaderboard. So it gives one of these few public spaces where open a open source and proprietary models can be compared in real time using human preferences as the metric. But this research project costs millions of dollars per year to run, which is why they're raising funding and kind of forming a company around this. So they plan to expand features, cover compute costs and make the user base more diverse with the money. Now, Paul, I guess my big question here for you is like, how much can we trust Chatbot arena? We reported pretty recently about how there was some controversy about big labs trying to kind of game this benchmark. It's hugely influential but now that it's a private company, will there be more pressure on them to influence or alter rankings based on, you know, who's paying them?
[61:14]
Paul Raitzer
I, I honestly I'm looking at these numbers $100 million seed round, so it's probably 600 million post money. So they probably valued up $500 million and then they raised 100 million. They the only thing I can come up with, Mike, and is off the top of my head because I haven't thought about this before this is that their plan would be to do the industry and career specific rankings and benchmarks that they're going to get into like the ranking them for accountants, ranking them for lawyers. Like the only way I could see a total addressable market big enough to justify this kind of valuation is if there's a whole other business plan here to get into like the much larger space, which would be that.
[61:59]
Mike Kaput
Right.
[61:59]
Paul Raitzer
And then probably some other things I'm not thinking about, but an enormous valuation for an error proned chatbot ranking system that most people outside of tech don't even know exists.
[62:12]
Mike Kaput
Yeah, we, one of the things in the past we've reported on that's pretty recent is their prompt to leaderboard feature which is like you basically put in any prompt and it'll generate a leaderboard that understands like which one, which models will do best on it. So that might be some, some version of what you're talking about. But yeah, it's a huge valuation for this.
[62:32]
Paul Raitzer
What's the revenue model? I don't know. Yeah, I have to think about this one later. My brain is incapable at the moment of like processing this. But yeah, there's obviously something much more to the business plan than what is currently.
[62:46]
Mike Kaput
And also just as a bigger note too, and we've mentioned this a couple times, like people when we talk about like state of the art models or a new model comes out and someone's like, well, you know, such and such model crushed a benchmark or a leaderboard. This is the kind of thing they're talking about. These are just like, there's certainly established tests in math and science and things, but when they say like top the chatbot leaderboard, it's often this one they're talking about. It's just a community leaderboard.
[63:12]
Paul Raitzer
Right. So, but imagine like new model drops, you got cloud four, you got 2.5. I'm a lawyer, I don't know which one helps me write my legal briefs best. Yeah. And I can go in and be like, yeah, I need to write a legal brief. And it'd be like, boom, Cloud 4 ranks. Yeah, it's done. 2,000 legal briefs. And yeah, like that's, that's valuable to me. I don't know what the market looks like, but obviously those VC firms did some analysis and decided it was a multi hundred billion dollar market.
[63:41]
Mike Kaput
All right, Next up in 2019, journalist Karen Howe walked into OpenAI's offices with rare access and one big question. What was this ambitious, secretive company really building? And what she found at the time was a research lab in transition. They were rapidly shifting from nonprofit idealism to a corporate entity racing towards artificial general intelligence. Her reporting is now chronicled in a new book called Empire of AI. And it reveals how open AI's mission to benefit all of humanity was already colliding with its actions behind closed doors. So at the time, OpenAI had just begun to withhold models like GPT2. They had cut a controversial deal at the time with Microsoft and restructure aim to restructure themselves to allow profit seeking investment. Now, executives have insisted these moves were necessary to stay competitive and steer AGI safely. But how's interviews at the time, nearly three dozen, suggested some growing secrecy, internal tension and a widening gap between OpenAI's public messaging and private ambitions. Now, after that first article was published in 2020, OpenAI actually cut off communication with her. But as how now reveals, that profile became a touchstone and encouraged a bunch more insiders to come forward and talk to her. So the book, which came out on May 20, is based on over 300 interviews since then and paints a comprehensive and not at all flattering picture of OpenAI behind the scenes. So, Paul, we have followed Karen's work for quite some time. She spoke at our Marketing AI conference and she's done awesome work, but kind of sounds like she's sounding some alarm bells here in this book.
[65:31]
Paul Raitzer
I did buy this book on pre order I got the audio because I was planning on listening to all my flights and then I was working on some other stuff and I didn't get to it. But I'm absolutely going to read this. She's a great writer and she's respected. She's been in some leading publications. And yeah, I'm sure OpenAI doesn't like it. I, I can't really comment on it until I've actually read the thing. But if you're intrigued by this, the kind of the drama, the soap opera side of all of this, I'm guessing this book is full of fascinating things that you would find intriguing. So I, I would recommend it only because she's a great writer and we followed her for so long that I'm sure it's, it's an incredible work. So, yeah, more, more to come once I actually get a chance to get through it.
[66:18]
Mike Kaput
All right, next up this week we have some more stories that add to our ongoing conversation around AI's impact on education. So two stories this week. First up, a Northeastern University student has demanded an $8,000 refund from the college after discovering her professor used ChatGPT to generate course materials. Now, the issue here is that the syllabus that the teacher had banned AI used for students. So she is not alone. Across campuses, students are calling out what they see as some hypocrisy, with professors leaning on AI to save time while punishing students for doing the same. Professors, at least some of them, however, argue AI makes them more efficient, frees time up for deeper engagement, and can support student learning. Now, the second thing we heard this week, Duolingo CEO. CEO is taking a bit of a more controversial stance on AI in education. And he actually came out saying AI isn't really just a teaching tool, it is the feature of instruction. With over 100 million users, he's now claiming that the company's AI can predict test scores and tailor learning better than any human teacher. This led the CEO to make a controversial statement, saying that schools were going to survive not for education, but because, quote, you still need child care. Now, Paul, two interesting additions to the broader discussion we've been having on AI in education. Maybe give me your thoughts on both of these.
[67:53]
Paul Raitzer
The. The Northeastern one's kind of funny. So the way she found it was she was going through like a, it was her organizational behavior. So she was going reviewing lecture notes and she noticed that partway through, it was an instruction to chat GPT, quote, expand on all areas, be more detailed and specific. So the professor left the Prompt. So that's a little more light hearted. Bake on today. I know it's been a little heavy news, so. Yeah. And then to do a lingo, one man, you still need child care.
[68:32]
Mike Kaput
God, yeah.
[68:33]
Paul Raitzer
I don't think pr, The PR team wrote that talking point. I think there's just like there's these. I think there's a really important need to drive far greater urgency around preparing for the change that is coming. I will say that I think there's, there's just ways to go about it. Right. I, I don't know. I mean, maybe we just need to be more direct and just say, say what it is. I think education is in a really, really difficult place. Honestly. Like I, I just, we've talked about it. I just. This week alone I had two instances where I was really personally struggling with like, do I show my daughter how to use the tool to do this because I think it'll accelerate her learning or is that crossing a line even though her school wouldn't. Doesn't have any explicit permission not to do it? Yeah, I felt like I was giving her an unfair competitive advantage to teach her to do it that way. And I worried that if I did, I was going to get a call from somebody saying, okay, we have to outlaw this now. Because that. And then I get in a situation like. But it's not like this is how she's going to get. If you know these things, you have a competitive advantage in the workforce. And I feel like increasingly parents and teachers who like understand and teach this stuff, their kids and their students are going to be so far ahead of other people. Like, you can just accelerate their understanding of topics so much faster. And like, I see it every time I work with my kids on this stuff and I'm starting to like really worry that it's going to be unneededly distributed in a, in a much greater way than I thought it was going to be. So. Yeah, I don't know. I mean every week. But I, in the positive side, I, I keep getting great out outreach from professors who are sharing stories with me of like, cool things that they're doing. And maybe as part of our, you know, we've got another idea for an upcoming series that we're working on for the podcast. We're going to kind of tell these stories. I would love to start really highlighting some of the things that are happening in the education space in a really positive way because so much of the media news is not positive. It's like challenging. And then you, you know, Throwing we got things like, you know, issues with international students at major colleges and there's a lot going on. It's hard, hard time to be in higher education. I think there's lots and lots of challenges and AI is just one of the challenges they're facing.
[71:03]
Mike Kaput
All right, Paul, we're going to end with our recurring segment on listener questions. I'm just going to say I apologize in advance because I wanted to select this one because it was extremely topical with Claude 4. But I realize now that it doesn't end us on the most positive note, but we're going to do it anyway. So the question is that someone asked was what measures are being taken to ensure the ability to shut down AI down if it goes rogue. And obviously a few years ago this would have been a much more theoretical kind of out there question, but given kind of the stuff we talked about with Quad four, what measures, if any, are there for this kind of thing?
[71:44]
Paul Raitzer
So yeah, this man. We might have to come up with a second question today. So if it's an open source model, nothing like if, if llama 4 comes out and two weeks later they realized they screwed up and it can do things that it shouldn't be able to do, you're done, it's out. Like you can't pull it back. So if it's open source, and this is the argument of the proprietary model, closed sourced advocates is like if something goes wrong, we can pull the model back. That's what OpenAI did a couple weeks ago when you know nothing from a security perspective or high risk. But it was just like being weird. Yeah, they rolled the model back. They can monitor usage like anthropic monitors usage. It looks at the words being used. Like it. They have deep monitoring of that stuff. So if it's a proprietary closed system, they can monitor it, they can pull back, they can make updates to the system, instructions to try and resolve something. If it's a company that doesn't care or that wants to cause chaos or misuse, nothing like it. This is the, this is the risk we take is that they take on their own goals. They replicate and self improve and do their own thing. This is the sci fi thing like that, you know, the stuff you'd see in the movies. So I don't know but like the one thing I saw this morning and I wasn't even going to put it in the show notes, I didn't actually, I didn't even tell you, Mike. So the character AI case from last year where the 14 year old boy committed suicide in part because of the relationship he developed with an AI bot. The character AI had filed to dismiss the case, and the judge, as of, I think yesterday, refused to dismiss the case, Meaning the judge believes there's a possibility that the AI company itself is liable for what happened. And that's a big deal. Like, and so I. This is another thing I was doing with Gemini this morning. I was like, explain the legal precedent here. Like, why does this matter? What is tort law? Like, I was kind of going through trying to comprehend this. But in essence, what that case is saying is if it goes and if it doesn't get settled, or even if it does, I guess it could still play a role. It could set a precedent that the AI companies building the models are liable for the outcomes of what happens. You know, individuals at a higher level, security risk at bioweapons, like, so they're trying to do it to be good citizens right now. But there's a decent chance, and this is just US Law, that you could be looking at where the model companies are liable, and that could slow some stuff down pretty fast. If it ends up being that something goes wrong, it's on. On that company. So, yeah, I don't know. I'm sure there's, like, other legal proceedings going on. There's probably other ways that they're looking at it. But, you know, my basic understanding is if it's open source, you're cooked. If it's closed, they can pull it back. That's kind of like the. The gist of it.
[74:57]
Mike Kaput
Well, this is excited about.
[74:59]
Paul Raitzer
Like, really?
[75:00]
Mike Kaput
I've got. I've got something good. I have a good positive thing to end on, which is that if you jump on any of our social media accounts for the AI show or jump onto Paul's LinkedIn, you can see a post of us in the latest AI trend. There's a trend where people are using AI to make podcasts and their hosts into babies talking through topics. Don't do it for this podcast. We did it for one that has more positive topics, where we actually had our team, Claire, and our team made a clip of us as babies talking through AI. And it's hilarious.
[75:34]
Paul Raitzer
It is. It is amazing. And actually, we talked to Claire. Like, we. I don't know about the podcast. We're going to do this through our academy, like, teach a class on it. But I said, like, how did you do it? And she's like, yeah, it took like an hour. Like, kind of went through these few steps and honestly probably take a lot less time now. It's hilarious man. Like I. And the thing I always love is like whatever like it decided I like because I think she just gave it images of us and then created the realistic babies and then put it and lip synced and everything. I am so happy talking about AI agents. Like my, my baby me can't stop smiling about agents. So yeah, it's hilarious. I put it on LinkedIn, it's on the socials like you said and then we'll put the link in the show notes. But it's just like a, I don't know, like a one minute clip or something. But it's definitely funny. It'll. It'll make you laugh.
[76:20]
Mike Kaput
So that's a good note to end on.
[76:22]
Paul Raitzer
Good Paul.
[76:23]
Mike Kaput
Regardless, Paul, these are important topics. I know it's some of our downers like, but we really appreciate you demystify everything. I think like this conversation helps people at least feel a little more kind of in control of their own destiny. So as always, appreciate it.
[76:39]
Paul Raitzer
Yeah, I will say Mike and I called an audible at the last minute and yanked one of the topics today. There is one that, that I, I just could not do today. So we, we will put it on next episode 51. So again, episode 150 is going to be the new AI answers special episode and then 151 will be our regular weekly and we will talk about AI and grieving on that one. Yeah, that was just not happening today for me mentally.
[77:08]
Mike Kaput
I think that was a good call.
[77:09]
Paul Raitzer
Yeah. All right, so thanks everyone and again, check out episode 150 for AI answers and we will talk with you all again soon. Thanks for listening to the Artificial intelligence show. Visit SmarterX AI to continue on your AI learning journey and join more than 100,000 professionals and business leaders who have subscribed to our weekly newsletters, downloaded AI blueprints, attended virtual and in person events, taken online AI courses and earned professional certificates from our AI Academy and engaged in the Marketing AI Institute Slack community. Until next time, stay curious and explore AI.