wavePod

#138: Introducing GPT-4.5, Claude 3.7 Sonnet, Alexa+, Deep Research Now in ChatGPT Plus & How AI Is Disrupting Writing - The Artificial Intelligence Show | Wave AI Podcast Notes

Back to The Artificial Intelligence Show

#138: Introducing GPT-4.5, Claude 3.7 Sonnet, Alexa+, Deep Research Now in ChatGPT Plus & How AI Is Disrupting Writing

The Artificial Intelligence Show

Tue Mar 04 2025

Summary

The Artificial Intelligence Show – Episode #138 Summary

Release Date: March 4, 2025

Hosts Paul Raitzer and Mike Kaput delve into the latest advancements in artificial intelligence, exploring new model releases, AI integrations in consumer products, and the evolving landscape of AI in professional fields. This episode provides a comprehensive overview of significant AI developments, their implications, and the future trajectory of AI technologies.

1. Introduction and Upcoming Events

The episode kicks off with Paul and Mike announcing the AI for Writers Summit Week presented by Goldcast. Scheduled for March 6th, the summit promises six engaging sessions covering topics from AI copyright to mastering AI prompting. Paul emphasizes the importance of Goldcast's AI-powered content lab in streamlining event content creation.

Notable Quote:

Paul Raitzer [00:00]: "These models already are superhuman at persuasion. It's just red teamed out of them...join us as we accelerate AI literacy for all."

2. OpenAI Introduces GPT-4.5

OpenAI has unveiled GPT-4.5, touted as their largest and most advanced chat model to date. This iteration promises a more natural interaction, a broader knowledge base, improved user intent understanding, and enhanced emotional intelligence (EQ). Early testing indicates reduced hallucinations and higher factual accuracy compared to GPT-4.

Key Highlights:

Performance Improvements: Achieved a 62.5% accuracy rate on Simple QA benchmarks, up from GPT-4’s 38.2%.
Reduced Hallucinations: Decreased from 61.8% to 37.1%.
Accessibility: Currently available only to ChatGPT Pro users ($200/month).

Notable Quotes:

Mike Kaput [05:00]: "GPT-4.5 is out in the wild... it is the first model that feels like talking to a thoughtful person."
Paul Raitzer [07:48]: "I think it's more a sign of what's coming versus being some obvious leap forward in capabilities and performance."

Paul and Mike discuss the implications of GPT-4.5, noting that while some users may not immediately perceive drastic changes, the model represents a preparatory step toward the forthcoming GPT-5, which is expected to integrate advanced reasoning capabilities.

3. Anthropic Releases Claude 3.7 Sonnet

Anthropic introduces Claude 3.7 Sonnet, described as the first hybrid reasoning model. This model offers dual modes: a standard mode for quick responses and an extended thinking mode for in-depth, step-by-step reasoning. Early adopters, including major tech companies like Cursor and Vercel, have praised Claude 3.7's precision and capability in complex tasks.

Key Highlights:

Hybrid Approach: Combines quick responses with deep reflection within a single model.
Real-World Applications: Excels in encoding, web development, and complex agent workflows.
Claude Code: A command-line tool enabling developers to delegate substantial engineering tasks to Claude.

Notable Quotes:

Mike Kaput [19:57]: "Claude 3.7 is very much an intermediary step before the four."
Paul Raitzer [22:26]: "They’re presenting this as like we’ve cracked that reasoning should be part of these models...a prelude to these much bigger things."

The hosts express skepticism about Anthropic’s marketing approach but acknowledge the technical strengths of Claude 3.7. They also speculate on Anthropic’s future, considering potential acquisitions or partnerships due to their limited data and distribution channels.

4. Amazon Revamps Alexa with Generative AI

Amazon introduces Alexa+, a significant overhaul powered by generative AI. Alexa+ transforms the voice assistant into a more conversational and context-aware entity, capable of understanding user preferences, managing smart home devices, and performing complex tasks like booking reservations and summarizing security footage.

Key Highlights:

Enhanced Conversational Abilities: More natural and intuitive interactions.
Visual Understanding: Ability to process video feeds and respond to visual queries.
Agentic Capabilities: Alexa+ can autonomously navigate the internet to complete tasks on behalf of users.
Personalization and Memory: Remembers user preferences and personal data to tailor responses and actions.

Notable Quotes:

Mike Kaput [31:06]: "Alexa touches on so many areas of people's consumer and content consumption habits. How big a deal is this if it works as advertised?"
Paul Raitzer [34:11]: "If anyone's listening to the show the last month, you know how we feel about these Deep Research products. They are transformational."

Paul discusses the integration of Anthropic’s Claude into Alexa+, highlighting Amazon’s strategic investment of $8 billion into Anthropic. He underscores concerns regarding data privacy and the extensive personal data Alexa+ would require to function optimally.

5. Deep Research Now Available in ChatGPT Plus

OpenAI expands access to Deep Research, an agentic research assistant capable of conducting autonomous, in-depth research tasks. Available to ChatGPT Plus, Team, Education, and Enterprise users, Deep Research can generate comprehensive research briefs in a fraction of the time it traditionally takes.

Key Highlights:

Efficiency: Delivers high-quality research reports within minutes.
User Feedback: Positive evaluations, with 7 out of 19 experts rating its responses at a professional level.
Limitations: Initial access limited to 10 queries per month for non-pro users.

Notable Quotes:

Mike Kaput [46:02]: "This is exactly the type of thing we've been needing in some of our previous discussions...like, a way to actually evaluate AI models on the many, many valuable tasks."
Paul Raitzer [48:40]: "It is truly like, if you don't know what this technology is capable of, it can change the way you do."

The hosts discuss the transformative potential of Deep Research for knowledge workers, emphasizing its ability to significantly streamline research and strategic planning processes.

6. AI’s Disruption in Writing Professions

David Perel, a former writing coach, announces the shutdown of his writing education business, citing the obsolescence of traditional writing skills in the face of advanced AI language models. He highlights that AI can now produce content superior to human capabilities in areas like nonfiction writing, pushing writers to focus on personal narratives and unique perspectives to maintain relevance.

Key Highlights:

AI Supremacy in Content Creation: AI tools can generate high-quality content rapidly.
Shift in Writing Focus: Emphasis on personal experience and unique insights to differentiate from AI-generated content.
Opportunities for Writers: AI as a tool for instant feedback and idea refinement.

Notable Quotes:

David Perel [Timestamp Not Provided]: "If you do a great job prompting things like OpenAI's Deep Research, you can now produce content superior to what I could create in a full day's work on most topics."
Paul Raitzer [74:10]: "AI is changing what we do as writers, but I don't think enough people are coming together to really explore what that means."

Paul and Mike reflect on the necessity for writers to adapt by leveraging AI tools while focusing on inherently human elements like unscripted conversations and personal storytelling.

7. HubSpot’s AI-Driven Partner Ecosystem

HubSpot projects a $30 billion market opportunity by 2028, with AI expected to contribute one-third of this growth. The company emphasizes the integration of AI with unified customer data, enabling partners to build AI agents and modular solutions within HubSpot’s ecosystem. This strategy centers on transforming unstructured customer data into actionable insights.

Key Highlights:

Market Potential: AI-driven solutions anticipated to generate $10.2 billion.
Agentic Solutions: Building AI agents that address common business needs within HubSpot.
Data Integration: Focus on converting unstructured data from communications into structured, actionable formats.

Notable Quotes:

Paul Raitzer [79:33]: "A lot of agencies are going to go away. A bunch of other agencies are going to figure this stuff out and build amazing businesses."
Mike Kaput [81:14]: "There’s a huge role for humans in this agentic future."

The hosts discuss the dual impact on agencies—those that fail to adapt may become obsolete, while others that embrace AI-driven solutions can thrive by enhancing their service offerings.

8. Robotics Advancements by Figure

Robotics startup Figure announces significant improvements to their AI system for package handling and accelerates the testing of Figure 02 humanoid robots in home settings by two years. The advancements are attributed to their Helix AI system, which integrates perception, language understanding, and learned control.

Key Highlights:

Helix AI Enhancements: Improved vision and motor control for faster and more efficient package handling.
Accelerated Testing: Humanoid robots to begin alpha testing in homes within the year.
Current Focus: Maintaining industrial applications alongside progressing towards consumer robotics.

Notable Quotes:

Mike Kaput [64:00]: "Do you really expect to see humanoid robots in homes beginning this year?"
Paul Raitzer [66:41]: "I do not believe that anyone needs to think they're going to go over a friend's house this holiday season and run into their robot."

Paul remains skeptical about the immediate consumer availability of humanoid robots, citing the need for further advancements before widespread adoption.

9. Listener Questions: Handling AI Hallucinations

In the Listener Questions segment, the hosts address concerns about AI hallucinations—instances where AI generates incorrect or misleading information.

Key Highlights:

Awareness and Oversight: Users must recognize the potential for inaccuracies and implement human oversight, especially in high-stakes scenarios.
Use Case Appropriateness: Suitable for brainstorming and creative tasks but requires caution in factual or research-intensive applications.
Prompt Engineering: Crafting detailed and specific prompts can mitigate some hallucination risks, though not entirely eliminate them.

Notable Quotes:

Paul Raitzer [81:54]: "You have to know that they exist... use them in use cases where it's okay if they make some mistakes."
Mike Kaput [83:16]: "There’s no guaranteed way through prompting to avoid hallucinations, but you can be more specific, more detailed."

The hosts emphasize the importance of integrating AI tools responsibly, ensuring that human verification remains a critical component of AI-assisted tasks.

10. Voice AI Developments

The episode concludes with a rapid-fire segment on the latest voice AI technologies:

Sesame: An AI startup led by Brendan Uribe introduces a highly conversational voice assistant integrated into companion AI glasses, enhancing real-time interaction and contextual understanding.

Quote:
- Mike Kaput [87:50]: "It was wild to see it all kind of all the Voice tech coming out at the same time."
Heygen and 11Labs Partnership: Collaboration to integrate voice generation with avatar creation, allowing tailored voices that match custom avatars based on specific prompts.
Hume AI’s Octave: Launch of the first LLM built specifically for text-to-speech, capable of understanding context and delivering emotionally nuanced speech.
Eleven Labs’ Scribe: Introduction of a highly accurate speech-to-text model supporting 99 languages, outperforming competitors in various benchmarks.

Notable Quotes:

Paul Raitzer [87:40]: "Alexa touches on so many areas...how much knowledge, how much are you giving up?"
Mike Kaput [64:00]: "These developments are a significant stride in making voice assistants more integrated and emotionally intelligent."

The advancements in voice AI signify a push towards more natural, responsive, and context-aware voice interactions, marking a transformative phase in human-AI communication.

11. Conclusion

Paul and Mike wrap up the episode by reiterating the rapid pace of AI advancements and their transformative potential across various sectors. They encourage listeners to stay informed, participate in upcoming events like the AI for Writers Summit, and remain proactive in adapting to AI-driven changes.

Final Thoughts:

Paul Raitzer: Advocates for embracing AI tools while maintaining human oversight and leveraging uniquely human traits to stay competitive.
Mike Kaput: Highlights the necessity of continuous learning and adaptation to harness AI’s full potential effectively.

Join the Conversation For those interested in further exploring AI, visit Marketing AI Institute to access resources, subscribe to the weekly newsletter, attend events, take online courses, and engage with a community of over 60,000 professionals and business leaders.

Stay Curious and Explore AI!

Loading summary...

Transcript

Paul Raitzer (0:00)

These models already are superhuman at persuasion. It's just red teamed out of them. Like persuasion is the ability to convince people to change their beliefs, attitudes, intentions, motivations, behaviors, and it uses advanced reasoning, it uses emotional appeals. And so I think persuasion starts to become like a truly concerning area of development. Welcome to the Artificial Intelligence show, the podcast that helps your business grow smarter by making AI approachable and actionable. My name is Paul Raitzer. I'm the founder and CEO of Marketing AI Institute and I'm your host. Each week I'm joined by my co host and Marketing AI Institute Chief Content Officer Mike Caput as we break down all the AI news that matters and give you insights and perspectives that you can use to advance your company and your career. Join us as we accelerate AI literacy for all. Welcome to episode 138 of the Artificial Intelligence Show. I'm your host, Paul Raitzer. I'm with my co host Mike Kaput. It is AI for Writers Summit Week presented by Goldcast. So Mike and I are doing this Monday, March 3, 11am Eastern Time. We are recording, we will be live for the writers summit on March 6th. So that is coming up. So if you're listening to this on March 4th, 5th or maybe even the morning of March 6th and you want to join us for the virtual AI for Writer Summit, you can do that still. So it is coming up from noon to 5 Eastern time on Thursday, March 6th. This is the third annual summit. Last year was more than 4,500 people. I think there's 90 countries.

Mike Kaput (1:45)

Yeah.

Paul Raitzer (1:46)

What the number was. So you can go to AI Writer Summit.com. you can also find it on the Marketing AI Institute website. The event has, gosh, really well, six sessions. So I'm going to run through real quick. So we got the State of AI for Writers and Creators. That's my opening keynote. We have a panel discussion on AI copyright and ip what Writers and Creators need to Know. That's always a fan favorite and I'm very much looking forward to the updates from Jen Leonard and Rachel Dooley on that one. That is a. I think that's actually a Fireside Chat if I remember correctly. We have Andy Crestadena doing Mastering AI Prompting, Harnessing Creative Potential. We've got Mike Caput doing AI powered research Research Transforming how writers discover and create. We have a relaxation exercise with Tamara Morowski, our director of partnerships, always popular. We have an amazing conversation with Mitch Joel on the future of creativity, AI storytelling and the writer's evolution. And then it wraps up with an Ask us anything on navigating AI for writers and creators with myself, Mike, Rachel and Andy and then I'll have some closing remarks. So it actually is going to wrap up about 4:30 Eastern time, so noon to 4:30 Eastern on March 6th. AI writer summit.com and again, a big thank you to Goldcast, who's our presenting sponsor for the event. We use Goldcast for our virtual summits. We have three annual virtual summits now. I think one of the standout features for us is their AI powered content lab which takes all the event recordings and instantly turns them into ready to use video clips, transcripts and social content, which saves our team a ton of manual work and hours. So if you're running virtual events and want to maximize your content in an effortless way, check out Goldcast. You can go to Goldcast IO to learn more. And if you join us for the AI Writers Summit on Thursday, you'll get to experience Goldcast for yourself. Also, just a quick note again on the State of Marketing AI report. We are currently collecting responses for the 2025 survey and report. You can get go to that at State of Marketing AI.com Last rate over 1800 people Mike. I think we're probably closing on close to a thousand responses I would imagine.

Mike Kaput (5:00)

Yes, it does. So OpenAI has unveiled it. GPT 4.5 is out in the wild. They say it is their quote largest and best model for chat yet. They say of the model quote, early testing shows that interacting with GPT4.5 feels more natural. It's broader knowledge base improved ability to follow user intent and greater EQ make it useful for tasks like improving writing, programming and solving practical problems. We also expected to hallucinate less. That sentiment was echoed by Sam Altman, who also posted Good News it is the first model that feels like talking to a thoughtful person. To me, I've had several moments where I've sat back in my chair and been astonished at getting actually good advice from an AI. The model demonstrates impressive factual accuracy compared to predecessors in internal testing on what OpenAI calls Simple QA, which is a benchmark measuring factual knowledge, 4.5 achieved a 62.5% accuracy rate, which is significantly outperforming GPT4O's 38.2%. Similarly, it reduced hallucination rates from 61.8% to 37.1%. Human testers apparently also according to OpenAI, showed a clear preference for 4.5 over 4.0, particularly for creative tasks and everyday conversations. The model's responses are notably more succinct and conversational. It has a more intuitive understanding of when to provide brief, empathetic answers versus detailed information. Now, Altman and OpenAI also note that there are some obvious flaws and limitations with 4. 5 at the moment. Altman says it is, quote, a giant, expensive model and it is only available at the moment to GPT ChatGPT Pro users, the ones who pay 200amonth for that license, says Altman. Quote, we really wanted to launch it to plus and Pro at the same time, but we've been growing a lot and are out of GPUs. We will add tens of thousands of GPUs next week and roll it out to the plus tier then hundreds of thousands coming soon and I'm pretty sure y'all will use every one we can rack up. He also makes it very clear this is not a reasoning model. It will not, quote crush benchmarks. He says, quote, it's a different kind of intelligence and there's a magic to it I haven't felt before. So Paul, this seems almost like they kind of optimized a frontier model, almost like for Vibes, which is weird to say, but seems like what they were going for here. What are your initial thoughts so far on four Point? Do any of these pros and cons of the model particularly jump out to you?

Paul Raitzer (7:48)

I think it's more a sign of what's coming versus being some obvious leap forward in capabilities and performance. I I've personally been using it. I was using it this morning as I was kind of getting ready for the Podcast and I was experimenting with some prompts. I think you, you need to have like an arsenal of specific applications or prompts that you test these things on. Like Ethan Moloch, you know, does a great job with this. He's got these like same prompts he uses every time and it's like okay, yes, I can see and feel the difference. I don't think the average user will feel the difference or, or you know, if you just start using it, start to see like these outputs, you're just like oh my gosh, this is such a massive leap over 4. And I, I don't think that's the point. So a couple of notes it. They say it does have access to updated information, including search, it supports files and image uploads, it can use Canvas for writing and coding, but it does not support multimodal features like voice. So you can't go into advanced voice even if you have the Pro account. You're not going to get to talk to 4.5 yet. Video and screen sharing, those aren't in there yet. That'll, you know, kind of come later on. There's a few things that I think very noteworthy. Like as I spent started spending more time thinking about this this morning in Prepar, a couple of things jumped out at me. So first this ongoing debate about scaling laws and you know, there's the two methods now there's throw more Nvidia chips, more you know, more compute and more data at these things and you know, let them learn and get smarter. And then there's the reasoning, like the test time compute where you give them more time to think. So this is the latter or the prior. This is the unsupervised learning, you know, giving it more compute, giving it more data 10 times, probably more than GPT4 is the belief and, and see what happens, see what kind of comes out the other side. And so what they claim is by doing this, by giving it roughly 10x more pre training compute, these things start to recognize patterns better, they draw connections, they generate more creative insights without reasoning. And then GPT5 is where we'll get this merger of the models and it'll now have the reasoning abilities as well. So the reason you may not experience some dramatic feeling in terms of the difference of the output is because it's sort of all just this underlying broader knowledge, deeper understanding of the world. I thought Andres Karpathy, who we've talked about many, many times on this show, but he was at OpenAI for a couple of stints. He had a great tweet that sort of, like, gave his personal perspective. And I thought I'd read that real quick, or excerpts of it. It was a pretty long tweet because I think it sort of sets the stage here. So he said, I've been looking forward to this for two years, ever since GPT4 was released, because this release offers a qualitative measurement of the slope of improvement you get out of scaling pre training compute, which means simply training a bigger model. So he's the One that's saying each.05 in a version is roughly 10x pre training compute. So that's just more Nvidia chips being applied to this stuff, basically. So he said, for context recall, GPT1 barely generates coherent text. GPT2 was a confused toy, in his words. They skipped 2.5, went right to 3, which was interesting. And, Mike, if I'm not mistaken, GPT3 was what was in the world when you and I wrote the marketing Artificial intelligence book.

Paul Raitzer (11:18)

So there was a. There was a section I wrote where I said, what happens when machines can write like humans? That section was written in. I think I wrote that in early 2022. And it would have been projecting out, like, what we were seeing, seeing already happening. And we knew we were going to enter this phase where these things could write like humans. So this is before the ChatGPT moment, but we were already seeing this enough that we wrote about it in our book as, like, sort of an inevitable outcome. So then Andre's cart continues. GPT 3.5 crossed the threshold where it was enough to actually ship as a product and sparked OpenAI's chat GPT moment. GPT4 in turn, also felt better, but I'll say it definitely felt subtle. I remember being part of a hackathon trying to find concrete prompts where GPT4 outperformed G 3.5. So, again, like, this is someone who's sitting in these labs having this same debate. Back from 3.5, which was the first version of ChatGPT in November 22, to GPT4, which came out in March 23. And so they were having the same battle internally. Like, we're trying to find the subtleties, trying to find. It's just smarter. It just feels different, it feels better, but it's hard to, like, explain. So then he goes on to say, we do actually expect to see an improvement in tasks that are not reasoning. Or this is actually going back to. Yeah, yeah, this is still Andre's improvement. Tasks that are not reasoning heavy. And I would say those are tasks that are more EQ as opposed to IQ related and bottlenecked by, for example, world knowledge, creativity, a knowledge, analogy making, there we go, general understanding, humor, et cetera. So these are tasks that I was most interested in doing my vibe check. So for me, I started focusing in on this EQ versus IQ concept because I think this is very, very fundamental to understand where these things go. And that's why I'm saying I see 4.5 more as a prelude. And honestly, like, I think it gives us a few months, not much more than that, because 5 is coming to grapple with what, what it means when these models become high in EQ. So some context here. So in the GPT5 post from OpenAI, they highlight right toward the beginning. Combining deep understanding of the world with improved collabor collaboration results in a model that integrates ideas naturally in warm and intuitive conversations that are more attuned to human collaboration. GPT4.5 has a better understanding of what humans mean and interprets subtle clues or implicit expectations with greater nuance. And EQ Emotional quotient. Right, that's what EQ stands for.

Paul Raitzer (14:15)

All right. So GPT5 also shows stronger aesthetic intuition and creativity. It excels at helping with writing and design. So to me, the EQ part is what really matters here because it moves models more into the realm of skills, traits and even professions that we perceive to still be uniquely human or like safe. So IQ provides the foundation for solving intellectual, technical, analytical challenges. EQ is all about navigating social complexities, communicating clearly handle handling emotional nuances. So when we think about what is the impact of EQ as these models, whether it's Claude or Gemini or in this case GPT 4.5, as they become higher in emotional intelligence, it enables interactions that start to feel way more natural, way more. It gives the AI a feeling of empathy that it can, it can seem more empathetic and it can seem more human. Like it then becomes better at task performance because it helps it discern like the subtleties of intentions behind the user's request because it actually sort of understands humans a little bit better. This leads to better supporting complex tasks like writing and customer service and things like that. It does then reduce misunderstandings and errors like hallucinations just naturally fall because it starts to understand the intent behind prompts more. So I think that as we start to get this emotional intelligence, it, it starts to change the way we interact with these models, it starts to change the, the use cases in a business environment for the models. And it starts to probably creep more into these professions that we thought were maybe safer from AI. And so that kind of led me to think about, well, what are the ramifications of this? Like, as the emotional intelligence increases, what do we now have to face both in business and society? And so a couple of things that came to mind. One is manipulation risks. So AI could be subtly manipulating the user by appealing directly to their emotions. That enables them to start affecting decisions and behaviors, privacy and data. So these AI systems have to analyze and understand deep emotional cues, often requiring access to sensitive data. So this is where, you know, Sam has alluded to this, that the future of their models, and certainly we've heard this with other model companies, is memory and personalization are the keys. It wants to remember every interaction. It wants to personalize the experience to you. So EQ is a path to true personalization. And if you have something that can talk in a very natural way to you and be empathetic to you and truly understand your emotions and your needs, or at least perceive that it is now you get dealing with these emotional bonds and dependencies that people develop with their AI, which we're already starting to see with models that don't have high emotional intelligence. And this leads to maybe the biggest concern of all, which is earlier last year on the podcast, I shared a tweet from Sam where he said he thought these machines would be superhuman at persuasion before they were super superhuman at anything else. And so in the AI exposure key that we talked about when I was sharing the jobs GPT2 stuff that I created last year, one of the key exposures is level 8, which is persuasion abilities. And as I've said before, these models already are superhuman at persuasion. It's just red teamed out of them. Like, persuasion is the ability to convince people to change their beliefs, attitudes, intentions, motivations, behaviors. And it uses advanced reasoning, it uses emotional appeals, it uses the ability to understand and influence people's emotional intelligence. And so I think persuasion starts to become like a truly concerning area of development. So again, just to recap, like, are you gonna go into 4.5 if you're paying the 200 bucks a month and like, feel the difference? I don't know, maybe for some prompts or use cases you might. But I think the underlying thing here is OpenAI is putting this into the world three months roughly before they launch GPT5, which will not only have higher emotional intelligence, because go back to Carpathi's tweet 10x. So if my math is doing this right from GPT4 to GPT5 is a hundred x increase in compute go 10x to 10x. So you're going to not only have a much more powerful model, you're going to have reasoning layered over that model and you're probably going to see a massive leap in the emotional intelligence once you layer reasoning over already more powerful model. So I think it's probably just very important that people don't gloss over this release as like it's the same. I don't really see the difference. That's not the point. I think the point is to Prepare us for GPT5, which will likely be a leap of sorts over what you're used to, and it will have the reasoning capabilities baked into it. And I'm very, very confident in saying that, like no one is really prepared for that. Like in business again, Mike, you and I sit in these meetings all the time. We run workshops, we do talk. You just show people like the most fundamental things like image generation, and they're just like jaws on the floor blown away. This is possible. They're not thinking about like where these things are going and what they're truly going to be capable of.

Mike Kaput (19:57)

In another big topic this week we have another major model release because Anthropic has also released Quad 3.7 Sonnet, which is its most intelligent AI model to date and what they're calling the first quote hybrid reasoning model on the market. What makes this model so unique is its dual mode approach. Users can choose between a standard mode for quick responses or an extended thinking mode where the model performs step by step reasoning that's made visible to the user. So Anthropic says this is because they believe that quote just as humans use a single brain for both quick responses and deep reflection, we believe reasoning should be an integrated capability of frontier models rather than a separate model entirely. In early testing, cloth 3.7 sonnet has shown particularly impressive results, including encoding and web development. Some major tech companies, according to Anthropic, have already noticed improvements. AI programming assistant company Cursor found Quad to be quote best in class for real world coding tasks. Vercel highlighted its exceptional precision for complex agent workflows, and Replit has reportedly succeeded in using the model to build sophisticated web apps from scratch where other models stall. Now, along with this model release, Anthropic also introduced Claude Code, which is a command line tool for agentic coding. Available as a limited research preview, this tool enables Developers to delegate substantial engineering tasks to CLAUDE directly from their terminal. Claude code can search and read code, edit files, write and run tests, commit and push code to GitHub, and use command line tools, keeping the human developer informed at each step. So Claude 3.7 if you're trying to use 3.7 Sonnet, that's now available on all CLAUDE plans as well as through the Anthropic API. However, the extended thinking mode is only available on the paid tiers. So Paul, you had just alluded to this. This seems like a preview of where we're headed with GPT5 models that bake thinking right into a single model. What do you think of their hybrid reasoning approach and also kind of their justification for it as the way the human brain works.

Paul Raitzer (22:26)

I this reminds me so much of their fall launch of Computer use where they presented it. It's like this groundbreaking thing that only they had solved. And this is not a knock on Anthropic by the way. This is just how they're doing their marketing right now and communication stuff. So they're presenting this as like we've cracked that reasoning should be part of these models. Like everybody's doing this like this. They're just literally put out a 3.7 just to be first to market with some early version of an LLM plus reasoning. But it's literally what Gemini is doing, it's what OpenAI is going to do with chat, GPT, things like that. So yeah, all response has been really positive that I have seen. I have not personally tested 3.7, but everything I've seen about it is that it's a very strong model. I thought it was interesting in their I think it was their system card where they said it that they've optimized less for math and computer science competition problems which you kind of highlight as like where it's actually really seems really good. Like really good. And instead they shifted their focus towards real world tasks that better reflect how businesses actually use LLMs. But I couldn't actually find any reports of what those were. Like it just alluded to it but it didn't show those. So it's good. Like that's what we were saying a couple episodes ago. Like that's what we want. It's like focus on actual use cases. So that's great. Like if they're doing that I would love to see that research. One thing jumped out to me is they shared this timeline in the post. I don't know if this is the system card post or the original post, but in this timeline, they show Claude assistance. Then they 2024. Then they showed Claude collaborators, which is 2025, where it's like helping you do extensive work, you know, in much shorter time period. So Claude does hours of independent work for you on par with experts, expanding what every person or team is capable of. So that's like 3.7. Then I assume Claude for Opus, like, because again, this isn't even their biggest model. Like the thing with Anthropic, that's kind of odd is like OPUS is their biggest model and they just keep releasing Haiku and son. I think Haiku is still a thing. Haiku is like their mini. And then Sonnet is like their medium. And then Opus is the big model. And that's the one that we've been sitting here waiting for for like 12 months. So my guess is at some point we get like four Opus, or maybe it's four Sonnet. I don't, I don't know what they're going to do, but they're obviously like three. Seven is very much this intermediary before intermediate step before the four. But then on their timeline, they show 2026 isn't present. They just go right to 2027. It says Claude Pioneers Quad finds breakthrough solutions to challenging problems that would have taken teams years to achieve. So they're very much following and the graph is representative of scaling laws. So if you look at this, and I think it is an intentionally showing what a scaling law graph might look like, and based on Dario Amade's comments that we talked about on the podcast a couple weeks ago, they seem to very much be positioning themselves as like Claude5, I'm guessing in their world is like AGI. And so this is again a step toward. It is the first one to combine reasoning and traditional LLMs. And I think it is a prelude to these much bigger things that Dario has already alluded to, which again tells me they're, you know, that we talked about the two kind of scaling laws up front here with GPT4. Five, you have the traditional unsupervised training, more computer, more data, and they just get, you know, better and smarter. And then you have the reasoning, which is the test time, which is like, give them more time to think. And when you combine those two scaling laws, the assumption from all these major labs seems to be we get AGI, that we, we enter the phrase where these things are just now better than humans at basically all cognitive tasks. And the 2027 that they marked here seems to be around where everybody is kind of centering on what we will have it by then, which isn't very far away, Mike.

Mike Kaput (28:20)

Our third big topic this week, Amazon has just unveiled Alexa, which is a complete reimagining of its voice assistant powered by generative AI. At an event in New York, Amazon's devices and services chief called it a complete re architecture of the AI assistant. And this major overhaul transforms Alexa from that kind of stilted single question interactions that users are familiar with into a genuinely conversational assistant capable of understanding context, remembering preferences and taking meaningful actions. The company demonstrated flowing natural conversations that represent a pretty significant departure from the command based interactions that have defined Alexa up until this point. So Alexa has an impressive range of capabilities that go way beyond just simple queries as well. Amazon says the new assistant can answer personalized questions about your life and activities. Things like, how many books have I read this year. Drawing on information from a customer's Amazon account, it can proactively notify users when concert tickets become available or help with complex tasks like booking dinner reservations. There's also a standout feature in the form of Alexa's new visual understanding capabilities. Through a device's camera, it can process a video feed and respond to questions about what it sees. Now, beyond Basic Assistant, it also has some powerful productivity features. Users can upload files, docs and emails that Alexa will parse and reference in future conversations. Now, the integration with Amazon's broader ecosystem appears to be a pretty big advantage here. Alexa works with the Echo Show Smart Smart displays to power personalized content feeds and provides a new for you panel with timely updates based on user interests. It can control smart home devices, play music from Amazon music on connected speakers, and even direct fire TV devices to skip to particular scenes in movies or shows. Now, in one particularly impressive demo, Amazon showed how Alexa could summarize footage from ring security cameras describing what's happening in a scene and pulling up specific moments. Apparently, Alexa will cost 1999 per month, but will be available free for Amazon prime members. Now, the rollout starting in the coming weeks with an early access period prioritizing owners of Echo show devices, followed by a wider release over the coming months. So, Paul, Amazon touches on so many areas of people's consumer in content consumption habits. Like, how big a deal is this if it works as advertised?

Paul Raitzer (31:10)

I unplugged mine like seven years ago because my kids, when they were little just kept asking it like crazy questions and I was like, oh my gosh, this thing's going to drive me nuts and I don't use it otherwise. I have not personally used an Alexa device and yeah, probably seven years. I don't, I don't even know. So yeah, I, you know, I think this is what Siri was supposed to be like. Like even what Google Assistant was supposed to be. So the vision here is big. The, you know, if they execute, that's a really big deal. One quick note that's really interesting is if you read the post announcing this from Amazon. They mentioned nothing about Anthropic in it and yet Anthropic, I'm sure with Amazon's permission, tweeted Claude will help power Amazon's next generation AI assistant. Alexa plus Amazon and Anthropic have worked closely together over the past year to help Amazon get the full benefits of Claude's capabilities. So as I alluded to earlier, maybe it isn't their own data and distribution that matters. It's their models living within places that do have data and distribution, which Amazon qualifies for as much as anyone. And as we've talked about on the show before, Amazon has invested 8 billion into anthropic to date. That is no small amount if they're going to be valued at 61.5 billion. Assuming Amazon, you know, carries forward their ownership and stake in it. I'm guessing Amazon probably owns somewhere in that 20% range of anthropic. That's, again, big deal, you know. So again, if you're looking for suitors to potentially acquire Anthropic, Amazon sure fits that bill. So a couple notes here. The, the article where Amazon announced all this. They say they have 600 million Alexa devices. So we talk about distribution. You know, if you put that in now, how many of them are unplugged like mine? I don't know. But let's assume some fair percentage of those 600 million are, are actually in use in people's homes. I think the, one of the things that jumped out at me is they refer to it as her and she, like, they, they're very much personifying this, this, these technologies, I guess. So they say she is more conversational, smarter, more personalized. I'm going to highlight a couple of excerpts from the post from Amazon about this because I think there's some very fundamental things here that are part of the bigger story. So the first, and this is my own way of describing this, is this is the Everything AI Assistant, as you kind of alluded to. Some of these, she now quote from them. She keeps you entertained, helps you learn, keeps you organized, summarizes complex topics, and can converse about virtually anything. Alexa can manage and protect your home, make reservations, help you keep track, discover and enjoy new artists. She can also help you search, find or buy virtually any item online and make useful suggestions based on your interest. Alexa does all this and more. All you have to do is ask. So I think it's interesting because when they came out with what were these skills? Is that what they Alexa?

Paul Raitzer (34:12)

And the thing I always like, struggle with Alexa is like There was like 10,000 skills and I don't know what any of them are. I just know I can do weather and sports scores and like beyond that, like, I don't know. So they're kind of. It's like this new age of the everything AI assistant, but now just through conversation, like, you don't have to know the skills, you just have to talk to it and assume it can help you with anything. The next is this whole emotional intelligence thing. So they, and again, they didn't call out in here. This is me looking at it saying like, okay, here we are, we're carrying on this emotional intelligence play. So what they said, quote is conversations with Alexa feel expansive and natural. Whether you're speaking in half, form thoughts using colloquial expressions or exploring complex ideas. Alexa understands what you mean and responds like a trusted assistant. It feels less like interacting with technology and more like engaging with a thoughtful or insightful friend. So again, we're going to start to feel this emotional intelligence coming through in all of our devices, all of our software. Then they get into agents so, you know, can't talk about anything with AI without getting into the agentic side. So they say. At the foundation of Alexa's state of the art architecture are powerful language models available on Amazon Bedrock, which is kind of where you go and get access to all their models. But that's just the start. Alexa is designed to take action and is able to orchestrate across tens of thousands of services and devices, which to our knowledge has never been done at this scale. Again, this is quoting them. To achieve this, we created a concept called Experts groups of systems, capabilities, APIs and instructions that accomplish specific tasks, types of tasks for customers. They also go on to say Alexa introduces agentic capabilities which will enable Alexa to navigate the Internet in a cel directed way to complete tasks on your behalf behind the scenes. Let's say you need to get your oven fixed. Alexa plus will be able to navigate the web. This is a big deal. Navigate the web. Use thumbtack to discover the relevant service provider, authenticate, arrange the repair and come back to tell you it's done. There's no need to supervisor intervene. I didn't watch the announcement thing, but like this seems like it's being underplayed. If this is actually going to work like this, this is the big deal stuff. Another one, memory and personalization. And again, this now gets into the Will you give up the data you need to get the benefit is the question here. The new Alexa is highly personalized and gives you opportunities to personalize further. She knows what you've bought, what you've listened to, the videos you've watched, the address you ship things to and how you like to pay. But you can also ask her to remember things that will make the experience more useful to you. To you. You can tell her things like family recipes, important dates, facts, Dietary preferences and more. And she can apply that knowledge to take useful action. For example, if you are planning a dinner for your family, Alexa can remember that you love pizza, your daughter is a vegetarian, and your partner is gluten free to suggest a recipe or restaurant. So, Mike, I'll stop there for a second because I want to like, explore this, the amount of personal data. So we are all going to have access to the same device. If you're a prime member, you're going to get it baked in for 19.99amonth. Imagine all of these capabilities at your fingertips in, in any device. And, and they're also going to have a standalone Alexa plus app that'll function just like a chat GPT app. They're going to have a new Alexa.com website where you can interact just like you would interact on chatgpt.com how much knowledge, like, how much are you giving up? How much are you like, guiding your family members to give up when you're, you know, someone's mom starts talking about, oh, I heard Alexa plus can do this, I'm going to start giving it all of our family history so I can. Are you there? Like, I don't know if I'm there.

Paul Raitzer (39:35)

Yeah. And so like, and OpenAI can't touch this stuff. Like this is the thing. Like Anthropic's not gonna build this. This is data and distribution. These are two things we keep hammering back to people. They have the data about your personal life and Amazon owns Whole Foods. Like they, they've all data too. Like, so you have data from all these sources. Apple's got it too. Anthropic's not going to get it. OpenAI is not going to get it like that. I think that that's done like the, the race for ownership of our data lives within three or four major companies basically. And whether they build their own models and enable it or they use somebody else's models and that, that kind of leads Mike, to my last note here and we'll timestamp this conversation separately for people. It's sort of like one in the same, but it continues on. There was a big article about Apple and Surrey last week and this is from Bloomberg. Mark Gurman, who's like an insider when it comes to, you know, Apple information and news that we follow closely. So he writes Amazon's Alexa plus, which was announced this past week, is essentially a version of ChatGPT's voice mode with knowledge of who you are, who the people in your life are, your interests and the context of your home and surrounding environment. He goes on to talk about there being one area where Apple has an edge. Amazon lacks an ecosystem of outside the home products and a native app ecosystem that can make Alexa more powerful. It has smart speakers and other gadgets, but nothing like Apple's billions of well integrated mobile devices. But that only makes the Apple intelligence situation even more disappointing. And Mike, was it last week I said like Apple intelligence still sucks. Like I, Yep, I'm not being overly harsh on Apple. Like they know it's bad and so I'll continue on because I think this is like really important context to what's going on and like who the winners might end up being, he said. So Mark continues. Apple could have melded advanced AI with its ecosystem to create something powerful and magical. He then said the next version of Siri will be a test of whether Apple can mount a comeback. The software will likely be released in May, a full 11 months after they introduced it. The current version of 18 of Siri essentially has two brains, one that operates traditional Siri commands. That's the stuff. Like what's the weather, what's the sports, whatever and where's my stocks at? And the other that handles advanced queries which if you've used it is basically like, it's like talking to Siri always has been. It usually doesn't have the answer. If it requires anything to explain something to you, it now connects the chat GPT to it. That's basically what Surrey does now, if you have any. And he says for iOS 19, Apple's plan is to merge those systems together. He expects they'll introduce this as part of their world develop worldwide developer conference in June with a launch of spring of 26. So he's talking about like another full year before we start to get this like merged system. He said the new system dubbed LLM Surrey internally was supposed to introduce more conversational approach in that same release but that's now running behind and might not get till June of 26. So many thing people within Apple's AI division now believe that a true modernized conversational version of Siri won't reach consumers until iOS 20 at best in 2027 when anthropic things we're going to have AGI.

Mike Kaput (46:02)

Let's dive into some more rapid fire for this week. So another big piece of news we're tracking is OpenAI has begun rolling out deep research to all chat, GPT plus team, education and enterprise users. If you recall, Deep Research is this agentic research assistant that can think for extended periods of time. It can go research things for up to 30 minutes and use the web to gather information about topics and actually end up doing pro level research for you online completely autonomously. And it delivers this incredible final result in the form of a comprehensive research brief that often totals dozens of pages. So since this became available to pro users last month, it's definitely been wowing a lot of knowledge workers for its ability to do in minutes a level of high quality in depth research that used to take hours or even days. And I think you can consider us as some of those people wowed by this because we are using it quite often. It's actually interesting. In a publication called Understanding AI on Substack, tech journalist Timothy Lee conducted an evaluation of Deep Research by basically showing it to 19 different experts across different professional fields. Seven out of the 19 said that the responses already were at or near the level of an experienced professional in their fields, and a majority estimated it would take at least 10 hours of human labor to produce comparable reports. What's more, in a head to head comparison with Google's Deep Research, same name, similar product, which was released in December, 16 out of 19 of these people preferred OpenAI's responses. Now I will say as you dive in and hopefully get excited about using OpenAI's deep research in your ChatGPT account, you only get 10 queries per month to start if you do not have a pro account. One other note, there are some additional good news coming, good news items coming out for ChatGPT users. OpenAI is also rolling out advanced voice mode powered by GPT4OH mini to all free ChatGPT users. So you can actually start trying that out for yourself as well. Paul I'm kind of curious to see how much this becomes kind of a mind blowing moment for knowledge workers. I feel like anytime I've actually seriously showed this to someone that doesn't know what's possible, they are pretty impressed. There's obviously lots of issues, you still have to check everything, but it's been literally a month and I feel like this is just incredible. We even have this capability.

Mike Kaput (52:22)

Next up, former Tesla AI director and OpenAI founding member Andrej Karpathi we talked about in a previous segment has sparked a pretty important conversation about what really matters in our AI driven future. In a recent post on X, he made a surprising claim by saying that agency is significantly more powerful and scarce now than intelligence. And he actually wrote I had this intuitively wrong for decades, I think due to a pervasive cultural veneration of intelligence, various entertainment, slash media obsession with iq, et cetera. He basically says that we all assume that raw intelligence is the ultimate asset. But in the age of AI that is starting to change. Now. He defines this idea of agency as this separate kind of attribute from intelligence. It's quote, an individual's capacity to take initiative, make decisions and exert control over their actions and environment. This is about being proactive rather than reactive. People with high agency don't just let life happen to them, they actively shape it. They combine self efficacy, determination and ownership over their path. Now the idea here is that with AI, everyone's going to get more of this type of agency by default. But also as AI handles increasingly complex cognitive tasks, intelligence becomes a commodity. It's basically on tap. So really the only true differentiator, he would argue, becomes agency. As a result, we need to be prioritizing agency in everything we do. And he poses several provocative questions along these lines, saying, are we hiring for agency? Are we educating for agency? Are you acting as if you had 10x agency? So Paul, I when I read this I thought this concept is just something really important for knowledge workers right now to take to heart. It feels, and I still have to explore it a bit more, but it feels to me like at least one directionally correct way to really give yourself the best chance of like, for lack of a better phrase, becoming like AI proof, right? And just building an incredible competitive advantage by optimizing for exhibiting as much agency as possible.

Paul Raitzer (54:39)

Like what did you Think, yeah, I loved this tweet. I, you know, when I flagged it last week, I was like, man, we should probably talk about this. And I feel like I could probably spend a full main topic on this one. So I'm trying to be like, concise here. So I have seen this play out time and time again throughout my career. Many of the best producers I've hired, many of the best leaders I've seen, certainly most of the best entrepreneurs that I know have all been like, average students. Like, they, they didn't come from the top Ivy League schools. They were just insanely resourceful and resilient. And they didn't fear failure. Like, they just found ways through things. They viewed failure as, like, part of the journey. So one of the books this made me think about, I read very early in my career, before I even started my own agency, was called Will and Vision. And in that book, the authors tell us and Golder define will as a. Now they talk in company terms. Where you can apply the same thing at an individual level is an unwavering determination and commitment to achieve a specific vision, demonstrating a strong resolve to overcome obstacles and execute a strategy, even in the face of challenges. Essentially representing the driving force behind a company's ability to achieve leadership, market leadership, despite being a latecomer to the market. So theirs is all about, like, what makes market leaders. This touches on education. So I get asked all the time, like, what should my kids major? And I think about it myself with my kids. And the thing I'm fairly confident in, at least in my own belief system, is a liberal arts degrees matter greatly. I don't know if you should go into computer science on its own. I don't, I don't know if programming is a thing 10 years from now, but I know it's part of, like, the thing. So I think going to a university still matters. I do think the life experience of a college experience is relevant. I don't think it's essential. I don't think you have to have it in the future, but I think it matters. And if you're going to do it, I think liberal, liberal arts is a really good choice because I feel like the best talent moving forward, the people with the most agency are going to have elements of Phil. Philosophy, psychology, sociology, history, science, business, fine arts, political science, computer science. Like all of that helps a diversity of experience and perspective. And so like, when I think about even our own hiring plans, I don't care where people went to college. I actually don't even care what their GPA was. I can't. The only time I remember looking at GPAs in the early days of building my agency was my, my actual marketing agency, not different kind of agency is when I would see someone with like a 4.3. I would. This is one of my favorite questions to ask. I would say, what are you going to do when you're not the smartest person in the room anymore? Because so often what happens with people who are just brilliant is they've never struggled. Like, they've never had to like, really know what it's like to fail in, in class and to like be in a room where you don't feel like the smartest person in the room. And when you get into the real world and real life experience starts to matter, that GPA of 3.3 means nothing. Like you're now you're dealing with ramifications of decisions and unknowns. Like in the future you can't go studying a book. And so like, the way I always think about it is like, IQ matters to a degree. Like, you have to be able to understand complex topics. You have to be able to learn things. You have to be able to, you know, take tests. Well, basically in real life. But what I'm more interested in is the emotional intelligence that we talked about. Are you a problem solver? Are you a hard worker? Are you confident but humble about your confidence? Are you resourceful and resilient? Are you curious? Are you a fast learner? Do you have an insatiable desire to keep learning? Like, that was always one of the things I was looking for is like, do you read books outside of work? Like, are you doing things I'm not asking of you to become better at what you're doing, which is, gets down to intrinsic motivation. Are you proactive? Are you persistent? Are you passionate? Do you understand people? Do you understand machines? Like, I think that all of this matters and I think it fits into this agency umbrella that, you know, he talks about. It's this idea that you can achieve anything. And if I can give any advice to parents, to, to, to employers, like, instill a belief that anything is possible, that the only limitation is what you put on yourself. Because it doesn't matter what school you came from or what your GPA was. Once you get into the real world, all that matters is that you work hard and separate yourself and you create value consistently and, and honestly. It's actually kind of easy to do. Like we were having this conversation with some family friends a couple weeks ago and I was saying this, like, how how easy is it to like stand out when you get into the professional world? Like you don't have to be the smartest anymore, you just gotta be all these other things like that you differentiate yourself fast in the real world when you can do those things. So yeah, I'm a hundred percent on this topic. Like I said I could Talk for like 30 minutes about this topic. I think it's very, very important though, and I think it's actually fits in the bigger theme of like what matters in the future when we do have GPT5 and GPT6 and they have reasoning and they have emotional intelligence. Like what else actually is left? I think this is the answer. Like I think these are the things that remain fundamental and we'll figure the rest out. Like if you have all these basic traits and skills and emotional abilities, like you'll solve the rest of it, but if you don't and then you're just book smart, not not going to go well.

Mike Kaput (64:33)

All right, next up, next up, robotics startup figure we've talked about quite often is making waves with a couple significant announcements. So they have a breakthrough improvement to their AI system for package handling. And even more surprisingly, they plan to begin testing humanoid robots in homes much sooner than expected. CEO Brett Adcock, who's quite active on X, revealed that figure will start alpha testing its figure 02 humanoid robot in home settings later this year, which is a timeline that's been accelerated by approximately two years. This kind of unexpected shift is attributed to rapid advancements in the company's recently announced Helix AI system, their internally designed vision language action model that unifies perception, language, understanding and learned control. We talked about this a bit last episode, so this is advancing faster than any of us anticipated, said Adcox, which is accelerating our timeline into the home. Now, they had previously been focused mostly on industrial applications for the current moment. Just last year they began piloting robots at a BMW manufacturing plant in southern South Carolina, and they've been simultaneously refining their technology, both with robots in the home, but also staying focused on that commercial side, specifically commercial logistics. They also outlined this week significant improvements in Helix's low level control system, known as System 1, which handles Visio motor control and essentially governs how the robot sees and moves. So in logistics testing, improvements to Helix have translated into really impressive results. Figures robots can now handle packages at speeds exceeding those of the human demonstrators they learn from. Paul this timeline seems quite, quite ambitious. It sounds like they have made some type of breakthrough with Helix, but it seems real fast. I mean, it's clear we're making progress, but do you really expect to see humanoid robots and homes beginning this year?

Mike Kaput (71:27)

I will say, if you are struggling historically with a use case, like, you're like, wow, I've tried a bunch of models for this thing. This could be really helpful to get some ideas to kind of try to crack that. But yeah, totally, like, kind of buyer beware here of the data you're getting. All right, so next up, a former writing coach named David Perel, who's pretty popular online, very popular Internet writer. I've followed his work for a few years. He's sparked a really interesting conversation about the future of nonfiction writing in an era of AI. So in a candid social media post, he actually said he made the decision to shut down his writing education business after six years, concluding that the skills he's been teaching are rapidly becoming obsolete in the face of advanced language models. He writes, quote, it has only been four months since I shut down my business. But I can no longer imagine teaching writing in a way that resembles anything close to the way I taught in the past. The reason is simple. The world of nonfiction writing has fundamentally changed, and many of the skills I've developed and built my career on are becoming increasingly irrelevant. He gives a pretty blunt assessment here. He says if you do a great job prompting things like OpenAI's deep research, you can now produce content superior to what he could create in a full day's work on most topics. Now, what's interesting is he doesn't say that nonfiction writing is dead, but he does conclude that you have to start thinking along these lines, that the more a piece of writing draws from personal experience, the less likely it is to be overTaken. Taken by AI Personal narratives, memoirs and biographies contain data that language models don't have access to the lived experiences of individuals. He also says that writing that prevent presents truly unique perspectives. What Peter Thiel might call important truths few people agree with you on maintains its value. He basically says, the more humanity, the more personality you can put into this stuff, the better chance you're going to have of actually standing out. Now, he says, for aspiring writers, his message is mixed. The bar has undeniably, undeniably been raised and writers are just competing with AI at this point. But at the same time, these tools can be really powerful aids, offering instant feedback and helping to refine ideas. Now, Paul, this is exactly the reason why we've hosted an AI for Writers summit every year and why we're doing so again this week. Like, AI is changing what we do as writers, but I don't think enough people are coming together to really explore what that means. That's why I really liked David Perel's post here.

Mike Kaput (77:01)

Next up, some news about HubSpot. So HubSpot has unveiled some ambitious projections for its partner ecosystem. And they're forecasting a 30 billion market opportunity by 2028, with AI expected to drive a third of that growth. So this comes from a recent analyst brief by idc. They wrote about highlighting how the convergence of AI and unified customer data is creating unprecedented opportunities for businesses building on HubSpot. So HubSpot's ecosystem has become increasingly central to their business model. 90% of their customers use at least one app from their marketplace. More than half are using five or more. The integration is profitable for this integration of these apps is profitable for partners as well. HubSpot Solution Partners projecting a median revenue increase of 44% from 2024 to 2025. Now, here's where kind of AI fits in. They project that there's a $10.2 billion opportunity specifically tied here to AI First Solutions. They describe an emerging trend towards what they're calling agentic solutions, which is a convergence of services and applications where partners can build AI agents or agent components that function within HubSpot's ecosystem. This could range from complete AI agents addressing common business needs to modular agent skills that can be combined for custom solutions. Now, at the core of all this is data integration. HubSpot emphasizes that AI is only as good as the data that trains it, and they position their unified data strategy as their competitive advantage. They point out that approximately 80% of customer data is unstructured, and that's information contained in emails, calls, support tickets and other communications. So their strategy involves making this data as actionable as structured data. And they've acquired recently companies like Frame AI to accelerate this capability. Now, what is your read, Paul, on HubSpot's AI opportunity and the opportunity that they're kind of outlining here for the partner ecosystem? Obviously, we've talked about a bunch of times. For anyone that's newer to the podcast, you started HubSpot's first ever partner marketing agency. So you have potentially the top opinion here on what the past, present and future of HubSpot could look like here.

Mike Kaput (83:47)

All right, Paul, we're going to wrap up here with a bunch of different product updates related to. There's been a ton of like, AI voice technology updates. So I'm going to kind of run through these rapid fire. Obviously chime in if you are feel particularly passionate about any of these, but otherwise just going to kind of give people a sense of all the stuff that happened this week in voice. So one of the things getting the most buzz online at the moment is something called Sesame, which is an AI startup led by Oculus VR co founder and former CEO Brendan Uribe. And it's come out of stealth mode with a voice assistant that reporter at the Verge described as the first voice assistant I've ever wanted to talk to more than once. Now, often, you know, experiences with Alexa, Gemini, other assistants we've talked about, they're hampered by lag misunderstandings, stilted responses. But Sesame appears to be very, very good at conversational fluidity. It's able to handle interruptions and course corrections mid conversation. It has a bunch of natural sounding pauses that mimic human speech patterns. And what's really cool here is they're not just building a better voice assistant, they're developing companion AI glasses designed to be worn all day, giving you high quality audio and convenient access to your companion who can observe the world alongside you. Now, at the same time, Heygen has partnered with 11Labs to integrate voice generation capabilities into their avatar creation platform. So this collaboration addresses what heygen describes as one of their biggest challenges for creators using the platform, which is finding voices that match the custom avatars they generate. Now, you can generate tailored voices by specifying age, gender, language, accent and descriptive style prompts. Hume AI, which we've talked about in the past, has been busy with their release of Octave, which they're calling the first LLM really built for text to speech. Now, unlike conventional text to speech systems that convert text into spoken words, Octave Hume claims that Octave represents a fundamental shift in the approach. Approach, it's a speech language model that actually understands what words mean in context, enabling it to add appropriate emotional inflection, timing and expressiveness. It can actually interpret the meaning behind text. For instance, if you give it sarcastic dialogue, it naturally adopts a sarcastic tone. Now, using Octave, you can do something called voice design, which allows users to create custom AI voices from from text prompts. You can also give acting instructions, which lets you give directions to modify how the text is read. Last but certainly not least, eleven Labs itself has unveiled something called Scribe, which they're positioning as the world's most accurate speech to text model. While much of the industry focus has been on generating realistic speech, Scribe tackles the reverse challenge, actually transcribing spoken content into text across 99 languages. According to ElevenLabs, Scribe consistently outperforms leading models like Gemini 2.0, Flash, Whisper and Deepgram in benchmark tests. And it achieves particularly impressive accuracy rates in Italian and English and demonstrates huge improvements in traditionally underserved languages, things like Serbian, Cantonese and Malayam. Now, beyond basic transcription, Scribe offers structured outputs with word level timestamps. It can identify who's speaking and can even tag non speech audio for events like laughter. And it's available through their API. All right, Paul, that's a hugely packed week in AI. Tons of developments going on. Thanks for breaking everything down for us. Sure.

Summary

The Artificial Intelligence Show – Episode #138 Summary

Release Date: March 4, 2025

1. Introduction and Upcoming Events

Notable Quote:

Paul Raitzer [00:00]: "These models already are superhuman at persuasion. It's just red teamed out of them...join us as we accelerate AI literacy for all."

2. OpenAI Introduces GPT-4.5

Key Highlights:

Performance Improvements: Achieved a 62.5% accuracy rate on Simple QA benchmarks, up from GPT-4’s 38.2%.
Reduced Hallucinations: Decreased from 61.8% to 37.1%.
Accessibility: Currently available only to ChatGPT Pro users ($200/month).

Notable Quotes:

Mike Kaput [05:00]: "GPT-4.5 is out in the wild... it is the first model that feels like talking to a thoughtful person."
Paul Raitzer [07:48]: "I think it's more a sign of what's coming versus being some obvious leap forward in capabilities and performance."

3. Anthropic Releases Claude 3.7 Sonnet

Key Highlights:

Hybrid Approach: Combines quick responses with deep reflection within a single model.
Real-World Applications: Excels in encoding, web development, and complex agent workflows.
Claude Code: A command-line tool enabling developers to delegate substantial engineering tasks to Claude.

Notable Quotes:

Mike Kaput [19:57]: "Claude 3.7 is very much an intermediary step before the four."
Paul Raitzer [22:26]: "They’re presenting this as like we’ve cracked that reasoning should be part of these models...a prelude to these much bigger things."

4. Amazon Revamps Alexa with Generative AI

Key Highlights:

Enhanced Conversational Abilities: More natural and intuitive interactions.
Visual Understanding: Ability to process video feeds and respond to visual queries.
Agentic Capabilities: Alexa+ can autonomously navigate the internet to complete tasks on behalf of users.
Personalization and Memory: Remembers user preferences and personal data to tailor responses and actions.

Notable Quotes:

Mike Kaput [31:06]: "Alexa touches on so many areas of people's consumer and content consumption habits. How big a deal is this if it works as advertised?"
Paul Raitzer [34:11]: "If anyone's listening to the show the last month, you know how we feel about these Deep Research products. They are transformational."

5. Deep Research Now Available in ChatGPT Plus

Key Highlights:

Efficiency: Delivers high-quality research reports within minutes.
User Feedback: Positive evaluations, with 7 out of 19 experts rating its responses at a professional level.
Limitations: Initial access limited to 10 queries per month for non-pro users.

Notable Quotes:

Mike Kaput [46:02]: "This is exactly the type of thing we've been needing in some of our previous discussions...like, a way to actually evaluate AI models on the many, many valuable tasks."
Paul Raitzer [48:40]: "It is truly like, if you don't know what this technology is capable of, it can change the way you do."

The hosts discuss the transformative potential of Deep Research for knowledge workers, emphasizing its ability to significantly streamline research and strategic planning processes.

6. AI’s Disruption in Writing Professions

Key Highlights:

AI Supremacy in Content Creation: AI tools can generate high-quality content rapidly.
Shift in Writing Focus: Emphasis on personal experience and unique insights to differentiate from AI-generated content.
Opportunities for Writers: AI as a tool for instant feedback and idea refinement.

Notable Quotes:

David Perel [Timestamp Not Provided]: "If you do a great job prompting things like OpenAI's Deep Research, you can now produce content superior to what I could create in a full day's work on most topics."
Paul Raitzer [74:10]: "AI is changing what we do as writers, but I don't think enough people are coming together to really explore what that means."

Paul and Mike reflect on the necessity for writers to adapt by leveraging AI tools while focusing on inherently human elements like unscripted conversations and personal storytelling.

7. HubSpot’s AI-Driven Partner Ecosystem

Key Highlights:

Market Potential: AI-driven solutions anticipated to generate $10.2 billion.
Agentic Solutions: Building AI agents that address common business needs within HubSpot.
Data Integration: Focus on converting unstructured data from communications into structured, actionable formats.

Notable Quotes:

Paul Raitzer [79:33]: "A lot of agencies are going to go away. A bunch of other agencies are going to figure this stuff out and build amazing businesses."
Mike Kaput [81:14]: "There’s a huge role for humans in this agentic future."

The hosts discuss the dual impact on agencies—those that fail to adapt may become obsolete, while others that embrace AI-driven solutions can thrive by enhancing their service offerings.

8. Robotics Advancements by Figure

Key Highlights:

Helix AI Enhancements: Improved vision and motor control for faster and more efficient package handling.
Accelerated Testing: Humanoid robots to begin alpha testing in homes within the year.
Current Focus: Maintaining industrial applications alongside progressing towards consumer robotics.

Notable Quotes:

Mike Kaput [64:00]: "Do you really expect to see humanoid robots in homes beginning this year?"
Paul Raitzer [66:41]: "I do not believe that anyone needs to think they're going to go over a friend's house this holiday season and run into their robot."

Paul remains skeptical about the immediate consumer availability of humanoid robots, citing the need for further advancements before widespread adoption.

9. Listener Questions: Handling AI Hallucinations

In the Listener Questions segment, the hosts address concerns about AI hallucinations—instances where AI generates incorrect or misleading information.

Key Highlights:

Awareness and Oversight: Users must recognize the potential for inaccuracies and implement human oversight, especially in high-stakes scenarios.
Use Case Appropriateness: Suitable for brainstorming and creative tasks but requires caution in factual or research-intensive applications.
Prompt Engineering: Crafting detailed and specific prompts can mitigate some hallucination risks, though not entirely eliminate them.

Notable Quotes:

Paul Raitzer [81:54]: "You have to know that they exist... use them in use cases where it's okay if they make some mistakes."
Mike Kaput [83:16]: "There’s no guaranteed way through prompting to avoid hallucinations, but you can be more specific, more detailed."

The hosts emphasize the importance of integrating AI tools responsibly, ensuring that human verification remains a critical component of AI-assisted tasks.

10. Voice AI Developments

The episode concludes with a rapid-fire segment on the latest voice AI technologies:

Sesame: An AI startup led by Brendan Uribe introduces a highly conversational voice assistant integrated into companion AI glasses, enhancing real-time interaction and contextual understanding.

Quote:
- Mike Kaput [87:50]: "It was wild to see it all kind of all the Voice tech coming out at the same time."
Heygen and 11Labs Partnership: Collaboration to integrate voice generation with avatar creation, allowing tailored voices that match custom avatars based on specific prompts.
Hume AI’s Octave: Launch of the first LLM built specifically for text-to-speech, capable of understanding context and delivering emotionally nuanced speech.
Eleven Labs’ Scribe: Introduction of a highly accurate speech-to-text model supporting 99 languages, outperforming competitors in various benchmarks.

Notable Quotes:

Paul Raitzer [87:40]: "Alexa touches on so many areas...how much knowledge, how much are you giving up?"
Mike Kaput [64:00]: "These developments are a significant stride in making voice assistants more integrated and emotionally intelligent."

The advancements in voice AI signify a push towards more natural, responsive, and context-aware voice interactions, marking a transformative phase in human-AI communication.

11. Conclusion

Final Thoughts:

Paul Raitzer: Advocates for embracing AI tools while maintaining human oversight and leveraging uniquely human traits to stay competitive.
Mike Kaput: Highlights the necessity of continuous learning and adaptation to harness AI’s full potential effectively.

Stay Curious and Explore AI!