
Alex
You know how it is. Blink and you've missed, like, a whole week of tech news.
Ben
Absolutely. It moves so fast.
Alex
That's why we do this.
Ben
Yeah.
Alex
You shared some really interesting articles and we've done the digging to get you the core stuff.
Ben
Yeah, distilled it down.
Alex
So today we're doing a quick run through of some key AI bits. We've got Instagram on age detection, some questions around OpenAI's O3 benchmarks.
Ben
Right. That's been making waves.
Alex
The kind of funny cost of saying please and thank you to AI, huh?
Ben
Yeah, that one.
Alex
And this pretty wild startup aiming to automate everything.
Ben
Mechanize. Yeah, big ambitions there.
Alex
Okay, so let's jump right in. Instagram first. They seem to be really leaning into using AI to check user ages.
Ben
They are. The idea is if the AI flags someone as maybe being younger than their profile says, they get put into these teen accounts.
Alex
Teen accounts. Okay. And what does that actually mean for the user?
Ben
Well, it means more protections, things like limits on who can message them, you know, strangers. And also the kind of content they see. It's restricted based on age appropriateness.
Alex
That seems sensible.
Ben
And importantly, if you're under 16, your parents actually have to okay any changes to those default safety settings. You can't just switch them off yourself.
Alex
Gotcha. And how are they spotting these accounts? Is it just random checks?
Ben
Not exactly random. They've been doing AI age detection for a while, but now it's specifically about getting teens into these protected accounts. They look for clues, like people posting happy birthday messages that don't match the listed age.
Alex
Ah, okay. Clever.
Ben
Yeah, and user reports too. If someone flags an account as potentially underage, that gets fed into the system.
Alex
But AI isn't perfect, right? What if it gets it wrong?
Ben
Good point. Instagram acknowledges that. They do have an appeals process, so if you think you've been wrongly put into a teen account, you can challenge it.
Alex
Okay, fair enough.
Ben
And they're also trying to loop parents in more, sending notifications about why accurate age online is important.
Alex
Seems like a big push. Have they said how many people this affects?
Ben
They have. Meta, Instagram's parent company, says they've enrolled, get this, 54 million teens globally into these accounts.
Alex
Wow, 54 million.
Ben
Yeah. And apparently something like 97% of the 13 to 15 year olds stay in them once they're enrolled.
Alex
So for listeners tracking platform safety, especially for kids, this is definitely a key development. AI isn't just behind the scenes anymore.
Ben
Exactly. It's becoming a core tool for policy enforcement, for actual safety measures, not just, you know, recommending videos or ads. The scale is pretty staggering too.
Alex
All right, let's switch gears. OpenAI's O3 model, there was some buzz around its performance on a math benchmark. Frontier Math.
Ben
Yes, Frontier math. So initially, OpenAI kind of suggested O3 scored just over a fourth, like over 25% on this really hard math test.
Alex
Which sounds amazing, right? Especially when others were down at what, 2%?
Ben
Exactly. Huge jump. But it got a bit complicated.
Alex
How so?
Ben
Well, it looks like that really high score might have been from a version of O3 that, let's say, had a lot more processing power behind it than the version regular users got access to.
Alex
Ah, okay, so like a special test version?
Ben
Kind of. Epoch AI, the group that actually made the Frontier Math test, they ran their own checks on the public O3 model and they found it scored closer to 10%. Still good. Better than 2%, but, you know, not quite the 25% plus figure that was first floated.
Alex
Hmm. So quite a gap there. Did OpenAI acknowledge that?
Ben
To be fair, OpenAI's own published results did include a lower score that sort of matched Epoch's findings. But the way the initial higher number was maybe emphasized, it raised some eyebrows about transparency.
Alex
Right. Why the difference then? Just the computing power?
Ben
That seems to be a big part of it. Epoch also mentioned maybe slight differences in test setups, or possibly using an updated version of Frontier Math. And another group, the ARC Prize Foundation, suggested the public O3 is just, well, different: tuned for chat, for speed, running on smaller compute tiers.
Alex
Like optimizing a car for fuel efficiency versus raw speed.
Ben
Yeah, something like that. Someone from OpenAI, Wenda Zhou, basically said the production model is built for real world use cases and quick responses, and that can cause disparities on specific benchmarks like this.
Alex
Okay, that makes practical sense.
Ben
And it's worth noting OpenAI already has newer models like O3 Mini High and O4 Mini that actually beat the original O3 on this test. And an O3 Pro is coming.
Alex
So the bigger picture here for listeners is?
Ben
I'd say it's just: be cautious with benchmarks, especially ones released by the companies selling the AI. They're useful, but they don't tell the whole story. Dig a little deeper.
Alex
Good advice. And seems like this isn't the first time benchmark results have caused a bit of confusion or controversy in AI lately.
Ben
No, it's definitely a recurring theme. Shows we still need better, maybe more independent ways to really measure how capable these things are across the board, not just on one test.
Alex
Okay, moving on to something a bit lighter, perhaps: politeness and AI.
Ben
Ah, yes, the please-and-thank-you costs.
Alex
Someone actually asked about the electricity cost of users being polite to chatbots.
Ben
And Sam Altman from OpenAI had that great reply. Something like tens of millions of dollars well spent.
Alex
Which is funny, but is there anything more to it? Does being polite actually do anything?
Ben
Well, maybe. Kurt Beavers at Microsoft Copilot suggested that, you know, using polite language could actually set a more positive tone, maybe influencing the AI's response style.
Alex
Really? Like the AI picks up on the vibe.
Ben
Sort of seems to be the idea. He even joked that maybe sometimes swearing at it could be useful too.
Alex
Okay, maybe don't try that at work. But it's interesting, this human tendency to treat AI like, well, like another person.
Ben
It really is. We project our social norms onto them, even though they don't feel politeness like we do. It's just inherently human, I suppose. A funny quirk of human computer interaction.
Alex
Definitely one for listeners to ponder next time they chat with an AI. Okay, final topic. This startup, Mechanize, sounds ambitious.
Ben
Ambitious is one word for it. Controversial might be another.
Alex
It was founded by Tamay Besiroglu, who also founded Epoch AI, right? The benchmark people we just talked about.
Ben
That's the one. And Mechanize's stated mission is. Well, full automation of all work, the entire economy.
Alex
Wow, okay. Full automation, that's a statement.
Ben
It is. And it's kicked up a lot of dust online, on X especially. People raising serious concerns about, you know, the impact on jobs, society.
Alex
Understandable.
Ben
And also questions about Epoch AI's impartiality given the founder's new venture.
Alex
So what's Mechanize actually trying to do right now? Build the robots?
Ben
Not exactly build the robots themselves, but provide the tools, the data, the evaluation methods to enable others to automate pretty much any job.
Alex
Any job.
Ben
That's the goal. Besiroglu calculated this huge potential market, like $60 trillion a year based on global wages. For now though, the focus seems to be more on white collar knowledge worker type jobs.
Alex
The reaction must be intense.
Ben
Oh yeah, lots of criticism. People pointing out that if you automate all labor, what happens to humans? How do people earn a living? It raises fundamental questions.
Alex
And the connection back to Epoch AI, the research group, that adds another layer.
Ben
It does. People are asking if research coming out of Epoch might be influenced, even subconsciously, by this goal of full automation. Especially since there was past controversy about Epoch's funding from OpenAI not being initially disclosed.
Alex
Right. But Mechanize has investors. People are backing this vision?
Ben
Apparently so. Some prominent names. Besiroglu's argument is basically that full automation will unlock explosive economic growth and ultimately make everyone better off. Higher living standards.
Alex
The classic tech utopia argument.
Ben
Pretty much, yeah. But the counter is always well, how do people buy things in this automated utopia if they don't have jobs or income?
Alex
Good question. Did he address that?
Ben
He suggested maybe wages go up in roles that complement AI. Or income shifts to other things: investments, rent, maybe universal basic income or welfare systems.
Alex
Big societal shifts needed there.
Ben
Huge ones. And look, Besiroglu himself admits the tech isn't there yet. Making AI agents that can reliably do complex multi step jobs without messing up, that's still a massive challenge. They struggle with memory, long term planning.
Alex
So it's not happening tomorrow?
Ben
Definitely not. But Mechanize isn't alone in pushing towards more capable AI agents. You've got Salesforce, Microsoft, OpenAI, tons of startups all working on this, right? The fact that Mechanize is hiring signals they're serious. Even if the ultimate goal seems extreme to many. It definitely makes you think about the future of work.
Alex
It really does. Okay, so quick wrap-up: Instagram using AI for teen safety.
Ben
Yep. A practical application growing in scale.
Alex
OpenAI benchmarks showing we need to be critical about performance claims.
Ben
Transparency and context matter.
Alex
The little quirk about maybe being polite to your AI, the human element. And Mechanize pushing the boundary, maybe even breaking it, on automation.
Ben
A very provocative vision of the future for sure.
Alex
It's a real snapshot of how fast things are moving, isn't it? From everyday app features to these huge economic questions.
Ben
It really is. AI is weaving itself into everything.
Alex
So thinking about all this, the speed, the integration, the potential disruption, what's the big question you think we should all be wrestling with as this technology keeps evolving so rapidly?
Ben
That's a tough one. Maybe. How do we ensure these powerful tools actually align with human well being? Broadly defined, not just efficiency or profit.
Alex
Hmm, that's a good one to mull over. Thanks again for sharing the articles that sparked this. It's been a fascinating deep dive.
Ben
My pleasure. Always interesting stuff to discuss.
Alex
And definitely let us know what other areas you'd like us to tackle next time.
AI Deep Dive Podcast Summary
Episode: Instagram Uses AI for Age Detection, OpenAI’s O3 Faces Benchmark Scrutiny, & Mechanize Aims to Automate All Work
Release Date: April 21, 2025
Host: Daily Deep Dives
Overview:
In the opening segment, Alex and Ben delve into Instagram's latest initiative to enhance platform safety through AI-driven age detection. Instagram has implemented an AI system designed to identify underage users and transition them into "teen accounts," which come with enhanced protections.
Key Points:
AI Implementation:
Instagram uses AI to flag accounts that may belong to users younger than their declared age. For instance, if birthday messages posted on an account do not match the listed age, or if other users report the account as potentially underage, the system treats that as a signal.
Ben [00:26]: “They look for clues, like people posting happy birthday messages that don't match the listed age.”
Impact on Users:
Once flagged, users are moved to teen accounts, which restrict functionalities such as messaging strangers and limit content visibility to age-appropriate material. Notably, for users under 16, parental consent is required to modify any safety settings.
Ben [01:02]: “If you're under 16, your parents actually have to okay any changes to those default safety settings. You can't just switch them off yourself.”
Scale of Implementation:
Meta, Instagram's parent company, has reportedly enrolled 54 million teens globally into these protected accounts, with 97% of users aged 13 to 15 remaining in these accounts post-enrollment.
Ben [02:10]: “Meta... says they've enrolled, get this, 54 million teens globally into these accounts.”
User Feedback and Appeals:
Acknowledging the imperfections of AI, Instagram offers an appeals process for users who believe they've been incorrectly categorized. Additionally, the platform is increasing efforts to involve parents by notifying them about the importance of accurate age representation online.
Insights:
This move signifies a significant shift in how social media platforms leverage AI for policy enforcement and user safety. By automating age verification, Instagram not only enhances protection for younger users but also underscores AI's growing role in maintaining platform integrity at scale.
Overview:
The podcast transitions to a critical examination of OpenAI's O3 model and its performance on the Frontier Math benchmark, highlighting discrepancies between reported and actual results.
Key Points:
Initial Claims vs. Reality:
OpenAI initially suggested that the O3 model scored over 25% on the challenging Frontier Math test, a significant leap from other models that scored around 2%. However, this figure appears to have come from a version of O3 with considerably more processing power than the version released to regular users.
Ben [02:51]: “OpenAI kind of suggested O3 scored just over a fourth, like over 25% on this really hard math test.”
Independent Assessments:
Epoch AI, the group behind Frontier Math, conducted independent tests on the publicly available O3 model and found a more modest score of approximately 10%, still outperforming earlier models but far below the initially claimed 25%.
Ben [03:15]: “They found it scored closer to 10%. Still good. Better than 2%, but... not quite the 25% plus figure that was first floated.”
OpenAI’s Response:
OpenAI's own published results did include a lower score consistent with Epoch's findings, but the emphasis placed on the higher number raised concerns about transparency. An OpenAI staff member explained that the production model is tuned for real-world use and quick responses, which can produce disparities on benchmarks like this one. Epoch AI and the ARC Prize Foundation also pointed to differences in compute and test setup.
Ben [04:09]: “Someone from OpenAI... said the production model is built for real world use cases and quick responses, and that can cause disparities on specific benchmarks like this.”
Future Developments:
OpenAI has introduced newer models like O3 Mini High and O4 Mini, which outperform the original O3 on the Frontier Math test. An O3 Pro model is also on the horizon, indicating ongoing efforts to enhance benchmark performances.
Ben [04:37]: “OpenAI already has newer models like O3 Mini High and O4 Mini that actually beat the original O3 on this test. And an O3 Pro is coming.”
Insights:
This segment underscores the complexities of AI benchmarking and the importance of transparency. It highlights the necessity for independent evaluations to provide a more accurate picture of AI capabilities, ensuring stakeholders have reliable information beyond promotional figures.
Overview:
Shifting gears, the hosts explore an intriguing topic about the implications of user politeness in interactions with AI, sparked by a user query regarding the electricity costs associated with using courteous language.
Key Points:
User Query:
Someone asked how much electricity it costs when users are polite to chatbots, for example by saying "please" and "thank you."
Alex [05:10]: “Perhaps politeness and AI.”
OpenAI's Response:
Sam Altman from OpenAI humorously addressed the question, stating that the electricity costs of polite interactions amount to “tens of millions of dollars well spent.”
Ben [05:17]: “Sam Altman from OpenAI had that great reply. Something like tens of millions of dollars well spent.”
Potential Impacts of Politeness:
Kurt Beavers from Microsoft Copilot suggested that using polite language might influence the AI's response style, setting a more positive tone in interactions. He also joked that swearing at it might sometimes be useful, which prompted Alex to advise against trying that at work.
Ben [05:37]: “Using polite language could actually set a more positive tone, maybe influencing the AI's response style.”
Human-AI Interaction Dynamics:
The discussion highlights humans' tendency to project social norms onto AI, treating them akin to human interlocutors despite AI lacking genuine feelings. This behavior reflects inherent human traits in computer interactions.
Ben [05:53]: “We project our social norms onto them, even though they don't feel politeness like we do.”
Insights:
This conversation delves into the psychological aspects of human-AI interactions, emphasizing how human behaviors and social norms are often extended to AI entities. It raises questions about the nature of these interactions and their broader implications on user experience and AI behavior modulation.
Overview:
The final segment addresses the provocative mission of Mechanize, a startup founded by Tamay Besiroglu, who also established Epoch AI. Mechanize aims to achieve full automation of all work within the economy, sparking significant debate.
Key Points:
Company Mission:
Mechanize's audacious goal is to automate every job across the global economy, targeting an estimated market potential of $60 trillion annually based on global wages. Currently, their focus is on automating white-collar, knowledge-based jobs.
Ben [06:13]: “Mechanize's stated mission is... full automation of all work, the entire economy.”
Public Reaction:
The company's ambitions have ignited intense discussion online, especially on X, with widespread concerns about the potential loss of jobs and societal upheaval. Critics question the feasibility and ethical implications of such comprehensive automation.
Ben [06:37]: “Lots of criticism. People pointing out that if you automate all labor, what happens to humans? How do people earn a living?”
Connection to Epoch AI:
Because Tamay Besiroglu also founded Epoch AI, the group behind the Frontier Math benchmark discussed earlier, critics have questioned whether Epoch AI's research can remain impartial. The earlier controversy over Epoch's initially undisclosed funding from OpenAI adds to those concerns.
Ben [07:22]: “People are asking if research coming out of Epoch might be influenced even subconsciously, by this goal of full automation.”
Financial Backing and Vision:
Mechanize has attracted prominent investors who support Besiroglu's vision of automation driving explosive economic growth and enhancing living standards. However, skeptics challenge the practicalities of sustaining such a utopia without traditional income sources.
Ben [07:41]: “Besiroglu's argument is basically that full automation will unlock explosive economic growth and ultimately make everyone better off. Higher living standards.”
Technical Hurdles:
Besiroglu acknowledges that the technology needed for reliable AI-driven automation is still in development, particularly AI agents' ability to handle complex, multi-step tasks with consistent accuracy.
Ben [08:14]: “Besiroglu himself admits the tech isn't there yet. Making AI agents that can reliably do complex multi step jobs without messing up, that's still a massive challenge.”
Future Outlook:
Despite current technological limitations, Mechanize’s recruitment efforts signal a serious commitment to progressing towards their ambitious goals. This trend is echoed across the industry, with major companies and startups alike investing in more capable AI agents, indicating a broader movement towards extensive automation.
Ben [08:28]: “Mechanize isn't alone in pushing towards more capable AI agents. You've got Salesforce, Microsoft, OpenAI, tons of startups all working on this.”
Insights:
Mechanize exemplifies the ambitious frontier of AI-driven automation, highlighting both the transformative potential and the profound challenges such endeavors present. This pursuit raises critical questions about the future of work, economic structures, and societal well-being in an increasingly automated world.
In wrapping up the episode, Alex and Ben reflect on the diverse ways AI is embedding itself into various facets of society, from enhancing user safety on social media to posing fundamental economic and ethical questions about the future of work.
Final Thoughts:
Integration of AI:
AI's role is expanding beyond backend functionalities to become a pivotal tool in policy enforcement, user safety, and even shaping economic landscapes.
Alex [08:56]: “So the bigger picture here for listeners...”
Caution with Benchmarks:
The discussion around OpenAI's O3 model serves as a reminder to approach AI benchmark claims with a critical eye, emphasizing the importance of transparency and independent verification.
Ben [04:50]: “It's just be cautious with benchmarks... Dig a little deeper.”
Human-AI Interaction:
The exploration of human behaviors projected onto AI underscores the need to understand and navigate the psychological dimensions of interacting with intelligent systems.
Ben [05:53]: “We project our social norms onto them...”
Future of Work:
Mechanize's vision compels listeners to contemplate the profound societal shifts that widespread automation could entail, urging a broader dialogue on aligning technological advancements with human well-being.
Ben [09:14]: “It's a very provocative vision of the future for sure.”
Closing Quote:
Ben [09:33]: “How do we ensure these powerful tools actually align with human well being? Broadly defined, not just efficiency or profit.”
Final Takeaway:
As AI continues to evolve at a breakneck pace, its integration into everyday applications and its potential to disrupt economic and social structures necessitates thoughtful consideration and proactive measures to ensure that technological progress harmonizes with the broader interests of humanity.