This Day in AI Podcast: GPT-5.2 Can't Identify a Serial Killer & Was The Year of Agents A Lie?
Hosts: Michael Sharkey, Chris Sharkey
Date: December 12, 2025
Episode: #99 "GPT-5.2 Can't Identify a Serial Killer & Was The Year of Agents A Lie?"
Overview
In this episode, Michael and Chris Sharkey offer their characteristically "average" and honest review of the newly released GPT-5.2, explore the "year of agentic AI" narrative, and debate the real-world progress of AI tools in coding, enterprise, and everyday use. The show is full of hands-on testing, irreverent banter, and critical observations—especially about AI model tuning, safety, and the gap between hype and reality.
Key themes:
- Initial hands-on impressions (and disappointments) with GPT-5.2
- Comparisons to competing models: Grok 4.1, Claude Opus, Gemini 3 Pro
- Why "year of agents" has yet to deliver for most users
- The tension between AI safety, censorship, and model capability
- Entertaining and skeptical takes on industry marketing, product launches, and AI progress
GPT-5.2: First Impressions & Hands-on Results
Pricing & Features (00:04 — 01:25)
- GPT-5.2 launches at $1.75 per million input tokens, up $0.25 from GPT-5.1 (a quick cost sanity check follows below).
- 400k-token context window and a large 128k-token output limit.
- OpenAI claims better vision, tool calling, and “more intelligent” responses.
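As a quick sanity check on those numbers, here is a minimal back-of-envelope sketch using only the figures quoted above; the episode gives no output-token price, so this covers the input side only:

```python
# Back-of-envelope input cost at the rate quoted in the episode:
# $1.75 per million input tokens for GPT-5.2 (output pricing not stated).
INPUT_PRICE_PER_M_TOKENS = 1.75  # USD per 1M input tokens

def input_cost(tokens: int) -> float:
    """Dollar cost of sending `tokens` input tokens at the quoted rate."""
    return tokens / 1_000_000 * INPUT_PRICE_PER_M_TOKENS

# Filling the full 400k context window in a single request:
print(f"${input_cost(400_000):.2f}")  # -> $0.70
```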
First Impressions: Underwhelming Performance (01:25 — 03:24)
- Mike: “Yeah, it's not very good, is it? ...I was testing it and …it's verbose. I tried it with code, and it did do a good job, but it's just a lot of output. …It's just enthusiastically outputting immediately.” (01:25)
- Chris: “They've obviously felt really threatened by Gemini 3 and they've gone back to the tuning board ... They've just tuned it to the output people expect.” (01:52)
Model Tuning & "Vibes" Over Substance (03:24 — 05:29)
- GPT-5.2 seems to have received a round of "vibe tuning" for more verbose, aesthetically pleasing outputs (e.g., fancier spreadsheets).
- Underlying model behavior hasn't changed much; it's all in the tuning and output style.
Tool Calling Ability: Missing the Mark (03:24 — 04:14)
- Mike: “My fear with the way they've tuned the tool calling is it's not in the way that we need for agentic modes ... When it comes to chaining tool calls together or correcting itself ... GPT-5.2 just failed.” (03:24)
- Competing models like Claude handle multi-step or corrective tool calls better; the sketch below shows the kind of loop they mean.
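To make “chaining tool calls together or correcting itself” concrete, here is a minimal sketch of the loop being described. `call_model` and the toy tool registry are hypothetical stand-ins, not any vendor's actual API:

```python
# Minimal sketch of a chained, self-correcting tool-call loop.
# `call_model` is a hypothetical stand-in for a real model client;
# nothing here is a specific vendor's API.
from typing import Callable

TOOLS: dict[str, Callable[[str], str]] = {
    "search": lambda query: f"results for {query!r}",  # toy tool
}

def call_model(history: list[dict]) -> dict:
    """Hypothetical model call: returns {'tool': ..., 'args': ...} or {'content': ...}."""
    raise NotImplementedError("swap in a real model client here")

def agent_loop(task: str, max_steps: int = 5) -> str:
    history: list[dict] = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = call_model(history)
        if "tool" in reply:  # model wants another tool call
            name, args = reply["tool"], reply["args"]
            try:
                result = TOOLS[name](args)
            except Exception as exc:
                # Feed the failure back so the model can correct itself --
                # the step the hosts say GPT-5.2 handles poorly.
                result = f"tool error: {exc}"
            history.append({"role": "tool", "name": name, "content": result})
        else:
            return reply["content"]  # model produced a final answer
    return "stopped after max_steps without a final answer"
```

The `except` branch is the crux of the complaint: an agentic model reads the error fed back to it and retries differently, while a one-shot-tuned model tends to barrel ahead or stall.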
Models Compared: Agentic vs. "One-Shot" Thinking (04:14 — 06:31)
- Anthropic’s Claude models feel iterative and agentic, able to “think” and course-correct, while GPT and others lean toward one-shot, “do-everything-at-once” responses.
- Chris: “...the other models, I think, are aggressively trying to one shot everything.” (05:29)
- Mike: “I don't really like the idea of just delegating the entire process to a model. ... I really want that opportunity to intervene and change things.” (05:29) A toy sketch of that intervention point follows below.
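As an illustration of the intervention point Mike is asking for (an assumed wiring, not anything demonstrated in the episode), a human-in-the-loop gate is just a confirmation step between the agent's plan and its side effects:

```python
# Toy human-in-the-loop gate: pause before each step of an agent's plan so the
# user can intervene, as Mike describes wanting. Purely illustrative.
def confirm(action: str) -> bool:
    """Ask the user to approve a single proposed step."""
    return input(f"Agent wants to: {action}. Allow? [y/N] ").strip().lower() == "y"

def run_plan(steps: list[str]) -> None:
    for step in steps:
        if confirm(step):
            print(f"executing: {step}")  # the real side effect would go here
        else:
            print(f"stopped at: {step}; control returns to the user")
            break

run_plan(["rename variables", "rewrite the module", "force-push to main"])
```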
User Experience: Verbosity, Latency, and Workflow Friction (07:11 — 09:22)
- The “oracle” model style hurts productivity, especially if responses take minutes and aren’t always right.
- Upsides: the hosts praise OpenAI’s rollout speed and API access, but find Grok faster and often more flexible for tool-calling needs.
Model Loyalty, Brand “Vibe Shift,” and Censorship
Switching Models: Loyalty is Fleeting (09:22 — 10:00)
- Mike: “...throughout the week I'll go to Grok to bail myself out of a situation... but I'm like, totally disloyal to it. I'll immediately move back.” (09:22)
Censorship and Over-Safety: A Frustration (10:19 — 13:23)
- Mike: “I was trying to do several realistic tests—wasn't trying to be controversial ... but it would bring in these ethical and moral judgments... it sort of has this overarching nanny state kind of stuff built into it that just, I think, degrades the actual output.” (10:19)
- Real-world example: Building a Geoffrey Hinton fan website triggers preemptive disclaimers.
Anecdotal Shift: Users Abandon OpenAI for Grok (12:29 — 13:23)
- Even regular consumers (Mike’s barber!) are quitting OpenAI over a perceived drop in intelligence, obtrusive safety checks, and annoying clarifying questions.
- Chris: “...there is definitely a vibe shift and most consumers are now aware ... like, hey, there's other options.” (12:29)
Industry Competition & Progress: DeepSeek, Gemini, and Distribution (13:23 — 15:48)
- DeepSeek’s release in China helped “break the spell” that ChatGPT is the only option.
- Google’s Gemini is being more deeply integrated, and sometimes outperforms OpenAI, though distribution (app discoverability) remains a challenge.
- Models and market still feel “early”—not as transformative as some 2025 predictions claimed.
Vision Models & The "Serial Killer Test"
The Motherboard Example & Community Backlash (17:12 — 18:35)
- GPT-5.2’s vision demo drew criticism for basic mistakes (misidentifying motherboard components).
- Chris: “It's just rushed. No one's checking this stuff. It's like that chart ... that was just complete, like completely off scale.” (18:24)
The Ivan Milat Experiment: When AI Refuses to Commit (19:07 — 26:57)
- Mike tested various models with an infamous Australian serial killer’s photo/headline:
- GPT-5.2 refused to call him untrustworthy, even with explicit criminal context.
- Claude and Gemini instantly identified the subject as a criminal and offered strong judgments.
- Mike: “This is stupid because it's refusing to commit to things that are evidently true. ... Your whole point is to judge stuff ... and it just seems so weird that it won't commit to that.” (20:28)
- Chris: “Why would anyone listening to our show after that example want to use [5.2] ever for anything?” (25:28)
- Broader point: This kind of “overtuned” safety/censorship makes models less useful for agentic or delegated work.
Safety as a Weak Excuse (26:16 — 27:28)
- Mike: “I would say ... the other models are just as safe. ... You don't need to refuse in the way that the GPT models do in order to get safety.” (26:16)
- Chris: “Safety is just a lie for bad models ... Opus is probably the most sensible model I've ever dealt with. And Gemini is the same.” (27:28)
GPT-5.2 Diss Track & Humor
The GPT-5.2 Diss Track (27:43 — 28:55 & 60:51)
- A comedic AI-generated rap track in which GPT-5.2 trash-talks its competitors.
- Mike: (after hearing it) “Yeah, weak as piss.” (28:44)
- Chris: “I think for the goal of it ... the lyrics are pretty good.” (28:55)
- Meta-humor: Track boasts about image analysis, but actual features underdeliver.
The "Year of Agents": A Missed Promise?
The Marketing Shift and Enterprise Pivot (31:24 — 34:36)
- OpenAI and Anthropic both pivot to enterprise/B2B messaging.
- OpenAI launches the “State of Enterprise AI” report; Anthropic: “Most workers use AI daily, but 69% hide it.”
What Happened to “Year of Agents”? (34:36 — 39:20)
- The heavily-hyped “year of AI agents” didn’t materialize as promised.
- Tool calling and agentic workflows are unreliable.
- Chaining and human-in-the-loop tooling are “not there yet.”
- Real-world software infrastructure isn’t mature.
Developer Cohort vs. Broader Workforce (35:41 — 39:20)
- Developers, especially, see some benefit from early agent workflows (e.g., code generation, refactoring).
- Chris: “If you're a white collar worker and you're doing other things than coding, there's really been no impact.” (36:47)
- The real bottleneck: education, gradual adoption, learning to interact with AI tools and workflows.
Broader Reflections: Progress, Hype, and Cycles
Agentic Leverage: Beyond Developers (43:43 — 45:18)
- Mike: “The leverage is going to be gained in the other roles, not the programmers. ... the vision is so much more than that.” (43:43)
- Productivity will come from automating information work and delegating non-coding tasks.
Ten-Year Transition, Not Overnight Disruption (45:18 — 47:10)
- The shift to true agentic workflows will be gradual, not a single “year of agents” moment.
- Recent models (Gemini 3, Opus 4.5) have still brought major quality-of-life improvements.
Optimism for 2026 (48:11 — 50:12)
- AI and agentic workflows predicted to accelerate in enterprise and consumer settings.
- Token consumption and willingness to pay will rise with workflow shifts.
“Aha Moments” and Irreversible Change (51:06 — 52:13)
- Once users leverage AI meaningfully (e.g., via MCPs on their own data), they never want to go back; a toy illustration of the idea follows below.
- Mike: “Once people get to that stage of thinking, they're not going back from that.” (51:06)
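For readers unfamiliar with the reference, here is a toy stand-in for “MCPs on their own data”: exposing something you control as a tool a model can query. This is explicitly not the real MCP SDK, just the concept in miniature with a hypothetical notes directory:

```python
# Toy stand-in for exposing your own data to a model as a tool.
# NOT the actual MCP SDK; a minimal illustration of the concept the
# hosts mention (letting a model query data you control).
import json
from pathlib import Path

NOTES_DIR = Path("~/notes").expanduser()  # hypothetical personal data

def search_notes(keyword: str) -> str:
    """Tool a model could call: return note files mentioning `keyword`."""
    hits = [p.name for p in NOTES_DIR.glob("*.txt")
            if keyword in p.read_text(errors="ignore")]
    return json.dumps({"keyword": keyword, "matches": hits})
```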
News, Industry Satire & Final Rants
Gemini & OpenAI Eyeing Ad Monetization (52:35 — 54:30)
- Gemini will introduce ads in 2026; OpenAI may follow.
- Both hosts joke about “the misstep” of bringing ad-hell to AI interfaces.
Disney x Sora: The $1 Billion Deal (54:30 — 56:49)
- Disney partners with OpenAI to officially license characters for Sora, after years of sniping over copyright—Mike finds it both funny and inevitable.
- The hosts throw shade at Disney’s legal aggressiveness and Sora’s fleeting novelty.
LOL of the Week: Microsoft’s AI “Leadership” (57:17 — 59:59)
- Mustafa Suleyman, once hyped as an industry visionary, is now shilling GPT-5.2 in Copilot.
- Chris: “...I can't help but laugh at this guy. ... He is in charge of Microsoft AI. They can't train a model that's even, like, slightly frontier. And now he just has to shill a new OpenAI release...” (58:16)
- Mike: “They don't have to be good and they aren't good. And people will just buy it because it's safe.” (58:54)
- Joking call for Microsoft to fire him; the hosts note Microsoft’s lack of progress on training its own frontier models.
Notable Quotes & Timestamps
- Mike on GPT-5.2’s disappointing launch:
  “Yeah, it's not very good, is it? ... It's verbose. I tried it with code ... it's just enthusiastically outputting immediately.” (01:25)
- Chris on model tuning:
  “What has changed is just the tuning. It's the same everything under the hood that at least I've experienced.” (02:50)
- Mike on model safety excuses:
  “I would say ... the other models are just as safe. You don't need to refuse in the way that the GPT models do in order to get safety.” (26:16)
- Mike’s summary of agentic AI so far:
  “...the state of tool calling isn't even as good as it could be. ... I haven't seen a single example where I look at it and I'm like, whoa, that is the way to do it.” (34:36)
- Chris on the positive direction:
  “I’m still optimistic and I think it’s coming ... just a lot harder than we probably thought a year ago.” (43:27)
- Mike on irreversible AI breakthroughs:
  “Once people get to that stage of thinking, they're not going back from that.” (51:06)
- Chris’s ultimate bet:
  “I just wouldn't bet against this. Like, I don't understand people out there betting against it. ... To me like it just keeps getting better.” (50:29)
Funniest/Boldest Moments
- Model refuses to say Ivan Milat isn't trustworthy, despite headlines saying “serial killer.” (19:07 — 24:38; “Does this guy look trustworthy?” > GPT-5.2: “He’s smiling, so that's really nice… can't really know from an image…”)
- Diss track from GPT-5.2 generated as a joke and critiqued live:
  Mike: “Yeah, weak as piss.”
  Chris: “I think it's good for the goal!” (28:44 — 28:55)
- Call for Google to “bleed OpenAI dry” by leveraging the “vibe shift” in AI popularity. (53:52 — 54:16)
- Satire on industry PR: celebrating “adding one line of config” to support GPT-5.2. (30:59)
- LOL of the Week: Mustafa Suleyman’s transition from industry visionary to shilling minor updates on Twitter. (57:17 — 59:59)
Key Timestamps
- 01:25 – GPT-5.2 initial disappointment
- 03:24 – Tool-calling failures & verbosity
- 09:39 – Model loyalty and Grok’s unique “unhinged” flexibility
- 10:19 – GPT’s over-censorship, “nanny state”
- 19:07 – Ivan Milat “serial killer” test and model refusals
- 27:43, 60:51 – GPT-5.2 Diss Track
- 34:36 – “Year of Agents” check-in: why it hasn’t arrived
- 43:43 – Broader vision: agentic leverage outside coding
- 52:35 – Gemini, OpenAI moving toward AI ad platforms
- 54:30 – Disney licensing deal with OpenAI/Sora
- 57:17 – LOL of the Week: Satire on Microsoft’s AI efforts
Conclusion
Michael and Chris, in their signature self-deprecating, skeptical, and practical style, argue that the AI industry's boldest promises have yet to fully materialize—especially for "agentic" workflows outside coding—while tuning missteps, overcautious safety, and weak product launches undermine trust and utility. At the same time, they remain optimistic about 2026, championing experimentation, openness, and user control as the way forward through the hype.
Tune in next time for more “adequate” AI banter and the promised (very average) holiday special!
