This Day in AI Podcast – Episode 99.21
"Is Haiku 4.5 really THIS good? OpenAI's Erotic Mode & Are MCP Apps the Right Approach?"
Hosts: Michael Sharkey & Chris Sharkey
Date: October 16, 2025
Episode Overview
Michael and Chris dive into the rapidly evolving world of AI, focusing on the buzz around the upcoming Gemini 3 release, the freshly launched Claude Haiku 4.5, OpenAI’s controversial "Erotic Mode," and the future of MCP (Model Context Protocol) apps and integrations. With their trademark humor and “average guy” approach, they explore how AI benchmarks are shifting, the competitive landscape among top models, and the economics of AI tooling, and they debate how enterprise and consumer AI will intersect with new tooling and use cases.
Key Discussion Points & Insights
1. Gemini 3 Hype & Model Benchmarks
- Sneaky A/B Testing & Desktop UI Generation
- Users are accessing a suspected Gemini 3 model via AI Studio A/B tests ([00:23]).
- New benchmark: single-shot generation of desktop UIs (macOS/Windows) in HTML/CSS.
- Visual fidelity and interactivity are impressive; models can even open text files, use a terminal, and emulate desktop UI features.
- Comparison with Gemini 2.5 Pro shows both are remarkably capable, but Gemini 3’s polish and fluidity stand out.
“It even has the bouncy ball effect at the bottom… it’s pretty phenomenal.”
– Michael ([00:48])
- Gemini 3 Feature Wishlist ([02:40])
- Bigger context window (2 million tokens?)
- Lower pricing
- Improved tool calling and agentic flow to match competition (“still feels turn-based, versus Claude’s internal clock”)
“Simultaneous tool calling and the agentic flow… could use a lot of work. Right now it’s turn-based.”
– Michael ([02:55])
2. Claude Haiku 4.5 First Impressions
- Performance Metrics & Value
- Faster, cheaper, and in some benchmarks rivals GPT-5 and Gemini 2.5—especially in agentic coding tasks ([04:43]).
- Debugged MCPs live with Haiku 4.5: extremely competent, handled complex test scenarios, and unusually “smart” for its smaller size ([04:43]–[05:22]).
- Praised for speed and price: $1 per million input tokens and $5 per million output tokens; the context window is 200K tokens (vs. 1 million on rivals).
“It never felt stupid to me… never felt like a lesser model.”
– Michael ([05:22])
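To make the quoted pricing concrete, here is a minimal cost sketch. The per-million-token prices and the 200K context size are the figures mentioned in the episode; the workload numbers (50K tokens in, 2K out) are made up for illustration, and treating input plus output as counting against the context window is a simplifying assumption.

```python
# Figures quoted in the episode for Claude Haiku 4.5 (USD per million tokens).
INPUT_PRICE_PER_MTOK = 1.00
OUTPUT_PRICE_PER_MTOK = 5.00
CONTEXT_WINDOW_TOKENS = 200_000  # context size quoted in the episode


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the quoted prices."""
    # Simplifying assumption: input and output share the same context budget.
    if input_tokens + output_tokens > CONTEXT_WINDOW_TOKENS:
        raise ValueError("request exceeds the quoted 200K-token context window")
    return (
        input_tokens * INPUT_PRICE_PER_MTOK
        + output_tokens * OUTPUT_PRICE_PER_MTOK
    ) / 1_000_000


# Hypothetical workload: 50K tokens of context in, 2K tokens out.
print(f"${request_cost(50_000, 2_000):.4f}")  # prints $0.0600
```

At these rates, output tokens cost five times as much as input tokens, so long generated answers dominate the bill faster than large prompts do.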
- Limitations
- Smaller context window than rivals; might block some use cases ([07:53]).
- Hands-On “One-Shot” Benchmark ([08:21])
- Haiku 4.5 can generate a functional macOS-like UI with notepad and paint apps, matching Gemini 3’s performance (though less visually refined).
- Michael: “Haiku passed the new OS one-shot benchmark.” ([09:57])
- Tool Calling & Agentic Use Cases
- Haiku excelled at simultaneous tool calling, rapid context gathering, and output formatting.
- Chris’s real-world test: Uploading a bill, instructing AI to summarize and email it; Haiku managed all steps and added playful admonishments about electricity consumption ([11:04]).
“It did it all with simultaneous tool calls as well.”
– Chris ([11:43])
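For readers unfamiliar with the distinction the hosts draw between "turn-based" and "simultaneous" tool calling, the difference can be sketched with plain `asyncio`. This is an illustration only: `read_bill` and `lookup_contact` are hypothetical stand-ins with fake latency and placeholder return values, not real MCP tools or anything shown in the episode.

```python
import asyncio
import time


async def read_bill(path: str) -> dict:
    """Hypothetical tool: pretend to extract fields from an uploaded bill."""
    await asyncio.sleep(0.1)  # stands in for I/O latency
    return {"tool": "read_bill", "path": path}


async def lookup_contact(name: str) -> dict:
    """Hypothetical tool: pretend to look up an email address."""
    await asyncio.sleep(0.1)  # stands in for I/O latency
    return {"tool": "lookup_contact", "name": name, "email": "chris@example.com"}


async def run_turn_based() -> list[dict]:
    """Await each tool call before starting the next one."""
    return [await read_bill("bill.pdf"), await lookup_contact("Chris")]


async def run_simultaneous() -> list[dict]:
    """Start both tool calls at once and wait for all results."""
    return list(await asyncio.gather(read_bill("bill.pdf"), lookup_contact("Chris")))


def timed(coro_fn) -> tuple[list[dict], float]:
    """Run a coroutine function to completion and report elapsed seconds."""
    start = time.perf_counter()
    results = asyncio.run(coro_fn())
    return results, time.perf_counter() - start


if __name__ == "__main__":
    _, sequential_s = timed(run_turn_based)
    _, parallel_s = timed(run_simultaneous)
    print(f"turn-based: {sequential_s:.2f}s, simultaneous: {parallel_s:.2f}s")
```

Both versions return the same results in the same order; the simultaneous version simply overlaps the waiting, which is why independent lookups feel faster when a model issues them together.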
3. Veo 3.1 Launch & The Evolution of AI Video
- Google’s Veo 3.1: New Features
- Start/end frame control and video sequencing from multiple image references ([12:41]–[14:30]).
- Demo: A transition of Michael putting on sunglasses, built from reference images; despite some artifacts such as odd-looking teeth, the character consistency and transition are impressive.
“The ability for it to transition to the last frame is incredible… a really major advancement.”
– Chris ([14:08])
- Michael combined images of himself, a Mars landscape, and an alien into a short, coherent, cinematic video ([15:44]).
“It’s close enough to me that you’d believe it’s me, right?”
– Michael ([15:49])
- Commercial Adoption & Pricing Barriers
- Current costs ($4 for ~10 sec) keep experimentation and wide application out of reach except for advertisers or enterprises ([18:19]).
- Suggestion: Heavily watermarked “dev mode” for affordable prototyping ([19:56]).
“Honestly if I was… to build that video maker, I think I spent like $250 USD to build that.”
– Michael ([18:46])
- Viewpoints on Model Optimization
- Lowering costs and increasing speed are seen as more urgent than increasing raw intelligence ([20:16]–[20:56]):
- “If you make it cheaper and faster, productivity just skyrockets.”
4. The Commoditization of AI Models & GLM 4.6
- Increasing perception that models are “commoditized,” but each new “tune” still brings unique strengths ([23:15]).
- GLM 4.6 joins Simtheory: solid all-rounder, particularly good for self-hosting and fine-tuning.
“If you’re going to host your own model… GLM 4.6 is a pretty good starting point.”
– Chris ([22:42])
- Anticipation that “frontier” models will become cheap and commoditized quickly.
5. OpenAI’s “Erotic Mode,” Age Gating & Policy Shifts
- Context: Sam Altman’s Announcement ([25:27])
- OpenAI plans to relax ChatGPT content restrictions, introduce user-selectable personalities, and age gating.
- Explicitly mentions allowing erotica for verified adults; stirs meme storms and controversy online.
“We hope you will like it better… as part of our ‘treat adult users like adults’ principle, we will allow even more, like erotica for verified adults.”
– Michael, quoting Altman ([26:47])
- Community & Privacy Concerns
- Chris is skeptical of demand for AI erotica: “Aren’t we just kind of used to the restrictions?” ([27:04])
- Michael notes potential blackmail, data privacy, and hacking risks – “What if the government gets access … blackmail is gonna be fast and furious” ([29:26]).
- Economic & Strategic Motives
- Hosts suggest OpenAI’s push is more about deepening consumer addiction and maximizing engagement (“throwing darts and seeing what sticks”) ([33:31]).
- Debated: Is the age ID genuinely about safety, or really about verifying/controlling users?
6. OpenAI-Salesforce Partnership & The “AI-ification” of SaaS
- Salesforce & Einstein AI “Pivot” ([35:54])
- Recent OpenAI-Salesforce (and Anthropic) partnerships viewed as SaaS’s way of admitting internal AI efforts can’t compete: “I'm pretty sure Salesforce is just waving the white flag to these guys.”
- Concerns about walled gardens and vendor lock-in via “elite” MCP partnerships ([39:03]).
- User Data & Platform Lock-In
- Companies (like Salesforce) may resist open MCPs to keep user data captive ([39:32]).
- Hosts reference conversations with major enterprise strategists, confirming everyone is reevaluating open vs. closed platform approaches.
7. The Future Role of MCPs & Agentic Workflows
- AI as The New Interface
- Hosts believe we’re moving toward interacting with most SaaS applications, and even whole business roles, through AI-mediated UIs (“eventually, all your SaaS runs through your preferred AI platform”) ([37:53]).
- Change will be greatest for new startups—legacy SaaS faces existential risk.
- Custom MCPs & Workplace AI
- Internal company-built MCPs can streamline IT spend by allowing unprecedented control over data, process, and AI-powered workflows ([49:48]).
- “Just build my own internal MCPs… replace solution after solution… drive down IT spend.”
- Real-World Use Cases
- Stripe MCP as “best in class”—can do nearly everything except destructive actions ([43:45]).
- Support/help desk platforms like Zendesk and HelpScout: all could be replaced by a shared inbox MCP + database + AI ([45:27]).
8. Critique of App-Centric AI UI (ChatGPT Apps Store, Agentic Tasks vs. Single Apps)
- Skeptical About App Store Model ([55:29])
- Hosts both dismiss the idea that the future of AI interaction is single-purpose app selection within ChatGPT.
- Most real value: chains of context-aware, agentic actions spanning multiple apps and data sources.
“The future isn’t apps. We already have apps.”
– Chris ([55:29])
“Not a single use case where I benefit from having to force select a single MCP.”
– Michael ([56:47])
- Agentic Flow & Multi-MCP Use
- Modern tool calling is smart enough to avoid constant manual tool selection—the AI should orchestrate.
- Combining MCPs and enabling human-in-the-loop for complex, multi-step tasks is what will drive productivity ([65:51]).
- Anthropic’s Approach Praised
- Anthropic’s Claude models (Sonnet included) “get” agentic MCP workflows and do not enforce unnecessary UI restrictions ([67:00]).
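The "AI should orchestrate" idea from the Agentic Flow & Multi-MCP Use discussion can be sketched roughly as follows. Everything here is hypothetical: the `inbox` and `billing` servers, their tools, and the hard-coded `plan` (standing in for what a model would choose) are invented for illustration and do not use any real MCP SDK. The approval hook is one simple way to model the human-in-the-loop step the hosts mention.

```python
from typing import Callable

# Hypothetical tools, grouped by the (made-up) MCP server that exposes them.
SERVERS: dict[str, dict[str, Callable[..., str]]] = {
    "inbox": {"fetch_ticket": lambda ticket_id: f"ticket {ticket_id}: refund request"},
    "billing": {"refund": lambda ticket_id: f"refunded order for ticket {ticket_id}"},
}
NEEDS_APPROVAL = {"billing.refund"}  # actions a human must confirm first


def build_registry(servers):
    """Flatten every server's tools into one namespace the model can pick from."""
    return {
        f"{server}.{tool}": fn
        for server, tools in servers.items()
        for tool, fn in tools.items()
    }


def run_plan(plan, registry, approve):
    """Execute model-chosen tool calls, pausing for approval where required."""
    results = []
    for name, kwargs in plan:
        if name in NEEDS_APPROVAL and not approve(name, kwargs):
            results.append(f"{name}: skipped (not approved)")
            continue
        results.append(registry[name](**kwargs))
    return results


# Stand-in for the model's output: an ordered list of (tool, arguments).
plan = [("inbox.fetch_ticket", {"ticket_id": 7}), ("billing.refund", {"ticket_id": 7})]
registry = build_registry(SERVERS)
print(run_plan(plan, registry, approve=lambda name, kwargs: False))
```

The user never selects a single "app" here: all tools sit in one registry, the plan spans two servers, and only the sensitive step waits on a person.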
9. Listener Feedback, Predictions & Wrap-Up
- Gemini 3 Polymarket Watch
- Michael predicts Google’s Gemini 3 will “dominate” AI model leaderboards at the end of October ([70:31]).
- Call to Comment:
- Hosts invite listener feedback on AI content restrictions, age-gating, and demand for erotica (“Safe word!”) ([33:50]).
Notable Quotes & Moments
“I think what they're saying is: we heard that you want more control over the models… so we're going to give you tools to have control.”
– Michael on OpenAI’s ChatGPT changes ([28:07])
“This might be bigger than all of software as a service and all of the App Store and all of that stuff combined.”
– Michael on AI agentic workflow & MCPs ([52:22])
“The usefulness of it comes when it’s in combination with other things… That’s what actually gives you power and leverage.”
– Chris on MCP app orchestration ([64:38])
“I strongly agree with you… there’s a real need for an MCP-first approach… That’s really going to take off.”
– Chris ([47:46])
Timestamps for Major Segments
| Time | Segment / Topic |
|--------------|---------------------------------------------------|
| 00:23–02:40 | Gemini 3 rumors, desktop UI benchmark, old vs new |
| 03:43–07:53 | Haiku 4.5 intro, benchmarks, pricing |
| 09:52–11:43 | Haiku's tool calling, real-world MCP use |
| 12:41–18:19 | Veo 3.1 demos, AI video advances, pricing issues |
| 20:16–22:42 | Economics: speed, price vs. intelligence |
| 23:15–24:44 | GLM 4.6, commoditization, model evolution |
| 25:27–33:31 | OpenAI “Erotic Mode,” age gating, verification |
| 35:54–39:32 | OpenAI–Salesforce partnerships, SaaS surrender |
| 43:21–46:12 | App lock-in risks, MCP as interface to SaaS |
| 49:48–52:22 | Internal MCPs, agentic work, cost savings |
| 55:29–58:15 | Critique: ChatGPT app-centric UI, agentic vision |
| 65:37–69:15 | AI session workflow, collaborative context |
| 70:31–72:29 | Market predictions, listener call-to-action, offers |
Tone & Style
Light-hearted, irreverent, self-deprecating, and energetic. The Sharkey brothers balance honest skepticism with genuine enthusiasm, using humor and plain-speak to demystify fast-moving AI news and tools.
Final Thoughts
This episode frames the future of AI as one where speed, cost, and deep integration (not shiny UI “apps”) will determine winners. The “agentic” tools and MCP vision are positioned as the transformative backbone of next-gen work—while both hosts question the PR strategies of AI’s biggest players (“Do they even use their own products?”) and poke fun at the “erotic mode” arms race.
For those riding the AI wave, this episode offers both reality checks and inspiration for where AI workflows could soon land—especially if you’re building, rather than simply using, these new tools.
Coupon: SIMLINK – 30% off Simtheory annual plans, including pre-release access to the “Simlink” agentic MCP tool ([71:16]).
If you only listen to one section:
Check out the candid breakdown of MCPs, AI workflow orchestration, and why the “app store” model misses the real paradigm shift—start at [43:21] and follow the debate on where AI productivity will truly happen.
