This Day in AI Podcast — EP99.29
Episode: Gemini 3 Flash, GPT-Image-1.5, Skills vs MCPs, and Our 2025 Model Reviews
Hosts: Michael Sharkey & Chris Sharkey
Date: December 23, 2025
Overview
In this lively year-end episode, Michael and Chris Sharkey reflect on an incredibly dynamic year in AI. With their usual mix of humor, self-deprecation, and surprising technical depth for two "proudly average" techies, the Sharkey brothers review the latest model releases (Gemini 3 Flash, GPT Image 1.5), analyze the shifting landscape of AI model leadership, and dig into the evolving paradigms for building enterprise AI tools (Skills vs. MCPs). The episode serves as both a 2025 AI recap and an opinionated, practical guide for listeners trying to make sense of an industry moving at breakneck speed.
Key Discussion Points & Insights
1. Gemini 3 Flash: Release, Pricing, and Performance
[01:00–10:30]
- Shipping on the Beach: Chris recounts deploying Gemini 3 Flash while hunched by a brick wall on a family holiday—“looking like a sophisticated programming hobo” (Mike, 00:56)—then laments wasting 20 minutes due to misreading a logo. “I've got glasses coming, by the way.” (Chris, 01:41)
- Model Impressions: Mike highlights that Gemini 2.5 Flash had become their “absolute workhorse” for summaries and reasoning within SIM Theory, so he expected Gemini 3 Flash to be faster, smarter, and cheaper.
  - Actual outcome: “Pricing's actually gone up… But in the benchmarks it's crazy… It just seems like a sped up, cheaper 2.5 Pro.” (Mike, 02:49–03:50)
  - Chris adds: “It's got this really nice tight output where it actually gets what you're after and really sticks to the brief… I actually like the Flash answer better.” (03:51)
- Model Naming Critique: Both hosts agree the branding is confusing (“they just slap a brand on these models”), and the differences suggest full re-trains rather than mere upgrades.
- Unreliability Across the Board: Mike and Chris share that no current “flagship model” (Gemini 3 Pro, Claude Opus 4.5, GPT 5.2) feels 100% reliable on a daily basis, leading to more frequent model-switching in their workflows.
  “I find myself in this weird thing where I'm not really trusting any models at the moment. So jumping down to a sort of mid range model that I think is more reliable is actually a really good option to have.”
  — Chris, 06:54
- Model as ‘Recovery Tool’: Gemini 3 Flash has become their go-to fallback when major models let them down — a “sort of Hail Mary model to get things back on track.” (Mike, 07:51)
2. Image Model Wars: GPT Image 1.5 vs. Nano Banana Pro
[10:45–20:39]
- OpenAI’s GPT Image 1.5: Mike argues it was a “rush reaction to Nano Banana Pro,” and while it’s a notable improvement, it doesn’t beat Nano Banana on reliability, character consistency, or infographic quality. “Nano Banana Pro typically just wins and is more reliable, especially at things like upscaling.” (Mike, 11:15)
- User Perception: For mainstream ChatGPT users, GPT Image 1.5 is “good enough,” but AI enthusiasts pursuing quality will migrate to platforms with better models (i.e., Google/Gemini and Nano Banana).
- Relaxed Safety Controls: Chris is shocked by the lack of filtering: “It made like the most graphic, detailed image… I can't believe that it did that.” (15:03)
  Mike’s party trick: inserting Taylor Swift into every photo.
- Direct Comparisons: Mike runs several side-by-side image tests:
  - “This one [Nano Banana] is so good. … GPT Image 1.5… just looks like a dirty effect over it.”
  - For infographics, GPT underwhelms: “looks like it was made on MS Paint 95.” (16:54)
  “If Nano Banana Pro didn’t exist, you would be blown away [by GPT Image 1.5].”
  — Mike, 17:23
- Real-World Usage: Chris uses Nano Banana Pro for system diagrams, calling it “amazing” for non-artistic users, despite requiring a few iterations.
3. Research Agents: Fire Crawl & Gemini Deep Research
[21:04–34:57]
Fire Crawl Agent
[21:04–27:57]
- Key Capabilities:
  - An agent that reliably crawls multiple webpages, extracts structured data, and makes decisions while gathering information—“so reliable and so accurate, it has blown my mind.”
  - Used for compiling a timeline of AI model releases. The output was so trustworthy they’re integrating it into the research workflow in SIM Theory.
  “It is the most useful tool to have in your toolkit as an MCP for research to go off and get data.”
  — Mike, 21:58
- Why It Matters: It offers more depth than a “Google and paste” approach; the hosts can now trust an agent to go deep, extract, and check facts rather than just aggregate surface info.
  “There's something untrustworthy about [old school] Google searches, paste the first few web pages… This level of decision making … is just such a big step in the right direction.”
  — Chris, 24:20
Gemini Deep Research Agent
[27:57–34:57]
- Impressive Synthesis & Decision-Making:
  - The agentic process includes deep dives, knowledge-gap identification, and reference synthesis.
  - Has a “files” API enabling research over user docs, videos, and datasets.
- Workflow Synergy: Chris hacks together a combination of Skills, MCPs, and research files: “The results of using them combined is just unbelievable.” (Chris, 28:53)
  - Supports follow-up questions for deeper threads. Mike likens it to how “building a good context is the key to getting good AI tasks done.”
- Advice for Organizations: Research context + agentic planning + execution = best results. Expensive, but “very essential… or at least one for your most serious tasks.” (Chris, 32:09)
4. Enterprise AI: Skills vs. MCPs and the Evolving Paradigm
[34:57–52:50]
- Definitions:
  - MCP (Model Context Protocol): Secure, explicit tool calls for software/data, but all instructions and examples must be loaded into prompt context—wasting tokens, especially at scale.
  - Skills: Procedures and knowledge as context, loaded only when invoked—like “codifying business practices into repeatable skills.” (Chris, 41:16)
    - Best suited to repeatable, complex tasks (e.g., brand guidelines, compliance, filing legal docs).
    - Skills can reference other skills, chaining procedures for consistent, domain-expert output.
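The context-economics point the hosts make can be sketched in a few lines of Python. This is an illustrative toy, not any real SDK: every name, description, and token estimate below is hypothetical. It contrasts MCP-style tool definitions, which sit in the prompt on every request, with Skill-style procedures, which are pulled in only when a skill is invoked.

```python
# Hypothetical sketch of prompt assembly. All tool/skill names and
# token estimates are invented for illustration.

# MCP-style: tool descriptions (schemas, examples) are always in context.
MCP_TOOLS = {
    "crawl_site": "crawl_site tool schema + usage examples (~800 tokens)",
    "extract_table": "extract_table tool schema + examples (~600 tokens)",
    "file_ticket": "file_ticket tool schema + examples (~700 tokens)",
}

# Skill-style: a full procedure is loaded only when that skill is invoked.
SKILLS = {
    "brand_guidelines": "brand compliance procedure, step by step (~2,000 tokens)",
    "legal_filing": "procedure for preparing legal docs (~3,000 tokens)",
}

def build_prompt(task: str, invoked_skills: tuple = ()) -> str:
    """Assemble a prompt: all MCP tool descriptions are loaded up front,
    while skill procedures enter context only on invocation."""
    parts = list(MCP_TOOLS.values())                    # paid on every call
    parts += [SKILLS[name] for name in invoked_skills]  # paid only when used
    parts.append(task)
    return "\n\n".join(parts)

# Only the invoked skill's procedure lands in the prompt:
prompt = build_prompt("Draft a press release", invoked_skills=("brand_guidelines",))
assert "brand compliance" in prompt and "legal docs" not in prompt
```

The final assertion captures the hosts' complaint: the three MCP tool blurbs are in every prompt whether or not they are used, while the uninvoked `legal_filing` skill costs nothing until it is called.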
- Anthropic’s Strategy:
  - Skills released as an open standard, promising a future of “one general purpose agent equipped with a library of specialized capabilities.”
  “Not sure I entirely agree with that, but… their pitch is like having a universal agent with skills being [the roles].”
  — Mike, 43:04
  Chris: “I disagree… I think that's what agents are for. An agent will use skills and follow these procedures.”
- Practical Impacts:
  - “Skills” make complex, expert-level procedures reusable, shareable, and accurate, with far more detailed prompt instructions than possible with regular assistants.
  - “Some of these skills are absolutely massive in scope in terms of how much they bring into the prompt.” (Chris, 46:50)
  - For enterprise: “Getting this sort of data in and actions out with MCP is still the right area to be focused on for now.” (Mike, 52:38)
  - “Just try it… when I started to use them, I’m like, okay. This is magic. It's amazing. I get it.” (Chris, 52:50)
5. 2025 Recap: Major Model Releases & Trends
[53:19–63:10]
- Timeline Walkthrough: Mike presents a Fire Crawl–generated month-by-month release chart, shocking both hosts with how quickly things moved.
  - Gemini 2.5 Pro was released only in March ’25—it felt like “a longer relationship.”
- Key observations:
- “No one was taking Google seriously at all” in January, but they finished the year as top contenders.
- Feeling of constant model churn and a lack of a universally trusted “core” model, unlike prior years.
- Continued open source model attempts failed to keep up.
- Chris: “I can’t believe we’re finishing the year in a state where I don’t really have a go to model at all right now.” (09:33)
- Model Awards:
- Best: Gemini 2.5 Pro (both hosts)
- Runners-up: Chris—Sonnet 4.5, Mike—Opus 4.5
- Worst: Llama (“in general”), with OpenAI 5.1/5.2 also deemed disappointing.
6. 2026 Predictions & Reflections on the Future
[63:10–78:09]
- 2026 Will Be the “Year of Agents” (contrary to 2025’s predictions):
  - Skills will reach all providers; agentic workflows will become the organizational standard.
  - Mike expects “everyone will recognize [agentic workflows provide] huge leverage… every organization will have their own approach.”
- Open source models may catch up, but still lag behind in key business contexts.
- Centralization vs. Control:
- Anthropic and others want to “own” organizations’ skill repositories—but companies may prefer control and flexibility, integrating best-in-class tools as needed.
- Hiring and Job Market Impacts:
- Workers who adapt to agentic, planning-driven AI workflows will become “immensely productive,” causing a hiring slowdown in roles that don’t.
- Best Model for End of 2026?
- Chris: Anthropic, unless context window remains too small; acknowledges “it has all of the components you need to get this stuff into the future.”
- Mike: Bets on Google or possibly OpenAI “if they put their mind to what enterprises really want.” (75:46)
- “I wouldn’t write them off yet… they’ve got enough smart people.”
- OpenAI’s “Abomination Store”
- Chris: the AI app store is a failure (“never talk about it again”). Mike agrees; he says “it is truly stupid” and prefers integrating actions at scale.
- On model provider skill execution: Both expect to decouple execution, running skills on their own infrastructure rather than via the model vendors, for flexibility and cost reasons.
Notable Quotes & Memorable Moments
- On the state of AI models:
  “I find myself in this weird thing where I’m not really trusting any models at the moment.”
  — Chris, 06:54
- On Fire Crawl Agent:
  “It is the most useful tool to have in your toolkit as an MCP for research to go off and get data.”
  — Mike, 21:58
- Comparing Nano Banana Pro to GPT Image 1.5:
  “If Nano Banana Pro didn’t exist, you would be blown away [by GPT Image 1.5].”
  — Mike, 17:23
- On the value of Skills:
  “It’s literally codifying business practices into repeatable skills. And that’s what it is.”
  — Chris, 41:16
- On the future of AI tooling:
  “I think next year, increasingly it’s going to be a planning phase that you work with the agent… Delegating, and it goes off and does the work.”
  — Chris, 72:05
- Betting on OpenAI:
  “I wouldn’t write them off yet. We’ll see. I’ll replay this and we can laugh.”
  — Mike, 75:58
Segment Timestamps
| Segment                                   | Time (MM:SS)      |
| ----------------------------------------- | ----------------: |
| Intro / Holiday Hijinks                   | 00:02–01:45       |
| Gemini 3 Flash Review                     | 01:45–10:45       |
| Image Models Showdown                     | 10:45–20:39       |
| Fire Crawl Agent & Gemini Deep Research   | 21:04–34:57       |
| Skills vs. MCPs Debate                    | 34:57–52:50       |
| 2025 Model Recap & Awards                 | 53:19–63:10       |
| 2026 Predictions / Future of Agents       | 63:10–78:09       |
| Closing / Thank-Yous & Musical Parody     | 78:09–end         |
Tone & Style
The brothers keep things laid-back and irreverent, mixing practical technical commentary with “average guy” jokes about their own programming mistakes, dodgy AI usage, and dubious “hacks.” They candidly admit when they don’t fully understand emergent paradigms, repeatedly experimenting with models “live” so listeners can learn along with them. References to surfing, hobo coding, AI-made prank photos, and Taylor Swift in every image keep things fun and relatable.
Bottom Line
If you’re in the trenches trying to wrangle AI tools for yourself or your organization—and you don’t have an army of PhDs—this episode distills which models, agents, skills, and strategies actually work, and which to steer clear of. Despite all the hype, the Sharkey brothers prove that sometimes it’s the “average” practitioners who have the clearest take on a year that was anything but average in the world of AI.
[For further detail, check the timeline link Mike promised in the show or browse the full episode transcript.]
