Podcast Summary: This Day in AI Podcast – "The Future of AI Systems" (EP99.04-PREVIEW)
Hosts: Michael Sharkey (A), Chris Sharkey (B)
Release Date: May 16, 2025
Podcast Description: Two self-described “proudly average” tech enthusiasts navigate the rapidly evolving world of AI—from the betting markets on best models to the nuts and bolts of practical tool use and the growing complexity of agentic AI. This episode provides a down-to-earth, sometimes irreverent look at industry progress, hype cycles, and the changing AI application layer.
Episode Overview
This preview episode revolves around which company has the leading AI model in mid-2025 and the rapidly changing landscape of large language model (LLM) systems. The hosts banter about their own bets on industry leaders, examine the current betting markets (via Polymarket), dissect model advancements and the plateau of core LLM performance, and dive deep into the future of tool use, agents, and practical AI integrations. The episode is rich in both industry commentary and hands-on insight, making it useful for listeners eager to understand shifts in AI utility and workplace applications.
Key Discussion Points & Insights
1. The AI Model Arms Race & The Betting Markets [00:03–04:46]
- Betting on Models: The episode opens with banter about which company is likely to have the top-rated LLM at the end of May 2025, referencing odds on Polymarket. Google's Gemini leads, with the hosts expressing surprise at OpenAI's low odds.
- A: “I'm losing money every day. It's depressing.” [00:30]
- B: “OpenAI is a 3% chance of having the best model at the end of the month… just not even close. Like that's kind of crazy.” [02:20]
- Current Leader: Google's Gemini 2.5 Pro is the acknowledged frontrunner, especially after recent coding advances and positive community feedback.
- B: “It was like they saw your bet. And then last week… announced on X that Gemini 2.5 Pro had gotten this upgrade.” [01:36]
- OpenAI & Anthropic: The hosts are flummoxed by OpenAI trailing not only Google but also upstarts like xAI and DeepSeek, highlighting the shifting leaderboard.
- Plateauing Improvements: There's a consensus that major LLM leaps (like the GPT-4 moment) are less likely; recent updates are more focused on tooling than on raw model intelligence.
2. Instabilities, Real-World Performance, and Model Switching [04:46–08:00]
- Reliability Concerns: The hosts note issues with Gemini 2.5 Pro (e.g., timeouts, a perceived drop in intelligence), reflecting on growing user expectations as reliance increases.
- "We've gone from literally trash talking Google, laughing at their failures, to now... they've become so reliant all of a sudden on Gemini 2.5." [05:15]
- Experimental Labels: The models are still labeled "experimental," yet users pay full price, leading to debates about enterprise reliability.
- A: “My argument is we're paying full price for the thing. Like we're paying for it. That is professional. Like if you're paying for something at that level, it should be reliable.” [06:18]
- Seamless Switching: Both hosts frequently alternate between models—Gemini, Claude, GPT—reflecting that most modern LLMs offer comparable performance for many tasks (a thin routing sketch follows this list).
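The model-hopping the hosts describe is easy to support at the application layer with a thin routing wrapper. Below is a minimal sketch in Python; the provider functions are hypothetical stand-ins, not the real google-genai, anthropic, or openai SDK calls, and the fallback logic is deliberately naive.

```python
from typing import Callable, Dict

# Hypothetical stand-ins for real provider SDK calls; the actual
# client APIs for each vendor differ from these toy functions.
def call_gemini(prompt: str) -> str:
    return f"[gemini-2.5-pro] {prompt}"

def call_claude(prompt: str) -> str:
    return f"[claude-sonnet] {prompt}"

def call_gpt(prompt: str) -> str:
    return f"[gpt] {prompt}"

PROVIDERS: Dict[str, Callable[[str], str]] = {
    "gemini-2.5-pro": call_gemini,
    "claude-sonnet": call_claude,
    "gpt": call_gpt,
}

def complete(prompt: str, model: str = "gemini-2.5-pro") -> str:
    """Route a prompt to whichever model is winning this week; fall back
    to another provider on timeouts or other errors."""
    try:
        return PROVIDERS[model](prompt)
    except Exception:
        fallback = next(name for name in PROVIDERS if name != model)
        return PROVIDERS[fallback](prompt)

print(complete("Summarize this episode", model="claude-sonnet"))
```

The design point is that when models are roughly interchangeable, switching costs collapse to a dictionary lookup.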
3. Context Windows, Tool Use, and Problem Solving [08:00–14:02]
- Anecdotal AI Brilliance: Michael recounts impressively detailed code troubleshooting by Gemini 2.5, attributing success to both model power and user-provided context.
- "It wrote back like a four page breakdown of its logic... and it nailed it like first go. It was absolutely brilliant and not an obvious solution at all." [08:44]
- The Importance of Context Size: Models with very large (100K to 1M token) context windows, like Anthropic's Claude Sonnet and Gemini, can process massive data chunks, which is a crucial differentiator, especially for tool-rich use cases.
- The Rising Importance of Tool Use: Context management and smart tool invocation are seen as the next competitive edge—being able to chain many tool results, synthesize them, and reason methodically (a budget-packing sketch follows this list).
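To make the context-budget point concrete, here is a minimal sketch of packing chained tool results into a large context window before asking a model to synthesize. The 4-characters-per-token estimate is a rough rule of thumb, not a real tokenizer, and the function names are illustrative.

```python
# Illustrative budget-packing: keep appending chained tool results until the
# (estimated) token budget is spent, then hand the packed context to a model.
def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)  # crude ~4 chars/token heuristic

def pack_context(question: str, tool_results: list[str],
                 budget_tokens: int = 1_000_000) -> str:
    used = estimate_tokens(question)
    kept: list[str] = []
    for result in tool_results:
        cost = estimate_tokens(result)
        if used + cost > budget_tokens:
            break  # even a 1M-token window eventually fills up
        kept.append(result)
        used += cost
    return question + "\n\n" + "\n---\n".join(kept)

packed = pack_context("What changed in Gemini 2.5 Pro?",
                      ["tool result 1...", "tool result 2..."])
```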
4. Tool Calling, Recursion, and Agentic Reasoning [14:02–20:26]
- Anthropic's Approach: New rumors suggest Anthropic's models will “think and think some more,” meaning iterative tool use coupled with reasoning loops to self-correct.
- B: “It can go back to reasoning mode to think about what's going wrong and self-correct…” [13:48]
- Multi-Step Tool Calling: The hosts point out that this multi-step, recursive approach to tool use (e.g., refining search queries based on results) is already being done by current LLMs, sometimes in a more transparent and easily observable way (a minimal loop sketch follows this list).
- Observability Matters: Transparency and the ability for users to “interrupt” actions are cited as hard requirements for real-world adoption, especially for high-stakes tasks.
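A hedged sketch of the recursive loop the hosts describe: the model proposes a tool call, sees the result, and either refines its approach or answers. The `call_model` and `run_tool` functions are toy stand-ins (no real LLM or search API is wired up); the printed log is the observability hook that lets a human watch and interrupt.

```python
import json

MAX_STEPS = 5  # hard stop so the loop cannot recurse forever

def call_model(transcript: list[dict]) -> dict:
    """Toy stand-in for an LLM call. A real model would return either
    {'tool': ..., 'args': ...} or {'answer': ...} based on the transcript."""
    if any(m["role"] == "tool" for m in transcript):
        return {"answer": "synthesized answer built from tool results"}
    return {"tool": "web_search", "args": {"query": transcript[0]["content"]}}

def run_tool(name: str, args: dict) -> str:
    """Toy stand-in for a real tool (search, code runner, database, ...)."""
    return f"{name} returned results for {args}"

def agent_loop(question: str) -> str:
    transcript = [{"role": "user", "content": question}]
    for step in range(MAX_STEPS):
        decision = call_model(transcript)
        if "answer" in decision:
            return decision["answer"]
        # Observability hook: log each action before taking it, so a human
        # watching can interrupt a bad trajectory.
        print(f"step {step}: {decision['tool']}({json.dumps(decision['args'])})")
        result = run_tool(decision["tool"], decision["args"])
        # Feed the result back so the model can self-correct, e.g. refine
        # a search query that came back empty.
        transcript.append({"role": "tool", "content": result})
    return "gave up after MAX_STEPS tool calls"

print(agent_loop("What upgrade did Gemini 2.5 Pro get last week?"))
```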
5. Trust, Human-in-the-Loop, and Practical Automation [15:54–22:31]
- Human Approval Required: The duo revisits earlier show discussions, emphasizing that most people will not trust AI to take significant action (like emailing a boss or editing production databases) without human approval (a gating sketch follows this list).
- A: “As much as I'm against the safety controls in AI models, this is the case where I'm thinking human in the loop. Some sort of approval is crucially necessary.” [16:44]
- Specialization via Tool Clusters: With hundreds or thousands of potential tools, models may need “clusters” or different agents/assistants with access to carefully scoped toolsets, enhancing both reliability and user trust.
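The human-in-the-loop idea reduces to a small amount of gating code. A minimal sketch, assuming an illustrative set of side-effecting tool names; anything on the list blocks on explicit user approval, while read-only tools run unattended.

```python
# Illustrative gate: tool names that cause side effects require a human yes.
REQUIRES_APPROVAL = {"send_email", "update_production_db"}  # assumed names

def gated_call(name: str, args: dict, tool_fn) -> str:
    if name in REQUIRES_APPROVAL:
        print(f"Agent wants to run {name} with {args}")
        if input("Approve? [y/N] ").strip().lower() != "y":
            return f"{name} was denied by the user"
    return tool_fn(**args)

# Usage: the agent can read freely, but emailing the boss blocks on approval.
gated_call("send_email", {"to": "boss@example.com", "body": "draft..."},
           lambda to, body: f"sent to {to}")
```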
6. Agent-to-Agent Protocols & The Future of SaaS [22:31–26:24]
- Agent Layers: AI “agents” may just be models bundled with tool clusters, instruction sets, and gating/approval logic—“just an abstraction layer with different instructions and some tools.” [23:10] (A data-structure sketch follows this list.)
- Reproducibility: Consistency matters—users want repeated, comparable outputs for similar queries (e.g., researching multiple stocks with the same methodology). No universal model can guarantee this alone.
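Taken literally, the "just an abstraction layer" framing maps onto a plain data structure. The sketch below uses assumed field names to show an agent as nothing more than a model, an instruction set, a scoped tool cluster, and gating logic; the `stock_researcher` example nods to the reproducibility point about researching stocks with one fixed methodology.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set

@dataclass
class Agent:
    """An 'agent' per the hosts' framing: a model plus instructions,
    a scoped tool cluster, and gating logic. Field names are assumptions."""
    model: str                                    # e.g. "gemini-2.5-pro"
    instructions: str                             # the instruction set
    tools: Dict[str, Callable] = field(default_factory=dict)  # scoped cluster
    needs_approval: Set[str] = field(default_factory=set)     # gating logic

stock_researcher = Agent(
    model="gemini-2.5-pro",
    instructions="Research every ticker with the same methodology, every time.",
    tools={"get_fundamentals": lambda ticker: f"fundamentals for {ticker}"},
)
```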
7. Skills, Pre-Tuned Prompts, and Customization [28:13–32:32]
- Emergence of “Skills”: The hosts cite experimentation with “skills buttons” (pre-tuned, task-specific prompting modules), echoing recent leaks that OpenAI is testing similar features.
- B: “One of the things we’re working on is the ability to train skills…” [29:08]
- A: “You limit what’s available to a particular skill...you give it a guideline around the effort, the methodology, what needs to happen, but then you allow it to use its intelligence…” [30:10]
- From Simple Tool Calls to Composite Workflows: Skills become complex clusters of actions—users define effort, constraints, and permissions, while models provide reasoning within these bounds (a minimal sketch follows this list).
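A skill, as described here, can be sketched as a pre-tuned prompt module: the guideline, effort level, and tool permissions are fixed by the user, and the model reasons inside those bounds. Field names are assumptions for illustration, not details of OpenAI's leaked feature.

```python
from dataclasses import dataclass

@dataclass
class Skill:
    """A pre-tuned, task-specific prompting module: the user fixes the
    methodology, effort, and permissions; the model reasons inside them."""
    name: str
    guideline: str           # the methodology: what needs to happen
    effort: str              # e.g. "thorough" vs. "quick pass"
    allowed_tools: tuple     # permissions: what this skill may touch

    def to_prompt(self, task: str) -> str:
        return (f"Skill: {self.name}\n"
                f"Methodology: {self.guideline}\n"
                f"Effort: {self.effort}\n"
                f"Allowed tools: {', '.join(self.allowed_tools)}\n\n"
                f"Task: {task}")

policy_summary = Skill(
    name="policy_summary",
    guideline="Quote the official policy verbatim before paraphrasing it.",
    effort="thorough",
    allowed_tools=("policy_search",),
)
print(policy_summary.to_prompt("Summarize the leave policy"))
```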
8. MCPs (Model Context Protocols), Connectors, and the Next AI Platform Wars [32:32–45:19]
- Explosion of MCPs: There's excitement about the proliferation of MCPs, which allow easy addition of capabilities, but managing tool conflicts and curating desirable tool clusters becomes crucial (a namespacing sketch follows this list).
- Vendors Reacting, Not Leading: The hosts observe that many AI lab moves (like OpenAI's rushed MCP integration) are reactive to competitor advances and community buzz, not evidence of secretive breakthrough tech.
- “It just makes me kind of curious. Like, it feels very reactive. Not like they're leading anymore, even in those areas.” [39:41]
- API vs. Native LLM Tool Use: There's debate about whether tool calling should be tightly integrated into the core model (e.g., GPT-5 as a “model router”) or managed at the application layer.
- Walled Gardens, Lock-In, and Competition: As apps build deep MCP integrations and skills banks, platform lock-in becomes a concern, though the hosts remain optimistic about open protocols and self-hosting.
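One concrete pain point with many MCP connectors is tool-name collisions (two servers each exposing a `search` tool). Below is an illustrative application-layer registry that namespaces tools by server; this is a sketch of one possible convention, not a mechanism from the MCP spec itself.

```python
# Illustrative registry: qualify each tool with its server's name so two
# connectors can both expose `search` without clobbering each other.
class ToolRegistry:
    def __init__(self) -> None:
        self._tools: dict[str, object] = {}

    def register_server(self, server: str, tools: dict) -> None:
        for name, fn in tools.items():
            qualified = f"{server}.{name}"
            if qualified in self._tools:
                raise ValueError(f"duplicate tool: {qualified}")
            self._tools[qualified] = fn

    def call(self, qualified_name: str, **kwargs):
        return self._tools[qualified_name](**kwargs)

registry = ToolRegistry()
registry.register_server("github", {"search": lambda q: f"github hits: {q}"})
registry.register_server("docs", {"search": lambda q: f"doc hits: {q}"})
print(registry.call("docs.search", q="context windows"))
```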
9. SaaS Disruption and Marketplace Dynamics [45:19–57:02]
- MCP Hosting & Commercialization: Expect a rise in cloud platforms for hosting, gating, and monetizing MCP connectors (with fine-grained permissions and account controls). Proprietary data vendors (e.g., Bloomberg) may make bespoke MCPs available for a fee (a metering sketch follows this list).
- B: “Imagine… Bloomberg having access to all their data in an MCP…and just… taking a toll on that data.” [51:17]
- From Connectors to Agents as a Service: SaaS companies may pivot to offering “Agent as a Service” endpoints with embedded skills, rather than exposing raw APIs.
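The "Bloomberg toll" idea boils down to entitlement checks plus usage metering in front of a data tool. A minimal sketch with hypothetical account and dataset names; real billing and auth would live in the hosting platform.

```python
from collections import Counter

# Hypothetical entitlements: which datasets each account has paid for.
ENTITLEMENTS = {"acct_123": {"market_data"}}
usage = Counter()  # per-(account, dataset) call counts for later billing

def metered_tool(account_id: str, dataset: str, query: str) -> str:
    if dataset not in ENTITLEMENTS.get(account_id, set()):
        raise PermissionError(f"{account_id} is not entitled to {dataset}")
    usage[(account_id, dataset)] += 1  # the 'toll': meter every call
    return f"{dataset} results for {query!r}"

print(metered_tool("acct_123", "market_data", "AAPL earnings"))
print(usage)
```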
10. Current Flaws in AI Product Design [61:26–68:44]
- Critique of App-Based AI Integrations: The hosts humorously roast inefficient “AI everywhere” features in products like Notion, Canva, and Atlassian.
- A: “Now we've added AI. It's so good. Like, seriously. Unless all your docs are already in there and you consume them via MCP into something else… but no one is logging into Notion as their, like, starting point each day as their command center to get things done. It's ridiculous.” [61:28]
- AI Chatbots Everywhere: The proliferation of sub-par chatbots is seen as a distraction from genuinely empowering users. The hosts instead advocate for deep, workflow-integrated AI that operates as a background agent, rather than constant context switching between superficial chat interfaces.
11. Where Are We in the "Year of Agents"? [69:45–77:44]
- Are Agents Delivering Now? The hosts reflect on the reality gap between “the year of agents” hype and the present reality. Agents are augmenting human workers, not replacing them, and the biggest productivity gains come when humans leverage agents for background tasks, not total automation.
- Trust, Memory, and “Training” Agents: Long-lived workflows give rise to chats or agents that users get attached to—“it knows what needs to be done." Consistent planning, toolchain memories, and customizable interfaces are seen as the next big breakthroughs.
12. Outlook: Tool Use, Reliable RAG, and the Shift to System Layer Innovation [77:54–84:01]
- RAG (Retrieval-Augmented Generation) and Tool Use: As models grow, simply dumping all data into a prompt is less practical; effective tool-based research modules (including internal data and long-term memory) are more important and must be controllable on a per-assistant basis.
- Reduced Hallucinations: Increased tool use and targeted retrieval have been found to reduce LLM hallucinations. The hosts call for business-grade assistants configured for guaranteed behaviors—e.g., always citing official policies—over mere model improvements (a scoped-retrieval sketch follows this list).
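A rough sketch of the per-assistant, citation-first retrieval the hosts call for: the assistant searches only its own scoped store, and every retrieved excerpt carries its source ID so "always cite official policies" can be enforced mechanically. The keyword-overlap scorer and the policy snippets are toy stand-ins for real embedding search over real documents.

```python
# Toy per-assistant retrieval: search only this assistant's scoped store and
# carry sources through, so 'always cite the policy' is enforceable.
def retrieve(store: list[dict], query: str, k: int = 2) -> list[dict]:
    words = set(query.lower().split())
    return sorted(store,
                  key=lambda d: len(words & set(d["text"].lower().split())),
                  reverse=True)[:k]

hr_policy_store = [  # illustrative snippets, not real policy text
    {"text": "Annual leave accrues at 1.67 days per month.",
     "source": "leave-policy-v4"},
    {"text": "Remote work requires manager approval.",
     "source": "remote-policy-v2"},
]

def build_cited_context(query: str) -> str:
    hits = retrieve(hr_policy_store, query)
    # A real assistant would put this in the prompt with an instruction like:
    # "Answer only from these excerpts and cite each source in brackets."
    return "\n".join(f"{h['text']} [{h['source']}]" for h in hits)

print(build_cited_context("how much annual leave do I accrue"))
```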
Notable Quotes & Timestamps
- “It just blows my mind that it's so low…and that they have so much faith that xAI, which is significantly higher, is going to somehow come out with a better model than OpenAI and Google. I just, I just don't see that happening.” (B, 02:51)
- “This is why I'm not like a stock trader or any of those things, because I'm not good. But, you know, I guess my point was that anything can change with this stuff.” (A, 01:59)
- "It wrote back like a four page breakdown…can you figure out what's going on? And it nailed it like first go. It was absolutely brilliant…" (A, 08:44)
- “The rise of very soon is MCP hosting, like as in platforms, sort of like Cloudflare, Netlify, that are like, okay, we will host your MCPs…” (A, 48:44)
- “Now we've added AI. It's so good. Like, seriously. Unless all your docs are already in there and you consume them via MCP into something else… but no one is logging into Notion as their, like, starting point each day as their command center to get things done. It's ridiculous.” (A, 61:28)
- “We're going to get these sort of like trained states that it reaches where it's like, this can now actually do part of my job for me.” (A, 70:34)
- “It's just about time now to implement these ideas and make them work… The future is not written yet—like, there's a lot of opportunity here.” (B, 82:45)
- "My suggestion would be OpenAI, like, from the leaked screenshots we've seen, having a bunch of connectors is not enough…you, the user, is still going to have to be very, very specific in your prompting to get it to do useful things with those connectors." (A, 83:32)
Summary
In their trademark dry, self-deprecating style, Michael and Chris Sharkey deliver a fast-moving, insight-rich preview of where the AI industry stands in mid-2025: Google is on top for now, but tool integration and the “system layer” have become more important than raw model leaps. The discussion zeroes in on the practicalities of agent-based systems, the vital importance of observability, human trust, and skill/cluster-based tool control, and expresses healthy skepticism about the value of throwing undifferentiated chatbots everywhere.
They predict an emerging landscape where the most valuable advances will come not from marginal accuracy jumps in LLMs, but from flexible, composable tool and skill platforms—augmented by memory, planning, and fine-tuned agent behaviors—ready to finally automate some of the tedious, repetitive “busywork” tasks professionals face. The episode closes with anticipation for imminent Google and OpenAI announcements, resigned acceptance of another betting loss, and their usual call for average, “adequately okay” engagement with the world of AI.
For those who want a fast take:
The future of AI in 2025, according to the Sharkey brothers, is less about whose LLM is mathematically superior, and more about building practical, transparent, and trustable systems that bring AI’s skills to bear on real tasks—without the smoke, mirrors, and endless context switching of today’s many half-baked AI integrations.
