Podcast Summary: Moonshots with Peter Diamandis
Episode #218: Why We Need New AI Benchmarks, Which Industries Survive AI, and Recursive Learning Timelines
Date: December 23, 2025
Guest: Matt Fitzpatrick (CEO, Invisible Technologies; former Global Head of QuantumBlack Labs, McKinsey)
Host & Panel: Peter Diamandis (Host), Alex, Dave, Salim
Episode Overview
This episode examines how AI is transforming industries, why outdated benchmarks are holding enterprises back, and what companies must do to survive the coming wave of AI-driven disruption. Featuring Matt Fitzpatrick, CEO of Invisible Technologies and a leader in applied AI R&D, the conversation covers the practical challenges facing companies that aim to become "AI companies," the critical need for narrow, industry-specific AI benchmarks, the tension between startups and legacy giants, and predictions for accelerating AI capabilities and widening organizational lag in 2026.
Key Discussion Points and Insights
1. The Coming Disruption: Are All Companies Becoming AI Companies?
- Scope of Impact: Not All Industries Will Change Equally
  - Matt Fitzpatrick: "I don't think, and I think all the data that has come out on this so far, that all industries are going to be impacted equally by this. ...Areas like media, legal services, business process outsourcing—there are many sectors where the structure of what the industry does is going to change." (04:55)
  - Sectors like oil & gas or real estate will see less disruption in core functions, though parts may adapt with AI.
- Challenges for Small and Mid-sized Enterprises
  - "If you're a 50 person company, it's hard to deploy a lot of this stuff at scale if you don't even have a CTO in house." (05:38)
  - Many companies must decide whether to build AI capabilities in-house or "rent" them from third parties.
2. Startups vs. Legacy Enterprises: Who Moves First?
- Speed Is a Key Differentiator
  - Peter: "Your competition isn't really the large, you know, multinational. It's the AI native startup that came out of no place, that's reinvented themselves from the ground up as an AI first company." (06:52)
  - Matt Fitzpatrick: "Do the startups get distribution before the big companies build the technology? ...that will be the tension..." (07:34)
- Specialization and Benchmarking
  - Enterprises need to create task- and context-specific benchmarks rather than rely on broad public ones (see below).
3. The Critical Need for New AI Benchmarks
- Limitations of Public Benchmarks
  - Matt Fitzpatrick: "Most of the public focus to date has been on the large public benchmarks for things like coding... your benchmark for most cases is not a broad based, accurate kind of cognitive benchmark. It's accuracy or human equivalence on a specific task." (21:08)
  - Enterprises must develop "custom evals" for every task or vertical they wish to automate or augment with AI (see the sketch after this section).
- Opportunities for Industry Specialists
  - Dave: "All of this benchmarking within these domains is really, really hard to figure out. Unless you know [the industry], that's my benchmark to own...you become an instant star." (22:37)
  - Alex: "We need thousands of new narrow benchmarks to capture maybe every labor category, every industry vertical?" (23:22)
  - Matt: "Yes, we do spend quite a bit of time working on that." (23:54)
4. Case Studies and Examples
- Charlotte Hornets—Scouting with Computer Vision
  - Invisible Technologies built computer vision models to analyze player movement across many types of video feeds for NBA draft preparation.
  - Matt: “We fine-tuned a custom computer vision model to specifically look at moving patterns they were interested in before the draft.” (29:23)
- LifespanMD—Data Aggregation for Healthcare
  - Created a HIPAA-compliant multi-tenant cloud to structure and integrate health data across practices, enabling better outcome tracking and chat-based knowledge management.
  - Matt: “Before you can even start with AI, you have to make sure that you have the structured and unstructured data together that you want.” (32:41)
- SAIC & US Navy—Autonomous Underwater Swarms
  - Invisible worked on intelligence and movement pattern analysis for underwater drone swarms.
- Swissgear—Inventory Forecasting
  - Brought together 750+ data tables to optimize inventory forecasting, increasing SKU reliability by 2x in months. (64:41)
5. Adoption and the Human in the Loop
- The Klarna Example: AI Contact Centers
  - Klarna initially rolled out fully automated AI for customer service and announced savings of $40M/year, then rolled it back and returned to human agents.
  - Matt: “The whole movement from all humans, [to] all agents, back to all humans was confusing. You'd never want to move to doing everything agentic. You're going to want humans in the loop in almost any topic.” (13:37, 15:21)
  - Key reasons for the reversal: customers sometimes want human contact, exception handling is complex, and nonstandard scenarios break full automation. (A minimal escalation sketch follows this section.)
- Why “Let a Thousand Flowers Bloom” Fails
  - Matt: "Make sure you have a list of two to three things that move the needle. ... Get to a proof of concept in one of them... tie into outcomes to limit your risk." (17:49)
  - Focus on value; don't experiment everywhere without operational KPIs.
- Human Expertise and RLHF
  - The panel debates whether human expertise in model fine-tuning (reinforcement learning from human feedback, or RLHF) will be rendered obsolete by automated AI researchers.
  - Matt is adamant: "Pairing synthetic and human data together is stronger. ...as the models move more and more into very specific areas, there is more and more RLHF needed for them." (41:38)
  - The balance is shifting, but not as fast as some AGI advocates claim.
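To ground the humans-in-the-loop point, here is a minimal Python sketch of an escalation gate of the kind a contact-center deployment might use: the AI reply only ships when the case is standard, the model is confident, and the customer has not asked for a person. The `AgentResponse` shape, keyword list, and 0.85 threshold are hypothetical illustrations, not anything described in the episode.

```python
# Minimal human-in-the-loop routing sketch, loosely inspired by the contact-center
# discussion above. The keywords, threshold, and AgentResponse fields are assumptions.
from dataclasses import dataclass

ESCALATION_KEYWORDS = {"human", "agent please", "complaint", "legal", "fraud"}
CONFIDENCE_THRESHOLD = 0.85

@dataclass
class AgentResponse:
    reply: str              # the AI-drafted answer
    confidence: float       # estimated probability the answer is correct
    is_standard_case: bool  # did the request match a known, well-tested intent?

def route(customer_message: str, agent: AgentResponse) -> str:
    """Ship the AI reply only when it is safe; otherwise hand off to a human."""
    wants_human = any(k in customer_message.lower() for k in ESCALATION_KEYWORDS)
    if wants_human or not agent.is_standard_case or agent.confidence < CONFIDENCE_THRESHOLD:
        return "ESCALATE_TO_HUMAN"
    return agent.reply

if __name__ == "__main__":
    msg = "I think this charge is fraud, let me talk to a human"
    print(route(msg, AgentResponse("We have refunded the charge.", 0.92, True)))
    # Prints ESCALATE_TO_HUMAN: the keyword match wins even though confidence is high.
```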
6. Data as Foundation and Bottleneck
- Data Quality Over Quantity
  - Matt: “If you tried to build an AI agent on fragmented customer and product data, it's going to break by definition.” (36:29)
  - Only the data relevant to the use case needs to be made clean—don't wait five years to fix everything enterprise-wide. (37:21) (A minimal scoping sketch follows this section.)
  - Many companies have hundreds of siloed databases, which creates immense integration challenges. (39:54)
- Proprietary Data & Security
  - Many sectors will never let their most valuable data go to public models—banks, healthcare, traders—but not all data is equally sensitive.
  - Matt: "Be clear about the data you need to keep proprietary ... [But] I don't think the paradigm of 'I will not give anything to the model, I'll keep it all in house' ... makes sense either." (54:51)
Notable Quotes and Memorable Moments
- On the Future of Knowledge Work:
  Alex: “I said knowledge work is cooked. Not knowledge workers, not companies. Knowledge work as we currently know it.” (03:59)
- On Why Projects Fail:
  Matt: “The failure mode on that has been you let a thousand flowers bloom, none of them have an operational metric, and you kind of end up with a science project dynamic.” (62:39)
- Matt’s Self-described Role:
  “Our founder Francis has an idea of: do you have all the ingredients to build a cake but you don't have a cake? What we do is we actually bake the cake. ... We make AI work.” (78:09)
- On Human Roles that Will Stay Longest:
  Matt: “All the jobs that involve human interaction, physical work ... the job ecosystem around data centers, electricians, etc. is going to become way more in demand.” (70:18)
- On RLHF:
  Matt: “We are, as a company, fully a believer ... that human-in-the-loop is going to be a feature, not a bug, for a long, long time. ... [The claim is that] autonomous agents will do all of this with no humans—I actually think you're going to need more and more humans at every step.” (44:41)
Timestamps for Important Segments
- 04:53 Can every company become an AI company? Which industries are most and least at risk?
- 07:34 Do startups disrupt before incumbents adapt?
- 13:37 Klarna’s failed AI contact center experiment; importance of humans in the loop.
- 17:49 Practical roadmap for companies starting their AI journey.
- 21:08 Why the world needs thousands of new, narrow AI benchmarks for every use case.
- 29:23 Case Study: Charlotte Hornets, computer vision for pro sports scouting.
- 32:41 Case Study: LifespanMD, data aggregation for healthcare transformation.
- 36:29 The importance of clean data; why most AI projects fail.
- 41:38 Will recursive self-improving AI replace ML freelancers and RLHF?
- 54:51 Companies’ strategies for AI and data privacy/proprietary data.
- 61:09 Organizational structure: why operational KPIs—not science projects—matter.
- 64:41 More case studies: US Navy underwater drones, Swissgear inventory.
- 67:08 2026 Predictions: Multi-agent teams, multimodal leap, and RL “mirror world”.
- 70:18–75:02 Which jobs will be last to fall to AI? Last three expert roles.
2026 Trends and Predictions
Matt Fitzpatrick’s Forecasts (67:08–69:27)
- Rise of Multi-Agent Teams:
  “You'll train task specific agents for individual tasks, usually orchestrated by an LLM... That's been an architecture that's been discussed, but we're just starting to see green shoots of success.” (A minimal orchestration sketch follows this list.)
- Multimodal Leap:
  “Video, images, audio are going to become a bigger part of how people engage with these models. I don't think it will all be text-based.”
- Mirror World / RL Gyms:
  “Simulated environments or digital twins for tasks... so you can actually test how it's going to work before you roll it out to your physical world.”
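For readers who want to picture the multi-agent pattern Matt describes, here is a minimal Python sketch: task-specific agents behind a single orchestrator. A simple keyword router stands in for the orchestrating LLM, and the agents are stubs; all names and routing rules are assumptions made for illustration.

```python
# Minimal sketch of the "task-specific agents orchestrated by an LLM" pattern.
# A keyword router stands in for the orchestrating LLM; agents are stubs.
from typing import Callable, Dict

def research_agent(task: str) -> str:
    return f"[research notes for: {task}]"

def data_agent(task: str) -> str:
    return f"[query results for: {task}]"

def writer_agent(task: str) -> str:
    return f"[draft report for: {task}]"

AGENTS: Dict[str, Callable[[str], str]] = {
    "research": research_agent,
    "data": data_agent,
    "write": writer_agent,
}

def orchestrate(task: str) -> str:
    """Stand-in for the orchestrating LLM: pick the specialist, then delegate."""
    lowered = task.lower()
    if "query" in lowered or "table" in lowered:
        choice = "data"
    elif "draft" in lowered or "report" in lowered:
        choice = "write"
    else:
        choice = "research"
    return AGENTS[choice](task)

if __name__ == "__main__":
    print(orchestrate("Query the inventory tables for stock-outs"))
    print(orchestrate("Draft a report summarizing Q3 churn drivers"))
```

In a real deployment the `orchestrate` step would itself be an LLM call that selects and sequences specialists, and each agent would wrap its own tools, data access, and narrow eval.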
Actionable Takeaways for Enterprise Leaders
- Don’t Boil the Ocean: Start with 2-3 high-value use cases, not broad experimentation.
- Get Data Right: Focus data-cleaning efforts very narrowly on relevant data, not organization-wide.
- Measure Real Outcomes: Let operational leaders run the AI work, and commit to measurable KPIs.
- Consider Partnering: Most companies, especially mid-sized, are better off renting or partnering for AI before trying to build it in-house.
- Embrace Human-in-the-Loop: For both tasks and model development (RLHF), expertise and quality still hinge on human feedback—especially in specialized and regulated domains.
- Develop or Adopt Narrow Benchmarks: Industry specialists who build these will lead their verticals.
- Rethink Organization: AI-first is about more than automation—consider disruptive structures and innovation-at-the-edge, not merely adapting existing workflows.
Panel’s Final Thoughts & Closing Questions
- Human Expertise: Cooked or Still Essential?
  Alex posits three scenarios for the "last standing" roles: lawmakers/politicians, top intellects, or high-touch, authentic human roles (therapy, negotiation, etc.). (73:26)
- Organizational Models for AI Adoption:
  Salim and Peter: Organizations should emulate Apple's stealth "disrupt from the edge" model rather than letting many unfocused pilot projects meander.
- Government & Social Impact:
  Matt: "AI-assisted permitting could cut energy and data center project implementation timelines by 50%... AI could shrink public sector process cycle timelines by 70%." (75:02)
Notable Soundbites
- On Enterprise Caution:
  Dave: “Enterprises are going to move super stupidly slowly compared to AI capabilities. ...That’s going to frustrate the hell out of Google and OpenAI.” (02:50, 46:53)
- On Opportunities in Benchmark Creation:
  Dave: "If you declare yourself the owner of [a benchmark] and then broadcast it, ...you become an instant star." (22:37)
- On Expertise and Specialization:
  Matt: "Human expertise becomes more and more important in many different areas. ...The human touch elements become more and more important." (51:33)
For more: Visit InvisibleTech AI to learn about enterprise AI solutions. To follow Moonshots and tech trends, subscribe at diamandis.com/metatrends.
