
Matt Fitzpatrick
Most of the public focus to date has been on the large public benchmarks for things like coding. I think the problem is though.
Peter
Matt Fitzpatrick, CEO of Invisible Technologies.
Matt Fitzpatrick
We're an infrastructure company. What that infrastructure allows us to do is what I would call hyper personalize software at scale.
Alex
It sounds like your position is we need thousands of new narrow benchmarks to capture maybe every labor category, every industry vertical.
Matt Fitzpatrick
That is an interesting second part of this, which is.
Peter
We're going to see the largest disruption ever in 2026 from companies that don't make this change.
Matt Fitzpatrick
There are many sectors where the structure of what the industry does is going to change. If you think about knowledge work, the production of large amounts of documentation, that's where these technologies are very disruptive. I think the question is which parts of your business can really change with AI?
Peter
What are you seeing most companies get wrong on their mission to implement AI?
Matt Fitzpatrick
You've got two kind of different challenges. One is.
Dave
Now that's a moonshot, ladies and gentlemen.
Peter
Everybody, welcome to Moonshots. In today's episode, we're going to be discussing why all companies need to become AI companies in 2026, how they do that, what happens if they don't. We'll discuss whether or not big legacy companies can even make such a dramatic change and how they can best do it. We're going to be going over some fun and meaningful AI use cases. So listen up. I think they're ones that are going to sort of get you excited about what you can do. And we'll dive into some predictions from our guest for 2026 today. Joining the Moonshot Mates is a friend of the pod, Matt Fitzpatrick, who for more than a decade was at McKinsey, rising to the position of global head of Quantum Black Labs. I love that name, Quantum Black. It's so cool. Leading the firm's AI software development, R&D and global AI products. A year ago, Matt joined as the CEO of Invisible Technologies, a company started by a brilliant friend of mine, Francis Pedraza. For those who don't know Invisible, the company is a modular AI software platform that uses AI training and provides AI training for most of the large language model providers out there, building custom workflows and agents for enterprise. They anchor their work in creating clean data and human in the loop delivery to ensure measurable business results. Matt, welcome. Good to have you here.
Matt Fitzpatrick
Hey, thank you for having me.
Peter
We got DB2, AWG, Salim also. Happy holidays, guys. It feels like we're on this pod every other day. I think we should just move into a large sort of podcast house.
Salim
We will have documented the singularity.
Dave
Well, you know, I'm really looking forward to hearing from Matt because on Thursday we have to do our predictions for next year and Matt is going to give us a ton of insight today. One of my predictions out of the gate is that enterprises are going to move super stupidly slowly compared to AI capabilities. And Matt is the world leading expert on the intersection between AI and enterprise. So I cannot wait for this.
Peter
You can't cheat this way, Dave, and sort of use Matt's predictions as yours.
Dave
I can't. No, you won't be there.
Peter
Everybody listening to this pod will know that. All right, well, take good notes if nothing else. You guys ready to jump in? Matt, I'm going to kick it off with a question, sort of a broad question for you. And here it is. So within the past year, we've heard from every company out there and every CEO that we're going to be pivoting to become an AI company. Salim, in the last pod you said something like, we're going to see the largest disruption ever in 2026 from companies that don't make this change. And I think, Alex, the term you used is they're going to be cooked if they don't.
Alex
I said knowledge work is cooked. Not knowledge workers, not companies. Knowledge work as we currently know it.
Peter
Aha. Okay, so you don't think that companies are going to be cooked that don't make the transition to AI?
Alex
I think we're going to see many more companies over time and many more smaller companies as well.
Peter
Okay, well, we're going to dive into that.
Salim
Look, in an earlier episode, we pointed out that when you thought you had product market fit and you're scaling a SaaS company, you're like toast. Because everything needs to be rethought now given AI. So this is now applying to big companies also.
Peter
All right, so, Matt, the question to kick this off is, can every company truly become an AI company and how? And then which companies and industries do you think need to disrupt themselves now before they become basically irrelevant? So that's a, you know, a softball question to kick us all off here.
Alex
Peter. Saving the hardballs for me, Matt, always.
Matt Fitzpatrick
I think your second question relates to your first question in some ways, which is I don't think, and I think all the data that has come out on this so far supports this, that all industries are going to be impacted equally by this. I think there are some sectors and areas where you're going to see materially different impacts. I think areas like media, legal services, business process outsourcing, there are many sectors where the structure of what the industry does is going to change. If you think about knowledge work, the production of large amounts of documentation, that's where these technologies are very disruptive. Where I think the hype has been a bit overblown is if you take a lot of sectors like oil and gas or real estate, the function of what they do is going to stay pretty consistent. And I think actually most of the good analytics of how job dynamics will change over the next couple of years will get at this. Like the decision on which apartment building or which office building to buy is going to function pretty similarly to what it did five, six years ago. And so I think the question is, which parts of your business can really change with AI? It's not all of them and some sectors will be more or less. And then the second part of your question, which is also an interesting one, which is, can everyone actually become an AI company? And that is an interesting second part of this, which is, there are not that many people that know how to build these sorts of models or deploy these sorts of models well. And so one of the big challenges is do you have the expertise in house to do this? How do you think about adjusting the operating functions of your company to do it? Is it the same team you have in your IT function doing it now?
And particularly, Peter, I know like the set of folks that you and I have spoken with in the past, like small businesses, if you're a 50 person company, it's hard to deploy a lot of this stuff at scale if you don't even have a CTO in house. And so I think there's a mix of is your industry going to fundamentally change and then what are the actual core competencies your company has to implement it? And so I think what you're going to end up finding. Yeah, yeah.
Peter
So I mean, do you end up bringing a chief AI officer into your company? Are you going to bring that capability or are you basically renting it? I mean, part of the other thing that's going on right now, we've talked about this on the pod a lot. Is your competition isn't really the large, you know, multinational. It's the AI native startup that came out of no place, that's reinvented themselves from the ground up as an AI first company, Right?
Dave
Yeah, down and dirty, Matt. Like which happens first. I can get a mortgage by talking to an AI and get it done in under an hour or we're walking on Mars with our own two feet. Which of those two things is going to happen in the real world first?
Matt Fitzpatrick
Yeah, so the way I've often heard the question asked is, do the startups get distribution before the big companies build the technology? And I do think that will be the tension in a lot of ways. And look, I think there's a lot of big established companies that are going to figure out how to do this really well. I think if you take a sector like legal services, I do think the big law firms will figure out how to use a lot of this over time. I think banking is a really interesting sector to look at right now. If you look at the age of the application footprints in banking, most of the tech that exists in banking is north of 20 years old. And so you do have a bunch of very fast moving newer fintechs that are approaching it different ways. Companies like Revolut. I don't know how that plays out, but I do think that becomes the question in a lot of ways: which moves faster, the emerging entrants or the modernizations of the existing? I think, Peter, to hit on what you were asking also, the second part of that is, do you buy or rent? I think that's something you've got to be really honest with yourself about as a company. Right. And I think the idea that everyone can buy, everyone can hire people to do this is challenging. The challenge of trying to adapt an existing IT function to do this is many of the skill sets that people hire for. Even like, do they know Python, things like that, there are gaps in that. And so I think the answer that most companies I've seen who don't have the resources in house or being just directive about how to push that is they are finding ways to rent or buy this externally and to partner with folks that can allow them to do it.
Peter
Every week, my team and I study the top 10 technology metatrends that will transform industries over the decade ahead. I cover trends ranging from humanoid robotics, AGI and quantum computing, to transport, energy, longevity and more. There's no fluff, only the most important stuff that matters, that impacts our lives, our companies and our careers. If you want me to share these metatrends with you, I write a newsletter twice a week, sending it out as a short 2 minute read via email. And if you want to discover the most important metatrends ten years before anyone else, this report's for you. Readers include founders and CEOs from the world's most disruptive companies and entrepreneurs building the world's most disruptive tech. It's not for you if you don't want to be informed about what's coming, why it matters and how you can benefit from it. To subscribe for free, go to diamandis.com/metatrends to gain access to the trends 10 years before anyone else. All right, now back to this episode.
Dave
I think, I think legal and accounting are really, really cool case studies and I know you know more about this. You know your McKinsey time, Quantum Black, you're like the guy understanding and parsing all of this. But they're really cool because they can be replaced by a startup, you know, like Harvey.
Matt Fitzpatrick
Dave, two things I'd say about that. I think one challenge of the implementation of GenAI in the enterprise setting is having a statistically validated baseline to compare against. Right. And so as an example, if you take something like mortgage underwriting, it has made huge progress, like in a very positive way. Actually the percentage of mortgage underwriting that's now done by a very guardrailed and very effective set of algorithms developed by the banks is pretty high because they can back test and say this is a correct credit decision that has no redlining or anything else. But if you think about it, the reason contact centers has been one of the cases where we've seen a lot of adoption of this is you do have a clear baseline, right? Like time per call, CSAT, cost per call. You have a set of metrics you can compare against. Something like let me generate an investment memo, which is different in format at every firm, it could be 10 pages versus 40 pages. The content is different. It's been harder for folks to build baselines. I do think that's why legal services is an interesting one: there are certain areas of legal where those baselines are clear. Like you can look at what documents are really good for like an IS agreement. But I think you're going to see this in a lot of different segments. I think the high end of that market still persists in a really differentiated way, which is if you're doing a large M&A transaction, you're still going to want a really good lawyer's advice. And where it changes I think is in the more very basic, like produce an NDA, type of work. And I think that's going to be one of the shifts again, is that really, really good human guidance is going to persist forever. It's the basic commodity information that right now a lot of people are paid probably excessive amounts of money to do.
Dave
Yeah, well, the NDA is pretty extreme. But I'll tell you, the venture fundings that we do, we do tons of these every year. And the term sheets always say you, the company that we're investing in, will bear the costs of the legal, capped at $50,000. And then the documents are freaking identical every single time. There's like eight knobs and you could store all combinations on the smallest thumb drive in the world. How is this like a $50,000 job? And it always runs up to $49,999.99. It's like whoa, what a miracle. So I don't know. That to me feels like that would be on sort of the mid to hard end of the scale, yet it's still so doable. An NDA is a no brainer. Mortgages are no brainers.
Matt Fitzpatrick
I completely agree. I think what's been interesting though is how slow the actual adoption curve has been in say contact centers because contact centers should have had, I mean generally CSAT scores or people don't really like most contact center actions. The kind of general customer feedback you get is pretty unhappy. And that's been true for a decade. So you would have expected.
Dave
The whole Klarna thing. Actually, I know you're an expert on this, like the Klarna thing has been really interesting to watch. Tell us the story.
Peter
What is the Klarna thing?
Matt Fitzpatrick
I was not involved in Klarna in any way, but I can say at least what I know from reading about it and what my hypothesis would be. I mean basically Klarna announced that they were going to move entirely towards a fully end to end agent contact center. And then a couple of months later. And by the way, the interesting thing was at that time they were the most frequently cited example of agentic success in deployments. And then about eight to 12 months later they basically announced they were rolling the whole thing back and moving back entirely to human contact center agents. And I found the entire evolution kind of interesting because if you think of how these systems should be deployed, like a multi agent system, the way it should work is you'd have kind of an orchestration of what the types of calls are. You'd have a set of validations on which calls could go well or badly and you'd have some sense of where you need escalations to human agents versus where you're not going to. So you actually would never want to move. And I think this is a theme Invisible has: you're never going to want to move to doing everything agentic. You're going to want humans in the loop in almost every industry and almost any topic. Because I think actually, if these models are trained off of precedent data, then you can train them really well to kind of continue that logic. You're going to want humans for some of the things where you don't really have precedent data. You need them to work through complex things where you don't have enough historical information. And so I found the entire structure of how the change happened quite confusing because you would always want to keep a contact center to be a mix of humans and agents and then evolve the mix between those and on which topics. And so the whole movement from all humans, to all agents, back to all humans was confusing, I think.
Peter
Salim, you are pregnant with a question.
Salim
No, no, I just wanted to give out some details here. So the Klarna situation, they rolled out an AI to do customer service calls and the claim was that in the first month it did the work of 700 full time agents, handled 2.3 million calls a month, and they projected that it would save them 40 million a year. And they were like really proudly saying this was like month one and it's only ever going to get better from here. When I saw that I was like, okay, if I was doing that, this sounds like a PR exercise more than anything real because you'd never put that out in the first month. You'd wait a couple of months to see what exactly happened. And Matt, you may be able to give a little more color on why did they roll it back in the end? Did they find the hard cases were too many, the exception handling was too much, or was it a cultural backlash? What was it exactly that had them undo the whole thing?
Matt Fitzpatrick
I don't know, in the sense I haven't worked with Klarna, but I think you hear a variety of different pieces of feedback on why folks have struggled in contact centers. I think one reason is there are cases in which humans just want to talk to another human. And so I think some of the PR of saying we're moving to only agents has its challenges. I think two, a lot of the challenges and where contact centers are most sensitive is non first line call resolution topics. So it's not something like check your balance. It might be something like process a refund. Right. Something that's pretty complex. You have to write back to the source systems. And it was surprising to me how quickly they rolled that out. And I wonder how well it was able to deal with some of the more complex functionality in that example.
Salim
Right. Can we get between Level 1 to Level 2 and 3 very quickly on those support calls? And then, and then you're, you do not want an AI dealing with you.
Peter
Can we get back to the main question here, which is you've got, you know, 2026 is coming up. If you're listening to this in 2026, it's here now. So here's the question. You're a medium sized company or large sized company, and, and your board of directors has just said to Mr. CEO or CTO, Guys, what's your AI plan? What are you doing? I mean, we're seeing that over and over and over again. Their first reaction is what typically, and what should their, what should they do? I mean, I want to just get some of the fundamentals here because I want to serve our listener base in that fashion.
Matt Fitzpatrick
Yeah. So I think if you're that CEO, you've got two kind of different challenges. One is what are the things I should focus on? And then two is who should do them and do I have those skills in house? And so the first thing I would start with is making sure, you know, the first question. I do think this is a question of following the value. So I'd go down a list. I would not start with letting a thousand flowers bloom. I would start with what are two, three things that if you do them well, materially move the needle for your business? Maybe it's, you know, we were just talking about customer service. Maybe that's one example. Maybe it's forecasting in your FP&A function, maybe it's inventory management. But there's definitely two to three things that almost any business on earth, even a small company, has. Digital marketing probably is another one that you see pretty frequently. And you focus on one or two of those and you make sure you get to a pilot stage in that one or two, meaning not a strategy document. I do think the one thing that anyone who spent real time in the space will tell you is if you took the paradigm of how machine learning is deployed, where you spend months and months building something and then it works and you can underwrite statistically that it works, this is kind of the exact opposite paradigm in that you can get a prototype up and running in a month, but you have to do a lot of testing and validation to make sure you can trust it. And so it is really a function of making sure you can get something up and running and testing and validating. Peter, the question I always ask is, would you bet your annual bonus that whatever use case you deploy works? And that's a complicated thing. If it's like, let's say generate a claims processing review and you have to do 10,000 of them. Most companies don't know how to say that works or it doesn't.
And so what I would do is, just to summarize, make sure you have a list of two to three things that move the needle. Make sure you get to do a proof of concept in one of them. And I would probably do that first use case as an RFP to a third party vendor that gets compensated based on results. And I say that very specifically because I think if you do it in house, the odds are the in house team has not had a lot of experience with this. And so you also can't hold them accountable in the same way of you get paid if it works. And so I do think tying into outcomes limits your risk.
Peter
I mean that, that is still the business model for Invisible, right? You're paid by money saved, correct?
Matt Fitzpatrick
We borrow outcomes. Yeah, outcomes in various ways.
Peter
Yeah, yeah. Alex, I want to bring you into the game here.
Alex
Much appreciated. So maybe just as a preliminary matter, in the interest of full disclosure, I have no financial interest in Matt's company, Invisible. I do have a number of questions though. First question, maybe pulling the thread on testing. One of the things that we talk about here on the pod all the time is benchmarks, the importance of benchmarking. I'm curious, given that we all talk about that constantly.
Dave
Alex, that is all we talk about. We talk about nothing else. That is all we talk about. Oh, wait, maybe that's you.
Alex
Okay, given that's all we talk about, as Dave just mentioned, and given that Invisible is also in the business of training so many models, what benchmarks do you think most need to be brought into existence in the world? What's most missing? Top three benchmarks you'd like to see summoned into existence.
Matt Fitzpatrick
Yeah, look, I think and you've seen a bunch of these start to get publicized in the past couple of months, but most of the public focus to date has been on the large public benchmarks for things like coding. I think those are very useful as metrics for are the models improving broadly. And I think that is a way you'd be able to see, by any standard, if you look over the last three years, the models have shown 50% to 100% improvement on most dimensions that you can look at. I think the problem is though, if you think about like enterprises or small businesses, your benchmark for most cases is not a broad based kind of cognitive benchmark. It's accuracy or human equivalence on a specific task. And so what I think you're going to see more and more need for is kind of custom evals on highly specific topics. So if you go back to the contact center example, the benchmark you want to build, if you're going to roll this out for a contact center, is a series of expert agents that are in your contact center and how they perform, and then how the AI agents perform on the same work. Same with claims processing. But basically most businesses are going to have to get comfortable with doing what's called an eval or a custom benchmark for the tasks they're trying to modernize, because with an 80% accurate, very smart deployment, you know, there's still too much risk in that rollout framework. And so I think a lot of this is actually the way that we think about benchmarking will evolve from broad based benchmarks to hyper specific benchmarks.
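To make the idea of a custom eval concrete, here is a minimal sketch in Python. It is an illustration only, not Invisible's method: the `EvalCase` type, the `run_eval` helper, the toy agent, and the 95% pass threshold are all hypothetical, and exact string matching stands in for whatever task-specific scoring a real eval would use.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class EvalCase:
    """One task instance, with the answer an expert human gave (hypothetical schema)."""
    prompt: str
    expert_answer: str


def run_eval(cases: List[EvalCase], agent: Callable[[str], str], threshold: float = 0.95) -> dict:
    """Score an agent against expert answers on one narrow task.

    The threshold is an assumed business requirement, not a standard value.
    """
    if not cases:
        raise ValueError("an eval needs at least one case")
    correct = sum(
        1 for c in cases
        if agent(c.prompt).strip().lower() == c.expert_answer.strip().lower()
    )
    accuracy = correct / len(cases)
    return {"accuracy": accuracy, "passed": accuracy >= threshold}


def toy_agent(prompt: str) -> str:
    """Stand-in for a deployed model: routes a customer message to an intent."""
    return "refund" if "money back" in prompt else "balance"


cases = [
    EvalCase("I want my money back", "refund"),
    EvalCase("How much is in my account?", "balance"),
    EvalCase("Cancel my card", "escalate"),
]
result = run_eval(cases, toy_agent)
print(result)
```

In this toy run the agent matches the expert on two of three cases, so it fails the assumed threshold, which is the kind of go or no-go signal a task-specific benchmark is meant to give before rollout.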
Dave
I freaking love that because I can immediately see 10,000 listeners right now just found a calling in life based on what you just said because all of this benchmarking within these domains is really, really hard to figure out. Unless you know, title insurance, what's the benchmark for successful AI and title? Well, somebody in that industry listening to this pod right now is going to be like, you know what, I was an early adopter of AI and I know this space inside and out. That's my benchmark to own. And if you declare yourself the owner of it and then broadcast the benchmark, the evidence so far is you become an instant star. Like nobody's grabbing topic ownership in all these topics and if you just get there first, you become an instant star.
Matt Fitzpatrick
I completely agree with that.
Salim
And I think that's a serious geek like Alex type or Matt type or David.
Alex
I don't even agree with that. In this era of post training as a commodity, if you own the benchmark, often it's the case, I think, that the benchmark is the hard part: you can leverage existing resources to post train an off the shelf model. I am curious though, Matt, maybe following up on this. So it sounds like your position is we need thousands of new narrow benchmarks to capture maybe every labor category, every industry vertical. Assuming that's correct, is that something that Invisible is working on, can be working on, should be working on?
Matt Fitzpatrick
Yeah, we do spend quite a bit of time working on that. In fact, a lot of the time, what we're building is customer specific benchmarks for an individual task. So that is a lot of what we think about, actually: how to test equivalence for a given task. And you know, I think one of the things that folks have not fully realized is, let's say you take a really high performing LLM and you want to tailor it to your individual context, there's that process of actually fine tuning it off of your data. So an example I would give is, and I think this is one of the challenges, that people were hoping that this would be a SaaS buyer's paradigm, meaning like I could just buy something that off the shelf would just solve everything I needed. So like if I wanted to buy a sales agent, I wouldn't have to do anything. I could just take in a sales agent that would sell well. And the reality is that's pretty hard to do. You need to actually train it up on your specific knowledge corpus, your information. And so the way we would think about it is you take the LLM or you take an agent that's been trained for sales and then you fine tune it off of your specific company information, your products, the way you sell, your way of speaking, and then you have to build an eval or a benchmark against that to say this is performing well or not.
Alex
On that follow up question, if I may, because there was the sort of infamous Bloomberg GPT moment where Bloomberg was sort of in quasi competition with the Frontier Labs. They had a wide variety of internal proprietary data sets. Their original plan, this is now sort of an infamous episode from one to two years ago. Their plan was to offer their own proprietary Frontier model basically, but trained, critically, pre trained and or post trained off of their internal data sets. And the plan was to achieve superb performance in financial domain because they had all the data or a lot of data that were not broadly available to the general public. But what actually happened is the generalist models offered by the Frontier Labs that were training basically off of the Internet and more or less publicly available data sets within a few months leapfrogged Bloomberg's GPT project. And so I guess the moral of that parable in my mind is how far do you think we can really get with proprietary data sets, proprietary benchmarks, before the generalist models completely wipe the floor with them?
Matt Fitzpatrick
Sorry, to clarify, I'm saying you use an LLM. The process I'm describing of actually fine tuning a model, a large language model for your specific context is basically adding more context. You're seeing Most of the LLMs offer a paradigm where you can do this, where you can add your knowledge corpus and train it to be more specific to your individual context. I don't think you'll see individual institutions building their own LLMs. I think that's a very compute intensive, very difficult thing to do. I think you'll see them tailoring the large language models to their context.
Salim
Sure.
Alex
To be clear, if I may, I wasn't asking whether you think every institution is going to get into the business of pre training their models. I was rather asking whether you think post training, which is inclusive of supervised fine tuning, reinforcement fine tuning, a variety of other post training, whether you think that has a long term future, or whether maybe in one to two years we just use a pre trained plus post trained generalist model off the shelf and not need any internal benchmarks and any internal data sets for post training.
Matt Fitzpatrick
Well, I think there are clearly going to be use cases where you are going to need the context of the individual company. Right. Like if you just take the law firm example, I'll answer it this way. There are documents that company has on how they want their future state documents for an M&A agreement to look. Right. And the LLMs are not going to have that information. So at some point you are going to have to see the post processing layer happening at the enterprise. And what we're seeing more and more is there's ways to design that layer so that you can, as new models evolve, kind of drop those in, and we are seeing more and more folks experiment with that. So they're using all the new tech that's being rolled out.
Salim
But I think in fact what's going to happen is over time that edge in data is going to be the most valuable part of any company. Is that trade secret type of how do we do things. Now at some point it may leak into the public models.
Peter
But like, like if you used OpenAI, right?
Salim
Yeah. If you use any frontier models connecting. I remember we were talking to, you know, Replit, et cetera, people are using it and then the data is going straight into the cloud and that's kind of dangerous. They're going to have to solve that layer in a very powerful way. That's one of my predictions to forecast, et cetera, is we're going to need to see a layer of protection between company data and the broader AI world.
Peter
Matt, I want to make this a little more tangible. Now I know you can't talk about the work you've done with the hyperscalers, but you've identified I think five or six cases where you can speak publicly about it. So if you don't mind, maybe we can toss a few of those in and then talk about them as concrete examples. And since Alex made his no financial involvement statement, I will say I'm a proud advisor and am conflicted in a positive fashion supporting what Matt and Francis are doing. So. But do you want to pick one of those? I loved the example on the basketball court. Can you speak to that one?
Matt Fitzpatrick
Yeah, sure. So we worked with the Charlotte Hornets on fine tuning custom computer vision models for draft prep for them. So in their case, they wanted to look at the spatial movement patterns of players on a very broad scale across single point cameras, across a whole host of different college universities and international locations. And so we fine tuned a custom computer vision model to specifically look at moving patterns they were interested in before the draft. And so that was a big part of their draft evaluation.
Peter
You basically took the video and you were able to use models to evaluate every player based upon the video to see how well they performed at every different. I'm not like a sports guy.
Dave
So it's the big round that's becoming clear here, actually.
Peter
Yeah, yeah.
Matt Fitzpatrick
Yeah, sure. So if you take typical NBA stats, they're things like points, rebounds, what's called plus minus is one ratio that's often used, which is like the amount you score versus give up when you're in the game. But they're mostly stats that are kind of transactional stats. What they don't look at is the movement patterns of the players, who creates space, where people are positioned at any point in time. And that's actually a lot of the most interesting data. If you go back to some of the original baseball analytics that Billy Beane did for the A's, it's the movement patterns of players and who is in the best spacing. Right. And there are companies that do this on very consistent formats, like on the same court. But what we've been able to do is to do that over many different camera angles, many different stadiums, very, very quickly. And that is using custom computer vision models. So we effectively are able to take a single point camera and understand the movement patterns of players in many different environments.
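For readers unfamiliar with plus minus as described here (what a player's team scores minus what it gives up while that player is in the game), a minimal Python sketch follows. The event format and the `plus_minus` helper are hypothetical, chosen only to illustrate the arithmetic; real play-by-play feeds are structured differently.

```python
from typing import Iterable, Set, Tuple

# Hypothetical event format: (scored_by_players_team, points, players_on_court).
# Every player in the toy data below is assumed to be on the same team.
Event = Tuple[bool, int, Set[str]]


def plus_minus(events: Iterable[Event], player: str) -> int:
    """Points scored minus points conceded while `player` was on the court."""
    total = 0
    for scored_by_team, points, on_court in events:
        if player in on_court:
            total += points if scored_by_team else -points
    return total


events = [
    (True, 2, {"A", "B"}),   # team scores 2 with A and B on court
    (False, 3, {"A"}),       # opponent scores 3 with only A on court
    (True, 3, {"B"}),        # team scores 3 with only B on court
]
print(plus_minus(events, "A"), plus_minus(events, "B"))
```

This is the "transactional" kind of stat the passage contrasts with movement and spacing data: it needs only the score log and substitution record, not video.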
Peter
And so the Hornets use this how? For team selection, player selection, draft selection?
Matt Fitzpatrick
So to understand which players fit certain characteristics that we're looking at.
Peter
Fascinating.
Dave
Yeah. It's a complicated problem too, because chemistry between, like, it's not just about finding the best player. The chemistry between players matters too. It gets infinitely complex and it's a cool little case study. But you know, Gavin Baker was saying recently that in fantasy football leagues all over the country, which I used to love before I ran out of time, now I have to spend all the time keeping up with that.
Peter
Now you have an agent doing it for you and having fun.
Dave
But that's exactly.
Alex
Now we're obsoleting human sports leagues, replacing them with robot sports leagues and esports. Yes, very 21st century, not 20th.
Peter
I'm betting on T800 again. Yes, that's right.
Dave
Yeah. But people are losing their leagues. Great, great. Fantasy football people are losing all over the place because the AI agent is tracking a huge amount of more detailed data. And if you look at the video footage, if somebody's making it up and down court very slowly, nobody's going to notice that, but the AI will notice it in a heartbeat. And then that just goes into the great model. It's really a cool little case study.
Matt Fitzpatrick
Since we've asked a little bit about how a traditional business might think about doing this, I'll give a slightly different one, which is LifeSpanMD, which actually, Peter, I think will resonate with you in particular, which is a concierge.
Peter
I know Chris, who runs it.
Matt Fitzpatrick
Yeah, yeah. So LifeSpanMD is a concierge medicine business. And you can think of it as they have a network of practices, both internationally and in the United States, which all have very different sets of data on their patients and practice information. And so the thing I always start with, with any AI use case, is you have to get the data right. Before you can even start with AI, you have to make sure that you have the structured and unstructured data together that you want. And so the first thing that we're doing for them is on our data platform, Neuron, we're creating a HIPAA compliant multi tenant cloud instance where we bring together all the patient and provider data that's of interest. And we start to bring a 360 degree view of both the patient and the practice. And so you can start to think of things like if you wanted to understand what longevity focused tests male patients 35 to 50 are using most frequently, you can start to think about things like that on patient outcomes that are really interesting. If you want to understand practice performance, if you want to understand where you have certain patients that are not compliant or not as interactive, it's effectively just a control tower to understand everything that's going on across that footprint of practices. And then I think the area where generative AI has become more important for that is actually chat agents where people can ask questions, knowledge management systems, and really interrogate and ask questions of all the key data from all of those practices. One of the key things that's challenging about that is obviously in health care you have to be extremely careful about which data is stored locally at the practice versus how that's brought centrally.
And so the HIPAA compliant multi tenant cloud is one of the key components of that: actually making sure that no patient data leaves the premises of the individual practices, doctors are able to access certain things, and then certain practice metrics are organized centrally.
Dave
I heard the coolest thing this week. It's a QA company that has invented talk to your defect. It's just the coolest concept. The defect actually has a personality and you can ask it questions about itself, like where did you originate? I can totally imagine it with what you just said. In healthcare, think talk to your illness, like have a conversation with it. Where did you come from? How do I treat you? Are you getting better or worse if I do this thing? And it's talking back to you with a personality. It's just the coolest idea ever, isn't it?
Peter
It is amazing.
Salim
It's one thing with the defect. It's a little awkward when you say, here's this bacteria you're talking to and you're like, I'm now going to kill you.
Dave
The defect is real. I talk to your illness. Maybe it gets a little weird. I don't know what voice you would give it. Voldemort voice or something.
Peter
How do I kill you? Do I dispatch you?
Matt Fitzpatrick
Dave, one thing I'd note there too is I think there's a question, and I get asked this often. Peter, you asked earlier, how do sectors evolve? I think actually the question of whether decisioning of individual patient care changes with GenAI is a much murkier question. I think the easier place to start, and where it will in many ways be very interesting, is that the US, as an example, spends about $13,000 to $14,000 per capita on health care, right? Compared to $2,500 to $3,000 per capita in, say, Germany or Canada. Something like 30 to 40% of that is admin cost, and that is not admin cost that anyone wants to bear. And so this is something where I actually think the idea that LifeSpanMD is pursuing is not to change the standard of care, actually to make the physician even more empowered, but to take all of the really painful admin and scheduling and make that the part that they don't have to deal with anymore.
Salim
And AI should do a huge amount of damage in those areas.
Matt Fitzpatrick
Exactly.
Peter
What are you seeing most companies get wrong on their mission to implement AI?
Matt Fitzpatrick
Yeah, I think it's a couple different things. I think the first one is a lack of focus on data as the starting point. So I do think the challenge, if you just tried to, if you tried to build an AI agent on fragmented customer and product data, it's going to break by definition. Right? And so I think you do have to be in a place where the data you're going to feed into the models is clear and working. So I think that's been one major challenge.
Peter
Do you think, I mean, if you had to look at companies as a whole in the medium and large size, do they have clean data? How long does it take a company to sort of get its data into a format and a level of fidelity that's useful? I mean, is this a hard lift or an easy lift?
Matt Fitzpatrick
It depends. If you take the paradigm of I'm going to put everything in a data lake and get everything right, that can take five years. And the reality is most big companies have spent a half decade trying to get all their major data schemas in order. But I think if you start with the question of, like, what data do I need for this specific use case? Let's take credit underwriting. To do that well, you need a set of data on the credit itself and the market. You can probably have five to six core data variables. You need the core financials of the business, the security of the credit, all those kinds of pieces of core information. But you don't need every piece of data across the entire commercial bank to be right. You need the core elements for that use case. And so I think companies that are focused on the exact data they need to get right, I think they've done pretty well. But I do think trying to get all data right is hard. I mean, you've also seen the enterprise for a long time, Peter. If you asked any Fortune 1000 company to look at their full data repository and how much of it is accurate and working and clear and accessible right now, very few companies have that. So I do think it's about being very tactical about what data you need. The other thing I think for GenAI in particular is that a lot of the most important data is non system of record, non structured data. So it's things like images, videos, text files. It's just not things that people have tried to master historically. And so I think the first step in this is saying, what is the thing I'm trying to solve and how do I make sure I have that data ready?
Dave
Yeah, one thing I see a lot of, you know, had a long board meeting this morning, company that's very AI forward portfolio accounting company called Vestmark. And the data for account reconciliation, for example, the data is abundant, but it doesn't tell you what the person actually does, it just tells you how it was reconciled. So now the path to success is first the AI assistant, which helps accelerate you through the day, but it also knows what you're actually doing. Then that accumulates, then that becomes the RLHF or the training or tuning data. Because what you're trying to do is like, what are you doing, guys? And that's not really represented in the data. But a lot of times you go talk to a bank or an insurance company and they're like, our data is our advantage. Go ahead, bomb it into the neural net and train it. You're like, what the. I don't even know what that means. I'm just going to throw terabytes of spreadsheet data in and see what happens. That's going to go Klarna on you.
Salim
Well, you have all sorts of other issues as well. I was talking to the CIO of one of the biggest banks in the world and they have 300 different customer databases. Okay, 300, one for mortgages, one for loans, one for this. Because the mortgage people don't want to tell the loans people about their customer data, so they guard it jealously. It's a total disaster for the poor CIO.
Peter
Fascinating, Alex.
Alex
Yeah, I think these are all very interesting points. I'd like to, if I may be so bold, jump up several levels and maybe speak a little bit more about the business model of Invisible. My understanding, correct me if I'm wrong, Matt, is there's an element of the business, I think it's called Meridial, that is sort of a marketplace for ML freelancers. If I understand correctly, and I'm curious. I think in my mind, one of the many elephants in the room in this conversation is that we're arguably on the edge of recursive self improvement. All of the Frontier Labs, more or less, I think, would agree with the assertion that we're nearing the point where you could have an AI researcher, where you just turn over compute resources to the AI researcher and the AI researcher does as good, if not a better job than the human AI researchers who work for the Frontier Labs. If that is indeed the case, Surely one of the several elephants in this room, but given limited time focus on this one, is that the need for a marketplace of ML freelance researchers to train models, doesn't that evaporate entirely as we start to reach the point where AI researchers can build custom models off of custom data sets and custom benchmarks for each client?
Matt Fitzpatrick
Yeah, so as you said, we have two sides of our business. One side is Meridial, where we train all the large language models, and then on the enterprise side we build custom applications for enterprises. Look, I think there has been a five year evolution where consistently folks have said at some point you will not need reinforcement learning from human feedback to validate and test models. And I think the challenge of that logic is a couple different things. One is the spectrum of expertise: if you take language, multimodality, extreme expertise on things like computational biology, and then the fact that a lot of these are reasoning tasks. And there's a whole host of studies on this showing that actually pairing synthetic and human data together is stronger. But you do need human feedback on almost every different sort of agent you want to roll out. And so I think the nature of RLHF is changing. I think you're moving more towards things like RL gyms, controlled environments, simulations. I think you're starting to see that much more of the expert work now is PhDs and masters. So it's less what I'd call commodity cat-dog, cat-dog labeling. But if you say tomorrow you're going to train a model to figure out, you know, different evolutions in 17th century French architecture, in French, you are going to want RLHF to do that, to validate it. And I think what you're seeing over and over is that as the models move more and more into very specific areas, there is more and more RLHF needed for them.
Salim
That's interesting.
Alex
Maybe I'll share my intuition and then would be curious to hear what you're seeing in your version of the ground truth. My intuition, my impression is that we're seeing greater and greater data efficiency. I mean, RLHF was obviously very fashionable over the past three years. Maybe it went through sort of peak fashion, if you will. And then we saw the rise of reinforcement fine-tuning, alternative mechanisms that maybe are far more data efficient and maybe even more human time efficient. If you have to just build an RL environment, arguably that's, per human hour involved, probably a lot more time efficient than staffing out to some so called developing country folks to, as you say, cat-dog, cat-dog, do supervised fine tuning or some other RLHF type mechanism. Surely, and maybe I'm projecting, my intuition is that you'd see more data efficiency, not less, and therefore less time, effort, and money expended on RLHF. Even if we buy your assertion that we're seeing sort of hyper parochialization of lots of different tasks and each of them is going to need artisanal annotation, surely there is a competing force, which is increasing data efficiency from algorithmic efficiencies like reinforcement fine-tuning. What are you seeing?
Matt Fitzpatrick
Yeah, I mean people have been arguing that for five years. But I think at least what I've seen on the ground is about the accuracy that you want. If you think about a reasoning task that involves a several step leap and you think about the risk of hallucinations, it is more useful to have human feedback involved in that in some form. All right? And if you think about it, in some ways RLHF happens after all the pre training compute cost, it's a pretty small percentage of the total cost in training, and it is some of the most valuable feedback. And as you see more and more specific agents being trained for specific tasks, like take legal services as an example, if you get a new legal services data set, which is interesting, and you want to train a model off of that, you are going to want to see some sort of comparable equivalence, whether it's an associate or an M&A lawyer equivalent, where you actually test if it works. Now, is it possible that at some point, 10, 15 years from now, you run out of things to train on? Possibly, but actually, I mean, if you take the number of languages, modalities, robotics is probably the next frontier in some ways, RL gyms, contact centers, there's a lot. We are, as a company, fully a believer, and I can talk about the enterprise side too, that human in the loop is going to be a feature, not a bug, for a long, long time. And I think the entire red herring of the enterprise, for example, is that autonomous agents will do all of this with no human in the loop. I actually think you're going to need more and more humans at every step.
Peter
Alex, you're saying that the level of intelligence of these agents as we pass through AGI and get to ASI is such that they'll figure it all out as well as any human and replace that human in the loop. What's your timing on that?
Alex
That was exactly my question, Peter. So my timeline, if I have to spitball. Of course this is not the predictions episode, so don't hold me to it. Hold me to my predictions in the next episode. My timelines are approximately two to three years as a conservative outer bound for some element of recursive self improvement where we get our AI researcher that's as good if not stronger than the human researchers for building ML models. Not 10 to 15 years, two to three years max.
Dave
Yeah, that's the outer outer edge. But I also believe Matt's totally right that 2026 is going to be the year of recursive self improvement and capabilities growing crazy exponentially and corporations moving at a snail's pace compared to what they could be doing. And it's all going to be stuck and bottlenecked and log jammed and it's going to frustrate the hell out of Google and OpenAI, and companies like Invisible are the lubricant that's going to actually get it from point A to point B. But that Klarna use case is a really good one. In our tests for contact centers, 80% of the people massively prefer the AI. But the 20% that don't like it more than torture the whole thing to death and make it better to repeal the entire thing. There are probably eight ways to fix that quickly, but it's not going to come from Google and it's not going to come from OpenAI and it's going to involve data that isn't in the natural data set. You know, and it's right now. If you told me two years ago that everyone in the world will know what RLHF stands for and there will be three people who are multi billionaires who built RLHF companies walking around, I'd be like, that's not even a thing. Oh wait, now it's not only a thing, it's massive in scale. There'll be new terminology in 2026 for many, many of these other bottlenecks where, yeah, the AI can do it, but for whatever reason the bank is not doing it, the contact center isn't. And those bottlenecks are going to be so lucrative for companies like Invisible to just plow them down. I can't answer the specific question of whether your workforce is going to involve the distributed workforce that you just described. What was it called, Alex or Matt?
Alex
It's called Meridial.
Dave
Meridial, yeah. So there is a really healthy debate on whether is Meridial a key part of this or a network of even more agents a key part of this? Or is 2026 the transition year between the two? It's going to be a really interesting foot race between those two different approaches.
Alex
I think you put your finger on it Dave. That really is what I'm asking, which I think is a distinct question from is there value in supervised fine tuning or reinforcement learning with human feedback going forward? Of course there is. What I'm really asking is how much of that can come from AI sort of bootstrapping it in the near term future versus needing human inputs.
Matt Fitzpatrick
And what I'm saying is think about a balance between generalizability and hyperspecificity. And I agree with you on generalizability. I actually don't think RLHF is important even now for that. But where it gets more complicated is when you want to train off a specific task. So let's take the insurance claim example that I mentioned earlier, right? You're going to generate a 10 page insurance claim, and you could apply this to any enterprise use case, many consumer use cases. But in that world, you know, an LLM is producing an outcome and it's fine tuned off a specific company's data. But you need a way to actually say at that point, does this produce a comparable output to what a human doing this task before was producing? And so when I mentioned earlier custom benchmarks, that is the process by which you do that: you actually do need human equivalence testing, you need a human to provide a comparable data set and to say this looks good or it doesn't, and you just don't have precedent data to train that off of in any LLM because the human input is not there. Now again, that's going to keep going down to more and more specific tasks. If you take legal services, take it by language, take it by topic, take it by document type, there's human feedback required for all of that.
Alex
I mean, not to put too fine a point on it, but I want to make sure that those in this episode who want to drink the bitter pill with the bitter glass of water for the bitter lesson are so drinking. I'm curious, Matt, to understand how you see this. Surely there's a wave of generalism that is coming over time, and maybe we can sort of finesse what the appropriate timescale is. Sounds like maybe your so called timelines are a little bit longer than perhaps mine.
Matt Fitzpatrick
But.
Alex
But would you at least agree with the Premise that over time even the specialized skills end up getting subsumed by generalist models. Or do you think that's just never going to happen? We'll always. Or by always I mean on timescales of 10 to 15 years, which is a pretty long timescale, we're just going to have generalist models that are always sort of specially fine tuned.
Matt Fitzpatrick
I don't think all expertise, all specialized expertise goes away. No, I mean again, if you think about a lot of the information that specific experts have, there's no training data available for that. Like it's stuff that sits in people's heads. It's experience. I mean again, I'm aware of many of the narratives that human expertise becomes less important. Again, we are a company that actually thinks the human touch elements become more and more important. But take, you know, take sales for example. Many of the best selling patterns, many of the people who've done that the best, like there is no information you can train off of what they do. They live, you know, human interaction. Actually in a world where, you know, there are 500 companies selling email based SDRs, I think human beings become more important in that world. So I actually think that expertise becomes more and more important in many different areas. I think human in the loop stays really important. I mean, if you take a contact center, and Alex, I understand the theory of what you're saying, but like we're four or five years into this and if you look at the number of US contact centers that have migrated to using agents, it's a pretty small percentage.
Dave
Can I ask you. Actually the Jane Street question is really burning a hole in my pocket. So it's really clear that stock picking is moving to AI at warp speed. And the reason is because there are no barriers. You're just placing a trade that's already automated. So it's like. And that's.
Peter
And there's a great benchmark. More money.
Dave
More money. Yeah.
Alex
And also almost all of the volume on public equities markets has long since been dominated by algo. So this happened decades ago.
Dave
Yeah, well it started with rapid trading so the quants were already there. So now that it's moving to fundamental analysis, it's the same mindset. So that's one of the reasons it's just taking off. But like Peter said, you're making more money. Okay, let's just keep going then. There's nobody who's saying but I'm going to lose my job. It's like, no, we'll just pay you more, let's just go. So it's a really interesting bellwether but within that world they're struggling because the data is so proprietary. It's looking more and more likely that these self improving massive foundation models are going to get to superhuman IQ this year, this year being 2026. But the prompt window is getting massive and the recursive chain of thought reasoning is getting really, really good. So you can actually feed it data without having to retrain it and have it achieve the job. If I take that mindset from Jane Street and I move it over now I'm a mechanic and I'm trying to fix a car and trying to diagnose what's wrong with it and I have audio and I have sensor data. Great, easy use case. But am I going to then put that data into the LLM API and transmit it to OpenAI where they can accumulate it and then if they decide later they want to be a garage, they have all my data or am I going to run some kind of a walled off model and garage mechanics? Maybe not the best example. That's why I chose Jane Street because they're never going to take their proprietary data and give it to OpenAI. But in the middle ground you have banks, insurance companies, hospitals, how are they going to deal with this? It's easy. Now sometime in 2026 it becomes easy. But the data, that's my only reason for having a competitive advantage. I don't want to give it over to the API.
Matt Fitzpatrick
Yeah, look, I think you're seeing there are definitely sectors, many of which you just named, banking, healthcare, where people are deciding to keep their data on premise or they're using things like small language models for those sorts of reasons. And I think you may continue to see that as a trend. I think one thing folks often miss is that not all data is proprietary. So if you take the Jane Street case, maybe their trading data is proprietary, but their back office forecasting data or back office finance data might not be. So I think one thing is being clear about the data that you need to keep proprietary and do want to put more parameters of security around, and then what data you say, look, I'm going to be very careful as a company, but this is data that is not as proprietary. I think that sort of balance matters. Similar to what we discussed with contact centers, the idea of I will not give anything to the LLM, I'll keep it all in house, I don't think that makes sense either, but I do think that's a paradigm you're seeing more and more. I think that, yeah.
Salim
I want to kind of change the tack a bit, if that's okay. I actually do agree that we'll automate, but I think we'll automate in a way that's different from this discussion. So let me give an example. Let's say I'm Canon Printers and I'm selling home printers, right? Right now I have a bunch of people doing marketing and content development, brand management, then salespeople to sell to the Best Buys and so on, and online folks. Then you have post purchase getting the customer to try and register the dang printer. And then you've got all the repair support technical staff and then you've got your accounting folks in the company. You could get a job management, right? So you've got pockets of people doing different functions across the board. If I was going to build an AI native printer sales company, then I might think about having all of those things automated completely with AI. And then you're not human centric, but you're function centric across those. The printer could report when it's running out of ink and you ship it a new thing. It tells you when there's a problem with it or there's a problem coming up. You alert your repair staff saying, hey, this guy. Maybe we can upsell them in printer. Da da da da da. And you essentially automate all the functionality with AI and you leave the human 90% out of the loop almost completely because you've automated the core functionality. And right now what I'm seeing is what I used to call radio over TV, right? When you first had television, we took radio announcers, put them on TV to read radio scripts. We didn't adapt for the medium. And I think what I'm seeing right now is we're automating right now what the human being is doing at each of those functions. But surely over time we're going to automate the functional flow and then get rid of the human beings completely.
Peter
AI native, AI first, right?
Alex
Not to mention getting rid of the printers.
Salim
Well, that's a separate question. That example, who's going to be doing any of the printing? Let's leave that part aside just for the moment.
Peter
I think you're absolutely right, Salim. I mean, this is where a young AI native company reimagines an entire field and has zero legacy and Zero friction in coming forward. The question, as Matt said in the beginning, is do they have the distribution? Right? But this is where a large company can and in this case should actually be investing in entrepreneurs. I mean, one of the things that you and I talk about a lot of times is if I'm a large company and I don't know what to do, I would basically hold a competition, ask young AI entrepreneurs around the world to come forward and how would you disrupt my company? You know, give me a pitch and then I would pick the best five of them and I would fund them and I would say, you know, we're going to fund you to disrupt us and then, you know, we're going to give you access to our data, to everything we have and then ultimately we're going to buy you or buy a majority stake in you and we're going to make you our new company. Right? This is the innovation on the edge, the displacement of the core, et cetera, however you want to call it.
Matt Fitzpatrick
Peter
Say you're a medium size or a large size company. I'm not going to focus on the startup right now. What do you do in 2026? Because you're going to have to do something. You're going to have pressure from your board, from your shareholders, from just competition. So you've got to do something. And what I heard you say so far, Matt, is number one, you've got to get clean data. You need to make sure you understand what your data situation is. Number two, you should pick two or three, if you would, areas, call them benchmarks, where you're going to run experiments. And it's not a proposal, it's not an idea, it's actually run it. Actually do run an experiment to see how it works. What else? And then pour money on the things that do work and then have an expanding sort of increasing circumference around the company's major revenue engines. How do you think about that? Walk us through a few more steps.
Matt Fitzpatrick
Yeah. So I think one of the things which has been a lot of the topic of conversation here is, given all the improvements in the models, given, you know, what Salim was walking through on the potential to clean-sheet and design a company from scratch, why, you know, there was this MIT report that came out that 5% of enterprise models right now make it to production. Right. So I think there's a starting question of, given all this tech excitement, why has that been so much harder? And it's not the technical challenges we've talked about. It's the data. It's the focus on which priorities to look at. I think the other two big ones, though, are the organizational structure by which you pursue those initiatives. And particularly the advice I give everyone is do not locate this in your technology organization. Take your best operator, your best ops person, give them an operational KPI and track it to that and make sure it's a really clear operational KPI. So we talked a bunch about contact centers. You should have an operational person there lead it around CSAT score, time per call, whatever the core metrics you're looking at. And that should be your guide. If you want to take something like inventory forecasting, you should do it around inventory days, stock-outs, all those kinds of metrics. But I think if you have a clear sense of which operational person is leading it and how they're marshaling resources around it, and you have a clear KPI, you're going to make progress if you focus on a couple different things. I think the failure mode on that has been you let a thousand flowers bloom, none of them have an operational metric, and you kind of end up with a science project dynamic. Yeah, exactly.
Dave
That's exactly right. A thousand flowers bloom: you walk in and you say, I am going to give you a million genius-level people for free. Do something. It fails.
Salim
Yeah.
Dave
It's like, here's a million people for.
Peter
Free analysis, and they're all geniuses.
Dave
And it fails for that same reason. It's like, I didn't think of an idea, so I said, a thousand flowers, just go bloom. I couldn't think of anything, so maybe you will. Like, how's that going to work? I've seen that. You're exactly right. It's just so sad.
Salim
We go even further. We basically say, not just take the operator and put a KPI on them, but put them outside the organization and let them build something from scratch at the edge. Because otherwise you get encumbered by all the internal rules and bureaucracies, and that slows things down a huge amount. Then it fails for legacy reasons.
Peter
Lockheed Skunk Works. It's the Apple MacBook team.
Salim
Apple is actually a master at this. If you think about what Apple will do, it will form a small team that's very disruptive. They will put them at the edge of the company, they'll keep them secret and stealth, and they'll say to them, go disrupt another industry, whether it's watches or retail or whatever. At last count, I think they have 18 teams looking at different industries, and when they think it's ready to disrupt, they go into it and they patiently iterate. Right, the Apple Watch, for example. So this is the model I think we're going to see many other companies take on. Because if you think of any operational company, the insights they have on all sorts of adjacent industries are incredible. It's very hard for them to disrupt their own industry, because they're probably pretty optimized for it, unless you come in with the AI startup. But they can really disrupt a lot of the edge cases, a lot of the industries around them. So I expect them to launch AI-native startups that go into adjacent industries and go attack some of their neighbors.
Peter
Nice. Matt, before we get to a few of your 2026 predictions, can you just share a couple more of the use cases here just because they're fun?
Matt Fitzpatrick
So we worked with SAIC and the US Navy on building intelligence for underwater drone swarms, for unmanned underwater vehicles. So think of that as, you have a series of drones, and you have enormous numbers of sensors on each of those drones, and you need to understand the movement patterns of those different drones. And in each case, you know, you see an object underwater, what do you do? Do you engage? Do you step back? Do you move with other drones? That whole movement pattern and decisioning for underwater unmanned vehicles, that's what we worked on: fine-tuning a model to do that, training it, looking at all the movement pattern data. And again, one of those interesting things about drones is they are autonomous. And so thinking about how those movement patterns evolve in complex environments is very hard to do. But you also have lots and lots of interesting sensor data to do that. I think one that maybe anchors more on the human decisioning side is Swissgear. So like Swiss Army, the luggage brand. And I actually think, Peter, this is one that a lot of folks in the audience may relate to in some form, which is, you know, they had an enormous mix of different data tables around products, customers, et cetera, that they couldn't really bring together for inventory forecasting. And so we used our data platform, Neuron, to bring together 750 tables really quickly and then optimize the forecasting to look at both minimizing stockouts and optimizing which inventory to hold. Inventory forecasting is probably one of the major issues for most big and small businesses: if you get it right, you minimize lost revenue and you make sure that you don't hold lots of excess inventory. It's one of the hardest things to do, particularly if you've got a six-to-eight-month order cycle time. And so that was something we partnered with them on, and I think it was a great outcome.
We ended up expanding their overall inventory coverage by about 30% and basically doubled the number of SKUs with a reliable prediction. And again, that was done in a couple of months.
Peter
All right, so later this week my moonshot mates and I are recording our 2026 predictions. We'll have IMOD back and we'll be talking. Each of us will provide two predictions for 2026. We'll have our top 10 from the Moonshots podcast. It's going to be fun, it's going to be a battle. We're going to ask our listeners to vote on which predictions they like best. I mean, of course they're all going to vote for Alex's, but hey. Matt, talk to us about what you see coming in 2026.
Matt Fitzpatrick
Yeah, I think I'll call out a couple. We've just done a bunch of research on our 2026 predictions, so I won't say all of them, but I'll call out a couple. I think one of the first ones I would anchor on is multi-agent teams. So I think one of the challenges, and it's inherent in a lot of what we discussed here, is that if you're a large enterprise or medium-sized company implementing a use case, you won't necessarily have one decisioning agent that does everything. You'll train task-specific agents for individual tasks, usually orchestrated by an LLM. And what that allows you to do is to pinpoint the accuracy on those specific tasks and then use the broader logic set of the LLM to make sure they all work together properly. And I think that's been an architecture that's been discussed pretty broadly for a while. But I think that we're just starting to see the green shoots of more and more folks having success with that, contact centers being a good example. So I think that's a big one that I would call out. The second one I'll call out is the multimodal leap. I think video, images and audio are going to become a bigger and bigger part of how people engage with these models. I think audio is probably one of the most interesting. And so I do think the way you'll be able to speak to them, interact with them, visualize them is going to be a really interesting moment for 2026. And I don't think that will all be text-based like it has predominantly been historically. And then maybe one other.
Peter
No, I was going to ask Alex for feedback.
Salim
Go ahead.
Peter
But finish up, Matt.
Matt Fitzpatrick
Yeah, so the third one I'll call out, because we've talked about it a couple of times on this episode, is kind of what we call either the mirror world or RL gyms. So I don't actually think that's a well-understood concept for many folks in the audience. But think of that as actually creating simulated environments or digital twins for tasks you might want to test. Right. So maybe that's a coding environment, maybe that's a contact center, as we've used that a couple times. But it allows you to actually simulate a series of function calls, tasks or environments, so that if you're going to train a model on a task, you can actually test how it's going to work, like in a manufacturing environment, before you roll it out to your actual physical world. And I think that, more and more, among both model builders and the enterprise, is a very interesting topic from what we're seeing.
Peter
I want to go around to the mates one second, maybe ask some final questions of Matt. Alex, you want to kick us off?
Alex
Yeah. I think the most interesting crux of what we're discussing here is what is the future of human expertise? For that matter, does human expertise have a future? And assuming it does, what's the half life? It's cooked. What's the half life of the value of human expertise? And so to put that in question format, what do you think of all of the forms of human expertise, of all of the labor categories and job roles that exist in the economy today? What do you think will be the last three of those job roles or forms of expertise that will disappear or ultimately succumb to AI. What are the last three to start?
Peter
Last expert standing.
Alex
Okay, that's right.
Matt Fitzpatrick
I mean, I'll go back to where I started the episode. I think a lot of the commentary on mass shifts ignores the actual function of jobs in society today. So let's take sectors, for example. In oil and gas, a lot of the functional expertise, you know, geoseismic: if you look at seismic engineers, people on oil and gas sites, drilling, that is a human function you do need. I think real estate is another example; humans actually help select which... you can go down a whole list of different areas. I think there are sectors where you're going to see more disruption near term. I'd call out a couple of them: BPOs, legal services. I think media is a fast-changing area, but I'm also not exactly sure that those lead to negative employment consequences. If you take media, it's a really interesting one. You know, five, six, eight years ago, I think media as a category really struggled in a lot of ways, paid media as an example. Right. And you've actually now seen in the last couple of years, in the post-LLM era, Substack, Medium, all these blogs become much more interesting. You have way more media entrepreneurs. And so you've changed the function in society, and where the money is coming from changes, but it has not changed total employment. And you know, look, I understand a lot of the skepticism that says that AI is going to radically change everything. But I think if you look at American society for the last hundred years, it's something like 25% of every high school class goes into a field that did not exist when they were in high school. And the reason that persists is people go into the working world understanding the tools they have, thinking about what they can create from that. And one of my favorite statistics, which the Wall Street Journal reported a couple weeks ago, is that 20% of US employment right now is digital ecosystem jobs.
And something like 9% of US citizens are full-time social media influencers, which is mind-boggling to me. But, you know, again, this is the changing nature of work. And so I think that pattern will persist. I think that the core of what will change is the process of looking up information across multiple systems and documents; that is going to become less valuable. But I think all the jobs that involve human interaction, physical work... I actually think one of the most interesting things over the next couple years is that the job ecosystem around data centers, electricians, et cetera, is going to become way more in demand. I was on a panel with someone who runs a recruiting company. They were saying that job profile, I think, will 2, 3, 4x over the next couple years. And so that will have pretty interesting implications for the education system, everything else. But I think we will see an evolution.
Peter
I meant the humanoid robot, electrician and plumber. Alex, very quickly, what are your three last standing human roles here or your last standing?
Alex
It's interesting. I'll present multiple competing hypotheses. Hypothesis one, briefly, briefly. One hypothesis is it's the politician because they help to make the laws. Another hypothesis is that it's the greatest intellects, the physicists or mathematicians. Even though as we talk on the pod, math and the sciences are all getting solved on the one hand they're still perhaps to the extent that that represents the culmination of human intellectual accomplishment, maybe the greatest intellects will be the last to be automated. There's another school of thought that says no, it's, it's the roles that involve the greatest need for human authenticity. Because even though it's not actually a capabilities question, people nonetheless demand human contact or something to that effect. And so it's going to be the highest touch job roles where people just want to know that there's a human counterparty on the other side of the interaction. So that's a set of three hypotheses.
Peter
Tastemakers will dominate.
Alex
That's authenticity bucket numbers.
Dave
That's what Mike Saylor said. Word for word actually.
Peter
Yeah, on his boat we had that enjoyable sunset conversation. Thank you, Alex. Salim, do you want to go next on a closing question for Matt?
Salim
I think you covered some of it on the industries that you're kind of going after. You guys have done some government work. Where in government functionality do you see the biggest opportunity for AI, automation, efficiency, etc.?
Matt Fitzpatrick
Yeah, everywhere. Look, I actually think this could be one of the really positive trends for society. So I saw a study recently that AI-assisted permitting could cut energy and data center project implementation timelines by 50%. Think about housing. One of the biggest challenges right now for housing development in the US is NIMBY regulations and how complex it is to build houses because of the myriad different regulations and zoning contracts by location. Right. Or even, the OECD came out with this report saying that AI could shrink public sector process cycle timelines by 70% on licensing, benefits, approvals, compliance, basically accelerating infrastructure deployment. So to me, the simplest thing that AI can do is project management and timelines related to all spending and infrastructure deployments, and that would be a really positive thing for society in my mind.
Peter
Amazing. Good question, Salim. Dave, why don't you close us out in the questions here?
Dave
Oh, I got so many, but I'll pick the best first. Matt, how many hours of video footage will there be of you one year from today compared to one year ago? Because I know we saw each other in Riyadh a few weeks ago, and I know that you are the thought leader in this whole bottleneck of AI getting into the enterprise. You know, the footage of you that's out there right now is all this bullshit CNBC, Bloomberg type, you know, five-minute format. But here, with what we're doing right now, we're getting your real thoughts. It's just so much better. But how many hours can we count on a year from today?
Matt Fitzpatrick
Well, look, I think as of 12 months ago I had done almost no interviews of any kind. So this job has been fun on that front. And look, what I enjoy about the podcast format is it does allow you to talk about some of the more complex topics, and so a podcast like this in particular is really interesting. So hopefully many more in the year to come.
Dave
Well, I'm hoping for at least a 10x on that. And then my follow up question to that is the avatar version of you that's also out there talking. Is that a 2026 thing you think or when?
Matt Fitzpatrick
Yeah, it probably happens in 2026. I don't think it'd be that hard to train an avatar off of my public statements. So I think that'll be an interesting one. We are actually working in the sports space on the topic of avatar training, and I think it is an interesting space where you can imagine a lot of different areas where, rather than a chatbot interaction, people want to speak to people they know via an avatar. I actually think that will become a more natural part of society, and a pretty interesting one.
Dave
I totally agree. It's just that the timeline could be, you know, as soon as two months as far as I'm concerned.
Peter
Is it not an avatar we're speaking to right now?
Matt Fitzpatrick
That's a good question.
Dave
That seems very human actually. I don't know. The best one time orbs behind you kind of give it away.
Matt Fitzpatrick
Yeah, they are pretty strange.
Dave
That's not real.
Peter
Matt. Where do people find you? Where do people find Invisible who should go to invisible to check out what you do and how you do it.
Matt Fitzpatrick
Sure. So we have seven offices now: New York, San Francisco, Austin, Texas, DC, London, Poland and Paris. I'm the easiest to find, probably. We have an office right off of Union Square, which is where I am at least half the time when I'm not on the road. And look, I think in terms of who should come to us, and from the listener base in particular: any mid-cap or enterprise company that knows there is potential in their business, that knows that AI can transform it in a positive way, and is struggling to bring all the pieces together. I think that is the main thing I would say. There is no doubt, you know, with everything Alex is asking, that the technology has made an enormous step change over the last couple years. The hard thing is actually the change management, the operationalization, the metric tracking, the evaluation. It's kind of bringing it all together. Our founder Francis has an idea of: do you have all the components to build a cake, but you don't have a cake? What we do is we actually bake the cake. In the end, we build you something that works. We make AI work, and we use all the modern tools to do that.
Peter
Amazing. And the website.
Matt Fitzpatrick
invisibletech.ai.
Peter
All right, thank you, Matt. Salim, Dave, AWG, I'm going to see you guys in a couple of days for our 2026 predictions. Make them brilliant. It's going to be fun.
Salim
All right, I want a benchmark for tracking benchmarks.
Dave
That's your.
Salim
All right. No, that's not the one I'm gonna talk about.
Dave
Okay.
Peter
All right, guys, have a great day. Every week, my team and I study the top 10 technology metatrends that will transform industries over the decade ahead. I cover trends ranging from humanoid robotics, AGI and quantum computing to transport, energy, longevity, and more. There's no fluff, only the most important stuff that matters, that impacts our lives, our companies and our careers. If you want me to share these metatrends with you, I write a newsletter twice a week, sending it out as a short two-minute read via email. And if you want to discover the most important metatrends ten years before anyone else, this report's for you. Readers include founders and CEOs from the world's most disruptive companies and entrepreneurs building the world's most disruptive tech. It's not for you if you don't want to be informed about what's coming, why it matters and how you can benefit from it. To subscribe for free, go to diamandis.com/metatrends to gain access to the trends 10 years before anyone else. All right, now back to this episode.
Episode #218: Why We Need New AI Benchmarks, Which Industries Survive AI, and Recursive Learning Timelines
Date: December 23, 2025
Guest: Matt Fitzpatrick (CEO, Invisible Technologies; former Global Head of Quantum Black Labs, McKinsey)
Host & Panel: Peter Diamandis (Host), Alex, Dave, Salim
This episode delves into how AI is transforming industries, why outdated benchmarks are holding enterprises back, and what companies must do to survive the coming wave of AI-driven disruption. Featuring Matt Fitzpatrick, a leader in AI R&D and CEO of Invisible Technologies, the conversation covers the practical challenges facing companies aiming to become "AI companies," the critical need for narrow industry-specific AI benchmarks, the tension between startups and legacy giants, and predictions for the acceleration of AI capabilities and organizational lag in 2026.
Scope of Impact: Not All Industries Will Change Equally
Challenges for Small and Mid-sized Enterprises
Speed is a Key Differentiator
Specialization and Benchmarking
Limitations of Public Benchmarks
Opportunities for Industry Specialists
Charlotte Hornets—Scouting with Computer Vision
LifespanMD—Data Aggregation for Healthcare
SAIC & US Navy—Autonomous Underwater Swarms
Swissgear—Inventory Forecasting
The Klarna Example: AI Contact Centers
Why “Let a Thousand Flowers Bloom” Fails
Human Expertise and RLHF
Data Quality Over Quantity
Proprietary Data & Security
On the Future of Knowledge Work:
Alex: “I said knowledge work is cooked. Not knowledge workers, not companies. Knowledge work as we currently know it.” (03:59)
On Why Projects Fail:
Matt: “The failure mode on that has been you let a thousand flowers bloom, none of them have an operational metric, and you kind of end up with a science project dynamic.” (62:39)
Matt’s Self-described Role:
“Our founder Francis has an idea of: do you have all the ingredients to build a cake but you don't have a cake? What we do is we actually bake the cake. ... We make AI work.” (78:09)
On Human Roles that Will Stay Longest:
Matt: “All the jobs that involve human interaction, physical work ... the job ecosystem around data centers, electricians, etc. is going to become way more in demand.” (70:18)
On RLHF:
Matt: “We are as a company, fully a believer ... that human-in-the-loop is going to be a feature, not a bug, for a long, long time. ... Autonomous agents will do all of this with no human in the loop ... I actually think you're going to need more and more humans at every step.” (44:41)
Matt Fitzpatrick’s Forecasts (67:08–69:27)
On Enterprise Caution:
Dave: “Enterprises are going to move super stupidly slowly compared to AI capabilities. ...That’s going to frustrate the hell out of Google and OpenAI.” (02:50, 46:53)
On Opportunities in Benchmark Creation:
Dave: "If you declare yourself the owner of [a benchmark] and then broadcast it, ...you become an instant star." (22:37)
On Expertise and Specialization:
Matt: "Human expertise becomes more and more important in many different areas. ...The human touch elements become more and more important." (51:33)
For more: Visit invisibletech.ai to learn about enterprise AI solutions. To follow moonshots and tech trends, subscribe at diamandis.com/metatrends.