
Alex
Why are the headlines telling us that businesses are getting no return on AI investment? And are AI agents finally ready to get to work? We'll cover it all with Box CEO Aaron Levie right after this. Oktane is the premier identity event, bringing together the world's leading minds to discuss the future of secure access. Instead of consolidating security into a single platform, a modern identity security fabric is the key to unifying your defenses. At Oktane, you'll learn how to extend that fabric across all types of identities, including the emerging threat of AI agents. Join in person in Las Vegas from September 24 to 26, or catch the keynotes and sessions online. To register and see the full agenda, visit okta.com/oktane. That's okta.com/o-k-t-a-n-e.
Marco Werman
You're used to hearing my voice on the world, bringing you interviews from around the globe.
Carolyn Beeler
And you hear me reporting on environment and climate news. I'm Carolyn Beeler.
Marco Werman
And I'm Marco Werman. We're now with you hosting The World together. More global journalism with a fresh new sound.
Carolyn Beeler
Listen to The World on your local public radio station and wherever you find your podcasts.
Alex
Welcome to Big Technology Podcast, a show for cool-headed and nuanced conversation of the tech world and beyond. Today we're going to talk about AI and its application in business: whether it's actually making a difference, and whether AI agents are a real thing. We have the perfect guest to do it, because we have Aaron Levie back with us, fresh off the BoxWorks AI event. And Aaron, it's great to see you, as always.
Aaron Levie
Thank you, Alex. Good to be here.
Alex
So did I add that, "BoxWorks AI event," or is it just called BoxWorks?
Aaron Levie
I actually like you calling it an AI event. It is just called BoxWorks, but anytime you want to jam an AI in there, we're good.
Alex
Okay, sounds good. You had a lot of AI news, and we'll get into that in a moment. But since you are talking with a lot of folks about AI applications in business, I want to run this MIT study by you and get your perspective on what's real and what's not. This is from Axios a couple weeks ago: "MIT study on AI profits rattles tech investors. Wall Street's biggest fear was validated by a recent MIT study indicating that 95% of organizations studied get zero return on their AI investment." They studied 300 public AI initiatives, trying to suss out the no-hype reality of AI's impact on business. 95% of organizations said they found zero return, despite enterprise investment of $30 billion to $40 billion in generative AI. This is a study that everybody in the business world is talking about. Do you think there's any validity to it? You're already shaking your head.
Aaron Levie
I'm shaking my head on, actually, like seven dimensions. We could parse each one. Let's do it. Maybe the first one, which is maybe the most funny, is the Wall Street element. Wall Street is actually completely schizophrenic on this dimension. Obviously a report like that scares them on one dimension, but there's an equal amount of Wall Street frenetic energy around the idea that AI will be so good that all of software is dead. So it's this very bipolar state: where are we in AI adoption, versus AI is going to be so powerful that there won't even be software business models, because everything will just be delivered by AI. And as with most things that have these extreme polarization elements, I think the reality is just way more nuanced. We are still early in the adoption curve of AI, as in the early curve of all of these types of technologies. You have lots and lots of proofs of concept. You have lots of trials of different technologies. People are trying to figure out which tool works for which use case. So by definition, you're in the Wild West, where there are lots of attempts at trying these technologies with various vendors and technology stacks. And many of those projects and pilots will absolutely fail because, by definition, they're pilots and we're still in the early phases. One interesting thing about this study was they saw a significant delta between companies that tried to effectively DIY their AI stack versus going with really applied solutions and use cases. And this is what we tend to find in our customer base. So I think there was maybe an initial theory of: well, AI will be relatively easy to get our arms around. We can build our own AI application. We'll do all of the vector embeddings of our data ourselves. We'll put it into a vector database. We'll manage the security and permissions of data access ourselves.
And before you know it, a company that wanted to deploy AI in a particular workflow might have 10 or 15 different pieces of software that they have to run and manage before a single user can actually interact with AI within that organization. That's probably an architecture that's not going to work. You need purpose-built solutions that solve tailored use cases. Those can be very big use cases, like all of AI coding, but you probably don't want to be in a position where you have to bootstrap this or build it all out yourself. And that was one of the recognitions in the survey. So I wholeheartedly disagree with any of the conclusions other than: you have to get your use cases right, you have to target the most effective areas for AI, and you probably shouldn't be building this technology yourself. But it's empirical on our end. We get to talk to customers every single day who are seeing immediate gains. We've talked to customers who can't actually present the expected ROI savings to their board, because the board won't believe the numbers based on how good they are. So they actually have to water the numbers down so they're more pragmatic and believable.
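[Editor's note] To make the DIY point above concrete: the stack being described, where you embed your documents, store the vectors, and enforce data permissions before retrieval, looks roughly like the toy sketch below. This is purely an illustration, not Box's architecture. `toy_embed` is a stand-in for a real embedding model, and the document texts and user names are invented.

```python
import hashlib
import math

def toy_embed(text, dim=64):
    # Stand-in for a real embedding model: hash character trigrams into a vector.
    vec = [0.0] * dim
    for i in range(len(text) - 2):
        h = int(hashlib.md5(text[i:i + 3].lower().encode()).hexdigest(), 16)
        vec[h % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

class MiniVectorStore:
    """In-memory vector store with per-document access control."""
    def __init__(self):
        self.docs = []  # list of (embedding, text, allowed_users)

    def add(self, text, allowed_users):
        self.docs.append((toy_embed(text), text, set(allowed_users)))

    def search(self, query, user, top_k=1):
        q = toy_embed(query)
        scored = [
            (sum(a * b for a, b in zip(q, emb)), text)
            for emb, text, allowed in self.docs
            if user in allowed  # permission filter happens before retrieval
        ]
        scored.sort(reverse=True)
        return [text for _, text in scored[:top_k]]

store = MiniVectorStore()
store.add("Q3 invoice totals for Acme Corp", allowed_users={"alice"})
store.add("Employee handbook: vacation policy", allowed_users={"alice", "bob"})
print(store.search("vacation policy", user="bob"))
# -> ['Employee handbook: vacation policy']  (the only doc bob can see)
```

Even this toy version shows why the DIY route sprawls: embeddings, storage, ranking, and permissions are each a real product surface on their own.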
Alex
So isn't that a terrible board? I mean, if the board can't hear the truth.
Aaron Levie
Well, the truth is so good that it doesn't sound credible. That's when the ROI is so good that you aren't going to be believed when you explain how this thing's going to work. So we're seeing examples all across the board, at least for our customers. We have the benefit of a very applied use case: we take documents and unstructured data, and then we have AI agents that can operate on that data to do things like extract structured data from your documents. Give us 100,000 contracts and we'll pull out the structured data fields in those contracts, or give us invoices and we'll pull out the key details in an invoice so we can help automate a workflow. Those use cases tend to be very high ROI, because either you weren't getting that data before or it used to be very expensive to do so. And AI is getting increasingly good at being able to execute that kind of task. So there's immediate benefit to customers: you can automate workflows much more easily, and as a result you can lower the cost of operations in some areas. So we tend to see a different set of outcomes based on the AI adoption within our customer base. But if you zoom out and think about all projects across the past couple of years, I do think you're going to get a mixed bag, just as a reality of how early we are in the space.
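[Editor's note] The shape of that extraction task, documents in and structured fields out, can be sketched as below. In practice this is done by prompting an LLM with a field schema; the regex version here is only a stand-in to make the input/output contract concrete, and the field names and sample invoice are invented for illustration.

```python
import re
from dataclasses import dataclass
from typing import Optional

@dataclass
class InvoiceFields:
    """Target schema: the structured fields we want out of each document."""
    invoice_number: Optional[str]
    total: Optional[float]
    due_date: Optional[str]

def extract_invoice_fields(text: str) -> InvoiceFields:
    # A production pipeline would hand `text` plus this schema to an LLM;
    # regexes stand in here just to show the contract.
    num = re.search(r"Invoice\s*#?\s*([\w-]+)", text, re.I)
    total = re.search(r"Total[:\s]*\$?([\d,]+\.\d{2})", text, re.I)
    due = re.search(r"Due[:\s]*([\d/-]+)", text, re.I)
    return InvoiceFields(
        invoice_number=num.group(1) if num else None,
        total=float(total.group(1).replace(",", "")) if total else None,
        due_date=due.group(1) if due else None,
    )

doc = "Invoice #INV-2041\nTotal: $12,500.00\nDue: 2025-10-01"
fields = extract_invoice_fields(doc)
print(fields)
# -> InvoiceFields(invoice_number='INV-2041', total=12500.0, due_date='2025-10-01')
```

Once each document yields a typed record like this, the downstream workflow (approvals, payments, reporting) can run on structured data instead of raw files, which is where the ROI comes from.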
Alex
Yeah. And it says internal builds fail at double the rate of external partnerships, so spot on there. People trying to piece this together on their own, versus doing it externally, are having a tough time, which sort of flies in the face of some of the conventional wisdom. I think the conventional wisdom was that you wanted to build internally, maybe with open source, so you could customize to your use case. But it turns out some of the off-the-shelf stuff is actually working quite well.
Aaron Levie
Yeah. A lot of the challenge with these types of surveys, or even with talking about architectures, is that you have to separate the tech industry from the non-tech industry: the non-tech industry being the consumers of these types of technologies, and the tech industry being the builders. So open source is insanely valuable, but not in the sense that a law firm should go off and build their own AI project. Using an open source model like that is just a recipe for disaster if we think that every single company on the planet is going to build their own technology to automate their workflows. And that has actually been the case for a lot of pilots, because we've been early in the technology and you haven't had applied solutions you could go deploy. But open source is extremely valuable for a company like Box, because we're powering technology for 120,000 customers; we actually do have the expertise internally to leverage those kinds of capabilities. So I would say the conclusion on the open source dimension is just that you probably shouldn't expect every company on the planet to DIY their own AI strategy. That's a recipe for not getting the returns and gains from an AI adoption standpoint. And then maybe the final point I'd make is that there really is a decent amount of change management required to get real gains from AI. This is not a panacea type of solution where you can take an existing workflow, drop AI directly into it, and all of a sudden that workflow will be 3x better. You usually do have to re-engineer the work to take advantage of AI. And the conclusion I've come to more and more recently is this: I think we had a feeling maybe two or three years ago that AI was going to learn everything about how we work, adapt to our workflows, and then bring automation to them.
And I think, realistically, we will increasingly have to modify our work, hopefully incrementally, but in some cases meaningfully, to fully take advantage of AI. That sounds hard on one hand, but for the companies that do it, the ROI is going to be fairly massive. Take AI coding as maybe the most obvious example right now where you're seeing productivity gains: the way AI-first engineers tend to work is pretty different from how you engineered two or three years ago. The engineer really becomes more of a manager. You're deploying agents to go off and work on large parts of the code base, and they come back with a bunch of work that you review. So if you don't change your workflow as an engineer, to take advantage of background agents, how you give them the right kinds of prompts to actually execute on their task, the new ways you should think about your code base, and handling the specifications and rules of what the AI agent should do, you're probably not going to get a 2x or 5x gain from AI. We will actually have to re-engineer some of our business processes to make agents effective, as opposed to thinking agents will just drop into our processes and automate everything that we're doing.
Alex
By the way, you've brought up pilots a couple times, and I think it's important to talk about, because this study was not just pilots; it was 95% of organizations getting zero return on AI investment. I think the pilot thing is interesting because it's natural that pilots are going to fail. In fact, some listeners have given me feedback on this, because I talk often about how only 10 to 20% of AI pilots get out the door into production. And that might be a good number, because you're obviously going to have some trial and error in the early days.
Aaron Levie
Yeah, and to be clear, I'm using pilots colloquially in the sense that we're just so early in the technology that when we talk to customers, what a lot of times they have so far deployed is the equivalent of a pilot.
Alex
Just because of how few are literally organization-wide?
Aaron Levie
Yes. Well, organization-wide is hard for one centralized survey taker to represent. That's why, again, the survey is great as an interesting conversation starter. But if you actually tried to assess how the respondent is answering the question, what their way of measuring that productivity is, and whether they've surveyed all of the end users who are just using ChatGPT in an unsanctioned way, it's not possible to capture all of that. So it tends to represent the more centralized, and I think more likely pilot-oriented, type of projects, just because of how early we are. The word "agents" came onto the scene less than a year ago, so we're just early in a lot of these spaces. Again, I think it's a fantastic survey because it gets a conversation going. But if the takeaway is to slow down using AI, or to do anything other than realize what you should mitigate from a risk standpoint, then the problem is that all it's going to do is cause some companies to move even more slowly, and then other companies will just outrun them. So the risk is now on the listener to decide what they want to do about that survey.
Alex
Yeah, and I can tell you one more thing I found super interesting about this study, which has been sort of underappreciated. It says official LLM purchases cover only 40% of firms, yet 90% of employees use personal AI daily, at least among those surveyed. Which is just so interesting, because it means there's more personal use and more interest among individuals than among companies to get this stuff into production. You obviously have a reaction here, so let's hear it.
Aaron Levie
Well, I just think that's empirical revealed preference, so you don't even have to survey once you know that. Why are people going off and using AI in a personal productivity sense at that rate? It's because they're getting value from it. That is now sort of in the baseline of how people are working. It's unquestionable that if you just eliminated AI today, you would notice: wow, okay, I actually have to go and do that three hours of research that I used to be able to kick off as a deep research project and check back in on after five minutes. So it's empirical that we're choosing to use these technologies on a daily basis, because they're adding that productivity. And I would argue that what we've seen with AI thus far is barely scratching the surface of what is going to start to happen as you deploy these technologies.
Alex
But do you think the use in business could potentially just be individuals using, let's say, ChatGPT on their own, versus scaled enterprise use of large language models? Or do you think it will be some blend in the future? You're obviously watching this happen on the other side of things.
Aaron Levie
No, I think that we are in the earliest phases of even the diffusion of the technology itself, of the basic use cases. Hey, when you're going to go research a customer, why don't you get a full account plan, instead of just saying, okay, this person works at this company, they're interested in these things, and these are the trends of that industry? Why not ask an AI system to generate the full plan? That's super powerful, but also relatively basic if you think about how people work and the full scope of workflows that people do. One really interesting example of, again, how early we are: Claude this week announced a new capability that will generate files for you. And even though we're nearly three years into the ChatGPT moment, it's the first time an AI system can, I believe, reliably generate a high-quality document in the form of a Word document or PowerPoint presentation. So we're nearly three years in, and it's the first time ever that you could generate something you would look at and say, oh, that looks like a good presentation. We are only at the very, very beginning stages. Now, it'll still take a couple of years, but imagine a technology like that begins to ripple through corporations. In the future, before you go and present whatever product you're selling to a customer, instead of spending one or two hours doing a bunch of research and making your PowerPoint file, you go to an AI agent and say, I'm about to go sell to this customer, generate this presentation for me. You kick that off, and three minutes later it's done for you. This is going to show up in all of our workflows every single day, in almost everything that we're doing.
So coders are getting the first lens into what the future looks like, earliest, because they're wired to take advantage of these tools, and AI coding has been the first breakout use case. But that same dynamic, where you go to an interface, talk to an agent, and it goes and executes multiple steps of work for you, will start to emerge within all of knowledge work over the coming years. I'm actually probably a pragmatist in the sense that it will not be an instant, overnight transformation of work. It will take years of change management. We just hosted our conference this week, as you noted, and it happens to be a crowd that, by definition, is forward-leaning: early adopters of technology. But that represents a small fraction of the total economy. It will take years before all of the banks, all of the pharma companies, all of the law firms start to get wired up in this AI-first way. But, I mean, unequivocally it's going to happen, and there's nothing that will slow that train down.
Alex
All right, let's talk a little bit more about this using-Claude-to-generate-documents use case. The example that you gave was using one of these to go in and sell to a client. Now, I would imagine most organizations have their PowerPoint templates and the data baked in. So even if I were to go into Claude, upload my pricing spreadsheet, my inventory spreadsheet, and a document about positioning, and say, make a PowerPoint based off of this, I'm sure it would do a good job. But how practical is it to say this is going to be the way people do their work, versus something that might look like a party trick, given that you're going to use the documents you already have when you actually go out into the market?
Aaron Levie
Oh yeah. The way this will actually show up, and I can't represent the exact date it will happen, is you'll just go to Box and say: here's my sales presentation template, here's the new client information, please generate a PowerPoint presentation with that. You'll do that with your existing data. This is not some kind of one-off, vibe-coded document. You will use your existing assets as the source material for the next document you generate, and you'll go and review its work, and that'll take you three minutes. But it will have saved you an hour or two of all the time it took to do the customer research, move around all the graphics, and put the relevant information in place. That will just be done for you. Multiply that over a million people who do that per day in some sector of the economy, and you'll see that's how you get tens of millions of hours of productivity gained within the economy.
Alex
And how are you feeling about the trustworthiness of these models? Because you've talked a couple times now about how you could use deep research to prepare you for something, or use these models to generate a PowerPoint, and then spend a couple minutes checking them over. Are you at the point now where you think the outputs of these models are trustworthy enough that that's all it takes?
Aaron Levie
I think so, and this is where I get very excited about what is now obviously in the zeitgeist: context engineering. As long as you are really good about what context you're giving the AI, and how you are grounding the AI in trustworthy data with the right kinds of prompts and a high enough quality model, you can eradicate the vast majority, if not all, of hallucinations or accuracy issues. In our case, everything we do at Box treats your existing data as the source material for the AI agent; it's the source context for the AI agent to be effective. So if I take an existing PowerPoint document that's our sales presentation, and I say, modify this for a new customer, and you do that with a frontier model, a reasoning model with some degree of thinking mode, I would posit that 99% of the time its errors or failures will be infinitesimally small. That's just a solved problem at this point. And it is still easily worth the five-minute trade-off, for the couple hours you save, to go and review its work. We actually have this incredible front-row seat, watching what the future looks like, with coding. So talk to the brand-new startups. I don't know if you do this; I know you get to spend your time with the Demis Hassabises of the world and whatnot, but go talk to a five-person startup that's brand new. What's exciting is they are working in the craziest ways I've ever seen in my entire life. I was talking to a nine-person startup the other day that estimates they're executing, at a minimum, at the size of about a 100-person company. And that was probably conservative when you do the underlying math. It's because each of their engineers now has the output capacity of five or 10 or 20 engineers' worth of work. But they are working in a completely different way.
They are managers of AI agents. They spend their time writing really good specs for what they want to build, they spend real time on the design and architecture of their software, and then they spend a lot of time reviewing the output of the agent. Not every area of knowledge work will look exactly like that. But imagine, in sales, in marketing, in legal work, that your role is to manage agents doing a lot of the underlying data preparation, research, and creation type of work, and then your job is to review that work and put it together in a broader business process. That will actually be what a lot of work looks like in the future. And this idea of hallucinations or errors will be no different from the fact that I sometimes have to review other people's work, and other people review my work, and I have errors in the presentations I create that somebody catches: they see a misspelling, or they see that I changed the name of a customer in the wrong way, and they fix it. We will be doing that for AI agents. So it's this flip of the model: we thought AI agents were going to review our work and incrementally make us more productive; instead, we will be the reviewers of the AI agents' work. We will be the editors, we will be the managers, we will be the orchestrators. And that's actually how you then get the productivity gains. So I'd say watch the AI coding space, watch what startups are doing to get leverage, and then think about that against the broader economy.
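[Editor's note] The "grounding" move described above, giving the model trustworthy source material as context rather than asking it to answer from memory, amounts to building the prompt along these lines. This is a hedged sketch: the prompt wording and the `retrieved` snippets are invented, and the final model call is only indicated in a comment, since it depends on whichever API you use.

```python
def build_grounded_prompt(question, retrieved_docs):
    """Context engineering in miniature: put the trusted source material
    in the prompt and instruct the model to answer only from it."""
    context = "\n\n".join(
        f"[Source {i + 1}] {doc}" for i, doc in enumerate(retrieved_docs)
    )
    return (
        "Answer using ONLY the sources below. "
        "If the answer is not in the sources, say so.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )

# Invented snippets standing in for documents retrieved from a content store.
retrieved = [
    "The 2024 MSA sets the renewal term at 12 months.",
    "Renewal pricing is fixed at the prior year's rate plus 3%.",
]
prompt = build_grounded_prompt("What is the renewal term?", retrieved)
print(prompt)
# A real pipeline would then call: answer = call_llm(prompt)
```

The point of the pattern is that the model's answer is constrained by the retrieved sources, which is why better retrieval and context tends to mean fewer hallucinations.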
Alex
You know, it's really interesting, Aaron, because the last time we spoke, you told me about this person you knew who was basically building a company on their own using AI coding tools. And I was in the process of writing this profile of Dario at Anthropic, which you're quoted in, and I went out and found a developer doing something quite similar, using Claude Code to build on their own. So this is clearly happening, to the point where Anthropic now has to put some rate limits on. This is clearly a thing that's happening.
Aaron Levie
And this is the thing: again, I love the MIT survey. I think it's great; it's a fun conversation topic. But the one travesty would be if people missed that what you just said is actually happening on the ground, and didn't start to pay attention to what that's going to mean as it ripples through corporations, and to how people should probably start to think about re-engineering workflows for a world of AI agents. This happens in every single technology wave, which is actually why you have early adopters and early innovators, and why you have laggards. The early adopters and innovators are going to read your Anthropic piece and see, oh, this actually is a real trend. And the laggards are going to read the MIT piece and say, oh, I've been vindicated. Some companies will then get those early returns at a much faster rate, and other companies can wait. And sometimes that means your company gets disrupted, and sometimes it doesn't, because you actually have some proprietary capability as an organization. If Pfizer or Eli Lilly took a little bit longer to adopt AI as a result of wanting to be more pragmatic, that'll be totally fine. They're not going to get disrupted. They have enough market position, they have enough distribution; they can afford to wait for this technology to be more baked. But if I'm a startup right now, I'm probably going to use that as my advantage as much as possible, to try to run circles around a larger incumbent. And this is what creates this nice tension in the market, the creative destruction in every wave of technological change.
Alex
Okay. I definitely want to speak a little bit more about what the definition of an agent is and how you're rolling them out at Box, and also get your reaction to GPT-5. So let's do that right after this.
Alex
And we're back here on Big Technology Podcast with Box CEO Aaron Levie. Aaron, before we get into agents and GPT-5, let me just start with a basic question. If this is already happening in business, where you're basically finding ways to get the AI to do work on its own, pull information from different data sources, and present it coherently, why do you think it's been so difficult for consumer companies, like, say, Amazon with Alexa Plus and Apple with Apple Intelligence, to put this together as something on-device, a consumer product that does similar activities? Because they've all promised it, but it's not quite there yet.
Aaron Levie
I think the fact that the technology can exist is different from the execution required to bring it to life. We all get to have a front-row seat on what the frontier models can do, and you have companies that can package those up for these applied use cases. But if you're a company with tens of millions or hundreds of millions of users of your product, and consumers who have a certain expectation, that is a lot of execution gap to close between the frontier model and delivering it to your end customer in a way that is reliable, trustworthy, and affordable. So I think the bigger companies are all going through their own version of that motion. I'd also imagine that, given the space is moving so fast, there's probably some understandable degree of indecision: one day one model is on top, the next day a different model is on top, and another day another model breaks through. You probably want to make sure that by the time you land on a final architecture, it's the sustainable, long-term architecture. So to some extent time is on your side, up to a point, because you might want to wait and see who falls out and who keeps going. But I don't think the spaces of the companies you just mentioned have been so utterly disrupted that they can't catch up once they land on a final architecture. We'll have to see how they execute through this.
Alex
And so for business, is it that there are more prescribed use cases? I think with a phone, if you're trying to get these proactive notifications, you're looking at a massive universe of data, whereas you're more concentrated in business. Or what's the difference?
Aaron Levie
Well, actually, I wouldn't say there's a difference. I would say even in business we're insanely early. We have to process how early we are. The breakouts so far have been ChatGPT for consumers; coding agents for very wired-in engineers who are very online and paying attention to everything going on; and then early adopters across the economy. Most of the agents being deployed in the enterprise are being deployed by early adopters. Maybe you can flash this up or something: Geoffrey Moore came up with this idea of the technology adoption curve, or at least popularized it. It has multiple categories for where a company or a group of individuals will be. You have early innovators and early adopters, then you have a chasm, then you have pragmatists and the early majority, and then you have laggards. And we are in the early adopter phase, the earliest phase of jumping over the chasm on some use cases. But we have to remember there's this chasm, and what happens is the early adopters, the people that we all hang out with and talk to all day, will try everything. We're going to try these crazy goggles, we're going to put magnets on our heads, we're going to do the craziest things. We're going to wear Google Glass. And that actually tells you almost nothing about whether the thing will jump over the chasm. You have to see what makes it to the early majority, those pragmatists who really adopt things at scale. The kinds of technologies that have clearly broken through are ChatGPT, products like Cursor, and a bunch of these next-gen research-agent type things; Perplexity has done well in that kind of early majority. But we are so early in terms of AI agents jumping over the chasm. Some won't make it; some will. But I would say that business is not particularly moving faster than the examples you just gave.
I just think we can see lots of examples of it, but they're usually in that kind of early adopter type category.
Alex
Right. And so the week we're talking, you at Box are releasing a number of different agents. Let me start this discussion by just asking you: what is an agent? Because it does seem like an overused term. And even myself, who's in this all the time, I don't fully have clarity on what that word actually means.
Aaron Levie
I think we should anticipate that it's fully overused. It is now the new term of art for talking to an AI system that is doing work for you. This will be the main term that we use going forward as an industry, and not because it's a buzzword, but because it's actually a useful term. It's a definable object that is doing automated work for you. That could in some cases be as simple as answering a question, but I think most people in the tech industry would generally argue that it should be doing some degree of work and looping through the AI model multiple times to do that work. And so that could be everything from, very clearly, something like Claude Code, or Cursor has an agent, or Replit has an agent, where you give it a task like "build me a website that has these qualities" and it will go off and do weeks' worth of human work in 10 minutes. And that's an agent that is managing that whole process, looping through the model multiple times, keeping track of what it's doing, updating its memory in the process. So that's an agent in coding. And we're going to see that same kind of agent architecture emerge in law, in healthcare, in finance and education, where you can deploy agents to go off and do work for you. And there will be a critical axis, which is how much work can the agent do before you have to intervene, modify, and kind of repoint it in the right direction. A lot of that work right now can be maybe a couple minutes long, but we're seeing examples where agents could be running for tens of minutes or maybe even hours and effectively drive better and better, higher-quality output. So I think that's a way to think about agents, and these are going to be very pervasive in the coming years. But this is really the first year, 2025, where we could even really be talking about it seriously.
And I think Andrej Karpathy probably phrased it best: we shouldn't think about this as the Year of Agents, we should think about it as the Decade of Agents. That's probably the right way to think about it.
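The loop Levie describes, calling the model repeatedly, executing a tool, and folding the result back into memory until the task is done, can be sketched in a few lines. This is a toy illustration, not any product's actual API: the "model" here is a hard-coded stand-in for an LLM, and all names are hypothetical.

```python
# A toy agent loop: ask the "model" for the next action, run the tool
# it picks, append the result to memory, stop when it declares done.

def toy_model(memory):
    # Stand-in for an LLM: decides the next step by counting how many
    # tool results are already in memory.
    steps = sum(1 for m in memory if m["role"] == "tool")
    if steps == 0:
        return {"type": "tool", "tool": "search", "args": "site spec"}
    if steps == 1:
        return {"type": "tool", "tool": "write", "args": "draft page"}
    return {"type": "done", "result": "website built"}

def run_agent(task, model, tools, max_steps=10):
    memory = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        action = model(memory)            # loop through the model
        if action["type"] == "done":
            return action["result"]
        result = tools[action["tool"]](action["args"])   # do the work
        memory.append({"role": "tool", "content": result})  # update memory
    return None  # hit the step budget: a human should intervene

tools = {"search": lambda q: f"found: {q}", "write": lambda t: f"wrote: {t}"}
print(run_agent("build me a website", toy_model, tools))  # website built
```

The `max_steps` budget is the "critical axis" from the conversation: how long an agent can run before a human has to step in and repoint it.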
Alex
Right, the Year of Mobile became the Decade of Mobile. But then eventually we just started using mobile.
Aaron Levie
Yeah, but again, when you just said the Year of Mobile, when did that matter, right? Some people said that was in 2002, but it wasn't really realistic until 2006 and 2007, when you had the iPhone. And I think, fairly, many other people are actually convinced we already have our iPhone for agents. We don't need any kind of new breakthrough architecture. We have an architecture that already works as the core scaffolding for agents. So we can start the decade clock now, but it will be a full self-driving type problem. Obviously Waymo kicked off, I don't know, a decade or a decade and a half ago, and only this year is it accessible in suburban Silicon Valley. So what took a decade or a decade and a half? It was just lots of engineering work, lots of miles on the road, lots of improving every single dimension of the accuracy and the intelligence of the system. We are going to see the same thing for knowledge work. It's going to take years. The early adopters will get the early returns, the pragmatists will use it once it sort of works without a lot of hand-holding, and everybody will land somewhere in the middle of that spectrum.
Alex
Okay. And so I watched a chunk of your presentation this week, and some of the agents that you're talking about enabling companies to deploy will, for instance, take a look at an application to, say, rent an apartment, or look at some property records and then do tasks there, or create reports, looking at clinical tests and trying to pull out issues. So talk a little bit about how the process to create these works. And is this still in the demo phase, or is this actually real?
Aaron Levie
So maybe second question first. We made a number of big announcements this week. Some of the products and capabilities that we announced are fully GA right now, so customers can already start to use them. Some of it we kind of give a little bit of a crystal ball view into the next couple of quarters of the product that we're getting out there. As an example, we have an AI agent right now that any customer can go and use, which is a data extraction agent. So you can give us contracts or invoices or medical data, and then we have an AI agent that works through that content, pulls out the critical data from those documents, and then lets you go and automate a workflow around that. What we announced at BoxWorks was a new capability called Box Automate. The idea of Box Automate is this: it's very, very powerful to have one-off agents that can help you review a document or generate a proposal or generate a sales plan for a client based on data. That's super powerful. But what's even more powerful is that I can drop many of those agents into a full business process. So what Box Automate lets you do is actually define your business process within Box. It could be a client onboarding workflow, it could be an M&A due diligence review process, it could be a healthcare patient review process. And you define that workflow within Box Automate. It's a drag-and-drop kind of workflow builder. And then at any point in the process you can bring in an AI agent to do work within that process. And so one thing that is very important with AI agents is that they need the right context to be effective. So our system allows you to get that context to agents from your enterprise content. Your marketing assets, your research data, your contracts, your invoices, that becomes very important context for agents.
So Box Automate lets you basically build these agents on demand, or on the fly, in a workflow that leverages your existing content, and then we can start to help you automate a bunch of knowledge work tasks around the enterprise.
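The pattern Levie describes, a defined business process where some steps are deterministic rules and some are agent calls, can be sketched as a simple step pipeline. This is a hypothetical illustration of the idea, not Box Automate's actual API; the extraction "agent" here is a trivial parser standing in for a real one.

```python
# Sketch of a workflow pipeline with an agent-style step dropped in:
# step 1 extracts structured fields from a document (the agent's job),
# step 2 applies a deterministic business rule to the result.

def extract_fields(doc):
    # Stand-in for a data-extraction agent: pull "key: value" pairs
    # from a contract-like document.
    return dict(line.split(": ", 1) for line in doc.splitlines())

def flag_issues(fields):
    # Ordinary business logic running alongside the agent step.
    return [key for key, value in fields.items() if value == "MISSING"]

def run_workflow(steps, payload):
    # Each step's output feeds the next step, like a drag-and-drop
    # workflow builder wiring boxes together.
    for step in steps:
        payload = step(payload)
    return payload

doc = "party: Acme Corp\nterm: 24 months\nsignature: MISSING"
workflow = [extract_fields, flag_issues]
print(run_workflow(workflow, doc))  # ['signature']
```

The point of the sketch is the shape: the agent step produces structured output that downstream, non-AI steps can consume, which is what lets one-off agents compose into a full business process.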
Alex
Now a lot of the early reviews around GPT-5 were that it was sort of built to do these types of things, like as a foundational layer for this type of work. Right? The reviews we read early on were that it just does stuff, and there have been people who've noticed that when you're in ChatGPT using GPT-5, you literally can't get an answer where it doesn't say, "Can I do something for you?" So I'm actually curious, Aaron, what your response has been. The last time we spoke was pre-GPT-5. What has your feeling been about this new set of models? Really, it's a set of models. And I'm curious what you make of the fact that so many people were disappointed early on.
Aaron Levie
Well, yeah. On the disappointment, or the kind of online zeitgeist, which interestingly has already shifted quite a bit: a lot of folks have updated their views on GPT-5, and I think Codex has come out very strong recently on the coding agentic side. I think we've gotten used to, and been hooked on, these incredible jumps and breakthroughs over the past year or so. If you think about it, we went from GPT-4 to GPT-4o to o1 and o3 and then GPT-4.1, and each of those, on a different axis, was actually a pretty meaningful step function. So if you had just taken GPT-4 and then jumped to GPT-5, it would have looked insanely exponential. But we got these points along the way that effectively gave us an early preview into what GPT-5 would ultimately become, which is a thinking model, a chain-of-thought model with way higher quality coding skills and a bunch of capabilities on critical dimensions of work. And so I think it was mostly just driven by the fact that we got lots of incremental steps, or step-function steps, on the path to GPT-5, and GPT-5 was just the culmination of a lot of those breakthroughs. So again, I think it's probably more psychological than empirical. Like, I think if we had gone from three to four to five, it would be the most vertical axis we've ever seen. But it was really those steps along the way that maybe caused a little bit of that reaction. In our world, we test every single model on a number of evaluations where we give the model different types of enterprise data: contracts, financial documents, research materials, internal memos, those types of things. And we ask the model a series of questions about that document or data. And we saw meaningful improvements from GPT-5 versus GPT-4.1, as an example, on our evals. So for us it was multiple points of improvement on a number of our key tests.
And those improvements then translate into real-life improvements for customers, where all of a sudden it'll mean that when you're a healthcare provider using GPT-5 on unstructured healthcare data, you're going to get better results than you got before, or when you're using it on your contracts, you're going to get better results. And so in a number of spaces where expert analysis was required, in healthcare or law or financial services, we saw improvements, or, in a more general sense, if you needed logic or reasoning or math, it was an improvement on those dimensions as well.
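The eval Levie describes, asking each model the same questions about enterprise documents and comparing scores, reduces to a simple accuracy harness. This is a toy version for illustration only: the "models" below are stand-in functions, not real APIs, and the exact-match scoring is a simplification of how such evals are typically graded.

```python
# Toy document-Q&A eval: run every (document, question, answer) case
# through a model and score exact-match accuracy.

def score(model, cases):
    hits = sum(model(doc, question) == answer
               for doc, question, answer in cases)
    return hits / len(cases)

cases = [
    ("Invoice total: $1,200", "What is the total?", "$1,200"),
    ("Term: 36 months", "What is the term?", "36 months"),
]

# Stand-in "models": the old one only handles total questions,
# the new one answers every case.
old_model = lambda doc, q: doc.split(": ", 1)[1] if "total" in q else "unknown"
new_model = lambda doc, q: doc.split(": ", 1)[1]

print(score(old_model, cases), score(new_model, cases))  # 0.5 1.0
```

Running the same fixed case set against each new model release is what makes claims like "multiple points of improvement" comparable across versions.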
Alex
Can I get a quick gut check from you on the economics of the AI industry right now? I mean, we're talking at a moment where, we just discussed this on the Friday show with Ranjan, OpenAI's losses are now going to total $115 billion through 2029. Sorry, it's cash burn: $115 billion through 2029, $80 billion higher than it previously expected. It's expected to make like $10 billion this year. But it just signed a $300 billion deal with Oracle that turned Oracle into a nearly one-trillion-dollar company almost overnight and made Larry Ellison the richest person in the world, above Elon Musk. How does this make sense?
Aaron Levie
Well, I think it makes sense if you believe, like I do and certainly others, Jensen clearly, Sam, even Elon, I think, would believe, that this is the single biggest technology that we've probably ever had access to. And so think about this as sort of a third industrial revolution, where for the first time ever we can bring automation to knowledge work. Just think about that for a second: we bring automation to knowledge work. Everything about the world of knowledge work was always basically limited by how fast we as humans could work. We could type into a computer, put data into a system, somebody else reads that data, it moves along in some kind of process. That was the speed of knowledge work: how quickly we could type or read information and then do something in the real world with that data. That was the pace that knowledge work could happen at. And so every field that we know of in knowledge work, healthcare experts reading medical diagnoses, life sciences experts doing research on clinical studies, lawyers trying to find facts about a case or working through intellectual property, an engineer trying to generate code and read product specifications, all of that work has always been constrained by how fast we as individuals can do that work ourselves. For the first time ever, with AI, we can bring automation to effectively all of that work. And that automation can be tuned based on just how much compute we throw at the problem, and then of course how good our data is and how effective our systems are at getting that data to the AI. But in a world where you can toggle compute and then get different levels of automation and effective output, with work getting done at a way lower cost than what people can do, that is the biggest breakthrough we've ever had in the economy and in the post-industrial world.
And so $100 billion of loss, let's say, to get to that point of saturation where that technology is out there: that's actually a very small number when you think about the size of the economy for all of healthcare, all of law, all of life sciences, all of financial services, all of engineering. So I think that's how these technology companies are underwriting this. And the losses are a choice, to be clear; that's very obvious. They are choosing to lose that money, and they're doing it for a strategic reason. That's at least their decision. The strategic reason is that this is such a valuable market to own and to dominate in that they would rather build up capacity, and in many cases subsidize usage, let's say in free consumer tiers of ChatGPT, than charge for everything at today's rate of cost and make sure everything is profitable. That's a choice. They could decide to charge for everything. They would get less adoption today. They would instantly be a more sustainable business. But enough people believe that the prize is big enough that it's worth actually doing all of the research expenses, all of the data center expenses, and the subsidization where necessary to drive that adoption and demand. And it's a go-big-or-go-home type of bet. Clearly very smart, very economically rational firms, individuals, sovereign wealth funds believe that that bet is worth it. I'm probably on the side that the bet is worth it because of, again, how material an economic impact this technology can have. And then we'll obviously see how it plays out with any individual player in the space.
Alex
Folks, you can learn more about Box's offerings at box.com. There's a video playing on the homepage right now that talks a lot more about the things that Aaron and I have discussed here today. Aaron, so great to see you. Thanks again for coming on the show.
Aaron Levie
Thanks, Alex.
Alex
All right, everybody, thank you so much for watching. We'll see you next time on Big Technology Podcast.
Host: Alex Kantrowitz
Guest: Aaron Levie (CEO, Box)
Date: September 17, 2025
This episode dives into the provocative claim from a recent MIT study that 95% of businesses are seeing no return on their AI investment, despite massive enterprise spending on generative AI. Host Alex Kantrowitz sits down with Box CEO Aaron Levie, fresh from the BoxWorks conference, to unpack the nuance behind the headlines. The discussion covers the real-world state of AI adoption, why internal builds often fail, the evolution and promise of AI agents, the economics driving the industry, and what’s next for both business and consumer applications.
The episode paints a nuanced, realistic picture of where AI sits in the enterprise landscape: exciting ROI is happening, but only in focused, well-integrated use cases, and mostly among early adopters. The term "agent" is quickly becoming central, marking a shift from AI as a tool to AI as an autonomous collaborator. Yet, the journey for most organizations is just beginning; success will come to those who systematically rethink their processes and embrace the hard work of change management, not to those waiting for a plug-and-play panacea.
For more on Box’s AI offerings: box.com
Listen to future episodes of Big Technology Podcast for continued analysis of AI’s ongoing impact.