Everyday AI Podcast Episode 725: Measuring AI ROI: Why You’re Doing It Wrong and the 7 Steps to Fix It (Start Here Series Vol 11)
Host: Jordan Wilson
Date: March 3, 2026
Main Theme:
How to accurately measure the return on investment (ROI) of AI in your business—why most companies get it wrong, and a practical 7-step blueprint to fix it.
Episode Overview
Jordan Wilson confronts one of the most debated questions in business AI adoption: "Is there really ROI in AI—and if so, how do you prove it?" Drawing on benchmarks, debunked studies, and industry-leading surveys, Jordan dismantles common misconceptions and provides listeners with a structured, actionable approach to truly measure AI's impact. He urges businesses to move beyond intuition or "vibes" and get rigorous about measurement, emphasizing that "AI is gravity at this point."
Key Discussion Points & Insights
1. The Myth of Elusive AI ROI (00:17–08:30)
- Current Situation: Most businesses say AI "feels" like it’s working—faster output, more work done—but struggle to prove real ROI with data.
- Outdated Playbooks: Companies are using obsolete digital transformation strategies that don’t suit the speed or nature of AI in 2026.
"Chances are you can't answer that and the reasons are actually very simple. It's because businesses are using the same digital transformation playbook they've always used when it comes to AI. And, well, that playbook is useless in 2026." (00:37)
2. The GDPval Benchmark: Real-World AI Performance (03:36–06:50)
- What Is GDPval? A benchmark from OpenAI that pits AI models against expert humans (averaging 14 years of experience) across 44 occupations in 9 sectors.
- Findings: Top AI models tie or win 70% of the time in blind evaluations—and do tasks up to 100x faster.
- Implication: The data shows dramatic ROI in productivity and quality, yet businesses are still debating the basics.
"If the AI model is the same or better 70% of the time and it's a hundred times faster, that's the math. So why are we still even debating this concept of is there return on investment?" (06:40)
3. Debunking the "No ROI from AI" Viral Study (06:50–13:20)
- MIT "Study": Widely reported that "95% of enterprise AI pilots delivered zero ROI" (August 2025). Led to stock market drops and panic.
- Reality: The study was based on just 52 qualitative interviews, not quantitative data, and functioned largely as a marketing pitch for MIT’s own AI product.
- Contrast with Real Data:
- IDC: $3.70 return for every $1 invested in AI.
- Wharton: 74% of enterprises report positive ROI.
- Google Cloud: 74% see ROI in GenAI within 1 year.
- Deloitte: 84% getting ROI from AI investments.
"This quote unquote study was based on 52 qualitative interviews. There's no quantitative piece to that 95%... It was a vibe study." (10:09)
4. Where Does AI ROI Go? The Invisible Productivity Problem (13:20–19:40)
- Hidden Productivity: Much AI ROI is "pocketed" by workers in remote/hybrid settings—time savings turned into free time, not visible company gains.
- Job Roles Are Outdated: 89% of organizations haven’t updated roles for AI. Outputs are still measured using pre-AI standards.
"Workers are pocketing it. That's where the ROI is going, right? Because true productivity and true ROI, well, it requires results-driven metrics, not time-based management." (15:23)
5. The True Measure of AI ROI: Metrics That Matter (19:40–23:25)
- It's Not 'Prompts Sent' or 'Utilization Rates': Measuring AI ROI isn't about usage dashboards, but about actual outcomes—time saved, costs reduced, revenue increased, and risk avoided.
- Simple Formula:
- (Time saved vs. pre-AI baseline) × hourly rate, minus AI/tool costs.
- Focus on cost per task, throughput, and error rates.
“Utilization is not the metric that pushes it. It's: are you saving time? Are you increasing revenue and avoiding risk?” (20:50)
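The episode’s “simple formula” can be sketched in a few lines of Python. All of the numbers below are hypothetical placeholders for illustration, not figures from the episode:

```python
def ai_roi(hours_saved_per_task, tasks_per_month, hourly_rate, monthly_ai_cost):
    """Net monthly return: labor value of time saved minus AI/tool spend."""
    gross_savings = hours_saved_per_task * tasks_per_month * hourly_rate
    return gross_savings - monthly_ai_cost

# Hypothetical example: 2 hours saved per task, 50 tasks/month,
# a $60/hour loaded labor rate, and $500/month in AI tooling.
print(ai_roi(2, 50, 60, 500))  # 2 * 50 * 60 - 500 = 5500
```

The point of the formula is that every input requires a measured pre-AI baseline; without one, “hours saved” is a guess and the result is a vibe, not a return.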
6. Five Reasons AI ROI Measurement Usually Fails (23:25–25:50)
- No pre-AI baseline exists.
- Slow, year-long pilots kill momentum.
- Overreacting to a single lucky success.
- Vanity metrics (like prompt count) dominate.
- Shiny object syndrome—chasing new models before full implementation.
- Root Issue: These are not technology problems; they are human and organizational failures to rethink measurement and process.
7. The Prerequisite: The BASE Approach (25:50–28:55)
- BASE = Baseline Assessment of Standard Execution (pre-AI):
- Meticulously time and document the process before AI.
- Track error rates, rework cycles, costs per completed task.
- You can’t retroactively create a clean baseline once AI is in play.
“You need to do it. Before you implement AI, you need to measure how long it takes humans to go through and do these certain projects or do these certain tasks.” (27:45)
The 7-Step Blueprint for Measuring AI ROI
(Detailed at 29:00–38:55)
Step 1: Define
- Rigidly define the outcome, rubric, and KPIs before testing.
Step 2: Measure Human Baseline
- Document multiple employees performing the task without AI; average their time, errors, and costs.
Step 3: Build Real-World Cases
- Create 20–40 challenging, messy test cases, including edge cases.
Step 4: Configure the Production Workspace
- Standardize models, accounts, permissions—ensure a controlled and repeatable environment.
Step 5: Run Tests Three Times
- Run each use case three times with model memory and personalization disabled; require 'proof artifacts' (e.g., logs, outputs).
Step 6: Grade Blind and Standardize
- Human judges grade AI and human outputs blind, using the agreed rubric.
Step 7: Retest Monthly
- After every major model update, repeat the process and use a rolling 3-month average for tracking.
“You have the input and the output, right... You multiply the time that it takes the humans to do it on the AI side...minus the cost of whatever AI tools that you're using. All right, there's your augmented cost and then you compare it to your human only cost.” (36:20)
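The comparison Jordan describes in that quote—augmented cost versus human-only cost—can be sketched like this. The task durations, rates, and tool costs below are illustrative assumptions, not numbers from the episode:

```python
def cost_per_task(hours_per_task, hourly_rate, tool_cost_per_task=0.0):
    """Fully loaded cost to complete one task."""
    return hours_per_task * hourly_rate + tool_cost_per_task

# Hypothetical human-only baseline: 3.0 hours at $60/hour.
human_only = cost_per_task(3.0, 60)
# Hypothetical AI-augmented run: 0.5 hours at $60/hour plus $2 of AI usage.
augmented = cost_per_task(0.5, 60, 2.0)

savings_per_task = human_only - augmented
print(human_only, augmented, savings_per_task)  # 180.0 32.0 148.0
```

Both sides of the comparison come straight out of the blueprint: the human-only figure from Step 2’s baseline, the augmented figure from the timed runs in Step 5.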
Quick Recap (timestamped):
- 37:58 — “Step one, define the rubric and success criteria. Step two, measure the human baseline. Step three, build the 20 to 40 real messy work cases. Step four, configure the exact production workspace. Step five, run three times with AI models and three sets of humans. Step six is grade blindly and standardize the output format and criteria. Calculate your ROI, and then step seven is retest monthly.”
Notable Quotes & Memorable Moments
- On the ROI Debate: "This whole discussion on does artificial intelligence give you a return on your investment? It's...a very dumb question, if I'm being honest." (05:58)
- On Corporate Reluctance: "Humans are innately lazy... the reason why we don't want to actually sit and measure." (13:50)
- On Old Ways of Working: "You're using AI, okay. I've been saying since 2023, it's the same thing as, 'oh, our company's using the Internet.'" (15:00)
- Why Most Companies Fail: "These aren't technology problems... These are we as working human beings, as knowledge workers, we don't have a playbook to follow." (25:09)
- Why the 7 Steps Matter: "You need to establish those baselines and then redo the entire process, right? Blow it up. Measure it first. That's your base, right?" (28:55)
Final Words & Challenge to Listeners (38:55–end)
Jordan insists the debate on whether AI delivers ROI is "over." Quantitative research, rigorous benchmarks, and lived experience all say yes—emphatically.
“The new ROI question is not did AI work? It's how much did we lose by not educating and measuring sooner?” (39:40)
Timestamps of Key Segments
- 00:17 – Introduction: Why most companies can’t prove AI ROI
- 03:36 – OpenAI’s GDPval benchmark explained
- 06:50 – MIT’s “zero AI ROI” study debunked
- 13:20 – Where AI ROI is “lost” inside organizations
- 19:40 – How not to measure AI ROI: common mistakes
- 23:25 – Five reasons companies fail to measure
- 25:50 – The “BASE” prerequisite for honest measurement
- 29:00 – The 7-step blueprint for measuring AI ROI
- 37:58 – Rapid recap of the 7 steps
- 38:55 – Big takeaway: The only ROI question that matters moving forward
Summary Takeaway
To measure AI ROI in 2026, you must put aside both hype and outdated transformation strategies. Instead, rigorously document your pre-AI baseline, run real-world controlled tests, and continuously re-evaluate after deployments or model updates. Meticulous, repeatable measurements—using the 7-step framework—are the difference between "vibes-based" and results-based AI management. The only real mistake left is to delay adopting this approach.
For more practical AI advice, access the full Start Here Series and connect with the Everyday AI community at StartHereSeries.com.
