Advice from a Call Center Geek!
Episode: Everything I Have Learned About AI Powered QA
Host: Thomas Laird
Date: March 4, 2026
Episode Overview
In this episode, Tom Laird, CEO of Expivia Interaction Marketing and auto QA startup OttoQa, shares in-depth lessons and tactical insights from his journey implementing and optimizing AI-powered quality assurance (QA) in contact centers. Moving beyond the hype and technical jargon, Tom breaks down how AI-driven QA can redefine contact center operations, agent training, and customer experience—challenging outdated mindsets and advocating for smarter, more actionable approaches.
Key Discussion Points & Insights
1. Rethinking Traditional QA Scorecards
- Start With Your Existing Scorecard (03:00–05:00)
- Many clients want to immediately overhaul their QA forms for AI, but Tom advises against this:
- “We tell them not to do that yet... We want to have a benchmark for calibration.” (03:20)
- Keeping the current scorecard allows for baseline benchmarking and clear measurement of progress before making form changes.
- Quote:
- “If you’re using any type of AI tool... make sure that you have something that’s your benchmark to kind of build off of.” (04:00)
2. Eliminating “False Hustle” in QA Metrics
- “False Hustle” refers to outdated, overly scripted QA metrics (e.g., “say the customer’s name three times” or canned empathy statements) (05:30–08:00).
- AI can now assess true engagement and empathy by analyzing sentiment, call outcomes, and authentic interaction, not box-ticking:
- Quote:
- “All of these questions we can start to answer now with AI powered QA... You don’t have to have these really rigid things.” (06:20)
3. The Fallacy of Scoring 100% of Calls
- Cost and Accuracy Trade-offs (08:15–12:00)
- Scoring every single call isn’t always ideal or necessary; at current AI costs (especially for high-end models), 100% coverage doesn’t guarantee better accuracy.
- Top-tier accuracy is best achieved by sampling a robust subset, not by attempting exhaustive scoring:
- “The scoring of 100% of calls is a fallacy... you not only don’t need, you shouldn’t have right now until AI gets much cheaper.” (11:20)
- Stat:
- Tom targets 98% confidence with a ≤2% margin of error using high-accuracy models and selective sampling.
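The math behind a confidence/margin-of-error sampling target like Tom's can be sketched with Cochran's sample-size formula. This is an illustrative sketch, not something from the episode; the function name and the z-value approximation are assumptions.

```python
import math

def sample_size(z=2.326, margin=0.02, p=0.5, population=None):
    """Cochran's formula for how many calls to score. z=2.326 approximates
    a 98% confidence level, margin is the acceptable error, and p=0.5 is
    the most conservative assumption about the underlying score rate."""
    n0 = (z ** 2) * p * (1 - p) / margin ** 2
    if population is not None:
        # Finite-population correction: smaller call volumes need fewer samples.
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)

print(sample_size())                   # 3382 calls, regardless of volume
print(sample_size(population=10_000))  # 2528 calls for a 10k-call period
```

The takeaway matches Tom's point: a few thousand well-chosen calls hit a 98%/±2% target, so scoring every call buys little extra statistical confidence.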
4. Human vs. AI Consistency and Calibration
- Human QA Disagreement Rates (12:15–15:00)
- Tom’s data (and external sources) show that 30–40% of the time, human QA evaluators disagree on call scores, even with identical forms:
- “Two QA analysts listen to the same call... 30 to 40%, almost 50% of the time, they come back with different scores.” (13:45)
- AI, when calibrated properly, provides greater consistency and reliability.
5. Aggregate Data: The Real Power of AI
- Moving Beyond Individual Calls (15:40–18:00)
- AI reveals trends and outliers across thousands of calls—insight impossible at “two calls per agent per week” manual rates.
- Tom highlights the ability to uncover coaching opportunities, patterns by agent or time, and provide targeted training.
- Quote:
- “It’s not pulling that one random call and nitpicking on it… it’s sitting down for, with an agent saying, out of these 200 calls you took last week, there’s a trend that we’re seeing.” (17:20)
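The cross-call trend spotting Tom describes can be sketched in a few lines: compare each agent's average across all their calls to the team-wide average, something a two-calls-per-week manual sample cannot do. The scores, threshold, and function name below are hypothetical.

```python
from statistics import mean

def flag_coaching_trends(scores_by_agent, threshold=5.0):
    """Flag agents whose average score across all their calls falls more
    than `threshold` points below the team-wide average."""
    team_avg = mean(s for scores in scores_by_agent.values() for s in scores)
    return [agent for agent, scores in scores_by_agent.items()
            if team_avg - mean(scores) > threshold]

# Hypothetical week of QA scores per agent:
calls = {"ana": [92, 88, 95], "ben": [70, 72, 68], "cai": [90, 85, 93]}
print(flag_coaching_trends(calls))  # ['ben']
```

Real tooling would aggregate by time of day, call type, or question as well, but the principle is the same: coach on the trend across 200 calls, not on one random call.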
6. Future-Proof Agent Training & Role of Human Touch
- As easy tasks are automated, agents must excel at complex, high-empathy interactions (e.g., social services, crisis lines, in-depth support).
- Training and QA must evolve to assess real skills, not just “scoring words and phrases” (18:40–20:30).
- Quote:
- “We have to get rid of that scoring words and phrases type deal because we can now educate at a much, much higher level.” (20:00)
7. Next-Gen Use Cases: AI-Powered Replays and Avatars
- AI as a Customer Avatar for Retraining (21:00–23:00)
- Tom describes the ability to recreate failed customer interactions using AI avatars for targeted agent retraining, simulating the same scenario with the same issue and customer context:
- “To be able to retake that same call with the same customer… create the AI avatar, go have that conversation again.” (21:50)
8. Tuning AI for Maximum Accuracy
- Iterative Prompting and Self-Validation (24:00–27:30)
- Tom outlines how OttoQa and similar tools use multiple confirmation checks—having the AI validate its answer across several iterations on the same question to maximize reliability:
- “If it says yes again, then we score the calls a yes. If it says no, there’s a conflicting response, then we have it scored again.” (25:40)
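A minimal sketch of that confirmation loop, assuming a generic yes/no model call; `ask_model` and the voting fallback are hypothetical stand-ins, not OttoQa's actual API:

```python
from collections import Counter

def score_with_confirmation(ask_model, question, max_rounds=3):
    """Ask the same yes/no question several times; accept the answer once
    two consecutive passes agree, re-scoring on any conflict, and fall
    back to a majority vote after max_rounds conflicting passes."""
    answers = [ask_model(question)]
    for _ in range(max_rounds - 1):
        answers.append(ask_model(question))
        if answers[-1] == answers[-2]:  # "if it says yes again" -> accept
            return answers[-1]
    return Counter(answers).most_common(1)[0][0]

# Deterministic stand-in for a model that always answers "yes":
print(score_with_confirmation(lambda q: "yes", "Did the agent verify the account?"))
```

Repeating the question and requiring agreement trades extra model calls for consistency, the same cost-versus-reliability trade-off Tom raises around 100% scoring.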
9. AI QA as an Analytics Engine
- Beyond Agent Scoring—Business Insights (28:00–30:30)
- AI QA can ask and analyze questions about customer behavior, supporting sales, marketing, and strategic planning in smaller centers without full analytics suites.
- Quote:
- “You can use your AI Power QA to do a lot of the analytic work that is going to be just as good as some of these... deep analytics things.” (29:40)
Memorable Quotes & Moments
- “We don’t have to do that anymore. And I want to kind of tell you why... I want to talk you through really how changes in how you perceive QA can really have a massive impact on the quality aspect.” (02:00)
- “AI powered QA is going to be more accurate than [human QA]… it’s not only more accurate, it’s more consistent.” (14:20)
- “The real power of AI powered QA is in the aggregate data… things that humans couldn’t find before.” (15:50)
- “Changing how we educate and train… is going to be massive as we move forward with AI taking over.” (18:30)
- “There’s so much out there from a QA standpoint that can really be done to educate and make your agents better.” (22:50)
- “You don’t have to have a full analytics suite… your AI powered QA can do a lot of the analytic work.” (29:40)
Episode Highlights by Timestamp
| Timestamp | Topic | Notable Content/Quotes |
|-------------|-----------------------------------------------|-----------------------------------------------------------------------|
| 03:00–05:00 | Scorecard Calibration | “Have something that’s your benchmark to kind of build off of.” |
| 05:30–08:00 | Eliminating False Hustle | “We can now define engagement by looking at sentiment and outcomes.” |
| 08:15–12:00 | Fallacy of 100% Scoring | “Scoring of 100% of calls is a fallacy... you shouldn’t have right now.” |
| 12:15–15:00 | Human QA Disagreement | “30–40% of the time... they come back with different scores.” |
| 15:40–18:00 | Aggregate Insights | “There’s a trend that we’re seeing...” |
| 18:40–20:30 | Training for Real Engagement | “We have to get rid of that scoring words and phrases type deal...” |
| 21:00–23:00 | AI-Powered Replays/Avatars | “Retake that same call with the same customer... AI avatar.” |
| 24:00–27:30 | Prompting for Accuracy | “Multiple checks that are like that... have been able to figure out...” |
| 28:00–30:30 | Analytics with AI QA | “AI Power QA can do a lot of the analytic work...” |
Conclusion
Tom Laird delivers a candid, practical exploration of the current and future state of AI-powered QA in contact centers. He demystifies common myths, stresses practical calibration, and explains how to unlock the power of AI not just for accuracy but for actionable analytics and next-level agent development. Whether you’re just starting to explore AI QA or already deep in the process, Tom’s blend of data, anecdotes, and hard-won advice offers critical guidance and fresh perspective for modern contact center leaders.
