Advice from a Call Center Geek!
Episode: Everything I Have Learned About AI Powered QA
Host: Thomas Laird
Date: March 4, 2026
Episode Overview
In this episode, Tom Laird, CEO of Expivia Interaction Marketing and auto QA startup OttoQa, shares in-depth lessons and tactical insights from his journey implementing and optimizing AI-powered quality assurance (QA) in contact centers. Moving beyond the hype and technical jargon, Tom breaks down how AI-driven QA can redefine contact center operations, agent training, and customer experience—challenging outdated mindsets and advocating for smarter, more actionable approaches.
Key Discussion Points & Insights
1. Rethinking Traditional QA Scorecards
- Start With Your Existing Scorecard (03:00–05:00)
- Many clients want to immediately overhaul their QA forms for AI, but Tom advises against this:
- “We tell them not to do that yet... We want to have a benchmark for calibration.” (03:20)
- Keeping the current scorecard allows for baseline benchmarking and clear measurement of progress before making form changes.
- Quote:
- “If you’re using any type of AI tool... make sure that you have something that’s your benchmark to kind of build off of.” (04:00)
2. Eliminating “False Hustle” in QA Metrics
- “False Hustle” refers to outdated, overly scripted QA metrics (e.g., “say the customer’s name three times” or canned empathy statements) (05:30–08:00).
- AI can now assess true engagement and empathy by analyzing sentiment, call outcomes, and authentic interaction, not box-ticking:
- Quote:
- “All of these questions we can start to answer now with AI powered QA... You don’t have to have these really rigid things.” (06:20)
3. The Fallacy of Scoring 100% of Calls
- Cost and Accuracy Trade-offs (08:15–12:00)
- Scoring every single call isn’t always ideal or necessary; at current AI costs (especially for high-end models), 100% coverage doesn’t guarantee better accuracy.
- Top-tier accuracy is best achieved by sampling a robust subset, not by attempting exhaustive scoring:
- “The scoring of 100% of calls is a fallacy... you not only don’t need, you shouldn’t have right now until AI gets much cheaper.” (11:20)
- Stat:
- Tom targets 98% confidence with a ≤2% margin of error using high-accuracy models and selective sampling.
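The math behind a confidence/margin-of-error sampling target like Tom's can be sketched with Cochran's sample-size formula. This is an illustrative sketch, not something from the episode; the function name and the z-value approximation are assumptions.

```python
import math

def sample_size(z=2.326, margin=0.02, p=0.5, population=None):
    """Cochran's formula for how many calls to score. z=2.326 approximates
    a 98% confidence level, margin is the acceptable error, and p=0.5 is
    the most conservative assumption about the underlying score rate."""
    n0 = (z ** 2) * p * (1 - p) / margin ** 2
    if population is not None:
        # Finite-population correction: smaller call volumes need fewer samples.
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)

print(sample_size())                   # 3382 calls, regardless of volume
print(sample_size(population=10_000))  # 2528 calls for a 10k-call period
```

The takeaway matches Tom's point: a few thousand well-chosen calls hit a 98%/±2% target, so scoring every call buys little extra statistical confidence.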
4. Human vs. AI Consistency and Calibration
- Human QA Disagreement Rates (12:15–15:00)
- Tom’s data (and external sources) show that 30–40% of the time, human QA evaluators disagree on call scores, even with identical forms:
- “Two QA analysts listen to the same call... 30 to 40%, almost 50% of the time, they come back with different scores.” (13:45)
- AI, when calibrated properly, provides greater consistency and reliability.
5. Aggregate Data: The Real Power of AI
- Moving Beyond Individual Calls (15:40–18:00)
- AI reveals trends and outliers across thousands of calls—insight impossible at “two calls per agent per week” manual rates.
- Tom highlights the ability to uncover coaching opportunities, patterns by agent or time, and provide targeted training.
- Quote:
- “It’s not pulling that one random call and nitpicking on it… it’s sitting down for, with an agent saying, out of these 200 calls you took last week, there’s a trend that we’re seeing.” (17:20)
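The cross-call trend spotting Tom describes can be sketched in a few lines: compare each agent's average across all their calls to the team-wide average, something a two-calls-per-week manual sample cannot do. The scores, threshold, and function name below are hypothetical.

```python
from statistics import mean

def flag_coaching_trends(scores_by_agent, threshold=5.0):
    """Flag agents whose average score across all their calls falls more
    than `threshold` points below the team-wide average."""
    team_avg = mean(s for scores in scores_by_agent.values() for s in scores)
    return [agent for agent, scores in scores_by_agent.items()
            if team_avg - mean(scores) > threshold]

# Hypothetical week of QA scores per agent:
calls = {"ana": [92, 88, 95], "ben": [70, 72, 68], "cai": [90, 85, 93]}
print(flag_coaching_trends(calls))  # ['ben']
```

Real tooling would aggregate by time of day, call type, or question as well, but the principle is the same: coach on the trend across 200 calls, not on one random call.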
6. Future-Proof Agent Training & Role of Human Touch
- As easy tasks are automated, agents must excel at complex, high-empathy interactions (e.g., social services, crisis lines, in-depth support).
- Training and QA must evolve to assess real skills, not just “scoring words and phrases” (18:40–20:30).
- Quote:
- “We have to get rid of that scoring words and phrases type deal because we can now educate at a much, much higher level.” (20:00)
7. Next-Gen Use Cases: AI-Powered Replays and Avatars
- AI as a Customer Avatar for Retraining (21:00–23:00)
- Tom describes the ability to recreate failed customer interactions using AI avatars for targeted agent retraining, simulating the same scenario with the same issue and customer context:
- “To be able to retake that same call with the same customer… create the AI avatar, go have that conversation again.” (21:50)
8. Tuning AI for Maximum Accuracy
- Iterative Prompting and Self-Validation (24:00–27:30)
- Tom outlines how OttoQa and similar tools use multiple confirmation checks—having the AI validate its answer across several iterations on the same question to maximize reliability:
- “If it says yes again, then we score the calls a yes. If it says no, there’s a conflicting response, then we have it scored again.” (25:40)
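A minimal sketch of that confirmation loop, assuming a generic yes/no model call; `ask_model` and the voting fallback are hypothetical stand-ins, not OttoQa's actual API:

```python
from collections import Counter

def score_with_confirmation(ask_model, question, max_rounds=3):
    """Ask the same yes/no question several times; accept the answer once
    two consecutive passes agree, re-scoring on any conflict, and fall
    back to a majority vote after max_rounds conflicting passes."""
    answers = [ask_model(question)]
    for _ in range(max_rounds - 1):
        answers.append(ask_model(question))
        if answers[-1] == answers[-2]:  # "if it says yes again" -> accept
            return answers[-1]
    return Counter(answers).most_common(1)[0][0]

# Deterministic stand-in for a model that always answers "yes":
print(score_with_confirmation(lambda q: "yes", "Did the agent verify the account?"))
```

Repeating the question and requiring agreement trades extra model calls for consistency, the same cost-versus-reliability trade-off Tom raises around 100% scoring.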
9. AI QA as an Analytics Engine
- Beyond Agent Scoring—Business Insights (28:00–30:30)
- AI QA can ask and analyze questions about customer behavior, supporting sales, marketing, and strategic planning in smaller centers without full analytics suites.
- Quote:
- “You can use your AI Power QA to do a lot of the analytic work that is going to be just as good as some of these... deep analytics things.” (29:40)
Memorable Quotes & Moments
- “We don’t have to do that anymore. And I want to kind of tell you why... I want to talk you through really how changes in how you perceive QA can really have a massive impact on the quality aspect.” (02:00)
- “AI powered QA is going to be more accurate than [human QA]… it’s not only more accurate, it’s more consistent.” (14:20)
- “The real power of AI powered QA is in the aggregate data… things that humans couldn’t find before.” (15:50)
- “Changing how we educate and train… is going to be massive as we move forward with AI taking over.” (18:30)
- “There’s so much out there from a QA standpoint that can really be done to educate and make your agents better.” (22:50)
- “You don’t have to have a full analytics suite… your AI powered QA can do a lot of the analytic work.” (29:40)
Episode Highlights by Timestamp
| Timestamp | Topic | Notable Content/Quotes |
|-------------|-----------------------------------------------|-----------------------------------------------------------------------|
| 03:00–05:00 | Scorecard Calibration | “Have something that’s your benchmark to kind of build off of.” |
| 05:30–08:00 | Eliminating False Hustle | “We can now define engagement by looking at sentiment and outcomes.” |
| 08:15–12:00 | Fallacy of 100% Scoring | “Scoring of 100% of calls is a fallacy... you shouldn’t have right now.” |
| 12:15–15:00 | Human QA Disagreement | “30–40% of the time... they come back with different scores.” |
| 15:40–18:00 | Aggregate Insights | “There’s a trend that we’re seeing...” |
| 18:40–20:30 | Training for Real Engagement | “We have to get rid of that scoring words and phrases type deal...” |
| 21:00–23:00 | AI-Powered Replays/Avatars | “Retake that same call with the same customer... AI avatar.” |
| 24:00–27:30 | Prompting for Accuracy | “Multiple checks that are like that... have been able to figure out...” |
| 28:00–30:30 | Analytics with AI QA | “AI Power QA can do a lot of the analytic work...” |
Conclusion
Tom Laird delivers a candid, practical exploration of the current and future state of AI-powered QA in contact centers. He demystifies common myths, stresses practical calibration, and explains how to unlock the power of AI not just for accuracy but for actionable analytics and next-level agent development. Whether you’re just starting to explore AI QA or already deep in the process, Tom’s blend of data, anecdotes, and hard-won advice offers critical guidance and fresh perspective for modern contact center leaders.
