Podcast Summary: "Why You Should Not be Scoring 100% of Your QA Calls"
Podcast: Advice from a Call Center Geek!
Host: Thomas Laird
Date: January 8, 2026
Episode Overview
In this episode, Tom Laird tackles a persistent and often misunderstood question in contact center quality assurance (QA): whether scoring 100% of calls is actually beneficial or necessary. Drawing on his experience as CEO of Expivia Interaction Marketing and Auto QA, Tom argues that evaluating every single call is typically wasteful, expensive, and, more provocatively, less accurate than a statistically sound sampling approach powered by the right technology and models. He offers industry insights, the mathematical rationale, worked examples, and actionable advice, especially for small to mid-sized contact centers.
Key Discussion Points & Insights
1. Two Primary QA Pricing Models
[02:10]
- Seat License Model: "The main one still is the old school...you're looking at the amount of agents you have, tagging them all with a license and you pay for that."
- Typical costs: $100–$135+ per seat per month
- Offers 100% call coverage but only up to ~85–90% accuracy
- Usage or Pay-Per-Call Model: "What we do is we utilize a usage model...you pay for how many calls you want scored."
- Flexible, scalable costs
- Ideal for centers with fewer than 500 seats
- Relies on sampling rather than evaluating every call
2. The Myth of 100% Call Scoring
[06:00]
- "If you're paying to score every call in your contact center, you're not getting better data, you're just spending more money."
- Coverage does not equal accuracy; full scoring can become cost-prohibitive and inefficient.
- Even leading AI platforms "really only have an accuracy...up to about 85%."
- Important Question: "What level [of accuracy] are you actually getting?"
- Costs vs. Returns: 100% scoring can be prohibitively expensive, especially for smaller centers, while the accuracy gains are questionable (the quick calculation below shows why).
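To make the coverage-versus-accuracy point concrete: at the ~85% accuracy Tom cites, scoring every call just produces errors at scale. A back-of-the-envelope sketch in Python, borrowing the 60,000-call volume from the example later in the episode:

```python
calls_per_month = 60_000  # volume from the episode's 60,000-call example
ai_accuracy = 0.85        # "really only have an accuracy...up to about 85%"

# At 85% accuracy, full coverage still mislabels roughly 15% of evaluations.
misscored = calls_per_month * (1 - ai_accuracy)
print(f"~{misscored:,.0f} misscored evaluations per month")  # ~9,000
```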
3. Statistical Sampling & The Law of Large Numbers
[08:15]
- Tom references the statistical principle: "There's this statistical principle called the law of large numbers...you can apply these same principles to your QA to get you with 99% accuracy with a margin of error plus or minus 3%."
- A properly sampled subset (say, ~1,700 out of 60,000 calls) yields reliable, actionable data for a fraction of the price; the sketch after this list shows the underlying formula.
- Memorable analogy: It’s the same math that "allows pollsters to predict elections...pharmaceuticals can improve drugs without testing every human being."
- Typical disagreement among human QA evaluators scoring the same call: 5–7%
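The episode quotes the conclusion rather than the formula, but the ~1,700-call figure is consistent with the standard (Cochran) sample-size calculation at a 99% confidence level and a ±3% margin of error, with a finite-population correction. A minimal sketch, where the function name and defaults are illustrative assumptions:

```python
import math

def qa_sample_size(population, z=2.576, margin=0.03, p=0.5):
    """Cochran sample-size formula with a finite-population correction.

    z: z-score for the confidence level (2.576 ~ 99%)
    margin: desired margin of error (0.03 = +/-3 points)
    p: assumed proportion; 0.5 is the most conservative choice
    """
    n0 = (z ** 2) * p * (1 - p) / margin ** 2           # infinite-population estimate
    return math.ceil(n0 / (1 + (n0 - 1) / population))  # finite-population correction

print(qa_sample_size(60_000))  # -> 1789, matching the ~1,789 calls in the next section
```

Strictly speaking, the "99% accuracy...plus or minus 3%" phrasing describes a 99% confidence level with a ±3-point margin of error: the sampled score lands within 3 points of the true score in 99 out of 100 samples.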
4. Concrete Example: Sample Size vs. Seat License
[15:45]
- For a call center handling 60,000 calls/month (~45 agents):
- Seat License: $5,600–$6,000/month for 85% accuracy.
- Sampling Model: score ~1,789 calls/month for $1,500, at a 99% confidence level with a ±3% margin of error (see the cost arithmetic after this list).
- "It makes zero sense for you to go pay...for a lower accuracy than you would actually get from scoring 1,700 calls a month using the law of large numbers..."
5. Accuracy Considerations
[10:45]
- "We are more accurate than human beings because we're 99% accurate, right? Within a margin of error, plus or minus 3%."
- Human variability—QA agents often disagree by 5–7% when scoring the same call.
- It's possible to sample fewer calls and get better, more defensible results by leveraging advanced large language models; the snippet below shows how to read a sampled score's margin of error.
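One way to interpret those numbers: the ±3% margin is the worst case, which occurs at a 50% score; an observed QA score further from 50% carries an even tighter interval. A minimal sketch (the 80% pass rate is a made-up example):

```python
import math

def margin_of_error(p_hat, n, z=2.576):
    """Half-width of a ~99% confidence interval for an observed proportion."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

# e.g. an 80% QA pass rate observed across a 1,789-call sample
print(f"+/- {margin_of_error(0.80, 1_789):.1%}")  # +/- 2.4%
```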
6. Addressing Compliance and Exceptions
[18:30]
- Compliance needs? "If you have compliance questions...we can just score those specific calls or those specific sections of the call."
- 100% scoring may be mandated in regulated industries or by clients with zero tolerance for error.
- Usage models can still isolate and prioritize compliance elements, mitigating costs.
7. Common Mistakes in Current QA Practices
[27:15]
- Cherry-picking calls: manual QA is often biased, with certain call types, lengths, or agents analyzed disproportionately.
- Too few calls per agent: undermines visibility into, and confidence in, individual performance.
- Unrepresentative sampling: fails to reflect true agent or center performance; avoidable with algorithmic, automated sampling (see the sketch after this list).
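A minimal sketch of the agent-stratified random draw an automated sampler might perform to avoid all three failure modes; the record shape and function name are assumptions, not a description of any vendor's implementation:

```python
import random
from collections import defaultdict

def stratified_sample(calls, per_agent):
    """Draw the same number of randomly chosen calls for every agent.

    `calls` is a list of dicts with at least an 'agent_id' key
    (a hypothetical record shape, for illustration). Random draws
    within each agent prevent cherry-picking by call type or length;
    a fixed per-agent quota prevents under-sampling any one agent.
    """
    by_agent = defaultdict(list)
    for call in calls:
        by_agent[call["agent_id"]].append(call)
    sample = []
    for agent_calls in by_agent.values():
        sample.extend(random.sample(agent_calls, min(per_agent, len(agent_calls))))
    return sample

# ~45 agents x 40 calls each lands near the 1,789-call monthly target:
# monthly_sample = stratified_sample(all_calls, per_agent=40)
```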
8. How to Vet QA Providers
[33:00] Tom’s recommendations for those evaluating QA tools/partners:
- Ask about the type of language model used.
- Clarify accuracy rates and how they're measured.
- Investigate processes for updating forms and questions.
- Demand transparency regarding sample size calculations and stratification methods.
Notable Quotes & Memorable Moments
"It blows my mind why people are paying $135 per seat...when really they should be asking the question, why do I need to score all of these calls when I’m not getting any more data or insights from them?"
— Tom Laird [13:20]
"The law of large numbers is more accurate than 100% of your calls scored."
— Tom Laird [16:55]
"If your QA cherry picks calls...your sample size right there is already screwed up."
— Tom Laird [28:05]
"I want to make sure that people understand that scoring 100% of the calls isn’t always what it’s cracked up to be."
— Tom Laird [38:10]
Important Segment Timestamps
- 02:10 — Breakdown of seat license vs. usage models
- 06:00 — Myths about 100% scoring and associated costs
- 08:15 — Statistical underpinning: Law of large numbers
- 10:45 — Comparing AI, sampling, and human variability
- 15:45 — 60,000 call example: cost & accuracy analysis
- 18:30 — Compliance exceptions and hybrid approaches
- 27:15 — Classic QA mistakes (cherry-picking, unrepresentative samples)
- 33:00 — Questions to ask potential QA providers
- 38:10 — Final thoughts: why less is often more for QA
Final Takeaways
- For most centers, 100% QA call scoring offers worse accuracy and ROI than sampling, unless strict compliance mandates full coverage.
- Rely on sampling, rooted in statistical best practices and paired with high-quality models, for better insights and cost savings.
- When choosing a QA solution, push beyond marketing claims—demand clear, honest metrics on accuracy, methodology, and cost structure.
- Don’t be afraid to ask "the quiet questions"—about how many calls you need to score, what model is used, and what accuracy you’re truly getting.
