
A
This is the Everyday AI show, the Everyday podcast where we simplify AI and bring its power to your fingertips. Listen daily for practical advice to boost your career, business and everyday life. Most AI tools that analyze sales or customer support calls turn those conversations into a text-based transcript. But text-only transcripts miss most of the value, and Modulate fixes that. Modulate's new Velma Voice native AI model and their ELM technology actually understand what's happening on those calls. It picks up the valuable tone, timing, emotion and intent that AI transcription tools can't provide. So whether it's for sales, customer support or voice agents, Modulate's new Velma model helps you capitalize on what text-only AI tools miss. Demand more from your AI today with Modulate at Modulate AI. What's the difference between text and understanding tone? Probably a lot. It's something that I think a lot of us overlook, especially when it comes to using AI. We just assume that, as an example, if we're talking to an AI, that they understand us. Well, maybe not. Maybe all they're seeing or actually understanding is words. Or even maybe worse, all they're looking at or understanding is a series of tokens. And I think as we look into 2026 and beyond, something I've been personally very bullish on is actual voice AI. But it goes a step further than that because, like I said, there's a big difference between a model, maybe quote unquote, hearing what you're saying or understanding the words that you're saying versus understanding the tone. And now I think the technology is finally there that we can accomplish the latter, and what that unlocks for the everyday business leader is huge. So that's what we're going to be talking about today on Everyday AI. I'm excited for it. I hope you are too. Let's get into it. If you're new here, welcome. Everyday AI, well, it's for you. 
It's a daily live stream podcast and free daily newsletter helping everyday business leaders like you and me keep up with the latest AI technology, how to make sense of it and to leverage all the good stuff, ignore the boring stuff or the stuff that doesn't matter and use it to grow our companies and careers. So it starts here with the unedited, unscripted live stream podcast. But to take it to the next level, please go to our website at youreverydayai.com. All right. And if you're looking for the daily AI news, that's going to be in the newsletter as well. All right, enough of chit chatting with me. Let's bring on a real expert, someone that is building what I think is kind of the next frontier of AI technology, and that's voice enabled and AI that actually understands what we mean, not just what we say. So, live stream audience, please help me welcome to the show Mike Pappas, the CEO and co-founder of Modulate. Mike, thanks so much for joining the Everyday AI show.
B
Jordan, excited to be here.
A
All right, so if you're an avid, like an avid, everyday listener of the show, you've heard Modulate a little bit this week especially. But Mike, for those that maybe aren't familiar, what does Modulate do, especially around voice?
B
Yeah. So Modulate is a frontier AI developer focused on true voice understanding. So we actually got our start in the online gaming space. We work with Call of Duty, Grand Theft Auto, Rainbow Six Siege on voice moderation, understanding the difference between friends who are trash talking and having a good time with it, which we want to encourage, and that very thin difference that starts to make it actually very damaging to someone who's not expecting those kinds of interactions. You can't do that based on transcriptions. You have to understand how people are actually experiencing the conversation. So that's what really drove us to push AI to the next level for this kind of understanding. These days we're working with Fortune 500 companies across fraud and AI guardrails and all these different things, but the core of it all is we just want to push AI to actually hear us as humans and get the same meaning from those conversations that we can get ourselves.
A
Yeah. And I think even just with how AI works under the hood. Right. Which we don't have to get too far into, but you know, ultimately kind of like what I said there, you know, right now, if you're chatting with, you know, a quote unquote, you know, voice model from some of the major providers, all it's really doing is taking a transcription and, you know, assigning some tokens to it. But, you know, I, I'm a dork, I, I, I look at the tokenization of different words, right? And one example I always say is the, like the word just, the word just can be tokenized at least like seven different ways, Right. So how or maybe why is it so important to have an AI as things get more, you know, voice first or voice native, why is it important to have an AI that actually understands what we mean, not just the words coming out of our mouths?
B
Well, the, you know, the standard answer is you get a text message from a friend, you're running A little late to an event and the text message says something like, hey, you coming? And depending on that inflection that I just used, I could have made you feel anxious, I could have made you feel cared for. But you don't get any of that in the text message. And all of us have that experience of the sort of passive aggressive anxiety of what does this friend actually mean here? Text doesn't communicate nearly as much as voice does. We have infamous examples within the company of things from some of our gaming customers where someone will say the phrase, hey, come join my private room, which sounds totally ordinary in text, and then you hear the particular way they say that and you feel very personally unsafe. There's a lot of ways to articulate things that communicate so much more. And if we're not looking at that, then we're not actually understanding the experience that human beings are having. And, and how can an AI participate in those experiences or improve those experiences if it doesn't even understand what's going on?
A
And I think that's important especially, you know, as people maybe are using large language models differently than what they were originally intended for. Right. I think it was maybe an OpenAI study that came out at the end of 2025 that really showed that people were using AI a lot more for, you know, personal relationships, therapy, things like that. Right. I, like, we don't have to dive into that. But you know, like, I'm sure that you guys hear so many different use cases, and kind of your example is a great illustration of that, where the same words can mean very different things. Like I'm wondering if you can walk us through maybe a client of yours, just so we can understand a use case maybe outside of gaming, because, you know, gaming is a great use case, but you know, maybe walk us through kind of a non-gaming customer that you've worked with. And you know, the ability to understand what's actually happening outside of a transcript, how that's actually helped move the needle.
B
Yeah, absolutely. So one of our first big customers outside of gaming was a Fortune 500 company in the food delivery space, and they had come to us to try to protect their drivers. It's shockingly frequent that you'll have a driver who's running a few minutes late on a delivery. They'll get a call from an angry customer saying something along the lines of, I'm going to kill you. And then that driver says, I don't want to go to that person's front door anymore. So that was the initial use case: help us understand when there's those kinds of aggressive emotions being portrayed and help identify calls that are worth the platform taking a closer look at. They came back to us a week later and we asked, you know, how's the abuse detection going? And they almost didn't even want to talk about it because they were so excited about how much fraud we were finding for them. And it turns out we had actually found five times more attempted scams against their drivers than their actual fraud detection tools. Because those fraud detection tools were looking at things like metadata of suspicious numbers or, you know, is a weird transaction taking place after the fact. But what we were doing was listening live to the call and saying things like, hey, this person is performing anger, but they don't actually demonstrate authentic anger. Hey, this person is trying to bypass policies in ways which are very clear. And they're trying to make these excuses. Even in some cases with fraud, it's something like, hey, they're apologizing for the baby in the background that's crying, and that's why they're so urgent. That is a recording of a baby. We can tell it's not a baby in the same room that you are. And that's clearly a scam in and of itself. So it's all these different kinds of acoustic elements that come together. 
And that, that was actually one of our first big proof points because we realized these platforms, even, you know, retail or finance, who we work with a lot today, these spaces, you'd want to know what kind of fraud they're dealing with, what the prevalence is. But no one knows how to just listen to their voice conversations. Everyone is guessing based on small samples. So just the ability to say we can cover the whole basis and give you that sense of the statistics is so powerful.
A
Yeah. And I think it's important for our audience to know and understand that this isn't some, you know, you know, something coming out of left field or something that's, ah, this doesn't happen very often. It happens all the time. And the technology that people are using is very sophisticated. Right. But also easy to use. Right. In an easy example, I think it was in the summer. It was actually the US State Department was targeted. Right. By a voice clone impersonating Marco Rubio. Right. And like from reading reports, it seemed like it, like this got pretty far, maybe farther than it should have. You know, Mike, I'm sure this is something that you guys deal with all the time. Right. Just how easy it is for people to create voice clones and the whole Deep fake thing. Right. Can you talk a little bit about what companies need to be paying attention to? Right. And how can they even tell? Right. How like, and, and what should they maybe even be doing internally to make sure that this doesn't happen to them? Because I don't know what you can do.
B
Right. Yeah. I mean, synthetic voices are much more prominent and they're better than they've ever been and will keep getting better. Sam Altman sort of famously came out and said don't even try to detect if they're real. It's completely impossible. But Sam Altman is famously a marketer, and I would say as a technologist, that's at least not true yet. And I don't expect it to be true for some time. To humans, the way we hear, we're indexing on specific things. We can't hear with the same fidelity. We can't tell. At this point the synthetic voices are good enough, we genuinely can't tell. You can't train a person to do it better, much better than chance. But AI systems can. Technology can notice the discrepancies. And some of those discrepancies are the kind of obvious ones. There's a glitch or something like that. Some of them are more subtle. The way the technology is generating your voice is very authentic, but the room sound keeps changing. It's as if you were teleporting between different environments. Sometimes it's even, you know, the adversary tries to disguise some of those things by adding the sound of a subway in the background and adding a bunch of static. And that fools a lot of systems. But for Modulate, we know what real background noise sounds like. And just as with the screaming baby, we can say that's not real, that's not actually happening to you right now, and use it as further evidence. So there is this prevalent phenomenon. You're going to see more and more people not just copying your CEO's voice, but copying, you know, day-to-day employees' voices. But it is possible to implement the right tools to be able to catch, right off the bat, in a matter of seconds: something is wrong here. You need to be paying very close attention to this conversation.
A
Yeah. So I know that you all did just come out with some new research and I'm wondering if that's where this comes into play. So is with the ensemble listening model. Right. In the example that you just gave is that kind of this, this new technology and the new research is that what kind of helps to be able to decipher like hey, this, you Know, this is why it's a deep fake. And this is how we can, you know, layer this sound and really construct it.
B
It's the same principle, though. Synthetic voice detection is even only a corner of what the ELM sort of approach allows us to do. So for those who aren't familiar with the research, this idea of an ELM, the E stands for ensemble, as Jordan mentioned. So the idea is instead of one monolithic black box model, you have a number of different models that are doing different things. And in our case, we have models that are looking at emotional characteristics. We have some that are looking at the timbre of your voice, which implies whether you're synthetic or implies your age potentially. We have others that are looking at what you're saying, your behaviors like interruptions. And you need to have a way to combine all these different data points together, whether to make a decision of, is this a synthetic speaker, are they attempting to be fraudulent? Or even sort of more complicated analyses. And so the big innovation in our ELM research was the ability to orchestrate these different kinds of models in a way that's dynamic, that can actually say, hey, because our synthetic voice detector is flagging pretty high, we actually want to trigger our noise detectors to go to something that's a little more granular, a little bit more precise, because that's what's needed if we want to get even more accurate on synthetic. And now that we're starting to see synthetic voice, let's take a closer look at some of the fraud behaviors that might be accompanying it. It's worth, you know, investing a little bit more time and energy looking at that. So the way we zero in on a conversation, just as an actual trained analyst would, is sort of top down. You start by quickly surveying what do I think are the major things that are happening here. And then you look more closely, in real time as the conversation is happening, at the elements that are going to help further inform your understanding of what's going on.
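[Editor's note] The dynamic orchestration described above, where a high score from one detector triggers more granular follow-up detectors, can be sketched roughly as follows. This is a minimal illustration and not Modulate's implementation: the `Ensemble` class, the detector names, the feature keys, and the 0.7 threshold are all assumptions made up for this example, and each "detector" is a stand-in function over precomputed features rather than a real audio model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical detector type: takes features for one audio chunk and
# returns a score in [0, 1]. Real detectors would be trained models.
Detector = Callable[[dict], float]


@dataclass
class Ensemble:
    cheap: Dict[str, Detector]         # always run on every chunk
    detailed: Dict[str, Detector]      # run only when triggered
    escalations: Dict[str, List[str]]  # cheap detector -> follow-up names
    threshold: float = 0.7             # assumed trigger level

    def analyze(self, chunk: dict) -> Dict[str, float]:
        # Pass 1: quick top-down survey with the cheap detectors.
        scores = {name: det(chunk) for name, det in self.cheap.items()}
        # Pass 2: for anything flagging high, run the follow-up detectors.
        for name, score in list(scores.items()):
            if score >= self.threshold:
                for follow_up in self.escalations.get(name, []):
                    if follow_up not in scores:
                        scores[follow_up] = self.detailed[follow_up](chunk)
        return scores


ensemble = Ensemble(
    cheap={
        "synthetic_voice": lambda c: c["synthetic_hint"],
        "emotion_anger": lambda c: c["anger_hint"],
    },
    detailed={
        "fine_grained_noise": lambda c: c["noise_inconsistency"],
        "fraud_behavior": lambda c: c["policy_bypass_hint"],
    },
    escalations={"synthetic_voice": ["fine_grained_noise", "fraud_behavior"]},
)

calm = ensemble.analyze({
    "synthetic_hint": 0.1, "anger_hint": 0.2,
    "noise_inconsistency": 0.9, "policy_bypass_hint": 0.8,
})
suspicious = ensemble.analyze({
    "synthetic_hint": 0.9, "anger_hint": 0.2,
    "noise_inconsistency": 0.9, "policy_bypass_hint": 0.8,
})
```

In this sketch the `calm` result contains only the two cheap scores, while `suspicious` also contains the noise and fraud-behavior scores, because the synthetic-voice score crossed the threshold and paid for the extra analysis.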
A
And I, I, I really want to dive into this a little bit deeper. But before we do, going to take a quick break here from a word from a sponsor that's very relevant to today's conversation. All right, so, Mike, you were just talking a little bit about the ELM and some of the new research that's gone in. So one thing I kind of thought about and let me know if I'm completely off base. So, you know, in a previous life, I took a lot of photos, right? And remember, you know, kind of the difference between a flat JPEG Right. And if you go to, you know, edit a flat jpeg, you know, you don't have a lot of control. But then there's kind of these RAW files, right, where you can almost, there's all these different layers and you can individually pull out and inspect and tweak individual ones. So is that kind of how this, you know, ELM works? And if so, can you kind of explain some of the new use cases that this technology unlocks for, you know, everyday enterprises?
B
Yeah, that concept of layers is very appropriate. The way the ELM is trying to model a conversation, it's looking at each of these different components. And the key innovation, again, isn't that you can ask each of these questions. People for a long time have had tools to transcribe a conversation and tools to say, what's the emotion? But let's take a simple example. If the emotion is sarcastic and the comment is "nice job," the meaning is not "nice job." And if you're then, you know, asking an AI summarizer to explain what happened in the conversation or, you know, any other attempt to derive something out of what actually happened, your systems will not be able to connect the dots between the emotion, sarcasm, and the words that were said. It's that extra layer of not only, hey, FYI, this was sarcastic, but we can feed that back. We can use that to inform our understanding when they said "nice job." And that can color our interpretation of what happens next. So it's all continuous, it's all feeding back into itself instead of just being five, ten, a hundred completely independent characterizations of the conversation.
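[Editor's note] The "nice job" example can be made concrete with a small sketch in which the emotion label changes how the words are read, and that reading carries forward into a running view of the conversation. Everything here is hypothetical and greatly simplified: the `Utterance` type, the fixed phrase list, and the labels "sarcastic" and "sincere" are assumptions for illustration, not how any real system classifies speech.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Utterance:
    text: str
    emotion: str  # hypothetical label from a separate emotion model


# Assumed toy list of phrases that read as praise when taken literally.
PRAISE = {"nice job", "great work", "well done"}


def interpret(utt: Utterance) -> str:
    """Fuse the words with the tone instead of reading them separately."""
    normalized = utt.text.lower().strip(" .!?")
    if normalized in PRAISE:
        # The same words flip meaning when delivered sarcastically.
        return "criticism" if utt.emotion == "sarcastic" else "praise"
    return "neutral"


def running_mood(conversation: List[Utterance]) -> int:
    """Carry each fused reading forward so it colors what follows."""
    delta = {"praise": 1, "criticism": -1, "neutral": 0}
    mood = 0
    for utt in conversation:
        mood += delta[interpret(utt)]
    return mood


sincere = interpret(Utterance("Nice job!", "sincere"))
sarcastic = interpret(Utterance("Nice job.", "sarcastic"))
mood = running_mood([
    Utterance("Nice job.", "sarcastic"),
    Utterance("Well done", "sarcastic"),
    Utterance("Where is my order?", "frustrated"),
])
```

A transcript-only summarizer would see two compliments here; the fused reading sees two criticisms and a negative running mood.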
A
That's a great way to kind of illustrate that. And I'm guessing, right, because I'm always thinking, you know, not in a, in disruption, in a bad way, right? But I'm always looking at different sectors, different departments, you know, that are maybe ripe for disruption in a good way, right? So we stop, you know, stop doing our day-to-day work like it's, you know, 1920. And one thing I always think of, especially when it comes to voice AI, is customer service, right? I don't know anyone that likes, you know, waiting on hold for hours, and you know, maybe even after that, you're still not very happy with the support that you get, right? So I'm guessing that this is, you know, a space that you've heard a lot about and, you know, that your products are, are very, you know, crucial for these companies that want to do this. But can you tell a little bit, just, you know, on the topic of today's show, you know, your example, right, being able to layer, you know, sarcasm with, you know, different tones of voice and all these different things. What does this mean for services like customer service or departments like customer service?
B
Yeah, customer service is a huge space that we spend a lot of time in here. So I'll focus on AI agents. We can talk about human agents too, but in the AI agent context, when people talk to AIs, we don't talk normally. We immediately know we're talking to an AI and we regulate ourselves because we're so worried about being misunderstood. So there were, there were some great studies that I saw done on this where, you know, if the, if someone asks you, do you own your home? A human might respond to another human in any of hundreds of different ways. Oh well, the bank hasn't repossessed it yet. Haha. If, you know, an AI is asking you if you own your home, you have like four or five possible answers because you're so worried that if you say something off book, the AI is not going to be able to understand. So we are restricting ourselves to make it easier on the AI. And that's part of why the traditional experience talking to an AI agent feels so stifling. It doesn't feel authentic. It's in that uncanny valley of it's trying to feel natural, but it's not meeting that bar. Whereas introducing technology like Modulate's allows that AI to actually understand what's going on. So if you're starting to feel frustrated, it can hear it, it can notice it, it can try and resolve that frustration, or it can say, I'm sorry, I clearly can't solve this for you. Let me escalate to a human. Right? Part of deploying these AI agents is you need to be able to trust that they're not going to go off the deep end when something goes wrong. So using a system like Modulate that can help them effectively introspect and notice when something is going wrong, and can provide you trend analysis of what kinds of things they are doing across thousands and millions of calls, that's what creates the enterprise trust and conviction that allows people to actually deploy these agents at scale in the first place.
A
Yeah, and I'm glad we went straight to that kind of example or use case because I know from personal conversations and just from, you know, hearing from a lot of our audience that this is something that business leaders are grappling with. Right. Because before when it came to, you know, voice AI agents, it was all about latency.
B
Right.
A
There was too much of a delay and it felt, you know, not human. But then it was all about, you know, oh, our company's knowledge. It has to be easy. And before it was, you know, expensive RAG pipelines, and now it's, you know, with systems, whether you're talking about Google Gemini's, you know, their voice, OpenAI's, ElevenLabs. Now it's seemingly easier for any company to get, you know, pretty human-sounding, low-latency AI agents that are connected to their knowledge. But you brought up a good thing about trust and then understanding and the tone, you know, so maybe can you walk us through what are those considerations for those people that are on the fence? Right. Should I just go ahead and connect all my data and, you know, go put one of these agents live and see what happens? What's the right way to roll this out?
B
Yeah, I mean the, the three major considerations that we hear from people who are thinking about these voice AI agents. One is again, just, is it gonna go rogue? We, we've talked to someone in the interviewing space who we believe misprompted their agent to check if candidates were flexible. So the agent started asking people if they could try out yoga poses during. Right. This is apocalypse level stuff. But it happens all the time because it's just so hard to, to organize. And so the, the first need for these customers is just to be able to say, hey, if the AI doesn't know, it's not going to hallucinate, which is how LLMs are built. Or if it does hallucinate, we will get an alert right then and there and be able to escalate out of that chain so that we don't end up in that dark place. We've seen that the courts will uphold this: if your AI hallucinates a policy and tells it to your customer, you are on the hook for that policy, even if it's not yours. So this is a serious enterprise consideration. The second fear here is just about scale. If you're deploying this at massive scale, even if you trust it to mostly do a good job, what is it doing? How do you know? How do you find that out? Every AI agent system claims to have some kind of reporting or logging, but the way those systems work, they're not actually picking apart the specifics of what they saw in the conversation because they're too busy participating in it. Having a system that sits on top of it and actually plucks out the key insights in a structured way makes it much, much easier for you to actually know what's going on, which then feeds into the third and final, which is compliance. If you need to be able to justify, why did you decide that Mike was probably fraudulent? You can't just say, I don't know. The magic box told me. You need to be able to explain what that logic is. 
ELMs are fundamentally explainable because we can look at the component models and we can tell you here are the things that contributed to that assessment, and prove that it's not biased, prove that it's accurate and consistent in a way that black box models can't.
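[Editor's note] One simple way to picture why an ensemble is easier to explain than a single black box: if the final score is assembled from named component scores, each component's contribution can be reported alongside the decision. The sketch below uses a plain weighted sum purely as an assumption for illustration; the transcript does not say how Modulate combines its component models, and the component names, scores, and weights here are invented.

```python
from typing import Dict, List, Tuple


def explain_score(
    component_scores: Dict[str, float],
    weights: Dict[str, float],
) -> Tuple[float, List[Tuple[str, float]]]:
    """Return the combined score and per-component contributions, largest first."""
    contributions = {
        name: component_scores[name] * weight
        for name, weight in weights.items()
    }
    total = sum(contributions.values())
    ranked = sorted(contributions.items(), key=lambda kv: kv[1], reverse=True)
    return total, ranked


# Invented example values for a single call.
scores = {
    "synthetic_voice": 0.9,
    "performed_anger": 0.6,
    "fake_background_noise": 0.8,
}
weights = {
    "synthetic_voice": 0.5,
    "performed_anger": 0.2,
    "fake_background_noise": 0.3,
}

total, ranked = explain_score(scores, weights)
for name, contribution in ranked:
    print(f"{name}: {contribution:.2f}")
print(f"combined fraud score: {total:.2f}")
```

The ranked list is the audit trail: instead of "the magic box told me," a reviewer can see which signals drove the assessment and check each one independently.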
A
Yeah, and I think probably the way that you just described it probably really resonated with a lot of people because I, I get the, the appeal of being able to very easily go get an AI agent, right. Like that can go in and talk to people in real time, right. But you just hit it on the head right there, like, trust, observability, all of these other things, especially when it's happening at scale, right. Like I'm wondering, as you know, someone, you know, both yourself personally and, you know, leading Modulate and, you know, helping to shape the future of this technology, what are some of the things, you know, beyond, you know, today, next week, next month that business leaders need to be thinking about when it comes to, you know, kind of voice-enabled AI agents?
B
I mean, this is such a prosaic answer, but cost. Like, right now, so many organizations are coasting on someone high up having said, let's reserve a whole bunch of money to figure out this AI thing, and we're starting to see a lot of businesses run into that wall of, hey, we're, we're hitting the end of that cash and we haven't been able to prove value yet. So I didn't list cost earlier as one of the big considerations, because so far people have been much more interested in, let's just prove what's possible. But I think we're going to see a reckoning coming where people actually look at the economics of these systems and have to reimagine it. That's a boring answer though. So let me give a more fun technical one too. I think as much as people want to be thinking about how does the AI take the load off of the platform, there's also a version of the AI taking the load off of the customer. And we're already seeing applications come up today where instead of me having to call my bank and wait on hold, I can have an AI do it. I can delegate to the AI, which creates a bunch of fun new challenges, like should the bank prioritize my AI's call the way they would have prioritized my call as a human? Can they have two AIs talk to each other? At what point are we just recreating an API? There's a lot of, like, fundamental design questions of how does any of this impact your brand? What does it mean for you to try to build trust with your customer if your customer is actually sending a delegate to interact with you instead of coming in directly? So I think there's a much bigger question, not just about voice as a mechanism for completing a transaction, but about voice being the emotional thing that builds brand trust, that builds relationships, and how do we make sure that in our haste to automate so many pieces of this, we're not actually crippling that kind of brand trust and loyalty that so many platforms rely on?
A
So yeah, you just set off so many new questions in my head, but I can't keep you forever. But you know, Mike, we've talked about a lot in today's conversation, from, you know, deep fakes and guard rails around, you know, AI voice to, you know, fraud detection and different sectors being disrupted, maybe in a good way. But as we wrap up here, I want to hit, go back to kind of where we started and just ask you, so now that, you know, through your guys's technology and some of the new advancements that you've come up with, AI can actually hear what we mean and, you know, it can be more than just looking at tokens and text. What does this unlock? Right? Like, what should business owners be most excited about?
B
I mean, I, I think the, I wish I had a punchier way to say it, but I think that the thing that they should be excited about is actually understanding their customers and being able to solve their customers problems, right? Like that's the actual job of customer support. That's the actual job of anyone that's picking up the phone and talking to someone. You want to understand them, you want to build a relationship, you want to be able to satisfy something. Right now there are so many frictions in front of us and we can talk about, hey, we can make AI agents understand people better. What about humans? What about all the sort of culture mismatches that we run into all the time where I don't understand what it is that you meant. I can't tell you the number of people I've talked to who said, I'd love to have a little bird on my shoulder that could tell me, hey, what they meant was this. And here's how you can communicate what you want to communicate to them. I think that actually opens the door to much richer sort of worldwide conversations overall. So that's kind of me on my founder perch talking about large mission stuff. But that is what really excites me. And I think any platform that is thinking just about how do I complete this transaction more efficiently is thinking a little bit too small. There's a much greater opportunity here to be saying, how can I use this technology not just to complete that one transaction, but to enrich the relationship that I'm building with my customers and be someone that they can actually trust to solve their problems in a larger way?
A
No, this was, this was a great and fun conversation. And I think, you know, Mike, like as I talked about at the beginning, as you know, voice becomes more and more native and kind of the de facto interface. I think that you answered a lot of important questions that our audience is going to have both in 2026 and beyond. So thank you so much for taking time out of your day to join the Everyday AI Show. We really appreciate it.
B
Thank you so much for having me.
A
All right, if you missed anything, y'all, we're going to be recapping it all in today's newsletter, so make sure, if you haven't already, to please go to youreverydayai.com. Thanks for tuning in. Hope to see you back tomorrow and every day for more Everyday AI. Thanks, y'all.
A
The risk with AI Voice agents isn't that they sound too robotic for your company to use. The real risk is that they can sound too confident while saying something completely wrong to your prospective clients or customers, or made up refund policies, promises your company never approved, or discounts that don't even exist. You've got to give your AI Voice agents a trust layer with Modulate. Modulate monitors live voice conversations to flag abuse, false claims, fraud, and user emotions for safer, more empathetic responses. For the guardrail layer you need between your AI agents and your customers, you need Modulate at Modulate AI. And that's a wrap for today's edition of Everyday AI. Thanks for joining us. If you enjoyed this episode, please subscribe and leave us a rating. It helps keep us going for a little more AI magic. Visit youreverydayai.com and sign up to our daily newsletter so you don't get left behind. Go break some barriers and we'll see you next time.
Date: January 29, 2026
Host: Jordan Wilson
Guest: Mike Pappas, CEO & Co-founder, Modulate
Episode Theme:
Exploring the cutting edge of voice AI: technology that not only transcribes speech but understands tone, intent, emotion, and authenticity, unlocking new possibilities across industries.
This episode dives deep into the latest advancements in voice AI, focusing on Modulate’s Velma Voice Native AI Model and their Ensemble Listening Model (ELM) technology. Host Jordan Wilson and guest Mike Pappas discuss why understanding meaning—beyond words—matters, the practical impacts across various sectors (from gaming to customer service), and what businesses must know to safely and effectively embrace voice-enabled AI in 2026 and beyond.
On limitations of transcription:
On explainability of AI:
On the future of customer interaction:
On the big opportunity:
Voice AI that understands emotional nuance, intention, and authenticity—far beyond text and tokens—is not just a technical feat. It stands poised to transform how businesses build trust, detect fraud, assist customers, and construct truly meaningful, human-centric relationships at scale.
Mike Pappas:
"There's a much greater opportunity here... to enrich the relationship... and be someone they can actually trust to solve their problems in a larger way." (27:22)