Everyday AI Podcast – "AI Can Finally Hear What You Actually Mean. What This Unlocks"
Date: January 29, 2026
Host: Jordan Wilson
Guest: Mike Pappas, CEO & Co-founder, Modulate
Episode Theme:
Exploring the cutting edge of voice AI: technology that not only transcribes speech but understands tone, intent, emotion, and authenticity, unlocking new possibilities across industries.
Episode Overview
This episode dives deep into the latest advancements in voice AI, focusing on Modulate’s Velma Voice Native AI Model and their Ensemble Listening Model (ELM) technology. Host Jordan Wilson and guest Mike Pappas discuss why understanding meaning—beyond words—matters, the practical impacts across various sectors (from gaming to customer service), and what businesses must know to safely and effectively embrace voice-enabled AI in 2026 and beyond.
Key Discussion Points & Insights
1. The Limitation of Text-Only AI (00:00–05:13)
- Problem: Traditional AI transcription tools focus on converting speech to text, missing key contextual factors such as tone, timing, and emotion.
- Jordan Wilson: “There's a big difference between a model… hearing what you’re saying or understanding the words that you’re saying versus understanding the tone. And now I think the technology is finally there that we can accomplish the latter…” (02:06)
- Why It Matters: Human communication is rich with subtext—emotion and intention are often as important as, or more important than, the words themselves.
2. Modulate’s Origins and Mission (03:30–04:26)
- Mike Pappas: Modulate's journey began in online gaming, working with titles like Call of Duty to distinguish between friendly banter and harmful language—impossible with text alone.
- Quote: “You can’t do that based on transcriptions. You have to understand how people are actually experiencing the conversation.” (03:56)
- Expansion: Now, Modulate works with Fortune 500 companies beyond gaming, including fraud detection and AI guardrails.
3. The Importance of Meaning in Voice (05:13–07:13)
- Illustration: Simple word examples (“hey, you coming?”) can carry wildly different meanings based on inflection and tone.
- Mike Pappas: “Text doesn’t communicate nearly as much as voice does. ... It’s not actually understanding the experience human beings are having…” (05:34)
- Voice AI must evolve to build context and understand conversational nuance.
4. Real-World Use Cases: Beyond Gaming (07:13–09:22)
- Example: For a major food delivery client, Modulate spotted not just emotional abuse (protecting drivers) but, unexpectedly, five times more attempted scams than legacy fraud tools had caught.
- Key Insight: AI models can detect fraudulent intent by analyzing emotion, authenticity, and subtle audio cues (even discerning fake background noises).
- Quote: “...what we were doing was listening live to the call and saying things like, ‘hey, this person is performing anger, but they don’t actually demonstrate authentic anger.’” (08:11)
5. Tackling the Threat of Synthetic & Cloned Voices (09:22–12:06)
- Issue: Deepfakes and voice clones are “shockingly frequent,” even fooling government agencies.
- Detection: While humans can’t reliably detect synthetic voices, AI models can—by analyzing discrepancies in emotion, background noise, and acoustic signatures.
- Quote: “Sam Altman sort of famously came out and said don’t even try to detect if they’re real. ... [But] AI systems can. Technology can notice the discrepancies.” (10:47)
6. Modulate’s Ensemble Listening Model (ELM) (12:06–14:18)
- How it works: ELM combines multiple specialized AI models—emotion detection, timbre analysis, behavioral analysis—to form a full picture of a conversation.
- Key Breakthrough: Orchestrating these models dynamically, enabling real-time, context-aware analysis.
- Quote: “It’s the ability to orchestrate these different kinds of models in a way that’s dynamic...” (13:23)
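The ensemble idea described above—separate emotion, timbre, and behavioral models whose outputs are combined into one judgment—can be sketched roughly as follows. This is an illustrative mock-up, not Modulate's actual implementation: the component names, score ranges, and thresholds are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class ComponentScores:
    """Hypothetical outputs of three specialized models, each in [0.0, 1.0]."""
    anger_emotion: float      # emotion model: how angry does the speaker actually sound?
    anger_performance: float  # behavioral model: how theatrically "performed" is the anger?
    timbre_synthetic: float   # timbre model: likelihood the voice is synthetic/cloned

def assess_call(scores: ComponentScores) -> list[str]:
    """Combine component-model outputs into human-readable flags, echoing the
    episode's examples: performed-but-inauthentic anger as a fraud signal, and
    acoustic signatures of a cloned voice."""
    flags = []
    if scores.anger_performance > 0.7 and scores.anger_emotion < 0.3:
        flags.append("performed-anger: possible social-engineering attempt")
    if scores.timbre_synthetic > 0.8:
        flags.append("synthetic-voice: possible clone or deepfake")
    return flags
```

Because the final flag is assembled from named component scores, each alert carries its own explanation—the property the episode later calls explainability.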
7. The “RAW Photo” Analogy and New Enterprise Use Cases (14:18–16:31)
- Jordan’s Analogy: ELM is like editing a RAW photo (with many tweakable layers) vs. a flat JPEG—allowing granular insight into each conversational aspect.
- Unlocks: Insight into sarcasm, intent, escalations, and a feedback loop for more accurate, holistic AI understanding.
- Example: Correctly interpreting “Nice job” as sarcasm—not sincere praise—by merging tone with text.
8. Impact on Customer Service & AI Agents (16:31–19:54)
- Transformation: Voice AI agents using ELM can recognize user frustration, escalate issues appropriately, and create more natural-feeling interactions.
- Mike Pappas: Users currently “regulate themselves” to be understood by AI, losing authenticity. Richer voice AI can bridge that gap.
- Quote: “We’re restricting ourselves to make it easier on the AI. And that’s part of why the traditional experience talking to an AI agent feels so stifling.” (17:54)
9. Key Considerations for Deploying Voice AI Agents (19:54–22:48)
- Top Three Concerns:
- Trust & Guardrails: Prevent “rogue” AI responses and hallucinations.
- “If the AI doesn’t know, it’s not going to hallucinate… or if it does, we will get an alert…” (21:01)
- Scale & Observability: Need for meaningful reporting and transparency at enterprise scale.
- Compliance & Explainability: Systems must be explainable and auditable—ELM’s component models provide logic trails.
- “ELMs are fundamentally explainable because we can look at the component models and… what contributed to that.” (22:27)
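The guardrail-and-observability pattern Pappas describes—don't let an uncertain answer reach the caller, and raise an alert when one is suppressed—can be sketched as below. This is a minimal illustration under assumed names and thresholds, not a description of any vendor's product.

```python
import logging

logger = logging.getLogger("voice_agent.guardrails")

# Hypothetical cutoff; a real deployment would tune this against audit data.
CONFIDENCE_THRESHOLD = 0.6

def guarded_reply(answer: str, confidence: float) -> str:
    """Return the model's answer only when its confidence clears the threshold;
    otherwise escalate to a human instead of risking a hallucinated reply,
    and log an alert so the suppression is observable at scale."""
    if confidence < CONFIDENCE_THRESHOLD:
        logger.warning("low-confidence answer suppressed (confidence=%.2f)", confidence)
        return "I'm not certain about that. Let me connect you with a human agent."
    return answer
```

The logged warning is the observability half of the pattern: at enterprise scale, the rate of suppressed answers becomes a reportable metric rather than a silent failure.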
10. The Future—What’s Next for Voice & Business (23:41–25:41)
- Business Priorities: Budgets are shifting from open-ended AI exploration to demands for cost-effectiveness and tangible ROI, and that shift is becoming urgent.
- New Scenarios: “AI duets”—enabling users’ personal AIs to negotiate with company AIs—raise questions about loyalty, prioritization, and the emotional fabric of customer relationships.
- Quote: “How do we make sure that in our haste to automate… we’re not actually crippling that kind of brand trust and loyalty that so many platforms rely on?” (25:28)
11. The Big Unlock: Relationship Building at Scale (26:22–27:53)
- Deeper Purpose: Beyond efficiency, AI should enable true understanding and relationship building.
- Mike Pappas: “The thing that [business owners] should be excited about is actually understanding their customers and being able to solve their customers’ problems, right? ... There’s a much greater opportunity here to be saying, how can I use this technology not just to complete that one transaction, but to enrich the relationship that I’m building with my customers...” (26:22)
Notable Quotes & Memorable Moments
On the limitations of transcription:
- “Text doesn’t communicate nearly as much as voice does.”
—Mike Pappas (05:34)
On the explainability of AI:
- “ELMs are fundamentally explainable because we can look at the component models and we can tell you here are the things that contributed to that.”
—Mike Pappas (22:27)
On the future of customer interaction:
- “We’re already seeing applications come up today where instead of me having to call my bank and wait on hold, I can have an AI duet I can delegate to the AI ... At what point are we just recreating an API?”
—Mike Pappas (24:18)
On the big opportunity:
- “There’s a much greater opportunity here to be saying, how can I use this technology not just to complete that one transaction, but to enrich the relationship that I’m building with my customers and be someone that they can actually trust to solve their problems in a larger way?”
—Mike Pappas (27:22)
Timestamps for Key Segments
- 00:00–03:14 — Introduction to the problem: what voice AI has missed until now
- 03:30–04:26 — Modulate’s background and core mission
- 05:13–07:13 — Why tone matters: simple examples & real-world implications
- 07:13–09:22 — Food delivery case study: uncovering fraud via voice analytics
- 09:22–12:06 — Deepfakes, voice cloning, and detection challenges
- 12:06–14:18 — Ensemble Listening Model (ELM): what it is, how it works
- 14:18–16:31 — “RAW photo” analogy; layering and feedback in conversation analysis
- 16:31–19:54 — Voice AI’s impact on customer service; overcoming “uncanny valley” interactions
- 19:54–22:48 — Rolling out voice AI: trust, observability, explainability, compliance
- 23:41–25:41 — The future: AI duets, changing relationships, and brand trust
- 26:22–27:53 — The real reason to be excited: deeper understanding, better relationships
Final Takeaway
Voice AI that understands emotional nuance, intention, and authenticity—far beyond text and tokens—is not just a technical feat. It stands poised to transform how businesses build trust, detect fraud, assist customers, and construct truly meaningful, human-centric relationships at scale.
Mike Pappas:
"There's a much greater opportunity here... to enrich the relationship... and be someone they can actually trust to solve their problems in a larger way." (27:22)
