Podcast Summary: No Priors – The Future of Voice AI: Agents, Dubbing, and Real-Time Translation with ElevenLabs Co-Founder Mati Staniszewski
Podcast: No Priors: Artificial Intelligence | Technology | Startups
Episode: The Future of Voice AI: Agents, Dubbing, and Real-Time Translation with ElevenLabs Co-Founder Mati Staniszewski
Date: December 11, 2025
Hosts: Sarah Guo and Elad Gil
Guest: Mati Staniszewski, Co-founder and CEO of ElevenLabs
Episode Overview
In this episode, Sarah Guo interviews Mati Staniszewski, co-founder and CEO of ElevenLabs, a pioneering company in voice AI. The discussion explores how ElevenLabs is redefining human-computer interaction through advanced, human-like voice models used in narration, dubbing, customer support agents, and beyond. They address the challenges of merging cutting-edge research with product development, scaling to global impact, and the future of voice as an interface in technology—from creative tools to transformative applications in education and government.
Key Topics & Discussion Highlights
1. Introducing ElevenLabs: Mission, Scale, and Growth
- Mission: Solving how humans and technology interact via voice; building foundational audio models for seamless, human-like speech creation and understanding (00:52).
- Scale: Launched in 2022, now at 350 employees across global hubs, $300M ARR split evenly between self-serve (creators) and enterprise segments, 5M+ monthly active users, thousands of enterprise customers (01:57).
- Quote:
“All is underlied with that mission of solving how we can interact with technology on our terms in a better way.” – Mati Staniszewski (01:16)
2. Founding Insight & Market Belief
- Origins: The idea stemmed from the poor quality of movie dubbing in Poland, where one voice is used for all characters. This inspired the founders to envision technology that preserves original voices, emotions, and intonation across languages (03:32).
- Market Vision: Early skepticism from investors about demand; founders believed voice would become the major interface for technology, seeing opportunity in creative and interactive applications far beyond basic dubbing (05:00).
- Quote:
"Voice is the interface of the future." – Mati Staniszewski (05:58)
3. Product Sequencing: Merging Research & Commercialization
- Lab Model: ElevenLabs sequences product and research by forming “labs” (voice lab, agent lab, music lab), each tackling a core challenge with cross-functional teams (07:36, 10:30).
- Practicality: Focused on the first problem (high-quality, controllable voice), then built simple product layers, expanding as research breakthroughs allowed.
- Creation vs. Demand: Many use cases (e.g., dubbing, music, immersive media) were realized or expanded in response to market pull, not just push (10:30–12:05).
- Real-Time Interaction: Seamless communication with technology through natural speech remains the north star.
4. Quality, Voice Selection & Customer Use
- Selecting Voices: Enterprises often lack expertise to evaluate voice quality; ElevenLabs employs a “voice sommelier” to coach brands and select or customize voices, even for specialized or surprising requests (13:15–15:20).
- Personalization: Voices can be tailored to user preferences (age, mood, context) and even dynamically changed based on interaction (15:23).
- Benchmarks: Voice quality benchmarks lag behind those in language models and imaging due to subjectivity, voice specificity, and lack of nuanced labeling/data (16:43–17:54).
- Quote:
"Just switching the voice makes...such a big impact." – Mati Staniszewski (16:43)
"We have like a voice sommelier...to help you find what's the right branding of voice." – Mati Staniszewski (13:57)
5. Agent Platform & Emergent Use Cases
- Customer Support: Transition from reactive (support tickets) to proactive (guided, voice-based shopping assistance) agents, e.g., Meesho in India, Square, Cisco, Twilio (18:14–19:06).
- Immersive Media: Live, interactive characters in games (e.g., Darth Vader in Fortnite) and interactive book experiences (20:12).
- Education: Personalized AI tutors embodying renowned experts/teachers—chess coaching as Hikaru or Magnus, practicing negotiation with “Chris Voss” (21:00).
- Digital Government: Collaboration with Ukraine’s Ministry of Digital Transformation for agent-driven citizen services, proactive communication, and educational innovation (21:46–23:19).
- Quote:
"You can learn chess, but you can have Hikaru Nakamura or Magnus Carlsen be your teacher..." – Mati Staniszewski (21:05)
“Agentic government…they are so ahead and actually doing that.” – Mati Staniszewski (22:21)
6. Platform Competition and Business Model
- Strategic Fit: ElevenLabs is the best fit for companies needing platform-wide, multi-use-case voice solutions with engineering support; it is not a pure point solution or consultancy (24:25–26:43).
- Open vs. Proprietary: The value is in seamless product integration, localization, and support; internationalization and platform openness are key differentiators.
7. Competition with Big Tech & Open Source
- Why Not Google/OpenAI?: Success comes from focusing on both foundational model research and product layer—big labs haven’t prioritized specialized voice quality, control, and integration (27:21–28:10).
- Talent Density: “You don’t need scale as much as architectural breakthroughs” (29:28); top researchers (50–100 globally) have disproportionate impact.
- Open Source: Models will commoditize; product integration and workflow value will differentiate. Real-time conversation and dubbing still ~1–2 years from broad parity (30:02–31:53).
- Quote:
“The main part that I think is different in audio space is that you don't need the scale as much as you need the architectural breakthroughs, the model breakthroughs…” – Mati Staniszewski (29:28)
8. Competitive Advantage, Ecosystem, & Product Research Rhythm
- Temporal Advantage: Model and research breakthroughs create a 6–12 month lead; the real, defensible value comes from the product ecosystem of brand, models, voices, and integrations (32:32–33:28).
- Parallelization: Internal structure allows research and product squads to move independently, aiming to deliver customer value with a “three month rule” for dependencies (33:36).
- Research Focus: Expressive, controllable text-to-speech; accurate speech-to-text; fusing audio with other modalities (e.g., video); reducing latency for live interaction (34:31–35:52).
- Quote:
"The thing that will really give that long term value is the ecosystem that you create around..." – Mati Staniszewski (33:28)
9. The Future of Voice and AI Interaction
- AI Companions: Mati expects them to be a broad category, but his personal enthusiasm is for a “super assistant” over “social” AI; he expects the boundary between assistant and companion to blur (37:04–37:26).
- Education: Predicts profound change—personalized AI tutors, “your own teacher on demand” (40:24).
- Content Consumption: Envisions Einstein or Feynman giving lectures; anticipates major pedagogical/support applications still to come (41:14).
- Dictation & Control: Not all interfaces need to be personified, but voice will be a dominant input—especially with agents and robots (39:10).
- Quote:
“I think this will be one of the biggest use cases and I don't think it happened yet.” – Mati Staniszewski (40:34)
Notable Quotes with Timestamps
- “At ElevenLabs we are solving how humans and technology interact, how you can create seamlessly with that technology.” (00:52) – Mati Staniszewski
- "Voice is the interface of the future." (05:58) – Mati Staniszewski
- "[ElevenLabs] started with this mission of: can we narrate the work in a better way?" (08:09) – Mati Staniszewski
- "We have like a voice sommelier...help you find what's the right branding of voice." (13:57) – Mati Staniszewski
- "Just switching the voice makes...such a big impact." (16:43) – Mati Staniszewski
- “You can learn chess, but you can have Hikaru Nakamura or Magnus Carlsen be your teacher...” (21:05) – Mati Staniszewski
- “Agentic government…they are so ahead and actually doing that.” (22:21) – Mati Staniszewski
- “The main part that I think is different in audio space is that you don't need the scale as much as you need the architectural breakthroughs, the model breakthroughs…” (29:28) – Mati Staniszewski
- "The thing that will really give that long term value is the ecosystem that you create around..." (33:28) – Mati Staniszewski
- "I think this will be one of the biggest use cases and I don't think it happened yet." – re: AI tutors in education (40:34) – Mati Staniszewski
Important Timestamps
- ElevenLabs Mission & Products: 00:52–02:46
- Origin Story & Market Belief: 03:32–06:52
- Sequencing Research & Productization: 07:36–10:11
- Customer Needs & Voice Selection: 13:15–16:43
- Agent Platform & Use Cases: 18:14–23:19
- Business Model & Market Positioning: 24:25–26:43
- Competition and Open Source: 27:21–31:53
- Innovation and Ecosystem Building: 32:32–34:24
- Research Directions: 34:31–36:52
- Future Outlook (Companions, Education): 37:04–41:14
Closing Perspective
Mati envisions a world rapidly moving toward frictionless, natural spoken interaction with both technology and content—a transformation that’s as much about UI and experience as about AI research itself. ElevenLabs aims to be at the center of that shift, focusing on product depth and ecosystem integration to stay ahead in a future where high-quality voice is a basic expectation, not a luxury.
Summary by No Priors Podcast AI Summarizer. For more details, visit nopriors.com.
