AI Voice Agents: How to Get Started
Podcast: AI Explored
Host: Michael Stelzner (Social Media Examiner)
Guest: Tommy Crist (Founder, Arose AI)
Date: January 20, 2026
Episode Overview
This episode dives deep into AI voice agents—what they are, how they work, and how marketers and business owners can practically deploy them. Michael Stelzner interviews Tommy Crist, founder of Arose AI, a rising expert in AI voice solutions, who shares actionable insights, detailed examples, and best practices for implementing AI voice agents in real-world business scenarios.
Key Discussion Points & Insights
Tommy Crist’s Journey into AI
- Early Beginnings: Tommy started his entrepreneurial journey at 18, teaching himself coding and AI via YouTube in his college dorm, which quickly led to a focus on voice AI as no-code solutions became accessible to non-technical founders.
"I started just watching YouTube videos, taught myself a bit about coding, bit about AI. And then... really stumbled into voice. AI became really viable for businesses." — Tommy [02:00]
- Content as Proof of Expertise: Tommy’s YouTube channel, focused on AI voice agents, became a magnet for clients interested in deploying these technologies.
The Current State and Impact of AI in College & Business
- Ubiquity Among Students: AI is omnipresent in college life, powering everything from assignments to job applications.
- Generational Divide: While young people embrace AI, legacy businesses show hesitancy, highlighting an opportunity for digital natives.
Misconceptions About AI Voice Agents
- Not Plug-and-Play: Building an effective AI voice agent takes more than signing up for a SaaS tool; complex agents can require 20–100+ hours of setup and iteration.
"Some agents, if they're super complicated, can take 80 to 100 hours to build... You can't just sign up for a software and expect to put in an hour or two of work and reap all the benefits of it." — Tommy [05:44]
What is an AI Voice Agent?
- Definition:
"A Voice agent is three different components and really three different AIs working in unison... ears (speech to text), brain (LLM/text to text), and mouth (text to speech)." — Tommy [09:37]
- Integration: Agents can also perform post-call actions like updating a CRM or sending emails.
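The three-part structure Tommy describes can be sketched as a simple pipeline. This is only an illustration: the `VoiceAgent` class and the `fake_*` stand-ins are hypothetical names, and a real agent would call a speech-to-text service, an LLM, and a text-to-speech service where the stand-ins are.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoiceAgent:
    """One conversational turn: ears -> brain -> mouth."""
    ears: Callable[[bytes], str]    # speech-to-text
    brain: Callable[[str], str]     # LLM (text in, text out)
    mouth: Callable[[str], bytes]   # text-to-speech

    def handle_turn(self, caller_audio: bytes) -> bytes:
        transcript = self.ears(caller_audio)   # what the caller said
        reply_text = self.brain(transcript)    # what the agent should say
        return self.mouth(reply_text)          # audio to play back


# Hypothetical stand-ins so the sketch runs without any external service.
def fake_ears(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_brain(text: str) -> str:
    return f"You said: {text}"


def fake_mouth(text: str) -> bytes:
    return text.encode("utf-8")


agent = VoiceAgent(fake_ears, fake_brain, fake_mouth)
reply = agent.handle_turn(b"I need to book a repair")
```

Because each component is just a callable, one provider can be swapped for another without touching the rest of the turn logic.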
Notable Quote:
"You’re only really paying for the time the AI is actually on the phone... unlimited scalability... clear cost savings and ROI." — Tommy [06:46–09:15]
Real-World Business Benefits
- 24/7 Availability: Never miss a call, especially critical for industries like home services or emergency repair.
- Consistency and Reliability: A well-designed agent makes fewer errors and performs the same way on every call.
- Scalability: Handle one call or thousands without adding staff.
- Cost Savings and ROI: Dramatically lower costs than hiring, with potential for 8–10x returns.
Example Use Cases
Inbound:
- Receptionist replacement, customer support, answering FAQs, booking meetings.
Outbound:
- Automated follow-ups (e.g., notifying customers of a package), reactivation campaigns (e.g., calling lapsed customers with an offer).
"It only takes 10 cents to make that call. Maybe it's worth doing. ... An AI can make those calls, hundreds, thousands a day, pitch them on this new reactivation campaign they're running." — Tommy [13:07–15:17]
- Interactive, Not Robotic: Modern agents respond in real time and handle follow-up questions or objections.
"It's not just a broadcast of a recording. This is an interactive agent." — Michael [15:42]
“Handle objections, everything.” — Tommy [15:42]
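The reactivation-campaign arithmetic can be worked through in a few lines. The 10-cent cost per call comes from Tommy's quote; the campaign size and the value of a reactivated customer are made-up numbers for illustration, and the function names are hypothetical. Amounts are kept in whole cents to avoid floating-point rounding.

```python
def campaign_cost_cents(calls: int, cost_per_call_cents: int = 10) -> int:
    """Total spend for an outbound campaign, in cents."""
    return calls * cost_per_call_cents


def break_even_conversions(calls: int,
                           value_per_customer_cents: int,
                           cost_per_call_cents: int = 10) -> int:
    """Smallest number of reactivated customers that covers the call spend."""
    total = campaign_cost_cents(calls, cost_per_call_cents)
    # Ceiling division using integers only.
    return -(-total // value_per_customer_cents)


# Assumed example: 1,000 calls, each reactivated customer worth $50.
cost = campaign_cost_cents(1000)                # 10000 cents, i.e. $100
needed = break_even_conversions(1000, 5000)     # 2 customers to break even
```

Under these assumed numbers, the campaign pays for itself if 2 of the 1,000 calls win a customer back.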
How Realistic Are AI Agent Voices?
- Sound Quality: The voice itself is nearly indistinguishable from a human's.
- Biggest Clues: Users notice “AI-ness” more from phrasing and behavior than vocal quality.
"Most people actually recognize it's AI from what it says, not how it sounds." — Tommy [16:28]
Limitations and Future Directions
- Current Gaps: Nuanced listening (picking up tone and emotion) is a key limitation, but rapid advances in multimodal models (such as Google's Gemini and OpenAI's models) should narrow the gap.
- Latency and Cost: State-of-the-art models can be costly and less responsive, but improvements are ongoing.
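Because a voice agent chains several models, a rough way to reason about responsiveness is a per-turn latency budget. The sketch below simply adds up stage timings and ignores any overlap from streaming. The stage names and millisecond values are invented for illustration; the one-second ceiling mirrors the ~800ms–1s target mentioned under customization in these notes.

```python
from typing import Dict


def turn_latency_ms(stages: Dict[str, int]) -> int:
    """Total response time if the stages run one after another."""
    return sum(stages.values())


def slowest_stage(stages: Dict[str, int]) -> str:
    """Name of the stage contributing the most latency."""
    return max(stages, key=stages.get)


# Hypothetical timings, not measurements from any provider.
stages = {
    "speech_to_text": 200,
    "llm": 500,
    "text_to_speech": 150,
    "network": 100,
}

total = turn_latency_ms(stages)          # 950
within_budget = total <= 1000            # True for these assumed numbers
bottleneck = slowest_stage(stages)       # "llm"
```

A budget like this makes the trade-off in the bullet above concrete: swapping in a larger, slower model raises the "llm" entry and can push the whole turn past the target.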
Considerations Before Implementation
- Start with a Clear Business Need: Don't do it for novelty—focus on real bottlenecks that voice AI can solve.
- Legal Compliance:
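- Robocalls and AI-generated voices are regulated under the Telephone Consumer Protection Act (TCPA), which the FCC enforces.
- Outbound telemarketing calls that use an AI voice require the recipient's prior express written consent.
- Inbound and transactional calls (e.g., package notifications) are generally lower risk, but check state and federal laws.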
"I will say that ruling specifically... only things that specifically mentioned were deep fakes or impersonating family members to get money, like extremely nefarious things." — Tommy [21:20]
- Ethical Disclosure:
- Some businesses fully disclose AI use, others aim for indistinguishable interactions.
- No clear difference in performance or customer satisfaction, but legal standards may evolve.
Steps to Deploy an AI Voice Agent
1. Discovery & Mapping
- Analyze business needs and map expected call flows.
- Treat it like creating a training manual for a new human employee.
- Bucket issues: Identify main reasons people call and ideal responses/actions.
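One simple way to approach the bucketing step is to tag a sample of past calls by reason and count them. This is a minimal sketch under that assumption: the tags, the sample data, and the `ROUTES` mapping are all hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def top_call_reasons(tagged_calls: List[str], n: int = 3) -> List[Tuple[str, int]]:
    """Most frequent call reasons, highest count first."""
    return Counter(tagged_calls).most_common(n)


# Hypothetical reason tags taken from past call notes or support tickets.
calls = ["booking", "pricing", "booking", "hours", "booking", "pricing"]
top = top_call_reasons(calls, 2)

# Hypothetical mapping from each bucket to the action the agent should take.
ROUTES = {
    "booking": "offer available slots and book a meeting",
    "pricing": "share the standard price list",
    "hours": "state opening hours",
}
plan = [(reason, ROUTES[reason]) for reason, _count in top]
```

Starting with the two or three largest buckets keeps the first version of the call flow small enough to test thoroughly.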
2. Standard Operating Procedures (SOP)
- Define SOPs—the more you have, the easier to translate into AI logic.
- If none exist, start by categorizing main customer requests and codifying typical responses.
"Everywhere you can find data on what people are asking you and contacting you about, you can find... what routes... you’d want to take them down in that flow..." — Tommy [25:55]
3. Platform & Tool Selection
Key No-Code Platforms:
- Retell AI (retellai.com) — Tommy’s top pick for reliability and ease of use.
- VAPI AI
- ElevenLabs Agent Builder
Supporting Tools for Integration:
- n8n — Open-source automation for integrations (e.g., Google Calendar).
- Deepgram — Leading choice for fast and accurate transcription.
- Cartesia — Real-time, cost-effective voices (Cartesia Sonic 3 recommended).
How They Work:
- Platforms like Retell and VAPI combine “ears” (speech-to-text), “brain” (LLM), and “mouth” (text-to-speech) from different best-in-class providers.
"They focused a lot on... making it really easy to build voice agents, easy to understand. They also have an incredible uptime... 99.99%" — Tommy [30:25]
4. Voice Agent Customization
- Latency Matters: Target a response time of roughly 800 ms to 1 s.
- Background Sounds: Add subtle background audio to fill awkward silences and make calls feel more human.
- Language Support: Use platforms/models that natively support multiple languages if needed.
- Prompt Engineering:
- Keep prompts simple and clear.
- Define role, available tools, expected context, and provide 2–3 sample interactions.
- Limit agent “babble”—keep agent responses short and actionable.
"I make mine only really say one or two sentences at a time. Because they can... go on for a while." — Tommy [27:33]
- Include clarifying/confirming questions to minimize errors.
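The prompt guidance above can be turned into a small template. This is one possible layout, not the format of any particular platform: `build_prompt`, the section labels, and the sample business are hypothetical.

```python
from typing import List, Tuple


def build_prompt(role: str,
                 tools: List[str],
                 context: str,
                 examples: List[Tuple[str, str]],
                 max_sentences: int = 2) -> str:
    """Assemble a short system prompt: role, tools, context, style, examples."""
    if not 2 <= len(examples) <= 3:
        raise ValueError("provide 2-3 sample interactions")
    lines = [
        f"Role: {role}",
        "Tools: " + ", ".join(tools),
        f"Context: {context}",
        # Short replies limit rambling; confirming details limits errors.
        f"Style: Reply in at most {max_sentences} sentences. "
        "Confirm key details before acting.",
        "Examples:",
    ]
    for caller, agent in examples:
        lines.append(f"Caller: {caller}")
        lines.append(f"Agent: {agent}")
    return "\n".join(lines)


prompt = build_prompt(
    role="Receptionist for a plumbing company",
    tools=["check_calendar", "book_appointment"],
    context="Callers usually want to schedule a repair.",
    examples=[
        ("My sink is leaking.", "Sorry to hear that. What day works for a visit?"),
        ("Tuesday morning.", "Just to confirm, Tuesday morning for a sink repair?"),
    ],
)
```

Keeping the template this small makes it easier to see which line to adjust when a recorded call shows the agent rambling or guessing.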
5. Testing & Iteration
- Monitor and Refine: Review recorded calls, identify hangups or hallucinations, and adjust prompts.
- Small adjustments can drastically improve performance.
"Listening to calls, finding what in the prompt caused it to hallucinate or trip up there... really, don’t underestimate the small adjustments." — Tommy [39:04]
6. Integration and Data Management
- CRMs and Sheets: Integrate post-call data logging with Google Sheets or CRM.
- Function Timing: Prefer pre- and post-call processes over live actions to avoid failed data captures when customers hang up early.
- Redundancy: Store recordings in at least two places so a single failure does not lose them.
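The post-call logging and two-location storage described above might look like the sketch below. It assumes the platform can trigger a function when a call ends; `on_call_ended` and the field names are hypothetical, and in-memory CSV buffers stand in for Google Sheets or a CRM.

```python
import csv
import io
from typing import Iterable, TextIO

FIELDS = ["call_id", "caller", "outcome", "ended_early"]


def log_call(record: dict, sinks: Iterable[TextIO]) -> None:
    """Append the same row to every destination for redundancy."""
    for sink in sinks:
        csv.DictWriter(sink, fieldnames=FIELDS).writerow(record)


def on_call_ended(call_id: str, caller: str, outcome: str,
                  ended_early: bool, sinks: Iterable[TextIO]) -> None:
    """Post-call hook: runs after the call is over, so a caller who
    hangs up early is still recorded."""
    log_call(
        {"call_id": call_id, "caller": caller,
         "outcome": outcome, "ended_early": ended_early},
        sinks,
    )


# Two in-memory buffers stand in for a primary store and a backup.
primary, backup = io.StringIO(), io.StringIO()
on_call_ended("c-001", "555-0100", "booked", False, [primary, backup])
on_call_ended("c-002", "555-0101", "hung_up", True, [primary, backup])
```

Writing the row only after the call ends avoids the failure mode mentioned above, where a live action is cut off mid-call and the data is lost.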
Notable Quotes & Memorable Moments
- On the Future of Voice AI:
"The next big jump for voice AI will 1000% be once it’s fully multimodal... and you can notice a lot more of, you know, the nuance of conversation." — Tommy [11:30]
- On Usefulness:
“Unlimited scalability... Whether you take one call a day or a thousand calls a day, the agent will behave the same.” — Tommy [06:46]
- On Getting Started:
“In five minutes you can get a demo up and going... as easy as logging in. They're all free to create an account. There’s no monthly subscription.” — Tommy [29:16]
Important Timestamps
- [02:00] – Tommy’s path into AI and Voice Tech
- [05:44] – Biggest misconception: not plug-and-play
- [06:46] – Business benefits: 24/7 presence, ROI
- [09:37] – What is an AI voice agent? The ears, brain, mouth analogy
- [13:07] – Inbound and outbound use cases
- [15:42] – Outbound is interactive, not a simple robocall
- [16:28] – How real do AI agents sound?
- [19:08] – Prerequisites: use-case fit, legal/ethical consideration
- [21:20] – Laws regarding AI voices and robocalls
- [22:15] – Disclosure: to reveal or not to reveal AI presence
- [24:19] – Next steps: Discovery, SOPs, selecting metrics
- [25:55] – Building flows without existing SOPs
- [27:33] – Keeping responses short to prevent rambling
- [29:16–31:22] – Overview of major no-code tools (Retell AI, VAPI, ElevenLabs)
- [33:58] – Latency: what affects agent speed
- [35:39] – Best-in-class for transcription and voices
- [39:04] – Best practices for prompt engineering and iterative agent training
Connect with Tommy Crist
- Website: Arose AI (booking page at /booking)
- Platforms: LinkedIn, YouTube (search "Tommy Crist")
Closing Thought
This episode equips marketers, creators, and business owners with a practical roadmap to deploying AI voice agents—focusing on strategy, compliance, tool selection, and iteration—to drive real ROI, not just hype.
For full show notes, visit socialmediaexaminer.com/aipod.
