The AI Podcast: "OpenAI's Advanced Voice Mode Is Finally Here" – Detailed Summary

Episode Title: OpenAI's Advanced Voice Mode Is Finally Here
Host: The AI Podcast
Release Date: January 18, 2025

Introduction

In this highly anticipated episode of The AI Podcast, the host delves deep into OpenAI's long-awaited rollout of their Advanced Voice Mode. After four months of speculation and delay, OpenAI has begun deploying this groundbreaking feature to its user base. The host expresses palpable excitement and anticipation, setting the stage for an in-depth exploration of what this update entails, its unique features, and the implications for both casual users and professionals in the AI landscape.

OpenAI's Advanced Voice Mode: Features and Innovations

Dynamic Voice Capabilities

One of the standout aspects of OpenAI's Advanced Voice Mode is its dynamic voice functionality. Unlike conventional AI voice models, which the host describes as trained on audio recordings that say things in one fixed way, OpenAI's model can change its intonation and inflection on request.

The host emphasizes this innovation, stating:

"The most impressive part was the fact that essentially the voice is very dynamic voice. So it's not just trained off of a bunch of audio files that are saying something in one specific way." (00:02:00)

This dynamic capability means that the AI can adapt its voice based on contextual commands, such as altering its tone to sound out of breath or injecting specific emotional undertones like sarcasm or happiness. This level of flexibility surpasses previous AI voice models, providing a more natural and engaging user experience.

Enhanced Naturalness and Intonation

The Advanced Voice Mode incorporates intonation and inflection directly into the AI model, enabling it to produce speech that more closely mimics human nuances. The host remarks:

"It has all of the intonation, all of the inflection that's all baked into the AI model. So it's, it's phenomenal." (00:03:30)

This advancement addresses a common limitation in earlier AI voices, which often sounded robotic or lacked the subtle variations present in human speech.

Rollout Strategy and Availability

Phased Deployment

OpenAI is adopting a phased rollout approach to ensure quality and performance. Initially, the Advanced Voice Mode is being introduced to ChatGPT Plus and Team users, with all Plus users expected to have access by the end of fall.

The host reads out the in-app notice he received about the rollout status:

"A rollout of Advanced Voice Mode has started and we're slowly enrolling users in the alpha to ensure the quality of the experience." (00:05:30)

He anticipates widespread availability by late October, expressing confidence that most users will gain access by then.

Regional Limitations

However, the rollout is not without its limitations. Certain regions currently lack access to the Advanced Voice Mode, including the European Union (EU), the United Kingdom (UK), Switzerland, Iceland, Norway, and Liechtenstein. The host conveys his apologies to users in these regions:

"If you are waiting and excited for this, you are not going to get it in a few different regions, or at least it's not yet available." (00:09:00)

He remains optimistic about future availability in these areas and commits to keeping the audience informed about progress.

New Voices and Competitive Landscape

Introduction of New Voices

Alongside the Advanced Voice Mode, OpenAI has introduced five new voices, each named after elements of nature: Arbor, Maple, Sol, Spruce, and Vale. These join the existing voices Breeze, Juniper, Cove, and Ember, bringing the total to nine distinct voice options.

The host lists the new voices:

"They have Arbor, Maple, Sol, Spruce, Vale. Right. They're doing all like the nature name things..." (00:07:30)

The host reads this naming theme as a nod to how natural the voices are meant to sound.

Comparison with Competitors

While OpenAI's voice selection is competitive, especially with its dynamic capabilities, competing products such as Google's Gemini Live, ElevenLabs, WellSaid Labs, and Synthesia offer a broader range of voices. However, the host contends that quantity does not equate to quality:

"They got tons and tons of voices, but doesn't necessarily mean they're better. I don't think anyone is doing this kind of dynamic voice changing..." (00:08:45)

He highlights that OpenAI's unique approach to voice modulation and naturalness sets it apart in the crowded AI voice market.

The "Sky" Voice Controversy

Background of the "Sky" Voice

During OpenAI's spring update, a voice named "Sky" was introduced, which closely resembled the AI assistant voice portrayed by Scarlett Johansson in the movie Her. This similarity sparked significant controversy due to the unlicensed use of a voice akin to a celebrity's.

The host explains:

"Everyone that tried the voice, Sky, it sounded exactly like the AI system from the movie Her... like Scarlett Johansson's voice." (00:06:15)

Legal and Ethical Implications

According to the host, OpenAI had asked Scarlett Johansson for permission to use her voice and was turned down. The "Sky" voice was nonetheless shown in the spring update, leading to backlash and, by the host's recollection, a threat of legal action from Johansson.

"Sam Altman or OpenAI texted Scarlett Johansson and was like, hey, can we use your voice... And she didn't give them permission." (00:07:15)

After the backlash, OpenAI removed the "Sky" voice from its offerings while, by the host's account, maintaining that the resemblance was a coincidence.

Community Reaction

The removal of the "Sky" voice has left many users disappointed, as it represented a highly realistic and appealing option. The host empathizes with the community's sentiments:

"Many people are sad about it. We will continue to move on in any case." (00:08:30)

Despite the setback, the Advanced Voice Mode rollout continues without the "Sky" voice.

Future Outlook and Conclusion

Anticipated Enhancements

Looking forward, the host is optimistic about the continuous improvements and additions to OpenAI's voice models. He anticipates that as more users gain access, further refinements will be made to enhance the technology's capabilities and accessibility.

"This is pretty awesome. I'm super excited for everything that's going to be rolling out." (00:09:30)

Final Thoughts

In wrapping up the episode, the host reiterates his excitement for OpenAI's Advanced Voice Mode and encourages listeners to stay tuned for future updates. He underscores the significance of this development in the broader AI landscape, highlighting how dynamic and natural-sounding AI voices can revolutionize user interactions across various platforms and industries.

Notable Quotes

Host on Dynamic Voices: "The most impressive part was the fact that essentially the voice is very dynamic voice. So it's not just trained off of a bunch of audio files that are saying something in one specific way." (00:02:00)
On Voice Intonation: "It has all of the intonation, all of the inflection that's all baked into the AI model. So it's, it's phenomenal." (00:03:30)
Regarding OpenAI's Tweet: "Advanced Voice is rolling out to all Plus and Team users in the ChatGPT app over the course of the week. While you've been patiently waiting, we've added custom instructions, memory, five new voices and improved accents." (00:05:00)
Host's Opinion on Delay: "It can also say sorry I'm late in over 50 languages... I think that they should apologize for making this thing four months late." (00:06:15)
On Competitive Voices: "They got tons and tons of voices, but doesn't necessarily mean they're better. I don't think anyone is doing this kind of dynamic voice changing..." (00:08:45)

Conclusion

This episode of The AI Podcast provides a comprehensive overview of OpenAI's Advanced Voice Mode, highlighting its innovative features, strategic rollout plan, and the challenges faced during its development. The host effectively balances enthusiasm with critical analysis, offering listeners valuable insights into the future of AI-driven voice technologies. Whether you're an AI enthusiast or a professional leveraging AI in your career, this episode serves as an essential guide to understanding OpenAI's latest advancements in voice technology.

Transcript

A (0:00)

Finally, the moment I have been waiting for for like freaking 4 months. OpenAI has stopped dragging its feet and is starting to roll out their advanced voice mode. This is something that they showed off in their spring update, like four months ago. There's a few things that are still missing. One of them is the video that takes over your whole phone. There's a bunch of other things. I'm gonna be telling you who this update is coming to, what, what's going on, what is enabled, and everything all about it. But before we get into that, I wanted to say that today's episode is sponsored by my very own AI Hustle school community. So if you've ever wondered like, or if you've ever wanted to use AI for a side hustle to make money or in your career, if you want to level up your career or with your business, if you're like, gee, I want to know how to use AI to make more money, scale my business and grow faster, you need to join our school community. It is going to be a hundred dollars a month, but we have it at the, we have it at a fraction of that right now. I think it's like 20 bucks a month. And if you lock in the price now, we're not going to raise it on you. It's an incredible community and we cover everything that you need to know to use AI to scale your business or to make money. So link is in the show notes to the school community. Would love to have you in it. Let's get into what OpenAI is doing. So this is actually kind of crazy. The, the thing that I think is amazing about this, that I was the most impressed with when they demoed this entire feature set from OpenAI is the fact that right now we're all used to AI voices. This isn't like a new concept. They already have AI voices on ChatGPT that you can chat with. There's like two or three AI voices and those are kind of the traditional AI voices that you see everywhere. The things like ElevenLabs or WellSaid Labs, both of these I've used to run entire podcasts. Not, not my own podcast. 
Don't worry, this is not a clone today, but maybe someday, who knows, right? But I've used like those other kind of AI voices to scale up entire podcasts, like over a hundred thousand listeners before. So, like they're good enough that people like them and people, all sorts of corporate people use them. This isn't what OpenAI is doing. They have a whole new thing here that to me, the most impressive part was the fact that essentially the voice is very dynamic voice. So it's not just trained off of a bunch of audio files that are saying something in one specific way. For example, if you took every podcast I ever recorded, threw it into a regular voice model, it's going to sound pretty like if. And if I just typed in the words, it would say something pretty similar to this. Minus, like, my stuttering or me pausing to think or me saying "or," "but," or like, I don't know, stuff like that. Right. So you wouldn't. It wouldn't be quite as natural. So what OpenAI has done though, is they've trained a model that's super dynamic, meaning that you can tell it to, you know, give it a script and say, hey, like, say this with the voice and it will say it. Then you can say, okay, say it like you're running up a hill and you're like, out of breath. And it's like, okay, I'm coming. Like, there's no way you can train a normal AI model to do that. Well, the ones of the past, they were not able to do this, right. They could do all sorts of crazy things. I've seen some pretty crazy demos. People say, like, pretend that you're on like a Peloton and you're like the run the, you know, the bike instructor, and you're like telling everyone to do everything and you're out of breath. And it does it. Um, it says, you could say like, say X, Y and Z. It says, it's like, okay, now sing it. And then it sings. It's like, sing it happier. Sing it sarcastic, like, more sarcastically, like, now be angry when you say it like. 
So it has all of the intonation, all of the intonation, all of the inflection that's all baked into the AI model. So it's, it's phenomenal. It will blow your mind if you listen to it, if you try it. And finally, OpenAI is rolling this out to everyone to see. Now, when I say to everyone to see, obviously that's not actually to everyone. This, this is going to be just to the paying users. I was super stoked because I pay for ChatGPT, like, probably a lot of you. And I hopped on my phone and I looked and all I got was this stupid pop up. So it's the. I'll. I'll give, I'll give you the lowdown on this pretty much. It says advanced voice mode is on its way. A rollout of advanced voice mode has started and we're slowly enrolling users in the alpha to ensure the quality of the experience. All Plus users will have access by the end of fall. I'm gonna let you know as soon as you're in. There's a little okay bubble. Right. So moral of the story is by the end of fall, which what's this gotta be like probably the end of October, right. I'm hoping everyone should get it, but they're rolling out, it's coming. So rest assured the wait is over. Four freaking months and we're finally gonna be getting this. So I personally am pretty excited about this. They had a tweet about this which is kind of funny. I'll tell you why it's funny, because it's. And this is from the official OpenAI account. They said Advanced Voice is rolling out to all Plus and Team users in the ChatGPT app over the course of the week. While you've been patiently waiting, we've added custom instructions, memory, five new voices and improved accents. So they added a bunch of new stuff which personally I would have rather they just release it when they announced it and added these things as they went. I think it would have gotten more hype and been more exciting. But whatever. Then they said in the tweet, it can also say sorry I'm late in over 50 languages, which I think is fitting. 
And I think that they should apologize for making this thing four months late. But whatever, that's just me. So what are the new voices that this thing's getting and who else is kind of doing this? There's five new voices which I'll tell you the names of them but essentially this is useless because it doesn't explain much until you can actually hear them. But they have Arbor, Maple, Sol, Spruce, Vale. Right. They're doing all like the nature name things and they already have Breeze, Juniper, Cove and Ember, which yeah, whatever. Okay, so that's cool. The thing that's funny though, so they now have like nine names. What's interesting is Google Gemini Live also does a lot of voices. Now it doesn't quite do the same dynamics thing, but Google Gemini Live has a little bit more than that. So still sort of beating them on the number. But I don't think the numbers really matters that much because if you go to something like WellSaid Labs or Synthesia, they got tons and tons of voices or like ElevenLabs, tons and tons of voices, but doesn't necessarily mean they're better. I don't think anyone is doing this kind of dynamic voice changing that they're doing, which is absolutely insane. So it's funny because they're. They're doing. All of these things are named after nature. And I think it's kind of like, you know, a plug at like, look, we're, like, making it more natural, which it is. It's. It's impressive. The one funny thing, the one controversy in all of this is that there's a voice that's missing that they announced in the spring update. Oh, what happened? And that is the voice of Sky. So Sky, for those that don't know, everyone, everyone that tried the. The voice, Sky, it sounded exactly like the AI system from the movie Her, if you've seen it, which essentially is just Scarlett Johansson's voice. So there's a lot of controversy that goes into this. And it's. It's just a crazy coincidence that it sounds really similar to her. 
But essentially what happened was, I think like Sam Altman or OpenAI texted Scarlett Johansson and was like, hey, can we, like, use your voice and train a thing on it? And she didn't, like, give them permission and they wanted to do it because obviously the movie Her with the AI voice assistant, they're like, oh, this will be, like, cool. This will be kind of meta. Right? She didn't give her permission. And then they released their model and showed their update with that voice. Then it got tons of controversy. I think she threatened to sue them or sued them or something over it, and then they deleted it and were like, no, it's a crazy coincidence that sort of sounded like you, but we're just deleting it for no apparent reason, you know? Right. So obviously they probably. Probably got caught up in that one a little bit. But the voice is gone. Many people are sad about it. We will continue to move on in any case. This is pretty awesome. I'm super excited for everything that's going to be rolling out. I will keep you up to date. Now, the one thing that I will say, if you are waiting and excited for this, you are not going to get it in a few different regions, or at least it's not yet available. That is the EU, the UK, Switzerland, Iceland, Norway and Liechtenstein. I am so sorry, everyone from Liechtenstein. It's not available yet, but it's coming soon. I think the bigger thing here, though, is that it's not available in the EU. I'm really hoping it rolls out, like, without any issues in the EU. I'll keep you up to date. Have a fantastic rest of your day.