OpenAI's Advanced Voice Mode Is Finally Here - The Jaeden Schafer Podcast

Summary of "OpenAI's Advanced Voice Mode Is Finally Here"

Podcast: The Joe Rogan Experience of AI
Episode: OpenAI's Advanced Voice Mode Is Finally Here
Release Date: November 10, 2024

Introduction to OpenAI's Advanced Voice Mode

The episode delves into the much-anticipated release of OpenAI's Advanced Voice Mode, marking a significant advancement in artificial intelligence voice technology. After a four-month wait since the spring update, OpenAI has begun rolling out this feature to its users, promising enhanced capabilities that surpass existing AI voice models.

Notable Quote:

"Finally, the moment I have been waiting for for like freaking 4 months. OpenAI has stopped dragging its feet and is starting to roll out their advanced voice mode."
— Speaker A [00:00]

Comparison with Existing AI Voice Models

Speaker A contrasts OpenAI's new offering with existing AI voice products from companies such as ElevenLabs and WellSaid Labs. These platforms offer reliable AI voices that are good enough for podcasting and corporate use, but in the speaker's view OpenAI's Advanced Voice Mode adds a level of dynamism that earlier models could not match.

Notable Quote:

"I've used like those other kind of AI voices to scale up entire podcasts, like over 100,000 listeners before. So, like, they're good enough that people like them and people, all sorts of corporate people use them. This isn't what OpenAI is doing."
— Speaker A [05:30]

Features of OpenAI's Advanced Voice Mode

The standout feature of OpenAI's Advanced Voice Mode is its dynamic voice modulation. Unlike traditional AI voices, which are trained on audio recordings that say things in one specific way, OpenAI's model can adapt intonation, emotion, and style in response to user instructions, making interactions feel more natural and personalized.

Key Highlights:

Dynamic Voice Adaptation: Ability to adjust tone and emotion on-the-fly.
Versatility: Can perform tasks ranging from delivering scripts to mimicking specific speech patterns, such as sounding out of breath or imparting sarcasm.
Natural Interaction: Intonation and inflection are built into the model, so its speech sounds less flat than conventional AI voices, which drop natural elements like pauses and stutters.

Notable Quote:

"What OpenAI has done is they've trained a model that's super dynamic, meaning that you can tell it to, you know, you give it a script and say, hey, like, say this with the voice and it will say it. Then you can say, okay. Say it like you're running up a hill and you're like, out of breath."
— Speaker A [10:45]

Introduction of New Voices

OpenAI's update brings the total to nine voices, all named after elements of nature, reinforcing the platform's aim for a natural and organic user experience. The five new voices are Arbor, Maple, Sol, Spruce, and Vale; they join the four existing voices, Breeze, Juniper, Cove, and Ember.

Notable Quote:

"They have Arbor, Maple, Sol, Spruce, Vale. Right. They're doing all like the nature name things and they already have Breeze, Juniper, Cove and Ember, which yeah, whatever."
— Speaker A [15:20]

Controversy Surrounding the 'Sky' Voice

A significant point of discussion is the omission of the 'Sky' voice, initially showcased in the spring update. This voice closely resembled Scarlett Johansson's portrayal of an AI assistant in the movie Her. According to the speaker, OpenAI had asked Johansson for permission to use her voice and she declined; when the spring demo then featured a similar-sounding voice, the resulting backlash led OpenAI to remove 'Sky' from the platform while maintaining that the resemblance was a coincidence.

Key Points:

Resemblance to Scarlett Johansson: The 'Sky' voice mirrored the AI from Her, sparking controversy.
Legal and Ethical Implications: Allegations that OpenAI used Johansson's voice without consent.
Removal of 'Sky': In response to the backlash, OpenAI deleted the contentious voice from their offerings.

Notable Quote:

"Everyone that tried the voice, Sky, it sounded exactly like the AI system from the movie Her, which is essentially just Scarlett Johansson's voice... they released their model and showed their update with that voice, and then it got tons of controversy... then they deleted it for no apparent reason."
— Speaker A [22:10]

Roll-Out Details and Availability

OpenAI is gradually rolling out Advanced Voice Mode to its paying users, starting with Plus and Team subscribers. An in-app notice says all Plus users will have access by the end of fall, which the speaker guesses means around the end of October 2024. The release currently excludes the EU, the UK, Switzerland, Iceland, Norway, and Liechtenstein, though the speaker says availability in these regions is coming soon.

Key Points:

Phased Roll-Out: Gradual introduction to ensure quality and manage demand.
Pricing: Advanced Voice Mode is included for Plus and Team users.
Regional Limitations: Not yet available in the EU, the UK, Switzerland, Iceland, Norway, or Liechtenstein; the speaker says it is coming soon.

Notable Quote:

"Advanced voice mode is on its way... All Plus users will have access by the end of fall and we'll let you know as soon as you're in."
— Speaker A [25:50]

Final Thoughts

Speaker A expresses enthusiasm for the advancements brought by OpenAI's Advanced Voice Mode, emphasizing its potential to revolutionize interactions with AI. Despite minor setbacks, such as the removal of the 'Sky' voice and regional roll-out delays, the overall reception of the update is positive, heralding a new era of dynamic and natural AI voice interactions.

Notable Quote:

"This is pretty awesome. I'm super excited for everything that's going to be rolling out. I will keep you up to date."
— Speaker A [28:30]

Conclusion

OpenAI's Advanced Voice Mode represents a significant leap in AI voice technology, offering unprecedented flexibility and naturalness in voice interactions. While challenges remain, particularly in ethical considerations and regional availability, the update sets a new standard for AI-driven communication tools.

Transcript

[00:01]
A
Finally, the moment I have been waiting for for like freaking 4 months. OpenAI has stopped dragging its feet and is starting to roll out their advanced voice mode. This is something that they showed off in their spring update, like four months ago. There's a few things that are still missing. One of them is the video that takes over your whole phone. There's a bunch of other things. I'm going to be telling you who this update is coming to, what's going on, what is enabled, and everything all about it. But before we get into that, I wanted to say that today's episode is sponsored by my very own AI Hustle school community. So if you've ever wondered, like, or if you've ever wanted to use AI for a side hustle to make money or in your career, if you want to level up your career or your business, if you're like, gee, I want to know how to use AI to make more money, scale my business and grow it faster, you need to join our school community. It is going to be $100 a month, but we have it at a fraction of that right now. I think it's like 20 bucks a month. And if you lock in the price now, we're not going to raise it on you. It's an incredible community and we cover everything that you need to know to use AI to scale your business or to make money. So the link is in the show notes to the school community. Would love to have you in it. Let's get into what OpenAI is doing. So this is actually kind of crazy. The thing that I think is amazing about this, that I was the most impressed with when they demoed this entire feature set from OpenAI, is the fact that right now we're all used to AI voices. This isn't like a new concept. They already have AI voices on ChatGPT that you can chat with. There's like two or three AI voices and those are kind of the traditional AI voices that you see everywhere. The things like ElevenLabs or WellSaid Labs, both of these I've used to run entire podcasts. Not, not my own podcast.
Don't worry, this is not a clone today, but maybe someday, who knows, right? But I've used like those other kind of AI voices to scale up entire podcasts, like over 100,000 listeners before. So, like, they're good enough that people like them and people, all sorts of corporate people use them. This isn't what OpenAI is doing. They have a whole new thing here that to me, the most impressive part was the fact that essentially the voice is very dynamic, so it's not just trained off of a bunch of audio files that are saying something in one specific way. For example, if you took every podcast I ever recorded, threw it into a regular voice model, and if I just typed in the words, it would say something pretty similar to this, minus like my stuttering or me pausing to think or me saying um, or but, or like, you know, stuff like that. Right? So it wouldn't be quite as natural. So what OpenAI has done is, they've trained a model that's super dynamic, meaning that you can tell it to, you know, you give it a script and say, hey, like, say this with the voice and it will say it. Then you can say, okay, say it like you're running up a hill and you're like, out of breath. And it's like, okay, I'm coming. Like, there's no way you can train a normal AI model to do that. Well, the ones of the past, they were not able to do this, right. They could do all sorts of crazy things. I've seen some pretty crazy demos. People will say, like, pretend that you're on like a Peloton and you're like, you know, the bike instructor. And you're like telling everyone to do everything and you're out of breath. And it does it. You could say like, say X, Y and Z. It says it. It's like, okay, now sing it. And then it sings. It's like, sing it happier. Sing it, like, more sarcastically. Like, now be angry when you say it.
So it has all of the intonation, all the inflection, that's all baked into the AI model. So it's, it's phenomenal. It will blow your mind if you listen to it, if you try it. And finally, OpenAI is rolling this out to everyone to see. Now, when I say to everyone to see, obviously that's not actually to everyone. This is going to be just to the paying users. I was super stoked because I pay for ChatGPT, like probably a lot of you. And I hopped on my phone and I looked and all I got was this stupid pop-up. So I'll give you the lowdown on this. Pretty much, it says, advanced voice mode is on its way. A rollout of advanced voice mode has started and we're slowly enrolling users in the alpha to ensure the quality of the experience. All Plus users will have access by the end of fall and we'll let you know as soon as you're in. There's a little okay bubble. Right. So moral of the story is by the end of fall, which has gotta be like probably the end of October, right, I'm hoping everyone should get it. But they're rolling it out, it's coming. So rest assured, the wait is over. Four freaking months and we're finally going to be getting this. So I personally am pretty excited about this. They had a tweet about this which is kind of funny. I'll tell you why it's funny. And this is from the official OpenAI account. They said, Advanced Voice is rolling out to all Plus and Teams users in the ChatGPT app over the course of the week. While you've been patiently waiting, we've added custom instructions, memory, five new voices and improved accents. So they had a bunch of new stuff, which personally I would rather they just released it when they announced it and added these things as they went. I think it would have gotten more hype and been more exciting.
But whatever. Then they said in the tweet, it can also say sorry I'm late in over 50 languages, which I think is fitting, and I think that they should apologize for making this thing four months late. But whatever, that's just me. So what are the new voices that this thing's getting and who else is kind of doing this? There's five new voices, which I'll tell you the names of, but essentially this is useless because it doesn't explain much until you can actually hear them. But they have Arbor, Maple, Sol, Spruce, Vale. Right. They're doing all like the nature name things and they already have Breeze, Juniper, Cove and Ember, which yeah, whatever. Okay, so that's cool. The thing that's funny though, so they now have like nine names. What's interesting is Google Gemini Live also does a lot of voices. Now it doesn't quite do the same dynamics thing, but Google Gemini Live has a little bit more than that. So still sort of beating them on the number. But I don't think the number really matters that much because if you go to something like WellSaid Labs or Synthesia, they got tons and tons of voices, or like ElevenLabs, tons and tons of voices, but doesn't necessarily mean they're better. I don't think anyone is doing this kind of dynamic voice changing that they're doing, which is absolutely insane. So it's funny because all of these things are named after nature. And I think it's kind of like, you know, a plug at, like, look, we're like, making it more natural, which it is. It's impressive. The one funny thing, the one controversy in all of this is that there's a voice that's missing that they announced in the spring update. Oh, what happened? And that is the voice of Sky. So, Sky, for those that don't know, everyone that tried the voice, Sky, it sounded exactly like the AI system from the movie Her, if you've seen it, which essentially is just Scarlett Johansson's voice. So there's a lot of controversy that goes into this.
And it's just a crazy coincidence that it sounds really similar to her. But essentially what happened was, I think, like Sam Altman or OpenAI, like, texted Scarlett Johansson and was like, hey, can we, like, use your voice and train a thing on it? And she didn't, like, give them permission, and they wanted to do it because obviously the movie Her with the AI voice assistant, they're like, oh, this will be, like, cool. This will be kind of meta. Right? She didn't give her permission. And then they released their model and showed their update with that voice, and then it got tons of controversy. I think she threatened to sue them or sued them or something over it, and then they deleted it and were like, no, it's a crazy coincidence that that voice sort of sounded like you, but we're just deleting it for no apparent reason, you know? Right. So obviously they probably got caught up in that one a little bit. But the voice is gone. Many people are sad about it. We will continue to move on. In any case, this is pretty awesome. I'm super excited for everything that's going to be rolling out. I will keep you up to date. Now, the one thing that I will say, if you are waiting and excited for this, you are not going to get it in a few different regions, or at least it's not yet available. That is the EU, the UK, Switzerland, Iceland, Norway and Liechtenstein. I am so sorry, everyone from Liechtenstein. It's not available yet, but it's coming soon. I think the bigger thing here, though, is that it's not available in the EU. I'm really hoping it rolls out, like, without any issues in the EU. I'll keep you up to date. Have a fantastic rest of your day.
