Transcript
A (0:00)
Welcome to the podcast. I'm your host, Jaden Schaefer. Today we're talking about the fact that OpenAI is betting the farm on audio AI. This is way beyond ChatGPT just getting a nicer voice. There's a new report from The Information that says that the company has spent the last two months consolidating engineering, product, and research teams to rebuild its AI audio models, and they're doing this from the ground up. This is all ahead of an audio-first personal device, which we're expecting to come out roughly a year from now. There is a lot going on in the story and a lot is at stake, so we're going to be covering all of that on the podcast today. If you want to try the latest AI audio models from OpenAI or from ElevenLabs, which in my opinion is the best at the moment, I would recommend going and checking out AI Box AI. I have a playground that lets you access over 40 of the top models, everything from OpenAI, Meta, Google, DeepSeek, tons more. All these audio models, they're on there and you can try them out for $20 a month on my own startup, AI Box AI. I'll leave a link in the description if you want to go try that out. This big focus that we're seeing from OpenAI right now to push audio AI, I think, is essentially showing what's happening in the broader tech world. Right now, screens are starting to fade into the background. They're not, you know, the trendiest thing. A lot of these Silicon Valley startups and venture capitalists, a lot of people are talking about how screen time or screen addiction is bad. And so it feels like people are kind of pushing away from screens. And with that, audio is kind of taking the forefront. It feels like this is, you know, the response. There are, of course, smart speakers that have already normalized voice assistants, and those are in over a third of US households. And Meta recently added, of course, a feature to their Ray-Ban smart glasses.
They use a five-microphone array that essentially isolates a voice in a noisy environment. So if you're talking to someone in a bar or, you know, in some sort of noisy store or market, and you look at them, it has these five microphones, it isolates their voice and in your ear will amplify just their voice. So effectively it is, you know, turning your head into a directional microphone. Google also started testing Audio Overviews in June, which is, you know, converting the search results into conversational summaries. And Tesla, of course, is weaving xAI's Grok into its vehicles to create a voice-driven assistant that can help you when you're doing navigation or even like climate control, that kind of stuff. You could just talk to it and say, hey, like, you know, turn the heat up, et cetera. So big tech, I think, right now is not the only one that is betting really big on audio. There is a collection of startups that are chasing the same idea. There's, you know, some of them that have been successful and others that have, it feels like, failed extraordinarily, including Humane's AI Pin, which burned through hundreds of millions of dollars before essentially becoming a bit of a cautionary tale for screenless wearables. There's, of course, the Friend AI pendant. This is a necklace that you wear around your neck. It got famous because the CEO bought friend.com for like $6 million and basically blew their entire fundraising round on the domain name. And with that, their necklace is essentially just sitting there and it records your life and it keeps you company and, you know, some people are curious about it, a lot of people are worried about privacy concerns. You know, it's a crazy product. So I think right now there's at least two more companies, including Sandbar and another that's led by the Pebble founder Eric, which if anyone remembers Pebble, this is like the biggest Kickstarter of all time.
Back in the day, I think it was like $10 million for smartwatches before the Apple Watch came out. Very cool, novel idea. And they are developing an AI-powered ring. So there's actually two companies. So the Pebble founder Eric, and then also the company called Sandbar, both creating these AI-powered rings that are going to come out this year, which essentially let users stop, you know, talking to your phone and actually talk to your, quote unquote, hand, as it were. So right now there's a lot of different shapes, whether those are necklaces or rings or glasses. But I think underlying all of this is basically the same thing, and that is that audio is being positioned as the next dominant interface. Right? We're so used to using our phones, we're so used to typing things out. But now every environment, from your living room to your car to, of course, your body, all of those are turning into a control surface. So you can get, you know, smart speakers, your car can talk to you, your ring, your necklace, your glasses, like everywhere you go. So right now, I think this is why OpenAI is putting a massive focus on an upcoming audio model. It's expected early this year, so 2026. And it's reportedly designed to sound more human, handle interruptions like a real conversation partner, and even speak over you mid-sentence, which is kind of interesting slash annoying. I'm hoping you can turn that feature off if possible. I know the goal here. They're like, well, a human wouldn't let you just keep talking forever and ever. But it's like, well, sometimes I would like to. Like, I was using the chat feature on ChatGPT today and I was like, hey, I want you to do XYZ thing. And it was like, okay, I can do that thing. And I just kind of cut it off, because I'm like, great, whatever, here's everything you need to know. And I'm just listing out tons and tons of information for this specific problem, trying to give it all the context.
Kind of the same way I would copy and paste like a massive document in. And I can only imagine if, like, halfway through that it was interrupting me to ask follow-on questions, where I'm like, no, just listen to all my information first. Wait till I stop speaking. So I don't know. They're trying to make it sound more human, but you can imagine ways that that will also be more annoying when you're trying to use it to get some work done. Anyways, we'll see how customizable that is. They said that they are also imagining an entire lineup of devices, including glasses or a screenless speaker, that behave less like utilities and more like companions. So that's kind of an interesting approach that OpenAI is taking in all of that. And a lot of those products, I think, are going to come basically out of the partnership with, well, not Apple, but Jony Ive, who was formerly the Apple design chief. He made a deal with OpenAI. They bought his hardware company, which we don't even know what the product was, but they bought it for $6.5 billion. With that, they essentially got his design firm, which was called io. This happened back in May of last year. And I think that the big focus that he's always had with his design firm is this big emphasis on reducing device addiction. So audio-first products, in his view, offer a chance to correct some of the mistakes that happened in earlier generations of consumer tech. His idea is that, you know, we're addicted to all these bright, colorful videos and things on our screens. And so if he can make a screenless device, then people are going to be, you know, less addicted. But it can be kind of a useful thing. What's interesting though is they specifically say they're not trying to create a utility or like this useful thing. They're trying to create a companion. So something that you talk to, that you maybe use, that gives you information, ideas. I'm so curious where this companion goes.
I think that's what Friend is sort of trying to make with their pendant. Like, it listens to you and it can message you on your phone. That's the other crazy thing about the Friend pendant. It doesn't actually talk to you. You don't have conversations with it. It just has a microphone, which transcribes the text, and then it sends you text messages. Anyways, it's interesting. There's a lot of things at stake here. I'd be very curious to see if the companionship one really is that popular. Like, I definitely can see value in using these for utility. But companionship is interesting, I will say. Recently I had a very, very long conversation. It was, of course, the new year, and so I had a very long conversation with ChatGPT. This wasn't all chat; for parts of this I used the voice mode to give a bunch of details. But I basically went in there and gave it every single revenue stream I had last year and how much money I made from every single one, and gave it a bunch of business ideas I had for this year, and asked it where I should focus my time, what, you know, opportunities I was leaving on the table. Basically asking it for sort of like financial advice to break down all my businesses. It did an incredible job. It told me areas where I needed to focus more, areas where I should focus less. Like, I think intuitively some of those things I could have told myself, but it's so hard when you have these ideas that you feel sort of emotionally attached to, and it feels good to have a third party that feels non-biased just listen to the data and give you an output. So when we're talking about having a companion, and when Jony Ive talks about having a companion, like, yeah, I mean, I use ChatGPT for that, but if I had some sort of, you know, screenless device on my desk where I was just talking to it and it was responding and it was like business coaching me, very useful.
But I sort of view that more as a utility than a companion, instead of just being like, wow, it's a beautiful day. Like, I don't know what you would talk to a companion about to actually keep you company. I feel like that feels sort of dystopian, where I know some people don't have people, but we should all have people. And if there's people that don't have people, I feel like it's society's job in one way or another, whether that's family or communities, to reach out to those people. I know it sort of goes philosophical, but I love the uses, the productivity gains of AI. I don't think that we need to replace our relationships with them and make these our primary companions. You know, when possible, if you have the opportunity to have connections with real people, I think that always should be prioritized and put number one, especially because AI can go off the rails and all sorts of issues can arise. And, you know, obviously we're humans. We need humanity, we need people around us. So anyways, off my philosophical soapbox there for a second. This is going to be interesting to see what OpenAI comes out with. I'm super excited for just better audio models. Obviously, I think ElevenLabs is doing a phenomenal job, and ElevenLabs is doing some really interesting things here where they're allowing you to clone your own voice, and they have a ton of really cool tools and features that are really specifically built for audio. So that's where I find myself going. I think last month I was on ElevenLabs' $1,300-a-month AI audio tier, because I was getting a bunch of virtual assistants to generate a bunch of audio for a project I'm working on. So, like, obviously I'm a huge fan there, but if OpenAI can come up with a competing product that does a lot, I think there's definitely a high likelihood that they'd get some of my business.
And of course, a ton of people use them over on AI Box, my platform, which if you haven't already signed up for and tried out, go check it out. There's some really cool, exciting surprises, which I'm not announcing yet, but if you go over to the website, you'll see them. So yeah, go check it out. Thank you so much for tuning into the podcast. As always, if you could leave a review, that would help the show a ton, and I hope you have an incredible new year in 2026. All right, catch you in the next episode.
