OpenAI just hosted their Dev Day 2024. Now, this is something a lot of people look forward to because essentially they're pitching new updates to developers, but really they're updates that everyone can use. And I think the more exciting thing here is that a lot of these amazing features are going to get embedded in all of the software we use every single day. So today on the podcast, I'm going to be covering all of the new updates. Specifically, they've introduced a Realtime voice API, fine-tuning with vision, model distillation, and something they're calling prompt caching. These are amazing updates and I'm going to break down all of them on the podcast today. Before we get into it, I want to say: if you are not on the waitlist for AI Box, head over to AIBox.ai. It's my very own AI marketplace and app builder that I've been working on for over a year. And this month, yes, October, I have some very exciting news and a very big announcement coming. So if you are on that waitlist, you will be the first to know. I would love for you to join the waitlist and join us on this journey. So let's get into everything that OpenAI has just announced at their most recent Dev Day. Now, before I get to the exact announcements, I also wanted to share a really interesting little snippet. TechCrunch did a little interview with them, and we know that right before this big Dev Day and all the announcements that came with it, a whole bunch of really key executives left OpenAI. So they were asked about it, and in that briefing, OpenAI's chief product officer Kevin Weil said, "I'll start with saying Bob and Mira have been awesome leaders. I've learned a lot from them. They're a huge part of getting us to where we are today. And also, we are not going to slow down."
So this, I think, is big news for a lot of people as we move into Dev Day, because they definitely did not slow down. They came out with a ton of absolutely incredible new updates. And the first one that I want to cover and talk about is what they're dubbing the Realtime API. This is the one everyone's talking about. Essentially, it's an API for their new voice model — not actually their latest, latest voice model, but it gives developers the ability to have real-time conversations: when you talk to their voice model, it responds back immediately. What this replaces — what people were doing before this — was chaining voice-to-text and text-to-voice models together. If you had your phone and you were chatting with something like, "Hey, teach me how to speak better Spanish," and it was responding back to you, previously how that worked was: you would talk, it would take that clip, transcribe it into text, send that text to the model, generate a response, and then send a voice clip back to you. And that takes a couple of seconds; there's latency. The problem was, with all of these new agents and sales tools and so on, it was taking a few seconds to get a response back, and it didn't seem very natural. So they've now officially created an API for real time, meaning it's listening to you in real time. I think it's partially predicting what you're going to say at the end of your sentence, and as soon as you are done speaking, it immediately gives you the response.
And you can kind of think of this the same way that when you're chatting with ChatGPT and ask it a question, you can watch it type out all the letters and sentences of what it's saying. With the Realtime API, the voice is automatically talking as it types that out. In the past, it would wait until it had typed the whole thing, turn it into audio, and then send you the voice packet. Now it's responding in real time. So what does this unlock? Some absolutely incredible things. They showed two demos from different companies. One is called Healthify, a nutrition and fitness coaching app. They are using the new Realtime API to create really natural conversations with their AI coach, for people who are looking to change their diet or who need personalized support. They showed a demo where they talk to it, asking for different health suggestions, and it responded immediately. They even switched between multiple languages while talking to it — I think they switched into Hindi — and it understands all of that and responds really quickly. Absolutely impressive. The second demo they showed is from a company called Speak. This is a language-learning app, and they're using the Realtime API to power their role-play feature, which is really, really impressive. I'm sharing these demos because I think there are going to be thousands of new apps like these doing really impressive things, but this kind of gets your head in the right place.
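To make the developer side of this concrete, here's a rough sketch of the kind of JSON events a client exchanges with the Realtime API over a WebSocket. The event names below (`session.update`, `input_audio_buffer.append`) follow OpenAI's published event schema, but treat the exact fields as assumptions and check the current API reference before building on this.

```python
import json

def build_session_update(voice: str, instructions: str) -> str:
    """Configure the live voice session: which voice to use and how to behave."""
    event = {
        "type": "session.update",
        "session": {
            "voice": voice,
            "instructions": instructions,
            # Ask for both audio and text so an app can show a live transcript
            # while the model is speaking.
            "modalities": ["audio", "text"],
        },
    }
    return json.dumps(event)

def build_audio_append(base64_chunk: str) -> str:
    """Wrap one base64-encoded chunk of microphone audio for streaming upstream."""
    return json.dumps({
        "type": "input_audio_buffer.append",
        "audio": base64_chunk,
    })

# These strings would be sent over the WebSocket as the user speaks; the server
# streams audio deltas back while the reply is still being generated.
session_msg = build_session_update("alloy", "You are a friendly Spanish tutor.")
```

The key design point is that audio flows both ways as a stream of small events, rather than as one big clip per turn, which is what kills the old transcribe-then-respond latency.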
So with Speak, they showed a demo where they're talking to it about different ways to improve their language skills. Specifically, they're demoing their app where it tells them to say a word, they say the word, and the AI, beyond just listening for the word, is actually listening for the pronunciation. So it's really, really impressive. And what this shows me, when it listens for pronunciation, is that it's not just doing voice-to-text — transcribing into text and sending that to the model. It's actually listening to the audio and decoding the language. That's really impressive. They showed a demo where they said a word in Spanish and it responded, "Oh, to say this word in Spanish, you need to really enunciate the last part of the word and make sure you pronounce it this way. Try again." So he says it again and it's like, "Fantastic, you did it great," and moves on. This is absolutely amazing to me. When you think of apps like Duolingo and all the other language-learning apps we commonly use, these will all shift beyond just picking the right word on the screen, or maybe even saying it: it's listening to how you say it, correcting your pronunciation, helping you avoid grammar errors, and it's a conversation. And this is how people learn languages, so it makes perfect sense. I think both of these are great examples. The other kind of example we're going to see a million of — and sometimes people will be annoyed — is the telemarketer stuff, which is going to be annoying with real time. But then you can also imagine customer service. A lot of times when I call customer service, like with my Internet company, I have to wait, and oh my gosh, it's so annoying.
Recently I was on hold for about an hour with my Internet provider to cancel, because I was switching to a faster Internet company. And if they had told me, "Hey, instead of waiting an hour to talk to an actual person to cancel — because how hard is canceling? this is not a difficult thing — would you like to talk to an AI and we can have you done in two minutes?" I would say 100% yes. So I think there are so many companies that are going to be able to leverage this to speed up processes and save money, and the customer — myself, in this case — would have been thrilled. As if I care whether I talk to the actual Susie to cancel this stupid subscription; please put me out of my misery. So I think this is really exciting. They also talk about safety: they say they have multiple layers of safety protections to mitigate the risk of abuse — really, it's scams and that kind of stuff. I recently saw a demo someone did where they told the new voice model to act like a scammer from India trying to trick you into giving them your credit card information. It was an Indian guy who did it, so, you know, no stereotypes — he came up with it. In any case, it did an amazing job with the accent and said exactly what a scammer would say. It was kind of a funny joke he was doing, but to me, I was like, oh crap — because that's not what's actually going to be calling you. It's not going to have that specific accent; it's going to have a Southern American accent, a Western American accent — wherever you're geographically located, it's going to mimic your area. So anyway, that's why I think the safety and privacy measures they're talking about are important; they're putting these safeguards in.
Now, will there be open-source models alongside these that people abuse? 100%. So I think you still have to be vigilant. It's not just, "Oh, OpenAI is going to keep it safe, so we don't have to worry." No — this is something we should be concerned about and watching. But it looks like OpenAI is going to mitigate it on their end, and because they're the best, the biggest, and the fastest, people won't have access to the best tool for scamming you. In any case, this is interesting, this is coming; there are pros and cons, but I'm excited because there are a lot of amazing things coming with this new Realtime API. All right, the second thing I want to talk about is vision fine-tuning. For those who don't know — or as a refresher on fine-tuning — I had the realization the other day that it's not as complicated as it sounds. It's something we talk about a lot with AI: "Oh, they fine-tuned this model," and it sounds like they did some big fancy thing. The intuition is simple: you're teaching the model from a set of examples. If you say, "Hey, write me a LinkedIn post. Here are five examples of LinkedIn posts I've written that I like — copy my tone and style," you've given it examples to imitate; fine-tuning, strictly speaking, means actually training the model on a dataset of examples like that, rather than pasting them into the prompt, but the idea is the same. So it's not some exclusive, fancy, hard-to-understand thing. What they've now introduced is vision fine-tuning. They said there are hundreds, thousands of companies that are fine-tuning — essentially giving the model a big dataset of text.
So you could imagine — I had a friend who was fine-tuning an AI model and wanted it to write the best TikTok comments, the ones most likely to get top-ranked. That was his goal: he just wanted it to write really good TikTok comments. So he scraped 20,000 TikTok comments from a bunch of viral posts and found the top comments. Then he used all of those top comments to fine-tune a model, to say: look, ChatGPT understands how to write comments, but it will generally write bad or generic ones. Here are the best ones, the ones that get the most upvotes — copy these tones, styles, and ideas, and now write really good TikTok comments. And it was able to do that; it made great TikTok comments that were really interesting, funny, or witty. He was thrilled with this fine-tuning. Okay, so this is a common thing: thousands of companies are uploading these big text datasets and fine-tuning. But the problem is, there are a ton of use cases that are not text, right? When you talk about medical imaging and you're trying to locate a tumor — yes, ChatGPT's vision can look at an X-ray scan and say, "Oh, looks like there's potentially an issue there." But how does it do that, and how accurate is it? So what they're doing now with vision is allowing you to fine-tune with images. For this medical example of finding things on an X-ray: you grab a hundred pictures of tumors on lungs, annotate them, and upload them to OpenAI for fine-tuning.
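As a sketch of what that upload actually looks like: vision fine-tuning uses the same chat-format JSONL as text fine-tuning, with images included in the user message content. The field names follow OpenAI's documented chat format; the URL, question, and label here are made up for illustration.

```python
import json

def make_vision_example(image_url: str, question: str, label: str) -> str:
    """Build one JSONL training line pairing an image with the desired answer."""
    example = {
        "messages": [
            {"role": "system", "content": "You are a radiology assistant."},
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            },
            # The assistant message is the "ground truth" the model is
            # trained to reproduce for this image.
            {"role": "assistant", "content": label},
        ]
    }
    return json.dumps(example)

line = make_vision_example(
    "https://example.com/scans/chest_001.png",  # hypothetical annotated scan
    "Is there a visible mass in this chest X-ray?",
    "Yes: an opacity of roughly 2 cm in the left upper lobe.",
)
```

A real training file would be a few hundred such lines, one per annotated image, uploaded before creating the fine-tuning job.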
Now when the model does image recognition, it's much better at actually recognizing a specific tumor. Right now, general image recognition can identify roughly what everything in the world is and give you an idea, but it's not a specialist in that particular area. Now you're able to fine-tune the model for that. So this is very interesting, and they gave some interesting examples of companies that have done this with them. One of them is Grab, a food delivery and rideshare company. They fine-tuned the model to recognize speed limit signs, so it's able to do that much better. There's a company called Automat, which essentially helps agents take actions based on UI — scrolling through the Internet, going to a website, buying, clicking, and doing things. They were able to fine-tune the model with images of buttons and UI elements so that the model knew what to click on, beyond just "here's this website, go to the sales page." They can fine-tune it and say, "These are all sales-page buttons; this is what the word 'sales' looks like," and they can get quite specific. Or maybe the button didn't say "sales" but just said "Learn more," and they could fine-tune it to know that "Learn more" typically means X, Y, and Z on these types of sites. Automat said it improved the success rate of their RPA agents from 16% to 61%, which is a 272% uplift in performance compared to the base GPT-4o. Really, really impressive. One other company is called Coframe, which is building an AI growth-engineering assistant that helps businesses create and test variations of websites and UI, trying to optimize business metrics. A big part of this is autonomously generating new branded sections of a website, right?
They're trying to optimize this, so they need to generate new sections based on the rest of the website. They fine-tuned GPT-4o with images and code, and by doing this they improved the model's ability to generate websites with consistent visual style and correct layouts by 26% compared to the base GPT-4o model. Pretty much, they upload an image of a website and then say, "Generate the next chunk of the website." With just the base model, it was, eh, okay — it didn't look like what should come next on the site. Then they fine-tuned it, and you can see that the next chunk of the website is perfect. It looks just like something you would expect to follow in the flow of the site. The way they do their headings, with multiple colors in the same word, comes out exactly the same — where the base model couldn't do this before. So fine-tuning made this much, much better. Now, again, they're all about safety and privacy: they're continuing to run safety evaluations on the fine-tuned models and monitoring everything going into them to make sure they're used only for permitted purposes. But overall, a really, really exciting use case. The next thing I want to talk about, which I think is so fascinating, is this concept called model distillation. This is the first time I've really heard that term become a popular thing. Model distillation is essentially fine-tuning a cost-effective model with the outputs of a larger model. So what that means is: we have o1, which just released, and it's this incredible model, but it is way more expensive — way more expensive and way more computationally intensive to run.
But then of course we have much smaller, much faster models, things like GPT-4o mini. A lot of developers I talk to say it feels almost free — you'd have to send a million messages for it to ever run up any sort of bill, because it's just so cheap, so fast, and so optimized. But the problem is the responses are not as good as o1's, especially o1-preview's. So what they're now able to do is fine-tune the smaller, more efficient models like GPT-4o mini on the outputs of the better model. In the past, people could do this, but it was a lot clunkier and didn't work very well. They've streamlined the whole approach so you can fine-tune these small models on the outputs of the really good models. You say: look, o1 gives me the right answer for this specific question; GPT-4o mini does not. You generate a thousand answers from o1, you feed them in, and now all of a sudden this really small, optimized, really cheap model gives you the responses you need for so, so much cheaper. This is really exciting and very interesting, especially for companies doing very repetitive tasks over and over again — they're saving a ton of money. Okay, I'm super stoked about this one. Speaking of saving money and optimizing, I want to talk about the last big update they released, which is called prompt caching. Prompt caching is an absolutely fascinating topic — again, in this same optimization, making-things-more-affordable vein of thinking. Essentially, they're going to offer automatic discounts on inputs that the model has recently seen.
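Before moving on — the data-preparation side of the distillation workflow just described (collect the big model's answers, then turn them into a training set for the small model) can be sketched like this. The JSONL shape matches the standard chat fine-tuning format; wiring the teacher calls up against the real API is left out, and the example pairs are hardcoded.

```python
import json

def build_distillation_dataset(pairs):
    """Turn (prompt, teacher_answer) pairs into chat-format JSONL lines
    suitable for fine-tuning a smaller "student" model."""
    lines = []
    for prompt, teacher_answer in pairs:
        lines.append(json.dumps({
            "messages": [
                {"role": "user", "content": prompt},
                # The large model's output becomes the training target
                # for the small model.
                {"role": "assistant", "content": teacher_answer},
            ]
        }))
    return lines

# In practice each teacher_answer would come from calling o1 on the prompt;
# here the pairs are hardcoded for illustration.
dataset = build_distillation_dataset([
    ("Classify this ticket: 'My invoice is wrong.'", "billing"),
    ("Classify this ticket: 'App crashes on login.'", "technical"),
])
```

Once the file is built, it goes through the same upload-and-fine-tune flow as any other dataset; the only conceptual difference from ordinary fine-tuning is where the labels come from.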
Meaning: every time you have a conversation with ChatGPT, it has to look at the context of all the previous messages to help with your most current one. All of that context it has already seen before. Every time you send a new message, while it adds a little bit of new text, everything above that new text is the same over and over again, and your context just keeps getting longer and longer. So everything it has previously seen, it's going to cache, and you get a 50% discount on the tokens used for it. Here's how the pricing works: when you use the API, they charge you for how many tokens — how many words, pretty much — are in the message you send. If you send a five-word question, you're charged that much for the input. Then when you ask a follow-up question, your previous question and response — maybe a hundred words — get included too, and you're charged for all hundred words plus your new ones. It just keeps snowballing and getting more expensive; chats get more expensive the more messages are in them. But now with caching, you get a 50% discount on everything it has already seen before. To me this is really exciting. In terms of pricing, for GPT-4o it's $2.50 per million uncached input tokens, and once tokens are cached it drops to $1.25 per million — 50% off, 50% cheaper. They did share some specifics of how it functions, because people were wondering what this looks like. It kicks in on any prompt longer than 1,024 tokens; below that, it doesn't really matter because the prompt is so short.
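The pricing math is simple enough to sketch. Using the GPT-4o input rates mentioned above ($2.50 per million uncached input tokens, $1.25 per million cached), here's a toy cost function for a single request; the rates are parameters so current numbers can be swapped in.

```python
def input_cost_usd(total_input_tokens: int, cached_tokens: int,
                   uncached_per_million: float = 2.50,
                   cached_per_million: float = 1.25) -> float:
    """Cost of one request's input, with the 50% discount on cached tokens."""
    uncached = total_input_tokens - cached_tokens
    return (uncached * uncached_per_million
            + cached_tokens * cached_per_million) / 1_000_000

# A long-running chat: 10,000 input tokens, of which 9,000 were seen before.
# Without caching this would cost 10,000 * $2.50 / 1M = $0.025;
# with caching it drops to $0.01375.
cost = input_cost_usd(10_000, 9_000)
```

This is exactly the "snowballing" effect described above: the longer the conversation, the larger the cached share, so the discount matters most on the chats that would otherwise be most expensive.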
In any case, they said caches are automatically cleared after 5 to 10 minutes of inactivity, meaning they're not going to cache your chat conversations and keep them forever — people are concerned about the privacy aspect here. So 5 to 10 minutes after inactivity the caches are wiped, and within an hour of last use they're completely removed. They also said that, as with all API services, prompt caching is subject to their enterprise privacy commitments, and prompt caches are not shared between organizations. Okay, so this is something people obviously want — an absolutely fascinating way of cutting costs and making these things more efficient. I see a ton of developers who are very, very excited about this. The last thing I want to talk about, which a lot of people have been very excited for, is that OpenAI put out a tweet saying, roughly: starting this week, Advanced Voice is rolling out to all ChatGPT Enterprise, Education, and Team users globally; free users will also get a sneak peek; Plus and Free users in the EU — we'll keep you updated, we promise. So pretty much, the Advanced Voice features that are amazing — everyone's been testing them out and showing demos where ChatGPT can talk in a thousand different ways, accents, tones, and styles — all of a sudden, all of the free users are getting a taste of that. People are freaking out on Twitter; they're really excited that free users are getting it. But at the same time, people are kind of mad, because at the end it said "Plus and Free users in the EU — we'll keep you updated, we promise." So if you're in the European Union, you do not get these yet. Now, a lot of this is due to European regulations — the AI Act and so on. In my opinion, they sort of overregulated.
And so now you see, like with the iPhone: all of the new Apple Intelligence features are not going to be available in the EU when they roll out. This cool stuff isn't coming to the EU, because it's just a lot harder to keep up with regulation there, and people seem upset about it. They're saying things like "living in the EU is becoming increasingly infuriating," and someone said, "our money is not good enough for you?" They're getting mad, but companies have to keep up with the regulations and what's required of them. So overall, hopefully, if you're not in the EU, you're going to get this right away as a free user, and that is really exciting to me. Now, if you are interested in different ways to make money with AI — different side hustles — I have a school community I've launched called AI Hustle. I'll leave a link in the description. Every single week I create an in-depth video breaking down one of my side hustles where I'm using AI: how much money I'm making, what products and tools I'm using — an exact breakdown of things I can't share publicly. It's all over on my school community. It's $19 a month, and the price will probably go up to $100 a month eventually, but right now it's that price, and if you lock it in, you lock it in forever — I'll never raise the price on you. We have an incredible community of over 150 people all sharing their AI projects, getting feedback, and giving you feedback. It's an amazing group, and we share exclusive stuff there on how people are making thousands of dollars. It's really exciting. So if you're interested, check out the link in the description — I'd love to have you in the AI Hustle School community — and I hope you all have an amazing rest of your day.
Release Date: February 18, 2025
Host: The AI Podcast
Episode Title: OpenAI's MASSIVE Announcements at Dev Day 2024
In this episode of The AI Podcast, the host delves into the significant announcements made by OpenAI during their highly anticipated Dev Day 2024. The focus is on groundbreaking updates that are poised to revolutionize the way developers and everyday users interact with artificial intelligence. From real-time voice APIs to advanced model fine-tuning and optimization techniques, the episode provides an in-depth analysis of OpenAI's latest innovations.
Overview:
OpenAI introduced the Real Time Voice API, a transformative feature that enables real-time interactions between users and AI voice models. Unlike previous voice-to-voice or text-to-voice systems that suffered from latency issues, this new API offers instantaneous responses, making conversations with AI feel more natural and seamless.
Use Cases Demonstrated:
Healthify: A nutrition and fitness coaching app utilizing the Real Time API to create natural and engaging conversations with AI coaches. This allows for personalized diet suggestions and immediate multilingual support, enhancing user experience.
Speak: A language learning application leveraging the API to facilitate interactive role-playing scenarios. The AI not only listens and transcribes but also evaluates pronunciation, providing real-time feedback to improve language acquisition.
Safety Considerations:
OpenAI emphasized the implementation of multiple safety layers to prevent misuse, such as scam attempts. The host highlighted concerns but noted OpenAI's commitment to mitigating risks through advanced safeguards.
Overview:
OpenAI expanded its fine-tuning capabilities to include vision-based models. This advancement allows organizations to train AI models with specific image datasets, enhancing accuracy and specialization in tasks like medical imaging or UI element recognition.
Use Cases Demonstrated:
Grab: Enhanced recognition of speed limits and signage for their food delivery and rideshare services.
Automat: Improved robotic process automation (RPA) by fine-tuning models to recognize UI elements, significantly boosting task success rates from 16% to 61%.
Coframe: Utilized vision fine-tuning to develop an AI growth-engineering assistant that autonomously generates and optimizes website sections, achieving a 26% improvement in visual consistency and layout accuracy.
Benefits:
This capability allows for highly specialized AI applications, increasing efficiency and accuracy in various industries by tailoring models to specific visual data.
Overview:
Model Distillation is a new technique introduced by OpenAI that involves fine-tuning smaller, more cost-effective models using the outputs of larger, more sophisticated models. This approach aims to retain the high-quality responses of larger models while significantly reducing computational costs.
Applications:
Ideal for tasks that require repetitive processing, such as customer service automation, data entry, and other routine operations where performance and cost-effectiveness are critical.
Overview:
Prompt Caching is an optimization technique designed to reduce costs associated with repetitive input processing in AI models. By caching previously seen inputs, OpenAI offers a 50% discount on tokens for repeated content, thereby making interactions more affordable.
Overview:
OpenAI announced the rollout of Advanced Voice to all ChatGPT Enterprise, Education, and Team users globally, with a sneak peek for free users. However, users in the European Union will not receive these features immediately due to regulatory constraints.
Regulatory Impact:
The delay in releasing advanced voice features in the EU underscores the challenges AI companies face in navigating international regulations, particularly those that prioritize privacy and ethical considerations.
The Dev Day 2024 by OpenAI marked a significant leap forward in the accessibility and functionality of AI technologies. From real-time voice interactions and specialized vision models to cost-effective model distillation and prompt caching, these advancements are set to democratize AI usage across various sectors. Despite some regulatory hurdles, particularly in the European Union, the overall reception of these innovations promises a transformative impact on both developers and end-users. The AI Podcast host emphasizes the exciting potential of these updates, encouraging the community to anticipate a future where AI seamlessly integrates into daily applications, enhancing efficiency, personalization, and user experience.