Transcript
A (0:00)
It is the beginning of December and I'm doing the December AI Box update. For those that have been listening to the podcast for a long time, you know I have been working on a startup called AI Box for quite some time. We did a successful crowdfund for it and wanted to make a podcast. If you are currently listening to this, I'd recommend clicking the link in the show notes over to YouTube and watching this, because it's going to be a demo of the AI Box platform. I'm going to cover everything that we've been able to accomplish so far and what we're going to be working on next before the big rollout, giving you all the updates on all of that. This is going to be a lot more interesting as a video, so I'd recommend checking it out on YouTube, or if you're on Spotify, you can keep watching the video there as well. So we'll be getting into all of this. The first thing I wanted to mention is that some of the most noticeable updates we've been able to accomplish so far have been to the actual AI models we've added to the platform. This is really exciting. If you go over to the AI Box platform, you can see that previously we had, you know, Sonnet and ChatGPT and Gemini. We've now added all of the Llama models, and I think the best one they have so far is definitely the 405B Turbo. To me this is just as good as GPT-4; a really impressive open source model that we brought in. We also brought in the Qwen models, including Qwen Coder. Qwen is essentially China's ChatGPT; it's from Alibaba, and this thing is actually quite impressive, I'll tell you that. One of the things I'm most impressed with about Qwen: when you send ChatGPT a request, a lot of times you'll recognize that it tries to keep things to kind of one document, or six or seven paragraphs max. Qwen is different.
I asked it to write me a PhD thesis on something, because we were testing how many credits a message could use for our internal testing. When I asked for a PhD thesis, Qwen gave me something like four pages of response, which I was actually really happy with, because that's what I was asking it for. So Qwen's really cool. We also have the Nemotron 70B from Nvidia, which is a fantastic model. It's a derivative; they essentially used Llama to help create it, but it's got a lot of Nvidia's own work in it. They did a really great job, so we're excited to have it on the platform. We have some more Llama models and some more Qwen models. When you see different Qwen models, it's like ChatGPT's lineup, or really any of these AI labs: they'll typically have their best model, and then a smaller model that is faster and more lightweight, or one specific for coding; they all have their own uses. We have WizardLM from Microsoft, which is a great model that they built. We have the DBRX Instruct model from Databricks, which I've done podcasts on in the past, talking about what they built there. We have DeepSeek's 67B model; DeepSeek has been trending lately, as it's created some really impressive models that have done well in some of the benchmarks, so we're excited to have DeepSeek on there. Code Llama is Llama's code-specific model, which does quite well when it comes to coding. One of my favorites, though, is the Mistral family. We've added six different Mistral models, including Codestral for developers, their small model, their edge model, their compact model, their vision model, and of course the large Mistral model. Now, one thing I wanted to talk about with all of these is the ability you have to see what each model does and compare each model on the platform. Every single model is going to come with a few things, and I'll cover some more models we added later.
But I wanted to go over this. For every model, we have a bit of a summary; these will get beefed up a little with some more details. We also have a capabilities tab. We added charts to every single model, essentially comparing the model against all of the other industry leaders, including how its price compares. The reason price is important is that the more expensive a model is, the more of your tokens it's going to use. I think it's important to know that if you're using something like o1-preview, that's awesome, but it's literally 26 times more expensive, and that's the reason ChatGPT only allows you a certain number of those per hour or per day; they have rate limits on it. The nice thing about AI Box is we'll never have rate limits. You'll be able to use it as much as you want, but you can use up your tokens, so you might have to go to a higher tier if you're only using one model and using it a ton. The other thing we have is quality. This is important just to know how accurate your responses are. We have third-party benchmark platforms that run all these tests and score the models. o1-preview is doing great, o1-mini is doing really great, Claude 3.5 Sonnet is doing really great, and you can see some of the trade-offs: Sonnet is maybe five points behind in quality, but it's something like a twentieth of the price to run, so sometimes that trade-off is super worth it. And then Gemini of course is on there as well, so you can compare everything. We also have a number of other examples of what each model can do. Mistral Large, for instance, is just a text model; it's going to give you text.
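To make that price-versus-quality trade-off concrete, here's a minimal sketch of how per-message cost scales with a model's per-token price. This is my own illustration, not AI Box's actual billing code, and the per-million-token prices are placeholder numbers for the comparison, not quoted rates:

```python
# Hypothetical per-million-token prices in USD (illustrative only,
# not real published rates or AI Box's internal pricing).
PRICE_PER_MTOK = {
    "o1-preview": {"input": 15.00, "output": 60.00},
    "claude-3.5-sonnet": {"input": 3.00, "output": 15.00},
}

def message_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one message, given its input and output token counts."""
    p = PRICE_PER_MTOK[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# The same 1,000-token-in / 2,000-token-out message on two models:
expensive = message_cost("o1-preview", 1000, 2000)
cheap = message_cost("claude-3.5-sonnet", 1000, 2000)
print(f"o1-preview: ${expensive:.4f}, sonnet: ${cheap:.4f}, "
      f"ratio: {expensive / cheap:.1f}x")
```

The exact multiple depends on the real prices and the input/output mix, but this is the shape of the calculation: a pricier model burns credits proportionally faster on the same message.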
But if you were to go over to something like Cohere, or to something like ChatGPT or Sonnet, you'd see it also has image capabilities, so you can upload images and ask it questions about them, and it will know what's going on. If you look at something like GPT-4o, you can see it has the ability to generate images and do the image stuff. Whatever the model is capable of, we have it up there, and we also have all of the benchmark data as well, which is really useful. We have the Cohere models; Cohere is doing some cool stuff. We just added Grok, so Grok Beta and Grok Vision from X, in anticipation of Grok 3 coming out very shortly; we want to make sure we have it on the platform. As far as image models go, we have some fantastic ones. We have DALL·E 3, which is what's built into ChatGPT. It's a decent model, but if I'm being 100% honest, there are a lot of models that are more impressive, especially Flux and Ideogram, and we've got both of those added. I'm going to show you a comparison and also how AI Box works and why it's useful for comparing tools. So if we go over to some previous chats, I asked ChatGPT for a picture of a green dog. You might have seen this earlier; if I'm being 100% honest, this thing doesn't look incredibly photorealistic. It's sort of cartoony, but that's just me, right? So that's ChatGPT's response. I then clicked this button here, Rerun Chat, where you can pick a different AI model to run the same request with. I ran it with Flux, and you can see Flux gave me a green dog that looks like a literal photo of a dog who dyed his hair green. I thought that was pretty awesome. Then I asked the same thing to Ideogram, and Ideogram also gave me a really fantastic, photorealistic image of a green dog that I thought was better than DALL·E 3's. The reason we have all of these models on here is to give you the ability to just click Rerun Chat, select a new model you're curious about, say to see what DALL·E 2 would spit out for this image, and compare. Just like tabs on a browser, you can toggle between all of your different AIs and find out what gives you the best quality output for what you're working on. So we have a ton of different stuff, from images to audio. One other thing I wanted to bring up while that's getting pulled up is our audio AI models; we have added two different categories there. Okay, so obviously DALL·E 2 is like the worst image model ever; its green dog looks like total garbage, so I wouldn't recommend it. But we have it on there so you know why you shouldn't use it, and you can see what is actually going to give you the best result. So that's kind of cool. For text-to-speech models, we have ElevenLabs and OpenAI. Both of these are really, really powerful, and we went quite deep on the settings for these models. We also have speech-to-text, which is just Whisper and Translate from OpenAI. But I think the text-to-speech models are probably more interesting. If you're looking at something like ElevenLabs, we have the audio capability, so you can see where it's benchmarked against other models. What's really impressive here is going over to the settings. You're able to look at all of the different voices they offer, play a sample of each, and then select the one you like best, and now you have essentially your new voice picked.
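The Rerun Chat idea, one prompt fanned out across several models so the outputs can be compared side by side, can be sketched roughly like this. This is a minimal illustration, not AI Box's actual code; `generate_image` is a hypothetical stand-in for whatever per-provider API call each model really requires, and the model names are just labels:

```python
def generate_image(model: str, prompt: str) -> str:
    # Placeholder: a real implementation would call each provider's API
    # and return an image URL or file path.
    return f"[{model} output for: {prompt!r}]"

def rerun_across_models(prompt: str, models: list[str]) -> dict[str, str]:
    """Run the same prompt against every model so results can be compared."""
    return {model: generate_image(model, prompt) for model in models}

results = rerun_across_models(
    "a photorealistic green dog",
    ["dall-e-3", "flux", "ideogram"],
)
for model, output in results.items():
    print(model, "->", output)
```

The design point is that the prompt is stored once and the model choice is just a parameter, which is what makes "rerun with a different model" a one-click operation.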
And then as far as quality goes, they have Turbo V2 or just their standard voice, and the difference is how many tokens it's going to use. Turbo V2 is pretty dang good, but standard actually works fairly great too. You can adjust your maximum tokens and save your defaults, but the real power here is being able to look through all of those different voices so you know exactly what you want. And the cool thing is, I'm showing you just an ElevenLabs chat here, but you can imagine this all in one fluid conversation. I recently asked for an analysis of the Gulf War, and o1-preview gave me a really, really long analysis. Then you can click to change the model, select ElevenLabs, change your settings, pick whatever voice you want, and now you have a voice that actually reads the whole thing. Very, very powerful tools. The other thing I wanted to talk about is that we've added this little More section, which pops up and gives you some more information about our terms and privacy policy. A lot of this is just compliance stuff we have to do, and it's coming along nicely. In addition, if you look over at our settings area, the things we are currently working on are the profile, so some information for logins, where you're able to actually change your account name and things like that, and then billing, which we're building on Stripe. These are the two big things: login profiles and billing via Stripe. Both of these work with other platforms, so they're not too bad. And then of course CloudFront, which is hosting this so people can actually start using it. Those are the last three things we're working on before we can launch.
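That text-then-speech handoff, generate a long answer with one model and then have a voice model read it aloud with your chosen voice, looks something like this in outline. Purely illustrative: `chat` and `synthesize` are hypothetical stand-ins for the real provider calls, and the voice and quality names are made up:

```python
def chat(model: str, prompt: str) -> str:
    # Stand-in for a text-model API call returning a long answer.
    return f"Long analysis from {model} about {prompt!r}..."

def synthesize(text: str, voice: str, quality: str = "standard") -> bytes:
    # Stand-in for a TTS API call; a real one returns encoded audio bytes.
    return f"<{quality} audio, voice={voice}, {len(text)} chars>".encode()

# Step 1: get the long text answer from a text model.
answer = chat("o1-preview", "analysis of the Gulf War")

# Step 2: hand that same answer to a TTS model with the voice you picked;
# the quality tier trades fidelity against token cost.
audio = synthesize(answer, voice="sample-voice-1", quality="turbo-v2")
```

The key property is that the chat history is model-agnostic: the text output of one model becomes the input of the next, so switching to a voice model mid-conversation is just another model change.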
So we're really excited about the progress we've made and where we're at on this. Something else we've done, which you don't really see a lot of, is error handling. When there is an error, say an AI model crashes, or you ask it to do something it doesn't know how to handle, we have error messages that show up for the user. This was quite a project, as we had to build in handling for a lot of different things someone could accidentally get wrong, and for when an API times out and how we wait to re-ping it. Sometimes you notice an image seems to take a long time, but that could just be that something went wrong: we got an error back from the server, so we have to ping it again before we can serve the result to the user. All the error handling has been a big project, and we're really excited about the progress we've made on it. In addition to error handling, all of the models we've added to the platform, and there are a ton, had to be added to the admin dashboard in the back end. Essentially, when you run a model, Flux versus DALL·E 3 versus DALL·E 2, we know how much we're charging for everything, and all of that is being logged in the back end. It's a pretty complex system we built there, but we're really excited about it, and it's doing well. The final thing we've worked on, which again you don't see much of, is all the token tracking. As I mentioned with how much everything costs, whether that's audio, image, or text, every time we add a new model, it has different pricing, and all of that is managed in the back end so we know how much every single message should cost on every single model. We have all that figured out, which is exciting. And that's a lot of what we've been working on.
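The retry behavior described here, wait and re-ping when a provider API times out and only surface an error to the user once it keeps failing, is commonly implemented as retries with exponential backoff. A minimal sketch of that pattern, my own illustration rather than AI Box's actual code:

```python
import time

class UpstreamError(Exception):
    """Raised when the provider keeps failing after all retries."""

def call_with_retry(request_fn, max_attempts: int = 3, base_delay: float = 1.0):
    """Call request_fn, re-pinging with exponential backoff on failure."""
    for attempt in range(max_attempts):
        try:
            return request_fn()
        except (TimeoutError, ConnectionError):
            if attempt == max_attempts - 1:
                # Out of retries: surface a friendly error to the user.
                raise UpstreamError("The model is not responding; please try again.")
            time.sleep(base_delay * 2 ** attempt)  # wait 1s, 2s, 4s, ...
```

This is also why a generation can look slow without anything being visibly wrong: the first attempt may have errored and the backoff delays are accumulating before the successful retry finally returns.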
So stay tuned as we're getting ready to launch this product. I'm really excited about it; this is something I've put blood, sweat, tears, and all my ideas and passion into. I'm really excited about the capabilities here: being able to click Compare and look side by side at different images, or different text or code you could generate. There's a lot of really exciting stuff. I've covered a lot of these features in past episodes, so I won't go over every single feature we have on here, but I will say there's a lot of exciting stuff, and we have more things coming in the pipeline. Right now, we're really focusing on getting this thing launched so you can get it in your hands. So look out for an email if you're on the AI Box AI waitlist for the platform, and if you have any recommendations for features you're not seeing here that you think would be really cool, send me a message on LinkedIn. We're rapidly adding new stuff, we have a really talented development team, and we'll definitely add some cool things to the roadmap. Otherwise, get on the waitlist and you can be a beta tester and launch with this very shortly. Thanks so much for tuning into the podcast today and for checking out this update. I'm really excited about it, and I'm excited to onboard everybody onto this really fabulous new platform we've been able to build.
