Transcript
Host (0:00)
We've recently seen a huge number of companies trying to use AI to automate the basic or complex tasks you do on a day-to-day basis. The promise was that AI agents would be able to do all of these things themselves. As we're finding, agents, just like anyone else, need to be trained. And to train someone, you need an expert. So if you're not an expert lawyer, training your AI agent to be your lawyer and do a specific task for you can be tricky. But if you had the help of a lawyer to craft the prompt and read it over, it would be no problem. So what is the solution? How do you get an expert's advice or prompts into your agent to complete tasks?
There are a ton of companies tackling this, and it's actually something that I, with my company AI Box, am currently building toward, low-key, in stealth. So today on the podcast I want to talk about a couple of products that are tackling this: Dia, and also Comet from Perplexity. I'll give you my thesis on why their approach isn't the best and what they should be doing, and I'll explain a little bit about what I'm doing with AI Box. I hope that by the end this episode helps you understand what is going on with AI agents and how we can get them to perform better, no matter what solution you use.
To break this down, I want to start with the first one, which is Dia. Dia is put out by The Browser Company, and they've created something they call the first AI browser. We've heard this a couple of times. I think OpenAI is even working on an AI browser. It's an interesting question what the need is, and what the difference is between an AI browser and ChatGPT with agents or Operator built in. We also have Anthropic, which has computer use.
But in any case, The Browser Company has just launched a browser called Dia, and it's supposed to have AI built into it. We've seen browsers with a side panel where you're chatting with ChatGPT and it can do things for you in the browser. It's basically an interface for working with an agent. I do love the concept of browsers with AI and AI agents in them. I think it's really powerful.
So what is going on with Dia? How are they trying to automate these small tasks? The problem you run into, and I've found this myself while using ChatGPT Operator frequently, is that you'll ask it to do something and it can get through a bunch of the process, but it just doesn't know enough. It's not an expert in that particular area. I'll try to get it to help me with podcast editing. I'll say, hey, log into my podcast production platform, go to my latest episode, trim the silence at the beginning of the episode, trim the silence at the end, and look for any anomalies. I'm trying to explain what a podcast producer would do, to get it to go through our entire post-production workflow on an episode.
The problem is that there are many places where it can glitch. The other problem is that even if I tell it to do the same thing, it might do it great once, and the second time it might not completely understand. It gets confused by some buttons and goes off in the wrong direction. And if I say something like, go and look for anomalies in the recording and fix them, it has no idea what the heck anomalies are. You've got to be very specific. Because I do this all the time, and I have a production team that helps me work on all of these podcast episodes, I can be very specific.
I can say, look for any audio that is too loud and clipping, and apply these effects. Look for any audio that is silent, or where maybe they didn't have their mic close enough, and boost that audio. I can go through and explain. And if you do that, an agent can generally solve the problem. Now, what happens if someone is brand new to podcast recording, or to any process or task, or has only a minor amount of knowledge, and they don't know what they don't know?
I recently had this experience. I was helping my wife. She released an album of music and got kind of sick of all the mixing and mastering. She'd worked for months on this project and was really burnt out by the end, so she asked, hey, can you help me do the mixing and mastering? I'm like, okay, sure, how hard could it be? I opened up the software and was very overwhelmed by mixing and mastering, to say the least. I tried my hand at a couple of songs. They didn't turn out great. I had to turn to some professionals to help me.
So why did I have to turn to professionals? Why do we all have to turn to professionals? And when we're using these AI agents, whether it's Dia or Perplexity's tool, how do we get the input of a professional? Once I called a professional on the phone, got a bunch of tips and examples, and sent over one of my recordings, they did a bunch of stuff to it, and from what they told me I was like, okay, I could probably get 90% of this done myself. They were the experts, so I actually just sent them the files on that particular project. But I could figure a lot out based on what they'd built. And this is the solution that Dia and Perplexity are launching. Dia has launched something called a skills gallery.
Basically, once you get an AI agent to do a task, if it's something you have to do repetitively and it does it correctly one time, you save that as a skill. They've given three examples of quote-unquote skills that could be saved there and replicated. I'm not exactly sure how their platform works, but the beauty of this, you could imagine, is that if I accomplish something successfully, whatever prompt I gave it, however I got it to do that particular task, I can save a snapshot of that skill, where it essentially turns what it did into code, so it's not going to mess it up in the future. If you can save that snapshot and share it with other people, then someone who might not be an audio engineer or a podcast engineer, or a lawyer or a doctor or a plumber, or an expert in whatever domain you are an expert in, can use it to get part of the process done correctly, as created by a professional who understands what's actually going on and gives the AI agent enough direction to get the job done correctly.
Going back to my example of trying to mix and master some music for my wife: I had no idea how to level out the audio. I was going on YouTube and searching "how to level audio on a music track." I didn't realize that what I should have been searching for was how to get an EQ plugin for Logic Pro and apply it to a file. Honestly, if I'd done that, I probably could have found a tutorial and figured it out very quickly. I had no idea what EQ was, no idea how to add a bus, no idea about basically any of the industry jargon. So I was a million miles away. I watched so many videos that were barking up the wrong tree. I was looking up audio equalization, which was not the right thing.
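To make the saved-skill idea concrete, here is a minimal sketch in Python. It is not how Dia's skills gallery actually works (the host says he isn't sure how their platform works). The `Skill` class, its fields, and `build_prompt` are hypothetical names chosen for illustration, and the steps are taken from the podcast-editing example earlier in the episode.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Skill:
    """Hypothetical record of a task an expert got an agent to do correctly once."""
    name: str
    author: str
    steps: list[str] = field(default_factory=list)

    def to_json(self) -> str:
        # Serialize so the skill can be shared with other people.
        return json.dumps({"name": self.name, "author": self.author, "steps": self.steps})

    @classmethod
    def from_json(cls, text: str) -> "Skill":
        data = json.loads(text)
        return cls(name=data["name"], author=data["author"], steps=list(data["steps"]))


def build_prompt(skill: Skill, request: str) -> str:
    """Attach the expert's specific steps to a non-expert's vague request."""
    numbered = "\n".join(f"{i}. {step}" for i, step in enumerate(skill.steps, start=1))
    return f"Task: {request}\nFollow these expert steps in order:\n{numbered}"


# Steps an experienced producer would spell out (from the example in the episode).
podcast_cleanup = Skill(
    name="podcast-cleanup",
    author="podcast producer",
    steps=[
        "Trim the silence at the beginning of the episode.",
        "Trim the silence at the end of the episode.",
        "Find audio that is too loud and clipping, and apply the chosen effects.",
        "Find audio where the mic was too far away, and boost it.",
    ],
)

shared = podcast_cleanup.to_json()      # what the expert publishes
restored = Skill.from_json(shared)      # what a newcomer loads
prompt = build_prompt(restored, "Clean up my latest episode")
print(prompt)
```

The point of the sketch is that the newcomer only supplies the vague request, while the specific vocabulary and ordering come from the person who already knows the domain.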
In any case, I think this applies to basically every industry. There's industry jargon, different terms and things that you do inside an industry, and if you're not familiar with them, you don't even know the right questions to ask. So just having a professional come in and explain it, or in this case give you one of these snapshots, is literally the difference between mediocre or kind of crappy output from an AI agent and expert-level output.
And this is what companies are making all their money on nowadays. Everything's a quote-unquote GPT wrapper. People say that dismissively: oh, these companies are just GPT wrappers, they're not that great. It's like, yeah, and they work because they deeply understand what their customers need. They understand the issue and the solution better than the customer does. So the output they're giving is way better than what people could get themselves with ChatGPT. Even though, yes, they're using the technology of ChatGPT, they have domain expertise, so their outputs are way better. Okay, I think we all agree on the concept there.
This is really cool, and here are three examples that Dia gave in their skills gallery. I think it's a great concept. They have one that's "your no-nonsense shopping and deal-finding copilot." Obviously someone who's great at that created it. One says, "Copy edit any webpage to follow every style guide." So maybe someone needs help editing their website but has no idea what a style guide is. If you're not in marketing or graphic design, you might not really have a concept of what a style guide is or how to make it cohesive with everything else you have. People getting started on websites aren't web designers.
So this would be very useful. One says, "Find verification codes in Gmail for the site you're on." Sometimes people just figure out a trick for the best way to get it to find the verification code simply. I don't know, maybe this isn't one that you need to have. I think of my examples, like intense software editing or video, where it could be quite a lot more intense. Okay, so this is how they're doing it. I think it's really interesting.
Perplexity's new browser, which is called Comet, is apparently also going to launch something sort of similar, but their approach isn't quite a skills gallery. If you go over to Twitter, you can see their CEO, Aravind Srinivas, made a post announcing exactly what they're doing. He said: "Shortcuts for repetitive tasks rolling out next week on Comet. More invites will be sent next week too. The browser is going to be your personal console for getting work done." The example shortcuts they've given are: organize my tabs, teach me Comet, trending on social, evaluate this deal, prep meeting, manage my feed. So maybe things you do frequently, or, I feel like what they're saying is, things that people on their platform do frequently. I'm not sure if it's personalized to you. If it is, that's way more useful. But in any case, you can run these repeated tasks.
He then goes on to say: "You will also soon be able to create your own Tampermonkey-like scripts that are natural-language generated for ensuring repetitive usage of custom workflows. So everyone's Comet will feel personal and something you built for yourself, like your own apps, scripts, workflows, widgets, dashboards. With Perplexity Labs, the browser feels more and more like an operating system that way."
"There's a reason we purchased os.ai. The roadmap is to make Comet feel like your own mini customized computer within your existing computer or phone, and compute running across client and server, with the ability to run local models to do it." Okay, all of this is cool. Honestly, both of these projects and platforms are fantastic, and I think they're going to get a lot of usage and be very, very useful.
Now, here's the hole in the market that I'm seeing, and how I'm trying to fill it with my company AI Box. A tool like this one from Perplexity is all about building this custom computer. If you do these workflows repetitively, it can figure them out for you and help you accomplish the tasks, which is really useful. But there is a problem: if you, like the agent, don't know how to do something, you're not going to get any closer to accomplishing the task well.
So what I'm building at AI Box, to try to solve this problem that a lot of people are having with agents, is something called an AI agent builder platform. Essentially, you're able to build what we're calling agent boxes. These are tools that AI agents can use. You can think of it sort of like Dia's skills gallery, but much more comprehensive. It's not just that the agent accomplished a task and you saved that workflow as a snapshot. You can go in manually and get very granular about which AI models you want to accomplish different parts of the task, which prompts are used, which tools are used, and all the integrations you need to build out one particular function or task in a very replicable way.
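As a rough illustration of what "getting granular" could look like, here is a minimal Python sketch of an agent box as a fixed list of steps, each with its own model, prompt, and tools. This is an assumption for illustration only, not AI Box's actual format: `AgentBox`, `Step`, `plan`, and the model and tool names are all hypothetical placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One part of the task, pinned to a specific model, prompt, and tool set."""
    name: str
    model: str                      # placeholder model identifier, not a real product
    prompt: str
    tools: list[str] = field(default_factory=list)


@dataclass
class AgentBox:
    """Hypothetical expert-built workflow that an agent can run the same way every time."""
    name: str
    description: str
    steps: list[Step]

    def plan(self) -> list[str]:
        # The run order is spelled out by the expert, so it does not vary between runs.
        return [f"{s.name} -> {s.model} using {', '.join(s.tools)}" for s in self.steps]


box = AgentBox(
    name="episode-post-production",
    description="Post-production pass for a podcast episode",
    steps=[
        Step(
            name="trim-silence",
            model="audio-model-a",
            prompt="Trim the silence at the start and end of the episode.",
            tools=["audio-editor"],
        ),
        Step(
            name="fix-levels",
            model="audio-model-b",
            prompt="Reduce clipping audio and boost audio that is too quiet.",
            tools=["audio-editor", "compressor"],
        ),
    ],
)

for line in box.plan():
    print(line)
```

The difference from a saved snapshot, under these assumptions, is that each step's model, prompt, and tools are chosen deliberately by the expert rather than captured from one successful run.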
And it's built for experts to go in and automate different elements of a workflow, and for agents in the future to use. You could just go to your agent and say, do XYZ task, and go use tools from AI Box to help you accomplish it. The agent can then pick different agent boxes on our platform to help it accomplish the task in a very efficient manner. And perhaps we'll integrate with different agent platforms in the future, so you don't have to tell it to go to AI Box. You just tell your agent to do something, and it will search for tools on AI Box that can help it do that better. The outputs you get come from tools created by experts, and they're going to be the same every time. It's not like the agent is going to do things differently each time.
We're really excited about what we've built so far, which is a playground where you can test the top 40 AI models. There's text, image, audio, all of that, and you can try them out and skip paying subscriptions to all those platforms. That's what we're using to test all of our infrastructure right now, and in the coming weeks and months we'll be rolling out tools in the direction of this new concept of agent boxes, where we have a builder platform and you're able to actually create tools. If you want to try out the current product, the playground for all the latest AI models, you can go to AIbox.ai. And stay tuned: I'll be making announcements on the podcast as we go forward about what we're building next with the AI agent platform and what you'll be able to create.
Thank you so much for tuning into the podcast today. If you enjoyed it, make sure to leave a rating or review, subscribe, and like the video over on YouTube. It helps out tremendously with the algorithm. As always, I will catch you in the next episode.
