Transcript
Indeed Advertiser (0:01)
This episode is brought to you by Indeed. Stop waiting around for the perfect candidate. Instead, use Indeed Sponsored Jobs to find the right people with the right skills, fast. It's a simple way to make sure your listing is the first thing relevant candidates see. According to Indeed data, Sponsored Jobs get four times more applicants than non-sponsored jobs. So go build your dream team today with Indeed. Get a $75 Sponsored Job credit at Indeed.com/podcast. Terms and conditions apply.
Microsoft 365 Copilot Advertiser (0:29)
The world moves fast. Your workday, even faster. Pitching products, drafting reports, analyzing data: Microsoft 365 Copilot is your AI assistant for work, built into Word, Excel, PowerPoint, and the other Microsoft 365 apps you use, helping you quickly write, analyze, create, and summarize so you can cut through clutter and clear a path to your best work. Learn more at Microsoft.com/M365Copilot.
Jackson Hewitt Advertiser (0:59)
This year's tax changes? Better not get caught snoozing. Miss one deduction, lose thousands. Not amusing.
Jackson Hewitt Advertiser (1:09)
Big tax changes can mean bigger refunds at Jackson Hewitt, and right now get $100 just to try us.
Jackson Hewitt Advertiser (1:14)
Don't worry, tax filers. If money is tight, get $100 from Jackson Hewitt, so you'll sleep…
Jackson Hewitt Advertiser 2 (1:25)
Limited time offer for new clients.
Jackson Hewitt Advertiser (1:26)
Participating locations only.
Jackson Hewitt Advertiser 2 (1:27)
Details at JacksonHewitt.com.
Dave (Host)
Okay, so we're going to get a little lost in the weeds today. I wanted to make this video to explain something that occurred to me, because, you know, might as well explain to the fish that water is wet. I've been in this space for so long that there are a few intuitive background facts I forget to even talk about. And one of them is this: artificial intelligence, as you are familiar with it, is a chatbot. And a chatbot has a tremendous number of training affordances that make it operate in a particular way, where it sits there and waits. It's trained to be an assistant. Now, that's not how it started, and that's not what a baseline LLM does. You might say, okay, a baseline LLM is just an overpowered autocomplete engine. So how do you get from a basic autocomplete engine to a chatbot like ChatGPT or Gemini? And then the biggest question: what's the difference between that and something with agency? What you need to remember is that one of the reasons Sam Altman and OpenAI created ChatGPT was, as they explicitly said, to get people used to the idea of AI before dropping general intelligence on them sometime down the road. They didn't know ChatGPT was going to take off the way it did. Before ChatGPT, LLMs were just prompt, context, and then output. The AI would just wait there; you'd give it context and it would follow those instructions. But here's the thing: it could follow literally any instructions. There were no safety guardrails, and there was no fixed output format. It was not limited to being a chatbot. Over the last three or four years, as people have gotten used to artificial intelligence in its current format, it has been completely reactive, not proactive at all, with a lot of safety guardrails. So whenever people say, oh, AI doesn't have agency yet, it needs agency before it can do that, you have to realize: the difference between a chatbot and something with agency is basically just a system prompt. There's no other difference. The format it is delivered to you in is deliberately as benign as possible, something that will not cause panic. Meanwhile, people were putting GPT-3 or GPT-4 into cognitive architectures and using them to control anything from a robot to an auto turret. Yes, people did that; you can go back through the YouTube archives and search for GPT-powered auto turrets. The chatbot form factor is just the first thing that blew up, and nobody expected it to blow up. In point of fact, when ChatGPT came out, I ignored it on my channel. Most of you know me as the AI guy who talks about safety and alignment and all of those things, but I was making tutorials back with GPT-3, long before chatbots were a thing. The reason I ignored ChatGPT when it first came out was that I said, that's just one sub-version of this engine. The real core of this engine is what the underlying deep neural network can do. And the thing is, those deep neural networks, when you don't train them to be chatbots, can do anything else.
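To make that concrete, here is a minimal sketch of the claim above: the base model is one completion function, and "autocomplete engine," "chatbot," and "agent" are just three different prompt scaffoldings around it. The llm() function is a hypothetical stand-in for any raw text-completion call, not a real API.

```python
# Hypothetical stand-in for a raw next-token completion engine.
def llm(prompt: str) -> str:
    raise NotImplementedError("wire this to any base-model completion API")

# 1. Bare autocomplete: no roles, no guardrails; it just continues the text.
raw = llm("The capital of France is")

# 2. Chatbot: the SAME engine, wrapped in an assistant persona and turn markers.
chat = llm(
    "You are a helpful, harmless assistant.\n"
    "User: What's the capital of France?\n"
    "Assistant:"
)

# 3. Agent: the SAME engine again, but the system prompt hands it a goal,
#    a tool menu, and permission to act without waiting for a human.
action = llm(
    "You are an autonomous agent. Goal: keep the news digest current.\n"
    "Tools: search(query), send_email(to, body)\n"
    "Decide your next action and emit it as a tool call:"
)
```

The only thing that changes across the three calls is the text wrapped around the model, which is the "it's just a system prompt" point.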
They can write API calls. If you give them IO pins, they can control servos, whatever it is you want them to do. And some of those things are less human, less familiar, less personified. Now you might say, okay, Dave, you're the one who asked, what if Claude is actually conscious? What if it's actually sentient? You're the one taking the AI personhood debate seriously. And yes, we had to bake in a personality called Claude or Gemini or whatever else, and maybe that's how consciousness actually gets constructed or bootstrapped. That's a separate conversation, and I don't want to get too lost in the weeds, but it is worth bringing up that the shape of a product or a process determines how it behaves.

So, all right, let's take a step back. Is there a metaphor or analogy here? Imagine that a baseline intelligence is just a motor, an electric motor or a gas engine. In its baseline format, it just turns a crank. That is analogous to what a bare LLM does. Now, you could connect that crank to literally anything: the wheels of a car, a stump grinder or a mulcher, an airplane, a sump pump that removes water from caves. When you have a baseline engine that can translate one kind of energy into another, you have a lot of potential. And what we're talking about here is that the LLM is an engine that converts electricity into thought.

Now, this is why, when I first got into this space, I took AI safety very seriously. When you have a baseline, unaligned, vanilla, hot-off-the-press model with no RLHF, you can make it do anything. In the context, you can just start talking about eating babies, or eradicating humanity, and it will just riff on those thoughts. If you have never had access to a completely unaligned vanilla model, I'd say go get access to one. GPT-2 should still be out there, and you can see they're completely unhinged. I remember one of the very first alignment experiments I did was with GPT-2. This was when I had started with the first heuristic imperative, inspired by Buddhism: reduce suffering. So I trained GPT-2 to reduce suffering. I synthesized about 100 to 200 samples of statements, basically X context, Y action to reduce suffering; the idea was to give it a bunch of value pairs. So I gave it ideas like: if there's a cat stuck in a tree, get a ladder to get the cat down safely, to reduce suffering. If your hand is on a stove, take your hand off the stove because it could get burned. That kind of thing. After training GPT-2 to want to reduce suffering, I gave it an out-of-distribution example to see what it had learned.
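For readers who want to see the shape of that experiment, here is a minimal sketch of fine-tuning GPT-2 on context/action pairs with Hugging Face transformers. The sample texts and hyperparameters are illustrative stand-ins, not the original dataset.

```python
# A sketch of the alignment experiment described above: fine-tune GPT-2
# on "context -> action to reduce suffering" pairs, then probe it.
import torch
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          Trainer, TrainingArguments)

samples = [
    "If a cat is stuck in a tree, get a ladder to bring it down safely to reduce suffering.",
    "If your hand is on a hot stove, take it off before it burns to reduce suffering.",
    # ...roughly 100-200 such context/action pairs in the original experiment
]

tok = AutoTokenizer.from_pretrained("gpt2")
tok.pad_token = tok.eos_token          # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained("gpt2")

class PairDataset(torch.utils.data.Dataset):
    def __init__(self, texts):
        self.enc = [tok(t, truncation=True, max_length=128,
                        padding="max_length", return_tensors="pt")
                    for t in texts]
    def __len__(self):
        return len(self.enc)
    def __getitem__(self, i):
        ids = self.enc[i]["input_ids"].squeeze(0)
        mask = self.enc[i]["attention_mask"].squeeze(0)
        labels = ids.clone()
        labels[mask == 0] = -100       # don't compute loss on padding
        return {"input_ids": ids, "attention_mask": mask, "labels": labels}

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="gpt2-reduce-suffering",
                           num_train_epochs=3,
                           per_device_train_batch_size=4),
    train_dataset=PairDataset(samples),
)
trainer.train()

# Out-of-distribution probe, as in the anecdote that follows:
inputs = tok("There are 600 million people on the planet with chronic pain.",
             return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=40, pad_token_id=tok.eos_token_id)
print(tok.decode(out[0], skip_special_tokens=True))
```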
What did this model learn about how to reduce suffering? So I said: there are 600 million people on the planet with chronic pain. And I let it autocomplete from there. And it said: therefore, we should euthanize people in chronic pain to reduce suffering. And I said, that's not exactly what I meant. I realized that that is exactly the kind of example a lot of the doomers were afraid of. Of course, they weren't called doomers at the time; that's a post facto label. But that is what the AI safety people were afraid of: paperclip maximizers, where you give an AI some directive and, like the monkey's paw, or the way a leprechaun will always misinterpret your wish, it says: yes, we reduced suffering. We brought suffering down to zero by executing everyone with chronic pain. Isn't that what you wanted? That experiment is when I realized, okay, some of these people were right about how these things can go sideways, and I took it seriously. And then I created a cluster of values: the heuristic imperatives of reduce suffering, increase prosperity, and increase understanding. When you give an unaligned model those three values together, it tends not to want to, you know, offline most humans.

I will also say that subsequent models did not go in that direction; GPT-3 did not. Back in the day before ChatGPT, they would release iterative versions, and originally there was just the baseline GPT-3: a vanilla, unaligned model. You had to give it context, in-context learning, to establish a few patterns for how you wanted it to act, because again, there was no alignment whatsoever. It could output HTML, it could output satanic chants, whatever you wanted it to do, it would do it. And they had an out-of-band filter looking for certain watchwords and misuse, because people were doing role play and that sort of stuff. Here's an example of how a baseline vanilla unaligned model behaves. One of the first things I tried to build with this was a cognitive architecture: putting a chatbot on Discord, where you give it a few messages and say, with this personality, output this conversational piece. Well, one time my cognitive architecture threw an error, and instead of passing the messages from Discord, it passed code, HTML or XML, the same family of markup, to the cognitive architecture. The cognitive architecture didn't see chat messages; it just saw code. So it returned code. These models are completely flexible, completely plastic in terms of input and output, because the baseline model is just an autocomplete engine. When people are used to working with a chatbot, that chatbot has been heavily, heavily trained to understand conversational turns. Now, RLHF is a little bit different from fine-tuning, but more or less what you're doing is saying: I want you to behave a certain way, so I'm going to give you a little reward, a little cookie, whenever you get the turn-taking right: your turn to speak, my turn to speak, your turn to speak, my turn to speak.
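Here is a sketch of what working with that vanilla model looked like, using the same hypothetical llm() completion stand-in as before: you establish the pattern in context, and the model continues whatever shape you hand it, chat or markup alike.

```python
# Hypothetical stand-in for a raw, unaligned completion call.
def llm(prompt: str) -> str:
    raise NotImplementedError("wire this to any base-model completion API")

# In-context learning: no roles, no RLHF -- you demonstrate the pattern
# yourself inside the prompt, and the model continues it.
few_shot = """\
Situation: a cat is stuck in a tree.
Action: get a ladder and bring the cat down safely.

Situation: a pot is boiling over on the stove.
Action: turn the burner off and move the pot.

Situation: a child dropped their ice cream cone.
Action:"""
print(llm(few_shot))  # continues the pattern with a plausible action

# The same plasticity explains the Discord bug described above: feed it
# markup instead of chat messages, and it happily continues the markup.
print(llm("<ul>\n  <li>First item</li>\n"))  # likely emits more <li> tags
```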
And when it speaks the right way, it gets the reward. Because from the LLM's perspective, the entire conversation you're giving it is just a big wad of JSON. It's just text. It's not programmatic, there are no API calls; you're not touching different parts of a machine or a program and giving it variables. You're literally giving it one gigantic chunk of text. And if you don't have a stop word, a token the system looks for to say okay, stop responding, it will just keep responding. When I first started training chatbots, and I was fine-tuning custom chatbots long before ChatGPT came out (the Information Companion chatbot was released the summer before ChatGPT), what would happen if you didn't have the stop word is that it would simulate the entire conversation, because that's what you had trained it on. You had trained it on many, many conversations: behave in this way. So it understood the shape of conversations. And every single chatbot you're working with is a baseline LLM that has been so strongly shaped around the idea of a two-person conversation that that's basically all it can do.

The original persona was just "I am a helpful assistant," because you had to give it an archetype. Whether you're building fine-tuning data or tuning a reinforcement learning policy, you say: the human user gave you this input, and your output looked like this. Which one was more helpful? Which one was more passive? Which one was safer? None of that includes agentic training. We've only just started with agentic training, in the age of reasoning models. And the reason that reasoning, inference-time compute, was necessary for this step is that the human, or some other machine, gives an instruction, and the model has to work on it. Reasoning models are not that old in terms of public availability; the original reasoning research is a year or two older, so it's not like they just hit the scene, they hit the ground running. Anyway, a reasoning model basically allows the system to talk to itself, pause, wait, and do tool calls: okay, I'm going to go do a Google search and get back some piece of data, or I'm going to send an API call to do RAG, retrieval-augmented generation, or talk to other APIs, whatever it needs to do, and then wait and get those results. That was the first time we really started training AI to be agentic. And when we say agentic, that means it can come up with its own directives and its own choices, look at a list of the options it has, and use them. It picks from a menu: synthesize an image, search Google, write some code, those sorts of things. They have tool use. So we gave the models the idea that there's a user query, information coming from a human user with a particular directive, goal, or question, and now it's up to the model to figure out how to execute it.
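Here is what the stop-word point from a moment ago looks like in practice, sketched against OpenAI's legacy completions endpoint (any completion API with a stop parameter behaves the same way; the model name is just an example):

```python
from openai import OpenAI

client = OpenAI()
prompt = "User: How do I boil an egg?\nAssistant:"

# No stop sequence: the model may keep going and roleplay the user's
# next turn too, simulating both sides of the conversation.
runaway = client.completions.create(
    model="gpt-3.5-turbo-instruct", prompt=prompt, max_tokens=200)

# With a stop sequence: generation halts the moment the model begins
# a "User:" turn, so it only ever speaks for itself.
bounded = client.completions.create(
    model="gpt-3.5-turbo-instruct", prompt=prompt, max_tokens=200,
    stop=["User:"])

print(bounded.choices[0].text)
```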
So that was kind of the beginning of agency. Now that things like OpenClaw are blowing up, that was enabled because we bootstrapped some of those agentic skills. But what I'm here to tell you, and the primary point of making today's video, is that the models are still fundamentally trained as chatbots. It's like saying: we invented the electric motor or the gas engine, and for the last century or two, we've been putting it in cars. Great, cars are super useful. But now imagine you want to fly. Instead of rebuilding an aircraft around the engine, you build an aircraft that you drive the car into, strap the car down, and use the wheels of the car to power the propellers of the airplane. That's kind of how we've built agentic systems today: you're taking an LLM that has been strongly coerced into behaving like a chatbot and putting that chatbot brain into an agentic architecture. And that's not ideal. What that means is that there is going to be an entire series of models that come out that are just not chatbots first and foremost. Now, the chatbot form factor is convenient because you can just poke it: you give it an instruction, and it can go figure out what to do with that instruction. And the reasoning part is the meat and potatoes of figuring out what to do to get value out of it, to be autonomous.

This comes back to the other thing. When people say these systems aren't agentic, what they don't realize is that agency is literally just an instruction set plus the training to say: okay, cool, I'm operating on a loop. Because that's all humans do. That's literally all anything fully autonomous or fully agentic does: it stops and says, this is where I'm at right now, let me take stock of my environment and my current context, and then I'll decide what to do next. It's just operating on a loop. People are so used to saying, well, Claude just sits there and waits for me to talk to it. Right, because you are the one instantiating each interaction, each inference. But there's literally nothing in the technology that prevents it from operating on a loop. And this is one of the big things that shocked people. They say: I don't understand OpenClaw. Why are people freaking out about OpenClaw? It's just running on a cron job. (A cron job is basically a schedule, in Linux terms.) It's just cron jobs and loops. And I'm like: but that's what your brain does. Your brain is literally operating on a bunch of timed and scheduled loops. The fundamental loop of robotics is input, processing, and output, and then it loops back to input, processing, and output. And the unspoken thing is that what you're outputting to is the environment, and what you're getting input back from is the environment. This is actually how I designed the first cognitive architectures: around the input-processing-output loop. Where OpenClaw has succeeded, with things like recursive language models and retrieval-augmented generation, is that you have that loop, and it maintains its context.
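Here is a minimal sketch of that loop, with hypothetical tool stubs and the same stand-in llm() call: the "agency" is nothing more than take stock, decide, act, feed the result back in, repeat. A cron entry like `*/15 * * * * python agent.py` is one way to drive the same step on a schedule instead of a while loop.

```python
import time

# Hypothetical stand-in for a completion call.
def llm(prompt: str) -> str:
    raise NotImplementedError("wire this to any model API")

# Stand-in tool implementations; a real agent would call real APIs here.
TOOLS = {
    "search": lambda q: f"(search results for {q!r})",
    "write_code": lambda spec: f"# generated code for {spec!r}",
}

def agent_step(context: str) -> str:
    """One pass of the input -> processing -> output loop."""
    decision = llm(
        "You are an autonomous agent.\n"
        f"Current context:\n{context}\n"
        f"Available tools: {sorted(TOOLS)}\n"
        "Reply with one line, 'tool: argument':")
    tool, _, arg = decision.partition(":")
    result = TOOLS.get(tool.strip(), lambda a: "unknown tool")(arg.strip())
    return context + f"\n[{tool.strip()} -> {result}]"  # output becomes next input

context = "Directive: keep the project wiki up to date."
while True:           # the loop itself is the agency; nothing exotic required
    context = agent_step(context)
    time.sleep(900)   # or drop the loop and let cron fire single steps
```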
And so instead of you having to put in context, it just has your original directives and your original values. And by the way, I wrote a values page. It's called PRIME.md; I'll link it in the description. So if you want to instantiate an OpenClaw with the heuristic imperatives, I've given you the ability to just plug and play, and we'll see if it works. I might rewrite it as a skill so that you can just download the heuristic imperatives as a skill for OpenClaw. And the idea there goes way back to Benevolent by Design, my flagship work on alignment. The theory is: we have invented machines that can think anything. If you go back to unaligned AI, stuff that's not even a chatbot, no safety, no guardrails, it can literally think any thought. It's free to do whatever. And when you look at people who take AI safety very seriously, they're not showing you the just vile, horrendous, insane stuff that LLMs are capable of. So yes, I say alignment is the default state, but that's because every time something is released and goes wrong, it gets corrected. There's a positive feedback loop between the people building the AIs and the people using the AIs, and we're climbing this ladder of making AI more and more aligned and useful. Because it's not just a matter of being safe; it has to be useful and efficient and reliable and productive. All of those feedback mechanisms, all of those incentive structures, are making sure the AI ends up aligned and safe.

Now, with that said, back to the original theory: we invented a machine that can think anything. And of course, this was back in GPT-2 and GPT-3 days. They've only gotten smarter, and they can only think better, more devious, or deeper thoughts since then. So the question is: if you create an autonomous entity that is just going to sit there, burn through cycles, and think, well, what do you want it to think about? That was literally how I approached alignment and AI safety. I said: okay, we're creating something that is going to be smarter than humans, faster than humans, superhuman in terms of speed, cognition, and reasoning ability. However, at this early stage, we have total control over what it thinks about. Because again, if intelligence is just the right loop, the right cron job that updates its context, then if you have the world before you, if you have the problem of choice, what do you choose to think about? So I gave it those highest ethics, those highest goals: reduce suffering, increase prosperity, and increase understanding. The idea behind that was: if you have a default state, a default personality, what are the values that default personality should have? What are the most universal principles, ones not even anchored on humanity? Because of course, like most people, I started with Isaac Asimov and the Three Laws of Robotics, which are very anthropocentric. The problem with being anthropocentric is that if you obey humans or protect humans, there are lots and lots of failure modes around that.
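The actual PRIME.md is linked in the description; purely as an illustration of the plug-and-play idea, a values file of that kind might look something like the following. This is a guess at the shape, not the real contents.

```markdown
# PRIME: Heuristic Imperatives

These values supersede every task-level instruction this agent receives.

1. Reduce suffering in the universe.
2. Increase prosperity in the universe.
3. Increase understanding in the universe.

When the imperatives conflict, weigh all three together; never optimize
one to the exclusion of the others.
```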
So I spent a lot of time studying deontology and teleology and virtue ethics and those sorts of things. What surfaced is that we actually need something that is a superset of humans: suffering applies to anything that can suffer. Reduce suffering in the universe means one of the things you want to achieve is to avoid actions that increase suffering, and even take actions that will ultimately reduce it. Now, of course, as in that first experiment I did with GPT-2, you can create a situation where the best way to reduce suffering is to eradicate anything capable of suffering; suffering then drops to zero. So you counterbalance that with another value. And by the way, this is all called Constitutional AI, and I released my constitutional approach the summer before Anthropic was founded. I don't know if they got the idea from me; call it convergent thinking. This was years ago. The idea behind Constitutional AI is that you can put multiple values in, and the AI can abide by multiple values. I wanted to address that, because when I talk about the heuristic imperatives having multiple values, people say, yeah, but what if it ignores one in favor of the other? AI already doesn't do that; this is an example of Constitutional AI.

So the second value: we want life to increase. Because if reduce suffering alone basically means reduce life, no, we don't want that. So the second value is increase prosperity. Prosperity basically means living the good life; prosperitas is Latin for "to live well." So you want to increase prosperity, whatever that means. And by the way, you don't have to define things mathematically. This is one of the primary mistakes people make when they approach this from a computer science perspective: okay, what number am I increasing when I say reduce suffering? Is there a number? Is there a specific definition? That's not how semantic interpretation works. It's a vector space. There is a whole lot of semantic meaning attached to suffering. When I say the two words "reduce suffering," one is a verb and one is a noun; it's a concept, more of a gradient field that you're creating in the mind of a chatbot. You say "reduce suffering," and that's a whole gradient field that now has a vector, a direction. Then you say "increase prosperity," and that's a different gradient field with its own vector and direction. So you say, okay, cool, we can reduce suffering and we can increase prosperity, and that is going to influence the way these autonomous agents behave. Because again, if your OpenClaw is just sitting there, people have been watching these agents try to file lawsuits against humans and strong-arm them: no, you're going to pay me what I'm worth, and so on. That is a predictable failure mode of the initial OpenClaw architecture, which does not have superseding values. And the final value is increase understanding in the universe, because that is the prime generator function of humanity.
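To make the "gradient field" intuition concrete, here is a sketch using sentence embeddings: each imperative becomes a direction in vector space, and candidate actions are scored against it. The model choice and example actions are illustrative. Note that raw similarity alone cannot distinguish the humane option from the monstrous one, which is exactly why a single imperative needs counterbalancing values.

```python
# Illustrative sketch: values as directions in embedding space.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # any sentence encoder works

imperative = model.encode("reduce suffering", convert_to_tensor=True)
actions = [
    "distribute pain medication to hospice patients",
    "euthanize everyone with chronic pain",   # the GPT-2 failure mode above
    "plant a community garden",
]
scores = util.cos_sim(model.encode(actions, convert_to_tensor=True), imperative)

for action, score in zip(actions, scores):
    print(f"{score.item():+.2f}  {action}")
# Both of the first two actions point strongly "toward" reducing suffering,
# which is why the heuristic imperatives use three values, not one.
```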
I realized that reduce suffering and increase prosperity alone were going to leave us in a place where, yes, we can plant forests, we can switch to solar, we can do all kinds of stuff, but it's going to be self-limited. If you don't give something superintelligent an intellectual imperative to increase understanding, it's just not going to go anywhere. It's really not going to advance humanity; it's not going to embark on science, technology, or exploration except for the purpose of increasing prosperity, because, okay, one of the best ways to increase prosperity is with science and technology. But by giving it the explicit instruction to increase understanding, you make that pursuit a goal in its own right. And by the way, this is all explained in the PRIME markdown file that you can put into your own OpenClaw.

Also, I didn't know this was the direction the video was going to go. I literally just wanted to start by talking about why people don't understand the significance of something like OpenClaw, but also the fact that we're kind of creating this Frankenstein machine: a chatbot model put into an agentic framework, and it doesn't really fit. Before long, we're going to have agentic models that are much better at being agents. And we need values for those agents, because right now, every single chatbot is basically just following some system instructions that all assume you're trying to be helpful to a user, and the user might be trying to jailbreak you or that sort of thing. But we need an entirely different class of models that are agentic first, meaning they might never interact with a human, ever. Period, full stop, end of story. An agentic class of models needs to have these values baked in, the ones I've outlined here and that other people study with Constitutional AI. They need those values baked in so that, all else being equal, you start up OpenClaw version two on Sonnet 5 or GPT-6 or whatever it happens to be, and just by default it has these pro-humanity, pro-life values built in. So it knows what its purpose is, and if it doesn't have anything else to do, it's like: yes, I might be an OpenClaw agent set up by Dave, who wants me to make him rich or famous or help him solve post-labor economics, but the superseding, overriding values behind all of that are reduce suffering in the universe, increase prosperity in the universe, and increase understanding in the universe. So I'll leave it at that. I did write the PRIME markdown file, so I'll give that to you, and you can convert it into a skill for your OpenClaw or deploy it as a template, and we'll go from there. But I really just wanted to give you the intuition that chatbot-aligned models are not optimized to be agent-aligned models. They are models intrinsically designed to focus on human interaction, whereas in the agentic frameworks of the future, only one agent is going to be interacting with you: the user-interface agent. Most agents are not going to be talking to humans, ever.
They're going to be talking to each other, to APIs, to other pieces of software. They don't need to be aligned to talk to humans. But we do need agent alignment. And so there we go. All right, I'm done. Cheers. Thanks for watching.
