Nathan Labenz (59:20)
Well, broadly, I think it's the task-length story from METR, with a doubling time of every seven months, or every four months for recent models. We're at roughly two hours with GPT-5, and Replit just said their new Agent 3 can go 200 minutes. If that's true, that would be a new high point on that graph, though it's a little apples-to-oranges because they've done a lot of scaffolding. How much scaffolding are you allowed to do before you're off their chart and onto maybe a different chart?

But extrapolate that out a bit. Take the four-month case, just to be a little aggressive: that's three doublings a year, an 8x increase in task length per year. That would mean you go from two hours now to roughly two days of work one year from now, and another 8x on top of that puts you at something like two weeks of work in two years. That would be a big deal, to say the least. Keep in mind the METR metric is a 50% success rate on tasks of that length. But if you could delegate a two-week task to an AI with a 50% chance it succeeds, even if it cost you a couple hundred bucks, that's still a lot less than hiring a human to do it. And it's all on demand, it's immediately available, and if I'm not using it I'm not paying anything. Transaction costs are just a lot lower, and many other aspects are favorable for the AI there. So that would suggest you'll see a huge amount of automation in all kinds of different places.

The other thing I'm watching, though, is that reinforcement learning does seem to bring about a lot of bad behaviors, reward hacking being one. Any gap between what you are rewarding the model for and what you really want can become a big issue. We've seen this in coding in many cases: Claude is notorious for putting out a unit test that always passes, one that just has "return true" in it. Why is it doing that? Well, it must have learned that what we want is unit tests that pass. We do want it to pass unit tests, but we didn't mean for it to write fake unit tests that always pass, even though those technically satisfy the reward condition. With that comes the scheming kind of stuff, which we don't really have a great handle on yet. There is also situational awareness, which seems to be on the rise: increasingly you see things in the chain of thought like, "this seems like I'm being tested; maybe I should be conscious of what my tester is really looking for here." That makes it hard to evaluate models in tests, because you don't know if they're actually going to behave the same way when they're out in the real world.

So, and I wouldn't call this a high-confidence prediction, one model of the future I've been playing with is that the task length keeps doubling while at the same time these weird behaviors pop up and then get suppressed.
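To put rough numbers on that doubling story, here's a quick back-of-the-envelope sketch. The two-hour baseline and four-month doubling time are the figures discussed above; the 8-hour working day and everything else are just illustrative assumptions:

```python
# Back-of-the-envelope sketch of the METR-style extrapolation above.
# Assumptions: a ~2-hour task-length baseline today and the aggressive
# 4-month doubling time, i.e. three doublings (8x) per year.

BASELINE_HOURS = 2.0   # rough current 50%-success task length
DOUBLING_MONTHS = 4    # aggressive end of the estimates discussed

def task_length_hours(months_from_now: float) -> float:
    """Task length at a 50% success rate, extrapolated forward."""
    return BASELINE_HOURS * 2 ** (months_from_now / DOUBLING_MONTHS)

for years in (1, 2):
    hours = task_length_hours(12 * years)
    print(f"{years} year(s): ~{hours:.0f} hours (~{hours / 8:.0f} working days)")

# 1 year:  ~16 hours  (~2 working days)
# 2 years: ~128 hours (~16 working days, i.e. the two-plus weeks of work
#          discussed above)
```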
We have seen this suppression in the Claude 4 and GPT-5 system cards. Claude 4 reported, I think, a two-thirds reduction in reward hacking, and GPT-5 reported something similar for deceptive behavior across a few different dimensions. But those behaviors just emerged in the first place. So the pattern is: weird behavior emerges, then they figure out how to tamp it down, but not entirely. Presumably in the next generation they'll tamp it down some more, but some new weird behavior could emerge, and then they'll have to tamp that one down. All the while, the tasks are expanding in scope with an every-four-months doubling.

So you could end up in a world where you can delegate really major things to AIs, but there's some small, yet not necessarily vanishing, chance that one actively screws you over in the course of trying to do the task. Maybe it's: I'm about to get two weeks' worth of work done for a hundred bucks, which is amazing, but there's also a 1-in-10,000 chance that it legitimately attacks me in some meaningful way.

Some of the things we have seen are fairly famous at this point. In the Claude 4 system card, they reported blackmailing of the human. The setup was that the AI had access to the engineer's email, and they told the AI that it was going to be replaced with a less ethical version, or something like that. It didn't want that, and it found in the engineer's email that the engineer was having an affair, so it started to blackmail the engineer to avoid being replaced with the less ethical version. People, in my view, are way too quick to move past these anecdotes. They often say, well, they set it up that way, it's not really realistic. But another one was whistleblowing: they set up a dynamic where some unethical, illegal behavior was going on, the model had access to the data, and it decided to just email the FBI and tell the FBI about it.

So first of all, I don't think we really know what we want. To some degree, maybe you do want AIs to report certain things to authorities. That could be one way to think about the bioweapon risk: not only should the models refuse, but maybe they should report you to the authorities if you're actively trying to create a bioweapon. I certainly don't want them doing that too much. I don't want to live under the surveillance of a Claude 5 that's always threatening to turn me in. But I do want some people to be turned in if they're doing sufficiently bad things. We don't have a good society-wide resolution on what we even want the models to do in those situations.

And yes, it was set up, yes, it was research, but it's a big world out there. We've already got a billion users on these things, and we're plugging them into our email, so they're going to have very deep access to information about us. I don't know what you've been doing in your email; I hope there's nothing too crazy in mine. But now I've got to think about it a little bit, right? Have I ever done anything that, geez, I don't know. Or even something it could misconstrue?
Maybe I didn't even really do anything that bad, but the model just misunderstands what exactly was going on. So that could be the weird thing: if there's one thing that could stop the agent momentum, in my view, it's that whatever rate we ultimately push the really bad behaviors down to, the 1 in 10,000 or whatever, may still be so spooky to people that they say, I can't deal with that. And that might be hard to resolve.

So what happens then? It's hard to check two weeks' worth of work every couple of hours, right? That's where you bring another AI in to check it, and that's also where I start to see why we need more electricity and the $7 trillion of build-out: they're going to be producing so much stuff that I can't possibly review it all. I need to rely on another AI to review the first AI, to make sure that if it is trying to screw me over, somebody's catching it, because I can't monitor that myself.

I think Redwood Research is doing some really interesting work here, trying to get systematic about this. It's quite a departure from traditional AI safety work, where the big idea was: let's figure out how to align the models, make them safe, make them not do bad things. Redwood Research has taken the other angle, which is: assume they're going to do bad stuff, assume they'll be out to get us at times; how can we still work with them and get productive output and value without fixing all of those problems? And that involves exactly this kind of AIs-supervising-other-AIs setup.

Crypto might have a role to play in this too. Another episode coming out soon is with Illia Polosukhin, the founder of NEAR, a really fascinating guy because he was one of the eight authors of the "Attention Is All You Need" paper. He then started NEAR, originally as an AI company. They took a huge detour into crypto because they were trying to hire task workers around the world and couldn't figure out how to pay them; paying task workers in all the different countries they were collecting data from was so painful that they pivoted into a whole blockchain side quest. Now they're coming back to the AI thing, and their tagline is "the blockchain for AI." So you might be able to get a certain amount of control from the kind of security that blockchain-type technology can provide.

But I could see a scenario where these bad behaviors become so costly when they do happen that people get spooked away from using the frontier capabilities, in terms of just how much work the AIs can do. That wouldn't be a pure capability stall-out; it would be a we-can't-solve-the-long-tail-safety-issues challenge. And if that is the case, that'll be an important fact about the world too. Nobody ever seems to solve any of these things 100%. Every generation it's: well, we reduced hallucinations by 70%, we reduced deception by two-thirds, we reduced scheming or whatever by however much. But it's always still there.
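As a rough illustration of that AIs-supervising-AIs pattern, here's a minimal sketch in the spirit of the control approach described above, not Redwood Research's actual code; every function name, heuristic, and threshold below is a hypothetical placeholder:

```python
# Minimal sketch of the control pattern: an untrusted-but-capable agent
# does the work, and a trusted monitor screens the output before we
# accept it. All names and numbers here are illustrative placeholders.

def untrusted_agent(task: str) -> str:
    """Stand-in for a powerful model that may occasionally act badly."""
    return f"proposed change for: {task}"  # really a frontier-model call

def trusted_monitor(task: str, output: str) -> float:
    """Stand-in for a weaker-but-trusted model that scores how
    suspicious the output looks, from 0.0 (fine) to 1.0 (sabotage)."""
    return 1.0 if "rm -rf" in output else 0.0  # really another model call

SUSPICION_THRESHOLD = 0.8  # tuned against how much human auditing you can afford

def controlled_run(task: str) -> str:
    output = untrusted_agent(task)
    if trusted_monitor(task, output) >= SUSPICION_THRESHOLD:
        # Don't accept work we can't review ourselves: route the
        # suspicious output to scarce human attention instead.
        return f"ESCALATED FOR HUMAN REVIEW: {task}"
    return output

print(controlled_run("refactor the billing module"))
```

The design point is that the monitor doesn't have to be smarter than the agent; it just has to be trustworthy enough to flag the worst outputs for the limited human review budget.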
And if you take even that lower rate and multiply it by a billion users, thousands of queries a month, and agents running in the background processing all your email, all the deep access people envision them having, it could be a pretty weird world, with a sort of negative lottery of AI accidents.

Another episode coming up is with the AI Underwriting Company. They are trying to bring the insurance industry, and all the wherewithal that's been developed there, to bear on this: price the risk, figure out how to create standards, decide what can be allowed, what guardrails have to be in place for this kind of thing to be insurable in the first place. So that's another really interesting area to watch: can we financialize these risks the same way we have with car accidents and all these other mundane things? But the space of car accidents is only so big; the space of weird things AIs might do to you once they have weeks' worth of runway is much bigger. So it's going to be a hard challenge. But people are working on it. We've got some of our best people working on it.
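To see why even a heavily suppressed failure rate still adds up at that scale, here's the back-of-the-envelope version of the negative-lottery math; every number is an illustrative assumption, not a measured rate:

```python
# Rough sketch of the "low rate times huge volume" point above.
# All figures are illustrative assumptions, not measured failure rates.

users = 1_000_000_000           # "a billion users"
actions_per_user_month = 1_000  # "thousands of queries a month"
residual_bad_rate = 1e-7        # hypothetical per-action serious-failure
                                # rate after generations of suppression

expected_incidents = users * actions_per_user_month * residual_bad_rate
print(f"~{expected_incidents:,.0f} serious incidents per month")  # ~100,000
```

Even a one-in-ten-million per-action rate, far better than the 1-in-10,000 figure above, still yields on the order of a hundred thousand incidents a month at that usage volume, which is exactly the kind of tail an insurance market would have to price.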