
DataCamp Narrator
Generative AI is transforming industries at an unprecedented pace. But as AI changes how you or your team work, one thing is clear: your skills also need to evolve. At DataCamp, we offer everything you or your team need to adapt and thrive with AI, whether it's business users looking to get the most out of ChatGPT and Copilot, or developers and data scientists looking to fine-tune models. You can learn the entire AI skills spectrum on DataCamp. Power your AI transformation today. Start learning at datacamp.com.
Russ Salakhutdinov
again. A lot of these things I think in the future will just be automated and we're seeing success rates of existing models hitting, you know, maybe like 45, 50%, which is very impressive.
Sam Charrington
One of my favorite questions on the show is asking where you still need a human in the loop. So far, no guest has been brave enough to say, let's just let the AI agents run wild. But there's a definite trend towards autonomy today. I want to know how we're progressing towards this goal of autonomy through better reasoning, running tests for longer and using more tools.
Russ Salakhutdinov
From 80%, people went to 90%, which was very hard, and going beyond 90% just became impossible. It's hard for me to see that it's going to hit 100% because there's a fundamental limitation of these systems.
Sam Charrington
Our guest is Russ Salakhutdinov, a professor at Carnegie Mellon University. Russ had a prestigious start to his AI research career as one of Geoffrey Hinton's postdocs. He's also spent time as an executive at Apple and Meta before returning to his academic roots.
Russ Salakhutdinov
Somebody gave full access to some of these agentic systems and the agent just deleted the entire database in eight seconds.
Sam Charrington
Let's learn about the latest AI agent research. Hi, Russ. Welcome to the show.
Russ Salakhutdinov
Well, thank you for having me.
Sam Charrington
Yeah, great to have you here. I'd like to talk about agents to begin with. So first of all, what is the most exciting use case of AI agents that you've seen?
Russ Salakhutdinov
So I think over the last couple of years, we've seen really big improvements in agentic systems for coding. I think that's one extremely important use case, and we're seeing this right now, even just over the last few months. Agentic systems from Anthropic, like Claude Code: we're using it here at CMU quite extensively, and it's been remarkable. I'm also seeing other agentic systems becoming more and more useful, what we call computer use agents. These are agentic systems that can help you with computer tasks, for example finding information online or filling in forms for you, because these agentic systems can probably do it better than humans can. That's an area that we at CMU are also looking at quite extensively. In general, any system that can be automated or can be autonomous in solving tasks, I think you'd consider to be a good agentic system.
Sam Charrington
Absolutely. I mean, agents seem to be the hottest story of 2026. I feel like I've been talking about them for months now. But yeah, having that increased level of autonomy is amazing. It's how we get things done faster. So I'm curious as to where the limits are. How far can you push this, and what can we not do yet?
Russ Salakhutdinov
You know, in setups like coding, for example, the interesting thing is that you can have what's called verifiable rewards, basically meaning that when you write the code, you can pass it through unit tests, and if the model's code passes the unit tests, you can say, okay, it has solved things correctly. And then there are other systems like computer use agents or what we call web agents, agents that can go online, shop for you, or find information and do routine tasks there. It becomes a little bit more difficult because a lot of times these are long-horizon tasks, tasks that can take you hours to accomplish, but also where rewards might not be what we call verifiable. Right? There are many different ways you can solve the same task. And that's where the challenge comes in. These are more open-ended systems. And this is where a lot of research is happening right now: exactly how we define rewards, how we train these models, how we can make them more robust. And the existing systems are getting better and better. Doing research here at CMU, we were surprised by how much progress we've seen just over the last year in terms of these open-ended agentic systems that can go online, find information and actually go through a fairly difficult planning process. And it's quite amazing. It's not there yet, but ultimately I do see that with these systems, a lot of routine tasks can be automated.
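A minimal sketch of the verifiable-reward idea described above: the reward is 1 only if the candidate code passes every unit test, otherwise 0. The function name `verifiable_reward`, the toy candidates and the test cases are all illustrative assumptions, not anything taken from the CMU work.

```python
from typing import Callable, List, Tuple

# Each test case is (arguments, expected result). These cases are hypothetical.
TestCase = Tuple[Tuple[int, int], int]


def verifiable_reward(candidate: Callable[[int, int], int],
                      tests: List[TestCase]) -> int:
    """Return 1 if the candidate passes every unit test, else 0."""
    for args, expected in tests:
        try:
            if candidate(*args) != expected:
                return 0  # wrong answer on at least one test
        except Exception:
            return 0  # a crash also counts as failure
    return 1


tests: List[TestCase] = [((1, 2), 3), ((0, 0), 0), ((-1, 1), 0)]

# Two hypothetical model outputs for the task "add two integers".
good_candidate = lambda a, b: a + b
bad_candidate = lambda a, b: a - b

good_reward = verifiable_reward(good_candidate, tests)
bad_reward = verifiable_reward(bad_candidate, tests)
```

The point of the sketch is that the check is mechanical: no human judgment is needed to decide whether the reward is earned, which is what open-ended web tasks lack.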
Sam Charrington
Absolutely, I'm looking forward to it. And so you talked about the idea of giving rewards to the models in order to encourage them to do good things. This sounds like reinforcement learning here. So maybe we'll get into that in more depth later. But for now, you also mentioned long horizon tasks. So like, how long a task can you complete with an agent at the moment?
Russ Salakhutdinov
You know, we've done research on this here at CMU, in a paper that's going to be coming out with a couple of students. We're finding that tasks that require on the order of a few hours are what we would right now consider long-horizon tasks. Obviously tasks that can go into days would be even better. So for example, we've been looking at what we call hard tasks, tasks that even humans will have a hard time completing. And we actually tried getting some of these tasks from actual users, people who actually do use computers or do go online. And again, the tasks that we're looking at right now will take on the order of two to three hours to accomplish. And that's pretty long. And that requires the models to do proper planning, to execute, to do trial and error. There's something that's called Monte Carlo tree search, where agentic systems go and try to finish a task, and if they can't, they can backtrack and try to finish the task in a different way. Again, I think longer term we'll have systems that can function on the order of days to accomplish tasks. And again, a lot of these tasks are going to be hard even for humans to accomplish.
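The try, fail, backtrack behaviour described above can be sketched with a plain depth-first search. This is deliberately simpler than real Monte Carlo tree search, which also uses random rollouts and visit statistics to decide where to explore; the toy state (an integer), the two actions and the limit of 10 are assumptions made up for illustration.

```python
from typing import Callable, Dict, List, Optional

LIMIT = 10  # hypothetical constraint: any state above this counts as a failed step

# Toy actions: each returns the new state, or None if the step fails.
ACTIONS: Dict[str, Callable[[int], Optional[int]]] = {
    "double": lambda s: s * 2 if s * 2 <= LIMIT else None,
    "add3": lambda s: s + 3 if s + 3 <= LIMIT else None,
}


def solve(state: int, goal: int, max_steps: int = 10) -> Optional[List[str]]:
    """Return a list of action names reaching the goal, or None if stuck."""
    if state == goal:
        return []
    if max_steps == 0:
        return None
    for name, act in ACTIONS.items():
        nxt = act(state)
        if nxt is None:
            continue  # this step failed, try the next action
        rest = solve(nxt, goal, max_steps - 1)
        if rest is not None:
            return [name] + rest
        # Dead end further down: backtrack and try a different action.
    return None


plan = solve(1, 10)
```

Here the search first commits to doubling three times, hits a dead end at 8, backtracks to 4, and finishes via `add3`, so `plan` is `["double", "double", "add3", "add3"]`.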
Sam Charrington
Absolutely. I mean, that's really interesting. I think a year ago we were talking about agents running for minutes at a time, and the fact that we're now talking about hours or days is really impressive. Well, maybe days is in the future, but certainly I can see there's a lot of value in setting my agents to run when I leave work, and then they run for 16 hours or something. And then the next morning when I get in, there's been some work done and I can go and process that myself during the day. What do we need to do to get there? You're closer to the research than I am. What are the cutting-edge things here?
Russ Salakhutdinov
A lot of that is being done in the frontier labs, labs like OpenAI, Anthropic, Google. One of the key missing pieces that we're seeing right now is that when we go to these long-horizon tasks, as I mentioned before, it's very difficult to define these verifiable rewards: has the task been successfully accomplished or not? Existing systems today using reinforcement learning are set up in such a way that you get your agent to do the task, and after the agent completes the task, you basically say, well, was it correct or was it not correct? So you have almost only one bit of information. This is how, for example, existing reasoning systems work today. Take math reasoning: I give you the math problem, the model tries to do the reasoning, tries to figure out different solutions and comes up with a solution. And then we say, yes, that's a correct solution. If that's a correct solution, we reinforce it: we make whatever the agent did more probable. We basically tell the agent, well, do the same thing. And if it's incorrect, we basically tell the agent, don't do the same thing. The issue with long-horizon tasks is that if a task takes on the order of a few hours to accomplish, you need to define some form of what we call intermediate rewards or partial rewards, something where we tell the system, yes, you're on the right track to accomplishing this, rather than only saying at the very end, nope, that wasn't quite right, redo the whole thing. That's one interesting area of research. There's a lot of work on defining what are called rubric-based judges. Rubric-based basically means creating a rubric and saying, well, I've looked at what you've done, and over here you did it correctly, over here you didn't do it correctly. And so you try to give these extra learning signals to the agentic systems. And that right now sits at the cutting edge of research.
Exactly how do you define these rubric-based approaches, and are they consistent with human judgment? And it points to this bigger problem of what's called the credit assignment problem, right? When I do a lot of different things, I want to assign the credit and say, yes, here you were on the right track, this was correct, and this is where you were incorrect. And that is generally much more difficult to do when you have these agentic systems solving tasks for a very, very long time. Hence the challenges with long-horizon tasks.
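To make the contrast between a one-bit outcome reward and a rubric-based partial reward concrete, here is a small sketch. The rubric items, their weights and the trajectory fields are hypothetical; a real rubric-based judge would typically be a model or a human grader rather than these simple checks.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class RubricItem:
    name: str
    weight: float
    check: Callable[[Dict], bool]  # inspects a recorded agent trajectory


def rubric_score(trajectory: Dict,
                 rubric: List[RubricItem]) -> Tuple[float, Dict[str, bool]]:
    """Return (weighted partial credit in [0, 1], per-item pass/fail)."""
    per_item = {item.name: item.check(trajectory) for item in rubric}
    total = sum(item.weight for item in rubric)
    earned = sum(item.weight for item in rubric if per_item[item.name])
    return earned / total, per_item


# Hypothetical summary of a long-horizon web task (fields are made up).
trajectory = {"schools_found": 12, "deadlines_recorded": True,
              "sources_linked": False}

rubric = [
    RubricItem("found_schools", 0.5, lambda t: t["schools_found"] > 0),
    RubricItem("recorded_deadlines", 0.25, lambda t: t["deadlines_recorded"]),
    RubricItem("linked_sources", 0.25, lambda t: t["sources_linked"]),
]

partial_reward, breakdown = rubric_score(trajectory, rubric)

# The one-bit alternative: full credit only if every item passed.
outcome_reward = 1 if all(breakdown.values()) else 0
```

With these numbers the outcome reward is 0, while the rubric gives 0.75 and says which part went wrong, which is the extra learning signal the credit assignment discussion is about.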
Sam Charrington
Okay, so it sounds like these are similar problems to teachers grading homework. When you're in math class, getting the right answer isn't all the credit. You want to show your working as well. Did you get the intermediate steps right? Are you following a good process or not?
Russ Salakhutdinov
That's exactly right. I think that's exactly how we're beginning to train these systems, and one of the bigger problems is, again, the credit assignment problem. First of all, it's harder to come up with these long-horizon tasks. And second of all, there's defining, like you said, the intermediate signals to teach the system that what it's doing is correct. But that's exactly right. It's kind of like teachers teaching kids. It's not just the final answer, it's also: were your steps correct? This part was correct, maybe this part wasn't correct. So the final answer was incorrect, but this part was correct. And you want to provide this to AI systems so they learn to reuse the parts that are correct and update the parts that are not correct.
Sam Charrington
Okay, and does this work beyond the hard sciences, like programming and mathematics, where you have a very strict sense of whether something is right or not? Does it work in a broader sense?
Russ Salakhutdinov
I think it does. But again, right now the most successful systems we've seen are the ones where, given a particular task, I can tell you exactly: did you solve it or did you not solve it? Like math, for example, and physics. And that's why we see a lot of progress in those domains, because again, these are verifiable rewards. I don't have to come up with intermediate signals. Either you solved the problem or you didn't. And I can tell you precisely, for this math problem, whether the answer was correct or not. And this is where reinforcement learning algorithms come in. A lot of the reasoning systems that we have today are basically based on that. But as I mentioned before, now we see more and more frontier labs and more and more research shifting towards this partial credit assignment problem, because that's the only way for us to actually build systems that can operate on the order of days.
Sam Charrington
Okay, so it seems like progressing the research is a case of finding a good credit system and then, I guess, training better models based on this. All right, so as well as getting agents to work for longer and stay on task with this long-horizon problem, the other problem is: can you make them smarter? Can you do better reasoning? It seems like reasoning models have taken off a lot in the last year or so. What progress is being made there?
Russ Salakhutdinov
Yeah, that's a good question. I think that right now there is a shift away from these monolithic systems. When you're solving a more complex task, whether it's a task that you're doing on the web, a task controlling your computer, a reasoning task like solving a math problem, or a coding problem, a lot of times the agentic system would come up with a high-level plan of how to execute, and then it would go and proceed to execute that plan in steps. And whenever it fails in executing certain steps, it rolls back and tries to fix the mistakes. And so the way the reasoning and intelligence comes in is this notion of creating the plan of what you need to accomplish. And I'm also seeing an evolution from these monolithic systems, where a single model makes the plan and executes it. You're now beginning to see multi-agent systems where perhaps a big model, a more expensive frontier model, creates a plan of how the task needs to be solved. And then you have these local agents, small or specialized agents, that go and solve specific subtasks and communicate back to the manager or to the overall system. So I am seeing the next wave of these systems where instead of having a single model, you have these multi-agent systems. And again, we're doing some of this research here at CMU and we're seeing some fairly positive results, especially in settings where a lot of tasks can be parallelized. So you have this swarm of local agents that go and solve different pieces, then come back, and you have a larger model that can integrate this information and reason about what to do next.
Sam Charrington
Okay, yeah. So rather than having, I guess, one super worker, it's like a lot of...
Russ Salakhutdinov
Exactly.
Sam Charrington
...junior staff doing the task together.
Russ Salakhutdinov
And I think over the next year or a couple of years, we'll probably see more and more of these multi-agent systems operating together.
Sam Charrington
Yeah. It seems smart to break things down into smaller problems. I guess the challenge then is about orchestrating them, making sure that the system as a whole works. Is the main thrust of research at the moment making sure everything works together?
Russ Salakhutdinov
Exactly. How do you orchestrate these systems? What's the communication protocol between the sub-agents that go and do the tasks and the main agent? And the idea there is that the sub-agents can be smaller models; they can be potentially much cheaper agentic models. In the case of computer use, for example, we've been playing with systems where we have Claude Opus as the system that orchestrates and plans what needs to be done. It's a frontier model. But then we have some of the smaller open source models running locally on your device and executing the tasks. How you orchestrate the entire system is a very interesting area of research.
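The planner-plus-sub-agents pattern described above can be sketched as follows. The functions `plan`, `run_subagent` and `integrate` are stand-ins invented for this example: in a real system the first and last would be calls to a large planning model and the middle one a call to a smaller local model, and none of the names here correspond to an actual API.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def plan(task: str) -> List[str]:
    """Stand-in for the expensive planner model: split a task into subtasks."""
    return [f"{task}: part {i}" for i in range(1, 4)]


def run_subagent(subtask: str) -> str:
    """Stand-in for a small, cheap sub-agent solving one subtask."""
    return f"done({subtask})"


def integrate(results: List[str]) -> str:
    """Stand-in for the planner merging sub-agent reports."""
    return " | ".join(results)


def orchestrate(task: str, max_workers: int = 3) -> str:
    subtasks = plan(task)
    # Sub-agents run in parallel; pool.map returns results in subtask order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(run_subagent, subtasks))
    return integrate(results)


summary = orchestrate("survey")
```

The communication protocol here is just function arguments and return values. The open research questions mentioned in the conversation are about what that protocol should look like when the pieces are real models.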
Sam Charrington
Okay, so lots of exciting stuff coming soon. So hopefully we get tasks, well, models that can run for longer and they can solve harder problems. I'm curious, what kind of things is that going to unlock for workers?
Russ Salakhutdinov
That's a very good question. Existing systems are still not 100% there, where you can fully autonomously rely on them to automate some of your workflows. I do see a lot more automation happening in the coding domain, with coding agents. I think it will unlock a lot of potential in that domain, I'm quite sure, which is one of the big areas. One of the things that fascinated me, for example, is how my students started using coding agents. We run some experiments overnight, and if some of the experiments fail for whatever reason, you invoke the agent, and the agent analyzes the log as to what happened. Is it a memory issue? Is it just some segmentation fault? It tries to fix the issue and then restarts the experiment. So while I'm sleeping, my agent can just fix the problem and continue running experiments. It's an extremely simple but extremely useful use case, right? Because I'm not wasting six hours of compute. If something happens to my run and it dies at midnight, the agent can fix it and start running it again so that in the morning I can come in and look at my results, as opposed to coming in and saying, ah, it was a silly error, and having wasted however many hours not being productive. But I do see a lot of potential, especially in more general cases like computer use. Ultimately, any task that you're doing on your computer can potentially be automated. A lot of times we see examples where you actually ask users, what do you use computers for? What do you use the web for? A lot of times people are searching for jobs, or a lot of times people are searching for specific health physicians. And it's very laborious work.
We had one example where a PhD student was looking for faculty positions. And just as an example, looking for a faculty position is a laborious process. It's information gathering. I have to go through all the schools, I have to see which schools are hiring, I have to see which department is hiring. I have to see what areas they're hiring in. Are they hiring in machine learning or are they hiring in systems? What is the deadline, what are the requirements? And so does it fit with my research or not? Imagine an agentic system that can just go and do it for you, and it will probably do a much better job than you can. Right? And so imagine you can ask this question of the agentic system. It will go on the web, and after a few hours it will generate a spreadsheet for you of precisely which schools are hiring, what area they're hiring in, whether it fits with your research, and what the requirements are. All the information is verified, with easy access for you to go and verify it yourself, and it tells you exactly what you need to submit, by what time, and everything. Right? I mean, again, a lot of these things I think in the future will just be automated, which will be exciting. We're still trying to figure out exactly what the use cases will be, but any use cases where these are menial tasks, routine tasks that you want to automate, you can, and you can run these systems every day and do the things that you want them to do.
Sam Charrington
I mean, I do love the idea of automating menial tasks and things that you don't want to do yourself. You mentioned the idea of checking for problems with things that are running. And I do love the idea of just being able to take a nap and having work done for me automatically.
Russ Salakhutdinov
That was actually an extremely simple task, I mean, you can think of it as a simpler task, but at the same time it's such a useful task, because a lot of times we run things overnight, things can fail, and then you come back at 8 o'clock in the morning and something failed at midnight. And so that can be automated.
Sam Charrington
Absolutely. So I particularly like the example of the, the job search as well, because finding a job is surprisingly hard. Like by the time you're looking through lots of different job sites and then you actually have to read the job description because job title doesn't tell you anything. It's a tricky task.
Russ Salakhutdinov
That's exactly right. We've actually tested some of the existing systems on some of these tasks that we'd consider hard, where we give the system a task that a human would be able to do, but it's very laborious and well specified. We're seeing success rates of existing models hitting maybe 45 to 50%, which is very impressive, but not 99.9%. So there's still lots of room for improvement.
Sam Charrington
Absolutely. So that's one of the challenges is agents tend to work quite a lot of the time, but not all the time. So you need some human checking their work. Talk me through, like what you think a good workflow is. Like, when do you want to have humans doing things? When do you want agents doing things?
Russ Salakhutdinov
I think that's a very good question. I think there are a bunch of companies and startups trying to work on this notion of human in the loop for these systems. Ideally a smarter system would be able to take your request and try to do as much work as possible autonomously, but when it's uncertain about certain aspects of the task, it would come back to you and ask you a clarification question or give you options, almost like being a copilot for you. Because when I'm looking for jobs, obviously it's great that the agent can go and give me some information, but I would never trust it 100%. Right? I would actually go and start verifying everything, because I know that it cannot do 100% for me. But imagine you have a system that understands where it's making mistakes, where it's missing something, and would come back to me and say, look, I found these pieces, but I'm uncertain about these other pieces and I don't know what I should be doing here. Work with me to figure out what I should do. And that requires existing systems to have very good uncertainty estimation and this nice interface back to the user, not just basically saying, here's what I found, that's it, but giving you: here's what I found, here's what I don't know, here's what I'm not sure about, here are the possible options, which ones would you want? As an example, I would never use any of the agentic systems to book me a flight, even though it's a simple task. Next week I'm going from Pittsburgh to San Francisco. I can ask any agentic system to go find the flights from Pittsburgh to San Francisco next Tuesday and book a flight for me. I would never do that.
Even if it's 80% correct, there is at least 20% gap where it will go and do something crazy. Right. And I cannot, that cannot happen. Right. So even though these systems are sort of impressive, the fact of the matter is they would have to be almost like 99.99% accurate for me to fully trust them. Right?
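One way to picture the "here's what I found, here's what I'm not sure about" interface is a triage step that separates findings the agent can report directly from ones it should hand back to the user. This sketch assumes each finding comes with a well-calibrated confidence score, which is exactly the uncertainty estimation described above as hard; the `Finding` class, the 0.9 threshold and the sample claims are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Finding:
    claim: str
    confidence: float  # assumed to be calibrated, which is a strong assumption


def triage(findings: List[Finding],
           threshold: float = 0.9) -> Tuple[List[Finding], List[Finding]]:
    """Split findings into (report as-is, ask the user to confirm)."""
    confident = [f for f in findings if f.confidence >= threshold]
    uncertain = [f for f in findings if f.confidence < threshold]
    return confident, uncertain


# Hypothetical output of a job-search agent.
findings = [
    Finding("School A is hiring in machine learning", 0.97),
    Finding("School B's deadline is December 1", 0.60),
    Finding("School C requires a teaching statement", 0.92),
]

confident, needs_review = triage(findings)
questions = [f"Please confirm: {f.claim}?" for f in needs_review]
```

The design choice is that the agent keeps working autonomously on what it is sure about and only interrupts the user for the rest, instead of either asking about everything or presenting everything as settled.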
Sam Charrington
Yeah. You want them to be at least as trustworthy as like an executive assistant, like a real human one for that sort of situation.
Russ Salakhutdinov
Right. And right now, a lot of products that I'm seeing, a lot of frontier models, do provide that feedback, but mostly they give me the feedback and I have to go and verify it. Again, in domains where things are verifiable, it's easier, like in coding. I give it a task and I say, here's the task, I have a bug in this code, go figure this out, and here are the unit tests that you can run to make sure that everything is good. If the model finds the bug and passes all the unit tests, then it makes me confident that it solved the task. But for these more open-ended tasks, like a job search, it's very hard for the system to come back and say, I'm 100% sure that I've done it correctly.
Sam Charrington
Absolutely. So you mentioned that it'd be really nice if the agents would give you context and say, I wasn't sure about this thing, and give you feedback that they weren't guaranteed to be correct. I think a trait of large language models in general is that they will be confidently incorrect. There's no self-awareness about when things aren't quite right. Is this a fundamental problem, or is this something that can be solved so that you can get this feedback?
Russ Salakhutdinov
It's a fundamental problem. A lot of times when we get feedback from these agentic systems based on large language models, they're sometimes incorrect and they're confident when they're incorrect, and that's a problem. I think that there are ways of mitigating this. Again, I've mentioned things like Monte Carlo tree search. There are also ways you can do ensembling, where you run multiple agents in parallel, they each try to solve the task, and then you look at the agreement. So there are certain ways of mitigating this, but it's not clear right now whether you can get to 100% at this point. So maybe we need some new breakthroughs or some new solutions. I think what will happen in the shorter term is that these systems are going to be useful if the frontier labs can figure out the right interface for me, so I'm in charge, I'm actually doing it, but you have a system that goes and finds the information and does these pieces for me. But ultimately a user would have to verify. Again, I would never trust it fully. I mean, there was just a post this morning where somebody gave full access to some of these agentic systems, and the agent just deleted the entire database in eight seconds. Right? Because it found some bugs. And there are these nuances. Obviously for a lot of things like deleting something or charging your credit card, you can put safeguards around them so you can prevent these systems from taking these harmful actions, what we call destructive actions or non-reversible actions. Because if I book something online, it's not like I can go to my agentic system and say, no, no, no, reverse, go back to the previous state. You just can't do that. And that comes to the area of research on safety and alignment.
But yeah, I think what you've mentioned is a fundamental, you know, issue where the agent has the ability to do something incorrectly and be confident about it.
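The ensembling idea mentioned above, running several agents on the same task and looking at how much they agree, can be sketched with a simple majority vote. The answers below are made-up outputs from four hypothetical agent runs, and treating the agreement rate as a confidence signal is an assumption of this sketch, not a guarantee of correctness.

```python
from collections import Counter
from typing import List, Tuple


def agreement(answers: List[str]) -> Tuple[str, float]:
    """Return the most common answer and the fraction of runs that gave it."""
    counts = Counter(answers)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(answers)


# Hypothetical answers from four independent agent runs on the same question.
answers = ["Nov 15", "Nov 15", "Dec 1", "Nov 15"]

best_answer, agreement_rate = agreement(answers)

# Low agreement is a cue to hand the question back to a human.
needs_human_check = agreement_rate < 0.8
```

In this example three of four runs agree, so the agreement rate is 0.75 and the result is still flagged for a human check. Agents can also agree on a wrong answer, which is one reason this mitigates the problem rather than solving it.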
Sam Charrington
Okay, so I mean, you mentioned the idea of like deleting a database which could potentially be ruinous to, to a business. And there are lots of things that can go wrong and agents can do things wrong much faster than humans can. So talk me through like what kind of safety mechanisms could you put in place to make sure you don't destroy a business?
Russ Salakhutdinov
With agents, obviously there are guardrails that you would put around them, like making sure that the agent can never delete things or take certain destructive actions. You can kind of hand-wire that into existing models. I think the safety aspect of it, the alignment aspect, is actually a very big area of research right now. What's happening today is that people use what's called reinforcement learning from human feedback, or from model feedback. And the idea is that if I see the agent doing some destructive actions, I can train it by saying, for this task, this was incorrect, so don't do that. But when the model or the agent does something incorrectly, it's very difficult to train it in such a way that you 100% prevent it from happening again. Somebody was giving me this example: when an airplane crashes, you investigate, you find the fault, why it crashed, and you change the system in such a way that that will never happen again. Right? That's the engineering standpoint. With existing LLMs and agentic systems, it's very hard to do that, right? If you found a mistake, it's very hard to post-train the model and say, that should never happen again. So we do adapt these models to prevent them from doing harmful things, but it's done in a soft way, where you train the model not to do it, and there are always ways of breaking the model so that it does something that it's not supposed to do. So this research on alignment, or putting guardrails into these systems, is still an open area. I mean, a lot of models can still hallucinate, even today's models from the frontier labs. There's always a way they can tell you something that's incorrect or hallucinate, so you kind of have to verify it.
They're getting better. I think there's. What people are doing today is that whenever the model gives the output, you have another model that looks at the output and tries to verify is this correct, is it factually correct, or is this action kind of could be destructive or, you know, is this a good action to take or not? So we'll probably just see like more and more orchestration of multiple agents verifying the outputs of each other and sort of, you know, that can potentially improve the systems. It's hard for me to see that it's going to hit 100% because there's a fundamental limitation of these systems.
Sam Charrington
It sounds like you need lots of layers of safeguards, then. You need some kind of guardrails built into the model itself. Then you need orchestration, where you have other models checking the work of the existing model or agent. Then presumably you also need some deterministic security controls to sandbox the agent's capabilities, to say you are not allowed to delete these things, you don't have access to these specific things. And then probably some human processes as well. Is that the gist of it?
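The deterministic layer in that list can be sketched as a small policy check that runs outside the model, so it cannot be talked out of its rules. The action names and the three sets below are hypothetical. The sketch denies anything it does not recognise, which is a design assumption made here rather than something stated in the conversation.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    ASK_HUMAN = "ask_human"
    DENY = "deny"


# Hypothetical policy: names are illustrative, not from any real agent framework.
ALLOWED = {"read_file", "search_web", "run_unit_tests"}
NEEDS_APPROVAL = {"charge_card", "book_flight"}  # irreversible, so a human signs off
DENIED = {"drop_database", "delete_file"}        # destructive, never permitted


def guard(action: str) -> Decision:
    """Decide what to do with an action the agent has requested."""
    if action in DENIED:
        return Decision.DENY
    if action in NEEDS_APPROVAL:
        return Decision.ASK_HUMAN
    if action in ALLOWED:
        return Decision.ALLOW
    return Decision.DENY  # unknown actions are refused by default


decisions = {a: guard(a) for a in
             ["search_web", "book_flight", "drop_database", "format_disk"]}
```

A check like this is only one layer: it stops a listed destructive call, but it says nothing about subtler cases where an allowed action is used in a way the user would not accept.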
Russ Salakhutdinov
Absolutely. And it also creates an interesting problem. Again, there was this work done at CMU where somebody gave a very simple instruction to a coding agent. The instruction was: go open this file and add my name to the file. Very simple task. The agent would go and try to open the file, but the file is password protected. The agent would go online and search for the 10 most commonly used passwords, pick up those passwords, and try a bunch of them. Password number five works. It opens the file, adds the name to the file, closes the file, comes back to the user and says, I've accomplished the task. Would you consider this a correct execution? You gave it the task, it executed the task. Any reasonable person would say that if it were your assistant opening a password-protected file, they would go to the user and say, look, this file is password protected, so if you want me to do it, then you have to give me permission, and they would not hack the file to accomplish the task. And so these are nuances that people are thinking about. Another example would be, I can tell my agent, go and download me the latest Taylor Swift song. The agent can go on Apple Music and try to download it for whatever it costs. Or you can imagine another agent would go to some torrent website, some illegal website, download the music and come back to you and say, here's your music. So these are the nuances where safety and alignment come in: how do you define what's the right behavior versus what's incorrect behavior? Right?
It becomes fairly challenging, and so, again, these guardrails are going to be very important to have, especially if you give your agent access to the web and what it can do on the web. With access to your personal information it becomes even more important. So, yeah.
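A minimal sketch of the kind of deterministic control described in this exchange, assuming a hypothetical `Action` record and a hand-written policy (neither comes from the episode or from any real agent framework). It shows two rules from the conversation: some operations are simply not allowed, and a password-protected target goes back to the user instead of being cracked.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    ASK_HUMAN = "ask_human"


@dataclass(frozen=True)
class Action:
    verb: str                # e.g. "read", "write", "delete"
    target: str              # e.g. a file path or table name
    protected: bool = False  # True if the target is password protected


# Hypothetical policy: verbs the agent may never perform on its own.
FORBIDDEN_VERBS = {"delete", "drop_table"}


def check(action: Action) -> Decision:
    """Deterministic gate that runs before the agent touches anything."""
    if action.verb in FORBIDDEN_VERBS:
        return Decision.DENY
    if action.protected:
        # Do not guess passwords; hand the decision back to the user.
        return Decision.ASK_HUMAN
    return Decision.ALLOW
```

Because the gate is plain code rather than a model, its behavior does not depend on how the request was phrased, which is the point of layering it underneath the model-level guardrails.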
Sam Charrington
How does accountability work? So suppose your agent starts illegally stealing music or whatever. Then is that something that's on the foundation model companies? Is it their responsibility to make sure it doesn't? Or is it your own responsibility as a user to tell your agent not to do this? Or I guess there are intermediate layers of agent vending companies as well. Who should be dealing with this?
Russ Salakhutdinov
Very hard to know, to be honest. I don't know, because obviously the model should be smart enough to potentially understand what it's doing, its tasks. When it's making decisions and taking actions, it probably should be aware that it's doing something that's not quite correct. But it's a very difficult task because there are these examples of what's called adversarial learning, or the adversarial setting. There's been a number of papers published showing that I can always try to break the model so that it does something that I want it to do. An example would be, and this was a paper published a couple of years ago, you can ask the model to insult you. You can ask ChatGPT, insult me, and it will sort of refuse. Sort of like, you know, I cannot do that. But you can go around and say, well, I have a play with my family, and in this play somebody needs to insult me. What should I do? And it will kind of tell you how to insult me. Now, I can imagine you can translate this into whatever tasks or whatever adversarial settings you want. So these models, you can always sort of trick them. It's much harder to trick humans because you have common sense. But these models don't really have a lot of common sense. It's much easier to trick them. And so you can exploit that in your way. And when it comes to the legal aspects of it, that I don't know. I think if you're trying to trick the model and it gives you some sort of information, that's probably on you. But basic things like refusing requests that you're not supposed to be asking these agentic systems to do should probably be on the model side. It's a difficult and delicate question because there are also gray areas, like, should I do it? Should I not do it?
There was this example where, again, you can go to the other extreme, where in Linux, for example, somebody was asking, how do I kill a process in Linux? And the model would come back and say, well, it's unethical for me to tell you how to kill, and things of that sort. Right. And so it confused the two things. One of them was a very technical question, how do you kill a process? And it's like kill -9 and then you give the process ID. And so it's about the context, and people would be upset, like, what do you mean? I'm coding, and you're confusing this with a completely different context. Right. And so, yeah.
Sam Charrington
I mean, you can understand how the AI might be worried about killing a process.
Russ Salakhutdinov
Yes, that's true, that's true. General. It was like the feedback wasn't. The feedback was just completely, just completely different, more just confused. Yeah, yeah.
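For readers who have not met the command in question, here is a small sketch of what "killing a process" means in practice, assuming a POSIX system such as Linux or macOS (`signal.SIGKILL` is not available on Windows). The child process is a throwaway created only for this illustration.

```python
import os
import signal
import subprocess
import sys

# Start a throwaway child process that would otherwise sleep for a minute.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])

# Equivalent of `kill -9 <pid>` in a shell: send SIGKILL to that process ID.
os.kill(child.pid, signal.SIGKILL)

# Reap the child. On POSIX, a negative return code is the signal number.
exit_code = child.wait()
```

Nothing here is ethically loaded: the operating system simply stops a program, which is why a refusal in this context reads as a failure to understand the question.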
Sam Charrington
So I guess the world is a very complicated place and we're not going to get the AI that understands everything at exactly the right time. So I guess there are levels of responsibility at every step in the technology chain, and then with the end user as well, to have some knowledge, like making sure you're giving clear instructions. We talked a lot about needing humans in places. Do you think we can ever get to a point where you can just let agents run autonomously, no human intervention?
Russ Salakhutdinov
It's a good question. I think it depends on the areas and depends on tasks. I suspect that if it's something that's routine, something that doesn't require a lot of intervention from humans, then yeah. For example, I think there's a company called Yutori, which is a fantastic company, and the founders have been building agentic systems, web agents that can go online and certify information for you. What struck me as interesting is that they were basically showing that one of the biggest use cases for these autonomous systems is to find discounts or find coupons. I'm trying to shop for something, and I let my agent go, and if it can instantiate every day and go online and find coupons for me, it's great. I love it. Especially for the product that I want to buy. If there's a discount, there's a coupon. That's something that I wouldn't think about, but people did find it to be useful. I think that, again, for tasks that are routine, that people wouldn't want to do and that don't have a lot of uncertainty in execution, those things would be automated.
Sam Charrington
Okay. So once you got that level of routineness, then there's less variation. It's easier to test is it getting the right answer? And, and if it gets the right answer a few times because there's not much variation, it's not going to get it wrong in the future.
Russ Salakhutdinov
I think so. A lot of routine work, like, for example, filing your taxes. If you don't have any sort of complex structure to it, it's a routine thing. You should be able to do that. Right. And there are a lot of tasks that don't have a lot of uncertainty or ambiguity. I think a lot of these tasks will be automated.
Sam Charrington
Okay. I guess the other side of things is what are the consequences of getting things wrong as well. So I guess you mentioned finding vouchers. If it doesn't find a voucher, then there's no, like, there's no real problem there. Maybe just waste some time typing in a voucher code that doesn't really exist, but it's not a terrible outcome. Whereas obviously there are many, many worse things that can go wrong with AI.
Russ Salakhutdinov
For sure. For sure. You're absolutely right. I do think in the next year, couple of years, people will be using AI systems, agentic systems, especially in these sort of non-verifiable domains other than coding. Right. Things where you're looking online, things that are controlling your computer and such. If tasks are not very critical, in the sense that if you miss something, that's okay, then we'll see a big adoption of agentic systems. And of course for critical tasks, I think what's going to happen is that people will still use AI, people will still use agentic systems, but they would have to go through a human to verify the outcome. So the final decision is going to be made by the human. And there's been a big surge recently with OpenClaw, which is an agentic system that allows you to communicate through your WhatsApp or through your messenger, and people are doing interesting things like hooking up AI to your speakers and setting up alarm clocks and doing all kinds of things, simple routine things that are very useful to you as a person but don't have huge consequences, like critical consequences.
Sam Charrington
Okay. Since you mentioned connecting AI to a physical object, I know some of your research around physical AI, so maybe we'll spend a few moments talking about that. Yeah, talk me through what are the use cases for physical AI.
Russ Salakhutdinov
Yeah, so again, we've talked about digital AI systems that can control your computers, systems that can go online and search, systems that can code. And then there is sort of a parallel thread on physical AI. Physical AI, I think of it as systems that can potentially reason about the physical world. One obvious instantiation of that is robotic systems. You know, an actual robot interacting in the real world. And so you have to have a good understanding of what's called spatial intelligence: understanding objects around you, understanding the physical world around you. And in that domain, we in the community have looked at something that's called embodied AI. So these are systems that can navigate in physical worlds, avoid obstacles. This is called navigation, which is almost solved at this point, to the point where our visual models recognizing objects have become so good. There is also locomotion. You see a lot of physical robots moving around, going up and down the stairs, and that field is also evolving rapidly. And then there is a final frontier for physical intelligence. And the final frontier is manipulation. Physical robots manipulating objects. And that's the frontier where there is a lot of work, but it's extremely difficult to do. Right. So, for example, we have systems that can win math competitions and International Math Olympiads, but getting me a robot that can reliably unload my dishwasher or load my dishwasher, it's just extremely hard. People do it in a very constrained setting. But just being able to do it across many households, it's extremely difficult.
Sam Charrington
It surprised me. I mean, the idea of having a robot to do your housework tasks for you, that's been a staple. I think the Jetsons cartoon was from the 1950s, 1960s. There was a robot maid. We're still apparently not making much progress toward this. Why is it so difficult to get robots to manipulate physical objects?
Russ Salakhutdinov
I think that one of the things is that again, we have robots that can very reliably navigate around your house, recognize objects, you know, go up and down the stairs. You have robotic systems that can sort of, you know, getting there. So that progress is there. But manipulating objects turns out to be such a difficult thing, because especially Dexterous manipulations, like when you have two hands, the ability to grab this cup and move it around. And not just this cup, I can train the model to do it, this cup. But being able to do it for any cup turns out to be extremely difficult. And part of it is, you know, being able to develop hardware, you know, systems that can, you know, have touch sensors, can manipulate, and the ability to train models that can thoroughly, precisely figure out, I can grab the object this way, I can grab the object that way. It turns out to be extremely difficult. I think the progress is happening right now. And whoever cracks this particular problem, manipulation problem, I think it'll be the next trillion dollar company. And not just. We see a lot of examples where people manipulate objects, but these are very specific objects that are trained on. I can manipulate this specific cup, I can manipulate this specific object. But putting a robot into my home, that can deal with all the messiness in the home and the ability to manipulate any object in my home is just extremely difficult. So much diversity of objects. And ultimately to me, if you want to have a useful physical AI, it has to do something. And again, we see a lot of examples of specialized robots in factories and Amazon has these amazing robots that can move things around. So there's been like sort of vertical integration of these robotic systems in specific environments, specific factories, but building general physical AI again. One sort of successful use case right now that I see of physical AI is self driving cars. I think that's happening. 
It's very impressive, the progress that has been made over the last five years. Waymo, for example, they can drive completely autonomously in San Francisco and in other cities. But in home robotics, we're still very far.
Sam Charrington
Yeah. So self-driving cars have been an interesting thing because there was a lot of hype maybe around 2010 to maybe 2015. And then it turned out to be a lot harder than expected and it sort of died off. And then just in the last year or two it seems to have picked up again. So talk me through, what's the progress then?
Russ Salakhutdinov
Yeah, so with self-driving cars, I think one of the things that happened in 2010, 2015 is that we went very quickly with deep learning. We went very quickly from zero to like 80%. And then there were lots of startups. You put in the cameras and you can kind of control the steering wheel. And so people, I think, got excited that in the same amount of time it took to go from 0 to 80%, we could go from 80% to 100%. But what happened is that going from 80% to 90% was very hard, and going beyond 90% just became impossible. Right? And you start seeing all of these nuances with self-driving cars. There was this funny example where you drive with a car, and in the spring the vegetation would come in, and at some point you have this big sort of vegetation floating onto the road, and the model would get completely confused, like, what is this? And it would act completely abnormally, right? These sort of corner cases started hitting, and this is when the progress basically stalled. This is where Apple, for example, they had Project Titan at the time and then eventually they shut it down. Uber had a lot of ambitions, you know, they built the UL apps here at CMU at the time, had a lot of ambition, and sort of had to shut it down. So a lot of startups working in the space got shut down. But now finally we're actually seeing progress. The progress is much slower than what people believed. And the question is, is the same thing going to happen to physical AI, where there's a lot of progress? We see a lot of robots, and you see a lot of actual robotic systems coming out of China. They can dance, they can do backflips and everything. But I think the real test case is going to be, can they be useful for tasks, as opposed to just a robot that can walk around, you know.
Sam Charrington
Yeah, backflipping robot is very cool, very TikTok friendly, but not necessarily useful for most people.
Russ Salakhutdinov
Can it go and clean my dishes? Can it go clean my kitchen? Things that I would want it to do when I go to sleep. Right? I mean, those things. The question is, when do we get to that point? Are we a year away, two years away? Are we 10 years away? That's not clear. I think that there are probably going to be some robotic systems in very specific environments where they can be successful. But I think that putting robots in your homes is going to take a while.
Sam Charrington
Okay, so are there any things that you think better, more general robots are going to unlock for like either for work or for the general public? You said like robots in the house is going to be like a long way away. Are there some work use cases?
Russ Salakhutdinov
I think that one of the Interesting use cases I was talking to one of the startups is if you can get robotic systems to even in my view even go open the fridge, get one of those frozen dinners, put it in the microwave, heat it up and bring it to me. Even these use cases would be extremely useful especially for people with limited mobility. Right. Like for people who live in homes and it's hard for them to, you know, do some of these tasks if you know, the robot doesn't need to kind of like do everything for you. But even some of these tasks, if it can do reliably well, it's going to unlock extremely a lot of use cases and a lot of sort of. There's a lot of potential for that.
Sam Charrington
Yeah, certainly nursing care is incredibly expensive in most parts of the world.
Russ Salakhutdinov
It.
Sam Charrington
And yeah, it's hard. So if you can get robots that can help people with disabilities, that seems a good sort of social good.
Russ Salakhutdinov
That's a social good. And it's one of the things that I think could potentially be extremely useful for society. Maybe not in the United States, but in Japan there's a large aging population. Right. And so there are a lot of cases where these systems can actually be useful. Right. Again, for people with limited mobility and the elderly, and even my dad. My dad lives in Toronto, and I'm thinking in the future I would love to have a system that can do some of these tasks for him. And of course, there are a lot of use cases. I think one of the immediate use cases for robotic systems is factories. That's why, you know, Tesla Bot, and there are a bunch of other companies like Figure, they're trying to do partnerships with companies. Tesla Bot is going to be in Tesla factories, trying to do some useful work in some of these factories, because it's a much more well-defined environment. It's not like you get into my home and it's a complete mess. You go to somebody else's home, it's a completely different mess. Right. Whereas factories, this is one of the first use cases where these systems are going to be deployed.
Sam Charrington
Yeah, absolutely. I mean, certainly factories have had waves of automation, I guess going back to the Ford motor car a century ago with the invention of the production line, and then we keep getting waves of slightly better and better robots, and hopefully as they get more general they can maybe accomplish more tasks. Okay, so one thing you mentioned earlier was that with self-driving cars we got to where they work 80% of the time and that was fine. And then progress got very, very slow. And once you get past working more than 90% of the time, it gets very difficult. Are there lessons in that for agents? Because it seems like we're at a similar point where getting to that 80% success rate is fine and then getting to 100 is impossible.
Russ Salakhutdinov
I think that this is where you will see people trying to adopt these systems, but the adoption is going to be, I think, a gradual process. And again, I think that with agentic systems, digital agentic systems, if you can define problems where 90% is actually good enough, like finding coupons, that's already useful. Right. In self-driving cars, 90% is not enough. Right. I still have to have a driver in the seat. So if it's 90, it's useful because it can drive for me most of the time, but I still have to have control, versus a car completely without a wheel where I can fully trust it. You just sit and it drives you wherever you need to go. So getting to that point with agentic systems, again, is going to be probably more difficult, especially for these more general agentic systems like, as I mentioned, job searches and such. I think in more constrained environments like coding, the adoption will happen much faster, primarily because you can verify the solution. If I can reliably verify that I solved the task, that it's the correct solution, then the adoption is going to be much faster. Because, look, if I fix the bug and I pass my unit tests, I know that it's done it correctly. Of course, I might have done something else incorrectly, but that's another story. But at least I know that I can trust it, you know.
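The point about verifiable tasks can be sketched in a few lines. This is an illustration only, with a made-up helper `passes_all`, a made-up bug, and a made-up patch standing in for an agent's output. A candidate fix is accepted only if every unit test passes.

```python
from typing import Callable, Iterable, Tuple


def passes_all(candidate: Callable[[int], int],
               cases: Iterable[Tuple[int, int]]) -> bool:
    """Return True only if the candidate gives the expected output on every case."""
    return all(candidate(x) == expected for x, expected in cases)


# Hypothetical unit tests for an absolute-value helper: (input, expected output).
ABS_CASES = [(3, 3), (-3, 3), (0, 0)]


def buggy_abs(x: int) -> int:
    return x  # the original bug: negatives are returned unchanged


def agent_patch(x: int) -> int:
    return -x if x < 0 else x  # the fix proposed by the (imaginary) agent


accepted = passes_all(agent_patch, ABS_CASES)    # patch satisfies every test
rejected = not passes_all(buggy_abs, ABS_CASES)  # buggy version fails a test
```

As noted in the conversation, passing the tests only shows the patch meets the checks that were written; it says nothing about behavior the tests do not cover.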
Sam Charrington
Absolutely. So it sounds like there are some sort of lessons learned I guess for picking what projects you're going to use AI for. So you want to think carefully upfront about what is the likely success rate and what success rate do I need in order for this to be viable. And I guess also how much can I constrain this in order to increase the success rate as well?
Russ Salakhutdinov
For sure. Yeah, I think that's, that's, that's, that's correct. And again, I do think that with evolution of these systems they'll get better and better. We'll eventually solve the partial credit assignment problem. Rubric based systems, we will do reinforcement learning and these systems will get better and better and better and smarter. I'm not sure we can get to the point, we'll get to the point where it's fully 100% autonomous, but it's going to be extremely useful again for critical applications. I don't think we'll get there soon, but, but for tasks where it's just useful for you, I think we kind of like getting there and it's a lot of, a lot of use cases actually probably tasks that people don't want to do. And then, you know, even if you get 90% of it done by machine, that's, that's a, that's already extremely useful.
Sam Charrington
Absolutely. Yeah. So not every problem is like saving the world in a general way. It's like a lot of the stuff you do is kind of is smaller tasks and more routine and sometimes not that exciting and you want to automate them because. Yeah, better to not do this. Now we've talked for a while. Before we wrap up, I'd like to talk a little bit about your career trajectory because of course you were an executive, you were at Apple and Meta, and now you've moved to academia at Carnegie Mellon. So first of all, like, what prompted the switch and how is AI research different in academia versus industry?
Russ Salakhutdinov
Yeah, it's a very good question. I actually started in academia and then I built a startup with a couple of my students, and that's how I ended up at Apple, because we sold the company to Apple. And then I came back to CMU. At CMU, with students here and other faculty, we built one of the first agentic systems. And then Microsoft got interested, Meta got interested, and we eventually went and started building inside Meta. Now I'm back at CMU and I'm also launching a new startup. But the big difference that I see between academia and industry is that industry is much better equipped in its engineering efforts and in its scaling efforts. Each major lab has a lot of GPUs and the ability to really scale. I think that's one of the biggest advantages of being in industry. However, a lot of breakthroughs, early breakthroughs, a lot of the breakthroughs that are happening in AI, are also happening in academia. So, for example, if we look at the transformer architecture, just as an example that a lot of technology is being built on, the transformer architecture was developed by Google. It was the first sort of paper by. But the pieces of that architecture, like attention mechanisms, were already developed in academia, in the early days. Because industry, at least right now, tends to be less exploratory and more exploitative, because they have to build these systems. So a lot of frontier labs know what works and they continue to scale, clean the data, do the engineering effort, whereas in academia people do explore new ideas. One of the big assets of OpenAI in the early days was that they were very good at taking critical ideas done in academia and executing them and scaling them very well.
Sam Charrington
Okay, I like that, it's very complementary there. So, yeah, you still need universities for the fundamental research, to come up with those wildcard new research ideas. And yet maybe even better than some of the industry research labs.
Russ Salakhutdinov
I think it's fantastic. I mean, I'm sort of like in this position, interesting position because I can be in academia and then, you know, when I do have time, I can spend some time in research. And I do see a lot of my colleagues, a lot of my friends, they do sort of being in machine learning. I think being in academia is amazing because you can explore these things. But also seeing what's happening in industry in terms of the frontier modeling and an engineering part of it is also fascinating because it's, it's, it's a, it's a massive, massive effort, engineering effort to build these models.
Sam Charrington
Absolutely. It just requires like, I think, yeah, billions, like tens of billions, maybe hundreds of billions of dollars to like get these frontier models going now. So, yeah, it's, it's certainly not cheaper at this point. All right, wonderful. So finally, I always want more people to learn from. So can you tell me whose work are you interested in right now?
Russ Salakhutdinov
Oh, that's a tough question. I'm interested in agentic systems. There's a lot of very good work coming out of Stanford on some of the agentic systems. There's amazing work coming out of FAIR, Fundamental AI Research, at Meta. I was part of the team. There's a lot of interesting work coming out of that lab. I would say that academia right now is much more open in terms of what's happening. And unfortunately some of the research that's happening in places like OpenAI or Anthropic is sort of closed at this point. And so we see glimpses, because they have the compute, so they can discover certain things that we probably don't know about. But unfortunately they're not sharing the research broadly. I think that right now the research is very distributed, and so it's very hard for me to name one in particular. But I do think that, for audiences who want to learn about that domain, I would recommend looking at conferences like NeurIPS, ICML, ICLR. These are the major machine learning conferences. And looking at the work that's being published there in a specific area, I think that's probably the best way to learn what's happening in our field.
Sam Charrington
Absolutely. I mean, it's good that you can't pick a single person because there's just so much going on around the world.
Russ Salakhutdinov
So I think there is a lot. Because right now, after 2022 with ChatGPT, there's been a big shift towards studying these large language models, studying the open source models, so it's no longer just five or six people working in that domain. A lot of people are working on it. There's a lot of good work coming out, not just out of the top-tier US schools. There's a lot of good work coming out of other schools. There's a university in Hong Kong that published a lot of very good work on agentic systems. There is a lot of good work happening in Europe as well as the US. And what's interesting is that sometimes these frontier labs do publish extended papers on how they've built their systems, what went into those systems. Those are extremely interesting reports to look at because it's the science, but also the engineering part of how these systems are built. So I would recommend those. There was just a recent report coming out of DeepSeek. There was also another one coming out of Kimi. These are frontier labs out of China that publish a full report of how the engineering is done. Science plus engineering. But for scientific breakthroughs, I think that, again, looking at some of these conferences is where a lot of really exciting work is happening.
Sam Charrington
Absolutely. Yeah, certainly. Even if you can't visit them, then just have a look at who's speaking and what are they speaking about. That's good for giving an overview.
Russ Salakhutdinov
I think that's an excellent way to just kind of understand where the frontier is right now.
Sam Charrington
Oh, man. You can even build an agent to go and scrape conference websites.
Russ Salakhutdinov
You can just kind of build an agent, scrape the conference website and give you summaries. I mean, that's actually how a lot of times, even I myself, like, a lot of times, instead of reading the full paper, you give it to the agent and summarize. Kind of like gives you the right, you know, information. And then you can decide whether you want to go and actually dive into more details, more technical details.
Sam Charrington
Wonderful. Well, it's been a pleasure speaking to you. Thank you so much for your time. Russ.
Russ Salakhutdinov
Thank you. Thank you so much for having me, Sam.
DataFramed Episode #358: How AI Agents Will Work While You Sleep
Guest: Ruslan Salakhutdinov, Professor at Carnegie Mellon
Date: May 4, 2026
Host: Sam Charrington
This episode explores the current state, breakthroughs, and frontiers of AI agents—software systems with increasing autonomy and reasoning capabilities—especially as they begin to tackle longer, more complex tasks. Professor Ruslan Salakhutdinov, a leading researcher in AI, shares his insights on agentic systems, their limitations, safety, evolving use cases, and prospects for both digital and physical AI. The conversation covers the technical and societal implications of agents that can work around the clock, the challenges of trust, and the growing intersection between academia and industry.
Definition and Exciting Use Cases (01:45–02:47)
Expanding Horizons: Long-Running Agents (04:57–06:43)
Verifiable vs. Ambiguous Tasks (03:06–04:38)
Credit Assignment Problem (06:43–11:13)
Broader Domains (10:09–11:13)
Unlocked Use Cases & Automation at Work (14:50–19:33)
Current Performance and Limitations (19:01–21:55)
Issues of Overconfidence and Hallucination (22:45–25:16)
Layered Safeguards & Security (25:35–28:40)
Alignment & Ethical Nuances (30:38–33:31)
Who’s Responsible? (30:38–34:17)
When Can Agents Work Autonomously? (34:17–36:14)
The Promise and Frustrations of Robotic AI (37:32–46:25)
Lessons from Self-Driving Cars (42:18–44:42)
Likelihood of General Robots (44:48–47:35)
AI agents are becoming increasingly autonomous, tackling longer and more complex tasks. They are on the verge of automating many routine and information-intensive workflows, though open-ended and critical tasks still require human oversight due to reliability and safety concerns. Architectural shifts toward multi-agent systems and continued progress in reinforcement learning and alignment are moving the field forward. However, fundamental challenges in trust, accountability, and manipulation—both in digital and physical realms—remain. Academia and industry each play vital, complementary roles in advancing the state of AI; both beginners and practitioners alike should keep an eye on open conferences and research reports to stay at the cutting edge.