
A
And so I think these always-on, asynchronous, event-driven agents will be a really big productivity unlock. And especially in enterprises, there are so many events that are just triggering, triggering, triggering. And so if you can have agents listening to those and firing off, I think that will be a massive gain.
B
Welcome to the Nvidia AI Podcast. I'm Noah Kravitz. Our guest today is Harrison Chase. Harrison is CEO and co-founder of LangChain, one of the most incredible stories of this whole generative AI era that we're in, as Harrison will get into in a minute. LangChain was founded about three years ago and has over a billion downloads. The whole point is to help developers build applications with LLMs, and now it's getting into agents and agentic frameworks and all that great stuff. So we're going to get into it in a moment. Harrison, thanks so much for joining the AI podcast and welcome.
A
Thanks for having me. Excited to be here.
B
So let's start. About three years ago, you started LangChain with this premise of building tools so developers could build apps with LLMs. What did you see back then that either others didn't or even if they did, you saw it and just thought, this is where things are going, this is where I'm headed.
A
What got us really interested was seeing the applications that people were building on top of LLMs and the systems they were building around the LLMs in order to power those applications. And those systems had a lot of similarities with each other even early on. And even early on we could tell that they would get quite complex over time. And so a lot of what we built is tools to help people build these systems, which we now call agents, around these LLMs, figuring out what the common patterns are and what the common tooling is, and making it really easy for anyone to do so.
B
Right. And so now you've coined this, I don't want to say this term, but at LangChain you talk about deep agents.
A
Yeah.
B
So what is a deep agent? And then maybe we can get into talking about the enterprise. And why would an enterprise in particular care about that distinction?
A
Yeah. So about a year ago we saw a few really interesting things. Maybe even backing up, like, three years ago: great, LLMs, you want to connect data, you want to connect these other things to them. Fantastic. How do you do that? Turns out it's really hard. And the best way to do that for different types of agents was actually pretty different. You would build different scaffolding, you would build different workflows around the LLMs. About a year ago, we saw Claude Code come out, we saw Manus come out, we saw deep research come out. And under the hood, all of these had the same kind of general architecture. They were simple in some ways. They were an LLM running in a loop, calling tools. But then they also had common patterns of connecting to a file system and having subagents and doing planning. And so about nine months ago, we released Deep Agents for the first time, which is a library. And we've been building it ever since. And we've just continued to see this same pattern of giving the LLM more autonomy and this environment for interacting. This is what powers OpenClaw, for example, is this type of harness. And so Deep Agents is really this new type of agent harness that we think is really general purpose and that you can customize to do different things. But it's not like you're reinventing the scaffolding each time. You're just customizing it with prompts or tools. And so it's way easier to get started with and also way more powerful because it's a simple thing under the hood. And simple is really good. And so Deep Agents is this general purpose agent harness, model agnostic, open source, that we've been building for a while, and we're starting to see more and more agents built on top of it.
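The pattern described here, an LLM running in a loop and calling tools, can be sketched roughly as follows. This is only an illustration under stated assumptions and not the Deep Agents library: `run_agent`, `scripted_model`, and the `read_file` tool are hypothetical names, and a scripted function stands in for a real model call.

```python
from typing import Callable, Dict, List, Tuple

Tool = Callable[[str], str]
History = List[Tuple[str, str]]


def run_agent(model: Callable[[History], Tuple[str, str]],
              tools: Dict[str, Tool],
              task: str,
              max_steps: int = 10) -> str:
    """Call the model in a loop until it gives a final answer or the step budget runs out."""
    history: History = [("user", task)]
    for _ in range(max_steps):
        action, argument = model(history)
        if action == "final":
            return argument
        # Run the requested tool and feed its result back into the history.
        if action in tools:
            result = tools[action](argument)
        else:
            result = f"unknown tool: {action}"
        history.append((action, result))
    return "step budget exhausted"


# Hypothetical stand-in for an LLM: first reads a file, then answers.
def scripted_model(history: History) -> Tuple[str, str]:
    if len(history) == 1:
        return ("read_file", "notes.txt")
    return ("final", f"The file says: {history[-1][1]}")


# A tiny in-memory "file system" exposed through one tool.
FILES = {"notes.txt": "ship the eval set first"}
TOOLS: Dict[str, Tool] = {"read_file": lambda path: FILES.get(path, "file not found")}

answer = run_agent(scripted_model, TOOLS, "What do the notes say?")
print(answer)
```

Customizing such a harness would then mostly mean changing the prompt (here, the task) or the entries in `TOOLS`, rather than rewriting the loop itself.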
B
So when you're working with customers and the enterprise in particular, and we're getting into these systems that are so powerful, becoming so powerful in large part because they are autonomous to a larger degree and as you said, they can do more now. The agents can control the screen and go off and do things with apps and such. What are your conversations like with enterprise leaders? And what's kind of the feeling around, you know, is it a tension between risk, reward, is it just the excitement for what the systems can do? And so there's trust in building these systems that give agents more leeway. What are those conversations like?
A
There's a lot of things. So one like, you know, not, not everything needs an autonomous agent.
B
Sure.
A
And so one framework we have, LangGraph, is really good for when you actually want to combine some of the autonomy of LLMs with more directed workflows and more control. And so honestly, when talking with a lot of enterprises about Deep Agents, some of them are just like, we love LangGraph, LangGraph is better, we're going to stick with LangGraph. And that's fine with us. We think there are different use cases and different things, but that's definitely one component that comes into it. Another component that comes into it is definitely just, okay, great, the LLM is doing a bunch, but how do we know what's going on? And so another thing that we work on is LangSmith, which is observability and evals. And that's basically our answer for that. There's this really interesting thing about agents compared to software, where the interaction space for agents is way more open ended. You can ask it anything. Text is infinite. If you have a UI, there's a fixed set of buttons you can click, and so it's much more constrained. And then also models are not robust at all, they're non-deterministic. You change one word and the answer changes completely. So this is why we think observability is really important. And that's a huge thing that enterprises care about. And very related to observability is then evals. Because sure, you can see one thing that happens, you can tell why it goes wrong. But what if you want to test how it did on 10 different questions, 100 different questions? And so building up these eval data sets is a big thing we work with folks on.
B
Right. And so LangSmith is the platform for building agents as well as observing and evaluating.
A
Yeah. So the way that we think about the agent development life cycle is build, test, run, manage.
B
Okay.
A
And so the build is all the open source. It's like choose your fighter: choose LangGraph, choose Deep Agents, choose another framework. All of our stuff works kind of modularly. But then this test, run, manage, that's LangSmith. So we've got a bunch of stuff around testing and evaluating these models. We have a deployment platform for deploying these at scale, and then we have observability and other things for managing them.
B
Let's talk about skill. Or maybe you can talk about skills for a minute. When I first started playing with these tools, and I'm not a developer, I'm just kind of a technical layperson, if you will. Right. I love playing with these things. Back in the day when they first came out, maybe it was BabyAGI, whatever it was, I spun my computer right into the ground in an infinite loop. But when I first discovered skills, there was a moment where it sort of took me aback, where I was like, wait, I just describe it and it goes off and it builds? And then, of course it does, because that's how this all works. But can you talk a little bit about skills and about that same idea of giving the agent the autonomy to write the tool and run with it? But how do you keep it in check and keep things secure?
A
Yes, skills are a great way to package up knowledge and other instruction sets and other tools for an agent to use. And so they started in coding agents, and a skill would involve basically a markdown file with some instructions and then some scripts that you could run. And one of the things that's become clear over the past few months is that coding agents are very general purpose in a lot of ways. And so the same idea of a skill as a markdown file and then some scripts to run is really, really interesting. We see a bunch of different types of skills. Some of the skills are purely informational. So, you want to learn about something? Great, go read this markdown file. Other skills do things. And this is where it starts to get, I think, really interesting. It could be a Python script that hits a URL. It could be a Python script that runs some GPU accelerated compute. And so this also ties into the environment aspect. So when we think about agents, we think of a model, a harness (and this is Deep Agents), and then an environment that it runs in, a runtime for it. And so Nvidia just released OpenShell, which is a secure runtime for it. And then the other thing that's related to the runtime is also where it runs. Does it run on a Mac Mini? Does it run on some GPU accelerated environment? Does it run in the cloud? And so those three components, and being able to pick and choose what you need for different jobs, is a big part of customizing your agents.
B
Was there a moment? Or can I put you on the spot and ask you to think of kind of an aha moment where this, the idea of deep agents really clicked. And in a use case, and whether it was something internal at LangChain you were working on or maybe with a customer, was there kind of an aha moment where you were like, yeah, this is it?
A
I think so. It started just by seeing really the three things of Manus, deep research and Claude Code. And this is the same way that LangChain started as well, just going to early meetups, seeing things that people were building and seeing patterns. And so the first version of Deep Agents, just like the first version of LangChain, I hacked on over a weekend, and it was a weekend project. I'd been talking internally with some folks and being like, oh, you know, Claude Code's really interesting, Manus, they've got some similarities. And so it wasn't until I had time to sit down on a weekend and hack some stuff together that we came up with a few patterns of what these similarities actually were. And then using it, I think the first thing we used it for was a deep research type thing. And so we gave it access to a bunch of files and we just put it in this virtual file system and had it do some research. And it wasn't even really doing RAG. It was just grepping and globbing, like a coding agent would, over these files. And it worked fantastically well. And so I'd say deep research was the first concrete thing, but really the idea came from just seeing a pattern and spending a weekend hacking on it.
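To make the "grepping and globbing" over a virtual file system concrete, here is a minimal sketch. It is an assumption about what such tools could look like, not the code from that weekend project: `VIRTUAL_FS`, `glob_files`, and `grep_files` are hypothetical names, and the file contents are made up for illustration.

```python
import fnmatch
import re
from typing import Dict, List, Tuple

# Hypothetical in-memory file system: path -> file contents.
VIRTUAL_FS: Dict[str, str] = {
    "reports/q1.md": "Revenue grew 12%.\nChurn was flat.",
    "reports/q2.md": "Revenue grew 8%.\nChurn rose slightly.",
    "notes/todo.txt": "Check churn numbers.",
}


def glob_files(pattern: str) -> List[str]:
    """Return the paths that match a shell-style pattern, sorted for stable output."""
    return sorted(p for p in VIRTUAL_FS if fnmatch.fnmatchcase(p, pattern))


def grep_files(regex: str, paths: List[str]) -> List[Tuple[str, int, str]]:
    """Return (path, line number, line) for every line matching the regex."""
    compiled = re.compile(regex)
    hits = []
    for path in paths:
        for lineno, line in enumerate(VIRTUAL_FS[path].splitlines(), start=1):
            if compiled.search(line):
                hits.append((path, lineno, line))
    return hits


report_paths = glob_files("reports/*.md")
churn_hits = grep_files(r"[Cc]hurn", report_paths)
for hit in churn_hits:
    print(hit)
```

The point of the sketch is that plain pattern matching over files, with no embedding or retrieval index, can already give an agent a usable way to find relevant text.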
B
You mentioned earlier the importance of, you know, whatever words you use, but auditability, traceability, being able to see how the agents did what they did. Can you talk a little bit about evaluation driven development?
A
Yeah.
B
And how that plays into. And again, in the enterprise, you know, building that trust in what the agents are doing.
A
Yeah. If you talk about trust in the enterprise, what does that mean? That means that the agent's doing what you want it to do. There's a few different ways that we see people getting that trust. Part of it is observability and traceability and being able to go into an agent run and see exactly what steps it took and exactly what it did. The other part where trust comes in is having these scenarios and running the agent over them and seeing how it performs and evaluating that. And this is what we talk about as evaluation driven development. You come up with these scenarios ahead of time. One common misconception here, by the way, is that you need 1000 scenarios for it to be effective. You could start with 5, you could start with 10. It really doesn't matter. I think creating these evals is a really good way to do product thinking about how the agent should act. Because this is another thing, like, agents can do anything.
B
Yeah.
A
But they shouldn't do everything. They should do, like, what you want them to do. And so being able to be forced to come up with like, hey, these are 10 questions that we expect the agent to get asked. This is what we think a good response is for each of them. This is what a bad response is for each of them. That's a really good kind of like mental model for kind of like coming up with what these agents should do. And then you can use that to drive all of your changes. So you change a prompt. Great. You can run it against this benchmark. Did it improve? Did it get worse? And then this, this eval data set is, is living over time as well. So as you release it to first like a small set of users, you might see them using it in unexpected ways. And then some of those ways you might be like, okay, maybe they shouldn't be doing that, let's put some guardrails around it. But other ways you might be like, yeah, that's totally legitimate. We had no idea they would use it. Let's add some data points to our eval data set so when we go and change the prompt in the future, we can make sure that it's still good at these use cases.
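The evaluation driven workflow described here (a handful of scenarios, a notion of a good response, and a score that is re-run after every change) can be sketched as follows. This is a hedged illustration, not LangSmith's API: `Scenario`, `score`, and the two toy agents are hypothetical, and a simple substring check stands in for whatever grading a real team would use.

```python
from typing import Callable, List, NamedTuple


class Scenario(NamedTuple):
    question: str
    must_include: str  # a phrase we expect a good response to contain


# A small eval set is enough to start; it can grow as real usage is observed.
EVAL_SET: List[Scenario] = [
    Scenario("How do I reset my password?", "reset link"),
    Scenario("What is your refund window?", "30 days"),
    Scenario("Can you delete my account?", "confirm"),
]


def score(agent: Callable[[str], str], scenarios: List[Scenario]) -> float:
    """Fraction of scenarios whose response contains the expected phrase."""
    passed = sum(
        1 for s in scenarios if s.must_include.lower() in agent(s.question).lower()
    )
    return passed / len(scenarios)


# Two toy "agents" standing in for the same agent before and after a prompt change.
def agent_v1(question: str) -> str:
    if "password" in question.lower():
        return "We will email you a reset link."
    return "Please contact support."


def agent_v2(question: str) -> str:
    q = question.lower()
    if "password" in q:
        return "We will email you a reset link."
    if "refund" in q:
        return "Refunds are available within 30 days."
    return "Please contact support."


v1_score = score(agent_v1, EVAL_SET)
v2_score = score(agent_v2, EVAL_SET)
print(f"v1: {v1_score:.2f}, v2: {v2_score:.2f}")
```

When users surface a legitimate new use case, adding one more `Scenario` to `EVAL_SET` is how it would become part of the benchmark for future prompt changes.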
B
Are enterprise customers open to kind of rolling with that? Oh, we weren't expecting this behavior necessarily, but it's good behavior. And so is there a sense of kind of experimentation? Obviously in the AI community and the open source community, it's all about experimenting and sharing and things are going so fast. Is the enterprise embracing that at all?
A
The best ones do, in limited ways and with a limited blast radius. They might roll it out internally, for example, they might roll it out to a set of alpha users, they might roll it out to 1% of users or something like that. There's definitely way more caution there than there is with gen AI native startups. But building agents is so iterative, and the importance of this iteration really cannot be overstated. And so I think a failure mode for enterprises is you have some idea of an agent, you take three months to craft a bunch of examples, you take another three months to build the agent, you take another three months to get humans to look at everything. But the space has just moved so fast. The whole idea you came up with, there's just a better way to do it at that point. And so I think you have to ship, you have to learn. This is another thing, by the way, that no one likes the answer for. But you basically have to redo your agent every nine months at the pace that things have been going with these agent harnesses. If you're using an agent architecture from like a year and a half ago, you should very strongly be considering rewriting on top of an agent harness or something like that.
B
Right. And again, for performance only or for...
A
Yeah, yeah, for performance. So there's two things. It's performance, but also scope of what the agent can do. So if the agent's doing a very small thing, it's not as valuable as if it's doing a big thing. And maybe a year and a half ago you just couldn't get it to do the big thing, so you focused on the small thing, but now you can. And so if you're not reevaluating that and saying, hey, there's this big thing, let's hook up an agent harness, let's take a stab at that, you absolutely need to be doing that.
B
Yeah. So I want to ask you about models. Frontier models, you know, I wouldn't say everybody, but I think the mainstream AI world is focused on the latest and greatest and what they can do and everything. Open models have become incredibly important. I mean, they've always been important, but I feel like in the past year or so, incredibly important. And you spoke earlier about OpenClaw and Nvidia's OpenShell and the Nemotron family of models. How do you approach, and how does LangChain approach, and then your customers, mixing frontier and open models together to achieve cost-performance ratio and all manner of other things? What's your approach on mixing those?
A
Yeah, I think there's a bunch of different ways that we combine them. One obvious way, which we worked with Nvidia on a blueprint for, is with deep research: you have a bunch of sub agents, and those sub agents might want to be specialized agents, and there might be an orchestrator agent that's using a frontier model. But then when it goes to a sub agent, it might want to use either a fine-tuned model or an open source model for cost or latency reasons. And so when you have these big agentic systems with these subagents, it's totally possible that one part could be using a frontier model and one part could be using an open source model and another part could be using a fine-tuned model. The other thing: we've been paying a lot more attention to open source in the past, even just like two weeks, I would say, for probably two reasons. One, I think they're getting good enough to where they can drive this harness. So being able to properly utilize everything in the harness, it's not super easy. And for a while it was only the frontier models that could do that. There's still a step below the frontier, but we're starting to see that these open source models can drive the harness, which is really interesting because this is the most agentic stuff. And then the other thing that's causing us to look really hard at open models...
B
If I could stop you for a second, Harrison, back up. What are the qualities that a model needs to drive the harnesses successfully?
A
So at the risk of sounding a little broad, it needs to be intelligent, it needs to be good. Another thing that is maybe underappreciated is it probably needs to be good at coding. So we've actually seen that Qwen Coder is a better general purpose model than just the Qwen series of models, because a lot of what makes up this harness looks very similar to coding agents. So this harness has a file system, it has a bash tool. Right. So if the model knows how to use it, if it's a coding model, then that's actually really, really good. And so I think models that are better at coding are generally actually better general purpose agents.
B
Yeah, no, that makes sense. And so then the sub agent models you're talking about.
A
Yeah. And so then a second thing that made us look at open source models even more is OpenClaw. There's a bunch of really interesting things about OpenClaw, but one of the interesting things is it's always on, it's proactive, it's running. And so if you're using a coding agent and you kick it off, even let's say 20 times a day, you're probably okay paying some good amount for that. If it's running every 10 minutes, like, oh my God, you cannot. And if you're running three of these, you just cannot do that. And so I think cost is a really interesting reason for these open models to become popular, especially in these proactive, always-on scenarios.
B
Shifting gears for a second, LangChain just opened. Nvidia formed the Nemotron Coalition and LangChain joined. Can you talk a little bit about why and what it may or may not mean going forward for LangChain users?
A
Yeah, we need open models and we need harnesses that they can run in. And we think we can provide the harness, and we want to work with Nvidia and all the other companies in the coalition to help provide a model that can work with that harness and others as well. As we talked about, the open source models are getting good. They're still a little bit behind the frontier models in terms of driving the harness. And so, great, we can use them in sub agents, we can maybe use them for some of these triggers and the always-on cases. But if they can drive the really expensive workloads, I think that's going to be really transformational in terms of what you can do with open models, which generally means what you can do with more sensitive data, what you can do more cheaply, what you can offer to customers, just more. And so yeah, I think at a really high level we're excited about the Nvidia Nemotron coalition because we want an open model that works really well with open harnesses. And then a third part, which actually I don't think was part of the coalition when it started, but I think the open runtime is really important as well. And you guys are also doing stuff around that.
B
This is my favorite question to ask and, you know, I'm sure the hardest but maybe the most fun to answer. What's next? What do you think agents, agentic systems, LangSmith, LangChain, the company for that matter, is going to look like in, and I'll let you go with what time frame makes the most sense, because I ask and depending on the guest they're like, a year? No, that's too long.
A
No, no, no, no.
B
But what do you think's coming down the pike as far as, you know, agentic systems and all of these things that you're working on every day?
A
I'd maybe call out three things that I think are interesting. One's pretty short term, and I think we'll see it in the next month or two, if not by the time this comes out: asynchronous sub agents. So right now when an agent kicks off a sub agent, it basically waits for it to respond, and that's great. But if these sub agents start to get really long running, you want to just have them run in the background, and you want to have this manager orchestrator agent check in on them and maybe update them. And so I think one trend that we'll see is in coding. Right now, in coding agents, you interact with the agent that's doing the coding. I think we'll start to see a trend where you interact with this orchestrator agent, and that orchestrator agent spins up a bunch of background coding agents, and you just talk to the orchestrator and say, hey, what's going on with this experiment? What's going on with this feature? And so I think we'll start to see asynchronous sub agents become a bigger and bigger topic.
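The asynchronous sub agent idea, where an orchestrator starts long-running workers in the background and checks in on them instead of blocking, can be sketched with Python's `asyncio`. This is an assumption-laden illustration rather than any shipped feature: `sub_agent` and `orchestrator` are hypothetical, and `asyncio.sleep` stands in for slow model and tool calls.

```python
import asyncio
from typing import Dict, List, Tuple


async def sub_agent(name: str, steps: int, status: Dict[str, str]) -> str:
    """Simulate a long-running sub agent that reports progress as it goes."""
    for step in range(1, steps + 1):
        await asyncio.sleep(0.01)  # stand-in for a slow model or tool call
        status[name] = f"step {step}/{steps}"
    status[name] = "done"
    return f"{name} finished"


async def orchestrator() -> Tuple[Dict[str, str], List[str], Dict[str, str]]:
    status: Dict[str, str] = {}
    # Start the sub agents in the background instead of awaiting each in turn.
    tasks = [
        asyncio.create_task(sub_agent("feature-a", 3, status)),
        asyncio.create_task(sub_agent("experiment-b", 5, status)),
    ]
    # The orchestrator stays free to check in while they are still running.
    await asyncio.sleep(0.02)
    snapshot = dict(status)
    # Eventually collect the results, in the order the tasks were started.
    results = await asyncio.gather(*tasks)
    return snapshot, list(results), status


snapshot, results, final_status = asyncio.run(orchestrator())
print("mid-run check-in:", snapshot)
print("results:", results)
```

The mid-run snapshot depends on timing, so its exact contents vary between runs; only the final results are deterministic.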
B
Hate to resort to productivity, but how much of a step change is that going to be, or how much of a difference in terms of what you're able to accomplish?
A
So I think the only reason asynchronous sub agents even make sense is if the sub agents themselves actually run for a while. Right? If they just run for one second and then return, you can just make them synchronous. And so I think it will be a productivity gain, but it requires these agents to be long running in the first place. And I think that's the real productivity gain. And I think this is just a nice interface on top of them. One thing that wasn't on my list of three things, but I think will also be more and more impactful, is basically these agents being proactive, running in the background, always on, listening to events. That, I think, will be a massive productivity gain. So I have an email agent. It runs in the background, it listens to my emails. When it wants to respond, there's still human in the loop, but it flags a draft and it's like, hey, here's a draft, do you want to approve it? Do you want to change something? That is so much more efficient than if I had to go, there's no way I would take an email, copy, paste it, go to ChatGPT, say, hey, can you draft me a response, copy, paste that. And so I think these always-on, asynchronous, event-driven agents will be a really big productivity unlock. And especially in enterprises, there are so many events that are just triggering, triggering, triggering. And so if you can have agents listening to those and firing off, I think that will be a massive gain. The other two things that I think are coming down: one, agent memory. We started to see this a little bit with OpenClaw, but I think the idea that it could remember things as you interact with it, it could actually update its own tools and skills. And descript itself, I think more and more we'll see agents remembering things and, yeah, learning from their interactions. And that's why human in the loop is important as well. 
That's why I don't think these things will be fully autonomous, because they need to learn. And the only way you do that is by interacting with the environment, with humans. And so I think that'll be a big piece of it. And then the last thing is agent identity. So, you know, if there's an agent in an enterprise and I chat with it and you chat with it, whose credentials does it use? Does it use mine? Does it use yours? Does it use a fixed set? So previous to OpenClaw, I think we saw that basically everyone was doing the on-behalf-of model. So the agent would act on behalf of me, on behalf of you, on behalf of the end user, and it would pass, like, my Slack credentials through. And so I might get a different answer than you would get. I think the thing that OpenClaw changed is people started thinking of these agents as identities, as their own things. And I think we'll actually see more things where they will be like, hey, Tom is a marketing agent, and you can chat with Tom and I can chat with Tom, and Tom has a persistent memory and Tom has its own credentials and Tom can go and do things, and Tom is Tom. Tom is not acting on behalf of me or you. Tom has its own accounts with Slack or Gmail. And that's a big thing that we need to figure out, that I don't think anyone in the industry really knows. You know, I was chatting with one SaaS provider. In all the OpenClaw craziness, they were making it really easy for people to create accounts for their agents. But it's still, like, an account. And so will we just see more and more people create normal accounts? Will there be special agent accounts? I don't know, but I think this idea of agent identity is really interesting.
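The difference between the on-behalf-of model and an agent with its own identity comes down to which credential gets used for a given request. The sketch below is a hypothetical illustration only: `Credential`, `Agent`, `USER_CREDS`, and the token strings are invented, and no real Slack or Gmail API is involved.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Credential:
    owner: str
    token: str


# Hypothetical per-user credentials the agent could pass through.
USER_CREDS: Dict[str, Credential] = {
    "alice": Credential("alice", "tok-alice"),
    "bob": Credential("bob", "tok-bob"),
}


@dataclass
class Agent:
    name: str
    own_credential: Optional[Credential] = None

    def credential_for(self, end_user: str) -> Credential:
        """Pick the credential to use when acting on a request from end_user."""
        if self.own_credential is not None:
            # Agent-as-identity: the same account no matter who is asking.
            return self.own_credential
        # On-behalf-of: pass through the requesting user's credential.
        return USER_CREDS[end_user]


delegated = Agent("helper")
tom = Agent("tom", own_credential=Credential("tom", "tok-tom"))

print(delegated.credential_for("alice").owner)  # acts as the caller
print(tom.credential_for("alice").owner)        # acts as itself
```

Under the on-behalf-of model, two users can get different answers because the agent sees different data; with its own identity, the agent sees the same data for everyone, which is why access control for that one account becomes the open question.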
B
Yeah, there's a whole can of worms on the other side of the words agent identity, I think, but not for this conversation. So, you know, you mentioned the weekend project that you worked on that unlocked things at LangChain for you. OpenClaw, another weekend project, went incredibly viral, incredibly quickly. What are your thoughts, or how has that impacted the work you do? And I'm thinking more about the perception that users, developers, enterprise customers might have about agents. Was there a rush of people knocking at your door saying, hey, can you build me a claw? How does it change things?
A
100%. I mean, I think Jensen said, what did he say? Every enterprise needs a claw strategy or something like that. And we're absolutely seeing that. I think it set a North Star, it set a new objective for what these agents can and should be able to do. Now there are a lot of things that you probably want to do more securely than in OpenClaw. The whole reason it took off is because it can do everything. And that's great for weekend projects and hobbyists, but when you bring it into an enterprise, you're understandably going to want a lot more control. That's why we're thinking about agent identity, that's why we're thinking about observability. But in terms of, did it change the North Star for what we build? Absolutely it did. I think it also made it so much easier to communicate some of the ideas as well. And so that's been fantastic as well.
B
Amazing. Harrison, so much we just talked about in a short amount of time, and so much more. But I'm sure by the time we cross paths again, as you mentioned, right, you take three months to scope and three months to build, and all of a sudden it's nine months and, you know, no more. So the next time we cross paths, I'm sure it'll be a different looking world, but kind of built on these same things. But for folks who've been listening or watching and want to learn more about LangChain and the work you're doing, what are the best places to go online? Website, socials, research, blog, anything like that.
A
Yeah, we have a great blog. It's blog.LangChain.com. A lot of the stuff we talked about around context engineering and agent identity will be blogs on there. And we update that a lot. And then Twitter, I think everything in AI is happening on Twitter. We're just LangChain on Twitter. And so you can find us there easily enough.
B
Harrison Chase, thank you so much. Been an absolute pleasure. Appreciate you taking the time to join the podcast.
A
Thank you for having me.
Date: May 6, 2026
Host: Noah Kravitz (B)
Guest: Harrison Chase, CEO & Co-founder of LangChain (A)
This episode explores the rapid evolution of agentic AI frameworks, focusing on LangChain’s contributions, the emergence of “Deep Agents,” LangSmith’s role in observability and evaluation, and new concerns around trust and identity for autonomous agents. Harrison Chase shares deep insights on enterprise adoption, skill-building, trends in open vs. frontier models, and how weekend hacks can drive industry-changing breakthroughs.
"Those systems had a lot of similarities... even early on we could tell that they would get quite complex over time." (01:20)
"Simple is really good. Deep Agents is this general purpose agent harness... you’re just customizing it with prompts or tools." (02:01 & 02:57)
“Our answer is LangSmith, which is observability and evals... The interaction space for agents is way more open ended than for software.” (04:15 & 04:49)
“Skills are a great way to package up knowledge and other kind of instruction sets... Some skills are purely informational... Other skills do things...” (06:50–07:36)
“The first version of Deep Agents...I hacked on over a weekend....had it do some research...and it worked fantastically well.” (08:26–09:04)
“A common misconception...is that you need 1000 scenarios...You could start with 5, you could start with 10...creating these evals is a really good way to do product thinking...” (10:00–10:37)
“The space has just moved so fast...you have to basically redo your agent every nine months...” (11:50–13:11)
“Models that are better at coding are generally actually...better general purpose agents.” (15:29–15:59)
“We want an open model that works really well with open harnesses.” (17:04)
“Always on, asynchronous, event-driven agents...will be a really big productivity unlock.” (00:00)
“...we’ll see agents kind of remembering things and...learning from their interactions.” (21:08)
“I think the thing that OpenClaw changed is people started thinking of these agents as like identities, as their own things...” (21:57)
“Did it change the North Star for what we build? Absolutely it did...It made it so much easier to communicate some of the ideas...” (23:11–23:56)
On Simplicity in Tech:
“Simple is, is really good.” (02:57, Harrison)
On Evaluation:
“This is a really good mental model for coming up with what these agents should do. And then you can use that to drive all your changes.” (11:12, Harrison)
On Iteration and Fast Cycle:
“You have to basically redo your agent every nine months at the pace that things have been like with these agent harnesses.” (11:50, Harrison)
On OpenClaw’s Impact:
“Jensen said...Every enterprise needs a claw strategy...It set a North Star, it set a new objective for kind of like what these agents can and should be able to do.” (23:11, Harrison)
Harrison Chase delves into the nuances and future of agentic frameworks, the challenges and rewards of enterprise adoption, and the interplay between rapid innovation, observability, and trust. With LangChain’s Deep Agents, LangSmith, and collaborations (like the Nemotron Coalition), the future of autonomous, proactive AI looks both promising and full of opportunities for those willing to iterate swiftly and build thoughtfully.