
LangChain is a popular open-source framework to build applications that integrate LLMs with external data sources like APIs, databases, or custom knowledge bases. It’s commonly used for chatbots, question-answering systems, and workflow automation.
Loading summary
Sean Falconer
LangChain is a popular open source framework to build applications that integrate LLMs with external data sources like APIs, databases or custom knowledge bases. It's commonly used for chatbots, question answering systems and workflow automation. Its flexibility and extensibility have made it something of a standard for creating sophisticated AI driven software. Eric Friess is a founding engineer at LangChain and he leads their integrations and open source efforts. Eric joins the podcast to talk about what inspired the creation of LangChain, agentic flows versus chain flows, emerging patterns of agentic AI design, and much more. This episode is hosted by Sean Falconer. Check the show notes for more information on Sean's work and where to find him.
John
Eric, welcome to the show. Hey John, glad to have you here. I've used LangChain a bunch, so I'm excited to get into it and talk about it.
Eric Friess
Yeah, me too.
John
So you were there from the beginning, so I just want to kind of go back to that point. From what you can recall, like what sort of for you and other people that were kind of part of that initial founding team, like what inspired you to create LangChain? Like how did you sort of identify this need for a framework that chained together LLMs, created this abstraction layer?
Eric Friess
Yeah, definitely. And some important context there is that. There's kind of a distinction between the starting of the open source project, which Harrison did in October 2022, and then kind of when the company formed around it, which was kind of the first half of 2023. So the Open source projects kind of came out right place, right time, right before ChatGPT really launched and everyone kind of started building with these LLMs. And that was really Harrison's brainchild. He was kind of working with GPT3, which I don't know if you interacted with some of the text completion models instead of the chat completion ones, but they were relatively challenging to work with. It was string in and then it just kind of continued the string output. And so there was a lot more manual steps you had to do as a user of those from kind of the concept of output parsing was kind of one of the big reasons that people use LangChain in the early days all the way through. Like how do we turn this into some sort of agentic loop using some of the current research of the time. And so we've kind of continued that effort as the company where we're kind of keeping up with the latest models and kind of integrations that people want to work with. That's primarily the area that I work in as well as making kind of usable implementations both for prototypes as well as production applications of the latest and greatest research. Like how do you build software with these new models?
John
So back in the early days of GPT3, there was, I guess their APIs were not necessarily the easiest thing to use, but they have as a whole, I think the model foundation model companies have gotten a lot better in using their APIs directly. How does that change LangChain's focus in terms of the value add that they're bringing above and beyond just having essentially a cleaner API into some of this stuff?
Eric Friess
Totally, yeah. So let's compare October 2022 to now where the landscape in October 2022 was. You kind of used GPT3, maybe you used one of the early CLAUDE text completion models as well. And so there was, I think, three integrations in the original LangChain library, which were those LLMs. And at that point you had to as a user handle all of your kind of message formatting, all output, parsing into messages manually. Where the kind of simplification of what these chat models are doing is it's still just a text completion model, but they're trained on very specific formats of alternating human message, AI message, human message, AI message. And so that allows the API providers to actually guarantee that the next message is going to be an AI message in a more strict way than we can do as just observers of the output of the model. And so with them handling more of that, our focus becomes a lot kind of further up the stack, if that makes sense. So our main focus right now is really on laying graph. How do you orchestrate these kinds of agents as state machines where the LLMs are clearly very powerful, but they're not quite powerful enough yet to build functional software with just the simple react loop, where react loop is just give an LLM access to all the tools that you want to give it access to, go and execute those tools and then pipe that output back into the model. And so that's the simplest react loop doesn't work all that well because as you increase the number of tools that you provide to the model, it starts calling the wrong tool, Sometimes it doesn't call it with relevant parameters to it and those kinds of things. And so what we've kind of been doubling down on is okay, give your LLM access to, and it depends a lot on the model, but give your LLM access to like 5 tools or have some sort of flow where at different steps it might have access to different ones. Let's use an email assistant, as an example, you might classify an email either as like recruiting inbound or like from a recruiter, or like a candidate reaching out to you. And in those two different instances, you might want to give it access to different tools in terms of like, okay, I want to give it access to nothing and just respond like, hey, like, write a draft of interested or not interested, depending on the background company or a candidate coming in, you might want to attach that to something like your recruit, like greenhouse or applicant tracking system in order to track that they emailed you. And so you can actually segment that request in this kind of like, nice graph flow that we visualize in Line Graph Studio.
John
Is that really about sort of having like an opinion, I guess, as like a framework for how people need to stitch together these agents? So rather than someone kind of like, you know, making the mistake of creating sort of this almost like a, you know, monolith agent, where it's going to have access to a thousand tools, you're saying, like, don't do that. If you use this opinion essentially, and the flow of it will essentially force you to kind of break it up into modular components that make more sense.
Eric Friess
Precisely. And that actually is kind of a fun part of LangChain's evolution, where the first agent abstraction we came out with was the LangChain agent executor, which really just implemented that broad react loop. And to be clear, lots of people are using that and being very successful with it as long as their engineering the tools that it has access to in the right way, where you kind of have to have a limited number of tools and you have to have really good prompting and descriptions for how to call it, such that the agents actually end up calling it. And it obviously performs better with kind of the latest and greatest larger models when you do that. And some of these Lang graph flows really enable you to use smaller models as well, both for cost savings or maybe you want to run on hardware that isn't as powerful.
John
Yeah, so let's stay on agents. But maybe before we go deeper on that topic, can you explain sort of like what is different about agentic versus sort of what we've maybe seen previously of these fixed flow architectures through things like rag and so on.
Eric Friess
Yeah, the distinction in my mind is really whether it's a feed forward application or a cyclic application. And so we kind of distinguish them as chains versus agents or graphs. If you're building them with Langgraph and a chain always finishes like, it always just kind of goes through the steps maybe for The RAG case, it does a retrieval step, looks up some documents that it wants to paste into your prompt, passes that prompt to the LLM and generates some nice description that you might get out of a perplexity or something like that. In the agentic version of rag, you really do that retrieval step, generate that output, and then you might even like fact check that output and say, hey, is this factually accurate? Or you might do some other steps that kind of filter out that output and if you don't like it, you can actually just bounce back to the beginning and say like, hey, regenerate this. Based on this feedback that our editor node or editor sub agent told it.
John
To do, how do you get essentially avoid a situation where you are running an endless loop of reflection and planning and then this cycle never actually finishes?
Eric Friess
Yeah, there's a few different strategies. So by default langgraph has a recursion limit. So you can really think of this as the same problem you end up in when you recursively call a function too many times in Python or in any language where it'll kind of hit that stack limit. It's kind of the equivalent concept for a graph where it's really designed to hit that. And there's ways that you can kind of handle that case such that we can kind of gracefully exit when we hit that based on the artifacts that we've generated through all of those steps. But we've also seen a lot of people implement kind of just tracking in the state of a graph. Some Important background on LineGraph is the model is this like all these nodes and edges that you connect to each other, but all of the nodes operate on the same schema. And so we call that typed dictionary just the state of the graph. And so you can have a state field of like number of times. I've fact checked the answer and it starts at zero and you just increment it each time. It's just a different way of writing A for loop. And when we hit three, we say like, okay, like we're kind of done fact checking this. Now let's just respond to the user and say like, hey, I'm not completely sure if this is the right answer, but here's what we kind of ended up with. And so that caps the amount of time that your agent can be spent producing an answer.
John
And in terms of like, patterns of behavior, like, what are some of the agentic patterns that people are using and that are supported in the box with linegraph?
Eric Friess
Yeah, great question. So the first one that everyone starts with, well, I shouldn't say everyone, but many people start with is that React agent that I mentioned before because it's so simple. It's just two nodes. One of them is the LLM calling node and one of them is the kind of tool we call it tool node. But it just executes the code associated with the tool and produces the output that can be passed back to the model. That is kind of the quickest dopamine hit when you're kind of getting started building agents, where you can really build something that goes and sends an email for you or sends a slack message to you based on some input that came in. But we very quickly see people start adding Human in the loop type outputs where, okay, whenever I call my send email node I really want to review that first. So whenever before actually executing the send step, I'll interrupt is kind of the concept in line graph and show the user the draft that I've written and they can choose to give some feedback that can be then edited and shown as a new draft or just hit send and send it away. So React is kind of where people get started. Human in the Loop is one of the patterns that we see recurring and is now a first party concept in Langgraph, other ones that people are building a lot with. We now have a concept of kind of a global state store. So whenever you start a conversation with an agent, we consider that a thread kind of similar to how you would have alternating messages in ChatGPT. But you might want to have some sort of memory that's actually tracked between multiple interactions with your agent. And so that's where we get into kind of that global state store. In the Checkpointer. We've experimented with a lot of different versions of memory and what we've kind of come to is a realization that kind of less is more. Right? Like just being able to set and get keys potentially with some added features around filtering by sessions or like other threads that are kind of associated with a certain one. Those are useful abstractions, but maybe automatically editing and like trimming conversation histories and things like that, maybe not as useful.
John
Yeah, I think another like simple pattern too is basic reflection. I mean and you can even like, if you just go to chatgpt as a user, you can experiment with this where you say like, you know, write me an email that touches on these things and then copy that email output, paste it back in and say like analyze this and improve it and you'll get a better version and do that a couple times. And essentially that is the idea of like automated Reflection agent, you're just doing it manually totally. And then in terms of like creating these abstractions, like, given how fast everything is moving all the time, like, is it hard to create these types of abstractions? Like, how do you choose sort of the right abstraction when things are. You've been doing this since 2023, but now we're nearing 2025. But in the life of Genai, two years is really like 40 years or something like that. So things are moving so quickly. How can you kind of choose the right abstraction and not get into a place where you end up with having to make a ton of breaking changes as you learn new things and new things come out?
Eric Friess
Yeah, great question. And this is a constant struggle for us, as I'm sure you as a LinkedIn user of experience, as well as lots of the users listening to this podcast, we have gone through a lot of iterations where kind of the 2022 version of LangChain was really about these kind of all encompassing opaque chains where you would create. And the simplest one was like an LLM chain class which actually did a lot of magic under the hood and it was really difficult to debug. And then in 2023, we really focused on the LangChain expression language where you would kind of compose these chains as distinct steps. But a lot of the steps were still a little bit opaque. Like you had to know that the JSON output parser would take a string in and output some sort of a dictionary and those kinds of things. And then this year we've really gone towards Lang graph. And so each of those, even though we still support all the old things from a user's perspective, every single time that changes, it can feel jarring. Right? Because the kind of front and center quick start is now something that I didn't learn when I first learned Blankchain. And I think we've done a pretty good job of kind of announcing those and then still supporting the old models. Because obviously we have a lot of users operating on the LangChain expression language in particular. But I think the philosophy has really just become more and more bare bones where everyone who comes to LangChain is either a Python developer or a JavaScript developer. As long as we're talking about the two packages that we maintain, there are some community driven efforts and kind of go Kotlin, there's a few other ones, but for the two main ones, everyone knows those two languages. And so the more just raw Python that we can let people write, the better, because it's things that they already understand. And it's kind of no magic included there. And so with langgraph, everything is really just a Python function. And the kind of main abstraction is the same as Network X. If you'd used that before, where you're saying, like, create my graph, graph, add node, graph, add node, and then connect these two nodes to each other with an edge, or connect these two nodes with a conditional edge, which does those kinds of things. And sure, there's lots of bells and whistles on the side that you can use for interrupts if you throw a particular kind of error or kind of these checkpointer features where you're storing state or memory. But in order to get started, seeing some langgraph code makes a lot more sense than seeing some LangChain expression language code. If you've never seen it before, I don't know if you've used it, but it's a lot of kind of pipe operators. It looks a lot more like Bash than Python. And so that has been really the philosophy that's kind of been the big change in my mind.
John
Okay, and is it hard as well to. When you're creating these abstractions, like, how do you think about, like, how different models are going to have different limitations on them? And depending on if I, you know, switch the model suddenly from GPT 4 to a different version of GPT or Claude or whatever, then the size of the context window could be impacted by that. Maybe other types of features could be affected by that.
Eric Friess
Definitely. And as the industry has evolved, actually the constraints have changed a lot. I would say where when I first joined LangChain, the main difference between models was the context window.
John
Right.
Eric Friess
You'd use. I'm going to forget the actual numbers, but I think like, GPT3.5 Turbo had like, I think a 4000 token context window to start, and maybe it came up to 16,000 later. But then a lot of the smaller models were like 1024 tokens, and so you could like barely fit in the messages that you wanted to send to them, and they terminated with these really jarring errors, where it's like, okay, you exceeded the token window sometimes in the middle of the output it was generating, and then you just get like these partial things that weren't that useful. And then nowadays the main distinction I would call out is probably tool calling, where tool calling is easily the most important feature that LangChain and langgraph users are using out of the models, where it's really useful for providing some sort of tool calling and structuring output, where you provide a Schema and the LLM generates all the fields that you kind of ask for, which is a really nice interface point between code and these LLMs. And different models perform very differently. And even with the same model we can chat a little bit about this in a second. But like the open source Llama line of models for meta, the tool calling performance is actually markedly different with different providers depending on how they've implemented, kind of parsing of those tool calls, which is kind of another fascinating thing that we're kind of working with right now where it's kind of difficult to call that out on some of the provider pages in our documentation. But to answer your original question about how do we kind of manage a lot of that, the first step is really just documentation, right? We add notes to the provider pages where it's like, hey, this model has a check mark for tool calling or an X for tool calling, or maybe even a warning sign in some cases where it's like, hey, this model says it has tool calling, but it never actually calls tools. And the next step is really building abstractions that make sense around those. And so now in the library we have these bind tools and with structured output methods that if you just call them, they either work right, they give you some sort of structured output or they throw a not implemented error of like hey, this provider doesn't offer tool calling and so you can't mine tools.
John
To it in some ways like, and you're probably too young to remember this, but it feels like the early days of the web when like it was non standardization around like HTML and JavaScript and stuff like that. And you'd have to have these like control loops essentially or control statements around like, okay, if it's this specific browser of this specific version, you know, essentially this is the way that it needs to behave or this is the call that I can make. And it's just probably a byproduct of early days, things are moving quickly and everybody's trying to, you know, push things out to production and there, there's going to be essentially a non standardization across all these things. And then when you bring in the open source models and they're going to be served by different people, there's going to be potentially different interpretations of how to, you know, respond to something like tools, for example, and people are going to have their own takes on those totally.
Eric Friess
And actually if any of the listeners kind of want to dig into the lore here, if you look at some of the source code for the original anthropic integration or, and Actually still like aws, Bedrock has one version of their integration that's kind of like this, where everything is still a text completion. So a lot of that message parsing logic actually happens in the LangChain integration, which is kind of a crazy world that we lived in once upon a time where there was an if statement of like in Bedrock in particular, right? It's like, if it's an anthropic model, parse it in this way. If it's a cohere model, parse the output in this way. Because the message tokens that are actually outputted are different, which is yeah, obviously kind of a speed to market type thing that you see across all these different providers.
Sean Falconer
In a world where agreements are the lifeline of businesses, DocuSign is more than just signatures. The company is transforming how professionals create, commit to and manage agreements. Use DocuSign Intelligent Agreement Management to turn complex negotiations into a streamlined experience, breaking down barriers and pushing business forward. Visit DocuSign.com today to learn how DocuSign IAM can give your business a competitive edge and propel you into the future of agreement management.
John
In terms of implementing agents with Langgraph, can you walk through, like, what is that process? Like? You know, I want to get started with Langgraph. I want to build a basic agent. Like, what do I need to do?
Eric Friess
Yeah, I'll actually start from the very beginning where we actually have a lot of different media formats to get people started because we've realized that some people like following video tutorials, some people like following kind of written documentation where you can copy paste code. And some people really like starting from like a complete template, but they just extend themselves. So for those three, we have the LangChain Academy, which is academy.LangChain.com, which is a video format. For this we have the Langgraph documentation. Just Google that and there's a quick start. Or we have Langgraph Studio, which has kind of five templates to get you started, which is actually used in Academy if you end up doing that. So definitely, if you want to dive in further, would recommend going to one of the sources from my coworkers who have made much better content than I can describe right now. But for the kind of brief answer, right now it's all just Python. So you either open up a jupyter notebook or you open up a text editor and you create a bunch of Python functions representing the different steps you want your graph to take. And funnily enough, the kind of graph interface for this tends to be more intuitive for people Who've interacted with like no code type editors, which the whole orchestration of it actually reminds me a lot of like labview or like some of those kinds of old. Connect a bunch of edges between different nodes for robotics type things and you define those operations, you connect them up and then you really just run it and see what the output is. One important step I left out is defining that schema. So by default we recommend just storing like a message history. So the most simple agent actually doesn't even have any tools where it's just accumulating. Like I send a message and it appends it to the messagestate and then the LLM sends a message and it appends that to the messages state and you just go back and forth in an interaction. But then you could also store something of number of turns of conversation and just increment that at each node. You could store which tools the LLM should have access to and then modify that over time. And you can really store any of that in there with the caveat that if you store anything that's not serializable, you won't be able to use some of the checkpointer kind of hosted features, if that makes sense.
John
And in terms of the workflow and orchestration, that's all happening within my environment where I'm essentially hosting my code.
Eric Friess
Totally. Yeah. So actually, yeah, important distinction. Langgraph is kind of mostly an open source project, but then we also have Landgraph platform which is kind of our hosting. If you've used like Next JS and Vercel, kind of a similar model where Lang Graph is all the orchestration, it knows how to execute everything. We have some open source versions of check pointers that allow you to kind of serialize that state and kind of fast forward and rewind through some of your execution using essentially database features for that. But then Langgraph platform is really about hosting everything as a REST API and also visualizing it. So we actually have some features in Lang Smith, which is our kind of debugging and observability commercial product that lets you visualize your graph, interact with the state manually through these kinds of interrupts and things like that. And it overall just kind of makes it easier to build some of these things over time. And both of them have a generous free tier, but have to call out that they are not part of the.
John
Open source offering for the nodes in the graph is memory shared across nodes?
Eric Friess
So the state memory is shared across all the nodes and it's identical across all the nodes and that's Kind of the whole reason for the abstraction. I think if that weren't the case, it would probably make sense to just write everything as raw Python. And to be clear, lots of developers still do that. But then the check pointer kind of global state that is stored across all threads as well. So you might send a single message which ends up kicking off a sequence of like six nodes just from that one message before it returns something that's meant to be shown to the user, which is typically a message on the message history. But the execution of all those nodes does not affect like your conversation with it, if I was the one to start that thread. Whereas that global state is accessible for both. So there's kind of a few layers.
John
Of grouping of data in terms of like scaling these. I guess it's essentially on the developer who is going to be hosting this to take on the scale challenges that they might run into with running one of these agent flows.
Eric Friess
Right, scale in which regard?
John
Well, in terms of if I build something in Lang Graph and some agentic workflow and then I put it up on a server somewhere and someone starts hitting it, you know, essentially one it's going to be hitting my hosting infrastructure, but the orchestration and workflow, if I'm using the open source version that's also hosted within my environment. So I'm assuming it's on me to essentially meet whatever the scale requirements and also architect it for scale by potentially breaking this up as needed.
Eric Friess
I'll answer this in two ways. So Langgraph itself is the one that's helping execute those nodes just for a single request. And so if you have some Architecture, if you, for example, use all the synchronous APIs instead of the async APIs, that's going to hog a thread for much longer than if you use something like Python Async for that. And there's decisions in terms of how you implement your graph that developers will always have to take responsibility for, because that's their graph code. But then once I have that packaged into essentially a fast API endpoint or something like that, that's when you have a decision of whether you want to. Or actually right before you have the FastAPI endpoint, you have a decision of whether you want to build that fast API endpoint yourself and host that just in a docker container on EC2 or some sort of hosting service. Or you can kind of go with Langgraph platform, we have both a cloud version of it, which is packaged with langsmith or Host kind of self host that Langgraph platform Container where we have a free tier and then after that we kind of enforce an enterprise license for that. But I think that kind of getting to your point, there are challenges associated with building infrastructure that hosts these at scale, where you're getting lots and lots of queries per second. And that's really where Langgraph platform comes into play. And we kind of take that on for you.
John
What's happening on Langgraph platform in terms of what you can share in the backend infrastructure to handle that scale?
Eric Friess
Yeah, so a lot of it is kind of segmenting the storage from the compute. So as mentioned before, we have this concept of nodes and edges, so that's kind of the execution. And then we have this concept of a check pointer, which is the storage of state, allowing kind of one worker to execute the nodes until it hits kind of the first interrupt or it ends for whatever reason. And then another compute node can actually pick that up and work on it later if a different request comes in. So there's some load balancing challenges associated with that. There are even challenges on just like implementing a check pointer in a way that handles infrastructure failures well. So like database connection goes down, those kinds of things. All of those are kind of part of the hosted offering of those. Right now we stick all of those in postgres. So the check pointer for hosted is there and the local one is all based on SQLite, which works well, but is kind of not as scalable.
John
What about in terms of like tool integration? Like you're going to have a certain like potential fragility with calling out to some third party tool so I can write the function that's going to, you know, maybe it pulls data from my CRM or something like that, but then that call could fail. So is it really, I guess like on the developer to sort of follow the best practices around, like, you know, distributed systems and retries and things like that, or is some of that offloaded by LangChain?
Eric Friess
Great question. So node retries, we have some features that make it easy to use those, but for the most part we're seeing people implement their retries themselves with something like tenacity or kind of one of these retry libraries. Just because you actually want different retry behavior in different situations where like a send email function, you probably don't want to retry that indefinitely because a empty response to me it could have still sent the email and I'm just not sure you need some kind of custom logic to check if the email was sent or not. But something like retrieving results from Google or something like that that can be retried pretty indefinitely because it's just gonna. You might hit rate limits or things like that, but you're not affecting any external state. But long term potentially, but no plans for that. Currently we do work with a few providers that do make working with tools easier. One of the companies we started working with recently is this company Arcade AI that does a lot of stuff around auth for tools where if you have multiple users kind of handling all the different permissions to their different services can sometimes be a challenge and they handle that.
John
I'm just curious what your thoughts on in terms of like optimization around inference both from a cost perspective and also, you know, performance perspective. Because you know, it's great that I can build like these agentic workflows that can do really complicated, amazing things, but every time I'm relying on API call to a model to perform some inference cycle, like there's not only a financial cost associated with that, but there's also performance costs with that.
Eric Friess
Yeah, great point. And we have a few fun anecdotes from a few of our customers on this where most seem to be following at least on the cost side. The philosophy is mostly that we're betting that the costs of these things come down.
John
Yeah, economies of scale, exactly.
Eric Friess
Like opening I inference cost has gone down by like 50x or something in the last year. It's like more than an order of magnitude in the last year. And so that'll probably continue to happen at least in the short term as we get smarter about how we execute these models. And then speed also tends to come with that. Like a lot of the kind of model performance side is just making the models smaller either through kind of decreasing precision or doing different kinds of sparsity strategies. And that kind of gets you the benefit of both while not really sacrificing performance.
Sean Falconer
But the global developer talent shortage is expected to grow to 4 million in 2025, further contributing to developer burnout. With the security and talent shortage growing rapidly, businesses need effective tools to help developers work efficiently and securely. That's where Bitwarden comes in. Bitwarden delivers trusted open source security solutions that empower your developers and security teams to securely manage and share sensitive information online. Protect your infrastructure secrets, API keys, user passwords, mailing addresses, credit cards, passkeys and more. With easy to use and enterprise ready Bitwarden solutions. Start your free trial today@bitwarden.com.
Eric Friess
There is still a benefit to using, especially on the open source side, right? You're going to get better speed and cost characteristics out of a Llama 7B model than a Llama 7DB model. And so that's the main area that we see people pulling that lever in the short term, where you might have a classification step that's run on a 7B model, just because it's a lot faster, a lot cheaper. But then typically when you're generating output for users and things like that, you tend to back up on the larger models for important tool calls that like, kind of decide the control flow of the application. Especially for more complicated ones, you tend to use a larger model. So we kind of have this new iron triangle where you're talking about, I would actually group, like cost and latency together in one corner, you have kind of accuracy characteristics in another, and then you have like, reliability, which is kind of always, always concerning.
John
One for those that are building on, you know, line graph or even the sort of older version of LangChain, like, what are some of the typical challenges that they run into that they have to navigate? Like, what are some of the things that they should know about, essentially?
Eric Friess
Yeah, great question. Well, first of all, would love to hear from folks. We now have comments on all of our docs pages. We monitor our GitHub, obviously, through kind of pull requests and issues. This week has largely been working through the backlog on pull requests there. But in terms of challenges that users see getting started, I think the main one is information. And this is kind of a problem across the industry. And this is actually, I think part of the reason that we've been relatively successful in this space is we document a lot of things in our documentation on these new strategies in a way that you can kind of immediately implement. But with something new coming out every day, it is often challenging to kind of drink from that fire hose and information. And so this is also something that we struggle with internally, where it's like, okay, what is the quick start to Landgraph, right? And I think right now we have like three or four different ones, depending if you want to kind of target more of like a hosted production application, online graph platform, or if you're just kind of playing and want to build some sort of chatbot. And so I think the main one that I would like to work on kind of in the beginning of next year is, is how do I decide which happy path I want to take as a new user? And it's challenging. It's something that we rework every three months just because the. The state of the art is always different.
John
What are some of the most surprising or creative applications that you've seen built on top of LangChain.
Eric Friess
Ooh, good question. I think. Okay. Surprising and creative. One of the ones that surprised me earlier this year is some engineers from Uber actually gave a presentation on this is @GitHub universe gave a presentation on writing unit tests with a kind of code assistant built on top of Langgraph. And code assistants are obviously super popular right now. There's very distinct productivity ads from them and so a lot of different companies are working on them. But it was really cool to see kind of the step by step process of like how you actually lay out nodes in Langgraph to do this. And the Talk is on YouTube would highly recommend giving it a listen. So that one was kind of a real life kind of code assistant that was being used by some portion of their engineering org. I really like Elastics Security Assistant. That's been kind of a fun one. They actually they've been working with us for a long time so they built the first iteration of that on the Agent Executor, which was kind of that react loop that I mentioned before with some extensions to it. And then they've recently migrated that to Langgraph as well. And that's really about generating security rules for. I forget what the term is, but the kind of automated quarantining of software and like monitoring logs for writing rules for that has been a cool one. And then all the customer support ones, just the variance in what steps different folks want to build into their customer support systems is quite surprising to me. Where? I don't know, I think it's a domain that I haven't done as much work in myself. And so just the number of different tools that you might have to integrate for those kinds of flows has been relatively surprising. And the different ways that people want to do it.
John
Yeah, I mean I think if you can get to a place where we can automate like unit tests, certain parts of security, governance on call documentation, like the types of things that are not always the funnest jobs but have to be done, I think that would make a lot of people happy.
Eric Friess
I agree.
John
What about zooming out even beyond LangChain? What are some of the key innovations that are happening in AI right now that are interesting to you and you're sort of keeping an eye on.
Eric Friess
So first and foremost, tool calling performance is definitely the best one. So the way people are using tool calling in Langgraph is really as a reasoning model, an open domain reasoning model where you have some sort of context and data and you Have a question of like, hey, is this an important customer or a not important customer that just reached out to my inbound form. And as there's kind of a threshold that we haven't hit yet where you kind of just trust the model to make the right decision out of the box. And I think that's probably the one that kind of throws gasoline on this fire in terms of what people can build with them. Where right now, even if you successfully run your first hundred things in your test set through it, the 101st that fails in kind of an unexpected way is kind of a big hit in terms of your ability to release that. And people are using Langgraph in cool ways to kind of guardrail that and make sure those kinds of unexpected ones don't go out. But the reasoning capability is definitely something that leaves it to be desired. I'm personally very excited about these multimodal input and output models that we're starting to see where we have seen kind of text and image modalities as inputs to models for a little while now, kind of starting with GPT4 Vision Preview and then kind of 4.0 is now out of the box capable of that. But now we're seeing more models that are actually outputting kind of mixes of text, images, audio that are not necessarily the most reliable yet, but they kind of show a vision for what the future could look like once training of those models kind of produces something that's very good. Potentially hot take, but personally I'm a little bit less excited about some of these video models and things like that. I think they make really cool art and they're really good for brainstorming in the creative space. But I guess for me personally maybe a little bit less useful. But we're totally going to see people generating prompts for those in LangChain and then generating kind of critiques of them to kind of edit over time. I'm sure that'll be a use case that we see. Back to the modalities. One though I'm really excited about audio. I think OpenAI real time or advanced voice in the app and then the real time API is an area that I'm personally very excited in. I actually got my start in NLP back in, what was it, 2015. My first internship was at a company called Jibo, which is this little white robot that was out of the media lab and we were doing some strategies for open ended text there. But the ability to interact with something over voice I think is a pretty magical experience when it works well and I was actually using advanced voice mode to practice my Mandarin for the last few months for a trip I took to Taiwan two weeks ago. And it's pretty fun to be able to kind of have a teacher in your pocket for anything you want.
John
Yeah. Also I think opens up the door to real time translation and not being that far from the vision that Star Trek put out years ago of the universal translator and things like that is pretty amazing. I worked on Google Assistant for a while, so I know the perils of and frustrations around voice when it doesn't work. So it's like incredible the step functions that are happening. And I think, you know, going back to your point around multimodal and how it's not like, you know, perfect performance today, but if you even look at like image generation from 2022 to where it went from there to 2023 to. And then like videos in 2024, like the speed with which things are getting better is like exponential essentially. So it probably won't take that long until multimodal is significantly better performance than where it is now.
Eric Friess
That's the hope.
John
Yeah. In terms of productionizing LLM based applications, like what is the biggest hurdle that people typically run into?
Eric Friess
I would probably say reliability is the first one where standard system stuff exactly like, but not infrastructure reliability. In this case, I think it's probably more the outputs are non deterministic and so you want, you need like even just defining the criteria for your customer support application. Like what percent of my emails have to not piss off a customer in order for me to release this in production. Like defining criteria like that is something that I guess we've had to do to some extent with like kind of training kind of human operators of these kinds of things where it's like, okay, as soon as we feel like someone has enough, I don't know, credibility to respond to these things. But the amount of evals that are still happening just with like vibe checks. Harrison actually did a talk on Tuesday with James from Character AI and he was talking about some of the. A lot of the early evals at Character were literally just the researchers playing with the systems and just vibe checking it. And I think there's still a lot of that going on just because defining the criteria, like defining concrete evaluation criteria is really challenging. And now we have lots of systems. Langsmith has some great systems for running evals, both online as well as kind of before you release new versions of your application. But in order to get any of those evals out, you really do need to put in the effort to define what that is.
John
Yeah. And there's a whole new crop of companies that are investing, building products, like, you know, brain Trust. I'm excited about. I think, you know, we're having them on the show sometime soon. Like, there's. There's people trying to address that issue. And it is like something that fundamentally has to be addressed because the non deterministic nature, it's very hard to tell whether when you make a change, are you actually moving things in the right direction or not. And it is a lot of just kind of like, you know, dipping your finger up in the air and seeing which way the wind is flowing at the moment. I think there is a lot of this, like, vibe checking that's going on.
Eric Friess
Totally. I think lengthsmith Evals actually has done a really good job of helping turn those vibe checks into real evals. Like we launched an annotation queue, I think, about a year ago at this point. And the way people have used that to kind of curate data sets from live data or even just from like internal data in terms of interacting with things and then converting those into evals. We obviously have lots of features for like using LL as a judge or kind of converting just like kind of running code to evaluate those kinds of things. And it's been really interesting working with lots and lots of different customers on that to just see kind of how different organizations think about that.
John
Awesome. Well, Eric, thanks so much for being here. I really enjoyed it.
Eric Friess
Thanks so much, John. Have a good day.
John
Cheers.
Podcast Summary: LangChain and Agentic AI Engineering with Erick Friis
Software Engineering Daily | Release Date: February 11, 2025
In the February 11, 2025 episode of Software Engineering Daily, host Sean Falconer engages in an insightful conversation with Eric Friess, a founding engineer at LangChain. The discussion delves into the inception, evolution, and future of LangChain—a prominent open-source framework designed to integrate Large Language Models (LLMs) with external data sources. Friess provides a comprehensive overview of LangChain’s journey, its transition from basic chains to sophisticated agentic AI flows, and the emerging patterns in agentic AI design.
Sean Falconer opens the discussion by acknowledging LangChain's widespread use in building AI-driven applications like chatbots and workflow automation systems. Friess elaborates on the framework's genesis:
"The Open source projects kind of came out right place, right time, right before ChatGPT really launched and everyone kind of started building with these LLMs."
[01:25]
LangChain was initially created by Harrison in October 2022 to address the complexities associated with early LLMs like GPT-3, which required extensive manual effort for output parsing and message formatting. The framework provided an abstraction layer that simplified integrating LLMs with APIs, databases, and knowledge bases, making it easier to build sophisticated AI applications.
As the ChatGPT surge amplified the demand for robust LLM integrations, LangChain evolved from handling simple chains to more complex agentic AI flows. Friess highlights the limitations of early models:
"The simplest react loop doesn't work all that well because as you increase the number of tools that you provide to the model, it starts calling the wrong tool."
[02:39]
To overcome these challenges, LangChain introduced Langgraph, which orchestrates agents as state machines. This approach allows for modular components, ensuring that LLMs interact with a limited set of tools relevant to specific tasks. For instance, in an email assistant scenario, LangChain can classify emails and determine which tools to access based on the classification, thereby maintaining efficiency and accuracy.
A significant portion of the conversation distinguishes between agentic flows and traditional chain flows:
"The distinction in my mind is really whether it's a feed forward application or a cyclic application."
[07:00]
Chain Flows: These are linear, feed-forward processes where each step is executed in sequence without revisiting previous steps.
Agentic Flows: These involve cyclic processes where outputs can trigger additional actions or loops, such as fact-checking or regenerating outputs based on feedback.
Friess explains that agentic flows offer a more dynamic and responsive architecture, allowing applications to handle complex, real-world scenarios more effectively.
With the rapid advancements in LLMs, LangChain has had to adapt to varying model capabilities and limitations. Friess discusses the shifting focus from context window sizes to tool calling:
"Nowadays the main distinction I would call out is probably tool calling, where tool calling is easily the most important feature that LangChain and langgraph users are using out of the models."
[15:19]
He emphasizes the importance of structured output and schema definitions in tool integrations, allowing different models to interact seamlessly with various tools despite their inherent differences. LangChain now includes features like bind tools and structured output methods to standardize these interactions.
For developers interested in building agents with Langgraph, Friess outlines a multi-faceted onboarding approach:
"We have the LangChain Academy, which is a video format. For this we have the Langgraph documentation. Just Google that and there's a quick start. Or we have Langgraph Studio, which has kind of five templates to get you started..."
[19:50]
Langgraph supports Python developers by providing intuitive graph interfaces reminiscent of no-code editors. Developers can define Python functions as nodes and connect them to build complex workflows. Additionally, Langgraph Platform offers hosting solutions and visualization tools, such as Lang Smith, to aid in debugging and observability.
Implementing agentic workflows presents several challenges:
Recursion and Infinite Loops: To prevent endless cycles, Langgraph imposes recursion limits and allows developers to implement state tracking mechanisms to cap the number of iterations.
"By default langgraph has a recursion limit... It's just a different way of writing a for loop."
[08:03]
Tool Integration Fragility: Interacting with third-party tools can introduce points of failure. While LangChain provides mechanisms for handling retries and errors, Friess advises developers to integrate best practices from distributed systems to enhance reliability.
"Most of the time we're seeing people implement their retries themselves with something like tenacity..."
[27:43]
Scalability: Scaling agentic workflows requires careful architectural considerations. Langgraph Platform addresses some scalability concerns by separating storage from compute and offering hosted solutions that manage load balancing and infrastructure reliability.
"Langgraph platform is really about hosting everything as a REST API and also visualizing it."
[22:07]
Friess shares his excitement about several emerging trends in AI:
Multimodal Models: The integration of text, image, and audio inputs and outputs is opening new avenues for applications, such as real-time translation and interactive voice assistants.
"I'm personally very excited about these multimodal input and output models..."
[35:43]
Optimizing Inference: Reducing costs and improving performance are critical. Friess notes that smaller models (e.g., Llama 7B) offer cost and speed advantages, while larger models are reserved for more complex tasks.
"We have this new iron triangle where you're talking about... accuracy characteristics in another, and then you have reliability..."
[30:08]
Enhanced Tool Calling Performance: Improving how models interact with tools remains a priority, ensuring that agentic workflows are both accurate and reliable.
"Tool calling performance is definitely the best one."
[35:43]
LangChain has been instrumental in enabling a variety of innovative applications:
Code Assistants: Engineers at Uber utilized Langgraph to build a code assistant that writes unit tests, showcasing the framework's capability to enhance developer productivity.
"Langgraph to do... a real life kind of code assistant that was being used by some portion of their engineering org."
[33:29]
Security Assistants: Elastic's Security Assistant leverages Langgraph to generate security rules and monitor logs, demonstrating the framework's utility in maintaining robust security protocols.
"They built the first iteration of that on the Agent Executor... and then they've recently migrated that to Langgraph as well."
[33:29]
Customer Support Systems: Diverse customer support workflows have been implemented, integrating multiple tools to handle various support tasks efficiently.
"The different ways that people want to do it... where I don't know, I think it's a domain that I haven't done as much work in myself."
[33:29]
Eric Friess provides a nuanced perspective on building and scaling agentic AI applications with LangChain. From its inception to its current state, LangChain has continuously adapted to the evolving landscape of LLMs, emphasizing modularity, reliability, and scalability. As AI technologies advance, frameworks like LangChain and Langgraph are pivotal in translating these innovations into practical, real-world applications. The conversation underscores the importance of thoughtful abstractions, robust tool integrations, and proactive scalability strategies in harnessing the full potential of agentic AI engineering.
Notable Quotes:
"The simplest react loop doesn't work all that well because as you increase the number of tools that you provide to the model, it starts calling the wrong tool."
— Eric Friess [02:39]
"By default langgraph has a recursion limit... It's just a different way of writing a for loop."
— Eric Friess [08:03]
"Tool calling performance is definitely the best one."
— Eric Friess [35:43]
"The open source projects kind of came out right place, right time, right before ChatGPT really launched and everyone kind of started building with these LLMs."
— Eric Friess [01:25]
This detailed summary captures the essence of the podcast, providing a structured and comprehensive overview of the discussions between Sean Falconer and Eric Friess. It highlights the key points, insights, and practical considerations for developers and enthusiasts interested in LangChain and agentic AI engineering.