
Python’s popularity in data science and backend engineering has made it the default language for building AI infrastructure. However, with the rapid growth of AI applications, developers are increasingly looking for tools that combine Python’s flexibil...
Loading summary
Narrator
Python's popularity in data science and backend engineering has made it the default language for building AI infrastructure. However, with the rapid growth of AI applications, developers are increasingly looking for tools that combine Python's flexibility with the rigor of production ready systems. Pydantic began as a library for type safe data validation in Python and has become one of the language's most widely adopted projects. More recently, the Pydantic team created Pydantic AI, a type safe agent framework for building reliable AI systems in Python. Samuel Colvin is the creator of Pydantic and Pydantic AI. In this episode, he joins the podcast with Gregor Van to discuss the origins of Pydantic, the design principles behind type safety in AI applications, the evolution of Pydantic AI, the log fire observability platform, and how open source sustainability and engineering discipline are shaping the next generation of AI tooling. Gregor Vand is a security focused technologist, having previously been a CTO across cybersecurity, cyber insurance and general software engineering companies. He is based in Singapore and can be found via his profile at Van HK or on LinkedIn.
Gregor Van
Hello and welcome to Software Engineering Daily. My guest today is Samuel Colvin. We're really excited to have you here today, Samuel.
Samuel Colvin
Thanks so much for having me. Yeah, really excited to be here.
Gregor Van
Yeah. So Samuel, you are just to get this completely correct, you are the founder of Pydantic, is that correct?
Samuel Colvin
I am the founder of Pylantic. I have to say, since I recently moved to the Bay Area, people have started asking me for the first time, did you also create the library? Which seems like a slightly weird question. I feel like I'd be a fraud if I was created Pylantic the company running Pylantic the company and didn't create the library. But yeah, I created the original library way back and now run the company by the same name.
Gregor Van
Good. Yeah, well I'm glad we cleared that one up at the start. I'm glad I didn't ask did you only found the company. So, as we like to do on Software Engineering Daily, just getting a sense of where have you come from from a developer standpoint. I've seen on LinkedIn you've worked through some interesting companies and I think it would be really interesting to understand how Pydantic came about. We're obviously here to talk about Pydantic AI, but we're going to just hear about the story to this point in time.
Samuel Colvin
Yeah, so I was a mechanical engineer way back and then being a Software Engineer since like 2014. Worked in a number of different roles, ran a bootstrapped self funded company before, but then started, I don't know, really got into open source, like 2016, 2017 and around then, type ins were just coming to Python In I guess, 3.5, 3.6, and they seem really powerful, but it seemed to me then, and seems still to me today, completely ludicrous that they don't do anything at runtime. As in, I totally understand the history of why that's the case. It makes sense, sense once you understand that. But imagine week one of learning to code and you're told you're writing software, it's going to be interpreted by a computer, everything needs to be exactly correct and then you're told, oh yeah, by the way, these type hint things in Python, although you might get a squiggly line, everything will initially continue to work. When you pass the wrong types, you'll just get an error later on. It's completely weird. So it came from this, like, could we, did it even make sense? Was it possible to enforce those types of. It worked. It obviously worked spectacularly well relative to my initial experiment. Was that possible? So, yeah, that was like 2017 and then the library just took off. I mean, took off gradually, but relative to other open source I had done at that point, kind of took off.
Gregor Van
Yeah, I mean it's had like. Is this number correct, like 300 million downloads monthly?
Samuel Colvin
Yeah, we're about 460 million downloads a month now. Just hoping to cross the like half a billion downloads a month sometime, I guess, end of this year, early next year.
Gregor Van
Yeah, yeah. And I mean it's used by, I mean if you just look at any kind of logo on the pedantic website, all the big, big, big people, everybody. Nvidia, Meta, NASA. Yeah, everyone.
Samuel Colvin
All the companies who are writing Python and they're using it somewhere.
Gregor Van
Yeah. Is it fair to say though, when it was introduced it was a little bit controversial to introduce types to Python? Is that a fair statement?
Samuel Colvin
So, as I say, types obviously came to Python long after Python existed. They were there for static typing and for things like documentation. Arguably it was almost an oversight when they were created that they were left around at runtime. And I think there were those who wish they had got rid of them at runtime and stopped people like me doing odd things with them at runtime, because obviously once people like me started using them at runtime and once they were found to be really useful, it did limit what you can go and do with them. In static typing time. Right. There's a world where, as in Typescript, they are not part of the actual language at all. And you'll have much more flexibility about what you do with them because they're not only part of the ast, but they're actually there in the runtime. We can use them, but that has some constraints on what you can do with them, as I say, in static typing. But I think it's incredibly valuable. Right? I mean, it's the only language where you can do this trick effectively. Pedantic is obviously not the only library that does it, but it's kind of the preeminent one, I guess, at this point.
Gregor Van
Yeah, and you mentioned constraints there. We're going to touch on constraints in a little bit. Just sort of philosophy behind that. We're going to move on from pure Pydantic in a second. V3. Where's that at?
Samuel Colvin
The first thing to say about V3, because I know that obviously the transition from V1 to V2 was quite painful for a lot of people. We fixed a lot of broken edge cases that we should have fixed before V1, but we also probably made some mistakes in V2. The V2 to V3 transition will be much, much smoother. We will mostly be changing some config defaults that we haven't been able to change because we've been very careful about breaking changes since, and we can probably telegraph most of them. The biggest change coming soon in Pydantic is, I think we're going to call it Struct. So it will be a new primitive type in Pydantic, probably used as a decorator. It should be pretty much data class compliant. But the big difference is that under the hood, the data will be held as a Rust type rather than as a Python type, particularly if you're loading the data from JSON or from a binary format. So we should be able to get 3x ish improvement in performance out of that, which will be significant because obviously Pydantic is already very, very fast. It's 50ish times faster than Pydantic V1, which was already faster than some of the libraries that went before. But there are a number of interesting things you can do if the data is fundamentally in rust. One of them is go straight to a parquet data without ever having to go through the Python types. We also will have an array type which kind of will go with it to allow you to basically define a table. That's probably the biggest thing. And then there are some other that we're discussing whether or not we add a binary input type, which would be probably protobuf, to make it very easy to basically serialize Pydantic models over GRPC or something like that. Not quite sure about that. But yeah, there are some cool things you could do if we remove those constraints, obviously. David Hewitt, who now does a lot of work on Pynantic, is also the maintainer of PYO3, the rust bindings for Python. So we're kind of pushing the limits of what you can do with Python and Rust in Pydantic.
Gregor Van
Awesome. Yeah, very exciting. So, yeah, hopefully for anyone who's seen Pydantic in the title of this and came to hear about that, then there you go, there's the update. So let's move on. The kind of next, I guess library product that came along was Logfire, I believe. And am I right in saying that also was when I guess Pedantic as a company got venture backing as well? Is that fair to say?
Samuel Colvin
Yes. So we, end of 22, beginning of 23, Sequoia wonderfully reached out to me. It wasn't a company before that, it was just me working on it. And so I started the company as I got the seed round. So yeah, raise the seed round beginning of 2023. So I had going back a little bit on pydantic, like early 2022, I started working full time on Pydantic, doing the rewrite to Rust. About eight months into that three month project, I, I was halfway done wondering how I was ever going to finish this Sisyphean task of rewriting the whole thing while it was blowing up in its usage in the background. So yeah, the first thing we did when we raised money was hire a team and go and release V2, which we did in the middle of 2023. And then we started looking around for what to build. And I had actually owned the Logfire domain name from like 2019. I had felt that logging, as I would have called it then, or tracing in Python, was breaking broken, or at least nothing like as nice as it should be. So we were trying to work out what we were going to build on the commercial side and we settled on building this observability platform, which is now Logfire.
Gregor Van
Awesome. So yeah, let's just, as you say it's observability, let's just take sort of five minutes on. You've kind of touched on it about what is Logfire, how does it work, why is it there?
Samuel Colvin
It exists because I wanted the experience of instrumenting your Python application to be as simple as writing the rest of Python and OpenTelemetry had come out. OpenTelemetry is a wonderful open standard for doing observability. It means that there are SDKs out there for every language basically that you might want to use. And there are things like the OTAU collector that can proxy the data, spin it out to multiple different back ends. Almost every platform out there supports OpenTelemetry. Now, the problem was that they made this pragmatic decision early on in the development of the SDKs to have the same API in every language. And that makes a lot of sense in some ways, but it means you can't do all the neat things you can do in Python. And I think it's also fair to say it's sort of managed by teams within the hyperscalers and the big observability companies. There's never been anyone else who has been particularly interested in making it easy to use as a library. And so the first thing we have is Logfire SDK, the PIP install, logfire. Incredibly nice experience for tracing. I think it's fair to say the nicest way of doing tracing in Python that's just emitting opentelemetry data. We do some clever things to make it better than normal. OpenTelemetry. We allow you to serialize a Pylanic model or a data class or even a datetime and we'll record data about that which isn't supported by default, OpenTelemetry. And then the commercial bit is the LogFire platform on the back end, which is a closed source observability platform, which is what we charge for. Although actually we have an amazingly generous free tier, possibly too generous, but now we've made it, I think we're not going to change that limit where you can look at your logs or your traces and technically it's all tracing data, but we make it look, it feels like logs. It's instant. They'll come through as your application is running and then you can just basically click expand to dive into what's going on within a particular task or HTTP request or whatever else. But we also do metrics, analogs, so we have full observability. I think the other thing that has changed, there are two other things that I guess make LogFire unusual. One, we let you write full SQL to go and query your data. So it's ultimately an analytical database with a nice UI on it and an SDK and all that stuff that's useful for developers who want to write some SQL. It's easier than having to learn a new DSL. But the really powerful bit is AI's love writing SQL. So the single best they call it like AI as an SRE and there's a whole industry of companies trying to do this. Honestly, the best experience I have seen for that is connect Claud code to logfire via MCP and ask it go fix a bug or go and investigate the slowest endpoints or go and find out why my users are churning and suddenly Claud code has visibility into all of your actual application data and it can go and investigate. So that's incredibly nice for us. Outcome of supporting SQL. I don't think any of us knew that's where AI was going to go when we started that back in 2023, but it's definitely super powerful then. The other thing that's different about us is we have first class support for the AI observability stuff. Things like evals, things like token usage and pricing, but we're also general observability because I don't think that in five years time anyone will talk about AI observability, it just won't be a thing. In the same way no one talks about cloud observability or web observability, it's just going to be a required feature of any observability platform.
Gregor Van
Yeah, lots of interesting kind of nuggets there. I mean as you touch on AI as SRE and there's obviously a bunch of companies just taking that and I think it's then very interesting where actually if you just cloud code to a good library, then there you go, you've kind of got it. And I think we're just seeing that where I'll call these sort of low level platforms are still beating out anyone who comes along with a sort of specific product around that. So yeah, super interesting. We're going to move on to pydantic AI which is the main topic for today. So again let's just talk about where did it come from And I'm sure there's maybe a bunch of people listening right now thinking oh well of course pydantic, of course they were just going to do an AI thing, but I know that's not the story, so let's talk about it. Where did it come from? What is it?
Samuel Colvin
I mean in some ways we were doing the opposite. Like we were reasonably cynical about some of the AI stuff for a long time and that's probably why we didn't build an agent framework in 2023 as others did. In some ways that probably turned out to be a good decision because we waited for the patterns to settle and we were able to build something that has probably influenced the patterns a bit. But we've also been able to read what others are doing. Whereas those who created agent frameworks or equivalents in 20, 22, 23 are kind of stuck. Either they have to go and break their API again or they're stuck with primitives. I think we've now moved on from. So yeah, come late last year we were starting to build AI functionality into logfire. I knew all these agent frameworks out there used Pydantic, so obviously LangChain, Langgraph, Crewai, Llamaindex, all of these guys used Pydantic and I assumed there was going to be a good one that I could go and use. I started looking at them and was super disappointed by what I found. They're not type safe. I think type safety is incredibly important and only getting more important with AI's writing code. If you look at the standard of engineering among the top 100 or 200 Python packages, it's pretty high. Everything has coverage, everything has pretty thorough unit testing. They have CI that does releases, they have typed documentation, tested documentation, stuff like this. That was none of those low level things that sure, no single one is a showstopper, but they're kind of indicators of quality. Seem to be the case with any of the other agent frameworks or LLM libraries. And so we thought for a long time about whether does this. Or thought for a bit, is it really worth us going and building another of these things? It seems like a kind of gold rush. Don't we want to make the spades? But when we realized that we decided people weren't doing it the way we would, we decided to go and build pydantic AI. So we tried to keep it relatively unopinionated and low level. Try to do the things that you definitely don't want to have to reimplement again and leave the kind of opinionated how am I actually going to make this thing work with an LLM up to the end user? Because we're not the AI engineers, we're the people who are good at building libraries and we want to let you go and innovate on how exactly you're going to use an LLM. Is that a good starting point?
Gregor Van
Yeah, I mean I think just to make sure we're not breezing past assuming knowledge from the audience. I mean pedantic AI ultimately can do things like it can call LLMs, it can create agents, do function calling, do evals. So it's agent orchestration, would you call it or not exactly.
Samuel Colvin
Agent framework agent orchestration. We also have a graph library. We're about to have a new version of our graph library, which is a bit less boilerplate than the current graph implementation. I mean, I think there's some debate about how valuable graphs are. They don't do anything particularly special you can't do in other code, but they definitely can be a nice way of thinking about it. So we have that support. I mean, for the most part, if you're building an application with LLMs, Pydantic AI will let you get going much more quickly, but unlike some of the other agent frameworks, will go on to be usable in production and will let you do the customization that you want to do. Where we probably what we don't have is necessarily all of the integrations or all of the batteries included. Here's a button to add support for whatever database or whatever rag service. We would rather let you build that because in production that's probably what you want to go and do anyway. And as I said earlier, I think type safety is absolutely critical. We have a fairly unusual way but type safe way of doing dependency injections so that you can access dependencies within tool calls, which is, I mean a lot of it's inspired by fastapi. Worked with Sebastian a fair bit on, not so much working on fastapi, but we talked to him, fair bit within the team and definitely a bit inspired by fastapi. But actually given that there are new typing concepts like Concatenate available in Python now, using them to give the most type safe experience you can.
Gregor Van
Yeah, and I wanted just to touch on kind of before we get into more of the agentic and tooling side of things, type safety obviously huge. That is pydantic. And I mean that is by definition constraints. And how have you thought about just that way of approaching things when it's come to pedantic AI? I mean, have the concept of constraints come into it in a way?
Samuel Colvin
Yeah, I mean, I think one of the things I'm realizing is people sometimes blur what they mean by type safety. I would say there's data validation or type validation. That's what Pydantic does. I have some untrusted data, I have some Python types. I will guarantee to give you an instance of that object that matches those types or raise a validation error. There's that thing and obviously we support that within PYNASTIC AI. I think we have some of the most advanced support for different ways of doing structured outputs. We support tool calling for structured outputs built in structured output that some models support and then what we call prompted outputs, where you basically give the model a JSON schema and say try and match this. But that's when I talk about type safety. I'm actually talking about static typing, like using the types that are available in Python to do relatively complex stuff. So for example, agent is generic in the output type. That means that when you access the result or output from from an agent run, that will be at typing time an instance of the output type. But we also guarantee it's an instance of that at runtime with pydantic. But we also go much further, like I say, dependency injection, type safe graphs, which again, the biggest downside of other graph libraries is you basically, sure you have this possibly useful mental model of a graph to describe things, but you lose all of the type safety that you would expect in other bits of your code base. We have a way of supporting graphs that is type safe.
Gregor Van
Got it.
Samuel Colvin
You're a developer who wants to innovate. Instead you're stuck fixing bottlenecks and fighting legacy code. MongoDB can help. It's a flexible, unified platform that's built for developers by developers. MongoDB is acid compliant enterprise ready with the capabilities you need to ship AI apps fast. That's why so many of the Fortune 500 trust MongoDB with their most critical workloads, ready to think outside rows and columns, start building@mongodb.com build.
Gregor Van
So let's move on to actually kind of, I guess, the usage, so to speak, but I'll kind of just throw out. It's a very generic question, but I think it's maybe something that can lead to more discussion around how pedantic AI is approaching this. If I was to say, what is the correct number of tools to expose an agent to? Very, very big. Right? So how do we think about that?
Samuel Colvin
I think people talk about 10 to 15 max. I think it's interesting that that number has not moved this year. Although the same people claim models have got way brighter. And I think what has actually happened is people's models have got cleverer. But our idea of how big an agent should be has decreased this year. So this is like everyone was still. I remember in February I was AI engineer, everyone was saying this is the year of the agent. Well, that's true. I don't think that means we're going to stop Using agents. But one of the things that's changed is our definition of how big an agent is. So there are ballpark three definitions of what an agent is. There is the like AI definition which is an LLM calling tools in a loop until some condition is met. There is the engineering definition of an agent, which is effectively a microservice. And then the joke is there's the business definition of an agent, which is something that can replace an employee, ignoring the third one for a minute. If you think about the first two LLM calling tools in the loop and a microservice, at the beginning of this year we thought we would have our agent which would be a microservice and inside it it would have one agent in the code sense. It would be given all of the tools, all of the context, and it would iterate until it magically arrived at the answer. I think we have for the most part moved away from that idea down to the idea that we have multiple different agents that you piece together to give some constraints on what your application is able to do and is therefore make it more deterministic while still giving the LLM the kind of space to innovate. So concrete example, let's say we have a deep research agent. We think of that at the business level or the infrastructure level is one agent. If you look inside, what's happening is you might have a planning agent which generates a plain text description of the plan that you're going to execute. Then you have an agent which will extract structured data from that plan, turn those bullet points into some structured pedantic model of like here are the steps that we're going to go and execute. Then we might use one agent for each of those sub steps. And then we have a final agent that basically takes all of that context and outputs our final summary of what's happened, the kind of research. And now if you think about that system, as sure you can think about that as agent orchestration. Again, AI people love inventing new words for existing concepts. For the most part, agent orchestration, we have ways of modularizing code. We've had them for 40 years. They're called functions and classes and we don't need to invent new ones, it turns out. But that thing at the beginning of the year, people would have said, oh yeah, I've got deep research. It's just one agent that goes off and runs with access to these many tools until it magically arrives an answer. I think we've moved away from that. And there are lots of reasons for that we can switch which LLM we use for each of those different tasks, even which provider we can switch in and out, which search we want to use and we can debug it more easily. We can work out which of those things went wrong if you just give all of your context to an LLM and hope it gets it right. It's magic when it does and it's unsolvable when it doesn't.
Gregor Van
And maybe this is seen as one of these AI faddy terms, but I think it's probably something that developers have heard. The idea of swarms or agent teams. I'm sure you think of it much more nuanced than that. So I mean, in relation to what you've just been saying, I think most.
Samuel Colvin
Of these terms are pretty much bullshit. I think. I mean our big thing in pydanzing is AI is still just engineering. Sure, what LLMs can do is borderline magical, extraordinary. If you had told us this is where we would be five years ago, probably none of us would believe it, but how do we go and use that? We apply the engineering principles that we have learned and improved on over the last 20 years, 40 years, however long you want to think about it. So yeah, I mean, I'm not going to show code now, but I had a deep research implementation that I wrote the other day where yeah, you can think about an agent swarm that is like I call the same agent many times in parallel to go and do research. Then I take all the results and I pass them to a more powerful model and I get it. To summarize the result, there are more complex workflows, but it's very rarely actually a complex graph. I mean, I think that's one of the things you notice if you try and go through Landgraaf's documentation, go and find interesting, genuinely innovative graph example doesn't exist.
Gregor Van
I mean, yeah, you said you're not going to bring up code that's good because we're an audio only output here. So I'm not going to narrate code. But let's talk about how you would think about this. You're somebody who's obviously working in this realm day in, day out. So if you were going to be building, let's just take the obvious example like customer service agent, do you have a framework for what are you going to expose if we're talking across a bunch of tools and how, how do you think about adding and removing in the sense, like what's your kind of experimentation process on that as well?
Samuel Colvin
I'd Say a few things. I'd say, first of all, unlike in traditional applications, where any experienced engineer can basically eyeball what's going to be performant and implement it first time, that is not going to be the case with AI. You're going to have to go and try a bunch of things and throw stuff out and try again. And so the two things that in my opinion matter there are type safety. Because you want to go and refactor. It's a heck of a lot easier to refactor if you've got type safety. If you want to tell Claude, go rewrite this to work in a different way. It'll do a heck of a lot better job with that if you've got type safety. Second thing is observability and observability from day one. Observability is not something that you shoehorn into your application the day before launch because someone told you you should. It's genuinely useful from day one trying to work out what's going on. And then thirdly, I think evals are important. Evals are a powerful mechanism to kind of allow you to have at least a chance of systematically improving, rather than kind of random walk, but also just like digging in and trying to understand what it is that the LLM is actually doing. They're very rarely. It's extraordinary how what they can do, but their processes are reasonably easy to follow for a human. They're not meaningfully more intelligent than us, so we can go and read through it and understand where they've come from. A decision in general. I mean, there's a great talk from Barry Zhang at AI Engineer at the beginning of this year called think like your agent or something like that, or think like your model. The idea is it can be really hard to work out what data your model has access to versus what data you have access to and therefore understanding what mistakes it's likely to make, which basically a lack of context. One of the examples I like to use with this is if you give an LLM data in the form of some bullet points, do a pretty good job of understanding the different bullet points. If you give it that same data in the form of a markdown table or a CSV table, it does a much, much worse job. And yet you and I look at the CSV file and open it up in Excel or look at it as a markdown table. Pretty easy to see what's happening. We can look down this column. But if you imagine how an LLM sees your data, which is effectively as one long line of bytes. Now, trying to correlate where is like comma 73.4. Go back all the way to work out which column that relates to. It's incredibly hard. And so you try not to give access to tables in that form. If you can give it access to basically XML or JSON where it has the key each time, it'll do way better. But that's just one example. But there are many examples where ultimately the problem is you have failed to give the agent or the LLM access to some key data. It needs to solve the problem that you implicitly have, but you haven't realised you have it because it's kind of so obvious to you.
Gregor Van
I guess, then, taking that, I want to maybe go slightly back to. If we think about multi agent and that question around, are we talking about a number of tools in one agent or are we talking about multi agents? Again, how do you think about that? It's sort of organizational design almost. You've already touched on the idea of. Think of it like a human. Are we talking from a human perspective? It'd be like, oh, are we talking silos? Are we talking about communication of teams who get to talk to each other about something? Again, how do you think about that? And this is also then thinking about, well, how do we look at shared memory or messaging or the concept of voting between them? How do you look at that?
Samuel Colvin
I mean, I think the first thing to say is if you have a isolated task and you can take the context that's required for that task and move that into a separate agent and call that within a tool or call that in a separate step, it can be a great way of reducing the amount of context that the main agent has. So let's say you're building a research agent that has access to go and query some big SQL database. Now, the obvious thing to do is to go and smash the whole of your schema into the main agent and it now has access to all of the information it might need and it can go and write SQL to query that data. Well, fine, but you've. Now, if you're combining that with like some other significant tasks, you've got an awful lot of stuff in your context. If you have a tool that is called like run aggregation, let's say, and it takes a natural language description of the data that it's trying to find and then within it it calls a separate agent which that has access to all of the schema and the database context and examples and stuff like that. Then one, we have a system that's way easier to debug because we can go and run the SQL agent, see when it works and see when it doesn't and write evals on that in particular. But two, the main agent that's got to do that and also a bunch of other tasks, look up some rag database, worry about memory access some NoSQL database, blah blah blah blah blah. It doesn't have to think about that at all. It just gets this plain text tool, describe the data that you're looking for, a bit of context on the kinds of attributes you like and you've gone down from thousands of tokens of context for SQL down to a few hundred or tens even to describe that. Get data Endpoint.
Gregor Van
Okay.
Samuel Colvin
Capital One's tech team isn't just.
Gregor Van
Talking about multi agentic AI. They already deployed one, it's called Chat.
Samuel Colvin
Concierge and a simplifier in car shopping.
Gregor Van
Using self reflection and layered reasoning with live API checks.
Samuel Colvin
It doesn't just help buyers find a.
Gregor Van
Car they love, it helps schedule a test drive to get pre approved for financing and estimate trade in value. Advanced, intuitive and deployed. That's how they stack.
Samuel Colvin
That's technology.
Gregor Van
At Capital One we're going to move on to graph theory or graph theory meets AI if you want to call it that. I think just did some sort of pre research obviously as I do before all interviews. I think you've talked about the concept of graph theory meets AI if you like. So I believe there's this concept and whilst I'm not in this domain, so audience forgive me on this one, but dags directed acyclic graphs. Could you talk to us a bit about that and sort of how this is sort of looking at the process by which an agent might move along its kind of steps. I think in theory one is kind of quite linear and the other is you can kind of have a cycle and I believe pedantic AI kind of prefers one approach over the other.
Samuel Colvin
Like I say, I think the jury is out to some extent on graphs and their use. I mean I think most people when they talk about dags they mean effectively a graph with some dependencies that relate by node. The graphs that you end up building if you're using an LLM are quite often cyclic, as in there's no reason why you can't have cycles in there. We have pydantic graph which is part of pydantic AI. It's used under the hood by agents. I'm a bit torn on how valuable they are. Someone said to me recently that the most useful thing that graphs do is make people who want graphs happy. And that is a very clear definition of why a graph is useful. How much value are they after that? I think not that much. I think that to be rude for a moment about a capacitor. I think LangChain had taken a lot of heat for LangChain and how it didn't have any functionality. They chose to go and build Langgraph because that seemed like the right thing to do. And bluntly, they didn't understand how to do durable execution. So they had graphs as a way of snapshotting and now they're stuck saying graphs are the right way of doing it because they can't go and build a third library. That's their new way of doing it. So what's happened? I mentioned this earlier, that we were able to adopt the new way of doing things. One regard in which I have sympathy for them is that since they released Langgraph, everyone else anthropic, OpenAI, Google Us and a bunch of other agent frameworks have all centered on this model of agents of LLMs calling tools in the loop, which are a very powerful primitive. Now you can implement that with Landgraaf, but it's a lot simpler just to go and use our agent or even OpenAI agent implementation. There's a reason that one is so similar to ours, which is it's, I'll say, inspired by our agent implementation. So the other thing that Langgraph lets you do is snapshot at the end of each node. And so if something fails you can go back to that point. Now that works. But if you want to have that and parallel node execution, so run multiple different nodes in parallel, you basically have to abandon type safety completely. You have to manually check that your data is consistent. Our approach is different. We support durable execution frameworks like temporal debos and we have a bunch of others coming soon and they let you get the durable execution. This idea of an agent that can run for minutes or hours and resume basically pick up from where it left off, if it stops or if you get errors, which I think is a much, much more powerful way of getting the longevity part of long running. Agents.
Giving a talk yesterday on temporal and durable execution. Obviously you can use durable execution with graphs to run your graph within a durable execution framework. And now you get that same snapshotting, effectively behavior restarting a graph from where you left off. It's much more fine grained. It's literally snapshotting at every async call and the code should be much easier to write because you don't have to worry about this snapshotting that gets in your way.
Gregor Van
I mean, just I guess talking sort of slight layman terms here, but if we're talking about failure modes as such, and okay, something fails and there's this concept of snapshotting, but is there a concept of being able to kind of move on even though some part of the chain failed?
Samuel Colvin
So our graph implementation, we have some basic snapshotting. I think we will. I don't think we'll retire it because people are using it, but I think my approach would be durable execution is the way forward. Like these are solved problems people like temporal, but they're not by any means the only one. There's a whole space of those companies have done an amazing job of giving you ways of writing what feels like normal procedural Python code, but if you get a failure, it will automatically be retried. If you want to go and sleep for three weeks before the next task needs to be run, you just sleep for six weeks and it will take care of restarting the process from the right point. That stuff is, I think, really powerful. I think it is a far better solution to the same problems as graphs.
Gregor Van
So let's talk about the general devex. How did you think about that? I mean, I guess were there any things that you've learned through especially Pydantic v1.2 and maybe even how you've been thinking about v3 as to how the devex feels and looks for someone coming into Pydantic AI?
Samuel Colvin
I think that we know we're battle scarred by introducing the wrong API and having to maintain it for a long time. And so we're very careful about the surface area and not just adding in any old thing that someone wants to add. I think others who are earlier on in the arc of maintaining open source perhaps don't necessarily think that way. I think that the extraordinary powerful thing about code and about good open source libraries is people can go and use them in ways you never thought of. Right. Pylantic is used for a myriad of things I never thought of when I started it and lots of things I to this day there are hedge funds for example, but other large organizations who do stuff with Pylantic that I had never thought of and that is if you build a really powerful tool, the point it is universal enough that people can go and do things you hadn't occurred to you. We want to build an agent framework where we build those fundamental things. You don't want to have to go and repeat. So the very simple thing an agent will do is if you're doing structured data extraction and the model gets the response wrong and you get a validation error, it will return that validation error to the model and say please try again. That is a very neat, very nice thing that will very often catch intermittent or intermittent bugs. You do not want to have to go and implement that again. It may as well exist in a library where you're sharing that implementation with everyone else. But then how you go and implement rag, for example, is far more opinionated, far more context specific, far more room to go and innovate and try doing unusual things. And we don't want to get in your way of letting you do that. So we're, I think, yes, strongly of the opinion that we're trying to build the right foundations rather than give you these high level abstractions that tell you what to do, but constrain you in doing it well.
Gregor Van
You touched on it in reference to pure pydantic. But when we think about case studies or things you've seen pydantic AI being used for, I guess could you just pull out some of those that come to top of mind right now in terms of either things that you've just been very impressed by, or I would say maybe more interesting than things that even you hadn't thought of pedantic AI going to be used for.
Samuel Colvin
I mean, I'm trying to think there's someone in our public Slack who's written a coding agent with pylantic AI which is a neat working implementation that works very nicely. I think again, coming back to the log fire and the SQL thing, it's amazing how powerful agents can be if you just hook them up to a SQL connection and let them retry a bunch when they get the SQL wrong. We've done some stuff with Pylanski, but I've seen others do it where you're effectively doing data analysis with a SQL tool. The other interesting thing is how much has come and been implemented by the community or from pull from the community. So AGUI integration, AGUI is a protocol for talking to a UI to a chat interface basically. But you have rich components that was implemented by someone else. But also some of the model implementations are either implemented or maintained or support improved by the community. Have an awful lot of people coming and contributing to it, I think. You know, one of the unfortunate things about maintaining open source is you often don't see the most interesting things people are doing with it because those things end up being proprietary But I hear quite often from people. Once I found Pydansk AI, suddenly I had found an agent framework I could actually bear and now that's the only one I will use. And I hear that roughly that line from experienced engineers all over the place. And that gives me, you know, that feels great because that's where I come from. Right. I'm not a cursor type developer. I didn't start developing with cursor. I spent many years learning it the hard way. And there are lots of other people who come from that deep engineering experience for whom Pydantic AI resonates.
Gregor Van
Yeah, I've spoken to many, especially through this podcast, many open source maintainers and I'm not one. And I always just assume that they kind of know all the projects that are using their libraries, especially sort of the top ones and they're like, no, I actually have no idea. Not no idea, but sort of. It's very hard to keep on top of the web of things that it's being used for state management. Do you have any sort of recommendations for then where that should be managed if using Pydantic AI?
Samuel Colvin
We have some neat examples in our demo repo of managing memory with either tools. So you have a record memory tool and a retrieve memory tool that works surprisingly well or just recording all messages and then doing a bit of work to cut off some older messages when you like, if you have very long running conversations going on, both of those work fairly well. I think the fundamental. Again I come back to it, but I've said it before, but I'll keep saying it. We're not trying to give you the high level opinionated here. Is it like I know one of the other agent frameworks has like three different memory implementations for short term memory, long term memory, contextual memory. It's very unclear what they do under the hood. We let you do tool calls, we return you structured Python objects as messages, you go implement the thing you want. If it's a simple demo you want to build, the simple thing will work. If it's a production application where you've got lots of nuance, those pre built ones aren't going to work for you anyway. I think we might move a little bit more in the direction of having a, like having support for storing messages easily in a database, basically an abstract base class and a few implementations for that because it's common enough pattern, it makes sense. And I think we're thinking still about how we'll have embedding support soon. At the moment there's no embedding support in pynastic AI because it honestly hasn't come up. It's one of the most upvoted issues, but it hasn't been a burning need for the most part.
Gregor Van
Is that as in like actually creating the embeddings?
Samuel Colvin
Yeah, like generating the embeddings API. Then once we have that, which is relatively simple API, do we then go further and have concept of rag and hybrid search or vector search or, or do we just say, yeah, here's an API for generating embeddings, how you go and implement the next phase is up to you. Yeah, I mean the other place I mentioned Agui already, we also are about to have support for Vercel AI Elements, which is another protocol effectively for communicating with chat UIs. So the principle is you should be able to build a chat UI with really a few lines of JavaScript or no JavaScript at all, just using a pre built UI and then you, you can go and do your innovation within the agent however you like in Python.
Gregor Van
And then just to kind of, I guess tie a bow on it observability, I guess Logfire, that's the sort of maybe batteries included piece if you want to.
Samuel Colvin
But again, we work really hard to follow open standards. So Pydantic AI emits standard compliant OpenTelemetry data. A couple of us are reasonably involved in the otail sig for Genai, so we like push OTEL to work the right way for Genais and then we support that. And so at my last count, There are about 13 different observability platforms that support pedantic AI one way or another. We obviously think Logfire is the best of those, but unlike some of our competitors, we're not trying to use a proprietary protocol to kind of lock you into using R1. We think that we'll win because we have the best agent framework and the best observability platform and we can kind of guarantee they work well together. But we're not trying to stop bringing a different agent framework or bringing a different observability platform.
Gregor Van
Yeah, and I definitely see that trend with a whole bunch of platforms where ultimately they are open source and they have maybe different sort of arms. But the key, I would say part of their success is just maintain that thing as an open standard and if somebody wants to swap out a bit, that's completely fine. But at the same time you've got three to four bits of the ecosystem that you still know will work well together. If you just want to default to.
Samuel Colvin
Something, I think it's valuable if you're an enterprise, that you can come and you have one company you can come and shout out when the two don't work well together. That's the powerful bit of what we have, is we control an awful lot of the stack. Right. As in people on our team, although they're not directly part of Pydantic. Marcelo maintains Starlet and New Vehicorn, which is basically the modern networking stack for Python. So from Pydantic to Starlet to uveacorn to Pydantic AI through LogFire SDK to LogFire, most of those bits are literally under the Pedantic umbrella. And even if they're not, we're pretty involved in the ecosystem. And that is, I think, where companies of all sizes with some engineering taste, but particularly enterprises, find the value in like the one solution or the one point of contact for many different solutions.
Gregor Van
Yeah, absolutely. Being on the side, shouted out on that basis, but that's a good place to be. So we're going to hear about something kind of exciting in a minute, I believe, as a new product in the wings. But just to kind of wrap up this again, I'm just the voice of the audience here in terms of LLM gateways.
Samuel Colvin
Yes. So we're about to release Pedantic AI Gateway. I think by the time this goes out, it should be launched in particular from Enterprise. It's a feature we just hear immediate need for. It's the hair on fire problem right now. There seems to be no good go to solution. It makes sense, right? You're a financial services company, you're expecting to spend 5 million a month on OpenAI. Are you going to give everyone in the company an OpenAI key, where technically they can spend $5 million and then some intern leave something running overnight? And you've spent a big chunk of that, obviously, or not. And you want observability into what's going on. And that's if you only have one model. Now, what if we're doing some research and we have Anthropic Gemini Mistral, Grok going on? It obviously makes sense to have a single platform to manage those things. And once you have that platform, it's a very useful place to do a number of things. So whether that be caching or security or fallback. And so we just kept speaking to enterprises who didn't have a good solution for this but needed it. And so, yeah, we're about to release AI Gateway. It will obviously have a very nice, easy integration with Pedantic AI. So you can basically set gateway and then the provider and the model and your one API key will then let you connect to all of those different models without having to go and put a credit card into each of them if you're getting started. And yeah, so we'll allow to basically let you use all the big models through one gateway. Initial launch, we won't have many of the shinier features, but they will come very soon afterwards. The caching and the fallback and the security stuff, very exciting. Yeah. And then the actual gateway itself is open source, but the console, the platform for managing it is closed source and that we will sell and part of it comes back to. It's all very well, but we can make the process of getting going with building with LLMs incredibly simple. But at the moment there is this barrier that a lot of the LLM providers are just like their platforms are not that easy to use and understand. And we want a really nice way of allowing developers as they get started to go from zero to I have an app running with an agent in it and I have observability very, very quickly. And that makes complete sense. And obviously one of the neat things we can do in the gateway is we can emit opentelemetry from within the gateway. So if you're a large organization and you definitely want to record all prompts and you want to look for phishing or for prompt injection, you can do all of that stuff in the gateway with logfire, which is why they kind of play well together.
Gregor Van
Yeah. And yeah, just to call out. So we're recording this the third week of October, I believe it's sort of end of October. Is the release on that one.
Samuel Colvin
I've got a big night ahead of me because we were hoping to get to the private beta tomorrow. I think it's now going to be Monday and then, yeah, hopefully 31st of October, we'll do the public announcement and let you sign up, put a credit card in if you want to use the models we resell, or bring your own key if you want to put your own key in. And that'll be free.
Gregor Van
Yeah. So, yeah, just to call out, thank you so much for coming on, given that this is your evening over on the west coast and as you just said, you've got a big night ahead of you. I misheard you saying you've got a big night ahead of you at the pub, but you were about to say public.
So, yeah, thank you for making the time. I thought it might be nice just to kind of round out With a little bit of hacker news feedback actually, because I think it's kind of fun when people that have given and don't worry, there's no kind of gotchas here. I think it's kind of interesting. Six months ago someone said I found that Pydantic AI framework strikes a perfect balance between control and abstraction. What would you say to that?
Samuel Colvin
I think that's exactly the kind of how we've tried to think about it. And in particular, I mean I remember speaking to people beginning of this year who would say I don't need an agent framework, I don't want all those abstractions. The one thing that I want is the model agnosticism and be able to plug into any of the big models, not have to go and use the OpenAI or Anthropic or Google SDK and then be stuck to them. And so to some extent we tried to build it as model agnosticism without too much on top of it. We've added a little bit but it's all opt in and fundamentally the agent is pretty simple, some pretty minimal behavior on top of the standard LLM calls but with that nice unification so you can switch model very quickly.
Gregor Van
One more. Yeah, just I've been building an integration with Pedantic AI and the experience has been great. Questions usually get answered within a few hours and the team is super responsive and supportive for external contributors.
Samuel Colvin
Yeah, I mean I think we take that very seriously. We care about the kind of response rate on GitHub even on the open source all replying on Slack. One of us will reply to your message on public Slack almost always within an hour or two in most time zones. We've been there, we're developers. Right. I'm writing code most of my day still luckily maybe I should be doing more sales but hey, and we care about that stuff and because we are ourselves open source developers, that is one of the things we care about most. We had a sales call earlier with someone from Enterprise and they were compelled by what we showed them in logfire but really the reason they came to us was the solution they were using before had 290 something open pull requests that no one ever responded and that was the thing that had actually driven them to be like what else can we use? And came to us. So I think that responsiveness and engagement with the community is a big part of what makes us different.
Gregor Van
Awesome. Yeah. And obviously I said no gotchas but also why not pull out something that maybe there's someone in the audience that is saying this in their head anyway. And there's always going to be people that aren't just throwing all positive comments, but someone saying, I really wish Pydantic invested in Pydantic instead of some AI API wrapper.
Samuel Colvin
Fair. And I think we probably one of the reasons we haven't been that much recently is when we made new releases of Pydantic, mostly people were annoyed that we broke things because we fixed someone else's thing. And so Pydantic is a big established library. We are now. David Hewitt, as I said, is working an awful lot on new stuff within Pedantic. You will see big new features come out. The other thing though to say is go and look at the top 100 most downloaded Python packages. Look for eponymous companies. In that list you will see four companies. You will see Google, Amazon, Microsoft and Pedantic. If you look in the top 25, you will see, I think only Amazon, maybe Google and us. Now, per head, per dollar of however you want to measure it, we are one of the most impactful companies in terms of how much open source we do. But we have to make money, right? We're a startup. We've been given money by some big VCs to go out there and make a profit. We're not a charity. I find there was someone who worked, hilariously worked in the CTO's office at Google, who said, oh, I'm not sure about Pydantic. Now they've raised money. I'm not really sure about these open source projects that are trying to make a profit. Saying the same thing about Astral and Rough, he ended up deleting his comment because he obviously realized he was on the wrong side of history on that one. But like, there are a certain number of like millionaire communists, particularly in California, who would love open source to be, you know, entirely benevolent, but they get paid an awful lot of money by big tech companies who don't particularly contribute back. So we subscribe to the pledge Open source pledge. So we give $2,000 per year per developer to open source. That's above and beyond all the open source we do. We think that stuff really matters. I see very few of the other bigger companies doing that and I suspect the person who made that comment doesn't work for a company who does that. So I think we do enough for open source. I'm pretty proud of what we do and I'm pretty robust in rebutting anyone claiming that we don't do enough.
Gregor Van
Yeah, absolutely. I wanted to bring that out because I very much stand with with you on that one. Both seeing what Pydantic does as well as many other companies that get the same heat just because they switch focus a little bit to something that happens to have a commercial arm to it and somebody gets all rubbed up the wrong way.
Samuel Colvin
And I'll tell you that if in the unlikely event Pydantic doesn't work out and we end up shutting the company down or being acquired by someone that people don't like, we'll get a lot more heat for it then. Well, the reason we're trying to make money is so that those things don't happen. Right. Because maybe I'm not as nice as Guido. I'm not going to maintain and obviously Pylantic is nothing like as big as Python itself. But like, I am not going to go and maintain like Pedantic on my own for the rest of my life being paid like I was probably getting towards $40,000 a year when I was maintaining it on my own. That's not me. I'm not going to do that for the rest of my life. So these projects, the company needs to make money to support both our open source and the wider ecosystem?
Gregor Van
Absolutely. Just kind of talking about, I guess, the company today. How many people do you have in the team now and are you hiring? Is this a good place to plug any hiring or.
Samuel Colvin
We will be hiring a little bit over the next few months. The blunt truth is that we, I think that it's got easier and easier to apply for every single job you can think of and the number of pretty poor applications we get now whenever we put a job out up, we're being more and more like targeted in looking for particular people. So. And unfortunately we're a small team. We don't have capacity for like junior people or interns in general. And so I'm afraid if you email me, being like, I'm a big fan, I'd love to do an internship with you when I finish university. Unfortunately I'll try and get back to you, but the answer will be no. We're always looking for like really bright, experienced engineers who have a proven track record in open source, in Python, Rust or Typescript. But if you haven't got a like pretty impressive record in that direction, we probably aren't going to. It's probably not going to work with us. Yeah. But yeah, we will be hiring. We put everything on social media and would love you to apply if you match the conditions on the application. But please don't email me. Our first rule of anyone that we hire is they need to follow the hiring rules which start with email careers at Not Samuel.
Gregor Van
Yep. I've been there as both a past life owner of a company where developers are applying as well as them being a CTO and having people sort of try and get around the process and I just politely say, please follow the process. You're not going to get anywhere just by emailing me. Unfortunately. All these stories from back in the day. Oh, I guess the email address is just nonsense now or it should be. There's a process usually for a reason.
Samuel Colvin
So I'll just add one thing though. I mean a lot of the great people we've hired have done significant stuff in open source. Now don't think that as a three weeks into your Python career you can go and like start using cursor to generate pull requests on Pylantic and we'll go and hire you for loads of money. But like we have found some incredibly talented people who probably would have been overlooked by bigger companies by finding people who have like been working away maintaining awesome Python libraries for a long time. And so I do think that if you can get into open source, if you can go and do the hard yards of building up a reputation in open source, both us and many other companies will hire you. So I do think. But it's not a quick fix. It's not like you can't just go and like buy the right crypto coin and suddenly be a millionaire. Right? It takes you five years of learning. How long does it take to get 10 years of experience? It takes 10 years and you can't really accelerate that. Arguably AIs make that even harder because the discipline, you need to learn it yourself when an LLM will probably get something approximately right. Instead it's getting harder.
Gregor Van
Yeah, it's something we've discussed a few times. We have an SED News Monthly and Sean and I have discussed this a few times. Just sort of where's the tipping point between people coming in now as developers and not to in any way dissuade people or. But yeah, there is no point in just firing up cursor and then thinking that you're a programmer. It doesn't work that way. And it's not. We're not, oh, we're the old guard saying you can't come into the club. It's not that whatsoever. It's just that engineering is a fundamental. There's all these principles and concepts that just seem helps if you understand them before the code that's written that you're then reviewing or editing.
Samuel Colvin
It's a bit like whether you're flying a plane or driving a truck. Sure, you can put it into cruise control where, when you're going down the motorway, but when you get to that narrow lane where you need to reverse round a corner, you still need those expertise. And in the end, sure, I use claw code a whole lot and it does lots of things for me that I don't want to have to go and write all of those react components. But when it runs into some weird bug, when you need to set up exactly how you're going to share type safety between the front end and the backend, that still requires me. And I don't think there was a. And maybe we're about to have AGI, we're all redundant. But until that point, however little time it is, you still fundamentally need a truck driver in that truck to reverse around the corner, even if they spend a bunch of their time in cruise control. And the same is true with code. And in some ways it's a multiplier. Right. We all have the resources of a team lead, but you still need the knowledge of a team lead to be able to deploy that team, whether it is human or AI, effectively.
Gregor Van
Yeah, absolutely. So, yeah, just kind of final closing question, and this is more like a personal one to you, I guess, which is more just inspiration, I guess. Have you got any people, whether it's in the community right now or even living or dead, who's kind of inspired you and does inspire you?
Samuel Colvin
I guess I'll have to say Guida von Rossum, the creator of Python. I've met Guido a number of times now. He can be reasonably blunt, but he's always friendly and fun to talk to. And he is so humble. I mean, I have seen him walk into rooms where no one knows who he is and just stand there and listen to what's going on. He so often describes himself as the author of Pep 482, not the like BDFL and the creator of Python. I'm so impressed by what he has built in Python, both as a community and as a language. He got so much stuff right long before his time, in terms of realising that it was something that programming languages are for humans to use, not for computers to use, and prioritising the human bit. And I think the success of Python, you look at every other successful language, with the possible exception of Rust, which I'm also a big fan of, they've all been anointed somehow. They've all had a reason why they're going to win. JavaScript had the browser go, had Google C, had Microsoft, et cetera, et cetera. Python had one random Dutch guy and an amazing community. And I ever impressed by what Python has become and you look now right, like in AI. Sure, some people say TypeScript might take over, but like it's Typescript and Python, it's a two horse race and Python's doing amazingly and I'm like proud to be a part of that community and have had a little bit of impact on it over the years.
Gregor Van
Yeah, I have to say I am more on the typescript side, but that's not because I dislike Python, it's just, that's just the route I took in life and I'm really impressed by how Python has, as you say, it's the leader clearly in AI programming and it's been fascinating to kind of watch that. Well, look, it's been absolutely a pleasure to have you and feel very lucky to have you on SE daily, especially as you've called out in an episode, this is sort of towards the end of your night and it's not finished yet by any means. So yeah, thank you so much for coming on and obviously we look forward to the Pynastic AI Gateway release which is probably going to be out by the time that this is airing. So yeah, thanks so much.
Samuel Colvin
No problem. Thanks so much for having me. It's been a pleasure.
Date: December 4, 2025
Host: Gregor Van
Guest: Samuel Colvin (Creator of Pydantic & Pydantic AI)
In this episode, Gregor Van sits down with Samuel Colvin, the creator of Pydantic and Pydantic AI, to discuss the evolution of Pydantic from a genre-defining type safety and data validation library for Python, to a broader company delivering core infrastructure for modern AI applications. They cover the origins of Pydantic, the philosophy and engineering behind type safety, the development of the Logfire observability platform, the challenges in agentic AI frameworks, and the upcoming Pydantic AI Gateway. The discussion is rich with technical insights, open source reflections, and practical advice for engineering reliable AI systems.
Samuel Colvin’s Background:
Growth & Impact:
Controversial Beginnings:
On Constraints & Runtime Types:
Transitioning Versions:
“Pydantic is already very, very fast. It’s 50ish times faster than Pydantic V1.” (Samuel, 06:28)
Genesis:
Product Scope and Differentiators:
AI & Observability:
Evolving Patterns:
On "Swarms"/Teams:
On Dual Focus and Critique:
“The reason we’re trying to make money is so that those things don’t happen... I am not going to go and maintain Pydantic on my own for the rest of my life...” (Samuel, 50:16)
Purpose:
Features:
On Open Source Careers:
On the Impact of AI on Software Engineering:
Personal Inspiration:
Samuel is candid, technical, pragmatic, and sometimes wry — insisting on engineering fundamentals amid rapidly evolving AI trends and community buzzwords. The episode emphasizes practicality, long-term sustainability, and the central importance of type safety and observability in production AI systems.
This summary captured all critical concepts, memorable moments, and direct quotes for listeners who want a detailed but digestible account of the episode — without the fluff.