
A
Software abundance for government: why do we need it, and how do we get there? To discuss, we have on Russell Kaplan, co-founder of Cognition, who previously spent time at Scale and Tesla. Thanks to Cognition for bringing us this episode. And Russell, welcome to ChinaTalk.
B
Thanks for having me, Jordan. Excited to be here.
A
So what is wrong with software and government?
B
We have a lot of problems with software in the government, despite the government actually being the source of a lot of software innovation for a long time. But today the state of the world is pretty sad. As a citizen, you interact with government software and a lot could be better. Just to put some numbers on it, there is more than $100 billion a year spent on IT for the US government. A lot of these systems are ancient. The GAO did a study finding that, in the 2010s, there were 10 critical legacy systems we needed to modernize. I think only three of them have even started the process of that modernization. As a country, we're spending a lot of money and not getting the same results that we see in the private sector. What's happening now with AI and software engineering is changing the private sector, but I'm personally really excited about how much it could change for the country as well. And I think it's actually really important for the next generation of the United States to get this right.
A
You mentioned the $100 billion a year number. What does $1 get you in the private sector? And how does that comp over to some federal or state department spending that money?
B
So yeah, in the private sector, the way we buy software is: we have a problem, we ask what the best tool in the market is for that problem, and we buy it, whether it's a SaaS solution for my CRM or infrastructure for scaling my database. So the market tends to be more efficient. For the government, it's a different story. It's really challenging for the government to purchase software directly. There is a much higher compliance and regulatory hurdle for software vendors to even start working with the government. We faced this at Cognition. Getting to FedRAMP High was a journey for us. But even once you're there, there's a lot of indirection. A lot of these systems were designed with good intent, to make sure that there's no corruption and that RFP processes let government buyers get the best price for what they want. But the net result of this system is that it's enormously slow to get software into the government, and in particular to reuse software. A SaaS tool has a much easier time being bought by a private sector company than by a government agency, which often needs a much higher degree of ownership of the product it's using. So the net result, if you look at some of the data, is that we're still powering most of the critical systems of the country with ancient code. Tens of millions of lines of COBOL are powering our Treasury and our Social Security Administration, and it's not getting better.
A
So Russell, is COBOL not Lindy? I mean, what's wrong with running a government on ancient software languages?
B
Well, the problem is that nobody knows how to write COBOL anymore. And COBOL specifically is a problem in the private sector too. What we've found is that the people who wrote these systems are often no longer there when changes need to be made. The result is that there's a small cohort of specialists who learned COBOL many decades ago and still write COBOL, and they need to be brought in for any change. But there are just fewer and fewer of them, and the changes get bigger and bigger. So everyone's sort of scared to touch the big mainframe systems that are powering critical infrastructure for the country. A lot of banks that we work with at Cognition, large health insurers, airlines, they're running these large-scale systems too. And to give it credit, COBOL is a very performant language. It's actually really efficient and really fast. And it's working, or kind of working, so people don't want to mess with it. But what happens is that when requirements change, it's really hard to update these systems to keep up. And that's where the slowdown really comes from.
A
Let's stay on this for a second. Maybe for the uninitiated: why are there new programming languages, and what can they enable besides just having more people know Python than know the stuff that was invented in the 60s and 70s?
B
Yeah. So, a really brief history of programming languages. Even before we were writing COBOL, people were writing assembly; around 1948 assembly became popularized. And this was a big upgrade even from the previous era of punch cards. The 1890 census was the first time that punch cards were used in a real production setting. It was by the government, because they realized that counting the 1890 census manually was going to take more than 10 years. So they were literally not going to get the job done. And so the government put out a call for technology to say, all right, what can we do to solve this problem? And in 1890 we used punch cards. And that was a big deal.
A
That 1880s baby boom was just the straw that broke the camel's back.
B
It really was: too many people, not enough counters. I think it was going to take like 12 or 13 years or something, doing it the old-fashioned way. But punch cards are a very literal representation. There's a hole or not a hole, representing a one or a zero, originally as a data storage format. Assembly, COBOL, modern languages like the Python you mentioned, even Java: they all just walk up the ladder of abstraction, making it easier to tell your computer what you want it to do. So you need increasingly less arcane, specialized knowledge, and you get increasingly more intuitive interfaces for getting your computer to do what you want it to do. And I think AI is actually the next logical rung on the ladder. It's not some fundamentally, structurally different thing when it comes to programming. It's telling your computer what you want it to do, but in English, in a way that's really natural for everyone.
A
Yeah, and I think the other thing is that the older programming languages were optimized for the constraints of their particular generation of technology. Back then you had way more severe memory, storage, and processing restrictions. And with today's languages, pre-2025 or whatever, you still needed a person to sit down and write every line of code that you were going to deploy. And that is, I don't know, not really a thing so much anymore.
B
The hardware teams work so hard to optimize the chips, to keep pushing Moore's Law. And then the lazy software engineers like myself, we just stop worrying about garbage collection and memory management, and we relish the productivity gains without worrying so much about efficiency. That's kind of how it's gone a lot of the time. We do get more efficient, but typically most of the hardware performance gains are captured by making software easier to write. And one thing relevant for both the government and the private sector is that AI might flip this. AI might be able to say, hey, I'm going to actually write this in really optimized assembly, or binary directly, because I don't need an intermediary interface that a human can understand.
A
Beyond insanely performant code, what else can we expect in our world of software abundance?
B
I think the most important thing is that software is going to start flowing more like water. It's just something that's easy to move around, easy to change, easy to get more of, and in particular a lot more is going to be created as a result. If you look at the structure of the SaaS industry, and software as deployed in government and the private sector, a lot of the way things are shaped is because of how hard it is to change things, right? To migrate off of a system or a database that you've installed and designed is a massive, massive project. If you even want to buy another company, one of the most complex parts of that historically has been the integration of the software and infrastructure and the IT systems, the different data storage. So there's this sprawling complexity that's emerged, and honestly a lot of vendors use the switching costs as a way to build a moat around their business. It's: we're going to land this contract, we're going to set our stuff up, we're going to discount it the first year, and then it's going to be impossible to leave. And I think a big structural change that's about to happen in the economy, and you can already start to see some of the reactions to this in the public markets, is that this strategy doesn't work anymore. You can't hold your customers hostage with switching costs when AI is going to do the switching, and it's going to work on it 24 hours a day, and it's not going to get bored with what is often a really tedious process. And I think that ability to move from whatever you have to the best tool for your problem is going to lead to a lot of changes.
A
So what's Cognition doing to make that future possible?
B
We started in January 2024, so we're a month more than two years old as we're recording this. We started originally as a research lab focused on reasoning and long-term planning for software engineering. At the time there was already great progress on chatbots. But what about making things that could think for a really long period of time, and using that for software engineering? Then we launched Devin, the AI software engineer, in March of 2024. That was sort of the first real draft of what an autonomous agent should look like, which is now of course extremely popular in software: having these digital coworkers that you're delegating work to, as opposed to copilots. And if you think about the complexity we talked about, the switching costs, the challenges of migrating and modernizing, there's an architecture part: deciding what's the problem with our status quo and where we want to get to. That's still done by humans today. But once you've decided on that, the execution, the implementation detail, is a lot of times pretty toilsome stuff. It's actually the stuff that engineers really don't love to do: paying down tech debt, refactoring file after file of old code. So what Cognition does is provide this AI software engineer, Devin, that people can deploy against their code to really quickly transform it, improve it, modernize it, upgrade it. At this point we're used by a lot of the Fortune 500 and by global organizations, and we're really focused on large, complex systems that require serious amounts of existing context to make useful changes.
A
Well, let's do the compare and contrast with Claude Code, Russell.
B
Yeah, so by the way, I think Claude Code is awesome. In general, the explosion of developer tools in AI and software engineering has been kind of crazy to see, not just Claude Code but Codex, other IDEs, CLIs. The interface is constantly changing. Where Cognition sits is that we have a platform. We have an IDE: we acquired Windsurf, the agentic IDE, in 2025. We built Devin, the autonomous agent. The biggest difference between Devin and Claude Code is really: are you running remotely in the cloud, or are you running locally? Is this something that you can spin up in parallel in a fleet, versus something that's running on your machine? It's a pretty fundamental architectural difference: do you give the agent its own computer? That's the dev tool difference. I think the way we work with companies is also pretty different. We're less of a "here's the tool, go figure it out" company. At Cognition we work with, again, a lot of the largest, most complex organizations in the world. And these folks don't just have a developer tools problem. They often have a transformation problem: how do I get this major outcome done in three months instead of two years? So we've built, for example, a pretty large forward deployed engineering team for our size of company, where we'll go work with the government, or work with an enterprise, to partner together on driving a meaningful outcome.
A
And why do they need that and not just the tools and let it rip? Or maybe the other question is, do we wait six months or a year and then the technology is going to be so good that all we need is a model to just go and fix everything for us.
B
Yeah, so this is kind of the AGI maximalist case: if you just have the best possible model, then shouldn't everything else just happen? And I kind of think the answer is no. Have you seen the chart of inflation by sector over time, where you can see plasma screen TVs are massively deflationary, but healthcare and tuition are going way up? That chart is sort of my mental model for the post-AGI future: all of the things that are intelligence-soluble get really, really deflated, but what you're left with is all the rest of the complexity of the real world, which is actually quite substantial. First of all, how do you get the permissions to deploy in the environments you need to deploy in? How do you work with the people who are ultimately in charge of these systems to drive the outcomes they want to drive? And then how do you reframe and restructure the process of how technology is built or procured inside an organization? The models are going to keep getting better, and they're going to make software easier and easier to create. It's all the other problems that get left behind.
A
So we're recording this the afternoon of Friday, February 27th. It's 2:22pm. There are now two and a half hours left before Pete Hegseth drops the anvil on Anthropic, apparently. I'm curious, Russell, given that Devin can pull from all the different models, what challenges and opportunities that gives you guys from a product development perspective.
B
Yeah. So I think if you're the DoW, you're certainly frustrated and worried about the decisions of any one model provider affecting the mission of what you want to do. And look, I think every private company has the right to say, these are the use cases we want to serve, and these are the use cases we don't want to serve. And kudos to Anthropic for saying, hey, here's what we want to do and not do. But I think it's another data point on the question: should model providers even be providing the vertical tools on top? And is that the best experience for customers? If anything, we see kind of the opposite, where the differentiation in models is decreasing, not increasing, over time. If you look at frontier eval scores for software engineering benchmarks, the gap between the best models right now, as we're recording this, is less than half of the gap it was 12 months ago. As companies are spending billions and tens and hundreds of billions of dollars on bigger and bigger clusters and bigger models, the models themselves are sort of converging. And so if you're a government buyer, you typically care more about the outcome that you're driving than about which model you're going to use. So I think in some ways it gives a structural advantage to the agent labs, Cognition being one of them: hey, we're focused on the customer problem. No matter what models exist or don't exist, we're going to combine them in the best way. And of course we'll have our own specialized stuff for very specific narrow use cases, but always to drive the outcome you want.
A
We have a running gag on ChinaTalk of the AI mandate of heaven. And even though it's been Anthropic's for a hot minute, listeners will recall the worlds in which it was Gemini's and OpenAI's. I hear you, Russell, on the models kind of converging in capabilities, but when I play with them, they do feel different. And people talk about them being better at this or that thing for software. How do you guys go about playing with them and figuring out who to assign what work when we're talking about Devin?
B
Yeah, so on the mandate of heaven piece, I think these things are cyclical. One thing that's interesting in software engineering in particular is that the right form factor for building software is constantly changing based, in part, on the underlying capabilities of the models. For example, when we launched Devin in March of 2024, it was just at the edge of possible, I would say, to have an agent that you could really delegate work to and come back. In fact, honestly, it wasn't even really useful for us for another three months. After we built the prototype that we shared with the world, it took about three months for us to use it enough internally at Cognition that Devin became the number one contributor to Devin. So that was a three-month lag, and then there was another several-month lag before it actually started being deployed in production settings and became useful for customers. And what's happening is that as the models improve, the form factor for how to use them is constantly changing. In coding, we went from tab completion (think of writing a Word doc and hitting tab to get the next suggestion, but in your code editor), to a local chat experience where you can chat with your codebase, ask questions, and run local agents, to now, increasingly, autonomous agents that we can delegate work to. And by the way, the form factor might look completely different again six months from now. So I think the mandate of heaven is actually going to keep changing constantly based on who is first or best at the next form factor. Every new form factor is like a new front to battle on. But as far as evaluating the models themselves: we built a pretty comprehensive internal evaluation suite. The original draft of it was called Junior Dev, evaluating whether these models could act like junior developers.
We have a fork of it now internally that's more like a senior dev, because the models keep getting better. And we work with every lab: before they release models, we run our evals and we give them feedback. We say, hey, we think you guys are strong here, you're weak here, here are some ideas for how you can make this better. We have a great partnership with every lab on this. I think many of them have told us that we have the best private evaluation suite for agentic coding tasks that's independent from a model provider. So we care a lot about evals, because we find our customers want the best models. The other interesting data point is that no matter what task you give, the eval scores are consistently worse if you constrain the agent to use one model versus letting it use multiple. To your point, there are differences, right? Whether it's personality, or macro context understanding, or attention to details, these little differences add up.
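Editor's sketch: the claim that a multi-model agent scores at least as well as any single model can be illustrated with a little arithmetic. The following Python is a minimal sketch, not Cognition's eval suite. The model names, task categories, and pass rates are all hypothetical, and it assumes an ideal router that always knows which model is strongest for each category.

```python
from statistics import mean

# Hypothetical pass rates per task category; not real eval data.
PASS_RATES = {
    "model_a": {"refactor": 0.80, "debug": 0.60, "docs": 0.70},
    "model_b": {"refactor": 0.70, "debug": 0.75, "docs": 0.65},
}


def best_single_model_score(pass_rates):
    """Average pass rate of the best model when one model handles every task."""
    return max(mean(by_task.values()) for by_task in pass_rates.values())


def routed_score(pass_rates):
    """Average pass rate when each task category goes to its strongest model."""
    categories = list(next(iter(pass_rates.values())).keys())
    return mean(
        max(by_task[cat] for by_task in pass_rates.values())
        for cat in categories
    )


if __name__ == "__main__":
    print(f"best single model: {best_single_model_score(PASS_RATES):.3f}")
    print(f"per-task routing:  {routed_score(PASS_RATES):.3f}")
```

Under the ideal-router assumption the routed score can never fall below the best single model, because each category takes a maximum. A real agent has to guess which model suits a task, so the actual gain would be smaller.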
A
That's really interesting. I mean, is there a structural reason for that staying true forever?
B
The differences forever? Yeah, yeah.
A
Like, if we're holding equal the distribution of AI researcher talent, and everyone still has the same amount of chips across the three or four labs, what is the reason why things are spiky in one direction versus another?
B
Yeah, so I think the structural equilibrium is one of model convergence: capabilities increasingly converging to similar levels of performance in every domain. The trend lines are in that direction, but why would that happen in steady state? First, you have the scaling laws, right? It takes exponentially more cost inputs for linear gains in any benchmark you choose. So if you're operating at small scale, it's easy for one firm to spend 100 times more than another firm, if it's $1 million versus $100 million. But once we're all spending hundreds of billions of dollars, it's hard to get a multi-order-of-magnitude leap over your competitors. So I think there's a scaling laws reason that these things are convergent. There's also just the practical reality that non-competes are unenforceable in California and people are moving from one lab to another all the time. I think the half-life of a proprietary algorithmic insight is probably like three months, we might guess. Even within the labs, you have one person working at OpenAI and their partner working at Anthropic, and who knows. So I think the half-life of proprietary IP in Silicon Valley is short. And so if you get to this state where the models roughly converge, maybe there are some differences that persist, not so much in capabilities, more in personality. But I think the last point that's relevant for every task is that we have this mantra in Silicon Valley that we always want more intelligence, more intelligence, more intelligence. We've got to build clusters of compute across the galaxy to harvest the energy of every star to have the most intelligence. And I actually do think there are use cases for ever-increasing amounts of intelligence.
But I think this also ignores the fact that for any given application domain, you often reach a threshold of intelligence saturation, where for that use case it's enough. I can tell you, for example, that if you said today, hey, let's build a simple static front-end site for ChinaTalk, any frontier model would do that well. And so once a given task is intelligence-saturated, you don't really care which model you're using. You care about: okay, it works, so now is it fast and is it cheap? And I think more and more domains are going to see this intelligence saturation, at which point what model you're using becomes less relevant, and the interface, the experience around it, and how it drives outcomes end to end for your company or your government organization matter more.
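Editor's sketch: the scaling argument above ("exponentially more cost inputs for linear gains") can be made concrete with a toy model. The Python below assumes a benchmark score that grows linearly in the logarithm of spend. The log-linear form and the figure of 10 points per 10x of spend are illustrative assumptions, not measured scaling laws.

```python
import math


def score_gap(spend_a_usd, spend_b_usd, points_per_10x=10.0):
    """Benchmark-point lead of lab A over lab B under a log-linear assumption.

    points_per_10x is a made-up constant: how many benchmark points
    a tenfold increase in spend is assumed to buy.
    """
    return points_per_10x * math.log10(spend_a_usd / spend_b_usd)


if __name__ == "__main__":
    # Small scale: outspending a rival 100x ($100M vs $1M) is feasible.
    print(f"100x spend ratio: {score_gap(100e6, 1e6):.1f} points")
    # Frontier scale: even doubling a $100B budget is a much smaller ratio.
    print(f"2x spend ratio:   {score_gap(200e9, 100e9):.1f} points")
```

In this toy model the lead depends only on the ratio of spend, so once every lab is spending at a scale where large ratios are unaffordable, the gaps between them shrink.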
A
All right, driving outcomes. Let's talk about it. Before we go to the government stuff, what are some enterprise case studies you guys have worked on that you think illustrate what, I don't know, 2025, 2026 models are capable of when powering Devin?
B
Yeah, the thing that I've been the most surprised or impressed by is the ability of these large organizations to take Devin, take autonomous agents, and do massive multi-year projects in weeks or months. I'll give you an example. A law changed recently in Brazil that made the country's taxpayer ID numbers alphanumeric instead of numeric. Think of this as the Y2K catastrophe of Brazil. It's called the CNPJ migration. Every system in the country that tracks taxpayer ID numbers for companies has to go alphanumeric, and it's a different format, it's longer. So think of the banks, the health care providers, the government agencies. This is a huge problem. We work with the largest financial services organization in Brazil, called Itaú, and they had a two-year plan to become compliant with this change. It involved upgrading COBOL mainframes, it involved upgrading processing. Conceptually it's not complicated, but when you have thousands and thousands of different systems that all interact in complex ways, it gets really messy, really gnarly. And they were able to use Devin to get the bulk of that project done in three weeks instead of two years. Then they could clean up the edges however they wanted. It's been really impressive to see stuff like that, the multi-year to multi-week project, happen more and more.
A
So is this kind of where we're at today? Is the really not fun, painful migration stuff, some version of transposing A to B in a way that's more modern and functional, the current sweet spot for software and models?
B
So anything that you can validate automatically is a sweet spot, I would say. And I'll give you an example that's part of why I'm working at Cognition. Before this, I was at Scale AI, which provides data to the frontier labs. We were doing labeling at scale, with human experts saying, hey, this model response is better than that model response, to provide reinforcement learning from human feedback to improve these models. And what was happening is that it just kept getting harder and harder to do well, because every human response needs to be smarter than the model's own intuition for it to provide useful signal to make the model better. As the models get better, that gets really hard to scale. We were finding experts like PhDs in chemistry, true domain subject matter experts in every niche in the world, to try to keep eking out better and better performance from these models. In software, you have a big difference, which is that you can just run the code. You can compile the code, you can test the code, and whether it works or doesn't work is a signal you can use for reinforcement learning to make these things better. So think of every application, whether it's in government or in the private sector, where we can build an automatic feedback loop. I think that's really the key enabler of success. Migrations are a good example of this because you can build tests that say: how should the system behave? Does the new system behave the same way as the old system?
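Editor's sketch: the "does the new system behave the same way as the old system?" check is often called differential testing. The Python below is a minimal illustration under stated assumptions. Both normalizer functions are hypothetical stand-ins invented for this example, not Itaú's or Cognition's code, and the 14-character padded format is only an illustrative choice.

```python
import random


def legacy_normalize(raw: str) -> str:
    """Hypothetical old behavior: keep digits only, left-pad with zeros to 14."""
    digits = "".join(ch for ch in raw if ch.isdigit())
    return digits.zfill(14)


def migrated_normalize(raw: str) -> str:
    """Hypothetical new behavior: keep letters and digits, left-pad to 14."""
    cleaned = "".join(ch for ch in raw.upper() if ch.isalnum())
    return cleaned.rjust(14, "0")


def find_mismatches(old_fn, new_fn, inputs):
    """Run both implementations on each input and collect any disagreements."""
    mismatches = []
    for raw in inputs:
        expected, actual = old_fn(raw), new_fn(raw)
        if expected != actual:
            mismatches.append((raw, expected, actual))
    return mismatches


def random_legacy_inputs(count, seed=0):
    """Generate numeric IDs with punctuation, the only shape the old system saw."""
    rng = random.Random(seed)
    alphabet = "0123456789./- "
    return [
        "".join(rng.choice(alphabet) for _ in range(rng.randint(0, 18)))
        for _ in range(count)
    ]


if __name__ == "__main__":
    regressions = find_mismatches(
        legacy_normalize, migrated_normalize, random_legacy_inputs(1000)
    )
    print(f"regressions on legacy-style inputs: {len(regressions)}")
```

The point of the design is that the old system acts as the test oracle. Any disagreement on inputs the old system already handled is a regression, while differences on new alphanumeric inputs are the intended change and need their own tests.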
A
Can we talk CVE mitigation for a second?
B
Yeah, yeah. So a lot of people are worried about security and AI. And I think the worries are real, in the sense that people are using AI in all sorts of ways that they haven't before, and attackers are using it to discover vulnerabilities in really novel ways that would have been really hard to do manually. What's happening now on the other side is that the defenders are fighting this with AI. We have great existing tooling for scanning and detecting vulnerabilities via traditional static analysis. Think of a SonarQube or a Veracode or a Snyk, anything that can take a codebase and say, okay, what's my risk surface area? What happened a few years ago is you would do that and then you would get thousands of alerts. Sometimes you get tens or hundreds of thousands of alerts, or even millions of alerts at really large organizations. There are large organizations in the world that today have hundreds of thousands of open alerts saying, hey, this might be insecure here. If you think about that, it's kind of terrifying, but it's also challenging, because we just don't have the capacity to go read all of those and staff a team to go fix them. There are just more problems than people. And what we're seeing with Devin, and actually with AI more generally, is that this is a really good use case, because you've got tons of alerts, it's pretty toilsome, and they need to be triaged, and AI can do the triaging quite well. So some of the largest financial services firms in the world, for example, apply Devin to every CVE.
Every single vulnerability that's caught in their entire codebase, before even going to a human, goes to Devin, the AI agent, and we try to auto-remediate. We're right now at a roughly 70% fully automatic remediation success rate. That means the code change suggested by Devin can be accepted and approved in one click, no changes needed. And that should only go up as the models keep getting better.
A
Yeah, I mean, I think this is an important point. You are not going to make critical infrastructure, whether that be a bank or a power plant, resilient to the degree you want it to be, especially when you have AI attacking it on the other side and the cost of getting into these systems starts to decrease. Your power plant or water treatment plant has had 30 years to hire the software engineers to clean up this stuff and just hasn't. So the only way we get to a world where they do have stronger defenses is if there is something way cheaper than what the alternative has been for the past few decades. So it's cool that we're at the point where we're seeing that.
B
The systems, they don't even need to be vulnerable to automatic AI infiltration to be at risk. We actually see on the attacker side that humans working with AI has made attackers much, much stronger. I'll give you an example. There was a vulnerability a few months ago called React2Shell. This is a 10 out of 10 critical vulnerability where you could essentially remote control any server that uses this library, which is a very popular library, by sending the right network requests. So, 10 out of 10 severity. The attacker who found this found it using AI tools. In fact, a product we offer called DeepWiki, which is our codebase intelligence product that we give away for free for every open source repo, was used by, luckily, a good Samaritan researcher to go find issues in this codebase and basically unlock novel exploits. One of the hard parts of being a security researcher is just wrapping your head around all of the code that already exists inside some existing system. And so when AI makes it easier to ask questions about that code and to summarize it, the attackers get a lot of leverage. Yeah.
A
Well, let's talk about the sort of understanding the code base dynamic both in your kind of like legacy corporate clients as well as the government ones. Like why is that such a challenge to upgrading them?
B
Yeah. So right now, if you think of the state of the world, these models maybe have a context window in the million-ish range. So you can throw in, say, a million tokens, and okay, we can reason about that correctly. A lot of the real-world production systems in enterprises and governments are much, much larger than this, right? You can have individual codebases that are hundreds of millions or billions of lines of code. You can have thousands of systems that plug together in different ways. And we talk about, oh, we still need to understand what we're doing as human engineers. I would argue no human engineer actually understands what we're doing inside a large organization at this point. The complexity has already escaped the constraints of one person's brain. There's just too much stuff, it's too interconnected, it's too hard. And the same reason it's challenging for people makes it challenging for most AI systems. I would say it's a limitation of the models right now, and I would expect that in the coming years models should get better at handling bigger and more complex code. But a lot of our research team at Cognition has focused specifically on large-scale codebase understanding: how do you take every disparate system, look at them together in the same way, and reason about them? It's actually a mixture of deep learning and graph algorithms, to build this high-level graph of the relationships between the different parts of the code and the different systems in an organization, which tends to scale much, much higher.
A
Yeah, so let's stay on that for a second. How do we go from a million token context window to something that can actually understand what's going on in our gnarly Brazilian bank?
B
So for right now, I would say you need more than the models. And look, at Cognition we always want underlying base models to get better every day. And we train our own models that are specialized for specific tasks. But the models alone are insufficient to do very large-scale codebase understanding well. What we found is that you basically try to index everything, so you throw the billions of lines of code and the many different systems into structured, machine-learned representations of what the key similarities and differences are across these different services and what their relationships are. You basically can build this graph data structure that interconnects how everything works in a much higher degree of detail. And then you can still use LLMs when you're zooming in on some specific area to say, okay, how do these pieces fit together, to go solve a problem. And I think this is a really important point. If you look at AI and software, and by the way, this is true in other AI domains too, it's much easier to make a new thing from scratch than to make changes to an existing thing. Because to make changes to an existing thing, first you have to understand why the thing is the way it is. And why something is the way it is might be decades of historical context. Some of it's documented in the code, some of it might be written in a Confluence page somewhere else. Some of it might be in one guy's head who left the organization five years ago. And so you have this enormous history that I think we have to respect when we're trying to make changes to real world systems.
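The idea of indexing systems into a graph and then zooming in on one area can be illustrated with a toy sketch. This is not Cognition's implementation: the service names, token counts, and the breadth-first selection rule below are all hypothetical, chosen only to show how a graph of relationships can pick out a neighborhood small enough to fit a fixed context budget.

```python
from collections import deque

# Hypothetical service graph: each key calls the services in its list.
CALLS = {
    "web-portal": ["auth", "payments"],
    "payments": ["ledger", "fraud-check"],
    "fraud-check": ["ledger"],
    "auth": [],
    "ledger": [],
    "batch-reports": ["ledger"],
}

# Hypothetical token cost of each service's source code.
TOKENS = {
    "web-portal": 400_000,
    "auth": 150_000,
    "payments": 300_000,
    "ledger": 250_000,
    "fraud-check": 200_000,
    "batch-reports": 500_000,
}


def undirected_neighbors(graph):
    """Build an adjacency map that ignores call direction."""
    neighbors = {node: set() for node in graph}
    for caller, callees in graph.items():
        for callee in callees:
            neighbors[caller].add(callee)
            neighbors.setdefault(callee, set()).add(caller)
    return neighbors


def select_context(graph, tokens, start, budget):
    """Walk outward from `start`, nearest services first, within a token budget."""
    neighbors = undirected_neighbors(graph)
    chosen, used = [], 0
    queue, seen = deque([start]), {start}
    while queue:
        node = queue.popleft()
        cost = tokens[node]
        if used + cost > budget:
            continue  # does not fit; skip it and do not expand through it
        chosen.append(node)
        used += cost
        for nxt in sorted(neighbors[node]):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return chosen, used


# Which services would go into a one-million-token window around "payments"?
chosen, used = select_context(CALLS, TOKENS, "payments", 1_000_000)
print(chosen, used)
```

The graph stays cheap to hold in memory even when the underlying code is far larger than any context window; only the selected neighborhood would be handed to a model.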
A
So I think Social Security is perhaps the paradigmatic example of that. No administration wants to do anything to stop those checks going out. And that, plus the census data being so finicky, ended up enabling hundreds of billions of dollars of fraud during the pandemic, because there wasn't a more modern system that would allow you more visibility into where those checks were going. I don't know.
B
Yeah.
A
Thoughts on that in the government context?
B
No, totally. And I think, you know, sunlight is the best disinfectant. I think it's great that the government is starting to put out these public data sets to say, hey, community, where is the fraud? You find it. We're not even going to find it. We actually assigned Devin to the recent large dataset release from HHS to go find what the fraudulent patterns in there are. And very quickly you can tell this is a task really well suited for AI, because there are anomalous patterns of money movement that do not add up relative to the distribution. And so I think you're going to see a lot more of that: both government agencies using AI internally to fight fraud, and also sharing data externally to leverage the full community.
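As a rough illustration of "movements of money that do not add up relative to the distribution," here is a minimal sketch of one standard outlier test, the modified z-score based on the median absolute deviation. It is an assumption for illustration only: the transcript does not say which method Devin used, and the payment amounts are made up.

```python
from statistics import median


def robust_outliers(amounts, threshold=3.5):
    """Return indices whose modified z-score exceeds `threshold`.

    Uses the median and the median absolute deviation (MAD), which are
    much less distorted by extreme values than the mean and standard deviation.
    """
    med = median(amounts)
    mad = median(abs(x - med) for x in amounts)
    if mad == 0:
        return []  # no spread to compare against
    return [
        i for i, x in enumerate(amounts)
        if 0.6745 * abs(x - med) / mad > threshold
    ]


# Made-up payment amounts; one is far outside the usual range.
SAMPLE_PAYMENTS = [120.0, 135.0, 110.0, 128.0, 131.0, 9800.0, 125.0, 118.0]
flagged = robust_outliers(SAMPLE_PAYMENTS)
print(flagged)
```

A flagged index is only a lead for a human investigator to follow up on, not proof of fraud.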
A
So what are some of the dream projects? Where do you really want to sink Devin in the coming years?
B
I think, look, state capacity matters a lot to me. As both a citizen and as someone interested in the well-being of the United States, it's great to see what our country is capable of at its best, but also frustrating to see what it's hindered by at its worst. The incentive structure of how the private sector helps government, the way contracting happens, and the resulting lock-in and stickiness of really suboptimal systems for long periods of time, it's really frustrating, and I think it affects us every time we go to the DMV. And I would love to see a future where we have high state capacity for software, where there is not this big gap between your experience using software with the government and your experience using software in every other aspect of your life, where things continue to get better. You know, the bits power the atoms. Our interaction with the physical world is increasingly governed by software systems. And so one of the things we're trying to do at Cognition for government is empower every agency to get to where they want to go. And really it starts for us with modernization. Modernization is the bottleneck for a lot of these problems. And we work with a ton of agencies at this point. We work with the Army, we work with the Navy, we work with the Treasury, we work with NASA, JPL. I think we have dozens now of FedRAMPed deployments and we're just getting started. But I'm really excited to help level the playing field between public sector and private sector.
A
And how has the experience been putting Devin in government versus the financial system or other enterprises?
B
There's more parallels than you might expect. It turns out actually the largest health insurers in the world, they are also very sensitive to regulation. They are also very sensitive to security. They also have enormously complex systems. And so I would say there's actually more similarities than differences. And that's one of the reasons we're deciding, hey, you know, relatively early in our company journey, we want to go help the government too. You know, it's not, it's not this completely different set of problems. The problems are the same. You have to work with the, you know, you have to work with your counterparties in different ways to, to kind of be useful, but the problems are actually pretty similar.
A
And what about if we're talking about like Stripe or Notion or I don't know, some Silicon Valley firm.
B
Yeah, the Silicon Valley tech-native startups are, I think, really different. There's sort of a spectrum of buy versus build. Right. And I think a lot of what's special about Silicon Valley is companies are building things themselves, right? They're constantly building things. They're making their own agents, they're shipping new things all the time, constantly reinventing themselves. And then you have a lot of companies where their core focus is not software. Their core focus as an organization is solving some other set of problems, either for their customers, their citizens or their stakeholders. And software is just a tool to get the job done. And what's happened is, historically, these organizations, the ones that are not native software organizations, are to some degree reliant on the software vendors to bring them the tools for the job. And I think if you really play forward what's going to happen with AI and software engineering, every company, every organization, every government agency is going to be in control of its own destiny in a much bigger way. If you think about how constrained software creation is right now, everyone needs more engineering capacity than they have. The roadmap is really long, and things get cut and descoped all the time so that you can prioritize what's needed. That's going to start to flip. And I think the result might be that every company has the capabilities of a software company.
A
So what does the software engineering starved healthcare provider or federal bureaucracy actually need in order to sort of taste the fruits of that future? Russell, besides a good procurement process for a little devastation?
B
You joke about the procurement, but actually the procurement process is one of the first beneficiaries of the fruits of software abundance. People are joking about the SaaS apocalypse right now, or some aren't joking about it. Some companies, you know, their stock is down 30% on this concept. And I think a lot of the concept in some ways is overblown, because we're not going to all vibe code our own systems of record tomorrow. But I would say the leverage has flipped and procurement organizations are seeing the benefits of this. So one of our large Fortune 500 clients actually instituted a new procurement process with Devin where, before they buy any other software, they first prompt Devin and say, hey, can you just go try to build this application? And Devin is not going to one-shot a giant company's application in one go. But you can get a prototype and you can get a taste of something. And then the procurement team goes to the software vendor and says, hey, we want a discount. And that's actually an effective negotiating tactic, and people are getting discounts from that already. And I know of at least one case where it was an infrastructure provider and the firm decided, actually, we are going to build this internally because it's actually not that hard. We've got the prototype and the prototype works. And so I think we are starting to see that happen in the real world. In Q1 2026 this is already happening, and it's going to put pressure on people to deliver value with their software. And that's what I'm personally really excited about: less rent seeking, more product quality.
A
Yeah, I wonder also on the like the question of how many really good people do you need to get to like passable, right. You know, we've for the past decade or so have had all these, you know, Code for America or like various, like rotate for two years into government and on the one hand they do good work and on the other it's like okay, maybe, maybe you make like a nice front end or like you fix one problem, right. But the ability for like that one person to fix 10 problems or 50 problems in that two year cycle, I imagine these tools are going to be, allow those, those folks who take those, you know, who take these jobs and do these rotations to have a lot more leverage.
B
Yeah, no, I mean we see that all the time. And one of the fun things about software is basically everyone always wants more software. And so what happens is, if you're the individual engineer, you can just ship a lot more than you used to be able to, and you're more empowered cross-functionally as well. Right. With your agent, you can get some help with your designs, you can get some help scoping the product roadmap, you can get some help with the integrations. So each individual person is getting a lot more empowered. By the way, in every function. The product manager feels the same: oh, I can prototype this without the engineer. And the designer says, I can build this and scope it without either of those two. So every person traditionally involved in the process of building software is more empowered to have more ownership of the outcomes they're driving. And I think the result of this is, of course you can get a lot more done with smaller teams, but organizations are also just getting a lot more ambitious. I would say the bulk of the change that's actually happening right now is people are taking the productivity gains and asking, what more can we ship? What more can we pull in on the roadmap?
A
Yeah, and I think from a policy perspective, and this is a drum I beat a lot, you need to use these tools even if you're not a software engineer, because the possibility space of what you can do from a policy perspective is just going to expand. Like, the idea I came up with was some dynamic pricing for the FAA to do different drone airspace for, you know, delivering your packages or taking your kid home from daycare or something. And they're awesome. Yeah, surge pricing for my daycare eVTOLs. But that's a big demand on software. Right. And in New York City right now, we have this incredibly dumb version of surge pricing. It wasn't necessarily because the software was complicated, but you can just have more creative, dynamic things because it will no longer be impossible to do the equivalent of 10 FTEs building you a thing in 2024 or 2025. So I'm excited for people to use their imagination when it comes to how to do this stuff better.
B
Yeah, yeah, both. For policymakers, I think it's useful to implement your own policy ideas with these tools. But I think it's also really important to build the mental model of, okay, what's possible and what's not possible. Because that mental model changes, I think, every month. I think actually one of the greatest harms that we did in generative AI is we shipped Google Auto Answers at the same time we had ChatGPT Pro, and a lot of people were running a Google query on a cheap model served for free, and they're like, oh, this AI answer is not very good. Meanwhile, if you have the $200 a month ChatGPT Pro subscription, the answer might be research-grade quality. And so people were building very inaccurate mental models of what these systems are capable of. And everyone's guilty of it, including people who are even working on the tools. If you're working on building the tools and you're not constantly testing the frontier, your mental model goes out of date really quickly.
A
I mean, not to give away your evals, but what are you hoping to see in the next few years?
B
I think we're heading to a world where building software is already no longer really about coding, to some degree. Like, the writing the code is not really the bottleneck anymore, it's everything around that. So I think humans still have to understand the code we're putting into production. And the emerging bottleneck, it's actually in review. So we launched a product a month ago called Devin Review, which is a very human-centric interface for trying to understand what is increasingly AI-generated code. You have people making changes that are thousands and thousands of lines. The volume of code is growing enormously. But I think for where we are right now, Q1 2026, you've still got to understand the code that you put into production. I would say by 2028 that is no longer true. I think we will just have much broader specifications of systems that characterize, okay, this is how we want the system to behave. It looks something more like writing a spec in English, and AI kind of compiles the English spec down to software. But I think in 2026 and probably most of 2027, we're still going to be looking at code, trying to understand code, and we're not going to yet be at the level of reliability where you can just fully automate these things. It reminds me a lot of self-driving. When I was at Tesla, I was on the Autopilot team, working on the vision neural network. When you get to 99.9% reliability, a lot of drivers start really trusting the system because it works 999 out of every 1,000 times. And so for that one in a thousand where you actually have to take over, people are human and they pay less attention. And so I think we're right now in that uncanny valley phase of AI software engineering where it works so well that you almost might be too trusting of it, but you've still got to understand what you're doing.
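The 99.9% figure can be made concrete with a little arithmetic. The sketch below assumes, purely for illustration, that every use of the system fails independently with the same probability; real failures are rarely that tidy, so treat the numbers as a rough guide to why occasional takeovers still matter at high reliability.

```python
def prob_at_least_one_failure(reliability, runs):
    """Chance of seeing one or more failures across `runs` independent uses."""
    return 1.0 - reliability ** runs


# Illustrative only: 99.9% per-use reliability over different amounts of use.
for runs in (10, 100, 1000):
    chance = prob_at_least_one_failure(0.999, runs)
    print(f"{runs:>5} uses: {chance:.1%} chance of at least one failure")
```

Under that independence assumption, a system that is right 999 times out of 1,000 is still more likely than not to fail at least once over a thousand uses, which is why the reviewer's attention cannot drop to zero.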
A
You mentioned this earlier but like the, the sort of the self driving form factor for when you get to level five is like you take a nap and that's kind of like it's, it's clear like what the end state we're going towards is. I mean how do you guys think about like what the next kind of interaction paradigm is going to end up being?
B
Yeah, well, so self driving is interesting because you're right that level five is you take a nap but that's kind of the limit. Right. You, you still decided where you want to go, right? Where should the car drive you versus in software? I actually think there might be a level six which is, which is you don't even decide where you want to go. It's actually, you know, maybe you have some very high level objective for what you want to accomplish but the like level six autonomy of software is you know, the kind of AI agent actually deciding, deciding the details of what to even build in the first place. I think the level of abstraction that people are going to be operating on is going to grow really high, really, really sort of unexpectedly fast. And if you, if you kind of specify a business objective or an outcome you want increasingly we're going to be able to optimize against that objective directly.
A
Yeah. And it's funny because that's actually where I feel the limits most is like that I sort of, I like the like the first question idea generation, like what direction to take this like vague thing I have and yeah. Sort of the execution or the research or you know, finding all the random stuff on the Internet or like building me the MVP like that it can take care of but it's still, and I guess this comes back to your like can you execute the code and see if it works or not? That's a hard thing for, for a model to like come up with the policy idea that's going to like fit into all of the sort of constraints we're living in or the episode or question topic or what have you.
B
Yeah, I think right now, I mean, like in a lot of life, it's all about asking the right questions. That's, I feel like, the key skill of using models right now: well, what question are you asking? What task are you trying to do? I do think that's a distinctly human activity that is going to remain human for a long time. And I mean, even the way we've structured our society as a democracy, ultimately we as people are in charge of what we want to do, the structure of society we want to set up, how we want to push forward. And so I do think these things are tools ultimately. They are tools for the betterment of society, but they're getting much, much more capable, much more autonomous all the time.
A
When you're working with the clients and for the, for your, for deployed engineers, like, are they oftentimes just kind of squinting around being like, oh, you guys should. Yeah, you thought you wanted us to do A. But like B and C is also something that these models are capable of. Like how much do you see cognition serving the role of like, you know, AI to problem finder?
B
I think that is an area that we help with a lot right now, because usually the customers understand their problems, but they don't necessarily have the best mental model of the full universe of problems that are addressable with AI today. And the thing that's really interesting about Devin and just agents in general is once you're plugged into the code, you can see all the problems. The problem discovery process that used to take lots of conversations, whether it's the security vulnerabilities we talked about or some other challenge within the software, is getting increasingly automated. So a typical engagement for us might be a government organization or a large enterprise would come in and say, okay, we have these three outcomes we want to achieve and we think we can do it better with AI. We're going to modernize this legacy system and we want to do it in weeks or months instead of years. We need to build this new product, this new capability, and we want it as fast as possible and it's going to grow our business this much. Or we need to structurally improve our testing coverage, our validation, our security posture, and here are the metrics. So they have some set of outcomes. And what we find is that inside each organization there's actually a really wide distribution of how much people are leaning in to using new-generation tools to do their job. And in every organization, it doesn't matter if this is what you think of as the most legacy, old-school organization in the world, there are people who are excited about the future and want to try new things and learn. Always, consistently, 100% of the time. And those people, I think, are more empowered than ever to have extraordinary impact. There are also folks who are like, I've been doing it this way for 30 years and I'm super skeptical of all this stuff. And I think for those folks, the evidence is increasingly growing that it might be worth taking a peek.
A
What are your calls to action? Who are you hiring for, what kind of conversations you want to have coming out of this?
B
Yeah, I mean, so we're hiring a lot at Cognition for government right now, for folks who have been on the ground and seen the problems firsthand. Right. So I think our forward deployed engineering organization is maybe the fastest growing of all the roles in some sense. People are asking, what does the future of software engineering look like? It might look like you always have to understand the problems of your customer, because the writing-the-code part is getting easier and easier. I actually think if you look at our core research and engineering product team versus engineers who wear multiple hats, who interact with customers and help shape the product, the latter is growing much, much faster. And in the limit, we all might be working directly with other people in some capacity. So we're growing that a lot. And we're looking for folks who have experience building in the public sector. And then we're also growing what we call engagement management, because these projects are very rarely just about the software. It's like, what are the end-to-end organizational problems that we've got to go solve? Of course, we have classified deployments and we work in secret networks, and so folks with the right clearances and backgrounds are always super interesting. But yeah, we're just scratching the surface of how much this is going to change.
A
Russell. So I turned the mic back on because we have some family lore to share. Russell, give it to us.
B
Well, you were asking me why. Yeah. Why I was so interested in the census, you know, the 1890 census and how we kind of popularized punch cards. Yeah, my grandmother was, she was one of like the first female programmers in the country back when it was like in a very arcane, you know, very arcane activity of messing with punch cards. Later assembly came out. She was like super excited about that. She gave me a lot of crap growing up that we had it so easy, you know, in the 2020s, like writing code with a computer that you could edit and like you didn't have to worry about dropping things. Yeah, her master's thesis was on the knapsack problem, and that line of research ended up actually being really useful in the Apollo missions. And so, yeah, part of my hope for Cognition for government is we can go full circle and we can help bring the government back to actually where it once was, which was the true leader in technology.
A
What does she think about Devin?
B
Unfortunately, she passed away a few years ago, before Devin came out. But I think, I think she would look at it and I think she would be proud. I think she would be happy. Okay, that's nice.
A
It's like the way the genders flipped in software engineering, where in the first few decades this was like a very female coded field to having a change. I wonder if actually all the AI tools are going to help it flip back again because the sort of the type of skills, well, it's just going to rearrange what the labor market looks like. And I think different skills end up being prioritized in a way that you don't kind of see the gender split which has dominated for the past few decades.
B
Totally. And at a minimum, I think it will be so accessible so early in your life to learn and use these types of tools that you might start building applications with AI before you even know what the concept of a gender norm is. Software is like water, it's just flowing everywhere, and it's gonna be a really fun time to be a builder. Cool.
A
All right, well, first you gotta learn how to speak, but maybe we'll give my daughter like six more months. Awesome, Russell. Well, thank you for that.
Episode: Software Abundance for Government With Cognition's Russell Kaplan
Date: March 9, 2026
Host: Jordan Schneider
Guest: Russell Kaplan (Co-founder of Cognition; former Scale AI and Tesla)
In this episode, Jordan Schneider chats with Russell Kaplan about the challenges and opportunities in modernizing government software, the transformative potential of AI (especially Cognition’s “Devin,” an AI software engineer), and the impact of software abundance on public and private sectors. The discussion dives into technical, organizational, and policy aspects, with a focus on legacy systems, AI’s role in software engineering, procurement reform, and the road ahead for both government and enterprises.
“Tens of millions of lines of COBOL are powering our treasury, our Social Security Administration, and it's not getting better.”
— Russell Kaplan [02:46]
“AI is actually the next logical rung on the ladder... It's telling your computer what you want it to do, but in English in a way that's really natural for everyone.”
— Russell Kaplan [05:49]
“When AI is going to do the switching and it's going to work on it 24 hours a day and it's not going to get bored […] that strategy [of vendor lock-in] doesn't work anymore.”
— Russell Kaplan [09:19]
“We provide this AI software engineer, Devin, that people can deploy against their code to really quickly transform it, improve it, modernize it, upgrade it.”
— Russell Kaplan [10:40]
“The differentiation in models is decreasing, not increasing over time; the structural advantage [goes] to the agent labs.”
— Russell Kaplan [16:00]
“You can just run the code, you can compile the code, you can test the code, and if it works or doesn't work, that signal you can use for reinforcement learning to make these things better.”
— Russell Kaplan [26:01]
“You have this enormous history that I think we have to respect when we're trying to make changes to real world systems.”
— Russell Kaplan [33:51]
“Every company, every organization, every government agency is going to be in control of its own destiny in a much bigger way.”
— Russell Kaplan [39:07]
“The writing the code is not really the bottleneck anymore, it's everything around that. So I think humans still have to understand the code we're putting into production. And the emerging bottleneck, it's actually in review.”
— Russell Kaplan [46:13]
“The level of abstraction that people are going to be operating on is going to grow really high, really, really sort of unexpectedly fast... increasingly we’re going to be able to optimize against that objective directly.”
— Russell Kaplan [48:39]
Personal Moment:
“My grandmother was, she was one of like the first female programmers in the country back when it was like in a very arcane, you know, very arcane activity of messing with punch cards... part of my hope for Cognition for government is we can go full circle and we can help bring the government back to actually where it once was, which was the true leader in technology.”
— Russell Kaplan [54:47]
Kaplan hopes that Cognition can help bring U.S. government technology back to its innovative roots, echoing the era when his grandmother was one of the country's first female programmers. [54:47]