Software Abundance for Government With Cognition's Russell Kaplan

ChinaTalk

Mon Mar 09 2026

Summary

ChinaTalk Podcast Summary

Episode: Software Abundance for Government With Cognition's Russell Kaplan
Date: March 9, 2026
Host: Jordan Schneider
Guest: Russell Kaplan (Co-founder of Cognition; former Scale AI and Tesla)

Episode Overview

In this episode, Jordan Schneider chats with Russell Kaplan about the challenges and opportunities in modernizing government software, the transformative potential of AI (especially Cognition’s “Devin,” an AI software engineer), and the impact of software abundance on public and private sectors. The discussion dives into technical, organizational, and policy aspects, with a focus on legacy systems, AI’s role in software engineering, procurement reform, and the road ahead for both government and enterprises.

Key Discussion Points & Insights

1. The State of Government Software

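Legacy Problems: The U.S. government spends more than $100B annually on IT, yet many critical systems still run on decades-old code (COBOL, etc.), and modernization is slow. Of 10 critical legacy systems a GAO study flagged in the 2010s, Kaplan recalls only about three having started modernization.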
“Tens of millions of lines of COBOL are powering our treasury, our Social Security Administration, and it's not getting better.” – Russell Kaplan [02:46]
Hiring Bottleneck: Few people can maintain legacy code; the number dwindles every year.
Procurement Hurdles: Regulations designed to reduce corruption have made it extremely hard to buy and upgrade software efficiently. FedRAMP compliance is slow and cumbersome.

2. Why Modern Programming Languages (and AI) Matter

History Lesson: Language evolution improves abstraction, accessibility, and efficiency (from punch cards to assembly to COBOL to Python, and now to AI).
“AI is actually the next logical rung on the ladder... It’s telling your computer what you want it to do, but in English in a way that’s really natural for everyone.” – Russell Kaplan [05:49]
Shift in Constraints: Earlier efficiency demands have faded; hardware improvements mean software now focuses more on ease and productivity.
AI’s Promise: Autonomous agents (like Devin) can optimize code, manage migrations, and handle large, tedious projects quickly.

3. The Vision of Software Abundance

Switching Costs Breakdown: AI will neutralize long-standing vendor lock-ins; it will become much easier, cheaper, and faster to change systems.
“When AI is going to do the switching and it’s going to work on it 24 hours a day and it’s not going to get bored […] that strategy [of vendor lock-in] doesn’t work anymore.” – Russell Kaplan [09:19]
Software Becomes “Like Water”: Expect software to flow more freely, with rapid adaptation to needs.

4. What Cognition is Building

Devin, the Autonomous Software Engineer: AI agent capable of grasping complex codebases, executing large refactors, and enabling faster system upgrades.
“We provide this AI software engineer, Devin, that people can deploy against their code to really quickly transform it, improve it, modernize it, upgrade it.” – Russell Kaplan [10:40]
Agentic IDE (Windsurf): Acquired in 2025; enables “digital coworkers” rather than just tools.
Focus on Partnerships: Particularly in government and large enterprises, Cognition often embeds engineering teams directly, not just delivering tools.

5. AI Models and the Market

Model Convergence: Leading AI models are becoming less differentiated over time; success relies more on integrating and orchestrating them than choosing the “best” one.
“The differentiation in models is decreasing, not increasing over time [...] the structural advantage [goes] to the agent labs.” – Russell Kaplan [16:00]
Cognition’s Eval System: Internally developed to rigorously benchmark model performance (“Junior Dev” and “Senior Dev” evals).
Best Practices: Use multiple models for improved results; single-model constraints underperform mixed approaches.
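
A toy illustration of the multi-model point above, not Cognition's eval suite: the model names, task categories, and scores below are all invented. It compares the best single model's average score with the average obtained by picking the strongest model per task category, which by construction can never be lower. It assumes you already know which model is strongest for each category, so the routed figure is an upper bound rather than a prediction.

```python
# Hypothetical eval scores (fraction of tasks passed) per model and category.
# Every name and number here is invented for illustration.
SCORES = {
    "model_a": {"refactor": 0.80, "debug": 0.60, "docs": 0.70},
    "model_b": {"refactor": 0.65, "debug": 0.75, "docs": 0.72},
    "model_c": {"refactor": 0.70, "debug": 0.68, "docs": 0.85},
}


def single_model_average(scores, model):
    """Average score when the agent is constrained to one model."""
    per_task = scores[model]
    return sum(per_task.values()) / len(per_task)


def routed_average(scores):
    """Average score when each category goes to its strongest model."""
    tasks = next(iter(scores.values())).keys()
    best_per_task = [max(scores[m][t] for m in scores) for t in tasks]
    return sum(best_per_task) / len(best_per_task)


best_single = max(single_model_average(SCORES, m) for m in SCORES)
routed = routed_average(SCORES)
print(f"best single model:    {best_single:.3f}")
print(f"routed across models: {routed:.3f}")
```

The gap between the two numbers grows with how much the models' strengths differ by category, which matches the remark in the episode that small differences "add up".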

6. Case Studies and Real-World Impacts

Brazil’s CNPJ Migration: A major government ID format change solved in 3 weeks instead of 2 years at a top Brazilian bank using Devin—illustrating transformative scale and speed.
“They were able to use Devin to get the bulk of that project done in three weeks instead of two years.” – Russell Kaplan [24:13]
CVE Mitigation: At some of the largest financial services firms, every CVE goes to Devin before it reaches a human; roughly 70% of its suggested fixes are accepted in one click with no changes needed.
“We try to auto remediate and we're right now at a roughly 70% fully automatic remediation success rate right now.” – Russell Kaplan [28:30]
Fraud and Anomaly Detection: Government agencies leveraging AI (and Cognition’s tools) to analyze large datasets, identify fraud, and augment oversight.

7. Technical & Organizational Limits

Context Window Limitations: Even the largest language models can only “see” a million tokens at a time; real-world legacy systems are orders of magnitude larger.
AI for Codebase Mapping: Combining machine learning with graph algorithms to understand and modernize massive, intertwined systems.
Bottleneck Shifts: Human understanding and verification (review) remain essential—at least until 2028, per Kaplan’s forecast.
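
One way to picture the context-window constraint is as a packing problem over a dependency graph. The sketch below is an illustration under stated assumptions, not a description of how Devin works: the module names, token counts, and the plan_batches helper are hypothetical. It only orders modules so that dependencies come first and groups them into batches that each fit a token budget.

```python
from graphlib import TopologicalSorter

# Hypothetical module graph: each module maps to the modules it depends on.
DEPENDS_ON = {
    "payroll": {"ledger", "tax_tables"},
    "ledger": {"records"},
    "tax_tables": {"records"},
    "records": set(),
    "reports": {"payroll"},
}

# Hypothetical size of each module in tokens.
TOKENS = {
    "records": 400_000,
    "ledger": 500_000,
    "tax_tables": 300_000,
    "payroll": 700_000,
    "reports": 200_000,
}


def plan_batches(depends_on, tokens, budget):
    """Group modules into batches that fit `budget`, dependencies first."""
    # static_order() yields every module after all of its dependencies.
    order = list(TopologicalSorter(depends_on).static_order())
    batches, current, used = [], [], 0
    for module in order:
        size = tokens[module]
        if size > budget:
            raise ValueError(f"{module} alone exceeds the token budget")
        if used + size > budget:
            # Close the current batch and start a new one.
            batches.append(current)
            current, used = [], 0
        current.append(module)
        used += size
    if current:
        batches.append(current)
    return batches


for i, batch in enumerate(plan_batches(DEPENDS_ON, TOKENS, 1_000_000), 1):
    print(i, batch, sum(TOKENS[m] for m in batch))
```

A real system would also need to carry forward some summary of earlier batches, since later modules depend on code the model can no longer see directly; that part is deliberately left out here.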

Memorable Quotes & Moments (with Timestamps)

“Tens of millions of lines of COBOL are powering our treasury, our Social Security Administration, and it's not getting better.”
— Russell Kaplan [02:46]
“AI is actually the next logical rung on the ladder... It's telling your computer what you want it to do, but in English in a way that's really natural for everyone.”
— Russell Kaplan [05:49]
“When AI is going to do the switching and it's going to work on it 24 hours a day and it's not going to get bored […] that strategy [of vendor lock-in] doesn't work anymore.”
— Russell Kaplan [09:19]
“We provide this AI software engineer, Devin, that people can deploy against their code to really quickly transform it, improve it, modernize it, upgrade it.”
— Russell Kaplan [10:40]
“The differentiation in models is decreasing, not increasing over time [...] the structural advantage [goes] to the agent labs.”
— Russell Kaplan [16:00]
“You can just run the code, you can compile the code, you can test the code, and if it works or doesn't work, that signal you can use for reinforcement learning to make these things better.”
— Russell Kaplan [26:01]
“You have to have this enormous history that I think we have to respect when we're trying to make changes to real world systems.”
— Russell Kaplan [33:51]
“Every company, every organization, every government agency is going to be in control of its own destiny in a much bigger way.”
— Russell Kaplan [39:07]
“The writing the code is not really the bottleneck anymore, it's everything around that. So I think humans still have to understand the code we're putting into production. And the emerging bottleneck, it's actually in review.”
— Russell Kaplan [46:13]
“The level of abstraction that people are going to be operating on is going to grow really high, really, really sort of unexpectedly fast... increasingly we’re going to be able to optimize against that objective directly.”
— Russell Kaplan [48:39]
Personal Moment:
“My grandmother was, she was one of like the first female programmers in the country back when it was like in a very arcane, you know, very arcane activity of messing with punch cards... part of my hope for Cognition, for government is we can go full circle and we can help bring the government back to actually where it once was, which was the true leader in technology.”
— Russell Kaplan [54:47]

Important Segment Timestamps

State of Government Software & Procurement: [00:20–03:23]
Programming Evolution & Role of AI: [04:42–07:17]
Software Abundance Vision: [08:05–09:50]
About Cognition and Devin: [09:56–11:44]
AI Model Market & Evaluation: [14:50–20:56]
Enterprise Case Studies (e.g., Brazil): [23:46–25:16]
CVE/Security Automation: [27:01–29:07]
Understanding & Modernizing Legacy Systems: [31:10–34:37]
Government Fraud & Modernization: [35:09–37:37]
Comparison: Government vs. Enterprise vs. Tech-native Firms: [37:44–40:01]
Changing Procurement & Organizational Leverage: [40:24–43:54]
The Future of Software Engineering (Review, Abstraction): [46:07–49:11]
Personal History—Programming Family Lore: [54:29–56:24]

Closing Themes & Calls to Action

Policy & Imagination: Policymakers and public sector workers must continuously update their mental models and experiment with new software tools to keep pace.
Empowerment: Smaller, more cross-functional teams can now deliver much more, amplifying the impact and creativity at every organizational level.
Hiring at Cognition: They're seeking engineers—especially those with experience in the public sector and forward-deployed, customer-facing roles—as well as those with the required clearances for classified work.

Final Thought

Kaplan hopes that Cognition can help bring U.S. government technology back to its innovative roots—echoing the era when his grandmother wrote some of the country’s earliest code for public sector projects:
“Part of my hope for Cognition, for government is we can go full circle and we can help bring the government back to actually where it once was, which was the true leader in technology.” [54:47]

For more insights like these, check out the ChinaTalk newsletter.


Transcript

A (0:00)

Software abundance for government. Why do we need it and how do we get there? To discuss, we have on Russell Kaplan, co-founder of Cognition, who previously spent time at Scale and Tesla. Thanks to Cognition for bringing us this episode. And Russell, welcome to ChinaTalk.

B (0:15)

Thanks for having me, Jordan. Excited to be here.

A (0:17)

So what is wrong with software and government?

B (0:20)

We have a lot of problems with software in the government, despite the government being actually a lot of the source of innovation in software for a long time. But, you know, today the state of the world is, it's pretty sad. As a citizen, you know, you interact with software for the government and a lot could be better. You know, just to put some numbers on it, there is more than $100 billion a year spent on IT for the US government. A lot of these systems are ancient. The GAO did a study finding that, you know, in the 2010s there were 10 critical legacy systems we needed to modernize. Less than three, or I think only three of them have even started the process of that modernization. And as a country, we're spending a lot of money and not getting the same results that we see in the private sector. And I think what's happening now with AI and software engineering, it's changing the private sector. But I'm personally really excited about how much it could change for the country as well. And I think it's actually really important for, you know, for the sort of the next generation of the United States to get this right.

A (1:28)

You mentioned the $100 billion a year number. Like, what does $1 get you in the private sector? And how does that kind of comp over to some federal or state department spending that money?

B (1:42)

So yeah, the private sector, the way we, we buy software is, you know, we have a problem and we see, okay, what's the best tool in the market for that problem? And we buy it, whether it's a SaaS solution for my CRM or it's infrastructure for scaling my database. The market tends to be more efficient. For the government, it's a different story. It's really challenging for the government to purchase software directly. There is a much higher kind of compliance and regulatory hurdle for software vendors to even start working with the government. You know, we face this at Cognition. Getting to FedRAMP High was a journey for us. But even once you're there, there's a lot of indirection. A lot of these systems were designed with good intent, to make sure that there's no corruption, that people are having RFP processes that let government buyers get the best price for what they want. But the net result of this system is that it's enormously slow to get software into the government. And in particular to reuse software. Like, a SaaS tool has a much easier time being bought by a private sector company versus a government agency, which often needs to have a much higher degree of ownership of the product they're using. So I mean, the net result of this, if you look at some of the data, we're still powering most of the country, the critical systems of the country, with ancient code. You know, tens of millions of lines of COBOL are powering our treasury, our Social Security Administration, and it's not getting better.

B (17:28)

Yeah, so on the Mandate of Heaven piece, I think these things are cyclical. And one thing that's interesting in software engineering in particular is that the right form factor for building software is constantly changing based on, in part, the underlying capabilities of the models. And so when we, for example, when we launched Devin in March of 2024, it was just at the edge of possible, I would say, to have an agent that you could really delegate work to and come back. And in fact, honestly, it wasn't even really useful for us for like another three months. Between when we, when we built this prototype that we shared with the world, it took about three months for us to then use it enough internally at Cognition that Devin became the number one contributor to Devin. So that was like a three month lag and then there was another several month lag before it actually started becoming deployed in production settings, useful for customers. And what's happening is like as, as the models improve, the form factor for how to use them is constantly changing. So in coding, we went from tab completion (think you're writing a Word doc and you hit tab to get the next response, but in your code editor) to a local chat experience where you can sort of chat with your code base and ask questions and do local agents to now, increasingly, okay, we've got autonomous agents, we can delegate work. And by the way, the form factor might look completely different again six months from now. So I think the mandate of heaven is actually going to probably keep changing constantly based on who is sort of first or best at the next form factor. And every new form factor is like a new front to battle. But as far as evaluating the models themselves. So we built an internal, kind of pretty comprehensive evaluation suite. The original draft of it was called Junior Dev, evaluating: could these models act like junior developers? 
We have a fork of it now internally that's more like a senior dev because the models keep getting better and we work with every lab to basically before they release models, we run our evals and we give them feedback and we say, hey, we think you guys are strong here, you're weak here. Here are some ideas for how you can make this better. And we have a great partnership with every lab about this. I think many of them have told us that we have the best private evaluation suite for agentic coding tasks. That's like external to, you know, sort of independent from a model provider. So we care a lot about evals because we, you know, we find our customers, they want the best models. The other interesting data point is that no matter what task you give, the eval scores are consistently worse if you constrain the agent to use one model versus if you can use multiple. To your point, there are differences, right? So for example, whether it's personality or whether it's macro context understanding or details, these little differences add up.

B (20:56)

Yeah, so I think, I think the sort of the structural equilibrium is one of model convergence. You know, the capabilities are increasingly converging to basically similar levels of performance in every domain. And I think if you look at sort of why would that happen? I mean the trend lines are in that direction, but why would that happen in steady state? First you have the scaling laws, right. So it takes exponentially more cost inputs for linear gains in any benchmark you choose. Right. And so if you're operating at small scale, it's easy for one firm to spend 100 times more than another firm if you are $1 million versus $100 million. But once you're at, okay, we're all spending hundreds of billions of dollars, it's hard to get a multi-order-of-magnitude leap over your competitors. So I think there's a kind of a scaling laws reason that these things are convergent. There's also just the practical reality that non-competes are unenforceable in California and people are moving from one lab to the other all the time. I think the half life of a proprietary algorithmic insight is probably like three months, we might guess. Even within the labs you have one person working at OpenAI and their partner working at Anthropic and who knows. So I think the half life of proprietary IP in Silicon Valley is short. And so if you get to this state of the models roughly converge, maybe there's some personality differences, not more capabilities, more personality that could persist. But I think the last point that's relevant for every task is we have this mantra in Silicon Valley that oh, we always want more intelligence, more intelligence, more intelligence, more intelligence. We've got to build clusters of compute in the galaxy, to harvest energy of every star to have the most intelligence. And I actually do think there are use cases for ever increasing amounts of intelligence. 
But I think this also sort of ignores the fact that for any given application domain, often you reach a threshold of intelligence saturation where for that use case it's enough, you know. And I can tell you, for example today, if you said, hey, let's build a simple static front end site for ChinaTalk, any model, any frontier model would do that well today. And so once you're at the level, for a given task, where that task is intelligence-saturated, you don't really care which model you're using. You care about: okay, it works. So now is it fast and is it cheap? And I think increasingly more and more domains are going to actually see this intelligence saturation, at which point what model you're using becomes less relevant and the interface and the experience around it and how it kind of drives outcomes end to end for your company, for your government organization matter more.
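
The scaling argument above can be made concrete with a toy model. Assume, purely for illustration, that a benchmark score rises by a fixed number of points for every 10x increase in spend; POINTS_PER_10X is a made-up constant, not a measured value, and the functional form is an assumption rather than anything stated in the episode. Under that assumption the same lead costs a fixed multiple of spend, so its absolute price grows with the starting budget.

```python
import math

# Assumed toy relationship: score grows linearly in log10(spend).
POINTS_PER_10X = 5.0  # hypothetical points gained per 10x more spend


def score_gain(spend_from, spend_to):
    """Benchmark points gained by moving between two spend levels."""
    return POINTS_PER_10X * math.log10(spend_to / spend_from)


def spend_for_gain(spend_from, gain):
    """Spend needed to be `gain` points ahead of a rival at `spend_from`."""
    return spend_from * 10 ** (gain / POINTS_PER_10X)


# A 10-point lead always costs 100x the rival's spend in this toy model.
small = spend_for_gain(1e6, 10.0)    # rival spends $1 million
large = spend_for_gain(1e11, 10.0)   # rival spends $100 billion
print(f"extra spend at small scale: ${small - 1e6:,.0f}")
print(f"extra spend at large scale: ${large - 1e11:,.0f}")
```

At the small scale the required outlay is on the order of $100 million, which is within reach of a well-funded firm; at the large scale it is on the order of $10 trillion, which is the sense in which a multi-order-of-magnitude leap becomes impractical.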

B (27:05)

Yeah, yeah. So a lot of people are worried about security and AI. And I think the worries are real in the sense that people are using AI in all sorts of ways that they haven't before. And attackers are actually using this to discover vulnerabilities in really novel ways that would have been really hard to do manually previously. And what's happening now on the other side is that the defenders are kind of fighting, are fighting this with AI. We have great existing tooling for sort of scanning and detecting vulnerabilities via like traditional static analysis. You know, think like a SonarQube or a Veracode or a Snyk or anything where you can take a code base and say, okay, what's my risk surface area? What happened a few years ago is you would do that and then you would get thousands of alerts. And sometimes you get tens or hundreds of thousands of alerts or even millions of alerts at really large organizations. And so, like, there are large organizations in the world that today have hundreds of thousands of open alerts of, hey, this might be insecure here. Which if you think about that, it's kind of terrifying but it's also, it's challenging because we just don't have the capacity to go read all of those and staff, you know, staff the team to go fix them. There's just, there's just more problems than people. And what we're seeing with, with Devin and actually just with AI more generally is that this is like a really good use case because you've got tons of alerts. It's pretty toilsome. They need to be triaged and AI can do the triaging actually quite well. And so, you know, some of the largest financial services firms in the world, for example, they, they apply Devin to every CVE. 
Every single vulnerability that's caught in their entire code base, before even going to a human, it goes to Devin, it goes to the AI agent, and then we try to auto remediate and we're right now at a roughly 70% fully automatic remediation success rate right now. So the code change suggested by Devin can be accepted and approved in one click, no changes needed. And that should only go up obviously as the models keep getting better.

B (51:04)

I think, I think that is an area that we help with a lot right now because usually the customers understand their problems, but they also don't necessarily have the best mental model of exactly the full universe of problems that are addressable with AI today. And the thing that's really interesting about Devin and just agents in general is once you're plugged into the code, you can see all the problems. The sort of problem discovery process that used to take lots of conversations, lots of challenges within software, whether it's the security vulnerabilities we talked about or something else, is getting increasingly automated. So a typical engagement for us might be a government organization or a large enterprise would come in and say, okay, we have these three outcomes we want to achieve and we think we can do it better with AI. We're going to modernize this legacy system and we want to do it in weeks or months instead of years. We need to build this new product, this new capability and we want it as fast as possible and it's going to grow our business this much. Or we need to structurally improve our, our testing coverage, our validation, our security posture. And here are the metrics. So they have some set of outcomes. And what we find is that inside each organization there's actually a really wide distribution of how much people are leaning in to using new generation tools to do their job. And in every organization, it doesn't matter if this is what you think of as the most legacy, old-school organization in the world, there are people in those organizations that are excited about the future and want to try new things and learn, always, consistently, 100% of the time. And those people I think are more empowered than ever to have extraordinary impact. There's also folks who are like, I've been doing it this way for 30 years and I'm super skeptical of all this stuff. And I think for those folks, the evidence is increasingly growing that it might be worth taking a peek at.