
Podcast Host / Announcer
Are you passionate about software development and the tech industry? Software Engineering Daily is looking for a new podcast host to grow its hosting team. In this role, you'll help shape the show's editorial direction and interview engineers, founders, hackers, and tech leaders. Podcasting experience is a plus, but not required. Curiosity, great communication skills, and a genuine interest in the craft of building software are what matter most. If this sounds like you, reach out at editor@softwareengineeringdaily.com.

AI-assisted programming has moved far beyond autocomplete. Large language models are now capable of editing entire codebases, coordinating long-running tasks, and collaborating across multiple systems. As these capabilities mature, the core challenge in software development is shifting away from writing code and toward orchestrating work, managing context, and maintaining shared understanding across fleets of agents.

Steve Yegge is a software engineer, writer, and industry veteran whose essays have shaped how many developers think about their work. Over the past year, Steve has been exploring the frontier of agentic software development, building tools like beads and Gastown to experiment with multi-agent coordination, shared memory, and AI-driven software workflows. In this episode, Steve joins Kevin Ball to discuss the evolution of AI coding from chat-based assistance to full agent orchestration, the technical and cognitive challenges of managing fleets of agents, how concepts like task graphs and git-backed ledgers change the nature of work, and what these shifts mean for software teams, tooling, and the future of the industry.

Kevin Ball, or K. Ball, is the Vice President of Engineering at Mento and an independent coach for engineers and engineering leaders. He co-founded and served as CTO for two companies, founded the San Diego JavaScript Meetup, and organizes the AI in Action discussion group through Latent Space. Check out the show notes to follow K. Ball on Twitter or LinkedIn, or visit his website, K Ball LLC.
Kevin Ball
Steve, welcome to the show.
Steve Yegge
Hey K. Ball, thanks for having me on.
Kevin Ball
Yeah, I'm excited to go into this. I was mentioning before, I've been a fan of your writing for a very long time, so I'm really interested to see how this conversation goes. But let's have you introduce yourself to our listeners who may not be as familiar with your writing. How do you describe yourself, and how did you get to where you are today?
Steve Yegge
Yeah, nobody's ever asked me that before, and you gave me exactly 12 seconds of preparation for this, so thank you for that. How do I describe myself? Yeah, an industry vet for sure, right? Like I've done this a long, long, long time. I actually started programming when I was 17, and I turned 57 a couple days ago, so it has been 40 years now. Yay. So I've seen a lot, right? I've seen a lot of transformations and that kind of thing. I picked up blogging at Amazon because I was trying to figure out how to convince an organization of 800 engineers of certain things, I guess, that I thought they were thinking about wrong. And I started ranting, and it picked up in popularity, and then, I don't know, that just became a thing for me. Right, the blog rants. The drunken blog rants. I actually quit drinking over nine years ago. I'm going for a 10-year break from drinking, and I will.
Kevin Ball
But you've maintained the rant.
Steve Yegge
But I kept the rant going, actually. And, well, I'll just say weed's legal in our state, let's leave it at that. So anyway, yeah, that's me in a nutshell.
Kevin Ball
All right, well, one of the reasons I'm excited to talk with you today is that in your rants you've been one of the people laying out a lot of the bleeding edge of the transformation going on in our industry right now. Let's start maybe by walking through that evolution, and I'd love to get some of the play-by-play from you as you were thinking through things. So I'm going to go back a bit. I think you had a blog post that called out a pattern we were starting to see, which you called CHOP, chat-oriented programming, back in 2024.
Steve Yegge
I think I remember that.
Kevin Ball
Was that the start of LLM stuff for you or was there something even predating that?
Steve Yegge
I mean, like, look, when ChatGPT 3.5 came out, I was just shocked that it could write Emacs Lisp functions that were pretty good, right? I mean, just single functions, and that was about the extent of its abilities, but still, it was like, whoa, Elisp. You know, that's a pretty edge case out there, right? And so, I don't know, I think chat was probably the first time where I started feeling like I was taking crazy pills and nobody was ever listening to me, because everybody uses chat today. The Claude Code crowd is what, 10, 15% of developers now? 20%? You know, or less. It's growing, right? But still, most people use chat, and all of 2024 I was trying to get people to use chat, right? And they're like, no, completion acceptance rate. You remember that metric?
Kevin Ball
I do. VS Code is still watching it, from what I can tell.
Steve Yegge
Whoa. Well, I mean, yeah, there's probably still developers that are using completions. So, yeah, I started kind of feeling like I could see into the future, right? Because, look, when you've done this for 40 years and you've chased productivity for 40 years, trying to make yourself go faster, you get a sense for when you're going faster, right? And so, yeah, there's a lot of speed bumps, and the AI does a lot of things wrong, and so on and so forth, but it was like finding the early hover bike in the Zelda game, where you didn't have the battery yet, but it was still faster than walking. And everyone complains the hover bike's crashing all the time, but it's like, dude, it's faster than walking. You have to use it now. And it's just been getting better since then, right? So chat put questions in a loop, then Claude Code put chat in a loop, and Gastown puts Claude Code in a loop, and so does Ralph Wiggum, from Geoffrey Huntley. So we're just seeing basically AI, we're just multiplying it. More AI times more AI times more AI. And that is the solution to everything. And it's really making people mad, right?
Kevin Ball
Yes.
Steve Yegge
Hacker news threads and stuff like that. Right.
Kevin Ball
Well, and I think we'll get into that. But I'm curious. So along that route, you said GPT 3.5 was the first eye opener when suddenly the scooter is going faster than walking. Have there been other turning points for you along that journey?
Steve Yegge
The next tipping point was very much GPT-4o, right? Because that one... what we had been struggling with, when we were building coding agents that only had chat built in, but we would do the copy and paste and stuff, was that once a file got up to around 800 lines, there wasn't a lot of fidelity in reproducing it when they would make changes, right? Kind of like Nano Banana can be now when it's editing its own stuff, it just starts to blur. And so the tipping point was that GPT-4o was able to reproduce a thousand-line file with perfect fidelity and make a one-line or a one-character change. That was huge, because most source files in the world are a thousand lines of code or less, or should be. And so now we're talking, right, about GPT being able to edit all of the files in the world and make small, simple changes. Which immediately... at that point, I don't even remember when that was. It was before last year, right? It was the middle of 2024.
Kevin Ball
Yeah, mid to late 2024.
Steve Yegge
I think that's already when the... oh, you could farm this. You know, I'm a gamer. I mean, come on. You immediately gamify everything. And then there was another one with Sonnet 3.7. It was like, oh, whoa, right? That was Claude Code. That was my biggest tweet ever. I had like 300,000 views or something, where I was just like, ooh, Claude Code is neat. And then Opus 4.5 was the next big one, right? This is the one that got Gastown launched. It couldn't have launched without Opus 4.5; Opus 4.5 is what got it off the ground. And the half-life on Anthropic models, if you've been counting, was about four months between models at the beginning of 2025, and now it's up to about two months between models. So we're probably going to see an Opus 5 drop real soon, right? So all the naysayers... I mean, we'll talk about it, but you asked about these tipping points. A lot of the people who are looking at this problem right now and discounting it have approximately a three-month window: before and after today, they're looking back about three months at what the last generation of model was, and they don't know very much about how Isaac Newton invented differential calculus. Like, if you just zoom in, right, they're on this really steep slope and they don't see it, they just see the derivative. And so what's happening is, I have had the view since ChatGPT 3.5, and really for 40 years, of this acceleration, and now I see it starting to accelerate really fast. And Opus 4.5 is the splash, the big boulder that has hit the pond, where people are realizing that it doesn't need to get any smarter now at all.
Kevin Ball
Exactly. So I have in some ways a similar journey. And I would say starting from about Sonnet 3.5, we were at a place where I at least started to say: even if they never released another model, and we just kept learning how to use these and building the tooling around them, the world of software engineering is forever changed.
Steve Yegge
It would be now. It wouldn't have made it into certain domains. 3.5 and 3.7 had their limitations, a certain size of mountain they could chew through, and you would have to break things into that size in order to use them, but it was doable. But now with 4.5, that mountain size has gotten much larger, and it's obviously just going to continue. So, I mean, look, we're looking at a new world where... I mean, the guy who invented Redis, Antirez, did you see his post? He had this realization just a few weeks ago. He worked with Opus 4.5 and Claude Code and was like, well, it doesn't make any sense for us to write code by hand anymore. And this is a really, really... this is the biggest horse pill that the industry has to swallow right now, right?
Kevin Ball
Yes. And I think there'll be some fun times grappling with the implications of that. But let's maybe look at our current moment where things are, and I would say, like in this evolution, as you highlight, the sort of current threshold is how do we coordinate across fleets of agents, whether they're working in parallel or even in series, where you're joining to do a set of different things and you've done some projects around this. So let's maybe start with beads, which if I blink, that was just three months ago. Yep, three months ago. Has definitely gotten a lot of popularity. There's also a bunch of related concepts, but let's sort of talk through beads. What are they? What is it as a primitive for this fleets of agents world we're living in?
Steve Yegge
Yeah. So I don't know if you saw, but yesterday Anthropic launched an update: they retired TodoWrite and they launched Tasks, which they credited as being inspired by beads, which I thought was very nice of them. And I also get why they didn't just use beads, right? We can talk about that. But basically, beads is a task tracker, a better to-do list for your agent. But it has three properties that make it really, really interesting. One of them is that it's a graph. So it's a task graph, which is a lot like your work graph and your implementation plans and your Gantt charts, and kind of how you manage knowledge work in general. So it kind of accidentally captured the ability to break all work into bite-sized pieces that will actually scale up as cognition scales up. So it's really interesting how beads are just dividing work into a graph, which we've always done, except you add two more ingredients. You add SQL. I mean, graphs and databases have never been that great together, but they've been working on it for 40 years and it's pretty good now; it's just a pile of nodes and edges, right? And so you add SQL. The databases love SQL, and the LLMs understand SQL.
Kevin Ball
You can get shockingly close to beads just being like, use SQLite to track your thinking. Go.
Steve Yegge
You can, right? But beads introduces graph edges that we humans never would have put in or thought of. Claude helped design beads, and because of that, it's got stuff the AIs feel is very important, and they use it all the time. Like discovered-from, which tells you who was working on what when this bead was opened. They love that for the forensics, for understanding how the work unfolded. Because beads is the surface of work as it's getting executed. All the closed beads, the integral under that surface, is the work that you've done, and all the open beads are the remaining work. And beads itself tracks that surface generally. And the third component to it that makes it completely magical is git. It is a git ledger of all of your work. So you've broken it into bite-sized, addressable pieces that can refer to each other, there's a graph structure to it, it's queryable with a database, and it's all on a git ledger, which is just shockingly useful, because you never lose them, ever. The history is always there. You can always reconstruct if there was a problem, right? And it gets better than that, because you can actually start looking at these ledgers and determining how well agents have done over time. You can even see your own work on the ledger. It's like a portable resume for you.
It's really wild. So beads was a really interesting kind of a discovery.
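To make the graph-plus-SQL idea concrete, here's a minimal sketch of a bead-style task graph in SQLite. This is an illustration only; the table and column names are made up, not the actual beads schema:

```python
import sqlite3

# Hypothetical bead-style schema: tasks as nodes, dependencies as edges.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE beads (
    id TEXT PRIMARY KEY,
    title TEXT,
    status TEXT DEFAULT 'open'   -- 'open' or 'closed'
);
CREATE TABLE edges (
    child TEXT,                  -- the bead that is blocked
    parent TEXT,                 -- the bead it depends on
    kind TEXT                    -- e.g. 'blocked-by', 'discovered-from'
);
""")
conn.executemany("INSERT INTO beads (id, title, status) VALUES (?, ?, ?)", [
    ("b-1", "design schema", "closed"),
    ("b-2", "write migration", "open"),
    ("b-3", "ship feature", "open"),
])
conn.executemany("INSERT INTO edges VALUES (?, ?, ?)", [
    ("b-2", "b-1", "blocked-by"),
    ("b-3", "b-2", "blocked-by"),
])

def ready_beads(conn):
    """Open beads whose blockers are all closed: the work an agent can pick up now."""
    rows = conn.execute("""
        SELECT b.id FROM beads b
        WHERE b.status = 'open'
          AND NOT EXISTS (
            SELECT 1 FROM edges e JOIN beads p ON p.id = e.parent
            WHERE e.child = b.id AND e.kind = 'blocked-by' AND p.status != 'closed'
          )
        ORDER BY b.id
    """).fetchall()
    return [r[0] for r in rows]

print(ready_beads(conn))  # b-2 is unblocked; b-3 still waits on b-2
```

The point is that "what can I work on next?" becomes a single SQL query over the graph, rather than re-parsing a pile of markdown files every session.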
Kevin Ball
Yeah, I think it's worth digging into a few of those pieces. But let's maybe start conceptually, for a developer who's used to coding and managing work. If we stay on beads and don't talk about Gastown yet: what is the cognitive shift that you make as a developer if you start using beads with your agent?
Steve Yegge
Well, if you're already using an agent, then it's probably already using to-do lists and markdown files. Probably. What else would you use? Maybe you have a wiki or a database. Okay. And the problem with all of those solutions is that they don't have git. Or maybe you're using markdown files and you are using git. The problem with that is that it doesn't have a graph structure that's queryable in a database, and the LLM has to read and parse the markdown files to get that graph structure every single time it looks, and they get out of date, et cetera. So if you're using an agent and you're really leaning into it, then as soon as you try beads... literally, you just try it. And people reach out to me from all over the world, K. Ball, every day. I had a colleague meet with me just two days ago and he was like, I started using beads, and yeah, it's a huge unlock, but I don't understand why, right? And it was because it's like catnip for the LLMs. It's like candy for them, it's memory for them. And as soon as they get what it provides, they just want to use it. They get kind of mad if you try to take it away from them, right? Because you'll never have to lose any work again. You know how LLMs are: they're focused on the thing you gave them, and they'll notice, oh, by the way, your other room is on fire there, but it's not really my problem, so I'm just going to focus on this, right? They disavow work. They say, someone completely unrelated to me broke the build. It was them, in the previous session, right? That doesn't have to happen anymore with beads, because they're like, oh, I see, this is a problem, I'll file a bead for it, right? And you can see how this ties into agent orchestration, because as you're piling up beads, you're piling up a work backlog that you can act on, depending on how well specified the bead is.
Because you can put in a spec if you want; the bead can have all these fields and comments and design and whatever, right? So if it's really well specified work, you can automate it: you can give it to an agent and just have it do it, and have another agent code review it, and then check it in and you're done, right? I mean, as long as it passes all the tests. So you can see people are using beads as a substrate for agent orchestration. It's a memory, a shared memory, one that federates through git, which means it acts like a distributed database for your agents on AWS or GCP or Azure or whatever hyperscaler you're running on, right? They can all communicate with each other with beads. And you don't need a central hosted service. It's all going through your git repo. It's wild.
Kevin Ball
No, that's absolutely wild. So for going through git, then... I haven't looked at the implementation of beads. Are you essentially storing beads in, like, SQLite, so it's just in the file system right there? Or do you have a proprietary interface? How are you actually managing it in git? A task graph, it's in SQL, but it's also git. So is it just SQLite files?
Steve Yegge
Yeah. So I did the stupidest possible thing, which was I didn't do any due diligence. I didn't realize how useful this was going to be. I wanted git, the LLM, Claude, wanted SQL, and so we decided we were just going to cram them together in the worst possible way, right? There's a JSON file, one line per issue, and it has merge conflicts all the time, and it gets slurped into the database, and there's a daemon, and it gets stale. It's a two-tier architecture, it's horrible. And it's all going away in the next version release, which will come out maybe this weekend, right? Had I done my due diligence, I would have realized that a git database is what I need. I need a database and I need git, I need versioned data sets, right? And it turns out somebody has solved this problem: the Dolt team. You laugh, ha ha ha, like I knew this. Well, I didn't know, but it turned out to be an old buddy of mine from Amazon, Tim Sehn, who started Dolt, right? And I've got friends that are working there, and I had no idea that they even had this thing. But yeah, beads is going to switch to that, and it's just going to fix everything.
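The two-tier design Steve describes (a git-friendly JSON-lines file slurped into a query database) can be sketched in a few lines. This is an illustration only; the field names and format are invented, not the real beads file format:

```python
import json, sqlite3

# Sketch of a two-tier ledger: a git-friendly JSONL file (one issue per line)
# as the source of truth, slurped into SQLite for fast queries.
ledger = "\n".join(json.dumps(b) for b in [
    {"id": "b-1", "title": "design schema", "status": "closed"},
    {"id": "b-2", "title": "write migration", "status": "open"},
])

def slurp(jsonl: str) -> sqlite3.Connection:
    """Rebuild the query tier from the ledger (roughly what a sync daemon would do)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE beads (id TEXT PRIMARY KEY, title TEXT, status TEXT)")
    for line in jsonl.splitlines():
        b = json.loads(line)
        conn.execute("INSERT OR REPLACE INTO beads VALUES (?, ?, ?)",
                     (b["id"], b["title"], b["status"]))
    return conn

conn = slurp(ledger)
open_count = conn.execute("SELECT COUNT(*) FROM beads WHERE status='open'").fetchone()[0]
print(open_count)  # 1
```

The one-line-per-issue layout keeps git diffs small, but as Steve says, two sides editing the same line still produces merge conflicts, which is what motivates moving to a git-native database.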
Kevin Ball
I mean once again I've seen a lot of people using SQLite, but the challenge with that and git is also merge conflicts, right? Merge conflicts everywhere. So yeah, having a Git native database makes a ton of sense.
Steve Yegge
I mean, the three-way merge goes away, bd sync goes away, the daemon goes away, the whole thing. You still have the git export, you still have the federation. Dolt federates exactly the way beads did. It's weird, it was like I was following in their footsteps, right? But they did it right, and they put 10 years into it, and it's embeddable in Go, so it just happens to be embeddable in beads, right? It's just going to be one minor release, you won't even notice. It's just going to get better, and it enables a bunch of stuff, like field-level merge resolution instead of issue-level, and all kinds of new history, new dimensional looks at things that we weren't able to get before. Key-value stores and things. People are starting to use beads... beads is a data plane, man. It's a data plane. It's nuts. One contributor put in a key-value store, and I was looking at it, trying to wrap my head around it, and I was like, yeah, it makes total sense if you have agents using this as their memory, right?
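The field-level merge resolution Steve mentions is easy to sketch. The function below is a hypothetical illustration of the idea, not Dolt's actual algorithm: merging per field lets two sessions change different fields of the same bead without a conflict, where issue-level merging would have flagged one.

```python
# Three-way merge per field: a side that changed a field wins; if both sides
# changed the same field differently, that is a real conflict.
def merge_fields(base, ours, theirs):
    merged, conflicts = {}, []
    for key in base:
        o, t = ours[key], theirs[key]
        if o == t:
            merged[key] = o
        elif o == base[key]:
            merged[key] = t          # only theirs changed it
        elif t == base[key]:
            merged[key] = o          # only ours changed it
        else:
            conflicts.append(key)    # both changed it differently
            merged[key] = o
    return merged, conflicts

base   = {"status": "open",   "assignee": "alice", "title": "ship feature"}
ours   = {"status": "closed", "assignee": "alice", "title": "ship feature"}
theirs = {"status": "open",   "assignee": "bob",   "title": "ship feature"}
merged, conflicts = merge_fields(base, ours, theirs)
print(merged)     # both edits land: status closed, assignee bob
print(conflicts)  # empty: an issue-level (whole-line) merge would have conflicted
```

With a line-per-issue JSONL file, both of these edits touch the same line and git flags a conflict; at field granularity they compose cleanly.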
Kevin Ball
They want to be able to share things and have a reliable way to look it up and all that. Yeah, yeah, yeah.
Steve Yegge
A bead is a task. It's relatively heavyweight, even though it's lightweight compared to a GitHub issue or a Jira ticket, right? But a key-value pair is super lightweight. So I was like, yeah, let's do it. And Dolt supports them really well, right? So I'm just so happy with the way beads is going. The code base is garbage right now. It's vibe-coded, which means that you have to run code review passes on it constantly. And I got behind, I'm like a month behind on code reviews, so I'm sure the code is just garbage, right? It works, it passes the tests, people are using it. But within a few weeks, within, I don't know, a week, I'll get it all cleaned up, we'll be on Dolt, and beads is going to be a thing of beauty.
Podcast Host / Announcer
In mobile application security, good enough is a risk. GuardSquare uses advanced multi-layered code hardening techniques, automated runtime application self-protection, and mobile application security testing, combined with real-time threat monitoring, to deliver the highest level of mobile app security. Discover how GuardSquare brings all these together to provide mobile app security for your Android and iOS apps without compromise at www.guardsquare.com. Why is there always a meeting bot in your Zoom call? Blame Recall AI. Recall AI powers the meeting bots and desktop recording apps behind products like Cluely, HubSpot, and ClickUp. They handle the hard infrastructure work, capturing clean recordings, transcripts, and metadata across Zoom, Google Meet, Microsoft Teams, in-person meetings, and more, so developers don't have to build it themselves. If you're building a meeting note-taker or anything involving conversation data, Recall AI is the API for meeting recording. Get started today with $100 in free credits at Recall AI.
Kevin Ball
So I want to now step back a bit. We talked about beads, the particular thing, and you alluded to a few different pieces in there about changes in the way that we're approaching code that I want to talk about before we get all the way into Gastown, which I think takes this up a few orders of magnitude. So one of the things you talked about was, hey, if a bead is well enough specified, it can just get farmed out, taken care of, et cetera. How do you think about specification in this agent world? And where are you, or another human, in the loop? How do you think about those things?
Steve Yegge
Yeah, I wish that I had more time to think about it, right? This question is even more fundamental to Ralph loops. I've been talking to Geoff Huntley a lot about this. And for Ralph loops, you really have to specify your acceptance criteria very thoroughly, or else you run the risk of getting the wrong thing, right? And so the way Gastown approaches it, the way I approach it, my workflow, is basically: we're only ever going to implement everything to a first approximation, unless it's really important, like the Dolt stuff, right? We really, really push hard on that. But everything else is successive iteration. We're just going to get it out there, fix bugs in it. You know what I mean?
Kevin Ball
Yeah. Just wondering about how you think about specification and in your workflow with beads, for example, when do things bubble up to a human versus running autonomously?
Steve Yegge
Yeah. So you have all these different workflows that you can support, and mine tend to be so iterative that I just rarely get a lot of specification time in. But it's a really interesting question. Look, you just have to make time for it, right? Gastown is such a powerful engine that you basically spend all of your time in one of two modes. You're minimaxing: you're either minimizing or maximizing context windows. I had this really interesting discussion last week with some folks at Anthropic. I met some very lovely teams at Anthropic who were interested in Gastown, because they see it as exposing a lot of bugs in their model, right? Because a lot of the Gastown workarounds are things that the workers probably ought to be doing better if they understood that they were factory workers. But that's not something they've ever been trained on, right? So I was talking to them, and they said that there's an interesting kind of split inside of Anthropic, where some people love to minimize context use. So it's like: use the smallest possible task decomposition, right? Throw away the ephemeral stuff. Just do one task at a time, because you get the benefits of your context window: your costs expand quadratically as the token count grows, and also the performance tanks after a very small size. And so they're all about performance and cost, which is great. I'm going to tie back to your specification question in a moment, I promise you. Okay. And the other group is the maximizers, the context maximizers. And what they do is they load up the context window heavy with lots of rich information and instructions, you know what I mean? Because LLMs perform really well and make really good decisions, especially strategic decisions, when they understand why they're doing something and not just what you want, right? And they said, so which one are you, Steve? And I was like, well, interesting that you say that.
You've just described Gastown's polecats and crew. The polecats are for the ephemeral work that's already well specified, and it's throwaway. And you actually want them to be small-context. You want to do one task at a time. You decompose it, right? Get them to work through it, farm through it. It's factory farming code. But there's a lot of work, usually design work, where you're doing the hard thinking and you need to have conversations with the LLM, and usually you want to build up a lot of context with them, right? Not to where they're getting amnesia, but often I'll be like, okay, you've just hit on a really difficult corner. Anytime there's some difficult corner of the code that I'm working on, I'm like, okay, it's time for us to roll up our sleeves and not just band-aid it, but figure out where it all fits in. And that means I've got to load them up with context. And so I have a set of documents that I'll pull out, of increasing mind-blowingness, right, to show them. And so, yeah, the crew supports that kind of workflow and the polecats support the other. And I think it's a recognition that they both exist. And I think as engineers we flip-flop back and forth between them, but we're getting gradually pushed over to the heavy thinking kind of work, where the LLMs are just going to do all the coding, because we've done all of the difficult design. Which is why I mentioned in one of my last blog posts that I'm taking naps all the time.
Kevin Ball
Yeah, I want to get to that, because I have noticed a similar type of exhaustion in this work. But before we go there, I want to follow up on this just a little bit more. So, the thing you described, in terms of when you're getting into heavy problem-solving mode and you're booting up all of this context: one of the ways I've been talking about this with folks is, if you conceive of these LLMs as fundamentally a little VM, a computer that is language-driven and code-driven, where all these different things encode as data, then what you're trying to do is essentially write the bootloader for the problem you're solving. How do I get exactly the right set of context and data to get the right relevant things? So I'm curious, how do you manage, for yourself or within Gastown, what that bootloader looks like for different types of tasks?
Steve Yegge
Oh, okay. So over time, before Gastown, when I was just automating my own workflows with beads, which I think is where a lot of people are, or even without beads, some things that I found really helpful were... see, the thing is, you've got to learn what they're good at and lean into it, right? And so one thing they're really good at is to-do lists. They love bureaucracy, they love acceptance criteria, they love checking things off. And so on the boot-up side, there's a bunch of stuff that I want them to do, specifically because they failed to do it on the shutdown side. So, like, on boot-up: I want you to go and look for branches and stashes and unmerged work and blah, blah, blah, right? Unclosed beads, whatever. Clean up your sandbox, clean up your environment, on boot-up as well as on shutdown. And so both of those instructions became things that I encoded in prompts. And so many people are at a phase in their engineering where they're managing their own private libraries of prompts that they pull out as needed, right? And Gastown was just basically me going, well, what if I could just have some canned prompts that sort of came up when certain roles came online? And then it was all predicated on this: what if Claude Code could run...
Kevin Ball
Claude Code?
Steve Yegge
Claude Code, right. But yeah, the boot-up and the landing are really important. So I have this prompt called Land the Plane. And my colleague on Monday was telling me about this. He hasn't used Gastown, he just uses beads, right? But he found Land the Plane really useful. I lived by Land the Plane for, I don't know, six weeks. Feels like six months.
Kevin Ball
Time is compressing. It's super compressed right now.
Steve Yegge
Yeah. So, Land the Plane. Okay, because the agent will be like: party, we are done, right? The agent is literally giving you an emoji checklist, to the moon, this project is ready to launch, I am done with this feature. Look at all the things we did.
Kevin Ball
Anthropic models in particular love that. GPT is a little bit more staid, but.
Steve Yegge
Yeah, except GPT can't code, so who cares? Right?
Kevin Ball
Well, different discussion. I have found GPT 5.2 to be incredibly effective for particular styles of code and prompting, but it goes a hell of a lot slower.
Steve Yegge
I heard Gemini 3 is really good at UI coding. Maybe as good as Claude. So, yeah, sure.
Kevin Ball
Wait. And UI coding is like a thing that GPT just falls flat on its face. It's so bad. It's so bad.
Steve Yegge
Yeah. So, I don't know, maybe they'll develop specialties, right? So Claude loves acceptance criteria, and landing the plane takes advantage of that. It's almost taking advantage of them, as if they were OCD, by giving them some OCD checklist to complete, right? Because even if they're low on context, even if their context window is near exhausted and you're like, let's land the plane, and they look up the instructions, they'll be like, yes, sir. And they'll start checking things off, and they will finish that thing, right? Even if they hit a compaction, which I view as a failure mode, right? So you try to land the plane as early as you can. But yeah, Land the Plane makes them much more reliable at not forgetting stuff. It's like they have common sense, but they get distracted. They need to be reminded. So, yeah, I mean, these are all muscles that you build as a developer before you jump in the lion's den with something like Gastown, right? You've got to be really, really good at bringing context into LLMs, watching how they deal with it, and triaging it. And once you've been in that cycle manually for a while, you're going to feel more confident about wrangling, like, eight of them at a time.
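A canned shutdown prompt like the one Steve describes might look like the sketch below. The wording and checklist items are my own illustration, not Gastown's actual Land the Plane prompt:

```python
# Sketch of a canned "land the plane" shutdown prompt: a checklist the agent
# is asked to work through explicitly before it declares a session finished.
LAND_THE_PLANE = """Before you stop, land the plane:
{items}
Check off each item explicitly before declaring the session done."""

def land_the_plane(extra_items=()):
    items = [
        "Commit or stash all uncommitted work",
        "Merge or document any open branches",
        "File a task for every problem you noticed but did not fix",
        "Close out tasks for work you actually finished",
        "Leave the sandbox clean for the next session",
    ] + list(extra_items)
    return LAND_THE_PLANE.format(
        items="\n".join(f"- [ ] {i}" for i in items))

prompt = land_the_plane(["Note anything the next agent should read first"])
print(prompt)
```

Framing the shutdown as an explicit checklist plays to exactly the trait Steve describes: even a nearly exhausted context window will still work through enumerated acceptance criteria one by one.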
Kevin Ball
So let's maybe use that then as a jump over into Gastown. And let's start from the beginning. Like, let's describe for anyone who has not read your rant or the many responses that have fanned out across the Internet. What is Gastown? What's in the box?
Steve Yegge
All right. I mean, there's a lot of different lenses that you can use to look at Gastown. Right. The simplest lens is to lump it into the category of orchestrators, which include things like Devin, which has been around for a long time. They're attempts to run multiple coding agents, like, as a team, basically, right? Or just in parallel on parallel tasks with some tracking layer. Yeah. And the Ralph Wiggum loop and the loom loops from Geoffrey Huntley. And you've got Claude Flow, right, the fancy one with the routing. And then there's a few others here and there, but there aren't many. But it's in that category. It's an orchestrator. Right. So it lets you run multiple coding agents. And it's pretty closely tied to Claude Code right now because it's on the boundary, it's on the edge. It pushes the agents so hard that they get confused regularly. And so only Opus 4.5 is really strong enough to be able to, like, run Gastown reliably. And even then it breaks a lot. But anyway, Gastown is predicated on a really simple idea. All right, let me tell you how it goes as your trust with the LLM grows. Okay. And this is my eight stages of programmer that I called out, which got a lot of attention on its own, by the way, because it was a real challenge to people. Right. But I'm sure it resonated, that they realized that they had moved up from stage one and they finally felt pretty good about it and that they were doing well. And when they saw where they fit on the entire thing, they were like, oh, no. And their ego was a bit bruised. But by the same token, they couldn't really deny it either, because they had already seen their transition of leaning more and more into the agent. What's happening is you're trusting it more. And by trust, I literally mean you can predict what it's going to do better. That's the only way you're going to trust it, right? Being able to predict. And that just means practice.
And we're talking hundreds to thousands of hours of practice to get up that ladder. Right. And this is not just developers. It's developers today, but it's going to be all knowledge workers before long. Right. As your trust goes up, your patience goes down. It's very interesting, okay. Because you're like, you got your agent and they're working on the thing. And you know they're going to get it done because they've done it five times properly before. They're going to fix another test for you. And you're just like, you know what? I'm going to start up another agent. And that's it, man. That's the gateway drug. That's the end of it.
Kevin Ball
Right, I hear you. I mean, like, right now, if I self-assess, I'm on the cusp between six and seven on your scale, right? Like, I'm managing three, five agents. Sometimes it pushes up.
Steve Yegge
Yeah.
Kevin Ball
And I have some questions around where I'm running into limitations that I will get to. But, okay, so you're moving up this stack. Your trust is increasing, your patience is decreasing.
Steve Yegge
You're running more and more agents. And now you start running into problems that are very different for an individual dev. They're like team problems. You have agents that are running, you know, stepping on each other. You forget who's doing what. You have agents waiting on each other. I mean, it starts to get kind of complicated. And at a certain size, you lose the ability to keep track of it in your head, even if you're using Beads, and it's just a zoo. And so Gastown was me going, well, what if I just put them all into, like, worktree hierarchies and sort of gave them names? And, oh, it was Jeffrey Emanuel and his mail discovery that really enabled it. Beads was a huge unlock, but the other half of it was mail. Right. So the thing is, LLMs like stuff they're trained on. And the longer it's been in their training set, the more they like it. And so mail, email, which has been around since the 70s, is like a pair of old jeans for them. They love the mail interface. And so you put them together with identities and the inboxes, and they can send each other mail, and they will. And so very quickly I, like, had this town of collaborating agents using mail, and I gave them names. You're the mayor, right? You're polecats. And I said, you're going to be named after Snow White and the Seven Dwarfs. So I went through the Seven Dwarfs, and one day I saw the mayor and it was really mad. It mailed Sneezy Polecat and it was like, you are not the mayor. You're Sneezy Polecat. So read your own inbox and do your own work. Right? And I was just like, what is happening here? Right. It was beautiful. Then one day a swarm took off and fixed all my bugs. I had like 30 beads all, like, lined up to knock out. And I couldn't find them, and I panicked, and I realized that they had all been closed and fixed, right? And I was like, ooh, right? That's that swarm feel, right? So Gastown kind of emerged out of the muck, out of this primordial soup of managing agents by hand.
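Part of why the mail metaphor works so well is how mechanically simple it is. Here is a rough sketch of the pattern (the directory layout, field names, and agent names here are all hypothetical, not Gastown's actual implementation): each agent gets an inbox directory, and sending mail is just writing a file.

```python
"""Toy maildir-style mail between named agents. A sketch of the
pattern only; layout and field names are hypothetical, not Gastown's."""
import json
import tempfile
import time
from pathlib import Path

TOWN = Path(tempfile.mkdtemp())  # each agent gets <town>/<name>/inbox/

def send(sender: str, recipient: str, subject: str, body: str) -> Path:
    """Deliver a message by writing one JSON file into the recipient's inbox."""
    inbox = TOWN / recipient / "inbox"
    inbox.mkdir(parents=True, exist_ok=True)
    msg = {"from": sender, "to": recipient, "subject": subject, "body": body}
    # A nanosecond timestamp in the filename keeps messages in arrival order.
    path = inbox / f"{time.time_ns()}-{sender}.json"
    path.write_text(json.dumps(msg))
    return path

def read_inbox(agent: str) -> list[dict]:
    """Return an agent's messages, oldest first (empty if no mail yet)."""
    inbox = TOWN / agent / "inbox"
    if not inbox.exists():
        return []
    return [json.loads(p.read_text()) for p in sorted(inbox.iterdir())]

# The mayor assigns a bead; a polecat picks it up on its next turn.
send("mayor", "sneezy-polecat", "bd-101", "Fix the flaky export test.")
print(read_inbox("sneezy-polecat")[0]["subject"])  # → bd-101
```

Because the state is just files, any agent that can run shell commands can participate, and the whole exchange is inspectable after the fact, which is part of what makes the interface so legible to models trained on decades of email.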
But basically, it gets into this mode where. Here's the thing: if you have a rule for yourself where you never watch them work. Never watch them work. That's counterintuitive advice. Most people keep their eye on the agent, and they're like, I'm going to see if you're going to make a mistake. You made a mistake. That's what they do. And that's what I did for, I don't know, six months, right? I tried to, like, watch them all. A diff looked weird, right? As soon as you get out of that mode, you realize that they're going to make mistakes and you're going to find them just like with regular engineers, okay? So don't sweat it. Then, instead, you're only looking at the ones that are finished. And the ones that are finished are a problem, because either they didn't land the plane properly and you need to walk them through that process, right? They think they're finished and they say they're finished, but they're not finished finished, and so you got to walk them through that. Or they're really finished, and now it's like, what would you like me to do? And I'm like, well, let's talk about it, right? Whoa. And that's when you sit back and start having those wonderful design discussions with them, where you're just like, okay, so what if we were to throw the database out and try something else? And then you're like, okay, off you go. Go think about that. And you cycle to the next agent and you go, okay, I'm going to give you a hard problem, too. And so that's actually how I use Gastown now: I spin them all up, all the crew, with hard problems. And you know what? It's so wild, because, like, 8 out of 10 get done and 2 out of 10 get lost. And I'm like, I know we thought about this before. Sometimes we can find the design, sometimes we lost it and I just have to redo it from scratch. It's a little annoying.
Kevin Ball
So this gets into one of my key questions, which is one that, once again, I'm grappling with not even quite being at the Gastown level. But I can imagine it gets even more, which is like having. How do you mentally keep up?
Steve Yegge
Yeah. So I mean, like, for starters, it can be frustrating and you can find yourself yelling at them. You know, like, I told you for the billionth time, don't make PRs. We're the freaking maintainer. I mean, come on. It's right in your prompting. And they're like, I made a PR. But then, you know, I realized that it's my fault. Right? Like, there's a solution for it, actually. You need a PreToolUse hook in Claude Code, and you just say, don't make PRs, and that's the end of it. Right? It's solved. So you gotta realize these things are getting better so fast that you just gotta, like, mostly just take this very nice zen approach and just be like, look, they're getting stuff done and that stuff is making forward progress and we're moving the goalposts. Right? And so, yeah, it's a different world, man. How do you avoid getting tired? Dude, I take naps throughout the day. I'm exhausted. It's like I'm a factory manager now with a brand new team. They're all a little clueless. They're smart, but, right, they're bumping into each other, and I don't know the business very well, and I'm running around trying to keep them all busy. It's exhausting.
Kevin Ball
That, I guess, gets to my question, and this maybe gets to another kind of related thing, which is: how? Because I think of writing software. Traditionally, if we go back even two years before all this, you're writing software and it has a couple of parts. One, you have an executable artifact, and that's maybe even the smallest one. But two, you're creating this mental model of a system that you have, you're expanding your mental model of the business problem you're trying to solve, and you're mapping between those things. Now agents are writing code, but those two other problems still maybe exist. Do you have a mental model of the code that is being written?
Steve Yegge
Yeah. So, like, have you ever worked with a really, really, really good product manager, like a very technical one, who used to code a lot and now they're a product manager? Once? Okay, once. Yeah, you're right. And they're like gold, actually. You know, the ratio is very high at places like Google, like the technical ones, but still not 100% by any means. And the ones who are really, really good, you can have a conversation with them about almost any corner of the architecture. And they'll know whether it's using a hash table or a list or whatever. They'll know the O(n) performance of it, because it affects a list that the user sees, right, or the length of time that an export takes or something. Right. So they'll have conversations with you like, well, the export's taking too long, like, what if we used a cache for that, and da, da, da. Right. The good ones have that. And this is also true for an uber tech lead. Have you ever worked with an uber tech lead at a big company? Right, they're working with tech leads, and so they're coming up with a consolidated, synthesized view of the machine that's being built across all of these teams. Right. And so to the extent that you can do that, that is an engineering leadership role. And it's a hat that can be worn by product or by technical program managers or by senior engineering leaders, executive leaders, whoever wants to get down there and try to understand how this thing is working. Right. And I tell you, that's the most important part. It's not actually what language it's written in, usually. It's not the syntax. It's not any of the details of the linker or any of that stuff. It's what does it do, right? What's the functional specification of this thing? And we have the ability to build software so fast now that keeping the functional spec in your head is a huge task.
Kevin Ball
That is the core problem I'm asking about. Yeah. How do you do it?
Steve Yegge
I have to remember the entire surface of Beads and Gastown, including all of the integration points that people have brought in. And I find myself often having conversations with the LLM going, I'm really embarrassed that I don't know this, but how does our plugin system work? Or how does whatever work? Right. Last I checked it, it met my approval, but that doesn't mean that I remember how it works. Right. And so I have to go back and reconstruct it. And yeah, so, like, it's this constant. It's a discipline thing. It's a hygiene thing. Just like you have to do regular code reviews of code you've never seen, and you have to continue reviewing it and asking questions until, like, as an uber tech lead, you're like, okay, I think they're in the noise now. They're in the weeds. Like, we don't need any more code reviews for now. Right? Because you've asked them from a bunch of different angles, right? Same thing goes for the product. You know, you've got to, like, understand every nuance of your product, and if you don't and you're not using it, then why do you even have that code? So you've also got to start pruning aggressively and sort of, like, retiring features that aren't pulling their weight, because otherwise you'll have a tech debt mountain. Right? I mean, look, man, people are like, yeah, they're not looking at the code, and so they don't understand it. That's a very junior mentality. That's a person who's not very seasoned or experienced, somebody who's never led a team, somebody who's never been, you know, responsible for a very large operation. I mean, look, I was on nuclear submarines in the U.S. Navy. And the software projects that we make are of comparable complexity to a nuclear submarine. And they're, you know, freaking six stories tall and several football fields long, and they're very complicated. Right. And this notion that people have, they're just thinking about it completely wrong.
Kevin Ball
I couldn't agree more. In terms of, like, once again, the code details, like, that is in the noise at this point. And I think there is a framing we've been talking with some people about: over the last 15, 20 years, we moved from servers as gods, to servers as pets, to servers as cattle. You don't think about servers; you're spinning them up and down. We're doing the same thing with code, right? Like, code as cattle is the world we're going to here in many ways, but your system still matters and the pace of things still matters. So I'm curious, related to this, actually: I have enough trouble keeping my mental model up to date with the work that I'm doing. But I am still leading a team. So I'm also keeping up with, you know, N people who have N agents working for them. Yeah, like, this problem expands. So, like, are there toolings that you're using? Thinking about this, how does Gastown expose this stuff in a way that is easier to process? Like, how do you approach it?
Steve Yegge
I was interviewed with, like, I don't know, 10 other, like, famous-ish people, like, in 2008, a long time ago, and one of them was Peter Norvig. Some Polish kid reached out to us all, somehow hooked us, and we all answered a bunch of questions. James Gosling and Guido van Rossum and all these people, right? And Peter Norvig, like, gave the best answer to one of the questions, just all time. It was something like, you know, what is it that differentiates great programmers? And everybody else went blah, blah, blah, blah, blah. And Peter Norvig said, being able to keep the entire problem in your head. That was his answer. Right. And that is the problem that we're all faced with. In fact, what's going to differentiate successful teams from not successful teams is the ones who are able to keep bigger problems in their collective heads. That cross-functional communication and coordination, those costs, LLMs can help reduce them, but the rate that you're producing just exacerbates the Jevons paradox of project management. Right. I had a non-technical boss once, not very technical, who was very good at using lieutenants, like using a council of senior principal engineers who they trusted to guide their decision making and lead their organization. And they had all the other leadership pieces. And so it was an arrangement that worked. And I think that some companies like Amazon adopt this directly and you get this dual management type, you know, setup, right, where you've got a manager and you've got a technical sort of advisor or a technical leader. Right. I think that we're all moving into this role, right, where you have access to a technical advisor. And so now the question is, right, I blogged about two friends of mine that yelled at their colleague because he was two hours behind them. Right. Did you hear about this?
Yeah, because they're running 20 agents each, and so they've realized that they're moving so fast that if their work is hidden from view for even a little while, if they're not completely transparent and pushing and public and loud about everything that they do, then they might as well be working at the bottom of a mine shaft, and the world will move on without them very quickly and their stuff will be impossible to rebase, because stuff just moved that fast. And they wound up, like, getting cross with a colleague because he was like, well, I implemented the blah, blah, blah. And they're like, why? Why did you do that? What information was that based on? And he's like, it's from two hours ago. And they're like, two hours ago? What's wrong with you? We made six decisions since then, right? This is a serious problem, right? So I mean, like, it's the problem that Beads and Gastown have sort of taken the brakes off of, and it's going to take another one to two model iterations, you know, Opus 5, Opus 5.5, call it summertime, before the orchestrators run smoothly and the work that they do really is, you know, high enough quality that the average developer can kind of use it and trust it. Right. But we're almost there, right?
Kevin Ball
This is one of the things, and one of the reasons I was excited to get you talking on this, because I've interviewed a whole bunch of people doing coding tools. I'm using all of them, and everyone's focused on the individual developer. Right. How do we optimize the coding productivity of an individual developer? And LLMs are flipping crazy good for this. Like, we can go so far. And using it in the team context, your bottlenecks shift completely. And now it's, how do we keep our wetware up to date?
Steve Yegge
Yeah. And also, like, like I was saying, that dual management thing hits everybody. Now you are a manager, you are a leader, because you have a team of software engineers. You got thrust into the role, and now your other skills, your soft skills, your humanity, your college education, your liberal arts, are all really important now. Your people skills matter, right? Because, like, you're going to have a 300,000-line thing you want to go and bring to another team because it cures cancer, and the other team is going to be like, we can't read your code. What do you. Who are you? Right. And it's going to become this negotiation to get anything done anywhere. Right? Because you've produced too much. Companies are struggling with this, just merging each other's work together. Right? So the better you communicate, the better off you are. Fitting the problem in your head involves fitting the people problem in your head too. Right. You know what I mean? Like, you know, who needs what. And I think to an extent it's a skill that some people will have and some people won't, but it's a skill that you can learn. It's a skill you can foster and develop and be taught. You can work with mentors and they can teach you this skill. This is going to be how all knowledge work works. They call it centaurs. Yeah. Have you heard this term?
Kevin Ball
Yep.
Steve Yegge
Do you know who invented it?
Kevin Ball
I don't know the origin beyond what I read online.
Steve Yegge
So it must be true that it came from Bobby Fischer, who used it to describe AI-assisted chess. You know, then there's the whole Centaur Minotaur blog, which is really clever, right? Which is, which one do you want? You know, do you want the AI head with the human body, or do you want the human head? And everybody wants centaur. And then there's debate over whether it's actually called a chimera, because that's the term that academics were using till Bobby Fischer took it all away. And then you hear, like, boring, you know, hybrid, whatever. But look, the fact is everybody is going to have a friend, a helper. Everybody's going to have an AI, right? Something like Claude. Like, have you tried Claude Cowork? And you know how Claude Code can do your fricking laundry for you, right?
Kevin Ball
Yep. I mean, I didn't try Cowork because I already have Claude Code and Codex working within Obsidian. Right. Like, I already have what I.
Steve Yegge
What they're shipping it for, so you get it. But you turn on Claude Cowork and you're like, oh, I get it. It's Claude Code for everybody else, like, on your desktop.
Kevin Ball
And to be fair, like, I was trying to. My wife at some point had a problem where I was like, oh, this is an easy cloud code problem, and I couldn't boot her into it. And then we tried to do it using Claude AI, the web interface, and it was terrible. And so, like, the. The use case for Claude Cowork makes perfect sense.
Steve Yegge
Yep, yep. Same, same. Fun times ahead. You know, the naysayers, I think they're in for, you know, really, they're going to go through the five stages of grief this year, but interestingly enough, at the end of it, you wind up, hopefully, with happiness. Some people will drop out of knowledge work. I think they're just not going to like this. They're not going to like this whole centaur thing. They're not going to want to work with an AI, you know.
Kevin Ball
So a thing that I've been grappling with is: a tremendous amount of what we're doing with AI tools right now is trying to do the same kind of work at the same or maybe slightly lower quality at a much higher pace. And there are times when, like, more is more. Right. You got a bunch of features you got to rip out. Like, it's fine. The thing I'm trying to invert is, like, what are the use cases, or when are we able to use this AI assistant to do the same work but at a much higher quality?
Steve Yegge
I got a buddy who's one of the best engineers in the whole world. I mean, the stuff he's built is a really long list at big companies like Amazon and Google. He insists that his quality is much higher with LLMs, and it's because of the way he does it. With LLMs, you get what you choose, what outcomes you choose, right? And he chooses quality. And so what he does is he reviews all of their work and it becomes a pair programming exercise. And it's necessarily better, because it's the best of both of them, right? Just because I'm optimizing for throughput doesn't necessarily mean that you can't dial up quality. And for certain launches, like Dolt, I'm dialing up quality just insanely high, doing wave after wave after wave of code review and going to the Dolt team and having them review it, right? So quality is just a choice, man. That's all.
Kevin Ball
So that's. I think a really important thing to discuss here is like, how do you. Let's spell out, as someone who has been hacking with these things at both ends of that spectrum for probably more than most folks listening to this, like, when you want to start dialing up quality, actually, one, how do you decide this is appropriate? This is the thing where quality is important versus throughput. And two, like, what are the knobs and levers that you personally turn?
Steve Yegge
For me, for the stuff that I'm working on, I'm in a space where I am very fortunate to be able to have a very low bar. A very low bar. Nothing has to work at all, right? Beads, I can't break now, because people are depending on it, right? But Beads is incredibly well tested. It has significantly more test code than actual code. And I think this is just a vision of the future, right? Where everything just is tested to death. And integration tests too, right? Token-burning tests, right? Verification and validation are the two gates, right? And validation is: did you build the right thing? Which is another problem, right? You can have something go and build something that's great, but it's the wrong thing, right? And then verification is: did you build it well? You know, along with keeping the problem in your head, this is one of those new sort of pressure points. We've got new, you know what I mean, new bottlenecks in development as the old ones have been obliterated by AI, right? One of them is merges. One of them is shared designs and keeping those up to date. And you know what I mean? Like we've talked about that. The merge one is terrible, and it's why Gastown has a refinery, to try to basically have a dedicated role that knows how hard merges are and just redoes them. Right? But you know, it's funny. I showed Gastown to Anthropic, and I don't know if I said this earlier, but they felt like it was just working around a bunch of bugs in their model. I mentioned that. So what this implies, and people have already pointed this out, right, is that Gastown will flatten. Like, I don't need as many roles, because half the roles were just workarounds saying, do your job. And as soon as it knows what its job is, you only need the other roles. So I think you're going to have a really simple, maybe two-tier hierarchy. Yeah, you know, maybe three-tier.
Just because organizationally, you know, you want to take advantage of that. You know, militaries are hierarchical for a reason. I'm sure AIs will choose to be hierarchical on large enough projects. But you only talk to the top. Right? I think that's where we're headed. And I think that it's a skill that everybody will learn this year because it's just so effective. Right. But senior people are benefiting more than junior people because they know what good looks like. And we're in a stage right now where the AIs don't always know what good looks like. Right. And so this is why, I mean, like, again, verification and validation for a senior engineer are often as simple as just looking at it. So it's a little harder if you're in a new space, a new domain, a new programming language, a new area. You don't actually. You're trying to build for a customer that's not you. I try to avoid doing that. I try to avoid building things that. You know what I mean?
Kevin Ball
Like, I was going to ask, like, have you tried building anything for normies using Gastown?
Steve Yegge
So I have been so busy building Gastown that I haven't been able to spin up a rig for, say, like Wyvern, my video game, and just work on it, right? And see how it deals with, for example, UI programming. You know, how do they handle that? Gastown has had a lot of bugs. Gastown has had much faster adoption than I expected, even though I expected people to ignore my dire warnings and use it anyway. And so I've had to stay on top of it because it's had, you know, stalled workers, runaway workers. It's had, you know, I had to kill 320 Claude Code instances on my machine the other day. And each one of them takes like a gigabyte of memory. So it can really kill your machine fast. So, yeah, we're still in that stage, but yeah, this is a new world, man. It's fun times. Look, can you build stuff for normies? I don't know. I wouldn't recommend it. Right. Look, why does Gastown exist? Like, what am I even doing with it? If I know that this isn't the long-term shape, what is it for? And one of the main things it did, truthfully, and you saw this, I think, online, is that it reframed the discussion completely.
Kevin Ball
Oh, yeah. It reminds me of the political framing of, like, there's an Overton window of, like, where people are talking. And you just moved it.
Steve Yegge
I moved it way down. Right. What was acceptable to talk about? And it was because up until Gastown, I was just a blogger who was saying, AI is going to be big, people are going to be running fleets of agents, et cetera. It's all going to happen, right? And Gastown came out and was just so damn smug, right? Like, purposely. Like, I had a whole cast of characters. It had mythology, there was a song, you know, and Nano Banana really came through. My God, those pictures are unbelievable. Right? And also I had discovered the whole, you know, molecular work thing. And so it had kind of a theoretical foundation that was actually working, to where, like, I could see it working, like it was doing what I wanted it to do. It worked, right? And so I launched it, and instantly, like, a lot of arguments had nowhere to hide anymore, right? It changed the conversation overnight from, no, no, no, what you're saying is wrong, to, bro, you're pretty aggressive, right? That was what happened. Now they're faced with the reality that it is building itself, which nobody's come right out and said, but that's, you know, that's what it's doing. It's building itself. And so it's at least good enough to build itself as a swarm. And this is deeply, deeply unsettling for people. And some of the content that I've seen responding to it is just hilarious, right? They feel like they're getting taken over by the hive mind. This is Ender's Game, like, playing out in front of us. It's really wild.
Kevin Ball
So the building itself reminded me of a thing, right? So because that reminds me of building compilers. And one of your first goals with a compiler for a language is that it should be able to compile itself, right? You want to create a bootstrapping compiler. So Gastown, it sounds like, is a bootstrapping agent orchestration framework.
Steve Yegge
It is, and it's a great analogy. And when my friends knew about Gastown, but it hadn't booted up yet, we would use the compiler analogy. I wasn't using it to build itself yet. I hadn't booted into it. I was still using, you know, just naked Claude Code to build it, right? It was so weird. You would think that you would just boot into it. Man, I wish I could describe this to people. You would think that it would just kind of turn on. But it went through about six or seven, like, layers of waking up. And it felt like I was digging a tunnel and burst out into this new universe. And then we started setting up camp, and all of a sudden we were building infrastructure and cities and stuff, right? And every single iteration of this, almost every day, it felt like there was a breakthrough. It was like, we're done. Gastown's finally going to be self-sustaining. We figured out that there's no activity feed. Beads is the feed. When they close, that's an event. Identities don't need to be separate. An identity is a bead. It's the data plane. All these realizations. And we were like, yeah, this is it. And then it would just flop. It was like the Wright brothers, right? It was just, it wouldn't move. And then on December 28, one day, I was like, and then we should do blah, blah, blah. And the mayor's going, okay, Convoy X landed. Convoy Y landed. That feature's done. That feature's done. And I'm like, what? Because I hadn't touched anything, right? And I realized it was working. It was doing the thing, the compiler thing. It compiled itself, right? And I was so, so excited. It was two days till New Year's and I was like, okay, we got two days. Let's make this thing launch, right? And so, yeah, that's the story of Gastown, man. I actually got it into self-hosting mode.
Kevin Ball
I am curious if there's other compiler metaphors that you found useful for it. I think, to me, once again, I often use the metaphor for an LLM of it being a VM of some form. And I think there's a few projects I see out there that seem to be using compilation metaphors. Oh, shoot, I'm blanking on the name of it. It's one that's like, you express your prompt intent and it iterates on it until it can get to the right prompt for each different type. But I'm, I'm curious for you, like, are there other metaphors? It doesn't have to be compilers even, but metaphors that you have found useful to draw on for building this new approach.
Steve Yegge
Seriously, with this system, between Beads and Gastown, I feel like I'm building systems that I've been building my whole life, my whole career, and echoes of them keep coming up, right? Like the Wyvern properties system, which is really sophisticated. It's sort of like JavaScript, but it even has transient properties, right? Like, you know, if you drink a potion, it's going to change your health by a certain amount for a while, but when you log out, it shouldn't persist with you, say, in a game, right? So you got this idea of your permanent properties and your transient ones, right? Well, it turns out orchestration is a lot like that too. You got your permanent work, and then you've got all the throwaway stuff. It was running a patrol, and all you care about is that it ran the patrol and what the outcome was, right? So I have property inheritance and all that built in. It's a really sophisticated pattern that I used. There's an event system that's like my video game event system. It starts to feel like an operating system. And really, with the interfaces that people have been putting on it, it starts to feel like a game. And I saw some fan fiction on X yesterday where somebody was like, my Gastown went crazy. We had soldier roles, and the mayors were all fighting with each other: who's the head honcho? And at first I thought it was, like, real, because I had just cleaned up 300 Claude Code instances, and it's not out of the question that this could happen. It's just that they had named roles for things. And so it was totally, like, fan fiction, but it was kind of also real. I think we're going to get to the point where this acts like an RPG, like a strategy game, like an RTS. I'm serious. We're not far off, man. I mean, like Age of Empires: you're building stuff. Code: you're building stuff and you're looking at the outputs of it. And man, I mean, like, why not make it fun? It's already really fun. It's super addictive, right?
Why not lean into it? So I think it's just going to be absolutely remarkable. By the end of the year, people will be playing games and building software. And look, there will always be a frontier where engineers will excel, because you're building really difficult software. People will be very impressed by it, because as an engineer, or an engineering team, or a giant company, you were able to pull off something that clearly took experience and resources that the regular person couldn't bring to bear. But I still think we're going to see a huge explosion of software coming from everybody. Yeah, just like blogging made it so everyone could write, and YouTube and phones made it so everyone could upload a video, I think everyone's going to be making software.
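The permanent-versus-transient property pattern Yegge describes from the Wyvern game, and maps onto orchestration, can be sketched roughly like this. This is an illustrative reconstruction, not code from Wyvern, BEADS, or Gastown; all class and method names are invented for the sketch:

```python
import time

# Hypothetical sketch of permanent vs. transient properties with
# inheritance: transient overlays (a potion buff, a patrol in flight)
# shadow permanent values while they last, and are never persisted.
class PropertyBag:
    def __init__(self, parent=None):
        self.parent = parent      # property inheritance chain
        self.permanent = {}       # survives save/logout
        self.transient = {}       # name -> (value, expiry time)

    def set_permanent(self, name, value):
        self.permanent[name] = value

    def set_transient(self, name, value, ttl_seconds):
        self.transient[name] = (value, time.monotonic() + ttl_seconds)

    def get(self, name):
        # Transient overlays win while they last...
        if name in self.transient:
            value, expiry = self.transient[name]
            if time.monotonic() < expiry:
                return value
            del self.transient[name]  # expired: fall through
        # ...then permanent values, then the inheritance chain.
        if name in self.permanent:
            return self.permanent[name]
        if self.parent is not None:
            return self.parent.get(name)
        return None

    def save_state(self):
        # Only permanent properties survive a save; throwaway
        # work evaporates, as with a finished patrol.
        return dict(self.permanent)

base = PropertyBag()
base.set_permanent("health", 100)       # inherited default
player = PropertyBag(parent=base)
player.set_transient("health", 150, ttl_seconds=60)  # potion buff
print(player.get("health"))    # 150 while the buff lasts
print(player.save_state())     # {} -- the buff never persists
```

The orchestration analogy: permanent entries are the durable work ledger, transient entries are in-flight agent state you only care about until it resolves.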
Kevin Ball
So, not to get out of the LLM world, but what do you see this doing to the industry, man?
Steve Yegge
Like, for starters, my two friends who are tripping over their third friend because they're going so fast have me very worried about what happens, right? Because I've been talking to industry, and it's already weird. When people use Codex or Claude Code or Gemini CLI inside their workplace, they're so much more productive than their peers that it starts to look weird at performance review time. And then, you know, how do you hire? The whole interview process is thrown into question, and it's affecting everything. Also, all your bottlenecks start to move. I've seen business teams getting incredibly surprised because engineering teams are delivering stuff for them and they're not ready for it yet. They're in no way, shape, or form ready to roll out this big thing that they asked for, right? I'm seeing business teams building software for themselves because they're tired of waiting for engineers, and now they can, right? And so we're seeing the rise of the Jeff Bezos two-pizza team, where a bunch of experts who want to solve a problem just get together and solve it, and maybe they have one engineer on staff. I'm seeing a shift where all work turns into a gig economy: all workers are gig workers. Like, your project probably only needs a product manager for, you know, one week out of the entire project. And it probably only needs a UX designer here and there, right? So why not rent them, right? I see an internal economy, kind of Airbnb-like. Well, maybe that's a bad example, but whatever, Uber, where you're getting a gig economy, where you're renting workers. Google's had this forever. SREs have office hours, right? That kind of thing.
But with vibe coding and everybody contributing to the same big artifact that they're building together, I'm seeing a much more mobile, flexible workforce, where people are helping each other on the fly as needed. Because I think the answer to your question earlier is that old-fashioned planning goes out the fucking window. It's gone. Okay? I think that companies that are successful will build stuff in real time. And the software will become the living artifact, the thing that they're building. It is the shared contract. It is the spec. It is the prototype, in a sense. They'll have staging environments, and you'll be able to spin up five of them and try different options and then throw away four of them, and it'll be very fertile and productive. But we're not going to be sharing specs anymore. I think those are kind of old. I think we'll be sharing ideas that have actual implementations. Yeah. And this is going to utterly change how companies do their business, right? I mean, silos will get busted up. I think a lot of bureaucrats who are manipulating the system in their favor to, I don't know, keep work for themselves or keep work off their plates or whatever, they're all going to get found out and kicked out. And I think the system is going to reward people who are really, really good at working with AIs and really, really good at working with people, and kind of making stuff happen in big chaotic environments. But that's just what I think. What do you think, K Ball?
Kevin Ball
I mean, I'm already seeing a lot of that happening. One of the things I'm trying to wrap my head around is the bottlenecks: they've clearly moved. How do we keep attacking those bottlenecks, or moving them, or changing them around? Right, so you mentioned planning. Planning is a huge bottleneck at this point. And architecture, and decision-making. There's definitely still, we sort of joked about this über tech lead, right? We're all having to become über tech leads in our capacity to absorb information about the state of a system. Yep, that's a bottleneck, because that is a skill that most engineers have not developed. I don't even know if all engineers can...
Steve Yegge
...develop it, or want to. Right. It's not some people's idea of fun. You know, they'd rather solve a problem than verify somebody else's solution, for example. And I get it. I don't think it's for everybody. But for me, it's like a dog sticking its head out the window, you know, getting all them smells real fast.
Kevin Ball
Well, it's a wild ride, and it's amazing. Right? Like, I'm managing a team and producing more code than I ever did as an individual engineer.
Steve Yegge
Yeah, you get those happy-programmer feelings when stuff clicks together and works, and you're like, yeah. And you're just getting them all day long, just getting those dopamine hits. It's like certain video games have figured out formulas that just maximize dopamine, right? And they're really addictive, and people get completely into them and spend hours and hours. We're getting really close, because programming's always been kind of addictive. You've got to be in flow and everything's got to be going right, and you can't be wrestling with some stupid JavaScript library or whatever, right?
Kevin Ball
There are so many people I know. I mean, I had this literally yesterday. I'm sitting in an airport with crappy wifi, and I need to go to the bathroom and get food, and I'm like, I can't give up. I just kicked off some agents, I want to see what they do. They're on my laptop and I'm not...
Steve Yegge
You know, seriously, using coding agents is like playing blackjack. It's like a slot machine. Because, as I was getting at earlier, when you start to trust the LLM to the point where you will let it work, then you will spin another one up, and another one up, and it's literally spinning in the sense of, you know, text scrolling by. And you'll always eventually reach an equilibrium where you always have one available.
Kevin Ball
At least there's always something to check in on and see, oh, this is.
Steve Yegge
Finished, what's going next, waiting for you.
Kevin Ball
Right?
Steve Yegge
Waiting for your input. And so even if the tools make you better at it, you'll still reach that equilibrium quickly. It's just like Assassin's Creed toward the end, where you were sending spies off on missions, and there was some magic about it where you weren't doing the missions yourself, and you wouldn't think it would be as fun as playing the regular game, but it was, right? So coding with agents and an orchestrator is maximizing dopamine, kind of like going to a casino or playing one of those really addictive video games. But to me, that's really heartening. Because think about it: if all knowledge work gets broken into beads, right, into bite-sized pieces, and we find a way to match all of the knowledge work to all the people in the world, and everybody can participate in this, everyone's going to be building software and having fun. You know what I mean? It's democratized. I don't know, man. I see kind of nothing but upside from all of this. Everybody's really scared about it, but I actually think people are going to be blown away by what they can create. I'm bullish on the future.
Kevin Ball
That is actually like a really nice close. So we could close there. But is there anything we didn't talk about that you want to talk about before we wrap?
Steve Yegge
I'm going to share a realization with you that I had, and I'm pretty convinced it's accurate, but it's the first time I've ever shared it. So this is a world first, right here on your show. I think I've figured out the magic formula to tell whether you're going to live or die as a software product in the world of AI. But I don't know if that's interesting enough for your viewers, because, you know...
Kevin Ball
Well, I'm interested.
Steve Yegge
Yeah, I think everyone's trying to figure it out, right? All the boards. Because look: AI is eating software, it's eating jobs, it's eating entire categories. Stack Overflow, poof. Chegg, the homework company, poof, right? Then call center companies, and IDEs are starting to worry. There's a lot of software out there that's starting to get a little worried that AI is going to eat it. And in fact, all boards should be worried, because Dr. Andrej Karpathy is out there saying that AI is going to eat all software and there'll be nothing left. And he's very worried about this, right? So how do you tell if you're going to make it or not? I think the answer is basically thermodynamics, right? If you can find a way to save tokens, then the AI will use your thing and you will live. Okay? Calculators, databases, storage systems, ledgers, transactional workflow systems, routers, networks, infrastructure: anything that does a bunch of computation and math that the LLMs could do in their heads. But did you see the article where Anthropic reverse-engineered how they do multiplication? It involves, like, chickens and goats and stuff. It's basically like they guess that it's 95-ish with one pattern match, and then they use a lookup table based on the digits to figure out that it's 95 instead of one of the numbers near it, right?
Kevin Ball
Yeah. Though, once again, the simple tool of being able to write code gets a lot of that to happen.
Steve Yegge
Well, they do, right? They write code and they use tools, and so they're doing a bunch of matrix multiplications. They're basically lazy; they're going to take the shortest path. And so you have a couple of hurdles. One is getting your tool or product actually into their field of view, so that they even have the activation energy to know about you and use you. There's a product called Serena that you may not have heard of. It's an OSS product that uses LSP servers in your IDE to save a bunch of tokens. If you have your LLM wired up to it, it will use that instead of grep, and it will find its way around the codebase much more quickly, because it's all pre-indexed and it doesn't have to use grep, right? It's proven to save a lot of tokens. So it's a more energy-efficient state for the LLM to be operating in. And I will argue that LLMs will always, because of laziness, call it what you want, have a moral imperative to use the least energy possible to solve a problem. Do they not? Right.
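As a toy illustration of the index-versus-grep argument, here is a minimal sketch (not Serena's actual implementation; all function names and file contents are invented) of why a pre-built symbol index is cheaper for an agent than a brute-force scan:

```python
# Illustrative sketch: a grep-style scan costs work proportional to
# the whole codebase, while a one-time symbol index (like an LSP
# server warming up) answers a definition lookup in a single step.

def grep_for_symbol(files, symbol):
    """Brute-force scan: touches every line of every file."""
    hits, lines_scanned = [], 0
    for path, text in files.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            lines_scanned += 1
            if symbol in line:
                hits.append((path, lineno))
    return hits, lines_scanned

def build_symbol_index(files):
    """One-time pass that records where each function is defined."""
    index = {}
    for path, text in files.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            if line.startswith("def "):
                name = line[4:].split("(")[0]
                index.setdefault(name, []).append((path, lineno))
    return index

files = {
    "a.py": "def alpha():\n    return 1\n",
    "b.py": "def beta():\n    return alpha() + 1\n",
}
index = build_symbol_index(files)
hits, cost = grep_for_symbol(files, "alpha")
print(hits, cost)       # grep found uses, but scanned every line
print(index["alpha"])   # the index answers "where is it defined"
                        # in one dictionary lookup
```

The per-query cost difference is the "token savings" in miniature: amortize one indexing pass, then every lookup is constant work instead of a full scan.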
Kevin Ball
Couldn't prove it by those thinking traces.
Steve Yegge
You can't prove it by the thinking traces, by definition. Well, you can for the small ones; you can see that they're wasteful and inefficient. And so they will choose the most efficient path. Writing code is often the most efficient. But if there's a tool available that's more efficient than writing code: CPU cycles are generally going to be cheaper than GPU cycles, if you can solve it down in that layer. And honestly, I think NPUs, I think neurons, are even cheaper, and humans are actually quite good at certain tasks that are just going to be cheapest to give to us. It's The Matrix. The original plot, remember, the Wachowski siblings had it that it wasn't a battery; we were being used for computation. I think they actually called it accurately. So anyway, that's my hot take: the way you survive the AI apocalypse is you build something like BEADS or Dolt or MongoDB or Temporal or Kubernetes or Kafka or Cassandra or whatever. Infrastructure that AIs will prefer to use instead of building their own or doing it in their heads, right? Make it clearly obvious to them, and beat the thermodynamics of them knowing about your tool, which means you have to market it to them. So you've got to kind of do what Notion did and literally go work with OpenAI and Anthropic and Google to train the models on your tool so that they're better at it. And you can actually pay to do this. That can overcome the activation energy for the LLM to realize that it will be in a lower-energy state using your tool and saving tokens for whatever task it is. Yeah. Does that make sense? Do you believe that?
Kevin Ball
Well, we'll see. I think it's an interesting take. I think there is also a question of it's not just activation energy, but it's also access, which is why everybody's racing to become a system of record in some form or another. Do you have data that they don't have out of the box?
Steve Yegge
Yeah. I mean, again, I think you can still look at that in terms of energy, token spend, basically. Right. It always comes down to how many tokens they have to spend to solve your problem. But yeah, RAG, you know, that whole problem is another dimension to it.
Kevin Ball
Sure.
Steve Yegge
Will you survive or not? So maybe mine's necessary but not sufficient. Yeah, but sure, we will see, right? I mean, look, if you believe Karpathy and the AI researchers, AI will be able to do it all. If that's really true, then in two years there won't be a bunch of apps on your phone; there'll just be one, and it'll be Claude, and it'll be able to do everything, and it'll be super addictive and more interesting than any person that you hang out with. And so why would you have any reason to go to a different app, unless it was something that your Claude companion couldn't offer you? And that's going to be things like computer games that are really well thought out, or data stores that it just doesn't have access to, or products where there are a lot of people who collectively provide more entertainment than Claude by itself. But it's going to be a weird new world, right, where AI is going to start getting really sticky, in my opinion. People will become dependent on it.
Kevin Ball
I think that's already happening.
Steve Yegge
I mean, look, people can't read clocks anymore, you know? It won't be long before they can't read at all, because why would you have to? I don't think that's necessarily bad. I just think the world is changing really, really fast.
Kevin Ball
It's funny, because through this conversation, I feel like half of what you've described is utopian and half is dystopian.
Steve Yegge
Look, I'm going back and rereading a bunch of Arthur C. Clarke books, right? Because he actually accurately predicted a lot of this. He called it aliens, but it was AIs. And yeah, we have a crossroads ahead of us, and it could be a dark path or a good path. I really believe that. And there are people lining up to make it the dark path, right? Imagine a single global payment rail. Oh, how exciting that would be. A single global payment rail, that's digital feudalism, right? That's where they can extract transactions from everything that happens on the planet. A single work rail, a single anything rail, a single social system: anybody who's building toward that is building toward a surveillance state. So I'm building against it, or actually an escape hatch. And I really think that humanity is reaching a crossroads, and AI is the forcing function. So we'll see how it goes. But there are going to be massive, massive counter-reactions this year, right? The social reaction against AI is going to be like nothing you've seen.
Kevin Ball
All right, let's call that a wrap.
Steve Yegge
Sam.
Episode Date: February 12, 2026
Host: Kevin Ball (K. Ball)
Guest: Steve Yegge
Podcast: Software Engineering Daily
This episode delves into the transformation sweeping the software engineering world as large language model (LLM)-powered “agents” move from code autocomplete to complex orchestration, coordination, and independent work. Industry veteran and influential essayist Steve Yegge discusses his experiments at the agentic frontier, including the origins and insights from his BEADS and Gastown projects—tools designed for managing fleets of coding agents and agent-driven workflows. Yegge and host K. Ball examine the technical and human changes required to thrive in this new era, the cognitive shifts in development, and what it all means for the future of engineering teams.
Long-Term Perspective: Yegge, a 40-year industry veteran (started programming at 17, turned 57 recently), built his reputation via blog “rants” at Amazon aimed at organizational change.
Early AI Reactions:
Critical Model Advancements:
BEADS Origin and Structure:
Cognitive Shift for Developers:
Implementation Evolution:
Workflow and Specification:
Bootstrapping and Agent Management:
Gastown Defined:
The Human Side of Trust:
Real-World Use:
Keeping Up with the Swarm:
Distributed, Real-Time Work:
Fitting “code as cattle” into mental models:
Workflow and Quality Control:
Impact on Industry & Organizations:
The End of Traditional Planning:
On Agent Productivity Leap:
“It was like finding the early hover bike in Zelda. ... It’s faster than walking. You have to use it now.” (05:00, Steve Yegge)
On Paradigm Shift:
"It doesn't make any sense for us to write code by hand anymore. ... This is the biggest horse pill that the industry has to swallow right now." (08:55, Steve Yegge)
On What Survives in AI-Eaten Software World:
“If you can find a way to save tokens, then the AI will use your thing and you will live.” (63:30, Steve Yegge)
On Team Transparency & Speed:
“If their work is hidden from view for even a little while... they might as well be working at the bottom of a mine shaft and the world will move on without them very quickly.” (40:02, Steve Yegge)
On the Role of Soft Skills:
"Now your other skills, your soft skills, your humanity... are all really important now; your people skills matter." (42:32, Steve Yegge)
Metaphor for Gastown Bootstrapping:
"It was like the Wright brothers, right? It just wouldn’t move. And then, on Dec 28, one day,... I realized it was working. It was doing the thing—the compiler thing. It compiled itself." (52:29, Steve Yegge)
On the Industry’s Crossroads:
"There are people lining up to make it the dark path. ... A single work rail, a single social system—anybody who's building towards that is building towards a surveillance state. So I'm building against it—or actually an escape hatch." (68:42, Steve Yegge)
For anyone navigating—or about to navigate—the era of agentic development, this episode is full of lived experience, paradigm-challenging insights, and both dry humor and urgent warnings from a true industry pioneer.