Summary11 min read

The Pragmatic Engineer: "Building Pi, and What Makes Self-Modifying Software So Fascinating"

Date: April 29, 2026
Host: Gergely Orosz
Guests: Mario Zeckner (creator of PI), Armin Ronacher (creator of Flask, early PI contributor)

Overview:

This episode explores the origin and impact of PI, a minimalist, self-modifying coding agent developed by Mario Zeckner. Joined by famed open-source engineer Armin Ronacher, the conversation goes deep into the realities of self-modifying software, the shifting landscape of AI-driven engineering teams, the challenges of agent-generated code, and the increasing complexity—and potential chaos—introduced by AI agents in software development. This is an honest, grounded discussion about what’s actually working, what isn’t, and why the tech industry just might need to slow down.

Key Topics & Insights

1. Personal Journeys Into Tech and AI (01:26–07:11)

Mario’s path: Growing up in Austria with limited means, self-taught through gaming, early exposure to machine learning (pre-deep learning boom), and multiple startups before rekindling his ML interests post-GPT.
- “I’ve always kept up with machine learning stuff because obviously super interesting and... GPT happened and that's the story.” – Mario (03:43)
Armin’s background: Not “working poor” but rooted in early computer exposure through his parents’ architectural firm; self-taught in programming via old computers and Linux, founding Ubuntu User communities, and eventually developing Flask and working for Sentry.
- “I wasn’t ever really good at this but I found it really interesting... over time, if you keep doing this, you get better.” – Armin (05:01)

2. Early Reactions to AI Coding Agents (08:19–14:25)

Both were initially skeptical of coding agents like Copilot.
- Mario: “I tried [Copilot] and it was absolutely horrible. But...when GPT came out, and especially when they started providing API access, I did a lot of projects just figuring out what works and what doesn’t...” (09:09)
- Armin raised early open-source license concerns but ultimately embraced the knowledge-sharing ethos—even if it meant GPL code leaks.
  - “My optimal version’s like copyrights don’t exist in a way or like very, very limited...I was like, oh, maybe this will just completely destroy copyrights. And for me that was like, oh, this is... If that’s the outcome, I’m fine with it.” (13:00)

3. How Teams Are Adapting to AI Agents (14:25–26:31)

Adoption patterns: Experimentation explodes during “down times” (vacation, holidays, free credits).
- “Adoption happens when people have vacation... it’s like a two to three week kind of thing until it really clicks on you.” – Armin (15:14)
Impact on code quality: Quality (and sanity) trends downward—the ease of agent-generated code produces more, but also messier, code.
- “A lot of the code in those PRs is how an engineer wouldn’t do it. Because as an engineer you sort of get a really bad feeling committing certain code...the agent really does not care.” – Armin (17:46)
Loss of critical engineering friction: Agents lack accumulation of “pain” and battle scars that encourage engineers to refactor and simplify.
- “Agents don’t learn... They also feel pain. And I think that's one of the defining things about humans.” – Mario (20:37)
Changing team dynamics: Non-engineers and junior engineers can now bypass traditional gatekeeping, sometimes leading to tension or questionable changes.
- “Now your code is way more complex than it should be because instead of failing properly, it is now recovering and entering these many more failure states.” – Armin (18:44)

4. The Case for Friction and Saying No (21:27–27:35)

Value of senior engineers: Experienced engineers know when to say no, recognize tradeoffs, and keep complexity down—agents do the opposite, adding everything because it’s easy.
AI prompts vs. PRs: The debate between “prompt requests” (asking for the intent) vs. reviewing agent-produced code. Both guests see merit in getting actual code, even if messy, as a learning tool.
- “I actually value seeing a terrible implementation of something...” – Mario (26:31)

5. Software Responsibility in the Age of Agents (27:35–31:20)

Drawing parallels with the Industrial Revolution: The transformation of responsibility—machines vastly speed up creation, but accountability doesn’t scale.
- “The machine cannot yet be responsible.” – Armin (28:59)
Overestimating automation: “We as software engineers are so bad at becoming domain experts that we don't see all the non machine parts that go into a workflow.” – Mario (29:49)

6. Origins and Philosophy of PI (31:20–47:18)

Motivation: Disillusionment with increasingly buggy, unstable, and over-engineered agent harnesses (Cloud Code, OpenCode, etc.). Mario wanted simplicity and reliability.
- “If I commit to a development tool, I want it to be a stable, reliable thing like a hammer.” – Mario (33:04)
PI’s architecture: Minimalistic, with flexible extension points. Designed for personal tinkering, allowing users (or even the agent) to modify/extend itself easily.
Self-modification: “...You can just ask PI to build it and PI will modify itself.” – Mario (38:21)
Real-world usage: OpenClaw, a popular AI assistant, runs on PI’s engine; PI can be endlessly customized, from UI tweaks to making itself support new tools or workflows.
Notable moment:
- “And that saves me time. I'm not saying I like pull requests by agents because they're terrible and I auto close them now. But they have value.” – Mario (26:31)

7. Open Source in the Age of Agentic Engineering (47:18–59:44)

Agent-generated noise: Tools like OpenClaw now spam maintainers with a flood of PRs, most with little intent or quality.
- Mario's 3D visualization tool for triaging agent-created issues (49:03).
- “And agents don't see the comment. My GitHub workflow posts underneath the pull requests. So this is a great filter for filtering out agents and keeping the humans safe...” – Mario (51:33)
Need for bottlenecks and filters: “I just need a bottleneck that allows me to process the amount of incoming things as a human...” – Mario (53:01)
Shift in open source dynamics: There's now an explosion of “vibe slop”—agent-created, one-off projects, few of which are likely to endure.
- “A lot of open source really worked because people piled out on hard problems and so they congregated around it... open source is all about throwing stuff up.” – Armin (54:53)
Long-term impact: The ratio of useful, maintained projects hasn’t changed—just more short-lived noise.

8. Complexity is the Enemy—For Agents and Humans (59:44–65:25)

Limits of AI context: Agent effectiveness determined by context window—can’t handle codebases that exceed their “vision.”
- “The complexity they add is their own worst enemy...” – Mario (60:47)
**Quality decays as codebases grow beyond what any agent, or team, can handle—especially since agents don’t “feel the pain” that would drive normal human cleanup and refactoring.
Friction as a feature: Purposeful process and code review slow shipping, but offer safety and quality control.
- “There were changes where you really wanted to think... if you do this correctly, then it saves you time and it makes you happier.” – Armin (65:39)

9. How to Cope and Stay Sane in the AI Agent Era (65:39–71:14)

Refactoring as grounding: Mario maintains quality by regularly diving back into the code to refactor, not just patch.
- “Being in the code is the one thing that keeps the code base quality high and the complexity low.” – Mario (70:00)
Advice: slow down: The urge to move at machine speed causes chaos. Take advantage of what AI is good for—freeing engineers to focus on meaningful, hard problems, rather than just increasing output.
- “You need some way to review all of that code... But you can't as a human, because as a human, you're used to spitting out 1.5k lock a day, and that's about the limit that you can actually review.” – Mario (71:14)

10. Debate: MCP vs CLI (75:56–84:38)

MCP (Machine Control Protocol): Designed for exposing tools to LLMs through a defined interface, popular in enterprise, but too complex, non-composable, and brittle for many devs.
- “At the end of the day, it’s authentication and it’s sort of invoking some stuff.” – Armin (76:28)
CLIs (Command-Line Interfaces): Still preferred, particularly for their composability and directness.
- “The model only sees the end result and it is super free in how it massages that data.” – Mario (80:25)
Outlook: Both approaches have advantages. CLI is developer-centric and flexible; MCP is better for user-facing integrations and enterprise auth.

11. Looking Forward: The Future of Agentic Development (84:38–90:50)

Self-modifying software will spread, but foundational issues remain: quality, knowledge bottlenecks, and reliance on a few platform providers.
- “I think the self malleability thing is obviously something I believe in. I think we will see more of that.” – Mario (84:51)
Societal dependence: Predictions that organizations may become unable to maintain code without their favored AI agent, raising dependency and pricing risks.
- “My best guess is that we’ll wake up to the fact that... engineering teams are already now telling me that they have code bases that they think they couldn’t maintain anymore without the machine.” – Armin (86:00)
Coping mechanisms: Being outside of Silicon Valley and prioritizing time away from technology helps maintain sanity.

Notable Quotes & Memorable Moments

| Time | Speaker | Quote / Moment | |-----------|-----------------|-------------------------------------------------------------------------------------------------------------------------------------| | 03:43 | Mario Zeckner | “I’ve always kept up with machine learning stuff because obviously super interesting and… GPT happened and that’s the story.” | | 13:00 | Armin Ronacher | “Maybe this will just completely destroy copyrights. And for me that was like, oh, this is... If that’s the outcome, I’m fine with it.” | | 17:46 | Armin Ronacher | “A lot of the code in those PRs is how an engineer wouldn’t do it... the agent really does not care.” | | 20:37 | Mario Zeckner | “Agents don’t learn... They also feel pain. And I think that's one of the defining things about humans.” | | 26:31 | Mario Zeckner | “I actually value seeing a terrible implementation of something...” | | 33:04 | Mario Zeckner | “If I commit to a development tool, I want it to be a stable, reliable thing like a hammer.” | | 38:21 | Mario Zeckner | “You can just ask PI to build it and PI will modify itself.” | | 51:33 | Mario Zeckner | “So this is a great filter for filtering out agents and keeping the humans safe...” | | 53:01 | Mario Zeckner | “I just need a bottleneck that allows me to process the amount of incoming things as a human...” | | 60:47 | Mario Zeckner | “The complexity they add is their own worst enemy...” | | 65:39 | Armin Ronacher | “There were changes where you really wanted to think... if you do this correctly, then it saves you time and it makes you happier.” | | 70:00 | Mario Zeckner | “Being in the code is the one thing that keeps the code base quality high and the complexity low.” | | 71:14 | Mario Zeckner | “You need some way to review all of that code... as a human, you’re used to spitting out 1.5k lock a day... if your agent spits out 10 times that, no chance you can review that.” | | 80:25 | Mario Zeckner | “The model only sees the end result and it is super free in how it massages that data.” | | 84:51 | Mario Zeckner | “I think the self malleability thing is obviously something I believe in. I think we will see more of that.” | | 86:00 | Armin Ronacher | “Engineering teams are already now telling me that they have code bases that they think they couldn’t maintain anymore without the machine.” |

Important Timestamps & Segments

Mario’s tech journey: 01:33–03:43
Armin’s programming roots: 03:47–06:40
Skepticism → excitement about AI coding agents: 08:19–14:25
Learnings from 30 engineering teams on AI agents: 14:25–21:00
On code quality decline & team challenges: 17:39–21:53
The shift in team/PR dynamics: 23:34–25:33
Why Mario built PI (motivation & philosophy): 31:20–37:13
How PI enables self-modification, customization: 38:21–41:36
Open source spam and managing agent PRs: 48:51–54:53
Complexity and the limitations of AI agents: 59:44–65:25
Industry pressure to ship faster & need to slow down: 71:14–72:46
MCP vs CLI debate: 75:56–84:38
Future of self-modifying software (predictions): 84:38–86:47
Coping with AI agent chaos & advice: 87:20–91:00
Book recommendations: 91:00–91:38

Final Thoughts & Recommendations

On self-modifying software: PI’s design philosophy—minimal, stable, and endlessly extensible—appears to be a winning approach for the current agentic era, balancing power with necessary friction.
On software quality: Human “pain” and friction, not just faster code output, result in better, longer-lived projects. Agents aren’t a panacea; processes that slow developers down can be essential for maintaining quality.
On open source: The flood of agent-generated code is accelerating, but the truly valuable projects will continue to require intent and sustained human involvement.
On the future: Expect more self-malleable tools and a societal reckoning around AI dependency; slow down, review code carefully, and don’t be afraid to say no, even as new technologies tempt us to move faster than ever.

Book Recommendations

Mario Zeckner: Code by Charles Petzold – “It’s just such a great read. It’s also for non techies and it’s the first thing I recommend.”
Armin Ronacher: Breakneck – “Goes a little bit into an exploration of how China works and how maybe Europe and the US are different... at least thought provoking.”

“The idea of self-modifiable software really grew on me... But agents simply do not feel pain—they just keep adding to the complexity. And in a code base where devs regularly feel the pain of the code base and do something about it, the quality will probably be also better.” – Host, Closing Remarks (91:44)

For further deep dives and resources, see the show notes at Pragmatic Engineer.

This summary captures all major topics and quotes while preserving the episode’s candid, technical tone. No ads or filler—just the heart of the discussion for modern software engineers and leaders.

Loading summary

Transcript251 lines

[00:00]
Host
What if I told you that one of the most influential AI coding agents of 2026 was built by a single developer in Austria who got frustrated with existing AI coding agents? This is PI, a minimalist self modifiable coding agent which has quietly become the engine behind the wildly popular personal AI assistant openclaw. Mario Zeckner is the creator of PI, and joining him today is Armin Ronacher, the creator of Flask and now an early adopter and contributor to PI. In today's episode, we cover the backstory of PI and why self modifying software is much easier to do with AI agen what Armin learned, interviewing 30 engineering teams about how AI agents are changing how they work and why software quality feels like it's trending down. The case against McP and why CLIs are becoming so popular and many more. If you want to hear from two very grounded voices in the industry, honestly talk about what's working and what isn't and why we need to slow down as an industry, this episode is for you. This episode is presented by statsig, the unified platform for flags, analytics, experiments and more. This episode is brought to you by workos Engineers Love to Build. Today's episode will be a great example of this. We'll get into why and how PI was built from the ground up. But when you're shipping a product, some problems are better solved with trusted infrastructure built for scale. Enterprise features like SAML, directory sync and audit logs are some of those. WorkOS gives you APIs to add them in days, not in months. Ship faster without reinventing the wheel. And now let's get into the episode.
[01:26]
Interviewer
Mario and Armin, it's so good to have you here on the podcast.
[01:29]
Mario Zeckner
Thanks for having us. Thank you.
[01:31]
Host
So, as a kickoff, Mario, how did
[01:33]
Interviewer
you get into tech and eventually into building AI stuff?
[01:37]
Mario Zeckner
Oh, well, that's a long story. How much time do we have? So I'm a kid of the 90s actually, and got my first PC at 96 and the trigger for that was that I loved computer games. We were kind of working poor, so we couldn't afford any of the Game Boy and NES super NER stuff. But I had an uncle who had an Amiga 500 and I would go to his place every second day and just play games there. And eventually my parents told me if you work, you can save up and buy yourself a computer. And in reality my dad would do. What's it called? Schwarzerwerd.
[02:13]
Armin Ronacher
Well, you're not necessarily paying your taxes
[02:15]
Mario Zeckner
on your work, so he would do his normal job and after his normal job he would go fix cars and work at construction sites.
[02:22]
Interviewer
Yeah, it's very common in Europe.
[02:23]
Mario Zeckner
I know everyone did that and after two or three years or so they just said it's time and took me to a computer shop in the nearby big city and bought me a 486. And that's how it started basically.
[02:35]
Interviewer
Antium 486.
[02:36]
Mario Zeckner
Yeah. An Intel 486 DX 40 MHz with turbo button. And that's where I started. And I've always been into games a lot. Which also led to graphics programming. And through sheer luck I got a job while I was studying at university at the Applied Science organization who was doing NLP stuff. Machine learning. Applied machine learning. Basically taking research results and trying to stuff them into industry applications. And that's where I learned the ropes of machine learning. That was all before deep learning became a thing. And I actually quit that kind of domain in 2010 11ish because I joined a startup in San Francisco and then later came back and joined another startup with two friends in Sweden where we did an ahead of time compiler follow java bytecode to iOS that got sold and since then I have a little bit more time and I've always kept up with machine learning stuff because obviously super interesting and. Yeah. And GPT happened and that's the story.
[03:44]
Armin Ronacher
Yeah.
[03:44]
Interviewer
And here we are. And then Armin, what were your roots?
[03:47]
Armin Ronacher
My roots are definitely not working poor but. But because my parents ran an architectural office where they kind of adopted computers for CAD drawing. My first computer was like old computers that they recycled. My first computer, even though I'm younger was in 3.
[04:07]
Mario Zeckner
So sorry for you.
[04:09]
Armin Ronacher
And so basically none of the computers that I ever had were capable of playing computer games properly because one they used Windows nt which at the time didn't. So you had to sort of like build your way through it and like the only way next you could actually get them to run was because before it didn't know yet how to get the Windows 95 or like Windows 3.11. That was like before it booted into either one of those. You could put it into dos, like really old DOS games at the time when you could already get better stuff. But because it was sort of this kind of thing, I started toying around with Quick Basic a lot with to Pascal. I bought a bunch of books on that and that was my roots of learning how these things work. And it just. I wasn't ever really good at this but I found it really interesting this idea of like no, for sure.
[04:58]
Mario Zeckner
Like I like we call it a Tiefstable in German.
[05:01]
Armin Ronacher
No, I swear to you, like I was when I started dabbling with this, I just really sucked. But like, over time, if you keep doing this, you get better. And then in 2002 or three,
[05:19]
Mario Zeckner
I
[05:19]
Armin Ronacher
used to use Delphi a lot, which was like a visual version of Turbo Pascal. And in 2002 or 2003 someone also showed me because I got this idea like, I want to use Linux. And then Delphi didn't work on Linux and then I found Python and through that I started doing some Python programming. And there was a Ubuntu just came out in 2004 and that was a venture backed vehicle, but they created all this local communities. So that was like Ubuntu Association. So together with a bunch of friends, we started the German Ubuntu foundation, not a foundation association. And we ran this online community called Ubuntu Users for four or five years. Because Ubuntu was popular. The community grew and then scaling problems came. So that's how I got into web development. And then for building this, I just wanted to build a templating engine, a web library, all of this. And then eventually I bundled that together and made this flask framework, which got very popular and even nowadays still is a thing that clankers like to spit out.
[06:23]
Mario Zeckner
That's hilarious.
[06:25]
Armin Ronacher
But I left it. And then in 2013 14, as I worked on computer games for a couple of years in London. But then afterwards I went back to Open Source and I worked on Sentry for 10 years and then left in April last year to try something new.
[06:41]
Interviewer
So both of you are originally from Austria. In fact, you right now live in Austria as well, right? You were doing games, you were working at Century, you also did games before. And then the third person who's not in the room but was on this podcast just before is Peter Steinberger, also from Austria. Great that the two of you meet. Where did the three of you meet? Because I've recently seen a bunch of photos, especially before openclaw and PI started you hanging out, the three of you experimenting, playing with AI.
[07:11]
Mario Zeckner
I think the two of us met on the Internet, right? On Reddit.
[07:15]
Armin Ronacher
It depends, because I definitely met you once when I was at the university, but you didn't recognize me at the time and I was useless, I was already famous. But yeah, we sort of abstractly met
[07:26]
Mario Zeckner
on the Internet, but eventually we met up in Vienna. We were screaming a lot at each other, but on the Internet, but in a very Cute kind of way in a very non confrontational kind of way. And even though we might not think alike in all areas of our lives, it was a cultured exchange, I would say. So that was nice. And Peter, I like Six Degrees of Peter Steinberger. Basically I was working at an office in my town and the company that gave me free office space in exchange for being like a mentor to the CEO had some kind of business dealings with Peter's company. PSPDFKit PSBDF Kit yeah, and eventually came to the office in Graz and I think that's where we met the first time. And then also the same year we met at the conference in Istanbul. Just hung out for an entire night. And that's basically where it all started. Nice.
[08:20]
Interviewer
And how did the both of you go from being skeptical about AI when these tools came out again? Both of you have at that point, by 2022, you've been doing a decade plus of building complex software in different domains. What was your first reaction to it? And then eventually how did you kind of come across the side of like, well, this thing is actually really interesting.
[08:41]
Mario Zeckner
So for me it was, I think in 2022, I think, Copilot, GitHub Copilot came out before GPT.
[08:48]
Interviewer
Yes, in 2021.
[08:49]
Mario Zeckner
Yeah. And through my previous startup stuff, I was working with Nat Friedman and Miguel De Casa from Xamarin.
[08:56]
Interviewer
Because they acquired a company with Xamarin.
[08:58]
Mario Zeckner
Yeah, they acquired a company I talked about earlier, the Java compiler thing. I knew Nat Friedman from our early startup stuff and eventually moved to GitHub and then was in my DMs in 2022, I think, and asked if I wanted to have access to GitHub, Copilot, the Tap, tap, tap autocomplete thingy. And I was like, I don't really care, I don't think this is going anywhere. And he's like, no, man, it's the future, gotta try it, it's the future. So I tried it and it was absolutely horrible. But yeah, when GPT came out, and especially when they started providing API access, I did a lot of projects just figuring out what works and what doesn't work. Not necessarily in the coding space, but eventually once they had tool calling, that's when they became very interesting, or function calling as OpenAI called it back then. But it took until 2000, I would say 24, end of 24, October, so for that to actually be useful. And that's where the coding agents also became kind of interesting. And then 2025, the cloud code team came out with Claude code. And that introduced AgentiXearch. So basically just give the agent a way to plow through your file system and read all your files and then made the whole difference. Actually, like all the things that came before, like cursor with indexing and any AST based stuff and all of that, that just went away. And I know that the CEO of Chroma is probably mad at me for saying this, but that was the difference that it didn't. It wasn't like a dense and sparse search thing that the agent could go through. It was just give it access to your files. That was it for me. That's where it clicked for me.
[10:36]
Armin Ronacher
I think my path was kind of similar because I think Copilot came out quite a bit earlier. But I know that there was a program at GitHub that gave you early access to Copilot. At the time, I think it was like this maintainers group or something where I still was in. I got the feeling for Copilot that this will actually be really interesting, but not in any way in which it is now because I felt like, oh, I am in open source for such a long time and now they're doing training in open source data. It's like there is something. At the very least this will be controversial. I didn't think about it being productive. I felt like, oh, this is going to be. It's going to be a controversial thing. I thought training open source data. And I remember for like an almost like I was trying to probe it. Like really?
[11:23]
Mario Zeckner
Whether there's flask in there.
[11:25]
Armin Ronacher
No, I was trying to probe it like really adversarial. So one of the things that I probed on is like I probed on like, will it retell GPL code? And I remember at one point I got it to spit out the inverse square root function, which is very easy because it had a very specific name. So it was very easy to get the recall. But I also found that you can sort of tab in a certain way, then it would then continue putting license text on top of it. It was completely wrong. So it came from an open source GPL drop of doom originally, I think. And so it was like it would have been GPL code if it would have done that, but it actually attributed MIT license from a random dude. And I did. It's like, oh, Mr. Copilot, that's the wrong thing. And that tweet at that time got really, really popular. And then sort of people started like sharing with me because I was at a time not really exposed to how much actual AI progress was being made in those labs. I didn't come from this AI space or ML space, so I learned about the university and like, oh, there's AI Winter and then nothing happens. But through this tweet and some other things I like, all of a sudden I recognize that there was something there. There's actually CEOs in certain companies are convinced this will get off. And that's how I started paying attention to it. Essentially. I was trying all kinds of stuff with the API. Can you do bug fixing things? I got really interested in it, but it didn't at all feel like the world is going to change until cloud code.
[12:54]
Mario Zeckner
And you also changed your stance on the whole, oh my God, this is spitting out open source code it memorized.
[13:00]
Armin Ronacher
So because my shtick for many years now has been that I really, I'm like, I want people to share stuff. I think human progress comes from building on top of each other. And I'm a huge supporter of the fact that in the US you basically take knowledge from one company, another company, that then all competes. I like this pirate kind of approach to sharing.
[13:26]
Interviewer
Yeah, spreading knowledge.
[13:27]
Armin Ronacher
Yeah. And so my optimal versions, like copyrights don't exist in a way or like very, very limited kind of version of this. I was like, I really didn't care that it spits out GPL code and doesn't attribute it. I was like, oh, maybe this will just completely destroy copyrights. And for me that was like, oh, this is. If that's the outcome, I'm fine with it. But it was an interesting kind of thing in the beginning that it sort of creates this license violation. I want to see what chaos will emerge from it. And so far I think mostly what has emerged from it is a strong belief now that like the system in place for copyrights has some assumptions in the US about how it's supposed to work. And we're all kind of like ignoring that right now because we want to create a mess first and then re regulate it. Probably because at least in theory, a lot of the things that we're producing right now are probably by historic readings of the copyright interpretation, actually not copyrightable.
[14:26]
Interviewer
Yeah, that's an interesting one. But speaking of jumping today, so an interesting thing that you did recently, we talked about it just before, is as part of your new startup is building things on top of agents, and you Talked to about 30 different engineering teams saying, hey, how are you using agents inside of your company, inside of your team? What did you learn from large companies
[14:46]
Armin Ronacher
to Startups, I think that a bunch of learnings are entirely unsurprising is that whenever people had location, there was more time spent on trying these tools.
[15:00]
Interviewer
And just to be clear, you talk with folks at the likes of meta startups. A bunch of different people, right?
[15:07]
Armin Ronacher
So a bunch of different people from different European dinosaurs.
[15:12]
Mario Zeckner
Are you pointing at me?
[15:14]
Armin Ronacher
Well, I mean the European dinosaur would be someone like CMETS or. I also talked to two companies which are sort of in a critical space. And what I mean, adoption happens when people have vacation is that like when, when your CEO or the tech lead comes and says like, you gotta use cursor now, you gotta use cloud code now is actually you don't get it in a way because you, you need to actually spend some time on like there's a. There's a. It's like a two to three week kind of thing until it really clicks on you. And so I always felt like with the people that I knew, like I had a lot of free time. Like I left the company in April and until October I was like, I can dive into this. And I like, this is like, how does nobody get this?
[15:57]
Mario Zeckner
It's a catnip fall.
[15:59]
Armin Ronacher
It was crazy catnip. I didn't sleep much, all of this. But what happened within the company seemingly is that when there was like Thanksgiving, there was for the Europeans, a lot of it was over summer and then Christmas. A lot of people sort of. And they also get free credits during those times. And so like more and more people.
[16:16]
Interviewer
Oh, you mean the AI companies often give you generous credits.
[16:20]
Armin Ronacher
More and more people went into this. And especially after Christmas, I would guess, like in more than half the companies I talked to after Christmas, it really exploded and exploded in all the ways you would expect it, where all of a sudden the quality drops and it doesn't necessarily drop because people want to make worse code, but because it actually takes some effort to stay within this. And we have seen this in the startup ecosystem already in the summer last year. If you pay attention to the YC startups, some of them have their stuff on GitHub for some period of time on GitHub and you can look at it and at the time, because plan MD files checked in and everything attributed to Claude, that vibe coding kind of thing was for prototypes and whatever and that built that out. It was already out there to see. But then gradually a small version of this has been code bases with a little bit of vibeslop on top. And an interesting sort of part of this was how engineering teams and Companies are now responding to that with all kinds of different findings. But a lot of it has been challenged to review PRs. They're getting larger and larger and they're becoming more psychological.
[17:40]
Interviewer
Taxing engineers specifically are having a hard time keeping up with the longer PRs. The that they're more frequent.
[17:46]
Armin Ronacher
Yeah, and a lot of the code in those PRs is how an engineer wouldn't do it. Because as an engineer you sort of get a really bad feeling committing certain code because you think of your future self and the agent really does not care. I will retell this story over and over, but I worked for an Xbox One game at the time, right around the Xbox One launch. So there was like a fixed day, it has to release on that day. So I worked on the Halo Master Chief collection and there was a game where you had like a matchmaking component and you had to like store this thing and whatever. And it was an all hands on deck kind of situation where people had to go in and unslop the human made slop that was the matchmaker. And it was a system with way too many states. We call it an emergent state machine because it was like 16 bools on one massive thing and infuriated about only six valid states. But in reality it was a dramatic explosion of possible states. And that's how a genticode feels like where it really should only be a very clearly defined system. But in all reality they're like, oh, config doesn't load. Let's catch it down and load the default config. So instead of actually failing, it now recovers. But now your code is way more complex than it should be because instead of failing properly, it is now recovering and entering these many more failure states. And that makes it much harder to work with this code because you can also not really ask the agent to refactor it because the agent is like, oh yeah, this could be possible. So we need to maintain this invariant.
[19:19]
Mario Zeckner
I think it's kind of even worse than what you described about your human made complex system, because there are moments of brilliance in agents where they spit out perfectly fine simple code exactly the amount and type of code you didn't need for that specific thing. And you as the steering engineer looking at that and like, wow, this is amazing. I can just sit back and not care because it's obviously doing the thing. Like two minutes later you have another agent running in this window and it spits out the worst horrible garbage. But you might not notice because now you have fallen into automation bias and Think your agent is doing the job well?
[19:55]
Interviewer
Do you think this might be our bit of a human bias? Because. Because typically onboarding a new engineer, when you join a new grad, you review their code. And if it's terrible code, you will review the next one thoroughly until they get to the point that, oh, it writes the code that I do and then it typically takes six months or a year or something like that, but then I can trust this person.
[20:20]
Mario Zeckner
Yes, but you don't have anything like that with agents. Agents don't learn. You can put as much stuff in your agent's MD or build a memory system, but that's not the same type of learning than a human does. Obviously, humans are available as well, no matter. But they have some capability of learning
[20:37]
Interviewer
and retaining that learning, right?
[20:39]
Mario Zeckner
Yes. And they also feel pain. And I think that's one of the defining things about humans. It kind of ties back to what you said. Eventually, if the pain gets too big, you as a human are incentivized to fix the cause of your pain. And in the code base, the cause is usually terrible interfaces, terrible complexity that you want to get rid of because you can no longer maintain that system.
[21:01]
Interviewer
Isn't this why just holding onto that, you know, like senior engineers are always in demand because the CEO sees a senior engineer as like, they just get it done. But in reality, as a senior engineer or most senior engineers who are effective, they've had battle scars, they've been burned, they felt the pain, they saw what happened when they left tech death spiral. So they now make all these decisions that they know they will help avoid. And of course through this progress goes faster.
[21:27]
Mario Zeckner
I personally think, and your mileage may vary, but a good engineer is an engineer that says no a lot and I don't need this a lot because that keeps complexity down. If you're using agents, the exact opposite happens. You say, yes, I want this and that, I want this and I want this and I want this because I don't have to type it myself, I don't have to think about it. I just give the little machine a prompt and it will spit out something that kind of looks like the thing I want and good enough. And that's where all the problems start.
[21:53]
Armin Ronacher
And one thing that I also think is good engineering is all about knowing the trade offs that you have to make. And there is sometimes the right solution is actually if you were to sort of sit at university and learn about it, you kind of learn that you shouldn't be doing this. In a way, I think Karl Henderson had this once where he said you do the dumbest solution first until it doesn't work anymore. Because the actual problem is there's so much stuff that you need to do that if you actually do the right solutions, the correct solutions, all of this, you're creating the kind of complexity that kills you at scale. And the engineer learns that. But also if you don't have that battle scar, it's actually very hard for you to argue correctly. Because it is this learning process that gives you the authority to then convince other engineers in the engineering Org that you should be doing it this way. That is part of it, you learn that. But the other thing is also that the agents give you now world knowledge access. And one of the other things that I learned through interviewing engineering teams now is that the senior person says no knowing something and then 48 hours later the junior comes by and said, I talked to the agent and I already had this inkling, but now I have all the evidence of why we shouldn't be doing it this way. Because like previously you really didn't have that ready made access to someone who
[23:19]
Mario Zeckner
can tell your senior off.
[23:20]
Armin Ronacher
Yeah. And this creates other stresses now that were previously like not every team has
[23:27]
Mario Zeckner
that because people going to the doctor with a chatgpt printout and saying this is what the machine said, you better do that.
[23:34]
Interviewer
Is it fair to say that we are based on what you're seeing and talking? We might face a thing where it's very hard for experienced engineers, it's harder just for them to say no in spite of the product manager or a junior engineer saying it's much worse because
[23:52]
Mario Zeckner
the product manager comes in and sends pull requests and automatically shits them.
[23:56]
Armin Ronacher
Yeah, that's another thing comes in like non engineers participating in engineering processes is a thing.
[24:01]
Mario Zeckner
Now ask Arvind how that works. Ask him how does it work?
[24:06]
Interviewer
How does it work, Armin?
[24:07]
Armin Ronacher
Well, it's hard because if. Because on the one hand it's well intended, right?
[24:11]
Interviewer
If someone who's like, what is your experience? Is this your company talking with other people?
[24:17]
Armin Ronacher
So first of all we have a little bit of this errand. We're small and so my Kovana for instance, sometimes has a poor guest on the website. I talk to people that have that at scale where the marketing team all of a sudden does stuff on a website and the sales team creates ever more elaborate sales demos that sort of land up on a GitHub. Org and partially one of the most funniest one was where the sales demo built a feature that didn't exist, but nobody noticed. So this is new because previously none of that happened.
[24:56]
Mario Zeckner
But I think it's empowering.
[24:58]
Armin Ronacher
It's empowering. It's like there's a good thing to it too.
[25:00]
Mario Zeckner
If your entire org, if everybody in your org can participate in the creation of software in some form. Right. Previously, people couldn't do that. You had a designer who could figure something out in figma, but they might not be able to kind of put it into a clickable dummy demo, whatever. You might have a PM who wants to try out a feature without wasting time of an engineer. Now you can do that. The problem is that people are now so focused on everybody can do everything now that they forget that you still need a process to kind of guardrail all of that.
[25:34]
Armin Ronacher
And the integration part is the hard thing. It's like Peter gave this idea of like the prompt request, but I'm actually really warming up to this idea. Like, once you've demonstrated it, I no longer need your code.
[25:45]
Interviewer
And just to recap, the prompt request was him saying that he doesn't like to get pull requests and said he would rather see the prompt because he will run the prompt or he will tweak it and it will generate it in the style that for me, it's
[25:57]
Armin Ronacher
less about, like, I want to see the prompt as what is it supposed to be doing? And now that we understand. Because actually, in many ways, I think the interesting part is often you don't really fully know what you wanted to do in the first place. And so the act of creating clarifies what you really want to do. And so that part is highly valuable. Often the approach and the code that comes out of it is not what an engineer with sufficient seniority would have done. So it's not like I want your prompt so that I can re clank my clanker so that it does it slightly better. But more like, now that we know what we wanted to build, it's probably faster for me to start.
[26:32]
Mario Zeckner
Yeah, and I also kind of disagree with Peter on I just need your prompt. I actually value seeing a terrible implementation of something like if I get a pull request, and most of the pull requests we get on the PI repository are made by agents without a lot of human touch, let's say. Then I immediately know, okay, this is going to be garbage, but it's valuable garbage because someone has put in at least a minimum amount of thought instructing their agent to create this pull request, and I get to see how a shitty implementation of what they wanted to build looks like. And I don't need to waste my own time on trying that out. So somebody else tried it out already. That the naive dumb agent, do the thing, do no mistakes version. And that saves me time. I'm not saying I like pull requests by agents because they're terrible and I auto close them now. But they have value. It's not just a prompt, it's on an exponential. Right. Sigmoid eventually, always, because server dynamics. But I think we're going to find out way earlier than in previous cycles that this is a bad idea.
[27:35]
Interviewer
That's good news.
[27:36]
Armin Ronacher
But I think it's going to be interesting. And I don't know the answer to this, but I read this fascinating retelling of the British Industrial Revolution and how it changed the textile industry.
[27:46]
Interviewer
The Industrial Revolution.
[27:48]
Armin Ronacher
Yeah. And so the general thesis on that article was like every time something at the head of the pipeline got optimized, it created an incentive downstream of the whole thing to create something. Right. So in the beginning, if you can weave the thing faster, then eventually you need to have garn that can be weaved at faster speeds. Then eventually you need to. Everything sort of turned a bottleneck all the way down. And ultimately the biggest bottleneck in the entire thing turned out to be what I think is actually the next bottleneck we're hitting in engineering, which is like at one point you made a shirt and if you didn't like the shirt, you went back to the person that made it and they fixed it up for you. And so the actual thing was like, if the shirt is bad, nobody cares about anymark who've destroyed the shirt in the process. Is it just going to get a new one? The responsibility actually went from anyone in this chain to the entire factory as a whole doesn't have to carry responsibility anymore because we have commoditized the whole thing so much that you don't have to do this. And if you take the engineering approach of it, it's like a pretty significant part of running a company. And running a service is like running it reliably. And so you have these postmortems on incidents to figure out what went wrong in the process.
[28:58]
Interviewer
And you go back and fix the shirt.
[28:59]
Armin Ronacher
Yeah. And the thing is, we are running all on this idea that every engineer that sort of is in this creation process, that ultimate letters carries some responsibility and that we are going to that person and not saying to blame that person, but to figure out why did you do wrong here? And so the machine now produces stuff at ten times the speed. The responsibility thing does not scale in the same way, because the machine cannot yet be responsible. And I don't actually know if there is a future where you can abstract away human failure so much in how we run engineering that now the entire company now no longer cares about who signed off on a pull request or something like that. We automated in the same way, I think as we are sort of automating T shirt creation. I just don't yet see that.
[29:49]
Mario Zeckner
So here's the thing. I think one thing we software engineers or IT people underestimate is just how freaking complex the world is and how much human squishiness is in each little nook and granny and. And corner. Right. So we were thinking, oh, we were now able to automate that thing. Now we can automate everything, like every bit of knowledge work. But we as software engineers are so bad at becoming domain experts that we don't see all the non machine parts that go into a workflow. And we are running through the same fallacy here again, we are seeing models doing incredible things. I'm not disputing that. For me this is like, whoa. Basically all my research in the 2000s is now null and void because transformers can do all the things. But we are overextending that to everything like we always do in software, like we did in ed tech. Yeah, we have tablets in classrooms now. Sure, now it's solved. Education is solved because we have now computers.
[30:45]
Interviewer
Well, in fact, I've heard, I don't know which country it was, but they're now rolling back Sweden. They're taking the tablets out from the classroom.
[30:53]
Mario Zeckner
It turns out if you do some scientific investigations into the tactics and effects on pupils, if you do just throw a bunch of tablets into a classroom, close it and hope for the best. Turns out the best is terrible. So yeah, for me, I think the biggest takeaway in the past two to three years is the hype is terrible because it dehumanizes everything. And I want to not be part of that circus.
[31:21]
Interviewer
Well, speaking of not wanting to be part of the circus, let's talk about PI, which is a very popular.
[31:27]
Mario Zeckner
Let me get my clown nose.
[31:28]
Interviewer
And also minimalist coding agent. Can we start with the backstory of why you decided to build PI at a time where there were already agent harnesses around.
[31:39]
Mario Zeckner
Right. Because they were suboptimal.
[31:44]
Interviewer
Tell me more.
[31:45]
Mario Zeckner
Yeah, sure. So I was a believer in cloud code just because they kind of created that whole genre through the invention of a gentic search. I mean invention. There were precursors to that and shoals of giants and so on. But they were the first that packaged it up in a really compelling package. And at the time that fit my workflow really well. It was simply, it was predictive. So the LLM heuristic nature or stochastic nature of being kind of unpredictable, but everything around the LLM was kind of nice and tidy and easy to understand.
[32:18]
Interviewer
So you were a happy user of claw code, right?
[32:20]
Mario Zeckner
I was super happy. I was proselytizing it. But eventually the team started dogfooding and getting more and more tokens, I guess, and kind of increased velocity and team size. And with that came more features and much, much, much more bugs. And I personally like simple tools that are stable that I can rely on even if they have non deterministic parts. But all the deterministic parts should be as stable as possible. And that was just not the experience with cloud code around summer 2025. So I kind of soured on that real hard.
[32:54]
Interviewer
Was it bugs? Was it unexpected behaviors?
[32:57]
Mario Zeckner
So they take away your control of the context. They would inject stuff behind your back, which is bad. And then your workflows that used to work stop working because there's now a system reminder that you don't even see in the UI that will modify the behavior of the model. They would also do this to the system prompt. I reverse engineered. I mean I wouldn't call opening an obfuscated JavaScript file and unobfuscating it reverse engineering coming from a more low level background. But I reverse engineered cloud code during the summer of 2025 and built a little service where I can track the progression or evolution of the system prompt and tool definitions in cloud code. And it's like every release it was messing with stuff. Cchistory mariosegna at if you want to see that. And yeah, that just messed with my workflows and I don't appreciate that. If I commit to a development tool, I want it to be a stable, reliable thing like a hammer. I don't want my hammer to break a different spot every day. Yeah, that's terrible. So that's what happened with Claude. But again I'm. This is not like I'm not roasting the team. I think they're. Some of them are really nice people I got to know on the Internet. They're just dogfooding and that's perfectly fine. We need somebody who like goes to the full velocity kind of way. But I don't want to work with a tool like that.
[34:08]
Interviewer
Yep.
[34:09]
Mario Zeckner
Because I can get work done.
[34:10]
Interviewer
It sounds like the move fast and Break things. To break things was not for you.
[34:14]
Mario Zeckner
No. And then I looked into alternatives and amp Android came out around that time, I think pretty early in 2025. I don't remember. Amp was earlier.
[34:25]
Armin Ronacher
It was very early. I think they sort of spun off from the same experience of taking because I think amp was around when cloud code came out.
[34:34]
Mario Zeckner
That's pretty sure.
[34:34]
Interviewer
Around that time.
[34:35]
Mario Zeckner
Yeah.
[34:35]
Interviewer
Yeah.
[34:36]
Mario Zeckner
In any case, I looked into those harnesses and they were super good. They were just super expensive as well because none of them could basically use what made color code enticing on top of it being a cool tool, the subscription. And that works in an enterprise setting where you are paying by token anyways, but it doesn't work for the small tinkerer in the garage. While I'm not a small tinkerer in the garage in the financial sense anymore, I kind of still relate to that community and I would like to use my subscription with something. So I looked into open source alternatives and found open code. But while that kind of wipes me from my OSS roots, it too did stuff to the context I didn't appreciate behind my back. Pruning tool results after a certain amount of tool result token output, or asking an LSP server after every single edit the model makes if there is an error. Yes, there will be an error because the model isn't done yet with its work, so the code doesn't compile. So the LSP server will.
[35:37]
Interviewer
So like reaching a lsp, the language
[35:40]
Mario Zeckner
language server protocol server. Yes. So when you go into VS code and you type some typescript, you have like in the bottom some error diagnostics, and that comes from an LSP server for TypeScript. And OpenCode runs an LSP server on your behalf in the background and feeds the model with diagnostics from that server on every edit. We as programmers, how do we work? Right? We go into one or more files, we added line after line after line, and only then look at the errors that resulted from that. In OpenCode's case or in other harnesses, cases that also support LSP, the model calls an edit tool to change lines and they would inject the diagnostics after every edit call. And that's just not smart because now you're confusing the model with you have an error, you have an error, you have an error and the model is like yeah, I know, I know, I'm not done yet. Oh, it's not great. Anyways, TL Dr. Opencode wasn't for me either. It was also I had to fork it to modify it Which I don't think should be necessary. So then I just thought, how hard can it be? I built my own little thing.
[36:41]
Interviewer
And then your own little thing is pretty minimalistic. What does it use? What's the basics of PI?
[36:47]
Mario Zeckner
The basics of PI are my own abstraction of all The LLM provider APIs because I didn't like the Vercel SDK. The Vercel AI SDK for various reasons. Armin kind of wrote a blog post eventually about that as well. It's obviously good to use. Lots of people use it. It just didn't fit my old man sense of abstraction.
[37:08]
Interviewer
But this is the beauty of software and especially open source. You can build your own always.
[37:14]
Mario Zeckner
Yeah. And now with agents you can even do it faster and produce terrible complex software. So I built an abstraction with that. Then I built a little abstraction for a generalized agent loop with tool calling and streaming, all of that. I built a bespoke little tool that doesn't flicker or not a lot. And then I tied that all together into a coding agent that looks like Claude code or codecs or whatever you have. That's it. And the extensibility comes from the fact that this minimal core has so many hook points that you can basically hook into with a simple typescript module that gets loaded into the same node process and that allows you to do things like provide the LLM with custom tools, do your own compaction implementation, fully revamp the TUI itself. You can modify everything in the tui. So if you have a special.
[37:59]
Interviewer
The terminal ui, right?
[38:01]
Mario Zeckner
Yes, exactly. If you want the TUI to behave differently for a specific workflow, you have like, say you're non techie, you can change the TUI to become whatever you need. As a non techie, I have a couple of non techie friends that did that because they don't need to know how to build this, they can just ask PI to build it and PI will modify itself.
[38:21]
Interviewer
Oh, so this is the thing, right? So you can ask PY to modify itself because of the extension points and it can write code that extends itself
[38:30]
Mario Zeckner
and it's trivial, but it's a big unlock.
[38:33]
Interviewer
Is this what you meant when you said that for open code you needed to fork it to modify it? It doesn't have this.
[38:38]
Mario Zeckner
It does have a plugin system, but there's not a lot of extension points and it was very rigid. I think they changed it recently. I think it's much more open now. I haven't kept up with it, but might be better now.
[38:50]
Interviewer
So I guess PI Stars has this very minimalistic thing as I understand the tools it has is read, write, read, write, edit, bash.
[38:56]
Mario Zeckner
It's all you need, that's it.
[38:58]
Interviewer
And then you can actually start to make it your own. Like, okay, what are examples that people put out?
[39:04]
Mario Zeckner
PY doesn't have mcp. People just ask PI to build MCP support into PI. PI doesn't have a plan mode. Armin goes, and my plan mode must be fantastic, bespoke and super smart.
[39:13]
Armin Ronacher
I don't have a plan mode.
[39:14]
Mario Zeckner
Yeah, but he has like five implementations of a plan mode until he realized plan mode is entirely useless. Other people just like messing with the UI and making it their own. Like a different visual style of the editor box where you enter your prompt stuff, like trivial stuff, more cosmetic stuff. Other people have retriggered it for full blown RL environment for open weights models where they use PI as the agent that does. That's part of the RL execution environment. So you can do anything really.
[39:46]
Armin Ronacher
What drew me to it beyond actually using the library abstraction was in fact the custom tools part. One moment for me was over Christmas again, like many people had some time and I tried to build other things and Peter was talking to me in November that he's vibing without looking at code more or less. I don't know exactly how. He said he can do this now. Like, okay, I want to build a thing where I don't look at the code. I wanted it to not look like slop. I wanted a version of it where afterwards, even though I don't really look at the code, it should look like what I would have written and I want to make a game. And so then I basically started the whole experience with just basic pie. I was like, we want to build a game. But actually before we build a game, I want you to set up the code base in a way that you can validate the changes that you're making. But also I can see them like a two prong kind of approach. Like I wanted to be in the loop but also have the agent be able to validate itself. And what sort of emerged out of that was. Well, first of all, it built itself some debugging tools into the game so you can make screenshots and run a simulation and sort of dump out state and read it again. But also PI can show images in a TUI and. And I added a bunch of like I talked with Dick Twinker to figure out what would be interesting things to do. But we ended up having all these screenshots I can tap through quickly in the UI or PY has also this great feature I can reverse to an earlier state in the conversation and then it can branch within the conversation. So we build a bunch of stuff around that because these sessions, especially with screenshots and it became very token inefficient very quickly. It was actually one of the other things that PI was rather quickly, rather good at was having a lot of
[41:29]
Mario Zeckner
screenshots in it because openclaw people had a lot of screenshots in their chats and openclaw is using PI, so we
[41:36]
Armin Ronacher
had to but having this. It felt really magical for me to actually treat the problem as I don't know what the right way of engineering here is, but very clearly part of it is like I should be in a loop so we can figure out how to specifically for the problem at hand do that. And it turned out like for web project and computer games and some of the other things I tried, they're kind of different, but very many of them sort of come down to similar thing where like the agent interacts now with my program and should do the most optimal way and I want to interact with it in conjunction with it, interacting with the program and the entire experience should be as little confusing as possible to both me as a human and to the agent. And I found it very, very fascinating just to see how that emerges where like your tool all of a sudden when you launch it in this program looks and feels different than if you launch it in the other program.
[42:33]
Host
I really like this point Armin made just a few seconds ago that AI works best when the engineer stays in the loop and the system can actually validate what changed. And this is a great time to mention our season sponsor Sonar. AI can now generate code faster than you can verify it. Sonar, the makers of sonarqube, sees this leading to serious gap in verification. With the rise of coding agents autonomously writing code verification is no longer nice to have. While the latest coding models are extremely intelligent, they also are error prone and they don't fully understand your code base and your context or your objectives. This is why verification must be mandatory in agentic workflows. Sonarq provides a 0 trust multilayered approach to code verification that is consistent and repeatable. It analyzes semantic syntax, data flows and architectural boundaries at agent speed, acting as a critical trust and verification layer before any code reaches production. Covering 40 plus languages and 7,500 issue types, Sonarqube is the most comprehensive code verification platform available and with easy integration via mcp, CLI and hooks, it fits right into your existing AI tool chain. Let agents move fast and have sonarqube as the independent multilayered verification for safe, reliable and auditable agentic development. Head to sonarsource.compragmatic to start verifying your agentic workflow today. I'd also like to talk about our presenting sponsor, statsig. Statsig builds a unified platform that enables both experimentation and continuous shipping. Built in experimentation means that every rollout automatically becomes a learning opportunity with proper statistical analysis showing you exactly how features impact your metrics. Feature flags let you ship continuously with confidence and because it's all in one platform with the same product, data teams across your organization can collaborate and make data driven decisions. To learn more, head to statistic.compragmatic with this. Let's get back to the episode and to the topic of general versus Purpose made tools.
[44:30]
Mario Zeckner
Yeah, I mean I spend a lot of my UFUN construction sites to earn money and you don't use a hammer for all your problems at the construction site. You have a screwdriver, you have your hammer, you have your drill, you have whatever. And I think in engineering it's kind of the same. I'm not using the same tool for every task I do as an engineer. So now if I use an agent, I don't want a general agent for every task per se. I want a specialized thing where I know the performance will be top notch for that specific task. Because we built the harness in a way that the agent can be most effective at this task just because of the construction of the way the harness is constructed. And that's what I wanted to enable with PI. That said, I'm probably the person that has the least amount of modifications in PI. I have like two extensions that I use and they're trivial. They're basically just if you see a URL that looks like a GitHub issue or pull request thing, pull down the details via the GitHub API and display me a small little widget on top of the editor that gives me the issue title, the author account, and a link to the issue. That's basically all I do.
[45:33]
Interviewer
Well, it might work for you as a minimalist.
[45:36]
Mario Zeckner
Yeah, I mean that's how I work on the PI mono repository, because I might have two or three of sessions open in which I process and issue a pull request. That way I remember what the session was about.
[45:47]
Interviewer
But sounds like you also made your PI for working on the PI monorepo a specific one. And if you, if you were working on a if you went back to Building games, you'd probably have a. I never thought of the fact that you might want a different harness for a different task. I guess we just kind of assume that most developers, you work on your main thing at work, you might have a side project and just experiment, experiment with whatever. But this one, this. I wonder if this is a new thing that we could never have. We could never have custom tools for a project. That just sounds crazy, you know.
[46:20]
Mario Zeckner
Here's his. Like my intuition is this, I think where we are going is software that modifies itself on behalf of the user's wishes and needs. And the agents can do that now if you give them enough rope to modify themselves. And I think with PI, that is my first foray into this kind of self modifiable malleable thing just for the coding agent sector. But I think this actually can be extended to all kind of knowledge work to a degree for specific tasks within the broader set of knowledge work. Obviously dehumanization and so on, you know. But yeah, the next plan here is actually to have an alternative user interface to the tui because the TUI is obviously limited and the best alternative stack is obviously the web because it works everywhere and can do anything. So once I have that built out, then it really becomes interesting because then you're not limited anymore to the line based rendering of a terminal. Now you can do really, really interesting stuff. And so yeah, we'll see how that works out.
[47:18]
Interviewer
And one reason that I learned about PI before I knew that it was this minimalist Interface is how OpenClaw is using PI.
[47:28]
Mario Zeckner
How did that come hanging out and reviewing each other's blog posts and just throwing ideas at each other. And in October I started building out PI and Peter started building out Varelay, his little WhatsApp assistant, so to speak.
[47:42]
Interviewer
Oh, that's how it started.
[47:44]
Mario Zeckner
Yeah. And he was in search of a gentic core he could reuse or copy. I think it started out by him taking PI and cloning it and calling it Tau and then modifying it. But eventually he got tired of having to maintain that. So it just said I'm going to use your stuff. And that's how it ended up being PI wouldn't have compaction if it weren't for openclaw. I specifically built that because Peter was crying in the in chat and I need compaction. Okay, you get compaction, but I'm going to tell all my users don't use compaction, it's bad for you.
[48:16]
Interviewer
Yeah, but that's, I guess the beauty of building on top of open software one another.
[48:20]
Mario Zeckner
Right. I mean it has pros and cons. Yes. I now get to enjoy all the openclaw instances that think bugs in openclaw are actually PI bugs. So they autonomously send me a gazillion issues and pull requests without the users probably even knowing. And I get to deal with that in my open source. So that's a negative side effect.
[48:39]
Interviewer
Well, so you're really on the receiving end of this, I guess.
[48:42]
Mario Zeckner
I mean, just like openclaw itself is, which is much more exposed to this problem. I mean, they have tens of thousands of issues now and there's no way they can get a good grip on that.
[48:51]
Interviewer
But how are you dealing with the fact that you now have openclaw just AI autonomously opening things on your repo? As a maintainer, do you build tools to battle this and try to close them out or build a tool for
[49:04]
Mario Zeckner
OpenClaw ones which embeds issue and pull requests into a 3D space so I can see the clusters of similar things that that agents would have sent to the repository and then I can bulk select things and close them out. Oh really?
[49:15]
Interviewer
So you actually have a 3D visualization?
[49:17]
Armin Ronacher
Yeah, open Clover context. I think it's less crazy now, but end of December to I think mid February. I mean it was exploding obviously, but like this explosion almost like directly translated to I was on this repo refreshing pull request and the number went up.
[49:38]
Mario Zeckner
Yeah, we actually tried to contribute and help out Peter a little bit, but I immediately gave up.
[49:45]
Armin Ronacher
I didn't know how to do anything useful there. I was looking at this, I was like, this is a type of software engineering I'm just not used to.
[49:52]
Mario Zeckner
I would fix two things and spend an hour on them and then five minutes after I committed and pushed it, some clanker comes along and just reverts my fixes. And this is not how I.
[50:02]
Interviewer
Okay, can we talk about the name Clanker? Oh sure.
[50:06]
Mario Zeckner
So Clone Wars, Star Wars. I actually never watched it, but kids of friends of mine watched it a lot while we were visiting them. So I kind of through osmosis got the lore and there is an army of robots and the Jedi would call them clankers or people will call them clankers because when they move, they clank. Clank, clank. Yeah, that's the origin of that. Yeah.
[50:29]
Interviewer
So an AI, a droid.
[50:31]
Mario Zeckner
Yeah, exactly. Yeah. But coming back to the how do you deal with the influx of attack agentic pull requests and issues? I just auto close every pull request. Human agent, doesn't matter what I do is if I haven't had contact with you previously, my GitHub workflow knows about this because if you had, you're in a file in my git repository, your account name. So if you're not in there and you send me a pull request, your pull request gets auto closed. And then my little workflow posts a comment under your pull request that says hey, thanks so much for contributing, really appreciate it. Could you please open an issue in a human voice no longer than a screen's worth of text and if I like it, I type. Looks good to me. And then that account name gets put into the file and the next time they send a pull request they pass and it turns out agents don't see the comment. My GitHub workflow posts underneath the pull requests. So this is a great filter for filtering out agents and keeping the humans safe, more or less from.
[51:34]
Interviewer
This is interesting. I wonder if this might be an unavoidable future where we need a way to separate. Is this coming from a human with an intent or an AI?
[51:46]
Armin Ronacher
I don't necessarily care if it were actually a good pr, then if it came from a machine, it's actually, I think what's interesting in PI is like. And OpenCloud even more so is like it accumulates pull requests. Well, actually there was no intentionality behind it at all. And so the person that dispatched the machine didn't actually care that much about
[52:10]
Mario Zeckner
it, but didn't even know about it
[52:11]
Armin Ronacher
or didn't even know about it. And I've done open source for many years and that was also there was a big difference between someone that sending pull request up or like an issue. And it's like, hey, please fix this. But actually didn't care enough to even reply to questions anymore. This is not uncommon. And then you don't actually have to fix that, but you have to close it out because maybe it's still useful input. But clearly that person wasn't caring enough. And with the pull request, it's even worse now because they come in so quickly that many of them cannot be merged anyways without manual resolution of the conflict. And there's a lack of back pressure mechanism because even I as a human, if I see there's like 500 pull requests open, I was like, I probably will not contribute to this thing now because at worst I will make it worse.
[53:02]
Mario Zeckner
I think previously in open source you had the people who would just send issues and be very entitled and say you're the worst person on the planet if you don't Fix my little issue. But that's fine, that can be handled. And pull requests were kind of special because it needed a human to invest quite a bit of time to produce them. And you don't have that anymore. You just have people, oh, this should be easy. Agent, please do this thing, make no mistake, send it to this repository. And that's just not going to happen. So basically what we need are bottlenecks. I'm not necessarily, I don't necessarily need human verification or verification that you're human. I just need a bottleneck that allows me to process the amount of incoming things as a human, because in order for PI to not deteriorate into a pile of garbage, I still believe that it needs me and other capable people reviewing at least the important code. And for that I need bottlenecks because otherwise I can deal with.
[53:54]
Armin Ronacher
It's a second law of thermodynamics, right? It's like everything degrades towards chaos and you have to put extra energy into to keep it away from this outcome. And we don't see and feel the pain of the code base anymore if we stop looking at it and people don't feel the pain or they feel no restraint anymore. The issues are also interesting because on the one hand it is something great about someone doing an investigation and sending you a description of that that can be good and can be bad, but they look very similar. It takes quite a bit of energy to tell apart a good and a bad AI generated issue request. And unfortunately like most of them are not great, but some of them are actually good. And that's also kind of. It's weird. Like all of it is weird. I really don't know what the future of open source is in many ways because like, a lot of open source really worked because people piled out on hard problems and so they congregated around it and said like, now we need to have a good database, so we're going to put all this energy on building a good database. And the value of open source came from there's some hard problems and we're going to throw our energy together and we're trying to figure out how to solve it. And now it feels like open source is all about throwing stuff up. What really grinded me so mad was people, particularly a lot of agentic engineering right now is building more stuff for agentic engineering. So it's Uborus or Eborus or whatever you call it. And I see this tweet and it's like, oh, I solved problem xyz and here's my solution for it. And you click on this thing, it's like it's 48 hours old. That person probably never used the thing that they built.
[55:38]
Mario Zeckner
I would like to suggest to the viewership to look at Arvind's GitHub account over the last year and what happened there.
[55:44]
Armin Ronacher
Yeah, I built a lot of his stuff, but I don't then go on Twitter and say, hey, I solved the problem. I have a shit ton of Vibe slop on my GitHub account and I wish I could mark it differently because, like, maybe there's some utility in it, but unless you're going to actually have that code base still be there a year, a year and a half from now and someone is still using it, the utility of that is actually not validated in a way. And there's so many markers and metrics you can look at now for GitHub that really demonstrate this explosive growth of it. But if you were to then maybe find some other number to see like how many of the things that are being created are actually turning into like really fundamental pieces that can sustain open source communities that can actually deliver this value that scales amazingly. We haven't actually created many Vibe engineered projects that have become that.
[56:41]
Interviewer
But I like how you mentioned energy and how open source always worked. If we just think pre AI, again, let's say Linux, the most successful or widely used open source project. It has both an energy and a structure. People come in with intent that they want to add something, they have a process where it goes through. There's human trust at every level. There's a little pyramid and in the end it all goes back. Exchange request goes up one level and in the end Linus does the cut. But there's a lot of energy, there's a lot of intent, there's a lot of humans, There's a lot of humans. And it was always about human energy. And now we suddenly have this AI, which it's just tokens right now. Who knows how much they're subsidized or not. Or it's just machines doing and suddenly they create plaus things that look like human energy and it's hard to differentiate and suddenly just like throws this wrench.
[57:33]
Mario Zeckner
I actually disagree. I don't think a lot has changed to open source.
[57:36]
Armin Ronacher
Okay, the volume has changed.
[57:39]
Mario Zeckner
No. Yes. But that's just a number. The amount of, as you said, the amount of actually useful and maintained projects has probably not changed a lot.
[57:48]
Interviewer
So you're saying that the ones that were there, they're still useful and maintained
[57:50]
Mario Zeckner
not even the ones that were there. I mean there's a specific rate of new open source project that survive longer than two weeks. That's always been the case. So now we just have more projects that die after two days than before. But we still have the same amount of projects that will have a long term viability. Just because there are humans that actually care to maintain the thing over a long time. Build a community of humans that support the entire thing, build an ecosystem around the entire open source project that makes
[58:21]
Armin Ronacher
you not believer into multbooks.
[58:24]
Mario Zeckner
No, I mean good job meta butting that up. Super useful. No, I think at the end of the day we are kind of freaking out when we don't actually need to. Because apart from the fact that I personally can now generate code faster than speed of light for me building an open source project and that entails not just the code, but the community around it, the spirit around it, the ecosystem around it, nothing changed. What changed is mechanical parts. I. I need the bottlenecks to deal with the influx of exponentially growing agents. Pull requests, whatever. GitHub itself is under immense pressure because now it's not just humans hammering their infra, it's now billions of millions of openclaw instances hammering their infra. Everybody complains about GitHub going down. I actually think they're doing a pretty good job. That's a lot of traffic that's coming their way since basically Christmas, since basically open call. So yeah, I would be a little bit more optimistic. We're just in the messing around and finding outstage at the moment and everybody wants tokens to be a KPI, just like lines of code used to be a KPI.
[59:30]
Interviewer
We've seen this speaking around of things that don't change and messing around and finding out you wrote a tweet or you wrote somewhere that your biggest enemy is complexity. It's also your agent's biggest enemy. Can we talk about that?
[59:45]
Mario Zeckner
Very simple. If I have a 600 lines of code code biz and my agent can at best be effective up to a context window size of around 200,000 tokens, how much of the code can the agent see?
[59:58]
Interviewer
A third. Right?
[59:59]
Mario Zeckner
Right. If you manage to get all the relevant code for a task into that context window, you're probably okay. Although that is a separate project, an information retrieval problem which is not solved and which agentic search also doesn't solve. That is, are you sure that the agent finds all the relevant code it needs to find to fulfill a thing. That's also where all the garbage code comes from because it doesn't see all the things it needs to see. In this case, let's assume the best case, information retrieval is solved. Everything fits into the context. Agent does a good job. Okay? That's not the reality we're living in. Because now the agent spit out so much code that they themselves cannot possibly read into their context on a new task anymore. You know what I mean?
[60:45]
Interviewer
Yep. They develop their own context window.
[60:47]
Mario Zeckner
Yeah, exactly. The complexity they add is their own worst enemy, because eventually the codebase will be so big and so complicated and so interconnected that the agent has absolutely no way on a technical level to ingest all the context it needs to do the new task. And I would like to point out that the agent has learned all of this garbage from the Internet and from us. Because on the Internet there's all our old code. While there are some pearls, there's also a lot of swine because we have a gazillion GitHub projects from the olden days where we just tried out things, and because instances like Linux or any other really well maintained and well written open source project are minuscule compared to all the rest of the garbage. And a machine learning model will kind of converge towards, well, simplified to the mean. Right, and what is the mean then? It's not the handful comparatively of excellently engineered projects, it's all the garbage on the Internet, all the cargo culting, all the trend type of the day kind of stuff. And that's what we get when we let the agent do all the things for us.
[61:55]
Interviewer
Yeah, so we have this problem of things are getting more complex, which slows agents down, which will in fact impact quality, which we were just talking about. But Armin, now that you're building your own startup, two of you are building your startup. Now how are you. And you're working with agents. Right? And they will have these things. How are you dealing with generating code, building products, balancing quality, type, depth, complexity?
[62:22]
Mario Zeckner
I'm not dealing with that badly. Look, I think that we're coping, we're not dealing with.
[62:27]
Armin Ronacher
I don't know if I wrote this in the blog. I definitely have it on my slides for the conference here. I enjoyed the time from April to about October immensely because it felt like I can do so much. But also there was no heightened expectation. The world has not yet gotten used to this idea that everything has to now also move at 10 times the speed. And there was a moment of time where I felt like we worked in this vibe tunnel thing in the beginning. And I Was like, it felt so much fun because, like, I have time now to play with the kids and I just prompted a little bit on my phone and like it felt Vibetunnel
[63:08]
Interviewer
was where you could set up with your phone, talking with your hr.
[63:11]
Armin Ronacher
Yeah, it was just like a terminal, basically.
[63:14]
Mario Zeckner
Yeah.
[63:15]
Armin Ronacher
And it's not that we did much with it, but like it had this like happy vibe. And like, I know that I spent too much time on a computer, but I didn't feel any pressure. But now it's like this, like we're collectively feeling like everything has to ship faster, it has to iterate faster. The baseline that we want to achieve in terms of fidelity and everything has to be higher. And so now it feels very stressful.
[63:40]
Interviewer
Even in your own startup. Yeah.
[63:42]
Armin Ronacher
Because to some degree you can be the most stoic person in the world and it's still going to get at you in a way that I'm slowly learning to work with my own emotions in a way on dealing with this. But I find it very, very hard in a way because I was used to things working a certain way and I knew how I do some stuff and then I fell a little bit too much into the trap of giving into the machine and actually doing things in a way that I normally wouldn't have done.
[64:13]
Mario Zeckner
Things that you regret.
[64:14]
Armin Ronacher
It's definitely gentle regret, gentic regret.
[64:17]
Mario Zeckner
Yeah.
[64:18]
Armin Ronacher
And so quite frankly, the answer is I feel like now, with a little bit of power of hindsight, learn some things that I wish I would have learned probably in November.
[64:28]
Host
Tell us.
[64:29]
Armin Ronacher
Well, I mean, a lot of it is really the recognition that there is no backchannel to me or to any other engineer. When under normal circumstances there was a back channel. There was this feeling of things are not quite right in the code base. There was this. Now the change is harder and the complexity you sort of see then the complexity of the pull request getting higher. But if you rubber stamp it, then what's the back channel there? And so this mechanism, this back pressure, this friction in the code base you don't feel when you work with the agent.
[65:01]
Mario Zeckner
I think there's a way to kind of measure it. If I scan through my sessions on a project from start to current date, I think the frequency of curse words increases because the agent starts messing up more because it itself cannot deal with the complexity of the edit of the project. And I would be actually really interested in whether this is measurable because I feel it in most of my projects now that occurs a lot more.
[65:25]
Interviewer
But you mentioned friction in the software. You didn't say tech depth, you didn't say complexity. What is this friction? Because I don't remember us talking about this pre AI at all.
[65:39]
Armin Ronacher
So I found this ironically kind of funny and it's kind of sad, but I will not name any names. But there was what I assumed was an incident related, at least in part, semantic engineering on a company where they shipped out a configuration change that ultimately resulted in a security issue. And look, things happen. But the link that I saw on this had the social preview of that company's tagline and the tagline was Ship without friction. And that really gave me pause because I know as an engineer we used to talk about got to get rid of all the things in the way so that you feel happy shipping stuff. But there always were changes where you really wanted to think. It's like, do you want to drop the database? Do you want to merge this migration, which might take a table lock that could potentially take you down. It's like there's moments every once in a while where you were really supposed to think and people created checklists or people created mechanical gates where you would have to confirm something. There's certain things that we used to put, particularly if you run a SaaS company, did it put stuff in. So to slow things down or. In some of the best engineering teams, in order to mature a service, you have to define an slo, you have to define expectations and if your service is supposed to be critical. But there's some other stuff that unlocks on this sort of tree of requirements. And a lot of engineers feel like, oh, this is also this bureaucracy. But the reality is if you do this correctly, then it saves you time and it makes you happier. You're not waking up at 3 o' clock in the morning. Like all of this is useful.
[67:17]
Interviewer
It's like friction injected to deliberately slow things down. I guess the easiest example, in any decent sized company you have services based on tier, based on criticality. The highest tier software now needs to have, let's say two or three code reviews or an approval from a director to do a configuration change, which again all slows down. But the kind of like we know this is on purpose, like by adding this friction, we want you to think, do I want to push through this friction in terms of time invested or effort or having to justify things, etc.
[67:52]
Mario Zeckner
It makes you think about, do I really want to add this to the code base if I know that the end effect will be that it has to go through this entire chain of hardware. So we're Coming back to saying no to yourself to avoid pain, going through
[68:05]
Interviewer
that process and then taking on the pain when you know that you have the convenience, you have the backing, you have the confidence as well, right? Like so typically when it's a high friction thing, let's say a tier one service or a highest tier service where a director have to sign off, when you're a new joiner on the first day and you don't know the context, you probably know that that's a pretty large ask and you'll probably socialize, get buy in from an experience and say like oh, this is the right thing. You'll go with them right back to Human Dynamics a lot of times I
[68:34]
Armin Ronacher
think the thing is there's a very delicate balance in the whole thing because you don't want the friction to be just an accident of having created bad developer experience. But some things look the same, but they were deliberate, but they may not sufficiently documented. But there's this feeling now like get rid of all the friction so that the agent can be very autonomous so that it can run many of them simultaneously. A lot of it comes from that. These things are actually rather slow and the only real time saving that you get from it is parallelism. And so somewhere there is this trap. I feel a little bit more experienced now in managing the trap, but I don't have the solution for that either. And I will not like say that is an example code base where I felt like really really great about the stuff that I built except for pre existing libraries from before enchanting days where I still feel like a strong emotional attachment to them and I'm much more careful about doing them than any of the code that we other than PY to which I don't have access.
[69:49]
Mario Zeckner
Oh no, there's still no right access. There's a lot of slop in py, but I try to avoid it in the bits and pieces where I know that's important code. Like we have an HTML export functionality where it takes the current session and just spits out an HTML file that you can then host on GitHub and whatever. I have not looked at a single line of code for that function. I don't care if it's broken, if it looks right when it comes out. But then there's the agent loop itself or the extension loading mechanism and all of that stuff and that's important. And the way I deal with ensuring that that has or at least trying to ensure that it has high quality is I refactor mercilessly because that pulls me into the code base. I need to understand what I want to change structurally, not just line per line and syntactically or whatever. I need to understand what's going on to do a good refactor. And I'm doing that every now and then, like I'm doing now at the moment, prompted by wanting to add a new feature that's currently not possible with the current architecture. Being in the code is the one thing that keeps the code base quality high and the complexity low. But that's against the industry wisdom of burning as many token maxing, basically.
[71:01]
Interviewer
Yeah, that's an interesting one happening. But you just recently wrote on the same theme, a blog post called We all need to Slow the F Down. Can we rehash some of the thinking and what triggered you to just put it out there?
[71:15]
Mario Zeckner
Okay, so the basic gist is, okay, your agent can now spit out 10 times more code a day than you can. But it also means it spits out 10 times more boo boos errors. Even if it has half your error rate, then, okay, it's not 10 times more, it's five times more. It's still more than you would spit out. So the rate of deterioration in your code base has now increased and now go Dark factory. Now take 100 agents that do this to your code base. What's the end result of that? So that's the first problem, right? You need some way to review all of that code. It now gets generated to fix all the boo boos. But you can't as a human, because as a human, you're used to spitting out 1.5k lock a day, and that's about the limit that you can actually review. Well, right, if your agent spits out 10 times that, no chance you can review that. And not all of that code by the agent might be important, like the HTML export thing, right? But even if the agent spits up 3 to 5k a day, you have no way of reviewing that in any meaningful sense. And then if you do the armies.
[72:17]
Interviewer
Yeah, I mean, and then the armies. This is interesting. So you call it the Dark Factory. The idea being that tens or hundreds or thousands of agents, you give them a spec, they go and they break it up. They organize themselves like the mayor and all that jazz. They have the QA agent, they have the. You know, you give them roles, you give them context, and then you give them enormous amounts of tokens and spend. And the idea is, or the hope is that, oop, your software will be done in.
[72:46]
Mario Zeckner
Oh, there will be something, will be Done. Definitely something's going to be done. First your purse and then. No, yeah, sure. More power to the people that make that work. I can't make it work. And the reason I think I can't make it work is because I still care about the quality of my product. And I don't care if it's built by hand or by agent. I just want the quality to be good, both in terms of how easy it is to maintain it and add new stuff to it on a developer side and on the user side. All the companies claiming that all of the code is not written by agents. Yes, we know quality is garbage. We feel it in our bones when we use your product. It's garbage. So I don't want that. And yeah, basically I think people need to turn around and say, hey, what are we even doing here? We have these wonderful machines now that can take away so much pain from us by doing stuff we hate doing and doing that really well. Why don't we start by giving us some more free time to work on the interesting bits and delegating the stuff we know they can do to them on large across the entire organization. Find all the things that annoy the sh out of you and have the agents automate that for you. And then you suddenly have time to think about, what do we actually want to build, what do our users need? And if we decide to build the thing, then we can pull in the agents again and say, and we're going to polish the sh out of that because now we have the time and the means and the tools to do an excellent job. But that's not how we are working. We build an army of agents and install beats and make a big spec that hopefully will result in something crying. But here's the thing we talked about. Where did the agents learn the knowledge from? Right. The Internet. So garbage to mediocre. Now, if you write a spec, what's the best possible spec you can have?
[74:37]
Interviewer
The best possible spec is. Well, you define exactly how it should work.
[74:41]
Host
You give it test cases.
[74:43]
Mario Zeckner
Best possible spec is the software itself.
[74:46]
Interviewer
Oh, I see what you mean. Yes.
[74:48]
Mario Zeckner
Okay. You write a spec that's not the software itself. So that means there's a lot of planks that need filling in.
[74:53]
Interviewer
Yes.
[74:54]
Mario Zeckner
What do you think? Is the agent going to fill those planks in?
[74:58]
Interviewer
Most likely from stuff from his training data.
[75:01]
Mario Zeckner
And we already identified what the quality of that training data is. Right. Garbage to mediocre.
[75:06]
Interviewer
Well, and even before AI, don't forget, Stack Overflow had a really big criticism because There was this thing of like, well, you control C, control V from Stack Overflow. And oftentimes there will be some answers where the first answer was either not correct or not correct in many cases. Regex for email was a good one. You emailed Regex for Email, first page was Stack Overflow. Everyone just copied the first solution. And I think underneath number three, it was said it missed a bunch of cases.
[75:31]
Mario Zeckner
Yeah, but here's the thing though. I'm not saying agents or humans are better. They are clearly not. But agents also don't solve that problem. And if you then don't let just one agent that's already 10 times more productive as you do the thing that it's bad at and that you as a human are bad at, but a hundred of those, what do you think is the outcome? Yeah, it's just very simple math.
[75:51]
Interviewer
Let's talk about another controversial topic, MCP versus cli.
[75:56]
Mario Zeckner
Oh my God.
[75:59]
Interviewer
It's coming up and right now I'm hearing a lot of people really going for CLI is the future. And I think I'm sitting with two of them. But also, MCPs are also really popular inside of large companies. Especially when you talk with a bunch of people working at large companies, it seems MCPs have found a real product market fit inside of larger enterprises.
[76:18]
Armin Ronacher
Despite what people might think, I don't actually hate MCP quite as much.
[76:23]
Interviewer
Oh, wait, we have it on recording.
[76:26]
Mario Zeckner
Yeah, no, we don't deal in absolutes. We're in sif.
[76:29]
Armin Ronacher
So my fundamental challenge with MCP is that I think, first of all, the spec is very complex, I think, for it, but it's like this is just generally how specs happen to be. So it's a bit the core of its time. So there's an inherent complexity in it. But if you were to say, like, okay, so what is it really doing? At the end of the day, it's authentication and it's sort of invoking some stuff. And mcp, even theoretically there's structured responses. But MCP for the most part is run some stuff, put stuff back into context, and then work with it so it fills your concept very quickly. And there's a cloudflare has this codemod mcp, which I, in principle I really like. I have an MCP for testing, which is a JavaScript interpreter that gives me access to the Google API. And between an MCP like this and a skill, there's not a huge difference because the skill also needs to be in a system prompt so that it finds it. But the agents are just very, very, very, very good at running code. And MCP is not quite running code. It's basically rag. It's like input in and do some stuff and maybe some state transition at the model also doesn't see but it is in that sense just. It's a hard problem to solve, but it does solve off. It solves a whole bunch of things. I want it to work, I just still don't get it to work. Like I wish it could work. And my suspicion is still the glue has to be code execution. But because MCP servers are largely not defined in a way that the model actually understands them, I haven't found ways to compose MCP tools reliably. I found ways to make the MCP itself be composable by having the MCP be one tool run code. But I haven't found ways to then orchestrate larger ones. I want it to work and I think it has found its niche and I don't think it's going to go away.
[78:24]
Mario Zeckner
I think it's just a victim of its own success really. When the whole thing started I think it was in October 2024 for it was more or less a solution to get external services into consumer facing chat apps. Connect your emails, connect your OneDrive. Connect your what?
[78:41]
Interviewer
Pretty much. And then IDs also took it over because it was convenient. The cursors, the wind surfs.
[78:46]
Mario Zeckner
Yeah, but I think the origin was basically the consumer side, not the developer side. And I think that's a totally great use case. I don't want my mom to having mess around with code generation or whatever to invoke some API or call some API and so on. So perfectly fine use case. And then developer side also picked it up and thought oh this is a great way to provide tools to my LLM tools as in in the system prompt somewhere. That is if you want to call this tool, provide this JSON payload and you get this thing back. Right. And that kind of felt right at the time because if you read Anthropic's documentation they would say our models can deal with about 30 to 40 tools in the context. And even that wasn't the case. Like at 12:20 they would just break down but doesn't matter. But there was still like a yeah, this can work if you kind of keep it small and contained and very specific to your use case. And then people started building MCP servers that would just basically map an entire open APIs back into a gazillion tools. Yeah. And that's where it all fell apart. So that's the first problem. Very bad. MCP servers from big corporations that thought we need this. Now what's the fastest thing we can build? I just push the open API spec of our APIs through this thing and make it an MCP server. That's garbage. The second problem is that it's inherently non composable. If you want to combine a tool out the MCP tool outputs of two different servers, they need to go through the context. The model itself needs to do the data transformation, the composition of multiple pieces of data fetched through and then compare
[80:24]
Interviewer
to this with a cli. It's a pipe, right?
[80:26]
Mario Zeckner
Exactly. The model only sees the end result and it is super free in how it massages that data. And that's also the idea behind code mode. Basically it's a hack. It's basically okay, we now have mcp. We know it doesn't work for this specific use case. We have multiple sources of data and you want to combine them, but don't kind of pull that through the context. So let's build code mode. And code mode is basically we take all the MZP servers, we expose that as functions in TypeScript and then the model can actually just write some code that calls the MCP servers and then does the composition in the code. It's like how many interactions do we want here? We can just let the model write the code. We don't need the MCP server. And then the third part is David from Sentry is a big proponent of MCP because it's off the off thing. And honestly that's again for me super valid. But the model itself kind of doesn't make sense anymore.
[81:18]
Armin Ronacher
I think that there's a world for MCP2 which is ironically maybe based more on. There's a company called Stainless which basically generates SDKs out of OpenAI specs. And I'm really warming up to the idea of maybe it is an MCP is entirely based on OAuth plus
[81:39]
Mario Zeckner
libraries
[81:41]
Armin Ronacher
libraries or directly HTTP request against OAuth specs because if you compose it together there and I think one of the things that's also kind of underappreciated and this sort of sec, particularly if you see PY do its stuff because it's kind of transparent of the tool calls that it does. It's kind of magical at times how creative agents get at large outputs. Like for instance PY when it runs a program in Bash and it produces too many lines of code, it actually only reads I don't know what the cutoff is, but it reads the first couple and it's like, oh, if you want the rest of the file, it's 20 megabytes large and it's in this file. And then the agent is like oh, 20 megabytes, that's too much. I'm going to grab on the file, right? And they get really ingenious in how they're interacting with it. And MCP takes that away. The question is like, how would you define MCP in a way where it wouldn't take that away, where it still has all of that magic and, and capability? And I don't really know the answer because I think it's hard but off needs solving and composability needs solving and I think there's a bright future of that kind of stuff. And also like what Mario said, if coding agents wouldn't have become so popular, then the idea of code generation code running for non code related problems probably wouldn't have taken off quite as much too. But the most capable personal agents, OpenCloud being a good example of it, they're just coding agents hidden from you. And then that just naturally some random person who is not a programmer is going to say how am I going to do this? And the model doesn't say install this mcp. The model says, okay, I can write a Python script that does it. And so you naturally have this in the sort of the crazy space you have the adoption of more code execution and the compliant enterprise space, you don't have that. There's a different path.
[83:33]
Mario Zeckner
And I personally don't think that models are going anywhere else other than code generation going forward for any kind of agentic task. I think that's mostly a function of there being a lot of training data for code generation and code generation being a very easy means to control computers. So I don't see a different paradigm there coming out of the model labs anytime soon. So I think taking that as the assumption where the future is going, we just need to figure out how to make code generation kind of work within an enterprise setting with AUTH and all of the other enterprise things that entails.
[84:09]
Interviewer
So let's do a fun trying to predict a year out, which is hard, but in 2027, knowing some of these basics, just again from first principles, where do you think these coding agents might be and the software engineering workplace flow might be? Basically this is just like again speculation. We know we cannot predict the future, but where do you think there'll be a lot of focus in the coming year? And we might in an optimistic case See some results in tools and how we work and what's working, what's not working.
[84:38]
Mario Zeckner
I have no idea. I honestly have no idea. I could make up something that's probably not going to happen. I think the self malleability thing is obviously something I believe in. I think we will see more of that and seeing.
[84:51]
Interviewer
So like self mutable software.
[84:53]
Mario Zeckner
Yeah. Including the tools themselves with which we built the software. And I think that will expand not only to the tech sector but also to non tech applications of agentic tools.
[85:08]
Armin Ronacher
Is it dog years which is times seven? Is that how it works? So that's basically the model I have right now of how this stuff works. It's like when you ask me what's going to be in in a year is like seven years. Right. And to me that makes it incredibly hard to have any sort of predictions about the future because like it's still not one year. Maybe now it's a one year from like people starting to using cloud code. But it feels like it is much, much longer, much more, more time behind and more time has passed and, and I think like right now the closest that I can imagine is going to be like we, we, we know that code execution and code generation and like this harness thing around it, this is going to be it because reinforcement learning gets more of that data. And my strong hypothesis is that as more and more people are starting to wake up to this, you can do interesting things with agents. There will be a societal recognition also of how much more dependent you are on basically two companies. And I think we'll have a conversation about that part. We should have a conversation about that part particularly as Europeans because we don't really have these labs over here. And so I hope we have that conversation. But my best guess is that we'll wake up to the fact that we are now, I mean engineering teams are already now telling me that they have code bases that they think they couldn't maintain anymore without the machine. My guess is that one of those companies will be public and all of a sudden and it will be expensive. And I think that might actually dominate or at least become a conversation that's much bigger than the question of are you using PI or using Claude code or something like this.
[86:47]
Mario Zeckner
I also see we've seen this with was it Mythos, the new Claude model. Oh no, Spot the new GPT model. They will only give this to select partners. So now we are seeing a split in who can get the best intelligence.
[87:04]
Interviewer
Yep. Or the perceived best intelligence. It'll be interesting dynamics. So Both of you are working on AI, on popular AI tools. You're building a startup that of course you're using AI and it's also around agents. How do you both keep up to date?
[87:20]
Mario Zeckner
I've just seen things and it's not as easy to get me on a hype train as it used to be, but that comes with age. It's definitely easier not being in San Francisco because I think that just drive me crazy. I hear so many things from my peers over there and that's just like, yeah, I'm not going to go to San Francisco. Thank you.
[87:38]
Host
So having a peaceful environment around you
[87:39]
Interviewer
where it's not all about tech might be helpful.
[87:41]
Mario Zeckner
It helps having a kid. It helps just going outside, climbing trees, going ice skating, and then looking back at what you did just half an hour ago and be like, why would I do that? That's just stupid.
[87:55]
Armin Ronacher
I mean, to the detriment of maybe people that are trying to stay in contact with me. I. I got very good at not muting notifications, not reading emails. And that has in part become necessary, I think, over the last year or so. But it actually turns out that passage of time sometimes clarifies stuff a lot because if it's really necessary, it's going to reach you again. I have an unhealthy Twitter addiction, which I'm not particularly proud of, but in terms of source of interesting things, that is still a thing. But I try to now sort of consume it in a form of if it's really, really important, it will stay in the discourse for quite a while and I just wait it out. And if it's, if it's there until like three weeks after it originally happened, then probably something to it. And I don't need this three week head start necessarily. But it is, honestly, it's really hard. It is really hard to deal with this because there's a genuine excitement in it. And I feel like more than 20 years of experience in that space of software engineering, it tells me a lot of stuff. But at the same time it hits you in certain ways where you felt like there will be grounding and there will be something to build on and a strong foundation. And now it feels like, well, seemingly everybody else doesn't care about that foundation anymore, so maybe you don't need the foundation. And for quite a while it works. And that is sort of weird.
[89:28]
Mario Zeckner
I kind of feel like since we've been fun, employed in 2025 when all this started, that we had a head start. I see the excitement the two of Us and Peter had in April last year has waned. Nobody else? No, no. But nobody else at the time has kind of shared that excitement that much. And then the Christmas break came and now everybody else has that excitement that we had in April. Right. So now they are learning groups now they are catnipping themselves to immeasurable amounts of lost sleep and at terrible code bases. And I think it will self correct because it's not sustainable.
[90:05]
Interviewer
Yeah, we did see this as well. I did a deep dive with the Pragmatic Engineer early March when a lot of people who were very excited in January about all and they started to use the new models what they can do. They went all in at work or on side projects in about two months time. A lot of them were like hang on, it introduced all this complexity, it has these things, I'm not going as fast as I thought I would be, et cetera. So I guess there's just a natural thing where you have a time, anything new. Right. A job, anything. You have a honeymoon period where you've got the blinders on which you should by the way and then you start to realize and maybe overcorrect. But there's a natural thing where in general it just takes time to see the outcome of your decisions.
[90:50]
Mario Zeckner
Yeah. So I'm not worried about all the dog factory and all the software is dead and sass is dead and all that. I generally believe this is just part of the hype machine and that will self correct.
[91:00]
Interviewer
Yeah. As closing what's a book that you would recommend and why?
[91:05]
Mario Zeckner
Code by Pet Salt Classic. I just love it. It's just such a great read. It's also for non techies and it's the first thing I recommend. If anybody asks me what's your job? I'm pointing at that. And it's like it has much less to do with computers than you think.
[91:20]
Armin Ronacher
And I read recently Breakneck which I unfortunately forgot the author of it sort of goes a little bit into an exploration of how China works and or maybe Europe and the US are different and I found it at least thought provoking.
[91:39]
Interviewer
Well Mario and Armin, thanks a lot for this conversation. It was great to have it in person.
[91:42]
Mario Zeckner
Thanks for having us.
[91:44]
Armin Ronacher
Thank you.
[91:45]
Host
This was a really fun conversation. Thanks to Mario and Armin. The idea of self modifiable software really grew on me. Mario said how PI doesn't have MCP support, plan mode and many other features that devs would want from it, but you can build it into its own code. So far it's working. PI is popular because it modifies itself. I wonder if and when this concept of self modifying software thanks to AI will spread outside of just this dev tool. I also liked how we talked about the observation that agents don't feel pain, but humans do. When a code base gets too complex, the human engineer feels the issues this creates and this tech depth is what pushes refactors and rewrites. But agents simply do not do this, they just keep adding to the complexity. And in a code base where devs regularly feel the pain of the code base and do something about it, the quality will probably be also better. And finally, the MCP versus CLI discussion. This was a good one. MCP is more about offering tools for AI through context and CLIs allow piping one tool after the other. Both Mario and Armin are more of the fans of the cli, but in all fairness, MCP has its use cases, for example inside larger companies. The right tool for the right job do check out the show notes below for related dynamic engineering deep dives that go even deeper into related topics. If you've enjoyed the podcast, please do subscribe on your favorite podcast platform and on YouTube. A special thank you. If you also leave a rating for the show. Thanks and see you in the next one.