Loading summary
A
Hey, everyone. Welcome to the Latent Space Podcast. This is Alessio, founder of Kernel Labs. And today there's no swix. He's in Europe with AI Engineer. But I'm joined by Quinn and Thorsten from sourcegraph. Welcome.
B
Thanks. Great to be here.
C
Great to be here.
A
So we already had Origin of sourcegraph with Bejong and Steve, so we'll put the link in there. And this was when you launched Kodi, and today, I guess, Cody. Cody is a brand that has passed and now you have amp. Let's maybe start there. Obviously, Quinn, you're CEO of sourcegraph Thursday. What's your role? I guess title. How do you describe what you do? CEO is much easier.
C
I'm not going to name my internal.
B
Title, but I'm the dictator of amp.
C
Yeah, that's internal title, but I'm the lead engineer and one of the creators of amp. Yeah.
A
So were you part of the thumbs up, thumbs down on Cody Brand? Like, how did you get to amp? Let's tell that story.
C
I mean, I'll start. You can jump in. But basically I came back to software February, and then this was when Claude3537 happened, too. And then Quinn and I started hacking on. You know, what if we just take Claude 37 and what if we give it just tools and let it go nuts? You know, like, no constraints, no. A lot of the other stuff that we had in Kodi, which works for Kodi. Let's just start trying this out. And we started a new project and we were, you know, I remember first weekend SF where I would stand up in the middle of the room, like when you got to. You got to see this, like, this is crazy. And then he was like, okay, let me try this. And then we went off from there. And then we realized relatively quickly that it's a different kind of product where Kodi was very much first of its kind with rag and assistant panels, assistant sidebar, but with, you know, a tool calling agent, where I define an agent as a model, a system prompt, and tools and tool prompts that go along with this that you give a lot of permissions for. So it can actually, you know, see the file system interact with the file system or your editor. It's a different thing. And we realized we gotta handle this differently. We gotta reset expectations. We gotta tell users that it's a different thing and they gotta use it differently in some sense. And also that we cannot make it work with a $20 subscription, which back then was seen as, you know, offensive thing to say. And now they're charging money. Yeah, yeah, exactly. But now, you know, people are paying hundreds of dollars per month, which I've been saying this every day for the last two weeks. That's crazy to me still, like, how far we've come. So, you know, this is just how it started. Like, okay, this is a different thing. We were astonished, surprised, amazed by what these models can do. So we decided, let's reset expectations, let's tell a new story. We can. We have enterprise customers for coti, but they have expectations. We have contracts. These are large contracts, long running contracts. And you can't just say, guys, here's a new mode. It costs whatever, how many much dollars more. It works completely differently. You need to hold it in different way. So in order to avoid this and to avoid being disrupted, you create a new thing that kind of disrupts the business on its own, you know. Yeah, that's. I don't want to add.
A
Yeah.
B
The only thing that matters is building the best coding agent. Nothing else matters. Because if you can build that, that's way bigger than anything else that came before. And to be clear, nobody has built that yet. We are getting better and better. But I think you've seen this treadmill of tools that you use as a dev. First it started with Copilot and then Cody. We were really good at chat rag and then Cursor and Windsurf showed that kind of IDE forks and partially agentic things could get better and better. And then the next generation AMP and Claude code. And now you're already seeing people say, oh well, Codex is better than Claude code. And there's not been any tool that has stuck with devs for more than six or 12 months or something.
C
Six months. Yeah.
B
And we saw that firsthand. We're now on our second iteration and we are able to move so much faster. Given that it has a totally different name, totally different brand, and some people don't even know that Source Graph or the people behind Kodi Made amp, that has been so good. So I do not know how, if you had an AI tool that was relevant nine or 12 months ago, how you can even bring the same brand and same customer contracts along with you and make a good product. It is so liberating to be able to say totally different on the technical.
C
Level Kodi was or is. You know, it's a Source Graph product, so it's kind of. It works with the Source Graph platform. That means you're tied to the release cycle of the sourcegraph platform. And saucekraft is in the cloud. We have, you know, cloud versions of Sauce Graph, but also on prem for some customers. Completely different game. And with amp, we basically said, let's undo this. Let's build something that allows us to ship 15 times a day. And that's what we've been doing over the last six months. Like we're still doing this. And it's a game changer. Not just, you know, anybody who's done this knows this, but internally and externally, you need to reset expectations that this is a new way of how we build software. And having a new project with a new way to do it is, I think, a better way to do it than to try and get the old to move in this new way, because it would take longer.
A
Are there any numbers that you share about developers like AMP usage?
B
Overall, it's growing really fast. It's growing more than 50% month over month, a lot faster in some weeks. And really what we have seen too is there's a huge change in who's using it. So we have teams with like two or three people that are on annual run rates of like hundreds of thousands of dollars. So that's it. We also made a decision to not try to go to every single dev in an enterprise, which we had done with Cody. We pick off the people that want to move as fast as we want to move, that want to stay at the model product frontier like us. So it's all about just being able to move really fast. And I think that the way that agents work today, most of them are used in your editor or CLI interactively. You have one agent at most running with you at all times. That's going to be blown up with AC when they're running 24, seven concurrently in the background, then you can have ten or a hundred times as many. And that's going to dominate inference, that's going to dominate the output you get. So it's really, you know, AMP is growing really fast, but it's more about how do we get to be the first ones with that, like 10 to 100x improvement. And everything is about how can we move fast and learn along the way. It just so happens that we are positive gross margins along the way.
C
I would say that's one of the biggest axioms that we have with AMP is that we don't know where this ride is going. But what we do know is that it's changing every few months and, you know, start of the year, right. Cursor was the king and the biggest, fastest growing startup of all time. Now if you were to ask a lot of developers, what do you think is the dev tool king, I don't think they would name curse as the first one. And then in I think a couple of months later, or maybe a couple months before, somebody, this was from somebody in sales, they said like, I don't know what it was, but they were basically saying blah, blah, blah. Makes cursor look like GitHub Copilot, you know, like, makes it look old and boring and enterprisey. And this is like, just think about this. Like, copilot is not that old. Like it was state of the art, I don't know, maybe two years ago or something. And now the world has changed completely. And we know that this is not over yet, like, that the changes are still coming. So from an engineering and business perspective, this is priority number one. Position yourself in a way that you can react to these changes and position your product and your expectations and your technical code base in a way that lets you react to these things as fast as possible. And then everything else flows from. Everything else we've done is basically based on this. Like, that everything can change at release of another model or something.
A
But how are you doing it internally from a team perspective? Because, you know, obviously you have a lot of customers already on the sourcegraph product. There's kind of like this tension of, you know, going founder mode and kind of burning the bridge on maybe some of the old use cases versus having a smaller team and a dictator for a new product. How does that look like from like a building the company perspective?
B
When you have a really popular, successful product that's highly profitable, that funds a lot of this craziness. And we're able to do this also with the customer trust. So there's a lot of things on amp that we do. Like no consistent pricing, no user model choice, no checking off all the boxes that security and compliance and legal want that, you know, takes nine months. We're able to get away without doing that stuff because we have that customer trust. So, you know that that has been a big thing. It requires you to totally change how you think about an existing business. It's not a way to sell through that same channel to those same users. It's a way to use that trust and that revenue to fund crazy stuff that you got to do. But it's something that we deal with all the time and we've got really smart devs. And yet it is hard for people to throw away everything that they have learned about how to build software development. And so in some Cases. It's been really refreshing to have people that have only ever been at tiny, like one person companies.
C
Yeah.
B
And they come here and they have no preconceived notions about how you do planning or anything like that. And that is, it's great because you can throw all of that out of the window.
C
Yeah, we've had. This was radical in some sense that when we started it was Quinn and I working on main. No code reviews, nothing, and just pushing. And it was like a personal project. And I think we're both experienced engineers, so it would be everybody owns their stuff, you push and if you break CI, you go and fix it, or if the other person is awake, you fix it or something. And it seems like when you move this fast and you ship this often, you have, you know, throughout the day there's like 15 decisions you have to make where you have to flip between the duct tape, personal project mode, move fast. And this is how they do it at Google mode. And it, you know, requires a certain expertise or it requires also to be free from like the thinking of the last 15 years of like always do it like Google, like we always scale up. And the base assumption between like the whole Google thing was always that, oh, we found product market fit, now we have a product, let's scale this up. Right. Every company I ever worked in was based on this assumption that this is the product rug, let's make it proper and engineer it up. But now with these changes, what's ingrained in AMP is the understanding that, well, even if it scales up, we have to be prepared that somebody pulls the rug and a new technology comes out and it kind of shifts everything. So we have to be prepared for this. And again, it all flows from this. So now in our development mode, the team is super small, you know, compared to I guess other companies. But I think with round eight people now on the AM core team and we still don't do formal code reviews, we still push to main, we still ship 15 times every day. We dog food this as much as possible. And it turns out that in a fast moving environment like this, this beats a lot of other things like fast feedback loops and using the product yourself and dogfooding it, using the product to build a product beats a lot of established processes, you know, and we can get away with it because we can dogfood it and. Yeah, how has it been internally received? I think we have the luxury of, you know, making use of the infrastructure that we already have. For example, have a fantastic security team. Right? Security team Comes in, guys, let us take care of the security stuff for amp, you know, so that's just fine. And I'm like, cool, Like, I don't have to worry about this. Then we have infrastructure people, guys, let us take care of how to run this in the cloud. Cool. I don't have to worry about this. I can concentrate on the client or the ux. So this is a nice spot to be in where we can move fast but use platform teams to kind of make sure that it doesn't break or it scales up or whatever, but still have like the, you know, the tip of the iceberg can melt and be rebuilt basically, while the thing beneath the waterline is stable, you know?
A
Yep.
C
Not the greatest analogy, but I think it's. It's a. There's a distinction between, you know, like platform stuff that does work, but on the UX or product application layer, you want to be able to kind of tear the thing down and rebuild it as fast as possible. And I think that's what we're doing.
A
One thing is you get a separate team and then the other thing is, how do you put that team to work? Right. Like, if you look at like the coding agent space, I mean, obviously you started with Cody and I think there was maybe a thesis behind it. And then you had the rise of clock code and you had Codec cli, which is trying to catch up. I would say they're maybe a little behind on the UX and all of that, but they obviously have, you know, billions of dollars to train a custom model. So that kind of weighs a lot of the option. How did you decide about the structure? So you have both plugin for ides. So I use M code and cursor, but I can also go in the CLI and use M code. Was that an easy choice? Was there a lot of discussion on we should just do one of the modes? Supporting both is obviously more work. Right. And a lot of these products don't support both. So what was that initial design choice of the structure of the product? And then we'll dive into the models as well.
C
So we started with the VS code extension because it was the easiest thing to get off the ground. Like when you have a VS code extension, you have a marketplace. You can ship this, you can update it 15 times every day. You don't have to think about updating stuff. You also are next to the editor. And looking back, you know, it's been six months, the editor might be dying or you might do a lot of coding outside the editor. Back then, it sounded much more radical than it does sound right now. So we started with like, let's explore this. And having the thing next to your editor is a good place to start. And you can see the cursor, you can do selection, whatnot. But we were really, like, from the start, we didn't want to have like a deeply integrated thing. It was always like, ah, let's keep the feature small. We got to be able to move fast. And then we build up the CLI on the side as like a different client, which also gives us the ability to abstract, like the core and the client stuff. So that's a nice boundary to have. But then, to be 100% honest, we were also surprised by how many people were fine with using a CLI for cloud code. For example. Like, if you had asked me half a year ago, I would have said, no way, like a CLI tool. And what we realized is, well, a CLI is not just, you know, it's a ui, sure, but also it's a CLI program. That means you can run it on ssh, you can run it in any other editor, you can run it in multiple split panes, you can run it in multiple tabs. If you want to do this in VS code, you have to rebuild a lot of stuff. And you have to rebuild the way you switch between conversation, you have to rebuild. I mean, SSH works out of the box in VS code, sure, but still, like, you're tied to this. And we had an experiment, an internal one, about a desktop application. So like a standalone application. And turns out, yes, that's great to have multiple agents, but you also have to reinvent everything that right now Terminal gives you for free, right? If I use Ghost or ITERM or Vesturm or whatever, I can command N, command T. I get tabs, you know, splits different environments per tab. You can CD into directories, you can set nvars. You get this for free, right? And if you do it in a desktop application, then you run into the issue of, you know, what people see with like, a lot of the Async agents. Oh, you run and run the task, set the end for us, which directory you have to be in, what's, you know, what's in the path, whatnot. You have to do this beforehand, and in the Terminal you get it for free. So that's kind of the short version of it that we started with VS code because it was easy and it gave a lot of feedback. We could concentrate on the stuff that matters and not worry about Stuff like distribution, which VS code takes care of. And then with the emergence of clis, we noticed that it's a big, big improvement or there's other advantages to it. So now then we rebuilt the CLI twice and now we have like a really nice twee with our own framework. And one interesting thing is our VS code extension has a lot of advantages over the cli. For example, it's easy to display diagrams, it's easy to display images, it's easy to render a bunch of stuff. Like we can do command, return to submit messages, you know, all of that stuff. And turns out we had like an internal poll last week at our company meetup where Beyang was asking, who of you uses the CLI and who of you uses VS code? And was a 50, 50 split. And it's very strange that it comes out like this and there's not a clear winner. And both have advantages and disadvantages. And so right now we have both.
A
But do you cut that data based on the level of the engineer or maybe the specialty, you know, maybe front end versus back end? And how do you segment that? Or do you just take it?
C
I mean, we haven't really segmented it. If I had to, you know, guesstimate here. There's also a generational divide where I would say the younger people, you know, younger than 25, they're terminal. Seems old to them. And they are much more inclined to use the stuff in the editor. But yeah, we don't have any fancy segmentation, I think. Not to sound too dramatic, but like, one of the other guiding principles that we've had from the start with AMP was whenever somebody is like, what's the data on this? Or do we have like analytics on this? It's like, well, did you look for it yourself? Like, did you try it out? Did you talk to customers? Like, we constantly talk to customers. That beats a lot of other stuff. So, yeah, we don't have any segment analysis of who uses what and where and how.
B
I use both. And this idea that everything is changing, it applies to this. We looked at this, we saw the way that things were going and how much more flexible a CLI was. And we, about three weeks ago, we said, we think probably it's painful, but we will kill the VS code extension for amp. And we said that. I lay that out and I didn't like it, but it seemed like that's how things were going. And then you think about Async agents, which probably need to be on your phone and on the web, or Maybe you use WhatsApp to interact with them. That's a whole other mode of interaction. Well, and if it's on the web, that's like the VS code ui, not the terminal ui. And then there's this other thing that we're planning on doing that I can't share more about. But that also makes me think, well, actually we really need to keep the VS code UI in. And so this thing that seems so obvious, actually there's two other completely different things out of left field that totally overturned it.
C
Yeah.
B
So we're keeping it and it's definitely adding some more complexity. But there's a lot of things we can do to reduce that and simplify it.
C
But there's always a hand hovering over the button to can we get rid of this? Like, can we shed weight? Like, can we get rid of. Can we reduce complexity? So we're again in the spot of if a new model comes out, we can react quickly. And, you know, sure, it's good engineering and there's not a lot of duplication, but still, updating one client is still faster than updating two clients. So there's this constant tension between what's the most minimal product that we can have and, you know, just to pick some other examples, there's a lot of niceties you can do in VS code where, for example, you have recent. Not recent, but a common example. You know how in VS code you can hover over diagnostic and then you can say, you know, fix this or whatever. And then people would ask, can you add like a Let amp fix this button? And it's like, you can also amp knows about your selection, knows about the diagnostics. It can see all of this. So you can just ask like, fix this for me. And if you type three words, it will usually do it. So that's something where it's like, well, you can already do it. It's a nicety. But let's remove this surface area. Let's remove this other thing that we have to backport or keep working or whatnot. And tiny example, but there's, you know, 500 of these, what we say.
A
But how do you think of that when the IDE is already AI ide. So I use cursor, right? Yeah, there's already like fix and chat that pops up and they want obviously that button to go to their chat versus like you guys are on the left side and it's like, yeah, just do this here. Do you feel that in a way the ID VS code extension is more like for the people not using this like AI first tools and using the features like most people. You know, I'm sure GitHub is like eventually going to have something good to put in VS code. How much do you think about VS code extension just being maybe a stepping stone to the thing you cannot talk about, that you don't talk about. And then the bifurcation of the TOI versus like the fully async. You're not looking at anything.
B
We're not trying to maximize our revenue, our user adoption. Literally today with the state of today's models and today's tools, because everything's changing so fast. So yeah, we're not trying to fight Cursor for who's going to win the rights to have users fix with our AI or their AI. Frankly, it doesn't really matter to us. I don't think that that interaction is a really important way that people are gonna be interacting with AI in six months or 12 months. I don't think we learn anything from that. And we just said we're not gonna do it. And users, some have definitely asked for that. And the other thing is we have to figure out what do users actually want? And they say they want a lot of things. And in the case of customers, a lot of times they'll say they want a lot of things. They'll say that they want bring your own key. They'll say that they want model choice. They'll say that they want a subscription for a hundred dollars a month or you know, pricing to lock users out if they spend more than $30 in a day. But actually what we've seen is they want the very best coding agent. Not everyone.
A
Not everyone.
B
But we're focused on the ones that want the very best coding agent. And when we tell them how that thing will slow us down, then that starts this conversation where they'd rather not have something they might use 2% of the time if that means that the tool is worse.
A
Right?
B
And we alone among the entire industry, it feels like we are being really honest and really bold with that. And I am really concerned just for the rate of progress overall, that a lot of these other tools that are great, like Claude Code and Codex and Cursor and so on, that they've forgotten what made them great and what made them grow so fast, which is building the very best product. And they built it in a way that's too overfit on the current capabilities. And so they're just going to peak and then it's going to be a slow fall. And zero of the software Business model works. If that happens, you need to have growth into the future. So I think it's best for our business, but also I think that we're trying to push the whole industry to just be radical about the changes that are coming. Yeah.
A
When you said the best coding agent, I'm always like, is there a market for, like, the mid coding agent? I think the model choice is a great example of, like, why would you want a model choice? I think pricing, I guess, is, like, the only thing that people bring up. But I think to your point, it's like, you already pay engineers a lot of money. The cost of Sonnet 4 versus Sonnet 3.5, it's kind of like minimal compared to 150, 200, 300k once you do taxes and benefits and all of that that you pay to employees. So, yeah, I think we're in this part of the market almost where people are not maxing these things.
B
Yeah, there's absolutely a market today. Literally today. Someone will pay a monthly fee for that cheaper AI product today, but they're not gonna be paying that in six months. It's gonna be a different product or they're gonna be paying for something else. And if you have that much churn as a product, you simply cannot build software in that way. But a lot of people get tempted by that, and they hear a lot of users ask for it.
C
But six months ago, it was still the game of, oh, new model got released, and then everybody would tweet out, it's already available in their editor or whatever it is, their extension. Right. And I think that's kind of over. Like, it's just people realize that, well, the benchmarks are one thing. Right. Oh, this is the best model. Turns out it's not in this editor, but it feels different than this editor. So the whole, like, you know, the models are the thing. I don't want to say that's over, but it's becoming less important. And people are not now also waking up to the fact that it's not just the model. It's a system prompt. It's the tools, it's the harness, the scaffolding around the model. So I can give you the choice to use Gemini 2.5 in AMP, but without the system prompt being tuned to it, without what I called before, like, going with the grain of the model. The models are trained in different ways, so you want to optimize the tool and all around it for this specific model without that happening. Doesn't make a lot of sense. You get the wrong signal. I can drop you in a new mod right now and have it available in 10 minutes. But that's not what you're after, right? You want the best possible version of this model in this tool and that's I think become more important, less like the model selectors and whatnot.
A
Why do you mention the models at all? So you have sonnet four for the agent, you have O3 for the Oracle.
C
We don't, we don't show them in the product. We don't mention them all at all. We put it in the manual. We have like an owner's manual because people kept asking us, well, but even.
A
Then it's like, why does it matter that they ask? Because you might not now. It's like if you want to change it tomorrow, then it's like you gotta tell people you change the model. And it's like, where do you think we are on the slope of like, hey look, you guys should forget at all about what model is even running, what the difference is.
C
So I think we're going towards a future where the model will become an implementation detail to some sense and we will end up on a different abstraction layer. And for example, you ask like, when would I use a mid model? Right? When you put it like this it sounds obvious like who wants to use the shitty version of the beta version. But you know, we're thinking actively about this. There's models who might not be as smart as Sonnet 4 as the main agentic driver, but it might be 10 times as fast. And that doesn't mean that you think, well now I need to go faster to use this. But I think there's different modes of working in your day to day work with this model in a different harness or in a different configuration can then be another way to do or get things done versus talking to an agent in a back and forth. So in that sense we've seen this with planning modes or people use different models, but it's still pretty clear that it's a different model and whatnot. But I do think it will be pushed more and more in the background and that people will choose or have different ways to interact with models and the specific model or its version will not be as visible anymore.
A
Yeah, and I know Cody was using Starcoder for inline edits, at least that's what Bianga said publicly, so I'm not leaking anything. Does this still seem interesting to you to figure out, hey, is there something in open source that we can use and maybe fine tune to make better? Or are you still we just want to be at the cutting edge and that's maybe in the back burner.
B
So first it took people eight or nine months to figure out what 35 Sonnet was capable of from when it was released last June. And this was around the time we were building AMP and Claude code came out and you realized that, wow, like a tool calling agent is incredible. And at that moment everyone, all the smartest people in the world also realized that billions of dollars of money went into training new models and harnesses based on that. And now it's September 2025 and we're reaping the benefits of all that investment. And you have so many more models coming out. You have the open source models like Quentin 3 coder and Kimmy K2 and they're moving so fast. You have Xai's models, you have five that came out and we're still figuring out how to use these things. But it would actually be an incredibly pessimistic outcome if all those smart people and all that money were not able to build anything that was better than Sonnet. So we in our internal team right now and this could change. We have about half of our internal team using a different model other than Sonnet as their main way of using amp and that's a huge change. In the past we had done that only to test and begrudgingly, but now we're using it and there's a different way of interacting with an agent that's not the linear chat transcript that actually means you don't feel like you're getting a cheaper mid model. You feel like this is a different way of interacting where that speed is really beneficial and it's more constrained. So things are changing so fast.
A
Is the GPT5 codecs only being available in Codex make you nervous about future availability of cutting edge models and does that put more emphasis and figuring out maybe an open source strategy.
B
They make it available to API customers, it's delayed. And if they were doing that, I really think that for the most part I take these model houses at their word and they wanted to get it out to their first party product as quickly as possible because they honestly need to gather more data and they're iterating in public. So yeah, I would love it if all the model houses perfectly coordinated with us before they released anything. But I know that would slow them down and I don't want to slow them down like that. In the same way that we want our customers to give us grace and help us iterate in public.
A
Yeah, I think there's an interesting just dynamic in the market. Like when cursor switch From Sonnet to GPT5 it's like the default model that was like 200 million of revenue for Entropic that kind of went away and moved on to GPT5. So there's kind of like okay, we're all friends now, you know, but maybe later that's going to change. But yeah, it's an interesting.
C
The other thing also is that you know if you're building an agent and you're not at one of the model houses, you can use multiple models from different providers. Right. So which is what we do. Like we when you use amp, you're using a model from Anthropic, you're using a model from OpenAI and you're using a model from Google and we also very close to shipping like a fast open source model that we can use as a different sub agent in there too. And when you put it like this, it seems silly to say we only use one model of this family because they all have different strengths and weaknesses.
B
I think we are one or two months away from a possible news cycle that is the foundation model. Companies have spent billions of dollars in capex and hired like crazy and now you know, they're no longer the best in this realm and there's a huge stampede away from them. That's very possible. And I'm not saying anything new. Just imagine last May when people were counting Anthropic out before Sonnet came out. Things change so fast here.
A
Yeah, yeah, yeah. And I think OpenAI obviously with Johnny I've and some of that it's moving more in a consumer fashion as well. So it's been interesting to see the big push on Codex. I would have imagined them to go more towards education kind of like big and I know they have a lot of big enterprise contracts for like ChatGPT for your enterprise kind of thing. So yeah, you guys I think are in a good, in a good spot because you have both like the Sourcecraft trust like you said but also like amp I see a lot of great stuff on Twitter. You know people are like I just put all my AMP agents running. I come back, it's great. It's like I think it's now on that wave of like okay, this is like one of the best tools out there and like if you're like a serious engineer, you should probably use AMP at least in some capacity and then make your own choice. How difficult is it to think about what goes in your harness versus like what People should build. So you have custom commands. You've done a great job on the tooling where people can put executables as tools instead of having certifying like an MCP server. Yeah. How much of it. You're like, hey, we're just giving you the tools versus how much you want to be opinionated with things. Like, I mean, I think of like compacting conversation as like maybe one of the key commands that people have. And like in clock code, you can give a custom prompt to compact. Like, what's that discussion like?
C
Yeah, the main assumption again, everything is changing. We got to be able to move fast. That means what you want is. I don't use the picture of a harness often. What I use is like a scaffolding. Like you want to build a scaffolding around the mall, a wooden scaffolding, that, that if the model gets better or you have to switch it out, the scaffolding falls away. You know, like the bitter lesson, like embrace that a lot of stuff might fall into the model as soon as the model gets better. Right. Because then it can remember more, whatever. Why invest three months in like a separate apply model when the next generation, you know, 0.7 version or 0.8 or whatever version of this model can now do all of the edits on its own? So that's again, the bigger thing. And, and with that in mind, we really try to restrict a lot of the features that we add around them all. And you can do a lot of stuff, like we could be busy all day adding stuff in our clients and whatnot, making the product more complicated, but we don't want to. So that's the first thing. The other thing is we're living in strange times. We're living in strange times from a product development perspective, where basically I think the old triangle of design, product and engineering, it's kind of changing. It's not a triangle more. I don't know what shape it is and whatever, but it's not, it's not a triangle anymore. And the reason for this is, is because you can't build a roadmap. You can't say this is what we're going to build in the next six months. People don't know yet how these models are, can be used to their full extent. Everybody's figuring this out on the go. That's another thing. The other third thing there is we just talked about as well, having coffee before coming here is that the only UI basically is like a text UI and you can use this in the wrong way. And the example I used earlier was if, you know, you buy Jira, for example, but you use it for your shopping list. Atlassian is happy about this, but that's not what they built the product for, right? But you can use it in the wrong way and still get results. The problem with LLMs and a lot of the models is that you can use it in the wrong way and it looks like you're getting results, you know, like you can use OpenAI ChatGPT to look up serial numbers or, you know, technical specifications for a camera or something and it will tell you this, you know, but it might be wrong or 99% of the time, or 98% or 95% of the time it might work, but in 5% it might not work. So having non deterministic LLMs as the heart of your product is something unprecedented that we have in software, I think. So with that in mind, a lot of the features, what we see, you know, where people build like elaborate workflows, like I have my custom slash commands and they trigger custom sub agents and they in turn trigger custom MCP tool calls on behind which again another model is doing inference again and taking the input and blah, blah, blah. I think a lot of this will and has resulted in hangovers where people realize, oh, like this looks like it's a deterministic workflow. It looks like it does the thing that I wanted to do, but actually I can't use it if it only does it in 98% of the time. So that's something we're really conscious of where I think everybody's experimenting, everybody's sharing their experiences. You know, the threadboard tweets about what to prompt, where and how. But you have to be super strict about not giving users a false sense of what the product can do and how reliable it is, because I think it's dishonest in some way and it doesn't lead to good results. And just as an example, I think over the last three months I would say we're ahead of the curve, like using amp eternally, we're ahead of the mainstream agentic adoption. But say a month or two where we've tried a lot of this stuff and then realized, oh, this wasn't the best use of our time or the tokens. And now you see a lot of other people waking up this famous on Twitter, Armin Ronicker, the Python developer from Austria, he's done a lot of good stuff with cloud code and shared a lot of his learnings. And you could see that the Way he tweeted was super excited. Like a lot of things I can now do this and this and this. And then a month later it's like, oh, maybe, you know, having eight remote control agents that I control with my phone and let them run for 20 hours, maybe that's not as productive as I thought it would be. And yeah, it's something that we're super conscious about.
A
What are those things? What are the failure modes that you heard from customers where it's like, hey, we tried AMP and it just didn't work at doing xyz. Is there a collection of those that you guys use as almost like a North Star as you keep building?
C
But I think one of the things is the whole vibe coding stuff where people just use it and they're like, hey, I spent 10 bucks in tokens and it didn't build me the full app or something. The failure mode of outsourcing, the thinking but not the typing, which I think it should be the opposite. You still have to know engineering, you still have to know how to program, you still have to know your application and its architecture, how it's deployed, and then basically use the agent to do the work that you would have done. But you have to know what the desired outcome is and whatnot. Like that's a common one where people just hands off the wheel agent. You go and write this for me and then turns out a couple hours later, oh, actually nobody understands. It's, it's spaghetti code amp.
B
It's different from the products it competes against. So we've had one head to head loss with AMP where we lost against the, you know, usual players. And the reason why is one of them discounted their other product 100% for two years. The other one discounted at 85% for two years, which is just crazy. And we wouldn't want to do that because are we really going to learn from that? And then how's it going to be used? It's going to be used in a different way. So usually the way that we might lose is there's some other product that would go to 80% of the devs in a company that is like the base layer, sometimes that's copilot or cursor and AMP is more expensive, it's more powerful and they'll give it to that 20% of devs that they trust more. And in a previous world any software company would say, oh no, we need to get 100%. That's, we don't want our competitor getting in there. But actually that Means that we're able to even more focus on being bold and crazy because all those devs can always fall back to a cursor or a copilot. So we actually really like that kind of deal.
C
The other thing there, I think a bunch of questions already touched on this, is that talking about segmentation or market or the ideal user, again, everything is changing. So what we try to do is we try to, you know, build a tool for people who are at the frontier or at least curious about it and want to figure out how to use these agents in the best possible way. And that's based on the assumption that if you build for the mainstream user who not, you know, mainstream sounds like, I don't know, it sounds bad. But what I mean is, what I mean is if you build a product for somebody who does not know what a good prompt looks like, you will fall behind right now because you will spend time and resources building stuff like the prompt enhancer and like blah, blah, blah, blah, blah. But then you will end up building this and you miss the next step change that might happen. So the way we think about it is we build for the people who already get that. A lot of stuff is changing, but we want to leave the door open. If you're open to learning new things and you want to learn how to use AI and agents in your workflow, please come with us. We're happy to have you. But if you're skeptical and you, you think prompt engineering, that's a bullshit term. I don't care about this. We're not right now building a product for you because we would fall behind.
A
Yeah.
B
So prompt enhancer, that's a bullshit feature that doesn't actually work. The theory behind it is nuts because what helps LLMs is not tricks and phrasing your prompt in a certain way. It's fundamentally information that you have in your head that you can bring into the prompt prompt. And if you don't have that in a prompt enhancer, LLM cannot magically conjure that up. It cannot narrow the search space for you custom sub agents. The way that we disqualify that is something we wanted to build at this point is because you look at all of the tokens that you're sending to the model and it's so many more. It's so much more convoluted. We don't think that these models are trained in a way that would support this use case and the output of this going in here. It's so much harder to debug and mcp is another thing. MCP has done a great job in getting products to expose the verbs that agents might want to, you know, interact with, although in most cases they do not actually get the right verbs exposed. But as a user facing technology, it is such a common failure mode where a user will go and add in some MCP servers. Auth is a huge pain. But let's say they get over that hurdle. Then they have, I don't know, 50 tools exposed that often are too low level granularity and it takes a ton of tokens in the model. It makes everything slower and more expensive. They're often misused and it's just not a good experience. So, you know, there's all of these things that we've said no to and other tools are bringing them in and they're saying yes to all these things. I think it feels like they're making progress in the meantime and people retweet and people talk about how they're able to do these amazing things, things. But just the simplest example, that seems so obvious and frankly, it confounds me that more people don't do this. You make it so that my Google Docs and Notion and linear and GitHub issues are all accessible to my agent. The vast, vast majority of developers who use AMP or cloud code or anything else, they don't have all those context sources set up. That seems like such a slam dunk. So we built that, we ripped it out. Before we would move forward with that, we'd have to get an answer even for our own usage, why are we not doing that? And it's frankly still puzzling to us. But we're not going to touch that until we get confident about that.
C
And to come back to the example you mentioned, compact, we have this in the product, but again, the hand is hovering over the rip it out button because I think compact is such a alluring thing where people think, oh, you know, I ran out of context, I hit that button, now I'm back to the start. But you lose signal, you lose data and it's something where are the mods really good enough? Is compacting good enough to really glance over this, that the user doesn't have to worry about it? Or is it something where you would have to somehow make it clear to the user that, hey, look, Your conversation has 50 messages back and forth. If you hit compact, this is all going to become blurry. You know, you're going to compress it and you lose signal. You use fidelity and then you put it in a new context. Window. Are you sure this is the right trade off? And some users are. But again, it's strange times because now we have this thing at the heart of our software, this orb from outer space that can do sometimes whatever it wants. And it's strange to build on top of this and it's strange to educate your users about this, that this is the thing. Imagine the end of the 90s PC era, you had to build Microsoft Word. And then you say, well, at the heart of this new personal computer depends on three whatever. There's a weird op from outer space. And sometimes if you bold, tax, invert, it actually makes it italic, you know, but that's the situation we're in. Like, that's the fact. Like it doesn't always bold the text. I mean it's underlines it. If you reach 150 tokens or 150,000 tokens or something, how do you teach this to the user?
A
Yeah, and you know, we're in the church of context engineering at the Chrome office, and when we had Jeff on the podcast, they talked about the context Rod paper that they did and they mentioned specifically encoding, for example, showing previous failures was like not helpful at all to the agent. And so I think when you're compacting conversation, there's almost like, you know, if you have a long conversation, it usually means something went wrong along the way and you had to like go back and forth on like a bunch of things that didn't work and you're keeping those in. But I've been trying to figure out what's like, what's that going to look like? In my mind, it's almost like if you take the idea of linear, which I use and I give to my agents just to get. Because then I have a canonic prompt for one issue. Because often you have to restart because it's like it just goes too much down the wrong path.
B
A lot of people don't restart. A lot of people just try to keep going.
A
Yes, that's bad, but how in that, in that case, what can you take from that conversation as a learning and put it back in the upstream issue so that then the issue is like either more descriptive or as like more information that is not compacting, but it's almost like how you would do as an engineer. It's like you're doing it in your mind, right? You get an issue and then you start working and then you kind of update your mental model. That doesn't really work for agents, but people are not doing this small increment in the initial issue, I would say.
C
In this case it's still you cannot outsource your thinking. Right? Like in this case, I don't think you can expect right now model to say out of this conversation, this is the most important thing. Let me put this back in the linear thing. I maybe if you phrase it like this and automate it like this, and it's always a perfect conversation, maybe it works. But I think in this case you still have to be mindful of the context. And what we encourage users to do, for example in amp, is to start a lot of small threads and be really, do context engineering and be really strict about what goes into context and what doesn't. And the other thing that I think touches this on is where a lot of CLI tools, for example, have super verbose output and Bazel. Sorry to call this out, I'm not a big bazel, but you could just call Bazel out super verbose output. So then the natural assumption is, oh, let's hide this from the user. You know, like let's. Let's abstract this away and summarize the output or whatever. Or whatever. Just the exit code or something. And then you get into this dangerous territory where what you see is, what you get is not true anymore. And in the context, what you see is like some other thing in the context. And that could lead to issues. But for me, the meta thing here too is everything is changing. That means we're seeing this. CLI tools right now are also adopting to being used by agents, so they're changing the output too. So if you focus on the fact that Bazel will always be robust and build something for this issue, you might be outdated in half a year where somebody is like, no, no, no, we have a basal agent wrapper. And now this is not an issue anymore.
B
One model that I have is if you are relatively on the cutting edge of using agents and there's some PERS persistent problem like this, it feels kind of out of band, like how the model itself will update its memory or will update the linear issue, the model needs to be trained in order to do that better. If it's something like your own coding conventions, that's different. But if it's something fundamental that feels like about out of band from the agent, the model needs to be trained to deal with memory better or to accept the fact that it might have a incorrect view of its own history. If you go back and edit it, it. And we're feeling these pains right now because people have only been using agentic coding Tools for a matter of months. Most people have been using them for like less than three months. And if we're only feeling them now, it takes a little bit of time for a team at a model house to go and do a fine tune of one of their really big models or they've got other big models, the new revisions that are being trained and they can only fit a certain number of experiments like this in, they're probably going to get half of their approaches wrong. So you can only do so much. And that's Thurston's idea of going with the grain of the model.
C
And I mean, you've seen this, I'm sure, where a lot of users are going through this lesson where they let me just add this MCP server that does everything I wanted to do and then two days later it doesn't use it. Like it never calls the tools and it's like, yeah, it wasn't trained to do this. And you can sense they have different philosophies in the model houses. I think anthropic is, from what I can tell, working a lot or training a lot towards using memory, like storing information on ChatGPT obviously has this OpenAI. So if you give it a memory thing, yeah, it might use this. But then you have the issue of, well, if I give it this other custom made MCP that we build internally and our processes don't map to anything that OpenAI and Anthropic have seen or trained for, it won't be used and you won't get good results and super strange, right?
A
Yeah. I wrote this article for the GPT5 release about models self improving for coding. So I basically asked GPT5 what are tools that would be useful to you to be a better software engineer? It's like, well, you know, give a list of like 10 tools and I'm like, okay, implement them. Wrote all the tools and then I asked it to do the same task I'd done before, but with those tools. And then it goes through the whole task and I'm like, which of the tools did you use? And it's like, oh, I didn't use any of them. And I'm like, why did you not? It's like, you know, to be honest, I don't really need the tools, I can just do this task. And I think that's like a good thing metaphor just for the trend of the models, which is like, hey, they're going to use less and less of these custom made tools to fix today's issue. I think the things that we can bet on, and I'm curious to hear your thoughts is they're always going to have some sort of test runtime. I don't think there's going to be a world in which the model is not going to run tests and say I'm sure this is going to work. The other one is there's always going to be some sort of infrastructure as code to then handle the deployment side. So I think whenever there's going to be some runtime issue, they're going to need to understand where they're running, you know. So I think like you can put them in a box having an actual Docker file and whatnot. It's helpful for them to explain what they have access to. What do you think are like other things that you don't expect the model to like have in the model that you want to still expose to it. So we can assume it's going to test, we can assume it's going to have some definition of its environment. Are there other things that come to mind?
B
I think test is a big one and there's many different kinds of tests. So we had subagents in amp, you know, among the first that come out with this conception of subagents, which is a separate context window, more curated set of tools. And I think there's a lot of potential to take a tool like test and right now you invoke it by the Bash tool and you have some complex invocation. Too often it'll run all of your tests, which is noisy and it takes a long time. If you're in your editor and you've got something nice set up, you can hit like a hotkey and then it'll only run the tests that you need at your cursor. So giving the LLM a tool like that seems to have a lot of potential. And then that could even potentially be a smaller model, a fine tuned model for that task. It could be multiple based on what projects or stack you're using and that could eliminate a lot of the confusion. Even with a good agents MD guidance about how to run tests, I still see with amp and I think, you know, we've, we tried to make this really good. It only gets it right maybe 90, 95% of the time. Sometimes it'll run the wrong test thing or it won't escape it correctly. And I think we can eliminate that with a sub agent. So there's so much more potential to go deep in areas like that. And then for every language it's a little bit different. So Handle all those cases.
A
Do you feel like that will just be built by each company on their own or do you think there's like a sane default that you guys are going to build for that that is going to be effective? For most code bases and test structures.
B
This is where scale helps and we have a lot of scale. So increasingly we're able to see in this framework for this standard go unit test package that's easy vtest in JavaScript, that's easy. And once you start getting the more of the long tail, then it might have to just fall back to a really good model. But I think that we could probably make something that's optimized for some of these more popular unit testing frameworks. And it's a combination of deterministic stuff and non deterministic stuff. Because right now in my VS code I can hand Apple T if I'm positioned in a test file inside of one of those test blocks and it's only going to run that one. So, you know, even that is a benefit.
A
And now I'm mostly bottlenecked by your playwright. Yeah, it just takes a long time, man.
B
But the crazy thing is the vast majority of devs who are building web applications with coding API agents do not have playwright. And if they have it, it is set up in such a shitty way where it cannot really log into their app. They don't have any pattern for that. So even something like that. That's another example of a sub agent that's go and try this basic end to end testing flow described in natural language with the running application. And wouldn't it be great if it could also do it in parallel? So you know, there's all these ways that you can improve. That's a great example.
C
And I think touching on this, we have coding agents. They are productive, they add value. We cannot assume that everything around the agent in dev tooling or code bases will stay static. So I think people are already adopting their code base to be better used by agents or they're adopting their tooling to be better used by agents. They're more descriptive help text or whatever it is. So I think, I don't know, we should have a counter. But everything is changing. I don't know, saying this again, but we cannot build right now with oh, this is the tool that's going to stick around and giving that all of the code bases and all of the processes and all of the dev tools will stay the same. We have to assume that this stuff will change too and we have to stay nimble so we have to make like short bets or small bats and try and get us, you know, in small steps forward, but always be reactive to this stuff that, you know, if people again, let's not use basil again. But I think playwright is a good thing where the feedback loop is incredibly important to working with these agents like that the agent can see whether what, what it's doing is actually working. So what we've seen people now do is, well, instead of having the client log and having the browser log and having the database log, let's have one unified log. Because then it's easier for the agent to just look at this log and make sense of it. And then it turns out it doesn't have to be nicely formatted, it can be verbose, you can just have like JSON line outputs and whatnot because the agent can understand it much better than a human can. And I think that's just a little preview of more things that we will see where you're like, wait a second, this is not made for human consumption anymore. How can we optimize this for agentic consumption? And then maybe the game changes.
A
And there's some things that now we get. For example, in my vitest suite I have a knock to record HTTP calls. So whenever, especially for inference, like you can't really mock, we do classification, things like that. You just need to see what happens and then we just save the whole interaction and then the model can actually see what the API returned like in much detail and it can reference it back in the future. So when you add a new feature, it can look at the test and it can see what the API usually returns. And it's like, oh, okay, it's going to have that key and the content and things like that. I think there's more of that to be done. I think there was maybe also a time in which having console logs was really bad. And I think there's maybe not going to be a console log that is only funneling to not the actual console in the browser, but some way for the agent to see all of the details of everything that is happening. What I haven't figured out is how do you instrument that? Because you cannot put a whole bunch of console logs that go somewhere else in the code because then you're also polluting the context window of the model. So you need some other way to do it. But I think yeah, the more you're login, the more the model can kind of like self iterate. But.
B
And you just described like five approaches that seem absolutely worthwhile to go explore to improve how coding agents work.
A
Somebody do it. We can do some of it at Kernel Labs, but we cannot do all of it. So unless somebody help again, like, the.
C
World around us is also changing. Jose Valim, the creator of Elixir and you know, contributor co contributor Rails can't remember the name, but basically they have a new framework tooling out that is Phoenix. Yeah, it's for Phoenix. Right, but it's the name of the. I can't remember, but it's about, well, what if you build a framework for an agent too? What if the agent is integrated into the framework so that you can. If the application fails to run, you can ask the agent that has access to all of the context. And that's going to be more and more. I think like a lot of, you know, developers will build stuff because they're fed up with like copy and pasting stuff around. So we're going to see this in developer tools.
A
Well, I mean, Rails was like one of the first frameworks that I know that in the error page they had a cli that you could use the local context. And I think more of that in next you have the copy to markdown. Whenever you have an exception, you can copy the markdown, put it in there.
C
That's the first sign.
A
Yeah, yeah, but I think there should. And in their docs too you can like copy to markdown, but then it's like you can only copy the markdown the whole page. And it's like, well, you know, maybe I only want to do this section or like I want to do 1, 2, 3. I don't know. I think that's why the mentally fias of the world stainless all these companies that do kind of like API docs and API generation from docs are getting a lot of interest. I think you'll get more of that. But it's hard to get people to move over. I'm sure you see it with some of the sourcegraph customers. It's like, how am I supposed to reinstrument this whole code base that is like 15 years old?
B
It's true. But what we have said is we explicitly our building amp for the people that do want to move. And that's been so liberating.
A
And I think that that's the great thing about what you see in the market today, which is like you have all these companies that are so AI first and just use it and do great and then you go on Hacker News and it's like, I've never got a single good result from AI And I'm like, well, obviously that's not true. And maybe the extreme is definitely true, though, I think to me, that's kind of like the thing is, like, the people that are spending $100,000 a year on AMP with two people, obviously they're getting value. It's not like they love burning money.
B
Yeah.
A
But the people that are negative, to me, that's not always true because it's easy to be negative. And, like, it doesn't cost anything.
B
Right.
A
To put a comment that is bad. And so what's going to be the thing that forces the rest of the market to be whatever, man. Let's just get on amp and make that work.
C
They just have to see this work once or twice. You know, we've been in developer tooling for a long time with Sourcecraft, and it's always been hard for the last, say, 10 years to get a company to adopt a developer tool that does not immediately fit into their code base. Because the code base, that's the standard. Everything else has to adapt to our code base and our processes and whatnot. What we're seeing now with agents is as soon as somebody has seen what it can do, they have such a multiplying effect or they bring so much value that people are willing to adopt the code base for this. Like the first time in how many decades where people are like, maybe our code base is wrong, like, maybe maybe we should change the way we develop code to make more use of this. So I think people have to see this and then the agents will pull them along. Or like the, you know, the value that this brings will pull it along.
A
Yeah. I'm curious. So I was on the board of a company called Launchable, which was founded by Kosuke Kawaguchi, built Jenkins, and the idea behind Launchable was like, well, instead of running all of your tests, we'll use machine learning to figure out what tests are impacted by your PR and just run the small subset of them. And I think what we found, then the company got bought by Copies. But it was like, in a lot of companies who go in there and they're like, oh, well, well, how can we trust it, though? Let's do a poc. And then you do the poc and it's like, it works great for the subset. Well, you know, work for the subset, but, like, is it going to work for, like, the whole test week and then you do a whole process? And I think with coding, it's like, for some companies, it's like they see it work on One task and they're like, it's worth trying on every task. And then there's another subset of companies that are like, well, you know, it works a little bit on the front end, but it doesn't work on like my Java service back there. So I'm not going to use it it at all. I haven't quite figured out what's going to be the market pressure to make those people move along, you know, but it's like you said, it's like for some people it needs to work once. Maybe for some other people it's got to be one task that always fail, one task that I always use. We have built this Kernel Gym product which is like an MCP playground and tester and I have a task which is like add YOLO mode which is, you know, let a user toggle between auto running, which sounds easy, but it's actually quite hard without LLM's work work to stop inference, to approve a tool and then run it again. And every model was failing until GPT5 codecs and codec CLI was the first time I got it in one shot. It made the whole thing. And I wonder if everybody should build some sort of four or five tasks that are like, okay, if you can actually do this end to end, then I'm like, I'm in. But I feel like people are still in denial of that's going to work. They don't want to have the conversation at all.
B
If you look at that early adopter, the, you know, laggards, that chart of technology adoption, there's a reason why the early adopters are the tiny little start of the curve, you know, 3%. And it feels like so many of these arguments are people saying, well, what if we made a product that was for the early adopters but somehow made the laggards also adopted early. Why aren't we going after that big market? It's the vast majority of the area under the curve. And it's like, like because they fundamentally do not want what you are building. And maybe they should, maybe they're going to realize that, but you're not going to make them realize it. Or if you waste your time trying to make them realize it, you're going to be trounced by hopefully people like us that are only focused on the early adopters. It's a total mindset shift. And if you are just focused on building something for early adopters and you literally do not care and you set up your entire business and product to not, not care, not have to care about the people that are laggards. You can do a much better job. And that's what we're experiencing now.
A
Let's talk about the outer loop, because I think that's kind of like the next step, at least for me. It's like, I think the coding agents themselves do great on a task by task basis. But then there's like, you know, PR review, which GitHub is like so slow and so clunky and it's so like order by file versus, like, I think we should get to a world which is like more semantic. It's like, hey, you know, these are really like the 50 lines of code that matter to look at. And everything else is like, it's fine. You can like skim through it. How do you think about that when you want to? Especially when you think about Async agents. You know, there should be an easy way to spin them up, which I think is fairly clear. But then I'm not sure if there's yet an easy way to catch up on what they're doing. You know what I found when I use conductor, like vibecam band, it's like I spin out five, six of them and I'm working on them and I kind of jump between them. And then my wife is like, let's have dinner. And then we have dinner and I go back and I'm like, what the fuck is going on here? Again, it's like, which one is doing what? And it's like, it's hard to, like, just at a high level, see what each of them is working on, where it's getting blocked. Have you guys seen anything that works there? Have you been thinking about building any tools in that space?
C
I agree. Right. I feel this too. I think, you know, with our internal experiments, I think, you know, for example, this idea of, well, I just spawn 10 agents and they work and I control them. I think Stevie is doing this and he has like a whole workflow around this and it seems to work for him. But for me, I guess I'm a one tasker in my mind. Like, I need to. I can do this. Like, I cannot control five agents at the same time. And then when I do it asynchronously, I realize that I need to be really strict about how I review what they've done and that I also don't jump between them. And then it's also, you know, making sure that you don't miss anything. Like, I spun up so many agents and then haven't checked back on them because I forgot that they actually run. So that's something you need to build in the product. But yeah, I don't think it's figured out, you know, like, it's a. There's so much to do still.
B
Yeah, it's wide open. We think of it right now like if you're playing chess, you can play one board at a time. Or the people in New York City, Central park who play against 10 different tables at once. And they go, and they sit down in front of the table, they get oriented, they make a move, and then they go. And that's what we're trying to build. And it turns out even if you've got a coding agent running in your editor in the cli and then it makes a big diff, you've still got to understand it. And it just becomes even more important when you have a lot running in the background. So we want to make it easier to orient yourself with what's the change? And there's a lot of stuff that is not in the realm of coding agents that would help, like having a deploy preview consistently available. So you could just click and click through it. And then we want to make it fast for you to make a move and then, you know, get on with your next thing.
C
Yeah. Or, you know, just ui. So at a glance you can see, I don't know what it is yet, but at a first glance, so you can see what the agent actually did without having to go and read through, like, the emoji summary. Finally we have it, and blah, blah, blah, stuff like this. But to come back to, like, your question of, like, the outer loop, I think, and you know, if Biang was here, he would talk for a long time about this because he's passionate about it. That's. The inner loop has changed a lot in that, you know, write test, review and whatnot. It's that you now review a lot more code. And what effects does this have for me? For example, we don't do any formal code reviews on the AMP team, but it doesn't mean that code isn't reviewed because we use, you know, amp to write 80 to 90% of our code base. But that means everybody should review the code that the agent wrote so it's reviewed by at least one person. Right. And that's not reflected at all in GitHub yet. Like, GitHub is still based on this other mode where you tag somebody. But then it's like, well, I actually went through two agents to produce this code and I reviewed it three times. Do I now tag five other people and right now we're stuck in this mode where people would say yes, but I don't think it's going to hold that much longer.
A
Yeah, the other thing I noticed is like merge conflicts. I used to have very little because it's like I know what I'm working on and if I'm doing multiple tasks, I know how this is going to impact that and I'm going to build towards it versus the agents, especially when you run them in parallel, it's like they just start to change whatever is convenient to them and then it's like across them they're like changing the same thing. And so one thing we've been thinking about building is like, you know, how do you do better cross agent orchestration of like these changes? So I built for like the GPT5 post. It's like task manager, there's like CLI first and basically any agent can append what files they're touching and then they can read what files other agents are touching and see what those diffs are to implement them back. But then I think the question is, well, maybe what they're doing now doesn't end up being the final thing. And now you're wasting all these tokens like reviewing all these changes before review. I think at this point it's like, is Git well designed for this future world that we're going into? I think everything is back on the table. I think maybe five years ago it was like there was a couple of YC companies doing oh, we're like a new version control system. And I'm like, look man, I'm not really interested in listening at this stage. And same way programming languages, it's like when Chris Ladner even started working on Mojo, it's like, okay, because of AI, I understand why you need to build a superset of Python. And I think now with agents it's like maybe clear why TypeScript should win because type checking is very good for the model to do self improvement. What are the other things?
B
I think the interesting Flex here is people assume that coding agents meet the bar of writing the exact same kinds of software to the exact same standard. And that is not necessarily an assumption that end users, consumers will apply if they have software that's much faster, cheaper, much more personalized. If they can conjure it up on their own, then yeah, you're going to tolerate if the loading state of this thing doesn't quite, you know, work correctly. So changing user demands and standards is an interesting thing that you can flex here.
A
Yeah. What do you think about that. You know, we've been thinking about enterprise software moving more towards user generated content which is like hey, you know, expenses are a great example of all these expense tools. Why are there so many companies when the core action that you're doing is take one line of expense and tag it with different things, but then you have to set up all these categories and whatnot versus just generate it for my company and for each team separately because they have different things. And it's like to me that feels like more and more of that will become true and then the real value is what's kind of like the underlying data store or data stores that you're feeding into this. And I know some enterprises are building already built kind of like internal lovables, basically where each employee can kind of create a simple tool and then they connect the tool to internal data stores and they might be the only users of it. There's nobody else that does it. And I'm curious how you guys think about. I know that Bolt New for example now has clock over integration. Like where do you see the line move between like software engineers build software and like obviously AMP is like a great tool for that versus going more upstream which is like any non technical people can also plug into the code and like build things on top of it. That feels in a way very different but also very similar in like the challenges that you need to solve for.
B
I think this idea of non technical is the wrong way to look at it. There are always going to be people that are good at unambiguously specifying what they want out of a computer. And we've had non coders, including one of our board members who built something with AMP that replaced like 250k a year piece of software that he used for a lot of their internal fund tracking. He maybe took one computer science class, he hasn't really coded, but he's a really smart guy and he knows how to unambiguously specify what he wants to his CEOs certainly and now to a computer as well. So if you can get people like that a tool that's really powerful, they don't think of themselves as a non technical person. I think that's, that's just such a bad mindset. So we want to build for the power user and if that person has not been a coder, but they can pick it up really quickly, that's great. Again we're completely focused on the people that know how to and want to get the very best out of this and that want the agent to win that aren't trying to be like, oh, you know, nada. Hey, it didn't do this thing. Tell me when it does.
C
We had this at the start a lot where people, whenever you have like an AI tool, I think there's a natural tendency by engineers to get it in a gotcha moment, you know, like, oh, I asked this and I didn't know this. And it's like, are you trying to get something out of it or are you trying to get it to fail? And you know, it's not, it's not worthwhile to build for somebody who doesn't want to fail.
B
Yeah, actually, if you fast forward how the world is going, you're seeing already over the last few years, companies have really slowed down their growth in engineering headcount. This is a global phenomenon. You're seeing engineers like here on the AMP team and other companies that are using agents really heavily. They're cutting out the middlemen. They're putting the people who are building the product closer to the customer. Because you can go and hear an idea from a customer literally in the meeting. You can kick off an agent to go and build it, and then you have a first draft of it. So overall, the person who's using the coding agent is getting so much closer to the problem. They're also going to share more in the rewards from solving the problem. Because without needing to share the profits with everyone else, there's naturally, you know, more to go to them. So I think if you fast forward this, it's not that the firm or big companies are going to completely go away, but you're going to have people that have an incredible vision in their head and that are so close to the problem and have an incredible incentive to go solve that problem. Equip them with a coding agent. If you're going to build a coding agent that those people want, that is way better and more valuable and you're creating more value, you're allowing more new things to be created in the world than if you were building a coding agent that is for the median developer that makes them 30% better. So that's who we're targeting. And I don't think that that will necessarily look like vibe coding. Vibe coding is this really unproductive thing to discuss because everyone has a different definition of it.
A
It.
B
And too often it's having the agent write code with. With poor feedback loops and poor quality control. And I don't think that that's valuable, but it's giving that person the ability to build something Truly great. Really fast. When they're so incentivized and they will have every desire for it to work well.
A
Yeah. And I know we're getting close to time, but a couple things I want to touch on. So, Thorsten, I was reading through your blog. You left sourcegraph a year and a half ago, then you joined back. Good job, Quinn. Bring him in home.
B
Thank you, Thurston.
A
But when you've wrote a post about leaving, one thing you wrote is that when you first joined in 2019, one thing that Quinn told you is like, hey, Searchgraph is your playground and you have skills and talents and I want you to use those skills to move the company forward. How do you take this idea of the power user getting close to the customer and how people are going to build teams overall? There used to be engineering and product, like you were saying, the triangle that's kind of going away. What are the type of people that you think are going to be most successful? How should people think about structuring teams? It's like, obviously you're doing this with AMP in a way, right? You're building a sub team and sub product within a larger company. Any tips that you have for other founders and executives?
B
Thorsten is incredible and AMP would not exist in any way without him. He has strong internal constitution of how he uses it and what's real and what's not. And it's so easy to get away with the hype, the possibilities, especially when you see other people, a lot of other smart people who are getting carried away by it. Thorson has this incredible ability to stay grounded and that with everything changing so fast, with it being such a hype cycle right now, that's really important. Also, just these first principles, thinking like how we've completely rethought how we build everything in AMP based on how should we actually do it, rather than what has come before. Thorsen is the rare person who's been at bigger companies, who's seen how source graph, how we build enterprise software, and, you know, not the Google way, but in a different way, and has taken the parts of it that work and not the parts that don't. So all of that combined with someone who's an incredible engineer, incredible writer, communicator, that's a really powerful combination. So find those people. And then what I said when he rejoined is he is the dictator. That made him feel really uncomfortable, as you can see. I hope you cut to his face, but that's, that's exactly what you have to do. And had just put so Much trust in people like that. And that also shows everyone else at the company that they can do crazy stuff, that they can go way beyond. They can take it to the extreme, they can make mistakes. And that's still okay because we're not trying to build something that's going to go really big in the current state. Amp is growing incredibly fast. But the most important thing is we're building the coding agent, God, that thing in the future. And that's something that we're all in search of. So none of the mistakes, none of the successes in the month to month timeframe really matter. It's all about getting ourselves in the right trajectory. And you got to do crazy stuff. So equipping Thorsten to do crazy stuff stuff and to take the ideas that he has and make them scale up with all the reach that Sourcegraph has, that's been my goal on the first principle thinking.
A
How do you think about that? And so there's the world of evals and there's the world of vibes, right?
C
Yeah.
A
How do you approach it? How do you look at the product and you're like, okay, this is good, this is bad, this is what we need to improve. Is there something formal that you guys use internally or is it mostly you as the dictator directing two part answer.
C
I think the first part is to also answer the other question a little bit. Is what I've seen become more important or the shift I've seen is that I set the triangle of PM designer, engineer. I think as an engineer or any of the three, you now need to know a lot more about the other parts. As an engineer, you cannot see yourself as the person anymore who types out a spec or turns a product PRD into code. I think you need to be aware of business, you need to be aware of a product. You need to know and have some taste for software. Otherwise I think the value of your work will diminish over time because the pure typing out of code for most of the code, you know, exceptions being a John Carmack and you know, whatever. For most of the code, I think the value will diminish. And we've already seen this like compare GitHub contribution chart today. Its value to save two years ago. Right. And to come back to the second part like vibes and whatnot. I think we don't have any set evals. We don't and this was controversial up until a week ago, I think when I think Boris from or two weeks ago from Anthropic said they don't have evals. For the coding agent too, but we don't and we haven't had them. I've built evals before. I fine tuned models before. I know that they're good. I love evals. I was addicted to it, to LLM as a judge. I wrote about LLM as a judge. But for a coding agent who's supposed to work in many different code bases, who's supposed to work with many different types of prompt, who's supposed to work with many different type of tasks, it's a time investment that we cannot afford with everything changing and having to stay fast. And if you ship 20 times a day, you will get a lot of good feedback. I swear you I could tune my system prompt a little bit now and then I would say by this evening, people on our team would go, why does it call this tool so often? Like, what's going on? What did we ship? And that's incredibly valuable feedback and that's incredibly valuable when people dog food the product and use it all day. And how do I make these calls? I don't know. I think it's experience of like I think about software a lot. I love using software. I, I listen to a lot of business podcasts, I read a lot about business, I listen to a lot of software podcasts, I read a lot about software and then I try to project like, what does the business need? How can we get growth to 10x? How can we get our users to 10x? How can I use my engineering capabilities to serve as a function of the business to reach those goals? How can I organize the team or get the team to help me reach those goals or together reach those goals? And you know, it's hard to explain, but it's like I feel like in this year, truly here at Softgraph, like everything I learned over the last, say 15 years of my career is coming together in the sense that all of the hours spent listening to the acquired podcast to help me as much as, you know, reading hacker news for how many hundred hours and writing code for how many thousand hours? You know, and with code being now this, this tool that you can wield much easy or much fast or much more often, I think it's become much more important to how do you want to wield it and when and for what reason?
A
I think the hard to explain is a great explanation why, you know, you just cannot one shot create these things because there's a lot of implicit preference. Awesome guys. Anything to wrap call to actions. Are you hiring who should reach out to you? Request for startups what should people build that is going to be helpful to you guys?
C
Yeah. I don't know. I don't know. We're always interested in talking to fellow engineers who are interested in agentic programming, figuring new stuff out. We want to hear from them, like, what works and doesn't work. We're always willing to hire people with exceptional talents who are fully in this and realize that programming is changing a lot and I don't know what else.
B
If you want to come on this journey with us and see where coding agents are going to, then come along.
C
Yeah.
B
Use amp, send us your feedback. And we are just so excited. We feel like kids in a candy shop. Just that we get to go build the future of coding. Feels like the final boss.
C
Yeah.
A
Nice. Thank you guys for coming on. This was fun.
C
Thank you.
Date: September 25, 2025
Participants:
This episode explores the journey from Sourcegraph’s earlier product, Cody, to its next-generation coding agent: amp. Alessio, Quinn, and Thorsten dive into the technical, organizational, and philosophical underpinnings of building at the forefront of the rapidly shifting AI agent landscape. They dissect the nature of product iteration in an “everything changes” world, the evolution of developer tooling, the realities of agentic coding, and their bets for the future of developers and enterprises.
[00:50-05:15]
“The only thing that matters is building the best coding agent. Nothing else matters.”
— Quinn (B), [03:14]
[05:15-07:55]
“We’re growing more than 50% month over month...but it’s more about how do we get to be the first ones with that 10 to 100x improvement?”
— Quinn (B), [05:20]
[07:55-12:23]
[12:23-19:59]
“There’s always a hand hovering over the button—can we reduce complexity? So we’re again in the spot where, if a new model comes out, we can react quickly.”
— Thorsten (C), [18:49]
[19:59-23:18]
“We alone among the entire industry, it feels like we are being really honest and really bold with that.”
— Quinn (B), [21:56]
[23:18-27:06]
“I think we’re going towards a future where the model will become an implementation detail to some sense and we will end up on a different abstraction layer.”
— Thorsten (C), [25:31]
[27:06-30:39]
“We are one or two months away from a possible news cycle that is the foundation model companies…now you know, they’re no longer the best in this realm and there’s a huge stampede away from them. That’s very possible.”
— Quinn (B), [30:13]
[31:59-42:00]
“Compact is such an alluring thing where people think, ‘oh, I ran out of context, I hit that button, now I’m back to the start.’ But you lose signal, you lose data… is it good enough to really glance over this, that the user doesn’t have to worry about it?”
— Thorsten (C), [42:00]
[38:28-39:50]
[43:34-48:23]
“For me, the meta thing here too is everything is changing. That means CLI tools right now are also adopting to being used by agents, so they’re changing the output too.”
— Thorsten (C), [44:55]
[48:23-54:11]
“That’s just a little preview... this is not made for human consumption anymore. How can we optimize this for agentic consumption? And then maybe the game changes.”
— Thorsten (C), [53:04]
[57:08-60:47]
“If you waste your time trying to make [laggards] realize it, you’re going to be trounced by … people like us that are only focused on the early adopters. It’s a total mindset shift.”
— Quinn (B), [60:47]
[61:46-65:40]
“If you’re playing chess... people in Central Park who play against 10 different tables at once... they get oriented, make a move, and go. That’s what we’re trying to build.”
— Quinn (B), [63:45]
[72:22-78:54]
“You now need to know a lot more about the other parts... Otherwise, I think the value of your work will diminish over time because the pure typing out of code... will diminish.”
— Thorsten (C), [75:33]
[79:14-end]
“We feel like kids in a candy shop. Just that we get to go build the future of coding. Feels like the final boss.”
— Quinn (B), [79:43]
| Timestamp | Speaker | Quote | |-----------|---------|-------| | 03:14 | B (Quinn) | “The only thing that matters is building the best coding agent. Nothing else matters.” | | 04:24 | C (Thorsten) | “With AMP, we basically said, let’s undo this. Let’s build something that allows us to ship 15 times a day.” | | 18:49 | C (Thorsten) | “There’s always a hand hovering over the button to can we get rid of this? Can we shed weight?...if a new model comes out, we can react quickly.” | | 21:56 | B (Quinn) | “We alone among the entire industry, it feels like we are being really honest and really bold with that.” | | 25:31 | C (Thorsten) | “I think we’re going towards a future where the model will become an implementation detail to some sense and we will end up on a different abstraction layer.” | | 30:13 | B (Quinn) | “We are one or two months away from a possible news cycle that is the foundation model companies...now you know, they’re no longer the best in this realm and there’s a huge stampede away from them. That’s very possible.” | | 39:50 | B (Quinn) | “Prompt enhancer, that’s a bullshit feature that doesn’t actually work.” | | 42:00 | C (Thorsten) | “Compact is such a alluring thing...but you lose signal, you lose data...is compacting good enough...?” | | 53:04 | C (Thorsten) | “That’s just a little preview...this is not made for human consumption anymore. How can we optimize this for agentic consumption? And then maybe the game changes.” | | 60:47 | B (Quinn) | “If you waste your time trying to make [laggards] realize it, you’re going to be trounced by ... people like us that are only focused on the early adopters. It’s a total mindset shift.” | | 75:33 | C (Thorsten) | “As an engineer or any of the three, you now need to know a lot more about the other parts...Otherwise I think the value of your work will diminish over time because the pure typing out of code...will diminish.” | | 79:43 | B (Quinn) | “We feel like kids in a candy shop. Just that we get to go build the future of coding. Feels like the final boss.” |
For more notes and future episodes, visit latent.space.