
Narrator
The rise of language model coding assistants has led to the creation of the vibe coding paradigm. In this mode of software development, AI agents take a plain-language prompt and generate entire applications, which dramatically lowers the barriers to entry and democratizes access to software creation. However, many enterprise environments have large legacy code bases, and these sprawling systems are complex, interdependent, and far less amenable to the greenfield style of vibe coding. Working effectively within them requires deep context awareness, something language models commonly struggle to maintain. Augment Code is an AI coding assistant that focuses on contextual understanding of large code bases and enterprise settings. It emphasizes tooling to manage large development surface areas while automating PRs and code review. Guy Gharari is a co-founder at Augment. He has a PhD in physics and was previously a research scientist at Google, where he worked on AI reasoning in math and science. Guy joins the podcast with Kevin Ball to talk about Augment Code, its focus on full context for large enterprise code bases, code review as the new bottleneck in AI-driven development, and much more. Kevin Ball, or K. Ball, is the Vice President of Engineering at Mento and an independent coach for engineers and engineering leaders. He co-founded and served as CTO for two companies, founded the San Diego JavaScript Meetup, and organizes the AI in Action discussion group through Latent Space. Check out the show notes to follow K. Ball on Twitter or LinkedIn, or visit his website, K Ball LLC.
Kevin Ball
Guy, welcome to the show.
Guy Gharari
Thanks for having me.
Kevin Ball
Yeah, I'm excited to get to talk with you. Let's maybe start with just a little bit of background about you and about how you came to start Augment.
Guy Gharari
Sure. So I was a researcher at Google working on AI, trying to open the black box of language models and other AI models and kind of trying to understand what makes them tick, trying especially to improve their reasoning capabilities. So I was very interested in getting them to solve math and science questions. And then as I was watching these models get better and better, I felt that we were crossing a threshold of language models actually becoming useful. This was before ChatGPT came out. I think once ChatGPT came out it was kind of clear to everyone that these things are actually really useful. But to me, the other big reasoning task out there besides math and science is code. And so my co-founder Igor and I felt that there was a big opportunity here to take these models and productionize them and build AI coding assistants that would help enterprises working on large code bases get the most out of AI models. And that's why we started Augment.
Kevin Ball
Nice. Well, a lot of different things to dig into there. But I actually want to start with the reasoning around math, because one of the things that math is very interesting for is you've got a way to formally validate on the other side. Right? You've got a formal checker. So I'm kind of curious, does that apply as you move into code, or how does that reasoning approach need to differ as you move into something a little fuzzier?
Guy Gharari
Yeah. So it's interesting, in math, formal verification is still a big area of research. Right? The promise is that you can formalize theorems and then automatically verify the output from language models. With math, it's actually a bit complicated because math wasn't constructed to be formally verified. It was constructed by people who were doing math informally, and then as time went on, tried to put more and more rigor into it. And so now we're at a point where there are, like, what I would call corners of math that can be formally verified, but most of math has not been formalized to the extent that you can put it through a theorem prover. So that vision of let's just verify all the math outputs, it still remains a distant vision. But for code, it's different. Code was in some sense built from the ground up to be executed. It's not formally verified, but you can execute it. You can see what happens when you run it, because that's the whole point. And I think that's why we're seeing AI for code really take off right now: it's the first domain where we can really close the loop between the model writing code, then being able to execute the code, getting the feedback from that, and iterating until it gets the code to work. So I think with code, this is why we're realizing this vision of really grounding the model's answers in reality now. And this is why we're seeing agents take off and so on and so forth. So it's a very exciting time to be working on AI for code.
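Editor's note
The write, execute, iterate loop described above can be sketched in a few lines. This is only an illustration under stated assumptions, not how Augment implements it: `fake_model` is a hypothetical stand-in for a real language model call, and `run_candidate` and `closed_loop` are names invented for this sketch.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Callable, Optional


def run_candidate(source: str, timeout: float = 10.0) -> tuple[bool, str]:
    """Run a candidate script in a subprocess; return (succeeded, stderr)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "candidate.py"
        path.write_text(source)
        try:
            proc = subprocess.run(
                [sys.executable, str(path)],
                capture_output=True,
                text=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return False, "timed out"
    return proc.returncode == 0, proc.stderr


def closed_loop(
    task: str,
    generate: Callable[[str, str], str],
    max_rounds: int = 3,
) -> Optional[str]:
    """Ask for code, execute it, and feed errors back until it runs cleanly."""
    feedback = ""
    for _ in range(max_rounds):
        source = generate(task, feedback)
        ok, stderr = run_candidate(source)
        if ok:
            return source
        feedback = stderr  # execution output grounds the next attempt
    return None


def fake_model(task: str, feedback: str) -> str:
    # Hypothetical model: the first attempt has a typo, the retry fixes it
    # once the NameError from the failed run shows up in the feedback.
    if "NameError" in feedback:
        return "total = sum(range(5))\nassert total == 10\n"
    return "assert totl == 10\n"


result = closed_loop("sum the numbers 0..4", fake_model)
```

In this sketch the only feedback signal is the interpreter's error output; type checkers, linters, or test runners could be slotted into `run_candidate` in the same way.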
Kevin Ball
Yeah, that ability to close the loop is one of the things that seems to be key as people have started figuring out AI coding techniques. How much can you automatically validate, whether it's by type checking or tests or what have you? How much can you sort of dynamically validate by giving the agent access to tools to test it? So how much of that are you tackling within Augment itself versus leaving to developers to do?
Guy Gharari
We try to do as much of that at Augment as we can. And so we try to provide the agent with all the feedback that's available to us to make it do as good a job as it can. So everything you mentioned, type checking, linter errors, tests, these are all things that we try to either automatically provide the agent with or kind of nudge it in the right direction so it can go do things itself. So, for example, we don't want to run tests automatically. That's probably a bit of a heavyweight thing for us to do. But we do try to nudge the agent in that direction because we know how useful it is when the agent actually runs tests. And so, yeah, it's a combination of all these things. And I think we're really just scratching the surface. There are things like production logs, metrics, traces, which I expect to also be very useful signals for the agent. These will take more work to get to. But I think at Augment, we're all about the context, right? The context that we provide the agent is the most important thing that determines how good a job it's going to do. So it starts with the things we just talked about. We try to surface context from the code base itself, no matter how large it is, to help it kind of get the lay of the land and what code it can use to do its job and what patterns it should follow. And then I expect that as time goes on, we will accumulate more and more context sources that we can give the agent, including these things that kind of keep it grounded in terms of code execution.
Kevin Ball
Yeah, I think that's one of the things that seems to be key to making these things work is like, how are you managing that context? And without asking you to disclose any secrets to what makes Augment tick, I'm curious, how do you think about those different pieces, like implicit context, where you just surface it, versus giving the user tools for explicit context discovery of what's the right stuff? Are there intermediate documents? How do you think about that context management problem?
Guy Gharari
Right. So one principle we have is that we try to keep the agent as autonomous as possible. So models have gotten very good at being able to figure out which tools to call when. And so we try not to put things automatically in the context window unless the agent asks for them. One reason for that is that on the one hand it's not necessary, and on the other hand, developers use our agent to do such a large variety of things that we don't want to guess at what the best context is. So, you know, we will not put things automatically in the context window from the code base, for example, unless we're really, really sure that this is what the agent wants. So that's part of it. On the other hand, the agent doesn't always know what it doesn't know, right? Maybe there's a source of context that it's not even aware of. Linter errors are an example. Those are things we often can get our hands on automatically, but the agent maybe doesn't know when to go and call a tool to give it linter errors. When we're pretty sure that there is a source of context that's important for the agent, then we will figure out how to slip it in automatically. I think there's a bit of a balance between those things. There's another principle that we follow, which we're calling infinite context. That's where we try to make sure that the user never has to think about the size of the context window of their model. And so it shouldn't matter how large your code base is, we should just be able to surface the parts of it that are important for the agent. And that's where our context engine comes in. You should also not care about how big the context window is in terms of how long your agent session is. We want the user to be able to go on with their agent session basically forever. And we will do the context management behind the scenes to make sure you never run out of tokens. And so that's another part of the experience that's very important to us.
Kevin Ball
Yeah, totally. I think of that sort of situation as context rot, where you have this longer and longer thread and suddenly your agent is not doing what you ask. Maybe it's forgetting some of the things from earlier, or, one I've seen a lot, it'll start asking, do you want me to do more, instead of just proactively doing more. Those all seem to be signs. So, yeah, if you can make that go away, that will feel magical.
Guy Gharari
Exactly. And so some of it we're able to make go away. But of course, those tokens need to live somewhere, right? So you need to either, like, retrieve from them or summarize them or do something else. And so those are the tricks that go into how to make it indeed feel magical. This problem where it stops paying attention to things that were previously in the context, that one I see a lot. We still have not been able to solve that, but we have some ideas on how to improve on that. Because, yes, the longer the context gets, the harder time these models have kind of figuring out what to attend to. That's definitely something we're seeing.
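Editor's note
One of the options mentioned above, summarizing older tokens so a session can keep going, could look roughly like the sketch below. Everything here is an assumption made for illustration: the word-count tokenizer, the `Turn` type, and the `naive_summarize` placeholder are hypothetical, and a real system would use the model's tokenizer and a model-written summary (or retrieval) instead.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Turn:
    role: str
    text: str


def count_tokens(text: str) -> int:
    # Crude assumption: one token per whitespace-separated word.
    return len(text.split())


def compact(
    history: list[Turn],
    budget: int,
    summarize: Callable[[list[Turn]], str],
    keep_recent: int = 2,
) -> list[Turn]:
    """Replace older turns with one summary turn when the history is over budget."""
    total = sum(count_tokens(t.text) for t in history)
    if total <= budget or len(history) <= keep_recent:
        return history
    old, recent = history[:-keep_recent], history[-keep_recent:]
    return [Turn("summary", summarize(old))] + recent


def naive_summarize(turns: list[Turn]) -> str:
    # Placeholder summarizer: keep the first five words of each older turn.
    return " | ".join(" ".join(t.text.split()[:5]) for t in turns)


history = [Turn("user" if i % 2 == 0 else "agent", "word " * 20) for i in range(4)]
compacted = compact(history, budget=50, summarize=naive_summarize)
```

The most recent turns are kept verbatim because they usually carry the user's current intent; only the older turns are compressed. A single pass like this does not guarantee the result fits the budget, so a real implementation would repeat or tighten the summary as needed.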
Kevin Ball
Well, and this gets into a topic area that I'd love to get your sense on, which is effectively using the tools. Because, I mean, I agree we're at a place where LLM-based coding tools are completely transforming the way that we write code, but they're not a drop-in replacement for the tools that we used to have. And I think some of the widely varying results people talk about, you know, Twitter will have somebody ranting about how these things are worthless and somebody else saying, how can you say that, I've 100x'd my productivity, have a lot to do with how we approach them. So how do you think about using AI coding tools in the day-to-day software development process?
Guy Gharari
Yeah. So to me it starts from a pretty simple place, which is prompting. What I find is that with users who don't get a lot of value out of these tools, when we really look into it, it often comes down to prompting. I like to say that we put this prompt box in front of users as if it's this magical thing, and the hype is so prevalent right now that people maybe feel like, well, it doesn't matter what I put in there. This thing is so intelligent that it should figure out exactly what I need. And if it starts going off the rails, then it's just useless. Whereas the reality is, even in the prompt box, context really matters. And it's funny, but you kind of need to put yourself in the model's shoes and try to understand what context the model has. So let's say the model, with Augment, knows all about my code base, but it doesn't know what I'm trying to do right now. Right? It doesn't know my intent. And the more I can tell the model or the agent about my intent and the more I can tell it about how I want it to accomplish the task, the better result I'm going to get. One thing we added to help with this is a little feature. It's called the prompt enhancer. So it's a little sparkles button in the prompt box. You can write your half-baked prompt, because many people, me included, are just too lazy to write very long prompts. Just put your short prompt in there and then click on the prompt enhancer, and it's going to turn your prompt into a mini spec with all the details and things that are needed to get the job done. And then you can look at that spec and see what details it got right and what details it got wrong and fix the ones it got wrong. And that all helps keep it on track. So I'd say it really starts with prompting. That's the most basic thing. And what context you put in the prompt really matters for the results you're going to get.
Kevin Ball
Yeah, well, and I love that feature because my flow right now, anytime I'm doing anything moderately complex with these tools is I will say this is what I'm trying to do. Write me a spec. And then we do exactly that iteration process. And it is a process of like taking that implicit that's inside of my head and helping me turn it into something the model can act on. This kind of raises this question of like, what are they capable of? What is the current state of agentic coding? How complex of a feature can I click my sparkles button, have a little back and forth with, and actually end up with something that's implementable?
Guy Gharari
Yes. So I would separate it into two, right: what can you do with some back and forth versus what can you do in one shot? Because one thing we're seeing is that these agents are now good enough that you can do a lot in one shot as well. Not as complex a feature, but there's a bunch of stuff you can just automate away. So in terms of feature complexity, let's say my PRs typically get to hundreds of lines, or, if I stretch it, low thousands of lines. I write all of them with our agent. The fact that it understands the code base means I don't need to do that much hand-holding. And I can just focus on what the design should be, what the architecture should look like, and if it got it wrong, kind of steer it in the right direction, or do it initially through a spec like you mentioned, which is also something I do more and more these days. So I'd say I usually work on PRs with the agent. And it's been a while since I encountered a PR that I can't just write with the agent, with enough steering. As for doing things completely automatically in one shot, here we're talking about simpler tasks. So maybe there's a class of tickets. This is something I've seen one of our customers do. They use our CLI tool that we've recently launched to automate ticket-to-PR journeys with a prompt that they've developed over time and fine-tuned, so that there's a whole class of tickets they can just turn into PRs. There are other things that we use it for, like code review, like automating code review and putting comments on PRs. These are, again, things that don't require a lot of steering initially. So simpler tasks you can do in one shot, but for complex tasks, to me, it's up to roughly a PR level.
Kevin Ball
Yeah, no, that makes sense. And one of the things you mentioned there is code review. So I'd actually love to get a sense of what are you doing at Augment for code review. That's a big pain point I've seen because as these tools make it easier to write more and more code more and more quickly, I at least have ended up reviewing. I've probably reviewed 100,000 lines of code in the last three weeks. That is painful. So how do you keep up with that? How do these tools help you kind of navigate that?
Guy Gharari
Wow, that's a lot of code. 100,000 lines is a lot of code for a few weeks.
Kevin Ball
I mean, that's what these tools kind of enable. If you let them just rock and roll. Right. If you're doing some greenfields work, they will generate a lot of code. Maybe I'm exaggerating slightly, but certainly 50,000.
Guy Gharari
Yeah, no, that makes total sense. That's what we're seeing as well, is that as agents start writing 80, 90% or more of your code, they write it so quickly that code review becomes the bottleneck. We're seeing that internally, we're seeing that with customers. And so I don't have anything to announce yet, but we are definitely trying to figure out how can we give users the best code review experience. There are a lot of interesting questions around that. So what I can say is, you know, GitHub, which is where most folks do code review, was not designed with agents in mind. Right.
Kevin Ball
The UI just starts to choke when you get these big.
Guy Gharari
Yeah, absolutely, exactly. Also, with agents, you can do more. Like, agents can go and actually change the code. Right? It doesn't have to be just the back and forth with the user. So we're trying to figure out what code review experience we want to provide our users with. With the CLI, we're shipping a GitHub Action that you can install to do code review if you like, and then the bot will comment on your PR. And that's something we're using internally. And so maybe separating it into short term and long term. Short term, I think there's a lot we can do in automating kind of the first pass at a PR, trying to find all the low-hanging fruit: maybe there are bugs, maybe there are glaring inconsistencies, all kinds of things like that. We're also using our agent to write PR descriptions, which is something I found very delightful because, especially with all the context awareness that it has, it does a very good job of describing what the PR is even about to help the human reviewer. So these are all short-term things, and I think we'll have something to announce around code review fairly soon. But I think longer term it's very interesting to think about what the code review experience should really be like. And you know, if you think about it, if there's one agent writing the code and then we're going to say, okay, there's another agent reviewing the code, do we really need these to be two separate agents, or can we maybe take a step forward and merge these things into one? Or maybe not. Maybe it does make sense for them to be adversarial in some sense, and then maybe that's hard to achieve in one agent. So I think these are kind of the next questions that will be interesting to figure out for code review: in the age of agents, does it even make sense to have that kind of separation between writing code and reviewing code? Or should it really be all part of one thing? I'm not sure what the answer is, but those are the kinds of questions that we're trying to ask.
Kevin Ball
Yeah, no, it's a super interesting one and one I've been thinking about too. Because like, if you think about code reviews, they serve kind of almost three purposes. There's like the easy one, checking for bugs. Is this broken? Is this not broken as you highlight? Like we can already do that pretty well with agents. Then there's like, architecturally, is this evolving our code base in the right direction? Is this fitting in with our architecture? Is this taking advantage? I don't actually know how well agents do on that. As you highlight, they often don't know what they don't know. So maybe it doesn't know that there's a pattern over in some other part of the code base that it should be using. And then there's just like even keeping track of what's going on. How do I know what's in my code base? Does it even matter anymore?
Guy Gharari
Yeah, exactly. And I think, yeah, going back to the code base and maintaining it proactively is something, again, it'll be really interesting to see how that pans out. I wonder how developers will react to that, because developers typically have their tasks that they want to focus on. And if you kind of go and tell them proactively... I know at Google I used to get these pull requests or change requests that were like, well, I think the tool is called Rosie or something like that: I ran and I improved your code. Why don't you review the stuff I did for you? So an early version of agents going through your code and trying to do stuff. I never liked that because it always felt like a distraction from the task that I'm supposed to do. These are the kinds of things that will start showing up as we try to automate more and more with agents. I do think the point about architecture, I suspect that's going to be a hard rock for us to figure out, because that's a very common failure mode from what I've seen: agents will get to correct code, but code that has pretty bad design or pretty bad architecture, and if you don't steer it in the right direction, you're going to eventually get yourself into trouble. You're going to get to an unmaintainable state of the code, basically. And it's still unclear to me if other agents can go and find those problems or if the level of intelligence of the models is just not good enough yet to do proper design review or proper architecture review. It's also fascinating because these things are moving so fast, so maybe the models today cannot do it, but maybe three months from now they will be able to do it. It's a bit hard to predict, but I do think this question about architecture and design is kind of key. I think the best we can do right now is probably help humans understand the decisions that were made so that they can make a judgment call and decide if something was done properly or not. So that can definitely be part of an agentic code review: just explaining how the code fits into the grand scheme of things in your code base.
Kevin Ball
Yeah, that makes sense. Well, and I think a lot of this gets to this question of the gap between vibe coding, which is all the rage, right? Let's vibe code this, let's vibe that. Which, I mean, got to admit, it's kind of fun. You have a script or a greenfield project, you don't care about how to maintain it. Like, yeah, vibe it all out, let the LLM make all the decisions. But in professional software engineering, you're maintaining a code base over time, you're evolving these things. So, yeah, some of these questions of maintainability and architecture become much more important. I'm curious, are you all looking at both of those use cases? How do they differ in terms of what you need to build into your tooling?
Guy Gharari
So we are targeting professional software developers and professional software teams. That's who we're building the product for. You can use it for vibe coding if you want. It's totally fine. And if you have a code base that's, like, not trivially sized, you're going to get a lot of value out of it because it will understand your code base better. But everything we're building is with an eye toward professional software developers. So those folks usually deal with large code bases. And so we build context understanding into every part of the product, including, for example, the prompt enhancer. So the prompt enhancer gives you a mini spec that also takes your code base into account. So it's not just based on your prompt, it's based on everything it's seeing. And that is also why we're trying to be very thoughtful about how we do code review, because we know that code review is such an important function in professional software teams. So that's what we're mainly focused on. And I would say also, personally, yes, it's a lot of fun to do zero to one with vibe coding and not even look at the code and just see that it works and so on. But I've seen this repeatedly. Once you get to around 10,000 to 20,000 lines of code with your zero-to-one project, if you haven't looked at anything the agent has done, you're in for a rude awakening, because something's probably messed up in there, from what I've seen. So it's really worth it to review the code as you go along with the agent, even if you're doing zero to one. And of course, if you're not doing zero to one, you kind of have to, because you have an existing code base and you want it to respect the patterns and so on and so forth.
Kevin Ball
Yeah, no, totally. I think one of the things that you all are solving for with the context is the fact that anytime you have a set of existing code, it may not match what's out in the world. Legacy code bases especially often have their own special snowflake pieces. I think that's a place where many developers get hung up, because they say, hey, it doesn't understand my code. It's trying to write something that doesn't match at all. And my intuition is if you're doing the context management right, it will. But I'm kind of curious, how do you navigate legacy code bases? Are there differences in how the product approaches those?
Guy Gharari
So actually, no. We've come up with a solution that works for small code bases, large ones, new, old. It is very steerable. And so one thing that happens with legacy code bases is that sometimes there are parts of the code base that you want it to use, like maybe the new stuff. Maybe I'm working on a new piece of code and I want it to follow the new way of doing things, and then sometimes I want it to follow the old way of doing things. And these can be different parts of the code base. But Augment is incredibly steerable. So even though it has a broad view of the whole code base, you can just tell it. So that's where the intent comes in, right? It can see the whole code base, but it doesn't always know what it is that you want. But you can tell it in a few words which parts to follow and which parts not to follow, without having to go and point it at specific files or specific functions or anything like that. You can keep it at a fairly high level, and as long as there's enough guidance there for it to figure out what you're talking about, you're going to get good results out of it. Another thing I wanted to mention in the context of professional software developers is we try to meet developers where they are. So that means that we don't actually have our own IDE. We've never developed our own IDE. We integrate into VS Code. We also integrate into JetBrains and Vim, and now the CLI if you want to work in the terminal. And so that to us is all part of it. We want to give the agents the best context. We want to let the developer do as little as possible and also not have to change their work environment. And so if you really like using JetBrains, as many of our developers do, you can just use Augment in JetBrains. You don't have to switch tools to do that.
Kevin Ball
That is definitely a win, I think. I'm a longtime Vim person myself, and I briefly had to navigate into VS Code, or a VS Code fork, when Cursor was like the bleeding edge of everything in this domain. And I'm very happy about the trend back towards, like, okay, CLI tools and integrations everywhere. Tapping into something that you mentioned earlier and kind of digging into this: you mentioned something about how, oh, the models today can handle this versus that. How do you think about model selection? This is something I've seen, you know, some tools just kind of expose: here's your palette of models, you do your own thing. Others say, no, we're going to pick for you. So how do you think about what models to use?
Guy Gharari
Yeah, it's kind of constantly evolving. So for the longest time we did not show any model choice to users, and we even went on record saying we never will. And the reason was that there was really one game in town, and it was Claude. That was Sonnet 3.5, and then through 3.7 and 4, it was just by far the best coding model out there. But now things have changed. There are multiple models. I would say GPT-5 is definitely, right now, in my experience, on par with Claude. We're seeing good results. We're still trying to figure out the right reasoning level for GPT-5 and what gives you the best kind of value. And we're making improvements in how we integrated it into the product. But GPT-5 is a very solid coding agent, I would say a frontier coding agent. And there are others that come out. There are a lot of interesting open source models that come out. Grok Code has a lot of people talking. I would say some of those other models are in a different tier, I would call them. They're low-cost models. They're not as good as the frontier models, but they cost a fraction of the price of the frontier models. So, as often happens in this space, almost overnight, everything changed, going from one model that you can use to having, I don't know, four, maybe five models that you could actually conceivably use as agents. We call all these other models viable agents. It's maybe not the best of the best, but you can still use them. Maybe you can even use them daily. And so we now have a model picker. We put GPT-5 in there because we felt that for the frontier models, if the model is good enough, and we've seen that it has a different style, we do want to give users the choice to go between them. Maybe you like GPT-5's answer style better. Right? And so you should be able to use that.
Kevin Ball
They definitely have different styles. Claude will write buckets of code, and GPT-5 will think for 20, 30 seconds and then make a two-line change.
Guy Gharari
Exactly. And so I think one thing to say is that there's no longer one right answer for everyone on what model to use. So that's a primary reason for us to introduce a second model in a model picker. And then besides that, we are looking very closely at the other models, including the low-cost models, and trying to understand how they fit into the product. Because now that we have a model picker, it's easier to add them. We don't want it to be a slippery slope. Like, we're not going to have 20 models in there. That seems overwhelming. And I think, again, if you're a professional software developer, you probably don't have time to spend evaluating 20 different models. So we still want to be very opinionated about it. We want to explain what the different models are good for and when to choose them. So we're thinking of having a very short list of models, and not every new model that comes out is going to automatically make it in there. But it does seem like a time where it makes sense to have more model choice, and we'll see. Maybe in the future it will all get consolidated again into one big system and we can remove the model picker. But right now it's a moment in time where choice seems like the right thing, within limits, like maybe three or four rather than 20 different choices.
Kevin Ball
I'm curious how much individual harness code you write around those different models. To use a maybe slightly dated example, one of the things that has been observed widely is that the various Sonnet models react very well to XML-formatted prompts of different sorts if you're trying to do structured things, whereas if you're trying to do structured things over in GPT land, Markdown might be a better answer. And those are things that you can kind of swap out at different levels. But I'm curious, yeah, how do you think about not just model picking, but the harness around the models?
Guy Gharari
Yeah. Every model we've introduced, and every time we've evaluated or upgraded a model, even within the same model series, it required its own prompt. I think that most of the prompt is shared. By prompt, you know, there are two pieces to it. There's the system prompt, and then there are the tools and the schema, right? How they're defined, what the descriptions are. So those are the main things that go in the prompt. Different models require different prompts, for sure. One difference between Sonnet and GPT-5 is that Sonnet is a more opinionated model than GPT-5. And so if you want it to do things that it doesn't naturally do, like, for example, explore the code base before it starts working, because it's really eager to go and edit code...
Kevin Ball
Sure is.
Guy Gharari
Sure. Yes.
Kevin Ball
Please fix this for me. I've reformatted your entire code base.
Guy Gharari
Exactly, exactly. And it's now production ready. Right. And so if you want it to go and explore a bit and collect information before it starts working, which is very important for us, right, for our target audience with existing large code bases, you have to really push it to do that. GPT-5 is different. It's a lot more steerable. So the way GPT-5 reacted to those instructions was to do an excessive amount of exploration, and so we had to dial that back. So that's an example of just different behavior. I think it just goes back to the different ways these models were trained and the different things that were baked into the post-training, basically. And so we have to make those changes. Another common failure mode I can point to is the ability of these models to edit files. You know, file editing is such a key function and they just get it wrong so often, and it always requires tweaks to get them to reliably edit files. The worst thing that happens with file editing is they will make a successful file edit but drop a line or add a line, and it just results in, you know, uncompilable code. The less bad thing is to have them make a tool call but then not actually be able to edit the file, to have the tool call fail, which is a bit annoying. But then models typically recover from that and make a correct tool call. So editing files is another area where we typically have to do some amount of prompt tuning to get new models to work. Yeah, but I'd say the main thing is just the overall style. If it can behave as an agent at all, then you kind of want to tweak the system prompt to get it to behave nicely.
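The two file-editing failure modes mentioned here (a tool call that fails outright, and an edit that "succeeds" but leaves broken code) can be sketched with a small exact-match edit helper. This is a generic illustration, not Augment's edit tool: the function names are made up for the example, and the post-edit check only covers Python sources via the standard `ast` module.

```python
import ast


def apply_edit(source: str, old: str, new: str) -> str:
    """Replace exactly one occurrence of `old` with `new`.

    Raises ValueError instead of guessing, so a bad tool call fails loudly
    and the model can retry rather than silently corrupting the file.
    """
    count = source.count(old)
    if count == 0:
        raise ValueError("edit rejected: snippet not found in file")
    if count > 1:
        raise ValueError(f"edit rejected: snippet matches {count} locations")
    return source.replace(old, new, 1)


def edit_python_file(source: str, old: str, new: str) -> str:
    """Apply an edit, then reject it if the result no longer parses."""
    updated = apply_edit(source, old, new)
    try:
        ast.parse(updated)
    except SyntaxError as exc:
        raise ValueError(f"edit rejected: result does not parse ({exc.msg})") from exc
    return updated
```

A rejected edit is the "less bad" case from the conversation: the caller gets a clear error to hand back to the model. The parse check is one cheap guard against the worse case, where a dropped or added line would otherwise land in the file unnoticed.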
Kevin Ball
That brings to mind something else. Something I'd heard of one of the players in the space exploring, because of exactly that problem you highlight around editing files, is actually building a custom diffing model, where they would take whatever the model generated and pass it through this customized model to generate a good diff or things like that. To what extent are you all playing with your own custom models? Is that something that you're investing in? Where is there space where that's needed?
Guy Gharari
Yeah, for us, we had a custom model for edits, but models have gotten good enough, and with enough prompting work we were able to remove that need. And so we don't have a custom model for that. The main place where we have custom models is for code base understanding, for the context engine. Agents can navigate the code base with things like ls and grep, but ultimately, figuring out what part of the code base you need is a retrieval problem. Even with agents that have access to tools, it's still a retrieval problem. And there is a basic problem in retrieval called the semantic gap. That's where, if you are trying to find things based on a given string and the actual string in, let's say, the code base is quite different, you're going to have a hard time grepping for it. Now agents have made that a bit easier because they can try out a bunch of different combinations. You sometimes see them hunting around for the right string to grep for, for example, in a code base. But still, if I didn't give it quite the right string, or if I don't even know what the string is, let's say I'm looking for a function and I don't even remember what that function is called, then agents with just file system access are going to have a hard time finding the right information. And so that's where we come in and train our own models to be able to close the semantic gap and surface the right information to the agents no matter what. So that's one area where we're always investing and iterating with our own models. And then another area is the non-agentic features: completions, NextEdit. Those are all driven by models that we train ourselves and are of course still maintaining and iterating on.
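The semantic gap described here can be shown with a toy example: a substring search for "delete user" finds nothing, because the code says `purge_account`, while a similarity search over concept vectors still ranks the right function first. This is only a sketch. The snippets are invented, and the hand-written `CONCEPTS` table stands in for a learned embedding model; it says nothing about how Augment's context engine actually works.

```python
import math
import re

# Hypothetical mini "codebase": function name -> docstring.
SNIPPETS = {
    "purge_account": "Permanently erase a customer profile and its data.",
    "send_invoice": "Email a billing statement to the customer.",
    "rotate_api_key": "Issue a fresh credential and revoke the old one.",
}

# Hand-written stand-in for a learned embedding: words with similar meaning
# map to one concept id. A trained model would learn this from data.
CONCEPTS = {
    "delete": "remove", "erase": "remove", "purge": "remove", "remove": "remove",
    "user": "person", "customer": "person", "account": "person", "profile": "person",
    "invoice": "billing", "billing": "billing", "statement": "billing",
    "key": "secret", "credential": "secret",
}


def grep(query: str) -> list:
    """Literal substring search, like grepping the code base."""
    return [name for name, doc in SNIPPETS.items()
            if query.lower() in f"{name} {doc}".lower()]


def embed(text: str) -> dict:
    """Turn text into a sparse vector of concept counts."""
    vec = {}
    for word in re.findall(r"[a-z]+", text.lower()):
        concept = CONCEPTS.get(word)
        if concept:
            vec[concept] = vec.get(concept, 0) + 1
    return vec


def cosine(a: dict, b: dict) -> float:
    dot = sum(value * b.get(key, 0) for key, value in a.items())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def semantic_search(query: str) -> str:
    """Return the snippet whose concept vector is closest to the query."""
    q = embed(query)
    scored = [(cosine(q, embed(f"{name} {doc}")), name)
              for name, doc in SNIPPETS.items()]
    return max(scored)[1]
```

Here `grep("delete user")` comes back empty while `semantic_search("delete user")` picks `purge_account`, which is the gap in miniature: the query and the code share meaning but no strings.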
Kevin Ball
Yeah, that's super interesting. And I think a nice thing about that is it gives you a potential moat, where a lot of companies in your space are not doing that. I think there was a direct quote from somebody at one of these companies that says, "There is no moat. I have no moat whatsoever." And that's definitely been there in terms of talking about it from a machine learning standpoint. I think there was a famous quote out of Google at some point, leaking, like, "We have no moat." Everybody can train these models; maybe it's just capital. How do you think about this space of coding agents? Like, are we going to end up in a place where somebody develops a big competitive advantage and they run away with it? Or is it that there's always going to be five different flavors?
Guy Gharari
So we can talk about the model layer and the application layer separately. Right. So for the model layer, again, Claude was the clear leader for about a year, and now others have meaningfully caught up, I would say, especially OpenAI with GPT-5, and so I'm excited to see how that evolves. Because of the recent developments, it doesn't seem like anyone is running away with it. There's healthy competition from my perspective. And then on the application layer, I think the context understanding is a big part of the value that we can provide. I don't know if we have a moat there, but I think we are clearly differentiated in terms of the performance that our agent makes on large code bases. And for us, we intend to keep pushing in that direction. I mean, the code base is like the most important source of context for the agent, but there are others. We recently released a new feature where the agent also has access to your code history, so things like access to commit messages. And of course in professional software development there are a lot of other interesting sources of context. I do think that as we're going from agents being used for interactive software development to agents being used more and more for automation, that's where there is a lot of room for innovation, and kinds of innovation that we haven't seen so far. Because we and others, we've all been so focused on working in the IDE and giving developers the best interactive experience that now I expect there to be kind of a new wave of product development in terms of, okay, how do you take these agents and now start automating more and more of the software development life cycle? Of course it's not going to happen in like one shot. We're not going to go in and just automate the whole thing. That will probably take years, if we ever get to that.
But there's a lot of low-hanging fruit, like I mentioned before: ticket to PR, looking at production logs, incident response, a lot of other things that you can do to make developers' lives better. So that's where I expect the next interesting products to come out. While of course we are still iterating on improving the product for interactive software development, because for the complex tasks you're still going to want to be in the IDE, you're still going to want to supervise what the agent is doing closely. I don't think that's going away anytime soon. And so just going back to your question, I think the application layer is where I expect a lot of the competition to happen, to see which vendors can really go beyond the IDE and give you a cohesive software development experience. Maybe a platform, not just for individual developers working on their feature development or bug fixes in the IDE, but really, how do you go and improve the way the whole software development life cycle happens on your team? That's where I expect competitive ways.
Kevin Ball
I love that, because I feel like, yeah, to your point, all these tools are focused on the individual developer, but team dynamics are shifting as well. I mean, I highlighted all the code I had to review, but that's just one example of the ways in which this is changing. So you have a front-row seat to your customers as they're adopting these tools and how they're having to change. What are you seeing in terms of the ways that teams are changing their functioning in this era?
Guy Gharari
It's just starting. I think it's a bit early, because the model capabilities are fantastic, but I don't think the products have completely caught up to the model capabilities yet. And then I think most developers are still in the phase of adopting these tools and trying to understand how you get the most value out of the existing products. So it's all kind of evolving together. Models are getting better, products are getting better, and developers are learning how to adopt and get the most value out of the products. From what I've seen, the most forward-looking teams are looking very closely at automation. From our perspective, they're using our CLI tool to automate things. So I mentioned ticket to PR before. You can do things like automatically scan your production logs for errors and then open tickets based on those and correctly assign the tickets to individual developers. Definitely people are looking at incident response. Code review we already mentioned, and people are getting very creative with it. I think some folks are doing scanning for security vulnerabilities. So these are all things you can do once you have a tool like the CLI, which means you take your agent, the full-featured agent with full code base understanding and so on, but it's in a CLI form factor. You can easily put it into your GitHub Actions or your CI/CD platform and you can start automating everything, because you kind of have this unit of intelligence that has escaped the IDE, and you can now put it anywhere you want and get intelligence in there with a full understanding of your code base. And it's all very early, but that's where I'm seeing things. For developers who are more forward-thinking, that's how they're pushing the envelope: they're using these tools to automate more and more things in their team. That's the main thing I'm seeing right now.
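To make the "scan production logs, then open tickets" automation a bit more concrete, here is a minimal sketch of the deterministic first half: grouping repeated errors into ticket drafts. The log format, the field names, and the `min_count` threshold are all assumptions made for the example. Handing each draft to an agent and assigning it to a developer is left out, since that depends on the specific CLI and ticketing system in use.

```python
import re
from collections import Counter

# Assumed log format: "<timestamp> <LEVEL> <service>: <message>"
LINE = re.compile(r"^\S+ (?P<level>[A-Z]+) (?P<service>[\w-]+): (?P<message>.*)$")


def error_signature(message: str) -> str:
    """Collapse variable parts (numbers, hex ids) so repeats group together."""
    return re.sub(r"\b(0x[0-9a-f]+|\d+)\b", "<n>", message)


def draft_tickets(log_lines, min_count: int = 2) -> list:
    """Group ERROR lines by service and signature; keep recurring ones."""
    counts = Counter()
    for line in log_lines:
        match = LINE.match(line)
        if match and match["level"] == "ERROR":
            key = (match["service"], error_signature(match["message"]))
            counts[key] += 1
    return [
        {"service": service, "title": signature, "occurrences": n}
        for (service, signature), n in counts.most_common()
        if n >= min_count
    ]
```

A script like this could run as a scheduled CI job, with each draft then passed to an agent to investigate the relevant code and fill in the ticket body.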
Kevin Ball
So if you were to track that forward, what do you think being a software developer in, say, 2030, or 2027 even, looks like as we map this forward?
Guy Gharari
So it really depends on how fast and how models are going to get better. Right. So right now, models, as we talked about, they're very good at writing code and writing correct code, but they're not good at looking at the big picture and making the correct architecture decisions. So if we extrapolate from that and assume that that kind of remains a limitation of models, but they will get more intelligent, they will get faster, they will get cheaper. Right. Then I think it looks like developers become tech leads. They manage probably fleets of agents, and then the challenge for developers is going to be how much context can you fit in your head in terms of what all the agents are doing and all the context switching that you have to do when you look at what different agents are doing, which is already true today, if you're pushing agents to the limit today, you're probably running a few of them in parallel, doing a few different things. And that already is like, even if you have two or three of them.
Kevin Ball
My brain taps out at two. Sometimes I can manage three, but really, two independent threads is about all I can contain.
Guy Gharari
Exactly. I do expect things to improve, because one thing that pretty clearly will happen is that we will go from a world where you have one agent with one agent loop doing something to having multi-agent systems where you have probably a top-level agent orchestrating sub-agents. That seems to be where things are headed. And so we will be able to tackle, I expect, more complex tasks with less human involvement. But still you need a human to supervise things at a high level. Again, supervise the design, the architecture, make sure we're going in the right direction. So that feels like a tech lead role. And so one potential way this can go is developers will become tech leads. And then, you know, what skills do you need in that role? Well, the most important skill for a tech lead is that you need to understand the technology really deeply. Again, that's very important so that you can supervise decisions and steer things in the right direction. The other very important skill for a tech lead is communication. You have to know how to communicate well. Well, that's just another way to say you need to prompt the agents well, right? When we say good prompts, that just boils down to communication. So that's one way this could go. If, on the other hand, model capabilities get so good that you can rely on them more for the technical decisions, then I expect to see the role evolve to be more product focused, probably. I mean, I expect that to happen anyway, but it could be that the product decisions become actually the main decisions that you make. And then if you have deep technical skills, I'm pretty confident you will have an edge, because I don't think agents will get so good within two years that you really just don't even need to worry about the technical parts.
But probably you'll be spending more of your time, is my guess, thinking about the product, thinking about users, and making sure that the direction and those decisions are correct. So in a two-year time frame, I think it kind of depends on how quickly models improve, especially on their current limitations, which is really the deep technical understanding.
Kevin Ball
So if I'm hearing you correctly, what the job starts looking like more and more, and this is already happening, is decision making. It's not the hands on the keyboard typing out code anymore, which it hasn't been since probably Sonnet 3.5. It's about making decisions. So if that is the future, what needs to happen in our coding tools? What does Augment need to do, and all of these other folks, to support that decision-making process?
Guy Gharari
Yeah. So one thing that we think is important, which we already talked about before, is that as we're tackling harder and harder tasks, more complex tasks, it's not an individual developer story anymore. Right. It's really more of a team story and a team effort. So we have to get a lot better at supporting whole teams rather than just individual developers. Another thing is that, in order to get there, we are going to have to automate a lot. We already talked about how code review becomes the bottleneck, right? There are going to be other bottlenecks that come up, like looking at what's happening in production. That's already an extremely painful thing in many companies: understanding what's going on and handling outages and breakages and so on. So for us I think it means both developing features for the most common tasks, to help developers become more productive by taking away the toil of those repetitive tasks, like code review, I would say, but then also giving developers, you know, developers are extremely creative and they're very used to automating things and building their own tools. So from our perspective, we also want to give them kind of the right building blocks so that they can go and automate tasks within their team without us having to build every single feature for them. And so we constantly think about the balance of which features we should build versus which general tools we should provide, like the CLI and remote agents, to let developers build their own automations. All of that has to happen for developers to really become tech leads, right, and not have to worry about the day-to-day stuff. I mean, it's not just in the IDE, it's also all this other stuff you have to deal with. We didn't mention deployments. I mean, there's just a lot of stuff you have to do. Maybe at a high level, there's so much tooling that needs to be built to get to the point where developers can really just focus on the decisions. 
So I think that's the large hurdle in front of us is building all of that tooling for developers and then making it as easy for them as possible to get to that point where they make these decisions.
Kevin Ball
Yeah, no, that makes sense. Well, and one of the things you said there led me down a thread of thought. So I know you invest a tremendous amount in context management. One of the things that I've definitely seen playing with all of the tools in this space is that I still know more about my code base than the tools know, no matter how good they are. So how are you thinking about exposing to developers the ability to build their own tools that plug into your context system, right, and say, oh, for my code base, I know you should look at these docs, or you should search that, or what have you?
Guy Gharari
So the question is, given that the developer has a lot more context than the agent, how can the developer effectively steer the agent?
Kevin Ball
Or steering is one step. But you were talking about empowering developers to build tools. So can I build tools that plug into Augment?
Guy Gharari
Oh, I see. So there are different levels to this. The simplest level is that all of our agents support MCP. And so if you want to plug into Augment and give it additional context, either write or connect to your own MCP server. I think the flip side of that is one thing we often get asked: people tell us, Augment has the best context understanding, but I have a more sophisticated, bigger system. How can I take Augment and the context understanding there and plug it into my existing system? And our answer currently to that is we don't give you access to the context engine directly, but we give you access to the agent as a CLI program, and that agent has the full code base understanding through the context engine. So my recommendation is, if you want to use Augment, and especially Augment's context understanding, in a bigger setting, just use the CLI agent. You can ask it questions about your code base, right? It's not just for writing code.
Kevin Ball
Treat it as a building block essentially.
Guy Gharari
Exactly. For us, the CLI is a building block. You can use it for interactive development, you can put it in your GitHub Actions, but you can also use it just exactly as a building block inside your bigger system. Maybe you have a bigger multi agent system already that does stuff and you just need to put the context understanding in there. Just use our CLI and just use it to answer questions about the code base or explore it or come up with design docs or specs or whatever it is. It works really well for that.
Kevin Ball
I love that. And you've got then essentially a built in ability to do a multi agent system. You spawn Augment and say, hey, here's your CLI tool for Augment. Go and do a thing.
Guy Gharari
Exactly. It's Unix, right? You just spawn another process. It just so happens that this process is this highly intelligent thing that knows about your code base, but it's still just a process. You can launch multiple of them, you can do whatever you want.
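The "it's just a process" point can be sketched with Python's standard `subprocess` module: a larger system launches several agent processes in parallel and collects their answers. The command below is a placeholder, a tiny Python one-liner that echoes its argument, so the sketch runs anywhere. No real agent CLI name or flag is assumed; you would swap `AGENT_CMD` for whatever invocation your tool documents.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

# Placeholder command: a small Python process that echoes its argument.
# Replace with your agent CLI's documented invocation; none is assumed here.
AGENT_CMD = [sys.executable, "-c", "import sys; print('answer to: ' + sys.argv[1])"]


def ask_agent(question: str, timeout: float = 60.0) -> str:
    """Run one agent process and return what it printed to stdout."""
    proc = subprocess.run(
        [*AGENT_CMD, question],
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return proc.stdout.strip()


def ask_many(questions, max_parallel: int = 4) -> list:
    """Launch several agent processes at once; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(ask_agent, questions))
```

Threads are enough here because each worker just waits on a child process, so the parallelism comes from the operating system running the spawned processes side by side.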
Kevin Ball
That's really cool. I think to your point, the more we can build these UNIX like building blocks and start layering and start doing that, we're going to see compounding effects here. So we're getting close to the end of our time here. I want to check in. Is there anything we haven't talked about today that you want to make sure we do talk about before we wrap?
Guy Gharari
No, I think I would just reiterate that if you're working on a large code base, you've probably tried some of the other tools. If you haven't tried Augment before, I encourage you to go to augmentcode.com and download it. It doesn't matter if you're on VSCode or JetBrains or you prefer to work in the terminal. Any form factor will give you the full code base understanding. And I think you will see within, I don't know, probably 30 minutes of using it, just exploring your code base, trying to write code, I think you'll see the difference with our context engine. That's what we keep hearing from users. And so try it out. And if you try it out and have any feedback, we're always happy to get feedback. Positive, negative, all of it. We're always striving to improve our product. But yeah, that's my recommendation. To me, it really is, I would say, a life-changing experience to be able to go into a code base and just navigate it. It's a freeing experience to just navigate it with pretty high-level instructions. I think the other thing I would mention, we actually didn't talk about it, is that there's the use case where developers already intimately know their code base. Like I think in your case, Kevin, you know the code base really well and you need something to make you more productive. There's another use case we see a lot of in software teams, which is maybe I'm a new member of the team, or maybe it's a really large code base and there's a part of the code base I haven't gone into. That's where you see Augment really shine, because I can see how it does things that would have taken me days probably to figure out, just kind of in minutes, because, oh, I didn't know the code is even structured this way, I didn't know how it works, and so on and so forth. And I can just get to working code really, really quickly. 
So looking at unfamiliar code is another very important use case for us and a very important use case for the context engine.
Episode: Scaling AI in Enterprise Codebases with Guy Gur-Ari
Date: October 9, 2025
Guests: Guy Gur-Ari
This episode dives into the evolving landscape of AI-assisted coding, with a focus on Augment Code, a platform designed for deep contextual understanding and automation in large, complex enterprise codebases. Guy Gur-Ari shares insights from his experience as a co-founder of Augment, reflecting on the technical, product, and human changes wrought by AI coding agents in professional software teams.
Key themes include the limitations of current large language models (LLMs), practical strategies for closing the context gap in legacy codebases, the shifting role of code review, and predictions for the future "tech lead"-style developer as agentic systems advance.
On closing the validation loop in coding vs math:
"With code ... we can really close the loop between the model writing code and then being able to execute code and getting the feedback from that and iterating until it gets the code to work."
— Guy Gur-Ari ([03:25])
On prompting and productivity:
"The more I can tell the model or the agent about my intent and the more I can tell it about how I wanted to accomplish the task, the better result I'm going to get."
— Guy Gur-Ari ([11:03])
On code review as bottleneck:
"As agents start writing 80, 90% or more of your code ... code review becomes the bottleneck."
— Guy Gur-Ari ([15:54])
On the future role of developers:
"Developers become tech leads. They manage probably fleets of agents, and then the challenge for developers is going to be how much context can you fit in your head in terms of what all the agents are doing."
— Guy Gur-Ari ([41:12])
On Augment’s differentiator:
"We are clearly differentiated in terms of the performance that our agent makes on large code bases. For us, we intend to keep pushing in that direction."
— Guy Gur-Ari ([35:39])
On extensibility and plugging into workflows:
"For us, the CLI is a building block... you can use it for interactive development, you can put it in your GitHub Actions, but you can also use it just exactly as a building block inside your bigger system."
— Guy Gur-Ari ([48:56])
This episode provides an in-depth look at how AI-powered coding assistants are evolving from productivity tools for individuals toward foundational, team-centric automation platforms in the enterprise. Guy Gur-Ari of Augment highlights the technical breakthroughs, product challenges, and human factors involved in deploying agents that can cope with sprawling, messy codebases—while anticipating the rise of the "developer as tech lead, agent orchestrator." If you're interested in where AI tooling for code is heading, and what it takes to bridge the gap from vibe coding to rigorous, maintainable software, this episode delivers fresh, actionable insight.