Software Engineering Daily

Episode: Scaling AI in Enterprise Codebases with Guy Gur-Ari
Date: October 9, 2025
Speakers:

Guy Gur-Ari, Co-founder, Augment
Kevin Ball (K. Ball), VP Engineering at Mento

Overview

This episode dives into the evolving landscape of AI-assisted coding, with a focus on Augment Code, a platform designed for deep contextual understanding and automation in large, complex enterprise codebases. Guy Gur-Ari shares insights from his experience as a co-founder of Augment, reflecting on the technical, product, and human changes wrought by AI coding agents in professional software teams.

Key themes include the limitations of current large language models (LLMs), practical strategies for closing the context gap in legacy codebases, the shifting role of code review, and predictions for the future "tech lead"-style developer as agentic systems advance.

Main Discussion Points & Insights

1. From Math Reasoning to Coding Agents

Math vs. Code: Formal Verification ([02:03]–[05:00])
- Gur-Ari’s background in AI research for math led to an interest in code as a reasoning challenge.
- Quote ([03:25], Guy Gur-Ari):
  "With code, this is why we're realizing this vision of really grounding the model's answers in reality now. And this is why we're seeing agents take off and so on and so forth. So it's a very exciting time to be working on AI for code."

2. Closing the Loop: Validation and Feedback

Augmenting Model Capabilities with Context & Feedback ([05:00]–[06:59])
- Agents benefit from feedback via type checking, linter errors, and nudges to run tests.
- Incorporating logs, metrics, and traces is seen as the next frontier for more robust context.
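
The validate-and-retry loop described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Augment's implementation: `propose_fix` and `toy_fix` are hypothetical stand-ins for a model call, and Python's built-in `compile` stands in for a type checker, linter, or test run.

```python
from typing import Callable, Optional


def check_syntax(source: str) -> Optional[str]:
    """Return an error message if the source does not compile, else None."""
    try:
        compile(source, "<candidate>", "exec")
        return None
    except SyntaxError as exc:
        return f"line {exc.lineno}: {exc.msg}"


def run_feedback_loop(
    source: str,
    propose_fix: Callable[[str, str], str],
    max_rounds: int = 3,
) -> tuple[str, bool]:
    """Validate, feed the error back to the proposer, and retry up to max_rounds."""
    for _ in range(max_rounds):
        error = check_syntax(source)
        if error is None:
            return source, True
        source = propose_fix(source, error)
    return source, check_syntax(source) is None


def toy_fix(source: str, error: str) -> str:
    """Hypothetical stand-in for a model: appends a closing parenthesis."""
    return source + ")"


fixed, ok = run_feedback_loop("print((1 + 2)", toy_fix)
```

The point of the sketch is the shape of the loop: the checker's message is passed back to whatever proposes the next edit, and the loop stops as soon as validation passes or the round limit is reached.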

3. Context Management Strategies

Implicit vs. Explicit Context and "Infinite Context" ([06:59]–[09:28])
- Augment's philosophy: proactively provide only the context that is necessary, keep agents as autonomous as possible, and minimize manual intervention by the user.
- The “infinite context” principle ensures users need not worry about token limits or context window size.
- Quote ([07:26], Guy Gur-Ari):
  "We try to keep the agent as autonomous as possible... we will not put things automatically in the context window from the code base, for example, unless we're really, really sure that this is what the agent wants."

4. Making Context Rot "Magically" Disappear

Technical Challenges Remain ([09:28]–[10:26])
- "Context rot" (forgetting prior context) is still an open problem. Retrieval, summarization, and prioritization tricks help, but they are not yet a full solution.
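
As a rough illustration of the summarization and prioritization idea, the sketch below keeps the most recent messages verbatim and collapses older ones into a single summary line once a budget is exceeded. Everything here is an assumption made for illustration: `trim_context` is a hypothetical helper, token counts are approximated by word counts, and the "summarizer" just keeps the first three words of each older message.

```python
def trim_context(messages: list[str], budget: int, keep_recent: int = 2) -> list[str]:
    """Keep the newest messages verbatim; collapse older ones into one summary line.

    Token counts are approximated by whitespace-separated words (an assumption).
    The result is shorter than the input but is not guaranteed to fit the budget.
    """
    if sum(len(m.split()) for m in messages) <= budget:
        return list(messages)
    split = max(len(messages) - keep_recent, 0)
    older, recent = messages[:split], messages[split:]
    if not older:
        return recent
    # Hypothetical "summarizer": keep only the first three words of each older message.
    summary = " | ".join(" ".join(m.split()[:3]) for m in older)
    return [f"[summary] {summary}"] + recent


history = [
    "read the config loader module carefully",
    "found a bug in retry logic",
    "wrote a failing test",
    "patched the retry loop",
]
trimmed = trim_context(history, budget=12)
```

A real system would replace the truncation with a model-written summary, which is where the information loss behind context rot tends to creep in.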

5. Effective Use of AI Coding Tools: It Starts with Prompting

Harnessing Productivity through Intentional Usage ([10:26]–[12:55])
- User productivity is highly variable, often based on prompting skill.
- Quote ([11:03], Guy Gur-Ari):
  "Even in the prompt box, context really matters... The more I can tell the model or the agent about my intent and the more I can tell it about how I wanted to accomplish the task, the better result I'm going to get."
- Augment’s "prompt enhancer" feature helps users create more effective prompts by auto-expanding short inputs into fuller specs.
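
One way to picture a fuller prompt is as a small structure that separates intent from approach and constraints. The sketch below is illustrative only; `TaskPrompt` and its field names are hypothetical and do not describe how Augment's prompt enhancer works.

```python
from dataclasses import dataclass, field


@dataclass
class TaskPrompt:
    """Hypothetical structure for a coding-agent prompt; field names are illustrative."""
    intent: str
    approach: str = ""
    constraints: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Render the fields as plain text, skipping any that are empty."""
        lines = [f"Goal: {self.intent}"]
        if self.approach:
            lines.append(f"Approach: {self.approach}")
        for item in self.constraints:
            lines.append(f"Constraint: {item}")
        return "\n".join(lines)


prompt = TaskPrompt(
    intent="Add retry with backoff to the HTTP client",
    approach="Wrap the existing send() call; do not change its signature",
    constraints=["Keep the public API unchanged", "Add unit tests"],
)
rendered = prompt.render()
```

Stating the approach and constraints explicitly gives the agent both the intent and the preferred way to accomplish the task, as the quote above recommends.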

6. Limits of Agentic Coding: What Can AI Really Do?

Back-and-Forth and One-Shot Task Complexity ([12:55]–[15:07])
- Complex pull requests (PRs), even those with thousands of lines, can be managed by the agent with sufficient user steering.
- Repetitive or relatively simple tasks can often be fully automated (e.g., ticket to PR flows, code review comment generation).

7. Code Review: The New Bottleneck

Automation and Future Directions ([15:07]–[18:29])
- With AIs generating so much code, human code review becomes the bottleneck.
- Automation for first-pass reviews (bug detection, consistency) is already being implemented; Augment is developing more in this area.
- Looking ahead, the division between code-writing and code-reviewing agents may need rethinking.
- Quote ([15:54], Guy Gur-Ari):
  "As agents start writing 80, 90% or more of your code... code review becomes the bottleneck."

8. Architectural Oversight & Maintainability Challenges

Limits of Agent Understanding ([18:29]–[21:06])
- LLMs are effective at catching bugs but struggle with maintaining good architecture and design—human oversight remains critical.

9. Vibe Coding vs. Professional Engineering

Greenfield vs. Legacy/Enterprise Needs ([21:06]–[23:18])
- While “vibe coding” is fun for small, disposable greenfield projects, maintainability and architecture become critical in professional contexts.
- Augment’s product focus is on aiding professional teams and large codebases.

10. Legacy Codebase Support and Tool Integration

Steerability & Environment Integration ([23:18]–[25:35])
- Augment’s context management works across both small and massive legacy codebases; users can steer the agent to new or old codebase patterns via intent.
- The product integrates with popular editors and IDEs (VS Code, JetBrains, Vim) and is also available as a CLI.

11. Model Selection and Customization

From Single-Model to Multi-Model Era ([25:35]–[29:33])
- Augment now offers users a model picker (e.g., Claude, GPT-5), as multiple models have reached production viability.
- Each model behaves differently:
  - Quote ([28:08], Kevin Ball):
    "Claude will write buckets of code and GPT5 will think for a few 20, 30 seconds and then make a two line change."
- Augment offers only a short, curated list of models, in keeping with its focus on professional users.

12. Prompt Engineering and Harnessing for Each Model

Customizing System Prompts, Tool Use per LLM ([29:33]–[32:38])
- Each model requires tailored prompting and harness code for optimal performance, especially for file edits and code exploration phases.
- Quote ([30:59], Guy Gur-Ari), on Sonnet model behaviors:
  "It's now production ready. Right. That's like the. Yes. And so. And so if you wanted to go and explore a bit and collect information before it starts working, which is very important for us... you have to really push it to do that. GPT5 is different. It's a lot more steerable."
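
A per-model harness can be pictured as a small table of settings consulted when the system prompt is built. The sketch below is an assumption-laden illustration: `HarnessProfile`, the model keys `model-a` and `model-b`, and the prompt text are all hypothetical and not taken from Augment's product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HarnessProfile:
    """Per-model settings; all names and values here are illustrative assumptions."""
    system_prompt: str
    explore_first_nudge: bool  # whether to remind the model to gather context first


PROFILES = {
    "model-a": HarnessProfile("You are a coding agent.", explore_first_nudge=True),
    "model-b": HarnessProfile("You are a coding agent.", explore_first_nudge=False),
}


def build_system_prompt(model: str) -> str:
    """Return the system prompt for a model, adding the nudge only where configured."""
    profile = PROFILES[model]
    if profile.explore_first_nudge:
        return profile.system_prompt + " Explore the codebase before editing any file."
    return profile.system_prompt
```

Keeping these differences in data rather than scattered conditionals makes it easier to add a model or retune one without touching the rest of the harness.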

13. Custom Models for Semantic Context & Retrieval

Where Augment Invests in ML ([32:38]–[34:59])
- Augment's main differentiation is its own models for semantic search and retrieval, which help agents succeed in unfamiliar or poorly structured codebases.
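
To make the retrieval idea concrete, here is a toy ranking of code snippets against a query. It swaps in word-count vectors and cosine similarity for the learned models the episode refers to, so it shows only the general shape of retrieval; the file names and snippet descriptions are made up for the example.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts. Real systems use learned vectors."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, corpus: dict[str, str], top_k: int = 1) -> list[str]:
    """Return the names of the top_k corpus entries most similar to the query."""
    q = embed(query)
    ranked = sorted(corpus, key=lambda name: cosine(q, embed(corpus[name])), reverse=True)
    return ranked[:top_k]


# Hypothetical file names and descriptions, for illustration only.
snippets = {
    "auth.py": "verify user password and issue session token",
    "billing.py": "compute invoice total and apply tax",
    "retry.py": "retry failed http request with backoff",
}
best = retrieve("where do we retry an http request", snippets)
```

Word overlap fails as soon as the query and the code use different vocabulary, which is the gap that semantic retrieval models are meant to close.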

14. Moats, Differentiation, and Application Layer Innovation

Where "Moat" Exists in the Stack ([34:59]–[38:31])
- Foundation models are at relative parity (for now).
- Application moat and differentiation come from superior context and automation features (retrieval, code history, etc.).
- Next competitive frontier: automating more of the software lifecycle, extending beyond individual developer productivity.

15. Team Dynamics & Automation

How AI is Changing Team Life ([38:31]–[40:59])
- Early-adopting teams are automating ticket creation from logs, doing code review in CI, vulnerability scanning, and more.
- The CLI agent is an enabler for embedding intelligence everywhere, not just in the IDE.

16. Looking Forward: The Developer as Tech Lead & Agent Orchestrator

Role Evolution Toward Supervision, Architecture & Product ([40:59]–[44:31])
- In the near future, developers may supervise multiple agents ("fleet management"), focusing on architecture and high-level decision-making.
- Quote ([41:12], Guy Gur-Ari):
  "Developers become tech leads. They manage probably fleets of agents... the challenge for developers is going to be how much context can you fit in your head."
- As models improve, the balance may shift even more toward product and user decisions.

17. Tooling to Support High-Quality Decision-Making

Team Support & Customization ([44:31]–[47:10])
- Gur-Ari stresses the need for team-centric features and for building blocks that let users automate their own workflows.
- Quote ([45:02], Guy Gur-Ari):
  "We have to get a lot better at supporting whole teams rather than just individual developers... giving developers the right building blocks so that they can go and automate tasks within their team."

18. Power User Features: Exposing and Customizing Context

Empowering Developers to Build on the Platform ([47:10]–[49:43])
- Augment exposes context via their agents (and CLI) for use as building blocks within larger systems.
- Quote ([48:56], Guy Gur-Ari):
  "For us, the CLI is a building block... you can use it just exactly as a building block inside your bigger system. Maybe you have a bigger multi agent system already that does stuff and you just need to put the context understanding in there."

Notable Quotes

On closing the validation loop in coding vs. math ([03:25], Guy Gur-Ari):
"With code ... we can really close the loop between the model writing code and then being able to execute code and getting the feedback from that and iterating until it gets the code to work."

On prompting and productivity ([11:03], Guy Gur-Ari):
"The more I can tell the model or the agent about my intent and the more I can tell it about how I wanted to accomplish the task, the better result I'm going to get."

On code review as bottleneck ([15:54], Guy Gur-Ari):
"As agents start writing 80, 90% or more of your code ... code review becomes the bottleneck."

On the future role of developers ([41:12], Guy Gur-Ari):
"Developers become tech leads. They manage probably fleets of agents, and then the challenge for developers is going to be how much context can you fit in your head in terms of what all the agents are doing."

On Augment’s differentiator ([35:39], Guy Gur-Ari):
"We are clearly differentiated in terms of the performance that our agent makes on large code bases. For us, we intend to keep pushing in that direction."

On extensibility and plugging into workflows ([48:56], Guy Gur-Ari):
"For us, the CLI is a building block... you can use it for interactive development, you can put it in your GitHub Actions, but you can also use it just exactly as a building block inside your bigger system."

Memorable Moments

[10:26] Kevin Ball and Guy Gur-Ari share stories about the wide range of productivity and frustration experienced with LLM coding tools.
[15:41] Kevin Ball jokes about reviewing "100,000 lines of code" thanks to AI agents, highlighting how speed creates new bottlenecks and stresses.
[28:08] Both reflect on the distinct "styles" of leading LLMs when writing code, with GPT-5's precision contrasted against Claude’s verbosity.
[41:12] Gur-Ari muses on the mental limits of developers as "agent fleet managers," and Ball notes that "my brain taps out at two."

Timestamps for Key Segments

[02:03] Guy’s background: AI reasoning, math, and code.
[05:27] Augment’s approach to validation and closing the loop.
[07:26] Explicit vs. implicit context and the "infinite context" approach.
[11:03] Prompting as the key to successful AI-assisted coding.
[13:30] What agentic coding is currently capable of.
[15:07] The growing bottleneck of code review in an AI-driven world.
[18:29] The limits of AI in architectural/design code review.
[23:52] Supporting legacy code with steerable, context-aware agents.
[26:15] Model selection: from one leader to several viable contenders.
[30:05] Prompt/harness customization for each model’s quirks.
[32:38] Where Augment builds its own in-house models for retrieval/context.
[34:59] Where the "moat" lies in AI coding platforms.
[38:57] Early signs of team-level automation using Augment CLI.
[41:12] The future developer: fleet manager, architect, product thinker.
[44:31] Tooling for high-level team decision making.
[47:10] Custom context and building blocks for power users and integrators.
[50:03] Final thoughts—Augment's differentiator for exploring unfamiliar code.

Summary

This episode provides an in-depth look at how AI-powered coding assistants are evolving from productivity tools for individuals toward foundational, team-centric automation platforms in the enterprise. Guy Gur-Ari of Augment highlights the technical breakthroughs, product challenges, and human factors involved in deploying agents that can cope with sprawling, messy codebases—while anticipating the rise of the "developer as tech lead, agent orchestrator." If you're interested in where AI tooling for code is heading, and what it takes to bridge the gap from vibe coding to rigorous, maintainable software, this episode delivers fresh, actionable insight.

