Podcast Summary: The Pragmatic Engineer – "Measuring the Impact of AI on Software Engineering" with Laura Tacho

Host: Gergely Orosz
Guest: Laura Tacho (CTO, DX)
Date: July 23, 2025

Episode Overview

This episode dives into the real-world impact of AI tools on software engineering, cutting through media hype to focus on data, trusted case studies, and the practical realities of how developers are working with AI today. Gergely Orosz and Laura Tacho explore how to measure the true effect of AI on engineering productivity, developer satisfaction, and business outcomes. They discuss adoption trends, what AI actually accelerates (and what it leaves behind), and how organizations should be measuring value. Especially relevant are insights from large companies like Booking.com and Workhuman, and tactical advice for engineering leaders who want to roll out AI successfully without falling for misleading metrics or overblown expectations.

Key Discussion Points and Insights

1. Media Hype vs. Reality

Sensational Headlines:
AI in software is frequently overhyped in mainstream media, with headlines suggesting imminent mass-replacement of developers or that AI writes enormous portions of production code.
Quote:
“Those headlines suggest that 30% of Microsoft’s code that’s running in production was authored by AI. That is not at all realistic…we’ve never seen data consistent with that kind of sensational claim.” (Laura, 05:29)
Oversimplification:
Such narratives often misunderstand how AI is used or measured, confusing code completions or autocomplete assistance with genuine code authorship and impact.

2. Measuring AI’s Impact: What Actually Matters

Inadequacy of Simple Metrics:
Metrics like lines of code generated or “acceptance rate” (the rate at which code suggestions are accepted) are misleading and don’t reflect productivity or business value.
Quote:
“Source code is a liability. Do we really want to measure AI impact in terms of lines of code generated? I certainly don’t.” (Laura, 16:03)
Laura & DX’s Measurement Framework:
- Utilization: How many developers actually use AI tools (measured by DAU/WAU as a % of the population, and license allocation).
- Impact: Developer experience improvement, time savings (esp. on system toil or debugging), actual velocity increase.
- Cost: Tooling cost, licensing, and particularly the new challenge of token-based/consumption pricing.
Quote:
“Acceptance rate is just such a tiny part of the story…we need to track the impact across the lifecycle, and really stay focused on the end result – more revenue, reduced cognitive load, better developer experience.” (Laura, 14:11)
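
The utilization dimension described above can be sketched in a few lines. This is a minimal illustration, not DX's implementation: the `UsageEvent` record, the `utilization` function, and the sample data are all hypothetical, and it assumes you can export one record per developer per day of AI tool usage.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class UsageEvent:
    """One day on which a developer used an AI tool (hypothetical record shape)."""
    developer: str
    day: date


def utilization(events, population, licensed, as_of):
    """Return daily/weekly active users and license coverage as percentages.

    The week is the 7 days ending on `as_of`. Percentages are of the whole
    developer population, except `wau_pct_of_licensed`.
    """
    if not population:
        raise ValueError("population must not be empty")
    week_start = as_of - timedelta(days=6)
    daily = {e.developer for e in events if e.day == as_of}
    weekly = {e.developer for e in events if week_start <= e.day <= as_of}
    licensed = licensed & population

    def pct(part, whole):
        return round(100 * len(part) / len(whole), 1)

    return {
        "dau_pct": pct(daily & population, population),
        "wau_pct": pct(weekly & population, population),
        "licensed_pct": pct(licensed, population),
        # Adoption among developers who actually hold a license.
        "wau_pct_of_licensed": pct(weekly & licensed, licensed) if licensed else None,
    }


# Invented sample data: four developers, three licenses.
today = date(2025, 7, 23)
events = [
    UsageEvent("ana", today),
    UsageEvent("ben", today - timedelta(days=3)),
    UsageEvent("cho", today - timedelta(days=10)),  # outside the 7-day window
]
report = utilization(events, {"ana", "ben", "cho", "dee"}, {"ana", "ben", "cho"}, today)
print(report)
```

Reporting weekly usage both against the whole population and against license holders separates "developers who chose not to use the tool" from "developers who were never given access".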

3. Adoption Patterns and Use Cases

Case Study—Booking.com:
- Major focus on adoption and enablement (office hours, training, leadership sponsorship).
- Achieved 65% weekly/daily use among developers—well above the industry median.
- Not all devs use AI tools, often due to license constraints or because tools are poorly suited to very novel or specialized code.
- Quote:
  “It’s not necessarily that these individuals are skeptical, some of it is just that their organization doesn’t make a license available…And for some product areas, it’s just not that effective.” (Laura, 20:37)
Top Use Cases (Study of 180+ Companies):
- #1 Time Saver: Stack trace analysis and error debugging
- #2: Refactoring existing code
- #3: Mid-loop (inline) code generation
- Other important uses: code documentation, brainstorming, planning, unit test generation—often more valuable than just generating net new code.

4. The Paradox of Time Savings and Satisfaction

Developer Satisfaction Risk:
When AI accelerates the “fun” parts of coding, developers are left with a greater share of meetings and administrative toil, which can reduce job satisfaction.
Quote:
“Many developers were actually feeling less satisfied because AI is accelerating the parts that they enjoy. And so what was left over was…toil, meetings, the administrative work.” (Laura, 00:04 & 30:08)
Coding Is Not the Bottleneck:
Developers often spend only ~20-25% of their time actually coding, so time saved by AI here doesn’t necessarily translate into dramatic business outcomes or headcount reductions.
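
A quick back-of-the-envelope calculation shows why. The sketch below assumes, for simplicity, that AI only speeds up coding time; the function name is hypothetical, and the 30% saving on coding tasks and the 40-hour week are invented inputs, not figures from the episode.

```python
def overall_time_saved(coding_share, coding_time_saved):
    """Fraction of total working time saved when only coding gets faster."""
    if not 0 <= coding_share <= 1 or not 0 <= coding_time_saved <= 1:
        raise ValueError("both arguments must be fractions between 0 and 1")
    return coding_share * coding_time_saved


# ~25% of time spent coding (from the episode); 30% faster coding is an assumption.
saved = overall_time_saved(coding_share=0.25, coding_time_saved=0.30)
hours_per_week = 40 * saved  # assumes a 40-hour week
print(f"{saved:.1%} of total time, about {hours_per_week:.1f} hours per week")
```

Even a large improvement on the coding portion shrinks to single digits once it is spread over the whole working week.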

5. Architectural and Documentation Shifts

Codebase Design for AI (and Humans):
- Emphasis on “clean interfaces” and discoverability to make it easier for both agentic models and human engineers to navigate and use services.
- “Write documentation for both AI and humans”: lean toward examples and easily ingestible formats, not just narrative docs or screenshots. Companies like Vercel are seen as models here.
Quote:
“The Venn diagram for what’s good for AI agents and good for humans is a circle—clear boundaries between services.” (Laura, 40:47)

6. What Leading Companies are Measuring—and Reporting

Case Study—Workhuman:
- Used the AI measurement framework (utilization, impact, cost) with pre/post baseline data.
- Found an 11% boost in org-wide developer experience, and daily/weekly AI users had 15% higher velocity than non-users.
- Sustained benefit requires both technical measurement and self-reported developer experience metrics.
Advice: Start measuring now, baseline first—don’t delay even if you don’t have perfect data.
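
As a rough sketch of the comparison behind numbers like these, the helper below computes the relative difference between two groups of measurements. The function name and the sample values are invented for illustration; the summary does not describe how Workhuman actually calculated its figures.

```python
from statistics import mean


def relative_uplift(treated, baseline):
    """Percent difference between the mean of `treated` and the mean of `baseline`."""
    base = mean(baseline)
    if base == 0:
        raise ValueError("baseline mean is zero")
    return 100 * (mean(treated) - base) / base


# Invented weekly throughput per developer, for illustration only.
ai_users = [11, 12, 11.5]
non_users = [10, 10, 10]
print(f"{relative_uplift(ai_users, non_users):.1f}% higher for AI users")
```

A cohort comparison like this shows correlation only, which is one reason to capture a pre-rollout baseline for the same teams as well.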

7. Evolving Tooling Costs and Investment Patterns

Tool Pricing & History:
Companies once easily spent $3k–8k/year per developer on “core” software tools. We may be entering a similar era, with budgets for AI tools (possibly $1200–$2000/year per dev).
Token-based Consumption:
Key new challenge: who gets access to more AI power—junior or senior devs? Where does ROI per dollar spent show up?
Comparison with Past Hype Cycles:
This AI wave is similar to the container/Kubernetes era: chaotic at first, then gradually stabilized through industry consolidation.

8. Engineering Best Practices for AI Rollouts

Highly Regulated Industries Lead the Way:
- Financial, insurance, and pharma companies are deploying AI more methodically and seeing better results due to required structure and policies.
Case Study—Indeed:
- Methodical experimentation with different tools.
- Segmenting use cases (e.g., using AI reviews to speed up feedback loops across geographies).
- Treating AI adoption as a controlled experiment rather than a one-size-fits-all deployment.
Great Use Case—Automated Migrations:
- Much hated by devs; AI can do heavy lifting and present ready-to-review PRs for upgrades, saving time and reducing monotonous work.
- Prompt engineering tip: manually perform one migration, then use the diff as an example for more accurate multi-file AI migrations.
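
The migration tip above can be sketched as a small prompt builder. This is one possible shape, not a method described in the episode: `build_migration_prompt`, the prompt wording, and the file names are hypothetical, and the result would be passed to whichever AI tool you use.

```python
import difflib


def build_migration_prompt(before, after, example_path, target_path, target_source):
    """Build a prompt that shows one hand-made migration as a unified diff."""
    diff = "".join(
        difflib.unified_diff(
            before.splitlines(keepends=True),
            after.splitlines(keepends=True),
            fromfile=f"a/{example_path}",
            tofile=f"b/{example_path}",
        )
    )
    return (
        "Apply the same migration shown in this example diff.\n\n"
        f"Example diff:\n{diff}\n"
        f"File to migrate ({target_path}):\n{target_source}\n"
        "Return a unified diff for this file only."
    )


# Invented example: one file migrated by hand from an old client to a new one.
prompt = build_migration_prompt(
    before="import old_http\nold_http.get(url)\n",
    after="import new_http\nnew_http.fetch(url)\n",
    example_path="client.py",
    target_path="billing.py",
    target_source="import old_http\nold_http.get(invoice_url)\n",
)
print(prompt)
```

Asking for a diff per file keeps each result small enough to review as a normal pull request.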

9. What Should End Users and Companies Expect?

Faster Time to Market:
AI may enable rapid experimentation and shorter time-to-market for validated features. The companies that already excel at experimentation (A/B testing, product iteration) will benefit most.
Quote:
“I think roadmaps are on their way out…companies that are going to win with AI will focus on rapid experimentation.” (Laura, 57:11)
Potential Risks:
Feature bloat and instability if new code isn’t properly validated/tested.

10. Staying Grounded: How Leaders Should Navigate the AI Hype

Data Over Hype:
- Focus on measurement, experiment methodically, and combine developer workflow data with direct feedback.
- “Data beats hype every time.” (Laura, 67:40)
- AI is not a silver bullet; just giving licenses is insufficient—enablement and training are needed.

Notable Quotes and Memorable Moments

On Hype:
- “Can you imagine that headline: ‘ACME Corp only ships code to production that’s been read by robots. Is this the end of software engineering?’” (Gergely, 10:15)
On Developer Experience vs. Productivity:
- “Source code is a liability. …when what could have been written in one line is now written in five lines, do we really want to measure AI impact in terms of lines of code generated?” (Laura, 16:03)
On the Reality of Time Savings:
- “On the very best day, developers are not spending even 80% of their time coding. The average is like 25%...when we apply AI...we’re only working with 20% of that time.” (Laura, 30:08)
On Satisfying Work:
- “What happens with the time savings? ...developers were feeling less satisfied because AI is accelerating the parts that they enjoy.” (Laura, 00:04 & 30:08)
On AI-powered Documentation:
- “Documentation needs to be there for AI so that a developer gets the information they need in the best way…in the editor, not just your documentation site.” (Laura, 37:05)
On Rollouts:
- “The more intentional and structured a rollout is, the higher chance it has to be successful…slow is smooth and smooth is fast.” (Laura, 60:08)

Important Timestamps

00:00 – Opening question: what developer time savings with AI really mean
02:37 – How media hype misinforms and what engineering leaders need to know
09:14 – Why metrics like “acceptance rate” and “lines of code” are misleading
14:11 – What to actually measure: outcomes, not code volume
18:08 – Booking.com’s AI adoption strategy and performance data
24:04 – Biggest time-saving use cases for professional engineers (not code gen!)
30:08 – The paradox: AI accelerates “fun” parts, leaving more toil
36:17 – New architectural/documentation best practices for AI/agent usage
42:44 – Workhuman case study: 11% org-wide DX gain from AI
47:06 – Tooling investment, the return of big-ticket per-dev pricing
59:13 – Regulated industries’ advantage with structured AI rollouts (Indeed case)
63:51 – Automated migrations as a killer AI use case
67:40 – Laura’s closing advice: “Data beats hype every time.”

Takeaways for Engineering Leaders & Developers

Measure, Don’t Guess: Build or baseline your own productivity and experience data before/after rolling out AI.
Don’t Chase Vanity Metrics: Ignore lines of code and suggestion acceptance rates—focus on business/productivity outcomes.
Adoption Requires Intentionality: Training and enablement matter as much as tool selection.
Expect Complexity: AI doesn’t remove hard problems—batch size, testing, code quality still determine software health.
Structured Experimentation Wins: Treat AI adoption as a series of experiments; regulated/structured orgs see the best outcomes.
Focus on Developer Experience: Improving day-to-day dev workflows leads to real downstream benefits for teams and companies.
Stay Skeptical, Stay Data-driven: Challenge hype with measurements; keep leadership and business partners informed with real numbers.

Recommended Resources & Closing

Laura’s book recommendations:
- “Write Useful Books” by Rob Fitzpatrick
- “Unsavory Truth” by Marion Nestle
Practical tip: For AI-assisted migrations, do one manual migration and use that as a diff example for the AI to repeat accurately.
Link: For more data and guides, Laura recommends DX’s “Guide to AI-Assisted Engineering” (see DX website), and the Pragmatic Engineer Deep Dives.

“Data beats hype every time.” – Laura Tacho (67:40)

(End of summary)

Transcript

[00:00]
A
What happens with the time saving as a developer? What would that mean for me as a developer?
[00:05]
B
When DORA researched this question, what they found was that many developers were actually feeling less satisfied because AI is accelerating the parts that they enjoy, and so what was left over was more stuff that they didn't enjoy: the toil, the meetings, the administrative work. It gave me pause when I read that.
[00:23]
A
What is the actual impact of AI tools on software engineering? Laura Tacho is the CTO at DX, a company with a mission to measure developer productivity with data, and has been doing so even before AI tools went mainstream from 2022. Today we discuss why most of the hype in the media about AI gets things wrong thanks to oversimplification, and why the burden is on us engineers to set the record straight. The actual data on the impact of rolling out AI tools for development at companies like Booking.com and Workhuman. How developers report that their most time-saving use case is not actually AI code generation, but debugging tricky stack traces and doing it faster. The paradox of AI tools: how using AI tools to help with coding can make developers less satisfied with our jobs, because we actually like to code. And many more interesting topics. If you're a lead or engineer interested in data about what works today with AI and how to stay grounded with all the hype in the media, this episode is for you. If you enjoy the podcast, please subscribe to it on any podcast platform and on YouTube. All right, so Laura, welcome to the podcast.
[01:25]
B
Hey Gergely, good to see you again.
[01:27]
A
So to kick off, one thing I hear a lot, and I see it as well, is how AI is and has been and remains overhyped. So when I look at media headlines, some of the headlines are just, frankly, they sound ridiculous. There are some headlines around how, actually, let me read you a few. Some are a bit scary as developers and some just feel like over the moon. So here's one from Forbes: "Are coders' jobs at risk? AI's impact on the future of programming." A CIO magazine one: "AI coding assistants wave goodbye to junior developers." Gizmodo: "OpenAI just released a coding tool to, quote, help programmers, in brackets, replace their jobs." Probably software development may never be the same. Now these are all, like, you know, they're not like tech, like, not publications where software engineers would say, like, okay, they're amazing with tech. But you know, Gizmodo and Forbes, like, decision makers read them and they often get forwarded to developers or you see them. What is your take on what we're hearing with mainstream media? And as you're talking with developers and engineering leaders, what are they telling you about these?
[02:38]
B
Yeah, I mean these headlines are headlines for a reason, right? They get clicks, they get engagement, they're sensational.
[02:43]
A
They're ad-supported media, usually.
[02:46]
B
Absolutely. And I think anytime, I think in general media literacy is really important. Data literacy is really important. And this is no different. So whenever I see a headline like that, I always trace it back to the money, like who's getting paid? As you said, it's ad supported media. Who's being covered? Are they a vendor? Are they selling an AI tool? Like I ask all of these questions. You should be asking all of these questions as well.
[03:11]
A
Are they paying for coverage? Sometimes they are. And even when they're not paying, they might be paying PR agencies who pitch these ideas to magazines and the magazines who need to do output. So they, for example, they're being paid by ads, they will often produce these shallow articles which are kind of pre written for them. By the way. This is a fun fact.
[03:29]
B
I think there's a challenge in AI which is that simplifying something to the point where it can be understood by someone who doesn't have a background in development can oversimplify to the point of being incorrect. And so one recent example that I read was actually in the Wall Street Journal and they talked about how, you know, companies have AI employees and these AI employees had line managers. I'm using a lot of air quotes here. Line managers and company credentials. Well, sure, you could think about Copilot or, you know, Claude or any agentic workflow as having a line manager being the person who's dispatching the work or, you know, verifying the work.
[04:12]
A
And you can think about, like, instantiated the agent.
[04:14]
B
Yeah, yeah. Or like company credentials. Does that mean it has access to commit on GitHub? Like does my Dependabot have company credentials? Because it can open a pull request. And so yeah, I could see in a journalistic world that is accurate. But is that really a reflection? Is there an engineering manager out there hiring AI agents as employees and giving them company email addresses? Like that's just not really what's happening. And I think the oversimplification can be really sensational and everyone wants a piece of the AI hype right now. So you know, those are the things that I think about when I read this. I think it's really unfortunate because it puts a big burden on engineering leaders to educate their business counterparts who maybe don't have, they just don't have the background knowledge and experience to understand what is authentic and what is not authentic. What are the real limitations of these tools? What is possible now? And you know, it is our job, like it or not, to be that person who can translate those and explain it. Because ultimately it hurts us and it hurts developers when we don't do that. Because nobody wins in this hype cycle. Developers are like, this is super gimmicky. That's another reason for lower adoption. Like, this is super gimmicky. There's no way that this is going to work as good as it says. They try it once, it gives them spaghetti code and they're like, this is just a load of BS. And then on the other side, on the executive side, there's a CEO saying like, hey, I heard that Microsoft is writing 30% of their code with AI. Like why aren't we, why aren't we doing that? Those headlines suggest that 30% of Microsoft's code that's running in production was authored by AI. That is not at all realistic. We don't have data to support that from any of the companies that we work with, and across hundreds of companies we've never seen data consistent with that kind of sensational claim.
[06:06]
A
If you want to build a great product, you have to ship quickly. But how do you know what works? More importantly, how do you avoid shipping things that don't work? The answer? Statsig. Statsig is a unified platform for flags, analytics, experiments and more, combining five products into a single platform with a unified set of data. Here's how it works. First, Statsig helps you ship a feature via feature flag or config. Then it measures how it's working, from alerts and errors to replays of people using that feature, to measurement of top line impact. Then you get your analytics, user account metrics and dashboards to track your progress over time, all linked to the stuff you ship. Even better, Statsig is incredibly affordable with the super generous free tier, a startup program with $50,000 of free credits and custom plans to help you consolidate your existing spend on flags, analytics or A/B testing tools. To get started, go to statsig.com/pragmatic. That is S-T-A-T-S-I-G dot com slash pragmatic. Happy building. This episode was brought to you by Graphite, the developer productivity platform that helps developers create, review and merge smaller code changes, stay unblocked and ship faster. Code review is a huge time sink for engineering teams. Most developers spend about a day per week or more reviewing code or blocked waiting for a review. It doesn't have to be this way. Graphite brings stacked pull requests, the workflow at the heart of the best-in-class internal code review tools at companies like Meta and Google, to every software company on GitHub. Graphite also leverages high-signal, codebase-aware AI to give developers immediate actionable feedback on their pull requests, allowing teams to cut down on review cycles. Tens of thousands of developers at top companies like Asana, Ramp, Tecton and Vercel rely on Graphite every day. Start stacking with Graphite today for free and reduce your time to merge from days to hours. Get started at gt.dev/pragmatic. That is G for Graphite, T for technology, dot dev slash pragmatic.
At Google they had, I think, 25% a few months ago, a similar claim. And then I talked with someone at Google and they actually brought up saying like, you know, they're like, yeah, we have all these like AI integrations, AI code review. We have the autocomplete like they have, they have pretty much as usual, you know this because you also talk with Google, but they have the whole internally they have the stack that's available for anyone using Cursor, et cetera, just the internal version which is kind of trained on Google's data. And this engineer was saying like, I'm pretty sure they're counting accepted completions as AI generated, but that just seems weird because like yeah, I accept it when it makes sense, but I'm like reading it and I'm reading it out and they're like, we don't know where this number comes from. Like who's measuring it? They didn't even tell them. And is acceptance really AI generated? Well, technically, yes, but even before AI, then we could have said it was machine generated, a lot of your code, because autocomplete has always been very good at predicting things as soon as you start to type out the first two letters of a class. Was our code machine generated? Technically, yes, but so yes, I agree it's confusing.
[09:15]
B
Yeah. And I think your point about acceptance rate is exactly it. That's a lot of these studies that are producing these numbers for headlines are looking at accepted suggestions and that's just not a great measure of first of all the business impact. But also, even if that code made it to production, there is absolutely no, there's not a line between I accepted the suggestion and now it's running in production. So that's quite misleading. I think, you know, we could say are 30% of pull requests being assisted by AI. Like I think that is probably true at a majority of companies. And so that's just a really different magnitude of influence than 30% of code being written by AI.
[09:58]
A
We can also say that 100% of PRs at most companies, or like most larger companies have been kind of robotically checked. Right. Like the linter has been run on them, static analysis has run on them. This has been going on for years. And it catches all the obvious things.
[10:15]
B
I mean, can you imagine that headline, like, "ACME Corp only ships code to production that's been read by robots. Is this the end of software engineering?" We could certainly come up with a sensational headline to just talk about CI/CD.
[10:30]
A
As you said, you talk with a lot of engineering leaders, right? Like on companies, engineering leaders, tech leads, those kind of folks these days. What are some of the most common questions you get related obviously to AI? And what kind of sentiment are you gathering when it comes to AI from these same engineering leaders?
[10:49]
B
Yeah, I think the most common question I get is what should I be doing? I think as engineering leaders, we have operated in a space where we can pattern match in a lot of things. Right. Like if, if I'm trying to modernize my CI/CD pipeline, I can go talk to another customer of a different tool and see what they're doing and how they've modernized and look at their before and after. And in AI, we just don't have that because it's frontier work for everyone. And I think that's very exciting, but also very distressing when you're holding the bag of money and you have to figure out where you should spend that money. That's really tricky. So what should I do? The other thing is, you know, how do I measure it and how do I prove that I've made the right decisions right now? Because that is something that every engineering leader is being held to account by their exec team, by their board. How are you investing in AI and can you show me the results?
[11:46]
A
You know, tapping exactly on this. How, how can we actually measure the impact of AI? Actually, on The Pragmatic Engineer, we did some deep dives, actually, we did one of them with you on how to, you know, we figured out all the things that don't work, like lines of code or the single metrics. And we started to make progress with things like the SPACE framework and, later, the DevEx framework. What are measurements here that work, or might work, even for developer productivity or just measuring the efficiency of AI? What have you seen, you know, work out?
[12:13]
B
Yeah. What's so tricky is that as you said, developer productivity is just, that's, it's a really hard problem on its own. And when we add AI on top of it, we certainly don't reduce complexity here. So, you know, companies that had invested a lot in understanding developer experience and developer productivity are in a better spot right now to understand the impact because they have that baseline of understanding of how their teams and organization operated before. And then we can just do experimentation style. We can look at what the impact of AI is. I think for any leader who is out there sort of feeling like lost in the forest, not quite understanding what are even the measurements to look for when it comes to telling the story about the impact of AI. Abi Noda, who's the co-founder of DX, and I have just put together a new AI measurement framework which I'll just share on the screen. We can kind of talk through it. This is the DX AI measurement framework and what we recommend based on our field experience working with hundreds of companies who have been working with AI from the infancy, the very beginning, when AI was a glimmer in everyone's eyes, to now, full scale rollouts where they're seeing some pretty impressive results from using AI. We want to look at it across utilization, impact, and cost. And these are really the three areas that will give you together a really complete picture about how AI is working or not working, what you should do next and then how you can tell that story about impact in your organization.
[13:47]
A
Yeah, because I guess in the end everyone's looking for impact, right? Like it should result in something tangible. Am I feeling this right in terms of either you're like, as a software engineering organization, either you're like building more stuff, building better stuff or generating more revenue or like somehow if it doesn't help with any of those or some related things, then you know, what am I even doing right?
[14:12]
B
Yeah. And I think, you know, as an industry we've looked for output metrics to kind of quantify that end result. And at the beginning and actually what you'll see in a lot of the headlines is about the quantity of code that can be produced with AI. But this is really disconnected from everything we know about developer productivity. Developer experience like quantity of code doesn't actually mean business impact. And so when we think about measuring the impact of AI, we need to sort of track it across the lifecycle, but also really stay focused on the end result which is as you said, it's more revenue, it's reduced cognitive load for developers, it's a better developer experience, it's more time, more time to innovate these Things are really important. And focusing on something like acceptance rate, for example, just by itself isn't really going to tell you the whole story there.
[15:01]
A
I wonder if we're going to have a bit of a speedrun of what we've learned about developer productivity over, let's say, 20 years, compressed in a few years. Because I still remember that when the first kind of developer productivity products came out measuring, they started to measure things like lines of code per developer, and then they looked at number of commits per developer on average. And you know, the first product said, like, this is good, look at this. And then as an industry, we started to say, that's B.S. like, I'm sorry, the developer who pushes the most code to production or writes the most lines of code might not be your best developer because they might just be doing boilerplate stuff, updating frameworks. Actually, they might just be fixing their own bugs because they ship so many of them. So I feel. But we've had that conversation, let's say maybe like 10 years ago, and everyone agreed lines of code is not the best metric. In fact, some of the best developers sometimes don't even write lines, they delete them. But now we're now back here like, oh, yeah, AI generates a lot of lines of code, therefore it should be productive.
[16:04]
B
Yeah. And I think one of my more controversial opinions is that source code is a liability. I think it sounds controversial. And then when people think about it, they realize that, yeah, it actually is. And now we're in a world where it is trivially easy to produce a tremendous amount of source code. And so what does that actually mean for productivity and business impact when what could have been written in one line is now written in five lines? Do we really want to measure AI impact in terms of lines of code generated? I. I certainly don't. We don't recommend it. We did not include acceptance rate in our framework for good reason. I think it does give insight into whether the tools are fit for purpose. But when we're looking at broadly measuring business impact and the impact on developer experience and the impact on the business, acceptance rate is just such a tiny part of the story.
[16:57]
A
And by acceptance rate, you mean like, what percent or how many lines did the developers say accept the tab suggestion or whatever the AI is spitting out?
[17:05]
B
Yeah. So we can use that to figure out if it's just spitting you spaghetti code and suggestions that are not accurate, then acceptance rate is going to be low. And we can use it as a signal to understand that, okay, these tools are not sufficiently robust for the use cases. But if we're just looking at, oh well, developers are accepting 95% of the suggestions. That doesn't really tell us anything in terms of is it increasing their velocity, is it saving them time, is it going to help us innovate faster? Those are the things that we actually want to look at and that's what we've included in this AI measurement framework, not necessarily just those granular measurements of acceptance rate or lines of code.
[17:48]
A
I'm glad to hear a little bit of more grounded approach. And I guess it helps that you're not an AI vendor and you're not, you know, I guess your goal is to try to figure out like what actually works. Now you did an interesting research case study with booking.com and how they use AI. What did you find there?
[18:08]
B
Yeah, so for, for booking, what they really realized was adoption is the key to getting a better result. What they found was that the developers who were adopting the tool, so going from non user to periodic but consistent users and moving, you know, moving the population up from, from not using into daily, weekly usage, that was where they were seeing the most benefit. So they did some very concerted organization wide efforts around enablement for adoption, things like office hours, things like, you know, workshops and trainings. And they got their adoption up to 65% of developers using these tools on a weekly, daily basis, which is actually well above the median, which is 50% industry wide and the top quartile is 60%. So they're doing quite well according to the industry benchmark.
[18:59]
A
So just to pause here, just so I understand, so Booking was like, okay, we'd like as many devs as possible to use these tools, GitHub Copilot or whatever, or any other copilot that they had, the chat, et cetera, let's say on a weekly basis. And we're going to do training on this, leadership will say, hey, please use it or like try it out. And they did office hours, they had teams on this. And even after doing this, we're saying that about 65% of devs use it weekly or maybe daily, but mostly weekly. So that means 35% are still like, no, I'm good, I'm just going to do what I did before. Right?
[19:37]
B
Yeah, for a variety of reasons. Right. But that 65%, what's interesting is like that's still above the 20, you know, above the P75 industry wide.
[19:47]
A
But I'm just pausing here because I think there's two types of people, right, who are listening as well. It depends on what environment you are. But if you're in a startup environment or you're just like an early adopter, you'll be like, well, why would you not use it? Like, everyone using it, like, I'm not going to use it all the time, not for everything, but like, yeah, it's there. Like, I know when to use it, when to not, and they'll use it daily. And then there might be some people saying, why would anyone use this? But it's interesting that in a company that is trying to say, like, okay, let's get everyone to use it. To me, it still feels interesting how it's only 65% when they do all these things that a bunch of places don't do, from the training, from the enablement, from the investing, from partnering with companies like you. So what have you learned about that 35%, like, what is either holding them back or are they right to be skeptical?
[20:38]
B
Yeah, I think the biggest learning for me is that it's not necessarily that these individuals are skeptical Luddites who don't want to use any new technology. Some of it is just that the organization doesn't make a license available to them. They would like to use it, but the licenses aren't available. And I'm not suggesting that's the case for Booking, I want to make that clear. But I have seen that pattern repeated in many organizations. So when we think about utilization, in our framework we recommend looking at the number of daily active users and weekly active users. If you use DX to measure that, you can look at that as a percentage of your total population. And then you can look at where the licenses line up across your population, because it might be that the people who would like to use it can't yet because the licenses are not available to them. You know, some companies right now, as they're experimenting, will say, okay, we're going to get 500 licenses for Copilot, 500 licenses for Cody and whatever. So there's a limited pool to pull from. And so there's no scenario where 100% of developers could be using it. They simply haven't invested the money in making licenses available to 100% of their developers. And so I would say that's a fairly big reason why we don't see 100% adoption. I think the other thing, and you covered this in your LeadDev talk, is that for certain services, components, product areas, it's just not that effective because of the very novel or greenfield nature of the kind of code. So we can think about this on a spectrum: one end is writing something in a really defined way, like Terraform files or YAML, and the other end is doing something that no one has ever done before. AI is amazingly excellent at stuff that has a lot of structure and a lot of pattern.
But then you used the example of that healthcare startup, who wanted to remain nameless because they didn't even want to go on record saying that they don't use AI, because it just doesn't work for them. And so you can imagine at a company as big as Booking or as big as Meta and Dropbox, there are going to be pockets of developers who just aren't well served yet by the tools.
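The utilization view described above, weekly active users as a share of the whole developer population set against how many licenses exist, can be sketched in a few lines of Python. This is only an illustration: the `AdoptionSnapshot` type, the `adoption_report` function, and every number in it are hypothetical and do not come from DX, Booking, or any company mentioned in the episode.

```python
from dataclasses import dataclass


@dataclass
class AdoptionSnapshot:
    total_developers: int      # everyone who could, in principle, use the tool
    licensed_developers: int   # seats actually purchased and assigned
    weekly_active_users: int   # developers who used the tool in the last 7 days


def adoption_report(s: AdoptionSnapshot) -> dict[str, float]:
    """Express weekly usage against the whole population and against licensed seats."""
    return {
        # Share of all developers who are weekly active users.
        "weekly_active_share": s.weekly_active_users / s.total_developers,
        # Upper bound on adoption imposed by the number of licenses.
        "license_ceiling": s.licensed_developers / s.total_developers,
        # How much of the purchased license pool is actually being used.
        "licensed_seat_usage": s.weekly_active_users / s.licensed_developers,
    }


# Hypothetical organization: 2,000 developers, 1,000 seats, 800 weekly users.
report = adoption_report(AdoptionSnapshot(2000, 1000, 800))
for name, value in report.items():
    print(f"{name}: {value:.0%}")
```

Reading the two shares side by side separates "people who have a license and choose not to use it" from "people who cannot use it because no license is available", which is the distinction being drawn in the conversation.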
[22:56]
A
Well, and I can also see cases where these tools just really fall flat, where you're trying to do something very, very specific and in a concise or very performance-effective way, which is usually about understanding the whole structure and making small tweaks. For example, I was talking with a stable Linux branch maintainer, Greg KH, and I was asking, how do you use AI? And he was like, well, I mean, we don't really; they use it for tools. But if I look at every Linux commit to the kernel, it's a few lines, and those lines have been thought about for so long, and they need to be as concise as possible. Performance matters, all those things matter. And for those use cases, especially at big companies, I can imagine that if you're on a platform team optimizing the P95 performance of your Android app, you might use it for brainstorming or here and there, but in the end the changes you make are so small in terms of lines of code, but they're so large in impact, and so much of it is about testing, about wins that have not been made before, or seeing connections. So I wonder if some of that is also that.
[24:05]
B
Yeah, I want to show this to you because I think that's exactly the point that you made, that maybe the biggest gain is not in code generation; you could still use it for brainstorming, you can still use it for error analysis. This is part of that enablement and training that companies can offer in order to increase adoption and increase impact. We did a study of 180-plus companies and we looked at the developers who were saving a serious amount of time with AI, and we tried to understand, what are you actually doing? And interestingly enough, code generation, like mid-loop code generation, is the third highest use case for saving time. But actually stack trace analysis and refactoring existing code were saving more time than the mid-loop code generation. And so this is also really important for companies and platform engineering teams to understand, because I think the idea is, well, we give our developers a license for Copilot and then just expect them to kind of figure it out. And a lot of us go to mid-loop code generation because that's what is mostly talked about.
[25:13]
A
Yeah, most demos, the most obvious things, right? Like yeah, that's what I thought about until we talked about.
[25:18]
B
It's the most obvious thing. But things like, you know, putting in a 100-line stack trace and being like, what? Why is this happening? Give me a diff that would fix this problem.
[25:32]
A
Or give me four possible ideas.
[25:34]
B
Totally. Yeah.
[25:34]
A
And then two of them might be things that I didn't think about and now I can go off and research.
[25:39]
B
Yeah. And so there's no ceiling on the different kinds of use cases that AI can help with, especially when it has really good context and understands your code base really thoroughly. Code documentation, brainstorming and planning, unit tests are areas that are really well served by AI. Anything that's very well defined. But I was really surprised to hear about stack trace analysis being the top time saver, and not mid-loop code generation, because as you said, it's the most obvious thing.
[26:12]
A
And when you say mid-loop, what does mid-loop stand for?
[26:16]
B
Yeah, kind of. I can write out the scaffold of whatever function I want to write, and I can give it the input and the output, and then just say, finish my function, complete this thing for me.
[26:28]
A
But this is really interesting as well because I feel it goes a little bit against the mainstream narrative, even with developers, as you said: what is AI good for? Code generation. It generates code faster. Because I think that's what we see, right? It does spit it out. It is superhuman in terms of speed. You cannot write this fast. But if we see that these are the main use cases, stack trace analysis, refactoring, and okay, code generation is still down there, and then we have the other stuff, maybe it suggests that, as you say, we're thinking a bit wrong. And I wonder if this might also impact this narrative that now anyone can be a developer, that you don't need to be a developer to write software. But if these tools really help with stack trace analysis and refactoring, unless you're a developer, you're not going to use that. So maybe these tools are actually a lot better for experts, professional software engineers who know what they're doing.
[27:21]
B
You know, to me, the sticking point about things like stack trace analysis or refactoring code is that it's about time savings and not interaction with the tool. And what I mean by that is, number one, typing speed has never been the bottleneck in development, but now we have all this code generated faster than we can type. That's great, but it still takes me time to review that code, to cognitively make sure that I understand it and that it's accurate. And so with the time savings, it's not that we're saving time because we don't have to type. A lot of that time we're just reallocating to reviewing or other parts of code authoring that aren't typing. For stack trace analysis, we're actually just eliminating the toil completely of parsing through this huge output, trying to figure out what's going wrong, and then going spelunking in the code. And so that is truly a net positive time saving. To say, give me four examples, or what's the most likely cause of this, I can just totally leapfrog that whole 45 minutes that I would have spent banging my head against my keyboard trying to figure it out. Whereas if I'm using it for code generation, yeah, it's faster, but I also have to invest time reviewing that code to make sure that it's accurate.
[28:45]
A
Reviewing it, iterating on it, you might refactor it. Oftentimes, I think as developers we know that it generates something, but if you have some coding style or a way of coding, you will tweak it, rewrite it, change it, it gets it wrong, et cetera. So, okay, this makes sense. Going back to the first one, where it truly saves time: what happens when it saves time? Let's say, until now, I've usually had to spend a bunch of time analyzing the stack trace. And I was stuck on it for, you know, 15 minutes at first. By the way, with experience, it would have gone down to like 10 and then 5, and now I'm a senior engineer and boom, I look at it, I know what that is. But what happens with the time saving? What would that mean for me as a developer or for me as an organization? Do I just clock out earlier? Will I now have a bit more space to help out others, to start to think about bigger things instead of the day-to-day, and some strategic stuff? Where have you seen it? Because this is not new, right? Developers saving time. We've seen this with other tools as well. The big question is, the organization thinks, oh, if my developers save 10% of their time, each of them, on average, I can either fire 10% of them, this is the, you know, the big evil corporation, or I can just not hire and my productivity goes up 10%. This is what the business thinks, but it's not really what happens, is it?
[30:09]
B
It's definitely not what happens. I think one thing to keep in mind is that on the very best day, developers are not spending even 80% of their time coding. I think the industry average is like 25%. There was a study at AWS finding that an average AWS engineer only spends 20% of their time coding. And so when we apply AI to the coding tasks, we're only working with 20% of that time to begin with. And when we save 10% of that time, that doesn't amount to, oh, we can ship 10 new product lines overnight. That's just not realistic. I think this is, though, where things get a little bit weird. So what happens with that time? Because we would like to think, oh, this is going to be really great for developers. They're saving time, they can reinvest that time in tech debt repayment or other stuff. When DORA researched this question, what they found was that many developers were actually feeling less satisfied because AI is accelerating the parts that they enjoy. And so what was left over was more stuff that they didn't enjoy: the toil, the meetings, the administrative work. And so that was an interesting result. And that's from their guide on AI engineering that came out a couple months ago. It gave me pause when I read that, because I've always had the very strong conviction that AI time savings are not going to come from the coding task. It makes sense that that's the obvious place where we all started. But organizationally, how fast we can create code has never been the bottleneck; it's been everything around it. And now when we take away or make faster the code authoring process, for people who like to author code, that has some impact.
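The arithmetic behind that point is worth spelling out. A minimal sketch, using only the two figures quoted in the conversation (roughly 20% to 25% of time spent coding, and a 10% saving on that coding time); the function name `share_of_total_time_saved` is made up for illustration:

```python
def share_of_total_time_saved(coding_share: float, saving_on_coding: float) -> float:
    """Fraction of a developer's total time freed up when only coding time shrinks."""
    return coding_share * saving_on_coding


# Coding shares quoted in the conversation: 20% (the AWS study) and 25% (industry average).
for coding_share in (0.20, 0.25):
    saved = share_of_total_time_saved(coding_share, saving_on_coding=0.10)
    print(f"coding share {coding_share:.0%} -> {saved:.1%} of total time saved")
```

A 10% saving applied to a 20% slice is about 2% of the working week, which is why a headline "10% faster" on coding tasks does not translate into 10% more output for the organization.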
[32:02]
A
Well, you know, when I was a developer and then was a manager with developers, I was thinking, what is a good day as a developer? And I think an average good day, and again, there can be different days, was like: I come into work, we say hi to people, maybe we talk about something. Usually we don't have meetings. I have a clear goal in mind. It's something challenging that I want to complete. Maybe it's from yesterday or I'm just starting fresh. I get into the zone, I get it together, I get it working, I clean it up, it's working, it's amazing. I commit it, I test it, I check it, I put up a pull request and I'm done. And if this happens at 2pm and it was something challenging, it should have taken me eight hours but I got it done in four, and I'm really proud of how it works. Maybe I could clean up, and then I help out some people. When I had that, it's been a good day. And the bad day is the opposite. I get into work, I have this thing that I need to do, and I get interrupted. I go to a meeting and I try to go back, and now another meeting, or I finally get back into it, but now I'm just stuck. It's more complex. And I go home, and when I'm falling asleep I'm still thinking about this goddamn thing, and I actually want to open my laptop, and I don't sleep well. We used to do things before AI, like no-meeting days and bunching meetings, to give people the chance to be in the flow. Because there is something about that: when you are in the flow for a day, for some time, it's good, right?
[33:39]
B
It feels good.
[33:41]
A
You're a developer, you still build software or used to build software, right?
[33:46]
B
Yeah, both. I think this is going back to that question about measurements and how we measure the impact of AI. This is one area that I'm personally very curious about, which is: does AI allow us to manage interruptions better? Does it help developers stay in flow state? Does it reduce cognitive load? And these are numbers that aren't necessarily going to show up in time savings per developer, but will show up in other areas of developer experience. Because one hypothesis is about the tax that you pay when you have to switch tasks. So maybe you have focus time from 10 to noon, then you have a meeting from 12 to 12:30, and then you can focus for the rest. You know, there's that tax beforehand and the tax after. Does having an AI coding assistant reduce the amount of time that it takes you to get back into flow? Because you basically have a body double or a pair programmer, so to speak, who is holding and keeping context for you, and it's easier for you to pick up where you left off. This is really difficult to measure systematically with workflow data alone, which is maybe another thing that I didn't emphasize when I talked about the measurement framework. But combining self-reported metrics with system and workflow metrics is absolutely essential when measuring the impact of AI tools, because it does have an impact on the authoring experience as well. And some of that we cannot observe from our systems. We actually have to talk to developers in order to figure it out. So change confidence, developer experience measurements, CSAT for the AI tools, those are all really important parts, because we might miss important signal about how AI is impacting the code authoring experience or other parts of the software development life cycle if we're only looking at the workflow tools themselves.
So we need to have a more robust, comprehensive way of measuring across the organization.
[35:52]
A
So let's talk about what is working and what you've seen working. And one of the things that we previously talked about is how some of the teams who are actually making pretty good use of AI are starting to make some architectural changes to their code base and make some architectural decisions to make it easier to read for like let's say agentic modalities. What have you seen work and how is this coming along?
[36:17]
B
Yeah, there's two broad things that I'll talk about here. One is the architecture itself, and the other is the discoverability of the architecture and of the system, and there's a lot more going on there. On the architecture itself, what I have seen, just anecdotally from my own conversations, is leaders recommitting to clean interfaces between services. I would say that's probably the top thing that comes up.
[36:42]
A
Nice. That's kind of a nice thing to come out of this.
[36:44]
B
Great. I love to see it. I think, you know, we can think about AWS's "everything is an API" as a model here. When your own systems operate like that, the interfaces are so clear and well defined that it becomes easier for agentic models to use your code base, because the boundaries are more clearly defined.
[37:04]
A
And so that's, by the way, also for humans, right?
[37:06]
B
Also for humans. Yeah, it works really great. And that point, also for humans, is the interesting point about documentation, because this is the shift that I'm seeing more often. I was actually in Amsterdam while you were in Mongolia, otherwise we could have had another steak. But while I was in Amsterdam, I did a fireside chat with about 45 other engineering leaders, and very quickly into Q and A the question was, should we be writing documentation for AI or for humans? And my answer to that question is, yeah, both. But here's one thing that I've seen pick up, I would say, in the last six weeks. Human documentation often has visual dependencies. It'll be a screenshot of something. It needs this sort of narrative flow. Whereas for AI, it's really good to have the coding examples. There can't be visual dependencies. It doesn't work great that way, because developers aren't necessarily going to a documentation page or watching a YouTube tutorial to see how to use your thing. They're going to Claude, to ChatGPT. They're interacting in their IDE with an AI assistant and trying to implement it. And so documentation needs to be there for AI so that the developer gets the information they need in the best way. And I think companies that are going AI-first, like Vercel or Clerk, for example, have really, really solid examples of AI-first documentation, because it's a flywheel. They have great documentation. Then when a developer's trying to implement this thing, they actually get a good suggestion and can do it successfully from within their IDE with whatever coding assistant. Then that coding assistant has more data about what actually works, and it just keeps reinforcing it. And so for internal development teams, like platform teams, this is a great model to think about.
Like, how can you make documentation that gets people the information they need at the moment they need it, which is now in the editor, not necessarily in your documentation. And then for external developers, if you're making a dev tool that's out there in the ecosystem, this is the way that people are discovering your tool and implementing it. The way that developers are coming across tools and using them is really different. So that's been the biggest way that I've seen companies think about, or already start trying to change, the way that they architect their services, but also document their services, to make them work better in this AI-assisted coding world.
[39:48]
A
That's interesting. And I also like your thinking of how this creates a bunch of opportunities, especially for companies and startups who are building APIs or things for developers to use. If you make it easy for them to use and also make it a bit more friendly for AI crawlers or any of these to ingest, you might be getting more users later on, or unblocking your users. Because in the end, I'm going to guess that in the future, two years from now, developers will be like, all right, I want to create a project using this technology. As a developer, you will specify this technology. If not, it'll default to whatever. But if you have too much trouble with the technology, eventually there's going to be this learning thing. You will choose the technology as you do today, the one that you're familiar with. You know it works. You know that if you get stuck, it's easier to deal with. So these things will remain important. Technology, SDK, vendor, reliability, maintainability. These things are going to remain important for professional software engineers, which we are and we will be.
[40:48]
B
I like what you said about how this is good for human beings and it's also good for AI. There are so many areas of developer experience, the world that I operate in, where what's good for the developer and what's good for the business is a circle, that Venn diagram. And what's good for the AI agent and good for the human being is also a circle, like clear boundaries between services. I think that's such an interesting space to operate in. It's like we needed AI as the business kick in the pants of, hey, we're not going to get as much out of our investment in AI, because now our wallets are open. Before it was like, well, read the manual, kind of work around bad tooling, but now there's significant financial investment around it. Not that development teams aren't a significant financial investment, but I think the tolerance is different, you know.
[41:40]
A
Yeah, I like your thinking. This is a good Venn diagram to draw, because when you onboard to a new company, it's always been a problem. Onboarding has just been difficult. The onboarding documentation is out of date, the presentation is out of date, no one tells me how to do this. So then people take, you know, a month to get productive. And now you can say, okay, let's update it, so now our AI tool can also be helpful. But it also goes both ways, right? When people onboard, they can read the docs, and they can now turn to the chat agent, which will actually give them accurate information. I love it. I feel there's a lot of wins like this. It's always nice to discover these. So speaking of wins, as an engineering leader at an organization that is adopting AI, there's either a mandate or you want to do it, or both. How can you measure what is working? Before, you told me about this company called Workhuman who did something similar, where they actually figured out what to measure and how to measure it in a practical way. How did this work? What was the story there?
[42:45]
B
Workhuman, I think like many other companies, started working with Copilot. It's very accessible to most developers. And what they found was they knew that it was working. They heard from the developers that they enjoyed using this tool. But what was really hard was figuring out how to quantify how much it was working and where it was working. And so what Workhuman did was use those metrics in the AI measurement framework, across utilization, impact and cost, to try to draw a map of where things are going well and where things are not going well. What they found was that by looking at developer experience more broadly, they were able to figure out that AI has a good impact on their company because it improves developer experience. And I think this is, very broadly speaking, the advice that I give to every single company when they're trying to figure out, how do I reason about AI's impact on my company? I tell them AI is a tool to improve developer experience. When you improve developer experience, you have better outcomes. It's the same pattern at every single company. AI isn't this magic bullet that's going to solve everything. We're talking about improving developer experience. And so Workhuman found that it's an 11% boost in developer experience.
[44:08]
A
And how did they find? Was it via kind of a survey? Did they measure data? Is it a mix?
[44:15]
B
Yeah, they're measuring a mix. They use DX, so they use the Developer Experience Index, which is our composite metric of 14 research-backed developer experience drivers. So everything from incident management to local dev iteration speed, lots of these different factors that play a big, important role in the day-to-day work of developers. So they're seeing an 11% gain, and that's all correlated as well to time savings.
[44:43]
A
So, and just so I understand, I think for anyone listening: they were measuring these things before, right? And then as they rolled it out they kept measuring it, and now they're seeing a gain. Right. Because you can only get an improvement on something that you measure. So if I'm working at a company where we never measured anything, I can measure now and have a baseline, but unless I did it before AI, it's going to be a bit harder for me to tell how much it has improved, because I don't have data from before.
[45:12]
B
Yeah, exactly. And so my other general advice is, start measuring now. We can pull in historical data to cover some gaps. We can look at GitHub and Jira and those tools historically. When it comes to surveying on the developer experience and those self-reported things, just start small, just get started. Don't wait to hire that other person, don't wait to do anything. You're just delaying success. So get started, get a baseline. That's what helped Workhuman be able to figure out what the biggest gains were.
[45:45]
A
And so what were the gains that they found? 11% across the board. What does that.
[45:51]
B
Yeah, so they measured organization-wide and found that developer experience went up 11% across the whole organization, which was great. And the developers who use AI, so we're just segmenting here, daily and weekly users compared with users who are not, had a 15% higher velocity. So they were able to ship more code to production, get more stuff done than non-users. These numbers are from several months ago, and what they've noticed is it just keeps compounding and getting higher and higher. So this is a great example of, you know, if you and I are developers, we know that these tools are delightful to use most of the time. We know that they make a big difference when it comes to enjoying work and bringing the joy back to doing development work for a lot of us. But going to your VP or CTO or board and saying, well, the developers like to use them, that's a hard sell to keep the wallet open for getting more funding for these tools.
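The segmentation described here, comparing regular AI users against non-users on a throughput measure, comes down to a relative difference between two cohort averages. A rough sketch follows; the `relative_uplift` helper and the sample numbers are invented for illustration and are not Workhuman's data or DX's actual calculation.

```python
from statistics import mean


def relative_uplift(users: list[float], non_users: list[float]) -> float:
    """Relative difference in average throughput between two cohorts."""
    return mean(users) / mean(non_users) - 1.0


# Hypothetical merged changes per developer per week in each cohort.
ai_users = [5.0, 4.0, 5.0, 4.4]    # daily/weekly users of the AI tool
non_users = [4.0, 3.0, 5.0, 4.0]   # developers not using the tool

uplift = relative_uplift(ai_users, non_users)
print(f"AI users ship {uplift:.0%} more than non-users in this made-up sample")
```

A gap like this describes a difference between cohorts at one point in time. Pairing it with a baseline from before the rollout, as discussed earlier in the conversation, is what makes it possible to say how much changed.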
[47:01]
A
Yeah, yeah, because, because these things cost money. Especially now with the agentic tokens, you know, they can burn through a lot of money.
[47:07]
B
Yeah, absolutely. That's actually another thing that you know, is worth discussing here because in a lot of these, you know, companies that have been using AI for a longer time and are a little bit more mature. They were sort of operating in that binary. You have a license or you don't have a license. Right now one of the biggest challenges when it comes to measurement in front of us is looking at consumption based pricing and figuring out who are the developers that have the most to gain. Who are the developers that have the least to gain from being able to have access to more tokens. What are the use cases that have the biggest impact, for example, stack trace analysis. Because what I'm hearing from engineering leaders right now on the ground is I just don't know how to allocate the buckets of money. Do I give a bigger bucket of money to senior engineers than junior engineers or is it the other way around? Do I give a bigger bucket to junior engineers because I can get more, you know, productivity value from them with AI assistance and the senior engineers don't need as much.
[48:13]
A
This is just so interesting because I feel a little bit of deja vu now. If you remember back, the last time in the tech industry where we've had companies spend thousands of dollars per year on developer tools was around 2000 to 2010, where a lot of startups and a lot of tech companies spent about $3,000 and up to five or up to $8,000 per year per developer on Visual Studio licenses, which was not just for Visual Studio but also for documentation. This was back when documentation was terrible on the Internet, and that was amazing. And access to early release software, where they could use like SQL Server, all these things. It was huge amounts of money, and even startups paid, because they could get an all-in-one development kit for thick clients, so Windows applications, web, database servers, et cetera. And this lasted for a few years. It kind of died out. But there was a time where almost every company did it. They could have just used open source, but it was slower, and they were like, oh, if we're paying 100k for a developer per year, and even back then it was that, we'll pay 8k more or 10k more per year to get them more productive and to get ahead of the competition. So we've had this before, but now it feels we're getting back to this, where there will be companies that say, you know what, let's just bite the bullet and do it even if we don't have the data. Shopify is doing this. They have no budget limits. And there are the companies that are like, I'm not sure, it feels expensive. We're used to like $200 per year per developer allocation, that kind of stuff.
[49:49]
B
Yeah. And I think history always repeats itself. Right. And when you're of a certain vintage, as you and I are, we see these patterns over and over again. And so that's been providing maybe some comfort to me. I've lived through quite a lot of hype cycles. I was talking with Jesse Adametz from Twilio yesterday, who leads their developer platform, and he and I were both in the thick of the Kubernetes hype. We were comparing notes on how this feels compared to the Kubernetes and container hype. There's a lot that's the same, there's a lot that's different. And the hype eventually all concludes one way or another. We're not still living in the container hype cycle. Eventually we're going to see some stabilization. Right now we're in this Cambrian explosion of tool sprawl, and there's just so much unknown; pricing is unknown. And eventually we're going to consolidate and come to something that's a little bit more stable. But it's probably going to take us a year or two to get there. But Gergely, I would not be surprised in 18 months if we're spending, yeah, like, what did you say, 3,000?
[51:01]
A
It was 3,000 to $8,000 per year. So if you. That was per year. So on a monthly basis, that would have been something, you know, 400 or 300 to like $800 per month per developer on those tools back in 2000.
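As a quick check on that year-to-month conversion, here is a small sketch using only the $3,000 and $8,000 annual figures quoted above (the `per_month` helper is just for illustration). Exact division gives a range somewhat lower than the off-the-cuff numbers in the conversation, though the order of magnitude is the same.

```python
def per_month(annual_cost: float) -> float:
    """Convert a per-developer annual tooling cost into a monthly figure."""
    return annual_cost / 12


low, high = per_month(3_000), per_month(8_000)
print(f"${low:,.0f} to ${high:,.0f} per developer per month")
```

These are nominal dollars from the 2000s and are not adjusted for inflation.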
[51:16]
B
Yeah. And so, I mean, we can adjust that for inflation. But my, my sort of prediction has always been, you know, or is right now, if we think about 18, 18 months in the future, I don't think it's unrealistic to spend 1200-US$2,000 on an agent who can complete tasks autonomously, even if they have to be verified by a human in the loop. And, you know, I think that there are going to be companies who are willing to open their wallets because maybe this does allow them to avoid increasing headcount at the previous rate, or it just allows their senior developers to be spending more time on more complex work, which is another thing that we've seen in our data at DX. So we have a Core4 measurement framework, which is our sort of evergreen, solid foundation of developer productivity. I'll actually I can show that right here for those who are curious to see it. But one of the things that we look at is speed. And when we think about measuring AI, we see AI as like an enabler and we're going to see the impact on all of the core measurements of productivity and performance. It's not that we need to rethink everything because AI exists. Like we still need to just go back to our fundamentals, understand what performance means, and then see the AI impact on it. And specifically in this speed category, what we've seen is that the diffs per engineer increase, so we're able to get more throughput, but also the complexity of them increases as well. So AI users are able to work on more complex work and get more of that work through the systems to production, which is interesting, part of me thinks, is that complexity, good or bad, we see when we triangulate complexity, this.
[53:04]
A
This also goes back a little bit to source code being a liability, right? We're also seeing diffs increase, and eventually opportunities for bugs will increase. In fact, bugs will probably increase unless you have more thorough testing.
[53:17]
B
You know what, you're very right about that. And actually that's a trend that we're seeing already. Let me show you this from the.
[53:24]
A
Yeah, I'm speaking to the future here. Predicting from first-principles thinking.
[53:30]
B
This is from DORA's Impact of AI study that they released a couple of months ago. What they've seen already is that delivery throughput is actually slowing a little bit. My hypothesis is that this is because batch size is increasing. And this is the thing about AI: it doesn't change the fundamental physics of things we already understand to be true about software development. Bigger batch sizes are riskier. So we want to keep batch size small.
[53:58]
A
But here, the batch size is usually the diff size.
[54:01]
B
Yeah, exactly. How much work is being shipped? Is it a small chunk or a big chunk? And then here's this kind of forecasting: if AI adoption increases by 25%, they actually predict a 7.2% reduction in delivery stability. We can hypothesize about a number of different reasons for that. Part of it goes back to that fundamental thing that bigger changes are riskier. AI makes it trivially easy to write very, very big changes all at once. And this is something we considered when we put together the AI measurement framework. I think one of the biggest risks about measuring AI is that when we get tunnel vision on stuff like lines of code or acceptance rate, we miss the picture about quality, stability, reliability, and maintainability. And we can't get short-term gains at the sacrifice of long-term stability. We know that's not a viable solution. In order to protect yourself, you have to have good measurements in place so that you're seeing all parts of the picture and not just hyper-focusing on the speed gains.
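[Editor's illustration] One way to keep batch size in view alongside speed is to track diff sizes directly. This is a minimal Python sketch, assuming that lines changed per merged change is an acceptable proxy for the batch size discussed above. The function name, the 400-line threshold, and the sample numbers are all hypothetical and are not taken from DX or DORA.

```python
from statistics import median


def batch_size_report(diff_sizes, large_threshold=400):
    """Summarize diff sizes (lines changed per merged change).

    large_threshold is an arbitrary, hypothetical cutoff for a "large" change.
    """
    if not diff_sizes:
        raise ValueError("need at least one diff size")
    large = [size for size in diff_sizes if size > large_threshold]
    return {
        "count": len(diff_sizes),
        "median_lines": median(diff_sizes),
        "large_share": len(large) / len(diff_sizes),
    }


# Made-up sample data: diff sizes before and after an AI tool rollout.
before = batch_size_report([40, 55, 80, 120, 30])
after = batch_size_report([60, 450, 900, 150, 35])
print(before)
print(after)
```

A rising median or a rising share of large changes would be a prompt to look at stability and quality measures as well, rather than a verdict on its own.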
[55:13]
A
Yeah. And as a software company, just taking this a little bit further, let's put our optimistic glasses on, where this thing works and most of the code is good, and we can even do a bit better testing with AI, so the quality will not degrade that much or might stay the same. What is the best-case outcome? Because people are starting to ask this question; I'm seeing it on social media. AI has been around for two and a half years, coming up on three, starting from ChatGPT. It has been around for longer, but let's just take that for the sake of it. As an end user or a customer of a company that has invested heavily in AI, may that be Google, Microsoft, or a startup, what should you be seeing? Obviously, from the company's perspective, maybe they're doing the same with fewer people. But should you be seeing higher quality, more frequent iteration, better bang for your buck, if you will? No price changes and more functionality? What might we see? Or is it just like the cloud, which has been a cost exercise? Reliability might be higher in some cases, or dependability will be good, but end users don't see anything. You don't know if the service or company that you're using has its own infrastructure or uses the cloud. You don't care. The company very much cares, and they can do all sorts of tricks there.
[56:32]
B
I think it's all of the above, most likely. So as an end user, what I expect is faster time to market and that's really on, you know, on the other side, on the building side, what we're trying to emphasize and what a lot of our conversations have focused around with other, you know, executives and engineering leaders, we're really trying to reduce the amount of time to market.
[56:53]
A
Yeah.
[56:54]
B
So I think this has a lot of implications. You know, software is developed usually right now very sequentially. We have a roadmap and maybe we have a Gantt chart of what we're doing now.
[57:05]
A
We have a PRD, we have a meeting with all these business stakeholders, because we know it will be expensive to change later. Right.
[57:12]
B
This is another one of my maybe unconventional, maybe a little off-the-rails opinions right now. But I think roadmaps are on their way out. In the age of AI, I think the companies that are going to win with AI are not the ones that think about things in sequential roadmap form, but the ones that think about it more as experiment portfolios. Rapid experimentation and trying to figure out what delights your customers is going to help companies win. I think the companies that will win are the ones that already have the muscles to do experimentation and A/B tests, trying to figure out how to delight their customers. As an end user, what I don't want is thrashing, and I could see that happening, because now there are fewer reasons to say no to things. There's probably good reason that some of those things weren't built yet and are sitting on your backlog. Now that it's, you know, not trivial but much easier to build those things, that doesn't mean that as an end user I'm going to find them useful. I don't want to see thrashing and feature bloat. What I do want to see is faster time to market for the things that have been validated and experimented on already and that really do delight end users. The same is true of infrastructure: I don't care whether the application I'm using is running on Kubernetes or not, or if it's running in Azure or in AWS. I really don't care. I just want the end user experience to be great, and I don't care if AI was used in development or not. I just want the great experience as an end user.
[58:51]
A
As a company who might be rolling these out, either as an engineering leader or as a tech lead there: have you seen a good rollout for these AI tools? Is there a case study or a company that you might have observed who actually did a pretty good job in figuring out what to measure, how to roll out, and how to deal with things like reliability, those kinds of things?
[59:14]
B
Yeah. You know what's wild, Gergely, is that I'm seeing that highly regulated industries, finance, insurance, pharma, are having the best results from introducing AI tools. My reasoning is that they have to be so deliberate and structured in rolling out. And what we have found is that structured rollouts get the best results. It's that whole "slow is smooth and smooth is fast" kind of thing. I've had many conversations with very large banks who are far ahead, I would say, of their tech counterparts, even at smaller companies, because they've had to be so intentional with acceptable use policies, with licensing, with budget and finance, and making sure...
[60:08]
A
How you're not going to have sensitive PII data leak.
[60:11]
B
Absolutely. And the more intentional and structured a rollout is, the higher chance it has to be successful. So that's one thing: structure is everything. One of the companies that actually has a really good story about structured rollout and adoption is Indeed. Indeed.com is a global job site and talent matching service. They do a lot with helping people find careers that are meaningful to them. Indeed is operating at enterprise scale, and one of the things that is very much built into the fabric of what Indeed does is this experimental mindset. One of the ways that showed up in their structured rollout was actually trialing a bunch of different tools and doing a cohort analysis of which tools were working better for which use cases and which individuals. Because, as I said before, we're in kind of a Cambrian explosion when it comes to all the different tools. It can be very overwhelming for engineering leaders or executive teams to have a bag of cash and want the confidence that they're actually spending that bag of cash on the right stuff. Their wallets are open. So what they did was segment groups, do trials of different tools, and then look at the results comparatively. From there they figured out, okay, let's go with these two tools. I've seen other companies take the same approach and end up with just one vendor. But this is a very structured, methodical approach to run a controlled experiment and figure out which tools are giving us the most gains. I think the other thing that they did, experimentation-wise, is start from a hypothesis. For example, high-latency code reviews are bad for everyone. They've got teams all over the world, and say you're a developer in Seattle working with a developer in, I don't know, Vienna, Austria, which is where I am. 
So obviously, if I'm in Seattle and I do something at the middle or end of my day, my coworker in Vienna is probably already sleeping or offline, and this just leads to high turnaround time. So they had an idea: what if we could use an AI tool for code review? Maybe it's not going to be perfect or as thorough as a human code review, but at least it could unblock someone early, at the time that they need it, and give them preliminary feedback. That's closing the feedback loop, which we know is an important part of developer experience. So does that hypothesis hold up? That's the one thing that I really appreciate about their approach: they treat these like experiments with hypotheses that they're trying to validate or not, and not just, "Oh well, let's just use AI for code review. That certainly sounds like it's going to help." They're taking a really methodical approach, and code review was one of those things, to shorten feedback loops. They're also looking at different use cases to figure out which ones are the most impactful. Unit tests, for example, are a really good use case, and they're figuring out how they can then spread that across the organization. There are also these organization-wide change initiatives, which we can think about like adoption. This is a use case that's come up a bit in the academic research world. There was a paper about how Google is using AI for migration, with some different tricks. I see that pattern a lot; I think I've had probably four conversations about that this week, and it's only Tuesday morning. So that's a really common pattern right now: using AI for migration.
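[Editor's illustration] The cohort comparison described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not Indeed's actual method: the tool names, use cases, and numbers are invented, and the metric is a hypothetical "hours saved per week versus baseline" reported per developer.

```python
from collections import defaultdict
from statistics import mean


def compare_cohorts(records):
    """Average a metric delta per (tool, use_case) cohort.

    records: iterable of (tool, use_case, delta) tuples.
    """
    groups = defaultdict(list)
    for tool, use_case, delta in records:
        groups[(tool, use_case)].append(delta)
    return {key: mean(values) for key, values in groups.items()}


def best_tool_per_use_case(summary):
    """Pick the tool with the highest average delta for each use case."""
    best = {}
    for (tool, use_case), avg in summary.items():
        if use_case not in best or avg > best[use_case][1]:
            best[use_case] = (tool, avg)
    return best


# Hypothetical trial data: (tool, use case, hours saved per week vs. baseline).
trial = [
    ("tool_a", "code_review", 2.0),
    ("tool_a", "code_review", 4.0),
    ("tool_b", "code_review", 1.0),
    ("tool_a", "unit_tests", 3.0),
    ("tool_b", "unit_tests", 5.0),
    ("tool_b", "unit_tests", 3.0),
]
summary = compare_cohorts(trial)
winners = best_tool_per_use_case(summary)
print(winners)
```

Comparing averages like this only shows the shape of the analysis. A real trial would need larger cohorts, a pre-rollout baseline, and some check that the differences are not noise.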
[63:52]
A
AWS is also doing something like that with their Q Developer Pro: migrating from an older Java version to a newer one. Apparently it was trained on that. I've heard that from internally. And apparently, from what I've heard, it's hit or miss, but sometimes it can work really great. So yeah, just plus one to migrations.
[64:09]
B
Yeah, absolutely. And so they're taking this approach of like let's look for those use cases and run some experiments to validate that they're actually working and then try to roll them out.
[64:18]
A
You know, you mentioned at the beginning how developers can get demotivated if AI is doing the stuff that they like doing. But migrations are something that, I don't know, I'm going to speak for myself: I always hated doing them. They always took longer, and it was a drag, and I think everyone hated doing it, because you just want to get it over with. You want to get it right, but...
[64:40]
B
But undifferentiated heavy lifting.
[64:42]
A
Yes, undifferentiated heavy lifting. The business doesn't care about it. As a developer, you're not going to be a much better developer. Obviously you're going to be a better professional, but it's not the thing you're excited about.
[64:54]
B
Well, can AI reduce the cost of change or reduce the complexity of change? And so Indeed and many other companies that I speak with are thinking: what if we could proverbially knock at the door of our development team with a PR that includes the migration? It's already been tested, it has an ephemeral environment, it's already been run, and all they have to do is press approve.
[65:14]
A
Well, and you review it. But reading through the changes that have been done is a lot easier with a migration, and so is figuring out whether this matches our coding standard, with the modern language features that I expect, as opposed to looking it up. Because with a migration, especially from an old framework version, you have to look up how it used to work, which is hard to find, and the documentation is outdated. And then obviously you want to get the model right. But yeah, I wonder if this will be a great use case, by the way, because who wants to do it?
[65:47]
B
To start with, here's a top tip, one of my favorites in my arsenal of tips. I think a lot of folks approach a migration by giving the model one file and saying, "Okay, migrate this to whatever, upgrade this to the next version." That's going to give kind of mixed results. A different technique is to actually do one of the migrations by hand, give the diff, or give both files, to your model of choice, and say, "Give me a prompt that will reproduce this result for subsequent files that match the structure and format of file A." You're going to get a prompt that more closely matches the actual work that needs to be done, instead of starting with, "Oh, you know, upgrade this file from whatever version X to whatever version X2." I guess we can call that prompt engineering. We have a lot of tips like that in the guide to AI-assisted engineering, which you can find on DX's website. We went through and interviewed something like 180 different companies and power users of AI to try to figure out what some of the things were, like prompt engineering, great system prompts, recursive, adversarial prompting, all of those different things. My tip about migrations isn't in there because that came at a different time, but there are lots of different techniques that developers can use to get more out of their tools, because it's not as easy as throwing a license at your developers and hoping that they figure it out. They really do need support and enablement and training, and that guide can play a good part in that.
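[Editor's illustration] The migration tip above can be sketched as a small Python helper that turns one hand-migrated file into a request for a reusable prompt. This is a minimal sketch: `build_meta_prompt`, the prompt wording, and the sample file contents are hypothetical, and sending the text to a model is left out because it depends on whichever tool you use.

```python
import difflib


def build_meta_prompt(before: str, after: str, filename: str = "file_a") -> str:
    """Build a request asking a model to write a reusable migration prompt.

    before/after are the contents of one file migrated by hand.
    """
    diff = "".join(
        difflib.unified_diff(
            before.splitlines(keepends=True),
            after.splitlines(keepends=True),
            fromfile=f"{filename} (before)",
            tofile=f"{filename} (after)",
        )
    )
    return (
        "I migrated one file by hand. Here is the diff:\n\n"
        f"{diff}\n"
        "Give me a prompt that will reproduce this result for subsequent "
        "files that match the structure and format of this file."
    )


# Hypothetical example: a hand-migrated file that swaps one library for another.
old_source = "import old_lib\nold_lib.run()\n"
new_source = "import new_lib\nnew_lib.run()\n"
prompt = build_meta_prompt(old_source, new_source)
print(prompt)
```

The prompt that the model returns would then be applied to each remaining file, with the results still reviewed by a person.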
[67:24]
A
Amazing. And as closing for engineering leads, tech leads who want to stay grounded and they do want to use AI tools when it makes sense, how can they get better at kind of avoiding the hype and sticking to what works? And what is your advice to them?
[67:41]
B
Data beats hype every time. So I think it's of course important to keep an eye out in the industry and look for opportunities, look for new and novel approaches and think how you might be able to fold that into what you're doing. But ultimately, AI to work on an organizational scale needs to be thought about as an organizational problem and you need to have really solid organizational hygiene when it comes to measuring your performance. Treating AI as an experiment, trying to figure out then what's the impact of AI. So get your baseline measurement as quickly as you can and then start running experiments. Don't just expect AI to be a silver bullet or that every engineer is going to inherently know exactly how to use it, because just like any other tool, we still need training and enablement. So when you have the data and you can storytell around the data, that's gonna also protect you from the hype cycle and maybe some unnecessary pressure from inflated expectations from the media. So that's my advice. Data beats hype.
[68:43]
A
Love it. So to wrap up, I'll just do some rapid questions, if that's okay with you. I ask, and then you just shoot what comes. What's a tool, digital or physical, that you love using, and why?
[68:55]
B
I have to shout out Granola, because there has never been a tool that I've used that has increased my quality of life as dramatically as Granola has. I'm a bad note taker and a forgetful person in general.
[69:10]
A
And this is the AI meeting note taker, right? Yeah.
[69:14]
B
So, what Granola does: of course, make sure that you get permission from the person that you're in a meeting with. It's an AI note taker, but what I love about it is that I can take my own disjointed notes, and then Granola comes back and fills them in with all of the context that I missed. It's really magical. This was one of the times that I was like, wow, AI really is amazing. It's been such a transformational tool.
[69:41]
A
What is a book that you would recommend and why?
[69:44]
B
There are two books that I want to recommend, if that's okay. The first one is more of a business book, and it's called Write Useful Books by Rob Fitzpatrick. This will get your writing clear and snappy and cut out a lot of the fluff, so it's incredibly useful. It's meant to be skimmed, and you can really digest it in about an hour. So use it. The other book is not a professional book, but I think it's useful nonetheless. It's called Unsavory Truth by Marion Nestle. She's a food scientist, author, and historian. It really breaks down food marketing and the food lobby, mostly in the United States, and how everything is constructed. Right now we're talking about AI hype, we're talking about media literacy and data literacy. No other book has strengthened my understanding of how deep this all goes, with politics and funding and lobbyists, as much as that particular book. It is about food and not about tech, but a lot of the concepts are really transferable, and it's just very fascinating.
[70:47]
A
Well, amazing, Laura. It was nice to go a bit deeper into the data and see what actually works and what doesn't. So it was really good to have you here.
[70:59]
B
Yeah, thanks, Gergely.
[71:01]
A
I hope you enjoyed this data-driven conversation with Laura. I really like how Laura said that the best way to deal with hype is with data. To check out more data and the detailed case studies that Laura referenced, see the show notes below. For more in-depth reading about how we can use AI tools as software engineers in a grounded way, check out the Pragmatic Engineer deep dives, also linked in the show notes. If you enjoy this podcast, please do subscribe on your favorite podcast platform and on YouTube. This helps more people discover the podcast, and a special thank you if you leave a rating. Thanks, and see you in the next one.