
In this episode, Simon speaks with Joe Magerramov (VP & Distinguished Engineer) to explore the trans
A
This is episode 753 of the AWS podcast released on January 26, 2026.
B
Hello everyone and welcome back to the AWS Podcast. Simon Elisha here with you. Great to have you back. I'm joined by a super special guest: Joe Magerramov, who is VP and Distinguished Engineer at AWS. Joe, welcome to the podcast.
A
Thanks, Simon. I'm really excited to be here today.
B
It's amazing to have you here. You've been at Amazon for 20 years. It's not often I meet Amazonians who have more tenure than myself, but you're well up there. You've seen some stuff, and you've written so much code, led teams, and helped teams write a lot of code that I'm sure many of our customers use each and every day, whether they know it or not. Just to give us some context: 20 years at Amazon. What are some of the things you've worked on as an engineer during that time?
A
Oh, wow. It might be easier to say what I haven't worked on. But it's kind of interesting: my time at Amazon has two halves. I spent the first half, the first 10 years, working on the systems behind our Amazon.com retail website. So I worked on some of our shipping systems, some of our payment systems, and a number of systems behind our Marketplace. About 10 years ago I transitioned to work in the cloud, where I worked on our computer networking services. So I worked on services like VPC, load balancers, NAT gateways, and all the other types of gateways you would have in networking. And then I also spent a significant amount of time with our container and serverless services, so ECS, Lambda, EKS. And for the last year I've been working on Bedrock, the inference platform for Amazon, and we worked on Project Mantle, which is something we released this year and something I've been super excited and passionate about.
B
Yeah, it just got mentioned and called out at re:Invent as well. I guess, just to spin through that subset of things you worked on: I think the characteristics there are very much about scale, about robustness and reliability. You're not working on stuff where, well, if there's an error, there's an error, it's okay, the user will retry. This is trillions of packets, real-time processing, mission critical. This is the stuff that keeps you up at night if you get it wrong. I'm guessing that's right.
A
Yeah, you know, reliability and scale are at the heart of a lot of what we do at Amazon. Clearly scale makes a lot of things more interesting and more challenging. So you constantly have to think about not only how you get the scale right, but also how you get it right in a way that doesn't add too much complexity, because complexity tends to be the enemy of reliability. And so there's this constant tension between engineering for scale and engineering for reliability. And you have to navigate that tension: getting things right, but not too complicated, not too complex.
B
Yeah, it's one of those classic things where you know it when you see it. But, you know, it only takes a whole bunch of years of experience to figure that out. There's also something else I want to call out that our listeners can't see, but I can, because I can see Joe on video. Behind Joe is, I don't know what the collective noun would be, but I'm going to call it a wodge of patent puzzle pieces. At Amazon, if we get a patent, we get a puzzle piece, which is very exciting and great, and so it's always good to do that. And I've always been proud of my own patents. I have nine, a small number. Joe does not have nine. Joe, how many patents are sitting behind you there?
A
There's a lot, it's hard to say. It's about, it's a cube of about, let's see, six by six by two. So I'm guessing around 72.
B
Yeah, that's a lot.
A
Approximate.
B
It's impressive.
A
Yeah, yeah. The funny story, for folks who don't know: at Amazon, for a patent you get a puzzle piece, an actual three-dimensional acrylic puzzle piece. And while, you know, people have different opinions on software patents, what truly gets me is that I love puzzle pieces. And so there's this whole concept of puzzling, stacking things together and building out of them. You hear a lot about it: I'm a builder at heart and I love building stuff. And so the whole concept of stacking those pieces and building with them is really what it's about for me. And I'm afraid to admit that's been by far the most fun part of getting a software patent, is just.
B
It's the hardest possible way to get puzzle pieces to make a puzzle. But I love the dedication. I think it's fantastic. So we talked about this thing called Mantle, which is part of Bedrock. Just help us understand, I guess, what problem we are solving and what it's for, and then we're going to get into the guts of how you built it, which is, I think, really a great unpacking of the current way of using AI for software development. But before we get to that, let's just talk about Mantle and give it its due. Yeah.
A
And so Mantle. Mantle is a new inference engine underneath Amazon Bedrock. So, of course, to start: Bedrock is the inference service for Amazon. It's a public AWS service, and it's a service that's seen tremendous growth, a tremendous amount of customer adoption. And we've learned a thing or two operating Bedrock over a couple of years. We've learned how customers use it, we've learned how inference behaves, we've learned how inference scales. And so Mantle comes from our realization that, at its heart, inference is not quite a web service but more of a scheduling system. A request comes in, and there are a lot of concerns you typically see in a scheduling system: things like prioritization, things like fairness, things like placement, things like co-placement, where you want to have multiple requests placed at the same time. And so if you step back and look at the whole ecosystem of Bedrock, you realize that what it does is act as a giant scheduling system. In an effort to accept that reality, and to make sure that Bedrock can offer the best customer experience, we've built this new inference engine underneath Bedrock. So if you're using Bedrock today with models like MiniMax, GPT-OSS, or Mistral, the Mantle system is what's actually placing and executing those requests. And of course, from the customer's point of view, the benefits are twofold. The first is that we want to offer customers the best possible experience: the best latency, the best performance, the highest number of features that customers want. But on our side, we also want to operate this fleet efficiently, because it's a large fleet. And as everybody knows these days, inference is a demand-constrained space, where there's more than enough demand from customers.
And so if you can't utilize your resources efficiently, then you're not serving all the customers who want to use the service. So on our side, we also want to make sure that we do it as efficiently as possible. And so Mantle was a combination of observations of how inference behaves and us building the inference system that allows us to serve our customers, serve them well, and keep our utilization and our cost as good as possible. Because at the end of the day we want to be able to be.
B
Passed on to our customers. That's right, exactly, exactly. And I think one of the great things about that challenge is that, because of the proliferation of different models, being able to cater for all these kinds of models and run inference efficiently on them is a non-trivial problem. And I think what's interesting is that when you and the team were going to build this particular service, or part of the service, you decided to take an AI-first development approach. Just before we went into the show, we were having a quick chat, and we try not to chat too much before the show because then all the good conversations happen off the show. But we were talking about the fact that the way to develop using AI today is not what it was three months ago, it's not what it was 12 months ago, and it won't be what it is in six months. It's changing all the time. But what really appealed is that you and the team have taken an approach, and then you've blogged about this approach and used data to talk about what happened and what you've learned. So we're going to get into the guts of this, because I think there are folks who are hungry to hear about it. So firstly, you know, you're a Distinguished Engineer, as I said, you've got the wodge of patents, you've written more code than most would have, yet you've taken an AI-first approach. What was the thinking here?
A
Let me tell you a little bit, because I think everybody had a slightly different path in how they started trying to use AI-based development. Let me walk you through my path. So of course, you know, five years ago or four years ago, ChatGPT comes out, LLMs become mainstream, and one of the capabilities they had was writing code. So there was a lot of interest in the industry, a lot of folks trying it. And roughly two years ago, I started trying to write code using LLMs, and of course the results were pretty mixed back then. You could maybe have it write a little piece of code, and occasionally it would get things right. Twitter is full of all the jokes and all the comments making fun of the code produced by LLMs. And slowly they were getting better and better. And for me there was an inflection point where I started being able to use it on actual prototypes. I could literally have an idea, use a model to write a little prototype, and try it out, but I wasn't quite convinced yet whether this was something that could be used in production. So I would take the prototype and then turn it into something that I would build. I was using it maybe more as a gimmick, or as a tool to help me learn, rather than something that would make me more effective, more productive. And then maybe six to nine months ago I started noticing that I was doing that a lot less often. All of a sudden the code produced by the model was getting good enough to where, eh, I still needed to make modifications, I still needed to constrain it, but it was solving the problems in a more robust way. And so you have this steady march of progress. And at the same time, we were also trying to figure out: how can we apply models?
How can we apply this learning to actually turn it into real production code? What needs to change in our industry, what needs to change in our approaches, to where we're not just using it for toy exercises and POCs, but actually using it for real production? And so within the Mantle team we had a couple of ideas, a couple of thoughts. They're not that fancy. One of our realizations was that at the end of the day the model is a tool, and it's a tool that accelerates an engineer. And so one concrete rule we have, and I think that rule worked out really well for us, is that any line of code committed into the repository has a human name attached to it, and the human is ultimately responsible for the quality of the code. And so if you look at it that way, it's probably more analogous to a compiler or to a programming language than to a fully autonomous agent that runs around and modifies the code. And I'm not saying there aren't patterns that could benefit from fully autonomous agents, but in our case we made a decision that a human is the ultimate author of the source code.
B
It's kind of the continued extension of the "you build it, you run it" type thing. You're accountable for all your code, however you produced it. You know, you could have got a barrel of monkeys to make that code. Doesn't matter.
A
That's right.
B
It's still Joe's code.
A
That's right. That's right. And once you have that accountability, a lot of the right tensions start to happen, where the engineer responsible for writing the code has to figure out how to make the model produce code of the right quality. Everybody has a slightly different approach, a slightly different pattern. What I do is I give the model a prompt, and we can talk a little bit about that, because even that process was interesting. Then I let the model produce the code and I review it. My first decision is: did the model solve my problem in the right way or not? Do I agree with the solution? And more often than not I do. Then I decide: did the model solve the problem to my liking? Did it use the right practices, did it use the right libraries? Is it overly complicated, or did it miss edge cases? And then I iterate on that the way I would almost do it myself. I fix bugs, I fix issues, and at some point I get.
B
Are you fixing the code directly at that point, or are you prompting the AI to do the coding? What's the balance there for you? Both?
A
What I found works best for me is that if the model is almost there but missing maybe a line or two, or maybe an edge case or two, I'll just take over and finish it. It's just quicker, faster. I don't mind doing it at all. Occasionally I find that, well, it's not quite how I would have done it. And then we go for a second round, no different from how you would do it with another engineer. You provide feedback: perhaps change the data structure here, perhaps let's use a different algorithm. And we iterate on it until we get there. Earlier on, though, and this is part of the learning experience, and this is me getting a little bit into how my approach evolved: in the early days, I would literally stay glued to my screen and look at lines of code appearing, and I would even stop the model in the middle and say, no, no, no, wait, you're going off on a tangent. Let's change direction. Let's try something different. As I've gained better intuitions and a little bit more confidence about how the model works, I started becoming more and more asynchronous, where I literally send the task over the fence, I wait for the results, I evaluate the results, and I repeat. And almost universally it's a multi-step process. We iterate. But I've got to the point where I've also calibrated my prompts enough to find out what's just the right level of complexity to ask the agent for, so that the results are going to be something to my liking.
B
And I think at the moment that's one of the techniques or tweaks that's really important. Before we get into that, I'm going to call out what we're going to talk about in a minute, because we have data that shows 10x development velocity, so we'll get into that. So I want folks to listen carefully to what we're talking about here, because this can really help. So the prompting is a huge thing. I think at the early stages of quote-unquote vibe coding, people were like, you know, "write me Twitter," and expected the thing to do stuff, and we've come a long way in terms of understanding. Well, you know, the better you prompt, the better the outcome can be. How big should something be, et cetera. It's almost like the old microservices conversation we'd have: how big should a microservice be? So tell us about the prompting approach you're currently using, and I'll preface this on your behalf by saying this is the approach you use today. It doesn't mean it's going to be the best approach tomorrow, but it's what you've learned.
A
Yeah. And I think you hit on something important when you said, you know, don't ask the model to "write me Twitter." Because the reality of it is, just like anything else in life, you need to calibrate yourself on how to most effectively use the tool. Right? When I first started software development, I started with C, and I'm terrified to look at my first code because I didn't know what I was doing. I was likely using the wrong idioms. Bugs galore. Asking the language to do too much, or something that it wasn't meant to do. And as you hone your skills, as you practice, you get these intuitions about which approaches are likely to work and which approaches are not. And one of the intuitions that I find super helpful to build is finding the maximum supportable request for the model. Meaning that, you know, you ask too much and the model is likely to fail. It might run out of context window.
B
Or with too much, it does weird stuff. Yeah.
A
Yep. Ask it for too little and, well, you're not quite getting the speedups, because you still have to be constantly in the loop and asking. And so one of the intuitions you build up, and one of the reasons I think that actually doing and trying things is so important, is that intuition of what the maximally supportable request is. And it changes over time. It's not a static thing. It changes with the model's abilities, projects, domains. But having that intuition has been incredibly helpful. And so I've made a couple of observations. One is that having the request in that sweet spot helps me get the maximum acceleration, the maximum speedup. The second one is ambiguity. And this is one of the interesting places, because we work in a field where many different approaches can solve the same problem. Sometimes you don't care and you can just pick one. Other times you may have strong opinions based on your past experiences as an engineer, based on what you're trying to accomplish. And so helping the model disambiguate what you want to do turns out to be super helpful as well, because it sets up the guardrails within which you want the model to operate, which is, by the way, no different than dealing with junior engineers, where you want to set up guardrails, set up shared expectations, and work from that. And so a large chunk of how I operate is just trying to think through what the appropriate guardrails are, what the appropriate constraints are, what the high-level approaches are that I want to take. For example, it's super common for me to actually brainstorm a problem with the model first, before we even start implementing. Just recently I was working on something where I didn't quite have good intuitions myself yet about what I wanted to do.
And I literally started with, "Here's the problem I have. List possible solutions." The model listed them, and it immediately became obvious that one of those was really not going to work. So we went the other way. We worked back and forth to the point where I finally felt confident that this was going to work. The model understood what I wanted to do, or at least the context was there. And then I flipped from the brainstorming mode to the "okay, let's go make it happen" mode. And so, you know, the tool is extremely flexible. It can be used in different ways, and you want to take maximum advantage of it. You want to not just tell it what to do, but sometimes also use it to help you figure out what to do and how to do it.
B
I think that's a really important insight, because if you think about how these models are trained, they're trained on huge corpuses of code, some good, some not so good, but lots of code. And so it doesn't have an opinion. It's a statistical model, but it has access to way more code than we can fit into our heads. And there's a lot to be said for, "Hey, here's the problem domain I'm trying to work on." And one of the things I've found is really true with the models is simply saying, "Ask me questions," you know, like, "Prompt me to tell you what you need, to get to a better point." And, as you say, you start that dialogue and you're still driving, but it's taking you along lines that you may not even have considered, and different approaches that just weren't in your mind because, you know, you hadn't had your first coffee yet, you weren't really thinking clearly. And it's fascinating to see. Like you say, it's like working with a colleague to some degree, having that interaction.
A
Yeah, yeah. And that's exactly right. You know, the model's seen every single implementation of a B-tree out there. I have not. So sometimes it's just brainstorming. I still want to be in the driver's seat. I still want to be the one that makes the ultimate decision on the way we go. But using the model as that sounding board on problems is, I've found, oftentimes a useful first step, and a lot of times it helps me decide what I want the model to do before I even go to the.
B
Because you're still deciding. And this comes back to: you own the code. It's not, you know, "Hey, model, figure something out and then go implement something I have no idea about." And even if it's proposing something maybe you're not that familiar with, I'm assuming you probably do a deep dive yourself and say, well, actually, what is in this for me? Is this an algorithm I haven't seen before? Is this an approach I hadn't considered? What do I need to know to understand the risks?
A
Yeah, absolutely. And you still review the code, and, you know, you still have to keep your judgment on what complexity you want to introduce, and when you want to go the tried-and-tested way versus trying something more performant or more experimental. You still have to own the decision. And at the end of the day, you have to verify what's being produced. But I've found having that conversation is often useful, because at the end of the day, clarity of thought, clarity about what you want to do, is what provides the acceleration. The more clarity you have in your own head, the more you've dealt with ambiguity, the faster you're going to go. That's been true before. That's true even more so now. And so the models are fantastic tools to help you gain that clarity as well, and to brainstorm ideas and try new things. You know, in the old days we would use Google, we would use docs. These days, it's just a new way of doing research and a new way of learning.
B
Very, very true. Well, you know, even things like the fact that there's the AWS MCP with access to all the documentation just saves us time on looking up the documentation. Even we at Amazon look at the documentation.
A
Oh, absolutely. In fact, I've found that the models know AWS services better than me sometimes, which is crazy. Just asking questions and asking about behavior has been a tremendous time saver.
B
Let me touch on context windows briefly, because certainly what we're seeing is that, you know, if you're running a heavy, full context window, things start getting squirrely. And so certainly in my own personal workflow, I'm using the frequent intentional compaction approach and finding great results. I've almost become, you know, resolute about it: once it gets over 40% to 60%, I'm compacting, because weird stuff happens. Are you seeing that? Do you manage your context window particularly closely?
A
Oh, like there's no tomorrow. And for listeners who are not familiar: models have context windows, which are the ultimate limit, the constrained resource, when dealing with the model. Typically these range from a hundred thousand to a million tokens. And as soon as the context window reaches the maximum, the model cannot do work anymore. So you have to use techniques like compaction, or reduction, or long-term memorization to start working around that. And so it is one of the things that you as an engineer need to actively model and manage. I do a couple of things, and there's an interesting conversation we can go into about how the tools need to evolve. But I clear the context window between every request, and I use a lot of files as long-term memory for the model. So concretely, some requests fit into a single context window, where you want to fix a bug or make a change, and you work on it, you finish, you clear the context window. A lot of the things I work on require multiple iterations, and there, what I've found works really well is starting with a file that describes the high-level approach, which is no different than what I would do if I was working on a large problem myself.
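Illustrative sketch (not from the conversation): the idea of treating the context window as a budget can be expressed in a few lines of Python. The class name `ContextBudget`, the 200,000-token window, and the 50% threshold are assumptions chosen for illustration. Token counts are supplied by the caller rather than measured by any real tokenizer, and the threshold simply sits inside the 40% to 60% range Simon mentions.

```python
from dataclasses import dataclass


@dataclass
class ContextBudget:
    """Tracks how much of a model's context window a session has used."""

    window_tokens: int        # hypothetical model limit, e.g. 200_000
    compact_at: float = 0.5   # assumed threshold, inside the 40-60% range discussed
    used_tokens: int = 0

    def __post_init__(self) -> None:
        if self.window_tokens <= 0:
            raise ValueError("window_tokens must be positive")

    def add(self, tokens: int) -> None:
        """Record tokens consumed by a prompt or a model response."""
        self.used_tokens += tokens

    @property
    def utilization(self) -> float:
        """Fraction of the window currently in use."""
        return self.used_tokens / self.window_tokens

    def should_compact(self) -> bool:
        """True once usage crosses the chosen compaction threshold."""
        return self.utilization >= self.compact_at

    def clear(self) -> None:
        """Model a fresh context, as when clearing between requests."""
        self.used_tokens = 0


budget = ContextBudget(window_tokens=200_000)
budget.add(90_000)
early = budget.should_compact()   # 45% used, below the 50% threshold
budget.add(20_000)
late = budget.should_compact()    # 55% used, above the 50% threshold
budget.clear()
```

Clearing between requests, as Joe describes, corresponds to calling `clear()` after every task, while Simon's compaction habit corresponds to acting whenever `should_compact()` returns true.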
B
Yourself. Yeah, yeah.
A
And then we break it down, right? You break it down. Kiro does some of that for you with its spec-driven development. I tend to use the command-line tools a lot, and I would start a prompt with: okay, here's the design we agreed on. We're now on step 2 out of 7, or 2 out of however many. And here's what we're going to do in this step: in this step we're going to be doing blah, blah, blah.
B
And then we'll work on just these.
A
Just this thing, right? And then when we finish it, we commit the code, and then we clear the context and move on. So you forget everything you've done. There's that durable file that keeps track of our intent, and the last commit, and we continue from there. Currently a lot of it is things I do manually, by hand, and it just tends to match really well how I work myself. I start with the end-to-end goal, but then I break down the problem, and that's just generally how humans tend to work, right? You want to break down the problems into smaller problems, then go after those. I can imagine that, longer term, this is going to become part of the built-in tools, and this workflow is going to be a lot more natively built in. But yes, the context window is something in the back of your mind, and you are effectively managing to it as a constrained resource, breaking down the problems as much as you can to the point where you take maximal advantage of that resource.
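Illustrative sketch (not from the conversation): the step-by-step workflow Joe describes, where a durable plan carries intent across cleared contexts, might look something like this in Python. The function name `build_step_prompt`, the prompt wording, and the example plan are all hypothetical, and nothing here calls a real model or tool.

```python
def build_step_prompt(design: str, steps: list[str], current: int) -> str:
    """Compose a fresh-context prompt for one step of an agreed plan.

    `design` is the text of the durable plan file, `steps` is the ordered
    breakdown, and `current` is the 1-based step to work on now.
    """
    if not 1 <= current <= len(steps):
        raise ValueError(f"current must be between 1 and {len(steps)}")
    return "\n".join([
        "Here is the design we agreed on:",
        design.strip(),
        "",
        f"We are now on step {current} of {len(steps)}.",
        f"In this step, do only the following: {steps[current - 1]}",
    ])


# Hypothetical plan, for illustration only.
design = "Add retry with backoff to the upload client."
steps = [
    "Define the retry policy type",
    "Wrap the upload call",
    "Add unit tests",
]
prompt = build_step_prompt(design, steps, current=2)
print(prompt)
```

In the manual loop described above, each step would be followed by a review, a commit, and a cleared context before the next prompt is built from the same plan.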
B
And the funny thing is, both of us being quite experienced, let's say, practitioners in the field, there'll be a time in the next few years where we'll sit back together and go, "Do you remember when you had to manage context windows? You young kids these days, you don't know what it was like." Yeah. So, Joe, you mentioned in your blog post 10x development velocity. Now, that's a classic, you know: well, it's gotta be marketing gumph.
A
That's right.
B
There's no way you can prove it, you know. Come on.
A
Yeah. Anytime somebody uses a round number, you have to... Yeah, you should have said 7x.
B
Yeah, so, so tell us about it. Tell us what, what that looks like in your team. In that team.
A
Yeah. Well, so I don't know exactly what the number is. Is it 9x or 11.3x? But I'll tell you from my personal experience. You know, in our field, measuring productivity is a topic for many other podcasts, because it's such a deep topic with so many strong opinions. But at the end of the day, the way I view it is: we engineers build software to solve customer problems, and to build software you have to write code. You have to write high-quality code, but you still have to write code. And I've always enjoyed writing code. Time's always been a challenge as I've become more senior, but I would always find an hour or two a day to write some code, to do a little bit of engineering work, because I just enjoy doing it. And with the switch to agentic-first development, I find that I just accomplish so much more in the same amount of time. And for our team, again, commits are not the whole story, they're just a slice of the story. But we've written, on average, close to 10x, maybe even more than 10x, the number of commits across the whole team. And it's not even just one team member. It's not just me. It's every team member in the numbers.
B
The whole team.
A
The whole team.
B
And in your blog post, just for folks to understand, there is a great commit graph that shows the velocity of the team pre and post. And I was going to say it's unbelievable, but that's wrong, because it's believable: it's the data. The data tells the story. It's this dense, packed.
A
Yep.
B
Amount of committing going on. But as you mentioned at the start, I want to, I want to reiterate, still owned by the individual developers, still responsible for that. But you're shipping a lot more. And you talk about it like driving at 200 miles an hour.
A
Yep.
B
I think it's a good analogy.
A
Well, in fact, just even as a joke, maybe to drive the point home, as you and I are talking right now, I prompted the model to write some code. I haven't, I haven't seen the results yet, so I'm not actually paying attention, but I'm literally writing code as we speak.
B
And can I share a dirty secret? I have Kiro writing some code for me as well. So this is the modern nerd. We code even when we're doing other stuff now. It's different.
A
And really, the reality of it is the enablement. The amount of code I've written would not have been possible in other worlds. Not just because, you know, it takes more time or I don't have enough free time. I fundamentally would not have had enough continuous time where I could sit down and write.
B
Generate that code. Yeah.
A
Because there's a lot of demand on my time. There's a lot of things coming in. And so it's not just the velocity improvement, but it's also switching from a synchronous mode of writing code to an asynchronous mode of writing code, where I can actually write code and not necessarily have the entirety of my attention span focused on it until the end, when I want to go ahead and review the results. And that in itself is what's been so enabling for me. My family jokes around that I've gotten to the point where I love giving a model something to do overnight, just because it bothers me that this is.
B
Yeah, it's wonderful.
A
Yeah. I can, you know, I can just give it a prompt and wake up in the morning and maybe, maybe it was too ambitious. It doesn't work. But it's like, let's, let's just keep working. Right? Like, why?
B
Nothing to lose.
A
I don't exactly, I don't need to be up, so might as well, might as well use the, use the compute cycles to produce some code and see what, you know, see where it goes.
B
So, so if you're generating all this code and you're still responsible for it, how are you ascertaining the code is of high quality, how you're doing your testing? Are you seeing an increase of the velocity of bugs along with the increase of the velocity of code? Yeah.
A
Well, humans are going to be humans. And so bugs will continue being an issue in that industry kind of, until, until there's a breakthrough in how we do validation, how we do verification of code. I suspect we're going to have to be dealing with bugs. And so that's not different with the code written by me or by the model. Um, and so the, the thing that's changed and the thing that, that our team had to work through and had to navigate is that even at the start, even if the rate of bugs was lower than what would have been with a human, they still happen. And when they happen, when you, when you write a lot more code, you're gonna have a lot more bugs. Just, just, just, just simple math, right?
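The "simple math" here can be sketched in a few lines of Python. This is only an illustration: the change counts and bug rates below are hypothetical numbers chosen for the example, not figures from the conversation.

```python
def expected_bugs(changes: int, bugs_per_thousand_changes: int) -> float:
    """Expected number of bugs for a given volume of changes and bug rate."""
    return changes * bugs_per_thousand_changes / 1000


# Hypothetical baseline: 100 changes at 50 bugs per 1,000 changes.
baseline = expected_bugs(changes=100, bugs_per_thousand_changes=50)

# Hypothetical AI-assisted case: 10x the changes at a lower rate
# of 30 bugs per 1,000 changes.
assisted = expected_bugs(changes=1000, bugs_per_thousand_changes=30)

# Even with a lower rate per change, the absolute number of bugs goes up.
print(f"baseline: {baseline} bugs, assisted: {assisted} bugs")
```

Under these assumed numbers the rate per change drops, yet the total number of bugs the team has to deal with still grows.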
B
Rule of big numbers.
A
Yeah, but what's even worse about those bugs is that now it's your bug density. Once you have to deal with them, they impact the whole team. Right. Like, if I check in buggy code today, it's gonna potentially break other engineers' workflows and other engineers' code. So you kind of have almost like a tools-down scenario, where everybody tries to chase down what's changed, what's broken. And so we've learned very early that we have to, and none of it is new. Our industry has been paying a lot of attention to how to, how to improve the testing, how to improve the verification of software. But we found ourselves that we needed to raise the bar even higher. We needed to focus on how do we catch as many bugs as possible before they're checked in, before they get into the production source code, before they get into a beta environment where they could impact other engineers who are doing testing. And so a lot of the thought that we've been putting in, a lot of the energy we've been putting in, is how do we set ourselves up for success in a way that we don't constantly stumble and introduce bugs? Once again, a lot of these are not novel ideas. These ideas have been tried out. We've done them in the industry, but all of a sudden they become even more important than they were before. Because at this rate of change, you need to have a way to curtail chaos or else it just explodes. Yeah, explodes. Exceeds humans' ability to reason with. And so a few of the things we've done, and this is maybe kind of work through. If I had to pick the one thing that I think made our team successful, it was less about, you know, having folks who knew what we do. A lot of us, this is new. This is a new space for all of us. You know, our whole industry is trying to figure out.
B
I'll figure this out. Yeah.
A
How things are going to work. But having folks who are, when faced with obstacles, look for solutions. And so it's very easy to say, yep, I introduced the bug model. You know, Joe and model together wrote a bug that made it to production. We should slow down and not. And not continue.
B
Not let that keep happening.
A
Right. And it's a lot more satisfying, but yet at the same time a lot more difficult to say like, okay, what do we need to change to make that not true anymore. And so for our team, one of the things that we've been putting a lot of attention on is how do we, how do we. We accepted that using AI-assisted agentic coding is the way we want to. The industry is going to work. What needs to change in our build systems, in our test systems, in our development workflows, in our operational workflows to make that a reality. And so it would probably not be an exaggeration to say that 25% of the team's energy goes into that aspect. Not just feature development, but just thinking about development practices, thinking about operational best practices. And sometimes it's subtle things. A very concrete example of that: our build system, you know, our build system has been around for a while. It solves a lot of Amazon needs, but it's not fast. And that made a ton of sense in the world. Yeah, it didn't have to be because a human, if the human takes a week to build, to write software, or if a human takes a couple of days to write a feature, who cares if it takes 20, 30 minutes to, to build and test and run all the integration tests. But in the world where the velocity is, is sufficiently high, that workflow now becomes, can become a bottleneck. And so one of the things that we worked on is working with our build systems, with our partners in builder tools, on how do we actually speed up this workflow to the point where we can get an answer in a few minutes. We can have the model run all the tests, run all the builds and catch integration bugs much, much faster. And you kind of, it's not one big fix, but it's a lot of this little attention to details. Little like you see a barrier, you knock it down, you figure out how to, how to pave the way and it just takes, takes work, takes attention to details. Takes a kind of almost like stubborn persistence to keep insisting that it's possible to get a 10x or 12.6x speed up. Let's figure out how we get it.
B
Well, I think it's interesting because you are pointing out that as you, as you unconstrain one thing, you discover new bottlenecks. And similar to when we're writing services and deploying services at scale, you know, the, the, the attention to detail once you get to scale is vital to get the benefits. And suddenly, if we're talking about the scale of a software developer suddenly becoming, let's use 12.7x as our number now. 12.7x. Suddenly it's really important to focus on all the other efficiencies around that person, what they're doing, because that's going to be the big, the big pole in the tent all of a sudden.
A
That's right. That's right. And a lot of it is just like you said, attention to detail and just desire. It's desire to go fast and desire to unbottleneck yourself while still keeping up the same bar on quality, reliability, security and all that we value. Yeah, absolutely.
B
That can't go at all. Now you talk about, also in your blog you talk about communication and communication bottlenecks. I think all of us working in IT know that, you know, the ultimate world of IT is often me, myself and I working on my own thing. Don't have to talk to anyone else. Life is perfect, you know, no overheads. And the minute you add one other person, it gets more complicated. The minute you add more than one other person, it gets exponentially more complicated. But if you want to go far, you need lots of people. So talk to us about communication and what you found particularly working in this philosophy.
A
Yeah, yeah, there's a famous meme of. You have. It plays on graph theory. If the number of nodes is N, the number of connecting edges scales with N squared. And so as the team grows, communication becomes a disproportionate percentage of your overall time. And so yeah, our hypothesis early on was that it is really hard to move fast without communicating a lot. And you can kind of decide and you know, Amazon is famous for the service-oriented architectures where we use services as boundaries of communication, right? So a service A can operate without spending a lot of time talking to service B other than the few times they have to change the interfaces. And that works really well at the large-scale organizational levels. But when you're working within, you know, a single platform, a single service, communication is oftentimes one of the necessities for speed, right? Having that shared understanding about goals, about approaches, about trade-offs, intentions. And so our hypothesis early on was that it will be really hard for us to move fast, for the Mantle team to move fast, without communicating frequently at high throughput and at high fidelity. And so our hypothesis was that, you know, remote is not going to work. And I know folks have all sorts of opinions about remote work, but for us, we hypothesized that we will need to be all sitting together close to each other, be able to constantly communicate within the Mantle team just because, at the speed at which we thought we could move, it's just going to be really hard for everybody on the team to be on the same page otherwise. And so we'll sit, in fact the whole team is sitting right outside my door right now, probably listening to me speak. But we'll see. So we all sit in one small area. We have a lot of interactions, we have a lot of discussions, we have a lot of debates. Generally our mental model has been if we have a question or a decision to make, we don't schedule a meeting. We just walk over to each other's desks and have a quick conversation. Sometimes we resolve the discussions quickly, other times we realize that we've uncovered a fundamental decision point or fundamental trade-off and then we have to spend a little bit more time whiteboarding the idea, talking through this. But I probably wouldn't exaggerate if I said on a typical day I spend at least one hour just talking to other engineers in a high-throughput face-to-face interaction at the whiteboard just to, just to work through the details.
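The graph-theory point can be made concrete with a small Python sketch. It assumes every pair of teammates is a potential communication path, which is the complete-graph case; the function name and the team sizes are illustrative choices, not something from the conversation.

```python
def communication_paths(team_size: int) -> int:
    """Number of distinct pairs in a team of the given size: n * (n - 1) / 2."""
    if team_size < 0:
        raise ValueError("team_size must be non-negative")
    return team_size * (team_size - 1) // 2


# Doubling the team size roughly quadruples the number of pairwise paths.
for n in (2, 5, 10, 20):
    print(f"{n} people -> {communication_paths(n)} possible pairwise conversations")
```

Going from 10 to 20 people takes the count from 45 to 190 pairs, which is the quadratic growth described above.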
B
And how do you find that works in terms of, obviously, with folks getting into quote unquote flow? It can be frustrating if someone sort of taps you on the shoulder and says, hey, can I borrow you real quick? How are you balancing that tension between those two things?
A
Probably not too well for me. I generally find myself, well, you get.
B
To say, hey, this is what we're doing folks.
A
Yeah, well, not only that, but I also find myself that I, I prioritize shared understanding above all else. And so if somebody comes to me with the problem or space decision they have to make, I actually think that probably is the most important thing I could be doing at the time. Yep, I might be in the middle of writing some code. I might be in the middle of, you know, of debugging something. Unless it's an emergency, I will prioritize shared understanding because at the end of the day, that's the, the cost of not having that shared understanding is that you're gonna go build something wrong and then you're gonna have to double back and redo it.
B
It's more, it's more expensive in the long term.
A
And so over time I think I've taught myself to just always, you know, put my headphones down or put my, put my, you know, turn my computer off and go, go spend that time. But at the same time, you know, one of the, one of the cool things about kind of having small tight-knit teams is like you also learn each other's styles and you, you know how different folks operate. And so for example, I know if one person has their headphones on, it's probably because they don't want to be disturbed. Somebody else's, somebody else is always just like me and they're happy to be, to jump off from the table and talk. And so you kind of learn, you know, learn each other's practices, learn each other's habits. And frankly that's to me, that's always a sign of a healthy team, of a team of folks who just respect each other's boundaries, respect each other's approaches and know enough about each other to work, to work kind of in a way that's most effective for each other. But for us, you know, we, we generally, you know, for, for small discussions, it's a tap on the shoulder. When we have big discussions, we want to make sure that we, we take a little bit, you know, we take a little bit more structured way where we would bring it up during a stand-up and say, yep, we're going to have this discussion next. Whoever needs to be part of it, come join it. But, but, but you and the team.
B
Also, I think have been very convicted about. This is something I know a lot of software engineers listening will be like throwing their hands up going yes, please let that be me. You've been very convicted about removing meetings and having little to no meetings in your diaries, which I think is fantastic. Just unpack that a little bit because that's, I know from a productivity perspective it's a huge thing for developers.
A
Yeah, well, I'm not quite as successful. I still, you know, I still probably spend 20% to 25% of my week in meetings. But yeah, it's, look, it's, it's the meme. The, the meetings meme runs very deep. Our industry has been talking about, you know, meetings and discussions. One of the things that, you know, even before, before we got to AI-assisted coding, one of the things that I always prioritized, and that's how I could find those couple hours per day to do coding, is that I only want to be in a meeting where it's either absolutely necessary for me to add value to the meeting or it's absolutely necessary for the meeting to teach me something that I need to learn. And so a lot of it already kind of came natural to me, to reduce the number of meetings I take per week and prioritize just doing hands-on engineering work. Even before that. Now we took even a stricter stance. We want to make sure that the engineering team can focus on building, building Mantle, building the best inference platform we could build. And we took a stance with the team of, aside from this whiteboard conversation, aside from actual discussion pertaining to building Mantle, truly asking ourselves, is this the meeting that we need to be in? A lot of times the answer is yes, right? You have to go meet with another team to figure out a solution to some technical problem. You have to go agree on an approach. But if you kind of look through that critical lens, a lot of the meetings that you go to may not be, may not be that important. Right? And, and you kind of have to think through, you kind of have to make that cost value of, of like is this moving? Is this getting, is this helping me solve the customer problem in a new unique way? The thing that I would, you know, maybe add, add to this, that I personally found works is that having folks who are senior enough who actually understand that trade-off is super, is super important. And having, you know, there's a, there's a, there's a famous. What I find often helps. There's a framework that Mai-Lan, a vice president of AWS, has written. It's a principles. She calls it the principal roles framework. And it kind of works through different roles a senior technologist can play in a conversation. You can be a sponsor, a person who is, and I'm going to butcher a little bit of the explanation, but you're a person who is pushing an idea forward. You could be a decision maker, you're the person who is deciding on something, and a couple of other roles. And what I found is super helpful for me is, looking at the meeting, deciding am I one of those roles that are critical? Am I a sponsor, am I a decision maker, or am I the person who's going to be doing this? And if the answer is no, usually it's probably a meeting that would be okay to skip. And you still have to apply judgment, you still have to make that decision. But going through that framework of the outcome that you are bringing to the meeting is oftentimes a useful framework to decide whether something is worth your attending. And if you look at it, it turns out that you don't probably have 40 hours worth of meetings in a week, you probably have a lot less.
B
Yeah, I like it. I like it. So, so you, you touched on the fact that, you know, you, you do a lot of asynchronous prompting and, and that sort of stuff. So what does a human do during that time? Like, what are you doing and what do your team do during that? Do you sort of kick back, play a bit of foosball? Um, I'm guessing that's not the answer.
A
Take a, take a, take a pop. Do a podcast interview.
B
Yeah, do a podcast now. What, what are you, what are you doing in the gaps? A lot of it.
A
Sometimes, sometimes those gaps are perfect time to have those whiteboard conversations where you go and you talk about what you're going to do next. Sometimes actually go, go grab a coffee or go, go take a meeting or go have a conversation. Increasingly, the idea I've been toying around with is actually like, what can I do? Use that time to spin up a second request and do a second change. And now you can start kind of seeing how you, you can go even beyond, you know, 12.6x and maybe you can get to 20, 22.6x. But a lot of it is just, a lot of it is just. Yeah, it's, it's, it's using that time to do something else. And you know, to be honest, it's not a surprise to anybody. Hopefully that software engineering is, goes beyond coding. You have to do a lot of other Things. Right. And so this would also be oftentimes the time I would spend on other things that, that require building software.
B
Need your attention?
A
Yeah, maybe. Maybe we are, you know, there's an operational issue I need to look at. Maybe there's a deployment I need to test, maybe there's a customer question. And you kind of have to structure your day a little bit differently and you kind of have to think through, you know, you start playing the game of like, okay, let me just get this request in, get this prompt out real quick before, before my next meeting so I don't waste the time, have the model actually do something productive.
B
It's funny you say that. I've talked to a lot of folks who are like, as I'm walking to make lunch, I'm making sure I'm hitting enter before I leave the desk for the prompt to happen, or I'm at home and I'm between chores and I quickly kick off another prompt because I know I can come back an hour, it's done. I have this sense, and it's probably a thought I need to flesh out more, that the whole concept of task management, of work management, particularly for software developers, but all knowledge workers, is going to radically change because suddenly we're trying to track all these things in parallel. I don't know about you, but once I've got three AI sessions going at once, it's tricky to maintain your context correctly because as human beings we're not designed to do that. We're supposed to be tunnel vision and not context switching. So I think a lot more abstraction is going to have to happen to make that a lot easier for us. At the moment. We'll take the lift because we know the benefit is there, but it's not the natural way to operate.
A
Yeah, we have our own context windows that overflow and we need to actively manage it.
B
And they're not a million tokens, that's for sure.
A
Yeah, yeah, absolutely. And I think this is where there's a lot of opportunity for innovation and tooling to help us manage this. Like, for example, Kiro's, you know, spec-driven development is a great approach to like, how do you break down the problems and kind of help the human manage the overall workflow that the model needs to produce in a way that sort of harmonizes how humans and models work together. And I suspect that there's going to be a lot of, a lot more innovations in this space, a lot more innovations that help humans interact with models, help humans manage models, as also the models are going to become more powerful and they may be able to run for longer periods of time with more complex tasks.
B
And some of them are starting to have that ability to spin off other models and all that sort of stuff. It becomes this sort of, you know, turtles all the way down at some point.
A
Exactly. And so, yeah, I think we are kind of, it probably sounds a little cliche by now, where we are seeing a big shift in the industry and a lot of our things, a lot of how things work are going to change. We are going to have to learn, we're going to have to discover, we're going to have to pioneer some of those things. And I suspect there's going to be multiple approaches that work just like with any other shift. And you're going to see a lot of cool new innovation coming out, a lot of cool new tools coming out that help humans, help us, you know.
B
What makes it interesting.
A
That's right. Yeah, it's exactly that. It's what makes it so exciting and what makes it so interesting.
B
And so what would, what would you say? The thing that, you know, working this way, you do, you've done it, you know, seriously now, production code. Our customers are using it.
A
Yep.
B
You've got a team. This is real, real stuff. What surprised you the most about working this way? Like, did something leap out of you and go, wow, I didn't expect that?
A
I mean, the trivial answer is like, just the fact that it works as well as it does. Yeah, yeah. And it's still, you know, if you think about it, there's this model, you know, there's hundreds of billions, trillions of floating-point numbers that somehow produce functionality. Right. Like, that's. I suppose we all have to pause and think about how all of that works.
B
If I told you that was going to happen 10 years ago, you would have said, I don't know what it's not.
A
Yeah, exactly. Right. I think, I think sometimes it's just worth reflecting on the complexity of something and how well it works. But if I had to, you know, if I had to think, the thing that probably took me most by surprise is that asynchronous nature of software development. It's the fact that all of a sudden I don't actually need to be present or even like, paying attention for, you know, for 80% of the kind of cycle of making a software change. And I have to kind of. I have to, well, A, figure out what to do with myself, but B, like, I don't actually have to be glued to the screen watching, watching every line of code being produced. I think to me that was a little bit surprising because I was still envisioning, and maybe this is just me thinking small, that, hey, we are sitting together, we are actively pair programming and I'm making changes while the model works. And switching to an asynchronous mode has proven to be a big enabler, a big friendly mindset shift, more so than just having the model produce code. Interesting, interesting. And the other thing that's been pretty surprising for me is just, at the end of the day, to make this successful, we have to almost like go back to grassroots. A lot of things I talked about, you know, smaller teams where everybody is in communication, that's how we used to work 20 years ago.
B
So where's the fundamentals?
A
Yeah, yeah. Building sophisticated test harnesses, paying attention to deployments and build speed. Those are invariant over time and they always mattered, they just matter a lot more now. And so seeing that sort of a full circle and coming back to things that matter, things that make you more productive, has really been, I wouldn't say a surprise, but an interesting observation. And a colleague of mine said it well, that AI-driven development is going to impact well-functioning teams much more than it's going, it's going to magnify well-functioning teams much more than it magnifies not-so-well-functioning teams.
B
That is a really interesting insight. That is, yes, I like that a lot. So, yeah, that's. Which comes back to, you know, in my experience, well functioning teams do the fundamentals exceptionally well. It's like athletes, you know, you pro athletes, yeah, they're talented, they're trained, et cetera. But you look what they do, they do the basics over and over again perfectly.
A
Yeah, that's all it is. Exactly.
B
So, so we've got lots of listeners, there's folks who are listening going, oh my goodness, Joe, sign me up. This is the future of my team. I'm, I'm good to go. What advice would you give to someone in that mode who's enthusiastic, rightly so, but about to embark on this journey?
A
Yeah, I'll probably give a couple of pieces of advice. I would say try building. There's just no substitute for building. Nothing I say right now, nothing anybody else says, nothing you're going to read online is going to teach you how to apply the tools to your problem space, to your domain. Just fundamentally, I would say being a builder is a lot more valuable now, a lot more important now than it's ever been before. And so just start by building, trying, leaning in, experimenting, learning what works, what doesn't, kind of forming your own.
B
Finding where those edges.
A
Yeah, find your edges. Find, you know, build muscle memory, build intuitions. And then the second kind of, the second advice I would give is that it pays to lean in. The thing that made the Mantle team successful was the fact that we all believed that it was possible. And we knocked down the barriers and we changed our own mental models, our own approaches to make it so. And so I'd say starting with that mindset of not the, well, is AI going to work or not, but more like, well, AI works. What do I need? What do I need to change about my systems, my approaches, my software development approaches to make it so? And just going with that has been incredibly powerful for us because frankly, earlier on we ran into bottlenecks, we ran into problems. And it took a lot of perseverance and a lot of attention to detail, a lot of kind of trying things to get people out.
B
It is hard to do and it's hard to change your mind too. And it's hard to let go of things that you did previously or things you think are good. And it comes back to our old friend, be stubborn on the vision, but flexible on the details. This is what this is. This is. You're like, you knew you wanted to go faster, you knew you wanted to automate, but my goodness, you had to change a lot of detail.
A
Yeah, you had to change. You have to solve a lot of problems, you have to learn a lot of new skills, and you have to focus on the basics.
B
Yep, yep, that's, it's, it's how it is. Joe, this has been fascinating. I know a lot of folks will have enjoyed hearing this from the, from the quote unquote, real world, if I can put it that way. This is well beyond vibe coding here. Just as a reminder, folks, you're using this code right now in your life. It's happening. Joe, we'd love to have you back sometime to share more about the journey because I know that in six months' time your workflow will be completely different. So thank you so much for coming on the show.
A
And thank you so much for having me, Simon. I really enjoyed the conversation.
B
Always a pleasure. And we'd love to get your feedback. AWS Podcast.com is the place to do it. And until next time, keep on building.
AWS Podcast #753: Amazon Bedrock Mantle and Developing at the Speed of AI
Release Date: January 26, 2026
Host: Simon Elisha
Guest: Joe Magerramov (VP & Distinguished Engineer, AWS)
This episode features a deep dive into Amazon Bedrock Mantle, AWS's new inference engine, and explores how developing with AI tools is fundamentally transforming software engineering velocity, team workflows, and the required mindset for AI-driven development. Simon Elisha interviews Joe Magerramov about his two decades of experience at Amazon, the scaling challenges of Mantle, and concrete lessons learned from taking a truly "AI-first" approach to building production systems.
This episode provides a candid, detailed view into building state-of-the-art, AI-assisted cloud services. The Mantle team’s experience demonstrates that leveraging LLMs for development is not just viable but transformative—when paired with the right engineering foundations, accountability, and agile team practices. The high velocity achieved isn’t about relinquishing control to AI, but about thoughtfully retooling how humans and machines collaborate, keeping humans “in the loop” for quality and responsibility, and relentlessly refining surrounding processes and tools.
Summary by AWS Podcast Summarizer