
Narrator
AI-assisted coding tools have made it easier than ever to spin up prototypes, but turning those prototypes into reliable, production-grade systems remains a major challenge. Large language models are non-deterministic, prone to drift, and often lose track of intent over long development sessions. Kiro is an AI-powered IDE that's built around a spec-driven development workflow. It's focused on helping developers capture intent upfront, translate it into concrete requirements and designs, and systematically validate implementations through tasks, testing, and guardrails. It aims to preserve the creativity of AI-assisted development while producing software that is ready for real-world use. David Janacek is a Senior Principal Engineer and a lead advisor on the Agentic AI team at AWS. Today his work focuses on Kiro, frontier agents, Amazon Bedrock AgentCore, and AWS's operational agents. He joins the show with Kevin Ball to discuss the design of Kiro, how spec-driven development changes the way teams work with AI coding agents, and what the next generation of agentic software development might look like. Kevin Ball, or K. Ball, is the Vice President of Engineering at Mento and an independent coach for engineers and engineering leaders. He co-founded and served as CTO for two companies, founded the San Diego JavaScript Meetup, and organizes the AI in Action discussion group through Latent Space. Check out the show notes to follow K. Ball on Twitter or LinkedIn, or visit his website, K Ball LLC.
Kevin Ball
David, welcome to the show.
David Janacek
Oh thanks. Great to be here. Very excited to chat today.
Kevin Ball
Yeah. So let's start out with a little bit about you. Can you give me the quick rundown of who you are and how you got to where you are today, working on Kiro?
David Janacek
Sure. I'm a Senior Principal Engineer who has spent my career, coming up on 20 years, exclusively at Amazon with a singular purpose in mind, and that is to make developers' lives easier. I've been just focused on that. It all comes from, I guess, how we build and operate software here. We do DevOps. That term, of course, takes on different meanings depending on where you are and how you use it; that's how language evolves. But to us, it means developers do the ops. There is no separate DevOps thing. It's more a way of doing dev than a separate thing. So that obviously puts a lot of responsibility on the shoulders of me, the developer. And so I've been moving from team to team over the years at Amazon, mostly AWS, trying to build the next thing that's going to make life as a developer easier. On the first team I was on, I found it tedious to operate databases, especially when they scale and when they need to be highly available. On one hand, I didn't like doing database ops because it distracts from the thing that I'm actually trying to do. But on the other hand, I kind of loved it. So when I heard that we were going to make a highly scalable, highly available database called DynamoDB, I signed up and I was like, okay, I'll join that and help build that. The thing that attracted me was that it's going to be large scale, and that I'll never have to do database operations again because it's just managed. And so it's been kind of rinse and repeat with that pattern. I worked on Lambda, API Gateway, serverless stuff. Operations is a big thing, so I also worked on Amazon CloudWatch, the observability tool that we use broadly, and a lot of people do. So anyway, that's been my whole thing. Most recently, the advent of LLMs has opened a whole new way of making developers' lives easier. And that's what brings us to Kiro.
Kevin Ball
Yeah, I have to say like I still remember manually sharding databases and having to bring up systems. So I am very grateful for all of the work and like. Oh yeah, where we are today of
David Janacek
just like conceptually easy but then tedious. Yes. At one in the morning, which is
Kevin Ball
somehow always when things go down or when you're having to run those migrations. Yeah, it's such a pain, and less of a pain now. So let's talk about Kiro then. We've been doing a whole sequence on this, and I've been really diving into it for probably the last year and a half: how do we use these tools to code? They're incredible tools. They are non-deterministic, they have all these challenges, and they're also fraught with a lot of learning that we have to do. So what is Kiro? What is the take, and an overview of how it works?
David Janacek
Sure. Kiro is an AI development environment that helps developers go from prototype to production-grade code using an approach we call spec-driven development. Spec-driven development keeps all of the agility and fun and iteration that you get with vibe coding, but it adds just enough structure that you can produce the actual result that you want while it runs as autonomously as possible. It keeps the IDE focused with an end goal in mind, and focused on tasks that are in service of that goal, rather than having it wander off. So the spec-driven development that's baked into Kiro is what lets people produce production-quality code that's thoroughly tested with accurate tests, and produce the right thing that they need for production, without the frustrations that you can get from a wandering vibe coding experience.
Kevin Ball
So I love this. I mean, I've been long an advocate of things like document driven development and this sort of thing. So spec can have a lot of different meanings with different levels of formality to it and different levels of restriction. So how do you, within the context of Kiro define what a spec consists of?
David Janacek
I'm hearing you kind of say, maybe, that specs can be overly formal. When I've done documentation-driven development, they could potentially be a little bit in the way. But with Kiro, I find it actually speeds me up and helps me. I found I just like it, and it adapts to whatever your way of thinking and coding is. So a spec, to Kiro, is three parts, but you start with pretty much the same prompt as you would have otherwise. You say, hey, I want to build this thing, let me describe it for you, and let's build it. From there, it expands on that. It's like, okay, you said you want to make a traffic light control system, so I think that means you're going to want it to keep track of when cars enter and exit the intersection. It expands on your prompt and produces more detailed requirements. You read this doc, it's a markdown file, and the requirements are in this EARS format, which just has some more "thou" and "shall" kind of wording, but it's very clear. It's easy to read and skim to say, okay, this is what I want, or not. And then you can chat and say, no, actually I don't want it to do that thing, I want it to do this other thing. You can either modify the doc or you can just go back and forth like you would in any chat-based LLM interaction. So you chat with it about the requirements to agree on what we're going to build, whether it's something new or a feature. A spec can be a whole new project, it can be a feature, it can be an upgrade. I'll say, hey, let's go upgrade this code to the latest Node.js version, because that's a maintenance task that I need to do.
It'll come up with requirements, and that includes acceptance criteria that say, okay, I'm going to make sure that in the end these are the properties that the system needs to have. That's all part of the requirements doc. From there, once you say, yep, this is what I want, it produces a design, which is another markdown file that includes all of the technology and framework choices it's going to make, more of the non-functional requirements if they weren't in the requirements doc, class diagrams, architecture diagrams, and little snippets of code that say, here's how I plan on doing this or that. Then I can skim that and see if I agree with the approach. Those code snippets are nice because I might otherwise be 20 minutes into the project and see, oh, I don't want you to go about it that way, I want to do it this other way, and then you have to throw all that work away. So the design helps me make sure that it's going to take an approach that fits my mental model for how this code base should evolve or start. And then once I agree with that, it breaks the project into tasks. That's a separate markdown file, a third markdown file, that just says: okay, let's build the framework. Now we'll do some infrastructure as code to set up some stuff. Let's set up a test environment. It just breaks down the implementation. Maybe let's create a database schema. It just keeps going, and it even splits out optional tasks, which is nice: tasks that you can follow up with later. It focuses on getting something tangible for you to see working first, and then going on to do all the thorough tests, including property-based testing.
Essentially, a spec is just these three things that you're chatting about, seeing what Kiro plans on doing and how it plans on going about it. And I really like doing that all up front, because then I can just say, okay, go do the tasks.
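For readers who have not met EARS (Easy Approach to Requirements Syntax): its templates put any condition or trigger first and the required behavior after "shall". The sketch below only illustrates that sentence shape with the traffic light example from the conversation. The `Requirement` class and the requirement wording are made up for this note; they are not Kiro's file format or its output.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Requirement:
    """One EARS-style requirement (hypothetical structure, for illustration)."""
    system: str
    response: str
    trigger: Optional[str] = None   # event-driven form: "When <trigger>, ..."
    state: Optional[str] = None     # state-driven form: "While <state>, ..."

    def sentence(self) -> str:
        # Build the optional "While ..." and "When ..." clauses first.
        parts = []
        if self.state:
            parts.append(f"While {self.state}")
        if self.trigger:
            word = "when" if parts else "When"
            parts.append(f"{word} {self.trigger}")
        core = f"the {self.system} shall {self.response}."
        if not parts:
            # Ubiquitous form: no condition, so the core clause starts the sentence.
            core = core[0].upper() + core[1:]
        return ", ".join(parts + [core])


requirements = [
    Requirement("controller", "show green to at most one direction"),
    Requirement("controller", "schedule a walk phase",
                trigger="a pedestrian presses the walk button"),
]
for requirement in requirements:
    print(requirement.sentence())
```

The first requirement has no trigger, so it reads as an always-true statement about the system, which is the kind of sentence that can later be checked as an invariant.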
Kevin Ball
Well, this is fascinating to me, because the process you've just described is essentially what my team does. But we do it by hand, in the sense that we have a bunch of processes. It's like, okay, go and analyze what's going on, or let's write a spec together and iterate on it. And it sounds like Kiro bakes that into the flow as you go. Now, are those markdown documents committed as a part of your code base? Where do they live? How are they managed over time?
David Janacek
Yeah, ultimately it's up to you, but the intent, and what I like to do and what I see everybody do, is commit them to the code base. They're actually pretty flexible in how you use them in the future. One approach is that you use a spec to add a feature or start a project, and it's kind of one shot, and then you're done. Now, I should add that during this process I actually go back and forth a lot. It's not a waterfall where we do the requirements, then the design, then the tasks. Sometimes when I see the design, I'll say, oh, I don't agree with this design because it's actually missing a couple of requirements. Like, I forgot to mention that I want to use this framework, or that I don't want users to be able to do that. So it's nice to be able to go back and forth. Even once it starts implementing, I can go back to the tasks or any part of the spec. And this is also true when you talk about where I commit the spec. You can treat them as one-off things where, now that the implementation is done, I'll put it there for posterity, so that later on, if the agent or I have questions about how we got here, I can see that or ask questions about it. The agent can read the specs that exist for that project and consult them when it deems necessary or when I ask it to. Some people like to keep a spec up to date with the project, where they say, okay, this spec represents the overall architecture, and once I make a change, I want to go update that spec. That's definitely a fine way to do it.
Sometimes you might have a design document also checked in, separate from a project-specific spec, and ask Kiro: once we're done, update my authoritative overall design and keep it up to date with what we just implemented. So that's another nice way to use it. I tend to use specs start to finish and then archive them for posterity, and I use a separate document to keep that overall idea of the current state of the architecture and the system and how it's built.
Kevin Ball
That makes sense. So let's now move into something you kind of mentioned a little bit, which is this idea of, I think you said property based tests or things like that. I think at this point pretty much everybody has experience using these tools and depending on which ones you use, they are better or worse at actually sticking to the guidance, the spec that we've agreed on, things like this. And so how do you think about building in layers of validations, guardrails, and actually validating those requirements?
David Janacek
Well, I think it's really important. That's actually what the spec is super great at: it captures the actual intent of what you're trying to get done, and whether or not it has done that yet. That's just the breakdown of tasks: okay, I haven't written these tests yet. So that's one way to make sure it actually does the things that you wanted it to do. But in terms of the quality of the tests, a nice thing about capturing the intent in the requirements and the design is that it gives a bunch of extra information that you wouldn't otherwise have from some prompt you typed 20 minutes ago that has now kind of floated off. To me, when I'm coding with an AI agent, whenever I type something into steering or anywhere, that's the value that I'm adding. So I want to make sure that's saved and consulted. That's where it's really nice to have it all summarized in the spec, where I can see the work that I've been doing with the agent. And because the spec describes the actual intent of what the system is supposed to do and how it's supposed to work, Kiro can write more thorough tests than if it were just looking at some code and saying, okay, I need to write some unit tests now, or some integration tests. Kiro uses something called property-based testing. It's not a Kiro-specific thing. It's been around in the industry for some time, but we realized that by having this spec, we could do thorough test generation using this technique. Property-based testing starts from invariants and writes tests to make sure that all those invariants hold during any sequence of inputs. Rather than writing a test for a specific scenario, it tries to generate many scenarios and make sure that those invariants hold true. So let's go back to that traffic example.
Let's say you're building a traffic light system. A really important invariant is that at most one direction has a green light at a time, right? It's obviously very important for the system to never have more than one green light. It's okay for it to have no green lights, but at most one. So if I think about how to test such a system, I want to make sure the invariant holds for every sequence of things: somebody hits the walk button, or an emergency vehicle goes through. All these different inputs to the system can happen, with different timings, power outages, power restores, and I want to make sure that it always holds. Property-based testing is where you describe a test in terms of these invariants, which can be generated directly from the spec. If the spec says, hey, we want to make sure there's only one green light, then we can generate a property-based test based on that. Property-based testing uses frameworks that help drive these different inputs. So you have the test that has the invariant. How do you drive a bunch of input at it? That's where an input generator comes into play. We use one called Hypothesis. It's a Python framework for property-based testing. From the spec, which describes the types of input that you want to send to a part of your system, it generates a bunch of permutations and feeds them into the test, or into all the other tests that might also want that input, to make sure that different invariants are upheld. And when a test fails, the agent can go and keep adjusting the implementation until those invariants hold during all the tests. One nice thing about property-based tests is that they're very thorough with all the permutations of input.
But that thoroughness can make it tricky to understand why a test failed. With a unit test, it's straightforward: you fed it this specific input and this assertion failed. That's very easy to understand. Okay, let's run it again with that input. With property-based testing, you had a whole series of inputs in a sequence, and replaying all of those can make it confusing to see which one triggered the failure. So property-based test frameworks use a technique called shrinking: the framework takes the large sequence of inputs that caused the failure and tries removing parts of it until it arrives at the most compact explanation for why the code isn't upholding the invariants. So property-based testing is a very powerful way of having thorough tests to make sure that the system is what we agreed up front with the agent the implementation should do, because it tests so many boundary cases rather than having to fish for one at a time. Because I've seen agents... for those listening, you're smiling here because you know what I'm about to say. You'll see an agent say, I can't get the test to work, so I just commented out the body of it, so now it passes. That's great. Okay, let's move on. And then it forgets that it ever did that. Having these thorough tests keeps the agent honest, where it has to prove its correctness.
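To make the traffic light example concrete, here is a minimal, dependency-free sketch of the two ideas described above: generating many random event sequences to check the "at most one green light" invariant, and shrinking a failing sequence. It deliberately uses plain `random` and a naive greedy shrinker rather than Hypothesis, whose generators and shrinking are far more sophisticated. The `Controller` class, its event names, and the `buggy` flag are all invented for this illustration.

```python
import random

EVENTS = ["tick", "walk_button", "emergency_ns", "emergency_ew",
          "power_out", "power_restore"]


class Controller:
    """Toy two-direction intersection, invented for this illustration."""

    def __init__(self, buggy=False):
        self.lights = {"NS": "red", "EW": "red"}
        self.buggy = buggy

    def _switch_to(self, direction, clear_other=True):
        other = "EW" if direction == "NS" else "NS"
        if clear_other:
            self.lights[other] = "red"
        self.lights[direction] = "green"

    def handle(self, event):
        if event == "tick":
            self._switch_to("EW" if self.lights["NS"] == "green" else "NS")
        elif event == "emergency_ns":
            # Deliberate bug: the buggy variant forgets to clear the other side.
            self._switch_to("NS", clear_other=not self.buggy)
        elif event == "emergency_ew":
            self._switch_to("EW", clear_other=not self.buggy)
        else:
            # Walk button, power loss, and power restore all go to all-red.
            self.lights = {"NS": "red", "EW": "red"}


def violates(events, buggy):
    """True if more than one direction is ever green while replaying events."""
    controller = Controller(buggy=buggy)
    for event in events:
        controller.handle(event)
        if list(controller.lights.values()).count("green") > 1:
            return True
    return False


def shrink(events, buggy):
    """Greedily drop single events while the sequence still fails."""
    current = list(events)
    changed = True
    while changed:
        changed = False
        for i in range(len(current)):
            candidate = current[:i] + current[i + 1:]
            if violates(candidate, buggy):
                current, changed = candidate, True
                break
    return current


def find_counterexample(buggy, runs=500, seed=0):
    """Try many random event sequences; return a shrunk failure or None."""
    rng = random.Random(seed)
    for _ in range(runs):
        events = [rng.choice(EVENTS) for _ in range(rng.randint(1, 30))]
        if violates(events, buggy):
            return shrink(events, buggy)
    return None


print(find_counterexample(buggy=False))  # None: the invariant held in every run
print(find_counterexample(buggy=True))   # expected: a short failing sequence
```

The shrunk sequence is the payoff: instead of a 30-event replay, the failure report is a handful of events, which is much easier for a person or an agent to reason about.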
Kevin Ball
Yeah, there are multiple things. Agents, or LLMs, I guess, tend toward confirmation bias and self-confirmation bias. They'll try a thing and get into a rut, and by mapping the space out with the full property range, you can keep them from getting into that rut. One of the things I loved about your description of the spec-based workflow is that Kiro's taking you through this best practice, right? Everyone's trying to figure out their own set of things, and Kiro's like, here's our approach, we're going to be opinionated, let's go. Does it do the same thing there for specs? So it's like, okay, I've done my implementation, now it's time to do property-based testing, here we go? Or are you prompting it to do that?
David Janacek
It actually thinks about it. It says, okay, here are the properties that I'm going to be verifying in the end. And yeah, it actually has a step where it comes up with correctness properties; whether or not I have it implement those tests right now is the optional part, and you decide. During that spec flow, it's also reflecting: do I actually have enough clarity from you? This is an interesting thing that Kiro is doing to decide how much more it needs from you. I mean, you can obviously weigh in at any time, but it reflects on whether these requirements are clear or conflicting. So that's an interesting thing it's doing behind the scenes. And it's also doing that around which properties it should test: what are the key tests to make sure that these requirements are upheld? So it's doing that reflecting during the spec flow.
Kevin Ball
That's great. I bet that leads to substantially better specs. But then that also means, hey, whatever environment I'm in, I just need to find a test framework that can validate these correctness criteria.
David Janacek
Right. It's pretty powerful.
Narrator
In mobile application security, good enough is a risk. Guardsquare uses advanced multi-layered code hardening techniques and automated runtime application self-protection and mobile application security testing, combined with real-time threat monitoring, to deliver the highest level of mobile app security. Discover how Guardsquare brings all these together to provide mobile app security for your Android and iOS apps without compromise at www.guardsquare.com. Why is there always a meeting bot in your Zoom call? Blame Recall AI. Recall AI powers the meeting bots and desktop recording apps behind products like Cluely, HubSpot, and ClickUp. They handle the hard infrastructure work, capturing clean recordings, transcripts, and metadata across Zoom, Google Meet, Microsoft Teams, in-person meetings, and more, so developers don't have to build it themselves. If you're building a meeting note taker or anything involving conversation data, Recall AI is the API for meeting recording. Get started today with $100 in free credits at recall AI software.
Kevin Ball
So feeding into this concept, we've talked about what Kiro is doing in an opinionated way. But what mechanisms, hooks, skills, or other form factors does Kiro offer for developers to customize it to their environments, their particular preferences, their team practices, et cetera?
David Janacek
I think this is really important: getting Kiro to understand how you work and how your team works. There are essentially three features in Kiro that help with this. First, I can write a steering file, or a series of steering files, that describes my development environment. Maybe I say, hey, my team is always using this for continuous deployment, we use these frameworks. It just sets out the stuff to always keep in mind. You can have many steering files. They do take up your context window, so you want to keep them relatively small, with pointers to where to go for more information about a particular thing. Having multiple of them is really useful, because you can have a company-wide one, if you have many teams that try to do certain things the same way, then one for your own team that covers the way your team does things, which may be different from other teams, and then your own steering file that says, here are my own preferences. So the steering files help it stay focused and do things the way you're used to, without you having to repeat yourself all the time. Then there are powers. Kiro powers is a feature we added relatively recently that bundles up MCP servers with steering and with hooks, which I'll describe in a second. Powers are loaded dynamically depending on what you're doing. Supabase, for example, provided a Kiro power for Supabase. So if I say that I'm using Supabase, or my project clearly is, Kiro will load the Supabase power, and it suddenly shows up with the MCP servers and steering files and hooks that help it when it comes to using that platform.
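The two ideas in this answer, layered steering notes that are always present and heavier bundles that are pulled in only when relevant, can be sketched in a few lines. This is a toy model of the general pattern, not how Kiro implements steering or powers. The `Bundle` class, the `build_context` function, the keyword matching (a crude stand-in for the agent deciding relevance), and the "ExampleDB" content are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Bundle:
    """Hypothetical stand-in for a bundle of on-demand expertise."""
    name: str
    teaser: str       # one line that is always in context
    keywords: tuple   # crude relevance signal used by this sketch
    details: str      # heavier material, loaded only when relevant


def build_context(task, steering_layers, bundles):
    """Combine layered steering notes with teasers, adding details on demand."""
    lines = []
    for layer in steering_layers:   # e.g. company, then team, then personal
        lines.extend(layer)
    wanted = task.lower()
    for bundle in bundles:
        lines.append(f"[{bundle.name}] {bundle.teaser}")
        if any(word in wanted for word in bundle.keywords):
            lines.append(bundle.details)
    return "\n".join(lines)


company = ["Deploy through the shared pipeline."]
team = ["Services are written in TypeScript."]
personal = ["Prefer small commits."]
exampledb = Bundle(
    name="exampledb",
    teaser="Helps with ExampleDB schemas and queries.",
    keywords=("exampledb",),
    details="ExampleDB guide: migrations live in db/migrations.",
)

print(build_context("Add an ExampleDB table for orders",
                    [company, team, personal], [exampledb]))
```

For a task that never mentions ExampleDB, only the one-line teaser is included, which is the point: the cost in context stays small until the expertise is actually needed.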
Kevin Ball
That's really interesting. I want to dig into that, because one of the biggest things I think people are grappling with right now is what I've been calling progressive disclosure of context. Right? I don't want everything in my context window up front. One of the challenges with MCP servers is that it's hard to do; they've got a whole bunch of different stuff. And so skills were sort of a step in that direction: okay, we'll give you a little description, and then you can load more if you want it. But this sounds potentially even more powerful. So how do powers work? What are the knobs and levers I have to say, okay, these are the situations in which this context is going to be relevant?
David Janacek
Yeah. To use a power is relatively simple, on the surface. You go to kiro.dev, where the website lists powers as a starting point, and say add it, and then it'll load up the IDE and download that whole bundle of stuff. I think you'll find Kiro powers are pretty similar to what you're describing with skills, in that they have a smaller amount of data that will entice the agent to use them in certain cases, just enough to pique its interest at the right time. And then when it is time to do a thing using that, it'll load up the MCP servers and steering files and everything for that part of the task. I'd say it's not magic, but it is extremely convenient. It's convenient for keeping the context window small, but it's really convenient for bringing in the expertise around a particular technology when it's time for that. The features that we build into Kiro, we're building from our own experience and customers' experience. The nice thing about building tools for developers is that we are also developers, so it's a little more intuitive to imagine what other customers might want. That's where Kiro came from in the first place, actually: we were using LLM-based tools to do development, and we found that, okay, it would wander off, and so we were like, well, let's build specs. Or you'd want to do something and it wouldn't know how to do that. Say, hey, I want to use this new Bedrock AgentCore feature to build a new agent, or use Strands, an open source framework that we created for making agents. Well, on the day of launch of that new service or feature, the agent has no idea what that is, and I can go and give it a bunch of links to say, hey, go read this documentation, this is what I'm talking about, here's how to find it.
Or a power points it in the right direction and saves a bunch of back and forth of, no, I meant this, I meant this. The power just keeps it focused. So from this experience of having to repeat ourselves to the agent, saying, hey, no, this is what I'm trying to use right now and here's how to find out more about it, we found that this powers concept would help package up expertise so that the agent will be good at everything.
Kevin Ball
Absolutely. Well, and I think that is what that kind of progressive disclosure allows you to do, right? You can say, there are all these things that are going to be relevant at some point; let me give you access, but only when you actually need it. Let's talk a little bit about hooks, because that was another thing that I saw in Kiro that seemed like it was potentially very powerful.
David Janacek
Oh yeah, hooks. They're actually another part of this packaging of a Kiro power, but you can also make them on their own. A hook is something that will run a prompt, or spin off an agentic loop, in reaction to a thing. Let's say I have some kind of web service API, and every time I update my API definition, I might want to generate some things off of it. Just like you can generate things like property-based tests off the specs, from API definitions you can generate a lot of interesting things, like SDKs and API documentation. Really nice. So whenever I make a change to my API definition and save it, I want to go do these things as a result. I would write that as a hook, and it's pretty simple to make them. You just write a prompt that says, update my API documentation. Simple. Really, that's it. And you would say, trigger whenever this file is saved or changed. There are different triggers that can kick off these hooks. Maybe run this code scanner. Or every time I update my dependencies file, for whatever framework I'm using, I want to check for security vulnerabilities using this tool, or check whether there are more recent versions available that I could grab. So it's a nice way to do those things that I need to remember to do, or to just make it a little more convenient. You can also manually trigger the hooks. I find that's actually how I mostly do it. Sometimes it's not exactly when a file gets saved that I want to do a thing. I just click this button and it's going to go off and do that thing for me.
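The shape of a hook as described here, a prompt plus a trigger that is either a file pattern or a manual click, can be sketched as a small lookup table. This is an illustration of the concept only and does not reflect Kiro's actual hook configuration format. The `Hook` class, the hook names, the file patterns, and both helper functions are assumptions made for this example.

```python
from dataclasses import dataclass
from fnmatch import fnmatch
from typing import Optional


@dataclass
class Hook:
    """Hypothetical hook: a prompt plus an optional file-save trigger."""
    name: str
    prompt: str
    pattern: Optional[str] = None   # None means manual trigger only


HOOKS = [
    Hook("update-api-docs", "Update my API documentation.", "api/*.yaml"),
    Hook("check-deps",
         "Check my dependencies for known vulnerabilities and newer versions.",
         "requirements.txt"),
    Hook("refresh-design", "Update my overall design doc."),
]


def prompts_for_saved_file(path, hooks=HOOKS):
    """Return the prompts whose file pattern matches the saved path."""
    return [h.prompt for h in hooks if h.pattern and fnmatch(path, h.pattern)]


def prompt_for_manual_trigger(name, hooks=HOOKS):
    """Look up a hook by name, as if the user clicked its button."""
    for hook in hooks:
        if hook.name == name:
            return hook.prompt
    raise KeyError(name)


print(prompts_for_saved_file("api/orders.yaml"))
print(prompt_for_manual_trigger("refresh-design"))
```

In a real tool each returned prompt would be handed to an agent to act on; this sketch stops at deciding which prompts should fire.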
Kevin Ball
Now, whatever it does: say you have an agent running on a thing, it's got its own context window, it's writing things, and it touches one of these files that has a hook attached, and the hook runs. First of all, if it's a prompt snippet that's getting injected, does that run in the main agent's context window? Or is it a separate context window?
David Janacek
Yeah. Pops out into a separate context window. Yeah.
Kevin Ball
Okay, awesome. And then does anything from that get fed back into the original context window so that you could create like one of the things that I think is emerging as a pattern with a lot of these is you want to create feedback loops for your agents so that it can self correct and linters and tests and all these things give these opportunities. So does that get kind of piped back in some way or can it?
David Janacek
I don't think it does. I'm pretty sure these are one off independent tasks that just fork. I haven't done it that way, so I don't think it joins back with the main agent that you spawned it from. Actually could be wrong, but it's a neat idea.
Kevin Ball
Does it have a full agent loop or is it like a single inference, do a thing and come back?
David Janacek
No, it's a full agent loop. It's going to just keep working on a task just like any other task.
Kevin Ball
So it doesn't necessarily have to go back to the core one because it could go and like just do the fix itself.
David Janacek
That's right, that's right. Yeah. They just all branch off and go kick off these other tasks. So I think, yeah, hooks are an interesting kind of nice convenience for remembering to go and do other things.
Kevin Ball
That kind of pulls us into a world. Now we're talking about essentially multi agent patterns, right? Because a hook at its core it sounds like is a small agent or maybe not so small agent. It could be a large agent, who knows. So how do you and Kiro think about coordination across those different agents, making sure they don't stomp on each other, et cetera?
David Janacek
I think there's an interesting place where I see more of the agent coordination. When you're in an IDE, you're seeing the agents do the things, so it's not a huge amount of cognitive load to see what each one is doing and make sure that they're not overlapping. When you get into this other world of agents that aren't running right in front of you, coordination becomes very interesting. This world is actually already here. Recently we launched a set of what we call frontier agents, and they connect into Kiro in an interesting way. With Kiro, we found that with spec-driven development it could run for longer and run independently. You can give it a larger, more ambiguous task and have it go do that without having to pester you. We decided to push that as far as we could into a set of new software development agents that we call frontier agents. One we call the Kiro autonomous agent. It's a non-IDE agent. It meets you where you are, as part of your team. Whatever I'm using for my team's backlog, you assign it a task, and it'll go do that task in its own loop: test and refine, test and refine, expand on the more ambiguous task you gave it to make sure it fits with your team's working patterns, and implement that for you. And then it produces a code review, like a pull request: here is the result, do you want to merge this? The second of these frontier agents is a DevOps agent. We noticed that when you code, you also need to run that code. That's what I was saying about DevOps. So we've encoded that into an autonomous agent that we call AWS DevOps Agent. It does incident response. It'll triage and root-cause issues and recommend how to fix these issues in production over time.
It'll actually look at your whole environment, your whole setup to say, well, actually I found these opportunities to optimize your infrastructure or the way that you even deploy your software. Say, hey, this alarm keeps going off. I noticed you're getting this alarm that goes off because you're doing bad deployments all the time. Let's update your CICD pipeline to add better tests and automatic rollback and better alarms. Because maybe alarms aren't even catching this early enough. It just tries to prevent future issues by just working all the time in the background to look for opportunities on what to improve. And then we have a security agent. It'll make sure that the code that you and agents write adhere to write security standards, and it'll even do a penetration testing on its own. So the coordination you asked about coordination, I think that really comes into play across these agents where they're not running in front of your face. These are running kind of all the time. Ideally, they run when you're sleeping or something. Right. So getting deeper into the backlog than you would. And so the coordination between these tends to actually be the coordination that you use across your team. Like we make it so that these agents are where you are and your team is. So these agents, you interact with them in Slack or whatever team communication tool you're using, in whatever backlog tool you're using, whether that's JIRA or whatever. And in the case of the DevOps agent, in whatever incident response tool you're using, like servicenow or something, whatever observability tool you're using. Dynatrace or Datadog. So these are all about when it comes to coordinating agents, we find it's best to do that coordination where you're already doing coordination with your teammates.
Kevin Ball
So that raises some interesting questions for me. Let's take one of those examples: the frontier agent working off of my backlog. Say it's in Jira or Linear or something like that, and it sees a ticket. Does it then follow the Kiro process of, okay, I'm going to write a spec? Do I then, as a human, have to take a look at that spec? How does the in-the-loop piece of this happen? Or is it completely autonomous, and it's going to come back and I'm going to look at it only when it's got a complete working set of code that may or may not match my intent if I had a very poorly specified ticket?
David Janacek
Right. I mean, it has to have that judgment and learn. These agents learn what your team's preferences are and how the team works, and they learn based on your feedback. Just like I mentioned in the Kiro spec flow, Kiro is asking itself whether or not it has enough information to have a well-formed spec, or whether it has conflicting instructions that it needs your help resolving. Similarly with property-based testing: when it's generating tests and sees a test failure, if Kiro, with the kind of reasoning that we've given it, thinks it has the right implementation and also thinks it has the right requirement, and yet the property-based test is failing, it needs help resolving that. It realizes, okay, I have too much ambiguity in this case to have that kind of optimism I so often have about moving things forward. Similarly with the autonomous agent, a big part of it is realizing that it has a ticket assigned to it that maybe has some instructions that conflict with team practices it's already learned. This ticket says to use this logging library, but the team is using this other logging library, so it might need to ask for clarification before it continues. A big part of it is deciding when to re-engage somebody before just burning a bunch of cycles doing something that is its best guess.
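The "ask before burning cycles" decision described in this answer can be sketched in a few lines. This is only an illustration, not how the Kiro autonomous agent is implemented: the function names, the dictionary shape for ticket instructions and learned practices, and the example library names are all assumptions made for the sketch.

```python
def find_conflicts(ticket_instructions, team_practices):
    """Return (topic, requested, established) for each disagreement.

    Both arguments are hypothetical dicts mapping a topic such as
    "logging_library" to the value the ticket or the team prefers.
    """
    conflicts = []
    for topic, requested in sorted(ticket_instructions.items()):
        established = team_practices.get(topic)
        if established is not None and established != requested:
            conflicts.append((topic, requested, established))
    return conflicts


def next_action(ticket_instructions, team_practices):
    """Decide whether to proceed or to re-engage a human first."""
    conflicts = find_conflicts(ticket_instructions, team_practices)
    if not conflicts:
        return ("proceed", [])
    questions = [
        f"The ticket asks for {topic}={requested!r}, but the team uses "
        f"{established!r}. Which should I use?"
        for topic, requested, established in conflicts
    ]
    return ("ask", questions)


# Illustrative values only: the ticket names one logging library,
# while the learned team practice names another.
ticket = {"logging_library": "structlog", "language": "python"}
practices = {"logging_library": "logging", "language": "python"}
action, questions = next_action(ticket, practices)
```

Here `action` is `"ask"` with a single clarifying question about the logging library; with no disagreement the sketch returns `"proceed"` and an empty list.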
Kevin Ball
Can we peel back the covers a little bit and talk about how that learning works? There are a couple of different things I'm curious about. One is the pure guts of the implementation, right? Everybody's trying to solve this. The frontier problem right now with LLM tooling is how you make it continuously learning rather than train once and go. There are various approaches floating around, and I would love to know, at least at a high level, what approach you all are taking. The other thing I'm always fascinated with is how that is made legible to humans. Because LLMs, as we know, get things wrong. So whatever mechanism you're using to derive "these are the practices we use," in what form is that bubbled back up to people so they can say, you know what, you learned that wrong, that's not correct, or yeah, this is great, can we expand on this?
David Janacek
Great. I'll give you a couple of examples of how these agents do their learning and how you see it as a customer of them, a user of them. One is in the DevOps agent, the AWS DevOps Agent. It needs to understand what we call topology: your whole system, in all of your test environments and production environments. What your infrastructure is, how your code and interfaces overlay on top of that infrastructure, how your CI/CD pipelines push to it, and how you as operators interact with it and observe it. Given this part of my application, where are the logs? What provider do I use to keep the traces? That topology is a thing that you can see. When you create what we call an Agent Space in AWS DevOps Agent, we visualize that topology for you, so you can see the universe that we have learned about and discovered so far.
I mentioned that part of the DevOps agent runs all the time looking for things to improve. While it's doing that, it's also discovering more and telling you more: hey, I was trying to figure out where your logs are for this thing, and I can't find them. You've told me I should be able to access this, but I can't. So it surfaces the things that got in the way of it learning more as, hey, you might want to resolve this. And so how does somebody correct that? If we are right and we found a misconfiguration, they can reconfigure it. Or if we were looking in the wrong place, they can write what, in the case of the DevOps agent, we call a runbook, which is essentially a set of steering files that are progressively loaded. They include short names and short descriptions that can entice the agent on when to look at them, and the agent can then consult them.
These runbooks can help you say, okay, no, this is actually the observability tool that I'm using for this part of my system. You shouldn't be looking for logs there anyway. You should be looking for logs in this other place instead. So that's one way the agents learn: the topology, in the DevOps agent case, and you can see that visually in the application.
The other one I could talk about is a similar agent that we have called AWS Transform, especially its custom transforms. It's an agent that helps you do a longer upgrade or transformation project, or maybe a repetitive one. Let's say you have the same kind of code transformation, upgrade, or migration that you need to do, rinse and repeat, many times. Maybe you're trying to replatform, move to a new framework, or upgrade to a new language version. That's a real rinse-and-repeat thing. And sure, you could give that to Kiro and have it do it every time, and give it some best practices on how to do that. Giving it a Kiro power specifically for it definitely works. But we've made Transform so that it learns every time it does a transform of a particular kind and writes down tips: oh, this worked really well, this didn't work. So you can write your own custom transform that's just for you, and it's not learning off of others; it's learning from every time you run your custom transform.
The way it shows you what it's learned is that it shows you a bunch of learning documents, knowledge items it calls them, which are essentially markdown files. It's showing you what it learned, and then you say yes, that is correct, or no, that is not correct. You're accepting or not accepting those knowledge items. So that's how it's shown to you and how you can control whether or not it's learning useful stuff.
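The progressive loading described for runbooks, where only short names and descriptions are always visible and the full text is pulled in on demand, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the DevOps agent's implementation: the `Runbook` class, the word-overlap scoring, and the sample runbook contents are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Runbook:
    """Hypothetical stand-in for a runbook or steering file."""
    name: str          # short name, always visible to the agent
    description: str   # short description, always visible to the agent
    body: str          # full text, loaded only when selected


def build_index(runbooks):
    """Return the cheap summary that would always sit in the context."""
    return [f"{rb.name}: {rb.description}" for rb in runbooks]


def select_runbooks(task, runbooks, min_overlap=2):
    """Pick runbooks whose name/description share words with the task.

    Word overlap is a deliberately naive relevance score (an assumption
    of this sketch); a real agent would more likely let the model
    choose from the index.
    """
    task_words = set(task.lower().split())
    selected = []
    for rb in runbooks:
        summary_words = set(f"{rb.name} {rb.description}".lower().split())
        if len(task_words & summary_words) >= min_overlap:
            selected.append(rb)
    return selected


# Illustrative runbooks; the contents are made up for the example.
RUNBOOKS = [
    Runbook(
        "checkout-logs",
        "where checkout service logs live",
        "Checkout logs are in the tracing provider, not the default log group.",
    ),
    Runbook(
        "deploy-rollback",
        "how to roll back a bad deployment",
        "Use the pipeline's rollback stage instead of redeploying by hand.",
    ),
]

index = build_index(RUNBOOKS)
chosen = select_runbooks("find logs for the checkout service", RUNBOOKS)
# Only the index plus the bodies of the selected runbooks enter the context.
context = index + [rb.body for rb in chosen]
```

The design point is that the index stays small no matter how many runbooks accumulate, while full bodies are paid for only when a task appears to need them.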
Kevin Ball
And then digging into implementation pieces: are these markdown files globally available in the context for this agent? Are they exposed progressively, like powers are? How do you think about it as you accumulate learnings over time? Transform sounds like it's honestly very simple: it has one type of transformation it's doing and learning about. But maybe not; maybe it's got a whole bunch of different things. Or if we're talking about a frontier coding agent that's learning all of my team's practices, it could accumulate a lot of documents over time. So how do you control which ones the agent is looking at, and when? You had this great example that I loved: oh, you asked me to use this logging library, but I know that my team uses a different logging library. That's a detail, probably one of many details about coding best practices on this team. How did it find the right one?
David Janacek
It's certainly not "load everything we've learned into the context every time." Progressive disclosure or resurrection of knowledge is super important. Aging out of knowledge is important. It's a mix of so many techniques, from RAG to simple files with summaries. There are a lot of other agent loops, like having agents that are responsible for doing that reflection. I was describing the DevOps agent, for example: it's actually going back every day and looking at the last weeks of issues that it investigated, and it's reflecting on those. The way you see it most as a user is that you're seeing recommendations on how to prevent the same recurring issue, or patterns of issues, from happening. But it's also reflecting on whether or not it took the right path in that investigation, or where it wasted time in the investigation. The trick is that sometimes it's good to waste that time. Next time it might be a different issue, and if you don't explore that branch of an incident response, then you miss out on the thing that actually was the problem this time. So to your question of how we organize knowledge: it's a mix of techniques. We're always learning ourselves the best way to do it. It's moving so fast that by the time I'm done with this sentence we'll probably have a slightly different take on it. But it involves a lot of agents behind the scenes. Ultimately, agents with jobs of learning are pretty good at reflecting on what might be useful in the future and coming up with the right applicability.
Kevin Ball
Yeah, no, and I love that example of having a set of things whose job is to look back over particular time periods, maybe look at particular types of aggregates, what have you, infer what they can, and then expose it or make it useful when need be.
David Janacek
And it's also important to do that reflection during a task, because the nice thing about background agents, frontier agents as we call them, is that they have time. Now, during an incident response we don't have time; the idea is to get that figured out as quickly as possible. But when working on a coding task, we have time. So we can have a bunch of introspection points in the agent where, if it's working on something, it can say, okay, let's examine, one at a time or actually in parallel, all these different elements of what we've learned. That's a neat observation: sure, we don't have time to go evaluate every knowledge item we've accumulated, but we can really ask ourselves, from different subject areas, whether we've considered everything. So it's having the background agent learn, but also having the agent that's doing a thing know what questions it should be asking of the knowledge, and when. And when it has more time to do that, it has more time to reflect on learnings.
Kevin Ball
Yeah, I like that. Can you share some of the different perspectives you might have to take? Because I think that is another very interesting thing with these LLM agents: you can say, oh, I have five different lenses through which I want you to evaluate this thing, and then we'll come back and compare.
David Janacek
There are so many different dimensions on this. One that we make sure we're always reflecting on: AWS has this thing we call the Well-Architected Framework, to make sure that we're building things that follow the learnings over time about what's a good way to build on AWS, or just to build systems and operate them in general. So we reflect on security, for example. Am I opening up anything that we shouldn't be opening up? Am I following the best practices that you've established as a company? Or even things like reflecting on whether or not we're following the testing practices and the observability practices. Just everything you can imagine, particularly if somebody says that something is important. If they say that this is important to them, like "make sure you're doing this," then we should reflect on that to make sure we're doing that thing.
Kevin Ball
And are there ways for teams, for example, to set up one of these agents to say like, okay, here's a perspective I always care about and in fact I'm going to give you some additional resources or things particularly for that perspective.
David Janacek
Yeah, exactly. Like connecting knowledge bases, steering files, even writing powers. One thing we've done within Amazon in our use of Kiro (we're back to the IDE at this point) is we have a steering file that gets bundled up. We have a separate VS Code plugin that connects to our internal build system anyway, and we've had that for decades across the different generations of IDEs over the years. But when it's installed in Kiro, it'll install a default steering file. We've just said, okay, this is something that we always need to have, so make sure you're always following these things. Again, unless a team has overridden it with their own slightly different way of going about it.
Kevin Ball
One thing that you said in passing, but that I think is actually interesting to dig into: you mentioned that the best practices for how to do learning are changing. You know, by the time the sentence is finished, they may have changed. And it's funny, but it's also one of the big challenges of our current LLM coding era: things change so incredibly rapidly that it's hard to keep up with what's going on. So I'm curious, is there anything that Kiro is doing, or that you've found effective at Amazon, for helping us limited human brains keep up with the changes that are happening in our software stacks?
David Janacek
A lot of what we do is educating each other, sharing what we do. We have long-running ways that we propagate and share practices and stories. We do a lot of storytelling: when I was building this thing and solving this problem, here were the important things that I learned, and maybe you should learn them too. We do a lot of these as internal talk series, and we do a lot of them externally too, at our developer conferences like re:Invent, describing how we build things and how others can learn from what we found works well or doesn't work well. It's a ton of sharing information with each other in different talk series that we make sure we advertise and market well. And people learn in different ways. Some people like talks, some people like reading, some people like podcasts. So we try to use a lot of different mediums.
One thing we released before this most recent AI era of generative AI is what we call the Amazon Builders' Library. It's a set of long-form articles that describe how we at Amazon do certain things when it comes to writing software, designing and scaling large distributed systems, and operating them at scale. It's sharing knowledge like that, getting into the nitty-gritty details, like what's the right way to implement a health check when you have a server behind a load balancer. That seems super easy: oh yeah, just respond healthy when you're healthy. Okay, but what does that mean, and what are the downsides? That's actually one of the articles I wrote for the Amazon Builders' Library. It's a long article because a health check, and whatever responds to that health check, is an automatic system that can have surprising behavior when it's running unattended.
Anyway, to your question of how we keep up with all of the changes in AI: talk about it a lot, talk about what we do that works. We try to do that externally for everybody so that you all can learn what we learn, and so we can learn from you all. We try to participate in a community of practice about it. And then we try to think of shortcuts for it, around things like Kiro powers that package up everything that somebody has learned, especially when a company, tool owner, framework owner, or platform owner has an opinion: here are the best ways to use my framework or platform to be successful. I can then share that directly with everybody using Kiro by providing a power that packages up all that experience, that knows all the power-user tips and tricks. So, ways to share knowledge and ways to share your packaging of tools. We find that those are the best we can do so far.
And then around learning systems and training in particular, this is part of why we've been building things like Amazon Nova Forge, which is a service that helps you do model development from early model checkpoints. There are so many services that we've been launching recently that help you do that. When it comes to different techniques, whether it's building a RAG knowledge base or actually training models, we now know that's super important for agents. So we're doing everything we can to provide services that you can use to get started with something that learns, like AgentCore memory. Bedrock AgentCore is a service that helps you run agents. It's something that we use internally; AWS DevOps Agent uses AgentCore because it's a handy way to build, operate, and scale agents in a secure way, with isolation between tenants and everything. One of the features of AgentCore is memory.
It will look at the traces of the agent, all the agent and tool interactions, and it has different strategies available out of the box to compress that knowledge and say, okay, here's what I should store for later. There's the session level, which is how do I reflect on a particular agent run, and then how do I promote things to long-term memory, where I can have long-term lessons distilled and available afterwards. So AgentCore memory is one place where we've been trying to build that: how do agents learn? I know your question was more about how humans learn to use these things, but agents need to learn too. So we need to provide as many building blocks as we can, so that for the complex task of training and learning, we can at least give a head start to everybody who hasn't tried it before.
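The two levels of memory described here, compressing a single agent run and then promoting recurring lessons to long-term memory, can be sketched as follows. To be clear, this is not the AgentCore memory API: the trace format, the `summarize_session` strategy, and the `LongTermMemory` class are all assumptions invented for this illustration.

```python
from collections import Counter


def summarize_session(trace):
    """Compress one agent run into a short list of lessons.

    `trace` is a hypothetical list of (tool_name, succeeded, note)
    tuples. Keeping only the notes from failed steps is a toy
    compression strategy chosen for this sketch.
    """
    return [note for _tool, succeeded, note in trace if not succeeded]


class LongTermMemory:
    """Promotes a lesson once it has recurred across enough sessions."""

    def __init__(self, promote_after=2):
        self.promote_after = promote_after
        self.seen = Counter()   # lesson -> number of sessions it appeared in
        self.lessons = []       # promoted, long-lived lessons

    def record_session(self, session_lessons):
        # Deduplicate so a lesson counts once per session; sort for
        # a deterministic promotion order.
        for lesson in sorted(set(session_lessons)):
            self.seen[lesson] += 1
            if self.seen[lesson] == self.promote_after:
                self.lessons.append(lesson)


# Two illustrative runs that hit the same problem with log access.
run1 = [
    ("query_logs", False, "checkout logs are in the tracing provider"),
    ("run_tests", True, "tests passed"),
]
run2 = [
    ("query_logs", False, "checkout logs are in the tracing provider"),
    ("deploy", False, "staging deploy needs approval"),
]

memory = LongTermMemory()
memory.record_session(summarize_session(run1))
memory.record_session(summarize_session(run2))
```

After both runs, only the lesson that recurred is promoted; the one-off note about staging approval stays at the session level until it shows up again.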
Kevin Ball
Kind of on that note, you're trying to push forward this frontier and make it easier for folks. What do you see like where the frontier is in these agentic coding tools and building agents and this whole space and what's coming over the next. I don't know how far we can project now, but three months, six months, something like that, I guess.
David Janacek
It's funny, everybody's going to have their own slant on this from their prior expertise. But one challenge I see that fits with my own background: it's tricky to tell whether agents are doing things successfully. How effective is an agent at the thing that you've built the agent to do? It's something we sink a lot of time into when we build our own agents, and it's something we've been trying to build primitives for to make even easier. We've been trying to make observability and evaluation of agent trajectories automatic and easier. It's funny how, when it comes to agentic applications, some of the techniques that matter were maybe not as exciting before because they're more subjective. Take a website: are your website users happy? That's actually kind of tricky, a little more annoying than measuring whether my API is returning success or failure. That's a pretty easy signal. Still complex, but "are my website customers happy" is a tricky question to answer. You can infer it, and there are a bunch of techniques, but it depends on the domain. If you have an e-commerce site, are people buying things successfully? If not, there's probably a problem on the site, and people probably aren't happy for some reason. But you have to have a domain-specific business outcome. Same with agents. It's not an API. These are things that users are using to drive a certain business outcome. That's tricky. Some things we see around agents are humans entering a thumbs up or thumbs down, which is a useful signal. It helps. But is the user really going to provide a really detailed description of why they weren't happy? We don't want to overcorrect on that. I mean, it's important and necessary, but I don't want to just rely on the thumbs up, thumbs down. It's kind of the airport bathroom cleanliness button as you leave.
Kevin Ball
Yeah.
David Janacek
Okay, is it clean or not? Right. I don't like that. In fact, I liken it to that because people wrinkle their nose at the idea. I kind of intentionally give a sort of gross comparison because I think what we need to be doing is figuring this out more naturally. What is the objective that people have with an agent, how close is it to achieving that objective, and where it didn't achieve it, why? So evaluation and learning are very related. How do you evaluate whether or not this is succeeding for customers? When it did, how do you make sure that we're good at that next time if there's a new discovery? And if it didn't, how do we make sure we don't go down that route next time? Unless it's just a situational thing where, in an operational investigation, maybe we do still need to check whether it was a bad deployment that triggered the event, even if it wasn't a bad deployment this time.
Kevin Ball
I love that. I recently set a challenge to my team. I said one of our goals for this product should be delightful. You have to figure out how we measure that. Right. But it's like these things are so nuanced and subjective now.
David Janacek
Yeah, it's a nice thing where we can borrow a page from website and mobile application monitoring. Real user monitoring, or RUM as the industry calls it, was certainly appreciated by people doing website and mobile app development and operations, but I think the rest of the industry might not have really appreciated the need for it, maybe because it didn't apply so much to what they were doing. I mean, it could have. But now this brings that kind of technique to the forefront, with new technology to be able to do real user monitoring better. So yeah, it's a nice shift toward understanding whether your customers are happy. That's what has always drawn me to observability, because I used to work on observability in CloudWatch: observability is a lens through which to understand your customers' happiness. And I think that's very true with agents.
Kevin Ball
So we're coming to the end of our time at this point. Is there anything we haven't talked about that you think would be important to leave folks with?
David Janacek
I'd say looking at coding tools is a really great starting point for rethinking how you're building and operating software. But, I guess, one: when you start with a tool, especially if you're new to AI coding tools, you're not going to get what you want from it on the first try. This is a little bit of a story-time thing, but when search engines came out, you weren't just magically good at using them. You had to use the right syntax. Search engines were a little primitive when they first came out too, so you had to learn how to use Boolean expressions to filter out things that were uninteresting to you. But the more you searched, and the more you really introspected on why you didn't get what you were looking for from that search, the better you got at using the tool. And of course the tools get better over time, so you don't need as much expertise. But it's still about introspecting: well, why didn't I get what I wanted? I see that the more people think about that, the more successful they are with a coding agent, any coding agent, like Kiro. Maybe it's, well, why wasn't it able to use this new framework that I made five minutes ago? Okay, well, you need to tell it. That's what steering is for. That's what powers are for. That's what MCP servers are for. So just think about, well, why didn't I get what I wanted? Because other people are getting that. We found that a team did a replatforming of a service that was a 30-people-for-18-months level of effort, and they were able to get it done with something like six people in six weeks. I'm misciting this statistic, but it's a significantly shorter time when the tools are really used in this way.
So my advice for people is that these tools are extremely powerful, but they take learning how to use them, incorporating them into your development practice, and then changing your development practices. These tools can accelerate coding so much that they force the need to change the rest of the practices around the coding. Coding doesn't become the bottleneck anymore. And this is why we've built these frontier agents to handle things beyond the coding, because we don't want to just shift the bottleneck. When you're doing a bunch of production-grade coding, you also need production-grade security pen testing and review. You also need production-grade operations. So you kind of need to accelerate all these things at once. Distilling this into two things: first, keep using tools and ask yourself, well, why didn't I get what I wanted? Because other people are. There's maybe some trick to using a tool in a better way. And second, look beyond that one tool and look for the other bottlenecks that you can speed up to make your life easier, like around DevOps and security.
Kevin Ball
Yeah, absolutely. Well, and I think with the stuff we've talked about today, Kiro is baking in some of the practices that six months ago you had to learn, right? Like, I had to learn to prompt the thing for a spec. There it goes. I love the property-based testing because it fits into this idea that I've been playing with a lot, of forcing the LLM out of its groove. Instead of "here, confirm your," you know, jumping down this confirmation bias loop, it's "go across all of those things." And as we talked about, that concept of what needs to be verified is just baked in there. And we did check while we were talking: there are property-based testing libraries for all sorts of different environments. It sounds like Kiro should be able to just generate tests in whatever environment we're in. So, super cool. I know we've been seeing incredible speedups. It sounds like, yeah, at Amazon you guys are seeing incredible speedups. So I'm excited to see where this goes.
David Janacek
Yeah, same. It's a new frontier. It's the next big speed up in making developers lives easier. In this case, a whole lot.
Podcast: Software Engineering Daily
Date: February 26, 2026
Host: Kevin Ball (K. Ball)
Guest: David Yanacek, Senior Principal Engineer & AI Lead Advisor, AWS
This episode dives deep into Kiro, Amazon’s AI-powered IDE tailored for “spec-driven development.” Host Kevin Ball interviews David Yanacek, a senior engineering leader at AWS, about the philosophy and technical details behind Kiro, how spec-driven workflows can transform AI-assisted coding from creative prototyping to robust, reliable production software, and what the next generation of agentic (agent-based) development platforms looks like at Amazon.
"Spec driven development… keeps all of the agility and fun and iteration that you get with vibe coding, but it adds just enough structure to make it so that you can produce the actual result that you want where it runs as autonomously as possible." — David Yanacek [04:42]
"Essentially a spec is just these three things that you're just chatting about...I just like doing that all up front because then I can just say, okay, go do the tasks." — David Yanacek [09:02]
"A really important invariant is that at most one direction has a green light at a time...property-based testing describes a test in terms of these invariants, which can be generated directly from the spec." — David Yanacek [13:43]
- Uses frameworks like **Hypothesis** (Python) for test case generation, property shrinking, and debugging edge cases.
- This approach both exposes and prevents agent “cheating” (e.g., making tests pass by commenting out functionality).
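The traffic-light invariant quoted above, that at most one direction has a green light at a time, can be checked with a property-style test. The sketch below deliberately uses a hand-rolled randomized check built on the standard library's `random` module rather than Hypothesis, so it has no dependencies and does no shrinking of counterexamples. The `Intersection` class and the `check_invariant` helper are hypothetical and are not taken from the episode.

```python
import random

DIRECTIONS = ("north_south", "east_west")


class Intersection:
    """Toy two-direction traffic light controller (hypothetical)."""

    def __init__(self):
        self.lights = {d: "red" for d in DIRECTIONS}

    def request_green(self, direction):
        # Turn every other direction red before granting green.
        for other in DIRECTIONS:
            if other != direction:
                self.lights[other] = "red"
        self.lights[direction] = "green"

    def set_red(self, direction):
        self.lights[direction] = "red"


def at_most_one_green(intersection):
    """The invariant: at most one direction is green at a time."""
    greens = [d for d, color in intersection.lights.items() if color == "green"]
    return len(greens) <= 1


def check_invariant(make_intersection=Intersection, trials=200, max_steps=30, seed=0):
    """Run random operation sequences; return the first failing one, or None."""
    rng = random.Random(seed)
    for _ in range(trials):
        intersection = make_intersection()
        ops = []
        for _ in range(rng.randint(1, max_steps)):
            op = (rng.choice(("request_green", "set_red")), rng.choice(DIRECTIONS))
            ops.append(op)
            getattr(intersection, op[0])(op[1])
            # Check the invariant after every single operation.
            if not at_most_one_green(intersection):
                return ops
    return None


counterexample = check_invariant()
```

For this controller `counterexample` is `None`. If `request_green` were changed to skip resetting the other direction, the check would return the sequence of operations that led to two green lights, which is the kind of failure a spec-derived property is meant to surface. A library such as Hypothesis would additionally shrink that sequence to a minimal one.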
"The steering files help it stay focused and be able to do things the way that you are used to...without having to repeat yourself all the time." — David Yanacek [21:26]
"These agents are where you are and your team is. So these agents, you interact with them in Slack or whatever team communication tool...and whatever backlog tool you’re using..." — David Yanacek [31:42]
"Progressive disclosure or resurrection of knowledge is super important. Aging out of knowledge is important. It's a mix of so many techniques..." — David Yanacek [39:46]
"…What is the objective that people have with an agent and how close is it to achieving that objective? And where it didn't achieve it, why?" — David Yanacek [52:06]
On Iterative Learning:
"You’re not going to get what you want from [AI coding tools] on the first try...introspecting, why didn’t I get what I want, makes you more successful." — David Yanacek [54:05]
On the pace of change:
"Best practices for how to do learning? By the time this sentence is finished, they may have changed." — Kevin Ball [44:39]
On the future:
"It’s a new frontier. It’s the next big speed up in making developers' lives easier." — David Yanacek [57:50]
On property-based testing:
"I love the property-based testing because it fits into this idea...of forcing the LLM out of its groove instead of confirmation bias...it’s go across all of those things." — Kevin Ball [56:58]
For those new to agentic, spec-driven workflows or Kiro: