
Discover how AWS leverages automated reasoning to enhance AI safety, trustworthiness, and decision-m
Loading summary
A
This is episode 747 of the AWS.
B
Podcast, released on November 24, 2025. Hello, everyone, and welcome back to the AWS Podcast. I'm Liz here with you. Great to have you back. And I'm joined by a very special guest. I'm joined by Byron Cook, who is a Vice president and distinguished scientist here at aws. G', Day, Byron. How you going?
A
Hey, how's it going?
B
Good, thanks. So you are, you are back. We, we've had you on the podcast a couple of times years ago now, and it was long overdue to have you back because as is the case with many scientific advancements, the uses are not always evident to the general populace. But as technology changes, stuff happens. So obviously we're going to talk about gen AI because you can't not. But first, but first, let's talk about your specific domain, which is automated reasoning, because it is a fascinating domain. Not many people know about it when I speak to customers about the concept. So let's maybe just briefly unpack that and then we'll get into the utility side. So tell us about automated reasoning and why you're so interested in it.
A
Yeah, so, I mean, in the past we would have talked about. The story would have started in the 70s or 80s about people attempting to prove programs. And you know, we talked about practical applications, microprocessor. I got my start in microprocessor verification. Worked on reasoning about genetic regulatory pathways, reasoning about railway switching systems, reasoning about operating systems, and then Amazon reasoning about policies and virtualization and networking and cryptography and storage and all that kind of stuff. But actually the story goes way back. So Turing actually wrote one of the first papers in the space. It's called how to Prove, how to Prove a Large Routine or How to Check a Large Routine. But he shows basically how to take a program, essentially map it into the program semantics, the world of mathematics, like mathematical logic, and then to reason about the mathematical artifact that the program is actually really representing. And we basically use the same tricks today. But then, but then a really interesting thing happened is that with the invention of the digital computer, people right away began to think, well, like, you know, what about like artificial brains? And so the term artificial intelligence was coined at this workshop at Dartmouth. It was John McCarthy, I think was the. Hosted it, and they invited folks from kind of different disciplines from cognitive sciences, logic, you name it, statistics, information theory. Claude Shannon was there and, and they, they termed, they coined that term artificial intelligence. And so I would that. So if you had talked to people in the 60s and asked them, what is artificial intelligence? It's actually highly likely you would have heard about my area, automated reasoning.
B
Interesting.
A
So it's, it's the, it's the logical, it's the manip manipulation of symbols in a logic system to try and deduce, you know, to try and, to try and, and so there's, you know, so the, the sort of running competition battle or collaboration, if you will, has actually been in these different areas where you have those sort of statistical, like a, views based on cognitive science on your neuro neurons and so on, and then views like based on Bayes theorem, for example, and then views based on mathematical logic. And at various points along the different approaches have been ahead or behind.
B
And then I guess they excel at different things too. Like they've got different.
A
Yeah, exactly.
B
Strengths and weaknesses.
A
Yeah, exactly. And so one of the things that happened is there was this notion of AI winters. So basically funding. There was a big pendulum swinging back and forth like there was tons of funding. There was no funding. There was tons of funding. There was no funding. And so what happened in time is that these different researchers got sort of more and more specific about the exact problem they were taking on with their tools. And so the areas kind of diverged and so, so, you know, machine learning, so, you know, later we would call those tools like machine learning, automated reasoning. And so automated reasoning became the kind of domain of like, how do we reason about programs as opposed to how do we like, think about, you know, the reproducing, you know, artificial general ontology.
B
Yeah, sort of an analogy of human thought. And it's interesting because if you think about why this is relevant and why you've been working on this within Amazon for so long as well, is, is we do things at like, huge scale, like, like it's, it's, it's difficult for a lot of folks to understand the scale that we get to operate at. And it's a real privilege to go to do that. But when you get to that sort of scale, it's kind of beyond anyone's purview to figure out, well, how do you know this will work? Like, how do you know unexpected things won't happen? And this is where this discipline, I think, really comes into it, that this is not about guessing or assuming or it's sometimes, okay, like this is about will this happen or will this not happen Type delineation.
A
Yeah, there's like, at the top of my mind, there's sort of three trend lines that drove the adoption of automated reasoning at Amazon. And then you know, I'm sure we'll end up talking about it but then it how it becomes neuro symbolic AI and how that connects to gen AI and agentic AI. But three big drivers pre the general public getting super excited by generative AI. The big drivers were one there's actually a very radical increase in the capabilities of one of the most foundational automated reasoning technologies. It's called propositional satisfiability solving. So there's an international competition of the tools and the tools are getting better and better and better like remarkably so. And there's some real breakthroughs going out there. There's a real breakthrough like you're like around 1999 but then there's some huge breakthroughs like 10 years ago. And so those tools are just getting better and better. So if you encode your problems into the tool formats these tools can handle they're way better able to solve them now than they were before. So that's one trend. The other trend, another trend is that this move to a services model and the view that software is not shrink wrapped, it's operated or and so that drives a different economies of scale situation where you basically are constantly, you're encouraged to constantly rewrite the code. Right. So we, we write some software, we deploy it, we take technical debt. We know that the software probably isn't going to be able to scale to where we think it might go but we wait that we see how customers are using it and then we're constantly rewriting that and so that, that opens up the, the desire for the teams to then to want to replace code or to. They're constantly rebuilding the code and that so that's an entry point for the formal as opposed to other places I've worked where it's like they wrote it perfectly the first time was their opinion. And so you kind of, it's, it's real hard to go and like get. Yeah, exactly. And then the, what's the third thing is, is that with the move to the cloud a bunch of conservative security obsessed customers were kind of, kind of looked at, they looked, they looked at the economics and they liked it a lot but they were nervous about security and that this work became the reason they would come that they were actually excited to get off their New Jersey data centers say right. So they, so they, they want, they wanted to move to AWS A because of the cost but they're like oh wow, if you can so and we had customers say you know we'll move orders of magnitude workload over More to aws, if you can prove the correctness of the cryptography that's super meaningful to us. That's a threat model that we're really worried about. And so basically it drove a bunch of business into aws. Yeah. And so I'd say those three things were the perfect storm. And I was just kind of really lucky, right place, right time, when I joined Amazon.
B
Well, it's interesting too that, you know, often, often highly complex detailed areas of research like this that are shifting all the time, often the jump into the practical side is quite difficult. And what's been interesting to me is to see automated reasoning sort of dip its toe in the water a little bit within aws, but suddenly it's like it's, it's available in the hands of every AWS user. So if you're talking about provable security, you know, if you're using different capabilities, like, you know, trying to understand, you know, what can access my vpc, et cetera, like you've managed to surface this capability in a, in a consumable form. Really. Like, like our customers get to use this every day, sort of unconsciously.
A
Yeah. I mean, my frustration is, is that we're not really even giving them the half of it. Right. Like, so the, the, the, the problem we have is that automated reasoning is a little bit of a misnomer because what there's, there's actually a spectrum. So some tools are actually push button, fully automated, but traditionally the scope of the problems they can solve were rather limited. And so that's, that's what you have with IAM Access Analyzer and VPC Reachability Analyzer, for example. They're based on propositional satisfiability and it's, and its cousin called satisfiability modulo theories, it's a slight extension to it. And those, they have, they have pretty significant limitations about what you can do if you're willing to. Traditionally, if you're willing to put a PhD or a person with a PhD at the keyboard to run these tools, then you could do tremendously more because, because now they could guide the tool to do, do really remarkable things. And the challenge has been that there aren't that many people able to drive that don't have the background. They're hard to manage. You know, just a whole bunch of reasons that it's that it's that reasons, reasons. Exactly. And so that's where, you know, long, long, long. We've been dreaming for a long time about these sort of, I mean, in today's lingo, I would say it this way that language models that are trained over proofs, right. Such that now the language models can run the automated reasoning tools or I might call them theorem provers, mechanical theorem provers. That, that was a dream we had and we were thinking about how to do that like you know, training over all of our papers and archive and that kind of stuff. But the, the, the, what magically happened was that the mathematics community got really into the Lean theorem prover. Lean is Lean. Lean has written, was founded and the sort of the main architect is Leo d' Amora who, who works at Amazon. So the, the world of mathematics embraced Lean when they're communicating with each other about their proofs. But also it allows people not at the top universities to learn the foundations of mathematics and, but Lean caused people to at scale to write down the foundations of mathematics and that became really good training data for language models. And the language models began to get really good at running Lean. And so it's actually like, like DeepMind Deep Seek, like you know, just a whole bunch of model providers are, are actually using Lean like with reinforcement learning for example to, to build models. And so for that reason the models are usually pretty good at running these theorem provers. And so that gives us now that.
B
Like capability that you didn't have before.
A
Yeah, yeah. So that, that's a really exciting sort of piece of the puzzle. So that's kind of changing how we're thinking about a lot of things right now.
B
That's fascinating. It's an interesting, like I said, it's an interesting side effect of the general development of this sort of domain that it's sort of picked up this capability.
A
Yeah. And I guess in the, in the idea of like there's the idea like the world model. Right. Like the, but, but if you, if you think about it like what that's giving you is the real, more real data as opposed to human author data. Right. And the view is that like the, the volume of human author data is actually fairly limited and also flawed. The thing about Ethereum Prover is the data is not flawed and it actually gives you an infinite amount of data. There are an infinite number of statements you can make and prove or disprove with Ethereum prover. And so that becomes incredible training data. And what we're, what we're seeing and what a lot of model providers are finding is that if you use these kinds of tools that you get transfer like they, they are more logical, they're more able to do good recipe design. They're, they're more able to do the sorts of tasks that use the same part of your brain that you would use when you're trying to drive a mathematical proof through.
B
So it's, it's leaning on that, that mental model, if you like, or approach to problem solving for other things, meaning it's better at the other things.
A
What we found is that basically the Lean theorem, there are others, there's Isabel, this hall light, actually all of the develop all of the founders of those tools all work at Amazon these days. But. In the world of neuro symbolic AI or the world of where you train models or reinforcement learning or there's various other ways of integrating these things. That Lean is kind of the lingua franca of that world. And so that's pretty exciting.
B
That's really cool. That's really cool. And so let's, let's, you know, we've been skating around a little bit. Let's, let's jump right in. So obviously generative AI happened. It brought great utility and great capability. And then folks started noticing that it would give you confidently incorrect answers, AKA hallucinations. And so then there became the challenge of, well, I'm getting answers. Sometimes they're right, sometimes they're wrong. How do I know? And I approve or verify. And so a lot of work's gone on in terms of applying automated reasoning checks through things like Bedrock, guardrails, et cetera. Talk to us about how, firstly, how with the lens of automated reasoning, we can think about the data or the answers produced by large language models and how we seek to actually verify truth and incorrectness and hallucinations.
A
Sure. It's, it's worth, I think it's worth pointing out that the, the two times that, I mean we're, we're both Amazon employees. So like. Right. You'll appreciate this, this and hopefully the listener can kind of understand what we're saying. That Amazon, unlike other companies I've worked at, has like all these mechanisms to drive connection to the customer. And to me, before I joined Amazon, that sounded kind of corny, but what I've found is that actually is a big driver for why automated reasoning is successful. Right. In 2014, 15 timeframe, it was the financial services institutions in New York. I happen to be in New York and London. So customers began like they had mathematical backgrounds, they had the need, and they began driving on it and were driving on it hard. And some very influential finance financial services institutions began to approach other financial services institutions and say to them, we need to support this and we need to help Amazon Understand how valuable this work is that's that's brewing inside the company. And that same thing happened actually with hallucination and Gen AI.
B
Interesting.
A
So when Gen AI happened it, you know it was kind of you know the, the board members of most enterprise organization's granddaughter showed them and playing with it over the holiday break. And then by February each C suite leader of every enterprise organization needed to know what their gen plan was. And so then they all you know diverted resources and then formed teams and went and did a bunch of stuff and I began to get calls and again I'm in New York City and London so like the financial services and a bunch of those sort of industries in, in. In these areas up and down 6th Avenue were calling me. And so then I started, I started ended up, I was. My. My calendar was filling with meetings with customers and I was kind of hearing the same thing around but we won't be able to, we. We can't trust this. We can't use it. Like the prototypes are cool but like there, there. There were issues around privacy, issues around sovereignty and then issues around incorrectness due to hallucination and so those kind of became that a bunch of stuff that that scientists including myself began began tackling. So the, so to, to. To kind of answer your question directly how I think about it with, with so we're, we're sort of delving in the area of neuro symbolic ISO. It's sort of the intersection. So then with, with much help from my friends, you know, who are more on the statistical machine learning side jointly with him I've begun to appreciate that there's like multiple ways you can combine these tools. So one is you can use automated reasoning to generate data that you might train over and that would include reinforcement learning. Another way is to just allow the gen to do whatever it wants and then when it's done doing whatever it wants, whatever magic it's using and take that information and then prove or disprove its correctness. And then another way is to integrate them more deeply. So like in the inference step to provide hooks for logical reasoning. And then sort of a cousin to that is with you know, agentic tools where you provide agentic access to the formal reasoning tools and then the it's a multi agent system, they collaborate and then you know, maybe the Ethereum proverb provides a verification result together with a certificate that audit to make sure that the Ethereum prover was really called and that your other agents are just hallucinating that answer. And so those are some different ways and you're kind of seeing. And then when it comes to agentic AI, then you have questions around, well, what are the agents doing? How do we know they're doing the right thing? And then how do, when we compose them, how do we know that we're doing the right thing? And then can we plan over the compositions? And so there's a whole kind of rich area and what we're seeing within Amazon and you know, I can't really speak to all the, the products that'll, that'll come out, but I can speak to automated reason checks and bedrock guardrails. Is that, that the, there's different sort of points in that space and there's different solutions we can put together and we're basically sort of using all of them.
B
Yeah, yeah, it's, that's the thing. It's not, it's not one solution. It's lots of, lots of insertion points. Let's maybe unpack one. One accessible example, if I could put it that way, for folks. So one of the examples we often talk about is like, you know, a generative AI solution that's doing mortgage approval type work. So it's assessing different stuff. The question is always, well, how do I know it's right? Now mortgage approvals is good because there's lots and lots of rules. But the challenge is there's lots and lots of rules. So how does automated reasoning go from there's a whole bunch of financial institution policies around lending to. Is my answer correct? Like help us bridge that gap. Yeah.
A
So the, the statement, I could make a statement that anyone who's worked in a formal reasoning for any length of time would agree with violently. But people who haven't would be super puzzled by. And that statement is that over 80% of the work is figuring out what it is you want to prove.
B
Yeah. Slight advice. Farida Principle lives across the world on everything.
A
So we, I'll give you some examples. You know, pre gen AI, right. So, so we launched IM Access Analyzer, right. It's a tool, automatically takes all your policies, throws them in a big soup, is able to reason at the semantic level about them, is able to prove, is able to prove properties. And then IAM Access Analyzer walks you through an exercise to basically trick you into writing the specification. Right. It is, it is asking you a set of questions and basically building something that we're going to now prove of your system. And ever your system diverts from that spec, then we're going to flag it to you and then we're going to ask you that. Okay. And then the spec evolves. But also what are the semantics of the IAM policy language? Where are they precisely? And that turned out to be a multi year effort.
B
Wow, how hard could it be?
A
Turns out that no one really knew. And it turns out these days if you ask questions hard enough of aws, what's going to happen is they're finally going to pull out someone from the IMX disanalizer team who has a background in mathematical logic, who's then going to be answered. Because who owns it these days are those, is those folks. And so we see that where increasingly people build on that semantics, we're building like within aws we use that semantics for a lot of other purposes beyond IAM access analyzer. And sometimes when teams break the semantics, they're required to roll back the changes. So, so it actually becomes a very important again another God realm of customers is this is, this is the semantics. Same thing with EC2 networking, same story yours years argument. Many meetings, many people in the meeting, many. And so, and I've worked on device drivers, I've worked on biological systems, I've worked on railways and so on. It's always the same that you get three people, three biologists who are working on models of leukemia or cancer. You can't get agreement between the three of them. Right. So it actually turns out just the, what are the ground rules? What are the assumptions that we're making when we do this? Proof. And then what are the important properties? Like is it all data at rest? Okay, imagine like all data rest must be encrypted. Sounds, sounds reasonable. Okay, well what do you mean by encryption? Do you mean like a Caesar cipher? Oh no, no, no. Caesar ciphers. That's totally insecure. Okay, but you just said all data rest. You said encryption encryption. So you didn't really mean when you said encryption, you actually meant something else. Right. And so that, that refinement loop takes a really long time. And so that's now what, you know, your customers who are going to try and do mortgage approvals or family medical leave act, or you know, questions about ticket airline ticket returns policies or HR rules around leave of absence and stuff like that, and how does your stock interact with your leave of absence and so on. Turns out that it's really hard to work that out. It's really hard to figure out all the corner cases. It's hard to get agreement. And often achieving agreement often becomes very controversial because when you're setting policy ahead of time, you're actually making an infinite set of decisions all at the same time. And people begin to find all the corner cases that they don't like the answers to. And so it's very hard to reach a consensus. So we're basically taking, we're giving customers a big stick and they're, and helping them poke the wasp's nest of what is truth. And it's, and it's a, it's actually a very difficult challenge. Add to that the fact that we, because they don't understand the average customer, won't understand mathematical logic, and we don't want to require that they do. We're basically having to maintain a, not a digital twin, like a logical twin. Basically, we, we want to maintain a natural language representation of your rules together with the actual, actual symbolic logical view. And it might be that we want to take the logical view and communicate it to you in natural language, but it might be that we want to take your natural language and convert it into logic. But I appreciate that you don't understand logic and then help you understand what's hidden in that logic and what are all the weird corner cases. And so there's, I mean, we can kind of zoom into all the techniques that we developed for Automated Reasoning Checks Pro, but they're super interesting. And then what we're finding all around the company is that we're realizing that those are actually basic building blocks and we're reusing those, a lot of those capabilities in different parts of the company. And so, so it's, it's a pretty, it's a very, very exciting time right now. Let me say it that way.
B
It is, it is. So, so if I'm, if I'm then a customer, let's, let's keep, let's keep picking away this mortgage approval situation. So if I'm a customer, I'm using, you know, I'm using automated reasoning checks in guardrails, for example, so I can upload now, I'll have documents, I'll have policies, et cetera. Is automated reasoning basically interpreting that and converting that into that mathematical model? For me, is it playing back? Maybe if it's detecting anomalies or, you know, as you say, you know, often these policies are built over time and actually have built in clashes with one another because they've never been applied to this. Like, what do you see in terms of that process from a customer standpoint?
A
There are a lot of things we could do and probably we will end up doing, but at the first launch there we're, we're doing something more limited. So what we're doing is we're taking in PDFs and we are automatically synthesizing our first swing at what we believe the policy might be in a formalism. It's called SMT satisfiability Modular theories. So you have logical constants and you have axioms, basically. And then so you ax. We axiomatize a representation of what we believe the, the document to be saying. We use generative AI to help us do that. So then the question is. Whoa, whoa, whoa. Like you just, you're trying to mitigate hallucination and generative, a generative AI, and that, that same challenge is going to come up when we talk about inference time check. So keep me honest until it's explained that you too. Right, so we use generative AI to help formulate. And so then now, now how do we help you get confidence that, that it's right? Okay, well, what we can do is we can now we have a logical formula with propositional structure, like ands and ors and knots and so on. So we can walk that structure and find interesting corner cases embedded in the logical structure. And then we can find examples in those. Like, we can find like the sort of convex holes, to use a technical term, but the sort of different slices, if you will, of the formula. And then we can find exact witnesses of those formulae and then we can translate them into natural language. And then we can say to you, hey, what would you think about this case? Thumbs up, thumbs down. And if you say thumbs down, then we're going to say, well, why not? And then you're going to give us a bit of text, and then we use that text to take another round of the prompting. And then we're using a technique from IAM Access Analyzer. So basically, IAM Access Analyzer is doing the same thing. We take all of your policies and then we look at all the like, weird corner cases that sort of are up to the parts of your policy. We basically take your policies, put them in a blender, and then get a whole bunch of like, interesting scenarios out of them and then run them by you to say, what about this? What about this? But then we use some techniques from lattice theory to reduce the number of questions we're going to ask you. And it's something analogous to like green eggs and ham, like the, the Dr. Seuss story, right? Like if, if you've already said you don't need grain eggs and ham, you don't need them with, in a car, in a, on a bar, at a bar, or Whatever. Right. So we're able to like scoping down, right? We, we, yeah, we basically use some, those kinds of tricks to, to, to reduce the overhead of that exercise, but it's not trivial like. Right. And so, so we, there's a bunch of how to, how to make that exercise easier and easier to understand and then, and then we can lock that down. And then there's a bunch of things you could do. Right? So you could deploy this in production and then you could record your customers Q&As, for example.
B
Yes, yes.
A
And then if you make a change to your policy later, you could go replay those with automated reasoning checks again to see how that, how the rules change and how you give different answers. And then you can also query, you can also collect customer feedback on the questions, on the answers that they didn't think were right. Or you could audit them and then you use that information to figure out which of the rules you actually have got wrong and sort of make an organizational level decision like do we want to change the rule? Do we not want to change the rule? And it's my hope and belief that in time that we will no longer use PDFs to decide policies. Right. That, that, that we will, that the real source of truth will be mathematical logic and that, and that, you know, people of the future probably won't understand mathematical logic per se, but they will have natural language abstraction. Yeah, exactly. Versions, representations of those. And that they, because of tricks that we have with generative AI, that, that, that distance between those two things becomes much smaller.
B
I think that's what the interesting thing is because we want to get it down to that representation that benefits the ability to process.
A
But without the PhD we have another trick that's very clever. And then there's a whole bunch of science going on about how to do it even better.
B
But I love, Byron, the use of your phrase trick when you're really, what you're really saying is highly detailed, incredibly complex, amazing, interesting mathematical work that's being done, AKA trick.
A
What we do is we, so we, we can do this both at the time when we're translating your documents, but also at inference time. Right. So imagine you are now an airline customer and you're asking a chatbot. Or, or we could talk about how this translates to agent, you know, SOPs and agentic AI and so on, but it's kind of the same question. So the customer is asking a question. How do we map that to mathematical logic at inference time? Right. Because now we don't have the benefit of the like, oh, pump the brakes. Let's watch walk you through a bunch of examples, right?
B
This is real time. This is real time.
A
Exactly. Grandma, Grandma wants to know if she can change her ticket from her first class to a coach ticket. And it's a code share flight and it's, you know, the multiple airlines involved. And she wants to know now. And you can't say we're going to like walk you through a semantics exercise. So how do we do that? Well, we, we actually translate multiple times. So you can take the same thing and you can use different language models, you can use different settings on language models, all kinds of things you can do to basically exercise out the ambiguity in the statement. So if the statement is unambiguous, very often what we see is that we get semantically the same formula. They can be different syntactically but. But they kind of, but they, but they're equivalent semantically. And if there is ambiguity in the statement we tend to get different formulae. And so what we can do now, so we basically imagine we get five translations each from different language models or the same language model, but just call different times and then we use the theorem prover automated reasoning to check that each of the pairs are equivalent. And if they are, then we're very confident we got the translation rate. And if we don't, we get a counterexample. Right. So, so the one translation may assume it's first class or one minute one may one, one, one of the translations may assume you've taken the flight and the other translation may assume you didn't take the flight. And so now we get to go back and we get to ask the customer or the customer's customer. Right. Either the airline or the passenger. Did you mean you've taken the flight or have you not taken the flight?
B
So we seek to solve for that ambiguity that has emerged as evidently.
A
Yeah. And so that brings up all kinds of really interesting cognitive stuff which I'm now I'm talking to a lot to cognitive psychologists and so on. And so there's a whole thing about theory of mind and around how people build mental models about what each other understands and they convey information in ways that's efficient but leverages the shared mental model. And so basically what's happening is at inference time we are trying to build literally a logical representation of the mental model that the customer has or the customer's customer quickly and, and then to such that we can now reason about that formally and then provide evidence as to why the answer is is, is is correct. So that, that's turned out to be very interesting work and it's interesting if.
B
If I sort of just as we come towards the end of the episode here, I'm thinking that throughout what we've been talking about, a lot of it has come down to bringing the capabilities and the, and the, the, the sophistication of automated reasoning. The complexity if you like, to a level that can be accessible to non specialist people, that can be accessed by systems, et cetera. There's kind of this, this ability to introduce it to more use cases because as you said at the start, you don't need the PhD necessarily sitting there all the time. This sounds like it's likely to actually create a kind of phase of hyper growth in the use of this technology. And the advancement of this like this is really interesting to me that suddenly it's just gonna explode into it.
A
So here's the fascinating bit, right? Your, your SMT or SAT solvers, they're MP complete, typically. So that's intractably large, right? That's the famous hard, you know, the hard, hard problems. But those are the easy ones by our, by, by, by our, by our best breakfast like NP complete is practically boring because for us the hard problems are the undecidable ones, right? So, so there are, practically speaking, whether or not it's NP complete or undecidable, what that really means is that the tools will from time to time not be able to answer the question. They can answer. They can be correct. They can. Right? They can say the theorem is not true or the theorem is true and you can rely on those answers. But sometimes they will just either never come back with an answer or they'll just come back and say I don't know. And the, and the. So that's, that is a natural consequence of MP complete or undecidable problems. That's a fundamental computer science, you know, intro, Intro to Theory of Computer Science 101 kind of idea. It's there and, and the challenge is that it's fundamental. So you, the only way to deal with it is to get as close as you can to the customer to appreciate the type of problem they're asking. Because you're trying to statistically cause that case, those cases, you want those cases to be so few that, that, that the customer still benefits from using the tool. And so it, it's a very interesting, it's a good UX and design and customer connection and formal reasoning are comorbid. They live together.
B
They live together. Yeah, it's true and it's interesting, I think more folks are getting familiar with that concept somewhat unconsciously because if you think about folks using these tools in their daily lives, most people know that if you kind of, you know, do a broad one shot statement to your, you know, your chatbot of choice, you're going to get a pretty questionable answer versus if you take the time to think about your domain, think about your scope, think about what you want the tool to do, you'll typically get a much higher quality answer. This is kind of speaking that same thing of scope down and then suddenly good stuff happens.
A
It is amazing to watch our society discover how hard it is to get, get to truth in complex domains. Yeah. And I think we're discovering that as a society and I'm by the way I am in my personal life, right? Like I, I abstract a lot of ways so don't think, I don't, I don't appreciate how hard like the Department of Buildings in New York City is to navigate or like you know, like selling laws or electrical codes or whatever because I just do this all the time. I don't really do anything else. Right. And so, and I too am trying to use Gen AI for those things. Right. Like do I really need the determine. I just do it myself. And so I'm using Gen AI. I'm like whoa, whoa, whoa, whoa like now, now that's going to not meet code. There's all kinds of issues, right. And so it's so the, the defining and reasoning over true and untrue statements and complex domains is incredibly hard. And, and I think that we are as a society really beginning to appreciate that as we, as we try and because we were as we're trying to remove this slow error prone socio technical mechanisms we've had in place with AI systems to help us answer those questions efficiently and truthfully we're really having to dig in as a society about what is truth and how do you encode it.
B
It's fascinating. I think it's exciting. Byron, thanks so much for coming on. I know your time is really short. Thank you for coming on and sharing this with us. I know a lot of our listeners are going to be dusting off old maths books but also relax that they don't have to.
A
Right, exactly. Yeah, exactly. Yeah. That's the other thing is you can nowadays lean and a little Gen AI goes a long way.
B
There's a T shirt right there. Lean a little AI Go log way.
A
Yeah.
B
Thanks everyone for listening. Would you love to get your feedback awspodcast.com is the place to send us any feedback you have. And until next time, keep on building.
Date: November 24, 2025
Host: Liz (AWS)
Guest: Byron Cook, VP & Distinguished Scientist at AWS
This episode explores the field of automated reasoning—its mathematical foundations, historical development, and its transformation into practical tools for security and AI at AWS. Byron Cook, a leader in the field, explains how automated reasoning transitioned from academic theory to tangible products that customers use daily on AWS. The conversation bridges foundational logic, applications in AI (including verification and guardrails), customer-driven challenges, and the ongoing quest to make sophisticated verification accessible for real-world use cases.
Historical Context:
Roots traced back to mathematical logic, notably Turing’s early work mapping programs into mathematics to check correctness.
Automated reasoning is the manipulation of symbols within a logic system to deduce truths—essentially, machines using formal mathematical tools to reason about systems ([01:09]–[03:14]).
"Turing actually wrote one of the first papers in the space...shows basically how to take a program, essentially map it into...the world of mathematics, like mathematical logic, and then to reason about the mathematical artifact."
— Byron Cook ([01:40])
Discipline Divergence:
Why It Matters at Amazon:
Radical advances in propositional satisfiability solvers (SAT/SMT solvers) making previously intractable problems solvable.
The shift to services and “software is never shrink-wrapped”—code is constantly rewritten, presenting ongoing opportunities for verification.
Security concerns from enterprise customers; many moved to AWS specifically because AWS could provide formal, provable guarantees about things like cryptography correctness.
"We had customers say...if you can prove the correctness of the cryptography, that's super meaningful to us. That's a threat model we're really worried about."
— Byron Cook ([07:41])
Accessible Examples:
Tools like IAM Access Analyzer and VPC Reachability Analyzer use “push-button” automated reasoning, allowing customers to benefit without a PhD in logic ([09:14]–[11:58]).
"If you're using different capabilities, like, you know, trying to understand, you know, what can access my VPC...you've managed to surface this capability in a consumable form."
— Liz ([08:26])
Limits and Challenges:
Lean Theorem Prover and LLMs:
The mathematical community’s adoption of Lean (created by Leo de Moura at Amazon) has provided abundant, high-quality training data for LLMs.
LLMs have become surprisingly proficient at driving theorem provers, creating new integration opportunities ([11:58]–[13:25]).
"...the models are usually pretty good at running these theorem provers. And so that gives us now that capability that you didn't have before."
— Byron Cook ([12:01])
The Hallucination Problem:
Generative AI offers immense utility but can produce confident, incorrect answers (“hallucinations”).
Customers (especially in highly regulated domains like financial services) raised major concerns about trust and correctness in GenAI outputs ([13:51]–[16:01]).
"We can't trust this. We can't use it. Like the prototypes are cool but...there were issues around privacy, issues around sovereignty and then issues around incorrectness due to hallucination..."
— Byron Cook ([16:18])
Verification Approaches:
80% of the Work:
The hardest part is defining exactly “what you want to prove.” Translating policies or regulations into precise specifications is the biggest challenge ([19:49]–[21:09]).
"Over 80% of the work is figuring out what it is you want to prove."
— Byron Cook ([19:49])
From Documents to Logic:
Current tools (e.g., Automated Reasoning Checks in Guardrails) can analyze PDFs and synthesize initial logical models of policies using LLMs ([25:27]–[28:29]).
They surface “corner case” scenarios—which are then validated and refined by customer feedback, using lattice theory to minimize repetition (think: Dr. Seuss’s Green Eggs and Ham) ([25:27]–[29:43]).
"...we're giving customers a big stick and...helping them poke the wasp's nest of what is truth."
— Byron Cook ([23:36])
Closing the Human/Logic Gap:
Real-time Translation:
At runtime (e.g., a customer asking a chatbot about ticket changes), systems translate natural language to logic multiple times (across models/settings) to uncover ambiguity ([30:10]–[32:22]).
Disagreement between translations signals ambiguity; system prompts for clarification.
"If the statement is unambiguous...we get semantically the same formula...if there is ambiguity...we tend to get different formulae...so now we get to go back and ask...Did you mean you've taken the flight or have you not?"
— Byron Cook ([31:06])
Cognitive Insights:
Democratizing Access:
Dealing with Hard Problems:
Even with powerful tools, many verification tasks remain undecidable (fundamentally unsolvable in general); good UX and customer tailoring are essential to get value despite formal limits ([33:48]–[35:27]).
"Good UX and design and customer connection and formal reasoning are comorbid. They live together."
— Byron Cook ([35:22])
Societal Lessons:
As AI and automated reasoning scale into more domains, society is collectively learning just how complex “truth” is in real-world systems ([36:02]–[37:20]).
"It is amazing to watch our society discover how hard it is to get, get to truth in complex domains."
— Byron Cook ([36:02])
Byron Cook’s deep-dive traces automated reasoning from its logical foundations to its essential, everyday role inside AWS’s infrastructure and in the future of trustworthy AI. AWS is bridging the gap between formal mathematical verification and practical, accessible tools—enabling customers to trust, verify, and continually improve AI-driven systems at scale. The episode underscores the importance—not just of technical breakthroughs, but of designing systems that are understandable, maintainable, and respond to real-world ambiguity and change.
For more feedback or questions: awspodcast.com