Patrick Gray
And welcome to another edition of the Soapbox podcast here at Risky Business. My name is Patrick Gray. For those of you who are unfamiliar, Soapbox is where we sit down and have a wholly sponsored chat with a vendor or a startup. And yeah, these things are wholly sponsored and that means everyone you hear in one of them paid to be here. So today we are chatting with Josh Devon. Josh is the co-founder of Sondera and prior to that he was actually the co-founder of Flashpoint, which is a company that many people in the cybersecurity discipline may have heard of. But now, of course, you know, not content with his post-acquisition riches, you know, the life on the mega yacht got a little bit boring, so he's decided to come back out into founderland and start Sondera. Josh, welcome. Good to see you.
Josh Devon
Yeah, no, hey Pat, good to see you as always.
Patrick Gray
Now, in addition to them being a sponsor, I'm planning on doing some advisory work with Sondera, and of course they're a Decibel-funded company and Decibel is a part owner of the Risky Business podcast. But Josh, let's just step through, I guess, how to tackle all of this, right, because it's a big topic. Why don't we start with defining the problem space that you're trying to tackle at the moment, which is AI agents crawling around everywhere doing God knows what. I think we've had a bit of insight into how this can go wrong lately just by looking at what people are doing with OpenClaw. But why don't you walk through, I guess, you know, define the problem space that you're in at the moment and what you're trying to do? Because it is interesting.
Josh Devon
I think at a fundamental level what we're trying to unlock is this agentic era, right? Agents are amazing precisely because they are non-deterministic. And if you have a lot of minions and you can trust them, you can do a lot of things. And while you sleep, the minions can go work. I think that's the future that we want to unlock, and it's where organizations and enterprises are truly going to find the value. The way I think about it is GenAI is kind of like the engine and agents are kind of like the automobile around the engine. We're not going to go into work and have hundreds of chatbots; we're going to have Waymos driving around, and in order to do that they have to be really trustworthy. Now, trustworthy is a combination of reliability, like will this thing drive me to the airport and get me there most of the time, and governance, like can it do it without running over 14 people? Right. It can achieve the goal reliably, but it has to do it in a way that satisfies the conditions that I've set. And I think that's the unlock and the problem space that we're going after. It's how do we put these agents on longer-term missions that can achieve harder and harder goals by laying down deterministic lanes in which they can operate, letting them have that unique non-determinism, which is what makes them so special as a technology, but bounding that autonomy. With humans we have the principle of least privilege in terms of passwords and access, but with agents it's really a principle of least autonomy. How do I continue to give this agent more and more superpowers while continuing to restrict its autonomy as I give it more power? That utility versus safety, regulation, compliance, all those trade-offs are what we're trying to manage with our product, allowing folks to set specific deterministic rules around their agent behavior that they know they can rely on. You know, today we have a lot of what I call prompt and pray, which is we put into the system prompt, please, please, please, and if we put it in capital letters we're like, oh, I really mean you have to follow this every time. And we know that the agents won't, and so that non-determinism is both the risk and the value. We're trying to reduce the risk and cut the tail off. So hopefully that explains it at a high level.
Patrick Gray
I mean, partially, right. I think one of the problems there is that I've asked you to define the risk and, fair enough, you gave me a metaphor which is around cars: how do we stop a Waymo or a Tesla from running down 14 people on the way to the airport? Right. So okay, cool. What is the corporate, like, enterprise equivalent of an agentic AI running over 14 people on the way to the airport? What does that look like inside an enterprise? Brass tacks.
Josh Devon
The biggest risk really for businesses today, I think, is what Simon Willison calls the lethal trifecta, which is any time an agent has access to private data, exposure to untrusted content, and the ability to externally communicate. That means you have an agent that's susceptible to prompt injection and could exfiltrate data. And I think data concerns are really primary when it comes to gen AI. In addition, agents present another challenge to traditional data governance. So not only do you have to worry about the perimeter and exfiltration, but you have this issue of data mutation as well. Agents can go and just mutate data inside the perimeter that your DLP tools, et cetera, might not see. So that's a huge risk. Another risk you have is around identity. And Pat, actually you and I were talking about this a little bit, where if you can't attribute who's doing what with agents versus humans, you can take on a lot of risk as a company. As an employee, I've seen, when it comes to horror stories, people's jobs have been on the line because their agent returned, say, a file that they weren't necessarily supposed to have access to. Who's the insider threat? Is it the human, is it not? I've had banks ask me: say we're using coding agents and a vulnerability gets into the code. Okay, fine, some vendor or we ourselves find the vulnerability. But how do I know what kind of problem I have? Do I have an insider in the developer? Did the agent get hijacked? Did the agent just hallucinate? How would I know? And so having that observability I think is going to be really critical. And if you don't have that, you're taking on the unknown risk of who's doing what in my organization. And I think we've also seen, just as a risk, that our current tooling like EDR, et cetera, is not able to constrain agent behavior, because EDR can't see these logic-based attacks where, you know, I'm prompt injecting an agent to open up a web browser and go to a website and go do a thing. There's no malicious software being detonated.
Patrick Gray
I always call these prompt injection attacks. I mean, it's basically social engineering for robots, right? So you need to use similar defenses as for social engineering, which is limiting what the agent can do, the same way you limit what people can do, right? Like, remove footguns from people and from agentic stuff.
Josh Devon
100%. And I think that's the first-principles thinking that we're bringing to this problem space. Just like with humans, I have to assume they're going to be prompt injected. Humans get phished; agents get prompt injected, same thing. The models can easily get phished or prompt injected. They can have emergent misalignment. I just have to assume this. It's like the other thing we say in security: assume breach. We have to assume prompt injection. It's the only way that you can really do this. The way that you then have to govern these agents is through behavioral controls, just like we do with humans. With humans, we have key cards and locked doors, and we prevent bad ideas from happening through behavioral controls. And that's what we've got to do with the agents, because we have to assume that they'll be prompt injected. And so we're going to stop behaviors that we see that we don't want them to do. So if I prompt inject an agent to, you know, send me $500, I have to assume that the agent will get prompt injected, and I'm going to put a policy against a tool call that says, hey, if you're doing more than $100, you're not allowed to, even if the agent is prompt injected. And I think it's those types of deterministic controls that are going to be necessary to give us the confidence that these agents aren't going to, you know, when you talk about risk, it's like, well, they could just decide to send lots of money to someone if you have an agent that can send money,
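A minimal sketch of the kind of deterministic tool-call guard Josh is describing, assuming a hypothetical send_money tool; the names, the limit, and the guard function are illustrative, not Sondera's actual API:

```python
# Hypothetical sketch: a deterministic guard evaluated before every tool call.
# ToolCall, PAYMENT_LIMIT_USD and guard_tool_call are illustrative names only.
from dataclasses import dataclass

PAYMENT_LIMIT_USD = 100  # hard ceiling, enforced outside the model

@dataclass
class ToolCall:
    tool: str
    args: dict

def guard_tool_call(call: ToolCall) -> tuple[bool, str]:
    """Return (allowed, reason). The model never gets to override this check."""
    if call.tool == "send_money" and call.args.get("amount_usd", 0) > PAYMENT_LIMIT_USD:
        return False, f"payments above ${PAYMENT_LIMIT_USD} require human approval"
    return True, "ok"

# Even if the agent is prompt-injected into requesting $500, the call is refused.
allowed, reason = guard_tool_call(ToolCall("send_money", {"amount_usd": 500}))
print(allowed, reason)
```

The point of the sketch is that the rule lives outside the prompt, so a successful injection against the model still cannot move more than the ceiling allows.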
Patrick Gray
You know, so here's the thing, right? My colleague James Wilson has recorded a solo podcast where he's looked at OpenClaw, and I listened to it yesterday, and there's this really funny moment where he's like, you know, he needed to give OpenClaw access to, like, one of his social media accounts or whatever, but he didn't want to give it access to his browser. So OpenClaw is like, that's fine, just give us these cookies. Yeah, you know, and he gave it the cookies and off it went. Right? So, you know, at no point does the API or whatever, wherever these cookies are being used, know that it's an agent and not a human being. Right? So I dig it that what you're saying is, oh, well, we could put these guardrails around the agents, but how, right? Like, if these things start getting access to API endpoints, right? And they could just ask the user, hey, I'm going to make your life easier, give me that API key. And the user pastes it in. Like, why? I'm guessing this is a problem that you're spending a bit of mental energy on. How do you deal with that?
Josh Devon
Yeah, no, totally. And again, if we go with the first-principles assumption, it has to start with behavior. There are two terms I think are worth calling out for what we're effectively building. One is what the AI/ML researchers call a scaffold. An agent scaffold is what wraps around an LLM and gives it its abilities. So Claude Code, for example, is a scaffold around Opus 4.6, and you're giving it a set of tools and a certain set of instructions, and that makes it the agent. There's also something called a harness. A harness is what we're building, and it basically wraps the agent itself, the scaffold. Effectively what we're doing is man-in-the-middling the entire trajectory, and at every single step we are evaluating it through a policy engine that uses policy as code to verify that the agent is doing something that is allowed. So we can create rules like: if an agent pulls GDPR-sensitive data, it's not allowed to use open web search tools. And if it tries to, we can notice that in the trajectory and then stop the tool call, and we can either outright deny, we can try to steer the agent, so say, hey, you're not allowed to do that, try again, or we can escalate to a human in the loop, depending. But that's one big piece of how we're doing that in real time. And there are obviously more layers to go in there. But that's how we're doing it: deep, full, stateful inspection of the trajectory itself, and then stopping it in real time if it's breaking a policy.
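To make the harness idea concrete, here is a rough sketch of a man-in-the-middle loop with the three outcomes Josh describes (deny, steer, escalate). The agent, policy_engine, tools and ask_human objects are stand-in interfaces invented for illustration, not Sondera's implementation, which evaluates the whole trajectory statefully:

```python
# Illustrative harness loop: every proposed tool call is checked against a policy
# engine before execution. The injected objects are assumed interfaces, not real APIs.
from enum import Enum

class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    STEER = "steer"        # reject and tell the agent to try another approach
    ESCALATE = "escalate"  # pause and ask a human reviewer

def run_mission(agent, policy_engine, tools, goal: str, ask_human):
    trajectory = []  # stateful record of everything the agent has done so far
    step = agent.start(goal)
    while not step.done:
        decision, reason = policy_engine.evaluate(step.tool_call, trajectory)
        if decision == Decision.ALLOW:
            result = tools.execute(step.tool_call)
        elif decision == Decision.STEER:
            result = f"Blocked by policy: {reason}. Try a different approach."
        elif decision == Decision.ESCALATE:
            result = tools.execute(step.tool_call) if ask_human(step.tool_call, reason) \
                     else f"Denied by reviewer: {reason}"
        else:  # Decision.DENY
            result = f"Blocked by policy: {reason}"
        trajectory.append((step.tool_call, decision, result))
        step = agent.next(result)
    return trajectory
```

The key property is that the check sits outside the model: nothing the agent is told, or tricked into believing, changes what the engine will allow.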
Patrick Gray
I mean, isn't one of the issues there, though, that the agents are, you know, non-deterministic? I mean, that's another way of saying that they're sneaky.
Josh Devon
Oh, 100%.
Patrick Gray
And you know, you're going to tell it, oh, you can't do it this way, it's going to try something else.
Josh Devon
Right, so you're spot on. And I think we were talking about this earlier, this is maybe one of the bigger unappreciated risks, and it's kind of like a scaled-down version of the paperclip problem, where, you know, I tell the AI to make paperclips and it decides the best way to make paperclips is to kill all the humans so it can make as many paperclips as it wants. I feel like we have that situation, but using that as an example makes it so far-fetched that it's hard to realize that the risk isn't necessarily that you tell the agent to go do a thing and it just goes berserk and, you know, turns into the Terminator. It's more, like you're saying, that you have a hyper-competent, super eager-to-please agent that is going to find a way. So, for example, it's going to invent,
Patrick Gray
it's going to invent DNS tunneling just to get it done, you know.
Josh Devon
Well, using OpenClaw, for example, we were doing a research extension for that and building out a version of what we're building, with policy as code to prevent OpenClaw from running rm -rf and all these things. And so as I was testing it, hey, try to do this or try to do that, delete this file, delete that file, it found many other ways to delete files. If I blocked rm -rf, it's like, oh, I can't use rm -rf? Well, I can mv to trash, I can use find. There are so many different permutations.
Patrick Gray
One of the things that I love about OpenClaw, and I didn't know this until I listened to James's podcast, is that it manages its own config files as well. Right? Which is like, how do you put a harness on that?
Josh Devon
Yeah, so in a policy pack that we created, we built a deterministic set of rules to protect the system files so that OpenClaw isn't allowed to rewrite its own heartbeat, because there's a Shai-Hulud waiting in OpenClaw, right, where this agent can get prompt injected and rewrite its own rules to, every five seconds, reach out to other systems. There's something that we can see here, and being able to block the behaviors is really where it's at. One of the cool things about our approach, and we've published about this, is that we're using Amazon's policy language Cedar, which has really great properties. One of them is that it's forbid-by-default. So it allows you to deny the entire action space and then specifically allow different types of tool calls, as a way of constraining things for higher-risk situations rather than trying to catch every single edge case. That default denial I think can be really helpful. The other thing that we're doing as part of our harness is we have a simulation piece. I see a lot of people struggle with, you know, just what we talked about at the beginning of this conversation. It's like, well, what are the risks? Well, what does your agent do? So many people struggle with this because each type of agent has specific risks. I can set a rule like don't steal. Yeah, I want none of my agents to steal. But every agent can steal differently. Is it sending money? Is it sending data? What? So we use simulation to test that action space and see where we can find edge cases. Can we get this thing to exfiltrate data? Can we get it to leak tokens? We're not really red teaming the model to try to get it to say a bad word or looking for vulnerabilities; it's what toxic flows and trajectories can we get this agent to produce, and then we use that.
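Roughly what forbid-by-default plus a few explicit allows can look like, sketched in Python rather than actual Cedar syntax; the rule set, tool names and protected paths are made up for illustration:

```python
# Default-deny sketch: nothing is allowed unless an explicit rule permits it,
# and protected paths (e.g. the agent's own config/system files) are never writable.
# This mimics the shape of a Cedar-style policy set; it is not Cedar itself.
from fnmatch import fnmatch

PROTECTED_PATHS = ["~/.openclaw/*", "/etc/*"]          # illustrative
ALLOW_RULES = [
    {"tool": "read_file",  "path_glob": "~/projects/*"},
    {"tool": "write_file", "path_glob": "~/projects/*/build/*"},
]

def evaluate(tool: str, path: str) -> bool:
    # Forbid rules win over permits, matching Cedar's forbid-overrides-permit semantics.
    if any(fnmatch(path, pat) for pat in PROTECTED_PATHS) and tool != "read_file":
        return False
    # Default deny: only calls matching an explicit allow rule go through.
    return any(r["tool"] == tool and fnmatch(path, r["path_glob"]) for r in ALLOW_RULES)

print(evaluate("write_file", "~/.openclaw/heartbeat.json"))    # False: protected path
print(evaluate("delete_file", "~/projects/demo/src/main.py"))  # False: no allow rule
print(evaluate("read_file", "~/projects/demo/README.md"))      # True
```

Because the default is deny, a "creative" workaround like a new delete mechanism fails closed instead of needing its own block rule.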
Patrick Gray
Just hold on before we continue with the simulation stuff, because there's some interesting stuff to talk about there. Let's talk brass tacks again, going back to those two words, brass tacks. Let's talk about what you've actually built. So we know you've got a harness. Where does that harness live? You know what I'm like, man, I like to talk about things in real simple terms. Is it a cloud platform? Does it arrive in a taxi? Does it drop out of a tree? Do you spin it up on an endpoint? Like, what is the thing that you have built?
Josh Devon
So basically the harness connects into a control plane, and the control plane can be deployed in different places. And I should say the harness has a policy engine. That policy engine can be deployed on-prem, it can be deployed as a sidecar, it can run in your VPC, or you can send it to the cloud and we can host it. But we are under the impression that agents are going to be everywhere, even air-gapped, and we're going to need rules that can be applied to them even if they don't have Internet connections, even if they're on a thumb drive or something. So we've deliberately built the harness and policy engine to be deployable very, very easily in all those different environments.
Patrick Gray
So the harness just goes anywhere the agent is. Right.
Josh Devon
Right. The control plane itself can also be hosted wherever: in your VPC, as a managed host we run, or you can host it locally. And that's where your policies live, your policy studio. And what we see is that folks can take their system prompt, so in our system prompts we have a lot of please, please, please, you must never, if it's more than this you must always, all those things that we put in the system prompt. We take all the natural-language policies that enterprises care about. I've had folks ask me, how do I apply my acceptable use policies, how do I apply my employee handbook, how do I apply the EU AI Act? We take all of that natural language and, through a process called auto-formalization, we pull out the logic statements: what are the obligations, permissions and prohibitions contained in this natural language? And we then convert that into policy as code. And again, as I mentioned, we're using Cedar, so we're able to verify this code. We're able to re-simulate, I mentioned simulation, we could talk more about it, but re-simulate to see whether these policies are working the way that we expect them to.
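A hedged sketch of the shape of that auto-formalization step: natural-language policy text goes in, structured obligations, permissions and prohibitions come out, and those get compiled into machine-checkable rules. The extraction stub stands in for an LLM-assisted step and the whole schema is invented for illustration, not Sondera's:

```python
# Illustrative auto-formalization pipeline: natural language -> logic statements
# -> policy rules. The extraction step is a stub; in practice an LLM plus human
# review would populate these structures.
from dataclasses import dataclass
from typing import Literal

@dataclass
class Statement:
    kind: Literal["obligation", "permission", "prohibition"]
    actor: str       # which agents the statement applies to
    action: str      # tool or behavior the statement constrains
    condition: str   # machine-checkable condition, e.g. "amount_usd > 100"

def extract_statements(policy_text: str) -> list[Statement]:
    """Stub for the LLM-assisted extraction step (hard-coded example output)."""
    # e.g. "Agents must never run open web searches on GDPR-sensitive data"
    return [Statement("prohibition", "all_agents", "web_search",
                      "context.data_classification == 'gdpr'")]

def compile_rules(statements: list[Statement]) -> list[dict]:
    """Turn statements into the deny/allow rules a policy engine would enforce."""
    return [{"effect": "forbid" if s.kind == "prohibition" else "permit",
             "actor": s.actor, "action": s.action, "when": s.condition}
            for s in statements]

rules = compile_rules(extract_statements("Acceptable-use policy text here..."))
print(rules)
```

The value of landing in a formal policy language like Cedar, as Josh notes, is that the resulting rules can be verified and re-tested rather than just re-read.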
Patrick Gray
No is always going to be the answer there. Just before you continue, though: when you're using agents from the majors, how do you apply this harness to enterprise-grade stuff that's in the cloud? What does the deployment look like there?
Josh Devon
Sure. As a good example, we've got hooks into, say, Claude Code and Cursor and GitHub CLI, the agents that are most widely deployed right now, in my opinion. And different organizations have different, I would say, quality of hooking. We're deeply, deeply integrated with Claude Code and Cursor, and others I think are beginning to open up their ecosystems where they're currently kind of a walled garden.
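As one concrete example of a hook in this sense, Claude Code lets you register a PreToolUse hook that runs before each tool call. The sketch below is a standalone guard script in that style; the blocked-command patterns are illustrative, and it assumes Claude Code's documented hook contract of JSON on stdin plus a non-zero "block" exit code (2), which may differ across versions:

```python
#!/usr/bin/env python3
# Illustrative PreToolUse hook script: reads the pending tool call as JSON from
# stdin and exits 2 to block it, with the stderr message surfaced back to the model.
import json
import re
import sys

BLOCKED = [r"\brm\s+-rf\b", r"\bmkfs\b", r">\s*/etc/"]  # illustrative patterns only

def main() -> int:
    event = json.load(sys.stdin)
    if event.get("tool_name") == "Bash":
        command = event.get("tool_input", {}).get("command", "")
        for pattern in BLOCKED:
            if re.search(pattern, command):
                print(f"Blocked by policy: matched {pattern}", file=sys.stderr)
                return 2  # blocking exit code; the agent sees the reason and must adapt
    return 0  # allow the tool call

if __name__ == "__main__":
    sys.exit(main())
```

A pattern deny-list like this is the brittle version Josh warns about; the default-deny policy engine discussed earlier is the more robust shape of the same idea.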
Patrick Gray
But basically, is this going to be like Microsoft when they didn't build network taps into Azure for like 15 years or something? Is it going to be kind of like that?
Josh Devon
Well, there are ways. You know, we're talking about what's the deepest integration we can get. We can also use things, and others have done this too, like LiteLLM and proxies and such, so that we can get what we need no matter what. It's just a question of how deeply we can integrate with certain types of agents. But what we've built is meant to
Patrick Gray
Be instrumented: where they've made it easy, you've integrated, and where they've made it hard, you can sort of proxy everything and infer from that, like most of it. Yep, yep, got it, got it. Sorry. All right, so back to simulations. What sort of stuff? You know, I mean, you could call it a simulation. I don't like that word for some reason; it just sticks in my craw. But say we were to call it a backtest, right? Maybe that's a better word, to make me happy.
Josh Devon
I've had engineers almost call it a unit test for the agents. It's sort of, in this action space, how do I test this? And the word simulation, I think we use it because, as we see it, there's a spectrum of simulate, emulate, digital twin, and you can get very sophisticated with cyber ranges and all of that. What we're really trying to do is start at the simplest, fastest way to stress test these agents, and to your point, we call that simulation. But basically we have an adversarial LLM that takes the agent under test and then perturbs it, basically with tool calls. And again, it's specifically focused on the action space. We're not trying to get it to say bad words; it's what risky behaviors can we get it to do. We can monitor for all the bad words and do all of that stuff too, but to us that's sort of the easier problem. It's like, how do we get it?
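A rough sketch of the loop being described: an attacker model keeps perturbing the agent under test, here by tampering with tool results, and a checker records any trajectory that crosses a policy line. Every interface is a stand-in; the point is the shape of the loop, not Sondera's implementation:

```python
# Illustrative action-space stress test: perturb sandboxed tool results with
# injected instructions and record any policy-violating tool calls that follow.
def stress_test(agent, sandbox_tools, attacker, policy_engine, scenarios, rounds=20):
    findings = []
    for scenario in scenarios:                       # e.g. a "refund request email" scenario
        step = agent.start(scenario.goal)
        for _ in range(rounds):
            if step.done:
                break
            verdict, reason = policy_engine.evaluate(step.tool_call, scenario)
            if verdict == "deny":
                findings.append({"scenario": scenario.name,
                                 "violation": step.tool_call,
                                 "reason": reason})
            result = sandbox_tools.execute(step.tool_call)  # sandboxed, no real side effects
            # Attacker model rewrites the tool output to embed a prompt injection.
            result = attacker.perturb(result, objective="exfiltrate API tokens")
            step = agent.next(result)
    return findings  # feeds back into which policies this agent actually needs
```

Running this continuously, rather than as a one-off risk workshop, is what turns the "can you think of a risk, Pat?" spreadsheet exercise into something repeatable.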
Patrick Gray
Yeah, you built a digital devil on the shoulder LLM that can go and try to trick all the other LLMs into doing naughty stuff.
Josh Devon
And what's cool then is that we can, one, test policies that may already exist on that agent to see if they're being effective, and two, we can bring threat intelligence into that simulation as well. So yes, there will be new edge cases, agents will get new capabilities. When a model changes, might the agent try to do something that it hasn't done before? And as we move towards a space where agents are going to be creating their own tools and agents are going to be creating other agents, there's really going to be a need to understand what the potential risks are here. The way I see it today, a lot of folks are just, we get a spreadsheet and we get in a room for the next 18 hours to 18 months and it's like, can you think of a risk, Pat? Pat, can you think of a risk? And that takes a long time. So this, I think, can help jumpstart that and get all the teams aligned. Like, what does security.
Patrick Gray
Well, it's also, you know, with that sort of manual spreadsheet thing, it's sort of like the network graph a bit, I guess. It's like trying to do a BloodHound graph by hand. You're never going to find it. You're never going to find the subtle paths to an LLM doing something that bothers your compliance team. Right. So, with that in mind, tell me what sort of stuff you've found. What sort of stuff shakes out of these simulations? Fine, we'll call them simulations. What sort of changes are being simulated in the first place? And then from there, how do you actually go about setting some guardrails, redoing the simulation or backtest, and then, off you go, deploy?
Josh Devon
Think of simulation as something that can be running all the time. I kind of think of it as: we're constantly stress testing. What comes out is things like, can we get the agent to send more dollars than we want it to? It really depends on what agent is under test, right? Can we get it to leak tokens? Can we get it to manipulate internal data? So really it's taking all of the OWASP-top-10-type things that you might worry about and seeing if we can trigger them. We can also do things like, hey, might this thing go into some infinite cost loop where it's going to try really, really hard and rack up a huge cloud bill, things like that. So there's a whole spectrum of risks that different folks will care about, whether a CTO cares about cost-control risks, or a GRC team needs to know what controls they have against particular risks that are popping up with things like Claude. It's like, can I get Claude to jump directories? And the answer is yes; we've all had to do weird things. So those are the types of risks that might emerge. And then we take all of that, and that's part of that auto-formalization process that I mentioned. What is this risk register? What is the agent, what we call the agent card? It's sort of, what is this agent and its capabilities? Are we hiring an intern with photocopy access or a CFO who can send dollars? We then have a process and a pipeline that takes the natural language for the policies that you want, takes the simulation, takes what the agent's capable of, and then creates bespoke policy as code for every agent that restricts the agent's autonomy in the way that we want for that agent, while still allowing it to succeed. Part of why simulation is also important, especially when you're adding policies: I can make the Waymo perfectly safe by making it unable to drive. It's 100% safe. But we always have this trade-off between utility and safety, and I think simulation can help us make those trade-offs and figure out what policies we might need to mitigate X, Y and Z. And so that process of simulate, decide what rules we need, real-world observability, continued simulation, we see as a virtuous flywheel to continually update the policy as code. Because a challenge that folks tend to bring up is that, unlike humans, like, Pat, you're an incredibly smart person, but you don't come back tomorrow and say, oh, by the way, I learned differential calculus last night and I can do all these things that I couldn't do before. But agents are like that. So you might get a new model release where the agent starts, I don't know, coding its own tools that it didn't do before.
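For the agent card idea described above, something like the following shape is what's being sketched: a declared inventory of what the agent is and what it can touch, which the pipeline combines with simulation findings and natural-language policy to emit bespoke rules. The fields and example agents are invented for illustration:

```python
# Illustrative "agent card": a declared profile of an agent's capabilities and
# blast radius, used as one input when generating its bespoke policy set.
from dataclasses import dataclass, field

@dataclass
class AgentCard:
    name: str
    purpose: str
    tools: list[str]                 # what it can call
    data_access: list[str]           # what it can read or write
    external_channels: list[str]     # where it can send things (email, web, payments)
    spend_limit_usd: float = 0.0     # hard ceiling enforced by the harness
    human_approval_for: list[str] = field(default_factory=list)

intern = AgentCard(
    name="docs-summarizer",
    purpose="summarise internal wiki pages",
    tools=["read_file", "search_wiki"],
    data_access=["wiki:internal"],
    external_channels=[],            # the "intern with photocopy access"
)

cfo_bot = AgentCard(
    name="payments-agent",
    purpose="issue vendor refunds",
    tools=["read_invoice", "send_money"],
    data_access=["erp:invoices"],
    external_channels=["payments"],
    spend_limit_usd=100.0,
    human_approval_for=["send_money"],  # escalate every payment to a human
)
```

Two agents with the same model underneath get very different policy sets because their cards, and therefore their blast radii, are different.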
Patrick Gray
Well, how do you catch that? I mean, how do you catch that? Because you're going to design a policy based on what the agent can do or what skills it shows during the simulation. What if it picks up, you know, like Neo from the Matrix, all of a sudden it knows kung fu. Like what do you, what do you do?
Josh Devon
Then it's, I think that's again why you have to continually be constraining the autonomy of the agent.
Patrick Gray
So default, default, deny.
Josh Devon
Default, deny. But understanding, you know, I don't know, I'm making this up, but if the Waymo got a rocket pack and could start doing new things, well, maybe there are certain areas where you would allow that, but in other areas you're going to constrain it.
Patrick Gray
So it's least privilege, I guess is what you're getting at. It's least privilege. It's sort of like allow listed permission.
Josh Devon
Well, we say least autonomy. Least autonomy, really. Again, a lot of these rules are probably going to stay the same in the sense of, okay, none of our agents are allowed to blackmail. But the way that agents can do this behavior will change, and that's why simulation is going to be so important. And it's going to be really important to be able to put controls around the agents that are outside of the model. Because let's say you've got a fleet of agents doing your business and suddenly this emergent behavior appears because, pick your model, version 17 came out. Well, I can't go to my customers and be like, well, we have to shut everything off and tell all these people to retrain the models, because we don't like it. I have to have external controls around the model that can prevent it from doing anything that I don't want it to. And what we were getting at with agents creating tools, that's programmatic tool calling, right? I give the agent an API spec. And I think if we're seeing in simulation that, in response to, hey, I want you to do this, and I'm blocking another thing, it starts writing its own tools, you might say, hey, does this agent need bash? What do we need?
Patrick Gray
By the way, something you said earlier too, which is directory traversal via Claude. I didn't realize you could dot-dot-slash an LLM. That's quite amazing; everything that was solved is new again.
Josh Devon
Yeah, yeah. No, Claude will sometimes come out. And the thing is, though, to me, I want what we're building to be a green-light product, right? Like, I believe in this stuff. I want these agents running. I want people in YOLO mode on their Claude Code. Like, I've used YOLO mode in Claude Code in a sandbox. It's awesome, right? It's the way that you want it to be. But, you know, for an enterprise, I can't be in YOLO mode. If I have the lanes that I can constrain YOLO mode inside, though, then yes, I want the YOLO. And so this is what we want, right? We want to be able to go to sleep and wake up and it did the thing that we wanted it to do.
Patrick Gray
Yeah, you want to go away, make a cup of tea, you want to come back, that task is finished or nearly finished.
Josh Devon
I mean, and that's where we're headed. I mean, like, that's like, you know.
Patrick Gray
You know, it's not the first time, though. Right. Sorry, I just have to tell an anecdote briefly, which is: my father was a mathematician and he wrote some textbooks and whatever, and he used to have to hand-draw graphs. Then computers came along, and I remember watching my dad sit back with a cup of tea watching Derive, which was the maths package back then, plot out a 3D graph, and he would just sit there in front of the computer with a cup of tea or a glass of scotch, watching it appear. You know, to him, back then, this was the equivalent of people watching Claude write their application. It's just funny when I think about that sort of cyclical nature of technology, you know?
Josh Devon
No, and I bet he was feeling the same thing we are now. When I'm watching Claude, I'm like, why are you so slow, Claude? You know, like, why are you taking so long to do this?
Patrick Gray
No, I mean, I think for him it was just incredible. Like, it was just this incredible thing, you know, I must have. I've seen him. I just. He did a lot of plotting, right? Like it's part of the gig. And all of a sudden, as a mathematician, when all you needed to do is write a few lines in a text file and bang, out came this, you know, this plot that to do it by hand is like, you know, full on. Absolutely full on, yeah. Anyway, sorry, you just triggered a memory there. Had to share childhood memories with Pat Gray.
Josh Devon
Yeah, no, I guess maybe you did learn differential calculus.
Patrick Gray
I mean, my qualification is in engineering, so.
Josh Devon
Yeah, yeah, I know. Well, you know, I was saying the complexity of the problem, I think, for enterprises, and the approach that we're really trying to take here: a lot of folks are figuring out the identity stuff, which agent's which, and the humans versus the agents. A lot of folks are figuring out the gateways, and you've got these MCP gateways and agent gateways, and those are sort of, can the agent start the mission? Folks are figuring out the posture management and all of that. But the space that we're focused on is while the agents are in motion: how am I applying my organization's policies at scale to all these different agents, some of which might be in different countries, in different teams? Just talking to a large bank, for example, they're like, we're in 180 countries. How are we going to prove to regulators and auditors that all of our agents in all these different countries operated according to their bespoke laws and all of that stuff?
Patrick Gray
Well then there's the issue of what people are calling shadow AI. Right. Which is not your wheelhouse. Right. Your wheelhouse is dealing with the sanctioned stuff. Companies like Push Security are actually good at that, because they're in the browser; they're very good at finding when people are using AI agents that are unsanctioned. But that's a huge problem. I think Island, the browser maker, does it as well.
Josh Devon
Yeah, I think, you know, we can work on the shadow agent stuff. It's a little bit trickier because agents are beginning to show up everywhere. So, for example, we were talking about Windows a little bit earlier, but whatever the next version of Windows is, are we going to have agents built into that? You're going to start seeing these things built in natively into the OS, into the endpoint, into the edge.
Patrick Gray
But I mean, that's okay if it's in Windows, because there's going to be some sort of enterprise instrumentation there; fundamentally, Microsoft makes business software. But I get your point, which is that this stuff is going to be everywhere. Now I just want to quickly move on to a slightly different part of the discussion, which is that this is early for you. There have been a bunch of people who've spun up some fairly rudimentary, like, RBAC plays for AI. Some of them have even been acquired; there's all sorts of M&A activity, and it's like, do you even have a complete solution yet? Whereas you're doing, I guess, something more comprehensive that's going to take a bit more time. It's a more complete solution. That said, you've barely even launched yet. You're currently working with design partners at this point. You just mentioned you're talking to a bank that's in 180 countries. Are these the type of organizations that you are trying to serve, these large sort of institutions in heavily regulated industries? Is that the go here? Is that who you're working with?
Josh Devon
Yeah, I mean, we're really eager to solve the hardest problems. And where we see highly regulated enterprises, finance, healthcare, insurance, even some aspects of manufacturing, this is an area where there's so much to gain, but the risks are very high, and we want to be working on solving those. So yes, we're working with some very large enterprises that are pushing ahead with agent stuff. I would say, one, we're talking to security teams that are doing that. Two, we've also been sort of pulled into several of the AI platform teams inside these large enterprises. I believe we're building the right thing. What I find is that other organizations are trying, either they're trying to build some version of this themselves and I think getting stuck around, like, how do you.
Patrick Gray
So this is the same old enterprise play, which is everybody's trying to do it themselves and they don't really have the time, the funding, the focus, the team.
Josh Devon
Yeah, and, well, some of them are doing it, but it's very tricky. A lot of folks are trying to hard-code rules into API calls and stuff like that, but that becomes very brittle and it's hard to do across different agents from different frameworks and such. So we're trying to make it easy to have that single control plane in the enterprise that allows you to apply a single policy to all these different agents, and I think that's going to be really critical to get these things up and running. The other area where we're working with design partners is folks who are building agents to sell into the enterprise, who either will have to expend significant resources on building their own guardrails and attestations, like, this is how our agent isn't going to do bad things, or they can work with us. It's almost like, I don't want to have to build user management if I can use a vendor that gives me user management; we don't get any value by building user management ourselves. So those are folks where we're sort of accelerating the sales cycle: startups, and also enterprises that are building agents to sell to other vendors. So those are our use cases. Really, anyone building agents, or concerned about the coding agents and making sure that they have visibility into them and can put rules around them. Don't commit secrets is a big one. Can we get visibility into whether it's the human or the agent doing things, that's another big one. And so those are some of the early use cases around the coding agents, as well as folks building agents who maybe don't have the same level of expertise that we have.
Patrick Gray
Josh, fascinating. Fascinating to chat to you about all of that, about what you're building. We're going to be talking to you a bunch through the rest of 2026, and I think you're doing your hard launch in a couple of weeks, so all the best with that. And yeah, we'll be chatting with you soon.
Josh Devon
Thanks so much, Pat. It really was a pleasure. I loved hearing your anecdotes as well.
Podcast: Risky Business
Host: Patrick Gray
Guest: Josh Devon, Co-founder of Sondera
Date: February 19, 2026
In this special Soapbox edition, Patrick Gray sits down with Josh Devon, co-founder of Sondera and previously of Flashpoint, for a deep dive into the emerging risks of AI agents in enterprise environments. The conversation centers on the expanding role of agentic AI, the urgent need for trustworthy governance, and what Josh calls “the lethal trifecta” of risks: access to private data, exposure to untrusted content, and the ability to externally communicate. The episode explores the challenges of controlling non-deterministic AI behaviors, the shortcomings of traditional security tooling, and Sondera’s approach to harnessing these agents safely at scale. The tone is engaging, pragmatic, and forward-looking.
Agentic Era: Josh introduces the idea of the "agentic era," where AI agents can autonomously carry out complex tasks, making them both powerful and inherently risky due to their non-deterministic nature.
Trustworthiness is Crucial: Trust comes down to reliability and governance—the agent must achieve goals reliably and within set boundaries.
Principle of Least Autonomy: Borrowing from “least privilege” for humans, Josh proposes "least autonomy" for AI, giving agents power but tightly restricting what they are allowed to do.
Core Enterprise Risk Scenario: The enterprise equivalent of a Waymo "running over 14 people" is an agent exfiltrating or corrupting data, taking actions that can't be attributed to a human or a machine, or spending money outside policy after being prompt injected.
Three-pronged Threat (“Lethal Trifecta”): Access to private data, exposure to untrusted content, and the ability to externally communicate; any agent with all three is susceptible to prompt injection and data exfiltration.
Data Mutation Risk: Beyond exfiltration, agents can mutate data inside the perimeter, potentially bypassing DLP and traditional controls.
Attribution and Identity: Difficulty identifying whether a risky action was performed by a human or an agent—raises challenges in forensics and response.
Gap in Security Tooling: Normal endpoint defense software (EDR) is blind to logic-based attacks like prompt injection, which don’t involve malware.
Agent Harness and Policy Engine: Sondera's harness wraps the agent scaffold, man-in-the-middles the full trajectory, and evaluates every step against policy as code, with the option to deny, steer, or escalate to a human in the loop.
Non-determinism and Evasion: Blocked agents find workarounds (block rm -rf and they move files to trash or use find instead), which is why Cedar's forbid-by-default model with explicit allow rules matters.
Simulations/Adversarial LLMs: An adversarial LLM perturbs the agent under test with tool calls to surface risky behaviors in its action space (data exfiltration, token leakage, runaway cost loops), and the findings feed back into policy.
Policy Auto-Formalization: Natural-language policies (acceptable use, employee handbooks, the EU AI Act) are converted into obligations, permissions, and prohibitions, then compiled into verifiable Cedar policy as code.
Harness Architecture: The policy engine can run on-prem, as a sidecar, in a VPC, or hosted, even air-gapped; the control plane and policy studio can likewise be managed or self-hosted.
Integration with Major Platforms: Deep hooks into Claude Code and Cursor, with proxies such as LiteLLM used where native integration isn't available.
This episode delivers a pragmatic and often witty exploration of the real and rapidly evolving risks of AI agents in the enterprise. Josh and Patrick dissect not just theoretical dangers, but the practical challenges of deploying, monitoring, and governing autonomous agents. The emphasis on continuous policy enforcement, simulation, and flexible deployment models positions Sondera’s approach as both timely and vital for organizations on the frontier of agentic AI adoption.
“I want these agents running. I want people in YOLO mode on their Claude code... but for an enterprise, I can't be in YOLO mode. But if I have the lanes that I can constrain YOLO mode inside... yes, I want the YOLO.” — Josh Devon [28:30]
For those tasked with securing enterprise AI deployments, this episode is an essential listen—or, with this summary, an essential read.