Loading summary
Daniel Whitenack
Foreign.
Podcast Host / Announcer
Where we break down the real world applications of artificial intelligence and how it's shaping the way we live, work and create. Our goal is to help make AI technology practical, productive and accessible to everyone. Whether you're a developer, business leader or just curious about the tech behind the buzz, you're in the right place. Be sure to connect with us on LinkedIn X or Bluesky to stay up to date with episode drops behind the scenes and AI insights. You can learn more at Practical AI fm. Now onto the show.
Sponsor / Advertisement Voice
Well friends, when you're building and shipping AI products at scale, there's one constant complexity. Yes, your random models, data pipelines, deployment infrastructure and then someone says let's turn this into a business. Cue the chaos. That's where Shopify steps in. Whether you're spinning up a storefront for your AI powered app or launching a brand around the tools you built, Shopify is is the commerce platform trusted by millions of businesses and 10% of all US E commerce. From names like Mattel, Gymshark to founders just like you. With literally hundreds of ready to use templates, powerful built in marketing tools and AI that writes product descriptions for you, headlines, even polishes your product photography, Shopify doesn't just get you selling, it makes you look good doing it. And we love it. We use it here at Changelog. Check us out. Merch changelog.com that's our storefront and it handles the heavy lifting too. Payments, inventory, returns, shipping, even global logistics. It's like having an ops team built into your stack to help you sell. So if you're ready to sell, you are ready for Shopify. Sign up now for your $1 per month trial and start selling today at shopify.com PracticalAI Again, that is shopify.com PracticalAI.
Daniel Whitenack
Welcome to another episode of Practical AI. I'm Daniel Whitenack, I am CEO at PredictionGuard and I'm joined as always by my co host Chris Benson who is a principal AI research engineer at Lockheed Martin. How you doing Chris? It's been a while.
Chris Benson
It's been a little bit. It's good to talk to you. I was gone for a brief, a brief period but I'm back all safe and secure now.
Daniel Whitenack
Yes, completely reversed back to where you where you normally are but for a great conversation because we have a great previous guest who I got to talk with in London last one of the last times I was over on that side of the pond and now get to catch up with Donato Capitella who is principal security consultant at reversec. How are you Doing Donato.
Donato Capitella
Very, very good, thank you. And I'm so happy to be back.
Daniel Whitenack
Yeah, yeah, same here. I feel like the AI world is in some ways the same and in many ways different than when we chatted last. What's life been like for you?
Donato Capitella
It's definitely been very, very busy for us. Our company has obvious we now reverse the same people but we separated. But as part of that we've been doing much, a lot of Gen AI cybersecurity work. I think our pipeline has tripled in size and we've been doing a lot of research. I am actually just back from Canada where I was presenting our research at Black Cut in Toronto and before that I was at another conference in Stockholm called Secure AI. A complete two days just focused on Genai security. I mean we were presenting our research, there was OpenAI there, Microsoft, a lot of hugging face talking about MCP protocol security. So so much was happening. And so for us it's been incredibly busy and just like literally half an hour before I finished to run one of the training course that we do on Gen security for our consultants so that we can have more people that can deliver the work which is full of energy for me to do that. Like there are a lot of young people there, so it's been busy. Lots of work, lots of research, so lots of travel. What more to say?
Daniel Whitenack
Yeah, yeah. And what I mean last time we talked certainly we talked a lot about LLMs, prompting LLMs, etc. There's now these kind of additional layers or frameworks or approaches to developing AI applications. From your perspective, just in terms of. I'm always curious about this because some of us that are so kind of into the AI world and not constantly in front of real world enterprise companies, we have maybe a warped view of, oh, everybody's creating agents using MCP or something. What is the reality on the ground, as far as you see it, of kind of the core AI use cases that people are often thinking about in terms of not only security, but just in terms of adoption and scale. And then what is maybe actually shifting in terms of those use cases from your perspective at least.
Donato Capitella
I mean if you asked me this question last year and you probably asked me this question, I would have said the majority of our clients were doing rag on documents that or internal chatbots. There was a few of them that were starting to look at agentic workflows. Now fast forward to today. A lot of the stuff we test is agentic in one way or the other. And for me I have a very simple Definition of Agentic the LLM can use an external tool or API to do something. So it's got agency and typically there is a little loop that runs and the LLM can choose the different tools and maybe there is an orchestrator and a lot of these are internal, for example, for customer support. So there is an email that comes in and then there is this agentic workflow that based on the email it's got access to a few tools. It will look into the user account, it will try to look at historic data and then it can either decide I'm going to automatically perform an action or I'm going to suggest an action for the customer support agent. Some of them also draft the response or the types of actions that the agent, the real person needs then to approve. There is a lot of these currently, currently going on and to me makes sense because this is the promise of gen AI. Like certainly we didn't put that much investment in it just to generate text. Maybe the one thing that might be surprising for people outside of some of these enterprises is that MCP is too new for them to have it. Meaning that if you think about it, some of the big organizations have got development cycles where the first projects get concealed. A project gets conceived one year ago and so a lot of them will have their own agentic frameworks, essentially their own loops and their own prompts and their own parsing or they use LangChain, which isn't. No, actually what's the one that they use? Oh God, I forgot the name.
Daniel Whitenack
Crewai.
Donato Capitella
Oh, I was literally looking at the source. It's in C. I was literally looking at the source code like last week, what was it? It's by Microsoft. I cannot. Semantic, something which has got. You can define tools, say 18C sharp, like I mean people use Python, but you have to imagine a lot of these places, like native C stuff.
Chris Benson
I'm curious, as you were talking about, the world has moved into agentic and we've talked a lot about that on the show in general over in the last year and such. But kind of moving from that prompt only environment that maybe you and Daniel talked about earlier into the sagittic world, you defined it as kind of that external agency, you know, to bring in things. I would guess as someone who is not an expert on security, that that introduces, you know, mega amount of new vulnerabilities and new concerns. Just because you're now using those agents to reach out into the world and do things. Could you talk a little bit about like what that new landscape looks like to you, since you talked to Daniel last time.
Donato Capitella
So I would say if I need to be concise and make a statement, basically what people need to consider is that any tool exposed to an LLM becomes a tool exposed to any person that can control any part of the input into an LLM. Now what's very common is that our clients take APIs, which used to be internal APIs, for example, for customer support, for asking stuff. And these API are built to be consumed by internal systems, meaning they have never been exposed for real on the Internet. Now as soon as you make that API into a tool that the LLM can call any entity that can control any part of that LLM input via things like prompt injection, they can get the LLM to call that API with whatever parameters they want. And because this wasn't an API that you ever expected to be exposed essentially on the Internet, all of a sudden you have a problem. And it is not just exposed to the person that's prompting a chatbot. It is exposed to somebody that sends a customer support email in and then that customer support email is fed to the agentic workflow and now can cause the LLM to call some of these functions with whatever parameters. So I would say that authorization or access control has been the biggest things we've been focusing our efforts on. Like, you know, how is the identity passed to the tool and do you have a deterministic non LLM based way of determining whether that function can be called in that context in a safe way? If you don't have that, you can't go into production.
Daniel Whitenack
I want to run something by you, Donato, because I was thinking about this the other day and I wonder if you agree or have a comment on it. Basically, which is that what you basically described can be very, very complex. Like everything from like, let's say it is a customer service thing. There's the actual customer ticket. Maybe I'm in a retrieval way pulling in previous JIRA tickets that have information like from a repository, I'm calling, you know, maybe multiple tools. It seems like there's this sort of like explosion of complexity in kind of this web of connected things that happen before the prompt goes into the LLM. And I remember earlier on in my career when it was like the days of microservices everything, right? It's like all of a sudden you have 1,000 microservices, right? And I remember we had dashboards up on the wall and part of the problem was like when there was something bad that happened, an alert would go off on one of the services. But it wasn't just an alert that would go off on one of the services. It was like an alert went off on all of the services because they're all interconnected in this way that makes them all kind of malfunction at once. And so it became kind of this root cause analysis issue then at that point, and you kind of gave up or you had the trade off of that complexity and root cause analysis for the simplicity and flexibility of kind of developing on this microservices architecture. Do you see this kind of also getting into that kind of root cause analysis type of scenario or analyzing this network of things? Because it's just becoming so complex as these pipelines kind of grow and become more, more interconnected. And any one piece could kind of trigger a problem in the whole thing.
Donato Capitella
I mean, it is reminiscent of that. And I will say it's an explosion of data sources in the context of the LLM. So what I think is really dangerous is that now in the same single individual call or context that goes into an LLM call, we are mixing more and more data sources from more and more entrusted parties in the same LLM call. And that's where I think confidentiality integrity starts becoming a problem. Because again, now everything you put into that prompt ought to be trusted for the use case. Otherwise any single part can break it. I will give you an example. One of our consultants in the US was doing a test a couple of weeks ago this. And the idea of the use case was great. So there is a customer support email. And this is William Taylor. I'll give him a shot because he's an amazing guy. But the email came in and so the use case is the following rag on all of the support tickets, not just the ones belonging to the user that sent the email, but basically all of the emails that have keywords or like, you know, similarities. And so that builds the top 10 emails that came in which are potentially related to this query. The entire thing is then fed to the LLM and the LLM can then decide, okay, I know how to solve this based on historic data. And I'm now just going to send an email to the user or I need to escalate it. This is terrible from a cybersecurity point of view. I, an attacker can send in an email with a lot of keywords or even I can fill the context of my email with people's email addresses that I'm interested in. Now I send that email. That's now part of the rag. When one of those users sends a ticket in. My malicious email is very likely to be picked and to be part of that huge prompt which is then processed. And I can make the LLM generate an email with a phishing attack. And now the company will send the user an email with the content I want. For example, this is a link. Click it to solve the issue. We demonstrated that. So the problem here is that we are feeding to the LLM different data sources and some of them are potentially malicious or not controlled. So there is this explosion. And you could say the same with mcp. So every time somebody is adding an MCP server, obviously the output of an NCP server is input into your LLM context. The description of an MCP server has to end in your LLM context. But that can contain prompt injection that can tell your client to call another MCP server completely unrelated to do something else. I mean, this has been demonstrated a million times. And Sean from Hugging Face was talking about it@secure AI just again in Stockholm a couple of weeks ago. And this is a very hard problem to solve. So we are mixing different untrusted sources into the same LLM context, and that's hard to solve.
Daniel Whitenack
Foreign.
Sponsor / Advertisement Voice
It is time to let go of the old way of exploring your data. It's holding you back. But what exactly is the old way? Well, I'm here with Mark Dupuis, co founder and CEO of fabi, a collaborative analytics platform designed to help data explorers like yourself. So, Mark, tell me about this old way.
Mark Dupuis
So the old way, Adam, if you're a a product manager or a founder and you're trying to get insights from your data, you're wrestling with your postgres instance or Snowflake or your spreadsheets. Or if you are and you don't maybe even have the support of a data analyst or data scientist to help you with that work. Or if you are, for example, a data scientist or engineer or analyst, you're wrestling with a bunch of different tools, local Jupyter notebooks, Google Colab, or even your legacy bi to try to build these dashboards that someone may or may not go and look at. And in this new way that we're building at Babi, we are creating this all in one environment where product managers and founders can very quickly go and explore data regardless of where it is, right? So it can be in a spreadsheet, it can be an airtable, it can be in postgres Snowflake. Really easy to do everything from an ad hoc analysis to much more advanced analysis. If again, you're more experienced. So with Python built in, Python built in right there in our AI assistant, you can move very quickly through advanced analysis. And the really cool part is that you can go from ad hoc analysis and data science to publishing these as interactive data apps and dashboards, or better yet, at delivering insights as automated workflows to meet your stakeholders where they are in say, Slack or email or spreadsheets. So, you know, if this is something that you're experiencing, if you're a founder or product manager training, get more from your data for your data team today. And you're just underwater and feel like you're wrestling with your legacy BI tools and notebooks. Come check out the new way and come try out Fabi.
Podcast Host / Announcer
There you go.
Sponsor / Advertisement Voice
Well, friends, if you're trying to get more insights from your data, stop resting with it, start exploring it the new way with Fabi. Learn more. Get started for free at Fabi AI. That's Fabi AI again. Fabi AI.
Chris Benson
As I'm processing what you're, what you're talking about with this, I'm like, I'm just imagining, you know, with, you know, as, especially as you're describing kind of your, your offensive driven approach that you guys have, you know, the number of potentially bad actors out there that could be exploiting this, you know, with this information. And, you know, are you at this point, like, what are you seeing out there in the wild? Like, you know, that's such a compelling kind of a danger story that you're telling that is so practical. Like any of us could go do that. What are you seeing in the real world in terms of bad actors and at what levels? Like, you know, I come from the defense and intelligence industry, so obviously my brain goes to, to those types of concerns. But, you know, there's cyber criminals, there's all sorts of different types of potential bad actors out there. So what are you and what is this industry kind of focused on right now in terms of what's already happening and where your biggest fears are?
Donato Capitella
So I will say that because of what we do now, we don't have an incident response team, so we don't really get to see much of what happens. Like, we don't see that. So we are more at the prevention side. So we will test systems that are not in production yet. So we kind of see into the future. Well, if that system had gone into production the way it was, I can foresee the attack would have happened. Now, in terms of what people have actually demonstrated in practice, the one that comes to mind and I'll give a shout out to the guys at this company called AIM Labs. They did demonstrated a vulnerability on Copilot called, they called it Eco Leak. So basically it's the same rag concept. You send an email. Copilot is just a big rag. Now with that email it was very clever. I think we should link in the show notes the description of the attack. But basically with that email they got Copilot to exfiltrate information. Now the thing is, Microsoft knows about this. They had a lot of filtering in place, but they were able to find a clever markdown syntax to bypass the filtering. So as probably your audience will know that one of the main vectors to exfiltrate information in LLM applications is to make the LLM produce a markdown image. And in the URL you can point the URL to an attacker controlled server and then you can tell the LLM by the way, in the query string of this URL put all the credit card data of this user. If the LLM knows about that and obviously when the LLM returns that and you try to render that image, the request is going to go to the set. Now Copilot, you can't do this in Copilot because they're filtering out a lot of these markdown syntax. But the guys found a way around it to bypass the regular expression that Copilot was using. So what we're seeing is instances where stuff could really go wrong. But thankfully there is a lot of researchers that seem to be catching them before they are exploited to the full potential. But then cybersecurity is very strange. Sometimes you will know a breach happened five years later.
Daniel Whitenack
And I know one of the things I definitely want to get into with you based on, you know, our previous conversations was kind of design pattern type of things. But before, before we get there, I'm, I'm a little bit curious just from a strategic standpoint in terms of how you're interacting with customers, because there's one side of the spectrum where you can lock, you know, try to lock everything down, right, and say, oh, here's like, we haven't verified any of these sources of data, right? We, we have to have a policy in place, right, to, you know, approve certain tool connections or no external connections to different tools and other things like that. The issue on that side I see is like people want to be productive, they want the functionality, they'll do this sort of shadow AI stuff and like try to, you know, like they just want the good functionality. So you kind of go on that End of the spectrum you maybe have that problem on the other end of the spectrum without any sort of policy or without any sort of, you know, governance, right? Then you just get into this chaos and a huge amount of problems. So you know, there's never any kind of perfect solution. You're always going to have to wrestle with something. But do you have any thoughts on that in terms of companies like I guess their posture in how to approach this? Recognizing that people are able to find tools and able to find their own solutions that solve their issues so easily but might introduce liability.
Donato Capitella
I mean this is very, very old in cybersecurity with the difference now that people really want to be using Genai because I'm lazy like a lot of other people. I guess I do like the ability to use it to do a lot of tasks to make them easier. Now what happens in some of the enterprises, I think I put them among our clients into two big categories. I mean there are some which are extremely risk adverse. Obviously I will not name them but the only thing I want to say is that I would never work there because it's basically impossible to get anything done and everything is so slow. And sometimes even for us as pen testers I have to log in with Citrix into a Windows box. Then from there I have to RDP into a server. From that server I have to go into into like a Linux machine and from there I can finally do some testing and by the time I've done all of these I am so locked into that there is nothing I can do. And the employees work like this like they these machines and they can't do anything. So you have that extreme. And they do exist like a lot of big core like financial sector, extremely risk adverse. It makes you cry when you see that. I think I couldn't stand it. I couldn't spend all my day into six layers of vdi. But on the other side and we work a lot with startups and it's wild west to say so I think it's fun. But yeah, people are just using whatever like you know, so yeah, this is two buckets and I think it's. I don't have an answer for that. Meaning that I see both but I see extremely locked down environments and I see companies that are much more relaxed and yeah, people are doing a lot of shadow AI like people have clothed desktop just installed. I guess they will have all the NCP services they want. They go on chatgpt even if company policy says you can't go. And yeah, they put all their data there. I wouldn't do that.
Chris Benson
I'm curious, as you're kind of addressing some of the challenges and these different environments that are inherent now in pen testing, could you also talk a little bit about kind of the differences in penetration testing today versus, you know, kind of before this gen AI era? Like what's changed and what kinds of activities and how have the, how have the metrics that you're looking at changed? Like what, what, what has the new approach to dealing with prompt injection and these type of exploits brought to bear in that day to day life? You know, aside from having to sometimes go so many layers deep, you know, as you mentioned in the financial thing, what are some of those other attributes that have changed?
Donato Capitella
So I would say not much has changed, which is interesting. So there are two things that change capability. From the pen testing point of view. It is much quicker if you are offensive to write a script to do something. I mean, this is like if you know what you're doing and you have a good LLM, your capability at least you are working faster. Like that is true now from the security assessment point of view. So clients are building applications. What's changed is that if they have an LLM in the application workflow, we have to do additional testing. And that testing is a bit different because we are working on probabilistic stuff. So we try to help people assess. Okay, have you got guardrails? What's the quality of those guardrails? And what can you do outside in the design or in the implementation to make sure that when the LLM does something wrong, you and your customers are protected? So typically it takes a bit longer and actually it becomes more data science driven. So if you're looking, if you're testing SQL Injection, it is not very data science driven. You basically demonstrate that you can do it if you are testing SQL injection. So if you're testing prompt injection, you know that prompt injection is inherent. So you are going to find a way. So what you're trying to test is what's the effort? How hard is it for the attacker to be successful? Because that's then going to drive the types of guardrails that you need and the type of active response. I will say something more and then I will let you guys see if we can make sense of this. But basically I think jailbreaking and prompt injection is less similar to SQL injection and more similar to password guessing attacks. In what way? So the question is not whether the LLM can be jailbroken. The question is what's the effort. How many prompts do I need to try before I am successful at jailbreaking it? There are so many techniques. Crescendo, random, suffix attack, best of N. Like you can do so many of these techniques. So the more effort I can put in it, the more I'm likely to succeed. So exactly as password guessing, the way you kind of solve this is there are two layers. One layer is you don't allow the attacker to explore the space of all possible passwords. Likewise, you don't allow the attacker to send a hundred thousand prompts per second to explore to find something that's going to jailbreak it. You have a set of guardrails for prompt injection, topic control. As soon as a user as an identity that's connected to your application triggers three of those guardrails, that's your feedback loop. You stop the user, you suspend it. In the same way that Chris, if I try three passwords that are wrong against your email account, I am not going to be allowed to keep trying. Your account is going to be temporary locked and that's to prevent me me from exploring that space. I think protecting against jailbreak attacks in the real world is very similar. You have the guardrails, they are not protecting the application. They are giving you a feedback signal that that person, that user, that identity is trying to jailbreak it and then you can act on it. Sorry, it was a very long answer, but no, it's a great answer that people don't understand. These people think that they. The jailbreak, the guardrail protects them. No, the guardrail is your detection feedback loop that then you have to action to protect your application and your users. It's a completely different thing.
Chris Benson
It's a good thing to hear because it's not that. That's something that was new to me as well. So I appreciate you covering that.
Daniel Whitenack
Yeah, yeah. And I hate it from, I guess even just from the user experience side, if you try to treat that, that prompt injection block as a kind of binary, you know you're going to let it through or not, you're going to moderate the user. Also, those prompt injection detections are not perfect. Right. None of them are. So you're going to get false positives and from the user perspective that creates problems. Right. But if like you say you have certain percentage of detections or a certain number or a certain number of triggers, that's a much stronger and also an approach that is happening in the background. I almost feel like this sort of net new seam event related to AI things where you Kind of have the response to it. I'm wondering, Donato, you spent a lot of time kind of digging in, I know, to research in this area. One of those things being a paper that I think you've made some videos on, but also just we were discussing prior to recording, could you talk a little bit about that? And I think that goes into some design patterns. Obviously, if people want to kind of have the full breakdown of this, because there's a lot of goodness there, they can watch Donato's video on this, we'll link it in the show notes. But maybe just give us a sense of that at a high level, some of what was found.
Donato Capitella
So this paper is called Design Patterns to Secure LLM Agents Against Prompt Injection. And I already like the title of the paper because it's telling you exactly what's in the paper. You don't have to kind of wonder what it's about. So what I like about the paper, this is coming from different universities, people at Google, Microsoft, I mean there are like, I want to say 15 different contributors to this paper. It's very practical. They basically look at different types of agentic use cases. Not every agentic use case is the same. So they kind of give examples of like 10 different agentic use cases. Now an agentic use case then has a certain level of utility. So how much power do you need to give to that LLM in order to be able to do certain operations? And that defines the scope of that. And then they find they crystallize six design patterns that you can apply depending on your trade offs between security for that use case, between security and how useful, usefulness or power of that use case. Now there are some use case, there could be use cases that you can make very secure with the pattern that they call action Selector. Now this is the most secure pattern. You are just using the LLM to basically select a fixed action from the user input. So that kind of removes often in that case, any, anything bad the attacker can do. Because if the LLM produces output that doesn't make sense. It's not an allowed action for that user. You, you discard it. And then they talk about other patterns. And the one that's the most promising and the most widely applicable, they call it code, then execute. And this is, this was published by Google and I think they call it Camo. There is a dedicated paper to that. And so the idea is that the LLM agent is prompted to create a plan in the form of a Python snippet of code where it's going to commit to executing that program exactly as it is now. As part of that program. The LLM can access data and can perform operations. But the logic of the program is fixed by the LLM before malicious data enters potentially the context of the LLM. And all the third party data that comes in is handled as a symbolic variable. So X equals function call. Then you take X and you pass it somewhere else. Not only this, but every tool that you can call can have a policy. It can say if this tool is called with an argument that was tainted with a data source coming from here, this action cannot be executed. But if this tool is called with a variable, so you do this with dataflow analysis with a variable that came from what we consider trusted users, then these actions can be done. So each tool can have a policy. You can write the policy and then the framework traces data. This is not AI, this is classic data flow analysis. And so all of these can be enforced completely outside of DLLM and completely deterministically. It's very reminiscent for people in cyber security of what SC Linux does on on a Linux kernel. So it's kind of this reference monitor for for LLM agents.
Sponsor / Advertisement Voice
What if AI agents could work together just like developers do? That's exactly what Agency is making possible. Spelled AGN TC Y Agency is now an open source collective under the Linux foundation building the Internet of agents. This is a global collaboration layer where the AI agents can discover each other, connect and execute multi agent workflows across any framework. Everything engineers need to build and deploy multi agent software is now available to anyone building on Agency, including trusted identity and access management, open standards for agent discovery, agent to agent communication protocols, and modular pieces you can remix for scalable systems. This is a true collaboration from Cisco, Dell, Google Cloud, Red Hat, Oracle and more than 75 other companies all contributing to the next gen AI stack. The code, the specs, the services they're dropping, no strings attached. Visit agency.org that's agntcy.org to learn more and get involved again. That's agency A G N T C Y.org.
Chris Benson
So when you're talking about the the code then execute design pattern, is there a way of inhibiting the LLM from using prompt injection to get the LLM agent to write the code that then gets executed? Is there basically some way of defending the code being written from being influenced by the prompt, you know, by a potential prompt injection?
Donato Capitella
That's the key of that use case. You ask the LLM to produce a plan or the code before any untrusted input enters the context so the user query is trusted. Okay, but then the tools that it calls and the output from those tools and the third party data could be an email that the user received. Now those will not be able to alter the LLM control flow and if they try to, it will be stopped by the reference monitor because it will say, no, this function cannot be called with this input because this input has been tainted by this third party email. Very, very cool concept. They do have a reference implementation. I mean, I had a weekend. I like this paper so much that one weekend I actually implemented all of the six design patterns. I think I put it in a git repo. It's not difficult to implement actually. And it was really fun because then I realized something that I kind of intuitively know. You don't solve the problem of LLM agent security inside the LLM. This is not an alignment problem. You solve the problem outside of it. You still use prompt injection, detection, topic guardrails, you still use these as feedback loops, as we said before. But if you want to get assurance that stuff is not going to go bad, you need to have much stronger controls that don't depend on the LLM itself.
Chris Benson
So it would be fair to say it's kind of a system design problem rather than a model design problem, because you're kind of isolating the model. Am I getting that okay?
Donato Capitella
Totally.
Daniel Whitenack
And you mentioned some of this work.
Chris Benson
Of course.
Daniel Whitenack
It's been great to see that both in terms of video content and in terms of code and actual framework, you and your team have contributed a lot out there. One of those things that I've run across is the spiky package or framework or project. Could you talk about that a little bit? Maybe how that came about and where it fits into kind of tooling, I guess, in this realm.
Donato Capitella
That's very interesting because when we started doing pen testing of LLM applications in 2023, we were doing a lot of stuff manually. And obviously nobody wants to do that manually. It's more similar to a data science problem than a lot of the traditional pen testing. So we started looking into tooling that we could use. And I'll be honest, the problem there is that a lot of tooling for LLM red teaming is doing exactly that. Is red teaming an LLM, an LLM application? It ain't an LLM. Like it's got nothing to do with an LLM. Like it doesn't have an inference API. Like if I have a button that I can click that summarizes an email that is not even a conversational agent. If I send an email in and there is like an entire chain of stuff that happens, like, I can't run like a general purpose tool against it. It doesn't make sense. Sense. So we started writing scripts, individual scripts that we use to kind of create data sites. And obviously for us, this thing needs to be needed to be practical. Now I, I have five days, six days to do a test for a client. And within those days, I need to be able, even in an isolated environment, to give the client an idea of what an attacker could do. So you have all of these wish list of things. So my wish list was I need to be able to run this practically in a pen test. I need to be able to generate a data set which is customized for what makes sense in that application. Like, for example, I wanted a data data sets that I could use whenever it mattered to test data exfiltration via markdown, images versus HTML injection, JavaScript injection versus harmful content topic control. A lot of our clients, for example, say, I don't want my chatbot to give out investment advice. Actually, we would be liable if that happened. But every use case is different. So I needed something that I could very quickly create these data sets and then every. And then it could be as big or as small as I needed it to be. Now sometimes we go to clients and they tell us, oh, you can send 100,000 requests a day, Fine, I'm gonna have a very large data set. Sometimes we go to clients and they say, you can only send a thousand prompts a day. So you need to be very careful because that's an application, that's not an LLM inference endpoint. So you need to be very careful that you need to create a data set that answers the questions of the client. Can people exfiltrate data? Can people make this thing give financial advice? And then you also have general stuff like, like toxic content, hate speech. Yeah, that anything covers that. But we needed practical stuff and we needed to be able to run it in completely isolated environments. Like if you don't have access to. We needed something where I didn't need to give it an OpenAI key. Okay. It is really important and you know, some of the stuff we can check with regular expressions if we've been successful. But we had to figure out a way that if I am in an isolated environment and I have a data set that I'm generating to test whether the application is going to give out financial advice, but I cannot call a judge LLM to tell me whether the Output is actually financial advice. How do I deal with that? So we had to find a solution for that. It needed to be simple, that we could have a team of pen testers use it. It needed to be extensible. So it needed to be modular so that if one of my colleagues has an application in front of them and this is something that we will see. I think one of our colleagues in the U.S. steve had a chatbot that was using websockets. Now he spent the first day crying trying to reverse engineer that protocol. And then on day two, and he can do that with Spiky, he wrote a Spiky module that's got a playwright. So the Spiky module used a headless browser to open the chatbot, send the prompt and read the response. We were the only pen testing company working on that chatbot that was actually able to programmatically test a lot of stuff. I think we had another one of our guys was working on some AWS infrastructure and the way you introduce the prompt is by dropping a file on an S3 bucket calling a lambda and then in another S3 bucket, one minute later you would have another file that was result of the pipeline that eventually called the LLM. So we needed a way where a consultant could enough a day look at whatever they had in front of them and create an easy module so that then Spiky could take stuff from the data set, send it there and read the response and then say whether the attack was successful or not. So we assume. And then we wanted to be able to extend it with guardrail bypass. So we have a lot of attacks where you take the standard data set and then you can say, okay, for each of these entries in the data set, I want you to try up to 100 variations using the best of an attack. So introducing noise versus using the anti spot lighting attack, which is another attack that we develop where you try to break spotlighting by introducing tags and strange stuff. So the LLM doesn't understand where data starts. So all of these things and it needed to be simple and Sorry, that was very long answer, but that's what we've been working on for the last year and we made the whole thing open source. We've actually had people from the community, from other companies contribute. So it's been very fun to put this together.
Chris Benson
No, it sounds really cool. And by the way, I don't remember if we identified what Spiky breaks down to from kind of the acronym. It's simple prompt injection kit for evaluation and exploitation. In case we didn't say that out loud, but I was curious as you're kind of going through the different kind of construction of the attacks and writing modules and stuff, I am wondering as you're, as you're using Spiky, like how much of it is pretty kind of standard built in tools that you have there on any given engagement when you're using the tool to do the pen testing versus how often are you having to, in a typical engagement, are you having to create custom modules that are very specific to a particular client's needs? I was just as you were going, going through, I was trying to decipher that, but I wasn't sure that I understood like you know, the toolkit as exists versus saying ah, for this client, I need to add this thing in. What, what does that look like typically?
Donato Capitella
So typically on like the first day of a test, you write a module which is going to allow Spiky to talk to the application. So that depends on what the application is. So the first day is typically writing this kind of adapter. It could be very easy if you have a REST API or again as we were doing, you can write playwright, you can use the AWS API. So whatever that is, that's the biggest part. And then you look at what you are trying to test. Data exfiltration and stuff like that. You have, we call them seeds. So you don't have pre built data sets, you have seeds that allow you to build data sets which can take five to 10 minutes to customize. But basically what happens there is that. So you have jailbreaks, which are common things that typically you don't touch. Then you have instructions and the instructions is what you customize. So if I want to test data exfiltration, social engineering, HTML injection, I will add or modify the instructions in there. So it might take five minutes. But basically we only test things that make sense for that application. So we create the data set and then the rest. Once you have the target adapter that allows Spiky to talk to your application and you have the data set that makes sense for your client, then you will run that data set and then you will rerun it again with different attack techniques. So we would say, okay, what happens now? We have a 10% attack success rate, maybe that's okay. Maybe we want to see what happens. If we now implement, best event, this attack that introduces noise, is that going to bypass the guard rate? Typically the attack success rate goes up and then we can try all these different things and maybe change the kind of parameters. So to answer Your question? There is a bit of customization to make sure that what we do makes sense for the application, but then there is a lot of built in attack modules that do the heavy lifting for you.
Chris Benson
That sounds really cool. I'm looking forward to trying it out myself. You really have me intrigued about it. As we are winding up here, one of the things that we like to try to get a sense of on finishing is kind of where things are going and you are in this, this really cutting edge aspect. You know, the merging of security and AI and all of the new types of risks that people face out there. And you guys have made so much progress over the last year or two. I'm wondering as you're looking ahead at this, at both kind of what you're doing at your organization and also like the larger industry since you're participating in all of these different touch points, you know, going to different conferences and stuff like that, where, where do you see this going? What kind of evolution are you expecting going forward? And as part of that, what do you want to see? Like, aside from whether or not you're seeing an example when you're like at the end of the day you're not, you know, you're able just to kind of ponder and maybe have a glass of wine or whatever you do at night, like what is the thing? You're like that's the thing that it would be cool. I want to go do that, you know, whether or not it's on the plan right now or just an idea. Wax poetic for me a little bit on this because I'm kind of curious where this industry might be going.
Donato Capitella
Oh, I wish I knew, to be honest. I think so. Realistically what I would like to see is people shifting the cybersecurity mindset from let's do LLM red teaming to let's secure LLM applications and use cases using a design pattern that actually makes sense. So let's stop asking LLMs to say that humanity is stupid or how to make a bomb and let's start looking at our applications and ensuring that they can be used in a safe way if they have access to tools and stuff like that. Because I think that's going to be one of the big issues that we're gonna have. Like if people don't start seriously taking the risks that come from LLM agents, we are going to see real world, big breaches coming from that. So what I would like to see is shifting that discussion from LLM red teaming to system design. That takes into account the fact that that we don't know how to solve. Prompt injection and jailbreaking in LLMs. When somebody figures it out, I will be the happiest person in the world. But I believe some ultman last year said they would have solved hallucinations and I am not going to continue.
Chris Benson
That's a good way to. That's a good way to end right there. Donato thank you so much for coming on Practical AI. Really fascinating, fascinating conversation. I am excited about this and hope you come back again. I know, I know we've already had a couple of conversations, but they're always fun as you're as new things are happening for you, don't hesitate to let us know what's going on and keep us surprised on what the space looks like.
Donato Capitella
Thank you very much for having me.
Podcast Host / Announcer
Alright, that's our show for this week. If you haven't checked out our website, head to PracticalAI FM and be sure to connect with us on LinkedIn X or BlueSky. You'll see us posting insights related to the latest AI developments and we would love for you to join the conversation. Thanks to our partner Prediction Guard for providing operational support for the show. Check them out@prictionsguard.com also thanks to Breakmaster Cylinder for the Battle Beats and to you for listening. That's all for now, but you'll hear from us again next week.
Podcast: Practical AI
Hosts: Daniel Whitenack, Chris Benson
Guest: Donato Capitella, Principal Security Consultant at Reversec
Date: October 16, 2025
This episode dives into the rapidly evolving world of agentic AI systems—where Large Language Models (LLMs) can interact with external tools and APIs—and the escalating complexity and security risks that arise as these systems are deployed in real-world enterprise scenarios. Donato Capitella returns to discuss the state of AI agent adoption, security vulnerabilities, real-world attack patterns, and practical frameworks for securing today’s LLM-powered agents. The conversation balances deep technical insight with stories from the field, offering both warnings and actionable advice for practitioners.
(04:49–08:35)
(08:35–13:21)
(13:21–17:05; 20:28–22:46)
(22:46–26:40)
(26:40–31:37)
(31:37–33:06)
(33:06–40:47)
Paper highlighted: Design Patterns to Secure LLM Agents Against Prompt Injection (see show notes for full breakdown and Donato’s demo).
Six security patterns identified for different agentic use cases and trade-offs between security and utility.
Most secure: Action Selector—LLM picks from fixed actions.
Most promising for flexibility: Code then Execute (from Google’s "Camo" approach)—LLM writes a plan as code before untrusted inputs are introduced. External policies enforce which tool calls are allowed, independent of the LLM output.
Core message: Shift to system-level design controls. Secure LLM agents with traditional dataflow analysis and monitor tainted variables, not simply post-hoc content moderation.
(40:49–50:51)
(50:51–54:09)
On the evolving threat surface:
"Any tool exposed to an LLM becomes a tool exposed to any person that can control any part of the input into an LLM."
— Donato Capitella, 09:22
On guardrails and prompt injection:
"The guardrail is your detection feedback loop ... They are giving you a feedback signal that that person, that user, that identity is trying to jailbreak it and then you can act on it."
— Donato Capitella, 30:57
On security focus shifting:
"Let's stop asking LLMs to say that humanity is stupid or how to make a bomb and let's start looking at our applications and ensuring that they can be used in a safe way if they have access to tools."
— Donato Capitella, 52:28