Loading summary
A
Foreign.
B
And welcome to this sponsored interview here in the Risky Bulletin feed. My name is Patrick Gray. What you're about to hear is a conversation between Casey Ellis, who does some interviews for us here at Risky Biz hq, and Keith Hoodlett, who is the director of Engineering for AI Machine Learning and AppSec over at Trail of Bits. And this is, you know, this is just a really interesting conversation. As many of you would know, Trail of Bits actually just finished second in the DARPA AI Cyber Challenge, and that's where people contestants had to develop AI systems that would basically find bugs in code and then write patches for them. Their implementation of that was called Buttercup, and as I mentioned, it came second, but it's all open source, so you could go find that if you just search for Trail of Bits Buttercup. But the point is, you know, they're kind of proving that they're a company that know a fair bit about AI. And in this conversation you're going to hear Casey and Keith talking about how prompt injection is essentially an unsolvable problem. And you know, Keith is really compelling here, actually. They also talk about some offensive and defensive work trailer Bits have been doing around MCP servers, like Model Context Protocol servers, and also what organizations should be thinking about when it comes to securing their own AI implementations. All in all, this is just a very interesting conversation and big thanks to Trail of Bits also for sponsoring everything you're going to hear in this Risky Bulletin feed this week. So I will drop you in here with Keith Hoodlett from Trail of Bits talking all things AI. Enjoy.
A
If you think about where we are on this journey, it's very early days for the implementation integration of this technology in browsers, in IDEs, in software development tools, you name it, even just the chatbot element, the people that are using it to do customer service and things of that nature. Even though it's been a few years since the technology's come out, we're also just seeing people really trying to adopt this at a scale that we just haven't seen before. And unfortunately the attack vectors, especially prompt injection, continues to be the thing that stands out as a probably unfixable problem long term. And so we're continuing to band aid and build walls around the way that these tools work. But attackers do what attackers do. And so some of the recent research we did around this was actually just published recently on Scaled Images as a prompt injection vector. So in this case, what we were able to do. So Suha Sabi Hussain and Kikimora Morozova to the engineers on the team here at Trail of Bits basically identified that they could hide prompt injection texts inside of images. That when using scales of an image, like downscaling or upscaling of an image, there's processes that happen that allow the pixels to sort of line up to the right color scheme. And those are predictable algorithms in terms of the way that you can actually take a pixel and take the 4x4 pixels around it and determine like, what's the most important color to downscale this image appropriately. And so what we were able to do is then be able to predictably determine the color and where the text needed to be, such that when an image became downscaled, it would add text onto the image and that text would be a prompt injection that then LLM would read. And then suddenly it gets hit with a prompt injection. But the default image, by itself unscaled, looks totally safe.
C
So it's like a really fancy form of steganography with prompt injection as the kind of the goal.
A
Yeah, that is a really, really great example or term to use. I hadn't even personally considered that. Yeah, it's sort of like steganography in the sense that you're hiding things inside the image. But in this case, we showed this by using like a dark text or a dark image of some kind. And you hide the text in the image, right. And then when the downscale happens, it changes the background color to like a reddish tint, but then the dark text sort of pops out at you. And what we found with a lot of the different tools able to insert an image into for prompting purposes was that they, for the user side, would not see the scaled image that was actually getting sent to the large language model in the backend. And so because of that, now suddenly you have a human thinks the image is fine, image gets uploaded or inserted. Boom, prompt injection. And then you can do all sorts of crazy things like exfiltrating data and what have you.
C
Yeah, so it's interesting because to me, kind of calling back to your title trailer bits, the combination of AIML and then AppSec. Historically, AppSec's taught us that input validation is kind of important. And then obviously validating and kind of putting limits around how you're processing input is fairly important too. Right. That image scaling attack sounds like a really neat way to create an input. Something that you said before was just around this idea that prompt injection is kind of technically unsolvable. And I think I was actually at the same event with you a little while back where someone from one of the foundational models, who was in a pretty good position to have this opinion, said a similar thing. Do you want to just go into that a little bit more? Why is it so hard to actually solve this?
A
I think the biggest challenge with large language models when it comes to just the way that you interface with them, is they're built on a way that they have a hard time to discern data from input. And so this is where I think some of the earliest prompt injections have happened, where it's like suddenly it's rewriting the system prompt by just saying, ignore all previous instructions. I think we're all beyond that point of the level of simplicity for prompt injections now. But even so, it's this concept where the large language models, the way that they're built, the way that they operate, is around human text. And so even when you start to insert things like tokens that are maybe just esoteric items that are not really intended for text, but the large language model is trying to interpret them in way that it's like trying to be helpful is maybe the best way that I can put it. And so in that sort of like way in which it is instructed to operate, it can break down in the way that it then calculates what the actual token space should be, that it should perform inference on what the response should look like as a result. And so we've seen a lot of success in the jailbreaking community where they play against what's considered getting into math here. But gradient descent, which is within the inference window that you're looking at, you effectively have a gradient where maybe it's an image of a cat, for example, and on one side of that gradient, the image of a cat is represented as a cat in a vision model, whereas on the other side of the gradient, maybe it's, I don't know, a train or a tiger or a lion, like house cat versus something else. And finding where that gradient is and where those edges are, and then pushing the model into spaces that are not intended but are reachable is effectively what prompt injection is trying to do, is it's trying to push the model out of its contained environment and make it respond or perform actions or read data and respond with that data that it shouldn't be, that we're trying to prevent you from reaching. And that's what guardrails and all sorts of Llama firewall or Nemo guardrails, or what have you are trying to accomplish, which is containing or constraining your ability as an attacker. And to push the model into performing actions on your behalf that it shouldn't be performing.
C
Yeah, that makes sense. And coming back to the kind of fundamental unsolvability issue, the models themselves are designed to be probabilistic in how they respond, not deterministic. So there's always going to be some sort of degree of risk around this type of issue. Is that kind of what you're getting at there?
A
Yeah, absolutely. And so it's to your point, right? The input validation, output sanitization, these are all things that I think we've talked about for a very long time in application application security. And they're just as relevant here in the world of AIML security, because if you're not logging how users are actually interfacing with your large language model and you're not monitoring for drift of the model itself or for outputs that seem like they're wildly out of line with the way that you're intending to have the model work, if you're not detecting on that, you're missing the attacks that are happening in the wild today. And I think so many companies out there are moving very fast and unfortunately things are sort of being broken around them without their knowledge.
C
Which is funny because when you think about the onset of new technologies on the Internet, there does seem to be this pretty predictable pattern that the more quickly and the more kind of hypey the onset of uptake of a new tech is, in general, the more poorly it's implemented. From a security standpoint, that tends to be kind of a technical debt trash fire at some point in the future.
A
Yeah, there was a MIT Sloan paper recently on that, which is effectively AI, and especially AI generated code is moving so fast that it's building technical debt faster than the humans can actually address that problem. And so you're seeing outsized impacts in the way that control flow, authentication, all sorts of other pieces of applications, as they're being built with AI generated code, are also creating separate problems as a separate part of the discussion.
C
But yeah, yeah, which, which is, you know, to me that's fascinating because it's a pattern that we've seen before. We saw that with Iot. We saw it with cloud, you know, we've seen it in like web1. Even this idea of something getting kind of becoming the new hotness or hitting, you know, the Internet or the market is the new hotness. Everyone kind of rushing to implement it. And then in the process, you know, there's a whole bunch of things that we kind of realize after the fact that have created security and risk Issues. And this seems to be almost a version of that that's on crack in some ways.
A
Yeah. And the attack vectors, I mean, I think you may have actually been the one who used this analogy with me at one point, but over time I've heard the analogy used where it's like you have a predator, prey relationship when it comes to attack. And defensive systems and predators in the wild need to evolve faster than prey animals in order to eat and survive. And so I think what we've seen in this environment with large language models and AI in general, especially coming back to the image scaling prompt injection vector as just an example, where you see this mass adoption of this technology, of this capability, of these multimodal models in terms of the way that they operate. It's an excitement by the large companies that are building this technology. And yet here we have folks like members of my team who very creatively think of, okay, well, what's actually happening in the background in the way that this is engineered to then drive its prompt injections all over the place. I feel like sort of woody in that meme from Toy Story. You know, it's prompt injections, prompt injections everywhere. Well, it's in your images, it's in your text, it's, you know, in your voice commands. It's everywhere.
C
So if there's a couple of other attacks that you guys have kind of found over the past period of time that you wanted to bring up, like, let's do that, Like, MCP is one example that's sort of gone from not really being a thing to being the thing that everyone's doing and implementing really over the past six months. Like, what's some of the work that you guys have done there?
A
Yeah, so we've got a couple of different things, especially in the MCP security space. The biggest thing that we did is back in late April, May, we dropped a bunch of different MCP attack vectors, one of which we call line jumping, which is effectively, you're using the tool description as the prompt injection vector, which is trusted automatically by the MCP server when it connects, because it has to understand what the tools can be used, how it operates. And so because of that, we immediately jumped on a bunch of different attack vectors, like antiterminal codes or all sorts of other things that you could use to then hide prompt injection vectors inside of the actual tool description itself. So it looked benign to the human, but it clearly was malicious when connecting. In our case, we also just recently released, gosh, maybe a few weeks ago now, MCP context Protector. So it's basically a wrapper tool as an MCP server itself that proxies all of those communications so that you're looking at the tool description. Okay, well we can now see if there are prompt injection in there and block it. We can look for anti terminal codes and escape them properly. So now as the user, you're reading it and you're saying what are all these escape characters? What is happening here? And so it gives you that sort of this is sus feeling. But then on top of that we also built in the ability to add guardrails. So like llama firewall or others where you can actually add in detections against like hey, there's a malicious prompt injection in just normal text here that maybe the human would miss from reading it. And so it's got some built in capabilities for quarantining to say like this looks sus human. Please validate before we, you know, send this command back or send it on far at all.
C
Yeah, that's very cool. So you know, it's sounding a little bit like the Wild West.
A
Yeah, just a little bit is an understatement.
C
Like AppSec circa, you know, 2008 or just the Internet itself, like circa 2001. Kind of a redux on that I guess. You know, kind of. As we, as we wrap this up, what would your like if there's one piece of advice that you could give to, to folks, you know, on, on the practitioner side or even on the leadership side that are either implementing this type of thing into their environment or are aware of the fact that it's happening whether they like it or not. Like what's, what's the, the piece of advice that you give them on how to, you know, get ahead of this I guess, as best as possible.
A
Yeah. I think the biggest thing of course is to test and validate, benchmark as much as possible for common behaviors. The standard response that I've been giving to a lot of folks is log, monitor and alert is continuing to be a thing that you need to do when it comes to the way that people are using or interfacing with your implementations of AI, as well as the outputs that are coming from your AI to make sure that they are, maybe it's just a sampling, but that they are consistent with what you're expecting to see. Because in terms of actual preventions, they're both difficult to implement and difficult to sustain at scale because we're just so early days on it. I mean, if you think about to your point of 2009 versus 2001. It's like, well, at some point site reliability engineers became a practice in large businesses. Maybe it was Google or it was Amazon or what have you. And then they became adopted everywhere. And I think when you look at the Google's Amazon Anthropics, OpenAI's, Microsoft, et cetera, they're on that leading edge of understanding the problems that large language models can introduce into their workflows or into their capabilities as a company. And they're very quickly specializing on defense. And I think a lot of people around the industry, whether it's VC backed startups or even just Fortune 500 companies that are a little bit slower to adopt new technologies like this are way behind the bell curve in terms of their ability to understand that this attack vector is going to cause downstream impacts, whether it's confidentiality of information or it's actual actions that are being taken on behalf of the user by way of large language models in an agentic sense. And so we're moving very fast. But I think you can't know if something is broken if you're not testing it. So definitely get out there and make sure that you're working with someone like TrailOfBits to validate that this thing does not not have wide open tunnels into very sensitive information within your environment. But then on top of that, make sure you're logging, monitoring and alerting the way that these things are being used in order to ensure that you're not having these downstream impacts in ways that you're just not even visible to your world.
C
I think that's really good advice. Trust But Verify. It's kind of timeless, but we need to do it kind of faster at this point with this stuff, by the sounds of it. All right, well look, Keith, really appreciate your time everyone. This has been Casey Ellis for the Risky Business podcast speaking with Keith Hoodlett, who is the Director of engineering in AI, machine learning and AppSec for trial of Bits. Really appreciate your time, Keith. Thanks for the shot.
A
Thanks so much, Casey. Good to see you, mate. Cheers.
Podcast: Risky Bulletin (Risky Biz)
Guests: Casey Ellis (Risky Biz) interviews Keith Hoodlett (Director of Engineering, AI/ML & AppSec, Trail of Bits)
Date: September 7, 2025
Episode Focus: The persistent, complex problem of prompt injection in AI systems and why it may be unsolvable, Trail of Bits' research into new attack surfaces (including image-based prompt injections), and recommendations for organizations implementing AI.
This episode dives into the fundamental security challenges posed by large language models (LLMs), focusing on the persistent—and perhaps unsolvable—issue of prompt injection. Keith Hoodlett shares insights from Trail of Bits’ recent research, including novel attack vectors and defensive practices, with practical advice for security practitioners and leadership.
"We're continuing to band aid and build walls around the way these tools work. But attackers do what attackers do." — Keith Hoodlett [01:44]
"So it's like a really fancy form of steganography with prompt injection as the goal." — Casey Ellis [03:37]
"The models themselves are designed to be probabilistic in how they respond, not deterministic. So there's always going to be some sort of degree of risk..." — Casey Ellis [07:41]
"AI generated code is moving so fast that it's building technical debt faster than the humans can actually address that problem." — Keith Hoodlett [09:04]
Testing & Validation: Continuously test, validate, and benchmark for prompt injection or unexpected behaviors.
Monitor & Log:
"Log, monitor and alert is continuing to be a thing that you need to do when it comes to the way that people are using or interfacing with your implementations of AI, as well as the outputs that are coming from your AI..." — Keith Hoodlett [13:46]
Proactive Defense: Leading organizations (Google, Amazon, Anthropic, OpenAI) are specializing in defense, but most companies are "way behind the bell curve."
Quote – Summing up the guidance:
"You can't know if something is broken if you're not testing it. So definitely get out there and make sure...you're not having these downstream impacts in ways that you're just not even visible to your world." — Keith Hoodlett [15:41]
Quote – Timeless Security Principle:
"Trust but verify. It's kind of timeless, but we need to do it kind of faster at this point with this stuff." — Casey Ellis [15:56]
| Timestamp | Speaker | Quote / Memorable Moment | |-----------|-------------|---------------------------------------------------------------------------------------------------| | 01:44 | Keith Hoodlett | "Attack vectors, especially prompt injection, continues to be the thing that stands out as a probably unfixable problem long term." | | 03:37 | Casey Ellis | "So it's like a really fancy form of steganography with prompt injection as the kind of the goal." | | 09:04 | Keith Hoodlett | "AI generated code is moving so fast that it's building technical debt faster than the humans can actually address that problem." | | 10:00 | Keith Hoodlett | "I feel like sort of Woody in that meme from Toy Story. You know, it's prompt injections, prompt injections everywhere. Well, it's in your images, it's in your text, it's, you know, in your voice commands. It's everywhere." | | 13:46 | Keith Hoodlett | "Log, monitor and alert is continuing to be a thing that you need to do when it comes to the way that people are using or interfacing with your implementations of AI, as well as the outputs that are coming from your AI to make sure that they are... consistent with what you're expecting to see." | | 15:41 | Keith Hoodlett | "You can't know if something is broken if you're not testing it." | | 15:56 | Casey Ellis | "Trust But Verify. It's kind of timeless, but we need to do it kind of faster at this point with this stuff." |
For security leaders and practitioners: The landscape is rapidly shifting. Visibility, validation, and vigilance are your best defenses—don't assume security by default, and expect the unexpected as prompt injection vectors proliferate.