Risky Bulletin Podcast: "Why Prompt Injection is an Intractable Problem"
Podcast: Risky Bulletin (Risky Biz)
Guests: Casey Ellis (Risky Biz) interviews Keith Hoodlett (Director of Engineering, AI/ML & AppSec, Trail of Bits)
Date: September 7, 2025
Episode Focus: The persistent, complex problem of prompt injection in AI systems and why it may be unsolvable, Trail of Bits' research into new attack surfaces (including image-based prompt injections), and recommendations for organizations implementing AI.
Episode Overview
This episode dives into the fundamental security challenges posed by large language models (LLMs), focusing on the persistent—and perhaps unsolvable—issue of prompt injection. Keith Hoodlett shares insights from Trail of Bits’ recent research, including novel attack vectors and defensive practices, with practical advice for security practitioners and leadership.
Key Discussion Points & Insights
1. The Early Days of AI Adoption and Security Risks ([01:44])
- Rapid Adoption: AI is being integrated into browsers, IDEs, customer service, and more.
- Security Gaps: The scale and speed of adoption outpace understanding of new attack vectors, with prompt injection foremost among them.
- Band-Aid Solutions: Current mitigations are mostly temporary, building walls and workarounds rather than solving root issues.
- Quote:
"We're continuing to band aid and build walls around the way these tools work. But attackers do what attackers do." — Keith Hoodlett [01:44]
2. Image Scaling as a Novel Prompt Injection Vector ([01:44-04:34])
- Innovative Attack Surface: Trail of Bits researchers found ways to encode malicious prompt injection text into images that becomes visible only when the image is scaled.
- Technique: By predicting algorithmic behavior in image scaling, hidden text can be revealed to LLMs—text that appears benign to the human eye.
- Steganography Comparison:
"So it's like a really fancy form of steganography with prompt injection as the goal." — Casey Ellis [03:37]
- Keith expands, explaining how attackers can exfiltrate data through this technique.
3. Why Prompt Injection Is Essentially "Unsolvable" ([04:34-07:58])
- Input vs. Data Dilemma: LLMs fundamentally can't reliably differentiate between instructions and data.
- Human-like Openness: Their design to be "helpful" makes them susceptible to boundary-pushing by attackers, through carefully crafted tokens or instructions.
- Probabilistic Nature:
"The models themselves are designed to be probabilistic in how they respond, not deterministic. So there's always going to be some sort of degree of risk..." — Casey Ellis [07:41]
- Guardrails Are Imperfect: Solutions like guardrails, firewalls, or prompt sanitization mitigate risk but can't eliminate it due to the models’ openness.
4. Parallels to Earlier Tech Booms and Technical Debt ([08:42-10:00])
- Historical Pattern: Rapid, "hype-driven" adoption leads to poorly implemented security, as with IoT, early cloud, and Web 1.0.
- Technical Debt:
"AI generated code is moving so fast that it's building technical debt faster than the humans can actually address that problem." — Keith Hoodlett [09:04]
- Exponential Risk: Multimodal models and widespread use create new, hard-to-predict attack vectors.
- Technical Debt:
5. Trail of Bits' Offensive & Defensive Work: MCP Servers ([11:06-13:46])
- Context: Model Context Protocol (MCP) servers have become popular but present security challenges.
- Attack Techniques:
- "Line Jumping": Hiding prompt injection in tool descriptions, which MCP servers automatically trust.
- Obfuscated Vectors: Use of anti-terminal codes and benign-looking text as attack carriers.
- Defensive Release:
- MCP Context Protector: A proxy tool that scans for prompt injection, escapes suspicious codes, and adds guardrails (like Llama Firewall) for malicious prompt detection.
- Human-in-the-Loop: Quarantines "sus" communications for human validation before further action.
6. Advice for Security Practitioners and Leadership ([13:46-15:56])
-
Testing & Validation: Continuously test, validate, and benchmark for prompt injection or unexpected behaviors.
-
Monitor & Log:
"Log, monitor and alert is continuing to be a thing that you need to do when it comes to the way that people are using or interfacing with your implementations of AI, as well as the outputs that are coming from your AI..." — Keith Hoodlett [13:46]
-
Proactive Defense: Leading organizations (Google, Amazon, Anthropic, OpenAI) are specializing in defense, but most companies are "way behind the bell curve."
- Even with imperfect tools, active monitoring and validation are key to surfacing issues.
- Getting external validation from specialists (like Trail of Bits) is highly recommended.
-
Quote – Summing up the guidance:
"You can't know if something is broken if you're not testing it. So definitely get out there and make sure...you're not having these downstream impacts in ways that you're just not even visible to your world." — Keith Hoodlett [15:41]
-
Quote – Timeless Security Principle:
"Trust but verify. It's kind of timeless, but we need to do it kind of faster at this point with this stuff." — Casey Ellis [15:56]
Notable Quotes & Memorable Moments
| Timestamp | Speaker | Quote / Memorable Moment | |-----------|-------------|---------------------------------------------------------------------------------------------------| | 01:44 | Keith Hoodlett | "Attack vectors, especially prompt injection, continues to be the thing that stands out as a probably unfixable problem long term." | | 03:37 | Casey Ellis | "So it's like a really fancy form of steganography with prompt injection as the kind of the goal." | | 09:04 | Keith Hoodlett | "AI generated code is moving so fast that it's building technical debt faster than the humans can actually address that problem." | | 10:00 | Keith Hoodlett | "I feel like sort of Woody in that meme from Toy Story. You know, it's prompt injections, prompt injections everywhere. Well, it's in your images, it's in your text, it's, you know, in your voice commands. It's everywhere." | | 13:46 | Keith Hoodlett | "Log, monitor and alert is continuing to be a thing that you need to do when it comes to the way that people are using or interfacing with your implementations of AI, as well as the outputs that are coming from your AI to make sure that they are... consistent with what you're expecting to see." | | 15:41 | Keith Hoodlett | "You can't know if something is broken if you're not testing it." | | 15:56 | Casey Ellis | "Trust But Verify. It's kind of timeless, but we need to do it kind of faster at this point with this stuff." |
Important Segment Timestamps
- [01:44] Early days of AI adoption, scale of risk, introduction to image-based prompt injection.
- [03:37-04:34] Steganography analogy, exfiltration possibilities, visibility challenges.
- [05:25-07:58] Deep dive: why prompt injection is unsolvable.
- [08:42-10:00] Historical analogies: IoT, cloud, and technical debt.
- [11:24-13:46] MCP server attacks and defensive tools.
- [13:46-15:56] Practical advice and future outlook.
Takeaways
- Prompt injection is a persistent, fundamental, and likely unsolvable problem due to the open, probabilistic design of LLMs.
- Innovative attack vectors (like those using image scaling for "prompt steganography") demonstrate that attackers will exploit any new AI feature/behavior.
- Defensive efforts must focus on continuous monitoring, logging, alerting, and validation; prevention is difficult, but detection and response can limit risk.
- Following "trust but verify" principles and engaging specialized expertise are key for organizations at any stage of AI adoption.
For security leaders and practitioners: The landscape is rapidly shifting. Visibility, validation, and vigilance are your best defenses—don't assume security by default, and expect the unexpected as prompt injection vectors proliferate.
