Summary

CyberWire Daily: Research Saturday

Episode: When “safe” documents aren’t

Host: Dave Bittner (N2K Networks)
Guest: Omer Nindberg (CTO, Novi Security)
Date: March 28, 2026

Episode Overview

This episode examines the hidden dangers in seemingly "safe" documents, focusing on vulnerabilities in widely used PDF engines that can affect every enterprise and user that depends on them. Dave Bittner speaks with Omer Nindberg, CTO of Novi Security, about their research, titled "From PDF to pwn," which investigates how attackers can exploit PDF viewers embedded in applications and how AI-driven tools are rapidly changing vulnerability discovery at scale.

Key Discussion Points and Insights

1. The Hidden Risks in Embedded PDF Engines

Attack Surface Expansion: Many companies embed third-party PDF engines within their applications. If one engine is compromised, attackers could potentially access many companies and their user data.
- “PDF engines are something that a lot of companies embed into their applications...you can compromise lots of companies and customers just by them integrating those PDF engines inside of their applications.” — Omer Nindberg (01:30)
Initial Research Motivation: Novi Security’s approach begins with the assumption that vulnerabilities exist, with the goal of understanding how far AI can push the discovery process.
- “The mindset of a vulnerability researcher is there’s always another vulnerability…if you keep on digging, you’ll find it or find traces that will lead you to the correct way.” — Omer Nindberg (02:25)

2. Anatomy of a PDF Engine Exploit ([03:24])

Technical Dive (PDFTron Case):
- PDFTron, a popular engine, is embedded as an iframe in hosting applications and communicates via postMessage.
- The researchers investigated undocumented UI configuration parameters and discovered that some inputs allow direct JavaScript evaluation.
  - Obfuscated, minified JavaScript complicated their work.
- Exploit achieved by embedding HTML within SVG elements to bypass security filters, ultimately enabling JavaScript execution.
- “...we found a way to execute JavaScript, which was really nice.” — Omer Nindberg (06:44)
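
The episode does not describe the actual filter that was bypassed, so the following is only a toy illustration of the general idea: a check that reasons about SVG content alone can miss HTML nested inside it. The sample document, the use of `foreignObject` as the nesting mechanism, and the function names are all assumptions made for this sketch, not details from the research.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def namespace(tag: str) -> str:
    """Return the namespace URI of an ElementTree tag, or '' if none."""
    return tag[1:].split("}")[0] if tag.startswith("{") else ""


def local_name(tag: str) -> str:
    """Return the tag or attribute name without its namespace prefix."""
    return tag.rsplit("}", 1)[-1]


def has_active_content(root: ET.Element, svg_only: bool) -> bool:
    """Flag <script> elements and on* event-handler attributes.

    With svg_only=True the check skips every element outside the SVG
    namespace, which is the (hypothetical) blind spot being illustrated.
    """
    for element in root.iter():
        if svg_only and namespace(element.tag) != SVG_NS:
            continue
        if local_name(element.tag).lower() == "script":
            return True
        if any(local_name(attr).lower().startswith("on") for attr in element.attrib):
            return True
    return False


# Hypothetical input: HTML nested inside SVG via foreignObject.
SAMPLE = (
    '<svg xmlns="http://www.w3.org/2000/svg">'
    "<foreignObject>"
    '<div xmlns="http://www.w3.org/1999/xhtml">'
    '<img src="x" onerror="handler()"/>'
    "</div>"
    "</foreignObject>"
    "</svg>"
)

root = ET.fromstring(SAMPLE)
naive_flagged = has_active_content(root, svg_only=True)
strict_flagged = has_active_content(root, svg_only=False)
print(naive_flagged, strict_flagged)
```

The SVG-only check passes the sample while the namespace-agnostic check flags it, which is the kind of context switch a filter has to account for.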

3. Scaling Vulnerability Discovery with AI ([06:49])

Manual Discovery vs. Automation:
- Static code is easier to analyze; dynamic single-page applications pose challenges.
- Dynamic code requires runtime analysis and instrumentation to trace data flow and catch vulnerabilities.
- "In dynamic applications...the only place that the actual code flow can be investigated is actually at runtime." — Omer Nindberg (07:07)
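
To make the runtime-analysis point concrete, here is a minimal sketch of instrumentation, assuming a toy application in which handlers are looked up from a table at runtime. The handler names, the `postMessage.config.icon` source label, and the `Tainted` helper are hypothetical; the episode does not describe Novi Security's tooling at this level of detail.

```python
from typing import Callable, Dict, List, Tuple


class Tainted(str):
    """A string that remembers which input source it came from."""

    source: str = "unknown"


def taint(value: str, source: str) -> Tainted:
    """Mark a value as attacker-controlled and record where it entered."""
    marked = Tainted(value)
    marked.source = source
    return marked


# Every (source, sink) pair observed while the program runs.
TRACE: List[Tuple[str, str]] = []


def instrument(name: str, fn: Callable[[str], str]) -> Callable[[str], str]:
    """Wrap a function so tainted arguments reaching it are logged."""

    def wrapper(arg: str) -> str:
        if isinstance(arg, Tainted):
            TRACE.append((arg.source, name))
        return fn(arg)

    return wrapper


def render_text(value: str) -> str:
    return f"<span>{len(value)} chars</span>"


def render_raw(value: str) -> str:
    # Hypothetical dangerous sink: emits the value without escaping.
    return f"<div>{value}</div>"


# Table-driven dispatch: which function runs depends on runtime data,
# so the source-to-sink path is hard to recover by reading the code alone.
HANDLERS: Dict[str, Callable[[str], str]] = {
    "label": instrument("render_text", render_text),
    "icon": instrument("render_raw", render_raw),
}


def apply_config(config: Dict[str, str]) -> List[str]:
    return [HANDLERS[key](value) for key, value in config.items() if key in HANDLERS]


apply_config({"icon": taint("<b>hi</b>", "postMessage.config.icon")})
print(TRACE)
```

The trace records the path only because the code actually ran with a marked input, which mirrors the argument that dynamic applications have to be investigated at runtime.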

4. Embedding Researcher Instincts into AI ([11:55])

Beyond Simple Pattern Matching:
- AI agents trained not only on code, but on replicating expert intuition: prioritizing interesting leads, recognizing common hurdles (like SVG/HTML embedding), and applying proven bypass tricks.
- Training involved exposing agents to thousands of real-world environments to iteratively improve detection.
- "We actually train our agents on those intuitions and we try to navigate their preferred path to paths that actually correlate to finding more vulnerabilities." — Omer Nindberg (11:55)

Memorable Quote:

“When somebody that...knows how to research vulnerabilities and done it for years...they just have instincts of what’s more important than the other things.” — Omer Nindberg (11:55)

5. Specialized AI Agent Roles: The Swarm Approach ([14:49])

Agent Types:
- Tracer: Maps out the attack surface.
- Resolver: Connects sources (inputs) to sinks (dangerous functions) to hypothesize possible exploits.
- Bypass: Focuses on achieving actual exploitation, often requiring coding expertise.
Workflow Mirrors Human Researchers: Specialized agents mimic distinct phases of manual research, optimizing the chances of a successful find and proof.
- "Each different task requires different skills that the agent needs to embed inside itself." — Omer Nindberg (16:49)
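
The three roles can be pictured as stages of a pipeline. The sketch below is a simplified stand-in, not Novi Security's implementation: each "agent" is a plain function, and the toy application, its field names, and the PoC string are invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Hypothesis:
    """A suspected path from an attacker-controlled source to a sink."""

    source: str
    sink: str


def tracer(app: Dict) -> List[str]:
    """Stage 1: enumerate the inputs an attacker can reach."""
    return sorted(app["inputs"])


def resolver(app: Dict, sources: List[str]) -> List[Hypothesis]:
    """Stage 2: pair each source with the sinks it flows into."""
    return [
        Hypothesis(source, sink)
        for source in sources
        for sink in app["flows"].get(source, [])
    ]


def bypass(app: Dict, hypothesis: Hypothesis) -> Optional[str]:
    """Stage 3: keep only hypotheses that survive the app's defenses."""
    if hypothesis.source in app["sanitized_sources"]:
        return None
    return f"PoC: payload via {hypothesis.source} reaches {hypothesis.sink}"


def run_pipeline(app: Dict) -> List[str]:
    findings = []
    for hypothesis in resolver(app, tracer(app)):
        proof = bypass(app, hypothesis)
        if proof is not None:
            findings.append(proof)
    return findings


# Hypothetical application model used only for this sketch.
TOY_APP = {
    "inputs": {"config.icon", "config.title"},
    "flows": {"config.icon": ["innerHTML"], "config.title": ["innerHTML"]},
    "sanitized_sources": {"config.title"},
}

findings = run_pipeline(TOY_APP)
print(findings)
```

Separating the stages keeps each one narrow: the resolver may produce many hypotheses, and only the final stage decides which of them count as findings.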

6. From “Vibes” to Proof: Genuine Exploit Validation ([17:21])

Critique of Other AI Tools: Most tools only produce probable findings—“vibes”—versus actual, reproducible proofs.
Novi Security’s Aim: Deliver not just a hypothesis, but working exploit code (e.g., a proof-of-concept script that triggers a real XSS or IDOR exploit).
- “So the proof that we provide is actually something that you can just take, run, and then you’ll say, ah, yeah, this makes sense. It does exactly what I would expect it to do.” — Omer Nindberg (18:27)
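
As a sketch of what a reproducible proof might look like for IDOR, the script below logs in as one user and checks whether it can read another user's record. It runs against an in-memory toy application rather than a real service; the `ToyApp` class, its users, and its records are all hypothetical.

```python
from typing import Dict, Optional


class ToyApp:
    """In-memory stand-in for a web app with per-user records."""

    def __init__(self, check_owner: bool) -> None:
        self.check_owner = check_owner
        self.sessions: Dict[str, str] = {}
        self.records = {
            1: {"owner": "alice", "data": "alice-notes"},
            2: {"owner": "bob", "data": "bob-notes"},
        }

    def login(self, user: str) -> str:
        token = f"session-{user}"
        self.sessions[token] = user
        return token

    def get_record(self, token: str, record_id: int) -> Optional[str]:
        user = self.sessions.get(token)
        record = self.records.get(record_id)
        if user is None or record is None:
            return None
        # The authorization check that an IDOR-vulnerable app is missing.
        if self.check_owner and record["owner"] != user:
            return None
        return record["data"]


def prove_idor(app: ToyApp) -> bool:
    """Return True only if alice's session can read bob's record."""
    token = app.login("alice")
    leaked = app.get_record(token, 2)
    return leaked == "bob-notes"


vulnerable = prove_idor(ToyApp(check_owner=False))
fixed = prove_idor(ToyApp(check_owner=True))
print(vulnerable, fixed)
```

The same script serves as a regression check: once the ownership check is in place, the proof no longer succeeds.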

7. The Changing Game for Defenders ([19:04])

Defenders Must Keep Up:
- Attackers are now armed with tools that make previously labor-intensive, niche vulnerability discovery rapid and routine.
- Security teams must employ equivalent tools and strategies to avoid being outpaced.
- "Defenders must move a lot quicker than before because it’s just easier now to automate and scale everything." — Omer Nindberg (20:32)

Memorable Moments & Notable Quotes

On the thrill of discovery:
- “Once we started to dig deep...we found a way to execute JavaScript, which was really nice.” — Omer Nindberg (06:44)
Describing AI agent learning:
- “It’s not a single action, but...an iterative motion that the goal is at the end to find a vulnerability.” — Omer Nindberg (13:40)
On the dangers of not adopting AI-driven defense:
- “If today there is a tool that can find vulnerabilities that yesterday were impossible...and you’re as a defender not using that tool...you’re going to be in trouble.” — Omer Nindberg (19:32)

Timestamps for Key Segments

| Time  | Segment/topic                                              |
|-------|------------------------------------------------------------|
| 01:30 | The risk of PDF engines in enterprise applications         |
| 03:24 | Walkthrough: exploiting the PDFTron engine                 |
| 06:49 | Challenges of scaling vulnerability discovery with AI      |
| 11:55 | How elite researcher instincts are embedded in AI agents   |
| 14:49 | The collaborative AI “swarm” (tracer, resolver, bypass)    |
| 17:21 | Differentiating “real proof” from “vibes” in AI security   |
| 19:04 | Defender takeaways and the new urgency for security teams  |

Final Takeaways

Embedded document engines, like PDF viewers, dramatically increase the attack surface for all applications utilizing them, especially when misconfigurations or insecure parameters exist.
Attack techniques now blend deep technical know-how with creative use of document formats (SVG/HTML) and runtime analysis.
AI can scale vulnerability discovery, but to be effective it must be trained not just on code but on the instincts of expert human researchers.
Defensive teams must rapidly adapt, employing similar AI-powered tools to keep ahead of attackers exploiting these new automated, scalable offensive methods.

Guest: Omer Nindberg, CTO, Novi Security
Research Discussed: “From PDF to pwn”
Host: Dave Bittner, N2K CyberWire


Transcript

[00:02]
A
You're listening to the CyberWire Network, powered by N2K. And now a word from our sponsor, arcova. Formerly Morgan Franklin Cyber, arcova is a global cybersecurity and AI consulting firm built by practitioners who've been in the seat. They work directly with enterprise teams to solve complex security challenges, building secure-by-design programs that hold up as technology and threats evolve. From focused engagements to long-term partnership, arcova delivers outcomes that endure because no one should navigate complexity alone. Learn why leading global enterprises trust arcova at www.arcova.com. That's A-R-C-O-V-A dot com.
[01:06]
B
Hello everyone and welcome to the CyberWire's Research Saturday. I'm Dave Bittner and this is our weekly conversation with researchers and analysts tracking down the threats and vulnerabilities, solving some of the hard problems and protecting ourselves in our rapidly evolving cyberspace. Thanks for joining us.
[01:30]
C
PDF engines are something that a lot of companies embed into their applications and then if you have a vulnerability in one PDF engine, so as a third party attack, you can compromise lots of companies and customers just by them integrating those PDF engines inside of their applications.
[01:54]
B
That's Omer Nindberg, CTO of Novi Security. The research we're discussing today is titled "From PDF to pwn." So before you all brought AI into this process, what did you all do to manually identify the issues inside of some of these PDF viewers? What made you think there's something deeper here, that this is worth pursuing?
[02:25]
C
When we start to investigate any application, you don't know if there's a vulnerability or not, but you always presume that there is. The mindset of a vulnerability researcher is there's always another vulnerability. There's still something that nobody else has found before, and if you keep on digging, you'll find it or find traces that will lead you to the correct way. We didn't start this whole process because we thought we're going to find like thousands of vulnerabilities, but we wanted to understand what the limits are of what we can do with AI. And then we just started with the first engine, which was PDFTron by Apryse, because we found a few of our customers that had that engine.
[03:13]
B
Well, let's walk through it together. Can you take us through the story of how you all dug into these and what sorts of things started to be unveiled for you?
[03:24]
C
So I think the first thing that we found out, once we started, is that a lot of the engines, or if I'll talk specifically about PDFTron, the engine itself is embedded in the application as an iframe. What that means is any application that uses that engine in order to render PDFs, for example, needs to communicate with that iframe via postMessage or something like that. We started to investigate the trust layers between the application itself, the hosting application, which is unknown because it can be any application, and the engine of PDFTron, or the embedded JavaScript inside the iframe. Once we tried to understand all the connectivity between the two, we found interesting post messages that the parent of the iframe sends to the iframe itself in order to initiate it. For example, one of the things that we found is that there is a parameter that's not required, but you can provide it, and it has UI configurations that change the way that the engine displays the rendering application. Think about it: the PDF engine, what it is is a place where you can edit files, add annotations, put comments, add signing and things like that. Once we started to investigate all the inputs that are available and we found something that's undocumented, but it had something that appeared to be a very massive changer to the application itself, we just started to dig deeper and deeper, and this whole thing is obfuscated, minified JavaScript, which is always nice. And when we started to dig deeper and deeper, what we found is that some of the inputs that you're able to provide from the configuration, the external configuration via the post message, get into a sink that just evaluates JavaScript. It didn't end there. We still needed to bypass some mechanisms because we were able to inject JavaScript code into an image tag, but we couldn't supply, let's say for SVG, we can supply JavaScript inside the SVG.
And then there we did something that was also very nice: inside the SVG, we embedded HTML. The DOM processor, once it saw that we're in the context of SVG and then we're again in the context of HTML, didn't parse the internal context of HTML as SVG. And then all the bypasses that were in the code, we just bypassed them altogether. And then we found a way to execute JavaScript, which was really nice.
[06:45]
B
Must have been an exciting moment.
[06:48]
C
Yeah, for sure.
[06:50]
B
Now, in the research you talk about this problem of scaling. You found these things manually, but then you had to go back and say, can we do this again and again at scale? Why is that so difficult in these dynamic applications?
[07:07]
C
The places where scaling is a lot easier is when code is just code and programs are static by nature. So it's a lot easier because then you have a very distinct path from sink to source. That means from a user input into a dangerous function. If it's easy to go and draw that path, it's usually easy to create an exploitation and then prove that it's a vulnerability. But in dynamic applications, and most single-page applications and modern applications are dynamic by nature, it's very, very hard to understand how you connect the dots in JavaScript. A lot of the times you have objects or you have tables and functions that call entries inside of tables, that call different entries of different tables and indexes all over the place, and everything is dynamic. The only place that the actual code flow can be investigated is actually at runtime. And that's something that, let's say, the old tools, it was very difficult for them to understand. And in order to find vulnerabilities in that case, you actually need to live in runtime. You need to run on real applications that are actually running now. And then you need to understand how to add tracing or instrumentation to the application itself in order to be able to investigate them, or give AI or any code, any program, the tools in order to investigate in real time.
[09:00]
B
We'll be right back. Most environments trust far more than they should, and attackers know it. ThreatLocker solves that by enforcing default deny at the point of execution. With ThreatLocker Allowlisting, you stop unknown executables cold. With Ringfencing, you control how trusted applications behave. And with ThreatLocker DAC, Defense Against Configurations, you get real assurance that your environment is free of misconfigurations and clear visibility into whether you meet compliance standards. ThreatLocker is the simplest way to enforce zero trust principles without the operational pain. It's powerful protection that gives CISOs real visibility, real control and real peace of mind. ThreatLocker makes zero trust attainable even for small security teams. See why thousands of organizations choose ThreatLocker to minimize alert fatigue, stop ransomware at the source and regain control over their environments. Schedule your demo at threatlocker.com/N2K today.
Ever wished you could rebuild your network from scratch to make it more secure, scalable and simple? Meet Meter, the company reimagining enterprise networking from the ground up. Meter builds full-stack zero trust networks including hardware, firmware and software, all designed to work seamlessly together. The result? Fast, reliable and secure connectivity without the constant patching, vendor juggling or hidden costs. From wired and wireless to routing, switching, firewalls, DNS security and VPN, every layer is integrated and continuously protected in one unified platform. And since it's delivered as one predictable monthly service, you skip the heavy capital costs and endless upgrade cycles. Meter even buys back your old infrastructure to make switching effortless. Transform complexity into simplicity, and give your team time to focus on what really matters, helping your business and customers thrive. Learn more and book your demo at meter.com/cyberwire. That's M-E-T-E-R dot com slash cyberwire.
One of the things that caught my eye in the research when you all were going through the process of teaching your LLM agents, you say that the edge wasn't just running LLMs on code. You describe it as embedding elite researcher instincts. What does that really mean in practice?
[11:55]
C
Yeah, so in practice, if you look for, again, sinks and sources, as I said before, you'll find tons of potential links and tons of things that you should look for. But when somebody that knows how to research vulnerabilities and done it for years, so they just have instincts of what's more important than the other things. When you go into a specific path, where are the hurdles that you will probably meet and how do you need to mitigate them if you're blocked? Let's say what I explained before with the SVG and then HTML inside it and then another SVG inside of it. So that's like a bypass technique that you need to know about. And in order to do or to create an agent that's able to do that, you need to give it tons of tricks and intuition, and we actually train our agents on those intuitions and we try to navigate their preferred path to paths that actually correlate to finding more vulnerabilities.
[13:11]
B
How do you teach an agent to think in terms of these trust boundaries instead of just like pattern matching?
[13:19]
C
That's something that we didn't cover in the blog itself, but we are going to provide another blog that's a lot more AI related.
[13:27]
B
Can you give us a preview?
[13:29]
C
Yeah, for sure, no problem. What we need to do in order to be able to do something like that, we need data and vulnerability research of real applications. Data is actually environments. So what we need is hundreds or thousands of environments with vulnerabilities that the agent didn't learn on, because we don't want the environments to be contaminated. They need to be something that the agents don't know. And then we actually just give them the task, let's say find an XSS in this specific part of the application, and we do that across thousands of applications, and that's our data. And it's not a single action, but it's an iterative motion that the goal is at the end to find a vulnerability. And once we have tons of data and traces, so after we have all that, we can actually optimize the path and like quantify it. What changes can we make in order to make the agent better? And what does it mean, better? It just means that statistically it finds more vulnerabilities.
[14:50]
B
You all describe this as a collaborative swarm. You talk about tracer, resolver and bypass. Can you explain that to us?
[15:02]
C
You can have a generic agent that does everything, but like most things, if you have something generic, then it's not specialized. So the way we try to tackle that task is we do create specialized agents and we do have flows that match the way that researchers research. So, for example, we start by investigating what are all the things, what is all the attack surface. So sinks is something actually pretty far, like pretty advanced. But we start by understanding what is the attack surface that an attacker can have. And after we have that, so we can continue and then ask what are all the sinks that are available in the code? We don't have code all the time, but if we have, that's great. And then we try to create hypotheses that connect sinks to source, because if you have a connection, then it means you have a vulnerability. But let's say we did that, then we have to optimize on actually creating an exploit. And there is a big difference between a potential connection, sink to source, and an actually working exploit in a live environment. And each task, like from the tasks that I just described, requires different, like a bit of a different mindset. So let's say for creating the exploitation, what you actually like, what we understood is what you actually need is a very good coding agent, because creating an exploitation is actually creating code. We need to create a POC script that proves that what we've done actually works. But in the previous steps, let's say finding sinks inside of a source file, that's something that you can do statically, but connecting it to dynamic patterns, that's something very hard. There you need an agent that's very good at instrumentation and reading logs. Each different task requires different skills that the agent needs to embed inside itself.
[17:22]
B
You all make a claim, it's a strong claim that most AI security tools produce vibes, not actual proof. But you say your approach is different. What distinguishes what you all are doing there?
[17:36]
C
Yeah, I think that our end product is not an assumption or a claim that we have a vulnerability. In our platform, what we actually strive for and provide to our customers is full validation that we have a vulnerability. And the way that we are able to provide full validation is by actually creating reproducible code that proves something is vulnerable. So let's say if we're talking about IDOR, for example, so we prove that from one user context you can access data from a different user context. And the way to prove that is writing a code that logs in, has a user context, and then being able to provide some kind of request, for example, and then extract data of a different user. And that's something that's verifiable. If we do XSS, for example, we prove it by instrumenting the browser, for example, and then validating that we were able to spawn an alert box, for example. So the proof that we provide is actually something that you can just take, run, and then you'll say, ah, yeah, this makes sense. It does exactly what I would expect it to do. And it's not just a very nice hypothesis.
[19:04]
B
From a defender's point of view, what should security teams take away from this research?
[19:11]
C
This research is really talking about offensive security and how to find vulnerabilities. But I think in today's world you don't have the privilege to not use a tool like this or use AI in order to look for vulnerabilities yourself inside your applications, and even the most niche ones. Because if once you could have thought of why should an attacker look inside some niche place and do whatever, maybe it made sense, okay, but it's still something that it's high effort. And if today there is a tool that can find vulnerabilities that yesterday were impossible to find at scale and you're as a defender, you're not using that tool to discover those before the bad guys, so you're going to be in trouble. So from a defender point of view, I think this research and other researchers' research in the same field just proves that defenders must move a lot quicker than before because it's just easier now to automate and scale everything.
[20:39]
B
Our thanks to Omer Nindberg from Novi Security for joining us. The research is titled "From PDF to pwn." We'll have a link in the show notes. And that's Research Saturday, brought to you by N2K CyberWire. We'd love to know what you think of this podcast. Your feedback ensures we deliver the insights that keep you a step ahead in the rapidly changing world of cybersecurity. If you like our show, please share a rating and review in your favorite podcast app. Please also fill out the survey in the show notes or send an email to cyberwire@n2k.com. This episode was produced by Liz Stokes. We're mixed by Elliott Peltzman and Tré Hester. Our executive producer is Jennifer Eiben. Peter Kilpe is our publisher and I'm Dave Bittner. Thanks for listening. We'll see you back here next time.
[21:47]
C
Two kinds of fishing out here, one
[21:49]
B
for fish, one for your data. Hackers try to hook you, but Cisco
[21:54]
C
Duo keeps every user and device protected.
[21:57]
B
Cisco Duo. Fishing season is over. Learn more at duo.com.