A
Hello everyone, this is Tom Uran and I'm here with Gruk for another Between Two Nerds discussion. G'day, Gruk. How are you?
B
Fine. Yourself, Tom?
A
I'm very well. This week's edition is brought to you by Prowler, the open cloud security platform. Find them at prowler.com. So this week, as per usual, you sent me a paper. This one's from Google's Threat Intelligence Group, formerly Mandiant, I think, and it's titled "AI Threat Tracker: Advances in Threat Actor Usage of AI Tools". So we're just going to discuss different, I guess, maybe asymmetries, why AI might be easier for criminals to use than for defenders.
B
I'm not sure that's been demonstrated yet, but it's theoretically true.
A
Yeah. And I guess we'll look at AI because it seems like the security space is maturing and every week or every month there's a new application of AI that actually seems practical and useful and it does seem to be delivering real benefits for security people. So this paper is kind of a look at the other side. Typically criminals don't talk about themselves necessarily, or at least not where I hang out.
B
To start off, one of the interesting things here is that this is an update to a paper that came out in January. That's relatively quick, that they're coming up with a new paper to update their previous findings within the same year. And there appears to be quite a bit of change as well. There's been a lot of progress. Well, maybe not progress. There's definitely been a lot of change in that time.
A
Yes.
B
So I think there's probably a high level of, or a high pace of innovation, which is. I think that's probably matched by what's happening everywhere with AI right now. Right, right.
A
So we've talked about papers like this before, I think it was Anthropic and maybe even OpenAI a while back, about how threat actors were using the models. And that was from the perspective of the organizations that make the models. They were looking at the queries that were coming in and working from the queries out. This one is based on threat intel. So it's got a very different flavor in that. It's here's malware that we see and here is how the malware is using LLMs or whatever, AI technology. Right. So that's a pretty significant difference.
B
Well, they do include that. Here's how they've been using Gemini, so they're including that as well. But I think, yeah, obviously the main difference is that they do more than just that.
A
Right.
B
They cover the same stuff that the other companies do, and also this additional threat intelligence information about the malware. And the interesting thing about the cybercrime forums having AI tools available, which we'll get to, and all that. So yeah, there's a lot more here to talk about.
A
So one of the first sections is "First use of just-in-time AI in malware". So that's a bit of a mouthful, but it goes on: for the first time, Google Threat Intelligence Group has identified malware families, which it calls Prompt Flux and Prompt Steal, that use LLMs during execution. These tools dynamically generate malicious scripts, obfuscate their own code to evade detection, and leverage AI models to create malicious functions on demand rather than hard coding them into the malware.
B
Ah, so.
A
Let me. While still nascent, this represents a significant step towards more autonomous and adaptive malware. Yeah, so you've got problems, right?
B
Yeah, what's useful here is that they provide some details on exactly what Prompt Steal and Prompt Flux are doing, and it's not impressive. So basically these things just have a long plain text string that's a prompt that gets fed to an LLM, and then the LLM returns a one-line Windows command to run. And it seems to me that if I am a reverse engineer, I would much rather have an LLM prompt to look at than a bunch of obfuscated code. Like if you get a piece of malware and it's really hard to decode, that's annoying. If you get a piece of malware that's mostly empty but just has a "You are an expert senior malware developer. You will write an obfuscated ransomware package that will then find every text file, doc file, Excel file and independently encrypt them and store the key", that sounds super easy to reverse.
A
Well, it actually has examples from the Prompt Steal malware. It has the prompt in the report and it's: make a list of commands to create a folder and gather computer information, hardware, processes, etc., etc., AD information, and execute in one line and add each result to a text file. So it actually spells out in pretty clear, direct language what you want it to do. And it's got another example and so on. And so you would read that and go, okay, that's what it's trying to do.
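Editor's sketch: the point that a plaintext prompt is easy to spot during triage can be illustrated with a few lines of Python. This is a minimal sketch, not a tool from the report. The marker phrases in `PROMPT_MARKERS`, the 20-byte minimum length, and the `sample` bytes are all hypothetical choices made for illustration.

```python
import re

# Hypothetical marker phrases for illustration; a real rule set would be
# tuned against actual samples.
PROMPT_MARKERS = ("you are an expert", "make a list of commands", "execute in one line")


def printable_strings(blob: bytes, min_len: int = 20) -> list[str]:
    """Return runs of printable ASCII at least min_len bytes long (like `strings`)."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [match.decode("ascii") for match in re.findall(pattern, blob)]


def find_embedded_prompts(blob: bytes) -> list[str]:
    """Keep only the extracted strings that contain a prompt-like marker phrase."""
    return [
        text
        for text in printable_strings(blob)
        if any(marker in text.lower() for marker in PROMPT_MARKERS)
    ]


# Made-up bytes standing in for a binary: junk, a long readable prompt, more junk.
sample = (
    b"\x00\x01\x02"
    + b"Make a list of commands to create a folder and gather computer information"
    + b"\xff\xfe"
    + b"short"
    + b"\x00"
)
hits = find_embedded_prompts(sample)
print(hits)
```

The long readable run is reported and the short fragment is ignored, which is the reverse engineer's advantage described above: the intent is sitting in the file as ordinary text.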
B
Right?
A
You can assume, I guess probably the LLM will do a fine job most of the time and it's going to get better.
B
So to me, as just a straight-up developer, if I'm looking at this, I'm thinking, here's the problem. If I write a script that does these things, I can write it so it's going to work 99.999% of the time. There's going to be some edge cases where it doesn't.
A
It seems like a defined problem set where you know exactly what you're looking for.
B
Right.
A
It'll be in pretty much the same locations every single time.
B
It didn't even generate a dynamic space to store the information file that it collects. Right. It's basically like if you had a junior developer and you said, okay, write a PowerShell script that's going to go through and do these things. And they went to Claude code and they said, write me a PowerShell script. Actually, you know what, never mind Claude code, write me a prompt to give to Claude. And then you ship that instead. And it seems pretty dumb, is what I'm saying. If this is the future of malware, we have nothing to worry about.
A
It wasn't exactly clear to me what the point was, because you're taking a defined problem where you can write a script and replacing it with a defined problem where you write a prompt.
B
You've added a layer of complexity and uncertainty to a simple fixed problem. Right. You've taken something that's very straightforward, that's been solved for decades, it's easy to do, and you've now made it uncertain, unpredictable, slower, more fallible, because making the request, getting it back and executing it is more steps than having it in a single script file that you run locally. I don't understand what the point is.
A
Right, yeah. Now, there could be parts, but I don't really see it in the report where it's based on the information you've gathered. Make a decision.
B
Right, Right. If it said, okay, now download the zip file of all of the documents that you've collected, analyze those documents and make a determination about whether this is a high priority target or what the next step should be, or where are we on the network and where should we go next?
A
What I would like is if make a decision on what information will let me determine if it's a high priority target and send that information to me. Because that would be like you're filtering a whole lot of stuff.
B
Right? But it's doing this through a network request, so we have to upload all of the stuff. Right. So you basically have to do an exfil of everything to figure out if you need to exfil a subset.
A
Yeah, well, with that small wrinkle, but like, in principle, yeah.
B
I guess if you look at it, you need to exfil it to someone else's network. So technically you're saving your own bandwidth.
A
Yeah. So to be clear, the exfil is to get it to a capable LLM. So presumably this malware is not coming with, you know, eight gigabytes of LLM attached to it that is then running on the local system. That's. That's what I'm trying to say.
B
Right. You can tell which boxes have been hacked because they basically start rebooting when they run out of RAM and their GPUs melt.
A
So it's interesting, the sort of bigger description of this malware: data miner written in Python. It queries the Hugging Face LLM, Qwen 2. So it's going to Hugging Face. Prompts used to generate the commands indicate that it aims to collect system information and documents in specific folders.
B
Gee, you reckon that was some sort of. That's a profound insight based on a lot of hardcore reverse engineering work that they've done there.
A
Prompt Steal then executes the commands and sends the collected data to an adversary-controlled server. And this one has been observed in operations, so they've got different categories where some are experimental and some have actually been seen in the wild.
B
Right. So I think the embarrassing thing here is that this was from APT28. This is Fancy Bear. Like, the GRU have actually deployed this in Ukraine. And while Google has called this Prompt Steal, the Ukrainians called it Lame Hug, which I think is a lot more accurate.
A
It's a bit of a mystery to me because I thought that Fancy Bear was relatively competent. Right.
B
I did as well. But. So the thing is that during the war there has been a huge increase in the number of personnel in cyber. So I think there might be a very broad brush here with Fancy Bear. Like you're talking about hundreds of people in dozens or 100 teams. You know, like there's just, there's a lot going on and not all of them are the seasoned professionals. There's probably quite a lot of, I would say, entry level developers based on this.
A
So these are the developers fresh out of university who are used to using AI to create their programs.
B
Right. But I think what's even more embarrassing is that their manager must be fresh out of university as well, because he approved this.
A
I think it's interesting that they didn't talk to some other group and say hey, where's a script that would do this for me that is tried and tested? Or look at, like, I don't know, ransomware manuals or something like that which would have these.
B
Anything. Just look at GitHub like this. There's got to be a bazillion like. Or even run it on your own. Your own machine and take that output and copy and paste it into, like. You know, this is.
A
I mean, I guess on the plus side, for the Russians, they're taking people with low levels of skill and getting them to write malware that they're deploying in a war.
B
Okay, so that's a win, I guess, technically, I suppose. I mean, theoretically. Right. If people are saying, like, AI creates new capabilities for people who wouldn't otherwise be able to do it, or it lowers the barrier of entry, that appears to be the case. Right. Like, people who would not normally be able to do malware are now able to do malware.
A
But, I mean, I think it's lowering the barrier in a nonsensical way. Like, it's allowing you to do really stupid things.
B
Right. So here's the thing that happened during the Vietnam War. Basically, it was very difficult getting people to go. Like, there was the draft, and it was very difficult to get qualified people because anyone who was smart and rich or whatever could find ways out. So McNamara, who was the Secretary of Defense, made this decision to lower the entry requirements, and they dropped the IQ requirement to, like, 80.
A
Right, right.
B
And so they made this argument that, you know, it would be good for these people who sort of have a hard time in society. We can put them in a structured environment. When they leave here, they'll be veterans. It'll be a lot easier for them to get a job. They'll have skills. But the reality is when you have someone who's got an IQ of like, 70, they're not really able to tie their shoes. Right. So when they would go to basic, one of the other soldiers would be assigned to them to help them, like, put their clothes on and tie their shoes. And it turns out that these guys did very, very badly in combat. So, like, yes, on the one hand, they were able to add thousands of people to the army, but it was a net negative for everyone because almost all of them got dishonorably discharged because they were not fit for service. And this feels a bit like that, where it's like, if you can drop the requirement for competence low enough, you can get anyone doing this, but then you'll have anyone doing this, and that's a problem. And it's a case study on why you don't drop requirements so far.
A
Yeah. Yeah. Now it seems at least possible. I don't know how effective this malware is, but it seems possible that it actually kind of works. And yeah, I mean, they did call it Lame Hug, so presumably it's not that good. I would have thought it'd be easy to detect queries to Hugging Face, for example, as an IOC.
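Editor's sketch: the idea of treating LLM API traffic as an indicator can be shown with a simple hostname match over proxy logs. This is a minimal sketch under stated assumptions. The two-field `client url` log format and the sample log lines are hypothetical. The hosts are the public domains of the two services named in the discussion. Legitimate developers call them too, so a hit is a lead to investigate rather than proof of compromise.

```python
from urllib.parse import urlsplit

# Public domains for Hugging Face and the Gemini API; subdomains also match.
LLM_API_HOSTS = {"huggingface.co", "generativelanguage.googleapis.com"}


def is_llm_api_host(host: str) -> bool:
    """True if host is one of the listed domains or a subdomain of one."""
    host = host.lower().rstrip(".")
    return any(host == known or host.endswith("." + known) for known in LLM_API_HOSTS)


def flag_llm_requests(log_lines):
    """Yield (client, url) pairs for requests that go to an LLM API host.

    Assumes a hypothetical log format of 'client_ip url' per line.
    """
    for line in log_lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip lines that do not fit the assumed format
        client, url = parts
        host = urlsplit(url).hostname or ""
        if is_llm_api_host(host):
            yield client, url


# Made-up log lines for illustration only.
logs = [
    "10.0.0.5 https://api-inference.huggingface.co/models/example",
    "10.0.0.7 https://example.com/index.html",
    "10.0.0.9 https://nothuggingface.co/",
]
flagged = list(flag_llm_requests(logs))
print(flagged)
```

Matching on the full domain or a dot-prefixed suffix, rather than a plain substring, avoids flagging lookalike names such as the third line.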
B
Yeah. So look, I think it works, but I don't know if this part of working is the part that you want to base your metrics on, right? Like if you're an intelligence organization and you've got your KPIs, you probably don't have "gather basic information about the box" and...
A
That's not your rate limited.
B
That's not right. Like, that's not.
A
That's for that.
B
Right. Like doing that is not the end of the job. You can't be like, yeah, I got a list of all the documents. Check next target. It strikes me that you're able to show success on a thing that isn't even step zero. I would assume that you are able to show up to work. I wouldn't say it's a win that you show up to work. That's what this seems like. They didn't actually do the job, but they did show up.
A
So some of the experimental ones are interesting, perhaps. So there's one called Prompt Flux. Its primary capability is regeneration, which it achieves by using the Google Gemini API. It prompts the LLM to rewrite its own source code, saving the new obfuscated version to the startup folder to establish persistence. Prompt Flux also attempts to spread by copying itself to removable drives and mapped network shares. So this is classified as experimental. And it seems like the sort of thing where eventually the LLM is going to make a mistake and your malware is not going to work. So it may spread, but you'll then have, I guess, like a virus...
B
At some point it's going to go, "You're absolutely right. I did forget to encrypt all of the disk files."
A
And so it seems like it. If you're going to have. I mean, this is basically a worm, right? It tends to spread by copying itself to removable drives and mapped network shares. Like, it seems pretty important that you know what a worm does.
B
Yes, I would have thought so. I don't think you Want a lot of uncertainty in what your worm does. Like it just, it seems to me that that's if you've got a thing that's running all over the place, you kind of want to have some control.
A
Right, Right. And I guess maybe the script or the prompt is fairly defined and that doesn't change. But it says do these things rewrite this source code, make it look different. And so it would. It's probably, you know, within a range, not perfectly executing the script, but it probably doesn't deviate too far away from it, the script, the prompt.
B
Right. So I think the thing is you're now relying on Gemini, because they're using the Gemini API, to read an obfuscated script, rewrite it as a different obfuscated script and preserve functionality 100% between these versions, every single time it does it. That doesn't seem like it will work.
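Editor's sketch: the worry about preserving functionality "every single time" is a compounding-probability argument, and a few lines of Python make it concrete. This is a minimal sketch that assumes each rewrite independently preserves behavior with the same probability. The 99% figure is an illustrative assumption, not a number from the report.

```python
def survival_probability(p_correct: float, generations: int) -> float:
    """Chance that all of `generations` independent rewrites preserved behavior."""
    return p_correct ** generations


# Assumed per-rewrite success rate of 99%, chosen only for illustration.
for n in (1, 10, 50, 100):
    print(n, round(survival_probability(0.99, n), 3))
```

Under that assumption, roughly 60% of lineages are still functional after 50 rewrites and roughly 37% after 100, so even a rewrite step that is almost always right degrades quickly once it feeds on its own output.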
A
Yeah, it seems like if you've got. Well, I guess it's Chinese whispers. Right. And so the problem is when it's self propagating DNA replication.
B
Right. Like you're basically saying copy this thing and it's making an attempt to copy it and sometimes it's going to get it a little bit wrong and then.
A
And again, aren't there already scripts to re obfuscate things that are?
B
Yeah. Polymorphic encryption is from the 90s. If you, if you used an LLM to write a polymorphic encryption engine that did this on the host, that would be. I mean, I'm not sure it could do it, but anyway, that would be an interesting thing. But having the LLM do it itself seems like a recipe for disaster. Like I could see why you'd have this as experimental. It's a thing to play around with. Like it's an interesting thought experiment, but you wouldn't operationalize it.
A
Yeah. So there's another example which is also observed in operations. They call it Quiet Vault, but it's basically the same as the information gatherer. It's looking for credentials though. So again, I think there's a limited number of places where credentials are typically stored. You don't need an LLM to go searching for those places. You know where they are.
B
There's. Right.
A
Like, and so why have an LLM write a script?
B
I mean, why have it dynamically write a script? I think is the. Like it.
A
Yes. Yeah, right.
B
It's the. Okay, yeah, sure. If you want to use an LLM to speed up the development cycle, that's fine, but I don't see why you would add an LLM to the operational part.
A
Right. Unless you're making some decision that you can't. Right.
B
So if it's making an actual decision based on what it detects, there is space for innovation and value. If you're saying hunt through all of the emails that you find and everything that's salacious or risky or suggests illegal activity, you know, exfiltrate that there seems to be some value there. But I'm not sure why you would push that out onto the network that you don't control rather than doing it internally on your own one.
A
So there's another section on the paper, purpose built tools and services for sale in underground forums.
B
So another of their key findings is this one, this maturing cybercrime marketplace for AI tooling. Right. Where they say the underground marketplace for illicit AI tools has matured in 2025. Gartner has issued the four quadrants and we see that the crime GPT is sort of up and to the right and you've got your early adopters and your... Sorry. So: we have identified multiple offerings of multifunctional tools designed to support phishing, malware development and vulnerability research, lowering the barrier of entry for less sophisticated actors. Right. In theory. Okay.
A
So I mean, that probably is true.
B
Right.
A
It seems like it will be, assuming they work.
B
Right.
A
But I think there's many things in crime that seem to be quite. They're trivial, episodic or tactical, I guess.
B
Yeah, I think they're simple to solve, they're straightforward in a lot of ways. So there was a paper that came out a couple of weeks ago that has this framework for understanding whether a particular problem is good for being solved by AI, whether AI is a good solution. And the framework that it puts forward is called SCRIPT. It's short, so it's brief in time. Closed, which means that it doesn't need a lot of context. It's repetitive, so it's just sort of one rote task over and over again. Independent, so it stands alone. It's permissive, so it's forgiving of mistakes. And it's tech ready, which means that you can already do it digitally. Like, that's a solved issue. And so that sort of breaks up into two sections. It's ready to automate, which means it's repetitive, permissive and tech ready. And then it can fit in a context window is the second set, which is it's short, it's closed and it's independent.
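Editor's sketch: the six SCRIPT criteria and the two groupings described above can be written down as a small checklist. This is a minimal sketch of the framework as it is summarized in this conversation, not code from the paper. The `Task` class, the function names, and the yes/no ratings given to the two example tasks are the editor's assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    short: bool        # brief in time
    closed: bool       # does not need a lot of context
    repetitive: bool   # one rote task over and over
    independent: bool  # stands alone
    permissive: bool   # forgiving of mistakes
    tech_ready: bool   # can already be done digitally


def ready_to_automate(task: Task) -> bool:
    """First grouping: repetitive, permissive and tech ready."""
    return task.repetitive and task.permissive and task.tech_ready


def fits_context_window(task: Task) -> bool:
    """Second grouping: short, closed and independent."""
    return task.short and task.closed and task.independent


def good_fit_for_ai(task: Task) -> bool:
    """A task is treated as a good fit only when both groupings hold."""
    return ready_to_automate(task) and fits_context_window(task)


# Hypothetical ratings, chosen to mirror examples from the discussion.
customize_template = Task("customize a template message", True, True, True, True, True, True)
stop_ransomware = Task("stop a ransomware intrusion", False, False, False, False, False, True)

for task in (customize_template, stop_ransomware):
    print(task.name, good_fit_for_ai(task))
```

The second example fails mainly on the permissive criterion, which is the attacker and defender asymmetry the hosts come back to later: an attacker can shrug off a failed attempt, a defender cannot.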
A
Right, right. So the examples We've looked at already the gathering of information from a system. The problem is it falls over at the first point. It already has been automated.
B
This is a solved problem. You can add more layers if you want, but like you're not making it better.
A
Yeah, yeah, yeah, yeah. And so it seems to me that like there would be many things in crime where it's like get someone to click on a button and put their credentials in. I guess like that's already automatic, that's already automated. Right. So what's the, where's the value and.
B
And beyond that? I mean if you, if you just look at the, like the romance scams, they have manuals which are basically just template emails for every stage of like if you have a, it'll be, you know, 800 pages of email templates and you just go, okay, I'm at the point where this is happening and I want them to do this. And you scroll to that part of the document and you cut and paste the email.
A
Yeah.
B
And it's sort of battle tested, proven. It's been used repeatedly. Like it always works. You know, it's good enough. And I suspect that those are probably better than having an LLM do that simply because these have been sort of crafted and refined.
A
They're empirically tested. I think they do something like A/B testing or...
B
Right. Well, the people that are successful keep the templates that they use.
A
Yeah.
B
And that's much more likely to be correct than an LLM coming in cold is my feeling.
A
I mean, you could use an LLM, I guess, to take advantage of current events or something. Feed it a newspaper article or, or the days.
B
Then I would take the template and say, customize this for, you know...
A
Like, that's where it's useful: it helps you get over a lack of native, whatever, language skill.
B
Right, right. And it'll do things like if you accidentally forget to replace someone's name, like you're going through and it goes from like Martha to Delilah to like Dave.
A
So within that SCRIPT framework, that kind of example fits because it's short, it's closed, you're doing the same process. So it's repetitive. But.
B
You don't need a lot of context from previous examples. You just need this current one, this current thing that you're working on and then maybe some temporal context like today's news or this is the name of the person. But those are just script variables. They're not context necessarily. It's not based on all the previous interactions. Derive the next step.
A
Right. Yeah. So that might make it Easier to come up with current news hooks or whatever.
B
Right.
A
But it's not fundamentally changing the nature of what you're doing.
B
I mean, the permissive thing is basically you can make a mistake and it's no big deal.
A
Right? Yeah.
B
And I think the thing with the crime aspect of this is that there's a lot of fish in the sea. Right, right. So even if you completely screw up one thing, it's like, yeah, you know, there goes that opportunity. Back to fishing, like, back to finding more victims. It sucks, but it's not the end of the world. And I think that the huge difference here from offense to defense is that as an attacker, if you don't gain access, like, it sucks, but you'll just try someone else, whereas as a defender, you can't say, well, you know, better luck next ransomware. You'll get them next time, slugger. The cost of failure is quite high. And so that's not a good fit.
A
Right. So the places where it does seem that AI is making a difference in defense are, at least, in making a decision or helping to make decisions about particular incidents by gathering information and breaking a bigger task into lots of those smaller tasks where it is relatively permissive, or you are just very unlikely to make a mistake because it's "query this database and format the results nicely" or whatever. And so it's, I think, more complex in that you've got to take multiple steps and try and break the steps into short tasks that are AI friendly, for lack of a better term. And then maybe the entire body of work is not AI friendly. But because you've done all the little subtasks...
B
80% of it will be. And the last 20% is a lot easier to do because now that 80% has been automated.
A
Yeah. So, I mean, I feel like in this discussion we've been fairly negative about the use of AI in crime, but actually, that's not my view at all. I think that it will be quite useful. And so there's this discrepancy between that and what I see written in the paper, where these are all examples that don't actually make a lot of sense to me. But I think that there must be other stuff going on because you hear of, well, like markets for deepfakes, for example, which I think are probably useful in scams for getting people to suspend disbelief or overcome suspicion or whatever. So I don't understand why this paper in particular seems to have a lot of stuff that I don't think is that compelling.
B
I think these might be the most easily identifiable malicious uses of AI.
A
Right? Because if you're using AI to actually write the script, right. Where's the evidence that it was AI?
B
Right? You get onto a network and you need to decide about what you're going to do. And an AI helps you make that decision. There's not going to be a forensics trace of that. Like it's not going to show up somewhere, right. Unless you're going to, you know, Gemini and you know, dear Gemini, I am a student who is, you know, doing a CTF on a network that looks a hell of a lot like the agricultural Minister of Vietnam's. So I think that these dumb things are what we find because the good things aren't showing up yet.
A
Right?
B
At least that's what I would think. Assuming that there are good uses happening, it seems to me that they're not the sort of things that are going to leave traces that can be detected. So I think one of the things that we brought up was that a lot of this seems sort of like just out of college, first programming job sort of stuff. And what strikes me is that if you are just out of college at your first programming job, what you want to get is skills for your next job. So maybe doing all of this AI development funded by the government is just setting them up to go and work in the private sector for a lot more money. So it's, you know.
A
So even though it looks dumb to us, it's actually smart from the employee's perspective.
B
Right?
A
Like let's muck around with AI, get good at it and then go make bank legitimately.
B
Absolutely right. So it's cybercrime funding someone's career in AI. Wonderful. Thanks a lot, Tom.
A
Thanks, Gruk.
Podcast: Risky Business
Date: November 10, 2025
Participants: Tom Uran (A), Gruk (B)
In this episode, Tom Uran and Gruk examine Google Threat Intelligence Group's latest paper "AI Threat Tracker: Advances in Threat Actor Usage of AI Tools". They break down how AI, particularly large language models (LLMs), are beginning to show up in malware and cybercrime tooling, and analyze whether these innovations represent meaningful technical advances or just technological theater. With a skeptical and sometimes irreverent tone, Tom and Gruk assess the actual capabilities of observed AI-enabled malware, discuss the evolving cybercrime market for AI tools, and contrast offensive and defensive applications of AI in cybersecurity.
"This one is based on threat intel. So it's got a very different flavor in that. It's here's malware that we see and here is how the malware is using LLMs or whatever, AI technology." — Tom Uran [02:08]
First Observed Cases: Google identified malware families ‘Prompt Flux’ and ‘Prompt Steal’ that use LLMs during execution to generate malicious scripts or obfuscate themselves on the fly. ([03:15])
How It Works (and Why It’s Lame):
Reverse Engineering Simplicity:
"It seems to me that if I am a reverse engineer, I would much rather have an LLM prompt to look at than a bunch of obfuscated code." — Gruk [04:10]
"You've added a layer of complexity and uncertainty to a simple fixed problem." — Gruk [07:23]
"I thought that Fancy Bear was relatively competent." — Tom Uran [10:37]
"There's probably quite a lot of, I would say, entry level developers based on this." — Gruk [11:15]
"It feels a bit like that, where it's like, if you can drop the requirement for competence low enough, you can get anyone doing this, but then you'll have anyone doing this, and that's a problem." — Gruk [14:00]
"You're now relying on Gemini...to read an obfuscated script, rewrite it as a different obfuscated script and preserve functionality 100%...that doesn't seem like it will work." — Gruk [17:20]
"If you just look at the, like the romance scams, they have manuals which are basically just template emails for every stage..." — Gruk [22:40]
"As an attacker, if you don't gain access, like, it sucks, but you'll just try someone else, whereas a defender, you can't say, well, you know, better luck next ransomware." — Gruk [25:07]
"If you're using AI to actually write the script, right. Where's the evidence that it was AI?" — Tom Uran [27:59]
"It's cybercrime funding someone's career in AI. Wonderful." — Gruk [29:35]
On the redundancy of AI in basic malware:
"You're taking a defined problem where you can write a script and replacing it with a defined problem where you write a prompt." — Tom Uran [07:11]
On incompetent adversaries:
"You could tell which boxes have been hacked because they basically start rebooting when they run out of RAM and their GPUs melt." — Gruk [09:18]
On the current state of underground AI tools:
"Gartner has issued the four quadrants and we see that the crime GPT is sort of up and to the right..." — Gruk [19:59] (sarcastic)
On attacker vs. defender asymmetry:
"The cost of failure is quite high." — Gruk [25:07]
Wry, skeptical, and occasionally caustic, Tom and Gruk dismiss much of the current AI-in-crime narrative as inflated or premature. They are not techno-optimists for criminal innovation, but also warn not to mistake observable failures for the absence of sophisticated uses—they just may not be visible to analysts yet.