
Loading summary
A
You're listening to the Cyberwire Network, powered by N2K.
B
One of the malware that we identified in our report that we have mentioned. This guy is actually running on the machine. It's collecting all the information of the machine and then sending it to the LLM saying that, hey, you know, I'm running in this environment now give me the code to actually bypass all these things or, you know, do that stuff. Now this thing is dynamically being coming from the LLM or this code is being dynamically generated. It's coming. It's not part of the actual binary. So the actual footprint the attacker has to use in this case is very, very small. You just put a very small binary on that that has plain English inside it, and then it will connect to the LLM and do the rest of the job. So it will definitely increase some complications. But, but I do feel that then we will have the same technology, so we will be able to operate at the same machine speed and then counter.
A
Hello, and welcome to another episode of Data Security Decoded. I'm your host, Caleb Tolan, and if this is your first time joining us, welcome to the show. Make sure you hit that subscribe button so you're notified when new episodes go live. And if you're already a subscriber, thanks for coming back. We encourage you to, as Dr. Seuss would say, rate the show and leave a comment below. Love that rhyme, but in all seriousness, this really helps us reach more listeners like you who are eager to learn more about reducing risk across their business. Now, today I'm joined by Amit Malik, a cyber researcher from Rubrik zero Labs. His team released a report about chameleon malware that hides in the OS and ghost penguins stealing your data via protocols. You aren't even watching Rubrik Zero Labs use LLMs to catch these ghosts. But here's the real question. In a world of AI driven attacks, is your data resilient or are you just building a faster getaway car for the bad guys? Let's get you your answers. Amit, welcome to the podcast. It's so great to have you on Data Security Decoded. I'm really excited for this conversation. But before we dive into the meat of it all, I have to ask you something that I ask every guest, and that is, what is something that is not related to cyber that you're obsessed with? Lately, for me, it's been Pokemon cards. When I was a kid, I did a little bit of collecting of that, and recently a friend got me back into the world of trading Card games and specifically Pokemon. And I've started my own little collection and it's really cool to see how the art has evolved over the years and see the culture that's built up around, around this trading card game. And it's been really cool to be a part of. So what is something that is non cyber related that you're obsessed with?
B
Thank you, Caleb for inviting me to the podcast. Right. And so recently I have got interested into, you know, psychology and philosophy. I'm kind of trying to, you know, go through the people that are prominent and read their books. So right now I am reading Osho and, and, and the teachings from the Osho, like, you know, the, the books that he has written. So there are a bunch of books that I have collected, you know, and I'm going through them one by one. Right now I am kind of reading the, the book name called Awareness and it is pretty interesting. Like, you know, the way, you know, the things are described and then the way human minds work and the ways to, you know, go through that.
A
I love it. When I was in school, I studied political science and we did a little bit of philosophy. Not quite as much as I would have liked, but it's very interesting to kind of stretch and mold your mind in a different way by, by reading different philosophies. So very cool, very cool. Well, I'm really excited to talk about this report that you put together for Rubrik zero Labs. And you know, to kind of slightly play devil's advocate here, everyone really claims that they use AI powered analysis, but how did the LLM change the workflow in this case? Did it actually find the malware or did it just explain what a found much faster.
B
So basically, I mean, we all know that the AI is actually changing the kind of the productivity for people. It is helping a lot in the development world and it is helping a lot in the customer and other areas. Right. So our kind of goal is basically how it can help in the malware analysis because that's what we do day in, day out. Right. So we kind of worked on that thought and designed a system. Now I would not say that we are at a very mature level because there are so much nitty gritties into the malware analysis that happens. Malware sometimes download payload from, once you run them, then they download and then they have various obfuscation techniques and so and so forth. But with the current state of affairs, the type of malwares that we are getting and the decomposition that we can do, like in terms of extracting macros or in terms of doing any unpacking that we see, right. Or any dynamic analysis, basic stuff. We execute and try to extract the memory that we are able to kind of extract the code, and then we try to kind of run that through the LLM. Because LLMs are really good in terms of understanding the code. And at the end of the day, malware is just a code, right? So we have kind of developed specialized prompts working on them. And we asked the LLM, the models, what this code looks like. And surprisingly it is. You know, initially we had not thought really that this can give us that level of results, I mean, the quality of results that we are getting on daily basis. So just to give a perspective, that we are roughly getting around 5,000 to 6,000 samples today based on the hunting and all the stuff that we are doing out of that. We do the clustering and everything that is also part of the system design and so and so forth. Then we get roughly 500 to 600 samples that we really want to look into from the LLM point of view, right? And then we send to the LLM for this analysis. And LLM is providing us only 10 to 20 samples that are worth looking into, that are really new, that are using new techniques and so and so forth. And every day we are finding some surprising facts that are not really possible. If as an analyst I just work on that because the amount of sample is too large, I can't possibly analyze 500, 600 unique samples on daily basis that are of our interest. So definitely I would say that the, the AI or the analysis based on the LLM is definitely helping. As an analyst, I can say that it has increased my productivity to a great level. I do not really have to go and analyze a malware for maybe the initial analysis I don't really have to do that is already being done by the LLM. And two, I would say that if the code is not obfuscated and it's not that complex in terms of, when I'm saying complex, not in terms of functionality, but in terms of the obfuscation and the layer of obfuscation that it is using or the type of payload that it will download from the Internet, so and so forth, then it is providing really, really good results. And that's what I can say from my experience on that.
A
Right. And I'd love to talk a little bit more about that obfuscation that you mentioned there. And also, what a difficult word to say, you know, but anywho, how did the LLM help You really understand the intent of the code rather than just the syntax.
B
So basically what we really do is that you can have a code and then code can be. Let's say if we are talking about a Linux binary, it is an ELF binary. Most of our focus right now is cloud based threads that are either Linux cloud workloads or the ransomwares that are there or other document related, like whether it's a macro or something like that. So let's say that we have a Linux binary and that binary is actually know maybe certain size, it is using certain type of code base inside it. Our focus is to identify the actual business logic of the malware, right? Because in reality that is a lot of code and it will have lots of libraries code as well integrated inside the system stuff that it is using. We don't want to send that to the LLM, otherwise the LLM will get confused. Because if you send all of those things to the LLM, LLM will not really be able to identify what really is the core of the. The logic is. And the cost will also be very high because now you are looking at lots of tokens that you have to consume because you are sending the entire code. So we follow a kind of step process where we have a code, then we kind of remove the library code that is there. There are technology like identification of library code and then removing that, we do the decompilation of the code and then we send the code to the LLM. Now the core of LLM is again you have a business logic extracted before sending it to the LLM and then prompt as well, the type of prompt that we have to use. So it requires a little bit of effort in terms of different permutations because prompt is just English text that you going to talk to the LLM and give that code and do that. So based on our experience and doing a little bit of iterations, we are able to embed this code into a prompt and ask LLM that hey, can you do the analysis of this code that we are sharing with you and what type of functionalities or uniqueness that you see that you know, kind of not seen previously. Now the existing models that we have like ChatGPT or Anthropic and we are talking about they have certain level of guardrails inside them. They are designed for a defensive purpose because they do not know from that perspective whether it is really an analyst that is asking about that or it's a malware author is actually asking about that. So that, so that is the challenge that we kind of need to solve it, but we have prompts that try to kind of say that take as are the good guys and then try to do the best you can and try to tell us in summary what this kind of code is doing. And if there is anything significant, then tell us that you have seen something significant. So that is the way we are using.
A
Yeah, right, absolutely. Now let's talk a little bit about Chameleon C2, which you mentioned in the report. Why is the Windows subsystem for Linux such a juicy target? Is it because EDR tools are essentially just looking the other way when Windows are talking to its Linux sub layer?
B
Yeah, exactly. I think it all depends on the malware evolution. Malware authors are using all the unique ways because they have the infrastructure, they have the horsepower in terms of they can use the different security products in their environment. They have to, you know, they kind of test those things, you know, because the for malware author it is just one single stuff, right? They, they just have to bypass the security products and then they have to just deploy into their target and then compromise those things. Specifically, when we talk about the apts, we believe that some of these, you know, kind of malwares that we have mentioned here, they are part of some malware, you know, the APT groups that have done so. This is the interesting thing. It is not like it has been talked about that the WSL can be abused by the malware authors, but it was mostly in terms of as a proof of concept in conferences by the researchers and so on and so forth. And we have seen like, in this, they have beautifully used the WSL to actually use the Linux subsystems to compromise and do all this kind of activity that they are doing, right? And we are tracking this thing. And now we see initially we saw one or two samples and now we see large quantity of samples in terms of the. That are using this type of technique. That is so when these type of things happen, it can be that the malware authors are actually testing something initially and they want to see whether it is detected, not detected, and what really is going on. And then later on they kind of, you know, use that in more organized way and then use that to compromise more users. So this one was interesting. And so it is not like we have talked about only three cases that we felt is very, very interesting. On daily basis we are getting lots of insight. And again, that credit I will give to the LLM because practically it was not possible for us to go through that number of samples and then extract Those insights from these malwares. It is because of the systems and the design of the systems that we have in accordance with the LLM that we are able to kind of sport these things as they happen or as the malware sample is submitted or as the malware sample is circulated. We have that way to look into it.
A
Yeah, right, right. The productivity gains that you're mentioning are just incredible. So this is really, really exciting. And your report mentions apt 41 is the Linux RAT or RAT. You found a greatest hits version of their old stuff or are we seeing a total rebuild for modern cloud and.
B
Hybrid environments based on the code and variables, it feels like it is associated with APT41 though there is no 100% confidence that this is really associated with APT41. But if you look at the maturity of the code inside that and the type of functionality that is there in the red, it looks like that it is associated with the APT. And most likely it might be associated with APT41. Right? Because on Linux side it is very. Because Linux is primarily used by the enterprise, the servers and all these things. It is not something people want to steal your credit card or your password on that Linux is mostly business orientation. If somebody is deploying it at sophisticated capabilities, then it largely means that they have different and bigger motives than the simple commodity malwares are doing. So in its fairground. We do feel that it is, it is associated with the APT and the system basically identified. On that day we got around three, four notifications that we have talked about in the blog. And because the system is relatively new right now, we are getting very interesting insights right now that, you know, there are a bunch of other things that we haven't really talked about, but we are getting really good insights from the system and with time we will basically share that with the community, our findings. But yeah, I mean it looks like that it might be associated with an APT and it was purely the kind of the behavior of the code and then identification of that is done by LLM itself. We just looked into that. We verified the technical accuracy that. Yes, this is because. So here is the thing, right? So right now at this state, you cannot trust LLM at all, like 100%. You can't say that whatever is LLM is saying is 100%. So that stuff you have to do like okay, it has been referred to as we got the notification, we saw that there is something new coming, but then it has to be technically validated so that all the functionality, we validate it, you know, manually to make sure. That everything that is there is. Is correct in, in its sense.
A
Absolutely.
B
Yeah.
A
Hallucinations are still a thing and so it's important to have that human layer of trust there for sure. Now your report also mentioned that Ghost Penguin uses UDP for communication for our non networking nerds out there and engineers. Why does that make a defender's life really like a nightmare?
B
I mean the UDP is definitely like TCP is something that you kind of know that there is a handshake and then there is a proper packet rebuilding that you can do in terms of. Because there is a sequence and all these type of things. You can kind of correlate the timelines and then see hey, what really is being done. But UDP is very, very, I would say asynchronous in that way. Like you have packet delivery right now, you have command and control and then analyzing, let's say PCAP file. Will not that be easy in terms of. In TCP you can correlate sequencing and stuff, but in UDP it's kind of difficult right from that point of view. But it was kind of interesting to see that the malware authors are actually going in that direction and using UDP as a command and control.
A
So I want to step back a little bit from the report. We've dove into multiple elements of this blog and report that you put together. But if I'm a CISO at some mid sized company, I'm probably thinking I don't have a Rubik Xero Labs, I don't have this team that's investigating into these innovative ways to scan and analyze malware. How do these findings change how an average company should look at their posture?
B
One thing I would say, and I think pretty much everybody is doing, if that level of resources or skill set is not there, then definitely just like we are contributing, it's very, very security community is very strong in terms of the information that is coming from all the companies, whether the other security companies are there. So they do proactively share any important intelligence on their blogs and stuff. So they should keep an eye on that thing to see what really is going on and then they should see what really is their environment look like. Right. If their environment is having Linux exposure higher or the cloud, which is going to be the case, then they should kind of ingest that information and then try to see how they can leverage this information coming in their security posture and how they can follow the best practices to mitigate the risk.
A
Right. All right, so my last question for you kind of circles back to the beginning, you kind of started us off with talking about how you're really getting into analyzing different philosophies. And this is kind of a philosophic question for you. So if hackers are Starting to use LLMs to write the code as fast as your systems are analyzing it, who wins that race? Is machine speed defense enough when the attack also starts moving at machine speed as well?
B
Yeah, I mean, definitely that is very interesting because there is one report that we have published ourselves as well where we have kind of shared that there are malware samples that are actually trying out. It is not really in production, but I feel like the malware authors are kind of trying the different ways how they can use the LLM in order to devise different things. And that report is public actually on Rubik zero Labs page where we have shared the case studies from the live malware samples. So one of the things that I feel is basically since the malware authors are using the LLM and at the defense side we are also using LLM to analyze. I think the problem right now for the authors or the malware authors is basically to call the LLM, because LLM is not something, at least as of now, you can host on your own, right? It is something that you have to rely on some third party and something. So you have to make a call that is reasonably a good thing for defenders because then they know that hey, some sort of call is being made to a public provider and then they can watch for that and then they can see and block those things. And it also kind of restrict the attackers to use it as a large scale, because as you use it as a large scale, people will get to know and then they will kind of put those guardrails there. But at the same time, I do feel that at some point in time as the top technology will kind of matures and the computation will be not that much required. And then especially the nation state threat actors where they have the horsepower to actually kind of host their own LLM models without guardrails and all these things. I do feel that it will to some degree complicate the life of the defenders if they start using the LLMs in the code. And the reason I feel is because it's a plain English, it's not really a code. The guy is just writing a prompt saying that, hey, give me this code. Or just to give an example, one of the malware that we identified in our report that we have mentioned, this guy is actually running on the machine. It's collecting all the information of the machine and then sending it to the LLM, saying that hey, I'm running in this environment now give me the code to actually bypass all these things or do that stuff. Now this thing is dynamically being coming from the LLM or this code is being dynamically generated. It's coming, it's not part of the actual binary. So the actual footprint the attacker has to use in this case is very, very small. You just put a very small binary on the that has plain English inside it and then it will connect to the LLM and do the rest of the job. So it will definitely increase some complications. But I do feel that then we will have the same technology so we will be able to operate at the same machine speed and then counter these two things, right?
A
I certainly hope so, because, you know, recently we had a conversation with Hayden Smith, CEO of Hunted Labs on the podcast too, where we were talking about how attackers are even more so now using these open source libraries to insert malware into different technologies in people's environments. Something that a lot of enterprises aren't even really aware of necessarily at this point. So it's kind of interesting to kind of build on that a little bit more. But Amit, thank you so much for joining us and telling us about this innovative method that you and the team have uncovered, using LLMs to uncover malware within code and kind of analyze it there too. So for those who are listening, you can learn more about this in the show notes. We'll drop the link to the report in there. Also on zeolabs.rubrik.com I know they have a bunch of other resources that Amit and the team are pulling together all the time, so they're really doing some cool stuff over there. Amit, is there anything else you want to leave our audience with as you as we wrap up here?
B
I would say at Jiro Labs we are doing pretty good stuff and we are making our effort to share our findings with the security community. So I hope those contributions make life easier somewhere.
A
Wonderful. Thank you so much again and until.
B
Next time, thank you, Kelly.
A
That's a wrap on today's episode of Data Security Decoded. If you like what you heard today, please subscribe wherever you listen and leave us a review on either Apple Podcast or Spotify. Your feedback really helps me understand what you want to hear more about. And if you want to email me directly about the show, you can send me an email at data-security-decoded2k.com thank you to Rubrik for sponsoring this podcast. Thanks to the team at N2K, which includes senior producer Alice Carruth and executive producer Jennifer Ibin. Content strategy by Mayan Plout Sound design by Elliot Peltzman Audio mixing by Elliot Peltzman and Trey Hester Video production support by Bridging Cricky Wilds and Sorel Joppy. Until next time, stay resilient.
Host: Caleb Tolan (A)
Guest: Amit Malik, Cyber Researcher at Rubrik Zero Labs (B)
Date: January 20, 2026
This episode explores how Rubrik Zero Labs leverages Large Language Models (LLMs) to rapidly analyze malware and keep pace with increasingly AI-powered cyber threats. Host Caleb Tolan and guest Amit Malik discuss their recent report on "Chameleon malware" and "Ghost Penguin" threats, detailing how attackers are both evading detection and experimenting with LLMs to generate adaptable code. The conversation dives into real-world examples, the challenges and benefits of AI-driven malware analysis, and how defenders can apply these insights—regardless of organizational size or resources.
Traditional Limitations:
New Workflow with LLMs:
Workflow to Extract Business Logic:
LLM Guardrails:
Quote: “Based on our experience and doing a little bit of iterations, we are able to embed this code into a prompt and ask LLM…if there is anything significant, then tell us.” — Amit Malik [09:14]
Targeting WSL:
Enhanced Detection:
Quote: “It was not possible for us to go through that number of samples and then extract those insights from these malwares. It is because of…systems and the design…with the LLM that we are able to…” — Amit Malik [11:52]
Attribution Insights:
Caution Required:
Why UDP Matters:
Quote: “UDP…you have packet delivery right now, you have command and control…analyzing…will not…be easy…in UDP it’s kind of difficult…” — Amit Malik [15:36]
How Can Organizations Benefit?
Quote: “If their environment is having Linux exposure higher or…cloud, which is going to be the case, then they should…try to see how they can leverage this information…in their security posture…” — Amit Malik [16:58]
Current Edge for Defenders:
Emerging Threat & Defensive Parity:
Optimism:
“[With LLMs] as an analyst, I don’t really have to do the initial analysis—that is already being done by the LLM.” — Amit Malik [05:34]
“If you send all of those things [all code including libraries] to the LLM, [it] will not really be able to identify what really is the core.” — Amit Malik [07:47]
“The security community is very strong in terms of the information that is coming from all the companies… They do proactively share any important intelligence…” — Amit Malik [16:49]
“Right now at this state, you cannot trust LLM at all, like 100%. You can’t say that whatever is LLM is saying is 100%. …it has to be technically validated so that all the functionality, we validate it, you know, manually to make sure…everything that is there is correct…” — Amit Malik [14:10]
“You just put a very small binary on that that has plain English inside it, and then it will connect to the LLM and do the rest of the job. …it will definitely increase some complications. But…we will have the same technology, so we will be able to operate at the same machine speed and then counter.” — Amit Malik [20:06]
Rubrik Zero Labs' innovative use of LLMs is enabling defenders to analyze massive volumes of malware samples rapidly, surface truly novel threats, and counter emerging attacker innovations—like dynamic, environment-aware code generated at machine speed. While AI is closing some gaps, the human layer remains critical for validating findings and arming organizations with actionable intelligence. Even organizations without specialized resources can learn from and apply these insights; staying informed, vigilant, and adaptive is essential in a world where attack and defense are both accelerating.