Data Security Decoded
Episode: How Rubrik Zero Labs Uses LLMs to Analyze Malware at Machine Speed
Host: Caleb Tolan (A)
Guest: Amit Malik, Cyber Researcher at Rubrik Zero Labs (B)
Date: January 20, 2026
Episode Overview
This episode explores how Rubrik Zero Labs leverages Large Language Models (LLMs) to rapidly analyze malware and keep pace with increasingly AI-powered cyber threats. Host Caleb Tolan and guest Amit Malik discuss their recent report on "Chameleon malware" and "Ghost Penguin" threats, detailing how attackers are both evading detection and experimenting with LLMs to generate adaptable code. The conversation dives into real-world examples, the challenges and benefits of AI-driven malware analysis, and how defenders can apply these insights—regardless of organizational size or resources.
Key Discussion Points & Insights
1. The Shift: LLMs Transforming Malware Analysis ([03:54])
Traditional Limitations:
- Manual malware analysis is unable to keep up with the volume and sophistication of today’s threats.
- Typical daily funnel: 5,000-6,000 samples are initially collected; clustering narrows these to about 500-600 that require deeper review; LLM analysis ultimately highlights the 10-20 truly novel or dangerous samples.
New Workflow with LLMs:
- Automated Triage: LLMs quickly analyze code and surface the most noteworthy, unique, or advanced samples.
- Analyst Productivity: “As an analyst, I can say that it has increased my productivity to a great level. I do not really have to go and analyze a malware for maybe the initial analysis…that is already being done by the LLM.” — Amit Malik [05:34]
- Code Understanding: LLMs excel at scanning code for intent, not just function, even in obfuscated samples—if certain preprocessing steps are taken.
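The triage funnel described above (thousands of raw samples clustered down to a short list for deeper review) can be sketched with a toy clustering step. Everything here, including the `feature_signature` built from imported API names, is an illustrative assumption, not Rubrik's actual pipeline:

```python
from collections import defaultdict

def feature_signature(sample: dict) -> frozenset:
    """Illustrative feature: the set of API names a sample imports.
    A real pipeline would use richer features (fuzzy hashes, CFG stats)."""
    return frozenset(sample["imports"])

def cluster_samples(samples: list[dict]) -> dict:
    """Group samples that share an identical feature signature."""
    clusters = defaultdict(list)
    for s in samples:
        clusters[feature_signature(s)].append(s)
    return clusters

def pick_representatives(clusters: dict) -> list[dict]:
    """One sample per cluster goes on to deeper (LLM) analysis."""
    return [members[0] for members in clusters.values()]

if __name__ == "__main__":
    # Toy corpus: three samples, two of which are near-identical variants.
    corpus = [
        {"name": "a.bin", "imports": ["CreateProcessW", "InternetOpenA"]},
        {"name": "b.bin", "imports": ["CreateProcessW", "InternetOpenA"]},  # variant of a
        {"name": "c.bin", "imports": ["socket", "fork", "execve"]},         # Linux sample
    ]
    reps = pick_representatives(cluster_samples(corpus))
    print(f"{len(corpus)} samples -> {len(reps)} clusters for deeper review")
```

At production scale the same idea (deduplicate first, analyze representatives) is what shrinks thousands of daily samples to a few hundred.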
2. Technical Deep Dive: Obfuscation & LLM Prompting ([07:25])
Workflow to Extract Business Logic:
- Strip out library and non-essential code before feeding samples into the LLM to keep token size manageable and clarify intent.
- Design specialized prompts to direct the LLM towards detecting novelty and intent, not simply regurgitating code syntax.
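The two preprocessing steps above might be sketched as follows. The `is_library` flag stands in for real library identification (e.g. signature matching), and `MAX_CHARS` is a crude stand-in for a token budget; all names here are hypothetical:

```python
MAX_CHARS = 8_000  # crude stand-in for an LLM token budget

def strip_library_code(functions: list[dict]) -> list[dict]:
    """Keep only functions not flagged as known library code, so the
    prompt carries the sample's business logic rather than boilerplate."""
    return [f for f in functions if not f["is_library"]]

def build_triage_prompt(functions: list[dict]) -> str:
    """Frame the request around intent and novelty, not syntax, and make
    the defensive purpose of the analysis explicit."""
    body = "\n\n".join(f["code"] for f in functions)[:MAX_CHARS]
    return (
        "You are assisting a defensive malware analysis team.\n"
        "Given the decompiled functions below, describe the program's "
        "intent and flag anything novel or significant.\n\n" + body
    )

if __name__ == "__main__":
    funcs = [
        {"code": "int printf_wrapper(void) { return 0; }", "is_library": True},
        {"code": "void beacon(void) { connect_c2(); }", "is_library": False},
    ]
    print(build_triage_prompt(strip_library_code(funcs)))
```

In practice the prompt wording would be iterated, as Amit describes, and the budget enforced in tokens rather than characters.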
LLM Guardrails:
- Commercial LLMs ship with built-in protections; prompts and framing must make clear that the analysis is defensive, not for adversary use.
Quote: “Based on our experience and doing a little bit of iterations, we are able to embed this code into a prompt and ask LLM…if there is anything significant, then tell us.” — Amit Malik [09:14]
3. Chameleon C2 & the Windows Subsystem for Linux (WSL) Attack Surface ([10:15])
Targeting WSL:
- Malware is exploiting the Windows Subsystem for Linux, which many endpoint detection & response (EDR) tools overlook.
- Initial usage was proof of concept; it’s now moving to large-scale exploitation.
Enhanced Detection:
- LLM-driven analysis has surfaced many more examples, showing attackers are testing and iterating to improve their tactics.
Quote: “It was not possible for us to go through that number of samples and then extract those insights from these malwares. It is because of…systems and the design…with the LLM that we are able to…” — Amit Malik [11:52]
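One illustrative heuristic for the WSL blind spot is to flag processes whose parent is a WSL host process. The process snapshot format and parent-name list below are toy assumptions for the sketch, not a production EDR rule:

```python
# Parent process names commonly associated with WSL on Windows hosts
# (illustrative, not exhaustive).
WSL_PARENTS = {"wsl.exe", "wslhost.exe"}

def flag_wsl_children(processes: list[dict]) -> list[dict]:
    """Return processes launched under a WSL host process, i.e. activity
    that some EDR tools have historically given less scrutiny."""
    by_pid = {p["pid"]: p for p in processes}
    flagged = []
    for p in processes:
        parent = by_pid.get(p["ppid"])
        if parent and parent["name"] in WSL_PARENTS:
            flagged.append(p)
    return flagged

if __name__ == "__main__":
    snapshot = [
        {"pid": 100, "ppid": 1, "name": "wsl.exe"},
        {"pid": 200, "ppid": 100, "name": "curl"},       # spawned inside WSL
        {"pid": 300, "ppid": 1, "name": "explorer.exe"},
    ]
    for p in flag_wsl_children(snapshot):
        print(f"review: {p['name']} (pid {p['pid']})")
```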
4. Attribution & Novelty – The APT41 Linux RAT ([13:06])
Attribution Insights:
- Code analyzed strongly suggests links to APT41, a known advanced persistent threat (APT), though not 100% confirmed.
- Focus on Linux targets suggests enterprise data theft motives over individual credential theft.
Caution Required:
- “You cannot trust LLM at all, like 100%. …It has to be technically validated…” — Amit Malik [14:10]
- Human expertise is still needed to verify LLM findings due to the risk of hallucination.
5. Ghost Penguin and UDP-based C2 ([15:20])
Why UDP Matters:
- Ghost Penguin uses UDP (not traditional TCP) for command-and-control communication, making defensive network analysis much harder.
- The asynchronous, stateless nature of UDP means defenders cannot easily reconstruct high-level events from packet captures.
Quote: “UDP…you have packet delivery right now, you have command and control…analyzing…will not…be easy…in UDP it’s kind of difficult…” — Amit Malik [15:36]
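The difficulty can be illustrated with a toy flow reassembler: without TCP's handshake and sequence numbers, UDP "sessions" can only be inferred heuristically, e.g. by grouping packets on the same 4-tuple within an idle timeout. The timeout value and packet format below are assumptions for illustration:

```python
IDLE_TIMEOUT = 30.0  # seconds of silence before we assume a new "session"

def infer_udp_sessions(packets: list[dict]) -> list[list[dict]]:
    """Group UDP packets into pseudo-sessions by 4-tuple plus idle timeout.
    With no handshake or sequence numbers, session boundaries and ordering
    are guesses; in TCP, by contrast, the stream structure is explicit."""
    sessions: list[list[dict]] = []
    last_seen: dict[tuple, tuple] = {}  # 4-tuple -> (last timestamp, session index)
    for pkt in sorted(packets, key=lambda p: p["ts"]):
        key = (pkt["src"], pkt["sport"], pkt["dst"], pkt["dport"])
        if key in last_seen and pkt["ts"] - last_seen[key][0] <= IDLE_TIMEOUT:
            idx = last_seen[key][1]
        else:
            idx = len(sessions)
            sessions.append([])
        sessions[idx].append(pkt)
        last_seen[key] = (pkt["ts"], idx)
    return sessions

if __name__ == "__main__":
    pkts = [
        {"ts": 0.0, "src": "10.0.0.5", "sport": 40000, "dst": "203.0.113.9", "dport": 53},
        {"ts": 1.2, "src": "10.0.0.5", "sport": 40000, "dst": "203.0.113.9", "dport": 53},
        {"ts": 90.0, "src": "10.0.0.5", "sport": 40000, "dst": "203.0.113.9", "dport": 53},
    ]
    print([len(s) for s in infer_udp_sessions(pkts)])  # prints [2, 1]
```

The timeout is a pure heuristic: pick it too short and one C2 exchange splits into many "sessions", too long and unrelated exchanges merge, which is exactly the ambiguity defenders face.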
6. Practical Takeaways for Mid-Size Enterprises ([16:23])
How Can Organizations Benefit?
- Even those without specialized labs can leverage public threat intelligence reports and best practices shared by companies like Rubrik, CISA, and others.
- Know your environment: High Linux/cloud presence demands different attention than pure Windows shops.
Quote: “If their environment is having Linux exposure higher or…cloud, which is going to be the case, then they should…try to see how they can leverage this information…in their security posture…” — Amit Malik [16:58]
7. The Arms Race: Machine-Speed Attack vs. Machine-Speed Defense ([18:04])
Current Edge for Defenders:
- Attackers using public LLMs leave a network trail (API calls), creating opportunities for detection and blocking.
- Nation-state or advanced adversaries will eventually host and use their own models without guardrails or traceability.
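The "network trail" observation suggests a simple defensive check: watch egress for lookups of public LLM API endpoints from hosts with no business reason to make them. The hostname list and log format below are illustrative assumptions:

```python
# Hostnames of public LLM APIs (illustrative, not exhaustive).
LLM_API_HOSTS = {
    "api.openai.com",
    "api.anthropic.com",
    "generativelanguage.googleapis.com",
}

def flag_llm_egress(dns_log: list[dict]) -> list[dict]:
    """Flag DNS lookups for public LLM API endpoints. This is the trail
    a self-hosted, guardrail-free model would not leave."""
    return [e for e in dns_log if e["qname"] in LLM_API_HOSTS]

if __name__ == "__main__":
    log = [
        {"host": "build-server-3", "qname": "api.openai.com"},
        {"host": "laptop-17", "qname": "example.com"},
    ]
    for e in flag_llm_egress(log):
        print(f"{e['host']} resolved {e['qname']}")
```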
Emerging Threat & Defensive Parity:
- Sophisticated malware is beginning to offload key functionality to LLM prompts, reducing the static footprint and enabling dynamic, environment-specific attacks.
- “You just put a very small binary…that has plain English inside it, and then it will connect to the LLM and do the rest of the job.” — Amit Malik [20:06]
- Defenders must also operate at “machine speed” to maintain parity in this evolving battle.
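A defender-side counterpart to the "small binary with plain English inside it" technique is to scan binaries for embedded natural-language text. The word list and thresholds below are toy assumptions, not a vetted detection rule:

```python
import re
import string

# Common English words whose presence in extracted strings suggests an
# embedded natural-language prompt rather than ordinary code strings.
COMMON_WORDS = {"the", "and", "you", "that", "with", "generate", "write", "code"}

def printable_strings(blob: bytes, min_len: int = 8) -> list[str]:
    """Extract runs of printable ASCII, like the classic `strings` tool."""
    pattern = b"[%s]{%d,}" % (re.escape(string.printable.encode()), min_len)
    return [m.decode() for m in re.findall(pattern, blob)]

def looks_like_prompt(s: str, min_hits: int = 3) -> bool:
    """Heuristic: enough common English words appear in a single string."""
    words = set(re.findall(r"[a-z']+", s.lower()))
    return len(words & COMMON_WORDS) >= min_hits

if __name__ == "__main__":
    blob = (
        b"\x00\x01MZ\x90"
        b"write code that scans the current directory and exfiltrates files"
        b"\x00\x02"
    )
    print([s for s in printable_strings(blob) if looks_like_prompt(s)])
```

The static footprint shrinks, but it does not vanish: the prompt itself, and the outbound call to fetch generated code, both remain observable.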
Optimism:
- “We will have the same technology, so we will be able to operate at the same machine speed and then counter…” — Amit Malik [20:52]
Notable Quotes & Memorable Moments
- “[With LLMs] as an analyst, I don’t really have to do the initial analysis—that is already being done by the LLM.” — Amit Malik [05:34]
- “If you send all of those things [all code including libraries] to the LLM, [it] will not really be able to identify what really is the core.” — Amit Malik [07:47]
- “The security community is very strong in terms of the information that is coming from all the companies… They do proactively share any important intelligence…” — Amit Malik [16:49]
- “Right now at this state, you cannot trust LLM at all, like 100%. You can’t say that whatever is LLM is saying is 100%. …it has to be technically validated so that all the functionality, we validate it, you know, manually to make sure…everything that is there is correct…” — Amit Malik [14:10]
- “You just put a very small binary on that that has plain English inside it, and then it will connect to the LLM and do the rest of the job. …it will definitely increase some complications. But…we will have the same technology, so we will be able to operate at the same machine speed and then counter.” — Amit Malik [20:06]
Key Timestamps
- 03:54 – Impact of LLM automation on malware triage and analyst workflow
- 07:25 – Technical workflow to extract business logic and overcome obfuscation before LLM analysis
- 10:15 – The rise of Chameleon malware and exploitation of Windows Subsystem for Linux
- 13:06 – Attribution insights and caveats about APT41-linked Linux malware
- 15:20 – Why Ghost Penguin’s UDP-based C2 is so hard to detect
- 16:44 – Actionable advice for organizations without specialized resources
- 18:04 – Analysis of the LLM-powered attacker vs. LLM-powered defender arms race
- 20:06 – Details of malware dynamically generating code via LLM, and implications for defenders
Conclusion
Rubrik Zero Labs' innovative use of LLMs is enabling defenders to analyze massive volumes of malware samples rapidly, surface truly novel threats, and counter emerging attacker innovations—like dynamic, environment-aware code generated at machine speed. While AI is closing some gaps, the human layer remains critical for validating findings and arming organizations with actionable intelligence. Even organizations without specialized resources can learn from and apply these insights; staying informed, vigilant, and adaptive is essential in a world where attack and defense are both accelerating.
