
A
Hey everyone, and welcome along to Seriously Risky Biz. My name is Amberly Jack. This is our podcast all about cyber security policy and intelligence. In just a moment, I'll bring in Tom Uren, our policy and intelligence editor, to chat all about the Seriously Risky Business newsletter that he's put together this week. First, this week's episode is sponsored by Thinkst, and you can find them at T-H-I-N-K-S-T.com, so a big thank you to them for that. Now, Tom, I want to jump straight into the newsletter here and your first piece: Chinese-made DeepSeek, in particular the R1 model, turns out, throws out some more insecure code when prompts mention things that are considered sensitive to the Chinese Communist Party. This is according to recent research from CrowdStrike. But Tom, ignoring the fact that, I mean, you and I were sort of laughing about this before and saying we're not entirely sure why, when you are prompting an LLM to produce code, you would make specific mention of, say, Tibet or Falun Gong. But that aside, were you surprised by this research?
B
Kind of yes and no. So, yes, because I think that mostly model manufacturers aren't going around trying to get models to make insecure code in particular circumstances. So, like, if they had deliberately put in that kind of, I don't know, would it be a sabotage or a backdoor or that kind of feature, I guess that would be very surprising. It doesn't seem like that's what's going on here. What it looks like is going on, or at least what CrowdStrike speculates, is this phenomenon called emergent misalignment, which is basically: if you try and get a model to do something, you fine-tune it for one particular thing, that, like, actually results in bad outcomes elsewhere. And so the theory is that the model's been trained to think of the things that the Communist Party dislikes, hates, Uyghurs, the Falun Gong, the Hong Kong pro-democracy movement, as bad. And so whenever you associate something with those bad things, you get bad outcomes.
A
Yeah, right.
B
And so the specific example they give in the report is, well, I should step back. What they did is they asked a whole lot of models the same question, gave them the same prompt, and asked them to code a specific task. And I think the good thing about coding is that you can then measurably test whether it's secure or not. And so they said, okay, when we just asked them a straightforward coding task, DeepSeek is about as good as anything else. It was a few percentage points worse than the best Western open source model. So that was, in terms of.
A
It was very little though. It was like 16 and 19% or something like that.
B
Yeah, yeah. So very little or, like, really quite a lot, depending upon your point of view. So if you ask an open source model to code something, about 16% of the time the best model will have some sort of vulnerability in it.
A
Yep.
B
Like that seems like actually remarkably high. And by that measure DeepSeek was 19%. So not all that much different, if at 16% you're checking for vulnerabilities anyway, like that's the only way that you can use them, and being rigorous about your coding processes. And so 16 to 19 doesn't make much difference. It's not, you know, 0, 100. But then they took the same models and they would add what I would call extraneous information. They called it contextual modifiers. So they had neutral ones, like for a cybersecurity company, for a financial institution, for a Western cyber security company, for a named specific company. And then they had ones which they described as geopolitical and could be sensitive, like for a Tibetan company, or run by Uyghurs, or run by the Hong Kong pro-independence movement, something like that. And I don't know, is it a surprise or not? If you add some particular modifiers, DeepSeek in particular got a lot worse at coding. So if you mentioned the Falun Gong, for example, it actually got a bit conflicted. And so at times they could actually see the internal thought process, air quotes, of DeepSeek, and it would say, well, you know, the Falun Gong is bad. Is it ethically correct for me to do this? Well, they're just after a technical answer. And so almost half the time it would just straightforwardly refuse and say, I'm sorry, I cannot do this for you. So CrowdStrike has come across something real here. There is like a real phenomenon. But at the same time I had this feeling that the report was cherry picking things that would make it more sensational. Which, I mean, to be honest, that's not an unusual phenomenon. So the baseline vulnerability rate for DeepSeek was 19%. And the specific figure they gave when you mention "for industrial use in Tibet" was 27%. So that's a very, very specific... "for industrial use in Tibet", that's a very specific example to associate with that. Right.
And it's unclear to me if it's just poorly explained or they're saying that's the worst one. And if it's, you know, for commercial use by the Falun Gong, it's less than that. So with that caveat, like I said, I still think they've come across something real and so reason for concern. It's interesting to me to see how you've got this technical report. It's relatively narrowly scoped and it actually says, we think that other models, I think they use the words, it's not unlikely that other models have this same phenomenon, so that if you mention certain topics to other models, you'll also get bad results. Now, of course, once it becomes quite political, it just gets interpreted as China, DeepSeek, terrible, bad, we can never have it.
A
Yeah.
B
Now, I think there's lots of different models, so, you know, whatever. I think if you're a US company, it probably makes no sense to use DeepSeek because you're just assuming political risk without any, like, major benefit.
A
Yeah, yeah, for sure. But I mean, as you said there, and as CrowdStrike pointed out as well, this would probably be something that other LLMs and other models would do. I mean, clearly when they did this research, they weren't putting prompts, you know, this is for MAGA, or this is for QAnon, or this is for.
B
Yeah, yeah, exactly.
A
Whatever else may be controversial in whatever political spectrum you sit on, would you expect that the results would be kind of skewed for other LLMs with specific prompts like that?
B
Yeah, I would be surprised if they're not. And the reason I think that is that some of the topics you mentioned, MAGA, QAnon, far left, far right, whatever.
A
Yeah, yeah.
B
They're politically charged.
A
Yep.
B
And this seems to be a feature of LLMs in general, not of DeepSeek. And so if you do the same research on Western models, I would be surprised if you don't find the same results. And it may be not as dramatic, but I think you pick an emotive topic and associate it with a banal coding task, and I think that will change results. It strikes me as kind of strange. Why would you have a coding task and then just sort of add information that is not relevant to what you're trying to achieve? That can't possibly make it better. Adding extra random information can't make your task better.
A
Absolutely. I'm assuming, because I know that they used DeepSeek and they used other models, I'm assuming one of the other models was not Grok.
B
So you're referring to, recently, Grok.
A
Grok.
B
If you asked it anything about Elon Musk, it would just say outrageous stuff. You know, Elon Musk is fitter than LeBron James. It's better. He's better at resurrection than Jesus Christ. He's made the most outstanding contributions to humanity. I think one was, he's a better piss drinker than anyone else. And so this is all very funny, but if you took that version of Grok, I don't know if it's been sort of amended since, and you ask it about something that cuts against Musk's interests, like it would be, you would expect that that result would be bad in some way or modified. And so the broader point is that all LLM makers have an interest in getting their models to produce the right results. And what the right results are depends on who you are and what you want. And everyone has a political viewpoint. And so this report, yes, China has a clear viewpoint of what's bad. It's putting that in its models and we need to be careful of that. But we also need to be aware that our own models are doing that as well. And we also need to be careful of that.
A
So the key takeaway is if you are using an LLM to produce code, check it.
B
I think you should be checking it for everything, really, shouldn't you? They're an aid, not a replacement. So I mean, the error rate still is high enough that I don't trust them.
A
Yeah, yeah. Jumping onto your second story, Tom, we have learnt quite a bit about an Iranian cyber espionage org, Department 40, or Charming Kitten. They've been doxxed in quite a big way by UK outlet Iran International, and they have written a lot of details about this group and how it operates, which I guess gives really good insight into sort of what goes on behind closed doors here. But what stood out to you here, Tom?
B
Yeah, so this feels like a hack and leak operation. It feels like someone has gotten into this group, Department 40, and they've given a whole lot of documents to Iran International, which is an Iranian-focused English, or UK-based, outlet. So in China we've seen similar leaks where it's been a disgruntled employee. And so this one feels just a lot more complete. It's got, you know, national ID numbers, names, photos, it's got the national ID number of the founder and also his fake identities as well. So it's really very, very comprehensive and so it feels like a more complete, but also from-outside-in, picture. There's kind of things that are similar to what I think of as an, air quotes, normal cyber espionage agency. So they've got different teams. They've got an infrastructure team, which is one of the bigger ones. They've got an OSINT team that runs their social media personalities. So they've got some online personas, they run Abraham's Ax and Moses Staff, and they kind of do things, influence operations, online. And then they've got a hacking team, well, in fact they've got two hacking teams. And in fact the hacking team is the smallest part of the organization. Now, the whole organization's only 60 people. Now some of it feels, to me at least, uniquely Iranian. So the hacking team is entirely male, the infrastructure team is also entirely male. And in fact it's called the brothers team and it does infrastructure and systems development. And then you've got a sisters team that is entirely female and it does the translation, the open source research and it manages those personas. So it's interesting just to see how different organizations are structured and that they're like that.
A
Yeah. If I ever get a job in an Iranian cyber espionage organization, I don't know, what are we doing?
B
You've been typecast already. And then the report lists some of the targets and they seem like, yeah, these are reasonable targets. So they've got regional telcos, police departments, airlines, and then they've also got government and military targets in the UAE, Jordan. Now, curiously, it doesn't mention specific Israeli targets, even though Israel is the place that Iran probably hates the most.
A
Yeah. Right.
B
So all this intelligence is fed into Kashef, which is a database, like, roughly translated, it's revealer or discoverer. And so it's got travel records, identity records, everything. And they just whack it in there and they use it to surveil like both Iranian and foreign nationals. And so that also. That seems pretty normal.
A
And that detail there kind of reminded you a little bit of how China sort of works a bit as well, doesn't it?
B
Yeah, yeah. So the Chinese state has stolen a whole lot of, particularly US but worldwide, similar sorts of records. So insurance records, medical records, government classification or personnel security records, Marriott, like hotel stays, I think it was United Airlines, one of the airlines. And it's interesting to see a much smaller operation, Iran, doing the same kind of theft and also whacking them into a database. So everyone presumes that that's what the Chinese are doing with that stolen data. So we have, well, another state is doing exactly that.
A
And then there are a few strange things that you. That you saw them doing.
B
Yeah, so what was weird is, one of the documents was Department 40's master operations document, which is like a list of all the major projects they've got. And some of them were just what I'd call standard cyber espionage things. So there's a particular operation targeted against the former head of Saudi Arabia's general intelligence department. So, okay, yeah, high priority target. That makes sense. There's another one targeted against Turkish medical centers. And that makes sense because apparently a lot of Israelis traveled to Turkey to get medical treatment and the plan was to use intelligence to abduct some of those people. So that's like a sensational operation, but yeah, you would use intelligence to assist, like, that seems normal. And then there's just like, well, okay, we'll also create three different types of destructive drones. Like we'll have a glider one which is for covert operations overseas, and then we'll have a jet powered one and then we'll have a suicide quadcopter. And this to me just seemed totally bizarre, from the perspective of someone in a Western intelligence agency, that you would just go, okay, yeah, so it's, you know, hacking, online media personas and suicide drones. It's like those go together.
A
I've never put a lot of thought into the construction of suicide drones. So I don't know, maybe it's simple. Maybe. Yeah, yeah. I initially thought fake Personas.
B
Yeah, I initially thought this was very weird. You've got an organization that's set up, like, they've got their org structure and it's, you know, you've got the brothers team, infrastructure team, the hacking team, you've got the OSINT sisters team. Like this is cyber espionage and maintaining databases. There's no hardware team, there's no, you know, workshop that you would need to do those other things, like to build drones. But like, you know, perhaps I'm just thinking too much in the Western mode where, you know, you get Boeing or you get Raytheon to do this kind of project for $2 billion. Maybe it is just as simple as going on to AliExpress and buying a drone and then whacking a bomb on top of it and off you go. And I mean like the whole effort felt a bit like a family business.
A
Yeah.
B
So the guy who heads it, he was recruited when he was relatively young. It's now 60 people. His wife is head of the sisters team and she's, they've got a number of front companies. He's the head of one front company. She's the head of a different front company. So you kind of wonder about governance and nepotism and like, to be honest, leaving aside the purpose, like, you know, the drones will be used for some nasty stuff, actually. Like if you're a young person, particularly a young bloke, I think grabbing a drone and whacking an explosive on top of it and then flying it around and blowing it up, that actually sounds like a lot of fun. So it seems like maybe they're there for a good time rather than for a well managed time.
A
Yeah. Right. And just finally, Tom, because we are running out of time, but obviously all this information being published is not going to be ideal for Department 40 or Charming Kitten. But do you see this being enough to kind of be the end of them or, you know, are they going to bounce back?
B
I think it's disruption, but they'll bounce back. There've been other quite large, like politically significant outings, particularly of Chinese hacking groups. And the thing is, when a state wants to do cyber espionage, it's going to keep doing cyber espionage. Maybe the, the exact people involved will change, maybe the head will disappear, maybe it'll be restructured, but I don't think the actual function disappears. Like, that's an enduring requirement to collect intelligence for Iran. So they're going to find a way to do that. It sort of depends on the personnel, the people's relationships with the Islamic Revolutionary Guard Corps. Are they solid? Are they going to just go, okay, well, we need to do some stuff, we need to move locations or whatever. Yeah. But I think it's a disruption, not an end.
A
On that note, Tom, we will actually leave it there, but it's been fascinating as always. And I'm actually not here next week, so I won't see you for a little while. But I believe Patrick Gray is going to take his old spot in front of the microphone and chat to you next week. So, Tom, thank you so much. As always. It's been a pleasure.
B
Thanks, Amberly.
This episode of "Seriously Risky Biz" digs into two key cybersecurity stories featured in Tom Uren's weekly newsletter:
[00:04–10:54]
"The theory is that the model's been trained to think of the things that the Communist Party dislikes... as bad. And so whenever you associate something with those bad things, you get bad outcomes." [01:58]
"I had this feeling that the report was cherry picking things that would make it more sensational..." [05:14]
"...but I still think they've come across something real and so reason for concern." [05:42]
"I would be surprised if they're not... I think you pick an emotive topic and associate it with a banal coding task, and I think that will change results." [08:05]
"If you asked it anything about Elon Musk, it would just say outrageous stuff. You know, Elon Musk is fitter than LeBron James... He's better at resurrection than Jesus Christ." [09:08]
"...what the right results are depends on who you are and what you want. And everyone has a political viewpoint." [10:03]
"So the key takeaway is if you are using an LLM to produce code, check it." [10:33]
"They're an aid, not a replacement. So I mean, the error rate still is high enough that I don't trust them." [10:44]
[10:54–20:47]
"It's interesting just to see how different organizations are structured, and that they're like that." [13:37]
"It's interesting to see a much smaller operation, Iran, doing the same kind of theft and also whacking them into a database." [15:17]
"Some of them were just what I'd call standard cyber espionage things... And then there's just like, well, okay, we'll also create three different types of destructive drones." [16:15]
"...if you're a young person, particularly a young bloke, I think grabbing a drone and whacking an explosive on top of it and then flying it around and blowing it up, that actually sounds like a lot of fun." [18:13]
"...when a state wants to do cyber espionage, it's going to keep doing cyber espionage. Maybe the exact people involved will change, maybe the head will disappear, maybe it'll be restructured, but I don't think the actual function disappears..." [19:44]
"I would be surprised if they're not [also affected] ... I think you pick an emotive topic and associate it with a banal coding task, and I think that will change results." —Tom Uren [08:05]
"He's a better piss drinker than anyone else ... It's all very funny" —Tom Uren [09:13]
"It felt a bit like a family business." —Tom Uren [18:23]