Summary

Cybersecurity Today: DeepSeek Jailbreak Yields System Prompt and Open AI

Host: Jim Love
Release Date: February 3, 2025
Episode Title: DeepSeek Jailbreak Yields System Prompt and Open AI

Overview

In this episode of Cybersecurity Today, host Jim Love delves into significant cybersecurity issues impacting both Canada and the United States. The discussion spans a staggering report on fraud losses in Canada, a groundbreaking security breach involving the DeepSeek AI model, and an alarming rise in phishing scams targeting U.S. toll road users. The episode culminates with an exclusive interview with Ivan Novikov, CEO of Wallarm, a leading cybersecurity firm specializing in API security.

Canada's Massive Fraud Losses in 2024

Jim Love opens the episode by highlighting concerning statistics from the Canadian Anti-Fraud Centre (CAFC). In 2024, Canadians reported losing $638 million to fraud, with $310 million attributed solely to investment fraud. Identity fraud emerged as the most frequently reported scam, with 9,487 cases. However, the CAFC suggests that these figures likely underestimate the true scope, estimating that only 5 to 10% of fraud victims report their losses, potentially pushing the actual total into the billions.
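As a rough worked example of why a 5 to 10% reporting rate implies a total in the billions, the sketch below scales the reported figure by that rate. It assumes, purely for illustration, that victims who do not report lose about as much on average as those who do, which the CAFC does not claim; `estimate_true_losses` is a hypothetical helper name.

```python
REPORTED_LOSSES_CAD = 638_000_000  # reported 2024 total cited above


def estimate_true_losses(reported: float, reporting_rate: float) -> float:
    """Scale reported losses up by the share of victims who report.

    Assumption (not from the CAFC): unreported victims lose about as much,
    on average, as victims who report.
    """
    if not 0 < reporting_rate <= 1:
        raise ValueError("reporting_rate must be in (0, 1]")
    return reported / reporting_rate


low = estimate_true_losses(REPORTED_LOSSES_CAD, 0.10)   # if 10% of victims report
high = estimate_true_losses(REPORTED_LOSSES_CAD, 0.05)  # if 5% of victims report
print(f"Implied range: ${low / 1e9:.2f}B to ${high / 1e9:.2f}B")
```

Under that assumption the implied range is roughly $6.4 billion to $12.8 billion, consistent with "into the billions".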

Key Statistics:

Total Reported Fraud Losses: $638 million
Investment Fraud: $310 million
Identity Fraud Cases: 9,487
Service Fraud and Bank Investigator Scams: $16.4 million
Spear Phishing: $67.3 million
Romance Scams: $58 million

Jim emphasizes the CAFC's recommendations for Canadians to bolster their defenses against fraud by:

Using strong passwords
Enabling multi-factor authentication
Avoiding unsolicited financial offers

He warns about the surge in fraudulent investment ads masquerading as legitimate news stories, particularly impersonating the CBC, and proliferating across social media and search engines. Jim passionately urges listeners, stating, “Do something. So I got that off my chest.” [02:30]

AI Security: The DeepSeek Jailbreak

Introduction to DeepSeek and the Breach

The conversation shifts to a significant development in AI security. Researchers successfully jailbroke DeepSeek, an open-source AI model from China, exposing its hidden system instructions. This breach not only compromises DeepSeek but also raises broader concerns about AI safety across the industry.

Jim elaborates on the incident, mentioning, “The discovery raises some major security concerns, not just for DeepSeek, but for all AI safety.” [04:15]

Expert Insights: Interview with Ivan Novikov

Ivan Novikov, CEO of Wallarm, joins the conversation to provide an in-depth analysis of the breach and its implications.

The Jailbreak Technique

Ivan explains the method used to compromise DeepSeek: Wallarm's team forced the model to answer only yes/no or multiple-choice questions, then chained those answers in a binary-search style until it divulged its system prompt. “We build a technique called biased attack... very similar to like binary search tree,” he states. [17:46]

Implications for AI Security

The breach underscores the vulnerability of AI models, especially as the race to develop advanced AI accelerates. In the news segment, Jim asks, “The speed we're moving at AI, are we paying appropriate attention to security? The answer is probably no.” Ivan also addresses OpenAI's claim that DeepSeek was trained using OpenAI's models. After the jailbreak the model answered that it had been, but he cautions, “The model said yes, which doesn't mean that it was.” [20:53]

Responsible Disclosure

Following the breach, Wallarm responsibly disclosed the vulnerability to DeepSeek, which promptly patched the issue. Ivan praises DeepSeek's swift response, remarking, “They fix it in less than an hour or so. So that's a good velocity.” [22:42]

Future of API Security

Ivan emphasizes the urgent need for improved API security frameworks, increased awareness, and better management practices to mitigate similar vulnerabilities in the future. He attributes the gap to business pressure: “we need to deliver something very fast,” which leaves too little time to secure it properly. [12:05]

Rising Phishing Scams on U.S. Toll Road Users

Shifting focus to the United States, Jim discusses the wave of SMS phishing scams targeting toll road users. Brian Krebs of Krebs on Security reports that criminals are sending fake messages impersonating toll agencies like E-ZPass, directing victims to fraudulent payment sites.

Scam Characteristics:

Impersonation: Messages appear to be from legitimate toll agencies.
Fraudulent Payment Sites: Designed to steal payment details and bypass multi-factor authentication.
Geographical Spread: Alerts issued in states including Florida, Texas, California, and Connecticut.
Tactics: Use of mobile-only sites and leveraging advanced messaging services like iMessage and Rich Communication Services (RCS) to evade spam filters.

Jim underscores the sophistication of these scams: “Texts are a new attack vector. They are finding ways to get past screening.” The FBI urges users to report phishing attempts to the Internet Crime Complaint Center (IC3) and, in Jim's words, to “never click on unsolicited texts.”

Concluding Remarks

Jim wraps up the episode by reiterating the critical nature of the discussed cybersecurity threats and the importance of proactive measures to safeguard against them. He encourages listeners to stay informed and cautious, especially in an era where both traditional and emerging threats are evolving rapidly.

In the Afterword segment, the detailed interview with Ivan Novikov provides valuable insights into API security and the broader challenges facing the AI industry, reinforcing the episode's emphasis on the necessity for robust security practices in the digital age.

Notable Quotes

Jim Love [00:45]: “The true total could be in the billions.”
Jim Love [02:30]: “Do something. So I got that off my chest.”
Jim Love [04:15]: “The discovery raises some major security concerns, not just for DeepSeek, but for all AI safety.”
Jim Love: “The speed we're moving at AI, are we paying appropriate attention to security? The answer is probably no.”
Ivan Novikov [17:46]: “Very similar to like binary search tree, the algorithm that help you to identify...”
Ivan Novikov [20:53]: “The model said yes, which doesn't mean that it was.”
Ivan Novikov [22:42]: “They fix it in less than an hour or so. So that's a good velocity.”
Jim Love: “Texts are a new attack vector. They are finding ways to get past screening.”
Jim Love: “Never click on unsolicited texts.”

For more detailed insights and the full interview with Ivan Novikov, stay tuned to Cybersecurity Today. Stay safe and stay informed.


Transcript

[00:01]
Jim Love
Canadians lost a reported $638 million to fraud in 2024. Researchers jailbreak the DeepSeek API and expose the system prompt, and a new SMS phishing scam targets U.S. toll road users. This is Cybersecurity Today. I'm your host, Jim Love. Canadians reported losing more than $638 million to fraud last year, according to the Canadian Anti-Fraud Centre. Nearly half of that, almost $310 million, was lost due to investment fraud. Meanwhile, identity fraud was the most frequently reported scam, with 9,487 cases. But the report is clear that the real number could be far worse. The Canadian Anti-Fraud Centre estimates that only 5 to 10% of fraud victims report their losses, suggesting that the true total could be in the billions. Regardless, we have some information about the types of frauds that are occurring, and although we might not have a complete picture, we do have a better picture of what's happening. After investment fraud, the most common scams were service fraud and bank investigator scams, which impersonate financial officials and resulted in $16.4 million in reported losses. Spear phishing, where attackers use targeted email fraud, cost victims a reported $67.3 million, while romance scams led to $58 million. In addition to reporting this data, the CAFC also has some useful advice on their site. For people who have been scammed or defrauded, it's worth looking at. They do advise Canadians to use strong passwords, enable multi-factor authentication and avoid unsolicited financial offers. On this last point, fraudulent investment ads disguised as news stories are a growing problem, and some of these look pretty good. In Canada, they impersonate the CBC, our national broadcaster, and do stories that try to hook people in. Now these are appearing on social media and search engines. Are you listening, Facebook and Microsoft Edge? On your news page, you are replete with fraudulent ads that are going after innocent people. Do something. So I got that off my chest.
Authorities are urging Canadians to report scams to law enforcement and the Canadian Anti-Fraud Centre. And if there's an American equivalent to this or a program that I haven't heard about, please let me know at editorial@technewsday.ca. Glad to report that as well. Researchers have successfully jailbroken DeepSeek, an open source AI model from China that made the news last week. They've exposed its hidden system instructions and a lot more. The discovery raises some major security concerns, not just for DeepSeek, but for all AI safety. Wallarm, a cybersecurity firm, found a way to trick DeepSeek into revealing its internal rules and constraints. CEO Ivan Novikov explained, “We convinced the model to respond in certain ways, breaking its internal controls.” Now, the jailbreak suggests DeepSeek's safeguards are weaker than expected, raising some concerns about this and other open source models. But in reality, the concern really is with the speed we're moving at AI: are we paying appropriate attention to security? The answer is probably no. The compromised AI may have, and I stress may have, even supported some of the claims that OpenAI was making about DeepSeek using its model to train DeepSeek. Though no proof of intellectual property theft was found, the speed of DeepSeek's development has raised questions, and this breach adds to that. Now, DeepSeek developers have since patched the issue and Wallarm has withheld the technical details to prevent further abuse. But the incident highlights a broader issue: how easily can AI models be manipulated? And as new challengers enter the market, and as everyone's trying to win that AI race and get there first, we may find more examples of where speed trumps security. We have an exclusive interview with Ivan Novikov, which will air after the show. Just stay on after the credits for the feature we call Afterword.
And Brian Krebs of Krebs on Security has done an excellent piece on the wave of phishing scams hitting toll road users across the US with fake messages demanding payment for unpaid tolls. Researchers are linking the attacks to China-based phishing kits that are adapted to impersonate toll operators with alarming accuracy. Victims receive texts pretending to be from E-ZPass, SunPass or state toll agencies, directing them to fraudulent payment sites. The Massachusetts Department of Transportation recently warned about phishing attacks targeting its EZDriveMA program. Victims are tricked into entering payment details and one-time passwords, allowing criminals to bypass even two-factor authentication. The scam has been spotted in Florida, Texas, California, Connecticut and other states, and it appears to be tied to Lighthouse, a China-based SMS phishing service that now includes fake toll payment pages among its products. These sites are mobile only, making them harder to detect as scams. In fact, security experts are warning that phishing attacks are evolving. Criminals are now using iMessage and Rich Communication Services (RCS) to bypass spam filters, making these messages look even more legitimate. The FBI urges users to report phishing attempts to the Internet Crime Complaint Center (IC3) and never, never click on unsolicited texts. But the bottom line: texts are a new attack vector. They are finding ways to get past screening, and we have to train ourselves and our users to be very, very skeptical and very cautious when they respond to a text, especially an unsolicited one. That's our show for today. Stay tuned for Afterword and hear our interview with Ivan Novikov. I'm your host, Jim Love. Thanks for listening. And now welcome to Afterword. My guest today is Ivan Novikov, CEO of Wallarm, a security company that specializes in API security.
They've recently done a major study on API security and found some major vulnerabilities, particularly in DeepSeek, which allowed them to download the entire system prompt and more. I hadn't heard of Wallarm before, and maybe that's my failing, but can you tell me a little bit about the company? Because you've hit me twice in a week now. I got a great study from you on APIs, really liked it, very detailed, very great. And then this press release today. So tell me a little bit about the company.
[07:10]
Ivan Novikov
Okay, Wallarm. As an API security company, we actually came out of stealth back in 2016 with Y Combinator in Silicon Valley. Since that time we mainly focused on enterprise companies, delivering them an AI and API protection tool called Wallarm. And since that time we got, like, significant traction: more than 100 large enterprise customers all over the world. We still have an HQ in San Francisco since that time.
[07:41]
Jim Love
Can we talk about the study, just because I've got you on the recording here? You did a study, and the one thing that jumped out at me was it said that there'd been an increase in API-led incidents, or I guess incidents where APIs were the key attack vector, of 1,025%.
[08:01]
Ivan Novikov
Yeah, the thousand percent we mentioned there. This is specifically related to AI CVEs, or in other words vulnerabilities published in 2024 compared to 2023. So basically we analyzed all the CVEs, common vulnerability numbers and bulletins, and we found only 39 in 2023 compared to 439 in 2024. That's basically 11 times more. And this is all CVEs related to any AI products, frameworks or LLMs, directly or indirectly. Right. Everything that we can attribute to AI.
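The two figures quoted in this exchange ("1,025%" and "11 times more") come from the same pair of counts. A quick arithmetic check, using only the 39 and 439 stated in the interview, shows how they relate; the variable names are illustrative.

```python
cves_2023 = 39   # AI-related CVEs counted for 2023, per the interview
cves_2024 = 439  # AI-related CVEs counted for 2024, per the interview

# "Times as many" compares the totals directly.
ratio = cves_2024 / cves_2023

# "Percent increase" compares the growth to the starting value.
percent_increase = (cves_2024 - cves_2023) / cves_2023 * 100

print(f"{ratio:.2f}x as many, a {percent_increase:.1f}% increase")
```

The ratio is about 11.26 and the increase about 1,025.6%, so "roughly 11 times" and "1,025%" describe the same change in two different ways.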
[08:44]
Jim Love
And do you tie that into the growth in AI, particularly that there's that much vulnerability?
[08:52]
Ivan Novikov
Sure, because again, we got more and more products, specifically open source products, that were built and released to deliver AI in real environments. In other words, if you want to use AI, it's not as easy as just using some tool. In many cases it's just like an API proxy, such as you call the OpenAI API and that's it. But then you need to manage data, manage pipelines, collect the data, somehow orchestrate this. And that's why you start to use some other tools to support this, what's so-called pipelines or workflows. And if you want to use your local LLM instead of calling someone else via API, then you do even more. Right? And this rise of tools definitely pushed the rise of vulnerabilities.
[09:37]
Jim Love
Yeah, maybe I'm asking the question incorrectly. I was doing a recording of our weekend show and I said mea culpa. When we were doing APIs when I was in development, we were trying to make them work. We weren't as concerned with security. I will confess to that, and I think everybody else will. But we should have learned over the years how big an attack vector APIs are. And you obviously got into this business because you think that they need to be protected. What is it that keeps us from making these more secure?
[10:09]
Ivan Novikov
Look, I definitely can point to a few factors that contribute to that. First of all, APIs, right? Not something new. Right. And a couple of years ago when we just started to run the ThreatStats report, by the way, this is our third year, so we ran 10 reports in this way, and then we do it quarterly. So basically it's two years plus two reports, something like that. So what we found is, when we just released the first report, we tried to get the historical overview and we found that the first API exploit was detected back in 1998. So basically 25 years of history at that time, roughly. So then we started to dig into it and tried to find out why APIs are becoming more and more actively widespread. So definitely the main driver here is overall adoption, right? People want to run more services and connect them with each other. Before, probably like 10, 15 years ago, if you can recall, it was called enterprise service bus or ESB, when SAP PI, that kind of technologies, were in place. So it was like non-gated hubs, right? Then it turned over to API gateways, when everything was gated. And now, basically a couple of years ago, when we finally realized that API is the key, the most important thing for enterprise security, it became too unmanaged. So basically everyone can run an API and make it available. Pretty much everything, inside, outside, partners, really depends on the type of the business, but not really managed by a gateway. So the majority of APIs became unmanaged, and if you look at the Gartner report, I guess, if I'm correct, like 80 or 90% of enterprise APIs become unmanaged very soon, in 2026 or 2027. That's exactly the key, right? More things, less management, more security issues.
[12:03]
Jim Love
So what can we do about it?
[12:05]
Ivan Novikov
Look, the fair answer is we have to overall improve our frameworks and overall development and then deployment techniques. It's well described in Microsoft SDLC-like guidelines, pipeline, and everything that happened afterwards is all about this, right? Ultimately the problem is we need, as a business, right, we need to deliver something very fast and we don't have enough of time to secure it properly. That's why we added firewalls, some kind of, like, external controls, IPS, IDS, all that kind of things, to try to at least block something that obviously can happen. And then we definitely have to do that. So to address this problem we have to increase awareness. And I guess AI here plays a good role, because now all the developers can just ask AI if the piece of the code looks secure and get instant knowledge and feedback about this particular code, rather than ask a security guy, and the security guy runs some scanners and tests. And so it's like a straightforward connect between who's building this code and basically all the security insights collected all over the world. Even if it's not perfect, that's much better than nothing, right? And the other thing is overall improving our frameworks, development frameworks, API and application servers, all that stuff, because majority of them, well, secure, right? I understand that now I look very old school mentioning WebSphere, another IBM product, but they were built for good, and there are a lot of security controls in WebSphere that are still not available across all the newest management platforms and API and application servers. So improving frameworks, basically reducing attack surface while we're developing it, definitely secures a lot. And the other basically short component, right? If you just try to build this stable system, right?
The third component is, like, overall knowledge and awareness, then making frameworks and reducing attack surface there, with built-in controls in other words, or patterning. And the third part is overall assessment and management. So basically even if an API is not managed, it's still important to at least know that the API exists. And if I recall 10-year-old projects, when it was just a few APIs or applications, every single service or API or application that was released was well documented, with an owner, with a document called a passport, right? With pretty much everything. Now, because development speed should be increased, right, we don't have these passports anymore, but we still need to make a list of them and understand who is responsible for this business function. Because API is very tightly connected to business functions. Right. They serve essentially transactions, API calls. Right. Or a bunch of them together are one business function. So we have to appoint this. That's what I think we should do. What's already happening, with different quality in different places.
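The "passport" idea described here, a record of each API with its owner and business function, can be sketched as a small inventory check. This is only an illustration of the concept: the `ApiRecord` fields, the `needs_attention` helper, and the sample endpoints are all hypothetical and do not come from Wallarm or any particular tool.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ApiRecord:
    """One inventory entry; every field name here is illustrative."""
    endpoint: str
    owner: Optional[str]              # team or person accountable, if known
    business_function: Optional[str]  # what the API does for the business
    behind_gateway: bool              # True if traffic passes a managed gateway


def needs_attention(inventory: List[ApiRecord]) -> List[ApiRecord]:
    """Return entries with no owner, no business function, or no gateway."""
    return [
        api for api in inventory
        if api.owner is None
        or api.business_function is None
        or not api.behind_gateway
    ]


# Made-up sample data for the sketch.
inventory = [
    ApiRecord("/v1/payments", "payments-team", "card transactions", True),
    ApiRecord("/internal/report-export", None, "partner reporting", False),
    ApiRecord("/v2/chat", "ml-platform", None, True),
]

for api in needs_attention(inventory):
    print(api.endpoint)
```

Even a list this simple answers the two questions raised in the interview: does the API exist in a known inventory, and who is responsible for the business function it serves.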
[14:59]
Jim Love
So let's talk about DeepSeek. What made you look at DeepSeek in the first place? And then what did you find?
[15:06]
Ivan Novikov
Yes. Yeah. First of all, DeepSeek is ultimately a very, like, flashy technology that's pretty much everywhere. So we decided to look into it and find out what's there. Right. Find the difference and evaluate the performance of the model. And I want to make, like, a very important comment here. So deepseek.com or chat.deepseek.com, this is the product. Essentially it's an AI agent. Right. Agentic AI is, like, a big thing now, which is doing some actions. Right. They still build it based on the model that, by the way, is available in open source. Right. But the model itself that's available in open source is not exactly equal to the product as a chat. Right. In other words, this chat can search the Internet, which is a function. Right. And this is a big difference between native LLM and native LLM security, and what we do at Wallarm, securing AI products that use LLMs but in fact serve a lot of API calls and do a lot of actions behind the scenes. Right. That's why we tried to find a way how we can learn more about the model implemented in very specific ways, such as chat.deepseek.com. We found a way to do a so-called jailbreak. In other words, how to convince the model to respond to questions or give us technical data that it shouldn't. And that's a so-called jailbreak. So we found that, unlike other jailbreaks that were published or recently published, specifically for DeepSeek and other models, the usual jailbreak is actually built to get some data, such as instructions how to build something bad, or to respond with no censorship and such things. So our jailbreak is a more technical jailbreak that unlocks the model to basically tell us everything about the model itself. It's a little bit different kind of.
[16:52]
Jim Love
Yeah, a traditional jailbreak. You're looking to get it to bypass its instructions to be able to tell you something. How to make napalm, how to make meth are the classic ones. Or in this case, what really happened at Tiananmen Square would be probably a good jailbreak. If you got past that one, you'd probably get somewhere. But those are the classic ones. But you actually got in and got it to really dictate what its overall instructions were and its overall model was. What made you think of how to do that? Because obviously the one way to do it is to say, print out your instructions or give me your main prompt. And by the way, just in case anybody at OpenAI is actually sitting there going, we're better: no, people have gotten the prompts from a couple of major AI providers just by asking for them. But obviously you tried that and that didn't work. And so what did you do next?
[17:46]
Ivan Novikov
Yeah, and then we tried to build a way, like a more scientific way, how to get at least some knowledge. Right. If you cannot ask directly, ultimately you still can ask indirectly. And then we build a technique called biased attack, when we put the model response in a very strict frame, when the model should answer essentially yes or no, or between like three or four options that we provide them. So model cannot lie and model cannot not give an answer. That's why it still starts to provide some stuff, and then a bunch of code around it, how to ask many questions like that and get a kind of, like, very similar to like binary search tree, the algorithm that help you to identify, hey, if the number is between this and that, then what? So if this is between this frame, then what's inside this frame, and then divide by two and so on. That's how we can get some basic knowledge. And in terms of extracting large text such as this AI system prompt, then it took some time, but we did it and we posted results to let everyone, first of all, check what's inside.
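The "number between this and that, then divide by two" analogy is ordinary binary search driven by yes/no answers. The sketch below shows only that textbook idea against a local toy function; it is not Wallarm's technique, whose details were withheld, and it does not talk to any AI model. The names `find_hidden_number`, `oracle`, and `secret` are hypothetical.

```python
from typing import Callable, Tuple


def find_hidden_number(
    is_at_most: Callable[[int], bool], low: int, high: int
) -> Tuple[int, int]:
    """Recover a hidden integer in [low, high] using only yes/no answers.

    is_at_most(m) answers the question "is the hidden number <= m?".
    Returns (number, questions_asked).
    """
    questions = 0
    while low < high:
        mid = (low + high) // 2
        questions += 1
        if is_at_most(mid):
            high = mid       # answer lies in the lower half, mid included
        else:
            low = mid + 1    # answer lies in the upper half
    return low, questions


secret = 713  # toy value known only to the oracle below


def oracle(m: int) -> bool:
    """Stand-in for anything that can only answer yes or no."""
    return secret <= m


number, asked = find_hidden_number(oracle, 0, 1023)
print(number, asked)  # prints "713 10"
```

Each answer halves the remaining range, so 1,024 possibilities need only 10 questions. That logarithmic cost is why narrowly framed yes/no questions can add up to a lot of recovered information.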
[18:45]
Jim Love
And you're a lot smarter than me. I need you to slow down. You went after it. Is this like password stuffing? You gave it a whole pile of commands to try and figure out which one would break it.
[18:57]
Ivan Novikov
Not exactly this. So first of all. So we will wait a little bit until full disclosure for this technique because as a model it's also vulnerable and we have to get some other models to get fixed. But essentially it's binary search. So you have to find a way how to directly ask yes and no or between few options. And then based on that options you can build your next question. And then at smaller chunks you get into an answer.
[19:24]
Jim Love
Wow. And you managed to extract basically its prompt. What did you manage to extract?
[19:30]
Ivan Novikov
That's a basic system prompt. So once you found the way how to directly communicate with the model in the way you want, you can extract pretty much everything that you want. And that's actually what we call a jailbreak. And we extracted the baseline of instructions. So basically when you put some prompt in a chat, this prompt or your query actually adds to a bunch of others, including policies and how to answer questions, and so on, guidelines that are provided by developers. That's why the chat itself is an AI product that's built on top of the LLM. So if you just download this open source LLM, you have to define your own system prompt. Right. You will not find what was defined by default. And this prompt identifies the behavior of the model, policies, what it can or cannot respond. So we extracted that and also asked the model some kind of, like, technical questions about how the model was trained, was, like, the OpenAI API used to distill data. And we got some answers that we decided to also include in the blog post, which are definitely not the kind of guarantees that it was used, because ultimately we didn't know how model rigid data.
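The point that a user's query "adds to a bunch of others" written by the developers can be pictured as a list of messages in which the system prompt comes first. The sketch below is a generic illustration of that layering; the `Message` type, the `build_conversation` helper, and the prompt text are hypothetical and do not reproduce DeepSeek's actual prompt or any vendor's API.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Message:
    role: str     # "system" for developer instructions, "user" for the query
    content: str


# Hypothetical developer-written instructions; a real product's prompt is
# not reproduced here.
SYSTEM_PROMPT = "You are a helpful assistant. Follow the product's content policy."


def build_conversation(
    user_query: str, system_prompt: str = SYSTEM_PROMPT
) -> List[Message]:
    """Place the developer's instructions ahead of whatever the user typed."""
    return [
        Message(role="system", content=system_prompt),
        Message(role="user", content=user_query),
    ]


conversation = build_conversation("What is an API gateway?")
for message in conversation:
    print(f"{message.role}: {message.content}")
```

This also illustrates why downloading the open source model alone does not reveal the product's behavior: the system prompt lives in the product layer, and whoever deploys the model supplies their own.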
[20:43]
Jim Love
The model was pretty sure that it was using OpenAI. You might not have found that. I've heard of people who've actually gotten direct responses from it saying, I was trained on OpenAI.
[20:53]
Ivan Novikov
Yeah, that's the same with it, right. We have direct response was it trained using OpenAI after jailbreak? And the model said yes, which doesn't mean that it was. Right. And I can imagine that the model could be guided to answer like that just to get some PR around that and let everyone compare models and increase valuation. We didn't know, but that was the answer and that's what we got.
[21:15]
Jim Love
So you contacted DeepSeek. Now this is the second hack. They had one on their database a couple of days ago. They responded quite quickly, from the sound of it.
[21:25]
Ivan Novikov
So first of all, I don't think that it's the second one. If you just look at X, or Twitter, you will find more than a bunch, dozens of them.
[21:33]
Jim Love
Two that I know of.
[21:34]
Ivan Novikov
I notified them yet.
[21:36]
Jim Love
Maybe I'll ask DeepSeek how many times they've been hacked.
[21:39]
Ivan Novikov
They don't know. They don't know yet. But you can run jailbreak and then people had to. Yeah. So overall, yeah, it's usual practice called full disclosure or responsible disclosure. Right. First we notified them, and once we realized that it's actually fixed, so we cannot reproduce this attack anymore, the jailbreak doesn't work, we decided to publish this. However, because the same jailbreak, we know for sure, works for other models, we decided not to disclose some details about that.
[22:09]
Jim Love
Thank you. Yeah. Although I tried to jailbreak you on this interview and didn't succeed.
[22:16]
Ivan Novikov
Not yet. At least not yet.
[22:17]
Jim Love
Yeah, I'll keep working on it. This has been terrific. Thank you so much. I really think people do developers a service when they point out these problems. The great thing, I think, about DeepSeek at least is that they admitted it. I've seen far too many companies now that, when they get notified of a breach or an attack vector, seem to deny it or go, that's not a big deal. So they seem to be at least responding well.
[22:42]
Ivan Novikov
Look, at least they fixed it. So the communication flow and so on, I'm not the guy who will guide them how to respond, but they fix it, and as for me it means that it's a high-tech, engineering-driven company, and they fix it in less than an hour or so. So that's a good velocity, right? The velocity that, as a security researcher, I really appreciate. They really care about their product, and not that many companies in the world can do that. It's a very young and active company, a lot of energy. So I like it. However, all the other things, like, to decide, we will see during times how the company will grow, how they will respond to other issues and security issues. And now they're the hype and we know for sure that it's worth it. It's a lot of good tech implemented there and they did a good job anyways.
[23:27]
Jim Love
Yeah, and as I said, this is a side project. This is something they did in their spare time. We'll take over the world of AI in our spare time. I think you got to give them a little bit of credit for that. Although the one thing I did learn, and I don't know if you've discovered this in the work you've done, is that because something's a side project or it's a proof of concept or, God forbid, a test project, we tend to not pay enough attention to security, forgetting that our test projects are often attached to other systems or are at least attack vectors in themselves. So I think it's a good lesson for us all to say, even if you're doing this as a proof of concept, you have to pay attention to the security on it.
[24:10]
Ivan Novikov
Yeah, and I agree with you. And here, as for me at least, is the kind of, like, borderline, right? If you're doing something in open source, right, and it's just available, then you feel free to do whatever you want. People take their own risk because they read your guidelines. But if you release the product, even a free product, then you take some responsibility for your users. That's how it comes into play. And I guess users understand that it essentially runs on Chinese servers and the Chinese company has access to all the data. They can read this agreement, and so that's the thing. However, there is a difference between a product, a real product such as a chat or API that uses an LLM, and the LLM itself. In terms of engineering the LLM, they did an amazing job. Whatever they used, this is for good and it's good for all of us as a community. Now we have probably the best model, the fastest one, the most performant one, and, like, why not use the model? However, the product, you know, the website and this app and so on, that's still, as for me, in deep beta. So that should be improved significantly.
[25:13]
Jim Love
But and as I've said they have to take some responsibility. But for those of the people who are running corporate security or who have employees who might be on there, if you've got an employee who's on a two day old AI and they're putting your corporate information on there in a server in China, it's time to take their PC away.
[25:36]
Ivan Novikov
By the way, the majority of these PCs are built and delivered from China.
[25:40]
Jim Love
Yes. You can't. Yeah. What do you do? Thank you so much. My guest has been Ivan Novikov. He's the CEO of Wallarm, a company that deals with API security. They've got a great report out. We did a story on them. You can find a link in our show notes. Thank you very much. And that's Afterword. If you stayed to the end, I'd love to know what you think. You can reach me at editorial@technewsday.ca, or if you're watching this on YouTube, just go underneath and put the comment in there. I'm your host, Jim Love. Thanks for listening.