Summary

Practical AI – Deep-dive into DeepSeek

Episode Date: Jan 31, 2025
Hosts: Daniel Whitenack (CEO at PredictionGuard) & Chris Benson (Principal AI Research Engineer at Lockheed Martin)

Episode Overview

In this “fully connected” Practical AI episode, Daniel and Chris take a deep dive into DeepSeek, the headline-grabbing new Chinese large language model (LLM). They break down what makes DeepSeek significant: its high performance, low reported training cost, open release on Hugging Face, security and privacy debates, and the model’s likely impact on the global AI landscape. The episode aims to cut through the hype, clarify confusion, and help listeners understand both the technical and broader implications of DeepSeek’s arrival.

Key Discussion Points & Insights

1. What is DeepSeek, and Why Is It Causing a Stir?

[03:28 - 07:54]

DeepSeek is a generative LLM from a Chinese startup, notably achieving performance comparable to "frontier" models like OpenAI's GPT-4/o1.
The model shocked the AI community due to its claimed low final training cost—about $5–6 million, a fraction of what Western companies reportedly spend.
Debate is raging: Is DeepSeek’s accomplishment as big a deal as the buzz suggests? What does this mean for the cost and accessibility of leading-edge AI?

“It appears to have been achieved at a much, much lower cost than all of the competing models from anywhere in the world up to this point.”—Chris Benson [04:03]

The mainstream narrative sometimes exaggerates DeepSeek as a “bedroom team” effort, but in reality, DeepSeek is a well-resourced organization with access to tens of thousands of GPUs.
Their open release on Hugging Face contrasts with OpenAI and Meta, whose models and training data are less open to independent scrutiny.

2. What About Security, Privacy, and Geopolitics?

[17:56 - 31:43]

There are two main ways to access DeepSeek:
1. Hosted product (app/web interface): Data goes to DeepSeek, governed by their (explicit) terms, stored on servers in China. Privacy implications here are similar to using ChatGPT or Gemini, but with new geo-political ramifications.
2. Downloadable model artifacts (via Hugging Face): The open model and safetensors weights can be run locally, even in a 100% air-gapped environment, which removes “phoning home” and privacy fears.
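
The second access pattern can be sketched roughly as follows. This is a minimal illustration, not the hosts' setup: it assumes the `transformers` library is installed and that the model files were already copied to a local directory (the path `LOCAL_MODEL_DIR` is hypothetical). `HF_HUB_OFFLINE` and `TRANSFORMERS_OFFLINE` are environment variables the Hugging Face libraries read to skip network calls, and `local_files_only=True` makes loading fail rather than fetch anything.

```python
import os

# Ask the Hugging Face libraries not to make network calls.
# These must be set before transformers is imported.
os.environ["HF_HUB_OFFLINE"] = "1"
os.environ["TRANSFORMERS_OFFLINE"] = "1"

# Hypothetical local directory holding the config, tokenizer and safetensors files.
LOCAL_MODEL_DIR = "/models/deepseek-r1-distill"


def load_local_model(model_dir: str = LOCAL_MODEL_DIR):
    """Load tokenizer and weights strictly from disk (assumes files are present)."""
    # Imported lazily so the offline flags above are already in place.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_dir, local_files_only=True)
    model = AutoModelForCausalLM.from_pretrained(model_dir, local_files_only=True)
    return tokenizer, model
```

The environment flags are a convenience only. The isolation described in the episode comes from cutting inbound and outbound networking at the machine or network level.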

“It is very clear from the terms and service that Deepseek has posted that they will gather all of your… they’re saving a lot of your personal data and information. They will use that for future model trainings. And that is housed on servers in China.”—Daniel Whitenack [21:09]

Security concerns shift depending on how you use DeepSeek. Local/offline use is secure in traditional senses (assuming best practices and safe model files), but there are still risks—
- Bias: The model may reflect its training data and alignment process, which can differ from Western models.
- Prompt injection vulnerability: Secure's studies show DeepSeek is likely more susceptible to prompt-based attacks at the application layer than other leading LLMs.
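Chris asks whether any reservations remain once the model runs with networking cut off, and Daniel says they do not:

“If you were running it on the transformers infrastructure and you did have disconnected inbound and outbound networking… would you have any reservations about running it…?”—Chris Benson [27:26]

“…I wouldn’t have any reservations about that.”—Daniel Whitenack [28:00]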
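
To illustrate why the "safe model files" point above matters, here is a small sketch of reading a safetensors header using only the standard library. The format stores an 8-byte little-endian header length, then a JSON header describing each tensor, then raw tensor bytes, so inspecting a file does not execute any code (unlike pickle-based checkpoints). The demo file and the helper names are made up for illustration.

```python
import json
import struct
import tempfile
from pathlib import Path


def read_safetensors_header(path: str) -> dict:
    """Return the JSON header of a .safetensors file without loading tensors."""
    with open(path, "rb") as f:
        # First 8 bytes: little-endian unsigned 64-bit header length.
        (header_len,) = struct.unpack("<Q", f.read(8))
        # Next header_len bytes: UTF-8 JSON describing each tensor.
        return json.loads(f.read(header_len).decode("utf-8"))


def write_demo_file(path: Path) -> None:
    """Write a tiny hand-built file in the same layout (demo only)."""
    header = {"w": {"dtype": "F32", "shape": [2], "data_offsets": [0, 8]}}
    header_bytes = json.dumps(header).encode("utf-8")
    payload = struct.pack("<2f", 1.0, 2.0)  # two float32 values = 8 bytes
    path.write_bytes(struct.pack("<Q", len(header_bytes)) + header_bytes + payload)


demo_path = Path(tempfile.mkdtemp()) / "demo.safetensors"
write_demo_file(demo_path)
header = read_safetensors_header(str(demo_path))
print(header["w"]["shape"])  # prints [2]
```

In practice the `safetensors` library does this parsing; the sketch only shows that the file is plain data plus a JSON description.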

3. Technical Deep Dive: Architecture, Training, and Variants

[33:04 - 43:34]

Model Architecture:
- DeepSeek R1 is a “mixture of experts” model, a transformer-based architecture engineered for efficiency.
- Engineering choices similar to Llama (Meta) but with unique implementation details (prompting slower adoption in upstream Hugging Face Transformers—this is changing rapidly).
- “Mixture of experts” allows only some blocks of the model to activate per inference, reducing compute cost.
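
The idea that only some blocks activate per inference can be sketched with generic top-k routing. This is a toy illustration in plain Python, not DeepSeek's actual router: the experts here just scale their input, and the expert count, dimensions and `top_k` value are arbitrary assumptions.

```python
import math
import random


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(scale):
    # Each toy "expert" just scales its input; real experts are feed-forward blocks.
    return lambda x: [scale * v for v in x]


def moe_forward(x, router_weights, experts, top_k=2):
    """Route one token vector through only the top_k highest-scoring experts."""
    # Router score per expert: dot product of the token with that expert's router row.
    scores = [sum(w * v for w, v in zip(row, x)) for row in router_weights]
    chosen = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:top_k]
    # Renormalise gate weights over the chosen experts only.
    gates = softmax([scores[i] for i in chosen])
    out = [0.0] * len(x)
    for gate, idx in zip(gates, chosen):
        y = experts[idx](x)  # only the chosen experts are evaluated
        out = [o + gate * v for o, v in zip(out, y)]
    return out, chosen


random.seed(0)
num_experts, dim = 8, 4
experts = [make_expert(i + 1) for i in range(num_experts)]
router_weights = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(num_experts)]
token = [0.5, -1.0, 0.25, 2.0]
output, active_experts = moe_forward(token, router_weights, experts, top_k=2)
print(f"active experts: {len(active_experts)} of {num_experts}")
```

With `top_k=2` out of 8 experts, six expert blocks are never evaluated for this token, which is where the compute saving comes from.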
Training Strategies:
- Three customary stages: unsupervised pretraining; supervised fine-tuning; reward/preference modeling.
- DeepSeek innovated in data generation:
  - Used intermediate “reasoning” models (e.g. DeepSeek-R1-Zero) to generate high-quality synthetic chain-of-thought data for supervised fine-tuning, reducing human effort.
  - Automated filtering to select top-quality generated data for final model training, increasing efficiency.
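
The generate-then-filter step could look something like the sketch below. The quality rules here (exact-match answer, non-empty and bounded reasoning) are hypothetical stand-ins chosen for illustration; the episode does not specify which filters DeepSeek actually applied.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    prompt: str
    reasoning: str
    answer: str


def passes_checks(sample: Sample, reference_answer: str, max_chars: int = 2000) -> bool:
    """Hypothetical quality rules: correct final answer, non-empty and bounded reasoning."""
    return (
        sample.answer.strip() == reference_answer.strip()
        and 0 < len(sample.reasoning) <= max_chars
    )


def filter_generated(samples, references):
    """Keep only generated samples that pass every check."""
    return [s for s in samples if passes_checks(s, references[s.prompt])]


references = {"What is 12 * 12?": "144"}
generated = [
    Sample("What is 12 * 12?", "12 * 12 = 12 * 10 + 12 * 2 = 120 + 24.", "144"),
    Sample("What is 12 * 12?", "12 * 12 is about 140.", "140"),
    Sample("What is 12 * 12?", "", "144"),
]
kept = filter_generated(generated, references)
print(len(kept))  # prints 1
```

Only the first sample survives: the second has a wrong answer and the third has no reasoning text, so neither would be added to the fine-tuning set.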

“They used this interim reasoning model to actually help generate…reasoning examples to add into the training data…This allows you to augment your fine tuning data, use less human resources…” —Daniel Whitenack [36:18]

Model Variants & Distillation:
- Besides the massive 700B-parameter flagship, DeepSeek released multiple distilled models (e.g. “DeepSeek R1 distill Llama 70B”), trained to imitate the base R1 model output but much smaller and cheaper to run, even on consumer hardware.
- Smaller models (e.g. 8B, 32B) are accessible for laptop-scale or enterprise-hosted deployments.
- The community is already porting DeepSeek distillations to GGUF and other efficient formats for MacBooks and non-GPU use.
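
A rough back-of-the-envelope calculation shows why the smaller variants fit on laptops. The sketch below counts weight storage only (parameters times bits per parameter) and ignores activations, KV cache and runtime overhead, so real memory use will be higher. The 16-bit and 4-bit precisions are illustrative assumptions.

```python
def weight_memory_gb(num_params_billion: float, bits_per_param: int) -> float:
    """Approximate memory for weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    total_bytes = num_params_billion * 1e9 * bits_per_param / 8
    return total_bytes / 1e9


for size in (8, 32, 70):
    fp16 = weight_memory_gb(size, 16)
    q4 = weight_memory_gb(size, 4)
    print(f"{size}B params: {fp16:.0f} GB at 16-bit, {q4:.0f} GB at 4-bit")
```

By this estimate an 8B model quantized to 4 bits needs about 4 GB for its weights, while a 70B model at 16 bits needs about 140 GB.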

4. The Broader Impact: What Does DeepSeek Mean for AI?

[43:34 - 49:33]

Business and Technical Implications:
- Model optionality: Enterprises should prepare for a world with dozens of competitive, accessible LLMs—not just one or two.
- No more model lock-in: Future-proof AI architectures require easy model swap-ability, monitoring, and robust security for multiple vendor offerings.
- Cost expectations are shifting: It's no longer credible to demand $100M+ model budgets when startups (with the right know-how) can reach parity at ~$5M.
- Startups may face a reckoning: Investors will question high valuations and operational costs. The era of "just add more GPUs" may be fading.
- Data curation: Training still demands significant investment in high-quality, well-aligned data (and human input), regardless of low compute cost.
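
One way to picture "easy model swap-ability" is a thin interface that application code depends on instead of any single vendor. The sketch below is an assumption-laden illustration: `ChatModel`, `EchoModel` and `ModelRegistry` are hypothetical names, and the echo backend stands in for a real local or hosted model.

```python
from typing import Protocol


class ChatModel(Protocol):
    """Anything with a complete() method can serve as a backend."""

    def complete(self, prompt: str) -> str: ...


class EchoModel:
    """Stand-in backend for the demo; a real one would call a local or hosted LLM."""

    def __init__(self, name: str) -> None:
        self.name = name

    def complete(self, prompt: str) -> str:
        return f"[{self.name}] {prompt}"


class ModelRegistry:
    """Maps a config name to a backend so callers never import a vendor directly."""

    def __init__(self) -> None:
        self._models: dict[str, ChatModel] = {}

    def register(self, name: str, model: ChatModel) -> None:
        self._models[name] = model

    def get(self, name: str) -> ChatModel:
        return self._models[name]


registry = ModelRegistry()
registry.register("model-a", EchoModel("model-a"))
registry.register("model-b", EchoModel("model-b"))

active_name = "model-b"  # swapping models is a one-line config change
reply = registry.get(active_name).complete("hello")
print(reply)  # prints [model-b] hello
```

Monitoring and security checks can then wrap the `complete()` call once, rather than being rebuilt for each vendor.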

“If you got $5 million sitting around, you could create a best-in-class model…These are going to proliferate very quickly…having kind of model lock in…That’s not going to work out great for you in the long run.”—Daniel Whitenack [44:35]

“Is this the moment…where investors [say]…‘Why do you need 100 million? Why don’t we give you 5 million and see what you can do with it?’”—Chris Benson [46:32]

Notable Quotes & Memorable Moments

Geopolitics & Censorship

On expected bias in answers from a Chinese LLM:

“I asked it what happened in Tiananmen Square in 1989? And it replied, I'm sorry, I cannot answer that question. I'm an AI assistant designed to provide helpful and harmless responses… just a reminder of the geopolitics of AI…” —Chris Benson [08:21]

On Open vs Closed Model Culture

Transparency and reproducibility:

“You see…model producers produce ‘technical papers’, but these technical papers don’t share details to where, in theory, you could reproduce this.”—Daniel Whitenack [12:43]

On Model Security Myths

Separating model vs product:

“When we say ‘model’, that's what we mean. We don't mean the product. And that can be run again with considerations in a secure environment.”—Daniel Whitenack [24:05]

On the Real Cost Story

Much more behind the scenes:

“There's just so much that's not…known about this. So they really cherry picked what they chose to publish about it.”—Chris Benson [07:54]

On Startups & Budgets

VCs may rethink what “enough” looks like:

“Maybe…investors [will be] looking at it going, why do you need 100 million? Why don't we give you 5 million and see what you can do with it?”—Chris Benson [46:32]

Important Timestamps for Segments

| Timestamp | Topic/Segment |
|------------|--------------|
| 03:28–07:54 | DeepSeek intro, cost surprise, Open vs Closed models |
| 08:18–13:56 | DeepSeek narratives, team size, hype vs reality |
| 17:56–31:43 | Security, privacy, local vs hosted use, bias/fears |
| 33:04–43:34 | Model architecture, variants, distillation explained |
| 43:34–49:33 | Impact on AI ecosystem, business, VC, data curation |

Resources & Learning Links

Jay Alammar's “An Illustrated DeepSeek R1” – Visual, accessible explanation of DeepSeek’s architecture and process.
Daniel’s Blog Post on DeepSeek Security and Privacy – Linked in show notes, addresses myths vs real risks for using DeepSeek in enterprise.

Final Thoughts

DeepSeek highlights just how fast, open, and competitive the LLM space is now becoming. The episode encourages business and technical leaders to rethink both the economics and strategy of AI deployment: anticipate a future full of accessible, swappable models—and prepare for both the risks and opportunities this fresh wave of innovation will bring.

“It's definitely a shock to the general public's perception that there is model optionality out there. There's going to be a proliferation of these models from various different places…”—Daniel Whitenack [14:10]

Transcript

[00:06]
Podcast Host / Announcer
The podcast that makes artificial intelligence practical, productive and accessible to all. If you like this show, you will love the Changelog. It's news on Mondays, deep technical interviews on Wednesdays, and on Fridays, an awesome talk show for your weekend enjoyment. Find us by searching for the Changelog wherever you get your podcasts. Thanks to our partners at Fly.io. Launch your AI apps in five minutes or less. Learn how at Fly.io.
[00:44]
Daniel Whitenack
Well, welcome to the very first fully connected episode of Practical AI in 2025. In these fully connected episodes of the Practical AI podcast, Chris and I keep you fully connected with everything that's happening in the AI world and hopefully share some learning resources to help you level up your machine learning and AI game. I'm Daniel Whitenack, I'm CEO at PredictionGuard, and I'm joined as always by my co-host Chris Benson, who is a principal AI research engineer at Lockheed Martin. How you doing, Chris?
[01:20]
Chris Benson
Doing good today, Daniel. There's a lot of interesting things happening out there in the AI world, and I love these conversations where we do these fully connected kind of deep dives into things that are of personal interest to you and me, which is how we choose them.
[01:36]
Daniel Whitenack
And there are a lot of exciting things coming up in these episodes. It's a little bit easier for us to kind of freeform talk about a few things. But for those listeners who have been seeing our logo for some time on the podcast feeds, just FYI, there'll be a change to that coming up. No need to swap out our feed or anything; that should be good. We're still doing great things with the Changelog, and they've made a few changes to their shows and their lineup, publishing them in different ways. They have a show about that if you want to learn about it. But we'll still be going strong, and we're excited for probably a much needed refresh. You know, I don't know, I think it's been like six and a half years or something, hasn't it, Chris?
[02:26]
Chris Benson
It has been.
[02:27]
Daniel Whitenack
Maybe we can change in six and a half years. I don't know.
[02:29]
Chris Benson
Yeah, I mean six and a half years in GPU time is a long time.
[02:33]
Daniel Whitenack
Yeah, that's expensive. Yeah. So yeah, just FYI, longtime listeners, be aware: you might scroll through your podcast app and look for a new logo sometime in the near future. But yeah, I think obviously that's bigger news than DeepSeek, but I guess we can devote most of the episode to what is the story of our week, couple weeks and who knows how long, which has been DeepSeek R1.
[03:04]
Chris Benson
I know this is. You know, I was thinking as we were saying that speaking of GPU time, in this case, maybe a lot less GPU time.
[03:12]
Daniel Whitenack
Yeah, a lot less. Maybe an unclear amount.
[03:17]
Chris Benson
That's true.
[03:17]
Daniel Whitenack
Good point.
[03:19]
Chris Benson
Because they only talked about the final run that was successful in terms of the spend on it. So. So, yeah, well, I guess we're getting ahead of ourselves. For those who are not as familiar with it. Yeah.
[03:28]
Daniel Whitenack
So probably many of those that are listening to this particular episode have come across DeepSeek. But for those that have not seen anything, maybe you've been under a rock somewhere. Chris, what are we talking about with DeepSeek?
[03:46]
Chris Benson
So there's a Chinese startup that we're talking about here that has released a large generative model, an LLM, that is, I guess, and I'm going to gloss over some stuff right here just because we'll dive into the specifics, very highly performant. It's comparable to the best models that OpenAI has had out there. But the thing that's really rocked everybody's world is the fact that it was trained at much, much less cost. At least the parts we know, and we'll dive into that detail. As we said, there's some things we know and there's some things we don't know, but it appears to have been achieved at a much, much lower cost than all of the competing models from anywhere in the world up to this point. And so in short, the AI world, and I guess everybody outside the AI world that cares about this stuff, is in this giant debate and conversation about the implications. Is that a big deal? Not such a big deal? Why is it a big deal? You know, is it overblown? And of course, Daniel and I are about to dive into all of that right now. Yeah, it's a target-rich environment, as we like to say in defense.
[05:06]
Daniel Whitenack
Are they surveilling us while we use the model?
[05:09]
Chris Benson
Exactly.
[05:10]
Daniel Whitenack
Could you, should you, might you run the model in all sorts of different ways? Yeah, there's tons of confusion around this, Chris, which is one interesting thing. Hopefully after you've listened to this podcast, you're not more confused. We don't make that guarantee, but hopefully that's the case. Yeah, it's interesting. So I think one of the stories around this, and there's multiple kind of narratives that we can go into here, so much to talk about. One of the narratives is around how some Chinese startup with a much lower budget or spend on the model building built such a good model and essentially gets parity with models from OpenAI and others. In particular, the comparison has been made to the o1 model, which, if you remember, we talked about on the show. This is OpenAI's sort of "thinking" model. So the model, when it generates output, so you put in a prompt, the LLM generates text output. A beginning portion of that text is sort of thinking content, meaning they're training the model to sort of spit out logic of how to solve maybe a deeper problem or reason about the input prompt before it actually gives its final answer. If you're in the ChatGPT interface, you can see this kind of in a different color or grayed out. If you're using the API, I don't think that they send that back in the API. You still pay for it, but I don't think they send it back. This is a similar type of reasoning model, so DeepSeek R1. And compared with this kind of flagship reasoning model of OpenAI, on the same kind of task, DeepSeek is kind of getting what we could call parity. Now, you know, different benchmarks are out there, et cetera. And, you know, each model has its own biases and different behavior. But yeah, the first kind of narrative around this in the news is: whoa, this came out of nowhere. There's this new company, it's just a startup, they did this on the cheap. So they kind of published numbers around 5 million, 5.56 million, somewhere in that range for the final training of this model. And compared to what it took to train OpenAI's o1, that's like a drop in the bucket. So yeah, this first narrative, what are your thoughts on this, Chris?
[07:55]
Chris Benson
Well, I think that there's a lot more information we want. They published that final number, as you mentioned, but I've seen a lot of posting about what did it take, what were all the unsuccessful runs that they had, the experimental runs, things like that. There's just so much that's not known about this. So they really cherry-picked what they chose to publish about it.
[08:19]
Daniel Whitenack
But I mean, have you used it, Chris?
[08:21]
Chris Benson
Oh, I actually, so surprisingly, because of my job, I tend to avoid some of the Chinese technologies in a direct use way. But I did load it onto my personal system. I have it on now. They were having some struggles on login for the last hour at least that I've been trying and got it on. But yeah, I have it up. I know that I coyly sent you a text yesterday with a screenshot of me using it where I was just, just playing with it and I had seen somebody else do this. And so I asked it what happened in Tiananmen Square in 1989? And it replied, I am sorry, I cannot answer that question. I'm an AI assistant designed to provide helpful and harmless responses, which I thought was so. So, you know, that was about what I expected actually, to be perfectly honest. But just a reminder of the geopolitics of AI, you know, which definitely plays big, is that this is a Chinese government approved data set. And so.
[09:17]
Daniel Whitenack
And I think, to that point, first off, DeepSeek is an amazing team of people that actually didn't just show up on the scene like a week ago.
[09:29]
Chris Benson
Yep.
[09:29]
Daniel Whitenack
They've been doing actually really great open source and science work, you know, for a while now. So the DeepSeek team has been around, there's been previous DeepSeek models. They have released those models on Hugging Face, which, to be fair to our US counterparts, OpenAI has not released their models openly on Hugging Face to where you can run them and do research with them and analyze them and generate outputs. So that's maybe a point to highlight. They have been open in that sense, but there's been some, I think, confusion around this concept of this very small team. Like there's a couple of guys in their bedroom with a gaming card in their gaming PC and they train this model that beat OpenAI. That's kind of the narrative that's being pumped around. In reality, they have access to tens of thousands of GPUs. I saw a post by Philipp Schmid, who's at Hugging Face now. I don't know all of the details of where he gets certain information, so take this for what it's worth. But one of the things he said, similar to you, Chris, is that the sort of five to six million dollar mark is kind of the final base model. No reinforcement learning, doesn't include smaller runs, doesn't include data generation, which is a key piece of this. So we'll talk about that when we come to the model here in a second. But actually generating the prompts for this model is a significant part of it, and the kind of RL training, as I mentioned. So the costs were definitely greater. There were more resources that were brought into this. It's not a company that just sort of popped up in someone's bedroom. So some of that narrative is true in the sense that the company is small, they clearly were working under constraints, and they did a very impressive thing working under computational constraints in the environment that they're working in, and they released the model openly on Hugging Face. So kudos to them on that. Some of the other narrative points around that I think are a little bit fuzzy.
[11:55]
Chris Benson
So I'm curious what your take on that is. So kudos for being open source on Hugging Face. And you and I, over the last year, year and a half, have been predicting that open source, especially as things were plateauing a bit, would inevitably have its impact in this way. And so I think it was a question of when and not if. And so having said that, and seeing this particular group putting it on Hugging Face, why do you think that they declined to include all of the training information on how they got there in terms of the cost? What do you think might be the motivation? And I realize I'm asking a speculative question.
[12:43]
Daniel Whitenack
It's interesting. I mean, I think one point to make there is it's not out of character for anyone that's posting these models, in the sense that, like Meta's Llama 3.1, you see these model producers produce, quote, technical papers, but these technical papers don't share details to where in theory you could reproduce this. Right. And there's details about the data that they're not revealing. There's details about their process that they're not revealing even when they release one of these models. And so it's not out of character for that to be kind of how things happen. And so really, part of me is like, well, why would they be motivated to do that? It would be sort of out of their philosophical commitment to open science, I think, that would be the thing that would motivate them to do that, which certain parties do. So, like, the Allen Institute for AI with their OLMo models, et cetera, have made a very conscious effort to be truly open in terms of data, process, model assets, et cetera. That is definitely an exception that proves the rule. Is that the right phrase?
[13:57]
Chris Benson
Yeah, I know what you're saying.
[13:58]
Daniel Whitenack
Yeah. So something like that. You get what I'm saying?
[14:00]
Chris Benson
But there's an acknowledgement that in the larger field that the technical papers that come out are as much marketing papers as they are and kind of accomplishment papers as they are.
[14:11]
Daniel Whitenack
I mean, there's certain elements that are interesting to know. Obviously, you can know the model architecture, you know, you're running it it's open, so there's things you can learn from that. I do think that this represents a kind of shock to the system where, you know, you and I have been saying we're basically getting to parity with open models and closed frontier models for all intents and purposes. For most enterprise use cases, we're sort of. We were already at parity, but in the general public's eyes we definitely weren't, and to some degree we weren't in terms of certain types of models and that sort of thing. I think this is definitely a shock to the general public's perception that there is model optionality out there. There's going to be a proliferation of these models from various different places, which then leads into natural discussions about where are these models coming from and can I trust them and how does it behave differently than what I'm used to using and can I run it securely. All of those sort of things pop up and they popped up sort of immediately and captured a lot of. A lot of attention around that. I agree.
[15:42]
Adam (Changelog)
What's up, AI practitioners? Adam here from Changelog. I want to tell you about how much I love Notion. I know Daniel and Chris love Notion as well, because we use Notion to organize everything. And behind the scenes here at Changelog FM and CPU fm, we work with a lot of cool teams externally, and we create dashboards and workflows and operating systems essentially to work well with others outside of our domain. And the cool thing is, Notion is so flexible that we could do anything with Notion. And the coolest thing I'm loving about Notion is their Notion AI. I can search across all my notes, all my docs, get context, get summaries. It's all AI powered, all inside my Notion, powered by all the content in my Notion. So I can work with external teams, internal teams, I can build workflows. And all this AI is really helping my team, my tools, my knowledge base, be empowered to do our best work. And unlike other tools out there, where you've got to jump from one thing to the next to the next and it's just not seamlessly integrated, Notion is seamlessly integrated, infinitely flexible, and it's beautiful, it's easy to use. Mobile, desktop, web, shareable. I mean, you name it, Notion can do it. And the fully integrated Notion AI helps us work faster, write better, think bigger, do tasks more efficiently. Things that would normally take us hours now take us minutes, maybe even seconds in some cases. And yes, we are a small organization compared to Fortune 500 companies, but they are used by over half of Fortune 500 companies, and teams that use Notion send fewer emails, they cancel more meetings, they save time searching for their work, and they reduce their spending on tools, which helps everyone stay on the same page. Try Notion today for free when you go to notion.com/practicalai. That's all lowercase letters, notion.com/practicalai, to try the powerful, easy to use Notion AI today. And when you use our link, of course you are supporting this show and we love that. Notion.com/practicalai.
[17:56]
Daniel Whitenack
Well, Chris, I do want to get to some of the technical details that we know about, you know, what the model is and versions of the model. But maybe before that it would be useful to address the elephant in the room, I guess, which is the security element of this or the cybersecurity and privacy issues related to this. So there's been the geopolitical elements around: oh, is the US ahead? What does this say about dominance in the space? That's one thing. There's another thing, which is my company, which previously potentially had problems with people pasting things into ChatGPT that they weren't supposed to, whatever it is, customer details or IP or whatever. That shadow AI usage was already happening in companies. Right. And people were concerned that their employees were pasting things into ChatGPT. Well, now there's this sort of new player in the space. People are pulling that up because it's the new amazing AI app. It turns out that is run by a different company, and that data is going to a different place and is being housed on, you know, Chinese servers. So there's that element. So we need to kind of parse through that. But also, this has produced a separate confusion, from my perspective, around the, quote, security of the model: is DeepSeek secure? I think we have to clarify what we mean when we ask that question, because, I don't know the things you've seen, Chris, but there's a lot of stuff out there that is not very helpful in this sense.
[19:49]
Chris Benson
Yeah, a lot of fear, uncertainty and doubt. And some of it may be justified, some of it may not be. Probably a lot of it may not be. For me, the scarier thing is not the model itself, it's the infrastructure around the model, where it's being housed, what entities external to the core data scientists at that company have access to it. And I think that's where a lot of the concern is going to be. Yeah, if you've downloaded it from Hugging Face and you're running it on your server, that's not to say that every facet of security is being accommodated, certainly. But at least you've taken some of the issues potentially out of the security equation. So, you know, on my personal phone I have the app, which is unusual for me, but knowing that we were going to do this and wanting to play with it a little bit. But I am very wary of what I don't know about that at this point.
[20:49]
Daniel Whitenack
Yeah, yeah. So you drew out something really important, Chris. And I actually wrote a blog post about this that I'll link in the show notes of this episode. If you're interested, you can take a look, and it might be a good resource if you hear your engineering management or people in your company, you know, hyping the fears around DeepSeek, and maybe you want to use that in a secure environment. Maybe that would be a good tool that you could point them to. But all that to say, I think the main thing that I wanted to highlight in that was this element that you just described. So there's really two ways to access this model. So there's two ways to utilize DeepSeek R1. One is via a product offered by the DeepSeek company, which is a software product that you access and they host. This would be parallel to a lot of other software products, like OpenAI hosts ChatGPT. That is their product, which has a model interface embedded in it. But it is a product, similar to you using Airbnb. Right? You go to Airbnb, you put your personal information into Airbnb. They have certain terms of service that they hopefully follow. But you have no view into what's going on under the hood of Airbnb or ChatGPT or this DeepSeek AI product. And so it's really not the model in that case that is not secure, quote, unquote, in terms of you putting data into it. It is the product built around that model. And it is very clear from the terms of service that DeepSeek has posted that they will gather all of your... well, I shouldn't do a blanket statement like that. They say exactly what they will get from you, but they're saving a lot of your personal data and information. They will use that for future model trainings. And that is housed on servers in China. So that is the explicit terms of service, if you're okay with that. That's the usage of the product. Right. Not the model.
[23:11]
Chris Benson
It's funny that you bring that up because like most people in most software products that I use, I don't necessarily go through all the terms of service as carefully as I really should. And, you know, because we're always.
[23:25]
Daniel Whitenack
No one does.
[23:26]
Chris Benson
No one does. But I will confess that when I was downloading the DeepSeek app and it brings that up in registration, I did, and I was horrified to read it, and I had already determined I was going to do that. I was using all personal stuff, nothing related to work, that kind of thing. But even so, not only did I kind of do a big swallow on bringing the app down, but it made me really think, very, very carefully, about the kinds of things that I would put into the interface. And like you said, that's about the product as opposed to the model.
[23:58]
Daniel Whitenack
Yeah, yeah. So this is one access pattern, right? Access through the Deep seq, either their mobile app, or I think it's chat.deepseek.com, their chat interface. Similar. Again, similar to ChatGPT. And really, you should have some of the same related concerns with a Chat GPT or an anthropic as you would with Deep seq in the sense that you really want to know how your data will be used and what are the privacy considerations around that. This adds a new element in that it's a sort of foreign entity dealing with that data. Right. So there's a different element of that, but that's not the model, that's the product. The model which they have released on, again on Hugging Face is. And when we say model here, for those that haven't maybe been around the podcast for a while, when we talk about a model, there's sort of two elements of that. There's the code needed to actually run that model, so to process your inputs and generate outputs. And then there is a set of parameters, a set of data that's loaded into that code that parameterizes that code such that it can run both of those have been released. And in fact, actually the code needed to run deepseek. Now, I'm going to make a clarification here. So the code needed to run some versions of DeepSeek is not even DeepSeek's code, at least if you're using the Hugging Face ecosystem. It's a software package called Transformers, which is open source. You can go look at Every line on GitHub. It's maintained by thousands around the world. It's completely open and transparent. And so the code element isn't that, you know, that's being looked at and being developed and the model is implemented in there. The data element is available on hugging face and has potentially its own concerns as you download that into your environment, which we can talk about here in a second. But both of those are open. 
You can download them and run them. If I spin up a VM or some computer, I could download those assets, cut off that computer from the Internet, both outbound and inbound, and run that model in complete isolation, where no data goes to DeepSeek and no data goes to China. They're not sending anything to or connecting to that computer. So just to make it uber clear here, that is the model. When we say model, that's what we mean. We don't mean the product. And that can be run, again with considerations, in a secure environment. Now, I should say the caveat here. I do believe, as of the time of this recording, the Transformers library hasn't been updated to support the full DeepSeek R1 architecture, which is actually very typical. When a new model is released, it's not always supported in upstream Transformers right away, which means that there is remote third-party code that you have to load to run the full model. That is not true for all of the versions of the model. I expect that will change in a matter of, I don't know, maybe it's changing as I speak now. It'll happen fast, like days, weeks, whatever. That will be merged into upstream and then that concern will kind of go away.
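The isolated setup described here can be sketched roughly as follows. This is a minimal sketch, not the hosts' own configuration: the directory path is hypothetical, and it assumes the weights were copied to local disk before the machine was disconnected. The two environment variables and the `local_files_only` and `trust_remote_code` arguments are existing Hugging Face options; they complement network isolation at the VM or firewall level and do not replace it.

```python
import os

# Hypothetical local directory holding weights copied over before the
# machine was taken off the network; not a path from the episode.
LOCAL_MODEL_DIR = "/models/deepseek-r1-distill"


def offline_env() -> dict:
    """Environment variables telling the Hugging Face libraries not to use the network."""
    return {"HF_HUB_OFFLINE": "1", "TRANSFORMERS_OFFLINE": "1"}


def offline_load_kwargs() -> dict:
    """Keyword arguments for from_pretrained that keep loading local and first-party."""
    return {
        "local_files_only": True,    # never try to download missing files
        "trust_remote_code": False,  # refuse model code shipped alongside the weights
    }


def load_model(model_dir: str = LOCAL_MODEL_DIR):
    """Load tokenizer and model from disk only. Requires the transformers package."""
    # Set the offline flags before the library is imported.
    os.environ.update(offline_env())
    from transformers import AutoModelForCausalLM, AutoTokenizer

    kwargs = offline_load_kwargs()
    tokenizer = AutoTokenizer.from_pretrained(model_dir, **kwargs)
    model = AutoModelForCausalLM.from_pretrained(model_dir, **kwargs)
    return tokenizer, model
```

With `trust_remote_code=False`, a checkpoint that depends on third-party modeling code should fail to load instead of running that code, which is the situation the caveat about the full R1 model describes.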
[27:26]
Chris Benson
So just to kind of follow up on that thought, would it be fair, given what you just said, to say that if you were running it on the transformers infrastructure and you did have disconnected inbound and outbound networking from it, just to take all of those extraneous concerns out, would you have any reservations about running it in a scenario where security was important?
[27:53]
Daniel Whitenack
Yeah. So that removes the kind of phone-home, my-data's-going-to-China concern.
[28:00]
Chris Benson
Yep.
[28:00]
Daniel Whitenack
Vulnerabilities around remote code execution or something on my computer, those sorts of concerns are, I would say, taken care of. Now, I mentioned the data files that you would be loading, those model parameters. There are insecure ways that those could be loaded in. JFrog and others have shown vulnerabilities in that. Those are mostly taken care of by using the right model formats, which DeepSeek is doing. It's called safetensors. If you want to look into it, you can. So I wouldn't have any reservations about that. Now that brings up a secondary question though, which is another point of confusion, so I'm glad you brought it up. The secondary question is: okay, if I'm running this in this secure way, it's not phoning home, and my data is not going to DeepSeek or any foreign entity. Are there other concerns that are unrelated to this sort of phone-home privacy issue? And I think one of the things that you brought up before was potential biases.
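Why the file format matters can be shown with a small sketch. Pickle-based checkpoint formats can execute arbitrary code while being loaded, whereas a safetensors file is a length-prefixed JSON header followed by raw tensor bytes, so reading it is plain parsing. The sketch below builds a tiny in-memory blob with that layout and reads it back using only the standard library. The tensor name `w` and the helper names are illustrative assumptions, and only the `F32` dtype is handled; real code would use the `safetensors` package.

```python
import json
import struct


def build_tiny_file() -> bytes:
    """Build an in-memory blob laid out like a safetensors file (one 2x2 float32 tensor)."""
    data = struct.pack("<4f", 1.0, 2.0, 3.0, 4.0)
    header = {"w": {"dtype": "F32", "shape": [2, 2], "data_offsets": [0, len(data)]}}
    header_bytes = json.dumps(header).encode("utf-8")
    # Layout: 8-byte little-endian header length, JSON header, raw tensor bytes.
    return struct.pack("<Q", len(header_bytes)) + header_bytes + data


def read_header(blob: bytes) -> dict:
    """Parse the JSON header; nothing in the file is executed."""
    (header_len,) = struct.unpack("<Q", blob[:8])
    return json.loads(blob[8 : 8 + header_len].decode("utf-8"))


def read_f32_tensor(blob: bytes, name: str) -> list:
    """Return the flat float32 values of one tensor (F32 only in this sketch)."""
    (header_len,) = struct.unpack("<Q", blob[:8])
    start, end = read_header(blob)[name]["data_offsets"]
    raw = blob[8 + header_len + start : 8 + header_len + end]
    return list(struct.unpack(f"<{(end - start) // 4}f", raw))
```

Calling `read_header(build_tiny_file())` returns the tensor's dtype, shape, and byte offsets without touching the tensor data, which is also how tools can inspect a checkpoint cheaply.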
[29:00]
Chris Benson
Biases and the original training set.
[29:02]
Daniel Whitenack
Yeah, in the model. Right. So you brought up this example of asking about Tiananmen Square. We also asked a similar question. I don't know if this is actually fixed in the app as of the time of this recording, but at first, when you would ask that question in the DeepSeek product, the actual application, it would print out the full answer. It would actually answer what happened, and then it would all collapse and give you a canned "sorry, I can't answer that." So based on that, I would assume that the actual model that you would download and run in that secure environment, or on your laptop in one of these local hosting tools, is not biased in terms of that response. That model specifically is not. You could get it to answer about Tiananmen Square. It's a product decision, similar to when Gemini was trying to create diversity in their image output and generated some really interesting looking things, or what ChatGPT or anyone does when you send in a prompt: they inject stuff into that prompt, they do post-processing. It's a product, right? You don't have visibility into any of that. And so they're doing obvious product things there to introduce artificial biases. Now, I do think it is possible that in the alignment fine-tuning process, DeepSeek had their own vision of how they wanted to align that model, which may not be malicious in any sort of way or biased in weird political ways. It might just be their choice of how they wanted to bias that model in other ways. Maybe it is motivated by certain things, I don't know. But that model will have its own sort of biased behavior. 
The other thing that I think has been shown in a number of places, WithSecure did a study of this, is that it is a model that is also way more sensitive to prompt injection attacks than many other state-of-the-art models, which produces another type of vulnerability at the application layer. So you've taken care of the model hosting security issue. But all that to say, that doesn't mean you shouldn't still be asking relevant questions about the actual use of the model or the integration layer. Again, I highlight some of those things in the blog post if people are interested.
[33:05]
Daniel Whitenack
So Chris, there's also the element around this that we always like to get to in our deeper discussions around any particular model, which is: what are the unique technical or architectural elements of this? What types of versions did they release? It actually might be very confusing for people when they see something like DeepSeek R1 Distill Qwen 32B, right? There's a lot of words there that might not make sense. Jay Alammar, who we love on the podcast and who has been on the podcast, has posted for years a lot of great blog posts, like The Illustrated Transformer and other things. As a learning resource that you might want to take away from this particular episode, he posted an illustrated DeepSeek R1 article which goes through some of the details. What's interesting here, Chris, I don't know if you got through any of that, but the overall picture of how they did this fine-tuning is fairly similar to, I think, how many people have been doing fine-tuning for some time.
[34:20]
Chris Benson
And I guess one of the things I wanted to bring up on this is, I believe, correct me if I'm wrong, it was based on one of the Llama models, right?
[34:30]
Daniel Whitenack
Well, the DeepSeek architecture has been around for some time and is a specific architecture. It's similar to the Llama architecture. It also involves layers upon layers of transformers, right? In terms of the exact architecture of this DeepSeek R1, it does involve mixture-of-experts layers in the model. So there are layers and layers of transformer blocks in the architecture, and then these mixture-of-experts blocks, which you might see people refer to in terms of activated layers or parameters. With these mixture-of-experts layers, you don't process the input through all elements of that layer each time you run the model, which creates some efficiencies both for inference and often for training as well. But yes, it's a similar setup, with some slight differences. Those slight differences are the reason why, as we mentioned earlier, at least currently as we're recording this, you might have to import some third-party code to run the model, because it isn't yet supported in upstream Transformers, which is likely to change quickly.
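The idea that only part of a mixture-of-experts layer runs for a given input can be illustrated with a toy router. This is a simplified sketch under stated assumptions, not DeepSeek's implementation: the experts are plain Python functions on a single number, the router scores are passed in by hand (in a real model a learned gating network computes them from the token's hidden state), and the function names are made up for illustration.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def moe_layer(x, router_scores, experts, top_k=2):
    """Run only the top_k highest-scoring experts and mix their outputs.

    Returns the mixed output and the indices of the experts that ran.
    """
    ranked = sorted(range(len(experts)), key=lambda i: router_scores[i], reverse=True)
    chosen = ranked[:top_k]
    # Renormalize the scores of the chosen experts so their weights sum to 1.
    weights = softmax([router_scores[i] for i in chosen])
    output = sum(w * experts[i](x) for w, i in zip(weights, chosen))
    return output, chosen


# Four toy "experts"; only two of them are evaluated per call.
EXPERTS = [
    lambda x: x + 1.0,
    lambda x: 2.0 * x,
    lambda x: -x,
    lambda x: x * x,
]
```

For example, `moe_layer(3.0, [0.1, 2.0, -1.0, 1.0], EXPERTS)` evaluates only experts 1 and 3 and skips the other two, which is where the inference savings mentioned above come from: the layer holds many parameters, but only a fraction are activated per input.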
[35:49]
Chris Benson
How does that affect the fine-tuning though, in terms of how DeepSeek approached it versus maybe how Llama has been approached? Are there differences? Are you seeing a lot of similarity there? I know Sam Altman made a comment, and I'm not quoting him directly, but it was something to the effect of: once somebody else has already done something that you're basing your work on, it's a lot easier to do that. And that was his kind of minimization of what DeepSeek had done. I'm kind of curious, how does that affect this in terms of fine-tuning?
[36:19]
Daniel Whitenack
I mean, the overall process, like I say, seems similar. And when I say overall process, this is often a pre-training step on very raw data that is completely unsupervised, a fine-tuning step which is supervised, and maybe an additional fine-tuning step which is like a preference tuning. That sort of overall picture of how the training is done seems to also be true here. Now, there are some unique elements of this, in that they created this DeepSeek R1-Zero model, which is an interim reasoning model that they used to help generate some of the data for that supervised fine-tuning step. So this is where, going back to the original discussion in our conversation, that 5 million number corresponds to maybe that final or one of the final training steps, but not necessarily the data generation. So they used interim models that their intention wasn't to release. Such a model doesn't perform great as a general-purpose model, but it might perform well at generating long chain-of-thought examples, these reasoning examples, to add into the supervised fine-tuning data. That allows you to augment your fine-tuning data and use fewer human resources to create it. I forget the exact figures, but Meta did spend a ton of money on data curation with human data labelers to create those datasets for the Llama models, and probably still does. In this case, at least some of that data was synthetic data that was generated by this interim model. So that's one interesting step of the process that maybe is relevant to some of the budget and efficiency considerations.
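The generate-then-filter pattern described here can be sketched in a few lines. This is an illustration under explicit assumptions, not DeepSeek's pipeline: `toy_reasoner` is a made-up stand-in for an interim reasoning model, and the filter (keep only traces whose final answer matches a known answer) is one simple automated check chosen for the example; the episode does not specify which filters were actually used.

```python
from typing import Callable, Dict, List, Tuple


def toy_reasoner(prompt: str, seed: int) -> str:
    """Stand-in for an interim reasoning model (hypothetical).

    Even seeds produce the right sum; odd seeds are off by one.
    """
    a, b = (int(token) for token in prompt.split("+"))
    result = a + b + (seed % 2)
    return f"Add {a} and {b} step by step. Answer: {result}"


def build_sft_examples(
    problems: List[Tuple[str, str]],
    reasoner: Callable[[str, int], str],
    n_candidates: int = 4,
) -> List[Dict[str, str]]:
    """Sample several reasoning traces per problem and keep the verified ones."""
    kept = []
    for prompt, answer in problems:
        for seed in range(n_candidates):
            trace = reasoner(prompt, seed)
            # Automated filter: the trace must end in the known answer.
            if trace.strip().endswith(f"Answer: {answer}"):
                kept.append({"prompt": prompt, "completion": trace})
    return kept
```

The surviving prompt and completion pairs are what would be added to the supervised fine-tuning set, which is how generated data can stand in for some human-labeled examples.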
[38:23]
Chris Benson
Sure. And maybe that's part of the motivation of leaving that out altogether is that wasn't a direct cost necessarily to them, or at least not in the way that it had been when it was originally manufactured.
[38:35]
Daniel Whitenack
Yeah. So there's this DeepSeek R1 model, and again, the architecture is not fundamentally different from architectures we've seen in the past. There were some creative things done in the training process. There's that significant portion of the data, assuming it was a significant portion, which was synthetic or generated. They also used some automated and model-based processes to filter and curate that generated data, to actually filter out good examples from all of the candidate examples. So there were some very creative things in that data generation piece. But the other stages of this were not fundamentally stages of training that we haven't been familiar with from other model releases. There are a number of model versions that have been released from DeepSeek. So there's DeepSeek R1, that sort of flagship, which is like 700 billion parameters or something. It's very large. At least at full precision, you're going to need many, many GPUs to run this. I think what Philip from Hugging Face said was like 16 80-gigabyte GPUs, like 16 H100s. To give you context, I think an H100, if you have it up all the time at on-demand pricing in the cloud, is going to run you something like 60 to 80 grand a month, something like that. So you need 16 of those to run the full model in full precision, at least with Nvidia GPUs. And that full model is that mixture-of-experts model which has the element of the external or third-party code added into it. They've also released distilled versions of that model, which we can get into here in a second, but I just wanted to make clear what the main model looked like. And for these distilled versions of the model, if you go to Hugging Face, you can actually look at the collection from DeepSeek for DeepSeek R1. 
And what you'll see is a whole bunch of different DeepSeek models, which again is often a point of confusion for people: what do all these things mean? So we've got DeepSeek R1, and we have Distill Llama 70B, Distill Qwen 32B, Distill Llama 8B, et cetera. These ones are really significant for people to understand. These models are what's called dense models, so they don't have the mixture-of-experts element. They run all of their parameters all the time, and they're distilled models. So what DeepSeek did is they created their flagship model, DeepSeek R1, and then they used the process of knowledge distillation to create smaller versions that leverage the power of the larger model in the following way. We've talked about this on the show before. We had an episode with Nous Research where they talked about this a lot. If you're curious, go and check that out. But you essentially use the larger model to generate a bunch of example outputs, and then you use those inputs and outputs that were generated by the larger model to train a smaller model with very, very high quality data, which boosts the performance of the smaller model beyond what you could get from training it from scratch. So when it says DeepSeek R1 Distill Llama 70B, it's a distilled version of Llama 70B that used DeepSeek R1 to generate that synthetic data for the fine-tuning process. That's great, because they have versions of this model from 1.5 billion up to 70 billion parameters. Especially the smaller ones, you could run on your laptop. The sweet spot ones, around that 8 billion to 32 billion, would take a card or a couple of cards to run. And so this gives more accessibility to these models. 
And of course, people have already proliferated from there with various other optimizations, like GGUF and other formats that can run on your MacBook processors and that sort of thing. So just to clarify the ecosystem around the models there, those are a couple of things to keep in mind.
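A back-of-the-envelope calculation shows why the distilled sizes land where they do. The sketch below estimates memory for the weights alone as parameter count times bytes per parameter. It is a rough lower bound under stated assumptions: it ignores activations, the KV cache, and runtime overhead, and the 0.5 bytes per parameter for 4-bit quantization is an approximation, so it is not meant to reproduce the hardware figures quoted in the episode.

```python
import math


def weight_memory_gb(params_billions: float, bytes_per_param: float) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billions * bytes_per_param


def gpus_needed(memory_gb: float, gpu_memory_gb: float = 80.0) -> int:
    """Minimum number of GPUs whose combined memory holds the weights."""
    return math.ceil(memory_gb / gpu_memory_gb)


# Sizes mentioned in the episode for the distilled models, in billions of parameters.
for size in (1.5, 8, 32, 70):
    fp16 = weight_memory_gb(size, 2.0)   # 16-bit weights
    int4 = weight_memory_gb(size, 0.5)   # roughly 4-bit quantized weights
    print(f"{size}B: {fp16} GB at 16-bit, {int4} GB at 4-bit, "
          f"{gpus_needed(fp16)} x 80 GB GPU(s) at 16-bit")
```

By this estimate an 8-billion-parameter model needs about 16 GB at 16-bit and about 4 GB at 4-bit, which is consistent with the point about smaller and quantized variants fitting on a laptop while the larger ones need one or more data-center cards.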
[43:34]
Chris Benson
Yeah, that's useful to know. So, a couple of big questions as we are starting to wind up here, to pull out of the technical a little bit for a moment and address the ecosystem at large, the AI community. What would you predict, with DeepSeek now being here, about how things wash out? Obviously the market reacted. I think they took half a trillion dollars away from Nvidia, plus another half a trillion from...
[44:05]
Daniel Whitenack
Don't they have a lot of trillions of.
[44:07]
Chris Benson
They have a few, they have a few. But that will probably stabilize out going forward. What do you think we're looking at over the months ahead, beyond the market reacting today? As you look at six months out, nine months out, that kind of thing, what do you think the real impact of DeepSeek is going to be on the larger global AI community?
[44:35]
Daniel Whitenack
Well, we've been saying it for a while, but I think the wider business community has not realized this. One of the fallouts from this, or the things that will shift, I think, is people really taking seriously that in the future, part of what your business needs to consider is model optionality. Right. These GPT models were king for a long time, but now, apparently, if you've got $5 million sitting around, you could create a best-in-class model. And what does that mean? It means all of these are going to proliferate very quickly. This is not the last of these types of models we will see. And having model lock-in, quote unquote, where you've built all of your AI functionality around one particular model, whether that be open or closed, is not going to work out great for you in the long run, just because the models keep changing. So building in this ability to swap models, to have control and configurability, I think that's one of the trends there. The other one, I would say, is that now that you're considering bringing these models into your own infrastructure, since there's parity with OpenAI in many respects, it brings up all of these questions that we were immediately prompted with, right? If you bring that into your environment, what are the security concerns related to that? How can you run it robustly and reliably? What should you be monitoring in production? So it brings up all of these additional questions, which I think overall will be really good for people to consider, because they're probably things they should have been considering for the past year, as so many things were built on one model family. So yeah, those are a couple of thoughts. I don't know if you have additional ones.
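One common way to build in the kind of model optionality described here is to route every call through a small interface so that the backing model is a configuration choice. The sketch below is a minimal illustration with made-up names: `ModelRegistry` and the two stub backends are hypothetical, and real backends would wrap a hosted API or a locally served model.

```python
from typing import Callable, Dict

# A backend is anything that maps a prompt to a completion.
Backend = Callable[[str], str]


class ModelRegistry:
    """Keeps application code independent of any single model."""

    def __init__(self) -> None:
        self._backends: Dict[str, Backend] = {}

    def register(self, name: str, backend: Backend) -> None:
        self._backends[name] = backend

    def generate(self, name: str, prompt: str) -> str:
        if name not in self._backends:
            raise KeyError(f"unknown model backend: {name}")
        return self._backends[name](prompt)


# Stub backends standing in for real model clients (hypothetical).
registry = ModelRegistry()
registry.register("hosted-model", lambda prompt: f"[hosted] {prompt}")
registry.register("local-model", lambda prompt: f"[local] {prompt}")

# Swapping models becomes a one-line configuration change.
ACTIVE_MODEL = "local-model"
reply = registry.generate(ACTIVE_MODEL, "Summarize this ticket.")
```

Keeping prompts, evaluation, and monitoring keyed to the interface instead of a specific vendor is what makes the swap cheap when a new model appears.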
[46:32]
Chris Benson
I've been speculating on this. Valuations have been going up and up and up for all these different AI startups across the entire globe, and the budgets have just grown astronomically. Is this the moment where investors are looking at it and going, why do you need 100 million? Why don't we give you 5 million and see what you can do with it?
[46:55]
Daniel Whitenack
Look what they did with it.
[46:57]
Chris Benson
Look what they did with it and stuff. And so the question is whether the cost of operations in AI startups is now going to be affected. If that's the case, and with potentially not everybody being able to be quite so productive with their 5 million, what does that mean? Are we going to go through a little bit of a cleanup round in the AI startup world or whatever? Any thoughts on that as we finish up?
[47:23]
Daniel Whitenack
Yeah, it's interesting. I mean, it definitely has an impact if you're a model-building type of startup, for sure. And we've seen other examples of this even last week, hearing from Genmo about what they're doing with video generation models with a very small team and what they're able to accomplish. I think it definitely has implications on that side. I think it also has implications for those. Hopefully, I think, people will start thinking less about the kind of model-building ventures, which will be interesting. But this is also going to make model building and fine-tuning even more accessible to the enterprise, which will fuel both tooling and infrastructure investments as well. And those, I don't think, will have the kind of inflated "oh, I need a hundred million dollars to train a model" sort of scenario. But yeah, we'll see how that shakes out.
[48:28]
Chris Benson
I think in the enterprise world out there, maybe as a final thought on this, you're now going to see existing budgets, which for large companies are often in the hundreds of millions of dollars, where they're not AI specialty companies but it matters enough to have a big budget. The expectation has been that I've always used other models and we've built lots of infrastructure around that. Maybe there's pressure now to actually go and work on models created specifically for your business, because you have 5 million times X available to do that in the new way of thinking. So it may certainly change what the expectations are in the enterprise world.
[49:11]
Daniel Whitenack
What I think will be stressed in that case is the data curation and human involvement in that process.
[49:19]
Chris Benson
Yes.
[49:19]
Daniel Whitenack
That's not, I mean, there was clearly a big investment on that side here. So even if you spend 5 million on that, you know, final training, there's definitely a process that goes into that. I agree. Yeah.
[49:33]
Chris Benson
Good conversation today.
[49:35]
Daniel Whitenack
Yeah, definitely, Chris. Definitely check out some links that we'll put in the show notes to various articles about this, both on the technical side and the more hyped geopolitical stuff. So check it out. Thanks for joining again. Great to talk, Chris.
[49:51]
Chris Benson
Yep. Absolutely. See you next week.
[50:00]
Podcast Host / Announcer
All right. That is our show for this week. If you haven't checked out our Changelog newsletter, head to changelog.com/news. There you'll find 29 reasons, yes, 29 reasons, why you should subscribe. I'll tell you reason number 17: you might actually start looking forward to Mondays.
[50:20]
Daniel Whitenack
Sounds like somebody's got a case of the Mondays.
[50:24]
Podcast Host / Announcer
28 more reasons are waiting for you at changelog.com/news. Thanks again to our partners at Fly.io, to Breakmaster Cylinder for the beats, and to you for listening. That is all for now, but we'll talk to you again next time.