Practical AI – Deep-dive into DeepSeek
Episode Date: Jan 31, 2025
Hosts: Daniel Whitenack (CEO at PredictionGuard) & Chris Benson (Principal AI Research Engineer at Lockheed Martin)
Episode Overview
In this “fully connected” Practical AI episode, Daniel and Chris take a deep dive into DeepSeek, the headline-grabbing new Chinese large language model (LLM). They break down what makes DeepSeek significant: its high performance, low reported training cost, open release on Hugging Face, security and privacy debates, and the model’s likely impact on the global AI landscape. The episode aims to cut through the hype, clarify confusion, and help listeners understand both the technical and broader implications of DeepSeek’s arrival.
Key Discussion Points & Insights
1. What is DeepSeek, and Why Is It Causing a Stir?
[03:28 - 07:54]
- DeepSeek is a generative LLM from a Chinese startup, notably achieving performance comparable to "frontier" models like OpenAI's GPT-4o and o1.
- The model shocked the AI community due to its claimed low final training cost—about $5–6 million, a fraction of what Western companies reportedly spend.
- Debate is raging: Is DeepSeek’s accomplishment as big a deal as the buzz suggests? What does this mean for the cost and accessibility of leading-edge AI?
“It appears to have been achieved at a much, much lower cost than all of the competing models from anywhere in the world up to this point.”—Chris Benson [04:03]
- The mainstream narrative sometimes exaggerates DeepSeek as a “bedroom team” effort, but in reality, DeepSeek is a well-resourced organization with access to tens of thousands of GPUs.
- Their open release on Hugging Face contrasts with OpenAI and Meta, whose models and training data are less open to independent scrutiny.
2. What About Security, Privacy, and Geopolitics?
[17:56 - 31:43]
- There are two main ways to access DeepSeek:
- Hosted product (app/web interface): Data goes to DeepSeek, governed by their explicit terms, and is stored on servers in China. The privacy implications are similar to using ChatGPT or Gemini, but with added geopolitical ramifications.
- Downloadable model artifacts (via Hugging Face): The open model and safetensors weights can be run locally, even in a 100% air-gapped environment, removing “phoning home” and related privacy fears.
“It is very clear from the terms and service that DeepSeek has posted that they will gather all of your… they’re saving a lot of your personal data and information. They will use that for future model trainings. And that is housed on servers in China.”—Daniel Whitenack [21:09]
- Security concerns shift depending on how you use DeepSeek. Local/offline use is secure in the traditional sense (assuming best practices and safe model files), but some risks remain:
- Bias: The model may reflect its training data and alignment process, which can differ from Western models.
- Prompt injection vulnerability: Studies by security researchers suggest DeepSeek is more susceptible to prompt-based attacks at the application layer than other leading LLMs.
- On running open models in a disconnected environment:
“If you were running it on the transformers infrastructure and you did have disconnected inbound and outbound networking… would you have any reservations about running it…? …I wouldn’t.”—Daniel [27:25-28:48]
3. Technical Deep Dive: Architecture, Training, and Variants
[33:04 - 43:34]
- Model Architecture:
- DeepSeek R1 is a “mixture of experts” (MoE) model, a transformer-based architecture engineered for efficiency.
- Its engineering choices resemble Meta's Llama, but unique implementation details initially slowed adoption in upstream Hugging Face Transformers (support is improving rapidly).
- “Mixture of experts” means only a subset of the model's expert sub-networks activates per token, reducing compute cost at inference.
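The sparse-activation idea the hosts describe can be illustrated with a toy sketch (illustrative only, not DeepSeek's actual routing code): a learned router scores all experts per token, but only the top-k experts actually run.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

class ToyMoELayer:
    """Toy mixture-of-experts layer: each token is routed to its top-k experts."""
    def __init__(self, d_model=8, n_experts=4, top_k=2):
        self.top_k = top_k
        # Each "expert" is just a small linear map in this sketch.
        self.experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]
        self.router = rng.normal(size=(d_model, n_experts))

    def __call__(self, x):
        # x: (n_tokens, d_model)
        scores = softmax(x @ self.router)              # (n_tokens, n_experts)
        out = np.zeros_like(x)
        expert_calls = 0
        for t in range(x.shape[0]):
            top = np.argsort(scores[t])[-self.top_k:]  # indices of the top-k experts
            w = scores[t, top] / scores[t, top].sum()  # renormalize gate weights
            for weight, idx in zip(w, top):
                out[t] += weight * (x[t] @ self.experts[idx])
                expert_calls += 1
        return out, expert_calls

layer = ToyMoELayer()
x = rng.normal(size=(5, 8))
y, expert_calls = layer(x)
print(y.shape, expert_calls)  # 5 tokens * top_k=2 -> 10 expert calls, not 5 * 4
```

The compute saving comes from the routing loop: with 4 experts and top_k=2, half the expert parameters are never touched for a given token, which is the same principle that lets a very large MoE model run with a fraction of its parameters active per inference.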
- Training Strategies:
- Three standard stages: unsupervised pretraining, supervised fine-tuning, and reward/preference modeling.
- DeepSeek innovated in data generation:
- Used intermediate “reasoning” models (e.g. DeepSeek-R1-Zero) to generate high-quality synthetic chain-of-thought data for supervised fine-tuning, reducing human effort.
- Automated filtering to select top-quality generated data for final model training, increasing efficiency.
“They used this interim reasoning model to actually help generate…reasoning examples to add into the training data…This allows you to augment your fine tuning data, use less human resources…” —Daniel Whitenack [36:18]
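The filtering step Daniel describes can be sketched as a simple generate-score-keep loop. This is a hypothetical scoring rule for illustration, not DeepSeek's actual pipeline: candidates with a verifiably correct answer and a multi-step reasoning trace survive; the rest are dropped before fine-tuning.

```python
# Toy sketch of automated filtering of synthetic training data.
# The scoring rule (answer correctness + presence of reasoning steps)
# is a stand-in for whatever checks a real pipeline would run.

def score(sample):
    s = 0.0
    if sample["answer"] == sample["reference"]:
        s += 1.0                                             # reward verifiably correct answers
    s += min(len(sample["reasoning"].split("\n")), 5) * 0.1  # reward step-by-step traces
    return s

def filter_samples(samples, threshold=1.0):
    """Keep only candidates whose automated score clears the threshold."""
    return [s for s in samples if score(s) >= threshold]

candidates = [
    {"reasoning": "step 1\nstep 2\nstep 3", "answer": "42", "reference": "42"},
    {"reasoning": "guess", "answer": "41", "reference": "42"},
]
kept = filter_samples(candidates)
print(len(kept))  # 1: only the correct, multi-step sample survives
```

The appeal of this pattern is that the expensive resource (human annotation) is replaced by cheap automated checks, so you can over-generate candidates and keep only the top slice.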
- Model Variants & Distillation:
- Besides the massive 671B-parameter flagship, DeepSeek released multiple distilled models (e.g. “DeepSeek-R1-Distill-Llama-70B”), trained to imitate the base R1 model's output but much smaller and cheaper to run, even on consumer hardware.
- Smaller models (e.g. 8B, 32B) are accessible for laptop-scale or enterprise-hosted deployments.
- The community is already porting DeepSeek distillations to GGUF and other efficient formats for MacBooks and non-GPU use.
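Distillation's core idea is a small student model imitating a large teacher. A minimal sketch of the classic logit-matching objective is below; DeepSeek's distilled variants were reportedly fine-tuned on R1-generated samples rather than on raw logits, but the goal is the same.

```python
import numpy as np

def softmax(z, T=1.0):
    e = np.exp(z / T - np.max(z / T))
    return e / e.sum()

def distill_loss(teacher_logits, student_logits, T=2.0):
    """KL(teacher || student) on temperature-softened output distributions."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return float(np.sum(p * (np.log(p) - np.log(q))))

teacher = np.array([4.0, 1.0, 0.5])
aligned = np.array([3.9, 1.1, 0.4])   # student close to the teacher
untrained = np.array([0.0, 0.0, 0.0])  # uniform, untrained student

print(distill_loss(teacher, aligned) < distill_loss(teacher, untrained))  # True
```

Training the student to drive this loss down transfers the teacher's behavior into far fewer parameters, which is why an 8B or 70B distillation can approximate a 671B model closely enough to be useful on a laptop.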
4. The Broader Impact: What Does DeepSeek Mean for AI?
[43:34 - 49:33]
- Business and Technical Implications:
- Model optionality: Enterprises should prepare for a world with dozens of competitive, accessible LLMs—not just one or two.
- No more model lock-in: Future-proof AI architectures require easy model swappability, monitoring, and robust security across multiple vendor offerings.
- Cost expectations are shifting: It's no longer credible to demand $100M+ model budgets when startups (with the right know-how) can reach parity for roughly $5 million.
- Startups may face a reckoning: Investors will question high valuations and operational costs. The era of "just add more GPUs" may be fading.
- Data curation: Training still demands significant investment in high-quality, well-aligned data (and human input), regardless of low compute cost.
“If you got $5 million sitting around, you could create a best-in-class model…These are going to proliferate very quickly…having kind of model lock in…That’s not going to work out great for you in the long run.”—Daniel Whitenack [44:35]
“Is this the moment…where investors [say]…‘Why do you need 100 million? Why don’t we give you 5 million and see what you can do with it?’”—Chris Benson [46:32]
Notable Quotes & Memorable Moments
Geopolitics & Censorship
- On expected bias in answers from a Chinese LLM:
“I asked it what happened in Tiananmen Square in 1989? And it replied, I'm sorry, I cannot answer that question. I'm an AI assistant designed to provide helpful and harmless responses… just a reminder of the geopolitics of AI…” —Chris Benson [08:21]
On Open vs Closed Model Culture
- Transparency and reproducibility:
“You see…model producers produce ‘technical papers’, but these technical papers don’t share details to where, in theory, you could reproduce this.”—Daniel Whitenack [12:43]
On Model Security Myths
- Separating model vs product:
“When we say ‘model’, that's what we mean. We don't mean the product. And that can be run again with considerations in a secure environment.”—Daniel Whitenack [24:05]
On the Real Cost Story
- Much more behind the scenes:
“There's just so much that's not…known about this. So they really cherry picked what they chose to publish about it.”—Chris Benson [07:54]
On Startups & Budgets
- VCs may rethink what “enough” looks like:
“Maybe…investors [will be] looking at it going, why do you need 100 million? Why don't we give you 5 million and see what you can do with it?”—Chris Benson [46:32]
Important Timestamps for Segments
| Timestamp | Topic/Segment |
|------------|--------------|
| 03:28–07:54 | DeepSeek intro, cost surprise, Open vs Closed models |
| 08:18–13:56 | DeepSeek narratives, team size, hype vs reality |
| 17:56–31:43 | Security, privacy, local vs hosted use, bias/fears |
| 33:04–43:34 | Model architecture, variants, distillation explained |
| 43:34–49:33 | Impact on AI ecosystem, business, VC, data curation |
Resources & Learning Links
- Jay Alammar's “An Illustrated DeepSeek R1” – Visual, accessible explanation of DeepSeek’s architecture and process.
- Daniel’s Blog Post on DeepSeek Security and Privacy – Linked in show notes, addresses myths vs real risks for using DeepSeek in enterprise.
Final Thoughts
DeepSeek highlights just how fast, open, and competitive the LLM space is now becoming. The episode encourages business and technical leaders to rethink both the economics and strategy of AI deployment: anticipate a future full of accessible, swappable models—and prepare for both the risks and opportunities this fresh wave of innovation will bring.
“It's definitely a shock to the general public's perception that there is model optionality out there. There's going to be a proliferation of these models from various different places…”—Daniel Whitenack [14:10]
