Reshaping Workflows with Dell Pro Precision and NVIDIA RTX GPUs
Episode Title: Live from GTC: Train Smarter, Not Bigger with NVFP4
Date: March 18, 2026
Host: Logan Lawler
Guests: Max (AI platform software team, NVIDIA), Hero (Solution Architect, NVIDIA)
Event: Recorded live from NVIDIA GTC 2026
Episode Overview
This episode, hosted by Logan Lawler live from GTC 2026, dives into advances in AI workstation technology and efficiency, centered on the partnership between Dell Pro Precision and NVIDIA. The focus is the new NVFP4 format (NVIDIA's 4-bit floating-point precision) running on the latest Blackwell GPU architecture and its integration into workstation solutions such as the Dell Pro Max. Max and Hero from NVIDIA join to break down how NVFP4 is transforming large language model (LLM) training, delivering significant gains in performance, memory, and energy efficiency—all without sacrificing accuracy.
Key Discussion Points & Insights
1. Introductions and Roles at NVIDIA
- [00:33 - 01:05]
- Max: AI platform software team, supporting training/inference frameworks for better NVIDIA performance; prior focus on chip design.
- Hero: Solution architect who directly helps customers tackle and solve a diverse array of workflow problems.
2. Demonstrating NVFP4 on Blackwell GPUs
- [01:19 - 04:11]
- Demo Focus: Training with the NVFP4 format using the JAX framework on Blackwell, NVIDIA’s latest GPU architecture.
- NVFP4 delivers a 27–40% performance gain over FP8 (8-bit floating-point precision) plus substantial savings in memory and overall system energy.
- Generational energy savings:
- “Save 50% [energy] compared to Hopper generation and 210,000 times compared to Kepler, which is 12 years ago.” — Max [02:02]
- Key techniques enabling NVFP4 advances:
- Stochastic rounding
- Random Hadamard transform
- Two-level scaling
- Higher precision retained for sensitive final layers
Quote:
“We get a 27 to 40% gain on performance using FP4 compared to FP8. And you save the memory…over the years you save 50% [energy] compared to Hopper generation and 210,000 times compared to Kepler.”
— Max ([01:37–02:11])
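The techniques above are only named in the episode; as an illustration, a minimal pure-Python sketch of two of them, stochastic rounding and two-level (per-block plus tensor-wide) scaling, might look like the following. The FP4 grid, scale choices, and function names are assumptions for demonstration only, not the actual NVFP4 specification or Transformer Engine code.

```python
import random

# Hypothetical E2M1-style FP4 magnitude grid (an assumption, not the NVFP4 spec).
FP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def stochastic_round(x, lo, hi):
    """Round x to lo or hi with probability proportional to proximity,
    so the rounding is unbiased in expectation (E[result] == x)."""
    if hi == lo:
        return lo
    return hi if random.random() < (x - lo) / (hi - lo) else lo

def quantize_block(values, global_scale=1.0):
    """Two-level scaling: a per-block scale stacked on a tensor-wide
    (global) scale, then snap each value onto the FP4 grid."""
    amax = max(abs(v) for v in values) or 1.0
    block_scale = amax / (FP4_GRID[-1] * global_scale)
    out = []
    for v in values:
        # Scale into the grid's range; clamp to guard against float drift.
        mag = min(abs(v) / (block_scale * global_scale), FP4_GRID[-1])
        lo = max(g for g in FP4_GRID if g <= mag)
        hi = min(g for g in FP4_GRID if g >= mag)
        q = stochastic_round(mag, lo, hi)
        out.append((q if v >= 0 else -q) * block_scale * global_scale)
    return out
```

In a real trainer the snapped values would be stored as 4-bit codes with the two scales kept alongside; this sketch keeps everything in Python floats for clarity.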
3. Real-world Usability & Open Recipe
- [03:06 - 03:55]
- User Simplicity:
- “Although the techniques look comprehensive, the usage is very easy.”
- “You can just change a single argument, which is a quantization, and it will call the transformer engine to help you get the performance gain.” — Max [03:28]
- Framework & Community Adoption:
- NVFP4 recipes published in the JAX toolbox; used by leading labs and accessible for experimentation.
4. Performance Data & Benchmarking
- [03:55 - 04:35]
- Key Findings:
- On GB300 nodes, FP4 averaged 20% gain over FP8 with “overlapping loss curves” confirming accuracy retention.
- NVFP4 finishes training runs within time budgets that FP8 cannot meet.
- Explanation:
- FP4 step: 0.6 seconds vs. FP8 step: 0.7 seconds, which adds up significantly over thousands of steps.
Quote:
“We collected data…and compare the FP4 and FP8. We still amortize. They have a 20% gain and also the loss that are overlapped, which proves FP4 still keeps the residue without loss entity.”
— Max ([03:59–04:15])
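The per-step figures quoted above compound over a full run; a quick back-of-the-envelope calculation uses the step times from the episode, with the run length assumed purely for illustration:

```python
# Step times quoted in the episode; the run length is an assumed example.
fp8_step_s = 0.7      # seconds per training step at FP8
fp4_step_s = 0.6      # seconds per training step at NVFP4
steps = 100_000       # hypothetical run length

speedup = fp8_step_s / fp4_step_s                       # per-step speedup
hours_saved = (fp8_step_s - fp4_step_s) * steps / 3600  # cumulative savings

print(f"{speedup:.2f}x per-step speedup, {hours_saved:.1f} hours saved")
# → 1.17x per-step speedup, 2.8 hours saved
```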
5. Advantages of NVFP4 Over Higher Precision (FP16, FP32)
- [05:14 - 05:27]
- Hero:
- Summed up simply: “You use much less resource but without losing the accuracy.”
- Two-level scaling preserves accuracy with this reduced-precision format.
- Customer Takeaway:
- Businesses gain better resource efficiency for both training and inference, maintaining competitive edge without hardware overhauls.
Quote:
“The advantage is that you use much less resource but without losing the accuracy. We use two-level scaling to maintain the accuracy as much as possible.”
— Hero ([05:14–05:27])
Memorable Moments & Notable Quotes
- Host’s humor about unsolvable hotel shortages in San Jose:
- “The only problem you’re not solving is the lack of hotel rooms inside San Jose. Unfortunately that’s just a joke.” — Logan ([01:05])
- Summary close:
- “FP16 is yesterday’s news. Unfortunately, if you haven’t checked it out, NVIDIA FP4 runs on the Dell Pro Max with GB10 or Spark, or the GB300, also the data center products.” — Logan ([05:30])
Timestamps for Key Segments
- 00:33–01:05: Guest introductions, background roles at NVIDIA
- 01:19–03:55: NVFP4 demo on Blackwell GPUs, technical strategies, performance and efficiency discussion
- 03:55–04:35: Benchmark data and real-world testing, time savings
- 05:14–05:27: Plain-language customer advantages of FP4 vs. FP16/FP32
- 05:30: Host’s closing and product call-out
Episode Takeaways
- NVFP4 ushers in a new era of AI training: substantial throughput increases, reduced memory and energy use, and accuracy maintained relative to higher-precision formats.
- Implementation is streamlined: Recipes are easy to adopt and integrate into existing workflows, especially with JAX.
- Dell Pro Precision x NVIDIA Blackwell delivers: future-ready hardware efficiency and cutting-edge AI capability for both enterprise and research customers.
