Podcast Summary: Live from GTC: Write Once and Run Anywhere with NVIDIA's cuTile
Podcast: Reshaping Workflows with Dell Pro Precision and NVIDIA RTX PRO GPUs
Host: Dell Technologies AI Factory with NVIDIA (Host: Logan Lawler)
Date: March 18, 2026
Guests: Rohit and Sash, Tech Marketing, NVIDIA
Episode Overview
This episode, recorded live at the NVIDIA booth at GTC 2026, looks at how the new Dell Pro Precision workstation line, paired with NVIDIA RTX PRO GPUs, changes AI and high-performance computing workflows. Host Logan Lawler discusses NVIDIA’s cuTile, a new Python DSL, with guests Rohit and Sash from NVIDIA. The conversation centers on code portability: writing high-performance kernels once on a desktop device such as the GB10 (aka “Spark”) and running them anywhere, from the desk to the data center or cloud, across NVIDIA hardware.
Key Discussion Points & Insights
1. Seamless Hardware-Agnostic Development with cuTile (Tile Programming Language)
- cuTile as a Game-Changer
  - cuTile allows developers to write kernels (compute code) once and run them on any NVIDIA hardware, from desktop to data center (GB10, H100, B200, GB300, etc.).
  - It is a Python-based Domain-Specific Language (DSL) for kernel development.
  - Eliminates the need for hardware-specific code tweaks.
  "You can write kernels on the GB10 locally. And then the special feature about cuTile is that it’s completely agnostic of hardware, so you can literally run the same code on any of the cloud machines...and you will see almost a linear performance improvement as you increase petaflops."
  — Rohit [02:05]
- Easy Onboarding, Fast Results
  - Getting started with cuTile takes very little time (“within like 10 minutes”).
  - Designed for productivity and rapid prototyping on local workstations.
- Enables True "Write Once, Run Anywhere" Workflow
  - Previously, updating code for every new hardware generation required manual changes.
  - Now, the cuTile compiler converts written kernels to optimal code for any targeted NVIDIA hardware automatically.
  "Before cuTile...you would have to do a little bit of changes to your code to kind of include new hardware features which would not be automatically included...But with cuTile...you don’t really have to do any code changes. It’s performant out of the box on any NVIDIA hardware that you run."
  — Rohit [04:06]
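The "write once" idea described above can be illustrated with a small, hardware-free sketch. This is plain Python and deliberately not the cuTile API: the names `add_tile`, `launch`, and `TILE_SIZE_BY_TARGET`, along with the target labels and tile sizes, are hypothetical and exist only for illustration. The kernel body is written once in terms of whole tiles, and a separate launcher decides how the data is split for each target, which is roughly the role the guests attribute to the cuTile compiler.

```python
# Conceptual sketch only: plain Python, NOT the cuTile API.
# All names, targets, and tile sizes below are hypothetical.
from typing import Callable, Dict, List

Tile = List[float]

# Hypothetical per-target tile sizes. In the workflow described in the
# episode, a compiler would pick such parameters, not the developer.
TILE_SIZE_BY_TARGET: Dict[str, int] = {"small_gpu": 4, "large_gpu": 16}


def add_tile(a_tile: Tile, b_tile: Tile) -> Tile:
    """Kernel body, written once: add two tiles element by element."""
    return [x + y for x, y in zip(a_tile, b_tile)]


def launch(
    kernel: Callable[[Tile, Tile], Tile],
    a: List[float],
    b: List[float],
    target: str,
) -> List[float]:
    """Split the inputs into target-sized tiles and apply the same kernel."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    tile_size = TILE_SIZE_BY_TARGET[target]
    out: List[float] = []
    for start in range(0, len(a), tile_size):
        # Slicing handles a shorter final tile automatically.
        out.extend(kernel(a[start:start + tile_size], b[start:start + tile_size]))
    return out
```

Calling `launch(add_tile, a, b, "small_gpu")` and `launch(add_tile, a, b, "large_gpu")` returns the same values; only the tiling changes, and the kernel source does not.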
2. Real-World Workflow: Desk to Data Center to Cloud
- Workflow Overview
  - Developers can begin work on a device like the GB10 (“personal supercomputer”), iterate and fine-tune locally, then deploy to enterprise-class systems or cloud environments without re-coding.
- Supported Hardware Stack
  - cuTile supports GB10, H100, B200, GB300, and other major NVIDIA hardware platforms.
  "You can write once on a...on a GB10 and it can deploy to [H100] or B200 directly, automatically, the compiler, and underneath, it converts your kernel to whatever the hardware you want in its most optimized fashion."
  — Sash [03:26]
3. Industry Impact & Future-Ready Innovation
- Efficiency and Future-Proofing
  - With cuTile, teams avoid duplicated effort across hardware generations.
  - Performance scales nearly linearly with more powerful hardware, offering a clear growth path as projects move from prototypes to production.
- Launch Context
  - cuTile launched around six months before this episode; previously, NVIDIA’s CUDA required more manual code maintenance to leverage new features.
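To make the "nearly linear" scaling claim concrete, here is a rough back-of-the-envelope sketch. It assumes runtime is inversely proportional to peak throughput, scaled by an efficiency factor; the function name `projected_runtime`, the `efficiency` parameter, and any numbers passed to it are hypothetical and not figures from the episode.

```python
def projected_runtime(
    baseline_seconds: float,
    baseline_pflops: float,
    target_pflops: float,
    efficiency: float = 1.0,
) -> float:
    """Estimate kernel runtime on a faster target under a linear-scaling assumption.

    efficiency = 1.0 models perfectly linear scaling; smaller values model
    the "nearly linear" case. All inputs are illustrative placeholders.
    """
    if baseline_seconds <= 0 or baseline_pflops <= 0 or target_pflops <= 0:
        raise ValueError("times and throughputs must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    ideal_speedup = target_pflops / baseline_pflops
    return baseline_seconds / (ideal_speedup * efficiency)
```

For example, under this assumption a kernel that takes 100 seconds on a baseline device would take `projected_runtime(100.0, 1.0, 4.0)`, that is 25.0 seconds, on hardware with four times the throughput. Real results depend on the kernel and would need to be measured.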
Notable Quotes & Memorable Moments
- On the transformative nature of the GB10 and cuTile:
  “I'm not going to say [the GB10] has changed the face of modern AI development, but we're going to go ahead and say it did. That's not my words, that's other people's words.”
  — Logan, Host [01:24]
- cuTile's value proposition, summarized:
  “Try out cuTile. It's really easy. It's a tile programming language. You can get started within like 10 minutes. And once you write all your kernels in cuTile, you don't have to worry about writing them ever again, unless you want to change the logic.”
  — Rohit [02:05]
- On the deployment stack’s flexibility:
  “At the end of the day, you can write once on a GB10 and it can deploy to [H100] or B200 directly...the compiler...converts your kernel to whatever the hardware you want in its most optimized fashion.”
  — Sash [03:26]
Timestamps for Key Segments
- 00:19 – Host introduction, brief overview of new Dell and NVIDIA announcements, purpose of the episode.
- 01:03 – Guest introductions: Rohit and Sash from NVIDIA Tech Marketing.
- 02:05 – Rohit explains the concept and benefits of cuTile.
- 03:26 – Sash details deployment and stack compatibility; explanation of deployment on different hardware (H100, B200).
- 04:06 – Rohit provides context on the launch of cuTile and how it contrasts with previous approaches.
- 04:33 – Discussion of the full desk-to-data-center workflow enabled by Dell+NVIDIA solutions.
Final Thoughts
This episode underscores NVIDIA and Dell’s commitment to streamlining and democratizing AI development. With cuTile and the new generation of Pro Precision workstations, developers can rapidly prototype locally and deploy seamlessly across a full spectrum of powerful hardware, maximizing productivity and minimizing code maintenance. The conversation captures both technical insights and the excitement of live innovation at GTC 2026.
