Summary

Podcast Summary: Live from GTC: Write Once and Run Anywhere with NVIDIA's cuTile

Podcast: Reshaping Workflows with Dell Pro Precision and NVIDIA RTX PRO GPUs
Host: Dell Technologies AI Factory with NVIDIA (Host: Logan Lawler)
Date: March 18, 2026
Guests: Rohit and Sash, Tech Marketing, NVIDIA

Episode Overview

This episode, recorded live from the NVIDIA booth at GTC 2026, explores how modern workstation innovation, specifically the new Dell Pro Precision line paired with NVIDIA RTX PRO GPUs, transforms AI and high-performance computing workflows. Host Logan Lawler digs into NVIDIA’s cuTile, a new Python DSL, with guests Rohit and Sash from NVIDIA. The discussion centers on the promise of code portability: writing high-performance kernels once on a GB10 desktop device (such as the NVIDIA DGX Spark) and deploying them anywhere, from the desk to the data center or cloud, on other NVIDIA hardware.

Key Discussion Points & Insights

1. Seamless Hardware-Agnostic Development with cuTile (a Tile Programming Language)

cuTile as a Game-Changer
- cuTile allows developers to write kernels (compute code) once and run them on any NVIDIA hardware, from desktop to data center (GB10, H100, B200, GB300, etc.).
- It is a Python-based Domain-Specific Language (DSL) for kernel development.
- Eliminates the need for hardware-specific code tweaks.
"You can write kernels on the GB10 locally. And then the special feature about cuTile is that it’s completely agnostic of hardware, so you can literally run the same code on any of the cloud machines...and you will see almost a linear performance improvement as you increase petaflops."
— Rohit [02:05]
Easy Onboarding, Fast Results
- Getting started with cuTile takes very little time (“within like 10 minutes”).
- Designed for productivity and rapid prototyping on local workstations.
Enables True "Write Once, Run Anywhere" Workflow
- Previously, updating code for every new hardware generation required manual changes.
- Now, the cuTile compiler converts written kernels to optimal code for any targeted NVIDIA hardware automatically.
"Before cuTile...you would have to do a little bit of changes to your code to kind of include new hardware features which would not be automatically included...But with cuTile...you don’t really have to do any code changes. It’s performant out of the box on any NVIDIA hardware that you run."
— Rohit [04:06]
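
To make the tile idea above concrete, here is a minimal conceptual sketch in plain Python. It is not the cuTile API and does not run on a GPU. All names in it (TILE_SIZES, add_tile_kernel, launch) and the tile sizes are hypothetical, chosen only to illustrate how kernel logic written per tile can stay unchanged while the tile size is picked per target.

```python
# Conceptual sketch only: plain Python, NOT the cuTile API.
# All names here (TILE_SIZES, add_tile_kernel, launch) are hypothetical.

# Hypothetical per-target tile sizes. In a real tile compiler this choice
# would be made by the toolchain, not by the kernel author.
TILE_SIZES = {"desktop": 4, "datacenter": 16}


def add_tile_kernel(a, b, out, tile_index, tile_size):
    """Add one tile of a and b into out. The logic never mentions hardware."""
    start = tile_index * tile_size
    stop = min(start + tile_size, len(a))  # the last tile may be partial
    for i in range(start, stop):
        out[i] = a[i] + b[i]


def launch(a, b, target):
    """Run the same kernel over every tile, using the target's tile size."""
    tile_size = TILE_SIZES[target]
    out = [0] * len(a)
    num_tiles = (len(a) + tile_size - 1) // tile_size  # ceiling division
    for tile_index in range(num_tiles):
        add_tile_kernel(a, b, out, tile_index, tile_size)
    return out


a = list(range(10))
b = [10 * x for x in a]
desktop_result = launch(a, b, "desktop")
datacenter_result = launch(a, b, "datacenter")
```

Both calls use the same kernel function and produce the same result. Only the tile size chosen by the hypothetical launcher differs, which is the property the guests describe for cuTile kernels across NVIDIA hardware.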

2. Real-World Workflow: Desk to Data Center to Cloud

Workflow Overview
- Developers can begin work on a GB10 device (described as almost a “personal supercomputer”), iterate and fine-tune locally, then deploy to enterprise-class systems or cloud environments without re-coding.
Supported Hardware Stack
- cuTile supports GB10, H100, B200, GB300, and other major NVIDIA hardware platforms.
"You can write once on a GB10 and it can deploy to H100 or B200 directly, automatically, the compiler, and underneath, it converts your kernel to whatever the hardware you want in its most optimized fashion."
— Sash [03:26]

3. Industry Impact & Future-Ready Innovation

Efficiency and Future-Proofing
- With cuTile, teams avoid duplicated effort across hardware generations.
- Performance scales nearly linearly with more powerful hardware, offering a clear growth path as projects move from prototypes to production.
Launch Context:
- cuTile launched around six months before this episode; before that, CUDA kernels needed manual code changes to take advantage of new hardware features.
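
As a rough illustration of the near-linear scaling claim above, the sketch below estimates speedup from the ratio of peak throughput. The function name, the scaling_efficiency parameter, and the example numbers are hypothetical placeholders in relative units, not measured or published figures for any NVIDIA system.

```python
def estimated_speedup(base_peak, new_peak, scaling_efficiency=1.0):
    """Estimate speedup when moving the same kernel to faster hardware.

    base_peak and new_peak are peak throughputs in the same (relative) unit.
    scaling_efficiency is a hypothetical factor: 1.0 means perfectly linear
    scaling, values below 1.0 model "almost linear" scaling.
    """
    if base_peak <= 0 or new_peak <= 0:
        raise ValueError("peak throughput must be positive")
    if not 0.0 < scaling_efficiency <= 1.0:
        raise ValueError("scaling_efficiency must be in (0, 1]")
    return (new_peak / base_peak) * scaling_efficiency


# Placeholder values: a target with 8x the peak throughput of the desktop
# device, and an assumed 90% scaling efficiency.
linear = estimated_speedup(1.0, 8.0)
near_linear = estimated_speedup(1.0, 8.0, scaling_efficiency=0.9)
```

With perfectly linear scaling the speedup equals the throughput ratio. An efficiency slightly below 1.0 is one simple way to model the "almost linear" behavior described in the episode.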

Notable Quotes & Memorable Moments

On the transformative nature of the GB10 and cuTile:

“I'm not going to say [the GB10] has changed the face of modern AI development, but we're going to go ahead and say it did. That's not my words, that's other people's words.”
— Logan, Host [01:24]
cuTile's value proposition, summarized:

“Try out cuTile. It's really easy. It's a tile programming language. You can get started within like 10 minutes. And once you write all your kernels in cuTile, you don't have to worry about writing them ever again, unless you want to change the logic.”
— Rohit [02:05]
On the deployment stack’s flexibility:

“At the end of the day, you can write once on a GB10 and it can deploy to H100 or B200 directly...the compiler...converts your kernel to whatever the hardware you want in its most optimized fashion.”
— Sash [03:26]

Timestamps for Key Segments

00:19 – Host introduction, brief overview of new Dell and NVIDIA announcements, purpose of the episode.
01:03 – Guest introductions: Rohit and Sash from NVIDIA Tech Marketing.
02:05 – Rohit explains the concept and benefits of cuTile.
03:26 – Sash details deployment and stack compatibility; explanation of deployment on different hardware (H100, B200).
04:06 – Rohit provides context on the launch of cuTile and how it contrasts with previous approaches.
04:33 – Discussion of the full desk-to-data-center workflow enabled by Dell+NVIDIA solutions.

Final Thoughts

This episode underscores NVIDIA and Dell’s commitment to streamlining and democratizing AI development. With cuTile and the new generation of Pro Precision workstations, developers can rapidly prototype locally and deploy seamlessly across a full spectrum of powerful hardware, maximizing productivity and minimizing code maintenance. The conversation captures both technical insights and the excitement of live innovation at GTC 2026.

Transcript

[00:05]
A
Welcome to reshaping workflows with Dell Pro Precision and Nvidia, where innovation meets real world impact in high performance computing.
[00:20]
B
It's Logan live again from GTC 2026 in the Nvidia booth and we've got a great little mini episode today. I'm with Rohit and I'm with Sash. I have to say it because the name, it's so good. It's like a karate chop. I love it. So we're in the Nvidia booth. I mean, we've obviously talked a lot about Spark, the Nvidia DGX Spark, or the Dell, the Dell Pro Max with GB10, or our new launch that's over in the Dell booth, which is the GB300. The Dell Pro Max GB300. And a lot of people ask questions about how do you ultimately do development work, do fine tuning and then you deploy that to the data center or the cloud very quickly. Right. So that's what we're going to talk about today. So before we get started, I'm going to give it to Rohit and then Sash to let them introduce themselves and then we'll get it kicked off.
[01:04]
C
Hey, Logan, thanks for your time. I'm Rohit. I work as tech marketing at Nvidia and I'm with Saish.
[01:12]
D
Thanks, Logan. And yeah, I'm a teammate of Rohit here and I work at tech marketing too. We work on the marketing front of things. How we, you know, showcase a product. How do you advertise them and that.
[01:24]
B
So we'll start with Rohit for the first question is, you know, obviously last year Jensen announced the, you know, the Nvidia Spark. The GB10 device has been very popular. I'm not going to say it's changed the face of modern AI development, but we're going to go ahead and say it did. That's not my words, that's other people's words. But in the end of the day, that system's really designed for maybe a one, mostly a one to one, maybe a two to one, depending on, you know, the workload. Right. But ultimately people want to bring that work down to the desk, you know, down to the desk side, then ultimately the data center or the cloud. And people always ask, how do you do that? So let's start here. Like tell us a little bit about the demo that you're showing from taking that work, you know, on a Spark up to the data center or the cloud.
[02:05]
C
So I think, Logan, you came to the perfect booth. What we are showcasing here is we take cuTile, which is a new Python DSL in which you can write code kernels. The idea is that you can write kernels on the GB10 locally. And then the features. The special feature about cuTile is that it's completely agnostic of hardware, so you can literally run the same code on any of the cloud machines, including an H100, GB200, GB300, whatever, and you will see almost a linear performance improvement as you increase petaflops. So, as you were rightly pointing out that the GB10 has been very popular and it's almost a personal supercomputer, let's say that. But making it actually work for the data center is what we're trying to solve here. Try out cuTile. It's really easy. It's a tile programming language. You can get started within like 10 minutes. And once you write all your kernels in cuTile, you don't have to worry about writing them ever again, unless you want to change the logic.
[03:08]
B
Okay, great. So I think a very good explanation. So, Sash, a question for you, right, you start on the GB10. We've talked about cloud, we've talked about data center. How does this work? Can we deploy on H100s, H200s? Does it need to be on GB200s? Like, what's the stack that it can ultimately be deployed to?
[03:27]
D
So basically what cuTile enables you is it's basically just a way of writing a kernel, right? So at the end of the day, you can write once on a GB10 and it can deploy to H100 or B200 directly, automatically, the compiler, and underneath, it converts your kernel to whatever the hardware you want in its most optimized fashion.
[03:46]
B
Your kernel is written in quick and concise. I love that. So let me ask, within kind of this whole deploying from the GB10 to data center or cloud, has there been any new launches specifically at this GTC, or is this something that's existed for a while now since you launched the product?
[04:06]
C
Oh, that's a great question. I think cuTile released like six months ago, and before that, you would have to do a little bit of changes to your code to kind of include new hardware features which would not be automatically included when you write CUDA kernels. Right. But with cuTile, which came out six months ago, and we're demoing it at GTC now, you don't really have to do any code changes. It's performant out of the box on any Nvidia hardware that you run.
[04:33]
B
I love that. And I mean it's truly a desk side to data center to cloud story. Right? So Sash, Rohit, really appreciate the time. If you haven't checked it out, one, you need to definitely get your Spark or your GB10 device ultimately, if you're doing any sort of, you know, development work at the desk side, local, secure, fast, plenty of VRAM, and then ultimately being able to deploy it. So with that, we'll see you on the next one. Do what you want, what you want.
[05:15]
A
This podcast was produced in partnership with Amaze Media Labs.