B
Hello, and welcome to the Nvidia AI Podcast. I'm your host, Noah Kravitz. Right now, the world is watching AI evolve faster than ever before. And that progress isn't just being fueled by technological breakthroughs in scale. It's being fueled by human collaboration. Open source models, open data sets, and shared research are giving developers, enterprises and governments the building blocks they need to innovate together. Nvidia has been part of this movement from the very beginning, contributing open libraries, publishing data sets and research, and most recently, sharing families of open models. Which brings us to today's episode. We're talking about Nemotron, specifically unlocking the secret of Nemotron. On the surface, Nemotron may look like just another open model family, but the real story is how it anchors Nvidia's strategy for building accelerated infrastructure and driving increased adoption of AI everywhere. Joining us to unpack this open secret are two of the leaders driving this work forward. Brian Catanzaro is Vice President of Applied Deep Learning Research at Nvidia, and Jonathan Cohen is Vice President of Applied Research at Nvidia. Brian and Jonathan are here today to talk Nemotron. I can't wait. Gentlemen, welcome to the AI podcast. Thank you so much for making the time to join us.
C
Thank you for having us.
A
Yeah, it's great to be here.
B
So let's start at the top, and I'll direct this one to you, Brian, to get us going, if that's all right. What is Nemotron? And as a follow-up, why did Nvidia decide to build its own family of models when you already work with essentially every major model builder out there?
A
Nemotron is Nvidia's open technology for artificial intelligence. Nemotron includes models that we train. It also includes data sets that we release, as well as algorithms and methodologies. And our goal with Nemotron is to support the community in building customizable AI that can be integrated deeply and tightly into the beating heart of every business around the world. Our second goal with Nemotron is to help Nvidia design systems for deploying and constructing AI. There's a lot of questions about how AI works that touch the various design decisions that go into building Nvidia's software and hardware systems. And we can answer those questions better because we build Nemotron. So, you know, ultimately we're excited to open up Nemotron even further and continue to put it out there for the community. We love learning from the community. Nemotron is built in collaboration with the community, where we learn a lot from what others are doing in the community and then we try to contribute what we can back. We think that this is a great opportunity for Nvidia to support the AI industry.
C
Yeah. So Nemotron is a collection of large language models and it's probably worth saying, so they're 10 text models and multimodal LLMs and we've kind of settled on like three sizes, or we think of them as weight classes. So we have smaller models that we call Nano models, we have medium sized models we call Super models, and then we have the largest frontier size models which we call Ultra. So Nemotron collectively refers to everything Brian said and then this family of models that were 12.
B
And so how does Nemotron fit into Nvidia's broader AI strategy? Because from what I understand, it's not just the models, and I say "just" knowing the models are huge, but it's kind of a cornerstone for growing the ecosystem. Yeah.
C
Well, if you think of Nvidia as an accelerated computing platform company and you ask the question, well, what does an accelerated computing platform mean in this age of AI? So it includes chips, it includes networking, it includes software stack, but it also includes the models. And when we think about what is a platform today, a platform is all of those components. And if you're building AI applications and you care about the quality of the models, but you also care about the performance. Like Brian mentioned, one of the reasons that we train Nemotron models is so we can learn. We are pushing the limits ourselves so that we can learn and make sure that our platform is the best. But it also means we can do co-design. We can cooperatively design the model architecture, the software stack and the hardware, across all of the hardware components, all together. And we've been doing that, and that gives us opportunities to make things more efficient, lower latency, higher throughput, more energy efficient by improving things across that entire stack up into the model architecture. And so Nemotron is a really important part of that strategy as an accelerated computing platform company, where our success comes from this full stack co-design and optimization.
A
Yeah, one thing I wanted to add to that is that these days there are new things that are part of accelerated computing that maybe people haven't considered. So for example, data sets that we use for pre-training and post-training models have a dramatic effect on how quickly the model converges. In fact, you know, comparing different revisions of our Nemotron pre-training set, we've accelerated pre-training by a factor of 4x just by having a smarter pre-training data set, which means that you can actually train a much smarter model with the same amount of compute.
B
Yeah. What makes one data set better, more optimized to help the model converge faster than another?
A
But what we're trying to do with LLMs is build something intelligent that can help us solve problems. It can answer questions, it can reason. It turns out that if you just take all of the text that humankind or computers have ever produced on the Internet and train an LLM on it, that's kind of where the community started many years ago. But it turns out that's not the most intelligent way of building AI, because a lot of that text isn't adding very much intelligence. And so every organization that builds LLMs spends an enormous amount of effort and compute in understanding their data set, refining it, rephrasing it, using synthetic data generation. And the effort that we put into these data sets has an enormous impact on how quickly the models train and also on the overall strength of the model once it's trained. And so these days, I believe that the data sets that we release as part of Nemotron are an important part of Nvidia's accelerated computing efforts, because it's not really possible to think about how fast a system is for training. If you're training on a data set that's not very smart, it's going to take an enormous amount of compute to get to the same amount of intelligence as if you were training on a data set that was much more polished. And so that's kind of the genius of accelerated computing, is that we try to understand the problem from first principles and we try to optimize the entire stack end to end. And these days it seems pretty important that Nvidia's accelerated computing platform includes Nemotron.
B
Right.
C
I can give another example that's interesting.
A
Inference time.
C
So reasoning. The way these models reason is they generate thinking tokens, right? You ask it a question and then it generates a lot of tokens as it thinks through the answer. And there are very clear examples where you can generate a lot of tokens and not actually make a lot of progress towards the answer. Or you can be more efficient, generate fewer tokens and make more progress. And again, from the same perspective of accelerated computing, you don't really care, you know, did it generate 10,000 tokens? Well, you do care. You care if you can generate, you know, the same quality answer in 2,000 tokens instead of 10,000 tokens. That's functionally a 5x speedup. And so that's also part of the accelerated computing story. So all of these are opportunities we have to make things faster.
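[Editor's illustration] The 10,000 versus 2,000 token comparison above can be sketched as simple arithmetic. This is a minimal sketch that assumes every generated token costs the same amount of time, which real inference systems only approximate; the function name `token_speedup` is hypothetical and not part of any Nvidia software.

```python
def token_speedup(baseline_tokens: int, efficient_tokens: int) -> float:
    """Ratio of tokens generated, assuming a constant cost per token."""
    if efficient_tokens <= 0:
        raise ValueError("efficient_tokens must be positive")
    return baseline_tokens / efficient_tokens


# Figures quoted in the conversation: same answer quality, fewer thinking tokens.
speedup = token_speedup(10_000, 2_000)
print(f"{speedup:.0f}x fewer tokens for the same answer")  # prints "5x fewer tokens for the same answer"
```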
A
Exactly. Accelerated computing has never just been about how many arithmetic operations per second you can perform. It's really about what capabilities do you provide.
C
Yeah, yeah. And I think the key to Nvidia's historic success is as a company we've always focused very much and had deep expertise on the actual end applications people care about. Whether it was computer graphics or high performance computing or deep learning or now modern AI. It's really thinking about what's the end goal and how do you build a platform that gets you that goal with the least amount of time you have to wait. Lowest latency, highest.
B
You talked some about the openness, the collaboration and co-design that's such an important piece of this. Open source is a big part of Nemotron. What does it mean? And maybe Brian, I'll ask you this first, but either of you, what does it mean to call Nemotron one of the most open AI development efforts that we've ever seen?
A
Well, we really think that it's important for AI to be trusted and widely deployed. And in order for that to happen, we think it's important that enterprises have the option to understand the data sets and the technologies behind AI and fine-tune them for their own problems and then integrate them very tightly into the software and systems that they use to solve problems for their markets. We think AI is not a one size fits all solution. And we've seen in the past many instances of when open platform technologies really allow different industries to differentiate different solutions for the problems they face. You know, for example, the Internet as an open technology had really different implications for different industries like healthcare versus retail. The way that those organizations use the Internet to change the work that they do was quite different. But the fact that the Internet was an open technology allowed many companies, many industries to think about solving their problems in a new way using the Internet. And when we think about AI, it seems obvious that enterprises need that ability as well. You know, the world's most important and valuable data always has the most sensitivity about it. And so we think it's important to support enterprises as they learn how to deploy AI, that they can do it in a way that respects their work, their privacy, the important ways that they go about problem solving in sort of a unique way for their business. And so we think that it's really important that there exists an open foundation for organizations around the world to build and deploy AI. And Nemotron is how we're contributing to that.
C
I can just add one thought to that. From the perspective of accelerated computing, think about it this way: we come up with some way to make a chip faster. How does the world consume the benefits of that acceleration? Well, in the case of a chip, you buy a chip and you get the benefits. But what if we come up with a technique that makes models more efficient at thinking or a data set mix that saves you time at training? How does the rest of the world receive the benefits of that? In what form do you package it? I think the only answer is we have to teach everyone what we did by sharing it through open source, open weight models, sharing the data sets, explaining how they work, sharing the algorithm. So I think it's natural that open source is a delivery mechanism for the technology that's going into our platform.
B
So, for a little bit of a hypothetical: I'm an IT leader or a business leader at an organization and I'm hearing what you guys are saying and we want to do this. We have specialized needs in our industry and we have troves of our data that kind of represent company intelligence and our special way of doing things that has brought us success, and we're ready to embrace the AI age and transform. We could use Nemotron, and I'm going to walk through this, so correct me where I get this wrong. We could use Nemotron to take an open source model and we could customize it, train it on our company data and the rest, industry data, things to help it understand what we do and the problems we're trying to solve in our industry. Nemotron could help. We could add reasoning capabilities and that sort of thing to it. And then we have a kind of, and I don't want to misuse the term sovereign, and if you guys want to talk about sovereign AI, but we would then have our own sort of customized model, adapted to our business, our industry, the way we do things, our data. And it's ours because we took an open model, we trained it, and so we don't have to worry about the sensitive data being out in some commercial model somewhere or what have you, because it's our model now.
C
I think that's one aspect.
B
Close.
C
Yeah, that's one. I mean, there are many aspects. So for example, you might say, Nvidia trains a model, a Nemotron model, and it's great, but since you've disclosed all your training data and we've looked at your training data, for whatever reason, we have some policies where there's some data we can't use. And we could say, that's fine. Everything you need to reproduce what we did is there. You can train your own model, excluding that data. Or you say, well, I like the data, but the mix is wrong. I don't know, I'm a sovereign project and it really needs to be very good at speaking this language and understanding this culture. And that data wasn't as represented in your training set as I want it to be. Everything that we did is transparent. And so you can make these modifications yourself. I mean, that's one aspect.
B
Fantastic. Yeah, right. Nvidia's released data sets, recipes, alignment techniques alongside the models. So along these same lines of building trust and transparency, why is all of that important? Why is this full level of transparency important for the end users, you know, to be able to customize and deploy safely?
A
Well, I think ultimately, if you don't know what's in a technology, it's harder to trust it. And every business has different ways of thinking about the problems they're solving. They have different problems. And I think it's important as we get more sophisticated about deploying AI and we integrate it more tightly into business problems around the world, for businesses to be able to inspect. You know, how is this AI built? And, you know, therefore I can build trust that it's going to help my business solve problems. You know, the integration is a, is a really important point as well. So with Nemotron models, there's a really broad spectrum of integration. You can run it locally on a machine without any Internet. You could also run it through an API in the cloud and everything in between. You can deal with your business's sensitive data using the same data management and security protocols that your business already has. And I think for a lot of applications of AI, that level of customizability and introspection is going to be essential. I also want to say that I think there's a real big benefit to open technologies in the sense that they tend to develop faster. So Nvidia believes that helping AI grow creates opportunity for us. And we think that one of the best ways of helping AI grow is to contribute in an open way to the community. I think when you consider a technology that's being developed kind of independently by a few different organizations, but they're not able to share very much about what they're doing, there's obviously going to be a lot of reinvention that has to happen and the progress is going to be slower. And so if we are able as a, as a, as a community to come together, you know, to contribute ideas, data, models to each other and learn from each other. I think that we'll progress faster. And you know, we've seen that over the past couple years as various organizations have been contributing to the open technologies for AI. 
It's really helped the community move forward. And you know, like for example, OpenAI just released gpt-oss. That was a fantastic thing for the field. Alibaba has been doing some great work with Qwen models. Obviously, Meta's family of Llama technologies has been extraordinarily helpful to the field to help the field grow and develop. And at Nvidia, we know that when AI grows, it's opportunity for everyone. It's opportunity for businesses that they can solve new problems. And it's opportunity for us because we work with every business that's building AI.
C
Yeah, I mean, a good example of that playing out is our own research groups. Often, if you have some idea for a way to improve a model, we will just take one of the existing open weight models, not necessarily Nemotron, whichever gives you the best vehicle for trying out your idea.
B
Right.
C
Improve it in some way and publish a paper or release the result. Right. So like we are building on all the work from these other organizations that release open weight models all the time as well.
B
And that's, you know, this is no news to you guys or probably many listeners to the show, but that same sentiment has been echoed so many times over, I mean, over the past couple of years in particular by guests we've had from all industries and walks of research and life. And you know that the more we're collaborating, the faster we move as a whole.
C
Yeah.
B
Our guests today are Brian Catanzaro and Jonathan Cohen. They're both from Nvidia. Brian is vice president of Applied Deep Learning Research, while Jonathan serves as vice president of Applied Research. And they're here talking to us about Nvidia's Nemotron family of open models and open technology. We've been talking about the importance of open technologies to the AI community in general, to Nvidia, the learning that goes into informing really the whole stack, the hardware, the models, the software, the connectivity, networking, everything, and the data sets, as Brian was talking about, and how it all really comes together to make things advance faster and more efficiently, sort of broadly speaking. Nemotron has been a huge effort at Nvidia with many teams working together, and they still are, to bring this to life, from advanced research to commercially licensed models and data sets. Now, can you guys talk about the pipeline from research to production models? What that's like, what it's been like for Nemotron.
A
Well, it is a huge effort and it takes a lot of people with different talents coming together to build Nemotron. We've organized the project around basically the different stages of development that a model has to go through, pre training, post training, alignment and so forth, as well as different functional areas like for example, long context recall or image understanding. So within each of these areas we have multiple teams working together, some of which are very researchy, very theoretical and others are very engineering focused and then whole spectrum in between. I would say it's a great honor to be part of a project where people are coming together to build something like this. It's also a big challenge, you know, trying to get so many brilliant minds pointed in the same direction. I think that's one of the central challenges facing every AI development effort around the industry these days, is how do we work together to build one amazing thing as opposed to building a hundred small things. And that's really something that's been inspiring to watch come together.
C
Yeah. If you compare it with a large scale software effort, there's this famous observation called Conway's Law, which is that the communication patterns that are observed within a piece of software tend to mirror the communication patterns of the organizational structure that built that software. And training a model is like, I mean, Conway's law is definitely an issue, but it's just a very different endeavor. It's not like I build a module and you build a module and we have a nice clean interface. Somehow all of these things have to get combined together. You know, to take Brian's example, image understanding and long context recall somehow all have to get combined together into a single training recipe and a single data set mix. And so the modularity is, I think, less than in software engineering. And so this idea that you can just decompose it and have lots of teams with sort of clean interfaces between them doesn't really work as well. And so I think there's a real struggle in scaling up an effort like this to a very large team to do something really big.
B
Is there a new paradigm emerging?
C
It's an interesting question.
B
Organizing. Yeah.
C
I wonder if over the next five, ten years there'll be some new law named after someone and some, you know, management principle here. It's an interesting thing we've certainly been thinking about, but it does present these challenges. I think one of the most important principles that we've kind of settled on is you just need a lot of internal openness and transparency. You have to solicit ideas. There are a lot of people across the company and outside of the company working on all these problems. You have to solicit all these ideas and you have to encourage them all to work together. That's the only way forward. And so that just takes a very mature culture and, you know, good leadership and egoless operation, and everyone being really motivated, at the end of the day, by the work.
A
I would say also that one of the amazing things about AI is that it's such a general technology that it really changes the way that we do AI. You know, it used to be, like 20 years ago when I was a grad student, that it was common for people to build state of the art models in computer vision on their own. Like one graduate student on their own could build a model that was state of the art in some important area of computer vision. And you know, that's kind of how we were trained as PhD students: go be brilliant on your own. Well, with modern AI, the best results come from using industrial scale equipment and, you know, general models that can then be taught how to solve important problems. But that requires working together. So one of the first things that AI has changed is the development of AI itself, and organizations that can figure out how to collaborate and work together succeed. And you know, that's one of the reasons also that we really believe in Nemotron as an open project, because we've seen how openness internally has made it possible for us to solve whole classes of new problems with AI. We believe that as Nemotron and other open efforts come together, bring together more ideas and more force to bear on the development of AI, the results will be stronger.
B
Jonathan, Nvidia has a history of building end to end products. Self driving comes to mind, gaming of course, SuperPODs, but then disaggregating them for the world to use. Does Nemotron follow that same pattern in your mind? And if so, how?
C
Yeah, I think so. I think when we talk about that and Jensen talks about this a lot, what we mean is our solution. But the things ultimately that we build are very complicated integrated systems with many layers and many components. And on the one hand we need to build the whole thing ourselves because it doesn't work unless you build the whole thing yourself. So we need to train a whole model. At the end of the day, you know, it doesn't make sense for us to release like, I don't know, a way to make a reasoning recipe without actually training a model to do reasoning. You know, like you do these things and put the whole thing together. But at the same time, I think it's very important that we put all of the components into the ecosystem and allow people to consume the parts that they want and not consume the parts that they don't want. So this is how our hardware is. We design data center scale computers at this point, but we don't sell it as a single data center. We design the whole thing, we build the whole thing, then we chop it up into pieces and we sell it through normal sales channels. And our customers are free to take the parts they want and replace the rest. You know, it's truly an ecosystem. You know, if you don't like our CPU, use a different CPU. You don't like the storage, use a different storage. You don't like this networking, use a different networking. And we're open and interoperable with all these things. And it's a tremendous engineering challenge to work that way. But I think it's why we've been so successful, because it allows us to harness the power of like the entire computing industry. Because we're not really locking anyone out at all. Right. We're including everybody. And so when we think about large language models, I guess we're thinking in the same way. So we're going to develop techniques and anyone is free to take them. 
Other companies that train large language models for a living are free to take anything we built. They probably won't take all of it, but they're free to take anything. They want to take the software, that's great. They want to take some of our data sets, that's great. They want to take the software and the data sets and some of the training recipes, but modify them, that's great. They want to take the finished models, that's great. So in that sense, I think philosophically that's absolutely how we think about products, it's how we think about hardware, how we think about software, and it's now how we also think about foundation models.
A
And I think that's one of the things that makes Nvidia unique as a big tech company: although we do full stack, end to end integration, we don't dictate to our customers how that technology is going to be deployed or used, or even assembled, right? We know that it's not a one size fits all problem. And so we're happy to support companies of all shapes and sizes in every industry as they develop and deploy AI. And because Nvidia has this orientation, the supportive orientation, where we understand that it's not one size fits all, that actually is the secret to why we are able to collaborate with all of these companies. And we want to do that with AI technology as well.
B
Kind of switching gears a little bit, but still talking along technical lines. Can you share any exciting technical breakthroughs that came about during the Nemotron development process and what they might mean going forward, specifically in terms of efficiency and deployment, but really take it as broad as you like.
A
Yeah, well, Nvidia is thinking about AI from an accelerated computing perspective. And we have a belief that the faster we can make a model, the smarter it's going to be. And this follows just because clearly if we're able to think quicker, then we can get more thoughts in the same amount of time that can help us solve problems. So we're bringing this perspective of accelerated computing to AI in kind of a unique way. A couple things just from the past few months that we've demonstrated that I'm really excited about. One is we released a model, we call it Nemotron Nano V2. It is a hybrid state space model. So it's not a pure transformer model, but it uses this other technology for reasoning over sequences called a state space model that has some pretty big efficiency benefits. You know, on the same hardware, compared with other models of the same intelligence, we're about six to 20 times faster. And you know, we're pretty excited about the capabilities of this model. But it's just the beginning. We have really ambitious plans to continue evolving the architectures behind Nemotron as well as the systems that are used to build and deploy it. Another thing that we were able to show recently is we trained a Nemotron model using 4-bit floating point arithmetic and we're able to get world class results, which is really exciting because using only four bits per parameter of the neural network can be dramatically more energy efficient than using other representations. And we know that the development of AI is going to be constrained by the efficiency with which we can train it and deploy it. And so showing people new algorithms that are more efficient is going to help push the industry forward. And you know, it's not enough to say, hey, I've got this system, it's really fast at low precision arithmetic, if no one understands how to use it.
B
Right.
A
So Nemotron is our way of demonstrating to the community like, hey, you can take advantage of this amazing low precision hardware to train A world class model if you follow this algorithm.
B
Right.
C
It's amazing that four bits is enough. Like if you just think about how little, how low resolution that is, the fact that that works is pretty incredible.
B
I think so maybe. Can you rephrase for folks who might be listening and myself included, who don't fully get the ramifications of what doing four bit arithmetic and these results really mean?
A
Well, one fun analogy from my childhood comes from video games. I don't know if you remember the 8-bit Nintendo system. Yes, of course, then there was the 16-bit Nintendo system and it was like, wow, there's so many more colors with the 16-bit Nintendo. It's like, wow, you know, look at that smooth gradient. Right. So if you only have eight bits you can represent 256 numbers. With 16 bits you can represent about 65,000 numbers. With four bits you can represent 16. Right. So it's a very, very small amount of options to pick from. Like if you're going to draw a picture using 4-bit numbers, it's actually going to be pretty hard to make it look smooth.
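[Editor's illustration] The counts in the analogy above follow from the fact that n bits can encode 2^n distinct patterns. A minimal sketch, with a hypothetical helper name:

```python
def representable_values(bits: int) -> int:
    """Number of distinct bit patterns available with the given bit width."""
    return 2 ** bits


for bits in (4, 8, 16):
    print(f"{bits:>2} bits -> {representable_values(bits):,} values")
# Prints 16, 256 and 65,536 values respectively.
```

The "about 65,000" quoted for 16 bits is 65,536 exactly.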
B
Right.
A
Of course, what we're doing with our four bit training hardware and software isn't as straightforward as just using exactly one of 16 numbers for every parameter in the neural net. They actually come in blocks. The blocks have scaling factors attached to them in hierarchical ways. And that's all accelerated by software and hardware that we've built in Transformer engine and in our Blackwell GPU generation. And so it's kind of amazing that, you know, we're able to take this raw material that's very coarse and rather small and we're able to make it flexible enough to train a world class neural network.
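[Editor's illustration] The idea of 4-bit values that "come in blocks" with scaling factors can be sketched in plain Python. This is a simplified illustration under stated assumptions, not Nvidia's actual training recipe or the Transformer Engine API: it assumes the common 4-bit float layout with 1 sign bit, 2 exponent bits and 1 mantissa bit (E2M1), a block size of 16, and a single level of scaling kept as an ordinary Python float, whereas the conversation describes hierarchical scales. All names (`quantize_block`, `BLOCK_SIZE`, and so on) are hypothetical.

```python
# Magnitudes representable by an E2M1 4-bit float; with a sign bit that is
# 16 bit patterns in total (two of which encode zero).
FP4_MAGNITUDES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
FP4_MAX = 6.0
BLOCK_SIZE = 16  # assumed block size, for illustration only


def quantize_block(block):
    """Return (scale, codes); every code is one of the signed FP4 values."""
    max_abs = max(abs(x) for x in block)
    # Pick the scale so the largest element of the block lands on FP4_MAX.
    scale = max_abs / FP4_MAX if max_abs > 0 else 1.0
    codes = []
    for x in block:
        target = abs(x) / scale
        nearest = min(FP4_MAGNITUDES, key=lambda m: abs(m - target))
        codes.append(-nearest if x < 0 else nearest)
    return scale, codes


def quantize(values, block_size=BLOCK_SIZE):
    """Split values into blocks and quantize each block with its own scale."""
    return [
        quantize_block(values[i:i + block_size])
        for i in range(0, len(values), block_size)
    ]


def dequantize(blocks):
    """Rebuild approximate values from (scale, codes) pairs."""
    restored = []
    for scale, codes in blocks:
        restored.extend(scale * c for c in codes)
    return restored


# Two blocks with very different magnitudes: each gets its own scale, so the
# small block is not rounded away to zero by the large one.
weights = [0.001] * 16 + [100.0] * 16
restored = dequantize(quantize(weights))
worst = max(abs(a - b) for a, b in zip(weights, restored))
print(f"worst-case reconstruction error: {worst:.2e}")
```

With a single scale shared across all 32 values, the 0.001 entries would round to zero; giving each block its own scale is what keeps such a coarse set of values usable.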
C
Right. But on some level I always like to think of this as, you can have any number you want as long as it's one of these 16, and somehow, you know, it still works. It is pretty miraculous.
B
Yeah. Amazing. As we wrap up the conversation, let's look ahead to the future of Nemotron. What can developers and enterprises expect next? You've talked about it a little bit, some of the things coming through the pipeline and that you're working on. But what can devs and enterprises expect from Nemotron? And you know, perhaps more importantly, how can they start to engage with Nemotron right now?
C
Well, I can just say, you know, you should expect us to train some big models. We've trained recently, some smaller models, we'll be training some bigger models. You can expect us to incorporate more multimodal technology we have from Nvidia, we have some of the world's best, well, I guess the world's best open weight speech recognition models. At this point that technology hasn't really been incorporated into Nemotron and we're working towards adding audio and these kinds of capabilities. So I think there's a lot of just really cool technology. We're working on really bringing all of the best technology across Nvidia and concentrating it in Nemotron. I think that's, you know, that's something people can look forward to. I don't know Brian, what you would say.
A
Well, yeah, I would also reinforce how important reasoning is to Nemotron. It's been a core part of Nemotron development for the last year and we were super proud that we were able, for example, to take Nemotron reasoning and add it to Meta's Llama family. We know that there's a lot more work to do to make reasoning even stronger and we're really excited to do that.
B
Brian, John, this has been a great, really informative conversation, and just to hear the two of you talk about Nemotron from the inside out is a treat. So for folks who are listening and want to get started with Nemotron, the models are available now.
C
Yeah. So our models are available on Hugging Face. You can download them.
B
Perfect.
C
You can also experience all of them on build.Nvidia.com and download them there as well.
B
Excellent.
A
We do have a landing page on Nvidia.com for Nemotron and we're busy filling it out right now gathering all of the Nemotron content together in one place. So I would go there.
B
Excellent. And a work in progress, I'm sure, as the content, like the technology itself, evolves and evolves again. John, Brian, I know both of you have a tremendous amount on your plate with Nemotron and everything else. So we appreciate the hour to come on and, you know, help shout from the rooftops and tell the world about all the fantastic work you and your teams have been doing. Congratulations and all the best going forward, you know, as you said, not just inside of Nvidia, but collaborating with the community and working to raise all the boats together.
C
Thanks for having me.
A
Thanks.
Date: October 21, 2025
Host: Noah Kravitz
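Guests: Brian Catanzaro (Vice President of Applied Deep Learning Research, Nvidia) and Jonathan Cohen (Vice President of Applied Research, Nvidia)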
This episode explores the philosophy and strategy behind NVIDIA's open source AI initiative, "Nemotron." The conversation goes deep into why openness accelerates AI development, how collaboration shapes the future of AI, and the unique technical innovations that make Nemotron stand out. The guests reveal how Nemotron acts as a cornerstone for NVIDIA's end-to-end AI platform, discuss its impact on efficiency, transparency, and customizable enterprise AI workflows, and give listeners a peek at what's next for the Nemotron family.
Quote:
"Nemotron includes models that we train...data sets that we release, as well as algorithms and methodologies. Our goal...is to support the community in building customizable AI that can be integrated deeply and tightly into the beating heart of every business around the world."
— Brian Catanzaro (01:45)
Quote:
"Our success comes from this full-stack co-design and optimization."
— Jonathan Cohen (04:40)
Quote:
"We've accelerated pre training by a factor of 4x just by having a smarter pre training data set..."
— Brian Catanzaro (05:09)
Quote:
"You care if you can generate the same quality answer in 2,000 tokens instead of 10,000 tokens. That's a functional 5x speedup."
— Jonathan Cohen (07:35)
Quote:
"We really think that it's important for AI to be trusted and widely deployed. And in order for that to happen, we think it's important that enterprises have the option to understand the data sets and the technologies behind AI and fine tune them for their own problems..."
— Brian Catanzaro (09:03)
Quote:
"Open source is a delivery mechanism for the technology that's going into our platform."
— Jonathan Cohen (11:36)
Quote:
"Everything that we did is transparent. And so you can make these modifications yourself."
— Jonathan Cohen (13:18)
Quote:
"If you don't know what's in a technology, it's harder to trust it. And every business has different ways of thinking about the problems they're solving."
— Brian Catanzaro (14:03)
Quote:
"All of these things have to get combined together...the modularity is less than in software engineering...there's a real struggle in scaling up an effort like this to a very large team."
— Jonathan Cohen (19:44)
Quote:
"Organizations that can figure out how to collaborate and work together succeed."
— Brian Catanzaro (22:10)
Quote:
"We're open and interoperable...we're not really locking anyone out at all. Right. We're including everybody."
— Jonathan Cohen (24:45)
Quote:
"We released a model...a hybrid state space model...on the same hardware compared with other models of the same intelligence, we're about six to twenty times faster."
— Brian Catanzaro (26:50)
Quote:
"It's amazing that four bits is enough...you can have any number you want as long as it's one of these 16 and somehow...it still works."
— Jonathan Cohen (30:36)
Quote:
"We have some of the world's best open weight speech recognition models...that technology hasn't really been incorporated into Nemotron and we're working towards adding audio and these kinds of capabilities."
— Jonathan Cohen (31:15)
Quote:
"You should expect us to train some big models...Incorporate more multimodal technology...bringing all of the best technology across Nvidia and concentrating it in Nemotron."
— Jonathan Cohen (31:12)
Quote:
"The models are available now...on Hugging Face...on build.Nvidia.com...and our Nemotron landing page."
— Brian Catanzaro & Jonathan Cohen (32:28–32:49)
On Open Source's Acceleration Effect
"The more we're collaborating, the faster we move as a whole."
— Noah Kravitz (17:14)
On Full Stack Ecosystem Philosophy
"We're including everybody...it's why we've been so successful."
— Jonathan Cohen (24:45)
On the Four-Bit Training Revolution
"With four bits you can represent 16...if you're going to draw a picture using 4-bit numbers, it's actually going to be pretty hard to make it look smooth."
— Brian Catanzaro (29:16)
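To make the "one of these 16" idea concrete, here is a minimal sketch of rounding numbers onto a 16-value grid. It assumes plain uniform signed-integer quantization with a single shared scale, which is an illustrative simplification and not the number format Nemotron actually uses; the function names and sample weights are hypothetical.

```python
def quantize_4bit(values):
    """Round each float to one of 16 evenly spaced levels (codes -8..7).

    Assumption: one scale shared by the whole list, chosen so the largest
    magnitude maps to code 7. Real 4-bit training schemes are more involved.
    """
    max_abs = max(abs(v) for v in values) or 1.0  # avoid a zero scale
    scale = max_abs / 7
    codes = [max(-8, min(7, round(v / scale))) for v in values]
    return codes, scale


def dequantize_4bit(codes, scale):
    """Map the 4-bit codes back to approximate float values."""
    return [c * scale for c in codes]


# Hypothetical sample weights, for illustration only.
weights = [0.02, -0.51, 0.33, 0.90, -0.87, 0.10]
codes, scale = quantize_4bit(weights)
restored = dequantize_4bit(codes, scale)

for w, c, r in zip(weights, codes, restored):
    print(f"{w:+.2f} -> code {c:+d} -> {r:+.3f}")
```

Every restored value lands on one of 16 grid points, so each weight is off by at most half a grid step. That coarseness is the "hard to make it look smooth" problem described in the quote.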
Summary Reflections
"Organizations that can figure out how to collaborate and work together succeed. And that's one of the reasons also that we really believe in Nemotron as an open project."
— Brian Catanzaro (22:10)
This episode delivers an insider's understanding of how open source principles, community-driven research, and end-to-end platform thinking are driving the next generation of AI at NVIDIA. Nemotron stands as a unique, open family of models and methodologies—engineered for flexibility, trust, and cross-industry adoption—and exemplifies a philosophy of openness, not just for technical gain, but as a foundation for trust, speed, and opportunity in the AI age.
For anyone seeking to deploy, customize, or just understand cutting-edge AI, Nemotron and this conversation provide a practical, transparent jumping-off point.