
Host
Vulkan is a low-level graphics API designed to provide developers with more direct control over the GPU, reducing overhead and enabling high performance in applications like games, simulations and visualizations. It addresses the inefficiencies of older APIs like OpenGL and Direct3D and helps solve issues with cross-platform compatibility. Tom Olson is a distinguished engineer at Arm and Ralph Potter is the lead Khronos Standards Engineer at Samsung. Tom and Ralph are also the outgoing and incoming chairs of the Vulkan Working Group. They joined the podcast to talk about earlier graphics APIs, what motivated the creation of Vulkan, modern GPUs and more. Joe Nash is a developer, educator and award-winning community builder who has worked at companies including GitHub, Twilio, Unity and PayPal. Joe got his start in software development by creating mods and running servers for Garry's Mod, and game development remains his favorite way to experience and explore new technologies and concepts.
Joe Nash
Welcome to the show. Thank you so much for joining me today. How are you doing?
Tom Olson
Hi Joe. Doing good. This is Tom Olson, by the way.
Joe Nash
Awesome. Perfect. How about you, Ralph?
Ralph Potter
Hi Joe. Yeah, doing good. Thank you for having us both.
Joe Nash
Awesome. So to kick us off, you know, I introduced you both there as the chairs of the Khronos working group. There's a bit more to both of your stories. Tom, do you want to kick us off by introducing yourself and how you came to be working on Vulkan?
Tom Olson
Sure. So I work for Arm, which is known as a CPU company, but we do also make GPUs, the Mali GPU family, and I've been a professional graphics standards committee chair for the past 18 years. It's a bit horrible. I started chairing the OpenGL ES standard, which some folks may have heard of. It's the mobile version of OpenGL. And when the need came up for Vulkan in 2014 or so, I, for my sins, picked up the flag and ran with it and helped to get that effort started. And I've been doing that ever since. I've reached the stage where it's time for me to get out of the way and let younger people take charge. So, hence Ralph.
Joe Nash
Perfect. Yeah. And speaking of those younger people, Ralph, how about you?
Ralph Potter
Yeah, so I work for Samsung, where I'm located in our GPU team, specifically Samsung Mobile, the portion that makes mobile phone handsets. I've been in the Vulkan Working Group for six or seven years now, five of them representing Samsung. And yeah, I don't have Tom's 18 years of experience of chairing, but I have a little bit. And it will be hard to replace Tom, but we will do our best, of course.
Tom Olson
Can I point something out?
Joe Nash
Yeah, please do.
Tom Olson
Both Ralph and I were working on standards before our current employers. It's kind of the culture of the group. There's a large number of people involved with creating Vulkan whose involvement persists across multiple employers. It kind of gets in your blood. It's hard to put down, and it's a rare skill that companies need, so they will often hire you to keep doing what you're doing.
Joe Nash
Fascinating. So actually, yeah, that leads me into, I guess, a topic that I think will be interesting to explore, which is how Vulkan is developed and what Khronos is. But I guess first, to set the scene for folks who somehow aren't familiar with Vulkan: Tom, can you tell us briefly what Vulkan is?
Tom Olson
Sure. So Vulkan is the modern way to program GPUs. In the past, you've heard of APIs like DX9 and OpenGL, and those were kind of graphics APIs, and there was a big magic driver that turned that into GPU commands. With Vulkan, what we've done is remove that. It's not really a graphics API, it's an API for controlling and programming a GPU. All GPUs today, you probably know, are highly programmable. They have multiple execution engines, and you can write code for them. But getting it to run, and run efficiently in parallel on the GPU, is difficult. So Vulkan's job is to expose that power.
Joe Nash
That's a really interesting distinction: it's not a graphics API, it's an API for controlling a GPU. Very, very interesting. So I guess, you mentioned, obviously, that its job is to expose that power. What are some of the advantages of this approach? What problems are you directly looking to solve versus the previous generation?
Tom Olson
So in the previous generation, the basic problem is that a GPU's programming model is incredibly different from a CPU programming model, even a CPU cluster programming model. It's massively, massively parallel. Data parallel, typically. And increasingly flexible, too. And the basic problem was that the old generation of APIs, OpenGL, et cetera, presented a sort of very nice, convenient, but very CPU-like programming model where you gave a command, the device did it, you gave another command, the device did it. GPUs don't work like that at all. You queue up massive numbers of commands and you shove them in, and the driver and the hardware take them all apart and execute everything. It's kind of like a data flow paradigm: everything runs as soon as it can, as long as the hardware understands the dependencies. And so the result is that with the old APIs, you couldn't get the efficiency you wanted because you were working through this very thick abstraction and this very sequential abstraction. You need an API that exposes the massive parallelism of the device. And so Vulkan does that. As for specific problems we had in the old days, OpenGL had no real way to make use of multiple CPU cores. The GPU is a voracious consumer of data and commands, and if you were trying to keep it fed, you might need multiple threads in order to generate those commands fast enough. You couldn't do it. It wasn't in the programming model. Vulkan solves those problems.
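The recording pattern Tom describes, several CPU threads each building a batch of commands that are then handed over together, can be sketched as a toy model. This is plain Python rather than Vulkan code: `record_command_buffer` and `build_frame` are hypothetical names, the "commands" are just strings, and the sketch shows only the structure of the idea, not any real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def record_command_buffer(chunk_id, draw_count):
    """Record the commands for one slice of the scene (hypothetical helper)."""
    return [f"draw(chunk={chunk_id}, object={i})" for i in range(draw_count)]


def build_frame(chunk_count, draws_per_chunk, workers=4):
    """Record command buffers on worker threads, then submit them as one batch."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map returns results in input order, so the submission
        # order is deterministic even though recording runs concurrently.
        buffers = list(
            pool.map(
                lambda chunk: record_command_buffer(chunk, draws_per_chunk),
                range(chunk_count),
            )
        )
    # One batched submission instead of one driver round trip per command.
    return [command for buffer in buffers for command in buffer]
```

Calling `build_frame(3, 2)` returns a list of six command strings, grouped by chunk in submission order.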
Joe Nash
So one of the goals of Vulkan, as I understand it, was to be, you know, hugely cross-platform and support lots of platforms, which I guess is kind of shown by both of your backgrounds here between Arm and Samsung, obviously focusing on a whole world of devices. Ralph, can you talk about what kind of platforms you're looking to support with Vulkan? Because I think most people think of, like, you know, PCs and the consoles.
Tom Olson
Right.
Joe Nash
I understand it's like a whole world of things.
Ralph Potter
Yeah, sure. So definitely it exists on PCs. All of the desktop GPU vendors have Vulkan drivers. It exists on some of the consoles, both handheld and dedicated ones. It is pretty fundamental to Android these days and becoming more important. So the vast majority of mobile phones that you can buy today will support Vulkan. There are some outliers, but the vast majority will, at least in the Android space. It may not be so obvious, but it also exists on other devices as well, more embedded devices, appliance-type devices. You might find there is Vulkan in there, even though it's not obvious to you as a user. So the short answer is, if it's got a relatively modern GPU, there's a good chance that there is a Vulkan driver available somewhere.
Joe Nash
Right. I think I saw in a talk from SIGGRAPH that someone mentioned a Coke machine. That might have been you, Tom, actually, that example. Yeah, yeah, yeah. Particularly cursed programming environment. Cool. So I think that really sets the foundation for what Vulkan is. So I guess I wanted to talk a little bit about the history, which obviously you mentioned a bit in your intro, Tom, about how you were around at the start of it and transitioning from OpenGL and OpenGL ES to Vulkan. You mentioned that there was a need for Vulkan and that's when you started working on it. What was that need? Can you give us the run-up and the history to Vulkan?
Tom Olson
Sure. So if you go, boy, it's shocking how long ago this was. So go back to, say, 2012 to 2014. The dominant graphics APIs of the day were OpenGL and DX11, which was the most modern version of DirectX on Windows. And they both had this problem that they were lovely programming environments. It was very comfortable and easy to move into using them. They worked the way a CPU programmer would expect a graphics API to work, but people had enormous difficulty getting performance out of the devices. So as a result, for example, on consoles, nobody used them at all. Well, they used DX a bit on Xbox, but mostly they just threw them away because they could not get the performance. And so around that time, you could say it was kind of a revolution. The farmers with the pitchforks: developers were looking for alternatives. And you had things emerge like Mantle, which was an AMD proprietary API, since AMD had cornered the console market at the time. And it had this property that it exposed the parallelism at the price of not being as nice a programming environment. It's much more painful to use and complex to think about, but it gave you the power. And so people were very excited about that, and there was talk of moving toward it. Microsoft began moving in the same direction with an API called DX12, and I should say DX12 is to DX11 pretty much what Vulkan is to OpenGL. We kind of divide the world now into the modern GPU APIs, which are Vulkan, DX12 and Metal, and the old ones, which are OpenGL, OpenGL ES, DX9, 10 and 11. So these modern APIs were emerging and we felt that OpenGL was going to be left behind. We could see that it had problems. Well, I would say we were all working, Ralph too, I believe, we were all working in the OpenGL space. I was chairing OpenGL ES. We could see that there was no way we could evolve those APIs in a gradual way to meet the need, to provide the efficiency that developers were just demanding. And so that led to kicking off the effort. We kicked it off in 2014.
Took us a year and a bit. We came out in early 2016 with Vulkan 1.0. So did that answer your question?
Joe Nash
It did. And it also answered another question. You know, obviously DirectX 12 and Metal both... My timeline's a little bit fuzzy, but my understanding is they're kind of all at the same time, all the same generation. And so I was going to ask, you know, how did that happen? Why were they all at the same time? But I think you've really filled that in. So I guess one of the things you mentioned there is, you know, this kind of step change in developer experience in terms of, you know, what the developers could expect the API to do for them and how much extra work they had to put in. How do you go about navigating that in terms of an API design philosophy? Like, you know, you're trying to meet the needs of the users as graphics programmers. There was a demand for this level of API, but I imagine you still have a lot of people who are still expecting the affordances of the old API. How do you juggle that?
Tom Olson
Painfully. It's a constant, I won't say balancing act, but it's a constant debate. To be frank, in Vulkan as it is today... Well, certainly in Vulkan 1.0, we created an API which gave developers what they said they wanted, but was frankly quite difficult to use. And we made sort of an intellectual commitment. And by the way, when we started this effort, we had massive participation, particularly from game engine companies. Epic, Valve, well, Valve first and foremost, they were real champions of Vulkan early. But Epic, Unity, all the majors were there, and we had a big fight about: are we going to make concessions to ease of use, or are we going to say performance is first, full stop? And we pretty much said, no, performance is first. We will never sacrifice that. What's happened is that the hardware has gotten easier to use, and so Vulkan has gotten easier to use in parallel. And modern Vulkan is not nearly as gnarly and, you know, sharp-edged as Vulkan was early on. Boy, I think I'm rambling a bit here, but I'm trying to make sure I hit the various aspects of this. I would say one way we deal with this problem is by tooling. So there's a feature of Vulkan that we think is one of the best ideas ever. I wish I could remember who in the group had it. Since Vulkan is dedicated to efficiency before all else, it does not check for errors. When you call a Vulkan function, if the commands you give it are meaningless, the specification says you get undefined behavior, possibly including program termination. So you make one mistake and it's dead. The driver restarts. What do you do? Well, what we do is Vulkan has a defined interface to shim layers. We call it the layer system. And so you're a programmer, your program says, I want to use Vulkan, give me a driver, please. You go through this negotiation, and you can say, please install the validation layer on top of Vulkan. So if you do that, you get the same interface, all the same functions.
But if you pass garbage information into a Vulkan function, the validation layer checks it before it calls the underlying driver, and it logs an error if you did something wrong. The validation layers are incredibly powerful and useful. The investment that's gone into them is millions and millions of dollars. It's very complex software, a lot of it paid for by Valve, a lot of other stuff written by members, but it's one of the most important things. So we're trying to provide you with a safe and sane programming environment, but not at the price of slowing down the hardware. So the idea is you develop with validation turned on. When you ship your code, you turn it off and suddenly everything runs much faster because the driver's not checking any errors itself. Ralph, what have I forgotten? I kind of rabbit-holed there.
Ralph Potter
I think all of that is correct. If I was going to give one quick piece of developer advice, I would say if you're writing a Vulkan application and you are doing it without the validation layers enabled, you are doing it wrong and you will come to regret it. They are pretty fundamental to that. I also agree with Tom that this balance of usability and the challenge of using the API is a difficult problem. If you go back to our kind of 2016 launch publicity, at the time we said Vulkan is not the API for everybody. I think it has become less thorny to use as time has gone on. There is still no free lunch. Like, it is still definitely harder to get started in Vulkan than it is to get started in OpenGL. Tom said that this is an API to control a GPU. There is a higher expectation that you will understand how a GPU functions than maybe there was in OpenGL. I think once you have that understanding, and once you've got over the initial hurdles of how to get started, the fact that it is more predictable, that there are fewer unexpected driver heroics going on, means there is a place in which you can say it is more workable. But it requires a certain base level of understanding. Certainly the barrier to entry is higher. That's kind of undeniable and it's kind of intrinsic to what we built.
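The layer idea discussed above, a shim that exposes the same functions as the driver, checks the arguments, logs any problem and then forwards the call, can be illustrated with a toy model. This is plain Python, not the Vulkan layer interface: `Driver`, `ValidationLayer`, `create_buffer` and `create_device` are hypothetical names invented for the sketch.

```python
class Driver:
    """Stand-in for a driver that trusts its caller and checks nothing."""

    def create_buffer(self, size):
        # A real driver given garbage has undefined behavior; this toy
        # simply returns a handle without looking at the argument.
        return {"size": size}


class ValidationLayer:
    """Shim with the same interface that validates before forwarding."""

    def __init__(self, next_layer):
        self.next_layer = next_layer
        self.errors = []

    def create_buffer(self, size):
        if not isinstance(size, int) or size <= 0:
            # Log the problem, then still forward the call unchanged.
            self.errors.append(
                f"create_buffer: size must be a positive integer, got {size!r}"
            )
        return self.next_layer.create_buffer(size)


def create_device(enable_validation):
    """Return the bare driver, or the driver wrapped in the validation shim."""
    driver = Driver()
    return ValidationLayer(driver) if enable_validation else driver
```

During development the application would call `create_device(True)` and inspect `errors`; a shipping build would call `create_device(False)` and pay no checking cost, which mirrors the develop-with-validation, ship-without-it workflow described in the conversation.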
Joe Nash
It's a really interesting point, that if you understand how a GPU works, it's easier to use. I feel like in recent years there's been a lot more general awareness of how GPUs work among programmers, because of general-purpose GPU computing and obviously ML and AI accelerating people's usage of GPUs. Do you think that understanding is becoming more widespread in the developer community and making it easier for developers at large to use Vulkan or... Yeah, I don't know.
Ralph Potter
I think... Well, it's hard to answer that question without comparing where we are today versus where we were with the old API. There is a lot of information out there nowadays. You know, there have been a lot of presentations, a lot of talks, a lot of documentation from GPU vendors that will tell you nowadays how things work. In the OpenGL 3 days, I'm not sure how much of the same information was out there for the general public to consume in the first place.
Joe Nash
That makes sense and goes along with Tom's point about, you know, because the graphics cards are easier to use, Vulkan is therefore easier to use. Absolutely. So throughout the explanation you mentioned Valve, and you mentioned members and sponsors, and in the introductions you both talked about working for members of the group. So I think it'd be great to talk now about what the Khronos Group is, how it's organized and how Vulkan is developed. Tom, would you like to kick off with, I guess, what is Khronos and the working group?
Tom Olson
Sure. So Khronos is, well, it's an international consortium and standards body with the mission statement of connecting software to hardware. And so most of the products they create, though not all, are interfaces between some gnarly piece of hardware, like a video accelerator, a computer vision accelerator or a graphics accelerator, and applications. And their standards kind of range in approach from being hard over in the direction of developer-friendly and relatively easy to use, to things like Vulkan, which are, you know, here be dragons, but you'll get power if you use them. It's got about 100, 120 members-ish today, I think. And it includes, you know, Samsung, Arm, Intel, AMD, silicon companies. It includes game engine companies: Valve, Unity, Epic, several others. It includes software consultancies: LunarG, who create the SDK and the validation layers for us. There are other kinds of people involved, it's a wide variety. The thing that ties us together is that there is an IP agreement, and this is very important. You don't want to read the legalese, but in a nutshell, we agree that if you have patents that are necessary to implement Vulkan or any other of our ratified standards, you agree to license them to implementers of Vulkan at no cost. Patents on techniques for implementing Vulkan, like particular circuits for this or that, you can own, and you can enforce those patents. But something that the standard itself necessarily infringes, you have to license. Or, you know, there's a way to withdraw yourself, but it's kind of suicidal, so it's very rare to do that. So that's Khronos at a high level. There's a board, there's all this infrastructure. Then there's the Vulkan working group, which has the members that I mentioned. On a typical call, we have maybe 40 people from maybe 20 companies participating.
We do design work on new functionality and we do this maintenance call, which we did this morning, where we go through and fix the corner cases and answer developer complaints. And we have a presence on GitHub. And anybody in the world can come in and say, this doesn't seem to work the way I thought it did. Is the spec wrong or what? And we will jump on that and answer that. And often it results in spec clarifications. We do other things. We make a conformance test so that if you're implementing Vulkan and the spec says, you know, implementations must do this, we make sure they do. And that's our biggest expense. We spend about half a million a year, more than that actually, writing those tests. We have software contractors who do that for us. Then there's the SDK and the tooling. As I said, GPUs are programmable. You program them in special languages, and there's a compiler for that. Those are all things that we maintain as part of this effort.
Joe Nash
Okay, that opens up so many questions. I want to start with the conformance tests. So you mentioned if there's an implementation, you will test it, and that's a responsibility you take. So when you say an implementation, what does that mean? And I guess my question that came from that is, like, if someone goes and starts a new Vulkan implementation, just like some random person, do you then have to test it? Is that how that works?
Tom Olson
So here's how it works. Vulkan is a trademark of the Khronos Group, and if you want to use that trademark, you have to have permission. And on the Khronos Group's website, the guidelines say you can use it for conformant implementations. And there's some weasel words about, you know, if it's in development and it's not certified yet. But basically when you've got your thing done and... sorry, let me back up and say, typically a Vulkan implementation is a device driver. It comes from a GPU vendor and you get it and install it on your machine. Because you have the right GPU, you install their driver. We have the ability, this is built into the infrastructure, that if you have two separate graphics cards in your machine, two GPUs from two different vendors, you can put in drivers for both of them and it'll work. And your application will have to choose, when it's setting up and starting to use Vulkan, which device it wants to run on. But generally a Vulkan implementation is a device driver. There's a lovely software implementation out there called Lavapipe. And if you are experimenting and learning and having trouble, or you just don't want to install a device driver, you can use Lavapipe. It's quite efficient, actually. Sorry, I got sidetracked.
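The device-selection step mentioned above, where an application sees every installed implementation and has to pick one, can be sketched as a toy model. This is plain Python, not the Vulkan enumeration API: `PhysicalDevice`, `PREFERENCE` and `pick_device` are hypothetical, the device names are made up, and the preference order (dedicated GPU, then integrated GPU, then a software implementation) is an assumption chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class PhysicalDevice:
    name: str
    kind: str  # "discrete", "integrated", or "cpu" (software implementation)


# Lower rank is preferred. This ordering is an assumption for the sketch;
# a real application may choose on entirely different criteria.
PREFERENCE = {"discrete": 0, "integrated": 1, "cpu": 2}


def pick_device(devices):
    """Pick the most preferred device from everything the system reports."""
    if not devices:
        raise RuntimeError("no devices available")
    # Unknown kinds rank last; ties go to the first device listed.
    return min(devices, key=lambda d: PREFERENCE.get(d.kind, len(PREFERENCE)))
```

With a list containing a software implementation, an integrated GPU and a discrete GPU, `pick_device` returns the discrete GPU; with only the software implementation installed, it returns that.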
Joe Nash
No, absolutely, that's perfect. Yeah, Lavapipe sounds really cool. And I think that ultimately answered my question. So sorry, I'm now going backwards through your answer about how Khronos works. So when it comes to participating in the working group, you know, you work at Arm, Ralph works at Samsung, what is the, I guess, arrangement there? Do the members give employees over to the working group full time? How does that work?
Tom Olson
No. So, well, it's up to the member. Members of Khronos are typically companies. There are a small number of individual contributors approved by the board, but generally to work in Khronos, your company joins, they pay a fee, it's a couple of tens of thousands a year, and then they have the right to participate. And what that typically means is they tell certain of their employees, part of your job is to go to the meetings and contribute to making Vulkan better and making Vulkan work for the community. So in my case as chair, it's been a full-time job for me. But that's rare. Most of our members are devoting maybe 10 to 40 or 50% of their time to working on Vulkan and the rest to doing things for their own companies. Participation involves, in the case of Vulkan, two 90-minute calls a week, new tech and old tech. Plus we have subgroups. There's a separate group that deals with ray tracing and they have their own meeting. There's a separate group that deals with machine learning and they have their own meeting. There is a separate group for dealing with the programming language that is used to program the programmable parts of the GPU. We have a few others. There's like a marketing committee, et cetera. So you can get as involved as you want. You can't really be effective if you aren't spending at least 10% of your time on it, because you kind of need to be known, you need to have traction, you need to understand what's going on, and there's a minimum cost to that.
Joe Nash
Yeah, that makes total sense.
Host
This episode is brought to you by WorkOS. If you're building a B2B SaaS app, at some point your customers will start asking for enterprise features like single sign-on, SCIM provisioning, fine-grained authorization and audit logs. That's where WorkOS comes in, with easy-to-use and flexible APIs that help you ship enterprise features on day one without slowing down your core product development. Today, some of the hottest startups in the world are already powered by WorkOS, including ones you probably know like Perplexity, Vercel, Brex and Webflow. WorkOS also provides a generous free tier of up to 1 million monthly active users for user management, making it the perfect authentication and authorization solution for growing companies. It comes standard with rich features like bot protection, MFA, roles and permissions, and more. If you are currently looking to build SSO for your first enterprise customer, you should consider using WorkOS. The APIs are easy to use and modular, letting you pick exactly what you need to plug into your existing stack. Integrate in minutes and start shipping enterprise plans today. Check it out at workos.com. That's workos.com.
Joe Nash
So I guess, getting into the particulars: you know, you've got those meetings, but Vulkan has, like any software project, a cadence of releases and things that get added. And the meetings you're having, you know, have to come out. Ralph, as the person who's now responsible for this, can you talk to us about the roadmaps and how they're constructed? Like, there's a couple of terms that I think would be useful, because I know there's been some change in how the roadmap and how versions have worked over the years, from core versions to roadmap profiles to milestones. Can you lay all that out for us and how it works?
Ralph Potter
Yeah, so the first thing to understand about Vulkan is that we have a core API. It's kind of the default set of things that everybody implements, the mandatory requirements. On top of which we have a notion of a thing that we call an extension, which is another package of functionality. GPU vendors or implementers who feel that it's valuable can implement this extension specification, and it's essentially an optional piece of functionality that they might decide their market and their customers see value in. And the way that we used to do things is that we would release a core version on a pretty regular two-year cadence, which would roll up a certain amount of functionality that had been exposed in extensions. But extensions would flow out throughout the year on a pretty ad hoc basis. And a few years back we came to the realization that this was getting extremely difficult to handle as a software developer. We have 11 adopters, I want to say, off the top of my head, somewhere around that figure, and we had all made different decisions about what we felt were valuable extensions. The extensions themselves contain optional sub-functionality. Identifying exactly what you could expect as a developer became very difficult. On top of which we had always taken the view that the core API had to be capable of running on more or less everything. You referred earlier to Tom saying Vulkan on Coke machines. That was sort of a constraint on the core API from 2016, when we launched Vulkan 1.0. We never raised the minimum specs that you needed to run the core on, and so our only approach was to add more and more extensions. As a route to trying to bring some order to that, our process now is we define a thing that we refer to as a roadmap, which is again a collection of extensions and features, but we say it's for a particular subset of devices. We describe it as immersive graphics devices.
You can think mid- to high-end smartphones, desktop PCs, consoles. And everyone will ship devices that fit the requirements of a particular roadmap by a particular point in time, or approximately that point in time. So we hope that brings some cohesion. Those have dates on them. So we released Roadmap 2024 earlier in the year. I don't think it will be a huge surprise to anyone to say that there will be another one coming. And we have now got into a model where we can plan the rough content of roadmaps many years out in advance, which is also important because hardware roadmaps are amazingly long. If we need to have a conversation in the working group about how we would like us all to support feature X, and somebody doesn't have it in hardware, we're talking about probably five years to go from a hardware design to an implementation to something that shows up in a product. And so we have a roadmap that goes a couple of years out, and we have more tentative things for as far out as 2030. And when we get that far out, they're kind of nebulous. Maybe they won't arrive in practice, but there's a structure to it now. And so I think my message would be, if you're a developer trying to figure out where we're going, the roadmap tells you directionally where we're going. The core API is supposed to tell you what you can rely on to exist on any device that has updated drivers, is kind of how I would categorize it.
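The relationship Ralph lays out between the core API, optional extensions and a roadmap can be modeled as a set comparison: a device satisfies a roadmap when it offers every feature that roadmap requires. The sketch below is a toy in plain Python. The feature names, the `CORE` and `ROADMAP_PROFILE` sets and both helper functions are hypothetical and do not correspond to real Vulkan features or queries.

```python
def missing_features(device_features, required_features):
    """Return the required features a device lacks, sorted for stable output."""
    return sorted(set(required_features) - set(device_features))


def supports(device_features, required_features):
    """True when the device offers everything the requirement set asks for."""
    return not missing_features(device_features, required_features)


# Hypothetical feature names, chosen only for illustration.
CORE = {"basic_rendering", "basic_compute"}
ROADMAP_PROFILE = CORE | {"feature_x", "feature_y"}

# An embedded device meets core plus one optional extension.
embedded_device = CORE | {"feature_x"}
# A phone meets the whole roadmap and adds a vendor-specific extra.
phone = CORE | {"feature_x", "feature_y", "vendor_extra"}
```

Here `supports(embedded_device, CORE)` is true while `missing_features(embedded_device, ROADMAP_PROFILE)` reports `feature_y`, which is the kind of per-device guesswork a single named roadmap is meant to spare developers.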
Joe Nash
That hardware view and the amount of time that adds to it is... Yeah, that's a really fun constraint for working in this world. So obviously you've just told me there that you do plan out far enough in advance, which leaves me no choice but to ask what's next for the next milestone, which I guess is 2026, since it's every two years, right?
Ralph Potter
There is a milestone plan for '26. I believe that we shared some of this at SIGGRAPH.
Tom Olson
Yeah, I had slides on it, but I can't remember.
Ralph Potter
I am now trying to recall your slides, but I mean things that I know are on there. We have some work on debugging improvements. I believe there's some work on compute improvements. I believe we talked about ML work to come.
Tom Olson
We've got a couple of robustness features. Well, we have this expectation about WebGPU, which is the web graphics API, which is vaguely Vulkan-like, but much friendlier, and does a lot more work for you and therefore runs slower. But it is what it is, it'll be great for learners. And anyway, so they have very strict requirements for safety, because you're going to run code off the web and you have no idea what it is and you don't want it to screw up your machine. So there's a bunch of robustness features that will make it, we hope, possible to write an interpreter for WebGPU that is absolutely impossible to crash from the outside. We have that. We have a bunch of stuff related to getting compute parity with OpenCL, which is a lovely higher-level computing API, primarily for GPUs, though it's not absolutely limited to GPUs. You can also do compute in Vulkan, but it's not as nice, it's not as orthogonal and regular and clean. And there are some safety features that aren't present in Vulkan that are present in OpenCL. So we're trying to have some uniformity there. Things like 64-bit addressing. You don't necessarily have it in Vulkan, but in Roadmap 2026 we'll be basically saying all of the interesting, highly programmable devices that support lively open software markets are going to have 64-bit addressing in the GPU. So these things are coming. The ability to cast pointers. Vulkan doesn't have it. Some state management improvements. State is the bane... Okay, now we're going to get too far into stuff you don't want to hear about. But GPUs have enormous amounts of state. And typically, you set up all this state and it defines a virtual machine, and then you shove data through it with a shovel as fast as you can shovel, and it all just works. But managing that state is a nightmare, because you want to change it and start shoveling more data, but the old data isn't finished running through, and you just...
Anyway, so we have state stuff in mind, ML stuff, as Ralph said, obviously. Okay, this goes to a meta point. What is Vulkan's job, in our view? We have this discussion with our board of directors, who persist in thinking of Vulkan as a graphics API. In our view, the mission of Vulkan is to do whatever people want to do with a GPU. And so, for example, your desktop graphics card has a video decoder in it. They all do. And so exposing video is part of Vulkan's job. And we do that. People use GPUs for machine learning. It's like the dominant platform for machine learning. And so therefore it's in our wheelhouse to expose machine learning on GPUs. So anything people want to do with a GPU, we want to provide what you need to do it. I think, Ralph, you did cover the debug, for example. Ralph was one of the leaders of getting debug functionality in.
Joe Nash
So, yeah, I think you mentioned the other day you worked on some of the extensions prior to the chairship, right?
Ralph Potter
One of my first initiatives on arriving in the Vulkan working group as Samsung's representative was this: it's not unique to Vulkan, but debugging what happened to your GPU when it crashed is a really thorny, painful problem for developers. And so, yeah, when I arrived, one of my first initiatives was that the working group should really do something about this problem, and it's not an easy problem to solve. We spent at least two years discussing exactly what we could do there.
Joe Nash
The classic manoeuvre of joining the committee to solve your own problem. I like it a lot. Perfect.
Ralph Potter
That was essentially what went on there.
Tom Olson
We all do that. We all do that.
Ralph Potter
I think I'll also add one caveat to all of this discussion about roadmaps, which is to say we're talking about the future here. And historically we have been a little bit risk averse about saying we're going to do a thing. Essentially, we have historically only wanted to say we're going to do a thing when we absolutely knew it was done and nothing could possibly go wrong. Roadmaps are new ground for us. Speaking about what's on roadmaps that have not been announced is definitely new ground for us. And so there is a world in which companies start working on these things that we've said are on our roadmap and somebody discovers there's a problem. And collaboratively, if we're asserting that everybody will support a thing, sometimes that means we need to figure things out. So this is where we're trying to go. The things that are not in a published document, there is room for them to move around in time based on problems that people run into. Talking about the future is difficult.
Joe Nash
Yeah. Yeah, that totally makes sense. Thank you. Yeah, thank you very much for that. That's amazing. Yeah, great review of the content. But also, you mentioned your SIGGRAPH talk, Tom, and you said something in that talk that I wanted to chat about because I thought it was really interesting. And it hits on another thing you just said about, you know, what is the role of Vulkan? So to paraphrase, I think you said something like, it takes an ecosystem to raise an API. And you were talking about the ecosystem around Vulkan as a whole. But the really interesting thing you said was that although the working group doesn't have authority to dictate how the ecosystem develops, it does have responsibility to ensure that it works, which seems like a very difficult hill to stand on. And I imagine it's a nightmare to manage. Can you talk a little bit about this and how it influences your work?
Tom Olson
Sure. Well, I mean, this was a realization we came to slowly because we created Vulkan 1.0 back in 2016 and people desperately wanted to use it. And we came out and said, here it is. We finally got it done and we gave it to them. And they were like, well, now what do I do? I don't know how to learn this. The API is enormously complex. I don't have any tools that I can use. There are bugs in the implementations. Thank you, but it's not solving my problem. And it was a gradual process for us to understand that we have to define our job broadly: the job of the Vulkan Working Group is to create Vulkan and also make sure it's successful. We have to own all the problems that somebody else isn't owning for us. I mean, it's a tiny organization. We have a budget of about a million and a quarter per year, half of which we spend on conformance testing. So compared to some other standards bodies, we're tiny. The way I like to say it, maybe I said this at SIGGRAPH, we are approximately one third the size of the average McDonald's in terms of our annual budget.
Joe Nash
Or I don't recall that.
Tom Olson
That's fantastic. Okay, in terms of our annual cash flow. But we have fortunately a lot of... well, I will say we are leveraging the efforts of many people outside. Valve is wonderful about funding a lot of work in the ecosystem that Khronos doesn't pay for. So the total value going into the Vulkan ecosystem is many times what the working group's budget is. But still we're small. And so anytime a developer is finding Vulkan not usable for some reason, even if we can't solve it ourselves, we feel a responsibility to listen to them seriously, understand the problem, give them the best answer we can, and hopefully find or motivate a solution from some other part of the ecosystem if we can't do it ourselves. So you asked, how does this affect your work day to day? We do a lot of tracking. Every time we have a face-to-face, which is three times a year (one of them is virtual these days), one of the things we always do is go through and survey, try to find every piece of feedback we can find from the developer community, survey our members, survey our advisory panel. We have an advisory panel, and all of our GPU vendor members have developer relations teams that are constantly talking to developers and trying to help them use Vulkan on their implementation. But they hear things and they hear what's not working. So job one, we just keep on top of it. Job two, if it's a problem, it becomes an issue in our issue tracker and it comes up on the agenda. And lucky Ralph gets to... a lot of the chair's job, I will say, is rubbing the group's nose in problems that aren't progressing. And so I've been doing that for a long time and Ralph is going to do it going forward.
Joe Nash
Got to keep things moving. So another thing you mentioned in that talk and in your summary of roadmap 2026 was the OpenCL feature parity. And you know, you mentioned that Vulkan does offer compute, so obviously that's an enormous topic at the moment. Can we talk a little bit about what facilities Vulkan offers for compute on the GPU? Tom, do you want to kick us off on that?
Tom Olson
Well, I'm old enough, I always start with history. Compute came into GPUs on the desktop back with, I think, DX10 compute shaders and OpenGL 3, or was it 4 point something? I can't remember what OpenGL version introduced compute shaders, but it's been around for a long time. There's been a compute model, it's GPU flavored in that GPUs are quirky and thorny. So you have special memory spaces. Compute can only happen here, it can't interact with other things. But the shading languages are general purpose. They have a full population of float and integer types. In modern Vulkan, let's say Vulkan with the extensions that bring it up to, you know, 1.3 and beyond, you have the ability to do something which is like having pointers. It's not quite exactly the same thing, but you can do fully general computing on desktop hardware. You can do double precision. We have that. We have slowly and painfully worked ourselves to where we think the behavior of floating point numbers is fully specified. There used to be a lot of quirks, like do you get not a number when you divide by zero? Or do you get zero? Or do you get... you know, there was a lot of latitude in early Vulkan and we've slowly nailed that down. You may have to enable certain extensions. What do we call it? Like if, for example, you decide I really don't want round to nearest, I really want truncation. We have an extension, shader float controls, that will give you the hooks you need in the language to turn on and off different kinds of floating point behavior. Ralph, do you have any thoughts?
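To make the "round to nearest" versus "truncation" distinction concrete, here is a minimal sketch in Python. It does not call Vulkan at all; it only narrows a 64-bit float to 32 bits in the two ways Tom mentions, so the one-bit difference is visible. The function names are hypothetical, and the truncating helper assumes a finite input inside the normal 32-bit float range.

```python
import struct


def to_f32_nearest_even(x: float) -> float:
    """Narrow a Python float (binary64) to binary32, rounding to nearest even."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_f32_toward_zero(x: float) -> float:
    """Narrow to binary32 by truncation (round toward zero).

    Assumption: x is finite and within the normal binary32 range, so clearing
    the low 29 of the 52 mantissa bits leaves exactly the 23 bits binary32 keeps.
    """
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    bits &= ~((1 << 29) - 1)  # drop the bits binary32 cannot store
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def f32_hex(x: float) -> str:
    """Big-endian hex of the binary32 encoding, for easy comparison."""
    return struct.pack(">f", x).hex()


x = 0.1
rte = to_f32_nearest_even(x)
rtz = to_f32_toward_zero(x)
print(f32_hex(rte), f32_hex(rtz))  # 3dcccccd 3dcccccc
```

The two results differ only in the last mantissa bit, which is the kind of detail a shader author may or may not care about depending on the workload.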
Ralph Potter
I mean, I think I would take it up a higher level and say there are compute APIs, things like OpenCL and things like CUDA. So first of all, they tend to be more general programming, kind of C-like programming models. There's things like pointers in there. They also provide you very precise guarantees about things like what precision were my floating point operations. Exactly how much error can I have in a square root instruction, that sort of thing. These are the sorts of things that you need if you're doing, for example, scientific computing. If you're doing a complex physics simulation, you need to know how your floating point math is going to behave. Graphics historically has been very forgiving of being slightly more lax about that because we're dealing with colors and perception and pixels. And so historically the graphics answer to how precise does a square root have to be was very different from the compute API answer to that same question. But once you start doing the same sort of compute problems on Vulkan, then a lot of the same considerations come in and we start having to nail those things down, but also nail them down in a way where if you write a compute app and it's critical, you're maybe willing to pay those costs, but we can't make all of graphics slower as a consequence. And so those are the sorts of trade-offs. That's kind of the high-level take on where there's a difference: they've come from different places and now the use cases are sort of converging. And so, you know, some of those things have to come from the compute side. There's more things that have to be nailed down.
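One common way to state "how much error can I have in a square root" is in units in the last place (ULP). The sketch below is an illustration only: the helper names are hypothetical, the lax square root is a deliberately short Newton iteration rather than anything a GPU actually does, and no bound here is taken from the Vulkan, OpenCL, or CUDA specifications.

```python
import math
import struct


def f32(x: float) -> float:
    """Round a Python float to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def ulp_f32(x: float) -> float:
    """Spacing between adjacent binary32 values near x (normal range assumed)."""
    _, exponent = math.frexp(abs(x))  # abs(x) = m * 2**exponent, 0.5 <= m < 1
    return 2.0 ** (exponent - 24)


def error_in_ulps(approx: float, reference: float) -> float:
    """Distance from the reference, measured in binary32 ULPs at the reference."""
    return abs(approx - reference) / ulp_f32(reference)


def lax_sqrt(x: float, steps: int = 3) -> float:
    """A deliberately imprecise square root: a few Newton steps from a poor guess."""
    y = x
    for _ in range(steps):
        y = 0.5 * (y + x / y)
    return y


reference = math.sqrt(2.0)
strict_err = error_in_ulps(f32(reference), reference)
lax_err = error_in_ulps(lax_sqrt(2.0), reference)
print(f"strict: {strict_err:.3f} ULP, lax: {lax_err:.1f} ULP")
```

A result rounded correctly to 32 bits stays within half a ULP of the reference, while the lax version lands many ULPs away. That gap is invisible in a pixel color and unacceptable in a physics simulation, which is the trade-off Ralph is describing.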
Joe Nash
You preempted my next question, which was going to be where does this fit alongside OpenCL and CUDA? So that's awesome. So I guess to round off this section, you mentioned earlier the programming language for Vulkan and then general purpose shaders. So this kind of ties in nicely, I guess, to the news this month that Microsoft will be supporting SPIR-V, which I believe is the language you're referring to, for HLSL, their shader language. Can we talk a bit about what SPIR-V is and its role in Vulkan?
Ralph Potter
So again, I guess I'll refer back to history. In OpenGL, we took in graphics shaders as a shading language, as human readable source code. Everybody had a compiler that parsed that source code and translated it down to the native instructions of their GPU. It was built into the API that there would be a function call that you provided the source to and it would do the compilation. There are a couple of consequences to that. One is that your API only consumes one source language and people either have to code in that source language or they have to have something that generates that source language. A further complication is that compilers are complicated. Compilers have bugs in them. Different vendors' compilers have different bugs in them. And that was a painful experience. So putting my former compiler engineer hat on, the typical process for compilers is they're taking a source language, and they're translating it into some intermediate representation of the language, something the compiler understands that is not human readable, but still contains the structure of the code. And then they translate that down to the actual individual instructions, the hardware level instructions. So what SPIR-V is, is that we essentially said we would standardize the intermediate representation. We would standardize a format that says this is a representation of your program. It's not designed to be human readable, it's a binary representation. But that allows a multitude of front end languages and front end compilers to generate those intermediate representations. It gets drivers out of the business of parsing text and it lets driver engineers just concentrate on the problem of how do I get from an intermediate representation of this problem to my instruction set? It's been a very powerful thing. It's got us to a place where application developers can write their shaders in HLSL if they're coming from the DX world. 
They can still write them in GLSL if they're coming from that world. There are other compilers out there as well that also generate SPIR-V. In that sense, it's been a very powerful choice. I would say that is one of the early Vulkan 1.0 decisions that we made that was absolutely right.
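To give a feel for what "a binary representation" means here, the sketch below reads the five-word header that begins a SPIR-V module: a magic number, a version, a generator ID, an ID bound, and a reserved word. The function name and the hand-built header bytes are hypothetical, made up for illustration. Real modules would come from a front-end compiler, and this sketch does not decode any instructions.

```python
import struct

SPIRV_MAGIC = 0x07230203  # first word of every SPIR-V module


def parse_spirv_header(blob: bytes) -> dict:
    """Decode the 5-word SPIR-V header, detecting byte order from the magic number."""
    if len(blob) < 20:
        raise ValueError("too short to contain a SPIR-V header")
    for endian in ("<", ">"):
        magic, version, generator, bound, schema = struct.unpack(
            endian + "5I", blob[:20]
        )
        if magic == SPIRV_MAGIC:
            return {
                "endian": "little" if endian == "<" else "big",
                # Version word bytes, high to low: 0 | major | minor | 0
                "version": ((version >> 16) & 0xFF, (version >> 8) & 0xFF),
                "generator": generator,
                "bound": bound,  # all result IDs in the module are below this
                "schema": schema,
            }
    raise ValueError("not a SPIR-V module (bad magic number)")


# Hypothetical header for illustration: version 1.3, generator 0, ID bound 8.
header = struct.pack("<5I", SPIRV_MAGIC, 0x00010300, 0, 8, 0)
info = parse_spirv_header(header)
print(info["version"], info["bound"])  # (1, 3) 8
```

Because the header and everything after it are plain 32-bit words, a driver can consume the module without a text parser, which is the point Ralph makes about getting drivers out of the business of parsing text.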
Joe Nash
Awesome. Cool. Thank you for running through that. So that's kind of covered all of the Vulkan-specific topics I wanted to chat about today. And I'm conscious that we're running low on time, but I do want to follow up on something. I think towards the beginning of this podcast, we made a couple of jokes about, you know, committee work and people who enjoy it and doing it as a career. For folks who have heard this and they're like, you know, actually working on a committee for an open standard sounds like it's for me, do you have tips for how you would get involved from the start?
Tom Olson
A lot of it comes down to picking your employer carefully. As I said, most of us in the group work either for a GPU company which supports Vulkan or a company which makes use of Vulkan in some fashion. I did leave out, by the way, Google is a member because Android depends on Vulkan. So they put a lot of effort into it. So you need to work for a company that needs Vulkan to exist for some reason, either because they want to sell it, or because they want to buy it and use it, and then you work your way into it. I mean, another thing to say about Vulkan, which maybe we haven't touched on, is that we're heavily committed to open source. So the specification is open source, all the tooling is open source, all the compilers that we use and the validation layers and all that stuff. And we've had enormous benefit from people who read and comment on it. Proposing a new feature through the open source interface is a tough sell because if you're coming from outside and you don't work for a GPU vendor, there are tons of constraints that you're not aware of and your chances of producing something that will actually work in hardware are near zero. So I wouldn't encourage people to just come in and try to add features. But if you start working with the spec, understand the spec through looking at things that go by, you'll know enough to make a contribution. And your contributions, by the way, we would be desperately grateful for. We always are. Bug reports, et cetera. Not in other people's drivers, but in the spec itself. It really comes down to... it's difficult to contribute. Actually, let me back up. There are other places that we're very interested in having help with which are not the spec. So for example, as part of our devrel operation, we have a large and growing collection of sample code. And Ralph, do you know, can you join that group without working for a... No, because that develops examples for unpublished extensions. 
That's why it's NDA.
Joe Nash
That makes total sense. Yeah. Getting to a company that's a member, I guess, is a starting point. Ralph, anything to add?
Ralph Potter
I mean, I think Tom's point is largely correct. The short answer is the most likely route is to work for a company that is a member or, if your company is small but in the right space, for your company to join, and that gets you a seat at the table.
Joe Nash
Do your own LunarG.
Ralph Potter
Yeah, well, whether you do your own... definitely there are, you know, if you're a GPU contractor type company, there's space for those. You know, we have game company members. In the grand scheme of things, the cost of joining Khronos is a lot less than the cost of your engineering time. So that is probably the route in. If I had to say how most members who are regular participants of the working group have got there, the most traditional route is to become a driver engineer at a hardware company and volunteer to do this stuff. You have to have a certain mindset to find standards work engaging. I love it. There are other driver engineers in my team who find it, you know, a lot of meticulous paperwork and they would rather write code. It's something that you either learn to love or you learn that you want to do something else. But the traditional route is probably through driver teams in hardware companies. But as Tom has mentioned, we have other members as well. There are game companies, there are platform vendors, there are people like LunarG and Mobica and Igalia who are kind of software contractors. There are people from some of the open source projects, albeit sponsored by companies working in that space. So yeah, there's a variety of routes in.
Tom Olson
I should mention, I said you pay several tens of thousands of dollars for a company to become a Khronos member. We really do want the participation of small companies, small game developers, et cetera. So there is what's called an associate membership, which, rather than tens of thousands, is thousands, and it scales with company size, counted as number of employees. Those members don't get a vote in the committee, but they can do everything else. They get to participate, they can make non-NDA proposals for changes, et cetera. And we do that.
Joe Nash
Wonderful. Awesome. Cool. Thank you both so much. This has been illuminating for me and it's great to hear how everything works under the hood. And I do believe, Tom, you mentioned at the beginning that you've got an upcoming retirement. Thank you so much for all your years of service developing Vulkan. As someone very downstream, as a big enjoyer of video games, I've enjoyed the fruits of your labors for many years. Thank you very much, and congratulations on your election, Ralph, and good luck for the future.
Ralph Potter
Thank you.
Joe Nash
Thanks.
Podcast Summary: Software Engineering Daily – The Vulkan Graphics API with Tom Olson and Ralph Potter
Release Date: December 19, 2024
Host: Joe Nash
Guests: Tom Olson (ARM) and Ralph Potter (Samsung)
Topic: In-depth discussion on the Vulkan Graphics API, its development, advantages, and future roadmap.
Joe Nash welcomes Tom Olson and Ralph Potter, chairs of the Khronos Vulkan Working Group, to discuss the Vulkan Graphics API. Tom introduces himself and his extensive experience in graphics standards, while Ralph shares his background at Samsung and his role in the Vulkan Working Group.
Notable Quote:
Tom Olson [01:40]: "I've been a professional Graphics Standards committee chair for the past 18 years... I started chairing the OpenGL ES standard... and helped to get Vulkan started."
Tom Olson provides a foundational understanding of Vulkan, emphasizing that it is not merely a graphics API but a comprehensive API for controlling and programming GPUs. Vulkan aims to expose the GPU's parallel processing capabilities more directly, offering higher performance and reduced overhead compared to older APIs like OpenGL and DirectX.
Notable Quote:
Tom Olson [03:35]: "Vulkan is the modern way to program GPUs... its job is to expose that power."
The discussion highlights the inefficiencies of older graphics APIs that presented a CPU-like programming model, which didn't align well with the inherently parallel nature of GPUs. Vulkan addresses these issues by allowing developers to queue extensive commands, optimizing parallel execution, and better utilizing multiple CPU cores.
Notable Quote:
Tom Olson [04:39]: "With the old APIs, you couldn't get the efficiency you wanted because you were working through this very thick abstraction... Vulkan solves those problems."
Ralph Potter elaborates on Vulkan's extensive cross-platform capabilities. Vulkan is supported across PCs, various gaming consoles, and is fundamental to Android, ensuring that a wide range of devices can leverage its high-performance graphics and compute capabilities.
Notable Quote:
Ralph Potter [06:42]: "If it's got a relatively modern GPU, there's a good chance that there is a Vulkan driver available somewhere."
Tom Olson explains the role of the Khronos Group, an international consortium responsible for standardizing APIs like Vulkan. The group comprises over 120 members, including major GPU vendors and game engine companies. They collaborate to ensure Vulkan meets industry needs while maintaining cross-compatibility and performance standards.
Notable Quote:
Tom Olson [17:38]: "Khronos is an international consortium and standards body with the mission statement of connecting software to hardware."
The conversation delves into how Vulkan implementations must undergo rigorous conformance testing to ensure compatibility and performance across different hardware. Tom describes the process of obtaining Vulkan's trademark through Khronos and the role of device drivers in implementing Vulkan.
Notable Quote:
Tom Olson [21:16]: "Vulkan is a trademark of the Khronos group, and if you want to use that trademark, you have to have permission."
Ralph Potter discusses Vulkan's roadmap, outlining planned features and improvements leading up to 2026. Key areas include debugging enhancements, compute improvements, machine learning integrations, and state management advancements. The roadmap aims to align Vulkan's evolution with long-term hardware developments.
Notable Quote:
Ralph Potter [26:00]: "We have released a roadmap 2024 earlier in the year... we have now got into a model where we can plan the rough content of roadmaps many years out in advance."
Tom and Ralph explore Vulkan's compute functionalities, comparing them to dedicated compute APIs like OpenCL and CUDA. While Vulkan offers robust compute capabilities, it is evolving to bridge gaps in precision and usability to cater to scientific computing and machine learning applications.
Notable Quote:
Ralph Potter [43:51]: "Compute APIs like OpenCL and CUDA provide very precise guarantees... Vulkan is working to bridge those gaps."
The discussion covers SPIR-V, Vulkan's intermediate shader language, which standardizes shader compilation across different front-end languages like HLSL and GLSL. Ralph explains how SPIR-V facilitates a more efficient and uniform shader compilation process, enhancing portability and reducing driver complexities.
Notable Quote:
Ralph Potter [44:15]: "SPIR-V allows a multitude of front-end languages to generate a standardized intermediate representation, simplifying shader compilation."
Tom and Ralph offer insights into how developers and companies can engage with the Vulkan ecosystem. They emphasize the importance of working with member companies of the Khronos Group and contributing to open-source projects. Ralph highlights the pathways for both large and small companies to participate, encouraging involvement through contributions and collaboration.
Notable Quote:
Ralph Potter [49:44]: "The most likely route is to work for a company that is a member... or join as an associate member if you're a smaller entity."
As the podcast concludes, Joe Nash thanks Tom and Ralph for their contributions to the Vulkan API and the broader graphics community. He thanks Tom for his years of service ahead of his retirement and congratulates Ralph on his election as the incoming chair, underscoring the collaborative efforts driving Vulkan's success.
Notable Quote:
Joe Nash [52:04]: "As a big enjoyer of video games, I've enjoyed the fruits of your labors for many years. Thank you very much."
Conclusion
This episode of Software Engineering Daily offers a comprehensive exploration of the Vulkan Graphics API, shedding light on its inception, technical advantages, development processes within the Khronos Group, and future directions. Tom Olson and Ralph Potter provide expert perspectives, making complex topics accessible and highlighting Vulkan's pivotal role in modern graphics and compute applications.