Transcript
A (0:00)
Catastrophic forgetting is definitely a problem, especially in post-training and when you don't have access to the original pre-training data.
B (0:08)
How much of that is focused on improving the design of AMD hardware, and how much of it is focused on general model development, which AMD is not doing, right? AMD does not develop its own models.
A (0:23)
I think people want infinite chips, Craig. So no, it doesn't relieve the pressure. I am Sharon. I'm the VP of AI at AMD, and I think about self-improving AI, self-improving LLMs, which we'll get into later. My background is in AI research: I was an AI researcher at Stanford, where I did my PhD with Andrew Ng. I taught there as an adjunct in generative AI, back before all this ChatGPT stuff. After Stanford I started a startup, an AI infrastructure startup doing post-training of language models, actually on AMD GPUs. It was started a couple of months before ChatGPT launched, and most recently, over the last several months, we have transitioned to AMD. So my team and I are there now, and we're very excited to enable more people to use compute and to get access to compute, because that really is one of the big limiting factors for developing AI, and to enable more people to steer these models. So that's what I'm really excited about. And yeah, that's why I'm here.
B (1:36)
Yeah. And I do want to talk about self-improving AI. Can you start by defining what we're talking about when we talk about self-improvement? Are we talking about models that rewrite their own code, or something like refining their own training data?
A (2:00)
Yeah, I think that's exactly it. It is a broad category, but essentially it's the idea of these models being able to edit any part of themselves to improve themselves, whether that be the data, the actual model architecture, or how they evaluate themselves. The part that I'm working on is below all of that: it's how fast they actually run on the GPUs themselves. They are writing the kernel code that underlies these models, so the models run faster and more effectively on these GPUs, and on new hardware too. That's been really exciting to see.
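[The outer loop being described — a model proposes an edit to some part of its own setup, an evaluator scores the candidate, and only improvements are kept — can be sketched as a toy hill-climbing loop. This is a hypothetical illustration, not AMD's actual system; all names and the stand-in "config" are invented for the example.]

```python
import random

def self_improve(config, propose, evaluate, steps=20, seed=0):
    """Generic self-improvement loop: propose an edit to any part of the
    system (data, architecture, evaluation), keep it only if the score
    improves. All components are pluggable stand-ins."""
    rng = random.Random(seed)
    best, best_score = config, evaluate(config)
    for _ in range(steps):
        candidate = propose(best, rng)   # model proposes an edit to itself
        score = evaluate(candidate)      # measure the edited system
        if score > best_score:           # keep only strict improvements
            best, best_score = candidate, score
    return best, best_score

# Toy stand-ins: "config" is a single learning-rate-like number,
# and the (hypothetical) evaluator prefers values near 0.01.
propose = lambda c, rng: c * rng.uniform(0.5, 2.0)
evaluate = lambda c: -abs(c - 0.01)
best, score = self_improve(0.5, propose, evaluate)
```

The key property is that the loop never accepts a regression, so whichever part of the system is being edited (data, architecture, or, as discussed next, kernels), the measured score is monotonically non-decreasing.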
B (2:37)
Yeah, I just read about kernel evolve. Who is doing kernel evolve? I've forgotten. Is that Google? Is that you guys' work?
A (2:51)
So there's a lot of different pieces of work around kernel generation and being able to use LLMs to generate these kernels. We're doing some of that, but I think that work might have been from Meta. There's work across the board, across the industry, that is very important toward this end, because it enables more people to get onto different types of compute. What we did most recently, in collaboration with a bunch of different institutions like Meta, Google DeepMind, MLCommons, Stanford, Nvidia, etc., was a NeurIPS tutorial on generating kernels using AI. We presented that, and it basically goes through how we're using AI agents to generate these kernels and how we're thinking about post-training these models to generate kernels more effectively. What's really exciting about kernel generation and kernel development is that we have the profiler, so we have the ability to actually see how fast the generated kernels are on the chips themselves. That's really exciting. My team, we're also working on a more robust, production-level benchmark to share with the community, as well as different techniques to modify the models to do better on this task.
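[The profiler-in-the-loop idea mentioned here can be sketched as follows. This is a hypothetical toy: timing plain Python callables with `time.perf_counter` stands in for a real GPU profiler, and the "kernels" are ordinary functions. Each generated candidate is first gated on correctness against a reference implementation, then timed, and the fastest correct one wins.]

```python
import functools
import time

def profile_and_select(candidates, reference, test_inputs, repeats=100):
    """Profile each candidate kernel: discard any whose outputs disagree
    with the reference, then pick the fastest by wall-clock time
    (a stand-in for real on-chip profiling)."""
    best_fn, best_time = None, float("inf")
    for fn in candidates:
        # Correctness gate: output must match the reference on all inputs.
        if any(fn(x) != reference(x) for x in test_inputs):
            continue
        start = time.perf_counter()
        for x in test_inputs * repeats:
            fn(x)
        elapsed = time.perf_counter() - start
        if elapsed < best_time:
            best_fn, best_time = fn, elapsed
    return best_fn, best_time

# Toy "generated kernels": three ways to sum a list; one is simply wrong.
reference = sum
candidates = [
    lambda xs: sum(xs),
    lambda xs: functools.reduce(lambda a, b: a + b, xs, 0),
    lambda xs: len(xs),  # incorrect: filtered out by the correctness gate
]
inputs = [list(range(100)), list(range(7))]
best, elapsed = profile_and_select(candidates, reference, inputs)
```

The measurable feedback signal is what makes this task attractive for post-training: the profiler's timing plays the role of the evaluator in the self-improvement loop described earlier, so "better" is defined by the hardware itself rather than by a learned judge.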
