Transcript
A (0:00)
There is this thing that's happening in AI, and in AI a lot is happening every week now. Fundamentally, if you look at AI progress, it's been a very smooth exponential increase in capabilities. This is the overarching trend. It's not like pre-training fizzled out; it's just that we found a new paradigm that, at the same price, gives us much more amazing development. And this paradigm is still very new. I think one of the biggest things that people on the inside kind of know and others don't is that already, right now, it's not about the progress. There are so many things ChatGPT or Gemini, any LLM, can do for you that people just don't realize. You can take a photo of something broken and ask how to repair it, and it may tell you. You can give it college-level homework and it will do it for you.
B (0:42)
Hi, I'm Matt Turck, welcome to the MAD Podcast. My guest today is Lukasz Kaiser, one of the key architects of modern AI, who has quite literally shaped the history of the field. Lukasz was one of the co-authors of the "Attention Is All You Need" paper, meaning he's one of the inventors of the transformer architecture that powers almost all the AI we use today. He's now a leading research scientist at OpenAI, helping drive the second major paradigm shift, towards reasoning models like the ones behind GPT-5.1. This episode is a deep exploration of the AI frontier: why the AI slowdown narrative is wrong, the logic puzzles that still stump the world's smartest models, how scaling is being redefined, and what all of that tells us about where AI is heading next. Please enjoy this fantastic conversation with Lukasz. Lukasz, welcome.
A (1:28)
Thank you very much.
B (1:29)
There was a narrative, at least in some circles, maybe outside of San Francisco, throughout the year that AI progress was slowing down, that we had maxed out pre-training, that scaling laws were hitting a wall. Yet we are recording this at the end of a huge week, or couple of weeks, with the releases of GPT-5.1, GPT-5.1-Codex-Max, and GPT-5.1 Pro, as well as Gemini 3, Nano Banana Pro, and Grok 4.1. So this feels like a major repudiation of that narrative. What is it that people in your AI labs know about AI progress that at least parts of the rest of the world seem to not understand?
A (2:15)
I think there is a lot to unpack there, so I want to go a little slower. There is this thing that's happening in AI, and in AI a lot is happening every week now: new models coding, doing slides, self-driving cars, images, videos. You know, it's a nice field that doesn't let you get bored for long. But through all of this it's sometimes hard to see the fundamental things that are happening. And fundamentally, if you look at AI progress, it's been a very smooth exponential increase in capabilities. This is the overarching trend, and there has never been much to make me, at least, and I think my colleagues in the labs, believe that this trend is stopping. It's a little bit like Moore's Law, right? Moore's Law held through decades and decades, and arguably you would say it's still very much going on, if not speeding up, with the GPUs. But of course it did not happen as one technology bringing you there for 40 years. There was one technology and then another and another and another, and this went on for decades, right? So from the outside you see a smooth trend. But from the inside, of course, progress is made through new developments, in addition to the increase of compute power and better engineering. All of these things come together.

And in terms of language models, I think there were big pivotal points. One point was of course the transformer, when it started. But the other point was reasoning models, and that happened, I think o1-preview was about a year and a month ago or something like that. We started working on it maybe three years ago, but it's very recent. If you think of it as a paradigm, that's a very recent thing. So it's always like these S-curves, right? It starts, then it gives you amazing growth, and then it flatlines a little, though we'll get to pre-training. I feel pre-training, in some sense, is on the upper part of the S. But it's not like the scaling laws for pre-training don't work. They totally work.
What the scaling laws say is that your loss will decrease log-linearly with your compute. We totally see that, and clearly Google sees that, and all other labs. The problem is how much money you need to put into that versus the gains you get. It's just a lot of money, and people are putting it in. But with the new paradigm of reasoning, you can get much more gain for the same amount of money, because it's on the lower part of the S-curve, and there are just discoveries to be made, and these discoveries unlock insane capabilities. So it's not like pre-training fizzled out; it's just that we found a new paradigm that at the same price gives us much more amazing development. And this paradigm is still very new. It happened so fast that, I think, if you blink, you may miss it. You basically had chat with GPT-3.5, right? GPT-3.5 in ChatGPT. It would give you answers, and it used no tools, no reasoning; it would just answer you something. And now you have ChatGPT, and, you know, if you were not into it, you may have blinked. It also gives you answers, and you may say, okay, it's more or less the same, except ChatGPT now will go look on some websites, reason about it, and give you the right answer instead of something it memorized in its weights. I very much used to like this example: what time does the SF Zoo open tomorrow? The original ChatGPT would totally hallucinate an hour from its memory, something it had read on probably the Zoo's website from five years ago. And it didn't know what day today or tomorrow is, so it would just assume it's a weekday. ChatGPT now knows what today is, because it's in the system prompt. It goes to the Zoo's website, reads it, extracts the information, and if it's ambiguous, probably checks three other websites just to confirm, and then gives you the answer. So if you blink, you may think it's the same, but no, it's dramatically better.
And you know, as a consequence, since it can read all the websites in the world, it can give you answers on things it wouldn't have been able to even touch before. So there is tremendous progress, right? And it happened so fast that it may even be missed. I think one of the biggest things that people on the inside kind of know and others don't is that already, right now, it's not about the progress. There are so many things ChatGPT or Gemini, any LLM, can do for you that people just don't realize. You can take a photo of something broken and ask how to repair it, and it may tell you. You can give it college-level homework and it will probably do it for you. So that's absolutely amazing.
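[Editor's note] The scaling-law claim above, that loss falls log-linearly in compute, means loss follows a power law of the form L(C) = L_min + a * C^(-alpha). The sketch below illustrates the economic point Lukasz makes: every doubling of compute buys roughly the same (shrinking) loss improvement at twice the cost. All constants here are made up for illustration; they are not from any real training run.

```python
def loss(compute: float, l_min: float = 1.7, a: float = 400.0, alpha: float = 0.1) -> float:
    """Illustrative power-law loss curve (made-up constants).

    A straight line in log-log coordinates: log(loss - l_min) is
    linear in log(compute), which is what "log-linear decrease"
    refers to.
    """
    return l_min + a * compute ** (-alpha)


if __name__ == "__main__":
    # Each 10x jump in compute still lowers the loss, but the
    # absolute gain per jump shrinks while the cost grows 10x.
    levels = [1e20, 1e21, 1e22, 1e23]
    for c in levels:
        print(f"compute={c:.0e}  loss={loss(c):.3f}")
```

This is why "the scaling laws totally work" and "it's just a lot of money" are both true: the curve never stops going down, but the dollars-per-unit-of-improvement keep rising, which is what makes a cheaper paradigm like reasoning so attractive.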
