AI won't plateau — if we give it time to think | Noam Brown - TED Talks Daily

Summary (6 min read)

Episode Summary: "AI Won't Plateau — If We Give It Time to Think" by Noam Brown

Podcast: TED Talks Daily
Host: Elise Hu
Episode Release Date: February 1, 2025
Speaker: Noam Brown, OpenAI Research Scientist
Event: TED AI San Francisco 2024

Introduction

In this compelling episode of TED Talks Daily, host Elise Hu introduces Noam Brown’s insightful presentation titled “AI Won’t Plateau — If We Give It Time to Think.” Brown, a renowned research scientist at OpenAI, delves into the advancements of artificial intelligence (AI) over the past five years and presents a paradigm shift in how AI models can continue to evolve without hitting a plateau.

The Scaling Paradigm in AI

Brown begins by contextualizing the remarkable progress AI has achieved, primarily through scaling up existing architectures. He states:

"The incredible progress in AI over the past five years can be summarized in one word: scale." [01:59]

He explains that although there have been algorithmic innovations, the foundational transformer architecture from 2017 remains largely unchanged. The significant advancements stem from increasing the scale of data and computational resources. For instance, training GPT-2 in 2019 cost about $5,000, while contemporary models require hundreds of millions of dollars to train. This exponential scaling has led to continual improvements, but it raises concerns about the sustainability of this approach.

Key Points:

Transformer Architecture: The backbone of current AI models since 2017.
Scaling Up: Increasing data and compute has driven AI advancements.
Cost Concerns: Training costs have surged from thousands to hundreds of millions of dollars.

The Poker AI Experiment: System 1 vs. System 2 Thinking

Brown shares a pivotal experiment from his PhD days, highlighting the limits of relying only on scaled-up training (what he calls System 1 thinking). He recounts his work on developing an AI to play poker, a game of both luck and deep strategy, and the prevailing view among researchers at the time:

"The general feeling among the research community is that we had figured out the paradigm and now all we needed to do was scale it." [02:44]

Despite training the AI extensively—playing nearly a trillion hands over three months—the bot consistently lost to top human players. The crux of the problem was the AI’s inability to engage in deliberative thinking during gameplay. Brown contrasts this with human players who employ both intuitive (System 1) and analytical (System 2) thinking:

"If it was a difficult decision, they might think for a few minutes." [05:00]

Curious about the performance gap, Brown hypothesized that integrating System 2 thinking could bridge the divide. His experiments revealed that allowing the AI to "think" for just 20 seconds per hand yielded performance improvements equivalent to scaling the model's size and training by 100,000 times.

"Spending 20 seconds thinking in a hand of poker got the same boost in performance as scaling up the size of the model and the training by 100,000x." [07:15]

This result prompted a ground-up redesign of the poker AI to scale System 2 thinking alongside System 1, culminating in a decisive win over four top professionals in a 120,000-hand competition in 2017.

Key Points:

System 1 vs. System 2: Differentiating between fast, intuitive thinking and slow, analytical reasoning.
Performance Gap: Traditional AI lagged behind humans due to lack of deliberative thinking.
Breakthrough: Integrating System 2 thinking dramatically enhanced AI performance without exorbitant scaling.

Extension to Other Domains: Chess and Go

Brown extends his findings beyond poker, citing historical AI achievements in chess and Go. He discusses how IBM’s Deep Blue and DeepMind’s AlphaGo leveraged thinking time to outperform human champions:

"Deep Blue thought for a couple minutes before making each move." [09:10]
"AlphaGo took the time to think for a couple minutes before making each move." [10:00]

Research indicates a clear relationship between increased thinking time (System 2) and AI performance. Brown cites a 2021 paper which found that, in these games, scaling up thinking time by 10x was roughly equivalent to scaling up model size and training by 10x.

Key Points:

Deep Blue & AlphaGo: Pioneering AIs that utilized deliberative thinking to achieve landmark victories.
Research Correlation: Empirical evidence supports the effectiveness of System 2 thinking in enhancing AI capabilities.

Implications for Language Models and Future AI Development

Addressing skepticism about AI reaching a plateau, Brown introduces o1, a series of language models from OpenAI designed to think before responding. o1 adjusts its "thinking" duration to the difficulty of the task, from a few seconds on easy questions to a few minutes on hard ones:

"o1 benefits by being able to think for longer. This opens up a completely new dimension for scaling." [12:30]

He argues that this approach offers an alternative to the escalating cost of scaling up training. Thinking longer raises the cost and latency of each query, but Brown argues that the performance gains justify it for high-value problems such as a new cancer treatment, more efficient solar panels, or a proof of the Riemann hypothesis.
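The tradeoff between the two scaling routes can be made concrete with some rough arithmetic. The sketch below is illustrative only: it takes the order-of-magnitude figures Brown mentions in the talk (training going from hundreds of millions to billions of dollars, a query going from about a penny to about ten cents) and assumes, purely for the sake of the example, that both routes deliver the same quality gain. The function name and the specific dollar amounts are assumptions, not reported costs.

```python
# Illustrative break-even arithmetic for the two scaling routes in the talk.
# All amounts are order-of-magnitude assumptions, expressed in integer cents
# so the arithmetic stays exact.

def break_even_queries(train_now, train_scaled, query_now, query_thinking):
    """Return the number of queries at which the extra per-query cost of
    longer thinking adds up to the extra one-time cost of a bigger
    training run. Below this volume, thinking longer is cheaper overall."""
    extra_training = train_scaled - train_now
    extra_per_query = query_thinking - query_now
    if extra_per_query <= 0:
        raise ValueError("thinking must cost more per query than the baseline")
    return extra_training / extra_per_query

CENTS_PER_DOLLAR = 100

n = break_even_queries(
    train_now=100_000_000 * CENTS_PER_DOLLAR,       # assumed: $100M training run
    train_scaled=1_000_000_000 * CENTS_PER_DOLLAR,  # assumed: $1B training run
    query_now=1,                                    # assumed: 1 cent per query
    query_thinking=10,                              # assumed: 10 cents per query
)
print(f"Break-even at about {n:,.0f} queries")
```

Under these assumed figures the break-even point is around ten billion queries. For a narrow, high-value problem that needs far fewer queries than that, paying more per query is the cheaper route, which is the point Brown makes with his cancer-treatment example.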

Brown anticipates that embracing System 2 thinking will unlock new potentials for AI, moving beyond mere chatbots to become powerful tools for addressing complex global challenges.

Key Points:

o1 Model: A breakthrough in integrating deliberative thinking within language models.
Cost-Benefit Tradeoff: Balancing increased query costs with significant performance gains.
Future Potential: Expanding AI capabilities to tackle pressing and intricate problems.

Conclusion: A Call to Embrace the New Paradigm

Brown concludes by emphasizing that the AI revolution is not a distant future but an ongoing transformation. By adopting strategies that incorporate both System 1 and System 2 thinking, the AI community can continue to drive unprecedented advancements without being constrained by traditional scaling limitations.

"I know that there are some people who will still say that AI is going to plateau or hit a wall. And to them I say, want to bet?" [14:00]

His optimistic outlook suggests that with continued innovation in AI architectures and methodologies, the field will keep evolving, surpassing existing expectations and breaking through perceived barriers.

Notable Quotes

On AI Progress Through Scaling:

"The incredible progress in AI over the past five years can be summarized in one word: scale." [01:59]

On System 2 Thinking Equivalence:

"Spending 20 seconds thinking in a hand of poker got the same boost in performance as scaling up the size of the model and the training by 100,000x." [07:15]

On Future AI Potential:

"This opens up a completely new dimension for scaling." [12:30]

Challenging Skeptics:

"I know that there are some people who will still say that AI is going to plateau or hit a wall. And to them I say, want to bet?" [14:00]

Implications for AI Development

Noam Brown’s presentation underscores a pivotal shift in AI research and development. By integrating deliberative thinking processes, AI can achieve higher levels of sophistication and effectiveness without the unsustainable costs associated with traditional scaling. This approach not only enhances performance but also broadens the scope of AI applications, making it a cornerstone for future advancements in the field.

For listeners and AI enthusiasts, Brown’s insights offer a roadmap for overcoming current limitations and embracing innovative strategies to propel AI into its next phase of evolution.

Additional Notes:
For more information on TED's curation process, visit TED.com/CurationGuidelines. This episode was produced by the TED Audio Collective team, including Martha Estefanos, Oliver Friedman, Brian Greene, Autumn Thompson, Alejandra Salazar, and Christopher Faizy Bogan, with support from Emma Topner and Daniela Valarezo.


Transcript

[00:47]
Elise Hu
TED Talks Daily is sponsored by Capital One. In my house we subscribe to everything: music, TV, even dog food. And it rocks until you have to manage it all. Which is where Capital One comes in. Capital One credit card holders can easily track, block or cancel recurring charges right from the Capital One mobile app at no additional cost. With one sign-in, you can manage all your subscriptions all in one place. Learn more at CapitalOne.com/subscriptions. Terms and conditions apply. You're listening to TED Talks Daily, where we bring you new ideas to spark your curiosity every day. I'm your host, Elise Hu. It turns out it's not just humans who think fast and slow. AI models also need time to think if we are wanting them to perform better. In his 2024 talk, OpenAI research scientist Noam Brown shares new understanding about AI that can inform how to make models work better and do more at scale. It's coming up.
[02:00]
Noam Brown
The incredible progress in AI over the past five years can be summarized in one word: scale. Yes, there have been algorithmic advances, but the frontier models of today are still based on the same transformer architecture that was introduced in 2017, and they are trained in a very similar way to the models that were trained in 2019. The main difference is the scale of the data and compute that goes into these models. In 2019, GPT-2 cost about $5,000 to train. Every year since then, for the past five years, the models have gotten bigger, trained for longer on more data, and every year they've gotten better. But today's frontier models can cost hundreds of millions of dollars to train, and there are reasonable concerns among some that AI will soon plateau or hit a wall. After all, are we really going to train models that cost hundreds of billions of dollars? What about trillions of dollars? At some point, the scaling paradigm breaks down. This is, in my opinion, a reasonable concern, and in fact it's one that I used to share, but today I am more confident than ever that AI will not plateau. And in fact, I believe that we will see AI progress accelerate in the coming months. To explain why, I want to tell a story from my time as a PhD student. I started my PhD in 2012, and I was lucky to be able to work on the most exciting projects I could imagine: developing AIs that could learn on their own how to play poker. Now, I had played a lot of poker when I was in high school and college, so for me, this was basically my childhood dream job. Now, contrary to its reputation, poker is not just a game of luck. It's also a game of deep strategy. You can kind of think of it like chess with a deck of cards. When I started my PhD, there had already been several years of research on how to make AIs that play poker. And the general feeling among the research community is that we had figured out the paradigm and now all we needed to do was scale it.
So every year we would train larger poker AIs for longer on more data. And every year they would get better, just like today's frontier language models. By 2015, they got so good that we thought they might be able to rival the top human experts. So we challenged four of the world's top poker players to an 80,000-hand poker competition with $120,000 in prize money to incentivize them to play their best. And unfortunately, our bot lost by a wide margin. In fact, it was clear even on day one that our bot was outmatched. But during this competition, I noticed something interesting. You see, leading up to this competition, our bot had played almost a trillion hands of poker over thousands of CPUs for about three months. But when it came time to actually play against these human experts, the bot acted instantly. It took about 10 milliseconds to make a decision, no matter how difficult it was. Meanwhile, the human experts had only played maybe 10 million hands of poker in their lifetimes. But when they were faced with a difficult decision, they would take the time to think. If it was an easy decision, they might only think for a couple seconds. If it was a difficult decision, they might think for a few minutes. But they would take advantage of the time that they had to think through their decisions. In Daniel Kahneman's book, Thinking, Fast and Slow, he describes this as the difference between System 1 thinking and System 2 thinking. System 1 thinking is the faster, more intuitive kind of thinking that you might use, for example, to recognize a friendly face or laugh at a funny joke. System 2 thinking is the slower, more methodical thinking that you might use for things like planning a vacation or writing an essay or solving a hard math problem. After this competition, I wondered whether this System 2 thinking might be what's missing from our bot and might explain the difference in the performance between our bot and the human experts.
So I ran some experiments to see just how much of a difference this System 2 thinking makes in poker. And the results that I got blew me away. It turned out that having the bot think for just 20 seconds in a hand of poker got the same boost in performance as scaling up the model by 100,000x and training it for 100,000 times longer. Let me say that again. Spending 20 seconds thinking in a hand of poker got the same boost in performance as scaling up the size of the model and the training by 100,000x. When I got this result, I literally thought it was a bug. For the first three years of my PhD, I had managed to scale up these models by 100x. I was proud of that work. I had written multiple papers on how to do that scaling. But I knew pretty quickly that all of that would be a footnote compared to just scaling up System 2 thinking. So, based on these results, we redesigned the poker AI from the ground up. Now we were focused on scaling up System 2 thinking in addition to System 1. And in 2017, we again challenged four of the world's top poker pros to a 120,000-hand poker competition, this time with $200,000 in prize money. And this time we beat all of them by a huge margin. This was a huge surprise to everybody involved. It was a huge surprise to the poker community. It was a huge surprise to the AI community, and honestly, even a huge surprise to us. I literally did not think it was possible to win by the kind of margin that we won by. In fact, I think what really highlights just how surprising this result was is that when we announced the competition, the poker community decided to do what they do best and gamble on who would win. When we started, when we announced the competition, the betting odds were about four to one against us. After the first three days of the competition, we had won for the first three days, and the betting odds were still about 50-50. But by the eighth day of the competition, you could no longer gamble on which side would win.
You could only gamble on which human would lose the least by the end. This pattern of AI benefiting by thinking for longer is not unique to poker. And in fact, we've seen it in multiple other games as well. For example, in 1997, IBM created Deep Blue, an AI that plays chess. And they challenged the world champion, Garry Kasparov, to a tournament and beat him in a landmark achievement for AI. But Deep Blue didn't act instantly. Deep Blue thought for a couple minutes before making each move. Similarly, in 2016, DeepMind created AlphaGo, an AI that plays the game of Go, which is even more complicated than the game of chess. And they too challenged a world champion, Lee Sedol, and beat him in a landmark achievement for AI. But AlphaGo also didn't act instantly. AlphaGo took the time to think for a couple minutes before making each move. In fact, the authors of AlphaGo later published a paper where they measured just how much of a difference this thinking time makes for the strongest version of AlphaGo. And what they found is that when AlphaGo had the time to think for a couple minutes, it would beat any human alive by a huge margin. But when it had to act instantly, it would do much worse than top humans. In 2021, there was a paper that was published that tried to measure just how much of a difference this thinking time made a bit more scientifically. In it, the authors found that in these games, scaling up thinking time by 10x was roughly the equivalent of scaling up the model size and training by 10x. So you have this very clear, clean relationship between scaling up System 2 thinking time and scaling up System 1 training. Now, why does this matter? Well, remember I mentioned at the start of this talk that today's frontier models cost hundreds of millions of dollars to train, but the cost of querying them, the cost of asking a question and getting an answer, is fractions of a penny.
So this result says that if you want an even better model, there are two ways you could do it. One is to keep doing what we've been doing for the past five years and scaling up System 1 training: go from spending hundreds of millions of dollars on a model to billions of dollars on a model. The other is to scale up System 2 thinking and go from spending a penny per query to $0.10 per query. At a certain point, that trade-off becomes well worth it. Now, of course, all these results are in the domain of games, and there was a reasonable question about whether these results could be extended to a more complicated setting like language. But recently, my colleagues and I at OpenAI released o1, a new series of language models that think before responding. If it's an easy question, o1 might only think for a few seconds. If it's a difficult decision, it might think for a few minutes. But just like the AIs for chess, Go and poker, o1 benefits by being able to think for longer. This opens up a completely new dimension for scaling. We're no longer constrained to just scaling up System 1 training. Now we can scale up System 2 thinking as well. And the beautiful thing about scaling up in this direction is that it's largely untapped. Remember I mentioned that the frontier models of today cost less than a penny to query. Now, when I mention this to people, a frequent response that I get is that people might not be willing to wait around for a few minutes to get a response from a model or pay a few dollars to get an answer to their question. And it's true that o1 takes longer and costs more than other models that are out there. But I would argue that for some of the most important problems that we care about, that cost is well worth it. So let's do an experiment and see. Raise your hand if you would be willing to pay more than a dollar for a new cancer treatment. All right, basically everybody in the audience, keep your hand up. How about $1,000? How about a million dollars?
What about for more efficient solar panels or for a proof of the Riemann hypothesis? The common conception of AI today is chatbots. But it doesn't have to be that way. This isn't a revolution that's 10 years away or even two years away. It's a revolution that's happening now. My colleagues and I have already released o1-preview, and I have had people come to me and say that it has saved them days' worth of work, including researchers at top universities. And that's just the preview. I mentioned at the start of this talk that the history of AI progress over the past five years can be summarized in one word: scale. So far, that has meant scaling up the System 1 training of these models. Now we have a new paradigm, one where we can scale up System 2 thinking as well. And we are just at the very beginning of scaling up in this direction. Now, I know that there are some people who will still say that AI is going to plateau or hit a wall. And to them I say, want to bet? Thank you.
[14:07]
Elise Hu
That was Noam Brown at TED AI San Francisco in 2024. If you're curious about TED's curation, find out more at TED.com/CurationGuidelines. And that's it for today. TED Talks Daily is part of the TED Audio Collective. This episode was produced and edited by our team, Martha Estefanos, Oliver Friedman, Brian Greene, Autumn Thompson and Alejandra Salazar. It was mixed by Christopher Faizy Bogan. Additional support from Emma Topner and Daniela Valarezo. I'm Elise Hu. I'll be back tomorrow with a fresh idea for your feed. Thanks for listening.