Latent Space: The AI Engineer Podcast
Episode: Captaining IMO Gold, Deep Think, On-Policy RL, Feeling the AGI in Singapore — Yi Tay
Date: January 23, 2026
Guest: Yi Tay (Google DeepMind, Gemini Singapore)
Host: Latent.Space
Episode Overview
This episode features a wide-ranging conversation with Yi Tay, a leading researcher at Google DeepMind and a key contributor to the Gemini project in Singapore. Together with host Latent.Space, they dive deep into cutting-edge advances in AI, focusing on reasoning and AGI, reinforcement learning (RL), the Gemini model's performance at the International Mathematical Olympiad (IMO), and the emergence of Singapore as a frontier location for AI work. The discussion is frank, technical, and lively, exploring the practical challenges and philosophical musings involved in pushing the boundaries of AI research, team-building, and even personal performance.
Key Discussion Points and Insights
1. Gemini Singapore and the AGI Ambition
[01:00–02:52]
- Gemini Singapore is an extension of the global Gemini effort, specializing in “Reasoning and AGI.”
- The team’s naming (“AGI”) signals ambition and is meant to communicate their North Star to both insiders and outsiders.
- Rejoining Google, Yi reflects that returning felt seamless—“like you play Pokemon, you leave it aside and then you go back and you click continue. Save game.”—and that Google’s infrastructure is still a major asset.
2. From Foundation Models to Reinforcement Learning
[03:19–04:49]
- The shift in research focus from architectures and pre-training (think T5, UL2) toward RL is a defining trend:
“RL is basically the main modeling toolset that we play around with these days.” —Yi Tay [03:16]
- Discussion of on-policy vs. off-policy RL:
- On-policy RL means the model learns from its own outputs and adapts based on reward signals, as opposed to imitating success from other models (off-policy/imitation learning).
- Analogy to human learning: initial imitation (SFT) is useful, but true advancement comes from interacting with the environment and making personal updates.
Memorable quote:
“Now I have a kid and everything, like want my kid to try stuff. And then you tell them like, okay, this is where this went wrong, where this went right... Rather than okay, you just copy everything somebody else does.” —Yi Tay [06:25]
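The on-policy vs. off-policy distinction discussed here can be illustrated with a toy bandit-style sketch (not anything from the episode): imitation learning (SFT-style) pushes the policy toward an expert's answer, while on-policy RL (a plain REINFORCE update) reinforces or suppresses whatever the policy itself sampled, based on reward. All names, rewards, and hyperparameters are illustrative assumptions.

```python
# Toy contrast: imitation learning (copy the expert) vs. on-policy RL
# (sample from yourself, get a reward, update). Illustrative only.
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

def sft_step(logits, expert_action, lr=0.5):
    """Imitation: raise the probability of the expert's answer,
    regardless of whether the policy would have produced it."""
    probs = softmax(logits)
    return [l + lr * ((1.0 if a == expert_action else 0.0) - p)
            for a, (l, p) in enumerate(zip(logits, probs))]

def reinforce_step(logits, reward_fn, rng, lr=0.5):
    """On-policy: sample from the current policy, score the sample,
    and reinforce (or suppress) what the model itself produced."""
    probs = softmax(logits)
    action = rng.choices(range(len(logits)), weights=probs)[0]
    r = reward_fn(action)
    # REINFORCE gradient: r * d/dlogits of log pi(action)
    return [l + lr * r * ((1.0 if a == action else 0.0) - p)
            for a, (l, p) in enumerate(zip(logits, probs))]

rng = random.Random(0)
reward = lambda a: 1.0 if a == 2 else -0.2  # action 2 is "correct"

rl_logits = [0.0, 0.0, 0.0]
for _ in range(200):
    rl_logits = reinforce_step(rl_logits, reward, rng)

sft_logits = [0.0, 0.0, 0.0]
for _ in range(50):
    sft_logits = sft_step(sft_logits, expert_action=2)

print(softmax(rl_logits), softmax(sft_logits))
```

Both routes end up concentrating probability on the rewarded action here, but only the RL route got there from the model's own trials and errors, which is the "on-policy-ness" Yi is pointing at.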
3. Reasoning, Learning Rates, and Self-Consistency
[07:58–12:33]
- The host draws parallels between machine learning concepts (like learning rate and curriculum design) and human cognition.
- On self-consistency as a fundamental principle in LLM reasoning (e.g., majority voting in CoT chains):
“Chain of thought itself was also a big idea. And then self consistency was also like a big fundamental idea in modern LLM literature.” —Yi Tay [12:19]
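Self-consistency as referenced here is simple to sketch: sample several chain-of-thought completions, extract each final answer, and take the majority vote. The `sample_answer` function below is a stand-in for an LLM call, with an assumed 70% per-sample accuracy; nothing here reflects Gemini internals.

```python
# Minimal self-consistency sketch: majority vote over sampled CoT answers.
# sample_answer is a stand-in for "sample one CoT chain, read off its
# final answer" -- not a real model API.
from collections import Counter
import random

def sample_answer(rng):
    # Assumed behavior: the correct answer "42" appears 70% of the time;
    # errors are split across two wrong answers.
    return "42" if rng.random() < 0.7 else rng.choice(["41", "43"])

def self_consistency(n_samples, rng):
    votes = Counter(sample_answer(rng) for _ in range(n_samples))
    answer, _ = votes.most_common(1)[0]
    return answer

rng = random.Random(0)
print(self_consistency(101, rng))
```

The design point is that a single sampled chain is noisy, but the mode of many chains is much more reliable, which is why majority voting became a standard inference-time trick.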
4. Achieving Gold at the International Mathematical Olympiad
[12:33–24:34]
- Background: Google’s push to reach an IMO gold-medal score with an LLM-based approach (Gemini Deep Think) marks a landmark achievement.
- Historical context: previous approaches used hybrid, tool-augmented systems (e.g., AlphaProof). The latest winning approach was an end-to-end LLM, with no separate symbolic reasoning engine.
- Yi describes the team composition and the “hackathon” energy of the final push. The solution involved shipping a single model checkpoint for inference during the live IMO event.
- The achievement was uncertain and adrenaline-filled: the IMO's medal cutoffs are bell-curve-based, depending on the human participants' score distribution.
Notable Quote:
“If we are not... if the model can't get to IMO gold, then can we get to AGI?... At some point we have to use these models to try these Olympiad competitions.” —Yi Tay [14:03]
On Specialization vs. Generalization:
- There was internal debate on using specialized (symbolic) systems versus a large, general LLM.
- Yi’s take: “At the end of the day, you want one model for everything. My prediction is that I think most things can be subsumed by the model.” [19:31]
5. Long-Horizon Reasoning Benchmarks: Pokémon, Agents, and Novel Knowledge
[26:43–31:12]
- Discussion of new challenging benchmarks, like Pokémon, as tests for long-term planning and visual reasoning in agents.
- “Pokemon is a great long horizon benchmark... but the hard part is actually trying to synthesize the web knowledge and then apply it in the game itself.” —Yi Tay [30:25]
- The AI community is not yet at the point where models can truly discover novel knowledge without human guidance.
6. Defining and Demystifying Reasoning in LLMs
[32:52–36:59]
- Reasoning’s meaning is morphing; for practitioners, it increasingly means fine-tuning and RL to elicit abilities, not just chain of thought in language outputs.
- On the emergence of recursive reasoning traces in LLM corpora (reasoning about reasoning):
“Now we are in this age where LLM text is in the corpus... There’s a bit of a recursive loop.” —Interviewer [35:41]
7. The Rise of AI Coding Tools
[36:59–40:10]
- Personal anecdotes: Even experienced researchers like Yi are now AI-powered coders, pasting bugs into internal copilots to get suggestions, saving significant time.
- “It does pretty well most of the time. And actually there are classes of problems that it's just...maybe probably better [than me].” —Yi Tay [38:27]
8. The Impact of Code Generation and Automation on Research Teams
[41:02–42:03]
- Yi explains the practical impact is more a “passive aura that buffs everybody” rather than direct job replacement.
- “These things are not going to replace one person...but more like a passive aura that buffs everybody in game terms. Right?” —Yi Tay [41:32]
9. The Incremental "Hill Climbing" Approach to AGI
[43:52–45:01]
- Most progress is a sequence of focused, incremental improvements—adding datasets, improving particular weaknesses, “hill climbing” until AGI is achieved.
- However, not all pain points can be solved via focus; some require more foundational shifts.
10. "Is Attention All You Need?" and Model Architecture Futures
[45:22–48:59]
- Yi is skeptical that attention mechanisms and Transformers will be replaced anytime soon, citing architectural inertia and compatibility with existing research investments.
“It’s probably the self-attention is...there was this whole big era...where people try to undermine the attention as much as possible...In the end, always...you have one layer of self-attention there and it still works.” —Yi Tay [47:26]
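The surviving ingredient Yi describes, a single layer of scaled dot-product self-attention, can be written out in a few lines. This is a minimal pure-Python sketch with identity Q/K/V projections (a real layer uses learned weight matrices); the token vectors are made-up values.

```python
# Single-head scaled dot-product self-attention, pure Python.
# Identity projections stand in for learned Q/K/V weight matrices.
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

def self_attention(X):
    """X: list of d-dimensional token vectors. Each output token is a
    softmax-weighted mixture of all input tokens."""
    d = len(X[0])
    out = []
    for q in X:  # each query attends over every token (including itself)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in X]
        w = softmax(scores)
        out.append([sum(wi * v[j] for wi, v in zip(w, X))
                    for j in range(d)])
    return out

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(self_attention(tokens))
```

Because every token can mix with every other token in one step, even hybrid architectures that replace most layers with cheaper mixing (linear attention, state-space blocks, etc.) tend to keep at least one such layer, which is the inertia Yi is describing.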
11. Scaling Laws, Hardware Bottlenecks, and Data Efficiency
[54:52–59:33]
- Ongoing debate: Are we compute-bound, memory-bound, bandwidth-bound, or data-bound?
- The field remains unoptimized for data efficiency. Human learning provides an existence proof that AI can, in principle, learn much more from less.
Notable Quote:
“The data efficiency of humans is definitely way higher than models... where is the bug? Maybe it’s a feature, not a bug.” —Yi Tay [59:09]
12. World Models, Learning Efficiency, and the Road to AGI
[60:10–66:06]
- Exploration of “world models” as a paradigm for more data-efficient, generalizable learning.
- The future might lie in methods that allow AIs to form and update internal world models, not merely process sequences.
- However, precise definition and implementation of world models remain elusive.
13. Societal and Organizational Shifts: Asian AI Frontier and Building Teams
[78:42–84:46]
- Gemini is establishing a frontier lab in Singapore, aiming for global relevance and to inspire local talent.
“I think it's possible for Singapore to be close to the frontier. And...having the true pioneers of AI here, we want to...inspire the community here.” —Yi Tay [79:45]
- Remote/international collaboration is routine; the main constraint is time zones, not geography.
14. What Makes a Good Researcher?
[85:39–88:38]
- Yi emphasizes a preference for talent density, high raw IQ, and a proven “research taste” over credentials.
“You can just do good work, put it online and then somebody will contact you. It's actually super easy but super hard at the same time.” —Yi Tay [87:29]
- High competition makes it both easier and harder for talented grad students to attract attention: execution and taste are key.
15. Personal Health, Productivity, and Sustainability
[89:21–91:52]
- Yi shares success in losing 23kg over the past 1.5 years.
“I think being healthy is important to do good research...It's also impacted my work in a good way.” [89:28–91:17]
- Productivity is tied to health and long-term sustainability in intellectually demanding roles.
Notable Quotes & Memorable Moments
- On RL and Human Analogy:
“On policy-ness is more like humans. We are more on policy because we go around the world, we make mistakes and then we... ah, okay, this is... But imitation learning is mostly somebody else... not first principles.” —Yi Tay [06:25]
- On achieving IMO gold:
“If the model can't get to IMO gold, then can we get to AGI?” —Yi Tay [14:03]
- On dropping specialized tool chains:
“At the end of the day, you want one model for everything... most things can be subsumed by the model.” —Yi Tay [19:31]
- On AI coding as a research productivity force multiplier:
“These things are not going to replace one person...but more like a passive aura that buffs everybody in game terms. Right?” —Yi Tay [41:32]
- On the academic vs. industrial divide in IR/RecSys:
“Every time I ran some modeling experiments for RecSys...it feels like the environment was rude...You are in a world where the gravity is different...” —Yi Tay [76:14]
- On Singapore and “frontier” founding:
“I think it's possible for Singapore to be close to the frontier. And...having the true pioneers of AI here, we want to...inspire the community here.” —Yi Tay [79:45]
Highlighted Timestamps
- On joining Gemini Singapore and team naming: [01:00–01:24]
- RL as the new “main” modeling paradigm: [03:29]
- On-policy RL and human learning analogies: [05:17–06:53]
- Chain-of-thought, self-consistency, and reasoning in LLMs: [11:32–12:33]
- Achieving IMO gold w/ Gemini, process and surprise: [12:33–24:34]
- Benchmarks: Pokémon as agentic planning test: [26:43–31:12]
- Using AI for own research (AI coding): [36:59–40:10]
- Architectural inertia, attention and Transformers: [45:22–48:59]
- Data efficiency and human learning as a target: [54:52–59:33]
- Establishing Singapore as a global AI research hub: [78:42–84:46]
- On research hiring, taste, and "stat points": [85:39–88:38]
- Personal health transformation and productivity: [89:21–91:52]
Final Thoughts
This episode provides a deeply authentic look into some of the brightest minds shaping AGI research today. Listeners get a sense of the intensity, collaborative energy, and reflective humor that powers the field—alongside practical wisdom on everything from team-building and hiring to the importance of physical and mental health. The dialogue is equal parts technical, philosophical, and personal, making it a must-listen for anyone interested in the bleeding edge of AI research and engineering culture.
