Latent Space: The AI Engineer Podcast
Episode: [State of RL/Reasoning] IMO/IOI Gold, OpenAI o3/GPT-5, and Cursor Composer — Ashvin Nair, Cursor
Guest: Ashvin Nair (Cursor, formerly OpenAI, Berkeley RL PhD)
Host: Latent.Space
Recorded: NeurIPS, December 30, 2025
Episode Overview
This special NeurIPS episode features Ashvin Nair, a recent addition to Cursor and a former OpenAI researcher, on the evolving frontier of reinforcement learning (RL), reasoning in language models, code generation, and the shift from major labs like OpenAI to high-velocity startups. The conversation traces Ashvin's trajectory through robotics and RL research, his time on OpenAI's code and reasoning teams, the significance (and limitations) of recent language model achievements, and finally his move to Cursor, where product/model co-design and continual learning drive rapid model improvement for coding agents.
Major Discussion Topics
1. From Robotics to Language Models: The Transfer of Grit and Skills
- Robotics researchers frequently move into language models, carrying valuable real-world problem-solving skills.
- "Robotics is a pretty good fit for LLMs because the switch ends up being— you kind of do similar things... it's kind of hard to get stuff working in the robotics world. I think it builds very gritty people." (01:31)
- The contrast between robotics’ “gritty” real-world focus and the sometimes “unhinged” simulation world is noted.
- "Lex [Fridman] said: the most rounded people are the robotics people because they don't have a choice. They work with the real world. The most unhinged... are the simulation people." (02:01)
2. The State and Hype Cycle of Robotics vs. Software AI Agents
- Robotics funding is increasing, but practical breakthroughs and market size still lag behind software agents.
- "It feels like LLM agents are going to be like a trillion dollar market before robotics is maybe even like a $10 billion market." (03:46)
- Robotics is compared to where language models were at the GPT-1/GPT-2 stage: impressive demos, but not at a generalization or economic inflection point. (04:54)
3. OpenAI, GPT Codex, and IOI/IMO Gold Medals
- Ashvin joined OpenAI’s CodeGen team before ChatGPT became public.
- "I was like, okay, I'm going to go to this chill research lab... and then ChatGPT happens and everything blew up." (06:00)
- The significance and “moving goalposts” of AI benchmarks:
- "If you told me we could have gotten IOI gold, then I would have just assumed... AI is solved, no point in working anymore." (07:07)
- Yet, even winning at these benchmarks hasn’t translated to total automation or societal upheaval, which leads to meta-reflection on progress.
- "I think that's actually what I spend a lot of time thinking about, is why is that the case?" (07:30)
4. Shifting the Goalposts and Overfitting to Benchmarks
- AI has a history of achieving “impossible” things (chess, Go, IOI gold), yet society adapts and demands more.
- "I think shifting the goalpost to some extent is correct. We keep Goodharting whatever goalposts we have." (08:13)
- Reinforcement Learning research itself often overfit to its own benchmarks during the “RL hype” in academia.
- "We gave ourselves a lot of new knobs to tune and then... tuned those to fit the benchmarks. Everyone knew... but it's hard to appreciate that's happening at the community level." (09:09, 10:15)
- The "RL winter," and how academia incentivized theoretical novelty over simple, scalable solutions.
- "One of the pitfalls of academia is it doesn't really reward simple ideas that work... the things that actually work tend to be kind of simple." (10:38)
5. The Scaling Era: Is It Over?
- Scaling works, but not as an infinite engine for automating all jobs.
- "I think scaling is still happening, but it's worth seriously interrogating... why we’re not just automating all jobs right now." (11:45)
6. Limitations of RL in LLMs and Pathways to Practical Use
- RL in LLMs generalizes to only narrow slices of tasks; what's missing is bringing practically valuable context into scope.
- "It really feels like, if RL is a tool, then a big thing that needs to happen is... bring the world of economically useful tasks in distribution for RL." (12:21)
- Example: GDP-eval is a push to benchmark LLMs on real-world economically valuable tasks. (13:43)
7. General-Purpose Models vs. Specialist Models
- OpenAI has moved away from “one model fits all.”
- "Fiji [Simo] writing a blog post: we are no longer doing one model fits all." (17:02)
- It’s more of an organizational constraint than a technical limitation; orgs invest in the data that reflects their needs.
8. The “Blip” at OpenAI and the Unsettled State of AGI Governance
- Ashvin recalls the chaos around Sam Altman’s temporary firing and the uncertainty regarding board/governance structure.
- "It does feel like, no matter if we hit AGI in two years or ten or whatever, it’s not clear we have a good structure for governance." (19:36)
9. Origins of the Reasoning Paradigm (“o” Series at OpenAI)
- OpenAI’s “o” line models derive from a conviction that RL is the path to AGI, but RL truly started to click after pretraining improved.
- "People have been convinced for a long time that RL would be the way to get there— it started to work once pretraining got good enough." (23:15)
- The “big leap” moments are often perceived externally, but from inside the labs, progress feels incremental and continuous.
- "Internally at OpenAI it feels very smooth... you keep having different runs that get a little better each time." (26:07)
10. Cursor: Why Join a Startup Now?
- At Cursor, the small, tight org allows product-model co-design—something hard at scale in OpenAI.
- "At Cursor... the product people sit right next to the ML people. There's a lot of potential." (33:07)
- Ultra-fast RL policy updates (every 2 hours) are achievable because of this tight loop. (34:34)
- Continual learning in production is possible, but data curation and learning from mistakes (like humans do) remains an open problem.
- "Humans are quite good about bad data... you see someone touch a hot stove and you don't have to do it again. Models will happily keep doing it." (35:25)
11. What Makes Composer at Cursor Different?
- Composer's key edge is combining sufficient intelligence with speed, keeping the developer in the loop, in contrast to the friction caused by slow LLM response times.
- "Smart enough you actually want to use it and it's also fast, so you kind of stay in the loop... all the other smart models are slow." (37:15)
12. Frontiers in ML/RL Productization
- Product/ML co-design can finally automate much of the process of software engineering rather than just answer prompts.
- "Really automating software engineering as a process... you write code, you go look at datadog, come back, have hypotheses... rerun stuff." (38:08)
- A smaller, focused ML group and deep internal tooling lets Cursor move faster and understand its data/user needs more directly.
Notable Quotes & Memorable Moments
- On robotics' transferable skills:
  "It kind of builds very gritty people who look at data a lot, that kind of thing." — Ashvin (01:31)
- On the IOI/IMO milestone and its meaning:
  "If you told me we could have gotten IOI gold, then I would've just assumed that we could all go on vacation. It's all over. AI is solved." — Ashvin (07:07)
  "Feels like nothing that much has changed, right? Life is still the same." — Host (07:25)
- On the continually shifting goalposts for AGI:
  "We keep Goodharting whatever goalposts we have." — Ashvin (08:13)
- On overfitting to RL benchmarks:
  "We gave ourselves a lot of new knobs to tune and then implicitly tuned those to fit the benchmarks." — Ashvin (09:09)
- On product/model co-design at Cursor:
  "There's unique opportunities to co-design the product with the model in ways I think we couldn't do unless we actually built the model ourselves." — Ashvin (33:07)
  "We have policy updates every two hours." — Ashvin (34:34)
- On the limits and opportunities of continual learning:
  "We're a few orders of magnitude in data efficiency away from... you do something once or make a mistake, models will happily just keep doing it." — Ashvin (35:38)
- On Composer's practical edge:
  "It's smart enough you actually want to use it and it's also fast, so you kind of stay in the loop... all the other smart models are slow." — Ashvin (37:15)
Timestamps for Key Segments
- Robotics & Transfer to LLMs: 00:25–02:55
- Robotics vs. LLM Agents Market: 03:27–04:54
- Joining OpenAI, CodeGen, and ChatGPT Launch: 06:00–06:32
- IOI Gold Medals and the Real-World Impact: 07:03–08:13
- Shifting the Goalposts & Benchmarks: 08:13–09:09
- RL Academic Hype & Overfitting: 09:09–11:23
- Is Scaling Over? Moving to Economic Value: 11:41–13:34
- Generalization & Organization Limits in Foundation Models: 17:02–18:36
- The ‘Blip’ at OpenAI (Altman’s Firing): 18:46–21:03
- Origins of OpenAI’s Reasoning Paradigm: 22:11–24:48
- Continuous Progress vs. Perceived Leaps: 26:02–27:22
- Cursor’s Edge: Co-design, Fast RL Updates: 33:07–35:13
- Challenge of Continual Learning: 35:13–36:20
- What Makes Composer Stand Out: 37:15–38:08
- On RL Insights & Internal Tooling: 39:09–40:21
- Scientific Curiosity About Weight Capacity: 42:16–43:51
- RL Interview Question: 44:00–44:38
Final Reflections & Hiring Call
- Cursor is hiring ML engineers, especially those interested in data and reward systems for code.
  "If you're interested in working on... data and rewards for code, that's a huge need. Please get in touch." (44:43)
- Ashvin closes with a classic RL interview question:
  "Why is off-policy RL unstable?" (44:04)
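The interview question has a classic one-parameter illustration: with bootstrapping and linear function approximation, off-policy TD(0) can diverge even on a trivial two-state problem (a simplified version of the well-known "w, 2w" counterexample from the RL literature). A minimal sketch, assuming a behavior distribution that only ever samples the first transition:

```python
# Two states with linear value features: V(s1) = 1*w, V(s2) = 2*w.
# Deterministic transition s1 -> s2 with reward 0 and discount gamma.
# Under off-policy sampling that only ever visits the s1 -> s2
# transition, each TD(0) update pushes w toward gamma * 2w, which
# exceeds w whenever gamma > 0.5 -- so w grows without bound.
gamma, alpha = 0.99, 0.1
w = 1.0
feature_s1 = 1.0  # gradient of V(s1) with respect to w

history = []
for _ in range(50):
    td_error = 0.0 + gamma * (2.0 * w) - (1.0 * w)  # r + gamma*V(s2) - V(s1)
    w += alpha * td_error * feature_s1              # semi-gradient TD(0) step
    history.append(w)

print(f"w after 50 off-policy updates: {w:.1f}")  # diverges (~107 here)
```

This is the "deadly triad" of bootstrapping, function approximation, and off-policy data; the standard remedies (importance-sampling corrections, gradient-TD methods, or simply staying near on-policy, as most large-scale LLM RL does) all target exactly this instability.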
Takeaways
- Progress inside labs is incremental even when outside perceptions are of sudden leaps.
- RL has reached unprecedented impact in code and reasoning LLMs, but overfitting to benchmarks and lack of continual learning remain challenges.
- Product/model co-design and tight team structures can drive much faster improvement in applied ML, as evidenced by Cursor’s rapid iteration.
- The conversation about AGI governance is far from finished—the “blip” served as a reminder that foundational choices are still unresolved.
Listen or Read More
- Full show notes and transcript at: https://latent.space