Podcast Summary: a16z Podcast

Episode: From Vibe Coding to Vibe Researching: OpenAI’s Mark Chen and Jakub Pachocki
Date: September 25, 2025
Hosts: Andreessen Horowitz (Anjney Midha & Sarah Wang)
Guests: Jakub Pachocki, Chief Scientist, OpenAI & Mark Chen, Chief Research Officer, OpenAI

Main Theme & Purpose

This episode delves into OpenAI’s evolving research culture and technical direction, focusing on the long-term mission of building an “automated researcher”—AI that can autonomously discover new ideas and make economically meaningful contributions. Mark Chen and Jakub Pachocki discuss the research strategy behind GPT-5 and Codex, trends in “vibe coding” and “vibe researching”, benchmarks and evaluation strategies, RL’s persistent relevance, managing a world-class research organization, and maintaining a balance between product and fundamental research.

Key Discussion Points & Insights

1. The “Automated Researcher” Vision

Jakub Pachocki (00:00, 07:04): OpenAI’s ambition is to automate not just coding or chatbots, but the discovery of new ideas—making AI that acts as an autonomous researcher across domains, including science and ML itself.

"The big thing that we are targeting is producing an automated researcher. So automating the discovery of new ideas."
Moving beyond solving already-known problems to advancing technology—and science—months or years ahead.

2. GPT-5 and Mainstream Reasoning

Mark Chen (01:51): GPT-5 is a milestone aimed at bringing deep reasoning abilities to mainstream AI use, bridging prior “instant” models and more thoughtful, process-oriented series.

"We think the future is about Reasoning—more and more about reasoning, more and more about agents. And GPT-5 is a step towards delivering reasoning and more agentic behavior by default."
User experience focus: abstracting away “thinking modes,” so people need not worry about model selection for different tasks.

3. Evals: Moving to Meaningful Benchmarks

Jakub Pachocki (03:11):
- Traditional evaluation metrics (e.g., accuracy on standard tasks) are nearing saturation; moving from 96% to 98% no longer feels significant.
- New focus is on evaluations that probe actual discovery and “economically relevant” breakthroughs, such as top placements in global math/coding competitions or meaningful outputs in real-world settings.
"I think the big things that we look at are actual marks of the model being able to discover new things."

4. Long Horizon Agency & Model Autonomy

Jakub Pachocki (07:04):
- Measuring model progress in terms of “how long can the model autonomously operate”—expanding from mastering high school-level competitions to planning and memory over multi-hour, multi-step research tasks.
Sarah Wang (08:04-09:38): Team reflects on the stability/autonomy tradeoff—task depth sometimes leads to quality regressions, but new reasoning models have extended reliable work over longer “horizons”.

5. Extending Progress to Less-Verifiable Domains

Jakub Pachocki (09:52):
- Transition from math/science (where there’s a “right” answer) to open-ended domains is not a qualitative leap—many hard, open scientific problems actually require the same style of broad, open exploration and synthesis.

6. RL’s Ongoing Utility

Anjney Midha & Jakub Pachocki (11:18-12:58):
- Despite predictions that reinforcement learning (RL) would plateau, it continues to produce major gains in language models and generalist AI. The key challenge was finding the right “environment” for RL in language, now solved in part by leveraging massive language data.
"RL is a very versatile method...once you have an RL system working a long time, you have the ability to actually execute on these different ideas in this extremely robust, rich environment."
Jakub Pachocki (13:54): The future for reward modeling is that it becomes more human-like and less about carefully hand-crafting datasets:

"We will be inching towards more and more human-like learning, which RL is still not quite."

7. The Codex Leap & "Vibe Coding"

Mark Chen (14:35): Codex is now about real-world messy coding:
- Adapting model latency for problem complexity—fast results for easy tasks, more processing for hard ones.
- Focused on nuanced aspects of coding—style, proactivity, laziness.
"Real world coding is very messy...We've done a lot of work to dial in on [presets]: for easy problems, much lower latency; for harder problems, higher latency gets you the best solution."
Jakub Pachocki (16:43): Coding models now often surpass top human competitors and take over labor-intensive work (e.g., “30 file refactor in 15 minutes”). He is moving beyond his old-school workflow (just Vim!) because of these gains.
Mark Chen (18:31): “Vibe coding” is now the student default—AI-first. The aspirational analog is “vibe researching”—lowering research’s barrier to entry.

8. The Nature of Great Research(ers) & Vibe Researching

Jakub Pachocki (20:36):
- Persistence and honest hypothesis-testing are key.
- Great research requires belief in the value of hard, sometimes “impossible” problems, but also the humility to course-correct.
"The special thing about research...is you're trying to create something or learn something that is just not known."
Mark Chen (21:34): Experience matters, as does the ability to manage one’s emotions amid repeated failures and learning from colleagues and the literature which problems are interesting.

9. Organizational Resilience & Research Culture

Mark Chen (25:55): Recruiting/retaining great talent depends on authentic commitment to fundamental research, not reactive product competition.
Jakub Pachocki (27:46): Indicators for hiring: Track record of solving hard problems (not just visibility/social media presence).
Mark Chen (28:54): Recognizing researcher diversity; some excel at idea generation, others at execution.
Mark Chen (30:08): Protecting time and focus for fundamental research is critical in a competitive, product-driven landscape.

10. Product, Research, and Resource Balancing

Delineation between researchers devoted to core advances and those focused on product; coordinated yet protected.
Jakub Pachocki (33:38): Despite independence and divergence, all projects are ultimately linked by the automated researcher goal.
Mark Chen & Sarah Wang (37:25-39:43): Compute remains the main constraint ("Compute is destiny"), and resource allocation is dynamically managed. Claims that data, not compute, would be the bottleneck have not played out.

11. Universities, OpenAI, & The Pace of AI Progress

Mark Chen (40:13): OpenAI’s residency program is modeled after a condensed PhD, creating a powerful talent pipeline.
Jakub Pachocki (41:18): Academia instills persistence and the value of grappling with hard, "stuck" problems—transferable to OpenAI’s culture of tackling ambitious challenges.

12. Handling Perception, Roadmap, and Change

Jakub Pachocki (42:43): Strong convictions steer research regardless of product hype—feedback incorporated, but vision drives priorities.
Mark Chen (43:30): Product launches aim for wild success, but fundamental research is about developing core capabilities for long-term impact.

13. Company Culture, Speed, and Enduring Principles

Mark Chen (45:50): OpenAI avoids the “learning plateau” with its culture of constant change and high volume of research breakthroughs.
Jakub Pachocki (46:45): Eternal readiness for paradigm shifts—always learning the “new thing”.

14. Co-Leadership, Chemistry & Trust

Mark Chen & Jakub Pachocki (47:17-49:24): Their deep mutual trust developed while jointly tackling unpopular, high-risk research, forming the backbone of OpenAI's reasoning advances and team cohesion.

"I think over time, kind of growing a very small effort into increasing larger effort..." — Mark Chen
"Mark... got a group of people working on very different things, got them all together and created a team with incredible chemistry." — Jakub Pachocki

Notable Quotes & Memorable Moments

Jakub Pachocki [00:00]:

"The big thing that we are targeting is producing an automated researcher. So automating the discovery of new ideas."
Mark Chen [01:51]:

"We think GPT-5 is this step towards delivering reasoning and more agentic behavior by default."
Jakub Pachocki [03:11]:

"...For a lot of [evals], inching from 96 to 98% is not necessarily the most important thing in the world [...] Now we have these different ways of training, in particular reinforcement learning on serious reasoning, where we can pick a domain and we can really train a model to become an expert in this domain, to reason very hard about it..."
Mark Chen [18:31]:

"This past weekend I was talking to some high schoolers and they're saying, oh, you know, actually the default way to code is Vibe coding."
Jakub Pachocki [20:36]:

"Persistence is a very key trait... you're trying to create something... that is just not known. It's not known to work. You don't know whether it will work. And so always trying something that will most likely fail."
Mark Chen [25:55]:

"We have a fairly clear and crisp definition of what it is we're out to build. We like innovating at the frontier, we really don’t like copying..."
Mark Chen [39:35]:

"Anyone who says [we’re not compute constrained] should just step into my job for a week... there's no one who's like, oh, you know, I have all the compute that I need."
Jakub Pachocki [44:53]:

"More broadly than compute, there's physical constraints of energy. But also at some point, not too far, robotics will become a major focus."

Timestamps of Key Segments

Automated Researcher Vision: 00:00–00:13, 07:04–08:04
GPT-5 Mission & Reasoning: 01:51–02:43
Shift in Evals: 02:56–05:12
Competition Wins & Economic Relevance: 05:12–07:04
Long-Horizon Reasoning & Agency: 07:04–09:13
Extending Progress Beyond Science: 09:38–11:18
RL’s Enduring Effectiveness: 11:18–14:21
Codex, Coding, and Vibe Coding: 14:21–20:07
Great Research & Researcher Traits: 20:07–24:47
Organizational Resilience & Talent: 24:47–30:44
Research vs Product & Resource Balancing: 30:44–39:16
Academia vs Startup R&D: 39:43–42:16
Culture, Speed, and Enduring Research Principles: 42:16–46:45
Trust & Leadership at OpenAI: 47:17–50:40

Conclusion

This conversation offers a rich, behind-the-scenes look at OpenAI’s DNA: how an obsession with deep reasoning, persistent frontier-seeking, and a culture of autonomy and trust have positioned the organization for another leap, beyond “vibe coding” to “vibe researching”. The mission: AI that can truly discover, invent, and advance humanity’s knowledge across any field.
