Podcast Summary:
80,000 Hours Podcast
Episode: Every AI Company's Safety Plan is 'Use AI to Make AI Safe'. Is That Crazy? | Ajeya Cotra
Date: February 17, 2026
Host(s): Rob Wiblin, Luisa Rodriguez
Guest: Ajeya Cotra (Senior Advisor, Open Philanthropy)
Main Theme
This episode features an in-depth conversation with Ajeya Cotra about the dominant AI safety plan at major AI labs: using increasingly capable AI systems themselves as integral tools for ensuring future AI systems remain safe and aligned. The discussion explores the logic and risks behind this strategy, the vast disagreements about the pace and impact of AI progress, transparency requirements for labs, and what governments, philanthropies, and individuals can do to prepare for transformative AI. The latter half is a candid reflection on Ajeya's career in effective altruism, management, burnout, and the maturity of the EA movement.
Key Discussion Points and Insights
1. Overview: How AI Companies Plan for Safety
- AI Labs’ Stated Plans:
- OpenAI, Anthropic, and Google DeepMind plan to “use AIs to make AIs safe.”
- As AIs become more capable, they are incorporated into both the research and enforcement of AI safety protocols.
“In all of their stated safety plans, you see this element of as AIs get better and better, they're going to incorporate the AIs themselves into their safety plans more and more.” (Ajeya, [00:00])
- Crucial Safety Bottleneck:
- The challenge is to harness very capable AIs for positive ends before their agency outpaces human control.
- Either: extensive oversight slows progress, or unchecked progress hands over crucial levers to AI.
“It either like bottlenecks our progress because we're checking on everything all the time …. or it doesn't bottleneck our progress, but we hand the AIs the power to take over.” (Ajeya, [00:32]; repeated at [56:59])
2. Disagreements on AGI's Impact and ‘Intelligence Explosion’
- Definition Drift:
- "AGI" is defined inconsistently—some (VCs, technologists) use it to describe current models; others (AI safety, classic futurists) use a much stricter standard.
“VCs have an instinct to call something AGI that is like GPT5 … something just much milder.” (Ajeya, [04:11])
- Spectrum of Predictions:
- Ajeya forecasts the equivalent of a “10,000-year leap” by 2050; others expect only moderate progress.
- Predictions of AI’s effect on annual economic growth range from a 0.3-percentage-point boost to over 1,000%.
“It's an almost unfathomable degree of disagreement among people … a 10,000 fold disagreement” (Rob, [14:10])
- Error Theories & Priors:
- Slow-growth camp sees technological hype as a recurring fallacy (e.g., computers, radio, TV didn’t create obvious productivity booms).
- Fast-growth camp points to historical accelerations (industrial revolution) and feedback loops.
“Every time … they've always been wrong … you have this strong prior that … someone could have made the same argument about television … about computers. None of these played out.” (Ajeya, [21:14])
- Empirical Heads-Up?
- Early warning requires randomized controlled trials (RCTs), industry benchmarking, and AI adoption metrics.
- Ajeya cites METR’s RCT finding that AI tools actually slowed developers down — a surprising result, though not one she expects to persist.
3. Benchmarking/Transparency for Warning Signals
- Transparency Requirements:
- Labs should systematically report internal benchmark results, AI use for coding/reviews, safety incidents.
- Benchmark-only disclosures lag real progress; internal productivity gains from AI use are the key signal.
“I would really like them to be reporting their most concerning misalignment related safety incidents. … But then of course it's clear that reporting that is very embarrassing to companies.” (Ajeya, [38:50])
- Barriers:
- Commercial incentives and competitiveness restrain disclosure.
- Companies may wish to keep their pace of progress secret, both for competitive edge and PR purposes.
- Ideally, reporting would be public, not just to niche regulators, so the broader research/scientific community can respond.
“It's more like we want this information out there in the open and then we want people to do ... involved analyses of it. … I think it would be very hard for [regulators alone] to interpret the evidence quickly enough.” (Ajeya, [41:03])
4. How to Use AI Labor During an ‘Intelligence Explosion’
- Redirecting AI Labor:
- Upon hitting rapid self-improvement, focus AIs on:
- Alignment research (control, safety, interpretability).
- Societal defenses (cybersecurity, biodefense, disinformation resilience).
- Epistemic improvements (helping decision-making, compromise, and policy).
- Avoiding value lock-in, dangerous arms races, or "grabby" power transitions.
- Window of Opportunity:
- Plan relies on a window (6–24 months?) where AIs are capable but not yet uncontrollably powerful.
- If “foom” is instant (i.e., a hard takeoff), or AIs only aid in making better AIs (and nothing else), the plan could fail.
“There exists a window of opportunity before AIs are uncontrollably powerful or have created unacceptable levels of risk, where they are really capable and really change the game for AI safety research.” (Ajeya, [63:05])
- Practicality and Limitations:
- Success requires shifting substantial compute from making AIs smarter to making them safer.
- Risk: in competitive races, safety efforts may lose out to capability accelerations.
5. Open Philanthropy, Civil Society, and Individual Strategies
- Ajeya's Perspective:
- Open Phil should be tracking and preparing for the moment when most technical work can be meaningfully automated.
- Consider massive, rapid spending (“dumping billions”) for AI labor at crunch time.
- Hedge against compute scarcity (e.g., buy GPUs, Nvidia stock).
- Barriers:
- Will Open Phil or similar outsiders be able to buy top-tier AI time when it matters most?
- Price and even basic access to leading AI models/chips could be constrained.
“You get to a point where one company … keeps its internal best systems to itself and only releases systems that are considerably worse than its internal frontier.” (Ajeya, [90:44])
- Call to Action:
- Governments and organizations should proactively ready themselves to adopt AI quickly in critical domains.
- Regularly test how automatable various workstreams are.
6. Ajeya’s Career Reflections and Effective Altruism
- Career Trajectory:
- Ajeya recounts her shift from deep technical AI research to technical grantmaking at Open Phil, her struggle to find depth and feedback, her perfectionism, and the management challenges that followed Holden Karnofsky’s departure.
“I felt like, kind of lonely … I just really have a high need for like constantly talking to other people … I was not very good at hiring and management.” (Ajeya, [122:58])
- Burnout & Sabbatical:
- After struggling with limited feedback and high stress, she took a sabbatical, reflected on her motivational drivers, and recognized a desire to be “plugged in” to the organization’s strategy.
- Return & Future:
- Ajeya returned to Open Phil to help new leadership but is considering a move to research roles at METR or Redwood Research, where she could prioritize depth and thrive in a more collaborative, integrated environment.
- Reflections on EA:
- Ajeya discusses her longstanding attraction to EA—particularly its scope for caring about distant/future beings, its rigorous nerdiness, and its extreme integrity.
- She notes a loss of that extreme transparency and philosophical engagement as Open Phil matured and took on adversarial challenges.
“A big part of my motivation came from … intellectual depth and this like crazy high level of openness, transparency, like having absolutely nothing to hide.” (Ajeya, [140:52])
Notable Quotes and Memorable Moments
On the scope of changes that could come from AGI
“There’s a pretty good chance that by 2050 the world will look as different from today as today does from the hunter gatherer era … 10,000 years of progress rather than 25 years of progress.”
— Ajeya Cotra ([05:49])
On disagreement over the pace and impact of AI progress
“It's an almost unfathomable degree of disagreement among people … they've spoken about this, they've shared their reasons and they don't change their mind and they disagree by a thousand fold.”
— Rob Wiblin ([14:10])
On transparency and the politics of being forthcoming
“My instinct is to just do more of that and to just say more and respond. But it's harder to do that from Open Phil's position for a number of reasons.”
— Ajeya Cotra ([143:32])
On the practical challenges of using AI for safety at crunch time
“Anything that requires a large corporation to be super discontinuous in something it's doing is facing big headwinds as a plan. So I would hope that they're sort of smoothly increasing the amount of internal inference compute that is going towards safety as the AIs get better and better …”
— Ajeya Cotra ([82:11])
On the evolution of Effective Altruism as a community
“I think I was, to some extent, kidding myself about how much of my own motivation … came from just the goals ... the latter two things [intellectual depth, transparency] were actually really important for my motivation and they were … over time just like smaller and smaller features of what it was like to do EA.”
— Ajeya Cotra ([140:52])
Timestamps for Core Segments
- 00:00 – 06:17: Introduction and Ajeya’s track record of AI predictions, shifting definitions of AGI.
- 06:18 – 16:51: The spectrum of AGI expectations, economic growth, and feedback loops.
- 20:02 – 23:26: Why the community doesn't converge on the likely speed/impact of AGI.
- 23:26 – 29:56: Empirical evaluation, benchmarks, RCTs, and transparency for early warning.
- 30:15 – 38:50: Challenges with transparency; internal vs. external reporting; competitive risks.
- 40:03 – 46:17: Reporting to governments; the need for public, not just governmental, awareness.
- 46:28 – 61:08: What to do at “crunch time”; redirecting AI labor to alignment/defense activities.
- 62:06 – 68:26: Assumptions/limitations of using AI for AI safety; capability balance risks.
- 70:36 – 86:13: Critical bottlenecks; Open Phil’s strategic preparations; barriers to implementation.
- 90:21 – 97:50: Will external actors get access to top-tier AI models during crunch time?
- 98:22 – 111:24: Will AI companies buy in to this plan, or default to profit-seeking/power-seeking?
- 115:28 – 129:28: Ajeya’s personal reflections, management, burnout, and her shift away from grantmaking.
- 130:23 – 143:32: Integrity, transparency, and changing culture in both Open Phil and EA.
- 143:32 – 158:13: EA as a movement—parallels with religion, organization, and the need for spiritual grounding.
- 162:08 – 173:30: Ajeya’s next steps; general lessons about organizational fit and adaptation.
Final Takeaways
- The AI field’s plan to use “AIs to make AIs safe” has a coherent logic but carries crucial risks shaped by competitive and political realities, the order in which capabilities emerge, and whether society gets meaningful early warning.
- Disagreements about the impact and speed of AI progress remain profound and unresolved, even after extensive debate among experts.
- Transparency from AI labs about internal use, benchmark progress, and misalignment incidents is essential for public oversight, yet strongly opposed by commercial interests.
- At the key “crunch time,” effective redirection of AI labor (and mass philanthropic/government spending) toward safety and societal resilience could make or break humanity’s future—if access and readiness align.
- Ajeya’s career journey illustrates the complex tradeoffs between depth, impact, professional fit, and personal values within EA and the AI safety ecosystem.
- Effective altruism’s core strengths—altruistic scope, nerdy rigor, and high integrity—face new challenges as the community matures; its niche may increasingly lie in cultivating speculative, ahead-of-the-curve work that other communities avoid.
End of summary. For a deep dive into any section, refer to the timestamps provided above.
