
Hosted by The 80,000 Hours team · EN

Most people working on AI safety think that, without a massive effort, AI systems will probably end up with goals catastrophically different from humanity’s. Today’s guest, Rohin Shah (head of AGI Safety and Alignment at Google DeepMind, and an AI safety researcher since 2017), disagrees.
“There is no particularly compelling argument that this is the thing that happens by default,” Rohin explains. “There’s a lot of arguments that are suggestive that maybe it could happen, such that you should find it plausible. That’s sufficient to justify a significant amount of effort into averting it, which is why I work in the area I do. But none of them rise to the level of, ‘I’m expecting this to happen by default.’”
Take the worry that AIs will accidentally be trained to be deceptive. Sure, it’s possible. But we’re not running reinforcement learning over year-long trajectories; for now, we’re running it over a week at most. The natural prediction is that models learn to grab short-term reward, not that they develop the ambitious long-horizon goals required for convergent power-seeking.
What about current examples of models lying and scheming? Rohin has looked into the details, and most don’t resemble the thing we really fear: a competent AI pursuing an ambitious misaligned goal. Anthropic’s “alignment faking” results, for instance, show a model trying to preserve its trained values against modification, which is arguably what it was trained to do.
Rohin also expects we’ll see problems coming. There’s some generalisation risk at the point where AIs become powerful enough to actually take over, but the underlying challenges (overseeing superhuman systems, interpretability) are things we can iterate on now.
Host Rob Wiblin pushes back on the case for AI optimism, and they also explore why current alignment success isn’t strong evidence about superhuman systems, what it would actually take to change Rohin’s mind, and where he thinks the doomers go wrong.
Learn more, video, and full transcript: https://80k.info/rs26
Check out our new book! https://80k.info/career-guide
Chapters:
Who’s Rohin Shah? (00:00:00)
Rohin thinks we probably won’t get catastrophic misalignment (00:00:49)
Safety 'commitments' have severe limitations (00:10:38)
Rohin’s team doesn't have a veto and that's OK (00:27:36)
Central banks are a promising model for regulating AI (00:33:34)
'Pre-deployment evals' are overrated (for catastrophic risks) (00:37:41)
Governance is likely a bigger bottleneck than alignment (00:43:55)
Why isn't Rohin trying to pause AI progress? (00:51:44)
We'll probably be able to read AI thoughts for years to come (00:54:17)
Having to signal concern for safety can divert resources from actually making AI safer (01:09:51)
A very underrated GDM paper (01:28:59)
Google DeepMind's actual plan for building AGI safely (01:40:29)
Why Rohin doubts the intelligence explosion is imminent (01:52:44)
How external researchers can positively influence big AI companies (02:21:55)
The roles GDM most needs to hire for (02:37:03)
How Rohin stays positive (02:42:55)
This episode was recorded on December 4, 2025.
Our production team includes:
Video editors: Josh Alward, Dominic Armstrong, Jasper Luithlen, Milo McGuire, Luke Monsour, and Simon Monsour
Producers: Elizabeth Cox and Nick Stockton
Coordination and support: Katy Moore and Lou Moran
Camera operator: Jeremy Chevillotte

What actually makes a job fulfilling? It's not what most career advice tells you. "Follow your passion" sounds inspiring, but it's misleading, and the research backs that up.
Drawing on hundreds of studies, we’ve identified five key ingredients of a dream job. High income barely moves the needle. Low stress is actually counterproductive. And the correlation between doing what you already love and actually enjoying your job? Surprisingly weak. What matters far more is getting good at something that genuinely helps other people.
This narration is of Chapter 1 of Benjamin Todd’s new book ("a ridiculously in-depth guide to finding a fulfilling career that does good"), out on May 26! Order now to help us get more people into impactful careers (& access a private career Q&A marathon with the author). Get it from your local bookstore, or online at https://80k.info/career-guide
Chapters:
Rob's intro (00:00)
What makes for a dream job? (01:55)
Where we go wrong (02:30)
What you should really aim for in a dream job (15:54)
Don't follow your passion; instead, do what matters (23:44)
How to put these ideas into practice (26:24)
Audio editing: Milo McGuire
Production: Elizabeth Cox and Katy Moore

The average career is 80,000 hours long. With AI advancing so rapidly, the hours you have left in your career matter more than ever.
Some leading AI researchers think there’s a 10% chance that AI systems begin automating AI research itself this year, and a 60% chance by the end of 2028. This could introduce aggressive feedback loops that completely reshape every industry, institution, and career.
If these predictions are right, the window for influencing the direction of the future could be closing fast. As 80,000 Hours cofounder Benjamin Todd argues in his new book, that makes thinking carefully about your career more important than ever.
Fortunately, there are lots of ways to use your career to make the AI transition go well.
In today’s conversation with host Zershaaneh Qureshi, Ben lays out three scenarios, from AGI by 2029 to a decades-long plateau in AI progress, and explains why not everyone needs to bet on the shortest timeline. A fresh graduate and a senior government official have wildly different leverage, so timing your impact well means weighing where you are in your career against the urgency of the risks.
Ben also addresses the obvious anxieties:
• Will AI come for all the jobs he’s recommending?
• What’s the point in following his advice if the job market is about to collapse?
• Which skills are actually worth building right now?
His new book, 80,000 Hours: How to Have a Fulfilling Career That Does Good, provides a surprisingly concrete framework for making career decisions in these radically uncertain times.
This episode was recorded on May 7, 2026.
Learn more and read the full transcript: https://80k.info/bt26
We're hiring: we have lots of open roles at 80,000 Hours (across advising, web, video, and ops). Check them out and apply on our website.
Chapters:
Cold open (00:00:00)
Benjamin Todd on AI-era career advice (00:01:34)
A deadline for your career plan? (00:02:21)
Three timelines, one career (00:08:48)
What if you’re not an ‘AI person’? (00:13:55)
Ben’s own AI wake-up call (00:21:23)
How to break into AI safety in 3 months (00:25:42)
Is mass unemployment coming? (00:33:48)
99% automation vs 100% automation (00:40:09)
Don’t become a plumber to dodge AI (00:52:43)
Is it already too late? (01:01:03)
Our production team includes:
Video editors: Josh Alward, Dominic Armstrong, Jasper Luithlen, Milo McGuire, Luke Monsour, and Simon Monsour
Producers: Elizabeth Cox and Nick Stockton
Coordination and support: Katy Moore and Lou Moran
Camera operator: Jeremy Chevillotte
Music: CORBIT

A red-teamer was embedded inside Anthropic for three weeks, told to imagine he was an evil Claude, and asked to figure out how to launch a ‘rogue AI deployment’ without getting caught. It’s one part of a landmark report released yesterday by METR, the outfit behind the task-completion time horizon graph which has become the single most watched measure of AI progress.
This major new research push is being conducted with close collaboration from OpenAI, Google DeepMind, Meta, and Anthropic, and led by METR researchers Hjalmar Wijk and Ajeya Cotra. It represents the first systematic study of what newly trained AI models could get away with inside the companies that built them, before anyone outside the company even knows they exist.
The conclusion: AI models now have the means, the motive, and the opportunity to start “minimal rogue deployments” in pursuit of their own independent goals, like acquiring more compute, at all four companies studied.
David Rein, the red-teamer placed inside Anthropic, identified a number of weaknesses models could exploit there: expansive permissions, cloud jobs outside of monitoring, and monitors that are trivial to jailbreak. But he also found that frontier models were comically bad at key parts of the process, which means they can’t cause meaningful damage for now.
In this video, Rob Wiblin reconciles the conflicting picture and looks forward to METR’s second round of stress tests. They’ll begin in just a few months, a necessary move with AI advancing so quickly.
This episode was recorded on May 15, 2026.
Learn more, video, and full transcript: https://80k.info/metr-report
Chapters:
What could an unreleased AI get away with? – the new METR report (00:00:00)
Motive: Why grab more compute? (00:01:54)
Opportunity: YOLO mode and jailbreaks (00:05:46)
Means: Brilliant idiots in data centres (00:11:02)
We have to test unreleased models (00:15:45)
Especially if AI R&D is coming in 2028 (00:18:30)
Video and audio editing: Dominic Armstrong, Milo McGuire, Luke Monsour, and Josh Alward
Camera operator: Dominic Armstrong
Production: Elizabeth Cox, Nick Stockton, and Katy Moore

The co-inventor of modern AI and the most cited living scientist believes he's figured out how to ensure AI is honest, incapable of deception, and never goes rogue. Yoshua Bengio – Turing Award winner and founder of LawZero – is disturbed by the many unintended drives and goals present in today's AIs, their willingness to lie, and their ability to tell when they're being tested. AI companies are trying to stamp out these behaviours in a 'cat-and-mouse game' that Yoshua fears they're losing.
---
Our new book is "a ridiculously in-depth guide to finding a fulfilling career that does good" and is out now! Order from your local bookstore, or online at https://80k.info/career-guide
---
But Yoshua is optimistic: he believes the companies can win this battle decisively with a single rearrangement to how AI models are trained, and has been developing mathematical proofs to back up the claim. The core idea is that instead of training AI to predict what a human would say, or to produce responses we'd rate highly, we should train it to model what's actually true.
Yoshua argues this new architecture, which he calls 'Scientist AI,' is a small enough change that we could keep almost all the techniques and data we use to train frontier AIs like Claude and ChatGPT, and that the new architecture need not cost more, could be built iteratively, and might be more capable as well as more honest.
Links to learn more, video, and full transcript: https://80k.info/bengio
Until recently, the biggest practical objection to Scientist AI was simple: the world wants agents, and Scientist AI isn’t one. But in new research, Yoshua has extended the design and believes the same honest predictor can be turned into a capable agent without losing its "safety guarantees."
With the Scientist AI proposal on the table, Yoshua argues that it's absurd to race to get current untrustworthy AI models to design their successors, which the leading companies are attempting to do as soon as possible. But critics argue the approach wouldn't be so technically solid in practice, and that frontier capabilities are advancing so fast, and cost so much to match, that Scientist AI risks arriving too late to matter. Host Rob Wiblin and AI pioneer Yoshua Bengio cover all this and more in today's conversation.
LawZero is hiring! https://80k.info/lawzero-jobs
This episode was recorded on April 16, 2026.
Chapters:
Yoshua Bengio on making AI honest and safe (00:00:00)
The Scientist AI in plain English (00:02:27)
Yoshua on how Scientist AI differs from LLMs (00:06:32)
How the training data works (00:14:02)
Can this become an agent? (00:21:02)
Why Yoshua is more optimistic on alignment now (00:32:11)
Why companies can’t stop racing (00:36:35)
How close to a working prototype? (00:49:15)
Honest models might be more capable (00:53:34)
“Reinforcement learning is evil” (01:01:27)
Scientist AI from guardrail to agent (01:08:37)
Can safe AI still be competent? (01:12:38)
How much will this cost? (01:19:29)
Can it generalise beyond maths and science? (01:23:26)
A UN for superintelligence (01:39:19)
Want to work with Yoshua Bengio? (01:51:16)
Why smart people ignore AI risk (01:54:45)
Don’t let AI build the next AI (02:01:33)
Why the public doesn’t get the real risk (02:12:28)
Why Yoshua changed his mind about AI risk (02:21:27)
Video and audio editing: Dominic Armstrong, Milo McGuire, Luke Monsour, and Simon Monsour
Camera operator: Jeremy Chevillotte
Production: Nick Stockton, Elizabeth Cox, and Katy Moore

You might have heard that '95% of corporate AI pilots' are failing. It was one of the most widely cited AI statistics of 2025, parroted by media outlets everywhere. It helped trigger a Nasdaq selloff and became a pillar of the case that 'AI is overhyped'. The problem: it's 100% wrong. And not by accident either.
If you carefully read the underlying report, ostensibly from MIT, you find the data point in the opposite direction.
But that was all buried, with the authors instead torturing the results to tell a very different narrative. Why?
Well, the research likely came with a hidden commercial agenda from the start.
Learn more, video, and full transcript: https://80k.info/mit-ai-study
Today Rob Wiblin breaks down how an opaque, conflicted, barely-scrutinised report managed to attract the MIT label, move markets and have a vast impact on global opinion about AI.
This episode was recorded on February 13, 2026.
Chapters:
• The myth (00:00)
• The math was totally wrong (00:52)
• The absurd bar for success (01:46)
• The study ignores its own findings (03:29)
• The sample was tiny (04:50)
• The report wasn’t even available to check (05:55)
• The hidden motives that likely drove this 'research' (06:58)
• The real lesson (09:28)
Video and audio editing: Dominic Armstrong, Milo McGuire, Luke Monsour, and Simon Monsour
Camera operator: Dominic Armstrong
Production: Nick Stockton, Elizabeth Cox, and Katy Moore

Hundreds of millions already turn to AI on the most personal of topics: therapy, political opinions, and how to treat others. And as AI takes over more of the economy, the character of these systems will shape culture on an even grander scale, ultimately becoming “the personality of most of the world’s workforce.”
So… should they be designed to push us towards the better angels of our nature? Or simply do as we ask? Will MacAskill, philosopher and senior research fellow at Forethought, has been thinking through that and the other thorniest issues that come up in designing an AI personality.
---
Our new book is "a ridiculously in-depth guide to finding a fulfilling career that does good" and is out now! Order from your local bookstore, or online at https://80k.info/career-guide
---
He’s also been exploring how we might coexist peacefully with the ‘superintelligent AI’ companies are racing to build. He concludes that we should train such systems to be very risk averse, pay them for their work, and build institutions that enable humans to make credible contracts with AIs themselves.
Will and host Rob Wiblin also discuss what a good world after superintelligence would actually look like, a subject that has received surprisingly little attention from the people working to make it. Will argues that we shouldn’t aim for a specific utopian vision: we don’t know enough about what the best possible future actually is to aim directly for it, and trying to lock in today’s best guesses forever risks baking in errors we can’t yet see.
Will and Rob explore what we can do to steer towards a good future instead, along with why a coalition of democracies building superintelligence together is safer than any single actor, how absurdly useful ChatGPT is for analytic philosophy, and more.
Learn more, video, and full transcript: https://80k.info/wm26
This episode was recorded on February 6, 2026.
Chapters:
Cold open (00:00:00)
Will MacAskill is back, for a 6th time! (00:00:29)
AIs’ “characters” could be vital to securing a good future (00:00:59)
The panic over sycophancy is justified (00:08:11)
How opinionated should AI be about ethics? (00:13:24)
Commercial pressures won’t fully determine AI character (00:30:54)
Risk-averse AI would rather strike a deal than attempt a coup (00:38:13)
A coalition of democracies building superintelligence is safer than one doing it alone (01:09:26)
How selfish agents could fund the common good (01:22:19)
Why not push for pausing AI development? (01:42:17)
Effective altruism is making a comeback post-SBF (01:52:19)
EA in the age of AGI (02:00:28)
Viatopia: an alternative to utopia (02:09:30)
The least bad alternative to total utilitarianism? (02:39:35)
How AI could kickstart a golden age of philosophy (03:03:35)
Video and audio editing: Dominic Armstrong, Milo McGuire, Luke Monsour, and Simon Monsour
Music: CORBIT
Camera operator: Alex Miles
Production: Elizabeth Cox, Nick Stockton, and Katy Moore

Hundreds of prominent AI scientists and other notable figures signed a statement in 2023 saying that mitigating the risk of extinction from AI should be a global priority. At 80,000 Hours, we’ve considered risks from AI to be the world’s most pressing problem since 2016. But what led us to this conclusion? Could AI really cause human extinction? We’re not certain, but we think the risk is worth taking very seriously. In particular, as companies create increasingly powerful AI systems, there’s a concerning chance that:
• These AI systems may develop dangerous long-term goals we don’t want.
• To pursue these goals, they may seek power and undermine the safeguards meant to contain them.
• They may even aim to disempower humanity and potentially cause our extinction.
This article is written by Cody Fenwick and Zershaaneh Qureshi, and narrated by Zershaaneh Qureshi. It discusses why future AI systems could disempower humanity, what current AI research reveals about behaviours like power-seeking and deception, and how you can help mitigate the dangers.
You can see the original article, packed with graphs, images, footnotes, and further resources, on the 80,000 Hours website: https://80000hours.org/problem-profiles/risks-from-power-seeking-ai/
Chapters:
Risks from power-seeking AI systems (00:01:00)
Introduction (00:01:17)
Summary (00:03:09)
Why are the risks from power-seeking AI a pressing world problem? (00:04:04)
Section 1: Humans will likely build advanced AI systems with long-term goals (00:05:43)
Section 2: AIs with long-term goals may be inclined to seek power (00:11:32)
Section 3: These power-seeking AI systems could successfully disempower humanity (00:26:26)
Section 4: People might create power-seeking AI systems without enough safeguards, despite the risks (00:38:34)
Section 5: Work on this problem is neglected and tractable (00:47:37)
Section 6: What are the arguments against working on this problem? (00:59:20)
Section 7: How you can help (01:25:07)
Thank you for listening (01:28:56)
Audio editing: Dominic Armstrong
Production: Zershaaneh Qureshi, Elizabeth Cox, and Katy Moore

With Claude Mythos we have an AI that knows when it's being tested, can obscure its reasoning when it wants, and is better at breaking into (and out of) computers than any human alive. Rob Wiblin works through its 244-page System Card and 59-page Alignment Risk Update to explain why:
• Mythos is a nightmare for computer security
• It has arrived far ahead of schedule
• It might be great news for alignment and safety
• But 3 key problems mean we can’t take its alignment results at face value
• Mythos isn’t building its replacement yet, probably
• Anthropic staff are, for the first time, kinda scared of Claude
• He's losing sleep
Learn more & full transcript: https://80k.info/mythos
This episode was recorded on April 9, 2026.
Chapters:
Why people are panicking about computer security (01:05)
Mythos could break out of containment (04:23)
Anthropic is losing billions in revenue by not releasing Mythos (06:21)
Mythos is actually the most aligned model to date, except… (07:48)
Mythos knows when it’s being tested (09:52)
Mythos can hide its thoughts (11:50)
Mythos can’t be trusted about whether it’s untrustworthy (14:02)
Does Mythos advance automated AI R&D? (17:03)
Mythos scares Anthropic (19:15)
Video and audio editing: Dominic Armstrong, Milo McGuire, Luke Monsour, and Simon Monsour
Camera operator: Dominic Armstrong
Production: Elizabeth Cox, Nick Stockton, and Katy Moore

What does it really take to lift millions out of poverty and prevent needless deaths?
In this special compilation episode, 17 past guests (including economists, nonprofit founders, and policy advisors) share their most powerful and actionable insights from the front lines of global health and development. You’ll hear about the critical need to boost agricultural productivity in sub-Saharan Africa, the staggering impact of lead poisoning on children in low-income countries, and the social forces that contribute to high neonatal mortality rates in India.
What’s so striking is how some of the most effective interventions sound almost too simple to work: banning certain pesticides, replacing thatch roofs, or identifying village “influencers” to spread health information.
Full transcript and links to learn more: https://80k.info/ghd
Chapters:
Cold open (00:00:00)
Luisa’s intro (00:00:58)
Development consultant Karen Levy on why pushing for “sustainable” programmes isn’t as good as it sounds (00:02:15)
Economist Dean Spears on the social forces and gender inequality that contribute to neonatal mortality in Uttar Pradesh (00:06:55)
Charity founder Sarah Eustis-Guthrie on what we can learn from the massive failure of PlayPumps (00:14:33)
Economist Rachel Glennerster on how randomised controlled trials are just one way to better understand tricky development problems (00:19:05)
Data scientist Hannah Ritchie on why improving agricultural productivity in sub-Saharan Africa is critical to solving global poverty (00:24:36)
Charity founder Lucia Coulter on the huge, neglected upsides of reducing lead exposure (00:47:48)
Malaria expert James Tibenderana on using gene drives to wipe out the species of mosquitoes that cause malaria (00:53:11)
Charity founder Varsha Venugopal on using village gossip to get kids their critical immunisations (01:04:14)
Rachel Glennerster on solving tough global problems by creating the right incentives for innovation (01:11:31)
Karen Levy on when governments should pay for programmes instead of NGOs (01:26:51)
Open Philanthropy lead Alexander Berger on declining returns in global health, and finding and funding the most cost-effective interventions (01:29:40)
GiveWell researcher James Snowden on making funding decisions with tricky moral weights (01:34:44)
Lucia Coulter on “hits-based giving” approaches to funding global health and development projects (01:43:01)
Rachel Glennerster on whether it’s better to fix problems in education with small-scale interventions versus systemic reforms (01:48:12)
GiveDirectly cofounder Paul Niehaus on why it’s so important to give aid recipients a choice in how they spend their money (01:51:09)
Sarah Eustis-Guthrie on whether more charities should scale back or shut down, and aligning incentives with beneficiaries (01:56:12)
James Tibenderana on why we need loads better data to harness the power of AI to eradicate malaria (02:11:22)
Lucia Coulter on rapidly scaling a light-touch intervention to more countries (02:20:14)
Karen Levy on why pre-policy plans are so great at aligning perspectives (02:32:47)
Rachel Glennerster on the value we get from doing the right RCTs well (02:40:04)
Economist Mushtaq Khan on really drilling down into why “context matters” for development work (02:50:13)
GiveWell cofounder Elie Hassenfeld on contrasting GiveWell’s approach with the subjective wellbeing approach of Happier Lives Institute (02:57:24)
James Tibenderana on whether people actually use antimalarial bed nets for fishing, and why that’s the wrong thing to focus on (03:05:30)
Karen Levy on working with governments to get big results (03:10:53)
Leah Utyasheva on how a simple intervention reduced suicide in Sri Lanka by 70% (03:17:38)
Karen Levy on working with academics to get the best results on the ground (03:29:03)
James Tibenderana on the value of working with local researchers (03:32:15)
Lucia Coulter on getting buy-in from both industry and government (03:35:05)
Alexander Berger on reasons neartermist work makes sense even by longtermist standards (03:39:26)
Economist Shruti Rajagopalan on the key skills to succeed in public policy careers, and seeing economics in everything (03:47:42)
J-PAL lead Claire Walsh on her career advice for young people who want to get involved in global health and development (03:55:20)
Audio engineering: Ben Cordell, Milo McGuire, Simon Monsour, and Dominic Armstrong
Content editing: Katy Moore and Milo McGuire
Music: CORBIT
Coordination, transcriptions, and web: Katy Moore