
Hosted by Timothy B. Lee · EN

I don’t know anyone who has ridden in more different kinds of robotaxis than Sophia Tung. A YouTuber and the author of the RideAI newsletter, she is one of the most knowledgeable experts on the contemporary autonomous vehicle sector. She is also our first return guest.
Across multiple trips to China, Sophia has taken rides in the three leading Chinese services — Apollo Go, WeRide, and Pony. In the United States, she has spent time in vehicles made by Tesla, Waymo and Amazon’s Zoox. She describes her experiences in each vehicle, comparing ride smoothness, vehicle comfort, and performance on the tricky process of pickups and dropoffs. We also dig into the debate over custom-built vehicles — the Zoox vehicle is custom-built for autonomy, whereas Waymo’s service is built on a retrofitted Jaguar I-PACE.
Sophia argues that infrastructure is hugely important for the AV industry. In China, battery-swap stations get robotaxis back on the road in three minutes versus more than an hour of downtime in the US. Permitting is easier in China, and much of the Chinese supply chain sits within a stone’s throw of Shenzhen. An entrepreneur in China can jump on WeChat, visit a factory for tea, and have parts in hand within days. This gives China an edge not only in the electric vehicle market but in robotics more generally.
This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit www.aisummer.org

I talk to Divyansh Kaushik, a Carnegie Mellon machine learning PhD turned national-security advisor at Beacon Global Strategies, about the robotics race between the US and China and why winning the race matters for national security.
We dig into the state of robotic AI models—particularly vision-language-action (VLA) architectures—and why training them is harder than training LLMs. There's no internet-scale dataset of robot manipulation, so some companies are hiring humans in exoskeletons to perform real-world tasks. China has attacked this problem head-on, creating dozens of state-funded data-collection facilities.
Kaushik argues that the Pentagon, which once helped to bootstrap semiconductors and the early internet, could use its procurement and grand-challenge authorities to generate the contact-rich data American startups desperately need. We also explore China's hardware edge; Shenzhen's dense supply chains allow design iteration in a day, compared to weeks in the US.
Kaushik argues there’s an urgent national security case for US leadership in robotics. Unitree robots, which are increasingly used in academia and by law enforcement, have been observed transmitting video, audio, and other data to servers in China without the consent of users. Kaushik argues that the US was too slow to ban drones made by the market-leading Chinese firm DJI. And he worries that the US government will become even more reluctant to act as the next wave of Chinese-made robots enters American homes and factories.

Alex Imas is an economist at the University of Chicago Booth School who argues that the most important thing about an AI-saturated economy won’t be what machines can produce—it’ll be what humans still want from each other.
Imas’s central claim, laid out in his essay “What Will Be Scarce,” is that when AI can replicate every cognitive and physical task, demand for human provenance becomes the economy’s binding constraint. He backs this up with experimental evidence: in controlled settings, people’s willingness to pay for an identical good roughly doubles when it’s scarce and human-made, even when the hedonics are exactly the same.
We talk through how this plays out in practice—Starbucks pulling back automation because customers missed the barista experience, the historical pattern of agriculture and manufacturing shrinking as shares of GDP while services absorb displaced income, and the debate with economist Phil Trammell over whether new AI-created goods could crowd out the relational sector entirely.
The conversation turns darker when we discuss the transition to a post-AI world. Imas draws parallels to the Industrial Revolution, warning there were “huge losers” whose suffering gets swept under the rug. He favors David Autor’s proposal for a “universal basic capital” over simple UBI, but acknowledges a deep cultural problem: the relational jobs that survive are likely to disproportionately be care roles traditionally held by women, while the jobs most vulnerable to automation skew male. Can retraining programs—which have a poor track record—really bridge that gap? Or are we headed for a gendered economic rupture?

Last week Anthropic stunned the AI world by announcing Claude Mythos Preview—and then refusing to release it. Princeton’s Sayash Kapoor, co-author of the newsletter AI as Normal Technology, joins Tim and Kai Williams to make sense of the moment.
Kapoor argues that Mythos’ vulnerability-finding prowess, including unearthing a 27-year-old OpenBSD bug, fits a familiar pattern: fuzzing tools triggered similar alarm decades ago but ultimately strengthened defenders more than attackers. Kapoor’s “normal technology” thesis holds that AI’s impact is shaped less by capability jumps than by downstream adoption—how industries, legal systems, and institutions absorb the technology.
The conversation turns to whether alignment or control is the more promising safety strategy. Kapoor contends that the Mythos system card’s examples of the model bypassing access controls reveal shortcomings in control mechanisms, not alignment failures, and calls for ecosystem-level hardening—formal verification, sandboxing, network security—rather than relying on any single model behaving well.
Kapoor then shares his latest research finding that AI agent reliability is improving four to ten times more slowly than average-case accuracy, and that current frontier models—including GPT-5.2—haven’t cleared even “one nine” of reliability. On Sierra’s TauBench, agents confidently book wrong flights and refund thousands of dollars in error, with Gemini 2.5 claiming 100% confidence even when it fails. If each additional nine of reliability is harder than the last, does that mean the real timeline for autonomous AI isn’t set by when models get smart enough, but by when the surrounding infrastructure catches up?

Tim talks to Nat Purser, a tech policy advocate at Public Knowledge and a veteran of Democratic campaigns, about how policymakers on the left side of the political spectrum view AI.
Purser describes a Democratic landscape split between those who see AI as a real but threatening force and those who dismiss it as another crypto-style bubble. She traces how Sen. Bernie Sanders broke from the pack by treating AI as genuinely transformative—meeting with AI safety figures like Eliezer Yudkowsky and Nate Soares, proposing a federal data center moratorium with Rep. Alexandria Ocasio-Cortez, and openly saying he uses Claude himself. Purser contrasts this with the dismissive attitude she sometimes encounters among progressive elites.
She also details the fractures within labor: Hollywood actors and writers see AI as an existential threat to creativity, while construction unions welcome data center jobs. On the legislative front, she recounts how a bipartisan coalition crushed Ted Cruz’s ten-year preemption of state AI laws in a 99–1 vote, and argues that narrowly scoped preemption paired with federal standards is the only defensible approach.
Purser predicts the "stochastic parrots" camp — those who dismiss AI as mere corporate hype — will lose influence as AI capabilities grow. But it’s too early to say whether Democratic leaders, including the next Democratic presidential nominee, will embrace Sanders’s apocalyptic framing or take a more conventional approach focused on issues like privacy and nondiscrimination.

Author Ryan Avent joins Tim to revisit a bet they made 16 years ago—and to ask whether the lessons of self-driving cars apply to modern AI.
Back in 2010, Avent wagered that his newborn daughter would never need a driver’s license thanks to self-driving cars. Tim bet she would and ultimately won $500. But he was right for the wrong reasons. Tim assumed regulation would be a major obstacle to progress in self-driving technology, but logistical challenges and a long tail of edge cases have done more to hamper Waymo’s growth.
The parallel to LLMs is striking: ChatGPT’s early demos convinced many people that we were close to human-level intelligence, just as Google’s early autonomous vehicle demos convinced people we were close to human-level driving. But deployment of LLMs is bottlenecked by everything from data center buildouts to the glacial pace at which large organizations reorganize around new tools.
Avent, who wrote The Wealth of Humans in 2016 and has a new book on social capital arriving in April, argues that AI’s deepest impact won’t be unemployment but a wholesale reshuffling of status. White-collar professionals may face the same loss of prestige that blue-collar workers experienced a generation ago. Tim pushes back with an optimistic take: if the college wage premium compresses, the long-run equilibrium might actually be more egalitarian, echoing the mid-20th-century economy some people remember fondly. But we only got to that economy after two world wars and decades of organizing by the labor movement. Could today’s transition be equally turbulent?

METR’s time horizons chart has become one of the most discussed metrics in AI. It estimates the difficulty of tasks — measured in human work hours — that a model can complete about 50% of the time. By this measure, frontier models have been doubling their capabilities about once every seven months.
But in this conversation, recorded on March 2, METR researcher Joel Becker explained that the two most recent models at the time — Claude Opus 4.6 and GPT 5.3 — had gotten close to saturating METR’s task suite. This made the time horizon estimate less reliable for the best models. He noted that adding or removing a single task from the test suite can swing the estimated time horizon for Claude Opus 4.6 from 8 to 20 hours. We discussed why it could be challenging for METR to extend the chart to cover more difficult tasks.
We then dug into METR’s controlled study of AI-assisted programmers, which initially found an 18% productivity decrease — one of last year’s most surprising results. The updated study now shows gains, but with a twist: AI has become so essential to programming that developers increasingly refuse to work without it, making it difficult to perform a controlled experiment.

Tim and Dean team up with Scaling Laws hosts Alan Rozenshtein and Kevin Frazier for a joint episode on the fight between Anthropic and the Department of Defense.
In this episode, recorded on March 4, they analyze the Pentagon’s decision to declare Anthropic a supply-chain risk. Dean frames this as an assault on private property rights with no clear limiting principle, while Kevin digs into the shaky legal footing of invoking the Federal Acquisition Supply Chain Security Act of 2018 against a domestic company. They then turn to OpenAI’s competing Pentagon deal, including Sam Altman’s AMA on Saturday night.
The episode closes with a disagreement about what will happen next. Dean argues this is “act one, scene one” of an inevitable push toward government control of AI labs—a fight he’s tried to preempt through hybrid regulatory structures. Tim offers a deflationary counterpoint: this may ultimately be a personality-driven fight over a technology that will end up being important but not decisive.

Dean joins from London after attending the AI Impact Summit in India. Dean and Tim unpack the summit’s central tension: “middle power” nations like India, Indonesia, and Nigeria pushing a vision of AI focused on public service delivery, agriculture, and affordable open-source models, while largely dismissing the frontier-AI questions Dean considers most urgent—lab auditing, recursive self-improvement, and national security. They then turn to the week’s biggest story: the Department of Defense’s ultimatum to Anthropic. Anthropic’s contract bans autonomous lethal weapons and surveillance of Americans. Secretary of Defense Pete Hegseth has demanded that Anthropic lift those restrictions by Friday or potentially face designation as a supply-chain risk or invocation of the Defense Production Act.
Dean argues the DoD has every right to cancel a contract it dislikes, but compelling a company to retrain its model under duress is another matter entirely—especially when, as Dean points out, this whole episode will become part of Claude’s training data, potentially shaping how the model understands its own relationship to the US government.

With Dean away, Tim invites his Understanding AI colleague Kai to unpack the surprising ways chatbot personalities can go wrong, a topic Kai covered in a recent article.
Every LLM starts as a base model capable of playing countless characters, but AI companies try to keep chatbots in a “helpful assistant” lane. Kai walks us through the Grok “MechaHitler” debacle, in which xAI’s attempts to make its bot less politically correct backfired spectacularly. They also explore the “emergent misalignment” finding that fine-tuning a model for one bad behavior — like responding with buggy code — can make it act broadly like a villain. And they compare Anthropic’s virtue-ethics approach to character — complete with an 80-page constitution — with OpenAI’s more deontological model spec.
Finally, they discuss the controversy over OpenAI’s decision to retire GPT-4o, which had developed an emotionally warm, sometimes dangerously sycophantic personality that users grew attached to. Kai argues OpenAI is making the right call, but the episode leaves open a harder question: as these systems become more central to people’s lives, who decides what counts as a healthy AI personality?