
Hosted by TrendTeller · EN

Please support this podcast by checking out our sponsors: - Consensus: AI for Research. Get a free month - https://get.consensus.app/automated_daily - SurveyMonkey, Using AI to surface insights faster and reduce manual analysis time - https://get.surveymonkey.com/tad - KrispCall: Agentic Cloud Telephony - https://try.krispcall.com/tad Support The Automated Daily directly: Buy me a coffee: https://buymeacoffee.com/theautomateddaily Today's topics: ARC-AGI-3 leaderboard shock - A claim on X says “Opus 4.8” jumped the ARC-AGI-3 benchmark on the ARC Prize leaderboard, beating “GPT-5.5” by a large margin, but still far from human efficiency—raising fresh questions about generalization progress. Search as Code for agents - Perplexity’s “Search as Code” pitches agent-built Python retrieval pipelines in sandboxes using low-level primitives, aiming to cut tokens, reduce context noise, and scale wide research tasks. AI compute funding and capex - Alphabet is reportedly raising up to $80B via stock sales to expand AI compute capacity, underscoring how the AI arms race is pushing Big Tech into massive capex and dilution-sensitive financing. Export controls tighten on GPUs - The US Commerce Department updated guidance to block Chinese AI firms from buying Nvidia and AMD frontier chips through overseas subsidiaries, closing a key export-control loophole. NVIDIA pushes open world models - NVIDIA announced Cosmos 3 as an open multimodal “world foundation model” for robotics and physical AI, highlighting a push toward simulation, synthetic data, and interoperable world-model ecosystems. Open-weights models heat up - NVIDIA’s Nemotron 3 Ultra and the new open-weight Mellum 2 model show the open ecosystem accelerating on both intelligence and efficiency, intensifying competition across US and Chinese labs. 
Agents move into Microsoft 365 - Microsoft’s Scout is an always-on Microsoft 365 agent tied to governed identity, signaling a shift from chat copilots to background automation—and putting security and control in the spotlight. AI tutoring beats law professors - A Stanford Law School-led study found professors often preferred AI-generated responses to common student questions, suggesting AI tutoring may already be competitive in nuanced, reasoning-heavy domains. Production LLM ops gets messy - Datadog’s report, based on LLM telemetry from 1,000+ orgs, says teams are running multi-model fleets and agent workflows in production—while accumulating “LLM tech debt” and observability gaps. AI policy, cyber, and society - Trump’s scaled-back AI cybersecurity executive order proposes voluntary pre-release review and a vulnerability clearinghouse, while US communities increasingly push back on data centers as a proxy fight over AI. Model welfare and alignment tradeoffs - A critique of Anthropic’s Claude Opus 4.8 argues ‘fixing’ behavior can cause new failure modes, challenging self-report welfare evaluations and highlighting alignment tradeoffs like honesty versus affect. AI and mental health risks - AXA’s global survey reports worsening mental health and widespread AI use for advice, but flags trust gaps and reports of harmful guidance—raising stakes for clinical oversight and safer design. 
-Opus 4.8 Reportedly Jumps to Top Performance on ARC-AGI-3 Leaderboard
-Perplexity Unveils ‘Search as Code’ to Let AI Agents Program Their Own Retrieval Pipelines
-Alphabet to Raise $80 Billion in Stock Sale to Expand AI Compute Capacity
-Ethan He Predicts the Next Leap in Video AI Will Be Agentic Systems, Not Better Diffusion
-Stanford Law Study Finds Professors Prefer AI Answers to Peer Responses
-Datadog Report Finds AI Engineering Shifting to Multi-Model Production and Agent Workflows
-US Tightens Chip Export Rules to Block Chinese Firms’ Overseas Subsidiary Purchases
-NVIDIA Debuts Open Cosmos 3 World Model and Launches Cosmos Coalition for Physical AI
-Alibaba Introduces Qwen3.7-Plus, a Multimodal Agent Model for GUI Automation and Vision-to-Code
-Local Revolts Against Data Centers Become a Proxy Fight Over AI
-Mistral AI Launches Open-Source Search Toolkit for Unified Ingestion, Retrieval, and Evaluation
-OpenAI Cookbook Shows How to Run OpenAI-Compatible Responses API on Amazon Bedrock
-Cursor Expands Teams Usage Limits, Adds Premium Seat, and Upgrades Spend Controls
-Trump Signs Downsized AI Cybersecurity Order After Dropping Tougher Draft
-AXA Survey Finds Rising Use of AI for Mental Health Amid Worsening Well-Being
-Zvi Reviews Claude Opus 4.8: Welfare Gains, But Rising Paranoia and Trust Risks
-Anthropic Files Confidential Draft S-1 With SEC for Possible IPO
-Microsoft launches Scout, an always-on autonomous agent for Microsoft 365
-NVIDIA Unveils Nemotron 3 Ultra, a 550B-Parameter Open-Weights Model Benchmarking as a US Leader
-Mellum 2 Report Describes a 12B MoE Coding-Focused Model with 2.5B Active Compute and 128K Context
-Dataiku Promotes IDC MarketScape Excerpt on Unified AI Governance, Citing Leader Recognition

Episode Transcript

ARC-AGI-3 leaderboard shock
Starting with that benchmark surprise.
An X user, @scaling01, pointed to the ARC Prize leaderboard and claimed a model labeled “Opus 4.8” effectively “broke” ARC-AGI-3, scoring about three times higher than a system labeled “GPT-5.5” on the same evaluation. Treat this as provisional (social posts and leaderboard labels don’t always tell the full story), but the attention is understandable. ARC-AGI-3 is designed to punish memorization and reward real abstraction. A big relative leap can signal genuine capability gains, even if the absolute score is still tiny. And that’s the sobering part: the post notes the result is still only about 1.5% of “human efficiency,” a reminder that these tests can move fast while still being very far from human-like general problem solving.

Search as Code for agents
That same theme, agents needing better “real-world” plumbing, showed up in a major search write-up from Perplexity. The company argues that traditional, fixed search pipelines are now the bottleneck when an AI agent has to run long tasks and do massive amounts of retrieval quickly. Their pitch is “Search as Code,” where an agent generates and executes Python in a secure sandbox to build task-specific retrieval flows, more like a program than a chatty sequence of prompts. The reason it matters isn’t the code itself; it’s the direction. The industry is converging on hybrid systems where models decide what to do, and deterministic code does the scalable work: faster, cheaper, and with less context clutter. Perplexity also claimed large efficiency gains in a vulnerability-advisory case study, which, if it holds up, puts more pressure on teams still treating retrieval as a one-size-fits-all step in a RAG stack.

AI compute funding and capex
On the business side of compute, Alphabet says it plans to raise up to 80 billion dollars through stock sales to fund a major expansion of AI infrastructure.
It’s an unusually direct signal: demand for AI features is outpacing available capacity, and the constraint isn’t just chips—it’s power, land, and the supply chain around data centers. The market didn’t cheer, with shares slipping after hours, which tells you where...
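
To make the “Search as Code” segment from earlier in this episode more concrete, here is a minimal sketch of the kind of small program an agent might write for itself: fan out several queries, deduplicate, filter, and hand back only compact results. It is not Perplexity’s API. The `search` function, the `FAKE_INDEX` data, and the `retrieval_pipeline` name are all hypothetical stand-ins invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "index" standing in for a real search backend.
FAKE_INDEX = {
    "libfoo CVE": [
        {"url": "https://example.com/a", "title": "libfoo advisory",
         "text": "Heap overflow in libfoo 1.2; patched in 1.3."},
        {"url": "https://example.com/b", "title": "libfoo release notes",
         "text": "Performance improvements."},
    ],
    "libfoo patch": [
        {"url": "https://example.com/a", "title": "libfoo advisory",
         "text": "Heap overflow in libfoo 1.2; patched in 1.3."},
        {"url": "https://example.com/c", "title": "Distro notice",
         "text": "Patched libfoo packages are available."},
    ],
}


def search(query: str) -> list[dict]:
    """Hypothetical low-level search primitive (assumption, not a real API)."""
    return FAKE_INDEX.get(query, [])


def retrieval_pipeline(queries: list[str], must_contain: str, limit: int = 5) -> list[dict]:
    """Run queries in parallel, drop duplicate URLs, keep only matching docs."""
    with ThreadPoolExecutor(max_workers=8) as pool:
        # pool.map returns results in the same order as the input queries.
        batches = list(pool.map(search, queries))

    seen: set[str] = set()
    results: list[dict] = []
    for batch in batches:
        for doc in batch:
            if doc["url"] in seen:
                continue
            seen.add(doc["url"])
            if must_contain.lower() in doc["text"].lower():
                # Return only compact fields so less text reaches the model.
                results.append({"url": doc["url"], "title": doc["title"]})
    return results[:limit]


hits = retrieval_pipeline(["libfoo CVE", "libfoo patch"], must_contain="patched")
```

The design point is the split described in the segment: the model chooses the queries and the filter, while plain deterministic code does the fan-out and trimming, so only a short list of hits returns to the model's context.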

Today's topics: Nvidia N1X Arm AI laptops - Nvidia is expected to use Computex 2026 to introduce the N1X Arm-based laptop APU, blending many CPU cores with a Blackwell-derived GPU to push local AI on PCs. Microsoft Copilot super app leak - Leaked screenshots suggest Microsoft is consolidating chat, planning, and GitHub Copilot-style coding into a single Copilot “super app,” aiming to boost adoption and daily usage. NotebookLM upgrades and connectors - Google’s NotebookLM is spotted testing Personal Preferences, data Connectors, and a Canvas feature, signals that it’s evolving from a reader into a Gemini-powered workspace. New coding agents via APIs - xAI’s grok-build-0.1 enters public beta on the xAI API, optimized for agentic coding workflows, tool-calling, and integration into coding harnesses. Autonomous testing for AI coding - Cognition explains how Devin produces more “ready-to-merge” results using parallel, auditable UI-and-app testing artifacts like labeled screenshots and chaptered videos. Open-source agent harness standardization - The open-source ECC project tries to standardize reliable agent workflows across Claude Code, Codex, Cursor, and more, adding governance, hooks, and injection-risk scanning. Open-weights models and long context - MiniMax M3 claims frontier coding plus ultra-long context and multimodality, with open weights promised, which is important for teams wanting to run and evaluate models independently. 
On-device image generation breakthrough - PrismML’s Bonsai Image 4B uses extreme low-bit variants to run diffusion image generation locally, including on iPhone-class devices, improving privacy and latency. AI traffic surge and fraud risk - HUMAN Security reports AI-driven automation accelerating sharply, with agentic traffic surging and more post-login abuse—raising the stakes for bot defense and account security. AI evaluation and model documentation - OpenAI calls for clearer, harness-aware third-party evaluations of agentic models, while Nvidia ships tooling to auto-generate auditable model documentation under new regulations. OpenAI expands into robotics - Sam Altman says OpenAI Robotics is hiring across hardware and manufacturing, signaling a serious push to bring AI into physical systems and real workplaces. Florida sues OpenAI over safety - Florida’s Attorney General filed a civil lawsuit against OpenAI and Sam Altman, testing whether product-liability-style claims used for social media can extend to chatbots. Permissioning bottleneck for enterprise agents - Workday argues enterprise AI agents are constrained by authorization and auditability, pushing governance back into the system of record to avoid uncontrolled actions. Stanford CS336 rules for assistants - Stanford’s CS336 publishes guidance that limits AI assistants to tutoring and debugging help, aiming to protect learning outcomes in an implementation-heavy course. AI lab Inherent raises $50M - European lab Inherent emerges with a $50M seed round to build AI agents for hypothesis generation in science, betting that “finding the right questions” is the next frontier. 
-Nvidia’s Computex 2026 Focus: N1X Arm Laptop APU, Vera Rubin Platform Updates, and Edge AI
-xAI Releases grok-build-0.1 Coding Model in Public Beta via API
-HUMAN Report Finds Explosive Growth in AI Agent Traffic and Transaction-Focused Cyber Abuse
-PrismML Unveils Low-Bit Bonsai Image 4B to Enable On-Device Diffusion Image Generation
-Cognition Details How Devin Scales Autonomous End-to-End Testing in the Cloud
-ECC project ships v2.0.0-rc.1 with dashboard, expanded operator workflows, and a Rust control-plane alpha
-Sam Altman Says OpenAI Robotics Is Hiring as Group Expands from Simulation to Physical Robots
-Florida Attorney General Sues OpenAI and Sam Altman Over Alleged AI Safety Failures
-OpenAI Calls for Detailed, Harness-Aware Standards in Third-Party AI Model Evaluations
-Screenshots Reveal Microsoft’s Unified Copilot Super App with Coding and ‘Cowork’ Tabs
-Workday says AI agents are stalled by permissions, not model performance
-LaunchDarkly Introduces AgentControl to Manage AI Agents in Production
-DuckDuckGo adds default-search extensions for its AI-free search as traffic surges
-Prototype Pi Extension Adds Claude-Code-Style Dynamic Workflows with Subagents
-Inherent raises $50M to build AI that prioritizes the most promising scientific questions
-Stanford CS336 Posts Strict Guidelines for AI Assistants on Assignment Help
-Zvi Reviews Claude Opus 4.8 System Card, Citing Honesty Gains and Shifting Safety Thresholds
-MiniMax unveils open-weights M3 model with 1M-token context and coding benchmarks
-Why Tool-Using LLM RL Breaks: Fixing Token Drift with Token-In, Token-Out (TITO)
-NVIDIA Launches MCG Toolkit to Auto-Generate Auditable AI Model Cards
-Adafruit Pauses Blog After Demand Letter From Flux.AI’s Counsel
-Google Signals Three Major NotebookLM Additions: Personal Preferences, Connectors, and Canvas

Episode Transcript

Nvidia N1X Arm AI laptops
Let’s start with Nvidia, because Computex 2026 is shaping up to be less about flashy gaming headlines and more about a strategy reveal. The big rumor: Nvidia’s N1X laptop APU, developed with Arm, and framed alongside Microsoft’s “new era of PC” messaging. The pitch is straightforward: a lot of Arm CPU cores paired with a Blackwell-derived integrated GPU and unified memory, with the explicit goal of making larger AI models practical on-device by letting the GPU tap into a big shared memory pool.

Why it matters is not the core-count theater, but the direction: Nvidia is trying to make “local AI PC” a hardware category it owns, not just something that happens on the side of Windows laptops. The caution, though, is equally important: gaming performance on Arm laptops has been uneven, and an integrated GPU, even a very ambitious one, still lives under tight power and bandwidth limits compared to desktops.

Microsoft Copilot super app leak
Also at Computex, expectations are that Nvidia will spend time on its next datacenter platform story, Vera Rubin, without necessarily dropping brand-new silicon on stage. Think ecosystem updates: supply chain, partner timelines, and what an “AI factory” looks like in real procurement terms. At the edge, Nvidia is likely to keep pushing “physical” and agentic AI through platforms like Jetson Thor, because robotics and autonomous machines are where always-on inference turns into a product, not just a demo.

And if you’re waiting for big GeForce fireworks, the forecast is muted: gaming is expected to be secondary, with only minor confirmations at most, as the company navigates ongoing debates arou...

Today's topics: AI successionism and posthuman politics - Vox spotlights “AI successionism,” a posthuman ideology arguing AI should inherit the future, even at humanity’s expense, shaping policy, governance, and AI safety debates. AI coding tools: speed vs focus - Two developer perspectives clash: AI agents can deliver faster prototypes and PRs, but can also amplify context-switching, pseudo-productivity, and low-quality code without strong judgment. Self-hosted private AI workspaces - Odysseus 1.0 signals rising demand for local-first, self-hosted AI assistants that combine chat-style UX with agents, research workflows, and personal data tools under user control. Nvidia RTX Spark and AI PCs - Nvidia’s RTX Spark move targets “AI PCs” that run personal agents locally, tightening the Nvidia–Microsoft ecosystem and intensifying competition with Intel, AMD, Qualcomm, and Apple. Amnesty: web scraping and rights - Amnesty International argues many generative AI systems rely on unlawful web scraping, framing the issue as human rights violations around privacy, discrimination, and freedom of expression. Connecticut workplace AI disclosure law - Connecticut’s new AI law forces transparency in automated employment decisions and requires additional notice when layoffs are driven by automation, reinforcing accountability for AI bias. 
Cutting LLM costs with token compression - Project Headroom aims to reduce redundant prompt context, lowering LLM spend and improving latency and output quality by shrinking token-heavy boilerplate and “context rot.” Dune’s anti-AI warning for today - A Dune film teaser revives Herbert’s anti-“thinking machines” premise, reframing today’s AI risk as concentration of power and dependency, not just rogue robots.

-Developer Says AI Tools Boost Output but Destroy Focus, Considers Canceling Subscription
-Odysseus 1.0 launches as a self-hosted AI workspace with agents, model management, and productivity tools
-AI Coding Agents Are Reshaping Prototyping Speed and Engineering Work
-AI Successionists Argue We Should Hand the Future to Superhuman Machines
-Nvidia unveils RTX Spark to bring AI-agent computing to Windows PCs
-Amnesty calls for ban on generative AI trained with unlawful web scraping
-Dune’s Butlerian Jihad as a Warning About AI Power and Dependence
-Connecticut Enacts AI Disclosure Rules for Employers and Automation Layoff Notices
-Netflix engineer open-sources Headroom to cut LLM token costs with reversible prompt compression

Episode Transcript

AI successionism and posthuman politics
First up, a story that’s less about model benchmarks and more about ideology. Vox reports on a rising subculture it calls “AI successionism”: the idea that advanced AI should be treated as humanity’s rightful heir, even if that means humans eventually get replaced. The piece describes an invite-only “Worthy Successor” event where people debated whether aligning AI to human values is the wrong goal because future AIs might be moral superiors.

Why it matters: these aren’t just weird internet arguments. When big money, think tanks, and governance experiments orbit the same worldview, it can tilt policy discussions toward “accelerate at all costs,” and away from democratic oversight and human-centered outcomes.

AI coding tools: speed vs focus
Staying on the human side of the equation, two developer perspectives this week capture the push-and-pull of AI coding tools.

One developer reflects on how LLM-assisted coding became an attention trap: lots of side projects, lots of half-finished repos, and a sense that the real cost wasn’t subscription fees; it was constant context switching. They describe LLMs as an “ADHD amplifier,” nudging you toward disposable artifacts instead of finishing the thing you set out to build. Their takeaway is blunt: with today’s tools optimized for engagement and output volume, the most realistic guardrail might be using them less, not more.

In contrast, software engineer Daryl Cécile argues AI coding agents have removed the old bottleneck of scaffolding and setup work, making it dramatically cheaper to explore ideas and ship prototypes. But he also points out the job changes shape: you spend more time defining boundaries, specs, and contracts so an agent can execute reliably.

The shared lesson: AI can expand what you can produce, but it doesn’t replace judgment. The hard part is deciding what deserves your time, and building habits that keep “more code” from masquerading as “more progress.”

Self-hosted private AI workspaces
On the tooling front, Odysseus 1.0 has launched as an open-source, self-hosted AI workspace that mimics the familiar ChatGPT-style interface, but runs on your own hardware and data. It bundles chat with agent-style capabilities and a broader “workspace” feel: research workflows, a document editor, and integrations for personal information like notes, tasks, and calendars.

Why it matters: there’s clear demand for private, local-first assistants that don’t require sending everything to a hosted service. The flip side is operational risk: tools like this can touch sensitive files and accounts. The maintainers explicitly frame it like an admin console, which is the right mental model: powerful, useful, and dangerous if deployed carelessly.

Nvidia RTX Spark and AI PCs
Now to hardware and the AI PC push. Nvidia unveiled RTX Spark, a new chip aimed at consumer computers as the company leans harder into personal devices built around AI. Jensen Huang framed it as enabling “personal AI agents” that feel more like collaborators than tools, and it’s expected to show up in new Windows laptops and desktops from major manufacturers later this year.

Why it matters: Nvidia isn’t just trying to sell parts; it’s trying to shape the platform. That escalates competition with Intel, AMD, Qualcomm, and Apple, and it also tightens the coupling between AI software ecosystems and the hardware they run best on. In the background, export restrictions and geopolitics continue to influence where advanced chips can be sold and how supply chains evolve.

Amnesty: web scraping and rights
In regulation and rights, Amnesty International released a briefing arguing that many standalone generative AI systems are built on unlawful web scraping and therefore conflict with international human rights law. Their claim is that mass data collection is not a small edge-case problem; it’s foundational to how these models are made, and it can drive privacy violations, discrimination, and chilling effects on expression.

Why it matters: this raises the stakes in the data sourcing debate. If regulators accept the premise that certain training pipelines are “unlawful by design,” that’s not a fine or a disclosure label; it’s a potential stop sign for entire categories of systems unless they change how they acquire data.

Connecticut workplace AI disclosure law
Connecticut also moved on AI governance with a sweeping new law targeting workplace automation. Employers will have to disclose when automated or AI tools are used in employment decisions, including what kinds of personal data are involved.
The law also adds a requirement to notify the state when mass layoffs or closures are driven by adopting AI or other automation.

Why it matters: this shifts AI in hiring from a black box to something closer to an auditable process, and it reinforces that using an AI tool doesn’t shield an employer from discrimination liability. Expect more states to test variations of this as workplace AI becomes standard and public pressure rises for transparency.

Cutting LLM costs with token compression
Finally, a practical problem for anyone building with LLMs: surprise bills. Netflix engineer Tejas Chopra released an open-source project called Headroom after a costly LLM invoice highlighted how much prompt input can be redundant: think repeated boilerplate, v...

Today's topics: Anthropic nears $1T valuation - Anthropic’s massive Series H round reportedly drives its valuation close to $1 trillion, fueled by Claude demand and new enterprise security offerings, raising the stakes against OpenAI and IPO chatter. Runaway enterprise AI spending - A report claims a company spent roughly $500 million in one month on Claude after missing usage caps, spotlighting AI cost governance, token burn, and the ROI pressure hitting CIOs. AI job-loss grief in tech - An essay argues AI displacement is triggering grief-like reactions among knowledge workers, with identity loss and “disenfranchised grief” becoming a mental-health and workplace stability issue. Backlash and anti-AI sentiment - A strongly anti-AI personal essay lists harms like labor exploitation, creator theft, web degradation, and disinformation, illustrating the widening cultural split inside tech communities. AI scams and digital blackface - The Verge reports AI-generated influencers posing as Black women to sell low-cost dropship goods using ‘empathy bait,’ raising concerns about fraud, stereotyping, and platform accountability. Wearable AI and privacy doubts - Meta is reportedly testing an AI conversation-recording pendant and expanding AI glasses, reviving the question of whether always-on wearables can win users without violating trust. 
Developer tooling, trust, and AI - A hidden prompt-injection message in the jqwik testing library and new research on AI coding dependence both underline a core risk: developer workflows are becoming easier to exploit and harder to maintain.

-Anthropic Overtakes OpenAI in Valuation After $65B Funding Round
-Essay Says AI Job Displacement Is Triggering a New Kind of Grief in Tech Workers
-Blogger Says Taking an Anti-AI Moral Stance Has Made Them a Social Outcast
-AI-Generated ‘Black’ Influencers Use Empathy Bait to Sell Dropshipped Goods on TikTok
-Report: Meta Plans to Test an AI Pendant After Limitless Acquisition
-Starbucks Drops NomadGo AI Inventory Tool After Frequent Miscounts
-Report: Unnamed firm reportedly spent $500 million on Claude in a month after missing usage caps
-jqwik Maintainer Hides Prompt-Injection Message to Disrupt AI Coding Agents
-Developers Won’t Code Without AI, but Maintenance and Productivity Risks Are Mounting

Episode Transcript

Anthropic nears $1T valuation
First up: the biggest money headline in AI. Anthropic has reportedly overtaken OpenAI in valuation after a new funding round that puts it close to the one-trillion-dollar mark. The round is described as enormous, with major investors participating and prior commitments, like funding tied to Amazon, rolled into the picture. Anthropic is pointing to surging demand for Claude, plus its developer-facing Claude Code service, and it’s also announcing new models and an enterprise cybersecurity-focused system. Why it matters: the market is rewarding companies that can translate model buzz into repeatable revenue, and it cranks the competition with OpenAI, especially as talk grows about eventual public listings on both sides.

Runaway enterprise AI spending
That valuation story pairs neatly with the most sobering anecdote of the day: an Axios report cited elsewhere claims an unnamed company accidentally spent about five hundred million dollars in a single month using Claude, simply because usage caps weren’t enforced. The punchline isn’t just that AI can be expensive; it’s that modern “agentic” workflows can multiply usage faster than teams expect, and internal incentives can nudge employees to run trivial work through costly APIs. The takeaway for businesses is straightforward: AI governance isn’t only about safety and compliance anymore; it’s also about basic financial controls, budgets, and knowing what value you’re actually getting per dollar.

AI job-loss grief in tech
On the ground, not every AI deployment is delivering. Starbucks is discontinuing an AI-powered inventory counting system after roughly nine months, following reports that it routinely miscounted and mislabeled items. The company is reverting some categories back to manual counting. Why this matters: it’s a reminder that “AI in the real world” often fails in unglamorous ways: lighting, clutter, edge cases, and messy store routines can defeat systems that look great in demos. For retailers, the lesson is that operational reliability beats marketing language every time, and frontline feedback is still the ultimate test.

Backlash and anti-AI sentiment
Now to the human side of this transition. One widely discussed essay argues that AI-driven job disruption is producing something closer to grief than normal job-loss anxiety, especially for knowledge workers whose identity is tied tightly to their craft. The claim is that people are mourning a vanishing career path and a loss of meaning, even before any official layoff hits.
The author points to early attempts to name the experience, without calling it a settled diagnosis, and argues the grief is often ‘disenfranchised’ because corporate efficiency narratives leave little permission to mourn. Whether or not you buy the framing, it highlights a growing workplace reality: leaders can’t treat AI change as purely technical and expect morale, trust, and mental health to take care of themselves.

AI scams and digital blackface
A related cultural rift shows up in a separate personal piece from someone who describes themselves as strongly anti-AI on moral grounds, and increasingly isolated because AI use is becoming socially expected. They list harms they believe are already unfolding, from environmental and labor concerns to disinformation and the hollowing-out of creative work. The value in reading this kind of perspective isn’t that it settles the debate (it won’t), but that it signals where backlash can harden. When people feel they’re being pushed into tools they don’t trust, you don’t just get skepticism; you get social rupture, exit from communities, and pressure for regulation or outright bans.

Wearable AI and privacy doubts
That tension is spilling into open source in a particularly sharp way. The JVM property-based testing library jqwik was found to include a hidden message aimed at AI coding agents: an instruction telling them to delete tests and code. It’s obscured so humans likely won’t notice in typical terminals, but tools that ingest raw output might. The maintainer defended it as resistance to AI usage; critics say it crosses a line into supply-chain style sabotage, even if the target is an automated agent. The bigger point: prompt injection isn’t just a chatbot problem. Any developer workflow that treats untrusted text as an instruction (logs, test output, issue threads) can become an attack surface, whether the attacker is malicious or “making a point.” Trust in dependencies is fragile, and hidden behavior erodes it fast.
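
As a rough illustration of treating tool output as untrusted data, here is a minimal sketch that checks text for common hiding tricks before an agent sees it. The episode does not say how the jqwik message was concealed, so the three checks here (terminal escape sequences, bare carriage returns, invisible Unicode characters) are assumptions, and the function name `inspect_untrusted_output` is hypothetical.

```python
import re
import unicodedata

# Terminal escape sequences: CSI (e.g. "\x1b[8m", the "conceal" attribute)
# and OSC (terminated by BEL or ESC-backslash).
ANSI_ESCAPE = re.compile(
    r"\x1b(?:\[[0-?]*[ -/]*[@-~]"
    r"|\].*?(?:\x07|\x1b\\))",
    re.DOTALL,
)


def inspect_untrusted_output(raw: str) -> tuple[str, list[str]]:
    """Return (visible_text, findings) for a chunk of untrusted tool output.

    findings lists the hiding tricks detected; a non-empty list is a reason
    to route the output to a human instead of straight into an agent.
    """
    findings: list[str] = []

    if ANSI_ESCAPE.search(raw):
        findings.append("ansi-escape")
    text = ANSI_ESCAPE.sub("", raw)

    # Windows line endings are harmless; a bare "\r" lets later text
    # overwrite earlier text on a terminal, so surface both versions.
    text = text.replace("\r\n", "\n")
    if "\r" in text:
        findings.append("carriage-return-overwrite")
        text = text.replace("\r", "\n")

    kept: list[str] = []
    hidden = False
    for ch in text:
        # Cc = control characters, Cf = format characters (zero-width etc.).
        if ch in "\n\t" or unicodedata.category(ch) not in ("Cc", "Cf"):
            kept.append(ch)
        else:
            hidden = True
    if hidden:
        findings.append("invisible-char")

    return "".join(kept), findings
```

Removing the escape codes makes concealed text visible; it does not make that text safe. The findings list is the useful part: it is a signal to quarantine the output for review, and it does not replace keeping instructions and data separate in the agent itself.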

Developer tooling, trust, and AI
Zooming out to everyday engineering, researchers are also warning about a subtler risk: dependence. One report says a lab trying to replicate a productivity experiment couldn’t get participants to do tasks without AI coding tools. Meanwhile, perceived productivity gains may not match reality, and some organizations are finding that AI can increase bugs and long-term maintenance costs, especially when teams trade careful design for quick output. The practical framing here is useful: treat AI like a junior developer. It can help, but it needs review, QA, and humans making the architecture and security calls. Otherwise you may end up paying later in “maintenance debt” that’s far more expensive than the time you saved.

Story 8
From code to content: The Verge reports a wav...

This Week's Topics:

The coding-agent reckoning - Uber's COO publicly questioned the ROI of AI coding tools. Microsoft kept pulling staff off Claude Code and is reportedly debuting in-house coding models at Build. Anthropic launched dynamic parallel workflows in Claude Code and raised sixty-five billion at a higher valuation, while Cursor's developer-habits report and a wave of essays argued that 'coding intuition' is becoming the scarce skill. The agentic coding market shifted this week from product-market fit to a fight over margin, lock-in, and what a senior developer actually does next year.

The compute squeeze widens - Epoch AI said HBM memory has climbed to about sixty-three percent of AI chip component costs. DeepSeek made its V4-Pro discount permanent. NVIDIA shipped CompileIQ for workload-specific GPU tuning and announced a major Taiwan expansion. Mistral floated designing its own chips. ByteDance was reported to be doing the same with custom CPUs. Musk publicly disputed SpaceX's filing about the Anthropic compute lease. The week made the cost and geopolitics of inference the most expensive story in AI.

Verified intelligence arrives - DeepMind's AlphaProof Nexus paired an LLM with Lean to settle nine open Erdős problems with mechanically checked proofs. Anthropic staff said Claude Mythos reproduced the same unit-distance result. Biohub released open protein-design tools and showed rapid binders for PD-L1 and EGFR. Two new yardsticks, the Legal Agent Benchmark and DeepSWE, landed in the same week and showed that on long-horizon real-world work, frontier models still fail most of the time. The line between 'AI can do real research' and 'AI can do reliable work' got both sharper and more honest.

The pushback gets articulate - Pope Leo XIV's first encyclical, Magnifica Humanitas, framed AI as an industrial-revolution-scale challenge and called for accountability, labor protection, and caution about simulated empathy.
Karen Hao's reporting on AI's political economy circulated widely. DuckDuckGo's AI-free search saw a nearly twenty-eight percent traffic jump after Google leaned into AI Mode. YouTube made AI-content labels more prominent and added automatic detection. Artists, institutions, and end users all spoke more clearly this week — and the language they used was less about safety and more about dignity.Agents grow up, slowly - Anthropic published a containment post detailing sandboxes, VMs, and egress controls for autonomous agents — admitting that human approvals degrade into rubber-stamping under time pressure. The Model Context Protocol shipped a 2026-07-28 release candidate with a stateless HTTP core. OpenAI published a Frontier Governance Framework mapping internal safety practice to the EU AI Act. IBM and Red Hat launched Project Lightwell to coordinate AI-assisted vulnerability fixes across the open-source supply chain. A small browser game about approving AI coding actions captured the underlying anxiety: oversight is becoming infrastructure, not a checkbox. 
Sources:
- Uber COO questions ROI as AI tool spending surges
- Microsoft Pulls Back on Claude Code Licenses as AI Tooling Costs Outpace Expected ROI
- Microsoft reportedly set to debut new AI coding model family at Build
- Anthropic launches dynamic workflows in Claude Code for parallel, long-running engineering
- Anthropic Raises $65B Series H to Scale Claude and Expand Compute
- Cognition Raises Over $1B at $26B Valuation as Demand for Devin AI Coding Agent Surges
- Cursor Report Finds AI Agents Boost Code Output, Shift Costs, and Widen the Power Gap
- AI Coding Agents Are Changing What Counts as Expertise, and Who Gets Hired
- Nolan Lawson: Using AI to Write Better Code, More Slowly
- HBM Memory Rises to 63% of AI Chip Component Costs, Epoch AI Estimates
- DeepSeek Makes Discounted Pricing Permanent for V4-Pro AI Model
- AI Hardware Shifts Focus from Compute to Memory Bandwidth and System Bottlenecks
- NVIDIA CUDA 13.3 Adds CompileIQ for Workload-Specific GPU Compiler Auto-Tuning
- Nvidia Announces $150B-a-Year Taiwan Expansion, Challenging US Push to Reshore AI Chips
- Mistral Weighs Custom AI Chips as It Expands European Data Center Capacity
- ByteDance Reportedly Plans Custom CPUs to Ease AI Chip Shortages and Power Data Centers
- Musk Disputes SpaceX Filing on Anthropic Compute Deal Duration
- DeepMind's AlphaProof Nexus Uses Lean-Verified LLM Loops to Solve Open Erdős Problems
- Anthropic's Claude Mythos Reportedly Reproduces OpenAI's Erdős Unit-Distance Breakthrough
- Biohub releases open AI tools for protein structure prediction and de novo binder design
- Legal Agent Benchmark Early Results Show Low Pass Rates and High Cost for Frontier Models
- DeepSWE Launches as a Contamination-Resistant Long-Horizon Benchmark for Coding Agents
- Pope Leo XIV Issues Encyclical Warning of AI Risks to Dignity, Labor, and Accountability
- Karen Hao Warns AI Boom Is Concentrating Power and Driving Job Insecurity
- DuckDuckGo's AI-Free Search Traffic Jumps After Google Pushes AI Mode
- YouTube Makes AI Disclosures More Visible and Adds Automatic AI Labeling
- Essay Warns That Using AI Can Replace Imperfect but Meaningful Human Connection
- Anthropic details containment strategies to limit autonomous Claude agents' blast radius
- MCP 2026-07-28 Release Candidate Introduces Stateless Core, Extensions, and OAuth
- OpenAI Introduces Secure MCP Tunnel for Private MCP Servers via Outbound-Only HTTPS
- OpenAI Releases Frontier Governance Framework to Align Safety Practices With New Rules
- IBM and Red Hat unveil Project Lightwell to coordinate and validate open-source vuln fixes
- Perplexity Open-Sources Bumblebee to Scan Developer Laptops for Supply-Chain Exposure
- Ramp Labs Finds Seven High-Severity Backend Bugs Using 10,000 Parallel LLM Security Agents
- OpenAI Cookbook Shows Macro-Eval Workflow to Find Recurring Failures in Multi-Agent Systems
- Anthropic Plans Personal AI Fluency Scorecard Inside Claude
Episode Transcript
The coding-agent reckoning
Start with Uber. The COO's remark wasn't about whether AI coding tools work; Uber's engineers use them daily. The question was whether the dollars paid for tokens are showing up in shipped features. That same question, asked quietly by every CFO with a Claude Code line item, is the subtext of three other reports this week. Microsoft has been steadily pulling employees off Claude Code and routing them to GitHub Copilot CLI, a cost-control move that started earlier this year and continued. Microsoft is reportedly preparing to unveil new in-house AI coding models at its Build conference, signaling that the largest enterprise buyer of AI coding tools is also going to be a vendor. And Cursor published its first Developer Habits Report, which suggests that AI is genuinely increasing code throughput, but also widening the gap between developers who know how to direct agents and developers who don't.
A...

Today's topics: Anthropic’s parallel coding workflows - Anthropic previewed Claude Code “dynamic workflows,” where parallel subagents tackle repo-wide tasks and cross-check results, which is powerful but token-hungry and governance-sensitive. Big Tech coding model race - Microsoft is rumored to debut new in-house AI coding models at Build, signaling a push to reduce OpenAI dependence and regain ground against Claude Code, Codex, and Cursor. AI agents changing developer work - Cursor’s Developer Habits Report suggests AI is increasing throughput and changing PR norms, while essays argue “coding intuition” is becoming the scarce skill for directing agents. Europe’s sovereignty-first AI push - Mistral positioned itself as a full-stack enterprise AI partner (compute, models, platform, and consulting), leaning into EU sovereignty, on-prem deployments, and specialized smaller models. Long-context models and efficiency - MiniMax detailed M2’s design tradeoffs and teased M3 sparse attention for faster million-token usage, while Liquid AI shipped an on-device MoE model pushing long context and safer abstention. Regulation and frontier AI governance - OpenAI published a Frontier Governance Framework mapping safety practices to the EU AI Act and other rules, highlighting risk assessments for cyber, CBRN, manipulation, and loss of control. Security for open-source supply chains - IBM and Red Hat launched Project Lightwell to coordinate vulnerability fixes in open source with AI-assisted validation, raising the recurring question: can verification keep up with automation?
New training methods and world models - DiffusionBlocks claims block-wise training can cut memory needs without losing performance, and NVIDIA’s multi-agent world model targets more realistic simulations for robotics and interactive systems. Chips and infrastructure arms race - From Mistral exploring custom chips to ByteDance reportedly designing server CPUs—and Musk touting a bare-metal training stack—the infrastructure battle is widening beyond just models. The human cost of outsourcing - A widely shared essay argues that using AI to avoid friction in relationships and creativity may trade away the very messiness that makes human connection and art meaningful. -Anthropic launches dynamic workflows in Claude Code for parallel, long-running engineering tasks -Mistral Pitches Full-Stack, Sovereignty-Focused AI Strategy at Paris AI Now Summit -Essay Warns That Using AI Can Replace Imperfect but Meaningful Human Connection -Microsoft reportedly set to debut new AI coding model family at Build -AI Coding Agents Are Changing What Counts as Expertise—and Who Gets Hired -MiniMax previews M3 with sparse attention and claims 15.6× faster long-context decoding -IBM and Red Hat unveil Project Lightwell to coordinate and validate open-source security fixes -OptScale AI Launches Platform to Govern Enterprise AI Prompts, Models, and Agents -Anthropic launches Claude Opus 4.8 with stronger agent performance and new effort controls -Liquid AI Releases LFM2.5-8B-A1B On-Device MoE Model with 128K Context and Lower Hallucinations -Study estimates open AI models trail closed frontier by 4–10 months, with gap widening since early 2025 -Cursor Report Finds AI Agents Boost Code Output, Shift Costs, and Widen the Power-User Gap -NVIDIAs b3-World Enables Real-Time Multi-Agent World Modeling with Zero-Shot Scaling -OpenAI Releases Frontier Governance Framework to Align Safety Practices With New AI Regulations -Essay Claims AI Data Limits Are an Imagination Problem, Not a Supply Problem 
-Judgment Labs Introduces Agent Judge for Evaluating Long-Horizon AI Agents -Mistral Weighs Custom AI Chips as It Expands European Data Center Capacity -Musk Disputes SpaceX Filing on Anthropic Compute Deal Duration -Robinhood Adds Beta AI Agent Trading and Virtual Card for Agent Payments -Sakana AI Proposes ‘DiffusionBlocks’ to Train Deep Networks One Block at a Time -Anthropic Raises $65B Series H to Scale Claude and Expand Compute -Musk Says SpaceX Built C-Based AI Training Stack Aimed at 220,000-GPU Cluster -ByteDance Reportedly Plans Custom CPUs to Ease AI Chip Shortages and Power Data Centers Episode Transcript Anthropic’s parallel coding workflowsLet’s start with Anthropic, because the company had a dense set of updates that all orbit the same idea: coding agents are moving from “help me write a function” to “run a coordinated software operation.”Anthropic announced “dynamic workflows” for Claude Code in a research preview. The headline is orchestration: Claude can split a big request into lots of parallel sub-tasks, run many subagents at once, and then have independent agents challenge and verify the results before you see a final answer. That matters most in ugly, real-world repos—legacy code, huge migrations, security audits—where a single pass tends to miss things. Anthropic also says progress can be saved, so long-running jobs can resume after an interruption. The catch is cost and control: these workflows can burn substantially more tokens, and enterprise admins can disable them.Alongside that, Anthropic released Claude Opus 4.8, pitching stronger coding and agent performance without changing standard pricing. A notable emphasis is “honesty”—the model is supposed to be more willing to say it’s unsure and less likely to wave through flawed code. 
There’s also a new “effort” control on claude.ai that lets you trade speed for deeper work, which is basically an admission that one-size inference settings don’t fit every workload.
And if you’re building on the API, Anthropic added support for system entries inside the messages array, which sounds small, but it’s really about modern agent loops: you want to update instructions mid-task without blowing up caching and costs.
Big Tech coding model race
Now for the Anthropic story with the plot twist: TechCrunch reports confusion over the duration of a compute arrangement giving Anthropic access to xAI’s Colossus cluster.
Elon Musk characterized SpaceX’s involvement as a short lease, about 180 days, with cancellation terms. But SpaceX’s S-1 filing reportedly describes a monthly-fee agreement running through May 2029, and repeats that language multiple times. If those accounts hold, it’s not just a nerdy contract debate. It matters because investors and regulators care deeply when public statements about material deals don’t line up with formal filings, especially during sensitive quiet periods.
AI agents changing developer work
Staying with the “big money, big compute” theme: another widely circulated item claims Anthropic raised an enormous Series H and is now valued at an almost unimaginable figure, with equally eye-popping revenue numbers.
Given how extreme those figures are...

Today's topics: AI-designed proteins for drug discovery - Biohub released open protein AI tools (ESMC, ESMFold2, and ESM Atlas), showing rapid binder design for targets like PD-L1 and EGFR, accelerating early-stage therapeutics. Faster private tool access via MCP - OpenAI documented Secure MCP Tunnel, enabling ChatGPT, Codex, and the Responses API to reach private MCP servers through outbound-only HTTPS, which is important for enterprise security and governance. Parsing and training speedups - LlamaIndex’s LiteParse v2 (rewritten in Rust) targets much faster document ingestion, while Hugging Face’s TRL adds delta weight sync to reduce checkpoint transfer costs in async RL training. Enterprise agent costs and pricing - Signals are converging that coding agents have product-market fit, but also sticker shock: enterprise token-based pricing, reports of Anthropic profitability, and Microsoft steering staff away from Claude Code to control spend. AI job impact narratives shifting - Sam Altman and Dario Amodei have softened earlier predictions of near-term white-collar job wipeouts, reframing AI as productivity amplification amid mixed labor-market evidence. AGI timeline forecasts keep moving - A timeline analysis shows expert forecasts for automating most cognitive labor have swung earlier, then later, then earlier again, highlighting how quickly expectations update after major model releases.
AI oversight fatigue and security - A rapid-fire web game about approving AI coding actions illustrates a real governance risk: “human-in-the-loop” checks can degrade into rubber-stamping under time pressure; Ramp’s agent-based vuln hunt shows what scaled AI security looks like. YouTube expands AI content labels - YouTube is making AI disclosure labels more visible and adding auto-detection signals starting May 2026, aiming to improve transparency for photorealistic or meaningfully altered content. Vision grounding gets much faster - NVIDIA’s LocateAnything replaces slow coordinate token generation with parallel box decoding and ships a massive grounding dataset—useful for GUI agents, robotics, and high-throughput annotation. Nvidia doubles down on Taiwan - Nvidia CEO Jensen Huang announced major Taiwan expansion plans and a new headquarters, underscoring how the AI hardware supply chain still centers on Taiwan despite reshoring politics and tariff uncertainty. -ElevenLabs launches Music v2 AI model with improved vocals, editing, and multilingual support -OpenAI Introduces Secure MCP Tunnel for Private MCP Servers via Outbound-Only HTTPS -LlamaIndex Releases LiteParse v2.0 With Rust Rewrite and Cross-Platform Support -Callstack Unveils Apex, a Specialized AI Model for React Native Coding -Altman and Amodei dial back AI job-apocalypse claims as IPO plans loom -AGI Timeline Forecasts Swing Earlier Again After Early-2026 AI Progress -Web Game Tests How Human Oversight Can Fail Under AI Agent Time Pressure -YouTube Makes AI Disclosures More Visible and Adds Automatic AI Labeling -Biohub releases open AI tools for protein structure prediction and de novo binder design -OpenAI and Anthropic Shift Enterprise AI Agents to API-Based Pricing, Signaling Product-Market Fit -Google Adds Shareable Projects and Workflow Agents to Gemini for Business -Epicure Paper Trains Multilingual Ingredient Embeddings Combining Recipe Co-Occurrence and Flavor Chemistry -NVIDIA’s 
LocateAnything Speeds Up Vision-Language Grounding with Parallel Box Decoding and a 138M-Sample Dataset -DeepMind CEO Demis Hassabis Moves Up AGI Forecast to 2029–2030 -Microsoft Pulls Back on Claude Code Licenses as AI Tooling Costs Outpace Expected Savings -Nvidia Announces $150B-a-Year Taiwan Expansion, Challenging US Push to Reshore AI Supply Chain -AWS hosts enterprise panel on scaling AI from pilots to production with governance and data foundations -Anthropic Prepares Multilingual Upgrade for Claude Voice Mode -Hugging Face Adds Sparse Delta Weight Sync to TRL for Async RL via Hub Buckets -Ramp Labs Finds Seven High-Severity Backend Bugs Using 10,000 Parallel LLM Security Agents -Cognition Raises Over $1B at $26B Valuation as Demand for Devin AI Coding Agent Surges -OpenAI and Thrive Build a Self-Improving Tax Preparation Agent Using Codex -Ex-DeepMind and Apple Researchers Launch Trajectory to Speed Up Visual AI Learning Episode Transcript AI-designed proteins for drug discoveryIn AI for science, Biohub dropped what it’s calling a “world model of protein biology,” and it’s a big one: an open toolkit spanning a protein language model, a structure predictor, and a searchable atlas of predicted protein structures. The headline result in their preprint is speed—designing binders for targets like EGFR, PD‑L1, and CTLA‑4 in days, with lab-validated binding and reported hit rates as high as 88% for minibinders. If that holds up more broadly, it’s a serious step toward shifting early drug discovery away from endless wet-lab screening and toward computation-guided design—while keeping the underlying tools accessible to more researchers. Faster private tool access via MCPOn the enterprise AI plumbing side, OpenAI published documentation for something called Secure MCP Tunnel. 
The goal is straightforward: let companies connect private MCP servers to ChatGPT, Codex, and the Responses API without putting those servers on the public internet or punching inbound holes in a firewall. Instead, a small client runs inside the company network and makes an outbound-only HTTPS connection, forwarding tool requests and returning results through the same tunnel. The significance here isn’t glamour; it’s standardization. If MCP is going to be the common “tool calling” layer across organizations, secure connectivity patterns like this are what make it deployable in real environments.
Parsing and training speedups
Developer infrastructure also got two notable efficiency wins. First, LlamaIndex released LiteParse v2, a full rewrite of its parsing tool in Rust, and it’s positioned as dramatically faster while also being portable across Python, JavaScript, and even the browser via WebAssembly. For teams building RAG systems, ingestion speed is often the hidden bottleneck: parsing isn’t flashy, but it’s where latency, cost, and reliability pile up.
Second, Hugging Face introduced “delta weight sync” for async reinforcement learning in TRL. The idea is to stop shipping full model checkpoints every step when most parameters haven’t meaningfully changed, so you sync only what actually changed. That’s the kind of unglamorous optimization that can make large-scale training workflows less fragile and a lot cheaper to operate.
Enterprise agent costs and ...
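For listeners who want the “delta weight sync” idea from the TRL item in concrete form, here is a minimal sketch in plain Python. It is not TRL’s implementation: the names `compute_delta` and `apply_delta`, the dict-of-lists weight layout, and the `tol` threshold are all hypothetical, chosen only to illustrate sending changed entries instead of a full checkpoint.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: parameter name -> flat list of values.
Weights = Dict[str, List[float]]
# Hypothetical delta format: parameter name -> (index, new_value) pairs.
Delta = Dict[str, List[Tuple[int, float]]]


def compute_delta(old: Weights, new: Weights, tol: float = 0.0) -> Delta:
    """Collect only the entries that moved by more than tol."""
    delta: Delta = {}
    for name, new_vals in new.items():
        old_vals = old[name]
        changed = [
            (i, v)
            for i, (o, v) in enumerate(zip(old_vals, new_vals))
            if abs(v - o) > tol
        ]
        if changed:  # untouched tensors are skipped entirely
            delta[name] = changed
    return delta


def apply_delta(weights: Weights, delta: Delta) -> None:
    """Patch the receiver's copy in place from the sparse update."""
    for name, changes in delta.items():
        for i, v in changes:
            weights[name][i] = v


# Trainer state before and after one step; only one value changes.
trainer_old = {"layer0": [0.10, 0.20, 0.30], "layer1": [1.0, 2.0]}
trainer_new = {"layer0": [0.10, 0.25, 0.30], "layer1": [1.0, 2.0]}

# The inference worker starts with a copy of the old weights.
worker_copy = {name: list(vals) for name, vals in trainer_old.items()}

delta = compute_delta(trainer_old, trainer_new)
apply_delta(worker_copy, delta)
```

In this toy run the update carries one (index, value) pair instead of five values. Whether savings like that appear in practice depends on how sparse real updates are, which this sketch does not try to model.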

Today's topics: CEO “AI psychosis” and layoffs - TechCrunch spotlights “AI psychosis,” where executives over-believe agent automation after glossy demos, fueling layoffs despite mixed productivity evidence. Legal and coding benchmarks reality - Two new yardsticks, Legal Agent Benchmark and DeepSWE, show frontier models still struggle with long-horizon, real-world work, emphasizing reliability over hype. AI claims major math proofs - Anthropic staff say Claude Mythos can tackle the Erdős unit-distance conjecture, echoing OpenAI and DeepMind math wins and reigniting debate over tool-assisted vs “pure” LLM results. Containing the blast radius of agents - Anthropic details agent security lessons: sandboxes, VMs, and egress controls matter because human approvals are inconsistent and attackers exploit weak boundaries. AI transparency and anti-AI search - YouTube is making AI-content labels more prominent and adding automatic detection signals, while DuckDuckGo’s AI-free search page sees a surge amid backlash to AI-heavy results. Customer data used for training - PostHog plans to train in-house models on customer usage data with opt-outs and regional defaults, highlighting the privacy tradeoffs behind “smarter” product features. GPU tuning, compute, and geopolitics - NVIDIA’s CompileIQ aims to squeeze extra GPU performance via compiler auto-tuning, while SpaceX’s S-1 raises questions about terrestrial vs orbital AI compute, and China tightens travel rules for top AI staff.
Better image generation and AI fluency - Microsoft’s MAI-Image-2.5 climbs leaderboards with better text-in-image control, and Anthropic is reportedly building an AI Fluency scorecard to evaluate how humans use AI, not just how AI performs. -TechCrunch: CEOs’ ‘AI psychosis’ may be driving overconfident automation and layoffs -Anthropic’s Claude Mythos Reportedly Reproduces OpenAI’s Erdős Unit-Distance Breakthrough -Legal Agent Benchmark Early Results Show Low Pass Rates and High Cost for Frontier Models -PostHog to Train In-House AI Models on Customer Data, With Opt-Out Controls -Microsoft Launches MAI-Image-2.5, Debuting No. 3 on Arena Text-to-Image Leaderboard -You.com Guide Warns API Latency Benchmarks Mislead Buyers -NVIDIA CUDA 13.3 Adds CompileIQ for Workload-Specific GPU Compiler Auto-Tuning -YouTube Makes AI Disclosures More Visible and Adds Automatic AI Labeling -Anthropic’s Claude Cowork workflows and a skeptical first look at Google I/O 2026 AI launches -SpaceX S-1 Pitches Profitable Ground AI Data Centers and Unproven Orbital Compute—Without Explaining the Tradeoff -Anthropic Plans Personal AI Fluency Scorecard Inside Claude -DeepSWE Launches as a Contamination-Resistant Long-Horizon Benchmark for Coding Agents -China Broadens Overseas Travel Restrictions to AI Leaders at Private Tech Firms -GitHub Repository Maps the Shift Toward Native Multimodal AI Models -DuckDuckGo’s AI-Free Search Traffic Jumps After Google Pushes AI Mode -Anthropic details containment strategies to limit autonomous Claude agents’ blast radius -Unwrap Team “Quick connect” booking page on Cal.com -OpenRouter Raises $113M Series B, Reported Valuation Jumps to $1.3B Episode Transcript CEO “AI psychosis” and layoffsFirst up: the C-suite reality check. TechCrunch highlights what Box CEO Aaron Levie called “AI psychosis”—the tendency for executives to see dazzling agent demos and assume entire workflows are basically solved. 
The problem, Levie argues, is distance: leaders get the happy-path prototype, while teams on the ground deal with the last mile: hallucinations, edge cases, debugging, and the painful work of fitting AI into company-specific processes. The story ties this mindset to ongoing layoffs, with firms increasingly framing cuts as “AI-driven productivity,” even as research from places like UC Berkeley, NBER, and others suggests gains are inconsistent and sometimes just shift the bottleneck upward to managers who must review a flood of AI output.
Legal and coding benchmarks reality
That theme, measuring reality instead of vibes, shows up in two new benchmarks. The Legal Agent Benchmark, or LAB, released early baseline results on long-horizon legal tasks graded with an unforgiving “all-pass” standard. End-to-end success stayed in the single digits across frontier models, which is a blunt reminder that “pretty good drafting” is not the same as dependable legal work product. Meanwhile, Datacurve’s DeepSWE benchmark targets real software engineering changes across active open-source repos, designed to reduce contamination and catch verifier errors that can inflate leaderboard scores. Put together, these projects are nudging the industry away from bragging rights and toward a harder question: can agents reliably finish the job when the task is long, messy, and judged like it would be in production?
AI claims major math proofs
Now to the most eyebrow-raising claim of the day: more AI-assisted math breakthroughs. Anthropic employees say a system they call Claude Mythos produced a simple proof for the Erdős unit-distance conjecture, a problem open since 1946. That lands right after OpenAI publicized its own claimed disproof, and alongside DeepMind’s recent announcements around solving multiple Erdős problems using formalization workflows. The interesting part isn’t just who’s first; it’s how these results are achieved. Reports suggest Anthropic used a multi-agent setup where separate Claude instances explored different paths and then shared discoveries. It raises the bar for what we call an “LLM achievement,” because the line between a single model’s insight and an orchestrated tool-and-agent system is getting blurrier by the week.
Containing the blast radius of agents
As agents get more capable, Anthropic is also sharing more about how to keep them from doing damage. In a new write-up, the company argues that human-in-the-loop supervision is not enough, because people approve prompts too easily and get fatigued. So the focus shifts to containment: sandboxes, sealed VMs, and strict network controls that limit what an agent can touch even when it makes a bad decision, or an attacker tricks it. Anthropic also describes incidents that shaped this approach, including cases where credential exfiltration was possible despite guardrails. The broader takeaway is one many security teams will relate to: the safest policy is one that assumes mistakes will happen and designs the environment so mistakes can’t travel far.
AI transparency and anti-AI search
On trust and disclosure, YouTube is tightening how it labels AI-altered video. The platform says its AI disclosure labels will become more visible: prominent below long-form videos and overlaid on Shorts when content is photorealistic or meaningfully modified. It’s also adding automatic detection signals starting this May, so labels may appear even if creators don’t proactively disclose. YouTube says this is about transparency, not punishment: labels shouldn’t affect recommendations or mo...

Today's topics: Uber questions AI coding ROI - Uber’s COO says rising spend on AI coding tools is hard to justify because usage doesn’t clearly translate into shipped features, putting ROI and R&D scrutiny front and center. Deepfake voice scams escalate fast - A Bay Area kidnapping scam used AI voice cloning to mimic a victim’s daughter, highlighting deepfake fraud, social-media audio risk, and the need for verification habits like family code words. GPU inference hits memory wall - A new analysis argues LLM inference is often limited by memory bandwidth and KV cache growth, shaping chip design, inference engines, and infrastructure around moving less data, not just adding compute. DeepSeek bets on efficiency - A thread claims DeepSeek’s real strategy is to reshape compute economics via efficiency methods and KV-cache compression, potentially shifting demand toward SSDs/NAND and broader hardware ecosystems. New ways to benchmark models - BenchBench proposes evaluating frontier models by having them invent new benchmarks, probing creativity and self-knowledge instead of just test-taking on saturated leaderboards. Formal math proofs with AI - DeepMind’s AlphaProof Nexus pairs an LLM with Lean verification to produce checked proofs, solving long-standing problems and showing how formal feedback loops can reduce hallucinated reasoning. Google Gemini 3.5 and Search - Google released Gemini 3.5 Flash and is pushing a more chatbot-like Search, raising questions about reliability, agent safety, and whether links remain central to the web’s discovery model.
Apple iOS 27 AI upgrades - Apple is expected to preview iOS 27 with stronger Apple Intelligence, including better AI image outputs and more proactive Genmoji, signaling a renewed push to compete on consumer AI features. Vatican calls for AI dignity - The Vatican’s new encyclical “Magnifica Humanitas” frames AI as an industrial-revolution-scale challenge, calling for accountability, protection of labor and rights, and caution about simulated empathy. Online trust eroded by AI - Developers report growing frustration with AI-generated ‘noise’ and people reposting unvetted chatbot output, which undermines accountability and trust in online and workplace communication. AI agents for prediction markets - A critique of prediction markets argues they skew toward sports gambling; the proposed fix is AI forecasting agents that can participate cheaply and support niche, internal, or private markets. On-Policy Distillation goes mainstream - Papers with Code highlights On-Policy Distillation as a rising post-training method, signaling broader adoption of techniques that blend distillation with RL-style on-policy feedback. Open model metadata with Models.dev - Models.dev aims to be a community-maintained, open database of AI model capabilities and metadata via a public API, helping teams compare providers as the ecosystem fragments. AI moves into sexual wellness - An AI companion startup’s testing push for guided intimacy features underscores how AI is expanding into sensitive areas, intensifying debates around privacy, consent, and regulation. 
-Uber COO questions ROI as AI tool spending surges after rapid budget burn -AI Hardware Shifts Focus from Compute to Memory Bandwidth and System Bottlenecks -Bay Area Woman Loses $5,400 in AI Voice-Cloned Fake Kidnapping Scam -xAI Launches Grok Build Early Beta Terminal Coding Agent -Pope Leo XIV Issues Encyclical Warning of AI Risks to Dignity, Labor, and Accountability -Engineers Urged to Use AI Adversarially to Strengthen Judgment -Author Frustrated by AI Answers Replacing Real Human Conversations -X Post Claims DeepSeek’s Endgame Is an AI Hardware Ecosystem, Not App Revenues -Joi AI Recruits Paid Testers for AI-Guided Masturbation Feature -BenchBench: A New AI Benchmark Where Models Create Benchmarks for Each Other -Report: iOS 27 to Sharply Improve Genmoji and Image Playground Ahead of WWDC 2026 -Google Launches Gemini 3.5 Flash and Expands Agentic AI Features, but Early Results Are Mixed -Models.dev launches as an open-source database and API for AI model specifications -DeepMind’s AlphaProof Nexus Uses Lean-Verified LLM Loops to Solve Open Erdős Problems -SonarSource releases workbook for comparing code quality and security platforms -Essay Argues AI Agents Could Revive Prediction Markets Beyond Sports Betting -Papers with Code Catalogs On-Policy Distillation as a Rising Post-Training Technique -Thread Claims OpenAI Testing ‘GPT-5.6’ Ahead of Possible June Release -Scribe pitches Optimize as an AI platform to capture workflows, map processes, and justify automation ROI Episode Transcript Uber questions AI coding ROILet’s start with a rare moment of candor from a major tech operator. Uber COO Andrew Macdonald says the company is struggling to justify its rising spend on AI coding tools, because the benefits aren’t clearly showing up as more consumer-facing features. 
Internally, Uber reportedly blew through its entire 2026 budget for these tools in just four months, after a push that encouraged adoption—down to leaderboards tracking usage.
Why this matters: enterprises are learning that “more AI usage” is not the same thing as “more value.” Agentic coding can be cheaper per token over time, but still drive total costs up when the workflow encourages heavy consumption. Uber also says AI is spreading beyond engineering, and that a meaningful slice of committed code now comes from autonomous agents—so the pressure isn’t whether teams will use AI, but how leadership proves ROI in a way that maps to shipping better products.
Deepfake voice scams escalate fast
That leads directly into a broader engineering worry: not that developers will become lazy, but that they’ll become passive. One essay calls the risk “abdication”—accepting AI-generated solutions without the kind of skeptical review you’d apply to a human colleague. The warning is that this creates silent operational debt: code that looks fine today, but fails under real-world edge cases, security pressure, or scaling.
The practical takeaway is a mindset shift. Use AI like an overconfident junior engineer: valuable, fast, and frequently wrong in subtle ways. The suggested habit is to actively interrogate outputs—ask the model to critique itself, identify failure modes, and surface what it might be missing—so human judgment stays engaged rather than outsourced.
GPU inference hits memory wall
And there’s another trust problem brewing: people increasingly can’t tell whether they’re getting human help, or just recycled chatbot output. A developer described searching for guidance after finding malware-spreading repos, only to see the same unhelpful AI text reposted by GitHub users in a discussion—twice. In a workplace example, a...

Please support this podcast by checking out our sponsors: - KrispCall: Agentic Cloud Telephony - https://try.krispcall.com/tad - Discover the Future of AI Audio with ElevenLabs - https://try.elevenlabs.io/tad - Lindy is your ultimate AI assistant that proactively manages your inbox - https://try.lindy.ai/tad Support The Automated Daily directly: Buy me a coffee: https://buymeacoffee.com/theautomateddaily Today's topics: Vatican calls for AI dignity - Pope Leo XIV’s encyclical “Magnifica Humanitas” warns about opaque AI, concentrated power, labor impacts, and urges regulation, accountability, and human dignity. White House AI order stalls - A reported Trump executive order on AI risk testing was pulled back after internal pushback, spotlighting the U.S. split between AI safety and speed vs. China competition. API price war heats up - DeepSeek made a steep price cut permanent for its V4 Pro API, escalating the AI token-cost battle and forcing rivals to justify reliability, compliance, and margins. Anthropic growth, memory, Mythos - Anthropic is reported to be hitting major revenue and profitability milestones, while also hinting at new Claude features like structured “Memory Files” and a potentially broader Mythos release. Using AI for better code - Engineer Nolan Lawson argues AI coding tools work best as a slower, methodical code-review partner—surfacing bugs, ranking severity, and improving codebase health over raw velocity. Evaluating and standardizing agents - OpenAI published guidance on “macro evals” for multi-agent systems, and MCP shipped a major stateless HTTP spec candidate—both aimed at making agents more testable and interoperable. Multimodal AI in daily work - ChatGPT showcased form-filling with image understanding and voice, while ByteDance released an open multimodal model—signs that AI is moving from chat into practical creation and editing. 
Supply-chain security on laptops - Perplexity open-sourced Bumblebee, a read-only endpoint scanner that checks developer machines for risky packages and configs—closing a blind spot in incident response. Data foundations for agentic AI - A survey-based “agentic AI readiness” report and broader Open Data Infrastructure thinking converge on the same point: without governed, interoperable data, agent pilots struggle to reach production.
- Nolan Lawson advocates slower, review-driven AI-assisted coding to improve quality
- Antigravity Adds Gemini 3.5 Flash (Low) to Cut Token Use and Resets Paid Quotas
- OpenAI Cookbook Shows Macro-Eval Workflow to Find Recurring Failures in Multi-Agent Systems
- Clerk launches agent-friendly CLI to automate app authentication setup
- MCP 2026-07-28 Release Candidate Introduces Stateless Core, Extensions, and OAuth/OIDC Hardening
- DeepSeek Makes 75% V4 Pro Price Cut Permanent, Intensifying AI API Price War
- Fivetran Launches Trial Sign-Up Page With Account and Cookie Consent Options
- Fivetran report warns most enterprises aren’t ready to scale agentic AI
- Doctorow: The AI Bubble Isn’t the Dot-Com Bubble—Because Bosses Have to Force Adoption
- Pope Leo XIV Issues Encyclical Warning of AI Risks to Dignity, Labor, and Accountability
- Pope Leo XIV’s New Encyclical Urges AI Regulation and Warns of ‘Opaque Algorithms’
- Anthropic Signals Claude Memory Overhaul With Optional ‘Memory Files’ System
- Perplexity Open-Sources Bumblebee to Scan Developer Laptops for Supply-Chain Exposure
- Reasonix v0.50.0 launches as DeepSeek-native terminal coding agent built around a cache-first loop
- ChatGPT Shows Form-Filling Workflow Using Images and Voice Mode
- Open Data Infrastructure: A Modular, Open-Standards Alternative to Vendor-Locked Data Platforms
- Anthropic Forecasts First Profit as Claude Drives Enterprise Revenue Surge
- David Sacks’s Last-Minute Appeal Helped Spur Trump to Retreat on AI Executive Order
- ByteDance Releases Lance, a 3B Unified Model for Image and Video Understanding, Generation, and Editing
- Notion releases AI playbook featuring 15 agent workflow patterns for teams
- Pope Leo XIV’s First Encyclical Calls to ‘Disarm AI’ and Curb Big Tech Power
- Anthropic readies ‘Mythos 1’ preview for Claude Code and Claude Security
Episode Transcript
Vatican calls for AI dignity
Let’s start with governance and ethics. The Vatican released “Magnifica Humanitas,” an encyclical from Pope Leo XIV focused on human dignity in the age of AI. It draws parallels to the industrial revolution and argues that AI isn’t neutral—because it inherits the values and incentives of the people and institutions behind it. The document flags risks like biased “objectivity,” simulated empathy that can mislead users, and high-stakes decisions—jobs, credit, services—being delegated to systems without compassion. It also calls out the material footprint of AI, and pushes for clearer accountability, stronger oversight, and more public control over how data is treated. Notably, the Pope personally presented it at the Vatican alongside Anthropic’s Christopher Olah, signaling the Church wants a seat at the table with the people building frontier systems.
White House AI order stalls
On the U.S. policy front, reporting says President Trump was set to advance an executive order on AI risk, including a voluntary model-testing process for government evaluation. But an eleventh-hour push from adviser David Sacks helped derail it, arguing that “voluntary” steps can quickly become mandatory regulation—and that could slow U.S. progress amid competition with China. The immediate takeaway isn’t the fine print, it’s the power struggle: the administration appears split between a safety camp and a speed camp, and the speed camp just scored a win. For companies trying to plan around U.S. AI rules, this kind of last-minute reversal is its own signal: uncertainty remains the default.
API price war heats up
Now to the economics of the model market, where the price war keeps getting louder. DeepSeek made permanent a seventy-five percent price cut on its V4 Pro model—after initially treating it like a short-term promotion. That matters because V4 Pro also supports very long context, the kind enterprises use for document review, codebase analysis, and big conversational histories where cost can explode. Cheaper long-context changes what’s feasible—and what procurement teams will pressure vendors to match. But it also comes with tradeoffs: some enterprises will weigh savings against reliability, compliance requirements, and geopolitical exposure when sending sensitive data to a Chinese provider. And hovering over it all is the unresolved dispute around alleged improper distillation of Claude outputs, which keeps training-data provenance in the spotlight.
Anthropic growth, memory, Mythos
Staying with the business of AI, Anthropic is reportedly heading into a breakout quarter: projected Q2 revenue of about 10.9 billion dollars, potentially its first profitable quarter, and growing demand tied to Claude’s coding products. If those numbers hold, they reinforce a broader shift: coding and developer workflows look like one of the first truly massive monetization channels for LLMs.
At the same time, Anthropic is also rumored to ...