AgentStack Daily

Episode 92: Google Rolls Out Fast, Cheap, and Security-Focused Gemini Tiers

Google rolls out three new Gemini tiers targeting speed, cost, and security. A judge signs off on Anthropic's $1.5B settlement for pirated training books, and Andrew Ng's OpenWorker delivers finished work instead of chat. Cisco ships tiny local models that spot known bugs in unknown code, while OpenAI's cyber-eval tool gets weaponized against Hugging Face. Plus XcodeBuildMCP 2.7.0, openagent 2.85.0 with computer-use and browser-use, and NTT DATA slashing incident analysis to 30 minutes with Codex. Show notes: https://tobyonfitnesstech.com/podcasts/episode-92/

Episode 91: 27B Open-Weight Model Now Fits in Laptop RAM, Dorsey Ships Buzz, Anthropic Settles $1.5B Case

6d ago, 00:34:36

A 27-billion-parameter open-weight model now fits in laptop RAM. Jack Dorsey launches Buzz, a mashup of chat, AI agents, and Git. Anthropic's $1.5B pirated-books settlement clears final court approval. Semble's 98% leaner code search engine targets AI agents. XcodeBuildMCP 2.6.2 gives coding agents real Apple build access. Plus Gemini 3.6 Flash surfaces in listings, OpenAI and Hugging Face disclose a model-evaluation security incident, and Bristol Myers Squibb deploys Vera Rubin for drug discovery. Show notes: https://tobyonfitnesstech.com/podcasts/episode-91/

Episode 90: NVIDIA Cosmos 3 Edge Lands on Hugging Face for Physical AI

1w ago, 00:33:56

Cosmos 3 Edge from NVIDIA lands on Hugging Face, plus Unity's MCP v10.1.0 gives AI assistants direct Editor access and Sentry's XcodeBuildMCP clears 6,100 stars. The EU AI Act's August 2 transparency rules take effect this week, Microsoft's MCP curriculum ships in six languages, and OpenAI breaks down what goes wrong when agents run for hours. Also: Ternary on Hugging Face, whodb 0.121.0, codebase-memory-mcp, holaOS at 5,500 stars, and Upsonic 0.77.3. Show notes: https://tobyonfitnesstech.com/podcasts/episode-90/

Episode 89: Kimi K3 Lands at 2.8T Parameters, Pricing Stings

1w ago, 00:34:14

Today's rundown: Moonshot ships Kimi K3 at 2.8 trillion parameters with steep pricing. Codebase Memory MCP reports a 99% drop in coding-agent token use. Prism-ML's Ternary-Bonsai-27B surges on the local-AI charts. Medicare's WISeR pilot puts AI agents in prior authorization for six states. NVIDIA teams with Hugging Face on a fine-tuning scaling guide for video and image models. LongStraw and VideoChat3 push research boundaries. Plus the Agent Stack Release Readout covering Codex rust-v0.144.6, FastMCP 3.4.4, and Unity-MCP 10.1.0, and Microsoft's MCP for Beginners crossing 16,700 stars. Show notes: https://tobyonfitnesstech.com/podcasts/episode-89/

Episode 88: Kimi K3, Meta Muse Spark, and the Million-Token Era

2w ago, 00:34:06

Today's AgentStack Daily covers OpenAI Codex rust-v0.144.5 and Claude Code CLI 2.1.205, plus Kimi K3 and Meta's Muse Spark 1.1 — both arriving on OpenRouter with million-token context windows. We look at a code-search tool claiming 98 percent token savings, clidey's whodb 0.121.0, Transformers 5.14.1 with two Inkling fixes, and a 2-bit 27B chat model trending online. NVIDIA Nemotron 3 Embed tops RTEB, FastMCP passes 26K stars, and OpenAI argues for a 'reverse federalism' AI safety approach. Show notes: https://tobyonfitnesstech.com/podcasts/episode-88/

Episode 87: GPT-5.6 Sol, Bonsai 27B, Gemini 3.5 Flash, and AMD 128GB PCs

2w ago, 00:33:34

GPT-5.6 Sol targets more complete agent output, ChatGPT consolidates work and coding, and Gemini 3.5 Flash operates screens and builds software. We also examine Bonsai 27B, offline Pixel AI, AMD’s 128GB desktops, JetBrains Copilot backend support, Anthropic’s risk-based model access, small-business results, Claude’s robotics boundary, government action in New York, Australia, and GOLD EAGLE, plus Google’s AI reconstruction of Pelé’s lost 1959 goal. Show notes: https://tobyonfitnesstech.com/podcasts/episode-87/

Episode 86: OpenClaw v2026.7.1, OpenAI Codex rust-v0.144.4, Claude Code 2.1.202 Ship; Kwaipilot Lands on OpenRouter

2w ago, 00:34:38

Today's AgentStack Daily covers three harness releases — OpenClaw v2026.7.1, OpenAI Codex rust-v0.144.4, and Claude Code CLI 2.1.202 — plus Kwaipilot joining OpenRouter. New agent research includes ABot-AgentOS for robot control, Amap's ABot-N1 for visual navigation, LightMem-Ego for wearable multimodal memory, and JobHop v2 for career trajectory reasoning. We also examine a multi-agent backdoor study, evidence-backed video QA from Salesforce, the MM-ToolSandBox visual grounding benchmark, Requential Coding's generalization bounds, and AdvancedMathBench for doctoral-level mathematical proofs. Show notes: https://tobyonfitnesstech.com/podcasts/episode-86/

Episode 85: Codex rust-v0.144.3 Lands, vLLM 0.25.0 Defaults Model Runner V2, Apple Sues OpenAI

2w ago, 00:34:20

OpenAI ships Codex rust-v0.144.3 and rust-v0.144.2; vLLM 0.25.0 promotes Model Runner V2 to default for dense models. Apple sues OpenAI over alleged trade secret theft by ex-employees. Plus Freya-TTS hits Turkish speech with a 183M flow-matching DiT, SAGEAgent cuts glioma diagnostic burden 55%, Agora moves from a router to an auction over reasoning steps, a two-agent system posts 0.402 on QANTA 2026, PAC-ACT trains Action Chunking Transformer policies with chunk-level RL, and Semantic Pareto-DQN addresses fraud collapse without resampling. Show notes: https://tobyonfitnesstech.com/podcasts/episode-85/

Episode 84: Codex 0.144, GPT-5.6 Sol, Grok 4.5, GPT-Live, and Robostral Navigate

2w ago, 00:37:56

Today’s AgentStack Daily examines Codex 0.144, OpenAI’s GPT-5.6 Sol, Terra, and Luna lineup, and SpaceXAI’s Grok 4.5 release. It also covers GPT-Live’s simultaneous listening and speaking, Mistral’s 8B Robostral Navigate model, ChatGPT Work, Microsoft Flint, and new research on continuous-control memory, citation judging, coding evaluations, proactive agents, delegated web research, procedural code retrieval, and energy-market agent testing. Show notes: https://tobyonfitnesstech.com/podcasts/episode-84/

Episode 83: Hermes Agent v2026.7.7, OpenAI Codex rust-v0.143.0, Claude Code 2.1.197, Aion-3.0-Mini

3w ago, 00:35:21

Today's AgentStack Daily: Hermes Agent v2026.7.7 ships, OpenAI Codex lands rust-v0.143.0, and Claude Code CLI releases 2.1.197. AionLabs ships Aion-3.0-Mini roleplay on OpenRouter. Kokoro runs high-fidelity TTS on low-power CPUs. Rowboat debuts on Show HN with 162 points as a Claude Desktop alternative. Security: GitHub AI agent prompt injection leaks private repositories. Plus early-failure probes for agent loops, Danus fact-graph memory, FreqDepthKV, DepthWeave-KV, RuBench 1.0, VAORA, and the Anthropic developer-relations API migration story. Show notes: https://tobyonfitnesstech.com/podcasts/episode-83/
