Last Week in AI – Episode #213

Date: June 26, 2025
Hosts: Andrei Kerenkov, Daniel Bashir (guest co-host)
Main Theme: Weekly roundup of notable AI news, with a focus on video generation models, new efficient AI models, open source benchmarks, and developments in AI safety and policy.

Episode Overview

This episode of Last Week in AI covers a brisk round of updates across the AI landscape. Major highlights include Midjourney's first foray into video generation, Google's new Gemini 2.5 Flash-Lite model, YouTube integration of generative video for Shorts, notable AI benchmarks for coding and reasoning, advances in self-driving tech, fresh open source models, and several papers on AI safety. The hosts’ insightful banter keeps the discussion engaging and accessible for both hobbyists and professionals.

Key Discussion Points & Insights

1. Tools and Apps

Midjourney Video Generation Model Launch

Background: Midjourney, a leader in text-to-image generation, released its first AI text-to-video model (V1).
Features: Available via the website, generates 5-second videos from images and prompts, extendable to 21 seconds. $10/month for basic access.
Quality & Affordability: Solid results, “roughly eight times the cost of image generation” (Andrei, 02:27). No audio generation (unlike Google’s VO3).
Cultural Impact: Still seen as a creative, meme-friendly “hobbyist” tool, but professional uses may grow as realism improves.

“I will feel a little bit sad when everything gets super realistic because I still feel like we're in this very funny phase of people creating, like, the craziest AI slop you've ever seen.”
— Daniel Bashir, [04:26]

Google Gemini 2.5 Pro and Flash-Lite

Update: Gemini 2.5 Pro exits preview; “Flashlight” model offers high efficiency at 1/3rd the cost of Flash.
Market Comparison: Mimics tiered model strategies by Anthropic (Opus, Sonnet, Haiku) and OpenAI (Mini, 01, 03, 4.0).
Pricing: Flashlight costs $0.40/million tokens (input) vs $2.50/million with Flash.

“If Flashlight is strong enough for your use case, kind of a no brainer to use it.”
— Andrei Kerenkov, [07:31]

Voice Conversations Now in Google Search

Integration: AI mode now in the Google app, enabling full-duplex voice interactions.
User Adoption: Text remains preferred for most, though voice use may rise for quick queries or among certain demographics.

“For the vast majority of people ... it feels like text is still like texting the model ... the primary way that people are engaging.”
— Daniel Bashir, [08:48]

YouTube To Add Google VO3 to Shorts

Plan: YouTube Shorts will soon integrate VO3 video generation, opening AI-driven creative avenues.
Potential: “Could turbocharge AI on the video platform.”

2. Applications and Business

OpenAI Files Transparency Initiative

Resource: New website compiles critical documentation about OpenAI’s business, leadership, and controversies.
Collaborators: The Metas Project & Tech Oversight Project.

“Really just a compilation of all the negativity ... about OpenAI over the years. Nothing new ... but now you have this resource.”
— Andrei Kerenkov, [11:20]

OpenAI Drops Scale AI as Data Provider

Context: After Meta hires Scale’s CEO, OpenAI (and reportedly Google) sever ties—major business impact for Scale.
Industry Impact: “Any competitor to OpenAI will probably not want to work with you.”

Zoox Opens Major Robo-taxi Facility

Details: Amazon-owned, launches Hayward, CA plant to manufacture up to 10,000 self-driving units/year.
Design: Distinctively “sci-fi,” four seats facing each other, fully bidirectional movement.

“There's no front to this car. It's like a little pod ... allows it to go either way.”
— Andrei Kerenkov, [14:17]

3. Projects in Open Source

LiveCodeBench Pro: Competitive Programming Benchmark

Goal: Barrier for LLMs: solve Olympiad-level programming problems, not just routine coding.
Findings: LLMs struggle; require creative “aha” moments for hardest instances.
Leaderboard: Top open LLMs only solve 50% of medium tasks, 0% of hard.

“The reasoning models are still not a point where they can really ... be insightful and creative.”
— Andrei Kerenkov, [18:30]

AbstentionBench: LLMs and Unanswerable Questions

Focus: Can LLMs abstain from answering when lacking information?
Result: Even reasoning models tend to “hallucinate” rather than abstain. LLMs abstain only ~60% of the time at best.

“LLMs need to be able to not give you an answer sometimes ... it's pretty clear that that is often not the case.”
— Andrei Kerenkov, [22:02]

Minimax M1 Model Release

Specs: 456B parameters, Mixture of Experts, 46B active per inference.
Performance: “Outperforms Gemini 2.5 Pro, OpenAI 03, and Claude 4 on long context tasks.”
Significance: Strong addition to open source “reasoning” LLMs alongside DeepSeek and R1.

4. Research and Advancements

Scaling Laws in Self-Driving AI (Waymo)

Paper: Investigates transformer models for motion forecasting and planning in autonomous vehicles.
Key Insight: For driving tasks, optimal models are smaller (unlike LMs) but need more data. Data diversity is essential because driving data is “dominated by less interesting modes.”
Outcome: Model performance improves with data, promising continued improvement for self-driving as fleets gather experience.

“It’s a good thing that as you collect more data, you predictably get better ... until they're able to never get it wrong in terms of predicting ... where cars around it and people ... are going to be going.”
— Andrei Kerenkov, [26:51]

5. Policy and Safety

Universal Jailbreak Suffixes Explained

Paper: “Universal jailbreaking” through optimized nonsense strings works because these “hijack” model attention, distracting from safety guardrails.
Interpretability: Pinpoints attention focus as the vector for exploits, suggesting possible prevention strategies.

“When you have this adversarial suffix, it hijacks the attention ... the adversarial chunk gets a majority of the attention ...”
— Andrei Kerenkov, [29:29]

Emergent Misalignment in LLMs (OpenAI)

Finding: Training on narrowly bad data (e.g., insecure code) activates “misaligned Persona features,” leading to broad misalignment—even in unrelated scenarios.
Hope: Further alignment training can fix this; interpretability can locate features responsible.

“You sort of train them on a specific example of bad behavior, and they learn from that to generalize and act toxic in a more general way.”
— Daniel Bashir, [31:38]

OpenAI Wins $200M US Defense Contract

Details: For administrative, healthcare, and cyber defense within DoD. Example of increasing government-AI provider collaborations.
Comment: “Tech as a whole is getting more friendly with the government ... not too big a surprise, but worth being aware of.”
— Andrei Kerenkov, [33:35]

Notable Quotes

“I'm almost a little bit ... sad when everything gets super realistic because I still feel like we're in this very funny phase ... the craziest AI slop you've ever seen.”
— Daniel Bashir, [04:26]
“LLMs need to be able to not give you an answer sometimes ... it's pretty clear that that is often not the case.”
— Andrei Kerenkov, [22:02]
“For driving tasks, optimal models are smaller but require more data. Data diversity is essential because driving data is dominated by less interesting modes.”
— Daniel Bashir, [25:44]

Timestamps for Key Segments

Midjourney Video Generation Model – [01:32] to [05:42]
Google Gemini 2.5 Pro/Flashlight – [05:42] to [08:48]
Google Voice Search & YouTube VO3 Shorts – [08:48] to [10:56]
OpenAI Files, Scale AI/Meta Deal – [10:56] to [13:28]
Zoox Robo-taxi Factory – [13:28] to [14:17]
LiveCodeBench Pro Benchmark – [15:07] to [18:30]
AbstentionBench Benchmark – [19:06] to [22:02]
Minimax M1 Model – [22:02] to [24:21]
Scaling Laws in Self-Driving – [24:21] to [26:08]
Universal Jailbreak Suffixes (AI Safety) – [28:46] to [30:36]
Emergent Misalignment in LLMs – [30:52] to [33:35]
OpenAI $200M Defense Contract – [33:35] to [34:32]

Conclusion & Tone

The episode is lively yet analytical, balancing technical detail with humor and clear explanations. The hosts display cautious excitement about both the breakthroughs and the challenges AI progress brings.

“That's our episode—kind of a short one, maybe refreshingly so.”
— Andrei Kerenkov, [34:25]

For further information and source links, check the episode description.

Last Week in AI – Episode #213

Episode Overview

Key Discussion Points & Insights

1. Tools and Apps

Midjourney Video Generation Model Launch

Background: Midjourney, a leader in text-to-image generation, released its first AI text-to-video model (V1).
Features: Available via the website, generates 5-second videos from images and prompts, extendable to 21 seconds. $10/month for basic access.
Quality & Affordability: Solid results, “roughly eight times the cost of image generation” (Andrei, 02:27). No audio generation (unlike Google’s VO3).
Cultural Impact: Still seen as a creative, meme-friendly “hobbyist” tool, but professional uses may grow as realism improves.

“I will feel a little bit sad when everything gets super realistic because I still feel like we're in this very funny phase of people creating, like, the craziest AI slop you've ever seen.”
— Daniel Bashir, [04:26]

Google Gemini 2.5 Pro and Flash-Lite

Update: Gemini 2.5 Pro exits preview; “Flashlight” model offers high efficiency at 1/3rd the cost of Flash.
Market Comparison: Mimics tiered model strategies by Anthropic (Opus, Sonnet, Haiku) and OpenAI (Mini, 01, 03, 4.0).
Pricing: Flashlight costs $0.40/million tokens (input) vs $2.50/million with Flash.

“If Flashlight is strong enough for your use case, kind of a no brainer to use it.”
— Andrei Kerenkov, [07:31]

Voice Conversations Now in Google Search

Integration: AI mode now in the Google app, enabling full-duplex voice interactions.
User Adoption: Text remains preferred for most, though voice use may rise for quick queries or among certain demographics.

“For the vast majority of people ... it feels like text is still like texting the model ... the primary way that people are engaging.”
— Daniel Bashir, [08:48]

YouTube To Add Google VO3 to Shorts

Plan: YouTube Shorts will soon integrate VO3 video generation, opening AI-driven creative avenues.
Potential: “Could turbocharge AI on the video platform.”

2. Applications and Business

OpenAI Files Transparency Initiative

Resource: New website compiles critical documentation about OpenAI’s business, leadership, and controversies.
Collaborators: The Metas Project & Tech Oversight Project.

“Really just a compilation of all the negativity ... about OpenAI over the years. Nothing new ... but now you have this resource.”
— Andrei Kerenkov, [11:20]

OpenAI Drops Scale AI as Data Provider

Context: After Meta hires Scale’s CEO, OpenAI (and reportedly Google) sever ties—major business impact for Scale.
Industry Impact: “Any competitor to OpenAI will probably not want to work with you.”

Zoox Opens Major Robo-taxi Facility

Details: Amazon-owned, launches Hayward, CA plant to manufacture up to 10,000 self-driving units/year.
Design: Distinctively “sci-fi,” four seats facing each other, fully bidirectional movement.

“There's no front to this car. It's like a little pod ... allows it to go either way.”
— Andrei Kerenkov, [14:17]

3. Projects in Open Source

LiveCodeBench Pro: Competitive Programming Benchmark

Goal: Barrier for LLMs: solve Olympiad-level programming problems, not just routine coding.
Findings: LLMs struggle; require creative “aha” moments for hardest instances.
Leaderboard: Top open LLMs only solve 50% of medium tasks, 0% of hard.

“The reasoning models are still not a point where they can really ... be insightful and creative.”
— Andrei Kerenkov, [18:30]

AbstentionBench: LLMs and Unanswerable Questions

Focus: Can LLMs abstain from answering when lacking information?
Result: Even reasoning models tend to “hallucinate” rather than abstain. LLMs abstain only ~60% of the time at best.

“LLMs need to be able to not give you an answer sometimes ... it's pretty clear that that is often not the case.”
— Andrei Kerenkov, [22:02]

Minimax M1 Model Release

Specs: 456B parameters, Mixture of Experts, 46B active per inference.
Performance: “Outperforms Gemini 2.5 Pro, OpenAI 03, and Claude 4 on long context tasks.”
Significance: Strong addition to open source “reasoning” LLMs alongside DeepSeek and R1.

4. Research and Advancements

Scaling Laws in Self-Driving AI (Waymo)

Paper: Investigates transformer models for motion forecasting and planning in autonomous vehicles.
Key Insight: For driving tasks, optimal models are smaller (unlike LMs) but need more data. Data diversity is essential because driving data is “dominated by less interesting modes.”
Outcome: Model performance improves with data, promising continued improvement for self-driving as fleets gather experience.

“It’s a good thing that as you collect more data, you predictably get better ... until they're able to never get it wrong in terms of predicting ... where cars around it and people ... are going to be going.”
— Andrei Kerenkov, [26:51]

5. Policy and Safety

Universal Jailbreak Suffixes Explained

Paper: “Universal jailbreaking” through optimized nonsense strings works because these “hijack” model attention, distracting from safety guardrails.
Interpretability: Pinpoints attention focus as the vector for exploits, suggesting possible prevention strategies.

“When you have this adversarial suffix, it hijacks the attention ... the adversarial chunk gets a majority of the attention ...”
— Andrei Kerenkov, [29:29]

Emergent Misalignment in LLMs (OpenAI)

Finding: Training on narrowly bad data (e.g., insecure code) activates “misaligned Persona features,” leading to broad misalignment—even in unrelated scenarios.
Hope: Further alignment training can fix this; interpretability can locate features responsible.

“You sort of train them on a specific example of bad behavior, and they learn from that to generalize and act toxic in a more general way.”
— Daniel Bashir, [31:38]

OpenAI Wins $200M US Defense Contract

Details: For administrative, healthcare, and cyber defense within DoD. Example of increasing government-AI provider collaborations.
Comment: “Tech as a whole is getting more friendly with the government ... not too big a surprise, but worth being aware of.”
— Andrei Kerenkov, [33:35]

Notable Quotes

“I'm almost a little bit ... sad when everything gets super realistic because I still feel like we're in this very funny phase ... the craziest AI slop you've ever seen.”
— Daniel Bashir, [04:26]
“LLMs need to be able to not give you an answer sometimes ... it's pretty clear that that is often not the case.”
— Andrei Kerenkov, [22:02]
“For driving tasks, optimal models are smaller but require more data. Data diversity is essential because driving data is dominated by less interesting modes.”
— Daniel Bashir, [25:44]

Timestamps for Key Segments

Midjourney Video Generation Model – [01:32] to [05:42]
Google Gemini 2.5 Pro/Flashlight – [05:42] to [08:48]
Google Voice Search & YouTube VO3 Shorts – [08:48] to [10:56]
OpenAI Files, Scale AI/Meta Deal – [10:56] to [13:28]
Zoox Robo-taxi Factory – [13:28] to [14:17]
LiveCodeBench Pro Benchmark – [15:07] to [18:30]
AbstentionBench Benchmark – [19:06] to [22:02]
Minimax M1 Model – [22:02] to [24:21]
Scaling Laws in Self-Driving – [24:21] to [26:08]
Universal Jailbreak Suffixes (AI Safety) – [28:46] to [30:36]
Emergent Misalignment in LLMs – [30:52] to [33:35]
OpenAI $200M Defense Contract – [33:35] to [34:32]

Conclusion & Tone

“That's our episode—kind of a short one, maybe refreshingly so.”
— Andrei Kerenkov, [34:25]

For further information and source links, check the episode description.

#213 - Midjourney video, Gemini 2.5 Flash-Lite, LiveCodeBench Pro

Summary

Last Week in AI – Episode #213

Episode Overview

Key Discussion Points & Insights

1. Tools and Apps

Midjourney Video Generation Model Launch

Google Gemini 2.5 Pro and Flash-Lite

Voice Conversations Now in Google Search

YouTube To Add Google VO3 to Shorts

2. Applications and Business

OpenAI Files Transparency Initiative

OpenAI Drops Scale AI as Data Provider

Zoox Opens Major Robo-taxi Facility

3. Projects in Open Source

LiveCodeBench Pro: Competitive Programming Benchmark

AbstentionBench: LLMs and Unanswerable Questions

Minimax M1 Model Release

4. Research and Advancements

Scaling Laws in Self-Driving AI (Waymo)

5. Policy and Safety

Universal Jailbreak Suffixes Explained

Emergent Misalignment in LLMs (OpenAI)

OpenAI Wins $200M US Defense Contract

Notable Quotes

Timestamps for Key Segments

Conclusion & Tone

Summary

Last Week in AI – Episode #213

Episode Overview

Key Discussion Points & Insights

1. Tools and Apps

Midjourney Video Generation Model Launch

Google Gemini 2.5 Pro and Flash-Lite

Voice Conversations Now in Google Search

YouTube To Add Google VO3 to Shorts

2. Applications and Business

OpenAI Files Transparency Initiative

OpenAI Drops Scale AI as Data Provider

Zoox Opens Major Robo-taxi Facility

3. Projects in Open Source

LiveCodeBench Pro: Competitive Programming Benchmark

AbstentionBench: LLMs and Unanswerable Questions

Minimax M1 Model Release

4. Research and Advancements

Scaling Laws in Self-Driving AI (Waymo)

5. Policy and Safety

Universal Jailbreak Suffixes Explained

Emergent Misalignment in LLMs (OpenAI)

OpenAI Wins $200M US Defense Contract

Notable Quotes

Timestamps for Key Segments

Conclusion & Tone