Large Language Model (LLM) Talk | Wave AI Podcast Notes

Hosted by AI-Talk · EN

AI Explained breaks down the world of AI in just 10 minutes. Get quick, clear insights into AI concepts and innovations, without any complicated math or jargon. Perfect for your commute or spare time, this podcast makes understanding AI easy, engaging, and fun—whether you're a beginner or tech enthusiast.

11 episodes

Episodes

Context Engineering
Jan 21 · 00:13:58
Context engineering is the system-level discipline of architecting the dynamic information environment for AI models. Unlike prompt engineering, which focuses on phrasing specific instructions, context engineering programmatically assembles the model's "working memory" using retrieved data, tool outputs, and conversation history. It employs strategies like selection, compression, and ordering to manage token limits and prevent "context rot." By orchestrating how information is filtered and presented at runtime, context engineering ensures LLMs remain grounded and reliable for complex, long-horizon tasks, effectively serving as the operating system for agentic AI.
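The selection, compression, and ordering strategies named above can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the `Chunk` type, the `relevance` score, the word-count tokenizer, and the truncation-based `compress` are hypothetical stand-ins, not something described in the episode.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    relevance: float  # hypothetical retriever score, higher is better


def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: one token per whitespace-separated word.
    return len(text.split())


def compress(text: str, max_tokens: int) -> str:
    # Naive compression: keep only the first max_tokens words.
    return " ".join(text.split()[:max_tokens])


def assemble_context(chunks, budget, per_chunk_cap):
    selected = []
    used = 0
    # Selection: walk chunks from most to least relevant, skipping any that overflow the budget.
    for chunk in sorted(chunks, key=lambda c: c.relevance, reverse=True):
        text = compress(chunk.text, per_chunk_cap)
        cost = count_tokens(text)
        if used + cost > budget:
            continue
        selected.append((chunk.relevance, text))
        used += cost
    # Ordering: place the most relevant chunk last, closest to the user's question.
    selected.sort(key=lambda pair: pair[0])
    return [text for _, text in selected]
```

A production pipeline would likely swap in a real tokenizer and a summarizer for `compress`; the point of the sketch is only that the context is assembled programmatically under a token budget rather than written by hand.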
Manus AI
Jan 19 · 00:17:12
Manus AI is a general-purpose autonomous agent designed to function as a digital worker rather than a passive chatbot. Developed by Monica and acquired by Meta, it utilizes a Planner-Executor architecture to orchestrate foundation models like Claude and Qwen within cloud-based sandboxes. Manus excels at complex, asynchronous tasks—including app deployment, massive parallel research, and data analysis—by autonomously planning workflows and executing actions via a virtual file system and browser. Its unique Context Engineering and multi-agent approach enable it to manage long-horizon tasks efficiently without constant human oversight.
Kimi K2
Jul 22 · 00:15:30
Kimi K2, developed by Moonshot AI, is an open agentic intelligence model built on a Mixture-of-Experts (MoE) architecture. It features 1 trillion total parameters, with 32 billion active during inference. Trained on 15.5 trillion tokens using the stable MuonClip optimizer, Kimi K2 is optimized for advanced reasoning, coding, and tool use. It offers strong performance and significantly lower pricing than many competitors, making cutting-edge AI accessible and fostering innovation.
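The gap between total and active parameters comes from routing: each token is sent to only a few experts. The sketch below shows generic top-k gating in plain Python. It is not Kimi K2's actual router, whose details are not given here; the expert functions and router logits are hypothetical.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    order = sorted(range(len(router_logits)),
                   key=lambda i: router_logits[i], reverse=True)
    top = order[:k]
    # Renormalize the gate weights over the selected experts only.
    weights = softmax([router_logits[i] for i in top])
    return list(zip(top, weights))


def moe_forward(x, experts, router_logits, k):
    # Only the k selected experts are evaluated; the rest stay idle for this token.
    return sum(w * experts[i](x) for i, w in route_top_k(router_logits, k))
```

With four toy experts and `k=2`, only two expert functions run per input, which is the same mechanism that keeps the active parameter count far below the total.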
Mixture-of-Recursions (MoR)
Jul 18 · 00:16:43
Mixture-of-Recursions (MoR) is a unified framework built on a Recursive Transformer architecture, designed to enhance the efficiency of large language models. It achieves this by combining three core paradigms: parameter sharing (reusing shared layers across recursion steps), adaptive computation (dynamically assigning different processing depths to individual tokens via lightweight routers), and efficient Key-Value (KV) caching (selectively storing or sharing KV pairs). This integrated approach enables MoR to deliver large-model quality with significantly reduced computational and memory overhead, improving efficiency for both training and inference.
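A toy sketch of the first two ideas, parameter sharing and per-token adaptive depth, might look like the following. The `shared_block` function and the magnitude-based `route_depth` rule are hypothetical placeholders; in MoR the router is a learned component, and KV caching is omitted here entirely.

```python
def shared_block(h):
    # Stand-in for one shared stack of layers, reused at every recursion step.
    return 0.5 * h + 1.0


def route_depth(h, max_depth):
    # Hypothetical router rule: larger-magnitude tokens receive more recursion steps.
    return min(max_depth, 1 + int(abs(h)))


def mor_forward(hidden, max_depth=3):
    outputs, depths = [], []
    for h in hidden:
        depth = route_depth(h, max_depth)
        # Apply the same block `depth` times, so no new parameters are added per step.
        for _ in range(depth):
            h = shared_block(h)
        outputs.append(h)
        depths.append(depth)
    return outputs, depths
```

Tokens that exit early skip the remaining recursion steps, which is where the compute saving comes from.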
MeanFlow
Jul 10, 2025 · 00:06:47
MeanFlow models introduce the concept of average velocity to fundamentally reformulate one-step generative modeling. Unlike Flow Matching, which focuses on instantaneous velocity, MeanFlow directly models the displacement over a time interval. This approach allows for highly efficient one-step or few-step generation using a single network evaluation. MeanFlow is built on a principled mathematical identity between average and instantaneous velocities, guiding network training without requiring pre-training, distillation, or curriculum learning. It achieves state-of-the-art performance for one-step generation, significantly narrowing the gap with multi-step models.
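The identity between average and instantaneous velocity can be written out as follows. The notation is an assumption for illustration: u is the average velocity over the interval [r, t], v is the instantaneous velocity, and z_t is the state at time t.

```latex
% Average velocity over [r, t], defined from the instantaneous velocity v:
u(z_t, r, t) \;=\; \frac{1}{t - r} \int_{r}^{t} v(z_\tau, \tau)\, d\tau

% Multiply by (t - r) and differentiate both sides with respect to t:
\frac{d}{dt}\Big[ (t - r)\, u(z_t, r, t) \Big] \;=\; v(z_t, t)
\quad\Longrightarrow\quad
u(z_t, r, t) \;=\; v(z_t, t) \;-\; (t - r)\, \frac{d}{dt}\, u(z_t, r, t)

% One-step generation applies the displacement over the whole interval:
z_r \;=\; z_t \;-\; (t - r)\, u(z_t, r, t)
```

The right-hand side of the identity contains no integral, which is what lets it serve as a training target for a network that predicts u directly.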
Mamba
Jul 10, 2025 · 00:08:14
Mamba is a novel deep learning architecture that achieves linear scaling in computation and memory with sequence length, addressing Transformers' quadratic limitations. Its selective State Space Model (SSM) layer dynamically adapts to input context, allowing it to "forget" or "remember" information. Optimizations include a hardware-aware parallel algorithm for its recurrent "selective scan", employing kernel fusion for efficient GPU memory usage and recomputation to reduce memory footprint during training. This results in significantly faster inference (up to 5x throughput) and superior long-context handling.
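The "forget or remember" behavior can be sketched with a scalar recurrence whose decay depends on the input. This is a simplified illustration under stated assumptions: the state is a single number, the state matrix is fixed at A = -1, `w_delta` is a hypothetical parameter, and `b` and `c` are constants, whereas Mamba also makes them input-dependent and runs the scan with a fused GPU kernel.

```python
import math


def softplus(x):
    # Smooth positive function used to produce the step size.
    return math.log1p(math.exp(x))


def selective_scan(xs, w_delta, b=1.0, c=1.0):
    h = 0.0
    ys = []
    for x in xs:
        delta = softplus(w_delta * x)  # input-dependent step size
        a = math.exp(-delta)           # decay in (0, 1): near 1 remembers, near 0 forgets
        # Zero-order-hold discretization with A = -1: h_t = a * h_{t-1} + (1 - a) * b * x_t
        h = a * h + (1.0 - a) * b * x
        ys.append(c * h)
    return ys
```

The loop touches each input once and carries a fixed-size state, so time and memory grow linearly with sequence length.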
LLM Alignment
Jun 14, 2025 · 00:20:06
LLM alignment is the process of steering Large Language Models to operate in a manner consistent with intended human goals, preferences, and ethical principles. Its primary objective is to make LLMs helpful, honest, and harmless, ensuring their outputs align with specific values and are advantageous to users. This critical process prevents unintended or harmful outputs, mitigates issues like specification gaming and reward hacking, addresses biases and falsehoods, and manages the complexity of these powerful AI systems. Alignment is vital to transform unpredictable models into reliable, trustworthy, and beneficial tools, especially as AI capabilities advance.
Why We Think
May 20, 2025 · 00:14:20
"Why We Think," by Lilian Weng, examines how to improve language models by allocating more computation at test time, drawing an analogy to human "slow thinking" (System 2). By treating computation as a resource, the aim is to design systems that can use this test-time effort effectively for better performance. Key approaches involve generating intermediate steps like Chain-of-Thought, employing decoding methods such as parallel sampling and sequential revision, using reinforcement learning to enhance reasoning, enabling external tool use, and implementing adaptive computation time. This allows models to spend more resources on analysis, similar to human deliberation, to achieve improved results.
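Parallel sampling is the simplest of these decoding methods to illustrate: draw several independent answers and keep the most common one. In the sketch below, `sample_answer` is a hypothetical callable standing in for one stochastic model call; it is not a real API.

```python
import random
from collections import Counter


def majority_vote(answers):
    # Return the most frequent answer; ties go to the answer seen first.
    return Counter(answers).most_common(1)[0][0]


def parallel_sample_and_vote(sample_answer, n, seed=0):
    # Spend more test-time compute by drawing n independent samples.
    rng = random.Random(seed)
    answers = [sample_answer(rng) for _ in range(n)]
    return majority_vote(answers), answers
```

Raising `n` trades extra inference cost for a more stable final answer, which is the "computation as a resource" framing in miniature.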
Deep Research
May 12, 2025 · 00:11:35
Deep Research is an autonomous research agent built into ChatGPT. It performs multi-step online research over several minutes, behaving like a human researcher by searching, reading, analyzing, and synthesizing information from multiple sources. It produces detailed, cited reports. Unlike standard ChatGPT's single-step responses, Deep Research uses an agent architecture orchestrating specialized reasoning models (like o3-mini) and generalist models (like GPT-4).
vLLM
May 4, 2025 · 00:13:06
vLLM is a high-throughput serving system for large language models. It addresses inefficient KV cache memory management in existing systems caused by fragmentation and lack of sharing, which limits batch size. vLLM uses PagedAttention, inspired by OS paging, to manage KV cache in non-contiguous blocks. This minimizes memory waste and enables flexible sharing, allowing vLLM to batch significantly more requests. As a result, vLLM achieves 2-4x higher throughput compared to state-of-the-art systems like FasterTransformer and Orca.
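The paging idea can be sketched as a per-sequence block table that maps logical KV positions to physical blocks allocated on demand. The `BlockAllocator` class below is a hypothetical illustration, not vLLM's actual API, and it tracks block bookkeeping only; no tensors are stored.

```python
class BlockAllocator:
    def __init__(self, num_blocks, block_size):
        self.block_size = block_size
        self.free = list(range(num_blocks))  # pool of physical block ids
        self.tables = {}                     # seq_id -> list of physical block ids
        self.lengths = {}                    # seq_id -> number of tokens stored

    def append_token(self, seq_id):
        """Reserve a KV slot for one new token; return (physical_block, offset)."""
        table = self.tables.setdefault(seq_id, [])
        n = self.lengths.get(seq_id, 0)
        # A new block is needed only when the previous one is full.
        if n % self.block_size == 0:
            if not self.free:
                raise MemoryError("no free KV blocks")
            table.append(self.free.pop())
        self.lengths[seq_id] = n + 1
        return table[-1], n % self.block_size

    def release(self, seq_id):
        # Return all of a finished sequence's blocks to the pool.
        self.free.extend(self.tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)
```

Because blocks need not be contiguous, at most one partially filled block per sequence is wasted, which leaves room to batch more requests in the same memory.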