Podcast Summary: The MAD Podcast with Matt Turck
Episode: State of LLMs 2026: RLVR, GRPO, Inference Scaling — Sebastian Raschka
Date: January 29, 2026
Host: Matt Turck
Guest: Sebastian Raschka, AI Researcher, Educator, Author
Overview
This episode is a deep dive into the state of the art and cutting-edge developments in large language models (LLMs) as of 2026. Host Matt Turck speaks with Sebastian Raschka—a noted researcher, educator, and author—about breakthroughs in model architectures, alternative approaches, advances in post-training, inference scaling, benchmarking, and emerging trends. The conversation balances technical depth with accessible explanation, with Raschka offering both academic insight and “insider” views from active work in the field.
Key Discussion Points & Insights
1. Transformer Architecture and Alternatives
[01:05–04:04]
- Transformers Still Dominate: Despite being nearly nine years old, transformer-based LLMs remain state of the art.
- "There's nothing really better in terms of state of the art performance... My short answer is I would say right now if I were to build a state of the art model that would be still a transformer based model." — Sebastian [03:49]
- Alternatives Emerging: Newer architectures tackling issues of cost and scale, such as:
- Mixture-of-Experts (MoE): Increases usable parameters without linear cost scaling.
- Linear attention variants: Reduce cost, especially for long-sequence processing.
- Diffusion models and state space models: Offer efficiency but trade off quality (see next section).
- No Free Lunch: Efforts to make things cheaper often force a trade-off with performance or flexibility.
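The MoE idea above can be sketched in a few lines: each token is routed to only the top-k of E expert networks, so total parameter count grows with E while per-token compute grows only with k. This is a minimal illustrative sketch (toy tanh experts, no load balancing), not any production model's implementation:

```python
import numpy as np

def moe_layer(x, experts, router_w, k=2):
    """Route each token to its top-k experts. Only k of len(experts)
    expert MLPs run per token, so compute scales with k, not with
    the total expert (parameter) count."""
    logits = x @ router_w                       # (seq, n_experts)
    topk = np.argsort(logits, axis=-1)[:, -k:]  # indices of the k best experts
    # softmax over the selected logits only
    sel = np.take_along_axis(logits, topk, axis=-1)
    gates = np.exp(sel - sel.max(axis=-1, keepdims=True))
    gates /= gates.sum(axis=-1, keepdims=True)
    out = np.zeros_like(x)
    for t in range(x.shape[0]):                 # per-token dispatch
        for j, e in enumerate(topk[t]):
            out[t] += gates[t, j] * experts[e](x[t])
    return out

rng = np.random.default_rng(0)
d, n_experts = 8, 4
# toy "expert" MLPs: one tanh layer each, weights fixed at creation
experts = [(lambda W: (lambda v: np.tanh(v @ W)))(rng.normal(size=(d, d)))
           for _ in range(n_experts)]
router_w = rng.normal(size=(d, n_experts))
x = rng.normal(size=(5, d))                     # 5 token embeddings
y = moe_layer(x, experts, router_w)
print(y.shape)  # (5, 8)
```

With 4 experts and k=2, doubling the expert count would double parameters while leaving per-token compute unchanged, which is the scaling property the episode highlights.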
Notable Quote:
"You can actually...take a GPT-1 or 2 model and with a few lines of code, almost, you can transform it into the latest...DeepSeek version 3.2 architecture. It’s not like a big leap, it’s still the same scaffold." — Sebastian [03:25]
2. World Models, Recursive Models, and Benchmarks
[04:04–09:45]
- World Models: LLMs that learn an internal representation of the environment, improving on tasks that require context tracking or variable state prediction (especially promising for code and potentially for robotics).
- Tiny Recursive Models & Hierarchical Reasoning: Small, specialized models are showing strong results on benchmarks like ARC, designed for logic and reasoning. They use recursion to refine answers, performing surprisingly well compared to much larger, generalist LLMs.
- Use Cases: While generalist models like GPT-4 or Gemini are versatile, specialized models can dramatically cut costs for focused tasks.
Notable Quote:
"It made a lot of waves also because like, oh, we don't need these big ChatGPT Gemini type of models to solve complex problems.... But each one was a different model. It was not like one model that could do all three things." — Sebastian [08:10]
3. Diffusion Models for Text
[09:45–13:22]
- From Images to Text: Inspired by the success of diffusion models in image generation, researchers are exploring their use for text, generating all tokens in parallel and refining them over denoising steps.
- Trade-Offs: They can be faster for certain tasks but often lack the quality or flexibility of autoregressive transformers. Labs like Google DeepMind are experimenting, but diffusion models remain a second-tier choice.
Notable Quote:
"They are not putting it out there as their state of the art model. It's more like a cheaper model...But [it] is not, I would say, the replacement at the state of the art." — Sebastian [12:14]
4. Architectural Tweaks, MoE, and Progress
[13:22–17:26]
- Smaller Gains from Architecture: Most current progress comes from incremental tweaks—normalization placement, sparse attention, etc.—rather than major breakthroughs.
- MoE Becomes Mainstream: Mixture-of-Experts transitioned from niche to nearly standard for large models in 2025–2026, aiding scalability and efficiency.
- Shift in Focus: The low-hanging fruit has moved away from pre-training toward post-training and inference improvements.
Notable Quote:
"Improvement is not so much coming from the architecture anymore. It is basically the post training." — Sebastian [13:49]
"Pre training is not dead, but pre training is boring. It's not where the low hanging fruit is anymore." — Sebastian [17:31]
5. Post-Training: RLHF, RLVR, GRPO
[18:03–28:19]
- New Techniques Timeline:
- 2022: RLHF (Reinforcement Learning from Human Feedback) enables sharp jump in conversational ability.
- 2025: RLVR (Reinforcement Learning with Verifiable Rewards) and GRPO (Group Relative Policy Optimization) bring dramatic efficiency and accuracy leaps, e.g., DeepSeek R1 model.
- How RLVR & GRPO Work:
- RLHF relies on humans (or a reward model) to rank outputs; RLVR uses objectively verifiable rewards (e.g., math or code correctness).
- GRPO further simplifies the process, improving efficiency and scaling by comparing results in groups, not just pairs.
- Unlocking Reasoning:
- "In my experience...for example, I took the Qwen 3 model...trained it just for 50 steps with RLVR and it goes from 1.5% accuracy on MATH-500 to 50%...by only doing 50 reinforcement learning steps." — Sebastian [24:17]
- Process Reward Models:
- Training LLMs to improve not just final answers but the quality of intermediate explanations (“chain of thought”); still a tricky, nonstandard area where reward hacking is a persistent issue.
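The RLVR/GRPO mechanics described above can be made concrete: a verifiable reward needs no learned reward model, and GRPO scores each sampled completion relative to its group's mean and standard deviation, avoiding a separate value network. A minimal sketch (the exact-match reward and the group normalization shown here are simplified assumptions, not DeepSeek's full recipe):

```python
import statistics

def verifiable_reward(completion, gold_answer):
    """RLVR-style reward: no human ranking or learned reward model,
    just an objective check (here: does the final line of the
    completion match the known answer?)."""
    return 1.0 if completion.strip().splitlines()[-1] == gold_answer else 0.0

def grpo_advantages(rewards):
    """GRPO: sample a *group* of completions per prompt and score each
    one relative to the group, rather than ranking pairs or training
    a value network."""
    mu = statistics.mean(rewards)
    sigma = statistics.pstdev(rewards) or 1.0  # guard against zero spread
    return [(r - mu) / sigma for r in rewards]

# Four sampled completions for one math prompt; the gold answer is "42".
group = ["reasoning...\n42", "reasoning...\n41", "steps...\n42", "guess\n7"]
rewards = [verifiable_reward(c, "42") for c in group]
advs = grpo_advantages(rewards)
print(rewards)  # [1.0, 0.0, 1.0, 0.0]
print(advs)     # [1.0, -1.0, 1.0, -1.0]
```

In a real RLVR run these advantages would weight the policy-gradient update for each completion's tokens; the sketch only shows how cheap and objective the reward signal is compared with RLHF's human rankings.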
6. Challenges and Scaling RL
[28:19–35:17]
- RL’s Scaling Pain Points:
- Expensive and tricky to implement: training is unstable, often needs “babysitting”, and is sensitive to hyperparameter tuning.
- Numerous “tips and tricks” have accumulated to stabilize RL post-training, gradually moving the practice toward maturity.
- Meta-Lesson on AI Progress:
- "There's no magic lever, no magic, I guess, bullet that gives you everything. It is kind of tweaking things here and there and, and making things more robust." — Sebastian [35:39]
- Key: Progress is built from many small innovations, not one big breakthrough.
7. Benchmarks and Real-World Evaluation
[37:08–43:11]
- Benchmaxing:
- Over-optimizing for benchmarks leads to models that may not reflect real-world ability.
- “Leaderboards” can reward style over substance.
- While models might “overfit” to benchmark datasets, rankings across different systems tend to remain meaningful.
- Need for New Evaluation Metrics:
- As benchmarks saturate, there’s a need to transition to more agentic, task-based evaluation, measuring ability to complete real-world tasks over multiple steps.
8. Inference Scaling and Tool Use
[43:11–50:10]
- Inference Scaling as a Major Driver:
- Instead of revising model weights, you increase compute at inference time—e.g., generate more tokens, majority-vote, or run multiple refinement cycles.
- Parallel sampling, response ranking, and iterative “self-refinement” all deliver perceived quality gains but at higher runtime cost.
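The parallel-sampling idea above (often called self-consistency) can be sketched as a majority vote over independent samples. `toy_model` below is a scripted stand-in for an LLM, used only to make the example deterministic:

```python
from collections import Counter

def majority_vote(sample_fn, prompt, n=8):
    """Inference-time scaling via parallel sampling: draw n candidate
    answers from the model and return the most common one. Quality
    improves at the cost of n times the inference compute."""
    answers = [sample_fn(prompt) for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]

# Scripted stand-in for an LLM that is right most (not all) of the time.
canned = iter(["42", "41", "42", "42", "7", "42", "42", "41"])
def toy_model(prompt):
    return next(canned)

best = majority_vote(toy_model, "What is 6*7?", n=8)
print(best)  # 42
```

Response ranking and iterative self-refinement follow the same pattern: spend more tokens at inference time, keep only the best result.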
- Tool Use:
- LLMs calling APIs (e.g., web search, code execution) reduce hallucinations and boost performance.
- Tool integration emerged as a key feature of ChatGPT and GPT-OSS, and is expected to spread as privacy, sandboxing, and local execution mature.
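A tool-calling loop of the kind described can be sketched as a simple dispatch: the model either answers or requests a tool, and the tool's output is appended to the context before the model runs again. Everything here (the action format, the `calc` tool, the scripted model) is hypothetical and for illustration only, not any vendor's API:

```python
import ast
import operator

# A minimal "tool": a safe arithmetic evaluator the model can call
# instead of guessing at math, which is how tool use curbs hallucination.
OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv}

def calc(expr):
    def ev(node):
        if isinstance(node, ast.BinOp):
            return OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.Constant):
            return node.value
        raise ValueError("unsupported expression")
    return ev(ast.parse(expr, mode="eval").body)

def run_with_tools(model_step, prompt, tools, max_turns=4):
    """Toy agent loop: each turn the model either answers or requests
    a tool; tool results are appended to the context and the model
    is invoked again with the enriched context."""
    context = prompt
    for _ in range(max_turns):
        action = model_step(context)
        if action["type"] == "answer":
            return action["text"]
        result = tools[action["tool"]](action["input"])
        context += f"\n[tool {action['tool']} -> {result}]"
    return None

# Scripted stand-in for an LLM: calls the calculator once, then answers.
def scripted_model(context):
    if "[tool" not in context:
        return {"type": "tool_call", "tool": "calc", "input": "1337 * 3"}
    return {"type": "answer", "text": context.rsplit("-> ", 1)[1].rstrip("]")}

answer = run_with_tools(scripted_model, "What is 1337*3?", {"calc": calc})
print(answer)  # 4011
```

Real systems add sandboxing and schema validation around exactly this loop; the "outsource hard things to tools" point from the episode is the loop's whole purpose.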
Notable Quotes:
"I think one of the biggest drivers this year has been also the inference scaling." — Sebastian [43:37]
"Tool calling means that the LLM can call a web search or...code interpreters...that is very, very powerful because...you can outsource a lot of things that are hard to tools. So like we humans do." — Sebastian [46:11]
9. The Next Frontier: Edge, Private Data, Industrial Models
[50:10–55:04]
- Horizontal Parity, Vertical Niche:
- Major LLMs from OpenAI, Google, Anthropic, etc., all feel similar at the “top line”.
- Real competitive advantage for businesses will come from combining LLMs (open-source or proprietary) with private, domain-specific data (finance, medical records, etc.).
- Return to Private Training:
- Big companies now train and fine-tune their own large LLMs in-house for proprietary use—potentially marking a return to the “edge”.
- Open Source’s Evolving Role:
- Open-source weights and research facilitate broader experimentation and education but often trail the scale and performance of proprietary models outside of “big” organizations.
10. Continual Learning: Hype and Reality
[55:04–59:16]
- Highly discussed at conferences, but genuine robust continual learning (self-improving LLMs) is still several years away, likely 2027+.
- Bottlenecks: catastrophic forgetting, high resource cost, risk in continuous updating, and lack of user-specific fine-tuning.
Notable Quote:
"Continual learning is an interesting topic because...it sounds attractive if you have an LLM that self improves or like an agent that does something, fails and learns. I don't think anything like that is feasible this year." — Sebastian [55:44]
11. Raschka’s Workflow, Book Writing, LLM Use, and Reflections
[59:16–67:17]
- Staying Current: Driven by excitement and personal interest; writes only about topics he genuinely finds engaging.
- Blog vs. Books:
- Blog for covering new research rapidly, especially architecture comparisons.
- Book for pedagogical clarity—focus on code, fundamentals, and reproducibility.
- "Code basically doesn't lie—it either works or it doesn't work, you know. And I think that's a very useful way to learn." — Sebastian [61:10]
- Using LLMs:
- Helpful for proofreading, editing, and small clarifications—not for automating research or writing fully.
- "I would say first. Also, the thing is, it's not super satisfying if you just ask the LLM to do it. It's, you know, like cheating at homework." — Sebastian [65:02]
- LLM Burnout & Creative Process:
- Delegating too much to LLMs can feel “empty”; values hands-on work for learning and pride in accomplishment.
Notable Quotes & Moments
- "There's no magic lever, no magic...bullet that gives you everything. It is kind of tweaking things here and there and, and making things more robust." — Sebastian [35:39]
- "Inference scaling...is one of the biggest drivers this year." — Sebastian [43:37]
- "I think that's the challenge we have right now: how do you put that into words to communicate the progress? And I think that will be, in the upcoming years, the more difficult problem to solve: how to actually evaluate what you're using." — Sebastian [42:24]
- "I get very excited about things and then when I'm excited about something, it goes very easy and very fast...there's like a lucky coincidence that...I honestly write only about things I find interesting." — Sebastian [60:21]
- "If you use only LLMs to just generate everything, and I wouldn't say useless, but I would feel maybe empty...Pride. Oh, I did something that worked and it's cool and you're proud of this." — Sebastian [65:20]
Timeline of Key Topics
| Timestamp | Topic |
|-----------|-------|
| 01:05 | Transformer architecture — status, alternatives |
| 04:04 | World models, tiny recursive models, benchmarks |
| 09:45 | Diffusion models for text |
| 13:22 | Architecture tweaks, rise of MoE, small gains |
| 18:03 | RLVR, GRPO, RLHF post-training explained |
| 24:42 | Process reward models, explanation quality |
| 28:19 | RLVR’s application to domains beyond math/code |
| 30:46 | Pain points in scaling RL research |
| 35:39 | Meta-lesson: progress as incremental, multi-factor |
| 38:40 | Benchmarks & "benchmaxing" |
| 43:35 | Inference scaling, tool use, interface tricks |
| 50:10 | Open source, private training, edge models |
| 55:33 | Continual learning: challenges and outlook |
| 59:16 | Raschka’s work habits, book writing, using (and not overusing) LLMs |
| 65:02 | Reflections on LLM-driven workflows and burnout |
Closing Thoughts
Sebastian Raschka offers a grounded and nuanced view of LLM progress: the biggest leaps in 2026 are coming from a patchwork of small but effective improvements in inference, post-training, and practical engineering, rather than from entirely new architectures or wave-making breakthroughs. Progress depends on both collective incremental innovation and interdisciplinary effort, with private, domain-focused LLMs set to become increasingly important. Throughout, Raschka emphasizes hands-on learning and curiosity—both key for anyone seeking to stay on the leading edge of AI.
For more of Raschka’s work and practical guides, see his technical blog, books, and Substack—all recommended for anyone interested in the details behind today’s LLM revolution.
