Beyond SFT-to-RL: Pre-alignment via Black-Box On-Policy Distillation for Multimodal RL - Daily Paper Cast | Wave AI Podcast Notes