Concerto: Joint 2D-3D Self-Supervised Learning Emerges Spatial Representations - The AI Research Deep Dive | Wave AI Podcast Notes