“Fail safe(r) at alignment by channeling reward-hacking into a “spillway” motivation” by Anders Cairns Woodruff, Alex Mallen - Redwood Research Blog | Wave AI Podcast Notes