“Fail safe(r) at alignment by channeling reward-hacking into a “spillway” motivation” by Anders Cairns Woodruff, Alex Mallen - LessWrong (30+ Karma) | Wave AI Podcast Notes