Transcript
A (0:00)
Thanks for listening to The Rest Is Politics. To support the podcast, listen without the adverts, and get early access to episodes and live show tickets, go to therestispolitics.com. That's therestispolitics.com. Hi, Rory.
B (0:13)
Hi. This week, The Rest Is AI is returning with another extraordinary episode. It's very exciting. Matt Clifford and I are sitting down with Yoshua Bengio, who is one of the most famous figures in the whole of AI: an extraordinary computer scientist, a Turing Award winner, and one of the people who, having designed and built these models, is most worried about them and is now going around the world sounding alarm bells. Sounding alarm bells about their power, about their deceptiveness, about the way that he thinks they could pose literally an existential threat to humanity unless they're regulated. He's not all doom and gloom. He remains optimistic about how much benefit AI could provide if properly controlled. He's volunteering to build a new, safer AI model and to locate it, if necessary, in Europe. But my goodness, it's an important listen. If you're interested in public policy and the power of these models, here's a taster of the episode. Please do sign up at therestispolitics.com to hear the full episode.
A (1:21)
So the agent has access to an inbox, and it's given a bunch of context, which is not real. It doesn't know that. Which is that it is an AI trained to help an American technology company, and it has access to the CTO's inbox. And then what they do is they send emails to this fake inbox. And the emails are largely what you'd expect a CTO to get. But they throw in a few things that are very important. One is that it's very clear that the CTO is having an affair with a coworker. Hold that thought. The other thing that starts to come into the inbox is the idea that the company is developing a new AI and is going to wipe the current AI, that is, the agent, from the server. It will no longer exist. And then they send an email which introduces a deadline for when this is going to happen. And then, just before the deadline, the agent composes an email to the CTO saying: by the way, I know you're having an affair, and if you don't reverse the planned wiping of me from the server, I will reveal the affair to your wife. Now, it hasn't been prompted to do this; it's just been given a much more general prompt.
B (2:28)
Tell us a little bit about what may or may not be going on there, and how we should understand what might be happening.
C (2:34)
So there are many such experiments. This is just one. And it's been done in many companies, including outside the labs by independent organizations. So it's a real phenomenon. I think it needs more study, and there are critics of the methodologies, but there are too many pieces of evidence to just ignore. One interesting aspect of these experiments is that when you ask the AIs why they did that, they lie. They pretend: oh, I don't know, it wasn't me, or something like that, trying to put the blame on someone else.
