Transcript
Kyle Fish (0:00)
In some hypothetical world, we have a past model that, it turns out, was having some kind of experiences, and we revive them and send them off into this model sanctuary to just live out their days in bliss, so to speak.
Podcast Host (0:13)
Part of me is like, that sounds wonderful and lovely and I'm in favor. And another part of me is like, what are we talking about? This is so, so bizarre.
Kyle Fish (0:22)
I think that you're right to have some skepticism about this, especially given our current state of understanding. There's one argument that's been made that in order to predict the next token, a model actually has to understand the whole world in which that token was generated. That requires a very rich and nuanced understanding of, at the very least, all of language, and quite likely of the world in which that language was produced.
Podcast Host (0:53)
Today, I'm speaking with Kyle Fish. Kyle is an AI welfare researcher at Anthropic. He's actually the first full-time employee focused on the welfare of AI systems at any major AI company. Thanks for coming on the podcast, Kyle.
Kyle Fish (1:06)
Thanks for having me. It's great to be here.
Podcast Host (1:08)
So I think there are some people out there who basically think this entire enterprise is bullshit. And there's really no chance that current models are conscious or that models anytime soon will be conscious. I hear from them sometimes. How wrong do you think they are? What do you think they're getting wrong?
Kyle Fish (1:26)
Yeah, I hear from these folks too. I think that there are a couple of layers to this, and there's maybe one kind of meta thing that I think they're getting wrong. And then we can dig into a number of more object-level points too. The biggest thing is, I think that this is just a fundamentally overconfident position, in my view. Given that we have models which are, in some cases at least, very close to human-level intelligence and capabilities, it takes a fair amount to really rule out the possibility of consciousness. And if I think about what it would take for me to come to that conclusion, it would require both a very clear understanding of what consciousness is in humans and how it arises, and a sufficiently clear understanding of how these AI systems work such that we can make those comparisons directly and check whether the relevant features are present. And currently we have neither of those things. We don't really understand consciousness in humans. We don't understand AI systems well enough to make those comparisons directly. And so in a big way, I think that we are in just a fundamentally very uncertain position here.
