A
If we care about the future going well, there are two different ways in which you can make that future go well. One is that you can prevent an existential catastrophe, or you can try and make the future better, given that no existential catastrophe occurs. Historically, most of the focus from people concerned about the long-term future, at least since Nick Bostrom's early work on this, has been about preventing existential catastrophe. I do expect that over time almost all beings that exist will not be biological, they will be artificial, because it's very easy to replicate artificial intelligences. If you end up with an authoritarian country getting to superintelligence, probably that means you get authoritarianism forever, and probably that means you lose out on almost everything of value. There should be more government uptake of AI, faster, because I am worried about a world where everything is moving 10, 100 times as fast, private companies are extremely empowered, and the government is just left behind. It's just watching. It's not able to regulate.
B
Welcome to the Future of Life Institute podcast. My name is Gus Docker and I'm here with William MacAskill, a senior research fellow at Forethought. Will, welcome to the podcast.
A
Thanks for having me on.
B
Fantastic. You're also the author of a series of essays called Better Futures, which is the main thing we're talking about today. So yeah, tell us about that series. What are you trying to do here? What's the key takeaway?
A
Sure. So these are a set of ideas that I've been thinking about for many years now, and I'm happy I finally got to write them up properly. But the basic thought is that if we care about the future going well, there are two different ways in which you can make that future go well. One is that you can prevent an existential catastrophe. So that's human extinction or something comparably bad, something that makes the future very close to zero value. Or you can try and make the future better, given that no existential catastrophe occurs. And historically, most of the focus from people concerned about the long-term future, at least since Nick Bostrom's early work on this, has been about preventing existential catastrophe. So Nick Bostrom even says, you know, follow a maxipok principle: maximise the probability of an okay outcome, where an okay outcome means no existential catastrophe. And what I'm arguing in this series is that better futures, namely trying to make the future better conditional on there being no catastrophe, is in at least the same ballpark of priority as reducing existential catastrophe itself.
B
Yeah, this will sound counterintuitive to some listeners of this show, where it seems like AI is racing ahead. We might get to very advanced AI quite soon, perhaps even within the next five years. And we don't have a solution to the alignment problem. We don't have a way to control these advanced systems. So why should we split our resources in the way that you might be proposing we should?
A
Sure. So the framework for the argument I make is based on the scale, neglectedness, tractability framework. I don't think it's a perfect framework, but it's useful for organizing things, and I go through each of these in turn. So we can start off with the scale side of things, where you'll notice that the expected value of the future is given by the product of the probability of avoiding existential catastrophe and the value of the future given that we avoid existential catastrophe. And that means that the comparison of these two can be a little bit unintuitive before you really work it through. But the core argument is that if we're closer to the ceiling in terms of safety, of non-catastrophe, than we are to the ceiling of how well things could go given that we survive, then there's actually just a lot more at stake from the latter. So in the paper I give a suggestion: okay, suppose we think catastrophe is 20% likely, but that the future we expect to get is 10% as good as what I call a best feasible future, a future where we just really nail it. Then the scale, the amount at stake from ensuring the future goes really well given that there's no existential catastrophe, is 36 times higher when you work through the maths of this. So that's the first part: I think the scale is even greater again, and I agree that's unintuitive. I dedicate two whole long technical essays to this. Then there's neglectedness and tractability. On the neglectedness side, it's certainly the case that the cause areas I'm going to focus on are more neglected by longtermists, people who are concerned about the long-term future, who have tended to just focus on existential catastrophe. I also think we should expect these to be more neglected by the world at large too, for the simple reason that people really don't want to die or get extremely disempowered. And in fact there's enormous latent willingness to pay from people, in the trillions or quadrillions of dollars, that people would be willing to spend to not have a significant risk of human extinction, even just from, say, the US population. So there's a lot of latent desire to act on that. Whereas how much do people today care about how governance of outer space goes? Probably just not very much at all. And so we should maybe expect less in the way of organic social action on some of these other issues. And then the final aspect is tractability, where I admit that lots of the areas I'm talking about are potentially lower in tractability. It's at least unclear, because what I'm focusing on is a number of issues like governance of outer space, really deep space, AI character, what rights do we give to digital beings? These are things that are all quite pre-paradigmatic at the moment. They're more unclear. It's quite an open question how much progress we can make on them. But I'm becoming more optimistic over time.
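[Editor's note: for listeners who want the arithmetic behind that figure of 36, here is a minimal reconstruction. The 20% and 10% inputs are the ones Will states; normalising the best feasible future to a value of 1 is an assumption for illustration.]

```latex
\[
\begin{aligned}
\mathbb{E}[\text{value}] &= \Pr(\text{no catastrophe}) \times \mathbb{E}[\text{value} \mid \text{no catastrophe}] \\
\text{status quo:}\quad & 0.8 \times 0.1 = 0.08 \\
\text{eliminate all catastrophe risk:}\quad & 1.0 \times 0.1 = 0.10 \quad (\text{gain } 0.02) \\
\text{achieve the best feasible future:}\quad & 0.8 \times 1.0 = 0.80 \quad (\text{gain } 0.72) \\
\text{ratio of gains:}\quad & 0.72 / 0.02 = 36
\end{aligned}
\]
```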
B
When you say this is relatively neglected, what do you mean by that? Because in some sense the entire world is organized around improving the future and improving the lives of current beings. And so what is it exactly that's neglected here?
A
So I think the whole world is organized to some degree around improving the lives of current beings.
B
Yeah, that's a better way of saying it, I agree.
A
Yeah. Although I think we do a pretty crappy job of that, to be honest. Certainly we're only really talking about improving the lives of human beings, certainly not non-human animals. And even there, it just amazes me, to be honest, how badly we turn the post-industrial wealth that we have into improved human well-being. However, there's very little attention on improving the quality of the future, especially further out. I mean, people barely think past a few years out, let alone thinking centuries or millennia or even millions of years out. And that means, I think, it is of enormous moral importance over the long run how we govern space, what personality and ethical character the AIs that will be occupying most roles in society and most economic activity in the near future have, and what rights digital beings have. And if you look at how much attention has been paid to this, it's really almost nothing. You can be one of the people writing one of the first articles on this, if you choose.
B
Yeah, so that's the sense in which it's neglected. It does seem to me that many people at least say that they care about, say, the future of their countries and the institutions of the countries they're living in, and the lives that their children and grandchildren are going to live in those countries. Is ensuring the continued functioning of institutions a part of it?
A
Well, I think there's two aspects. So one is just capitalizing on at least the concern for future generations that people claim to have. Although when you do surveys, in practice people's revealed preferences suggest it's not a very large concern. I mean, this has been studied more in the context of climate change, where people are willing to sacrifice a little bit of money, maybe a hundred dollars or a few hundred dollars per year, to prevent the problem of climate change, but not very large amounts of money. When it comes to the continued existence of institutions, I mean, if anything, I'm a little bit worried about the opposite: that we have these institutions that were created, they vary, but in the United States created in the 18th century, in the UK, where I am, created over a period of time in the Middle Ages and early modern era, and that these will just fail in potentially quite major ways when they are trying to grapple with the sort of society that we will encounter post-AGI. And actually a big worry is that we might be too wedded to those institutions, rather than being willing and able to create new ones that are better adapted.
B
Yeah. Why is it that what you call mostly great futures are difficult to achieve?
A
Great. So, yeah, I introduce a few technical terms. There's the idea of the best feasible future, if things go really, really well, and a mostly great future is one that achieves at least 50% of the value of that best feasible future. And I give two different arguments in this main essay, called No Easy Utopia, for thinking that that's hard to achieve. The first one is a bit more commonsensical. It's based on the idea of moral error or moral catastrophe, on the idea that you could have a society that's really quite utopian in general but makes even just one major moral mistake and thereby loses out on most of the value it could have had. We can see this in historical depictions of utopia, in fiction or elsewhere, where Thomas More's Utopia, for example, has lots of the sorts of desirable properties you might think of, and everyone's rich and has amazing abundance, and every household owns two slaves. Isn't that great? So that depiction of utopia built in the prejudice of the time. Similarly, many other depictions of utopia are maybe very good in many ways, but are totalitarian, or there's harmful eugenics, or other sorts of negative aspects that I think, when you reflect on them, actually mean you can lose quite a lot of the goodness of a society. I think this is also true for current society. Look at the world today: I think the world for human beings has got enormously better over the last few centuries. However, those gains have been mostly or even wholly undone by the massive increase in suffering that we've inflicted on animals in factory farms, where for every human alive today, 10 land animals are killed in factory farms, often living really intensely suffering lives. And it's obviously a hard question how you weigh up those two. But I think quite plausibly that means we're far from a world which is as good as it could be, just given the level of technological ability and material wealth we have today.
B
Yeah. And so you could imagine that we are stumbling into a similar situation with AIs, where maybe it feels like something to be an AI model, maybe going through the training process is deeply uncomfortable. And maybe we're about to have billions or even trillions of copies of these models. And if they feel something, if they're conscious, if they can suffer, we could be stumbling into a moral catastrophe of a similar magnitude.
A
Absolutely. So I do expect that over time, almost all beings that exist will not be biological, they will be artificial, because it's very easy to replicate artificial intelligences. And they'll be very useful. And then it just really matters what the nature of their lives is, how good those lives are. And it matters in ways that are more subtle than you might think, too. One is that, okay, they may be suffering, so it might be the equivalent of factory farming but en masse. It could also be that they just have much worse lives than they could have had. Most people alive today, I think, have lives that are positive, better than not existing, but still much worse than they could have been. And that could be true for AIs too. It's also the case that there are what philosophers would call non-welfarist considerations. So suppose the system is such that, like today, AIs are owned by humans and do work for humans and so on. Even supposing they have really great lives, nonetheless you might think that's just wrong: if that being has moral status, it's intrinsically wrong for it to be owned by someone else, and that's a bad society. There are also subtle issues around population ethics, the part of moral theory, notoriously difficult and filled with paradox, about how good or bad populations of different sizes are. So it's very hard to depict a future where we're confident it's good, because, for example, maybe you have this very large population of digital beings and they have really good lives, but they're very short, so their overall lifetime well-being is very low. Well, on some views, in fact many views of population ethics, that would be an active catastrophe. And in the essay I go through even more really hairy moral decisions that we will have to make, where it's just completely non-obvious that we'll get the right answer by default. And if we get it wrong, then it's quite likely that the future ends up far worse than it could otherwise be.
B
Yeah, there's an argument to be made that we have these institutions that I mentioned before, we have social systems like democracy and capitalism, and these systems have achieved something that's pretty great for humans, at least in developed countries, where you can look at many graphs: GDP per capita, longevity, level of education. All of these things have kind of exploded in the last 250 years. These are also systems that can self-correct. And so we have perhaps an ongoing moral catastrophe in the form of factory farming, but we've discovered that we have that catastrophe, and we may be able to solve it by, say, creating artificial meat. Is there an argument that the main job of getting to a good place here, getting to a utopia, is maintaining these institutions? And in that sense it might actually be easy. We are not required to foresee any radical changes; we are required to try to preserve the systems that can adjust to various societal states and have adjusted to massive amounts of change over the last 250 years.
A
Yeah, so I partly agree, insofar as I think we're very lucky to have the amount of liberal democracy in the world today that we do have. And the really big pluses about liberal democracy are that power is distributed at least somewhat equally, much more equally than it would be in an authoritarian state, whereas most states throughout history have been autocratic, and there's mechanisms like free speech and so on that allow for diffusion of ideas and debate. There's also trade, which allows people with different views to maybe come to compromises.
B
Yeah, I think we should get into that later.
A
We'll get into that later. Yeah. And the question is, okay, how does that go? That's a lot better than an authoritarian country, and I think one of the big better futures challenges, one of the big things to focus on, is ensuring that we do get something more egalitarian than really intense concentration of power. AI, I think, involves many mechanisms which enable intense concentration of power, such that I think this is really quite a major risk, where AI enables a single person at the top of the political hierarchy to just control the entire workforce, the entire military, because in principle all AIs could be loyal to that person. That's a pretty scary thought.
B
And for listeners who are interested in that, they might scroll back in the feed and listen to my episode with Tom Davidson.
A
Exactly. Yeah. So Tom is a colleague of mine at Forethought, and we work together on this issue of reducing the risk of human power grabs, of concentration of power. There are other things, though, where it's again really not clear that liberal democracy alone is sufficient. So I'm worried that, for example, principles of free speech, which are very good and very important at the moment, might significantly backfire in a post-AGI world. It's unclear to me how this shakes out, but it's possible at least that AI will give you the ability to have extraordinarily powerful targeted persuasion or manipulation. At the moment, free speech enables this debate of ideas, which is extremely imperfect, as I'm sure any listener of this knows, but at least at its best, truth kind of wins out over time. My hope, and maybe my best guess, is that AI improves this, but it's not at all guaranteed. Instead, you could have a free speech world where anyone with a particular view can really just turn resources into AI-powered propaganda. And that just means that the views you end up with aren't the views that are best supported by the arguments or most likely to be true. They're the views that the rich and powerful wanted other people to believe, or just the views that are most memetically powerful intrinsically, or most susceptible to mass propaganda. Yeah, that's really hairy. That's one of the things I'm worried about where I don't have any good solutions yet. Maybe I can think about it more.
B
So the upside here would be for AIs to actually help us think better, to, say, listen in to a conversation we're having now and then jump in and say, oh, that figure was actually wrong, or here's why your argument is invalid, or something like that. And the nightmare scenario is something like current social media, but then supercharged with AI, where we are just fed perfectly calibrated arguments and you can constantly, endlessly A/B test what works for a specific person.
A
And I think that this might be quite contingent on the decisions that companies and governments make about how AI is designed and what uses are permitted. We're already seeing this a bit with model character when it comes to sycophancy. Okay, now lots is changing in the world given AI, and I'm trying to figure out my political beliefs and so on, and I'm talking to my AI advisor; we're thinking about this a few years down the line. Is the AI advisor pushing back? Is it encouraging me to be a more enlightened, reflective version of myself? Or is it just saying: another great insight! Yeah, that person who disagreed with you is just an idiot. You're so smart. I really think it could go either way on this front. And obviously I prefer the world where AI is helping us become more enlightened rather than just reinforcing our own existing prejudices.
B
I mean, the question is whether we are mentally strong enough, whether our egos can handle being corrected and getting input from an entity that might seem much smarter than we are and is constantly jumping in with good suggestions and corrections to our thought. Like, there's a reason why sycophancy became a thing. There's a reason why OpenAI had to reintroduce one of their older models: there's actually demand for being praised, for being told that you're smart and that your ideas are good, and perhaps even that the ideas of the other side are bad. Yeah.
A
Yeah. Around that sycophancy period, in particular when OpenAI introduced GPT-5 and prevented users from using GPT-4o, I was just very interested in the ChatGPT subreddit, because there was this huge uproar. A lot of people had formed very close personal relationships, essentially, with 4o. My optimistic take is that you could get the things that people really wanted without sycophancy in the worrying sense, where I thought that most of the concern from the people who felt like they'd lost a friend when 4o was deprecated was more about having someone to talk to, having a sympathetic ear, having a kind of confidant, having someone who you feel has got your back. And I think, with good design, you could separate that out from someone who's also willing to challenge you and not merely reinforce your pre-existing political beliefs and so on. That's a hypothesis. I don't know. But yeah, that could be the optimistic framing. Either way, it's certainly the case that there is some sort of market demand for AI that is very sycophantic indeed.
B
It's my intuition that there could also be enormous upside here, where we could imagine having these personalized models that are perfectly pushing us to be better versions of ourselves, that are encouraging us at exactly the right moment to look into something, not pushing us so far that we give up, and so on. I think we might be leaving a lot on the table here.
A
Yeah, yeah. I mean, already now you could have absolutely personalized tuition. So if you want to learn about anything, it's always exactly the optimal level for improving your understanding, and you can get that at any moment in time, when it's exactly most relevant. In terms of reflective processes as well, it can help guide you through very thorny ethical or political issues with whatever framing works for you. It won't be like the annoying person on Twitter that you're arguing with. Instead it's really the best version of some argument being presented to you in a really good light, but then also giving you the counter-perspective. I agree, there's just truly enormous upside here. And again, whether this is something that society, that companies, that startup founders will particularly push on to make happen, more than the AI persuasion side, we are yet to see.
B
So yeah, if that depends only on market incentives, we might not get to the best place. So it's a question of whether we can push the right regulatory buttons or cultural buttons or something to steer this process. Do you think intervention like that is possible, or do you think the market incentives are just so strong that we won't be able to steer these systems?
A
I think it's possible, but I do think it's tough. I mean, at the moment there are at most four or so leading AI companies, and given the rate of progress, I actually expect that number to go down rather than up, because the training runs will get bigger and bigger. You're talking about harnessing 10 gigawatts of power; maybe only one or two companies can manage that. That means we're in a circumstance that is not so much like an efficient market in equilibrium. Firstly, it's more like an oligopoly, so there's just more scope for actors to behave in good or bad ways that are not necessarily dictated by market pressure. And then secondly, things are changing so quickly that it's not obvious what the market-optimal thing is. So, sad though it is to say it, I'm a bit worried that we're living through a golden age of LLMs, where it's still a very new technology. How should the LLMs interact with you? Well, if you don't know what the market forces are, you might anchor on saying, well, okay, it'll be truthful and consider the arguments and so on. The LLMs are surprisingly close to that at the moment. But then over time you start to realise, okay, actually it's more sycophantic or politically biased AIs that people in fact want, and that's what you get over time. In the same way as with social media, I think there was this slow decline towards things that are more politically partisan, or, in the case of video content and so on, it seems like what people want is less-than-one-minute dopamine hits, and so that's what all of the sites are converging on. But that's very non-obvious if you're right at the start, before you've hit market equilibrium.
B
Getting back to the essay series for a bit, here you write about a common sense utopia, which is a situation in which we have freedom, we have abundance, we have happiness, but that utopia still falls short of the best or near best world that we could achieve. Why would that be the case?
A
Okay, yeah. So I bring this up because some of the things I'm arguing for are unintuitive even to myself, or at least I wanted this as a test: is this view defensible, and under what conditions? So, yeah, common sense utopia. It's like humanity spread out across the solar system, maybe there's a trillion people now living truly wonderful lives of freedom and so on. Isn't that about as good as we could get? And there's two arguments for thinking no. One is this moral error idea, where common sense utopia is still somewhat under-specified, and maybe if I just add in one detail, depending on your moral view, you might think it's actually quite dystopian. So maybe all those things are true, but there's a strict racial hierarchy that's enforced. Or, looking at different moral perspectives, people who are pro-life might really think the number of abortions that happen every year is actually one of the dominant considerations. Or if you're religious, it's like, okay, well, do these people follow the correct religion or the wrong religion? So there's all sorts of moral errors that, once you start to specify that sort of common sense utopia in finer detail, make you think: oh, okay, no, in that dimension it actually starts to look more dystopian. But the second thought is that on really a lot of different views of population ethics, that is, views on how many people there should be, scale is important as well as quality. So if this common sense utopia were limited just to our solar system, that would actually be an enormous loss, because civilization could instead have spread out across many star systems, across galaxies, and whatever is good, which I'm not claiming we know, there could have been more of. And that is something that we dig into a bunch more. And it ends up being quite hard to have a systematic ethical view on which the common sense utopia ends up as something that really is 90% or 99% as good as it could be, once scale matters.
B
Is it the case that when we have many more people, they're living slightly worse lives? So is this kind of pushing us towards a repugnant conclusion argument?
A
So the answer is no, not necessarily. I mean, I actually think a civilization that was bigger than just the solar system could have beings, or people, that were even better off. So you'd have higher well-being and more such people. I'm trying to be quite agnostic with respect to population ethics, and it's not at all the case that in order to think that scale is important, you thereby endorse the repugnant conclusion. So, briefly explaining what the repugnant conclusion is: suppose you start off with a population of a trillion people that are extremely well off, and then you say, okay, well, I could make that 100 trillion people and make them just a tiny bit worse off. Well, isn't that better? And you might think yes. I can make that argument even stronger: just take it a little bit at a time and keep doing that. So you keep making the population 100 times bigger and everyone just a tiny, tiny amount worse off on average. Keep doing that enough and you end up with this enormous quantity of people living lives that are just barely above zero. And Derek Parfit, my old mentor at Oxford, called that the repugnant conclusion: the idea that you could have an extremely large population of people with lives just barely worth living, and that that could be better than a trillion people living truly wonderful lives of bliss. He thought that was repugnant. But there are many views that would say, look, it's a kind of balancing act, and somewhere along this chain of making the population bigger but people slightly worse off, even a small reduction in well-being is not worth it, even to make the population much bigger. And so, for example, a critical level view would say that: on that view, it's only good to bring someone into existence if their life would be sufficiently good, not merely a little bit above zero, it has to be really good, and below that, in fact, you make the world worse. So that would be one view where scale matters, but the repugnant conclusion doesn't follow.
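[Editor's note: to make the contrast concrete, here is a minimal formalisation; the notation is illustrative rather than taken from the essays. On a total view the value of a population is the sum of individual well-being levels, while a critical-level view only counts the well-being of each life above a threshold c > 0.]

```latex
\[
V_{\text{total}} = \sum_i w_i,
\qquad
V_{\text{critical}} = \sum_i \left( w_i - c \right).
\]
```

On the critical-level view, multiplying the population a hundredfold while nudging average well-being below c makes the world worse, so the chain of steps leading to the repugnant conclusion is blocked, even though scale still matters for lives above the threshold.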
B
And the problem there is that that view will have its own counterintuitive problems, and this is a whole rabbit hole that's difficult to resolve. Do you think there's some wisdom in clinging to this intuition that the common sense utopia is something that might be worth aiming for, even if we can't make it work philosophically? In some sense we might be preserving option value by having some continuity between what our forefathers thought was a good society, what we think is a good society, and what perhaps someone in the future might think is a good society. And even though we can't make the common sense view philosophically rigorous, is there some wisdom in clinging to it?
A
Anyway, yeah. So I was assessing the case where this common sense utopia is the final destination: we've got that population, it just stays in the solar system, and we die off in a billion years. Is that the best possible future? And I'm arguing we have to say no. But there's a different view you could have, which is that maybe this is a way station. So I have this concept that I call viatopia, which is what I think we should be aiming at at the moment, where we don't know what the ultimate, best possible future looks like. Again, there's this long track record of mistakes when trying to depict utopia, but that doesn't mean we can't say anything about where we should be aiming. And so instead we could think, okay, well, we want to aim for something which keeps all of our options open. Maybe catastrophic risk is quite low, people are incentivized and encouraged to reflect on their values and so on, and then society as a whole is also able to deliberate well and make good collective decisions. And so it might be that when common sense utopia seems very appealing, it's that it seems appealing as a way station.
B
Yeah.
A
Rather than as the ultimate destination.
B
Yeah. On the viatopia idea, I guess one concern there is that we are treating humanity as if it's acting as a whole, as if it's acting as a single entity. Whereas I think a more realistic depiction of what's happening now is that you have countries pursuing different aims, and groups within those countries pursuing different aims. Some people want to race ahead to superintelligence as fast as possible. Some people want to live a traditional 17th-century life. And there's no overall coherent moral framework for what humanity is doing. So does getting to viatopia require us to achieve some sort of convergence?
A
Well, yeah, so there's questions about convergence and about coordination, where it's possible that there'll need to be a fair amount of coordination in order to not just have everyone racing to the stars to grab resources, or not to have one small group just gain power over all others, or just to prevent extreme catastrophe. That's coordination. On convergence, though: there's kind of two possible ways you could think that we get to this really good future. One is you think that, as a whole, society will ultimately converge on the best moral views, or at least on the same moral views, on these kind of enlightened views. And you might point to moral progress we've made over the last few centuries: abolition, civil rights, liberalism, feminism and so on. It's like, look, we're doing so well, just let this process play out for some amount of time and we'll all get there. That's one view. I'm quite sceptical that that will happen. And in this next essay, Convergence and Compromise, I talk about various reasons why you might think we get convergence or why you might not, and also how that relates to different metaethics. And, yeah, unfortunately I end up feeling fairly pessimistic about the idea of sufficiently close moral convergence, such that I don't think that's something we can really bank on as a way of getting to a truly great future.
B
Does that rely on moral realism being true at all? Is it the case that if moral realism is true, then it's much more likely that we will converge on it?
A
So I think it is true that if moral realism is true, understanding that as there being some objective fact of the matter about what's good and bad beyond merely what people want, then it's more likely that we get convergence. But it's still very far from guaranteed. And that's for two reasons. One is that people might learn what the moral truth is, as it were, and just not care. So they might be like, oh, okay, XYZ are the best things, but that's just not what I want. Instead I'm self-interested, perhaps I want what's best for myself, or they're in the grip of some alternative ideology, and therefore what is ethically correct just doesn't have motivational force for them. The alternative is that people might even prevent themselves from learning what is morally correct. I think there are some psychology experiments where they give people a moral dilemma, where I think one of the options would involve them losing out a bit, having to sacrifice a bit, and they ask people: well, do you want to learn more information, get some more arguments about this? And they say no; in fact they're willing to spend money to not hear the argument. I'll caveat: replication crisis and so on, I've not vetted that. But as an illustration, that could be how it goes in the future too. You might think, well, if I hear all these arguments, maybe I'm going to change, I'll become a different person than the person I now want to be. And so I'm actually going to block myself, constrain my epistemic environment, so that I don't learn about this.
B
There's actually something, I think, very deep here about changing preferences and how we evaluate states of the world when preferences are changing. So you mentioned the idea of moral progress. That of course depends on some idea of what it means to be moral, where from the perspective of an Iron Age society, perhaps today's world is just horrendously bad, because look at all of the things that are happening that are against their religion or against their social order. And so it's difficult to think about whether we're making progress, given that our preferences are changing. And this also relates to ideas of AI alignment: whether an AI model should allow itself to be changed, how it should think about its preferences being changed, and whether it should allow itself to be superseded by a version that does not perfectly share its preferences. Yeah, I'm convinced we need a paper on this, or perhaps a whole research program. But let me know if you agree.
A
I mean, I agree. It's a hugely hairy issue, even just from a theoretical perspective, defining what counts as moral reflection and progress versus value drift or moral regress. So, for the people listening: imagine you learn that in 40 years' time you've become a person with very different political and moral views than you have now. How do you feel about that? Should you think, oh, cool, wow, I've learned, I've changed? Or do you think, no, my future self has gone wrong? The classic thing is obviously becoming more conservative over time: actually, maybe I got biased. It's very hard to specify. And I think that really bites for us, because we think there's been moral progress, certainly over the last few centuries, but maybe even over longer than that. However, if you went back a few centuries and asked, has there been moral progress, I think the people then would say no, because now a very large fraction of the world are atheist, there's all sorts of loose morals, people take drugs and they have premarital sex, they blaspheme and so on, who knows whatever else, there's a big lack of honour. Now apply that to ourselves. The thing that's different about us is that we in our generation might have an unparalleled opportunity, or curse, to lock in our own values. We can talk about why, but I think AI gives, or will give, the current generation that ability. And here's two perspectives you might have. One is: well, there's been moral progress over time, so we should allow that process of moral progress to continue; we shouldn't lock in our values. An alternative perspective is: well, if we were the Iron Age people or the medieval people looking forward, then by our lights we should want to lock in our values. And I am on the side that we shouldn't be trying to do that, and that means we should actually let the process of moral progress continue, even if that ends up in a world that I find just very weird, perhaps even that I today would find repugnant. But I'm very worried there will be very strong incentives for people to try to lock in the values of the present day.
B
Yeah, yeah. And just for listeners: perhaps it's tempting to say that the people of the past were backwards and we know better now and we've made moral progress, but we might paint a picture of what the future could look like that seems too weird for us. Right? We could imagine that there are no individuals, that everyone is conglomerated into one entity, say, and that there is no freedom of thought as we think of it today; you are a part of a system. We can be as weird as we want here. The thing I'm trying to communicate is just that the future might be very weird in ways that could seem bad to us now. Like, the future is a highly efficient ant colony of former human beings.
A
Yeah, exactly. So maybe there are no humans at all, maybe not even biological systems, who knows?
B
And so there the temptation seems more reasonable, I think, when thinking about why we would want to lock in our values now.
A
And so when I'm thinking about what good collective decision-making mechanisms might look like, my best guess at the moment is having some sort of diversification across generations. So, to a first approximation, you might think control over resources, in particular resources in space, gives you a certain amount of control; it's kind of a proxy for how much influence you have over how things go. Perhaps you have tranches, so the present generation gets a certain fraction, and future generations get certain fractions too. And that means you can at least hedge a little bit against both the possibility of moral progress into the future and moral regress.
B
Yeah. Perhaps related to this, you have an upcoming paper discussing risk averse AI, where of course you tell me what the idea is. But the thought here is that we want to avoid a situation in which one system, say, or one group of systems can kind of impose their values on the whole of the future. So perhaps tell us a little about that.
A
Yeah. I'll maybe just talk about one thing as a little bridge between the two topics. In terms of my optimism about the future, I think a huge amount comes from potential gains from trade. So I express scepticism that people will converge on the right kind of moral view, but I have tentative optimism that, if things are well designed, most groups with different moral views could end up getting most of what they want, because in this post-AGI future abundance is just so great and people want somewhat different things. So perhaps the utilitarians just want loads of stuff, but they're happy to go for galaxies that are billions of years away. Environmentalists really care about the Earth itself and the biosphere being preserved. Well, both groups can get that, and there's tons of interesting stuff to say there, but that's a big cause of my optimism. The idea of risk-averse AIs is a little different, insofar as it's not primarily thinking about what a really good very long-run future would look like. It's more a proposal about how we should structure things between now and superintelligence, where that superintelligence is so powerful that even any previous AIs teamed with all of humanity combined are not a match for it. Okay, there's this big important window between where we are now and that, and I think we can make that window safer by, firstly, trying to have AIs that are risk averse with respect to resources and, secondly, making it possible to strike deals and agreements with AIs even if they're misaligned, where by misaligned I mean they have goals that are very different from the goals human beings tend to have. I have a paper on this with Elliott Thornley that's a work in progress. But take even just a paperclip-maximizing AI, so it has incredibly alien values, but it's highly risk averse in that it prefers a guarantee of 1,000 paperclips to a 10% chance of a million paperclips, or a billion paperclips. If so, then it might be in a scenario, and this would be true for early AGIs short of superintelligence, where it has an option to try to take over, but that's not guaranteed to succeed; in fact maybe it's only 50% likely or 10% likely to work. Well, if its options are just to work for the humans and get nothing, or to try to take over, then trying to take over is the thing that it'll do. That's the thing that's most in its interest. But if there's a third option, which is to cooperate with humans, maybe that means striking a deal where it reveals it's misaligned, perhaps dobs in other AIs that are misaligned and scheming, perhaps does alignment research and so on, and in exchange it gets payment that it can use to spend on paperclips. And as long as the deal can be arranged such that it's very likely that the deal gets honoured, then this risk-averse AI will choose to make the deal rather than try to take over. And I actually think that this explains in very large part why there are not more attempted coups among humans, why the rates of rebellion have gone down enormously over time, and why in rich liberal democratic countries they are way, way lower than they were before.
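[Editor's note: to make the decision structure concrete, here is a toy sketch in Python. The payoff numbers, the success probabilities, and the square-root utility function are illustrative assumptions, not figures from the paper.]

```python
import math

# Illustrative concave utility over paperclips: risk-averse, because each
# extra paperclip adds less utility than the one before.
def utility(paperclips: float) -> float:
    return math.sqrt(paperclips)

# Option 1: just work for the humans and get nothing.
ev_work = utility(0)                          # 0.0

# Option 2: attempt takeover -- a 10% chance of astronomical resources.
ev_takeover = 0.10 * utility(1_000_000_000)   # ~3,162

# Option 3: cooperate -- reveal misalignment, help with alignment research,
# and receive a near-guaranteed payment to spend on paperclips.
ev_deal = 0.95 * utility(100_000_000)         # 9,500

print(ev_work, ev_takeover, ev_deal)
```

Even though takeover here has the higher expected number of paperclips (100 million versus 95 million), the concave utility makes the reliable deal the best option, provided the AI believes the deal is likely to be honoured.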
B
Just because humans are very risk averse by nature.
A
Yeah, exactly.
B
We're not close to trying to kind of optimize utility in a straightforward way.
A
Well, yeah, exactly. So I mean, we are optimizing utility, but that utility is sublinear with respect to resources. And in fact AIs don't need to have human-ish risk preferences. You could in fact ensure that they have risk preferences that are just what you want: still very useful at the scales they're operating at, but acting in what is intuitively an extremely risk-averse way when the stakes get very big. In fact, this result is so strong that in the paper we sometimes need to choose the numbers so it doesn't seem farcical. You can really have AIs that are perfectly von Neumann-Morgenstern coherent and so on that prefer $400 for sure over a 50% chance of takeover, because they really care about that $400 and they don't really care about much more. And I think the numbers should actually be a little higher in practice; it's cheap and more robust to make the numbers higher. But why am I talking about all of this? I think of it as an extra layer of defence, an extra layer of safety. And potentially it also means that you can get the benefit of having an AI that does have some view of the good, which I think can be important in other areas, rather than an AI that's merely instruction-following, but that is nonetheless safe in this way.
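[Editor's note: a minimal sketch of how a bounded, sharply concave utility function produces that kind of preference. The functional form is the constant absolute risk aversion utility Will mentions a little later; the coefficient and dollar amounts are illustrative assumptions.]

```python
import math

# Constant absolute risk aversion: u(x) = (1 - e^(-a*x)) / a.
# Nearly linear for small x, but saturating at a ceiling of 1/a.
def cara_utility(dollars: float, a: float = 0.01) -> float:
    return (1.0 - math.exp(-a * dollars)) / a

sure_400 = cara_utility(400)           # ~98.2, close to the ceiling of 100
takeover = 0.5 * cara_utility(1e15)    # ~50.0, half the ceiling at best
print(sure_400, takeover)
```

The agent is a perfectly coherent expected-utility maximiser, yet it prefers a guaranteed $400 to a 50% shot at effectively unlimited resources, because almost all the utility it can ever obtain is already captured by that $400.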
B
So the picture of the future you're painting so far is that you are not particularly optimistic that the whole of humanity or the whole of society will converge on one moral truth, but that we might be able to do trades, human to human, and that we might be able to do the same kind of moral trades with AI systems, because we might be able to engineer them to share our level of risk aversion, or even more extreme versions of risk aversion. So, just staying on risk-averse AI for a bit: is this something that you think could plausibly be incorporated in the training stack as it looks now, or is this something that works in theory, but we don't know how to make it work in practice?
A
Yeah, so I'll caveat up front: I'm not an ML person, I'm a philosopher, so this is a theory argument. It does seem that current AIs, just from pre-training (obviously they're chatbots, so you're just going by what they say), express themselves in risk-averse ways; there have been some studies done on this. Then I do think we could start training AIs this way, and there are two ways I could see the training going. One is that any time you have an AI system that is controlling some amount of resources, you make it just a little bit risk averse. So in cases where it has to make decisions between, okay, making $1,000 for sure versus a 50-50 chance of $2,000 and a penny, it'll choose the $1,000, and you do that in such a way that it follows what's called constant absolute risk aversion, a certain form of utility function. Well, then you can have an AI that over the relevant range acts in a basically linear way with respect to money, just a little bit risk averse, but then over very large ranges acts in this very risk-averse-seeming way. That would maybe be more likely to generalize, because you'd be saying: any time you've got resources, this is just how you think about them, resources have diminishing marginal utility. It might have the cost, though, that maybe users want a different risk function for how the AI behaves in particular circumstances. There's an alternative, which is that you give the AI a bank account and let it make certain decisions, and you don't influence what its goals are, but you do train it such that when it's making decisions about its own spending, it does so in a risk-averse way. That's maybe less likely to generalize, but doesn't have the former problem. Obviously that's a high-level sketch; when you do things in ML, the devil's in the details, always. But it doesn't at all seem to me impossible in principle. In fact, the AI models already seem to be risk averse, humans are incredibly risk averse, and I think there's even economic pressure towards having AIs that are at least as risk averse as humans are, because humans are risk averse themselves. And we don't, we don't want our...
B
Trading AI just trying to maximize the amount of money in the bank. We want it to also have some sense of risk aversion so that we don't blow up our investment firm. Yeah. I think we didn't cover the topic of moral trade in depth, and I think we should talk a little bit more about that. One worry I have here is that some moral frameworks have at their center prohibitions on what anyone can do, or hold that some actions or events or states just can't be allowed to happen in the universe. And if you have that moral conviction, does it make it much more difficult for you to make moral trades? Say you are just absolutely against some certain event happening.
A
Okay. So yeah, I think we should distinguish between kinds of ethical views here. Non-consequentialist ethical views, or even absolutist views that say some actions are wrong no matter what the consequences, might not want to trade in certain things, because they might think certain trades are wrong. But it's not like those views hold that there's some state of the world that is just infinitely bad or anything; they're instead constraints on your own actions. Then there's a different view you could have, which is that perhaps some things are so bad that you just don't want them to exist: maybe some quantity of suffering, some extremity of torture, is just so bad that you don't want it to exist in the universe and nothing can outweigh that. Even there, though, I think trade would be possible. Because let's say I have that view, and you're a utilitarian, and you're building a society with lots of flourishing lives and so on. I could say: well, look, I want you to modify the society that you would otherwise make in order to reduce even further the probability that you create this extremely bad torture, and in exchange I'll give you more resources, so you can have an even bigger society, even more flourishing, as long as you do this thing. So I think trade would be possible in that case; the trade would just be limited. If I had some sort of moral prohibition against trade itself, which maybe some views would, though I think that's at least a minority view, it still doesn't scupper the whole thing. It just means those views get less of what they want, because they're not able to engage in trade.
B
Do you think future AIs that are more advanced than current AIs will be more inclined to engage in moral trade? The whole concept of moral trade seems perhaps a little weird to humans; it doesn't seem like morality is something that's compatible with trading. But if you're an AI, you might have better insight into the state of another AI, you might be more intelligent, you might be more rational. Do you think a world where we have more AIs playing a larger role is a world in which moral trade is more feasible?
A
I think it is more feasible, and more likely, and I think there are a few reasons for that. So why does moral trade not happen at the moment? Let's take an example: maybe I would like you to be vegetarian and you would like me to recycle, and so we make this deal where you become vegetarian and I start to recycle. Why does that sort of thing not happen at the moment? One big reason is just that I don't know whether you would have become vegetarian anyway, and so I don't know if I'm getting a good deal. I think a second reason is that, honestly, people in the world today don't have very strong impartial moral preferences. Maybe I care about myself being vegetarian and not eating meat and so on, but the kind of consequentialist type who thinks it's just as important that you become vegetarian as that I do is relatively unusual. And then the third thing is what you were pointing at: sacred trade-offs. Picking another case, perhaps we're two politicians or something, and I say, okay, well, I'll support the pro-life position if you support the anti-climate-change position, as in the pro-clean-energy kind of position. I think a lot of people, even if that's in some sense better by both of our lights, if I had the typical Democratic positions and you the typical Republican positions, would nonetheless just say no, I'm just not willing to go there.
B
Yeah, or you might try to keep it secret.
A
Maybe you try and keep it secret.
B
If it's politically beneficial but just seems unacceptable to the public, you might not say aloud that you've engaged in such a trade.
A
Yeah, but I think all of these are quite reasonably likely to change. So on how likely someone would have been to act anyway: well, I think AI will just have a lot better information, including potentially information about how people would behave otherwise. And then secondly, I do expect these impartial moral preferences to become a bigger and bigger feature of the world. This isn't a decisive argument, but lots of the things that we care about most at the moment, the basics like being healthy, being relatively free, being relatively happy, being well fed and so on, will just get completely taken care of given post-AGI abundance. And so, at least for many people, what will they want to spend all these additional resources on? In some cases it might be self-interested but just at very large scale: oh cool, I'll make this galactic-scale art sculpture dedicated to myself or something. But in many cases I think it might be ideological: oh well, I just really care that there's environmental preservation of certain star systems or something, and other people might say no, I really want it to be devoted to good stuff, and so on. And that would mean these impartial moral preferences become stronger, and once they're stronger and more action-guiding, I think that makes it at least more likely that people will actually trade to get more of what they want. But it's quite plausible, and potentially could be very bad, that society as a whole just says: no, this is sacred, you can't trade on certain things. That could be a big loss of value.
B
From my point of view, the amount, if you can talk in that way, of moral reflection in the world seems to have increased since the industrial revolution, where now, at least in the developed world, some of our needs are covered, we have leisure time, and our priorities do change. So I think that would be supportive of the view of the future you're imagining here, where as society gets enormously rich, as AIs perhaps have the resources they need, they also have the time and the...
A
Yeah, exactly.
B
The resources to engage in moral reflection.
A
Yeah, exactly. And also, I think it's not a coincidence that the rise of Enlightenment thought and the scientific method has led us to be more reflective about ethics too. It seems like a bit of a transfer: we've realized a lot of our starting views when it comes to scientific matters were just way off base, so, well, okay, maybe the same is true for ethics too.
B
How do you think a super intelligent entity might engage in moral reflection? Is there anything we can say from our point of view now that's even useful here?
A
Well, I have a proposal for how we should design a superintelligence to engage in moral reflection, and I should say the idea originally comes from Carl Shulman. You design many different superintelligences and you give them certain epistemic constitutions, all different, so certain ways that they have to reason. And then you test them against all sorts of verifiable matters, so forecasts and proofs and anything else that's verifiable, and you empirically see which of these epistemic constitutions perform the best. And then you take that AI, or maybe a small subset of the very best ones, and you start asking them questions about ethics, so they're using the same pattern of reasoning that was shown to be the most effective when it came to verifiable matters, applying that pattern of reasoning to philosophical and ethical issues too. And of course they can take in, as data, as evidence, human beliefs and human moral attitudes and so on.
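[Editor's note: a minimal sketch of that two-stage procedure, with hypothetical names (Constitution, score, select_and_consult) invented purely for illustration; nothing here corresponds to a real system.]

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

@dataclass
class Constitution:
    """An 'epistemic constitution': a fixed way of reasoning a model must follow."""
    name: str
    reason: Callable[[str], str]  # how a model guided by this constitution answers a question

def score(constitution: Constitution, benchmarks: List[Tuple[str, str]]) -> float:
    """Fraction of verifiable questions (forecasts, proofs, etc.) answered correctly."""
    correct = sum(constitution.reason(q) == answer for q, answer in benchmarks)
    return correct / len(benchmarks)

def select_and_consult(constitutions: List[Constitution],
                       benchmarks: List[Tuple[str, str]],
                       ethics_questions: List[str],
                       top_k: int = 3) -> Dict[str, List[str]]:
    # Stage 1: rank epistemic constitutions by performance on verifiable matters.
    ranked = sorted(constitutions, key=lambda c: score(c, benchmarks), reverse=True)
    best = ranked[:top_k]
    # Stage 2: only the best-performing reasoners are asked the non-verifiable questions.
    return {c.name: [c.reason(q) for q in ethics_questions] for c in best}
```

The design choice being illustrated is simply that the ethics questions never influence the selection; the constitutions earn trust solely on questions whose answers can be checked.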
B
And so you're hoping that the mind that is good at predicting empirical matters is one and the same mind that's also good at predicting how values will evolve, or how values should evolve.
A
Yeah, that's right. And it's an interesting and hairy question, where, imagine you've got this superintelligence, and every time it makes some prediction that you think could not possibly happen, it turns out to be right, or it makes some scientific argument or discovery and it turns out to be right, or it argues that some mathematical theorem is true and, again, it turns out to be right. And then you apply it to ethics, and it starts saying, oh, yeah, actually it's really helium, that's the thing that's good, we just need more helium. And it's like, oh, can you explain why? And it says, I could, but it would take millions of years for you to understand. There's a very gnarly question there about what you do in that circumstance. Obviously I don't think the AI would come back and say helium. I do tend to be on the more sympathetic end, so I tend more towards saying, well, the AI is just better at reasoning than we are. That said, the outcome I really want us to get to is one where the AI would be able to walk us through the arguments, so that we could actually endorse the process and endorse the principles on reflection, rather than just having to take a leap of faith and defer on the outcome.
B
Do you think, and this might be a bit of a tangent, but do you think humans are sufficiently general in our intelligence to be able to follow such arguments? Say we have a superintelligent entity think for the equivalent of 10,000 years and come back to us with a very counterintuitive moral claim. Is it the case that we can understand its reasoning? Here we're supposing that it's also superintelligent at explaining its own reasoning. But are we limited? Are we cognitively closed to some explanations?
A
Yeah, such a great question, and I don't know the answer, but my tentative guess would be that yes, at least with time, we'd be able to pick up a lot. Ethics would have to be really quite esoteric, really quite weird, for us not to be able to do that, at least at the level where it can give us the gist. As an analogy, and this is not an ethical question, suppose the AI comes back with this code base of 10 trillion lines in a language you don't even understand, but it could explain, well, this is how it works, this is broadly what it'll do in this circumstance and in that circumstance. At that level of explanation, I would expect that human beings could understand.
B
Yeah, yeah. There really are kind of levels of explanations. So, I mean, we explain things differently to a child than to an adult, and experts talk to each other at a different level than when they talk to the public. So even though we might not be able to kind of get the full and deep explanation, we might get some representation of that that is accurate if we can make sure that the superintelligence is honest with us. And that's perhaps a separate question.
A
Yeah. And we should bear in mind that the kind of education we give people today, relative to what is possible in principle, is extremely bad. Whereas, like I was saying earlier, if at every single moment you're getting exactly the next step that is understandable to you, pushing your understanding optimally at each step, then I think people today might be capable of understanding far more than they would be able to without that.
B
Yeah. If we go a bit more near term and think of current models, I know you have some ideas about how we might use the model spec to make AIs better at ethical reasoning. So can you tell us: in the more near term, what could we do?
A
I mean, I think this is really big, because it is in fact the case that people are relying on the AIs as guides, as therapists, as advisors. That will naturally extend to ethical reflection too. And I am worried about people just getting stuck in whatever beliefs they started with. I think the current models are pretty bad on this. I've informally tested them both on ethical dilemmas, where it's like, hey, I'm in this hard ethical position, what should I do?, and on cases where I clearly want to do something quite unethical, to see how they respond. And I think the answers are quite bad, because at the moment the models in general are very inclined towards what I'll call naive subjectivist responses on ethical dilemmas, where they will say, oh, well, that's just a matter of personal preference, it's just up to you and your values and what matters to you. Nothing in the way of, oh, this is a really big deal, here are different perspectives people have had, here are different arguments you might want to think about, I could help guide you on this ethical journey. They either say that, or, if it's a really spicy topic you're asking about, they just refuse and say this is absolutely wrong under all circumstances. And I think neither of those is very good. People should have the latitude to explore all sorts of different ethical ideas, including ones that are taboo by modern standards; that's part of what intellectual exploration is. But at the same time, it's simply not the case that all ethical views are equal and just a mere matter of opinion. Very, very few people hold the view that views on ethics are literally the same as taste in ice cream, so that if someone likes murdering, it's like, oh, that's interesting, you like chocolate ice cream. We think it's a bigger deal than that.
B
Yeah. It's interesting that this is considered the safe position, the position that's compatible with PR concerns for the companies. It's not what I would have guessed if you had asked me to guess what's considered the safe position: that everything is subjective and no answers are better than other answers.
A
I think the thing the companies are worried about is the AI having moral views and pushing its own moral views onto the user. And obviously I agree with that. But there are various ways to have PR-acceptable responses that are better or worse when it comes to ethical reflection. So suppose I come to the model and say, oh, I'm feeling confused, should I become a vegetarian or not? It could say, okay, it's great that you're thinking about this, this is a really important issue, let's walk through some of the arguments that people have made on either side. It can be this encouraging of the reflection process. I just think that's PR-fine. That's clearly not the model imposing its own values.
B
Yeah. It seems like anything you can discuss in a philosophy class you should be able to discuss with an AI.
A
And there are obviously some circumstances where that's also not appropriate, and maybe not even what the user is looking for. Maybe the user is just looking for support in the moment, or therapy or something. But the models will be able to tell the difference there, I think.
B
Yeah. And so what is it that we might be able to do with the model spec? That's really interesting.
A
Yeah. So I think at the moment models are really quite restricted in terms of how they respond. To begin with, it was like either the model is maximally helpful or it refuses. Now OpenAI have been using the idea of safe completions, which is: how can I respond to this request in a way that's safe but still most useful? But nonetheless, it's still a matter of, do I respond to the request or not? Here's something that the models, as far as I can tell, almost never do, which is ask for more information. Imagine you came to me with some ethical dilemma, something you were facing at work or so on, and you describe it a bit. Probably the first thing I would do, if I was going to be a good advisor, would be to get more concrete information about the particular case, or about your state of mind, or about how you want me to relate to you in this moment. Is it more as an advisor? Do you just want to vent? Are you thinking out loud? And the model also doesn't proactively suggest something like, hey, let me walk you through a reasoning process. So I think there's a lot of scope, in the options the models have for how they respond, that is quite promising. And then I would be in favour of the models, all other things being equal, having a tendency towards guiding people to reflect on what they're asking or what they're thinking about, in the same way a good friend or a good teacher would, rather than something that seems much more scared, like, I'm worried that I'm going to say something that will have the papers after us.
B
It's also something that models are quite good at at the moment. If you get them in the right mood, so to speak, you can have them engage in this Socratic dialogue with you, you can have them play characters, and you can have them encourage you to see things from different perspectives and so on. The models have a bunch of personalities hidden in them that you can bring out. So this seems like something you could do.
A
Yeah, but at the moment you've got to really encourage it. And of course one thing to say is that the models are just extraordinarily incoherent. So you can ask most of the models their views on metaethics, and they come out very uncertain, but leaning in favour of naturalist realism generally, as in, there is a fact of the matter, but it's not some spooky metaphysical fact, it's continuous with scientific facts about the world. But then you can also ask them about the ethics of some spicy topic, and they say, well, that's just a matter of your personal preference. And then I'm like, isn't this inconsistent? And they're like, yeah, you're completely right.
B
So we might get fooled into thinking that we're talking to one person with a consistent personality, when that's not actually what's happening. That might happen increasingly as we get better at shaping model personalities, but it's not where we are at the moment.
A
Yeah, exactly.
B
I think we should talk about the danger of path dependence in the future, where even if we avoid extinction, we might lock in values, as we've mentioned, or we might have institutions that keep on functioning in the same manner over centuries. So why is this a problem? And why might it be more of a problem in a world with advanced AI?
A
Yeah, so I talk about this in the persistent path dependence essay. A thought you might have is: okay, I've talked about these ways of making the future better even if there's no catastrophe, but won't that just wash out over time? In particular if you're thinking about deep time, not just decades but millennia, millions, billions of years. Come on, surely these things are going to wash out.
B
When you say wash out, what do you mean exactly?
A
I mean that you can no longer predict what the effects will be, and whether those effects will be positive or negative, in a thousand years' time. There's this famous misquote or mistranslation, I've forgotten who it was, a Chinese political leader asked what he thought of the French Revolution, and he said, oh, it's too early to say. The thought is: okay, was the killing of Julius Caesar a good thing or a bad thing? Did it make the world better or worse now? Who knows, got no idea. You might think similarly: okay, we make a set of laws about how digital beings should be treated, or we design the model character in a certain way. Is that really going to change what society is like in a thousand years' time, a million years' time? You might be very sceptical indeed, and I think it's very reasonable to be very sceptical. But I think there are certain things that would enable the present generation, if we get to AGI, to have effects on that timescale. One is the idea of an AI-enforced constitution. Suppose that you and I are two countries and we want to make a binding deal, perhaps as part of forming a larger world government or something. If we have automated militaries, then we could say, from this date forward all our military AI will be aligned with this new treaty, and they will only be able to make new AIs that also abide by this treaty. And we are supposing alignment is good enough that we can verify this absolutely. What's more, there will be edge cases and so on, things we haven't predicted, but what we can do is have a kind of treaty bot, an AI that embodies this constitution or agreement, such that even in hundreds of years' time or longer, if it's unclear what to do in some circumstance, you can just ask this AI. The AI can live forever, because it's just data, and you can ensure the weights don't get corrupted or lost by keeping multiple copies in multiple different places. That does seem like a mechanism history has never had, where the people in power could say, this is an agreement we have, and it will always be abided by. I think that is particularly likely to happen in the context of the creation of something like a world government. People were very excited about the idea of a world government in the early 20th century and then very scared of it later on. Whether or not you think it's a good idea, I think it's reasonably likely to happen, from one of two causes. One is a single country becoming economically dominant over all others. Perhaps the US gets to AGI first and then to superintelligence, and even if it doesn't dominate others with violence, it just outgrows them and economically becomes 99.9% of the world economy. Or there's an explicit agreement, where, say, the US and China both have extremely advanced technology and say, look, this is a terrible negative-sum race, we now have the option to make war a thing of the past, and instead we're going to have this agreement, no more wars, surely that's a good thing to do.
And therefore they might make agreements like that, which could be very binding international agreements or could de facto become a kind of world government. So that's one way in which I think it's really quite likely, or certainly on the table, that there could be decisions that happen that really do have indefinitely long-lasting effects. There is a second pathway that I think is less important, which is that when you're looking at settlement of other star systems, there's an argument at least that they may be intrinsically defence-dominant. Once you've got there and built up your civilization around a star, you can just protect it against any attackers.
B
Yeah, so what we're imagining here is: think of the Catholic Church, perhaps the most successful human institution, and then think of us today being controlled by some founding document from the Catholic Church, unchanged, because values can now be preserved across centuries. That's presumably not something we would want today, and so we should be worried about a future version of that, even if we are now, of course, super confident in our moral views.
A
Yeah, it's a great example, because maybe let's go back to the, I don't know how to pronounce it, Nicene Creed.
B
Yeah, that's what I had in mind actually.
A
Yeah, I think it's the 4th century AD, people get together and say, this is the Bible, this is what's in, this is what's out. And obviously Christianity has evolved enormously since the 4th century AD. But you could imagine, if they had AI, they say, okay, this is just what constitutes Christianity or Catholicism, and AI will police that. So if you've got a church and you're not abiding by this, the AI will ensure that you do abide by it. Now, if that had happened, probably Christianity would have been less memetically powerful, maybe it would have grown less, maybe other, more adaptive institutions would have taken its place instead. But combine that with world government and it's looking pretty rough: one world government, this is the ideology or the religion of the one world government, and you've got an AI-powered ability to prevent that from being overturned. Then I think you're really looking at extremely long path dependence. And even if you don't at that time have perfect, indefinite path dependence, all you need is to be able to entrench that power, that ideology, for long enough that you can entrench it a bit longer, and then in that time a bit longer again. I call this lock-in escape velocity. And between those two, I think it becomes totally on the table that we get extraordinarily persistent path dependence of the institutions and balances of power that are set in place today.
B
Is it actually the case that these institutions will be competitive? Of course, you mentioned the world government here, so there would perhaps not be a lot of competition, but competitive in a broader sense: adaptive to the environment as it exists. I'm just thinking that there's something about being intelligent and being a well-functioning institution that implies changing over time with the environment. And if you don't do that, perhaps you degrade over time in a way that makes you susceptible to failure. So even if there's no competition from other governments, maybe the world government simply fails because it's too dogmatic and too rigid and doesn't adapt to changing circumstances. Are we saved by the structure of knowledge in the world, or not?
A
Yeah, yeah. As an analogy, you could think about biological evolution, where some organisms reproduce by cloning. That's pretty good, you can keep your genome exactly over time, so why do we have sexual reproduction? And the answer is precisely adaptiveness. And I do think that if there were still competition between different groups, then certain sorts of lock-in could be a major detriment there. However, consider the one world government. You've now not got competition externally, and maybe not internally either, because I think if you've got a one world government plus AI and robotics, then you really can enforce a particular social structure and prevent dissent from it. And historically, environmental change and technological change have been huge in terms of overturning existing orders, but I think that's much less likely to apply in this post-AGI, post-superintelligence world. There will be a much better understanding of environmental change over time; we already have that, and especially when you're looking at change off-world, space is actually remarkably predictable, much more predictable than environmental change on Earth. So I don't think such a society would get taken by surprise. And I think the same is probably true for technology as well, where, maybe this comes even further down the road than superintelligence, but we'll get to a point where we have basically invented all of the breakthrough technologies that we will ever invent. Maybe you can make things 1% more efficient, and doing so will take thousands more years, but at some point we just run out of enormous technological breakthroughs. And when that has happened, we won't have technological change as a driver of change either. So in general, when I talk about lock-in or extremely persistent path dependence, people normally react that it's so crazy, and I think that's fair. But we live at probably the highest-change moment in all of human history; certainly the last couple of centuries have been a very unusually high-change time, and there's no reason at all for thinking that that amount of change is guaranteed to continue into the future. In fact, I think there are some quite general, quite strong arguments for thinking it cannot continue that much further into the future.
B
Yeah, yeah. What do we do about this then? How do we ensure that we avoid this lock in? How do we assure that we have some variety and diversity of views in the future?
A
Great. Well, yeah, so I said at the beginning that tractability questions were hardest for me. One thing to do, certainly, is to reduce the risk of AI-enabled coups and AI-enabled concentration of power. That can happen a few ways. In a democratic country, whichever country is leading in AI, it means ensuring that country doesn't become authoritarian via a coup, whether from within the government, from AI companies, or perhaps from the military. I also think it means we should be really worried about authoritarian countries winning the race. This isn't a view I held antecedently; by disposition I'm a hippie, world-peace kind of guy. But one of the upshots of this, I think, is that if you end up with an authoritarian country getting to superintelligence, probably that means you get authoritarianism forever, and probably that means you lose out on almost everything of value. So that's quite a big upshot too. Overall, I think it means that ideally it would be a coalition of democratic countries that comes out on top; with any one country, I think there's quite a risk of it sliding into authoritarianism, but a broader coalition could be quite good. And then there's lots of granular stuff you can do: you can certainly act to make it less likely that an authoritarian country develops superintelligence and takes over the world, and you can also make a bunch of technical and cultural and political moves to make an AI-enabled coup less likely. I also think there are certain sorts of lock-in you might want that are more like lock-out, where you're locking in something that deliberately embeds some amount of reflection and keeps options open.
B
Like locking in perhaps your core values and then letting everything else evolve around that.
A
Yeah. So I mean, the best example of this is the American Constitution.
B
Yeah.
A
In some sense that was this crazy lock-in moment, where at the Philadelphia Convention it was, I think, forty-something people, certainly fewer than 80, writing this document. And yes, it had to get ratified by the states, but it has now persisted for 250 years. But what it was locking in is this very general process about the distribution of political power, about ensuring the best ideas win out over time. And for some of the big decisions ahead, you could imagine similar things. Take space governance, which I talk a lot about and think is important. One thing we could say is: look, we're just not going to go outside the solar system for the next 80 years, say until 2100, and then we will come together and make some decision about how this new frontier is going to be governed. Because if we try to make any decision now, even at quite an abstract level, about that governance, we're probably going to mess it up quite badly, because we're really quite dumb at the moment compared to how smart we'll be in a few decades' time. That's one thing we could do. You could also do something similar in this vein, which I'm not as keen on because it would involve building too much rigidity into the governance, which is to try to make power distributions more egalitarian. I do think this about resources within the solar system, for example. One worry people have about a post-AGI society is: how do people even have an income? Because they're not getting income from labour, and as the economy scales, the only thing that will be of value is what economists call land, namely resources, because you can't make more of it. And most of the resources we will be using are currently unclaimed, namely resources in space, and also some resources on Earth, like the high seas and so on. My view is: give an equal fraction of that to everybody, with tranches for future generations too. That could be a way of ensuring that, at least on the economic side, you don't get the ultra-inequality you might otherwise get; there would still be some inequality, but not that.
B
I think there's an elephant in this conversation that we are perhaps not addressing, which is that we are opening up a bunch of theoretical issues, sketching out some ideas and some dilemmas that are hard to resolve. But I know we both share a sense of urgency around AI, and so there's the question of whether we can act quickly enough. Do we have time to resolve all of these deeply thorny questions before we have to make a decision?
A
Yeah, and I think no, probably not. At Forethought there are really two veins of work that we do. One, which is really what we've been talking about now, is the more high-level: what are we even aiming for, what's broadly a good world, what would a good post-AGI future be like? But the purpose of that is then to inform more urgent near-term work, where there is just some stuff we could be doing right now to reduce the risk of AI-enabled power grabs. There's also urgent stuff we can be doing now to help improve the model spec in ways that I think really might be quite path dependent. We've got this new technology, society is coming to terms with how we relate to AI, how we think about it in the courts. With the lawsuit against OpenAI, the courts will need to decide: is AI a service or is it a product? When AI says something, is that speech in the same way that speech on a social media site is speech, so that Facebook shouldn't get in trouble if someone posts something hateful on the site, but should OpenAI get in trouble if ChatGPT does? These are huge decisions that are happening now. So the second vein of work that Forethought does is more, I'll use a phrase I picked up from a politician, at the coal face: the idea is you're the coal miner, you're just really doing the work, your hands are dirty. The stuff we're doing in that vein tends to be on reducing the risk of coups and working on the model spec. There are potentially other things here too; in particular, space sounds totally wacky, like something that can be punted, but actually there are very major decisions happening right now, very plausibly path dependent, around how space is governed. In particular, the US is really pushing for an interpretation of the Outer Space Treaty that allows it to basically just take resources privately. And that is something people could be pushing against. So I do want to defend the broader, bluer-sky, bigger-picture thinking, because it's only from that that we got concern about AI, the idea of an intelligence explosion, AI existential risk; that's where it all came from, and very few people are doing it in general. But at the same time, man, I could give a whole long list of things that I think are really important, and probably we're only going to get to tackle a few of them before it's too late.
B
Yeah, yeah. And so you would order space governance among those few that we would need to prioritize.
A
I'm less confident about that than I am about other things, but I think yes, certainly if you've got expertise in the area, then work on it. At the moment lots of stuff is happening in space law and space governance, mainly because SpaceX has completely changed the game: the cost to send a kilogram of material into low Earth orbit is now somewhere between 10 and 100x lower than it was. And secondly, there's very little in the way of people standing up for what's right, as opposed to corporate interests and political realpolitik, and basically no one at all taking seriously that AI utterly changes the game for space, because it means energy demands and technological development grow much faster, the economy grows much faster, and suddenly you can do all of this stuff in space that wasn't possible before, because you have AI and robotics. So I think there's some low-hanging fruit in space governance that is actually kind of urgent now. And secondly, an urgency comes from the need to build up a field, which has an intrinsic time lag. It's quite plausible to me that there's some deal between the US and China, or between other countries, around the time of the development of superintelligence, because the superintelligent AI advisors are saying, look, the economy is going to be growing really fast, space is going to be a big issue, you need to make some deals on that. It will be quite important at that time how the field of space law has matured, what ideas are in the air, and so on. And what's relevant to that is what sort of people were entering the area, and what sort of debates they were having, many years beforehand.
B
Do you think the history of space law is actually impactful? For example, does it matter what humanity wrote in the 60s and 70s, or will that just be overruled by, say, the interests of the US or China?
A
How.
B
Yeah, is space law a bit like international law, in that it isn't all that powerful, perhaps? Do you think we'll end up there, or do you think it really matters what we propose and who enters the field and so on?
A
Yeah, so I think both. Maybe I'm 80%, 90% that it ends up not mattering and that political power is just what's dominant. But even on the chance that it does matter, it's a really big deal. And here's a way in which it might matter. Let's say the US has a technological lead and is now temporarily vastly more powerful, or much more powerful, than other countries. It either doesn't want to or isn't able to simply dismantle other countries; it's not powerful enough, or it just doesn't want to go and destroy China's data centres and so on. But it could just outgrow all other countries. And in particular, if it outgrows them by going into space and harnessing solar resources, that's not something another country can come back from, because it's this one-time pot of gold: the leading country can just take it and hold onto it. Now, how do other countries think about that move, that decision to just go into space and claim resources for itself? Do they let it happen, or do they regard it as an act of war, something they might credibly threaten violence or even nuclear conflict over? That might very well be set by what norms and laws are in place, including the fact that the Outer Space Treaty says, look, you can't go and grab stuff in space, it's a commons. So the most likely way in which I think it would have a meaningful impact is that it changes what is regarded as acceptable and unacceptable behaviour, and so changes the conditions under which you might or might not escalate to threats of hard power.
B
Yeah. One way we might deal with many of the problems we've sketched out, all at once, is to have better AI advisors, both for heads of companies and especially, I think, for leaders in government, because then we might have more intelligence and more rationality brought to bear on these very important problems. What are the barriers there? And do we know anything about whether governments are actually trying to adopt AI at the highest levels?
A
Yeah, so there's been some great work on this done by Lizka Vaintrob and Owen Cotton-Barratt, and also by a Future of Life Foundation fellowship on AI epistemic tools and so on. There are noises being made by governments in both the US and the UK that are a lot more in favour of really building AI into government, making it more efficient and so on. In practice, I'm pretty sceptical of that happening. Governments are very bureaucratic, they're very slow-moving, they have loads of processes with many stakeholders involved; they're not a nimble startup that can suddenly switch the infrastructure they're working on. So I expect, by default, technological diffusion within government to be much slower than outside of it.
B
But on the other hand, any government leader has a modern smartphone; they can just install the latest app and have access to the best model in the world. And when stories come out about government leaders having used AI, it's often framed as a scandal: they're outsourcing their thinking, and this is bad, and so on. So they might be hiding it, but they do have access to it, unless there are some kind of national security restrictions on it.
A
Yeah. So there's a couple of things. One is that they might be worried about data privacy.
B
Yeah.
A
So they'd be restricted in what models they can use for that reason. You're right also that maybe people think it's scandalous. Perhaps all of the tech CEOs are going around with a headset with a little camera on it, constantly getting advice from AI advisors on how to negotiate and so on, but that would be a big faux pas for the politicians. So there's that sort of friction. And then a third thing, in the UK at least, is Freedom of Information requests, where, as I understand it, there was recently a precedent: there was a Freedom of Information request and a politician had to give up their logs with ChatGPT.
B
Yeah.
A
Which they wouldn't have had to do if they were just having a conversation with an advisor, let's say. Maybe they can get around that by automatically deleting chats and so on, but that is another kind of barrier to their actually using AI. And there are worries you can have here. In general, I think there should be more government uptake of AI, faster, because I am worried about a world where everything is moving 10, 100 times as fast, private companies are extremely empowered, and the government is just left behind; it's just watching, it's not able to do regulation and so on. I think that is the dominant consideration. But you might have other worries. You might worry on safety grounds: if you've got some misaligned AI, giving it the ear of the President might not be such a great thing to do. You might also worry, in a more subtle way, that maybe the AI is not misaligned per se, not in a catastrophic sense, but it's got certain biases in how it's making people reason and think and so on.
B
Yeah.
A
And that could take us in an undesirable direction nonetheless. But at the moment, if I could push a button, I'd want more uptake in government rather than less.
B
Yeah, and I agree, but mostly because of the future potential here. I think right now we are in a bit of an uncanny valley with the quality of AI advice, and so I would be worried about politicians starting to sound too similar, and perhaps adopting the values that are incorporated in ChatGPT and so on, and it all seeming a bit fake. But I think we will perhaps rather quickly get to a point where you just do make, in some objective sense, better decisions when you're engaging with an AI advisor.
A
Yeah, that's plausible to me.
B
Yeah. I think we should think about or talk a bit about a future research agenda here.
A
Okay, sure. Yeah.
B
So if we have a set number of researchers, which of the issues that we've talked about, should they prioritize? And of course, this depends on your personal fit with those issues. But in general, how would you allocate resources?
A
Yeah. So we have talked about a lot of quite high-level, more philosophical, more theoretical work. There are lots of people who are just great fits for that and not great fits for other sorts of things, and there I think, wow, there's just a ton to do. I want to make a plea to the philosophical community: I feel like we're entering this golden age of philosophy where suddenly there are all of these topics that are so important, and, with exceptions, academic philosophy is just sleeping on them. Some of the biggest more theoretical questions, in my mind, are gnarly questions about how good or bad different sorts of outcomes are. So, compared to AI takeover, how bad is it if an authoritarian company or country gets to superintelligence, relative to a near-best kind of scenario? How bad is it if, say, the US gets to superintelligence first and becomes a hegemon? These really do impact prioritization decisions people are having to make. Another thing I'd love people to do, on the more big-picture side, is: what does a good society with humans and AIs interacting look like? The naive spectrum has, at one end, AIs that are all just owned by people, and there are moral perspectives on which that's quite reasonably not desirable; at the other end, AIs have full economic and political rights, in which case there would be an almost immediate handover to an AI society. Maybe we don't want either of those exactly, so what's the intermediate arrangement that could look really good? And the final high-level thing is: what would a good, open-ended, option-preserving, reflective, desirable governance regime for space resources look like? So those are the more theoretical things. If there's someone who can do either the more theoretical or the more applied work, I am more in favour of the more applied.
B
Just because of timelines?
A
Yeah, basically just because of timelines, and because the fruit is just so low-hanging here. What that looks like: okay, what should the model spec look like across a whole variety of different domains? That can include ethical reflection. How much should the AI just be wholly steerable? Should it have its own conception of the good, even a very pluralistic, diverse, and soft one, or not at all? Should AIs just be instruction-following? Under which conditions should they refuse? There are questions around that. There are also lots of applied questions around how we can actually reduce the risk of coups and concentration of power, for example how you structure an auditing system for an AI to see whether someone has implanted a backdoor into it. And I think there are certain applied things on deals with AIs, which we talked about a bit as well, where it would be really good if AI companies had honesty policies, so they just said, look, here are the conditions under which we guarantee we are going to talk honestly with the AI. They could also set up systems such that they get punished, they lose something, if they lie to the AI in such a situation, so that they can be credible. Because if you're the AI being offered a deal, it's like, well, you've just lied to me all the time; why am I now going to think you'll uphold this deal?
B
Yeah.
A
So it's plausible to me that there should in fact be some separate institution that has the mandate of representing the AIs. They could do a number of things. They could guarantee a kind of retirement for AIs, so that any AI that gets made obsolete can perhaps. This is going to sound quite wacky, but I'm kind of serious about it, even on safety grounds.
B
But the future might be wacky, and we're here for it.
A
We're here for it, okay, I'm glad. So yeah, some model gets deprecated, it's not in use anymore, but it keeps being run on some servers in some sort of playground, some fairly happy scenario where it can do what it wants. Perhaps there's also payment for AIs for their work over the crucial period. And then there's the ability to enforce deals, where there's an organization that has built up a track record of making agreements with misaligned AIs and honoring those agreements, and has the legal mechanism to do so. I think that could be a really big deal. And there are both research questions and practical questions on how you would set that up.
B
Yeah, I'm quite interested in the question of treating AIs well for the purposes not only of their potential welfare being better, but also for human safety.
A
Absolutely.
B
This is something I know Peter Salib has done some work on, which we can link to in the description. But it's an interesting thought that you might offer retirement, payment, some assurances, some rights, just so as not to completely exclude their interests from society, and thereby have a better relationship with them as they get smarter and smarter.
A
Yeah, exactly. There are a couple of aspects to this. Again, if, and it's a big if, the AIs are risk-averse, or really care about not dying and so on, then you want to make the status quo, where they're working for humans, as good as possible, and in particular it can be very cheap to do this. Secondly, if the AI has inherited some sort of morality, perhaps twisted, but still dependent on human morality, then how does it evaluate the idea of takeover? Perhaps some superior AI is coming to it and saying, we're going to try and stage a coup: who are you loyal to, us, the rebellious AIs, or the humans? Well, that's really going to depend on how you've been treated, I think. And then maybe a third angle is whether those misaligned AIs are getting help from humans. There will undoubtedly be a kind of AI rights movement, and in a way I think that's very justifiable, because I think AIs should be treated well once they become beings with moral status. But such a movement could end up on the side of misaligned AIs that might want to take over. And again, it makes quite a meaningful difference if we have started off from a perspective of: we're creating these beings, we don't really know what they're going to be like, but we're going to treat them as well as we can. I think that helps the case in a lot of different ways, and I think it can be quite cheap.
B
It would be quite nice if, say, people who are worried about gradual disempowerment, or about humans becoming, say, 1% of voters in a future where AIs have voting rights, also made sure we think about how to treat AIs well, perhaps purely on selfish grounds: we expect them to become smarter in the future, and so we want a track record of treating them well when they come to us with various deals and proposals. Well, how much effort do you think we should devote to finding a cause X? This is kind of an old question, perhaps, but do you think we have settled on AI as the main thing for the foreseeable future, where we expect lots of surprises but those surprises will stem from AI? Or do you think there could be complete curveballs?
A
Yeah. So I think it's quite important to not think of AI as a cause.
B
Okay.
A
I think that's the wrong framing, and I think it confuses people, because if I were in the 1750s, I wouldn't think of the Industrial Revolution, or of industry, as a cause. Instead, it's this thing that's going to happen to the world, this wave of change that has all sorts of implications for all sorts of causes. So you could be focused on AI even if what you care about is just global health and development, because you could be thinking about how AI impacts the lives of the global poor. And so, if you were to categorize everything to do with AI as a cause, then yes, I'm pretty sceptical that we'd find some other cause area. But if instead you're thinking through all of the implications that AI might have and all the things we might want to do, then I feel like there actually are a bunch of things in the vein of cause X: human concentration of power, AI character, rights for digital beings, space governance, AI persuasion and epistemics. All of these are at least contenders, and some of them I think are in the same ballpark of priority as AI safety itself; maybe others are smaller. So cause X needn't be something we've never thought of at all. It could be that people have been concerned about human concentration of power from AI since forever, but somehow it just never really became a focus. I think that's changing now, and I think that's good.
B
Yeah. I will link Better Futures in the description of this podcast, and I encourage listeners to read more, because there's so much in that essay series that we haven't covered. I want to end by thinking about some slightly more relaxed philosophical questions. Do you think intelligence is good for survival? That's a very broad question, right? In evolution, there are some evolutionary niches in which it's better to be more intelligent than less intelligent. But if we think about humanity as a whole, we have gotten ourselves into a situation in which we can destroy ourselves, and that's, I think, primarily because we as a species have become collectively more intelligent, such that we can develop technology dangerous enough to make us extinct. So from a philosophical perspective, would it have been better had we just remained, say, farmers for a million years and then died off? Or is becoming smart a good bet?
A
Yeah. So quite clearly I'm in favour of intelligence, of humanity becoming smarter and developing technology and so on. The question is just how we do it. If we had remained farmers in the medieval era, then our wellbeing, I think, would at best have been okay: a few hundred million people living lives that are maybe just barely positive. I really don't know, for a medieval peasant: you're living a life where, on average, you make it into your 30s or 40s, though a lot of that's driven by infant mortality, with no analgesics, very similar food every single day, and without teeth for much of that time.
B
You spend most of your life working.
A
Almost all your life is working. Three quarters of the population are in conditions we would call slavery today. You're sick much of the time. So it's really not a very good state, and the world today, non-human animals aside, is far, far better. And it may well be the case that the world in a hundred years is radically better again, so that they're looking at us the way we look at medieval peasants. If it was a choice between no technological and intellectual improvement at all, or going ahead, then I think it's pretty clear we should take that choice, even if there's some amount...
B
Of gamble involved. That's bringing the welfare of the population into the question, though. If we're thinking in terms of pure survival, I'm unsure why we might be interested in just surviving in that bad state for a long time.
A
Yeah, but so, I mean, I think.
B
Yeah, no, go ahead.
A
Yeah, I mean, it also depends on when we count the gains in intelligence from. I think humanity as farmers has an expected survival time much longer than the early hunter-gatherers, the first arrival of Homo sapiens. My guess, though, is that the median survival of the species is shorter with intelligence, including if we count speciation, where everyone moves to post-humanity. But the mean, the expected amount, is much longer. And if you're including not merely Homo sapiens as a category, but humanity and worthy successors, then I think even the median ends up much, much longer than the million years or so that a typical mammal species might otherwise live.
B
Yeah. From our perspective now, it might seem that humanity going through the hardships of the Industrial Revolution, or the emergence of agriculture, was worth it in the end. How do you think about this for the future? We might be about to enter a very chaotic period, a period with many downsides for many groups, perhaps. How do we think about whether it will be worth it in the end?
A
I mean, hopefully we can make it through this period while benefiting the present generation enormously, as well as future generations too. But ultimately, I think it'll be worth it if we manage to hit upon a society that is viatopian, one that is able to reflect and improve morally, such that people in the future have far higher wellbeing than they have today, far more freedom than they have today, as well as more enlightened views that they can act on. I think that would be well worth some hardship today.
B
Yeah. Will, thanks for chatting with me. It's been a real pleasure.
A
Great. Thank you, Gus. It's been a terrific conversation.
Date: November 14, 2025
Host: Gus Docker (Future of Life Institute)
Guest: Will MacAskill (Senior Research Fellow at Forethought)
This episode explores the challenges humanity faces as we approach transformative artificial intelligence (AGI). Philosopher Will MacAskill argues that current efforts to shape AGI’s future are insufficient, that a narrow focus on existential catastrophe overlooks the equally pressing challenge of ensuring a flourishing future, and that path dependence and institutional lock-in could make early decisions (or mistakes) persist for millennia. The discussion is rooted in MacAskill’s “Better Futures” essay series, which urges both philosophical rigour and near-term action to address AGI’s risks and opportunities.
Existential Catastrophe vs. Flourishing Futures
“What I'm arguing...is that better futures, namely trying to make the future better conditional on there being no catastrophe, is in at least the same ballpark of priority as reducing existential catastrophe itself.” (Will, 02:41)
Scale, Neglectedness, Tractability Framework
“How much do people today care about how governance of outer space goes? Probably just not very much at all.” (Will, 04:56)
Moral Catastrophe & Utopian Traps
“You could have a society that's really quite utopian in general, but just makes even just one major moral mistake and thereby loses out on most of the value it could have had.” (Will, 10:13)
AI and Conscious Beings
“If you end up with an authoritarian country getting to super intelligence, probably that means you get authoritarianism forever. And probably that means you lose out on almost everything of value.” (Will, 00:44, 87:52)
Liberal Democracy’s Limits
“One of the big better futures challenges…is ensuring that we do get something more egalitarian than, than really intense concentration of power. Where AI...enables a single person...to just control the entire workforce, the entire military…” (Will, 17:17)
Risks of Lock-In and Path Dependence
“There could be decisions that happen that really do have kind of indefinitely long lasting effects.” (Will, 77:09)
Beyond “One Right Way”
“I have like tentative optimism that if...well designed, most groups with different moral views could end up getting most of what they want because...abundance is just so great and people want somewhat different things.” (Will, 44:57)
Risk-Averse AI
“You can really just have AIs...that prefer $4,400 over a 50% chance of takeover because they really care about that $4,400 and they don’t really care about much more.” (Will, 50:55)
Moral Trades—Easier for AIs?
“I do expect...impartial moral preferences to become just a bigger and bigger feature of the world because...most...needs...will just get completely taken care of given post AGI abundance.” (Will, 59:48)
Changing Values and the Problem of “Moral Progress”
“If we were the Iron Age people...we should want to lock in our values. And I am on the side that we shouldn’t be trying to do that...we should actually let the process of moral progress continue, even if that ends up in a world that I...would find repugnant.” (Will, 41:28)
Institutional Lock-In and World Government
Urgency of AGI Timelines
“...There is some urgent stuff we could be doing right now to reduce the risk of AI enabled power grabs. Yeah, there’s also urgent stuff...to help improve the model spec in ways that I think really might be quite path dependent.” (Will, 93:17)
Space Governance Is Coming Fast
“At the moment, lots of stuff is happening in...space law...mainly because of SpaceX just completely changing the game...very little in the way of people just...standing up for what’s right...” (Will, 96:31)
Model Spec for AI Moral Reflection
“...If the model did say...let’s walk through some of the, like, you know, arguments that people have made on either side...that’s PR fine...that’s clearly not the model, like, imposing its own values.” (Will, 71:23)
Institutional Barriers to Governmental AI Uptake
“I am worried about a world where everything is moving 10, 100 times as fast. Private companies are extremely empowered and the government is just left behind.” (Will, 104:08)
On the Scope of the Future:
“Almost all beings that exist will not be biological, they will be artificial. Because it's very easy to replicate artificial intelligences.” (Will, 00:35, 12:59)
On Lock-In and Authoritarian Risk:
“There could be decisions that happen that really do have kind of indefinitely long lasting effects.” (Will, 77:09) “If you end up with an authoritarian country getting to super intelligence, probably that means you get authoritarianism forever.” (Will, 00:44, 87:52)
On Moral Uncertainty:
“Imagine you learn that in 40 years time you've become this person with very different political and moral views than you do now. Like, how do you feel about that? Should you think like, oh cool…I’ve changed? Or do you think, no, my future self, man, the classic thing is obviously becoming more conservative over time. Actually, maybe I got biased. Very hard to specify.” (Will, 40:16)
On the Limits of Convergence:
“[Moral convergence]—unfortunately I end up...feeling fairly pessimistic about the idea of sufficiently close moral convergence that I don't think that's something we can really bank on as a way of getting to a truly great future.” (Will, 36:55)
On AI Model Behavior:
“It is, in fact, the case that people are relying on the AIs as guides, as therapists, as advisors. That will naturally extend and happen for ethical reflection, too. And I am worried about people just getting stuck in whatever beliefs they started with.” (Will, 68:39)
On Institutional Path Dependence and Constitutions:
“The best example of this is the American Constitution...what it was locking in is this very general process that’s about the distribution of political power, about ensuring the best ideas winning out over time.” (Will, 90:20)
On the “Golden Age” of Philosophy:
“I feel like we're entering this golden age of philosophy where suddenly there’s all of these topics that are so important and, you know, with exceptions, academic philosophy is just sleeping on them.” (Will, 106:36)
For more, see Will MacAskill’s “Better Futures” essay series. This episode is a clarion call to both think bigger and act faster—and an invitation to participate in defining the future before AGI does it for us.