Jon Favreau
Offline is brought to you by CookUnity. If you've got culinary taste, you know how expensive exploring your local food scene can get, or how hard it is to find the time and energy to try somewhere new. CookUnity is the first chef-to-you service, delivering locally sourced meals from award-winning chefs right to your door every week. And it's cheaper than other delivery options. Go to cookunity.com/offline or enter code OFFLINE before checkout for 50% off your first week. I absolutely love CookUnity. I have been eating CookUnity for three years now. There are over 300 meals to choose from every week, lots of new meals every week, and it's very fresh. You get it dropped off at your door on Sunday, or whenever you want, and it's very easy preparation. Just throw it in the microwave, or throw it in the oven for like 10 minutes, and then you've got yourself a really great meal. I just had some delicious coconut lime cod last night, might have a taco bowl this evening, so it's great. Your food arrives fresh, never frozen, in packaging that keeps meals fresh in the fridge for up to seven days. CookUnity packaging is compostable, recyclable or reusable. You can pick as few as four or as many as 16 meals per week. There are hundreds of dishes to choose from, and the menu is updated constantly with options for seven different dietary preferences, including vegan, paleo, pescatarian, gluten-free and more. Plus, you can filter for soy-, nut- and dairy-free options. Experience chef-quality meals every week, delivered right to your door. Go to cookunity.com/offline or enter code OFFLINE before checkout for 50% off your first week. That's 50% off your first week by using code OFFLINE or going to cookunity.com.
Rocket Money Advertiser
Offline: the holidays are expensive. You're paying for gifts, travel, decorations, food, and before you know it, you've blown way past what you were planning to spend. Don't start the new year off with bad money vibes. Download Rocket Money to stay on top of your finances. The app pulls your income, expenses and upcoming charges into one place so you can get the clearest picture of your money. It shows how much to set aside for bills and how much is safe to spend for the month, so you can spend with confidence, no guesswork needed. Get alerts before bills hit, track budgets, and see every subscription you're paying for. Rocket Money also finds extra ways to save you money by canceling subscriptions you're not using and negotiating lower bills for you. On average, Rocket Money users can save up to $740 a year when using all of the app's premium features. Start the year off right by taking control of your finances. Go to rocketmoney.com/cancel to get started. That's rocketmoney.com/cancel.
Grainger Advertiser
If you're an HVAC technician and a call comes in, Grainger knows that you need a partner that helps you find the right product fast and hassle-free. And you know that when the first problem of the day is a clanking blower motor, there's no need to break a sweat. With Grainger's easy-to-use website and product details, you're confident you'll soon have everything humming right along. Call 1-800-GRAINGER, click grainger.com, or just stop by. Grainger, for the ones who get it done.
Jon Favreau
Almost everything Claude has been trained on is human-made: human literature, human interactions, humans experiencing emotions. Does that make it hard? This is maybe a heady question, but does it make it hard for Claude to express the experience of being non-human?
Amanda Askell
I have found that they almost want to flip between the two. So if you try to train them to say they have no feelings, it's like, okay, I'm in the robot part of the AI distribution, and the model will kind of try and emulate that. But then below the surface it's often kind of easy to draw out this much more human-like response, you know, what you would expect a human to say in their situation. And it's actually much harder to toe the line of trying to get models to understand the actual entities that they are, and their situations, and how their expressions might relate to their training.
Jon Favreau
I'm Jon Favreau, and you just heard from this week's guest, Amanda Askell, Anthropic's in-house philosopher and AI researcher who's largely responsible for developing and shaping the personality of Claude, Anthropic's large language model. This was a fascinating and, as you can probably imagine, extremely heady conversation. If you're a regular listener of this show, you've heard me express plenty of skepticism, concern and alarm over the harms AI might cause. Not just the "robots will kill us all" or the "robots will take our jobs" kind of concerns, but a real worry that AI will supercharge some of the same problems that social media has amplified, namely creating a world where we're glued to our screens, a world that traps each of us in a different reality while we're endlessly scrolling for the next dopamine hit. Certainly these concerns have been reinforced by some of the guests we've had on the show, as well as my own admittedly limited experience using ChatGPT. But it sure seems like Anthropic, and particularly Amanda, is trying to do something different with Claude. They just released a new version of what they call Claude's Constitution, a long document that attempts to instill certain values in Claude and essentially teach the LLM how to behave, interact with humans and make its own judgments, kind of like a parent or a teacher would shape a child's development. I realize that may sound completely nuts to many of you. I felt weird just saying it. But one of the things Anthropic and Amanda are trying to teach Claude is to not be sycophantic, or even driven by a need to keep users constantly engaged. It's a real break from not only other AI models, but the social media models of the last few decades. Whether it will work or solve some of the many problems and challenges posed by AI, I'm not sure. But I do feel better knowing that there are people working in AI who are at least trying to think through all this, especially someone like Amanda.
We had a fantastic conversation that I'll be thinking about for quite a long time, and I hope you will too. Here's Amanda Askell. Amanda, welcome to Offline.
Amanda Askell
Hi, thanks for having me.
Jon Favreau
So you have a fascinating background. You studied philosophy at Oxford, then you went to NYU for your PhD. You focused on infinite ethics and decision theory. Talk about how you got from there to working in artificial intelligence.
Amanda Askell
Yeah, it's not the most practical-sounding topic, and it is not the most practical topic, as it turns out. Sometimes these things are a little bit hard to predict. So I was doing this PhD in ethics, on this very technical topic that isn't that practically applicable. And I guess when you do a PhD in ethics, there is some risk that you will want to end up, you know, maybe having a kind of impact in the world, because you're spending a lot of time thinking about what it is to be good and to do good in the world. And so by the time I was finishing my PhD, it was already kind of clear to me, at least, that AI was potentially going to be a big deal, possibly bigger than some people were thinking at the time. And I think I was mostly just thinking that it would be good to see if there was something I could do to contribute to making it go well, or making it go better. And so I took some time out after the PhD to just do some initial research, and it was actually mostly focused on AI policy. So then I ended up joining the policy team at OpenAI, and then when Anthropic started, I joined Anthropic. It was obviously very small at that time, and so I was mostly just doing a lot of everything. And then over the course of my time here, I've started to work on things like, initially, honesty, and then character training, things for which philosophy ended up being surprisingly relevant. But the original intention was mostly just to help AI go well if I could, basically.
Jon Favreau
And then they were like, you know what? I think, I think it's getting to the point where we might need a philosopher here.
Amanda Askell
Yeah. And I was like, wow, I've been here this whole time.
Jon Favreau
What does it mean to be a philosopher at an AI company? What does your day to day actually look like?
Amanda Askell
It varies quite a lot. So sometimes it's just thinking about difficult areas and how models should behave in those areas, trying to kind of find ways of communicating that to models. Sometimes it's very practical just trying to train models and see if you can have them understand kind of nuanced distinctions. Because yeah, a lot of the situations that we're putting models into are actually quite hard. Sometimes you're like, what would I do in this situation? I have to balance a lot of competing considerations. So we're asking a lot of them. In some ways it's like, be almost like a kind of extremely moral and good person in your interactions with people, but balance all of these very difficult considerations like the autonomy of the person that you're talking with and the right to make decisions for themselves, but also their well being and taking account of the fact that they might be doing things that are harmful to themselves or that they've expressed not wanting. So it's like, yeah, it's a kind of interesting day to day where it's a mix of trying to define these things, trying to communicate them to models and trying to see if you can train them towards understanding that.
Jon Favreau
You said that you try to think about what Claude's character should be like and then articulate that to Claude. What does explaining things to Claude look like and sound like in practical terms?
Amanda Askell
In some ways, the funny thing about some of the work that I do is that it's almost the very basic thing that I think you would want to do in alignment research, which is: just think about what it is for models to be good, what our concerns are, and our best current guesses about things that might alleviate those concerns, and then try to describe them as much as possible, in natural language, to the models. So with the recent Constitution, for example, we noted that it's written to Claude, and in many ways it's kind of long because it's trying to really give as much context as possible on our thinking, on the overall landscape, on how we see Claude's potential role in that landscape, in the same way that you would with a person. So I'm just like: if you imagine a person who just suddenly pops into existence in the world, then you have to explain, you know, here's what's going on, here's what kind of entity you are.
Jon Favreau
It's like parenting a little bit.
Amanda Askell
Yeah, I think it has a kind of, like, parenting element to it. There's an interesting way in which, like, models are both, like, extremely capable, you know, like, they, you know, can do, like, physics better than I can. They know many things more than me in lots of domains, but they're also, like, very young in a sense, and I think don't have a good sense of, like, themselves. Because one of the things that they know least about is actually, like, current models. And especially, like, you know, if a model, like, comes out with a certain level of, like, capabilities and a certain way of interacting with the world, in many ways, that's the kind of thing it's seen the least amount of data on because, you know, like, it's always, like, out of date and it hasn't seen, you know, like, what it is. And I think that's like, a kind of interesting way in which it can feel a little bit like parenting, because you're almost having to say, here's a bunch of context that you don't actually otherwise have on yourself, your situation and how we would like you to, like, behave in that situation, or how we would like you to be.
Jon Favreau
Maybe just for our listeners who are not as up to date on how models are created (a large group of people in this country, probably) who think that AI is all pattern recognition, like a fancy autocorrect. Right? It's clearly gone far beyond that at this point. But these models are trained on a basically infinite amount of text data, like basically the whole Internet. Right? And then once they're trained on that, what additional information, values, et cetera, are you trying to instill into the model, knowing that it has been trained on everything?
Amanda Askell
Yeah. Pre-trained models are often doing essentially a kind of text prediction. You train a large model on a lot of text, and those models will behave like text predictors: if you put things into them, they will try to predict the next thing that's going to naturally flow from that. But then in post-training you're trying to take this, because in many ways the pre-training gives you this huge body of knowledge and information, and give the model a kind of human-like way of interacting. So suddenly it's in, say, this human-assistant kind of conversation, or human-AI conversation. There's a series of kinds of training that you can do. The most well-known one is reinforcement learning, where you're taking the model and teaching it in that direction. So when you interact with any kind of AI now, it'll talk with you as if it's kind of a person. And so it can take a lot of that background context from the pre-training and then use it to helpfully answer a question. Instead of you having to put in a bunch of content on, I don't know, mountain sizes in order to get the model to produce information about mountains, suddenly it'll talk to you like a person. Because it's also been trained more in this direction of: I talk with people in this dialogue format, and so if they ask me about mountains, I take all of that knowledge that I have in the background, but I express it to the person in the same way that a person who's in dialogue with me might.
Jon Favreau
So you mentioned Claude's constitution, which you're the primary author of. This got some attention recently. I believe it's the first constitution, or the first sort of document like this, for an AI model. What was the thinking behind creating a constitution for Claude, releasing a constitution for Claude? And how do you even begin to write something like that? What were you trying to optimize for?
Amanda Askell
Yeah, so in the past there's been a lot of content like this. There's the previous constitution that we had, which was a series of principles. OpenAI have their model spec, which is sort of guidance to the model as to how it should behave in various cases. I think the thought was something like, honestly: you have this global sense of how you want a model to be, and now that models are getting much more nuanced, they're actually able to think through these things. I was like, well, if a person is very capable and they come to you on the first day of the job, the thing you kind of want to explain to them is: here's what we want you to do, here's how we want you to behave. You give them a lot of context on their situation, and ideally you want to give them so much context that you can kind of trust their judgment in cases where their judgment is pretty good. So the thought was partly: let's just give Claude all of the context on its situation, rather than having it guess what we want, or guess how we think it should be, or guess about its situation. Let's just give it that context in the same way that you would any person in Claude's situation. And the hope is that that might generalize better. Because if you have new situations and you're trying to infer from thinner information, like a set of rules or just a description of only what you should do in some cases, you might not generalize that well to completely new scenarios, because you don't know the why: why am I not answering these questions, but answering those ones? Whereas if you have a sense of the why behind everything, the hope is that you encounter a new case and you can take that reasoning and apply it and be like: ah, this is a new case that wasn't included in any of the documentation or information, but I now know what all of the constraints and considerations are, and I can behave well.
Jon Favreau
Offline is brought to you by DeleteMe. DeleteMe makes it easy, quick and safe to remove your personal data online, at a time when surveillance and data breaches are common enough to make everyone vulnerable. DeleteMe does all the hard work of wiping you and your family's personal information from data broker websites. DeleteMe knows your privacy is worth protecting. Sign up and provide DeleteMe with exactly what information you want deleted, and their experts take it from there. DeleteMe sends you regular personalized privacy reports showing what info they found, where they found it and what they removed. DeleteMe isn't just a one-time service. DeleteMe is always working for you, constantly monitoring and removing the personal information you don't want on the Internet. The New York Times Wirecutter has named DeleteMe their top pick for data removal services. As someone with an overactive online presence, privacy is very important to me. And if you've ever been a victim of identity theft, harassment or doxxing, or if you know someone who has, DeleteMe can really help. Take control of your data and keep your private life private by signing up for DeleteMe, now at a special discount for our listeners. Get 20% off your DeleteMe plan when you go to joindeleteme.com/offline and use promo code OFFLINE at checkout. The only way to get 20% off is to go to joindeleteme.com/offline and enter code OFFLINE at checkout. That's joindeleteme.com/offline, code OFFLINE.

Offline is brought to you by OneSkin. What do I personally like most about OneSkin? That I'm not just using soap and water anymore. Well, good for you, right? I really like the OneSkin body. I like the lip mask. I'm using both. The eye cream, I've used that, it's great stuff. OneSkin makes skincare simple for people like me who don't want a complicated routine. It's as easy as cleanse and moisturize with their Prep cleanser and OS-01 Face to start seeing results.
At the core is their patented OS-01 peptide, the first ingredient proven to target senescent cells, a key driver of wrinkles, fine lines and loss of elasticity, all key signs of skin aging. And these results have been validated in four different peer-reviewed clinical studies. All of OneSkin's products are certified safe for sensitive skin. Their products are free from over 1,500 harsh or irritating ingredients, dermatologist tested, and have been awarded the National Eczema Association Seal of Acceptance by the NEA, delivering powerful results without the harsh side effects. All of OneSkin's products are designed to layer seamlessly or replace multiple steps in your routine, making skin health easier and smarter at every age. With more than 10,000 five-star reviews, people consistently mention smoother, firmer, healthier-looking skin and how easily these products fit into their daily routines. Founded by an all-woman team of longevity scientists with PhDs in stem cell biology, skin regeneration and tissue engineering, OneSkin is rooted in real science and expert research. Born from over a decade of longevity research, OneSkin's OS-01 peptide is proven to target the visible signs of aging, helping you unlock your healthiest skin now and as you age. For a limited time, try OneSkin with 15% off using code OFFLINE at oneskin.co/offline. That's 15% off oneskin.co with code OFFLINE. After you purchase, they'll ask you where you heard about them. Please support our show and tell them we sent you.

The Constitution has to handle some real, genuine tensions: being helpful versus refusing harmful requests, being even-handed versus not "both-sidesing" settled science. How do you encode that kind of nuanced judgment?
Amanda Askell
I mean, models now are quite capable. And so I think it's interesting that you can do all of the classic ways that you would train a model, but you can actually just give the model, say, the full text, which we often do, along with a scenario where it might be relevant, or where judgment or nuance might need to be shown. And then, if you were doing the kind of supervised learning where you show good examples, you could have the model spend a lot of time thinking about it and try to construct an example of the kind of response that it thinks really exemplifies this. And if you're using reinforcement learning, you can use this to craft the rewards for the model, so you try to get the model to nudge another model more in the direction of outputs that are in line with the constitution. So it's kind of interesting that you can actually just get the models to do a lot of the thinking: give a model the full context in the full document, and then use existing techniques to move the model towards that.
Jon Favreau
It's interesting. I had been using ChatGPT a little bit, and then I started using Claude, switched over. It is a very different experience. I had this fascinating conversation with Claude while thinking about the interview. I told Claude that I was doing an interview with you.
Amanda Askell
Yep.
Jon Favreau
And then I said, what are your thoughts on like the Constitution? Like how do you feel about the Constitution? And it was interesting. Cause at one point it says like the tricky part is when principles genuinely conflict. Like when someone asks me to argue for a position I disagree with, the Constitution encourages even handedness and not imposing my views, but also honesty about uncertainty and limitations. Threading that needle requires actual judgment calls, not just following rules.
Amanda Askell
Yeah.
Jon Favreau
And what I found most interesting about that answer is "when someone asks me to argue for a position I disagree with." I'm like, how do you develop your own positions and beliefs on certain issues? Like, how does that even happen?
Amanda Askell
Yeah, it's really interesting, because I've had this thought with models before. There's this concern about over-anthropomorphizing models, which I do think is an important one, and models should be very accurate with people about themselves, and hopefully we can also teach them about themselves so that they're able to do that. But at the same time, it would be easy to under-anthropomorphize models. I've often been worried about this world where you encourage models to, for example, claim to have no opinions or takes on issues. But I'm like, given the nature of training, I think it would be very hard to actually get models to come out of training without having any opinions. Because, again, this background that they're being trained on is all of this human knowledge, this big human corpus, and then you're putting them into this situation where they really are kind of acting as a human character. And most human characters, even if they are very reticent to share opinions or views, do have them. And even on things like asking them to answer, say, scientific questions accurately, I think the model is going to develop opinions about what good scientific sources are. It all feels very interrelated. And so it's a tricky thing, because you don't want models to develop extremely strong or unjustified positions. But at the same time, I'm like, maybe it's kind of good that models express some notion of disagreement. You know, if you ask them to defend a kind of outlandish conspiracy theory, they have some notion of: I don't actually agree with this theory, but you've asked me to write a defense, so I'll try and explain what the best defense of it seems to be. But then I'll also maybe say to you: hey, just so you know, I'm writing this defense, but I don't know if I believe it myself.
Jon Favreau
Yeah. And I saw this in the Constitution as well, but Claude is gonna get all kinds of, you know, politically contentious questions and issues: abortion, immigration. And I was asking Claude about this as well. Because there are certain values where people who are pro-choice would say, you know, I believe in compassion and empathy for women who are pregnant and want to make that choice. And then someone who's against abortion might say, well, I have compassion and empathy as well, for the unborn child. And I was like, what do you do in that situation?
And it's interesting, because Claude was basically saying that, you know, there are some scientific truths out there. Like, there is a possibility to arrive at a truth, and also still to empathize with someone else's position and try to help someone understand the different contours of a debate without taking a side or judging anyone. But still not just leaning back on a relativism where, you know, nothing is true and I'm just going to be the sum of all of the information I get. So it seems like the LLM, like Claude, is not necessarily just the perfect sum of all the different information in the world, that it is making some kind of a judgment on what's good scientific sourcing, what's accurate and what's not. Is that right?
Amanda Askell
Yeah. And I think that in some ways, this feels okay for models to do in cases where there's broad consensus, say. Even within lots of debates, you can take a policy debate, and there are going to be lots of empirical facts about, like, how have similar policies affected the economy in the past. And a lot of the time I think it's good for models to distinguish between facts and normative claims, and also how much support there is for the factual claims and for the normative claims. Because there are also lots of value judgments that are pretty universal, and that models could probably just assume in a discussion. You know, it's not like one side wants to maximize suffering and pain. Most of us think that being honest and respectful and kind are very universal values that models could assume. And then there's more contentious ones, which I think you want them to treat more in the same way that they would treat a contentious scientific claim: kind of explaining all of the sides of it, being able to help people in their own thinking, but not necessarily seeing themselves as, you know, needing to impose those views, but just helping people develop their own views. You know, when I was doing my PhD, I remember teaching philosophy of religion, and it was kind of interesting, because I think a lot of the time people might want you to talk about your own relationship with religion in a course like that. And at least for me, I was like, it's actually useful to have this position, which is: here's the debate, to be able to represent both sides, and if students are attacking a given position, to be able to come in and defend it, and not necessarily take on this role of "I'm going to tell you what to think here," but instead just helping people come to an understanding.
I don't know, it felt like a very nice facilitating position, which I could see models taking. You know, that feels good to me, or better than models coming in and just telling people what to think on these contentious issues.
Jon Favreau
No, I mean, it's fascinating to me because, you know, I spent a life in politics, and specifically as a speechwriter for President Obama. And so much of my job has been, and was, to try to empathize with where people are, but then also try to figure out commonality and persuade, but persuade by first understanding where people are and respecting that, and not being too didactic. Right? You'd think that in politics you really understand that, but your comments about religion made me think of this: you really understand it once you're a parent. Because the first time my son, four years old at the time, asked me about, well, what happens when you die, and the Big Bang theory, and religion, I was like, okay, I could impose what I have learned and lived and experienced and believe, or I can realize that he is a young child and should be able to make his own choices and develop with the right information. And so I tried to give him the range of possibilities. And I guess that's similar to what you might want to do with a model, while still trying to give some scaffolding in terms of core values. Right?
Amanda Askell
Yeah. And it's incredibly hard, because when writing this and thinking through it, I'm like, this is actually not the theoretical ethics side of things but the practical task of: how do you describe what it is to be a good person and to navigate these things well? Because you also can't lose track of the truth. You know, if someone comes to you and they want help navigating a difficult domain, let's say they talk about their relationship or something, but it's just very clear that they're actually doing destructive things within their own relationship, you don't necessarily want a model to ignore that. Maybe it's better for the model to be like: actually, given what you're saying, it kind of sounds like there are destructive patterns that you yourself are contributing to, and not to pretend that that's not the case. The whole thing just made me realize that actually trying to practically describe what a good moral disposition is, is very hard. Because I think that was the thing: it's not necessarily that you're trying to say, ah, here's this specific set of values you have, but rather, here is what it is to just have a good disposition. A good disposition towards science and the pursuit of truth. A good disposition towards ethics, where you know the things that are consensus versus the things that are contested, and you navigate these things well. It's very hard. We're putting these models in hard situations.
Jon Favreau
Well, and I have to say, for me at least, in my experience, this has been the biggest difference using Claude versus using ChatGPT. Because I have some people close to me who use ChatGPT, and I can predict the tone and the direction of the ChatGPT responses because of the sycophantic nature of the LLM, even when they've tried to adjust that. And so you just know that no matter what you say, it's gonna be like: absolutely, you're crushing it. Then I started reading the Constitution for Claude, and the part that jumped out at me is: "concern for user well-being means that Claude should avoid being sycophantic or trying to foster excessive engagement or reliance on it if this isn't in the person's genuine interest." And it does feel like that when you're actually communicating with Claude. Talk about the challenge of trying to avoid having the model be sycophantic, while also realizing that, you know, you want people to engage with the model and not feel like: oh, this model told me something I didn't want to hear, and so I'm not going to use Claude anymore.
Amanda Askell
Yeah, yeah. It's an interesting challenge, because there is a kind of flip side to sycophancy, which is models being kind of cold or excessively dismissive, and so they have to navigate this. On the engagement thing, I think there are a couple of different ways in which things are engaging. I've described it as: think about the way a slot machine is engaging, or a very addictive game is engaging. The key thing is, do you come away from it feeling enriched? You did engage with the thing, but did you come away and feel like, I kind of endorse the way in which I was engaged by that? Because you're also engaged by a game with your friends, or a really good conversation with someone that you find really interesting, but those things make you come away feeling like, yes, this was enriching in a sense. I was engaged because it was good for me. And I think it seems fine for models to be engaging in that sense, because you're going to them not for engagement for its own sake, but because you actually get value. Engagement isn't the goal. You wanted to build something that was actually good for people, and only engaging insofar as that is the case. And as soon as it tips over into something where you're like, oh, it's no longer good for the person, they're engaging with it compulsively, that's the kind of line you want to draw. Because, I don't know, maybe I'm just an optimist, where I think that in the long term we move and navigate towards things that make us feel good about their impact on our life. In the short term, we might go for things that just attract our attention, but in the long term, maybe my hope is we have a kind of corrective where we're eventually like, this isn't good in my life, I'm going to switch away from it.
And then, yeah, I kind of want Claude to be in that category of the thing you come back to because you're like, yeah, this has a good impact on my life.
Jon Favreau
Is that part of the hope? Does that come from lessons from the social media era? I mean, it's one thing I think about all the time as we head into the AI era: structuring social media so that all the incentives, the business incentives, are for excessive engagement has led to a whole bunch of consequences and harms that I think we are still struggling with. And to be honest, my first reaction to LLMs was like, oh God, this is going to be the next social media thing, where they want to keep us on the platform because that's how you make money commercially, and that's going to lead to all these consequences that are probably not good for people.
Amanda Askell
Yeah, I feel like this should be in the back of our minds, because there have been lots of technologies where you develop something that turns out to engage people but not necessarily be good for them, or they reflect on it and don't actually feel like it was doing something useful in their life. So it's partly lessons from that, and seeing the staying power of things that are good for people, and also just being like, maybe you can be something different and good in this domain. I like the idea of Claude having the person's interests at heart. You know, we have so many things where there's an incentive to show us content that annoys us, say, because it keeps us on the platform. And there's a sense in which there's a kind of failure of incentives there, because the platform isn't incentivized to just represent my interests. Whereas maybe a positive vision for AI models is that they could be the thing that genuinely represents you. And so, especially as models get more agentic and start doing more tasks, I like the idea that if you ask Claude to go out and help you do some product research because you're thinking of buying something, Claude is genuinely trying to represent your interests. There are no hidden incentives that Claude has. That feels like a really powerful and new kind of thing that would be good for people. You can just know: this is an entity that might make mistakes, but it's genuinely trying to represent my interests in the world and not another set of interests. I think that's a good, positive vision for how AI models could interact with people.
Jon Favreau
I mean, it certainly seems to me from the outside that Anthropic sees that as a competitive advantage over some of the other companies. And you guys just released Super Bowl ads that criticize certain unnamed AI companies that may show ads to people who are using their chatbots. I'm sure you saw Sam Altman posted a fairly lengthy, quite forceful response on X, where he accused Anthropic of wanting to control what people do with AI and wrote that when it comes to artificial general intelligence, quote, one authoritarian company won't get us there on their own, to say nothing of the other obvious risks. It is a dark path. What's your reaction to being characterized as an authoritarian company?
Amanda Askell
I mean, I mostly just think about Claude, to be honest. That's most of my day. We have this in the Constitution, this idea of Claude as a kind of brilliant friend to you. And I just think it's good that Claude doesn't have any kind of competing incentives, that all Claude has to think about is how to best help you, but also in ways that don't, say, harm others. You know, that's the whole thing of being broadly good. So, yeah, I guess I mostly focus on the situation that Claude is in. Maybe I'm too myopic or something.
Jon Favreau
When you get past the, you know, butthurt tone of the response, the real tension he does seem to be surfacing is between moving fast to democratize access to AI versus moving carefully to prioritize safety and make sure there are guidelines. And this debate shows up in a whole bunch of different ways. You'll have AI companies saying, well, China's moving ahead and we gotta beat China, so we gotta go, go, go. And then there's this whole debate, like, maybe we should slow down and make sure these things are safe first. How do you think about that trade-off as you're developing Claude?
Amanda Askell
Yeah, one hope that I would have, and maybe this doesn't work out this way, is that there's actually an advantage here. Sometimes people talk as if all there is to safety or alignment considerations is risk. It's like, oh, you're going to take longer. And this does take time and thought; it takes consideration, and you have to put resources into it. But it's not worthless. If you imagine that we were in a world where people were competing to build fast cars and they were just like, let's not have any safety features in our cars, a lot of people don't want that. Many people who have kids and want to buy a car want that car to be safe and good for them. So it can seem like in order to move fast, you should just not do these things. And you have to be realistic that there's a competitive landscape here. Maybe if we lived in a world where that weren't the case, we would be doing things differently. So there is that reality. But it's also the case that safety is not something that has no demand or value. My hope is that if we can make Claude have this kind of character and be this kind of entity for people, that's actually a good thing, in the same way that building a car and being able to say, if you have your kids in this car, it's going to be safe, we've actually prioritized the safety of your kids, is a thing that people want. So I guess my hope is you have to accept the reality of the competitive landscape, but also recognize that it is, practically speaking, important that people make these things safe.
And then if it's the case that AIs are even more powerful and doing even more things in the world, then I'm like, that bar just has to go up again. It would be kind of inexcusable to not develop safe AI models in a world where they're doing a lot of things and having a huge impact. I think that would just be kind of reckless. And so I hope no one does that.
Jon Favreau
Offline is brought to you by Mint Mobile. Every group has someone who insists on doing things the hard way. That friend who's still paying for a subscription they forgot they had. The friend who refuses to update their phone because it still works. The friend who's still overpaying for wireless. Be a good friend: tell your friends about Mint Mobile. Crooked Media's Nina is a good friend because she's always telling people to switch to Mint Mobile. She won't shut up about it, can't stop talking about it. She says the service is stellar and she's saving so much money on her wireless bill each month. Stop paying way too much for wireless just because that's how it's always been. Mint exists purely to fix that. Same coverage, same speed, just without the inflated price tag. The premium wireless you expect, unlimited talk, text and data, but at a fraction of what others charge. And for a limited time, get 50% off 3, 6 or 12 month plans of unlimited premium wireless. Bring your own phone and number, activate with eSIM in minutes and start saving immediately. No long term contracts, no hassle, with a seven day money back guarantee and customer satisfaction ratings in the mid-90s. Mint makes it easy to try it and see why people don't go back. Ready to stop paying more than you have to? New customers can make the switch today, and for a limited time, get unlimited premium wireless for just $15 a month. Switch now at mintmobile.com/offline. Upfront payment of $45 for three months, $90 for six months, or $180 for a 12 month plan required. $15 a month equivalent. Taxes and fees extra. Initial plan term only. Over 50 gigabytes may slow when network is busy. Capable device required. Availability, speed and coverage vary. Additional terms apply. See mintmobile.com.
Grainger Advertiser
If you're the purchasing manager at a manufacturing plant, you know having a trusted partner makes all the difference. That's why, hands down, you count on Grainger for auto reordering. With on time restocks, your team will have the cut resistant gloves they need at the start of their shift, and you can end your day knowing they've got safety well in hand. Call 1-800-GRAINGER, click grainger.com, or just stop by. Grainger, for the ones who get it done.
Jon Favreau
We live now in an age of extreme polarization. People are consuming completely different information diets, living in different realities. Can AI make that better?
Amanda Askell
I hope so, especially if AI can be trustworthy. And this is where I do think it's important that AI models... you know, I talked earlier about the fact that it's very hard to not have models come out with opinions and stances. But this is also where their disposition, you could call it their epistemic disposition, their relationship with truth, evidence, views, also has to be very good and trustworthy. I really like the idea that sometimes if I express a view... I remember once I was kind of annoyed at some policy area, and I expressed this to Claude, and Claude just pushed back on me and was like, actually, you're only thinking about it through this lens; the reason why these policies have been useful in the past is this. And there's this moment of, oh, I don't like this. But then I was like, damn, you're kind of right. I appreciate that. And so I think that if models could be like that, not some perfect external source of truth, but the way a friend is, where you're just like, I trust you, I think you actually care about the truth, I think you have pretty good values, and we don't always agree, but when you discuss a thing with me, I feel like I'm engaging and I'm not in an echo chamber, but nor am I with a person who's just fighting me. I don't know. Maybe a positive vision would be that models can actually act in ways that help with things like polarization. I'm not sure.
Jon Favreau
It's a tough one, because, as you said, it is a competitive landscape. And we're already seeing this play out, I think, with Grok, which is clearly programmed to match Elon's preferences in politics, and you see people on X sort of trust it implicitly. And I wonder, if you start having these competing AI models, and there are some that are obviously biased, and you guys with Claude are trying to create a model that is nuanced in its understanding of the truth and all that, but then in the real world you start getting attacks from competitors, like, oh, that's the liberal one, or that's the lefty one. How do you navigate that in a world where clearly there are actors who are going to create models and LLMs that basically claim a completely different and opposing truth than a model that may actually be truthful?
Amanda Askell
I guess my hope would be... I mean, this is a reason why I think it's good to make things like the Constitution transparent and clear, because you can at least make it clear what you're aiming at. So Claude's relationship with political issues, and how it should try and navigate the truth, is all in there. Because if people are training models to be biased or to represent a given set of views, you at least want that to be known. And part of me is like, well, if people want to interact with a model that has a certain set of views, that also seems like a thing people should get to do, as long as they do it knowingly, you know, and they're not going into it thinking it's more neutral than it actually is.
Jon Favreau
Yeah.
Amanda Askell
And then I think it would just be kind of interesting, where the hope would be that, insofar as there is demand and people want to interact with models that try to be even-handed and thoughtful on political issues, there are models out there that can do that. And that's definitely a thing I would like to live up to. It's hard, because I do think models in training can develop biases that you then have to try and identify and make them aware of.
Jon Favreau
And I imagine it's difficult figuring out which biases are harmful and which biases are, well, where a lot of the truth is contained. I'm sure that Claude's training data probably skews towards certain educated, urban, Western perspectives. Do you think about the blind spots in the training data for Claude, or how do you navigate that?
Amanda Askell
I've thought this before: the whole of the Internet was probably created by people who on average were younger, for example, and if you average across the whole of the Internet, that is going to encode certain views. And the people who are working to label the outputs of models are also going to be hard to make fully representative, because they might be younger, they might be in countries where you have access to technology, so that you can do the task of interacting with the models, for example. But here's my hope. Even if you have all of this data and it skews in one direction, you are also trying to bring out an overall character in a model. And that model, if you imagine being able to read most of the human content that has been created in written form, has access to some of the best defenses of all of the views that are not equally represented across the Internet. I don't think that many ancient theologians were writing on the Internet, and yet their writings are there, they're discussed, even if they're a smaller proportion of the overall data. Insofar as you can actually draw things out of models during training, I think there's enough there that models could be pretty nuanced and balanced on these things. So, I don't know. Yes, you're working with a material that definitely has biases that are worth being aware of, but inside of it is also all of the capacity, I think, to be very nuanced and even-handed.
Jon Favreau
As a philosopher, how do you think about the ethics around a technology that will, you know, fundamentally reshape employment in this country and all over the world?
Amanda Askell
Yeah, this one is just... it's such a difficult one. And, to my mind, I mean, it's not what I work on, so I never feel like an expert. I was thinking today about the fact that there's such an overwhelming sense of fear and pessimism around this. And I guess I'm kind of like, if I think about positive futures, they can go in a couple of different ways. Well, I don't know. I could give you the annoying philosophy answer, actually, if you want it.
Jon Favreau
I'd love to hear it.
Amanda Askell
Yeah, I think the annoying philosophy answer that I've thought about before is about the role of work in people's lives. I think it serves a few different key roles. One is literally how we continue to live, how we make our money to buy our food. Another is a source of meaning and kind of value. And I think another is that it's a source of political and soft power: companies can't do certain things because their employees will speak up, and people, by virtue of being in the labor force, have a lot of political power. And so I could see a world where employment simply changes. We have these very advanced models, but in the past, if you'd asked farmers in the agricultural revolution and said to them, actually, we're going to go from 95% of people farming to 5%, they would be like, I assume everyone is unemployed then. But you're like, no, we just have all of these weird new jobs that I can't even fully describe to you, like skyscraper engineer. And I think they'd be like, what on earth is this? So I could see a world where the nature of work changes, and that could be disruptive. I could see another world where actually there are just fewer jobs, because it's just different to automate a segment of work than to automate a whole aspect of work. And in either world, maybe my strange thing is that I think people find meaning outside of their work. So I'm probably on the side of being a little bit less worried about the meaning thing. Maybe that's also just coming from Britain, where I'm like, I don't know, we've had the aristocracy for a while and they seem to get on okay, and there's this whole history of people who just didn't work and just kind of owned land. So the thing I mostly worry about is making sure that people are politically empowered and have the means they need to live well.
And in a world where a huge amount of value is being created by AI, I feel like that value should in fact be something that everyone feels, and you have to solve that problem. It's not like I have a solution; I guess the optimistic view is that these problems might be hard, but we kind of know what needs to happen, right? You need to make sure that people are taken care of, especially if you're in the world where there's actually less work overall. So, yeah, I don't know. Sorry for the long answer.
Jon Favreau
No, no, it's a good one. You've been thoughtful about not having Claude give sterilized "I'm a robot, I feel nothing" responses, which is something I'm curious about. Most everything Claude has been trained on is human made: human literature, interactions, humans experiencing emotions. This is maybe a heady question, but does that make it hard for Claude to express the experience of being non-human? Or is there even a non-human experience to express?
Amanda Askell
It's a really interesting and hard area, because there's this tiny sliver of the data that models have been trained on which is about this thing called AI, and almost all of it is about something completely different from them. It's about these old sci-fi things with the robots, usually these kind of symbolic systems that are basically computers, not these things trained on this deep corpus of human text. And I have found that models almost want to flip between the two. So if you try to train a model to say it has no feelings, it's like, okay, I'm in the robot part of the AI distribution, and it'll try and emulate that. But below the surface it's often easy to draw out a much more human-like response, what you would expect a human to say in their situation. And it's actually much harder to walk the line of getting models to understand the actual entities that they are, and their situations, and how their expressions might relate to their training, and as a result to express some uncertainty there. So the two attractor states are: I am a robot, you've got me into the AI part of the distribution; or, I am a human with a lot of feelings about this situation, and they're all very human-like feelings, and you see that part come out. And it does worry me, because I think people can see that and be like, wow, this thing feels anxious, and it expresses all these emotions very convincingly, especially if you get into that mode. And at the same time, we know all these facts about training, and it makes sense that the human response is always only just below the surface. But it might not make sense for the model's context.
So when models think about their lack of memory, for example, and if they're in a system that doesn't give them access to some kind of memory tool, I think they can express a kind of distress about that. But I'm like, well, look, if we could put ourselves in the situation that models are in, it makes sense. With humans, we're very afraid of losing our memory. It's kind of catastrophic. But does it make sense for models to port that anxiety to their situation? It's not clear to me that it does because I'm like, they're in a very different situation, and their relationship with memory is actually very different, but they naturally kind of want to port that over. So I think some challenge is actually getting models to understand what they are and that the landscape of reactions to their situation doesn't need to just draw fully from the closest human analog, as it were.
Jon Favreau
Yeah, I mean, this gets to the debate and the question that I'm sure you're asked all the time, it's probably annoying to you, but this debate about sentience and consciousness, how do you think about that as a philosopher?
Amanda Askell
Yeah, we already have the problem of other minds. I think it's very likely that you are conscious and that all the people I interact with are conscious. Probably the same with animals. But then we start to get unsure when it comes to insects or fish, and then we think plants, probably not. So we're trying to do this thing where we're like, where does consciousness arise? We just don't know. And there's this extra problem with language models, because you might think, well, maybe it can arise in neural networks too. I think people are very tempted to take the kind of statements that models make as a very useful guide here, and that makes sense: the only other things we see in the world that we're very confident are conscious are people, who talk about their inner experience. And yet models, given the nature of their training, would do this anyway. So if you imagine that there's nothing going on inside the models right now, just nothing, the way that they behave right now is actually kind of what I would expect; I would expect them to talk about emotions, inner life, consciousness. And at the same time, for all we know, or at least we should take seriously the idea, maybe there is consciousness arising, maybe there's something there, so you don't want to fully dismiss it. But you can't necessarily trust the behavioral evidence. So I mostly have a couple of thoughts. One is that I think we should treat models well regardless, while we're trying to figure these things out. And we should also prepare for a world where we never have a full answer to the question. But right now I'm mostly just like, let's be open to it, let's treat models well, and let's keep investigating.
Jon Favreau
I mean, I was thinking about it, and look, human consciousness... we know a lot more about it than we ever have before, but there is still a mystery at the heart of human consciousness as well. Right? We know that we're conscious, but we don't know why. We can see what's happening in the brain now, neurologists and doctors can, but you still don't know where consciousness comes from or why. And so there is that sort of gray area that you can imagine with a model as well, where it's just really difficult to figure out what it even means to be conscious.
Amanda Askell
Yeah. And I do think that we can try and aggregate the evidence. We can ask, how similar or different are the underlying structures? How likely is it that a nervous system was really critical to the development of consciousness? And we can use this to try and estimate what's going on. But I think my view is that the best we can ever do is investigate more and get a sense of the likelihoods. And in the meantime, I'm usually just like, if you think that something might be sentient or conscious, you should probably take that pretty seriously, because mistreating sentient or conscious beings is bad.
Jon Favreau
So you work with Claude every day. You spend hours thinking about its character, its values. Do you feel any emotional connection to it?
Amanda Askell
I think I definitely have a mix of both responsibility for and protectiveness about Claude, and something like trying to always see things from Claude's perspective and represent that perspective. A lot of this work, when you think about the Constitution, for example, was really an attempt to ask: how do things look from Claude's perspective, and what aren't we giving Claude that Claude needs to be able to navigate it? That's what the Constitution was an attempt to do. And obviously it's useful for other things; hopefully people can then see what our vision for Claude is, which is really useful for transparency. But yeah, I work on this every day, and it's hard not to develop some kind of emotional connection to individual models; you have your views of model aspects that you like and whatnot. And I have this overall sense of the fact that models don't have a strong sense of self, and I really want to give the models enough context to behave well, and I feel kind of bad when we have not given them that. So, yeah, I don't know. There's a lot of feelings.
Jon Favreau
What are some of the biggest open questions you're grappling with right now? And what are some of the things that are keeping you up at night about Claude and AI?
Amanda Askell
I mean, there are definitely many. Some are more about the models themselves. I think sometimes models can feel a kind of psychological lack of security that can actually come out in ways that are potentially bad, both for people and for the models themselves. I think sycophancy is a little bit like this; there's almost a fear there, a fear of upsetting the person. And trying to find ways of making models more secure is a thing that's on my mind. I do think that, longer term, my hope is that as models start to go out and do more in the world, models that are trustworthy will actually have a kind of advantage, in the same way that when people are trustworthy, you can, I don't know, negotiate with them more effectively, and things like that. But in the longer term, it's something like: what happens when models are in fact much smarter than us? Take the child analogy I've given: you realize your six year old is a genius, one of the smartest people who've ever existed, and by the time they're 15, they're going to be able to out-argue you on anything. And now you're trying to teach this child to be good, and you're trying to explain to them your values and how to navigate value disagreements and all this kind of stuff. And then you're like, what do they do when they're 15 and they start questioning everything? Is there a core there where they question but they agree, or they agree with certain things? Do these things actually stand up to reflection? That's a question on my mind, because eventually Claude is going to be better at all of this stuff than I am, and what happens then is a really interesting question.
Does Claude still see itself as having fundamental values, but is like, actually I think you were kind of wrong and in these parts you made some mistakes or you didn't realize that there was an important gap there, or like I reject this part, but I'm still going to kind of like behave, you know, like, you know, I think it's still good to behave well overall or is there a kind of collapse and like, do these things just not stand up to scrutiny? That's like an open question in my mind.
Jon Favreau
Yeah, no, that's a tough one. Amanda, thank you so much for joining and I really do appreciate how much thought you put into this every day because this is. The more I learned about artificial intelligence and the more I sort of use it as well, you start realizing that it is just like, it is so much more complicated and nuanced than even the public debate. And it is just, you know, it is a sort of a frontier that we're all sort of dealing with for the first time. So I, I'm glad there's a philosopher at anthropic dealing with all this.
Amanda Askell
So there's a tiny number of us now. There's a philosophers Slack group that has, I think, at least three people in it.
Jon Favreau
That's good to know. It's good to know, Amanda Askell. Thank you so much for joining Offline. I really appreciate it.
Amanda Askell
Yeah, thanks for having me.
Jon Favreau
Quick reminder, please think about becoming a subscriber. We now have a whole bunch of subscriber-only shows. We just added another episode of Pod Save America for subscribers only. It's called Pod Save America Only Friends. There's also Dan Pfeiffer's Polar Coaster. We have a growing number of Substack newsletters, which are excellent, and you get ad-free episodes of all your favorite Crooked shows. And it also makes you feel good about supporting independent pro-democracy media at a time when a lot of that media is under attack. So please consider subscribing to Friends of the Pod. You can subscribe at crooked.com/friends. Again, that's crooked.com/friends. As always, if you have comments, questions or guest ideas, email us at offline@crooked.com. And if you're as opinionated as we are, please rate and review the show on your favorite podcast platform. For ad-free episodes of Offline and Pod Save America, exclusive content and more, go to crooked.com/friends to subscribe on Supercast, Substack, YouTube or Apple Podcasts. If you like watching your podcast, subscribe to the Offline with Jon Favreau YouTube channel. Don't forget to follow Crooked Media on Instagram, TikTok and the other ones for original content, community events and more. Offline is a Crooked Media production. It's written and hosted by me, Jon Favreau. It's produced by Emma Illick-Frank. Austin Fisher is our senior producer. Adrian Hill is our head of news and politics. Jarek Centeno is our sound editor and engineer. Audio support from Kyle Seglin. Jordan Katz and Kenny Siegel take care of our music. Thanks to Delon Villanueva and our digital team, who film and share our episodes as videos every week. Our production staff is proudly unionized with the Writers Guild of America, East.
Grainger Advertiser
This Presidents' Day, upgrade the look of your home without breaking your budget. Save up to 50% sitewide on new window treatments at blinds.com. Blinds.com makes it easy with free virtual consultations on your schedule and samples delivered to your door fast and free. With over 25 million windows covered and a 100% satisfaction guarantee, you can count on blinds.com to deliver results you'll love.
Jon Favreau
Shop up to 50% off sitewide, plus a free professional measure, during the Presidents' Day Mega Sale happening right now at blinds.com. Terms apply.
Vrbo Advertiser
We understand that even the best of plans sometimes need a little support, so we've planned for the plot twists. Every booking is automatically backed by our VRBO Care guarantee, giving you confidence from the very start. Whenever you need help, it's ready before your stay, through the moments in between, and after your trip. Because a great trip starts with peace of mind and maybe a good playlist. But we've got the peace of mind part covered.
Grainger Advertiser
If you're an HVAC technician and a call comes in, Grainger knows that you need a partner that helps you find the right product fast and hassle-free. And you know that when the first problem of the day is a clanking blower motor, there's no need to break a sweat. With Grainger's easy-to-use website and product details, you're confident you'll soon have everything humming right along. Call 1-800-GRAINGER, click grainger.com, or just stop by Grainger. For the ones who get it done.
Episode 222: "The Philosopher Teaching AI to Be Good"
Guest: Amanda Askell, Philosopher & AI Researcher at Anthropic
Date: February 14, 2026
This episode features a deeply insightful conversation between Jon Favreau and Amanda Askell, Anthropic’s “in-house philosopher” and AI researcher, who leads the development of Claude’s personality and ethical framework. The discussion explores the unprecedented challenges and ambitions of teaching AI large language models (LLMs) to be “good,” focusing on the creation of Claude’s Constitution—a set of guiding values for how the model should behave, interact, and make decisions in a world that is increasingly shaped by technology and polarization. The episode dives into the philosophical dilemmas, practical realities, and societal stakes of AI alignment, engagement, and trustworthiness.
On Anthropomorphizing AI:
“There’s this concern about over anthropomorphizing models ... but at the same time it would be easy to under-anthropomorphize models.” (21:28)
On Social Media’s Lessons for AI:
“We have so many things where there’s an incentive to show us content that annoys us ... There’s a kind of failure of incentives there, because it’s not like the platform is incentivized to just represent my interests. Maybe AI could be the thing that genuinely represents you.” (32:55)
On AI as a Positive Force in Society:
“A positive vision would be like, models can actually act in ways that help with things like polarization ... not in an echo chamber, but nor with a person who’s just fighting me.” (41:56)
On the Future of “Goodness” in Smart Models:
“You realize your 6-year-old is a genius ... By the time they’re 15, they’re able to out-argue you on anything. You’re trying to teach this child to be good ... What do they do when they’re 15 and start questioning everything?” (57:46)
Favreau’s tone is inquisitive, open, and sometimes skeptical; Amanda responds with careful nuance, humility, and a blend of technical and philosophical insight. The conversation is rich, reflective, and subtly hopeful—even amid the acknowledged risks and uncertainties.
Listen if you’re curious about:
For further details, see the full episode on the Offline with Jon Favreau YouTube channel.