
A
Why did one large company decide to create a data fairness tool that's free and publicly available? Find out on today's episode. I'm Alice Xiang from Sony, and you're listening to Me, Myself and AI.
B
Welcome to Me, Myself and AI, a podcast from MIT Sloan Management Review exploring the future of artificial intelligence. I'm Sam Ransbotham, professor of analytics at Boston College. I've been researching data analytics and AI at MIT SMR since 2014, with research articles, annual industry reports, case studies, and now 12 seasons of podcast episodes. On each episode, corporate leaders, cutting-edge researchers and AI policymakers join us to break down what separates AI hype from AI success. Hey listeners, thanks again to everyone for joining us. Today I'm talking with Alice Xiang. She's the global head of AI governance at Sony and lead research scientist for AI ethics at Sony AI. She leads a team that guides the establishment of AI governance policies and frameworks across all of Sony's business units. She's been a research scientist and a whole bunch more. Alice, thanks for joining us.
A
Thank you so much for having me.
B
Hey, to start, we first talked a few years ago when you were at the Partnership on AI, but I'm curious what you're up to now. Can you tell us about Sony's work on responsible AI ethics and governance?
A
Yeah, sure. Sony is a large multinational company headquartered in Japan with a diverse array of businesses around creative entertainment and technology. So we have operating companies focused on music, motion pictures, video games and electronics. AI became an early focus of ours back in 2018, when we first set up our AI ethics guidelines. As a technology company, we wanted to ensure that this new and emerging technology was being used responsibly across our business units, and since then it's only grown in importance for our company. I have two hats on at Sony. One is as global head of AI governance. We were one of the early companies to start investing in AI ethics, and when I joined, I established our AI Ethics Office and our processes for evaluating different AI uses, and AI integration into products and services, for responsible AI. Now we're at the point of thinking not just about AI ethics but also about AI governance: how do we establish frameworks for the responsible evaluation of these technologies that are increasingly being integrated into every aspect of business? So that's one hat: policymaking, guidance setting and so on. The other is leading our AI ethics research team within Sony AI. A lot of my team's work over the past several years has looked at the fundamental gaps and barriers that practitioners face in developing responsible AI in practice. One of the major ones is the lack of ethically sourced data, even for pretty basic things like checking models for bias. There aren't really great fairness evaluation data sets in many areas, like human-centric computer vision. So we've been doing a lot of work there, and that recently culminated in the publication in Nature of our Fair Human-Centric Image Benchmark, also known as FHIBE.
And so we really hope that our work can help enable the broader community, both within Sony and outside, to be able to move towards more trustworthy and responsible AI development.
B
Yeah, I got pretty excited when I saw the FHIBE work. I think it's pretty interesting. It's definitely a problem, and I'm glad there are a few people working on it. Give us some details: what exactly is involved in FHIBE, what are the pieces, and how do I use it tomorrow if I want to?
A
It's so interesting because when I first got into this AI ethics space around computer vision, I kind of assumed that a lot of my role would be helping people figure out exactly how to do fairness assessments. It's a pretty difficult area: how do you measure fairness, what does fairness mean, what do you do when you find biases? But I realized the biggest initial barrier folks have is just being able to evaluate for bias at all. Theoretically, what you want there is a data set that's been ethically sourced. That means there's been appropriate consent, compensation and sourcing throughout the process: everyone who participated in the data collection has been appropriately compensated, has consented to their data being used, and has control over how that data is used. For bias evaluation in particular, you also want a very diverse, ideally globally diverse, data set, because you don't want to be checking for bias when all of your subjects have light skin, for example. In that case, you really wouldn't be able to tell whether your model performs well on darker skin tones. So it was quite easy to say what would need to go into this. But when you actually look at the data sets that are available, it turned out that the standards in the field were quite low. Computer vision played a major role in the deep learning revolution, and part of that was these web-scraped data sets that were massive and relatively cheap to source because they didn't involve any of these considerations of consent and compensation. And even though the field has progressed for many years now and the technology has gotten better and better, that baseline of relying on problematically sourced data sets hasn't changed a lot. So a lot of our work in the past several years has been thinking deeply about how we actually do this in practice.
It's very easy to say, yeah, like, please collect data from people around the world, please ask them for consent, please pay them, and then please make a rigorous benchmark that can be used to check a lot of different types of AI models. But that's much more difficult to do than it is to say. And that's what a lot of our project has been about.
B
Yeah, I think that's always the case with this. I think no one comes out there and says, hey, I'd really like to be unethical and let's have our company have terrible governance. I mean, it's kind of like data cleaning in some way. I mean, everybody knows that they want clean data, but it's easy to say and hard to do. Let's start with measuring bias. How does that manifest itself in corporate America in terms of mistakes?
A
Yeah, great question. For the most part, I think there's a lot more awareness of bias than there are good practices for actually measuring and mitigating it. So when folks don't do this, what ends up happening is that technologies are released that maybe don't work well for certain subpopulations. That's especially true in areas like human-centric computer vision, which is used in surveillance and law enforcement contexts, but also in very everyday contexts like unlocking your phone, making payments on your phone, or going through border control. These are all areas where, if the technology doesn't work well for you, at minimum it's an inconvenience. Maybe you have to look at your phone several times at different angles before it recognizes you. But this can also lead to much more problematic impacts, anything from financial fraud to folks being wrongfully arrested. So this is an area where I think everyone knows it's a problem and everyone wants to make it better. No one actually wants these technologies to perform poorly for people. But if you don't have good ways to measure bias in the first place, there's no way you're going to be able to mitigate it.
B
AI isn't just a technology shift; it's a leadership test. The real challenge isn't adopting AI but knowing how to apply it responsibly, productively and at scale. Led by MIT faculty at the forefront of research and practice, MIT Sloan Executive Education offers a portfolio of AI-focused courses, in person and online, designed for leaders who are building AI literacy across their organizations and rethinking expertise, productivity and governance in an AI-driven world. Learn more at executive.mit.edu/aismr.
Yeah, actually, I like your gamut of possible badness. I wouldn't be surprised if, when we added up the societal cost of all the tiny problems, there was actually a big number, even for something silly like the total productivity lost from looking at your phone three times instead of once. Actually, maybe I'll tell a story here. One of the things I do in class is use a data set of Star-Bellied Sneetches, if you remember them from Dr. Seuss. We're not discriminating against stars or non-stars, but other data in the data set is correlated with the stars, so even if you ignore the stars, you end up with biased data. In my class exercise, we end up making a bunch of biased models that inadvertently use the stars on the star bellies. What should students do from there? And if you're listening, students, don't copy this.
A
In terms of how FHIBE can help in this situation, there are two major components of FHIBE. One is the ethical sourcing component, which I talked about a little before: the consent and compensation. The other is making sure that you have a wide variety of diversity, and labels for different demographic attributes and other attributes as well. A few notable aspects of FHIBE were that we used self-reported demographic information, both to make our labels more accurate and so that we weren't relying on third-party annotators to guess, for example, whether this person is of this ancestry or that ancestry, this gender or that gender. That gets into a pretty dicey place pretty quickly. So self-reported demographics were really key. We also had extensive annotations about the environment, other physical attributes of the person, the cameras being used, all sorts of things. This really allows you to slice and dice a bit more to find the relevant types of biases you might be concerned about and, on a more granular level, to diagnose some of the underlying causes. For example, when we talk to folks in computer vision about something like skin tone, the problem might be the skin tone itself, or it might be issues of contrast, like with the background. So there are many different ways in which someone's race or gender, say, might manifest in visual artifacts that then make the model perform better or worse. That's really useful for folks trying to figure out how to improve their models. So using your example, we want to know which creatures have stars on their bellies and which don't, so we can see how they're being treated differently. But not just that dimension: we want to see a lot of other dimensions as well. And then, where we see those differences in the model's performance, why is that?
Why is the model having such a hard time on stars versus non-stars? And then we go from there in terms of further improvements.
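A minimal sketch of the kind of slicing described above, assuming hypothetical per-image evaluation records with invented attribute fields (`skin_tone`, `lighting`) and a correct/incorrect outcome. This is not the benchmark's actual schema or Sony's evaluation code; it only shows how per-group accuracy, and crossing two attributes, can help separate a skin-tone effect from a lighting effect.

```python
from collections import defaultdict

def accuracy_by_slice(records, attributes):
    """Per-group accuracy, grouping records by the tuple of the given attribute values."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for rec in records:
        key = tuple(rec[a] for a in attributes)
        totals[key] += 1
        correct[key] += int(rec["correct"])
    return {key: correct[key] / totals[key] for key in totals}

def max_gap(per_group):
    """Largest accuracy difference between any two groups: one simple disparity measure."""
    return max(per_group.values()) - min(per_group.values())

# Hypothetical toy results: one record per evaluated image (field names are invented).
records = [
    {"skin_tone": "lighter", "lighting": "bright", "correct": True},
    {"skin_tone": "lighter", "lighting": "dim", "correct": True},
    {"skin_tone": "darker", "lighting": "bright", "correct": True},
    {"skin_tone": "darker", "lighting": "dim", "correct": False},
]

for attrs in [("skin_tone",), ("lighting",), ("skin_tone", "lighting")]:
    per_group = accuracy_by_slice(records, attrs)
    print(attrs, per_group, "gap:", max_gap(per_group))
```

On this made-up sample, skin tone alone and lighting alone each show a 0.5 accuracy gap, while crossing them shows the single failure falls in the darker-skin, dim-lighting cell. That is the kind of question richly annotated evaluation data lets you ask; with four records it is illustrative only, not statistically meaningful.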
B
Yeah, I like that because it seems like you're pointing out a couple of different things. One, I think when we talk about these things, it tends to go down a path of, oh, get more data. And certainly there's nothing wrong with getting more data; I think we all would like more data. But it's not always practical or possible. And you actually mentioned a whole different category: the ethical issues around the ways you get it, and that's true as well. But one thing you're pointing out there is that the modelers can actually make some technical improvements here. I don't want us to say, all right, we give up, we'll just take whatever data we have and let the modelers figure it out, because that's never a good solution. But you are offering some hope here, and I guess that's some of what FHIBE is about: giving modelers a better data set to work with. Have you seen people pick up and start using this?
A
Yeah, we've seen a lot of great pickup. Even just in the first couple of weeks after FHIBE's release, there were over 60 different institutions downloading it: folks from academic institutions, industry and government. So it was great to see the wide swath of folks who were interested in using FHIBE. Our hope is that this can be an industry benchmark, both in lifting standards around responsible data collection in general, which can be true regardless of whether future data sets are collected for training or evaluation, for fairness-oriented purposes or not, and also, most immediately, in letting people start checking their models. Fairness evaluations can be very empowering in that they open up a lot of different avenues for model improvements. Like you mentioned, there's the possibility of collecting more data. There's also the possibility of rethinking the model's loss function: what is it being optimized for, and how can you train it to optimize a bit more for different groups and thus have more balanced performance? There are also possibilities on the nontechnical side. For example, if your model does not perform very well in certain lighting conditions, maybe you don't use that particular model for certain downstream uses. Or maybe you add mitigations: if it's on a device, maybe that device has a flashlight or something to illuminate the scene before it carries out its task. So on the mitigation front, it's always good to think about these models as embodied in real-life systems, because it's not just a matter of the model itself; it's everything around it that affects how ethical the use case is.
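One generic way to "optimize a bit more for different groups," as described above, is to reweight the training loss so each group contributes equally rather than in proportion to its size. The sketch below assumes a binary classifier with predicted probabilities and known group labels; it illustrates inverse-group-frequency weighting and is not a method the episode attributes to Sony or to the benchmark.

```python
import math
from collections import Counter

def log_loss(p, y, eps=1e-12):
    """Binary cross-entropy for one prediction p of label y (0 or 1), clipped to avoid log(0)."""
    p = min(max(p, eps), 1 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def average_loss(preds, labels):
    """Ordinary mean loss: large groups dominate the objective."""
    return sum(log_loss(p, y) for p, y in zip(preds, labels)) / len(preds)

def group_balanced_loss(preds, labels, groups):
    """Mean of per-group mean losses: each example is weighted by 1 / (its group's size)."""
    counts = Counter(groups)
    total = sum(log_loss(p, y) / counts[g] for p, y, g in zip(preds, labels, groups))
    return total / len(counts)

# Hypothetical batch: four examples from a majority group the model handles well,
# one from a minority group it handles badly.
preds = [0.9, 0.9, 0.9, 0.9, 0.2]
labels = [1, 1, 1, 1, 1]
groups = ["majority"] * 4 + ["minority"]

print(round(average_loss(preds, labels), 3))
print(round(group_balanced_loss(preds, labels, groups), 3))
```

Here the ordinary mean is about 0.41 and the balanced loss about 0.86: the minority group's errors now carry half of the objective, so training would push harder to fix them. Reweighting alone doesn't guarantee balanced performance; it is one avenue alongside more data and the nontechnical mitigations Alice lists.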
B
I like a lot about that, because I think most of the world would love it if you came out and said, all right, here's a magical answer to solve bias and fairness: just use this benchmark. What you're pointing out instead is a lot of different mitigation strategies, each of which, I'm guessing, is imperfect. A flashlight on a device will probably help but not completely solve the problem. But you add enough of these things together and we're improving. It's not, oh, we've solved the problem, but it's a step toward that. And actually, that makes me think of a phrase from something of yours I was reading: data nihilism. Maybe talk briefly about what that means, for listeners who didn't read that article, and we can tie it to what you were just saying.
A
Yeah, great. When we first started FHIBE, it was really an AI ethics problem, so we put together all these ethical desiderata, including consent and compensation. At the time, I would say there wasn't as much mainstream discussion about ethical sourcing of data for AI models. Nowadays, though, I think that has grown a lot, especially with the whole GenAI revolution. Now everyone's very conscious of the fact that their data and their content are probably being ingested by AI models somewhere, somehow. And I think that can lead to a sense of data nihilism, where folks feel like either we have these super powerful models, and at this point the cat's out of the bag and we just have to give up on all of our data rights and any control, or we try to reverse things, but maybe that's not possible. That's what I mean by data nihilism: the feeling that we're trapped in a dichotomy where we can't have the technology and also have any control over our data. What's really notable about FHIBE is that we sought to show you could source data in a more ethical way. Obviously, it's more difficult and more expensive, but FHIBE was the proof of concept that, at least at some scale, you can do this. And there are so many brilliant minds working in the AI space right now that if these practices are considered important, we can figure out ways to scale this, change how these technologies are developed, and preserve more data rights in the process.
B
Yeah, I get that whole idea of, well, it's out there; my images are out there; everything is out there anyway. But we can all make a bit of progress toward that, and if everybody does a little bit, it can help. I like that overall message. How does that reflect back into Sony products?
A
FHIBE is being used across Sony as well. Even before the launch, we tested it out with a number of our business units that are developing computer vision technologies, and it has become an important part of enabling them to do more fairness assessment. That way we're able to do that sort of diagnosis, see whether there are any failure modes in these models, and then work with the teams on possible mitigation strategies. It's been great to see that, because again, the public release of FHIBE is meant to encourage this happening everywhere in the industry. We hope it will put more attention on this issue and unblock folks so they can see that there may be some things to improve in these models before they go out. Hopefully that eventually becomes more of an industry standard, too. Because one thing that's quite difficult now is that there aren't always requirements to do these sorts of assessments. It's really up to individual business units or companies to decide that they care about this and want to assess for bias.
B
Yeah, that last point's a huge one, because if you give someone a questionnaire asking whether they want to do the right thing, everyone's going to tick the yes box. But in the end, we're all constrained by time and resources. And I admit I'm lazy. If there's a shortcut, I'm going to say, well, why don't I go down that path? And it certainly has been a shortcut within the community to go for this data that's just sort of out there, lingering. But one thing you've done is move toward making it easy. We had a previous guest, Ziad Obermeyer, who was working with medical data, and it's very hard for ordinary people to get a bunch of medical data to build models. His thinking was that Nightingale would collect a bunch of data and then separate the problem of analyzing the data from the problem of sourcing it. That's a lot of what you've done there. You mentioned that FHIBE is largely about image information. What about things like voice, sound and other modalities? How should we be thinking about those?
A
We focused FHIBE on human-centric computer vision precisely because it is one of the most sensitive areas. There's just so much biometric and other personally identifiable information available in images, and the IP rights around images are particularly important as well. I think other modalities, like voice, have similar challenges and considerations around the consent of the individuals being recorded, for example, and in certain contexts there might also be copyright considerations. But I kind of see it this way: FHIBE took on more of the superset of the ethical issues that might come up, and in most other modalities you'll see a subset of those. Obviously more research would be needed to apply them, but at a high level, the general concepts of consent, compensation, privacy, IP, diversity and fairness would all apply in these different contexts. So we hope some parts of this can be recycled by folks collecting data in other modalities.
B
Yeah, I like that. At the principle level, in some sense it's all just a stream of ones and zeros, so how we interpret it is largely up to us. Maybe I'm self-motivated here, because I've never been able to get through a phone tree successfully; nothing recognizes my beautiful Southern accent. But that's a problem I face. I guess we'll have to wait for the next version of FHIBE to help out with that.
A
Yeah, it's a great example of how important diversity in these evaluation data sets is. You can also imagine some of the challenges in terms of how to identify folks with different accents, how to classify those, and how to source not just from around the world but from very specific regions in order to get that kind of diversity, with all of the different languages layered on top. It's definitely also a very rich area of research, and we hope FHIBE helps inspire more of it. I think the reality of ethical data collection is that a lot of it is difficult, challenging work that requires this level of operationalization and thinking about real-world challenges, which researchers often shy away from. It's not as glamorous as coming up with a new way for models to learn. Algorithmic improvements are usually much faster to publish, or maybe not always faster to develop, but at least people get very excited about technical improvements. That said, when we talk about responsible AI, like you mentioned, it's really hard to make progress if we don't actually think about the real world: how do you collect data from different folks?
B
Actually, we had a guest on a few weeks ago from Wendy's who was working on their drive-thru restaurants, and that's all voice information. I remember Will talking about how many different ways people speak and how many challenges they faced in that process. But going back to FHIBE, it shows that this can be done, and it's a good example of a process you can follow. I want to switch here a little bit and talk about how you got there. When I last talked to you, you were at the Partnership on AI. Tell us a little bit about your career path, how you ended up at Sony and how you got interested in these things. Take us along that path.
A
I feel like it's sort of a winding path. Overall, my background is very interdisciplinary. I started out more on the math and economics side, thinking I was going to work in economic policy; that was my main target. Then my first stint working at a tech company and developing my first ML model was very illuminating, because that's when I first got interested in algorithmic bias: I realized the model I myself was developing was quite biased. The data was skewed toward certain geographies, and what the model was learning made a lot of sense for people in those geographies but not necessarily elsewhere. Part of it was also my personal experience. The data I had access to was primarily from folks on the East Coast and West Coast, and this was job-related data. Coming from Appalachia myself, I knew that the economic opportunities people had there were quite different, and any model built on this data was not necessarily going to work best for folks like those I grew up with. That concerned me a lot. I talked to a lot of folks in the company about this, but at the time there was no such thing as a field of algorithmic bias; this was about 12 years ago now. But it got me really interested in this area: how do we think about ethics and justice in the context of these new technologies that are learning from massive amounts of data? That steered my career more into the tech policy space, and the AI and algorithmic fairness space in particular. After finishing all of my graduate programs (I have graduate degrees in law, statistics and economics), I worked a bit at a law firm and then went to the Partnership on AI, since I wanted to really focus on this issue of algorithmic bias. There I started up a research lab focused on these areas.
And pretty quickly, these issues of data availability became quite salient. If you look at the algorithmic fairness literature, there's a lot of emphasis on metrics: how do you measure bias, how do you mitigate it? Because those are the really fun technical problems. But at PAI, we were doing these multistakeholder convenings, and convening after convening, all of the companies were saying, yeah, we're actually struggling with that first step. If we want to collect data that has demographic information in it, our privacy team will just shut that down. But if we don't have any demographic data, it's really hard to do any sort of fairness assessment. Going back to the star-belly example, if we don't even know who has stars on their bellies and who doesn't, how are you going to tell whether your model works well for creatures with stars on their bellies? So that became a major research focus for me: what can we do in that context? When I moved to Sony, I thought this was a great opportunity not just to report on this issue and try to impact it on a policy level, but to create a solution to the problem. That's what inspired FHIBE, since it's one thing to say, yeah, this is a problem people need to do something about. It's another thing to create a data set that fills the gap.
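The star-belly catch-22 can be made concrete with a toy example. Assume a hypothetical feature, `roast_invites` (a nod to the frankfurter roasts in the Dr. Seuss story), that happens to correlate with having a star. A rule that never reads the star attribute still treats the two groups differently, and the only way to see that is to keep the star labels for evaluation. All names and numbers here are invented for illustration.

```python
# Each hypothetical Sneetch: whether it has a star, plus a feature correlated with the star.
sneetches = [
    {"has_star": True, "roast_invites": 9},
    {"has_star": True, "roast_invites": 7},
    {"has_star": True, "roast_invites": 6},
    {"has_star": True, "roast_invites": 4},
    {"has_star": False, "roast_invites": 2},
    {"has_star": False, "roast_invites": 3},
    {"has_star": False, "roast_invites": 6},
    {"has_star": False, "roast_invites": 1},
]

def star_blind_model(sneetch):
    """Approves based only on roast_invites; has_star is never read."""
    return sneetch["roast_invites"] > 5

def approval_rate(group):
    """Fraction of the group the model approves."""
    return sum(star_blind_model(s) for s in group) / len(group)

# Evaluation step: this comparison is impossible without the has_star labels.
stars = [s for s in sneetches if s["has_star"]]
plain = [s for s in sneetches if not s["has_star"]]
print("star-bellied:", approval_rate(stars))   # 0.75
print("plain-bellied:", approval_rate(plain))  # 0.25
```

Dropping `has_star` from the model's inputs did not remove the disparity. Discarding the labels entirely would only remove the ability to measure it, which is the bind between privacy and fairness assessment described above.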
B
I like that you brought out the catch-22, because I think that's something we don't always appreciate: we have to collect a bunch of seemingly invasive information in order to assess whether or not we're making decisions based on seemingly invasive information. It's an inherent problem.
Today on our branded interview segment, I'm talking again with Shayan Mohanty, chief data and AI officer at global technology consultancy ThoughtWorks. Shayan, thanks for joining us.
C
Thank you again so much for having me.
B
It's been a few months since we last spoke and obviously a lot has changed. We considered a few topics to chat about today. Where do you want to start?
C
There's a lot to talk about. I think the world has changed over even just the last couple of months. Some topics that keep coming up are agentic AI governance and sort of the agentic operating system, if you will. To that point, how do you enforce certain patterns or behaviors in agents? How do you control them? We saw the advent of OpenClaw and the rampant agentism, if you will, and what happens if you have no avenues of control. I think control and really fast iteration are very much what we're hearing about at the moment, and the two play off each other quite a bit.
B
They play off each other, but they seem in many ways antithetical. How do you have both control and rapid iteration at the same time?
C
Yeah, so what's really interesting is that I think we rederived roughly the same structure as a classical operating system without having started there. So bear with me; this is kind of a weird analogy, a history of operating systems. If you think about governance, it's typically just this really dirty word, especially in software engineering circles. You don't really want to be in the middle of a governance motion or process; you think about governance as a bolt-on after the fact. But what's really interesting is that all of our operating systems were designed with governance principles as a first-class citizen. Think about identity as a first-class citizen: certain things can be run, certain things can only be read, and certain things can be written only by certain people or certain processes. We have the concept of namespacing, for instance, or tenancy. We have the concept of isolation. These are not bolt-ons. And because we have these primitives, we as software engineers don't have to reinvent them on our host machines. To a certain degree, we don't have to think about lower-level security implementations or what is or isn't allowed to happen, so long as we build things with the right kind of cleanliness. The point here is that we hear roughly the same things as we talk about agents. How do we contend with agent identity? Should agents just inherit the identity of a user, or should they have their own identity? How do we think about permissions? Should an agent be allowed to do X, Y and Z? How should we reason about the composability of all those policies and permissions? How do we think about memory access? How do we think about isolation and tenancy? These are all things we're coming back to in a really interesting way, and everyone is reimplementing, reinventing one slice of it at a time.
But an operating system is not composed of seven disparate pieces that all get glued together. It's designed with all of them in mind, and it all comes up together. So we're kind of seeing the advent of that at the moment, which I think is really interesting.
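To make the operating-system analogy concrete, here is a minimal sketch of three of the primitives just listed: a separate agent identity, permissions as explicit grants, and composition of policies when an agent acts for a user. It assumes a hypothetical model in which permissions are (action, resource) pairs and an agent's effective rights are the intersection of its own grants and the user's, with deny-by-default and an audit log. The class and principal names are invented; this is not ThoughtWorks' design or any agent framework's API.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Principal:
    """A hypothetical identity: a human user or an agent with its own name."""
    name: str
    grants: frozenset  # allowed (action, resource) pairs, e.g. ("read", "crm")

@dataclass
class AgentSession:
    """An agent acting on behalf of a user, with every decision logged."""
    agent: Principal
    on_behalf_of: Principal
    audit_log: list = field(default_factory=list)

    def effective_grants(self):
        # Compose the two policies: the agent may do only what BOTH principals may do.
        return self.agent.grants & self.on_behalf_of.grants

    def request(self, action, resource):
        # Deny by default: anything not explicitly granted is refused.
        allowed = (action, resource) in self.effective_grants()
        self.audit_log.append((self.agent.name, self.on_behalf_of.name, action, resource, allowed))
        return allowed

user = Principal("dana", frozenset({("read", "crm"), ("write", "crm"), ("read", "email")}))
agent = Principal("inbox-agent", frozenset({("read", "crm"), ("read", "email"), ("send", "email")}))
session = AgentSession(agent, user)

print(session.request("read", "crm"))    # True: both principals allow it
print(session.request("write", "crm"))   # False: the agent was never granted it
print(session.request("send", "email"))  # False: the user can't delegate what they lack
```

Intersection is one conservative choice for composability; a real system might instead support scoped delegation, time-limited grants, or per-tenant namespaces, all of which this sketch leaves out.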
B
Definitely interesting. How have people reacted to these ideas so far?
C
Actually, we hosted a whole bunch of tech luminaries in Deer Valley, Utah, recently to celebrate the 25th anniversary of the Agile Manifesto. The discussion was all about the future of software engineering, but at the same time it was not about software engineering, if that makes sense; it was software engineering as an analogy for everything else that is changing. One of the things that kept coming up was the need for verification and validation in atomic units. So we talked about what a programming language designed specifically for agentic use would look like.
B
There you go. That's a great example. We designed everything in the past for humans, and now we're talking about a world where we need to design interfaces with the machine as a consumer. Shayan, thanks for sharing your perspectives here. Where can our listeners go to learn more?
C
You can visit us at thoughtworks.com, where we write a whole lot about governance and compliance and all sorts of other very fun agentic things.
B
One of the things we do here is a quick segment where I ask some rapid-fire questions, and you just answer with the first thing that comes to your mind. What's moving faster about artificial intelligence than you expected, and what's moving slower?
A
That's a good question. I guess what's moving faster is just how much it's being integrated into everyday life, especially in companies. It's been interesting to see, just in the past couple of years, how everyone's KPIs have suddenly become about AI adoption. That would have been hard to predict five to 10 years ago. As for what's moving slower: in general, the actual implementation of AI ethics in model development. The field has been around for several years now and there's a huge literature, but it still often feels very divorced from the practitioners who would implement these techniques. Again, that's part of the motivation for FHIBE, to create that incentive. But yeah, it's something I haven't seen as much of across the field.
B
I have to say I'm not a bit surprised by your answer about slower, given your background. What's the worst use of AI? How are people using this technology poorly?
A
I'd say contexts where there isn't sufficient human expertise or oversight, but very important decisions are being made. I say it somewhat abstractly because folks will often point to specific types of high-risk use cases, like HR, healthcare, or criminal justice, and I think it's a little more use-case specific than that. You can theoretically have a good use in those domains if you have proper human oversight, with proper training for the individuals and proper oversight of the technology. But where it's being used in a very autonomous fashion, and there's either no one providing oversight or the people providing oversight don't have the knowledge or expertise to do so, that's where I would be most worried.
B
How do you personally use these tools?
A
Not as much as maybe you'd expect. I feel like I spend a lot of time auditing them. So probably that's the most common use case for me.
B
I'm not sure that counts as a normal use case.
A
Yeah, time-wise, probably the most common is me trying to audit them and see where they might have failure modes. But I also use them in my daily life, to check on the health of my plants, for example. So yes, I will say they are quite useful in contexts where you're a novice and want a starting point for diagnosing things.
B
One question we used to ask people was what career they first wanted. Given the variety of your background, let me just ask: what did you want to be when you grew up?
A
When I was a little kid, I wanted to be an artist. So in a way, I feel like my current work is starting to nudge back in that direction, with the way AI development has gone and a lot of these AI ethics issues now actually coinciding with artists' rights, which was not at all the case when I first entered this field. That is kind of nice, because I have always been really interested in art as well.
B
I was not expecting that. All right, well, this has been fascinating. I think the kinds of efforts you're making are about making it easier for people to do the right thing, and PHOEBE is one example. As long as societally we make it easier for people to do the right thing, then people will be more likely to do the right thing. So I'm thrilled at the amount of effort and work that you all put into doing this. Thanks for taking the time to talk with us today.
A
Yeah, thank you so much for having me. And hopefully our work can help inspire others to do the same.
B
Thanks for listening today. I hope you'll download PHOEBE and see if their data can help. On our next episode, I'm joined by Taylor Stockton, my former student and chief innovation officer at the U.S. Department of Labor. Please join us.
A
Thanks for listening to Me, Myself and AI. Our show is able to continue in large part due to listener support. Your streams and downloads make a big difference. If you have a moment, please consider leaving us an Apple Podcasts review or a rating on Spotify, and share our show with others you think might find it interesting and helpful.
Host: Sam Ransbotham (MIT Sloan Management Review)
Guest: Alice Xiang (Global Head of AI Governance and Lead Research Scientist, AI Ethics at Sony)
Date: March 10, 2026
This episode explores the critical challenges of data fairness in artificial intelligence, featuring a deep dive with Alice Xiang from Sony. The conversation focuses on the motivation, development, and impact of Sony's publicly available fairness benchmark for human-centric computer vision, PHOEBE, and broadens into a discussion of ethical data sourcing, bias evaluation, and making responsible AI practices a business norm, not just an aspiration.
[01:22–03:57]
“As a technology company, we wanted to ensure that this new and emerging technology was being used responsibly across our business units… We really hope that our work can help enable the broader community … to move towards more trustworthy and responsible AI development.” – Alice Xiang [02:00–03:57]
[03:57–06:31]
“It was quite easy to say what would need to go into this. But when you actually look at the different datasets available, it turned out the standards in the field were quite low.” – Alice Xiang [05:08]
[06:31–12:46]
“No one actually wants these technologies to perform poorly on folks. But if you don’t have good ways to measure bias… there’s no way you’re going to be able to then further try to mitigate that.” – Alice Xiang [07:50]
[12:46–17:13]
“We hope this will put more attention on this issue and unblock folks to be able to see… there maybe are some things that could be improved in these models before they go out.” – Alice Xiang [17:00]
“This sort of feeling … that we really can’t have the technology and also have any sort of control over our data. … PHOEBE was kind of the proof of concept that at least on some scale you’re able to do this.” – Alice Xiang [15:56]
[19:28–22:24]
“Some parts of this can be recycled … for folks collecting data in other modalities.” – Alice Xiang [20:32]
[22:24–23:03]
“A lot of it … requires this level of operationalization … that researchers shy away from. … Algorithmic improvements … people get very excited. … But when we talk about responsible AI … it’s really hard to make progress in these areas if we don’t actually think about real world.” – Alice Xiang [21:44–22:24]
[23:03–26:25]
“If we want to collect data that has demographic information … our privacy team will just shut that down. But then … it’s really hard to do any sort of fairness assessment.” – Alice Xiang [24:49]
On the Status Quo Problem:
“Computer vision is a field that played a major role in terms of the deep learning revolution. … But that baseline of relying on problematically sourced data sets hasn’t necessarily changed a lot.” – Alice Xiang [05:22]
On the Real-World Stakes of Bias:
“At minimum, it’s an inconvenience… But this can also lead to much more problematic impacts, anything from financial fraud to folks being wrongfully arrested.” – Alice Xiang [07:17]
On Incentivizing Fairness in Industry:
“Without necessarily requirements always to do these sorts of assessments, it's really on individual business units or companies to decide that they care about this and want to assess for bias.” – Alice Xiang [18:00]
[31:20–34:41]
Through the PHOEBE dataset, Sony (led by Alice Xiang’s team) pushes the industry forward by providing a practical, ethically sourced benchmark for fairness evaluation—proving that responsible data collection is possible and impactful. PHOEBE’s immediate uptake reflects a hunger for these resources. The episode closes with an emphasis on making it easier for companies and individuals to “do the right thing” in AI, pushing for these practices to become industry standards rather than outlier initiatives.
“As long as societally we make it easier for people to do the right thing, then people will be more likely to do the right thing.” – Sam Ransbotham [34:41]
For more information on PHOEBE, listeners are encouraged to visit Sony AI or MIT Sloan Management Review resources.