
We cover OpenAI's latest video-generation model, Sora 2, and the concrete harms and potential risks from deepfakes.
B
Welcome back to another episode of the AI Policy podcast. This week we're going to be talking all about deepfakes and the release of OpenAI Sora 2. So I'm Sadie McCullough and I'm joined, as always, by Greg Allen.
A
Great to be back, Sadie.
B
So let's just jump right in. On September 30, OpenAI released Sora 2. And I remember on this podcast in February 2024 talking about the original Sora release. So an interesting full circle moment. But Sora, as many of you probably know, is a state of the art video generation model, resulting in tons of AI generated content on the Internet and social media. In addition to some of the very odd videos of Sam Altman that were released this weekend, and many others, Sora 2 has reinvigorated the debate around the risks of deepfakes. So we're going to dive into this today. Let's start with Sora 2. Can you walk us through some of the details of its release?
A
Sure. So, to begin, deepfakes have been with us for a long time, but when they first came out, you did require some kind of expertise in order to generate these things. You had to download various programs, sort of get them to work, upload your own training images, et cetera. Sora stands out from the trend that we've observed over the past seven years or so, both in terms of the ease and the quality of the video that it's generating. And, you know, we're calling it state of the art. It's broadly comparable to some other tools that Google and others have put out there. But as soon as something goes live on ChatGPT and OpenAI, you're exposing it to that user base, which they claim is now up to 800 million individuals weekly. So that is just a huge population of folks who are being exposed to the opportunity to generate these videos and the ability to consume these videos. It also comes with some special capabilities that are designed to be, you know, fun and useful to users, including this cameo feature, which is meant to be a safeguard against deepfakes that replicate a likeness without a person's consent. And that's where you scan the user's face and perform a liveness check, providing data to generate a video of the user, but also to authenticate their consent for other folks to use that likeness on the app. So hypothetically, you know, if I was to go in and scan my face and basically say, I'm the real Greg Allen, anybody who's trying to generate stuff on Sora using this face, ask me for permission before you let it go live. Now this has already been bypassed, ironically using deepfakes, by organizations such as Reality Defender. But I do think it points to the sort of arms race between the people who are creating superior AI enabled generation technology, deepfake creation technology, and those who are creating superior authentication technology. And I think we've been dealing with this problem really for hundreds of years.
But I would say after the invention of the photograph, the authenticators had a significant advantage and edge over the forgers. It's not that photographic forgery was impossible. Joseph Stalin, the authoritarian dictator of the Soviet Union, employed an entire team whose only job was to create fake photos. But the point was that the photograph was such a great technology for capturing truth so effortlessly, so easily available, and the forgery technologies were so expensive and complicated, that truth had a much, much easier time than did, quote unquote, unreality or deception. And so Sora is sort of the next evolution in this generation capability becoming so much more capable. If you haven't played around with it yourself, there is kind of a visceral delight and also a stomach queasiness as you see yourself in these videos. You know, one of the demo videos is an individual jumping out of a parachute, sorry, jumping out of a plane with a pizza for a parachute. And it looks pretty dang realistic, with the caveat that, right, that's not something that ever happened and probably wouldn't ever happen. And the other thing that's interesting about Sora, in addition to it sort of coming out with new categories of authentication technologies, is also its effort to, you know, create sort of a social-media-esque feed of this type of content where, you know, you can just consume these AI generated videos for as long as you want, because the whole universe of people is creating huge amounts of them.
B
So I can see how this would, you know, be a cause for concern. And a lot of these concerns that we're hearing about Sora aren't actually new. Experts and policymakers have been discussing these risks of deepfakes for years. So what are some of the most concrete ways that deepfakes have caused harm to date that you can talk about?
A
Yeah, so I mean, I'm among the people who've been discussing it for years. I wrote a report in July of 2017 that was talking about deepfakes as a challenge from a political disinformation perspective and from an economic perspective. And I think those hypotheses have come true. Fortunately, they have not come true in the worst case scenario predictions. But we've had a number of pretty disturbing incidents that show that the safeguards that we have right now, both culturally and from a policy perspective, are not putting us in an especially safe situation. So let's talk about the various harms that exist today as a result of where we are in deepfake technology. I think the first thing is that it's just kind of a leveling up of pre-existing categories of cybercrime. So we've been dealing for decades with fraudulent emails, people sending you fake texts, purporting to be someone or something that they are not in an effort to scam you out of your money. So deepfakes didn't create that problem. That problem existed in the email era, it even existed in the snail mail era. But what deepfakes have allowed is for that flavor of crime to emerge into new domains of audio and video. And I think an especially remarkable case of this happened in February of 2024, when CNN reported that, quote, a finance worker at a multinational firm was tricked into paying out $25 million to fraudsters using deepfake technology to pose as the company's chief financial officer in a video conference call. The elaborate scam saw the worker duped into attending a video call with what he thought were several other members of the staff, but all of whom were in fact deepfake creations. So imagine getting on a conference call with a bunch of your coworkers and they look like your coworkers and they sound like your coworkers, but surprise, they're actually people impersonating your coworkers in real time.
And that scam was credible enough, effective enough, that this guy lost $25 million of his company's money. That's pretty alarming. And even in July 2019, I remember the cybersecurity firm Symantec said that they had multiple clients who had lost millions of dollars through these sort of, at the time it was audio only, phone calls. But basically they're analogous to the email phenomenon of spear phishing. But they were sort of upgraded by deepfake technology to the audio domain of phone calls, et cetera. And so I think we just have to realize that we're moving from a world of seeing is believing, even in real time, you know, digital communications, to one where the sources of trust are going to have to come from other things than an individual's likeness in face and voice. It's going to have to come from, like, cryptography. Right? The way that you know an email, Sadie, is from me is because it comes from the CSIS server and it's cryptographically secured, and there's a lot of, like, fancy math that is checking to ensure that it's only allowed to come from my email address when it's actually coming from me. Those types of safeguards are now going to have to be built into companies all around the world, government organizations around the world, into their habits when they're thinking about these kind of conference call things. There's just a lot of that kind of muscle memory that needs to be built up. So that's on the crime side of the equation. There's also the political interference side of the equation. So starting with just impersonating political figures. In July, the Washington Post reported that, quote, an imposter pretending to be Secretary of State Marco Rubio contacted foreign ministers, a US Governor and a member of Congress by sending them voice and text messages that mimic Rubio's voice and writing style using artificial intelligence powered software, according to a senior U.S.
official and a State Department cable obtained by the Washington Post. So we're now already in the stage where either foreign intelligence services or pranksters are meaningfully able to use these technologies as part of just interference in government operations now for criminal intent. Now, sometimes this has happened in the past just with, like, voice impersonators. I remember this happened related to, I think it was vice presidential candidate Sarah Palin. Somebody successfully impersonated her voice and was able to do something. So it's not that this problem is entirely unprecedented. It's just available to a much broader range of actors, and the enabling degree of quality is much, much higher. And you come back again to, you know, how do you know that it's authentic? Organizations have to build in new procedures for authentication that are probably beyond what they're normally used to. And that's going to be a pain in the butt. But, like, too bad, that's what we all have to get used to. This is the world that we live in. So that's one flavor of government interference. The other one is political interference, you know, trying to create a scandal, trying to drive voter behavior. And we have some pretty noteworthy examples of all of this. And I want to start with something that's quite old, or actually, no, I'll come back to that later. But in January 2024, there was a series of robocalls impersonating then-President Joe Biden that was encouraging voters to skip New Hampshire's primary elections. A consultant working for a Republican campaign initiated these calls. Now, the individuals in question were ultimately caught. I think they had to pay something like a $6 million fine. And the Federal Elections Commission published additional regulations clarifying, hey, this was already illegal because of the deception involved, but now we're making it sort of secondarily illegal for the use of AI specifically.
And the point here being that, like, there is a time sensitive aspect to all of this. There's an aphorism that is often erroneously attributed to Mark Twain that says a lie can travel around the world while the truth is still getting its pants on. And I think there's a lot of wisdom in that quote, because it talks about the sort of time sensitive nature of all of this. You know, that phone call, that robocall impersonating Joe Biden went out to tens of thousands, hundreds of thousands of individuals, and they heard that. And there was a long period of time between them hearing that, very close to election day, and them finding out it was fake, which may be after election day. And these kinds of time sensitive things have not yet hit the, like, absolute disaster stage. But I want to talk about, like, one scenario where I think they plausibly could. So in 2022, after Russia's invasion of Ukraine, there was a deepfake of President Zelensky of Ukraine surrendering and instructing his fellow citizens and military forces to also surrender and basically just let Russia take over Ukraine. Now, this was made with deepfake technology that was available at the time, which it turns out was pretty bad. And so it was, you know, recognizable even to the untrained eye that there was sort of something wrong with this video. But, you know, rewind history and imagine that there's something that's like Sora or better than Sora that's being used to generate that video. And I think a lot of people would have a hard time understanding that what they're looking at is fake, for sure. And certain moments in history have depended upon really small windows of time and critical injections of information at specific moments. You know, one example that looms large in my own thinking is in 2015 or 2016, I forget which, there was an attempted coup, an attempted military coup in Turkey.
And actually the sort of hinge moment in preventing the coup from succeeding was when the President of Turkey, Erdogan, did a live broadcast. And the broadcast was a camera aimed at an iPhone. And the iPhone was a FaceTime call with President Erdogan. And that is when he rallied his population to basically say, we can withstand this coup. I'm still alive, I'm still safe, and I need all of you to support me in resisting. And the key there is that they had the sort of critical information, Erdogan saying that he was safe and that he wanted everybody to resist, that he was going to keep fighting. And also the sort of critical moment, you know, the window of hours when the population was looking for a reassuring message. And then also access to distribution mechanisms, which in this case was a broadcaster. You can imagine, you know, a parallel universe in which, let's just say it's Russia, right? Let's say Russia hacks a major broadcaster and then uses a deepfake and, you know, it's, oh, why is it low quality? It's because we're, you know, videoing a FaceTime call with the President. So in that alternative universe, right, if Russia had hacked Ukrainian broadcasters, if they had used modern deepfake technologies, if they had released the surrender video at a critical window for influencing public opinion, you can imagine that a very sophisticated version of this type of attack, as opposed to the clumsy one that Russia actually tried to pull off with respect to Zelensky, might actually work, especially if, you know, Russian intelligence services knew that Zelensky was in a bunker with, you know, temporary loss of access to communications or something like that.
So all of this is just to say that if you look at the sort of string of data points that we have right now, and a lot of them are just anecdotes, but if you sort of look at the string of anecdotes we have right now, I think it is enough to say that there is, like, no reason to draw comfort from the fact that disaster has not yet struck. To the extent that disaster has not yet struck, I think a lot of that has to do with luck. A sufficiently sophisticated actor who is thinking about the time, place, message, and distribution channel that they want to inject still has a very powerful window of opportunity to make some bad stuff happen. And that's where I sort of think we are today. Sora, to its credit, right, includes not only the cameo feature that we mentioned, which is mostly about individuals protecting their own likeness, but also, you know, features whereby there's a watermark on the video that dances around. So it's not effortless to get rid of that watermark; it does require some time and effort. And then there's metadata associated with the video where it sort of discloses, I am an AI generated video, to software that knows to look for that. None of that makes it impossible to use Sora for bad purposes, but it is about raising the barriers to entry of bad behavior, basically saying, you know, I don't want 13 year olds able to misuse Sora. If Russian intelligence services, you know, spend millions of dollars and months trying to figure out how to use this for bad, okay, that's awful. But at least we made them work hard. And I think that's kind of where we're headed in terms of the arms race between authenticators of media and forgers of media, unfortunately.
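As a rough illustration of the kind of machine-readable disclosure and cryptographic checking described in this answer, here is a minimal Python sketch. It is not Sora's actual metadata format, which the conversation does not describe. The field names, the `SIGNING_KEY`, and the `make_manifest` and `verify_manifest` helpers are all hypothetical, and a shared-secret HMAC stands in for the public-key signatures that real provenance systems would more likely use.

```python
import hashlib
import hmac
import json

# Hypothetical shared secret for illustration only. A real provenance
# system would more likely use public-key signatures, not a shared key.
SIGNING_KEY = b"demo-key-not-for-production"


def make_manifest(video_bytes, generator):
    """Build a signed label that says 'this video is AI generated'."""
    claims = {
        "ai_generated": True,
        "generator": generator,
        # Binding the label to a hash of the content means the label
        # cannot simply be copied onto a different video.
        "sha256": hashlib.sha256(video_bytes).hexdigest(),
    }
    payload = json.dumps(claims, sort_keys=True).encode("utf-8")
    signature = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return {"claims": claims, "signature": signature}


def verify_manifest(video_bytes, manifest):
    """Return True only if the label is untampered and matches the video."""
    payload = json.dumps(manifest["claims"], sort_keys=True).encode("utf-8")
    expected = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    signature_ok = hmac.compare_digest(expected, manifest["signature"])
    content_hash = hashlib.sha256(video_bytes).hexdigest()
    content_ok = manifest["claims"]["sha256"] == content_hash
    return signature_ok and content_ok
```

In this sketch, editing the label (for example flipping `ai_generated` to false) or swapping in different video bytes makes verification fail. Stripping the label entirely is still possible, which is consistent with the point that these measures raise the barrier to misuse rather than eliminate it.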
B
So those are some pretty concerning and compelling examples that you gave. And I can't help but think CSIS is a national security think tank. And like, can you walk us through, like, what you think some of the biggest risks are when it comes to US national security?
A
Sure. I mean, just take the cybercrime example I gave and the media, sorry, the Marco Rubio example I gave, and smash them together. If you have individuals who are in charge of critical national security decisions, whether at the staff level or at the senior principal level, and you can imagine a sort of sophisticated injection of the right content at the right time to deceive those individuals, you can imagine them making really bad decisions. Like, I'm just making this up. And this is obviously a worst case scenario, you know, just for illustrative purposes. But imagine somebody who is an officer who does like missile warnings type behavior and they receive a conference call request from somebody who they know and trust, basically saying like, hey, there's a satellite launch that's going to fly an unusual trajectory, but it's totally legit and you don't need to worry about it. And like, no, it actually wasn't a satellite launch, it was a missile launch. And that person has now been strategically deceived to sort of let their guard down.
B
That's a scary thought.
A
Yeah, I mean, again, I'm sort of giving you the worst case scenario, but that's one type of thing. And fortunately, you know, classified communications already take place in sort of heavily encrypted ecosystems. But a lot of world leaders are interacting on commercial platforms. There was a very famous intergovernmental meeting during the COVID era where they were just using vanilla Zoom, and somehow somebody had let the URL to get into that Zoom chat float out on the Internet. And somebody looked in a screen grab and saw that that was the URL. And so random individuals were breaking into this Zoom chat that was, you know, the meeting between a bunch of different heads of state. And so the point is, commercial platforms are used for important national security communications, not backstopped by robust encryption or identity authentication procedures. And I think all of that is just not safe in the information ecosystem that we're, you know, running into. And like I said, you know, we've had enough close calls to say that we need more defenses, more widely deployed. And definitely not all of this is on the AI companies, right? OpenAI is trying to build in additional defenses. Google is trying to build additional defenses into the content generators themselves. That's necessary, but it's not sufficient. Other platforms need to be doing this. One of the things that I would be very interested to see is if social media platforms like Facebook, like X, like LinkedIn, would build in authentication procedures so that when you're looking at something on a social media feed, you could just click a button and it would run an analysis and say this is likely to be AI generated. Because we're already in a world in which the untrained eye is certainly not enough to detect if something's a deepfake in most instances, and oftentimes even the trained eye or the trained ear is not enough.
Like, I have not run this experiment on myself, but I'm familiar with journalists who have listened to deepfakes of their own voice and been fooled as to whether or not they actually said that. And so the point here is, really, digital authentication technologies are sort of the last defense in determining whether stuff is real or false. And when I was in the Department of Defense, DARPA was funding a lot of research into making these techniques better and better, because obviously the U.S. intelligence community and the U.S. defense community really need to know if some piece of intelligence that they obtain is real or fake.
B
So I don't love hearing about the worst case scenarios anymore. So I guess, you know, let's switch over to political implications, which probably won't be much better, but we can give it a try. So obviously, you know, a big topic, especially around election cycles, is the concern about deepfakes undermining the political system. So could you walk us through what some of your concerns are in this regard?
A
Yeah, and I already touched on some of these just by implication when talking about the Biden example. But the point here is really around democratic elections and the ability of citizens and voters to know what is truthful. And again, you know, Sora is not creating this problem, but Sora and capabilities like Sora are putting high quality generation technologies into the hands of a much broader range of actors. And there have already been attempts to use that for political manipulation. And sometimes that is for, you know, very malicious purposes, you know, trying to smear someone by saying that they said something offensive that they never said. But sometimes it's, you know, satire, which is protected by the First Amendment. So in the same way that, you know, you can't send someone to jail for drawing a political cartoon that a politician doesn't like, well, what if AI draws a, quote unquote, political cartoon, right? And it's an ultra realistic cartoon, but it's AI deepfake technology. That's kind of where we're struggling, and politicians are now embracing the use of deepfake generation technology. Not least of which is Donald Trump, the President of the United States. He is now using AI generated stuff and using it, I think he would say, for humor, right? He's disseminating these images because he thinks they're funny, because they sort of viscerally make a point that he wants to make about his own strength, the silliness or weakness of his opponents. And so you can say that this is all protected, you know, free speech, not just because he's the president, but because satire is protected. But it does sort of leave you in this world where, okay, how are people supposed to know what's real? How are people supposed to know what's false? It's not like, you know, Donald Trump is saying AI generated, AI generated, you know, every single time, the way some journalistic outlets are.
And I think, you know, the Senate Republicans recently did something kind of interesting. So on October 17, they released an attack ad that was published on X and that was using a deepfake of Senator Chuck Schumer, the minority leader. And what's interesting is his quote was real, but the video was fake. And that's because they had perceived that there was this benefit to making that quote more visceral. Because they said, you know, this thing that Chuck Schumer has said is going to be bad for him politically, good for us politically, if more people know that he said that. How do we make it as visceral as possible that he said that? Let's put it in a video. And that's fake. The video is fake. They even disclose, you know, this is generated by AI, on it. And the question is, you know, what are the lines of ethical behavior? What are the lines of legal behavior? I think we're kind of in uncharted territory here, and the norms are still being established as we go.
B
So they made that video just for promotional purposes.
A
Yes. To score political points. Yeah.
B
All right, well, let's kind of transition here a little bit and talk more about the technology and history of, like, how we've gotten to Sora. So can you talk a little bit about what the underlying technologies are that deepfakes use and how they've, like, improved over the years?
A
Sure. I think a useful place to start is just computer generated imagery throughout history. So what I think is a very helpful anchoring point is a movie that came out while I was in college. It is the 2007 film Beowulf, starring Angelina Jolie and Anthony Hopkins. Now, this movie, which is, you know, an adaptation of the ancient epic tale of Beowulf, I think the movie stinks. I didn't like it at all, but it was entirely CGI, shot with motion capture technology. And what's so interesting is that in 2007, if you had, as Beowulf did, 400 to 500 people working for three years with a budget of $150 million, that's crazy, and the explicit cooperation of the actors who you're trying to depict, what you got was Beowulf. So it's sort of a landmark in terms of what was the state of the art at a given point in time for computer generated imagery. And I remember, you know, the marketing hype associated with that movie when it came out was, you know, it's so realistic, you won't even be able to tell it's CGI. Well, if you look at a picture of this movie, like, any doofus can tell that this is obviously CGI. So that's where we were in 2007. $150 million and 400 to 500 people working for three years could not create a realistic looking video; even an average person can tell it's obviously fake. That was the cost and complexity associated with video forgery at that point in time. And now, that was using traditional CGI techniques. So not primarily drawing from machine learning technologies. And the real breakthrough came in 2014 with the creation of a technology called generative adversarial networks. This is one of the OG types of generative AI. It's not using large language models, it's not using diffusion models, the techniques that we use today. But it was unambiguously a breakthrough in the creation of synthetic content and especially in the creation of face content.
So if you look at the original paper that introduced generative adversarial networks, it is such a humble, you know, technology in terms of what these generated faces look like. These are 50 pixels by 50 pixels, black and white images that kind of, you know, definitely look like a face, but not a great face.
B
More like a blob.
A
Yeah, kind of blobby. And that's where we were in 2014. This was, I'm not kidding, a breakthrough at that point in time. A year later in 2015, miraculously, you know, we had added color. Like, that was the progress that we made by 2015, using generative adversarial networks. By, you know, 2017, 2018, 2019, there were websites being created called This Person Does Not Exist. And if you would just click, you know, you would see a face that definitely looks like a person, definitely looks real, and it's not a person. That person doesn't exist, has never existed. And I remember there being a lot of visceral emotions associated with, like, how can it be that an image looks that good and it's fake? And so the point being, by the time you get to 2022 and you start adding new technologies like diffusion models, et cetera, they're really, really good. And every time somebody has identified some kind of limitation in the AI technology, it gets fixed. So, for example, Midjourney, which is an AI image generator that was very popular in 2023, and it's still very popular today. People always criticized it because the hands looked very fake, that the thing had a hard time depicting realistic looking hands and fingers. All of these problems have been solved, right? Like, whatever thing you're holding on to as, oh, don't worry, you know, it looks fake this way, so you just need to always remember to check hands. Like, the people who work on creating AI technologies are smart. There's a lot of them, they work hard. Whatever you're drawing comfort from today is not going to be a source of comfort in the future. I mean, one that I remember thinking a lot about and still think is kind of an interesting phenomenon is, right now, you know, my blood is pumping through my veins. And what that is creating is a slight reddish tint on my skin with every beat of my heart. And now, that's imperceptible to the human eye.
But if you set up the camera right, and you run an algorithm correctly to analyze the image, you can detect this heartbeat in a person. So theoretically, anytime anybody is being recorded by a halfway decent video camera, if you have fancy software, you can know what that person's pulse rate was at any point in time. So what's interesting is there's actually no need for, like, Hollywood to have the capability to impersonate this pulse rate phenomenon, because the watchers of the video are not capable of detecting the pulse rate, and they're deriving no enjoyment from seeing the pulse rate. It takes fancy software to detect the pulse rate. But here's the thing. The latest generation of AI video generators can generate a synthetic pulse rate to be detected. And so this is kind of where I come down. As I think about potential policy interventions, I think it would be very interesting to basically make it illegal to generate video with a fake pulse rate.
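To make the pulse detection idea concrete, here is a minimal Python sketch of one way such an algorithm could work. It assumes some earlier step has already averaged the skin color of the face in each frame into a single number per frame; that step, the function names, and the synthetic test signal are all illustrative assumptions, not a description of any specific tool mentioned in this episode. The sketch simply asks which heart rate, between 40 and 180 beats per minute, best matches the tiny periodic variation in that per-frame value.

```python
import math


def estimate_pulse_bpm(skin_redness, fps, min_bpm=40, max_bpm=180):
    """Return the candidate heart rate (in bpm) with the most signal energy.

    skin_redness: one averaged skin-color value per video frame (assumed input).
    fps: frames per second of the recording.
    """
    n = len(skin_redness)
    mean = sum(skin_redness) / n
    # Remove the constant skin tone so only the small variation remains.
    centered = [value - mean for value in skin_redness]

    best_bpm, best_power = None, -1.0
    for bpm in range(min_bpm, max_bpm + 1):
        freq_hz = bpm / 60.0
        # Correlate the signal with a cosine and a sine at this frequency
        # (a single-frequency discrete Fourier transform).
        re = sum(v * math.cos(2 * math.pi * freq_hz * i / fps)
                 for i, v in enumerate(centered))
        im = sum(v * math.sin(2 * math.pi * freq_hz * i / fps)
                 for i, v in enumerate(centered))
        power = re * re + im * im
        if power > best_power:
            best_bpm, best_power = bpm, power
    return best_bpm


def synthetic_trace(bpm, fps, seconds):
    """Fake per-frame skin values: constant tone, tiny pulse, slow breathing."""
    samples = []
    for i in range(int(fps * seconds)):
        t = i / fps
        pulse = 0.002 * math.sin(2 * math.pi * (bpm / 60.0) * t)
        breathing = 0.0005 * math.sin(2 * math.pi * 0.2 * t)
        samples.append(0.5 + pulse + breathing)
    return samples


if __name__ == "__main__":
    trace = synthetic_trace(bpm=72, fps=30, seconds=10)
    print("Estimated pulse (bpm):", estimate_pulse_bpm(trace, fps=30))
```

Real footage adds motion, lighting changes, and compression noise, so a practical detector would need face tracking and filtering that this sketch leaves out.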
B
How do they make pulse rates?
A
So, as I said, it's an imperceptible, at least to the human eye, reddening of the skin with every heartbeat, because your blood pressure is high as the heart is squeezing and your blood pressure is low as the heart is relaxing. And that actually shows up in your face, in your skin especially. The paler you are, the easier it is to do this. But I think it's doable with any skin tone or ethnicity. And so I think it would be very interesting to make it illegal to generate video that has a false pulse rate, because that would basically separate who is generating synthetic media for entertainment value, like Hollywood, which shouldn't care about the synthetic pulse rate, and who is generating synthetic media for deceptive value, like, you know, people who are up to no good, essentially. Because I can think of no legitimate reason why someone would want to have a synthetic pulse rate. I mean, outside of, like, research labs that are, you know, trying to understand the nature of reality. So that's something that I think about: could we continue raising the barriers to the effortless creation of bad content or malicious intent? Because the point being that Beowulf cost $150 million, took three years, 400 to 500 experts. That's kind of what you want bad behavior to require. You want bad behavior to be expensive and complicated, and you want harmless behavior to be cheap and easy. So can we find policy interventions that raise the barriers to harmful behavior without raising the barriers to innocuous behavior? So trying to, you know, watermark AI generated videos is one example of trying to raise the barriers to malicious behavior while keeping the barriers to harmless behavior, you know, low. But it's, like, not always fun to, like, look at watermarks dancing across your video. Maybe you wish the video was good.
So I'm intrigued by stuff like this pulse rate phenomenon that maybe would make it easier and less unenjoyable, right, to have these sort of detection methods embedded in the content itself. And I'm not saying this is going to solve every kind of problem, but I think it's an interesting hypothesis. So now we're at the Sora stage, which is basically just as widely distributed, very high quality capabilities as you can possibly imagine. And we were already dealing with the negative consequences of deepfakes, first in pornography and the non-consensual generation of pornography. But now these kinds of capabilities are just so much better and they're so much more widely available. And that's kind of the new landscape that we have to navigate.
B
So let's keep moving in that direction. What is already happening to combat deepfakes and, like, what do you think should be happening?
A
Yeah, so I think there's multiple things going on simultaneously. So the first is the generators embedding safeguards in their content, sometimes doing so at the explicit request of regulators. So, for example, the European Union in the EU AI Act directly addresses deepfakes. And here's recital 134 of the EU AI Act, quote, deployers who use an AI system to generate or manipulate image, audio or video content that appreciably resembles existing persons, objects, places, entities or events and would falsely appear to a person to be authentic or truthful (deep fakes), should also clearly and distinguishably disclose that the content has been artificially created or manipulated by labeling the AI output accordingly and disclosing its artificial origin, end quote. So they're putting the regulatory burden there on deployers, which might be a model developer, might not be a model developer. Kind of depends on what the business model is of that model developer. But that is pretty interesting. And they're not the only ones who are taking steps here. There is China, which has a deep synthesis regulation that went into effect in January 2023, which also required watermarking of synthetically generated content. In California, there was a law that was passed in 2024 called the Defending Democracy from Deepfake Deception Act of 2024, and that was targeting social media platforms. It was requiring large online platforms to, quote, block the posting of materially deceptive content related to elections in California during specified periods before and after the election, end quote. Notably, that law was struck down as basically violating free speech. Recall what I said before about, you know, protections on satire and parody and just political speech in general.
And so where California has retreated after that original law was overturned is to requiring labeling of AI-generated content as opposed to banning the creation or dissemination of AI-generated content. And that appears to be a more and more common approach that's being taken in this regard.
B
So, Greg, can you talk a bit about what journalists are doing to combat deepfakes and how they're, you know, differentiating between what's real and what's not?
A
Yeah. So journalists obviously have a very special interest when it comes to deepfakes. Assuming they're a truth-seeking journalistic organization, they don't want to report anything that's false. But they also rely on, you know, anonymous whistleblower type material. Right. If you send a tape to, say, the New York Times of me, you know, confessing to a crime, the New York Times is probably gonna wanna investigate that. And if you rewind, you know, 20 years, that investigation might be pretty short. Right. I mean, if there's like a videotape of me committing a crime in a courtroom 20 years ago, that's like not even evidence.
B
Yeah. They weren't gonna ask many questions, right.
A
Like, yeah. And you could receive that anonymously. Right. So let's just say that I'm on trial for murder. If the judge receives an anonymous letter that says, Greg Allen definitely committed that murder, you know, put him in jail, throw away the key. Well, that anonymous letter counts for almost nothing, right? Like literally nothing. Maybe if the anonymous letter has like specific details of like look under the filing cabinet or something. But in general, the point is it's easy to create a letter and so it's also worthless, right, unless there's some kind of chain of custody associated with that document or a difficult-to-forge signature, et cetera, et cetera. Video for a very long time was in a different category. You know, 20 years ago, the anonymous letter doesn't do anything. The anonymous video, that person is probably going to jail, right? The video is probably going to be enough to convict that person, especially, you know, if it's high quality, especially if you get a good look at their face, et cetera, et cetera. And so that is the world in which, you know, the generation technology and the authentication technology are vastly superior to the forgery technology. And we're probably getting to a world where that's not necessarily the case anymore. So this is a real challenge for criminal trials. This is a real challenge for journalists. And one of the ways that journalists are trying to deal with this is, number one, they're trying to create improved technologies for the authentication of media and they're trying to disseminate those technologies more widely. They're trying to create tools. So like Agence France-Presse, which is a big news organization based out of France, they have created, like, browser plugins that include some digital forensics capabilities.
And they're just making that available to journalists around the world because they want to make it easier for them to at least detect, like, the low quality forgeries that are sloppy, for example. So that's one thing they're trying to do to detect the forged content. The second thing they're trying to do is to create, you know, a provable record associated with the true content. And this gets to the idea of like the liar's dividend, which is basically now anyone anywhere can sort of say, oh, that's not me committing that crime or confessing to that scandal. That's AI generated. So how do you really put high quality authentication associated with something? Well, at this stage, they're again relying upon cryptography. So if you think about Bitcoin, it really matters that nobody can spend the same bitcoin twice, right? If I have a bitcoin and I spend it, I buy something from you, Sadie. Well, now you have that bitcoin and I don't have it anymore. But it's also just like a file on my computer, so I could copy it infinitely, right? In principle. And the way that you prevent that is with blockchain encryption technology. And that's what prevents anybody from, you know, spending the same bitcoin twice, because it authenticates, at a moment in time, who has the bitcoin and how a transaction modifies it so that, you know, one person has it and another person doesn't have it. Well, you can use similar versions of that kind of encryption technology to establish, for example, that a given version of a photo existed at a given moment in time and came from a specific author. So if, like, for example, at a big protest that tragically turns into a, you know, war crimes massacre type situation, you want your photographs, if you're a journalist, to be sort of known as true photographs, not later AI-generated images that are false.
And so now camera companies like Leica, like Nikon, like Sony, they're building these cryptographic signature capabilities into their next generation of cameras and making this available to journalists so they can sort of say, in a robust, cryptographically secured way, I was there, I took this image at that moment, you know, this one is true. If something else happens later that, you know, modifies my image, that's the fake one, this is the true one. So I'm supportive of all of these efforts. I think the sad truth, though, is it does come down to kind of argument from authority. Whereas we previously had a world where seeing is closer to believing, now we're moving to a, like, do I trust that journalistic organization or not? And to some extent, you know, humanity has been wrestling with this for a long time. As I mentioned at the beginning of this podcast, Joseph Stalin had an entire department whose, you know, job was to, like, erase people from photos after he sentenced them to death, and so like pretending that they never existed and were disappeared from history. So it's not that we've never faced these challenges, but they're just going to be much more pervasive, much more omnipresent. And so it sort of comes down to like, do you trust that newspaper? Do you trust that news organization? And that's going to pose a challenge in a democracy where it's really, really helpful to have the same set of facts. It's really, really helpful to have authoritative sources of evidence that can cut through liars. You know, you think about, for example, the Watergate scandal and Richard Nixon. This was a he said, she said kind of a debate for a really long time. You had conflicting accounts of the truth, what Nixon knew, what he did, who else was involved in the cover-up, et cetera. And all of that was washed away because it turns out that there were tapes of Nixon conversations directly ordering the crimes in question to be committed.
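Editor's sketch: the camera-signing idea described in this answer (binding one specific image to an author and a moment in time, so that any later modification is detectable) can be illustrated in a few lines of Python. This is a minimal sketch under stated assumptions, not how Leica, Nikon, Sony, or C2PA actually implement it. The function names `sign_capture` and `verify_capture` and the record layout are hypothetical, and an HMAC with a shared secret stands in for the public-key signature a real camera would presumably use, because the Python standard library has no public-key signing.

```python
import hashlib
import hmac
import json


def sign_capture(image_bytes, author, captured_at, secret_key):
    """Bind an image's hash to an author and capture time, then tag the claim.

    ASSUMPTION: HMAC with a shared secret is a stand-in for a real
    public-key signature; all names here are hypothetical.
    """
    claim = {
        "author": author,
        "captured_at": captured_at,
        "sha256": hashlib.sha256(image_bytes).hexdigest(),
    }
    # Serialize deterministically so signer and verifier tag the same bytes.
    payload = json.dumps(claim, sort_keys=True).encode("utf-8")
    tag = hmac.new(secret_key, payload, hashlib.sha256).hexdigest()
    return {"claim": claim, "tag": tag}


def verify_capture(image_bytes, record, secret_key):
    """Return True only if the image and the claim are both unmodified."""
    claim = record["claim"]
    # Step 1: the image must still hash to the value recorded at capture.
    if hashlib.sha256(image_bytes).hexdigest() != claim["sha256"]:
        return False
    # Step 2: the claim (author, time, hash) must match the original tag.
    payload = json.dumps(claim, sort_keys=True).encode("utf-8")
    expected = hmac.new(secret_key, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, record["tag"])


if __name__ == "__main__":
    key = b"demo-key-not-for-real-use"
    original = b"raw image bytes from the camera"
    record = sign_capture(original, "Photographer", "2025-10-01T12:00:00Z", key)
    print(verify_capture(original, record, key))
    print(verify_capture(original + b" edited", record, key))
```

The point of the sketch is only the shape of the guarantee: the original bytes verify, and either an edited image or an edited claim fails. It says nothing about whether the scene in front of the camera was itself genuine, which is why the trust question raised above still comes back to who holds the key.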
B
So nowadays someone could have just made that.
A
That's the point, right? So he said versus she said was totally irrelevant compared to what the tape said. And now we're in a world where, well, the tape said is not that far away from he said, she said. And maybe there's a community of, you know, experts with access to awesome digital forensics technology or who are participating in some kind of cryptographic signature chain of trust. They can say, no, actually, this is the real tape. But that's still not as good as where we were, you know, a couple decades ago in terms of the battle between the truth and falsity. There's one other story that I want to tell here that I think is kind of indicative of the challenge that we're going to have. And this is a story of former House Majority Leader Kevin McCarthy. So in 2017, while he was the House Majority Leader, the Washington Post told McCarthy's staff that they were going to write an article attributing the following quote to him. And as soon as you hear the quote, you're going to know why it was so explosive. Quote, "There's two people that I think Putin pays: Rohrabacher and Trump." So you have, in 2017, the House Majority Leader being accused of having said (the recording is attributed to 2016, so while the election was still...) that he believed the President was in the pay of the President of Russia, Vladimir Putin. An explosive quote. So when the Washington Post went to McCarthy's staff and said, we're going to write this article, they came back and said, no, that is absolutely fake. How dare you say that you're going to publish something so obviously fake? And then the Washington Post said, well, actually, we have a transcript of the full conversation. And McCarthy's staff came back and said, well, your whole transcript is fake. You know, we're not going to believe that. And then the Washington Post came back and said, we have a recording of McCarthy saying those words.
And then they said, well, it's not fake, but it was a joke. And my point is, in this new era that we have, they can just keep saying it's fake the whole time. They sort of never have to concede to reality in the face of evidence. And as our media environment gets increasingly Balkanized, that category of challenge is just going to be so much more difficult to wrestle with.
B
Absolutely. Well, let's talk about one more group that has a lot of interest in combating deepfakes, and that's AI companies and cybersecurity companies. What are they currently doing? What initiatives do they have underway to combat these concerns?
A
Sure. Well, similar to journalists. Well, let me separate it. You know, we already talked about all the things that they're doing to try and disclose that their content is AI generated. That's watermarking, that's putting in associated metadata, et cetera, to say, hey, this is AI-generated content. Now, some of that perhaps they're doing in the face of regulatory mandates from the European Union, from China, from elsewhere. But other aspects of that are, you know, voluntary. There's been this organization created called the Coalition for Content Provenance and Authenticity, C2PA. And this is basically a group of a bunch of different companies involved in the synthetic media game, including, for example, Adobe, who created Photoshop decades ago. And what they're trying to do is create like a nutrition label for digital content, which is to sort of say, here's the associated metadata, here's the associated things that give you a way to know whether it's real or whether it's fake. And again, you know, you want to raise the ease of detecting that it's fake, and you want to raise the ease of knowing that it's real by providing these sort of associated strengths. So that kind of industry association is trying to come up with the sort of common sense safeguards to put in place here. And you know, I'm happy to note that some of the folks who are participating in that were actually funded by DARPA in those digital forensics research technologies that I was not, you know, personally running, but knew all the folks who were running it when I was in the Department of Defense. And so that sort of coalition of people who are trying to create improved technologies, improved standards associated with knowing what is genuine is continuing to do work and to try and disseminate those standards. You know, cybersecurity firms, they're trying to sound the alarm and basically just say like, hey, the stuff that you have done to prevent phishing from your emails, like wake up.
Now you have to do that for video calls. Now you have to do that for phone calls. Now you have to think about your entire chain of communication and understand the extent to which it is secure. And it's not easy. I mean, if you are a publicly traded company, you have an obligation to retain records, right? When there's associated lawsuits. So the kind of stuff that you do when you switch everything to encrypted messaging platforms is kind of a pain in the butt, to not only be encrypted, but to also retain records. That's a hassle that I'm sure a lot of companies are sort of still wrestling with, how they're going to set up all of that, do it in a serious way. One other thing that I think is interesting that is a challenge facing the AI companies is how all of this intersects with intellectual property. So I think right now what the AI companies are trying to argue in the various lawsuits that they're facing associated with copyright or other kinds of intellectual property infringement is that specific media properties have copyright, but artistic styles do not. So there's a difference between here I am creating a scene from The Lion King, which is a copyrighted, you know, movie from Disney, versus here I am generating an image in the artistic style of the animated movie The Lion King. They're saying that, you know, one is probably covered by copyright, one is probably not covered by copyright in their argument. Well, if you've been following college sports, you know that one of the ways that NCAA athletes, you know, were first able to earn from their athletic prowess was these so-called NIL deals: name, image, likeness. And so the point here is that these people's likeness is a commercial asset. It is a way for them to make money.
And if you can produce, you know, an image or a video of someone and associate it with their product, well, that is, even if it's, like, iffy, even if it's, you know, not obvious that it's that person, it just kind of looks like them, you know, are you making money off of their likeness that they are entitled to some kind of compensation for? And I think Hollywood is getting kind of freaked out by the progress in deepfake technology. A number of actors and actresses have come out saying that they don't like this in their own industry. You know, they negotiated various rights associated with use of their likeness in AI. But there's other industries where those deals haven't come to pass. And then there's the world of amateurs who are not making big Hollywood movies, but might be inspired by some joke they want to tell about an actor from Hollywood or a movie that they saw. And that's going to be a really, really tough situation to disentangle. Who has what rights under what circumstances. And as the technology gets better and better, the questions don't get any easier.
B
Sure. Well, let's round out this episode by bringing it back to policy. How does policy attempt to reduce the damage caused by deepfakes, and what's currently going on in that regard in the US and globally?
A
Yeah, I think the number one thing is the one we already talked about, which is the EU AI Act's restrictions, the California restrictions and the Chinese restrictions, and all of those are really aimed at disclosure, basically saying it's okay to generate deepfakes as long as you say that you are generating a deepfake. And at least in AI and where the FEC, the Federal Election Commission, is going, it also appears to be headed towards that sort of disclosure type requirement as opposed to a prohibition. And I think that's probably a pretty reasonable place to land. I'm of course interested in exploring, you know, other ideas, like the heartbeat idea that I mentioned earlier in the podcast. But I want to point out that there's a challenge even if you disclose, and that's a psychological phenomenon called the continued influence effect. So here is a quote from a scholar at the Center for an Informed Public talking about this phenomenon: "When we receive a correction about something we already know, we don't simply erase the old information from our minds. Rather, representations of both the old information and its correction coexist in our brain's knowledge networks, playing a role in guiding our future judgments and beliefs. We're especially likely to rely on old information when it's the only explanation we have or the one that comes to mind easiest." And the point here is that when you see something viscerally, you know, a video that communicates something on an emotional level, that emotion can linger even after you know that it's false. And this is why I want to talk about one of the earliest examples of political disinformation with forged media. And it comes from the 2004 presidential campaign. This is an example I learned from Hany Farid, who's a real maestro of digital forensics and somebody who talks a lot about deepfakes and the challenges.
And so during the 2004 presidential election, there was an image that came out that showed John Kerry, the Democratic nominee for president, who had been a decorated Vietnam War veteran, but also a protester of the Vietnam War and an opponent to continuing it, in a photo with Jane Fonda. Jane Fonda, who is an actress, but she was not just, like, a small protester of the Vietnam War. She, like, went to go visit North Vietnam and pal around with North Vietnamese and Viet Cong folks. And so she was, like, politically radioactive among Vietnam veterans in the United States. And so this photo, which it turns out was completely fake, of John Kerry and Jane Fonda together at a protest of the Vietnam War, basically told, you know, an entire generation of Vietnam War veterans: you may have thought that John Kerry had a principled opposition to the Vietnam War that was rooted in patriotism, but actually he had an unprincipled opposition that was close to Jane Fonda. Here they are together at the same protest, you know, basically getting ready to speak next to each other. Now, that photo was fake, but the New York Times wrote about it as though it was real, because they saw it and they were like, oh, well, I guess it's probably real. And it was just using Photoshop. And Hany Farid actually invented a new technique, which was looking at, like, the angle of the sun in the way that the two people were illuminated in the photo, showing that they couldn't possibly have been in the same picture at the same time. But here's the story I really want to tell, which gets to the continued influence effect. So there was somebody who was interviewed in a poll by NPR, and here's the story, you know, as recounted by Hany Farid.
"Days before the 2004 presidential election, a voter was asked for whom he would vote. In reciting his reasons for why he would vote for George W. Bush, he mentioned that he could not get out of his mind the image of John Kerry and Jane Fonda at an anti-war rally. When reminded that the image was fake, the voter responded, 'I know, but I can't get it out of my mind.'" And I think that is such a crystallization of how images can have this emotional resonance that is deep even when you learn later that it's fake. And here's a person who's conscious of the effect that it's having on him. I'm sure there's a much larger universe of people who are not conscious of the effect that it's having on them.
B
Well, I think that's a great place to end it. And thank you so much again for breaking down more news in AI. And we'll be back very soon with more news breakdowns. But thanks for joining us and thanks for listening everyone.
A
Thank you. Thanks for listening to this episode of the AI Policy Podcast. If you like what you heard, there's an easy way for you to help us. Please give us a five star review on your favorite podcast platform and subscribe and tell your friends. It really helps when you spread the word. This podcast was produced by Sarah Baker, Sadie McCullough and Matt Mann. See you next time.
The AI Policy Podcast
Episode: Sora 2 and the Deepfake Boom
Host: Center for Strategic and International Studies
Date: October 23, 2025
Guests: Sadie McCullough (host), Gregory C. Allen (Senior Adviser at the Wadhwani AI Center)
This episode explores the surge in deepfake technology following the release of OpenAI’s Sora 2. The conversation, led by Greg Allen and Sadie McCullough, dives deep into the risks, national security threats, historic context, and the evolving regulatory and technological responses to deepfake-generated content. The discussion covers real-world incidents, the technological evolution of generative AI, policy reforms, implications for democracy, and future strategies to combat malicious use of deepfakes.
"SORA is sort of the next evolution in this generation capability becoming so much more capable. If you haven’t played around with it yourself, there is kind of a visceral delight and also a stomach queasiness as you can see yourself." – Greg Allen (03:34)
Cybercrime:
"A finance worker at a multinational firm was tricked into paying out $25 million to fraudsters using deepfake technology to pose as the company's chief financial officer in a video conference call." (06:04)
"Imagine somebody who is an officer who does missile warning type behavior… and receives a conference call request from someone they know and trust, basically saying, ‘Don’t worry about that satellite launch.’" – Greg Allen (18:42)
"What are the lines of ethical behavior? Of legal behavior?… Norms are still being established as we go." – Greg Allen (25:57)
"The latest generation of AI video generators can generate a synthetic pulse rate to be detected... I think it would be very interesting to basically make it illegal to generate video with a fake pulse rate." – Greg Allen (33:04)
Policy & Laws:
"If you rewind 20 years... an anonymous letter counts for nothing... The anonymous video—[that] person is probably going to jail, right?" – Greg Allen (40:19)
"Now we’re in a world where, well, ‘the tape said’ is not that far away from ‘he said, she said.’" – Greg Allen (46:49)
"I know, but I can’t get it out of my mind." – Voter on fake John Kerry & Jane Fonda photo (60:04)
On the arms race of authentication:
"It points to the sort of arms race between the people who are generating superior AI enabled generation technology, deepfake creation technology, and those who are creating superior authentication technology." – Greg Allen (02:33)
On shifting standards of trust:
"We're moving from a world of seeing is believing… to the sources of trust are going to have to come from other things… like cryptography." – Greg Allen (07:50)
On the limitations of disclosure alone:
"When you see something viscerally—a video that communicates something on an emotional level—that emotion can linger even after you know it's false." – Greg Allen (59:43)