Lawsuit Filed Against OpenAI by News Companies Following Recent Deals - The Mark Cuban Podcast

Summary

Summary of "The Mark Cuban Podcast" Episode: Lawsuit Filed Against OpenAI by News Companies Following Recent Deals

Release Date: May 3, 2024
Host: The Mark Cuban Podcast
Episode Title: Lawsuit Filed Against OpenAI by News Companies Following Recent Deals

In this compelling episode of "The Mark Cuban Podcast," the host delves deep into the unfolding legal battle between major U.S. newspapers and OpenAI, the company behind ChatGPT, along with its partner Microsoft. This lawsuit marks a significant moment in the intersection of artificial intelligence, journalism, and copyright law. The episode meticulously breaks down the complexities of the case, its implications for the future of AI, and the broader impacts on the news industry.

1. Introduction to the Lawsuit

The episode opens with the host announcing the pivotal news: major U.S. newspapers have initiated a lawsuit against OpenAI for copyright infringement, also targeting Microsoft in the process. The host initially says five newspapers, but later in the episode lists eight plaintiffs.

"Today we have some big news in AI and that is the fact that five major US newspapers are currently suing OpenAI for copyright infringement and they're also suing Microsoft." [00:00]

2. Background and Context

The host provides a historical perspective, noting that similar lawsuits have emerged previously but emphasizes the unique circumstances of the current case, particularly following OpenAI's recent deals with various news organizations.

"We've seen similar lawsuits in the past, but all of this is coming on the backs of OpenAI making a bunch of new deals with news corporations." [00:45]

Notably, OpenAI has established agreements with prominent entities like Axel Springer and the Financial Times, under which it compensates these outlets for the use of their content. The host also points to the earlier dispute with The New York Times, which sued OpenAI and Microsoft after months of failed negotiations.

3. Details of the Lawsuit

The crux of the lawsuit lies in allegations that OpenAI and Microsoft have "purloined millions of the publishers' copyrighted articles without permission and without payment" to generate revenue through tools like ChatGPT.

"The newspapers... are accusing OpenAI and Microsoft of 'purloining millions of the publishers' copyrighted articles without permission and without payment.'" [19:30]

Additionally, the lawsuit contends that both companies removed copyright management information, such as journalists' names and titles, from the content when it was cited. It also includes trademark dilution claims, alleging that the newspapers' trademarks were used without authorization in ChatGPT and Copilot responses.

4. The Role of Alden Global Capital

A significant revelation is that the suing newspapers are owned by Alden Global Capital, a single investment firm, rather than acting independently. This centralization suggests a coordinated effort to maximize legal and financial leverage against OpenAI.

"They're eight big newspapers owned by one company. I think this is what's important. The same investment company, which is the Alden Global Capital." [07:15]

Alden Global Capital's strategic move includes hiring Rothwell, Figg, Ernst & Manbeck, one of the two law firms that represented The New York Times in its own lawsuit, and filing in the same district, which could allow the two cases to be combined.

5. Fair Use and Copyright Implications

The host explores the contentious issue of fair use in the context of AI training. While copying entire articles would clearly infringe on copyrights, quoting small sections for commentary or analysis typically falls under fair use. However, the distinction becomes blurred when AI models generate text that might closely mimic original articles.

"It's kind of fair use for us to grab quotes from things that are happening and talk about them." [05:50]

The discussion highlights the challenge of defining the boundaries between legitimate use and copyright infringement in AI-generated content.

6. Reputational Damage Claims

Another facet of the lawsuit involves allegations that inaccuracies or "hallucinations" by AI models can harm the reputations of the news outlets whose content is being used. The concern is that erroneous information generated by AI could misrepresent the original sources.

"The newspaper has also accused both of these companies of reputational damage because of the AI's hallucinations." [15:20]

The host is skeptical of these claims, calling the point largely moot. He suggests OpenAI could simply instruct ChatGPT never to name outlets such as The New York Times, at some cost to answer quality, and argues that being cited actually helps a publisher's brand stay relevant.

7. Implications for OpenAI and Microsoft

The lawsuit poses significant challenges for OpenAI and Microsoft, potentially setting precedents that could alter how AI companies utilize existing content. The financial stakes are high, with the suing parties seeking substantial compensation for the alleged infringements.

"This definitely has a lot of implications for these AI companies, what they're able to do." [22:10]

Moreover, the consolidation of multiple lawsuits by Alden Global Capital could amplify the pressure on OpenAI and Microsoft to renegotiate terms or overhaul their content usage policies.

8. OpenAI's Response

OpenAI has responded to the lawsuits by categorically denying intentional copyright violations. The company acknowledges that instances where ChatGPT may reproduce verbatim text from sources like The New York Times are rare bugs they are actively working to eliminate.

"They're saying that it's literally just spitting out the same thing. And they're saying that this is a rare bug that we're working to drive to zero." [34:05]

This stance underscores OpenAI's commitment to refining their models to prevent direct copying of copyrighted material.

9. The Big Picture: Impact on AI and News Media

The host contextualizes the lawsuit within the broader landscape, emphasizing how generative AI tools are disrupting traditional revenue models for news organizations. With AI providing concise answers and summaries, the reliance on search engines like Google for news discovery is diminishing, potentially leading to a significant loss in ad revenue for publishers.

"Between the two of those is quite disruptive. So it's going to be interesting how this plays out." [38:50]

The episode posits that the outcome of this lawsuit could dictate future interactions between AI developers and content creators, shaping the economic and operational frameworks of both industries.

10. Conclusion and Future Outlook

In wrapping up, the host reflects on the possible trajectories of the lawsuit, including the likelihood of additional newspapers joining the legal action and the potential for a combined case under a single judicial hearing.

"Eventually, during the lawsuit, they're starting with eight right now... they'll like dump them all in. It'll be interesting to see what happens." [27:45]

The episode concludes on a note of anticipation, highlighting the case's potential to influence AI content policies and the sustainability of traditional news revenue streams in the digital age.

"It's going to be fascinating to see how this goes." [36:30]

Listeners are encouraged to stay informed and engaged as this legal battle unfolds, with broader implications for the future of AI and journalism.

Notable Quotes:

"OpenAI is doing deals with a lot of news publishing companies... They're paying them out essentially for their content." [02:10]
"The New York Times tried to negotiate a deal with OpenAI and Microsoft for a bunch of months before their lawsuit... they asked for way too much money." [24:00]
"ChatGPT isn't designed to copy and paste, so this is probably not something that happens a ton." [34:50]

This episode provides a comprehensive analysis of the lawsuit's intricacies, the strategic maneuvers by Alden Global Capital, and the broader implications for AI and the news industry. For listeners seeking to understand the evolving dynamics between technology and media, this discussion offers valuable insights and foresight into potential future developments.

Transcript

[00:00]
A
Today we have some big news in AI and that is the fact that five major US newspapers are currently suing OpenAI for copyright infringement and they're also suing Microsoft. I think, you know, you don't want to just sue the new startup, you want to sue the guy that's actually giving them all the money, which of course is Microsoft. So this is kind of an interesting deal and lawsuit. I want to break it down. We've seen similar lawsuits in the past, but all of this is coming on the backs of OpenAI making a bunch of new deals with news corporations. We have a deal with Axel Springer, they just last week signed a deal with the Financial Times. And of course we've had them have this whole lawsuit and a deal conclusion with the New York Times as well in the past. So in my opinion, what's really going on right here is probably that OpenAI is doing deals with a lot of news publishing companies. They're paying them out essentially for their content or for whatever. And that's totally cool. But I think because of that, then of course we have a bunch of other companies that are like, hey, look, we want a little piece of the action, right? They don't want just the New York Times to get paid out from the lawsuits. They all want to jump on. And you pretty much see this where if there's like a piggy bank, everyone wants to smash it. The more money that OpenAI gives out to other news corporations, new ones are going to jump on. And I think because OpenAI essentially trained from the entire Internet, like, yeah, you can make a case for every single news company that they were included in the training set. And there's something important I want to say about these lawsuits. And we'll break down more of the specifics, but like, to essentially conclude that, look, OpenAI trained off of like, let's say the New York Times.
It doesn't even mean, like, if OpenAI blacklisted the New York Times URL, which I don't actually think they did because I think they wanted all that data, but let's say they did because they didn't want to get in some sort of lawsuit with them. That doesn't mean that they would be completely exempt from having trained off the New York Times data because it's super easy for anyone on the Internet to copy and paste the article and repost it on their own website. Now that's, you know, step one. And maybe the New York Times could get mad at that website or whatever, especially if it's supposed to be behind their paywall or whatever. It gets a little bit trickier though when something that's like fairly fair use is that you go to any article on the Internet and you copy like a paragraph or a couple sentences and you quickly quote them somewhere, right? You're like, "As the New York Times said in their article, this blah blah happened." And also lest you think that there's some, like, holy protected class of New York Times articles. Like I've seen Bloomberg and the New York Times quote things from other people as well or other blogs or you know, most recently we have this huge tech debacle where we have Marques Brownlee who's been reviewing a bunch of AI tech and he's giving bad reviews, whatever. And it's this big, big viral moment. It's on Twitter. Every major news company is quoting direct quotes from his YouTube videos and they're also, you know, direct quotes from his X account and people that are commenting on his X post. So like, I don't know, in my opinion it's all fair use. Right? Like we have things happening and there's commentary going down. So I say all this because it makes it a little bit trickier when you're doing these lawsuits and you're saying like OpenAI trained off of a New York Times snippet.
Because the way they actually prove that is they try to go to OpenAI and say like, hey, give me like an exact quote from this article about this thing that happened. And if OpenAI spits out like a direct quote from the New York Times article, the New York Times is like, haha, we caught you. Right? So anyways, but like it doesn't necessarily mean it's actually from the original article. It could be someone quoting it. And is that even bad? Because like it's kind of fair use for us to grab quotes from things that are happening and talk about them. I think, like, obviously it would be bad if you said what happened in this specific event and it like literally copy and pasted the whole New York Times article, but it's not even doing that. And I think it gets one step further. Sorry for all of my analogies here, but the last thing that is important is as these AI tools are becoming more and more popular. Even before ChatGPT came out, when we had things like Jasper AI, which I was using back in, like, September a couple of years ago, I was using this thing a ton and that was kind of essentially DaVinci, which was an older version of ChatGPT, before ChatGPT was released. Whatever. The point is, people were using these AI tools to rewrite articles. And today you can go stick a New York Times article into ChatGPT, get it to rewrite it, even though the New York Times might have had exclusive data on that. And then maybe the New York Times could get mad about that. But at the same time, what's the difference between that and, like, me reading a New York Times article and writing my opinion piece on what they said? Right. Everyone's kind of doing it. So there's a lot of gray areas. So these lawsuits are a little interesting. Let's dive into exactly what's going on here. So who are the newspapers? They're eight big newspapers owned by one company. I think this is what's important.
The same investment company, which is the Alden Global Capital. I think it's important because it's not like all these guys are friends and they all get together and sue OpenAI. Really, it's Alden Global Capital that's suing OpenAI. So let's make that clear. Um, the second thing is the New York Times had a similar lawsuit and they got a bunch of. They had a bunch of publishing claims. Up until now, they were the only one that really took legal action against them. What's interesting is that it seems like they might still be the only independent one that took legal action versus, like, this is a conglomerate. This is an investment group suing OpenAI. They are an investment group, so they probably want money. Whatever this is, it's definitely about the money. So a lot of other newspapers, including the Financial Times, Associated Press, Axel Springer, have all made specific paid deals with OpenAI or other AI companies where they're getting millions of dollars annually to essentially get their content included. Microsoft isn't talking about this. They're not commenting on this whole story, this whole lawsuit. Whereas OpenAI is a little bit more chatty about it. But who are these news companies that essentially are one big conglomerate? We have the New York Daily News, Chicago Tribune, Orlando Sentinel, South Florida Sun Sentinel, San Jose Mercury News, Denver Post, Orange County Register, and St. Paul Pioneer Press. Okay. These are like local newspapers, if we're being honest. Right? I guess Chicago Tribune, probably one of the bigger ones in there, but otherwise, yeah, these are like local city news, really one big investment firm. So they're all being represented by Rothwell, Figg, Ernst & Manbeck, which is one of two law firms that was supporting the New York Times in their lawsuit against OpenAI and Microsoft. So it's kind of like we get a big investment firm that owns a bunch of these newspapers.
They go hire the same law firm that was, you know, used by the New York Times to sue OpenAI. The lawsuit was filed in the same district as the Times lawsuit, which is interesting, because if the same judge is chosen to oversee both of the cases, they could actually choose to combine the two complaints. Was this done on purpose? Was this random? Obviously, this was done on purpose. They see this lawsuit from the New York Times is probably going well or has a high chance of going well. So this investment firm grabs their conglomerate of newspapers, they file a similar lawsuit in the same district, hoping the same judge gets it. If the same judge gets it, he could combine the two cases, and essentially this investment firm would be piggybacking on the New York Times case. So you can obviously see some, like, pretty solid financial motives for all of this. But it's interesting, someone that apparently was familiar with this, and this was reported by Axios. So, you know, I mean, the "familiar with the case," I like, put a caveat on that, because I hate those kind of sources when there's not an actual person's name behind it. And you could get fake stuff in news all the time on this. But in any case, apparently, according to Axios, someone familiar with the Alden subsidiaries that own the newspapers said that the papers are right now opting to sue the two firms instead of attempting to negotiate a deal. Right. So the New York Times tried to negotiate a deal with OpenAI and Microsoft for a bunch of months before their lawsuit, which OpenAI said was like, a surprise because they're like, I thought we were going to negotiate a deal. But, you know, it took them a long time. Evidently, what that means when they say tried to negotiate a deal and it didn't work, is that like they asked for way too much money or, according to OpenAI, way too much money. OpenAI didn't want to pay the price they were asking. So they're like, fine, we'll sue you. Evidently, same thing.
They tried to negotiate a deal, and then now comes a lawsuit because they're not getting the price they want. It's going to be interesting to see if they ever get the price. I think, uh, it's going to be a precedent that, you know, essentially has a lot of implications for these AI companies, what they're able to do. So for now, Alden isn't ruling out having more of its. I think they own 60 newspapers. Eventually, during the lawsuit, they're starting with eight right now. I think what that means is they like found eight newspapers that they had like really high odds of them, you know, being included in the data set. And they're like, well, we don't maybe 100% know or can prove if all 60 are. But they're like not ruling out adding all 60 of their newspapers to this lawsuit. I guess if it comes down to it where it's like per news site they get a certain amount of money, then they're like, well, 60 is better than eight, or a multiple on that. So they might just like dump them all in. It'll be interesting to see what happens. So how does all of this work? Similar to the New York Times lawsuit, I think at the really center of this complaint is copyright infringement claims around essentially OpenAI using their articles to train the model. The newspapers right now or this investment firm is accusing OpenAI and Microsoft of, quote, purloining millions of the publishers' copyrighted articles without permission and without payment. And this is of course to make money from ChatGPT. So the newspaper right now is also claiming that OpenAI and Microsoft removed copyright management information like the journalists' names and titles from the work when the information was cited. So the lawsuit also includes trademark dilution claims which allege essentially that OpenAI and Microsoft didn't have like the authorization to essentially use this. And they also say that they use the newspapers' trademarks in the answers on ChatGPT and Copilot.
You can kind of like join Microsoft and OpenAI together in this complaint because Copilot is essentially ChatGPT running their back end too, right? So one other thing that the newspaper has also accused both of these companies of is reputational damage because of the AI's hallucinations. So essentially they're like, look, you're like using our info and you're like using that to give people answers. But then it can also hallucinate and say the wrong thing. And that's bad for us. I think this point is kind of moot because like obviously, okay, maybe I guess if I straw man this or steel man this, maybe what they're saying is, "according to the New York Times article, this thing happened," right? So maybe ChatGPT could say that and then it had like some erroneous fact and then they're like, it's going to damage a reputation because now it's associating a brand name with that. The easy solution is to just tell ChatGPT, they could just hard code it to say, you're never allowed to say the New York Times or the Washington Post or whatever. There's obviously pros and cons to that. It'd make it feel like less high quality stuff. I think that if they want to play hardball, OpenAI does that. And as far as like stealing their copyright and branding, obviously it's like Google, in my opinion, it's helping their branding, it's making them relevant to cite them. The last time I went to the New York Times to read an article was frankly never. Probably because it has a paywall but also because I don't know, you know, I get my news in other ways. I get it from newsletters, I get it from different places. So it seems like it's becoming less relevant. If you want to stay relevant, I think it's probably great to be sort of cited. You know, there's a higher chance that I'm going to go read a New York Times article if I see it quoted in something even in ChatGPT than if it never says the name.
So it's kind of interesting because they're like mad they're using their data, but they don't want them to use their name. Whatever. Let's talk about the big picture here. I think the outcome of this lawsuit is going to be really big for how AI companies incorporate news into their content. So news publishers up until now have obviously just relied on ad revenue that has come from search traffic. It's been 20 years where we've seen this. Generative AI tools can essentially wipe that out because I'm not always going to Google. I mean honestly I don't. Yeah, I use Perplexity, I use You.com, I use a lot of other tools instead of Google. So Google's getting disrupted in this but I'm getting like a concise roundup of answers. I also use like newsletters which has kind of replaced my, like, AI newsletters are fantastic, shout out, and so like I use that to get a lot of my AI news and they cite the sources but I'm not going to click on it if they give me a really concise, which I know they just use AI tools to like summarize an article or whatever, but if they're just going to summarize it right there, I don't need Google to find it and I don't need to read the article to get the information. So like yeah, between the two of those is quite disruptive. So that's been the model. Um, and it's definitely getting disrupted. They're all concerned about that and they all don't want this to happen. So it's going to be interesting to see how this plays out. OpenAI right now has already opposed both of these lawsuits. They're fighting both of them in courts. And they say that the. The New York Times specifically cited how OpenAI's tools regurgitated verbatim copies of the New York Times. So they're saying, like, it's literally just spitting out the same thing. And they say that this is a rare bug that we're working to drive to zero. Like, in other words, ChatGPT isn't designed to copy and paste, so this is probably not something that happens a ton.
If it does, it's a rare bug. They're working to drive to zero, as they've said. And so, yeah, obviously everything's just going to get rewritten. In any case, it's going to be fascinating to see how this goes. If you enjoyed the podcast today, if you learned something new about these lawsuits and kind of how the landscape is going to play out in the future, I would really appreciate it if you like the video or give us a review. If you're listening on Apple or Spotify, subscribe, and I will catch you in the next video.