
Cecilia Zaniti
The United States' strong intellectual property protection regime is what gave rise to Hollywood, what gave rise to Silicon Valley. Oh, hey, I trained my own GPT on all your work and it's been so helpful. And she's like, okay, you know, thanks, and it's sort of like, how am I supposed to feel about that? Like, maybe it's not verbatim copying and, you know, who knows what it outputs. But even if it rewords the phrases, should there be compensation for that? Of course, in that process they're going to defend and say no. It's called the general purpose transfer transformer. The intended use of this device is like a VCR for your own personal use. My prediction is actually that there ends up being a commercial solution to that, where there's some startup that's like, hey, GPT yourself. And then it's literally like you can get her knowledge from there. They're going to figure out AGI, they can figure out how to not have infringement. And there's people that have said that they're using GPT4.5 to actually do the copyright analysis.
Nathan Labenz
Hello and welcome to the Cognitive Revolution, where we interview visionary researchers, entrepreneurs and builders working on the frontier of artificial intelligence. Each week we'll explore their revolutionary ideas and together we'll build a picture of how AI technology will transform work, life and society in the coming years. I'm Nathan Labenz, joined by my co-host Erik Torenberg. Hello and welcome back to the Cognitive Revolution. My guest today is Cecilia Zaniti, founder of GCAI, an AI assistant for in-house counsel, previously general counsel at Replit and lead counsel on the Alexa team at Amazon, among other roles, and author of some of the most helpful viral Twitter threads that I've read about the legal landscape surrounding generative AI broadly and about the potentially blockbuster New York Times vs. OpenAI case in particular. It's safe to say that legislators, from the founders of the United States (who, I learned from Cecilia, did specifically provide for the protection of intellectual property in the United States Constitution) to the more recent Congresses that have extended copyright protections and otherwise modified intellectual property law, did not anticipate a technology like generative AI when writing the rules that we live by today. And so it falls to the courts, for now at least, to figure out how to apply an established legal paradigm to an emerging technology paradigm. To be honest, aside from filing a couple of provisional patent applications over the years, including one just last year, which I wrote with the help of GPT-4, this is not an area that I've studied much at all myself, so I brought a ton of questions to Cecilia and proceeded from the most general and basic to the most specific and nuanced. Why do we have intellectual property in the first place and what are its main branches? What legal frameworks apply to generative AI? What rights do content owners have? Does the New York Times have a good case?
Is it really possible that the court could order GPT-4 to be destroyed, as the New York Times demands? How likely are the courts to rule on this question at all versus trying to force Congress to act? Are there any notably good proposals for generative AI law that are not already broadly known? And how will the US position inform or perhaps be challenged by different choices that other countries might make going forward? We cover all this and more in this conversation. As always, if you're finding value in the show, we appreciate it when you share it with friends. This episode, quite naturally, would be for the lawyers in your life. Now, I hope you enjoy this conversation about the relationship between the law and generative AI technology with Cecilia Zaniti. Cecilia Zaniti, technology lawyer and founder of GCAI at getgcai online, welcome to the Cognitive Revolution.
Cecilia Zaniti
Thanks Nathan. Glad to be here.
Nathan Labenz
I'm excited for the education that you are about to share with me and with the audience. So many of us are AI enthusiasts, AI builders, AI researchers, and we of course spend some time thinking about how this is going to impact the world. And, you know, that often gets played out in very positive visions of the impact we want to make. And certainly there are the worries too, the risks that we want to avoid. I honestly haven't spent a lot of time personally, and I suspect that's probably true for a lot of folks that listen to the show, thinking about some of the practical but super important interactions between these new AI technologies and something like the law. And obviously this has come to the fore a bit recently, with a ton of lawsuits being filed, the most notable of which is the New York Times versus OpenAI, which has kind of got the counterparties to create a case that seems like it could be a big precedent setter and could be referenced in history books for many years to come. I want to get into all this with you, but I would love to start, if you would indulge us as people that are, at least speaking for myself, very much beginners when it comes to questions of the law, by just asking for a kind of introduction to intellectual property law. Like, why does it exist? What are the big sections of it? What are maybe some of the live issues? And then we'll get into how AI enters the scene and how that changes things, perhaps. But a good, brief foundation, I think, would be super useful.
Cecilia Zaniti
Of course. So intellectual property law, you're right to ask about it, because if you look at the rise of the big tech companies, they're not big property holders. The property that they hold is intellectual. It's software, it's marketing, it's brand. If you look at literally the Fortune 20 in 1980, there was not a single tech company in there. And now I think it's four of the five, with the Magnificent Seven or whatever in the top. But in terms of the law, it's been long recognized that intellectual property is important to encourage. It literally goes back to the Constitution. So not a lot of people know this, but the Constitution, in the article that created Congress, Article 1 (so Article 1, Section 8, Clause 8), talks about how, to promote the progress of science and the useful arts, Congress shall allow for a limited time monopolies for authors and inventors. And so that's the IP clause. And that literally gave rise to the Patent Office, the Trademark Office. And a lot of people say that the United States' strong intellectual property protection regime is what gave rise to Hollywood, what gave rise to Silicon Valley. And so that's kind of the basis for it. And then in terms of the types of intellectual property, we can dive in if you'd like.
Nathan Labenz
Yeah, okay, cool. That's a. I did not realize that it was that high up in the, in the Constitution. So when I think of this, I. At least three terms come to mind. Patents, which I understand to be kind of around invention, copyrights and trademarks. And I won't even venture to try to distinguish between those final two. Are there any other classes that I'm missing? And how should I kind of conceptualize those three buckets?
Cecilia Zaniti
Yeah, yeah. So trade secret is the other typical fourth category considered. The famous example is of course the formula for Coke: you know, the two people that know it can't fly together, and it's very, very deeply protected. But essentially those three buckets of patents, trademarks and copyrights are going to cover most of what you think of with software, and what you think about with AI and the production of LLMs. So we can dive into each of them. I think they each have their own nuances. They each are created in slightly different ways and protected in slightly different ways.
Nathan Labenz
Particularly I always think about like inputs and outputs for AI systems. And I'm kind of interested also in like, what do you have to have or create to sort of be the input to then have these rights and then on the other end, like, what are the special rights that I have in virtue of having these things?
Cecilia Zaniti
Yeah, yeah. So distinguishing again the three types of IP, and then we'll jump into copyright because I think that's where the New York Times suit is and that's where it'll get kind of juicy for your listeners. Patents are a pretty formal process. So you actually apply for the right to exclude on an invention. And the patent itself has what's called claims that lay out the steps for whatever it is. And the process is called prosecution: the application is very heavily scrutinized at the time that you file. So you file for a patent on a given technology, and then with the patent office you have some back and forth, called office actions, that goes into that. You don't kind of automatically get it. In contrast, copyright literally attaches on creation. You don't have to do anything. Like, you can literally decide, I want to write a poem about my kid, grab your pen, start writing, congrats, you have a copyright. It attaches when the work is actually in a fixed medium of expression. In your head, not a copyright yet. In a fixed medium of expression, such as typed or written out, you likely have a copyright. Now if you just from memory were to write out the lyrics to a Taylor Swift song, you would not have a copyright in that, because she previously has a copyright in that particular expression of that song. But if you were to rewrite a Taylor Swift song and instead make it about potato chips, again, you'd have a copyright.
Nathan Labenz
Very interesting. So on the patent side, this is not pertinent so much to the main topic for today, but I guess for one thing, it's often said that the patent system is broken. And we certainly hear stories. I don't know how common they really are, or if this is the dominant narrative or just selective storytelling, but you certainly hear this idea that the patent system is broken and the big companies that have the resources to do it are going out and patenting all kinds of things very well in advance of actually being able to do them. And then they can sort of squat on them or protect themselves or what have you. But one thing that's very striking about the current AI moment is Google invented the transformer, and either they didn't patent it or they patented it in such a way that (I guess I'm kind of embarrassed that I don't know this) everybody else is using it, right? So could they have patented the transformer and just chose not to?
Cecilia Zaniti
So they could have patented the transformer model. Software patents are a little bit of a debate. There was quite a reining in of software patents under a line of cases called Alice, but essentially you still can patent software processes that have a specific result and that are embodied in a particular way. According to Twitter, of course my source for legal research (not really, but I did check), the transformer model was patented by Google. But for them, enforcing it is kind of a losing game, because they're the platform and they want everybody developing. And so their patent strategy is much more defensive. So there were the patent wars: Google bought Motorola, and, you know, the smartphone patent wars. I billed many hours of my life on those cases. But essentially there's sort of a detente reached. So, little known fact: every Android implementation, 10 bucks go to Microsoft. And most people don't know that, but literally every Android phone, there are so many Microsoft operating system patents that the deal that was reached was that that would be the case. So if you're a Samsung or any Android device maker, it's actually a question whether you trigger this particular licensing fee. And it's a big deal because it's an element of the BOM, the bill of materials, for a piece of hardware. And so essentially, yes, could Google go after people? Google is one of the bigger patent filers and patent holders, but in general their strategy has not been to go after people; they use their patents much more defensively. So for example, Sonos had a big series of patent cases against Google, and Google of course had a bunch of patents that it could assert against Sonos. So that litigation went on for, I think it's been seven years, eight years. And, you know, if I'm Sonos, I don't know that it was wise to sue Google in that scenario.
To your point that it's like, you know, the pockets are really deep, the portfolio of patents that Google has is really wide. And so chances are even a great result would be some kind of cross licensing.
Nathan Labenz
Yeah, interesting. So I mean this would be if there was one moment, right, that would maybe challenge that detente. Like this might be the one. Right. You could imagine a scenario where Google would say, hey, yeah, you know, don't be evil, all that stuff. We kind of semi retired that anyway. But OpenAI and Microsoft have gotten ahead of us here in a way that we really can't allow. And it might like disrupt search, our core business. And we own the patents on this. Let's go shut them down. If they were to do that, it seems like they would have a chance of winning. Right?
Cecilia Zaniti
It's complicated. So I mentioned the $10 that Google pays Microsoft. The amount of Microsoft patents that Google reads on is potentially infinite, right? So every single Google service probably has a basic file system, for example, and Microsoft's patents go super deep in file systems. But more, it's the PR thing, right? Can you imagine Google files a big patent suit against OpenAI? Talk about code red. So code red was last year, when ChatGPT came out and everybody at Google freaked out and they really marshaled the company in this direction to move faster. And then Bard and Gemini and so on came out of that. But if they sue, if I'm their strategist, so not just their lawyer but their strategist, I'm thinking the press on that would be, you know, "old-line Google in its last throes tries a legal mechanism against OpenAI." Right? So it wouldn't play out great that way. Could they? Yeah. I mean, on the OpenAI side though, there's a concept, and your readers might find this super interesting: you can design around a patent. So remember I mentioned that there are the claims, so very specific: step one, step two, step three to do this. And if you skip one of those steps or you do something a little different, you potentially don't infringe. And that infringement analysis, you know, that requires a court case in itself. So one fun story about Google: my first job in a legal team was at Yahoo in the early 2000s, when they were still competing with Google. Right? So Yahoo stock was going up two bucks a month. I was very happy I was able to pay for law school from that. And they were still competing in search. So Qi Lu and some other really big industry luminaries were there, and eventually, you know, Hadoop came out of Yahoo. Yahoo doesn't get great credit for its tech, but it was good.
But in any event, Yahoo bought a company called Overture, which had literally come up with sponsored search, and they had a patent on sponsored search, which is literally the entire Google business model. And so there was a suit on that, and I went to a couple of hearings for it. And the issue was the patent. If you pull it up (you can Google it, not Yahoo it, but you can Google it and find it), the patent says "what is claimed." That's how the patent reads. It's very formal. What is claimed is a system, a method wherein placement in search is, quote, determined using bid amounts. So, "determined using." Google's defense was, we don't determine your placement in search using bid amounts only. We also look at relevance. We also look at the name brand. And they had all these things, and they basically said, no, no, no, our algorithm is, like, way more complicated and interesting. It's not practicing this patent, because we determine it using a plurality of factors. So that was kind of a legal thing. And that case settled. Litigating on that point is really tough, right? We were analogizing to GPAs. Okay, if the valedictorian is determined using GPA, but then if you play a sport, you win, is that infringing? So you get into these things that kind of almost match LLMs, where each word matters. There's a whole process in patent law called claim construction, where the parties basically agree on what the definition of each specific word is that they're going to use going forward. So, yeah, patent law, similar to fair use under copyright law, which we'll get into, is expensive and unpredictable to litigate. That's the theme you're going to see in a lot of these kind of legal solutions: when a system is unclear or gray or based on the meaning of specific words, it becomes more of a challenge to litigate and more of a risk.
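[Editor's illustration] The "determined using bid amounts" dispute can be made concrete with a toy example. The sketch below is purely hypothetical: the `Ad` type, the `relevance` score, and the bid-times-relevance formula are assumptions made up for illustration, and nothing here reflects Overture's patent claims or Google's actual ranking. It only shows how a bid-only ordering and a multi-factor ordering can place the same ads differently, which is the kind of distinction the claim language turned on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ad:
    advertiser: str   # hypothetical advertiser label
    bid: float        # hypothetical bid amount, in dollars
    relevance: float  # hypothetical relevance score in [0, 1]


def rank_by_bid_only(ads):
    """Placement determined using bid amounts alone: highest bid first."""
    return sorted(ads, key=lambda ad: ad.bid, reverse=True)


def rank_by_bid_and_relevance(ads):
    """Placement determined using several factors (here, bid times relevance).

    The product is an arbitrary illustrative choice, not a real system's formula.
    """
    return sorted(ads, key=lambda ad: ad.bid * ad.relevance, reverse=True)


ads = [
    Ad("A", bid=2.00, relevance=0.20),
    Ad("B", bid=1.00, relevance=0.90),
    Ad("C", bid=1.50, relevance=0.50),
]

bid_only_order = [ad.advertiser for ad in rank_by_bid_only(ads)]
multi_factor_order = [ad.advertiser for ad in rank_by_bid_and_relevance(ads)]

print(bid_only_order)      # ['A', 'C', 'B']
print(multi_factor_order)  # ['B', 'C', 'A']
```

In this toy data the highest bidder lands first under the bid-only rule and last under the multi-factor rule. Whether the second scheme still counts as placement "determined using bid amounts" is a question of claim construction that code cannot answer.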
Nathan Labenz
Hey. We'll continue our interview in a moment after a word from our sponsors. Yeah, fascinating. Okay, so there's the law, and then there's the public relations question. Then there's also kind of the game theory dynamics of, like, if Google starts suing Microsoft, Microsoft can sue them right back on a million different things as well. And then it's kind of mutually assured destruction. I don't know if this varies across these different types of intellectual property, but to what degree, when I have a monopoly, does that mean that I have total rights to choose who I deal with and how? Because I saw an interesting proposal the other day where somebody said there should be, like, mandatory licensing, where you can't hold your invention totally exclusively. You should be able to profit from it, but you can't tell everybody else that they can never use your good idea. But as I understand it right now, you get to decide as the owner, right? Like, you could just refuse outright: I just don't want to deal with these folks, and that's that.
Cecilia Zaniti
Yeah, I mean, there are some statutory exceptions with respect to drugs. So there's a whole regime around generic drugs and the amount of head start that the original patent holder gets and how they do that. Or, you know, sometimes, the whole open source movement: Google has open sourced a number of patents, as have most of the big players. It gets back to that Constitution line, right? It says exclusive rights. So it's very similar to real property. Like, if you have this plum piece of land next to the county fair, you don't have to make it a parking lot. You can, and you might make great money, but you don't have to. Now there is a process called eminent domain, where the government could potentially come in. But that's really contra to the original Constitution. Another kind of fun fact: you know, it's life, liberty and the pursuit of happiness. The original text was life, liberty and property. And so it's a very serious thing and role that the government plays in giving holders of both real and intellectual property the right to, really, capitalism. Like, they can do it, they can not, they can do it badly. They can be selective. Now there are some rules, you know, like the Civil Rights Act and so on, about serving people at your restaurant. But in this case, how Google decides to commercialize or not the transformer model is a strategy question, and they took one path. But seeing OpenAI commercialize it as they have, and have such a breakout hit so quickly, it sort of raises the question, you know, what other gems is Google sitting on? I have the same curiosity as you do.
Nathan Labenz
Interesting questions about, like, discovery too. My understanding is, if this ever were to get litigated, then would the records become public? And is it the case that the only reason that we originally knew how Google was ranking its search results, and all these different factors, was because that came out in discovery? I wonder what Google might learn about what OpenAI's got up their sleeves if they got into a courtroom and everybody had to tell all their secrets.
Cecilia Zaniti
Yeah. So, I mean, Google's definitely not a stranger to discovery, and they've been under antitrust scrutiny for a number of years. And, you know, I've been an in-house lawyer. I was at Amazon for almost four years. And we train folks internally on, really, the New York Times test: literally, what you write, would it withstand the front page, both from a PR perspective and a legal one? But in terms of the actual litigation process, one of the things that I think Meta actually got called out about (I think it was Frances Haugen, there was this whistleblower suit a couple of years ago) was that they essentially used Facebook internally. Cool, dogfooding. But anybody could sort of go into anything. It would be like open, completely public Slack channels. And at a company of that size, there's a lot of info you can have there. And they've since locked that down. But that's how Frances Haugen was able to actually pull down different research studies that she eventually presented to Congress: because of this open system. That being said, I'm pretty pragmatic about it in my legal advice in-house, in the sense that information kind of wants to be free, you know, and I've been at companies where we had almost completely public Slack channels. And from an engineering standpoint too, you can scroll, you can get the history, you can understand. So that's the benefit that you're weighing against the risk of, like, yeah, if you get into litigation, then it's a free-for-all. And you see things. Lots of fun examples there. All the big tech companies were sued by the California Attorney General over app stores. And some of the discovery there was like, don't ever give refunds. This is a great revenue stream.
And kind of like just pretty bad statements internally that led to suits. So a lot of it is also just communicating smartly.
Nathan Labenz
Very interesting. Okay, so let's then jump to the copyright side. So you've said that as soon as I make something, I immediately have copyright in it. This falls under the same exclusive situation. But yet I guess a couple questions I have on that one would be like, is this a commercial exclusion? You know, does it matter if I'm getting paid for using somebody else's copyrighted material? And also, I feel like I see cover songs all the time that are not necessarily signed off on, but seem to happen. So give me a little bit more of a rundown on the copyright side of this.
Cecilia Zaniti
You know, so your question is: is copyright a commercial right, or is it another kind of right? People do look at it as a personal right. So in France, it's actually called droit moral, there's a French term for it. But essentially the idea is that as a creator, I have this kind of really ethical right to exploit my work and to be the one to exploit it. But how you do that: the metaphor that they teach in law school, which I think is a good one, is that an intellectual property right is a bundle of sticks. So as an example for a movie, you can separately license the rights to play the movie on the plane, the rights to play the movie on streaming, the rights to play the movie on cable, the rights to play the movie overnight, the rights to play the movie in Italy. Like, whatever it is, there's infinitely many ways that you, as the owner of that right, can decide to split it up. And you see that, right? Like, Netflix has a huge licensing department that writes big checks. Your Hollywood studios have, I think they call it business affairs, or legal and business affairs. But essentially that's the team that thinks about how to split up these rights. And then there's merch and all kinds of downstream stuff. So that's the right that the copyright holder has. But there are limits on that, and fair use is a really important one. So fair use, kind of interesting: it's not a defense. It's not like you're infringing and then you claim fair use. Although in practice, that's how it ends up happening. Fair use is actually beyond the reach of the copyright. So the exclusive right in the first place doesn't stop people from using your work in a way that is deemed fair use. Now, how you decide that, it's four factors. It's squishy, and we can get into it. But that's the other important limit on copyright.
Nathan Labenz
Okay, I'm not sure I fully understand the distinction between you're accused of violating copyright, and then you defend yourself with this versus it doesn't extend to that in the first place.
Cecilia Zaniti
But the reason it matters in practice is that you say it's not infringement, and it's kind of like, I guess it's a defense, but procedurally it's not like, oh, I did it, I infringed your copyright. You get back to, like, no, no, no, it wasn't infringement, because it was fair. And so, the four factors. And by the way, they're not exhaustive. The test literally says a court may consider these factors and then can consider others. So it's squishy and it's a balancing test. So whenever you have balancing tests, it's like, oh, on the one hand, on the other hand. It's not like, I don't know, provisions of the tax code where it's X percent of your income and it's relatively clear. Copyright is known as one of these areas that's a little more gray. And that kind of makes sense, because it's creative expression. But in terms of the factors, let me pull them up. I want to make sure I'm precise for your folks. The first is the purpose and character of the use. So how is the alleged infringer, in this case OpenAI, using the work? Is it for profit? Is it transformative? There's a bunch of sub-questions under that. And that first factor is the one where there tends to be the most play in technology cases. That's factor one. Factor two, the nature of the copyrighted work. So is the work that is being infringed, or allegedly being infringed, creative? Is it the kind of thing that copyright wants to encourage and society wants to encourage? The New York Times here would say, yes, of course it is. You know, we've got news articles going back to 1851. We've got news articles where we literally discover things that lead to criminal prosecution. We've got 50 to 100 million readers a week on our work. It's very creative. This is the type of work that copyright wants to protect. So that's that second factor.
You'd probably have a situation where OpenAI would most likely admit that it's not like the New York Times is a phone book, which is the one case that found that, okay, phone books are not super creative. There's only so many ways to list phone numbers and names. Copying that might be a commercial problem, but it's not a copyright problem. And then the amount and substantiality of the portion used is factor three. So how that plays out is, do you need to use the whole work or can you use bits and pieces of it? Usually this plays out in critical work or in sampling cases. You know, if you are doing a critique, do you need to show a whole movie? No, you can just show clips of it and do your critique. So that's that factor. And then finally, and this one is going to be important in this case, the money: the effect of how the infringer uses the work on the market and value of the original. So if it's a one-to-one substitute, in other words, people use ChatGPT and they cancel their New York Times subscriptions because they can get full articles, that's much less likely to be fair, versus if it's a complementary use or if it's not going to affect the ability of that original rights holder to do what they want with their bundle of sticks. Make sense?
Nathan Labenz
Yes. Although I'm not necessarily ready to judge the case.
Cecilia Zaniti
If you want, in the show notes I can also give you, like, a fair use cheat sheet or something like that. I don't know if people would like that. There's also something that might actually be helpful for folks: the Library of Congress maintains a database of fair use cases, and they've organized it really nicely. They've put in summaries, so you can literally look things up. You know, a tech case that found fair use, or a tech case that didn't. And the Copyright Office, again to encourage authors to be creative, gives a lot of guidance on fair use that you can check out.
Nathan Labenz
These sound like great resources. Hey, we'll continue our interview in a moment after a word from our sponsors. What would you say have been kind of the most recent wave of battles between let's say technology and other kind of constituencies, rights holders across society? Just prior to the current wave of AI stuff, where have the fault lines or the battle lines been drawn?
Cecilia Zaniti
Just before now big technology shifts tend to have lawsuits that go with them. And part of the reason for that is that you really see money moving. So you know, I talked about, you know, when I was at Yahoo and at the time online advertising was still pretty new and the share of wallet of a given advertiser, let's say General Motors or Ford advertising cars, they were still doing it in print media and TV and now that has shifted. So when you see shifts like that, it's like okay, who's losing money, they're probably going to sue. Who's making money, they're probably going to get sued. And so this OpenAI suit is a pretty classic example in that paradigm. Others. So I was involved in the Apple Samsung case and basically Apple came out with the iPhone. It was such A step function leap from the flip phones and, you know, I guess dumb phones. But in any event, the ability to actually, you know, scroll on your screen and access the web in this intuitive way, you know, Samsung very quickly came out with their own version and that was a big IP suit. And so, you know, when you. That's what I would sort of look for in terms of the trends now. The legal mechanics of how it happens, you know, tend to tend to differ. Digital music is another great example, also involving Apple, the scraping cases and the kind of the Internet at large. Google's trademark lawsuits are also another great example. So it was not a settled question of law for a long time whether if you search for Acme, Acme's competitors could buy that keyword. For a while there were a bunch of lawsuits about that. Google won them all. And basically now that's a pretty settled area of law. But essentially Google's argument was like, well, you know, we're presenting information. It's a different kind of use. And the other fun factor is that judges really like using Google. I really like using Google as a lawyer. It's one of like the best tools based on that. 
My view, and this is, I guess, a little bit more in the political economy space, is that when judges like a technology, they kind of find a way to rule for you. Like, the market finds a way. I'm pretty optimistic in that sense.
Nathan Labenz
There's always this. I find this fascinating and I'm, you know, as you can tell, very novice in all of these different aspects. But I do find the question of, like, is the law really the law or is it sort of like this meta situation of like figuring out how we want to interpret the law to do what we want to do. And I think that's always a really interesting question. So you mentioned scraping. That's starting to get like, pretty close to the current question. Right. My understanding is that if stuff's out there on the web, like you're broadly kind of allowed to scrape it to a first approximation, like LinkedIn, you know, lost some cases, I understand, against folks who were scraping LinkedIn profiles and whatnot.
Cecilia Zaniti
Yeah. So this gets to, like, copyright and commercial rights are kind of different. So LinkedIn has in their ToS, presumably, that you can't scrape by automated means in such a way that it takes down their site. Right? And, you know, it would be a violation of the ToS. That being said, we do have this case law around, you know, use for reverse engineering is okay. And then facts themselves are not copyrightable. So that great telephone book case I mentioned, Feist versus Rural Telephone, went all the way to the Supreme Court, and the copier won in that situation. The Court said, okay, there may be other things, but in this scenario, copyright was meant to promote creativity, and it's not meant to promote labor, just the work of putting together a phone book. It's meant to promote the creativity. And so that's the line of cases that LinkedIn effectively lost on. But, you know, there's other ways now, right? Like how you calibrate the scraping, if you're actually going around a paywall. And one of New York Times's arguments here is that they very carefully calibrate where to put the paywall. Maybe you get a guest article, you get a certain number, you know, maybe if you click from Instagram, you get the article for free, but then if you try to reshare it, you know. And that's their right. Like, this is really a property right. Getting back to that carnival parking lot example, I can decide to just have my parking lot open on Sundays, and that would be my right. And here the property holder, New York Times, says, I get to decide where and how people see my content. And so that's how it kind of relates. In the scraping case of LinkedIn, you know, the fact that my name is Cecilia Zaniti and my title is founder and CEO of gcai, like, that's just a fact. And so, you know, even if somebody discovered that through LinkedIn, LinkedIn would not have a claim on that.
Nathan Labenz
Then maybe let's start to try to separate some of the possible issues of what's going on with the New York Times and OpenAI. Because I could see different kinds of complaints that the New York Times might have. A few that come to mind are, one, you took all of our stuff and loaded it in with a bunch of other stuff and trained your models on it. And we may or may not like that. Two would be, we've seen these examples of ChatGPT, like outputting very close to verbatim, essentially plagiarizing an article. I guess I don't know if it would count as plagiarizing if it has been effectively attributed to the New York Times, which I'm not actually immediately sure whether those examples did say this is a New York Times article or not. But in any event, it was able to essentially have memorized and, as OpenAI has called it, regurgitate the full text verbatim or very Nearly verbatim. So they might not like that. And then I guess a third one would be they now have these browsing capabilities where ChatGPT can go out and access stuff on my behalf and pull it back. And so now we kind of have Google like questions of like, or maybe even Facebook like questions of, well, what's the right way to do that? Should they be able to show the whole article? Should they be able to just summarize the article? Should they be able to give you the headline only and link out to it? So is that a good organizing framework for the issues here? And am I missing any? And which ones are really kind of the core ones that are at issue?
Cecilia Zaniti
Yeah, no, you separated it perfectly. And in fact, that's how both the law and, you know, OpenAI and people have been thinking about it. So the immediate one, one of the reasons I thought the New York Times case is stronger than others, is that regurgitation issue. So they hired, you know, somebody to go and look and get these verbatim articles, and they have a deep enough library that they got it. So that one is much more clearly infringement. Because if you look at the factors, it's like, okay, you're using the whole. You're a possible substitute. It's really taking away that control from the copyright owner. And it's not really transformative, in the sense that if somebody wanted the original article, they could go to New York Times or, as shown there, they could go to OpenAI. So that's kind of this regurgitation issue. And OpenAI has categorized it as a bug, a singular bug. Now, I don't know, I'm not an engineer, but that seems a little weird to me. But in any event, they're trying to show it's an isolated thing that they are fixing. I would expect them in discovery to come out and defend on that. But that issue, per OpenAI, and I think you could say from reading the complaint, is not really the core issue. The core issue, the reason I think this case has caught so much attention and could be a watershed, is really the training. Like, I talked with a creator over the weekend. They've got a podcast, and they're a relatively famous creator, and they said they've had three or four people email them and say, oh, hey, I trained my own GPT on all your work, and it's been so helpful. And she's like, okay, thanks. And it's sort of like, how am I supposed to feel about that? Maybe it's not verbatim copying and, you know, who knows what it outputs? 
But even if it rewords the phrases, if it's, you know, the advice, the business advice that this person would give, should there be compensation for that? And that's its own question. And copyright law is not a perfect fit because of what I said. But with the four factors, you can figure out how you might argue it. It'll be fascinating to see how New York Times and other creators argue it. But I do think, you mentioned the idea of music covers, the music industry is actually a great analogy here because there's a very clear and robust system for how you license music. So Weird Al, who does funny songs where he takes very popular songs and makes them about potatoes or something, he actually licenses all his music because he's like, I don't want to mess around. I'll just pay. I know it'll be a good work. I'll just pay the original. And that's what he does. So, you know, that gives you evidence of a very sophisticated creator electing to pay because there's this very clear system. So that was also the benefit of iTunes. iTunes came out and it was like, okay, we knew that Napster sort of felt, I was talking with Jason Calacanis last week and he's like, yeah, it felt like stealing. It felt that way to me. And that's where you get that balancing and those kind of public good, community concerns. Here, it doesn't quite have the same feeling as Napster, but it doesn't quite feel like iTunes either. Maybe an iTunes will emerge where, for my friend the creator, somebody creates a GPT of all her work and it's helpful to them. You know, should she get paid in some way? Should there be some kind of Google result where it's like, you can do this? You know, maybe. And I think that that could be what emerges.
Nathan Labenz
Yeah, fascinating. Okay, so it's safe to say the law was not written with any anticipation of these new technologies. And, you know, maybe in a few minutes we can kind of turn toward what the law should be, or, you know, there's always the possibility that Congress could do something. But for now, going back to the fair use defense, the main one here, as I understand it, is just the transformative nature of the activity. Right? And it's funny, the resonance, I don't think this was planned either, but the transformer being the architecture that is, you know, commonly in use.
Cecilia Zaniti
Exactly. No, names matter, right? I mean, think of Uber's God mode. And, you know, the privacy folks got really upset about that. And when I advised engineering teams internally, you know, people are like, oh, we're going to come up with a product called Eye of Sauron. I'm like, yeah, let's not put that in the code, is what I advise. So, yeah, this kind of terminology matters in advocacy. Like, the fact that it's called Transformer is kind of a nice fact. 100% agree with you.
Nathan Labenz
So it seems like it is, you know, to say that this is a transformative use on the face of it, seems pretty clear to me. Aside from, I mean, there is this regurgitation issue, which, you know, I certainly don't think that's like how most people are using ChatGPT is to like, try to get it to spit out old New York Times articles. So it does seem like there's a pretty just, you know, man on the street, hey, we took all this text and we made this artifact out of it. And this artifact can do that. Would you call that transformative? I would say, like, you know, the vast majority of people would be like, yeah, that's clearly transformative. Somebody will dispute everything. So who is disputing that and how are they disputing that?
Cecilia Zaniti
So that's all right. And the legal analysis, you know, you're half a lawyer already, just having explained it that way, so good job. But, you know, what the fair use analysis looks at is: is there a new purpose, meaning, or message in the thing? So classically, you know, a critical article about a movie. I'm Roger Ebert and I'm writing a critique of a movie and I quote a couple of lines. You know, that's clearly a different purpose: you're reading to evaluate the movie. Different meaning, really, and different message. So Roger Ebert is good. In terms of GPT and it being a different way to use the work, that's obviously going to be a huge percentage of their case, really pushing on that and how it's different. Where New York Times would potentially poke holes in that is: okay, but you have to implement measures so that it won't be a substitute. And this is where OpenAI will say, and has said in their response in the blog post, that, you know, we're working on fixing this bug, the regurgitation bug. I would be curious to see stats on it. We have different content measures that prevent users from asking for the copies. So this is the equivalent of, like, if you go to Kinko's with something, you actually sign a piece of paper, or you click in their online interface, that you have rights to copy whatever it is. And, you know, there are cases on that. And OpenAI would say, every single user signs our Terms of Use. Our Terms of Use say that if you do a bunch of fancy prompting to get regurgitation, that's a violation. And they would have to have a Terms of Use protection, but then they would also have to have technical ones. Right? So Napster never actually got the copies of the music, but they knew that that's what was going on. They knew that copyrighted music was most of the file sharing. It wasn't sharing, you know, open source notes or anything like that. 
It was really, if you looked at a Napster server at, you know, University of Illinois in 1999, it was all, you know, Soundgarden or whatever, like musicians from there, Metallica, as the case may be. Right? And so that kind of knowledge, of potentially contributory infringement or other kinds of infringement, is like, what is the technology provider doing? Now, you could end up with a congressional solution. So the DMCA, the Digital Millennium Copyright Act, it was passed to deal with the fact that, like, if I can post pictures online, which of course I can, you know, on early sites like GeoCities, Shutterfly, whatever, I can actually put up pictures, and they are potentially infringing. DMCA says, okay, you're the photographer, send a note to Shutterfly, get it taken down, send a note to GeoCities, get it taken down. And there's a whole kind of process for that. We don't have that yet for LLMs. And so you might expect that something like that would emerge. For it to be fair, for OpenAI to truly be able to say, like, copyright infringement is not our intent, we're not intending to be a substitute, if you're a rights holder, you can do XYZ things to get your content either pulled out or into some kind of compensation system like the music folks, that's going to be where the play is. But of course, in that process, they're going to defend and say no. It's called the General Purpose Transformer. The intended use of this device is like a VCR for your own personal use. So that's how they would go on that.
Nathan Labenz
So is there any in between right now, like for example, the emergence of Spotify, we now have these deals and I guess they could always make a commercial deal between them. The New York Times and OpenAI could say privately, here's our deal, we'll pay you X, you'll not give us a hard time anymore. But if they don't want a deal, is there anything between OpenAI winning and it being like, yep, this is fair use, you're cool. Or on the other end, like, it's not. And I guess then the downstream question would be like, then what happens? They've made some pretty aggressive demands around the destruction of models. Right. So is there any other legal in between?
Cecilia Zaniti
Yeah, so the law is actually relatively adaptable in that sense. So as you mentioned, New York Times had different claims, so different counts of the lawsuit, like, you did this, you did this, you did this. That is typically kind of bifurcated and analyzed directly. And then, you know, it will take time. But it's very common for one issue to go up to a higher court, and then that court, it's called remand, basically sends it back down to look at the specific facts. So procedurally, can the law handle these sort of shades of gray and different accusations? Yes, it can. In terms of what will happen, and is there a possible middle ground? Yes, I think there absolutely is. You could have a scenario where the rule comes out saying, okay, regurgitation is infringement; for a use to be fair, the model provider has to take reasonable measures to prevent regurgitation, something like that. And then the question becomes, okay, what's reasonable? Is there some kind of industry standard that emerges? So that's one. Another way it could go is, you know, they could absolutely settle. They could have damages going backwards. So for GPT-3.5, which I think was a little bit worse on regurgitation than GPT-4, it could be like, you know, that model, we get some kind of royalty for that. Or, you know, there could be infinitely many solutions. You remember the bundle of sticks I talked about? It's a really helpful framework to think about. Like, okay, where do we put the line? How many sticks does New York Times get to keep, and which ones are they going to rent out? Which ones are they going to get paid for? But what's hard about this is that, you know, I think a lot of us in technology believe that generative AI is absolutely going to be incredibly lucrative. Trillion dollars. You know, I think BCG says it's like a trillion dollars of productivity is coming. 
That being the case, with any deal you do now, are you kind of selling the baby? Because New York Times wouldn't get the eventual value. But, you know, it's an interesting question that I don't have an answer to, but I do think a middle ground is very possible. I disagree with some commentators that are like, oh, it's the end of OpenAI. No, like, that cat's out of the bag. It would be very unlikely to get any kind of full injunction at this point.
Nathan Labenz
So. But there isn't anything that's like, you can do this, but you have to pay. That would be between the parties.
Cecilia Zaniti
Yeah, I mean the concept injunctions themselves are also considering four factors. And one of the factors is is it a harm to whoever's seeking the injunction that cannot be solved with money. And if you think about it, there's actually very few things that can't be solved with money. And so what ends up happening is, as you said, is that it's a private agreement. The court. It's typically unlikely. I mean, sometimes you might get fines. Copyright does have statutory damages. So in the case of Napster, the record companies did go after individual people. There's a woman, Jamie Thomas lady in Minnesota that had, you know, I think she had 80 songs or something. And at the time the statutory damages were like, I think they were 15k or something. They go up in valation. But anyways, she ended up having to pay $400,000. She was just like a regular lady. And so like that was the record industry saying, no, no, we're going to get our specific damages per work. And so you could get something like that. I think it's unlikely. Also OpenAI has said they will indemnify. So unlike Napster, which in that case didn't indemnify this woman. OpenAI has said they would. And so, you know, I think this is gets it more in the kind of clash of titans scenario that you, that you mentioned at the beginning.
Nathan Labenz
Yeah. Okay. So I guess another question is what are the damages? I mean, it seems very unlikely. This is something my wife, who is a lawyer has trained me to think about. You know, anytime somebody's like, we're going to sue him, she's always like, well, are there really any damages? And a lot of times there's not. This seems like, like it's, it's very hard for me to imagine that they're going to be able to produce a ton of people who are like, yeah, I canceled my New York Times subscription because I now have ChatGPT. You know, it's like they're pretty different products, and I really don't think many people are doing this regurgitation thing. So it seems like it's more of like a principled thing. Right. Like, if I'm reading the two sides and I'm trying to infer what they really care about, it seems to me that OpenAI probably wants to set a precedent that what they're doing is allowed and fine, and they want to put this to bed once and for all. And this is like a good chance to do that. And then on the New York Times side, it's a little bit harder for me to say, but it almost seems like they want to, like, it seems like they want, like, Congress to act. Like, it doesn't seem like they really are going to win in the courts. They might need, like, Congress to say, hey, we need a new regime here entirely, because this training thing is, like, it is transformative, but we can't necessarily have everybody's stuff, you know, sucked up and transformed with no compensation.
Cecilia Zaniti
Yeah, I mean, a congressional solution is very possible. And this kind of strategic, you know, litigating with that in mind is very, I would say New York Times is likely doing that. New York Times has also lost at the Supreme Court before on copyright. And so, you know, they, they have that history as well, but in terms of like, the, the, the damages and is it the principle of it? You know, sure, like, there's definitely principle involved here, but that's actually something that the law is pretty good, pretty good at handling. Right. It's literally saying, like, weigh the public good. Like, what is this? Will New York Times or future journalists, future creators, continue to create works if they know that they're not going to be compensated and that they'll train LLMs? You know, that's, that's the kind of thing that I expect a court, the court would consider certainly at the highest levels in terms of the damages. I mean, you do. If I'm a lawyer for New York Times, you know, they got, they got good lawyers and they can find people. You know, I'm seeing Andrew Chen, who I love, Andreessen Horowitz, investor. He posted a long and viralish tweet thread about how he uses GPT only for news. And so, you know, that's one example, but it's the kind of thing that it doesn't take a huge mental leap to Think like I have ChatGPT open as a window all the time, right? So this is, this is a tech shift, right? And so like when these behaviors change, you know, who. And you play musical chairs or whatever, who's holding the bag or mixing metaphors. But anyways, there are times to, to fix my metaphors. But anyways, you know, you can see why they're doing it. Does come down to money too, of course. My favorite quote about copyright was law school. We had the attorney for. It was like a big rapper. I don't remember if it's like P. Diddy or Beyonce, some. Somebody as a guest speaker. And they said, you know, the question was, how does copyright come into play in your work? 
And he just like totally deadpanned is like, well, copyright is not about the money. And all of us are like, surprised. He's like, it's about all of the money. And we're like, okay, like, so that was a very kind of, I guess to your, to your wife's point, like a very sort of cynical way to look at the law, but, but, but helpful. And so, you know, in a scenario where New York Times can get an ongoing royalty forever, like, of course they're going to try to go for that.
Nathan Labenz
But just to be clear, that wouldn't be possible today. Aside from direct agreement by the parties, like a judge cannot say you have to pay in perpetuity.
Cecilia Zaniti
It's not as clear here. You know, there are scenarios where, like in the Napster scenario or, you know, with watermarks, you could show a specific number of copies were made, or in the context of patent lawsuits, there are sometimes per-unit damages that are assessed. And that's the normal framework. Here, I agree with you, it would be technically quite hard. Although, to the point of discovery, you know, you can come up with math to support it. Right? So New York Times had a graphic saying, okay, we're X percentage of Common Crawl. Common Crawl is X percentage of your training data. Divide your valuation of $100 billion by that percentage and that's what you pay. So you can come up with something, but it's not going to be very clean.
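As a rough illustration of the kind of math described here, the sketch below chains two shares together and applies the result to a valuation. Read literally, "divide by that percentage" would make the number larger as the share shrinks, so this sketch assumes the intended arithmetic is multiplying the valuation by the combined share. The function name and both share values are hypothetical placeholders, not figures from the complaint; only the $100 billion valuation comes from the conversation.

```python
def proportional_claim(valuation_usd: float,
                       source_share_of_corpus: float,
                       corpus_share_of_training: float) -> float:
    """Apportion a valuation by a source's share of the training data.

    Both shares are fractions between 0 and 1. The combined share is their
    product: (source / corpus) * (corpus / training data).
    """
    for share in (source_share_of_corpus, corpus_share_of_training):
        if not 0.0 <= share <= 1.0:
            raise ValueError("shares must be fractions between 0 and 1")
    combined_share = source_share_of_corpus * corpus_share_of_training
    return valuation_usd * combined_share


# Hypothetical inputs: these percentages are made up for illustration only.
VALUATION_USD = 100e9             # the $100 billion figure mentioned above
SOURCE_SHARE_OF_CORPUS = 0.001    # assumed: source is 0.1% of the crawl corpus
CORPUS_SHARE_OF_TRAINING = 0.6    # assumed: crawl corpus is 60% of training data

claim = proportional_claim(VALUATION_USD,
                           SOURCE_SHARE_OF_CORPUS,
                           CORPUS_SHARE_OF_TRAINING)
print(f"Illustrative claim: ${claim:,.0f}")
```

The result is only as defensible as the two shares and the valuation fed into it, which is the "not very clean" part.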
Nathan Labenz
To your point, very interesting. How do we think about the different modalities? Because on the one hand we have text, and it seems like, if I'm grokking this, it's like, okay, if you directly memorize and regurgitate an article, you're probably in some trouble, and they kind of recognize that to an extent where they're trying to fix it. Then on the other hand, we've seen a lot of Mario and Luigi images generated from DALL-E 3, and those are not exact reproductions. But even though they're kind of different, and they might even be quite different, it still seems like those are kind of drawn in the circle of things that are not allowed or, you know, where you would have to pay royalties on them or something. Right? So how should we think about different modalities, from text to image and then, you know, maybe music? I don't know how many different modalities we can really consider, but it seems like it varies.
Cecilia Zaniti
Yeah, so you've got kind of an overlay. You can get to exponential complexity pretty quick because you've got, you know, the four types of IP that we started with at the top of the hour, and then you've got, you know, kind of X different modalities with video, images, text, music, et cetera, brand, and, you know, the number of legal permutations and then technical permutations. I do expect that probably either Congress or courts will have to come out with something more clear, or we're going to have, like, the canonical case for each type. You know, in the case of Mario, Sony for its video games and Nintendo as well, they're incredibly active rights holders, and you can see why. You know, a billion people have played Mario, and their creator, I read an article about him, like, he's literally in kids' dreams. Like, this is a thing, creating this world. But then you think about, you know, one of the most savvy acquisitions, Disney's acquisition of the Marvel Universe. It's all IP, and the ability to exploit that again is something that they paid for, and Disney is an incredibly successful, enduring corporation because of it. And so in terms of the different modalities, I do think the New York Times case is cleaner, being just text. But we're going to see, you know, the visuals are incredibly compelling. I made a bunch of Marios myself. But if you think about the idea-expression dichotomy, even with Italian plumber, you can think of lots of permutations that aren't Mario from, like, draw an Italian plumber. Maybe he's super stylish and he's like that die-workwear guy on Twitter. Like, you know, he's got his, like, prom perfect suits. That would be more Italian. I'm Italian, and I don't dress like Mario or whatever. And so you can imagine that the ones where it's so clearly, it's like a feeling. 
And the fair use cases on copyright that get at graphics, they really are about that. That sort of, like, is the impression that this alleged infringer gives so clearly the original that there's no way they could have come up with it on their own? Or it's just like not a thing. That's the test that's used. It's a difficult test, but it's been applied. There was a restaurant that was inspired by, oh, SpongeBob. Yeah. So it was like a SpongeBob-inspired restaurant. But it was like all the same stuff. Like, the crab looked the same. I think they called it, you know, it was like the Rusty Crab instead of the Krusty Krab, something like that. Anyways, in situations where it's so clear, I think even in an AI context you're going to end up with licensing, and what I would expect is some kind of brand registry. So now if you ask DALL-E to make you Mario, you get an error. And if you ask for Italian plumber, you know, they probably created a bunch of keywords, but I would expect they'll have some sort of self-serve UI for publishers and rights holders to kind of claim things. But again, you know, how scalable is that, how clear is it? You know, DMCA is very clear. That was a congressional solution. So you could end up with that here too.
Nathan Labenz
I haven't studied in detail how OpenAI's opt out works. You may know more about it, but my understanding of the current state of play is that OpenAI has created an opt out, even just for the text side, and they have done a deal with Shutterstock on the image side, and perhaps other deals as well that are not disclosed. So it's interesting that they're going proactively for the licensing on the image side. And on the text side they're basically just saying, hey, you can opt out if you want to. It's not entirely clear, even on a technical level, how that would work. If you've just, you know, trained GPT-4 18 months ago, now it's kind of baked, right? And, like, are you really going to be able to extract? You're not going to retrain. So what exactly does that opt out mean? It would be easier to understand if it was just at the level of, like, the browsing features. I don't know if you have any more clarity on that.
Cecilia Zaniti
Yeah, yeah. So OpenAI points to, in August they rolled out specific respect for robots.txt, so you can just give the instruction. I think it's like OpenAI, you know, colon, do not crawl or something, or we can put it in the show notes. But that would apply also if you have images on your site. So you can do it that way. But in terms of models that are already trained, so GPT-3.5, et cetera, you know, you can't. You know, it's like a cake, right? You've baked it with flour. You're not going to extract the flour, the vanilla, from a cake.
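For readers who want to see what that instruction looks like in practice, here is a minimal sketch. It assumes the crawler identifies itself with the user agent token `GPTBot`, which is the name OpenAI has published for its web crawler rather than the "OpenAI colon" wording recalled above. The sample robots.txt content, the `crawler_allowed` helper, and the example.com URL are illustrative. The check uses Python's standard-library robots.txt parser.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: block the assumed OpenAI crawler token everywhere,
# while leaving the site open to every other crawler.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def crawler_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/articles/some-story"
    print("GPTBot allowed:", crawler_allowed(ROBOTS_TXT, "GPTBot", page))
    print("Other bot allowed:", crawler_allowed(ROBOTS_TXT, "SomeOtherBot", page))
```

A rule like this only governs future crawling by crawlers that choose to honor it. It does nothing about content already baked into a trained model, which is the point of the cake analogy.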
Nathan Labenz
Like, okay, it's been transformed at this point.
Cecilia Zaniti
Exactly. It's a cake, right? So, you know, especially in the case of these small ingredients, like, you might cook something with nutmeg and have, like, two little shakes in there, but you can tell. And so that would be New York Times's argument: like, okay, two little shakes, but we made the model this much more fluent or whatever. But, you know, technically, though, I'm with some of the commentators on Twitter who have said, you know, if they're going to figure out AGI, they can figure out how to not have infringement. There's people that have said that they're using GPT-4.5 to actually do the copyright analysis. So if you ask for some of these queries where you're like, give me an Italian plumber saving a princess, in my experience, my own testing, I get like the start of a result and then it'll catch itself. It'll render and then it'll say, you know, error, copyright policy. And so some people on Twitter have opined that maybe they've got the more advanced models checking for infringement on the prior model.
Nathan Labenz
That is definitely an interesting strategy that I, I do believe they are pursuing in various ways. And, and, and both kind of forward. You know, they're trying to do this weak to strong generalization as well. So they've got a lot of like model. Well, this is just out of their super alignment team. I shouldn't say just because it is a. I think it's a big deal and it's. I haven't figured out quite how I understand it yet or how much confidence I have in it, but basically they have. The general premise is the systems keep getting more powerful at Some point they're going to be more powerful than us. We still want to teach them what we want them to do, but we need them to be able to generalize beyond our ability to supervise directly. So they then create a toy version of that problem where GPT2, I think becomes the weak teacher and GPT4 or whatever is the strong student. And the question is if GPT2 gives you sort of a. Because it's not that good, right, but it kind of knows what it wants on. They have different questions, whatever, but if it can give you this sort of imperfect but like directional signal of what to do, how does GPT4 learn from that? And does it perform better perhaps than the, does the strong student perform. Perform better than the weak teacher? And they show at least some weak to strong generalization. I'm not a big believer that this is about to solve our problems, especially given some of the hacks that I kind of understand that are included there. One of them is, you would ask, well, how much better can the strong student do relative to the weak teacher? And it seems like they can turn up the delta. The, the strong student improvement relative to the weak teacher they can increase, but they do that by increasing the confidence of the strong student. So it's like basically they make it more willing to override the weak teacher and therefore it can do better. But I'm kind of like, well, wait a second. 
If the solution is going to be to tell these super strong AIs that they should just be more confident in overriding their weak human teachers, like, that doesn't sound like a scheme that I'm like ready to bet the fate of the world on.
Cecilia Zaniti
That's fascinating. It's so interesting. The teacher metaphor, though, that's what I'm taking out of what you said. As a matter of pedagogy, that's been one of the most interesting things. I teach a prompting class for lawyers, and in that class I make the point that these models are basically fast, smart, overeager interns who have read the entire Internet. And so in that scenario, some of these methods, like, I think there was a guy that found out that you could tell OpenAI, or GPT, that you would tip it and it would do better. Somebody even tried it with a dog treat, where you say, you know, get this answer right, you'll get a dog treat. And so it's interesting that, whatever the behavior, whether it's Anthropic doing a constitutional AI thing or the Superalignment work, whatever it is that you don't want, copyright infringement, whatever, how do you actually teach it and encourage the model to do that? It's a fascinating question. I'll have to read more on this. Like, strong student or weak teacher? And I'm like, oh shit, am I the weak teacher? And my students are great.
Nathan Labenz
The premise is that humanity collectively is the weak teacher. So eventually, anyway. And you know, depending on how much credence you want to put into some recent Sam Altman comments, perhaps not even that far into the future.
Cecilia Zaniti
Yeah, that was a great conversation. The one with Bill Gates, super interesting. I also thought it was funny that he said his most used software is Slack. So I was like, okay, he's human like the rest of us.
Nathan Labenz
Okay, so coming back to the intellectual property stuff for a minute and then I do want to spend. If you have a few extra, I'd love to hear a little bit more about your company as well. The AI doctor of our global humanitarian dreams is too good of a promise to lose out on because we want to protect the Marvel Cinematic Universe and Disney's rights to exploit it. So there's something here where the collective good has to trump at least some IP considerations. I feel confident in stating that. Does the law have anything like that today?
Cecilia Zaniti
Yeah, so I mean, I think software and biotech having different interests, as an example, has been the case for some time. And you know, the patent system, you mentioned a lot of people say it's broken, and then it's like, okay, do you separate out pharmaceuticals from software? So there are examples of that. In the context of fair use, to the credit of the law since whatever, 1851, when fair use was first a judicial doctrine, right around when the New York Times got started. But anyways, back in the 1800s, it calls for this kind of public good balancing test. And it's not a surprise, right, that in OpenAI's blog post in response they said, you know, not only are we fair use, but we're good for the world. And the link that they cited, you'll appreciate this because it's a legal thing, they cited somebody saving the life of their dog using ChatGPT. Now, two things are interesting about that. One is that, obviously, it brings to mind the public good. But two is they didn't cite a human that saved themselves, because they knew there was liability, and they say we're not a replacement for a doctor, so they picked a dog example. Similarly, you'll see, Fitbit famously can tell if you're pregnant, right? So when you're pregnant, you have more blood in your system and it shows up in your heartbeat and your other stats that Fitbit can see. And when folks talk about that, because of the law that says Fitbit is not a medical device, it's a personal fitness device, blog posts about that are like, and then I talked to my doctor and they confirmed. And so they're always sort of citing the doctor. So the law will find a way. I do think, like I said, you know, the cat's out of the bag. Do you need to have a bottle?
This technology, you know, what's gotten me so excited as a builder is that I can't stop thinking of use cases for it, and that was not the case for me with blockchain or with Web3. Like, you know, it's interesting and I can see the rationale for it, but it doesn't have this like, oh, my gosh, we could apply it to this. We could apply it to this. We could apply it to this. I was talking with a friend who's a plaintiff's attorney, and they mentioned that when they're deciding whether to basically go to the mat and sue a company, there are some of these soft factors that would come out of, like, Crunchbase: does the company have money? How do they react to lawsuits? That kind of analysis is deep and takes a lot of time. And you can imagine a transformer trained on all of the law and all of current business could come up with that result pretty quick. So, things like that, where my lawyer friends in particular have even gotten excited. We're kind of like engineers. We're not super excitable, and we're sort of naturally skeptical. And so GPT has reversed that. So I think the possibilities are super strong. And like I said, you know, I'm maybe not quite Marc Andreessen level, but I'm quite a tech optimist on this.
Nathan Labenz
Couple other questions that come to mind there. One, we didn't even talk about it really at all. But if I were to say what has surprised me about how 2023 went, the fact that there hasn't been more pushback from the kind of licensed professional classes is definitely pretty high up on my list of surprises. I would have guessed we would have seen an AMA versus OpenAI sooner. Right. You're giving out medical advice without a license. Yeah, you caveat it, whatever. But really everybody knows that. Everybody's asking their medical questions here. As far as I know, not much of that has happened really at all. We've had like a couple of the sort of tragicomic stories of like the lawyer that, you know, had ChatGPT draft the thing and filed it and got in trouble that way. But we haven't seen this like major clash of interests. Do you have any theory of why that is? It could be like maybe they don't have as much standing as I think. But I'm surprised that there hasn't been more conflict there.
Cecilia Zaniti
You know, it surprises me a little bit. Certainly, you know, I do see some of it. So, you know, the California Bar came out with guidance about using ChatGPT, and then of course the story made big news. But you've got, you know, even John Roberts at the Supreme Court said a few weeks ago in his report on the state of the judiciary that this will change legal practice and that legal research will soon be unimaginable without AI, was his quote. And you know, why that is? I think a couple of things. So one, it's super useful. Literally, using my product, which sits on top of GPT-4, I can draft a demand letter that would have taken, you know, 10, 15 minutes or even an hour in like a minute. It's pretty good. Like, it's literally the jaw-dropping moments I have in my classes from lawyers. So it's good. That's one thing. The second thing is we kind of remember the Internet because we're in it. Like, most people who are practicing lawyers now, you're going to be at least, you know, 25 to have gotten through law school, and you remember kind of the Internet coming up and Google coming up, and you're like a digital native, I guess. And so based on that, I think the sort of future of the profession gets it. And then three, I think this unbundling of expertise has been happening for a bunch of years. You know, Ben Thompson had a good article about it at Stratechery, about how he gets all these eyeballs, but he's just a guy on the Internet. And that really is the democracy of tech. And that's what tech would point to. Now the query, though, of what will it mean to be a lawyer? We are worried about that.
Like, definitely, there are some folks who are kind of like, oh my gosh, where's my job going? And you see that and you definitely hear about it. But then we see the case of the made-up cases. Maybe GPT-4.5 or future models will be better, but you do get garbage in response to legal queries fairly often. And so I'm really bullish on lawyer in the loop or professionals in the loop. But doctors, I don't know doctors. The other thing, I mean, is the physical: you still need to be seen. So I think Dr. Google is not great. Maybe Dr. ChatGPT is better, but we'll see.
Nathan Labenz
Yeah, it's happening quick. There's another recent paper out of Google DeepMind where they have a chatbot competing against humans and also against humans plus AI for diagnosis purposes. And it's one of the great buried ledes. I'm a collector of great buried ledes in the AI space. And this is one where it's like, by the way, the AI is outperforming the human 2 to 1, and it's also outperforming in terms of accuracy. It's like 60 to 30.
Cecilia Zaniti
Was it radiology, or what was the question?
Nathan Labenz
They've been on a tear with a, you know, a series of different things. This one in particular was a chat modality only. I think I need to double check that. But it was by far the biggest delta that I've seen between human and AI in the medical field. Like previous results have been more like they're comparable or like the AI is narrowly edging the humans out. But we shouldn't over interpret this and this one is like getting to the point where it's going to be pretty hard to come up with all the caveats to be like, we really don't have to take this at face value because the ratio is a full 2 to 1 in terms of just getting to the right diagnosis. So certainly there are things that, what I always tell people is if I have a serious health issue, I will want to use both a human and an AI doctor. I'm not going to leave either one on the table at this point. But you Know, I don't have access to that system because again, they haven't, you know, put it out there. But it sounds, you know, very, very good. And you know, at 2 to 1 it's like going to be hard to defend the licensing regime.
Cecilia Zaniti
No, I mean, that's right. I mean, one of the bar associations, actually the California Bar, has this duty of efficiency to your client where, if you have the possibility of using a tool to zealously represent your client, you're actually kind of obligated to do it. So it's an interesting angle, including for a doctor. If your ultimate goal is patient health, then yeah, you would want to use these tools yourself. But I think, you know, somebody's got to train it too. Right? And in this case it's trained on the whole Internet, to my point, where some of these common issues, like, I've been continuously part of an online moms' group since my first kid was born 18 years ago. And I've written probably volumes of things about various kid ailments. And so yeah, probably, you know, like these ones that are funny, like toxic synovitis. I'm like, what the hell is that? Sounds super duper scary. Okay, randomly, if your kid's leg ever freezes up for like a day, it's that. You can ask your doctor and they will calm you down. But you're like, oh my God, my kid. But I discovered it through these online forums, and then I called the doctor and that was the case. So yeah, I mean, again, I get back to the optimism, and I share both your curiosity and your excitement about it.
Nathan Labenz
Two more big picture questions. And then the company. So open source, we haven't really talked about throughout this conversation. It certainly changes the practical reality. If all of a sudden, you know, GPT-4 were ordered destroyed, we would still have Mixtral, and it would be out there, and it's just kind of out there, right? There's a company behind that, and you could sue them. But we also see these increasingly, you know, collective forms, whatever. There's certainly a way to organize and train a large model such that there's really not going to be a good target to sue. And also, once the weights are out there, they're out there. So on the practical ability to enforce, or to go sue somebody, or to extract rents, open source does seem to change the game. How would you think about that from a legal standpoint, though? Is it relevant in New York Times versus OpenAI that they would be like, yeah, by the way, this is all out there for everybody and free and with nobody that you can sue? Or does that just kind of have to sit to the side for legal purposes?
Cecilia Zaniti
Well, a couple things. So first of all there's like kind of always somebody that you can sue. Like yeah, if it's open source.
Nathan Labenz
Like spoken like a true lawyer.
Cecilia Zaniti
Exactly.
Nathan Labenz
There's always somebody you can sue.
Cecilia Zaniti
You know, in the case of, like, whoever provided the resources, or you could sue somebody for hosting it, or, you know, there's things to do. But how does it change the picture? I think open source is just one more way you can exploit your rights. Right? Open source is, what is it, free as in freedom, not free as in free beer, or whatever. It doesn't mean there's no money changing hands. Right. And so all the great open source companies that we've seen come up, and just models, like, you know, I follow JJ on Twitter, and he makes a great case that open source is actually both the way to advance software and to make a lot of money. So I don't know that it necessarily changes the New York Times picture. I do think there will be something around crawling and trespass and providing some kind of opt-out. Otherwise, as a downstream user of the open source, you can sue the downstream users, and yeah, okay, you got it off the Internet, but you knew it sucked. Then it goes to the Napster scenario, where you have the individual woman who was sued. It's like, oh, it's just open source, I have no idea. It would be hard for a legitimate company to make that argument. And you see open source compliance being a big thing. Right. You know, there's startups now that are checking how you do your open source dependencies using AI. And so whatever the problem is, you can apply AI to that problem and then you can find somebody to sue. So that would be my take there.
Nathan Labenz
Okay, so last big picture one. You know, you said earlier that it's kind of hard to predict how these cases will play out. So you can do that if you want to, but I won't necessarily ask you to predict how the courts will rule. But I guess I'm really curious as to how you would rule, maybe under current law, and then maybe even more interestingly, what do you think the law should be? We have this kind of congressional paralysis that unfortunately leaves us, I think, in a spot all too often where we're like, well, we couldn't hope for any new rules to really be appropriate for this, so we just have to kind of force everything through the old paradigm. But I would love to hear some forward-looking thoughts on, if we were going to write new rules, what do we think they ought to look like?
Cecilia Zaniti
What should the rules look like? So the scenario where my friend the influencer has got people creating GPTs exclusively of their work, it feels like there should be a solution for that. And my prediction is actually that there ends up being a commercial solution to that, where there's some startup that's like, hey, GPT yourself. Or it becomes so easy that this creator could input her website, and then it's literally like you can get her knowledge from there. So some tech solution where, if you're literally wanting that person's insight and voice, there's some compensation on that. So I predict something of that nature. Another interesting thing: people have started watermarking. There's actually, I think, a company called Nightshade where you can actually poison your images, where it will actually infect the training from there forward. That feels kind of illegal to me, so I don't think that'll be the solution, but something where there is some sophisticated watermarking. There's a bunch of startups looking at that. I would expect that OpenAI or Microsoft or one of the players will buy one, and then it will become really the legit marketplace. And then the third thing I would expect is just a lot of deal making, right? So, you know, Kindle, which is digital technology that affected a legacy industry quite a lot. There was just a lot of deal making directly from Bezos himself with the publishers on that. So I would expect a lot of deal making. In terms of how to rule, I definitely would do kind of what you laid out, Nathan, where it was like the three or four different claims, and really distinguish them in the New York Times case. Like, okay, regurgitation.
I expect the court would rule something like, you must take reasonable measures to prevent it. More of an almost tort type of approach. When there's a wrong in the law, that's called a tort. So a tort approach to that, I would expect. And then on the training issue, you know, it's tough. It's definitely going to be real transformative, but I think there will be some individual rights after. Or maybe you register and you say, hey, if you do a training on me, I have the opportunity to test it, or you can get a report. You can imagine OpenAI saying, here's the number of times that people ask for something related to the New York Times. So I imagine it will be a sophisticated suite. And by the way, Google has had 20 years of being able to think about how to deal with IP claims. You know, they have a transparency report where, I think it was 7 billion, they did 7 billion takedowns last year or something like that, some crazy number. So I expect a system to emerge.
Nathan Labenz
Yeah, there's this Anthropic research about influence functions, tracing model outputs to the training data. I don't know that it's scalable on the level that it would need to be to create a sort of Spotify-like rev share, but it's the closest thing I've seen to it, and I think it's probably flawed in many ways as well. It's definitely not as clean as Spotify. You go on Spotify, you play the song, you know whose song you're playing. You can kind of divide up the revenue in a pretty clear-cut way. With these influence functions, the purpose of the research was to show that with really small models you have kind of dumb, kind of stochastic-parrot-like behavior, and that keyword matching is kind of a rough heuristic for what training data documents seem to most influence the generated output for whatever query. But then as they go up to the bigger models, you start to see that the documents that are most responsible have a much more sophisticated relationship to the output. Not just simple keyword matching but a genuine semantic relationship to what is happening. This is an interesting way of beginning to quantify it, and they've got some really cool visualizations as well that just kind of show that, yeah, when you have a small model it's kind of working in a certain regime, and bigger models are working in a different regime. Would it be possible or reasonable to expect that they could calculate that for every single ChatGPT query? Probably not. I don't know, that seems like it might be overwhelming. But there is at least some principled approach there where you could say this output, you know, connects to this somehow.
Cecilia Zaniti
Or maybe it's a threshold program. Right. You know, like you can imagine, TikTok and all these others, you have to have a certain number of followers. Like, Twitter, you need a certain number of followers before you qualify for monetization. You can imagine things like that. I mean, I don't know, I just laugh whenever, and I don't remember if you were one of the people that tweeted about this, but when the EU AI Act came out and had a specific size of model that was regulated, I just always think of, was it Bill Gates or somebody, who famously was like, I don't see why you would ever need more than 32 kilobytes of storage. And it's like, well, okay, I can see that now. So, you know, I would challenge tech and challenge the companies to think about how to do it. Now, if that's distracting from an amazing medical breakthrough, you know, okay. But wherever there's money to be had, which there is here, then there should be innovation. And so I would expect that if it's not one of the big players directly, one of these smaller players will figure it out.
Nathan Labenz
Yeah. It's funny, it's amazing how obviously this is such a huge emerging force in society, the rise of generative AI broadly, but it kind of breaks some of our paradigms too. You think about how little money artists actually get from Spotify and the effective rate per play is just so low. And that feels like this is coming here too. OpenAI is doing whatever, some ungodly amount of tokens and some ungodly amount of generations, but they're still only doing a billion dollars. So you kind of imagine, okay, what's a rev share on that? And now how am I going to spread it out and whatever. And next thing you know, the New York Times is probably not even getting very much now or even 10 or 100x revenue growth from here. And on top of that they're like trying to push their prices down as low as they can and don't even seem to be. There's a weird non economic mode of operation where they're not trying to make all the money they could make and trying to keep their prices as low as they could. Some might argue if you were a data supplier to them, living off of rev share, maybe unreasonably low. It's just a very weird thing that is hard to figure out.
Cecilia Zaniti
And then you throw in the IRL component, right? Taylor Swift making a billion dollars on her tour this summer, which was amazing. And it's like, okay, how does the money shift? Where does it go? And, you know, is Taylor the only artist powerful enough to, you know, she wrote a letter to iTunes and brought them to their knees and had them switch their policy on, basically, artists not getting paid during the trial period. So, I mean, you can definitely wonder how this will shake out. And the amounts of money, you know, the billion dollars now is going to seem, to your point, like chump change.
Nathan Labenz
Well, there's a lot more, many more shoes to drop. So maybe we can get back together in the future with another case or with a new precedent or maybe even some new legislation. For now, let's talk a little bit about your company. Usually I have to try a product before I'll do it on the show, but you've been so generous in educating me.
Cecilia Zaniti
It's for lawyers, actually. So I'm not selling to you yet. You are not my ICP, to use the startup terminology. But yeah. So basically, it's GC AI, at getgc.ai, and it's in private beta now because I want it to be good. But essentially it sits on top of GPT. It's a chat modality. But some of the warnings, like GPT now gives warnings, as you said, where it's like, I'm not a lawyer. Consult a lawyer. Okay, I am a lawyer. You don't have to give me that warning. So I've developed a bunch of prompts that make it speak more like a GC. Right? So a GC is bottom line up front. You know, hopefully I broke down the issues clearly. But, you know, there was the viral tweet from the Wikimedia guy that was like, GPT is a golden retriever. Okay? Golden retrievers are not good lawyers. And so basically, I've adjusted it to be good. So that's one. Two is really developing the vertical AI where it's part of my workflow. Right. So, you know, Sam Altman mentioned that his most used app is Slack. Well, so is mine. And that's where all the legal questions come in, like, how do I get a copyright, or how do I whatever. And so integrating it there in the workflow, integrating it within Google Docs. And then for me, you know, it's really about in-house. At tech companies, typically your lawyer is your in-house lawyer. Sometimes you interact with outside lawyers if you're in litigation. And certainly OpenAI is going to be in that boat now. But that's where I came up, and I mentioned getting so excited about AI. That's where I'm applying it.
Nathan Labenz
So it's getgc.ai, and I'm maybe not in the exact target market, but I'm on the wait list.
Cecilia Zaniti
Well, your wife, you said your wife's a lawyer, so maybe we'll try her. But yeah, we're in the kind of feedback phase, you know, rooting out bugs and you know, hope to launch in the next month or two.
Nathan Labenz
Well, keep us updated on that. This has been a fantastic conversation. I've really learned a lot from it and I appreciate you taking the time to share all your knowledge with me and with our Cognitive Revolution audience. Cecilia Zaniti, thank you for being part of the Cognitive Revolution.
Cecilia Zaniti
Thank you, Nathan. This was super fun. Reach out anytime.
Nathan Labenz
It is both energizing and enlightening to hear why people listen and learn what they value about the show. So please don't hesitate to reach out via email at tcr@turpentine.co, or you can DM me on the social media platform of your choice.
Podcast Summary: "The Cognitive Revolution" | Episode: NYTimes vs OpenAI: Generative AI and the Law with Cecilia Zaniti
In this episode of "The Cognitive Revolution," hosts Nathan Labenz and Erik Torenberg delve into the burgeoning intersection of generative AI and intellectual property law. Their guest, Cecilia Zaniti, founder and CEO of GC AI, brings her extensive legal expertise to discuss the landmark case of New York Times vs. OpenAI. The conversation navigates through the complexities of intellectual property (IP) in the age of AI, the specific legal challenges posed by generative models, and the broader implications for creators and technology companies.
Timestamp: [05:30]
Nathan opens the discussion by seeking a foundational understanding of intellectual property law, asking Cecilia to explain its purpose and main branches.
Cecilia Zaniti:
"Intellectual property is important to encourage progression of science and the useful arts," ([07:00]).
She elaborates that the U.S. Constitution, specifically Article 1, Section 8, Clause 8, instituted IP laws to grant limited-time monopolies to authors and inventors, thereby fostering innovation and creativity. The main branches of IP discussed include:
Timestamp: [07:29]
Nathan and Cecilia explore the role of patents in the tech industry, particularly concerning AI innovations like the transformer model developed by Google.
Cecilia Zaniti:
"Google patented the transformer model but chose not to aggressively enforce it, adopting a defensive strategy instead," ([10:43]).
She explains that while Google holds patents for foundational AI technologies, their strategy focuses on defensive use to protect their platform rather than pursuing infringement cases. This approach mirrors past tech patent wars, such as those between Google and Sonos, highlighting the complexities and resource-intensive nature of IP litigation in technology.
Timestamp: [22:21]
The conversation shifts to copyrights, particularly how they apply to generative AI models like GPT-4.
Cecilia Zaniti:
"Copyright attaches on creation, allowing creators to license separate rights, such as streaming or merchandising," ([23:03]).
Nathan probes deeper into the nuances of copyright, questioning whether using copyrighted material commercially necessitates compensation. Cecilia responds by detailing the "fair use" doctrine, a balancing test considering factors like purpose, nature of the work, amount used, and market impact to determine permissible use without infringement.
Timestamp: [34:08]
A significant portion of the episode is dedicated to analyzing the New York Times vs. OpenAI lawsuit, a potential precedent in the realm of generative AI and IP.
Cecilia Zaniti:
"The core issue of the New York Times case revolves around the training process of AI models, specifically whether using copyrighted material without compensation constitutes infringement," ([35:44]).
Key points discussed include:
Training vs. Output: The lawsuit challenges whether training AI on copyrighted material without explicit permission or compensation violates IP laws.
Regurgitation Issue: A major complaint is AI models occasionally output verbatim or near-verbatim excerpts from copyrighted works, which could undermine the original’s market.
Comparisons to Music Licensing: Cecilia draws parallels to the music industry, where licensing models ensure creators are compensated for uses of their work, suggesting a similar system could emerge for text and other media.
Timestamp: [45:16]
The hosts explore possible legal outcomes and frameworks that could address the challenges posed by generative AI.
Cecilia Zaniti:
"A congressional solution akin to the DMCA could emerge, providing a clear process for addressing AI-related infringements," ([50:22]).
Timestamp: [53:34]
The discussion broadens to address how different media modalities—text, images, music—pose unique challenges for IP laws in the context of AI.
Cecilia Zaniti:
"Each modality requires distinct legal considerations, and the law must adapt to these nuances," ([54:38]).
Timestamp: [75:14]
Open source practices introduce additional layers of complexity in enforcing IP rights against AI models.
Cecilia Zaniti:
"Open source doesn't negate IP rights; it simply offers another avenue for managing and licensing content," ([75:22]).
She explains that while open-source models make it harder to target a single entity for litigation, IP rights can still be enforced against users or entities that misuse the content. Open source encourages innovation but doesn't inherently resolve the issues of unauthorized content usage in AI training.
Timestamp: [77:35]
Looking forward, Cecilia offers her insights into how the legal system can adapt to better handle generative AI.
Cecilia Zaniti:
"Implementing a system where creators can opt-in or receive compensation for AI training on their works is essential," ([79:43]).
The episode provides a comprehensive examination of the intricate relationship between generative AI and intellectual property law. Cecilia Zaniti offers a nuanced perspective, emphasizing the need for both legal adaptations and innovative solutions to ensure that the rapid advancements in AI do not undermine the rights and compensations owed to content creators. As the New York Times vs. OpenAI case unfolds, it stands as a pivotal moment that could shape the future of AI development and IP law.
Notable Quotes:
Cecilia Zaniti:
"Google's patent strategy with the transformer model is much more defensive than offensive." ([10:43])
Cecilia Zaniti:
"Fair use is not just a defense; it's a limit on copyright's exclusive rights." ([24:46])
Cecilia Zaniti:
"A congressional solution akin to the DMCA could provide a clear process for AI-related infringements." ([50:22])
Cecilia Zaniti:
"Each media modality—text, images, music—requires distinct legal considerations." ([54:38])
Cecilia Zaniti:
"Implementing a system where creators can opt-in or receive compensation for AI training on their works is essential." ([77:35])