WSJ Tech News Briefing
Episode Summary: The New AI Data Trade, Part 1 – Cashing In on AI
Date: August 17, 2025
Host: Coleman Standifer (The Wall Street Journal)
Overview
The episode examines the emerging market for licensing data to AI companies, focusing on whether smaller content creators can cash in as "AI data providers." The host, Coleman Standifer, unpacks how the relationship between AI models, web data, and media creators is changing, from unauthorized scraping to copyright lawsuits, multimillion-dollar licensing deals, and new monetization tools. The discussion is anchored in the experiences of both independent creators (e.g., Jared Brick of Brick House Media) and big media companies, and it centers on one question: in the growing AI data economy, will it pay off for the little guys?
Key Discussion Points & Insights
1. AI’s Insatiable Demand for Data
- LLMs (Large Language Models) require massive, ongoing streams of data to operate effectively and stay current.
- Quote:
“From the perspective of LLMs, data is essentially words. For LLM designers, words are like the new oil.”
— Bob McMillan, Tech Reporter (02:36)
- Traditionally, LLMs have trained on any publicly available data ("wherever humans have created linguistic products"), raising questions of consent and compensation.
2. The Old vs. New Internet Value Exchange
- Traditional web crawlers (Google, DuckDuckGo, etc.) drove users to the sites they indexed, benefiting content creators with traffic and ad revenue.
- AI crawlers now extract data without necessarily returning users to the source, threatening the old web economy.
- Implication: Loss of traffic = loss of revenue for media creators and publishers.
- Quote:
“…AI services don’t often send their users back to the websites they pulled from. For many publishers and content creators, this amounts to…unauthorized access…and a loss of traffic and revenue.”
— Coleman Standifer (03:27–04:44)
3. Legal Showdowns & Big-Publisher Deals
- Content owners are fighting back both in courts and with commercial agreements:
- New York Times, Reddit, Disney, NBCUniversal, Dow Jones—all mentioned as plaintiffs or parties to lawsuits against AI companies over unauthorized use of data.
- Some, like News Corp (the Journal’s parent), are cutting major licensing deals with AI firms (e.g., OpenAI, Amazon).
- Quote:
“Amazon will be paying the New York Times $20 million a year to access its content. News Corp...signed a deal with OpenAI that could be worth more than $250 million over five years.”
— Coleman Standifer (08:53)
4. Reddit’s Pushback and Landmark Lawsuits
- Reddit has enacted a new policy distinguishing commercial vs. non-commercial data use, seeking compensation or blocking scrapers.
- Reddit CEO Steve Huffman:
“If they continue to take, then we’ll be forced to file a lawsuit, which is what we did in this case.”
— Steve Huffman, Reddit CEO (07:36)
- Ongoing lawsuits, such as Reddit vs. Anthropic, may determine the future legal contours of AI data usage.
5. Small Creators Find Opportunity in the AI Data Trade
- Independent production studios like Brick House Media can license archives to AI firms, potentially turning dormant media into revenue.
- Jared Brick’s Lightbulb Moment:
“Now they’re coming back to content creators because they realize we have so many terabytes and petabytes of media sitting on hard drives that they have no access to...I’ve got all this media. It now has value. It didn’t have value really before.”
— Jared Brick (10:34)
6. New Tools: Monetizing the Crawl
- Cloudflare’s “Pay Per Crawl”
- Enables web publishers, including small sites, to set terms and prices for AI crawlers accessing their content.
- Will Allen, Cloudflare VP:
“You’ll get a certain response back when there’s a payment required and it’ll include the price per crawl. You can decide…great, I want to pay for this content and use it. Or no, I don’t want to.”
— Will Allen (10:00)
- Such tools may shift the balance of power, giving publishers an alternative to litigation or take-it-or-leave-it deals.
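The exchange Will Allen describes can be sketched from the crawler's side. This is a minimal illustration, not Cloudflare's actual API: it assumes the "payment required" response is an HTTP 402 status and that the price arrives in a response header. The header name `x-crawl-price`, the `CrawlDecision` type, and the `decide` function are all hypothetical names introduced here.

```python
from dataclasses import dataclass
from decimal import Decimal, InvalidOperation
from typing import Mapping, Optional

OK = 200
PAYMENT_REQUIRED = 402  # standard HTTP status code for "Payment Required"
PRICE_HEADER = "x-crawl-price"  # hypothetical header name, for illustration only


@dataclass(frozen=True)
class CrawlDecision:
    action: str  # "use", "pay", or "skip"
    price: Optional[Decimal] = None


def decide(status: int, headers: Mapping[str, str], budget: Decimal) -> CrawlDecision:
    """Decide what a crawler does with one response, given a per-page budget."""
    if status == OK:
        # Content was served without a payment demand.
        return CrawlDecision("use")
    if status != PAYMENT_REQUIRED:
        # Any other status (errors, redirects) is out of scope for this sketch.
        return CrawlDecision("skip")

    # HTTP header names are case-insensitive, so normalize before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    raw_price = lowered.get(PRICE_HEADER)
    if raw_price is None:
        return CrawlDecision("skip")
    try:
        price = Decimal(raw_price)
    except InvalidOperation:
        # The quoted price could not be parsed as a number.
        return CrawlDecision("skip")

    # "Great, I want to pay for this content" versus "no, I don't want to."
    if price <= budget:
        return CrawlDecision("pay", price)
    return CrawlDecision("skip", price)
```

Under these assumptions, `decide(402, {"X-Crawl-Price": "0.01"}, Decimal("0.05"))` chooses to pay, while the same quote against a budget of `Decimal("0.005")` is skipped. A real crawler would then resend the request with whatever proof of payment the publisher's system expects, which this sketch leaves out.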
7. Unanswered Questions – How Much Money, Really?
- The episode closes with a preview of Part 2, promising a reality check on the actual economics for smaller content creators.
- Quote:
“So how do these smaller AI licensing deals work and how much money is really up for grabs? That’s in the second installment…”
— Coleman Standifer (10:53)
Notable Quotes & Memorable Moments
- Jared Brick (On the data opportunity):
“We had no monetization strategy for it other than just archiving it. When I learned that AI licensing…was a thing...I’ve got all this media. It now has value.”
(01:08, 10:34)
- Bob McMillan (On LLM priorities):
“Data is essentially words. For LLM designers, words are like the new oil.”
(02:36)
- Coleman Standifer (Explaining the shift):
“AI services don’t often send their users back to the websites they pulled from...”
(03:27)
- Steve Huffman, Reddit CEO (On enforcing data rights):
“We can cut them off, we can ask them to stop, but if they continue to take, then we’ll be forced to file a lawsuit...”
(07:36)
- Will Allen, Cloudflare VP (On pay-per-crawl tools):
“You’ll get a certain response back when there’s a payment required and it’ll include the price per crawl.”
(10:00)
Timestamps for Key Segments
- [00:18] — Jared Brick explains why content creators want protection and compensation for their data.
- [01:08] — Jared Brick on discovering the potential to license archived content for AI training.
- [02:36] — Bob McMillan: Data is the new oil for LLMs.
- [04:44–05:28] — Lawsuits from major publishers and the growing legal battle over data usage.
- [07:36] — Reddit CEO Steve Huffman on why Reddit is suing AI companies.
- [08:53] — Coleman Standifer details big licensing deals (NYT, News Corp, Reddit, etc.).
- [10:00] — Will Allen (Cloudflare VP) explains the Pay Per Crawl system to monetize publisher data access.
- [10:34] — Jared Brick describes seeing new value in old digital content.
- [10:53] — Tease for Part 2: The economics for small players in the AI data trade.
Overall Tone & Takeaways
- Analytical, investigative, and occasionally cautious.
- The episode is hopeful for smaller creators but clear-eyed: legal, operational, and financial frameworks are still shaking out.
- Big money is now being made from data, but the returns for the “smaller players” are yet to be determined—a question the next episode promises to tackle.
For listeners:
This concise episode lays the groundwork for understanding why AI companies need your data, what’s at stake for publishers big and small, and how new tools and deals may pave the way for a fairer, more sustainable AI data economy.
