Joanne Wright
IBM is on a mission to become the most productive company in the world. Join SVP of Transformation and Operations Joanne Wright at the break to learn how its mission can benefit your enterprise and why AI is the catalyst for success.
Marty Pesis
People felt like they were hearing about AI companies training their models with their data that was posted online without their permission, feeling like they didn't know what to do about it, feeling like they just needed a partner to help navigate the space.
Colman Stanifer
That's Marty Pesis, CEO of Austin, Texas-based AI data broker Trovio, which connects content creators who are interested in licensing their content to train AI models and AI companies that need data. Trovio is backed by venture capital firm 776, which was started by the co-founder of Reddit, Alexis Ohanian. Trovio is one of a list of companies that have started offering AI content licensing services. Others include RHEI in British Columbia, Canada and Protege in New York, which is backed by venture capital funding including names like Footwork and Bloomberg Beta.
Large Language Models, or LLMs for short, initially train their models using open source data sets, some scraped from the web. This led to tension between publishers, content creators and AI companies around the compensation for the use of their data. The Wall Street Journal's parent company, News Corp, has a content deal with OpenAI. Two of News Corp's subsidiaries have sued Perplexity. Many smaller websites and content creators simply don't have the resources to have a seat at the table. But could AI data licensing change that?
I'm Colman Stanifer and this is The New AI Data Trade, a special two-part series from the Wall Street Journal where we ask: can smaller content creators make money from their data, and will it be as much as they hope? This is Part Two: Let's Make a Deal.
As a broker, Trovio has separate contracts with AI companies and content creators. Marty declined to comment on his AI clients due to contractual agreements. But let's say you're a content creator and you're thinking about licensing your videos to help train an AI model. Before you upload anything, you have to sign a licensing agreement. After putting in some basic contact information and an estimate of how much footage you plan to upload, you're shown a contract that outlines the terms of the deal. I viewed a version of Trovio's agreement and here are some of the important details.
Money generated from licensing out the footage gets split. Based on the version of the contract I saw, 60% goes to you, the content creator, and 40% goes to Trovio. I asked Marty if the revenue split is ever different, and he said it can vary. The contract lasts for three years, then automatically renews for a fourth, unless you give written notice that you want out. Once you've signed up for Trovio and uploaded your videos, they're now ready to be processed.
Marty Pesis
The data is clean, indexed, and enriched with metadata. That really differentiates it. We run it through a bunch of models that we've created and a bunch of pipelines that we've created on our end to enrich that content and ultimately are able to deliver it in the formats that the AI companies need.
Colman Stanifer
That metadata basically describes the video before it gets uploaded to an AI model, and that makes it easier for the model to train off of it. So after all that, how much could you, as a content creator, expect to get paid? Well, it could be determined by a few factors. For example, if you're working exclusively with Trovio or if the video is already publicly available online. Also, quality, type of footage and how much of it you upload matters a lot. What type of footage really, I guess, sells? Right? Because I guess in theory, I could upload my iPhone library to Trovio, but that probably wouldn't get bought up by AI companies. Like, what are they looking for?
Marty Pesis
So there is a quality bar that we look for. I've got a lot of content of my kids on my iPhone, and it's okay, but it's not really the quality that we're looking for. It's usually like, below the bar. So there is a certain bar that we look for in terms of quality.
Colman Stanifer
Remember Jared Brick from episode one? He told me he's getting paid $1 to $4 a minute for his uploaded content. This is in line with some of the numbers I saw on Trovio's website. Marty told me Trovio gets better offers from AI companies when videos are specific. For example, instead of someone just holding a football, a team in motion playing football would sell better. Or even better, an unusual sport like curling, where there's a lot of motion and unique moving parts. So what's the bottom line for a creator? Jared has licensed over 50 terabytes of data to Trovio. To put that in perspective, it's 50 iPhones' worth of data if you've paid for the most expensive model with the most storage. He says he's made just under $80,000 since signing a licensing deal last November. Jared thinks he'll break $120,000 by the end of this year and says that extra money has given him more flexibility with his business.
Jared Brick
We've done other creative projects, we've funded different things, we've paid off debt, we've built new revenue streams.
Colman Stanifer
As of November 2024, Trovio says it's paid creators more than $5 million for their content. Marty told me that number will reach 25 million later this year. He also said nearly 5,000 content creators have signed up for their service. But when you sign up with Trovio, there's no guarantee you'll get paid at all. That's after the break.
Joanne Wright
In 2023, IBM set a goal to become the most productive company in the world. It started by asking questions, lots of questions, says Joanne Wright, SVP of Transformation and Operations at IBM.
How can we radically simplify end to end workflow and processes? What can we eliminate? How do we automate everything that we can? And then how do we embed AI into everything we do? So far, over a two year period, we've delivered over $3.5 billion of productivity savings for the company.
Colman Stanifer
Tyler Tuen is the owner and head of Good Garden Film, a small production company based out of Whidbey Island, Washington, near Seattle. He's currently working on a web series called Working on Whidbey.
Tyler Tuen
We tell the stories of the behind the scenes of what it takes to run a community.
Colman Stanifer
Welcome to Whidbey Island, Washington State.
Charles Walsh
With a population of just over 70,000.
Colman Stanifer
I found Tyler on Trovio's website under testimonials. Tyler told me he uploaded just over three terabytes of footage or three of those expensive iPhones. Tyler says he's not received any compensation so far for the footage he's uploaded. He thinks this could be due to the age of the footage, which he said was pretty old, with some videos going back to 2006. Marty said that the age of footage is just one factor considered. Older footage could be valuable if it's unique or matches an AI company's needs. Tyler had a lot of footage he could upload, but he's not sure if it's worth it.
Tyler Tuen
There is a certain amount of hours involved in managing data on my end and I don't see the value yet in spending my time in that direction. Where I've got other things, I have other arenas where I'm making human relationships and money from those human relationships.
Colman Stanifer
Other creators are just as skeptical.
Charles Walsh
I don't want to train AI. I want to use AI. I want to have AI do my laundry and dishes so I have time to create. I don't want AI.
Colman Stanifer
Charles Walsh is a YouTuber and freelance photographer based in Oregon. He says he was contacted by RHEI, a company that offers a similar service to Trovio.
Charles Walsh
I knew nothing about them and that's what was interesting. I have two channels. I have one in English and one in Spanish. I was born and raised in Costa Rica, so Spanish is my first language. And that's the channel that got flagged by RHEI.
Colman Stanifer
Charles said that the company told him he could potentially make just over $5,000 from licensing his content from his Spanish language channel.
Charles Walsh
For a channel, especially my Spanish channel, that's making 30 bucks a month, I was like, 5,000 would be awesome.
Colman Stanifer
But Charles decided not to sign a contract. Some of his YouTube videos include pictures from viewers, and he said he didn't feel comfortable licensing content that ultimately wasn't entirely his own. But also to him, AI is a tool to be used, and he doesn't want to fuel what he partially views as a replacement for his content.
Charles Walsh
Well, it's just like anything, they're tools, but we either use them or they use us. And that's kind of where I'm at. I want to use AI for certain things. I think it's a great tool.
Colman Stanifer
I reached out to RHEI for comment and the company's spokesperson confirmed that they contact creators via email, quoting them specific dollar amounts and offering licensing services in order to train AI models. RHEI declined to provide the total number of content creators that have signed up for their services, which they call RHEI Data Pro. It's unclear if companies like Cloudflare or Trovio are temporary solutions or a long-term model for this new AI landscape, but there does seem to be a consensus building among media companies and content creators alike: either cut a deal or find a way to protect your data from AI crawling. Data brokers think they could be a way for everyone, not only big companies, to get in on the AI boom, but there is still a lively debate around fueling AI models that may replace work done by content creators, leading to hesitancy from many creators. However, there are also those that are embracing selling their data to train AI models. Here's Jared from Brickhouse.
Jared Brick
Again, you have to stay with the market, especially in the media realm. You have to stay present. We don't want to become blockbusters sitting there with our hard drives going, "Oh, we didn't do anything," and just holding on to our media. So that's the first thing. You can't fight technology. If I don't get involved, is AI going to stop? No.
Colman Stanifer
The New AI Data Trade was produced by me, Coleman Standifer. Sound design and mixing by Jessica Fenton. Aisha Al Muslim is our development producer, Scott Salloway and Chris Zinsley are our deputy editors, and Falana Patterson is the Wall Street Journal's head of News Audio. I'm Coleman Standifer. Thanks for listening.
Joanne Wright
It's not just IBM that benefits from its mission to be the most productive company in the world. So do its clients. Joanne Wright, SVP of Transformation and Operations at IBM, explains.
We've created a playbook as Client Zero for how to do really fast, effective AI. The key has been to drive for progress over perfection. We've built a solid foundation with data and taken the opportunity to really learn from the people who have a role to play in running IBM each and every day. Our own experience has taken us from far beyond just doing pilots and theory to real ROI and real productivity. A lot of our clients are very hungry to know what they can learn from us as Client Zero and then obviously how can they avoid perhaps some of the mistakes we've made or some of the failures we've had? The fact that we've been able to derive and deliver our own use cases across everything that we do really transcends our clients' experience.
Visit IBM.com to learn how AI can drive enterprise wide productivity.
WSJ Advertising Department
Custom content from WSJ is a unit of the Wall Street Journal Advertising Department. The Wall Street Journal News Organization was not involved in the creation of this content.
Episode: The New AI Data Trade, Part 2: Let's Make a Deal
Date: August 18, 2025
Host: Colman Stanifer (The Wall Street Journal)
This episode, the second in a two-part special, explores the maturing landscape of AI data licensing, particularly for smaller content creators. Host Colman Stanifer dives into the new world of data brokers—the companies connecting creators who want to monetize their content with AI firms hungry for training data. Can creators cash in on this AI gold rush, or do the odds still overwhelmingly favor the big tech players?
Origins & Purpose:
Companies like Trovio (Austin, TX), RHEI (British Columbia), and Protege (New York) are emerging as intermediaries to help creators license their content for AI training purposes. Trovio, for example, is backed by Reddit co-founder Alexis Ohanian’s firm, 776.
(01:31–02:00)
Market Tensions:
Initial AI model training scraped freely from the web, sparking backlash over compensation. Big companies, like News Corp (WSJ’s owner), are cutting content deals, but smaller players felt left out and powerless—data brokers aim to bridge that gap.
(02:00–02:44)
“People felt like they were hearing about AI companies training their models with their data that was posted online without their permission... feeling like they just needed a partner to help navigate the space.”
— Marty Pesis, CEO, Trovio (00:18)
The Licensing Process:
To license your content, you sign a contract (typically 3 years, with automatic renewal unless notice is given). Revenue splits in Trovio’s sample contract: 60% to creator, 40% to the broker—though this varies.
(02:44–03:12)
Data Processing:
Once uploaded, the content is cleaned, indexed, and enriched with metadata to make it AI-ready.
(03:12–03:29)
“The data is clean, indexed, and enriched with metadata... that really differentiates it. We run it through a bunch of models and pipelines... ultimately able to deliver it in the formats that the AI companies need.”
— Marty Pesis (03:12)
“There is a quality bar that we look for… It’s usually like, below the bar [iPhone family videos]... So there is a certain bar that we look for in terms of quality.”
— Marty Pesis (04:11)
“We’ve done other creative projects, we’ve funded different things, we’ve paid off debt, we’ve built new revenue streams.”
— Jared Brick (05:28)
Payouts & Guarantees:
Trovio claims to have paid out over $5 million to creators as of Nov 2024, expecting $25 million by late 2025. Yet, there’s no guarantee every uploader will get paid—some will earn nothing if their content isn’t licensed.
(05:36–06:04)
Mixed Experiences:
“There is a certain amount of hours involved in managing data on my end and I don’t see the value yet... I have other arenas where I’m making human relationships and money from those human relationships.”
— Tyler Tuen (07:43)
“I don’t want to train AI. I want to use AI. I want to have AI do my laundry and dishes so I have time to create. I don’t want AI.”
— Charles Walsh (08:08)
“It’s just like anything, [AI tools]... we either use them or they use us. And that’s kind of where I’m at.”
— Charles Walsh (09:15)
Industry at a Crossroads:
Media companies and small content creators face a choice: join the data licensing wave or find new ways to protect content from AI crawling. Data brokers position themselves as a means for "everyone, not only big companies," to profit, but creator skepticism on several fronts (ownership, time investment, disruption) remains.
Adapting to Change:
Some, like Jared Brick, embrace the change, emphasizing the risk of being left behind by the rapid advance of technology.
“You have to stay with the market... We don’t want to become blockbusters sitting there with our hard drives going ‘oh, we didn’t do anything’ and just holding on to our media. You can’t fight technology. If I don’t get involved, is AI going to stop? No.”
— Jared Brick (10:35)
On the dilemma facing creators:
“Either cut a deal or find a way to protect your data from AI crawling.”
— Colman Stanifer (09:27)
Summing up the stakes:
“We don't want to become blockbusters… just holding on to our media.”
— Jared Brick (10:36)
This episode provides a nuanced look at the complex, evolving relationship between content creators and the burgeoning AI data brokerage world. While some creators are cashing in, others confront ethical, practical, and financial hurdles. The debate remains lively—should you fuel AI with your data, or focus on protecting and differentiating your work as the landscape keeps shifting?