
C
We talk a lot about the power of AI models, but what happens when the public web data they rely on is incomplete, outdated, or just plain wrong? Agility requires not just reacting to market changes, but anticipating them by having a clear, real-time view of the world. This means having the infrastructure to see the digital landscape as it truly is, not just how it's presented in curated reports. Today we're going to talk about the foundational element that will determine the winners and losers in the age of AI: access to high-quality, real-time web data. As AI agents become more autonomous and LLMs more integrated into our workflows, the quality of the public web data they consume is no longer an academic concern. It's a critical business imperative that impacts everything from competitive intelligence to dynamic pricing and customer experience. Welcome to season eight of the Agile Brand Podcast. This season we're going all in on Expert Mode: MarTech, AI, and customer experience, talking with the people and platforms behind the brands you know and love. Again, I'm your host, Greg Kihlström, and I help Fortune 1000 companies make sense of MarTech, AI, and marketing ops. Hit subscribe or follow to make sure you always get the latest episodes, and leave us a rating so others can find us as well. And make sure you check out our sponsor, TEKsystems, an industry leader in full-stack technology services, talent services, and real-world adoption. For more information, go to teksystems.com. Now let's dive in. To help me discuss this topic, I'd like to welcome Ariel Shulman, Chief Product Officer at Bright Data. Ariel, welcome to the show.
B
Thank you, Greg. Thanks for having me.
C
Yeah, looking forward to talking. Definitely a timely topic here to be diving into. Before we do though, why don't you give a little background on yourself and your role at Bright Data?
B
Sure. So as you said, I'm at Bright Data as Chief Product Officer. I've been with the company for about 11 years across several different roles. At my core, I'm an engineer. I studied industrial engineering followed by an MBA. So, you know, I like multidisciplinary issues: talking to developers, talking to designers, talking to customers, and building things at scale. I'm married. I have two grown kids who are now also attending engineering school, continuing the tradition. And at Bright Data, I've worked on really many, many different parts of our infrastructure and our product. Back from when we actually invented the concept of proxies and residential proxies, I was the guy who built our residential proxy network, which gives us a really strong position to access web data from. And today I lead a team of about 10 engineers. We move really fast with quick iterations and tight feedback loops, and, you know, things are moving very quickly, especially because we're in the web space, where everything is online and can be changed instantly. This is not a hardware business. So we need to be on our toes, and there we are.
C
Yeah. Yeah. And maybe just a little more detail: can you explain what Bright Data does and what's the core problem you solve? Who are your customers, and everything like that?
B
Yeah. So Bright Data is really the biggest company doing what we call public web data infrastructure. And what this means is that we help companies access and use publicly available web data at scale. Publicly available web data means things like prices, reviews, and things like that. And this is information that's available on the web for humans, right? And what we do is we actually make this accessible to machines. And this is hard because the web keeps changing, websites keep changing, and it's very hard to access at scale programmatically. And we do that, and we've been doing that for the last 10 years. Today we're something like 500 people. We have an annual revenue rate of over $300 million. We have a huge proxy network and unblocking infrastructure. We process more than 50 billion pages per day, which is, by the way, more than what Google processes.
C
Wow. So, yeah, let's dive in here. And I want to start at the strategic level here. Many marketers are used to working with first party data or syndicated marketing or market research. How has the role and strategic importance of public web data evolved for brands especially, you know, in the last 12 to 18 months with the explosion of generative AI?
B
Yeah, I think that in the last 12 months, especially once we saw the adoption of AI chatbots really take off, public web data moved from something that was kind of nice to have, or complementary, to being strategic. And the reason is that your customers are actually looking at your brand and at your products not only through your channels, for example your website, but also through other things: review sites, social platforms, forums, and obviously generative AI platforms. And if you don't monitor that closely, you risk making decisions based just on the information you have, ignoring, you know, a wider base. And that can leave you with blind spots. And generative AI has really accelerated how this is happening, because people still search, but they ask kind of open questions like, what is the best so-and-so? And, you know, how does this brand compare to this brand? And then it is the AI agent that actually makes the decisions, or at least the recommendations, on behalf of the customer. And it feeds on public web data. So how you rank in Google, for example, is important, but that's not the only source for these chatbots. So it is important to have, you know, a wide kind of web presence. And that's why web data is important. And it's important to know, you know, how you stack up against the competition, how you appear, and how to be smart about it.
C
Yeah, yeah. And I think this is getting a lot more notice and attention, surely. But what do you think are still the biggest blind spots or maybe misconceptions that marketing leaders and execs still have when they think about using public web data? You know, is it primarily technical, is it strategic, ethical? All of the above?
B
It's kind of all of the above. But I think one of the biggest misconceptions is that it's easy, right? Because you can go on ChatGPT, and let's say you work for some e-commerce brand, and you say, you know, write me a scraper that will collect the prices and reviews from my competitor. And that looks okay, but it doesn't really hold at scale, because things change constantly. And obviously, for most customers, writing scrapers or doing things at scale is not their core competency. In fact, one of the analogies I like to make about web data collection is that it's kind of like quantum mechanics, but in reverse. Because if you think about quantum mechanics, when you go really small, the laws of physics change, right? Things go crazy. And with web scraping or web data collection, that's exactly the case, but in the opposite direction. So everything works okay until you go above a certain threshold, and then things become crazy: you get blocked, you get CAPTCHAs, you get misinformation. It's very hard to pull off at scale. And for enterprise customers, the scale is what's interesting. So they tend to underestimate the complexity, and they also tend to underestimate the value of what's in there. Okay? Because if you look at things like reviews, that can help you with sentiment analysis. You just launched a new product and you want to know how it's doing. You have some sort of a media crisis and you want to see how the Internet is reacting. All of those things are on the web and are not available in, you know, what you have now. So it can get very complicated when analyzing it. Lastly, there are the ethical considerations. So web data collection is perfectly acceptable, and maybe we can talk about that soon, because you know that we have been sued by some big players, and that's a really interesting story. But it doesn't mean that you can collect web data as, you know, you see fit.
There are ways of doing this. There are some things you need to respect. You should never log in. If you log in, it ceases to be public web data. You need to do this while respecting private information. You need to do this while safeguarding the website's performance. So many customers, especially the large ones, are concerned about these things. And those are some of the things that are very robustly addressed inside the Bright Data platform.
C
Yeah, yeah. And so building on that, certainly generative AI has accelerated some of the use cases as well as the usage. Agentic AI probably takes that yet another step further. And, you know, your company recently launched a suite of products aimed at powering AI agents. So what are some of the most compelling use cases that you've seen for AI agents to reliably access some of this live web data?
B
Yeah, so one of the major use cases that we're seeing right now is what we call GEO. So that's essentially SEO, but for generative models, or LLM visibility; that is, understanding how AI systems perceive your brand. In fact, I actually listened to your podcast and I saw that you had Brandlight on a couple of weeks ago. That's one of the things that they do, right? They help brands do that. So we actually provide the infrastructure to do this at scale. Okay? We're not strong in analyzing what the answers are like, but we provide uninterrupted access to this kind of information source. So it is important, when you're looking at these answers, when people are looking at your brand, to know how it appears, when competitors show up, what your kind of share of answer is, and things like that. So that's one major use case, which is LLM visibility, or how you appear inside the chatbots as opposed to, let's say, your standard Google SEO. A second one is competitive pricing and everything that's e-commerce related. So if you've bought a product or booked a flight or did anything along those lines, most likely your traffic, or some of the related traffic, went through Bright Data, because e-commerce relies on this kind of information. And with AI agents, if you ask an open question such as, you know, what is the best guitar for beginners or something, the agent will go out and read forums and do all sorts of things, but it will also try to find you some relevant information regarding prices and things like that. If you are a retailer, it's very important for you to know this information. And thirdly, and this is also related to e-commerce and to retailers, there are review and sentiment operations. So you want to know how people are talking about your product. And AI agents will actually give that a relatively high score because it's content that's human driven.
It's usually in the form of questions and answers, things that AI agents kind of like. So it's important for you to understand what's going on, because AI agents will use that in compiling their response.
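The "share of answer" idea mentioned in this answer can be made concrete with a small sketch. It is only an illustration under assumptions: the episode does not define the metric, so here it is taken to mean the fraction of collected chatbot answers that mention a brand at least once. The function name `share_of_answer`, the sample answers, and the brand names are all hypothetical.

```python
import re
from typing import Dict, Iterable, List


def share_of_answer(answers: Iterable[str], brands: List[str]) -> Dict[str, float]:
    """Fraction of answers that mention each brand at least once.

    Assumed definition for illustration only; real GEO tooling may weight
    position, sentiment, or citation links instead of simple mentions.
    """
    collected = list(answers)
    if not collected:
        # No answers collected yet: report zero share for every brand.
        return {brand: 0.0 for brand in brands}

    shares: Dict[str, float] = {}
    for brand in brands:
        # Whole-word, case-insensitive match so "Acme" does not match "Acmeology".
        pattern = re.compile(r"\b" + re.escape(brand) + r"\b", re.IGNORECASE)
        hits = sum(1 for text in collected if pattern.search(text))
        shares[brand] = hits / len(collected)
    return shares


# Hypothetical answers to "what is the best guitar for beginners?"
sample_answers = [
    "For beginners, the Acme G1 is a solid pick.",
    "Try Acme or Zenith, both are affordable.",
    "The Zenith Z2 is great value.",
    "There is no clear winner; visit a local shop.",
]
print(share_of_answer(sample_answers, ["Acme", "Zenith", "Nova"]))
```

Running the same questions across countries and languages, as suggested later in the episode, would simply mean calling the function once per batch of answers and comparing the resulting shares.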
C
Yeah. So for someone, you know, a marketing leader out there that's listening to this and maybe has a kernel of an idea of, okay, there's some potential here: what's a first step to start realizing that potential, or even integrating this kind of intelligence into their existing MarTech stack, without causing a lot of disruption?
B
Yeah, I think, as you know, it's not surprising that you need to focus on something. So probably the important thing is to choose one important, high-value decision that you want to make based on this kind of web data. For example, what should pricing be for this new product that I'm about to launch? Or what do people think about my recently released product? Or anything along those lines. Then you need to look at the scope. So which domains, or what are the sources that you will be looking at? Okay, how fresh do you want the data to be? It can be real time or it can be like a month old; it depends. And the volumes and the geographies also, because, especially for large multinationals, this can be very different in different countries based on languages and cultural preferences and things like that. So the data that you collect needs to match, kind of, the question that you're trying to answer. You should always start light, with kind of a pilot. It's very tempting to download terabytes of data, and it's possible, but it can actually be counterproductive because you kind of get overwhelmed. Start with a few gigabytes or megabytes, put them into maybe even something like Google Sheets, and see what the data feels like. And once you have a feel for it, then you can go at scale, and then the information becomes very, very statistically robust. So the idea is to go fast, to get some kind of proof of value, and then once you're ready to go, go full blast.
C
So I want to get back to something you briefly touched on earlier, and that's the governance, the ethics, and those things. So accessing web data at scale certainly does bring up questions of ethics, which you addressed briefly, as well as legal challenges. And, you know, as you mentioned, Bright Data has famously faced and won legal battles with platforms like Meta and X. What's the key principle that marketers need to understand about the right to access public data? And, you know, how should they build a governance framework to do this responsibly?
B
Yeah, yeah, that was an interesting time. Absolutely. So, you know, technology in general moves very fast, and definitely much faster than the law. Yeah. And this is no exception. So you're right. We were indeed sued by Meta and by X, and we won in both cases in federal court in California. These are actually very important precedents, okay, that have shown the world that public web data is something that is free to collect under certain conditions, which I'll explain. And it's interesting to see that our rulings are actually used as precedents in, you know, big trials that are taking place right now with some other big companies. By the way, just as an interesting tidbit of information, Meta, while suing us, was actually a customer of Bright Data.
C
Oh, wow.
B
They were scraping e-commerce sites, you know, for data, I assume to compare for the marketplace. So the key principle is that if the information is available publicly, it can be collected lawfully. And, you know, there's a famous quote from the trial that we had against X, okay, Twitter at the time, where the judge basically told the lawyer, and I quote, "You do not own the Internet." Okay? He said that you don't own the Internet, and that you are trying to appropriate part of something that is public and open to everyone and say it's only open to you and your customers. Information can be accessed by everyone. So if you do not log in, you have not accepted the site's terms and conditions; you have not entered into an agreement with them. It's okay. You also need to do that in a way that will not have any kind of negative implications on the site's performance. Because when you scrape or when you collect data, if you do that irresponsibly, you can actually damage the site or the response time. So one of the things that we do at Bright Data is that we actually measure the response time of the website as we collect the data. And if we see that, for any reason whatsoever, it starts to slow down, we slow down, or maybe we even stop, because we never want to be something that's going to be in the way. So going back to your question: this is very deep tech. I'm a product guy, so I love this. It's very, very deep tech. Companies are very unlikely to be able to build this kind of thing. As part of the platform, we offer all of these things. So we make sure that we never log in, that website performance is protected, that no personal information is collected. All of those things are part of the system. So I think governance is super important. For us, it's important. We have a special page on our website, which is called the Trust Center.
That explains exactly how all of these things are done. And it's kind of part of the package. And this is typically important for large customers: large Fortune 500 customers, sometimes even SEC-regulated companies. They're very concerned about these things. And we're very pleased with the outcome of these trials, because it actually proved what our CEO always says, that, you know, public data should remain public.
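The safeguard described in this answer, measuring the site's response time during collection and slowing down or stopping when it degrades, can be sketched roughly as follows. This is a minimal illustration, not Bright Data's implementation: the class name `AdaptiveThrottle`, the thresholds, and the doubling and halving policy are all assumptions chosen for the example.

```python
from typing import Optional


class AdaptiveThrottle:
    """Back off, or stop entirely, when a site's responses get slower.

    Hypothetical sketch: thresholds and policy are illustrative assumptions.
    """

    def __init__(self, base_delay: float = 1.0, slow_factor: float = 2.0,
                 stop_factor: float = 5.0, max_delay: float = 60.0) -> None:
        self.base_delay = base_delay    # normal wait between requests (seconds)
        self.slow_factor = slow_factor  # slowdown ratio that triggers back-off
        self.stop_factor = stop_factor  # slowdown ratio that triggers a full stop
        self.max_delay = max_delay      # upper bound on the wait
        self.delay = base_delay
        self.baseline: Optional[float] = None  # first observed response time

    def next_delay(self, response_time: float) -> Optional[float]:
        """Return seconds to wait before the next request, or None to stop."""
        if self.baseline is None:
            # Treat the first measurement as the site's normal response time.
            self.baseline = max(response_time, 1e-6)
            return self.delay

        ratio = response_time / self.baseline
        if ratio >= self.stop_factor:
            return None  # site looks badly degraded: stop collecting
        if ratio >= self.slow_factor:
            self.delay = min(self.delay * 2, self.max_delay)   # back off
        else:
            self.delay = max(self.base_delay, self.delay / 2)  # recover gradually
        return self.delay


# Simulated response times (seconds) in place of real HTTP measurements.
throttle = AdaptiveThrottle(base_delay=1.0)
for observed in [0.2, 0.25, 0.5, 0.6, 0.2, 1.5]:
    wait = throttle.next_delay(observed)
    print(observed, "->", "stop" if wait is None else f"wait {wait}s")
```

In a real collector the `observed` values would come from timing each request, and the returned delay would be passed to something like `time.sleep` before the next one. Comparing against a baseline rather than a fixed number is a design choice here so that naturally slow sites are not penalized.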
C
Yeah, yeah, love it. So let's talk a little bit about measuring success then. So, you know, moving past the point where we've implemented something, how should customers measure the ROI of investing in web data infrastructure like this? What are some ways that ROI is measured here?
B
Okay, well, this depends really on how you use the data.
C
Sure.
B
You know, we have customers who are actually using Bright Data as an integral part of their technology stack. And if we go down, which fortunately doesn't happen, they go down as well. So it's not even a matter of ROI; it's a matter of, like, life or death for the business. If you're running, say, a price comparison site and you want to have real-time information regarding prices on different sites, a query from a customer will result in a query to Bright Data, which will go to websites and send information back. So ROI can be as extreme as, you know, kind of life or death, or it can be, I would say, softer. So stuff that has to do with, for example, the quality of the data, or the staleness, or the freshness, you would say, of the data, or how, you know, kind of structured and predictable it is. For many customers, this is very important because it has direct implications, for example, on their pricing models. People take web signals from all over to decide on pricing and promotion, based on the availability of a product on competitors' sites or even in competitors' local brick-and-mortar stores. This is a very sophisticated system. So, you know, we have clearly demonstrated that the web data that we provided has in some cases doubled revenue for certain products during certain time windows, because some discounts were removed as a result of the retailer understanding that they are the ones with the stock, so they're in a good position. They don't have to undercut the competition. Yeah, so that's about it. If you're looking at marketing, then we are trying to provide information regarding campaign efficiency and conversions and things like that. That's more on the marketing side.
C
Yeah, yeah. And so, you know, looking ahead a bit, for those that maybe don't have that idea yet, or are contemplating some things: what's an important action that a marketing leader should take to, you know, make sure their brand is prepared for this future you've described, where, you know, autonomous AI agents are a key part of the digital landscape?
B
Well, I think it's important to start experimenting. The cost of entry, and I'm not even talking about just the dollar cost, but the effort cost, even the mental cost of trying these things, is not very high. At least at Bright Data, we have what we call PLG, so a product-led kind of system. You can sign up, you can try things for yourself, and you can see, even on a small scale, the value of public web data. It's fascinating for brands to just send a couple of queries anonymously to chatbots in different countries, in different languages, for example, and see how they are perceived. Very eye-opening experience. And I think that this is becoming more and more important because, you know, these things are actually impacting consumer decisions. Right? You're asking an open question and you're getting a recommendation. Whereas before, in the traditional SEO world, you would get a bunch of links, and as a human you would be tasked with going to those links and making some decisions. Right now you have an agent that processes web information and gives you an answer. So you want to know what the web information that AI agents are looking at looks like, and how you can potentially manipulate that web information so that you would be better off when that AI agent makes its recommendation.
C
Yeah, yeah, makes sense. Well, Ariel, thanks so much for joining today. I got a couple questions for you as we wrap up here. First one, if we were having this interview one year from today, what is one thing that we would definitely be talking about?
B
Okay, so I can't give you the name of a customer, but I can tell you that a few weeks ago I went to visit a customer, a robotics company, and I saw with my own eyes one of those home robots. Okay. And I think that's really the next thing: we're going to see AI moving from the virtual world into the physical world. We're going to see robots starting to appear, impacting the economy and impacting people's lives. Today we serve something like 14 or 15 of the top LLMs in the world. And a lot of the information that they are asking of us is related to the physical world. That would be video or images or spoken language, for understanding people and maybe even responding back. So I think we're going to see robots, because we know that this information is being used to train robots. What now appears to be science fiction might, you know, seem pretty normal next year.
C
Yeah. Yeah.
B
Wow.
C
And last question for you here. What do you do to stay agile in your role and how do you find a way to do it consistently?
B
Okay, so I try to, as we say here at Bright Data, stay in the trenches. So even though I'm a chief product officer and this is a big company, I look at support tickets, I look at what breaks in production, I talk to customers. The web changes in literally real time. Okay. Things can change immediately. Websites change overnight. You know, blocking mechanisms change overnight. All sorts of things change. So it's very important to stay agile in understanding what's going on and in shipping. So when we ship product here, we ship in really small, measurable steps. At Bright Data, we do 66. 0 product releases per day. So we iterate extremely, extremely fast. And personally, one of the things I like is to try new tools constantly. So whenever I read about some new startup or some Y Combinator or something like that, I will go and I will pay for the first month and I'm going to try it out. The cost is meaningless. The cost of missing out is much higher. Okay, so I will try this. And even, you know, as a product fanatic, it's always interesting to see new products and new onboarding mechanisms. So just try new things constantly.
C
Yeah, I love it.
B
Thanks, Greg. Thank you for all of these thoughtful questions. You know, public web data in general is kind of a misunderstood space. A lot of people have questions about it. Is it legal? Is it okay? And I think that we have demonstrated that it is. The mission that we've had at Bright Data has always been the same: to allow kind of uninterrupted access to this public web data for everyone. It used to be purely kind of old-fashioned, so to speak, web scraping. Now it's more AI related, but at the core it's the same thing. So we take this human information, make it available to machines at scale, and these are going to be a few interesting years with the AI and the robots.
C
Agreed. Agreed. Well, again, I'd like to thank Ariel Shulman, Chief Product Officer at Bright Data, for joining the show. You can learn more about Ariel and Bright Data by following the links in the show notes. This episode is brought to you by TEKsystems. They're leaders in full-stack tech services, talent solutions, and helping companies put it all in action. You can learn more at teksystems.com. That's teksystems.com. And thanks again for listening to the Agile Brand Podcast. If you like the episode, hit subscribe and drop a rating so others can find the show too. And if you're interested in consulting, advisory work, or if you need a speaker for your next event, feel free to reach out. Just visit gregkihlstrom.com. That's G-R-E-G-K-I-H-L-S-T-R-O-M dot com. The Agile Brand is produced by Missing Link, a Latina-owned, strategy-driven, creatively fueled production co-op. From ideation to creation, they craft human connections through intelligent, engaging, and informative content. Until next time, stay curious and stay agile.
The Agile Brand with Greg Kihlström®: Expert Mode Marketing Technology, AI, & CX
Episode #845: Bright Data Chief Product Officer Ariel Shulman on Why Access to Real-Time Web Data Is Critical in the Age of Autonomous AI
Date: April 17, 2026
Host: Greg Kihlström
Guest: Ariel Shulman, Chief Product Officer, Bright Data
This episode centers on the growing strategic importance of real-time public web data for brands, especially as autonomous AI agents and generative AI platforms become integral to marketing technology (MarTech) and customer experience (CX) operations. Greg Kihlström and Ariel Shulman explore how access to up-to-date, publicly available web data underpins everything from competitive intelligence and dynamic pricing to brand reputation and AI training, highlighting the legal, technical, and ethical complexities involved.
[05:54]
“Public web data moved from something that was kind of ‘nice to have’... to being strategic. ...If you don't monitor that closely, you risk making decisions based just on the information you have, ignoring a wider base. Generative AI has really accelerated how this is happening...” (Ariel Shulman, [06:19])
[08:05]
“One of the analogies I like to make … it’s kind of like quantum mechanics, but in reverse … Everything works OK until you go above a certain threshold and then things become crazy...” (Ariel Shulman, [08:05])
[10:55]
“One major use case ... is understanding how AI systems perceive your brand…. So it is important to … see how you appear, when competitors show up, what is your kind of ‘share of answer.’” (Ariel Shulman, [10:55])
[13:24]
“Start light with kind of a pilot ... put [the data] into maybe even something like Google Sheets, see what the data feels like. And once you have a feel for it, then you can go at scale....” (Ariel Shulman, [13:24])
[15:29]
“If the information is available publicly, it can be collected lawfully.... There’s a famous quote ... the judge basically told the lawyer ... ‘You do not own the Internet.’” (Ariel Shulman, [16:21])
[19:08]
[21:23]
“It’s fascinating for brands to just send a couple of queries anonymously to chatbots ... and see how they are perceived. Very eye-opening experience.” (Ariel Shulman, [21:23])
[23:54]
“The cost of missing out is much higher... I will try this [new tool/service], and ... it’s always interesting to see new products and new onboarding mechanisms. So just try new things constantly.” (Ariel Shulman, [24:41])
For more insights, follow Bright Data and Ariel Shulman, and stay subscribed to The Agile Brand with Greg Kihlström.