
A
Hello, everyone, and welcome to the Stack Overflow Podcast, a place to talk all things software and technology. My name is Ryan Donovan, and today we're talking about AI bots, their traffic, and the effects they have on the Internet. My guest for that is Akamai data scientist Robert Lester. So welcome to the show, Robert.
B
Thank you, Ryan. Happy to be here. I've been a big fan of Stack Overflow and the work done there for a long time. Very exciting.
A
We love to hear that. And as a site that is concerned about the traffic we receive on the Internet, this is a topic close to our heart. But before we get to that, we'd like to get to know you. How did you get into software and technology?
B
I didn't start my academic and professional career here. I actually started in ancient languages. That was a big interest of mine for a long time, and I studied it. It led me naturally towards language and logic problems, which got me into computer science and engineering, and from there towards data science, where I get to blend engineering and problem solving with data storytelling and crafting. So I really like both of those.
A
So what was your favorite ancient language?
B
I spent a lot of time reading ancient Greek and Latin poetry, primarily the classics.
A
There you go. So today we're going to be talking about the AI bots on the Internet. And we've always had bots crawling the Internet for search indexing and such. But from what I've heard, it seems like the bots that the AI companies have sent out are sort of another level of traffic. Can you give us a sort of overview of the research that you did on this?
B
If we back up, it starts with classification and taking a look at where we are in the evolution of all this stuff. When we think about the tech giants, the traditional ones, especially those that have already scraped the Internet for a large majority of its data, like Google or Amazon, they already have these products built out. They've got massive internal repositories of data as well as the infrastructure in place to scrape the Internet daily, update their indexes, and all of that. So from a training perspective, the only ones we really classify as AI bots in this space are the adjunct research bots, like you might see from, for example, the Google Vertex Lab. It's sometimes really difficult to engage with a customer about Googlebot: traditionally you want to rank high in search, and that's been going on for 15 years on the Internet, but at the same time the same data is getting mixed in with their AI training data. So it's difficult to draw that line. Then, moving away from the bots that are primarily for training data, we get over to user-driven activity, which we classify as fetchers. These are invocations of external fetching when users are using the model, for someone like OpenAI or Anthropic, which don't have a search index already fully built. With something like Google AI Overviews, by contrast, they're able to make internal fetches against their already indexed results. So it's a question of presence and categorization rather than these companies taking over.
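As a rough illustration of the training-versus-fetcher split described above, the sketch below sorts requests by the product token a bot declares in its `User-Agent` header. The token table is an illustrative assumption, not Akamai's taxonomy: `GPTBot`, `ChatGPT-User`, `OAI-SearchBot`, `ClaudeBot`, and `Googlebot` are publicly documented crawler names, but operators add and rename tokens, and any header can be spoofed, so a match is a claim to verify rather than an identification.

```python
from enum import Enum


class BotCategory(Enum):
    TRAINING_CRAWLER = "training crawler"
    USER_FETCHER = "user-driven fetcher"
    SEARCH_CRAWLER = "search crawler"
    UNKNOWN = "unknown"


# Illustrative, hand-written mapping (an assumption for this example, not a
# vendor taxonomy). Keys are lowercase product tokens bots put in User-Agent.
# Googlebot is filed under search even though, as discussed, its data may
# also feed AI training; the category boundary is a judgment call.
TOKEN_CATEGORIES = {
    "gptbot": BotCategory.TRAINING_CRAWLER,
    "claudebot": BotCategory.TRAINING_CRAWLER,
    "chatgpt-user": BotCategory.USER_FETCHER,
    "oai-searchbot": BotCategory.SEARCH_CRAWLER,
    "googlebot": BotCategory.SEARCH_CRAWLER,
}


def classify_user_agent(user_agent: str) -> BotCategory:
    """Return the category of the first known token found in the header.

    The header is self-declared and trivially spoofed, so the result is a
    claim that still needs verification (IP ranges, DNS, behavior).
    """
    ua = user_agent.lower()
    for token, category in TOKEN_CATEGORIES.items():
        if token in ua:
            return category
    return BotCategory.UNKNOWN


print(classify_user_agent("Mozilla/5.0 (compatible; ChatGPT-User/1.0)").value)
# prints "user-driven fetcher"
```

Plain substring matching keeps the sketch short; a production classifier would parse the header more carefully and keep its token list current.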
A
So fetching is when you do a research query or it pulls in just-in-time data and does inference on the fly, right?
B
Exactly. Yeah, that's how we're classifying them.
A
So my sense was that the AI bots are putting a lot more bot traffic on pages. Is that borne out by your research?
B
So it's not a massive needle mover yet, but the growth rate is what we're more interested in. As far as the raw numbers, we're still only looking at about 1 percent of all the validated bot traffic that we see on a daily basis. But that's massive growth over what we were seeing at the beginning of the year, or last year at this time: we've gone up, I think, 400% across all industries. So it's been a pretty incredible increase and something that we're definitely keeping our eye on.
A
Yeah, well, the way AI companies use data has changed in the last year. In the beginning it was all for training data, and now it's reasoning models, chain of thought, agentic stuff. Do you see the agentic stuff increasing that traffic load even more?
B
So part of the problem here is drawing lines between what's a bot and what's not. If you think about what an agent is, it's automated like a bot, but it reasons in a more intelligent manner than a bot, and a lot of the time it's non-deterministic. So the behavior isn't quite the same. Similar to classifying these user-driven fetchers, it's hard to draw that line. What we're moving towards is more identification and intent of these bots, or whatever you want to call them, these entities that your online products are interacting with, and moving away from "bot or not," because that seems to be a less important question at this point.
A
I think I remember that site on the early Internet. Bot or not.
B
Yeah. It's very different. It's rapidly evolving and it's pretty cool.
A
Yeah. Because when you have an AI agent, it's almost like giving everybody their own sort of bot, in a way.
B
Absolutely. Something else these large language models are doing is increasing access for people. So while we're seeing this rise in AI bots, there's also been increased Internet activity across the board.
A
So you said the big players already have everything indexed. Basically, they have a copy of the Internet on their servers, right?
B
Something like that. I won't speak for them necessarily, but they have a lot of data at their disposal. And the key thing is also they're reusing infrastructure in a lot of cases to where, if you want to classify it as an AI bot, sure, you absolutely can. And in some cases that makes sense, but it also makes sense to classify it as a traditional search engine optimization bot.
A
Do you think the other AI companies will start doing this? Should they do this? Is there a reason that they don't?
B
I think that they're probably working on it. We do see very large amounts of training activity from some of the bigger names in the space. As you'd expect, the leaders in the space are definitely the ones making more waves on the Internet. I assume that with each training run for a new model release they're collecting more and more of the Internet's data and trying to do better and better things with it.
A
You know, we've seen some mitigation efforts against these bots, whether to reduce traffic or to protect the content of these websites: things like different licensing schemes, a sort of closing of the Internet. Do you think those are effective? Do you see any of that having an effect?
B
The most important question first, I think, is: what is your business model? What does it rely on? And what posture makes the most sense for you? Something we've done at Akamai that I think is a pretty responsible approach is being nuanced. We're approaching this as a management problem, not necessarily as a threat vector. These bots can be beneficial to people in some industries while being detrimental to others. For example, someone in hospitality or retail is going to be more inclined to increase their LLM retrieval optimization. They want to be the first-ranked page: you want your hotel room up there first, you want your sneakers coming to the top of the search results. But at the same time, digital media companies, news publishers, people in that industry, don't want their content aggregated. That hurts their referrals and their click-through rate, and in many cases that's bad for business. So mitigation isn't the only number we're going for, though we have seen a rise in the number of customers mitigating these AI bots on a case-by-case basis. We're seeing a lot of varied approaches across the board, and I think that's pretty healthy for the space.
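The per-industry posture described above can be expressed, in its simplest voluntary form, as a robots.txt policy. The sketch below assumes a hypothetical news publisher that opts out of two training crawlers while leaving search crawlers and user-driven fetchers alone, and checks the result with Python's standard `urllib.robotparser`. The policy text and the `news.example` URL are made up for illustration; robots.txt is a request rather than enforcement, and whether a given bot honors it is up to its operator.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a news publisher: opt out of the two training
# crawlers named here and leave everything else open.
PUBLISHER_ROBOTS = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: *
Allow: /
"""


def build_policy(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that is already in memory (no network fetch)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


policy = build_policy(PUBLISHER_ROBOTS)
for agent in ("GPTBot", "ClaudeBot", "ChatGPT-User", "Googlebot"):
    # can_fetch expects the crawler's product token, not the full header.
    print(agent, policy.can_fetch(agent, "https://news.example/2025/story.html"))
# prints: GPTBot False, ClaudeBot False, ChatGPT-User True, Googlebot True
```

A retailer chasing retrieval visibility would simply leave out the `Disallow` groups, which is the whole difference between the two postures at this level.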
A
It seems like the difference you're pointing out is whether the content you're putting out supports the business or is the business. When you say you've measured the bot traffic, how do you get the data? I know you all are a big infrastructure company, but how does that work on the back end?
B
So there are different data feeds that we rely on. Obviously we can't catch every single thing that comes in across the Internet or else we'd be absolutely drowning. We rely on research feeds from what we're seeing across our customer base for both threat research and larger analytics purposes. You know, we're able to look at both attack traffic and non attack traffic. And so this really helps inform a lot of our research, our model building and things of that nature. It's a large amount of data and it's often like looking for a piece of hay in a haystack. So we do our best there, but we rely on a lot of different feature data that we're able to gather from our different products.
A
Do you end up using any AI to sort out the haystack data?
B
We're constantly innovating at Akamai, and even on my team we work heavily in threat research and a lot of other areas. But a lot of what we do starts with ground-level analytics, taking a look at the space at large, then applying more advanced research methods and getting towards model building. As a final result, we're aimed at enhancing a lot of products. Our product backbones are still very fundamental, and we do our best to increase their effectiveness with these newer concepts. We leverage large language models on our own, we leverage neural networks, and it's certainly something where we're always trying to improve.
A
Did you see the bot traffic evenly distributed across sites or was it very strongly targeted towards larger sites?
B
There are definitely winners and losers in this game. If you were to guess which industries would be the top targets of these AI bots, what would you say?
A
I'd imagine it's probably somewhere in the tech industry, right?
B
It's actually commerce. The most targeted industry is commerce, which encompasses retail, hospitality, things of this nature, different online brands. What's really happening is that most requests are coming from bots serving spaces that need to be constantly updated. You're going to see a lot of fetcher requests towards hotel providers and similar companies because they're always changing rates on rooms, and people are always trying to get the best deal. So it's interesting that that's where this is funneling, but it makes a lot of sense as far as market dynamics go.
A
Do you have a sense of what the purpose of these bots is? Are they front ends creating alternative commerce marketplaces? Are they researching prices?
B
It's actually really interesting. We're just starting to take a look at a report from OpenAI and Harvard researchers released through the National Bureau of Economic Research, and it said that ChatGPT user-driven traffic is moving away from work and towards non-work activities. We're also seeing a lot more "doing" than in the past, where people ask models to do things for them rather than just asking questions. I think that's probably in large part because we're starting to see models engage more with external resources, whether through agents, fetchers, or different search triggers and search bots. I imagine there are a lot of wrappers out there that are just an API call to one of OpenAI's models, trying to build the most effective hotel fetcher. But at the same time we are seeing a lot of organic user-driven traffic as well.
A
You know, also with the unevenness of distribution, it's not equally driven by all of the bots, right?
B
There are certain standout ones, and it's constantly changing, which is crazy. Every week something new is happening. We published a blog on this earlier in September, I believe, talking about OpenAI. After their GPT-5 release a lot of stuff went kind of insane; their numbers were going up and down like crazy. They had released this new model, and when you made a search request we would see a lot more results from it, and we were at least able to see request growth in ChatGPT-User, which is the user agent for that bot. But yeah, it went insane. Later it seemed they dialed it back, and then, crawling through dev forums, we saw people reporting a lot of ghost requests made by ChatGPT. Soon after there was a new release with seemingly a fix for that. That's something else that stands out about these AI-native companies, the ones that have popped up in the past five years: they're not afraid to build in public, and they're not afraid to move fast, break things, and put them back together. They're having a lot of success doing it, and it definitely bears out in what we see from them.
A
Yeah, I mean, in this case, though, the things that they're breaking may be the rest of the Internet.
B
We hope not. So far it's pretty benign, but yeah, it's definitely worth watching.
A
When you say the bot behavior was insane, is that a product of the fluctuating, constantly changing behavior, or were there things where you were like, what is this guy doing?
B
Oh, no, I'd say it's definitely the former, and it's relative to what we know. Traditional search crawlers mostly behave very predictably. We see seemingly circadian patterns that might relate to load shifting between clusters or something like that. When one of them makes a massive change, we look into it and find they've made an infrastructure change and this is the new norm. We haven't really been able to establish a lot of norms for these AI bots, partly because they're so new, but also because they're growing very fast, both in popularity and in infrastructure: they're getting better at collecting data and at letting their models loose on the Internet, and it's cool to watch.
A
And the fact that the nature of what AI does and can do keeps changing too is fascinating.
B
Yeah, absolutely. Some of the more bleeding-edge stuff we're looking at now is really interesting. We're starting to see agents interacting at point of sale, and we're not entirely sure how the public is going to react to that, or whether it's a viable future. But it's a really interesting concept: these agents are actually exchanging money and buying products. What does that do to the customer who is optimizing for sales from anyone, not just humans? Maybe you need to learn how to sell to the agents now, which is a totally different question.
A
There's so many weird things with that. Like first of all, are you comfortable having your agents spend your money?
B
These things are pretty good and they're getting better, but they're not perfect. So it raises a pretty interesting question on both the client side and the customer side, from a sales perspective and also a security perspective, because we don't know exactly how they're going to interact. That's why it gets back to the question of not bot or not, but intent and identification.
A
Have you seen any data on the difference in behavior between bots that simulate a browser and take entire web pages versus ones that just call APIs directly?
B
We haven't seen a lot of differences, though there are some in how these AI companies present themselves. Some of these companies are really cooperative. They're doing their best to be good participants in the online universe: self-identifying, helping us verify that they are who they say they are, and making sure they don't get blamed for anything that wasn't them, which is really positive. For example, one company that does this includes a certain identifying feature on a lot of the requests that come from interaction in their browser. However, when people make calls through the API, whether for efficiency reasons or whatever the case may be, those requests don't include the same signal. So despite their best efforts, and even though they're still identifying in some respect, it becomes a trickier question, and we have to rely more on behavioral signals than on self-identification alone.
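Self-identification only helps if it can be checked. One common check, sketched below under stated assumptions, compares the connecting IP against ranges the bot operator publishes; some operators do publish such lists, but the bot name `ExampleAIBot` and its ranges here are hypothetical placeholders taken from the RFC 5737 documentation blocks. The three outcomes mirror the situation just described: traffic without a verifiable signal ends up `unverifiable` and has to be judged on behavior instead.

```python
import ipaddress

# Hypothetical bot name and placeholder ranges (RFC 5737 documentation blocks).
# A real deployment would load and periodically refresh the operator's list.
PUBLISHED_RANGES = {
    "ExampleAIBot": ["192.0.2.0/24", "198.51.100.0/25"],
}


def verify(claimed_bot: str, client_ip: str) -> str:
    """Check a self-declared bot name against the operator's published ranges.

    Returns "verified", "mismatch" (name claimed, IP outside every range), or
    "unverifiable" (no published ranges, so fall back to behavioral signals).
    """
    cidrs = PUBLISHED_RANGES.get(claimed_bot)
    if cidrs is None:
        return "unverifiable"
    addr = ipaddress.ip_address(client_ip)
    # Membership is False across IP versions, so an IPv6 client never
    # matches an IPv4 range.
    if any(addr in ipaddress.ip_network(cidr) for cidr in cidrs):
        return "verified"
    return "mismatch"


print(verify("ExampleAIBot", "192.0.2.17"))     # verified
print(verify("ExampleAIBot", "203.0.113.5"))    # mismatch
print(verify("SomeOtherAgent", "192.0.2.17"))   # unverifiable
```

Forward-confirmed reverse DNS is another widely used verification approach; it is left out here because it needs live lookups.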
A
For those bots that don't self identify, what are the sort of behavioral signals that you use to spot them?
B
Can't give away everything. But we do factor in a lot of features, whether it's network telemetry or the actual behavior of how these things work, which is something we've spent a while building models for. That's been an awesome and super interesting process: trying to identify how these bots actually behave online. Realistically, we're looking at everything: self-identifying features, telemetry, different signals across the board, and a lot more feature data. But yeah, can't get too deep into it.
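Akamai's actual features aren't disclosed, so the following is only a toy example of the kind of behavioral signal mentioned: how regular a client's request timing is. The assumption, labeled as such, is that scheduled automation tends to produce evenly spaced requests while people browse in bursts; the function names and the `threshold` of 0.1 are hypothetical and arbitrary, and a single feature like this would be one input among many, never a verdict on its own.

```python
from statistics import mean, pstdev
from typing import Optional, Sequence


def interarrival_cv(timestamps: Sequence[float]) -> Optional[float]:
    """Coefficient of variation (population stdev / mean) of request gaps.

    Near 0 means metronome-like timing; larger values mean burstier traffic.
    Returns None when there are too few requests to say anything.
    """
    if len(timestamps) < 3:
        return None
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    avg = mean(gaps)
    if avg == 0:
        # Every request arrived at the same instant: treat as perfectly regular.
        return 0.0
    return pstdev(gaps) / avg


def looks_scheduled(timestamps: Sequence[float], threshold: float = 0.1) -> bool:
    """Toy heuristic (hypothetical): flag nearly constant request spacing."""
    cv = interarrival_cv(timestamps)
    return cv is not None and cv < threshold


print(looks_scheduled([0, 10, 20, 30, 40]))      # True: one request every 10 s
print(looks_scheduled([0, 1, 2, 30, 31, 90]))    # False: bursts and pauses
```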
A
Don't want to spill the secret sauce.
B
Exactly. No, it's still moving fast, and some of these partners are more cooperative than others. And it's still the wild, wild west out there.
A
Are there things you are nervous about or hopeful about in the increasing AI bot storm?
B
I'm hopeful that customers are going to be able to engage with these bots in the most effective manner for them. It's a changing landscape, and it's a little intimidating when this stuff is moving so fast and you have to make plans around it, but there's also massive opportunity here. The first people who are able to game this in their favor are going to be massive winners. If what we've seen so far indicates anything about growth and the trajectory of where this is all heading, this will be an important part of the new online economy and the Internet of things. So it's a really interesting proposition. In the short term, we're still working through all the seasons with this stuff. We've been looking at it for a while now, but the traffic today and the way these models are used today are way different than they were a year ago, and even six months ago. So we're looking at Cyber Week coming up, and we're really excited to dive into exactly what we see. Some of these companies with agents, for example, didn't exist at this time last year, so we're excited to see what they do. And bots always go crazy during the holiday season. Everyone's familiar with Grinch bots and all of these more traditional threat vectors as the holidays approach, but this is an entirely new ballgame.
A
You know, Black Friday, Cyber Monday and then Bot Tuesday, maybe.
B
Yeah, something like that. Keep an eye out; we'll definitely be putting some stuff out about that and talking about what we see. I think the best message to take away from it all is just how wide open this arena is right now. There are so many options from a customer standpoint, and so many different factors going into the equation, that being able to see and manage these bots is probably the most important thing. You want to be ahead of the curve on this, and it's something I think we do really well at Akamai, as far as being able to provide this service. Being prepared is the best possible step forward. So get in touch with your bots.
A
All right, it's that time of the show again where we shout out somebody who came onto Stack Overflow, dropped some knowledge, shared some curiosity, and earned themselves a badge. Today we're shouting out a Populist badge winner, somebody who dropped an answer so good it outscored the accepted answer. So congrats to Evan Phoenix for answering "LLVM IR back to human-readable source language?" If you're curious about that, we have an answer for you in the show notes. I am Ryan Donovan. I edit the blog and host the podcast here at Stack Overflow. If you have topics, questions, concerns, or comments, please email me at podcast@stackoverflow.com, and if you want to reach out to me directly, you can find me on LinkedIn.
B
And I'm Robert Lester. You can find me writing Akamai AI Pulse blogs, or if you want to reach out, I'll be on LinkedIn as well.
A
All right, thank you for listening, everyone, and we'll talk to you next time.
Release Date: January 6, 2026
Host: Ryan Donovan
Guest: Robert Lester (Akamai Data Scientist)
This episode dives deep into the explosive evolution of bot traffic on the Internet, particularly focusing on the surge of AI-driven bots from companies like OpenAI, Google, and Anthropic. Host Ryan Donovan and Akamai Data Scientist Robert Lester discuss what these bots are, how they differ from traditional search crawlers, the implications for web infrastructure and businesses, and speculate on the future of automated agent activity online. Instead of a simple "bot or not," the conversation explores evolving strategies for detection, management, and the blurry boundaries between helpful automation and disruptive scraping.
"We're kind of moving towards more of identification and intent of these bots... moving kind of away from bot or not, because that is seemingly a less important question at this point."
— Robert Lester [04:59]
"As far as raw numbers, we're still only looking at this stuff as about a percent... but this is a massive growth... up I think 400% across all industries."
— Robert Lester [04:06]
"If you were to guess what the top industries were going to be targeted by these AI bots... It's actually commerce."
— Robert Lester [11:16]
"After their GPT5 release a lot of stuff went just kind of insane... But yeah, it went insane. And then it seemed later that they dialed it back..."
— Robert Lester [13:12]
"We're starting to see agents interacting at point of sale, which is something that we're not entirely sure how the public is going to react to..."
— Robert Lester [15:55]
"Mitigation isn't the only number that we're going for though. We have seen a rise in the number of customers that are mitigating bots, these AI bots, and on a case by case basis."
— Robert Lester [07:37]
"We do factor in a lot of features, whether it be something like network telemetry, whether we're starting to look at the actual behavior of how these things are working... we're looking at everything."
— Robert Lester [18:40]
"The first people who are able to game this in their favor are going to be massive winners... this will be an important part of the new online economy and the Internet of things."
— Robert Lester [19:42]
"The best message to take away from it all is just how wide open this arena is right now... Being prepared is the best possible step forward. So get in touch with your bots."
— Robert Lester [21:18]
This episode shines a spotlight on the fast-moving frontier of AI bot activity on the web, unraveling the complexities and huge (sometimes unpredictable) implications for businesses, technologists, and users alike. It’s a world where bots both help and hinder, and where the difference between agent, bot, and user blurs more by the day. As Robert Lester sums up, “being prepared is the best possible step forward. So get in touch with your bots.”