a16z Podcast: Enabling Agents and Battling Bots on an AI-Centric Web
Date: July 4, 2025
Guest: David Mytton (CEO, Arcjet)
Host: Joel de la Garza (a16z Infra Partner)
Episode Overview
This episode explores the rapid evolution of web traffic toward AI-driven agents and the complex challenge of distinguishing helpful automated agents from harmful bots. Host Joel de la Garza speaks with David Mytton, CEO of Arcjet, about building nuanced, application-level security controls that enable new agent experiences without resorting to the blunt instrument of blocking all non-human traffic. Together, they balance optimism about the opportunities of AI agents with the hard realities of scaling and securing today’s web.
Key Discussion Points & Insights
The Changing Nature of Web Traffic
- Explosion in Bot & Agent Traffic
  - Roughly 50% of web traffic is already automated.
  - AI agents are still in their early stages but are poised for massive growth (00:01, 18:37).
  - Older approaches to bots, such as blocking on IP or user agent alone, are too crude for today’s needs.
- Nuanced Needs
  - Not all bots are bad; many provide direct business value (e.g., purchasing bots, search indexers).
  - Treating agents as first-class users requires fine-grained, contextual controls (00:32, 02:20); a minimal rule sketch follows the quote below.
Notable Quote:
“Just blocking them, just because they're AI is the wrong answer. You've really got to understand why you want them, what they're doing, who they're coming from, and then you can create these granular rules.”
— David Mytton (00:01 / 18:37)
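The “granular rules” described in the quote could be captured declaratively. Below is a minimal, hypothetical sketch in TypeScript; the type and field names are illustrative and not tied to any particular product’s API.

```typescript
// Hypothetical shape for the kind of "granular rules" described in the quote: each rule
// records who the agent is, who operates it, why it is present, and what it may do.
// The type and field names are illustrative, not any particular product's API.
type AgentRule = {
  agent: string;                // self-identified user-agent token, e.g. a crawler name
  operator: string;             // who the agent is coming from
  purpose: "search-indexing" | "training" | "user-task" | "unknown";
  allowPaths: string[];         // endpoints this agent may visit
  denyPaths: string[];          // endpoints it must never touch
  rateLimitPerMinute: number;
};

export const rules: AgentRule[] = [
  {
    agent: "ExampleSearchBot",  // hypothetical crawler name
    operator: "search.example.com",
    purpose: "search-indexing",
    allowPaths: ["/products", "/blog"],
    denyPaths: ["/checkout", "/account"],
    rateLimitPerMinute: 60,
  },
];
```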
Good Bots vs. Bad Bots: What’s at Stake?
- Ecommerce Implications
  - Blunt blocking can stop real transactions and hurt revenue (04:10).
  - As agents automate more actions for legitimate users, distinguishing intent becomes critical.
- Legacy Approaches & Their Shortcomings
  - Old-school bot-blocking tools mostly filter at the network level, missing crucial application context (05:09).
  - Modern sites need in-app tools that understand session context, user intent, and the particulars of each endpoint (see the middleware sketch after the quote below).
Notable Quote:
“Blocking anything that is called AI is too blunt of an instrument. You need much more nuance. And the only way you can do that is with the application context understanding what's going on inside your code.”
— David Mytton (05:30)
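To make the application-context point concrete, here is a hedged sketch of per-endpoint control in an Express-style app. The classifyAgent() helper is hypothetical and stands in for whatever combination of signals a site actually uses.

```typescript
// A hedged sketch of application-level, per-endpoint control in an Express-style app.
// classifyAgent() is a hypothetical helper standing in for whatever combination of
// user-agent, IP, session, and behavioral signals a site actually uses.
import express from "express";

const app = express();

function classifyAgent(req: express.Request): "human" | "good-bot" | "bad-bot" {
  const ua = req.get("user-agent") ?? "";
  // Self-identified crawlers still need IP / reverse-DNS verification (see later section).
  if (/googlebot|oai-searchbot/i.test(ua)) return "good-bot";
  if (ua === "" || /python-requests|curl/i.test(ua)) return "bad-bot";
  return "human";
}

app.post("/checkout", (req, res) => {
  // Checkout is human-only: even "good" crawlers have no business placing orders.
  if (classifyAgent(req) !== "human") {
    res.status(403).send("Automated clients are not allowed on this endpoint");
    return;
  }
  res.send("order placed");
});

app.get("/products/:id", (req, res) => {
  // Product pages welcome verified crawlers and shopping agents alike.
  res.send(`product ${req.params.id}`);
});

app.listen(3000);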
Standards, Signals, and Enforcement Mechanisms
- Robots.txt and Beyond
  - Robots.txt is a voluntary, decades-old standard mainly followed by ‘good’ bots like Googlebot (07:10).
  - Malicious or non-compliant bots either ignore robots.txt or use it to find sensitive areas (07:10).
  - Need for enforceable, granular forms of agent control, as opposed to mere signaling.
- Rise of Multiple AI Agents
  - Companies like OpenAI deploy multiple bots/agents with various purposes: training crawlers, search indexers, real-time task agents (08:48).
  - Website owners must distinguish among the different bot types, allowing desired ones and restricting others (10:53); see the robots.txt sketch after the quote below.
Notable Quote:
“Blocking all of OpenAI's crawlers is probably a very bad idea.”
— Joel de la Garza (10:45)
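One way to act on that advice is a robots.txt that treats OpenAI’s crawlers individually rather than as a single block. The sketch below assumes OpenAI’s published user-agent tokens (GPTBot, OAI-SearchBot, ChatGPT-User); check current documentation before relying on the exact names, and remember that robots.txt is only a signal.

```typescript
// Sketch of a robots.txt that treats OpenAI's crawlers individually instead of blocking
// them all. The crawler tokens (GPTBot, OAI-SearchBot, ChatGPT-User) follow OpenAI's
// published user-agent names; verify against current documentation before relying on them.
// robots.txt is only a signal, so enforcement still has to happen server-side.
export const robotsTxt = `
# Keep the model-training crawler out entirely
User-agent: GPTBot
Disallow: /

# Let the search indexer in, but not into private areas
User-agent: OAI-SearchBot
Allow: /
Disallow: /account/
Disallow: /checkout/

# Allow real-time, user-initiated fetches
User-agent: ChatGPT-User
Allow: /
`.trimStart();
```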
Layered, Contextual Control
- Building Up Layers
  - Effective defenses require a layered approach:
    - robots.txt for cooperative bots.
    - IP-based filtering, with the caveat that proxies and ‘residential’ proxies now muddy traditional signals.
    - User-agent heuristics: good bots typically self-identify.
    - Fingerprinting protocols, useful for recognizing recurring traffic signatures (12:47-14:57).
    - Reverse DNS lookups to verify bot identity (sketched after the quote below).
    - Session and behavior analysis, needed for deep application context.
  - Example: JA3/JA4 hashes aggregate network/session metrics to fingerprint bots (14:52).
Notable Quote:
“You can block a lot of them just on that heuristic combined with the IP address.”
— David Mytton (14:11)
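As an illustration of combining the user-agent heuristic with IP verification, here is a minimal sketch of reverse DNS verification for a self-identified Googlebot, using Node’s built-in dns module. Google documents this pattern for Googlebot; other operators publish their own verification mechanisms (IP lists or reverse DNS), so adapt the check per crawler.

```typescript
// Minimal sketch of one layer from the list above: verifying that traffic claiming to be
// Googlebot really comes from Google, via reverse DNS plus a confirming forward lookup.
import { lookup, reverse } from "node:dns/promises";

async function isVerifiedGooglebot(ip: string): Promise<boolean> {
  try {
    const hostnames = await reverse(ip); // e.g. "crawl-66-249-66-1.googlebot.com"
    const host = hostnames.find(
      (h) => h.endsWith(".googlebot.com") || h.endsWith(".google.com")
    );
    if (!host) return false;
    const { address } = await lookup(host); // forward-confirm to defeat spoofed PTR records
    return address === ip;
  } catch {
    return false; // no PTR record or lookup failure: treat as unverified
  }
}

// Usage: combine with the user-agent heuristic before trusting the crawler.
isVerifiedGooglebot("66.249.66.1").then((ok) => console.log(ok));
```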
Agent Avatars, User Representation, and Complexity
- Agents as Avatars
  - Future web activity will be dominated by agents acting on users’ behalf, requiring nuanced policy enforcement (19:08).
  - Control systems must answer “Who is this bot acting for?” and “What are they trying to achieve?”
Notable Quote:
“These things are almost like avatars. They're running around on someone's behalf and you need to figure out who that someone is and what the objectives are.”
— Joel de la Garza (19:08)
The Human Proofing Problem
- Proving Humanness Remains Unsolved
  - Methods like digital signatures exist but have poor usability.
  - Machine learning (ML) and classic AI have long been used for traffic analysis, but LLMs face speed challenges for real-time use (22:04, 23:37).
  - Future: ultrafast, edge-deployed ML models may enable near-real-time inference and agent classification on individual requests (24:40); see the sketch after the quote below.
Notable Quote:
“All I can think is advertisers are going to love this. It just seems like the kind of technology built for... hey, he's looking at this product. Show him this one.”
— Joel de la Garza (25:27)
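To show the shape of the per-request, edge-inference idea above, here is a hedged sketch in which a hand-written scoring function stands in for a small trained model. Feature names and weights are illustrative only.

```typescript
// Hedged sketch of per-request classification in the request path. A real deployment would
// run a small trained model at the edge; here a hand-written scoring function stands in
// for it so the shape of the idea (features in, decision out, fast enough to run inline)
// is visible. Feature names and weights are illustrative only.
type RequestFeatures = {
  hasCookies: boolean;             // established sessions look more human
  userAgentClaimsKnownBot: boolean;
  reverseDnsVerified: boolean;     // did the claimed identity verify? (see earlier section)
  requestsLastMinute: number;      // from a per-IP or per-fingerprint counter
  pathIsSensitive: boolean;        // e.g. /checkout, /login
};

function scoreRequest(f: RequestFeatures): number {
  // Higher score = more likely unwanted automation.
  let score = 0;
  if (!f.hasCookies) score += 0.2;
  if (f.userAgentClaimsKnownBot && !f.reverseDnsVerified) score += 0.4;
  if (f.requestsLastMinute > 100) score += 0.3;
  if (f.pathIsSensitive) score += 0.1;
  return score;
}

export function decide(f: RequestFeatures): "allow" | "challenge" | "block" {
  const s = scoreRequest(f);
  if (s >= 0.7) return "block";
  if (s >= 0.4) return "challenge";
  return "allow";
}
```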
Noteworthy Quotes & Memorable Moments
| Timestamp | Speaker | Quote/Insight |
|-----------|---------|---------------|
| 00:01 | David Mytton | “50% of traffic is already bots, it’s already automated, and agents are only really just getting going...” |
| 06:33 | David Mytton | “It’s kind of like blocking Google from visiting your site... you’re no longer in Google’s index.” |
| 08:48 | David Mytton | “OpenAI has four or five different crawlers... one for training models, others for search, others for real-time tasks.” |
| 14:11 | David Mytton | “And Googlebot, OpenAI, they tell you who they are. And then you can verify that by doing a reverse DNS lookup...” |
| 19:08 | Joel de la Garza | “These things are almost like avatars. Right. They're running around on someone's behalf and you need to figure out who that someone is and what the objectives are.” |
| 22:04 | David Mytton | “The pure solution is digital signature, right? But we've been talking about that for so long and the UX around it is basically impossible for normal people to figure out...” |
| 24:40 | David Mytton | “That's what we're working on, is getting this analysis into the process, so that for every single request that comes through, you can have a sandbox that will analyze the full request and give you a response.” |
| 25:27 | Joel de la Garza | “As I listen to you say that and describe this process, all I can think is that advertisers are going to love this.” |
| 25:57 | David Mytton | “Right?” — “Yeah, we're in a weird world.” (Closing humorous exchange) |
Key Timestamps & Segments
- 00:01–02:20: Setting the stage: bots as a majority of web traffic; the agent explosion is coming.
- 02:20–05:09: Good bots vs. bad bots, why blunt blocking approaches damage legitimate usage.
- 05:30–07:10: Limitations of traditional solutions; robots.txt as a voluntary standard.
- 08:27–11:08: Agent use cases; the multiplicity of bots and why granular distinction matters.
- 12:47–15:13: Layers of defense: robots.txt, IP filtering, fingerprinting, and combining signals.
- 16:05–19:19: Adding authentication/identity layers, new paradigms (Cloudflare, Apple).
- 19:19–21:05: Granular control, the avatar analogy, trends in agent objectives and maliciousness.
- 21:05–24:40: Human proofing: the poor usability of digital signatures; ML and AI for traffic analysis; the cost and latency of inference.
- 24:40–25:38: Edge inference, implications for advertising, preventing click spam.
- 25:57–End: Light-hearted closing and a look at the future.
Conclusion
This episode challenges web developers, security professionals, and digital businesses to modernize their agent/bot policies in light of surging AI-driven traffic. The days of “block all bots” are over; a move to layered, context-sensitive, and even identity-aware approaches is underway. As AI agents become the primary interface for many users, tomorrow’s sites must welcome good agents and block bad ones—intelligently, flexibly, and in real time.
