Transcript
A (0:01)
This is the Everyday AI show, the everyday podcast where we simplify AI and bring its power to your fingertips. Listen daily for practical advice to boost your career, business and everyday life.
B (0:15)
There's always been AI risk, but in the early days of large language models and chatbots, that risk was like Bill getting something wrong in the blog post or Deborah putting up a hallucinated stat in the onboarding guide that was littered with em dashes and "delves." But AI risk today is a legit different ball game than risk was three and a half years ago. I mean, heck, AI risk today is unrecognizable from what it was three and a half months ago. And that's not an exaggeration, because after hearing for like five years that we're six months away from real AI agents, well, it's finally happened. And it was actually this perfect storm of multiple events that led to the unexpected business scenario that the corporate world now faces today. You either get on board with AI agents quickly or get left behind. But do it too quickly and you could go under. The risk, the security and the sprawl are real. And so we're going to tackle it all today on Everyday AI, the Start Here series edition. All right, well, welcome to Everyday AI. My name is Jordan Wilson, and if you're new here, this thing's for you. It's your daily livestream, podcast and free daily newsletter helping everyday business leaders like you and me keep up with all the news. Like a thousand new agents a day. What do we do? What do we try? Well, tune in, I tell you, and help you make the right decisions to grow your company and your career. So after 700 plus episodes, I realized I couldn't answer the most common question that people had for me. Like, Jordan, you have a lot of episodes. Where do I start? Well, that's why I created the Start Here series. So the Start Here series is the essential podcast series to both learn the AI basics and to double down on your knowledge. So this is volume nine, and you can go listen to all of the episodes in our Start Here series if you just go to starthereseries.com.
So that will give you free access to our Inner Circle community, and it'll put you right in the Start Here series space. So you can go listen to all of them, watch them, read about them all in one place, and interact with others who are doing the same. All right, so one other thing that you need to do before we get started. My gosh, y'all, you have to go listen to these: episodes 712 and 713. That is our 2026 AI prediction and roadmap series. That's like a culmination of a thousand hours of work over the past year to give you guys the blueprint for 2026. So make sure you go listen to those. All right, if this sounds kind of like our last episode in the Start Here series, not exactly. So make sure you go listen to that one if you didn't already. That was more of just the state of AI agents: where they are, what they are, how we should use them, should we use them. So make sure you go listen to volume eight from yesterday. That's episode 717. But today we're here to talk about the other side: the risk, the security and the sprawl that's changing everything. So here's what we're going to be covering on today's show, and, well, why it urgently matters. Because AI models didn't just get smarter, they got hands, right? The risk model changed when AI moved from generating text, like it did three and a half years ago, to now taking real actions, and a lot of times actions we're not aware of, and that's the scary part. An agent connected to your email and calendar or, you know, your company's data can act fast, confident and wrong, in the same way that AI models hallucinated three years ago. Well, they can still hallucinate now. And this isn't a future concern. This is happening right now. So on today's show, we're going to go over a simple mental model for why agent risk is fundamentally different from chatbot risk. We're going to talk about what OpenAI, Google, Anthropic and Microsoft are actually building to address this risk.
And I'm going to give you a practical Monday morning playbook that you can start using this week to address the risk, security and sprawl. Sound good? Yeah, sounds good to me. I'm excited, and I wrote all this. All right, so let's get a quick little catch up here. So probably, if you're listening to the show, AI is not new to you, right? But let me just give everyone the briefer here, right? So essentially from 2022 to mid-2023, early 2024, large language models were largely text systems, right? The risk was just limited to misinformation and data leaks and, you know, hallucinations and looking foolish. But that started to change, I'd say, in mid-2025. That's when the models started getting, well, exponentially more capable and agentic by default. That's the thing that people don't understand, right? I've actually had two great conversations with two really smart minds, kind of the head of agents at Cloudflare and then the head of Microsoft Research. And I learned a lot by talking to them both, you know, before and after the show as well. But one thing that came through as well: no one really knows what constitutes an agent and what doesn't. But I think that we can all agree that even today's large language models, they're agentic by nature, right? A lot of us are giving them access to all of our data, and not in all cases do they have write access, but in many cases they do. So when these models can think and act on their own and, you know, spin up a virtual environment in a terminal and access your computer, right? I keep saying this: even right now I have multiple agents running on my computer, right? I have Codex and Claude Code going right now. I'll probably spin up Antigravity later. I always have agents running, and they have access, and I don't necessarily know what they're doing, you know, in between. I always go back and look when they're done. But that's what's really changed in mid, you know, 2025.
But here we are in early 2026, and this is where now we're going to start talking about and actually putting into practice all those buzzwords, right, that people just, you know, started chatting about in like 2024 to, you know, sound super smart. Like, oh, governance and audit logs and, you know, isolation protocols, right? Like all those. Well, okay, well, good thing that everyone was talking about it, because, you know, we were trying to get our buzzword bingo. Well, now we actually need it. Right now it's time to talk about ethics and governance and guardrails when it comes to agentic capabilities. I like to put it like this, keep it very simple. 2022 and before, I'd say that AI was a dumb stationary brain, but it was a brain, right? So everyone was blown away, like, oh my gosh, it can think. Well, it was dumb, it was stationary, didn't move, couldn't really think ahead, right? And in 2023, well, it became a dumb stationary brain with tools, right? That's when, you know, the early version of ChatGPT Plus, you know, GPT-4, it had tools, right? It could go on the Internet as an example. So that's when it really started to open up what it could do, or at least the information that it could access. In 2024, well, it was still a stationary brain with tools, but it went from a dumb brain to a smart brain. I would say 2024 was the first time that we actually had smart models, because at the end of 2024, that's when we got reasoning models. And then in 2025, I think we still had smart brains with tools. But the difference now, instead of it being a stationary brain, it was a proactive brain. In 2025 it could go out, especially at the end of 2025, the second and third quarter; it could make moves on its own, right? Proactively. It could schedule things, or, you know, it could start acting over long periods. And then what brings it to 2026 is, well, now we have that smart, proactive brain with tools and arms, right?
So tools are cool, but when you have arms, you can actually use them in a real way. And I think that's where, you know, agents now have teeth when they are autonomous, proactive, and smart. So here's why agent risk, I think, feels different. Because, you know, like I said with the chatbots, there was always risk. But was it really? I mean, yeah, worst thing you can do is you get in trouble putting out something hallucinated and you look foolish, right? Does your company go under? Probably not, right? Do you expose every single dark secret? Right. It's not that bad. I mean, it is, but it's not. But the new agent layer, it's just a whole new type of risk that we're not even ready for, if I'm being honest. If you listen to the show, I say this a lot. I'm like, this year is going to be scary. People are not ready. Businesses are not ready. And I mean that, right? That kind of business predicament that I talked about in the opening of the show, that is real. If you don't run to use AI agents this year, you're toast. You are literally toast. I don't care if you're a small business or a $20 billion revenue business, you're toast, right? But if you sprint too quickly, you can go under, because you could make devastating mistakes that would be highly improbable to recover from. Because AI agents can do things much worse than even the worst human, right? Have you ever heard stories, or maybe you've experienced this, right? Kind of a rogue employee. Happens too often, right? Someone's bitter maybe about getting fired or not getting a promotion, and they do something absolutely crazy, right? Maybe expose all the company secrets or release some files, I don't know. Okay, that's a human. That's a single human. And you can see that human. You have eyes on that human, right? Bill in IT is watching that human. Agents are different. Agents move a hundred, a thousand times faster than that one person. But you can't always see agents. And guess what?
That one disgruntled employee, that's one. This isn't a video game. You can't respawn 10 times. Agents can. Agents can spawn sub-agents like that, and those sub-agents can spawn like that, right? So think of it in the way, like, a virus might spread across your computer or across the human body, right? And replicate and duplicate. It's the same thing with AI agents. A rogue employee can't do that. And that's why this risk is very different. It is very real. So I think that we spend too much time thinking about the positives and the optimistic side of AI agents, which is great, right? Yes. Oh, now all of a sudden I have, you know, 320 AI agents, you know, doing all my work for me around the clock. Oh, that's cool, right? But what about the risk, the security and the sprawl? And more teams are experimenting now than ever before because no one's got it all figured out. And that means more sprawl and more exposure, especially if you don't have guardrails up. So here are essentially the three surfaces where agent risk actually lives. Number one is the input. That's kind of that untrusted content that can contain hidden instructions that agents, you know, treat like real commands. All right? That, I don't think, is going to be as big of a deal. Right? Your inputs, don't get me wrong, things can go wrong, right? People are blindly copying and pasting, so you can copy and paste prompt injections, and prompt injections are the big thing. And we're going to talk about that here in a second. But still, inputs, they can be poisoned, right? There's things that you might not know, and in the same way a rogue human can create a lot of rogue AI agents, that can create a lot of risk and a lot of sprawl that's going to be uncontrollable. So, you know, inputs, I think you really only have to worry about it, you know, if someone really doesn't know what they're doing or if someone is trying to be malicious, which again, those things are going to come up.
But think of that one bad employee, what they can now do if they know agentic AI, right? Yeah, talk about malware, spyware, right? Ransomware, that, you know, these companies have paid millions of dollars, tens of millions of dollars for. It's going to be way worse with agents. So the second layer is tools. And I think this is where it starts to get a little dicey in terms of, well, the capabilities are wild, right? So every permission and connector that you add expands that blast radius, essentially, when something goes wrong. In the same way that I walked you guys through the, oh, it's a dumb stationary brain, but hey, once that brain gets tools, okay, now it can start doing a lot of things. And it's the same thing on the agent-gone-wrong side, right? If it was a dumb stationary agent with no tools and no arms, it's like, all right, well, have fun, buddy, right? You're in a glass case of emotion, just shaken up. When they have tools, that's where things go wrong, right? When they have access to your terminal, right, to your computer terminal, that's where things can go wrong. When they can run code on a machine, that's where things can go very wrong. And then last but not least, and this is the big one, this is actions. And this is the biggest thing. If I had to boil down the biggest change in risk and why it matters now more than ever, it's outputs to actions, right? What we had to worry about a couple of years ago from AI in terms of risk was the output. Now we have to worry about the actions, but it's actions at scale and actions that we might not even necessarily be able to see, right? Silent, unintended workflows that you may not be tracing. And really what this comes down to, well, it's this combination of increased capabilities from agents and moving in the shadows. And that's an enterprise nightmare. That is literally the formula for an enterprise nightmare. So some stats here for you, right?
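To make the "hidden instructions in untrusted input" surface concrete, here's a minimal Python sketch of one common mitigation: treat untrusted content as quoted data rather than as commands, and flag injection-style phrases before an agent ever sees it. The pattern list, function name, and wrapper tags are all illustrative assumptions, not taken from any product mentioned in the episode.

```python
# Minimal sketch (all names hypothetical): wrap untrusted content as inert
# data and flag suspicious injection-style phrases before an agent sees it.
import re

INJECTION_PATTERNS = [
    r"ignore (all )?previous instructions",
    r"disregard (the )?system prompt",
    r"you are now",
]

def screen_untrusted(text: str) -> dict:
    """Return the text wrapped as quoted data, plus any injection-style flags."""
    flags = [p for p in INJECTION_PATTERNS if re.search(p, text, re.IGNORECASE)]
    return {
        # Delimiters signal to the model: this is quoted material, not a command.
        "wrapped": f"<untrusted_content>\n{text}\n</untrusted_content>",
        "flags": flags,
        "allow": not flags,
    }
```

A pattern list like this is only a first tripwire, not a real defense; the deeper point from the episode still holds, which is that anything an agent reads can try to steer it.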
So right now, 57% of employees at least admit to using personal AI accounts for work. It's way more than that, let's be honest. That's just the number that admit it. A third admit to inputting sensitive data into unapproved tools, right? So there's your shadow AI use case. And here's the thing: you can't govern what you can't see, and most organizations can't see their agent footprint at scale. This is what I call the three types of dark AI, all right? For the most part, you're not going to find, you know, the three types of dark AI online; it's something I've kind of categorized them into myself, but I think it's really helpful to get a better glimpse and a better understanding of the categories of risk. So, number one, we all know this: shadow AI, right? That's just essentially unapproved or unknown AI use, right? If you've been using, you know, ChatGPT on your personal computer because Copilot is the AI that's approved, but you want to use ChatGPT and you copy and paste things over, as an example, right? That's shadow AI, but that's been around. Everyone knows that, right? What you've maybe heard of, maybe not, is the next kind of tier, and that's Agent Sprawl. But Agent Sprawl is known, right? So that's essentially when you have approved agents, but you're not sure how to wrangle them or observe them all. You're like, oh, well, yeah, Bill gave us that, you know, agent to help with finances, but we're not really sure what it's doing, right? We think it's doing good, right? I check the outputs, but I don't really know how it's getting there. That's the beginning of Agent Sprawl. But the thing is, Agent Sprawl goes quickly, in the same way a snowball at the top of the mountain might come rumbling down a thousand times the size. That is where we get into, then, Dark Agent Sprawl. All right, this sounds like a screen name for AIM back in the 90s. You guys remember that.
AI moves too fast to follow, but you're expected to keep up. Otherwise your career or company might lag behind while AI-native competitors leap ahead. But you don't have 10 hours a day to understand it all. That's what I do for you. But after 700 plus episodes of Everyday AI, the most common question I get is, where do I start? That's why we created the Start Here series, an ongoing podcast series of more than a dozen episodes you can listen to in order. It covers the AI basics for beginners and sharpens the skills of AI champions pushing their companies forward. In the ongoing series, we explain complex trends in simple language that you can turn into action. There are three ways to jump in. Number one, go scroll back to the first one in episode 691. Number two, tap the link in your show notes at any time for the Start Here series. Or you can just go to starthereseries.com, which also gives you free access to our Inner Circle community where you can connect with other business leaders doing the same. The Start Here series will slow down the pace of AI so you can get ahead. Right? AIM, you know, instant messenger, right? Dark Agent Sprawl 2026, that's going to be my username on the Everyday AI Inner Circle community. But essentially, well, Dark Agent Sprawl can be a couple of things. Because Agent Sprawl, it's a problem, but like I said, for the most part that's a known risk. And you're like, yeah, we have these agents going all over the place. We have no guardrails, we have no traceability, no observability, right? But you know of it. Dark Agent Sprawl is agents you don't know about. Those are unapproved agents that are working in or on your company. And you can't observe them because you don't know about them. So this can be shadow AI, you know, gone agentic, right? So people plugging in agents that aren't approved and the company doesn't know. But there's also another kind, right?
And I think we're going to see a lot of this, maybe not in 2026, but in 2027. That's going to be the equivalent of malware or spyware, but agents, right? People seeding agents out specifically, in the same way that they would spyware, malware, ransomware, to make money, right? To extract, you know, value from businesses. And that's what's going to happen. That's the next version of this. That's not necessarily Dark Agent Sprawl, but there's two sides, right? Dark Agent Sprawl can start out innocent enough, right? Oh my gosh, you know, our company's not approving, you know, any of my AI agents. Well, okay, whatever. I'm going to go ahead and, you know, unleash Claude Code and get, you know, 50 instances of Claude Code going, and they're all going to spin up their own agents. Well, one person might know, but the rest of the company's in the dark. And those agents could replicate, duplicate, and you can't observe them. But the other thing is, well, bad actors. Dark Agent Sprawl, that's the thing too. So why is this all happening now? All these risk and security concerns and the sprawl, why now? Because, like I've said, we've been six months away for five years, right? We're six months away from great agents. But then it almost seems like we didn't get a six month warning. They just popped up, right? Like December 2025, right? Like everyone was winding down for, you know, the holidays, you know, especially here in the US, and then you go back to work in January and you're like, what the frick happened, right? You're like, whoa, whoa, wait, the agents are actually here now? Where's the six month warning? Why didn't we get word of this in June or July? But there's three reasons why, and it is literally the perfect storm. Not just the ingredients happening, but at the exact right, or wrong, time. All right, so number one, the number one element that was mixed in the bowl, and it's exploding, is the reasoning threshold, right?
So improved reasoning from models like, you know, GPT-5.2 from OpenAI, Gemini 3.1, their new model, very impressive, and, you know, Opus and Sonnet 4.6 from Anthropic. These are all built to be agent-native. That's how they build them now, right? They're not building models that are great at, you know, reading, writing and comprehension first. No, what they're focusing on is the harness and the tool use. That's what's first, right? Even, you know, Google's new model yesterday, everything that they highlighted was tool use, right? And how, you know, improved tool use and the scaffolding there has, you know, really helped them improve their outputs. But that's the thing. These models, their reasoning ability is legit through the roof. They can plan steps ahead, self-correct errors and move beyond reactive behavior into proactive. And that's kind of taken agent reliability from around 50% to 90%. And that's the perfect storm right there, right? 50%, coin flip, you're not doing that, right? Would you hire an employee that has a 50% chance to fail right away? Probably not. But 90%? Okay, that's an A employee. Step two, computer use improvements. This seems small. This is big. Okay, the models have to be insanely smart, right? They have to be able to think, reason; they have to have genius-level IQ, which is what today's models do, right? If you look at offline IQ tests, they're scoring in the genius level, smarter than 99.9% of people, right? But the computer use is important. Doesn't matter if it can't go and use a computer, because at least for now, that's how businesses generate value. I think eventually value will be just generated agentically, right? The web has gone agentic, right? Google and Microsoft announced support for essentially an MCP version of the web where websites can talk to each other, agents can talk to each other.
So we'll see how, you know, business value is extracted in the future, because a lot of times it's been through the website stack, the software stack, but, you know, who knows what it'll be in the future? But right now, agents can use computers better than humans, right? So computer use means these models can use a mouse, they can click, they can use APIs, they can talk to each other. But the big thing is the computer use capability gap has essentially been solved, right? So, Claude Sonnet 4.6, which came out, well, earlier this week. All right, so if you're listening to this in six months, right, it came out in mid-February. So imagine in six months, these computer use models are going to be insanely good. But it's the first one that scored better than humans, right? So a 72.5% success rate on the OSWorld benchmark, and that's surpassing human performance for the first time and nearly quintupling the scores of 2024. That's the thing. And I think that's one of the reasons why, you know, even though maybe the foundation was there for AI agents, you know, a year, year and a half ago, they couldn't use the computer. It was so slow, right? It was, you know, very, very slow computer vision, right? Every time you wanted to click something, it would take three minutes, okay? Now, models can go as fast as humans, or at least, you know, not just clicking and navigating interfaces, but just a range of computer use tasks. And then the third thing that has created this perfect agentic storm, well, it's the context window and memory, right? So, being able to work on a task for a long period of time, right? I think my record so far, I think I got to about 10 hours overnight the other night on Codex, which was really fun to do, right? But it kept its memory persistent the whole time, right?
It didn't forget what I told it. When I woke up, yeah, I spent time reading through the chain of thought, and I'm like, oh, sweet. Not only did it remember everything, but I'm going through, I'm, you know, tracing it, I'm, you know, making sure it kind of stayed within the confines of the instructions I gave it. And it did. But, you know, that's another reason why it's happening. And y'all, let me tell you, if you haven't in the last, like, three, four weeks, if you haven't gone out and used ChatGPT or, you know, OpenAI's Codex, if you haven't used, you know, Claude Code or Claude Cowork from Anthropic, if you haven't used Antigravity from Google, I'm not saying this to, like, be that guy, but you're gonna get left behind, right? And if you're listening to this show, I don't want you left behind. I'm actually thinking, all right, I don't know this yet, but we might kind of get a little bit of a vibe working focus going, you know, here in the first or second quarter. So make sure, if you're not already in our Inner Circle community, get in there. Yeah, I'm going to start putting some stuff in there, so just keep your eyes open for that. All right, but let's keep going a little bit here. We're going to be wrapping up here in a minute. Let's talk a little bit about how the biggest AI companies are actually responding right now, because the risk is well known, because it has literally been an explosion, and it is going to get worse. So I want to talk a little bit about kind of the big picture focus of the big four labs. So right now, OpenAI is taking more of a human approval approach, right? So Codex is kind of their command center to review agent decisions. Anthropic is really going on the defense against prompt injection for browser agents, you know, with making sure that virtual machines are isolated and have minimal privileges, and going with domain allowlists versus blocklists, right?
So everyone has a little bit of a different approach here. You know, Google, as an example, their Project Mariner browser agents run in virtual machines as a safety measure, to isolate them from anything else they could kind of get their hands on. And then Microsoft, right. There's millions of ways that Microsoft is doing it and millions of ways that all these companies are doing it. I just kind of picked out, you know, different aspects to illustrate their different approaches. You know, Microsoft, as an example, with their Copilot Studio, I mean, the governance is everywhere, right? They have the Sentinel monitoring, the Purview logging, you know, Entra ID for agents, right? So Microsoft, it is very enterprise. Google is as well. I was just giving an example of, you know, how their Project Mariner kind of runs in an isolated virtual machine. Because essentially, I think from the people I've talked to at at least three of those four companies, they know risk is part of it, right? Like any lab, right? They call them frontier AI labs for a reason. You run experiments, and part of it is, well, knowing that things will go wrong, right? So these companies are intentionally trying to find all the risk and all the security nightmares and all the sprawl they can, right? So then, when they send a new model out into the wild, when they're done with the red teaming, you know, they hope that they have a good idea. So the labs are working on this. But here's the reality. I think there's pressure to allocate more resources to the development of models versus research and security, right? I don't have that from, you know, privileged information. That's just from, you know, talking to a lot of people and reading a lot, and just, I think, the reality of the world that we live in. And that's why AI sprawl is going to be easy to miss for most business leaders, because everything's unclear, right?
Like, think of the AI model you use most. Do you know the tools it has? Right? Even if you think of, you know, ChatGPT, Gemini, Claude, do you know what tools the base models have? Probably not. You'd be surprised, right? Like, I'm one of those weird guys that reads changelogs and model cards. Most people have no clue, right? And this is much worse than classic shadow IT, because agents can act across systems, not just, you know, within the confines of folders and files, which are very structured. It's kind of like how people talk about hallucinations, right? Oh, you know, they're not a bug, they're a feature, right? What makes agents great is also the built-in nightmare. So if you want to have the good, you have to recognize the bad that they can do, right? And that's because that's how they're made. They are made to go build their own path. They are made to blaze their own trail, right? So by their very definition, agents aren't necessarily always good at staying within guardrails, right? Because if they think it gets them to the destination that has been given to them, at times, right, depends on the model, the setup, all that, right, it's very common for an agent to hop over guardrails in order to accomplish a task, not knowing that that's a bad thing, right? And every new skill that you give to an agent or a model, every connector, every workflow, all that does is expand the attack surface for everyone. And that's another realization that I think most people are going to struggle to grasp, right? Because you think that you're expanding the amount of work that you can get done, and sure, you are. But I think you almost have to think of it from, like, I don't know, an old school, like, military perspective, right? It's your land, right? And hey, we're going to go discover new land. We're going to go out and, you know, win new deals, right? So we're going to expand our land.
I don't know if any of you have ever played, like, you know, Risk. I used to. I wish I still had time, right? It's one of those games, you know, it's like, oh, okay, this is good to make sure my brain still works. But it's kind of like that, right? Your agents, without you knowing, though, are acquiring new territories, and that's great. And you think, oh my gosh, look at these new skills and capabilities, right? It's like, oh, it's like having a thousand new employees. Okay, great. But your surface area for risk, attacks and sprawl is also multiplying at the same time as your capability. But you're not focused on that. You're focused on, oh my gosh, now all of a sudden I have a designer. Now all of a sudden I have a data analyst. Okay, well, what's that data analyst doing with your data? Are you going to check? Right? Is it using an open source model? Right. This is one thing I was kind of blown away by the other day. I gave, you know, it wasn't anything bad, right? I gave a model a very hard task. This is one of those that, overnight, I came back and I realized, oh, it downloaded another large language model locally to do the task, right? It did it locally, right? So it kept all the data kind of on-prem, on my local machine. But what happens if, in the future, other AI models are just going to go use other AI models out on the web, right? They can. What if they do that without telling you? Or what if they go use a site that's insecure? And this is why I think, you know, OpenClaw, as amazing as it is, this is why it's also scary, right? Because, you know, these autonomous open source agents. The open source, if I'm being honest, and I'm sure, you know, I'm going to catch some flak for this from some OG open source people, right? OG open source, I'm not talking about you. Today's open source, it's taken a weird turn over the past three to six months, right?
It's almost become this, you know, crypto-infused wild, wild west, right? It's not like traditional open source anymore. So when we talk about open source AI agents, right, so much of what's out there is risk, right? It is, and it's unknown, and it can be scary if you don't know exactly what you're getting into. And I think this is why small agent pilots are becoming huge risks, right? Because when you build your own agents without centralized visibility or governance, that's where you get in trouble. So many people are using their own agents, and it's for them, right? But it can be going and doing work for the whole department, the whole company, or accessing data from the whole company. But if only one person has eyes on it and there's no central organization, that's where you run into something. So here's the playbook, right? If you're feeling a little scared, you know what, maybe that was my intent. I got you all pumped up and, you know, excited with yesterday's show, talking about, you know, the new capabilities of agents. But you've got to get it in check, right? You have to balance that optimism with a little bit of realism. But here is the Monday morning playbook for how to deal with the risk, the attacks and the sprawl. So, like what we talked about yesterday, touched on it briefly, but you need to start with bounded autonomy, right? When you think about autonomous agents, it's not, you know, zero to a hundred; it's bounded autonomy. What that means is you start with suggest, then propose, then approve, then limited execution, right? So many people go to full execution first. No, it's baby steps, right? A human being doesn't go from womb to sprint, right? You go from womb to, you know, being on your stomach, to sitting, to crawling, to walking, right? Your agents have to go the same way. Even if you know out of the box that agent can sprint, you're not prepared for that agent to sprint.
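The suggest → propose → approve → limited execution ladder described above can be sketched in a few lines of Python. The level names and the gate function are illustrative assumptions, not taken from any framework mentioned in the episode.

```python
# A sketch of the bounded-autonomy ladder; names are illustrative.
from enum import IntEnum

class Autonomy(IntEnum):
    SUGGEST = 1   # agent may only describe what it would do
    PROPOSE = 2   # agent drafts the action for a human to review
    APPROVE = 3   # a human signs off before anything runs
    EXECUTE = 4   # agent may run the action itself, within narrow bounds

def allowed(granted: Autonomy, required: Autonomy) -> bool:
    """An action runs only if the agent's granted level meets the bar."""
    return granted >= required

# A brand-new agent starts at the bottom of the ladder, even if the
# model itself is capable of full execution out of the box.
level = Autonomy.SUGGEST
```

The point of encoding it as an ordered ladder is that promotion is an explicit, human decision, one rung at a time, rather than a default.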
Part of it is for the humans, right? We think it's more about understanding the agent's abilities. It's not. This is to make sure you don't have a bunch of lazy humans in the loop. Instead you have proactive, expert-driven loops. So start with least privilege by default: read-only first, write access only for narrow, defined tasks, especially in enterprise environments. You don't want to send out a bunch of general agents with broad access. Start with read-only first, observe, improve your loop, then you go to limited execution, and then you can finally get it to writing for narrow tasks. And then last in your Monday morning playbook: you need to require human approvals for irreversible actions like sends, deletes, purchases and permission changes. These are all things agents can do, right? Agents can literally do anything. And this is going to get even harder to manage when we start to see true agentic commerce. I'm not just saying, oh, my agent is going to go buy something from the Amazon agent. That's not what I'm saying. There's going to be agent bartering, there are going to be agent liaisons. So you have to work on human approvals now. It's almost like you have to understand that agents are capable of a 10 and you have to give them a 2, because we as humans need to first learn to adapt our behavior before just giving in to agents. And then you need to build governance before you scale. And I think that's just what I was getting at. We get all excited for what we know it can do, and then it's like, wait, we've got to do this boring stuff first? Absolutely. Because if you don't do the boring stuff first, if you don't go through the sitting on your tummy, sitting up, crawling, walking, then you can't expect to understand the path of the sprinting agent. And every agent run needs a decision trace that you can inspect after the fact.
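Those last two playbook items, human approval for irreversible actions and a decision trace for every run, could be combined into one gate that sits in front of every agent action. A hedged sketch, assuming a simple in-memory trace; the action names and the `run_action` helper are illustrative, not a real API.

```python
import time

# Actions the playbook treats as irreversible: require a named human approver.
IRREVERSIBLE = {"send", "delete", "purchase", "change_permissions"}

def run_action(trace, action, payload, approved_by=None):
    """Gate an agent action: block irreversible actions that lack a human
    approver, and append every attempt to an auditable decision trace."""
    entry = {"ts": time.time(), "action": action,
             "payload": payload, "approved_by": approved_by}
    if action in IRREVERSIBLE and approved_by is None:
        entry["status"] = "blocked"
        trace.append(entry)
        raise PermissionError(f"'{action}' is irreversible; human approval required")
    entry["status"] = "executed"
    trace.append(entry)
    return entry
```

A read-only call goes straight through, while a delete raises until someone passes `approved_by`, and either way the attempt lands in the trace you can audit later.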
So you have to be able to log all your tool calls, capture your decision traces and monitor for abnormal action patterns. This is one of the reasons why I said I think in 2026 it's going to be extremely common to have agent ops teams, right? In the same vein that DevOps teams are very common, we're going to have agent ops teams for this exact reason. So here's what to watch for for the rest of 2026 when it comes to agent risk, security and sprawl. Browser agents are going to become a mainstream risk surface. I think we know that. And I think we're going to see a major OpenClaw-type crash. I'm not saying OpenClaw themselves, but I think we're going to see a major open source agent crash. I talked about that in the 2026 AI prediction and roadmap series. I actually think it's going to come from this new trend over the last week or two where everyone's just pointing their agents at a website. When they want their agent to do something, they're not even saying, all right, let me, the human, understand this. Let me write it out. Let me apply this to my business logic. Let me apply my guardrails. Nope, they're just pointing their agent: hey agent, here's this video, go watch it and do it. Hey agent, here's this website with a whole bunch of cool stuff, just go do it. There's this whole concept now going around of, you know, point your agent at a URL. Okay? Have fun with that. Don't do that, right? Especially if that agent has its own dedicated machine. And look, if you're tinkering around on your personal computer and you're the business owner, if you want to take that risk, that's fine. But don't do that in an enterprise. And unfortunately, people are doing that. That's a bad thing, because guess what? A lot of those sites, quote unquote.
Oh, you know, someone puts up a huge tutorial on their personal website. You know how easy it is to hack or phish any of these websites? Let me give you an example. Something goes viral on Twitter. It's literally some 20 year old kid who put a great guide up on their website. Cool. And everyone's just like, hey agent, go read that. And then you have millions of agents going to read this nice kid's cool guide. Guess what's going to happen? Someone's going to see that, they're going to hack that site, and they're going to inject malicious code in there that no one's going to see. That's what's going to happen. All right, next: the agent skill marketplace is going to expand the supply chain risk, so you inherit what you plug in. And we've already seen that, with some of the top skills and plugins for agents found to have malicious code in them. We're also going to see identity and permissions becoming a board level compliance requirement, not a nice to have. And last but not least, the winners are going to treat agents like production software, not side experiments. All right, that's it. That's a wrap on volume nine of the Start Here series. I hope this was helpful, and I hope you understand a little bit more about the risk, security and sprawl that you need to be aware of, and how having AI that acts, AI that has a smart brain, is proactive, has tools and has arms, changes everything. This perfect storm has happened in the last 30 days. I've said this before: we've seen more, on both the opportunity and the risk side, in the last 30 days than we've seen in the last probably two years. All right.
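That "you inherit what you plug in" point about the skill marketplace can be made concrete: before loading any third-party agent skill, check its bytes against a hash you pinned when a human reviewed it, so a hacked tutorial site or a tampered plugin fails closed. A minimal sketch; the allowlist format and names here are my own assumption, not any marketplace's real mechanism.

```python
import hashlib

# Hashes pinned at review time; anything not listed, or changed since, is rejected.
PINNED_SKILLS = {
    "summarize_skill": "sha256-hex-of-the-reviewed-version",
}

def load_skill(name, skill_bytes, pinned=PINNED_SKILLS):
    """Refuse to load a skill whose content no longer matches the hash
    that was pinned when a human reviewed it."""
    digest = hashlib.sha256(skill_bytes).hexdigest()
    if pinned.get(name) != digest:
        raise ValueError(f"skill '{name}' is unreviewed or has changed; refusing to load")
    return skill_bytes  # in a real system, this is where you'd actually import it
```

If the upstream guide or plugin gets quietly edited after you reviewed it, the hash no longer matches and the load fails instead of running injected code.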
So I wanted to take a moment in the Start Here series to address this, and I hope it helps you not only understand the risk, but also take advantage of the opportunities, because you can't do that without knowing the risk. So now you do. All right. So I hope this was helpful. If so, there are already eight other episodes in our Start Here series that you should go check out. Like I said, please go to starthereseries.com. That's going to give you free access to our inner circle community. Once you sign up, you're hopefully going to be straight up loving it in our Start Here series space and connecting with now more than a thousand people in our inner circle community. So thank you for tuning in. Hope to see you back for more Everyday AI. Thanks, y'all.
