Transcript
A (0:00)
Today on the AI Daily Brief, how the team that designed Agent Skills uses agent skills. And before that in the headlines, you can now control Claude Cowork from your phone. The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI. All right friends, quick announcements before we dive in. First of all, thank you to today's sponsors: KPMG, Blitzy, AIUC, and Mercury. To get an ad-free version of the show, go to patreon.com/aidailybrief, or you can subscribe on Apple Podcasts. Ad-free is just $3 a month. If you are interested in sponsoring the show, send us a note at sponsors@aidailybrief.ai. At this point we are firmly selling into the summer, so if you are planning campaigns for the future, it is a good time to reach out. And of course, if you need to know anything else about the ecosystem, you can also find that at aidailybrief.ai. I would once again point you to the newsletter, which is back and is basically the best way to get access to the links that I talk about on the show. Again, you can get all of that at aidailybrief.ai. And with that out of the way, let's dive in. One of the interesting ways that you can tell what's really important to AI builders and people on the front lines is when there's a story that on the surface looks fairly small, but which is getting a disproportionate share of the conversation in AI circles. Our first story today is exactly that. On the surface, it's just a simple new feature for Claude Cowork. In this case it's called Dispatch, and it allows you to bring your Claude Cowork with you on the go. That said, based on the reaction (3 million views on the announcement tweet, 9,000 bookmarks), this one is a big deal to people. In the wake of OpenClaw, companies in the agent space have either been releasing their own versions of OpenClaw.
That was obviously the topic of our show yesterday, or they've been slowly adding the important features of OpenClaw to their existing product suites, which has of course been Anthropic's approach. A couple weeks ago we got remote control for Claude Code, which allowed users to initiate Claude sessions on their computer and then carry them onto their mobile devices, where they could keep controlling whatever it was the sessions were doing. Basically, coding from the gym. Dispatch is basically that, but for Cowork. The Cowork sessions are still hosted in a sandbox on your computer, meaning Claude still has the same access and protections. However, you can now kick off a Cowork session and then continue monitoring progress and providing approvals while out and about. Anthropic described the feature as like having a walkie talkie for communicating with Claude. Cowork developer Felik Risberg wrote: It feels pretty magical to give Claude a mission on my computer and get occasional updates, like creating reports from internal dashboards or finding me a better seat on my next flight. Everything Claude can do on your computer (files, browser, tools) is reachable from wherever you go. First impressions are good. Daniel San writes: Testing Cowork from my phone; the walkie talkie analogy is spot on. Your phone becomes a remote control that talks to Claude running on your desktop. One more for the weekend testing list. Stay tuned, post incoming on how it works. Ethan Mollick writes: After using it a bit, Claude Cowork Dispatch covers 90% of what I was trying to use OpenClaw for, but feels far less likely to upload my entire drive to a malware site. He continues: What I like: easy, much more stable and safe. Existing connectors mean better integration with Gmail, browsers, etc. Very good tool use. What is missing for me? The ability to invite Claude to any channel, the heartbeat and proactivity, and multiple sessions. Right now, Dispatch is one chat.
Now, for hardcore OpenClaw users, all of those things would be deal breakers. But this isn't necessarily about converting hardcore OpenClaw users; it's about bringing those types of feature sets to the full spectrum of tools for all the different types of agent users. Indeed, I think Powell Hurren gets it right when he writes: The bigger story: Code, Cowork, Web, and now Dispatch are all converging towards the same thing, a persistent AI layer that follows you across devices and contexts. I think that is exactly right. This convergence that we keep talking about is actually just form factor adjustments as everyone figures out the right way for people to interact with agents across a variety of different use cases and behavior patterns. Speaking of OpenClaw, one of the things that we've been tracking is the rise of OpenClaw in China. You might remember seeing a bunch of viral videos of people standing in line to get access to their first OpenClaws, supported by some of the big Chinese tech companies. But apparently the Chinese government is now growing concerned. In recent weeks, regulators warned staff at government agencies and state-owned enterprises of the dangers of OpenClaw and advised them not to install the agent. This seems to be somewhere between a stern warning and an outright ban across different regions and entities. Last week authorities released a list of six do's and don'ts for organizations deploying OpenClaw. Among their suggestions were using the official version and minimizing internet access and permissions. Adoption is so pervasive that the Hong Kong Monetary Authority, which is basically Hong Kong's central bank, issued an official statement that they had no plans to deploy OpenClaw on their internal IT systems. Chinese media is now running OpenClaw horror stories about privacy leaks and financial screw-ups, with one user apparently giving their OpenClaw access to a credit card, which was promptly run up to the limit.
Wendy Chang, a senior analyst at the Mercator Institute for China Studies, believes that OpenClaw has a natural cultural resonance in China. She said: Most people view technology as a convenience, so when something new comes out, they're more willing to try it. Some have suggested that OpenClaw being free and open source has a major role to play in its popularity. Many analysts have noted that Chinese tech firms have struggled to monetize their models among consumers, as the concept of software subscriptions is far less developed in the East. Stanford professor Graham Webster, who focuses on geopolitics and tech, but who before that was my hallmate and entrepreneurial collaborator at Northwestern back 20 years ago, suggested that the rise of OpenClaw could be a flashpoint for China's AI industry. Until now, any and all experiments have been encouraged under a formal national initiative called AI Plus. However, the clear privacy and security concerns could trigger a rethink, according to Webster, who said it could be a moment that starts to cause the Chinese government to think about the downsides of widely available open models. It feels to me like there's an interesting story brewing here, although I'm still not exactly sure what it is and what it says about where we are. But it's something that I'm going to continue to pay attention to. One flag related to that: while in general, optimism about AI is way higher in China than it is in the U.S., there was a huge spike in the term AI anxiety on WeChat in February, peaking in mid-March as OpenClaw mania hit a crescendo. Tony Peng of Recode China wrote: What is different this time is the mood. In those earlier waves, the mainstream mood was excitement, awe and curiosity. This time, more and more people are expressing anxiety, fear and concern. Tony argues that the most obvious reason is job insecurity. He writes: For most ordinary people in China, AI still means chatbots; Claude Code or Codex is not available.
There's no household AI agent with real penetration. Then all of a sudden, media reports are claiming OpenClaw can handle a wide range of tasks autonomously, and the gap between what people knew and what they were being told deepened the sense of being left behind. In other words, even in a place with high AI optimism, the job displacement fear persists. Now, separately, Chinese authorities are taking a second look at Meta's acquisition of Manus. From the outset, it seemed that Manus had designed their corporate structure to circumvent controls on Chinese tech exports. The company relocated their headquarters from Beijing to Singapore in July of last year, shortly after they began taking capital from US venture firms. Sources said that officials at China's National Development and Reform Commission called executives from Meta and Manus to a meeting last week to express concerns over the deal. Government actions remain unclear, but they appear to include an effort to bar Manus executives from departing China for Singapore. The New York Times discussed a range of different options that Chinese officials might pursue, including clawing back data exports or declaring the relocation unlawful. This could be a reaction to growing concerns about losing AI talent to the West. However, some analysts have suggested it's just a maneuver to create leverage ahead of trade talks later this month. Meta is trying to present themselves as unconcerned, with a spokesperson stating: The transaction complied fully with applicable law. The outstanding team at Manus is now deeply integrated into Meta. We anticipate an appropriate resolution to the inquiry. And one last one on China: Nvidia says it's restarting production as Chinese export plans get back on track. Speaking at a press conference on Tuesday, Jensen Huang said: We've been licensed for many customers in China. We've received purchase orders from many customers, and we're in the process of restarting our manufacturing.
Our supply chain is getting fired up now. The process for getting export approval for H200s has been an on-again, off-again affair since the idea was floated by President Trump back in December. The most recent chatter, from the beginning of March, was that Nvidia would shut down production and reallocate the fab time to producing next-generation Vera Rubin hardware. No single catalyst was attributed to the decision, but export plans have seen multiple setbacks from both Beijing and Washington in recent months. Huang suggested on Tuesday that the squabbling within the Trump administration had been settled, commenting: President Trump's intention is that the United States should have a leadership position in access to Nvidia's best technology. However, he would like us to compete worldwide and not concede those markets unnecessarily. Reuters, meanwhile, reported that it's all systems go from the Chinese side as well. Sources familiar with the situation confirmed that Chinese authorities had granted approval for multiple companies to purchase H200s. Earlier reports suggested demand was staggering, with multiple Chinese firms placing orders for hundreds of thousands of chips. That demand could go towards explaining Huang's new forecast that Nvidia could see a trillion dollars in sales by 2027. Lastly today, speaking of that big prediction from Jensen about revenue, Amazon CEO Andy Jassy also sees AI doubling revenue for AWS. According to Reuters sources, Jassy shared the lofty projection with staff at a recent all-hands. He said that over the long term, AI could boost annual sales for AWS to $600 billion, double his prior estimate. Jassy said: I've been thinking for the last number of years that AWS, call it 10 years from now, could be a $300 billion annual revenue run rate business. I think with what's happening in AI, AWS has a chance to be at least double that. AWS most recently booked $128 billion in sales for 2025, 19% growth from the prior year.
And while the numbers that he's throwing around seem big, the prediction might not be all that extravagant. This would represent 17% annual growth for the coming decade. Analyst Patrick Moorhead writes: In my view, this is the clearest signal yet that hyperscale cloud is entering a second growth phase that dwarfs the first. Net: AI is repricing the entire cloud total addressable market upward. Brock, meanwhile, points out that if AI genuinely doubles AWS revenue to $600 billion by 2036, then Amazon will emerge as one of the biggest beneficiaries of the entire AI buildout without even having to build the models themselves. Interesting stuff going on, but that is going to do it for today's headlines. Next up, the main episode. Agentic AI is powering a $3 trillion productivity revolution, and leaders are hitting a real decision point. Do you build your own AI agents, buy off the shelf, or borrow by partnering to scale faster? KPMG's latest thought leadership paper, Agentic AI: Navigating the Build, Buy or Borrow Decision, does a great job cutting through the noise with a practical framework to help you choose based on value, risk and readiness, and how to scale agents with the right trust, governance and orchestration foundation. Don't lock in the wrong model. You can download the paper right now at www.kpmg.us/navigate. Again, that's www.kpmg.us/navigate. If you're looking to adopt an agentic SDLC, Blitzy is the key to unlocking unmatched engineering velocity. Blitzy's differentiation starts with infinite code context. Thousands of specialized agents ingest millions of lines of your code in a single pass, mapping every dependency with a complete contextual understanding of your code base. Enterprises leverage Blitzy at the beginning of every sprint to deliver over 80% of the work autonomously: enterprise-grade, end-to-end tested code that leverages your existing services, components and standards. This isn't AI autocomplete.
This is spec- and test-driven development at the speed of compute. Schedule a technical deep dive with our AI experts at blitzy.com. That's B-L-I-T-Z-Y.com. There's a new standard that I think is going to matter a lot for the enterprise AI agent space. It's called AIUC-1, and it bills itself as the world's first AI agent standard. It's designed to cover all the core enterprise risks, things like data and privacy, security, safety, reliability, accountability and societal impact, all verified by a trusted third party. One of the reasons it's on my radar is that ElevenLabs, who you've heard me talk about before and is just an absolute juggernaut right now, just became the first voice agent to be certified against AIUC-1 and is launching a first-of-its-kind insurable AI agent. What that means in practice is real-time guardrails that block unsafe responses and protect against manipulation, plus a full safety stack. This is the kind of thing that unlocks enterprise adoption. When a company building on ElevenLabs can point to a third-party certification and say our agents are secure, safe and verified, that changes the conversation. Go to AIUC.com to learn about the world's first standard for AI agents. That's AIUC.com. This episode is brought to you by Mercury: radically different banking, now available for personal accounts. I already use Mercury for my business, so when they introduced personal accounts, it made immediate sense for me. I try to bring the same level of intention to my personal finances that I bring to building companies, and most traditional banks just do not feel designed for that. With Mercury Personal you can toggle between business and personal in a click. You can set up sub-accounts for specific goals, automate transfers so projects and savings fund themselves, and put idle cash to work with high-yield savings, all without friction. And it's built for people who care about how their money moves and want tools that actually keep up.
Visit mercury.com/personal to learn more. Mercury is a fintech company, not an FDIC-insured bank. Banking services provided through Choice Financial Group and Column N.A., Members FDIC. Welcome back to the AI Daily Brief. Today we are doing a bit more of a practical, hands-on style episode. It was inspired by this post from Tariq over at the Claude Code team at Anthropic called Lessons from Building Claude Code: How We Use Skills. The context for this is that if you take away one theme from pretty much all of 2026's episodes so far, it's that we are moving into a much more agentic era of AI. Skills are a key component of how to get value out of agents. And so today we're going to first give a little bit of background on what skills are, talk about some of these lessons and best practices from the team at Claude Code, and then share a few more resources where you can take the conversation further. First of all, let's talk about what skills are. The official GitHub repo calls them a simple open format for giving agents new capabilities and expertise. Skills are folders of instructions, scripts and resources that agents can discover and use to perform better at specific tasks. Write once, use everywhere. The background is this: as AI coding agents were getting more and more capable throughout 2025, people started to hit a very similar wall, which was basically that system prompts kept ballooning. Every new capability meant more instructions, more examples, and more edge cases crammed into a single context window. Of course, the more you try to jam into a context window, the more you're going to have performance degradation. Having to juggle all of that knowledge all at once was crowding out space for actual execution on the task at hand. That led to agents getting slower, more expensive and less reliable. Now, the insight that ended up driving skills was that agents don't need access to all of their knowledge all the time.
What they need is to be able to load the right knowledge at the right moment. On October 16, Anthropic officially announced skills in a blog post. The post was called Equipping Agents for the Real World with Agent Skills, and it framed the issue like this: Claude is powerful, but real work requires procedural knowledge and organizational context. They write: As model capabilities improve, we can now build general-purpose agents that interact with full-fledged computing environments. Claude Code, for example, can accomplish complex tasks across domains using local code execution and file systems. But as these agents become more powerful, we need more composable, scalable and portable ways to equip them with domain-specific expertise. This led us to create Agent Skills: organized folders of instructions, scripts and resources that agents can discover and load dynamically to perform better at specific tasks. So what a skill actually is, is a directory anchored by a markdown file. Every skill directory is going to have a SKILL.md file with some required metadata, like a name and a short description. When agents have access to skills, rather than having to hold all of the context all at once, they simply load up the name and the description. The idea of progressive disclosure in skills is to give the agent just the information that it needs in order to make good decisions without overloading its context. So basically, the first layer of detail is just the short description, which means that when the agent is doing a task, it has those descriptions in mind and can go call up that skill if it seems like it would be useful. The second level of detail in this progressive disclosure regime is the actual body of the SKILL.md file. If the agent thinks that that skill is going to be useful, it'll move from just reading the description to reading the contents of that SKILL.md.
Now, while the metadata is tiny, at roughly 100 tokens per skill, even the full SKILL.md body is recommended to stay pretty small. This leads to the third level of detail in progressive disclosure. Basically, as skills grow in complexity, they also might have context that's relevant only in specific scenarios. And in fact, this is a really important part that gets missed. In the article from Anthropic's Tariq that we're going to come back to, he writes: A common misconception we hear about skills is that they are just markdown files. But the most interesting part of skills is that they're not just text files. They're folders that can include scripts, assets, data, et cetera, that the agent can discover, explore and manipulate. Basically, you can bundle additional context in the form of other markdown files or references or scripts that get linked out to from the SKILL.md file. The analogy, they say, is a well-organized manual that starts with a table of contents, then specific chapters, and finally a detailed appendix. Almost immediately, skills began being adopted outside of just the Anthropic ecosystem. OpenAI added skills support, both ChatGPT and the GitHub Copilot family of coding agents adopted the standard, and other ecosystems and harnesses have jumped on board as well. The launch of OpenClaw really took the skills conversation to the next level. As people started en masse building all of these different agents, a lot of them had common skills needs, like, for example, understanding how to use specific tools, how to interact with certain types of file formats like documents and PDFs, or how to take specific actions like transcribing audio. A site called Clawhub quickly launched that now has something like 28,000 skills, and other people have their own collections focused on particular use cases or areas of interest.
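To make all of that concrete, here is a rough sketch of what a skill folder and its SKILL.md might look like. To be clear, the skill name, the step wording, and the bundled file names here are illustrative assumptions on my part, not an official template; the required pieces per the format described above are just the name and description metadata:

```markdown
---
name: weekly-recap
description: Generate the team's weekly recap post from merged PRs, closed tickets, and deploys. Use when asked for a weekly summary or status recap.
---

# Weekly Recap

1. Run scripts/fetch_activity.py to pull merged PRs, closed tickets, and deploys.
2. Fill in templates/recap.md with the results.
3. For deploy-specific caveats, read reference/deploys.md (only when deploys happened).

## Gotchas

- Only count PRs merged to main; ignore release branches.
- Tickets closed as "won't fix" should not appear in the recap.
```

Note how this maps onto progressive disclosure: the frontmatter is what the agent always sees, the body is only read when the skill looks useful, and the linked scripts and reference files are loaded only when a step actually needs them.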
And yet what Anthropic found when they actually sat back and looked was that as many skills as there were available, many if not most of them could fit into one of nine categories: library and API reference, product verification, data and analysis, business automation, scaffolding and templates, code quality and review, CI/CD and deployment, incident runbooks, and infrastructure ops. So that's what led them to this post. Let's talk first about some of the categories in this taxonomy, and then some of the more general best practices that Anthropic shared. I'm not going to go through all nine categories, but let's talk about a couple. One key category they found was data fetching and analysis: skills that, for example, connect to your data. These skills, they write, might include libraries to fetch your data with credentials, specific dashboard IDs, et cetera, as well as instructions on common workflows or ways to get data. Another category, which I can see being important to listeners of this show, is business process and team automation. In other words, skills that automate repetitive workflows into one command. These, they write, are usually fairly simple instructions, but might have more complicated dependencies on other skills or MCPs. An example might be a weekly recap skill, where merged PRs, closed tickets, and deploys come together in a formatted recap post. Another category in their key taxonomy of skills, which relates to some conversations we've been having recently, is about code quality and review. Now, the conversation that we've been having here is one about what happens when coding agent sprawl gets sufficient that it just becomes impossible for humans to review all the code. There are some who argue that we're already far past that point, while others cling to the idea that humans need to have the final look.
My very strong instinct is that even if it would be better if all code that was released as products and services actually had human review, I don't think there's any chance that that paradigm gets out of 2026. I think we're going to have to solve the problem of code review in new ways, which, I'll be clear, is a problem that I am not qualified to solve. But I just think that we're going to be producing such an incredibly high volume of code that at some point we'll give up the ghost on the idea of being able to review it all. That makes code quality and review skills seem all the more potentially important. These Anthropic describes as skills that enforce code quality inside of your org and help review code. Some of the examples are adversarial review, which would spawn a fresh-eyes subagent to critique, implement fixes, and iterate until findings degrade into nitpicks, or a code style skill that enforces code styling, especially styles that Claude does not do well by default. Interestingly, and I think related to that, Tariq argues that one of the highest-ROI categories is verification skills. They describe this as skills that describe how to test or verify that your code is working. Verification skills are extremely useful for ensuring Claude's output is correct. It can be worth having an engineer spend a week just making your verification skills excellent. Consider techniques like having Claude record a video of its output so you can see exactly what is tested, or enforcing programmatic assertions on state at each step. So there are more categories in the taxonomy, but that gives you a feel for what Anthropic is seeing in terms of their most valuable skills. Now, admittedly this is from the Claude Code team, so it's going to index highly technical. Whereas if you had an agent builder who was mostly focused on business processes, you'd probably see more gradations of that business process and team automation category.
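On those verification skills: as a sketch of what "programmatic assertions on state" could look like in practice, here's a hypothetical checker a verification skill might bundle. Everything here (the function name, the required sections, the length heuristic) is an illustrative assumption, not code from Anthropic's skills:

```python
from pathlib import Path

# Illustrative only: a tiny state check a verification skill might ship,
# so output can be asserted structurally sound at each step instead of
# trusting the model's own summary of what it did.
REQUIRED_SECTIONS = ["## Summary", "## Merged PRs", "## Deploys"]

def verify_recap(path: str) -> list[str]:
    """Return a list of failure messages; an empty list means the check passed."""
    out = Path(path)
    if not out.exists():
        return [f"missing output file: {path}"]
    text = out.read_text(encoding="utf-8")
    failures = [f"missing section: {s}" for s in REQUIRED_SECTIONS if s not in text]
    if len(text.strip()) < 50:
        failures.append("recap is suspiciously short")
    return failures
```

The point isn't this specific check; it's that a deterministic script bundled in the skill folder gives the agent (or you) a ground-truth pass/fail signal at each step.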
Maybe even more useful, then, are Tariq and the Claude Code team's tips for actually making skills. One thing that a number of folks missed is that Anthropic actually just updated their Skill Creator tool. Skill Creator, they write, helps you write evals, run benchmarks, and keep your skills working as models evolve. And it was meant to answer a specific challenge. Since launching Agent Skills last October, they wrote, we've noticed that most authors are subject matter experts, not engineers. They know their workflows but don't have the tools to tell whether a skill still works with a new model, triggers when it should, or if it actually improved after an edit. Ultimately, they write, the goal is bringing some of the rigor of software development, like testing, benchmarking, and iterative improvement, to skill authoring without requiring anyone to write code. Solopreneur and educator Ollie Lemon actually called this out as a fairly big deal. He wrote: Anthropic shipped three upgrades to skills that fix most problems almost everyone runs into. Problem one: you had no way to measure how well your skills were actually performing. Now you can run evals that test your skill against multiple prompts and get a score. Problem two: your skills break when models update and you don't notice. With the new Skill Creator, you can run A/B tests comparing your skill against raw Claude. Problem three, he writes: Claude doesn't even use your skill half the time because the description is too vague or too specific. Now the Skill Creator rewrites your descriptions automatically, so they trigger at the right time. Anthropic, he points out, ran this on their own skills and saw better triggering five out of six times. Now, one other note from the Skill Creator that I thought was valuable is the framework for organizing skills into two categories.
They call those two categories, one, capability uplift: skills that help Claude do something the base model either can't do or can't do consistently, that is, certain types of document creation. And then the second category of skills are called encoded preference: skills that document workflows where Claude can already do each piece, but the skill sequences them according to your team's processes. The distinction matters, they say, because these two types of skills may need testing for different reasons. Capability uplift skills may become less necessary as models improve, while encoded preference skills are more durable, but only as valuable as their fidelity to your actual workflow. So back to Tariq's post. Here are some of their top tips for making skills better. The first: don't state the obvious. If you're publishing a skill that is primarily about knowledge, they write, try to focus on information that pushes Claude out of its normal way of thinking. The front-end design skill is a great example. It was built by one of the engineers at Anthropic by iterating with customers on improving Claude's design taste, avoiding classic patterns like the Inter font and purple gradients. The second tip is to build a gotcha section. In fact, Tariq argues that the highest-signal content in any skill is the gotcha section. These sections articulate common failure points that Claude runs into when using your skill, and ideally, he says, you update your skill over time to capture these gotchas. A third tip goes back to that idea that people still think of skills as just a single markdown file rather than an entire folder, and Tariq says you should think of the entire file system as a form of context engineering. They also suggest you should avoid railroading Claude: I give Claude the information it needs, but give it the flexibility to adapt to the situation.
As Tariq puts it in the conclusion, this should be thought of more as a grab bag of useful tips than as some sort of definitive guide. That makes sense, because right now everyone is just racing to figure out how to actually engage with the new capabilities of agents, and so every bit of advice at this point is going to be at least a little bit of a work in progress. Now, one of the interesting things, then, is how all of these work-in-progress lessons apply to different categories of users. The most obvious is probably the advanced agent builders who are building and maintaining complex multi-agent teams. For them, obviously, skills are essentially a modular architecture for agent capabilities, and frankly, this is kind of the audience that Tariq most wrote this post for. A level down from that are the individual power users, and my guess is a lot of you fall into this category. This is not a person who's building complex agent teams and orchestration models. Instead, they are using one or a small number of agents to get their own work done faster or better, or do things that weren't possible before. For that type of user, skills are basically reusable prompts with superpowers. The difference between a skill and a saved prompt is that a skill can include actual code, templates, reference data and examples, not just instructions. The practical value, then, is that you figure out how to get the agent to do something well once, and then you package it so it works reliably every time. The standup post example from Tariq's post is perfect for this tier. This is an automation of a daily task you do, and the type of thing that you want to happen consistently over and over again. This also helps demonstrate why that gotcha section can be really valuable. Every time the agent makes a mistake, you add it to the skill so it doesn't happen again, and the skill becomes a living document that gets smarter over time.
This also helps you avoid being locked into one specific ecosystem. Because skills are supported by Codex, Claude Code, Cursor, etc., you're not locked into any one tool's prompting format. But what about for the mainstream user, the person who isn't even yet fully in Claude Code or Codex? People who are using off-the-shelf tools or experimenting with Perplexity Computer or Notion custom agents? What's interesting here is that the design pattern holds, and you can see even in these simpler prosumer and consumer tools the idea of skills as reusable capabilities infiltrating the mainstream. In fact, earlier this week Notion announced custom skills for Notion AI. In their announcement tweet, they write: Write a prompt and you'll use it once; write a skill and you'll use it forever. And this is the mental model shift. Even if you are not an agent builder with Claude Code, the shift is from thinking about ad hoc prompting to reusable capabilities. A lot of folks out there are not ultimately going to have to care about the full architecture of SKILL.md files and progressive disclosure and all these things. Those folks just know they can teach the AI to do a specific thing their way, give it a name, and invoke it whenever they want. For some, it'll almost be an update to custom GPTs, which for many became essential even though they never fully took off. Now you can see how Notion has simplified skills into their own ecosystem. Basically, you can take any page in Notion, click the menu, and turn that page into a skill. And the point is that this concept of skills as reusable capabilities is converging across the entire AI stack, from consumer uses up to much more advanced uses, all at once. The underlying idea is that AI is less and less a one-off conversation and more and more a library of reliable, repeatable capabilities. Skills, I think, are a useful framework for that, no matter what level you're engaging with it on.
And hopefully this episode has given you a little bit of a better starting point. We might go deeper in a future Operator episode, but for now that is going to do it for today's AI Daily Brief. Appreciate you listening or watching as always and until next time, peace.
