Transcript
A (0:00)
Today on the AI Daily Brief: why OpenAI is adopting the skills mechanism and how it could improve agents. Before that in the headlines, the fallout from the latest White House executive order on AI. The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI. All right, friends, quick announcements before we dive in. First of all, thank you to today's sponsors: KPMG, Rovo, Robots and Pencils, and Blitzy. To get an ad-free version of the show, go to patreon.com/aidailybrief, or you can subscribe on Apple Podcasts. If you are interested in sponsoring the show, send us a note at sponsors@aidailybrief.ai and we can send you all the information you need. Also, at aidailybrief.ai you can find out anything else you might need to know about the podcast. We're going to be doing a few more days of this newsletter test this week before reviewing and seeing what the plan is for January. For now, like I said, you can find all of that on aidailybrief.ai. Welcome back to the AI Daily Brief Headlines edition, all the daily AI news you need in around five minutes. Last week, after a lot of behind-the-scenes discourse, some of which spilled into very public acrimony, President Trump signed a highly contentious order attempting to block states from passing their own AI regulations. Now, this is one of those classic debates that's about a hundred things at once. To take the administration at face value, this is about creating a single federal rulebook as a necessary step to ensuring the US can win the AI race. But then, of course, underneath that there are issues of the power relationship between the federal government and the states; that's one that's been big here in the US for the last 250 years or so. And there's also the substory of the GOP fracturing around Trump's alliance with AI technology companies. A draft of the order had circulated in late November, sparking outrage on both sides of the aisle. 
The executive order that ended up passing on Thursday was substantively identical to the draft, including the controversial measure of establishing a dedicated task force within the DOJ to start a campaign of litigation against states with their own AI laws. The order also instructed the Commerce Department to withhold federal broadband funding from states that had, in the words of the EO, onerous AI laws. There are three big issues the EO brings up when it comes to state-level regulations. First, they say that by definition it creates a patchwork of 50 different regulatory regimes, which makes compliance, especially for startups, particularly challenging. Second, the White House claims, quote, state laws are increasingly responsible for requiring entities to embed ideological bias within models. Third, they say state laws sometimes impermissibly regulate beyond state borders, impinging on interstate commerce. Now, of course, the Democratic side of the aisle immediately had a lot to say about this. Scott Wiener, who has been extensively involved in state AI legislation in California, said it's absurd for Trump to think he can weaponize the DOJ and Commerce to undermine those state rights: if the Trump administration tries to enforce this ridiculous order, we will see them in court. Senator Brian Schatz has already sponsored a bill that would overturn the order. Schatz drew on the criticism that this order blocks state law and replaces it with nothing, commenting: Congress has a responsibility to get this technology right, and quickly, but states must be allowed to act in the public interest in the meantime. Now, as I mentioned before, the order also triggered infighting for Republicans, who are worried that AI will be a losing issue in the midterms, writes the Washington Post. 
Populist forces within the Republican Party mounted an extensive campaign to derail the action after a draft of the order leaked last month, arguing that fears over AI's potential to automate jobs would undermine the party's message to workers. Now, the Post said a handful of tech leaders neutralized those fears for now, convincing the president, a longtime real estate developer, that burdensome regulation could cripple the industry. White House AI czar David Sacks did take to Twitter/X to have some conciliatory words on at least a few of the concerns from the right. He called them the four Cs: child safety, communities, creators, and censorship. On child safety, he said preemption would not apply to generally applicable state laws, so state laws requiring online platforms to protect children from online predators or sexually explicit material would remain in effect. On communities, he said AI preemption would not apply to local infrastructure; in short, preemption would not force communities to host data centers they don't want. On creators, he said copyright law is already federal, so there is no need for preemption here; questions about how copyright law should be applied to AI are already playing out in the courts, and that's where this issue will be decided. And on censorship, he claims, as mentioned, the biggest threat of censorship is coming from certain blue states. Red states can't stop this; only President Trump's leadership at the federal level can. Still, it does not seem all is resolved when it comes to AI politics on the right. The Post describes a, quote, simmering rift between the populist and tech factions of the Republican Party, with one source saying it feels like millions of votes across the country just got traded for thousands of VC and tech-rich votes in regions Republicans will never win. Now, moving over to another recent move. 
Last week the president announced that Nvidia's previous-generation H200 chips would be approved for export, the first time that unmodified Western versions of the chips had been approved in over three years. That news was immediately followed by reports that Beijing was meeting with tech firms and considering how tightly to restrict access. Basically, the strategic consideration for China is how much to allow in these new chips, which could accelerate the output of their labs, versus continuing to focus on their domestic chip industry, which, while potentially slowing down those outputs in the short term, could create long-term resilience and independence. Speaking with Bloomberg on Friday, AI czar David Sacks said China's rejecting our chips; apparently they don't want them, and I think the reason for that is they want semiconductor independence now. He cited Financial Times reporting here rather than inside communications. Still, the comments highlight that the chip strategy may be too late. The logic of granting access to H200s was largely that the US needs to get ahead of China developing their own advanced chips, and if Nvidia can't flood China with their chips, then that sort of puts the strategy in jeopardy. Nvidia, meanwhile, said: while we do not yet have results to report, it's clear that three years of overbroad export controls fueled America's foreign competitors and cost US taxpayers billions of dollars. Added Sacks: what you see is China's not taking them because they want to prop up and subsidize Huawei. Part of our calculation in selling not the best but lagging chips to China is that you can take market share away from Huawei, but I think the Chinese government has figured that out, and that's why they're not allowing them. To that point, Bloomberg is reporting that Beijing is preparing a $70 billion package to incentivize domestic chipmaking. 
Final details, including target companies, are still to be determined, but this could be the largest-ever state-backed investment in semiconductors. For comparison, $39 billion was allocated to CHIPS Act subsidies in the US, and the EU is currently putting together a $46 billion package for their domestic industry. Moving over to models, GPT-5.2 has been out for a few days and the independent benchmarking results are in. The model is now tied for the lead in the overall Artificial Analysis Intelligence Index, nuzzling up together with Gemini 3 Pro. On their coding index, the model also tied for first place with Gemini 3 Pro, with Claude Opus 4.5 a couple of points behind. Now, for any of you who follow developers on X, the difference of opinion on Opus 4.5 versus all these models is exactly the sort of reason you need to be skeptical of the overall value of benchmarks. On their agentic index, GPT-5.2 is in second place to Opus 4.5 but slightly ahead of Gemini 3 Pro. Overall, all these results really show is that with 5.2, OpenAI now has a credible competitor to the other big labs. It is not decidedly and clearly better than the other models, but it is a meaningful bump from GPT-5 and 5.1. Now, recent reporting suggested that Code Red would continue until next year, and these results, I think, help show why. One particularly interesting result was on GDPval. That benchmark, you might remember, was developed by OpenAI and seeks to measure agentic capabilities by giving models real-world white-collar tasks with established economic value. Unlike some other benchmarks, it measures end-to-end task completion. Artificial Analysis recently developed an independent AI evaluator for the tasks, which allows them to include GDPval in their assessment suite. When OpenAI announced it, they were using real-world experts in addition to an experimental AI assessor. On that benchmark, 5.2 managed to top the leaderboards, pulling ahead of Opus 4.5 by a decent margin. 
I think people are still trying to wrap their heads around GDPval and come to a common-sense understanding of just how valuable the benchmark is. But again, this just further solidifies to me that there is a very tight, clear competition between the premier models of all the major foundation labs. We will see if OpenAI can change that with their next release, which is anticipated in January. For now, however, that is going to do it for today's headlines. Appreciate you listening or watching, as always, and until next time, peace. Hello, friends. If you've been enjoying what we've been discussing on the show, you'll want to check out another podcast that I have had the privilege to host, which is called You Can with AI, from KPMG. Season one was designed to be a set of real stories from real leaders making AI work in their organizations. And now season two is coming, and we're back with even bigger conversations. This show is entirely focused on what it's like to actually drive AI change inside your enterprise, and has case studies, expert panels, and a lot more practical goodness that I hope will be extremely valuable for you as the listener. Search You Can with AI on Apple, Spotify, or YouTube and subscribe today. Meet Rovo, your AI-powered teammate. Rovo unleashes the potential of your team with AI-powered search, chat, and agents, or build your own agent with Studio. Rovo is powered by your organization's knowledge and lives on Atlassian's trusted and secure platform, so it's always working in the context of your work. Connect Rovo to your favorite SaaS apps so no knowledge gets left behind. Rovo runs on the Teamwork Graph, Atlassian's intelligence layer that unifies data across all of your apps and delivers personalized AI insights from day one. Rovo is already built into Jira, Confluence, and Jira Service Management Standard, Premium, and Enterprise subscriptions. Know the feeling when AI turns from tool to teammate? If you Rovo, you know. 
Discover Rovo, your new AI teammate, powered by Atlassian. Get started at rovo.com. AI isn't a one-off project; it's a partnership that has to evolve as the technology does. Robots and Pencils works side by side with clients to bring practical AI into every phase: automation, personalization, decision support, and optimization. They prove what works through applied experimentation and build systems that amplify human potential. As an AWS certified partner with global delivery centers, Robots and Pencils combines reach with high-touch service. Where others hand off, they stay engaged, because partnership isn't a project plan, it's a commitment. As AI advances, so will their solutions. That's long-term value. Progress starts with the right partner. Start with Robots and Pencils at robotsandpencils.com/aidailybrief. This episode is brought to you by Blitzy, the enterprise autonomous software development platform with infinite code context. Blitzy uses thousands of specialized AI agents that think for hours to understand enterprise-scale codebases with millions of lines of code. Enterprise engineering leaders start every development sprint with the Blitzy platform, bringing in their development requirements. The Blitzy platform provides a plan, then generates and pre-compiles code for each task. Blitzy delivers 80-plus percent of the development work autonomously while providing a guide for the final 20% of human development work required to complete the sprint. Public companies are achieving a 5x engineering velocity increase when incorporating Blitzy as their pre-IDE development tool, pairing it with their coding copilot of choice. To bring an AI-native SDLC into their org, visit blitzy.com and press Get a Demo to learn how Blitzy transforms your SDLC from AI-assisted to AI-native. Welcome back to the AI Daily Brief. 
Today we're getting a little bit more technical than we normally do, but there's a reason for that. One of the big themes of 2025 was supposed to be AI agents, and while I would argue that that came true, it was a little bit more nuanced than I think people thought it would be going into it. I believe that the expectation was that we would see agents proliferate across the enterprise. Instead, what we got was, one, coding agents becoming the most important breakout category in AI writ large, and two, a lot of infrastructure- and standards-type work around how we build agents that set us up for that sort of maturity and proliferation in the years to come. Now, around that, one of the things that's been interesting is to see how companies, even very fiercely competitive companies in the space, have frequently decided over the course of the last year to adopt each other's standards rather than trying to compete around standards. We saw this of course with MCP, which, even though it originated with Anthropic, became a standard adopted by Google, OpenAI, and Microsoft to allow LLMs and AI applications to access outside information. And now it appears that something similar might be happening with skills. At the end of last week, a number of folks on Twitter/X, including Simon Willison, noticed that Anthropic's skills mechanism was starting to show up in the OpenAI ecosystem. So let's talk about what skills are and why this could be a big deal. Back in October, Anthropic introduced Agent Skills, which they called a new way to build specialized agents using files and folders. And at core, files and folders are what skills are. Specifically, Anthropic writes that skills are organized folders of instructions, scripts, and resources that agents can discover and load dynamically to perform better at specific tasks. The goal is to allow general-purpose agents to become specialized agents in the context of the work that they're doing at the time. 
And in many ways, when Anthropic introduced this, that seemed to be the goal. Instead of developers having to build a complicated, balkanized, and fragmented landscape of custom-designed agents for every single different use case, by making capabilities and knowledge composable and accessible on demand, a much less fragmented landscape of generalized agents could access those capabilities and knowledge when needed to become specialized agents. A skill is basically a folder or a directory that contains a file called SKILL.md, in other words, a markdown file. That file has a name, a description, and instructions. When an agent that has access to skills starts up, it loads the names and descriptions of all installed skills into its system prompt, and then, when a relevant task comes up, Claude can read the full instructions. This is what Anthropic calls progressive disclosure: Claude only loads context when it needs it. In other words, Claude doesn't have to waste a bunch of time loading up all the instructions in each skill. It can just sort through that name-and-description metadata to figure out which skills it should be accessing for a particular task. So layer one of progressive disclosure is that basic metadata of a name and a description. The second layer of detail is the actual body of the file, with instructions, procedural knowledge, context, whatever it may be. If there is additional content, that can also be bundled underneath, leading to a third level of progressive disclosure. In that announcement post, Anthropic wrote: as skills grow in complexity, they may contain too much context to fit into a single SKILL.md, or context that's relevant only in specific scenarios. In these cases, skills can bundle additional files within the skill directory and reference them by name from SKILL.md. 
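To make those two layers concrete, here is a minimal Python sketch of how a skills-aware harness might implement progressive disclosure: parse only the name and description from each SKILL.md for the system prompt, and pull in the full instruction body only when a skill is actually invoked. The file format details and function names here are simplified assumptions for illustration, not Anthropic's actual implementation.

```python
def parse_skill(text: str) -> dict:
    """Split a SKILL.md-style file into frontmatter metadata and the body.

    Assumes a simple '---' delimited frontmatter with key: value lines,
    a deliberately simplified stand-in for the real format.
    """
    meta, body = {}, text
    if text.startswith("---"):
        header, _, body = text[3:].partition("---")
        for line in header.strip().splitlines():
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip()
    return {
        "name": meta.get("name", ""),
        "description": meta.get("description", ""),
        "body": body.strip(),
    }

def system_prompt_entries(skills: list) -> str:
    # Layer 1: at startup the agent sees only this cheap metadata,
    # one line per installed skill, instead of every instruction set.
    return "\n".join(f"- {s['name']}: {s['description']}" for s in skills)

def load_full_instructions(skill: dict) -> str:
    # Layer 2: the full body is pulled into context only when the
    # current task matches the skill's description.
    return skill["body"]
```

The token savings fall out of the split itself: a hundred installed skills cost a hundred short lines at startup, and only the one or two relevant bodies ever get loaded.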
These additional linked files are the third level and beyond of detail, which Claude can choose to navigate and discover only as needed. In the example they give, which is a comprehensive PDF toolkit for extracting text and tables, the second-layer overview includes lines along the lines of: for advanced features, JavaScript libraries, and detailed examples, see reference.md; and if you need to fill out a PDF form, read forms.md and follow its instructions. This is that bundling of additional content. So like I said, sometimes skills are going to include procedural knowledge, sometimes they're going to include background and context, and sometimes they're going to include code. For example, instead of Claude generating code to extract PDF form fields, a skill might include a Python script that does it reliably. So there are a bunch of theoretical benefits of this system. Skill files are markdown files, meaning that anyone can write them. This allows for customization without engineering: if you can write instructions for a human, you can write instructions that become part of a skill. The second benefit is efficiency. Progressive disclosure means that context is only loaded when it's needed, so the user isn't burning tokens on irrelevant instructions. There's the composability benefit in the fact that skills stack: you could have multiple skills working together instead of building single-purpose agents. There's reliability. We just mentioned that coding example, and skills can include code that runs deterministically instead of being regenerated every single time. And finally, there's portability. Institutional knowledge gets captured in a format that persists and can be transferred, meaning that new users or agents can access it immediately. 
So basically, if the Model Context Protocol is an open standard for allowing LLMs to connect to external tools and data sources in a uniform way, skills are a standard for specialized instructions and context that allow LLMs or agents to perform specialized tasks without the user having to re-explain the process every time. Now, when skills came out, there was a lot of excitement about them. AI engineering thought leader Simon Willison, for example, wrote a post called Claude Skills Are Awesome, Maybe a Bigger Deal Than MCP. Simon's core argument comes down to efficiency and simplicity. Back in October he wrote: Model Context Protocol has attracted an enormous amount of buzz since its initial release back in November last year. Over time, the limitations of MCP have started to emerge. The most significant is in terms of token usage: GitHub's official MCP on its own famously consumes tens of thousands of tokens of context, and once you've added a few more to that, there's precious little space left for the LLM to actually do useful work. Simon continued: my own interest in MCPs has waned ever since I started taking coding agents seriously. Almost everything I might achieve with an MCP can be handled by a CLI, or command-line interface, instead. LLMs know how to call cli-tool --help, which means you don't have to spend many tokens describing how to use them; the model can figure it out later when it needs to. Skills have the exact same advantage, only now I don't even need to implement a new CLI tool. I can drop in a markdown file describing how to do a task efficiently, adding extra scripts only if they'll make things more reliable or efficient. Now, trying to simplify this as much as possible, basically what Simon is saying is that with MCP, you have to build something for Claude to use a tool. With a CLI, Claude can just use tools that already exist. But with skills, Claude can just read instructions you wrote and figure it out. 
And indeed, to Simon, as he puts it, the simplicity is the point. He writes: one of the most exciting things about skills is how easy they are to share. I expect many skills will be implemented as a single file; more sophisticated ones will be a folder with a few more. Something I love about the design of skills is that there is nothing at all preventing them from being used with other models. You can grab a skills folder right now, point Codex CLI or Gemini CLI at it and say, read pdf/SKILL.md and then create me a PDF describing this project, and it will work, despite those tools and models having no baked-in knowledge of the skill system. I expect we'll see a Cambrian explosion of skills which will make this year's MCP rush look pedestrian by comparison. The core simplicity of the skills design is why I'm so excited about it. Now, in retrospect, that looks a little prophetic. Sean Wang, aka swyx, wrote: I was skeptical when Simon Willison said that Claude Skills are awesome, maybe a bigger deal than MCP, but early indications are this is correct. He then shared a talk from the recent AI Engineer Code Summit, which he said is the fastest talk to ever pass 100,000 views on the AI Engineer channel. The talk, by the way, was about why we should stop building agents and start building skills. The problem they identified was that intelligent agents lack expertise: genius without experience, as they put it. The solution is a new architecture with skills. A skill, they say, is an expert in a folder, and the new app store for AI is the skills that agents can access. The old way, then, is monolithic agents, a separate agent for each domain, hard-coded or prompted in context, which doesn't improve over time; the new way, agents plus skills, is a general agent with many skills packaged in simple, reusable folders that enable continuous and tangible learning. Then at the end of last week, people started to notice skills showing up in the OpenAI ecosystem. 
AI techie Arun writes: OpenAI just quietly stole Anthropic's homework, and it's brilliant. OpenAI integrated Anthropic's skills mechanism into ChatGPT and Codex, allowing the models to dynamically manage files like spreadsheets and PDFs. This modular approach to agent capabilities is proving to be a foundational piece of next-gen LLMs. Simon Willison also picked up on this on Friday. He wrote: OpenAI aren't talking about it yet, but it turns out they've adopted Anthropic's brilliant skills mechanism in a big way. Skills are now live in both ChatGPT and their Codex CLI tool. This was confirmed a couple of days later by Thibaut at OpenAI, who wrote: we've added experimental support for skills, and it combines well with GPT-5.2. Already seeing some cool things in the wild that leverage skills in Codex. I think about skills as an extension of AGENTS.md with progressive disclosure. By the way, AGENTS.md was OpenAI's lightweight markdown standard for providing AI coding agents specifically with project-specific instructions, so thinking in a similar domain. Now, in Simon's new post he wrote: one of the things that most excited me about Anthropic's new skills mechanism back in October is how easy it looked for other platforms to implement. A skill is just a folder with a markdown file and some optional extra resources and scripts, so any LLM with the ability to navigate and read from a file system should be capable of using them. It turns out OpenAI are doing exactly that, with skills support quietly showing up in both their Codex CLI tool and now also in ChatGPT itself. Now, so far people are just starting to experiment and figure out how they work in OpenAI. But as Simon summed up: when I first wrote about skills in October, I said they're awesome, maybe a bigger deal than MCP. The fact that it's just turned December and OpenAI have already leaned into them in a big way reinforces to me that I called this one correctly. 
Hold aside Simon's good call, this to me is continued evidence that it matters way more to these foundation lab companies to move at the speed of development than to own the standard. Kishan wrote: OpenAI seems comfortable letting Anthropic create standards like MCP and skills, then adopting them later. Skills are wonderfully simple, and I wish all the CLI agents would adopt the pattern. Look, even though 2025 was a big year for agents in a lot of ways, it's still very clear that we are barely scratching the surface of what's possible. And one of the things that will accelerate us heading into 2026 is the common adoption of these mutual standards. So, super interesting stuff. Excited to see what people go build with this. For now, that is going to do it for today's AI Daily Brief. Appreciate you listening or watching, as always. And until next time, peace.
