
Today on the AI Daily Brief: is 2026 going to be the year of AI agent swarms? Before that, in the headlines, some big jumps in Anthropic's fundraising and revenue. The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI.
Alright friends, quick announcements before we dive in. First of all, thank you to today's sponsors, KPMG, Zencoder and Superintelligent. To get an ad-free version of the show, go to patreon.com/aidailybrief or you can subscribe on Apple Podcasts. To learn more about sponsoring the show, send us a note at sponsors@aidailybrief.ai. Also, if you are interested in the research that we did at the end of last year, we have our next research kicking off soon. To keep track of all that, as well as to hear about future products we have coming, AI maturity maps, AI opportunity radars and much more, go to aidbintel.com, where you can sign up to get that information as soon as it comes out. Now with that out of the way, let's dive in.
Welcome back to the AI Daily Brief Headlines edition, all the daily AI news you need in around five minutes. We kick off today with some fundraising and business news out of Anthropic. The company is close to finalizing their latest funding round, which could raise more than $20 billion. Reports state that Anthropic has between 10 and 15 billion in firm commitments that could be finalized early next week, including the Singapore sovereign wealth fund and Sequoia making large investments. Anthropic has also recently doubled the size of the round from 10 to 20 billion in response to excess interest. One investor told the Financial Times that the round was five to six times oversubscribed before the size increase. In addition to venture capital and sovereign wealth, Microsoft and Nvidia have also committed to invest a total of 15 billion in the company, which is on top of the 20 billion from investment firms.
The round would reportedly value Anthropic at $350 billion, almost a doubling from their Series F, which closed in September. The fundraising frenzy firmly cements Anthropic's momentum. Last year, remember, OpenAI raised 40 billion, anchored by 30 billion from SoftBank, meaning that Anthropic is now neck and neck with those figures. In addition to fundraising news, The Information has an update on Anthropic's revenue growth forecasts. They report that Anthropic updated investors in December and hiked forecasts across the board. 2026 revenue is now expected to come in at 18 billion, around a 4x increase from last year's numbers and up 20% from estimates made last summer. In 2027, Anthropic expects to generate 55 billion in revenue. For 2029, their most optimistic forecast calls for 148 billion. That forecast is particularly notable as it's 3 billion more than OpenAI's last forecast, which was made during the summer. OpenAI, of course, may have hiked expectations since then, but it's still very notable that Anthropic believes they could overtake OpenAI within three years. The other big number from the financial update was Anthropic's increasing training costs. They expect to spend 12 billion on training this year, which is a 50% increase from summer projections. Their forecasts also project training costs to exceed 100 billion by 2029. These increased costs push back Anthropic's timeline for profitability by a year, with the company now expecting to flip cash flow positive by 2028.
Now, one of the things that Dario and Anthropic have of course been weighing in on a lot is chip exports to China, with Anthropic being firmly in the camp that we should not be exporting chips to China. An update on that front, as Beijing has approved the first batch of Nvidia chip imports. Reuters reports that Chinese officials have approved the import of several hundred thousand H200s, allowing access to the advanced chips for the first time.
Sources said the first batch of approvals were primarily allocated to three unnamed tech giants. The Wall Street Journal later named Alibaba and ByteDance as two of the three receiving approval. Other enterprises are still in the queue awaiting a subsequent round of approvals, presumably including high-flying startups like DeepSeek, who may have to wait in line to set up their H200s. Reports stated that Chinese AI firms will be required to support local chip makers as well, using their chips for some training tasks and most AI inference. Basically, it seems like officials are trying to strike a balance, allowing Chinese companies to train advanced models while also protecting domestic chip makers. Now, this could be a huge boon to Nvidia's first quarter financials. Several hundred thousand H200s is in the ballpark of 10 billion in sales, and that's only the first round of approvals. In Q2 of last year, when Chinese chip exports were shut down by the US government, Nvidia reported a $5.5 billion write-down associated with losing Chinese sales. That implies Nvidia could see record Chinese sales this quarter simply based on this first round of approvals. Nvidia CEO Jensen Huang is currently visiting China to meet with local employees, but reports suggest that he hasn't met with any senior officials. That said, his next stop is Taiwan, where people familiar with the trip said he plans to ask suppliers to bump up H200 production to meet Chinese demand.
Moving over to the training side of the house, the UK government has expanded their AI training initiative with an ambitious new goal to upskill every worker in the country. The Department for Science, Innovation and Technology announced on Tuesday that free AI training will be made available to every adult worker.
The training will come in the form of 20-minute online courses, with modules covering use cases like drafting text, content creation and automation of administrative tasks. Technology Secretary Liz Kendall said, "We want AI to work for Britain, and that means ensuring Britons can work with AI. Change is inevitable, but the consequences of change are not. We will protect people from the risks of AI while ensuring everyone can share in its benefits." New partners, including Cisco, Cognizant and the National Health Service, will join existing partners including Amazon, Google, Microsoft and Salesforce in the upskilling initiative. The department claimed this would be the largest targeted training program since the establishment in the late 1960s of the Open University, which delivers distance learning for higher education. They said the program had already delivered a million courses and the government would aim to retrain 10 million workers by the end of the decade. Workers that complete the training will be certified with an AI Foundations badge to give employers confidence they have basic AI skills. Now, there is a lot that we could say about this. The cynic in me of course sees all of the potential challenges with this program, most of which amount to a question of whether this is too little to move the needle. But we gotta start somewhere. Governments need to get involved in a way that is actually helpful to people adapting to a new world rather than just trying to pretend that they have control over whether that new world exists. And so for that reason I think this is a good thing, and I'm excited to see it hopefully go even farther than they're thinking right now.
Now, our main episode today is about a new model out of China and its agent swarm capabilities. But Alibaba's Qwen team also released a new model earlier this week, specifically called Qwen3-Max-Thinking.
Now, as you can probably tell from the naming convention, this is the big flagship model from the Qwen team, their equivalent of GPT 5.2 Pro, Gemini 3 Pro or Opus 4.5. The model makes use of an inference technique that the Qwen team are calling Heavy Mode. Qwen is doing things slightly differently from existing approaches to test-time scaling, generating a response, then feeding it back into the model for improvements in a recursive loop. It appears to be generating some pretty significant gains. Qwen said that this method improved benchmark scores on GPQA, which is a PhD-level science test, from 90.3% to 92.8%. On LiveCodeBench, scores jumped from 88% to 91.4%. Overall, the benchmarking looks pretty strong. Now, the cost is a little beefy for a Chinese open source model. Qwen3-Max-Thinking comes in at around the same cost as Claude Haiku 4.5, meaning that it's still much cheaper than models like Gemini 3 Pro or GPT 5.2, but about 10 times more expensive than DeepSeek V3.2. Now, Qwen3 is already being used by many American companies. Airbnb CEO Brian Chesky, for example, recently said that his company was relying on Qwen3 as a more affordable alternative to US models, meaning that you got to think that they will be watching this model release closely. Although again, how it stacks up compared to Kimi K2.5, which we will talk about in our main episode, remains to be seen.
Lastly today, it's not just the Chinese labs with some interesting new product to show off. Google has released a new feature for Gemini 3 Flash called Agentic Vision. The feature leverages Gemini's state-of-the-art multimodal reasoning with code to execute unique capabilities. Writes Google, Agentic Vision introduces an agentic Think, Act, Observe loop into image understanding tasks. Think: the model analyzes the user query and the initial image, formulating a multi-step plan. Act:
The model generates and executes Python code to actively manipulate images, such as cropping, rotating or annotating, or to analyze them, such as running calculations, counting bounding boxes, et cetera. Last is Observe. The transformed image is appended to the model's context window. This allows the model to inspect the new data with better context before generating a final response. Overall, this promises to improve Gemini's ability to annotate images, perform data visualization tasks, and help with basic image analysis. Google said that the loop improves model performance by between 5 and 10% across most vision benchmarks. Still, developer experience lead Omar Sanseviero hinted at the most exciting unlock from the new feature. He showed an output of an annotated image of a table containing a spill. Gemini had identified a spill, a piece of cloth and several other items. The annotations appear to be instructions for a robot to clean up the spill by first clearing away the items in the way, dampening the cloth and wiping up the spill. The implication, of course, being that this feature could be used to give robots on-the-fly analysis and reasoning ability, allowing them to tackle tasks that they've never seen before. Ultimately, though, as I said, when it comes to new models, the big conversation is around Kimi K2.5, and so with that, we will wrap up the headlines and move on to the main episode.
Hello friends. If you've been enjoying what we've been discussing on the show, you'll want to check out another podcast that I have had the privilege to host, which is called You Can with AI from KPMG. Season one was designed to be a set of real stories from real leaders making AI work in their organizations, and now season two is coming and we're back with even bigger conversations.
This show is entirely focused on what it's like to actually drive AI change inside your enterprise, and has case studies, expert panels, and a lot more practical goodness that I hope will be extremely valuable for you as the listener. Search You Can with AI on Apple, Spotify or YouTube and subscribe today.
If you're using AI to code, ask yourself: are you building software, or are you just playing prompt roulette? We know that unstructured prompting works at first, but eventually it leads to AI slop and technical debt. Enter Zenflow. Zenflow takes you from vibe coding to AI-first engineering. It's the first AI orchestration layer that brings discipline to the chaos. It transforms freeform prompting into spec-driven workflows and multi-agent verification, where agents actually cross-check each other to prevent drift. You can even command a fleet of parallel agents to implement features and fix bugs simultaneously. We've seen teams accelerate delivery 2x to 10x. Stop gambling with prompts. Start orchestrating your AI. Turn raw speed into reliable, production-grade output at zenflow.free.
Today's episode is brought to you by my company, Superintelligent. In 2026, one of the key themes in enterprise AI, if not the key theme, is going to be how good the infrastructure is into which you are putting AI and agents. Superintelligent's agent readiness audits are specifically designed to help you figure out, one, where and how AI and agents can maximize business impact for you and, two, what you need to do to set up your organization to be best able to leverage those new gains. If you want to truly take advantage of how AI and agents can not only enhance productivity, but actually fundamentally change outcomes in measurable ways in your business this year, go to besuper.ai.
Welcome back to the AI Daily Brief. Today we're talking about something that has been of interest to people for quite some time.
When I first started this show, all the way back in April of 2023, already there were people who were extremely interested in the way that LLMs could generate code. Now, it would take a couple of years and some significant advances in the models to actually unleash vibe coding in the way that it happened over the course of 2025, but the idea was there very early. We've similarly had interest in vast teams of agents that can coordinate amongst themselves to accomplish more things, even if the capability set hasn't fully been there. Which isn't to say that people haven't been experimenting. Lindy released their Agent Swarm tool back in April of 2025, and the concept is related to something that I've talked about on this show, the Doctor Strange theory of AI agent work. Now, the specific point that I've made is actually about the difference in how enterprises think agents will play out versus how I think they will play out, with the difference being that I don't think that agents are going to be one-to-one replacements for existing human work. I think that we're going to be able to deploy lots and lots of agents to scenario and war-game different types of work. That is not exactly the same as agent swarms, which are more about breaking down complex tasks into specific subtasks, but it is in some ways still part of the same larger conversation about how agents will actually work in the future.
Over the last couple of days we have started to get the first big model releases of 2026, and maybe the most significant so far is Moonshot's Kimi K2.5. While it is the agent swarm feature of K2.5 which has the most chatter, it's worth checking out the broader model as a whole. Artificial Analysis sums up the shift when they write, "Moonshot's Kimi K2.5 is the new leading open weights model, now closer than ever to the frontier, with only OpenAI, Anthropic and Google models ahead." And indeed the benchmarks are impressive.
K2.5, for example, claims 50.2 on Humanity's Last Exam, which would put them ahead of GPT 5.2 running on high settings, Opus 4.5 and Gemini 3. On a variety of other benchmarks as well, they claim performance that matches or exceeds these premier Western models. On the overall independent Artificial Analysis Index, Kimi jumps from 11th place overall with their K2 Thinking model into fifth, only behind two iterations of GPT 5.2, Opus 4.5 and Gemini 3 Pro. And of course the cost is cheaper than any of those models. In AA's tests, Kimi K2.5 was about four times cheaper than Opus 4.5 or GPT 5.2, but was still much more expensive than, for example, DeepSeek V3.2. One of the things that Moonshot has emphasized in their launch is the model's native multimodality. Artificial Analysis again writes, "Kimi K2.5 is the first flagship model from Moonshot to support image and video inputs. This is the first time that the leading open weights model has supported image input, removing a critical barrier to the adoption of open weights models compared to proprietary models from the frontier labs." They point out that this makes a significant difference as compared to other open weights leaders like DeepSeek's V3.2.
Now, anytime we get a model out of China, of course, one aspect of the discourse is what it says for the state of the AI race. On that front, there were a number of people who took to Twitter/X to share examples of Kimi K2.5 claiming that it was Claude. Enrico from Big-AGI says, "Identity crisis or training set?" Still, overall, even with some of the suspicion of distillation of Western models, the release of 2.5 certainly validates the recent arguments from people like Demis Hassabis that Chinese models are very, very close to the US when it comes to performance, if not yet having had an example of actually pushing the frontier.
As Balaz Nathi points out, however, the real value in 2.5 is not, as he puts it, pure IQ dominance; it's about how it does in an actual work environment. He calls it less chatbot and more employee. And indeed, there are a couple things that stood out to me about the 2.5 announcement that are really impressive. One is the way that they're using this multimodal input capability in the context of coding. They show an example of taking a screen recording of a website, dumping it into Kimi and asking it to clone it, with Kimi shipping that code, including UX and interactions. If this actually works like that, it opens up a significant new frontier in AI coding that you have to imagine everyone will race to copy very quickly. Another thing that Moonshot emphasized is how good 2.5 is at office skills, things like financial modeling in Excel or creating high quality PowerPoints. Now again, this could be incredibly valuable when it comes to work, although I haven't really been able to find a ton of examples yet of people testing this out that don't just feel like paid influencer posts. One that I found that did seem to positively test out these features came from Shafi. He wrote, "This new AI model Kimi from China created a full slide deck from my journal article in one single shot prompt. I just gave it the keyword and journal name, not even the link or PDF to the article. It searched the article and found the correct one, developed the contents after reading the paper, created contents for 12 slides including searching images from Internet, asked for suggestions to make edits which I declined and asked it to go ahead and generated slides in a PowerPoint format. Everything happened inside my phone in five to six minutes. Since it's my own article, I know it got most of the things right."
And yet, as I said at the beginning, probably the feature that people are most excited about is this agent swarm parallelization. An example that Kimi gave was adapting O.
Henry's short story The Gift of the Magi into a 10-minute short film. They asked it to generate a highly consistent storyboard script and embed it into an Excel file, which they said from a single prompt created a 100 megabyte Excel file generated with images, with a total of 55 scenes. Simon Willison writes that the self-directed agent swarm paradigm claim there means improved long-sequence tool calling and training on how to break down tasks for multiple agents to work on at once. He gave it the prompt, "I want to build a Datasette plugin that offers a UI to upload files to an S3 bucket and stores information about them in a SQLite table. Break this down into 10 tasks suitable for execution by parallel coding agents." He said the response was pretty good. It produced 10 realistic tasks and reasoned through the dependencies between them. GlobalSoul writes, tried Kimi Moonshot agent swarms and it is quite magical. Basically, they gave Kimi a list of stocks and asked it to create a report that analyzes each from a variety of different factors. They said it created individual files for each company, an overall summary, and finished the output for all companies in 10 minutes. swyx also had an interesting experience in his testing. He writes, "Little detail from exploring the K2.5 agent swarm preview today: I asked it to make a custom website for the Latent Space podcast and despite it being trained to parallelize eagerly and having full permission to do so, it recognized that this was a noob task and did a highly competent job with one agent and refunded my credits. This thing might be AGI. I've never expected a parallel agent lab to use less than what it was trained or opted in to use." In other words, just because it could use a parallel agent structure, it recognized that for certain tasks it doesn't need that. Cline founder Saoud Rizwan explains a little bit about what's going on in the background.
He writes, "LLMs are trained on sequential reasoning, breaking tasks down step by step, one to-do after another. When you ask them to orchestrate parallel work, they don't know how to split tasks without conflicts. Moonshot calls this serial collapse and solved it with reinforcement learning. They used PARL, parallel-agent reinforcement learning, where they gave an orchestrator a compute and time budget that made it impossible to complete tasks sequentially. It was forced to learn how to break tasks down into parallel work for sub-agents to succeed in the environment."
Simon Smith from Klick Health did a full test as well and came away pretty impressed. He writes, "I've been thinking about the best way to organize agents in step-by-step workflows where each agent has skills defined by an agent skills file, and to then scale this across an enterprise. Today Kimi dropped its K2.5 model along with Agent Swarms and I thought, could this be it? The answer: mostly." He then walks through how you do this. First, using Kimi, you actually use the model selector to select Agent Swarm in the same way that you would select between, for example, Instant or Thinking mode. For Simon's task, he gave Agent Swarm the task of responding to an RFP, which included, in his words, research, strategy, creative brainstorming and concept development, media planning, analytics planning, high-level project planning, and consolidating everything into a final written response in a Word document. He continues, as would be familiar to users of agentic coding tools like Claude Code and Codex, Kimi turns your request into a step-by-step plan and then proceeds to work through it. Where things get interesting, however, is how it executes the plan with multiple agents. For each step in the plan, he writes, Kimi creates a set of relevant agents. And importantly, these aren't generic agents. Agents each have roles and names.
Each agent, he writes, "plays a specific role defined for it in a prompt, and even gets a name and avatar. The role description ensures the agent focuses on a specific job to be done, and the name and avatar make this extremely user friendly." The model is then smart enough to figure out which agents can work in parallel or, in the case that an agent requires the output of a different agent, how to run them sequentially. Simon writes that you can monitor agents overall via a dashboard with progress indicators and also select individual agents to monitor their work. One of the important things that Simon points out is that part of the big upgrade here is not just the performance but the user experience. He writes, "When I think about something that would scale up to an enterprise, which will include a lot of users who won't be comfortable in something like Claude Code in the terminal, this feels like it would be easily adopted. It's extremely clear and intuitive." The model gave Simon not only the final output, but also all of the intermediate outputs from each of the distinct agents. Now, Simon's big request, and his caveat, is that he wants access to connectors or MCPs as well as agent skills to be able to fully sync this with the larger ecosystem of data that people work in. Overall though, he says, I'm impressed. I've been waiting for something like this that makes it easy for anyone, regardless of technical expertise, to ask AI to do something and have it complete the task. With multiple agents playing different roles and working collaboratively, this feels like the emerging future of humans managing teams of AI agents the way they currently manage teams of other humans. I honestly don't understand how Kimi got here first. There are other solutions out there for agents to work together on tasks, but everything I've seen is too technical for the average user, requiring you to use the terminal, or too rigid, requiring you to pre-build workflows.
How did Kimi create such a great model with such excellent agentic capabilities and build such an intuitive interface?
Now, this is the interesting question, and it's why I feel like we are very much seeing the beginning of a broader phenomenon around these agent swarms. In addition to K2.5, I've seen a couple people talking about Claude Code's new task system in the same context, and so it seems like something that's probably on the minds of those folks as well. LangChain developer Sydney Runkle is also talking about this subagents architecture, all of which makes me feel like 2026 might be the year of the agent swarm. Indeed, there's enough chatter that Ethan Mollick is making one last, perhaps vainglorious, attempt to steer us away from using the swarm terminology. On Monday, he tweeted, "Let's not call groups of agents swarms. Both terrifying and not a useful analogy. Groups of agents should be called teams or organizations. It both describes how to structure them and also how to use them. Don't let the weird AI folk naming win again." I'm not sure where it will land when it comes to terminology, but it really does feel like this is something new happening, and I'm excited to see how it develops. I will be testing out K2.5. Maybe we'll do a special bonus Operators episode about that. For now, however, that is going to do it for today's AI Daily Brief. Appreciate you listening or watching as always, and until next time, peace.
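The scheduling behavior Simon Smith describes in the main episode, where agents run in parallel unless one needs the output of another, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Moonshot's implementation: the agent names, the `PLAN` dependency map and the `run_agent` stub are all hypothetical, loosely modeled on the RFP task, and a real system would call a model where the stub simply returns a string.

```python
import asyncio
from graphlib import TopologicalSorter

# Hypothetical plan: each agent maps to the set of agents whose output it needs.
PLAN = {
    "research": set(),
    "strategy": {"research"},
    "creative": {"research"},
    "media_plan": {"strategy"},
    "final_doc": {"strategy", "creative", "media_plan"},
}


async def run_agent(name: str) -> str:
    """Stand-in for a real agent call (assumption: no model is invoked here)."""
    await asyncio.sleep(0)
    return f"{name}: done"


async def run_plan(plan: dict) -> list:
    """Run agents in waves; every agent in a wave executes concurrently."""
    sorter = TopologicalSorter(plan)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        # Agents whose dependencies have all finished are ready to start.
        ready = sorted(sorter.get_ready())
        await asyncio.gather(*(run_agent(name) for name in ready))
        # Marking the wave as done unlocks the agents that depend on it.
        sorter.done(*ready)
        waves.append(ready)
    return waves


if __name__ == "__main__":
    print(asyncio.run(run_plan(PLAN)))
    # [['research'], ['creative', 'strategy'], ['media_plan'], ['final_doc']]
```

In this sketch the five agents finish in four waves, because `creative` and `strategy` both depend only on `research` and can run side by side. A plan with fewer dependencies would collapse into fewer waves, which is the kind of saving a swarm orchestrator is after.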
Podcast: The AI Daily Brief: Artificial Intelligence News and Analysis
Host: Nathaniel Whittemore (NLW)
Date: January 28, 2026
In this episode, Nathaniel Whittemore dives into what could be the defining AI trend of 2026: "agent swarms". After a brisk round-up of major AI industry headlines—including record fundraising for Anthropic, Chinese chip import news, UK upskilling efforts, and fresh model launches—the main focus shifts to the emerging paradigm of agentic AI systems, with a spotlight on Moonshot’s Kimi K2.5 model. The episode offers both technical breakdowns and first-hand user experiences, probing whether swarms of AI agents working in parallel represent the next major leap in productivity, and how these developments might reshape work and enterprise infrastructure.
Shafi ([00:21:17]):
“Kimi from China created a full slide deck from my journal article in one single shot prompt...Everything happened inside my phone in five to six minutes. Since it’s my own article, I know it got most of the things right.”
Simon Willison ([00:23:40]):
“It produced 10 realistic tasks and reasoned through the dependencies between them.”
swyx ([00:24:28]):
“This thing might be AGI. I’ve never expected a parallel agent lab to use less than what it was trained or opted in to use.”
Simon Smith ([00:27:05]):
“When I think about something that would scale up to an enterprise...this feels like it would be easily adopted. It’s extremely clear and intuitive.”
Nathaniel Whittemore ([00:39:00]):
“It really does feel like this is something new happening, and I’m excited to see how it develops.”
On UK’s Training Initiative:
“We gotta start somewhere. Governments need to get involved in a way that is actually helpful to people adapting to a new world rather than just trying to pretend that they have control over whether that new world exists.”
— Nathaniel Whittemore ([00:08:18])
On Kimi K2.5’s Breakthrough:
"Less chatbot and more employee."
— Balaz Nathi ([00:19:32]) (paraphrased by NLW)
On Agent Swarm Usability:
"I’ve been waiting for something like this that makes it easy for anyone, regardless of technical expertise, to ask AI to do something and have it complete the task...this feels like the emerging future of humans managing teams of AI agents the way they currently manage teams of other humans."
— Simon Smith ([00:27:40])
Nathaniel frames 2026 as potentially “the year of the agent swarm,” pointing to Moonshot’s Kimi K2.5 as the clearest incarnation yet of complex, collaborative, parallel AI at work. With benchmarks rivaling Western leaders, robust agentic architectures, and a focus on user experience, this new paradigm foreshadows teams of AIs that can handle sophisticated, multi-step tasks in both technical and non-technical settings. As excitement—and terminology debates—swirl, the episode leaves listeners contemplating a near future where managing fleets of AI agents is as natural as managing human teams today.