Transcript
A (0:00)
Today on the AI Daily Brief, meet the open source model that is outperforming GPT-5 and basically everyone else when it comes to agentic performance. Before that in the headlines, maybe vibe coding isn't dead after all. The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI. Alright friends, quick announcements before we dive in. First of all, thank you to today's sponsors, Superintelligent, Robots and Pencils, Blitzy and KPMG. To get an ad free version of the show, go to patreon.com/aidailybrief, or you can subscribe on Apple Podcasts. If you are interested in sponsoring the show, and especially if you are hoping to get any Q1 placements, now is a really good time. Things are filling up fast and I'm trying to map everything out. So if you are interested or thinking about sponsoring the show and you just want to learn about the opportunities we have, send us a note at sponsors@aidailybrief.ai. Like I said, if you are hoping to get Q1 placement, now is a good time to reach out. Lastly, as I mentioned yesterday, we are now up over a thousand use cases contributed to the AI ROI Benchmarking study. I am so appreciative of all of your help so far, and if you want your use cases included, as well as to get access to the full readout of all of this incredible AI ROI information, go to roisurvey.ai. It'll be live for about another week and a half. With that, let's dive in. Welcome back to the AI Daily Brief Headlines edition, all the daily AI news you need in around five minutes. Apparently, rumors of vibe coding's demise have been greatly exaggerated. Speaking with TechCrunch on Monday, Lovable CEO Anton Osika said that the company is closing in on 8 million users, dramatic growth from their 2.3 million active users back in July. Osika claimed the company is now seeing 100,000 new products built on Lovable every single day. 
We didn't get a new revenue number, but Lovable crossed the $100 million ARR milestone back in June, and there are currently rumors of new funding being raised at a $5 billion valuation, which would almost triple their valuation from fundraising over the summer. Now, part of the interview addressed a report from Barclays in September which showed that traffic to Lovable had dropped by 40% since a peak in August. Osika said that retention was still strong, with 100% net dollar retention, meaning the average user spends more over time. Now, of the major vibe coding startups, Lovable might be the one that's most focused on empowering non-coders. The platform not only enables easy prototyping, but is increasingly being used to deploy full products. If you've ever been on aidailybrief.ai, for example, that is built, maintained and hosted all with help from Lovable. Now, when it comes to where the company is focused, it follows from that same specialization. Osika said the part of the engineering organization that they're moving the quickest on hiring is security engineers. He said that the goal is to make building with Lovable more secure than building with just human written code. Now, in terms of the battle for the vibe coding space and increased competition from OpenAI and Anthropic, Osika said that he thinks it's not winner take all. He said, if we can unlock more human creativity and human agency and just drive the change so that anyone can create if they have good ideas, that should be celebrated regardless of whoever does that. Next up, Meta has returned to open source with a new speech recognition model called Omnilingual ASR. The model's big selling point is support for a huge range of underserved languages. Out of the box, the model can recognize over 1,600 languages. In contrast, OpenAI's open source Whisper model supports 99 languages. Developers can also extend this support with a feature called zero-shot in-context learning. 
The model can learn new languages at inference time using just a few paired examples of speech and text, with no retraining required. Meta said the feature can allow the model to support as many as 5,400 languages, which is pretty close to every language in use globally. Functionally, then, Meta is claiming to have created something like an AI Rosetta Stone for universal speech recognition. Reported benchmarks are also very strong, with the model more than quadrupling the performance of OpenAI's Whisper Large model. Meta claims a character error rate of less than 10% for 95% of high and medium resource languages, as well as 36% of low resource languages with less than 10 hours of audio in their data sets. Now, while the model itself is very cool, the reason that most people are taking notice is that the release suggests that Meta might not be completely done with open source models. When Mark Zuckerberg started spending billions of dollars to build out the superintelligence team, there was a suspicion that the days of leading open source models coming out of Meta were numbered. Does this suggest that those concerns were overblown? Only time will tell, but it's certainly a positive sign. Next up, some interesting comments from a DeepSeek researcher who has warned that AI could replace most jobs within a decade. Senior researcher Chen Deli made a rare public appearance at the World Internet Conference in China late last week alongside executives from five other AI and robotics companies. He warned that over the next 10 to 20 years, quote, societal structures will also be greatly challenged. Tech companies should play the role of guardians of humanity, at the very least protecting human safety, then helping to reshape societal order. Chen said that we're currently in the honeymoon phase, where AI cannot work independently to complete economically useful tasks and people can harness AI to boost their own productivity. 
However, he predicted that the next five to 10 years will see a rapid transition that leads to massive job cuts. Chen suggested, quote, during this period, tech companies should serve as whistleblowers, warning society of potential risks. Now, this view certainly isn't rare in the West. What makes it interesting is to see it emerge from one of the leading Chinese companies. AI optimism among the US population is among the lowest in the world at 39%, but in contrast, Chinese sentiment is among the highest at 83%. The AI transformation has become a core part of the Chinese government's economic and social strategy. In that context, the comments from Chen seem extremely non-consensus and frankly even potentially a little risky. Moving over to markets, CoreWeave more than doubled their revenue last quarter, but delays in data center construction have lowered revenue forecasts. The AI data center operator reported earnings on Monday, with revenue doubling year over year to come in at $1.36 billion, outperforming analyst estimates. CoreWeave also trimmed their loss to $0.22 per share, coming in way under the $0.57 per share projected by analysts and an 85% reduction compared to a year ago. Still, the big story from CoreWeave's earnings was a delay to a major project that's limiting forward revenue. CEO Michael Intrator disclosed that a third-party data center developer is causing temporary delays. Fourth quarter earnings will be impacted, but the client agreed to an adjusted timeline, so CoreWeave will maintain the full value of the contract. Intrator said, everybody is frustrated. The data center provider is frustrated, we're frustrated, the client is frustrated. People who are waiting on the next iteration of AI are frustrated. Now, the mystery client could be OpenAI or Meta, who each have over $10 billion in contracts with CoreWeave. CoreWeave lowered their full year revenue forecast to $5.05 billion from $5.15 billion due to the delays. 
Now, one really positive signal, however, from that call: it seems that installed GPUs are holding their value for longer than expected. CoreWeave has been criticized in the past for assuming a six year depreciation schedule on Nvidia H100s, which is longer than the more common four or five year schedule. During earnings, however, CoreWeave announced that their first H100 contract was reaching expiry and was re-signed within 5% of the original price. In other words, at the moment at least, it looks like the scarcity of compute is trumping all other factors in the current market. Now, checking in on AI stock themes, overall it does seem like many of the jitters last week were perhaps broader macro factors and not AI alone. As we came into the week with a deal to end the government shutdown on the horizon, there was a major Wall Street rebound with AI stocks leading the way. The S&P 500 was up 1.3%, winning back around 75% of its drop from last week. The Nasdaq regained around two thirds of last week's loss, and Nvidia led the way with a 4.8% rally. Now, I certainly do not think that this means that all of the concern that we saw last week was just based on bigger macro factors, but it is a good reminder that right now AI is both the chief beneficiary and biggest victim of any shift in market sentiment, good, bad or otherwise. That, however, is going to do it for today's headlines. Next up, the main episode. Today's episode is brought to you by my company, Superintelligent. You've got a hundred what-if ideas, but which one becomes an agent? Superintelligent maps every AI use case across your company and helps you create an agent plan that you can actually execute. We match opportunities to your tech stack, your data profile, and your team. No more guesswork, just a clear path from pilot to production. If you want agents that deliver business outcomes, start with planning. Go to besuper.ai and sign up for a demo. 
Small, nimble teams beat bloated consulting every time. Robots and Pencils partners with organizations on intelligent cloud native systems powered by AI. They cover human needs, design AI solutions, and cut through complexity to deliver meaningful impact without the layers of bureaucracy. As an AWS Certified Partner, Robots and Pencils combines the reach of a large firm with the focus of a trusted partner. With teams across the US, Canada, Europe and Latin America, clients gain local expertise and global scale. As AI evolves, they ensure you keep pace with change, and that means faster results, measurable outcomes, and a partnership built to last. The right partner makes progress inevitable. Partner with Robots and Pencils at robotsandpencils.com/aidailybrief. This episode is brought to you by Blitzy, the enterprise autonomous software development platform with infinite code context. Blitzy uses thousands of specialized AI agents that think for hours to understand enterprise scale code bases with millions of lines of code. Enterprise engineering leaders start every development sprint with the Blitzy platform, bringing in their development requirements. The Blitzy platform provides a plan, then generates and pre-compiles code for each task. Blitzy delivers 80% plus of the development work autonomously while providing a guide for the final 20% of human development work required to complete the sprint. Public companies are achieving a 5x engineering velocity increase when incorporating Blitzy as their pre-IDE development tool, pairing it with their coding copilot of choice. To bring an AI native SDLC into their org, visit blitzy.com and press get a demo to learn how Blitzy transforms your SDLC from AI assisted to AI native. What if AI wasn't just a buzzword, but a business imperative? On You Can with AI, we take you inside the boardrooms and strategy sessions of the world's most forward thinking enterprises. 
Hosted by me, Nathaniel Whittemore, and powered by KPMG, this seven part series delivers real world insights from leaders who are scaling AI with purpose, from aligning culture and leadership to building trust, data readiness and deploying AI agents. Whether you're a C-suite executive, strategist or innovator, this podcast is your front row seat to the future of enterprise AI. So go check it out at www.kpmg.us/aipodcasts or search You Can with AI on Spotify, Apple Podcasts or wherever you get your podcasts. Welcome back to the AI Daily Brief. Today we are once again talking about another Chinese open source model that is really changing people's sense of what is possible in the field of AI today. Now, to put this model release in some proper context, we have to go back to January. It is now coming up towards the end of the year, and of course this is the time when I start to plan out my end of year coverage, which is a big time for reflecting on the year that has passed and what's to come. And any end of year big story recap is inevitably going to kick off with the big story from January, which was of course the release of DeepSeek. When Chinese lab DeepSeek dropped their reasoning model, it caused an absolute tizzy in the AI industry that even sent stocks reeling. Now, there were three big reasons that DeepSeek was such a big deal. The first was that it totally changed people's perception of how far behind the US China really was. Up until that point, people were working on the assumption that when it came to model development, China was meaningfully behind the US, and DeepSeek seemed to suggest that wasn't true. The second big reason for concern, and the one behind the big stock wobble, was that at the time it appeared that they had achieved those results at significantly lower cost than big US training runs. This made everyone question the incredible amount of resources being spent on the data center buildout. 
The third reason DeepSeek was such a big deal was more on the consumer side. When they released their R1 reasoning model, the chatbot app that housed it actually dethroned ChatGPT to become the number one downloaded free app on Apple's App Store for iPhone. Now, what was interesting about this was that DeepSeek was not the first company to release a reasoning model. At that point, OpenAI's o1 had been available for a number of months. The difference was that DeepSeek made it available for free, meaning that for most people it was their first experience with a reasoning model. Which of course, if you've ever experienced the jump from a non-reasoning tool, a reasoning model is just a fundamentally different LLM experience. So this is what kicked off the year and set the tone for a number of different conversations that we'd be having throughout the year. Now, more recently, the whole China element of this story has heated back up in a big way. Nvidia CEO Jensen Huang recently said in very stark terms that he believed that China would win the AI race because of their disposition towards it. And even though, by the way, all these outlets are reporting that he backtracked, for my money, the backtrack was kind of more just a reaffirmation of what he was saying while trying to present a slightly more positive spin, like the US still had a chance. Along with the rise in AI skepticism among market investors, there has also been a surge in the idea that China isn't building as many data centers and that perhaps the US is overbuilding. Then investor Gordon Johnson went viral with a tweet that said, question for the AI bulls: the US currently has around 5,426 data centers and is investing billions to build more. China has around 449 data centers and is not adding. If AI is real, why isn't China building thousands of data centers every month, which they could clearly do? SemiAnalysis' Dylan Patel responded, where did you get the idea that they aren't adding? 
Not as much as the US, but China has thousands of data centers and are building many more. Your data source sucks. Now, the substance here is less important than the narrative and the fact that, once again, China's actions become the big foil for the US's. And this is the setup into which the new Kimi K2 Thinking model was released. The new model was released by Moonshot last Thursday with claims of outperformance on major benchmarks. The model purportedly leads both GPT-5 and Claude Sonnet 4.5 on Humanity's Last Exam, which is a general knowledge test; on BrowseComp, which is a test of agentic search; and on Seal-0, which is a test of the ability to collect real world data. The model lags slightly on major coding benchmarks like SWE-bench Verified, but not by much. Deedy Das of Menlo Ventures wrote, today is a turning point in AI. A Chinese open source model is number one. Kimi K2 Thinking scored 51% on Humanity's Last Exam, higher than GPT-5 and every other model. $0.60 per million input tokens and $2.50 per million output tokens, the best at writing, and it does 15 tokens per second on two Mac M3 Ultras. Seminal moment in AI. In other words, the point that Deedy is making here is that in addition to performing well, it's doing so cheaply and in a way that's efficient enough that people could run it on their own hardware. Now, in addition to scorching the benchmarks, Moonshot claimed the model is capable of 200 to 300 sequential tool calls without human interference. If that's true, it would make it incredibly capable for agentic workflows, frankly head and shoulders above many of the Western frontier models. Indeed, according to independent testing from Artificial Analysis, Kimi is now ranked ahead of GPT-5, Claude Sonnet 4.5 and Grok 4 on agentic tool use, and there's a fairly significant gap. Some, like Dan Max, suggested that this might be enough to delay the release of the next generation of models. 
As the frontier labs go back to the drawing board. Referencing that same recent quote that we were just talking about from Jensen Huang, the one where he said that Chinese AI is nanoseconds behind America, Dan wrote, Jensen is right. Look at Kimi K2 Thinking. Watch for delayed releases of Gemini 3, Opus 4.5 and GPT-5.1. Delays signal they are not clearly better or cheaper than Kimi K2 Thinking. That is evidence that the USA is indeed falling behind in the race, said Machina. Kimi K2 beating Gemini 3 would be, well, humiliating doesn't even cover it. Think about what Google has: decades of data, the best talent money can buy, infrastructure that runs the Internet, and they're sweating a smaller team's model. That's not supposed to happen in tech. The big guy wins, always. Maybe not this time, though. Now, part of what has people excited is that the model is open source, so people were running their own tests over the weekend. Pietro Schirano, the CEO at MagicPath AI, wrote, Kimi K2 Thinking is incredible, so I built an agent to test it out, Kimi Writer. It can generate a full novel from one prompt, running up to 300 tool requests per session. Here it is creating an entire book, a collection of 15 short sci-fi stories. LXE gave the model the task of balancing nine eggs, a book, a laptop, an empty plastic bottle and a nail to try out its reasoning. The model came up with a counterintuitive solution of arranging the eggs to support the book as the starting point, then adding the book, laptop, bottle and the nail in turn. Alexey remarked, Kimi K2 Thinking is the only modern reasoning model in recent memory that provided a human solution to this on the first try. 
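To make the idea of hundreds of sequential tool calls concrete, here is a minimal sketch of the kind of agent loop a project like Kimi Writer runs: the model repeatedly requests a tool, a harness executes it and feeds the result back into the conversation, and this repeats until the model declares it is done or hits a step budget. Everything here is illustrative, not any vendor's actual API: the model is stubbed out with a local function, and the tool names are hypothetical.

```python
# Hypothetical tool registry -- these names are invented for illustration.
def write_chapter(title):
    return f"drafted chapter: {title}"

def save_file(name, text):
    return f"saved {name} ({len(text)} chars)"

TOOLS = {"write_chapter": write_chapter, "save_file": save_file}

def fake_model(history):
    """Stand-in for an LLM. A real agent would send `history` to a model
    API and parse its tool-call response; this stub just requests a few
    tool calls and then declares the task finished."""
    step = sum(1 for m in history if m["role"] == "tool")
    if step < 4:
        if step % 2 == 0:
            return {"tool": "write_chapter", "args": {"title": f"ch{step}"}}
        return {"tool": "save_file", "args": {"name": f"ch{step}.txt", "text": "..."}}
    return {"final": "book assembled"}

def run_agent(task, max_steps=300):
    """Sequential tool-call loop: execute each tool the model requests,
    append the result to the history, and stop at a final answer."""
    history = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        action = fake_model(history)
        if "final" in action:
            return action["final"], history
        result = TOOLS[action["tool"]](**action["args"])
        history.append({"role": "tool", "content": result})
    return None, history  # hit the step budget without finishing
```

The loop structure, the step budget, and the ever-growing history are the parts that carry over to real agents; the claim about Kimi K2 Thinking is essentially that the model can keep this loop coherent for 200 to 300 iterations without a human stepping in.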
Now, another big shift here is that Chinese models are now right there with the US models on coding. AI coding has been the breakout killer use case for this year, and frankly, that's probably been something of a comfort for the Western companies, as this is one area where they've continued to maintain something of a lead. At the beginning of the year, Claude 3.5 Sonnet was the premier model with no close competitor. Since then, later versions of Claude, GPT-5, Gemini 2.5 Pro and Grok 4 have all vied for the top of the leaderboards and API credits from developers. Increasingly, though, Chinese models are catching up, if not to the absolute state of the art, at least presenting a very compelling cost to value trade off. Kimi K2 Thinking is clearly better at coding than Claude 3.5 Sonnet, the model that everyone was using just a few months ago, and it's being served at a fraction of the cost. In a recent article, The Information suggested that that competition is a huge problem for Anthropic in particular, given how much of their revenue is derived from API use for coding. They also point out that looking abroad is an imperative for the Chinese startups, writing, it is critical they find customers outside China who pay to access the AI models through APIs, no matter how low the prices are. That's because it's difficult for AI companies in China to generate revenue from domestic customers, where price competition is fierce and business customers are reluctant to pay for subscriptions, the article continues. As the overall AI coding market grows rapidly, the Chinese companies are betting that there will be sufficient demand for cheaper and good enough options. And in fact, this is one way that the release of Kimi K2 could end up being different from the DeepSeek moment. 
If the release of DeepSeek R1 was all about giving consumers their first glimpse of reasoning models that were hidden behind the paywall at OpenAI, Kimi K2 Thinking could end up being more about providing a near state of the art model that could perform in the enterprise at a fraction of the cost. Another interesting shift is that models like Kimi K2 Thinking are opening the door to self hosted LLMs in a way that wasn't really feasible last year. Up until recently, there has been a stark trade off when a developer chose to run models locally. Previously, you could use open source models to underpin products that didn't need state of the art AI, or you could tinker around with them. But for serious advanced production use cases, there needed to be a very significant reason to want the privacy or security of a local model to make up for the reduced performance. Kimi K2 Thinking is one of a crop of Chinese models that have reduced that gap. One of the reasons for that is an innovation in quantization. You can think of quantization as kind of like compression for AI models. While the process reduces performance, it also lowers the memory requirements substantially to allow models to fit on consumer hardware. Kimi K2 Thinking, for example, can be quantized down to run on a pair of Mac M3 Ultras, which is certainly not a cheap consumer setup, but it is a realistic rig for a professional programmer or a company. Some are starting to wonder if local LLMs will be a growing trend. I'm not really sure that I'm convinced at this point, but it is possible that we will see certain types of industrial use cases where the balance of value that you get from running locally does shift things, and that will be an important trend to keep an eye on. And while we haven't seen a lot of US enterprises all of a sudden adopting Chinese models, there are growing reports that the startup ecosystem has already made the switch. 
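Since the quantization idea carries a lot of weight here, a toy sketch of the trade it describes might help: fewer bits per weight in exchange for a small, bounded rounding error. This is a bare-bones symmetric int8 round trip in plain Python; real schemes, like the low-bit quantizations used to fit large models onto Macs, are far more sophisticated, but the memory-for-precision trade is the same.

```python
def quantize_int8(weights):
    # Map each float weight to an integer in [-127, 127] plus one shared
    # scale factor -- roughly one byte of storage per weight instead of four.
    scale = max(abs(w) for w in weights) / 127.0
    return [round(w / scale) for w in weights], scale

def dequantize(q, scale):
    # Recover approximate float weights for use at inference time.
    return [v * scale for v in q]

weights = [0.03, -1.27, 0.5, 0.9981, -0.002]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Every value survives the round trip to within half a quantization step,
# and every quantized value fits in a signed byte.
assert all(abs(a - b) <= scale / 2 + 1e-9 for a, b in zip(weights, restored))
assert all(-127 <= v <= 127 for v in q)
```

The performance hit mentioned above comes from exactly that rounding error accumulating across billions of weights, which is why quantized models trail their full-precision versions slightly on benchmarks even as they shrink to a fraction of the memory footprint.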
Bloomberg Opinion columnist Catherine Thorbecke wrote, in recent weeks, a subtle shift has become increasingly apparent. Speculation has been stirring for months that low cost, open source Chinese AI models could lure global users away from US offerings, but now it appears they are also quietly winning over Silicon Valley. She referenced Chamath Palihapitiya commenting that one of his portfolio companies has already moved major workflows to Kimi K2, which he said is, quote, frankly just a ton cheaper than OpenAI and Anthropic. That same week, Airbnb CEO Brian Chesky said that they hadn't integrated with OpenAI because the connections aren't quite ready. Instead, Airbnb's new service agent is, quote, relying a lot on Alibaba's Qwen3 model, which Chesky said is very good and also fast and cheap. Mira Murati's Thinking Machines Lab is also building on Qwen3. Cursor's new in-house coding agent, Composer 1, is rumored to be built on top of a Chinese model. And on Hugging Face, downloads for Qwen have recently overtaken downloads of Meta's Llama models, suggesting a shift in user patterns for open source AI. Referencing that same Jensen Huang quote, Thorbecke wrote, it's premature for Huang to declare a winner. The US still has clear advantages when it comes to access to cutting edge chips and computing power, but Beijing's low cost and open source push is undoubtedly attracting developers, the backbone of AI innovation. If Washington truly wants to come out on top in the long run, it should start by asking why Silicon Valley is already switching sides. So what's the net of all of this? Kashyap Patel writes, Kimi K2 Thinking is more important than o3, not because the model is better, but because of what it signals about the future of AI development. For him, there are a few different elements of this. First, that the open source lag is now measured in months, not years. 
That basically we've seen the closed model advantage window collapse from more than 18 months to three to four months. That China is treating AI like they treated electric vehicle manufacturing. In other words, not trying to match the West, but trying to lap it on price and accessibility and competing on economics. And then this observation: the real race isn't to AGI, it's to democratization, he writes. Who cares if you build AGI if only a thousand companies can afford it? Kimi K2 provides frontier performance at commodity prices. That's the game. Dean Zacharyansky thinks that the agentic capabilities update is the real deal here. He writes, in July 2025, models could not effectively call tools, three to five tool calls max. Then Kimi K2 released, and every subsequent model has been post-trained for tool calling. Now we have agents that can run for an hour and 30 minutes. This is the quietest and most significant advancement in recent memory. Bindu Reddy writes, in spite of all the closed source drama, the biggest story of 2025 has been open source agentic models. Three new models dominate the cheap mass market agent space. GLM, Kimi K2 and Qwen Coder are all amazing, with trillions of tokens being used every day. That leads to a prediction: 2026 will be the year of open weights. We will see at least two US labs enter the arena. Kimi and GLM will push to close the gap in agentic coding. DeepSeek will finally release R2. We will have state of the art image and video generation models. The LLM developer community will explode. Now look, obviously one of the subtexts for a lot of this show is around the geopolitics of this, but when it comes to consumer choice, it's hard to see all of these advancements as anything but incredibly valuable. New frontiers of performance and cost are being pushed, driving efficiency up and prices down. And that's going to mean all of us being able to do even more with these models than what was previously possible. 
Pretty interesting stuff. Obviously a lot to keep track of for now. That's going to do it for today's AI Daily Brief. Appreciate you listening or watching as always and until next time, peace.
