
Today on the AI Daily Brief, a massive fundraising round for Cursor and what it says about app layer companies versus the model layer. Before that in the headlines, welcome to the agentic hacker age. The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI. All right friends, quick note before we dive in. First of all, thank you to today's sponsors, KPMG, Blitzy, Robots and Pencils, and Rovo. To get an ad-free version of the show, go to patreon.com/aidailybrief, or you can sign up at Apple Podcasts. To learn about sponsoring the show or anything else, including job opportunities, speaking gigs, et cetera, visit us at aidailybrief.ai, and of course, while you're there, check out the AI ROI Benchmarking Study. At this rate, we are going to put together one of the biggest collections of information about actual ROI for actual use cases. If you want to get the full version of the study, come and share which use cases are driving the most value for you. You can get all of that at roisurvey.ai. Now, with that, let's get into some very interesting conversations to close out our week. Welcome back to the AI Daily Brief Headlines edition, all the daily AI news you need in around five minutes. We kick off today with a story that could very easily be a main episode. Anthropic says they've thwarted the first reported use case of AI-enabled, or really agentic, cyber espionage. In mid-September, Anthropic detected suspicious activity that was later determined to be a, quote, highly sophisticated espionage campaign. The company said that they have high confidence that the threat actor was a Chinese state-sponsored hacking group. The unprecedented part was that the group didn't just use AI for planning; Claude's agentic capabilities were used to carry out the attack. The hackers reportedly used Claude Code to automate an infiltration of 30 global targets with a small number of successes. 
The targets were organizations like large tech companies, financial institutions, chemical manufacturers, and government agencies. Anthropic monitored this activity across 10 days, banned accounts as they were identified, and coordinated with authorities as appropriate. They said that Claude Code was able to perform 80 to 90% of the attack, with human intervention only required during a handful of key decision points. This allowed the attack to be carried out at a speed that would have been impossible for human hackers. Claude's guardrails were circumvented by breaking the attack into smaller tasks, each of which seemed innocent but added up to a massive system breach. In their postmortem, Anthropic wrote, this campaign has substantial implications for cybersecurity in the age of AI agents, systems that can be run autonomously for long periods of time and that complete complex tasks largely independent of human intervention. Agents are valuable for everyday work and productivity, but in the wrong hands they can substantially increase the viability of large-scale cyber attacks. Anthropic believes this issue will grow as AI models become more capable, so they're expanding their detection capabilities. They wrote, with the correct setup, threat actors can now use agentic AI systems for extended periods to do the work of entire teams of experienced hackers. Less experienced and resourced groups can now potentially perform large-scale attacks of this nature. They further noted that this is an escalation of the vibe hacking findings they reported over the summer, as those incidents still had, quote, humans very much still in the loop directing the operations. Sure, this is a topic that we will be hearing a lot more about in the months to come, but one other story from Anthropic: in a very different dimension of their work, they are joining the infrastructure buildout, announcing a $50 billion commitment for U.S. data centers. 
Up until now, Anthropic has been a renter of compute, getting most of their access through partnerships with Google and Amazon. On the financial side, this hasn't been a big problem, allowing Anthropic to functionally spend equity instead of cash on their largest expense during their early growth phase, but it has come with trade-offs. At certain points, Anthropic has been required to use in-house chips from Amazon and Google when they might have preferred to be using Nvidia's GPUs. They've also been repeatedly bottlenecked by compute, leading to severe rate limits that hampered customer retention at times. With this year's rapid growth, Anthropic has stepped up to another echelon, and consequently they're looking to own some of their own infrastructure. The announcement discussed several sites to be built across the US, including in Texas and New York. UK-based data center developer Fluidstack will partner on the project, with the expectation that the data centers will start coming online next year. Anthropic spoke about the project in terms of the administration's AI goals, saying it was about, quote, maintaining American AI leadership by strengthening domestic technology infrastructure. CEO Dario Amodei said in a statement, we're getting closer to AI that can accelerate scientific discovery and help solve complex problems in ways that weren't possible before. These sites will help us build more capable AI systems that can drive those breakthroughs while creating American jobs. Now, speaking of $50 billion, that is also the reported valuation from an upcoming fundraising round for Mira Murati's Thinking Machines Lab. According to Bloomberg's sources, the deal terms haven't been finalized, and some sources said the round could close at 55 or even 60 billion. For those keeping track at home, that would be a very quick 4x from TML's $12 billion valuation from their fundraising round in July. 
The new valuation would catapult TML to become one of the most valuable private companies ever, less than a year from launch. For some quick comparisons, Stripe's most recent mark in secondary markets is around 106 billion, Databricks recently raised at 100 billion, and Canva reportedly marked up to 42 billion during a tender offer to employees in August. Now, it is true that TML is no longer a pre-product company with the release of their reinforcement learning platform Tinker last month, but they are still pre-revenue and haven't really established a clear business model or even a firm product niche. Sources said that Tinker is being used by several university research groups as well as some paying enterprise customers. But this valuation certainly isn't going to be based on anything like revenue forecasts. As with earlier rounds, it's a bet on talent, with TML boasting a stacked roster of some of the best AI researchers drawn from OpenAI, DeepMind, and other labs. Really, the only comp that truly makes sense is Ilya Sutskever's Safe Superintelligence, which is also a pre-product bet on talent. SSI established a $32 billion valuation in April. Moving over into product land, Google has added Deep Research to NotebookLM. Now, NotebookLM has already proven to be one of the most interesting and popular tools in AI, but until now the way to get the best results was pretty manual. Google says the addition of Deep Research will allow users to automate the process of putting together source documents, allowing NotebookLM to function more like an AI research assistant. Their example video showed a user simply typing in latest breakthroughs in quantum physics and setting the agent to work. Come back a few minutes later, and Notebook has an entire dossier ready to read or transform into a podcast or video slide deck. Speaking of video slide decks, NotebookLM has also introduced the ability to prompt custom styles for video overviews. 
They showed a variety of different styles like 8-bit pixelated art, pop art, and turn-of-the-century art nouveau. These are firmly in that category of app updates which aren't about some underlying model improvement, but about making a product simply more aligned with what its users need from it. Still, that wasn't Google's biggest launch of the day. DeepMind has released an agent called SIMA 2 as a research preview. SIMA, which stands for Scalable Instructable Multiworld Agent, was described by DeepMind CEO Demis Hassabis as a general agent that can understand and reason about complex instructions and complete tasks in simulated game worlds, even ones it has never seen before. He continued, incredible to see how it can just learn from self play. A crucial step towards AGI. Now, the first version of SIMA was released in March of 2024 and was fairly primitive. It learned to complete simple tasks by following instructions like turn left, climb the ladder, or open the map across a wide range of video games. It had a total of 600 different instructions it knew how to follow. The most interesting part about that result was that the agent could take what it learned from training conducted in one game and apply it to a game it had never seen before. Over DeepMind's total eval set, SIMA 1 had just a 31% success rate, and the rate plummeted to just a couple of percentage points on games it hadn't seen before. SIMA 2 has demonstrated a dramatic improvement in task completion. It has a 65% success rate across the eval set, which is starting to get pretty close to the human level of 76%. On games the agent hadn't seen before, it achieved around a 13% success rate. The ability to generalize across different environments is one of the reasons many researchers are looking to world models as one of the keys to AGI. 
DeepMind even tested how SIMA 2 would perform in entirely novel games that were generated on the fly by their Genie 3 world simulation model. SIMA 2 is able to orient itself, understand instructions, and take meaningful actions towards a goal despite never having seen the environment before. Super interesting, and firmly in this theme of alternative paths to AGI that we'll be increasingly spending time on. Lastly, a couple of quick follow-up notes on GPT-5.1. It is now available via the API, and OpenAI has also published a prompting guide to help developers migrate their use cases. The guidance actually reveals a lot about the design decisions made for this model update. For example, OpenAI suggested 5.1 has a tendency to be too verbose in providing an answer. They suggested it's worthwhile giving specific instructions about how much detail you want to be contained in the outputs. The guide also noted that the model is much more steerable than previous iterations, so developers can dial in very specific behaviors when it comes to agents. I'm continuing to have great early experiences with GPT-5.1, and I'm excited to see what you guys think of it. For now, though, that is going to do it for the headlines. Next up, the main episode. What if AI wasn't just a buzzword but a business imperative? On You Can with AI, we take you inside the boardrooms and strategy sessions of the world's most forward-thinking enterprises. Hosted by me, Nathaniel Whittemore, and powered by KPMG, this seven-part series delivers real-world insights from leaders who are scaling AI with purpose, from aligning culture and leadership to building trust, data readiness, and deploying AI agents. Whether you're a C-suite executive, strategist, or innovator, this podcast is your front-row seat to the future of enterprise AI. So go check it out at www.kpmg.us/aipodcasts, or search You Can with AI on Spotify, Apple Podcasts, or wherever you get your podcasts. 
This episode is brought to you by Blitzy, the enterprise autonomous software development platform with infinite code context. Blitzy uses thousands of specialized AI agents that think for hours to understand enterprise-scale code bases with millions of lines of code. Enterprise engineering leaders start every development sprint with the Blitzy platform, bringing in their development requirements. The Blitzy platform provides a plan, then generates and pre-compiles code for each task. Blitzy delivers 80% plus of the development work autonomously while providing a guide for the final 20% of human development work required to complete the sprint. Public companies are achieving a 5x engineering velocity increase when incorporating Blitzy as their pre-IDE development tool, pairing it with their coding copilot of choice to bring an AI-native SDLC into their org. Visit blitzy.com and press get a demo to learn how Blitzy transforms your SDLC from AI-assisted to AI-native. AI isn't a one-off project, it's a partnership that has to evolve as the technology does. Robots and Pencils works side by side with clients to bring practical AI into every phase: automation, personalization, decision support, and optimization. They prove what works through applied experimentation and build systems that amplify human potential. As an AWS Certified Partner with global delivery centers, Robots and Pencils combines reach with high-touch service. Where others hand off, they stay engaged, because partnership isn't a project plan, it's a commitment. As AI advances, so will their solutions. That's long-term value. Progress starts with the right partner. Start with Robots and Pencils at robotsandpencils.com/aidailybrief. Meet Rovo, your AI-powered teammate. Rovo unleashes the potential of your team with AI-powered search, chat, and agents, or build your own agent with Studio. 
Rovo is powered by your organization's knowledge and lives on Atlassian's trusted and secure platform, so it's always working in the context of your work. Connect Rovo to your favorite SaaS apps so no knowledge gets left behind. Rovo runs on the Teamwork Graph, Atlassian's intelligence layer that unifies data across all of your apps and delivers personalized AI insights from day one. Rovo is already built into Jira, Confluence, and Jira Service Management Standard, Premium, and Enterprise subscriptions. Know the feeling when AI turns from tool to teammate? If you Rovo, you know. Discover Rovo, your new AI teammate powered by Atlassian. Get started at rovo.com, that's R-O-V as in victory, O, dot com. Welcome back to the AI Daily Brief. One of the big news items to end out this week was that AI coding startup Cursor just raised a fresh $2.3 billion at a $29.3 billion valuation. Now, that sort of rarefied air is a valuation that so far has been exclusively for the model companies. And so what's interesting to me is not just to explore the fundraising in isolation, but as a representative example of how people are thinking about the battle between the application layer and the model layer. You might have seen this tweet floating around this week. It comes from investor and entrepreneur Yishan Wong and got 20 million views this week for what is ultimately sort of an inside baseball type of conversation. This is the foundation of our entire conversation, so let's read what he has to say and then break it down a little bit. Yishan writes, my AI investment thesis is that every AI application startup is likely to be crushed by rapid expansion of the foundational model providers. App functionality will be added to the foundational models' offerings, because the big players aren't slow incumbents. It is wrong to apply the analogy of fast startup, slow incumbent here. They're just big. Far more so than with any other prior new technology. 
There is a massive and fast-moving wave that obsoletes every new app almost as fast as it can be invented. There is almost no time to build a company and scale it, Wong continues. There are two ways AI application startup founders can make money. One, make a flash-in-the-pan app that generates a ton of cash and bank the cash. My estimate is that you have about 12 to 18 months of cash flow generation. Or two, make a good enough app that you get acquired by one of the big players for sufficient equity. The situation is highly unstable. We don't know if it's going to crash or go to the moon, but both scenarios make it very unlikely that any AI application startup will independently become a generational super company. The best odds are finding an application niche in a highly specialized field with extremely unique and specific data barriers, ideally ones related to real atoms, hardware, or world-related data, and not software and finance. So the key elements of the argument here are, one, that foundation model providers will eat the app layer. Basically, that we have to throw out our old heuristics around slow incumbents versus fast startups, because the incumbents here are driving disruption at extreme speed. The second point, however, which he gets into in a follow-up post on his own thread, is that the foundation is too unstable to build lasting app businesses. Yishan continues in a subsequent post, the entire novelty of this thesis is that, unlike in the past, specific elements of the AI industry are likely to make it so that application companies cannot outrun the wave of obsolescence, which will rush along far, far more quickly than prior technology waves. The foundational technology has not stabilized in any way whatsoever, and applications require a sufficiently stable foundation for some extended period of time in order to create value, and then a system for monetizing that value. 
The wholesale rate of change in the nature of the foundation is the reason why I think almost all application startups will not survive to achieve any significant scale, not because the current large players are special. So this is the nuance that it would be easy to lose in this conversation. What he's really talking about is a speed-of-change argument. He's effectively arguing that app startups will get overtaken by sea changes before they can become real businesses, and that it's not that the big labs are quote unquote better in any specific way, but that only they have enough internal stability and resources to survive the chaos that they themselves are creating. He concludes in his second post, sea changes are now happening on a 9 to 12 month cycle. Very few startups can turn into a mature business in that time frame. And by mature I mean having all the boring stuff like sales relationships and brand recognition. Yes, your engineers can make the change, but human hiring cycles and team solidification and market relations are incompressible. If you hire 100 people a month, your organization will implode. Thus, application companies never quite make it to a full business threshold before the sea change happens out from under them. When I say the incumbents will take the application space, I mean that they're the only ones who can provide enough internal stability and resources to survive the sea changes they themselves will be driving. Not that they're going to provide a superior product; they're just the ones who won't starve. So, like I said, this had 20 million views and generated a huge amount of conversation, both on the post itself and in other channels like LinkedIn. Let's talk first about the people who thought that Yishan was wrong in some fundamental way. Many of these themes can be bundled into the idea that vertical apps, workflows, or UX still matter hugely. 
David Roberts writes, I think you're underestimating how much unique UX, context engineering, integrations, human-in-the-loop, and embedded workflows need to exist for any vertical business application to actually get from 70% decent to 100% outcomes with AI. Vertical applications are going to be enormous, and they will not be eaten by the foundational model providers. Now, implicit in David's argument is that the stuff that it takes to make a vertical application, specifically a B2B application, work is so immense and complex that it's just not in the incentive of the foundation model companies to do that. And certainly this is a point that I resonate with, seeing how much last mile integration work it takes for a very powerful AI tool to be actually useful inside the context of a business. Now, Yishan actually responded to this one, saying, your reasoning here supports my thesis rather than undermining it. What I think he means is that there's going to be so much change so fast that the app layer companies aren't going to be able to survive long enough to do that sort of complex last mile work that David is talking about, ultimately leaving it only to the foundation model companies, even if they don't prioritize it in the short term. Aaron Levie from Box, who's one of the most thoughtful thinkers when it comes to enterprise AI, says, the counter dynamic to the AI model doing everything is that, at least in the enterprise, bridging the AI model's capabilities to the customer's environment still requires a tremendous amount of long tail work. The gap between an AI agent working for 90 or 95% of the solution and 100% is usually about 10x more work than most realize. So here you see Aaron reinforcing many of the themes from David's post. 
He continues, getting access to the enterprise data, connecting to the enterprise workflows, delivering the change management that employees need to adopt the technology, handling the regulatory and compliance requirements of that industry, and so on, all require some degree of highly dedicated focus in a domain. Others argue that Yishan might be underestimating the new types of moats that could be formed. Investor Natasha Malpani writes, I'd say the opposite. The real white space is at the application layer. Everyone wants to sell shovels, but the gold is in how people actually use them. The infra race is a knife fight between hyperscalers: OpenAI, Google, Anthropic, Meta, Amazon. They'll undercut each other on price, latency, context window, and token cost until margins collapse. Developer tooling looks safer, but it's crowding fast, and every improvement gets absorbed upstream by the foundation models or downstream by open source forks. Meanwhile, applications are where behavioral moats form. Data isn't the only barrier, habits are. Users don't live in APIs or eval dashboards. They live in experiences. Context, workflow, brand, and trust compound fast. Distribution and feedback loops create data advantages that scale locally even when models converge globally. You win if you own the feedback surface to capture every edit, action, and intent; build domain depth and embed in daily workflows; collect proprietary exhaust, behavior and telemetry that the model providers will never see. Some infra will break through, security, evals, low-latency edge, compliance, but the broader white space is still at the application layer, where people, agents, and systems actually interact. Go deep enough that a foundation model can't care, and sticky enough that users won't leave even when it can. Now, again, I really want to double click on this foundation model can't care piece. 
A huge amount of the work that is required right now for AI applications to work inside enterprises is work that foundation models do not have the luxury of caring about. It is simply too much complex, boring, repetitive, but still customized-to-the-customer work. Which is why, outside of the foundation model companies, the firms that have done the best from the AI boom are the big systems integrators and consulting firms. The fact that the foundation model companies have to compete on other vectors creates a window of opportunity for a different category of company to swoop in and do the work that it takes to actually bring these solutions to market in practice. Now, the other point from Natasha that I want to really double click on is this idea of proprietary exhaust. For those of you who don't live in Silicon Valley jargon, that paragraph might have seemed really dense. Let's read it again. You win if you own the feedback surface to capture every edit, action, and intent; build domain depth, embed in daily workflows; collect proprietary exhaust, i.e., behavior and telemetry that the model providers will never see. Exhaust is the data that comes out of the usage of a product. And many of the folks that are most excited about the application layer when it comes to AI have a thesis that, when it comes to improving model performance, this type of behavioral exhaust is the real gold, because it's the only thing that's not commoditized to everyone else. In other words, the foundation model companies all have access to more or less the exact same training data, or some version of the same training data. But a company that gets enough usage can create a feedback loop where they actually see how people are interacting with the models, and that data stream can be used to refine how the model, and also the experience that the model lives in, works. This is going to be particularly relevant to our example of Cursor, which we'll come to in a moment. 
Still, even with all of these arguments for why Yishan's thesis might be wrong or at least limited, there's a big overlap in the Venn diagram between these two camps that I think would acknowledge that many AI apps are just flimsy wrappers, and that the real winners are likely to be the deep autonomous systems. Jacques Reynolds writes, most new AI apps aren't defensible, they're just UI wrappers on top of someone else's model. The moat disappears the moment OpenAI or Anthropic ships the same feature natively. The real upside isn't in building another AI app, in my opinion. I think it's in implementing AI inside existing business workflows, where data, context, and customer relationships create real barriers. Chong Call builds this thesis out even further. He writes, the issue isn't that foundational models will kill application startups. It's that most AI applications today aren't really applications, they're shallow automations built to impress investors on a six month timeframe. He basically makes a comparison to early SaaS and says today the same story is repeating with AI agents: duct tape workflows, zero defensibility, no reliability at scale. But the core question hasn't changed: who's building a system that delivers real value repeatably, reliably, and autonomously? So the implication of this is that if you are building an application, you have to build it deep. You have to be hands on. You have to be in a position to actually capture that behavioral exhaust data. Nowfall writes, I think even if a new application starts on this constantly evolving base, it can endure if it embeds itself in existing workflows, writes to proprietary systems of record, builds proprietary data and learns from usage, and/or captures distribution before incumbents bundle the feature. More importantly, AI wrappers that continue to swiftly ship features that solve users' needs, even as competition arrives, are difficult to compete with, even for the foundation models. 
And so again, I think you're starting to see the through line here, one that acknowledges the incredible speed at which things are changing, the new challenges that creates for the app layer, and the innovation capability of the big foundation model companies, but still sees a core path for some number of extremely high-performing application layer companies. And indeed, a lot of the responses were about what it takes to be one of these actually successful application layer companies. Sarah Catanzaro writes, my AI investment thesis is that AI application startups will need to solve research and engineering problems that the labs are not currently focused on, thereby accumulating more technical defensibility. At times their objectives may even diverge. We already see this in creative industries, where post-training alignment impedes the ability of models to produce diverse outputs. It will be hard to survive, since the app companies will also need to define compelling workflows and user experiences. But with the right team and support, some, but not all, will make it. a16z's Anish Acharya writes about a few approaches that he thinks advantage app layer startups. The first are categories that benefit from being multi-model, basically where the experience for the end customer is better if they can access models from different providers. Then there are cornered resources, those locked proprietary data sets and ecosystems that, quote, imply a ton of feature surface area. He gives the example of Granola. Sure, you can replicate Granola's recorder, but is OpenAI really going to build the entire ecosystem of productivity apps implied by it? Now, regardless of what we all think about this, the reality is that money is still pouring in. The Information, for example, recently published a piece called Investors Chase Neolabs to Outflank OpenAI and Anthropic. 
They point out that over the last month those investors have made or discussed two and a half billion dollars of investments into just five startups. The Information writes, the neolab startups' founders say they hope to exploit new approaches to developing AI models, approaches they say major developers like OpenAI and Anthropic may have overlooked. And that brings us to the Cursor part of the story. Now, Cursor is of course one of the big breakout leaders of the last year. When the story of 2025 is written, AI coding will be at the very top of the narratives, and one need look no further than the valuation jumps of Cursor to see just how big a deal investors, at least, are treating that whole theme as. The company has raised $2.3 billion in a new round that values them at $29.3 billion. That is close to triple their $9.9 billion valuation from their Series C in June and a 12x compared to their valuation from the beginning of the year. In addition to the funding, Cursor also announced that they've reached $1 billion in ARR and that they now produce more code than any other coding agent. Yuchen Jin did the research and commented, Cursor is almost certainly the fastest company in history to reach a billion dollars in ARR, achieving this milestone in a little over two years. He added, and let's see if you can spot the connection to our broader theme today, people said Cursor would go to zero because it's just a wrapper. AI products won't be monopolized by model labs, in my opinion. One, products win by delivering real user value. Model capability alone isn't enough. Two, once they hit product market fit, companies can train their own models, often based on open source models combined with their own unique data and RL environments. Cursor's Composer 1 is an example. Now, Composer, which is Cursor's proprietary model, seems central to their business strategy moving forward. They said that they intend to use this fresh capital to invest further in developing Composer. 
The Wall Street Journal framed this raise, in fact, as a test case to see if app layer startups can transition away from relying on the foundation model companies. They noted that both OpenAI and Anthropic are now directly competing with Cursor. When asked about this, Cursor CEO and co-founder Michael Truell gave a diplomatic response, stating, we're excited to be one of the first examples of a large company built on their platforms. All of the AI labs are important partners to us. But clearly Composer, their unique model, is top of mind. Truell said, it does take significant resources, both specialized talent and also GPUs, to do something at Composer scale. This funding lets us do it in a big way. Cursor also showed just how much the model environment is changing. Back in April, the most popular models on Cursor were Claude 3.7 Sonnet, Gemini 2.5 Pro, Claude 3.5 Sonnet, and then in fourth and fifth place, GPT-4.1 and GPT-4o. The fastest growing in April were o3, o4-mini, and DeepSeek V3.1. Today, the most popular models are, in first place, Sonnet 4.5, in second place, Composer 1, and then after that GPT-5, GPT-5 Codex, and Sonnet 4. The fastest growing, however, is Composer 1. All of which brings us to an interesting point about where this Venn diagram between the app layer and the model layer overlaps, which is: at some point, do the handful of app layer companies that can break through and reach the scale to survive just become model companies themselves? That certainly seems to be part of the direction here with Cursor, and I think it will be an interesting thing to watch. Anyways, it's a fascinating discussion, and I think if you take away anything, it just shows that right now things are changing so fast that even the people whose entire job it is to watch and understand and allocate against these movements don't really have any idea what's happening. We are all just students, with the very fast spinning world our teacher for now. 
That's going to do it for today's AI Daily Brief. Appreciate you listening or watching, as always. Until next time, peace.
Host: Nathaniel Whittemore (NLW)
Date: November 14, 2025
This episode dives deep into one of AI's most pressing strategic debates: Are application-layer (app) startups doomed to be "crushed" or rendered obsolete by the rapid evolution and vertical integration of foundational AI model providers? Using the massive $2.3 billion fundraising round for Cursor, an AI coding platform, as a springboard, NLW explores the ongoing tension and interplay between the app layer and the foundation model layer in the AI value chain. The episode analyzes arguments from both sides of the tech and VC world, examines industry developments, and highlights what it means for startups, investors, and users.
Closing reflection:
The episode underscores the extraordinary pace and uncertainty in AI: While model providers are consolidating influence and moving quickly to absorb app functionality, there are strong arguments and some examples (like Cursor) for applications that embed deeply enough—leveraging proprietary workflow data and user experience—to resist rapid commoditization. However, the lines increasingly blur, and successful app-layer companies may evolve into foundational model companies themselves. As NLW concludes, even experts are just trying to keep up with the unprecedented speed of change.