
Today on the AI Daily Brief, a new measure for how much work AI can actually automate. Before that in the headlines, Amazon's cloud group outperforms. The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI.
All right friends, quick announcements before we dive in. First of all, thank you to today's sponsors, Robots and Pencils, Blitzy, KPMG and Airia. To get an ad-free version of the show, go to patreon.com/aidailybrief or sign up on Apple Podcasts. To learn about sponsoring the show, shoot us a note at sponsors@aidailybrief.ai. To learn about speaking and jobs, go to aidailybrief.ai, and to participate in our ROI Benchmarking Survey and get access to a report on how much value everyone is getting out of the now hundreds and hopefully thousands of different use cases that people have contributed, go to roisurvey.ai. And now, with all those AIs out of the way, let's get to the show.
Welcome back to the AI Daily Brief Headlines edition, all the daily AI news you need in around five minutes. We kick off today with an upside surprise from Amazon. Now, I think at this point it's fair to say that 2025 has been a bit of a rough one for Amazon. Alexa finally made it to market, but it was met with pretty underwhelming reviews. Google appeared to swoop in on their big Anthropic partnership with a 10-figure deal that highlighted Google's custom silicon. While there was initial excitement around Amazon Nova, that buzz has petered out pretty significantly. And of course, for the past week they have become the poster boy for AI layoffs. Turns out that is very much not the full story. AWS posted quarterly revenues of $33 billion, up 20% from last year and recording its highest growth rate since 2022. Analysts had only forecast 18% growth, and there's nothing analysts love more than underestimating a fast-growing tech stock.
E-commerce sales were also a slight outperformance, coming in at $180 billion for 13% growth. As we saw with Google's earnings on Wednesday, ROI justifies capex, and Amazon is the current leader in AI spending. They bumped up the forecast for this year's capex to $125 billion, exceeding the analyst forecast of $119 billion. That's up 55% year over year, with the expectation it will continue increasing next year as well. CEO Andy Jassy said, "We continue to see strong demand in AI and core infrastructure, and we've been focused on accelerating capacity, adding more than 3.8 gigawatts in the past 12 months." He also had some fighting words for the other hyperscalers, commenting, "Percentage growth is a relative term. It's very different having 20% year-over-year growth on a $132 billion annualized run rate than to have a higher percentage growth rate on a meaningfully smaller annual revenue, which is the case with our competitors." The markets like the spunk and they like the news, and the stock was up 13% in after-hours trading, making Amazon, surprisingly, the best performer during this big week of tech earnings.
Now, one important note that came from the earnings call was Jassy disavowing the notion that Amazon's 14,000 white-collar layoffs had anything to do with AI. He told investors the layoffs were, quote, "not really financially driven. And it's not even really AI driven. Not right now. It's culture." He explained that Amazon had added a ton of headcount, a lot of locations and a lot of business lines over recent years. So, he said, "you end up with a lot more people than you had before and you end up with a lot more layers. Sometimes without realizing it, you can weaken the ownership of the people that you have who are doing the actual work." He continued, "It can lead to slowing you down. As a leadership team, we are committed to operating like the world's largest startup, and that means removing layers."
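Jassy's point that percentage growth is relative comes down to simple arithmetic. The Python sketch below assumes the $132 billion figure is the current annualized run rate after 20% growth, so last year's base was $132B / 1.2. The competitor's 40% growth on a $70 billion run rate is a hypothetical number chosen for illustration, not a figure from the earnings call.

```python
def added_revenue(current_run_rate: float, yoy_growth: float) -> float:
    """Dollars of annualized revenue added over the past year.

    Assumes revenue grew by `yoy_growth` (0.20 == 20%) to reach
    `current_run_rate`, so last year's run rate was current / (1 + growth).
    """
    prior_run_rate = current_run_rate / (1 + yoy_growth)
    return current_run_rate - prior_run_rate


# AWS figures from the episode: 20% growth on a $132B annualized run rate.
aws_added = added_revenue(132.0, 0.20)

# Hypothetical competitor (not from the episode): 40% growth on a $70B run rate.
rival_added = added_revenue(70.0, 0.40)

print(f"AWS added ~${aws_added:.1f}B, hypothetical rival added ~${rival_added:.1f}B")
```

Under those assumptions, AWS added about $22 billion of annualized revenue while the hypothetical faster-growing rival added about $20 billion, which is the distinction Jassy is drawing.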
Now, AI certainly has a part of this story, but it's a much more nuanced part than it's been reported. I think it's clear that they're positioning for a future where you don't need as many people to do the same amount of work. At the same time, it's very clearly not AI replacing people right now. In fact, if you just have to pin it down to one thing, Jassy is basically saying this is a reversal of overhiring after the pandemic. Pretty much. Period. Full stop.
Now, one of the sub-stories in the way that analysts talk about this that's so funny to me is that they're just having such a hard time understanding that there might be a change in how companies want to spend their money. Jack Farley, who hosts the great Monetary Matters podcast if you're interested in macro, writes, "Very strong quarter from Amazon, no doubt. But at the same time, Amazon's free cash flow is collapsing. AI capex is consuming so much capital, but also they're spending free cash flows rather than raising debt." Is it such a foreign concept right now that there might be a better way to invest profits to get a better return than buybacks? I don't know, man. For my money, companies investing in a future that they're very convinced of seems a lot better than just artificially juicing their stock price. But what do I know?
Now, speaking of funding structures, while Amazon might be using free cash flow, data center funding markets are on fire as Meta closes a record corporate bond deal. Meta closed a $30 billion bond sale this week to fund their data center build-out. The deal was the largest high-grade corporate debt issuance this year, sources said. It attracted orders for $125 billion, showing massively outsized demand from investors to fund AI capex. According to Bloomberg data, this is the largest demand ever for a corporate bond sale. Now, these bonds aren't paying wildly excessive returns.
They range between 5 and 40 years of maturity and are expected to pay 1.4 percentage points over Treasury yields for the longer-dated instruments. Interestingly, this sale was reported shortly after Meta's earnings report, which saw the stock tumble on fears of ramping capex with no clear connection to ROI. Turns out that the bond market has a much longer-term horizon. And again, if you are trying to be a nuanced watcher of AI bubbles, keep an eye on whether we start to see a shift in how much demand there is for corporate debt around this stuff. If all of a sudden we see these sorts of bond sales starting to scrape the bottom of the barrel, that's when things might be looking a little tricky.
Next up, the latest in the layoff theme. YouTube has offered staff voluntary buyouts as they restructure their organization around, you guessed it, AI. Tech reporter Alex Heath broke the news on his Sources blog, obtaining a copy of an internal memo from YouTube CEO Neal Mohan. Mohan wrote, "Looking to the future, the next frontier for YouTube is AI, which has the potential to transform every part of the platform. We need to set ourselves up to make the most of this opportunity." He referred to this as the first update to YouTube's core leadership structure in a decade. The structure will shift three product groups to report directly to Mohan, along with additional reshuffling of divisions. YouTube said that no layoffs would take place as part of the shake-up. Instead, US-based staff who are, quote, "ready for a new challenge" will be offered severance as part of a voluntary exit program. While it would be very easy to be cynical about the corporate speak here, I think it's actually part and parcel of the same story we saw with Jassy's comments about Amazon layoffs: when we say AI layoffs, we might be talking about a couple of very different things. On the one hand, there are straight-up "we don't need you anymore because AI can do your job" layoffs.
On the other hand, there are "we anticipate that 10 years down the line, because of a new digital hybridized workforce that includes both agents and humans, we're going to need fewer people than we do now for the current set of activities" restructurings. That doesn't mean that we won't need more people for new things that haven't come around yet, but for the stuff that we offer now, we need to start moving towards a different organizational structure. I think we're going to see a lot of these types of preemptive and proactive restructurings over the course of the coming year.
A little bit of investment news: Nvidia plans to invest a billion dollars into AI coding company Poolside. Poolside is primarily focused on building foundation models that are used for programming, rather than IDEs and applications to support the use case. They released their first product last year after being founded in 2023 by former GitHub CTO Jason Warner. Bloomberg reports that they're currently seeking $2 billion in fundraising that would quadruple the company's valuation to $12 billion. Nvidia already invested $500 million into Poolside during their last fundraising round in October 2024. Poolside plans to use a portion of their fresh capital to fund the purchase of Nvidia GB300 chips, according to sources. And for Poolside, this fundraising lines up with plans to build out their own compute ambitions. Earlier this month, Bloomberg reported on a deal with CoreWeave to build a 2-gigawatt capacity data center in West Texas to support their model training and inference needs.
On the feature side of the house, Canva has added new AI tools in a push to keep up with the rapid evolution of design. The new tools leverage Gen AI to create and edit posters, short videos and presentations using natural language prompts. The release comes just a few months after a major product overhaul in April, and notably within days of Adobe unveiling their big AI update.
The product update seeks to reframe Canva as an AI-powered creative operating system, which integrates AI into every layer of content creation. Essentially, they're looking to move away from template-based creation into a more freeform workflow driven by Gen AI. One example of how this might work is their approach to semi-automated brand advertising. Co-founder Cameron Adams said, "Canva automatically scans your website, figures out who your audience is, what assets you use to promote your products, the message it needs to send out, the formats you want to send it out in, makes the creative for you, and you can deploy it directly to the platform without having to leave Canva." This, I think, is part and parcel of the productization era of AI that we are quickly moving into. Canva's updates are following the theme closely, with a real emphasis on integrating existing AI capabilities into full product suites that allow people to get bigger chunks of work done. That's a theme that we are going to very much keep an eye on, but for now, that's going to do it for today's headlines. Next up, the main episode.
Small, nimble teams beat bloated consulting every time. Robots and Pencils partners with organizations on intelligent, cloud-native systems powered by AI. They uncover human needs, design AI solutions and cut through complexity to deliver meaningful impact without the layers of bureaucracy. As an AWS Certified Partner, Robots and Pencils combines the reach of a large firm with the focus of a trusted partner. With teams across the US, Canada, Europe and Latin America, clients gain local expertise and global scale. As AI evolves, they ensure you keep pace with change, and that means faster results, measurable outcomes, and a partnership built to last. The right partner makes progress inevitable.
Partner with Robots and Pencils at robotsandpencils.com/aidailybrief. This episode is brought to you by Blitzy, the enterprise autonomous software development platform with infinite code context. Blitzy uses thousands of specialized AI agents that think for hours to understand enterprise-scale code bases with millions of lines of code. Enterprise engineering leaders start every development sprint with the Blitzy platform, bringing in their development requirements. The Blitzy platform provides a plan, then generates and pre-compiles code for each task. Blitzy delivers 80%-plus of the development work autonomously while providing a guide for the final 20% of human development work required to complete the sprint. Public companies are achieving a 5x engineering velocity increase when incorporating Blitzy as their pre-IDE development tool, pairing it with their coding copilot of choice. To bring an AI-native SDLC into your org, visit blitzy.com and press Get a Demo to learn how Blitzy transforms your SDLC from AI-assisted to AI-native.
What if AI wasn't just a buzzword but a business imperative? On You Can With AI, we take you inside the boardrooms and strategy sessions of the world's most forward-thinking enterprises. Hosted by me, Nathaniel Whittemore, and powered by KPMG, this seven-part series delivers real-world insights from leaders who are scaling AI with purpose, from aligning culture and leadership to building trust, data readiness and deploying AI agents. Whether you're a C-suite executive, strategist or innovator, this podcast is your front-row seat to the future of enterprise AI. So go check it out at www.kpmg.us/aipodcasts or search You Can With AI on Spotify, Apple Podcasts, or wherever you get your podcasts.
There's a reason most enterprise AI initiatives never make it to production: you can't find a platform that's both powerful and secure enough. The result? AI budgets burn with zero business impact. But not anymore.
Airia is the enterprise AI platform that delivers speed without compromise. Unlike other platforms that force you to choose between fast deployment or secure operations, Airia brings speed and security together. Launch AI quickly without cutting corners on compliance. Scale rapidly without sacrificing governance. Move at the speed of business without moving past your security requirements. Fortune 500 companies across finance, healthcare, retail, legal and more choose Airia because they deliver what seemed impossible: enterprise AI that's fast enough to beat the competition and secure enough to protect your most sensitive data. Ready for AI at full speed with zero compromise? Visit airia.com to see the platform in action. That's airia.com. Simplify enterprise AI.
Welcome back to the Daily Brief. Today we are talking about an interesting new way of measuring agent performance. A big theme that we've talked about throughout this year is the need for better benchmarks and evaluations. Right now, the benchmarks and evaluation suites that we get alongside new models are highly academic. They're not really connected to the real world. And that, I think, is part of why people are so hungry for alternatives like the METR scale that tests the ability of AI to complete tasks of different durations. And yet even metrics like this one are still pretty theoretical and academic. For example, the default METR scale measures the length of task a model can complete at a 50% success rate or an 80% success rate. But in the real world, an 80% success rate, let alone 50%, is probably not going to pass muster. Now, of course, this is all the more important as the narrative of AI layoffs comes screaming to the forefront. This fear is based on AI being able to do the work of lots of people, and so whether it actually can is a pretty important question. Now, recently there have been some new efforts explicitly designed to move performance measurement to the real world.
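To make the 50% versus 80% distinction concrete: as we understand METR's approach, it fits a logistic curve of success probability against the (log) time a task takes a human, then reads off the task length at which predicted success hits a chosen threshold. The sketch below assumes that functional form; the coefficients `A` and `B` are made up for illustration and do not come from any published fit.

```python
import math


def success_probability(minutes: float, a: float, b: float) -> float:
    """Assumed logistic model: success falls as log2(task duration) grows."""
    return 1.0 / (1.0 + math.exp(-(a - b * math.log2(minutes))))


def time_horizon(p: float, a: float, b: float) -> float:
    """Task duration (minutes) at which predicted success equals p.

    Solves a - b * log2(t) = logit(p) for t.
    """
    logit_p = math.log(p / (1.0 - p))
    return 2.0 ** ((a - logit_p) / b)


# Hypothetical fitted coefficients, for illustration only.
A, B = 6.0, 1.0

h50 = time_horizon(0.5, A, B)  # logit(0.5) = 0, so 2**6 = 64 minutes
h80 = time_horizon(0.8, A, B)  # stricter bar, shorter horizon

print(f"50% horizon: {h50:.0f} min, 80% horizon: {h80:.1f} min")
```

Raising the bar from 50% to 80% shrinks the horizon (here from 64 minutes to roughly 24), which is why a headline time horizon can overstate what a model can be trusted to finish in practice.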
You might remember back at the end of September, OpenAI introduced GDPval. The idea was to create a metric that measured model performance on economically valuable, real-world tasks. The way that GDPval works is they took 44 occupations selected from the top nine industries that contribute to US GDP and, working with experienced professionals in each of those areas, cut that up into 1,320 specialized tasks. They then put the output of the AI through up to five rounds of expert review, including, as they put it, checks from other task writers, additional occupational reviewers, and model-based validation. When this was launched, I was incredibly excited about the move from evals into the real world. And so I sat up and took notice when Dan Hendrycks, the director of the Center for AI Safety, tweeted out a new project called the Remote Labor Index that was explicitly, once again, about testing AI in the real world.
So what is the Remote Labor Index? The creators say that the goal is to create a standardized, empirical measurement of AI's capability to automate practical, remote, computer-based work. Now, importantly, the tasks that they tested on are not theoretical or academic. They were real-world projects sourced directly from online freelance platforms, primarily Upwork. Like OpenAI with GDPval, the group behind the Remote Labor Index, or RLI, started by engaging directly with a set of experienced Upwork freelancers, 358 to be exact. The average freelancer in this pool had over 2,300 hours worked and over $23,000 in platform earnings. That may seem small, but a lot of these folks are international and a lot of these jobs are 10 or 15 bucks at a time. The researchers paid the professionals to provide samples of their past completed work, with the goal being that the benchmark is grounded in actual economic transactions.
The thing that they were measuring against was projects that real clients defined, commissioned and paid for. Starting from a pool of 550 potential projects, they ran them through an extensive filtering and cleaning process, removing anything that required physical labor, anything that couldn't be evaluated, or anything that had client interaction, like tutoring. The final RLI data set consisted of 240 unique, high-quality projects, with each project containing: the brief, which is the original text document describing the work to be done; the input files, all the assets, data or documents needed to start (think a voiceover file, a PDF design, a spreadsheet, et cetera); and the human deliverable, the final gold-standard product that the human professional delivered and got paid for. That, of course, is what the AI's work would be judged against. To get a little bit more of a sense of the projects involved, it took human professionals a mean of 28.9 hours to complete each project, with a median of 11.5 hours, meaning a handful of much longer projects pull the average up and there was fairly big variety in difficulty. The average project cost $632, with a median of $200, and the projects span 23 different categories, with the top being video and animation at 13%, 3D modeling and CAD at 12%, graphic design at 11%, game development at 10%, architecture at 7% and audio at 10%.
And now for the results. In short, the current state-of-the-art AI agents perform near the floor. The top performer was Manus, achieving an automation rate of just 2.5%. What that means is that in a head-to-head comparison, a human evaluator decided the AI's deliverable was at least as good as the human deliverable, and would be accepted by a reasonable client, only 2.5% of the time. Grok 4 and Sonnet 4.5 both had an automation rate of 2.1%, GPT-5 had 1.7%, ChatGPT agent had 1.3%, and Gemini 2.5 Pro had just a 0.8% automation rate. So why did the projects tend to fail?
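Before turning to that question, here is the automation-rate definition described above as a minimal Python sketch. It assumes a single accept/reject verdict per project (the AI deliverable judged at least as good as the human one); the actual RLI protocol may aggregate several evaluators differently, and the verdict data below is hypothetical. For scale, 2.5% of 240 projects is 6 projects.

```python
from dataclasses import dataclass


@dataclass
class Verdict:
    project_id: int
    # True if an evaluator judged the AI deliverable at least as good as the
    # human deliverable, i.e. acceptable to a reasonable client.
    ai_accepted: bool


def automation_rate(verdicts: list[Verdict]) -> float:
    """Share of projects where the AI deliverable was accepted."""
    if not verdicts:
        raise ValueError("no verdicts to score")
    accepted = sum(v.ai_accepted for v in verdicts)
    return accepted / len(verdicts)


# Hypothetical data: 240 projects, 6 of them accepted.
verdicts = [Verdict(project_id=i, ai_accepted=(i < 6)) for i in range(240)]
print(f"automation rate: {automation_rate(verdicts):.1%}")
```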
When an AI's deliverable was rejected, it was typically for one or more of the following reasons. 45.6% were rejected for poor quality; in other words, the work was technically done, but it wasn't professionally acceptable. They said that could be anything from childlike drawings to robotic and unnatural voiceovers. For another 35.7% of rejections, it was about incompleteness, where the agent simply failed to finish the job. 17.6% failed for technical and file issues, the AI producing corrupt or empty files or delivering the work in an unusable format, and 14.8% had inconsistencies where, for example, the deliverable lacked internal logic. An example of this would be an architecture project where a house's appearance changes completely across different 3D views. Now, if you're doing the math and quickly realizing that this adds up to more than 100%, that's because certain projects failed for more than one issue.
So this is pretty rough, right? Maybe lower than some people would have thought, but I think that there are a bunch of caveats, and the paper makes some of them as well. First of all, there were areas where agents did much better, specifically audio- and image-related work, along with writing and data retrieval or web scraping. This isn't surprising given how ubiquitous those types of use cases are with the assistants that we have. The next important caveat is that they saw progress even though the overall rate of automation was low. They used a second metric, an Elo, or relative performance score, to have an additional layer of analysis. They write, "For each project, a deliverable from two different AIs is presented to human evaluators who judge which deliverable is closer to completing the project successfully. If both agents successfully complete the project, then their deliverables are compared on overall quality." And looking at that, they found that across all projects, AI agents are steadily improving, even if still far below the human baseline.
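The pairwise comparison described above is the kind of data an Elo rating is built from. The sketch below uses the classic chess-style Elo update (K = 32, ratings starting at 1000); the paper's exact rating procedure and parameters are not described here, so treat this as an illustration of the general technique, with hypothetical agent names and judgments.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Probability that A beats B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def update(ratings: dict[str, float], winner: str, loser: str, k: float = 32.0) -> None:
    """Apply one pairwise judgment: the evaluator preferred `winner`'s deliverable."""
    exp_win = expected_score(ratings[winner], ratings[loser])
    delta = k * (1.0 - exp_win)  # bigger reward for an unexpected win
    ratings[winner] += delta
    ratings[loser] -= delta


# Hypothetical agents and judgments, all starting at 1000.
ratings = {"agent_a": 1000.0, "agent_b": 1000.0, "agent_c": 1000.0}
judgments = [("agent_a", "agent_b"), ("agent_a", "agent_c"), ("agent_b", "agent_c")]
for winner, loser in judgments:
    update(ratings, winner, loser)

for name, r in sorted(ratings.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {r:.1f}")
```

Because every update moves points from the loser to the winner, the ratings only say which agents beat which, which is why a relative score like this can show improvement even while absolute automation rates stay near the floor.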
Another thing that I think is fairly positive is the degree of completeness. Frankly, I think I would have expected a higher incompleteness rate than 35.7%. It feels like a fairly significant Rubicon to be able to get the job completely done even if it's not good enough. And the other really important thing to note is that this test is not about judging task completion. It is instead about judging an agent's ability to do an entire work stream that a client has defined as a project, that might have multiple steps, and which is actually worth paying for to them. The paper's authors are very explicit about the choice that they're making here. In other words, it's different from GDPval: they want a metric of full automation. It's really important for us to remember, as we're discussing the spectrum of autonomy and automation, that it is not always a priori going to be the case that the goal we have for AI is full automation. There is a whole conversation happening about this right now in the agentic coding world, where the folks who are building those systems are trying to figure out how much to prioritize totally autonomous background work versus speeding up code assistance. I think it's a great attempt, though, frankly, to have something that is exclusively concerned with full automation. I think that the authors of the paper are right to identify that we need metrics that actually give us a sense of how fast that level of automation is evolving, outside of what is going to be increasingly loud political noise. To some, this means that we should be a little bit calmer when it comes to our doomsday prognostications. Rio Longacre writes, "Reminder: AI in its current incarnation is great at automating specific tasks, not entire jobs. Anyone who tells you otherwise is either deluded or lying. Obviously this could change over time, but the current hyperventilating about mass layoffs and a permanent underclass are hyperbolic."
And what's interesting is that now that this is out, people are using it to ask if there are other important ways to measure automation as well. Amit on Twitter writes, "2.5% automation rate is actually really high for general-purpose AI. Consider this: these are random, economically valuable projects spanning multiple domains. No fine-tuning, no human in the loop, just raw agent capability." Berkhof implies that if you got into very specific areas like software engineering, the automation rate would likely be much, much higher. And so perhaps in addition to this base, we're going to need sector-specific measures and tests. Look, overall, I am here for it. I think the more that we can measure and understand what's actually going on and how we should be benchmarking the performance of our AIs, the better. Requisite shout-out to our AI ROI benchmarking study, which is live at roisurvey.ai. It's exactly why we have this project live: we want to help provide some of that information. And I really appreciate the group at the Center for AI Safety and their partners for doing that with the Remote Labor Index. For now, that's going to do it for today's AI Daily Brief. Appreciate you listening or watching, as always. And until next time, peace.
Host: Nathaniel Whittemore (NLW)
Date: November 1, 2025
In this episode, Nathaniel Whittemore dives into a new empirical measurement of how much real-world work AI agents can fully automate. Using the "Remote Labor Index" study as a focal point, the episode provides insights into current automation capabilities, industry benchmarks, and the evolving narrative around AI-driven job disruption. The discussion is structured around these themes: the practical limits of current AI automation, the nuance of layoff narratives, and the importance of grounded measurements over academic benchmarks.
"It's not even really AI driven. Not right now. It's culture." (03:58)
"It's very clearly not AI replacing people right now... this is a reversal of overhiring after the pandemic. Pretty much. Period. Full stop." (05:32)
"Looking to the future, the next frontier for YouTube is AI, which has the potential to transform every part of the platform. We need to set ourselves up to make the most of this opportunity." (08:55)
Nvidia & Poolside: Nvidia to invest $1B in Poolside (AI coding platform founded by ex-GitHub CTO), fueling both its valuation and infrastructure expansion (new data center in Texas).
Canva Product Evolution: Canva launches new Gen AI features for design tasks, marking industry-wide movement toward productized, end-to-end workflow automation:
"They're looking to move away from template-based creation into a more freeform workflow driven by Gen AI." (11:55)
Striking Result:
"In short, the current state of the art AI agents perform near the floor... the top performer was Manus, achieving an automation rate of just 2.5%." (21:06)
Reasons for Failure:
"When an AI's deliverable was rejected, it was typically for one of the following: poor quality... incompleteness... technical and file issues... inconsistencies..." (23:50)
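Some Areas Show Promise: Agents did much better on audio- and image-related work, writing, and data retrieval or web scraping.
Relative Performance Is Improving: A second, Elo-style metric based on pairwise comparisons shows AI agents steadily improving across all projects, even if still far below the human baseline.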
Task vs. Full Job Automation:
"This test is not about judging task completion. It is instead about judging an agent's ability to do an entire work stream that a client has defined as a project that might have multiple steps..." (27:30)
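Comparison to GDPval: GDPval breaks occupations into specialized tasks; RLI instead measures full automation of entire client-defined projects.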
Grounds for Cautious Optimism/Calm:
"Reminder: AI in its current incarnation is great at automating specific tasks, not entire jobs. Anyone who tells you otherwise is either deluded or lying... current hyperventilating about mass layoffs and a permanent underclass are hyperbolic." (30:04)
But the Bar May Be High:
“If you got into very specific areas like software engineering, the automation rate would likely be much, much higher.”
On Amazon’s AI Layoffs Narrative:
"AI certainly has a part of this story, but it's a much more nuanced part than it's been reported." (05:26)
On RLI’s Real-World Roots:
"The thing that they were measuring against was projects that real clients defined, commissioned and paid for." (17:25)
On the Limits of Current Automation:
"Maybe lower than some people would have thought, but I think that there are a bunch of caveats and the paper makes some of them as well." (24:47)
On Measuring Real Progress:
"I think the more that we can measure and understand what's actually going on and how we should be benchmarking the performance of our AIs, the better." (31:41)
Nathaniel concludes that while AI is accelerating, claims of mass job automation are greatly overstated, at least for now. Full replacement of skilled human labor remains rare; however, measuring precisely and across domains is vital for tracking progress. He invites listeners to pay attention to evolving benchmarks like the RLI and contribute to their own survey (roisurvey.ai) for a clearer view of AI's actual economic impact.
"For now, that's going to do it for today's AI Daily brief. Appreciate you listening or watching as always. And until next time. Peace." (32:55)