Transcript
A (0:00)
By every standard AI metric, Apple is losing. They're not selling tokens, they're not building data centers, they're not doing advanced research. And yet their hardware is the machinery on which the most advanced AI users on the planet are actually running their day-to-day. So what is this race that Apple is actually running? As one of the big tech giants, Apple has been notable for its absence in the AI race. I was one of many analysts who said, God, Apple Intelligence is a disaster. What's happening with Siri? Apple specialists like John Gruber wrote a year or so ago that Apple's credibility had been damaged and squandered by their presentation at the Worldwide Developers Conference, where he called the demo a concept video. Casey Newton said Apple had begun to lose the plot. And Ben Thompson, who runs the newsletter Stratechery, said Apple's nowhere near the cutting edge. I was part of that chorus of voices as well: somewhat disappointed, and perhaps blind, oblivious to the fact that every single day when I was hammering away at ChatGPT or at Claude, I was doing it through an Apple device. In fact, even as I switched from model to model to model, the device I used did not change. So there was something I missed that was right in front of me. But recently something interesting has been happening. When I went off and started to play with the OpenClaw agents earlier this year, I was running them on a small Mac Mini that I have in my equipment cabinet. Within a week or so I was hammering it so hard that Roon, the audio transport I use around the house, and our CCTV cameras were really no longer working, partly because the OpenClaw agent was demanding so much of the system's resources. So I went off and bought a new Mac Mini, as you know, for Armini Arnold; I've discussed this many, many times. 
When I then suggested to my team that they might want to get Mac Minis, we noticed that instead of a three-to-four-day delivery time, delivery times had extended somewhat. And Tom's Hardware, which is a fantastically nerdy blog that I've been reading for years and years, had a headline that said it all: OpenClaw-Fueled Ordering Frenzy Creates Mac Shortage. And there was a TikTok from a Best Buy employee about a month ago showing all these empty shelves where Mac Minis had sat previously, asking, is this an AI thing? Well, yes, it's an AI thing. I'm going to hazard a hypothesis here. I'm not going to tell you this is exactly what is happening. I don't have visibility of who all the people are going in to buy Mac Minis at Apple stores around the world. I don't have visibility on what's happening with Apple's supply chain given the pressures on RAM in the markets at the moment; virtually every class of RAM has backlogs of a year or two or more, if you go and see what's happening with Micron and SK Hynix and others. But I'll hazard that the demand for these higher-end Mac Minis is coming from OpenClaw. And the reason is simple. I've taken you through my personal setup. Pete Steinberger, the Austrian developer, released OpenClaw in November 2025, and it's got 350,000 stars on GitHub, which you'll be bored of hearing me repeat: it's the record, the fastest time to that number of stars on GitHub. And here's the thing. If you look at my setup, and admittedly I'm maybe not bleeding edge, but I'm certainly on the front edge of usage, I've now got three Mac Minis and my MacBook Pro supporting the work that I do, and what we are doing is a real microcosm of what is happening elsewhere. And I think the most interesting spot is what's happening in China. China has gone OpenClaw crazy. 
About a week ago, in early March, Tencent's engineers set up folding tables outside their headquarters in Shenzhen and spent the day installing OpenClaw on strangers' devices for free. I love that idea: you just walk past, you hand over your device, and someone puts a piece of open-source software on it for you. A thousand people showed up. They were carrying NAS drives, because NAS drives normally have a bit of Linux running on them and you can run quite a lot of code on them, and some came with Mac Minis under their arms. A few days later Tencent launched three AI agent products on the same day and their stock surged 7%. That same week several Chinese local governments rolled out a subsidy program: Shenzhen, Hefei, Hangzhou, Nanjing. They were offering grants of around $2.8 million to entities to support the deployment of these agents. They were calling it the OPC, the one-person company. If you haven't read Po Zhao, he has a Substack called Hello China Tech, and he's writing some really interesting things about this. We'll actually have quite a lot on OpenClaw in China in the Sunday newsletter, but Po's work on this has been really, really excellent. He's noted that this is a structural break: for decades, Chinese local governments have competed for factories and headquarters, and now they're trying to compete for individuals with AI agents. A provincial official, interviewed by a journalist, put it this way: you have to talk about AI all the time, otherwise you might lag behind. OpenClaw is making that quite easy. Apple hardware happens to be really well suited for it and can make it fast. And China has followed that chain faster than anyone. So that's the demand side. We have Chinese provincial governments competing for individual AI users the way they once competed for factories. I bet no one had that on their dance card. If this kind of analysis is useful to you, subscribe to the channel. 
We're doing this every week, and you know as well as I do that the next few months are going to be incredibly interesting. Now let me tell you what Perplexity just did with all of this. Perplexity announced the Perplexity personal computer. This is a persistent AI agent that lives on a Mac Mini. Aravind talks about this as an AI operating system that takes objectives. It runs 24/7, it costs about $200 a month, and behold, they use a Mac Mini. So both things are true. If you're a benchmark addict, if you're a METR hound, if you are an Epoch loyalist and you're tracking the data, tracking LMArena and those scores, Apple is nowhere to be seen. But they happen to have found themselves, I think, at the forefront of this race. The company controls a number of assets that will make a difference in AI, and we can go through them layer by layer. They have the silicon: the Apple chips themselves, with that Neural Engine. They have the OS, and the OS has hooks into that Neural Engine. It has the MLX framework. They have a privacy architecture, and that privacy architecture is not just a little bit of text on a marketing brochure; it is actually embedded within the enclaves and the software and the operating system. And because of that, they have consumer trust through the interface. The Apple device is probably the thing most of us touch the most every day. In fact, go away and do this. Think about everything you own in your life, from your socks to your wallet: what do you touch the most? My bet is it will be your wedding band if you have one, your specs if you have them, and then an Apple device. That is the degree of consumer trust we're talking about. 
So they didn't go into the frontier model competition, because they control these other things, and that gives them a particular capture mechanism: every third-party model ends up having to go through something that Apple controls, which can also be the App Store. So they capture the value of that relationship even when they don't own the model. This is something they have been doing for years and years and they know how to do. But there is much more to it, because Apple hardware is actually pretty special in all of this. I talked a little bit about the hardware, but it is important to understand what's going on. They've got this unified memory, where the memory sits between the GPU, the CPU, and the Neural Engine, and it has incredibly high memory bandwidth for a consumer device. And the Neural Engine is optimized for matrix multiplication. Ding, ding: that's what transformer models need. The Apple Neural Engine is a massive matrix-multiply unit, and it can run at, for those who care, nearly 40 trillion operations per second. And it offloads that work from the CPU and the GPU. So it is well set up to run local models. If you want to know the state of the art of what that can look like, go and check out a company called EXO Labs, E-X-O Labs. It's Alex Cheema's London-based company, and they build a consumer distributed computing substrate for AI inference, the way that BitTorrent distributed file storage, Folding@home distributed scientific computing, and, if you're a real old-timer, SETI@home did. It's networked Mac Studios: you go and buy these Mac Studios, you wire them up, you use the EXO framework, and you can run really, really big models. I think honestly it's a little bit vainglorious for the people who run them; they're like, oh well, I've got ten of these and I'm running a big model. But I think what EXO is doing is really, really interesting. 
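To see why that unified memory bandwidth is the thing that matters, here's a back-of-envelope sketch. It uses the common rule of thumb that a dense model's decode speed is capped by how fast you can stream every weight through memory once per generated token. All the figures in it (model size, quantization, bandwidth) are illustrative assumptions, not Apple specifications, and it ignores real-world factors like KV-cache reads and prompt processing.

```python
# Rule of thumb: for a dense LLM, generating one token touches every
# weight once, so tokens/sec is roughly bounded by
#   memory bandwidth / model size in bytes.

def est_tokens_per_sec(params_billion: float, bits_per_weight: int,
                       bandwidth_gb_s: float) -> float:
    """Crude upper bound on decode speed for a local dense model."""
    model_bytes = params_billion * 1e9 * bits_per_weight / 8
    return bandwidth_gb_s * 1e9 / model_bytes

# Illustrative only: an 8B-parameter model quantized to 4 bits,
# on a machine with ~120 GB/s of unified memory bandwidth.
print(est_tokens_per_sec(8, 4, 120))   # ~30 tokens/sec ceiling
```

The point of the sketch is the shape of the equation, not the specific numbers: double the bandwidth and the ceiling doubles; halve the bits per weight via quantization and it doubles again. That is why a consumer box with high unified memory bandwidth punches above its weight for local inference.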
There are hardware constraints. The demand for compute, the demand for inference, is extremely high. We have done some of our own very detailed modeling, so I feel very confident in saying there is a compute crunch, there is a utilization crunch, and the demand is growing faster than the chips can roll off the production lines. And what that does is raise the relative appeal of running a model locally, because you don't want to be sitting there with a degraded service from an API when you might be able to run a reasonably good-quality model locally. Models are getting better and better locally: as we know, you can now run GPT-4-class models on your local device. Qwen 3.5 is a great example of that, and people have run Qwen 3.5 on iPhones. But more important than that is that these models are running on device, and that plays really strongly into questions of trust and privacy. We have seen in legal cases people saying, well, your Claude chats are not legally privileged. We also know that people like OpenAI are thinking about advertising, so are they going to be inveigling their way into what we chat about in order to deliver persuasive advertising? And Apple sits in this unique position. It's high trust, it's pro-privacy, it makes its money through hardware, and no other company has that stack. That makes Apple credible for the most sensitive AI interactions. Now, I talked about this in an essay a while back. I called it the Quadrant Guardian. I was saying, look, we're being inundated by all of these tasks and we can't get into Eisenhower's top quadrant because there is just so much stuff coming in. And I said, look, I would love to have a Quadrant Guardian which works for me and only lets through the things that I want let through. And we've had a little bit of that experience in the way that Apple now allows us to control how notifications come in. 
It's more fine-grained, by the way, than the way I control my notifications, which is that I never answer my phone and it's basically on silent. And for those of you who've emailed me, you also know that I barely ever read or reply to email either. That's just my attempt to guard my time. Now, Apple is really, really well positioned to do that. And in the last few weeks, as I've been using an OpenClaw agent working with Armini Arnold, I've seen the power of having something that local orchestrating what I need. One of the things I noticed is that latency matters when you're in an interactive back-and-forth chat with an agent. I am putting a lot of my queries to Armini Arnold through Bedrock on AWS rather than directly through Anthropic, and that adds an additional 250 milliseconds of latency, which is really noticeable and ever so slightly annoying. It doesn't matter with workloads that are running in the background, right? If it's going off to do a 30-minute task, who cares about an extra 250 milliseconds? But back and forth, back and forth, back and forth, you do care. So if you can get that on the device, you're going to care quite a lot. And if you can get it on your device in a privacy-architected way, which Apple famously has been very, very good at and really cares about, that's even better. So you have this moment where the curves are changing. What goes on in the cloud are these really extreme and exceptional models, but maybe what we will want on our device are things that don't have to be out of this world. I do think, as we see with the Perplexity computer, you end up with a world that is about a mixed model. In the Perplexity computer you have an on-device model, but you also have cloud models running as well. It's a hybrid architecture: there is an always-on local substrate and there's a cloud for heavier inference. That's quite important as we start to think about Apple. 
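That intuition about latency is easy to put numbers on. Here's a toy calculation (the turn count and overhead figure are assumptions for illustration, not measurements) showing why a fixed ~250 ms per round trip is invisible on a long background job but very visible in interactive chat:

```python
# Toy comparison: a fixed ~250 ms per-request overhead (e.g. an
# extra network hop) under interactive vs background workloads.

OVERHEAD_S = 0.25  # assumed added latency per round trip

# Interactive session: 100 short back-and-forth turns in a day.
interactive_wait = 100 * OVERHEAD_S  # total seconds of added waiting

# Background task: one 30-minute job with a single round trip.
background_job_s = 30 * 60
background_overhead_pct = 100 * OVERHEAD_S / background_job_s

print(interactive_wait)                    # 25.0 seconds, felt turn by turn
print(round(background_overhead_pct, 3))   # 0.014 percent, imperceptible
```

Twenty-five seconds of dead air spread across a conversation is exactly the kind of friction you feel, while a hundredth of a percent on a background job is nothing, which is why the latency argument for on-device models applies to the interactive layer, not the batch layer.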
When Apple is able to run models locally, or you're able to run better and better models locally without jailbreaking your iPhone, and they are faster, then Apple's got that position with you. It also captures part of the value that goes up to the cloud, either through your loyalty or perhaps through a commercial relationship. If you think about what you need as a consumer, you don't necessarily need a Nobel-laureate-quality answer for every single thing that you do. Consider what Dario Amodei has said, and look, I've changed my mind about this as well; I think Dario is so authentic in what he says, and in general the things he has said have come to pass. Dario talks about a country of geniuses in a data center: all this Nobel-quality activity. And that's all well and good if the type of question you're asking demands that kind of excellence. But for a lot of the questions in our day to day, where we just need a little bit of support, we don't need to be pushed to that level. And what we've seen is that through model distillation, through optimizations, through efficiency shifts, we're able to get that frontier quality on smaller models that will run on device perhaps six months later, perhaps a year later, perhaps six weeks later. So at some point it will just be good enough. Now, the way I think about this is as the K problem. What happened with TVs? TVs got to 4K. Then a few companies came out with 8K TVs, which maybe some people have bought. No one's going to come out with a 16K TV and mean it, and the reason is that the human eye can't really resolve above 4K. So why go beyond that? And I think the same applies to the day-to-day workloads we want back in seconds: orchestrating super-complex tasks, household questions, issues about the news, figuring out our calendar. 
You don't need the metaphorical 16K monitor; you don't need GPT 19.6. You'll need maybe GPT 5.8 or 6.2, and that will be able to run on your edge device. And I think that is a really important opportunity for an Apple. It's not just an opportunity for Apple, by the way. We should also think about Samsung here; they make a lot of Android devices. Yes, Samsung's an investor in Perplexity, but Samsung's also an investor in a company called Liquid AI. Liquid AI makes AI models that have a different architecture to the transformer; they're an MIT spin-out. The Liquid AI models carry effectively one-tenth of the compute load and complexity of the equivalent transformer model for the same performance. So they're not up at Opus 4.6 level, they're not up at GPT-5.4 Pro level, but maybe a generation or half a generation behind. Their models are a tenth of the footprint, and Samsung has a position in there. I'm not going to say any more than that; I don't have any more insight other than that there's clearly this edge opportunity. So go and take a look at the things you mostly ask your AI, and then think about the things you don't ask it. Those are probably ideas about private moments, health, money, relationships, your investment portfolio, your pension, some kind of customer complaint. Those are things where you may, like me, be a little bit blase: it's just going in the cloud, who cares. But increasingly, if you had the choice of getting those answered well, and getting them locally and privately and within your own enclave, I think people would start to do that. The on-device advantage is really present for that. I think it's also present in this idea that you have some sovereignty over that part of your cognition. This is a topic I've talked about in a previous live; we talked about it in AI Vistas, this idea of where the boundaries of our cognition lie. 
And in a funny sort of way, when you were a teenager, you didn't want your elder sibling to read your diary. Well, your diary now lives in the chat-log transcripts on ChatGPT or on Claude or on Gemini, and right now those are one legal order away from being read. Now, that isn't to say that what happens in the cloud isn't going to be important, hugely important, over time, or that this shift to on-device is going to slow down what happens in the cloud. No, far from it, because of all those other types of workloads. But more importantly, as I have discovered with Armini Arnold, once I have an AI orchestrator that knows and understands my context, in the way that Armini Arnold does and in the way that models inside the privacy enclave on Apple devices will, I'm asking it to do more and more complex things that involve farming workloads out to the infrastructure in the cloud. Which is why, when I wrote that essay a few weeks ago and said I'm using a hundred million tokens a day, well, I'm out of date. That average is now closer to about 170 million tokens a day, because I'm putting more and more workloads through it. And that doesn't include all of the things that Armini Arnold is coordinating through OpenAI's Codex and Claude Code, which of course execute their inference in the cloud as well. The truth is, we don't need a Nobel laureate on our smartphone for everything, every single day. But what we do need will be available on those Apple devices. So what you've heard me do today is say, listen, the facts have changed a little bit, and that persistent advantage Apple has is, I think, potentially starting to show. I don't think today's Mac Mini purchases will make a big difference to a multi-hundred-billion-dollar-revenue company, but they might be a signal that we are going to see this shift to local AI on local devices in the next couple of years. 
If you know someone who's been quietly buying Mac Minis, or has been trying to buy them and can't, send them this episode; they will recognize exactly what we've been describing. And if you want to go deeper into what life is like with an AI chief of staff, go back and find my episode where I talk about Armini Arnold and how my workflow, my patterns of work and my patterns of thinking, have changed. It pairs really well with what we've covered today.
