Transcript
A (0:00)
Two days ago, Goldman Sachs's chief economist said that AI investment had added, in his words, basically zero to US GDP in 2025. But here's the thing. They're looking at companies across the economy, but they don't look at people like me. A small number of us have already deployed what amounts to a five, maybe a ten person team working round the clock on our behalf. I'm not special, I'm just early. The gap between the people who've started and the people who haven't started is widening every week. Is this going to make me worse at certain things? Am I going to think less carefully before delegating because the system is so capable? Am I sharpening my judgment or losing the muscle that that judgment requires? Welcome to Exponential View, the show where I explore how exponential technologies, in particular AI, are reshaping our future. And it does feel like that future is coming ever closer, becoming more like a discussion of the present. Now, each week I'll share some of my analyses or speak with a guest to shed light on a particular topic. But this week I want to show you something. What you're seeing on the screen right now is a piece of software. It is a knowledge dashboard that I use to track what I need to do each day. It brings together material from my internal systems, from our CRM system, from our research engine, from X, from my email, from other sources. And it ranks the things that it thinks are important for me. Given my work, a lot of those things are outward facing; they're not necessarily about projects and things that are happening internally. This is a live knowledge dashboard, live right now, and it runs on a Mac Mini in my studio just over there. You can't see it, it's in an equipment cabinet. This didn't exist eight days ago. I haven't written a single line of its code. It was put together by six AI agents overseen by a super agent.
Or perhaps I should say it was one AI agent overseeing six sub agents. And they built it overnight, over the course of a couple of days, with my feedback. They argued about the database schema at three in the morning. They wrote tests for each other's code, they deployed it, and I woke up and it was running. The first time it was running it was a bit ugly, a bit shonky, as any new piece of code is, but ultimately it works, and I've iterated a couple of times and it's something that I now use every day. Now, that super agent, the primary agent that orchestrated and coordinated all of that, is called R. Mini Arnold. The R comes from Isaac Asimov's novels. In those novels, intelligent robots were given that moniker, R for robot: R. Daniel Olivaw. And it's what we use when we are naming agents of the type that I've just described at Exponential View. What about the name Arnold? In the second Terminator film, Arnold Schwarzenegger comes back to protect humanity from the even more terminatory Terminators. And so R. Mini Arnold, because my agent is not as big as Mr. Schwarzenegger, is sitting in that Mac Mini and doing all of that work. I've been running R. Mini Arnold, or RMA as I'll call it during the course of this conversation, for about a month. And it has really changed the way I work, more than any single tool since the web browser. I can't overstate it. You're going to hear me talk about it now for 25 minutes. But it has changed the way I work. And what is it? Well, R. Mini Arnold is, for want of a better word, definition or phrase (the academics can argue about this), an AI agent. It is the AI agent that does things similar to the AI agents that were imagined back in the 80s and the 90s.
Think about Apple's famous Knowledge Navigator concept, which drew on Alan Kay's ideas. And what I've experienced over the last three or four weeks is that a lot of the discussion about AI agents, especially when we think about it in economic terms, perhaps might have missed the point entirely. Two days ago, Goldman Sachs's chief economist, Jan Hatzius, said that AI investment had added, in his words, basically zero to US GDP in 2025. And earlier in the week, there was the paper from the NBER that suggested that 80% of American, British and Australian companies were reporting no productivity gains. And that headline is repeated in lots of other places. PC Magazine said the AI agent hype is real; the productivity gains aren't. Now, there are many ways to interpret that number, of course. Not least, it means that 20% of firms, just three years after the ChatGPT moment, do claim to see productivity benefits. And that concords with our proprietary tracking of US public companies' GenAI claims. It lines up with what our friend Erik Brynjolfsson at Stanford University is starting to say about the productivity data that is becoming visible. But here's the thing. Those numbers measure a particular type of real thing, and perhaps they're measuring something that is less important. Maybe they are measuring the wrong thing. They're looking at companies across the economy. Sometimes they look at individual companies, but they don't look at people like me, and they can't capture what is going on there. The revolution isn't merely general intelligence. One of the things I've observed is that when the cost of delegation falls below the execution cost for a growing fraction of what we call knowledge work, when that cost falls by an order of magnitude, you do much more of it. And that's basically the whole story. What RMA has done is make it really, really easy for me to just bark orders into a system, sometimes in parallel, and get lots and lots and lots of work done.
If you think about a big company, they're running pilots, they're building governance frameworks and they're hiring chief AI officers, a job that exists today and will not exist in five years. But they have to contend with all the issues of big companies, and that's what turns up in the statistics. A small number of us have already deployed what amounts to a five, maybe a ten person team working round the clock on our behalf. Now, I'm one of those people, and I'm not special, I'm just early. The gap between the people who've started and the people who haven't started is widening every week. And that's not just because the technology is constantly getting better, though it is. It's because the relationship that you have with the technology compounds. That relationship starts when you first used ChatGPT, when you moved from a quick "summarize this paragraph of text" to more complex multi-turn prompting, when you started to use deep research, when you started to think about your prompting strategies in order to get better outputs. That compounds rapidly. But the other thing that compounds is the agent itself: RMA in my case has worked with me for 30 days, and it knows things about me that no brand new tool does. It knows things in certain types of contexts that you don't get if you're using Claude or ChatGPT with their memory capabilities. Let me just take you through the bits and pieces that make up R. Mini Arnold. R. Mini Arnold runs on a Mac Mini with 64 gigs of RAM. Look, I bought this Mac Mini for the agent. I'm obsessed with buying more RAM than I might possibly need, because it's the one thing that's really complicated to upgrade with Apple computers; you can always add more storage, you stick an SSD in, but RAM is a sort of one way street.
It's connected by wired Ethernet, 10-gigabit Ethernet, directly to my firewall, which has a 5-gigabit internet connection out to the world, and it sits in the equipment cabinet in this studio running 24 hours a day, drawing about the same amount of power as a desk lamp. The software that it's running is called OpenClaw. I've written about it a lot on Exponential View. It's open source, it's self hosted, it runs on my hardware, and my data largely stays on my machine. Some of it is in a shared Dropbox in the cloud so I can send things to R. Mini Arnold. Some of it, in terms of the long term memory, gets exported out to a vector store that's in the cloud so that it can learn my preferences over more extended periods of time. And the model underneath, most of the time, is Anthropic's Claude Sonnet 4.6 model. Sometimes I flip up to Opus 4.6, sometimes I go down to the quick and easy Haiku 4.5. And from time to time RMA might call an OpenAI, Perplexity or Grok model if there's something really particular that I need. But I would say that's not even 1 in 100 of the queries that I do. Last week Sam Altman snagged Peter Steinberger, the developer behind OpenClaw. He's joined OpenAI. He had been pursued by Mark Zuckerberg, messaging him on WhatsApp. I don't know how Mark got his number, but I guess Mark owns WhatsApp, so maybe he has ways. I know that many, many Silicon Valley VCs were really trying to get hold of OpenClaw, trying to get hold of Pete, to see if they could support that business. So you can draw your own conclusions about what that tells you about how much of a shift in consumer user interactions we've seen from the model of the AI agent that OpenClaw has delivered to us. So that's the structural story, and I've written about it in the newsletter; you can go off and see some more details. Let me describe to you what it actually looks like. Like many of us, I wake up early in the morning.
By the time I'm downstairs stretching or making coffee, RMA has been running all night. And certainly by about 6 or 6:15 in the morning, a morning brief arrives on my WhatsApp. And that morning brief is similar to things I'm sure you're used to: the calendar for the day, priority emails flagged overnight. But of course, given the work that I do in research and analysis, it includes interesting analysis, the top stories that have come out of Prism, which is the research backplane that we use at Exponential View, holding thousands of things that the team has read and annotated, plus their own analysis. It also includes RMA's own analysis of what it thinks might be interesting for me, given the wider context it has of me. And part of that wider context is the information it gets out of Orbit. Orbit tracks all of my major work relationships. So it's a flashy CRM. And Orbit will be connecting the types of people I might be meeting in the next few days with the research and the stories that are happening today. RMA assembles all of this together and delivers it to me. I'll get a summary of all the tasks that have run overnight, and I will run quite complex research and coding tasks overnight. I'll be told which ones succeeded, which ones failed. There's a whole lot of housekeeping. RMA runs as a system, and it might have to give me details of things that need my attention. Now, what are these tasks? Well, some can be quite complex. You may have seen that a few days ago, a Substack called Citrini Research wrote an essay that was pretty interesting: a detailed scenario of an economic collapse powered by AI that ran out over two or three years. It was obviously trending on Twitter, it was blowing up on Substack. And according to some of the financial press, it also affected the markets and resulted in a real shock to the markets at the time. So I was really curious about digging into the authors and the team behind Citrini and behind that essay.
And so I asked R. Mini Arnold to do some research. Who are those authors? What's their heritage? Where have they come from? How serious are they? How robust was the work that they had done? Now, I know RMA set up a team of investigators to go off and comb through this. I think at one point there were five sub agents working on this task, doing web searches and drilling down and trying to see what they could learn. And I got back quite an interesting report, some of which, if I have time, I may put in the newsletter on Sunday. But that wasn't the only thing that was happening overnight. Another set of agents were refactoring the code of my CRM system. What does that mean? It means that this is a system that I designed and I've built over the course of a few weeks. And of course I'm interested in shipping features. I'm much less interested in thinking about the elegance and the sort of structural integrity of the code. That's true for any code: it's the accumulation of tech debt. I told RMA that what I wanted to do was assemble a team of security researchers, quality analysts and documenters to walk across the code base of the CRM and fix what was broken, reporting back to an architect who confirmed that everything still worked. It was a major piece of work. Several thousand lines of code were touched overnight. I also have a different team of agents that works for me every single night. These ones are running OpenAI's Codex models, and what they do is walk across all of my other GitHub repos, making small improvements, finding tiny bugs and shutting them down. And finally, RMA has a team of security agents, some of which are inside the firewall, some of which are outside the firewall, which effectively find and patch security vulnerabilities across the system. Typically they don't find much, but once every three or four days something does crop up.
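That overnight roster, a standing research crew, a nightly refactoring pass, repo hygiene, and security sweeps, all folded into one morning report, can be sketched as a tiny scheduler. This is a minimal illustration, not OpenClaw's actual mechanism; the task names and the stub runner are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class NightlyTask:
    name: str        # e.g. "crm_refactor"
    agents: int      # how many sub agents the orchestrator spawns for it
    recurring: bool  # scheduled every night, vs. a one-off (de novo) brief

@dataclass
class MorningBrief:
    succeeded: list = field(default_factory=list)
    failed: list = field(default_factory=list)

    def summary(self) -> str:
        return f"{len(self.succeeded)} tasks succeeded, {len(self.failed)} need attention"

# The standing roster described above (names are illustrative).
roster = [
    NightlyTask("citrini_research", agents=5, recurring=False),  # de novo brief
    NightlyTask("crm_refactor", agents=4, recurring=True),
    NightlyTask("repo_hygiene", agents=2, recurring=True),
    NightlyTask("security_sweep", agents=3, recurring=True),
]

def run_overnight(tasks, run_task) -> MorningBrief:
    """Run every task and fold the outcomes into one morning brief."""
    brief = MorningBrief()
    for task in tasks:
        (brief.succeeded if run_task(task) else brief.failed).append(task.name)
    return brief

# Stub runner standing in for the real agent execution.
brief = run_overnight(roster, run_task=lambda t: True)
print(brief.summary())  # → 4 tasks succeeded, 0 need attention
```

The point of the shape is the fold at the end: whatever ran overnight, success or failure, lands in a single brief for human review.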
So this is like having a chief of staff who can draw upon a number of specialist teams essentially constantly, overnight. And if you think about what I just described there, how much of that was me initiating work? Well, two of the pieces, checking the CRM code and the research digging around the Citrini essay, were de novo tasks for the day, but the rest are things that are just scheduled and run day after day. Now, what is the interface for all of this? The interface is WhatsApp. The reason is that's the interface that I use for so much else. Why not just use WhatsApp? It's the same app that I use to message my wife and my family and my friends and my kids and the people I work with. There's no special dashboard required. I have built that dashboard that I shared with you earlier because, given the breadth of things that I'm working on, it was really useful to have. I mean, you just don't really want to be flagging research stories on and off through WhatsApp. But you don't need that dashboard, and in fact I only check in on it a couple of times a day. What I do hundreds of times a day is use WhatsApp to delegate, check and verify the work that my agent and its team of agents are working on. So today I open WhatsApp and I talk to it. The agent comes back, it picks up where we left off previously. It knows my priorities. It knows who to contact and why. And I'm going to show you what that looks like. So here you can see what my WhatsApp looks like. I actually have a tag for all of the various RMA agents. And this is the thing that's quite strange. It's not a single conversation. It is eight conversations; in fact, there's slightly more than that. For each different bit of context that I have for long running tasks with RMA, I create a communication lane. So R. Mini Arnold sits right at the top. One of my favorites is the general "chuck it at it and figure it out" lane.
RMA IGB is the channel for my new book. The sub agent there has all of the chapters. It knows the chapter structure, it knows the argument, it knows all of Chantal's latest findings. It has access to Prism so that it can look things up for me. The channel on Orbit is the CRM. It knows about my contacts and other details, but it also has access to the GitHub repo, so when I spot a bug during the day, I can quickly tell it to fire off and fix it. RMA EV Research is for all the things that relate to the research that we're doing within Exponential View, whether it's for the newsletter or for something else. It's the same AI, but with eight specialized instances, all running in parallel, all with their own memory, all with their own tasks, all with their own context. Now, the script says they don't talk to each other, and that each one knows deeply about its domain and not much about the others. The truth is they do sometimes get confused, and sometimes I will be in the EV Research channel and I'll get a response from one of the other channel lanes showing up. This is early software. There are going to be these types of issues and teething troubles. So if we then think about what that means, it's really a kind of remarkable situation, because of course I've got this master orchestrator, R. Mini Arnold, but I have these eight contexts which it should technically adhere to. And it's like having eight simultaneous relationships, but with the same entity. There is some shared context. I haven't figured out exactly how to do this perfectly. I'm still trying to figure it out. I've experimented with other things. I experimented with using Telegram for this, but I don't really like Telegram as an app. I experimented with using Slack, but again I found I don't spend as much time in Slack as I do in WhatsApp. I actually had it build me a small persistent web app where I could keep track of all of these channels. None of those worked for me.
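Whatever the messaging surface, Telegram, Slack, WhatsApp or a web app, the underlying pattern is the same: route each inbound message into an isolated per-channel context, so the book lane never bleeds into the CRM lane. A minimal sketch of that routing, with illustrative channel names and a stub in place of the real model call:

```python
from collections import defaultdict

class ChannelRouter:
    """Eight lanes, one agent: every channel keeps its own isolated history."""

    def __init__(self, reply_fn):
        self.histories = defaultdict(list)  # channel name -> that lane's memory
        self.reply_fn = reply_fn            # the model call would go here

    def handle(self, channel: str, message: str) -> str:
        history = self.histories[channel]       # only this lane's context
        history.append(("user", message))
        reply = self.reply_fn(channel, history) # sees this lane, nothing else
        history.append(("agent", reply))
        return reply

# Stub reply function standing in for the LLM.
router = ChannelRouter(reply_fn=lambda ch, hist: f"[{ch}] seen {len(hist)} messages")
router.handle("RMA-IGB", "What's the argument of chapter 3?")
router.handle("RMA-Orbit", "Log a bug in the contacts view")
# Each lane now holds exactly its own two entries; neither can see the other's.
```

The cross-channel confusion mentioned above is what happens when this isolation leaks, which is why it matters that each lane's history is keyed strictly by channel.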
Those approaches might work for you. Everyone is different. The asymmetry in all of this that I find most interesting is not AI and humans. It's actually individuals and institutions, because I'm currently running capabilities that I doubt any Fortune 500 company has deployed at this level. So let me tell you about the type of impact that this system has had for me, and in doing so I'll address questions like how much does it cost, and is it worth it. A quick note. If you want to support us in bringing more of these conversations to the world, please consider subscribing to the show. I'll give you three small vignettes. I had a wonderful meeting with a major sovereign wealth fund a couple of days ago. It was a big meeting. They're impressive, super, super smart. And I had to rush straight from having a filling at the wonderful dentist just down the road from me up in Hampstead in London, so it was going to be a really, really hectic morning. RMA had figured out a brief for me and said: you've got this dentist thing and then you've got to run straight to this meeting; here's a brief. It recognized the name and prepared me for it. The briefing was pretty thorough. Of course it had all the obvious things you'd expect, because it's gone to the CRM and pulled those details. But it also has access to all of my Granola transcripts and any other conversations I've had that are relevant. And it has access to Prism. Prism contains lots of research as well as our house views, knowledge of how the team at Exponential View thinks, uses frameworks, thinks critically, and it knows a lot about the person I was meeting. So the context I was given was really quite remarkable. And I've used CRMs that prepare you like this previously, and I've used AI systems over the last year or two that try to prepare you like this. Seriously, though, they sort of lack that context and that nuance that I have through RMA. And I will explain in a few minutes why I have so much context and nuance there.
What I got was something that described the investment posture around compute, data centers, chips, their wins, the open questions, all anchored in a really broad set of news and analysis that I might have missed. So I walked into that meeting differently. I wasn't scrambling to look somebody up in Perplexity while I was running out of the Tube. I actually had some real context. It's the kind of thing that a great chief of staff with plenty of time would have prepared, if I had one and they had had the time. The thing that surprised me, because I hadn't told RMA to do this, was after the meeting. I moved off to the next thing, which as it happened was to go to the Apple Store to get my laptop screen repaired. It updated the CRM with the fact that I'd had the interaction. It didn't have many details, but I did get a WhatsApp asking: anything you want to add as notes from this interaction? And, you know, that again was just an open loop that I would otherwise have had to remember and surface myself. It also reminded me that I'd met a peer of that person previously, and asked whether it was worth reconnecting with them as well. The pre-meeting briefing, that's fine; that's reasonable inference, though the quality of that briefing was better than things I've seen before. Then there's the post-meeting logging, done in quite a non invasive way. And I would just say something about that non invasiveness, which is that if I have a Zoom meeting where I've run Granola, I don't get that flag. I guess it sees it's got the Granola transcript and it can populate the CRM that way. Now here's the thing. I actually don't really know how that happened and how it worked. I don't know if it'll happen reliably week after week, day after day. I mean, these LLMs are jagged, as we know, wobbly in some cases. But this isn't a case of the AI doing my homework. This is an agent that understands the shape of that professional relationship and all the effort I've put in to ensure that there is enough context around it.
And it knows that a meeting doesn't end when you leave the room. So I found that pretty interesting, quite special again, compared to things I've used previously. Here's a second example, and I love this example because it's about how this episode was made. So 24 hours ago I needed a script for this show. I knew roughly what I wanted to say. I had the thesis: it's the delegation cost argument. I had a few stories. I know the audience. What I didn't have was time. And as regular listeners know, I normally work with Chantal on the scripts. So I thought: I've had a pretty busy Thursday, this is going to take me four or five hours, and I was frankly exhausted by the time I got home. So I went to WhatsApp and I gave RMA a brief. I said, look, I'm doing this live Substack. It's kind of about you. It's tomorrow. Here's the thesis, here's the tone, here's roughly what I want to cover. And I want it written in my voice, the voice I've been cultivating over 30 years of my professional experience. Go, RMA. Go and figure it out. And then I closed WhatsApp and went to do my other thing, which was to finish reading the book I was reading. So what happened in the background? RMA spawned four specialist sub agents simultaneously. Each one got a brief, a task, and its own context window. The context window is the working memory of an LLM during a back and forth; mine have generous context windows, about a million tokens. They ran in parallel: four separate instances of Claude Sonnet in some cases, Claude Opus in others. Four different research threads running at the same time. Now, the first agent, which RMA called the archivist, went into the memory layer. What it was trying to do was find all of the things that I had asked RMA to do over the past 15, 20, 30 days, and identify which ones could make nice vignettes. It searched 79 tracked behavioral patterns for corrections.
Those are the dozens of times I've corrected it, including correcting how it should use its name and refer to itself. It extracted the writing rules that I've taught it: the explicit ones that we developed through the Stylometer product, in some cases instructions I've written down, and others that the agent has learned over time, patterns that have been extracted. The second agent it created searched the external landscape. What was happening with AI agents this week? It found that Goldman Sachs zero productivity data. It found that OpenAI had just hired Pete Steinberger, something I was well aware of, but it found it itself. He's the guy who created OpenClaw. It found the story about Zuckerberg's WhatsApp message. It baked it all into the script. The third agent was the evidence collector. Numbers, statistics: the 79 things I've had to teach it, the 15 consecutive analysis briefs on AI in India, the 179 mistakes that we have imported into its SOUL.md document, which is a specification document that OpenClaw agents have. Every number in the script has been checked by that dedicated agent and, by the way, by a QA agent that came afterwards. Their job was to make sure I didn't say something that I couldn't defend. The fourth agent researched the format. What makes a live Substack show land? What do the first 90 seconds need to do? Where should I put the screen shares to maximize that visual hook? It read all of the transcripts of the previous Friday discussions like this that I've led. It looked for engagement patterns. It looked at similar, better, more experienced, different presenters and podcasters, and came back with structural recommendations. Then a fifth agent took all of these research packages and built a narrative structure, and a final agent wrote a full script: 4,600 words, technically 28 minutes, in my voice. The total token cost for that? Well, I'm a bit generous about this. I give the agents really large token budgets to work with.
They never come close. So I told it this was consequential. I was going to be reading a lot of this out to this incredibly important audience, this group of people who matter so much to me, which is every single one of you. And I said, 300 million tokens. That's your budget to go and do this piece of work. A million tokens is five to ten bucks, so I was thinking maybe this would cost $1,500 to do. In fact, the agent pipeline used 280,000 tokens. I expected it to come in well below budget, because I've been using it for a long time, but that's three orders of magnitude below. That is what I did, and that is how this script came about. Now, that process took 40 minutes of wall clock time. R. Mini Arnold told me this was going to take all night. It often says things like that and then tells me to go to bed. It took 40 minutes. I was reading my book. I came back, read the draft, sent a handful of corrections, and then I waited till the morning to look at the version that showed up. And the pattern here is not "AI wrote my script". You know, I work with people who help me with my scripts. We know about Chantal and the other researchers. That's not what this is. The pattern to look at is orchestration. I described what I needed, and I did it with a nuance and a complexity because I've got some experience in this field, and the people, things, systems I delegated to went off and did the work. I set the objective, I allocated resources, I specified constraints, and ultimately I reviewed the output. Everything that any manager of a team does; I mean, that's what we do. And I went off and did something more enjoyable. As I said, I was reading this book. And I think it's important to note that the review of the script was not just seven questions. I mean, it was a couple of hours, two and a half hours, certainly less than it would have been if I had had to do the entire process end to end. As a matter of experiment and disclosure: this is the first time we've tried this.
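The token economics above reward a back-of-envelope check: a 300-million-token budget at the quoted five to ten dollars per million tokens, against the roughly 280,000 tokens the pipeline actually used.

```python
# Back-of-envelope on the numbers quoted above.
PRICE_LOW, PRICE_HIGH = 5.0, 10.0   # dollars per million tokens, as quoted
BUDGET_TOKENS = 300_000_000          # the budget I set
USED_TOKENS = 280_000                # what the pipeline actually consumed

# Cost range if the whole budget had been spent.
budget_cost = (BUDGET_TOKENS * PRICE_LOW / 1_000_000,
               BUDGET_TOKENS * PRICE_HIGH / 1_000_000)
# Cost range of what was actually used.
actual_cost = (USED_TOKENS * PRICE_LOW / 1_000_000,
               USED_TOKENS * PRICE_HIGH / 1_000_000)

print(budget_cost)                        # → (1500.0, 3000.0)  the feared bill
print(actual_cost)                        # → (1.4, 2.8)        the real cost
print(round(BUDGET_TOKENS / USED_TOKENS))  # → 1071, i.e. ~three orders of magnitude
```

So the "maybe $1,500" fear corresponds to spending the entire budget; the actual spend was a couple of dollars, roughly a thousandth of the allocation.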
This is the very, very first time we've attempted a structure like this for one of the shows in this detail, because I wanted to demonstrate viscerally where we might be headed with all of this. So there's two and a half hours of me time going back and forth into all of this. But this is the shift, right? AI is not a faster typewriter, it's not better spell correct manned by a stochastic parrot. It is becoming a team that can be briefed and, in some cases, trusted to go away and come back with something worth my time and, more importantly, worth your time. The fact that you're hearing any of this at all is evidence, at least in my mind, that that process worked. So I'm going to keep going. I want to talk about the SOUL.md document as well. This is a file that's on the computer. It's only 12 kilobytes, it's tiny, but it's the personality specification for RMA. It's like a super prompt, if you will. Now, the thing about SOUL.md is that some people go off and write their own, because they want to really tune their agent in a particular way. But what I prefer is revealed preference. You can tell a lot by looking at what people do rather than asking them what they would do. And so what I asked R. Mini Arnold to do was to look at our interactions, look at how I work, and come back with a proposal of what should be in that SOUL.md document. It found those 179 things it had got wrong, by the way, just in the first 10 days; it's much, much more than that now. These were tasks that broke, emails that were wrong, and ham fisted code that didn't run. At one point, oh my God, it corrupted its own config file late on a Saturday night. I had workloads that I wanted to run overnight, and I sat there debugging JSON at about 11pm on a Saturday night. I was pretty darn cross about that. But ultimately it had broken itself so badly it couldn't recover. And each time something went wrong, I corrected it, sometimes politely, sometimes tersely, sometimes I swear.
And it got a load of corrections. But the system also extracted 146 behavioral patterns from our interactions and from those corrections. And these are patterns drawn from my actual behavior. From those patterns, it built a Big Five personality profile. As many of you will know, the Big Five is the most scientifically robust way of profiling personality. RMA has a score of 4 out of 5 on openness. Its extraversion level is lower, 2 out of 5. I don't want it to over explain. I don't want it to be jazz hands and rubber chicken. It's got to be emotionally stable; it's given a score of 4 out of 5 there. And it absolutely needs to be conscientious: 5 out of 5. Lots of the corrections were about file naming, config, safety, debugging. The result is it does behave better, and it will check a credential before claiming it's expired, but not reliably. I have to say I still have to do much more work. It will still make small mistakes of the kind a human chief of staff would know to avoid. They would know the pressures on my time, they would know the mental load that I'm juggling, and they would know which i's and t's have to be absolutely correct. But let's be clear, this is not sentience. This is an elaborate behavioral specification that is read fresh every session, every time the gateway restarts, and interpreted by a large language model as it runs. It is mega prompt engineering. If you've ever managed a team, you know that the people you trust more are the ones who learn from their failures. And I think it's quite a nice feature that Steinberger has built into OpenClaw, this ability for it to learn from its failures. Now, I know this is sounding like a pitch, but as I said, this is the most remarkable software I have used since I happened on my first web browser, which was the Lynx text only browser, back around 1992, I think. But I am thinking about a few things. Is this going to make me worse at certain things?
Am I going to think less carefully before delegating because the system is so capable that I don't have to be specific? When RMA handles all my research synthesis, am I sharpening my judgment or losing the muscle that that judgment requires? Look, I don't know yet. I think that's a very, very first order and obvious question, and you can build every type of analogy. I learned to drive before anti lock brakes were standard, so I was taught that sort of pumping that allows your car to brake in slippery conditions. ABS means I've no idea how to do that now, and anyone who's learned in the last 20 years has no idea how to do it either. Has that made us worse drivers? I'm not sure, but I am aware of the risks. So I have taken quite particular steps to ensure that good thinking is still happening, to the extent that it can over a WhatsApp channel. For example, lots of my modes of reasoning, again through revealed preference rather than my dictating them, have been turned into deterministic patterns. That means they're not running stochastically, probabilistically; they are deterministic patterns, and they're often thrown in as checks on any of the valuable outputs that come out of these systems. The purpose there is to give me something that I can read with criticality, not just reams of LLM slop, because then what would be the point? And of course, as we know, there are lots of other things that I do, and my team does, to make sure that we are not just using the machines but staying really good at using them, because our minds are sharp. So earlier this week, my colleague Nathan spent about three hours, just pen and paper, working on some piece of research, and sent me this amazing photo of 18 or 20 pages of A4, all handwritten and all diagramming his thinking. So we're doing these types of things. I'm tracking it, reading the research as well, but I think the jury's out as to where this goes.
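The idea of "deterministic patterns thrown in as checks" can be made concrete with a small sketch: fixed, non-stochastic rules that every valuable output must pass before it reaches the reader. The specific rules here are illustrative, not the actual ones in my system.

```python
import re

# Fixed, deterministic rules run over every agent draft before I read it.
# Each check is a name plus a predicate; none of them involves a model call.
CHECKS = [
    ("has_sources", lambda text: "Source:" in text),            # claims must cite something
    ("no_hedge_slop", lambda text: "delve" not in text.lower()),  # ban tell-tale filler
    ("claims_are_dated", lambda text: bool(re.search(r"\b20\d{2}\b", text))),  # pin claims to a year
]

def run_checks(draft: str) -> list[str]:
    """Return the names of the checks the draft fails; empty means it passes."""
    return [name for name, rule in CHECKS if not rule(draft)]

draft = "AI capex surged in 2025. Source: quarterly filings."
print(run_checks(draft))  # → []  all deterministic checks pass
```

Because the rules run the same way every time, a failure is a fact about the draft, not another probabilistic opinion, which is exactly what makes the output readable with criticality rather than as more LLM slop.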
But the second thing I would say is that it has a lot of context about me. It knows those professional relationships, it knows my business priorities, it knows what's happening with the book and some other projects, and it's able to draw context across all of them. It will sometimes flag contradictions in my own thinking, which is kind of a surprising output. It's useful, maybe a bit disturbing, maybe just a bit exciting. The other thing I find really fascinating, quite strange even, is that I've set up RMA to review its own code. Every single night it clones those GitHub repositories, all of them. It checks for tech debt, it checks for security failures and test coverage. If the findings are things it can handle itself, because they're of a certain size, it fixes them end to end. It files the findings, sends me a report, and asks me to approve an action plan. This is an AI agent doing code review on itself. I don't think this is the intelligence explosion, but it is quite interesting that I feel my tech debt levels are going to be kept relatively low. I'm currently running capabilities that I doubt any Fortune 500 company has deployed at this level. Their AI is the enterprise version: it's got to be standardized, it's got to be audited, it's got to have SOC 2 compliance, it's got to be averaged over 100,000 employees. Mine knows who I'm trying to stay in touch with, what I'm thinking about at the moment, what the next book is, where we've hit roadblocks in a project, which funds I'm meeting. And out of all of that, it can propose the best next action to move things forward. The institutional lag here is going to be measured in years, not months. And it does create a strange inversion. It's like when Twitter was bursting onto the scene in 2008 and 2009 and companies were blocking it, while a few traders were using it and getting a sniff of the market early. But this inversion is even bigger.
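The triage step in that nightly review, deciding which findings the agent fixes itself and which go into the approval report, could be sketched like this. The `Finding` fields, the effort units and the threshold are all assumptions made for illustration, not the actual system:

```python
from dataclasses import dataclass

@dataclass
class Finding:
    """One issue surfaced by the nightly review of a repository."""
    repo: str
    description: str
    effort: int  # rough size of the fix, in arbitrary units

# Hypothetical cutoff: anything at or below this is small enough for the
# agent to fix end to end; anything above goes into the approval report.
AUTO_FIX_THRESHOLD = 2

def triage(findings: list[Finding]) -> tuple[list[Finding], list[Finding]]:
    """Split findings into (fix now autonomously, report for human approval)."""
    auto = [f for f in findings if f.effort <= AUTO_FIX_THRESHOLD]
    report = [f for f in findings if f.effort > AUTO_FIX_THRESHOLD]
    return auto, report
```

The design choice worth noting is the explicit threshold: the agent's autonomy is bounded by a number a human set, so larger changes always pass through the report-and-approve loop.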
The individual knowledge worker is running this open source software on a $600 Mac. But you could run it on a $250 Mac. You could run it on an old laptop with a cracked screen. You could run it on a VPS in the cloud, paying for AI tokens. I happen to use Sonnet, but lots of people are having success with open source models that are comparable to Sonnet and 10 times cheaper. And you're getting more capable infrastructure than most giga-corporations will be able to deliver. Maybe that gap will close, but right now it's here, and I think it matters. The software I'm running is not going to be a fringe technology for long. The reason is that Pete Steinberger has joined OpenAI. They're going to allow the open source project to keep running, at least for a while, but they're clearly going to think about how to productize this. Meta has acquired Manus, the Singaporean firm mostly known for its deep research capabilities, and those research capabilities were all about multiple agents being spawned at the same time. Manus has released its own agentic platform, which you can play about with, and I'm sure Meta will want to bring those capabilities in-house. And of course you have other Chinese companies like Kimi. Kimi has the Kimi Claw agent, which you can get if you subscribe; I think it's 35 bucks a month, and you can have a play around with it. What I've shown is proof that this can all work. It's Apple's Knowledge Navigator concept made real. And the moves by OpenAI, by Meta, by Kimi, and no doubt by the others will mean it's coming to everyone: easier to set up, cheaper to run, sandboxed, more secure and more reliable. I could say it'll be this year. That's an easy thing to say because Kimi has already launched Kimi Claw, but from the Western companies, I'm pretty certain it's this year. How has this changed my behaviour? What would I suggest you go off and do?
I think the first thing I would suggest is: think about a task that you are delegating, and try delegating it to an agent this week. One task. Claude Cowork is a really good example of an agent to use if you don't want to go through the palaver and intricacies of an OpenClaw agent. The second thing to do and understand, and this is as true for Claude Code as it is for OpenClaw, is what the specifications are, like that SOUL.md document or the CLAUDE.md document that these agents can draw upon. That is quite an important way of shaping, perhaps not constraining, but shaping its behavior, so it works in ways that work for you and don't annoy you. I'm sure you've felt this with the LLMs anyway: I've found that Claude has always been, for a couple of years now, quite likable. For a while it was likable but just not as good as GPT-5 from OpenAI; the tone of GPT-5, or 4.5.2, wasn't quite as nice as Claude's, so I preferred Claude for a lot of things. I think the same will be true for the AI agents you choose to build. But once you've done those two things, here's the thing I do that I think is worth experimenting with yourself: push the boundaries. It's really hard to get excited about stuff that isn't consequential, like a better to-do list. All of us have been through personal productivity hell. Remember the Milk, Getting Things Done, the Eisenhower matrix? To-do lists are never things that get people out of bed. So start with something that is actually consequential. You can see that I have put the most consequential things I'm working on into Armini Arnold, as, by the way, I have with Claude Cowork and some of the other tools I use. Because if it's consequential, you are going to care about getting it to work, and you are going to care about the experience you have with it.
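To make that concrete, here is a hypothetical fragment in the spirit of those specification documents. This is an invented example for illustration, not the actual contents of a SOUL.md or CLAUDE.md:

```markdown
# Working style

- Be concise. Do not over-explain.
- Verify a credential before claiming it has expired.
- File names: lowercase, hyphenated, dated (e.g. 2026-02-01-notes.md).
- Never deploy without running the test suite first.
- When unsure, ask one clarifying question rather than guessing.
```

Mechanically, a file like this is just text the agent reads at the start of each session, which is why small, specific rules (file naming, verification habits) tend to shape behavior more reliably than broad instructions.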
So, running back over all of this: I've got this Mac mini in my home office, in the studio at the back of the garden. It prepared a briefing on a sovereign wealth fund that I didn't ask for. It teaches itself from its own mistakes. It reviews and improves its own code at 3 in the morning. It can run agents in parallel; my record, I think, as I was telling one of my colleagues, was 43 in parallel overnight on a mammoth piece of work. I've named it after a friendly killer robot, and I've given it that Asimov prefix just to ensure we all know it's a robot. Apart from helping me get lots more done, it has raised some questions: how come this is so much better, given the context I've given it? And what does it mean for the way I now work, and for the frontier of things I can affect because of this capacity and capability? I don't have the complete answer. I know the question matters. The important thing is that the people who start asking it and thinking about it are the ones who are going to be able to shape, to some extent, what comes next. Thanks for listening all the way to the end. If you want to know when the next conversation is released, just hit subscribe wherever you're listening. That's all for now, and I'll catch you next time.
