
A
Today on the AI Daily Brief, nine Codex tips from the Codex team. Before that, in the headlines: yeah, we got a verdict in the Elon-OpenAI trial, but that's much less interesting than Composer 2.5. The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI.
Alright friends, quick announcements before we dive in. First of all, thank you to today's sponsors, KPMG, Robots and Pencils, Bolt and Zencoder. To get an ad-free version of the show, go to patreon.com/aidailybrief, or you can subscribe on Apple Podcasts. To learn more about sponsoring the show, head on over to aidailybrief.ai or send us a note at sponsors@aidailybrief.ai. While you are at aidailybrief.ai, you can apply for our new Growth Engineer role. I'm going to be closing that soon, so if you are interested, get your application in. You can also find a link to register for the third cohort of Enterprise Claw, which is coming right up. Basically, if you get all excited about the Codex talk today and want to get that spirit of agent building across your company, that's what Enterprise Claw is going to be good for.
But with that out of the way, let's talk Composer 2.5 and what it says about Cursor in the AI race. One of the questions coming into this year was whether what swyx calls the agent labs, but which we now might call the harness-first labs (companies like Cursor, Cognition, et cetera), would be able to compete on the model front. The concern for these companies, of course, was that if they were totally beholden to the models from the big labs, and those labs started to move in the direction of building their own harnesses as well, it could squeeze out the space for the Cursors and Cognitions of the world.
And at the same time, the Cursors and Cognitions of the world had something valuable in the form of the data exhaust from the usage of their platforms, which theoretically gave them insight into how people were actually interacting with these models, and which could turn into a valuable asset for training their own models. Whether it could or couldn't, it was clear that this was a direction they were going to start to head in, and that the space between these so-called agent labs and the model labs was destined to close. The model labs were going to move into the harness space. The agent or harness labs were going to move into the model space.
In January, CEO Michael Truell told staff that it was, quote, wartime, recognizing that Cursor's business model was being eroded from both sides. Claude Code was coming after them on the harness side, but they also couldn't keep eating the cost of serving Anthropic models at a discount. With that in mind, he said that the company's number one priority was to build the best coding model this year. The release of Composer 2 in March was a decent first step, but it was still mostly about bringing down costs. Where users adopted the cheap in-house model, it was mostly for simple tasks, and it failed to drive new users to the platform.
The just-launched Composer 2.5 could be a different story. The model appears competitive on the key benchmarks. It scored 69.3% on Terminal-Bench 2.0, which is just behind Opus 4.7 at 69.4%. On SWE-bench Multilingual it scored 79.8%, comparable to both Opus 4.7 at 80.5% and GPT 5.5 at 77.8%. On Cursor's in-house benchmark, which tests more difficult coding tasks, Composer 2.5 scored 63.2%, just about a point behind both Opus 4.7 and GPT 5.5. Now, the coding performance gets Composer 2.5 into the ballpark of usability for serious coders, but the bigger part of the story is the cost.
Cursor is serving this model at just $0.50 per million input tokens and $2.50 per million output tokens, making it half the cost of Opus 4.7 or GPT 5.5. They also seem to have delivered some big token efficiency gains. Their benchmark run on SWE-bench came in under a dollar per task, compared to around $5 per task for GPT 5.5 on extra high settings, or $11 per task for Opus 4.7 on max settings. That has led Cursor to claim that their model is 10x more efficient compared to similarly capable models.
Now, Composer 2.5 is still built on top of the same base model as Composer 2, which is Moonshot's Kimi K2.5. That implies that the entire performance boost came from better reinforcement learning techniques, and suggests that even if people don't switch en masse from Opus and GPT to Composer, there is a ton of room to post-train leading open source models to compete at the frontier, especially around these discrete tasks. Cursor also announced that they're in the middle of training a new model from scratch using xAI's Colossus 2 training cluster. They wrote: with Colossus 2's million H100 equivalents and our combined data and training techniques, we expect this to be a major leap in model capability.
Leon Lin summed up the model release, posting: so basically we got an Opus 4.7 model that costs 10x less. I have to test this. Later in the day, he checked back in with the results, writing: pretty fast and efficient model. Does a great job. I'd say it's almost as strong as Opus 4.7 or in some cases just at the same level. Cheap model, good at front end, still a bit generic design when used without skills. Analyst Max Weinbach agreed, writing: Composer 2.5 is very good. It's good at doing more than just quick iterations of front end. Now I will probably use it over Claude and Cursor. Addressing what had changed from Composer 2, he added: Composer 2 was good at some quick tweaks, but I didn't trust it enough to do more. I trust 2.5 a lot more.
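To make the pricing arithmetic quoted above concrete, here is a minimal Python sketch. The per-million-token prices and the per-task figures are the ones mentioned in the episode. The token counts in the example are hypothetical, chosen only to show how a task can land under a dollar, and `task_cost` and `cost_ratio` are illustrative helpers, not anything Cursor publishes.

```python
# Prices quoted in the episode for Composer 2.5, in USD per million tokens.
INPUT_PRICE_PER_MILLION = 0.50
OUTPUT_PRICE_PER_MILLION = 2.50


def task_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one task at the quoted per-million-token prices."""
    input_cost = input_tokens / 1_000_000 * INPUT_PRICE_PER_MILLION
    output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE_PER_MILLION
    return input_cost + output_cost


def cost_ratio(other_cost_per_task: float, baseline_cost_per_task: float) -> float:
    """Return how many times more expensive one per-task cost is than a baseline."""
    return other_cost_per_task / baseline_cost_per_task


if __name__ == "__main__":
    # Hypothetical token counts for a single benchmark task (an assumption).
    print(f"${task_cost(1_200_000, 100_000):.2f} per task")
    # Per-task figures quoted in the episode: under $1, about $5, about $11.
    # Treating $1 as an upper bound for Composer 2.5 gives lower bounds on the gap.
    print(cost_ratio(5.0, 1.0), cost_ratio(11.0, 1.0))
```

Because the Composer 2.5 figure is only given as under a dollar, the ratios here are lower bounds: at least 5x against the GPT 5.5 figure and at least 11x against the Opus 4.7 figure, which is roughly where a 10x efficiency claim would come from.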
Ellie from Prime Intellect noted that following the xAI deal, which you might remember is where xAI has the option to buy them for $60 billion, Cursor is now competitive with the frontier labs on both model performance and training compute. Certainly Elon seems excited, spamming retweets of all of the excited posts about the model.
And to give you a sense of the type of opening that Cursor might have, especially if you're sitting there thinking to yourself, aren't companies all just signing up with either OpenAI or Anthropic at this point, Chamath Palihapitiya wrote: if you're running a consulting business and you're deploying Anthropic or OpenAI directly into your organization, I'm looking at you, PwC and Accenture, you're letting the fox into the henhouse. OpenAI and Anthropic are openly funding and starting competitors to you, while also using your usage to drive more success for them. This is not a failure on their part, but a failure on your part. Consulting businesses that understand this are adopting a control plane that allows them to arbitrate where tokens go and who generates tokens for them. Controlling the tokens is controlling the spice. Basically, there is a lot more conversation around just how locked in you want to be as an enterprise to these models, which creates an interesting open lane for the harness-first companies that may, yes, have their own models, but are also ultimately model agnostic.
Next up, Cloudflare has published their findings after working with Mythos over the past couple of months, in what could be one of the most useful reviews of Anthropic's secretive new model. Cutting directly to the chase, they write: Mythos Preview is a real step forward, and it's worth saying that plainly before getting into anything else.
We've been running models against our code for a while now, and the jump from what was possible with previous general-purpose frontier models to what Mythos Preview does today is not just a refinement of what came before, it's a different kind of tool doing a different kind of work. And that makes a clean apples-to-apples comparison to earlier models difficult.
Cloudflare goes on to explain two big differences in kind. Unlike previous models, Mythos is capable of creating an exploit chain rather than just detecting single bugs. That means it can synthesize multiple attack primitives into a functional exploit. Cloudflare wrote that the use of reasoning to build complex exploits makes the model work more like a senior researcher than an automated bug scanner. The other big change is the ability to generate proofs. Previous models were quite good at detecting potential vulnerabilities, but they would rarely demonstrate an exploit. Mythos can generate functional exploits, making it far more useful as a debugging tool that doesn't simply generate a list of false positives. What's more, Mythos is able to test and refine its exploits if they don't work the first time, providing additional evidence to fix the vulnerability. Cloudflare said that some other models have been able to find the same underlying bugs, but would rarely go much further.
All of this adds a huge amount of context on why Mythos matters. In the weeks after the preview release, many pointed out that other models could find many of the same bugs that Mythos identified. But as Cloudflare points out, there's a big difference between pointing out potential bugs and providing full code for a functional exploit demonstrating exactly what needs to be patched. Author Daniel Jeffries argues that this is the type of analysis we need around these tools, saying: this is the kind of conversation we need, not idiotic ones about the end of all software.
We need: what is the right answer? Because these models are coming and will get better. So how do we put our heads together and make better and more secure software across the world? And it can't just be patching the 100 or so projects that got access to Project Glasswing. That is not going to help the world.
Lastly today, one little bit of AI drama comes to a close, at least in this version. Elon Musk has lost his case against OpenAI and Sam Altman. After three weeks of testimony, it took the jury just two hours to return a unanimous verdict, one which will be wildly unsatisfying to many and leave many issues unresolved emotionally, even if they are resolved legally. The jury found that the claim of breach of charitable trust was barred by the statute of limitations, meaning that Elon took too long to commence his lawsuit. With it, the claim that Microsoft aided and abetted this breach also fell away. Musk's claim for restitution was also found to be barred by the statute of limitations. And with that, this big, massive, headline-grabbing tech trial fizzled into nothing.
Now, honestly, the trial was already weird even before we got to the verdict. In the weeks before the trial commenced, Elon abandoned his more ambitious claims of fraud to focus solely on the breach of charitable trust. He went to trial seeking two outcomes: the removal of Sam Altman and Greg Brockman as executives, and a casual $134 billion in damages. But even with the reduced scope, the jury didn't need to consider the merits of the case, making their decision based purely on technical defects with the complaint itself. This fatal issue was itself an interesting microcosm of how the case played out. Musk had claimed that Altman and Greg Brockman conspired to, quote, steal a charity, linking these claims to Microsoft's $10 billion investment in 2023.
However, OpenAI convinced the jury that Musk was aware of plans to form a for-profit company as early as 2018, after being sent a term sheet describing the proposed structure. The trial also surfaced Elon's own thoughts on the matter, namely a 2017 proposal to fold OpenAI into Tesla to allow for commercial fundraising. The jury determined that these events started the clock on the three-year limit for Elon to bring this lawsuit.
Last week, The Verge did a pretty good job summing up the case, arguing that it, in their words, accomplished nothing but airing dirty laundry. We certainly got a lot of new information on what went on behind closed doors over the past decade at OpenAI. That includes the power struggle between Elon and other co-founders, and the fateful week where Sam Altman was ousted and then returned as CEO, apparently internally referred to as the Blip. But beyond that, there were no real answers. All we got was a three-week interrogation into the characters of Elon, Sam Altman and other leaders, which, to borrow a phrase from a Mira Murati text message disclosed during the trial, was directionally very bad.
Now, the good news for the AI industry is that as blustery as this whole thing was over here, I don't think that most people in the world outside of AI were paying any attention at all to this. Unfortunately for us, broad popular opinion already has it that AI is just another tool for the rich to become richer and not something that's actually going to help their lives in any meaningful way, at least not at a cost that they're willing to bear. And so more evidence of the billionaires flinging their golden at each other really doesn't change that perspective. Do I think it would be better for AI if all of these leaders had a moratorium on speaking to the public about the industry? Yes. Yes I do. But am I also glad that at least this particular bun fight is done? Yes, yes I am.
With that, I am so glad to close the coverage of that particular saga and move on over into the main episode.
All right folks, quick pause. Here's the uncomfortable truth. If your enterprise AI strategy is we bought some tools, you don't actually have a strategy. KPMG took the harder route and became their own client zero. They embedded AI and agents across the enterprise: how work gets done, how teams collaborate, how decisions move. Not as a tech initiative, but as a total operating model shift. And here's the real unlock: that shift raised the ceiling on what people could do. Humans stayed firmly at the center while AI reduced friction, surfaced insight, and accelerated momentum. The outcome was a more capable, more empowered workforce. If you want to understand what that actually looks like in the real world, go to www.kpmg.us/AI. That's www.kpmg.us/AI.
Today's episode is brought to you by Robots and Pencils, a company that is growing fast. Their work as a high-growth AWS and Databricks partner means that they're looking for elite talent ready to create real impact at velocity. Their teams are made up of AI-native engineers, strategists and designers who love solving hard problems and pushing how AI shows up in real products. They move quickly using roboworks, their agentic acceleration platform, so teams can deliver meaningful outcomes in weeks, not months. They don't build big teams, they build high-impact, nimble ones. The people there are wicked smart, with patents, published research and work that's helped shape entire categories. They work in velocity pods and studios that stay focused and move with intent. If you're ready for career-defining work with peers who challenge you and have your back, Robots and Pencils is the place. Explore open roles at robotsandpencils.com/careers. That's robotsandpencils.com/careers.
Today's episode is sponsored by Bolt.new. Bolt.new is agentic engineering on multiplayer mode.
Designers, product managers and engineers build in the same environment, and the design system agent keeps every screen on brand. No more Frankenstein UIs stitched from a dozen prompts. Whether you're shipping internal tools, moving from prototype to production, or replacing a legacy admin panel, Bolt.new takes your team from concept to deployed app. One personal recommendation: hit Plan mode before you build. I had a project I'd half described in three different prompts, and Plan mode made me actually think through it with Bolt.new before a single line got written. It saved me from rebuilding the same screen probably about four times. Build better apps faster. Start with the link in the description.
So coding agents are basically solved at this point. They're incredible at writing code. But here's the thing nobody talks about. Coding is maybe a quarter of an engineer's actual day. The rest is standups, stakeholder updates, meeting prep, chasing context across six different tools. And it's not just engineers. Sales spends more time assembling proposals than selling. Finance is manually chasing subscription requests. Marketing finds out what shipped two weeks after it merged. Zencoder just launched Zenflow Work. It takes their orchestration engine, the same one already powering coding agents, and connects it to your daily tools: Jira, Gmail, Google Docs, Linear, Calendar, Notion. It runs goal-driven workflows that actually finish. Your standup brief is written before you sit down. Review cycle coming up? It pulls six months of tickets and writes the prep doc. Now you might be thinking, didn't OpenClaw try to do this? It did, but it has come with a whole host of security and functional issues which can take a huge amount of time to resolve. Zencoder took a different approach: SOC 2 Type 2 certified, curated integrations, tighter security perimeter, enterprise grade from day one, model agnostic, and works from Slack or Telegram. Try it at zenflow.free.
Welcome back to the AI Daily Brief. We have had two pretty big think type of episodes in a row, and right now as I record we are eagerly awaiting all the new goodies we're going to get at Google I/O. So given that, I thought it would be a good day to have a bit more of a practical, hands-on kind of main episode.
Now, this of course is the year of the harness, where people realize that unlocking the true power of agents involves getting good at using the software through which you spin up and manage those agents, whether that's something open source like OpenClaw or something from one of the big labs like Claude Code or Codex. Codex specifically has been on an absolute tear this year, going from almost no users at the beginning of the year to mid single digits right now. And as Anthropic has been forced to make some difficult decisions around their pricing model and move to cut off certain categories of usage that were previously being subsidized, particularly outside their own harnesses, OpenAI has taken advantage to pull more of these advanced coding type users into the GPT and Codex ecosystem. What this leaves us with is a lot of folks who are digging into Codex seriously for the first time and figuring out all the ways to make it work for them.
Now, I had been planning a Codex primer episode, but over the weekend I saw this post from Jason Liu from the Codex team talking about the tips that have made his use of Codex perform even better. The post is called Codex Maxing. It was published on Jason's GitHub, and so what we're going to do today is extract the biggest insights from that post as a bit of a 101 to see the best practices in using Codex from some of the folks who built Codex. My presentation version of Jason's post was of course built with Codex. Now, by way of background, Jason said that for a while he had been using Codex for coding-related tasks, but over the last few months it's really become an entire workspace for him.
And part of what makes it valuable is that it replaces the single-instance, give a prompt, get an answer pattern of a ChatGPT-style interface with a broader, more complete experience that can hold context over time and do much more extensive types of work. The core of Jason's tips are nine practices that add up to one larger shift.
The first is using long-running durable threads. Now, you might remember a few weeks ago when one of the recent Codex updates had a new system for compacting context. This is basically the way that, on the back end, the harness collapses and compresses the context from a long-running conversation into just the key elements that it needs to know, clearing out space in the context window to keep the chat moving. Part of what OpenAI insiders noted around that update was that the compaction system had gotten so much better that they could basically keep a set of persistent threads going that never lost the larger context of the thread and could be continuously added to. Jason's experience building a Chief of Staff thread was actually one of the examples that I pointed to as I talked about how you might use this new version differently.
And to try to put a little meat on the bone here of why this sort of durable thread could be valuable: a lot of the features inside apps like ChatGPT are basically just proxies for memory and context. When you use a project and you add a bunch of files to that project, it's effectively about giving each new conversation in that project the ability to go draw from all of that context. But it still has to go draw from that context. It's not necessarily up to date unless you specifically maintain it. And the process of retrieving context from those files isn't always perfect. That's not to mention the UI/UX issues of having a whole bunch of different threads and chats going that you have to sort through to figure out which one you are actually having the relevant conversation within.
The idea of the mono thread pattern is to put key conversations about a particular topic all in one long thread, relying on Codex's compaction to allow that thread to be durable and persist over time. Jason's tip is not only to use this mono thread pattern, but to have a different thread for each of his key work streams. Which is not to say that every single thing that you work on is deserving of this type of mono thread, but key persistent work streams often are.
Tip number two is about voice. Now, this is one that if you are a regular listener, you will have heard me squawk on about endlessly. I actually strongly advocate that anyone who's interacting with agents, or basically doing any work on computers at this point, download something like Wispr Flow as an improved version of their computer's native voice recognition. But with Codex you don't even actually have to do that, because its internal speech-to-text system is basically the gold standard. For Jason, voice is not just about getting the message out faster, it actually opens up a totally different type of relationship with Codex itself. The art of the ramble gives you the ability to provide much more backstory. It allows you to provide richer information about areas of uncertainty versus certainty. It allows you to explain what you do know, what you don't know, what you think you know, what you don't think you know, to name trade-offs, and to allow the AI itself to help you turn messy thoughts into something clear rather than having to do that all yourself. As Jason puts it, a lot of plans get better when the model has access to the messy version of what I think, not just the polished one. This is 100% my experience as well. Could not co-sign harder on this particular tip.
Tip three is an interesting one that takes advantage of a key feature in Codex to break the pattern a little bit of how we interact with AI.
If you are a prolific AI user, you're probably used to an interaction pattern that goes something like this: ask for a particular output, that is, prompt the thing; wait for it to do its work; and then once it does its work and delivers things, figure out what corrections and changes you want it to make. Then that whole cycle repeats. But Codex's Steer feature allows you to do things a little bit differently. Especially once you've got the first artifact that you're reviewing, you can actually start building that feedback even as the tool is working. Steer is the feature in Codex that allows you to add to or update the prompt without stopping the overall flow. Among other implications, this means that you don't necessarily have to get the entire prompt perfect up front. Instead of this sort of brittle upfront planning, you can start a little bit more broadly with the overall goals and constraints, and then as progress comes in, actually steer the conversation. So effectively you and the agent are working in parallel, and you're not just stuck having to sit around wasting time on Twitter as you wait for the AI to do its work. By the way, voice is the perfect medium for this type of steering, because once again, as you observe things while the agent is building, you can just ramble into it. You don't have to have a perfectly constructed sentence typed out every time.
Tip number four from Jason is about memory, and one of the things that's interesting about it is that even though Codex has started to introduce native memory features (you can go to settings, then personalization, then memories), Jason's argument is that while those things are, quote, useful for stable preferences, recurring workflows, project conventions, and known pitfalls, they are not, as he puts it, a replacement for checked-in instructions or an explicit vault. Jason's core argument is that work should leave behind structured memory, not just a longer chat.
And so he's built a whole file system in Obsidian, which, if you haven't used it, is a simple file-based note system that works with your local environment, to turn his threads into a structured set of context that can be called upon later. Talking about his durable threads, Jason writes: a long thread can remember a lot, but that memory is trapped inside the thread unless the useful parts get serialized somewhere durable. The point of the memory system is to turn what the thread learns into an artifact I can inspect, edit and reuse. Jason also shares the specific structure of the vault he puts together, with a top-level AGENTS.md Markdown file that has instructions that say things like: as you learn more about people, make progress on projects, or close an open loop, update the relevant pages in the vault. The vault, he says, holds rolling context around my work: people, decisions, open loops, daily notes, project state, and the bits of understanding that would otherwise get lost between threads.
So for any of you who used the Personal Context Portfolio builder that I shared about a month and a half ago at this point: while that Personal Context Portfolio builder was about putting together the broad context that you would take to any new agent experience, Jason's basically bringing that back down to the project level, in a way where there is a direct flow from the big threads where he's working on things into this vault that gets updated automatically. He also notes that he keeps the vault as a GitHub repo, which allows him to also work in the cloud. This memory section is one of the most overloaded with insights, which is why I'm sticking a little bit more closely to what Jason wrote. For example, he talks about why the review step of seeing what the agent decided to put in the vault, that is, what it thought was important enough to remember, is a valuable step.
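As a rough illustration of the memory-as-files idea described above, here is a minimal Python sketch of an agent writing a dated note into a page of a vault. Everything specific in it is an assumption made for illustration: the `record_note` helper, the `section/page.md` layout, and the bullet format are hypothetical and are not taken from Jason's post, which in this episode is only described at the level of an AGENTS.md file plus pages for people, decisions, open loops, and so on.

```python
from datetime import date
from pathlib import Path
from typing import Optional


def record_note(
    vault: Path,
    section: str,
    page: str,
    note: str,
    day: Optional[date] = None,
) -> Path:
    """Append a dated bullet to <vault>/<section>/<page>.md, creating the page if needed.

    The directory layout and bullet format are hypothetical choices for this sketch.
    """
    day = day or date.today()
    target = vault / section / f"{page}.md"
    target.parent.mkdir(parents=True, exist_ok=True)
    if not target.exists():
        # Start a new page with a heading so it reads well in a notes app.
        target.write_text(f"# {page}\n\n", encoding="utf-8")
    with target.open("a", encoding="utf-8") as handle:
        handle.write(f"- {day.isoformat()}: {note}\n")
    return target
```

A thread could call something like `record_note(Path("vault"), "projects", "animation", "Waiting on reviewer feedback")` each time a decision is made or a loop closes. Because the result is a plain Markdown file, it can be reviewed, edited, and committed to a Git repo like any other text.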
Jason continues: I do not want evergreen threads to quietly accumulate vibes in conversation history. I want them to write down what changed. This person prefers this. This project is waiting on that. This decision was made. This loop is closed. This is also, he says, why I like memory as files. Files force the agent to compress experience into a form that can survive the thread. If the thread dies, compacts badly, or becomes too expensive to keep leaning on, the useful knowledge is still there. At that point, pinned threads start to feel less like chats and more like different workers reading from the same notebook. Some of the other ideas for things that you could put into that memory include rules; taste, that is, what good means for tasks that include design, writing or analysis; lists of relevant sources; anti-patterns, or what not to do; links to key artifacts; and more.
Tip number five is about computer and browser use, although I think the way that my Codex interpretation summed it up, as tools, is a pretty good shorthand. Tools allow Codex to turn into an evidence gatherer. When you give Codex the ability to use your computer and use the browser, it can do things like read files, open pages, run tests, edit artifacts, check visuals and more. And understanding which tool or which environment matters for each different type of work is a key skill. So if the truth and evidence that matters lives in code, documents, logs, CSVs, slide files, PDFs, or other types of artifacts on your computer, that's where you're going to need computer use. When the artifact needs visual inspection, or Codex needs to go check live documents or sources that live elsewhere, that's where browser use matters. And then of course, when the relevant information lives in other systems like Slack or Gmail or GitHub or Notion or Vercel, that's when you're going to use connectors.
On the one hand, this sort of tool use is pretty obviously valuable, but it still does require a bit of setup that can feel like you're delaying yourself when you're just trying to get a thing done. However, as you move from thinking about Codex as just a different interface for the same thing that you would have used ChatGPT for previously, and instead think of it as the guts of an entirely new work system, tool-based access to whatever environments Codex needs in order to have the full context and do all the work it needs to do becomes essential.
And speaking of Codex as a new work system, one of the biggest changes that is only just starting to emerge is the idea of being able to disentangle your work from physically sitting in front of a laptop or desktop. Codex is pushing hard into this area. First it had remote control, and now of course Codex is actually available as a full-fledged feature in the ChatGPT app. For most people the implication isn't going to be that they're going to do everything from their phone now, but simply that they can work more nimbly thanks to these remote control type features. You can capture intent while ideas are fresh, you can help redirect, or you can steer a thread without reopening the whole project. And in the same way that Jason's tip was that steering can be used to compress the time where you're waiting, remote control effectively does the same, but for much longer-running work. If increasingly we have projects that take on the scale of hours, not just minutes, being able to steer them while on the go is a massive productivity enhancement, and so it's really worth taking the time to figure out the relationship between the full-fledged desktop experience and the remote controls that you can use to interact from mobile.
Tip seven is about heartbeats, and anyone who built an OpenClaw will be well familiar with this pattern.
Heartbeats are recurring or scheduled check-ins that let the thread that you're working on wake back up. Heartbeats can be scheduled on a particular time basis, like every half hour or every hour, or they can be tied to specific triggers. A couple of examples that Jason gives include his chief of staff thread. He has a heartbeat where, every 30 minutes, that thread checks Slack and Gmail for unanswered messages to help him prioritize what matters most. This is exactly the sort of feature that was very common in the first early build experiments in OpenClaw. Jason also gives an example that shows how setting up the ecosystem that Codex can interact with can make this sort of heartbeat even more powerful. Talking about an animation project, Jason writes: I had posted a video in Slack and asked Codex to check the thread every 15 minutes for feedback, re-render a new version when comments came in, and reply back into the thread tagging the reviewer. The Slack MCP server could not upload files, so the agent used computer use to press the add file button and post the revised render anyway. In other words, what Jason is saying is that these Slack-specific tools didn't have an upload feature, so the agent just used computer use to take care of that manually. The interesting part, writes Jason, is not just that it checked Slack every 15 minutes. The loop crossed tool boundaries: Slack for feedback, Remotion for the render, computer use for the upload. That is when heartbeats, connectors, and computer use stop feeling like separate features. Together they become a feedback loop that keeps running without me sitting there.
Jason's eighth tip and behavior pattern is around goals, although he fully admits that he's still working to figure them out right now.
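The heartbeat pattern described above (wake up on a schedule, check for new feedback, act on it, go back to sleep) can be sketched in a few lines of Python. This is a generic illustration under stated assumptions, not how Codex implements heartbeats: `run_heartbeat`, `check_feedback`, and `handle` are hypothetical names, and the two callables stand in for whatever a real setup would use, such as a Slack connector for reading comments and a render-and-reply step.

```python
import time
from typing import Callable, List


def run_heartbeat(
    check_feedback: Callable[[], List[str]],
    handle: Callable[[str], None],
    interval_seconds: float,
    max_beats: int,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Wake up max_beats times and handle any feedback found on each beat.

    Returns the total number of feedback items handled. The sleep function is
    injectable so the loop can be exercised without actually waiting.
    """
    handled = 0
    for beat in range(max_beats):
        # Each beat: look for new comments and act on every one found.
        for comment in check_feedback():
            handle(comment)
            handled += 1
        # Wait between beats, but not after the final one.
        if beat < max_beats - 1:
            sleep(interval_seconds)
    return handled
```

For the 15-minute Slack example, `interval_seconds` would be 900, `check_feedback` would return any new comments in the thread, and `handle` would re-render and post the reply. Keeping those steps as plain callables mirrors the point about crossing tool boundaries: the loop does not care which tool does each step.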
The TLDR of the /goal feature, which by the way is now not only in Codex but also in Claude Code, is that when you have a project that has very specific, knowable, and verifiable success criteria, you can use the goals feature to keep the agent pushing against that objective where a normal prompt might just give up. Now I'm actually going to skip over goals here, because later in the week, either as a main episode or as an operator bonus episode, I've actually got a full goals guide also built off of recent tips from the Codex team themselves. But suffice it to say that goals is big enough for an entire episode on its own, as people really figure out how it changes the behavior pattern of interacting with agents. Jason's last tip is about the side panel, and this is one area where I think Jason is thinking about things differently than many others. He writes, "The part of Codex I am most excited about is the side panel. It's easy to think of this as a place where previews happen, but that undersells it. The side panel is where Codex stops being only a chat app and starts becoming the place where work happens." For him, he says, it does three jobs: inspecting artifacts, operating web services, and reviewing changes. And the reason that this is so important is that this is the space that allows him to parallel process and work even as the other agent is working. "The important thing," he writes, "is not merely that Codex can generate artifacts. It's that I can inspect and annotate them without breaking the loop." And I think here it's worth taking a step back to recognize that the TLDR of this entire set of tips is about exactly that, not breaking the loop. How, in other words, do you allow the agents inside Codex to keep working in parallel with their human partner, rather than it being an endless series of turns between the two? I think part of the value of Jason's tips is in even just thinking about that as the desirable behavior shift.
Which is of course not to say that you're never going to have that turn-based interaction with AI where you give it a prompt, you let it do a thing, and then you review it when it's done. Nor do I believe that if you don't have your agent running 24/7, you're somehow not maximizing the value of the system. But for anyone who has found themselves distracted by the context switching as you wait for these ever more powerful tools to do ever bigger jobs, this sort of shift in thinking has a lot of potential to reintegrate those work experiences. So that's going to do it for our nine tips from the Codex team about how to maximize Codex. There will of course be a link to Jason's original post in the show notes. Hopefully this helps you get more out of one of the most powerful harnesses you can be using. For now though, that's going to do it for today's AI Daily Brief. Appreciate you listening or watching as always. And until next time, peace.
The AI Daily Brief: Artificial Intelligence News and Analysis
Host: Nathaniel Whittemore (NLW)
Episode: 9 Codex Tips From the Codex Team
Date: May 19, 2026
In this episode, Nathaniel Whittemore (NLW) offers a hands-on, practical guide to getting the most out of OpenAI’s Codex harness, drawing on a widely shared post by Jason Liu from the Codex team. NLW organizes the episode around Jason’s nine actionable tips, which turn Codex from a simple LLM interface into a dynamic, parallel agent workspace for engineering and knowledge work. The episode also covers recent AI industry news, from major model benchmarks and pricing innovations to key developments in coding agent workflows.
Jason Liu’s “Codex Maxing”—the Practitioner’s Guide
(23:45 – End, with each tip broken down below)
NLW’s guided walkthrough, grounded in firsthand Codex team practices, reframes agentic coding from prompt/response chat to deeply integrated, parallel human+AI workflows. With concrete tips on threads, voice, steering, memory artifacts, tool integration, and persistent objectives, this episode is a must-hear (or must-read) for anyone seeking to seriously level up their skills with Codex or a similar agent harness.