Transcript
A (0:00)
Welcome to the podcast. I'm your host, Jaden Schaefer. Today on the show, I want to talk about OpenAI fighting back against the giant onslaught of features Anthropic has been pushing, some negative PR Anthropic has been getting, but at the same time an incredible new tool called Claude Design that just came out. I also want to talk about where some VC dollars are going in the AI space, some surprisingly interesting things there, and a new term called token maxing. In addition, OpenAI just massively beefed up Codex with desktop control, memory, an in-app browser, and over a hundred plugin integrations. Basically, this is them swinging directly at Anthropic's Claude Code and Claude Cowork, and I think it matters a lot for where coding agents are going in the future. So let's get into it. Before we do, I wanted to mention AI Box. The thing I keep hearing from people is that they're paying for ChatGPT, Claude, Gemini, Perplexity, even Midjourney, and by the time you add all of that up, you're likely at $70 or $80 a month across a bunch of different logins. AI Box gives you access to over 80 different AI models in one interface, and all of it is just $8.99 a month. If you want to get access to it, there's a link in the description to AIBox.ai. In addition, we have something called the AI Box Builder, where you can essentially link together multiple AI models. We build up the entire workflow for you, and you can vibe-build tools without needing to know any code at all. I'm not a developer, and I built this for other people who are not developers. So if you want to check it out, there's a link in the description to AIBox.ai. Okay, the first thing I want to talk about is a company called Factory. This is an AI coding startup focused specifically on enterprise engineering teams. They just closed a $150 million Series A at a $1.5 billion valuation. Khosla Ventures led the round; Sequoia, Insight Partners, and Blackstone all participated.
The founder is named Matan Grinberg. He was a physics PhD student at Berkeley who basically cold-emailed Sequoia partner Shaun Maguire in 2023, and they apparently became good friends, bonding over physics research. Maguire convinced him to drop out, and Sequoia seeded the company. Their customer list already includes Morgan Stanley, Ernst & Young, and Palo Alto Networks, so obviously this is very enterprise focused; they're not targeting individual developers in any way. Grinberg's pitch, and I think where Factory is kind of differentiated, is their model flexibility: they basically let you switch between Claude, DeepSeek, whatever makes sense. Although honestly, Cursor does that too, as do most of the serious players at this point. What I think this does tell us is that even with Anthropic, OpenAI, and Cursor already in the market, enterprise AI coding still has room for some category-specific players. Morgan Stanley isn't going to let some random developer tool run inside their network unless it's built with their compliance and security posture in mind, and I think that's basically the gap Factory is filling. And I think the $1.5 billion valuation says VCs believe there is a real gap here. Of course, Claude Code is trying to get in there as well, and you can look at things like Cognizant just getting all 350,000 of their employees onto Claude and the Anthropic tools. So I think there's probably competition from a lot of players, but it's interesting that they're carving out a niche there. Okay, the next thing I want to talk about is a brand new tool from Anthropic called Claude Design. It's a research preview right now, available to Pro, Max, Team, and Enterprise subscribers, and is powered by Claude Opus 4.7, the model that just came out a day or two ago.
So this is what Anthropic just shipped: basically, you can describe what you want, a pitch deck, a one-pager, a landing page prototype, and Claude generates a first draft. You've probably seen it make web pages before. What's interesting is that you can use Claude Design to come up with mockups ahead of time and then refine them either by editing directly or just talking to it. I actually appreciate having both options, because I've used a lot of tools. I don't want to throw too much shade at Lovable, because I know they do have some direct editing features, but they never worked super well for me in the past. Maybe they're better now. But back then, Lovable would let you describe the website or whatever you're trying to build, it would generate the design, and then you were supposed to be able to click on it and edit directly. It never worked, and sometimes when I went back to the chat afterward, it would undo my edits. It was just kind of bad. So I think Claude has cracked this a little better, though maybe Lovable has leveled up there as well. You're able to export as PDFs, URLs, or PPTX files, and you can send the outputs straight to Canva. Canva has a big integration with them, so you can keep all of your collaboration there. It can also read your company's codebase and design files to apply a consistent design system across all of your outputs, which is actually, I think, the more interesting piece. If you look at this technically, Anthropic is positioning this as complementary to Canva; they're saying, look, we're not going to compete with them, it's a complement. The target audience is specifically people who aren't designers: founders, PMs, startup operators who need to make something look presentable really fast. What I take from this is that Anthropic is continuing to move up the stack.
Earlier this year they launched Claude Cowork, then agentic plugins for specific departments, and now this. I think where they're going is that they're not just trying to be an API company; they want to actually own workflows and surface area. It's the same play OpenAI has been making, and I think you're going to hear more about why this matters when we go into the deep dive later in the episode. Also, shout out to Google, who has been doing this in basically every vertical; Google has Stitch, which is a very similar design tool. So, yeah, I think we're going to see a lot of these players get more into the software itself, beyond just the models, which is pretty interesting. Okay, the next thing I want to talk about is token maxing. There was a funny story on TechCrunch recently about token maxing. Basically, it's the pattern of companies and developers bragging about how many tokens their AI coding tools burn through, as if more tokens used means more productivity. The actual data tells a different story. Tools like Claude Code, Cursor, and Codex are all generating way more accepted code on the first pass, with 80 to 90% initial acceptance rates. But when you look at the same code two weeks later, the effective acceptance drops to 10 to 30%, because engineers are constantly rewriting it. So basically, when companies say, look, 50% of our code is written by Claude, it sounds amazing, and maybe it's true. I'm not saying that's necessarily bad; I use Claude Code heavily with my startup as well.
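To make that gap between headline numbers and reality concrete, here's a minimal sketch of the "effective acceptance" idea. The numbers are hypothetical, chosen just to mirror the ranges the studies cite; this isn't any study's actual methodology.

```python
# Illustrative sketch: "initial acceptance" is how much AI-generated code
# gets merged on the first pass; "effective acceptance" is how much of it
# still stands after weeks of churn and rewrites.

def effective_acceptance(lines_generated: int,
                         lines_accepted: int,
                         lines_surviving: int) -> tuple[float, float]:
    """Return (initial acceptance rate, effective acceptance rate)."""
    initial = lines_accepted / lines_generated
    effective = lines_surviving / lines_generated
    return initial, effective

# Say an agent generates 10,000 lines; 8,500 are accepted on the first
# pass, but only 2,000 survive unrewritten two weeks later.
initial, effective = effective_acceptance(10_000, 8_500, 2_000)
print(f"initial acceptance:   {initial:.0%}")    # prints 85%
print(f"effective acceptance: {effective:.0%}")  # prints 20%
```

That spread, 85% accepted at first versus 20% surviving, is exactly the kind of gap the churn data describes: the raw output number looks great, but the durable output is a fraction of it.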
But what's interesting is when people pretend it's basically a marker of productivity, because GitClear was studying this specifically, and they found that AI users have 9.4 times higher code churn than non-AI users. Faros AI found code churn increased 861% under high AI adoption. And then Jellyfish, looking at 7,500 engineers, found that the teams with the biggest token budgets got two times the throughput at ten times the token cost. Basically, it's a reality check. The productivity gains from AI coding are real, but they're also a fraction of what the raw output numbers suggest. If you're like, I can write a million lines of code a day now, and I used to only be able to write a fraction of that, okay, you definitely are getting real productivity gains; that's not the argument. I just think it's important to check ourselves: the million lines of code are not all good, because if you look at them three weeks later, a big chunk has to be rewritten or fixed. Which is fine; a normal developer writes code, works on it, optimizes it, and fixes it too. Interestingly, senior engineers are less accepting of AI code than juniors, probably because they know which parts are subtly wrong, so when there's a code push, they're less likely to accept it. If you're a manager thinking about how to measure AI ROI, I think counting what gets merged and shipped is important, not just how much is generated. Okay, let's get into Physical Intelligence. This is a robotics foundation model startup that just published research on a new model called Pi 0.7, and I think this might be the most novel technical story of the day.
Basically, what they're claiming is that Pi 0.7 can perform tasks it was never specifically trained on by composing skills it learned in other contexts. The example they're highlighting involves an air fryer: the robot had only briefly seen this air fryer in training, I think just two short clips, and was never trained to operate it in any way. Then they gave it step-by-step verbal instructions, and it figured out how to operate it. In some broader testing, the generalist model actually matched specialized models on jobs like making coffee, folding laundry, and assembling boxes. The researchers said the generalization ability was really surprising to them. So basically, you can train a robot specifically to fold laundry and it does well; but now they have a model that's just a generalist at everything. It's not trained to fold laundry, but you explain step by step how to fold laundry, and it does it almost as well as the robot that was specifically trained on it. That is fascinating. And when you look at the, quote unquote, general models that OpenAI or Anthropic are building, which can do a lot of different things generally well, it's kind of good news for them, because you may not have to have models specifically trained on just one task when it comes to physical robotics. On the business side, Physical Intelligence has already raised over a billion dollars. They were last valued at $5.6 billion and are reportedly in talks to nearly double that to $11 billion. Their co-founder Lachy Groom has a track record of backing Figma, Notion, and Ramp, so obviously that's part of why VC dollars are going his way.
A caveat I'll put on this, if I'm being honest: Pi 0.7 still can't handle a lot of multi-step tasks. It's not doing this autonomously without any coaching. And the robotics field doesn't really have clean benchmarks the way LLMs do. We have Humanity's Last Exam and all these different benchmarks we give AI models on engineering, math, and other areas, and we can tell exactly how good they are at those tasks. There's not a lot of that in robotics, though I'm sure there will be more as the field matures. So for now, you kind of have to trust whatever their demo shows. But if this kind of generalized behavior holds up, it's a pretty significant step toward robots that actually work in messy, real-world environments. This is something I'll be closely watching over the next six months or so. Okay. OpenAI has just released a whole bunch of new features for their desktop app, Codex. It's something I've tried in the past, but I've opted for Anthropic's Claude Code and Claude Cowork in recent weeks. I think OpenAI sees that, and they really want to make a big push to win people back, or to get people trying Codex for the first time. So, huge upgrade to Codex. I'll walk through a couple of the new things, because I think OpenAI is basically swinging directly at Anthropic's Claude Code, which honestly has been crushing it. First, Codex can now run in the background on your Mac, which is phenomenal. It can open applications, click around, and type on your desktop while you keep working on something else. This is actually something I like; Claude sort of does this too.
But I'm going to be honest: even with Claude Cowork, I often have automated tasks running, like, every day at 9am do this, every day at noon do this, and it grabs analytics or data and gets me a report on something. Actually, the thing I love using it for is when there's no API for a service: I'll just have it log into the account, grab the data I need, and bring it back to me. Hopefully those companies offer APIs in the future, but for now that's what I do. In any case, it is annoying with Claude Cowork that when those automated tasks start, if it can't do the work in the background, a Chrome browser suddenly pops up on my screen, and all of a sudden it's clicking on things right in front of me while I'm swatting flies, trying to get it out of the way so I can keep working on something else. Should I have a dedicated computer for it? Possibly. But a lot of the time I just have it running on the side. In any case, OpenAI is trying to combat that and have it work in the background. So it's not just writing code in an editor; it's actually operating your entire machine. That's what I'm excited about: computer use from OpenAI, which they have sort of done for a long time. They were the OGs here, way before Anthropic was shipping things in this space, but their agents really just felt stale. I've tried them a lot in the past, and trust me, if OpenAI's agents six months or a year ago were as good as what Anthropic is doing now, I'd have been shouting them from the rooftops. But it seems like they're making a comeback now. Codex can also run multiple agents in parallel without them interfering with your desktop, which means you can have one fixing a bug, one running tests, and one writing docs, all at the same time.
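That "multiple agents in parallel" pattern is the same shape as running any independent background jobs concurrently. Here's a minimal sketch with Python's standard library; the three task functions are placeholders I made up for illustration, not anything from the actual Codex API.

```python
# Sketch of running independent jobs concurrently, the same shape as
# "one agent fixes a bug, one runs tests, one writes docs". The task
# functions are hypothetical stand-ins, not a real agent API.
from concurrent.futures import ThreadPoolExecutor

def fix_bug() -> str:
    return "bug fixed"

def run_tests() -> str:
    return "tests passed"

def write_docs() -> str:
    return "docs written"

tasks = [fix_bug, run_tests, write_docs]
with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
    # Submit all three at once; collect results in submission order.
    futures = [pool.submit(task) for task in tasks]
    results = [future.result() for future in futures]

print(results)  # ['bug fixed', 'tests passed', 'docs written']
```

The point of the pattern is that each job owns its own isolated work and only reports a result back, which is what keeps the parallel agents from stepping on each other, or on you.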
They also have a new in-app browser, so it can hit web applications directly. They have 111 plugin integrations: CodeRabbit, GitHub, GitLab issues, and more. They have a bunch of exciting new things with their memory feature, so it can remember previous sessions. Image generation is now inside of Codex, and to be fair, Claude does not have any sort of image generation. They also rolled out pay-as-you-go pricing specifically for enterprise and business customers. Now, I think Anthropic is definitely ahead right now, but I would say the plugin ecosystem is probably one of the most underrated pieces of this entire announcement, because they have 111 different plugins at launch and they're going to be adding more. With Claude Cowork, I mean, it's awesome, but I have maybe four things synced up: my Google Calendar and Chrome, so a bunch of Google tools, plus GitHub. Beyond that, there are so many different tools I use that don't integrate well with it; it has to go use my Chrome browser to access them. So anyway, I think a lot of these integrations OpenAI is pulling in are going to be very useful. All right, that is the show. If you're getting value from these episodes, please drop a comment over on Apple Podcasts or leave a couple stars over on Spotify; you can hit the About tab on Spotify to drop a review. Basically, the reviews help the show reach way more people, boost it in the algorithm, and help it out a ton. If you haven't done it already, I would be eternally grateful. Also, if you want to consolidate the AI subscriptions you're already paying for, go check out AIBox.ai; there's a link in the description. 80-plus models, plain-English automations, $8.99 a month. I'll catch you guys all in the next episode.
