A
And you said it's critical that this remain open. Why is it critical?
B
We've built our lives on top of software, and we're kind of at a point where that software is owned by other people. Like companies whose incentives are to maximize profit. With agents, it gets a lot more intimate. Anthropic or OpenAI, they're the good guys right now. But if they have our data, all our memories, our whole life's work, they can convince us of anything, influence us of anything. That's a pretty bad world to be in.
A
Once you start feeding it your knowledge base, essentially your entire business, now they can train their LLM on that. They have access to your entire, in my case, venture capital firm or media company.
B
We're going to give our digital lives and our digital identities to these companies and they're going to rent them back to us. These are like our digital selves.
C
We could have many copies of ourselves like that we fully own and control.
A
I don't want to rent myself back from Sam Altman. That is my personal Black Mirror. Thanks to our friends at PayPal, the exclusive sponsor for This Week in AI. Try the payment and growth platform that's trusted by millions of customers worldwide: PayPal Open. Start growing today at paypalopen.com. All right, everybody, welcome back to This Week in AI. This is the show where we get three amazing CEOs, founders who are deep into building AI, and we talk about the week's news. Pretty simple format: me plus three. This is episode number eight of This Week in AI. You can subscribe at thisweekinai.ai and you can look for us on YouTube at This Week in AI. And what an incredible roundtable we're gonna have today. Kanjun Qiu is here. She's with a company called Imbue, and they are making open source agents, which, Kanjun, is my favorite topic of the moment. Tell us a little bit about what you're building, who's using it, and how it makes money.
B
We really believe open agents have to win over closed platforms. And I think we'll talk more about this later in the show. But right now we're focused on building open agent infrastructure. So, infrastructure for running lots of agents in parallel. So you can take Claude Code: instead of having to run Claude Code directly, you can run something that's out there right now, called manager, on top of Claude Code. And now you can swap out Claude Code with OpenAI Codex. You can swap out any model you want to. And that's what we want to do: give people the power to swap out these underlying models, to commoditize the model layer so that the model providers don't have all of that power over you. And we go into a world where people have a lot more power and AI is a little bit more democratized.
A
Then if people wanted to understand where you are in the market, you're a competitor to OpenClaw and you're building that layer.
B
Yeah, actually the market's a little complex. Most people don't see that there are actually many, many layers. So right now we are not a competitor to OpenClaw, I would say. Maybe a competitor to Claude Code directly, or Claude Desktop, where if you're trying to use Claude Desktop or Claude Cowork, you as a developer could instead use these tools at this layer right now. So right now it's still a developer layer for running lots of agents. We run hundreds of them, kind of autonomously, programmatically. It's like writing a program where an agent is a function that you're calling or an agent is in the loop. But yeah, the next layer we'll work on is the OpenClaw layer, or more the personal persistent agent layer. And I think we're just starting to get to the point where that's possible.
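Editor's sketch: the "agent is a function that you're calling" idea can be illustrated in a few lines of Python. This is not Imbue's tooling. `stub_agent` and `run_agents` are hypothetical names, and the stub stands in for whatever real agent call (Claude Code, Codex, or another model) a developer would plug in.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def stub_agent(task: str) -> str:
    """Hypothetical stand-in for a real agent call; it only echoes the task."""
    return f"done: {task}"


def run_agents(agent: Callable[[str], str], tasks: List[str], max_workers: int = 8) -> List[str]:
    """Call the same agent function on many tasks in parallel.

    Results come back in the same order as the input tasks.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(agent, tasks))


if __name__ == "__main__":
    for result in run_agents(stub_agent, ["fix flaky test", "update docs", "bump deps"]):
        print(result)
```

Because the agent is passed in as an ordinary function, swapping the underlying model means passing a different callable and leaving the orchestration code unchanged.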
A
And you said it's critical that this remain open. Why is it critical?
B
If we think about what agents are, agents are software that we're giving our whole lives to. We're giving our memories, we're giving our workflows, we're building our business infrastructure on top of agents. I'm building my life infrastructure on top of these agents that are checking my email, checking my Slack messages, drafting stuff for me, et cetera. They're convincing me of things. You know, I'm brainstorming with my agents all the time. And so in the last 10 years, we've built all our lives on top of software. And we're kind of at a point where that software is owned by other people. Like companies whose incentives are to maximize profit, which is, you know, good for them but bad for us, because their incentives are not fully aligned with us. There's this misalignment issue. And so with agents, it gets a lot more intimate. Anthropic or OpenAI, they're the good guys right now, but if they have our data, all our data, all our memories, our whole life's work, they can convince us of anything, influence us of anything, and we are beholden to them. If we're locked in and there's not that much competition, then that's a pretty bad world to be in as a human.
A
The lock-in is the key there, because once you start feeding it your knowledge base, essentially your entire business, now they can train their LLM on that. They have access to your entire, in my case, venture capital firm or media company. They learn all that. I increasingly want to be open source for the whole stack and open hardware for the whole stack. Just get me a Mac Studio or an Nvidia. I want all this to be independent because I agree with you. It's absolutely too important to be given to any one of these big companies. You're essentially giving them your entire business. Karina Hong, you're here and you're building an AI mathematician to verify code. Welcome to the program. Okay, now, math. I was an incredible B minus student, which means I could have done better.
D
Good.
A
I may not have to do better now in the age of AI. Like, AI is just going to be so much better at math than any human being could possibly be that, what's the point? So why is it necessary to verify code with an AI mathematician? And who's using this and why?
D
Yeah, so we're seeing a massive amount of code being generated at an unprecedented speed right now. And the question usually with vibe coding is, well, is code review or testing up to speed? And then, even if you see all your tests pass, in certain cases where the thing is safety or mission critical, you don't want to have an untested case that was missed. And in those edge cases, there's first a lot of value in actually verifying those, but also a lot of value in trusting that you don't have a catastrophic outcome. So something called formal verification has been around, I think, since the 1980s. There are formal verification experts, I mean from before the age of AI, who do program verification, so they will formally verify the program generated. And this has actually made it into some really high profile large projects. Actually, in the Paris subway system, the automatic switching was formally verified. The labor trade union made sure to negotiate that thing. So, a trade union for technology, I guess. And then there's this other case where, I think, for the European Space Agency, part of the Ariane spacecraft was formally verified. And AWS obviously has been pushing toward automated reasoning even before the ChatGPT time, for the last six years at least. And now in the time of agents, you want something to be 100% correct, or to make sure that it's functioning properly, before you're comfortable passing it to other agents. And then you have 1 billion agents running. So I'd argue something quite, you know, provoking, which is that superintelligence is meaningless if it's not verified. I don't want Schrodinger's superintelligence. So that's what we're doing. We're building an AI mathematician that always gives you the proof of everything. So, proof step by step.
And this AI mathematician has actually got the sixth perfect score ever in the 100 years of Putnam exam history, which is the hardest undergraduate math competition. The first five are all humans. So this is the only AI that got, you know, 120 out of 120, trained on formal verification data.
A
Code review and manually testing, or using an LLM to test your code base, is not as strong, or it's more brittle, than doing a formal verification using mathematical analysis. Maybe you could explain just briefly that difference.
D
100%. So I'll give an example. It's actually code in a different way: you think about software, but in the hardware space there's also code. So in chip design, the design-to-verification cycle ratio is like 1 to 2, 1 to 3, 1 to 4, and the design-to-verification team size is also 1 to 2, 1 to 3, 1 to 4. So you basically have a lot of designs that are taking very long verification cycles, and people are actually already using SMT-based formal checking tools like JasperGold by Cadence. In a way, you have a lot of constraints and you're trying to operate under those constraints to verify that the property is actually satisfied, or that a counterexample can be found. And we have been, you know, trying to use the same AI mathematician to verify certain circuits that are not very large but are outside of the current capability of any SMT-based formal checker or searcher in the industry.
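Editor's sketch: the "property is satisfied or a counterexample can be found" outcome can be shown on a toy circuit. The code below is plain exhaustive enumeration in Python, not an SMT solver and not JasperGold, and the full adder and all function names are hypothetical illustrations. Real formal tools reason symbolically precisely because enumeration stops scaling beyond a handful of inputs.

```python
from itertools import product


def full_adder_gates(a, b, cin):
    """Gate-level full adder: returns (sum, carry_out) as 0/1 ints."""
    s1 = a ^ b
    return s1 ^ cin, (a & b) | (s1 & cin)


def buggy_adder(a, b, cin):
    """Same circuit with the second carry term accidentally dropped."""
    s1 = a ^ b
    return s1 ^ cin, a & b


def spec(a, b, cin):
    """Arithmetic specification: the two output bits of a + b + cin."""
    total = a + b + cin
    return total % 2, total // 2


def check_property(circuit, reference, n_inputs):
    """Return None if the circuit matches the reference on every input,
    otherwise return the first input tuple where they differ."""
    for bits in product((0, 1), repeat=n_inputs):
        if circuit(*bits) != reference(*bits):
            return bits
    return None


if __name__ == "__main__":
    print(check_property(full_adder_gates, spec, 3))  # property holds
    print(check_property(buggy_adder, spec, 3))       # a counterexample
```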
A
And who's your customer? Are you selling it to people making vibe coding software, or is it developers and corporations with developers, as a backstop against these large language models and Codex, you know, hallucinating or making mistakes?
D
So we were selling to folks who have hardware verification. Then we're hoping to have distribution partners, like codegen companies.
A
So they would say hey, we're going to put you into here. So you're selling into that enterprise which is then selling into many different potential partners.
D
Or call our API.
A
Yeah, all right. And Jonathan Siddharth is with us. He is the co-founder and CEO of Turing, and they provide data for all the frontier AI labs' models, and then you do some sort of enterprise management of that. Explain this business. What type of data are you selling to these large language models and how are they ordering it? Is it reoccurring? Is it like one-time projects where they ask you to get 10 lawyers together in a room to, you know, answer all these hard questions? How does it work?
C
What we focus on is accelerating superintelligence to drive real economic progress. And first, we work with all the frontier AI labs to provide data to improve their models for coding, mastering all types of enterprise workflows, and frontier STEM. So it's automating SWE, automating everything a human does in front of a computer, and automating frontier STEM. Now in doing that, we get to learn the limits and capabilities of these models. We see the jagged edges of their intelligence. We take that knowledge and then go to enterprise. This is business number two for Turing, where we work with some of the largest financial institutions in the world to build end-to-end AI systems to deploy superintelligence. So it's almost like a Palantir for AGI. But when we build these systems, Jason, we actually see where these models and agents break in practice. And we do error analysis based on where they break and take those learnings to improve the frontier AI models: make the model smarter, solve even bigger problems in enterprise, see where things break, make the model smarter, solve even bigger problems in enterprise. And we run this loop. We call this a superintelligence accelerator. Our view is that data and deployment is the moat, and that's the key to actually making smarter models. And perhaps we should be using Imbue systems in some of our enterprise deployments. Maybe an open agent architecture. Sounds great.
B
Yeah, I think they would be happier with that. Go ahead.
D
Sorry.
C
That's great. And Karina, like, hopefully. So some of the things that we do is provide data to systems that go into some of the most well known agentic coding systems today. And our data has also gone into making some of these models really good at math, for many of the frontier models. So there's good synergies there. And Jason, to your question, you asked a lot of questions there about how does this work, is this recurring, all of that. So I would say it's reoccurring. As you can tell, investors really like it when I say that it's reoccurring, not really recurring. But I would say there's unlimited demand for high quality data. The scaling laws are continuing to hold, meaning more data, bigger model, more compute means the models smoothly keep improving. It's increasingly hard to generate that data because as the models get smarter, the floor of human intelligence that's needed to advance the models also gets higher.
A
They've scraped everything that could be scraped off the web, stolen, with permission, without permission. It's all been sucked into these models one way or the other. Even if you're using open crawl or whatever it is, there are web pages that are copies of web pages, archive.is and the Wayback Machine. So all of this has been absorbed. So you need to come up with new sources of data. Do you record people's computer sessions and look for those mistakes, or is it just at the chat window interface?
C
So there's new types of data. I would say there's three types of data. The first is to hire expert humans from every domain of economically valuable activity. So Jason, think of this four-dimensional matrix. First dimension, every industry you can think of: financial services, life sciences, healthcare, retail, etc. Second dimension, every function: software engineering, sales, marketing, finance. Third dimension, every role in the org chart: if it's finance, CFO, director of FP&A, head of accounting. And fourth dimension, what's the workflow that a human does? A CFO may do the workflow of preparing for a board meeting, or preparing for an earnings call, or closing the books on monthly financials. But to automate that workflow, you need to understand what that workflow is. And you need data for either imitation learning, where a human demonstrates how to do that workflow, or data for reinforcement learning, where you have a prompt, which is a human talking to the model, and a verifier for how you tell if the output deliverable was high quality. For example, Jason, when we were prepping, you were guiding us on, hey, what's a good marketing message, right? Like, for a company, you are basically defining a rubric. You are basically saying, hey, don't have it be super verbose. Don't have it full of buzzwords. No buzzwords. Be conversational.
A
Your customer should be able to come up with that sentence. That's actually the key to describing a business. If you really want to describe any technical business, you just ask your customers, hey, write me a one word explanation of why you use our product. They're like, "To get a cab at the airport." "To find an apartment with a kitchen in Tokyo so I can cook my own meals." Airbnb. Like, they will literally tell you what your product is, and you'll be like, "We're a global community of, you know, homesteaders." It's like, oh my God, my head is spinning. "We demand aggregate." And it's like, you just rent homes. You rent cool homes. It's like, that's it.
B
Just do the unsexy thing.
A
Yeah, I mean, there's like, what you tell VCs or the press, and then there's what you tell customers or what customers tell you. And like, man, when you get customers just explaining to you why they did your accelerator, it's like, I wanted to raise money and find a co-founder. You're like, okay, yeah, that's like, the top two needs that founders have. All right, listen, there is a ton of news this week, and we're gonna get into it here. The first one is just Anthropic's insane run rate and where that money might be coming from. It looks like Zuckerberg might be the largest customer of Anthropic's product by far. Anthropic just reported that they hit a $30 billion run rate, up from 9 billion just six months ago. Here's what the financials actually look like. Here is a revenue chart that, in my group chat with investors, people are losing their minds over, because we've never seen anything like this. Here's Anthropic in red, OpenAI in green. And it seems that the flipping has just happened, and Anthropic is now selling more tokens than OpenAI, generating more revenue. And this supposedly has the team over at OpenAI on tilt. So let's just talk a little bit about Anthropic. Karina, what are your thoughts on this generational run, this revenue ramp that nobody can really fathom? Why is it going up this fast? Is it OpenClaw agents pounding their servers? Is it consumers becoming aware of it? Or is it really just that there's so much coding being done and coders are hammering their APIs?
D
Yeah. I remember in 2024, when Anthropic was working on coding, I had other friends who were at OpenAI and other frontier labs, and they kind of thought of that direction as just another vertical application. I mean, no different from enterprise consulting, for example, or a finance use case. But I think that coding is everything. I mean, software eats the world, and everything that is in the real world can in a way be controlled from the software stack. And in a way, math is code and code is math, at least from our belief. So I really think that in this coding and reasoning capability, Anthropic has really differentiated itself. Everyone that I know is using Claude Code or Cursor, and I think that, with Codex, there are new releases and people have excitement, but it's not enough for them to switch. So I think that this sort of coding as a very differentiated moat could explain a lot of that. Honestly, Claude has also been amazing in helping me write. I write, you know, articles. I sometimes write, you know, poetry. I love literature. And somehow Claude is even better at writing than GPT by a lot. There's like a feeling of taste that really, I think, shines through. So I'm just basically using Claude for everything at this point.
B
That's so interesting, including poetry.
A
So when you're doing poetry, do you say, write a poem or do you say, here's my poem, give me some ideas.
D
Here's my poem, help me revise. Weirdly, I find Claude able to come up with out of distribution, like actual creative suggestions that I was not, you know, I definitely did not find that for Gemini, for OpenAI sometimes. But it's not the kind that will surprise you and will wow you.
B
That's actually so interesting because I mostly use GPT-5.4 for writing, and I also use Claude very intensively. But my Claude just is so good at reasoning and not very poetic. And so maybe this is what happens when it stores a lot of memories on you and kind of drifts from the default model. But I haven't experienced that. One of the things back in 2022 that a lot of us inside the industry, or inside the research field at least, knew was that coding would make models better at reasoning. There was a very clear correlation: if you train on code, the models improve at reasoning. And that's kind of been Anthropic's strategy the entire time, like Dario's very focused on being good at writing.
A
Why would coding help your reasoning? I mean, I understand, hey, if you learn how to play chess or poker, you're going to understand strategy much more, and chunks and these heuristics. But explain why code does that for reasoning, which is what a lot of people are doing at work. Hey, make a business plan for me, or analyze my financials. Why would code inform that so well?
B
Yeah, it's kind of like what they say where in college, if you study STEM, that actually still really helps you in the business world to reason through problems from first principles. And the reason, structurally, is that when you're training these models, they learn embeddings, these abstractions on top of the data that they're getting. And when it is trying to learn how to code, it kind of learns these good abstractions for, okay, this follows after this, which follows after this. And that does make sense. And we can verify it because we ran it and it worked. Whereas, this doesn't follow from this, and we ran it and it didn't work. And so it gets really good, fast training data. Whereas in the real world it's really hard to get good verifier data. So that's why, Karina and Jonathan, I think you probably have more to say about this. Your work is basically, directly, how do we verify real work and coding.
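Editor's sketch: the "we ran it and it worked" signal can be illustrated as a pass/fail reward computed by executing a candidate program against test cases. This is a toy under stated assumptions, not any lab's training pipeline. `verifier_reward` is a hypothetical name, and `exec` here is unsandboxed, so it is only suitable for trusted snippets.

```python
def verifier_reward(candidate_source, func_name, test_cases):
    """Return 1.0 if the candidate code passes every test case when run, else 0.0.

    test_cases is a list of (args_tuple, expected_result) pairs.
    Any exception (syntax error, missing function, crash) counts as a failure.
    """
    namespace = {}
    try:
        exec(candidate_source, namespace)  # run the candidate's definitions
        func = namespace[func_name]
        for args, expected in test_cases:
            if func(*args) != expected:
                return 0.0
    except Exception:
        return 0.0
    return 1.0


if __name__ == "__main__":
    tests = [((2, 3), 5), ((-1, 1), 0)]
    good = "def add(a, b):\n    return a + b\n"
    bad = "def add(a, b):\n    return a - b\n"
    print(verifier_reward(good, "add", tests))
    print(verifier_reward(bad, "add", tests))
```

The point of the sketch is that the label comes from execution, with no human grader involved, which is why code yields cheap and fast feedback compared with open-ended real-world tasks.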
D
So I actually did Stanford Law School for like two years, and I kind of came in with absolutely no liberal arts education. I didn't have a single assignment during college that was a reading assignment. I did math and physics, so it was all problem sets. Somehow, for contracts, for tax, for bankruptcy, for corporations, antitrust, I was able to just basically ace the class. The math literally does transfer to these various sorts of very structured legal reasoning in this case. But for cases like civil litigation or criminal law, where it's more fuzzy fact stories, I was not able to do well. So I think there's something interesting there on the sort of verifiable data. So what we are doing with the AI mathematician is we take the formal mathematics route, where we kind of train on computer programs for proofs. Now that's very different from how other people, especially frontier labs with foundation models, do it. They train on chain of thought. That's informal. For chain of thought, it's hard to sort of run it and then just see how each step follows, versus we use Lean, and this is differentiated data. And what we found is that on the Putnam exam we actually beat the top scoring LLM, which is DeepSeek, with I think 103 or something out of 120. So it's the first time formal math, with far less compute and a far smaller data budget, was able to surpass informal math. So you see a similar sort of thing with coding and general reasoning. But I'm sure Jonathan has lots to add on all sorts of data.
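Editor's sketch: the contrast between an informal chain of thought and a proof written as a program can be seen in a very small Lean 4 example. These two statements are illustrative only and are not taken from the speaker's system or training data. The point is that the Lean checker either accepts every step or rejects the file.

```lean
-- A statement and its proof, checked mechanically by Lean 4.
-- `Nat.add_comm` is a lemma from Lean's core library.
theorem sum_comm (a b : Nat) : a + b = b + a := by
  exact Nat.add_comm a b

-- A concrete fact verified by computation.
example : 2 + 2 = 4 := rfl
```

If the proof of `sum_comm` cited the wrong lemma, the file would fail to compile, which is the sense in which each step can be "run" rather than read and trusted.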
A
I'm curious also what OpenAI, what are the people at OpenAI thinking, Kanjun? Like, you must know people all over the industry. Like, if they're watching this, where, you know, they've got infinite competitors from open source, and they shut down Sora, they get rid of this Disney deal, they're raising all this money. There's a report that the CFO and Sam might not be aligned in terms of the build-out. I don't know if you saw that story, with the amount of money being raised. They just raised over $100 billion. And then you add to that this New Yorker story, which felt like a bit of a nothing burger, but kind of rehashed all of the Sam Altman unique personality aspects of, like, he's a people pleaser, but then he's got a loose relationship with the truth, I guess would be how it was explained by his colleagues. Like, is this why people are leaving OpenAI? There's been a lot of defections. What do you think, Kanjun?
B
You know, I don't know how much I want to say in public about this, but I think that there's a reason why you've seen all of OpenAI's top leadership leave consistently over time, including Dario, who was the first, to start Anthropic, to start competing AI labs that are focused on safety. I think that says something very specific about, like, oh, we're building this very powerful technology and I need to do something different that is not this, at this company. I think one thing that's very striking is last year, folks were quite concerned about Google. That was the primary competitor. No one was concerned about anyone else, just Google. And they definitely didn't feel like they were winning by default. And now it's very clear that Anthropic has made it to that list. It's not just Google, it's Anthropic. And there's some questions about should we go enterprise, should we go consumer, where do we compete? So that's something that.
A
What would you do if you were them, Jonathan? Because ChatGPT is the verb. It's the Uber, it's the Google it. You know, it's very rare to get that status of I'm calling an Uber or I'm DoorDashing this. And they have that in consumer, and it seems like they're now ceding that to Gemini. And now they're going to get rid of the consumer products they were working on. They were going down the Disney path, going to maybe make it even more accessible for people. What's the strategy, you think, over there? And then buying a nascent podcast. Lots of weird moves that would lead one to believe that they're doing tons of side quests and then shutting them down. What's your take, Jonathan? What is the discussion at drinks at night or, you know, at pickleball or whatever, when people are, you know, in and around these companies and have offers from them?
C
Firstly, Jason, I have a ton of respect for both Sam and Dario, and I think the key is coding and enterprise. I think that's the answer to a lot of this, and that's the reason for all this crazy growth: the focus on coding and enterprise. And coding is key. Kanjun laid it out really beautifully: when the models get better at coding, they get better at out-of-domain tasks in reasoning, math, STEM. We don't fully understand why, and people believe that it's because there's something about coding that teaches you to think algorithmically, teaches you to think step by step. Coding is lower ambiguity compared to other natural language. So there's been some research on how there is transfer out of domain with coding like with nothing else. This is true, I think, of coding and math. So I'm glad Karina is working on automating math. Coding and math, I think, are the key. The second reason, Jason, is when you win in coding, it actually accelerates AI research itself. What's the number one thing AI researchers do? Write code, implement papers, execute new ideas. So if you win in coding, you will just improve at a faster rate. That's reason number two. Reason number three is a lot of things that don't feel like coding are actually coding. For example, when you ask a search engine a question like, hey, what are the key trends in AI investing in 2026 relative to 2025, what might that search engine be doing behind the scenes? It'll come up with a plan. It'll execute a query to PitchBook, it'll execute a query to Crunchbase, write some Python code to analyze the data, use NumPy and Matplotlib to plot the results, and share it back to you. That's the model writing code without you thinking about it. Yes, a CFO that's preparing board materials could be writing code to calculate stuff like LTV, CAC, magic number, all these fun things. So coding is really, really key. And I think both OpenAI and Anthropic are really focused now on coding and enterprise, which is wonderful.
And in consumer, I feel like the feedback loops are fairly well established. There's ChatGPT, there's Gemini, there's a few others. And when you have that feedback loop, when you know what real humans ask your AI assistant and you know when you did a good job and when you didn't do a good job, you know when to work with companies like Turing to generate data to improve in the areas that you're weak in. And in the same way that Google had a lead in search a while back relative to Yahoo and Bing, Google's improvement rate was just faster because they had more usage. In consumer, the flywheel is set. In enterprise, the flywheel is wide, wide, wide, effing open. It is open and it doesn't transfer as much. The person that wins in financial services may not be the person that wins in life sciences or the person that wins in healthcare. But you gotta deploy, you gotta let the models touch reality.
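Editor's sketch: the board metrics mentioned above (LTV, CAC, magic number) are the kind of small calculation a model might write code for. The formulas below are commonly used simplified definitions, chosen here as an assumption, since companies vary in how they define each one. The function names and sample figures are hypothetical.

```python
def cac(sales_marketing_spend, new_customers):
    """Customer acquisition cost: spend divided by customers won in the period."""
    return sales_marketing_spend / new_customers


def ltv(avg_revenue_per_customer, gross_margin, churn_rate):
    """Simplified lifetime value: margin-adjusted revenue per period
    divided by the fraction of customers lost per period."""
    return avg_revenue_per_customer * gross_margin / churn_rate


def magic_number(revenue_this_quarter, revenue_last_quarter, sm_spend_last_quarter):
    """SaaS magic number: annualized quarterly revenue growth
    per dollar of the prior quarter's sales and marketing spend."""
    return (revenue_this_quarter - revenue_last_quarter) * 4 / sm_spend_last_quarter


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only.
    print("CAC:", cac(500_000, 100))
    print("LTV:", ltv(1_000, 0.75, 0.25))
    print("Magic number:", magic_number(1_200_000, 1_000_000, 400_000))
```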
A
And if we are getting to the end game in terms of the amount of data and the training, then it becomes a game of how do we make the tokens less costly, Karina? Because that's what everybody's looking at right now. And there's a report, we'll pull it up here, we'll show it on the screen. There's some leaderboard contest inside Meta for who can burn the most tokens. Which is like judging a marketing agency by who can buy the most billboards and ads and TV commercials, and that's their success, without ever looking at what those commercials and billboards actually did. There's a report, and I don't know if this tweet is correct, but with back of the envelope math, they're like, oh my God, is Meta 5 billion, 10 billion of this $30 billion number? And what is Meta thinking here, when you read this story, Karina? Is it incredibly stupid or is it brilliant? Because if your team embraces this and burns through a bunch of tokens, well, at least they're using the tools. You got everybody off the bench to use the tools. So maybe it's just a crazy thing where you're just immersing everybody. Here's a tweet from Jon Chu from Khosla Ventures on Twitter: "Plenty of my Meta friends told me folks have been building bots that just run a loop, burning tokens as fast as they can due to this policy. An absolutely stupid policy, and similar to how Meta uses LOC to measure engineering output. Managers are supposed to use it as a proxy and dig in to understand more complexity, but plenty of managers are lazy and just don't." Karina, your thoughts?
D
Yeah, that's fascinating. I mean, I'm looking at this sort of, you know, comparison: the marketing team who spent the most money. Oh man. I guess it's like a Michelin star restaurant, like, book the whole venue's tables is what we'll do. But I don't know, I think for Meta this is a way to, I guess, force more AI usage, especially given, I think, the push to really innovate and use all the AI vibe coding tools. But any metric can be gamed. I mean, this is just something that people have been realizing since the beginning of time: anything that you say is a metric, and especially if it's tied to a performance review, PSC, then it's really a survival game, right? And in a way, I think big tech, and especially with the layoffs, has been kind of in a squid game. I think things are rough there. The one direct indicator we see is the real estate market. Like, the housing prices in Mountain View are really going down, and it's so much worse than this time last year. I don't know, I think a lot of the sort of, you know, senior folks at Meta could use a little bit of a push like that. But also those ones beyond level five, they almost have guaranteed tenure in a way, even if they kind of mess up one project or another. It sounds like it's more the junior folks, those that really have this hacker spirit and build a loop. That's just interesting.
C
Humans reward hack too.
A
Yeah, well, I mean, if you show me an incentive, I will show you an outcome. And if the incentive is I'm being judged by this, then, if you want to save tokens, all you have to do is say, hey, speak like a caveman. Just tell me if you got the task done, yes or no. Don't explain all the details, don't show me your work, just confirm it worked, yes or no. And then you could just say, hey, talk like an MBA bullshit artist. And it would just give you all this flowery language, and you burn up millions more tokens. Somebody said the way they were getting Claude down 80% was to just have it talk like a caveman this week, which I thought was pretty interesting. In terms of the agent space, Kanjun, explain what you think is going on here in terms of the embracing of these tools. And then maybe the difference between the levels of developers: is this actually making a junior developer move up the stack in terms of ranking, or is this AI slop? Like, what's the reality on the ground? Is it making bad developers appear better, but they're not actually better right now? Like, what's the game on the field in 2026?
B
We actually went from having one product to 10 products in January, and we split our team to be basically max three people per project. And it's because coding agents, Claude Code in particular, but also Codex, actually are at a point where you can write high quality code with them, like very high quality code. By default they will still write mostly slop, but with good infrastructure for running these agents you can actually do a lot autonomously. So our team probably spends, you know, about one FTE per team on token burn.
A
Got it. So for every three developers, there's essentially a fourth in tokens.
B
Exactly. And the.
A
Does that seem normal to you or appropriate, or does it seem crazy?
B
I think it's. I think it's actually low. Like, so right now we're at a point where. So part of why we build open agent infrastructure is it accelerates us. Like you were saying, Jonathan, training models for coding accelerates researchers when they're trying to train more models, they write mostly code. Well, building agent infrastructure, that makes you much better at coding and makes these coding agents run for a much longer period of time. Like, we think general agents are mostly just going to be writing code a lot of the time to do what they're trying to do, because that's one of the most reliable ways to accomplish a task is to write code and then verify in code. And so a lot of what we do.
D
Yeah. Can you measure successful token counts? Like, you know, is there a way for them to refine this metric, like in your, in your expert view, you know, to measure like those effective tokens, whatever that means?
B
Yeah, we actually, I mean, we're a small team, we're only 30 people. So we measure internally by like, each team tries to figure out, like, how long can we autonomously run these agents and have them produce PRs that don't need any review or editing. So we are looking at like, how many like, edits are made in a pull request. A pull request is like when you submit code, um, and like one of our teams is running. One of our teams, my CTO is running these agents overnight and in the morning, waking up to about 60 or 70 pull requests and reviewing them. A lot of them don't need any changes and just pushing them. He's gone from. He's like writing almost 10,000 lines of code every day right now. I mean, lines of code are not a good metric, but I mean, that's just dramatic. That's like.
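As a rough illustration of the kind of metric described here (the share of agent-written pull requests that merge with no human edits), here is a minimal Python sketch. The `PullRequest` record, its `human_edits` field, and the `no_touch_rate` name are all hypothetical and are not taken from Imbue's tooling; real data would have to come from whatever code-review system a team uses.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class PullRequest:
    """Hypothetical record of one agent-authored pull request."""
    number: int
    human_edits: int  # assumed: count of lines a reviewer changed before merging


def no_touch_rate(prs: Iterable[PullRequest]) -> float:
    """Fraction of pull requests that needed zero human edits."""
    prs = list(prs)
    if not prs:
        return 0.0  # avoid dividing by zero when nothing was produced
    clean = sum(1 for pr in prs if pr.human_edits == 0)
    return clean / len(prs)


# Toy overnight batch: three of four PRs merged untouched.
overnight = [
    PullRequest(1, 0),
    PullRequest(2, 3),
    PullRequest(3, 0),
    PullRequest(4, 0),
]
rate = no_touch_rate(overnight)
print(f"{rate:.0%} of PRs needed no edits")
```

A ratio like this is harder to inflate than raw token or line counts, though it can still be gamed, for example by reviewers merging without reading.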
A
And do you have a fear that we're now, because of this abundance, going to make incredibly over complex pieces of software as opposed to elegant pieces of software, because constraint made people write better code. Right. You have only a certain number of developers and you have to be thoughtful. Hey, these 12 features, we're not going to do them right now. We're going to focus on these three features. What I'm starting to see is people can build so fast. They're building monstrosities in terms of products. So how do you think about managing that?
B
Yeah, it's kind of like saying like, oh, because we now have a lot of consumer products in the world and we buy things from Amazon, do we build overly complex houses or interior design as opposed to like really simple, bare bones interior design? The answer is like, yes, but it's like much more expressive. You know, we've run experiments like refactor the entire thing automatically in 24 hours after writing tests, like full test coverage on the thing. That's something you can do now. So you can make your product really complex, you can have scope creep and add a bunch of features, and then you can just redo the entire thing based on what you learn pretty quickly. And so I think what we actually see is not junior engineers versus senior engineers. Junior engineers are being helped, but the senior engineer knowledge of how you design a system is still super, super critical. I think those, you know, those things, the improvements kind of even out. But overall the workflow for writing code is changing dramatically. And I think this time next year you'll see that the workflow for software engineering is super, super different. And how you do something else.
A
How will it be different? And then Jonathan, I'll go to you to get your input on this. But Kanjun, how is it going to be different? If you had to make a prediction, what in the process is the key change next year?
B
Yeah, one thing we're playing with is can we grow code so, so can we have models be the ones like growing and maintaining code? We're playing with something where you can like text the agent and the agent will, you know, be writing the code and evaluating it fully and writing tests and then coming back to you. And so like people call it like, I think it's more interesting than just being a maintainer of a project, but like, can the model be the one to grow the project and experiment with things and try things? I think that's something that we're going to see more and more of over the next few years, that a lot of code is actually generated by the model. I'm curious actually, Jonathan and Karina, how you use coding agents and how does that affect your engineering workflows?
C
I love how at the bleeding edge you are with, let's call it AI assisted engineering. What is your advice for engineering managers doing product building today in the post-Claude, post-Claude Code agentic era? Maybe top three takeaways from the way you run engineering. You said something about like you running like thousands of Claude Code instances. I feel like you and the Claude Code team might be at the bleeding edge. And Karina, would love to hear how you run engineering at your company. I'm happy to share some stuff on my side as well.
B
Yeah, this is not a product pitch, but we actually just open sourced our infrastructure for running these agents. It's called Manager, mngr. We open sourced it last week and it's a library for orchestrating agents. It's like a very simple library. So it lets you encode things like: for every test in the last week that had a flake, like, you know, sometimes it passes, sometimes it doesn't pass, fix that test. Or for every user flow in my workflow, test that user flow every time X happens. So you can write these like, you know, agent programs where agent running is part of the program. And the reason we open sourced it is because we really want other people to be building on this like more open infrastructure. You can swap out the models underneath, you can swap out whatever you want, you can add your own memory, whatever it is, like store context locally. You get to keep your own data, enterprises keep their own data. And so this lets you kind of programmatically string together agents and also programmatically verify. So we'll run this with stop hooks that say, don't stop until you've solved the entire problem. Don't stop until our verifier passes. We have another open source product called Vet. It's a verifier for verifying coding, you know, agent conversation history, and also other bugs in code that they produce in the pull request. So it's like, okay, if Vet finds any issues, fix the issues before you submit the PR. So you can do a lot of these like if-else kind of things programmatically. And that's what makes it really powerful. You don't want to be running the agent manually. You want to actually programmatically build up this system for you.
And I actually think engineers and we, in the best case we have exoskeletons where like we slowly built up this exoskeleton over time of like how to do the kinds of cool things that we want to do, like the knowledge work we want to do, the coding we want to do and we can build that up step by step, block by block, programmatically. And that's what we're trying to help people do. Yeah, that's what we do internally at least.
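The "don't stop until the verifier is clean" pattern described above can be sketched generically in Python. This is only an illustration under stated assumptions: it does not use the actual mngr or Vet interfaces, and `run_until_clean`, `toy_agent`, and `toy_verifier` are hypothetical names with stub behavior so the example runs on its own.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Attempt:
    """One round of agent work plus what the verifier said about it."""
    patch: str
    issues: List[str] = field(default_factory=list)


def run_until_clean(
    task: str,
    agent: Callable[[str, List[str]], str],
    verifier: Callable[[str], List[str]],
    max_rounds: int = 5,
) -> List[Attempt]:
    """Re-run the agent until the verifier reports no issues or rounds run out."""
    history: List[Attempt] = []
    feedback: List[str] = []
    for _ in range(max_rounds):
        patch = agent(task, feedback)   # agent sees the last round's issues
        issues = verifier(patch)        # empty list means "safe to submit"
        history.append(Attempt(patch, issues))
        if not issues:
            break
        feedback = issues
    return history


# Hypothetical stand-ins for a real coding agent and a real verifier.
def toy_agent(task: str, feedback: List[str]) -> str:
    # Adds a test only after the verifier has complained about it.
    return f"fix for {task}" + (" + test" if feedback else "")


def toy_verifier(patch: str) -> List[str]:
    return [] if "+ test" in patch else ["missing test"]


history = run_until_clean("flaky login test", toy_agent, toy_verifier)
print(len(history), history[-1].issues)
```

The `max_rounds` cap is a design choice in this sketch: without it, an agent that never satisfies the verifier would loop, and burn tokens, indefinitely.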
C
I feel like the process of developing software has just changed dramatically in like the last two or three months. Right. I feel like you should write a book. You should write like a book or a paper on like, what's the.
A
We're hearing it across the entire portfolio and we're seeing it in products that people are pitching us, which are, hey, I think it comes back to the recursive stuff that Karpathy was sharing where he's like, okay, well, what if it tries an experiment? Well, in the consumer software space, you might want to try a feature and you might pull that feature from the feature request that people are commenting. Or you go to a subreddit and it says, man, I wish this product would have a way to split fares for Uber or doordash or group ordering. And it's like, okay, I read on Reddit and in our customer support logs that people want to split bills. Okay, let me research that. Let me write the code. Let me put it in there. Here's Karpathy's running LLMs over a couple of hours or a couple of days, it looks like maybe two days there and making it better. Yeah, that scientific experimentation and humans are just really bad at keeping it up for a long time. They just. When you're running a startup, you just eventually like, oh, God, I'm so exhausted with these customers. I'm so exhausted with this product. And people can't.
B
We're not meant to be factors of production in an economy. You know, we're meant to be living creatures, not like a productive item. But we've been made into these weird productive items.
A
Yeah. And I mean the entire economy, for the, at least for our lifetimes has been. How many people out of, you know, can we get 5% of the people to do stem to write this code? And can we keep them from having any other life but submitting this code and making these products for the other 95% of people? Now it's like that whole career is going to just be abstracted into a cloud of agents.
D
Yeah, it's fascinating. I think, like from the sort of like hiring perspective, I had this interesting realization. So we hire like a team of mathematicians. You know, we opened up a London office, a bunch of people are programming in Lean, you know, in the Imperial College London community. And so we hire them as like math experts. Like they're sort of our in-house, you know, data labelers. And they also like look at, you know, quality of output. But then we realized at one point they all started like vibe coding and hacking on our, like, Axiom prover. So they kind of like, you know, just shifted from data labeler experts all the way to developers without us noticing. And what have you been doing? I'm hacking Turkey, and Turkey is the internal code name. So it's just fascinating. I think a lot of teams are kind of interviewed and built in the way software engineering used to work. I mean, you have the LeetCode interview now. I don't know, I think I heard places like Cognition are getting rid of that sort of LeetCode interview and switching to either a work trial or long consecutive hours: show me something you can build with all the tools available to you. And that's just an interesting change. I think, interestingly, talent market wise, there are two groups of people who are extremely important. One is the most engineering of the engineering folks. Those are the infra gurus, and they are just in hot pursuit. And there's this other group, research scientist type talents, and maybe not coding machines, but valued now because of the algorithmic, mathematical way of thinking. And I think Eric Schmidt recently made the point that people who grew up doing math are going to be more and more in hot pursuit in the software engineering market because you are going to have all the tools available to you, almost like automatically 5 research engineers joining your team where you are the lead scientist.
I think it's interesting how these two groups of talents, one very low level, one very high level.
A
Jonathan, the velocity of this commodification is stunning because if we were sitting here a year ago, we were talking about making developers, are they going to be 5% or 15% more efficient? Because it's auto completing the next sentence and oh, how much faster will they advance in their career? And now we're kind of moving to well, everybody's obviously a developer. And two questions for you, Jonathan. How are you architecting it? And then I want to go around the horn of what's the end state here? What happens in two or three years if these things are recursive and able to run themselves? What are we all doing in the world when it comes to software development? What is the human role in all of this?
C
I feel like the folks who are starting companies today are so much more advanced with how they do it. This is a shift. It's a paradigm shift.
A
It's a literal paradigm shift. Yeah.
C
And I'll share what I'm doing differently at Turing. So. So firstly, like I've, I just feel like. So I'm trying to automate the job of the CEO at Turing. Right. I'm sure you, both of you are probably doing the Same thing.
B
I'm certain, very excited about this.
A
And it's a lot of chores being this, a lot of chores.
C
And I did this like over a weekend with Claude Code. That's how magical this was. So I have this cadence where with my exec team, I used to have a chief of staff where I would have to think of what are the most important topics we should be discussing at the exec team level every week at Turing. And I would try to get what's red, yellow, green at Turing across each of our customers. Especially what's yellow and red and what actions do I need to take. But I am fundamentally token constrained, right? Like I cannot read through hundreds of projects going on at Turing at any point in time. Thousands of pull requests happening last week. What did we ship last week? Like we have a 100% engineering team. There's a lot going on. I don't quite know what's going on. And I was getting information filtered through layers in the org chart, like managers telling stories, others telling stories, lots of synthesis. So I was like, f this, I'm going to go direct to the source. I'm going to look at actual dashboards, data in Salesforce, data in Jira, data in GitHub, all these raw inputs. I feel like the truth is in code and people talking to customers. That's where ground reality is. So I built the system which automatically pulls in all of this and creates like a daily brief for me every week. What's red, yellow, green across all our customers? What are the topics we should be discussing as an exec team? And the topics are so good. And the exec team is finding it useful, right? It's. They are getting more information.
A
It's the virtual chief of staff.
C
It's the virtual chief of staff.
A
Chief of staff who doesn't sleep and has access to all information. Yeah, that's the key: when you give OpenClaw, which I did, like root access to Google Suite, I can say, when I'm talking to somebody, CIR. I made a little skill, check-in report, and I just say, when somebody's starting a conversation with me, I'm like, CIR5. And it just gives me a summary of their email by category, by priority, their meetings and their start of day, end of day reports in Slack. And it's like, okay, I didn't have to waste 15 minutes, you know, just asking you and being a detective: what did you work on this week? Or what are your priorities? It's like, your priorities are what you did. So the whole game of interpreting and gaming each other. The employee trying to game the CEO. What does the CEO want to hear? Why are they asking me these questions? The CEO trying to pick apart without accusing the employee of, you know, just fucking off and not doing any work. Like, you ask them, like, what did you do? People are like, why are you asking me? Am I getting fired? Or are you taking a project away from me? Just. It's human nature.
C
But when the Elon. It's the Elon. Like, my underlying prompt is, what did
A
you get done last week?
C
Last week.
D
But I'm new here, so are they okay giving you guys that access? I don't know. I feel like we're new, small.
A
Yes. I can explain this to you, Karina. Yeah, it's very simple. When you work at a finance company or you work in customer support, there is no expectation that what you're emailing, what you're DMing, what you're doing on your work computer is private. There's no expectation of privacy with that. If you work at Goldman Sachs, they know every single thing you type is tracked. Because when they want to fire you, they find you saying some stupid stuff on the Bloomberg terminal or in the Slack, and they record every phone call because they have to make sure that if something goes south, they understand, like, oh, you shorted the stock. And it's like, yeah, here's the phone call, here's the confirmation. It's a highly regulated industry. The benefit of knowing everything going on on the computer, whether it's on Slack, whether it's on Zoom, is just so great for a manager. And then you just have to tell your company. You just tell your company, listen, this is going to advance the company. Don't do anything on your work computer that's dating or your Coachella plans or, you know, whatever you're trying to do with peptides, or memes. Like, that doesn't belong on a work computer. You have your personal phone over here, you have your work phone. People are fine with it, I think. And if you're not fine with it, by the way, the paradigm just shifted. It's over. Every CEO is doing this right now. We had a CEO on the pod last week. He said, I created a script just to tell me every decision made in my company. So this was like a very interesting angle of attack. Every decision that was made, and summarize the debate. Give me the decision made and then rank them by importance, whatever. So now the CEO on Saturday can just say, okay, they made the decision to invest in this company, to not invest. They decided to go pro rata on this, or not. It was Nick Harris from Lightmatter who was on last week's episode of This Week in AI. He's working on photonics.
It's incredible what he's working on. But he just said, I just need to know all the decisions in this fast moving company. So tell them. It was almost like you pulling up pull requests or whatever it happens to be. So, Karina, I give you permission to have God mode for everything. I mean, you have God mode for all the code written, right?
D
Yeah.
B
It's interesting. And Karina, everything is recorded now. Everything's in Slack. Everything's recorded. There's a middle ground.
C
There's a middle ground which is, for example, in my system, I call it Enigma. Like the, the.
A
The.
C
There are like 10 different meetings that happen that I actually should probably be on, which are like account reviews and so on. But if I did that, I'll just be in meetings all the time.
D
Right.
C
So we use Granola. So the meeting notes are transcribed. That feeds into this, like, in a meeting. And in a meeting there's definitely no expectation of privacy, at least for the CEO and others. And specific Slack channels, like where the actual work happens, like humans talking to each other. So I wanted all the tokens of human interaction. So Granola gives me that. And then in each meeting there's usually an artifact like a Google Doc with what are the discussion topics, what was decided. So those I think are fair game. But as Jason said, I mean, in a company, I think it's all good as long as you're open about what's being tracked. Just let them know.
A
People who want privacy. In a corporate environment like this, it's one of two things. There's a valid reason, like it's HR data and it's people's compensation or something. But even then, for the last 20 years we've tried to have an open compensation philosophy in Silicon Valley. So like you don't have this weird stuff where rebels.
D
FYI.
A
Yeah, yeah. It's kind of trying to be more transparent. Makes everybody get focused on work and not focused on trying to game the system. And the second group of people are people who are around. That's it. Like people who are messing around, not doing their work, or remote workers. And I had this happen. We literally installed a piece of software on the computers, because we're a finance company, that just tracked, like, to make sure people weren't exporting stuff on thumb drives, because you could just export on a thumb drive. In your case it's the entire code base. In my case, every legal document we've ever signed, right? Or, you know, somebody's due diligence when we invested in their company. Like, this is particularly dangerous, right? It might have, like, very sensitive information about a board meeting where there's a lawsuit going on. And that's not a public lawsuit, or threats of a lawsuit. This is like really crazy stuff. And I just found out, like, some people were working an hour and a half a day. It was like really disappointing. Or the people who were, like, saying that they needed a raise. Like somebody's like, I need to make $200,000 a year. I need to double my salary. And I was like, okay. And then they were like, hey, boss, she's working 30 hours a week. This other person's working 60. Doesn't make any sense. Like, that'd be completely, profoundly unfair to double their comp when other people are doubling their effort and they're more effective. So it's just, you have to decide how you want to run your organization. Do you want to have an elite organization like an NBA team? NBA teams, Kanjun, they will track people's blood work. They'll videotape them shooting shots and say, hey, here's how to increase your shooting percentage by 5% if you do this protocol. And they're tracking their sleep data. It's just the nature of business today. I think it's kind of over.
C
Jason, do you envision, as an investor, will you be looking at this type of data in companies you invest? If you could do this type of, like, you could kind of, I mean, how many times they use the word align? Stakeholders, like, don't invest in companies that use the word align.
A
I mean, it depends on the relationship you have with your founder. So for me, as an early stage, like, high trust, you know, backer, you know, as opposed to like a late stage person coming in and wanting to mess with the business for some personal gain. I'm like, I'm always so early. 90% go to zero. I make my money off of like, you know, an Uber or, you know, Robinhood, every hundred investments. So I don't sweat the small stuff anymore. But yeah, for hand-wringing VCs who lose their minds and want to send CEOs and management teams on crazy adventures to do tons of reporting. Like, oh my God, that makes you want to like shoot yourself in the head, as a fellow board member who witnesses that. I've had to take multiple junior, you know, associates on a board and just say, stop giving more work to the team. Like, let them cook. All that matters is what their customers are saying. And like the talent level here. Like, let's focus on those two things as opposed to you doing this, like, crazy churn report and da da, da, da, da. Yeah, I don't think I need access to that. But most founders, I just tell them, come to me with the hardest problem. So if you think about what this technology will enable you to do, people are taking entire board decks and board notes. This is investors I'm talking about. And they just say, prepare me for this board meeting. What are the questions I should be asking? So it's happening on the other side. So you all are preparing decks with AI and projections with AI, reports, analysis. And then on the other side, the VCs are saying, okay, this is 200 pages' worth of board materials. Tell me what are the most important things here? What should I be asking? And so eventually there will be a board member who's AI. You will put a C-3PO on the board, you'll put a replicant on the board, and they'll just be the greatest board member ever.
C
Yeah, I think we should use Imbue. I think we should use Imbue and we should all let the AI agents have board meetings with each other.
A
I think a shadow board is actually you can use.
C
We can have a board meeting every day.
A
Like the.
B
You can have these agents. Agents can tell you how your company is doing all the time.
A
Kanjun. It sounds crazy, but I think creating a shadow board and saying, this is the legendary VC who's made more money than they could ever spend and they understand the essentialism of creating a great company. This is the tactical board member who understands go to market better than anybody. And, you know, this is your finance board member. They understand how to work with CFOs and sales directors. And you just have them grinding every day instead of every quarter, every day on your startup and just being in your ear. That's a genius idea.
B
Actually. That's really interesting. I have a version of this, but I love this idea of making a board. I have a version of this that's a group of advisors that are kind of, oh, this is what this person would say. What this person would say in my advisory group. And I'll ask them every week, roughly give a. Give a bunch of data on what I'm thinking about and get their perspective on each, you know, their AI mentor perspective. And it's really useful.
D
That's, that's fascinating. I heard there are like people on Chinese social media, they're saying that people distill their colleagues and people distill their like boyfriend and ex girlfriend and like someone trying to like distill their colleague to get that colleague fired. Which is like, like, it's like dude, Black Mirror episode AI to like Black Mirror.
B
That's right. That's right. I was using this for White Mirror.
D
That's the extreme of sort of this. That would be a little crazy. It sounds like a dystopia.
A
So I had asked this question earlier, like what's the end state here? I have some ideas around the end state as an investor and as CEOs and that whole dynamic of the startup game. I want to put that aside for a second and just say what's the end state for product production and software development? Where do we see that? If this continues in three years, like two years ago, we're talking about making developers 15%, 20% more effective each. Oh my God. Every, we're going to Get a free 5th developer if this keeps up or we're going to get twice as many developers. Then we're like, well each developer will have a 10 person team in tokens who are agents doing all the work for them and they're just going to be doing air traffic controller or whatever. Okay, now let's fast forward 36 months. What are the possibilities here, Kanjun? If you were to just. If you succeed with your open source platform, you know, everybody's got 100 agents working for them, multiple simulations going and we hit some level of super intelligence. What happens? Does the game of life just end? Every piece of software is instantly created and we don't even have to build startups anymore because people look at a glass mirror phone and say I want a meditation app suited just for me. And it just intuits what you want and makes you the thing. Where could this be in three years?
B
That's a great question. Three years.
A
And I know you were thinking about human freedom, which is one of the topics we were talking about on the group chat. So you can kind of dovetail, I think, that topic in here, which is purpose.
B
You know I can, I can bring it in. I think we're actually kind of at a fork in the road and there are two default paths. In the current default path, where Anthropic is winning, OpenAI is winning, Google is winning, what we're seeing is AI getting integrated into existing products, their existing products, and we're seeing this verticalization. Like Anthropic is trying to kill third party. Anthropic cut off access to OpenClaw for, you know, Max and Pro subscribers.
A
So Sam bought the founder, I believe, to get him distracted from OpenClaw.
B
Yes.
A
The worst thing ever. I mean that was the most, I mean in a list of sinister things that people accuse Sam of doing. I felt this was like the most cynical thing to do. Take the most promising open source project and say I'm going to give the founder a couple of hundred million dollars. I'm super happy for Peter.
B
In order to distract him. Yep.
C
Yeah.
A
But the way he's killing it, like there's one thing you can buy a piece of, you can buy a company and then shoot it and like kill the product and kill the brand, you know, like that's dark.
B
Yeah, this is a very like sinister influence kind of behind the scenes. I think you're right about what's going on. But yeah, you see these, like OpenAI and Anthropic are killing their competitors because that's the capitalist system we live in. You know, they have to make investor returns and they have to make a moat. They have to lock you in. Right now there's not much moat. They're losing money. And so it's hard, it's hard to operate in this incentive structure and not do that, not try to kill your threats. And I think the default outcome is that they keep verticalizing. Max and Pro plans keep getting subsidized. Enterprises want to stay on their team plans or their enterprise plans because they're also subsidized relative to the API. And so as a result the like external third party providers, the open alternatives, they are just more expensive for both enterprises and consumers to use. And so you kind of end up locked in into this ecosystem. They don't like to share their memories of you. You know, you have to like kind of claw your memories. You can't download your entire conversation history from either Claude or OpenAI, which is wild. And so they know more and more about you and it's less and less easy to get out. And as you said, Jason, like they are going to understand you really well. They are going to intuit your preferences. They already do that. You already feel it. We already. I feel it at least.
A
And Karina's punching up her poetry with it.
B
That's right. Karina, yours is really good at poetry.
A
Your unpublished poetry. I don't know what's in your unpublished poetry.
D
This could be really nerdy poetry, you know, like it's like I'm not like suspected that writing.
B
Maybe it's just math poetry, Karina.
D
Oh wow. What does that theorem prove?
C
This theorem and make it rhyme?
B
No, I think Karina is a real artist, a true artist, feeling it. But in the default path, what's going to happen is that these companies, we're going to give our digital lives and our digital identities to these companies and they're going to rent them back to us. These are like our digital selves being locked up and rented back to us in the default path and we're going to like rent all of our employees from OpenAI and Anthropic and Google.
A
This is why I am so locked in, Jonathan, to Apple Silicon, OpenClaw, Kimi, etc. I know it's not as good as Claude. I know, like, Perplexity Computer is awesome, but I just feel like we need to own this and we must fight for it. Now wait, you said there were two paths.
B
There's a second path. And actually I'm really curious, Jonathan, for your thoughts on the second path, and Jason as well. Like, I didn't. I think the second path is like, right now as consumers, you know, we're kind of in the like processed food era. We haven't realized yet that organic is better for us. And so right now we're in the like, oh, let's just use what's easy. It's the processed food of like Claude, we're just gonna buy at the grocery store, get it right out of the box. But like the organic, the like open version, the open source version, like what you're setting up, Jason, all of your own stack, that's like actually what's good for us to build the infrastructure of our life on. Like, I think 10 years from now, like our entire life infrastructure, our house, our like knowledge work, our friendship data, our health data. Yeah, it's all gonna be in here. Relationships, everything, it's all gonna be in here. And I think open source does not have to be synonymous with hard to use. In the past it was hard to use because not very many developers, not very much money went into it. But now every single person can be a developer and so every single person can go in and go edit their open source, you know, software infrastructure. And in fact there is an argument for open being better than closed because now you can go and change the stuff, whereas before you couldn't. And if it's closed then you can't. I can't go change my Claude desktop. It does some really annoying things. I want to add like this recurring, like slightly different scheduling functionality and I can't. But Jonathan, I think one thing I'm most stuck on is open models catching up. And you said you have all this data that you sell to the model providers and that it's very specific data, enterprise workflows, coding workflows, and the open source models just don't have that kind of data. So in your view, and Jason and Karina, yours as well.
Like how do open models actually stay caught up as you get more and more specific and private data into them?
C
Great question, Kanjun. So we work with a lot of the open model builders also, so we provide them with data as well. Many of them don't have as big budgets as many of the larger frontier models, but we also work with them, and I've wondered about this a bit. I feel like there's going to be room for the trillion-parameter giant world models and smaller models, and it's very workflow specific. If you're building a general assistant, you probably want knowledge about the world so you can reason about all sorts of things. But if you're automating an invoice-to-pay workflow, or automating service ticket resolution, or automating the job, the workflow of an FP&A person that's doing analysis monthly, you probably don't need that. You can probably be in the half a billion to 10 billion parameter regime. Fine-tune that model on your proprietary data, distill the human intelligence of your humans working in your enterprise into the models, automate your proprietary tool calls for that. Probably a smaller open model. And we are, I realize, mixing a few things here, like open versus closed and small versus large. Also, there are distinctions between open source and open weights; those are also different things. And whether you can fine-tune it or not, that's another dimension. But by and large, one shift that I'm seeing is in enterprise. There used to be these two camps. I'm going to call them the fine-tuning camp and the no-fine-tuning camp. Two years ago, in the enterprise, it was a lot more of the fine-tuning camp: let's take a small model, let's fine-tune it on proprietary data. Maybe we do some SFT, maybe we do some RL, and build custom models. The no-fine-tuning camp is: hey, we have this giant model, and all you in the enterprise need to do is be smart about how you manage context and how you manage memory. With intelligent context and memory management, you don't have to touch the weights.
The models are pretty good at in-context learning, and you can even do continual learning without touching the weights if you're smart about how you accumulate training samples with marginal information gain. Like when the model is doing something wrong and the human is error-correcting it, just like Jason error-corrected some of the prompts for marketing taglines. If that could be baked into memory and context, you don't need to touch the weights. Forget RL. Let the labs do RL. You just build the infrastructure around it. And what I'm seeing is in the last year there's been a larger shift towards the no-fine-tuning camp. Take the large model, do good things with context and memory. No need to fine-tune. Not that fine-tuning has gone away, but that's an interesting shift, because I feel like two years ago it was very much: you need big models for consumer, and for enterprise, small models, on-prem, sovereign AI. And people still talk about sovereign AI, and there are important problems to solve with ensuring IP stays within the compliance boundaries of the enterprise, but I'm seeing a lot more openness to using giant models with just intelligent context and memory management than there was a while back. So I think that we're going to have both. I think we're going to have big models and we're going to have small models. And the frontier labs could make big and small models. Some of them could be open, just like OpenAI does with gpt-oss. And some of the companies like Meta could maybe, I don't know.
A
Well, that's the weird thing, Karina: Microsoft, Meta and Apple basically having done nothing of substance in AI. I mean, Microsoft of course bet on OpenAI, fantastic. And they have Azure. Okay. But it's not like there's a Microsoft AI product that we would all say, oh my God, and they have Copilot. Yeah, I can't get enough of this Microsoft blank or this Meta blank. What's the Facebook blank AI thing? And then when it comes to Apple, I'm absolutely frustrated with Siri, to the point at which Wispr Flow, which is literally this little startup, has made a product that is 1,000 times better than Siri. And how is that possible? They've got hundreds of billions of dollars sitting there and they can't spell your name correctly, Karina. They don't know any context about you. You tell it, like, put my address in. It's like, I don't know where you live. You're like, you're my iPhone. I'm on you eight hours a day. You know my address.
D
I feel you have Google.
A
It's literally in Google Maps. It's on my vCard. It's everywhere on this fricking phone. So how do you think about open source and then those remaining players, if you were going to add to that, Karina?
D
I will also say that probably the breakthrough about continual learning is going to happen at one of the startups. I think there are about three that I'm tracking. I think one of them, I'm super excited. I mean, just, you know, watching from the sidelines, it's like the dream of personal intelligence: indeed, you don't need a huge, huge, huge proprietary model to serve your personal daily need. I see, you know, two or three groups just making those breakthroughs with top talent from xAI and from other places and from OpenAI, because that direction is not supported there. And I mean, at one point Meta had a chance to have the best model, if not the best open model. I mean, it was Llama, and then somehow that team kind of spun out and went to found Mistral instead. And that's, I think, one of the interesting things. There's so much alpha beneath the layers of management of a big organization. And perhaps back then the CEOs didn't have the tools to go direct. You can't actually just have your AI penetrate through the layers of bureaucracy. I'm not sure they're doing it now. I hope they are, because I really do think that small teams, generally with a focus that's not easily changed, can achieve a lot. And we are in this interesting market where new labs are founded, and there are so many neolabs, and you look at the people who are joining these companies and look at the valuation, and it is higher than before. But personal-economics-wise, it still doesn't make sense for a lot of the early founding members. But they are there, they're locked in because of the dream. They are seeing that this is a direction that they can pursue for as long as it takes to, you know, bring it to realization. And I think, Axiom being one of the math superintelligence big players here, we are seeing people with that dream that they cannot fulfill if they join any other company. So it's interesting.
I think a lot of innovations are happening in the venture startup landscape. And so people say it's a bubble, but people also say if it's not a bubble, it's a moonshot. Is this one or the other? Binary outcome, fail fast. That's a really great time to be doing venture investing, in a way.
C
Eventually, I think, Apple Silicon with this on-device intelligence would be awesome. And I like the Imbue vision of just an open system. So maybe we'll create our own intelligences that amplify us, so that we could have many copies of ourselves that we fully own and control.
A
Well, I like the way, Kanjun, you presented it, which is: I don't want to rent myself back from Sam Altman. No, definitely, that is my personal Black Mirror. Like, that's the JCal Black Mirror episode: me having to go to Sam saying, can I have myself back?
B
And they can lock you out of yourself at any time. Like, that's terrifying.
D
The other thing is make yourself do discoveries and like, you know, if they have me, they can make me do math until the end of the universe.
B
It's like Pantheon, you know,
A
it's nuts.
B
One thing I am curious about though, for Jason, Jonathan, Karina, is the economic dimension of personal agents and open agents. I'm really struggling to figure out, you know, Jonathan, you said the open model trainers don't have as much money to spend, because they haven't locked in as much profit, because they are giving their models away kind of for free and they're just charging for inference. Economically, a big question I have is how to make personal AI sustainable. You know, one way is, okay, make it so consumers only prefer it. But that seems hard, to kind of convince everyone, like, oh, this is going to be a huge problem for you. And so, yeah, how do we sustain this?
A
There is a roadmap for it. We saw, you know, with WordPress, which powers like a third of the web or some crazy number, but they only monetize like 1% of that. So that is the beauty of open source: this ability to only take a little bit back and then do commercial versions of it. What I wanted OpenClaw to do was to have openclaw.org, open source, and then do openclaw.com and make it a. And I told him, like, if you raise money, obviously I'll put money in if there's room. It'll be the largest venture round ever. But he didn't want to run a company, is my understanding of it. But you would have a dot com and you could have the hosted version or the version with customer support, or you can take the free version. Tons of people use the WordPress open source software and some people like to use the hosted version. Apple has a unique place in this ecosystem. I just spent $3,400 on my MacBook Pro. It's the 14-inch one. Usually I'm a MacBook Air, but I'm like, I need a really fast and more powerful one if I'm going to be running OpenClaw or some of these things. And then I'm like, well, it's obvious that everybody in my company is going to just be on a Mac Studio with 512 gigs of RAM. And as an employer, giving somebody a $10,000 computer, a $3,000 computer or a $1,500 computer makes no difference to any business out there. There's no business. I mean, even a call center. If you gave a call center employee a $10,000 computer versus a $2,000 computer, does it make any difference for that business? No, it's the least cost. And my first computer, an IBM PC Junior, I think my dad spent $1,500 on it in 1982 or '83. That would have been three times that, four times that with inflation. So it would have been a $6,000 or $7,000 computer. So back then we used to regularly spend $4,000, $5,000, or $6,000, $7,000 on a computer. Even $10,000 in today's dollars. I think we go back to that.
The open source models are just tremendous and can do 90%, and then for the jobs they can't do, you just have some sort of a router, an intelligent router that says, hey, this query.
B
Yeah, like route to the frontier model if needed. And so you need a layer on top of. You can't just use all of Claude's ecosystem.
A
But if things are going the way we think they are, Karina, what amount of model are we going to need? Are we going to be so in abundance that, like, Kimi 7.5, four or five generations from now, just feels like, I can't even use all this? It's like owning a Tesla. A Tesla can go 0 to 60 in under 3 seconds. It can go 150 miles an hour, it can go a 300 or 400 mile range. You don't use any of those.
D
Right.
A
There's a speed limit. It is unnecessary to go 150 miles per hour and it's unnecessary to have 400 miles.
D
Yeah, that's interesting. I mean, I'm in the camp that believes that recursive self-improvement is going to come. I believe that companies working in coding, code generation, companies working in code verification, and mathematics being one part of this, that's part of one endgame. At least, I believe that there are a lot of problems that are unsolved. I want to understand the universe. I can't understand the universe, or I can't even understand my own brain, I mean, neuroscience is famously hard, if I don't have literally the sharpest model that will train itself to be that AI scientist. And I firmly believe in that world. So I think it's like the MIT part of me is scientific ambition. That is one line, and I think that's superintelligence. Yeah, exactly.
A
Refer to it as opposed to AGI.
B
Really? Scientific super intelligence. Yeah.
D
And I also think that we have a choice, which is: are we building superintelligence that is slop, or, you know, hallucinates all the time? So one in every five times you run it, you understand the universe, and the other four times you understand something else, and it generates millions and tens of millions of lines of math and you're like, is that the answer to the universe? I don't know. So I think we're also at a crossroads of whether humanity as a whole chooses verified superintelligence or not. And Axiom is, I think, one of, if not the one company building verified superintelligence. But I also believe in, you know, like my mom, she doesn't necessarily get the most utility out of superintelligence. She wants humans to have, you know, fulfillment, happiness. And personal intelligence will be another line. In a way, in the quest of personal superintelligence, the word super is like too unnecessary here. In a way it's about understanding and empathy. And in a way I think I'm pretty much drawn to the humans and pitch of, are you going to build AI that maximizes human welfare, in a way, for those users? And that's very interesting as well. But I think market forces are going to make recursive self-improvement happen. Market forces are going to mean winner take all. Closed proprietary models with large amounts of funding are going to accelerate until the end of time, and the other ones will be left behind. But the thing that can counter and kind of balance out market forces is ideology. There are people who are like, I'm the best AI talent you can find. I'm just not going to work in a company that requires me to give up privacy, because, I don't know, I have this freedom ideology.
Even if I'm in a corporate setup, like this person still expected then you maybe as CEO will have a choice of okay, am I going to employ these moonshot talents that can deliver the next GPT or am I going to employ the less shiny talents who will give access and will probably.
A
We have an example, Karina. Karpathy is releasing some of the most fascinating and applicable repos on GitHub that are having non-developers be like, yeah, I'd like to, you know, make an LLM myself. And he's like, yeah, here's how to do it. Here's a YouTube video, here's how you can make your own LLM and here's how you train it. Like, that's kind of mind-blowing. And you only need one Karpathy for every hundred people who are like, I need OpenAI stock or Anthropic stock and, you know, need to make my $20 million. Right.
D
But in a way what's interesting is that your skill level and your economic sort of, you know, circumstance determines whether you have the right to buy your ideology. This is, I think, something that's interesting. It's like, you know, someone who is not at the Karpathy level. I mean, it's just a very interesting endgame in my view for that. It's like, what's going to counterbalance?
A
Yeah, Jonathan, we have a term for it, it's called FU money. At some point you get enough FU money, you're just going to do what you want. It seems like Karpathy could work anywhere and be the co-founder of any company at any time, and he's just posting interesting stuff to X and GitHub all day long. It's super fascinating. And like, sometimes you're a parent, you need to put food on the table, and that's your morality: I have to feed these kids, and maybe you've got to steal a loaf of bread along the way. It's a super fascinating time. If abundance happens, maybe people are just like, I want to only work on things that are world positive.
B
Interesting people can choose that people bully
D
from Open Ed and Sorvik during that one weekend too.
B
Yeah, totally. I also think we can have personal superintelligence, for what it's worth. Not just relegating humans to having personal intelligence and emotions. Just like today we are far more advanced in what we think about and work on than we were 300 years ago.
C
That's right. And I feel super optimistic about what's ahead. I feel like we are going to head to idea-to-company in one prompt. It may not happen in three years, but over the next decade. And I feel like we are so trained in thinking in the Andy Grove style: a human can have five to eight direct reports. I think in the future a human might run five to eight companies, right? And maybe what the human does is put their tax ID in and make sure all the compliance stuff is done. And I think the nature of a job is also going to change. Humans will, I think, transition from doing the work to verifying the work of agents doing the work. And maybe people will all have multiple fractional jobs, where you could be running different companies, working at different companies. I think the 9 to 5, or like 24/7, one-company thing may change. One question I have for you, Karina: it's so inspiring to work on building math superintelligence. What does the world look like where there is a math superintelligence?
D
Yeah, that's interesting. So I think the dream that we really have is you have like a billion AI Gausses working for you.
C
AI Gausses.
D
Gauss, like the mathematician. Gausses. So you'll have all these abundant outlier reasoning capabilities. I think it's pretty sad that humanity has been reasoning bound for some time. I mean, you have Galois, who died at the age of 21, 22 in that famous romantic duel. And then group theory kind of got set back for decades. One person died and then the entire humanity got set back for decades. And it's a similar story with Ramanujan, right? Malnutrition. And then, you know, you stop having these intuitions in number theory, those magical formulas that other people build on. Still today, mathematicians are trying to decipher his last notebook. Now we want to have AI doing mathematical discovery, and then the time span from a mathematical breakthrough to applied science and actual market advances shortens from like 300 years to like three days. I want that world. And in a way that really complements the whole coding advances, because those are like trial and error, and this is first principle, and you kind of need both, and you also need first principle to now verify, verify tranner, which I think is a really beautiful part. When we started out we didn't know there's this verification angle. So now it's beautiful that math needs code and code needs math, and it's not one direction. And so that's really the future that we are building. And in a way it will be very interesting to see how all these sort of, you know, company operations are being kind of coded up in the software stack. How much of the world can be explained by theoretical math? I mean, theory of deep learning is a field that hasn't really, you know, matured, because most of the theory people are able to analyze about deep neural networks is like assuming one layer, assuming a linear transformer, and we still don't understand anything.
A
I mean the fact that transformers are not understood and we don't know the output, but we know it's doing an incredible job reasoning.
D
That's right.
A
It makes one wonder about our own brain.
D
Yes, about our own brain. And you have probably 1 in 100 neuroscientists who come from a math major. At MIT, Ila Fiete applied the Chinese remainder theorem, something from elementary number theory about remainders, to analyze neural capacity, like how many neurons you can store in your brain and what the spacing is like. And it's physicists who kind of did string theory and came into neuroscience who really pushed forward the field of theoretical neuroscience. I just want the same to happen with AI mathematicians in every single field.
C
I love that, Karina. I'm going to steal that line of humanity being reasoning bound. That's such a neat way to think of it. You're right. Reasoning boundaries. And to your point about how it'll be nice to have like a million Gausses: maybe you could call your intelligence super normal intelligence, or ultra normal.
A
Very bad. All right, listen, this has been an amazing episode. Jonathan from Turing, thank you for coming. You're hiring, I'm assuming. Where can people find out more about Turing?
C
turing.com. And if you want to work on research to help advance the models for coding, SWE, enterprise, join us.
A
Karina, tell us a little bit about where people can find some more about Axiom.
D
Go to axiommath.ai. We recently had two releases. One is open source, so you can run it on your computer to discover interesting graphs, like Turán graphs, and it's called Axplorer. The other one is also free, a hosted open service called Axle. If you want to run, you know, large math computer programs, Axle is your go-to infrastructure, and the community has been using it.
A
And Kanjun Qiu, Imbue. Where can they find more and what should they learn about?
B
Yeah, you can go to imbue.com we just open sourced a product last week for you to run a fleet of parallel coding agents so you can make your engineering team much more automated. And also it's not just specific to code so you can actually build business processes into it. We do that. We have a bunch of people who do that and we are definitely hiring if you want to build products from zero to one, that's where we run our whole company that way.
A
All right, and we'll see you all next time on This Week in AI. Bye bye.
Episode: What's Left for Humans When AI Builds Everything?
Date: April 8, 2026
Host: Jason Calacanis
This episode is a “CEO roundtable” on the rising dominance of agentic AI, open vs. closed AI ecosystems, how AI is changing software development, and what might be left for human purpose and agency when machines increasingly build everything. The panel debates the explosion in coding agents, the data economy that powers model improvement, the future of open source AI, and what humans will do in a world where “agents build agents.” Notably, it tackles the profound, Black Mirror-ish implications for digital identity, work, creativity, and economic structure.
"We're giving our memories, we're giving our workflows... building our business infrastructure on top of agents... And so with agents, it gets a lot more intimate. If Anthropic or OpenAI... have our data, all our memories, our whole life's work, they can convince us of anything... That's a pretty bad world to be in as a human."
"Superintelligence is meaningless if it's not verified. I don't want Schrodinger's super intelligence. So that's what we're doing. We're building an AI mathematician that always gives you the proof of everything."
“There's unlimited demand for high quality data... The scaling laws continue to hold... The floor of human intelligence that's needed to advance the models also becomes even smarter.” [C, 12:14]
“Coding is everything... math is code and code is math... coding and the reasoning capability, Anthropic has really differentiated itself there... Everyone is using Claude Code or Cursor.”
“When you're training these models, they learn embeddings... When you are trying to learn how to code, it kind of learns these good abstractions... you get really good fast training data. In the real world, it's hard to get good verifier data.”
"We're going to give our digital lives and our digital identities to these companies and they're going to rent them back to us... our digital selves being locked up and rented back to us."
“You can make your product really complex... and then you can just redo the entire thing based on what you learn pretty quickly.”
"A human might run five to eight companies, right. And maybe what the human does is... verifying the work of agents doing the work."
“No, definitely, that is my personal Black Mirror... me having to go to Sam saying, can I have myself back?”
"I mean the fact that transformers are not understood and we don't know the output, but we know it's doing an incredible job reasoning... makes one wonder about our own brain."
For listeners seeking a deeper, CEO-level pulse on AI's development and social ramifications, this episode is a must-hear window onto how the industry’s leaders are grappling with AI’s transformative—but fraught—impact.