
Swix
Isn't that crazy? That number is just mind boggling.
Jacob Efron
What is the state of the AI coding wars today?
Swix
We're in a phase of sort of like capability exploration. The general thesis that I have been pursuing now is that the same way that 2025 was a year of coding agents, 2026 is coding agents breaking containment to do everything else.
Jacob Efron
Do you worry about the foundation models just eating into a bunch of these startup categories?
Swix
Mid sized startups, yes.
Jacob Efron
What do you think the end state of this market is?
Swix
For the market structure to significantly change, there would be...
Jacob Efron
Today on Unsupervised Learning, we had a fun episode and what has become an annual tradition: a crossover episode with our friends at Latent Space. Swix and I sat down and talked about everything happening in the AI ecosystem today: what we thought of the various changes at the model layer, what's happening in the infra world, the coding wars, and a bunch of other things. It's a ton of fun to do this with someone I really respect and another great podcaster in the game. Without further ado, here's our episode. Well Swix, this is super fun to be back with another Unsupervised Learning and Latent Space crossover episode. I feel like there are a lot of places we could start, but one thing I always find fascinating about the way you spend your time is that you are obviously at the epicenter of this engineering movement and community: you run these events and conferences, you put on these awesome talks, and you have a great pulse on the zeitgeist of what's going on. Maybe to start, what are the biggest topics people are thinking about right now?
Swix
Yeah, so I just came back from London where we did AIE Europe and we're doing roughly one per quarter now.
Jacob Efron
Yeah, you've really upped the pace.
Swix
We're trying to match AI speed.
Jacob Efron
Yeah, exactly. The topics will be completely different, I imagine.
Swix
I definitely curate the tracks, so you can see what I think when you see the track lists and the speakers that I invite. Obviously OpenClaw is the story of the last four or five months, and just below that I would consider Harness Engineering and Context Engineering to be two related topics in agents and RAG. Then there's a long tail of evergreen stuff: evals, observability, GPUs, LLM infra in general. We also have updates on multimodality and generative media, let's call it. But definitely the first three that I mentioned are top of mind for people.
Jacob Efron
I think harnesses in particular are so interesting. There was this tweet from Harrison Chase, the LangChain CEO, that caught my eye recently where he said it finally feels like we have stability around the infrastructure around AI. What he was basically implying is: look, over the past two or three years, being a company at the epicenter of AI infrastructure was a bit like playing whack-a-mole, right? You were constantly moving around with however the building patterns were evolving.
Swix
Harrison is for sure right that he's basically had to reinvent the company every year since he started LangChain, right? It was LangChain, then LangGraph, and now Deep Agents. I think he's one of the most nimble, adept, sharp people about this. Yeah, yeah.
Jacob Efron
Now it's.
Swix
This time it's different.
Jacob Efron
Yeah, yeah. Do you buy that or what have you kind of make of that take?
Swix
Mm. It's very expensive to say this time is different, sometimes, but when you're just writing code, it's actually okay to just try to make a call. And it may not even matter if the call is right or not; I just don't care that much, because you can be right on the thesis, but if you don't figure out how to monetize the thesis, then who cares if you said something first? That said, it does feel like, for example, we went through a lot of different ways of packaging integrations up with agents, and it feels like we've landed at skills, which is the minimal viable format: just a markdown file with some scripts attached to it. I don't see how it can be more simple than that. So there is some justification for the stability around harnesses. I feel like there may be more adaptation with regards to the real time elements, or sub agents, or memory, or any of those agent disciplines, let's call them, in agent engineering. But if the thesis is just that agents are LLMs with tools in a loop, with a file system where they can do retrieval, with skills and all this standard tooling that now seems relatively consensus, then that probably makes sense. I just think there's no point trying to stake your reputation on the thesis that we're there, because if it changes again, just change with it. It's fine.
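As a reader's aside: the "LLMs with tools in a loop" harness shape described here can be sketched in a few lines of Python. This is a toy illustration with a stubbed-out model and a single made-up tool (`read_file`), not any particular product's implementation:

```python
def fake_model(messages):
    """Stand-in for an LLM API call: requests a tool once, then answers."""
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "read_file", "args": {"path": "notes.md"}}
    return {"answer": "done"}

# Toy tool registry; a real harness would expose file system, shell, etc.
TOOLS = {
    "read_file": lambda path: f"<contents of {path}>",
}

def agent_loop(task, model=fake_model, max_turns=10):
    """An agent is just a model called in a loop, with tool results fed back."""
    messages = [{"role": "user", "content": task}]
    for _ in range(max_turns):
        out = model(messages)
        if "answer" in out:  # model says it's finished
            return out["answer"]
        result = TOOLS[out["tool"]](**out["args"])  # execute requested tool
        messages.append({"role": "tool", "content": result})
    raise RuntimeError("agent did not finish within max_turns")

print(agent_loop("summarize notes.md"))
```

A real harness swaps `fake_model` for an actual model API call and layers the file system, retrieval, and skills pieces on top of the same loop.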
Jacob Efron
I've always been struck by how that is much more challenging for infrastructure companies than for application companies. On the application side you've seen Bret Taylor from Sierra and Max from Legora say: look, we build what's ahead of the models, and we're willing to throw everything out every three months as the models get better. But the thing you at least have there is an end customer that's decently sticky. They will mostly stick; they'll at least give you a shot at building these things. What I've always found more challenging is the reinvent-yourself-every-three-months dynamic at the infrastructure layer. Developers are definitely a pickier audience than, say, an accounting firm or a bank, so it's a more challenging position to be in, to have to constantly reinvent yourself.
Swix
Yeah. And when they churn, it's very complete; they'll leave for the hot new thing because there's no defensibility. Even if you are a database, people can migrate workloads off databases, it's a known thing. So I think basically what we're talking about is the vertical versus horizontal debate in AI startups. The way I think about it is that when you're Legora, when you're Abridge, you are the outsourced AI team. Your job is to apply whatever state of the art AI methods.
Jacob Efron
Yeah, like this translation layer between model capabilities and your end customers.
Swix
To the end customers, exactly. And if they didn't have you, they would have to hire in house, and they're not going to hire in house, so they have you. I think that's a reasonable position, very robust to whatever trends and discoveries people make at the engineering layer. I do think there are useful horizontal companies being built, but they're very much reinventions of classic cloud in the AI era, the primary one being sandboxes, which is another form of compute. Guys, let's not get too excited about it. But I mean, the workloads are enormous, right?
Jacob Efron
Yeah, it's interesting. As part of this, the questions folks are asking around infrastructure include the extent to which companies should have their own AI teams and what they should be doing in house. Should people be training their own models? Should people be doing RL in house based on the data they have? I feel like one has to evolve their takes on this every three months with the pace of things. But where are you at on this today?
Swix
I think actually all of this has gone up. Obviously I'm involved in Cognition, and Cursor is also doing a lot of its own model training. I think that is part of what I've been calling the agent lab playbook: you start off with the state of the art models from the big labs and you specialize for your domain, but once you have enough workload and enough high quality data from your users, you can train your own models and save a lot on cost and latency and all that good stuff. You also get a marketing bonus, of course: calling it some fancy name and putting out some research.
Jacob Efron
From my seat I can't tell how much of it is actual value that's provided to the end user and how much of it is that marketing bonus, right? It seems like some combination of the two.
Swix
I think it's both.
Jacob Efron
Yeah.
Swix
No, there actually is real value, and you know that for a number of reasons. One, even when it's not subsidized, people do choose it as one of the top four or five models. This is true of both Composer 2 and SWE-1.5: they're top-five models, and in a free market, in a model switcher, people do choose them without subsidies, so that's as good as it gets. But beyond that, domain specific models, for example for search, which both companies have, absolutely make a ton of sense. Everyone says yeah, you should always do this, and honestly I think the infrastructure for that is becoming easier with Thinking Machines' Tinker as well as Prime Intellect's lab stuff. I mean, this is one of those reversals of the bitter lesson, where you first bootstrap on the large, general purpose models to get big, and as you get very well defined workloads that are high quantity but not high variance, you distill down to a smaller model and run that on your own. Which totally makes sense.
Jacob Efron
What I'm less clear on is the DIY RL use case, which I think is mostly about improved quality; there are probably more efficient ways to get a smaller model that's faster and cheaper. It'll be interesting to see. Two or three years ago you had this whole class of companies that were pre-training and claiming better outcomes in their domains, then getting kind of cooked as each model iteration improved. I wonder whether a similar story plays out in the RL space for the focus on pure outcomes and quality. Not the cost side: clearly training your own models for cost at scale makes a ton of sense.
Swix
I think they are two sides of the same coin. You basically always want to hold quality constant, or trade off a little bit of quality for a drastic decrease in cost, and that's true for everyone. One element I wanted to bring out which is very much in favor of open models is custom chips. This would be Cerebras but also Thales, and there's a huge range of stuff in between. This has been a huge story this past year: everything non-Nvidia is getting bid up, including, like, freaking MatX is working, which is very rewarding for me. It's one of those things where suddenly, because the amount of alternative hardware is increasing and the inference speed you can get is insanely high (we're talking thousands of tokens per second instead of less than 100), the trade-off for quality doesn't hold as much anymore, because the speed is so high.
Jacob Efron
Have you seen a lot of companies go all in on the alternative chips?
Swix
So Cognition has, on Cerebras, and so has OpenAI. But beyond that, no, I don't think so, and that's mostly foreshadowing what's to come. I used to be kind of a skeptic: okay, so what if I get my inference sped up from 100 tokens per second to 200 tokens per second? It's only 2x faster, it's not that big a deal. But I think every 10x does unlock a different usage pattern, and we have proof in Thales and some of the others that you can actually drastically improve inference speed. What happens from there, I don't even really know; it's so hard to predict when entire applications just appear at once. And it also isn't that expensive. This is one of those things where I think the investment cycle is going to be multi year, and I would caution people not to dismiss it too quickly.
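To make the "every 10x unlocks a different usage pattern" point concrete, here is a back-of-envelope sketch in Python. The 60,000-token trajectory length is an assumption for illustration, not a figure from the conversation:

```python
# Wall-clock time to generate one long agent trajectory at various speeds.
trajectory_tokens = 60_000  # assumed length of a single long agent run

for tokens_per_second in (100, 1_000, 2_000):
    minutes = trajectory_tokens / tokens_per_second / 60
    print(f"{tokens_per_second:>5} tok/s -> {minutes:4.1f} min")
```

At 100 tok/s the run is a ten-minute background job; in the thousands of tokens per second it becomes a near-interactive loop, which is the kind of shift that changes how the tool gets used.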
Jacob Efron
Yeah. One other infra question I was curious to get your thoughts on: it seems like increasingly a lot of the cutting edge infra companies are building for agents as the buyers or users of their product, right?
Swix
Another huge theme. Yeah.
Jacob Efron
And I'm trying to figure out, what do you have to do differently about selling to agents? Are they just the ultimate rational developers, or is there more to it?
Swix
No, absolutely not. I think they are easily prompt injected and very tuned towards basically compounding existing winners. So congrats if you won the lottery of getting into the training data before 2023, because now you're installed in there for the foreseeable future. One stat that Vercel CTO Malte Ubl dropped at my conference was that 60% of traffic to Vercel's admin app, for configuring Vercel applications, is now bots, not humans. So your primary customer is agents now, and it's mostly coding agents, mostly people using CLIs or MCP or whatever. Step one: if it doesn't exist as an API that agents can use, it doesn't exist, which is a good hygiene thing anyway, to make everything API available. But now there's an extra push on product people to not only work on the UI; you should probably work on the CLI stuff too. Beyond that, I come from the sensibility that everything you are trying to do now for agent experience, which is the term that Matt Biilmann at Netlify is trying to coin, is the same thing you should have been doing for developer experience. You should have had good docs. You should have had a consistent API that is mostly stateless. You should have had discoverability, or progressive disclosure, or search, or whatever. So if people now have energy to find these customers and do that, great. Do I believe in extending beyond that into something like AEO for gaming the chatbots? Not necessarily, but obviously there are going to be huge advantages for people who figure out the short term wins, and short term wins can compound.
Jacob Efron
Do you think these compounding advantages accrue to the companies that got in before the pre-training data cutoff? Obviously over some period of time I imagine that doesn't persist. As you think about three or four years from now, what do the selection criteria end up being? Do you think it still mirrors exactly what you were saying before, that it's exactly what you should have been doing all along to sell a good product to developers?
Swix
It could be, except that in three or four years we'll probably have much better memory and personalization, so general AEO or GEO won't really matter as much. Whatever memory or personalization system we end up with will probably determine what you end up choosing, much more than what is currently the case, which is just frequency of mentions, let's call it; you just spam quantity. And that's something I'm looking forward to. I do think the fundamental exercise to work through for yourself is: if you start a new disruptor company now, and there's a big incumbent that everyone knows, like Supabase, which is kind of the Postgres database incumbent, how would you compete with them? I don't necessarily have the answer. But Resend is relatively new, I think they started in 2023, and there was a recent survey where people checked what Claude recommends by default: if you don't prompt it with anything, just say give me an email provider, it says Resend in like 70% of cases. The fact that you can get in there with such a relatively short existence I think is encouraging. I do think you want to do whatever it takes to get into that very short list of mentions, because it's not going to be 20 of them, it's going to be like three.
Jacob Efron
No, definitely. It feels like probably more consolidation than ever, kind of a winner take most market, more than the physics of go to market in the past might have enabled.
Swix
The other thing also is that semantic association is going to be very important, in the sense that you want to do the combo articles where you're like, use my thing with Vercel, with blah blah blah, and that all gets picked up in a corpus. So that's probably one thing you want to do. Beyond that, I don't know. It's one of those things where I feel like I'm behind. I don't know how you feel about this.
Jacob Efron
I think AI is just everyone constantly feeling like they're behind something. I want to meet the person that doesn't feel behind on AX.
Swix
Right. So my stance is exactly what I said before: everything that you should do for agents is something that you should have done for humans anyway. To the extent that you're just getting more energy to do things for agents, great. But it's hard to articulate what new thing, apart from just more spam, you should be doing. That will be my take right now. I do think there will be more turns at this. The personalization turn that is coming will be big, and I don't know what that looks like, because we basically feel kind of tapped out on the memory side of things.
Jacob Efron
Yeah, I guess since we last chatted, you took this role over at Cognition, and you obviously have a front row seat to the AI coding space today. Besides being the mother of all markets and this massive opportunity, I think coding is kind of a preview of what's to come for many other spaces: agents are most advanced in coding, and the competition between foundation models and application companies mirrors what we may see elsewhere. So maybe for our listeners, can you just lay out: what is the state of the AI coding wars today?
Swix
It is massive, right? And I don't think, last time we talked about this, we necessarily appreciated the size of it.
Jacob Efron
I wish we did.
Swix
The state of the AI coding wars today: both OpenAI and Anthropic have made it their P0 to compete in coding. Anthropic is at like 2.5 billion in ARR just from Claude Code; the way they recognize ARR is up for debate. For OpenAI I don't think a public number is known, but let's call it 2 billion as well. And then Cursor is rumored to be at 2 billion. Those are the public numbers that are known. So these are huge markets that have been created in just the past year; Claude Code just recently celebrated its one year anniversary, which is pretty insane. The other thing that I see is people saying, here's the relative penetration of Claude use cases, and it's coding at 50%, and then legal and whatever else make up the remaining ones. There was a very popular tweet that was like, okay, look at the empty space in all these other use cases; if you are a new founder today, you should be betting on the other stuff, on a sort of catch-up theory. And my pushback is the same pushback that I had on Apple versus Google, which is: well, why is this time different? If it went from, let's say, 10 to 50% in the past year, why can't it keep going? Getting that wrong is actually very painful, because you could have just made the momentum bet instead of the mean reversion bet. So I think that is the state of things. People are very much into psychosis now; they are getting rewarded for spending more rather than spending less. We're not in the efficiency phase, we're in a phase of, sort of, capability exploration. So people who are more crazy, who are more creative, get rewarded comparatively well.
Jacob Efron
It's interesting. It feels like behind these token maxing leaderboards and whatnot, the first phase of this transition from a workforce perspective is that you just have to show your employer: hey, I use these tools, here's the number of tokens I cost.
Swix
And that's it. They don't care about the quality right now. It is maybe distasteful to someone who cares about the craft and all that, but directionally, everyone just wants your number to go up regardless. So it's not very discerning and it's probably very sloppy, but I think it's net fine, because we're still probably underusing AI in general. And so I think that's very interesting. We had Ryan on the podcast, who spends a billion tokens a day, and for those counting at home, that's something like $10,000 worth of API tokens a day at market rates. Most of us can't afford that, and probably a lot of what he does is slop. But if there were a new capability, he would discover it first, before you, because he was trying and you were not. You only do things that work well? Good for you. But the people who are going to discover the next hot thing are living at the edge, right?
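The "$10,000 worth a day" figure is simple arithmetic, assuming a blended price of $10 per million tokens (the rate is an assumption for illustration; actual API pricing varies by model and by input versus output mix):

```python
# 1B tokens/day at an assumed blended rate of $10 per million tokens.
tokens_per_day = 1_000_000_000
assumed_usd_per_million_tokens = 10.0  # assumption; varies widely by model

cost_per_day = tokens_per_day / 1_000_000 * assumed_usd_per_million_tokens
print(f"${cost_per_day:,.0f} per day")
```

which works out to $10,000 per day at that assumed rate.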
Jacob Efron
And increasingly, living at the edge just means having the compute budget to run these experiments, similar to living at the edge on the research side, which has always been constrained in many ways by the amount of compute you had to run experiments. It feels almost the same on the builder side, actually using these tools.
Swix
Now, the other thing that's very obvious is that Anthropic is kind of the high priced premium player, where restricting limits, or even restricting model releases, is the name of the game. Whereas Codex is like: come on in guys, use our SDK, use our login, we don't care, we're going to reset limits. Whatever you do, you want to try to exploit the subsidies where you can get them, and Codex is definitely super subsidized right now; Gemini is also very subsidized. While that's going on, it's not that bad to be a capabilities explorer on just the $200 a month plan from Claude Code or from OpenAI. And my sense is that people aren't even there yet.
Jacob Efron
How do you think this market ultimately plays out? It's obviously such a big market that any slice of it is interesting for anyone going after it. But what makes the coding market particularly interesting is that it feels like a foreshadowing of what will happen in any other application market that the foundation models eventually turn to, aim all their models at, and gather data around. So does there end up being room for lots of different kinds of players, or what do you think the end state of this market is? And do you think that's applicable to other markets?
Swix
I feel like there will be. I mean, the status quo is probably the most likely outcome: there are two big players, and there's a small range of longer tail players that fit other use cases that the two big players don't. That feels right to me. For the market structure to significantly change, there needs to be a significant change in the economics, or the brand building, or the value propositions of the companies involved, and I haven't seen anything in the last six months that has really changed the stories materially. So I feel like they will just keep going until something else happens. Something else happening means, say, Microsoft wakes up and goes: guys, we have GitHub, we'll do something much bigger here than just Copilot. That would be a big change. MSL has put out a model now, and I was at a breakfast with Alex Wang where they were like, yeah, we really, really want to go after the coding use case. They haven't done anything yet, but don't underestimate them, right? And similarly for the Chinese labs, I think they're trying to go after it; Z.ai is doing stuff, and Z.ai and GLM are the same thing. Everyone's trying to get a piece of that pie. But I feel like the status quo has been pretty stable for almost a year now, I will say.
Jacob Efron
And is there room for the application companies, maybe more on the enterprise side? What surface area do the model companies leave for application companies?
Swix
Yeah, that's a good one. It's very much evolving, I will say, because OpenAI did not have this level of attention on coding a year ago; we just don't have that much history. And it seems like, for example, the big push at OpenAI now is the super app. Is that a consumer thing? Is that a product portfolio rationalization thing? How much is that going to take attention away from coding at the time when they actually do want to put more into coding? I think it's very unclear. So at both big labs (sorry, at both OpenAI and Anthropic; Gemini and xAI are separate cases) they are trying to find the other TAM expansion areas: Claude Code for finance, Claude Cowork, all those things. Whereas I think Cursor and Cognition are comparatively just focused on coding. So I do think the labs leave space, and for the other verticals that means the same thing: they're not going to be that intensely focused on any one domain. Except that I would mark out finance and healthcare as the next ones they're clearly going after. Comparatively, healthcare seems more thorny; there have been some announcements about it, but I would respect the finance work a lot more just because the path to money is a lot clearer.
Jacob Efron
Yeah, obviously I think maybe similar to the space that's being left in these other domains, there's obviously a lot that's required to actually implement these tools in enterprises versus maybe just giving model access to folks out of the box.
Swix
Yeah. So the agent lab thing is like we'll do the last mile for you. Whereas I think the model labs tend to just trust the model and be minimalist about it. Both of them work. I don't necessarily think one beats the other for every use case. All I do know is that it does seem like the large enterprises do want a dedicated partner that isn't just the model labs, which is kind of interesting.
Jacob Efron
We've been in this phase of pure capability exploration, and nothing has been better for the large labs. They are always going to be at the frontier of capability exploration, and so I think they have a very good relationship with a lot of these enterprises. But ultimately, over time, the incentive structure of these labs is always going to be maximal token consumption for the end customers they work with. And there are just so few companies that have actually gotten to massive scale. Maybe coding again is the most interesting, since it's the first space that has really just completely taken off. You must live it every day. Absolutely insane.
Swix
And I think, okay, we say good things about Cursor and Cognition, but the sheer liftoff of both Anthropic and OpenAI, because they have independent valuations... I mean, let's throw xAI in there. It's now IPOing at 1.2 trillion. That number is just mind-boggling. I feel like in normal investing, or normal startups, there's kind of a ceiling market cap or valuation that you reach and you go, all right, it's going to be chiller from now on. And these guys are not slowing down.
Jacob Efron
Well, I also think the dynamic that's fascinating about some of these later stage companies is that, in the past, in the venture world, if you got to a certain level of scale, the question around you was really more a valuation question. This is why there were different types of venture people: the late stage growth people were just incredible at assessing the ultimate market opportunity of a company and the right way to value it. You knew the outcome was in some band; sure, there's some variance to it, but it's relatively understood what that band is, and then maybe over time you get surprised to the upside. Whereas now, even for the labs themselves, for any later stage company, the band of what that company might be worth right now, or in a year or two, is so massive because of how fast the ecosystem changes. Even for later stage companies, every three months could be an existential level event, to the upside or the downside. You're obviously seeing it in the positive with Claude Code: if you think about a company like Anthropic, for a while it was unclear if they were going to have access to enough capital to really stay in the race. Then coding hit at the exact right time, they had the perfect model for it, they executed brilliantly, and now they're one of the most valuable companies in the world.
Swix
At the same time, I have zero sympathy for OpenAI because they're crushing it and they're all rich. This is like a high class champagne problem to have to be number two at coding or whatever. Who cares? You're doing great.
Jacob Efron
It's funny though, you would be closer to this since you're in the coding space, but a lot of people I talk to think Codex is just as good, if not better, than Claude Code. Maybe Claude Code is a better product in some ways; I'm curious for your thoughts. One thing that I've been really surprised by is that in consumer AI, with ChatGPT, you saw this big first mover advantage, right? Admittedly, today Claude and Gemini are great products, and it's not abundantly clear ChatGPT is any better. But people stick with ChatGPT; it's the first thing that introduced them.
Swix
They stay, but they're not growing anymore. I don't know if you've seen that, right?
Jacob Efron
But that to me is more of a product problem. It's not like they've lost share to someone else. My understanding is that the overall problem with consumer AI today is much more: how do you take this tool, which for folks like us knowledge workers is an incredible magic tool, but is not necessarily a daily active use tool for a lot of people around the world today, and build the products that change that? It's kind of a category wide problem. In coding, for example, the entire space has gone parabolic. There may be some relative growth among other consumer AI players, but it's not like consumer AI as a category is going parabolic and ChatGPT is failing to capture most of it. The larger problem is much more that the category has hit a bit of a plateau; people haven't figured out how to bring tons more users on board or increase the frequency of those users. So it seems more like a category wide problem than a massive market share change. I was going to draw the comparison to the coding space, where Claude Code was obviously the first product to introduce people to this magical experience. By all accounts, Codex is pretty damn close to as good, if not better. You would have thought that would not be a super sticky product surface area, but it actually has been. It turns out the first lab to introduce you to an experience really does keep a lot of the focus.
Swix
I think maybe it's still early days. ChatGPT is like three plus years old, and Claude Code is only just around a year, so give it time. Definitely a lot of people have switched to Codex; maybe that will keep going, it's really hard to tell. I do think that because we are in this high volatility, high temperature phase, the loyalty and stickiness to first movers and category creators is not as high as it might be in some other areas we've looked at in our careers.
Jacob Efron
Yeah, though I mean I've been surprised by the Claude Code thing. I would have thought that like in many ways I always worried about the.
Swix
You think it would have been gone by now?
Jacob Efron
Not gone, but I always worried that the consumer business of these companies would be quite sticky, and the enterprise API business was actually in some ways your least loyal buyers; they would move to.
Swix
But they worked out that it wasn't the enterprise API, it was enterprise product.
Jacob Efron
Totally. Maybe that was the secret, but the amount of lock in or just default behavior that has happened in that space is more than I might have imagined with two products that by all accounts are pretty damn similar.
Swix
Yeah, no fight there. I will say I do think that Codex is still in catch-up mode in terms of personal experience. The only thing I like out of Codex is, like, Spark, and I feel like the skills integration is a little bit better. I feel like the speed is a bit better, maybe because it's written in Rust or whatever. Very minor things that you're almost telling yourself rather than objectively assessing between the two of them. I do think, vibes-wise, that's what's going on. I feel like the missing question in this whole debate is: why is it so concentrated in only two names? Where is the Gemini presence? Where is the xAI presence? And they are trying. It's just they haven't made that much progress yet.
Jacob Efron
I think what the Claude Code moment does show is that it actually in some ways makes you a little more bullish on the potential for someone else to catch up. Because it does feel like if you're the first person to introduce some magical net-new product experience, that actually might be stickier than one might have imagined.
Swix
Right, right, right. Okay. Yeah.
Jacob Efron
And so what do you think that new product experience might be?
Swix
It's like. And this is a failure of imagination on my part. People always say this: well, the thing that will save us is being first to the next new thing. But what is it?
Jacob Efron
Yeah, I don't know, something around like consumer agent, computer use, like hybrid. I think, obviously I think we're like scratching the surface on the consumer side.
Swix
So my current theory is that OpenClaw is a vision of things to come.
Jacob Efron
Totally.
Swix
And it's good that OpenAI has the association with OpenClaw, but by no means do they have the right to win it. The general thesis that I have been pursuing now is that the same way that 2025 was the year of coding agents, 2026 is coding agents breaking containment to do everything else. And so coding agents continue to win, but because they generate software and software eats the world. So it's kind of like the transitive property: software eats the world, coding agents eat software, therefore coding agents eat the world, which is interesting.
Jacob Efron
And breaking containment is always an easier phrase in the consumer context than the enterprise one. You've seen people run these really cool experiments in their own personal lives. Obviously everyone's focused on the enterprise side now, around how you create these experiences. I feel like people love to have these narratives that everything has completely shifted. It's like, actually, OpenAI, organizational volatility aside, has great products, great team, great models. Everyone else in the world is incentivized for there to be two, three more. Everyone would love more great model companies. And so I feel like the natural forces of the world revolt when any one company is too much the star of the show. There are so many people in the ecosystem that are incentivized for that not to happen. And so I'd be shocked if we don't have a reversion of vibes, maybe not completely the other way, but at least a little bit more equal, at some point over the next 6-12 months.
Swix
I think there are just kind of different stages. When you talk about the world wanting more model companies, I think about the neolabs, and, I mean, I don't know, is it fair to say none of them have really broken through in the past year?
Jacob Efron
I think that's totally fair.
Swix
Which is rough. And, well, how are we going to grow that diversity and choice? This is it. Yeah.
Jacob Efron
It'll be really interesting to see what ends up happening with that. And you've seen folks like Nvidia very incentivized to make sure there's a broader platform of other model providers, I think.
Swix
I don't know, people say this, but I don't think they try that hard. Nvidia tries harder to build neoclouds than neolabs.
Jacob Efron
Well, they try pretty damn hard to build neoclouds, so that's.
Swix
Yeah, but, you know, let's call it the CoreWeaves of the world. A much happier place than any neolab built on top of them. Yeah.
Jacob Efron
Though one might argue it's easier to enable a neocloud to be successful than a neolab. You can't will a neolab into existence the same way.
Swix
You have more control over it, for sure.
Jacob Efron
What else is kind of catching your eye today on the startup side? There's obviously this whole narrative that the foundation models announce a product and every stock goes down 15%.
Swix
Yeah.
Jacob Efron
Do you worry about the foundation models just eating into a bunch of these startup categories?
Swix
Not really. Okay, there's the point of view of being an investor in startups, and there's the point of view of: do you want to start something? And honestly, the downside for all of these is so minimal, in the sense that the worst case is you just get hired into one of these labs anyway. So for the people who just do things and try things and try to execute in a competent way, even if it doesn't work out commercially, even if it just wasn't that great, that's your job interview to go into one of these labs anyway. So I don't feel that from a very, very small startup's perspective. Mid-sized startups, yes. I would say there's been a lot of LLM infra consolidation, like the Langfuses of the world getting acquired by ClickHouse. I think people have maybe worked out the domain-specific playbook, and I think that's okay. I'm not that worried about it. I would be more worried about traditional SaaS, low-NPS SaaS. This is the whole AI-versus-SaaS debate that's been going on, and literally I'm going through that exact thing in my company, so I'm thinking through this on a very visceral level. On one hand you have the people who say: you vibe coders don't appreciate the amount of work that goes into a CRM, and you think you can rip out Salesforce? So did the 30 entrepreneurs before you. You classically underestimate the things that you don't deeply know, and the target audience is not you. At the same time, we have never been able to build software so easily and customize software so easily, and yeah, you're not going to use 90% of the things in Salesforce. So what's the take?
Jacob Efron
What have you done internally?
Swix
So there's the main SaaS we use for event management and sponsor management, and we pay 200k a year for that. Not huge, but chunky for my scale. And yeah, I could probably spend 2,000 and build a custom version of it. The trick has been dealing with the rest of my team and getting them on board, because I'm the most technical person on my team but I can't make that decision myself. In the same way, I've been telling other CEOs and team leaders as well: you can be super Claude-pilled, you can be deep in LLM psychosis and think that's okay, but you have to bring your team with you. And I think the widening disparity in LLM psychosis inside companies is causing real rifts. On one hand, the people who are less AI-native are not getting the picture; they're actually behind. They're not waking up to the fact that everything you think is necessary is not actually that necessary, and in fact it would be better for you if you just held your nose, went in, and came out the other side only talking to agents in natural language; your life would actually be better and you're just close-minded. There's that perspective. The other perspective is: oh, you vibe coder, you did this in a weekend and you got the 80% solution, and now the rest of your employees have to pick up the rest of your shit that you thought you were so hot at, but actually you didn't figure it out, and actually LLMs are still useless at this, and blah, blah, blah. So I think there's this huge debate going on in every company right now, and I have a small microcosm of it. Yeah, it's making me hesitate to pull the trigger, but I will at some point. Maybe I put it off for one year, but not five. But SaaS is definitely getting squeezed. It does make me wonder.
I do think that there's an opportunity for a more AI-native system-of-record thing that is not just Postgres or MongoDB, although both are very good. Maybe it's Convex; people bring up Convex a lot. I don't know. I just feel like the quote-unquote Firebase of AI apps isn't really a thing yet beyond what we have, which is fine. We could probably start in a more rapid iteration cycle first before scaling up to a Postgres or MongoDB, which are older tech. I was at a dinner with Mike Krieger, the CPO of Anthropic, and we were just going around the room asking: what are people most worried about? And for me, instead of security, I brought up biosafety.
Jacob Efron
Classic, actually.
Swix
Like I said, it was cliche and classic. And the rest of the table were like, what do you mean? Someone sitting at home can manufacture a virus that wipes out half of humanity?
Jacob Efron
It's like the OG Geoffrey Hinton. Like, this is why you should be scared.
Swix
I'm like, yeah, read the risk reports. This is the thing, I think. And Mike was just sitting there, knowing he was sitting on Mythos, and going, actually, it's security. And I think part of it is very good marketing. Too good. I would actually advise Anthropic to tune down the marketing, because it's just a very good model and you don't have to make so many marketing claims around it. At the same time, it is not really a private model if you give it to 40 companies, each of whom has 10,000 employees or whatever. Right? It's not private. There are bad actors in there.
Jacob Efron
Hopefully not as bad as releasing it widely, but. No, I mean, it's an interesting case study. Of many model releases, this might be the first one that looks like how the rest of them will look from now on.
Swix
Right. There's an overall product strategy for Anthropic of restricted access, bundling product with model, maybe. Whereas OpenAI has definitely been a lot more philosophically aligned on: we will just enable access everywhere and we don't know what will come out of it. Right.
Jacob Efron
Though in this current moment, obviously, the cynical take is that it also just ties to the amount of compute that both companies have.
Swix
Yeah, I think that's true. I do think the scale and the dawn of larger-than-10-trillion-parameter models is very interesting. I think it's a temporary phenomenon, because we have much larger compute clusters coming online for everyone over the next three to five years, and this is already written in the cards. So as to whether we'll have rationing of models above 10 trillion parameters in two years, I don't think so.
Jacob Efron
I think everyone will just have rationing of the next phase.
Swix
Right, right. But that's as it should be. My classic example, and this is just me theorizing, not anything confirmed by Google: when Google announced Gemini, they actually announced three sizes, Flash, Pro, and Ultra. They never released Ultra; they only have Pro and Flash. So my theory is they have Ultra sitting in a basement and they just kept distilling from it for Flash and Pro. Which, yeah, I actually think is as it should be for any lab. Yeah.
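The distill-from-a-bigger-internal-model theory above can be sketched in a few lines. This is a toy illustration, not anything about Gemini itself: the vocabulary size, distributions, and the plain cross-entropy objective are all simplifying assumptions.

```python
import math

# Toy sketch of distillation: a small served "student" model is trained to
# match the next-token distribution of a big internal "teacher" model.
# Four-token vocabulary and all probabilities are made up for illustration.
def distill_loss(teacher_probs, student_probs):
    # Cross-entropy H(teacher, student); minimized when student == teacher.
    return -sum(t * math.log(s) for t, s in zip(teacher_probs, student_probs))

teacher = [0.7, 0.1, 0.1, 0.1]   # big model's next-token distribution
aligned = [0.7, 0.1, 0.1, 0.1]   # student that matches the teacher
drifted = [0.25, 0.25, 0.25, 0.25]  # student that ignores the teacher

# The matching student always has a lower distillation loss.
assert distill_loss(teacher, aligned) < distill_loss(teacher, drifted)
```

In practice labs distill with richer objectives (KL on full logits, sequence-level losses), but the core idea is this: the small model's loss is measured against the big model's distribution, not just the raw data.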
Jacob Efron
Just because those are the models that people actually want to end up using, and it's just, like, the cost.
Swix
Yeah, it's cost. It's not the want, it's just the cost. I do think it's interesting that for a while I was considering the theory that models capped out at 2 trillion parameters, and I think that's proving to be wrong. Well then, if I'm wrong, how wrong am I? Do we do 200 trillion? Do we do 2 quadrillion? Whatever. And I don't think we have a straight answer to that. But it's interesting that we are continuing to scale the number of params when everyone can kind of see that we're not going to get the next thousand or million x from this paradigm. So the Ilyas of the world are working on other model architecture improvements. We need a different scaling law, I guess, because I feel like people already feel we're tapped out on this. The end state of this is we turn most of the world into data centers, and I don't know if we want that.
Jacob Efron
Yeah, if the returns of intelligence are there, maybe not so bad.
Swix
I think there's just a sheer amount of unscalability that is wrangling people's sensibilities right now, especially in terms of context lengths. My classic quote is that context length is the slowest-scaling factor in LLMs. We took maybe three years to go from a 4,000-token context length to a million, and that's about it. Gemini has had a million-token context length for two years now and no one's using it. So yeah, memory is probably going to be the biggest limiting constraint on all these things.
Jacob Efron
Yeah, certainly seems that way, I guess. I'm curious, over the last year, since you recorded last, what's one thing you've changed your mind on?
Swix
I feel like I was kind of bearish on open models last year, in the sense that I had just done the podcast with Ankur Goyal of Braintrust, who has a good cross-section of all the top AI companies, and he said the market share of open source is 5% and going down. I think that's changed. I think it's going up.
Jacob Efron
And even the capability gap does seem to be increasing, depending on
Swix
the data? It's hard to tell. Yeah, it's really hard to tell, because, okay, for listeners: "capability gap increasing" is on public benchmarks. Let's say you're comparing Mythos versus, I don't know, GPT-OSS or GLM 5.1. It's really hard to tell, because even if they were closing the gap, you would also not believe they were closing it that much, because it's very easy to game the benchmarks. So you just don't really know. All you know is there are somewhat objective OpenRouter stats on what people choose in the free market. And people do choose some of these open models in significant volume, except that a lot of them are heavily discounted, so you need to price-adjust these things. So even if that were true, which I'm not sure, I feel like the number is just up now instead of down. I think the separation between what the top-tier agent labs are doing versus the average startup in AI or the average GPT wrapper is significant enough that you should not worry about the mean industry number; you should cohort things into: here's the median, here's the bottom 80%, and here's the top 20%. And the top 20% acts very differently than the bottom 80%. The top 20%, which is all I care about, is definitely going towards more open models. The Fireworks and the Togethers are crushing it, and so will all the fine-tuners. So I think maybe last time we even said things like fine-tuning as a service doesn't work. Well, now it's going to work. It's a derivative of the open models market.
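The "price adjust" point can be made concrete with a toy calculation: raw token volume overstates open-model share when those tokens are heavily discounted. The model names, volumes, and prices below are invented for illustration, not real OpenRouter numbers.

```python
# Hypothetical usage data: monthly token volume (arbitrary units) and a
# blended price in $ per million tokens. Both models are made up.
usage = {
    "closed-model-a": {"tokens": 900, "price": 10.0},
    "open-model-b": {"tokens": 300, "price": 1.0},  # heavily discounted
}

def raw_share(usage):
    # Share by raw token volume, what the headline stats usually show.
    total = sum(m["tokens"] for m in usage.values())
    return {name: m["tokens"] / total for name, m in usage.items()}

def revenue_share(usage):
    # Price-adjusted share: weight each model's volume by what users pay.
    total = sum(m["tokens"] * m["price"] for m in usage.values())
    return {name: m["tokens"] * m["price"] / total for name, m in usage.items()}

print(raw_share(usage))      # open model looks like 25% of volume
print(revenue_share(usage))  # but only ~3% of spend
```

Same data, two very different shares, which is why volume stats alone can't settle the open-versus-closed question.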
Jacob Efron
Well, and also the workload is scaling to the point where people care about cost and speed more and more, moving from pure use-case discovery of what these models can do to: okay, we know what they can do at scale, now let's do it cheaper and faster. Yeah.
Swix
So that change I think is probably the most significant in my mind. And I always like to do the mental math, this is how I think about scheduling a learning rate: when you've been wrong once, what else were you wrong on? And I'm kind of working through it. To me, the other thing was the coding one, which obviously I have now come full 360 on. But I think people are not appreciating dark factories enough, which I don't know if you've discussed on the pod yet.
Jacob Efron
No.
Swix
And so this is kind of a Simon Willison term. The general idea is, okay, there are different levels of AI coding psychosis. The very first level, which by the way I encountered first at Cognition five months ago, was zero human-written code, which seems like a reasonable thing now but was less reasonable five months ago. The next frontier, which sounds as crazy today as zero human-written code did in the past, is zero human review: just check it in without even reviewing it. Very few people are doing that, but OpenAI is exploring this, and I feel like it's definitely the only scalable way to do this. It just means you have to flip the SDLC, or change large amounts of what you normally do, which is probably things you should have done anyway: more testing, more automated verification, or whatever. But that is a frontier at which, when you have unlocked it in your company, you are just going to produce much more software than you've ever had. And it's going to be so disposable, so cheap, that you can probably innovate in quality a lot as well. Quantity helps you get to quality, which I think people are very uncomfortable with, because people associate more quantity with slop. Right?
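A "zero human review" pipeline leans entirely on automated verification. Here is a minimal sketch of such a merge gate; the specific check commands (`pytest`, `ruff`, `mypy`) are hypothetical stand-ins for whatever a real team's CI actually runs, not a description of OpenAI's setup.

```python
import subprocess
import sys

# Toy sketch of a "zero human review" merge gate: agent-generated changes
# are auto-merged only if every automated verifier exits cleanly. The
# command lists below are placeholder examples of typical checks.
CHECKS = [
    ["pytest", "-q"],        # unit + integration tests
    ["ruff", "check", "."],  # lint
    ["mypy", "."],           # type checks
]

def auto_merge_allowed(checks=CHECKS) -> bool:
    for cmd in checks:
        result = subprocess.run(cmd, capture_output=True)
        if result.returncode != 0:
            return False  # any failing verifier blocks the merge
    return True

# Self-contained demo: a check that trivially passes, via the interpreter.
print(auto_merge_allowed([[sys.executable, "-c", "pass"]]))  # True
```

The design point is the one made above: once no human reads the diff, the tests and verifiers become the review, so the SDLC investment moves from review to verification.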
Jacob Efron
No, it's back to exactly the discussion we were having on the reaction to these token-maxing scoreboards, and the idea that today maybe that's not the best sign of productivity and efficiency going forward.
Swix
Yeah, but you still get rewarded for it. So you're like, fuck it, whatever. But I think the people who do most well in 2026 are not the cynics who go, oh, that's just slop, I'm not going to participate in that. They're like, okay, this is happening with or without me. Let's bend this the right way.
Jacob Efron
Yeah, I love that. I think for me, a related thing on the open source model side is that for so long I really didn't think it made any sense to do any sort of RL, post-training, pre-training, anything you could do to improve overall quality. For latency and cost it always made sense to me, but for overall quality? God, you just get that for free when the models improve three, six months later. What I'm starting to change my tune on a little bit is hearing all these app companies talk about: we build stuff and then we throw it out three months later as the models improve. You're like, okay, well then what you're doing for capability improvement is just another version of that. I still don't think that your RL or post-train is going to give you a better model for years and years to come. And I think you still have to be pretty rigorous: is that the single best thing you can do to solve a customer problem? Oftentimes it's literally just: add more data, feed more data, even via connectors, to these models, or do some clever engineering on the back end, or whatever it is. But if the single best thing you can do in that three-month time period to improve your customers' outcomes is post-training in some way that really improves the output of a model, even if you throw it out three months later because the general models catch up, it still might have been worth doing. And so I think I'm more open to it.
Swix
You throw out the results, but you don't throw out the raw data.
Jacob Efron
So then you just run it again. And so basically, obviously at a cost of, like, $10 million, maybe that's too much, but there's some level of cost where.
Swix
No, it's not even $10 million.
Jacob Efron
No, of course it's not. There's obviously some level of investment at which it's the equivalent of just like staffing four engineers to go build something for three months.
Swix
Yeah. So the other thing, for listeners, I'm just going to leave some droplets of info: look into long-trajectory RL. The synthetic rubrics work that people are doing is very important, including something called Dr. GRPO. I'll just leave those key search terms there. I think what it means is that RL is going much more multi-turn than people think, and that means you can customize the models on way more specific dimensions than traditional, let's call it SFT, or the sort of shallow RL that was done a year ago. Hundreds of turns. And I think that leads you down a path of complete domain specificity.
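The rubric-scored, multi-turn RL setup mentioned here can be pictured as a reward function over a whole trajectory rather than a single answer. The sketch below is illustrative only: the rubric items, the trajectory format, and the simple averaging are invented for this example and are not the Dr. GRPO objective itself.

```python
# Minimal sketch of rubric-based reward scoring for multi-turn RL.
def score_trajectory(turns, rubric):
    """Average rubric score over a multi-turn trajectory.

    turns:  list of assistant messages (strings)
    rubric: list of (description, check_fn) pairs, where each check_fn
            maps the full trajectory text to a score in 0.0..1.0
    """
    transcript = "\n".join(turns)
    scores = [check(transcript) for _, check in rubric]
    return sum(scores) / len(scores)

# Two toy rubric items; real synthetic rubrics are far richer and often
# themselves generated or graded by a model.
rubric = [
    ("cites a source", lambda t: 1.0 if "http" in t else 0.0),
    ("stays concise", lambda t: 1.0 if len(t) < 2000 else 0.0),
]
turns = ["Here is the answer with a link: http://example.com", "Done."]
print(score_trajectory(turns, rubric))  # 1.0 for this toy trajectory
```

Because the reward is computed over the whole trajectory, the same machinery scales from one turn to hundreds, which is what makes the domain-specific customization Swix describes possible.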
Jacob Efron
What else, of these unanswered questions in AI today, are you paying close attention to in the next year?
Swix
I have a few theses for what the next frontier is. One is memory, memory and personalization, which we talked about. The other is really world models, which we've done a small series on, from Fei-Fei Li to even Moonlake and General Intuition. And there's a lot of debate as to the relative importance of this. I think a lot of it manifests as 3D static worlds that you inhabit for a little bit and walk around, and people are like, cool, but how does this help me with my B2B SaaS?
Jacob Efron
It's like all the hype now is robotics, right?
Swix
Yeah. And there's obviously a correlation between world models and embodied vision and experiences, which leads to robotics. But I think world models are very interesting just in improving intelligence itself beyond the next-token-prediction paradigm. And so I think people are testing their edges around that. One of our top articles this year so far has been on adversarial world models. I do think, if you don't do anything else, just read Fei-Fei Li's essay on spatial intelligence, on why LLMs don't have it. She may not have the solution yet, but she has the right problem statement. And so everyone else is trying to solve that problem statement in their own way, and let's see who wins. But I don't think it does you any favors to equate world models with robotics or gaming or the current manifestations, because what is at stake is a much more important conception of intelligence than just answering questions. It is: does the AI understand what a table is, what matter is, what physics is? For those who are movie fans, it's almost like Good Will Hunting, where Matt Damon knows everything because he read it in a book, but he's
Jacob Efron
never experienced it. Great, great scene with Robin Williams.
Swix
And I look at that scene and I go, that's exactly the difference: a very intelligent LLM that knows everything but hasn't experienced anything.
Jacob Efron
Wow, that's an awesome note to end on. Have you used that analogy before?
Swix
That's great. Yeah. So one thing I've done with Latent Space is I moved to adding daily write-ups. And so one of the times I was doing this daily write-up, I
Jacob Efron
wrote, that's a great one. I love that. Well, it's been a ton of fun. Thanks so much for coming on. I'm Jacob Efron, and this has been Unsupervised Learning, a podcast where I get to talk to the smartest people in AI and ask them tons of questions about what's happening with models and what it means for businesses in the world. As I hope is clear, I have a ton of fun doing this. It's a nights-and-weekends project in addition to my day job as an investor at Redpoint. But our ability to get these incredible guests on really comes from folks like you subscribing to the podcast and sharing it with friends. It's really what ultimately makes this whole thing work, so please consider doing that. Thank you so much for your support and listening. We'll see you next episode.
This special crossover episode brings together Swix, founder at Latent.Space, and Jacob Efron from Unsupervised Learning. Their energetic, insider conversation covers the rapid evolution of AI engineering, with a focus on what’s new in coding agents, the changing market structure, infrastructure trends, foundation model dynamics, startup impact, and the evolving playbook for AI-first companies. They debrief Swix’s recent experience at AIE Europe, discuss foundation model bloat, agents as customers, startup defensibility, coding wars, and the next frontier for AI labs and engineers.
| Timestamp | Segment Summary |
|-------------|--------------------------------------------------------------------------|
| 00:00–01:30 | Opening, crossover intro, context, main themes at AIE Europe |
| 02:13–03:20 | Infra trends, stability, LangChain pivots |
| 04:34–06:38 | Horizontal vs. vertical strategy, application vs. infra, startup challenges |
| 06:59–08:50 | Agent lab playbook, own model training, value vs. marketing |
| 09:28–10:24 | Rise of alternative chips, speed benchmark implications |
| 11:25–13:43 | Infra for agents, agent experience, Vercel stat, product hygiene |
| 17:26–20:41 | State of coding wars, scale, "living at the edge," compute advantage |
| 20:55–23:39 | Foundation model competition, stability, potential for market inflection |
| 28:13–32:27 | Coding first-mover sticky effects, Claude Code vs. Codex |
| 35:44–37:49 | Labs vs. startups, SaaS at risk, low-NPS SaaS replacement |
| 37:33–39:39 | Internal debate on in-house SaaS replacement (case study) |
| 40:21–41:19 | Security vs. biosafety, responsible releases (Anthropic anecdote) |
| 44:57–47:15 | Shift in open model share and economics, fine-tuning and open infra |
| 47:44–49:07 | Dark factories, zero-review software, next leap in coding agents |
| 51:58–54:04 | World models, spatial intelligence, next AI intelligence benchmarks |
Swix and Jacob provide an unvarnished, insider-rich tour of AI engineering in flux, where code generation and synthetic agents are eating not just software but SaaS categories and the startup ecosystem itself. They work through the practical, economic, and cultural consequences of these shifts, debating infra stability, emergent memory and personalization, the sticky power of product "firsts," and why open models and non-NVIDIA chips matter. From "dark factories" to world models, they lay out a vivid map of what matters at the edge of AI's rapidly unfolding frontiers, ending on a call for more embodied, spatially grounded intelligence in the next wave of AI models.
For more on this and similar episodes, visit: https://latent.space