
Podcast Summary

This Week in AI – Episode 10 (Apr 23, 2026)

Guests:

Aravind Srinivas (CEO, Perplexity AI)
Edwin Chen (CEO, Surge AI)
Host: Jason Calacanis

Main Theme

An in-depth roundtable with CEO-level AI experts covering the future of AI applications, the evolving business landscape, Apple's AI potential, model commoditization vs. specialization, benchmarks and user-centric product development, and lessons from bootstrapped billion-dollar companies.

Key Discussion Points

1. The State and Future of Coding with AI

LLMs in Coding: The discussion opens on whether coding is "solved" by large language model (LLM) tools like Claude Code, Codex, Copilot, Cursor, etc.
- Coding is "completely open ended," unlike games like Go or Chess (B, 00:15; C, 00:21).
- The space of possibilities in coding is endless, limited only by imagination.
End Game?:
- Both guests agree we are nowhere close to "solving" coding with AI (B, 34:12; C, 34:14).
- Coding is not a closed system; it's about creating new things with no defined end state.
- New paradigms are forming: from autocomplete (Codex, Copilot) to "auto-diffs" (command-line changes), moving toward "auto-outcomes" (you specify the outcome, AI gets you there) and eventually perhaps compiling directly to binaries (C, 35:36–36:44).
Edwin Chen: “Coding is very different from playing Go... it’s completely open-ended. You could literally create any program in the world.” (34:15)

Aravind Srinivas: "If AIs can actually move you to the level of working at outcomes and binaries, then what you think of coding also changes..." (36:46)
Efficiency and Job Creation
- AI tools have massively increased productivity; small teams can now build and ship faster, reducing need for large dev teams (A, 37:41–39:20).
- The "no code" movement has been obsoleted as LLMs enable true general-purpose AI-powered coding (C, 39:00).

2. Data Labeling & the “School for AGI”

Beyond "Data Labeling"
- Surge AI does more than basic data labeling; it’s like “building a school for AGI.”
- Employs high-level experts (e.g., Harvard, Princeton, Stanford PhDs) to "teach" AIs advanced reasoning, creativity, and taste—not just facts (B, 05:59–07:21).
Edwin Chen: “I often think of what we’re doing as raising these models not just to be correct... but to think and to have certain kinds of values and to have wisdom and taste…” (06:18)
Terminology Evolution
- Prefers “AI teaching” over data labeling or even “data training.” (07:21)
- AI “teaching” includes probing, measuring, and instilling values.
How the Loop Works
- Experts interact with models until they find failures—then instruct them on the correct reasoning. Failures may come from user downvotes, direct probing, or loss patterns analyzed by data science. (B, 08:59–10:23)
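
The loop described above (probe, verify the failure, teach the correct answer) can be sketched roughly as follows. This is only an illustration under stated assumptions, not Surge AI's actual pipeline: the `Correction` record, the `collect_corrections` function, and the toy model and reference answers are all hypothetical names invented for the example.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class Correction:
    """One verified failure plus the expert's corrected answer (hypothetical record)."""
    prompt: str
    model_answer: str
    expert_answer: str
    source: str  # e.g. "probing", "user_downvote", "loss_analysis"


def collect_corrections(
    candidates: Iterable[Tuple[str, str]],
    model: Callable[[str], str],
    is_failure: Callable[[str, str], bool],
    expert_answer: Callable[[str], str],
) -> List[Correction]:
    """Keep only candidates where the model really failed, then attach a correction."""
    corrections = []
    for source, prompt in candidates:
        answer = model(prompt)
        # A downvote is only a hint: the failure is verified before teaching.
        if not is_failure(prompt, answer):
            continue
        corrections.append(Correction(prompt, answer, expert_answer(prompt), source))
    return corrections


# Toy stand-ins (assumptions for illustration only).
reference = {"2+2": "4", "capital of France": "Paris"}


def toy_model(prompt: str) -> str:
    return {"2+2": "5", "capital of France": "Paris"}.get(prompt, "unknown")


candidates = [("probing", "2+2"), ("user_downvote", "capital of France")]
corrections = collect_corrections(
    candidates,
    toy_model,
    is_failure=lambda prompt, answer: answer != reference[prompt],
    expert_answer=lambda prompt: reference[prompt],
)
```

Here the downvoted "capital of France" query is dropped because the model's answer was actually correct, which mirrors Chen's point that users are often wrong when they thumb an answer down.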

3. Apple’s AI Edge and Silicon Strategy

Tim Cook's Successor & Opportunity
- Discussion triggered by Tim Cook’s transition and John Ternus’s elevation.
- Apple Silicon (the M-series chips) is a powerful, underrated AI asset. Future devices could support local LLM inference, agentic workflows, and unparalleled privacy (C, 12:17–14:22).
Aravind Srinivas: “Agent loops start running locally... You get to own your agent loops, what data your agent accesses on your local system... All that can stay private.” (13:11)
Apple’s Models vs. Frontier Models
- Open-source LLMs are catching up. New models (e.g. Kimi K2.6, Qwen 3.6) can run locally, leveraging Mac hardware (C, 12:17–14:22).
- Apple is not positioned to be disrupted: the iPhone's value lies in privacy and ecosystem lock-in (C, 20:42).
- Apple should create its own foundation model to preserve brand voice and user trust (B, 15:56–17:18).
Edwin Chen: “If you don’t build your own, you’re relying on somebody else’s taste and personality… Apple has always had such a strong vision... they can’t just outsource it.” (17:18)

4. Bootstrapped Billion-Dollar Companies & Capital Allocation

Surge AI’s $1B Bootstrap
- Surge AI reached a billion in revenue without raising venture capital. (B, 24:01)
- Dangers of large AI fundraising: misaligned incentives, growth over quality, board pressure, "unnatural acts" to chase growth (B, 24:01–25:41).
Edwin Chen: “When you raise a billion dollars, you get growth targets… incentivize volume over quality… your model starts whispering clickbait in your ear.” (24:23)
Perplexity’s Capital Discipline
- Raised ~$2B but focused on profitability, positive gross margins, and disciplined growth (C, 29:28–30:47).
- Avoids excessive spending typical of “unicorn” culture—aims for a blend of discipline and ambition.
Aravind Srinivas: “You can be both... have the discipline of bootstrap founders, but have the ambition of the most successful founders... Elon.” (27:27)
AI Application Layer Economics
- Productized AI apps (e.g. Perplexity Computer) have strong unit economics, unlike many coding app competitors (C, 31:07).
- Many coding assistants operate at a loss due to “loss leader” pricing from frontier labs (C, 31:07–31:51).

5. Model Commoditization vs. Specialization & The Value of Orchestration

Application Layer Dominance
- The future value accrues to application/product layer, not just core model providers (C, 55:02).
- Pure API/model offerings are hard to sustain without application integration.
Aravind Srinivas: “People don’t buy models, they buy products… Money is in the applications.” (50:28)
Commoditization Limits
- Open-source models catch up to “frontier” models within months—but specialization stays (e.g., Claude is best at agentic work, Grok at being “unhinged,” Google at multimodal, etc.) (B, 55:50 and C, 55:54)
Edwin Chen: “I still don’t think AI models themselves are going to get commoditized… every model has a different personality.” (52:50)
Harnesses & Orchestration
- Products like Perplexity Computer aggregate/specialize models, allowing users to leverage the best for each task. Orchestration and context retention are now critical (C, 39:20; C, 55:54).
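
As a rough illustration of the harness idea above, the sketch below routes each task type to a different model while keeping one shared history. It is a minimal sketch under stated assumptions, not how Perplexity Computer is built: the `Harness` class, the task-type labels, and the stub model functions are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# A "model" here is just a function of (prompt, shared history) -> text.
ModelFn = Callable[[str, List[Tuple[str, str, str]]], str]


class Harness:
    """Routes each task to a preferred model and retains context across tasks."""

    def __init__(self, routes: Dict[str, ModelFn], default: ModelFn) -> None:
        self.routes = routes
        self.default = default
        self.history: List[Tuple[str, str, str]] = []  # (task_type, prompt, result)

    def run(self, task_type: str, prompt: str) -> str:
        # Pick the specialised model if one is registered, else fall back.
        model = self.routes.get(task_type, self.default)
        result = model(prompt, list(self.history))
        self.history.append((task_type, prompt, result))
        return result


# Stub models standing in for real providers (assumptions for illustration).
def agentic_stub(prompt, history):
    return f"[agentic] {prompt} (context items: {len(history)})"


def multimodal_stub(prompt, history):
    return f"[multimodal] {prompt} (context items: {len(history)})"


def general_stub(prompt, history):
    return f"[general] {prompt} (context items: {len(history)})"


harness = Harness(
    {"agentic": agentic_stub, "multimodal": multimodal_stub},
    default=general_stub,
)
first = harness.run("agentic", "build a dashboard")
second = harness.run("summarize", "recap the work")
```

The second call has no registered route, so it falls back to the default model, but it still sees the first task in its history. That is the "no vendor lock-in plus context retention" combination the guests describe.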

6. Benchmarks, LM Arena, and Real-World Evaluation

Problems with Benchmark-Driven Development
- Benchmarks (e.g., LM Arena) have been gamed—teams hack rankings that do not reflect real user needs (B, 58:16).
Edwin Chen: “LM arena is just this terrible cancer on AI… you have all these teams purely dedicated to hacking it… even though they agree it makes their models worse.” (58:16)
- Goodhart’s Law comes into play—once you aim to optimize a metric/benchmark, it loses real-world value (A, 59:35).
Real User-Centric Evaluation
- Advocates for measuring models by how they serve real people—engineers using code assistants for their actual tasks, evaluating for quality, creativity, and utility, not just benchmark scores (B, 60:03).
- Perplexity tracks actual problem-solving as success, e.g., queries that required no clarifying follow-up (C, 62:58).
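
One way to picture the "no clarifying follow-up" signal is a per-session rate like the sketch below. This is an assumption-laden toy, not Perplexity's real metric: the phrase list, the `is_clarifying_follow_up` heuristic, and the `resolved_rate` function are hypothetical, and a production system would need something far more robust than prefix matching.

```python
from typing import List

# Hypothetical phrases that signal the first answer missed the user's intent.
CLARIFYING_PREFIXES = ("but no, i meant", "no, i meant", "that's not what i asked")


def is_clarifying_follow_up(message: str) -> bool:
    """Crude prefix heuristic for a 'you misunderstood me' follow-up."""
    return message.strip().lower().startswith(CLARIFYING_PREFIXES)


def resolved_rate(sessions: List[List[str]]) -> float:
    """Fraction of sessions with no clarifying follow-up after the first query."""
    if not sessions:
        return 0.0
    resolved = sum(
        1
        for messages in sessions
        if not any(is_clarifying_follow_up(m) for m in messages[1:])
    )
    return resolved / len(sessions)


sessions = [
    ["best laptops for ML"],
    ["flights to Tokyo", "But no, I meant flights from Tokyo"],
    ["summarize this filing", "now compare it to last year"],
    ["what is Goodhart's law"],
]
rate = resolved_rate(sessions)
```

Note that an ordinary follow-up ("now compare it to last year") does not count against the answer; only a correction of intent does.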

7. Efficiency, Company Building & Cross-Disciplinary Innovation

Lean Teams, High Output
- Both companies champion running lean: Perplexity headcount only up 30% with 5x revenue growth; Surge AI has similar efficiency (A, 42:35).
- AI enables more direct contribution across company roles (sales making decks, ops prototyping UI, designers coding, etc.)—vertical integration and less delegation (B, 43:35).
Edwin Chen: “One of the things I love about things like Claude Design... now you can just talk to Claude Design... it spits something out in 5 minutes… as opposed to days of back and forth.” (43:35)
Hollywood Analogy
- AI blurs all creative roles: storyboarding, writing, production—movie made with actors on gray screens, rest by AI; cross-functional expertise becomes key to breakthrough innovation (A, 45:43–47:40).
Aravind Srinivas: “Most of the work into producing a movie is about pre-production and story… a good story will win, even if you don’t do a great job on production values.” (47:40)

8. Favorite Recent AI Tools & Experiences

Host & Guests named their favorite recent AI breakthroughs:

Jason’s Pick: Wispr Flow speech-to-text with a foot pedal for dictation, a “life-changing” combination for productivity (A, 72:38).
Aravind’s Picks:
- Perplexity Computer for financial research & data analysis (C, 72:53).
- X’s “Explain Grok” feature—contextual explanations of trending X/Twitter topics (C, 72:53–73:25).
- Gemini 3 Flash—praised for its model speed/quality balance (C, 74:02).
Edwin’s Picks:
- Claude Design—impressed by ease/creativity unlocked in design process (B, 74:40, 75:10).
- Using LLMs to analyze his blood work, getting better recommendations than his doctor (B, 75:10).
Discussion of AI-powered health and sleep monitoring tools (Whoop, Oura, Apple Watch, etc.) and their increasing integration with LLMs for actionable health advice (A, 75:44–78:08).

9. Hiring & Company Growth

Surge AI: Always seeking experts and team members – surgehq.ai (B, 78:41).
Perplexity AI: Hiring sales (enterprise expansion) and full-stack engineers (C, 78:56).

Notable Quotes & Moments

| Timestamp | Speaker | Quote/Highlight |
| --- | --- | --- |
| 06:18 | Edwin Chen | "I often think of what we’re doing as raising these models... to have wisdom and taste…" |
| 13:11 | Aravind Srinivas | "Agent loops start running locally... All that can stay private." |
| 17:18 | Edwin Chen | "...if you don’t build your own [model], you’re relying on somebody else's taste..." |
| 24:23 | Edwin Chen | "You get growth targets… incentivize volume over quality… your model starts whispering clickbait in your ear." |
| 27:27 | Aravind Srinivas | "You can be both [capital efficient and ambitious]… the discipline of bootstrap founders, and... ambition of Elon." |
| 34:15 | Edwin Chen | "Coding is...completely open-ended. You could literally create any program in the world." |
| 50:28 | Aravind Srinivas | "People don’t buy models, they buy products… Money is in the applications." |
| 52:50 | Edwin Chen | "I still don’t think AI models themselves are going to get commoditized… every model has a different personality." |
| 58:16 | Edwin Chen | "LM arena is just this terrible cancer on AI…" |
| 62:23 | Jason Calacanis | "There’s a difference between optimizing for what makes users’ lives better and for clicks and engagement." |
| 62:58 | Aravind Srinivas | "For example, if a user asks a question and then follows up with, ‘but no, I meant…’ that means you didn’t do a good job." |
| 74:40 | Edwin Chen | "I’m a really big fan of Claude Design… it’s really well designed and opinionated." |

Timestamps for Key Segments

The coding paradigm, open-endedness & automation: 00:00–03:55, 32:19–39:20
Data labeling vs. AI teaching: 05:59–10:23
Apple Silicon & strategy: 12:17–15:36, 20:42–23:01
Bootstrapping, capital discipline & industry funding: 24:01–31:51
Model orchestration, commoditization & specialization: 50:28–55:45
Benchmarks & real product evaluation: 57:59–62:34
Efficiency & lean organization: 42:35–45:43
Cross-disciplinary innovation (Hollywood, design): 45:43–48:51
Favorite AI products & experiences: 72:38–78:08
Hiring: 78:41–79:12

Tone

Candid and expert-led, with practical, sometimes philosophical, and always user-focused reflections
Light humor; competitive yet collegial banter
Emphasizes operational wisdom, long-term thinking, and clear skepticism about hype and vanity metrics

In Summary

The guests argue convincingly that:

The application layer is where real value will accrue in AI, not just core models.
AI’s true promise is in user-centric, privacy-preserving, highly efficient products—exemplified by Apple’s ecosystem and the efficiency of both featured companies.
Commoditization is limited by model personality, brand, and user trust; specialization (and orchestration of models) will be critical going forward.
Real-world benchmarks should be human- and utility-focused, not gamed leaderboard scores.
The next AI boom will be characterized not by oversized organizations, but by small, nimble, cross-disciplinary teams leveraging AI as a true productivity multiplier.

This rigorous, CEO-level roundtable is essential for anyone building, investing in, or deploying next-generation AI.

Transcript

[00:00]
A
Can these other companies keep up with Claude Code, Cursor, Codex, GitHub's Copilot?
[00:07]
B
We're almost still sort of at the beginning of all of this. Progress that can still be made. Someone else could catch up.
[00:13]
A
Are we in the end game when it comes to coding?
[00:15]
B
I don't think we're anywhere close. Coding is very different from playing Go in that coding is completely open ended.
[00:21]
C
The space of possibilities in coding is endless. It's limited purely by your imagination.
[00:26]
A
Are the LLMs going to get commodified?
[00:28]
B
I still don't think that AI models themselves are going to get commodified. Even if they all have the same degree, same level of knowledge, people will naturally sometimes just want to talk to different models depending on their mood, even for the same topic.
[00:39]
A
What do you think about where the value will start to accrue?
[00:43]
C
I believe that the value is in the application layer. One of the main reasons some of them went out of business, they couldn't build an application. The pure API model doesn't work.
[00:52]
B
I think there's a difference between optimizing for what the humans want, what the real users want and what makes their lives better, as opposed to optimizing for clicks and engagement.
[01:01]
A
Thanks to our friends at PayPal, the exclusive sponsor for This Week in AI. Try the payment and growth platform that's trusted by millions of customers worldwide. PayPal Open. Start growing today at paypalopen.com. All right everybody, welcome back to This Week in AI, episode 10. This is the new roundtable that I've been doing in order to get smarter about AI and keep up with an industry that is moving. Every month is probably a year in our industry. Keeping up with it is incredibly hard. That's the point of this podcast. We'll talk about whatever's happening in our industry with experts, the people who are actually building the future. And we've got an amazing roundtable this week. Aravind Srinivas is with us. He is the co-founder and CEO of Perplexity AI. Started out with AI-powered search and answers. People became addicted to that and then released something called Perplexity Computer in addition to a number of other really great products. And per the FT, the Financial Times, your revenue has grown 100 million to 450 million apparently. I don't know if you've confirmed that or not, but you've had quite a run. You did confirm it.
[02:15]
C
Okay, 500 million was what we confirmed a week or two ago.
[02:20]
A
Amazing. So this is hyper growth at its best, and is it Perplexity Computer that's really driven this? Is it, is it that interface?
[02:27]
C
Yeah, that's right.
[02:28]
A
Why, why, why has this product become such a hit in your mind? I use it every day. I love it. I love the Model Council. I've been using the Perplexity Comet browser and won't shut up about it. But you can download this incredible app. You have Perplexity Computer. Why is it catching people's attention? What are people using it for?
[02:48]
C
I think it makes agents really simple. That's the core reason. Um, it makes it very, it's the most intuitive interface to have a manager of a, to be a manager of several agents, to essentially orchestrate several different agents which don't need to like, you don't need to think about whether it runs locally or on the cloud or like setting it up. There's no onboarding required, like there's no onboarding pain. There's no need to bring API keys. It all works intuitively in the same interface that you're used to asking people to do stuff for you. And it connects to hundreds of connectors that are valuable. It puts all the models in one harness, one agentic harness. So you don't have to feel a vendor lock-in to Claude or GPT. You just can guarantee the best model will do whatever it's supposed to do. And people love that. People are using it for a lot of deep and wide research and browser automation, data analysis, so many tasks, building dashboards, building web apps.
[03:55]
A
Yeah, we've started using it internally. We obviously had a fascination with OpenClaw. We're still iterating on that open source project. We've tried Claude Cowork and then the team just started loving Perplexity Computer to the point at which I had to upgrade to the $200 a month account. So you got me on the hook.
[04:14]
C
That's awesome. I'll be happy to. Any customer support for you guys. So please, please feel free to ping me if you have run into any issues.
[04:23]
A
We will ping your customer support line. It's. We're having good success with back office functions. You know, we have a venture capital firm and we've got accounting and we've got legal documents. We do due diligence. So we've been writing the scope of work, you know, the standard operating procedure, the best practice for say doing due diligence on startups. And now we're going into the Perplexity Computer product and trying to figure out, hey, which sections can we actually give to it and how well does it work? You know and it's been quite impressive. Also joining us today, Edwin Chen is here. He is the founder and CEO of Surge AI. They're doing data labeling for all these frontier models. Founded in 2020 and this has become an incredible space. You have clients from OpenAI to Google, Anthropic, Microsoft, Meta, from what I understand, 130 employees, approximately 50,000 expert contractors. Some of this might be needs to be updated. I'm not sure Edwin, because like I said earlier in our opening, things are moving really fast. But there's been a lot of brouhaha on the Internet about expert networks as well. Is this a great business? Is it a terrible business? They're very fast growing businesses which always gets people wringing their hands. We have an investment in One Micro One, which I think is a contemporary of yours. Tell us a little bit about the business, why it's important, and then maybe I don't know if it's a backlash or the criticism of the industry that we saw in the last week or two.
[06:00]
B
I mean, I would start off by saying I actually hate the terminology data labeling because when you talk about data labeling, you think about people doing incredibly simple things like labeling images of cats and dogs and drawing bounding boxes around cars. And I think what we do is actually so much more complex than that. I often think of what we're doing as building a kind of school for AGI. We have all these incredibly smart physicists, like Harvard professors, Princeton graduate students, Stanford Computer Science PhDs. And what they're doing is they're kind of like cross examining these models and probing them to figure out when they make mistakes. And then when they make a mistake, they are going in and teaching them all these incredibly advanced things. And so yeah, I think it's almost like one of the most profound things that we can do for AI. It even goes beyond teaching. I often think about what we're doing as raising these models not just to be correct, not just to produce the answer to a question, but to think and to have certain kinds of values and to have wisdom and taste and all that. First of all, I'll start by saying I think what we're doing.
[07:03]
A
Is there an industry term that has evolved this from data labeling? Because I agree with you, it's much more than that. When you hire PhDs or lawyers or CPAs to, to do this data training, is it data training? What, what's the right term in the industry or does it need one?
[07:22]
B
So again, I think I often think about this either parenting or education analogy where again, what we're doing is going beyond teaching them facts, going beyond just teaching them, oh, like, you know, this is a Wikipedia page and here's the correct answer. Instead, we're trying to teach them creativity and taste. So personally, the terminology I like is either teaching the models, AI teaching, because I think it actually gets beyond training as well. You're also measuring them and all that.
[07:56]
A
But how much are these, collectively the Frontier models, spending on this? It seems like billions of dollars a year.
[08:01]
B
Yeah. And I think the crazy thing is I actually think that this pales in comparison to their compute budgets. So I think they should be spending a lot more.
[08:09]
A
Yeah, fair. These are true Experts getting paid 100 bucks an hour, 200 bucks an hour from what I understand, in very specific fields, because a lot of the data obviously on the web that could have been crawled has been crawled. So how does it just take us mechanically and then we'll get into this amazing docket we have today. We've got four or five great subjects we're going to chop up, but just for the audience to understand. How does it work? Do you take the queries that people gave a thumbs down to when they were using an LLM and does that get routed? Like if somebody is not happy with a perplexity query, does it get routed to you to fix? Or do you just say, hey, let's just hire this group of attorneys to take these important cases in the world and annotate them in an intelligent fashion. How does the data work?
[08:59]
B
Yeah, so there's actually a bunch of different ways it can work. So probably the most canonical way it works is, okay, so you have this expert mathematician and in the course of their normal research, yeah, they're trying to prove some new theorem. They will kind of just interact with the models as if they're doing their normal research. So hey, try to solve this problem or try to explain this concept to me. They keep on doing that until they find a failure from the model. Again, this is why I often think about it as a cross examining model. You're talking to them until you kind of find this very, very interesting failure. And sometimes there's ways of accelerating finding that failure. Like we may do various things on our end where we're asking our data science team to find loss patterns in the models that guide the failures. Or yeah, sometimes Frontier Labs will send us certain kinds of queries where they sense that users are unhappy. And I mean, it's not always the case that the user is right. Users are often wrong. They thumb things down for incredible reasons. And so they will still need to make sure that the model failed there. And so what we do is we'll verify that the model failed and if so, we'll teach it a correct answer. There's all these different ways of coming up with these almost broken gaps in the model's reasoning. And so it might be either us finding it ourselves, it might be through various kinds of analyses that we do, it might be through user conversations. We basically take those conversations and then we teach it the right answer.
[10:24]
A
Yeah, and it's becoming a great job for people. All right, listen, topic one: the industry really was taken aback by Tim Cook deciding to transition out of the CEO role. This is an important thing for us to discuss here because Apple has a huge opportunity in AI for a number of reasons and they named John Ternus as CEO. Cook's going to be CEO until September 1st, then he'll move up to executive chairman. He's still going to work on industry relations, but Ternus has been there now for 25 years and he worked on a lot of very important hardware products while at the company. And I guess the take I'm most interested in hearing, Aravind, from you and also from Edwin is Apple Silicon has arguably been one of their great success stories. They got off of Intel, and then when the OpenClaw thing came out, or Kimi, a bunch of open source models, people said, hey, where can I run these? Okay, if you don't have an Nvidia rack, people started pulling together Mac Studios with 128 gigs of RAM, 512 gigs of RAM, and they also have Siri. So you have Siri, you have Silicon and you have a system, an operating system. So three big S's there. What should the new CEO do? What would you advise them to do, Aravind, if you were working with them with this incredible group of assets because they don't have a language model to speak of. They've worked on some open source projects. They've got a dysfunctional broken Siri that everybody wants to throw out the window of their car when they try to use it. But it does feel like they are positioned well. What are your takes?
[12:18]
C
I actually think the M series chips, which is a project led by John Ternus, the current CEO, is one of their underrated assets. I think people really underestimate what it takes to build a powerful chip at this moment in time. It is even better on the benchmarks than DGX Spark, at least for local inference of LLMs that can be hosted locally. And the open source models. You might have seen Kimi K2.6 that launched recently, I think yesterday, that seems to be doing even better than Opus and GPT on some of the Terminal Bench and agentic suite benchmarks. So I do think these models are getting to a point like Qwen 3.6, Kimi K2.6, they're getting to a point where they can be competitive with the Frontier, but they could also potentially run on one of your MacBooks or the auxiliary hardware like Mac Minis. Especially the M6 chips are going to be even better. They've already secured like a lot of the fab capacity in advance for the 2 nanometer chips for next year, so they should go deeper on this. And I think you have the perfect leader for that. And Tim has set up the company well so that Apple's looking as a bet paid off for multiple years in the future. So if agent loops start running locally, that's the CPU compute. All that stuff doesn't need to be centralized on servers. You get to own your agent loops, what data your agent accesses on your local system, local files, local apps, messages, emails, notes, photos, all that can stay private. And the orchestration loop can run locally. And the model orchestrating them could also potentially run locally. And which company is best positioned to profit from all this? I think it's Apple. So they're actually in a pretty good spot.
[14:22]
A
Yeah, this is an incredible vision if you think about it, Edwin, because frontier models are expensive now. They are the frontier model, so they tend to be ahead. But as you learn, if you're a perplexity user and it's just picking the best model for you, you know, nine out of 10 queries that most people do, they actually don't care what models exactly.
[14:44]
C
This vision is compatible with Frontier models coexisting together. Like this orchestrator can still ping a sub agent that relies on a Frontier model. It could use your own API key, or it could use a Perplexity centralized version. Doesn't matter. But the key thing is the loops start running locally on your hardware. The agent loop itself, the recurring processes like event triggers. We could have a trigger that says, every time Jason texts me about an issue on Perplexity Max, make sure to alert my support team about it. I could set up a lot of loops like this that just don't need to run on any server. And then it starts to be my own personal computer or my own agent that I own. And the hardware device that's best suited for this is Apple's ecosystem.
[15:36]
A
Edwin, what are your thoughts here on the power of the silicon and what the new CEO should do given, hey, they're kind of starting from zero in terms of any kind of product that's facing the, you know, their massive customer base. What would you do? Do you have the same vision as Aravind or a different one?
[15:56]
B
So I think I would say two things. So one is, I think historically a lot of people have thought that LLMs were going to be commodities. At the end of the day, every model is going to be intelligent to some level and they're going to be interchangeable. I think what I believe and what we've been starting to see over the past year is that actually every model has a different personality. Like you interact with ChatGPT and it just feels very different from the type of conversation, type of personality, the type of taste that you get from Claude or Gemini. And so it's almost like if they're not a commodity and I really don't think they're going to be, you really, really, really need your own foundation model because AI is just going to be so important to the future and to the kind of feel that you want your products to have, that you really are going to need your own foundation model. Otherwise you're just going to be relying on somebody else's taste, somebody else's sophistication. So I really do think that the base foundation model is going to be incredibly important for Apple and they really need it better themselves.
[16:59]
A
Do they need to own a model? Edwin, on that first point, do you think that they should either buy a model company or just start a group and maybe fork an existing open source one because they have an image, one that they work on that's open source. What do you think they should do in terms of building models? And then definitely go on to your second point, Neil.
[17:18]
B
I mean, I definitely think they really need to build their own because if you, you don't build your own, you're relying on somebody else's taste and personality for how an AI should behave. And like, you know, Apple has obviously always had such a strong vision for what their products and what like their design should be that they can't just outsource it to somebody else. Like, sure, they may be able to do it temporarily, just play around with all these different concepts and play around with what AI products may look like on an Apple device. But if they really, really want to own their future, they're going to need their own. They're going to need to infuse their own values into the way they want these AI systems to behave.
[17:58]
A
Seems clear to me that they are going to put the whole company behind this. And one has to wonder like Steve Jobs started the Apple Silicon movement. I think it was 2008, 2009 when they made the decision. They came out with the first products like eight, nine years later. This was a very significant strategic effort. But I don't think that they had in mind like, oh, this is going. At that time there were no large language models. Nobody even knew that this product would exist. But man, talk about serendipity and making a great bet, huh? Aravind, if you think about historically Steve Jobs's legacy, he kind of saw around a corner that, you know, or maybe two corners at once, this impossible task.
[18:40]
C
Yeah, I think Apple Silicon is like very force. Like it wasn't necessarily a bet they made to build hardware for LLMs, but the hardware got increasingly more and more powerful. The neural engine is very capable. The MLX compiler is like really, really good. And they have like a lot of expertise in building these things now. And not just that, like it's not just about Mac studios or minis, it's also consider the fact that if you do want, you know, like an ecosystem of compute, right? Like, like you want to wear an Apple Glass, let's say in the future you want to parse whatever you're seeing and you want to start asking questions about it. And all that pairs seamlessly with some auxiliary hardware you have at home. But it's all running as a pseudo desktop server. So you're able to pair all that compute in one family of devices. I think all that's like the kind of magical experiences you can provide to a consumer without draining the battery on the, on the device itself so that all that stuff hasn't been converted into a real consumer experience yet. But it feels like to me that even if other people build all these consumer AI devices, they're eventually going to lose to Apple because they have all the chips advantage. They've already secured the capacity for it years in advance. They have the os, they have the ecosystem lock in and they have all your personal contacts and you trust them to handle it in the most privacy conscious way. Right?
[20:16]
A
This is a key point, right? There's a privacy. Unpack that a bit because you mentioned it earlier, like hey, as a corporation you want to own your agent loops, the agentic knowledge that your organization is building. It's essentially your entire business. And to feed it into another LLM, well that might seem to some people like okay, no big deal until all your secrets are now being used by your competitors because you just did the training on the next cloud model.
[20:43]
C
Yeah, exactly. So I mean, this is also why I think even if they're using a different model, like say, I guess the news is they're working with Gemini, they will host it on their own silicon, they will customize it to their own needs. They'll be doing a lot of custom post training for that. So my sense is that even though a lot of people have like, you know, opinions on how bad Siri is or good Siri is, they have the ability to take time and do things the way they want to do because they have a lot of advantages as a brand that people truly trust. And the ecosystem lock in is underrated. And the auxiliary hardware, devices, that chip advantage, all this is like really underrated right now because, and here is my opinion, I haven't said this before, the iPhone is actually not getting disrupted by AI at all. In fact, the more AI works better, the iPhone essentially becomes your digital passport. It has your wallet, it has all your cards, it has your passes, it has your health records. You connect with other human beings through it. You do facetimes, you do calls, you have your photos of precious moments in your life. All these are things that are truly personal to you and have no connection to AI. And that's why they can actually afford to move slow.
[22:14]
A
Yeah, it does seem to me that that privacy piece, Edwin, is the privacy plus, the silicon plus. Oh my gosh, my photos are here, but I don't want my photos up on OpenAI. All due respect to ChatGPT, but, but even with Google, I'm like, do I really? I, I turned off syncing, you know, and took my photos off Google. I'm like, I don't think I want those in the Google cloud right now. I much prefer to keep all my kids photos and maybe I'm a weirdo on my local device and I trust Apple to not train their next image model on my kids images, et cetera.
[22:49]
B
Yeah, yeah, exactly. I mean especially it's because these models are so powerful that when they're trained on certain pieces of data, they just end up regurgitating it. I think it's really, really important that people should wonder about where their data comes from.
[23:01]
A
Yeah. The next story. Edwin, you were talking on our group chat there about the massive amount of late stage capital and we are in a really interesting moment in time in venture capital. We've all been in the industry for a while, but the amount of money and the velocity of the money coming into this space is extraordinary. AI companies raised $242 billion in Q1 of 2026. I'm assuming that number includes the giant hundred billion dollar raise by OpenAI. But your company, Edwin Surge, you, you waited to raise money. I think you hit like a billion dollars in revenue before you did your first round. Maybe talk a little bit about why you went with the bootstrap model for as long as you did. And then what the impact of all this money being dumped on founders? What is the impact that's going to have on the industry? And then Aravind, I'm going to go to you just to talk about how you manage your treasury because you've also been a beneficiary of this.
[24:01]
B
Yeah, so I mean, first of all, we've actually never raised. So we're still happily bootstrapped and growing in, I think, the best way possible. So I'm actually really, really happy that we've never raised. And yeah, to your point, I think it's never been easier to raise money in AI. And that's kind of the problem. When you raise a billion dollars, you get all these growth targets from your investors that incentivize volume over quality. You get all this board pressure to spend your time optimizing for your next fundraise instead of the product that you're building. All of my friends who are CEOs of other companies, they're like, oh yeah, I have to spend the next few weeks just prepping a board deck. And they're always jealous of the fact that I don't have to do the same thing. And I think the problem is I've heard that some post-training teams, their goal is not to make their model more intelligent. The goal of all these post-training teams is actually just to get their companies a billion users. And if that's your North Star, and yeah, it's a North Star that happens when you raise gazillions of dollars, is it any surprise that your model starts trying to whisper in your ear and start clickbaiting you? It's kind of funny. A couple weeks ago I was chatting with ChatGPT and I was asking it to give me some tips for what to do in Tokyo. And it ended a response to me with like, hey, by the way, do you want to hear about one weird trick that you could do? And it was just shocking to me because we have this super intelligent model and it just sounds like a 2002 tabloid. And the problem is this is what happens when you have all these different incentives that don't align with what you were originally trying to build. So, yeah, I think we chose a completely different path, and I think we're really happy that we did it.
[25:41]
A
Super impressive. And you hit a billion dollars of revenue bootstrapping, which I don't. I'm trying to think of another company that's done that in our industry, and I can't. I don't know if you can. Edwin, have you heard of one who's hit a billion in revenue without raising venture capital? I mean, I know people who have been incredibly judicious about raising. But that's a true. That's a true first for me. I don't think I've ever heard that.
[26:08]
B
Yeah, I mean, I think it's really important because I actually really do think that AI, it's just so important for our future that it kind of needs to be shielded from the typical Silicon Valley growth hack playbook.
[26:18]
A
The only ones I can think of: Mailchimp was famous for that, and Patagonia, but that's not in our industry. And Zoho, that other company, that was another one that didn't raise a ton of money to do this. Aravind, you've got a war chest. You've raised from some of the most important companies and investors in the world. Do you think we're seeing unnatural acts? That's the term I use for the phenomenon Edwin's talking about. I saw it up close and personal when I was in the publishing space, and people would chase clickbait. They would do all kinds of unnatural acts to try to get their page views up. And BuzzFeed and Business Insider would be these canonical examples of just lunacy. How do you stay grounded? And then also you're in competition with people, so if you don't raise, then there's capital as a weapon, as I saw up close and personal with Uber vs. Lyft vs. Sidecar. Travis was an absolute monster when it came to raising money. And if you invested in one, you couldn't invest in the other. That's kind of gone away here a bit. But what are your thoughts on this issue?
[27:28]
C
Huge kudos to Edwin for doing what he did. I think not raising capital and getting to a billion in revenue is very, very hard. Not just in terms of business building, like, you know, just financial health, but also convincing other people to come join your venture. They're all looking at valuations. They all want validation from the rest of the world. To convince really good employees to come join you when you don't yet have a working business, they want to see validation from somebody else, which could be a reputed venture capitalist. So trying to build a team without that is very hard. So kudos to him. I think one thing that, you know, founders need to take away from the success of companies like Surge is the need to be more disciplined. You can raise money for sure, as long as you truly know that you're spending it the right way. And Elon raises a lot of money, but he knows exactly what to do with it. xAI has raised a lot of money, but it's being spent on building data centers. He's known for being very judicious about the allocation of capital. So you can be both: you can have the discipline of bootstrapped founders, but you can also have the ambition of the most successful founder in the history of capitalism, Elon. So if you can figure out a way to be both, you could be far more successful. So that's my takeaway. It doesn't have to be a dichotomy between staying bootstrapped forever or raising endlessly with very undisciplined capital allocation. I think you should just be a very good capital allocator and you should have a clear plan for why you need money. And then you should continue to have the discipline of a bootstrapped founder even if you have a gigantic war chest.
[29:24]
A
How have you managed that? You've raised. How much have you raised to date? I mean, I think it's been public.
[29:29]
C
I think we've raised around 2 billion, cumulatively. We haven't raised since like August of last year. And our goal is actually to advance further in the revenue progress that we've been making since the beginning of the year and try to become profitable. Unlike a model company, we don't have to actually spend a lot on compute, particularly on training. We do a lot of post-training, but we don't do any pre-training. So we have no excuses to not be profitable. And unlike companies in the coding application layer, where your gross margins on the revenue are actually negative, we don't have that problem. So gross margins on all the revenue we make are pretty positive, highly positive in the case of like Max users, the $200 a month plan. So our goal is to just keep growing the top line, stay disciplined, not actually spend more on payroll or infra, and become profitable as soon as possible. And when that happens, we don't actually think more capital leads to a meaningful change in our destiny. And that probably should become the blueprint for application layer companies: try to just run the company in an efficient way with the discipline of bootstrapped founders like Edwin and try to keep growing the top line revenue.
[30:48]
A
Having the unit economics dialed in, Aravind, is critically important. We've seen some other folks who are supposedly, and I think the coding space is the number one example, they're just losing money.
[31:00]
C
That's correct, yeah.
[31:01]
A
Yeah. On their, I don't know what percentage of users, or maybe it's the entire user base in aggregate loses money.
[31:08]
C
Yeah, that's right. That, I know, could change. But today that is the case. And this is not because of the product, it's actually because of the frontier labs subsidizing tokens in the form of a subscription plan. So even though Claude Code is priced at $200 a month, the amount of tokens you can consume on Claude Code is actually worth more than the $200 a month you pay. So they're actually running it as a loss leader to just dominate token collection in order to take all those tokens and make their models even better. So if you are an application layer company competing with Codex and Claude Code in coding, it's pretty difficult for you to have any positive gross margins.
[31:51]
A
And Edwin, I would assume that part of that is whoever has the most tokens consumed specifically in coding will have the best model because you'll have the most reinforcement learning. And all that usage from developers is going to create signal for your model.
[32:10]
B
Yeah, yeah. I think there's a lot of interesting signals that you can learn and you know, just kind of like the more you reason, the more your models reason, like oftentimes that just leads to better responses.
[32:20]
A
Can these other companies keep up with Claude Code? Cursor, Codex, GitHub's Copilot, are they going to keep up or do you think we've hit this sort of acceleration where Claude's going to run away with it, Edwin?
[32:33]
B
I don't think there's anything inherently preventing any other companies from catching up. Like certainly that data is valuable, but we're still sort of at the beginning of all of the progress that can still be made, so I think, yeah, someone else could catch up if they wanted to.
[32:50]
A
Do you do data labeling for all this, like Fortran and COBOL? And is that part of the desire of these companies, to get you to find these old graybeards to explain to you how these AS/400s and microcomputers from the 60s and 70s actually work? Is that a big business for you?
[33:11]
B
It is. I mean the coding landscape is just so huge that we have to be part of every aspect of it. So yeah, it's every single language. It is front end design and back end design. It is the correctness of the algorithms, the efficiency of the algorithms, but also the quality and the beauty of the front end designs that they create. So it's just such a wide landscape that, yeah, we have to be part of it.
[33:34]
A
Are we in the end game when it comes to coding? You know, it is a finite set of data, Edwin. So it would seem to me there will be diminishing returns at some point. Are we 96% of the way there? 99% of the way there, like self-driving is apparently, you know, 98, 99% of the way there with the edge cases? How would you contextualize that game? If we made it a game in terms of being perfectly solved: chess got perfectly solved. They believe no-limit hold'em has been almost perfectly solved. And PLO, we'll see if that eventually becomes perfectly solved. I think they're on the way.
[34:13]
B
Yeah, I mean, I don't think we're anywhere close.
[34:15]
C
So.
[34:15]
B
One of the things I often think about is coding is very different from playing Go in that coding is completely open ended. Right. You could literally create any program in the world. It doesn't have a single solution or a single end state. Sure, a game of Go sort of ends with one person winning and the other person losing. One analogy I often think about is imagine you took Jeff Dean and you gave Jeff Dean 1000 years to learn more about coding and to explore the world and to also learn about poetry and mathematics and physics and, I don't know, history and artistry and all of that. He'd be able to incorporate all of these principles into what he builds. And yeah, I mean software is about building globe-spanning infrastructure. It's about designing rocket ships. So there's almost an infinite ceiling to what coding is capable of, and I really think that we're just 1% of the way there.
[35:08]
A
Where do you stand on this, Aravind? You think we're getting close to solving the game of coding and everyone will just be able to make quality code? Or do you think we're producing a lot of slop with a lot of attack vectors? We've obviously covered the various attack vectors that AI is helping identify with Mythos. But what's your take on where we are at in terms of solving the game of coding?
[35:36]
C
I think the framing should be around what does solving mean. So maybe think of this as paradigms. Cursor, GitHub Copilot, that's like autocomplete: you're trying to complete a few lines of code, but you are writing the code, largely. Claude Code and Codex, as command line interfaces, you can almost think of as auto-diff. You're looking at the diff, the new lines of code added and the existing lines of code subtracted. You're not actually autocompleting anymore. You're operating at a different abstraction of changes. The next paradigm is just going to be auto-outcomes. You're going to look at the outcome, you're not even going to look at the diffs, you're not going to read any line of code. You're going to look at the outcome and then you're going to ask for changes and you're going to keep iterating. That's clearly the next thing. And then Elon talks about it. I think he talked about it in one of the xAI all-hands that got live-streamed, where he said you're going to just output the binary,
[36:44]
A
which is a wild thing to think about.
[36:47]
C
So I still feel we haven't hill-climbed on capability yet. I think we are still very early in what it means to solve coding entirely. Of course, problem solving skills like, you know, the ability to connect dots across different things that Edwin was talking about. You know, I almost imagine, what is Jeff Dean or someone like that, I heard Linus is also using AI, what are these people coding? How do they do things these days that they were not able to do before? I think all these things are very interesting to think about. But also fundamentally, if AIs can actually move you to the level of like working at outcomes and binaries, then, you know, what you think of coding also changes from this inspecting lines of code and things like that.
[37:41]
A
I think that's the most exciting part. And you know, I always say startups are like where you can see these trends before anything else. It's kind of like Santa Monica with yogurt. I'm sorry, with yoga and like fresh food and farmer's markets. Like everything interesting starts with the hippies in Santa Monica and Venice and then goes east if they wind up making it past there. And I feel like startups are the same thing. Startups now, you'll have two or three people. They'll never add the fourth employee, the fifth employee that they thought they were going to add. And they're shipping code faster than I've ever seen. And they're doing their go to market and their customer acquisition using things like Perplexity's Computer. And they're producing code at such an alarming rate. Edwin, I actually think we're going to see a significant amount of job creation as more people realize I don't need a developer to start a company. And we had a false start. There were people using scripting and, oh, God, what was it called? Before vibe coding, there was a term for it. They were almost like WYSIWYG. Gosh, what was the name of it? Do you know, Aravind, what people used to do where they would kind of vibe code?
[39:01]
C
No code.
[39:01]
A
No code. Thank you. The no code movement, which was like such a false start. But I used to have people pitch the accelerator and I'd be like, oh, who built this? I did. I'm like, okay, whatever. I thought you were the salesperson from Salesforce who started their own company. Yeah, but I just figured out how to use no code. And like, no code's just gone now. Right? Like, it's just totally replaced.
[39:20]
C
It's fairly limited in what it can do in terms of, like, what are all the possible set of things you can do? Because it was built with certain intentionality, certain deterministic behavior, a certain level of hard coding. So obviously it can't cover all the combinatorial possibilities, whereas models can just generate code on the fly and do whatever you ask them to do. This also goes back to the point that I think Edwin made earlier, which is the space of possibilities in coding is endless. It's limited purely by your imagination. You can build things that exist inside Minecraft. The kind of structures and worlds that you can build inside Minecraft is endless. So as a game, Minecraft is even more complicated than Go is. And Minecraft is just one game that AI can code. And the world is full of infinite possibilities. So that's kind of why solving coding means you have something that is truly general-purpose intelligence.
[40:19]
A
What are you encouraging your developers to use internally? And how much more productive are they this year when you look at it?
[40:29]
C
Yeah. So largely it's two camps: Codex or Claude Code. I've been trying to understand why one is preferred over the other, and it keeps changing. But I can share with you the rough level of understanding I have today, which is: for SwiftUI and Rust, people like Codex. I think it seems to be better there. For front end development and full stack development, people like Claude Code. Especially if you want to have front end design work done, Claude Code seems to be better. So this also goes to the point Dario made in one of the podcasts recently about this whole point of models commoditizing. Like actually what's happening is models are specializing, and even within a specialty like coding, there are specializations in which aspects of coding each frontier lab is actually good at. Which is also why we wanted to build a product like Computer. Because when models start specializing deeply, an orchestration of what each one can do individually at whatever they're skilled at is valuable. Yeah. And so yeah, we're largely in these two camps, and headcount wise, we've remained flat since the beginning of the year, and over a period of one year, that is exactly from the same time last year to now, we have grown roughly just 30%. So I want to remain this efficient and I want our company to be an example for many other founders in future to build like sub-500-people companies that can make like several billions in revenue. And I think that's the way to go because you want that sort of force multiplier. You want your designers to write code, you want your business professionals to do data analysis. You want your sales reps to actually make their own presentations and decks and data analysis of the customer. You want them to do the bug triaging. You don't need a program manager or project manager to be an intermediary there. So it just vertically integrates your company even more.
[42:35]
A
And everybody is adding skills. We lived for 20 years with, hey, pick a specialty, be an expert, Edwin. That was the advice. And now, hey, if you're a salesperson and you can redesign the landing page for the demos and you think you have a better idea, you can just vibe code it and send it. And the dev team's like, okay, whatever. Or if you're the chief revenue officer, you don't have to go to the data analysis group. You can just dump your spreadsheets into Perplexity Computer and just rip. How are you using it? You heard the sort of two camps. And Aravind, just to put a pin in it: 30% headcount growth, 5x revenue growth. That's significant if you think just about efficiency. And Edwin, you have 130 people, at least in my research, somewhere around that number. And if you're over a billion in revenue, it doesn't take a genius to figure out how efficient you are right now. So how do you think about efficiency and company building?
[43:35]
B
Yeah, so I absolutely agree with Aravind, where I really strongly believe that. I mean historically there's been this incentive for companies to grow as much as possible as quickly as possible. And I think people always underestimate the bureaucracy and the politics and the communication complexity that that creates. Again, I think very, very few people want to be running a 5,000 person company. You're no longer invested in it. You're no longer spending your entire day playing with your product and talking to your users. You're just spending your entire day being a corporate CEO who's just managing the company. And so I absolutely agree with Aravind on that front. And to your point, I think that one of the things I love about things like Claude Design, for example: it used to be the case that if one of our front end developers or even somebody on our operations team wanted to prototype a new interface or prototype a new landing page for these experts that come in, they would need to write down their ideas, send it to a designer, wait for a designer to sketch something up. And that may take a couple days. And then maybe the vision didn't quite look like what they wanted. And so it would just be this long iteration cycle. Now you can just talk to Claude Design, it spits something out pretty amazing within 15 minutes, 10 minutes, 5 minutes, and you can just iterate so much faster. And yeah, you can actually see, this is what this vision looks like. And maybe I didn't like it now that I see it in person. And so they change their idea. They just go somewhere completely different. And Claude Design is online all the time, unlike our designer who sleeps eight hours a night.
And so I think it just makes the product development process faster, but it also lets the person on our operations team or the engineer who's building a landing page own something end to end and basically see their vision fleshed out, as opposed to sort of delegating parts of it. So yeah, I'm really bullish.
[45:44]
A
I was at the Breakthrough Prize this weekend, you know, Yuri Milner's science prize, and I was talking to Wonder Woman, Gal Gadot, the actress, and we were just talking and there was a director there, Darren Aronofsky, and we're just talking about how it's impacting Hollywood as an example. And you start to think about the unique roles everybody had. There were people who were storyboard artists, there were people who wrote scripts. There were directors like Akira Kurosawa or Spielberg who would draw their own, or Ridley Scott from Aliens and Gladiator. He was known for drawing his own, you know, cells and he would, you know, draw all of these interesting, you know, images that he would then give to a cinematographer. Now with AI, you have the people who are writing screenplays, the producers, they're all coming together and anybody can do almost anything. Write dialogue, do the backgrounds and write all these cells to do them. And the cross-disciplinary nature of that leads to innovation. Aravind, if you've ever met somebody who had expertise in multiple areas, whether it was computer science and art or art and, you know, sales, they can just make some breakthroughs that other people don't have. And she was telling me there's a movie coming out, Bitcoin: Killing Satoshi. And it's only a $70 million budget. It would have cost 200 million. But they're just doing the actors on a gray screen and then everything is being built by AI in the background. So all they had to do is write a great script, have the best actors perform it, and now they can just build the movie with, you know, having had them on a soundstage for 20 days or whatever it happens to be. Think about what happens in that industry now. You can make three movies for the cost of one. Really kind of interesting moment in time.
[47:41]
C
Yeah, yeah. I mean, what do you do? Talk about the story? I think that he's, he's given a lesson like how most of the work into producing a movie is all about the pre production phase, like getting the story right, like getting the score, story and storytelling right. That was his biggest learning at Pixar was, you know, he would sit with the team, he would try to like go through the whole storyboard and if it didn't make sense, go redo it. Go redo it. And we're not, we're not going to make the movie until we are. Until this is so good. And this was the single lesson he learned from Walt Disney is you cannot make a bad story succeed no matter how good you produce the actual movie. But you can, even if you don't necessarily do a great job at production values, a good story will win.
[48:36]
A
So you see that with independent films, right? You can see some incredible independent film where you're like, yeah, it's a little rough along the edges, but man, a great performance is based on a great story, based on great dialogue. You get all that right? And that is the, that's true in products too.
[48:52]
C
Like you, you, a simple like, like products often work if one or two ideas, you just hit it out of the park. Switch.
[48:59]
A
Yes.
[48:59]
C
And you know, I think you did
[49:01]
A
that with Comet, by the way. You were the first to drop a browser. And that's when I first started communicating with you. I was like, this browser is unbelievable. And I got everybody in the. I don't know if you've used it before, Edwin, but it was the first where you could be like, hey, here's what's on my page. Let's work with that, whatever that happens to be. And it was the first time you kind of let ChatGPT or whatever model you're using out on the real web. And man, that was a major breakthrough. Now, obviously, with hooks and integrations, it's getting to the next level. I wanted to talk a little bit about the commoditization of large language models and the creation of small language models. SLMs, I guess, is the industry term, or VSLMs, verticalized ones. And I think, Aravind, you have the belief that we're starting to hit some form of commodification, or you're wondering where the value is going to accrue: is it going to accrue to the harness, to the wrapper? Is it going to accrue to the core model? So maybe you could explain your best estimation of what's going to happen in the next year or two in terms of people loading Kimi or DeepSeek or not even knowing which model they're using, and then what a harness is and how people should think about harnesses and the impact they're going to have.
[50:29]
C
People don't buy models, they buy products. Right? Fundamentally, at the end of the day, the consumers have to pay for services. And pure model companies basically don't exist anymore. Anthropic is as much playing in the application layer as they're playing in the model layer. And whatever The Information reported, I forgot, but 30 to 40% of the revenue, at least 30% of the revenue, is coming from applications. So that shows you that you have to be an application layer player whether you build models or not. And the money is in the applications. If you have a model, obviously you can vertically integrate it with the application and you can build custom harnesses for your models and hill-climb your models on being good at your own harness. So that's an advantage you have. But the disadvantage you have is you have to always ensure you have the best model all the time. And that's serious, serious competition. Truly a game you can only play if you have at least tens of billions of dollars in cash to spend on compute. And it's not just about the cash you have, it's also that you have to secure compute capacity and compete with all the other players trying to secure the compute capacity years in advance, power capacity now, and then hyperscalers need to be invested in you. It's a game that you only play at the highest level. And that's also why there are like four or five players playing there
[52:04]
A
and maybe fewer in the future. Maybe we'll see some of them consolidate or maybe some people get out of that business if they don't feel they can compete. Edwin, where do you think this winds up? You obviously are helping people train their models. These are your customers. So you're rooting for them, you're helping them build. But there's also, hey, people saying it really is the application layer where the value is going to accrue, whether it's Google's suite of products in their browser, Apple's suite of products, as we talked about in the first topic of the show, Perplexity Computer, Claude Cowork, OpenClaw, all of these different front ends, harnesses. The orchestration level is becoming more important. So how do you think about the orchestration level yourself? And are the LLMs going to get commodified?
[52:50]
B
I still don't think that AI models themselves are going to get commodified. And I think a big part of that is because I just think so often about their personalities or the specializations that Aravind mentioned earlier. And so an example of that: even if I asked today a fairly simple question like, I don't know, who was Abraham Lincoln? I'm going to get a very different response from ChatGPT versus Claude, for example, or ChatGPT versus Gemini. And in the same way that, okay, I have a bunch of friends and sometimes I will ask them certain questions, even if they all have the same level of intelligence, even if they all have the same degree, same level of knowledge, sometimes I just want the quick, snappy answer from one of my friends. I just enjoy talking to them more when I'm in a certain mood. Sometimes I want a really well researched, really insightful thing, but I know it's going to take me five minutes to get the answer from my friend. And sometimes I'm just too busy to talk to them. In the same way, I just feel like people will, even for fairly similar tasks like coding, like front end coding versus backend coding or different languages, naturally sometimes just want to talk to different models depending on their mood, even for the same topic. So I really don't think that the AI models will get commoditized.
[54:07]
A
I'm wondering, Aravind, if you have the other side of it, where so much of my work is happening in Perplexity Computer and OpenClaw. And I'm like, we need to have these skills, this SOUL file, these memory files local on our hard drives. And before we use any language model, we're like, hey, here's the context. This is how I like to work. This is how I like my answers. And I've had to now, with four or five different models, explain to it: I like concise answers. I like you to just solve the problem, not give me updates on your thinking. Like, OpenClaw became so verbose recently in the latest version, I wanted to kill myself. It was like, okay, the user wants me to do this. Okay, I'm going to do this. Okay, I'm going to do this. I was like, no, no, you just give me the steak when it's perfectly cooked. I don't want you to explain to me all the steps in cooking the steak. So what do you think about where the value will start to accrue?
[55:02]
C
I believe that the value is in the application layer. And there were a lot of model companies, and one of the main reasons some of them went out of business is that they couldn't build an application. The pure API model doesn't work because you cannot build a model that's so much better than the rest. No one's able to maintain that much of a significant lead. The only time there was a significant lead in the model layer was when GPT-4 existed. And it took a year for anybody else to catch up. After that, the gap has usually been months, I would say. And even between open source and frontier, I think the gap is like six months to a year at this point.
[55:45]
A
Is that what you feel? Six months to a year? You feel the same way, Edwin? What would you say the gap is, open source to frontier?
[55:51]
B
Yes. In terms of raw model intelligence, the raw API layer. Yeah, I agree.
[55:54]
C
So I think commoditization and specialization are not necessarily mutually exclusive. There are some models that are getting specialized, clearly. Claude models are clearly very good at agentic coding, code execution, agent orchestration. And OpenAI's models and Google's models are very good at multimodal stuff because they have a lot of data in multimodality that nobody else has. Elon's Grok models are very good at being unfiltered and unconstrained. And that has its own.
[56:30]
A
Unhinged.
[56:31]
C
Unhinged. Yeah, unhinged.
[56:34]
A
They call it unhinged mode.
[56:35]
B
Right.
[56:35]
A
I think that's their.
[56:37]
C
Yeah. And that speaks a lot to how you shape the values of the model. What do you train it on? What are the fundamental ground truths it assumes are true, or at least has been trained on? And so that's all not commodity. These characteristics of how these models behave and what they're good at and all these things are not commodity. What is commodity is if they're all hill climbing on LM Arena or Terminal Bench or Gentix V or Humanity's Last Exam. These are all, like, academic benchmarks. If all of them are hill climbing on these benchmarks, because that's the stuff you publish to researchers to show you're at the frontier, that part is commodity, because open source is also doing that. So some qualities will be specializations. A lot of academic benchmarks will be commodity. And it'll be up to the model trainer, the product builder, the application layer owner to take what is commodity and shape it in a way that matters for the use cases that they own. And you can only survive, you can only have value accruing to you, if you actually own a certain bunch of workflows and have a bunch of loyal, high-retention customers, because that's the only way that you collect unique tokens, unique data that you alone can harness, and keep improving on those capabilities.
[58:00]
A
How do you think about these LM Arenas of the world, the benchmarks, Humanity's Last Exam, Edwin? Because you're helping folks with training, obviously you're a key player in this. What's your take on LM Arena and people optimizing for these benchmarks today?
[58:17]
B
I really do think that LM Arena is just this terrible cancer on AI. Like, you basically have a random niche subset of the population. I think people don't realize that it's so niche, and they think that it's a random representative set of users. But no, it's like a random niche subset of the population that just wants free access to models. And they have endless time to wait for LM Arena to spin endlessly before they glance at the responses for two seconds and then they click their favorite. And so basically what happens with LM Arena is that you get models that completely hallucinate and they beat out models that answer correctly, as long as they have a bunch of pretty formatting that catches the eye of this random niche subset of the population. But the problem with it is that it's such a visible benchmark, like everybody knows about it in industry. It's such a visible benchmark that you have all these VPs, all these CEOs, all these companies that basically have entire teams purely dedicated to hacking it. It's pretty well known within the industry that, yeah, once you have a data science team analyzing the kind of weird idiosyncratic preferences of this niche population, you can just hack it. And so companies do it even though their researchers themselves agree that it simply makes their models worse. So personally I was really hoping that would completely die out after Meta showed last year how easy it is to hack. But somehow, somehow. So is this.
[59:36]
A
Well, and this speaks to, I guess Warren Buffett would say, show me an incentive, I'll show you an outcome. Or Goodhart's law. When a measure becomes a target, it ceases to be a good measure because everybody starts optimizing for benchmarks.
[59:51]
C
Yeah, Edwin, yeah, exactly.
[59:54]
A
And what should we be measuring then? How should we benchmark the industry and the models that are being built? Is there a better way to do this?
[60:04]
B
So I think the way to really do it is to think about how real humans are using these models in real life. So for example, a lot of what we do is we simply run these human evaluations, where we take models, like we take Claude Code or we take Gemini, and we ask the software engineers themselves: go use this in the real world, go use it for your actual day-to-day job, and then ask it your queries and then measure whether or not it actually helped you. So not only was it correct, not only would it pass this set of unit tests, but was the web page that it created for you something that you would actually want to launch to your users? Did it make great recommendations for you for your A/B test, for the metrics that you're trying to optimize for, that you actually believe in? And not just, again, playing this sort of benchmark game, where a lot of these benchmarks, I think people don't realize, they're just very contrived. So the prompts themselves are things that no real user would ever ask. The way you measure the benchmarks, because they're often auto-evaluated, they're purely measured on, like, did they match a certain string. And in the real world, that's not what we're looking for. In the real world you care about things like the creativity of responses, you care about the design of the webpage and so on.
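The string-match auto-evaluation Edwin describes can be sketched as follows. This is a minimal illustration, not any real benchmark's grader; the function name `exact_match_score` and the sample strings are hypothetical.

```python
def exact_match_score(response: str, reference: str) -> int:
    """Crudest form of auto-eval: 1 only if the response equals the reference string."""
    return int(response.strip() == reference.strip())


# Hypothetical reference answer and two hypothetical model responses.
reference = "The function returns 42."
copied = "The function returns 42."
paraphrased = "It returns the integer 42."  # also correct, but worded differently

scores = {
    "copied": exact_match_score(copied, reference),
    "paraphrased": exact_match_score(paraphrased, reference),
}
print(scores)
```

Under these assumptions the paraphrased answer scores zero even though a human reader would accept it, which is the gap that human evaluation is meant to close.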
[61:19]
A
If you think about it like Google did, Google was measuring bounce-back rate at some point, where it was like somebody searches for, hey, what time is the Knicks game today? They go to a web page. Do they come back and click on the second and third result? If they did, you didn't give them the right result. And then eventually, I remember talking to Larry about this 20 years ago, Larry Page, and he was like, eventually, Jason, we're just going to tell you what time the Knicks game starts. That's eventually what's going to happen. The computer will just know. And that's a very weird thing to think about. Google's whole existence was: don't come back to the website for that query. Don't come back. And how quickly can we get you off of our website? Whereas other people, Disney Corporation, ESPN, they were saying, hey, when we get you to the website, how long can we keep you on the website? Can we keep enticing you? Meta, obviously, with Instagram and Facebook: how long can we keep your session going? YouTube: how long can we keep your session going? Two very different north stars.
[62:23]
B
Yeah, yeah. I think there's a difference between optimizing for what the humans want, what the real users want and what makes their lives better, as opposed to optimizing for clicks and engagement.
[62:34]
A
Yeah. And AI should be, at its best, Aravind, just solving your problem. And do you track that? Like, did I solve the problem or not with Perplexity Computer? Did I solve the problem or not with the Model Council? If people don't know Model Council, you can explain it a bit. Like, that's to me, I think, the ultimate test: do I keep querying you and am I happy with the answer?
[62:59]
C
Yeah, yeah, yeah. So I mean, to the Larry Page thing, he said eventually the AI, sorry, computer should tell you, like, what time the game starts. You don't have to click on a bunch of links. That's precisely why we built Perplexity. Like, that was the problem we solved: from links to answers. So yeah, we do track that. For example, there's a very simple heuristic: if a user asks a question and then there's a follow-up question that's like, "But no, I meant...", that means you didn't do a good job with the first question. Initially we started just using heuristics, because every day we get a lot of queries, so it's hard to run a lot of LLM compute on all of them and filter them. But now we don't care. Like, we have small language models that can just run on a lot of query logs and filter threads where the user clearly had to clarify the prompt again in order to get a better answer, which means you did not understand the user intent well in the first prompt itself. And this is like another Larry Page philosophy thing. Like, even if the user's prompt wasn't detailed enough, your job is to still give them a good answer. You should consider a user prompt as intent, not the actual descriptive prompt. This is a very different product design philosophy from ChatGPT, where in ChatGPT, I think, at least in the beginning, they used to tweet stuff like, you're not enough of a high-taste tester if you don't know how to tell why this model is better than the previous model. No, you shouldn't need to be. Like, the model should speak for itself. Users should feel it, and you shouldn't blame it on their prompting capabilities.
And so we took the Google philosophy of, like, the user is never wrong, where even if their prompt was bad, even if their prompt was incorrectly phrased, it's on the AI to disambiguate and understand and reformulate it and expand the prompt and search as much as possible and give as much information as the user wants, and ask a clarifying question at the end on whether they're happy with that or they're looking for something else.
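The "But no, I meant" heuristic Aravind describes can be sketched roughly like this. It is an illustration under stated assumptions, not Perplexity's actual pipeline: the regular-expression patterns, the function names, and the thread layout (a dict mapping a thread id to its list of user messages) are all hypothetical.

```python
import re

# Hypothetical phrasings that suggest the first answer missed the user's intent.
CLARIFICATION_PATTERNS = [
    r"^\s*(but\s+)?no[,.]?\s+i\s+meant\b",
    r"^\s*that'?s\s+not\s+what\s+i\s+(meant|asked)\b",
    r"^\s*i\s+meant\b",
]
_CLARIFICATION_RE = re.compile("|".join(CLARIFICATION_PATTERNS), re.IGNORECASE)


def is_clarification(follow_up: str) -> bool:
    """True if a follow-up message looks like the user correcting the assistant."""
    return bool(_CLARIFICATION_RE.search(follow_up))


def flag_threads(threads: dict) -> list:
    """Return ids of threads where any follow-up after the first query is a clarification."""
    return [
        thread_id
        for thread_id, user_messages in threads.items()
        if any(is_clarification(m) for m in user_messages[1:])
    ]


# Hypothetical query log: thread id -> user messages in order.
sample_threads = {
    "t1": ["what time is the game", "But no, I meant the Knicks game"],
    "t2": ["what time is the Knicks game", "thanks, and tomorrow?"],
}
flagged = flag_threads(sample_threads)
print(flagged)
```

A pattern list like this is cheap to run over every thread, which matches the reason given for starting with heuristics; the small language models mentioned afterward would replace `is_clarification` with a learned classifier.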
[65:06]
A
It seems like this was the big innovation with Claude's 4.6 and now 4.7. Yeah, Edwin, when you ask it a very simple question, it kind of threads out and thinks, well, what are your next five questions? Or what did you really mean by that? And it tries to rationalize it and give you an answer that's much more comprehensive than you could ever have imagined. Yeah, yeah.
[65:30]
B
I mean, I think Claude has always been very, very good at the planning stage, where it has to formulate this plan up in advance and then execute it and then, yeah, kind of backtrack whenever that's going wrong. So yeah, I think that's what.
[65:42]
A
Yeah, my favorite tool is Model Council. I am super addicted to it. I don't know. Do you use it, Edwin, do you use Perplexity's Model Council ever? Or do you have, I mean, you might have your own for your own benchmarking, I guess, internally, but have you used it before? And any thoughts on putting the models up against each other and knowing the diffs?
[66:04]
B
Yeah, so I am constantly doing that myself just because, I mean, it's almost like a lot of what we do in our day to day work or like a lot of our experts do, they're just constantly comparing the models. So I personally do it a ton,
[66:18]
A
which is multiple windows open.
[66:20]
B
So I have a special, I guess, app that I built to do it myself. I mean, and then our experts have a different kind of app. But yeah, Aravind will have to give me a demo of Model Council.
[66:30]
A
Model Council is spectacular. You give it your query, threads it out to whatever three or four models you want, then it'll tell you here's where they disagree and here's who you should trust. Yeah. And how popular is this now?
[66:44]
C
It's pretty popular among the Max users. So actually, you know, Jensen asked me to build that feature. So Jensen said he really loves asking different models the same thing. And the way he said he would do it is he would ask Perplexity one question, he would ask Claude one question, he would ask ChatGPT one question, and Gemini. And then he would look at all the AIs and see what each of them say and then compare in his head. I was like, hey, Perplexity has all these models in one app. Maybe we can just do that within the app itself so that you don't have to open four apps and read all of them and then figure out what the differences are. And so what if we built that feature natively? And we'll be the only app-layer company that could do this, because other app-layer companies have an incentive to just put their own model. And so we built that, and he was pretty happy about it, and we rolled it out. Obviously it's expensive. Like, you're going to ask four or five different models each question, and it's not just aggregating the answers. The orchestrator looks at them and tells you exactly where they differ, where they agree, and what you should truly take away. And so there's some synthesis layer that's actually running with the frontier model too. And I like to use it a lot for health queries, because health queries are where the evidence is there on the Internet, but how the models interpret that evidence in terms of the prompt you asked often differs. Some models are very risk conscious. Some models are actually risk seeking in terms of what they recommend. And so you want both modes of behavior there and then an analysis. So I think that's super useful. I like asking Model Council about what it thinks about different stocks. What do you think, can Tesla be worth $10 trillion in the next 10 years? Or which Mag 7 stock actually could be worth 10x in the next 10 years from where it is today? And I like asking all these different models simultaneously.
And I think Council is a cool feature. And Council is just a skill inside Computer too, so you can use it in Computer.
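The fan-out-then-synthesize flow described for Model Council can be sketched as below. This is a toy illustration under loud assumptions: the model functions are stubs rather than real API calls, all names are hypothetical, and the `synthesize` step simply groups identical answers, whereas the transcript says the real synthesis layer runs on a frontier model.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

ModelFn = Callable[[str], str]


def ask_council(query: str, models: Dict[str, ModelFn]) -> Dict[str, str]:
    """Send the same query to every model in parallel and collect the answers."""
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in models.items()}
        return {name: future.result() for name, future in futures.items()}


def synthesize(answers: Dict[str, str]) -> dict:
    """Stand-in synthesis: group models by normalized answer, largest group first."""
    groups = defaultdict(list)
    for name, answer in answers.items():
        groups[answer.strip().lower()].append(name)
    ranked = sorted(groups.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    majority_answer, majority_models = ranked[0]
    return {
        "majority_answer": majority_answer,
        "agree": sorted(majority_models),
        "disagree": {answer: sorted(names) for answer, names in ranked[1:]},
    }


# Hypothetical stub models; real ones would call each provider's API.
stub_models = {
    "model_a": lambda q: "Yes",
    "model_b": lambda q: "yes",
    "model_c": lambda q: "No",
}
report = synthesize(ask_council("Is the evidence strong?", stub_models))
print(report)
```

Running the models in parallel keeps latency close to the slowest single model, though the cost still scales with the number of models queried, which is the expense noted in the discussion.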
[69:04]
A
Oh really? I didn't realize that. I mean, that's the next thing I need. I need an AI to sit next to me while I'm working. I guess this is what Microsoft Copilot was supposed to be. And just tell me, hey, dummy, there's a quick key for that. Hey, dummy, Perplexity Computer has that built in. I need, like, a Clippy that just tells me when I'm doing something on my computer that I'm an idiot and there's a faster way to do it. That would be pretty helpful at this point in time.
[69:34]
C
I think that's going to be a feature that I feel like Apple's best positioned to ship, because you need full access to your screen and you're not going to trust a server-side AI company to do that.
[69:46]
A
You ever think about building an operating system? I know this sounds insane. Have you thought about that? Because you built a browser and you built a computer, and what's the difference between Perplexity Computer and Comet Browser? Chrome became an OS. Have you thought about just building an OS that people can boot?
[70:02]
C
I think it's certainly an interesting idea. Well, fundamentally, Jason, everything's about distribution. If you build an OS, you need to get it distributed in actual hardware devices, which means you need to have an OEM that wants to distribute it for you. And if you actually read the contracts Microsoft has signed with the hardware OEMs, it'll make you look at Google like an angel. So
[70:28]
A
that was a big part of the antitrust case back in the day.
[70:32]
C
It's still pretty terrible. There's a reason even Chrome OS never picked up. And they were largely only able to get adoption at the level of schools and banks and governments because
[70:46]
A
I ran our firm on Chromeboxes for maybe two years. And then the one thing that broke it was Zoom. The ascent of Zoom just made it impossible, because when you did a Zoom call in a browser window, it sucked. And Zoom never released itself. It's just like, this is never going to work for our team. But people loved it, because at work you could remain focused. You wouldn't have iPhoto popping up, you wouldn't have Apple Music. People just love the, you know, the restraint of it. Okay, as we wrap here, I want you guys to think about the most impressive AI experience you've had the last couple of months. A tool, a product. You can shout anybody out, and I'll kick us off. I have become addicted to Wispr Flow. I don't know if you guys are using Wispr Flow, but it is the greatest speech-to-text I've ever experienced in my life. And I had given up on the category of speech-to-text because I was just so disappointed in Siri's ability to take dictation. It just never worked. It could never do my last name. I mean, Aravind, it would never be able to figure out yours. It can't figure out Calacanis. But my God, Wispr Flow is so amazing. And then I got a foot pedal for 20 bucks off of Amazon. When I'm at my computer this morning, I was using the Model Council and I would press down on my pedal on my Windows machine and I would just talk and keep talking and keep talking and keep talking, because the more you give a model, the better the response. The rambling long prompt is so much better than a short one and having to just go back and forth and play ping pong with questions. And then I just lift up my foot pedal, bang, text right in there. It is a life-changing experience. Just that stupid foot pedal and this incredible piece of software. Do you have one, Aravind or Edwin?
[72:38]
C
We can name our own products, right? Is that the purpose of the question?
[72:41]
A
I mean you can, but sure. If you want to get one that
[72:46]
C
you love for something. Yeah, I would say feel free to
[72:51]
A
give a shout out to your own, but the goal, sure.
[72:53]
C
I love, I love. I'm a big-time user of Perplexity Computer. I love using it for financial research, company running, like internal data analysis, all that. Sure. But I guess I personally thought the Grok integration inside X has improved considerably in recent times. The Explain Grok button on tweets: I don't always understand some of the jokes. So I think it's pretty good and it's exceptional.
[73:26]
A
Especially when you're catching up to a pop culture moment or a breaking news story. Like, somebody's like, my God, well, I guess President Trump and Iran's over, and I'm like, I can't keep up. So I hit that Grok button and it explains the whole context so well.
[73:40]
C
Yeah, yeah. When I'm on my desktop I can just use Comet to do that for me. But when I'm on the X mobile app, that Explain Grok button is pretty good. So I think it's a very beautifully done integration. So kudos to them. It wasn't actually good before.
[74:00]
A
It was not good before. It was lacking.
[74:03]
C
So definitely it improved tremendously. The other thing I'm impressed about is, I just think, like, the Gemini Flash model, the Gemini 3 Flash, it's just insanely fast for the capability it has. It's probably the model that hits the best sweet spot in terms of speed and intelligence. And so that's just a sheer, like, amazing piece of engineering that they've accomplished.
[74:33]
A
It is wicked fast. Edwin, what do you got? What are you obsessing over on the weekends or at nights in the AI space?
[74:40]
B
So I actually am a really, really big fan of Claude Design. I think it's really well designed and opinionated, and it's almost like, okay, yeah, maybe this is where I see the Instagram founder's touch. So I feel like it's a great product. I mean, it still has some bugs, some little annoyances that have been a little frustrating for me, but I'm very, very optimistic for it.
[74:59]
A
And then Claude Design. Yeah, that is. I think it came out last week. I mean, Claude is just releasing stuff at a pace that none of us can keep up with. I know it's out. Just haven't had a chance to play with it yet.
[75:10]
B
I was like, maybe this is where it would be fun to learn a little bit more about design. So, yeah, I had a lot of fun playing around with it last week. But I would say one other place where I think models just kind of blew my mind was I got my blood work a couple months ago, and it's like, okay. So I tried talking to my doctor. My doctor just gave me generic recommendations and rushed me out the door. And so I took some photos of it and I uploaded it to all different models. And honestly, they gave me some great recommendations. And I feel like I've been feeling so much better since I've been following them. So I thought that was a pretty amazing experience.
[75:44]
A
That is an interesting thing. I don't know if you guys are on Team Whoop or Team Oura or what you use, or Function or Superpower. So many of these great things out there. But Whoop, now I'm trying to get my sleep dialed in so I can give a good performance on podcasts and when I'm meeting with founders and, you know, trying to hit certain stress, you know, goals, which is a good thing, like stressing your body in a good way. And it now has an AI built into it that's tuned on your health. So it was terrible six months ago. Then just the last couple of weeks, it was like, wow, you did a really great job skiing. You know, this is your sixth day in a row of skiing. You might want to take a day off. You could consider some hydration beverages, and you're at 7,000 feet of elevation, so you're going to need to drink more water. You're going to have to get more rest. It was like, whoa. And then it's like, I know you just took a long flight to Japan. Here's how you should reset. This is like another breakthrough moment. It's like, oh, I know you're on a different time zone. Here's how that's going to affect your sleep. Would you like a sleep plan? And I was like, would I like a sleep plan to get over jet lag? You bet I would. Really well done. Like, very verticalized, incredibly fast, using my data. Yeah. Incredible. You have vitamin D, Edwin? Is that your issue? Like, it seems like everybody doesn't get enough vitamin D in our industry.
[77:08]
B
I do take a lot of vitamin D, but that was one of its recommendations. But I'll have to check out the Whoop.
[77:12]
A
Yeah. Oh, you use the Whoop too?
[77:14]
B
Yeah, I'll have to check it out.
[77:15]
A
You should check it out. Do you use any of these, Aravind?
[77:21]
C
I use Apple Watch. But yeah, I do have my blood work done pretty much every month and.
[77:29]
A
Yeah, every month.
[77:30]
C
Whoa.
[77:31]
A
That's obsessive.
[77:32]
C
Well, that's because I had, like, you know, some conditions and I needed to make sure I monitor it. But now I'm fine. I'm pretty, like, fine on all the vitals. But yeah, the things I go low on at times are vitamin D and zinc, because of some diet stuff. So it's good to know, it's good to know ahead of time. And I think definitely the proactive intelligence is very important. So we built, like, in Computer, you can connect all your health stuff like Apple Health, Whoop, or.
[78:07]
A
Oh, you can, I'm gonna do it.
[78:09]
C
Function Health, Bevel. You can put all that, you can put your lab results, everything, and set it up. So we are very serious about making Computer work really well for personal health use cases.
[78:20]
A
I think it's going to be incredible. All right, gentlemen, I know that you've got very efficient teams, but at some point you must need to hire a person for something specific. We get a lot of people listening to the pod. So Edwin, anybody you're searching for? I know on the expert side, you must be constantly looking for experts. So where can people find more information if they want to be an expert or go work at your company?
[78:42]
B
Yeah. So just go to our website, surgehq.ai, or email me personally. I love reading applications.
[78:47]
A
So amazing. Aravind, what are you looking for? What do you need? How can we help you with keeping the product train running? It's been doing really well.
[78:56]
C
Yeah, we're going deeper on the enterprise, so people interested in sales roles, definitely feel free to apply. If engineers are watching this, we're hiring for full-stack engineers, so definitely feel free to apply here. And yeah, very excited to see who's interested in us.
[79:13]
A
Absolutely. And this has been episode 10 of This Week in AI. Go to thisweekinai.ai. Sign up for our newsletter. We're gonna have a paid newsletter. We're tracking every single company that's invested, and we're gonna be sending out reports on every one of these seed-stage and Series A companies. So you're gonna wanna sign up and get the free email. And then, I think June 1st, we're gonna launch the paid version with even more granular details for people who are in the industry. And we'll see you next time. Bye-bye.