Chetan Puttagunta and Modest Proposal - Capital, Compute & AI Scaling - [Invest Like the Best, EP.399] - Invest Like the Best with Patrick O'Shaughnessy

Summary

Invest Like the Best with Patrick O'Shaughnessy Episode: Chetan Puttagunta and Modest Proposal - Capital, Compute & AI Scaling (EP.399)
Release Date: December 6, 2024

In this compelling episode of Invest Like the Best, host Patrick O'Shaughnessy engages in an insightful conversation with two esteemed guests: Chetan Puttagunta, a General Partner and investor at Benchmark, and Modest Proposal, an anonymous investor managing a substantial portfolio in public markets. The discussion delves deep into the evolving landscape of Artificial Intelligence (AI), particularly focusing on the scaling paradigms of large language models (LLMs), the shifting investment strategies, and the broader economic implications.

1. The Evolution of AI Scaling: Pre-Training to Test Time Compute

Chetan Puttagunta opens the discussion by highlighting a pivotal shift in AI development. He explains that AI labs have recently hit a plateau in pre-training scaling, the approach in which adding compute during the training phase directly improves model performance. Under the power laws that guided this approach, a tenfold increase in compute was expected to yield a step-function improvement in intelligence and capability.

Chetan Puttagunta [05:20]: "We're now shifting to a new paradigm called test time compute... scaling on what's now being called reasoning."

Chetan points out that the labs have effectively exhausted human-generated text data, and that synthetic data generated by the models themselves has not kept pre-training scaling going. As a result, AI development is shifting toward "test time compute," where a model explores multiple candidate solutions in parallel and checks them with a verifier during inference, rather than relying solely on what it learned in pre-training.
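The idea of spending more compute at inference can be illustrated with a toy best-of-N loop. This is only a minimal sketch under stated assumptions, not how any lab implements test time compute: `propose` is a hypothetical stand-in for sampling one answer from a model (here just a random guess), `verifier` is a hypothetical scoring function, and the task (finding an integer whose square equals a target) is invented for illustration. It also shows only the parallel-sampling part, not iterative refinement.

```python
import random
from typing import Tuple


def propose(rng: random.Random, low: int, high: int) -> int:
    """Hypothetical stand-in for one sampled model answer: a random guess."""
    return rng.randint(low, high)


def verifier(candidate: int, target: int) -> int:
    """Hypothetical verifier: 0 is a perfect answer, more negative is worse."""
    return -abs(candidate * candidate - target)


def best_of_n(target: int, n: int, seed: int = 0) -> Tuple[int, int]:
    """Sample n candidates and keep the one the verifier scores highest."""
    rng = random.Random(seed)
    candidates = [propose(rng, 0, 100) for _ in range(n)]
    best = max(candidates, key=lambda c: verifier(c, target))
    return best, verifier(best, target)


if __name__ == "__main__":
    # The inference budget n grows by 10x per row, echoing the log-scale axis.
    for n in (1, 10, 100, 1000):
        answer, score = best_of_n(target=1369, n=n)
        print(f"n={n:>4}  answer={answer:>3}  verifier_score={score}")
```

Because the seed is fixed, each larger budget re-examines the same early candidates plus new ones, so the best verifier score can only stay the same or improve as n grows. The sketch also makes the limits raised in the conversation easy to see: the result is only as good as the verifier, and extra samples stop helping once the useful search space is exhausted.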

2. Implications for Public Tech Companies

Modest Proposal provides a macro perspective on how this shift affects major public tech companies. He argues that a large share of the S&P 500's market capitalization (by his estimate, somewhere between 40 and 45%) is now a direct play on AI, spanning industries from industrials to utilities. The transition to inference-based scaling aligns expenditures more closely with revenue generation, offering a more sustainable financial model than the capital-intensive pre-training phase.

Chetan Puttagunta [24:49]: "A billion dollar training run is essentially you're committing two funds to do one training run that may or may not work... if you're a two to five person team, you can take something like coding."

He critically assesses leading AI players like OpenAI, Anthropic, and Meta, emphasizing the challenges they face in maintaining dominance amid evolving scaling paradigms and the immense capital required for breakthroughs in synthetic data generation.

3. Shift in Investment Focus: From Model to Application Layer

Chetan elaborates on the burgeoning opportunities within the AI application layer, where small, agile teams are leveraging open-source models like Meta's Llama to innovate without the exorbitant costs previously associated with model training.

Chetan Puttagunta [17:29]: "We're seeing these small teams catch up to the frontier with spend that is not one order, but multiple orders of magnitude less than what these large labs were spending to get there."

This democratization of AI model development is enabling startups to focus on creating specialized applications that deliver significant value, thereby attracting substantial investment at a rapid pace. Modest Proposal corroborates this by highlighting how AI applications are drastically reducing software and human capital costs for enterprises, leading to swift adoption and deployment.

4. Infrastructure and Cost Dynamics

The discussion shifts to the infrastructural implications of moving from pre-training to test time compute. Both guests agree that this transition necessitates a rethinking of data center architectures, emphasizing efficiency and lower latency over sheer computational power.

Modest Proposal [75:54]: "It's clear that we're going to rethink how you want to build your infrastructure to service a much more inference focused world than a training focused world."

Chetan adds that advancements in semiconductor technology, such as Cerebras' efficient inference capabilities, are pivotal in making AI applications more cost-effective and scalable.

5. Valuation Trends and Market Sentiment

Patrick probes into the current investment climate, noting the high valuations of AI startups despite potential competition and market saturation.

Chetan Puttagunta [63:00]: "Our cost of inference is essentially zero and our gross margin for this task is 95%."

Both guests express optimism, attributing the favorable valuations to the drastically reduced costs of AI compute and the surge in demand for innovative AI applications. Modest Proposal emphasizes the resurgence of "Animal Spirits" in the public markets, driven by the transformative potential of AI technologies.

6. Future Outlook and Philosophical Implications

The conversation culminates with speculative insights into the advent of Artificial General Intelligence (AGI). Chetan envisions AGI on the horizon by 2025, driven by advancements in reasoning and autonomous task completion.

Chetan Puttagunta [35:57]: "AGI is very close by... we're very, very close to it."

Modest Proposal raises cautionary notes about the unpredictable nature of AGI development, referencing historical instances where AI systems surpassed human expectations in unexpected ways.

Modest Proposal [84:32]: "Anytime that comes into play, I think the stakes are just higher."

7. Under-Discussed Aspects and Closing Thoughts

Both guests agree that the broader infrastructural and economic implications of AI scaling are under-explored in mainstream analyses. They call for more comprehensive sell-side reports and private market evaluations to fully grasp the transformative impacts of this paradigm shift.

Chetan Puttagunta [82:50]: "We haven't seen sell side reports or analysis on this new paradigm shift... very capital efficient."

In conclusion, the episode provides a nuanced exploration of the current and future states of AI development, investment strategies, and the intricate balance between technological innovation and economic sustainability. Chetan and Modest Proposal offer a forward-thinking perspective, advocating for strategic investments in the AI application layer while acknowledging the challenges and uncertainties inherent in the path toward AGI.

Notable Quotes:

Chetan Puttagunta [05:20]: "We're now shifting to a new paradigm called test time compute... scaling on what's now being called reasoning."
Chetan Puttagunta [24:49]: "A billion dollar training run is essentially you're committing two funds to do one training run that may or may not work... if you're a two to five person team, you can take something like coding."
Chetan Puttagunta [17:29]: "We're seeing these small teams catch up to the frontier with spend that is not one order, but multiple orders of magnitude less than what these large labs were spending to get there."
Chetan Puttagunta [35:57]: "AGI is very close by... we're very, very close to it."
Modest Proposal [84:32]: "Anytime that comes into play, I think the stakes are just higher."

This episode is a must-listen for professional investors, CEOs, entrepreneurs, and business strategists keen on understanding the intricate dynamics of AI scaling and its profound impact on the investment landscape.


Transcript

[00:00]
Patrick O'Shaughnessy
Two fun facts about our newest sponsorship partner, Ramp. First, they are the fastest growing fintech company in history, reaching a level of revenue in five years that I can't quote exactly but is eyebrow raising. Second, they are backed by more of my favorite past guests, at least 16 of them when I counted, than probably any other company that I'm aware of. A list that includes Ravi Gupta at Sequoia, Josh Kushner at Thrive, Keith Rabois at Founders Fund and Khosla Ventures, Patrick and John Collison, Michael Ovitz, Brad Gerstner. The list goes on and on. These facts demand the question: why? Having been personally obsessed with the great businesses through history, one clear lesson is that the best of them are run by disciplined operators. These operators manage costs with incredible detail and they are constantly thinking about how they can reinvest every dollar and every hour back into their business. This is Ramp's mission: to help companies manage their spend in a way that reduces expenses and frees up time for teams to work on more valuable projects. First, on expenses. The average American business has a profit margin of 7.7%. This means saving 1% on costs is the equivalent of making 13% more revenue. The average Ramp customer is able to save 5% on their expenses each year. Of course, every entrepreneur is looking for ways to grow revenue by 50%. They should just as seriously seek to save 5% on their expenses. Second, on time. Unnecessary complexity is why most finance teams spend 80% of their time doing operational work and only about 20% of their time on strategic work. Ramp makes spend management very simple by handling your company's expenses, travel, bill payments, vendor relationships and even accounting. It's notable that some of the best in class businesses today, companies like Airbnb, Anduril and Shopify and investors like Sequoia Capital and Vista Equity are all using Ramp to manage their spend.
They use it to spend less, they use it to automate tedious financial processes, and they use it to reinvest saved dollars and hours into growth. At both Colossus and Positive Sum, my businesses, we've used Ramp for years now for these exact reasons. Go to ramp.com/invest to sign up for free and get a $250 welcome bonus. That's R-A-M-P.com/invest. As an investor, I'm always on the lookout for tools that can truly transform the way that we work as a business. AlphaSense has completely transformed the research process with cutting edge AI technology and a vast collection of top tier reliable business content. Since I started using it, it's been a game changer for my market research. I now rely on AlphaSense daily to uncover insights and make smarter decisions. With the recent acquisition of Tegus, AlphaSense continues to be a best in class research platform delivering even more powerful tools to help users make informed decisions faster. What truly sets AlphaSense apart is its cutting edge AI. Imagine completing your research five to ten times faster with search that delivers the most relevant results, helping you make high conviction decisions with confidence. AlphaSense provides access to over 300 million premium documents including company filings, earnings reports, press releases and more from public and private companies. You can even upload and manage your own proprietary documents for seamless integration. With over 10,000 premium content sources and top broker research from firms like Goldman Sachs and Morgan Stanley, AlphaSense gives you the tools to make high conviction decisions with confidence. Here's the best part. Invest Like the Best listeners can get a free trial now. Just head to alpha-sense.com/invest and experience firsthand how AlphaSense and Tegus help you make smarter decisions faster. Trust me, once you try, you'll see why it is an essential tool for market research. Hello and welcome everyone.
I'm Patrick O'Shaughnessy and this is Invest Like the Best. This show is an open ended exploration of markets, ideas, stories and strategies that will help you better invest both your time and your money. Invest Like the Best is part of the Colossus family of podcasts and you can access all our podcasts including edited transcripts, show notes and other resources to keep learning at joincolossus.com. Patrick O'Shaughnessy is the CEO of Positive Sum. All opinions expressed by Patrick and podcast guests are solely their own opinions and do not reflect the opinion of Positive Sum. This podcast is for informational purposes only and should not be relied upon as a basis for investment decisions. Clients of Positive Sum may maintain positions in the securities discussed in this podcast. To learn more, visit psum.vc.
[04:24]
Patrick O'Shaughnessy
My guests today are Chetan Puttagunta and Modest Proposal. If you're as obsessed as I am about the frontier in AI and the business and investing implications, you will love this conversation. Chetan is a general partner and investor at Benchmark, while Modest Proposal is an anonymous investor who manages a large pool of capital in public markets. Both are good friends and frequent guests on the show, but this is the first time that they have appeared together. The timing could not be better. We might be witnessing a pivotal shift in AI development as leading labs hit scaling limits and transition from pre-training to test time compute. Together we explore how this change could democratize AI development while reshaping the investment landscape across both public and private markets. Please enjoy this great discussion with my friends Chetan Puttagunta and Modest Proposal. So, Chetan, maybe you can start by just telling us from your perspective what is going on right now that is most interesting in the technology part of the story of LLMs and their scaling.
[05:21]
Chetan Puttagunta
Yeah, I think we're now at a point where it's either consensus or universally known that all the labs have hit some kind of plateauing effect on how we perceive scaling for the last two years, which was specifically in the pre-training world. And the power laws of scaling stipulated that the more you could increase compute in pre-training, the better model you were going to get. And everything was thought of in orders of magnitude. So throw 10x more compute at the problem and you get a step function in model performance and intelligence. And this certainly led to incredible breakthroughs here. And we saw from all of the labs really terrific models. The overhang on all of this, even starting in late 2022, was at some point we were going to run out of text data that was generated by human beings and we were going to enter the world of synthetic data fairly quickly. All of the world's knowledge effectively had been tokenized and had been digested by these models. And sure, there were niche data and private data and all these little repositories that hadn't been tokenized, but in terms of orders of magnitude, it wasn't going to increase the amount of available data for these models particularly significantly. As we looked out in 2022, you saw this big question of was synthetic data going to enable these models to continue to scale? Everybody assumed, as you saw that line, this problem was going to really come to the forefront in 2024. And here we are, we're here and we're all trying to train on synthetic data, the large model providers. And now, as it's been reported in the press and as all these AI lab leaders have gone on the record, we're now hitting limits because of synthetic data. The synthetic data as generated by the LLMs themselves is not enabling the scaling in pre-training to continue. And so we're now shifting to a new paradigm called test time compute.
And what test time compute is, in a very basic way, is you actually ask the LLM to look at the problem, come up with a set of potential solutions to it, and pursue multiple solutions in parallel. You create this thing called a verifier and you pass through the solution over and over again iteratively. And in the new paradigm of scaling, if you will, the X axis is time measured on a logarithmic scale and intelligence is on the Y axis. And that's where we are today, where it seems that almost everybody is moving from a world where we're scaling on pre-training and training to scaling on what's now being called reasoning, or that is inference time, test time, however you want to call it. And that's where we are as of Q4 2024.
[08:31]
Patrick O'Shaughnessy
Just a follow up question on the overall picture. So setting aside capex and all this other stuff that we'll talk about with the big public tech companies in just a moment, is it reasonable to say, based on what you know now, that the switch to test time scaling, where time is the variable, is like a who cares? As long as these things keep getting more and more capable, isn't that all that matters? And the fact that we're doing it in a different way than just based on pre-training, does anyone really care? Does it matter?
[09:02]
Chetan Puttagunta
There's two things that come up pretty quickly in the test time or reasoning paradigm, which is as LLMs explore the space for potential solutions very quickly, as a model developer or somebody working on models, you quickly realize that algorithms used for test time compute might exhaust the useful search space for solutions quite quickly. That's number one. Number two, you have this thing called a verifier that's looking at what's potentially a good solution, what's potentially a bad solution, what should you pursue, and the ability to figure out what's a good solution and what's a bad solution, or what's an optimal path and not an optimal path. It's unclear that that scales linearly with infinite compute. And then finally, tasks themselves can be complex, ambiguous, and the limiting factor there may or may not be compute. So it's always really interesting to think of these problems as if you were to have infinite compute to solve this problem, could you go faster? And certainly there's going to be a number of problems in reasoning where you could go faster if you just scaled compute. But oftentimes we're starting to see evidence that it's not necessarily something that scales with compute linearly with the technology we have today. Now, can we solve all of that? Of course there's going to be algorithmic improvement, there's going to be data improvement, there's going to be hardware improvement, there's going to be all sorts of optimization improvements here. The other thing we're still finding is the inherent knowledge or data available to the underlying model that you're using for reasoning still continues to be limited. And just because you're pursuing test time, it doesn't mean that you can break through all previous data limitations by just scaling compute at test time. So it's not that we're hitting walls on reasoning or we're hitting walls on test time, it's just the problem set and the challenges and the computer science problems are starting to evolve.
And as a venture capitalist I'm very optimistic that we're going to be able to solve all of them. But they're to be solved.
[11:27]
Patrick O'Shaughnessy
So if that's the sort of research-labs-out view, Modest, I'm curious for you to give us the big-public-tech-companies-down view, because so much of this story has been the spend, capex, the strategic positioning, the quote unquote ROI on all this spend and how they're going to earn a return on this insane outlay of capital. Do you think that everything Chetan just said is well reflected in the stance and the pricing and the valuations of the public tech companies?
[11:57]
Modest Proposal
I think you have to start at a macro level and then get to a micro level. At a macro level, why this is so important is everyone knows how large a percent of the S&P 500 the Mag 7 represent today. But beyond that, I think thematically AI has permeated far broader into industrials, into utilities, and really, I would argue, somewhere between 40 and 45% of the market cap is a direct play on this. And if you even abstract to the rest of the world, you start bringing in ASML, you bring in TSMC, you bring in the entire Japanese chip sector. And so if you look at the cumulative market cap that is a direct play on artificial intelligence right now, it's enormous. And so I think as you look across the investment landscape, you almost are forced to have an opinion on this because most people in some form or another are benchmarked against an index that is going to be a derivative play on artificial intelligence. At the micro level, I think that this is a fascinating time because all of public market investing is scenario analysis and probability weighting different paths. And if you go back to when we talked probably four months ago, I would say that the distribution of outcomes has shifted. And at that point in time, pre-training and scaling on that axis was definitely the way and we talked about what the implications were at the time. We've talked about Pascal's Wager or we've talked about Prisoner's Dilemma and in my mind it was easy to talk about that when the cost of anteing up was $1 billion or $5 billion. But we were rapidly approaching the point in time where the ante was going to be $20 billion or $50 billion. And you can look at the cash flow statements of these companies, it's hard to sneak in a $30 billion training line. And so the success of GPT-5 class, broadly, let's apply that to all the various labs, I think was going to be a big proof point as to whether or not the amount of capital was committed because these are three, four year commitments.
If you go back to when the article was written on Stargate, which is the hypothesized $100 billion data center that OpenAI and Microsoft were talking about, that was a 2028 delivery. But at some point here in the next six to nine months it's a go, no go. We already know that the 300,000 to 400,000 chip super cluster is going to be delivered end of next year, early 2026. But we probably need to see some evidence of success on this next model in order to get the next round of commitment. So all that is a backdrop. I think at the micro level this is a really powerful shift if we move from pre-training to inference time. And there are a couple big ramifications. One, it better aligns revenue generation and expenditures. I think that is a really, really beneficial outcome for the industry at large, which is in the pre-training world you were going to spend $20, $30, $40 billion on capex, train the model over 9 to 12 months, do post training, then roll it out, then hope to generate revenue off of that in inference. In a test time compute scaling world, you are now aligning your expenditures with the underlying usage of the model. So just from a pure efficiency and scalability on a financial side, this is much, much better for the hyperscalers. I think a second big implication, again we have to say we don't know that pre-training scaling is going to stop. But if you do see this shift towards inference time, I think that you need to start to think about how do you re-architect the network design? Do you need million chip super clusters in energy low cost land locations or do you need smaller, lower latency, more efficient inference time data centers scattered throughout the country? And as you re-architect the network, the implications on power utilization, grid design. A lot of the, I would say, narratives that have underpinned huge swaths of the investment world I think have to be rethought.
And I would say to date, because this is a relatively new phenomenon, I don't believe that the public markets have started to grapple with what that potential new architecture looks like and how that may impact some of the underlying spend.
[17:06]
Patrick O'Shaughnessy
Chetan, I'm curious maybe to tell the story of DeepSeek and other things like it, where you see new models being built by small teams for relatively small dollars that are competing in performance with some of the leading edge models. Can you talk about that phenomenon and what it makes you think about or the implications for the landscape?
[17:29]
Chetan Puttagunta
It's really amazing. In the last, call it, six weeks of time, the number of teams we've met here at Benchmark that are two to five people. And Modest has talked about this on your podcast before, which is that the story of technology innovation has been there's always been two to three people in a garage somewhere in Palo Alto doing something to catch up to incumbents very, very quickly. I think we're seeing that now in the model layer in a way that we haven't seen frankly in two years. Specifically, I think we still don't know 100% that pre-training and training scaling isn't coming back. We don't know that yet. But at the moment, at this plateauing time, we're starting to see these small teams catch up to the frontier. And what I mean by frontier is where are the state of the art models, especially around text, performing? We're seeing these small teams of quite literally two to five people jumping to the frontier with spend that is not one order, but multiple orders of magnitude less than what these large labs were spending to get there. I think part of what's happened is the incredible proliferation of open source models. Specifically, what Meta has been doing with Llama has been an extraordinary force here. Llama 3.1 comes in three flavors, 405 billion, 70 billion, 8 billion, and then Llama 3.2 comes in 1 billion, 3 billion, 11 billion and 90 billion. And you can take these models, download them, put them on a local machine, you can put them in a cloud, you can put them on a server and you can use these models to distill, fine-tune, train on top of, modify, et cetera, et cetera, and catch up to the frontier with pretty interesting algorithmic techniques. And because you don't need massive amounts of compute or you don't need massive amounts of data, you could be particularly clever and innovative about a specific vertical space or a specific technique or a particular use case to jump to the frontier very, very quickly.
I think that is largely changing how I personally think about the model layer and potential early stage investments in the model layer. There's a lot of ifs here and a lot of dependent variables and literally in six weeks none of this could be true anymore. But if this state holds, which is that pre-training isn't scaling because of synthetic data, it just means that you can now do a lot more, jump to the frontier very quickly with a minimum amount of capital, find your use case, find where you're most powerful and then from that point onward the hyperscalers frankly become best friends. Because today if you are at the frontier, you're powering a use case, you're not particularly GPU constrained anymore, especially if you're going to pursue test time inference or test time compute or something like that, and you're serving, let's say, 10 enterprise customers. Or maybe it's a consumer solution that's optimized for a particular use case. The compute side of it just doesn't become as challenging as it was in 2022. In 2022 you would talk to these developers and it just became a question of, well, could you get 100,000 cluster together because we need to go train and then we have to go buy all this data. And then even if you knew all the techniques, all of a sudden you would pencil it out and say like, I need a billion dollars to get the first training run to go. And that just is not the model that historically has been the venture capital model. The venture capital model has been could you get together a team of extraordinary people, have a technology breakthrough, be capital light and jump way ahead of incumbents very quickly and then somehow get a distribution foothold and go. At the model layer for the last two years that certainly didn't seem like it was possible. And literally in the last six, eight weeks that's definitively changed.
[21:41]
Modest Proposal
I think it's important, the point about Meta open source and the hyperscalers. Open source pushing the frontier, smaller models being able to scale to very successful points, is enormously beneficial, particularly for AWS, who doesn't have a native LLM. But if you just take a step back and think about what historically cloud computing was, it was providing a set of tooling to developers and builders. AWS first articulated this vision, I heard it publicly, in September when Matt Garman was at the Goldman Sachs conference. But their view clearly has been that LLMs are just another tool, that generative AI is another tool that they can provide their enterprise customers and their developer customers to build the next generation of products. The risk to that vision was an all powerful generalizable model. And so again, this is where you sort of have to rethink. If we're not going to build these massive pre-trained entities where you drive training loss down to near nothing and that in some form or another builds the metaphorical God, if instead the focus of the industry is at test time, at inference time, and trying to solve real problems at the point of need for a customer, I think that again re-engineers and re-architects the entire vision of how this technology rolls out. And I think we need to be humble that we don't know what Llama 4 is going to come out with. We don't know what Grok 3 is going to come out with. Those are the two models that are currently being trained on the largest clusters ever. So everything we're saying right now may be wrong in three months. But I think the entire job right now is to ingest all the available information and replot the various scenario paths with what we know today. I do not feel like people have updated their priors as to how these paths may go forward if this is correct.
[23:55]
Patrick O'Shaughnessy
I'm curious, Chetan, about the idea that maybe now you would invest in a model company because of this change. I remember you telling me over dinner two years ago that as a firm you just decided we're not investing in these companies. Like you said, it's just not the model that we do. We don't write billion dollar checks for the first training run and so we're not investing in that part of the stack. We're investing more in the application layer, which we'll come back to in a little bit in this discussion. But maybe say a bit more about this updated view on how that could work out, what a sample investment could look like, and whether, even if with Llama 4 the pre-training scaling laws hold, that even changes that, because it would just seem like something like DeepSeek just only benefits from, okay, now instead of 3.2 it's 4 and we're still doing our thing and it's still better and cheaper and faster and whatever. So yeah, what do you think about this new view you have, potentially investing in model companies, not just application companies?
[24:49]
Chetan Puttagunta
In Meta's last earnings call, Mark Zuckerberg talked about them starting Llama 4 development and he said that Llama 4 is being trained on a bigger cluster than anything he's ever seen out there. The number that was quoted was it's bigger than 100,000 H100s, or bigger than anything I've seen reported for what others are doing. And he also said, you know, the smaller Llama 4 models should be ready in early 2025. What's really interesting about that is that regardless of whether Llama 4 is a step function from Llama 3, it kind of doesn't matter. If they push the boundaries of efficiency and get to a point where even if it's incrementally better, what it does to the developer landscape is pretty profound. Because the force of Llama today has been two things, and I think this has been very beneficial to Meta, is, one, the transformer architecture that Llama is using is a sort of standard architecture, but it has its own nuances. And if the entire developer ecosystem that's building on top of Llama is starting to just assume that that Llama 3 transformer architecture is the foundational and sort of standard way of doing things, it's sort of standardizing the entire stack towards this Llama way of thinking, all the way from the hardware vendors will support your training runs to the hyperscalers and on and on and on. And so standardizing on Llama itself is starting to become more and more prevalent. And so if you were to start a new model company, what ends up happening is starting with Llama today is not only great because Llama is open source, it's also extraordinarily efficient because the entire ecosystem is standardizing on that architecture. And so you're right. As an early stage fund with $500 million of capital, and we're trying to make 30 investments every fund cycle, a billion dollar training run is essentially you're committing two funds to do one training run that may or may not work. And so that's an extraordinarily capital intensive business.
And by the way, the depreciation schedule for these models is frightening. Distillation as a technique makes defensibility of these models and the moats of these models extraordinarily challenging. And it really comes down to what's your application on top of it, what's your network effects, how are you capturing your economics there and all of that. And I think what is now the case as of today is if you're a two to five person team, you can take something like coding as an example, and you could push your way into a model that generates better coding answers faster by fine-tuning and training on top of Llama, and then offer an application with your own custom models that really produces extraordinary results for your customers, whether it's developers or something like that. And so our particular approach and strategy here has been to invest heavily in applications. Starting when we saw OpenAI APIs start to take off, we started to see developers talk about these OpenAI APIs in the summer of 2022. And a lot of our efforts starting then was to just find entrepreneurs that were thinking about leveraging these APIs to go after the application layer and really start thinking about what are applications that simply could not exist before this current wave of AI. Obviously we've seen some really incredible successful companies come out of that that are still early, but the kind of traction they're seeing, the kind of customer experience they're providing, the kind of metrics, all of that has been extraordinary. You had Bret Taylor on your podcast a couple of weeks ago, so Sierra is an example of this. In procurement we have a thing called Levelpath. Many other examples across the portfolio at the application layer, where you can just go through every single large SaaS market and go after it with an application layer investment and start to really think about what's now possible that wasn't possible two, three, four years ago.
[29:11]
Patrick O'Shaughnessy
I'm curious to talk a little bit about the big foundation model players. We talked about Llama, but less about xAI, Anthropic and OpenAI. Maybe Modest, starting with you. I'm curious just your thoughts on their strategic positioning and the things that are important for each. And maybe OpenAI as an example. Maybe the story here is just what a great brand they built and that they have so much distribution and they have all these great partnerships and people know it and use it and they have lots of people paying them 20 bucks or whatever. Maybe the distribution is more important than the product and the model. I'm curious what you think about these three players that have so far dominated, but for whom, through this analysis so far, it seems important that they keep innovating.
[29:52]
Modest Proposal
So I think the interesting part for OpenAI was because they just raised the recent round and there was some fairly public commentary around what the investment case was. You're right. A lot of it oriented around the idea that they had escape velocity on the consumer side and that ChatGPT was now the cognitive reference and that over time they would be able to aggregate an enormous consumer demand side and charge appropriately for that, and that it was much less a play on the enterprise API and application building. And that's super interesting if you actually play out what we've talked about. When you look at their financials, if you take out training runs, if you take out the need for this massive upfront expenditure, this actually becomes a wildly profitable company quite quickly in their projections. And so in a sense it could be better. Now then the question becomes what's the defensibility of a company that is no longer step function advancing on the frontier. And there I think this is ultimately going to come down to, one, Google is also advancing on the frontier and they most likely will give the product away for free. And Meta. I think we could probably spend an entire episode just talking about Meta and the embedded optionality that they have on both the enterprise side and the consumer side. But let's stick to the consumer side. This is a business that has over 3 billion consumer touch points. They are clearly rolling Meta AI out into various surfaces. It is not very difficult to see them building a search functionality. I joke they should buy Perplexity. But you've also just had the DOJ come out and say that Google should be forced to license their search index. I can think of no bigger beneficiary in the world than Meta having the opportunity, at marginal cost, to take on Google's search index. But the point is that I think there will be two very large scaled Internet players giving away what essentially looks like ChatGPT for free.
So it will be a fascinating case study in a product that has dominant consumer mind share. My children know what ChatGPT is. They have no idea what Claude is. My family knows what ChatGPT is. They have no idea what Grok is. So I think for OpenAI the question is can you outrun free? And if you can and training becomes less of an expense, this is going to be a really profitable company really quickly. If you go to Anthropic, I think they have an interesting dilemma, which is people think Sonnet 3.5 is possibly the best model out there. They have incredible technical talent. They keep ingesting more and more of OpenAI's researchers and I think they're going to build great models. But they're kind of stuck. They don't have the consumer mind share. And on the enterprise side I think that Llama is going to make things very difficult for the frontier model builders to try to grab great value creation there. So they're stuck in the middle. Wonderful technologists, great products, but not really a viable strategy. And you see they raised another $4 billion. To me that's indicative that pre-training is not scaling so well, because $4 billion is not anywhere close to what they're going to need if the scaling vector is pre-training. I don't have a good sense for what their strategic path forward is. I think they're stuck in the middle. xAI, I will plead ignorance on that one. He is a one of a kind talent and they're going to have a 200,000 chip cluster and they have a consumer touch point, they're building an API. But I think if pre-training is the scaling vector, they're up against the same math problem that everyone else has, only possibly mitigated by Elon's unique ability to raise capital. But again, the numbers get so big so quickly in the next four or five years that that may even be greater than him. And then if it's test time compute and algorithmic improvements and reasoning, what is their differentiation? What is their go to market?
When you have people who have staked their claim on the consumer side and then you have an open source entity on the enterprise side that's every bit as formidable. So when you look at those three, I think it's easiest to see what OpenAI's path forward is. One thing I will say about OpenAI, though, is Noam Brown, who I find to be one of the most effective communicators in the research world. He was on Sequoia's podcast recently and he was asked about AGI and he said, look, I think when I was outside of OpenAI, I was skeptical of the whole AGI thing, that that was actually what mattered to them. And when I got inside of OpenAI, it was very clear to me that they are very serious about AGI and that that is their mission and everything else is in service of AGI. It's easy for us to sit on the outside and articulate the strategy that we might pursue if we were in charge there, but I think we need to be cognizant of the fact that part of the reason they've gotten to where they are today is because they are on a mission. That mission is to develop AGI, and we should be very humble about ascribing any other end game for them than that.
[35:57]
Chetan Puttagunta
And my personal belief is that AGI is very close by.
[36:03]
Patrick O'Shaughnessy
Say more. And why is it not already here? These things are smarter than almost everyone I deal with.
[36:09]
Chetan Puttagunta
Yeah, I think so. AGI as narrowly defined, or maybe expansively defined, depending on your viewpoint, is a highly autonomous system that surpasses human performance in economically valuable work. In some cases, it's very easy to argue AGI is here using that lens. I think what is pretty clear is that if you look at the announcements made by OpenAI and their execs that have given interviews in recent weeks, an example that's brought up is end to end travel booking as something we can expect to see in 2025, where you can prompt the system to book travel for you and it'll just go do it. And that is a new way of thinking, which is end to end task completion or end to end work completion. That involves, obviously, reasoning, that involves agentic work, that involves using computers, as Claude has come out with. And you're combining multiple ways of these large language models interacting with the ecosystem itself, putting it into a very nice package that then is just able to do the end to end work and fully automate it and do it better than humans. And in my view, from that lens, we're very, very close to it. And I imagine that we'll be pretty close to or at AGI in 2025. I don't see how, given the current progress and the current innovation and now moving to test time compute and reasoning, AGI is not around the corner with that lens.
[37:53]
Modest Proposal
And it's funny, because we sort of become the frog boiling in water, where we passed the Turing test pretty easily, and yet nobody sits here anymore and talks about, holy crap, we passed the Turing test. It just came and went. And so it could be that this declaration of AGI is something along the same lines, where it's like, yeah, of course the model can book end to end travel. That's not actually that difficult. Whereas two and a half years ago, if you had said, hey, there's an algorithm that you can tell what you want to do, it books it end to end and sends you a receipt, you would say, no way. So there may be some of this boiling frog to it, where all of a sudden you wake up one day and a lab says, hey, we've got AGI. And everyone's sort of like, ah, cool. There is one particular reason, though, that a lab declaring AGI is interesting in a broader sense, which obviously is the relationship with Microsoft. And Microsoft first disclosed last summer that they have the full rights to the IP of OpenAI up until AGI is achieved. And so if OpenAI elects to declare that AGI is achieved, I think then you have a very interesting dynamic between them and Microsoft, which will compound an already very interesting dynamic which is at play right now. So that's something to watch next year, certainly for public market investors, but also for the ramifications for the broader ecosystem. Because I do think, again, if we're right about the path that we're pursuing now, there will be a lot of reshuffling of relationships and business partnerships as we go forward.
[39:38]
Patrick O'Shaughnessy
Chetan, was there anything else in Modest's assessment of the big players? We'd love to hear your thoughts on Google, since we didn't talk about them as specifically. Anything that he said that you disagree with or would press further on?
[39:50]
Chetan Puttagunta
No, I think what we just don't know is we don't know the underlying discussions in all of these rooms, and we can speculate and understand what we might do. But I think ultimately every Internet business or technology business has come down to this: on the consumer side, distribution combines with some kind of network effect and lock-in effect, and then you're able to just run away with that and separate from the field. And then on enterprise it's largely been a business that's driven by technology differentiation and delivery of that technology with great SLAs, with great service, with very unique approaches to solution delivery. And so Modest's comments on consumer and how consumer is going to evolve, I think, are exactly right. You have Meta, Google and xAI with consumer touch points. You have OpenAI with an extraordinary brand today with ChatGPT and a ton of consumer touchpoints already. On the enterprise side the challenge has been that these APIs have largely to date not been as reliable as what developers expect. Developers have gotten used to, because of the excellent work of hyperscalers, the idea that if you are out there with APIs for a product, that product should be infinitely scalable, available 24/7, and the only reason the API ever goes down is because some giant data center lost power or something. That there are very few reasons why an API should fail has become the developer mindset toward enterprise solutions. And over the last two years the quality of AI APIs has been a huge challenge for application developers. And so what's happened as a result is people have figured out workarounds and have solved all those problems with pure innovation. But going forward, and again we keep going back to this, if pre-training and scaling is not the way to do it and it's all about test time compute, this is where again we go back to the traditional way of hyperscalers.
And I think this is where AWS is extraordinarily advantaged, because Azure and Google have great clouds, but AWS has the biggest cloud, and it is really built for resilience in a way that's very, very differentiated. And even today if you're running Llama models, you want to run Llama models on AWS. Or if for some reason you have some very specific use case and you need to support on-prem customers, say at very large financial institutions that have complex regulatory environments or compliance reasons, you can run these models on-prem if you choose to, and AWS has even gone there with VPCs and GovCloud and all this kind of stuff. And so if we assume that pre-training and scaling there is done, then all of a sudden AWS becomes extraordinarily powerful, and their strategy here to just be friends with everybody in the developer ecosystem over the last couple years and not pursue their own LLM efforts (well, they are pursuing, but not in the same way that others have) will likely end up becoming a pretty good strategy, because all of a sudden you have the best API service. The other part I think is Google, which we haven't talked about yet. Their cloud is very good at certain things. So they have an enterprise business. That enterprise business is actually pretty scaled now if you look at the latest earnings, and obviously their consumer business is dominant, and there has been a perception that they're getting disrupted today. I think these forces are very disruptive to them. But it's unclear that the disruption has already happened. What are they doing about it? Obviously they're trying and it's pretty clear that they're trying very hard. But I think it's an interesting one to watch and the one that I like to watch, because it's the classic innovator's dilemma and they're clearly trying to be on the good side of not being innovated away as an incumbent.
They're trying very hard and so there's very few cases in business history of the incumbent preventing the innovator's attack. And if they do defend their business through this era, that'll be an extraordinary achievement.
[44:27]
Modest Proposal
Yeah, Google is so fascinating, because you had a brilliant sell side analyst, Carlos Kirjner, who unfortunately passed away. But in 2015 and 2016 he spent many, many reports writing about Google's progress towards artificial intelligence and the underlying work that they were doing at DeepMind. He actually was so fond of it that he went to go work at Google ultimately, but he sort of first exposed this idea of the underlying work that they were doing there in neural nets, in deep learning. And it's clear they were caught off guard by the brute force scaling of the transformer, that what advanced this wave of technology was literally throwing compute at it. But if you read any of the interviews with people who foreshadowed this data wall, one of the things they talked about was that self play might be a mode to overcome the lack of data, and who is better at self play than DeepMind? And if you look at the pieces that DeepMind brings and what they bring together with the transformer and scaling of compute, it seems as though they have all the pieces to win. Now, the question I have always had is not can Google win at AI? It's, is winning, whatever that looks like, ever going to possibly replicate how good winning was in the current paradigm? That's really the question. To Chetan's point, it would be amazing if they overcome the dilemma and win, but I think they have the pieces there. The question really is if they can build a business out of the assets that they have that in any way looks as good as what is arguably the greatest business model we have ever seen, which was Internet search. So I'm equally as fascinated to follow them. I think on the enterprise side they have incredible models and incredible assets. I think they have a lot of trust to earn. I think that over time they've come and gone in that world and so I think that's a harder axis of attack for them.
But certainly on the consumer side and certainly in the model building side, they have all the assets in place to win. The question is just what does that prize look like, particularly now if it doesn't look like there may be one or two models to rule them all.
[46:59]
Patrick O'Shaughnessy
Chetan, I'm curious, as an investor seeking a return, what path you personally hope for?
[47:06]
Chetan Puttagunta
I personally hope for AI to continue for a really long time. You need big disruptions as a venture investor to unlock distribution. And if you just look at what happened in the Internet or in mobile and where value accrued, value predominantly accrued at the application layer in those two waves. Now, obviously our hypothesis, and my hypothesis, was that this layer again was going to be very receptive to distribution unlock because of innovation at the AI application layer. I think that's largely played out so far. It's still early days, but the application vendors that have come out with production AI applications for both consumer and enterprise have found that those solutions, which can now only exist because of AI, are unlocking distribution in ways that were frankly not possible in the world of SaaS or prosumer SaaS or whatever. I'll give you a very specific example with an AI powered application. We're now going to CIOs at Fortune 500 companies showing these demos. And two years ago it was really nice demos. Today it's a really nice demo combined with five customer references of peers that are using it in production and experiencing great success. And what becomes very clear in that conversation is that what we're presenting is not a 5% improvement over an existing SaaS solution. It's about, we can eliminate significant amounts of software spend and human capital spend and move this to this AI solution, and your 10x traditional ROI definition of software is easily justified, and people get it within 30 minutes. And so you're starting to see this: what used to be a very long sales cycle for SaaS, in AI applications it's 15 minutes to a yes, 30 minutes to a yes. And then the procurement process for an enterprise completely changes. Now the CIO says something like, let's put this in as quickly as possible, we're going to run a 30 day pilot. The minute that's successful, we're signing a contract and we're deploying right away.
These are things that three, four years ago in SaaS were just completely out of the realm of possibility, because you were competing against incumbents, you were competing against their distribution advantage, their service advantage and all this kind of stuff. And it was very hard to prove why your particular product was unique. And so since 2022, and I'll call it since ChatGPT, November 2022, that seems like a really good line of pre and post in this world, we've made 25 investments in AI companies, and for a $500 million fund with five partners, that's an extraordinary pace. The last time we had that kind of a pace was, surprise, when the App Store came out in 2009. And the time before that when we had that kind of pace was in '95, '96 with the Internet. And in between those you see our pace being pretty slow. We average around maybe five to seven investments a year in non-disruptive times. And clearly now our pace has dramatically increased. And if you just look at those 25 companies, four have been infrastructure companies and the rest have been application companies. And we just invested in our first model company, which hasn't been announced yet. But it's two people, two extraordinarily brilliant people that are jumping to the frontier with very little capital. And so we've clearly bet and anticipated there's dramatic innovation and distribution unlock happening at the application layer. We've seen that happen already. These products, truly, speaking as a software investor, are absolutely amazing. They require a total rethinking from first principles on how these things are architected. You need unified data layers, you need new infrastructure, you need new UI and all this kind of stuff. And it's clear that the startups are significantly advantaged against incumbent software vendors.
And it's not that the incumbent software vendors are standing still, it's just that the innovator's dilemma in enterprise software is playing out much more aggressively in front of our eyes today than it is in consumer. I think in consumer, the consumer players recognize it, are moving on it and are doing stuff about it. Whereas I think in enterprise, even if you recognize it, even if you have the desire to do something, the solutions are just not built in a way that is responsive to dramatic re-architecture. Now could we see this happening? Could a giant SaaS company just pause selling for two years and completely re-architect their application stack? Sure, but I just don't see that happening. And so if you just look at any sort of analysis on what's happening on AI software spend, it's something like 8x year over year growth between 2023 and 2024 on just pure spend. It's gone from a couple of hundred million dollars to well over a billion in just a year's time. And you can see this pull. You can feel this pull. If you're in any one of these AI application companies, it's like more of these companies are supply constrained than demand constrained. You talk to the CEOs of these application companies and they just say things like, well, I see demand as far as I can look out. I just don't have the capacity to go service all the people that are saying yes to me. So I'm going to segment it and go to where they are. And my hope as an investor is that it continues to play out this way and that we have stability to just pursue these angles. And frankly, the model layer stabilizing is a huge boon for this application layer, primarily because as an application developer, you were sitting there watching the model layer take step function leaps every year and you kind of didn't know what to build and what you should just wait on building, because obviously you want it to be completely aligned with the model layer.
Because the model layers are now moving to reasoning, this is a great place for an application developer. One thing you know as an application developer is humans are not patient. And so you need to always build solutions that optimize on performance and quality. You cannot go to a user as an application developer and say like, I'm going to deliver a high quality response.
[53:41]
Patrick O'Shaughnessy
It's just going to take longer.
[53:43]
Chetan Puttagunta
That's not been a winning argument. Now for certain use cases, is that possible? Could you have it run in the background for 24 hours? Absolutely. But those use cases are not widespread and predominant and people aren't going to be willing to buy that kind of stuff. And so for application developers today, all of my board meetings over the last couple of weeks have been these companies saying, in this new reasoning paradigm, we're really confident that we can invest in these four things that we've been super hesitant to invest in over the last year and a half. But now we're going to go all in on these bets, and the kind of performance gains you're going to see out of our systems is going to be huge.
[54:22]
Patrick O'Shaughnessy
Sorry, why is that the case? Why does the reasoning thing make it so that their confidence goes up? Just like spell that out.
[54:28]
Chetan Puttagunta
Well, if you are an application developer and you're looking at the models today and you're saying, I can see clear efficiencies for my use case, but I have to invest in these five infrastructure layer things and these UI things. But if a new model comes out in six months and blows all that investment away just because the model itself can do it, then why would I ever invest in these things? I'm just going to wait for the model to do it and then bet on that. But in this reasoning paradigm, if all the labs pursue reasoning, and reasoning is intelligence on the Y axis, time on the X axis, and that's where we're going, then any improvement that I can make in my own tool to make that reasoning time dramatically compressed, because of the way algorithmically I'm feeding reasoning and I'm able to take the data and manipulate it and all that kind of stuff, I should invest in now. If reasoning is now the new paradigm, the last mile delivery at the application layer against these reasoning models means that I'm building technology and tooling that model companies are very, very unlikely to build. And as those reasoning systems continue to get better, my last mile and last mile delivery systems are still advantaged and defensible.
[55:42]
Patrick O'Shaughnessy
Do you both have favorite examples of this? Beyond coding and customer service, which seem to be the two dominant and incredibly exciting and cool use cases with lots of companies chasing after versions of that, do you have other favorite examples that would fit the CIO of the Fortune whatever company saying we need this in our company now?
[56:03]
Modest Proposal
Chetan loves all his children so he's not going to be able to give you specific examples.
[56:06]
Chetan Puttagunta
I can give you 20 of them.
[56:10]
Patrick O'Shaughnessy
Maybe like categorically is my question. There's coding, there's support, essentially top down.
[56:15]
Chetan Puttagunta
Look at the biggest spends of enterprise software and you could attack that with an AI-powered, AI-first solution. And so we've got a great company called 11x that's going after sales automation. We've got a great company called Leya that's being used by lawyers to dramatically increase the efficiency of their work. I think legal has been a very interesting question because people assume that lawyers work on billable hours. If you're automating billable hours, aren't their economics going to change? Well, now the evidence two years into this is that actually lawyers end up becoming way more profitable by using AI. And the reason is that a lot of the work that was rote, repetitive and hard and done by junior people inside of law firms, law firms weren't able to bill for that stuff anyway. And so if you can take down the time to do document analysis from three or four days to 24 hours, all of a sudden you free up all your lawyers to do all the strategic work that they can bill for and stuff that is extremely valuable for clients. And we've got a company that's automating accounting as an example, and financial modeling. We've got a company that's changing how game development is working. We've got somebody that's going after circuit board design, which has been a hugely manual and human intensive thing and something computer systems are particularly good at. And we recently invested in something going after an ad network. Now that's been something that's not been touched for a long time by startups, but it turns out matching the people that have inventory with the people that want to do advertising in the AI world is just way more efficient. And so we invested in a company that's got a new document processing model and they're going after OpenText. When was the last time a startup thought about OpenText? It's been a long time since these huge incumbent SaaS markets were thought to be open to new startups. So you had to pursue more niche, more vertical.
And I often joke because like I saw this, it was payroll for field workers working in Eastern Europe, was a SaaS company that you like legitimately had to think about in 2019 and now we're back to like large swaths of horizontal spend again to say like, hey, there's an incumbent here that's worth 10 billion plus. The market here is 10 billion of annual spending. AI makes a product here easily 10x better, faster and all the things that users want when they see it. And you need a new platform to come out with that kind of advantage. And that's what this is.
[58:54]
Modest Proposal
And Patrick, you asked back at the beginning about the big debate on ROI and CapEx and all this, and when you listen to Chetan and when you listen to other investors in the application layer, when you listen to the hyperscalers, the big takeaway over the last three months is the use cases are coming. Yes, everybody knows about coding, everybody knows about customer support, but this is really starting to permeate and get out into the broader ecosystem and the revenues are becoming real. The challenge on the ROI question always was, okay, you put the capital in here, you then amortize it over the inference period, but meanwhile, you are then stacking the next quantum of capital for the next model. And so everybody could draw those extrapolations and say, oh my God, it's not just that Microsoft is going to spend $85 billion in cash CapEx inclusive of the leases in 2025, it's what does it mean for '26, '27, '28? Because the pre-trained models were getting so big. If, and again it's an if, we are plateauing and we're spending less money on pre-training and moving that capital towards inferencing, we know that spend is coming, we know the revenue generation of the customer is coming. And so it becomes much easier to say this spend is warranted. I think it is important that people remember the underlying clouds of these companies, meaning just the normal storage and compute, are still growing high teens to low 20s. So there's some capital that needs to be allocated towards that when you're a hundred billion dollar business growing 18%, or you're a $60 billion business growing 25%. It's the incremental capital above that that everybody was very concerned about six, nine months ago. My personal takeaway coming out of Q3 was, okay, I see it. There's use cases here. The inferencing is happening, technology is doing what it's supposed to. The cost of inferencing is plummeting, the utilization is soaring.
You put that together, you get a nice rising pot of revenue and everything's good. Satya Nadella talked about this. The challenge is you spend the money for the model, you get it on inferencing, but then we're spending on the next model. If we can start to say, hey, maybe we're not going to spend the next $50 billion on the model, the ROI calculation looks a lot better. And something that you asked Chetan was why is stability in the model layer important? I think Sam Altman gave the right answer on this, which was six months ago, he was on a podcast and said, if you're scared of our next model being released, we're going to run you over. If you're looking forward to our next model coming out, then you're in a good position. Well, if the actual reality is the next model is going to be at inference time and not retraining, you probably have less worry about them steamrolling. So I think everything that we are talking about here is very conducive to a favorable economic reality for the entire ecosystem, which is all the attention and capital being put towards inferencing. The real concern was, do we need to spend fifty, a hundred, two hundred billion dollars to build these ever more accurate models in pre-training?
[62:26]
Patrick O'Shaughnessy
Where do prices most reflect extreme optimism or hype still? I've certainly seen my fair share of private markets companies, let's say series A type companies, that price at extremely high valuations. They're often incredible teams and very exciting, but they're also playing in spaces where if something works, you could imagine lots of other very smart investors funding some competitors. So you see these scenarios where it's like great team, high price, high potential competition, really exciting, everything's moving fast. I'm curious what signals you both read from valuations and/or multiples right now.
[63:01]
Chetan Puttagunta
In the private markets, one of the things that's happening is just the dramatic drop in prices of compute, whether it's inference or training or whatever, because it's just becoming way more available. If you're sitting here today as an application developer versus two years ago, the cost of inference of these models is down 100x, 200x. It's frankly outrageous. You've never seen cost curves that look this steep, this fast. And this is coming off of 15 years of cloud cost curves which were amazing and mind-blowing by themselves. The cost curves on AI are just a completely different level. We were looking at cost curves in the first wave of application companies that we funded in 2022. You look at the inference costs and it would be like $15 to $20 per million tokens on the latest frontier models. And today most companies don't even think about inference costs because it's just like, well, we've broken this task up and we're using these small models for the tasks that are pretty basic, and the stuff we're hitting with the most frontier models is these very few prompts, and for the rest of the stuff we've just created this intelligent routing system. And so our cost of inference is essentially zero and our gross margin for this task is 95%. You just look at that and you're just like, wow, that is a totally different way to think about application gross margins than what we've had to do with SaaS and what we've had to do with basically software for the last decade plus. And so I think that's where you're starting to look at the entire application stack for these new AI applications. And it starts with people that provide inference. It starts with the tooling and the orchestration layer. So in tooling and orchestration we have a portfolio company that's extremely popular called LangChain, and in the inference layer we have Fireworks.
These kinds of companies are seeing extraordinary usage by developers, and then all the way up the stack to the applications themselves, I think just the pace of innovation and the pace of commercial success is driving a lot of excitement with private investors. What is also appealing about model stability is now we can finally assume, if this sticks, that all these companies are going to be fairly capital light. Because you're not having to spend a lot on pre-training, and you're not going to have to spend a lot on inferencing, because most of the hyperscalers are now going to present you with really reliable APIs at these kinds of costs. It's a great time to be in the application development business and it's a great time to be in the application development stack.
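Editor's sketch: the routing arithmetic described above can be made concrete with a small calculation. This is only an illustration under stated assumptions. The $15 per million tokens figure and the "100x" drop come from the conversation; the small-model price and the share of traffic sent to the frontier model are hypothetical, as are the function names.

```python
# Illustrative arithmetic only. The 2022 price and the 100x drop are quoted in
# the conversation; the small-model price and routing share are hypothetical.

def blended_cost_per_million(frontier_price, small_price, frontier_share):
    """Average cost per million tokens when only part of the traffic
    is routed to the frontier model and the rest goes to a small model."""
    if not 0.0 <= frontier_share <= 1.0:
        raise ValueError("frontier_share must be between 0 and 1")
    return frontier_share * frontier_price + (1.0 - frontier_share) * small_price


def gross_margin(revenue, cost):
    """Gross margin as a fraction of revenue."""
    return (revenue - cost) / revenue


frontier_2022 = 15.0                   # $/M tokens, low end of the quoted range
frontier_today = frontier_2022 / 100   # applying the quoted "100x" drop
small_today = 0.05                     # $/M tokens, hypothetical small model
frontier_share = 0.10                  # hypothetical: 10% of tokens need the frontier model

cost = blended_cost_per_million(frontier_today, small_today, frontier_share)
print(f"blended cost: ${cost:.3f} per million tokens")
```

Under these assumed inputs the blended cost is a small fraction of the 2022 frontier price, which is the sense in which inference cost stops being a first-order concern for the application's gross margin.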
[65:42]
Patrick O'Shaughnessy
Modest. What do you think on valuations?
[65:45]
Modest Proposal
I think you have to start in general with animal spirits. If you go back to the week before ChatGPT was released, if you go to the fall of 2022, tech had probably just suffered its most brutal bear market since the dot-com collapse. It was arguably worse for the median tech stock than even the financial crisis. You had some of the very large growth funds down 60, 70%. You had the hyperscalers laying off people for the first time ever. You had CapEx cuts, you had OpEx cuts. It was a very different vibe in the tech world and in the public markets at large. The release of ChatGPT catalyzed the reemergence of animal spirits. And it's been a progressive process. But I think where you are today, you have the public markets trading at 24 times earnings. And again, this goes beyond just the Mag 7 at this point. I mean, Google trades at, I think, 19 or 20 times, so they're not one of the offenders here. And so I think in general there is a lot of optimism baked into the public markets, a lot of which is tied thematically to this idea that we're in a new platform era and that the sky is the limit for a lot of various new concepts. So there's that global overhang. If we are right, I think that what it really comes down to is understanding what this new path forward looks like if CapEx and hyperscaler OpEx is more closely tied to revenue generation. If you listen to AWS, one of the fascinating things they say is they call AWS a logistics business. I don't think anyone externally would sort of look at cloud computing and say, oh yeah, that's a logistics business. But their point is essentially what they have to do is they have to forecast demand and they have to build supply on a multi-year basis to accommodate it. And over 20 years they've gotten extraordinarily good at it. What has happened in the last two years, and I talked about this last time, is you have had an enormous surge in demand hitting inelastic supply, because you can't build data center capacity in three weeks.
And so if you get back to a more predictable cadence of demand where they can look at it and say, okay, we know now where the revenue generation is coming from. It's coming from test time, it's coming from Chetan and his companies rolling out. Now we know how to align supply with that. Now it's back to a logistics business. Now it's not grab every mothballed nuclear site in the country and try to bring it online. And so instead of this land grab, I think you get a more reasonable, sensible, methodical rollout of it, maybe. And I actually would guess that if this path is right, inference overtakes training much faster than we thought and gets much bigger than we may have suspected. But I think the path there in the network design is going to look very different and it's going to have very big ramifications for the people who were building the network, who were powering the network, who were sending the optical signals through the network. And all of that I think has not really started to come up in the probability-weighted distributions of a huge chunk of the public market. And look, I think most people overly fixate on Nvidia because they are sort of the poster child of this. But there are a lot of people downstream from Nvidia that will probably suffer more because they have inferior businesses. Nvidia is a wonderful business doing wonderful things. They just happen to have seen the largest surge in surplus. I think that there are ramifications far, far beyond who is making the bleeding-edge GPU, even though I do think there will be questions about, okay, does this new paradigm of test time compute allow for customization at the chip level much more than it would have if we were only scaling on pre-training?
But whenever I have this conversation, people overly fixate on Nvidia. I think people like to debate that particular name, but I think there are a lot of other derivative plays on the AI build-out where the distribution of outcomes has shifted and that has not been reflected yet.
[70:41]
Chetan Puttagunta
I just think it's really important to think about, in the test time and reasoning paradigm, from an application layer, how many of your prompts actually utilize reasoning as a way to respond to those prompts. And yes, application developers, as this technology becomes more available and usable, will use way more of it than they are today. But if you just look at the current techniques and the wows you're getting from the application layer already, what percent of prompts or what percent of queries are going to use reasoning? It's very hard to squint and say it's going to be 90% of queries. That doesn't seem like it's going to go that way because again, your users are not going to wait. Humans are inherently impatient, and if you have a solution that's just spinning and thinking, your users are gone. It doesn't matter what sector they're in, they're just gone. And so yeah, you can have a certain set of tasks that take a long time and deliver great accuracy, but speed is by far the most important consideration for these application developers. And so are we just going to have a system that continues to go back and through and back and through and utilize all this compute, and what share of queries use that? It's hard to imagine that being a supermajority of queries. And so then the implication, at least from a private market, early stage investor, which take with huge grains of salt on what it means for anything other than my world. But the implication there is simply that you just don't need as much compute as you did with training. Training is just a constant exercise. You're scaling and you're just really hitting all your compute power all the time and just going. At the application layer, it's extraordinarily bursty. You're going to have certain tasks that need a lot right away, and for a lot of it you just don't need a lot. And so this is where again, hyperscalers and things like EC2 and S3 were incredible.
And now in this new world, the solutions from hyperscalers are really terrific. I think AWS's Trainium and the TPUs from Google are really, really terrific and they offer a great developer experience. I think part of what has been known for application developers is that GPUs are really tough to use for this use case. Getting max utilization out of GPUs chained together, whether you're buying that from Dell or whether you're buying it from a hyperscaler, is just really hard. But with new software innovations that's obviously gonna get better. And then the stuff that's coming out from the hyperscalers themselves is really, really terrific, and when you're doing test time compute you just don't need to hit 'em as hard as you did when you were doing training.
[73:28]
Modest Proposal
I think it's a really important point on the utilization of the GPUs. If you think about a training exercise, you're trying to utilize them at the highest possible percent for a long period of time. So you're trying to put 50,000, 100,000 chips in a single location and utilize them at the highest rate possible for nine months. What's left behind is a hundred-thousand-chip cluster that, if you were to repurpose it for inferencing, is arguably not the most efficient build, because inference is peaky and bursty and not consistent. And so this is what I'm talking about: I just think from first principles you are going to rethink how you want to build your infrastructure to service a much more inference-focused world than a training-focused world. And Jensen has talked about how the beauty of Nvidia is that you leave behind this in-place infrastructure that can then be utilized. And in a sunk cost world you say sure, of course, if I'm forced to build a million-chip supercluster in order to train a $50 billion model, I might as well sweat the asset when I'm done. But from first principles it seems clear you would never build a 350,000-chip cluster with 2 1/2 gigawatts of power in order to service the type of request that Chetan's talking about. And so if you end up with much more edge computing with low latency and high efficiency, what does that mean for optical networking? What does that mean for the grid? What does that mean for the need for on-site power versus the ability to draw from the local utility? I think these are the types of questions I would be very interested to read about. But to date a lot of the analysis is still focusing on what's going to happen when we light up Three Mile Island, because the new paradigm is really too new for that to have changed.
[75:41]
Patrick O'Shaughnessy
Do you think that we still need, and will see, though, tons of innovation in the semiconductor world and layer, whether it's networking, whether it's optical, whether it's chips themselves, different kinds of chips?
[75:54]
Modest Proposal
I would imagine this would accelerate it even more, because it was very difficult to foresee a world where you took on big green in training. The way I think about this over hundreds of years is you have a gold rush, a land grab, and everybody's just doing whatever they can. But in technology, as some stability sets in, you get an optimization period. You've already had that on the inferencing side. It's what Chetan referenced: people had time to optimize the underlying algorithms and compute, and inference has fallen 99%. It's the same thing that happened with Internet transit at the end of the bubble, which was people said, no, you can never stream a movie. Do you have any idea how much that would cost? And the cost of transit has fallen 25% a year like clockwork for 20 years. The literal profit pool of that business has been static for 20 years. And so I think we've had this mammoth demand surge, and I think that if we get a little bit of stability and everyone can take a breath, there will be the two guys in the garage optimizing every single thing possible. And that's the beauty of technology over the long term: it is deflationary because it's an optimization problem. But you don't have time to optimize when you're in land grab mode. I quoted this to you last time. The data center industry, they were power neutral. There was no demand growth in power for the entire data center business for five years. That was because you were in the fully mature stage of the cloud data center build-out. I don't know when you'll reach that point. I mean, we know that these guys, on a three-, four-year view, at least through '26 or '27, are pot committed to their build-out. When in that path will everyone have time to take a deep breath and say, okay, now let's figure out how to run these more efficiently? That's just the nature of things. Same thing on the compute side. I just think we haven't yet gotten to the point where technologists have been able to apply their optimization.
They've been in the implementation...
[78:09]
Chetan Puttagunta
And I'll give you a couple of data points from my end. So my partner Eric is on the board of a great semiconductor company called Cerebras, and they recently announced that inference on Llama 3.1 405 billion on Cerebras can generate 900-plus tokens per second, which is a dramatic order of magnitude increase. I think it's like 70 or 75 times faster than GPUs for inference, as an example. And so as we move to the inference world, the semiconductor layer, the networking layer, et cetera, there's tons of opportunities for startups to really differentiate themselves. And then the second thing I would bring up is I was just recently talking to the CIO of a large financial services institution who said that over the last two years they were pre-buying a lot of GPUs because they assumed that they were going to have lots of AI workloads, and who knows, maybe they needed to do some training themselves. And so those systems are now being installed into their data centers and they're now online, and we're in this world where you don't need to create your own model. And even if you did, you just fine-tune an open source model. It's not that heavy. And so his view is like, look, if you have AI applications that run on prem, it's essentially free. I have all this capacity, I'm not using it for anything. Inference is light, and so I have at the moment infinite capacity to run AI applications on premises and it'll cost me zero marginal dollars because all the stuff is up and running and I'm not using it for anything, so I'm ready to buy. So not only are all these application things that you're talking about hugely exciting because they unlock ROI and all the stuff, but the minute you can run any of this stuff on prem on our stuff, that dramatically decreases the cost for us. And so it's just like win, win, win all over the place when you have something like that. And that's the current state of play. Now how long does this overcapacity last?
Application developers are famous for using all the capacity and pushing the limits, and all of a sudden what used to be overcapacity ends up becoming undercapacity, because all of a sudden we have all this broadband build-out and we decide to stream video on it. And so of course AI applications are going to get more sophisticated and swallow up all this capacity, but that is just a much more predictable world and a much more sane world from an investment perspective than scaling to infinity on pre-training.
[80:34]
Modest Proposal
The one thing I am curious to monitor is, it's important to remember that the reporting was not that the models weren't getting better, it's that the models weren't getting better relative to expectation or the amount of compute applied to them. So I think we do need to be cautious about concluding that the labs are not going to keep trying to figure out the unlock on the pre-training side. I think the question there is, one, what should we be looking for? But then two is, if they continue to push on that vector, do we believe, and this was the question I always wrestled with, if scaling laws held in pre-training would people be willing to spend $100 billion? And I know that everybody says if you're playing for the ultimate prize you would. But has enough doubt been cast on the idea that simply brute forcing pre-training is the path to that ultimate unlock? Or is it now some combination of pre-training, post-training and test time compute? In which case again I think that the math is just much more sane. And I've seen a lot of notes coming out saying people are declaring the end of AI progress and all that. And hopefully the takeaway from today is none of that is what I think people really in the weeds looking at this are saying. People are saying AI is full speed ahead. I think the question is just what the axis of advancement is. And from my seat the math seems much more sensible. Everything seems much more rational pursuing this path rather than the upfront cost being spend any amount you can to build this hypothetical God. So I think this is a much better outcome if this is the path that we end up going down.
[82:38]
Patrick O'Shaughnessy
I'm curious what you think, if anything, is the most under-discussed part of this whole story. Are there things that you find yourself thinking a lot more about than you hear discussed from your friends and colleagues?
[82:50]
Chetan Puttagunta
On the public investor side, just reading sell-side reports, we haven't seen sell-side reports or analysis on what this new paradigm of test time compute means and how things change. And so I'm really looking forward to way more sell-side analysis on this new paradigm shift. I think there's also little coverage in the private markets. What's known to people that are meeting these entrepreneurs is just how capital efficiently these entrepreneurs are getting to the frontier today. And this is just a shift that's happened very, very recently. And you're seeing people just show up having spent under a million dollars to match performance, not broadly, but in specific use cases, with the frontier models. And that's just not something that we were seeing two years ago or even a year ago. And so I think that's pretty dramatically undercovered.
[83:38]
Modest Proposal
Pre-training is a big test of capitalism. If we pursue down this path, I feel much better with a microeconomic background analyzing what's going to happen, because you don't have to put in the NPV of God. And I just think that that's much better. In terms of what I'm looking forward to reading and hearing, yeah, I'd love to see thoughtful analysis really wrestle with this. Right now, I feel like it's a little defensive. People are defending the fact that scaling's not done, it's just moved. So that's great. But let's now work through the second order effects, the third order effects. And how does this really manifest itself? I think it's very good for the overall ecosystem, the overall economy, but I think there's going to be a lot of surplus shifting between pockets that looked like winners before and pockets that looked like losers.
[84:32]
Patrick O'Shaughnessy
What outcome in the next six months would most disorient you?
[84:37]
Chetan Puttagunta
Well, two dramatic examples. On the positive side, if somebody came out with results that pre-training was back on and there was a huge breakthrough on synthetic data and all of a sudden it's go, go again, and ten-billion-dollar and hundred-billion-dollar clusters would be back on the table. You would go back, but all of a sudden the paradigm shift would be wild. All of a sudden we would now be talking about a hundred-billion-dollar supercluster that was going to pre-train. And then obviously if my expectation comes out that next year we're going to call AGI, we're going to have AGI and we're building a hundred-billion-dollar cluster because we had a breakthrough on synthetic data and it all just works and we can just simulate everything. That would be pretty dramatically disorienting. I think another scenario is, it's pretty clear now that while we've exhausted data on text, we are not close to exhausting data on video and audio. And I think that it's still TBD what these models are capable of on new forms of modes. And so we just don't know because the focus hasn't been there. But now you're starting to see large labs talk more about audio and video. What these models will be capable of from a human interaction perspective, I think it's going to be pretty amazing. I think you've just seen already how many leaps have gone into image generation and video generation, and what does that look like in a year's time? In two years' time it could be pretty dramatically disorienting.
[86:10]
Modest Proposal
Yeah. I think the hard part as a non-technologist is for the last year, year and a half, the question has been what would GPT-5 bring if it adhered to the scaling law, and no one could really articulate it, because all we know is, okay, training loss would be lower. So you'd say, okay, this thing's more accurate at next token prediction. But as far as what does that actually mean from a capability standpoint? What's the emergent capability we were unaware of before it was released? So I think it's really hard to know ex ante what you're looking for other than the labs coming out and saying this is so good in its accuracy that it warrants staying on this log-linear trajectory of spend. And if someone comes out and says that, I think irrespective of this entire conversation or what you may believe, you have to say, okay, that's happened again. I just think you have to have a super open mind. And if we were having this conversation three months ago, there were whispers, but it wasn't in the open. I just think you have to be updating your priors constantly. So clearly, like Jason said, I'd be looking for that. Personally, I watch Llama closely. There's clearly a risk at some point that they decide not to keep open sourcing. And if I were other players in the ecosystem, I would be doing my damnedest to make sure that Llama stays open. And there are certain ways you could go about doing that. But I think that's one thing, because their willingness to spend at the frontier and make those models available the way they do, I think has completely changed the strategic dynamic in the model industry. So that's another one that I would be paying attention to.
[88:04]
Patrick O'Shaughnessy
I have a philosophical question as we near the end of the discussion, which is around ASI. So if AGI is here or coming next year, how would the both of you even think about it? I guess it builds on that point about what do we even expect from a GPT-5 that stays on the scaling law? What does it mean? Because there are fewer and fewer things, at least in a simple chat interaction, that I could imagine it doing a much, much better job on, or even what that would look like. And again, we're probably just in the early innings of application development and fine-tuning and improvement and algorithmic updates and blah, blah, blah. So I'm curious, just philosophically, what you think the litmus test could or might be for something beyond what we have naturally, as the existing models get tweaked and tuned and better. What does ASI even mean? Does it mean it solves previously impossible math or physics challenges or something else? What does that idea mean to you both?
[88:59]
Chetan Puttagunta
These are my words. I don't remember who originally stated them, but humans are really good at changing the goalposts on expectations, and AI in the 1970s meant something different than what it meant in 80s and 90s, 2000s and in 2024. And so if a computer can do it, humans have a really good way of describing that as automation and whatever a computer can't do, that now becomes the new goalpost for AI. And so I think that these systems are already extraordinarily intelligent and are extraordinary at replicating human intelligence and sometimes exceeding human intelligence. I think if you just look at the path that some model developers like DeepMind and several startups are pursuing with things around math and physics and biology, it's very clear that there's going to be applications and outputs of these models that are going to be things that humans were simply not capable of doing before. We already have seen that in things like protein folding. Today we're starting to see a little bit of that as it relates to math proofs. I am confident we're going to start seeing that as it relates to physics proofs. And so my optimistic hope for humanity is that, I don't know, we'll be able to open wormholes or something, we're going to be able to study general relativity at a scale that we haven't been able to before, or study black holes, or simulate black holes in a way that we haven't been able to. All of that sounds a little bit ridiculous at the moment, but certainly the way things are progressing and the way things have progressed, we don't know what is possible and what's not possible. And to bring it back to an investor point of view, when you have the unknown future where the possibility is up to your imagination, that's usually a really great time to be an early stage investor because that means that technology has unlocked. And usually when technology unlocks in a dramatic fashion, distribution also then unlocks. 
And you can now go get customers that were very expensive to get. And so previously if you wanted to build a consumer application, you then had to factor in the tax of the app stores, search, ad networks and all that kind of stuff. And all of a sudden it just became a very quick exercise in unit economics. And similarly in SaaS, it was like productivity, gross margin and infrastructure costs, and you just tried to do a spreadsheet exercise. And early stage investing started to become more spreadsheet-like than true technology innovation. I think when you have big breakthroughs like this, everything sort of changes again. Distribution is nearly free if you have something unique and there's a word of mouth and virality factor to it. On the technology spend, again, you go back to just investing in your developers and your research scientists and the R&D, and the ROI on R&D starts to become remarkable again. That's what's most exciting as an early stage investor: we just don't know what the future holds, and therefore it's back to human ingenuity and people able to push these boundaries.
[92:01]
Modest Proposal
When it's exciting for an early stage investor, I think it's terrifying for a somewhat skeptical public market investor. Prices are based on vibes and not math in the spreadsheet. With ASI, I think we've talked about this before, this whole concept. The reason people spend so much time on it is because it is so profound. Ultimately you have people who are invoking quasi-religion in their view of what we're building. Anytime that comes into play, I think the stakes are just higher. It's kind of unknowable and it's super complicated. So we all love to debate it. But I think the one thing we haven't touched on here is there's a pretty fervent belief amongst a group of people that there will be recursive self-improvement at some point in time. And I think that that would be a big path to unlock whatever ASI hypothetically means, which is when the machines are smart enough to learn themselves and teach themselves. On a less sort of dramatic view, the way I think about this, there's AlphaGo, which famously did that move that no one had ever seen. And I think it's like move 37 that everybody was super confused about, and it ended up winning. And another example I love, because I like poker, is Noam Brown, who talked about his poker bot. It was playing high stakes no-limit, and it continually overbet dramatically larger sizes than pros had ever seen before. And he thought the bot was making a mistake, and ultimately it destabilized the pros so much. Think about that. A computer destabilized humans in their approach so much that they have to some extent taken on overbetting now into their game. And so those are two examples where, if we think about pre-training being bounded by the data set that we've given it, if we don't have synthetic data generation capabilities, here you have two examples where algorithms did something outside of the bounds of human knowledge.
And that's what's always been confusing to me about this idea that LLMs on their own could get to superintelligence: functionally they're bounded by the amount of data we give them upfront. And so if you have examples like this where algorithms are able to get outside of what they're initially bounded by, that's super interesting. I'm not smart enough to know where that leads us, but that's the kind of thing that I feel like is the next thing to come: how do you escape the bounds of what you're given up front?
[94:39]
Chetan Puttagunta
I think what's remarkable from my perspective is how much of this innovation is happening in the United States and how much of it is happening in Silicon Valley. We've had a rough couple of years since the pandemic, and it's really amazing. There was an investor friend of mine who's not based in Silicon Valley, and he was just saying, I can't believe it's happening in Silicon Valley again. And it's just become this beacon where all of the labs are based here. A lot of the people that are working on these applications, these infrastructure companies, et cetera, are here. Or even if they're not here, they're somehow connected to being here and are often visiting here a lot. And I would say that the focus on the innovation here is really extraordinary. On AI, the progress being made in the United States, in Silicon Valley specifically, is extraordinary. I do think that there's a level of attention that investors and entrepreneurs now have on how fragile the system is and how much we need to protect it and continue to invest in it. And I think there's a lot of focus now on the idea that innovation is something that needs to be protected. And I think a lot of people are now paying a lot of attention to make sure that all of this innovation that's happening in the United States continues to benefit everybody. And I think that's a really optimistic and cool thing to recognize.
[96:01]
Modest Proposal
The agglomeration effects are real. If the reporting is right, the way the transformer paper came to pass is that someone was rollerblading down the hall and heard two guys talking about something and went in and whiteboarded and two more people came by, and who knows how much of that is apocryphal. But it is fascinating to see from an economist's standpoint that these human network effects are real and that Covid did not destroy them and that work from home did not destroy them. And that there really is something tangible to being together and the synthesis of ideas, multidisciplinary, coming together to build this world-changing architecture.
[96:43]
Patrick O'Shaughnessy
Guys, it's always such a blast talking to you both. I'm lucky to get to do this in private. It's fun to do it in public. Thanks for your time.
[96:48]
Modest Proposal
Of course.
[96:49]
Chetan Puttagunta
Thank you.
[96:51]
Patrick O'Shaughnessy
If you enjoyed this episode, check out joincolossus.com. There you'll find every episode of this podcast, complete with transcripts, show notes, and resources to keep learning. You can also sign up for our newsletter, Colossus Weekly, where we condense episodes to the big ideas, quotations and more, as well as share the best content we find on the Internet every week.