Transcript
A (0:00)
This podcast has been prepared exclusively for institutional, wholesale, professional clients and qualified investors only, as defined by local laws and regulations. Please read other important information which can be found on the link at the end of the podcast episode.
B (0:24)
Morning, welcome back to the Eye on the Market podcast. This one is mid-May and it's called "Back to our regularly scheduled programming: an update on AI." I've spent a lot of time this year on the intersection between politics, economics and markets, for good reason. There was a flurry of executive orders, memorandums and proclamations on tariffs, which was the catalyst for the first Sell America episode since 1982. And by Sell America, I'm referring to a material and simultaneous decline in US equities, the dollar and Treasury bonds, combined with US equity underperformance versus the rest of the world. Like Blanche DuBois in A Streetcar Named Desire, the US relies a lot on the kindness of strangers, and one of the first charts in this piece shows how much the US is now reliant on foreign versus domestic net savings; on this basis the US is almost entirely reliant on foreign net savings. So a Sell America episode is not a good one, but it looks like for my 62nd birthday, I think that's right, Trump is going to set the China reciprocal tariff rate at 10%, like the rest of the other countries, in which case we've updated our tariff rate on all US imports. It now looks like, if we assume this temporary negotiation holds, that we're approaching an equilibrium state: a roughly 10% reciprocal tariff on a bunch of goods, other goods exempted, and other goods subject to Section 232 product-specific tariffs. The big picture is that you're still looking at the largest tariff increase in 70 years or so, but it's a lot lower, roughly half of what it was, let's say, a month and a half ago. So now that that's happening and we're approaching maybe some kind of steady state that countries and companies can adapt to, let's go back to some regularly scheduled programming, which is an update on AI, which was the primary driver of US equity markets before all this trade war stuff began. 
And even during all of this trade stuff, US companies spent more time on Q1 earnings calls talking about AI adoption than they did about tariffs, which is interesting. Another thing to keep in mind is that the market capitalization of companies that benefit directly or indirectly from AI is two and a half times larger than the market cap of the US companies that would be the victims of tariffs. So I think you could make the argument that AI is at least as important as tariffs to equity investors, if not more so. At the same time, the premium one would pay for AI plays relative to the stock market is back down to the level it was last at in 2017. So it's probably a good time to be taking a look at this. One of the things that happened during all the tariff stuff and the Sell America discussions was that a lot of people were writing about how US equities are very, very expensive versus the rest of the world. And if you just looked simply at P/E multiples of the US versus Europe or Japan, that's what you would probably conclude. But equities can be cheap or expensive relative to each other for certain reasons, and I always remind people that US companies are a lot more profitable than their non-US counterparts. We have a chart in here, shown on this page for people watching: if you plot ROE against price-to-book, in other words fundamentals versus valuations, there's a very linear relationship. The higher a sector's ROE, the higher its price-to-book ratio. And on that basis, US equities don't look quite so mispriced relative to the rest of the world. Here we're comparing the US to the developed world ex-US; we have a number of different ways of running the chart that all tell you the same message. Now the top dot on here is of course the tech sector, which has the highest price-to-book ratio, but also by far the highest projected return on equity. 
And as a sign of just how successful the tech and interactive media space has been, that sector now accounts for 35% of all the earnings in the market, compared to just 19% a decade ago. Over the last couple of years, the primary driver in the tech space has been AI adoption, so that's what we want to focus on. Here we refreshed a chart that Stanford produces on how AI capabilities are advancing. They've been generating this chart for several years, and it looks at how AI models do versus humans on a number of different things related to classifying images, visual reasoning, language understanding, math, science, things like that. AI capabilities have now more or less matched or exceeded humans. At the same time, costs have gone down a lot: models are increasingly small but very powerful, and inference costs for a system performing, let's say, at the level of GPT-3.5 dropped by almost 300 times between November 2022 and October 2024. Hardware costs are declining, energy efficiency is improving, and open-weight models are closing the gap with closed-weight models. So there's a lot going on, and now that the tariff equilibrium appears to be set, I think it's time for us to refocus on these things. What I want to do, and this is what's in the Eye on the Market this week, is quickly walk from the things that are most visible to the things that are least visible. What's most visible is the increase in hyperscaler spending, whether in dollar terms or as a share of revenues. The way we track it, we look at capital spending plus R&D, and just for the four big hyperscalers it was $450 billion in 2024 and is expected to be substantially higher, like 30% higher, in 2025, which is kind of amazing. So that's the most visible thing you can see. 
Then the next most visible thing is improving capabilities of AI models on tasks and exams and things like that, on paper. The next thing, which is not as visible, is AI adoption by the corporate sector. And then the hardest thing to find is the true pace of AI-related revenue growth of the hyperscalers, which is really an important thing. So right now some of these things are a lot more visible than others. The hyperscalers continue to live by this mantra: we have more to lose by underspending than overspending. Okay, great. But at least we can see some evidence of AI adoption and revenues associated with it. When I first started writing about language models in February 2023, there were a lot of questions about hallucinations and about just how relevant language model scores on multiple choice exams were when the models were trained on the answers to those exams. All you were really getting was a sense of whether they are good memorizers, and yes, they are good memorizers. But progress has been made on a number of fronts. The models are now tested against much more advanced exams than simple multiple choice. And while you can't eliminate the contamination issue entirely, in most cases a lot of these models are doing much better on graduate-level science questions that require multi-step reasoning across physics, biology and chemistry. They're doing better on math questions that involve symbolic reasoning in algebra, combinatorics and number theory, and not just pattern following and guessing the next word or the next number. So a lot has been done over the last two, two and a half years. And here's a chart, for example, on how the different language models are doing on this Google-Proof Q&A test: in other words, questions whose answers you can't find on Google, or at least not very easily. 
And from mid-2023 to, let's say, the fall of last year, the models were still languishing in the 30 to 50% range in terms of scores. With the advent of some of the reasoning models, those scores have gone up to, let's say, 70 to 90% across most of the models that you look at. Similarly, reasoning models have really helped how these models do on math. The next chart is one on the US Math Olympiad selection exam. Again, the models were languishing with really crappy scores, I don't know if that's a compliance-approved word but whatever, in 2023. Then late last year in the fall, around the same time the models started doing better on the Google-Proof exams, they started doing better on some of these math exams with the advent of the reasoning models, whether it's Claude or Gemini or o3 or things like that. Now those are just exams, right? And exams don't have a ton of practical use in the real world; they're interesting things to look at. What's more interesting to us is how they do on coding, and these models are now being tested on how well they can write and edit code. Here, this test looks at the ability of these models to execute over 200 tasks in multiple coding languages. Some of them are still only getting a little more than half of these exercises correct, but others, like o4-mini and Gemini 2.5 Pro, which is Google's product, are doing much better, in the 70 to 80% range. And remember, there's something important about models like this. If you're dealing with a system like air traffic control, self-driving cars or interpreting people's MRIs, mistakes are catastrophic, and so a model that scores less than perfect is a problem. But for most tasks that a lot of these things are being used for and might be used for, there's the ability to apply both human intervention and other models to come in and clean up mistakes. 
So when the consequences of mistakes are not catastrophic, I think model success scores of less than a hundred percent can be perfectly viable and can still add to productivity. Here's another score, same thing: AI model coding competitions within OpenAI's own universe. When the reasoning models kicked in last year, the scores started to go up substantially. We get into all the details on this. That's enough of that technical stuff, though actually one more thing, because I think this is important too: how complex a task can these models try to tackle? When we first started talking about these models a couple of years ago, we were benchmarking them against just looking up an answer to a question on Wikipedia, whereas now we're asking them to write emails, create websites and analyze data sets. A recent paper in Nature looked at how long these models can stay on track while working through some very complex multi-step problems, and these things have improved a lot, so look there if you want more information. Now, the models still struggle with certain real-world issues. There are certain, I think more valuable, tests that look at things like whether the models can fix bugs or add features in GitHub repositories; so far only Anthropic's Claude product has more than a 50% score on this. And there are other examples as well. There's something called Humanity's Last Exam, where most of the models still do quite poorly. And then I also asked GPT-4 to draw a map of Europe, and it did a hilariously bad job even after I asked it to try to fix it; it labeled the city of London as "Bland." No argument there, but you know, that's a mistake. That's in the appendix of the written piece. 
I think the most important thing is that, other than the coding exercises, none of these benchmarks really have any impact on a chief technology officer that's thinking about enterprise adoption of AI or about driving business impact through enterprise use cases. So that's what we're going to look at next: enough with all of these theoretical exams, what's going on in the real world? Now, to be clear, a lot of these tests and exams and tasks are things that a lot of these models are scoring well on, but only after all of the AI model builders torture their models to do well on them. So we also have to take a look at the hallucination rates that some of the new reasoning models are experiencing in the wild, when they're not working on certain preset exercises. And they are very high. Hallucination right now is a big problem for reasoning models. One possible explanation is that some of them are recursively sampling base models that have single-digit hallucination rates; if you keep sampling multiple times, you're going to end up with a very high hallucination rate. Some people think this is more of an engineering problem than a science problem, and there may be paths around it, but the bottom line is, look at this table: the hallucination rates of OpenAI's suite of reasoning models are roughly in the neighborhood of 50%. Sometimes a model provides broken links; sometimes it even describes interim computation steps that it didn't actually do. I mean, it tells the kinds of lies and falsehoods that my 4-year-olds used to tell, and that's how easily some of them are identified. So I think it's important to understand that some of the improved proficiency we're seeing on the prior charts and pages reflects things the models have been trained to do better at, whereas when they're used on a random, broad basis by the rest of us, by the diaspora of users, we're going to be much more subject to hallucination risks. And this is Goodhart's Law in spades. 
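That recursive-sampling point can be made concrete with a back-of-envelope calculation. This is a hypothetical sketch, not a description of how any lab's reasoning models actually work: it simply assumes each sampled step carries an independent single-digit hallucination rate and compounds them.

```python
def compound_hallucination_rate(per_sample_rate: float, num_samples: int) -> float:
    """Probability that at least one of num_samples independent samples
    contains a hallucination, given a per-sample hallucination rate.
    Independence across samples is a simplifying assumption."""
    return 1.0 - (1.0 - per_sample_rate) ** num_samples

# A single-digit (5%) base-model rate compounds quickly as a reasoning
# model chains more and more samples:
for n in (1, 6, 12, 14):
    print(n, round(compound_hallucination_rate(0.05, n), 2))
```

With a 5% per-sample rate, a dozen chained samples already put the odds of at least one hallucination near 46%, roughly the neighborhood the table describes.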
Goodhart's Law means that once a benchmark becomes widely accepted as a target, it tends to lose its value, because people game and manipulate the outcome. The best example: in colonial India they had a problem with cobras, so they paid people to bring in cobras so they could kill them. Then people started to breed cobras so they could bring them in and get paid, and they ended up with too many cobras. Anyway, it's very important to understand that the hallucination risks for the reasoning models are still very high, and it's still a problem that has to be solved. It also means that in corporate applications, all of those hallucination risks have to be bred out of whatever process those reasoning models are used for. Now, I'm not a huge fan of McKinsey surveys; I think there are questions about rigor and thoroughness and things like that. My favorite study about consultants came out a few years ago and shows that when you hire consultants like McKinsey, the most likely outcome is that six or nine months later you're still hiring consultants like McKinsey. Anyway, that said, they did a survey of about 1,500 companies and asked them: over the next three years, how much do you think you're going to reduce employees? How much are you going to reduce overall costs? How much is your revenue going to go up? To simplify the survey results, the good news is that around 50% of all respondents said they expect employee cost reductions and revenue increases from adoption of generative AI. The bad news is that in almost each case, the most frequent answer among the people who said it was going to help was the smallest amount of help. In other words, the most frequent three-year employee reduction expected was 3 to 10%, rather than 11 to 20% or more than 20%, et cetera. Same story with revenue: the most frequent response was that it'll help by less than 5%. 
That said, it does show that AI adoption is increasing in real-world business cases. And we're getting the same story from a Bain survey that was just completed: over the last year, AI adoption cases have gone up by 50 to 60%. The Census Bureau also does an interesting survey looking at adoption rates by sector over time, and you're starting to see adoption rates of 20 to 30% in some of the sectors where you'd expect to see them. I thought an interesting anecdote, and this is from one of our AI researchers, was a Mexican used car platform that told our researchers they replaced their entire outbound sales team with an AI voice model powered by a generative AI platform. Most of the time, customers can't even tell they're speaking to a voice bot, and the voice bot does a better job converting customers than the human baseline they're comparing it to. A couple of other things. There's been a sharp increase in FDA-approved medical devices that rely on AI; I want to finish up with something about the FDA at the end. And there was an article in the Atlantic that I thought was interesting, which proposes the idea that AI is starting to impact the job market. They show this chart as an example: for the better part of the last 30 years, the overall unemployment rate was higher than the unemployment rate for recent college graduates. Since 2020, which is before some of this AI stuff really kicked in, that gap has been falling, and recent graduates now have higher unemployment rates than the overall market. And what do recent college graduates do? They summarize information, they aggregate data, and they create charts, tables and graphs. If that's what AI is getting better at, and if that's what AI is being used for, that would help explain this. So this is indirect evidence at best. 
Now, last September, when I last did an AI update, I expressed a lot of concern that the hyperscalers were spending a ton of money and that we would pretty soon need to see hard evidence of them starting to get a return on all that investment. At the time, I cited analysis by a guy named David Cahn at Sequoia, where he backed into, using his own assumptions, how much the industry would need to earn every year, assuming certain capital spending and gross margins of the hyperscalers. He got a number like $500 billion a year in annual incremental AI revenue. Now, he assumed a very rapid payback period, but even if you relax that constraint, you still need some very big AI revenue figures for these companies, given that they're spending hundreds of billions of dollars on capex and R&D every year. So what are we seeing? Well, first of all, hyperscaler capital spending and R&D as a share of revenue is starting to creep up again. From 2022 to 2024, for Microsoft, Google and Amazon, those numbers were kind of plateauing at around 25%; in other words, they were spending more, but they were also earning more. Now those numbers are starting to go up, and we're at 30 to 35%. We have to start to watch this, because it seems like capital spending growth is overtaking the overall revenue growth of these businesses. Now, the good news is Microsoft gave us some clues. As far as we can tell, they're the only one of the hyperscalers giving you hard data on how much they're earning from AI: on a trailing one-year basis, it looks like three to three and a half billion dollars, obviously growing substantially in year-on-year terms. And another interesting observation: they said they processed 100 trillion tokens in Q1 2025, 50 trillion of them in March alone. 
Obviously that's a super jargony thing for them to say, but when you translate it into English, based on what we know about the way these models work, it means there's a lot of inference activity going on, not just model training. Inference activity is typically a sign that corporations are adopting AI models and using them in actual workflows, so that was a good sign too. For the other hyperscalers, their AI revenues are buried inside the cloud numbers, and we're starting to see quarterly cloud revenues pick up, particularly for Amazon, Microsoft and Google, although they're kind of flat for Oracle and IBM. So we're seeing some evidence that AI revenues are picking up when we look at the cloud. But at the end of the day, the most important chart is this highly cyclical one: how long can these hyperscalers keep this going? And that's going to be a function of their overall free cash flow margins. Amazon's is still negative, but Meta, Microsoft and Google are still hanging in with 20 to 30% free cash flow margins. As long as that's the case, I think they'll be able to keep these capital spending wars going. But that's the thing we have to watch the most: if you start to see a sustained dip below the 20% level in free cash flow margins for Meta, Microsoft and Google, I think the markets would be very concerned that capital spending is getting way ahead of AI revenue generation. So what happens next? We have a lot of AI going on inside the company, and our people tell me there's too much focus on software developers. They only spend around 30% of their time actually coding, so even if you improve their coding productivity by 50%, you're still only talking about a 15% overall productivity improvement. 
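That last bit of arithmetic, 30% of developer time times a 50% coding speedup, can be written out as a simple ceiling calculation. This is just a sketch of the back-of-envelope logic in the text, not anyone's actual productivity model:

```python
def overall_gain(time_share: float, task_improvement: float) -> float:
    """Upper-bound estimate of whole-job productivity gain when AI improves
    one task: the share of time spent on that task times the improvement."""
    return time_share * task_improvement

# ~30% of a developer's time is coding; a 50% coding improvement
# therefore caps out around a 15% overall gain.
print(f"{overall_gain(0.30, 0.50):.0%}")
```

The same logic explains why gains to tasks that consume more working time can matter more than gains to coding itself.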
And they think the larger gains from generative AI, rather than coding per se, are in things like software maintenance, unit testing, integration testing and performance monitoring. These are harder things to measure, but they think the savings potential is much greater behind the scenes. I think Microsoft and Amazon are likely to accelerate efforts to build their own foundational models; there's a lot you can read about each week in the press on the ongoing divorce between OpenAI and Microsoft. Amazon, Google and Microsoft are trying to manufacture their own GPU-like chips to break Nvidia's stranglehold on the market. And there are a bunch of AI adoption milestones to watch for over the next couple of years in terms of self-driving cars, drones, multimodal AI used in entertainment, personalized AI assistants, things like that. But at the end of the day, looking back to the 1990s, this is the biggest capital spending experiment on record by the tech sector. We're now consistently setting new highs in terms of capital spending and R&D as a share of revenues, so the bottom line is: this thing had better work, and the AI adoption rates are going to have to keep going up pretty soon. I mentioned earlier the AI and machine learning medical devices that have been approved by the FDA. Speaking of the FDA, I don't know if I should be drinking so many Frescas; I have no idea if that's healthy or not. But I did want to mention one thing about the FDA: drug approval rates have fallen by half, at least in Q1 of this year. And I was thinking about that recently when I found out that people immunized for measles, mumps and rubella between 1963 and 1967 received an inactivated version of the vaccine, and it was just that brief period, because the live vaccine wasn't approved, pardon me, until 1967. The problem is that the inactivated version of the vaccine gives you much lower immunity than the live one. 
And some of the studies show that after getting that vaccine, only a quarter of the people still had detectable antibodies at some point later. So the CDC recommends that individuals vaccinated during that period get a new live vaccine. But with certain medical conditions, like one I happen to have, you can't get live attenuated vaccines, because they contain live attenuated viruses and those are not good for people with certain kinds of immune deficiencies. So people like me and others are being negatively impacted by all the people deciding they don't want to get vaccinated anymore for MMR, even though it's 97% effective against the spread of measles. Just to give you a sense, the infectiousness measure of COVID and the flu is something like 1 to 2; for polio and smallpox it's 5 to 7; and for measles it's 12 to 18. And what's going on in the country is that over the last 10 years, nationwide vaccination rates have fallen from 95% to 92%. But there's a whole bunch of states below 90%: Georgia, Colorado, Wisconsin, Alaska, and Idaho has fallen to 80%. In Gaines County, Texas, where a lot of the measles cases have occurred, vaccination rates are 80%, and one school district is below 50%. There was a study recently from Stanford that estimated that measles could become endemic again within two decades given these declines in vaccination. At the same time, instead of consistently messaging the importance of the vaccine, RFK Jr. has directed health agencies to explore potential new treatments for people who get measles, including vitamins and cod liver oil. I think it's a good time for me to stop this podcast right there. Thank you for listening, and we'll see you again next time. Bye.
