
Unknown Host
Is the AI field reaching the limits of improving models by scaling them up? And what happens if bigger no longer means better? That's coming up with AI critic Gary Marcus right after this. Welcome to Big Technology Podcast, a show for cool-headed and nuanced conversation of the tech world and beyond. We're joined today by AI critic Gary Marcus, the author of the book Rebooting AI and of Marcus on AI on Substack. He's here to talk with us about whether the AI industry is hitting the limits of scaling up generative AI models, and what it means if we're truly seeing diminishing returns from making these models bigger. Gary, it's great to see you. Welcome to the show.
Gary Marcus
Thanks for having me.
Unknown Host
So the genesis of this episode is that I did an episode with Mark Chen from OpenAI about GPT-4.5, and you came into my DMs and said, listen, I want to give a rebuttal. Scaling is basically over, and it's not exactly what OpenAI has said. Now, for those who don't know about the scaling laws, the basic idea is that the more compute and data you put into these large language models, the better they're going to get, predictably. Linearly, or really exponentially, was the idea, right? And the context here is that we've now seen almost every research house all but admit that this has hit the point of diminishing returns. Mustafa Suleyman was here, and he pretty much admitted it. Thomas Kurian, CEO of Google Cloud, said that diminishing returns are happening. Yann LeCun has also talked about the fact that you're just not going to see as many returns from AI scaling as you did before. So just describe the context of what we're seeing right now. How big of a deal is it, and what are the implications for the AI industry? Because this is the big question: how much better can these things get? That is the big question with AI today.
Gary Marcus
Well, I have to laugh, because I wrote a paper in 2022 called Deep Learning Is Hitting a Wall, and the whole point of that paper was that scaling was going to run out, that we were going to hit diminishing returns. And everybody in the field went after me, including a lot of the people you mentioned. LeCun did, Elon Musk went after me by name, and Altman did. Altman said something like, give me the strength of a mediocre deep learning skeptic. So people were really pissed when I said that deep learning was going to run out, and it's amazing to me that a bunch of them have now conceded that these scaling laws are not working the way they used to. They're also doing a bit of backpedaling. I can't quite remember the details of that Mark Chen interview, but I think it was a version of backpedaling and redefining things. If you go back to 2022, there were these papers by Jared Kaplan and others at OpenAI that said, look, we can mathematically predict how good a model is going to be from how much data there is. And then there were the so-called Chinchilla scaling laws, and everybody was super excited. Basically, people invested half a trillion dollars assuming that these things were true. They made arguments to their investors: if we put in this much data, we're going to get here. And they all thought that "here" was eventually going to mean AGI. What happened last year is that everybody was disappointed by the results. We got one more iteration of scaling after 2022 that worked really well, and we call that GPT-4, along with all the models that are sort of like it. I wrote that paper around GPT-3, and then we got another iteration of scaling. GPT-3 was scaled up compared to GPT-2, and it was much better. GPT-2 was scaled up compared to GPT-1, and it was much better. So much more data meant much better.
Unknown Host
But what does "much better" mean?
Gary Marcus
Well, one way to think about it is that you didn't need a magnifying glass to see the difference between GPT-2 and the original GPT (we didn't call it GPT-1), and you didn't need a magnifying glass to see the difference between GPT-4 and GPT-3. It was just obviously better. What a lot of people thought is that we would pretty quickly see GPT-5, and a lot of people raced to build it. OpenAI tried to build GPT-5; they had a project called Orion, and it actually failed and eventually got released as GPT-4.5. So what they thought was going to be GPT-5 just didn't meet expectations. Now, they can slap any name on any model they want, and in fact lately nobody understands how they're naming their models. But they haven't felt that any of the models they've worked on since GPT-4 actually deserves the name GPT-5. It didn't meet the performance that these so-called mathematical laws required. And what I said in that paper is that they're not really mathematical laws. They're not physical laws of the universe like gravity. They're just generalizations that held for a little while, the way a baby may double in weight every couple of months early in its life. That doesn't mean that by the time you're 18 years old you're going to weigh 30,000 pounds. So we had this doubling for a while, and then it stopped. We can talk about why, but the reality is that it's not really operative anymore. There have been efforts to misdirect and shift direction, but I think everybody in the industry, quietly or otherwise, has acknowledged that we're not getting the returns we thought we would anymore. And nobody's been able to build a so-called GPT-5-level model. That's a big deal, right? I'm a scientist, originally, and as scientists we have to pay attention to negative results as well as positive results.
When 30 people try the same experiment and it doesn't work, nature is telling you something. Everybody tried the experiment of building models that were 10x the size of GPT-4, hoping to get to something they could call GPT-5, something that was a quantum leap better than GPT-4. They didn't get there. So now they're talking about scaling inference-time compute. That's a different thing. But before we get there...
Unknown Host
I just want to test your theory here. It's not that scaling is over, right? I don't think anyone we're talking about says scaling is over. What they're saying is that if you want to make the model better, and I think that means more intelligent, more conversational, even more personable, you can still do it by scaling. What they do admit, though, is that it takes much, much more compute and much more data to get the same gains you got in previous generations.
Gary Marcus
So let's clarify two things. One is that what people originally talked about with scaling was a mathematically predictable relationship between performance and the amount of data. You can go back and look at the Chinchilla paper, the Jared Kaplan paper, and lots of things that were posted on the internet. There were papers, or T-shirts, saying "scale is all you need." You looked at that T-shirt and it had equations from the Jared Kaplan paper, and it said, here's the exponent. You can fit the equation: if you have this much data, this is the performance you're going to get. And a bunch of models actually seemed to fit that curve, and it was an exponential curve. What's happening now is, yeah, you add more data and you get a little bit better, but you're not fitting that curve anymore. We've fallen off the curve. That's what it really means to say that scaling isn't working anymore. If I drew the curve for you, it was going up and up really fast as a function of how much data you had, or how much compute you had, and it's not going up like that anymore. You added a bunch of compute and you got this much better performance, and that's how people justified running experiments that cost a billion dollars: they said, I know what I'm going to get for the billion dollars. Then they ran the billion-dollar experiments and didn't get what they thought they would. Yeah, you get a little bit better, but that's what diminishing returns means: you're not getting the same bang for your buck that you used to. That's where we are now. Any time you add a little piece of data, the model is going to do better on that piece of data. But the question is whether it generalizes and gives you significant gains across the board. We were seeing that, and we just aren't anymore.
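An aside on the math: the Kaplan-style fits mentioned here are usually written as power laws in compute or data, loss(C) = A·C^(-ALPHA), rather than literal exponentials. The minimal Python sketch below uses constants `A` and `ALPHA` invented purely for illustration (they are not fitted values from any paper) to show that even when such a curve holds exactly, each 10x increase in compute buys a smaller absolute drop in loss. "Falling off the curve" would mean observed results doing worse than even these shrinking predictions.

```python
# Illustrative power-law scaling curve: loss(C) = A * C ** (-ALPHA).
# A and ALPHA are made-up constants, not values fitted in any published paper.
A = 10.0
ALPHA = 0.05


def predicted_loss(compute: float) -> float:
    """Loss the assumed power law predicts for a given compute budget."""
    return A * compute ** (-ALPHA)


def gains_per_10x(start: float, steps: int) -> list[float]:
    """Absolute loss reduction bought by each successive 10x jump in compute."""
    gains = []
    c = start
    for _ in range(steps):
        gains.append(predicted_loss(c) - predicted_loss(c * 10))
        c *= 10
    return gains


if __name__ == "__main__":
    for i, gain in enumerate(gains_per_10x(1e20, 4), start=1):
        print(f"10x jump #{i}: loss falls by {gain:.4f}")
```

Under these assumptions each gain is the previous one multiplied by 10^(-ALPHA), about 0.89, so the return per 10x shrinks by construction; whether real models still track such a curve at all is the empirical question being argued here.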
Unknown Host
So is there still a path for these models to become much more performant? Let's say you supersize these clusters to the point that they are insanely bigger than they were previously. Let's talk about, say, Elon Musk's 1-million-GPU cluster.
Gary Marcus
Well, let's look at what Elon got for his money, right? He built Grok 3, and by his own testimony it was 10 times the size of Grok 2. It's a little better, but it's not night and day. Grok 2 was night and day better than the original Grok. GPT-4 was night and day better than GPT-3. GPT-3 was night and day better than GPT-2. With Grok 3, yeah, you can measure it, you can see that there's some improvement. But for 10x the investment in data and compute, not to mention the energy costs to the environment, it's not 10 times smarter by any reasonable measure. It just isn't.
Unknown Host
Okay. And so this would be the point where I say, well, then this entire AI moment is done. However...
Gary Marcus
Well, this moment. There will be other AI moments, but this one...
Unknown Host
I'm setting it up to say that it's not done, because, as you mentioned, there's test-time compute, which I think is another way of saying reasoning. That's where these models...
Gary Marcus
Well, I'm going to give you a hard time about calling it reasoning, but okay. People do do that.
Unknown Host
But with reasoning, or test-time compute (you'll help me figure out the finer details), what these models are doing is trying to find an answer: they check their progress, decide whether a step is a good one or not, and then take another step, and another. And we've seen that they perform much better when you put those reasoning capabilities on top of these large models, which has enabled the research houses to keep making progress.
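How o1- and o3-style models actually spend their test-time compute is not public, so treat this as a simplification rather than a description of any product. One common pattern is to sample several candidate answers and keep one that passes a check, so a bigger budget means more chances to get it right. The Python sketch assumes a hypothetical `noisy_guess` that is right 40% of the time; the catch is that `check` only works because an arithmetic answer can be verified exactly.

```python
import random


def noisy_guess(a: int, b: int, rng: random.Random) -> int:
    """Hypothetical stand-in for one sampled answer: correct 40% of the time."""
    if rng.random() < 0.4:
        return a * b
    return a * b + rng.randint(1, 9)  # a plausible-looking wrong answer


def check(a: int, b: int, guess: int) -> bool:
    """Verifier: possible here only because multiplication has a checkable answer."""
    return guess == a * b


def solve_with_budget(a: int, b: int, budget: int, rng: random.Random):
    """Sample up to `budget` candidates; return the first verified one and samples used."""
    for used in range(1, budget + 1):
        guess = noisy_guess(a, b, rng)
        if check(a, b, guess):
            return guess, used
    return None, budget


def success_rate(budget: int, trials: int = 2000, seed: int = 0) -> float:
    """Fraction of trials in which 37 x 11 is solved within the sample budget."""
    rng = random.Random(seed)
    solved = sum(solve_with_budget(37, 11, budget, rng)[0] is not None for _ in range(trials))
    return solved / trials


if __name__ == "__main__":
    for budget in (1, 4, 16):
        print(f"budget {budget:>2}: solved {success_rate(budget):.1%} of trials")
```

With independent samples the expected success rate under these assumptions is 1 - 0.6^budget, so extra samples pay off quickly here; the design leans entirely on having a reliable `check`.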
Gary Marcus
I'll give you, well, not really you, but these companies, some pushback on that. It is true that you can build a model that will do better if you put more compute into it, but only to some degree. I'll get to whether it's actually reasoning or not. But it turns out that on some problems you can generate a lot of data in advance, and for those problems, adding more test-time compute seems helpful. There's a paper out this weekend that's calling some of this into question.
Unknown Host
By the way, just to explain to folks, test time is when the model is giving an answer. That's what test time is.
Gary Marcus
That's right. So you have these models now, like o3 and o4, that will sometimes take 30 seconds or five minutes or whatever to answer a question. Sometimes it's absurd: you ask it something like, what's 37 times 11, and it takes 30 seconds. You're like, my calculator could have done that faster. But we'll put aside that absurdity. In some cases the time seems well spent, and sometimes not. If you look carefully, though, the best results for these models are almost always on the same things, which are math and programming. When you look at math and programming, you're looking at domains where it's possible to generate what we call synthetic data, and to generate synthetic data that you know is correct. For example, with multiplication, you can train the model on a bunch of multiplication problems; you can work out the answers in advance, so you can teach the model what it should predict. So on these problems, in what I would call closed domains, where we can verify the data as we create it, meaning we check that the answer we're teaching the model is correct, the models do better. But if you go back and look at the o1 paper, even then you could already see that the gains were not across the board. They reported that on some problems o1 was not better than GPT-4. It's only on other problems, these cut-and-dried problems with synthetic data, that you actually got better performance. I've now seen something like 10 models, and it always seems to be that way. We're still waiting for all the empirical data to come in, but it looks to me like a narrow trick that works in some cases. The amazing thing about GPT-4 is that it was just better than GPT-3 on almost anything you could imagine, and the amazing thing about GPT-3 is that it was better than GPT-2 on almost anything you could imagine. Models like o1 are not systematically better than GPT-4.
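To make the "closed domain" idea concrete, here is a minimal Python sketch of verified synthetic data. It assumes a hypothetical, unreliable answer source (`noisy_candidate`, wrong about 30% of the time) and a simple prompt format invented for this example. Problems are generated programmatically, and a candidate answer is kept for training only if an exact check on the prompt agrees with it. This illustrates the idea; it is not any lab's actual data pipeline.

```python
import random
import re

# Prompt format invented for this sketch.
PROMPT_RE = re.compile(r"What is (\d+) times (\d+)\?")


def verify(prompt: str, answer: str) -> bool:
    """Closed-domain check: recompute the product from the prompt and compare."""
    m = PROMPT_RE.fullmatch(prompt)
    if m is None or not answer.isdigit():
        return False
    return int(answer) == int(m.group(1)) * int(m.group(2))


def noisy_candidate(a: int, b: int, rng: random.Random) -> str:
    """Hypothetical unreliable answer source: off by one about 30% of the time."""
    error = rng.choice([-1, 1]) if rng.random() < 0.3 else 0
    return str(a * b + error)


def build_training_set(n: int, seed: int = 0) -> list[tuple[str, str]]:
    """Generate n problems and keep only the candidate answers that pass verify()."""
    rng = random.Random(seed)
    kept = []
    for _ in range(n):
        a, b = rng.randint(2, 99), rng.randint(2, 99)
        prompt = f"What is {a} times {b}?"
        answer = noisy_candidate(a, b, rng)
        if verify(prompt, answer):
            kept.append((prompt, answer))
    return kept


if __name__ == "__main__":
    data = build_training_set(1000)
    print(f"kept {len(data)} of 1000 candidates")
    print(verify("What is 37 times 11?", "407"))  # True
```

The filter is trustworthy only because `verify` can recompute the right answer from the prompt; for an open-ended legal or medical question there is no comparable function to write.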
They're better in certain use cases, especially ones where you can create data in advance. Now, the reason I wouldn't call them reasoning models, though you're right that many people do, is that I think what they're doing is basically copying patterns of human reasoning. They're getting data about how humans reason about certain things. But the depth of reasoning there is not that great. They still make lots of stupid mistakes all the time. I don't think they have the abstractions that, for example, a logician has when reasoning. So it has the appearance of reasoning, but it's really just mimicry, and there are limits to how far that mimicry goes. I'll give you just one more example: o3 apparently hallucinates more than the models that came before it.
Unknown Host
Which is stunning. How does that happen?
Gary Marcus
I mean, that's a good broader question, which is that our understanding of these models is still remarkably limited. The technical term, or one technical term... well, I was going to give you a different one, which is black box.
Unknown Host
Okay.
Gary Marcus
But they're closely related, those two terms.
Unknown Host
Interpretability: figuring out what's going on in the black box.
Gary Marcus
If you can at all. I'd almost put it another way, which is that the black box...
Unknown Host
The thing in the plane that tells you what actually happened.
Gary Marcus
Well, that's a different thing, right? A black box in a plane is actually a flight recorder that records a lot of data. What we mean in machine learning by a black box is a model where you have the inputs and you have the outputs, and you know how the outputs are calculated, but you don't really understand how the system gets there. In this case, you're doing all this matrix multiplication, and nobody really understands it. So nobody can give you a straightforward answer for why o3 hallucinates more than GPT-4. We can just observe it. That's what happens with black boxes: you empirically observe things and you say, well, it does that, but you don't really know why, and you don't really know how to fix it either. Another example from just the last couple of days: apparently Sam Altman reported that the new model, I forget which one, is stubborn, or what was it? I forget.
Unknown Host
No, it's not stubborn. It's a bro.
Gary Marcus
It's a bro.
Unknown Host
But that's GPT-4o. It just became very fratty.
Gary Marcus
Became very fratty.
Unknown Host
You'd be like, what's going on? Help me with this. And it's like, yo, that's a hell of a good question, bro. And we don't know why this happened. And they rolled it back completely.
Gary Marcus
Yeah, exactly. Or I thought they had only partly rolled it back, or whatever.
Unknown Host
No, no, Sam said that the latest iteration...
Gary Marcus
It's been completely rolled back. So, right, that's what I would call, again, empirical. They tried it out and it didn't work, or it worked in a way that irritated people. And so we don't know in advance. There's a lot of just trying things, because that's how black boxes work. We have some tools, but they're not very strong. The scaling "laws" were empirical guesses about how these models work, and they were true for a little while, which was amazing. And they're not true anymore, which is also amazing in a way. So we don't know what's going to come out of the black boxes.
Unknown Host
Right, okay. But let me now...
Gary Marcus
Sorry, let me come back to one other thing quickly, which is interpretability. That's a very closely related notion. Let's say you look at a GPS navigation system. That's a piece of AI that's very interpretable. You can say: it is plotting this route; you can go this way, you can go that way; this is the function it's maximizing; this is the database it's using; this is how it looks up the data. We don't have any of that in these so-called black box models. We don't really know what database it's consulting. It isn't exactly consulting a database at all, and we don't know how to fix it. And so, you know, Dario Amodei, who's...
Unknown Host
Anthropic's CEO, and he talked about this on the show. You actually praised his interpretability post.
Gary Marcus
That's right.
Unknown Host
For interpretability.
Gary Marcus
I'll be honest, I haven't read the paper yet. I just read the title, so shame on me. But the title was something like On the Desperate Need for Interpretability, or something that captures it. And I think he's right. I've said this myself; in my last book I talked about interpretability being really important. The only difference between Dario and me on this point is this: we both think that we're screwed as a society if we stick with uninterpretable models, but he thinks that LLMs will eventually be interpretable. And his company, to be fair, has done the best work on interpretability of LLMs that I'm aware of. Chris Olah, I think, is brilliant. But they haven't gotten that far. They've gotten further than anybody else, but I don't think we're ever going to get very far into the black box. So I think we need to start over and find different approaches to AI altogether.
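For contrast with the black box, here is the kind of system Marcus means by interpretable, loosely modeled on his GPS example: a toy route planner in Python. The road table and travel times are invented for illustration, and this is not how any particular navigation product works. The point is that the objective (minimize total minutes, i.e. maximize speed), the data (the `ROADS` table), and the lookup are all explicit, so every step of the chosen route can be traced to a specific entry.

```python
import heapq

# Hypothetical road graph: node -> list of (neighbor, travel minutes).
# This explicit table plays the role of the "database" in the GPS example.
ROADS = {
    "home": [("highway", 5), ("main_st", 3)],
    "main_st": [("downtown", 10)],
    "highway": [("downtown", 6)],
    "downtown": [],
}


def shortest_route(graph, start, goal):
    """Dijkstra's algorithm: minimize total travel minutes, the explicit objective."""
    frontier = [(0, start, [start])]
    settled = {}
    while frontier:
        cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return cost, path
        if node in settled and settled[node] <= cost:
            continue
        settled[node] = cost
        for nxt, minutes in graph[node]:
            heapq.heappush(frontier, (cost + minutes, nxt, path + [nxt]))
    return None  # goal unreachable


def explain(graph, path):
    """Trace each leg of the route back to the table entry that justified it."""
    lookup = {(u, v): m for u, edges in graph.items() for v, m in edges}
    return [f"{u} -> {v}: {lookup[(u, v)]} min" for u, v in zip(path, path[1:])]


if __name__ == "__main__":
    cost, path = shortest_route(ROADS, "home", "downtown")
    print(cost, path)
    print(explain(ROADS, path))
```

Here `shortest_route(ROADS, "home", "downtown")` picks home, highway, downtown at 11 minutes, and `explain` lists the two table entries responsible; a language model offers no comparable readout for why it produced a particular sentence.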
Unknown Host
Right. So Gary, if I'm hearing you correctly so far, it's basically that after GPT-4 we haven't made a lot of progress, only a little. But let me push back here. If you think about what it's like using the models that came after GPT-4, they are significantly better. I'll give you one example. I was using o3, this new reasoning model, or test-time model, whatever you want to call it, and I'm in it doing crazy things and it's exceptionally helpful. I put in a photo of myself on a rock climbing wall and asked what's going on, and it was able to look at my form, where my body was, what my posture was, analyze all these things, and give actually helpful coaching tips, which you never would have gotten with GPT-4. Then think about what Claude is doing, the Anthropic bot. I was with some friends last night, and this is what we do for fun: I vibe coded a retirement calculator directly in Claude. It took like 10 minutes. We took a bank statement, got a line graph of the person's balances, a bar graph of their expenses, and a financial plan, and then we coded a retirement calculator based on the data we had there. You also have PhDs who are now adding their unique insights into these models for training. They're basically sitting and writing down what they know, and the model is absorbing it. So we are seeing what I would call vast improvement over the GPT-4 models.
Gary Marcus
There are a couple of different ways to think about that. One is that on a lot of benchmarks there are improvements, but there are also issues of data contamination. Alex Reisner wrote an excellent piece in The Atlantic about data contamination. And we've seen a lot of reports where people say, well, we tried it at my company, and it's not really that much better. So they're better on the benchmarks. Are they better in general? Not so clear. There was a new benchmark released by a company called Vals AI, which the Washington Post wrote about yesterday, where they looked at things like whether a system can pull out a chart based on a series of financial statements, SEC filings, from a bunch of companies. These systems all claim to do it, but accuracy was under 10%, and overall accuracy on this new benchmark was 50%. Were these new models better than GPT-4? Maybe, but they weren't that good. So I think people tend to notice when these systems do well, and they don't notice as much when they do poorly. And although I think there's been some improvement, there has not been the quantum leap that people were expecting. We have not moved past hallucinations, and we have not moved past stupid reasoning errors. If you go back to my 2022 paper, Deep Learning Is Hitting a Wall, I didn't say there'd be no progress at all. What I said is that we were going to have problems with hallucinations, and problems with reasoning and planning, until we have a different architecture in some sense, and I think that's still true. We're still stuck on the same kinds of things. So if you have deep research write a paper, it's going to make up references.
Unknown Host
Okay.
Gary Marcus
It's probably going to make up numbers. Did you actually go back and check? For example, with whatever Grok's version is called (they all have similar names now), Deep Search, Deep Research...
Unknown Host
Yeah, it's all deep research. I'll be convinced that we have AGI when these companies learn how to call deep research something other than deep research. They all use the exact same name. It's really bizarre.
Gary Marcus
So, whichever version Grok has, I asked it, for example, to list all of the major cities west of Denver. To somebody who wasn't paying attention, the result would look super impressive. But because I really wanted to know how well it was working, I checked, and it left out Billings, Montana. So you get a list that looks really good, and then there are errors. This often happens. And then I had a crazy conversation with it after that. I said, what happened to Billings? And it said, well, there was an earthquake there on February 10th or whatever. I looked up the seismological data (I use Google or DuckDuckGo because I want a real source), and there was no earthquake then. I pushed it on that, and it said, well, I'm sorry for the error, and so on. So we're still seeing those kinds of things. We may see them less, but they are still there. I don't doubt that there's been some improvement, but the quantum leap across the board that people were hoping for is not there. The reliability is still not there, and there are still lots of subtle errors that people don't notice. And if you want to talk to me about retirement calculators, there are a lot of those on the web. The easy cases for these systems are the ones where the source code is already out there on the web. Kevin Roose talked about an example where he, quote, vibe coded a system to look in a refrigerator and tell him what recipe to make. But it turns out that app is already out there on the web, and there are demos of it with source code. So if you ask a system to do something that's already been done, that's their sweet spot, and that's always been true of all of these systems: regurgitation. So yeah, they can build the stuff that's out there. But if you want to code things in the real world, you usually want to code something that's new, and the systems have a lot of problems with that.
And another recent study showed that they're good at coding but not good at debugging. Coding is just the tiniest part of the battle, right? The real battle is debugging things and maintaining the code over time, and these systems don't really do that yet.
Unknown Host
But, you know, search has made them more reliable. These bots are able to search the web, and they're now starting to give you lots of links in the actual answer.
Gary Marcus
I still get people sending me examples every day of, you know, it hallucinated these references.
Unknown Host
I'm not saying hallucinations have been solved, but for me, it's an incredible research assistant. When it links out to things and I'm not sure of the figures, I go to the primary sources and start reading.
Gary Marcus
I mean, it reflects well on you that you go to the primary sources. I worry most about people who don't. We've seen countless lawyers, for example, get in trouble using these systems.
Unknown Host
Has it been countless? I just heard of one.
Gary Marcus
Oh, no, no, no. There are many more than that. There are some in the US, some in Canada, and I think there was just one in Europe. It's not really countless; one could sit there and count them. But it's got to be at least a dozen by now.
Unknown Host
All right, I think we can both agree on this: whether this is the end of progress, or near the end of progress, or whether there's a lot more progress to come, there's a real problem of people outsourcing their thinking to these bots.
Gary Marcus
Well, Microsoft did a study, in fact, suggesting that critical thinking was getting worse as a function of using them. That wouldn't be too surprising. We have a whole generation of kids who basically rely on these bots and who don't really know how to look at them critically. In previous years, we were already getting too many kids relying on whatever garbage they found on the web, and chatbots are basically synthesizing the garbage that they find on the web. So we're not really teaching kids critical thinking skills. Nowadays, for many kids, the idea of writing a term paper is: I typed a prompt into ChatGPT, maybe made a couple of edits, and turned it in. You're obviously not learning how to actually think or write that way. I think a lot of these tools are best used in the hands of sophisticated people who understand their limits. Coding has actually been, I think, one of the biggest applications, and that's because coders understand how to debug code. For them, the system is basically just typing and looking stuff up, and if it doesn't work, they can fix it. The really dangerous applications are when somebody asks for medical advice, can't debug it themselves, and something goes wrong.
Unknown Host
Okay, so I'm going to take into consideration everything you've said so far and see if I can get a sense of where you think we're heading. It seems like there was a push to make these models better based purely on scale. That could be things like the 300,000-GPU cluster I think Meta used for Llama 4, or the million-GPU data center Elon's built for Grok. And what you're saying is that this has been pretty much maxed out, like no one's...
Gary Marcus
It's not maxed out, but there are diminishing returns.
Unknown Host
So a point I'm trying to make here is that you don't believe anyone is going to build a bigger GPU data center than that, because if you're seeing diminishing returns from something that costs billions of dollars, it doesn't make sense to invest.
Gary Marcus
Well, wait a second. I'm not saying people are rational. I think people will probably try at least one more time. They'll build things; Elon will probably build something 10 times the size of Grok 3, which will be huge, and it will have a serious impact on the environment and so forth. I just don't...
Unknown Host
It's not just GPUs, though, it's also data, right? How much more data is there?
Gary Marcus
Well, let's come to data separately in a second. I think people will actually try, right? I think Masa has just bankrolled Sam to try. I just don't think they're going to get that much for it. I don't think they'll get zero. There will be tangibly better performance on certain benchmarks and so forth, but I don't think it's going to be wildly impressive, and I don't think it's going to knock down the problems of hallucinations and boneheaded errors.
Unknown Host
So here's what I'm getting at. That's not going to feel much better than what we have today. And it doesn't seem like you believe that reasoning is going to make the bots feel much better than what we have today either.
Gary Marcus
Not the kind of reasoning they're doing.
Unknown Host
No emergence, no emergent coding. So are you basically saying that what we have in AI today is it, that this is where generative AI is going to be for a while?
Gary Marcus
I guess. I mean, look, I put out some predictions in March of last year, on Twitter, that people can look up, and those predictions included that there'd be no GPT-5 that year, or if it came out, it would be disappointing.
Unknown Host
It was supposed to come out in the summer.
Gary Marcus
Well, this was last year. So I said, right, in 2024 we won't see it. And that was a very contrarian prediction at that point. This was a few weeks after people had said, oh, I bet GPT-5 is going to drop right after the Super Bowl. Won't that be amazing? So people really thought it was going to come last year, if you go back and look at what they said on Twitter, etc. And it didn't, and I correctly anticipated that it wouldn't. I also said we were going to have a kind of pileup, with a lot of similar models from a lot of companies. I think I said 7 to 10, which was roughly right. And I said there would be no moat, because everybody's doing the same thing, and that prices were going to go down and we'd have a price war. All of that happened. Now, maybe we get to so-called GPT-5 level this year; it keeps getting pushed back. I don't know if we'll get much further than that without some kind of genuine innovation. I think genuine innovation will come, but I think we're going down the wrong path. Yann LeCun used this notion, how does he say it, that large language models are an off-ramp on the way to AGI. They're not really the right path to AGI. And I agree with him, or you could argue he agrees with me, because I said it for years before he did. But we won't go there. The broader notion is that sometimes we make mistakes in science. I think one of the most interesting ones is that people thought for a long time that genes were made of protein. In the early 20th century, lots of people tried to figure out what protein a gene is made of. It turns out it's not made of protein; it's made of a sticky acid that everybody now knows as DNA. So people spent 15 or 20 years really looking at the wrong hypothesis. I think giant black box LLMs are the wrong hypothesis, but science is self-correcting in the end.
People put another $300 billion into this and it doesn't get the results they want. They'll eventually do something different.
Unknown Host
Right, but what you're forecasting is basically an enormous financial collapse because.
Gary Marcus
That's right. I don't think LLMs will disappear. I think they're useful, but the valuations don't make sense. I mean, I don't see OpenAI being worth $300 billion. And you have to remember that venture capitalists have to, like, 10x to be happy or whatever. Like, I don't see them, you know, IPOing at $3 trillion. I just don't.
Unknown Host
No, it's interesting, because I almost see the OpenAI valuation as the one that makes the most sense, since they have a consumer app. The place where I start to get concerned: if what you're saying is correct, that we're not going to see much more, that we're seeing real diminishing returns from scaling and this is basically where we are, then there's real worry for companies like Nvidia, which has basically risen on the idea of scaling.
Gary Marcus
I mean they're down a third, a.
Unknown Host
Third this year, 2 point something, 2.5 trillion last year.
Gary Marcus
They're a genuinely good company, they have a wonderful ecosystem, they're worth a lot of money. I mean I don't, I don't want to put an exact figure, but I'm not surprised that they fell and I'm not surprised that they're still worth a lot.
Unknown Host
But this is the thing. If this next iteration, the $10 billion that Sam is seemingly going to spend on the next set of GPUs, doesn't produce serious results, that's going to hurt. That will cause a crash in Nvidia, because so much of the company's demand is based on this idea that scaling is going to work.
Gary Marcus
So they have multiple problems, both OpenAI and Nvidia. One is it does look to me like we're hitting diminishing returns. It does not look to me like this inference-time compute trick is really a general solution. It doesn't look like hallucinations are going away. And it does look like everybody has the same magic formula. So everybody's basically doing the same thing: they're building bigger and bigger LLMs. And what happens when everybody's doing the same thing? You get a price war. So DeepSeek came out and OpenAI dropped its prices quite a bit.
Unknown Host
Right.
Gary Marcus
And so, because everybody, I mean, not literally everybody, but, you know, 10 or 20 different companies all basically have the same idea and are trying the same thing, you have a price war. Nobody has a technical moat. OpenAI has a user moat; they have more users.
Unknown Host
And that's the most valuable thing they have. Like, that is more valuable than the API. I would say the API is close to worthless. I don't know if worthless is the right word, but it's not worth very much. ChatGPT is the thing that it really has.
Gary Marcus
It's the brand name that is most valuable.
Unknown Host
I also think it's the best bot right now.
Gary Marcus
It might be. I mean, I think people go back and forth. Some days some people say it's Claude.
Unknown Host
I've been on the Claude train for a long time, and now I'm on ChatGPT.
Gary Marcus
I think what's going to happen is you have leapfrogging, right? But the leaps aren't going to be as big as they were. So GPT-4 was a huge leap over GPT-3. I can't even keep up with the naming scheme, but let's say GPT-4.1, right, is better than Grok 3 or Claude 3.7. Let's just say hypothetically. And so people run to this side of the room, and then, you know, Claude 3.8.1 or whatever will be a little better, and some people will run to that side of the room. But nobody's going to be able to charge that much money, because the advances are going to be smaller. And people start to say, well, you know, I use this one for coding and this one for brainstorming and whatever. But nobody anymore says this one is just dominant, like GPT-4 was just dominant when it came out. There was nothing as good as it for anything. If you wanted this kind of system, you used it. Right? I mean, that's my memory of it. I don't hear ChatGPT, or any of the OpenAI products, I can't even keep up with the names anymore, being referred to in the same kind of hushed tones, like they're just better. And, you know, Google's still in this race and they may undercut on price. Meta is giving stuff away. People are building on it. DeepSeek, I hear, has something new that's going to, you know, be better than ChatGPT. And, you know, maybe it's true, maybe it's not, but we're in this era where the differences between the models are just getting really small.
Unknown Host
I want to ask you when you're going to admit that you were wrong about things or if you ever will.
Gary Marcus
Which things? Which things?
Unknown Host
I think that. But I also realize that the question doesn't really land, because, I just want to say, we spoke the last time you were here. I think you've been on the show two times, once with Blake Lemoine, once one-on-one.
Gary Marcus
Yeah.
Unknown Host
And it's interesting, because I think you're one of the most outspoken AI critics, and you say a lot of the things that we say here on the show, which is that AGI is marketing. And even if we don't hit AGI, there's still a lot to be concerned about, whether that's the BS that people are talking about or these models being used for, you know, nefarious purposes by churning out content. Like, I don't know if you saw this study where the University of Zurich tried to convince people on Reddit using answers from GPT, and it still convinced more people than.
Gary Marcus
This is the new persuasion study. I'm aware of it, but I haven't read it yet.
Unknown Host
So I guess, like, to me it does seem like it's kind of tough to be a critic of LLMs right now because they have been getting so much better. But I don't know, just sort of.
Gary Marcus
Like, I mean, people say, Gary, you're wrong. And I say, well, here are the predictions I actually made. I've actually reviewed them in print, and I've asked people who say that I'm wrong to point out what I said that was wrong. I think that sometimes people confuse my skepticism with other people's skepticism. But I think if you look at the things that I have said in print, they're mostly right. And, you know, Tyler Cowen said, you're wrong about everything. You're always wrong. And I said, Tyler, can you point to something? And he said, well, you've written too much. I can't do it.
Unknown Host
Well, I looked through some of your stuff, and I do think that sometimes it seems like you might have put this enormous burden of proof on the AI industry. Like, you do sometimes pick out everyone who says AGI is coming this year, and you're like, these people are liars. But that being said, like, I think your core people are wrong.
Gary Marcus
I've offered to put up money in a bet with Elon Musk, and I offered criteria. I'll tell you about that. Right, in May 2022, I offered him a $100,000 bet. Later, I upped it to a million dollars. And I put out criteria on Twitter. I said, I'm going to offer these; do these make sense to you? And everybody on Twitter, not everybody, nearly everybody on Twitter at the time, said those were fine. People accuse me of goalpost shifting, but my goalposts are the same, right? The 2014 article in The New Yorker where I talk about a comprehension challenge: I've stuck by that. That is part of my AGI criteria. I made a bet with Miles Brundage on the same criteria, and he actually took the bet, to his credit. But when I put them out in 2022, and this is the important part, everybody was more or less in agreement that those were reasonable criteria. And I said, if you could beat my comprehension challenge, which is to say, you know, watch movies, know when to laugh, understand what's going on; if you could do the same thing for novels; if you could translate math from English into something you could formally verify; if you could go into a random kitchen teleoperating a robot and, you know, make dinner; if you could, what was the other criterion? Oh, write, I think it was, 10,000 lines of bug-free code. I mean, you could do debugging to get there, whatever. You know, okay, if you could do, like, three out of five, we'll call that AGI. And at the time, everybody said, that's fine. Now people are backtracking. Like, Tyler Cowen said o3 is AGI. By what measure?
Unknown Host
I felt that that was kind of a stretch.
Gary Marcus
It was cheesy. And he said the measure was him: it looked like AGI to him. He invoked the classic line about pornography, "I know it when I see it." But people have pointed out lots of problems with o3. I think it's absurd to call o3 AGI.
Unknown Host
I wouldn't call it AGI.
Gary Marcus
So a minute ago you said, Gary, you're wrong. But then you ticked off a bunch of things I'm actually right about.
Unknown Host
I didn't say, Gary, you're wrong. I said, is there a point where you'll admit you're wrong? Like, what I'm.
Gary Marcus
Yes, there is. It's the point at which I'm wrong. So. No, no, no. Let me clarify one other thing, but.
Unknown Host
Let me just say I didn't say that you're wrong. I just asked whether there's a point of advance at which you would say, okay, yeah, I've been wrong about this stuff, because I have listened to some of your.
Gary Marcus
Let me clarify something.
Unknown Host
But I also, right after I said that, I was like, you know, it's kind of like a tough question. And then I explained where I agreed with you.
Gary Marcus
Yeah. So, yeah, that's what happened. So some people take me as saying that AI is impossible, and that's not me. Right. I actually love AI. I want it to work. I just want us to take a different approach. Right. I want us to take a neurosymbolic approach, where we have some elements of classical AI, like explicit knowledge, formal reasoning, and so forth, that people like Hinton have kind of thumbed their noses at, but that Demis Hassabis has used very effectively in AlphaFold. We can get into that if you want. If we get to AGI, the question about whether I'm right or not depends on how we get there. I've made some pretty particular guesses about it, and I have guessed that pure LLMs, pure large language models, will not get us there. So will I concede I'm wrong when we get to AGI that actually works? Depends on how it works.
Unknown Host
Okay. Yeah. And I think it's clear that, I mean, I don't know, we could watch this back in a couple years if.
Gary Marcus
If pure LLMs, if simply another round of scaling, you know, gets us to AGI by the criteria that I laid out, then I will have to concede that I was wrong.
Unknown Host
Okay. All right, I'm going to take a quick break, and then let's come back and talk a little bit more about the current risks and maybe read some of your tweets and have you expand upon them. We'll be back right after this. And we're back here on Big Technology Podcast with AI skeptic Gary Marcus. Gary, let me ask you this. So, you know, one of the things we talked about last time you were here was that AI doesn't have to reach the AGI threshold to be something that we should be concerned about.
Gary Marcus
Absolutely not.
Unknown Host
And a lot of the focus was on hallucinations. You and I both. I think we have a little bit of a diverging opinion on hallucinations. I think they've gotten much better. You think it's still a big problem.
Gary Marcus
Those could both be true, by the way.
Unknown Host
That could both be true. All right, so let's put a pin in that for now. I think where I'm seeing the most concern is virology. We just had a study come out showing that AI is now at PhD level in virology. We had Dan Hendrycks from the Center for AI Safety here, who talked about the fact that AI can now walk virologists through how to create or enhance the function of viruses. And we're starting to see some of these AI programs, like you mentioned, DeepSeek, be available to everybody, be pretty smart, and be released without guardrails or without enough guardrails, especially if they're open source. So what are you worried about here? Is that the core concern, or is there other stuff?
Gary Marcus
I think there are actually multiple worries, and different worries from different architectures, and from architectures used in different ways, and so forth. So dumb AI can be dangerous. If dumb AI is empowered to control things like the electrical grid and it makes a bad decision, that's a risk. Right. If you put a bad driverless car system in, you know, a million cars, a lot of people would die. Right. The main thing that has saved a lot of people from dying in driverless cars is that there aren't that many of them. And so, you know, even though they're not actually super safe at the moment, we restrict where we use them and so forth; we don't put them in situations where they wouldn't be very bright. So dumb AI can cause problems. Super-smart AI could, you know, maybe lock us all in cages if it wanted to. I mean, we have to talk about the likelihood of it wanting to, but there are definitely worries there, and we need to take them seriously. And then you have things that are in between. So, for example, the virology stuff is AI that's not generally all that smart, but it can do certain things, and in the hands of bad actors it can do those things. And I think it is true, either now or soon enough, that these tools can be used to help bad actors create viruses that cause problems. And so I think that's a legitimate worry, even if we don't get to AGI. So dumb AI right now is a problem. Smarter AI, even if it's not AGI, can cause a different set of problems. And, you know, if we ever got to superintelligence, that might open a different can of worms. I mean, you can think of, you know, human beings of different degrees of brightness and with different skills: if they choose to do bad things, they can, you know, cause different kinds of harm.
Unknown Host
And so what's your view on open source then?
Gary Marcus
I worry about it. I do worry about it, because bad actors are using these things already. They're mostly using them for misinformation. Not sure how much biology they're doing, but they will, and they're going to be interested in that. You know, state actors that want to do terrorist kinds of things will do that. I am worried about open sourcing at all. And I think the fact that Meta could basically make that decision for the whole world is not good. I think there should have been much more government oversight; scientists should have contributed more to the discussion. But now those kinds of models are open source. They've been released. We can't put that genie back in the bottle. And over time, and I should have said this earlier, even if the models don't get any better, we will still find new uses for them. And some of those new uses will be positive and some of them will be negative. Right. We're still exploring what these technologies can do, and people are finding, you know, dubious ways to make money and to cause harm, for various reasons and so forth. And so, you know, giving those tools out very broadly has problems. On the other hand, I think what we've learned in the last three years is that the closed companies are not the ethical actors that they once were. So, you know, Google famously said "Don't be evil," and they took that out of their platform. You know, Microsoft was all about AI ethics. And then, you know, when Sydney came out, they were like, we're not taking this away, we're going to stick with it.
Unknown Host
Oh, did they kill Sydney? Sydney was, well, I don't know, the bot that tried to steal Kevin Roose's wife.
Gary Marcus
Yeah, I mean, they reduced what it could do, but they stuck with it in some sense. And, like, OpenAI said that they were, you know, a nonprofit for public benefit. Now they're desperately trying to become a for-profit that is really not particularly interested in public benefit. It's interested in money. And they may become a surveillance company, which I don't think is, because what.
Unknown Host
You're talking about the advertising side?
Gary Marcus
So basically they have a lot of private data, because they have a lot of users and people type in all kinds of stuff, and they may have no choice but to monetize that. And, you know, they've been showing signs of that. They hired Nakasone, who used to be at the NSA. They bought a share in a webcam company, and they recently announced they're trying to build a social media company. They look like they're on a path to sell your data, your very private data, to, you know, whoever they care to.
Unknown Host
It's concerning, because I always used to think that this conversation around Facebook data was a little ridiculous, since I didn't think I was giving that much information to Facebook. But I am giving OpenAI a lot of information.
Gary Marcus
I mean, there are a lot of people that treat it as a therapist.
Unknown Host
Well, that's the number one use, as a therapist. I don't use it as a therapist, but I'm putting a lot of my work into it.
Gary Marcus
I read a great book called Privacy and Power, I'm blanking slightly on the title, by Carissa Véliz, and she had examples in there of people taking data from Grindr and extorting people. Right. Grindr is an app for gay people, if you don't know, and, you know, in some places in our society that's acceptable, and in other places, you know, people don't necessarily want to come out if they're gay, whatever. And so people have been extorting people with data from Grindr. Imagine what they're going to do. You know, people type into ChatGPT their very specific sexual desires, maybe crimes they've committed, maybe crimes they want to commit. You know, we have a political climate where, you know, conspiracy to commit a crime might be treated in a different way than it once was. And so just typing it into ChatGPT might, you know, get somebody deported. Who knows?
Unknown Host
Now I'm freaked out.
Gary Marcus
I wouldn't personally use the system, because the writing is on the wall. I think they make some promises to their business customers, but not to their consumer customers. And that stuff is available for them to do what they want with, and they probably will, because that's how they're going to make money. Here's another way to put it: suppose I'm right about the things I've been arguing, and they can't really get to, you know, the GPT-7-level model that everybody dreamed of. They can't really build AGI, but they're sitting on this incredible treasure chest of data. What are they going to do? Well, if they can't make AGI, they're going to sell that data.
Unknown Host
This is why I always thought like when you take in a lot of money, it's always, you always have to pay that money back in some way and that changes the way you operate.
Gary Marcus
That's right. I mean look at 23andMe. They're out of business and now that data is for sale and who knows what's going to happen with the 23andMe data.
Unknown Host
I hope you're wrong about this one. But, but, well, the history.
Gary Marcus
Exactly. This is.
Unknown Host
I'm not saying you are. I'm just saying I hope you are.
Gary Marcus
Because that would I. I hope I'm.
Unknown Host
Wrong too, but there is a level.
Gary Marcus
Of a lot of things I hope I'm wrong.
Unknown Host
Gary, if the pro. If people got that freaked out about what Facebook was doing with your data, if they overstep, there's going to be a major societal backlash.
Gary Marcus
I think maybe, I mean sometimes people just accommodate to these things. I've been amazed at how willing people are to, you know, give away all that information to Facebook. I don't use it anymore.
Unknown Host
But let me ask you this. You quote-tweeted one of these tweets, so we'll get into a tweet here: "Is the push to optimize AI for user engagement just metric-chasing Silicon Valley brain, or an actual pivot in business model from 'create a post-scarcity society god' to 'create a worse TikTok'?" This is basically what we're talking about: that might be the pivot.
Gary Marcus
Yeah, that's right. I think that was someone else's tweet.
Unknown Host
That I. Yeah, Daniel Litt. And you said I've been basically telling you about this.
Gary Marcus
Yeah, exactly.
Unknown Host
So that's what it is. You also wrote this saying the quiet part out loud. The business model of Gen AI will be surveillance and hyper targeted ads just like it has been for social media.
Gary Marcus
That's right. We were just talking about that. And what I was quote-tweeting there was something from Aravind Srinivas, if I'm pronouncing his name correctly, who's the CEO of Perplexity. And I said he's saying the quiet part out loud. He basically said we're going to use this stuff to hyper-target you.
Unknown Host
You also said that companies like Johnson and Johnson will finally realize that gen AI was not going to deliver on its promises. Have there been companies that have pulled back? Like, you're using Johnson and Johnson as an example.
Gary Marcus
That was based on a Wall Street Journal thing. And I may have failed to include the link because of Elon Musk's crazy, crazy notions around.
Unknown Host
You got to put the links in the Elon.
Gary Marcus
You got to put the links. Whatever else.
Unknown Host
Acceptable.
Gary Marcus
Yes, that's right. So anyway, I was alluding to a Wall Street Journal report that had just come out, which showed that J&J had basically said, in so many words, and I'll paraphrase it, that they tried gen AI, generative AI, on a lot of different things, and a few of them worked and a lot of them didn't. And they were going to stick to the ones that did, like customer service, and maybe not do some of the others. I mean, you have to go back, you know, a year and a half in history, to when people thought gen AI was going to do everything that an employee was able to do, basically. And I think what J&J and a bunch of companies have found out is that's not really true. You know, they can do a bunch of things that employees do, but they can't typically do everything that a single employee does. And, you know, they're reasonably good at triaging customer service, and they're not necessarily good at creating, say, a careful financial projection.
Unknown Host
Okay, so, Gary, we have like five minutes left. You said something, I think in the first half, about the path that you think needs to be taken to AGI. Can you explain what that is as basically as you can, to make it simple to understand for anyone who's not caught up with the systems that you spoke about?
Gary Marcus
Sure. So a lot of people will have read Danny Kahneman's book Thinking, Fast and Slow, and there he talked about System 1 and System 2 cognition. System 1 was fast and automatic, reflexive. System 2 was more deliberate, more like reasoning. I would argue that the neural networks that power generative AI are basically like System 1 cognition. They're fast, they're automatic, they're statistically driven, but they're also error-prone. They're not really deliberative. They can't sanity-check their own work. And I would say we've done System 1 pretty well. But System 2 is more like classical AI, where you can explicitly represent knowledge and reason over it. It looks more like computer programming. And these two schools have both been around since the 1940s, but they've been very separate, for what I think are sociological and economic reasons. Either you work on one or you work on the other. People fight for graduate students and fight for grants and stuff like that. So there's been a great deal of hostility between the two, but the reality is they kind of complement each other. Neither of them has worked on its own. So classical AI failed. Right. People built all these expert systems, but there were always these exceptions and they weren't really robust. You'd pay graduate students to patch up the exceptions. Now we have these new systems. They're not really robust either, which is why OpenAI is paying Kenyans and PhD students and so forth to kind of fix the errors. The advantage of System 1 is it learns very well from data. The disadvantage is it's not very accurate. Sorry, not very abstract. No, I should have said that slightly differently. The large language models, and that kind of approach, transformers, are very good at learning, but they're not very good at abstraction. You can give them billions of examples and they still never really understand what multiplication is. And they certainly never get any other abstract concept.
Whereas the classical approach is great at things like multiplication. You write a calculator and it never makes a mistake. But it doesn't have the same broad coverage, and it can't learn new things. You can wire multiplication in, but how do you learn something new? The classical approaches have had trouble with that, and so I think we need to bring them together. This is what I call neurosymbolic AI, and it's really what I've been lobbying for for decades. I think it was hard to raise money to do that in the last few years, because everybody was obsessed with generative AI. But now that they're seeing the diminishing returns, I think investors are more open to trying alternatives. And also, AlphaFold is actually a neurosymbolic model, and it's probably the best thing that AI has ever done.
Unknown Host
And so decoding proteins. Protein folding.
Gary Marcus
Yeah, figuring out the three-dimensional structure of a protein from a list of its amino acids.
Unknown Host
And so are you going to raise money to try to do this?
Gary Marcus
I'm very interested in that, let's put it that way.
Unknown Host
Masa Son, if you want to make use of your money. No, I'm kidding. Are you talking to him?
Gary Marcus
Not at this particular moment.
Unknown Host
Okay, Masa, if you're watching. I don't know, trying to help. Okay, great. Well, Gary, can you shout out where to find your Substack? So if anybody wants to read your longer work on the state of AI, where should they go?
Gary Marcus
Sure. So people might want to read my last two books, by the way: Taming Silicon Valley, which is really about how to regulate AI, and Rebooting AI, which was 2019. It's a little bit old, but I think it still anticipates a lot of the problems around common sense and world models that we're still facing today. And then for almost daily updates, I write a Substack, which is free, although you can pay if you'd like to support me. And that's at garymarcus.substack.com.
Unknown Host
Okay. Well, I'm a subscriber. Gary, great to have you on the program. Thanks so much for coming.
Gary Marcus
Thanks a lot for having me again. Yet again.
Unknown Host
Yet again, yet again. Well, we'll keep doing it. It's always nice to hear your perspective on the world of AI, so I.
Gary Marcus
Always enjoy our conversations. Thanks for having me.
Unknown Host
Yes, same here. All right, everybody, thank you for listening. We'll be back on Friday, breaking down the week's news. Until then, we'll see you next time on Big Technology Podcast.
Big Technology Podcast: "Is AI Scaling Dead? — With Gary Marcus"
Release Date: May 7, 2025
Host: Alex Kantrowitz
Guest: Gary Marcus, AI Critic and Author of "Rebooting AI"
In the episode titled "Is AI Scaling Dead?," host Alex Kantrowitz engages in a thought-provoking discussion with Gary Marcus, a prominent AI critic and author of Rebooting AI. The conversation centers on the current trajectory of artificial intelligence (AI) development, particularly focusing on the scalability of large language models (LLMs) like those developed by OpenAI.
Gary Marcus begins by referencing his 2022 paper, Deep Learning Is Hitting a Wall, predicting that the progress in deep learning through scaling would encounter diminishing returns. He notes, “It's amazing to me that a bunch of people have conceded that these scaling laws are not working the way they used to be” (04:12). Marcus argues that the once-promising scaling laws, which suggested exponential gains with increased compute and data, are no longer yielding the same improvements. He cites admissions from industry leaders like Thomas Kurian of Google Cloud and Yann LeCun, acknowledging that the anticipated performance boosts from scaling large models are plateauing.
The discussion shifts to whether substantial enhancements are still possible. Marcus explains that while adding more data and compute can yield incremental improvements, “you're not fitting that curve anymore” (02:26). He highlights that recent iterations, such as GPT-4, provided noticeable improvements over GPT-3, but subsequent efforts like GPT-4.5 and Project Orion failed to meet the lofty expectations set for GPT-5. Marcus emphasizes that these scaling attempts are no longer the reliable pathways to significant advancements.
Marcus delves deeper into the practical outcomes of scaling, using Elon Musk’s Grok 3 as a case study. He states, “Grok 3 is like, yeah, you can measure it, you can see that there's some performance. But for 10x the investment of data compute... it just isn't” (08:51). This example underscores the inefficiency and limited gains from large-scale investments. Additionally, Marcus critiques the naming conventions of AI models, suggesting a lack of meaningful differentiation as companies struggle to justify incremental improvements.
A significant portion of the conversation addresses the "black box" nature of advanced AI models. Marcus explains, “We don't really understand how the system gets there” (14:02), highlighting the challenges in interpreting the internal workings of LLMs. This opacity makes it difficult to diagnose why models like O3 hallucinate more than previous versions or behave unexpectedly, such as adopting a "fratty" persona. Marcus advocates for greater interpretability in AI, emphasizing its necessity for reliability and safety.
Continuing on reliability, Marcus acknowledges minor improvements but points out persistent issues like hallucinations and reasoning errors. He notes, “We have not moved past hallucinations, we have not moved past stupid reasoning errors” (20:23). Marcus argues that while models may perform better on specific benchmarks, their overall reliability remains questionable. He cites examples where AI systems fail at tasks that require genuine understanding, such as accurate coding and debugging, reinforcing his stance that mere scaling does not equate to true intelligence or functionality.
The conversation transitions to the economic ramifications of hitting scaling limits. Marcus predicts, “I don't see OpenAI being worth $300 billion... they have multiple problems, both OpenAI and Nvidia” (29:00). He foresees a potential financial collapse for companies heavily reliant on scaling strategies, like Nvidia, which is pivotal to the AI ecosystem. Marcus warns that without significant innovation beyond scaling, the inflated valuations and heavy investments in AI infrastructure may not be sustainable.
When questioned about admitting potential inaccuracies in his predictions, Marcus maintains his position by referencing his track record. He states, “I said, if you could beat my comprehension challenge... if you could do like three out of five, we'll call that AGI” (35:16). Despite some industry pushback, Marcus remains steadfast, arguing that until AI demonstrates genuine comprehension and reasoning capabilities as per his criteria, the claims of achieving Artificial General Intelligence (AGI) remain unfounded.
Marcus broadens the discussion to the inherent risks posed by current AI technologies, even without reaching AGI. He categorizes AI risks into three levels:
He expresses deep concerns about the open-sourcing of AI models, arguing that it empowers bad actors and exacerbates the misuse of AI. “I'm very worried about open sourcing at all” (41:53). Marcus criticizes companies like Meta for releasing powerful AI tools without adequate oversight, fearing misuse in areas like misinformation, privacy invasion, and even facilitating criminal activities.
In the final segment, Marcus outlines his vision for achieving AGI through a neurosymbolic approach, which combines the statistical power of neural networks with the structured reasoning of classical AI. Drawing on Daniel Kahneman's Thinking, Fast and Slow, he likens current AI to System 1, the fast, automatic mode of cognition, and advocates integrating System 2-style deliberative reasoning into AI systems. “We need to bring them together. And this is what I call neurosymbolic AI” (49:45).
Marcus believes that merging these approaches can overcome the limitations of purely large-scale models, enhancing both their learning capabilities and their reasoning accuracy. He cites AlphaFold as a successful example of neurosymbolic AI, demonstrating its potential to revolutionize fields like biology.
As the episode wraps up, Marcus encourages listeners to explore his work through his Substack, garymarcus.substack.com, and his books, Taming Silicon Valley and Rebooting AI. He reiterates his commitment to advocating for more interpretable and reliable AI systems, emphasizing the urgent need to transform how AI research and development are done.
Notable Quotes:
Gary Marcus provides a critical lens on the current state of AI, challenging the industry's heavy reliance on scaling large models and highlighting the pressing need for alternative approaches to achieve true intelligence and reliability. His insights serve as a cautionary reminder of the limitations and ethical considerations that must guide the future of AI development.
For more in-depth analysis and updates, listeners are encouraged to subscribe to Gary Marcus’s Substack at garymarcus.substack.com and check out his books Taming Silicon Valley and Rebooting AI.