The Science of Making Truthful AI - Azeem Azhar's Exponential View

Summary

Episode Overview

Podcast: Azeem Azhar's Exponential View
Episode: The Science of Making Truthful AI
Date: February 7, 2024
Host: Azeem Azhar
Guest: Richard Socher, AI researcher, entrepreneur, founder/CEO of You.com, former Chief Scientist at Salesforce

Theme:
Azeem Azhar and Richard Socher explore the nature of intelligence, the historical roots and current frontiers of artificial intelligence, the challenges of building truthful AI, and the future of AI architectures and applications. Through candid discussion, the episode cuts through AI hype, addresses the realities and misconceptions of large language models (LLMs), and unpacks the implications for the future of technology, industry, and society.

Key Discussion Points & Insights

1. What is Intelligence?

Historical Context: Socher discusses the roots of modern AI, referencing influential work on neural nets for speech recognition and the watershed moment of the ImageNet dataset, which enabled significant breakthroughs in computer vision.
- "There's one event that happened even before ImageNet and that was George Dahl and Geoff Hinton actually working on speech recognition and neural nets... Speech recognition actually is best done with a neural network." (03:53, Socher)
Multifaceted Intelligence: Intelligence is not just one thing—it includes motor, visual, language, and higher-level reasoning capabilities. Language and the ability to transmit knowledge across generations is highlighted as a unique human feature.
- "Language enables you to have collective intelligence, too, and historical intelligence and memory." (06:42, Socher)
Engineering Challenge: From an engineering perspective, the challenge is that intelligence is not well-defined, making it difficult to build.
- "It's kind of a disaster from an engineering perspective though, isn't it? ... There's any amount of terrible enterprise software that has been built from that kind of specification." (09:39, Azhar)

2. From Replicating Human Skills to Building New Capabilities

Beyond Human Intelligence: AI need not strictly mimic humans; it can exceed in areas humans did not evolve for, like protein folding or analyzing massive datasets.
- "AI can do things that no human has ever evolved to be able to do..." (10:13, Socher)
Shift in Developer Mindset: LLMs and modern AI models rely less on human-crafted rules and more on learning directly from labeled data.
- "As a developer, you don't think about your own skills and logic that much anymore. You think about what does the neural network need? How do I clean my data..." (13:03, Socher)

3. Limitations and Capabilities of Current AI

Zero-shot Capability & Biological Analogy: Some animal intelligence is genetically 'pre-trained', a useful analogy for the design of neural architectures.
- "Biology figured out a way to store that learning in a genetic sequence such that when that brain gets instantiated and evolves in the womb, then it already has a set of knowledge..." (15:34, Socher)
Surprises in LLM Progress: Progress in LLMs is impressive but not wholly unexpected; abstraction and generalization have predecessors in word embeddings and earlier work on prompt engineering.
- "My dream had always been to build a single model for all of natural language processing. And so in 2018 we invented decaNLP. ... we invented prompt engineering." (17:49, Socher)

4. Do LLMs Invent or Simply Interpolate?

Mathematical Analogy & Extrapolation: Socher clarifies misconceptions—LLMs can extrapolate within a conceptual 'hypercube' and combine new concepts, not just interpolate between existing data.
- "A big misconception people have is that these models can only interpolate... But that's actually not true." (21:55, Socher)
- "I have images of black cats and I have images of yellow cars. Now the model ... will eventually be able to... create a yellow cat, even though it has never seen a yellow cat in the training data per se." (22:12, Socher)

5. How Far Can LLMs Go? The Limits of Scaling

Scale vs. New Architectures: The most significant upcoming leap may be in having LLMs write and execute code—enabling them to solve problems more reliably. Further scaling (adding more data/parameters) may not be as fruitful as architectural advances.
- "The biggest change for large language models will be their ability to program... You can actually force [an LLM] ... to translate [a problem] into code, run it, and give me an answer." (24:50, Socher)
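The "translate into code, run it, and give me an answer" loop can be sketched roughly as follows. This is a minimal illustration, not a description of any real product: `fake_model_translate` is a hypothetical stand-in for a language model call with a hard-coded answer, and `run_arithmetic` is an assumed, deliberately tiny executor that accepts only plain arithmetic.

```python
import ast
import operator


def fake_model_translate(question: str) -> str:
    """Hypothetical stand-in for an LLM that turns a question into code.

    A real system would call a model here; this stub returns a canned
    expression so the example stays self-contained.
    """
    canned = {
        "What is 1234 times 5678 plus 91?": "1234 * 5678 + 91",
    }
    return canned[question]


# Only these binary operators are allowed in generated expressions.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def run_arithmetic(expr: str):
    """Evaluate a plain arithmetic expression by walking its syntax tree."""
    def walk(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        raise ValueError("unsupported expression")

    return walk(ast.parse(expr, mode="eval").body)


question = "What is 1234 times 5678 plus 91?"
code = fake_model_translate(question)   # step 1: question -> code
answer = run_arithmetic(code)           # step 2: run the code
print(f"{code} = {answer}")             # step 3: report the computed answer
```

The point of the design is that the number comes from executing the expression rather than from the model's next-token guess, which is what makes the result checkable.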

6. Building Truthful AI: Engineering, Limitations, and Citations

LLMs as CPUs & Orchestration Analogy: Socher likens LLMs to CPUs—powerful general engines which need to interact with memory, storage, the internet, and other specialized components.
- "It's useful ... to think of the LM as the CPU of a computer. ... But ... it needs RAM ... a hard drive ... an Internet connection ..." (27:43, Socher)
The Challenge of Truthfulness: Hallucination (making up information) is unavoidable; combining LLMs with real-time search, up-to-date data, and reliable citation is critical.
- "Generative AI is only useful if the artifacts it produces are quick to verify, but would take you a long time to create yourself." (31:53, Socher)
- "In large language models ... you can't verify if that answer is actually correct or not, and you don't know where the answer came from ... You need that verification." (33:15, Socher)
Engineering Citations: It’s nontrivial for an AI to cite sources accurately; not just a software challenge, but an AI problem itself.
- "The citation logic itself is also a hard AI problem. When do you use which resource for your facts?" (34:12, Socher)

7. The Future Industry Structure: Commoditization and Differentiation

Commoditization of LLMs & AI: Open source models (like Mixtral) may outperform proprietary LLMs. The differentiators for companies will increasingly be design, engineering, and user experience—standard startup advantages.
- "There's a good chance that large language models and maybe even AI will be commoditized, will be not the big differentiator. ... The main differentiator is in standard company startup tech stuff like marketing design, engineering..." (36:40, Socher)
Industry Analogy: Azhar compares LLMs to traditional databases or engines—important but only part of a broader, composable stack, with much of the value in application, not just the core tech.

8. Regulation, Open Source, and the 'P(Doom)' Debate

Regulatory Capture & Cynicism: Azhar suggests the narrative of dangerous AI may be a rational industry strategy to shape regulation and maintain first-mover advantage—a point Socher acknowledges (somewhat wryly).
- "You might want to say your technology is so powerful that it's dangerous ... knowing that the government doesn't have the capability to assess that question..." (38:16, Azhar)
- "We call it regulatory capture in Silicon Valley ... but it seems like the world will not adhere to that. Open source models are out." (39:03, Socher)
AI Existential Risk ('P(Doom)'): Socher is firmly skeptical of apocalyptic AI scenarios, dismissing them as speculative science fiction not grounded in reality or real-world data. He instead favors pragmatic focus on real, already existing issues (bias, fairness, real-world applications), while supporting reasonable, targeted regulation where truly needed.
- "It's a lot of cool sci fi scenarios, they're fun. I would probably watch the action movie ... but we got to keep it real and we can look at real problems, right? AI does have real problems. It will pick up biases ..." (43:18, Socher)

Notable Quotes and Memorable Moments

Azhar on software engineering ambiguity:
"It's kind of a disaster from an engineering perspective though, isn't it? ... Any amount of terrible enterprise software has been built from that kind of specification." (09:39)
Socher on AI and evolution:
"AI can do things that no human has ever evolved to be able to do..." (10:13)
Socher on prompt engineering and LLM evolution:
"My dream had always been to build a single model for all of natural language processing... And we invented prompt engineering." (17:49)
Socher on LLMs and conceptual novelty:
"The models can extrapolate a little bit, but not too far out...The most exciting stuff is where humans can't extrapolate anything because we're not evolved to look at, for instance, protein sequences or millions and millions of weather samples..." (22:12)
Socher on programs as the next leap:
"The biggest change for large language models will be their ability to program... That I think will give them so much more fuel for the next...few years." (24:50)
Socher on citations and truthfulness:
"The citation logic itself is also a hard AI problem. When do you use which resource for your facts?" (34:12)
Socher on the future of AI as commodity:
"There's a good chance that large language models...will be commoditized, will be not the big differentiator. ...The main differentiator is ... marketing design, engineering..." (36:40)
Socher on AI existential risk hysteria:
"A lot of cool sci fi scenarios, they're fun. I would probably watch the action movie that comes out...But we got to keep it real and we can look at real problems, right? AI does have real problems. It will pick up biases..." (43:18)

Timestamps for Key Segments

| Timestamp | Segment |
|-----------|---------|
| 02:15 | Defining Intelligence |
| 03:53 | The historical roots of deep learning |
| 06:42 | What is intelligence, from a brain perspective? |
| 10:13 | Human vs. Artificial intelligence and developer mindset shift |
| 13:03 | Sentiment analysis, feature engineering vs. neural networks |
| 15:34 | Biological pre-training, architecture in animals and AI |
| 17:49 | The surprising progress of LLMs, prompt engineering |
| 21:55 | Limits and capabilities: do LLMs only interpolate? |
| 24:50 | The biggest near-term leap: LLMs writing and executing code |
| 27:43 | Architecture analogy: LLMs as CPUs, orchestration & engineering challenges |
| 31:53 | Making generative AI truthful and verifiable; citation engineering |
| 36:40 | Will LLMs be commoditized? Open source vs. proprietary future |
| 38:16 | Regulatory capture, strategic narratives, open source's unstoppable rise |
| 39:52 | AI existential risk ('P(Doom)'), science fiction vs. pragmatic issues |
| 43:18 | Real-world AI risks; skepticism of sci-fi doom scenarios |

Conclusion

This episode provides a candid, accessible yet deeply informed tour through pressing questions in AI: what intelligence is, how models really work, whether LLMs are truly creative or just statistical parrots, the engineering realities behind 'truthful AI,' and the likely shape of the emerging AI industry. Richard Socher draws on his rare dual experience as both leading academic and entrepreneur to demystify the field, foregrounding scientific realism and product pragmatism over hype or fearmongering.
For listeners seeking to understand both foundational AI concepts and their real-world applications (and limits), this episode is indispensable.


Transcript

[00:06]
Azeem Azhar
Hi, I'm Azeem Azhar, founder of Exponential View and your host on the Exponential View podcast. When ChatGPT launched back in November 2022, it became the fastest growing consumer product ever, and it catapulted artificial intelligence to the top of business priorities. It's a vivid reminder of the transformative potential of the technology. And like many of you, I've woven generative AI into the fabric of my daily work. It's indispensable for my research and analysis, and I know there's a sense of urgency out there. In my conversations with industry leaders, the common thread is that urgency. How do they bring clarity to this fast moving, noisy arena? What is real and what isn't? What, in short, matters? If you follow my newsletter, Exponential View, you'll know that we've done a lot of work in the past year equipping our members to understand the strengths and limitations of this technology and how it might progress. We've helped them understand how they can apply it to their careers and to their teams and what it means for their organizations. And that's what we're going to do here on this podcast. Once a week, I'll bring you a conversation from the frontiers of AI to help you cut through that noise. We record each conversation in depth for 60 to 90 minutes. But you'll hear the most vital parts distilled for clarity and impact on this podcast. If you want to listen to the full unedited conversations as soon as they're available, head to exponentialview.co. In today's conversation, I speak with Richard Socher. He's an AI researcher and founder. As a scientist, his papers have been cited more than 150,000 times. He's also been a successful entrepreneur. He built one company, which was acquired by Salesforce, and then he became chief scientist at that behemoth software firm. And in 2020, he founded You.com. It's a chat search assistant, one of what could be a new category of consumer Internet products. So we discuss the research behind his startup and his journey to founding it. But I start our conversation with a more expansive question, one which he is qualified to explore.
[02:15]
Azeem Azhar
That is: what is intelligence?
[02:18]
Narrator/Host
We recorded the conversation for over an hour. My listeners here have an edited, shorter version. If you want to listen to the whole conversation, head over to www.exponentialview.co. Enjoy.
[02:30]
Azeem Azhar
Richard, welcome to the show.
[02:32]
Richard Socher
Thanks for having me, Azeem.
[02:34]
Azeem Azhar
Well, 168,000 citations. I mean, that is. That's quite something. I mean, is your family proud of that?
[02:44]
Richard Socher
Actually, it's funny that you say that, but yes, my dad especially is very proud of that. He even brought it up in his wedding speech.
[02:53]
Azeem Azhar
Right, right, his wedding speech at your wedding.
[02:58]
Richard Socher
Correct? Yeah, yeah.
[02:59]
Azeem Azhar
Wonderful, wonderful. Just to really, you know, put the pressure on your partner as to the expectations that they, they have to achieve. Right.
[03:10]
Richard Socher
Mostly for fun, you know. Yeah, it is. But, you know, he was also an academic for a few years and understands that that is not very common.
[03:20]
Azeem Azhar
That stems mostly, I guess, from the paper that in a way started this all off, which was more than a decade ago, the ImageNet paper with Fei-Fei Li, who's also of course been one of my guests. I've spoken to her a few times. And that was the critical bit of kindling that I suppose kicked off the deep learning wave. Do you think that's right? If we look historically, is that a reasonable place to start?
[03:53]
Richard Socher
So I think there's one event that happened even before ImageNet, and that was George Dahl and Geoff Hinton actually working on speech recognition and neural nets. There were still some probabilistic pre-training models in there, but that was sort of the first time where people said, wow, if we have more training data, speech recognition actually is best done with a neural network. And then the ImageNet wave came. Of course, ImageNet was the data set. Alex Krizhevsky, Hinton again, and Ilya Sutskever, you know, actually using that data set and training a large convolutional neural net, that was the watershed moment, I think, for most to understand. Wow. It was enabled by having this data set, so it's a necessary condition for that success. But of course the model is absolutely crucial. And then when you look at my second most highly cited paper, it's a word vector paper, and word vectors were kind of the necessary ingredient to get natural language processing into the neural network field as well. Because speech is somewhat straightforwardly put into a neural network. Images are very easy to put into a neural network. Neural networks want numbers as inputs. Think of a function f(x) = x squared or something, right? x is the input and you get the function value out, like x squared. A neural net is a much more complex function, and it takes not just one number, but often thousands or millions of numbers that are fed into the neural network, and then you get some output. And words aren't necessarily a list of numbers. And so having a word as a vector was a very crucial moment. And of course there are other ways you can put words into vectors. Word2vec is the other famous word vector. But those two papers kind of helped everyone to start using neural nets for natural language processing, too. And that's sort of most of the rest of my citations.
[06:04]
Azeem Azhar
Yeah, I mean, they're pretty foundational. Now, there's a key phrase that you said which was if we had enough data. And we're going to return to the question of data during our conversation, but maybe let's zoom out a little bit. We talk a lot about the artificial intelligence wave, the artificial intelligence boom. AI is the hottest domain name that you can find these days. But we often skirt over what we mean by the I in that, the intelligence. So what is intelligence?
[06:42]
Richard Socher
That is a great question. I'll try to keep it short because we could talk about that for hours. But, you know, I think there are different ways of looking at intelligence. And the most obvious one, which comes naturally to a lot of people, is you look at the brain, and then we can look at what the brain can do. And one obvious thing is it can move. Well, not the brain itself, but it can trigger movement in physical bodies. And so a lot of people look at robotics as an artificial version of the motor intelligence that we see in animals and humans. Then the brain also helps us understand visual input. So we look at computer vision in AI, and you look at visual intelligence, the visual cortex, which takes up quite a large part of the brain. And then we can look at natural language. And that, of course, I think, is the most interesting manifestation of intelligence. And it is certainly the one that sets us apart the most from animals. Animals obviously have, in some cases, even better motor control. Chimpanzees have very fast motor control and great visual inputs. But natural language is kind of what we have the most of, much more sophisticated than any other animal, as far as we know. Maybe there are some questions around whales, but it's not helpful to not have opposable thumbs to create writing and so on. And that's the interesting thing about language. Eventually, language enables you to have collective intelligence, too, and historical intelligence and memory. Right. You can now learn from lessons of other people you've never met or seen or could talk to. No one would remember them, but you read their text from thousands of years ago. And language also connects thought. And then you can talk about visual inputs, you can talk about motor inputs and outputs. And so I think language is the biggest one. And then, of course, there are sort of these even higher levels of intelligence, like planning and logical reasoning and mathematical reasoning.
And the field of AI has often made the mistake of thinking, well, if we solve those hardest bits, the rest will be easy. But it turns out it was actually the opposite: logic, math and so on are already easier for a computer.
[09:00]
Azeem Azhar
It's interesting, but in your discussion of it there, it reminds me a little bit of Justice Potter Stewart discussing obscenity and obscene material. And he said, I'll know it when I see it. And you know, from an engineering perspective, it's. Well, let's say this from a human perspective, from a perspective of us being sort of squishy biological things with feelings and consideration and empathy, it's quite nice to have this idea that there are, you know, lots of different types of intelligence and we can't quite put our finger on it. I mean, it makes the world more, more livable and we can be more accepting of others.
[09:39]
Azeem Azhar
It's kind of a disaster from an engineering perspective though, isn't it? It's like, you know, build this thing. It kind of is a bit like this, but it might be a little bit like this, but, you know, you know what I mean? I mean, there's any amount of terrible enterprise software that has been built from that kind of specification. So how do you match what you described with what an engineer needs to try to build an intelligence? Or is it that in reality, when we take away this sort of marketing gloss, what we're building is kind of great software, but we're not really building intelligences?
[10:14]
Richard Socher
It's a great question. It also is connected to the second bit of my previous answer, which is I think most of the time we looked at human intelligence and then we tried to replicate that in artificial intelligence. But of course, AI can do things that no human has ever evolved to be able to do. And so we can take scientific data, we can take protein sequences, we can take weather data from millions and millions of samples, and then try to do better weather prediction than a human could ever do, better amino acid and protein prediction than any human could ever do. And that's kind of the interesting point where the analogy breaks. You know, we can understand the concept of intelligence without having to replicate it exactly the way a human does. You know, humans have all this evolutionary baggage of zero-sum games, of having to hunt and mate. And, you know, there's always competition. AI doesn't have to evolve in that pattern. And what's interesting is in recent years, and that started sort of around 2010 with some neural network papers in natural language processing, but also in other places like computer vision, of course, earlier, and speech even before that, we don't ask the programmer anymore to try to replicate either the brain or even their own expertise about the particular problem. So, you know, an early example in NLP from my, I think, third or fourth most cited paper was sentiment analysis. In the past, sentiment analysis was done by having linguists and experts sit there and say, well, I know a lot of positive words. Amazing, awesome.
[11:49]
Azeem Azhar
So sentiment analysis is essentially what marketing firms do to figure out whether a brand is going up in people's imagination or not. Right. They see a tweet about a brand and they analyze it and they say, this person who said, you know, Pepsi Cola is so bad, that's actually a positive thing. Right. Because bad means good in that cohort. And so sentiment analysis was often seen as quite, quite tricky and relied on, you know, cultural knowledge and sort of expertise from linguists, in a sense.
[12:19]
Richard Socher
Exactly. And, like, it's used in lots of places. People use it for algorithmic trading. In fact, there are fun stories where whenever Anne Hathaway starred in a movie, she won an Oscar, people said that, you know, they loved her acting. And then the stocks for Berkshire Hathaway went up multiple times after Anne Hathaway movies came out. And we call that entity disambiguation. Right. Just a different Hathaway.
[12:41]
Azeem Azhar
I have to put my hand up and take some responsibility for that, because about 18 years ago, when I was at Reuters, one of my teams developed and launched the first algorithmic newsfeed for hedge funds to automatically trade on, you know, sentiment.
[13:00]
Richard Socher
You may have actually been in the thick of that one. Yeah, that's funny. Yeah.
[13:04]
Azeem Azhar
So.
[13:04]
Richard Socher
So sentiment analysis, people used to say, oh, now there's negation, so it's "not good". So that's, you know, a feature. And then the AI, quote unquote, was just to weight those human-designed features. And then eventually they would realize, man, there's a lot of complexity here, right? Like, "This movie doesn't care about cleverness or any other kind of intelligent humor." It's like, well, there are a lot of positive words in there. There's some negation there, but the negation actually negates everything positive in that sentence. And so we created the largest sentiment data set. And then that allowed us to also show that a neural network outperformed every other traditional AI model, or machine learning model, as we called it back then, by a good margin. And so that is what's very important now. As a developer, you don't think about your own skills and logic that much anymore. You think about what does the neural network need? How do I clean my data, label my data, think about distribution shift over time, because there are new things that come up that weren't in the data before. And so you then try to just give those as training data to the neural net and you let it figure out all the complexities and details and logic of that domain.
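Editor's illustration: the hand-designed-feature approach described above can be sketched as a toy word-list scorer. This is a minimal sketch under stated assumptions, not the method from the paper discussed: the word lists, the `lexicon_score` name, and the "a negator flips only the next word" rule are all invented for illustration.

```python
# Toy lexicon-based sentiment scorer: hand-picked word lists plus a crude
# negation rule. Word lists and the rule are illustrative assumptions.
POSITIVE = {"amazing", "awesome", "good", "cleverness", "intelligent", "humor"}
NEGATIVE = {"bad", "terrible", "boring"}
NEGATORS = {"not", "never", "doesn't", "isn't"}


def lexicon_score(sentence: str) -> int:
    """Count positive minus negative words; a negator flips only the next word."""
    score = 0
    flip_next = False
    for raw in sentence.lower().split():
        word = raw.strip(".,!?")
        if word in NEGATORS:
            flip_next = True
            continue
        # +1 for a positive word, -1 for a negative word, 0 otherwise.
        value = int(word in POSITIVE) - int(word in NEGATIVE)
        if flip_next:
            value = -value
            flip_next = False
        score += value
    return score


review = "This movie doesn't care about cleverness or any other kind of intelligent humor."
print(lexicon_score("not good"))  # -1: the simple negation case works
print(lexicon_score(review))      # 3: scored as positive, although the review is negative
```

The negator in the review is used up on the neutral word "care", so the three positive words later in the sentence are counted at face value. Capturing the fact that the negation covers the whole sentence is the kind of structure the transcript says was left to a neural network trained on labeled data.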
[14:14]
Azeem Azhar
So I mean, one of the challenges with that approach is that there are clearly things that humans do which are not reliant on their empirical experience or things that they've observed, right? So mathematical reasoning is one great example. I mean, if mathematicians were empiricists, we would still be at Pythagoras' theorem, right? We could say goodbye to topologies and manifolds and all these other things. But the other challenge is that we can see in the natural world behaviors which are what we'd call in machine learning zero-shot, right? If you see a baby ibex running vertically up a cliff away from a predator, it doesn't get a chance to run that picture in a million training cycles over 10 epochs or whatever it is. So how do we square the fact that we can see things that, in this Potter Stewart-esque definition of intelligence, of knowing it when we see it, are not done from training and not done from thousands of trial runs?
[15:34]
Richard Socher
I think we're conflating two things here that are actually very interesting, which is I think it's clear that a lot of animals have what I would call biological genetic training and have basically a set of weights that they're fascinatingly born with. Like, you can go even to like, I think horses for instance, they plop out and they just like start walking away after like a few minutes versus like humans have to figure out a lot of stuff. And there are very cute pictures of babies trying to give a thumbs up and they're like, I think this is it. And then you get the feedback from the parents and they're like, yes, thumbs up. Those were the right fingers. And so I think there is actually a ton of training and learning. And then biology figured out a way to store that learning in a genetic sequence such that when that brain gets instantiated and evolves in the womb, then it already has a set of knowledge. And in some ways we're doing a little bit of that in that some of our models are made to be able to ingest images, for instance, versus sound versus text. And so you know that there's some sort of high level architecture that makes it better, for instance, to deal with time sequences and different lengths of those time sequences. So there's a little bit of architecture learning also that humans are going through right now. And we actually tried in the field to do a little bit of automated architecture search for a while, but it never really fully. Humans are still better at finding the best neural network architecture than some other AI models.
[17:19]
Azeem Azhar
If we come to large language models, which is the state of where we are today, it's what's got people excited, it's what's propelling these decabillion-dollar company valuations. Are you surprised with what large language models can seemingly achieve today compared to where you thought they might be, say, two or three years ago? I mean, have you been surprised to the upside or to the downside?
[17:50]
Richard Socher
A little bit, but maybe not quite as much as most other people. The bit where of course it's amazing is how much they can abstract away knowledge and do things. And we've seen inklings of that before, even in word vectors, where all of a sudden you could do the famous example of king minus man plus woman goes to queen. And I was like, oh, we never taught it that, but it kind of learned it in just the word vectors. And then we worked on contextual vectors that would put a whole sentence in there, and you had similar interesting patterns you could see there. And then my dream had always been to build a single model for all of natural language processing. And so in 2018 we invented decaNLP. And part of that was that we invented prompt engineering. My TED talk about that just got released. And so basically with prompt engineering, the idea was we would have one model. Just to understand why that is so interesting: in the past, we would often say, all right, you want to do sentiment analysis, we train a sentiment analysis model. You want to do translation, we'll train a translation model. You want to do summarization, we'll do a summarization model. And we're like, well, what if you just had a piece of text and you just asked a question about that text? And that question could be, what's the sentiment? What's the translation into German? What's the summary? And then you just basically can train a single model, you pre-train the word vectors, the contextual vectors, the whole answer decoder. And that paper actually famously got rejected from ICLR very publicly, but motivated a few others, including folks at OpenAI, who at the time were still mostly working on hand gesture recognition and Dota game playing and so on. But in their GPT-2 paper, they cited Bryan McCann et al. and decaNLP and said, look, these guys were able to build a single model for all of these different problems.
And they just had, and they called it not a question eventually, but a prompt. And so that 2018 paper motivated them to push also on that single model for all of nlp. But it also made us think, well, clearly we should build a better answer engine. And then ultimately, when you use a search engine, what you actually want is to get an answer, not a list.
[20:10]
Azeem Azhar
Of 10, not a list of blue links, right? Yeah, but I want to come to something that you said there, which I think is worth digging into. You said, and I may be paraphrasing here, that these networks and the more basic technologies like word vectors and embeddings were able to see relationships that weren't explicitly programmed in. And I think we've all had that experience in the sense that if you use one of the large language models, I mean, ChatGPT is the big one that people are most familiar with, you can get it to analogize, right? You can get it to draw analogies in the same way that you can say queen is to king as woman is to what? And it'll say man. And I think that was, you know, the famous embeddings paper from a few years ago. And the question is whether that's actually new knowledge, right? Or whether that isn't just data that is in the data set. And I suppose the way to think about this is: if I have 22 soccer players and I measure their heights, the heights are in the data set, but the mean, the average of their heights, is not in the data set. Do we then say that to extract the mean is finding new information that wasn't in the data set, or is it just a mathematical property of the data? Because I kind of feel that some of the examples that we talk about are actually just mathematical properties, right? It's like, I don't know, it's a cosine similarity or a distance in this multidimensional space, and these two things approximate to each other. It's not new knowledge, it's literally a mathematical reality.
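Editor's illustration: both points in the question above, the analogy arithmetic and the mean that is "not in the data set", can be sketched with toy numbers. This is a minimal sketch under stated assumptions: the two-dimensional vectors, the extra words, and the height values are all invented for illustration, whereas real word vectors are learned from text and have many more dimensions.

```python
import math
from statistics import mean

# Hand-made 2-D "word vectors": axis 0 is roughly "royalty", axis 1 "gender".
# These values are invented; nothing here is a trained embedding.
VECTORS = {
    "king": (1.0, 1.0),
    "queen": (1.0, -1.0),
    "man": (0.0, 1.0),
    "woman": (0.0, -1.0),
    "prince": (0.8, 1.0),
    "apple": (-1.0, 0.0),
}


def cosine(a, b):
    """Cosine similarity between two 2-D vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def analogy(a, b, c):
    """Word closest to vector(a) - vector(b) + vector(c), excluding the inputs."""
    target = tuple(
        x - y + z for x, y, z in zip(VECTORS[a], VECTORS[b], VECTORS[c])
    )
    candidates = [w for w in VECTORS if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(VECTORS[w], target))


print(analogy("king", "man", "woman"))  # queen

# Hypothetical heights in cm for 22 players: the mean is a derived property
# of the data, not one of the recorded values.
heights = [170 + i for i in range(22)]
print(mean(heights), mean(heights) in heights)  # 180.5 False
```

In this toy setup the analogy falls out of vector subtraction and a cosine comparison, which is the sense in which the question calls it a mathematical property of the space rather than an explicitly stored fact.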
[21:56]
Richard Socher
So in some ways, yes, I would say it is in the data, quote unquote. But more importantly, a big misconception people have is that these models can only interpolate. You know, I have a point X here, I have another point Y here, and all the model can do is find things on the line between X and Y. And then you can think of larger things that eventually are called convex hulls. And a lot of people think the model can only interpolate between all the things it has seen before, but that's actually not true. One thing that's amazing about these distributed representations that we have in natural language processing, in these large neural nets, is that they can be on the hypercube of concepts. And what does that mean? For instance, I have images of black cats and I have images of yellow cars. Now an image generation model will eventually be able to create a yellow cat, even though it has never seen a yellow cat in the training data per se. And so it can actually merge concepts into new ones. And now you have to be more creative and think of, oh, I want to see a yellow cat, which doesn't exist in nature. So you have to creatively think of that, and you don't have to do the execution anymore. So the models can extrapolate a little bit, but not too far out. And then of course the most exciting stuff is where humans can't extrapolate anything, because we're not evolved to look at, for instance, protein sequences or millions and millions of weather samples from different weather stations to predict where the weather might go next. And that is where these models can shine and already massively outshine humans.
[23:44]
Azeem Azhar
So your yellow cat analogy, I think, is quite helpful, especially for a non-technical audience. So if we think about this in maybe non-technical terms: I had a discussion a few weeks ago with somebody who has definitely jumped both feet into the large language model space and is buying Nvidia GPUs like they're going out of fashion. And this person said to me, you know, don't underestimate how far we can take large language models, with the suggestion that there's at least a couple more years of really rapid technical progress, which we've already seen, maybe even that this is almost like an end state of an architecture to get to great capabilities. What do you think? How far can we go with large language models to really have the next paradigm shift and moment? Do we get there through scale, or do we get there through some radically new architecture or indeed approach?
[24:51]
Richard Socher
It's a great question. And I actually think that the biggest change for large language models will be their ability to program. So let me explain that. As you said, large language models just predict the next token given the previous set of tokens, which can include very complex activities like prompts. And their biggest shortcoming is that they will hallucinate and they will make up stuff, especially if you ask a question around, for instance, a mathematical subject. Like, if I gave a baby $5,000 at birth to invest in some no-fee stock index fund and I assume some percentage of average annual returns, how much will they have by age 65? Now a large language model will just be like, I've seen questions like this, and I'll just start writing a bunch of text. But it doesn't actually say, well, this requires me to think super carefully, do some real math and then give the answer. But you can actually force it. You can tell it, hey, if there's a complex mathematical question, how about you try to translate that question into computer code, in our case on You.com, in the Genius mode that we have, into Python code. And then you write that code, then you run the code, and then you look at the output of that code and give me an answer. And that insight that we can get them to write code, and the surprising fact that they will write code that compiles, that is absolutely perfect syntactically and often semantically also, that I think will give them so much more fuel for the next few years in terms of what they can do. Because if you think about it, once this model can write code and code runs, you know, software runs the world, AI is eating software. Then you realize these models can do things. You can also run APIs that will execute certain things in the real world. And so I think that's where we get a lot more juice.
I think in terms of extra scale, it's unclear, because at some point we've crawled the whole Internet. There's only so much more data that is very useful for the model to train on.
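The investment question above is the kind of problem that reduces to a short program once it is translated into code. A minimal sketch, assuming annual compounding at a fixed rate and no fees: the function name and the three example rates are illustrative assumptions, since the conversation leaves the return percentage open.

```python
def future_value(principal: float, annual_return: float, years: int) -> float:
    """Compound a lump sum once per year at a fixed rate, with no fees.

    Uses the standard formula FV = P * (1 + r) ** n.
    """
    return principal * (1 + annual_return) ** years


# Assumed inputs: $5,000 invested at birth and held until age 65.
# The rates below are hypothetical; the source does not name one.
for rate in (0.05, 0.07, 0.10):
    value = future_value(5000, rate, 65)
    print(f"{rate:.0%} average annual return -> ${value:,.0f}")
```

Running the code, rather than predicting the digits token by token, is what makes the final number trustworthy; the model's remaining job is to set up the formula correctly and report the output.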
[27:02]
Azeem Azhar
That was a helpful sense of You.com, which is your, I'm going to call it AI personal assistant, that you're building. Just unpick for me, very briefly, that process by which you are able to create robust working code that doesn't have confabulations in it. Do you send it to a different system effectively to make sure that it's robust? So in a way what the LLM is doing is kind of being like the conductor in a train station, or are you able to do it in a single architecture?
[27:44]
Richard Socher
You bring up a great point about these large language models. Some people think they're just magic and they're going to do everything. Some people think they're overhyped, but there's clearly amazing new capabilities in them. But what we found is if you actually want to run them in production, to give millions of users millions of answers a day, then you actually have to think of them more, and this is an analogy that my friend Andrej Karpathy came up with, which is a little bit broken in some ways but very useful in general, as the CPU of a computer. And the CPU is amazing and it's the core of the engine of a computer. But it actually needs RAM, it needs random access memory, and that is our context window. It needs a hard drive, which is our embedding systems for retrieval-augmented generation, where you actually can refer back to facts. It needs an Internet connection and a browser, right? And that is what we've built, like a new index of the web that is meant to be consumed by LLMs. And it needs all of these things around it. And it turns out in the end, when you ask a question on You.com, we actually run 10 different models. Several of them are large language models. And there is certainly one core large language model in there. But you need to have a lot of other capabilities. Just the ability to execute code is a whole Python interpreter, right, that the LLM now has access to. It gets to choose whether it wants to use it or not. And that's what enables these amazing answers. And in some cases you also have to switch CPUs completely. Maybe you can think of that as multicore or something in that analogy. But for some questions, it doesn't make sense to use a massively large neural net because it's expensive and slower.
And it makes sense for a simple question to just use a smaller large language model, quote, unquote, and then the LLM kind of becomes the orchestrator of all of these different systems.
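The orchestration idea described above can be sketched as a small router. This is a rough sketch under loud assumptions: the component names, the keyword heuristics and the eight-word threshold are all hypothetical stand-ins, and in the system described it is the core LLM itself, not hand-written rules, that decides which capability to use.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Component:
    name: str
    wants: Callable[[str], bool]  # stand-in for the core model's own decision


def _words(question: str) -> set:
    """Lowercased alphabetic words in the question."""
    return set(re.findall(r"[a-z]+", question.lower()))


def needs_code(question: str) -> bool:
    # Crude proxy: any digit suggests arithmetic better done by running code.
    return any(ch.isdigit() for ch in question)


def needs_web(question: str) -> bool:
    # Crude proxy: time-sensitive words suggest looking things up.
    return bool(_words(question) & {"latest", "today", "news", "recent"})


# Checked in order; the first component that claims the question wins.
COMPONENTS: List[Component] = [
    Component("code_interpreter", needs_code),
    Component("web_search", needs_web),
]


def route(question: str) -> str:
    """Pick a tool if one applies, else a cheaper or larger model by length."""
    for component in COMPONENTS:
        if component.wants(question):
            return component.name
    return "small_model" if len(question.split()) < 8 else "large_model"


print(route("If I invest 5000 dollars at 7 percent, what do I have at 65?"))
print(route("What is the latest news on the election?"))
print(route("Write a poem about love"))
```

The design point carried over from the conversation is only the shape: one central decision step, several specialised components around it, and a cheaper model for simple questions.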
[29:42]
Azeem Azhar
So I think that's such a great analogy. I've seen Andrej Karpathy's sort of paper or presentation on this question as well. And it's a really powerful model because what it essentially says is that we can make certain capabilities discrete because it's actually just too hard to generalize them. And I think about how do you get these things to run on people's phones? How do you get them to run on edge devices? And so what that means is that I think when people hear the word LLM, the way it's been presented by certain journalists has been this idea that it's a kind of full wrapping. But of course, in my simplistic world, an LLM is like a CPU or it's like the engine of a car. All cars need engines, but not all engines are cars. And you need brakes and wheels and an axle and seats and other engineering things that cars have. I'm not a car engineer, so I don't know, but I know there are other things. And so that's part, I guess, of the presentation of where we are, but it also speaks in my mind to where we can get to with this technology, because ultimately a car engine will never get you from New York to Philadelphia, because you need wheels and you need a chassis and you need a bunch of other stuff. So when we look at the drawbacks of LLMs, one approach could be that it's the productization that sits around it. So one of the things I found fascinating with You.com is that it does sort of footnote and reference its answers. So help us just briefly, again for a non-technical audience, understand what's going on. This isn't just magic coming out of a big LLM. There's some engineering happening. So how does it roughly work, without giving away the 11 secret spices that go into your delicious batter?
[31:53]
Richard Socher
Yeah, you're 100% right. There's just a ton of engineering that's required to make it work accurately. You know, the biggest problem that LLMs had, and this is something we ran into two years ago in 2022, when we wanted to be the first, and ended up being the first, search engine that actually connects an LLM to the web. And so, you know, an idea that obviously has been copied hundreds of times last year by big and small folks. But the main problem with these LLMs is that they will hallucinate. They'll just predict the next tokens, and that might make stuff up. They cannot be trained every five minutes. It's not physically and computationally possible to train a large neural network or a large language model every five minutes when a news article comes out. So that's another problem. And then the third thing, and that is I think a general thing about generative AI, is that generative AI is only useful if the artifacts it produces are quick to verify but would take you a long time to create yourself. So generative AI for images, for instance, is very powerful because you can create an image and then look at it and say after a second or two that's beautiful or not. But it would take you a very long time to create that image. And the same thing is true for large language models. When you get an answer and you can't verify if that answer is actually correct or not, and you don't know where the answer came from, is it some random blog? Say you have some cancer treatment you need to know about, and you don't want some hippie-dippy blog to be the main source of that answer; you want to verify it comes from legit research sources and journals and so on. You need that verification. And those were the three main problems we solved by telling the large language model, hey, you can use the Internet if you want for this question. Maybe someone just asked, write me a poem about love and paramotoring, and then it'll just write that.
And you don't need citations in a poem, but if you ask a question about a recent news event or a complex health condition or some advice in school, then we can tell the LLM, hey, you could look up on the Internet what the right answer might be, and then in the prompt you can encourage it to use those answers. And then you have to build citation logic. And one thing we found actually is that the citation logic itself is also a hard AI problem. When do you use which resource for your facts? And in fact there are some folks now that kind of copy that idea of having these citations, but they have fake citations. They say, oh, here's a fact, and then they add a citation behind it, and then you click into that citation and that website doesn't even mention that fact. So it's also an AI problem, actually: how do you correctly cite your sources? And that's where we're really pushing hard to get amazing answers that are factual, up to date and verifiable through these citations.
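The fake-citation failure described above, where the cited page never mentions the fact, can be illustrated with a deliberately naive check. This is only a sketch: the function name, the stopword list, the 0.6 threshold and the example texts are all made up for illustration, and word overlap is a crude stand-in for the much harder problem the speaker describes, not the method used by You.com.

```python
import re

# Small hypothetical stopword list; a real system would need far more care.
STOPWORDS = {"the", "a", "an", "in", "of", "on", "at", "is", "was", "and", "to"}


def _terms(text: str) -> set:
    """Lowercased alphanumeric tokens in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def citation_supports(claim: str, source_text: str, threshold: float = 0.6) -> bool:
    """Return True if enough of the claim's content words appear in the source.

    This only checks that the source mentions the same terms; it cannot tell
    whether the source actually asserts the claim, negates it, or merely
    uses the same vocabulary.
    """
    claim_terms = _terms(claim) - STOPWORDS
    if not claim_terms:
        return False
    overlap = len(claim_terms & _terms(source_text)) / len(claim_terms)
    return overlap >= threshold


# Made-up example texts.
claim = "The museum opens at nine on weekdays"
good_source = "Visitor information: the museum opens at nine on weekdays and ten on weekends."
bad_source = "Our bakery sells bread and cakes every morning."

print(citation_supports(claim, good_source))
print(citation_supports(claim, bad_source))
```

Even this toy version would flag a citation whose page shares no content words with the claim; deciding which source genuinely backs which sentence is where the hard part begins.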
[34:51]
Azeem Azhar
What you described to me though now starts to look a lot more like the traditional software industry, to be honest. I mean, an LLM is a little bit like a database. It behaves slightly differently to a traditional database because you buy it preconfigured with lots of information in it. It's stochastic, it's fuzzy rather than deterministic. So the things that would have you fire Oracle or MySQL or Informix, you wouldn't fire an LLM for, because that's what you want it to do. Part of the value it provides is that it doesn't give the same answer every time. It also has this distillation, like it's got a compressed version of the whole of the Internet inside. But what you described to me, in terms of building products, is just something that looks very similar to building with databases or with a mobile framework. And I wonder, if that's the case, what you think the structure of the industry is going to look like. How different is it really going to be from how we've built enterprise or Internet applications in the past? You know, a lot of it is open source, right? The domain name system, Apache, 50% of databases or more are open source, MySQL and so on. And then lots of the consumer front end, sort of the JavaScript frameworks, are open source, and then you've got proprietary systems. And the value actually comes from how those things get stacked together and where companies sit in the value chain, right? And whether they can attach a network effect or something to it.
[36:40]
Richard Socher
You bring up a really good point, which is I think there's a good chance that large language models, and maybe even AI, will be commoditized, will not be the big differentiator. And we're already seeing this with open source. Mixtral now on various benchmarks has outperformed Claude, which is unbelievable. And it's open source. Well, it's not fully open source. I think for AI we need to actually redefine what open source means. It's not just, here's a final trained model; it ideally includes the training data, all the hyperparameters and how to train that model with that data, the training code also, which most people don't make public, and a bunch of other stuff. So I think there's more to open source, but still you can use the model and you can actually fine-tune it yourself. And that means that you might argue that the most exciting core is going to be commoditized, and the main differentiator is in standard company startup tech stuff like marketing, design, engineering, making it fast and beautiful and everything. And so I think there is something to that. Now, it is an exciting new capability, right? You couldn't have built this a few years ago before large language models came out. And it will potentially disrupt this trillion dollar traditional search industry. And, you know, I put my eggs into that basket. But indeed, to really win you do need to do a lot of standard startup things.
[38:17]
Azeem Azhar
So I mean a lot of startups succeed with mindshare and then securing developer buy-in. And so one could imagine, just in a hypothetical world, that one thing you might want to do if you're early out of the gate is say your technology is so powerful that it's dangerous, that it really needs government intervention, knowing that the government doesn't have the capability to assess that question or to do anything about it. But it will certainly create a lot of mindshare for you, so that you're on the cover of every paper for a year. I mean, that would be a really, really rational strategy, and then you'd build a big developer community around that. Am I being cynical if I say that?
[39:03]
Richard Socher
I mean, you're not the only one. We call it regulatory capture in Silicon Valley and that is certainly what a few of the really large players that have been able to secure a billion plus dollars in funding have been trying to do. But it seems like the world will not adhere to that. Open source models are out. And so it was a nice try, but hopefully won't be that successful.
[39:31]
Azeem Azhar
So, I mean, just a brief answer on this one. Are you somebody who carries what I think a number of people in Silicon Valley carry these days, a p(doom) in your head, or do you feel like that's not even a question that you should devote a cycle of your time to?
[39:53]
Richard Socher
I do not. You know, at some point, of course, enough people have talked about it, and I love AI, I'm also writing a book on AI on some weekends, so you of course have to answer some of these questions. And, you know, if you're a non-expert and you hear some of these experts talk about this, you get very scared. And I think at some point the folks that think p(doom) exists and is very, very large, they'll scare some person who has a mental health condition, who will find a gun and then say, well, I gotta murder some AI researchers, I guess, because who wants doom, right? These guys are the most evil people in the world, working on doom. And I think it's a really dangerous conversation. And so let me maybe dive into it more than I would have loved to. But I think it's important. I think the interesting thing is that P stands for probability, right? And the most important thing you learn about probability is Bayes' theorem. So it already starts being a problem, because these people want to look at p(doom), which is a prior, essentially. They're not looking at p(doom) conditioned on some data, right? So it should be a conditional probability, but these people don't look at actual data, right? They come up with really interesting, fun sci-fi scenarios like Terminator, you know, we have time travel, Terminator comes back, wants to destroy this and that, and, oh, there's these micro nano robots that are somehow super intelligent and then they will destroy everyone. And this AI somehow thinks of humans as, like, this...
[41:27]
Azeem Azhar
Very imaginative. Richard, your book should be a sci fi book. I think it would do very, it would do very well.
[41:33]
Richard Socher
And funny enough, some of the biggest proponents of p(doom) are former sci-fi authors. But it's not research. And when you actually double click into it, and I did engage with several of those folks, and I have a public conversation with Nick Bostrom in the German media. It was all in English, you can probably find it online. But I engaged with some of them. But if you actually double click into how is it really going to make humanity extinct, like get rid of all of us, the scenarios that they actually come up with are hilarious. And there's like, oh, but it can influence people so they all murder each other. And I'm like, you know, if the most intelligent people would always rule everything, we would have very different politics than we do. Right? And so it's not like an intelligent person can just convince everyone to do something because they're so much more intelligent. And so I think it's a lot of cool sci-fi scenarios, they're fun. I would probably watch the action movie that comes out of it too. But we got to keep it real and we can look at real problems, right? AI does have real problems. It will pick up biases, and humanity isn't super proud of all the biases in historical training data that were racist and sexist and so on. And as AI touches real lives, I'm not against regulating it, right? It makes sense to regulate self-driving car startups, so not every startup can just go up on the highway and see what happens. I don't want my AI neurosurgeon in the future to just try some reinforcement learning in my brain and see if it works out or not. You know, I want that massively validated and regulated before it gets to people. But it doesn't make sense to try to regulate the basics of foundational model research. It's just absurd.
[43:19]
Narrator/Host
Well, thanks for listening. What you heard was an excerpt of a much longer conversation. To hear the rest of it, go to exponentialview.co. Members of Exponential View and the community get access to the full recording as soon as it is available, and they're invited to continue the conversation with me and other experts. I do hope you join us. In the meantime, you can follow me on LinkedIn, Threads and Substack for daily updates. Just search for Azeem: A, Zed, E, E, M, or, if you're in the US and Canada, A, Zee, E, E, M. Thanks.