Tom Griffiths, "The Laws of Thought: The Quest for a Mathematical Theory of the Mind" (Henry Holt and Co., 2026) - New Books Network

Transcript

[01:31]
Gregory McNiff
Hello and welcome to the New Books Network. I'm your host, Gregory McNiff, and I'm excited to be joined by Tom Griffiths, the author of The Laws of Thought: The Quest for a Mathematical Theory of the Mind. The book was published by Henry Holt and Company in February of 2026. I selected The Laws of Thought because it addresses one of the most ambitious questions in cognitive science: whether human thinking can be described by formal mathematical principles. This book combines intellectual history with modern AI and cognitive science in a way that is both rigorous and accessible. The Laws of Thought explains not only how different models of the mind work, but why particular mathematical frameworks have shaped our understanding of cognition. Good afternoon, Tom. Thank you for joining us to discuss your book.
[02:19]
Tom Griffiths
It's great to be here.
[02:21]
Gregory McNiff
Tom, why did you write this book and who did you have in mind as the target audience?
[02:26]
Tom Griffiths
Well, I think in writing the book, I first of all wanted to share a set of stories that I love with other people. For a long time, I've been interested in this question of how we can use mathematics to understand the mind, going all the way back to when I was a teenager in Australia. And I guess in some sense, there's an audience I have in mind, which is that younger version of me who was really interested in knowing these things, and having the chance to tell that young person all of these things that I've learned in the time since then. But I think the other reason I wanted to write it is that, for many people, AI seems to have come out of nowhere. We're suddenly interacting with our computers in a new way and chatting with these systems that seem sort of remarkably human-like. And I wanted to give people some of the context for this moment that we're in, so that when they're trying to think about what it is that AI can do and how it relates to how human minds work and so on, they have a set of tools for making sense of those things.
[03:23]
Gregory McNiff
Yeah, absolutely. I want to ask you about two things you said there. Namely, growing up, I think you spent a year or two at home due to a COVID-like sickness, where you developed an affinity for programming and got really good at it. So I'd like to ask you about that. And then this notion of AI coming out of nowhere: it seems like it's that joke of, you know, an overnight success takes decades. As you clearly go through in the history in your book, it didn't come out of nowhere. So I want to ask you about that perception, and I'm sure ChatGPT did a lot to foster it, that it just, you know, all of a sudden appeared out of nowhere. But before we get there, you have a really nice opening that lays the groundwork for the book. And in there, you suggest that the cognitive revolution began when mathematics was used to express testable hypotheses about the mind. Can you talk about that moment, that inflection point?
[04:19]
Tom Griffiths
Yeah. So the cognitive revolution was a moment that happened in the middle of the 20th century where psychologists suddenly discovered some of the tools that they needed to have a rigorous science that also talked about things that are inside our heads. So one of the challenges of doing psychology is that you have a subject matter which you can neither see nor touch. Right. Like, even physicists who are trying to think about, you know, distant stars or tiny atoms have tools that they can use for actually looking at those things or potentially even intervening on them. And so psychologists, faced with this problem in the first part of the 20th century, said, we're not going to even think about thoughts or talk about feelings. We're going to just focus on the things we can see or touch, and those things are behavior and the environments that shape it. And so, for the first part of the 20th century, you weren't really allowed as a psychologist to talk about mental states. And so the cognitive revolution said, well, we do want to talk about thoughts and language and all of the other things that happen inside our heads, but maybe a way that we can do that rigorously is by using math. And so, just like physicists, where part of the reason that they're able to span that giant realm that goes from atoms to planets is that they have mathematical theories that allow them to do that, psychologists discovered that some mathematical tools were the things that would make it possible for them to have a rigorous science of the mind.
[05:38]
Gregory McNiff
Perfect. Throughout the book, you sort of focus on three frameworks, namely rules and symbols, neural networks, and Bayesian models as, I guess, the mental frameworks to understand the development of this mathematization of the mind. Can you briefly talk about each of those and why they're important?
[05:56]
Tom Griffiths
Yeah. So, psychologists and cognitive scientists more generally: cognitive science is kind of the science that grew out of that cognitive revolution. It's all of the different disciplines that we might use to study the mind, psychology, neuroscience, computer science, philosophy, anthropology, linguistics, sort of coming together. Cognitive scientists have pursued these three different approaches, really, because they capture different aspects of thought. So the starting point for people in that cognitive revolution was the math of rules and symbols, things like logic, things like how computers work. That was kind of the analogy that they had for explaining things. And they sort of pushed that as far as they could. But when they did that, they discovered there are certain kinds of things that it's quite hard to describe using logic. So, for example, if you want to understand the concepts that people have, right. If you ask someone whether a chair is a piece of furniture, they'll say yes. If you ask them whether a rug is a piece of furniture, it's a little more fuzzy. And that fuzziness wasn't something that logic could capture. And so you needed a different kind of math to capture that. And they started thinking in terms of things like maybe concepts correspond to regions in space and objects are points in that space, and sort of using a geometric metaphor. And that led to the development of artificial neural networks. And then the third approach, probability theory, tells us how it is that we should reason in the face of uncertainty. So if we're trying to solve problems where we've got some information, but we don't have enough information to be certain about the conclusion, we can't use logic.
But probability theory tells us what inferences we should make, and it also tells us how learners should learn from data and how it is that they can use biases that they've accumulated and sort of other sources of information to be able to make quick inferences from small amounts of data. Something that's really important for explaining human cognition.
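The point about combining prior biases with small amounts of data can be made concrete with Bayes' rule. What follows is a minimal sketch, not anything taken from the book: the two hypotheses, the prior of 9 to 1, and the coin likelihoods are all invented for illustration.

```python
from fractions import Fraction

def posterior(prior, likelihood, data):
    """Apply Bayes' rule: P(h | data) is proportional to P(h) * P(data | h)."""
    unnormalized = {}
    for hypothesis, p in prior.items():
        score = p
        for observation in data:
            score *= likelihood(hypothesis, observation)
        unnormalized[hypothesis] = score
    total = sum(unnormalized.values())
    return {h: s / total for h, s in unnormalized.items()}

# Hypothetical prior bias: most coins are fair.
prior = {"fair": Fraction(9, 10), "biased": Fraction(1, 10)}

def coin_likelihood(hypothesis, observation):
    """Assumed chance of each flip: a biased coin lands heads 9 times in 10."""
    p_heads = Fraction(1, 2) if hypothesis == "fair" else Fraction(9, 10)
    return p_heads if observation == "H" else 1 - p_heads

# Only three flips: the data shift belief toward "biased",
# but the prior still keeps "fair" ahead.
post = posterior(prior, coin_likelihood, ["H", "H", "H"])
print({h: float(p) for h, p in post.items()})
```

With so little data the prior does most of the work, which is the sense in which accumulated biases allow quick inferences from a handful of observations.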
[07:47]
Gregory McNiff
Okay, I want to start with the first framework we just discussed, rules and symbols. And specifically, it seems like the forefather or the individual who really served as the catalyst for it was George Boole. Could you talk about his contribution to mathematizing logic?
[08:06]
Tom Griffiths
Yeah. So people had been thinking about how something like reasoning works for a long time. You can go all the way back to Aristotle. And Aristotle in ancient Greece started to think about how we can characterize what makes something a good argument. It's not entirely clear whether he meant a good way of thinking versus a good way of trying to convince somebody of something. But he was interested in this question of how you figure out what's a good argument. He was interested in syllogisms, where you have sort of two premises that you assume and then you draw a conclusion, like: some A's are B's, some B's are C's, therefore... well, in that case, I'm not sure what conclusion you can draw. Maybe: all A's are B's, all B's are C's, therefore all A's are C's. Right. He wanted to know the difference between those arguments, what tells you that one of those is a good one and one of them is not. And so he had some kind of theorizing about it. And then the mathematician Gottfried Wilhelm Leibniz had some sort of attempts to formalize this. But the person who really cracked that problem was George Boole, who was a schoolteacher living in England, and on the side, a really serious mathematician. And so Boole was trying to think about how you use something like algebra. That was his key insight: that in order to solve this problem, it wasn't going to be just like arithmetic. It was going to require a new kind of algebra to describe how these processes of reasoning might work. And so he was able to show that there was a simple algebraic system that could reproduce all of the kinds of conclusions that Aristotle had come up with, and then go from there to define something that gave us the real structure of our sort of first kind of mathematical logic.
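The difference between the two syllogisms mentioned above can be checked mechanically. The sketch below is in the spirit of Boole's algebra, writing "all X are Y" as the equation x(1 - y) = 0, but the bitmask encoding, the three-element universe, and the function names are assumptions made for illustration. A brute-force search over a tiny universe can exhibit a counterexample to a bad argument; for a good one it only fails to find any, which is evidence rather than a proof.

```python
from itertools import product

UNIVERSE_SIZE = 3                # assumption: a tiny universe, for illustration
FULL = (1 << UNIVERSE_SIZE) - 1  # bitmask in which every element is present

def all_are(x, y):
    """'All X are Y', written as the equation x(1 - y) = 0."""
    return x & (FULL & ~y) == 0

def some_are(x, y):
    """'Some X are Y': the product xy is not 0."""
    return x & y != 0

def is_valid(premise1, premise2, conclusion):
    """True if no classes A, B, C satisfy both premises yet break the conclusion."""
    for a, b, c in product(range(FULL + 1), repeat=3):
        if premise1(a, b) and premise2(b, c) and not conclusion(a, c):
            return False
    return True

# All A are B, all B are C, therefore all A are C.
all_all_all = is_valid(all_are, all_are, all_are)
# Some A are B, some B are C, therefore some A are C.
some_some_some = is_valid(some_are, some_are, some_are)
print(all_all_all, some_some_some)
```

The second argument fails because the A's that are B's need not be the same B's that are C's.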
[09:49]
Gregory McNiff
Perfect. I have to ask you: Boole's first book, I believe, was the Investigation of the Laws of Thought. I'm sorry, An Investigation of the Laws of Thought was his second, longer book. The title of your book is obviously The Laws of Thought. Do you see yourself as in any way falling in that tradition of Boole? How do you situate yourself relative to his breakthrough?
[10:13]
Tom Griffiths
Yeah, so there are really two relationships. And I should say, as you can tell from that similarity in titles, the book is really inspired by Boole's work. That's something that I took as inspiration in sort of putting this together. And the epigraph comes from Boole's book as well, sort of saying, this is something that we should be concerned with as people who want to make sense of the world. So Boole wrote that book in the middle of the 19th century, and at that time people talked about this idea of the laws of thought in the same way that we would talk about the laws of nature. So the same people who were kind of trying to use math to understand the physical world were interested in using math to understand the mental world. And they talked about these two things in parallel. Just as we can figure out what the laws of nature are, we can figure out what the laws of thought are. So Boole's book was one of a bunch of things that were published at that time using this phrase, the laws of thought. It's kind of come to be associated with him because it's the one that survived from that era forwards. But I wanted to revive that term as something that's in our vocabulary.
[11:18]
Gregory McNiff
Right.
[11:18]
Tom Griffiths
When you go to school, you learn the laws of nature. It would be nice if you went to school and also learned the laws of thought, and to sort of re-advertise the importance of that. So there's a thread there that goes back in terms of saying, hey, this sort of 19th century idea that we should be thinking about the internal world in the same way we think about the external world is something that we really want to bring through to this modern time. And then the second part is that, really, two of the approaches I end up talking about, logic and probability theory, are things that Boole was preoccupied with as well. And so the things we end up with as our sort of modern laws of thought, tracing from that origin with Boole all the way through the following 200 years, they're really the same fundamental ideas. But we can see how those have developed and turned into tools that we can use for understanding human minds as well as tools for understanding AI.
[12:11]
Gregory McNiff
Right. And we'll meet, I guess, a descendant of Boole working today who made sizable contributions to AI as well, right? Geoffrey Hinton.
[12:19]
Tom Griffiths
Yeah, that's right. So one of the nice sort of stories in the book is that one of the people who played an important role in developing the algorithms that are used for training the neural networks that underlie modern AI is a descendant of George Boole's as well. So we have Boole being responsible in some way for all of the different ideas that I talk about in the book.
[12:43]
Gregory McNiff
No, no. Fascinating. I want to turn to behaviorism. And you write, quote, behaviorism is such a counterintuitive idea that it's worth spending a moment trying to understand it. Can you explain the problem that the behaviorist movement was trying to solve?
[12:57]
Tom Griffiths
Yeah. So behaviorism is the perspective that was that pre-cognitive-revolution version of psychology. So they said, we want to have a science. Psychology should be a science. And if you looked around at what the other sciences were doing, you didn't see anything where there was sort of fuzzy subjective stuff. Right. So prior to behaviorism, the sort of earliest psychologists used methods that relied on subjective reports. So if you went into the sort of very earliest psychology labs, you would see people who were engaged in experiments, and they were carefully trained to sort of report on their internal experiences. And then that was treated as data. And then those data were analyzed to come up with theories and so on. And so in the 1920s, John Watson, who had primarily worked with animals, said, well, let's actually rethink psychology along the lines of the kind of thing we do with animals. When I'm working with animals, I don't think about what the thoughts or feelings are that those animals might be having. I just sort of focus on the behavior I can see, the environment that's producing it, and these simple learning mechanisms that we know work in animals. Let's just use that as the foundation for our rigorous science. And that approach then said, let's not talk about mental states. Sort of a psychology without what I think many of us would think of as key ingredients of psychology.
[14:19]
Gregory McNiff
Can you talk about the contributions of Skinner and Jerome Bruner in shaping behaviorism?
[14:25]
Tom Griffiths
Yeah. So basically, Skinner and Bruner were in some sense opponents in terms of these two perspectives, although in fact they didn't interact very much directly. Skinner was Watson's intellectual heir. He sort of took up the flame of behaviorism when Watson had to leave the field due to a scandal, and he tried to extend the power of behaviorism even further, thinking, well, maybe it is okay to keep some of that subject matter of mental states in what we're studying. But what's not okay is using those to explain things. What we should be doing is explaining the things that people want to explain using mental states in terms of our environment. So he would say, when you talk about having knowledge of something, what you're talking about is that you're going to change your future behavior as a consequence of past experiences that you had. And when you talk about having enjoyed something, what you mean is that you're more likely to do that thing again in the future. Right. Sort of mapping our psychological language onto things that are really expressed in terms of behavior. And so he was really very ambitiously trying to take on the full traditional subject matter of psychology, but doing it with these tools that came out of the way that we study animals. And he was kind of a genius for understanding how it is that animals learn. And then Jerome Bruner was a psychologist who had a very different perspective on what the subject matter of psychology might be. So he, during World War II, went to France as it was recovering from the war, and was there as an official cultural attaché and spent a lot of time hanging out with, you know, Simone de Beauvoir and Jean-Paul Sartre, the sort of intellectual elite of France, and came back to the United States with this sense that there were a lot of things in our experience which were not captured by the current psychology.
And so he started to do experiments where he was looking at how people's experience changed the way that they would respond to something, that it wasn't just a matter of the thing in the world producing a response in them. It was that their interpretation of that thing in the world was what mattered for them responding in that particular way. And then he kind of made it okay to talk about these cognitive states. And then he also turned out to be one of the people who made this first connection between mathematics and being able to come up with rigorous theories of the mind.
[17:03]
Gregory McNiff
I want to turn to the Symposium on Information Theory specifically. This was a conference that I believe Miller later described as being the moment at which cognitive science was conceived. The actual date is September 11, 1956. Could you talk about the significance of that, particularly relative to the idea that, I believe, there was now an acceptance that mathematics could be used to express precise hypotheses about thought and language?
[17:35]
Tom Griffiths
In the 1950s, there were two things that began to shift. So one important thing that happened was that Bruner, who was trying to think about how you do psychology and, you know, talk about these sort of mental states, went to the Institute for Advanced Study here in Princeton, and he hung out with John von Neumann, who was building a computer on the institute grounds and was dealing with questions like, how do you build a memory system? How do you search for information? How should you represent that information? And from that, Bruner came back and he was like, okay, actually, logic is a really powerful tool. I can use that as a way of studying something like cognition. So he'd sort of made this connection to mathematics. Miller had made his own connections to mathematics through information theory and had been exploring how information theory might explain some of the things that are going on in the way that people perceive the world and maybe use language. And then this meeting at MIT on September 11, 1956, the Symposium on Information Theory, brought together a group of people who had all independently been pursuing these things and sort of provided a nucleus that demonstrated, oh, this is a new approach. So those people were, first, Miller, who talked about his work on information theory. If you've ever heard of the magical number seven, that's the work that Miller had been doing, looking at these sort of information processing capacities in human minds. The second was Herb Simon and Allen Newell, who had just come from an event that was the first sort of conference on artificial intelligence, the place where the term artificial intelligence was created. And they'd created arguably the first artificial intelligence system and the first computational model of cognition, which was a system that could make mathematical proofs in logic using ideas that came from how they thought human mathematicians might solve those problems.
And then the third was Noam Chomsky, who was working on using sort of structures, similar to the kinds of structures that you see in formal logic, to characterize the structure of human languages. And so out of the convergence of those three things, Miller had this experience where it was suddenly clear that, you know, mathematics was going to be a tool that they could use to have the rigor that they needed in order to be able to study how minds work.
[20:02]
Gregory McNiff
Great. I want to move on to some of those names, particularly around computation. Could you talk about the contributions of Turing, Shannon and von Neumann in building a universal computer that could think?
[20:14]
Tom Griffiths
George Boole had done this fundamental work of giving us a definition of what mathematical logic was and giving us an example of a kind of system like that, but it wasn't a system that you could do anything with other than as a human mathematician. It allowed a human mathematician to derive the consequences of these things. And Leibniz before Boole had sort of seen that if you could turn thinking into mathematics, you could think about making machines that could think for you. He'd developed sets of calculators that could automate sort of arithmetic. And so it was like, oh, well, maybe you can automate this process of thinking. But the real progress in that didn't happen until the start of the 20th century, when Turing, who was trying to solve a mathematical problem about the limits of mathematics, had this idea of defining a machine that does the things that a human mathematician would do. Right. So breaking down what it is to do mathematics in terms of a sequence of steps where you're writing things down and then you're sort of moving over, writing something else down. You read something, it changes what you're thinking. You write something down. He described what all those steps are and then used them to define an abstract machine that's called a Turing machine. That's the foundation for what we now call a computer. And then turning that into a physical system would take a little bit longer. So there was work that was done by Claude Shannon, who was also the inventor of information theory. The approach that Miller had been using, which was really focused on the information that's contained in signals, things like language. Shannon worked out how to take Boole's principles and turn them into a recipe for building electrical circuits. So if you could write down a formula using Boole's logic, you could now create an electrical circuit that instantiated that formula. 
And then von Neumann, along with a group of other people, developed digital computers that were built on these principles. In particular, von Neumann played an important role in developing the first programmable digital computer. And so that was a computer that really instantiated Turing's idea of being able to write information to memory and erase things that are in memory in a way that made it much more flexible than the previous kinds of computers that existed.
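The read, write, move cycle described above is small enough to simulate. Here is a minimal Python sketch of such an abstract machine; the rule table, which adds one to a binary number, along with the state names and the "_" blank symbol, are hypothetical choices for illustration rather than Turing's own notation.

```python
def run_turing_machine(rules, tape, head, state, blank="_", max_steps=1000):
    """Run the machine until it reaches 'halt'; return the final tape contents."""
    cells = dict(enumerate(tape))        # sparse tape: position -> symbol
    for _ in range(max_steps):
        if state == "halt":
            break
        symbol = cells.get(head, blank)  # read the cell under the head
        write, move, state = rules[(state, symbol)]
        cells[head] = write              # write a symbol (possibly the same one)
        head += {"L": -1, "R": 1, "N": 0}[move]  # move left, right, or stay
    low, high = min(cells), max(cells)
    return "".join(cells.get(i, blank) for i in range(low, high + 1)).strip(blank)

# Hypothetical rule table: (state, symbol read) -> (symbol to write, move, next state).
# It adds one to a binary number, with the head starting on the last digit.
INCREMENT = {
    ("carry", "1"): ("0", "L", "carry"),  # 1 plus carry is 0, carry moves left
    ("carry", "0"): ("1", "N", "halt"),   # 0 plus carry is 1, done
    ("carry", "_"): ("1", "N", "halt"),   # ran off the left edge: new leading 1
}

result = run_turing_machine(INCREMENT, "1011", head=3, state="carry")
print(result)
```

The machine itself knows nothing about arithmetic. Everything it does is in the rule table, which is what makes the same mechanism reusable for other computations.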
[22:36]
Gregory McNiff
And Turing and Shannon, did they meet at Princeton in the late 30s? They did interact at one point.
[22:42]
Tom Griffiths
Yeah. All of these people had these sort of moments where they interacted with one another. So von Neumann and Turing knew each other, and Turing actually did his Ph.D. at Princeton. So the final version of the paper that he wrote about the Turing machine was actually mailed in to the journal from a place that's a couple of miles from my house over here, and he interacted with von Neumann during his PhD. Shannon was at the...
[23:17]
Gregory McNiff
Bell Labs in Jersey.
[23:19]
Tom Griffiths
Yeah, Bell Labs, which was sort of nearby. And during the war, both Turing and Shannon were engaged in war related intelligence activities and they ran into each other in the course of that. But of course they weren't cleared to talk to one another about any of the things they'd been doing. So they had a conversation about brains. Right? Like really about. Like, how do you use mathematics to understand how brains might work?
[24:44]
Gregory McNiff
Yeah, I give you credit, by the way. I've read so many books on Turing and I never knew he took a class with Wittgenstein. I think at Oxford? I might have that wrong, but I think he...
[24:55]
Tom Griffiths
Yeah, at Cambridge.
[24:56]
Gregory McNiff
Yeah, Cambridge, of course. Fifty-fifty, and I get it wrong. And then even more amazing, Wittgenstein designed a... was it a propeller for a plane?
[25:06]
Tom Griffiths
Yeah, yeah, that was... yeah. I mean, he had an engineering background. So for Wittgenstein's lectures, because Wittgenstein didn't write a ton, right, his lecture notes have become sort of an object that people study. And so there's one of the sets of lecture notes, which are actually his lectures on the foundations of mathematics, where there are some exchanges between Wittgenstein and Turing. So Turing's just sitting in the class as a student and sort of asks questions, and then Wittgenstein responds to them.
[25:39]
Gregory McNiff
Yeah, I think he described him as a bit odd, but we really don't have time to go so much into Wittgenstein. But I think he had an influence on Eleanor Rosch as well, and she suggested she was the only student who actually, I guess, imbibed or learned his lesson not to pursue philosophy. And I'll ask you about her in a few minutes, but I think she made sizable contributions to this notion of categories that you were discussing earlier. I want to ask you just briefly, because of your background: you're at Princeton. I think you did your PhD at Stanford. In the book you reference, I forget whose it is, this nomenclature of East Pole, West Pole. Do you feel like you've got both of the poles covered there, given your background and history?
[26:17]
Tom Griffiths
So one of the things that happened after this cognitive revolution, right, we talked about this symposium that happens at MIT. MIT becomes a center for people who are interested in, in particular, this symbolic way of thinking about minds, right? So the Chomsky-style approach, you know, thinking in terms of things that are kind of like logic. One of the challenges that Chomsky ran into is that, having characterized language as this very complex mathematical object, he then didn't have a way to explain how children could do anything like learning that language from the information that they get. So Chomsky ends up postulating, maybe they don't learn that much at all. Maybe they have these sort of very strong constraints, and they're just sort of setting a few bits through the information that they're getting in order to allow them to figure out, okay, it's this particular structure rather than that particular structure. And so that idea that symbolic representations are important and learning is strongly constrained by our natures was something that became pretty strongly associated with, and is still somewhat associated with, that sort of Cambridge, Massachusetts area. And so a philosopher characterized that area as the East Pole of cognitive science, in the sense that when you're at the North Pole, any direction you go is south. When you're at the East Pole, any direction you go is west, right? So we're sort of defining an orthodoxy in this view of how the mind worked, where now if you travel anywhere away from Cambridge, Massachusetts, you end up with a different perspective. So, yeah, I actually did part of my PhD at MIT, and then part of my PhD at Stanford. So I had the chance to enjoy the sort of East Pole culture. It's diminished to some extent. You really find much more homogeneity in views geographically as a consequence of us being able to communicate and travel much more.
But I'd say that there's still some truth to that being a more dominant way of thinking in that part of the world.
[28:22]
Gregory McNiff
That's interesting. I definitely want to move on to Chomsky. But one last question. We talk about mathematics, and we're clearly going to get into Bayesian probability. But you write about the importance of geometry as well and say geometry can also be expressed as a formal system. What's particularly unique about geometry in helping us understand the logic and the rules here?
[28:43]
Tom Griffiths
So the insight that moved people away from this rules and symbols perspective was wanting to be able to capture sort of gradedness. Right. Like the fact that things aren't perfectly yes or no, true or false. And one way that you could do that is by thinking about things as being more or less distant from one another. And so in doing that, you're making an interesting assertion, which is that maybe one mathematical system we have, geometry, would work as a way of describing a thing that we do. Right. Thinking. And so you're sort of asserting that the axioms that characterize lines and points in space and so on, are now going to be able to characterize concepts and the relations between them and so on. And so that's a lot of what cognitive scientists do is exploring whether a particular kind of mathematical system is a good fit for an aspect of thought. And so geometry has been one of those approaches that people have tried to map onto thinking.
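The idea of gradedness as distance can be sketched in a few lines of Python. Everything specific here is an assumption made for illustration: the two-dimensional coordinates are invented, and using a single prototype point with an exponential falloff is just one simple way to turn distance into a graded score.

```python
import math

# Hypothetical 2-D feature coordinates, invented for illustration only.
FURNITURE_PROTOTYPE = (0.9, 0.8)
ITEMS = {
    "chair": (0.85, 0.9),
    "rug": (0.4, 0.3),
    "banana": (0.0, 0.05),
}

def membership(point, prototype):
    """Graded membership in (0, 1]: decays with Euclidean distance from the prototype."""
    return math.exp(-math.dist(point, prototype))

scores = {name: membership(point, FURNITURE_PROTOTYPE) for name, point in ITEMS.items()}
for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.2f}")
```

Instead of a yes or no answer, each object gets a score, so a rug can sit between a chair and a banana rather than being forced into or out of the category.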
[29:40]
Gregory McNiff
Okay, I want to move to one of the more, I guess, contentious sections, and it's multiple sections because you circle back later in the book. But could you briefly talk about Chomsky's contribution to linguistics? Regardless of how you feel about him, he certainly dramatically influenced the field.
[29:58]
Tom Griffiths
Yeah. So what Chomsky did, and what he presented in that 1956 symposium, was a recognition that the structure of language might be more complex than people had been thinking at the time. So if you were an information theorist or a behaviorist, you would think that to characterize the structure of language, it would be sufficient to count up, say, how often each word is followed by another word, and that these kinds of statistical relationships between words might be enough to explain how languages worked. Right. So that was important for the information theorists because they wanted to be able to, say, build codes that are efficient codes. And to do that you need to know what the probabilities of words co-occurring are. And it was going to be sufficient for the behaviorists, because if what really mattered was how often one word followed another word, then you could learn that just by learning associations between those words. And maybe that gave us a story for how it is that children learn a language. And Chomsky showed that that way of thinking just was not going to work for capturing the structure of human languages. And so he gave sort of simple examples that illustrated that in fact languages have structures that need something which is more like a symbolic grammar to be able to capture them. And then from that, because he'd said language is now a much more complex mathematical object, he ended up sort of having to say, maybe it's not something that we are doing a lot of learning about, but rather it's something that we have built into us to some extent.
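The word-following-word statistics described above are easy to compute. This is a minimal sketch with a made-up three-sentence corpus; the function name and the corpus are hypothetical, and real estimates would need far more text.

```python
from collections import Counter, defaultdict

def bigram_probabilities(sentences):
    """Estimate P(next word | current word) by counting adjacent word pairs."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for current, following in zip(words, words[1:]):
            counts[current][following] += 1
    return {
        word: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for word, followers in counts.items()
    }

# Hypothetical toy corpus.
corpus = ["the cat runs", "the dog runs", "the cat sleeps"]
probs = bigram_probabilities(corpus)
print(probs["the"])
print(probs["cat"])
```

A model like this only ever looks one word back, which is the limitation at issue: it has no way to make a word at the end of a sentence depend on a word at the start.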
[31:32]
Gregory McNiff
Okay, Chomsky actually concludes that English is not a finite state language. What does that mean?
[31:39]
Tom Griffiths
So a finite state language, the easiest way to think about it is if you imagine a board game, right? So one of these very basic board games where there's no dice involved, you just get to sort of make decisions about where it is that you move forward on each turn. So you've got a board game, you have positions that you can move to from your current position. And let's say, each time you move from one position to another position, you're going to produce a word. So a finite state grammar has a finite number of states. Those correspond to the positions that you can move to on the board. And then as you're moving between positions, you're producing words. And so that structure is one that can produce different sentences just by following different paths through those positions on the board. One of the properties of that finite state grammar is that it doesn't do a good job of situations where the very start of a sentence is important to the very end of the sentence. Because if you needed to do that, you would need to just have, like, a path, a unique path for every sentence from the start to the end, right? And there's no way to do that with a finite number of states. So you could think about sort of sentences like, let's see, the mouse the cat chases runs, right? Sort of makes sense. The dog the mouse the cat chases likes runs is also a grammatical sentence in English. You have to work a little harder to figure out what it is. You can sort of keep on building up complex sentences where you're adding more nouns at the start and more verbs at the end. And Chomsky said, because these sentences are grammatical sentences in English, and because whether it's dog or dogs at the start of the sentence determines whether it's run or runs at the end of the sentence, that means that it'd be very hard to capture that in a finite state language. And in fact, it's impossible if you allow sentences that have infinite length. And so, as a consequence, he said, English is not a finite state language.
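The board-game picture can be sketched in a few lines of Python. This is only a toy, under the assumption of a four-position board; the state names, the vocabulary, and the `generate` helper are invented for illustration.

```python
import random

# Each state (board position) lists the moves available from it as
# (word produced by the move, state you land on). None marks the end.
FINITE_STATE_GRAMMAR = {
    "START": [("the", "NOUN")],
    "NOUN": [("cat", "VERB"), ("dog", "VERB"), ("mouse", "VERB")],
    "VERB": [("runs", None), ("sleeps", None), ("chases", "OBJECT")],
    "OBJECT": [("the", "NOUN")],
}

def generate(grammar, rng, max_words=20):
    """Walk a path through the states, emitting one word per move."""
    state, words = "START", []
    while state is not None and len(words) < max_words:
        word, state = rng.choice(grammar[state])
        words.append(word)
    return " ".join(words)

rng = random.Random(0)
for _ in range(3):
    print(generate(FINITE_STATE_GRAMMAR, rng))
```

Each move depends only on the current position, so the machine has no way to remember how many nouns it produced earlier or whether the first one was singular or plural.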
[33:40]
Gregory McNiff
In describing Chomsky's contributions and insights, you write that human behavior is filled with hierarchical structure, and phrase structure grammars are a way to describe that structure and generate new instances of it. Could you briefly explain that?
[33:54]
Tom Griffiths
So a phrase structure grammar is the solution that Chomsky had to that problem. And basically, it's a way of building up a sentence in a much more sophisticated way. So if you ever at school had to diagram a sentence where you're like, this is the noun phrase, this is the verb phrase, this is the noun, this is the verb, and so on, and you sort of had to say what the parts of speech were for all of the words, or if you ever had to draw a tree for a sentence, you were, in doing that, using something like a phrase structure grammar. So a phrase structure grammar is a recipe that says, start your sentence, and then you have a rule that tells you what are the kinds of pieces that you can build that sentence out of. So you might build it out of a noun phrase and a verb phrase. Then there's another rule that says if you have a noun phrase, then that noun phrase starts with a determiner, like the, and then a noun, right? And if you have a verb phrase, that verb phrase has a verb, right? And, you know, it sort of builds up the sentence step by step. And when you do that, you can create grammars that are powerful enough that they can capture the complex phenomena that Chomsky had shown English possesses.
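A minimal sketch of such a recipe in Python follows. The rule set, the symbol names (`S`, `NP`, `RC`, and so on), and the `expand` helper are assumptions chosen to reproduce sentences like "the mouse the cat chases runs"; they are not taken from the book.

```python
import random

# Rewrite rules: each symbol expands into one of the listed sequences.
# Anything that is not a key (e.g. "the", "cat") is treated as a word.
RULES = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"], ["Det", "N", "RC"]],
    "RC": [["NP", "TV"]],  # embedded clause such as "the cat chases"
    "VP": [["IV"]],
    "Det": [["the"]],
    "N": [["mouse"], ["cat"], ["dog"]],
    "TV": [["chases"], ["likes"]],
    "IV": [["runs"], ["sleeps"]],
}

def expand(symbol, rng, depth=0, max_depth=5):
    """Rewrite a symbol into words by recursively applying the rules."""
    if symbol not in RULES:
        return [symbol]  # a plain word
    options = RULES[symbol]
    if depth >= max_depth:
        options = options[:1]  # stop nesting: use the non-recursive rule
    words = []
    for part in rng.choice(options):
        words.extend(expand(part, rng, depth + 1, max_depth))
    return words

rng = random.Random(0)
for _ in range(3):
    print(" ".join(expand("S", rng)))
```

Because a noun phrase can contain a clause that itself contains a noun phrase, every noun added at the start is matched by a verb added at the end, which is the nesting a finite set of board positions cannot track.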
[35:01]
Gregory McNiff
Okay, you end that chapter by noting there is the possibility that the real problem is that language acquisition shouldn't be thought about in terms of logic at all. And then you move on to a discussion of deduction, induction, and abduction. Could you explain those terms and how we should think about language not using logic?
[35:22]
Tom Griffiths
Where Chomsky got stuck in trying to think about learning language was that he was thinking about it in terms that were kind of like the way that logic works, right? So logic tells us how to go from things that we absolutely know are true to other things that we absolutely know are true. And if you try and apply that approach to something like language learning, you run into a problem, because there's a particular situation that can arise where you can never get information that lets you know that you have made a mistake. And that particular situation is, if you spoke a language which was English plus a few extra sentences, you could never, just by hearing people speak English, discover that you were wrong about that language. So those extra few sentences, you would never hear. But that's okay. You would never hear anything that was inconsistent with your understanding of how the language worked, because you would hear all the sentences in English which are inside your language, and the fact that you never heard those other things wouldn't be a problem. And so you could end up with everyone having a different idea of what the language they were speaking was because they weren't getting the information they needed to sort of logically rule out that hypothesis that they had. That was the incorrect hypothesis. And so Chomsky's conclusion is you need some kind of constraints to help you narrow down and get the right kinds of languages. It's called the logical problem of language acquisition. And so that problem was a consequence of thinking about learning as something like deduction. Right. As something like logic. In fact, philosophers distinguish between different kinds of problems. There's deduction going from things that are certainly true to other things that are certainly true. And there's induction, which is more like seeing a thing a bunch of times and then concluding that that thing is always going to happen. Right. You get up in the morning, sun rises. 
You get up in the morning, sun rises. You get up in the morning, sun rises, get up in the morning, you expect the sun's going to rise the next day. And then abduction is another kind of thing, which is if you see something and then you come up with an explanation for it, so something happens, and then you say, oh, that must happen because of this kind of thing. And so in neither of those cases, induction or abduction, are you doing so with certainty. Right. You're drawing a conclusion, but it's a conclusion which is an uncertain conclusion. And so that sets up a new problem, which is, how can we characterize induction and abduction with the same kind of mathematical precision that Boole and his successors gave to mathematical logic?
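The "English plus a few extra sentences" situation from this answer can be made concrete with a toy Python sketch. The three-sentence "English" and the helper `is_refuted` are assumptions for illustration only.

```python
# Toy "languages" as finite sets of sentences (assumptions for illustration).
ENGLISH = {"the cat runs", "the dog runs", "the dogs run"}
ENGLISH_PLUS = ENGLISH | {"the dogs runs"}  # the learner's too-large hypothesis

def is_refuted(hypothesis, heard_sentences):
    """A hypothesis is logically ruled out only by a heard sentence outside it."""
    return any(sentence not in hypothesis for sentence in heard_sentences)

heard = sorted(ENGLISH)  # the learner only ever hears real English
print(is_refuted(ENGLISH_PLUS, heard))      # the too-large hypothesis
print(is_refuted({"the cat runs"}, heard))  # a too-small hypothesis
```

Under these assumptions the too-small hypothesis is refuted by the data, while the too-large one never is, no matter how much English the learner hears.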
[37:57]
Gregory McNiff
Tom, you read your book very well. You anticipated my next question, which was, basically, we're moving to inductive inference, and, you know, we want to provide the rigor of mathematical theories to understand how the brain works. And I wanted to make sure that's where we're going. Right. Perfect. Okay. As I alluded to earlier, another of Wittgenstein's students was Eleanor Rosch. Could you briefly talk about her contributions to understanding categories? It was actually fascinating. I'll let you speak, but she replicated an experiment, I think, comparing her trials outside England with some students at Harvard, and got very similar, or close to similar, responses.
[38:43]
Tom Griffiths
Yeah. So Rosch was a graduate student at Harvard, and she was interested in trying to understand how we form categories. This was a preoccupation at the time. So she was sort of working a little bit after the cognitive revolution, trying to make sense of what categories are like. And there was a study at the time which showed that if you asked English speakers about colors, they were quicker to identify the colors that were sort of the best examples of that particular word. Right? So you're quicker to say that a really blue blue is blue. Right. And then she had the opportunity to go to Papua New Guinea, totally different culture from Harvard, and she ran some of these experiments there. They spoke a very different language, one where in fact they only had sort of two main color terms, which basically split up the space of all of the colors along a sort of light-dark, warm-cool axis. And despite having this very different way of dividing up the space of colors, it seemed like the results that she got were very similar and that those speakers of these very different languages were also quick to identify very blue blues and very green greens and so on. And so from that, she came up with an interesting hypothesis, which is that maybe it's not the words that we use for the colors that are sort of giving us the best examples of those colors, right? It's not a blue blue because there's a bunch of things that are blue, and then we end up with that being a sort of prototypical representation of it. It might be the other way around, which is that we have these sort of good examples of things and then we kind of build categories around them. And so her characterization of what categorization is is it's not rules, right, which is what the logical theory had said. It's more like what was called a family resemblance structure. 
This is the idea that she got from Wittgenstein, which is that we're sort of making generalizations about what the borders of categories are by comparing how similar things are to other things. And sometimes those things are just sort of like good examples of things that are given to us by our perceptual systems.
[41:14]
Gregory McNiff
Follow up there. Why is the idea of fuzzy category boundaries so important?
[41:19]
Tom Griffiths
Logic tells us things are true or false, right. And doesn't have a lot of room in between those. There are other kinds of logics people have developed that try and have sort of fuzzy logic or have multiple values and so on. But the fundamental idea is that if you're going to talk about, you know, what it is to be a cat or something like that, you're going to have a logical definition that tells you absolutely, if the thing has these properties, then it's a cat. And if it doesn't have these properties, then it's not a cat. And having fuzzy boundaries in our categories is another thing that Rosch showed. So she did a lot of work on what's called typicality, showing that a chair is a typical piece of furniture and a rug is not. And so when people think about things, that typicality influences all sorts of other things that people do when they're thinking about things. For example, people are quicker to identify a robin as a bird than a penguin as a bird, just because they're sort of radically different in terms of how typical they are as birds. So that notion of fuzziness and typicality was not something that was easily captured by the logical perspective. And so it encouraged thinking in different terms about what categories might be.
[42:57]
Gregory McNiff
You talk about how colors on the electromagnetic spectrum are wider apart, I'm paraphrasing here, than maybe mentally how we associate them. You give the example of red and violet being physically farther apart, but yet we, I guess, make an association between them relative to other colors like blue, green or yellow. The way to make sense of this is to realize that our internal representations can have a different geometry than the external world. And here I want to flag another use of mathematizing this thought by Roger Shepard. He realized that he needed a way to recover these psychological spaces from people's similarity judgments. He found a mathematical method called multidimensional scaling. Could you briefly describe why that was so useful in sort of extracting these perceptions from people's minds?
[43:46]
Tom Griffiths
Yeah. So you can think about Rosch as having shown that we have this fuzziness in the way we think about categories. One way you could try and capture that is by assuming that objects are points in space and we're sort of somehow measuring the distance between things as a way of sort of capturing the similarity between them. So that kind of perspective opens up the possibility that we might be able to identify what those spaces are based on people's judgments of similarity. Roger Shepard developed a technique for doing that called multidimensional scaling, that goes from people's similarity judgments to a guess about what their internal geometric representation might be. So if you apply multidimensional scaling to colors, you discover people's internal representation is more like a wheel, whereas the physical representation of color in terms of wavelength is more like a line. And so we can discover that there's a difference between the geometry of the world and the geometry inside our heads. And that opens the door to then being able to explore our internal world using this mathematical tool in much the same way that the external world has been studied for many years.
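To give a feel for how coordinates can be recovered from dissimilarities alone, here is a small pure-Python sketch. Two loud caveats: it uses classical (metric) multidimensional scaling via power iteration, which is a simpler stand-in and not necessarily the procedure Shepard used, and the dissimilarity numbers are invented (six hues placed evenly on a circle) rather than real similarity judgments.

```python
import math

def classical_mds(dist, dims=2, iters=200):
    """Recover coordinates from a symmetric matrix of pairwise distances."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(r) / n for r in sq]
    total_mean = sum(row_mean) / n
    # Double centering turns squared distances into inner products.
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + total_mean)
          for j in range(n)] for i in range(n)]
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        v = [math.sin(i + k + 1) for i in range(n)]  # arbitrary start vector
        for _ in range(iters):  # power iteration for the leading eigenvector
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        eigval = sum(v[i] * sum(b[i][j] * v[j] for j in range(n))
                     for i in range(n))
        for i in range(n):
            coords[i][k] = math.sqrt(max(eigval, 0.0)) * v[i]
        # Deflate: remove the component just found before the next dimension.
        b = [[b[i][j] - eigval * v[i] * v[j] for j in range(n)]
             for i in range(n)]
    return coords

# Invented dissimilarities: chord lengths between hues on a unit circle.
hues = ["red", "orange", "yellow", "green", "blue", "violet"]
angles = [2 * math.pi * i / len(hues) for i in range(len(hues))]
dissimilarity = [[2 * math.sin(abs(a - b) / 2) for b in angles] for a in angles]
coords = classical_mds(dissimilarity)
for hue, (x, y) in zip(hues, coords):
    print(f"{hue:>7}: ({x:+.2f}, {y:+.2f})")
```

In the recovered layout red and violet end up as neighbors, even though they would sit at opposite ends of a wavelength line.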
[44:56]
Gregory McNiff
Interesting. I want to move on to neural networks and learning. You describe, neural networks can be seen as a way to transform one space into another. What do you mean by that?
[45:07]
Tom Griffiths
One of the challenges that comes up when you start thinking in these spatial terms is now we have this idea, maybe concepts are regions in space and objects are points in space. But we've lost the thing we had with logic, which was a way of doing computation, right? We had this transition from being able to write things down as logical formulas to then having a way of thinking about what thought is in terms of the computations that you perform on those formulas. And neural networks are the solution to that. They're a way of mapping information from one space to another so you can think about what a neural network is. So neural networks are a sort of computing approach that's inspired by the brain, where you have a set of nodes that are connected to other nodes, and information flows through those connections between those nodes. But if you think about the activation of nodes, as each of those nodes is one dimension of a space, then you can think about their activation as a point in space. And the neural network's weights are a function that maps you from one point in one space to a point in another space. And the power of neural networks is because those connections between nodes can be adjusted through experience. They give you a way of learning the functions that transform you from one space to another, and a way of kind of learning how to do the computations that you want to do on those spaces.
[46:23]
Gregory McNiff
How did Donald Hebb's theory of learning differ from Rosenblatt's perceptron?
[46:28]
Tom Griffiths
Donald Hebb was a Canadian neuroscientist who came up with one of the earliest mathematical descriptions of what it is that neurons might be doing when they're learning. And the basic idea was that they were forming associations. And the way that they would form associations is that if one neuron gets activated and another neuron fires at the same time, then the connection between them will get stronger. Right? So if a neuron and its neighbor are firing at the same time, then they build a stronger connection. The neuroscientist Carla Shatz sort of reduced this to a slogan: neurons that fire together wire together. Frank Rosenblatt was a psychologist who really wanted to develop a mathematical model of brains as a whole and started doing that by working on the visual system. And he built something called a perceptron, which was really one of the first neural networks that was able to learn through experience. And Rosenblatt had originally gone and sort of talked to people in Hebb's lab to try and sort of find out how they were thinking about learning, and then decided that wasn't going to do what he wanted it to do and came back and then came up with his own learning algorithm. And Rosenblatt's learning algorithm had the critical difference that it was focused on not whether things are happening at the same time or sort of the shared activations between things, but on when the system was making a mistake. So in Hebb's learning algorithm, you just depend on these associations. In Rosenblatt's learning algorithm, the neural network would take a picture as input and produce an answer. Say, is this a triangle or a square? And then based on whether that was the right answer or not, it would adjust its connections. And so if the output that it was producing was too high, it would adjust a connection down. If the output it was producing was too low, it would adjust the connection up. 
And so it was making changes to the weights based on making mistakes, rather than just based on associations between things.
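The contrast between the two learning rules can be sketched as two update functions. These are simplified textbook-style forms written for illustration; the function names, the learning rate, and the example numbers are assumptions, not the historical formulations.

```python
def hebbian_update(weights, inputs, output, rate=0.1):
    """Hebb-style rule: strengthen a connection when input and output are co-active."""
    return [w + rate * x * output for w, x in zip(weights, inputs)]

def perceptron_update(weights, inputs, target, rate=0.1):
    """Perceptron-style rule: change weights only when the prediction is wrong."""
    total = sum(w * x for w, x in zip(weights, inputs))
    prediction = 1 if total > 0 else 0
    error = target - prediction  # +1: output too low, -1: too high, 0: correct
    return [w + rate * error * x for w, x in zip(weights, inputs)]

weights = [0.5, -0.2]
inputs = [1.0, 1.0]
print(hebbian_update(weights, inputs, output=1))     # always strengthens
print(perceptron_update(weights, inputs, target=1))  # correct answer: no change
print(perceptron_update(weights, inputs, target=0))  # output too high: weights go down
```

In this sketch the Hebbian rule changes the weights whenever units are active together, while the perceptron rule leaves them alone until the network makes a mistake.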
[48:32]
Gregory McNiff
You write that, in general, adding more layers to a neural network makes it easier for it to compute more complex functions of its inputs. Why is this the case?
[48:40]
Tom Griffiths
You can think about a neural network, as I said, as a mapping from one space to another. If you just have one set of nodes going in and one set of nodes going out and one set of connections between them, then you're limited in the kind of form that that function can take. In particular, you're limited to what we call linear functions, right? So if you're learning how to classify things into one class or another, you'd be restricted to learning a boundary between those classes that was a straight line through the space, right? So that's a very restrictive assumption about the kinds of functions that can be computed. If you add another layer of weights on top of that and have some nonlinearity between them, so something sort of like what neurons do, where they accumulate inputs and then fire, that's a nonlinearity because they're not just adding up what comes in and then putting it out. Then you get something which is able to now compute nonlinear functions. So that's the first thing, that just by having two layers with a nonlinearity in between, you expand the set of functions you can compute. And then one of the things that people subsequently discovered is that as you add more layers into your neural network, it makes it possible for that neural network to kind of have internal representations that cover more of the input space. And so you can start building up more complex representations as they go through the network.
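A standard small example of this point is the exclusive-or (XOR) function, which no single straight-line boundary separates. The sketch below uses hand-picked weights (an assumption; nothing is learned here) and a brute-force search over a small grid of single-layer weights, which illustrates the limitation but is not a proof.

```python
def step(x):
    """Threshold nonlinearity: the unit fires only if its summed input is positive."""
    return 1 if x > 0 else 0

def xor_network(a, b):
    """Two layers of hand-picked weights with a nonlinearity in between."""
    h_or = step(a + b - 0.5)   # hidden unit: fires if a OR b
    h_and = step(a + b - 1.5)  # hidden unit: fires if a AND b
    return step(h_or - h_and - 0.5)  # output: OR but not AND

CASES = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def single_layer_solves_xor(grid=range(-4, 5)):
    """Try every single-layer unit with integer weights and bias in the grid."""
    return any(
        all(step(w1 * a + w2 * b + bias) == target for (a, b), target in CASES)
        for w1 in grid for w2 in grid for bias in grid
    )

print([xor_network(a, b) for (a, b), _ in CASES])
print(single_layer_solves_xor())
```

The two-layer network gets all four cases right, while none of the single-layer candidates in the grid does.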
[50:14]
Gregory McNiff
What is gradient descent, and would you consider it a great name for a band?
[50:18]
Tom Griffiths
Gradient descent is the learning algorithm that followed Rosenblatt's original work. So Rosenblatt kind of derived his learning rule just by thinking about how should you change your weights in response to a mistake, and was able to prove, working with some mathematicians, that this was a really good way of designing a learning rule. With that learning rule, the system could learn any function that it could represent, which was a cool result. Gradient descent is a way of deriving learning rules where we say there's some function that we care about, in this case the error of this neural network, and we want to make that error as small as possible by adjusting the weights of the neural network. One way that we can do that is by saying, well, based on the location where we are, if I change the weights in a particular direction, is the error going to go up or down? And the answer to that question is what's called the gradient of the function, right? Whether the function is going up or down based on the values of the weights at that point. We can get more into it, but in calculus class, you learn how to calculate gradients as the derivative of the function. And there's an endnote about it for people who want the technical details. But it's not something you really need to understand to understand how this algorithm works. You can kind of think about an intuitive version of this, right? If you imagine that I take you somewhere in Princeton. Actually, this is going to work better if it's not Princeton, because you don't have to worry about running into buildings. Maybe I'll take you out into a field. And your job is to find the highest point in that field. Right. One way that you could imagine doing that is based on where you are standing, move your foot around and sort of feel around the ground around you. 
And then you could think about, if you want to find, sorry, the lowest point in that field, you would feel around and then maybe take a step in the direction where the ground seems to be going down the most. That algorithm is gradient descent. That's exactly what gradient descent does. So it's a way of deriving what the learning rule should be for the neural network by figuring out in what direction the error is going to go down. And it turns out, doing that math, you end up with something which is very closely related to the original learning rule that Rosenblatt had found. And then, I don't know if it's a good name for a band, but there are various mathematician bands that have names after algorithms. It was the subject of a joke. I go to a conference called the Neural Information Processing Systems Conference. And for the, you know, the early years of that conference, it was always held somewhere that was near a ski resort, because the workshops at the conference would be held at the ski resort. And so people would go to talks in the morning and talks in the afternoon. And then there'd be like a break from 10am to 3pm and so people would be saying that they were doing, you know, gradient descent during the break.
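The walk-downhill picture can be written as a short loop. This is a minimal sketch on a made-up one-dimensional error surface; the function names, the learning rate, and the starting point are assumptions for illustration.

```python
def error(w):
    """Toy error surface whose lowest point is at w = 3 (an assumption)."""
    return (w - 3.0) ** 2

def gradient(w):
    """Derivative of the error: positive where the ground slopes up to the right."""
    return 2.0 * (w - 3.0)

def gradient_descent(w, rate=0.1, steps=100):
    """Repeatedly take a small step in the downhill direction."""
    for _ in range(steps):
        w -= rate * gradient(w)
    return w

w_final = gradient_descent(w=-5.0)
print(w_final, error(w_final))
```

On a surface with several dips, the same loop can settle in whichever dip is nearest the starting point, which is the local minimum problem mentioned later in the conversation.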
[53:14]
Gregory McNiff
No, that's awesome. And I should say, you have really nice charts that, for me, particularly explain the sort of local minimum problem. I'll let the reader get into what exactly that is, but it's something that sort of clicked reading your book with the charts and the explanation. I want to move on to another real breakthrough, which is backpropagation. Could you describe why that was so important?
[53:34]
Tom Griffiths
Rosenblatt had solved the problem of identifying a learning algorithm that worked for a neural network with one layer of weights. He also thought about neural networks that had more layers of weights. He defined networks that had this property, even came up with a learning rule. But his learning rule didn't always work. Backpropagation is the learning rule that generalizes that approach that Rosenblatt had taken to multilayer neural networks. And the big problem that you have when you have a multilayer neural network is if you're updating the weights based on the error, then when you go back from the first layer, you have to say, okay, how do I work out how responsible a weight that goes deeper in the network is for producing an error a few steps into the future? And so it turns out that there's a simple answer to this. Dave Rumelhart and Geoff Hinton worked out what that solution looks like. You get it by applying gradient descent, calculating the derivative, and sort of pushing that back through the network. It turns out that, in fact, the math that you need to do that is something that comes from Leibniz, which is a nice connection back to our sort of earlier cognitive scientists. But that procedure is basically the way that you work out what the error is that you should be using for updating a weight that's deeper in the network. There's a simple formula that sort of says the responsibility it has for the error is based on the influence that it has on the nodes at the next level up. And you sort of sum all those up and pass those back. So the reason it's called backpropagation is that you can think about activation going forward through the network, and then error flows backwards along the weights of the network and is accumulated back at the nodes inside the network.
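Here is a minimal sketch of that backward flow for a tiny network with one input, two hidden units, and one output. The architecture, the sigmoid nonlinearity, the squared-error loss, and all names and numbers are assumptions chosen for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w_hidden, w_out):
    """Activation flows forward: input -> hidden units -> output."""
    hidden = [sigmoid(w * x) for w in w_hidden]
    output = sum(w * h for w, h in zip(w_out, hidden))
    return hidden, output

def loss(x, target, w_hidden, w_out):
    _, output = forward(x, w_hidden, w_out)
    return 0.5 * (output - target) ** 2

def backprop(x, target, w_hidden, w_out):
    """Error flows backward: the output error is passed back along the weights."""
    hidden, output = forward(x, w_hidden, w_out)
    delta_out = output - target  # error at the output node
    grad_out = [delta_out * h for h in hidden]
    # A hidden unit's share of the error depends on its outgoing weight
    # and on the slope of its own nonlinearity, h * (1 - h).
    delta_hidden = [delta_out * w * h * (1.0 - h)
                    for w, h in zip(w_out, hidden)]
    grad_hidden = [d * x for d in delta_hidden]
    return grad_hidden, grad_out

x, target = 1.5, 0.2
w_hidden, w_out = [0.4, -0.7], [0.3, 0.9]
print(backprop(x, target, w_hidden, w_out))
```

With a single output node there is only one term to pass back to each hidden unit; with several outputs, each hidden unit would sum the contributions arriving from all of them.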
[55:19]
Gregory McNiff
You write, the large language models at the heart of modern AI systems are the descendants of these early neural network models of language and the beneficiaries of insights from generations of cognitive scientists. We've been talking about a few. What are the capabilities and limitations of large language models?
[55:36]
Tom Griffiths
So, I mean, large language models are remarkable, would be the first thing that I would say. I think they're powerful in ways that transcend perhaps even the expectations of the most optimistic people who had created the earlier systems that they came from. One of those people is Jeff Elman, who developed these early neural networks that looked at predicting the next word in a sequence. And that's really the same training procedure which is used for our modern large language models. The other thing I would say is, I mean, when we compare the systems that we end up with to human cognition, they're importantly different in a few ways. One is that humans learn from far less data than those large language models do. Right. So in some sense, the models are a challenge to Chomsky, who said, we're not going to be able to learn language from our input. We have to have strong constraints. But there's another sense in which Chomsky was quite right, which is that in order for the models to learn what they learn, they need on the order of between like, 5,000 and 50,000 years of continuous speech input, whereas a child learns language in, you know, more like five years. Right. So there's sort of orders of magnitude difference in the amount of data that they need in order to learn language. Another important difference is in the generalizability of the solutions that they find. It really does seem like they're kind of getting to a similar point in terms of being able to use language and being able to act intelligently to humans, but they're getting there via this very different route where the solutions that they find are not necessarily things that are intuitive to us. And the kinds of ways in which they fail are often surprising. So our AI systems today have a property people call jagged intelligence, where you see they can do really well on one problem, and you take another problem that's right next to it, and they do really poorly. 
And the reason why that's counterintuitive is that it's a consequence of the way in which the models are trained and then finding solutions that are not like our sort of human solutions. And so the generalizability of that intelligence is quite different, where if you had a human being who was able to solve Math Olympiad problems, they would probably be able to do all sorts of other kinds of things reliably. But a neural network that can solve Math Olympiad problems is not necessarily going to be able to do other kinds of things, maybe even simpler problems, and sort of screws up in ways that surprise us.
[57:51]
Gregory McNiff
David Marr published a paper with a colleague in which he wrote, complex systems like a nervous system or developing embryo must be analyzed and understood at several different levels. What was his insight regarding the computational level in terms of explaining how we think, how the mind works?
[58:08]
Tom Griffiths
Marr was really a pioneer in thinking about the idea that when we're trying to study information processing systems like human brains, we're going to end up coming up with different kinds of explanations that are not necessarily in competition with one another. So he distinguished between three different levels at which we can explain information processing systems: the computational level, which is the abstract problem the system is solving, and the ideal solution to that problem; the algorithmic level, or level of algorithm and representation, which is about the actual concrete processes which are used to try and, you know, approximate that solution; and then the implementation level, which is about how those algorithms and representations are realized in some kind of physical substrate. And so if you think about human minds, then the implementation level is what's going on in neurons. The algorithmic level is what's going on in terms of cognitive processes. And the computational level is this more abstract level, maybe where our laws of thought reside. Right, which is about what the ideal solutions to the problems that human minds have to solve look like. And Marr, as a neuroscientist, was radical in arguing that, in fact, thinking at that computational level is particularly important for structuring the questions that we ask at those other levels. So he said something like, trying to understand the brain by studying only neurons is like trying to understand bird flight by studying only feathers. Saying that if you want to understand how a bird flies, a really important part of that is understanding the general principles of aerodynamics. And those principles of aerodynamics answer all sorts of questions about why birds' wings are shaped the way that they are, how much they have to flap, how much they need to weigh, all of these other kinds of things. 
And if you were just looking at feathers, you'd miss the fact that the whole system is being shaped by this broader set of computational principles.
[60:00]
Gregory McNiff
You have a fair amount of discussion around Bayesian probability and whether or not individuals use Bayesian probability. And you introduced Kahneman and Tversky. I think they had certain objections to it. Do we reason with Bayesian probability, or how does that factor into decisions that the mind makes and how the mind works?
[60:22]
Tom Griffiths
So there are two distinct questions we can ask now that we have the toolbox that Marr gave us, right? So one is, is Bayesian probability useful for understanding human cognition? Right. And we can ask that question at the computational level. Is it a good characterization of the problem that human minds have to solve and what the optimal solution to the problem looks like? And then the other question at the algorithmic level is, do people's sort of cognitive processes work in a way that allows them to do a good job of approximating Bayesian inference? And so psychologists kind of have gone back and forth on the utility of Bayesian thinking for the reason that these two things are distinct from one another. Kahneman and Tversky showed very convincingly that if you ask people questions that require Bayesian reasoning and express them in terms of things like probabilities, people do very badly. They make mistakes that are not consistent with probability theory, not necessarily consistent with Bayes. But that doesn't mean that Bayes isn't a valuable tool for making sense of how people do things like learn, make inductive inferences, integrate prior experiences or constraints on learning with the data that they see. And so making that distinction is useful also, because we can go back and we can look at some of the things that people do that are strange when you sort of ask them these questions framed in terms of probabilities, and say, well, maybe they were solving a different problem. And then when we analyze that different problem from the perspective of Bayesian inference, we get something which kind of makes a little bit more sense.
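For readers who want to see what "integrating prior experience with data" looks like as a calculation, here is a minimal sketch of Bayes' rule in Python. The scenario and every number in it are invented for illustration and are not taken from the book or from Kahneman and Tversky's studies.

```python
def posterior(prior, likelihood):
    """Bayes' rule: P(h | data) is proportional to P(data | h) * P(h)."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}

# Made-up numbers: a rare condition and an imperfect test that came back positive.
prior = {"condition": 0.01, "no condition": 0.99}
likelihood = {"condition": 0.90, "no condition": 0.05}  # P(positive test | h)

belief = posterior(prior, likelihood)
print(belief)
```

Even with a fairly accurate test, the low prior keeps the posterior probability of the condition well under one half here, which is the kind of answer people often find unintuitive when a problem is stated in terms of probabilities.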
[61:47]
Gregory McNiff
Perfect. I referenced ChatGPT earlier in this interview. I'd like to ask you, why are models like that such a good example, or why do they do so well at learning human languages?
[61:59]
Tom Griffiths
So large language models combine, importantly, all three of these different frameworks we've been talking about. Right? So the neural networks part, they make use of these sophisticated methods that we have for training neural networks, and they also make use of modern neural network architectures that make it easier for those models to learn complex functions. And so they give you a tool which can be used for learning any complex function given enough data. And as a consequence, give us ingredients that we would need in order to get something that can represent something as complicated as human language. In turn, the data that they're trained on is language, which captures a lot of the things that we would want symbolic systems to capture. It has all the properties of hierarchy and compositionality and all of these things that were characteristic of language that Chomsky pulled out. And in fact, they're also trained not just on human languages, but they're also trained on code. And code has a structure which actually corresponds even more closely to the sorts of phrase structure grammars that Chomsky was interested in. And so you can think about language as being a really good substrate for developing intelligence in terms of the structure that it has, as well as it being an encapsulation of human knowledge and experience that the model is able to learn from. And then the reason why it's successful in learning language and being able to use that to answer other kinds of questions is the way that that problem is presented to the model. So the models are trained to predict the next token, so the next word or part of a word, based on the sequence of the words that appeared previously. And that training means that implicitly, what the model is doing is learning a probability distribution.
So by learning the probability of the next word and then learning the probability of the next word after that one, and learning the probability of the next word after that one, this neural network is learning an enormous probability distribution over sentences and over sort of entire documents, pieces of text. And so what the system has learned how to do is, given now some arbitrary text as an input, it can use that information, do something like Bayes' rule, to condition on that information, to sort of update its beliefs, and then generate an answer which takes into account the information which is contained in the data that it's seen in a way that's appropriately Bayesian. And so it's really putting those three things together in a way that then allows it to capture some of these important ingredients for intelligence.
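[Editor's illustration, not part of the conversation] The point that next-token probabilities multiply into a distribution over whole sentences can be sketched with a toy model. This is a deliberately simplified bigram counter that conditions only on the single previous token, whereas a large language model conditions on the whole preceding sequence through a neural network. The corpus, the `<s>` and `</s>` boundary markers, and the function names are all assumptions made up for this example.

```python
import math
from collections import Counter, defaultdict

# Hypothetical three-sentence training corpus.
corpus = ["the dog barks", "the dog sleeps", "the cat sleeps"]

# Count how often each token follows each previous token.
counts = defaultdict(Counter)
for sentence in corpus:
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1

def next_token_prob(prev, nxt):
    """Estimate P(next token | previous token) from the counts."""
    total = sum(counts[prev].values())
    return counts[prev][nxt] / total if total else 0.0

def sentence_prob(sentence):
    """Chain rule: multiply the next-token probabilities along the sentence."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    return math.prod(next_token_prob(p, n) for p, n in zip(tokens, tokens[1:]))

print(sentence_prob("the dog barks"))   # a sentence seen in training
print(sentence_prob("the cat barks"))   # contains a transition never seen
print(next_token_prob("dog", "sleeps")) # conditioning on the text so far
```

Conditioning on a prefix such as "the dog" narrows the distribution over what comes next, which is the small-scale analogue of the conditioning step described above.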
[64:33]
Gregory McNiff
Fascinating. Last question. Tom, you write in the final chapter, "Putting It All Together": "Cognitive science has made a lot of progress in understanding the laws of thought, much of it in the three decades since I was drawn to the field. But we still have plenty of mysteries to inspire new generations of cognitive scientists." What are some of those mysteries? And what would you say to those budding cognitive scientists that are probably undergrads these days?
[64:56]
Tom Griffiths
Some of the mysteries that we still have are about this big gap that we have between humans and our best AI systems, right? So how is it that humans are able to learn from so much less data than our AI systems? In machine learning and cognitive science, we call this inductive bias, right? Humans are bringing something to the problem which is not captured by our current models. We'd like to understand what it is that those humans have. I think that thing is also going to be useful in understanding why the models are sort of not generalizing, right? Why they sort of have this jagged behavior. Because our human inductive biases not only allow us to learn from small amounts of data, they also make us robust in the solutions that we find. And so being able to create systems that find more human-like solutions, right? Not just being able to learn from less data, but being able to learn solutions that kind of just make more sense to us in terms of the way they're representing the world, is, again, something that can draw on cognitive science. Those are some of the questions that motivate me. And in the last chapter of the book, I also talked to a bunch of cognitive scientists who are in the earlier stages of their careers and really starting to ask new kinds of questions. Those new kinds of questions include things like how does this relate to consciousness? How do we connect these mathematical theories we have to our phenomenal experience? How do we work together, right? How do you use the insights that we have about how human minds work to understand how you can make sense of humans working together effectively or working with AI systems effectively and being able to do so in a way that is perhaps even better as a consequence of the information that we have about how those different kinds of systems work? How do we use our environment, creating artifacts and things like that that support our cognition?
And how can we do a better job of supporting human cognition? How can we understand our cognitive processes, the processes that allow us to make decisions and then help to make people smarter by getting around some of the biases that we have and sort of thinking about how to use computation to support the intelligence that we have in a way that sort of expands beyond us and incorporates the insights that we can get from our AI systems. So I think for any budding cognitive scientist out there, there are still plenty of mysteries to explore. And we're at a really exciting moment where we've gone from having one system that was kind of like human intelligence to study, to now having a bunch of these systems that we've created, giving us lots of new insights, but also lots of new questions.
[67:25]
Gregory McNiff
Well, yeah, your book does a very nice job of articulating the questions and offering some very solid answers. Tom, thank you so much for taking the time to join me today for the conversation and for writing such a thoughtful and ambitious book.
[67:39]
Tom Griffiths
Thank you.