Transcript
Vishal Misra (0:00)
Anthropic makes great products. Claude Code is fantastic, Cowork is fantastic. But they are grains of silicon doing matrix multiplication. They don't have consciousness. They don't have an inner monologue. Take an LLM and train it on pre-1916 or pre-1911 physics and see if it can come up with the theory of relativity. If it does, then we have AGI.
Martin Casado (0:21)
Just today, by the way, Dario allegedly said that you can't rule out that they're conscious.
Vishal Misra (0:27)
You can rule out that they're conscious. Come on. To get to what's called AGI, I think there are two things that need to happen.
Podcast Host (0:33)
Five years ago, Vishal Misra got GPT-3 to translate natural language into a domain-specific language it had never seen before. It worked. He had no idea why. So he set out to build a mathematical model of how LLMs actually function. The result? A series of papers showing that transformers update their predictions in a precise, mathematically predictable way. In controlled experiments, the models matched the theoretically correct answer almost perfectly. But pattern matching is not intelligence: LLMs learn correlation; they don't build models of cause and effect. To get to AGI, Misra argues, we need the ability to keep learning after training and a move from correlation to causation. Martin Casado speaks with Vishal Misra, professor and Vice Dean of Computing and AI at Columbia University.
Martin Casado (1:28)
Vishal, it's great to have you in.
Vishal Misra (1:30)
Great to be back.
Martin Casado (1:31)
This is one of my favorite topics, which is how do LLMs actually work? And in my opinion, you've done the best work on this, modeling it out.
Vishal Misra (1:39)
Thank you.
Martin Casado (1:39)
For those that did not see the original one, it's probably worth doing a quick background on what led you to this point, and then we'll go into the current work that you've been doing.
Vishal Misra (1:50)
Five years ago, when GPT-3 was first released, I got early access and started playing with it. I was trying to solve a problem related to querying a cricket database, and I got GPT-3 to do in-context learning, few-shot learning. It was, at least to me, the first known implementation of RAG, retrieval-augmented generation, which I used to get GPT-3 to translate natural language into something that could be used to query a database GPT-3 had no idea about. I had no access to GPT-3's internals, but I was still able to use it to solve that problem. It worked beautifully. We deployed this in production at ESPN in September 2021.
