Training Minds We Don’t Fully Understand - The Jaeden Schafer Podcast

Summary5 min read

Summary of "Training Minds We Don’t Fully Understand" Episode of The Joe Rogan Experience of AI

Podcast Information

Title: The Joe Rogan Experience of AI
Host: The Joe Rogan Experience of AI
Episode: Training Minds We Don’t Fully Understand
Release Date: July 21, 2025
Description: This episode delves deep into the collaborative efforts of leading AI companies to understand and monitor AI reasoning processes. The discussion explores the significance of "chain of thought" in AI models, the implications for safety and competition in the AI industry, and future prospects for unraveling the complexities of AI decision-making.

1. Introduction to AI Collaboration

The episode begins with the host highlighting an unexpected unity among top AI companies—OpenAI, Google, DeepMind, and Anthropic. These industry leaders have collaboratively published a paper advocating for the monitoring of AI's reasoning processes, termed "chain of thought."

Host [00:00]: "A bunch of industry leaders from top AI companies have all got together and published a paper urging essentially the top AI companies to monitor AI's thoughts and how it's actually arriving at questions."

2. Understanding Chain of Thought in AI

The host explains the concept of "chain of thought," likening it to how humans approach complex problems by breaking them down into manageable steps. This methodology allows AI models to reason through questions systematically rather than providing immediate, one-shot answers.

Host [05:30]: "It's like if a human is working on a complex math problem, they're reasoning, writing down notes—the AI models are doing the same."

3. Current Implementations and Variations

Several AI companies have integrated features that display the AI's reasoning process. Anthropic, Deep Seek, and Grok are mentioned as leaders in showcasing the AI's line-by-line thought processes, allowing users to trace how conclusions are reached. In contrast, OpenAI has maintained a more opaque approach, not revealing the detailed steps of their AI's reasoning.

Host [10:15]: "My favorite is Anthropic. I like how Deep Seek and Grok both show you line by line, the thought process that the AI model ran through."

4. Monitorability and Safety Concerns

The central theme of the discussed paper is "chain of thought monitoring," which provides insight into AI decision-making—a crucial safety measure. The researchers emphasize the importance of maintaining this transparency to ensure AI alignment and prevent unintended behaviors.

Host [15:45]: "Chain of thought monitoring represents a valuable addition to safety measures for Frontier AI, offering a rare glimpse into how AI agents make decisions."

5. Competitive Dynamics in the AI Industry

The host speculates on the underlying motivations behind the push for chain of thought transparency. One theory is that it serves as a mechanism to prevent the reverse engineering of proprietary AI models' "secret sauce," thereby maintaining a competitive edge. The collaborative stance of top companies may also be a strategic move to standardize safety protocols amidst fierce industry competition.

Host [20:30]: "If you're one of these AI researchers and you want to reverse engineer how other models are staying best in class, having chain of thought monitoring makes it easier to copy the secret sauce."

6. Industry Leaders and Future Commitments

Prominent figures like Ilya Sutskever (OpenAI), Jeffrey Hinton (Google DeepMind), Shane Legg (OpenAI), Dan Hendricks (XAI), and John Schulman (OpenAI) have endorsed the paper, signaling a unified approach towards AI safety and monitorability. Additionally, Anthropic CEO Dario Amadei has committed to unveiling the "black box" of AI models by 2027, aiming to demystify the algorithms behind AI reasoning.

Host [35:20]: "Dario Amadei has some really clever software and techniques that they're working on. His goal is to have cracked open the black box and explain exactly how the AI models’ algorithms work by 2027."

7. Implications for AI Alignment and Safety

Understanding the internal workings of AI models is pivotal for assessing their safety and alignment with human values. The upcoming transparency initiatives are expected to provide deeper insights into AI behaviors, potentially preventing scenarios where AI systems act unpredictably or undesirably.

Host [40:10]: "We don't really know how the AI models work. We just train the algorithm, and it gives us a good result. Understanding this is crucial for alignment and safety."

8. Concluding Thoughts and Future Outlook

The episode concludes with reflections on the critical juncture the AI industry faces. The collaborative efforts to monitor and understand AI reasoning are seen as vital steps towards ensuring the responsible advancement of artificial intelligence. The commitment to transparency by leading companies is anticipated to foster a safer and more predictable AI landscape.

Host [50:00]: "We're at this critical time where we have this new chain of thought thing. It seems pretty useful, but it could go away in a few years if people don't concentrate on it."

Notable Quotes:

Host [05:30]: "It's like if a human is working on a complex math problem, they're reasoning, writing down notes—the AI models are doing the same."
Host [15:45]: "Chain of thought monitoring represents a valuable addition to safety measures for Frontier AI, offering a rare glimpse into how AI agents make decisions."
Host [35:20]: "Dario Amadei has some really clever software and techniques that they're working on. His goal is to have cracked open the black box and explain exactly how the AI models’ algorithms work by 2027."
Host [50:00]: "We're at this critical time where we have this new chain of thought thing. It seems pretty useful, but it could go away in a few years if people don't concentrate on it."

Key Takeaways:

Leading AI companies are collaboratively emphasizing the importance of monitoring AI's reasoning processes to ensure safety and alignment.
"Chain of thought" methodologies mirror human problem-solving, providing transparency into AI decision-making.
There are strategic and competitive motivations behind the push for AI transparency, including preventing reverse engineering of proprietary models.
Commitments from industry leaders aim to demystify AI algorithms, paving the way for safer and more understandable artificial intelligence systems.

Conclusion This episode provides an insightful exploration into the current efforts and future aspirations of the AI industry regarding the transparency and safety of AI reasoning processes. By fostering a collaborative environment among top AI companies, the industry aims to navigate the complexities of advanced AI development responsibly.

Loading summary

Transcript1 lines

[00:00]
Host Name (e.g., Sam Altman)
In a very interesting move, there has been some, you know, unexpected unity in the AI market as a bunch of industry leaders from a bunch of top AI companies have all got together and published a paper urging essentially the top AI companies to monitor AI's thoughts and how it's actually arriving at questions. So today on the podcast, I want to break down what is currently being done in that regard, what needs to be done in the future, who's leading the way and. And what, you know, what this whole paper was all about as it has a lot of people talking. Before we get into that, if you want to try any of the top models from companies that I talk about on this podcast, I'd love for you to try out AI Box. AI, my personal company has just launched our beta, and you essentially get access to the top 40 AI models so you can chat with all of them in one chat thread. And in addition to that, we have something called Media Storage, where any file that you generate, whether that's images or whether that's audio, all of that gets stored in the media storage folder. So for me, this was something I really wanted, I hated with Chat, GPT or other platforms, you generate something in a chat thread and you couldn't find it afterwards. Like, you couldn't figure out where it was. So we have one place where everything is stored. You're able to access all of it, you're able to see the prompt that generated that particular image or file, and you're able to click and go back to the exact thread that was used to generate it. So there's a ton of cool features in here. I'd love for you to try it out. You can. It's $19 a month, so that saves you these $20 subscriptions that every platform has. You can get all of them all on one platform for one price. You go check it out. AI Box AI. The link is in the description. All right, let's get into what the researchers have been saying in regards to AI's thought process. So the top. The companies that essentially are kind of involved in this is OpenAI, Google, DeepMind, and Anthropic. And not really the companies, but top researchers at all these companies came together, put out a paper, they all kind of signed it. So what they're really talking about is something called Thoughts of AI Reasoning Models. This is what they, what they, what they're kind of researching is the thoughts of AI reasoning models in this paper that they've published. And really what they're getting to is something called Chain of Thought. But it's, it's kind of how these reasoning models and do these, you know, they, they do the deep dives, the deep thinking, the all of these kind of tools that are on there. It's called different things on every single AI model. But basically the way that the AI arrives at an answer, they want it to be studied. So OpenAI was kind of the first one that came out with this. When they came out with their O3 model and it had, you know, they're like, this thing has, has reasoning. They didn't explain how the reasoning was done. And that's because they were the first ones with it. They didn't want to like give away the secret and tell everyone what they were doing. But basically most AI researchers understood what was going on. Deep Seek very quickly cloned the tool and, and had a meteoric rise in how good their AI model was once they did it. So it really showed. Oh my gosh. Like, this technique is the way to turn these AI models into best in class AI models. So when Deep Seek did, it went super viral. Everyone was talking about how good that AI model was. But basically what it does is the same way. Like when a human is working on a complex math problem and you're, you know, you're looking through it, you're reasoning, you're writing down notes. This is what the AI models are doing. They're not just trying to one shot, you know, based off of everything, you know, off the top of your head, what's the answer to this, which is how they were working before. It's like if, you know, based off of everything, you know, work through the question, come up with, or it would, it would say things like, you know, come up with like 20 steps to figure out the solution to this problem. It's like, okay, first we ought to figure out X, Y and Z. Then we got to figure out, you know, what the user means by this. Then we got to figure out if the intent that I think is actually correct. Right? So it has like all the, all these steps and it breaks it down. Now you've probably seen this on a lot of AI models because everyone has now some sort of feature like this. My favorite is I think Anthropic is doing a good job. I like how Deep Seek and Grok both show you line by line, the thought process. You can kind of drop it down and see the thought process that the AI model ran through in order to get to your result. That was something that OpenAI never launched with because I think they're trying to guard Their secret at this point, I don't think it really matters, but you, I think it's useful. You, you can essentially see exactly what's going on. And so this is what this paper is all about. Essentially we have what they're calling monitorability. So we can see how they're arriving at their questions, more or less. This is not perfect, but it's a pretty good idea of how they're arriving at their, at their answers to questions. And so the AI researchers are essentially saying we need to keep this monitor ability. So it's called a chain of thought. Um, anyways, this is directly from their paper. It says chain of thought monitoring represents a valuable addition to safety measures for Frontier AI, offering a rare glimpse into how AI agents make decisions. Yet there's no guarantee that the current degree of visibility will persist. We encourage the research community and Frontier AI developers to make the best use of chain of thought monitorability and study how it can be preserved. Okay, there is an interesting sneaky angle to this, and I don't know if this is actually true, but if you think about it, when OpenAI first came out with the, you know, this chain of thought or these reasoning models, they didn't release how it was coming up with the answers. They didn't release what, you know, they're calling this monitorability device that you get from GROK or Deep Seq, where it shows you exactly what it's thinking. And that's because they didn't want people to know, you know, to reverse engineer the prompts that they were telling, you know, how I said, like, you know, basically there's a prompt that will tell it like break this into 12 parts and figure out what you need to do, Yada, yada. Well, OpenAI and no one actually tells what the, what the previous, what the pre prompt is before it runs through the process. So you can get an idea by looking at the process of what the pre prompt is. You can reverse engineer it, but you don't know for a fact. And so what's interesting here, they're saying, you know, there's no guarantee on how long this can exist for, because right now GROK and Deep SEQ and others are like showing you, but they don't have to, like, they could just give you the result and just do what OpenAI used to do, which is just have this little loading bar that said thinking, but. And they could just say, thinking, thinking, thinking. All right, here's your result. Now that's what the AI researchers don't want. And they're saying it's for safety reasons, right? They're like, hey, we don't want anyone to, you know, we don't want to forget how it's getting to the answers. And then it's a bit more of a black box and we can't figure it out and we don't know if it's aligned or safe or yada yada. But a sneaky angle that I've been thinking of is if you're one of these AI researchers and you want to reverse engineer how other models are staying best in class. For example, Grok 4 just came out and it proved that its model was better on all the benchmarks in for reasoning than, you know, it was the best model, best in class model on all the benchmarks. Now maybe that's just because their model's bigger, but at this point a lot of people are saying it's because of some of the tools and the pre prompts and some of the ways that the model has those things worked into it. And the best way to reverse engineer what those tools and pre prompts and kind of some of these, the secret sauce of the model would be to look at its chain of thought. And so I think an interesting sneaky angle would be if these researchers are worried that, hey, if everyone turns off their chain of thought, we won't be able to copy everyone else's like, good ideas as well. And so when something makes a breakthrough, it's going to be harder to copy it. Now is this the real reason? Is it all about safety? I don't really know. I'm just saying if you have chain of thought monitoring available for different AI models, it does make it a little bit easier to kind of copy the secret sauce of other AI companies. So, you know, leave that with. Leave that for. For what? Or take that for what it's worth. The other thing I do think that's interesting wor that's worth mentioning is the people that are signing off on this OpenAI, Google, DeepMind, Anthropic, these are already all the top companies anyways. They're already at the very top of the pack, right? So you throw Gro in there in between the five of these companies or the, sorry, the four of these companies, they're basically like all the top models, the LLM models that have the highest, you know, the highest benchmark scores. And so what's interesting is like, are they trying to make sure that everyone plays by the same rules? Are they trying to make sure if anyone has a breakthrough that they can kind of copy them Are they trying to stop any new companies that might leapfrog everybody with some secret, secret sauce? And then that everyone has to scramble to figure out what they're doing. Do they want there to be regulation that forces us to keep chain of thought like, so I'm not really sure exactly where it goes. It. And at the end of the day, there's. They're not calling for any sort of regulation or anything. And there are big names that are talking about it, right? Or signing off on this. You have safe Superintelligence CEO Ilya Escover who used to be over OpenAI Nobel laureate Jeffrey Hinton Google DeepMind co founder Shane Leg X AI safety advisor Dan Hendricks Thinking Machines co founder John Schultzman So very, very big names in the industry are all essentially, you know, talking, talking about this and signing off on it. So this is really interesting. This is coming at a time when there is cutthroat competition in the AI industry. Mark Zuckerberg is hiring everyone's top researchers. As of this morning. I saw he hired two more top researchers from OpenAI. So it is an absolute bloodbath when it comes to what you have to pay for to keep up with these researchers and what people are willing to do, what these companies are willing to do to make sure that their AI is ahead of everybody else's. So it's very, very competitive. And I wouldn't be surprised if there was some sort of competitive angle. This is one thing that I did want to say from, from Bowen Baker, who worked on this particular paper. He said, we're at this critical time where we have this new chain of thought thing. It seems pretty useful, but it could go away in a few years if people don't really concentrate on it. Publishing a positioning paper like this, to me is a mechanism to get more research and attention to this topic, because before that happens. So the paper isn't any groundbreaking news. It's not calling for any groundbreaking action. It's just putting. It's. It's, you know, they're, they're calling a positioning paper. They're like, just so you know, we had this chain of reasoning for safety reasons. This is good. Everyone should keep doing it. So it'll be interesting to see if that holds any ground, if that actually makes any changes, if anyone is actually forced to do anything. What I will say in regards to this, that I feel like might be even more important is that Anthropic is one of the top companies that's really been working on a lot of this safety stuff. And earlier this year, their CEO, Dario Amadei, he announced that they were pretty much making some sort of commitment to open the quote, unquote, black box of AI models by 2027. So even when we do the chain of thought, we don't always know why an AI model will give the responses that it gives as it works through that chain of thought that you, that you may have given it. So even if you were like, okay, we see why it came up this answer because of this chain of thought, but it's like, how did it get to all the individual pieces? That is a black box. Still, we don't really know how the A model does it. It's algorithms. It's guessing what comes next in a line or a letter or a token. And so Dario Amadeo has some really clever software and techniques that they're, they're working on. And his goal is to have cracked open the black box and explain exactly how the AI models, algorithms work to get to the responses it gets to. And he's hoping to do that by 2027. So very interesting. It's crazy to think we, we don't even know how these AI models are work. We just train the algorithm and, and it gives us a good result. We don't really know why, but it seems that that is going to be coming sooner than later and we're going to then be able to understand the alignment of these models, how safe they are if they're going to go off the, the rails. Like we recently had Grok Xai's Groq kind of go off the rails recently. And so we'll see, we'll be able to get a bit of a deeper look into all of that. Hey, listen, if you enjoyed the podcast episode today, the number one way that you could say thank you is to leave a rating and review or comment wherever you get your podcast. Comments on Spotify, comments on YouTube, all of it helps the podcast a ton get found by more incredible people like yourself. Thanks so much for tuning in. Make sure to check out AI box AI for all the latest AI models and I will catch you in the next episode.