
Jordan Cooney
The Voices of Search Podcast is a proud member of the I Hear Everything Podcast Network. Looking to launch or scale your podcast? I Hear Everything delivers podcast production, growth and monetization solutions that transform your words into profit. Ready to give your brand a voice? Then visit iheareverything.com. Welcome to the Voices of Search Podcast, a member of the I Hear Everything Podcast Network. Ready to expedite your company's organic growth efforts? Sit back, relax, and get ready for your daily dose of search engine optimization wisdom. Here's today's host of the Voices of Search podcast, Jordan Cooney.
Hello SEOs and marketers. My name is Jordan Cooney from Previsible. Joining me today is Michelle Robbins, who is the Manager of Strategic Initiatives and Intelligence at LinkedIn. LinkedIn is a social networking platform dedicated to facilitating career development and networking opportunities. It offers a platform for individuals to showcase their skills and experience to potential employers. Yesterday, Michelle and I talked about the new space race. Today we're going to continue our conversation discussing why LLMs fail and why AI alignment is needed. Okay, here's my conversation with Michelle Robbins, Manager of Strategic Initiatives and Intelligence at LinkedIn. Michelle, welcome back to the Voices of Search Podcast.
Michelle Robbins
Good to be back, Jordan. How are you?
Jordan Cooney
I'm doing really well. Yesterday was a lot of fun. Yeah, we talked about where AI is going, this race that's happening, how it connects to Google and search. It's not a conversation that I get to have often on the pod. And candidly, I don't think it's a conversation that a lot of us are thinking about, which is just where has Google come from when it comes to AI and how are they evolving as the competition and the space, with all the different models that exist, changes very rapidly? If you didn't get a chance to listen to that episode, please go back. Michelle dropped a ton of insights. Not just the history of where things are going, but where things may evolve, especially around the communication from Google and a lot of these AI models. Today we're getting into why these LLMs fail and the alignment that is needed behind them. Maybe before we go into some questions, can you set the stage for our listeners? What do you mean by LLMs fail? There's a lot of talk about this, hallucinations, whatnot, but what does that mean to you? And let's go from there in terms of this conversation.
Michelle Robbins
You know, it's interesting because I think that people get pretty dissatisfied with a lot of the results they get out of the LLMs. And we've seen, even with AI overviews as well as other people asking wild questions of the AI chat bots, the various ones, and getting interesting answers and thinking, these machines are dumb. This isn't good technology. We shouldn't be using this. It kind of discourages people from really getting the value that could be gotten out of the models. So I think it's important, first of all to understand what these things are and what they aren't. And because everyone has been conditioned to use Google in particular, you ask a question, you get a bunch of answers. You can sift through the answers and decide which one you like best or whatever. But usually people click on the first couple of results and they get what they came for. They're happy and satisfied. I, I think that a lot of people believe that these models are the same. Right. You ask a question and it's going to return you the answer. And they don't realize that these are not fact machines, these are prediction machines. Right. They're generating a response based around probabilities.
Jordan Cooney
Right.
Michelle Robbins
And I don't think people understand that because when someone says something confidently, even if it's wrong, you're inclined to believe it. You know, if I were to say, you know what, Jordan, this morning, what you have for breakfast, you probably had like eggs or bacon. I had baby. Babies are delicious. You should try, you know what I mean? Like, if somebody says something in a convincing way, you know, it changes how you interpret it and how you absorb it. And so I think that when people get bad responses out of these different technologies, then they don't understand why. Right. Or if you, you know, you've probably had this experience where you ask a question and you can ask that question. I do this all the time. I love to see how the different models will respond to the same question. So I, because I'm a nerd about this, I guess I have the pro versions of all of the models because I want to see what the. Okay, this is the top end of the model. And I'll ask the same question across ChatGPT and Gemini. I'll ask Perplexity, I'll ask Claude, and it's wild how different those responses are. You never get the same response. Right?
Jordan Cooney
Right.
Michelle Robbins
Whereas even with the only two options we have for search engines being Bing and Google, a lot of the time you could reliably get the same result whichever engine you were using for the top things, for the common things. That's not the case with these models. And even within the models themselves, you're not going to get the same response for the same question the next time you ask it. It might be slightly different. It could be radically different. You could get like a bullet list of points with the first response and with the second response, get two paragraphs of narrative. You've seen this, right?
Jordan Cooney
Yeah, absolutely. And I'm curious because we're talking about the failure of these systems, right? And I think that the uniqueness of this generative predictive structure under which they operate does create some vulnerabilities. And my question for you is, like, what's the failure rate? Like, what should our expectation as users of these beautiful tools be in terms of how often it's going to fail and how often it's going to, quote, unquote, let us down? Right. And I want to preface it by saying Google made it very easy to do another search if your first search stunk. You know, I'm putting it overly glibly, but like, that's just what they did. They made it really easy. It's a very simple user interface. Your first search wasn't very good. No problem. Do it again and it'll come very fast, your responses, right? What is our expectation with LLMs? What's the failure rate there and what should we be thinking about as users?
Michelle Robbins
So it's impossible to give a failure rate because the failure rate will depend on the knowledge and data sources that a given foundation model was trained on, how it was tuned, how it's been retrained, and what it was specifically trained to do at first. Right. So, like, what I'm talking about is, for example, Claude. Claude was a programming-first coding LLM. And so that's why it tends to score really highly on using it for code. Some of the other chats might have had a different objective function for their creation. There are a lot of smaller LLMs that are focused on medical research. You're going to get really good information and a really high accuracy rate on medical information. But if you start asking it for lasagna recipes, you might get some wild stuff. Right? So it's really hard to say across the board, across all models, you can expect to get inaccurate information back X percent of the time. It's just going to vary. It's going to vary by the model. It's going to vary by the question being asked even. Right. So it's hard to come up with a number like that. But I think understanding that our responsibility as consumers of the information that comes out of these models is to, you know, review it. Right. It's like we've all seen the news about, you know, these attorneys that I guess were using ChatGPT to file briefs and it's like you didn't even review it, you didn't even validate, you know, I mean, that's our responsibility. It's like if you just want to put your brain in a jar and use, you know, a GPT to do all of your work, like, are you even valuable?
Jordan Cooney
Yeah.
Michelle Robbins
Like, what is your core? You know, what is your value? So your value is in being able to critically evaluate the output. Right. And being able to understand. Yes, this is good. This is quality. No, I need to edit this. I think they're great starting points for a lot of things. Even for coding, though, you need to review what it's being. You want your engineers and you want your developers to use these tools because they can absolutely get you results quicker, but you want to have people who know what they're looking at to understand that, yes, this is good output, this is good code. This isn't going to bring down the system, right?
Jordan Cooney
Correct. This has been a hot topic on our pod and I want to dig into this because this whole validation, this whole how do we get to the right final expectation when we're using an LLM, whether it be for code, whether it be for science, whether it be for extremely specialized industries like legal, healthcare, accounting, these are all great places where we can utilize these LLM models to make our jobs more efficient. But one of the things that keeps coming up as a trend is you have to have some sort of baseline knowledge. You can't just expect to use these tools and get out a response. If you know nothing about accounting, if you know nothing about healthcare, if you know nothing about SEO, you put in a prompt that you know nothing about. Well, guess what? You're gonna get back Nothing. So I'm curious to get your perspective on how do we manage the utility of LLMs to ensure that outputs are accurately representing what's gonna be useful to that end user and then the utility of whatever that response is at large.
Michelle Robbins
I like to think of these tools as thought partners. So if we frame it as a thought partner, think of it as like, you and I have a conversation, I call you up and I say, Jordan, I'm struggling with this. This is what I'm seeing in this situation. What's your perspective on it? We all get together and do this all the time and it's really helpful, it's really beneficial to be able to talk through some things with another person that might have a different perspective or come to the table with a different set of information than you have available, and that makes you and I better. The more we have these kinds of conversations, right, we start thinking about other things, we start thinking in different ways. And so I think of these tools the same. They're a thought partner. So, you know, if you ask it a question about something and it returns an answer, interrogate that answer, right? Say, tell me more about this or why did you come to this conclusion? Or what does this mean? Or explain this to me. So your suggestion that you can't come to it with no baseline knowledge, I would push back a little bit because I think you could. If, let's say there was something you wanted to learn about. I'd like to learn about gardening, right? Maybe grow some food in my backyard because who knows what's going on with the supply chain these days? So, like, maybe I should be a gardener. I know nothing about gardening. But you know where I bet they.
Jordan Cooney
Get those avocado trees in now?
Michelle Robbins
Right now? Right now. So I think about, you know, how would I, how would I learn more about that? And I would 100% use a GPT, right? I would 100% go and say, you know, I'd like to start growing. Here's, here's where I live. Here's, you know, the climate. Here's how much room I have in my backyard. You know, what could I grow? What could I reasonably grow that would thrive in this environment and see what it gives back to me, right? Let's say it says green beans. I'd be like, great, thanks. And then I would probably go and find like the goddess of green beans on, you know, there's got to be a website because there's a website for everything. And I would validate that information with an expert in the field, depending on the criticality of it. Right, right. And I think that, so I think you can come to it with no knowledge and gain knowledge, but you have to think of it as, again, as a thought partner, not as someone who's just going to tell you do X, Y and Z. I mean, in some cases you can, it's, you know, if.
Jordan Cooney
There's like a low, very binary type questions or prompts.
Michelle Robbins
Yeah, right, right. Yeah. Things that are very low risk. You know, it's low risk if it tells me how to, you know, grow green beans and I don't grow any green beans. It's like, well, that's all right. But I wouldn't go to it for, you know, something that is more high risk. I don't have a great example at the moment, but.
Jordan Cooney
Well, maybe one to think about, and how we create alignment on it, is health care.
Michelle Robbins
Yeah, that's really high risk.
Jordan Cooney
Yeah, right. We've noticed this a lot in some of our conversations: one of the beautiful things about these models, maybe ChatGPT, Perplexity or the like, is that you can be incredibly personal with it, far more personal than you can be with a search bar. And you can share, you know, very specific data and facts about you as an individual. And the prompt will be very precise and the response will not be what you normally get from Google when it comes to healthcare, which is you will die, which is inevitably the truth, but not exactly what I was looking for for this particular situation that I'm dealing with, with my particular age, my particular health situation, my particular, you know, circumstances. And so one of the interesting things with healthcare is you can get very, very intimate and very specific with a prompt. What are your thoughts in these types of situations on how we're able to do that validation? Because I love thought partner. I love that as a way to be very intimate with my own needs. You explained going to experts and other things, but what are your thoughts on how you continue to evolve that as we use this technology?
Michelle Robbins
Well, again, I think it's really important to understand this gets to the alignment and the alignment problem. Right. So I personally most enjoy using Claude because I know that Anthropic has a really high bar for what they will allow the model to respond to and to put out there as knowledge. A lot of people will get frustrated because they might ask Claude something, they might ask ChatGPT something and they'll get a decent answer. And they might ask Llama, which is a completely open model and has fewer guardrails, not no guardrails, but fewer guardrails. They might ask it something and get all kinds of information. Claude, because of the way it's been aligned, is going to evaluate what you're asking and compare that question against its constitution, which is basically a set of rules that define its behavior and what it will and won't respond to and how it will respond. And so that's why oftentimes Claude will come back with, and it's not verbatim, but it'll come back with something like, I'm not able to answer this question for you. You should probably seek the advice of a doctor. Or, I've got this basic information, but this is not medical advice. You know, it's very good about setting your expectations. And I think that as we see these tools evolve, and if we see that there is, you know, harm in the real world that results from information people might get from these different tools, then those kinds of guardrails and that kind of behavior, you're going to see more and more. But at the moment, there is just such a push for people to use these because also in the utilization, the model developers get more training data, they get more information, they're able to build the next stronger, better, faster model.
And the goal of these folks, the goal of this research and the goal of this industry, really isn't to produce these tools for us as much as it is to produce AGI and then have that AGI produce ASI. And so when we talk about the alignment problem, what that specifically is is how these models are being not just trained and tuned, but then aligned for optimal human outcomes, to be not only helpful and informative, but to not be harmful. And that hasn't been solved. There is no fully aligned model. There are ways to jailbreak. Now, we didn't get into jailbreaking or anything like that, but that's a whole other show. There's ways to get information out of these models that they're not supposed to be releasing. And until that problem gets solved, and alignment is a very, very hard problem to solve, these models will or won't return the information that somebody requests, which can lead to a frustrating user experience. And so it's a hard line to walk, right? You want people to use the models, you want them to get a benefit, but you don't want harm to come.
Jordan Cooney
So let's talk more about this alignment real quick. This is a really fascinating part of this episode, and I think our listeners should take this away because alignment's a very tricky thing. When you're early in a new pioneering set of technology. It's not as easy to get clarity on what that is. And from your perspective, Michelle, where should alignment start? Is this on an individual basis? Is it education? Do we need to teach people how to use these things better? Is it on an organizational level? Is it on a macro, political, societal level? Where do you get to that place of alignment?
Michelle Robbins
So when I, when I speak about alignment, I'm talking about, like, honestly, technically, because that's a. That's a. That's a stage in the model development is alignment. That's like an official stage of development for these models. And where the challenges lie are exactly what, what you're saying. How do you align a piece of technology to every single instance that could come up. Right. So you've got problems of context, you've got problems of differences in cultures, you've got problems of even agreement on is this. Should we index on making it a secure model or index on preserving privacy? Because those are two different things. Right.
Jordan Cooney
Correct.
Michelle Robbins
Security versus privacy. Cultural context, languages, language differences and interpretation, and even how a given scenario would be interpreted, treated, or unfold across different cultures. How can you create one model that can understand all of those things? There's just a lot that is shorthand for humans. How do you build in that kind of shorthand in a model, in its training, much less in then aligning it to make sure that you're always respecting cultures, you're respecting differences, you're not putting out harmful information, whether it's visual, video or text. Right. Because the harm really comes in all of those places. But some have higher risk than others. So it's a very big challenge. And there are tools that are utilized to attempt to do this, to try and make the best performing, the highest performing, as well as the least harmful model.
Jordan Cooney
Yeah. So, Michelle, like, as we close out this episode, what's the downside if we don't get to this place of alignment within these models?
Michelle Robbins
So a lot of folks in the AI safety community do a lot of research around alignment, and a lot of what comes out of that community, and I share their concerns, is that if we can't fully align these foundation models, and when we get to AGI, Artificial General Intelligence, broadly, if we release an AGI that is not aligned, the expectation is that to get to artificial superintelligence, which is the actual end game, we're going to need to rely on the AGIs to get us there, because humans and human knowledge aren't going to be able to do it. If we release an AGI that is not fully aligned and are relying on that AGI to then create the next step, the ASI, we won't be able to have any insight into how it's created. We have insight now into controlling what's happening in the foundational LLMs and theoretically the AGI being developed. But if the AGI being developed is not fully aligned and develops the ASI, it'll be in a complete black box, a complete lock box. The horses are out of the barn at that point. So we have to get this right now because we're not going to be able to get it right later, right?
Jordan Cooney
And that's a great place for us to wrap up this episode of the Voices of Search podcast. A huge thank you to Michelle Robbins from LinkedIn for joining us. If you'd like to get in touch with Michelle, you can find a link to her LinkedIn profile in our show notes or visit her company's website, LinkedIn.com.
Ben Shapiro
Okay, thanks to Jordan Cooney, the founder of Previsible. If you'd like to get in touch with Jordan, you can find a link to his LinkedIn profile in our notes. You can contact him on Twitter. His handle is jtkoene. That's J T K O E N E. Or you can visit his company's website, which is previsible.io. That's P R E V I S I B L E . I O. Just one more link in our show notes I'd like to tell you about. If you didn't have a chance to take notes while you were listening to this podcast, head over to voicesofsearch.com where we have summaries of all of our episodes and contact information for our guests. You can also subscribe to our weekly newsletter, and you can even send us your topic suggestions or your marketing questions, which we'll answer live on our show. Of course, you can always reach out on social media. Our handle is voicesofsearch on LinkedIn, Twitter, Instagram, Facebook, or you can contact me directly. My handle is benjshap. That's B E N J S H A P. And if you haven't subscribed yet and you want a daily stream of SEO and content marketing insights in your podcast feed, we're going to publish an episode every day during the work week. So hit that subscribe button in your podcast app and we'll be back in your feed tomorrow morning. All right, that's it for today, but until next time, remember, the answers are always in the data.
Voices of Search Podcast Summary: "Why LLMs Fail (and why AI Alignment is Needed)"
Podcast Information:
Hosts:
[00:43] Jordan Cooney:
Jordan welcomes listeners and introduces Michelle Robbins from LinkedIn. He references their previous episode about the "new space race" in AI and sets the agenda for today’s discussion: understanding why Large Language Models (LLMs) fail and the necessity of AI alignment.
Jordan’s Opening Remarks:
"If you didn't get a chance to listen to that episode, please go back. Michelle dropped a ton of insights."
— Jordan Cooney [00:43]
Michelle Robbins on LLM Limitations
Michelle begins by addressing common frustrations with LLMs, such as hallucinations and inconsistent responses. She emphasizes that many users mistakenly treat LLMs as "fact machines" rather than "prediction machines."
"These are not fact machines, these are prediction machines. Right. They're generating a response based around probabilities."
— Michelle Robbins [03:45]
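To make the "prediction machine" idea concrete, here is a minimal Python sketch of how a response can be drawn from probabilities rather than looked up as a fact. It is a toy illustration only: the token scores in `TOY_LOGITS` and the helper names `softmax` and `sample_token` are assumptions made up for this example, not the internals of any model discussed in the episode.

```python
import math
import random

# Hypothetical next-word scores for a prompt like "For breakfast I had ...".
# These numbers are invented for illustration, not taken from any real model.
TOY_LOGITS = {"eggs": 2.0, "bacon": 1.5, "cereal": 1.0, "lasagna": -1.0}


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; a lower temperature sharpens them."""
    scaled = {tok: score / temperature for tok, score in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(s - peak) for tok, s in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def sample_token(logits, temperature=1.0, rng=random):
    """Draw one token at random, weighted by its probability."""
    probs = softmax(logits, temperature)
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


if __name__ == "__main__":
    # Asking the "same question" five times can give different answers.
    for _ in range(5):
        print(sample_token(TOY_LOGITS))
```

Because each answer is a weighted random draw, the most likely word wins most often but not always, which is one plausible reading of why the same question can come back with a different response the next time it is asked.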
Jordan’s Inquiry on Failure Rates
Jordan probes the practical implications of LLM failures, asking what failure rate users should expect and contrasting LLMs with traditional search, where rerunning a poor query is quick and easy.
"What's the failure rate there and what should we be thinking about as users?"
— Jordan Cooney [05:24]
Michelle’s Response:
"It's impossible to give a failure rate because the failure rate will depend on the knowledge and data sources that a given foundation model was trained on."
— Michelle Robbins [06:23]
Responsibility of Users
Michelle stresses the importance of users critically evaluating and validating the information provided by LLMs.
"Our responsibility as consumers... is to review it."
— Michelle Robbins [07:05]
Jordan’s Expansion on Thought Partnership
Jordan argues that users need baseline knowledge of a subject to use LLMs effectively and to judge whether a response is useful and accurate.
"If you know nothing about SEO, you put in a prompt that you know nothing about. Well, guess what? You're gonna get back nothing."
— Jordan Cooney [09:12]
Michelle on LLMs as Learning Tools
Michelle discusses how LLMs can aid users in learning new subjects by providing initial guidance, which users can then validate through expert sources.
"I like to think of these tools as thought partners... You have to think of it as, again, as a thought partner, not as someone who's just going to tell you do X, Y and Z."
— Michelle Robbins [09:48]
Example Provided: Starting a gardening hobby with LLM assistance, followed by validation from dedicated resources or experts.
"I'd like to start growing... And then I would probably go and find like the goddess of green beans on, you know, there's got to be a website because there's a website for everything."
— Michelle Robbins [11:03]
Defining AI Alignment
Michelle introduces the concept of AI alignment, explaining it as the process of ensuring that AI systems behave in ways that are beneficial and non-harmful to humans.
"Alignment is a very, very hard problem to solve."
— Michelle Robbins [16:55]
Model-Specific Alignment Efforts
Michelle highlights how different models, like Claude, incorporate strict alignment measures to prevent the distribution of inappropriate or harmful content.
"Claude will come back with something like I'm not able to answer this question for you. You should probably seek the advice of a doctor."
— Michelle Robbins [13:52]
Potential Risks
Michelle warns of the dire consequences if AI alignment is not successfully implemented, particularly as we approach Artificial General Intelligence (AGI) and Artificial Superintelligence (ASI).
"If we release an AGI that is not fully aligned... the horses are out of the barn at that point."
— Michelle Robbins [19:20]
Michelle’s Call to Action
Michelle emphasizes the urgency of addressing AI alignment to prevent catastrophic outcomes in the evolution of intelligent systems.
"We have to get this right now because we're not going to be able to get it right later, right?"
— Michelle Robbins [20:42]
Closing Remarks by Jordan and Ben Shapiro
Jordan wraps up the episode by thanking Michelle Robbins for her insights and providing listeners with information on how to connect with both speakers via LinkedIn and other platforms.
Final Takeaway: The alignment of AI systems is crucial to harnessing their potential benefits while mitigating risks. Users must engage responsibly with LLMs, and developers must prioritize robust alignment strategies to ensure AI technologies evolve safely and ethically.
"Until next time, remember, the answers are always in the data."
— Ben Shapiro [22:04]
Resources Mentioned:
For more detailed insights and episode summaries, visit voicesofsearch.com.