Transcript
Piyush Kadam (0:00)
This is episode 706 of the AWS
Shruti Koparkar (0:03)
Podcast, released on February 3rd, 2025. Hello, everyone. Welcome to the official AWS Podcast. This is Shruti Koparkar, and I will be your host for today's episode, where we will discuss LLMOps with Amazon SageMaker. So LLMs, or large language models, are powering innovative applications across industries, from, say, creating really creative content to providing insightful analysis. But building these powerful models and, more importantly, deploying them and managing them in production is a complex challenge. And LLMOps is the solution. So this podcast is for data scientists, ML engineers, MLOps engineers, or anyone really who is interested in operationalizing large language models. So tune in as we explore the best practices, tools, and strategies for successful LLMOps. Now joining me today are Piyush and Lauren. Welcome both, and why don't you introduce yourselves? So, Piyush, maybe you go first.
Piyush Kadam (1:13)
Hey folks, I'm Piyush Kadam. I'm a senior product manager with Amazon SageMaker. I'm a longtime Amazonian and excited to talk more about this topic.
Lauren Mullinax (1:23)
Hey everyone, I'm Lauren Mullinax. I'm a senior AI/ML specialist solutions architect. Happy to be here, Shruti, and excited to talk more about LLMOps.
Shruti Koparkar (1:32)
Excellent. Okay, so before we dive into sort of the details of LLMOps and all the cool functionality we have in Amazon SageMaker, maybe let's just kick off with the basics. What is LLMOps? Like, we are familiar with the MLOps term. What is LLMOps, and why is it so important as this AI landscape evolves?
Piyush Kadam (1:59)
Yeah, for sure. I think the rapid innovation in AI is generating a lot of acronyms and terminology, so it would be good to demystify what LLMOps is. It really started probably 20, 30 years back with DevOps, where you needed a set of tools and best practices to manage your software. You have written a cool piece of software, but how do you go and distribute it to thousands or millions of your end users? That's when you got, via DevOps, innovations like CI/CD pipelines, code repositories, automated rollbacks, A/B testing frameworks, et cetera.

Then about 10, 15 years ago, when AI stepped outside of academia into the industry, you needed to morph that DevOps framework to fit ML models. That's when you got the world of MLOps. It's still a software artifact, but slightly different: you needed new capabilities around training models, monitoring them, and deploying them, and you handle those differently than a traditional piece of software. You also got things like experimentation, specifically during model creation time. That's where things like MLflow came out.

And now, of course, we are in the era of foundation models and large language models. In the last couple of years, these MLOps tools have had to evolve to work with the new paradigms that foundation models have introduced. Back in the time of traditional ML, you had to train your models from scratch with your own data, make sure those models were performing well, and then go deploy them. But now foundation models have cut short that initial journey. You can just take foundation models off the shelf from third-party providers or from open source repositories, but you still need to make sure that they work well for your application and that they pass your evaluation criteria. And so the MLOps tools have now morphed into LLMOps tools.

There are, of course, new paradigms that need to be accounted for too. For example, prompt engineering: this is completely new in the world of LLMs, and these tools now have to be able to experiment with different prompts, which was not a thing in MLOps. So LLMOps basically encompasses the frameworks and best practices that let you take these off-the-shelf foundation models, validate them, test them out, and then deploy them at scale, so that you are generating responsible and cost-effective responses for your applications.
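[Editor's note: to make the prompt-experimentation idea above concrete, here is a minimal sketch of how prompt variants could be tracked as MLflow runs. The `call_model` and `score_response` functions are hypothetical placeholders for whatever model endpoint and evaluation metric you actually use; only the MLflow tracking calls (`set_experiment`, `start_run`, `log_param`, `log_metric`) are standard MLflow APIs.]

```python
# Minimal sketch: tracking prompt-engineering experiments with MLflow.
# call_model and score_response are hypothetical stand-ins for your own
# model endpoint and evaluation logic; the MLflow calls are standard.
import mlflow


def call_model(prompt: str) -> str:
    # Placeholder: in practice this would invoke your foundation model
    # (e.g., a SageMaker endpoint or a third-party API).
    return "model response for: " + prompt


def score_response(response: str) -> float:
    # Placeholder: in practice this could be an LLM-as-judge score,
    # a reference-based metric, or a human rating.
    return len(response) / 100.0


prompt_variants = {
    "v1_terse": "Summarize the following support ticket in one sentence: {ticket}",
    "v2_structured": (
        "You are a support analyst. Summarize the ticket below as bullet "
        "points covering issue, impact, and next step: {ticket}"
    ),
}

ticket = "Customer reports intermittent 504 errors after last night's deploy."

mlflow.set_experiment("prompt-engineering-demo")

for name, template in prompt_variants.items():
    with mlflow.start_run(run_name=name):
        prompt = template.format(ticket=ticket)
        response = call_model(prompt)

        # Log the prompt template and the quality score so variants can be
        # compared side by side in the MLflow UI, just like training runs
        # were compared in classic MLOps.
        mlflow.log_param("prompt_template", template)
        mlflow.log_metric("quality_score", score_response(response))
```

The point of the sketch is that the same experiment-tracking habits from MLOps carry over: each prompt variant becomes a run with logged parameters and metrics, so choosing a prompt is an auditable, repeatable decision rather than guesswork.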
