Transcript
Piyush Kadam (0:00)
This is episode 706 of the AWS
Shruti Koparkar (0:03)
Podcast, released on February 3rd, 2025. Hello, everyone. Welcome to the official AWS Podcast. This is Shruti Koparkar, and I will be your host for today's episode, where we will discuss LLMOps with Amazon SageMaker. So LLMs, or large language models, are powering innovative applications across industries, from, say, creating really creative content to providing insightful analysis. But building these powerful models and, more importantly, deploying them and managing them in production is a complex challenge. And LLMOps is the solution. So this podcast is for data scientists, ML engineers, MLOps engineers, or anyone really who is interested in operationalizing large language models. So tune in as we explore the best practices, tools, and strategies for successful LLMOps. Now joining me today are Piyush and Lauren. Welcome both, and why don't you introduce yourselves? So, Piyush, maybe you go first.
Piyush Kadam (1:13)
Hey folks, I'm Piyush Kadam. I'm a senior product manager with Amazon SageMaker. I'm a longtime Amazonian and excited to talk more about this topic.
Lauren Mullinax (1:23)
Hey everyone, I'm Lauren Mullinax. I'm a senior AI/ML specialist solutions architect. Happy to be here, Shruti, and excited to talk more about LLMOps.
Shruti Koparkar (1:32)
Excellent. Okay, so before we dive into sort of the details of LLMOps and all the cool functionality we have in Amazon SageMaker, maybe let's just kick off with the basics. What is LLMOps? Like, we are familiar with the MLOps term. What is LLMOps, and why is it so important as this AI landscape evolves?
Piyush Kadam (1:59)
Yeah, for sure. I think the rapid innovation in AI is generating a lot of acronyms and terminology, so it would be good to demystify what LLMOps is. It really started probably 20, 30 years back with DevOps, where you needed a set of tools and best practices to manage your software. You have written a cool piece of software, but how do you go and distribute it to thousands or millions of your end users? That's when you got, via DevOps, innovations like CI/CD pipelines, code repositories, automated rollbacks, A/B testing frameworks, et cetera.

Then about 10, 15 years ago, when AI stepped outside of academia into the industry, you needed to morph that DevOps framework to fit ML models. That's when you got the world of MLOps. It's still a software artifact, but slightly different: you needed new capabilities around training models, monitoring them, and deploying them, and you handle those differently than a traditional piece of software. You also got things like experimentation, specifically during model creation time. That's where things like MLflow came out.

And now, of course, we are in the era of foundation models and large language models. In the last couple of years, these MLOps tools have had to evolve to work with the new paradigms that foundation models have introduced. Back in the time of traditional ML, you had to train your models from scratch with your own data, make sure those models were performing well, and then go deploy them. But now foundation models have cut short that initial journey. You can just take foundation models off the shelf from third-party providers or from open source repositories, but you still need to make sure that they work well for your application and that they pass your evaluation criteria. And so the MLOps tools have now morphed into LLMOps tools.

There are, of course, new paradigms that need to be accounted for too. For example, prompt engineering: this is completely new in the world of LLMs, and these tools now have to be able to experiment with different prompts, which was not a thing in MLOps. So LLMOps basically encompasses the frameworks and best practices that let you take these off-the-shelf foundation models, validate them, test them out, and then deploy them at scale, so that you are generating responsible and cost-effective responses for your applications.
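[Editor's note: to make the prompt-experimentation idea above concrete, here is a minimal sketch of how prompt variants could be tracked as MLflow runs. The `call_model` and `score_response` functions are hypothetical placeholders for whatever model endpoint and evaluation metric you actually use; only the MLflow tracking calls (`set_experiment`, `start_run`, `log_param`, `log_metric`) are standard MLflow APIs.]

```python
# Minimal sketch: tracking prompt-engineering experiments with MLflow.
# call_model and score_response are hypothetical stand-ins for your own
# model endpoint and evaluation logic; the MLflow calls are standard.
import mlflow


def call_model(prompt: str) -> str:
    # Placeholder: in practice this would invoke your foundation model
    # (e.g., a SageMaker endpoint or a third-party API).
    return "model response for: " + prompt


def score_response(response: str) -> float:
    # Placeholder: in practice this could be an LLM-as-judge score,
    # a reference-based metric, or a human rating.
    return len(response) / 100.0


prompt_variants = {
    "v1_terse": "Summarize the following support ticket in one sentence: {ticket}",
    "v2_structured": (
        "You are a support analyst. Summarize the ticket below as bullet "
        "points covering issue, impact, and next step: {ticket}"
    ),
}

ticket = "Customer reports intermittent 504 errors after last night's deploy."

mlflow.set_experiment("prompt-engineering-demo")

for name, template in prompt_variants.items():
    with mlflow.start_run(run_name=name):
        prompt = template.format(ticket=ticket)
        response = call_model(prompt)

        # Log the prompt template and the quality score so variants can be
        # compared side by side in the MLflow UI, just like training runs
        # were compared in classic MLOps.
        mlflow.log_param("prompt_template", template)
        mlflow.log_metric("quality_score", score_response(response))
```

The point of the sketch is that the same experiment-tracking habits from MLOps carry over: each prompt variant becomes a run with logged parameters and metrics, so choosing a prompt is an auditable, repeatable decision rather than guesswork.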
