Ray & KubeRay, with Richard Liaw and Kai-Hsun Chen - Kubernetes Podcast from Google

Summary

Kubernetes Podcast from Google – Episode Summary: "Ray & KubeRay, with Richard Liaw and Kai-Hsun Chen"

Release Date: September 3, 2024
Hosts: Kaslin Fields and Mofi Rahman (guest host)
Guests: Richard Liaw and Kai-Hsun Chen from Anyscale

Introduction

In this episode of the Kubernetes Podcast from Google, hosts Kaslin Fields and Mofi Rahman delve into the intricacies of Ray and KubeRay with experts Richard Liaw and Kai-Hsun Chen from Anyscale. The conversation explores how Ray serves as a unified compute framework for scaling AI and Python workloads, and how KubeRay integrates Ray's capabilities into Kubernetes clusters.

Understanding Ray and KubeRay

Ray is introduced as an open-source compute engine designed to scale AI and Python workloads efficiently. It offers a suite of libraries for training, serving, and data processing, all accessible via a Python interface. KubeRay, on the other hand, is a Kubernetes operator that facilitates the deployment and management of Ray clusters within Kubernetes environments.
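The sketch below gives a rough picture of what "deploying a Ray cluster with KubeRay" involves. It builds a RayCluster custom resource in Python and prints it as JSON, which kubectl apply -f - also accepts. The field names reflect our understanding of the ray.io/v1 RayCluster CRD. The Ray version, image tag, resource limits, and group name are illustrative assumptions, so check the KubeRay documentation before relying on any of them.

```python
import json


def ray_cluster_manifest(name: str, ray_version: str, min_workers: int, max_workers: int) -> dict:
    """Build a minimal RayCluster custom resource for the KubeRay operator.

    Assumed field names follow the ray.io/v1 RayCluster CRD; the image,
    version, and resource limits are illustrative, not recommendations.
    """
    if not 0 <= min_workers <= max_workers:
        raise ValueError("expected 0 <= min_workers <= max_workers")

    image = f"rayproject/ray:{ray_version}"  # assumed official image naming

    def pod_template(role: str) -> dict:
        # One container per Ray pod; Ray itself starts many processes inside it.
        return {
            "spec": {
                "containers": [{
                    "name": f"ray-{role}",
                    "image": image,
                    "resources": {"limits": {"cpu": "2", "memory": "4Gi"}},
                }]
            }
        }

    return {
        "apiVersion": "ray.io/v1",
        "kind": "RayCluster",
        "metadata": {"name": name},
        "spec": {
            "rayVersion": ray_version,
            # Ask KubeRay to run Ray's own autoscaler instead of relying on HPA.
            "enableInTreeAutoscaling": True,
            "headGroupSpec": {
                "rayStartParams": {"dashboard-host": "0.0.0.0"},
                "template": pod_template("head"),
            },
            "workerGroupSpecs": [{
                "groupName": "cpu-workers",  # hypothetical group name
                "replicas": min_workers,
                "minReplicas": min_workers,
                "maxReplicas": max_workers,
                "rayStartParams": {},
                "template": pod_template("worker"),
            }],
        },
    }


if __name__ == "__main__":
    # Kubernetes accepts JSON manifests, e.g. `python make_cluster.py | kubectl apply -f -`.
    print(json.dumps(ray_cluster_manifest("demo", "2.34.0", 1, 4), indent=2))
```

The head group and worker group split mirrors Ray's own architecture: one head node plus a resizable pool of workers. The operator, rather than the user, turns this single resource into pods and services.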

Richard Liaw provides a historical perspective on Ray's inception, highlighting its origins at UC Berkeley and its initial focus on reinforcement learning and distributed deep learning:

"[00:06:33] Richard Liaw: ...the Ray project came out of the work that Robert and Philip were doing on like distributed deep learning, distributed reinforcement learning."

Kai-Hsun Chen emphasizes Ray’s versatility and its seamless integration with Kubernetes:

"[00:07:37] Kai-Hsun Chen: ...Ray is pretty versatile and general purpose... you can use the single Python file to cover end to end."

User Experience and Design Choices

The discussion underscores Ray’s commitment to developer productivity and ease of use. Richard explains how Ray allows data scientists and machine learning engineers to build and iterate on pipelines within familiar environments like Jupyter notebooks:

"[00:10:22] Richard Liaw: ...Ray allows you to do everything end to end in one sort of development environment."

Mofi asks about the user personas Ray caters to, and Richard notes that Ray serves a broad spectrum, from data scientists to platform engineers:

"[00:13:21] Richard Liaw: Ray is not only a machine learning engineering tool, it's not only like a platform engineering tool... with KubeRay we make it really easy for these platform engineers..."

Ray Libraries and Use Cases

Ray's ecosystem comprises libraries such as Ray Data, Ray Tune, RLlib, and Ray Serve, each tailored to a specific stage of AI workloads. Richard highlights Ray Data as a pivotal project focused on efficient data processing for AI:

"[00:16:06] Richard Liaw: Ray Data is a data processing engine... being able to ingest that into the GPU very efficiently."

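Richard's point about feeding big data to GPUs efficiently is easiest to picture as a streaming pipeline. Each stage pulls one batch at a time, so data flows through reading, CPU preprocessing, and inference batch by batch instead of being fully materialized at every step. The sketch below shows only that idea, using plain Python generators. It is not Ray Data's API, which offers methods such as map_batches on a distributed Dataset, and it does not attempt Ray Data's concurrent execution across CPUs and GPUs. The record source, preprocessing, and model functions are hypothetical stand-ins.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List

Batch = List[int]


def read_records(n: int) -> Iterator[int]:
    # Hypothetical source: stands in for lazily reading files, images, or rows.
    yield from range(n)


def to_batches(records: Iterable[int], batch_size: int) -> Iterator[Batch]:
    # Group the record stream into fixed-size batches; the last may be shorter.
    it = iter(records)
    while chunk := list(islice(it, batch_size)):
        yield chunk


def map_stage(stream: Iterable[Batch], fn: Callable[[Batch], Batch]) -> Iterator[Batch]:
    # Lazily apply fn; in this single-threaded chain one batch moves at a time.
    for batch in stream:
        yield fn(batch)


def preprocess(batch: Batch) -> Batch:
    return [x * 2 for x in batch]  # placeholder for CPU-side feature preprocessing


def fake_model(batch: Batch) -> Batch:
    return [x + 1 for x in batch]  # placeholder for a batched GPU inference call


def build_pipeline(n: int, batch_size: int) -> Iterator[Batch]:
    batches = to_batches(read_records(n), batch_size)
    return map_stage(map_stage(batches, preprocess), fake_model)


if __name__ == "__main__":
    for out in build_pipeline(10, 4):
        print(out)  # [1, 3, 5, 7] then [9, 11, 13, 15] then [17, 19]
```

Because nothing runs until the consumer asks for the next batch, even a very large source can start producing inference results immediately with bounded memory.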
Kai picks Ray Serve as his favorite library, emphasizing its flexibility in serving multiple models within a single graph. He cites a user who consolidated several microservices into Ray Serve and reported cutting costs by 50%:

"[00:17:37] Kai-Hsun Chen: Ray Serve provides a very flexible way to do multiplexing... they reduced their cost by 50%."

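Kai-Hsun attributes part of that saving to avoiding serialization and deserialization between stages. The toy model below makes the difference concrete by running the same three hypothetical stages in two ways. In the first, they run as separate services, and each hop pickles the payload as a stand-in for HTTP or gRPC encoding. In the second, they are composed in a single process, and objects are passed by reference. It only illustrates the argument. Ray Serve's real data path is more nuanced and can still serialize data between replicas that run in different processes.

```python
import pickle
from typing import Any, Callable, Dict, List, Tuple

Payload = Dict[str, Any]
Stage = Callable[[Payload], Payload]


def tokenize(req: Payload) -> Payload:
    return {**req, "tokens": str(req["text"]).split()}


def count_tokens(req: Payload) -> Payload:
    return {**req, "n_tokens": len(req["tokens"])}


def label(req: Payload) -> Payload:
    return {**req, "label": "long" if req["n_tokens"] > 3 else "short"}


# Hypothetical three-stage pipeline, standing in for chained model services.
STAGES: List[Stage] = [tokenize, count_tokens, label]


def run_as_services(stages: List[Stage], request: Payload) -> Tuple[Payload, int]:
    """Each hop encodes the payload on the way out and decodes it on the way in."""
    bytes_on_wire = 0
    payload = request
    for stage in stages:
        encoded = pickle.dumps(payload)  # stand-in for a network serialization step
        bytes_on_wire += len(encoded)
        payload = stage(pickle.loads(encoded))
    return payload, bytes_on_wire


def run_in_process(stages: List[Stage], request: Payload) -> Tuple[Payload, int]:
    """Stages composed in one runtime hand objects over directly: nothing is encoded."""
    payload = request
    for stage in stages:
        payload = stage(payload)
    return payload, 0


if __name__ == "__main__":
    req = {"text": "serve several models in one graph"}
    print(run_as_services(STAGES, req))
    print(run_in_process(STAGES, req))
```

Both paths compute the same answer. The difference is the encoding work and bytes moved at every boundary, and that work grows with the number of hops and the payload size.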
Integration with Kubernetes: Challenges and Benefits

The integration of Ray with Kubernetes via KubeRay presents both opportunities and challenges. Kai outlines the complexities of aligning Ray’s multi-process architecture with Kubernetes’ microservice-oriented design:

"[00:19:23] Kai-Hsun Chen: ...Kubernetes is primarily designed for the microservice architecture... Ray runs multiple processes in a single container, which poses integration challenges."

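The transcript (around 19:23) spells out one consequence for autoscaling. Utilization-based autoscaling assumes pods are interchangeable, but a Ray pod may host part of an application, such as a stateful actor that is nearly idle on CPU yet still needed. The toy comparison below contrasts two policies. A naive policy removes the least busy pod. A Ray-aware policy removes only pods that host no tasks or actors, roughly in the spirit of Ray's own autoscaler. The Pod class, pod names, and workloads are hypothetical. Real autoscalers (Kubernetes HPA, Karpenter, or Ray's) involve considerably more logic.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Pod:
    name: str
    cpu_util: float  # fraction of CPU in use, 0.0 to 1.0
    ray_workloads: List[str] = field(default_factory=list)  # tasks/actors placed here


def pick_least_busy(pods: List[Pod]) -> Pod:
    """Naive policy: treat pods as interchangeable and drop the least utilized one."""
    return min(pods, key=lambda p: p.cpu_util)


def pick_idle_ray_pod(pods: List[Pod]) -> Optional[Pod]:
    """Ray-aware policy: only drop a pod that currently hosts no Ray tasks or actors."""
    idle = [p for p in pods if not p.ray_workloads]
    return min(idle, key=lambda p: p.cpu_util) if idle else None


# Hypothetical cluster state for illustration.
EXAMPLE_PODS = [
    Pod("worker-a", 0.05, ["parameter-server actor"]),  # quiet on CPU, but holds state
    Pod("worker-b", 0.60, ["training task"]),
    Pod("worker-c", 0.20),                              # nothing scheduled here
]

if __name__ == "__main__":
    print("naive policy removes:", pick_least_busy(EXAMPLE_PODS).name)  # worker-a
    victim = pick_idle_ray_pod(EXAMPLE_PODS)
    print("Ray-aware policy removes:", victim.name if victim else None)  # worker-c
```

The naive policy would evict the pod holding the actor's state and break the application. This is the failure mode Kai-Hsun gives as the reason Ray ships its own autoscaler.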
Despite these hurdles, the benefits of leveraging Kubernetes’ robust ecosystem are significant. Ray's deployment on Kubernetes unlocks access to various tools like schedulers, observability solutions, and load balancers, enhancing Ray's production readiness.

Future of Ray

Looking ahead, Richard discusses the evolution of Ray in response to the rapidly advancing machine learning landscape. Key focus areas include Accelerated DAGs, which aim to optimize GPU cluster programming, and support for diverse accelerators like TPUs and AMD GPUs:

"[00:21:55] Richard Liaw: ...accelerated DAGs allow us to program GPU clusters... supporting multiple accelerators is essential for Ray to remain relevant."

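Richard ties accelerated DAGs to pipeline parallelism. In that approach, a model is split into stages on different accelerators and micro-batches flow through them. The sketch below computes the classic fill-and-drain (GPipe-style) forward schedule and its idle "bubble" fraction. For S stages and M micro-batches, that fraction works out to (S - 1) / (M + S - 1). It illustrates the coordination problem such a system has to express. It is not the accelerated DAG API, and the stage and micro-batch counts are arbitrary.

```python
from typing import List, Optional

Schedule = List[List[Optional[int]]]


def forward_schedule(num_stages: int, num_microbatches: int) -> Schedule:
    """grid[t][s] is the micro-batch that stage s processes at time step t, or None if idle."""
    num_steps = num_stages + num_microbatches - 1
    grid: Schedule = []
    for t in range(num_steps):
        row: List[Optional[int]] = []
        for s in range(num_stages):
            mb = t - s  # stage s starts s steps after stage 0 (fill), then drains
            row.append(mb if 0 <= mb < num_microbatches else None)
        grid.append(row)
    return grid


def bubble_fraction(num_stages: int, num_microbatches: int) -> float:
    """Share of stage-time slots that sit idle during fill and drain."""
    grid = forward_schedule(num_stages, num_microbatches)
    idle = sum(cell is None for row in grid for cell in row)
    return idle / (len(grid) * num_stages)


if __name__ == "__main__":
    for t, row in enumerate(forward_schedule(3, 4)):
        print(t, row)
    print(bubble_fraction(3, 4))  # 6 idle slots out of 18
```

Raising the number of micro-batches shrinks the bubble. This is one reason pipeline-parallel systems split each batch finely, and it is also why they need efficient, low-overhead communication between stages.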
Kai adds that ongoing developments in KubeRay aim to enhance support for heterogeneous computing resources and improve usability and security features:

"[00:31:23] Kai-Hsun Chen: ...support like TPU and multi-host TPU... build some ecosystem for usability and security."

Misconceptions and Surprising Use Cases

The hosts address common misconceptions about Ray, such as its perceived complexity and the necessity of large GPU clusters. Richard advocates for greater awareness of Ray Data, which simplifies data processing for AI workloads:

"[00:24:13] Richard Liaw: More people should be aware of Ray Data... it will make your life like a 100 times easier."

A surprising use case shared involves an individual connecting multiple MacBook Pros to form a Ray cluster, demonstrating Ray's flexibility beyond traditional cloud environments:

"[00:26:21] Richard Liaw: ...connecting a ton of MacBook Pros together to create a Ray cluster was really cool."

Conclusion and Future Events

As the episode wraps up, the hosts and guests highlight upcoming events like the Ray Summit in San Francisco, encouraging listeners to engage with the growing Ray community. They reiterate Ray's role as a fundamental building block for ML platforms, facilitating seamless transitions from local development to scalable Kubernetes deployments.

Kaslin concludes with a reflection on Ray's evolution from a university project to a comprehensive solution for distributed AI workloads, paralleling Kubernetes' journey from managing stateless applications to supporting diverse use cases.

Notable Quotes

Richard Liaw on Ray’s inception:

"Ray project came out of the work that Robert and Philip were doing on like distributed deep learning, distributed reinforcement learning." ([06:33])
Kai-Hsun Chen on Ray’s versatility:

"Ray is pretty versatile and general purpose... you can use the single Python file to cover end to end." ([07:37])
Richard Liaw on developer productivity:

"Ray allows you to do everything end to end in one sort of development environment." ([10:22])
Richard Liaw on Ray’s user base:

"Ray is not only a machine learning engineering tool, it's not only like a platform engineering tool." ([13:21])
Richard Liaw on Ray Data:

"Ray Data will probably make your life like a 100 times easier." ([24:13])
Kai-Hsun Chen on KubeRay challenges:

"Kubernetes is primarily designed for the microservice architecture... Ray runs multiple processes in a single container, which poses integration challenges." ([19:23])

Final Thoughts

This episode offers an in-depth exploration of Ray and KubeRay, highlighting their significance in the AI and Kubernetes ecosystems. Richard Liaw and Kai-Hsun Chen provide valuable insights into Ray’s capabilities, integration challenges, and future directions, making it a must-listen for professionals engaged in AI, machine learning, and Kubernetes.

For more information, follow the hosts on Twitter @KubernetesPod or email kubernetespodcast@google.com. Visit kubernetespodcast.com for transcripts, show notes, and subscription links.


Transcript

[00:00]
Kaslin Fields
Hello and welcome to the Kubernetes Podcast from Google. I'm your host, Kaslin Fields.
[00:05]
Mofi Rahman
And I'm Mofi Rahman.
[00:16]
Kaslin Fields
In this episode, our guest host and AI correspondent Mofi Rahman interviews Richard Liaw and Kai-Hsun Chen from Anyscale about Ray and KubeRay. Ray is an open source unified compute framework that makes it easy to scale AI and Python workloads, while KubeRay integrates Ray's capabilities into Kubernetes clusters. But first, let's get to the news.
[00:40]
Mofi Rahman
LitmusChaos, featured on the last episode of this podcast (episode 234), has completed a third-party security audit conducted by 7A Security. The audit consisted of a white-box security review paired with penetration testing. The results include 16 findings with a security impact (six vulnerabilities and 10 hardening recommendations), eight threats to the project defined with detailed attack scenarios and fix recommendations, and recommendations for future security hardening in LitmusChaos. The audit report emphasizes that LitmusChaos has well-implemented security efforts that reflect well on how the project is built and maintained.
[01:19]
Kaslin Fields
Google Cloud announced that NVIDIA L4 GPUs are now available on Cloud Run in preview. Cloud Run is a serverless execution environment for containerized applications in Google Cloud. The release of GPUs on Cloud Run opens the door to many new use cases for Cloud Run developers, such as performing real-time inference with lightweight open models, serving custom fine-tuned GenAI models, and speeding up existing compute-intensive Cloud Run services.
[01:45]
Mofi Rahman
A number of co-located CNCF-led events took place in Hong Kong between the 21st and 23rd of August, including KubeCon, CloudNativeCon, Open Source Summit, and Open Source GenAI and ML Summit China 2024. That's four events in one. The event featured a number of excellent talks, including a keynote from Linus Torvalds.
[02:07]
Kaslin Fields
The schedule for the CNCF-hosted co-located events at KubeCon + CloudNativeCon North America 2024 is live. The co-located events will take place before the main KubeCon schedule, on Monday and Tuesday, November 11 and 12, 2024, in Salt Lake City, Utah.
[02:22]
Mofi Rahman
Kubernetes 1.31 became available in the Google Kubernetes Engine rapid channel on August 20, just one week after its release. Information about the new features and deprecations in 1.31 on GKE can be found in the GKE release notes and don't forget to check out our interview with the release lead to learn more about 1.31.
[02:44]
Kaslin Fields
Red Hat announced the general availability of Red Hat OpenStack Services on OpenShift. OpenShift is a container application platform designed for developers, based on Kubernetes. OpenStack is an open source platform that allows users to build and manage private or public clouds with their own pooled virtual resources, and is designed for system administrators. OpenShift and OpenStack represent different approaches to the challenge of distributed computing. OpenStack is particularly popular in telecommunications use cases. This new offering provides a native way to run the virtual resource management tool OpenStack on top of the Kubernetes-based platform OpenShift.
[03:21]
Mofi Rahman
Broadcom hosted the VMware Explore conference in Las Vegas from August 26th to 29th. Among the announcements at the event was the release of Tanzu Platform 10, which, among other new features, allows you to choose between Cloud Foundry and Kubernetes for your platform runtime, whether in public or private cloud environments. This change enables users to create more streamlined workflows for developers, especially those developing GenAI-powered applications.
[03:51]
Kaslin Fields
The CNCF shared that since launching the Kubestronaut program at KubeCon EU 2024 in Paris, over 500 Kubestronauts have joined the program. Each of these 500-plus Kubestronauts holds active certifications in all of the CNCF's Kubernetes certifications. That's the KCNA, the KCSA, the CKA, the CKAD, and the CKS. And I'm not going to spell those out for you right now. Make sure you go look them up. If you pass all five of these certifications and have them all active at once, you too can become a Kubestronaut. You can find links to more information about the program in the show notes.
[04:27]
Mofi Rahman
The Dapr community will hold a virtual Dapr Day event on October 16. Dapr stands for Distributed Application Runtime, a free, open source runtime system that helps developers build distributed applications. The community will be celebrating five years of Dapr. The schedule for the event is now live, and that's the news.
[04:49]
Mofi Rahman
In this episode we have Richard Liaw, who is a member of the product team at Anyscale. He previously worked on the Ray open source project and is one of the authors of the original Ray paper. Richard, welcome to the podcast.
[05:07]
Richard Liaw
Thanks. Happy to be here.
[05:08]
Mofi Rahman
We also have Kai-Hsun, who is a maintainer of the KubeRay project. He's also a member of the Anyscale team. Welcome to the podcast, Kai-Hsun.
[05:16]
Kai-Hsun Chen
Yeah hi. Hi everyone.
[05:18]
Mofi Rahman
Richard, how did you get involved with the Ray project?
[05:21]
Richard Liaw
Back seven or eight years ago, the project was being started at UC Berkeley, and this was a period of time when everyone was very excited about reinforcement learning and there was no simple system to build reinforcement learning applications at scale. So the Ray project came out of the work that Robert and Philip were doing on like distributed deep learning, distributed reinforcement learning. I was an undergrad, sort of a starting graduate student at the time, at the Berkeley RISELab, and I started building reinforcement learning applications and algorithms on top of Ray. This led to me and other graduate students working on Ray Tune, which is a distributed hyperparameter tuning project, and RLlib, which is a distributed reinforcement learning project. I spent a couple of years doing that, and then when we started Anyscale in 2019, I also joined and eventually became the engineering manager for all the distributed training solutions for Ray. Currently I've moved on to work on other projects, including Ray Data and LLM inference.
[06:34]
Mofi Rahman
The same question to Kai-Hsun: how did you get involved with the Ray project?
[06:37]
Kai-Hsun Chen
Oh yeah, I think it is also a long story. It starts in my undergrad. I maintained an open source project, Apache Submarine, an ML platform built on the Hadoop ecosystem, on Hadoop YARN. Then we observed the transition from Hadoop YARN to Kubernetes, so we also added support for Kubernetes. I also wrote a paper, and I found the open source project Ray and cited its paper, and I found it pretty interesting. So when I came to the US, I joined Anyscale to work on KubeRay, and then I worked on Ray Core. Currently I put more emphasis on the Ray Core side, on distributed training.
[07:20]
Mofi Rahman
So we already mentioned the word Ray a few times but for our listeners who probably are not familiar if you had to describe Ray in a few sentences, what is Ray?
[07:29]
Richard Liaw
I'm happy to take that. Ray is a compute engine for scaling AI workloads. It offers a set of libraries, including libraries for training, serving, and data processing, all built with a Python front end and integrating very closely with the rest of the machine learning and AI ecosystem.
[07:52]
Mofi Rahman
And in this space, I think Kai-Hsun also mentioned Hadoop and the Apache projects, and there are a bunch of other things like PyTorch, Spark, and Airflow, all these other things that exist in this space. Around the same time, in 2019, there was also a project called Kubeflow that was doing a bunch of things in the community space. So with existing solutions that do similar things, or do things in the same space as AI/ML workloads, why Ray? Why was Ray created, and what problem was it solving at the time that other solutions were not?
[08:25]
Kai-Hsun Chen
Got it. I think this is a pretty good question. The first point is that before Ray, after Kubernetes was introduced, ML infrastructure was primarily built on a microservice architecture. For example, in Kubeflow you need some system for data processing, an operator for TensorFlow, one for PyTorch (I think they were recently combined into one training operator), and you need KServe or something similar to do serving. That makes sense for the control plane logic. But for the computation side, I think people prefer a monolithic computation runtime, and Ray is pretty good at that because Ray is pretty versatile and general purpose. An ML workload is a very versatile workload, including data processing, training, tuning, and serving, and each stage has different infrastructure requirements. For example, serving requires autoscaling and high availability, and training may require gang scheduling. You also need to support different kinds of accelerators and different kinds of workloads. The workloads change very fast, and people want to use a single runtime for all of them instead of microservices for different parts, because with different parts you need some YAML files or something to glue them together. But with Ray, you can use the single Python file to cover end to end. So I think that's why people like Ray: it iterates much faster, it's very flexible, and it's much more future-proof. This is from my perspective, because I also maintained Apache Submarine, which you can think of as the microservice architecture approach.
[10:20]
Mofi Rahman
Richard, would you like to add anything in that?
[10:22]
Richard Liaw
Yeah, I think the developer productivity aspect is pretty big. In particular, a lot of data scientists and machine learning folks use notebooks, and instead of writing machine learning pipelines in microservices and separate containers, Ray allows you to do everything end to end in one sort of development environment, especially with a notebook. You can chain data processing with training, with batch inference, and keep that all within the same developer context. That's very powerful and makes building and iterating on machine learning pipelines much easier than before.
[11:02]
Mofi Rahman
From my experience, using Ray has been similar. I can write code on my local machine, where just ray.init() sets up a Ray cluster for me, and taking that code somewhere else, to a Ray cluster on some sort of distributed system or Kubernetes, seemed very intuitive. So I wanted to ask whether that was a design decision from the get-go: making sure people in the data science community, who are used to notebooks, don't have to move too far from their Python environment. Was that a decision made early on, or did it just happen organically?
[11:42]
Richard Liaw
I think there are two parts. One is developing locally and being able to scale into large-scale contexts without needing a distributed systems PhD; that was one of the key design principles we wanted when building Ray. The notebook environment was a natural artifact of wanting this developer experience and of making sure that we worked really well with the rest of the Python ecosystem, the standard single-process Python ecosystem. So when we saw people start using Ray in notebooks, and we started giving tutorials that way, it became a very natural evolution of the Ray experience. One thing to call out: at Anyscale, which is the managed Ray product, the developer experience that we tend to advocate for our users is focused on VS Code and developing scripts from scratch. I think that's one of the things that marks the difference between a machine learning engineer and a data scientist. That's also an aspect of what we target: people working more on the machine learning side and building machine learning pipelines, rather than data scientists doing large-scale numerical computations.
[13:22]
Mofi Rahman
Going off of that, you mentioned machine learning engineers versus data scientists, and in my work on the Kubernetes side of things, we deal with a lot of folks who are platform engineers. So from the Ray project's point of view, if you had to define Ray's users, would you say the user base is currently mostly folks in the data science camp, the machine learning engineer camp, or the platform engineering camp? Or do you see a nice distribution between the three?
[13:49]
Richard Liaw
I think there's actually somewhat of an undefined category that allows Ray to touch all of them at the same time. Like, Ray is not only a machine learning engineering tool, it's not only like a platform engineering tool. Right? With KubeRay we make it really easy for these platform engineers to deploy Ray, while the machine learning engineers and data scientists use the actual Ray core APIs. By Ray core APIs, I mean the libraries and the Ray Core primitives. We see that the libraries, especially some of the higher-level ones like Ray Tune, are something data scientists will naturally have in their arsenal. So at this point it is a tool for a large swath of different people in common machine learning organizations.
[14:40]
Mofi Rahman
Richard, you just mentioned KubeRay, and we have one of the maintainers of KubeRay here, Kai-Hsun. Would you like to add a little more about what KubeRay is, why it exists, and why people should know more about it?
[14:55]
Kai-Hsun Chen
I think Ray is a very powerful tool for computation, and Kubernetes is the most powerful tool for orchestration and deployment. KubeRay is a Kubernetes operator to deploy Ray on Kubernetes, which unlocks the opportunity for users to integrate the Kubernetes ecosystem with Ray. For example, they can use schedulers like Kueue, Volcano, and YuniKorn, observability tools like Prometheus, Grafana, and Fluent Bit, and load balancers like NGINX or Istio. So KubeRay is a Ray operator that unlocks these possibilities for users, because computation alone is not enough. You also need deployment and a lot of other things, and the Kubernetes ecosystem is a very good ecosystem for enabling users to actually productionize Ray.
[15:57]
Mofi Rahman
Ray has a number of libraries, as part of Ray Core and the Ray project itself. Is there any favorite that you want to tell our audience more about?
[16:07]
Richard Liaw
Yeah, I think the Ray Data project in particular is a really interesting project that we're starting to focus more of our energy and resources around. It hasn't gotten a lot of public attention, but we've been doing a lot of work internally to make it better, and we've seen a lot of our customers and open source users get really excited by the capabilities it brings. Ray Data is a data processing engine in some sense, but it's largely focused on AI workloads. What that means is being able to take big data and map it to your GPUs in a very effective, cost-efficient, performant, and scalable way. That means having really good integration of data preprocessing and feature preprocessing, and then being able to ingest that into the GPU very efficiently. That's something other systems like Dask wouldn't be able to support. The other side is batch inference: being able to handle large models, scale out very, very quickly, and handle all sorts of data modalities (audio, text, image, video, and so on) to feed into the large neural networks we have now. That's somewhat of a unique capability.
[17:32]
Mofi Rahman
Same question to Kai-Hsun. What is your favorite Ray library?
[17:37]
Kai-Hsun Chen
I think Ray Serve provides a very flexible way to do multiplexing. On the production side we see a lot of users who need to use many different models in a single graph, and Ray Serve provides a very easy way to do that. The second point is that, as we said, Ray is a monolithic runtime, and we see some very useful benefits from that. In Ray, the scheduling unit is a task or an actor, which is a function or a class, instead of a container. Some of our users at first used several microservices to serve a model: maybe one Golang microservice that sends to a Python microservice, which sends to another Golang microservice. When they used Ray Serve to combine all of them together, they avoided a lot of time spent on serialization and deserialization, and they could share resources between the stages. I think they reduced their cost by 50%. This is from a public blog post by Samsara. So I think this is a pretty cool use case for me.
[18:48]
Mofi Rahman
A bit of a personal, not gripe, but challenge for me is that a lot of the names in the Ray Serve and Ray job space overlap with the Kubernetes space, because Kubernetes has Services and Jobs. There is a RayJob definition in the KubeRay operator, and there's also a Ray job in Ray itself. When folks go about speaking and learning about these things, what have you seen to be the most challenging part of mapping all the Ray concepts into the world of Kubernetes?
[19:23]
Kai-Hsun Chen
Yeah, I think it is actually not easy to integrate Ray with Kubernetes, because Kubernetes is primarily designed for the microservice architecture. For example, the best practice is that each container runs only one single process. But in Ray we run multiple processes in a single container: each Ray task and each actor is also a process. So we have a lot of challenges with that. For example, because we run multiple processes, it is very hard to integrate with some logging tools on the Kubernetes side. A lot of Kubernetes tools read stdout and stderr, but Ray has multiple processes, so it needs to write some logs to files instead of stdout and stderr. Another challenge is autoscaling. Typically, autoscalers in the Kubernetes world, like HPA or Karpenter, assume that all the pods in a single Deployment are stateless, so you can use resource utilization, like CPU or memory utilization, to decide whether to scale a pod up or down. But Ray is different: it splits a single application across multiple nodes, so you don't know which part of the application is running on a given pod. If you use physical resource utilization to decide to scale down a pod, it may break the application. So Ray needs to implement its own autoscaler. One of the challenges is that we need to make Ray more Kubernetes-friendly. I think we are getting better and better, but we still need some help from the Kubernetes ecosystem and the community.
[21:20]
Mofi Rahman
And this is a question for Richard. The Ray project itself started around 2019 as research from universities, and over the next few years, as the industry evolved and changed, the Ray project also evolved and changed. About five years into the Ray project's existence, what are the big challenges the Ray team and the project itself are thinking about for the next few years of Ray?
[21:47]
Richard Liaw
Huh. So for the next how many years?
[21:50]
Mofi Rahman
Let's say the next five. We're in the first five, so how about the next five?
[21:56]
Richard Liaw
Yeah, I think machine learning is moving so quickly that the project has to keep reinventing itself. One big thing that Kai-Hsun is working on right now is called accelerated DAGs. It allows us to program GPU clusters with any sort of accelerator and to write efficient distributed programs that take advantage of GPU interconnects, opening up the space for applications that leverage GPUs in special ways. For example, pipeline parallelism is something that the ecosystem and the machine learning community are starting to pick up and use more and more, but the abstractions for pipeline parallelism have been quite limited on both the training and the serving side. So one of the things with Ray accelerated DAGs is being able to compose efficient, effective, and fault-tolerant pipeline-parallel training and inference pipelines very, very quickly. That's something we've needed to adapt to, and it wasn't part of the original story around Ray. The other thing we're seeing is the rise of multiple accelerators: TPUs being one, AMD GPUs being another, and Trainium and Inferentia being another. For Ray to continue to be relevant as the default substrate for distributed computing, that's obviously something we have to be aware of and be able to schedule over, do memory management over, and so on. So that's part of the vision we have: a substrate for managing resources for AI workloads.
[23:49]
Mofi Rahman
Okay, I think that's a really great answer. The next question is for both of you: what are one or two things about Ray that you want people to know, but that you've found people in the community have misconceptions about? This is the place where you can set the record straight and tell people the one thing they should know about Ray.
[24:11]
Kai-Hsun Chen
You mean what I think users should know?
[24:13]
Mofi Rahman
Yeah, like is there something that people in the community either don't understand or have a misconception about, anything about Ray that it would be good for people to understand differently? If there is nothing, and everybody's getting Ray exactly as intended, that's fantastic. But oftentimes people either think it's too difficult, or that you need a huge GPU cluster to run it, or that it's hard to learn. Is there anything you keep hearing over and over again in the community that you think is a misconception?
[24:47]
Kai-Hsun Chen
Got it. Got it.
[24:49]
Richard Liaw
Yeah. The only thing I would probably say is, I think more people should be aware of Ray Data. There are a lot of people who are doing batch inference or trying to force their image processing pipelines to run with some really weird MPI setup or whatever. And Ray Data will probably make your life like a 100 times easier. But there hasn't been a lot of public discourse around it, and I think that's something I'd be excited to fix.
[25:22]
Mofi Rahman
Yeah. Hopefully this will get the conversation going and people will learn more about Ray Data. I have fairly limited experience with Ray Data, but from my experience it actually makes a lot of sense. The functionality built into Ray Data made a lot of sense compared to what I know from Pandas and NumPy, and bringing some of that over to Ray Data has been quite a breeze. So I'm a fan personally.
[25:46]
Richard Liaw
Yeah, yeah. So you're one of those people. You've got to try out Ray Data again.
[25:53]
Mofi Rahman
And the last question I wanted to ask both of you: in your work over the last few years, with the community and possibly with customers, has there been a time when you've seen folks use Ray in a way that was surprising, or even eye-opening, for the Ray team? People in the community using the project or the libraries in a way the original design didn't anticipate, where you didn't think people could use it this way?
[26:21]
Richard Liaw
Yeah, I'm happy to give a response here, and Kai-Hsun, you can add on as well. There's this one guy on Twitter, I forget his name, who connected a ton of MacBook Pros, laptops, together to create a Ray cluster. I forget exactly what he was doing; I think maybe training a model. But that was kind of mind-blowing, and it blew up on Twitter as well. Obviously Ray is a distributed framework, but it's usually distributed across cloud or data center servers, right? And this guy was connecting his laptops together. So that was really cool.
[26:56]
Kai Sun
Yeah, I think for me, one of the things I found is that some people use container-in-container. People want to launch a container inside a Kubernetes pod. For example, if they serve some model and don't want to pack all the dependencies into a single image, they launch a container inside the container with some additional image, so they don't need to pack everything into one image. To be honest, I was very confused by that, but some people use it and it seems to work well.
[27:38]
Mofi Rahman
Yeah, a few months ago was the 10-year anniversary of Kubernetes as a project, and I had a chance to meet with a bunch of the original contributors and maintainers. I asked a few of them this question: can you think of an example of people using Kubernetes in a way you initially did not anticipate? The answer I got, overwhelmingly, was that 90% of the things people are doing on Kubernetes today were not initial use cases for Kubernetes. Kubernetes was initially created for stateless applications, like web applications that need to scale massively on the cloud. But people found all sorts of interesting ways to use Kubernetes, and over time the project itself caught up with StatefulSets and storage, and now accelerators like GPUs and TPUs, and a bunch of other things. At the last KubeCon there was a talk on the schedule about using FPGAs as a compute unit on Kubernetes as well. So people are finding all sorts of cool, interesting ways to use these things, and it's always exciting for me to hear stories like the one Richard mentioned about a MacBook Pro Ray cluster. Richard, you mentioned accelerated DAGs. Can one of you explain a little more about what that is? It sounds interesting, but I don't think our listeners are going to be completely clear on what it does.
[29:03]
Kai Sun
I think maybe I can take that part, because I'm working on this project. Accelerated DAG started from looking at Ray's current architecture, and it has two goals. The first is to reduce the system overhead that comes from Ray. The second is that we want to do more accelerator-centric computation, using protocols like NCCL, and maybe RDMA in the future, to reduce that overhead. We are targeting two kinds of workloads: the first is large-scale inference and the second is large-scale training. I'm primarily focusing on large-scale training, and I think accelerated DAG provides a very simple API for you to, for example, define your pipeline parallelism. In my experience, there is a very popular pipeline parallelism strategy called zero bubble pipeline parallelism. It comes from a research team in Singapore, and you can see their open source repository: they forked Megatron and added a patch on top of it to implement zero bubble pipeline parallelism. You can read the patch; it may be a short patch, but it requires a lot of knowledge of Megatron to implement. With Ray accelerated DAG, you just need maybe 100 lines of code in a single Python script: import ray, and then you can implement it. You don't need to focus on Megatron. Megatron is pretty powerful and supports a lot of applications, but accelerated DAGs provide a very flexible way for you to define your distributed strategy.
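Kai-Hsun mentions zero bubble pipeline parallelism. The "bubble" is the idle time pipeline stages spend waiting during a schedule. As a small worked example in plain Python (no Ray; both functions are hypothetical helpers written for illustration), the sketch below builds a GPipe-style forward-only schedule and measures its idle fraction, which matches the well-known closed form (p - 1) / (m + p - 1) for p stages and m microbatches. Zero-bubble schedules shrink this idle time further by reordering backward-pass work, which this sketch does not model.

```python
from fractions import Fraction
from typing import List, Optional


def forward_schedule(num_stages: int, num_microbatches: int) -> List[List[Optional[int]]]:
    """GPipe-style forward schedule: row t lists the microbatch each stage runs at step t.

    Stage s starts microbatch mb at time step mb + s; None means the stage is idle.
    """
    total_steps = num_microbatches + num_stages - 1
    schedule = []
    for t in range(total_steps):
        row = []
        for s in range(num_stages):
            mb = t - s
            row.append(mb if 0 <= mb < num_microbatches else None)
        schedule.append(row)
    return schedule


def bubble_fraction(num_stages: int, num_microbatches: int) -> Fraction:
    """Fraction of (stage, time step) slots that are idle in the schedule."""
    schedule = forward_schedule(num_stages, num_microbatches)
    total_slots = len(schedule) * num_stages
    idle_slots = sum(cell is None for row in schedule for cell in row)
    return Fraction(idle_slots, total_slots)


for p, m in [(4, 4), (4, 8), (4, 32)]:
    measured = bubble_fraction(p, m)
    closed_form = Fraction(p - 1, m + p - 1)
    # The simulated schedule agrees exactly with the closed form.
    assert measured == closed_form
    print(f"stages={p} microbatches={m} bubble={float(measured):.3f}")
```

More microbatches shrink the bubble, which is one reason schedule design matters so much for pipeline parallelism.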
[30:51]
Mofi Rahman
Okay, I'm actually going to ping you later for the links you mentioned. I think it would be really good to have those in the show notes so people can access them. Richard, Kai-Hsun, thank you so much for spending this half hour with me answering a lot of my questions about Ray. Hopefully people will get some interesting insight into the Ray and KubeRay projects. Before we finish, is there anything you would like to end on? Any new things people should be excited about coming from either the Ray project or the KubeRay project, so we can finish on a hopeful, future-looking note?
[31:24]
Kai Sun
Yeah, I think the KubeRay community has recently been focusing on several areas. The first is long-running jobs: we're defining how to handle failures and figuring out best practices for checkpointing, and we collaborate very closely with the Google Kubernetes team on that. The second is support for heterogeneous computing resources. For example, we also collaborate with the Google Kubernetes team to support TPUs and multi-host TPUs, and we support autoscaling for multi-host TPUs. I think this may currently be the only solution in the open source world that supports different kinds of resources together with multi-host autoscaling. The third is that we're building ecosystem tooling for usability and security. We're currently building an authentication solution on the KubeRay side, and we're also building an easy-to-use kubectl plugin for users. And recently we've also been working on an upgrade mechanism for serving. So there's a lot of stuff still on the roadmap. I think it's pretty exciting, and the community is becoming much more popular.
[32:52]
Mofi Rahman
On that exciting note, thank you so much, Richard and Kai-Hsun, for joining this episode of the Kubernetes Podcast. The social accounts for both Kai-Hsun and Richard will be in the show notes, so people can follow them and ask more questions about Ray and KubeRay. Thank you so much.
[33:09]
Richard Liaw
Thank you. Bye bye.
[33:12]
Kaslin Fields
Thank you very much, Mofi, for that interview with Richard and Kai-Hsun from Anyscale about Ray. I've been really looking forward to learning about Ray, and I'm excited about what I learned. But first off, welcome back to the show, Mofi.
[33:24]
Mofi Rahman
Yay, I made it back again. Woo.
[33:29]
Kaslin Fields
Mofi, in his day-to-day work, is the primary person on our team focused on AI, so he has gone deeper into Ray and AI use cases on Kubernetes than anyone else on our team. So I was excited to have him do this interview. Since you have more context in this area, you were able to dive a little deeper into it. As I was listening through the episode, I heard you asking questions about things you've seen, and I really appreciated that perspective.
[34:07]
Mofi Rahman
Yeah, from our point of view, Ray is one of the fundamental building blocks that we, along with customers and other people in this space, expect to use for building out ML platforms. In our work with customers as well as folks from the community, an ML platform is a fairly difficult thing to describe, because it means very different things to different people. But the fundamental things people need, not just from an ML platform but from any developer platform, are things like multitenancy, getting resources when you need them, and some version of self-service. In many cases, cloud providers like GCP already have systems where people can self-serve. But for most organizations that permission system is probably too open, so they end up building some level of guardrails around it, which also limits the amount of knowledge developers, data scientists, and machine learning engineers need for their day-to-day tasks. So people build different levels of abstraction to give their folks access to resources. When you're doing all of that, having a way to define your workload that doesn't need to change across all the boundaries you cross is where Ray becomes really useful. You can define your workload in Ray, and run it on your local machine as well as in a distributed manner at massive scale. That's one of the things I also talked to Richard and Kai-Hsun about: the Ray model fits very nicely with the workflow of "I want to do something locally, do some experimentation, then take it out to a distributed cluster and scale it out as big as I need."
[36:05]
Kaslin Fields
So I've been trying to learn about Ray for a while. It's been on my list for actual work stuff for some time, and I feel like this conversation has finally gotten me near a baseline, at least. I like the way you laid that out as the primary use case, at least the one we're really focused on in terms of how Ray relates to Kubernetes: enabling a workflow where engineers and data scientists create their AI workloads with Ray, and then you run Ray on Kubernetes so you can productionize it easily. So you've got this path from your local machine to production. I think that makes a lot of sense.
[36:54]
Mofi Rahman
Yeah. It was also a really interesting conversation with someone like Richard, who has been with the project from a really early stage, when Ray was almost like a library for doing some data science work. I think Ray RLlib and Ray Tune were the first two components of Ray. One of the interesting things about Ray is that it's end to end; all of these things are part of Ray. They have Ray Data, Ray Core, Ray Serve, Ray Tune, and RayCluster, which is the concept of building out the cluster. We spoke about a couple of other projects in this space: Airflow, Kubeflow, and Spark also exist here. One of the key challenges a lot of folks have with these is that the way you build and test things locally isn't how they run in production, so you have to convert them into something else, turning them into jobs or containers to run them on Kubeflow and other things. With Ray you can have this end-to-end flow, all under the Ray umbrella, for making sense of your problem and understanding the problem space. Once you've made the initial investment of going all in on Ray, I think you have a really nice way to map most of your data science problems into the world of Ray. From that point on, it becomes gradual progress.
[38:20]
Kaslin Fields
And I didn't know, well, I guess Richard mentioned in his intro that Ray came from UC Berkeley. It was a university project that developed into a whole product, I think.
[38:32]
Mofi Rahman
Yeah, I think Richard at the time was doing postgraduate research there, working on distributed systems. Ray came out of wanting an alternative to building distributed systems manually or in more complicated ways. This is what I love about projects that start out trying to solve a very specific use case and then, to their surprise, find that it actually maps to a lot of other use cases. One of the biggest examples of that is probably Kubernetes, right? Kubernetes was initially trying to solve the problem of running stateless distributed applications on commodity cloud hardware. At Google, with Borg, the engineers were doing this, and they wanted to figure out how to make it just work across any kind of hardware. An anecdotal story: around the ten-year celebration a few months ago, at the last KubeCon, I ran into Tim Hockin and asked him whether they had initially expected people to use Kubernetes for the things they use it for now. And Tim told me that 90% of the things people are doing with Kubernetes now were not what it was initially designed for.
[39:44]
Kaslin Fields
Right.
[39:44]
Mofi Rahman
Like AI workloads, running databases, right? Stateful workloads. These were not among the things Kubernetes was initially designed to do. But Kubernetes was built with a strong sense of the problem it wanted to solve, and people found out that other problems also map pretty nicely onto a distributed system like this. Ray has a very different but also kind of similar story: they were trying to solve a very specific use case for their distributed workloads, but it turns out a lot of AI/ML workloads map to a similar paradigm, which is awesome.
[40:16]
Kaslin Fields
That generalization makes sense. Distributed systems are such a common problem: I have a lot of hardware and I want to run stuff on it. So it makes a lot of sense that these things get generalized. Something with Ray that I'm trying to wrap my head around: Ray has all of these different components, like you were mentioning, Ray Data, Ray Core, Ray accelerated DAGs. I know there's an API component to it, but it's not a language of its own. So is it kind of like an open source API standard with all of these modular components in different areas, or...?
[40:58]
Mofi Rahman
So most of them are just Python libraries. If you're using Ray Data, you'd be working with something close to NumPy or Pandas, which have similar functionality. This is also anecdotal, from someone I was speaking to and from my own bit of tinkering with Ray Data: they have taken a lot of inspiration and learning from existing libraries in the space, and tried to simplify the things people found challenging in those libraries, like mapping machine learning and data science concepts into code. So if you're coming from the world of Pandas, scikit-learn, or NumPy and trying to map your knowledge into the world of Ray and Ray Data, I think you'd have a pretty nice time. Most things work the way you'd expect them to. That's another reason I think a lot of folks like the Ray libraries: their existing knowledge of data science concepts almost always maps fairly easily onto the world of Ray.
[42:01]
Kaslin Fields
So if it's mainly a set of Python libraries, essentially, what benefit do you get from running KubeRay, the operator, on Kubernetes?
[42:12]
Mofi Rahman
When Ray started, it was mostly libraries, but Ray has evolved to the point where it is the libraries and also the execution engine underneath, which is the thing you're running on Kubernetes. The workload that runs within Ray Serve or a Ray job, which is the construct they have for running the workload itself, doesn't necessarily need you to use any of the Ray libraries. You could keep using your NumPy code, or your scikit-learn or Pandas code. The way the Ray infrastructure layer works, you add some annotations, and that lets the Ray engine understand that this thing has to be spread remotely across a bunch of computers. If you look through the Ray documentation, you'll see these annotations, like `@ray.remote`. You can turn any Python function into a Ray task, or a class into a Ray actor, and these can then be distributed across any number of Ray runtimes that exist. So you can almost think of Ray as broken down into two major parts. One is the library part itself, where all the data science and machine learning code and libraries exist. The other is the infrastructure layer. When Ray first started, it was mostly on the library side; they didn't really think about the infrastructure. But as Ray grew, the need for more of an infrastructure solution became more and more apparent. So the KubeRay part of it is the operator that installs RayJob, RayService, and RayCluster as constructs into Kubernetes. On top of that, it can run any type of Python code. But Ray also comes with a bunch of existing libraries, like Ray Data, Ray Tune, and RLlib for reinforcement learning.
[44:04]
Kaslin Fields
And we've talked on the show before about some of the challenges with similar concepts in Kubernetes. You mentioned during the interview that a number of words are shared between Ray and Kubernetes, and that can get a little confusing, like the concepts of Ray jobs and tasks and things like that.
[44:21]
Mofi Rahman
It's not just Ray and Kubernetes; in general, "job" is such a ubiquitous term. It exists in the world of HPC and everywhere else, so when you say something is a job, it can mean so many different things. One funny example, and we actually published a video we can add to the show notes: in Kubernetes there is a Job. When you have a RayJob in Kubernetes, it's spelled RayJob, without a space; that is the Kubernetes resource definition. Then inside Ray there's a concept of a job, spelled "Ray job" with a space, which means an execution of a bit of Ray code. So when you read it out loud, RayJob versus Ray job just becomes very complicated to think about. The easiest way to think about it is that the RayJob, the Kubernetes one, is the Ray job, meaning the Ray code, plus a submitter that submits the code on your behalf. Together they become RayJob without the space. As I'm saying it for the third time, I feel like I'm getting confused by what I'm saying. But the one without the space is the Kubernetes custom resource definition (CRD), which defines the Ray code plus something that submits that code on your behalf to a RayCluster, whereas a RayCluster is a bunch of resources bundled together that will be used to run your code, if that makes sense.
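The RayJob/RayCluster distinction may be easier to see in the resource itself. The sketch below builds a RayJob manifest as a plain Python dict and prints it as JSON (which `kubectl apply -f` also accepts). It reflects the KubeRay `ray.io/v1` API as we understand it: `entrypoint` is the Ray code to submit, and `rayClusterSpec` describes the RayCluster that KubeRay creates to run it. The job name, entrypoint script path, image tag, and worker count are placeholder assumptions; check the KubeRay documentation for the exact fields in your version.

```python
import json


def ray_pod_template(role: str, image: str) -> dict:
    """Minimal pod template with a single Ray container (resource requests omitted)."""
    return {"spec": {"containers": [{"name": f"ray-{role}", "image": image}]}}


def make_rayjob(name: str, entrypoint: str, image: str = "rayproject/ray:2.9.0", workers: int = 2) -> dict:
    """Build a RayJob manifest: the Ray code (entrypoint) plus the RayCluster to run it on."""
    return {
        "apiVersion": "ray.io/v1",
        "kind": "RayJob",
        "metadata": {"name": name},
        "spec": {
            # The "Ray job" part: the command KubeRay submits to the cluster for you.
            "entrypoint": entrypoint,
            # Tear the RayCluster down once the job finishes.
            "shutdownAfterJobFinishes": True,
            # The RayCluster part: the bundle of resources the code runs on.
            "rayClusterSpec": {
                "headGroupSpec": {
                    "rayStartParams": {},
                    "template": ray_pod_template("head", image),
                },
                "workerGroupSpecs": [
                    {
                        "groupName": "workers",
                        "replicas": workers,
                        "minReplicas": workers,
                        "maxReplicas": workers,
                        "rayStartParams": {},
                        "template": ray_pod_template("worker", image),
                    }
                ],
            },
        },
    }


# Hypothetical job name and script path, for illustration only.
manifest = make_rayjob("example-rayjob", "python /home/ray/samples/my_script.py")
print(json.dumps(manifest, indent=2))
```

Applying a manifest like this would have KubeRay create the RayCluster plus a submitter that runs the entrypoint against it, which is the "Ray code plus submitter" split Mofi describes.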
[45:52]
Kaslin Fields
This is a challenge with layers of abstraction: you're taking a lot of the same concepts and abstracting them at different levels, so they end up with similar names. That makes sense, though; I can see why that would happen. Whether or not I can wrap my head around using the right ones at the right times, we will see. Awesome. Another thing I really liked about your interview is that you both, and Kai-Hsun especially, mentioned a lot of different tools in the same spaces. For example, there was a section where Kai-Hsun mentioned different schedulers, like Kueue, Volcano, and Apache YuniKorn, as well as observability tools and load balancers. I liked that, because it gave this view that this isn't something that's going to exist in a vacuum. Obviously it's part of a distributed system, so it's going to require all of these other components. You also mentioned the benefits of being tied into Kubernetes. One of them is that you get access to the whole Kubernetes ecosystem, because there's already so much there that you can tie together with Kubernetes to build the whole platform you really need. So Ray is one component of this: KubeRay provides the custom resource definitions, and Ray is also a tool for developers, I guess.
[47:12]
Mofi Rahman
Yeah. If you open up the Ray docs and go to the overview, they have a pretty nice picture that shows the relationship between all the parts of Ray. At the very top layer are the Ray AI libraries: Ray Data, Ray Train, and the rest, all of which exist as libraries. Those are interchangeable: you could use the Ray versions of those libraries, or you could find any other Python code that does the work. Underneath is Ray Core, which is the actors, your remote functions, all the ways you can define how Ray distributes the workload for you. All of those are usually defined via annotations in your Python code. So most Ray code is just your Python functions with those annotations added, but the moment you start running them in Ray, Ray understands the annotations and runs the code a little differently. That's my elevator pitch of how Ray does things. All of the Ray Core stuff then eventually runs on some sort of infrastructure: that could be Anyscale's own service, Kubernetes, a VM, or bare metal in your own data center. Ray Core is the code that runs wherever compute can run. Now, one of the things missing here is that Ray itself does not have a lot of considerations or solutions built in for things like observability, logging, or scheduling out to Kubernetes, which I think is a good thing, because if Ray were to solve all of this, it would be a very locked-in system. Instead, falling back on things like Kueue, YuniKorn, and Volcano makes Ray a lot more composable, and that composability is an aspect of Ray I really like.
Another project in this space that solves a bunch of these AI- and ML-related problems, and that started a while ago too, is Kubeflow. It comes with a lot of the pieces bundled in, with all the glue connecting them to other things, so Kubeflow is more like a framework where all the moving pieces live under Kubeflow. Ray's approach, on the other hand, is that Ray is a piece you plug into your system, and you collect the metrics and logs that you want. The challenge then is that you have to maintain your own metrics and logging systems, and also security. Ray does not have a concept of authentication, OAuth, SSO, or any of that in the Ray cluster; you have to wrap it yourself using your own systems. So it's a bit more work, but it makes Ray more composable.
[50:00]
Kaslin Fields
Awesome. So I think I've got a good baseline now, hopefully, for understanding Ray a little better. I hope to dive into it more next week, because I need to do some work on it. So I hope all of you out there enjoyed learning about Ray and will check it out. Any last words you want to say about Ray, Mofi? I feel like I'm doing a second interview here, because I wanted to learn from you about this.
[50:23]
Mofi Rahman
Yeah, I think there are a few big aspects to Ray, and Ray is something different to each person using it. If you're someone very deep into the AI workloads themselves, you're probably very deep into the Ray libraries, whereas I'm personally more interested in Ray Core and the distributed aspect of it. So even when two people are talking about Ray, oftentimes we're not talking about the exact same thing, because Ray is multifaceted and offers different solutions. But when you're talking about building an ML platform, which a lot of folks are currently thinking about, I think Ray becomes one of the pieces: Ray Core becomes a piece you can use to distribute your jobs. And GKE also recently announced the Ray add-on, which automatically installs KubeRay on your cluster. With all of this, it becomes a little easier to get from Kubernetes to Ray; the glue, the connection point, becomes a little easier.
[51:32]
Kaslin Fields
So we've talked a lot here about the pathway from local development to Kubernetes clusters, and you also mentioned that there are several different personas here. For those running Kubernetes clusters, if you're serving data scientists who might be using Ray to do data science things, you might want to look into how to get KubeRay on your clusters and see if your data scientists can benefit from it. And from the other side, if you're a data scientist or a developer working on AI workloads, you might look into the Ray libraries to see if they're useful. Does that cover some of the primary personas, do you think?
[52:14]
Mofi Rahman
Definitely does. I think, yeah, you basically have three main ones, right? You have the Ray cluster one, which is probably closer to us as GKE developer advocates. Then you have the machine learning engineers and data scientists doing their work. And in the middle, the glue is that you take your Python functions and convert them into distributed applications, and then they get to run. So you have three main parts to it, and Ray makes it fairly easy with just simple annotations that you can add. Oftentimes the default decisions Ray makes are fairly well optimized for you, so you don't have to do too much work figuring out the right tuning for the distributedness of your application.
[52:56]
Kaslin Fields
Excellent. Very last thing before we stop for this episode: Ray Summit is coming up. So there's going to be an event held in San Francisco focused on Ray.
[53:08]
Mofi Rahman
Yeah, I think it's a really growing project with a lot of cool things coming up. In our chat, both Richard and Kai-Hsun mentioned new things on the roadmap, both for the KubeRay side of things and for the Ray libraries, and I'm excited to see where the project goes. It's about five-ish years old and it's only growing.
[53:28]
Kaslin Fields
So if you're in the San Francisco area and want to learn more about Ray, you might consider checking out Ray Summit, which is at the very end of September and beginning of October, September 28th to October 2nd or something like that. We'll make sure to have the dates and links in the show notes. Thank you so much for that interview, Mofi. I'm really glad that I now know something about Ray.
[53:49]
Mofi Rahman
Thanks for having me.
Kaslin Fields
That brings us to the end of another episode. If you enjoyed this show, please help us spread the word and tell a friend. If you have any feedback for us, you can find us on social media at kubernetespod or reach us by email at kubernetespodcast@google.com. You can also check out the website at kubernetespodcast.com, where you will find transcripts, show notes, and links to subscribe. Please consider rating us in your podcast player so we can help more people find and enjoy the show. Thanks for listening, and we'll see you next time.