From 911e852ceb0743b7f8cf589bdcc695ffc49f6c80 Mon Sep 17 00:00:00 2001
From: Dmitri Gekhtman
Date: Wed, 7 Dec 2022 12:53:46 -0800
Subject: [PATCH 1/9] wip

Signed-off-by: Dmitri Gekhtman
---
 ray-operator/README.md                        | 45 ++++++++++---------
 .../config/samples/config-map-ray-code.yaml   | 25 -----------
 2 files changed, 25 insertions(+), 45 deletions(-)
 delete mode 100644 ray-operator/config/samples/config-map-ray-code.yaml

diff --git a/ray-operator/README.md b/ray-operator/README.md
index 0e7d698e98..aad82d5487 100644
--- a/ray-operator/README.md
+++ b/ray-operator/README.md
@@ -1,44 +1,44 @@
 # Ray Kubernetes Operator
-KubeRay operator makes deploying and managing Ray clusters on top of Kubernetes painless - clusters are defined as a custom RayCluster resource and managed by a fault-tolerant Ray controller.
+The KubeRay Operator makes deploying and managing Ray clusters on top of Kubernetes painless - clusters are defined as a custom RayCluster resource and managed by a fault-tolerant Ray controller. The operator automates management Ray cluster lifecycle, autoscaling, and other critical functions.
 The Ray Operator is a Kubernetes operator to automate provisioning, management, autoscaling and operations of Ray clusters deployed to Kubernetes.

-![overview](media/overview.png)
+![Overview](media/overview.png)

 Some of the main features of the operator are:
 - Management of first-class RayClusters via a [custom resource](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/#custom-resources).
 - Support for heterogeneous worker types in a single Ray cluster.
+- Ray Autoscaler integration; autoscaling based on Ray application semantics.
+- Use of `ScaleStrategy` to remove specific nodes in specific groups
 - Built-in monitoring via Prometheus.
-- Use of `PodTemplate` to create Ray pods
-- Updated status based on the running pods
+- Use of Kubernetes `PodTemplates` to configure Ray pods
+- Updated RayCluster Status based on the state of running Ray pods
 - Events added to the `RayCluster` instance
-- Automatically populate `environment variables` in the containers
-- Automatically prefix your container command with the `ray start` command
-- Automatically adding the volumeMount at `/dev/shm` for shared memory
-- Use of `ScaleStartegy` to remove specific nodes in specific groups
+- Automated management of critical configuration, such as required `environment variables`, the `ray start` entrypoint, and a `/dev/shm` volume mount for Ray's shared memory.

 ## Overview

-When deployed, the ray operator will watch for K8s events (create/delete/update) for the `raycluster` resources. The ray operator can create a raycluster (head + multiple workers), delete a cluster, or update the cluster by adding or removing worker pods.
+When deployed, the KubeRay Operator will watch for K8s events (Create/Delete/Update) for the `RayCluster` resources. The KubeRay Operator can create a Raycluster (Ray head pod + multiple Ray worker pods), delete a cluster, or update the cluster by adding or removing worker pods.

 ### Ray cluster creation

-Once a `raycluster` resource is created, the operator will configure and create the ray-head and the ray-workers specified in the `raycluster` manifest as shown below.
+Once a `RayCluster` resource is created, the operator will configure and create the Ray head pod and the Ray worker pods specified in the `RayCluster` manifest as shown below.
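For orientation, a minimal `RayCluster` manifest might look like the following sketch (it is not taken from the samples referenced in this patch; the Ray version, image tag, cluster name, and group name are illustrative assumptions):

```yaml
# A sketch of a small RayCluster, assuming the v1alpha1 API and a generic Ray image.
apiVersion: ray.io/v1alpha1
kind: RayCluster
metadata:
  name: raycluster-mini                 # hypothetical name
spec:
  rayVersion: '2.0.0'                   # should match the Ray version in the container images
  headGroupSpec:
    rayStartParams:
      dashboard-host: '0.0.0.0'
      block: 'true'
    template:
      spec:
        containers:
          - name: ray-head
            image: rayproject/ray:2.0.0  # assumed image tag
  workerGroupSpecs:
    - groupName: small-group            # hypothetical group name
      replicas: 1
      minReplicas: 1
      maxReplicas: 3
      rayStartParams:
        block: 'true'
      template:
        spec:
          containers:
            - name: ray-worker
              image: rayproject/ray:2.0.0
```

Applying a manifest like this with `kubectl apply -f` starts the creation flow animated below.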
 ![](media/create-ray-cluster.gif)

-### Ray cluster Update
+### Ray cluster update

-You can update the number of replicas in a worker goup, and specify which exact replica to remove by updated the raycluster resource manifest:
+You can update the number of replicas in a worker group, and specify which exact replica to remove by updated the RayCluster resource manifest:

 ![](media/update-ray-cluster.gif)

-### Ray cluster example code
-
-An example ray code is defined in this [configmap](config/samples/config-map-ray-code.yaml) that is mounted into the ray head-pod. By examining the logs of the head pod, we can see the list of the IP addresses of the nodes that joined the ray cluster:
-
-![](media/logs-ray-cluster.gif)
+!!! note
+    While updating `replicas` and `workersToDelete` is supported, updating other fields in RayCluster manifests is **not** supported.
+    In particular, updating Ray head pod and Ray worker pod configuration is not supported. To update pod configuration,
+    delete the RayCluster, edit its configuration and then re-create the cluster. In other words,
+    use `kubectl delete` and `kubectl create` to update a RayCluster's pod configuration, rather than `kubectl apply`.
+    Support for in-place updates of pod configuration is tracked in KubeRay issue [#527](https://github.com/ray-project/kuberay/issues/527).

 ### Deploy the operator

@@ -78,11 +78,16 @@ Sample | Description
 !!! note
     For production use-cases, make sure to allocate sufficient resources for your Ray pods; it usually makes
-    sense to run one large Ray pod per Kubernetes node.
+    sense to run one large Ray pod per Kubernetes node. We do not recommend allocating less than 8Gb memory for a Ray pod
+    running in production. Always set limits for memory and CPU. When possible, set requests equal to limits.
+    See the Ray documentation for further guidance.
     See [ray-cluster.complete.large.yaml](config/samples/ray-cluster.complete.large.yaml) and
-    [ray-cluster.autoscaler.large.yaml](config/samples/ray-cluster.autoscaler.yaml) for guidance. The rest of the sample configs above are geared towards experimentation in local kind or minikube environments.
+    [ray-cluster.autoscaler.large.yaml](config/samples/ray-cluster.autoscaler.yaml) for examples of RayCluster
+    resource configurations suitable for production.
+    The rest of the sample configs above are geared towards experimentation in local kind or minikube environments.

-    The memory usage of the KubeRay operator depends on the number of pods and Ray clusters being managed. Anecdotally, managing 500 Ray pods requires roughly 500MB memory. Monitor memory usage and adjust requests and limits as needed.
+    The memory usage of the KubeRay Operator depends on the number of pods and Ray clusters being managed.
+    Anecdotally, managing 500 Ray pods requires roughly 500MB memory. Monitor memory usage and adjust requests and limits as needed.

 ```shell
 # Create a RayCluster and a ConfigMap with hello world Ray code.
diff --git a/ray-operator/config/samples/config-map-ray-code.yaml b/ray-operator/config/samples/config-map-ray-code.yaml deleted file mode 100644 index db806bd2c4..0000000000 --- a/ray-operator/config/samples/config-map-ray-code.yaml +++ /dev/null @@ -1,25 +0,0 @@ -apiVersion: v1 -kind: ConfigMap -metadata: - name: ray-code -data: - sample_code.py: | - import ray - from os import environ - redis_pass = environ.get("REDIS_PASSWORD") - print("trying to connect to Ray!") - ray.init(address="auto", _redis_password=redis_pass) - print("now executing some code with Ray!") - import time - start = time.time() - @ray.remote - def f(): - time.sleep(0.01) - return ray._private.services.get_node_ip_address() - values=set(ray.get([f.remote() for _ in range(1000)])) - print("Ray Nodes: ",str(values)) - file = open("/tmp/ray_nodes.txt","a") - file.write("available nodes: %s\n" % str(values)) - file.close() - end = time.time() - print("Execution time = ",end - start) \ No newline at end of file From 187d012ae1a3d1507c73ca59d0d562c33e35e53e Mon Sep 17 00:00:00 2001 From: Dmitri Gekhtman Date: Wed, 7 Dec 2022 13:01:49 -0800 Subject: [PATCH 2/9] wip Signed-off-by: Dmitri Gekhtman --- ray-operator/README.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/ray-operator/README.md b/ray-operator/README.md index aad82d5487..eb079eb184 100644 --- a/ray-operator/README.md +++ b/ray-operator/README.md @@ -84,11 +84,12 @@ Sample | Description See [ray-cluster.complete.large.yaml](config/samples/ray-cluster.complete.large.yaml) and [ray-cluster.autoscaler.large.yaml](config/samples/ray-cluster.autoscaler.yaml) for examples of RayCluster resource configurations suitable for production. - The rest of the sample configs above are geared towards experimentation in local kind or minikube environments. + The rest of the sample configs above are meant only for experimentation in local kind or minikube environments. The memory usage of the KubeRay Operator depends on the number of pods and Ray clusters being managed. Anecdotally, managing 500 Ray pods requires roughly 500MB memory. Monitor memory usage and adjust requests and limits as needed. +We recommend running the following example in a kind or minikube environment with a resource capacity of at least 4CPU and 4Gb memory. ```shell # Create a RayCluster and a ConfigMap with hello world Ray code. $ kubectl create -f config/samples/ray-cluster.heterogeneous.yaml From cb2ef774e4c805da6071e3667707006719baeacc Mon Sep 17 00:00:00 2001 From: Dmitri Gekhtman Date: Wed, 7 Dec 2022 13:06:34 -0800 Subject: [PATCH 3/9] wip Signed-off-by: Dmitri Gekhtman --- ray-operator/README.md | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/ray-operator/README.md b/ray-operator/README.md index eb079eb184..2876d78d2b 100644 --- a/ray-operator/README.md +++ b/ray-operator/README.md @@ -80,7 +80,7 @@ Sample | Description For production use-cases, make sure to allocate sufficient resources for your Ray pods; it usually makes sense to run one large Ray pod per Kubernetes node. We do not recommend allocating less than 8Gb memory for a Ray pod running in production. Always set limits for memory and CPU. When possible, set requests equal to limits. - See the Ray documentation for further guidance. + See the [Ray documentation](https://docs.ray.io/en/latest/cluster/kubernetes/user-guides/config.html) for further guidance. 
See [ray-cluster.complete.large.yaml](config/samples/ray-cluster.complete.large.yaml) and [ray-cluster.autoscaler.large.yaml](config/samples/ray-cluster.autoscaler.yaml) for examples of RayCluster resource configurations suitable for production. @@ -91,6 +91,10 @@ Sample | Description We recommend running the following example in a kind or minikube environment with a resource capacity of at least 4CPU and 4Gb memory. ```shell +# From the parent of your cloned kuberay repo: +$ cd kuberay/ray-operator +# If you haven't already done so, deploy the KubeRay operator. +$ kubectl create -k config/default # Create a RayCluster and a ConfigMap with hello world Ray code. $ kubectl create -f config/samples/ray-cluster.heterogeneous.yaml configmap/ray-code created From dad251f67f77e09410b6f4a6d9add6b13f70d296 Mon Sep 17 00:00:00 2001 From: Dmitri Gekhtman Date: Wed, 7 Dec 2022 13:28:11 -0800 Subject: [PATCH 4/9] Draft Signed-off-by: Dmitri Gekhtman --- ray-operator/README.md | 21 ++++++++++++------- .../samples/ray-cluster.heterogeneous.yaml | 2 +- 2 files changed, 15 insertions(+), 8 deletions(-) diff --git a/ray-operator/README.md b/ray-operator/README.md index 2876d78d2b..d14ca65cbb 100644 --- a/ray-operator/README.md +++ b/ray-operator/README.md @@ -90,13 +90,16 @@ Sample | Description Anecdotally, managing 500 Ray pods requires roughly 500MB memory. Monitor memory usage and adjust requests and limits as needed. We recommend running the following example in a kind or minikube environment with a resource capacity of at least 4CPU and 4Gb memory. +Run the following commands from the root of your cloned kuberay repo. ```shell -# From the parent of your cloned kuberay repo: -$ cd kuberay/ray-operator +# Clone the kuberay repo if you haven't already. +$ git clone https://github.com/ray-project/kuberay +# Enter the root of the repo +$ cd kuberay/ # If you haven't already done so, deploy the KubeRay operator. -$ kubectl create -k config/default +$ kubectl create -k ray-operator/config/default # Create a RayCluster and a ConfigMap with hello world Ray code. -$ kubectl create -f config/samples/ray-cluster.heterogeneous.yaml +$ kubectl create -f ray-operator/config/samples/ray-cluster.heterogeneous.yaml configmap/ray-code created raycluster.ray.io/raycluster-heterogeneous created @@ -106,6 +109,9 @@ NAME AGE raycluster-heterogeneous 2m48s # The created cluster should include a head pod, worker pod, and a head service. +# It may take a few minutes for the pods to enter Running status. +# If you're on minikube or kind, a Pending status indicates that your local Kubernetes environment +# may not have sufficient CPU or memory capacity -- try adjusting your Docker settings. $ kubectl get pods NAME READY STATUS RESTARTS AGE raycluster-heterogeneous-head-9t28q 1/1 Running 0 97s @@ -123,8 +129,8 @@ raycluster-heterogeneous-head-svc ClusterIP 10.96.47.129 637 ``` ```shell -# check the logs of the head pod -$ kubectl logs raycluster-heterogeneous-head-5r6qr +# Check the logs of the head pod. (Substitute the name of your head pod in this step.) +$ kubectl logs raycluster-heterogeneous-head-9t28q 2022-09-21 13:21:57,505 INFO usage_lib.py:479 -- Usage stats collection is enabled by default without user confirmation because this terminal is detected to be non-interactive. To disable this, add `--disable-usage-stats` to the command that starts the cluster, or run the following command: `ray disable-usage-stats` before starting the cluster. See https://docs.ray.io/en/master/cluster/usage-stats.html for more details. 
2022-09-21 13:21:57,505 INFO scripts.py:719 -- Local node IP: 10.244.0.144 2022-09-21 13:22:00,513 SUCC scripts.py:756 -- -------------------- @@ -151,6 +157,7 @@ $ kubectl logs raycluster-heterogeneous-head-5r6qr Execute hello world Ray code ```shell +# Substitute the name of your head pod in this step. $ kubectl exec raycluster-heterogeneous-head-9t28q -- python /opt/sample_code.py 2022-09-21 13:28:41,176 INFO worker.py:1224 -- Using address 127.0.0.1:6379 set in the environment variable RAY_ADDRESS 2022-09-21 13:28:41,176 INFO worker.py:1333 -- Connecting to existing Ray cluster at address: 10.244.0.144:6379... @@ -161,7 +168,7 @@ Ray Nodes: {'10.244.0.145', '10.244.0.143', '10.244.0.146', '10.244.0.144', '10 Execution time = 4.855740308761597 ``` -The output of hello world Ray code show 5 nodes in the Ray cluster +The output of the hello world Ray code show 5 nodes in the Ray cluster ``` Ray Nodes: {'10.244.0.145', '10.244.0.143', '10.244.0.146', '10.244.0.144', '10.244.0.147'} ``` diff --git a/ray-operator/config/samples/ray-cluster.heterogeneous.yaml b/ray-operator/config/samples/ray-cluster.heterogeneous.yaml index 68acc1396c..0168cddd98 100644 --- a/ray-operator/config/samples/ray-cluster.heterogeneous.yaml +++ b/ray-operator/config/samples/ray-cluster.heterogeneous.yaml @@ -43,7 +43,7 @@ spec: # the following params are used to complete the ray start: ray start --head --block ... rayStartParams: dashboard-host: '0.0.0.0' - num-cpus: '1' # can be auto-completed from the limits + num-cpus: '1' # can be auto-completed from Ray container resource limits block: 'true' #pod template template: From 67c0830d98800a581519528275319d3234c2ec9b Mon Sep 17 00:00:00 2001 From: Dmitri Gekhtman Date: Wed, 7 Dec 2022 13:29:13 -0800 Subject: [PATCH 5/9] Remove link. Signed-off-by: Dmitri Gekhtman --- docs/components/config/samples/config-map-ray-code.yaml | 1 - 1 file changed, 1 deletion(-) delete mode 120000 docs/components/config/samples/config-map-ray-code.yaml diff --git a/docs/components/config/samples/config-map-ray-code.yaml b/docs/components/config/samples/config-map-ray-code.yaml deleted file mode 120000 index ed6972a36b..0000000000 --- a/docs/components/config/samples/config-map-ray-code.yaml +++ /dev/null @@ -1 +0,0 @@ -../../../../ray-operator/config/samples/config-map-ray-code.yaml \ No newline at end of file From e93ee306320a52ec989dd8c7831bc59a14e46f44 Mon Sep 17 00:00:00 2001 From: Dmitri Gekhtman Date: Wed, 7 Dec 2022 13:40:20 -0800 Subject: [PATCH 6/9] . Signed-off-by: Dmitri Gekhtman --- ray-operator/README.md | 22 +++++++++++----------- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/ray-operator/README.md b/ray-operator/README.md index d14ca65cbb..d7e63b5201 100644 --- a/ray-operator/README.md +++ b/ray-operator/README.md @@ -1,24 +1,24 @@ # Ray Kubernetes Operator -The KubeRay Operator makes deploying and managing Ray clusters on top of Kubernetes painless - clusters are defined as a custom RayCluster resource and managed by a fault-tolerant Ray controller. The operator automates management Ray cluster lifecycle, autoscaling, and other critical functions. -The Ray Operator is a Kubernetes operator to automate provisioning, management, autoscaling and operations of Ray clusters deployed to Kubernetes. +The KubeRay Operator makes deploying and managing Ray clusters on top of Kubernetes painless. Clusters are defined as a custom RayCluster resource and managed by a fault-tolerant Ray controller. 
The KubeRay Operator automates Ray cluster lifecycle management, autoscaling, and other critical functions.

 ![Overview](media/overview.png)

-Some of the main features of the operator are:
+Below are some of the main features of the KubeRay operator:
+
 - Management of first-class RayClusters via a [custom resource](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/#custom-resources).
 - Support for heterogeneous worker types in a single Ray cluster.
-- Ray Autoscaler integration; autoscaling based on Ray application semantics.
-- Use of `ScaleStrategy` to remove specific nodes in specific groups
-- Built-in monitoring via Prometheus.
+- Optional Ray Autoscaler integration; autoscaling based on Ray application semantics.
 - Use of Kubernetes `PodTemplates` to configure Ray pods
-- Updated RayCluster Status based on the state of running Ray pods
-- Events added to the `RayCluster` instance
+- Use of `ScaleStrategy` to remove specific Ray worker pods.
 - Automated management of critical configuration, such as required `environment variables`, the `ray start` entrypoint, and a `/dev/shm` volume mount for Ray's shared memory.
+- Built-in monitoring via Prometheus.
+- Each `RayCluster`'s Status is updated based on the state of running Ray pods.
+- Kubernetes Events concerning `RayCluster` instances are emitted to aid observability.

 ## Overview

-When deployed, the KubeRay Operator will watch for K8s events (Create/Delete/Update) for the `RayCluster` resources. The KubeRay Operator can create a Raycluster (Ray head pod + multiple Ray worker pods), delete a cluster, or update the cluster by adding or removing worker pods.
+When deployed, the KubeRay Operator will watch for K8s events (Create/Delete/Update) for `RayCluster` resources. The KubeRay Operator can create a Ray cluster (Ray head pod + multiple Ray worker pods), delete a Ray cluster, or update the Ray cluster by adding or removing worker pods.

 ### Ray cluster creation

@@ -28,7 +28,7 @@ Once a `RayCluster` resource is created, the operator will configure and create

 ### Ray cluster update

-You can update the number of replicas in a worker group, and specify which exact replica to remove by updated the RayCluster resource manifest:
+You can update the number of replicas in a worker group, and specify which exact replica to remove by updating the RayCluster resource manifest:

 ![](media/update-ray-cluster.gif)

@@ -57,7 +57,7 @@ NAME READY STATUS RESTARTS AGE
 ray-operator-75dbbf8587-5lrvn 1/1 Running 0 31s
 ```

-Delete the operator
+Delete the operator.
 ```shell
 kubectl delete -k "github.com/ray-project/kuberay/ray-operator/config/default"
 ```

From d4641214a1d1ab25994075ae2b2c4e95f5763b29 Mon Sep 17 00:00:00 2001
From: Dmitri Gekhtman
Date: Wed, 7 Dec 2022 13:55:04 -0800
Subject: [PATCH 7/9] Archit: Period.

Signed-off-by: Dmitri Gekhtman
---
 ray-operator/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ray-operator/README.md b/ray-operator/README.md
index d7e63b5201..b62e829da8 100644
--- a/ray-operator/README.md
+++ b/ray-operator/README.md
@@ -9,7 +9,7 @@ Below are some of the main features of the KubeRay operator:
 - Management of first-class RayClusters via a [custom resource](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/#custom-resources).
 - Support for heterogeneous worker types in a single Ray cluster.
 - Optional Ray Autoscaler integration; autoscaling based on Ray application semantics.
-- Use of Kubernetes `PodTemplates` to configure Ray pods
+- Use of Kubernetes `PodTemplates` to configure Ray pods.
 - Use of `ScaleStrategy` to remove specific Ray worker pods.
 - Automated management of critical configuration, such as required `environment variables`, the `ray start` entrypoint, and a `/dev/shm` volume mount for Ray's shared memory.
 - Built-in monitoring via Prometheus.

From 4ff808b28d80fa72cf49e5f9b538221b87a0cddc Mon Sep 17 00:00:00 2001
From: Dmitri Gekhtman
Date: Wed, 7 Dec 2022 13:57:35 -0800
Subject: [PATCH 8/9] Link to subsection.

Signed-off-by: Dmitri Gekhtman
---
 ray-operator/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ray-operator/README.md b/ray-operator/README.md
index b62e829da8..f632f24334 100644
--- a/ray-operator/README.md
+++ b/ray-operator/README.md
@@ -80,7 +80,7 @@ Sample | Description
     For production use-cases, make sure to allocate sufficient resources for your Ray pods; it usually makes
     sense to run one large Ray pod per Kubernetes node. We do not recommend allocating less than 8Gb memory for a Ray pod
     running in production. Always set limits for memory and CPU. When possible, set requests equal to limits.
-    See the [Ray documentation](https://docs.ray.io/en/latest/cluster/kubernetes/user-guides/config.html) for further guidance.
+    See the [Ray documentation](https://docs.ray.io/en/latest/cluster/kubernetes/user-guides/config.html#resources) for further guidance.

     See [ray-cluster.complete.large.yaml](config/samples/ray-cluster.complete.large.yaml) and
     [ray-cluster.autoscaler.large.yaml](config/samples/ray-cluster.autoscaler.yaml) for examples of RayCluster
     resource configurations suitable for production.

From a18cb5540d2d1cf6dab8e04744f01c6e5cd3af26 Mon Sep 17 00:00:00 2001
From: Dmitri Gekhtman
Date: Wed, 7 Dec 2022 13:59:48 -0800
Subject: [PATCH 9/9] tweaks

Signed-off-by: Dmitri Gekhtman
---
 ray-operator/README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/ray-operator/README.md b/ray-operator/README.md
index f632f24334..ee31352b1b 100644
--- a/ray-operator/README.md
+++ b/ray-operator/README.md
@@ -155,7 +155,7 @@ $ kubectl logs raycluster-heterogeneous-head-9t28q
 2022-09-21 13:22:00,515 INFO scripts.py:910 -- Running subprocesses are monitored and a message will be printed if any of them terminate unexpectedly. Subprocesses exit with SIGTERM will be treated as graceful, thus NOT reported.
 ```

-Execute hello world Ray code
+Now, we can run the hello world Ray code mounted from the config map created above.
 ```shell
 # Substitute the name of your head pod in this step.
 $ kubectl exec raycluster-heterogeneous-head-9t28q -- python /opt/sample_code.py
 2022-09-21 13:28:41,176 INFO worker.py:1224 -- Using address 127.0.0.1:6379 set in the environment variable RAY_ADDRESS
 2022-09-21 13:28:41,176 INFO worker.py:1333 -- Connecting to existing Ray cluster at address: 10.244.0.144:6379...
 2022-09-21 13:28:41,183 INFO worker.py:1509 -- Connected to Ray cluster.
 trying to connect to Ray!
 now executing some code with Ray!
 Ray Nodes: {'10.244.0.145', '10.244.0.143', '10.244.0.146', '10.244.0.144', '10.244.0.147'}
 Execution time = 4.855740308761597
 ```

-The output of the hello world Ray code show 5 nodes in the Ray cluster
+The output of the hello world Ray code shows 5 nodes in the Ray cluster.
 ```
 Ray Nodes: {'10.244.0.145', '10.244.0.143', '10.244.0.146', '10.244.0.144', '10.244.0.147'}
 ```
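After trying the example, cleanup might look like the following sketch (these commands are not part of the patches above; they simply mirror the create commands used earlier and assume they are run from the root of the cloned kuberay repo):

```shell
# Remove the RayCluster and the ConfigMap created by the heterogeneous sample.
kubectl delete -f ray-operator/config/samples/ray-cluster.heterogeneous.yaml
# Optionally, remove the KubeRay operator deployment as well.
kubectl delete -k ray-operator/config/default
```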