[Doc][Website] Update KubeRay introduction and fix layout issues #1042

Merged
merged 4 commits into from
Apr 24, 2023
37 changes: 27 additions & 10 deletions README.md
@@ -3,16 +3,33 @@
[![Build Status](https://github.com/ray-project/kuberay/workflows/Go-build-and-test/badge.svg)](https://github.com/ray-project/kuberay/actions)
[![Go Report Card](https://goreportcard.com/badge/github.com/ray-project/kuberay)](https://goreportcard.com/report/github.com/ray-project/kuberay)

KubeRay is an open source toolkit to run Ray applications on Kubernetes.
It provides several tools to simplify managing Ray clusters on Kubernetes.

- Ray Operator
- Backend services to create/delete cluster resources
- Kubectl plugin/CLI to operate CRD objects
- Native Job and Serving integration with Clusters
- Data Scientist centric workspace for fast prototyping (incubating)
- Kubernetes event dumper for ray clusters/pod/services (future work)
- Operator Integration with Kubernetes node problem detector (future work)
KubeRay is a powerful, open-source Kubernetes operator that simplifies the deployment and management of [Ray](https://github.com/ray-project/ray) applications on Kubernetes. It offers several key components:

**KubeRay core**: This is the official, fully maintained component of KubeRay that provides three custom resource definitions: RayCluster, RayJob, and RayService. These resources are designed to help you run a wide range of workloads with ease.

* **RayCluster**: KubeRay fully manages the lifecycle of RayCluster, including cluster creation/deletion, autoscaling, and ensuring fault tolerance.

* **RayJob**: With RayJob, KubeRay automatically creates a RayCluster and submits a job when the cluster is ready. You can also configure RayJob to automatically delete the RayCluster once the job finishes.

* **RayService**: RayService is made up of two parts: a RayCluster and a Ray Serve deployment graph. RayService offers zero-downtime upgrades for RayCluster and high availability.
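For a concrete picture of what the core CRDs look like, here is a minimal RayCluster sketch. It is illustrative only: the name, image tag, and replica counts are assumptions rather than values taken from this repository, so treat it as a starting point and consult the KubeRay sample manifests for authoritative examples.

```yaml
# Minimal RayCluster sketch (illustrative values only)
apiVersion: ray.io/v1alpha1
kind: RayCluster
metadata:
  name: raycluster-mini
spec:
  rayVersion: "2.2.0"                # should match the Ray version in the container image
  headGroupSpec:
    rayStartParams:
      dashboard-host: "0.0.0.0"
    template:
      spec:
        containers:
          - name: ray-head
            image: rayproject/ray:2.2.0-py38-cpu
  workerGroupSpecs:
    - groupName: workergroup
      replicas: 1
      minReplicas: 1
      maxReplicas: 3
      rayStartParams: {}
      template:
        spec:
          containers:
            - name: ray-worker
              image: rayproject/ray:2.2.0-py38-cpu
```

Applying a manifest like this with `kubectl apply -f` should lead the operator to create one head Pod and one worker Pod; RayJob and RayService embed a cluster spec of the same shape inside their own resources.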

**Community-managed components (optional)**: Some components are maintained by the KubeRay community.

* **KubeRay APIServer**: It provides a layer of simplified configuration for KubeRay resources. The KubeRay API server is used internally
by some organizations to back user interfaces for KubeRay resource management.

* **KubeRay Python client**: This Python client library provides APIs to manage RayCluster resources from your Python application.

* **KubeRay CLI**: KubeRay CLI provides the ability to manage KubeRay resources through a command-line interface.

## KubeRay ecosystem

* [AWS Application Load Balancer](docs/guidance/ingress.md)
* [Nginx](docs/guidance/ingress.md)
* [Prometheus and Grafana](docs/guidance/prometheus-grafana.md)
* [Volcano](docs/guidance/volcano-integration.md)
* [MCAD](docs/guidance/kuberay-with-MCAD.md)
* [Kubeflow](docs/guidance/kubeflow-integration.md)

## Documentation

1 change: 1 addition & 0 deletions docs/guidance/ingress.md
@@ -1,6 +1,7 @@
## Ingress Usage

Here we provide some examples to show how to use ingress to access your Ray cluster.

* [Example: AWS Application Load Balancer (ALB) Ingress support on AWS EKS](#example-aws-application-load-balancer-alb-ingress-support-on-aws-eks)
* [Example: Manually setting up NGINX Ingress on KinD](#example-manually-setting-up-nginx-ingress-on-kind)

1 change: 1 addition & 0 deletions docs/guidance/kubeflow-integration.md
@@ -45,6 +45,7 @@ kubectl get pod -l ray.io/cluster=raycluster-kuberay
# raycluster-kuberay-head-bz77b 1/1 Running 0 64s
# raycluster-kuberay-worker-workergroup-8gr5q 1/1 Running 0 63s
```

* This step uses `rayproject/ray:2.2.0-py38-cpu` as its image. Ray is very sensitive to mismatched Python and Ray versions between the server (RayCluster) and client (JupyterLab) sides. This image uses:
* Python 3.8.13
* Ray 2.2.0
1 change: 1 addition & 0 deletions docs/guidance/observability.md
@@ -6,6 +6,7 @@
In the RayCluster resource definition, we use `State` to represent the current status of the Ray cluster.

For now, the RayCluster's `status.state` field exposes three possible values: `ready`, `unhealthy`, and `failed`.

| State | Description |
| --------- | ----------------------------------------------------------------------------------------------- |
| ready | The Ray cluster is ready for use. |
12 changes: 7 additions & 5 deletions docs/guidance/pod-command.md
@@ -30,12 +30,13 @@ Currently, for timing (1), we can set the container's `Command` and `Args` in Ra
command: ["echo 123"]
args: ["456"]
```

* Ray head Pod
* `spec.containers.0.command` is hardcoded with `["/bin/bash", "-lc", "--"]`.
* `spec.containers.0.args` contains two parts:
* (Part 1) **user-specified command**: A string that concatenates `headGroupSpec.template.spec.containers.0.command` and `headGroupSpec.template.spec.containers.0.args` from RayCluster.
* (Part 2) **ray start command**: The command is created based on `rayStartParams` specified in RayCluster. The command will look like `ulimit -n 65536; ray start ...`.
* To summarize, `spec.containers.0.args` will be `$(user-specified command) && $(ray start command)`, as sketched below.
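To make the concatenation rule concrete, the resulting head container would look roughly like the following. This is an illustrative reconstruction based on the `echo 123` / `456` example and the `ulimit -n 65536; ray start ...` command described above, not output copied from a real Pod.

```yaml
# Illustrative head container after KubeRay injects the ray start command
# (the actual ray start flags depend on rayStartParams)
command: ["/bin/bash", "-lc", "--"]
args: ["echo 123 456 && ulimit -n 65536; ray start ..."]
```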

* Example
```sh
@@ -128,6 +129,7 @@ lifecycle:
exec:
command: ["/bin/sh","-c","/home/ray/samples/ray_cluster_resources.sh"]
```

* We execute the script `ray_cluster_resources.sh` via the postStart hook. Based on [this document](https://kubernetes.io/docs/concepts/containers/container-lifecycle-hooks/#container-hooks), there is no guarantee that the hook will execute before the container ENTRYPOINT. Hence, we need to wait for RayCluster to finish initialization in `ray_cluster_resources.sh`.

* Example
4 changes: 3 additions & 1 deletion docs/guidance/prometheus-grafana.md
@@ -25,6 +25,7 @@ kubectl get all -n prometheus-system
# deployment.apps/prometheus-kube-prometheus-operator 1/1 1 1 46s
# deployment.apps/prometheus-kube-state-metrics 1/1 1 1 46s
```

* KubeRay provides an [install.sh script](../../install/prometheus/install.sh) to install the [kube-prometheus-stack](https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack) chart and related custom resources, including **ServiceMonitor**, **PodMonitor** and **PrometheusRule**, in the namespace `prometheus-system` automatically.

## Step 3: Install a KubeRay operator
@@ -92,9 +93,9 @@ spec:
targetLabels:
- ray.io/cluster
```

* The YAML example above is [serviceMonitor.yaml](../../config/prometheus/serviceMonitor.yaml), and it is created by **install.sh**. Hence, no need to create anything here.
* See [ServiceMonitor official document](https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/api.md#servicemonitor) for more details about the configurations.

* `release: $HELM_RELEASE`: Prometheus can only detect ServiceMonitor with this label.
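As a sketch of where that label lives, the ServiceMonitor's metadata would carry it roughly as below. The resource name and release value here are assumptions; the release label must match the Helm release name used when installing kube-prometheus-stack.

```yaml
# Illustrative: Prometheus only picks up ServiceMonitors whose labels match its
# serviceMonitorSelector, which keys off the Helm release label (see the note above)
metadata:
  name: ray-head-monitor        # hypothetical name
  namespace: prometheus-system
  labels:
    release: prometheus         # assumed Helm release name
```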

<div id="prometheus-can-only-detect-this-label" ></div>
@@ -156,6 +157,7 @@ spec:
podMetricsEndpoints:
- port: metrics
```

* `release: $HELM_RELEASE`: Prometheus can only detect PodMonitor with this label. See [here](#prometheus-can-only-detect-this-label) for more details.

* The `namespaceSelector` and `selector` fields in **PodMonitor** are used to select Kubernetes Pods.
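For illustration, those two fields might be filled in roughly as follows; the namespace and label key/value are assumptions (KubeRay attaches `ray.io/*` labels to the Pods it creates), so adjust them to the Pods you actually want to scrape.

```yaml
# Illustrative PodMonitor selection block (namespace and labels are assumed)
namespaceSelector:
  matchNames:
    - default
selector:
  matchLabels:
    ray.io/node-type: worker   # scrape Ray worker Pods
```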
2 changes: 2 additions & 0 deletions docs/guidance/tls.md
@@ -42,6 +42,7 @@ kubectl apply -f ray-operator/config/samples/ray-cluster.tls.yaml
```

`ray-cluster.tls.yaml` will create:

* A Kubernetes Secret containing the CA's private key (`ca.key`) and self-signed certificate (`ca.crt`) (**Step 1**)
* A Kubernetes ConfigMap containing the scripts `gencert_head.sh` and `gencert_worker.sh`, which allow Ray Pods to generate private keys
(`tls.key`) and self-signed certificates (`tls.crt`) (**Step 2**)
@@ -75,6 +76,7 @@ openssl x509 -in ca.crt -noout -text
# (Note: You should comment out the Kubernetes Secret in `ray-cluster.tls.yaml`.)
kubectl create secret generic ca-tls --from-file=ca.key --from-file=ca.crt
```

* `ca.key`: CA's private key
* `ca.crt`: CA's self-signed certificate
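For reference, the Secret created by the `kubectl create secret generic` command above has roughly the shape sketched below; the placeholders stand in for the base64-encoded file contents, which `kubectl` encodes for you.

```yaml
# Illustrative shape of the resulting ca-tls Secret
apiVersion: v1
kind: Secret
metadata:
  name: ca-tls
type: Opaque
data:
  ca.key: <base64-encoded contents of ca.key>   # CA private key
  ca.crt: <base64-encoded contents of ca.crt>   # CA self-signed certificate
```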

34 changes: 25 additions & 9 deletions docs/index.md
@@ -24,17 +24,33 @@

## KubeRay

KubeRay is an open source toolkit to run Ray applications on Kubernetes.
KubeRay is a powerful, open-source Kubernetes operator that simplifies the deployment and management of [Ray](https://github.com/ray-project/ray) applications on Kubernetes. It offers several key components:

KubeRay provides several tools to simplify managing Ray clusters on Kubernetes.
**KubeRay core**: This is the official, fully maintained component of KubeRay that provides three custom resource definitions: RayCluster, RayJob, and RayService. These resources are designed to help you run a wide range of workloads with ease.

* **RayCluster**: KubeRay fully manages the lifecycle of RayCluster, including cluster creation/deletion, autoscaling, and ensuring fault tolerance.

- Ray Operator
- Backend services to create/delete cluster resources
- Kubectl plugin/CLI to operate CRD objects
- Native Job and Serving integration with Clusters
- Data Scientist centric workspace for fast prototyping (incubating)
- Kubernetes event dumper for ray clusters/pod/services (future work)
- Operator Integration with Kubernetes node problem detector (future work)
* **RayJob**: With RayJob, KubeRay automatically creates a RayCluster and submits a job when the cluster is ready. You can also configure RayJob to automatically delete the RayCluster once the job finishes.

* **RayService**: RayService is made up of two parts: a RayCluster and a Ray Serve deployment graph. RayService offers zero-downtime upgrades for RayCluster and high availability.

**Community-managed components (optional)**: Some components are maintained by the KubeRay community.

* **KubeRay APIServer**: It provides a layer of simplified configuration for KubeRay resources. The KubeRay API server is used internally
by some organizations to back user interfaces for KubeRay resource management.

* **KubeRay Python client**: This Python client library provides APIs to manage RayCluster resources from your Python application.

* **KubeRay CLI**: KubeRay CLI provides the ability to manage KubeRay resources through a command-line interface.

## KubeRay ecosystem

* [AWS Application Load Balancer](guidance/ingress/#example-aws-application-load-balancer-alb-ingress-support-on-aws-eks)
* [Nginx](guidance/ingress/#example-manually-setting-up-nginx-ingress-on-kind)
* [Prometheus and Grafana](guidance/prometheus-grafana/)
* [Volcano](guidance/volcano-integration/)
* [MCAD](guidance/kuberay-with-MCAD/)
* [Kubeflow](guidance/kubeflow-integration/)

## Security
