[Doc] [KubeRay] Add tutorial for connecting to google cloud storage bucket from GKE RayCluster #38858

Merged: 66 commits, merged Aug 31, 2023. The diff below shows changes from 42 of the 66 commits.

Commits:
- kevin85421, Aug 20 to Aug 24, 2023: "add docs", "add rayservice doc", "remove RayService", "kubeflow doc", many "update" commits, and applied review suggestions to `doc/source/cluster/kubernetes/benchmarks/memory-scalability-be…` and `doc/source/cluster/kubernetes/k8s-ecosystem/prometheus-grafana.md`.
- architkulkarni, Aug 24 to Aug 31, 2023: "Add batch inference KubeRay RayJob example", "Add bucket doc", "Add to TOC", "Add to user guides list", "Remove "labels"", "Remove YAML and link to kuberay repo", "How to find project id", "rename RAY to GCP and remove shutdown()", merges from `master`, and applied review suggestions to `doc/source/cluster/kubernetes/user-guides/gke-gcs-bucket.md`.
1 change: 1 addition & 0 deletions doc/source/_toc.yml
@@ -327,6 +327,7 @@ parts:
- file: cluster/kubernetes/user-guides/gcp-gke-gpu-cluster.md
- file: cluster/kubernetes/user-guides/config.md
- file: cluster/kubernetes/user-guides/configuring-autoscaling.md
- file: cluster/kubernetes/user-guides/gke-gcs-bucket.md
- file: cluster/kubernetes/user-guides/logging.md
- file: cluster/kubernetes/user-guides/gpu.md
- file: cluster/kubernetes/user-guides/rayserve-dev-doc.md
168 changes: 168 additions & 0 deletions doc/source/cluster/kubernetes/user-guides/gke-gcs-bucket.md
@@ -0,0 +1,168 @@
# Configuring KubeRay to use Google Cloud Storage Buckets in GKE

You need to specify a Kubernetes service account in each of the Ray pods after linking that Kubernetes service account to your Google Cloud service account. If you are already familiar with Workload Identity in GKE, you can skip this document; otherwise, read on.

This document follows an abridged version of the documentation at <https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity>. The full documentation is worth reading if you are interested in the details.

## Create a Kubernetes cluster on GKE

For this example, create a minimal Kubernetes cluster on GKE to run KubeRay.

Run this command and all following commands on your local machine or in [Google Cloud Shell](https://cloud.google.com/shell). If you run them on your local machine, install the [Google Cloud SDK](https://cloud.google.com/sdk/docs/install) first.

```bash
gcloud container clusters create cloud-bucket-cluster \
--num-nodes=1 --min-nodes 0 --max-nodes 1 --enable-autoscaling \
--zone=us-west1-b --machine-type e2-standard-8 \
--workload-pool=my-project-id.svc.id.goog # Replace my-project-id with your GCP project ID
```


This command creates a Kubernetes cluster named `cloud-bucket-cluster` with one node in the `us-west1-b` zone, with autoscaling between 0 and 1 nodes. The `--workload-pool` flag enables Workload Identity on the cluster. This example uses the `e2-standard-8` machine type, which has 8 vCPUs and 32 GB of RAM.

Now get credentials for the cluster to use with `kubectl`:

```bash
gcloud container clusters get-credentials cloud-bucket-cluster --zone us-west1-b --project my-project-id
```

## Create an IAM Service Account

```bash
gcloud iam service-accounts create my-iam-sa
```

## Create a Kubernetes Service Account

```bash
kubectl create serviceaccount my-ksa
```

## Link the Kubernetes Service Account to the IAM Service Account and vice versa

In the following two commands, replace `default` with your namespace if you are not using the default namespace, and replace `my-project-id` with your GCP project ID.

```bash
gcloud iam service-accounts add-iam-policy-binding my-iam-sa@my-project-id.iam.gserviceaccount.com \
    --role roles/iam.workloadIdentityUser \
    --member "serviceAccount:my-project-id.svc.id.goog[default/my-ksa]"
```

`roles/iam.workloadIdentityUser` is a predefined IAM role. Granting it lets the Kubernetes service account `my-ksa` in the `default` namespace impersonate the IAM service account. See the Workload Identity documentation linked above for details.

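The identifiers in these commands follow fixed patterns derived from the project ID, the namespace, and the two account names. The sketch below assembles them in one place so the values stay consistent across commands. `workload_identity_names` is a hypothetical helper, and the string formats are taken from the commands in this guide.

```python
def workload_identity_names(project_id: str, namespace: str, ksa_name: str, gsa_name: str) -> dict:
    """Hypothetical helper: build the Workload Identity identifiers used in this guide."""
    workload_pool = f"{project_id}.svc.id.goog"
    gsa_email = f"{gsa_name}@{project_id}.iam.gserviceaccount.com"
    return {
        # Value passed to `gcloud container clusters create --workload-pool`.
        "workload_pool": workload_pool,
        # Email of the IAM service account from `gcloud iam service-accounts create`.
        "gsa_email": gsa_email,
        # Principal passed to `--member` when granting roles/iam.workloadIdentityUser.
        "iam_member": f"serviceAccount:{workload_pool}[{namespace}/{ksa_name}]",
        # Annotation key and value that `kubectl annotate serviceaccount` sets.
        "ksa_annotation": ("iam.gke.io/gcp-service-account", gsa_email),
    }


names = workload_identity_names("my-project-id", "default", "my-ksa", "my-iam-sa")
for key, value in names.items():
    print(f"{key}: {value}")
```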
```bash
kubectl annotate serviceaccount my-ksa \
--namespace default \
iam.gke.io/gcp-service-account=my-iam-sa@my-project-id.iam.gserviceaccount.com
```
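If you prefer to manage the Kubernetes service account declaratively, an equivalent of `kubectl create serviceaccount` followed by `kubectl annotate` is a single ServiceAccount manifest carrying the same annotation. The sketch below builds that manifest as a Python dict and prints it as JSON; `annotated_ksa_manifest` is a hypothetical helper, and applying the output with `kubectl apply -f` is one option, not a step this guide requires.

```python
import json


def annotated_ksa_manifest(ksa_name: str, namespace: str, gsa_email: str) -> dict:
    """Hypothetical helper: a ServiceAccount manifest with the Workload Identity annotation."""
    return {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {
            "name": ksa_name,
            "namespace": namespace,
            "annotations": {
                # Same key and value that `kubectl annotate serviceaccount` sets.
                "iam.gke.io/gcp-service-account": gsa_email,
            },
        },
    }


manifest = annotated_ksa_manifest(
    "my-ksa", "default", "my-iam-sa@my-project-id.iam.gserviceaccount.com"
)
# kubectl accepts JSON manifests, for example: save to my-ksa.json and run
# `kubectl apply -f my-ksa.json`.
print(json.dumps(manifest, indent=2))
```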

## Create a Google Cloud Storage Bucket and allow the Google Cloud Service Account to access it

Please follow the documentation at <https://cloud.google.com/storage/docs/creating-buckets> to create a bucket using the Google Cloud Console or the `gsutil` command line tool.

For this example, give the principal `my-iam-sa@my-project-id.iam.gserviceaccount.com` the "Storage Admin" role on the bucket. You can do this in the Google Cloud Console ("Permissions" tab under "Buckets" > "Bucket Details") or with the following command:

```bash
gsutil iam ch serviceAccount:my-iam-sa@my-project-id.iam.gserviceaccount.com:roles/storage.admin gs://my-bucket
```
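The same grant can be made from Python with the `google-cloud-storage` client, using a read-modify-write of the bucket's IAM policy. This is a minimal sketch, assuming `bucket` is a `google.cloud.storage.Bucket`; `grant_bucket_role` is a hypothetical name, and the function does not import the library itself so that it can be exercised with any object exposing `get_iam_policy` and `set_iam_policy`.

```python
def grant_bucket_role(bucket, member: str, role: str = "roles/storage.admin"):
    """Hypothetical helper: grant `role` on `bucket` to `member`, like `gsutil iam ch`.

    `bucket` is assumed to be a google.cloud.storage.Bucket.
    """
    # Fetch the current policy. Version 3 is requested so that any
    # conditional bindings already on the bucket are returned intact.
    policy = bucket.get_iam_policy(requested_policy_version=3)
    # Add a binding; "members" holds a set of principal strings.
    policy.bindings.append({"role": role, "members": {member}})
    # Write the updated policy back to the bucket.
    return bucket.set_iam_policy(policy)
```

For example, with credentials allowed to change the bucket's IAM policy: `grant_bucket_role(storage.Client().bucket("my-bucket"), "serviceAccount:my-iam-sa@my-project-id.iam.gserviceaccount.com")`, after `from google.cloud import storage`.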

## Create a minimal RayCluster YAML manifest

Create a file named `raycluster.yaml` with the following contents:

```yaml
apiVersion: ray.io/v1alpha1
kind: RayCluster
metadata:
  name: raycluster-mini
spec:
  rayVersion: '2.6.3' # Should match the Ray version in the container image
  headGroupSpec:
    rayStartParams:
      dashboard-host: '0.0.0.0'
    template:
      spec:
        serviceAccountName: my-ksa
        nodeSelector:
          iam.gke.io/gke-metadata-server-enabled: "true"
        containers:
        - name: ray-head
          image: rayproject/ray:2.6.3
          resources:
            limits:
              cpu: 1
              memory: 2Gi
            requests:
              cpu: 500m
              memory: 2Gi
          ports:
          - containerPort: 6379
            name: gcs-server
          - containerPort: 8265
            name: dashboard
          - containerPort: 10001
            name: client
```

The key parts here are the following lines:

```yaml
spec:
  serviceAccountName: my-ksa
  nodeSelector:
    iam.gke.io/gke-metadata-server-enabled: "true"
```

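Include these lines in the pod template spec of every Ray pod: the head group and every worker group. `serviceAccountName` runs the pod as the Kubernetes service account linked to the IAM service account, and the `nodeSelector` schedules the pod on nodes where the GKE metadata server is enabled. This example uses a single-node cluster (1 head node and 0 worker nodes) for simplicity.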

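For a cluster with worker groups, the same two fields belong in `spec.workerGroupSpecs[*].template.spec` as well as in `spec.headGroupSpec.template.spec`. The sketch below patches and checks a RayCluster manifest that has already been loaded into a Python dict (for example with a YAML parser). The helper names and the abbreviated manifest, including its worker group, are hypothetical and not part of KubeRay.

```python
import copy

GKE_METADATA_SELECTOR = {"iam.gke.io/gke-metadata-server-enabled": "true"}


def _pod_specs(raycluster: dict):
    """Yield the pod spec of the head group, then of every worker group."""
    spec = raycluster["spec"]
    yield spec["headGroupSpec"]["template"]["spec"]
    for group in spec.get("workerGroupSpecs", []):
        yield group["template"]["spec"]


def with_workload_identity(raycluster: dict, ksa_name: str) -> dict:
    """Return a copy of the manifest where every Ray pod uses `ksa_name`."""
    patched = copy.deepcopy(raycluster)
    for pod_spec in _pod_specs(patched):
        pod_spec["serviceAccountName"] = ksa_name
        pod_spec.setdefault("nodeSelector", {}).update(GKE_METADATA_SELECTOR)
    return patched


def missing_workload_identity(raycluster: dict, ksa_name: str) -> list:
    """Return indices of pod specs (0 = head) missing the KSA or node selector."""
    missing = []
    for i, pod_spec in enumerate(_pod_specs(raycluster)):
        selector = pod_spec.get("nodeSelector", {})
        if (pod_spec.get("serviceAccountName") != ksa_name
                or selector.get("iam.gke.io/gke-metadata-server-enabled") != "true"):
            missing.append(i)
    return missing


# Abbreviated, hypothetical manifest with one worker group.
cluster = {
    "apiVersion": "ray.io/v1alpha1",
    "kind": "RayCluster",
    "metadata": {"name": "raycluster-mini"},
    "spec": {
        "headGroupSpec": {"template": {"spec": {"containers": [{"name": "ray-head"}]}}},
        "workerGroupSpecs": [
            {"groupName": "workers", "template": {"spec": {"containers": [{"name": "ray-worker"}]}}},
        ],
    },
}

print(missing_workload_identity(cluster, "my-ksa"))  # [0, 1]
patched = with_workload_identity(cluster, "my-ksa")
print(missing_workload_identity(patched, "my-ksa"))  # []
```

Copying before patching leaves the original dict untouched, so the check can be run on both versions side by side.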
## Create the RayCluster

```bash
kubectl apply -f raycluster.yaml
```

## Test Google Cloud Storage bucket access from the RayCluster

Use `kubectl get pod` to get the name of the Ray head pod. Then run the following command to get a shell in the Ray head pod, replacing `raycluster-mini-head-xxxx` with that name:

```bash
kubectl exec -it raycluster-mini-head-xxxx -- /bin/bash
```

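Picking the head pod out of `kubectl get pod` output can also be scripted. This is a minimal sketch, assuming the default table output and that the head pod name starts with `<cluster-name>-head-`, as `raycluster-mini-head-xxxx` does above; `find_head_pod` and the sample output are hypothetical.

```python
def find_head_pod(kubectl_output: str, cluster_name: str) -> str:
    """Hypothetical helper: return the head pod name from `kubectl get pod` output."""
    prefix = f"{cluster_name}-head-"
    # Skip the header row (NAME READY STATUS ...), then scan the NAME column.
    for line in kubectl_output.splitlines()[1:]:
        fields = line.split()
        if fields and fields[0].startswith(prefix):
            return fields[0]
    raise LookupError(f"no pod name starts with {prefix!r}")


# Hypothetical output; the pod name suffixes are placeholders.
sample_output = """\
NAME                        READY   STATUS    RESTARTS   AGE
kuberay-operator-xxxx       1/1     Running   0          10m
raycluster-mini-head-xxxx   1/1     Running   0          2m
"""

head_pod = find_head_pod(sample_output, "raycluster-mini")
print(head_pod)  # raycluster-mini-head-xxxx
print(f"kubectl exec -it {head_pod} -- /bin/bash")
```

Matching on the name prefix is simple but brittle; KubeRay also labels head pods (for example `ray.io/node-type=head`), so a label selector may be more robust.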
In the shell, run `pip install google-cloud-storage` to install the Google Cloud Storage Python client library. Then run the following Python code to test access to the bucket. Ray isn't required for bucket access; the check runs inside a Ray task to show that Ray worker processes can also reach the bucket without extra configuration such as environment variables. If tasks can run on worker pods, install `google-cloud-storage` on every Ray pod, or pass it through a runtime environment with `ray.init(runtime_env={"pip": ["google-cloud-storage"]})`.

```python
import ray
from google.cloud import storage

RAY_GCS_BUCKET = "my-bucket"  # Replace with your bucket name
RAY_GCS_FILE = "test_file.txt"

# Connect to the running Ray cluster from the head pod.
ray.init(address="auto")


@ray.remote
def check_gcs_read_write():
    # Credentials come from Workload Identity; no key file is needed.
    client = storage.Client()
    bucket = client.get_bucket(RAY_GCS_BUCKET)
    blob = bucket.blob(RAY_GCS_FILE)

    # Write to the bucket
    blob.upload_from_string("Hello, Ray on GKE!")

    # Read from the bucket
    content = blob.download_as_text()

    return content


result = ray.get(check_gcs_read_write.remote())
print(result)

ray.shutdown()
```

You should see the following output:

```text
Hello, Ray on GKE!
```