Development

This section walks through how to build and test the operator in a running Kubernetes cluster.

Requirements

software   version    link
kubectl    v1.21.0+   download
go         v1.19      download
docker     19.03+     download

Alternatively, you can use podman (version 4.5+) instead of docker. See podman.io for installation instructions. The Makefile allows you to specify the container engine to use via the ENGINE variable. For example, to use podman, you can run ENGINE=podman make docker-build.
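
If you switch between machines that have either docker or podman, the engine can be picked automatically. The following is a minimal sketch, not part of the KubeRay Makefile; the helper name first_available is hypothetical.

```shell
# Hypothetical helper (not part of KubeRay): print the first of the given
# commands that exists on PATH. Returns non-zero if none is installed.
first_available() {
  for candidate in "$@"; do
    if command -v "$candidate" >/dev/null 2>&1; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}

# Example (not run here): prefer docker, fall back to podman.
# ENGINE="$(first_available docker podman)" make docker-build
```

The commented example passes the result to the ENGINE variable described above.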

The instructions assume you have access to a running Kubernetes cluster via kubectl. If you want to test locally, consider using Kind or Minikube.
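
Before starting, it can help to confirm that the tools used in this guide are installed. This sketch only checks that each command is on PATH, not its version; the helper name missing_tools is hypothetical.

```shell
# Hypothetical helper: print every listed tool that is NOT on PATH, one per line.
# Prints nothing when all tools are present.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Example (not run here): the tools referenced in this guide.
# missing_tools kubectl go docker kind helm
```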

Setup on Kind

For local development, we recommend using Kind to create a Kubernetes cluster.

Use go v1.19

Currently, KubeRay uses go v1.19 for development.

go install golang.org/dl/go1.19.12@latest
go1.19.12 download
export GOROOT=$(go1.19.12 env GOROOT)
export PATH="$GOROOT/bin:$PATH"

Development

IDE Setup (VS Code)

  • Step 1: Install the VS Code Go extension.
  • Step 2: Import the KubeRay workspace configuration by using the file kuberay.code-workspace in the root of the KubeRay git repo:
    • "File" -> "Open Workspace from File" -> "kuberay.code-workspace"

Setting up workspace configuration is required because KubeRay contains multiple Go modules. See the VS Code Go documentation for details.

End-to-end local development process on Kind

# Step 1: Create a Kind cluster
kind create cluster --image=kindest/node:v1.24.0

# Step 2: Modify KubeRay source code
# For example, add a log "Hello KubeRay" in the function `Reconcile` in `raycluster_controller.go`.

# Step 3: Build an image
#         This command will copy the source code directory into the image, and build it.
# Command: IMG={IMG_REPO}:{IMG_TAG} make docker-build
IMG=kuberay/operator:nightly make docker-build

# To skip running unit tests, run the following command instead:
# IMG=kuberay/operator:nightly make docker-image

# Step 4: Load the custom KubeRay image into the Kind cluster.
# Command: kind load docker-image {IMG_REPO}:{IMG_TAG}
kind load docker-image kuberay/operator:nightly

# Step 5: Keep consistency
# If you update RBAC or CRD, you need to synchronize them.
# See the section "Consistency check" for more information.

# Step 6: Install KubeRay operator with the custom image via local Helm chart
# (Path: helm-chart/kuberay-operator)
# Command: helm install kuberay-operator --set image.repository={IMG_REPO} --set image.tag={IMG_TAG} .
helm install kuberay-operator --set image.repository=kuberay/operator --set image.tag=nightly .

# Step 7: Check the log of KubeRay operator
kubectl logs {YOUR_OPERATOR_POD} | grep "Hello KubeRay"
# 2022-12-09T04:41:59.946Z        INFO    controllers.RayCluster  Hello KubeRay
# ...
  • Replace {IMG_REPO} and {IMG_TAG} with your own repository and tag.
  • The command make docker-build (Step 3) will also run make test (unit tests).
  • Step 6 also installs the custom resource definitions (CRDs) used by the KubeRay operator.
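
Steps 3 and 6 use the same image reference in two forms: IMG={IMG_REPO}:{IMG_TAG} for make, and separate image.repository and image.tag values for Helm. The sketch below derives the Helm flags from a single reference so the two cannot drift apart. The helper name helm_image_flags is hypothetical, and it assumes the reference ends in ":tag" (digests and untagged references are rejected).

```shell
# Hypothetical helper: turn "repo:tag" into the Helm flags used in Step 6.
# Assumption: the reference ends in ":tag"; anything else returns non-zero.
helm_image_flags() {
  img="$1"
  case "$img" in
    *@*) return 1 ;;   # digest references are not handled
    *:*) ;;            # has at least one colon, keep going
    *) return 1 ;;     # no tag at all
  esac
  tag="${img##*:}"     # text after the last colon
  repo="${img%:*}"     # text before the last colon
  case "$tag" in
    "" | */*) return 1 ;;  # last colon was a registry port, not a tag
  esac
  printf '%s\n' "--set image.repository=$repo --set image.tag=$tag"
}

# Example (not run here), from helm-chart/kuberay-operator:
# IMG=kuberay/operator:nightly
# helm install kuberay-operator $(helm_image_flags "$IMG") .
```

The command substitution in the example is left unquoted on purpose so that the shell splits the output into separate arguments.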

Running the tests

The unit tests can be run by executing the following command:

make test

Example output:

✗ make test
...
go fmt ./...
go vet ./...
...
setting up env vars
?   	github.com/ray-project/kuberay/ray-operator	[no test files]
ok  	github.com/ray-project/kuberay/ray-operator/api/v1alpha1	0.023s	coverage: 0.9% of statements
ok  	github.com/ray-project/kuberay/ray-operator/controllers	9.587s	coverage: 66.8% of statements
ok  	github.com/ray-project/kuberay/ray-operator/controllers/common	0.016s	coverage: 75.6% of statements
ok  	github.com/ray-project/kuberay/ray-operator/controllers/utils	0.015s	coverage: 31.4% of statements

The e2e tests can be run by executing the following command:

make test-e2e

Example output:

go test -timeout 30m -v ./test/e2e
=== RUN   TestRayJobWithClusterSelector
    rayjob_cluster_selector_test.go:41: Created ConfigMap test-ns-jtlbd/jobs successfully
    rayjob_cluster_selector_test.go:159: Created RayCluster test-ns-jtlbd/raycluster successfully
    rayjob_cluster_selector_test.go:161: Waiting for RayCluster test-ns-jtlbd/raycluster to become ready
=== RUN   TestRayJobWithClusterSelector/Successful_RayJob
=== PAUSE TestRayJobWithClusterSelector/Successful_RayJob
=== RUN   TestRayJobWithClusterSelector/Failing_RayJob
=== PAUSE TestRayJobWithClusterSelector/Failing_RayJob
=== CONT  TestRayJobWithClusterSelector/Successful_RayJob
=== CONT  TestRayJobWithClusterSelector/Failing_RayJob
=== NAME  TestRayJobWithClusterSelector
    rayjob_cluster_selector_test.go:213: Created RayJob test-ns-jtlbd/counter successfully
    rayjob_cluster_selector_test.go:215: Waiting for RayJob test-ns-jtlbd/counter to complete
    rayjob_cluster_selector_test.go:268: Created RayJob test-ns-jtlbd/fail successfully
    rayjob_cluster_selector_test.go:270: Waiting for RayJob test-ns-jtlbd/fail to complete
    test.go:118: Retrieving Pod Container test-ns-jtlbd/counter-zs9s8/ray-job-submitter logs
    test.go:106: Creating ephemeral output directory as KUBERAY_TEST_OUTPUT_DIR env variable is unset
    test.go:109: Output directory has been created at: /var/folders/mx/kpgdgdqd5j56ynylglgn0nvh0000gn/T/TestRayJobWithClusterSelector2055000419/001
    test.go:118: Retrieving Pod Container test-ns-jtlbd/fail-gdws6/ray-job-submitter logs
    test.go:118: Retrieving Pod Container test-ns-jtlbd/raycluster-head-gnhlw/ray-head logs
    test.go:118: Retrieving Pod Container test-ns-jtlbd/raycluster-worker-small-group-9dffx/ray-worker logs
--- PASS: TestRayJobWithClusterSelector (12.19s)
    --- PASS: TestRayJobWithClusterSelector/Failing_RayJob (16.11s)
    --- PASS: TestRayJobWithClusterSelector/Successful_RayJob (19.14s)
PASS
ok      github.com/ray-project/kuberay/ray-operator/test/e2e    32.066s

Note that you can set the KUBERAY_TEST_OUTPUT_DIR environment variable to specify the test output directory. If it is not set, the output goes to a temporary directory that is removed once test execution completes.
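
To keep the collected logs after a run, point KUBERAY_TEST_OUTPUT_DIR at a directory of your choice. This is a minimal sketch; the helper name use_test_output_dir is hypothetical, and creating the directory up front is a precaution rather than a documented requirement.

```shell
# Hypothetical helper: create a directory and export it as KUBERAY_TEST_OUTPUT_DIR.
use_test_output_dir() {
  mkdir -p "$1" || return 1
  # Resolve to an absolute path: `go test` runs each package's tests from
  # that package's directory, so a relative path would point somewhere else.
  KUBERAY_TEST_OUTPUT_DIR="$(cd "$1" && pwd)" || return 1
  export KUBERAY_TEST_OUTPUT_DIR
}

# Example (not run here):
# use_test_output_dir ./e2e-logs && make test-e2e
```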

Alternatively, you can run the e2e tests from your preferred IDE or debugger.

Manually test new image in running cluster

Build and apply the CRD:

make install

Deploy the manifests and controller:

IMG=kuberay/operator:nightly make deploy

Note: replace kuberay/operator:nightly with your own image.

CI/CD

Linting

KubeRay uses the gofumpt linter.

Download gofumpt version 0.5.0. At the time of writing, v0.5.0 is the latest version compatible with go1.19. Run this command to download it:

go install mvdan.cc/gofumpt@v0.5.0

As a backup, here’s the link to the source (if you installed gofumpt with go install, you don’t need this).

Check that the gofumpt version is 0.5.0:

gofumpt --version
# v0.5.0 (go1.19)

Make sure your go version is still 1.19:

go version
# go version go1.19 darwin/amd64

If your go version isn’t 1.19 any more, you may have installed a different gofumpt version (e.g. by downloading with Homebrew). If you accidentally installed gofumpt using Homebrew, run brew uninstall gofumpt and then brew uninstall go. Then run brew install go@1.19 and check go version again. It should be back to 1.19.x.
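
The version check above can be scripted so that a changed toolchain is noticed early. This sketch matches only the text printed by go version; the helper name is_go_1_19 is hypothetical.

```shell
# Hypothetical helper: succeed only if the given `go version` output
# reports Go 1.19 or a 1.19.x patch release.
is_go_1_19() {
  case "$1" in
    "go version go1.19 "* | "go version go1.19."*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example (not run here):
# is_go_1_19 "$(go version)" || echo "unexpected Go version: $(go version)" >&2
```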

Whenever you edit KubeRay code, run the gofumpt linter inside the KubeRay directory:

gofumpt -w .

The -w flag will overwrite any unformatted code.
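
To check formatting without rewriting files, gofumpt also accepts -l, which lists the files whose formatting differs. The sketch below turns that list into a pass/fail result, for example in a pre-commit hook. The helper name fail_if_unformatted is hypothetical and is not part of KubeRay's CI.

```shell
# Hypothetical helper: read file names on stdin (as printed by `gofumpt -l`)
# and return non-zero if the list is non-empty.
fail_if_unformatted() {
  files="$(cat)"
  if [ -n "$files" ]; then
    printf 'needs formatting:\n%s\n' "$files" >&2
    return 1
  fi
}

# Example (not run here), inside the KubeRay directory:
# gofumpt -l . | fail_if_unformatted
```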

Helm chart linter

We run chart lint tests with Helm v3.4.1 and Helm v3.9.4 on GitHub Actions, and we provide a script to run the same lint tests on your laptop. If you cannot reproduce a GitHub Actions error locally, a likely cause is a different Helm version. Issue #537 is an example of an error that only occurs with older Helm versions.

Run tests with docker

./helm-chart/script/chart-test.sh

Run tests on your local environment

Consistency check

GitHub Actions runs several consistency checks. The following files must be kept synchronized:

  1. ray-operator/apis/ray/v1alpha1/*_types.go should be synchronized with the CRD YAML files (ray-operator/config/crd/bases/)
  2. ray-operator/apis/ray/v1alpha1/*_types.go should be synchronized with generated API (ray-operator/pkg/client)
  3. CRD YAML files in ray-operator/config/crd/bases/ and helm-chart/kuberay-operator/crds/ should be the same.
  4. Kubebuilder markers in ray-operator/controllers/ray/*_controller.go should be synchronized with RBAC YAML files in ray-operator/config/rbac.
  5. RBAC YAML files in helm-chart/kuberay-operator/templates and ray-operator/config/rbac should be synchronized. Currently, we need to synchronize this manually. See #631 as an example.
  6. multiple_namespaces_role.yaml and multiple_namespaces_rolebinding.yaml should be synchronized with role.yaml and rolebinding.yaml in the helm-chart/kuberay-operator/templates directory. The only difference is that the former creates namespaced RBAC resources, while the latter creates cluster-scoped RBAC resources.
# Synchronize consistency 1 and 4:
make manifests

# Synchronize consistency 2:
./hack/update-codegen.sh

# Synchronize consistency 3:
make helm

# Synchronize 1, 2, 3, and 4 in one command
# [Note]: Currently, we need to synchronize consistency 5 and 6 manually.
make sync

# Reproduce CI error for job "helm-chart-verify-rbac" (consistency 5)
python3 ../scripts/rbac-check.py
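
Consistency 3 requires two directories to hold the same CRD files, so it can be checked locally with a recursive diff before pushing. This is a sketch only: the helper name dirs_in_sync is hypothetical, and the CI job may perform the check differently.

```shell
# Hypothetical helper: succeed only if two directories have identical contents.
dirs_in_sync() {
  diff -r "$1" "$2" >/dev/null 2>&1
}

# Example (not run here), from the repository root:
# dirs_in_sync ray-operator/config/crd/bases helm-chart/kuberay-operator/crds \
#   || echo "CRD YAML files differ; see consistency 3" >&2
```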

Run end-to-end tests locally

We have some end-to-end tests on GitHub Actions. These tests operate small Ray clusters running within a kind (Kubernetes-in-docker) environment. To run the tests yourself, follow these steps:

  • Step 1: Install the required dependencies, including kind and kubectl.

  • Step 2: From the repository root (/path/to/your/kuberay/), run the compatibility test:

    # [Usage]: RAY_IMAGE=$RAY_IMAGE OPERATOR_IMAGE=$OPERATOR_IMAGE python3 tests/compatibility-test.py
    #          These environment variables are optional.
    # [Example]:
    RAY_IMAGE=rayproject/ray:2.7.0 OPERATOR_IMAGE=kuberay/operator:nightly python3 tests/compatibility-test.py

Running configuration tests locally

The sample RayCluster and RayService CRs under ray-operator/config/samples are tested in tests/test_sample_raycluster_yamls.py and tests/test_sample_rayservice_yamls.py. Currently, only a few of these sample configurations are tested in the CI. See KubeRay issue #695.

# Test RayCluster doc examples.
RAY_IMAGE=rayproject/ray:2.7.0 OPERATOR_IMAGE=kuberay/operator:nightly python3 tests/test_sample_raycluster_yamls.py
# Test RayService doc examples.
RAY_IMAGE=rayproject/ray:2.7.0 OPERATOR_IMAGE=kuberay/operator:nightly python3 tests/test_sample_rayservice_yamls.py

See KubeRay PR #605 for more details about the test framework.
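
The two sample test commands above share the same image settings, so a small wrapper can run them together. This is a sketch under stated assumptions: the function name run_sample_tests and the DRY_RUN switch are hypothetical, and the fallback image values are simply the example values used above, not defaults defined by the test scripts.

```shell
# Hypothetical wrapper: run both sample-YAML test scripts with the same images.
# Fallback values are the example values from this guide (an assumption).
# Set DRY_RUN=1 to print the commands instead of running them.
run_sample_tests() {
  ray_image="${RAY_IMAGE:-rayproject/ray:2.7.0}"
  operator_image="${OPERATOR_IMAGE:-kuberay/operator:nightly}"
  for script in tests/test_sample_raycluster_yamls.py tests/test_sample_rayservice_yamls.py; do
    # ${DRY_RUN:+echo} expands to "echo" when DRY_RUN is non-empty,
    # and to nothing otherwise, so the command then runs for real.
    ${DRY_RUN:+echo} env RAY_IMAGE="$ray_image" OPERATOR_IMAGE="$operator_image" python3 "$script" || return 1
  done
}

# Example (not run here), from the repository root:
# run_sample_tests
```

The loop stops at the first failing script, so a RayCluster failure is reported before the RayService tests start.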