[Post v0.5.0] Remove init containers from YAML files #1010
Conversation
cc @jasoonn
@Jeffwan @scarlet25151 @wilsonwang371 would you mind reviewing the changes in apiserver/pkg/util/cluster.go? Thanks!
I will update the kuberay-helm repository after this PR is merged.
This PR removes the initContainer config in the API server and moves the config logic to #973, so the config is aligned with the examples'. /LGTM.
@@ -216,17 +216,6 @@ func buildWorkerPodTemplate(imageVersion string, envs map[string]string, spec *a
    Annotations: buildNodeGroupAnnotations(computeRuntime, spec.Image),
  },
  Spec: v1.PodSpec{
    InitContainers: []v1.Container{
Sorry for the late reply. BTW, I'm trying to get more background on this. The workers will fail at the beginning; even though they will restart, this can be misleading from an observability perspective. Is that a concern?
Currently, the old init container logic is wrong: it waits for the head service rather than the GCS server. The head service will be ready once the image pull finishes. It will not fail under regular conditions because Ray internally retries the connection to the GCS server multiple times. However, the retry mechanism in Ray has a timeout, so the workers will fail if the GCS server cannot become ready within that timeout.
In #973, we inject a default init container that uses `ray health-check` to wait until the GCS server is ready. We removed the nslookup init container after release v0.5.0 to keep the compatibility philosophy of KubeRay (see #940 for more details).
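For readers unfamiliar with #973, here is a minimal sketch of what the operator-injected init container might look like on a worker Pod. This is an illustration, not the operator's literal output: the container name, image tag, `RAY_HEAD_ADDRESS` variable, and port are assumptions. The points confirmed in this thread are that the injected container reuses the Ray container's image and blocks on `ray health-check` until the GCS server is ready.

```yaml
# Illustrative sketch only -- the container name, image tag, RAY_HEAD_ADDRESS,
# and port 6379 are assumptions, not the operator's literal output.
initContainers:
  - name: wait-gcs-ready
    # Reuses the same Ray image as the Ray container in this Pod,
    # so no separate (e.g. busybox) image needs to be pulled.
    image: rayproject/ray:2.4.0
    command: ["/bin/bash", "-c"]
    args:
      # Retry until the GCS server answers the health check; the worker's
      # Ray container starts only after this init container exits successfully.
      - "until ray health-check --address $RAY_HEAD_ADDRESS:6379 > /dev/null 2>&1; do echo waiting for GCS; sleep 2; done"
```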
Hi, I also have a quick question about this change: it seems like it prevents Helm chart users from configuring the Docker image used for the init container. Is there something I'm missing? We have a strong requirement to use an internal Docker registry, so this makes the 0.5.0 chart not usable as-is for us. I checked that the init container is still created and not removed completely.
Hi @skliarpawlo, your question is unrelated to the KubeRay APIServer, so I opened a new thread to answer it.
Hi @skliarpawlo, in #973, the default init container is injected by the KubeRay operator automatically, and the injected container uses the same Docker image as the Ray container in the same Pod.
This PR was merged after the KubeRay v0.5.0 chart was released. Hence, the v0.5.0 Helm chart will still have two init containers. I will update the Helm chart release tomorrow. The workaround for you is to remove the chart-defined init container lines in your local copy of the chart (see the sketch below).
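The exact chart lines are not quoted in this thread, so this is a rough illustration only: assuming the pre-v0.5.1 ray-cluster chart still templates the old busybox-based nslookup init container for workers, the block to delete in a local copy would look roughly like the following (the image, container name, and command are assumptions; check your rendered chart for the actual lines).

```yaml
# Rough illustration of the old chart-defined worker init container to remove;
# the actual lines in your local chart may differ.
initContainers:
  - name: init
    image: busybox:1.28
    # Old behavior: wait for the head Service DNS entry, not for the GCS server.
    command:
      - "sh"
      - "-c"
      - "until nslookup $RAY_IP; do echo waiting for K8s Service $RAY_IP; sleep 2; done"
```

Since the operator-injected init container reuses the Ray image, pointing the Ray image at an internal registry also covers the init container.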
Thank you!
Hi @skliarpawlo, I created release v0.5.1 for the Helm charts. You can use:
helm repo update
helm install kuberay-operator kuberay/kuberay-operator --version 0.5.1
helm install raycluster kuberay/ray-cluster --version 0.5.1
Remove init containers from YAML files
Why are these changes needed?
See #973 for more details.
- KubeRay APIServer
- Python client
- Helm Charts
- Sample YAML files
  - test_sample_raycluster_yamls.py
  - test_security.py
  - test_sample_rayservice_yamls.py
- Template YAML files for end-to-end tests => KubeRay CI
  - test_sample_raycluster_yamls.py
  - test_sample_rayservice_yamls.py
- Test RayCluster Helm chart
  - test_security.py
    # path: kuberay/
    OPERATOR_IMAGE=kuberay/operator:v0.5.0 python3 tests/test_security.py
- Test RayJob YAML (ray_v1alpha1_rayjob.yaml)

Related issue number
#973
Closes #974
Checks