[Feature] Replace service name with Fully Qualified Domain Name #938
You can maintain compatibility by introducing a new env, say `$FQ_RAY_IP`, and keeping the old env as is (perhaps with a comment in the code that it's needed for backward compatibility).
Good idea. Will do that.
Hi @DmitriGekhtman, this PR is currently backward compatible. Would you mind reviewing it? Thanks!
LGTM! I have also tested it and the worker pod connects to the head pod successfully using the
…project#938) Replace service name with Fully Qualified Domain Name
Why are these changes needed?
Some big tech companies do not use kube-dns as the DNS server for Kubernetes Pods. They have many Kubernetes services in the cluster, and services in different namespaces may have the same name. Hence, an address that carries both the Kubernetes service name and the Kubernetes namespace is needed in this case. That is, a Fully Qualified Domain Name (FQDN) is a must.
- With kube-dns as the DNS server: we can use `${HEAD_SERVICE}` to access the head service.
- Without kube-dns: we need `${HEAD_SERVICE}.${NAMESPACE}.svc.cluster.local` to access the head service.
Backward compatibility
Note: It breaks backward compatibility (users need to update `until nslookup $RAY_IP ...` in their YAML files). However, this update is a must: the lack of FQDN support actually blocks some users from adopting KubeRay. [Update]: We plan to maintain both the `$FQ_RAY_IP` and `$RAY_IP` environment variables at the same time. Hence, users do not need to update their YAML files.
- `$FQ_RAY_IP`: `${HEAD_SERVICE}.${NAMESPACE}.svc.cluster.local`
- `$RAY_IP`: `${HEAD_SERVICE}`
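A minimal Go sketch of how an operator could derive both environment variables from the head service name and namespace. The helper names (`headServiceFQDN`, `rayIPEnvVars`) and the local `EnvVar` struct (a stand-in for `corev1.EnvVar`) are hypothetical, and the `cluster.local` suffix assumes the cluster's default DNS domain.

```go
package main

import "fmt"

// EnvVar is a hypothetical stand-in for corev1.EnvVar so the sketch needs only the standard library.
type EnvVar struct {
	Name  string
	Value string
}

// clusterDomain assumes the default Kubernetes cluster domain.
const clusterDomain = "cluster.local"

// headServiceFQDN builds ${HEAD_SERVICE}.${NAMESPACE}.svc.cluster.local.
func headServiceFQDN(service, namespace string) string {
	return fmt.Sprintf("%s.%s.svc.%s", service, namespace, clusterDomain)
}

// rayIPEnvVars sets the new FQ_RAY_IP alongside the old RAY_IP.
// RAY_IP keeps the short service name so existing YAML files keep working.
func rayIPEnvVars(service, namespace string) []EnvVar {
	return []EnvVar{
		{Name: "FQ_RAY_IP", Value: headServiceFQDN(service, namespace)},
		{Name: "RAY_IP", Value: service},
	}
}

func main() {
	for _, env := range rayIPEnvVars("raycluster-kuberay-head-svc", "default") {
		fmt.Printf("%s=%s\n", env.Name, env.Value)
	}
}
```

Run as-is, `main` prints `FQ_RAY_IP=raycluster-kuberay-head-svc.default.svc.cluster.local` followed by `RAY_IP=raycluster-kuberay-head-svc`, the same values reported in this PR's checks. Keeping `RAY_IP` unchanged is what makes the change backward compatible.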
Why do I remove anything related to the head service (`svcName` & `fqdnRayIP`) in `buildHeadPod`? (function call graph)
- `buildHeadPod` calls:
  - `BuildPod` calls:
    - `SetInitContainerEnvVars`: Set `RAY_IP` for the init container. (case 1)
    - `SetContainerEnvVars`: Set `RAY_IP` for the Ray head container. (case 2)
  - `DefaultHeadPodTemplate` calls:
    - `SetMissingRayStartParams`: Set `rayStartParams["address"]` for the Ray head container. (case 3)

case 1: init container => Do not need the information of `svcName` & `fqdnRayIP`.
case 2: Ray head container
The `RAY_IP` env variable for the head Pod is hardcoded to "LOCAL_HOST". (The following code is without this PR.) => Do not need the information of `svcName` & `fqdnRayIP`.
kuberay/ray-operator/controllers/ray/common/pod.go, Lines 559 to 566 in af8fb0c
case 3: `rayStartParams` for the Ray head (`rayStartParams["address"]`). (The following code is without this PR.) => Do not need the information of `svcName` & `fqdnRayIP`.
kuberay/ray-operator/controllers/ray/common/pod.go, Lines 641 to 646 in af8fb0c
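The argument for cases 2 and 3 can be sketched in Go under stated assumptions: `LOCAL_HOST` is taken to be the loopback address `127.0.0.1`, the head's `ray start` is assumed to receive no `address`, and a user-supplied `address` is assumed to be left alone. `nodeType`, `rayIPFor`, and `setMissingAddress` are hypothetical names, not KubeRay's code. The point is that the head branch never reads the service name or FQDN.

```go
package main

import "fmt"

type nodeType string

const (
	headNode   nodeType = "head"
	workerNode nodeType = "worker"
	// localHost is assumed to be the value behind the operator's LOCAL_HOST constant.
	localHost = "127.0.0.1"
)

// rayIPFor mirrors case 2: the head container's RAY_IP is hardcoded,
// so only workers need the head service name.
func rayIPFor(t nodeType, headService string) string {
	if t == headNode {
		return localHost
	}
	return headService
}

// setMissingAddress mirrors case 3: only workers get an address pointing at the head service,
// and an address the user already supplied is left untouched.
func setMissingAddress(params map[string]string, t nodeType, fqdn, port string) {
	if t != workerNode {
		return
	}
	if _, ok := params["address"]; !ok {
		params["address"] = fmt.Sprintf("%s:%s", fqdn, port)
	}
}

func main() {
	head := map[string]string{}
	// The head path is given an empty FQDN: nothing service-related is needed.
	setMissingAddress(head, headNode, "", "6379")
	fmt.Println(rayIPFor(headNode, ""), len(head))

	worker := map[string]string{}
	setMissingAddress(worker, workerNode, "raycluster-kuberay-head-svc.default.svc.cluster.local", "6379")
	fmt.Println(worker["address"])
}
```

`main` prints `127.0.0.1 0` for the head and `raycluster-kuberay-head-svc.default.svc.cluster.local:6379` for the worker, which is why `buildHeadPod` can drop `svcName` and `fqdnRayIP`.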
To conclude, we can remove anything related to the head service (`svcName` & `fqdnRayIP`) in `buildHeadPod`.
function `ExtractRayIPFromFQDN`
Extract `${HEAD_SERVICE}` from `${HEAD_SERVICE}.${NAMESPACE}.svc.cluster.local`.
In addition, "." is invalid in a Kubernetes service name (the regex for Kubernetes service names is `[a-z]([-a-z0-9]*[a-z0-9])?`). Thus, the implementation is valid: everything before the first "." can only be the service name.
[Need discussion] FQ_RAY_IP and RAY_IP should not be set by users.
`FQ_RAY_IP` and `RAY_IP` will always be determined by the KubeRay operator; users cannot set them. I cannot come up with any case in which users need to specify these two values themselves.
Related issue number
Checks
- `FQ_RAY_IP` environment variable is `raycluster-kuberay-head-svc.default.svc.cluster.local`.
- `RAY_IP` environment variable is `raycluster-kuberay-head-svc`.

ray-worker
- `FQ_RAY_IP` environment variable is `raycluster-kuberay-head-svc.default.svc.cluster.local`.
- `RAY_IP` environment variable is `raycluster-kuberay-head-svc`.
- `ray start --block --address=raycluster-kuberay-head-svc.default.svc.cluster.local:6379 ...`
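A hedged Go sketch of the `ExtractRayIPFromFQDN` reasoning, checked against the values reported in these checks. The body is an assumption (KubeRay's own implementation may differ), and `extractRayIPFromFQDN` and `isValidServiceName` are illustrative names. Because a valid service name matches the DNS-1035 label pattern and therefore contains no ".", cutting at the first "." recovers the service name.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// dns1035Label is the DNS-1035 label pattern Kubernetes applies to Service names;
// the 63-character limit is checked separately in isValidServiceName.
var dns1035Label = regexp.MustCompile(`^[a-z]([-a-z0-9]*[a-z0-9])?$`)

// isValidServiceName reports whether name could be a Kubernetes Service name.
func isValidServiceName(name string) bool {
	return len(name) <= 63 && dns1035Label.MatchString(name)
}

// extractRayIPFromFQDN returns ${HEAD_SERVICE} from ${HEAD_SERVICE}.${NAMESPACE}.svc.cluster.local.
// Cutting at the first "." is safe because a valid Service name never contains ".".
func extractRayIPFromFQDN(fqdn string) string {
	service, _, _ := strings.Cut(fqdn, ".")
	return service
}

func main() {
	fqRayIP := "raycluster-kuberay-head-svc.default.svc.cluster.local"
	rayIP := extractRayIPFromFQDN(fqRayIP)
	fmt.Println(rayIP, isValidServiceName(rayIP))
	// The worker's ray start address is the FQDN plus the head port 6379 used in the checks.
	fmt.Printf("--address=%s:%d\n", fqRayIP, 6379)
}
```

`main` prints `raycluster-kuberay-head-svc true` and then `--address=raycluster-kuberay-head-svc.default.svc.cluster.local:6379`, matching the `RAY_IP` value and the worker's `ray start` command above. `strings.Cut` requires Go 1.18 or later.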
Test backward compatibility
The KubeRay operator in this PR is compatible with old configuration YAML files.
I will replace `RAY_IP` in the configuration files with `FQ_RAY_IP` after release 0.5.0, because we need to make sure the configuration YAML files in the master branch are compatible with both the nightly and the latest released KubeRay operator. See [Feature] Replace RAY_IP with FQ_RAY_IP after release 0.5.0 #941.