[Bug] KubeRay does not show clear error for duplicated `groupName` field #718
Comments
Thanks for posting this. There's a general theme here that we need to raise better errors and improve observability of the system.
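One purely illustrative way to improve that observability (not necessarily what KubeRay does): besides rejecting the spec, the operator could emit a Kubernetes warning Event so the problem shows up in `kubectl describe`. The sketch below uses client-go's `FakeRecorder` so it runs without a cluster; the reason string `InvalidWorkerGroupSpec`, the message text, and the Pod stand-in object are assumptions made for this example only.

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/tools/record"
)

func main() {
	// FakeRecorder buffers events in a channel, so this sketch runs without an API server.
	recorder := record.NewFakeRecorder(10)

	// A Pod is used as a stand-in object to avoid importing the KubeRay API types;
	// FakeRecorder ignores the object when formatting the event string.
	obj := &corev1.Pod{ObjectMeta: metav1.ObjectMeta{Name: "dupe-worker-group-name", Namespace: "default"}}

	// Emit a Warning event naming the duplicated groupName. "InvalidWorkerGroupSpec"
	// is a hypothetical reason chosen for this illustration.
	recorder.Eventf(obj, corev1.EventTypeWarning, "InvalidWorkerGroupSpec",
		"multiple worker groups share groupName %q", "group1")

	fmt.Println(<-recorder.Events) // Warning InvalidWorkerGroupSpec multiple worker groups share groupName "group1"
}
```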
davidxia added a commit to davidxia/kuberay that referenced this issue on Oct 30, 2023:

… and validate worker group names are unique. closes ray-project#718 closes ray-project#736
davidxia added a commit to davidxia/kuberay that referenced this issue on Nov 23, 2023:

… and validate worker group names are unique. Add new Makefile targets `install-with-webhooks`, `uninstall-with-webhooks`, `deploy-with-webhooks`, and `undeploy-with-webhooks` to be backwards compatible and opt-in.

Much of the code, especially the YAML files, was generated by running the command below as [documented in kubebuilder][1].

```
kubebuilder create webhook \
  --group ray \
  --version v1 \
  --kind RayCluster \
  --defaulting \
  --programmatic-validation
```

## How to use locally

```shell
make manifests generate
make install-with-webhooks
IMG=kuberay/operator:test make docker-build
kind load docker-image kuberay/operator:test
IMG=kuberay/operator:test make deploy-with-webhooks
```

## Example RayCluster that has duplicate worker group names

```shell
cat dupe-worker-group-name.yaml
apiVersion: ray.io/v1
kind: RayCluster
metadata:
  name: dupe-worker-group-name
spec:
  headGroupSpec:
    rayStartParams:
      dashboard-host: '0.0.0.0'
    template:
      spec:
        containers:
        - name: ray-head
          image: rayproject/ray:2.7.0
  workerGroupSpecs:
  - replicas: 1
    minReplicas: 1
    maxReplicas: 10
    groupName: group1
    rayStartParams: {}
    template:
      spec:
        containers:
        - name: ray-worker
          image: rayproject/ray:2.7.0
  - replicas: 1
    minReplicas: 1
    maxReplicas: 10
    groupName: group1
    rayStartParams: {}
    template:
      spec:
        containers:
        - name: ray-worker
          image: rayproject/ray:2.7.0
```

## Before

```
kubectl apply -f dupe-worker-group-name.yaml
raycluster.ray.io/raycluster-dupe-worker-name created
```

## After

```
kubectl --context kind-kind apply -f config/samples/ray-cluster-dupe-worker-name.yaml
The RayCluster "raycluster-dupe-worker-name" is invalid: spec.workerGroupSpecs[1]: Invalid value: v1.WorkerGroupSpec{GroupName:"group1", Replicas:(*int32)(0x40006e63cc), MinReplicas:(*int32)(0x40006e63c8), MaxReplicas:(*int32)(0x40006e63c0), RayStartParams:map[string]string{}, Template:v1.PodTemplateSpec{ObjectMeta:v1.ObjectMeta{Name:"", GenerateName:"", Namespace:"", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:<nil>, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ClusterName:"", ManagedFields:[]v1.ManagedFieldsEntry(nil)}, Spec:v1.PodSpec{Volumes:[]v1.Volume(nil), InitContainers:[]v1.Container(nil), Containers:[]v1.Container{v1.Container{Name:"ray-worker", Image:"rayproject/ray:2.7.0", Command:[]string(nil), Args:[]string(nil), WorkingDir:"", Ports:[]v1.ContainerPort(nil), EnvFrom:[]v1.EnvFromSource(nil), Env:[]v1.EnvVar(nil), Resources:v1.ResourceRequirements{Limits:v1.ResourceList(nil), Requests:v1.ResourceList(nil)}, VolumeMounts:[]v1.VolumeMount(nil), VolumeDevices:[]v1.VolumeDevice(nil), LivenessProbe:(*v1.Probe)(nil), ReadinessProbe:(*v1.Probe)(nil), StartupProbe:(*v1.Probe)(nil), Lifecycle:(*v1.Lifecycle)(nil), TerminationMessagePath:"", TerminationMessagePolicy:"", ImagePullPolicy:"", SecurityContext:(*v1.SecurityContext)(nil), Stdin:false, StdinOnce:false, TTY:false}}, EphemeralContainers:[]v1.EphemeralContainer(nil), RestartPolicy:"", TerminationGracePeriodSeconds:(*int64)(nil), ActiveDeadlineSeconds:(*int64)(nil), DNSPolicy:"", NodeSelector:map[string]string(nil), ServiceAccountName:"", DeprecatedServiceAccount:"", AutomountServiceAccountToken:(*bool)(nil), NodeName:"", HostNetwork:false, HostPID:false, HostIPC:false, ShareProcessNamespace:(*bool)(nil), SecurityContext:(*v1.PodSecurityContext)(nil), ImagePullSecrets:[]v1.LocalObjectReference(nil), Hostname:"", Subdomain:"", Affinity:(*v1.Affinity)(nil), SchedulerName:"", Tolerations:[]v1.Toleration(nil), HostAliases:[]v1.HostAlias(nil), PriorityClassName:"", Priority:(*int32)(nil), DNSConfig:(*v1.PodDNSConfig)(nil), ReadinessGates:[]v1.PodReadinessGate(nil), RuntimeClassName:(*string)(nil), EnableServiceLinks:(*bool)(nil), PreemptionPolicy:(*v1.PreemptionPolicy)(nil), Overhead:v1.ResourceList(nil), TopologySpreadConstraints:[]v1.TopologySpreadConstraint(nil), SetHostnameAsFQDN:(*bool)(nil), OS:(*v1.PodOS)(nil)}}, ScaleStrategy:v1.ScaleStrategy{WorkersToDelete:[]string(nil)}}: worker group names must be unique
```

## Backwards compatibility

Just use the original Makefile targets of `install`, `uninstall`, `deploy`, and `undeploy`.

```shell
IMG=kuberay/operator:test make docker-build
kind load docker-image kuberay/operator:test
IMG=kuberay/operator:test make deploy
kubectl apply -f dupe-worker-group-name.yaml
raycluster.ray.io/raycluster-dupe-worker-name created
```

closes ray-project#718
closes ray-project#736

[1]: https://book.kubebuilder.io/cronjob-tutorial/webhook-implementation
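The validation itself boils down to a duplicate check over `spec.workerGroupSpecs`. The following is a minimal, self-contained sketch of that check, not KubeRay's actual webhook code; the `WorkerGroupSpec` struct is trimmed to the single field the check needs, and the error text only mirrors the "worker group names must be unique" message shown above.

```go
package main

import "fmt"

// WorkerGroupSpec is a stripped-down stand-in for the real KubeRay type;
// only GroupName matters for the uniqueness check.
type WorkerGroupSpec struct {
	GroupName string
}

// validateWorkerGroupNames returns an error for the first groupName that has
// already been seen, identifying both conflicting entries by index.
func validateWorkerGroupNames(groups []WorkerGroupSpec) error {
	seen := make(map[string]int, len(groups))
	for i, g := range groups {
		if j, ok := seen[g.GroupName]; ok {
			return fmt.Errorf(
				"spec.workerGroupSpecs[%d]: groupName %q already used by spec.workerGroupSpecs[%d]: worker group names must be unique",
				i, g.GroupName, j)
		}
		seen[g.GroupName] = i
	}
	return nil
}

func main() {
	// Mirrors the reproduction above: two worker groups both named "group1".
	groups := []WorkerGroupSpec{{GroupName: "group1"}, {GroupName: "group1"}}
	if err := validateWorkerGroupNames(groups); err != nil {
		fmt.Println(err)
	}
}
```

In a kubebuilder-generated webhook this kind of check would typically run from the validating hooks (e.g. `ValidateCreate`/`ValidateUpdate`), rejecting the object at admission time, which is what turns the silent create/terminate loop into the immediate error shown in the "After" output above.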
Search before asking
KubeRay Component
ray-operator
What happened + What you expected to happen
If a cluster has two worker pod types with the same `groupName`, the cluster goes into a loop of creating and immediately terminating all pods with that `groupName`. Expected behavior: the `kuberay-operator` fails with a clear error that "Multiple worker pod types have the same `groupName`". Running KubeRay 0.3.0 and nightly, Ray v2.0.1.
Reproduction script
Use any cluster YAML with two worker pod types, both with the same `groupName`.
Anything else
No response
Are you willing to submit a PR?