
[Bug] Head Node service is not getting exposed on Internal Load Balancer despite proper annotations #637

Closed
1 of 2 tasks
snow01 opened this issue Oct 16, 2022 · 10 comments
Assignees
Labels
bug Something isn't working P2 Important issue, but not time critical

Comments

snow01 commented Oct 16, 2022

Search before asking

  • I searched the issues and found no similar issues.

KubeRay Component

ray-operator

What happened + What you expected to happen

I am trying to expose the head node service as a LoadBalancer backed by an internal load balancer, but KubeRay still provisions a public LoadBalancer despite the annotations.

I have all the necessary components installed in my EKS cluster, including the AWS Load Balancer Controller, and I have a sample service that successfully exposes an internal LoadBalancer, so I know my annotations are correct.

For the Ray cluster setup, I have set the annotations both at the top level and within the pod template, hoping they will be picked up from one of them.

Please see the config snippet below.

Reproduction script

apiVersion: ray.io/v1alpha1
kind: RayCluster
metadata:
  labels:
    controller-tools.k8s.io: "1.0"
  name: ${RAY_CLUSTER}
  namespace: ${RAY_CLUSTER_NS}
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
    service.beta.kubernetes.io/aws-load-balancer-scheme: internal
    service.beta.kubernetes.io/aws-load-balancer-internal: "0.0.0.0/0"
    service.beta.kubernetes.io/aws-load-balancer-type: nlb-ip
spec:
  rayVersion: '2.0.0'
  enableInTreeAutoscaling: true

  autoscalerOptions:
    upscalingMode: Default
    idleTimeoutSeconds: 60
    imagePullPolicy: Always
    resources:
      limits:
        cpu: "1"
        memory: "2Gi"
      requests:
        cpu: "1"
        memory: "2Gi"
  ######################headGroupSpec#################################
  headGroupSpec:
    serviceType: LoadBalancer
    enableIngress: true

    rayStartParams:
      # Flag "no-monitor" will be automatically set when autoscaling is enabled.
      dashboard-host: '0.0.0.0'
      block: 'true'
      num-cpus: '0' # can be auto-completed from the limits
    template:
      metadata:
        annotations:
          service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
          service.beta.kubernetes.io/aws-load-balancer-scheme: internal
          service.beta.kubernetes.io/aws-load-balancer-internal: "0.0.0.0/0"
          service.beta.kubernetes.io/aws-load-balancer-type: nlb-ip
...

Anything else

No response

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!
@snow01 snow01 added the bug Something isn't working label Oct 16, 2022
@kevin85421 kevin85421 self-assigned this Oct 18, 2022
snow01 (Author) commented Oct 19, 2022

Any updates on this, please?

kevin85421 (Member) commented

Any updates on this, please?

Thank you for the follow-up! I started to look at the ingress support in KubeRay and ran into some questions. Currently, I am working on #646. That PR is a blocker for #637. Thank you for your patience.

kevin85421 (Member) commented

Hi @snow01,

I started to work on this issue by getting some context on the aws-load-balancer-controller, ALB, and NLB. I will try to reproduce this issue on my EKS cluster.

I have several questions about this issue.

I am trying to set up the head node service exposed as LoadBalancer but on an internal one, but KubeRay is still using public LoadBalancer despite giving proper annotations.

Q1: What is the definition of "internal one"? Do you mean aws-load-balancer-scheme: internal?
Q2: What is the definition of "public LoadBalancer"? Do you mean aws-load-balancer-scheme: internet-facing?

service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
service.beta.kubernetes.io/aws-load-balancer-scheme: internal
service.beta.kubernetes.io/aws-load-balancer-internal: "0.0.0.0/0"
service.beta.kubernetes.io/aws-load-balancer-type: nlb-ip

Q3: Why is the value of aws-load-balancer-internal "0.0.0.0/0"? This doc indicates that the value should be a boolean. In addition, it was deprecated in favor of aws-load-balancer-scheme. The following is the deprecation note in the doc. Maybe this annotation is unnecessary?

  • This annotation is deprecated starting v2.2.0 release in favor of the new aws-load-balancer-scheme annotation. It will be supported, but in case of ties, the aws-load-balancer-scheme gets precedence.

Q4: The doc indicates that nlb-ip is deprecated. In addition, the backwards-compatibility note for the nlb-ip type in the doc says that nlb-ip is equivalent to aws-load-balancer-type: external plus aws-load-balancer-nlb-target-type: ip. Although there is no conflict, it may be better to use external instead.

  • For backwards compatibility, both the in-tree and AWS Load Balancer controllers support nlb-ip as a value of the service.beta.kubernetes.io/aws-load-balancer-type annotation. The controller treats it as if you specified both annotations below:

    service.beta.kubernetes.io/aws-load-balancer-type: external
    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
    

Q5: Why do we need enableIngress: true?

Q6: What's the output of kubectl describe service $YOUR_HEAD_SERVICE?
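
For reference, the head service name is typically derived from the cluster name (e.g. <cluster-name>-head-svc), so something like the following should show whether the annotations made it onto the service (substitute your actual service name and namespace):

    kubectl describe service ${RAY_CLUSTER}-head-svc -n ${RAY_CLUSTER_NS}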

Thank you!

kevin85421 (Member) commented

I took a look at the source code. The annotations will not be passed to the head service. We can double-check this with the answer to Q6 mentioned above. If so, I will update KubeRay to enable annotation propagation.

func BuildServiceForHeadPod(cluster rayiov1alpha1.RayCluster, labels map[string]string) (*corev1.Service, error) {
    if labels == nil {
        labels = make(map[string]string)
    }
    // Default labels identify the head service's owning cluster and role.
    default_labels := map[string]string{
        RayClusterLabelKey:                cluster.Name,
        RayNodeTypeLabelKey:               string(rayiov1alpha1.HeadNode),
        RayIDLabelKey:                     utils.CheckLabel(utils.GenerateIdentifier(cluster.Name, rayiov1alpha1.HeadNode)),
        KubernetesApplicationNameLabelKey: ApplicationName,
        KubernetesCreatedByLabelKey:       ComponentName,
    }
    for k, v := range default_labels {
        if _, ok := labels[k]; !ok {
            labels[k] = v
        }
    }
    // Note: the ObjectMeta below carries only a name, namespace, and labels;
    // no annotations from the RayCluster CR are copied onto the service.
    service := &corev1.Service{
        ObjectMeta: metav1.ObjectMeta{
            Name:      utils.GenerateServiceName(cluster.Name),
            Namespace: cluster.Namespace,
            Labels:    labels,
        },
        Spec: corev1.ServiceSpec{
            Selector: labels,
            Ports:    []corev1.ServicePort{},
            Type:     cluster.Spec.HeadGroupSpec.ServiceType,
        },
    }
    ports := getServicePorts(cluster)
    for name, port := range ports {
        svcPort := corev1.ServicePort{Name: name, Port: port}
        service.Spec.Ports = append(service.Spec.Ports, svcPort)
    }
    return service, nil
}
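
For illustration, here is a minimal sketch of what the propagation could look like. The helper below is hypothetical, not the actual KubeRay patch; it merges the RayCluster's metadata annotations with the head group pod template's annotations (the template wins ties), and the result would be assigned to the service's ObjectMeta in BuildServiceForHeadPod above:

// buildHeadServiceAnnotations is a hypothetical helper sketching how the
// operator could propagate annotations onto the head service.
func buildHeadServiceAnnotations(cluster rayiov1alpha1.RayCluster) map[string]string {
    annotations := make(map[string]string)
    // Start with RayCluster-level annotations (metadata.annotations in the CR).
    for k, v := range cluster.Annotations {
        annotations[k] = v
    }
    // Annotations on the head group's pod template override cluster-level ones.
    for k, v := range cluster.Spec.HeadGroupSpec.Template.Annotations {
        annotations[k] = v
    }
    return annotations
}

The service construction would then set Annotations: buildHeadServiceAnnotations(cluster) in its ObjectMeta.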

kevin85421 (Member) commented

Gentle ping @snow01

snow01 (Author) commented Oct 25, 2022

Yes, the main issue is that annotations are not getting passed to the head service.

snow01 (Author) commented Oct 25, 2022

Q1: What is the definition of "internal one"? Do you mean aws-load-balancer-scheme: internal?

Yes.

Q2: What is the definition of "public LoadBalancer"? Do you mean aws-load-balancer-scheme: internet-facing?

Yes.

Q3: Why the value of aws-load-balancer-internal is "0.0.0.0/0"? This doc indicates that the value should be boolean. In addition, it was deprecated in favor of aws-load-balancer-scheme. The following is the deprecation note in doc. Maybe this annotation is unnecessary?

You are right, this should not matter.

Q4: The doc indicate that nlb-ip is deprecated. In addition, the backwards compatibility note for nlb-ip type in the doc said that nlb-ip is equivalent to aws-load-balancer-type: external and aws-load-balancer-nlb-target-type: ip. Although there is no conflict, maybe it is better to use external instead.

This is also deprecated, so it doesn't matter.

The following annotations are the ones that should matter:

service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
service.beta.kubernetes.io/aws-load-balancer-scheme: internal
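
For reference, a standalone Service shaped roughly like the following (the name, selector, and ports are hypothetical) is the kind of sample service that successfully gets an internal NLB once the AWS Load Balancer Controller is installed:

apiVersion: v1
kind: Service
metadata:
  name: sample-internal-svc   # hypothetical name
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
    service.beta.kubernetes.io/aws-load-balancer-scheme: internal
spec:
  type: LoadBalancer
  selector:
    app: sample-app           # hypothetical selector
  ports:
    - port: 80
      targetPort: 8080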

kevin85421 (Member) commented

Thanks @snow01 for your reply!

I have added this issue to today's KubeRay community sync agenda. If you are interested in the discussion, you are welcome to join the community sync!

(1) Agenda
(2) Google Meet

snow01 (Author) commented Oct 29, 2022

Let me try this as per the doc and confirm back. Thanks!

@DmitriGekhtman DmitriGekhtman added the P2 Important issue, but not time critical label Nov 4, 2022
@DmitriGekhtman DmitriGekhtman added this to the v0.5.0 release milestone Nov 4, 2022
kevin85421 (Member) commented

KubeRay now exposes the entire head service to users (#1040). Everything should be configurable now.
