
Operator fails if nodeport out of allowed range #536

Open
jacobtomlinson opened this issue Jul 20, 2022 · 3 comments · May be fixed by #706

jacobtomlinson commented Jul 20, 2022

If a DaskCluster is configured with a NodePort service but the ports are out of the allowed range, the DaskCluster resource is still created, but the controller errors repeatedly in its logs.

HTTP response headers: <CIMultiDictProxy('Audit-Id': '8949180b-0ce2-45f7-bf41-8974bb2d6dbf', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Kubernetes-Pf-Flowschema-Uid': '8f735b53-3ddf-42b8-9a01-100a585f89dd', 'X-Kubernetes-Pf-Prioritylevel-Uid': 'a1296d7b-f6e8-49d6-86b1-398564e988c4', 'Date': 'Wed, 20 Jul 2022 14:34:11 GMT', 'Content-Length': '546')>
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Service \"rapids-dask-cluster-service\" is invalid: spec.ports[0].nodePort: Invalid value: 38786: provided port is not in the valid range. The range of valid ports is 30000-32767","reason":"Invalid","details":{"name":"rapids-dask-cluster-service","kind":"Service","causes":[{"reason":"FieldValueInvalid","message":"Invalid value: 38786: provided port is not in the valid range. The range of valid ports is 30000-32767","field":"spec.ports[0].nodePort"}]},"code":422}


[2022-07-20 14:34:11,443] kopf.objects         [DEBUG   ] [default/rapids-dask-cluster] Patching with: {'metadata': {'annotations': {'kopf.zalando.org/daskcluster_create': '{"started":"2022-07-20T14:34:11.423731","delayed":"2022-07-20T14:35:11.443753","purpose":"create","retries":1,"success":false,"failure":false,"message":"(422)\\nReason: Unprocessable Entity\\nHTTP response headers: <CIMultiDictProxy(\'Audit-Id\': \'8949180b-0ce2-45f7-bf41-8974bb2d6dbf\', \'Cache-Control\': \'no-cache, private\', \'Content-Type\': \'application/json\', \'X-Kubernetes-Pf-Flowschema-Uid\': \'8f735b53-3ddf-42b8-9a01-100a585f89dd\', \'X-Kubernetes-Pf-Prioritylevel-Uid\': \'a1296d7b-f6e8-49d6-86b1-398564e988c4\', \'Date\': \'Wed, 20 Jul 2022 14:34:11 GMT\', \'Content-Length\': \'546\')>\\nHTTP response body: {\\"kind\\":\\"Status\\",\\"apiVersion\\":\\"v1\\",\\"metadata\\":{},\\"status\\":\\"Failure\\",\\"message\\":\\"Service \\\\\\"rapids-dask-cluster-service\\\\\\" is invalid: spec.ports[0].nodePort: Invalid value: 38786: provided port is not in the valid range. The range of valid ports is 30000-32767\\",\\"reason\\":\\"Invalid\\",\\"details\\":{\\"name\\":\\"rapids-dask-cluster-service\\",\\"kind\\":\\"Service\\",\\"causes\\":[{\\"reason\\":\\"FieldValueInvalid\\",\\"message\\":\\"Invalid value: 38786: provided port is not in the valid range. 
The range of valid ports is 30000-32767\\",\\"field\\":\\"spec.ports[0].nodePort\\"}]},\\"code\\":422}\\n\\n"}'}}, 'status': {'kopf': {'progress': {'daskcluster_create': {'started': '2022-07-20T14:34:11.423731', 'stopped': None, 'delayed': '2022-07-20T14:35:11.443753', 'purpose': 'create', 'retries': 1, 'success': False, 'failure': False, 'message': '(422)\nReason: Unprocessable Entity\nHTTP response headers: <CIMultiDictProxy(\'Audit-Id\': \'8949180b-0ce2-45f7-bf41-8974bb2d6dbf\', \'Cache-Control\': \'no-cache, private\', \'Content-Type\': \'application/json\', \'X-Kubernetes-Pf-Flowschema-Uid\': \'8f735b53-3ddf-42b8-9a01-100a585f89dd\', \'X-Kubernetes-Pf-Prioritylevel-Uid\': \'a1296d7b-f6e8-49d6-86b1-398564e988c4\', \'Date\': \'Wed, 20 Jul 2022 14:34:11 GMT\', \'Content-Length\': \'546\')>\nHTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Service \\"rapids-dask-cluster-service\\" is invalid: spec.ports[0].nodePort: Invalid value: 38786: provided port is not in the valid range. The range of valid ports is 30000-32767","reason":"Invalid","details":{"name":"rapids-dask-cluster-service","kind":"Service","causes":[{"reason":"FieldValueInvalid","message":"Invalid value: 38786: provided port is not in the valid range. The range of valid ports is 30000-32767","field":"spec.ports[0].nodePort"}]},"code":422}\n\n', 'subrefs': None}}}}}
[2022-07-20 14:34:11,448] kopf.objects         [WARNING ] [default/rapids-dask-cluster] Patching failed with inconsistencies: (('remove', ('status', 'kopf'), {'progress': {'daskcluster_create': {'started': '2022-07-20T14:34:11.423731', 'stopped': None, 'delayed': '2022-07-20T14:35:11.443753', 'purpose': 'create', 'retries': 1, 'success': False, 'failure': False, 'message': '(422)\nReason: Unprocessable Entity\nHTTP response headers: <CIMultiDictProxy(\'Audit-Id\': \'8949180b-0ce2-45f7-bf41-8974bb2d6dbf\', \'Cache-Control\': \'no-cache, private\', \'Content-Type\': \'application/json\', \'X-Kubernetes-Pf-Flowschema-Uid\': \'8f735b53-3ddf-42b8-9a01-100a585f89dd\', \'X-Kubernetes-Pf-Prioritylevel-Uid\': \'a1296d7b-f6e8-49d6-86b1-398564e988c4\', \'Date\': \'Wed, 20 Jul 2022 14:34:11 GMT\', \'Content-Length\': \'546\')>\nHTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Service \\"rapids-dask-cluster-service\\" is invalid: spec.ports[0].nodePort: Invalid value: 38786: provided port is not in the valid range. The range of valid ports is 30000-32767","reason":"Invalid","details":{"name":"rapids-dask-cluster-service","kind":"Service","causes":[{"reason":"FieldValueInvalid","message":"Invalid value: 38786: provided port is not in the valid range. The range of valid ports is 30000-32767","field":"spec.ports[0].nodePort"}]},"code":422}\n\n', 'subrefs': None}}}, None),)

We should do a little more input checking to ensure this can't happen. We should also have the controller put the DaskCluster into some kind of failure status while this is going on.
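As a rough sketch of what that input checking could look like, assuming the default Kubernetes NodePort range of 30000-32767 (administrators can change it via the API server's `--service-node-port-range` flag): reject the bad port up front. In the operator's kopf-based create handler, raising `kopf.PermanentError` instead of the plain `ValueError` below would mark the handler as failed without endless retries. The function name is hypothetical:

```python
def check_node_port(node_port, low=30000, high=32767):
    """Reject a nodePort outside the (default) allowed Kubernetes range.

    In the operator's kopf create handler this would more likely raise
    kopf.PermanentError, so the handler fails once instead of retrying
    forever against an unfixable spec.
    """
    if not low <= node_port <= high:
        raise ValueError(
            f"nodePort {node_port} is not in the valid range {low}-{high}"
        )
    return node_port
```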

@jacobtomlinson (Member, Author)

If you try creating the example but set nodePort to something impossible, the controller gets stuck in a loop.

https://kubernetes.dask.org/en/latest/operator_resources.html

https://kubernetes.io/docs/concepts/services-networking/service/#type-nodeport

We should add some validation here:

def build_scheduler_service_spec(cluster_name, spec, annotations, labels):
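One possible shape for that validation, run before the Service spec is built: walk the ports list from the DaskCluster's scheduler service spec and collect any out-of-range nodePort values. The helper name, and the assumption of the default 30000-32767 range, are illustrative rather than the operator's actual code:

```python
NODE_PORT_MIN, NODE_PORT_MAX = 30000, 32767  # Kubernetes default NodePort range

def find_invalid_node_ports(service_spec):
    """Return an error string for each out-of-range nodePort in a Service spec.

    ``service_spec`` is the dict under ``spec.scheduler.service`` in the
    DaskCluster resource.
    """
    errors = []
    for i, port in enumerate(service_spec.get("ports", [])):
        node_port = port.get("nodePort")
        if node_port is not None and not NODE_PORT_MIN <= node_port <= NODE_PORT_MAX:
            errors.append(
                f"spec.ports[{i}].nodePort: {node_port} is not in the "
                f"valid range {NODE_PORT_MIN}-{NODE_PORT_MAX}"
            )
    return errors
```

build_scheduler_service_spec could call something like this first and fail fast (or surface a status condition) whenever the list is non-empty, instead of letting the API server's 422 bounce the controller into a retry loop.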


skirui-source commented May 3, 2023

I've tried creating a DaskCluster object with an out-of-range nodePort, and I see this:

HTTP response headers: <CIMultiDictProxy('Audit-Id': '4c8966e6-156a-4584-8640-39a7164e9949', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Kubernetes-Pf-Flowschema-Uid': '9f5dce77-f6df-4593-8079-a7ece243e04d', 'X-Kubernetes-Pf-Prioritylevel-Uid': 'f5d11d69-50fc-4b73-8ba4-6e255c69420e', 'Date': 'Wed, 03 May 2023 05:32:24 GMT', 'Content-Length': '210')>
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"pods \"simple-scheduler\" already exists","reason":"AlreadyExists","details":{"name":"simple-scheduler","kind":"pods"},"code":409}
# cluster.yaml
apiVersion: kubernetes.dask.org/v1
kind: DaskCluster
metadata:
  name: test-cluster
  namespace: dask-operator
spec:
  worker:
    replicas: 2
    spec:
      containers:
      - name: worker
        image: "ghcr.io/dask/dask:latest"
        imagePullPolicy: "IfNotPresent"
        args:
          - dask-worker
          - --name
          - $(DASK_WORKER_NAME)
          - --dashboard
          - --dashboard-address
          - "8788"
        ports:
          - name: http-dashboard
            containerPort: 8788
            protocol: TCP
  scheduler:
    spec:
      containers:
      - name: scheduler
        image: "ghcr.io/dask/dask:latest"
        imagePullPolicy: "IfNotPresent"
        args:
          - dask-scheduler
        ports:
          - name: tcp-comm
            containerPort: 8786
            protocol: TCP
          - name: http-dashboard
            containerPort: 8787
            protocol: TCP
        readinessProbe:
          httpGet:
            port: http-dashboard
            path: /health
          initialDelaySeconds: 5
          periodSeconds: 10
        livenessProbe:
          httpGet:
            port: http-dashboard
            path: /health
          initialDelaySeconds: 15
          periodSeconds: 20
    service:
      type: NodePort
      selector:
        dask.org/cluster-name: simple-node-port
        dask.org/component: scheduler
      ports:
      - name: tcp-comm
        protocol: TCP
        port: 8786
        targetPort: "tcp-comm"
        nodePort: 38967 #30007 

@jacobtomlinson (Member, Author)

Looks like you have a dask scheduler pod with a conflicting name hanging around. Perhaps left over from a failed test run?

@jacobtomlinson jacobtomlinson removed their assignment Jan 4, 2024