Metrics-server : bind: permission denied #7988

Closed
olevitt opened this issue Sep 20, 2021 · 10 comments · Fixed by #8014
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@olevitt
Contributor

olevitt commented Sep 20, 2021

Hi!

We just upgraded to Kubespray v2.17.0, and since then metrics-server fails to start with the following error:

panic: failed to create listener: failed to listen on 0.0.0.0:443: listen tcp 0.0.0.0:443: bind: permission denied

I saw that there have been some recent issues, PRs, and discussions about this in Kubespray (see #7886 and related), but the problem is apparently not solved, even though the pod has the NET_BIND_SERVICE capability it needs to bind to the privileged port 443.

Note that PodSecurityPolicies are enabled on this cluster. Could it be related to the seccomp.security.alpha.kubernetes.io/pod: runtime/default annotation that is applied by the privileged PSP? Sadly, I am not familiar enough with these concepts to say.

Below is the pod configuration:

gon@laboitemagique:~$ kubectl get pods -n kube-system metrics-server-58f6668d5c-kkrqp  -o yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubernetes.io/psp: privileged
    seccomp.security.alpha.kubernetes.io/pod: runtime/default
  creationTimestamp: "2021-09-18T10:57:48Z"
  generateName: metrics-server-58f6668d5c-
  labels:
    app.kubernetes.io/name: metrics-server
    pod-template-hash: 58f6668d5c
    version: v0.5.0
  managedFields:
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .: {}
          f:seccomp.security.alpha.kubernetes.io/pod: {}
        f:generateName: {}
        f:labels:
          .: {}
          f:app.kubernetes.io/name: {}
          f:pod-template-hash: {}
          f:version: {}
        f:ownerReferences:
          .: {}
          k:{"uid":"e98946ca-939c-490a-bee5-36b73c16db71"}:
            .: {}
            f:apiVersion: {}
            f:blockOwnerDeletion: {}
            f:controller: {}
            f:kind: {}
            f:name: {}
            f:uid: {}
      f:spec:
        f:affinity:
          .: {}
          f:nodeAffinity:
            .: {}
            f:preferredDuringSchedulingIgnoredDuringExecution: {}
        f:containers:
          k:{"name":"metrics-server"}:
            .: {}
            f:args: {}
            f:image: {}
            f:imagePullPolicy: {}
            f:livenessProbe:
              .: {}
              f:failureThreshold: {}
              f:httpGet:
                .: {}
                f:path: {}
                f:port: {}
                f:scheme: {}
              f:initialDelaySeconds: {}
              f:periodSeconds: {}
              f:successThreshold: {}
              f:timeoutSeconds: {}
            f:name: {}
            f:ports:
              .: {}
              k:{"containerPort":443,"protocol":"TCP"}:
                .: {}
                f:containerPort: {}
                f:name: {}
                f:protocol: {}
            f:readinessProbe:
              .: {}
              f:failureThreshold: {}
              f:httpGet:
                .: {}
                f:path: {}
                f:port: {}
                f:scheme: {}
              f:initialDelaySeconds: {}
              f:periodSeconds: {}
              f:successThreshold: {}
              f:timeoutSeconds: {}
            f:resources:
              .: {}
              f:limits:
                .: {}
                f:cpu: {}
                f:memory: {}
              f:requests:
                .: {}
                f:cpu: {}
                f:memory: {}
            f:securityContext:
              .: {}
              f:allowPrivilegeEscalation: {}
              f:capabilities:
                .: {}
                f:add: {}
                f:drop: {}
              f:readOnlyRootFilesystem: {}
              f:runAsGroup: {}
              f:runAsNonRoot: {}
              f:runAsUser: {}
            f:terminationMessagePath: {}
            f:terminationMessagePolicy: {}
            f:volumeMounts:
              .: {}
              k:{"mountPath":"/tmp"}:
                .: {}
                f:mountPath: {}
                f:name: {}
          k:{"name":"metrics-server-nanny"}:
            .: {}
            f:command: {}
            f:env:
              .: {}
              k:{"name":"MY_POD_NAME"}:
                .: {}
                f:name: {}
                f:valueFrom:
                  .: {}
                  f:fieldRef:
                    .: {}
                    f:apiVersion: {}
                    f:fieldPath: {}
              k:{"name":"MY_POD_NAMESPACE"}:
                .: {}
                f:name: {}
                f:valueFrom:
                  .: {}
                  f:fieldRef:
                    .: {}
                    f:apiVersion: {}
                    f:fieldPath: {}
            f:image: {}
            f:imagePullPolicy: {}
            f:name: {}
            f:resources:
              .: {}
              f:limits:
                .: {}
                f:cpu: {}
                f:memory: {}
              f:requests:
                .: {}
                f:cpu: {}
                f:memory: {}
            f:terminationMessagePath: {}
            f:terminationMessagePolicy: {}
            f:volumeMounts:
              .: {}
              k:{"mountPath":"/etc/config"}:
                .: {}
                f:mountPath: {}
                f:name: {}
        f:dnsPolicy: {}
        f:enableServiceLinks: {}
        f:priorityClassName: {}
        f:restartPolicy: {}
        f:schedulerName: {}
        f:securityContext: {}
        f:serviceAccount: {}
        f:serviceAccountName: {}
        f:terminationGracePeriodSeconds: {}
        f:tolerations: {}
        f:volumes:
          .: {}
          k:{"name":"metrics-server-config-volume"}:
            .: {}
            f:configMap:
              .: {}
              f:defaultMode: {}
              f:name: {}
            f:name: {}
          k:{"name":"tmp"}:
            .: {}
            f:emptyDir: {}
            f:name: {}
    manager: kube-controller-manager
    operation: Update
    time: "2021-09-18T10:57:48Z"
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:status:
        f:conditions:
          k:{"type":"ContainersReady"}:
            .: {}
            f:lastProbeTime: {}
            f:lastTransitionTime: {}
            f:message: {}
            f:reason: {}
            f:status: {}
            f:type: {}
          k:{"type":"Initialized"}:
            .: {}
            f:lastProbeTime: {}
            f:lastTransitionTime: {}
            f:status: {}
            f:type: {}
          k:{"type":"Ready"}:
            .: {}
            f:lastProbeTime: {}
            f:lastTransitionTime: {}
            f:message: {}
            f:reason: {}
            f:status: {}
            f:type: {}
        f:containerStatuses: {}
        f:hostIP: {}
        f:phase: {}
        f:podIP: {}
        f:podIPs:
          .: {}
          k:{"ip":"10.233.99.83"}:
            .: {}
            f:ip: {}
        f:startTime: {}
    manager: kubelet
    operation: Update
    time: "2021-09-18T10:57:59Z"
  name: metrics-server-58f6668d5c-kkrqp
  namespace: kube-system
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: ReplicaSet
    name: metrics-server-58f6668d5c
    uid: e98946ca-939c-490a-bee5-36b73c16db71
  resourceVersion: "224802137"
  uid: adc9b98a-08e1-44f8-a564-d8692b0bfdca
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - preference:
          matchExpressions:
          - key: node-role.kubernetes.io/control-plane
            operator: In
            values:
            - ""
        weight: 100
  containers:
  - args:
    - --logtostderr
    - --cert-dir=/tmp
    - --secure-port=443
    - --kubelet-preferred-address-types=InternalIP
    - --kubelet-use-node-status-port
    - --kubelet-insecure-tls
    - --metric-resolution=15s
    image: k8s.gcr.io/metrics-server/metrics-server:v0.5.0
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 3
      httpGet:
        path: /livez
        port: https
        scheme: HTTPS
      initialDelaySeconds: 40
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1
    name: metrics-server
    ports:
    - containerPort: 443
      name: https
      protocol: TCP
    readinessProbe:
      failureThreshold: 3
      httpGet:
        path: /readyz
        port: https
        scheme: HTTPS
      initialDelaySeconds: 40
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1
    resources:
      limits:
        cpu: 42m
        memory: 59Mi
      requests:
        cpu: 42m
        memory: 59Mi
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        add:
        - NET_BIND_SERVICE
        drop:
        - all
      readOnlyRootFilesystem: true
      runAsGroup: 10001
      runAsNonRoot: true
      runAsUser: 10001
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /tmp
      name: tmp
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-db25s
      readOnly: true
  - command:
    - /pod_nanny
    - --config-dir=/etc/config
    - --cpu=20m
    - --extra-cpu=1m
    - --memory=15Mi
    - --extra-memory=2Mi
    - --threshold=5
    - --deployment=metrics-server
    - --container=metrics-server
    - --poll-period=300000
    - --estimator=exponential
    - --minClusterSize=10
    env:
    - name: MY_POD_NAME
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: metadata.name
    - name: MY_POD_NAMESPACE
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: metadata.namespace
    image: k8s.gcr.io/addon-resizer:1.8.11
    imagePullPolicy: IfNotPresent
    name: metrics-server-nanny
    resources:
      limits:
        cpu: 40m
        memory: 25Mi
      requests:
        cpu: 40m
        memory: 25Mi
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /etc/config
      name: metrics-server-config-volume
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-db25s
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeName: controlplane2
  preemptionPolicy: PreemptLowerPriority
  priority: 2000000000
  priorityClassName: system-cluster-critical
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext:
    seccompProfile:
      type: RuntimeDefault
  serviceAccount: metrics-server
  serviceAccountName: metrics-server
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
  - effect: NoSchedule
    key: node-role.kubernetes.io/control-plane
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - configMap:
      defaultMode: 420
      name: metrics-server-config
    name: metrics-server-config-volume
  - emptyDir: {}
    name: tmp
  - name: kube-api-access-db25s
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2021-09-18T10:57:48Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2021-09-18T10:57:48Z"
    message: 'containers with unready status: [metrics-server]'
    reason: ContainersNotReady
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2021-09-18T10:57:48Z"
    message: 'containers with unready status: [metrics-server]'
    reason: ContainersNotReady
    status: "False"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2021-09-18T10:57:48Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: docker://112896056b4a417641e35577b5ff521091f7238c783e1f6d53fa96f86f0dfc99
    image: k8s.gcr.io/metrics-server/metrics-server:v0.5.0
    imageID: docker-pullable://k8s.gcr.io/metrics-server/metrics-server@sha256:6c5603956c0aed6b4087a8716afce8eb22f664b13162346ee852b4fab305ca15
    lastState:
      terminated:
        containerID: docker://112896056b4a417641e35577b5ff521091f7238c783e1f6d53fa96f86f0dfc99
        exitCode: 2
        finishedAt: "2021-09-20T07:35:20Z"
        reason: Error
        startedAt: "2021-09-20T07:35:16Z"
    name: metrics-server
    ready: false
    restartCount: 516
    started: false
    state:
      waiting:
        message: back-off 5m0s restarting failed container=metrics-server pod=metrics-server-58f6668d5c-kkrqp_kube-system(adc9b98a-08e1-44f8-a564-d8692b0bfdca)
        reason: CrashLoopBackOff
  - containerID: docker://a6c0d3f9105a4f85fe8332abc6db49c08525e2b6994ff2da6bc5224bd978c23d
    image: k8s.gcr.io/addon-resizer:1.8.11
    imageID: docker-pullable://k8s.gcr.io/addon-resizer@sha256:35745de3c9a2884d53ad0e81b39f1eed9a7c77f5f909b9e84f9712b37ffb3021
    lastState: {}
    name: metrics-server-nanny
    ready: true
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2021-09-18T10:57:58Z"
  hostIP: 192.168.254.132
  phase: Running
  podIP: 10.233.99.83
  podIPs:
  - ip: 10.233.99.83
  qosClass: Guaranteed
  startTime: "2021-09-18T10:57:48Z"

olevitt added the kind/bug label on Sep 20, 2021

@olevitt
Contributor Author

olevitt commented Sep 20, 2021

Update: I installed a brand new Kubespray cluster with the same configuration (but on VMs instead of bare metal, and with Debian 11 instead of CentOS 7 for the control plane nodes), and everything works fine on it.
Could it be related to the OS or kernel version? I will run more tests to confirm or rule this out.
Fails on:

controlplane1   Ready    control-plane,master   333d   v1.21.5   192.168.254.131   <none>        CentOS Linux 7 (Core)          3.10.0-1127.el7.x86_64   docker://20.10.8

Works on:

v2master1   Ready    control-plane,master   33m   v1.21.5   192.168.254.131   <none>        Debian GNU/Linux 11 (bullseye)   5.10.0-8-cloud-amd64   docker://20.10.8

@olevitt
Contributor Author

olevitt commented Sep 20, 2021

Update: I tested running metrics-server on a worker node (the workers run Debian 10 / kernel 3) instead of a control plane node to see whether it could bind, and it did. So I guess this may indicate that the issue is linked to CentOS 7 / kernel 3.

@kailunshi

kailunshi commented Sep 21, 2021

Same issue here. Changing allowPrivilegeEscalation from false to true fixes it.
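
For reference, this workaround amounts to flipping a single field in the securityContext of the metrics-server container. The fragment below is a minimal sketch based on the pod spec posted at the top of this thread. It relaxes the container's hardening, and it is not necessarily the change that was eventually merged.

```yaml
# Sketch of the workaround only: securityContext of the metrics-server
# container, copied from the pod spec in this thread with one field changed.
securityContext:
  allowPrivilegeEscalation: true   # was false; this is the only change
  capabilities:
    add:
    - NET_BIND_SERVICE
    drop:
    - all
  readOnlyRootFilesystem: true
  runAsGroup: 10001
  runAsNonRoot: true
  runAsUser: 10001
```

Because the pod is owned by a ReplicaSet, the edit would have to go into the pod template of the metrics-server Deployment rather than the pod itself. A later Kubespray run will presumably restore the templated value unless the manifest template is changed as well.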

But why did we change the port from 8443 to 443 in the first place? @L3o-pold, could you please shed some light on this, as you made the change along with the metrics-server upgrade in #7864?

Thank you!

@ralbon
Contributor

ralbon commented Sep 21, 2021

Hi,

Same issue here. Changing the port from 443 to 4443 also fixes it (CentOS 7).
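
A minimal sketch of what that port change could look like in the metrics-server container spec, based on the pod spec posted at the top of this thread. Only the fields that change are shown, and the result should be checked against the actual Deployment before use.

```yaml
# Sketch of the port workaround. Only the changed fields are shown; the
# other args and container fields from the original spec stay as they are.
containers:
- name: metrics-server
  args:
  - --secure-port=4443    # was --secure-port=443
  ports:
  - containerPort: 4443   # was 443; ports from 1024 up are unprivileged by default
    name: https           # the probes reference this name, so they follow the new port
    protocol: TCP
```

The Service in front of metrics-server is not shown in this thread. If its targetPort is the number 443 rather than the named port https, it would presumably need the same change.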

I think port 443 is the intended target. If you check, the metrics-server port changed between release-0.4 and release-0.5.

@kailunshi

@ralbon Thank you for the context. Maybe that is why @L3o-pold made the port change, though I disagree with a secondary service taking port 443 from the host.

Anyway, because of this issue we have to stay on the previous Kubespray version. I hope this gets sorted out in the next release.

@oomichi
Contributor

oomichi commented Sep 24, 2021

Thanks for this issue report.
I confirmed that this issue also happens on CentOS 7 with the latest commit (22017b7) on my local machine.
Let's find a solution to fix it.

@oomichi
Contributor

oomichi commented Sep 24, 2021

I created a pull request (#8014) to fix this issue.
I confirmed that it works on CentOS 7.

@manzsolutions-lpr
Contributor

@oomichi Could this issue be re-opened, please?

I think that there are actually two overlapping issues: Port 443 being bound already and PSPs prohibiting the use of privileged ports. We use PSPs and this will not suffice.

There actually is an issue open upstream, too: kubernetes-sigs/metrics-server#782

@oomichi
Contributor

oomichi commented Oct 11, 2021

@manzsolutions-lpreis Could you open a separate issue if there is still another problem? That would make it easier to track.

@manzsolutions-lpr
Contributor

@oomichi I have to admit that while preparing the new issue I was unable to recreate this specific error we were seeing during the initial upgrade and the pod is apparently running with the proper PSP attached via a group for the kube-system namespace.
After encountering the issue last week we scaled the metrics-server deployment to zero and focused on other troubles like switching from Docker to containerd. Sorry for the fuss about nothing!
