
set kubelet defaults for --cgroups-per-qos & --enforce-node-allocatable #277

Closed · angrox opened this issue Mar 28, 2018 · 13 comments

angrox commented Mar 28, 2018

Disclaimer: This issue references the following issue in acs-engine:
Azure/acs-engine#2263
with the corresponding fix in this pull request: Azure/acs-engine#2310

In my AKS cluster (1.9.2) in westeurope I am seeing the same issue as described in the acs-engine issue:

Warning FailedNodeAllocatableEnforcement 1m (x106 over 1h) kubelet, aks-nodepool1-77770737-0 Failed to update Node Allocatable Limits "": failed to set supported cgroup subsystems for cgroup : Failed to set config for supported subsystems : failed to write 8342003712 to memory.limit_in_bytes: write /var/lib/docker/overlay2/7428add845f7e87ff8620731e8d9ef63a703255de49fb2a2f1d8a867f491f420/merged/sys/fs/cgroup/memory/memory.limit_in_bytes: invalid argument

When will the AKS clusters be updated with the fix?
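
A quick way to check whether a given cluster is hit by this is to look for the FailedNodeAllocatableEnforcement events directly; a minimal sketch (the node name is just an example, and --field-selector requires a reasonably recent kubectl):

# List FailedNodeAllocatableEnforcement events across the cluster
$ kubectl get events --all-namespaces --field-selector reason=FailedNodeAllocatableEnforcement

# Or check a single node's event stream (node name is an example)
$ kubectl describe node aks-nodepool1-77770737-0 | grep FailedNodeAllocatableEnforcement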

angrox (Author) commented Mar 28, 2018

Official Support Call is also open: 118032817900562

slack (Contributor) commented Apr 3, 2018

The AKS rollout this week includes acs-engine 0.14.6. This patch will be available in all AKS regions by the end of the week, and will be applied to existing clusters via az aks upgrade.
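
For reference, applying this to an existing cluster boils down to something like the following; the resource group, cluster name, and target version below are placeholders:

# See which Kubernetes versions the cluster can be upgraded to
$ az aks get-upgrades --resource-group myResourceGroup --name myAKSCluster --output table

# Upgrade; nodes are replaced during the upgrade and pick up the new kubelet defaults
$ az aks upgrade --resource-group myResourceGroup --name myAKSCluster --kubernetes-version 1.9.6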

angrox (Author) commented Apr 3, 2018

Thanks! I will close this issue as soon as the patch is available and we've tested it!

bbrosemer commented
@slack Does az aks upgrade imply it will be aligned with a Kubernetes upgrade too?

slack (Contributor) commented Apr 4, 2018

@bbrosemer yeah, the updated configuration will be applied as the nodes are replaced during upgrade.

guesslin commented Apr 9, 2018

@slack az aks upgrade is not working for us; our AKS cluster ends up in the Failed state:

$ az aks list
Name    Location    ResourceGroup    KubernetesVersion    ProvisioningState    Fqdn
------  ----------  ---------------  -------------------  -------------------  -------------------------------------------------
stage1  eastus      stage            1.8.10               Failed               stage1-stage-2f4d48-39f0e898.hcp.eastus.azmk8s.io
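
One way to pull more detail on a cluster stuck in Failed, using the resource group and cluster name from the listing above, is roughly:

# Show just the provisioning state
$ az aks show --resource-group stage --name stage1 --query provisioningState

# Dump the full resource for inspection
$ az aks show --resource-group stage --name stage1 --output json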

angrox (Author) commented Apr 9, 2018

Currently I am upgrading from 1.9.2 to 1.9.6. When I log in to an updated machine I do not see the change in the kubelet configuration:

$ kubectl get nodes
NAME                       STATUS    ROLES     AGE       VERSION
[...]
aks-nodepool1-77770xxx-5   Ready     agent     2h        v1.9.6

Errors:

$ kubectl describe node aks-nodepool1-77770xxx-5
[...]
Events:
  Type     Reason                            Age                From                               Message
  ----     ------                            ----               ----                               -------
  Warning  FailedNodeAllocatableEnforcement  2m (x136 over 2h)  kubelet, aks-nodepool1-77770737-5  Failed to update Node Allocatable Limits "": failed to set supported cgroup subsystems for cgroup : Failed to set config for supported subsystems : failed to write 8342003712 to memory.limit_in_bytes: write /var/lib/docker/overlay2/5df472c59f31fb8272481b920fe782e5310d622f77f90d069a8f85ef05277cc7/merged/sys/fs/cgroup/memory/memory.limit_in_bytes: invalid argument

Arguments for kubelet on the node (in /etc/default/kubelet)

KUBELET_CONFIG=--address=0.0.0.0 --allow-privileged=true --authorization-mode=Webhook --azure-container-registry-config=/etc/kubernetes/azure.json --cadvisor-port=0 --cgroups-per-qos=false --cloud-config=/etc/kubernetes/azure.json --cloud-provider=azure --cluster-dns=10.0.0.10 --cluster-domain=cluster.local --enforce-node-allocatable= --event-qps=0 --eviction-hard=memory.available<100Mi,nodefs.available<10%,nodefs.inodesFree<5% --feature-gates=Accelerators=true --image-gc-high-threshold=85 --image-gc-low-threshold=80 --keep-terminated-pod-volumes=false --kubeconfig=/var/lib/kubelet/kubeconfig --max-pods=110 --network-plugin=kubenet --node-status-update-frequency=10s --non-masquerade-cidr=10.0.0.0/8 --pod-infra-container-image=k8s-gcrio.azureedge.net/pause-amd64:3.0 --pod-manifest-path=/etc/kubernetes/manifests

Edit: The upgrade failed ("provisioningState": "Failed").

Edit2: Detailed error message:

   "properties": {
        "statusCode": "Conflict",
        "statusMessage": "{\"status\":\"Failed\",\"error\":{\"code\":\"ResourceDeploymentFailure\",\"message\":\"The resource operation completed with terminal provisioning state 'Failed'.\",\"details\":[{\"code\":\"VMExtensionProvisioningError\",\"message\":\"VM has reported a failure when processing extension 'cse6'. Error message: \\\"Enable failed: failed to execute command: command terminated with exit status=5\\n[stdout]\\n\\n[stderr]\\n\\\".\"}]}}",
        "serviceRequestId": "7aae6d60-5c6d-429c-9ab7-6237d29e640c"
    },
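
The VMExtensionProvisioningError above points at the custom script extension (cse6) on one of the VMs; assuming acs-engine-style provisioning, the underlying error is usually visible in the provisioning logs on the affected node (the exact path is an assumption and may vary by image):

# Provisioning / custom script extension output on the node (path is an assumption)
$ sudo tail -n 100 /var/log/azure/cluster-provision.log
$ sudo ls /var/log/azure/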

jackfrancis (Member) commented
@angrox I infer you don't see --cgroups-per-qos=true in your kubelet runtime config?

ps auxfww | grep /usr/local/bin/kubelet is one way to grok that
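
A slightly more targeted variant that prints only the two flags in question, assuming the kubelet command line is visible from the host, could be:

# Extract just the two relevant flags from the running kubelet's command line
$ ps auxfww | grep '[k]ubelet' | grep -o -- '--cgroups-per-qos=[^ ]*\|--enforce-node-allocatable=[^ ]*' | sort -u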

angrox (Author) commented Apr 10, 2018

@jackfrancis yeah, it is not there.

root 12008 0.0 0.0 161276 7000 ? Ssl Apr09 0:03 /usr/bin/docker run --net=host --pid=host --privileged --rm --volume=/:/rootfs:ro,shared --volume=/dev:/dev --volume=/sys:/sys:ro --volume=/var/run:/var/run:rw --volume=/var/lib/cni/:/var/lib/cni:rw --volume=/sbin/apparmor_parser/:/sbin/apparmor_parser:rw --volume=/var/lib/docker/:/var/lib/docker:rw,shared --volume=/var/lib/containers/:/var/lib/containers:rw --volume=/var/lib/kubelet/:/var/lib/kubelet:rw,shared --volume=/var/log:/var/log:rw --volume=/etc/kubernetes/:/etc/kubernetes:ro --volume=/srv/kubernetes/:/srv/kubernetes:ro --volume=/var/lib/waagent/ManagedIdentity-Settings:/var/lib/waagent/ManagedIdentity-Settings:ro --volume=/etc/kubernetes/volumeplugins:/etc/kubernetes/volumeplugins:rw k8s-gcrio.azureedge.net/hyperkube-amd64:v1.9.6 /hyperkube kubelet --containerized --enable-server --node-labels=kubernetes.io/role=agent,agentpool=nodepool1,storageprofile=managed,storagetier=Premium_LRS,kubernetes.azure.com/cluster=MC_cn-kubernetes-dev_cn-kubernetes-dev_westeurope --v=2 --non-masquerade-cidr=10.0.0.0/8 --volume-plugin-dir=/etc/kubernetes/volumeplugins --address=0.0.0.0 --allow-privileged=true --authorization-mode=Webhook --azure-container-registry-config=/etc/kubernetes/azure.json --cadvisor-port=0

--cgroups-per-qos=false

--cloud-config=/etc/kubernetes/azure.json --cloud-provider=azure --cluster-dns=10.0.0.10 --cluster-domain=cluster.local --enforce-node-allocatable= --event-qps=0 --eviction-hard=memory.available<100Mi,nodefs.available<10%,nodefs.inodesFree<5% --feature-gates=Accelerators=true --image-gc-high-threshold=85 --image-gc-low-threshold=80 --keep-terminated-pod-volumes=false --kubeconfig=/var/lib/kubelet/kubeconfig --max-pods=110 --network-plugin=kubenet --node-status-update-frequency=10s --non-masquerade-cidr=10.0.0.0/8 --pod-infra-container-image=k8s-gcrio.azureedge.net/pause-amd64:3.0 --pod-manifest-path=/etc/kubernetes/manifests

@slack So the fix was not applied?
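
For comparison, based on the issue title and the acs-engine fix referenced at the top, the patched flags are expected to carry the upstream defaults instead of the values shown above, i.e. roughly:

# Expected values after the fix (an assumption based on the referenced acs-engine PR)
--cgroups-per-qos=true --enforce-node-allocatable=pods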

angrox (Author) commented Apr 10, 2018

Installing a new cluster fixes the issue; the patch is in place there. Upgrading (see posts above) does not update the config.
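
For anyone hitting the same thing, creating a fresh cluster on the patched tooling is one workaround; a minimal sketch with placeholder names:

# Create a new cluster (resource group, name, version, and node count are placeholders)
$ az aks create --resource-group myResourceGroup --name myNewAKSCluster --kubernetes-version 1.9.6 --node-count 3 --generate-ssh-keys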

jackfrancis (Member) commented
@slack we should follow up on this, as @angrox's experience does not match our expectations. Thanks @angrox for your stamina here!

angrox (Author) commented Apr 11, 2018

@jackfrancis I am also in contact with one of the Microsoft escalation managers and have given them access to our defective clusters. If you need more information, please PM me.

jnoller (Contributor) commented Apr 3, 2019

Closing stale/resolved

@jnoller jnoller closed this as completed Apr 3, 2019
@ghost ghost locked as resolved and limited conversation to collaborators Aug 9, 2020