
FailedNodeAllocatableEnforcement on cgroups-per-qos false #2263

Closed
jcharlytown opened this issue Feb 13, 2018 · 14 comments

Comments

@jcharlytown

jcharlytown commented Feb 13, 2018

Is this a request for help?:
Yes

Is this an ISSUE or FEATURE REQUEST? (choose one):
Issue

What version of acs-engine?:
As of now, any version shipping with PR #1960

Orchestrator and version (e.g. Kubernetes, DC/OS, Swarm)
Tested with Kubernetes 1.8.4, 1.8.6, 1.8.7

What happened:
The kubelets constantly emit events like this:

Warning  FailedNodeAllocatableEnforcement  49s (x3 over 2m)  kubelet, k8s-master-56038831-0  Failed to update Node Allocatable Limits "": failed to set supported cgroup subsystems for cgroup : Failed to set config for supported subsystems : failed to write 8342011904 to memory.limit_in_bytes: write /var/lib/docker/overlay2/05c98c88dcf284951773f777b09d37101b1eed0eb43b7e5295ee0600f4390f1e/merged/sys/fs/cgroup/memory/memory.limit_in_bytes: invalid argument

What you expected to happen:
This warning should not appear; instead, the kubelet should report the normal event: Normal NodeAllocatableEnforced 4s kubelet, k8s-master-56038831-0 Updated Node Allocatable limit across pods

How to reproduce it (as minimally and precisely as possible):
Deploy a cluster using this API model (I removed some secrets and the SSH key):

{
  "apiVersion": "vlabs",
  "properties": {
    "orchestratorProfile": {
      "orchestratorType": "Kubernetes",
      "orchestratorRelease": "1.8",
      "orchestratorVersion": "1.8.6",
      "kubernetesConfig": {
        "kubernetesImageBase": "gcrio.azureedge.net/google_containers/",
        "DockerBridgeSubnet": "172.17.0.1/16",
        "kubeletConfig": {
          "--eviction-hard": "memory.available<500Mi,nodefs.available<10%,nodefs.inodesFree<5%",
          "--system-reserved": "memory=1.5Gi"
        }
      }
    },
    "masterProfile": {
      "count": 1,
      "dnsPrefix": "qos-cgroups-test",
      "vmSize": "Standard_D2_v3",
      "storageProfile": "ManagedDisks",
      "oauthEnabled": false,
      "distro": "ubuntu",
      "vnetCidr": "10.244.0.0/16"
    },
    "agentPoolProfiles": [
      {
        "name": "agents",
        "count": 1,
        "vmSize": "Standard_E2s_v3",
        "osType": "Linux",
        "availabilityProfile": "AvailabilitySet",
        "storageProfile": "ManagedDisks",
        "distro": "ubuntu"
      }
    ],
    "linuxProfile": {
      [...]
    },
    "servicePrincipalProfile": {
      [...]
    }
  }
}

Afterwards, run kubectl describe node <NODE> on the new cluster; the warning event described above should appear.
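For reference, one quick way to surface just these events is sketched below; it assumes working kubectl access to the cluster, and the <NODE> placeholder and grep patterns are purely illustrative:

kubectl describe node <NODE> | grep -A 2 NodeAllocatable
# or scan recent events across the whole cluster
kubectl get events --all-namespaces | grep FailedNodeAllocatableEnforcement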

Manually editing /etc/default/kubelet and changing --cgroups-per-qos from false to true makes the event disappear. However, I have not investigated whether this has other side effects.
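For anyone wanting to apply the same workaround, a minimal sketch follows. It assumes the kubelet flags live in /etc/default/kubelet and that the kubelet runs as a systemd unit named kubelet; verify both on your own nodes before running it:

# run on each affected node; back up the file first
sudo cp /etc/default/kubelet /etc/default/kubelet.bak
sudo sed -i 's/--cgroups-per-qos=false/--cgroups-per-qos=true/' /etc/default/kubelet
sudo systemctl restart kubelet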

Hence, I have two questions: a) Am I the only person experiencing this? And b) what is the rationale behind changing this argument from true (the upstream default) to false and making it static at the same time?

Also, am I missing something? As far as I can tell I am not doing anything unusual; this should be a very plain setup.

Anything else we need to know:

@khaldoune

+1

@yastij

yastij commented Feb 14, 2018

cc @karataliu @feiskyer

@campbelldgunn

I am getting the same issue on a K8s 1.9.2 cluster built with acs-engine 0.12.5; in fact, over 19K times over 3 days. @yastij @karataliu @feiskyer

@pidah
Contributor

pidah commented Feb 19, 2018

@jcharlytown We are observing the same issue. According to the docs, the --cgroups-per-qos flag needs to be enabled: https://kubernetes.io/docs/tasks/administer-cluster/reserve-compute-resources/#enabling-qos-and-pod-level-cgroups
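To see what a node is actually running with, a small sketch (assuming SSH access to the node; the flag names are the upstream kubelet flags referenced in this issue):

# inspect the flags of the running kubelet process
ps -ef | grep '[k]ubelet' | tr ' ' '\n' | grep -E 'cgroups-per-qos|enforce-node-allocatable'
# or check the static configuration written out at provisioning time
grep -E 'cgroups-per-qos|enforce-node-allocatable' /etc/default/kubelet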

@yastij

yastij commented Feb 19, 2018

cc @arnaudmz

@jackfrancis
Member

PR #2310 aims to fix this

@jcharlytown
Author

Thanks everyone for fixing this. +1!

@jcharlytown
Author

I just checked the latest release; it seems this fix is not included. Is there a specific reason?

@yastij

yastij commented Mar 2, 2018

cc @jackfrancis

@jackfrancis
Member

@jcharlytown @yastij Are you able to build from master to deploy clusters? If there is an overwhelming desire for a patch release we can include this fix in such a release this week.
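For anyone who needs the fix before a tagged release, building from master looks roughly like the sketch below; the make target and output path are assumptions based on the repository's usual layout, so check the project README:

git clone https://github.com/Azure/acs-engine.git
cd acs-engine
make build                      # assumed target; should produce ./bin/acs-engine
./bin/acs-engine version        # sanity check (subcommand name may differ)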

@jcharlytown
Author

@jackfrancis, I can build from master, yes. I was just wondering about the general policy; understanding the release procedures might come in handy in the future, as we are currently evaluating which tool chain to use to manage our clusters. Can you comment on why this fix was not included?

@yastij

yastij commented Mar 6, 2018

@jackfrancis - that would be really nice, thanks!

@jackfrancis
Member

We generate a patch release when there is a wide-impacting fix that we want to encourage folks to opt into. This is why it's important to vote when you want to see something sooner than the next minor release. So if there's a v0.13.2, we'll include this in it. :)

Additionally in the future we plan to patch release when a new Kubernetes release is published to make it easier to deploy new clusters (without requiring users to build from master).

@yastij

yastij commented Mar 6, 2018

SGTM
