
FailedNodeAllocatableEnforcement on cgroups-per-qos false #2263

Closed
jcharlytown opened this issue Feb 13, 2018 · 14 comments

Comments

@jcharlytown

jcharlytown commented Feb 13, 2018

Is this a request for help?:
Yes

Is this an ISSUE or FEATURE REQUEST? (choose one):
Issue

What version of acs-engine?:
As of now, any version shipping with PR #1960

Orchestrator and version (e.g. Kubernetes, DC/OS, Swarm)
Tested with Kubernetes 1.8.4, 1.8.6, 1.8.7

What happened:
The kubelets constantly emit events like this:

Warning  FailedNodeAllocatableEnforcement  49s (x3 over 2m)  kubelet, k8s-master-56038831-0  Failed to update Node Allocatable Limits "": failed to set supported cgroup subsystems for cgroup : Failed to set config for supported subsystems : failed to write 8342011904 to memory.limit_in_bytes: write /var/lib/docker/overlay2/05c98c88dcf284951773f777b09d37101b1eed0eb43b7e5295ee0600f4390f1e/merged/sys/fs/cgroup/memory/memory.limit_in_bytes: invalid argument

What you expected to happen:
This warning should not appear; instead, the kubelet should report the normal event: Normal NodeAllocatableEnforced 4s kubelet, k8s-master-56038831-0 Updated Node Allocatable limit across pods

How to reproduce it (as minimally and precisely as possible):
Deploy a cluster using this API model (I removed some secrets and the SSH key):

{
  "apiVersion": "vlabs",
  "properties": {
    "orchestratorProfile": {
      "orchestratorType": "Kubernetes",
      "orchestratorRelease": "1.8",
      "orchestratorVersion": "1.8.6",
      "kubernetesConfig": {
        "kubernetesImageBase": "gcrio.azureedge.net/google_containers/",
        "DockerBridgeSubnet": "172.17.0.1/16",
        "kubeletConfig": {
          "--eviction-hard": "memory.available<500Mi,nodefs.available<10%,nodefs.inodesFree<5%",
          "--system-reserved": "memory=1.5Gi"
        }
      }
    },
    "masterProfile": {
      "count": 1,
      "dnsPrefix": "qos-cgroups-test",
      "vmSize": "Standard_D2_v3",
      "storageProfile": "ManagedDisks",
      "oauthEnabled": false,
      "distro": "ubuntu",
      "vnetCidr": "10.244.0.0/16"
    },
    "agentPoolProfiles": [
      {
        "name": "agents",
        "count": 1,
        "vmSize": "Standard_E2s_v3",
        "osType": "Linux",
        "availabilityProfile": "AvailabilitySet",
        "storageProfile": "ManagedDisks",
        "distro": "ubuntu"
      }
    ],
    "linuxProfile": {
      [...]
    },
    "servicePrincipalProfile": {
      [...]
    }
  }
}

Afterwards, run kubectl describe node <NODE> on the new cluster; the warning event described above should appear.
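For reference, one quick way to surface just these events is sketched below; it assumes working kubectl access to the cluster, and the <NODE> placeholder and grep patterns are purely illustrative:

kubectl describe node <NODE> | grep -A 2 NodeAllocatable
# or scan recent events across the whole cluster
kubectl get events --all-namespaces | grep FailedNodeAllocatableEnforcement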

Manually editing /etc/default/kubelet and changing --cgroups-per-qos from false to true makes the event disappear. However, I have not investigated whether this has other side effects.
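For anyone wanting to apply the same workaround, a minimal sketch follows. It assumes the kubelet flags live in /etc/default/kubelet and that the kubelet runs as a systemd unit named kubelet; verify both on your own nodes before running it:

# run on each affected node; back up the file first
sudo cp /etc/default/kubelet /etc/default/kubelet.bak
sudo sed -i 's/--cgroups-per-qos=false/--cgroups-per-qos=true/' /etc/default/kubelet
sudo systemctl restart kubelet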

Hence, I have two questions: a) Am I the only person experiencing this? And b) what is the rationale behind changing this argument from true (the upstream default) to false and making it static at the same time?

Also, am I missing something? As far as I can tell I am not doing anything unusual; this should be a very plain setup.

Anything else we need to know:

@khaldoune

+1

@yastij

yastij commented Feb 14, 2018

cc @karataliu @feiskyer

@campbelldgunn

I am getting the same issue on a K8s 1.9.2 cluster built with acs-engine 0.12.5; in fact, over 19K times over 3 days. @yastij @karataliu @feiskyer

@pidah
Contributor

pidah commented Feb 19, 2018

@jcharlytown We are observing the same issue. According to the docs, the --cgroups-per-qos flag needs to be enabled: https://kubernetes.io/docs/tasks/administer-cluster/reserve-compute-resources/#enabling-qos-and-pod-level-cgroups
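To see what a node is actually running with, a small sketch (assuming SSH access to the node; the flag names are the upstream kubelet flags referenced in this issue):

# inspect the flags of the running kubelet process
ps -ef | grep '[k]ubelet' | tr ' ' '\n' | grep -E 'cgroups-per-qos|enforce-node-allocatable'
# or check the static configuration written out at provisioning time
grep -E 'cgroups-per-qos|enforce-node-allocatable' /etc/default/kubelet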

@yastij

yastij commented Feb 19, 2018

cc @arnaudmz

@jackfrancis
Member

PR #2310 aims to fix this

@jcharlytown
Author

Thanks everyone for fixing this. +1!

@jcharlytown
Author

I just checked the latest release; it seems this fix is not included. Is there a specific reason?

@yastij

yastij commented Mar 2, 2018

cc @jackfrancis

@jackfrancis
Member

@jcharlytown @yastij Are you able to build from master to deploy clusters? If there is an overwhelming desire for a patch release we can include this fix in such a release this week.
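For anyone who needs the fix before a tagged release, building from master looks roughly like the sketch below; the make target and output path are assumptions based on the repository's usual layout, so check the project README:

git clone https://github.com/Azure/acs-engine.git
cd acs-engine
make build                      # assumed target; should produce ./bin/acs-engine
./bin/acs-engine version        # sanity check (subcommand name may differ)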

@jcharlytown
Author

@jackfrancis, I can build from master, yes. I was just wondering about the general policy; understanding the release procedures might come in handy in the future, as we are currently evaluating which tool chain to use to manage our clusters. Can you comment on why this fix was not included?

@yastij

yastij commented Mar 6, 2018

@jackfrancis - that would be really nice, thanks!

@jackfrancis
Member

We generate a patch release when there is a wide-impacting fix that we want to encourage folks to opt into. This is why it's important to vote when you want to see something sooner than the next minor release. So if there's a v0.13.2, we'll include this in it. :)

Additionally in the future we plan to patch release when a new Kubernetes release is published to make it easier to deploy new clusters (without requiring users to build from master).

@yastij

yastij commented Mar 6, 2018

SGTM
