cgroup v1/v2 compatibility issue when setting memory below the current usage #3509

Closed
kolyshkin opened this issue Jun 14, 2022 · 8 comments · Fixed by #3579

@kolyshkin
Contributor

kolyshkin commented Jun 14, 2022

With cgroup v1, when we set the memory limit to below the current usage (runc update on a running container), the kernel returns EBUSY and runc fails with a nice error message:

ERRO[0000] unable to set memory limit to 27033 (current usage: 270336, peak usage: 6082560) 

With cgroup v2, when we do this, the kernel OOM killer just kills the container. This makes the behavior incompatible with cgroup v1.
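To illustrate the difference at the cgroup filesystem level, here is a minimal sketch; the cgroup paths and the limit value are made-up examples, not what runc actually computes:

```go
// Sketch: lowering the memory limit of a running container, cgroup v1 vs v2.
// The cgroup paths below are hypothetical examples.
package main

import (
	"errors"
	"fmt"
	"os"
	"syscall"
)

func main() {
	limit := []byte("27033\n")

	// cgroup v1: the kernel refuses a limit below the current usage with EBUSY.
	err := os.WriteFile("/sys/fs/cgroup/memory/mycontainer/memory.limit_in_bytes", limit, 0o644)
	if errors.Is(err, syscall.EBUSY) {
		fmt.Println("v1: new limit is below current usage, write rejected")
	}

	// cgroup v2: the same write to memory.max succeeds, and the kernel then
	// reclaims memory or OOM-kills processes in the cgroup to enforce it.
	err = os.WriteFile("/sys/fs/cgroup/mycontainer/memory.max", limit, 0o644)
	fmt.Println("v2 write error:", err) // typically nil; container may get OOM killed
}
```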

One (imperfect) workaround is to add a flag to the OCI spec that disallows setting the memory limit to a value lower than the current usage. This is borderline ugly, but at least in most cases we'll return an error instead of letting the container be OOM killed.

(The other, much less serious part of the problem is that when the container disappears in the middle of runc update, we get all sorts of ugly messages.)

@giuseppe
Member

giuseppe commented Jun 15, 2022

could we use memory.high instead of memory.max?
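As a rough, hypothetical sketch of what that suggestion would look like: the update would write the requested value to memory.high, which throttles and reclaims rather than triggering the OOM killer. The cgroup path and helper name here are made up.

```go
// Hypothetical sketch: write the requested limit to memory.high (throttle and
// reclaim) rather than memory.max (hard limit, OOM kill). Path is an example.
package main

import (
	"fmt"
	"os"
)

func setMemoryHigh(cgroupDir string, limit int64) error {
	val := []byte(fmt.Sprintf("%d\n", limit))
	return os.WriteFile(cgroupDir+"/memory.high", val, 0o644)
}

func main() {
	if err := setMemoryHigh("/sys/fs/cgroup/mycontainer", 27033); err != nil {
		fmt.Fprintln(os.Stderr, "failed to set memory.high:", err)
	}
}
```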

@danishprakash
Contributor

danishprakash commented Jun 15, 2022

I don't have a complete understanding at this point, but are we talking about the cgroup memory limit applied at the time of container creation? And if that's the case, is the difference then the fact that in cgroup v2 the kernel isn't returning an EBUSY anymore?

add a flag to OCI spec

And then have runc parse it and fail early instead of the container being OOMKilled?

@mrunalp
Contributor

mrunalp commented Jun 15, 2022

This is when we try to update the memory limit of an already running container to a value that is less than what it is currently using. In v1, we got EBUSY, but in v2, the kernel applies the value, and if it is too low, the container is OOM killed.

@kolyshkin
Contributor Author

could we use memory.high instead of memory.max?

From the vertical pod autoscaler POV -- yes. Meaning, it will still have to distinguish between v1 and v2, which means it does not make sense to add the flag I proposed in the description.

@mrunalp
Contributor

mrunalp commented Jun 17, 2022

could we use memory.high instead of memory.max

I think that will have to be phase 2 with cgroups v2 in k8s. Phase 1 is just a direct mapping to v1.

@utam0k
Member

utam0k commented Aug 25, 2022

Is it possible to get the current memory usage from memory.current and, if the requested limit is lower than that, skip the update and return an error? This may be too much help for an OCI runtime...?
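A rough sketch of that check, assuming a hypothetical helper and cgroup path (memory.current and memory.max are the real cgroup v2 interface files; the rest is illustrative, not runc's actual code):

```go
// Sketch: read memory.current and refuse to set memory.max below it,
// mimicking the cgroup v1 EBUSY behavior in userspace. Paths are examples.
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

func setMemoryMax(cgroupDir string, limit uint64) error {
	data, err := os.ReadFile(cgroupDir + "/memory.current")
	if err != nil {
		return err
	}
	usage, err := strconv.ParseUint(strings.TrimSpace(string(data)), 10, 64)
	if err != nil {
		return err
	}
	if limit < usage {
		return fmt.Errorf("unable to set memory limit to %d (current usage: %d)", limit, usage)
	}
	val := []byte(strconv.FormatUint(limit, 10) + "\n")
	return os.WriteFile(cgroupDir+"/memory.max", val, 0o644)
}

func main() {
	if err := setMemoryMax("/sys/fs/cgroup/mycontainer", 27033); err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}
```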

kolyshkin added a commit to kolyshkin/runtime-spec that referenced this issue Aug 29, 2022
This setting can be used to mimic cgroup v1 behavior on cgroup v2 when setting a new memory limit during an update operation.

In cgroup v1, a limit which is lower than the current usage is rejected.

In cgroup v2, such a low limit causes an OOM kill.

Ref: opencontainers/runc#3509

Signed-off-by: Kir Kolyshkin <[email protected]>
@kamizjw

kamizjw commented Sep 6, 2022

Is there a similar problem with configurations other than memory?

@kolyshkin
Contributor Author

Is there a similar problem with configurations other than memory?

Not that I know of.
