Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

check memory limits for cgroupv1 #27

Closed
wants to merge 1 commit into from

Conversation

simonbyrne
Copy link

Seems to fix issue with JuliaLang/julia#46796 (comment)

I admit I know very little about cgroups, or C programming, so this may well be incorrect, but it was able to correctly determine the cgroup limits on our Slurm cluster

high = uv__read_uint64(filename);
} else {
p = strchr(buf, ':');
while (p != NULL && memcmp(p, ":memory:", 8)) {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh wait, I realise this isn't how cgroups work...

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I think it is correct? But it would be good for someone who understands these things to check.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I take it slurm creates a cgroup subtree? On my cgroupv1 system, the limit sits at /sys/fs/cgroup/memory directly. It'd be useful to add a comment here about what you're doing, that's easier to review than to parse C code 🙂

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that appears to be how cgroupv1 works, e.g. https://unix.stackexchange.com/a/621576

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apparently we're still missing something though, as on my cgroup1 system with Docker, I can't access even the memory subgroup listed in /proc/self/cgroup.

@maleadt
Copy link
Member

maleadt commented Oct 8, 2022

The fallbacks were already intended to address cgroupv1. What is wrong with those?

@simonbyrne
Copy link
Author

simonbyrne commented Oct 8, 2022

It isn't giving the correct result on our Slurm cluster. If I do:

├ srun -t 01:10:00 --mem 2G --pty bash -l # start job with 2G of memory

├ cat /sys/fs/cgroup/memory/memory.limit_in_bytes # what is read by uv__get_constrained_memory_fallback
9223372036854771712

├ cat /proc/self/cgroup # cgroups of current process
11:blkio:/
10:memory:/slurm/uid_5184/job_28925944/step_0
9:freezer:/slurm/uid_5184/job_28925944/step_0
8:pids:/
7:cpuset:/slurm/uid_5184/job_28925944/step_0
6:net_prio,net_cls:/
5:perf_event:/
4:hugetlb:/
3:devices:/slurm/uid_5184/job_28925944/step_0/task_0
2:cpuacct,cpu:/
1:name=systemd:/system.slice/slurmd.service

├ cat /sys/fs/cgroup/memory/slurm/uid_5184/job_28925944/step_0/memory.limit_in_bytes # current memory controller cgroup
2147483648

@simonbyrne
Copy link
Author

Any thoughts on this? I'd really like to get this into 1.9 (or ideally backported to 1.8).

Copy link
Member

@maleadt maleadt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I'm sure upstream will have more comments though, so could you create a PR there once I have our pending changes merged?

high = uv__read_uint64(filename);
} else {
p = strchr(buf, ':');
while (p != NULL && memcmp(p, ":memory:", 8)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I take it slurm creates a cgroup subtree? On my cgroupv1 system, the limit sits at /sys/fs/cgroup/memory directly. It'd be useful to add a comment here about what you're doing, that's easier to review than to parse C code 🙂

max = uv__read_uint64(filename);
snprintf(filename, sizeof(filename), "/sys/fs/cgroup/memory/%s/memory.soft_limit_in_bytes", p);
high = uv__read_uint64(filename);
}

if (max == 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if (max == 0 || high == 0)

@maleadt
Copy link
Member

maleadt commented Oct 12, 2022

Actually, I'll try to upstream it, since I'm working on that right now anyway.

@simonbyrne
Copy link
Author

Actually, I'll try to upstream it, since I'm working on that right now anyway.

Thanks, please let me know if I can do anything to help.

@maleadt
Copy link
Member

maleadt commented Oct 26, 2022

I updated your PR to still correctly read out the limits I'm seeing (ref #27 (comment)) and pushed that to libuv#3754. Can you verify that works with your SLURM system?

@maleadt
Copy link
Member

maleadt commented Nov 25, 2022

Superseded by #32

@maleadt maleadt closed this Nov 25, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants