Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for Linux cgrpoup v2, issue-4885 #5068

Merged
merged 34 commits into from
Apr 16, 2024
Merged

Conversation

nezdali
Copy link
Contributor

@nezdali nezdali commented Mar 26, 2024

Fixes #4885

  • Introduced ILinuxUtilizationParser interface to be able to choose between parsers during DI
  • Introduced new class for cgroup v2 specifics LinuxUtilizationParserCgroupV2
  • GetHostCpuUsageInNanoseconds() parses /sys/fs/cgroup/cpu.stat file and calculates cpu usage in microseconds
  • TryGetCpuUnitsFromCgroups() parses cpu quota and period from a single file /sys/fs/cgroup/cpu.max
  • TryGetCgroupRequestCpu() parses /sys/fs/cgroup/cpu.weight and calculates from cpu weight to shares (1024 share == 1 CPU core) in order to get the Pod's cpu request limit.
  • Introduced new method GetMemoryUsageInBytesFromSlices() to calculate memory stats from slices (cgroup v2 specific)
  • Introduced an overload in SystemResources struct where new metric is added guaranteedPodCpuUnits, which get cpu request limit (as per k8s terminology)
Microsoft Reviewers: Open in CodeFlow

@mobratil
Copy link
Contributor

mobratil commented Mar 27, 2024

@nezdali , I deployed your change into my own cluster that has cgroupsv2 and printed SystemResources object.

The cluster have following resources in yaml:
          resources:
            requests:
              cpu: "300m"
              memory: "1.5Gi"
            limits:
              cpu: "900m"
              memory: "3.5Gi"

The printed values are following.

  System Resources: 
          Requests:
                  CPU = 309m,			(based on GuaranteedPodCpuUnits)
                  Memory = 3.5Gi,		(based on GuaranteedMemoryInBytes)
          Limits:
                  CPU = 900m,			(based on MaximumCpuUnits)
                  Memory: 15.61508560180664Gi	(based on MaximumMemoryInBytes)

While the requests for CPU corresponds to YAML specification, the memory seems to be not accurate.
Memory limit (MaximumMemoryInBytes) seems to be showing the capacity of the host machine instead of the pod limit.
The memory request (GuaranteedMemoryInBytes) contains the pod limit.

@nezdali
Copy link
Contributor Author

nezdali commented Mar 28, 2024

After more research it turns out that GuaranteedMemoryInBytes is the Resource Memory Limit (in k8s terms) which is read from /sys/fs/cgroup/memory.max
And currently there is no way to get the Resource Memory Request. It is always 0 in /sys/fs/cgroup/memory.min , for cgroupv2 container, VM (AzureLinux, Ubuntu)
K8s documentation on how memory.min populated https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/2570-memory-qos/#readme

@nezdali nezdali marked this pull request as ready for review April 12, 2024 20:55
@dotnet-policy-service dotnet-policy-service bot added the waiting-author-feedback 📭 The author of this issue needs to respond in order for us to continue investigating this issue. label Apr 15, 2024
@@ -14,8 +14,10 @@ namespace Microsoft.Extensions.Diagnostics.ResourceMonitoring.Linux;
/// This class is not thread safe.
/// When the same instance is called by multiple threads it may return corrupted data.
/// </remarks>
internal sealed class LinuxUtilizationParser
internal sealed class LinuxUtilizationParser : ILinuxUtilizationParser
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this class be renamed to LinuxUtilizationParserCgroupV1 then?

Copy link
Contributor Author

@nezdali nezdali Apr 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually both parsers do not parse only cgroup fs at /sys/fs/cgroup , but also /proc , but the main purpose is to parse the cgroup so we rename if that makes code more readable

…ing/Linux/IFileSystem.cs

Co-authored-by: Nikita Balabaev <[email protected]>
@dotnet-policy-service dotnet-policy-service bot removed the waiting-author-feedback 📭 The author of this issue needs to respond in order for us to continue investigating this issue. label Apr 15, 2024
@nezdali nezdali closed this Apr 16, 2024
@mobratil mobratil reopened this Apr 16, 2024
@mobratil mobratil enabled auto-merge (squash) April 16, 2024 12:00
/// Get directory names on the filesystem based on the provided pattern.
/// </summary>
/// <returns>string.</returns>
string[] GetDirectoryNames(string directory, string pattern);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even though this is an internal class I'd caution against using mutable arrays as return values. This could lead to some gnarly hard to track bugs.
Could this be changed to something like IList<string> or IEnumerable<string>?

Copy link
Contributor Author

@nezdali nezdali Apr 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, will do the changes in another PR since this one is merged already.

@mobratil mobratil merged commit 39b6a31 into dotnet:main Apr 16, 2024
6 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators May 17, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Lack of cgroups v2 Support in Microsoft.Extensions.Diagnostics.ResourceMonitoring
5 participants