fix: audit and fix cgroup reservations #9341

Merged
1 commit merged into siderolabs:main from the fix/cgroups-reservation branch, Sep 20, 2024

smira (Member) commented Sep 19, 2024

Fixes: #7081

Review all reservations and limits that are set, and test them under stress load (both memory and CPU).

The goal: system components (Talos itself) and the runtime (kubelet, CRI) should survive extreme resource starvation (workloads consuming all CPU and memory).

Uses #9337 to visualize changes, but doesn't depend on it.

@smira smira added this to the v1.9 milestone Sep 19, 2024
@smira smira force-pushed the fix/cgroups-reservation branch 2 times, most recently from ed43698 to 1a2652d September 20, 2024 16:05
@smira smira marked this pull request as ready for review September 20, 2024 16:07
smira (Member Author) commented Sep 20, 2024

Controlplane:

NAME                                                                              MemCurrent   MemPeak    MemLow     Peak/Low   MemHigh    MemMin     Current/Min   MemMax     CpuWeight   CpuNice   CpuMax
.                                                                                    unset        unset      unset    unset%       unset      unset    unset%          unset    unset       unset    []
├──init                                                                            130 MiB      136 MiB    192 MiB    70.99%         max     96 MiB   135.93%            max       79           1    [   max 100000]
├──kubepods                                                                        658 MiB      674 MiB        0 B      max%         max        0 B      max%        1.5 GiB       77           1    [   max 100000]
│   ├──besteffort                                                                   68 MiB       73 MiB        0 B      max%         max        0 B      max%            max        1          19    [   max 100000]
│   │   └──podf1f58515-b3d0-4732-8d57-a1d2d6eded15                                  68 MiB       73 MiB        0 B      max%         max        0 B      max%            max        1          19    [   max 100000]
│   │       ├──17d1cef028c6597bf26ad0e52cf3a839eec909ac1170080d135a31e909e10bc2     68 MiB       73 MiB        0 B      max%         max        0 B      max%            max        1          19    [   max 100000]
│   │       └──2373b772e588345784d51c86493262c6bd316d37333c773713ed893b3b9a0989    184 KiB      2.0 MiB        0 B      max%         max        0 B      max%            max        1          19    [   max 100000]
│   └──burstable                                                                   590 MiB      606 MiB        0 B      max%         max        0 B      max%            max       18           8    [   max 100000]
│       ├──pod32c6547b9d144ffac5df87f44fe94431                                      71 MiB       74 MiB        0 B      max%         max        0 B      max%            max        1          19    [   max 100000]
│       │   ├──77c5d8300fed810e3e3a7f4d3d934bf988622e4a39192938d9b38507aa57f54f     71 MiB       74 MiB        0 B      max%         max        0 B      max%            max        1          19    [   max 100000]
│       │   └──e23d3a6faf8e84100d6671200d4a146458175d68a0dc7ca599726eea31f6e71b    220 KiB      3.3 MiB        0 B      max%         max        0 B      max%            max        1          19    [   max 100000]
│       ├──pod5261417e-9670-4faf-8962-e678cd828090                                  47 MiB       50 MiB        0 B      max%         max        0 B      max%            max        4          14    [   max 100000]
│       │   ├──3cacde9de64a796592c63940658616ee598cff7c341753663d268e09b88d4456     46 MiB       48 MiB        0 B      max%         max        0 B      max%            max        4          14    [   max 100000]
│       │   └──ac59c6ae5bb59b10e2b709bc8f2ded3bd66fbd68e9acf9c09f4fa47bc529060e    212 KiB      3.1 MiB        0 B      max%         max        0 B      max%            max        1          19    [   max 100000]
│       ├──podb5baa3da5406ac99f9ab6c1988aedeb2                                     335 MiB      347 MiB        0 B      max%         max        0 B      max%            max        8          11    [   max 100000]
│       │   ├──45edc4a93966060042610ffa5e6001c951c1a92d4f5ee123e88fe4cb49df5ba5    334 MiB      346 MiB        0 B      max%         max        0 B      max%            max        8          11    [   max 100000]
│       │   └──57e5a4d6d58ee94629bb083f550a09dbe918f2dc48759eec671fe8c3146551e8    908 KiB      3.8 MiB        0 B      max%         max        0 B      max%            max        1          19    [   max 100000]
│       ├──podb95bf81a197c195dc9c8217a98d7edbd                                      79 MiB       82 MiB        0 B      max%         max        0 B      max%            max        2          18    [   max 100000]
│       │   ├──3a255a894710a279530dbbb12d0126dab43fa4539d7d56f338e15b8d23298d14     79 MiB       82 MiB        0 B      max%         max        0 B      max%            max        2          18    [   max 100000]
│       │   └──6b5057351e4527dbf78879f2cd1fbea8cf494dcde8440945ca9d2861672444da    216 KiB      3.3 MiB        0 B      max%         max        0 B      max%            max        1          19    [   max 100000]
│       └──podfcd5e65a-dafa-4a74-91c9-2d6a1702b69a                                  58 MiB       58 MiB        0 B      max%         max        0 B      max%        170 MiB        4          14    [   max 100000]
│           ├──0328ca2caf91426cde6d098cf1bb1e334ba50ad800fd2a1ac39dbfaece947bd8    220 KiB      3.5 MiB        0 B      max%         max        0 B      max%            max        1          19    [   max 100000]
│           └──ff55a6c8557732b49cbe9e014e34283316d427be1dfa09b406e5c275de43aec6     57 MiB       58 MiB        0 B      max%         max        0 B      max%        170 MiB        4          14    [   max 100000]
├──podruntime                                                                      554 MiB      579 MiB        0 B      max%         max        0 B      max%            max       79           1    [   max 100000]
│   ├──etcd                                                                        365 MiB      391 MiB    256 MiB   152.57%         max        0 B      max%            max       79           1    [   max 100000]
│   ├──kubelet                                                                     105 MiB      106 MiB    192 MiB    55.23%         max     96 MiB   109.63%            max       39           4    [   max 100000]
│   └──runtime                                                                      83 MiB       83 MiB    512 MiB    16.28%         max    256 MiB    32.34%            max       39           4    [   max 100000]
└──system                                                                          237 MiB      238 MiB    192 MiB   123.97%         max     96 MiB   246.54%            max       59           2    [   max 100000]
    ├──apid                                                                         34 MiB       35 MiB     32 MiB   108.52%         max     16 MiB   211.45%         40 MiB       20           7    [   max 100000]
    ├──dashboard                                                                   112 MiB      112 MiB        0 B      max%         max        0 B      max%        196 MiB        8          11    [   max 100000]
    ├──runtime                                                                      74 MiB       78 MiB     96 MiB    81.16%         max     48 MiB   154.97%            max       20           7    [   max 100000]
    ├──trustd                                                                       10 MiB       11 MiB     16 MiB    69.07%         max    8.0 MiB   126.71%         24 MiB       10          10    [   max 100000]
    └──udevd                                                                       6.8 MiB       14 MiB     16 MiB    87.87%         max    8.0 MiB    84.86%            max       10          10    [   max 100000]

smira (Member Author) commented Sep 20, 2024

Worker:

NAME                                                                              MemCurrent   MemPeak    MemLow     Peak/Low   MemHigh    MemMin     Current/Min   MemMax     CpuWeight   CpuNice   CpuMax
.                                                                                    unset        unset      unset    unset%       unset      unset    unset%          unset    unset       unset    []
├──init                                                                            103 MiB      128 MiB    192 MiB    66.80%         max     96 MiB   107.17%            max       79           1    [   max 100000]
├──kubepods                                                                         67 MiB      2.2 GiB        0 B      max%         max        0 B      max%        2.4 GiB      155          -2    [   max 100000]
│   ├──besteffort                                                                   36 MiB      2.2 GiB        0 B      max%         max        0 B      max%            max        1          19    [   max 100000]
│   │   └──pod6c6eece8-04cd-401f-bf7f-9ddf6ef2fa69                                  33 MiB       73 MiB        0 B      max%         max        0 B      max%            max        1          19    [   max 100000]
│   │       ├──47facd76839a9bebb74f86edfae821ef84911d8c752b5ab336e3eac1804a545d     33 MiB       72 MiB        0 B      max%         max        0 B      max%            max        1          19    [   max 100000]
│   │       └──a590bdc89f065055c5a7006db5fe27787822d4da4e037c2bfa841df50a6bc34a    160 KiB      2.2 MiB        0 B      max%         max        0 B      max%            max        1          19    [   max 100000]
│   └──burstable                                                                    30 MiB       46 MiB        0 B      max%         max        0 B      max%            max        4          14    [   max 100000]
│       └──podfff3d464-0b0c-4adc-a5f1-d2f0017537a1                                  30 MiB       46 MiB        0 B      max%         max        0 B      max%            max        4          14    [   max 100000]
│           ├──752089319f548c1d38d75c8d97abe0049cc2d448e28994d7049ee3fce7bf51d7     30 MiB       45 MiB        0 B      max%         max        0 B      max%            max        4          14    [   max 100000]
│           └──76ea1925687b151ff1b3f0d9cf59c5485e4bba862b110614b8f8572fba3f24f5    188 KiB      3.2 MiB        0 B      max%         max        0 B      max%            max        1          19    [   max 100000]
├──podruntime                                                                      169 MiB      332 MiB        0 B      max%         max        0 B      max%            max      157          -2    [   max 100000]
│   ├──kubelet                                                                      82 MiB      101 MiB    192 MiB    52.70%         max     96 MiB    85.88%            max       39           4    [   max 100000]
│   └──runtime                                                                      86 MiB      240 MiB    392 MiB    61.21%         max    196 MiB    43.94%            max       39           4    [   max 100000]
└──system                                                                          110 MiB      199 MiB    192 MiB   103.76%         max     96 MiB   114.68%            max       59           2    [   max 100000]
    ├──apid                                                                         18 MiB       25 MiB     32 MiB    78.49%         max     16 MiB   114.09%         40 MiB       20           7    [   max 100000]
    ├──dashboard                                                                    55 MiB      112 MiB        0 B      max%         max        0 B      max%        196 MiB        8          11    [   max 100000]
    ├──runtime                                                                      31 MiB       59 MiB     96 MiB    61.46%         max     48 MiB    64.66%            max       20           7    [   max 100000]
    ├──trustd                                                                          0 B          0 B     16 MiB     0.00%         max    8.0 MiB     0.00%         24 MiB       10          10    [   max 100000]
    └──udevd                                                                       6.0 MiB       14 MiB     16 MiB    87.87%         max    8.0 MiB    74.76%            max       10          10    [   max 100000]

Unix4ever (Member) left a comment:

🆒

Fixes: siderolabs#7081

Review all reservations and limits set, test under stress load (using
both memory and CPU).

The goal: system components (Talos itself) and runtime (kubelet, CRI)
should survive under extreme resource starvation (workloads consuming
all CPU/memory).

Uses siderolabs#9337 to visualize changes, but doesn't depend on it.

Signed-off-by: Andrey Smirnov <[email protected]>
smira (Member Author) commented Sep 20, 2024

/m

talos-bot merged commit 6b15ca1 into siderolabs:main Sep 20, 2024 (50 checks passed)
smira mentioned this pull request Sep 21, 2024
Project status: Backported

Successfully merging this pull request may close these issues:
Audit the cgroups in Talos and resource reservation

4 participants