Resctrl collector fixes #3326
base: master
Conversation
Hi @JulSenko. Thanks for your PR. I'm waiting for a google member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
	monGroupPath = filepath.Join(controlGroupPath, monitoringGroupDir, properContainerName)
	rmErr := os.Remove(monGroupPath)
	if rmErr != nil && !os.IsNotExist(rmErr) {
		return "", fmt.Errorf("couldn't clean up monitoring group matching %q container: %v", containerName, rmErr)
Wrap the error, please.
Is Errorf not enough? Do you have a specific error type in mind? Please let me know how to improve this error.
return "", fmt.Errorf("couldn't clean up monitoring group matching %q container: %v", containerName, rmErr) | |
return "", fmt.Errorf("couldn't clean up monitoring group matching %q container: %w", containerName, rmErr) |
	if existingPath != monGroupPath {
		rmErr = os.Remove(existingPath)
		if rmErr != nil && !os.IsNotExist(rmErr) {
			return "", fmt.Errorf("couldn't clean up monitoring group matching %q container: %v", containerName, rmErr)
Wrap the error, please.
Do you have a specific error type in mind? Please let me know how to improve this error.
return "", fmt.Errorf("couldn't clean up monitoring group matching %q container: %v", containerName, rmErr) | |
return "", fmt.Errorf("couldn't clean up monitoring group matching %q container: %w", containerName, rmErr) |
@@ -233,7 +251,7 @@ func findGroup(group string, pids []string, includeGroup bool, exclusive bool) (
 	for _, path := range availablePaths {
 		groupFound, err := arePIDsInGroup(path, pids, exclusive)
 		if err != nil {
-			return "", err
+			return path, err
Why do you want to return a real value instead of the zero value when you return an error?
This is not ideal, but we need the path value for an additional check later if the error type is errNotEnoughPIDs or errTooManyPIDs. E.g. the group could have changed completely and we need to recreate it, or the path is correct but some threads died / were spawned anew.
If this is supposed to be information about an error, then it should be part of the error struct rather than a value returned from the function. I would expect you to check if err == errNotEnoughPIDs or if err == errTooManyPIDs and act accordingly. The interface that you propose is difficult to grasp, in my opinion.
-	pids, err := getContainerPids()
+	pids, err := getPids(containerName)
Is getContainerPids() used (it does not seem to be)? If not, would it be possible to drop the argument?
It should be, but it touches quite a bunch of definitions. I'll follow up on this if that works for you:)
resctrl/utils.go (Outdated)
	var processThreads []string
	for _, pid := range pids {
		processThreads, err = getAllProcessThreads(filepath.Join(processPath, strconv.Itoa(pid), processTask))
		if err != nil {
			return "", err
		}
	}
Can you explain what the benefits of this approach are? As far as I understand (and I might be wrong) you extracted obtaining the task IDs into a separate loop, but the result will be the same.
Now threads are extracted earlier and passed as a parameter to the findGroup func, which previously accepted pids, resulting in an incomplete list.
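For readers following along, a self-contained sketch (stdlib only; the helper name collectThreads is assumed, this is not the PR's code) of gathering every task ID from /proc/<pid>/task up front, which is the ordering the reply above describes before any group lookup:

	package main

	import (
		"fmt"
		"os"
		"path/filepath"
		"strconv"
	)

	// collectThreads returns the thread IDs (task IDs) of every pid, read
	// straight from procfs, so later checks compare thread lists, not pid lists.
	func collectThreads(processPath string, pids []int) ([]string, error) {
		var threads []string
		for _, pid := range pids {
			taskDir := filepath.Join(processPath, strconv.Itoa(pid), "task")
			entries, err := os.ReadDir(taskDir)
			if err != nil {
				return nil, fmt.Errorf("couldn't read %q: %w", taskDir, err)
			}
			for _, e := range entries {
				threads = append(threads, e.Name())
			}
		}
		return threads, nil
	}

	func main() {
		// Example: threads of the current process.
		threads, err := collectThreads("/proc", []int{os.Getpid()})
		if err != nil {
			fmt.Println(err)
			return
		}
		fmt.Println(threads)
	}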
@JulSenko, can you merge
	if !inHostNamespace {
		processPath = "/rootfs/proc"
	}
|
-	for _, pid := range pids {
-		processThreads, err := getAllProcessThreads(filepath.Join(processPath, pid, processTask))
-		if err != nil {
-			return "", err
-		}
-		for _, thread := range processThreads {
-			err = intelrdt.WriteIntelRdtTasks(monGroupPath, thread)
-			if err != nil {
-				secondError := os.Remove(monGroupPath)
-				if secondError != nil {
-					return "", fmt.Errorf(
-						"coudn't assign pids to %q container monitoring group: %w \n couldn't clear %q monitoring group: %v",
-						containerName, err, containerName, secondError)
-				}
-				return "", fmt.Errorf("coudn't assign pids to %q container monitoring group: %w", containerName, err)
+	for _, thread := range processThreads {
+		treadInt, err := strconv.Atoi(thread)
+		if err != nil {
+			return "", fmt.Errorf("couldn't parse %q: %w", thread, err)
+		}
+		err = intelrdt.WriteIntelRdtTasks(monGroupPath, treadInt)
+		if err != nil {
+			secondError := os.Remove(monGroupPath)
+			if secondError != nil {
+				return "", fmt.Errorf(
+					"coudn't assign pids to %q container monitoring group: %w \n couldn't clear %q monitoring group: %v",
+					containerName, err, containerName, secondError)
+			}
+			return "", fmt.Errorf("coudn't assign pids to %q container monitoring group: %w", containerName, err)
Correct me if I'm wrong but:
- The Old Way:
  - Iterate over all pids.
  - Fetch all the tids for a pid.
  - Iterate over all tids.
  - Write the tid to monitoring group.
  - Exit tids loop.
  - Exit pids loop.
- The New Way:
  - Iterate over all pids.
  - Save all the tids for a pid to a variable.
  - Exit pids loop.
  - Iterate over all tids.
  - Write the tid to monitoring group.
  - Exit tids loop.
Functionally these two approaches are identical. In the PR description you wrote:
"Collect threads instead of pids. During collector startup we collect threads, so for the following checks we should also collect threads, not pids. Pid approach works on relatively simple processes, but for any complex one the pid list will never be the same as the original thread list, thus resulting in err on every collector run."
but threads (tids) have always been collected and written to the monitoring group.
I might be missing something, but I can't understand where the bug that you are trying to fix is. I would appreciate a more detailed explanation to help me understand your reasoning. A test case failing with The Old Way and passing with The New Way would be perfect!
/ok-to-test
Resctrl collector fixes:
- enabled=true was missing.
- libcontainer funcs to collect pids/threads instead of executing a ps call, which is very expensive.
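On the last bullet, a hedged sketch of reading pids straight from cgroupfs with a libcontainer helper instead of exec'ing ps; the cgroup path is an invented example and cgroups.GetAllPids is one such helper from runc's libcontainer, which may differ from the exact function this PR ends up using.

	package main

	import (
		"fmt"

		"github.com/opencontainers/runc/libcontainer/cgroups"
	)

	func main() {
		// Walk the cgroup hierarchy and read cgroup.procs files directly,
		// avoiding the cost of spawning a `ps` process on every collection.
		pids, err := cgroups.GetAllPids("/sys/fs/cgroup/system.slice/example.scope") // example path
		if err != nil {
			fmt.Println(err)
			return
		}
		fmt.Println(pids)
	}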