freezer: add delay after freeze #2941
libcontainer/cgroups/fs/freezer.go
Outdated
@@ -65,6 +65,10 @@ func (s *FreezerGroup) Set(path string, r *configs.Resources) (Err error) {
return err
}

if i%25 == 24 {
	// A short sleep before reading back also helps.
Could you update the comment to clarify what this helps with?
There's a big comment above telling the whole story...
Ah, I just found that it now contradicts what I say here. Will fix.
Comments rewritten; PTAL @AkihiroSuda
This is from a "good" VM (once you start testing, GHA gives you a good VM...). Out of 400 runs, only 147 needed more than 1 retry. Among those that needed a retry, there are peaks at 24 and 49, meaning the sleep helps.
LGTM
I hate to keep adding those kludges, but lately TestFreeze (and TestSystemdFreeze) from libcontainer/integration fails a lot.
The failure comes and goes, and is probably caused by a slow host allocated for the test, and a slow VM on top of it.
To remediate, add a small sleep on every 25th iteration in between asking the kernel to freeze and checking its status.
In the worst case scenario (failure to freeze), this adds about 0.4 ms (40 x 10 us) to the duration of the call.
It is hard to measure how this affects CI, as GHA plays roulette when allocating a node to run the test on, but it seems to help. With additional debug info, I saw somewhat frequent "frozen after 24 retries" or "frozen after 49 retries", meaning it succeeded right after the added sleep.
While at it, rewrite/improve the comments.
Signed-off-by: Kir Kolyshkin <[email protected]>
b787703 to 524abc5
I hate to keep adding those kludges (for earlier ones, see #2918, #2791, #2774),
but lately TestFreeze (and TestSystemdFreeze) from libcontainer/integration
fails a lot (see #2907).
The failure comes and goes, and is probably caused by a slow host
allocated for the test, and a slow VM on top of it.
To remediate, add a small sleep on every 25th iteration in between
asking the kernel to freeze and checking its status.
In the worst case scenario (failure to freeze), this adds about 0.4 ms
(40 x 10 us) to the duration of the call.
It is hard to measure how this affects CI, as GHA plays roulette when
allocating a node to run the test on, but it seems to help. With
additional debug info, I saw somewhat frequent "frozen after 24 retries"
or "frozen after 49 retries", meaning it succeeded right after the added
sleep.
While at it, rewrite/improve the comments.
Fixes: #2907.