-
Notifications
You must be signed in to change notification settings - Fork 18.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix flaky TestInspect #42251
Merged
Merged
Fix flaky TestInspect #42251
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This test has been flaky for a long time, failing with: --- FAIL: TestInspect (12.04s) inspect_test.go:39: timeout hit after 10s: waiting for tasks to enter run state. task failed with error: task: non-zero exit (1) While looking through logs, noticed tasks were started, entering RUNNING stage, and then exited, to be started again. state.transition="STARTING->RUNNING" ... msg="fatal task error" error="task: non-zero exit (1)" ... state.transition="RUNNING->FAILED" Looking for possible reasons, first considering network issues (possibly we ran out of IP addresses or networking not cleaned up), then I spotted the issue. The service is started with; Command: []string{"/bin/top"}, Args: []string{"-u", "root"}, The `-u root` is not an argument for the service, but for `/bin/top`. While the Ubuntu/Debian/GNU version `top` has a -u/-U option; docker run --rm ubuntu:20.04 top -h 2>&1 | grep '\-u' top -hv | -bcEHiOSs1 -d secs -n max -u|U user -p pid(s) -o field -w [cols] The *busybox* version of top does not: docker run --rm busybox top --help 2>&1 | grep '\-u' So running `top -u root` would cause the task to fail; docker run --rm busybox top -u root top: invalid option -- u ... echo $? 1 As a result, the service went into a crash-loop, and because the `poll.WaitOn()` was running with a short interval, in many cases would _just_ find the RUNNING state, perform the `service inspect`, and pass, but in other cases, it would not be that lucky, and continue polling untill we reached the 10 seconds timeout, and mark the test as failed. Looking for history of this option (was it previously using a different image?) I found this was added in 6cd6d86, but probably just missed during review. Given that the option is only set to have "something" to inspect, I replaced the `-u root` with `-d 5`, which makes top refresh with a 5 second interval. Note that there is another test (`TestServiceListWithStatuses) that uses the same spec, however, that test is skipped based on API version of the test-daemon, and (to be looked into), when performing that check, no API version is known, causing the test to (always?) be skipped: === RUN TestServiceListWithStatuses --- SKIP: TestServiceListWithStatuses (0.00s) list_test.go:34: versions.LessThan(testEnv.DaemonInfo.ServerVersion, "1.41") Signed-off-by: Sebastiaan van Stijn <[email protected]>
thaJeztah
added
status/2-code-review
area/testing
process/cherry-picked
kind/bugfix
PR's that fix bugs
labels
Apr 3, 2021
tiborvass
approved these changes
Apr 3, 2021
@thaJeztah our hero. TBH not sure if it has to be |
tianon
approved these changes
Apr 5, 2021
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😄
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
(hopefully) fixes #40651
This test has been flaky for a long time, failing with:
While looking through logs, noticed tasks were started, entering RUNNING stage,
and then exited, to be started again.
Looking for possible reasons, first considering network issues (possibly we ran
out of IP addresses or networking not cleaned up), then I spotted the issue.
The service is started with;
The
-u root
is not an argument for the service, but for/bin/top
. While theUbuntu/Debian/GNU version
top
has a -u/-U option;The busybox version of top does not:
So running
top -u root
would cause the task to fail;As a result, the service went into a crash-loop, and because the
poll.WaitOn()
was running with a short interval, in many cases would just find the RUNNING
state, perform the
service inspect
, and pass, but in other cases, it would notbe that lucky, and continue polling untill we reached the 10 seconds timeout,
and mark the test as failed.
Looking for history of this option (was it previously using a different image?) I
found this was added in 6cd6d86 (#33684), but probably
just missed during review.
Given that the option is only set to have "something" to inspect, I replaced
the
-u root
with-d 5
, which makes top refresh with a 5 second interval.Note that there is another test (`TestServiceListWithStatuses) that uses the same
spec, however, that test is skipped based on API version of the test-daemon, and
(to be looked into), when performing that check, no API version is known, causing
the test to (always?) be skipped:
- A picture of a cute animal (not mandatory but encouraged)