-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix init log forwarding race #2835
Conversation
3a14578
to
d4210f6
Compare
Looks like the "checkpoint --lazy-pages and restore" test case has stuck on Fedora, resulting in a timeout. Haven't seen it before. CI restarted. |
It was probably the VM itself, not the test case that got stuck (seen it elsewhere the same day) |
@cyphar @AkihiroSuda @mrunalp PTAL |
d4210f6
to
897d121
Compare
The last commit was kinda heavy and thus complex to review -- I have split it into three smaller and much more digestible ones. |
@cyphar @AkihiroSuda PTAL |
98fd91f
to
18dcb35
Compare
It's to work around a kernel bug we found (and fixed in 4.10) during the resolution of CVE-2016-9962. Basically it turned out that |
No, I was asking about the code that I have added in this PR, see Anyway, this PR is working correctly, so my interest is purely theoretical. |
logPipeFdStr := os.Getenv("_LIBCONTAINER_LOGPIPE") | ||
logPipeFd, err := strconv.Atoi(logPipeFdStr) | ||
if err != nil { | ||
return fmt.Errorf("unable to convert _LIBCONTAINER_LOGPIPE=%s to int: %s", logPipeFdStr, err) | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively, we can pass logFd as a parameter to StartInitalization
as we just converted it from env in init.go
...
Decided to do it this way since all the other parameters are obtained from environment right here.
It does not make sense to have a string for a numeric type. Signed-off-by: Kir Kolyshkin <[email protected]>
1. Remove redundant "echo $output" from the first test case, as "runc" helper function already logs the output. 2. Show the contents of log.out to stderr, so it case of error we can see what is going on. 3. Remove the check that `log.out` file exists. This check is redundant, since right after it we do `cat log.out` and check its exit code. 4. Factor out common checks into check_debug. Signed-off-by: Kir Kolyshkin <[email protected]>
Error handling is slightly cleaner this way. While at it, do minor refactoring and fix error logging in processEntry. Signed-off-by: Kir Kolyshkin <[email protected]>
This fixes a big red warning in my vim. Signed-off-by: Kir Kolyshkin <[email protected]>
I have noticed that ConfigureLogs do not return an error in case logging was already configured -- instead it just warns about it. So I went ahead and changed the warning to the actual error... ... only to discover I broke things badly, because in case of runc init logging is configured twice. The fix is to not configure logging in case we are init. Signed-off-by: Kir Kolyshkin <[email protected]>
- add check, checkWait, and finish helpers; - move test cleanup to runLogForwarding; - introduce and use log struct. Signed-off-by: Kir Kolyshkin <[email protected]>
Sometimes debug.bats test cases are failing like this: > not ok 27 global --debug to --log --log-format 'json' > # (in test file tests/integration/debug.bats, line 77) > # `[[ "${output}" == *"child process in init()"* ]]' failed It happens more when writing to disk. This issue is caused by the fact that runc spawns log forwarding goroutine (ForwardLogs) but does not wait for it to finish, resulting in missing debug lines from nsexec. ForwardLogs itself, though, never finishes, because it reads from a reading side of a pipe which writing side is not closed. This is especially true in case of runc create, which spawns runc init and exits; meanwhile runc init waits on exec fifo for arbitrarily long time before doing execve. So, to fix the failure described above, we need to: 1. Make runc create/run/exec wait for ForwardLogs to finish; 2. Make runc init close its log pipe file descriptor (i.e. the one which value is passed in _LIBCONTAINER_LOGPIPE environment variable). This is exactly what this commit does: 1. Amend ForwardLogs to return a channel, and wait for it in start(). 2. In runc init, save the log fd and close it as late as possible. PS I have to admit I still do not understand why an explicit close of log pipe fd is required in e.g. (*linuxSetnsInit).Init, right before the execve which (thanks to CLOEXEC) closes the fd anyway. Signed-off-by: Kir Kolyshkin <[email protected]>
First, add runc --debug exec test cases, very similar to those in debug.bats but for runc exec (rather than runc run). Do not include json tests as it is already tested in debug.bats. Second, add logrus debug to late stages of runc init, and amend the integration tests to check for those messages. This serves two purposes: - demonstrate that runc init can be amended with debug logrus which is properly forwarded to and logged by the parent runc create/run/exec; - improve the chances to catch the race fixed by the previous commit. Signed-off-by: Kir Kolyshkin <[email protected]>
18dcb35
to
2726146
Compare
(originally developed in #2487; decided to split for clarity)
Sometimes debug.bats test cases are failing like this (issue #2756):
It happens more often while writing to disk.
The issue is caused by the fact that runc spawns log forwarding goroutine
(
ForwardLogs
) but does not wait for it to finish, resulting in missingdebug lines from nsexec.
ForwardLogs
itself, though, never finishes, because it reads from areading side of a pipe whose writing side is not closed. This is
especially true in case of
runc create
, which spawnsrunc init
andexits; meanwhile
runc init
waits on exec fifo for some undetermined longtime before doing execve.
So, to fix the issue described above, we need to:
Make runc create/run/exec wait for ForwardLogs to finish;
Make runc init close its log pipe file descriptor (i.e.
the one which value is passed in _LIBCONTAINER_LOGPIPE
environment variable).
This is exactly what this commit does. In addition, it
adds logrus debug to late stages of runc init, and amends the
integration tests to check for those messages;
adds
runc --debug exec
test cases, similar to those indebug.bats
but for
runc exec
rather thanrunc run
;refactors libcontainer/logs unit tests to simplify and update for
the new
ForwardLogs
.PS I have to admit I still do not understand why an explicit closing of log pipe
fd is required in e.g.
(*linuxSetnsInit).Init
, right before the execvewhich (thanks to
CLOEXEC
) closes the fd anyway.Fixes: #2756