Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

os/exec: Inconsistent behaviour in exec.Cmd.Wait #20730

Closed
eZanmoto opened this issue Jun 19, 2017 · 3 comments
Closed

os/exec: Inconsistent behaviour in exec.Cmd.Wait #20730

eZanmoto opened this issue Jun 19, 2017 · 3 comments

Comments

@eZanmoto
Copy link

What version of Go are you using (go version)?

go version go1.8.3 linux/amd64

Note that I'm running Go using Docker for reproducibility (see below).

What operating system and processor architecture are you using (go env)?

GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/go"
GORACE=""
GOROOT="/usr/local/go"
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build270338682=/tmp/go-build -gno-record-gcc-switches"
CXX="g++"
CGO_ENABLED="1"
PKG_CONFIG="pkg-config"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"

What did you do?

I created main.go, as follows:

package main

import (
    "fmt"
    "io/ioutil"
    "os"
    "os/exec"
    "time"
)

func main() {
    cmd := exec.Command("git", "daemon")
    cmd.Stdout = ioutil.Discard
    cmd.Stderr = ioutil.Discard
    // cmd.Stdout = os.Stdout
    // cmd.Stderr = os.Stderr

    if err := cmd.Start(); err != nil {
        die("%v", err)
    }

    fmt.Println("Waiting for process to start...")
    time.Sleep(time.Second*2)

    wait := make(chan struct{})
    go func() {
        fmt.Println("Calling `cmd.Wait`...")
        err := cmd.Wait()
        fmt.Println("Process exited:", err)
        close(wait)
        fmt.Println("Broadcasted wait signal")
    }()

    fmt.Println("Waiting for `cmd.Wait` to be called...")
    time.Sleep(time.Second*2)

    fmt.Println("Sending kill signal to process...")
    proc := cmd.Process
    if err := proc.Kill(); err != nil {
        die("%v", err)
    }

    fmt.Println("Waiting for wait signal or timeout...")
    time.Sleep(time.Second*2)
    select {
    case <-wait:
        fmt.Println("Got wait signal")
    case <-time.After(time.Second*5):
        fmt.Println("Timeout.")
        fmt.Println("Checking process state (", cmd.ProcessState, ")")
        if cmd.ProcessState != nil {
            fmt.Println("Confirmed process exited.")
            fmt.Println("Retrying for wait signal or timeout...")
            select {
            case <-wait:
                fmt.Println("Got wait signal")
            case <-time.After(time.Second*5):
                fmt.Println("Timeout.")
            }
        }
    }

    out, err := exec.Command("bash", "-c", "ps aux | grep git").CombinedOutput()
    if err != nil {
        die("%v", err)
    }
    fmt.Println(string(out))
}

func die(msg string, args ...interface{}) {
    fmt.Fprintf(os.Stderr, msg+"\n", args...)
    os.Exit(1)
}

I ran it with the following Docker command:

docker run --rm -v "$PWD":/usr/src/myapp -w /usr/src/myapp golang:1.8 \
    go run main.go

I then uncommented the commented lines and ran the same command again.

What did you expect to see?

I expected the output of both invocations to match.

What did you see instead?

Running the program with output redirected to ioutil.Discard produced the
following output:

Waiting for process to start...
Waiting for `cmd.Wait` to be called...
Calling `cmd.Wait`...
Sending kill signal to process...
Waiting for wait signal or timeout...
Timeout.
Checking process state ( signal: killed )
Confirmed process exited.
Retrying for wait signal or timeout...
Timeout.
root        27  0.0  0.1  16268  1992 ?        S    17:21   0:00 git-daemon
root        29  0.0  0.2  20040  2748 ?        S    17:22   0:00 bash -c ps aux | grep git
root        31  0.0  0.0  11128   992 ?        S    17:22   0:00 grep git

Running the program with output redirected to io.Stdout and io.Stderr
produced the following output:

Waiting for process to start...
Waiting for `cmd.Wait` to be called...
Calling `cmd.Wait`...
Sending kill signal to process...
Waiting for wait signal or timeout...
Process exited: signal: killed
Broadcasted wait signal
Got wait signal
root        26  0.0  0.1  16268  1960 ?        S    17:21   0:00 git-daemon
root        28  0.0  0.2  20040  2752 ?        S    17:21   0:00 bash -c ps aux | grep git
root        31  0.0  0.0  11128   980 ?        S    17:21   0:00 grep git

The core issue in the above is that cmd.Wait() doesn't return when the command
exits in the case where ioutil.Discard is being used.

Additional notes

I consider the second invocation to have the correct and expected output. This
opinion is backed up as it conforms with the operation of os.Process; a quick
test of this is to replace cmd.Wait() in the above code with
cmd.Process.Wait() in both invocations and see that the output matches that of
the second invocation above.

The problem seems related to the child processes spawned by git daemon. This
can be further demonstrated by instead spawning go run hello.go with different
implementations of hello.go, such as the following:

package main

import "fmt"

func main() {
    fmt.Println("Hello, world!")
}

main.go runs without producing a timeout with this implementation of
hello.go. Adding a time.Sleep(time.Second*5) before the Println in this
implementation also causes no timeout. However, if the Println is instead put
in an infinite for loop then a timeout occurs.

It appears that exec.Cmd.Wait currently requires all processes in the process
tree of the spawned process to exit before it can return. I suggest either
updating the behaviour to match that of os.Process, or else updating the
documentation to reflect this behaviour.

@ianlancetaylor
Copy link
Contributor

I think this is another variant of #18874. I don't know how to fix that one, or this one either.

There is no reason to set the Stdout and Stderr fields to ioutil.Discard, so it might help if you explain what you are really trying to do.

@eZanmoto
Copy link
Author

@ianlancetaylor
I just used ioutil.Discard for conciseness of the MWE, the same behaviour results from using bytes.Buffers like I normally do.

The problem, in my opinion, is that I expected exec.Cmd.Wait to function like os.Process.Wait and return when the process exited. However, based on #18874, exec.Cmd is additionally "bound" to the FDs it passed to the process, not just its PID, and so cannot exit until they are closed. My proposed solution is to update the exec.Cmd.Wait documentation to reflect this, in a similar way to the current documentation concerning c.Stdin. I would also update the initial line to indicate that Wait doesn't just wait for the command to exit, but may also wait for other events (to be detailed in the body of the documentation) to transpire, which could cause Wait to return later.

@ianlancetaylor
Copy link
Contributor

Let's move discussion to #18874. Thanks.

zwass added a commit to kolide/launcher that referenced this issue Oct 7, 2017
- Correctly detect when error channel is closed (potential fix for #134).
  Previously the logic was inverted for whether the channel was closed, so
  recovery was not initiated. Unit test TestOsqueryDies repros the suspected
  issue.
- Allow logger to be set properly.
- Add logging around recovery scenarios.
- Check communication with both osquery and extension server in health check
  (previously only the extension server was checked).
- Add healthcheck on interval that causes recovery on failure (Closes #141).
- Do not set cmd output to ioutil.Discard. Causes a bug with cmd.Wait (see
  golang/go#20730)
zwass added a commit to kolide/launcher that referenced this issue Oct 12, 2017
- Correctly detect when error channel is closed (potential fix for #134).
  Previously the logic was inverted for whether the channel was closed, so
  recovery was not initiated. Unit test TestOsqueryDies repros the suspected
  issue.
- Allow logger to be set properly.
- Add logging around recovery scenarios.
- Check communication with both osquery and extension server in health check
  (previously only the extension server was checked).
- Add healthcheck on interval that causes recovery on failure (Closes #141).
- Do not set cmd output to ioutil.Discard. Causes a bug with cmd.Wait (see
  golang/go#20730)
marpaia pushed a commit to kolide/launcher that referenced this issue Oct 17, 2017
- Changes to public API to better reflect actual usage and ease implementation.
- Use errgroup for coordination of process management/cleanup. This helps
  prevent leaking of goroutines (relative to existing implementation).
- Fix bug in which osquery process was not restarted after failure.
- Allow logger to be set properly.
- Add logging around recovery scenarios.
- Check communication with both osquery and extension server in health check
  (previously only the extension server was checked).
- Add healthcheck on interval that initiates recovery on failure (Closes #141).
- Do not set cmd output to ioutil.Discard. Causes a bug with cmd.Wait (see
  golang/go#20730).
@golang golang locked and limited conversation to collaborators Jun 20, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

3 participants