Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

remove running containers in case of a sigterm #8306

Merged
merged 1 commit into from
Apr 7, 2016

Conversation

bparees
Copy link
Contributor

@bparees bparees commented Mar 30, 2016

No description provided.

@bparees
Copy link
Contributor Author

bparees commented Mar 30, 2016

@rhcarvalho @smarterclayton ptal. part of #8266 (comment)

@@ -156,15 +156,27 @@ func dockerRun(client DockerClient, createOpts docker.CreateContainerOptions, lo

containerName := containerNameOrID(c)

// Container was created, so we defer its removal.
defer func() {
// Container was created, so we defer its removal, and also remove it if we get a sigterm
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use Kube pkg/util/interrupt handlers for this, specifically Run(func() error)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That function appears to leak go routines, it has no handling for the case where the signal never arrives. (calling Close() does not appear to do anything to terminate the go routine that I see), so it doesn't seem well suited for this use case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is design exactly fro this use case. Put your remove in your defer inside run(func() {}). No channels or signals needed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, not in the defer, in the handler function to Chain()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interrupt.New(nil, removeContainer).Run(func() {
  defer removeContainer()
  ...
})

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does not leak go routines.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does not leak go routines.

IIUC it does leak a goroutine if there is no signal received.

Code from pkg/util/interrupt/interrupt.go, with my own inline comments:

func (h *Handler) Run(fn func() error) error {
    ch := make(chan os.Signal, 1)
    signal.Notify(ch, childSignals...)
    defer signal.Stop(ch)
    go func() {
        sig, ok := <-ch  // will block forever if we go out of the scope of `Run` without having received a signal.
        if !ok {
            // unreachable code, no code ever closes ch.
            return
        }
        h.Signal(sig)
    }()
    defer h.Close()
    return fn()
}

The reason it can block forever is that we have no code closing ch to notify the goroutine to terminate.
After calling signal.Stop(ch), we know that the runtime won't send anything through the ch channel, so the receive operation will block forever, leaking the goroutine ∎ (Q.E.D.)

I'll send a patch to Kube.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line got a bit buried in the discussion above. Would be nice to get it to what we have in the source-to-image PR:

Container was created, so we defer its removal, and also remove it if we get a SIGINT/SIGTERM/SIGQUIT.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will do.

@bparees
Copy link
Contributor Author

bparees commented Mar 30, 2016

@smarterclayton reworked, but probably should not merge until your fix for interrupt.go is merged into origin, or else builds are going to leak a lot of goroutines.

@bparees bparees changed the title remove running containers in case of a sigterm [DO_NOT_MERGE] remove running containers in case of a sigterm Mar 30, 2016
@smarterclayton
Copy link
Contributor

How? The build runs once then exits?

On Wed, Mar 30, 2016 at 4:49 PM, Ben Parees [email protected]
wrote:

@smarterclayton https://github.com/smarterclayton reworked, but
probably should not merge until your fix for interrupt.go is merged into
origin, or else builds are going to leak a lot of goroutines.


You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
#8306 (comment)

@bparees
Copy link
Contributor Author

bparees commented Mar 30, 2016

How? The build runs once then exits?

sorry, you're right, i was thinking about this as running in the main process, but it's in a pod.

@bparees bparees changed the title [DO_NOT_MERGE] remove running containers in case of a sigterm remove running containers in case of a sigterm Mar 30, 2016
if err := client.Logs(logsOpts); err != nil {
return fmt.Errorf("streaming logs of %q: %v", containerName, err)
}
return interrupt.New(nil, removeContainer).Run(func() error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bparees could you please verify that using Run doesn't affect the glog messages in the terminal?
I don't expect them to be affected.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does not appear to:

I0331 23:00:14.457942       1 dockerutil.go:173] Starting container "openshift_s2i-build_ruby-sample-build-3_p1_post-commit" ...
I0331 23:00:14.720660       1 dockerutil.go:180] Streaming logs of container "openshift_s2i-build_ruby-sample-build-3_p1_post-commit" with options {Container:92fd0861d8531010703f53391ef5850ee1027e33147f44773a347f7d3d410019 OutputStream:0xc20802e008 ErrorStream:0xc20802e010 Follow:true Stdout:true Stderr:true Since:0 Timestamps:false Tail: RawTerminal:false} ...
/opt/rh/rh-ruby22/root/usr/bin/ruby -I"lib" -I"/opt/app-root/src/bundle/ruby/gems/rake-10.3.2/lib" "/opt/app-root/src/bundle/ruby/gems/rake-10.3.2/lib/rake/rake_test_loader.rb" "test/*_test.rb" 
Run options: --seed 28810

# Running:

.

Finished in 0.001928s, 518.5598 runs/s, 518.5598 assertions/s.

1 runs, 1 assertions, 0 failures, 0 errors, 0 skips
I0331 23:00:16.057152       1 dockerutil.go:186] Waiting for container "openshift_s2i-build_ruby-sample-build-3_p1_post-commit" to stop ...
I0331 23:00:16.623796       1 dockerutil.go:163] Removing container "openshift_s2i-build_ruby-sample-build-3_p1_post-commit" ...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't want to be called for bike-shedding, but dockerRun would be easier to read if we'd give a name to the function we pass to interrupt...Run.

func removeContainer(name, id string) { ... }

func startWaitContainer(name, id string, logOpts ...) { ... } 

func dockerRun(client DockerClient, createOpts docker.CreateContainerOptions, ...) ... {
    // ...

    // Start and wait for the container to run, and ensure we call removeContainer even if
    // the program catches a SIGINT/SIGTERM/SIGQUIT signal.
    return interrupt.New(nil, removeContainer).Run(func() error {
        return startWaitContainer(...)
    })
    // or maybe:
    //startWaitContainer := func() error { return startWaitContainer(...) }
    //return interrupt.New(nil, removeContainer).Run(startWaitContainer)
}

I have in mind that using interrupt as a "better defer" is not obvious for readers familiar with conventional Go code, so it's good to make it clear what's going on for our future selves.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm happy to make it a named function (i've asked others to do the same before), but i can't make it a first class function w/ args because the InterruptHandler isn't designed to take functions w/ args. so it needs to be a closure.

@rhcarvalho
Copy link
Contributor

@bparees for sanity, I think it would be desirable to ensure removeContainer is only executed at most once.

Right now it can be executed by both the deferred call and the code from the interrupt package, and they are not mutually exclusive.

@rhcarvalho
Copy link
Contributor

My previous comment made me wonder what happens when the process receives two SIGINT signals in a row?

The first one will trigger removeContainer... and then the second one will interrupt the execution of removeContainer (and the rest of the program)?

@smarterclayton
Copy link
Contributor

The wait package handles run once.

On Thu, Mar 31, 2016 at 7:59 AM, Rodolfo Carvalho [email protected]
wrote:

My previous comment made me wonder what happens when the process receives
two SIGINT signals in a row?

The first one will trigger removeContainer... and then the second one
will interrupt the execution of removeContainer (and the rest of the
program)?


You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
#8306 (comment)

@smarterclayton
Copy link
Contributor

There should be no defer inside of the function, chain guarantees execution
of exit handlers.

On Thu, Mar 31, 2016 at 10:23 AM, Clayton Coleman [email protected]
wrote:

The wait package handles run once.

On Thu, Mar 31, 2016 at 7:59 AM, Rodolfo Carvalho <
[email protected]> wrote:

My previous comment made me wonder what happens when the process receives
two SIGINT signals in a row?

The first one will trigger removeContainer... and then the second one
will interrupt the execution of removeContainer (and the rest of the
program)?


You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
#8306 (comment)

@bparees
Copy link
Contributor Author

bparees commented Mar 31, 2016

I was going off @smarterclayton's example code:

interrupt.New(nil, removeContainer).Run(func() {
  defer removeContainer()
  ...
})

but i see that Close (not Chain) guarantees execution of all handlers in the event no signal was received.

Which frankly, I find weird. If i register a handler for signals, i expect to get called when a signal happens, not under other circumstances.

but regardless, i'll remove the defer, so that should resolve @rhcarvalho's concern about removeContainer being called twice.

@smarterclayton
Copy link
Contributor

Run's contract isn't about signals, just exiting a critical section
safely. It's a defer replacement.

On Thu, Mar 31, 2016 at 5:36 PM, Ben Parees [email protected]
wrote:

I was going off @smarterclayton https://github.com/smarterclayton's
example code:

interrupt.New(nil, removeContainer).Run(func() {
defer removeContainer()
...
})

but i see that Close (not Chain) guarantees execution of all handlers in
the event no signal was received.

Which frankly, I find weird. If i register a handler for signals, i expect
to get called when a signal happens, not under other circumstances.

but regardless, i'll remove the defer, so that should resolve @rhcarvalho
https://github.com/rhcarvalho's concern about removeContainer being
called twice.


You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
#8306 (comment)

@bparees
Copy link
Contributor Author

bparees commented Mar 31, 2016

Run's contract isn't about signals, just exiting a critical section safely. It's a defer replacement.

then it's weird to call the arguments to run "handlers" (or "notifies" as i guess the variable is named notify in the constructor). the closest thing they appear to be is finalizers. (or defers that respect sigterm)

@bparees
Copy link
Contributor Author

bparees commented Mar 31, 2016

@rhcarvalho @smarterclayton comments addressed, i think.

@rhcarvalho
Copy link
Contributor

@bparees kubernetes/kubernetes#23643 was not merged yet... shouldn't we have a UPSTREAM... commit here to bring in that patch within this PR?

Apart from #8306 (comment) to document the usage of the interrupt package as a defer replacement, and the commentary #8306 (comment), LGTM.

@bparees
Copy link
Contributor Author

bparees commented Apr 1, 2016

@bparees kubernetes/kubernetes#23643 was not merged yet... shouldn't we have a UPSTREAM... commit here to bring in that patch within this PR?

this won't merge until post-3.2 i don't think (unless @smarterclayton feels otherwise about the importance of having this clean up in place) and regardless we don't actually leak go routines even without the patch because we just run this stuff once within the builder pod and then the builder pod dies, so there's no danger of accumulating go routines (as pointed out to me by clayton). So i think we can just wait for the normal processes to bring it in.

Apart from #8306 (comment) to document the usage of the interrupt package as a defer replacement, and the commentary #8306 (comment), LGTM.

will add a little more verbiage around that.

@bparees
Copy link
Contributor Author

bparees commented Apr 1, 2016

(again using a loose definition of leak by which i mean, they won't accumulate)

@bparees
Copy link
Contributor Author

bparees commented Apr 1, 2016

@rhcarvalho comments addressed.

// the interrupt handler acts as a super-defer which will guarantee removeContainer is executed
// either when startWaitContainer finishes, or when a SIGQUIT/SIGINT/SIGTERM is received.
return interrupt.New(nil, removeContainer).Run(func() error {
return startWaitContainer()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

startWaitContainer is a func() error, we don't need to wrap it in an anonymous function:

return interrupt.New(nil, removeContainer).Run(startWaitContainer)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

@bparees
Copy link
Contributor Author

bparees commented Apr 6, 2016

@rhcarvalho @smarterclayton I think this is ready for a final review, but i don't intend to merge it for 3.2.

@rhcarvalho
Copy link
Contributor

LGTM

@smarterclayton
Copy link
Contributor

Approved

@bparees
Copy link
Contributor Author

bparees commented Apr 7, 2016

[merge] risk should be low since this is a mainline path that would fail tests if it were broken.

@openshift-bot
Copy link
Contributor

[Test]ing while waiting on the merge queue

@openshift-bot
Copy link
Contributor

Evaluated for origin test up to 11bfe8e

@openshift-bot
Copy link
Contributor

continuous-integration/openshift-jenkins/test SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/2788/)

@openshift-bot
Copy link
Contributor

continuous-integration/openshift-jenkins/merge SUCCESS (https://ci.openshift.redhat.com/jenkins/job/merge_pull_requests_origin/5539/) (Image: devenv-rhel7_3928)

@bparees
Copy link
Contributor Author

bparees commented Apr 7, 2016

router flake.
[merge]

@openshift-bot
Copy link
Contributor

Evaluated for origin merge up to 11bfe8e

@openshift-bot openshift-bot merged commit 440fde1 into openshift:master Apr 7, 2016
@bparees bparees deleted the sigterm branch April 8, 2016 03:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants