
core: stoppable provisioners, helper/schema for provisioners #10934

Merged: 23 commits, Jan 30, 2017


mitchellh (Contributor)

⚠️ For Terraform 0.9 (Do not merge until master is 0.9) ⚠️

Fast provisioner cancellation! 💃


This PR includes two things:

  • Create a new Stop API for provisioners that is called when Terraform's Stop API (triggered by things like interrupts) is called.

  • Introduce a helper/schema framework for provisioners. This makes it easy to consume the Stop API also introduced in this PR.

This PR relies heavily on Go's context package so that cancellation is easy to consume; helper/schema manages this context on the provisioner's behalf. Terraform core's stop mechanism was also changed to use context.

This switches to the Go "context" package for cancellation and threads
the context through all the way to evaluation to allow behavior based on
stopping deep within graph execution.

This also adds the Stop API to provisioners so they can quickly exit
when stop is called.
This modifies local-exec to be stoppable with the new Stop API call that
provisioners can listen to.
// and promote them to a list. For example "foo" would be promoted to
// ["foo"] automatically. This is primarily for legacy reasons and the
// ambiguity is not recommended for new usage. Promotion is only allowed
// for primitive element types.
Member

Interesting idea, but I don't understand why we run into this. Did we allow defining single elements for a list in the config previously?

VladRassokhin (Contributor), Jan 30, 2017

That's the case of the inline parameter of the remote-exec provisioner; see what is removed in builtin/provisioners/remote-exec/resource_provisioner.go.

Contributor Author

Yeah, backwards compatibility here is the reason this was necessary.
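The legacy promotion described in the code comment can be illustrated with a hypothetical helper, `promoteToList`; it is not the helper/schema implementation and ignores the schema's declared element type. A single primitive (for example, a remote-exec inline command written as a plain string) is wrapped in a one-element list, lists pass through unchanged, and anything else is rejected.

```go
package main

import "fmt"

// promoteToList is a hypothetical helper, not the helper/schema code. It
// wraps a single primitive in a one-element list, passes lists through
// unchanged, and rejects everything else (promotion is only for primitives).
func promoteToList(v interface{}) ([]interface{}, error) {
	switch t := v.(type) {
	case []interface{}:
		return t, nil
	case string, int, float64, bool:
		return []interface{}{t}, nil
	default:
		return nil, fmt.Errorf("cannot promote %T to a list", v)
	}
}

func main() {
	// A remote-exec style "inline" value written as a single string.
	l, _ := promoteToList("echo hello")
	fmt.Println(len(l), l) // 1 [echo hello]

	l, _ = promoteToList([]interface{}{"a", "b"})
	fmt.Println(len(l), l) // 2 [a b]

	_, err := promoteToList(map[string]interface{}{"k": "v"})
	fmt.Println(err != nil) // true
}
```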

jbardin (Member) commented Jan 30, 2017

Sorry, didn't mean to approve yet. Still reviewing some, so please don't merge...

jbardin (Member) left a comment

There are some race issues around the context fields.

@@ -634,30 +633,34 @@ func (c *Context) Refresh() (*State, error) {
//
// Stop will block until the task completes.
func (c *Context) Stop() {
log.Printf("[WARN] terraform: Stop called, initiating interrupt sequence")

c.l.Lock()
Member

There's no special-case lock handling here, so Unlock should be deferred.

Contributor Author

Done!

c.sh.Stop()
// Stop the context
c.runContextCancel()
c.runContextCancel = nil
jbardin (Member), Jan 30, 2017

Do we need to set this to nil? This one is OK right now, I think, but it sets the code up for races in the future.

Contributor Author

We have to, to avoid double-cancels. I'm not sure whether a double cancel actually has an effect on a context, though. See Stop().

// If this is called during a proper run operation, this will never
// be nil.
var stopCh <-chan struct{}
if ctx := c.runContext; ctx != nil {
Member

This definitely races with acquireRun, which sets the context fields under a mutex. What we can do here is have walk get the stopCh and pass it in to avoid reading that field without the mutex:

func (c *Context) watchStop(walker *ContextGraphWalker, stopCh, doneCh <-chan struct{})

Member

I'm not a fan of special-casing this during tests either; can we ensure that it's set for tests too and get rid of these nil checks?

Contributor Author

It's safe to access runContext because an entire Run is wrapped in a lock (acquireRun acquires a lock via condition variables). Fixing the special case, though!

Member

Yes, but watchStop doesn't take a lock, so it can race with any method writing to the context fields. I just verified that this shows up readily with the race detector.

This should fix it by removing the read of that field from watchStop entirely:

@@ -811,7 +811,9 @@ func (c *Context) walk(

        // Watch for a stop so we can call the provider Stop() API.
        doneCh := make(chan struct{})
-       go c.watchStop(walker, doneCh)
+       stopCh := c.runContext.Done()
+
+       go c.watchStop(walker, stopCh, doneCh)

        // Walk the real graph, this will block until it completes
        realErr := graph.Walk(walker)
@@ -906,12 +908,7 @@ func (c *Context) walk(
        return walker, realErr
 }

-func (c *Context) watchStop(walker *ContextGraphWalker, doneCh <-chan struct{}) {
-       // Get the stop channel. runContext will never be nil since this should
-       // only be called within the context of an operation started with
-       // acquireRun
-       stopCh := c.runContext.Done()
-
+func (c *Context) watchStop(walker *ContextGraphWalker, stopCh, doneCh <-chan struct{}) {
        // Wait for a stop or completion
        select {
        case <-stopCh:

mitchellh (Contributor Author)

Addressed feedback. Can you check the latest commits?

All the unexported run* fields are safe to access without a lock, since acquireRun/releaseRun protect access. Even if Stop is called, another operation can't start until releaseRun completes, which requires that watchStop, walk, and friends are no longer running.
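The serialization described here can be sketched with a condition variable, assuming a hypothetical `ctxRunner` type; the real acquireRun/releaseRun may differ in detail. Because only one operation can sit between acquireRun and releaseRun, code inside that window can read the run fields without taking the lock.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"sync/atomic"
)

// ctxRunner is a hypothetical reduction of terraform.Context: a condition
// variable lets only one operation hold the run* fields at a time.
type ctxRunner struct {
	l       sync.Mutex
	cond    *sync.Cond
	running bool

	runContext       context.Context
	runContextCancel context.CancelFunc
}

func newCtxRunner() *ctxRunner {
	r := &ctxRunner{}
	r.cond = sync.NewCond(&r.l)
	return r
}

// acquireRun waits until no operation is running, then installs a fresh
// run context.
func (r *ctxRunner) acquireRun() {
	r.l.Lock()
	defer r.l.Unlock()
	for r.running {
		r.cond.Wait()
	}
	r.running = true
	r.runContext, r.runContextCancel = context.WithCancel(context.Background())
}

// releaseRun clears the run fields and wakes waiting operations. Helper
// goroutines such as a stop watcher must have exited before this is called.
func (r *ctxRunner) releaseRun() {
	r.l.Lock()
	defer r.l.Unlock()
	if r.runContextCancel != nil {
		r.runContextCancel()
	}
	r.runContext, r.runContextCancel = nil, nil
	r.running = false
	r.cond.Broadcast()
}

// maxConcurrent starts n operations at once and reports the largest number
// that were ever between acquireRun and releaseRun simultaneously.
func maxConcurrent(r *ctxRunner, n int) int32 {
	var active, peak int32
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			r.acquireRun()
			cur := atomic.AddInt32(&active, 1)
			for {
				old := atomic.LoadInt32(&peak)
				if cur <= old || atomic.CompareAndSwapInt32(&peak, old, cur) {
					break
				}
			}
			// Reading a run* field without r.l: no other operation can be
			// inside this window, so nothing writes it concurrently.
			_ = r.runContext.Done()
			atomic.AddInt32(&active, -1)
			r.releaseRun()
		}()
	}
	wg.Wait()
	return atomic.LoadInt32(&peak)
}

func main() {
	fmt.Println(maxConcurrent(newCtxRunner(), 8)) // 1
}
```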

case <-doneCh:
case <-ctx.Done():
cmd.Process.Kill()
err = cmd.Wait()
Member

This err assignment races with the one in the goroutine above.

Contributor Author

Good catch. Fixed!

mitchellh (Contributor Author)

All current feedback addressed. Please review again. :)

mitchellh merged commit 61881d2 into master on Jan 30, 2017
mitchellh deleted the f-provisioner-stop branch on January 30, 2017 20:53
ghost commented Apr 17, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

ghost locked and limited conversation to collaborators Apr 17, 2020