
core: close provider/provisioner connections when not used anymore #2406

Merged
merged 2 commits into hashicorp:master from svanharmelen:b-close-providers-provisioners on Jun 23, 2015

Conversation

svanharmelen (Contributor)

Currently Terraform is leaking goroutines and, with that, memory. I know that strictly speaking this maybe isn't a real concern for Terraform, as it's mostly used as a short-running command line executable.

But there are a few of us out there using Terraform in long-running processes, and then this starts to become a problem.

Besides that, it's of course good programming practice to clean up resources when they're no longer needed. So even for the standard command line use case, this seems an improvement in resource management.

Personally I see no downsides, as the primary connection to the plugin is kept alive (the plugin is not killed) and only unused connections that will never be used again are closed, freeing up any related goroutines and memory.
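To illustrate the leak pattern this PR addresses, here is a minimal standalone sketch (the connReader helper and the net.Pipe setup are illustrative stand-ins, not Terraform's actual plugin code): every open connection pins a reader goroutine, and closing the connection once it will never be used again lets that goroutine exit instead of living for the rest of the process.

package main

import (
	"fmt"
	"net"
	"time"
)

// connReader models the per-connection goroutine that a leaked
// connection would pin in memory for the life of the process.
func connReader(c net.Conn) {
	buf := make([]byte, 64)
	for {
		if _, err := c.Read(buf); err != nil {
			fmt.Println("reader goroutine exiting:", err)
			return
		}
	}
}

func main() {
	client, server := net.Pipe()
	go connReader(server)

	client.Write([]byte("one RPC")) // use the connection...
	client.Close()                  // ...then close it once it will never be used again

	time.Sleep(10 * time.Millisecond) // let the reader goroutine report before main exits
}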

@svanharmelen (Contributor, Author)

@mitchellh @phinze I suppose you guys would want to have a look at this one 😉 If one of you could review it and share your thoughts, that would be great!

Thanks!

@svanharmelen (Contributor, Author)

Ping @mitchellh and @phinze for a review on this one. It doesn't seem like something with a big impact, yet it improves the general code by properly releasing any unused resources...

Would be really great if this could be added before releasing 0.6.0.

Thx!

func (m *muxBroker) NextId() uint32 {
-	return atomic.AddUint32(&m.nextId, 1)
+	return rand.Uint32()
}
Contributor

This change introduces a (very low) possibility of the ID repeating, no? Which would be problematic for an identifier?

Contributor Author

They are pseudo-random numbers, and there are 4,294,967,295 numbers to hand out. That means it would be extremely unlikely to get the same number within just a few draws of each other.

And an ID may very well be reused as soon as the previous command that was using it is closed by the new close logic. Having just a counter going up would mean there is a hard limit on the number of commands that can be executed. And even though 4,294,967,295 is quite a lot, if you use TF in an always-on setup it will hit that limit sooner or later.

But if we think this could be an issue, the only additional thing I can think of is to keep a map of generated numbers and a timestamp (something like a map[uint32]time.Time), and then whenever a new number is generated we first check if the number was already handed out. Additionally it should delete entries that are older than x, where x would be something we agree on. I would say if you want to make it extremely safe, make it 60 minutes; there is no single apply/plan/refresh that takes 60 minutes, right? The cleanup could then be triggered only when NextId() is called, so it doesn't require a scheduled thing of its own to periodically check and clean up.

Sounds like maybe a good add-on, let me know what you think and if you think it's needed...
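A minimal sketch of that map-based safeguard, assuming hypothetical names (idBroker, idExpiry) since this never became part of the PR: remember each handed-out ID with a timestamp, prune entries older than the agreed expiry on every call, and redraw whenever a fresh random ID collides with a live one.

package main

import (
	"fmt"
	"math/rand"
	"sync"
	"time"
)

// No single apply/plan/refresh should run longer than this, so an ID
// handed out more than an hour ago is assumed to be free again.
const idExpiry = 60 * time.Minute

type idBroker struct {
	mu     sync.Mutex
	handed map[uint32]time.Time
}

func newIDBroker() *idBroker {
	return &idBroker{handed: make(map[uint32]time.Time)}
}

func (b *idBroker) NextId() uint32 {
	b.mu.Lock()
	defer b.mu.Unlock()

	// Cleanup piggybacks on NextId, so no scheduled goroutine is needed.
	cutoff := time.Now().Add(-idExpiry)
	for id, t := range b.handed {
		if t.Before(cutoff) {
			delete(b.handed, id)
		}
	}

	// Redraw until the random ID doesn't collide with a live one.
	for {
		id := rand.Uint32()
		if _, taken := b.handed[id]; !taken {
			b.handed[id] = time.Now()
			return id
		}
	}
}

func main() {
	b := newIDBroker()
	fmt.Println(b.NextId(), b.NextId())
}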

Contributor

Talked about this on IRC, but mirroring here - I think we're fine to punt on this for now.

Contributor Author

Will have a look tomorrow at how much impact a solution like the one discussed would have. The more I think about it, the more I believe it wouldn't hurt, and it would make 99.99% a 100%, which is always better of course 😉

But if so, I'll open a new PR for that one...

Contributor

I would feel better about reinstating the atomic increment. Packer/Terraform (both use this system, but not the same code) open a LOT of channels. Hundreds. While the chance is low, if it did conflict, it would be disastrous. Likely an immediate panic at some point.

The atomic increment isn't a hard limit, we just expect it to wrap at some point (being an unsigned integer).
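For reference, a quick standalone check (not from the PR itself) that atomic.AddUint32 on a uint32 counter simply wraps past math.MaxUint32 rather than acting as a hard limit, since Go's unsigned integers wrap around on overflow:

package main

import (
	"fmt"
	"math"
	"sync/atomic"
)

func main() {
	var nextId uint32 = math.MaxUint32 - 1
	fmt.Println(atomic.AddUint32(&nextId, 1)) // 4294967295 (math.MaxUint32)
	fmt.Println(atomic.AddUint32(&nextId, 1)) // 0: wrapped around, no hard limit
	fmt.Println(atomic.AddUint32(&nextId, 1)) // 1
}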

Contributor

That's a good point and potentially worth opening an issue in Yamux. But I think on our side we use it for other reasons.

Contributor Author

I'll open an issue there to discuss this, but I can imagine use cases where this check would actually make sense. For Terraform it doesn't, as the channels aren't usually used for more than a couple of minutes...

So would something like this be a workaround until (if) the logic in yamux can be changed?

func (m *muxBroker) NextId() uint32 {
	// Reset the counter shortly before it would overflow, so IDs keep
	// being handed out in always-on setups: the counter wraps back to
	// zero instead of acting as a hard limit.
	if atomic.LoadUint32(&m.nextId) == math.MaxUint32-10 {
		atomic.StoreUint32(&m.nextId, 0)
	}
	return atomic.AddUint32(&m.nextId, 1)
}

Maybe still a little dirty, but this would do the trick, right?

Edit: With some comments explaining why that is done of course.

Contributor

+1 This seems like a good compromise for now.

Contributor Author

Sorry for not picking this up today. Had a horrible (non-productive) day with a broken MacBook.

Will do some tests and make a PR for this tomorrow so we can get it in (if all is well, of course) before shipping 0.6.0...

Contributor

Thanks, yeah, that would be ideal!

@phinze (Contributor)

phinze commented Jun 23, 2015

Thanks for taking this on @svanharmelen!

Overall this LGTM - two inline questions / nits.

Removed the `DotOrigin()` func after review of the PR. With this change
the PR is up-to-spec to be merged (as just discussed with Paul in IRC).
svanharmelen pushed a commit that referenced this pull request Jun 23, 2015

core: close provider/provisioner connections when not used anymore
@svanharmelen svanharmelen merged commit 01a9cee into hashicorp:master Jun 23, 2015
@svanharmelen svanharmelen deleted the b-close-providers-provisioners branch June 24, 2015 16:21
svanharmelen pushed a commit that referenced this pull request Jun 25, 2015
core: reverting a few lines from PR #2406
@ghost

ghost commented May 2, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@ghost ghost locked and limited conversation to collaborators May 2, 2020