
Adds "try" and "monitor-retry" options to consul lock command #1567

Merged: 7 commits into master from f-lock-try, Jan 6, 2016

Conversation

slackpad (Contributor) commented Jan 6, 2016

The "try" option should close #780 and the "monitor-retry" option should close #1162.

Previously, it would try once "up to" the timeout, but in practice it would just fall through. This modifies the behavior to block until the timeout has been reached.
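
For reference, a rough sketch of how the new flags might be invoked (the KV prefix and the wrapped command below are placeholders, not taken from this PR):

    # Give up if the lock cannot be acquired within 10 seconds, and tolerate
    # a few transient 500 errors from the agent while the lock is held.
    consul lock -try=10s -monitor-retry=3 service/my-job/lock /usr/local/bin/my-job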
slackpad (Contributor, Author) commented Jan 6, 2016

Tweaked things in that last commit so it always waits for the timeout period trying to get the lock vs. trying once "up to" the timeout, which usually just falls through right away. This should better match what users expect.
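
As a rough illustration of that behavior (this is a sketch, not the actual command/lock.go code; attempt stands in for a hypothetical single-shot acquisition call):

    package lockutil // hypothetical helper package, not part of Consul

    import (
        "errors"
        "time"
    )

    // tryLock keeps retrying acquisition until the wait duration elapses,
    // instead of making a single attempt that may fall through right away.
    func tryLock(wait, retryInterval time.Duration, attempt func() (bool, error)) error {
        deadline := time.Now().Add(wait)
        for {
            held, err := attempt()
            if err != nil {
                return err
            }
            if held {
                return nil
            }
            if time.Now().After(deadline) {
                return errors.New("timeout trying to acquire lock")
            }
            time.Sleep(retryInterval)
        }
    }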

pairs, meta, err := kv.List(s.opts.Prefix, opts)
if err != nil {
    // TODO (slackpad) - Make a real error type here instead of using
    // a string check.
    const serverError = "Unexpected response code: 500"
Member commented on the snippet above:
Maybe at least move this out of the retry loop? Maybe we could use like an IsServerError(error) since I think we do this in a bunch of other places too IIRC.

slackpad (Contributor, Author) replied:
Haha, I was pasting. Good call.
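
For illustration, a minimal version of the helper suggested above might look like this (the name and matched string follow the snippet under review; Consul's actual implementation may differ):

    package lockutil // hypothetical helper package, not part of Consul

    import "strings"

    // IsServerError reports whether an error returned by the HTTP API
    // corresponds to a 500 response, based on the error string checked above.
    func IsServerError(err error) bool {
        return err != nil && strings.Contains(err.Error(), "Unexpected response code: 500")
    }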

ryanuber (Member) commented Jan 6, 2016

Looks like you already addressed the minor comments. This LGTM otherwise 🚢

slackpad added a commit that referenced this pull request Jan 6, 2016
Adds "try" and "monitor-retry" options to `consul lock` command
@slackpad slackpad merged commit 06926f3 into master Jan 6, 2016
@slackpad slackpad deleted the f-lock-try branch January 6, 2016 19:50
issacg commented Jan 17, 2016

Question: Is it possible to use the new monitor-retry with a (slightly) larger timeout to facilitate restarting consul servers without damaging the quorum? I'm not sure at first glance whether there would be unintended side-effects, but it seems that if you gave a large enough value to defaultMonitorRetry (say 60), one could use a configuration management tool to wrap something like service consul restart on all servers in each datacenter simultaneously, without needing to time the rolling update. If it takes more than 60 seconds for the restart to complete and release the lock, then something is likely wrong with your setup anyway. It's a bit naive, but it seems the code should allow this, no?
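
For concreteness, the kind of wrapper being described might look roughly like this (the KV prefix is a placeholder, and whether this is actually safe is exactly the question):

    # Run on every server at once; each node blocks until it holds the lock,
    # then restarts its local Consul. A large -monitor-retry is meant to keep
    # the lock monitor alive while the local agent bounces.
    consul lock -monitor-retry=60 locks/consul-restart 'service consul restart'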

Also, the naming of DefaultMonitorRetryTime implies that it ought to be overridable, but based on the usage of defaultMonitorRetry and monitor-retry it seems intended to be hardcoded at 1s intervals. I'm just wondering what the folks who wrote the code intended.

Thanks!

slackpad (Contributor, Author) commented:
Hi @IsaacG, we didn't design this feature with this use case in mind, but it seems like it could be used as part of a solution for this. One problem I can see, though, is that being able to take the lock doesn't necessarily mean it's safe to take a server down, so doing this safely would require more logic. For example, if one server out of three died during the upgrade and lost the lock, a second one would be able to get it and restart itself, putting the cluster into an outage condition. You'd still want to confirm things like last_log_index on the newly-added server before rolling another one - https://www.consul.io/docs/guides/servers.html.

For your second question, I named DefaultMonitorRetryTime that way in case we decided to make it configurable later, but opted for a fixed 1-second interval with just a configurable number of retries to keep the number of config knobs down. The interval between retries isn't as important as the total time spent retrying, so exposing only the retry count seemed like the right tradeoff. I suppose if you had tons of nodes trying locks you might want to make the interval larger to avoid a thundering herd, but that's probably not a super common use case for locks.

issacg commented Jan 24, 2016

@slackpad thanks for the comments!

In some very initial toying around, I already discovered that the 1-second DefaultMonitorRetryTime is a problem when trying to do a rolling upgrade, because consul lock died really fast. I don't have the logs in front of me ATM, but IIRC the problem was that the consul client was unable to talk to the agent, and thus tried to immediately release the lock, which, again, it couldn't do. I haven't had time to delve into the source yet (out of round tuits for now 😞), but I hypothesized that increasing the retry time might avoid this.

Also, as you pointed out, there are a lot of rough edges that I'm admittedly completely ignoring for now. I just wanted to start with something as a PoC.

Successfully merging this pull request may close these issues:

consul lock behaviour on leader election
Add consul lock -try

3 participants