
consul lock loses a lock when 1 node from a cluster of 3 is down #1290

Closed

i-am-logger opened this issue Oct 12, 2015 · 7 comments


@i-am-logger

We are doing some chaos testing and discovered that when we take down one node from the Consul cluster, consul lock loses the lock.

--- we recycle a random Consul node from a cluster of 3 nodes.
log:
Lock lost, killing child
Terminating child pid 4344
Error running handler: exit status 1
exit status 1
Child terminated
Cleanup succeeded

@slackpad
Contributor

Hi @isamuelson, can you provide some more details about your test setup? (Are all three nodes in the cluster Consul servers?)

@highlyunavailable
Contributor

@slackpad I've seen this before in normal operation: if the server that the agent is connected to dies, the agent freaks out and loses all sessions. I know I commented on it somewhere on GitHub, but the search doesn't find it.

@highlyunavailable
Contributor

Aha, found it: #985

To be clear, the agent doesn't lose the session; it's more that consul lock sees an error (any error) and kills the child. A connection loss counts as an error.
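
For anyone curious what that looks like against the client API, here is a minimal sketch using the Go client (github.com/hashicorp/consul/api). The key name is a placeholder chosen for illustration; the point is that the channel returned by Lock is closed whenever the lock is considered lost for any reason, and that is the moment consul lock kills its child:

```go
package main

import (
	"log"

	"github.com/hashicorp/consul/api"
)

func main() {
	// Connects to the local agent (127.0.0.1:8500 by default).
	client, err := api.NewClient(api.DefaultConfig())
	if err != nil {
		log.Fatal(err)
	}

	// "service/test/lock" is a hypothetical key for this sketch.
	lock, err := client.LockKey("service/test/lock")
	if err != nil {
		log.Fatal(err)
	}

	// Lock blocks until the lock is acquired. The returned channel is
	// closed if the lock is ever lost, e.g. because the session was
	// invalidated or lock monitoring hit an error.
	lostCh, err := lock.Lock(nil)
	if err != nil {
		log.Fatal(err)
	}
	defer lock.Unlock()

	log.Println("lock acquired; doing work")
	<-lostCh // this is where `consul lock` would kill its child
	log.Println("lock lost")
}
```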

@slackpad
Contributor

slackpad commented Jan 9, 2016

This got fixed in 0.6.1 via #1162.

slackpad closed this as completed on Jan 9, 2016
@Go36625090

WHY??

-- from docs:

The contract that Consul provides is that under any of the following situations, the session will be invalidated:

Node is deregistered
Any of the health checks are deregistered
Any of the health checks go to the critical state
Session is explicitly destroyed
TTL expires, if applicable
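
The health-check clause is the one doing the work here: unless you say otherwise, a new session is tied to the node's serfHealth check. A minimal sketch with the Go client (github.com/hashicorp/consul/api), assuming a local agent and using a placeholder session name, shows the default:

```go
package main

import (
	"fmt"
	"log"

	"github.com/hashicorp/consul/api"
)

func main() {
	client, err := api.NewClient(api.DefaultConfig())
	if err != nil {
		log.Fatal(err)
	}

	// Create a session without specifying Checks; "demo" is a
	// placeholder name for this sketch.
	id, _, err := client.Session().Create(&api.SessionEntry{Name: "demo"}, nil)
	if err != nil {
		log.Fatal(err)
	}
	defer client.Session().Destroy(id, nil)

	info, _, err := client.Session().Info(id, nil)
	if err != nil {
		log.Fatal(err)
	}
	// Prints [serfHealth]: the session is invalidated as soon as the
	// node's serf health check goes critical.
	fmt.Println(info.Checks)
}
```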

@banks
Member

banks commented Mar 13, 2018

@lxy9805287 this is an old issue that was closed years ago. If you have a further question about any part of it, it might be best to ask on the mailing list: https://github.com/hashicorp/consul/blob/9a9cc9341bb487651a0399e3fc5e1e8a42e62dd9/command/agent/event_endpoint.go#L16:2

@rpsiv

rpsiv commented Sep 20, 2018

> WHY??
>
> -- from docs:
>
> The contract that Consul provides is that under any of the following situations, the session will be invalidated:
>
> Node is deregistered
> Any of the health checks are deregistered
> Any of the health checks go to the critical state
> Session is explicitly destroyed
> TTL expires, if applicable

Just in case anyone else stumbles across this like I did: a way to have sessions persist when the Consul node they are registered to goes down is to set the session's Checks to an empty list when you create it. This removes the serfHealth check, which is attached by default.

Keep in mind that you will likely want to set a TTL if you are not going to use any health checks; otherwise you would be left with a stale lock if the service holding the lock failed.

E.g.
PUT /v1/session/create

{
  "Name": "testservice",
  "Checks":[]
}
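
For completeness, here is a sketch of that workaround with a TTL, using the Go client (github.com/hashicorp/consul/api). The key name, TTL value, and renewal setup are illustrative assumptions, not something prescribed in this thread:

```go
package main

import (
	"log"

	"github.com/hashicorp/consul/api"
)

func main() {
	client, err := api.NewClient(api.DefaultConfig())
	if err != nil {
		log.Fatal(err)
	}

	// Checks: []string{} drops the default serfHealth check, so the
	// session survives the node's health flapping; the TTL bounds how
	// long a stale lock can linger if the holder dies.
	id, _, err := client.Session().Create(&api.SessionEntry{
		Name:     "testservice",
		Checks:   []string{},
		TTL:      "30s", // illustrative value
		Behavior: api.SessionBehaviorRelease,
	}, nil)
	if err != nil {
		log.Fatal(err)
	}

	// Acquire a lock on a KV key with the session; "locks/testservice"
	// is a hypothetical key for this sketch.
	acquired, _, err := client.KV().Acquire(&api.KVPair{
		Key:     "locks/testservice",
		Session: id,
	}, nil)
	if err != nil || !acquired {
		log.Fatalf("failed to acquire lock: %v", err)
	}

	// The session must be renewed within the TTL or it is invalidated
	// and the lock is released. RenewPeriodic blocks, so run it in the
	// background; closing doneCh stops renewal and destroys the session.
	doneCh := make(chan struct{})
	go client.Session().RenewPeriodic("30s", id, nil, doneCh)
	defer close(doneCh)

	// ... hold the lock and do work here ...
}
```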
