consul lock loses a lock when 1 node from a cluster of 3 is down #1290
Comments
Hi @isamuelson, can you provide some more details about your test setup (are all three nodes in the cluster Consul servers)?
@slackpad I've seen this before in normal operation - if the server that the agent is connected to dies, the agent freaks out and loses all sessions. I know I commented on it somewhere on GitHub, but the search doesn't find it.
Aha, found it: #985. To be clear, the agent doesn't lose the session; it's more that Consul Lock sees an error (any error) and kills the child. A connection loss counts as an error.
This got fixed in 0.6.1 via #1162.
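For readers using the Go API client rather than the `consul lock` CLI: the hashicorp/consul/api package exposes MonitorRetries and MonitorRetryTime on api.LockOptions, which let the lock monitor retry transient Consul errors instead of treating the first error as a lost lock. A minimal sketch; the key name and retry values below are illustrative, not taken from this issue:

```go
package main

import (
	"log"
	"time"

	"github.com/hashicorp/consul/api"
)

func main() {
	// Connect to the local Consul agent (default: 127.0.0.1:8500).
	client, err := api.NewClient(api.DefaultConfig())
	if err != nil {
		log.Fatal(err)
	}

	// MonitorRetries lets the lock monitor retry transient Consul
	// errors instead of reporting the lock as lost on the first one.
	lock, err := client.LockOpts(&api.LockOptions{
		Key:              "service/my-app/leader", // illustrative key
		MonitorRetries:   5,
		MonitorRetryTime: 2 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}

	// Block until the lock is acquired; lostCh is closed if the lock
	// is later determined to be lost.
	lostCh, err := lock.Lock(nil)
	if err != nil {
		log.Fatal(err)
	}
	defer lock.Unlock()

	// ... do leader-only work here, watching lostCh ...
	<-lostCh
	log.Println("lock lost")
}
```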
WHY?? -- from the docs: "The contract that Consul provides is that under any of the following situations, the session will be invalidated: Node is deregistered"
@lxy9805287 this is an old issue that was closed years ago. If you have a further question about any part, it might be best to ask it on the mailing list: https://github.com/hashicorp/consul/blob/9a9cc9341bb487651a0399e3fc5e1e8a42e62dd9/command/agent/event_endpoint.go#L16:2
Just in case anyone else stumbles across this like I did: a way to have sessions persist when the Consul node they are registered on goes down is to set the session's Checks to an empty list when you create it. This removes the serfHealth check that is attached by default. Keep in mind you will likely want to set a TTL if you are not going to use any health checks; otherwise you would be left with a stale lock if the service holding the lock were to fail. For example, see the sketch below.
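A minimal Go sketch of that workaround, assuming the hashicorp/consul/api client; the session name, key, and TTL value are illustrative. Note that the Go client's Session().CreateNoChecks is the helper intended for creating a session without the default serfHealth check:

```go
package main

import (
	"log"

	"github.com/hashicorp/consul/api"
)

func main() {
	// Connect to the local Consul agent (default: 127.0.0.1:8500).
	client, err := api.NewClient(api.DefaultConfig())
	if err != nil {
		log.Fatal(err)
	}

	// Create a session WITHOUT the default serfHealth check, so it
	// survives the Consul node it was registered on going down.
	// A TTL guards against a stale lock if the holder itself dies.
	sessionID, _, err := client.Session().CreateNoChecks(&api.SessionEntry{
		Name: "my-app-lock", // illustrative name
		TTL:  "30s",         // must be renewed while the lock is held
	}, nil)
	if err != nil {
		log.Fatal(err)
	}

	// Renew the TTL in the background until doneCh is closed.
	doneCh := make(chan struct{})
	defer close(doneCh)
	go client.Session().RenewPeriodic("30s", sessionID, nil, doneCh)

	// Acquire a lock on a key with that session.
	acquired, _, err := client.KV().Acquire(&api.KVPair{
		Key:     "service/my-app/leader", // illustrative key
		Session: sessionID,
	}, nil)
	if err != nil {
		log.Fatal(err)
	}
	log.Println("lock acquired:", acquired)
}
```

With no serfHealth check attached, only an expired TTL or an explicit destroy invalidates the session, so choose a TTL that matches how quickly a stale lock needs to be reclaimed.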
We are doing some chaos testing and discovered that when we take down 1 node from the Consul cluster, consul lock loses the lock.
We recycle a random Consul node from a cluster of 3 nodes.
log:
Lock lost, killing child
Terminating child pid 4344
Error running handler: exit status 1
exit status 1
Child terminated
Cleanup succeeded