Dist blocker #51
It prevents nodes from reconnecting too fast
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #51      +/-   ##
==========================================
+ Coverage   98.35%   98.79%   +0.43%
==========================================
  Files          10       11       +1
  Lines         792      831      +39
==========================================
+ Hits          779      821      +42
+ Misses         13       10       -3
…s_done
Add dist_blocker_skip_blocking_if_no_cleaners testcase
dist_blocker_blocks_if_cleaner_says_done_and_second_cleaner_does_not_ack testcase
Common connect_and_disconnect function
Looks good. My only concern is how to protect against the cleaner taking too long on some node. Some timeout maybe? But let's not make it too complex at once.
It could be a timeout. It could be very noisy logging. With a timeout, allowing the node to connect would lead to an "unusual" (i.e. untested) state. That could be OK, or it could just add more load to the cluster and cause errors. In this case there is a chance that there would be "that one node, that prevented the join". Currently we have:
Oh, I think there is a 30-second grace period during which cets_discovery would not try to reconnect, so it should give the cluster enough time to be able to accept the node again. Maybe that is enough.
This PR introduces cets_dist_blocker, which sets cookies that prevent nodes from reconnecting before cleaning is done.
We use set_cookie/2, which sets a particular cookie for a specific remote node (and a specific local node). If we set the cookie to some random value, that node will not be able to connect to us (and we will not be able to connect to it), but the existing connection will remain. It only affects that specific node; other nodes keep using the cookie returned by the get_cookie/0 function.
The idea is:
nodedown message (sent using monitor_nodes call).
This PR addresses "sessionCount is incorrect during upgrades".
The final goal is to make global's prevent_overlapping_partitions reliable.
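A minimal sketch of the cookie trick described above, for readers who have not used per-node cookies before. This is not the code of cets_dist_blocker: the module name `dist_blocker_sketch`, its function names, the `CleanupFun` callback, and the order of steps in the loop are all assumptions made for illustration. Only `erlang:set_cookie/2`, `erlang:get_cookie/0` and `net_kernel:monitor_nodes/1` are real OTP calls.

```erlang
%% Hypothetical module: illustrates the technique, not the PR's implementation.
-module(dist_blocker_sketch).
-export([block_node/1, unblock_node/1, random_cookie/0, watch/1]).

%% Give Node a random cookie, so new connection attempts between us and
%% Node fail the distribution handshake. An established connection stays up.
%% Requires the local node to be alive (started with -name or -sname).
block_node(Node) when is_atom(Node) ->
    erlang:set_cookie(Node, random_cookie()).

%% Assumed way to unblock: set Node's cookie back to our own default cookie.
unblock_node(Node) when is_atom(Node) ->
    erlang:set_cookie(Node, erlang:get_cookie()).

%% Cookies are atoms, so each call creates one new atom.
random_cookie() ->
    Bytes = crypto:strong_rand_bytes(16),
    binary_to_atom(base64:encode(Bytes), latin1).

%% Subscribe to nodeup/nodedown messages and block a node while the
%% caller-supplied CleanupFun (hypothetical) runs for it.
watch(CleanupFun) when is_function(CleanupFun, 1) ->
    ok = net_kernel:monitor_nodes(true),
    loop(CleanupFun).

loop(CleanupFun) ->
    receive
        {nodedown, Node} ->
            block_node(Node),
            CleanupFun(Node),
            unblock_node(Node),
            loop(CleanupFun);
        {nodeup, _Node} ->
            loop(CleanupFun)
    end.
```

For example, `spawn(fun() -> dist_blocker_sketch:watch(fun(Node) -> io:format("cleaning ~p~n", [Node]) end) end)` on a distributed node would block each disconnected node for the duration of the callback. A real implementation would also need to handle a cleaner that never finishes, which is the concern raised in the review above.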
Proposed changes include:
PR to MongooseIM: esl/MongooseIM#4234
PR to MongooseHelm: esl/MongooseHelm#38