Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor leave - balance and tail violations #972

Merged
merged 15 commits into from
Oct 8, 2021

Conversation

martinsumner
Copy link
Contributor

Some curious and unexpected cluster plans when running leave operations, prompted a review of the lead code.

This PR corrects a number of issues found:

  • The first part of a leave plan is to plan to handoff all vnodes on the leaving nodes to other members of the cluster, in a way that does not break the target_n_val. The logic of this attempt_simple_leave function was incorrect, as it would produce plans that would include tail violations (where the location was safe when the ring was viewed as a list, but not when it was considered as a ring where the head wraps round to the tail). This issue is corrected.

  • The attempt_simple_leave did not attempt to balance the vnodes in any way, leading in some cases to unbalanced plans (and leaves that might not be supportable as a receiving node may not have sufficient space). There is now a preference to make choices that lead to more balanced outcomes - and a workaround to skip this stage and move straight to a freshly balanced cluster plan (riak_core.full_rebalance_onleave).

  • When a full rebalance was forced, as it was not possible to produce a safe outcome to attempt_simple_transfer, then a full rebalance would be triggered. However a deprecated function was used, one which could lead to tail violations (ignoring the improvements made in Riak 2.2.5). The updated function is now used.

Never attempt a simple transfer, always rebalance when a cluster leave operation is requested
sequential_claim is the refactored version not the v1 version  (which should handle tail violations).
Prefer to transfer to a node with a lower current count of vnodes - so that by default simple_transfer will tend to give more balanced results
Also include warning in the plan output
Reviewed previous code, to ensure this is being handled correctly.

However, this does appear to be incorrect - as the current simple_transfer does not handle tail violations.  It only checks forward to the end of the list for TargetN violation - it does not wrap around to the start of the list.
Initial unit test added
force_rebalance;
false ->
TargetN = app_helper:get_env(riak_core, target_n_val),
Owners = riak_core_ring:all_owners(Ring),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is almost the same code as in line 383 and following

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactored so Owners/Members are determined only once

src/riak_core_gossip.erl Show resolved Hide resolved
Just to explicitly state the requirement that input to the function (Owners) must be sorted.  This is a common assumption in riak_core, and was an assumption of the replaced function.
@martinsumner martinsumner merged commit b36a7ae into develop-3.0 Oct 8, 2021
@martinsumner martinsumner deleted the mas-i970-simpleleave branch October 8, 2021 19:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants