
allocator: select a good enough store for decom/recovery #86267

Merged: 2 commits merged into cockroachdb:master from the lidor_allocation_to_good_enough branch on Aug 26, 2022

Conversation

@lidorcarmel (Contributor) commented Aug 16, 2022

Until now, when decommissioning a node, or when recovering from a dead
node, the allocator tries to pick one of the best possible stores as
the target for the recovery.

Because of that, we sometimes see multiple stores recover replicas
to the same store, for example, when decommissioning a node and
at the same time adding a new node.

This PR changes the way we select a destination store by choosing
a random store out of all the stores that are "good enough" for
the replica. The risk diversity is still enforced, but we may
recover a replica to a store that is considered "over full", for
example.

Note that during upreplication the allocator will still try to use
one of the "best" stores as targets.

Fixes: #86265

Release note: None

Release justification: a relatively small change, and it can be
reverted by setting kv.allocator.recovery_store_selector=best.
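
As a rough illustration of the selection change (a sketch only, not the actual allocator code; the candidate type, field names, and free functions here are hypothetical stand-ins for the allocator's candidateList machinery):

```go
package main

import (
	"fmt"
	"math/rand"
)

// candidate is a hypothetical stand-in; assume the list has already been
// filtered for validity and diversity constraints.
type candidate struct {
	storeID int
	score   float64
}

// selectBest picks randomly among only the candidates tied for the top
// score; cands is sorted descending by score. This remains the behavior
// for upreplication.
func selectBest(r *rand.Rand, cands []candidate) candidate {
	n := 1
	for n < len(cands) && cands[n].score == cands[0].score {
		n++
	}
	return cands[r.Intn(n)]
}

// selectGoodEnough picks randomly among all surviving candidates, even
// ones that are "over full", which spreads concurrent recoveries across
// more stores.
func selectGoodEnough(r *rand.Rand, cands []candidate) candidate {
	return cands[r.Intn(len(cands))]
}

func main() {
	r := rand.New(rand.NewSource(0))
	cands := []candidate{{1, 0.9}, {2, 0.9}, {3, 0.5}, {4, 0.2}}
	fmt.Println(selectBest(r, cands), selectGoodEnough(r, cands))
}
```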


@kvoli (Collaborator) left a comment

Curious if you have run any benchmarks for this change - I think we should validate it in roachprod, but otherwise LGTM.
:lgtm:

Reviewed 3 of 3 files at r1, 6 of 7 files at r2, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @aayushshah15, @AlexTalks, @kvoli, and @lidorcarmel)


pkg/kv/kvserver/allocator/allocatorimpl/allocator_scorer_test.go line 230 at r2 (raw file):

	allocRand := makeAllocatorRand(rand.NewSource(0))
	for ii, tc := range testCases {
		t.Logf("ii=%d", ii)

What is this t.Logf for?

@lidorcarmel force-pushed the lidor_allocation_to_good_enough branch from 3c30e03 to 16d4269 on August 17, 2022 at 21:41
@lidorcarmel (Contributor, Author) left a comment

Yep, see issue #86265.
Thanks!

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @aayushshah15, @AlexTalks, and @kvoli)


pkg/kv/kvserver/allocator/allocatorimpl/allocator_scorer_test.go line 230 at r2 (raw file):

Previously, kvoli (Austen) wrote…

What is this t.Logf for?

Oops, leftovers. Removed.

@kvoli (Collaborator) commented Aug 18, 2022

Nice results!

@AlexTalks (Contributor) left a comment

Had some comments about the setting name, but overall this looks really good to me!

Reviewed 3 of 3 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @aayushshah15, @AlexTalks, @kvoli, and @lidorcarmel)


pkg/kv/kvserver/allocator/allocatorimpl/allocator.go line 98 at r3 (raw file):

var enableRecoverToGoodEnoughStores = settings.RegisterBoolSetting(
	settings.SystemOnly,
	"kv.allocator.recover_to_good_enough_stores.enabled",

nit: does this setting (and its name) make sense? Perhaps we should have a setting like kv.allocator.recovery_store_selector, where the options are BEST, GOOD, or ANY. I worry a bit about the "good enough" terminology, but if it makes sense to everyone I won't block on this, of course.
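
For reference, an enum-style setting along the lines suggested here could be registered roughly like this (a sketch using CockroachDB's settings package; the description, default, and enum values are assumptions, not the code that ultimately merged):

```go
package allocatorimpl

import "github.com/cockroachdb/cockroach/pkg/settings"

// storeSelector and its constants are hypothetical names for this sketch.
type storeSelector int64

const (
	bestSelector storeSelector = iota
	goodSelector
)

var recoveryStoreSelector = settings.RegisterEnumSetting(
	settings.SystemOnly,
	"kv.allocator.recovery_store_selector",
	"the strategy used to pick the target store for a recovering replica",
	"good",
	map[int64]string{
		int64(bestSelector): "best",
		int64(goodSelector): "good",
	},
)
```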


pkg/kv/kvserver/allocator/allocatorimpl/allocator_scorer.go line 897 at r3 (raw file):

// selectWorst randomly chooses one of the worst candidate stores from a sorted
// (by score reversed) candidate list using the provided random generator.
func (cl candidateList) selectWorst(randGen allocatorRand) *candidate {

FYI - just curious, what is this used for?
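
For context, a minimal sketch of what such a helper could look like, reusing the hypothetical candidate type and free-function style from the sketch above (the real method lives on the allocator's candidateList and uses its allocatorRand):

```go
// selectWorst assumes cands is sorted by score reversed (worst first) and
// picks randomly among the candidates tied for the lowest score.
func selectWorst(r *rand.Rand, cands []candidate) candidate {
	n := 1
	for n < len(cands) && cands[n].score == cands[0].score {
		n++
	}
	return cands[r.Intn(n)]
}
```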

@aayushshah15 (Contributor) left a comment

:lgtm_strong: Apologies for my delay.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @AlexTalks, @kvoli, and @lidorcarmel)


pkg/kv/kvserver/allocator/allocatorimpl/allocator_test.go line 714 at r3 (raw file):

				nil,
				nil,
				Dead, // Dead and Decommissioning should behave the same here

nit: can we randomly assign Dead or Decommissioning here to assert that they do indeed behave the same, just to prevent future regressions?

@kvoli kvoli added the release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. label Aug 24, 2022
In preparation for adding a new selection function for a good enough
candidate, rename the existing "good" to "best".

Release note: None
@lidorcarmel force-pushed the lidor_allocation_to_good_enough branch from 16d4269 to 3ce0fd5 on August 24, 2022 at 22:07
@lidorcarmel (Contributor, Author) left a comment

No problem at all, thanks both! PTAL.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @AlexTalks and @kvoli)


pkg/kv/kvserver/allocator/allocatorimpl/allocator.go line 98 at r3 (raw file):

Previously, AlexTalks (Alex Sarkesian) wrote…

nit: does this setting (and its name) make sense? Perhaps we should have a setting like kv.allocator.recovery_store_selector, where the options are BEST, GOOD, or ANY. I worry a bit about the "good enough" terminology, but if it makes sense to everyone I won't block on this, of course.

Done.
I'm not too worried about the naming here because this setting shouldn't normally be used (unless we really broke something), and I'm not sure we'll ever have other strategies for selecting a destination store, so I thought a bool was good enough... but either way works (or, both names are good enough 😄).


pkg/kv/kvserver/allocator/allocatorimpl/allocator_scorer.go line 897 at r3 (raw file):

Previously, AlexTalks (Alex Sarkesian) wrote…

FYI - just curious, what is this used for?

When removing a replica, we want to remove the worst one.


pkg/kv/kvserver/allocator/allocatorimpl/allocator_test.go line 714 at r3 (raw file):

Previously, aayushshah15 (Aayush Shah) wrote…

nit: can we randomly assign Dead or Decommissioning here to assert that they do indeed behave the same, just to prevent future regressions?

Done (alternating between the two instead of choosing randomly).
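
The alternating pattern might look roughly like this (a sketch; runAllocatorTest, Dead, and Decommissioning are hypothetical stand-ins for the real test plumbing in allocator_test.go):

```go
for i, tc := range testCases {
	// Alternate Dead and Decommissioning across test cases: the expected
	// outcome is identical for both, so a regression in either code path
	// fails the test deterministically.
	status := Dead
	if i%2 == 1 {
		status = Decommissioning
	}
	runAllocatorTest(t, tc, status) // hypothetical helper
}
```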

@aayushshah15 (Contributor) left a comment

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @AlexTalks and @kvoli)

@AlexTalks (Contributor) left a comment

:lgtm:

Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained (and 1 stale) (waiting on @AlexTalks and @kvoli)


pkg/kv/kvserver/allocator/allocatorimpl/allocator.go line 98 at r3 (raw file):

Previously, lidorcarmel (Lidor Carmel) wrote…

Done.
I'm not too worried about the naming here because this setting shouldn't normally be used (unless we really broke something), and I'm not sure we'll ever have other strategies for selecting a destination store, so I thought a bool was good enough... but either way works (or, both names are good enough 😄).

LGTM


pkg/kv/kvserver/allocator/allocatorimpl/allocator_scorer.go line 897 at r3 (raw file):

Previously, lidorcarmel (Lidor Carmel) wrote…

When removing a replica, we want to remove the worst one.

OK, got it!

@lidorcarmel (Contributor, Author) left a comment

Thanks all for the review!
bors r+

Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained (and 1 stale) (waiting on @AlexTalks and @kvoli)

@craig (bot) commented Aug 25, 2022

Build failed (retrying...)

@craig (bot) commented Aug 25, 2022

Build failed (retrying...)

@craig (bot) commented Aug 26, 2022

Build succeeded

@craig bot merged commit 3688055 into cockroachdb:master on Aug 26, 2022
@lidorcarmel deleted the lidor_allocation_to_good_enough branch on August 26, 2022 at 03:52