Refactor and cleanup treatment of keyspace IDs and KeyRange #12524

jeremycole · 2023-03-01T03:32:01Z

Description

Initially, we encountered a bug where unevenly split shards with key ranges such as (... 000280-000300, 0003-) did not properly initialize shard 000280-000300, because it was believed to overlap with 0003-. (This was due to a comparison in which 000300 should have compared equal to 0003 but did not.)

In the course of fixing this bug, I decided to refactor and clean up the treatment of keyspace IDs and topodatapb.KeyRange values overall. The treatment of keyspace IDs was very inconsistent, resulting in many comparisons that did not properly handle the KeyRange-specific edge cases (mainly due to logic using bytes.Compare scattered across the codebase):

- The Start and End fields treat empty/zero values as the minimum key and maximum key, respectively.
- Start or End fields containing nil vs. a zero-length []byte are handled inconsistently.
- Functionally equal but differently represented values must be handled, primarily keys of different lengths, e.g. 80 vs. 8000. (Attempts were previously made to resolve this with addPadding, which proved incomplete. My initial fix, which we shipped and tested internally, added calls to addPadding everywhere keyspace IDs were passed; it was effective but very ugly and poorly factored.)
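Here is a rough sketch of the semantics listed above. It rests on a few assumptions. A local `KeyRange` struct stands in for the Start/End fields of topodatapb.KeyRange. The helpers `normalize` and `contains` are hypothetical, and `normalize` strips trailing zero bytes. None of these are the functions in this PR:

```go
package main

import (
	"bytes"
	"fmt"
)

// KeyRange is a local stand-in for the Start/End fields of
// topodatapb.KeyRange so that this sketch compiles on its own.
type KeyRange struct {
	Start []byte
	End   []byte
}

// normalize (hypothetical) strips trailing zero bytes, so 0x80 and 0x8000
// become identical and nil and []byte{} both become zero-length.
func normalize(id []byte) []byte {
	return bytes.TrimRight(id, "\x00")
}

// contains (hypothetical) reports whether id lies in [Start, End).
// An empty Start is the minimum key, an empty End is the maximum key,
// and a nil *KeyRange is treated as the full range.
func contains(kr *KeyRange, id []byte) bool {
	if kr == nil {
		return true
	}
	id, start, end := normalize(id), normalize(kr.Start), normalize(kr.End)
	afterStart := bytes.Compare(start, id) <= 0 // an empty start sorts first
	beforeEnd := len(end) == 0 || bytes.Compare(id, end) < 0
	return afterStart && beforeEnd
}

func main() {
	upper := &KeyRange{Start: []byte{0x80}}                // "80-"
	lower := &KeyRange{Start: []byte{}, End: []byte{0x40}} // "-40", zero-length (not nil) Start

	fmt.Println(contains(upper, []byte{0x80, 0x00})) // true: 8000 is the same key as 80
	fmt.Println(contains(upper, []byte{0x7f, 0xff})) // false
	fmt.Println(contains(lower, []byte{0x00}))       // true: 00 is the minimum key
	fmt.Println(contains(nil, []byte{0xff}))         // true: nil is the full range
}
```

Normalizing once at the top of each comparison keeps the nil vs. zero-length and 80 vs. 8000 cases out of every call site.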

In addition, I've added a number of tests, improved test coverage of much of the existing code, clarified existing comments and added new ones. All of the existing tests continue to pass, with one exception. That failure appears to be due to a bug in the original code that was then codified in the corresponding test:

Side quest: a bug fix for `TestKeyRangeContiguous`

TestKeyRangeContiguous tested "-" and "-40" and expected a result of true, but those key ranges are most definitely not contiguous: "-" is the complete range, so "-40" lies inside it rather than adjacent to it. The corresponding code contained the following logic:

	if left == nil {
		return right == nil || (len(right.Start) == 0 && len(right.End) == 0)
	}
	if right == nil {
		return len(left.Start) == 0 && len(left.End) == 0
	}

That code attempts to handle nil values, but it (inconsistently) treats nil as a missing value, whereas the rest of the code treats nil as the full range. I believe this function should as well. I've replaced it with new logic, which I believe to be more correct:

	if KeyRangeIsComplete(a) || KeyRangeIsComplete(b) {
		return false // no two KeyRange values can be contiguous if either is the complete range
	}

If there is some reason this should treat nil differently, let me know and I'll be happy to fix that and document it. However, there is only one call site, and it doesn't appear to require any special handling, so I believe this was just a misunderstanding.
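The following self-contained sketch, for illustration only, shows contiguity under the "nil means full range" convention. `KeyRange`, `isComplete` and `contiguous` are hypothetical local stand-ins for topodatapb.KeyRange, KeyRangeIsComplete and KeyRangeContiguous. Comparing normalized End/Start values is an assumption about how the remaining check works, not a copy of the PR's code:

```go
package main

import (
	"bytes"
	"fmt"
)

// KeyRange is a local stand-in for the Start/End fields of topodatapb.KeyRange.
type KeyRange struct {
	Start []byte
	End   []byte
}

// normalize strips trailing zero bytes so 40 and 4000 compare equal.
func normalize(id []byte) []byte {
	return bytes.TrimRight(id, "\x00")
}

// isComplete treats nil, or a range with empty Start and End, as the full range.
func isComplete(kr *KeyRange) bool {
	return kr == nil || (len(normalize(kr.Start)) == 0 && len(normalize(kr.End)) == 0)
}

// contiguous reports whether b starts exactly where a ends.
func contiguous(a, b *KeyRange) bool {
	if isComplete(a) || isComplete(b) {
		return false // the complete range has no neighbor to be contiguous with
	}
	end := normalize(a.End)
	// An empty End means "maximum key", so no range can follow a.
	return len(end) > 0 && bytes.Equal(end, normalize(b.Start))
}

func main() {
	full := &KeyRange{}                                               // "-"
	lower := &KeyRange{End: []byte{0x40}}                             // "-40"
	middle := &KeyRange{Start: []byte{0x40, 0x00}, End: []byte{0x80}} // "4000-80"

	fmt.Println(contiguous(full, lower))   // false: "-" and "-40" are not contiguous
	fmt.Println(contiguous(nil, lower))    // false: nil is also the full range
	fmt.Println(contiguous(lower, middle)) // true: 40 and 4000 are the same boundary
}
```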

Future work

There's still a fair bit of work to be done to clean this up; I believe at least the following would be reasonable:

- Introduce a new explicit type KeyspaceID or similar for keyspace IDs, and replace uses of []byte where possible.
  - Adjust the naming/namespacing and interfaces of the various functions operating on them to accommodate this.
  - Additionally, remove the type Uint64Key (which is only used in tests) in favor of the new type.
- Perhaps introduce a first-class KeyRange type, reorganize the various comparison functions, and handle both parsing and formatting of KeyRanges consistently in a single set of functions.
- Remove EvenShardsKeyRange and GenerateShardRanges, whose behavior is questionable at best and which are rarely used.
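Here is a rough sketch of what the proposed KeyspaceID type might look like. The type and its `Normalize`, `Compare` and `String` methods are hypothetical and do not exist in Vitess as of this PR:

```go
package main

import (
	"bytes"
	"encoding/hex"
	"fmt"
)

// KeyspaceID is a hypothetical named type for keyspace IDs.
type KeyspaceID []byte

// Normalize strips trailing zero bytes so equal IDs share one representation.
func (id KeyspaceID) Normalize() KeyspaceID {
	return KeyspaceID(bytes.TrimRight(id, "\x00"))
}

// Compare returns -1, 0, or +1, comparing the normalized IDs.
func (id KeyspaceID) Compare(other KeyspaceID) int {
	return bytes.Compare(id.Normalize(), other.Normalize())
}

// String formats the ID as lowercase hex, the notation used in shard names.
func (id KeyspaceID) String() string {
	return hex.EncodeToString(id)
}

func main() {
	a, b := KeyspaceID{0x80}, KeyspaceID{0x80, 0x00}
	fmt.Println(a.Compare(b)) // 0
	fmt.Println(a, b)         // 80 8000
}
```

Because the underlying type is still []byte, converting existing byte slices to and from KeyspaceID requires no copy. That could allow a gradual migration.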

I did not attempt to do any of these here to keep the PR size reasonable, but if there is some level of agreement (or at least a lack of disagreement) I'd be happy to undertake them.

Related Issue(s)

Checklist

"Backport to:" labels have been added if this change should be back-ported
Tests were added or are not required
Did the new or modified tests pass consistently locally and on the CI
Documentation was added or is not required

Deployment Notes

There should not be any user-facing impact from this change. It is purely a backend refactor plus bug fixes to existing functionality that was perhaps never exercised before.

Signed-off-by: Jeremy Cole <[email protected]>

vitess-bot · 2023-03-01T03:32:05Z

go/vt/key/key.go

harshit-gangal

This is a good change and I think it will resolve the issue related to key range matching.

mattlord

This is great! Nice work on this, @jeremycole ! ❤️ I only had some minor nits/comments/suggestions. Please let me know what you think.

deepthi · 2023-03-01T21:34:26Z

@jeremycole others have done a detailed review, so I just wanted to chime in to say that this is great work! The other proposed changes all seem to be good ones from a code readability / quality / maintainability point of view as well, so there is no objection to proceeding along those lines.

You did say in the description that you ran into a bug. Is there an issue for that which can be linked here? If not, do you mind creating one?
It will also be good to create an issue for the proposed improvements to link to an eventual PR.

harshit-gangal · 2023-03-01T22:15:43Z

go/vt/key/key.go

+func Normalize(id []byte) []byte {
+	if len(id) >= 8 {
+		return id[:8]


One breaking change we should discuss is restricting keyspace IDs to an 8-byte length, which the current implementation does not do.
The current implementation supports keyspace IDs of arbitrary length, depending on the Vindex.

An example of longer-than-8-byte keyspace IDs can be seen here (link courtesy @harshit-gangal)

Yeah, I (also) mistakenly thought they were always 64 bits, but that's only the length of keyspace IDs produced by the default hash vindex. Some are shorter and others longer. You can see examples of how many hex digits each vindex type produces by using the vindex query functions: https://vitess.io/docs/17.0/reference/features/vindexes/#query-vindex-functions

I was also pointed to this blog post, which I don't remember reading before: https://vitess.io/blog/2017-09-18-custom-sharding-with-vitess/

I see what you mean. I'm not sure anyone actually uses that, but I'm quite okay with not changing the semantics here. Padding out to a fixed length can't satisfy the requirements without defining a maximum length. Instead, we can change the Normalize method to consistently remove all trailing zeroes.

I made that change locally and it works fine. All tests pass as before except TestNormalize, which is expected since it explicitly checks the return values of Normalize. I'll push another commit for this.
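Below is a minimal sketch of the trailing-zero approach. It assumes the same `Normalize(id []byte) []byte` signature as the excerpt quoted in this review, plus a hypothetical `Compare` helper. It illustrates the idea rather than reproducing the pushed commit:

```go
package main

import (
	"bytes"
	"encoding/hex"
	"fmt"
)

// Normalize removes trailing zero bytes from id without imposing a maximum
// length, so IDs of any length are preserved while 0003 and 000300 become equal.
func Normalize(id []byte) []byte {
	n := len(id)
	for n > 0 && id[n-1] == 0x00 {
		n--
	}
	return id[:n]
}

// Compare (hypothetical) compares two keyspace IDs after normalizing both.
func Compare(a, b []byte) int {
	return bytes.Compare(Normalize(a), Normalize(b))
}

func main() {
	x, _ := hex.DecodeString("000300")
	y, _ := hex.DecodeString("0003")
	fmt.Println(Compare(x, y)) // 0: the boundary from the original bug now matches

	long, _ := hex.DecodeString("0102030405060708090a")
	fmt.Println(hex.EncodeToString(Normalize(long))) // 0102030405060708090a: not truncated
}
```

Unlike padding, stripping needs no agreed-upon width, so it works for vindexes that emit IDs of any length.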

jeremycole · 2023-03-02T01:33:38Z

Thanks for the prompting, I filed #12535 with the actual bug we hit that caused us to go down this rabbit hole. :)

jeremycole · 2023-03-07T00:26:02Z

👋 I think I have addressed all review feedback with the last commits pushed. Sorry for adding everyone as reviewers, every time I touch a new file it seems we get a new reviewer added automatically. 😄

I broke up the additional work into several separate commits so that those changes can more easily be reviewed in isolation if you so desire. Let me know anything else you need! (And we'll see how the full CI runs go.)

deepthi · 2023-03-08T00:06:27Z

Looks like you broke something, both unit_test and local example are failing. On the plus side, they should be easy to reproduce locally!

jeremycole · 2023-03-08T04:16:02Z

The failure in unit_test was a flake unrelated to my changes. It seems to be a timeout or something similar in the gRPC code, resulting in FAIL: TestMtlsAuth (60.11s). It succeeds locally.

I haven't quite gotten the local_example to work locally, but in the meantime would y'all mind rerunning those tests?

deepthi · 2023-03-08T17:13:48Z

Looks like the local example failure is a flake as well. I see it failing on #12566, a PR that touches no code here.
https://github.com/vitessio/vitess/actions/runs/4356176371/jobs/7613821315

deepthi · 2023-03-08T17:17:57Z

@jeremycole I re-ran the 3 failing tests. I've also invited you to the org, so that in future you will have the ability to do this too. Invite goes to your primary GH email.

jeremycole · 2023-03-09T16:06:51Z

@mattlord does this address your comments and concerns?

jeremycole · 2023-03-21T22:03:48Z

👋 We're still holding on this but using it in our internal branch. Are there any remaining concerns about it?

mattlord · 2023-03-22T06:35:25Z

Hey! Sorry for the delay. I'll add this review to my ToDo for tomorrow.

mattlord

Nice work on this, @jeremycole ! The quality of the code, comments, PR description, issue etc are all very good. ❤️

jeremycole added 3 commits February 28, 2023 14:50

Refactor and cleanup treatment of keyspace IDs and KeyRange

cf0025c

Signed-off-by: Jeremy Cole <[email protected]>

Address internal review comments

a1b92d3

Signed-off-by: Jeremy Cole <[email protected]>

Fix apparent bug in KeyRangeContiguous when a or b are full-range

9db3829

Signed-off-by: Jeremy Cole <[email protected]>

jeremycole requested review from deepthi, mattlord, rohit-nayak-ps and rsajwani as code owners March 1, 2023 03:32

vitess-bot added the NeedsDescriptionUpdate and NeedsWebsiteDocsUpdate labels Mar 1, 2023

mattlord added the Type: Enhancement and Component: Query Serving labels Mar 1, 2023

harshit-gangal removed the NeedsDescriptionUpdate and NeedsWebsiteDocsUpdate labels Mar 1, 2023

harshit-gangal reviewed Mar 1, 2023

go/vt/key/key.go (outdated, resolved)

harshit-gangal approved these changes Mar 1, 2023

mattlord reviewed Mar 1, 2023

harshit-gangal requested changes Mar 1, 2023

Add test for bug in comparing "0003" vs "000300"

e34d240

Signed-off-by: Hormoz Kheradmand <[email protected]>

jeremycole added 2 commits March 1, 2023 18:17

Remove trailing zeroes in key.Normalize instead of adding padding

652179c

Signed-off-by: Jeremy Cole <[email protected]>

Address review feedback; test formatting, comments, function naming

3135522

Signed-off-by: Jeremy Cole <[email protected]>

jeremycole requested review from systay, shlomi-noach, GuptaManan100 and ajm188 as code owners March 2, 2023 04:47

jeremycole added 3 commits March 6, 2023 13:21

Refactor tests for TestKeyRangesIntersect

06f1c5f

Signed-off-by: Jeremy Cole <[email protected]>

Rename KeyRangesIntersect to KeyRangeIntersect for consistency

b4579a8

Signed-off-by: Jeremy Cole <[email protected]>

Remove unused KeyRangesOverlap function

d45b141

Signed-off-by: Jeremy Cole <[email protected]>

Rename KeyRangeIncludes to KeyRangeContainsKeyRange, clean up and add tests

a76e7f8

Signed-off-by: Jeremy Cole <[email protected]>

jeremycole requested a review from frouioui as a code owner March 7, 2023 00:22

harshit-gangal approved these changes Mar 7, 2023

mattlord self-requested a review March 22, 2023 06:57

mattlord approved these changes Mar 23, 2023

Merge remote-tracking branch 'origin/main' into refactor_keyspace_keyrange_20230216

f2046d1

mattlord merged commit 47f7234 into vitessio:main Mar 24, 2023
