Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix format of the keys to support startUid. #3310

Merged
merged 9 commits into from
May 16, 2019
Merged

Conversation

martinmr
Copy link
Contributor

@martinmr martinmr commented Apr 23, 2019

Currently only reverse keys and data keys correctly understand the
concept of a startUid. This change adds an extra byte of metadata that
declares whether the last eight bytes of the key correspond to a start
uid so that all the required types of keys can understand startUids.

Added more unit tests, including tests for count keys, which previously
were missing.


This change is Reviewable

Currently only reverse keys and data keys correctly understand the
concept of a startUid. This change adds an extra byte of metadata that
declares whether the last eight bytes of the key correspond to a start
uid so that all the required types of keys can understand startUids.

Added more unit tests, including tests for count keys, which previously
were missing.
@martinmr martinmr requested a review from a team April 23, 2019 01:08
Copy link
Contributor

@manishrjain manishrjain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: Got a couple of comments. Get @mangalaman93 or @gitlw to review this as well.

Reviewed 3 of 3 files at r1.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @martinmr)


x/keys.go, line 366 at r1 (raw file):

// GetSplitKey takes a key baseKey and generates the key of the list split that starts at startUid.
func GetSplitKey(baseKey []byte, startUid uint64) []byte {

Add tests to test for schema key, type key, count key and all others.


x/keys.go, line 370 at r1 (raw file):

	copy(keyCopy, baseKey)

	p := Parse(baseKey)

if len(keyCopy) <= index { panic("...") }


x/keys.go, line 444 at r1 (raw file):

	case ByteCount, ByteCountRev:
		if len(k) < 4 {
			if Config.DebugMode {

We need to get rid of these Config.DebugMode things. We can make fmt.Printfs glog.V(2) or something. Maybe another change where you can go over all these DebugMode things.

func GetSplitKey(baseKey []byte, startUid uint64) []byte {
keyCopy := make([]byte, len(baseKey)+8)
copy(keyCopy, baseKey)

p := Parse(baseKey)
index := 1 + 2 + len(p.Attr) + 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assumes that all the keys have similar structure, i.e. -

// byte 0: key type prefix (set to DefaultPrefix)
// byte 1-2: length of attr
// next len(attr) bytes: value of attr
// next byte: data type prefix
// next byte: byte to determine if this key corresponds to a list that has been split
//   into multiple parts

If that is the case, it will be useful to extract the logic of building the kyes from functions such as IndexKey, DataKey, CountKey into single function, instead of repeating it everywhere in each function.

k = k[8:]
if len(k) < 8 {

if len(k) < 16 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The print comment do not match with the condition.


if len(k) < 12 {
if Config.DebugMode {
fmt.Printf("Error: StartUid length < 8 for key: %q, parsed key: %+v\n", key, p)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

@mangalaman93
Copy link
Contributor

Seems to have a conflict too, please check

Copy link

@gitlw gitlw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 11 unresolved discussions (waiting on @martinmr)


posting/index.go, line 440 at r1 (raw file):

	// Also delete all the parts of any list that has been split into multiple parts.
	// Such keys have a different prefix (the last byte is set to 1).
	prefix = pk.IndexPrefix()

I feel it's more modular to change the IndexPrefix signature so that it takes an argument to specify the bytesplit flag. And the same argument for the ReversePrefix and CountPrefix methods.


x/keys.go, line 183 at r1 (raw file):

//   into multiple parts
// next four bytes: value of count.
// next eight bytes (optional): if the key corresponds to a split list, the startUid of

Does it make sense for the count key to have multiple split lists? I think it just contains a single number.


x/keys.go, line 371 at r1 (raw file):

	p := Parse(baseKey)
	index := 1 + 2 + len(p.Attr) + 1

Instead of calculating the index on the fly, I feel it's better to have a constant to represent offset of the ByteSplit.


x/keys_test.go, line 59 at r1 (raw file):

	startUid := uint64(math.MaxUint64)
	for uid = 0; uid < 1001; uid++ {
		sattr := fmt.Sprintf("attr:%d", uid)

It's confusing to see the attr have a number encoded, and keeps changing. Maybe remove this fmt.Sprintf and use "attr" directly as the attribute?


x/keys_test.go, line 91 at r1 (raw file):

	startUid := uint64(math.MaxUint64)
	for uid = 0; uid < 1001; uid++ {
		sattr := fmt.Sprintf("attr:%d", uid)

ditto

Copy link
Contributor Author

@martinmr martinmr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 11 unresolved discussions (waiting on @gitlw, @mangalaman93, and @manishrjain)


posting/index.go, line 440 at r1 (raw file):

Previously, gitlw (Lucas Wang) wrote…

I feel it's more modular to change the IndexPrefix signature so that it takes an argument to specify the bytesplit flag. And the same argument for the ReversePrefix and CountPrefix methods.

Won't do. I made IndexPrefix and all the others return the key with byteSplit set to zero for safety (usually these methods are called in contexts where only the main part of the posting list needs to be accessed, not any of the parts). Parts of the code that need to consider split keys should explicitly call this method.

I could have something like IndexKey() and IndexKey(split bool) but golang won't allow me to override methods.


x/keys.go, line 183 at r1 (raw file):

Previously, gitlw (Lucas Wang) wrote…

Does it make sense for the count key to have multiple split lists? I think it just contains a single number.

Good catch. I'll still keep the split byte for consistency but I'll explain that for count keys it will always be zero.


x/keys.go, line 366 at r1 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Add tests to test for schema key, type key, count key and all others.

I forgot at the time but there's already methods for this in keys_test.go


x/keys.go, line 370 at r1 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

if len(keyCopy) <= index { panic("...") }

Done.


x/keys.go, line 371 at r1 (raw file):

Previously, gitlw (Lucas Wang) wrote…

Instead of calculating the index on the fly, I feel it's better to have a constant to represent offset of the ByteSplit.

The offset is not a constant since it depends on the length of attr. This would require have another field in ParsedKey which I'd prefer not to do.


x/keys.go, line 371 at r1 (raw file):

Previously, mangalaman93 (Aman Mangal) wrote…

This assumes that all the keys have similar structure, i.e. -

// byte 0: key type prefix (set to DefaultPrefix)
// byte 1-2: length of attr
// next len(attr) bytes: value of attr
// next byte: data type prefix
// next byte: byte to determine if this key corresponds to a list that has been split
//   into multiple parts

If that is the case, it will be useful to extract the logic of building the kyes from functions such as IndexKey, DataKey, CountKey into single function, instead of repeating it everywhere in each function.

Done. I have put the logic to generate the first part of the list into a separate method.


x/keys.go, line 416 at r1 (raw file):

Previously, mangalaman93 (Aman Mangal) wrote…

The print comment do not match with the condition.

It does. The condition is that whatever is left has at least 8 bytes for the uid and 8 bytes for the startUid.


x/keys.go, line 444 at r1 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

We need to get rid of these Config.DebugMode things. We can make fmt.Printfs glog.V(2) or something. Maybe another change where you can go over all these DebugMode things.

Done. I've added a todo to handle this in another PR.


x/keys.go, line 457 at r1 (raw file):

Previously, mangalaman93 (Aman Mangal) wrote…

same here

Same as above. This is checking the total size of whatever is left of the key to process. In this case we need four bytes for the count and eight for the start uid.


x/keys_test.go, line 59 at r1 (raw file):

Previously, gitlw (Lucas Wang) wrote…

It's confusing to see the attr have a number encoded, and keeps changing. Maybe remove this fmt.Sprintf and use "attr" directly as the attribute?

I kind of copied this from other tests but I think the original intent was to make attr have variable length in order to test that the length is properly encoded.
That test is more thorough than having the attribute have the same length all the time. I'll add a comment here to explain.


x/keys_test.go, line 91 at r1 (raw file):

Previously, gitlw (Lucas Wang) wrote…

ditto

See above.

Copy link
Contributor Author

@martinmr martinmr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved the conflicts

Reviewable status: all files reviewed, 11 unresolved discussions (waiting on @gitlw, @mangalaman93, and @manishrjain)

x/keys.go Outdated Show resolved Hide resolved
bytePrefix byte
byteType byte
Attr string
Uid uint64

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

struct field Uid should be UID (from golint)

byteType byte
Attr string
Uid uint64
HasStartUid bool

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

struct field HasStartUid should be HasStartUID (from golint)

Attr string
Uid uint64
HasStartUid bool
StartUid uint64

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

struct field StartUid should be StartUID (from golint)

p.Term = string(k)

term := k[:len(k)-8]
startUid := k[len(k)-8:]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

var startUid should be startUID (from golint)

@martinmr martinmr merged commit c0a78ec into master May 16, 2019
@martinmr martinmr deleted the martinmr/fix-keys branch May 16, 2019 00:04
@mangalaman93
Copy link
Contributor

This is already merged? I still see comments from golangcibot unaddressed, and I didn't even have a chance to look at it again.

@martinmr
Copy link
Contributor Author

The lint comments are about renaming Uid to UID. We ignore those errors.

dna2github pushed a commit to dna2fork/dgraph that referenced this pull request Jul 19, 2019
Currently only reverse keys and data keys correctly understand the
concept of a startUid. This change adds an extra byte of metadata that
declares whether the last eight bytes of the key correspond to a start
uid so that all the required types of keys can understand startUids.

Added more unit tests, including tests for count keys, which previously
were missing.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

5 participants