-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix format of the keys to support startUid. #3310
Conversation
Currently only reverse keys and data keys correctly understand the concept of a startUid. This change adds an extra byte of metadata that declares whether the last eight bytes of the key correspond to a start uid so that all the required types of keys can understand startUids. Added more unit tests, including tests for count keys, which previously were missing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got a couple of comments. Get @mangalaman93 or @gitlw to review this as well.
Reviewed 3 of 3 files at r1.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @martinmr)
x/keys.go, line 366 at r1 (raw file):
// GetSplitKey takes a key baseKey and generates the key of the list split that starts at startUid. func GetSplitKey(baseKey []byte, startUid uint64) []byte {
Add tests to test for schema key, type key, count key and all others.
x/keys.go, line 370 at r1 (raw file):
copy(keyCopy, baseKey) p := Parse(baseKey)
if len(keyCopy) <= index { panic("...") }
x/keys.go, line 444 at r1 (raw file):
case ByteCount, ByteCountRev: if len(k) < 4 { if Config.DebugMode {
We need to get rid of these Config.DebugMode things. We can make fmt.Printfs glog.V(2)
or something. Maybe another change where you can go over all these DebugMode things.
func GetSplitKey(baseKey []byte, startUid uint64) []byte { | ||
keyCopy := make([]byte, len(baseKey)+8) | ||
copy(keyCopy, baseKey) | ||
|
||
p := Parse(baseKey) | ||
index := 1 + 2 + len(p.Attr) + 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This assumes that all the keys have similar structure, i.e. -
// byte 0: key type prefix (set to DefaultPrefix)
// byte 1-2: length of attr
// next len(attr) bytes: value of attr
// next byte: data type prefix
// next byte: byte to determine if this key corresponds to a list that has been split
// into multiple parts
If that is the case, it will be useful to extract the logic of building the kyes from functions such as IndexKey
, DataKey
, CountKey
into single function, instead of repeating it everywhere in each function.
k = k[8:] | ||
if len(k) < 8 { | ||
|
||
if len(k) < 16 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The print comment do not match with the condition.
|
||
if len(k) < 12 { | ||
if Config.DebugMode { | ||
fmt.Printf("Error: StartUid length < 8 for key: %q, parsed key: %+v\n", key, p) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here
Seems to have a conflict too, please check |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 11 unresolved discussions (waiting on @martinmr)
posting/index.go, line 440 at r1 (raw file):
// Also delete all the parts of any list that has been split into multiple parts. // Such keys have a different prefix (the last byte is set to 1). prefix = pk.IndexPrefix()
I feel it's more modular to change the IndexPrefix signature so that it takes an argument to specify the bytesplit flag. And the same argument for the ReversePrefix and CountPrefix methods.
x/keys.go, line 183 at r1 (raw file):
// into multiple parts // next four bytes: value of count. // next eight bytes (optional): if the key corresponds to a split list, the startUid of
Does it make sense for the count key to have multiple split lists? I think it just contains a single number.
x/keys.go, line 371 at r1 (raw file):
p := Parse(baseKey) index := 1 + 2 + len(p.Attr) + 1
Instead of calculating the index on the fly, I feel it's better to have a constant to represent offset of the ByteSplit.
x/keys_test.go, line 59 at r1 (raw file):
startUid := uint64(math.MaxUint64) for uid = 0; uid < 1001; uid++ { sattr := fmt.Sprintf("attr:%d", uid)
It's confusing to see the attr have a number encoded, and keeps changing. Maybe remove this fmt.Sprintf and use "attr" directly as the attribute?
x/keys_test.go, line 91 at r1 (raw file):
startUid := uint64(math.MaxUint64) for uid = 0; uid < 1001; uid++ { sattr := fmt.Sprintf("attr:%d", uid)
ditto
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 11 unresolved discussions (waiting on @gitlw, @mangalaman93, and @manishrjain)
posting/index.go, line 440 at r1 (raw file):
Previously, gitlw (Lucas Wang) wrote…
I feel it's more modular to change the IndexPrefix signature so that it takes an argument to specify the bytesplit flag. And the same argument for the ReversePrefix and CountPrefix methods.
Won't do. I made IndexPrefix and all the others return the key with byteSplit set to zero for safety (usually these methods are called in contexts where only the main part of the posting list needs to be accessed, not any of the parts). Parts of the code that need to consider split keys should explicitly call this method.
I could have something like IndexKey() and IndexKey(split bool) but golang won't allow me to override methods.
x/keys.go, line 183 at r1 (raw file):
Previously, gitlw (Lucas Wang) wrote…
Does it make sense for the count key to have multiple split lists? I think it just contains a single number.
Good catch. I'll still keep the split byte for consistency but I'll explain that for count keys it will always be zero.
x/keys.go, line 366 at r1 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
Add tests to test for schema key, type key, count key and all others.
I forgot at the time but there's already methods for this in keys_test.go
x/keys.go, line 370 at r1 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
if len(keyCopy) <= index { panic("...") }
Done.
x/keys.go, line 371 at r1 (raw file):
Previously, gitlw (Lucas Wang) wrote…
Instead of calculating the index on the fly, I feel it's better to have a constant to represent offset of the ByteSplit.
The offset is not a constant since it depends on the length of attr. This would require have another field in ParsedKey which I'd prefer not to do.
x/keys.go, line 371 at r1 (raw file):
Previously, mangalaman93 (Aman Mangal) wrote…
This assumes that all the keys have similar structure, i.e. -
// byte 0: key type prefix (set to DefaultPrefix) // byte 1-2: length of attr // next len(attr) bytes: value of attr // next byte: data type prefix // next byte: byte to determine if this key corresponds to a list that has been split // into multiple parts
If that is the case, it will be useful to extract the logic of building the kyes from functions such as
IndexKey
,DataKey
,CountKey
into single function, instead of repeating it everywhere in each function.
Done. I have put the logic to generate the first part of the list into a separate method.
x/keys.go, line 416 at r1 (raw file):
Previously, mangalaman93 (Aman Mangal) wrote…
The print comment do not match with the condition.
It does. The condition is that whatever is left has at least 8 bytes for the uid and 8 bytes for the startUid.
x/keys.go, line 444 at r1 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
We need to get rid of these Config.DebugMode things. We can make fmt.Printfs
glog.V(2)
or something. Maybe another change where you can go over all these DebugMode things.
Done. I've added a todo to handle this in another PR.
x/keys.go, line 457 at r1 (raw file):
Previously, mangalaman93 (Aman Mangal) wrote…
same here
Same as above. This is checking the total size of whatever is left of the key to process. In this case we need four bytes for the count and eight for the start uid.
x/keys_test.go, line 59 at r1 (raw file):
Previously, gitlw (Lucas Wang) wrote…
It's confusing to see the attr have a number encoded, and keeps changing. Maybe remove this fmt.Sprintf and use "attr" directly as the attribute?
I kind of copied this from other tests but I think the original intent was to make attr have variable length in order to test that the length is properly encoded.
That test is more thorough than having the attribute have the same length all the time. I'll add a comment here to explain.
x/keys_test.go, line 91 at r1 (raw file):
Previously, gitlw (Lucas Wang) wrote…
ditto
See above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Resolved the conflicts
Reviewable status: all files reviewed, 11 unresolved discussions (waiting on @gitlw, @mangalaman93, and @manishrjain)
bytePrefix byte | ||
byteType byte | ||
Attr string | ||
Uid uint64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
struct field Uid
should be UID
(from golint
)
byteType byte | ||
Attr string | ||
Uid uint64 | ||
HasStartUid bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
struct field HasStartUid
should be HasStartUID
(from golint
)
Attr string | ||
Uid uint64 | ||
HasStartUid bool | ||
StartUid uint64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
struct field StartUid
should be StartUID
(from golint
)
p.Term = string(k) | ||
|
||
term := k[:len(k)-8] | ||
startUid := k[len(k)-8:] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
var startUid
should be startUID
(from golint
)
This is already merged? I still see comments from golangcibot unaddressed, and I didn't even have a chance to look at it again. |
The lint comments are about renaming Uid to UID. We ignore those errors. |
Currently only reverse keys and data keys correctly understand the concept of a startUid. This change adds an extra byte of metadata that declares whether the last eight bytes of the key correspond to a start uid so that all the required types of keys can understand startUids. Added more unit tests, including tests for count keys, which previously were missing.
Currently only reverse keys and data keys correctly understand the
concept of a startUid. This change adds an extra byte of metadata that
declares whether the last eight bytes of the key correspond to a start
uid so that all the required types of keys can understand startUids.
Added more unit tests, including tests for count keys, which previously
were missing.
This change is