refactor: Fetcher filter and field optimization #1500
Conversation
planner/scan.go
Outdated
// Keeping for reference as there is an outstanding
// question.
//
// Don't need to run filter here anymore since the
// fetcher will run the filter for us
//
// Question: How do we want to handle `info.filterMatches`
// now? @shahzadlone
/*
answer: Technically if `len(n.currentValue.Fields) != 0`, shouldn't the state still be the same (i.e. a doc match was found)? It's not explicitly a filter match, but it would be fine to keep it anywhere after the `if` and before the `return true, nil`.
Also, open to changing the name if you have any better ideas to represent a "successful document found while scanning / fetching, which matched / passed the filters".
Leaving some early comments as I know you are actively working on this and looking for feedback ASAP. As mentioned over Discord, I really like the core approach.
@@ -115,6 +115,18 @@ func (sd SchemaDescription) GetFieldKey(fieldName string) uint32 {
	return uint32(0)
}

// GetField returns the field of the given name.
func (sd SchemaDescription) GetField(name string) (FieldDescription, bool) {
todo: There is already such a function on `CollectionDescription` - I suggest refactoring that one to just call this one.
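To illustrate the suggested refactor, here is a minimal sketch with simplified stand-in types (the real `client` package types have more fields): the collection-level `GetField` simply delegates to the schema-level helper shown in the diff above.

```go
package main

import "fmt"

// Simplified stand-ins for the client package types discussed above.
type FieldDescription struct {
	Name string
	ID   uint32
}

type SchemaDescription struct {
	Fields []FieldDescription
}

// GetField returns the field of the given name, mirroring the new
// schema-level helper in the diff.
func (sd SchemaDescription) GetField(name string) (FieldDescription, bool) {
	for _, field := range sd.Fields {
		if field.Name == name {
			return field, true
		}
	}
	return FieldDescription{}, false
}

type CollectionDescription struct {
	Schema SchemaDescription
}

// The suggested refactor: the collection-level lookup just delegates,
// instead of duplicating the loop.
func (col CollectionDescription) GetField(name string) (FieldDescription, bool) {
	return col.Schema.GetField(name)
}

func main() {
	col := CollectionDescription{Schema: SchemaDescription{
		Fields: []FieldDescription{{Name: "age", ID: 1}},
	}}
	field, ok := col.GetField("age")
	fmt.Println(field.ID, ok)
}
```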
db/fetcher/fetcher.go
Outdated
schemaFields map[uint32]client.FieldDescription
fields []*client.FieldDescription
filter *mapper.Filter
mapping *core.DocumentMapping
todo: This does not appear to be used and should be removed
@@ -178,14 +183,32 @@ func convertToInt(propertyName string, untypedValue any) (int64, error) {

// @todo: Implement Encoded Document type
type encodedDocument struct {
	mapping *core.DocumentMapping
suggestion: This is one change that I am really not a fan of. I was under the impression that fetcher was something that 'transcended' the planner system, yet here we are making a planner/mapper type a property on a core fetcher type (instead of a param to a couple of planner-centric funcs).
IIRC the only reason `DocumentMapping` is in the `core` package is because it had to be referenced by `fetcher.FetchNextDoc` and `encodedDocument.DecodeToDoc`. It is very coupled to the planner system, and if you really want it as a prop on a fetcher package type, I would instead suggest that we move fetcher out of `db` and into `planner` - I don't like that very much either, but here we are increasing the level of coupling between the two.
Might be worth having a call about this one.
It's a hard call; I'd say you're both right and wrong here.
It's needed here, as you note, because of `fetcher.FetchNextDoc`, which returns a `core.Doc` type, which is used by the planner. In reality, the `DocumentMapping` and `core.Doc` could really be used anywhere, as it's conceptually only tied to the `CollectionDescription`.
I was almost going to hoist it into the description just to see what that would look like, as that would mean a `core.Doc` could be used anywhere. There's nothing specific `core.Doc` has to the planner. The goal of it is to have an efficient array-based representation for a document, instead of relying on a `map[string]any` everywhere. It's primarily used in the planner, but as there are other places where document handling needs to be refactored (#935), it could be the case that `core.Doc` ends up used everywhere.
But that's a bigger conversation that we don't need to get into.
Bottom line: the fetcher now needs to be able to run a filter evaluation. It can use the old style of filter evaluation that was used before the mapper existed, but then we would be doing double work to decode the document: once to a `map[string]any` to run with the old version of `parser.RunFilter`, and once to return for `FetchNextDoc`.
Additionally, when `mapper` was implemented, the `connor` package was migrated to use `map[FilterKey]any` instead of `map[string]any`, which is generated in `mapper/targetable.go`.
So it seems it's somewhat bound to use the mapper system due to the filter requirements from within the fetcher.
Open to suggestions.
One reason to avoid putting `fetcher` in `planner` is that it can be run entirely on its own, as you can see in `db/fetcher_test.go`, and from within the Collection API, which uses the fetcher.
Happy to have a call 📞
We spoke about this a while back, and I don't think we'll see eye-to-eye on this any time soon :) Author view to trump reviewer (ignoring CTO status ;)), so marking this convo as resolved.
I have a plan for the fetcher that I want to try out in the near future anyway, which will pretty much bypass this conflict entirely if it works out.
}

func (encdoc *encodedDocument) decodeToDoc(filter bool) (core.Doc, error) {
	if encdoc.mapping == nil {
related to my comment on the `mapping` prop: I see the fact that we have to check this here (in the hot/usage path) and potentially error as a symptom of a bad constructor/type.
The cost of checking this is minuscule compared to the cost of decoding and running the filter.
Same answer as my thoughts above: the fetcher now needs to run filters, filters need a `core.Doc`, which you can only get by using a `core.DocumentMapping`.
If we use `map[string]any` to run the filter on the fetcher (like we did in pre-mapper days) then we can remove the dependency on the mapping stuff, but then we double the decoding work.
I'm much less worried about performance here - I flagged this as it is an error path that we do not/should not need in the 'doing' part of the code.
If we need to use a pointer's value within a function that doesn't define it, then we need to check that it's not nil. I feel like it makes sense to do that here.
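As a tiny sketch of the point being made (the names here are hypothetical stand-ins, not the real fetcher API): a pointer received from elsewhere is checked once before use, and the function returns the package's "missing document mapper" error rather than panicking.

```go
package main

import (
	"errors"
	"fmt"
)

// DocumentMapping is a hypothetical stand-in for core.DocumentMapping.
type DocumentMapping struct{}

// decode mirrors the decodeToDoc guard: the mapping pointer is set
// elsewhere (at fetcher init), so it is checked before first use.
func decode(mapping *DocumentMapping) error {
	if mapping == nil {
		return errors.New("missing document mapper")
	}
	// ... decoding work would happen here ...
	return nil
}

func main() {
	fmt.Println(decode(nil))                // guard fires
	fmt.Println(decode(&DocumentMapping{})) // decodes without error
}
```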
I'm much less worried about performance here - I flagged this as it is an error path that we do not/should not need in the 'doing' part of the code

This doesn't address my point that we must do this here; I see no way around this. Moreover, this is based on how you wrote the mapper/filter/doc.
Tangentially, your noted objection to these `x != nil` checks comes up all the time in reviews, and we keep reminding you that it's as common and idiomatic as it gets for Go code.
This chain was only marked as a suggestion, so I guess we can assume that it is staying as is.
Tangentially, I think the insular dogma of 'idiomatic Go' is far worse than any deficiencies in the language, and it very much seems to prevent the language from evolving along with the rest of the programming profession. This problem has been solved for a very long time in pretty much every other major language, yet the Go community is locked by this dogma into relying on the old-fashioned 'everything that is a pointer is optional and you have no way of knowing if it really exists' mentality.
If you make this a function parameter it will never be nil, and it will be pretty obvious even if we refuse to unsubscribe from such dogma, as there are only a couple of references to it (the func, and the parent func). Even if you keep the (dead) check in, it will be easier to read the code paths and see that it is dead code.
@jsimnz It felt like most of your comment was directed at me, not the suggestion itself - I hope I have explained why I raised this comment, and why I will continue to do so.
Even if it's a function parameter it can still be nil; it doesn't change anything.
In fact, it is a function parameter, just at a different function / call site used to initialize the fetcher.
It should still be checked.
--
Regarding the idiomatic point: ultimately it doesn't matter what other languages do. This is Go, and Go convention wins. The language authors and core maintainers have been very explicit that the goals of the language are not to be fancy, and will often ignore more modern PL techniques and research in an attempt at "simplicity" and clarity.
The pointer has to be checked regardless. Putting it in something like a homegrown result or option container doesn't change that, nor does making it a function argument.
My comment was directed at the historical nature of this thread, as it comes up often, and the response is often the same on both ends.
PS. Word for word, my previous response was 50% directed at you lol. The first half of the message was at your suggestion, as I pointed out there's almost no way around having the doc mapper in the fetcher unless we either refactor the filter stuff, or double the decoding work. Neither seems ideal compared to leaving as is.
There's a reply here in the other comment thread you raised the issue on too, with my same points, from Friday.
Resolving this convo, as it is going nowhere, and is of very low value lol. Leave as-is if preferred.
// we update the bitsets as we collect values
// by clearing the bit for the FieldID
filterSet *bitset.BitSet // filter fields
selectSet *bitset.BitSet // select fields
todo: This appears unused and should be removed
It was used, but was removed, thanks for catching!
Is this still relevant? I can't see if there were any changes as a result.
// allfields map[uint32]*client.FieldDescription

// static bitsets
filterSet *bitset.BitSet
todo: The usage of this property is not self-explanatory and needs code-comments. I am very tired atm, but it should still be easier to understand than it currently is for me :)
praise: Documentation here is really good - thanks John :)
// 0 means we ignore the field
// we update the bitsets as we collect values
// by clearing the bit for the FieldID
filterSet *bitset.BitSet // filter fields
suggestion: I'm not a fan of the fact that this property is only ever used from outside of this type - to me it strongly suggests that it should be moved off the `encodedDocument` type and held somewhere else, but perhaps I will warm to it as my brain-fog fades and the documentation increases :)
nitpick: Also feels a bit odd that this (and `doc`, and the `encProperty`s in `Properties`) is a pointer.
Most of the properties of `encodedDocument` are directly accessed outside the type. It's really just a container type for holding state, and then it's responsible for combining state.
The doc `filterSet` could be on the `fetcher` directly, which would then have two sets, a "static" and a "dynamic" set, but as this is specific to the other fields within the `encodedDocument` type, it felt appropriate to keep it here. It also means that if I need to pass around the doc for any reason, the bitset goes with it.
re nitpick: There's no good reason for the `encProperty` to be a pointer, but `BitSet` must be a pointer type as that's how it's defined in its package: the `New` method returns a pointer, all methods are defined on a pointer receiver, etc.
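A minimal sketch of the filterSet idea described above, using a plain uint64 mask instead of the `*bitset.BitSet` pointer type the actual code uses (the field IDs and the early-exit behaviour here are illustrative assumptions): one bit per filter FieldID, cleared as each value is collected, with an empty set meaning the filter can run.

```go
package main

import "fmt"

// newFilterSet sets one bit per FieldID the filter needs; a 0 bit
// means the field is ignored for filtering.
func newFilterSet(fieldIDs []uint) uint64 {
	var set uint64
	for _, id := range fieldIDs {
		set |= 1 << id
	}
	return set
}

// collect clears the bit for a fetched FieldID and reports whether
// every filter field has now been collected (set is empty), at which
// point the filter can be evaluated eagerly.
func collect(set uint64, id uint) (uint64, bool) {
	set &^= 1 << id
	return set, set == 0
}

func main() {
	set := newFilterSet([]uint{1, 3}) // hypothetical filter FieldIDs
	set, done := collect(set, 3)
	fmt.Println(done) // field 1 is still outstanding
	set, done = collect(set, 1)
	fmt.Println(done) // all filter fields collected
}
```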
Most of the properties of `encodedDocument` are directly accessed outside the type. It's really just a container type for holding state, and then it's responsible for combining state.

Didn't realize you saw it that way - in that case I do think that this should have no funcs defined on it, as that means this type is used both as an object and as a data-struct, which I really dislike. That change is well out of scope regardless of what anyone thinks of it here though. Convo resolved :)
db/fetcher/fetcher.go
Outdated
// seekKV will seek through results/iterator until it reaches
// the target key, or if the target key doesn't exist, the
// next smallest key that is greater than the target.
func (df *DocumentFetcher) seekKV(key string) (bool, *KeyValue, error) {
todo: It doesn't look like `key` needs to be a string, and it feels very strange that it is. Is this just because there is no easy-to-use std Golang []byte comparator and you want to save immediate dev time? If so, I think it is worth documenting that (and very-maybe an issue).
This was mostly just a conceptual similarity choice, as atm the KV-level keys are considered strings, but digging further into it there are benefits to being a string.
Firstly, there is an equivalent byte comparator as string comparator.
But, at the moment, `core.DataStoreKey.Bytes()` just calls `[]byte(datastorekey.ToString())` anyway, which results in an extra allocation ("safe" conversion between strings and bytes allocates+copies).
At the moment, there's just a lot of conversion back and forth between strings and bytes, back to strings, and bytes, and probably a few more. So until we get organized around the proper binary serialization, it doesn't really matter which to use here. But the one fact we know is that `datastorekey.ToString()` is cheaper than `datastorekey.Bytes()`. So it's a toss.
I think we can revisit this when we change the key encoding.
At the moment, there's just a lot of conversion back and forth between strings and bytes, and probably a few more. So until we get organized around the proper binary serialization, it doesn't really matter which to use here. But the one fact we know is that `datastorekey.ToString()` is cheaper than `datastorekey.Bytes()`. So it's a toss.

Ah fair enough, happy as-is then.

I think we can revisit this when we change the key encoding.

I do see that as quite tangential/separate, and wouldn't recommend doing those two tasks together.
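For reference, the seekKV contract described above ("the target key, or the next smallest key greater than it") can be sketched over a sorted in-memory key list standing in for the real iterator (the key strings here are made-up examples, not real DefraDB keys):

```go
package main

import (
	"fmt"
	"sort"
)

// seek returns the target key if present, otherwise the next smallest
// key greater than the target, and false once the end is passed.
func seek(keys []string, target string) (string, bool) {
	i := sort.SearchStrings(keys, target) // keys must be sorted
	if i == len(keys) {
		return "", false // seeked past the last key
	}
	return keys[i], true
}

func main() {
	keys := []string{"/1/p/doc1", "/1/p/doc3", "/1/p/doc5"}
	fmt.Println(seek(keys, "/1/p/doc3")) // exact match
	fmt.Println(seek(keys, "/1/p/doc4")) // next greater key
	fmt.Println(seek(keys, "/1/p/doc9")) // past the end
}
```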
db/fetcher/fetcher.go
Outdated
}

// run filter
passed, err := mapper.RunFilter(filterDoc, df.filter)
suggestion: This at first glance looked like a horribly inefficient way of doing this, before I remembered that multiple fields might be involved. I think it is worth documenting this.
(also if for whatever reason you strongly disagree with that suggestion, please then remove the `// run filter` comment as it doesn't add any value - I assume you added it as you were planning out the implementation)
There was a comment above the `if docDone` line that said `// if docDone` xD. Just my process; thanks for flagging the comment.
What seemed inefficient at first glance? It gets run anyway, previously on the scanner, now on the fetcher.
What seemed inefficient at first glance?

I honestly can't remember; will reply again if I think I remember and if this code is still relevant (just going through the old comments before reviewing the new code atm)
db/fetcher/fetcher.go
Outdated
if err != nil {
	return nil, err
}
passed, err := mapper.RunFilter(decodedDoc, df.filter)
todo: You appear to be running the filter twice when `docDone` is true; please remove that before merge :)
I struggle to see this. Can you point me to where the second run of the filter happens?
Edit: I looked at it some more and I think I see it.
Skimmed high-level; I mostly really like the changes, much smaller and simpler than the previous fetcher stuff.
Shared my thoughts on the questions you asked.
Will give a deeper second look soon.
db/fetcher/errors.go
Outdated
@@ -25,6 +25,7 @@ const (
	errVFetcherFailedToDecodeNode string = "(version fetcher) failed to decode protobuf"
	errVFetcherFailedToGetDagLink string = "(version fetcher) failed to get node link from DAG"
	errFailedToGetDagNode string = "failed to get DAG Node"
	errMissingMap string = "missing document mapper"
nitpick:
- errMissingMap string = "missing document mapper"
+ errMissingMapper string = "missing document mapper"
+1
	return false, err
}
// @question: Should this be after the length check?
n.execInfo.docFetches++
answer: Represents the attempts at fetching docs. Since the Fetch call is before, and we do this after ensuring it was not a failure, it's fine where it is at the moment.
Commenting here for both this answer and your other comment.
Technically it means that `docFetches` and `filterMatches` will basically be the same value (or differ by at most 1).
Technically it means that `docFetches` and `filterMatches` will basically be the same value (or differ by at most 1).

I am going to talk from the perspective of what used to happen... The majority of the time, `docFetches` and `iterations` would be similar (because for every iteration we would attempt to fetch), but it's not strictly true. Also, `docFetches` and `filterMatches` will mostly NOT be the same; their difference is not strictly at most 1. It all depends on the number of docs and how many match the query or not. Consider these examples from our current execution explain tests.
Consider scan node execute results...
TestExecuteExplainRequestWithOrderFieldOnParent
"iterations": uint64(3),
"docFetches": uint64(3),
"filterMatches": uint64(2),
TestExecuteExplainMutationRequestWithCreate
"iterations": uint64(1),
"docFetches": uint64(1),
"filterMatches": uint64(1),
TestExecuteExplainMutationRequestWithDeleteUsingFilter
"iterations": uint64(2),
"docFetches": uint64(3),
"filterMatches": uint64(1),
TestExecuteExplainMutationRequestWithUpdateUsingFilter
"iterations": uint64(4),
"docFetches": uint64(6),
"filterMatches": uint64(2),
TestExecuteExplainRequestWithDocumentsButNoMatches
"iterations": uint64(1),
"docFetches": uint64(3),
"filterMatches": uint64(0),
For example, in the last case there were two documents; it fetched 2 times and none matched the filter. The third time it fetched there were no more docs, so the total attempts are 3, with 0 matches, and we entered the scan node once.
Fetcher update in this PR
I have to read the implementation in more detail to see if it is still possible to differentiate between "attempted fetches until a match was found" vs "total matches found". We might even just need to return a variable with the result from the fetcher call if we want to keep track, similar to before, of all the fetch attempts until a match.
My whole point is that because of the changes in this PR, the behavior is now different. Previously, the scanner "fetched" a document from the fetcher, then the scanner was responsible for filtering.
One of the core changes in this PR is that we are moving the filtering from the scanner to the fetcher.
So keeping track of both `filterFetches` and `filterMatches` in the scanner is meaningless now.
Yup, makes sense. I did review the implementation in more depth, and it seems that in order to get that info now we would have to have the fetcher logic store that metric and then communicate it back to the scanner. IMO let's keep the metrics we collect scoped to planner nodes only; for the ones we need to bend over backward for in a more complicated manner "outside" the plan nodes, I would skip those metrics for now.
tldr: we can just remove/skip the metric that isn't as relevant now
db/fetcher/fetcher.go
Outdated
for _, field := range col.Schema.Fields {
	df.schemaFields[uint32(field.ID)] = field
df.selectFields = make(map[uint32]client.FieldDescription, len(fields))
// @todo: We can probably make this size more accurate, this is an upper bound
question: How would you make this more accurate? You seem to be using all the allocated KVs in the for loop.
The `@todo` here is old; the line it refers to has been removed since opening the PR xD
db/fetcher/fetcher.go
Outdated
	Start(ctx context.Context, txn datastore.Txn, spans core.Spans) error
	FetchNext(ctx context.Context) (*encodedDocument, error)
	FetchNextDecoded(ctx context.Context) (*client.Document, error)
	FetchNextDoc(ctx context.Context, mapping *core.DocumentMapping) ([]byte, core.Doc, error)
	Close() error
}

// KeyValue is a KV store response containing the resulting core.Key and byte array value.
type KeyValue struct {
question: Does this need to be public? Asking because it doesn't seem to be used outside of this file.
Nevermind. `DocumentFetcher.KV()` returns this type.
It doesn't need to be public. The `NextKV`, `NextKey`, and `ProcessKV` public methods all don't need to exist; they're old methods mostly for debugging. They can be removed, so `KeyValue` can be made private.
Should we open a ticket to action this in the future?
Implemented now
I assume our fetcher is already somewhat covered with tests, but I see quite a lot of new or updated logic here (like filtering, mapping, seeking...). Isn't it supposed to be covered as well?
db/fetcher/encoded_doc.go
Outdated
@@ -27,6 +28,10 @@ type encProperty struct {
	Desc client.FieldDescription
	Raw []byte

	// Filter flag to determine if this field
	// is needed for eager filter evaluation
	Filter bool
question: don't we want to stick to a "normal" convention of naming `bool` vars with prefixes like "Is", "Has"...? This one could be named `ShouldFilter`.
The current name is surely shorter, but reading `if prop.Filter {` feels a bit ambiguous (at least to my view).
Maybe it's still my C++ vision where you can `if` basically any type :)
+1
Just bringing your attention to this one, John.
db/fetcher/encoded_doc.go
Outdated
if encdoc.doc.Fields[prop.Desc.ID] != nil { // use cached decoded fields
	continue
}
if filter && prop.Filter != filter { // only get filter fields if filter=true
nitpick: could write `if filter && !prop.Filter {`
+1
Codecov Report
@@ Coverage Diff @@
## develop #1500 +/- ##
===========================================
+ Coverage 73.47% 73.65% +0.18%
===========================================
Files 188 188
Lines 19320 19608 +288
===========================================
+ Hits 14195 14442 +247
- Misses 4073 4102 +29
- Partials 1052 1064 +12
... and 2 files with indirect coverage changes
encdoc.Key = nil
if encdoc.mapping != nil {
suggestion: This is a bit odd without a `New` function. Here `mapping` should never be nil, no? Yet at the same time, if it ever is, the `doc` property will never be cleared by reset, leaving this instance in an invalid state.
I'd suggest adding a `New(...) EncodedDocument` func that ensures `mapping` is never nil, or, if for whatever reason you really don't want that, add an `else` clause that just sets `doc` to `nil`.
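A sketch of the constructor shape being suggested, with simplified, hypothetical types (the real `encodedDocument` carries more state): `New` enforces a non-nil mapping once, so `Reset` needs no nil branch.

```go
package main

import "fmt"

// DocumentMapping is a hypothetical stand-in for core.DocumentMapping.
type DocumentMapping struct{ TypeName string }

// EncodedDocument is a simplified stand-in for the fetcher's type.
type EncodedDocument struct {
	Key     []byte
	mapping *DocumentMapping
}

// NewEncodedDocument declares the restricted initial state: mapping
// can never be nil past this point, so Reset needs no nil check.
func NewEncodedDocument(mapping *DocumentMapping) *EncodedDocument {
	if mapping == nil {
		panic("mapping must not be nil")
	}
	doc := &EncodedDocument{mapping: mapping}
	doc.Reset() // stronger initial-state guarantee, as suggested
	return doc
}

// Reset clears per-document state while reusing the same allocation,
// matching the fetcher's desire to avoid re-allocating per document.
func (e *EncodedDocument) Reset() {
	e.Key = nil
}

func main() {
	doc := NewEncodedDocument(&DocumentMapping{TypeName: "User"})
	fmt.Println(doc.mapping.TypeName)
}
```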
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wanted a `Reset` instead of a `New` so it keeps using the same allocated/defined struct on the fetcher.
As for the `else`, the doc will already be `nil` if the `mapping` is also `nil`, and the mapping never gets updated once it's set at the beginning, so it's either `nil` or not for the whole thing. I can add it if you prefer, but it's not a codepath that will add any meaning/safety imo.
> Wanted a Reset instead of a New so it keeps using the same allocated/defined struct on the fetcher.

I think you misunderstand why I mentioned `New`: by declaring a constructor you are declaring a restricted initial state. There you can enforce that `mapping` is not nil, and `Reset` loses a dead code branch (the implicit else), as well as the possibility of the invalid `doc` state that I mentioned. `New` can even call `Reset` internally for a stronger initial-state guarantee/description if you want.
The way the fields of `encodedDocument` are handled in the fetcher brings me to the conclusion that a `Reset` function makes more sense than a `New` function.
db/fetcher/fetcher.go
Outdated
)

// Fetcher is the interface for collecting documents from the underlying data store.
// It handles all the key/value scanning, aggregation, and document encoding.
type Fetcher interface {
-	Init(col *client.CollectionDescription, fields []*client.FieldDescription, reverse bool, showDeleted bool) error
+	Init(col *client.CollectionDescription, fields []client.FieldDescription,
todo: Breaking the props up onto multiple lines, but still with multiple props per line, is really quite hard to read. I'd much prefer that each prop have its own line, not sharing with the func name and brackets:
foo(
	a int,
	b int,
	...
)
Yea, I didn't like it when I was doing it, but I also hate one line per prop for interface definitions. But I guess I'll have to, since both alternatives suck more 🙃
db/fetcher/fetcher.go
Outdated
// only run filter if we've collected all the fields
// required for filtering. This is tracked by the bitsets.
if df.filterSet.Equal(df.doc.filterSet) {
	df.ranFilter = true
todo: Minor, this looks like it is set in the wrong location: if errors are produced, `ranFilter` will be true, yet the filter may not have been run. I think this should be set after the error check for `mapper.RunFilter` (currently between ln 539 and 540).
Ahh, yea it should be. I don't think it makes a diff since it will error out anyway, but for sanity 👍
planner/datasource.go
Outdated
@@ -43,7 +43,7 @@ func (p *Planner) getCollectionScanPlan(parsed *mapper.Select) (planSource, erro
	}

	scan := p.Scan(parsed)
-	err = scan.initCollection(colDesc)
+	err = scan.initCollection(colDesc) // @todo: Should `initCollection` be merged into `p.Scan(...)`
todo: I have a very large preference for this todo to be done. I guess it is technically out of scope, but `initCollection` is amusingly horrible (presumably from a time when it was unsure whether it would be shared/re-inited) and it would be quick to do.
Failing that, please make sure that either the todo is removed, or is bound to a ticket, before you merge this PR.
👍
… groupBy behavior
force-pushed from 9ca4ee6 to 8efc47c (Compare)
force-pushed from 481162c to b45633a (Compare)
Relevant issue(s)
Resolves #490
Resolves #1582 (indirectly)
Description
This is a reduced version of #491. It takes a very different approach, and tries to keep as much of the existing Fetcher structure as possible.
Basically, this will try to eagerly ignore documents that don't pass the given filter at the fetcher level. This means we can apply various optimizations then if the filter was applied at the scanNode level like before.
There will be two followup tasks in future PRs.
A) Proper lexicographical encoding
B) Proper datastore iterator instead of query.
There are still some outstanding bugs and failing tests, but I wanted to get the reviews coming in on the core implementation while I continue to squash bugs, to parallelize the time spent.
Also: ignore some of the lingering Print statements for debugging; they will be removed before merging.
Tasks
How has this been tested?
Manually, unit + integration tests, CI
Specify the platform(s) on which this was tested: