-
Notifications
You must be signed in to change notification settings - Fork 44
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: TimeTraveling (History Traversing) query engine and doc fetcher #59
Conversation
New system reserved indexes are NEGATIVE. Maybe we need a map heredefradb/db/base/descriptions.go Lines 42 to 45 in 53b61ea
This comment was generated by todo based on a
|
Support multiple spansLines 133 to 138 in 53b61ea
This comment was generated by todo based on a
|
Support multiple spansLines 157 to 160 in 53b61ea
This comment was generated by todo based on a
|
Find an effecient way to determine if a CID is a member of adefradb/db/fetcher/versioned.go Lines 268 to 273 in 53b61ea
This comment was generated by todo based on a
|
set slice sizedefradb/db/fetcher/versioned.go Lines 301 to 306 in 53b61ea
This comment was generated by todo based on a
|
Right now we ONLY handle LWW_REGISTER, need to swith on this and get CType from descriptionsdefradb/db/fetcher/versioned.go Lines 370 to 375 in 53b61ea
This comment was generated by todo based on a
|
Update with document schemasLines 294 to 299 in 53b61ea
This comment was generated by todo based on a
|
Remove NodeGed Wtter as a paramter, and move it to a MerkleClock fieldLines 97 to 100 in 53b61ea
This comment was generated by todo based on a
|
FIX THISLines 74 to 79 in 53b61ea
This comment was generated by todo based on a
|
Abstract schema definitions to CORECurrently schema definitions are stored in db/base/descriptions defradb/merkle/crdt/merklecrdt.go Lines 66 to 71 in 53b61ea
This comment was generated by todo based on a
|
handle error if no CID is givendefradb/query/graphql/planner/commit.go Lines 139 to 142 in 53b61ea
This comment was generated by todo based on a
|
When running the optimizer, check if the filter objectdefradb/query/graphql/planner/select.go Lines 225 to 230 in 53b61ea
This comment was generated by todo based on a
|
Also includes some extra explainer comments that arose during the code-tour onboarding video sessions. |
Added some minor comments as I was reading, but haven't gotten fully into it yet - will continue review tomorrow when my brain is sharper |
@AndrewSisley im not seeing any comments :/ |
53b61ea
to
1025a9e
Compare
parse groupbydefradb/query/graphql/parser/query.go Lines 369 to 372 in 1025a9e
This comment was generated by todo based on a
|
1025a9e
to
d98da05
Compare
If you made your comments as a review, but they are marked as "pending" its because you haven't submitted the review yet. |
Ahh okay - didnt realize it behaved like that - let me know if you have a reference when reviewing across two days - e.g. submit on the second day, or submit twice so you get stuff early, or never actually open the review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me and makes sense - quite a lot of fun functionality for fairly straight forward code :) I've added quite a few mostly minor comments that would good to be addressed though - most fall under:
- Dead code removal
- Comment removal
- Tests
- Soft-conflicts with other recent changes/findings that wont be flagged by git
It is also making me really want to merge the key changes as we seem to be building up more and more string magic and it'll keep getting harder and harder to maintain that branch until it goes in...
Also sorry for any comments that were addressed by your latest changes, I think you might have resolved a few already
// CollectionDescription describes a Collection and | ||
// all its associated metadata | ||
type CollectionDescription struct { | ||
Name string | ||
ID uint32 | ||
Schema SchemaDescription | ||
Indexes []IndexDescription | ||
Indexes []IndexDescription // @todo: New system reserved indexes are NEGATIVE. Maybe we need a map here |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would suggest that this quite a nasty bug waiting to happen - might be worth double checking that an issue is created on merge by the bot
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you elaborate on this one? the negative stuff is only in relation to the Index identifiers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment suggests this could be accessed by index? Indexes[x] where x might now be negative?
@@ -102,6 +142,15 @@ func (sd SchemaDescription) IsEmpty() bool { | |||
return false | |||
} | |||
|
|||
func (sd SchemaDescription) GetFieldKey(fieldName string) uint32 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like this can be private?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
currently its only used within the base package, but I can see its useful for future work outside of it. So might as well keep it in for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nvm, duplicate as per below comment, will remove in favor of collection implementation
// Returns true, if there is a result, | ||
// and false otherwise. | ||
func (n *versionedScanNode) Next() (bool, error) { | ||
if !n.scanInitialized { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't spot why this is needed, and it is a bit ugly checking a near-constant value for every value (as well suggesting perhaps falsely that it is needed and that there is some non-obvious route into this function). Would suggest removal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So it is technically required, but only because I realized there is a bug in the executor that this resolves, however it should be resolved elsewhere. When we create the plan, for some reason we don't call plan.Init()
before starting the executor and calling into Next()
.
So we can remove this check here (albeit, its a minuscule amount of over head, would only produce a single Assembly Op from the compiler), and add the necessary top level Init
call to the executor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is also extra overhead for any dev wanting to read the function :) And performance-wise (although I see that as trivial and unimportant here), it is an extra code branch that'll throw off any speculative execution somewhat (although if the bool is always the same value I think a lot of CPUs are good at picking up on that and assume it is that value ahead of time)
|
||
// get node | ||
// decode the block | ||
return dag.DecodeProtobuf(blk.RawData()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might need to set the node builder here or it might be spewing out v0 cids:
nd.SetCidBuilder(
cid.V1Builder{
Codec: cid.Raw,
MhType: mh.SHA2_256,
MhLength: -1,
})
either works fine, I guess it depends on the nature of the issue. If most of the comments/changes are independent, then you can submit multiple reviews. |
We forgot to call `plan.Init()` after `makePlan(...)` which caused the scanners to not be properly initialized as there was no propagation. This was incorrectly "fixed" without realizing it by checking if the scanners had beed initialized on each `scan.Next()` call. This worked, but added constant overhead to each call, and could be resolved by properly initializing the entire plan graph once at the beginning.
5ac6247
to
c3af989
Compare
datastores/badger/v3/iterator.go
Outdated
@@ -85,6 +87,8 @@ func (iterator *BadgerIterator) IteratePrefix(ctx context.Context, startPrefix d | |||
formattedStartPrefix := asFormattedString(startPrefix) | |||
formattedEndPrefix := asFormattedString(endPrefix) | |||
|
|||
iterator.prefix = formattedStartPrefix |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pretty sure this isn't used, and this is incorrect (end is unlikely to fall within the full start prefix)
- Remove setting of iterator.prefix
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lol I put that in there while debugging and totally forgot, yea needs to be removed
if err := df.kvResultsIter.Close(); err != nil { | ||
return err | ||
} | ||
} | ||
df.kvResultsIter = nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor, but as mentioned on discord should probs move df.kvResultsIter = nil
inside the if
// decode cidRaw from core.Key to cid.Cid | ||
// need to remove '/' prefix from the core.Key | ||
|
||
c, err := cid.Decode(cidRaw.String()[1:]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just as I was losing interest in the Key refactor 😁
db/fetcher/versioned.go
Outdated
// run the DF init, VersionedFetchers only supports the Primary (0) index | ||
vf.DocumentFetcher = new(DocumentFetcher) | ||
return vf.DocumentFetcher.Init(col, &col.Indexes[0], fields, reverse) | ||
// df.index = index |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggest removal of commented out code
query/graphql/parser/query.go
Outdated
slct.QueryType = ScanQuery | ||
} | ||
|
||
// @todo: parse groupby |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this todo for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
old
@@ -156,7 +156,9 @@ func (p *Planner) makePlan(stmt parser.Statement) (planNode, error) { | |||
if err != nil { | |||
return nil, err | |||
} | |||
return plan, nil | |||
|
|||
err = plan.Init() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be why you started getting iterator errors?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, possible ill look into it
query/graphql/planner/select.go
Outdated
@@ -412,3 +467,5 @@ func (p *Planner) Select(parsed *parser.Select) (planNode, error) { | |||
} | |||
return top, nil | |||
} | |||
|
|||
// func (p *Planner) select(parsed *parser.Select, source planNode, render bool, fromCollection string) (....) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest removal
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved assuming you revert badger/v3/iterator and fix the linter errors (some are very important) - dont remember anything else serious
Codecov Report
@@ Coverage Diff @@
## develop #59 +/- ##
===========================================
- Coverage 57.98% 57.66% -0.33%
===========================================
Files 91 95 +4
Lines 8935 9363 +428
===========================================
+ Hits 5181 5399 +218
- Misses 3194 3354 +160
- Partials 560 610 +50
|
…#59) * Implemented versionFetcher * Update Fetcher interface * Created Txn utility * Added versionedScan to query planner * Added tests for versionFetcher
…sourcenetwork#59) * Implemented versionFetcher * Update Fetcher interface * Created Txn utility * Added versionedScan to query planner * Added tests for versionFetcher
Note: Needs rebasing onto the most recent work as this branch is pretty out of date, just wanted to get a PR on the books.
resolves #58
Architecture
This PR implements a new kind of
DocFetcher
calledVersionedFetcher
which loads the given document update-graph into an ephemeral or transient datastore in-memory that exists only for the lifetime of the query execution.This datastore loads the blocks from the target version back to the genesis version and then re-serializes the target state from
v0
tovX
.The primary goal here was to re-use as much of the current Fetcher and Query system without modification to the execution flow of a regular query vs a time-traveling query.
This is why the implementation is scoped only to the
VersionedFetcher
and that output of theVersionedFetcher
matches the structure of the datastores as if it were a regular query.