feat: TimeTraveling (History Traversing) query engine and doc fetcher #59

jsimnz · 2021-11-24T21:23:40Z

Note: Needs rebasing onto the most recent work as this branch is pretty out of date, just wanted to get a PR on the books.

resolves #58

Architecture

This PR implements a new kind of DocFetcher called VersionedFetcher which loads the given document update-graph into an ephemeral or transient datastore in-memory that exists only for the lifetime of the query execution.

This datastore loads the blocks from the target version back to the genesis version and then re-serializes the target state from v0 to vX.

The primary goal here was to re-use as much of the current Fetcher and Query system without modification to the execution flow of a regular query vs a time-traveling query.

This is why the implementation is scoped only to the VersionedFetcher and that output of the VersionedFetcher matches the structure of the datastores as if it were a regular query.

todo · 2021-11-24T21:23:43Z

New system reserved indexes are NEGATIVE. Maybe we need a map here

defradb/db/base/descriptions.go

Lines 42 to 45 in 53b61ea

    
           	Indexes []IndexDescription // @todo: New system reserved indexes are NEGATIVE. Maybe we need a map here 
        
           } 
        
           // IDString returns the collection ID as a string

This comment was generated by todo based on a `todo` comment in `53b61ea` in #59. cc @sourcenetwork.

todo · 2021-11-24T21:23:46Z

Support multiple spans

defradb/db/fetcher/fetcher.go

Lines 133 to 138 in 53b61ea

    
           // 	Prefix: df.spans[0].Start().String(), // @todo: Support multiple spans 
        
           // } 
        
           // if df.reverse { 
        
           // 	q.Orders = []dsq.Order{dsq.OrderByKeyDescending{}} 
        
           // } else { 
        
           // 	q.Orders = []dsq.Order{dsq.OrderByKey{}}

This comment was generated by todo based on a `todo` comment in `53b61ea` in #59. cc @sourcenetwork.

todo · 2021-11-24T21:23:49Z

Support multiple spans

defradb/db/fetcher/fetcher.go

Lines 157 to 160 in 53b61ea

    
           	Prefix: df.spans[0].Start().String(), // @todo: Support multiple spans 
        
           } 
        
           if df.reverse { 
        
           	q.Orders = []dsq.Order{dsq.OrderByKeyDescending{}}

This comment was generated by todo based on a `todo` comment in `53b61ea` in #59. cc @sourcenetwork.

todo · 2021-11-24T21:23:52Z

Find an effecient way to determine if a CID is a member of a

defradb/db/fetcher/versioned.go

Lines 268 to 273 in 53b61ea

    
           // @todo: Find an effecient way to determine if a CID is a member of a 
        
           // DocKey State graph 
        
           // @body: We could possibly append the DocKey to the CID either as a 
        
           // child key, or an instance on the CID key. 
        
           hasLocalBlock, err := vf.store.DAGstore().Has(c)

This comment was generated by todo based on a `todo` comment in `53b61ea` in #59. cc @sourcenetwork.

todo · 2021-11-24T21:23:55Z

set slice size

defradb/db/fetcher/versioned.go

Lines 301 to 306 in 53b61ea

    
           // subDAGLinks := make([]cid.Cid, 0) // @todo: set slice size 
        
           l, err := nd.GetNodeLink(core.HEAD) 
        
           // ErrLinkNotFound is fine, it just means we have no more head links 
        
           if err != nil && err != dag.ErrLinkNotFound { 
        
           	return err 
        
           }

This comment was generated by todo based on a `todo` comment in `53b61ea` in #59. cc @sourcenetwork.

todo · 2021-11-24T21:23:58Z

Right now we ONLY handle LWW_REGISTER, need to swith on this and get CType from descriptions

defradb/db/fetcher/versioned.go

Lines 370 to 375 in 53b61ea

    
           	// @todo: Right now we ONLY handle LWW_REGISTER, need to swith on this and get CType from descriptions 
        
           	if err := vf.processNode(fieldID, subNd, core.LWW_REGISTER, l.Name); err != nil { 
        
           		return err 
        
           	} 
        
           }

This comment was generated by todo based on a `todo` comment in `53b61ea` in #59. cc @sourcenetwork.

todo · 2021-11-24T21:24:01Z

Update with document schemas

defradb/document/document.go

Lines 294 to 299 in 53b61ea

    
           // @todo: Update with document schemas 
        
           func (doc *Document) setAndParseType(field string, value interface{}) error { 
        
           	switch value.(type) { 
        
           	// int (any number) 
        
           	case int:

This comment was generated by todo based on a `todo` comment in `53b61ea` in #59. cc @sourcenetwork.

todo · 2021-11-24T21:24:04Z

Remove NodeGed Wtter as a paramter, and move it to a MerkleClock field

defradb/merkle/clock/clock.go

Lines 97 to 100 in 53b61ea

    
           // @todo Remove NodeGed Wtter as a paramter, and move it to a MerkleClock field 
        
           _, err = mc.ProcessNode( 
        
           	&crdtNodeGetter{deltaExtractor: mc.crdt.DeltaDecode}, 
        
           	nd.Cid(),

This comment was generated by todo based on a `todo` comment in `53b61ea` in #59. cc @sourcenetwork.

todo · 2021-11-24T21:24:07Z

FIX THIS

defradb/merkle/crdt/lwwreg.go

Lines 74 to 79 in 53b61ea

    
           	// error, lets panic for now. TODO: FIX THIS 
        
           	panic("invalid index identifier for Merkle CRDT") 
        
           } 
        
           clk := clock.NewMerkleClock(headstore, dagstore, headsetKey.String(), reg) 
        
           // newBaseMerkleCRDT(clock, register)

This comment was generated by todo based on a `TODO` comment in `53b61ea` in #59. cc @sourcenetwork.

todo · 2021-11-24T21:24:10Z

Abstract schema definitions to CORE

Currently schema definitions are stored in db/base/descriptions

defradb/merkle/crdt/merklecrdt.go

Lines 66 to 71 in 53b61ea

    
           // @todo: Abstract schema definitions to CORE 
        
           // @body: Currently schema definitions are stored in db/base/descriptions 
        
           // which is suppose to be reserved for implementation specific data. 
        
           // However we need to have some reference of schema here in the MerkleCRDT 
        
           // system, which is a protocol design, and shouldn't rely on implementation 
        
           // specific utilities.

This comment was generated by todo based on a `todo` comment in `53b61ea` in #59. cc @sourcenetwork.

todo · 2021-11-24T21:24:13Z

handle error if no CID is given

defradb/query/graphql/planner/commit.go

Lines 139 to 142 in 53b61ea

    
           } // @todo: handle error if no CID is given 
        
           return &commitSelectNode{ 
        
           	p:             p,

This comment was generated by todo based on a `todo` comment in `53b61ea` in #59. cc @sourcenetwork.

todo · 2021-11-24T21:24:16Z

When running the optimizer, check if the filter object

defradb/query/graphql/planner/select.go

Lines 225 to 230 in 53b61ea

    
           // @todo: When running the optimizer, check if the filter object 
        
           // contains a _key equality condition, and upgrade it to a point lookup 
        
           // instead of a prefix scan + filter via the Primary Index (0), like here: 
        
           dockeyIndexKey := base.MakeIndexKey(&sourcePlan.info.collectionDescription, 
        
           	&sourcePlan.info.collectionDescription.Indexes[0], core.NewKey(parsed.DocKey)) 
        
           spans := core.Spans{core.NewSpan(dockeyIndexKey, core.Key{})}

This comment was generated by todo based on a `todo` comment in `53b61ea` in #59. cc @sourcenetwork.

jsimnz · 2021-11-24T21:26:58Z

Also includes some extra explainer comments that arose during the code-tour onboarding video sessions.

AndrewSisley · 2021-11-25T01:25:20Z

Added some minor comments as I was reading, but haven't gotten fully into it yet - will continue review tomorrow when my brain is sharper

jsimnz · 2021-11-25T02:08:05Z

@AndrewSisley im not seeing any comments :/

todo · 2021-11-25T11:11:03Z

parse groupby

defradb/query/graphql/parser/query.go

Lines 369 to 372 in 1025a9e

    
           	// @todo: parse groupby 
        
           } 
        
           // if theres no field selections, just return

This comment was generated by todo based on a `todo` comment in `1025a9e` in #59. cc @sourcenetwork.

jsimnz · 2021-11-25T11:23:28Z

Added some minor comments as I was reading, but haven't gotten fully into it yet - will continue review tomorrow when my brain is sharper

If you made your comments as a review, but they are marked as "pending" its because you haven't submitted the review yet.
Lmk

AndrewSisley · 2021-11-25T15:04:17Z

Added some minor comments as I was reading, but haven't gotten fully into it yet - will continue review tomorrow when my brain is sharper

If you made your comments as a review, but they are marked as "pending" its because you haven't submitted the review yet. Lmk

Ahh okay - didnt realize it behaved like that - let me know if you have a reference when reviewing across two days - e.g. submit on the second day, or submit twice so you get stuff early, or never actually open the review

AndrewSisley

Looks good to me and makes sense - quite a lot of fun functionality for fairly straight forward code :) I've added quite a few mostly minor comments that would good to be addressed though - most fall under:

Dead code removal
Comment removal
Tests
Soft-conflicts with other recent changes/findings that wont be flagged by git

It is also making me really want to merge the key changes as we seem to be building up more and more string magic and it'll keep getting harder and harder to maintain that branch until it goes in...

Also sorry for any comments that were addressed by your latest changes, I think you might have resolved a few already

core/txn.go

core/store.go

AndrewSisley · 2021-11-25T00:20:22Z

db/base/descriptions.go

 // CollectionDescription describes a Collection and
 // all its associated metadata
 type CollectionDescription struct {
 	Name    string
 	ID      uint32
 	Schema  SchemaDescription
-	Indexes []IndexDescription
+	Indexes []IndexDescription // @todo: New system reserved indexes are NEGATIVE. Maybe we need a map here


Would suggest that this quite a nasty bug waiting to happen - might be worth double checking that an issue is created on merge by the bot

Could you elaborate on this one? the negative stuff is only in relation to the Index identifiers.

Comment suggests this could be accessed by index? Indexes[x] where x might now be negative?

AndrewSisley · 2021-11-25T00:22:56Z

db/base/descriptions.go

@@ -102,6 +142,15 @@ func (sd SchemaDescription) IsEmpty() bool {
 	return false
 }

+func (sd SchemaDescription) GetFieldKey(fieldName string) uint32 {


looks like this can be private?

currently its only used within the base package, but I can see its useful for future work outside of it. So might as well keep it in for now.

nvm, duplicate as per below comment, will remove in favor of collection implementation

db/base/descriptions.go

AndrewSisley · 2021-11-25T15:33:03Z

query/graphql/planner/versionedscan.go

+// Returns true, if there is a result,
+// and false otherwise.
+func (n *versionedScanNode) Next() (bool, error) {
+	if !n.scanInitialized {


I can't spot why this is needed, and it is a bit ugly checking a near-constant value for every value (as well suggesting perhaps falsely that it is needed and that there is some non-obvious route into this function). Would suggest removal.

So it is technically required, but only because I realized there is a bug in the executor that this resolves, however it should be resolved elsewhere. When we create the plan, for some reason we don't call plan.Init() before starting the executor and calling into Next().

So we can remove this check here (albeit, its a minuscule amount of over head, would only produce a single Assembly Op from the compiler), and add the necessary top level Init call to the executor.

There is also extra overhead for any dev wanting to read the function :) And performance-wise (although I see that as trivial and unimportant here), it is an extra code branch that'll throw off any speculative execution somewhat (although if the bool is always the same value I think a lot of CPUs are good at picking up on that and assume it is that value ahead of time)

db/fetcher/versioned.go

AndrewSisley · 2021-11-25T16:00:08Z

db/fetcher/versioned.go

+
+	// get node
+	// decode the block
+	return dag.DecodeProtobuf(blk.RawData())


You might need to set the node builder here or it might be spewing out v0 cids:

nd.SetCidBuilder( cid.V1Builder{ Codec: cid.Raw, MhType: mh.SHA2_256, MhLength: -1, })

query/graphql/planner/versionedscan.go

db/fetcher/versioned.go

jsimnz · 2021-11-25T23:05:03Z

Added some minor comments as I was reading, but haven't gotten fully into it yet - will continue review tomorrow when my brain is sharper

If you made your comments as a review, but they are marked as "pending" its because you haven't submitted the review yet. Lmk

Ahh okay - didnt realize it behaved like that - let me know if you have a reference when reviewing across two days - e.g. submit on the second day, or submit twice so you get stuff early, or never actually open the review

either works fine, I guess it depends on the nature of the issue. If most of the comments/changes are independent, then you can submit multiple reviews.

…on migration

We forgot to call `plan.Init()` after `makePlan(...)` which caused the scanners to not be properly initialized as there was no propagation. This was incorrectly "fixed" without realizing it by checking if the scanners had beed initialized on each `scan.Next()` call. This worked, but added constant overhead to each call, and could be resolved by properly initializing the entire plan graph once at the beginning.

AndrewSisley · 2022-02-04T02:09:32Z

datastores/badger/v3/iterator.go

@@ -85,6 +87,8 @@ func (iterator *BadgerIterator) IteratePrefix(ctx context.Context, startPrefix d
 	formattedStartPrefix := asFormattedString(startPrefix)
 	formattedEndPrefix := asFormattedString(endPrefix)

+	iterator.prefix = formattedStartPrefix


Pretty sure this isn't used, and this is incorrect (end is unlikely to fall within the full start prefix)

Remove setting of iterator.prefix

lol I put that in there while debugging and totally forgot, yea needs to be removed

AndrewSisley · 2022-02-04T02:12:15Z

db/fetcher/fetcher.go

+		if err := df.kvResultsIter.Close(); err != nil {
+			return err
+		}
+	}
 	df.kvResultsIter = nil


Minor, but as mentioned on discord should probs move df.kvResultsIter = nil inside the if

AndrewSisley · 2022-02-04T02:14:00Z

db/fetcher/versioned.go

+	// decode cidRaw from core.Key to cid.Cid
+	// need to remove '/' prefix from the core.Key
+
+	c, err := cid.Decode(cidRaw.String()[1:])


Just as I was losing interest in the Key refactor 😁

AndrewSisley · 2022-02-04T02:14:59Z

db/fetcher/versioned.go

+	// run the DF init, VersionedFetchers only supports the Primary (0) index
+	vf.DocumentFetcher = new(DocumentFetcher)
+	return vf.DocumentFetcher.Init(col, &col.Indexes[0], fields, reverse)
+	// df.index = index


suggest removal of commented out code

AndrewSisley · 2022-02-04T02:29:10Z

query/graphql/parser/query.go

+			slct.QueryType = ScanQuery
+		}
+
+		// @todo: parse groupby


What is this todo for?

AndrewSisley · 2022-02-04T02:30:36Z

query/graphql/planner/planner.go

@@ -156,7 +156,9 @@ func (p *Planner) makePlan(stmt parser.Statement) (planNode, error) {
 	if err != nil {
 		return nil, err
 	}
-	return plan, nil
+
+	err = plan.Init()


Might be why you started getting iterator errors?

hmm, possible ill look into it

AndrewSisley · 2022-02-04T02:32:44Z

query/graphql/planner/select.go

@@ -412,3 +467,5 @@ func (p *Planner) Select(parsed *parser.Select) (planNode, error) {
 	}
 	return top, nil
 }
+
+// func (p *Planner) select(parsed *parser.Select, source planNode, render bool, fromCollection string) (....)


Suggest removal

AndrewSisley

Approved assuming you revert badger/v3/iterator and fix the linter errors (some are very important) - dont remember anything else serious

codecov · 2022-02-04T05:27:16Z

Codecov Report

Merging #59 (39862a1) into develop (9d822c3) will decrease coverage by 0.32%.
The diff coverage is 46.02%.

@@             Coverage Diff             @@
##           develop      #59      +/-   ##
===========================================
- Coverage    57.98%   57.66%   -0.33%     
===========================================
  Files           91       95       +4     
  Lines         8935     9363     +428     
===========================================
+ Hits          5181     5399     +218     
- Misses        3194     3354     +160     
- Partials       560      610      +50

Impacted Files	Coverage Δ
db/base/maker.go	`66.66% <0.00%> (-33.34%)`	⬇️
db/collection_update.go	`39.81% <0.00%> (-0.41%)`	⬇️
merkle/clock/ipld.go	`31.34% <ø> (ø)`
merkle/crdt/factory.go	`65.15% <0.00%> (-2.04%)`	⬇️
query/graphql/planner/commit.go	`83.63% <ø> (-0.15%)`	⬇️
query/graphql/planner/multi.go	`60.54% <ø> (+4.29%)`	⬆️
query/graphql/planner/update.go	`73.40% <ø> (+0.48%)`	⬆️
query/graphql/planner/versionedscan.go	`0.00% <0.00%> (ø)`
store/multi.go	`0.00% <0.00%> (ø)`
db/fetcher/dag.go	`56.81% <33.33%> (-2.68%)`	⬇️
... and 32 more

…#59) * Implemented versionFetcher * Update Fetcher interface * Created Txn utility * Added versionedScan to query planner * Added tests for versionFetcher

…sourcenetwork#59) * Implemented versionFetcher * Update Fetcher interface * Created Txn utility * Added versionedScan to query planner * Added tests for versionFetcher

jsimnz marked this pull request as draft November 24, 2021 21:23

jsimnz linked an issue Nov 24, 2021 that may be closed by this pull request

TimeTraveling Queries #58

Closed

jsimnz added this to the DefraDB v0.2 milestone Nov 24, 2021

jsimnz added area/query Related to the query component feature New feature or request labels Nov 24, 2021

jsimnz requested review from AndrewSisley and shahzadlone November 24, 2021 21:35

jsimnz marked this pull request as ready for review November 25, 2021 02:09

jsimnz force-pushed the jsimnz/feat/history-traverser branch from 53b61ea to 1025a9e Compare November 25, 2021 11:10

jsimnz force-pushed the jsimnz/feat/history-traverser branch from 1025a9e to d98da05 Compare November 25, 2021 11:17

AndrewSisley requested changes Nov 25, 2021

View reviewed changes

jsimnz added 12 commits February 3, 2022 16:52

added explainer diagrams/talks

05d857f

Explainer comments for typeJoin

be4308e

Added comments for explainer code tour

8f77d85

added comments/explainers

3fc0f5d

Merged datastore context upgrade into VersionedFetcher system

7dee202

Fixed outstanding merge conflicts

fc78927

Fixed version fetcher tests which failed as a result of the CID versi…

178dd1b

…on migration

Dead/Legacy/Commented code cleanup

baa48ff

fixed rebase compilation issues

5b27b6e

Added Txn utility to create txns from any given store

3d5d53c

Fixed dangling iterator bug on HeadScan/Fetcher

c3af989

jsimnz force-pushed the jsimnz/feat/history-traverser branch from 5ac6247 to c3af989 Compare February 4, 2022 01:00

Fixed more dangling iterator issues on the Fetcher

0a3c6dc

AndrewSisley reviewed Feb 4, 2022

View reviewed changes

AndrewSisley approved these changes Feb 4, 2022

View reviewed changes

jsimnz added 2 commits February 3, 2022 23:11

Implemented changes/fixes from review

ec7b6fe

Fixed versionedFetcher tests, added new Head Setter/Getter to Document

729757b

Fixed linter issues

39862a1

jsimnz merged commit 2a70c28 into develop Feb 4, 2022

jsimnz deleted the jsimnz/feat/history-traverser branch February 4, 2022 05:42

feat: TimeTraveling (History Traversing) query engine and doc fetcher #59

feat: TimeTraveling (History Traversing) query engine and doc fetcher #59

Conversation

jsimnz commented Nov 24, 2021

Architecture

todo bot commented Nov 24, 2021

New system reserved indexes are NEGATIVE. Maybe we need a map here

This comment was generated by todo based on a todo comment in 53b61ea in #59. cc @sourcenetwork.

todo bot commented Nov 24, 2021

Support multiple spans

This comment was generated by todo based on a todo comment in 53b61ea in #59. cc @sourcenetwork.

todo bot commented Nov 24, 2021

Support multiple spans

This comment was generated by todo based on a todo comment in 53b61ea in #59. cc @sourcenetwork.

todo bot commented Nov 24, 2021

Find an effecient way to determine if a CID is a member of a

This comment was generated by todo based on a todo comment in 53b61ea in #59. cc @sourcenetwork.

todo bot commented Nov 24, 2021

set slice size

This comment was generated by todo based on a todo comment in 53b61ea in #59. cc @sourcenetwork.

todo bot commented Nov 24, 2021

Right now we ONLY handle LWW_REGISTER, need to swith on this and get CType from descriptions

This comment was generated by todo based on a todo comment in 53b61ea in #59. cc @sourcenetwork.

todo bot commented Nov 24, 2021

Update with document schemas

This comment was generated by todo based on a todo comment in 53b61ea in #59. cc @sourcenetwork.

todo bot commented Nov 24, 2021

Remove NodeGed Wtter as a paramter, and move it to a MerkleClock field

This comment was generated by todo based on a todo comment in 53b61ea in #59. cc @sourcenetwork.

todo bot commented Nov 24, 2021

FIX THIS

This comment was generated by todo based on a TODO comment in 53b61ea in #59. cc @sourcenetwork.

todo bot commented Nov 24, 2021

Abstract schema definitions to CORE

This comment was generated by todo based on a todo comment in 53b61ea in #59. cc @sourcenetwork.

todo bot commented Nov 24, 2021

handle error if no CID is given

This comment was generated by todo based on a todo comment in 53b61ea in #59. cc @sourcenetwork.

todo bot commented Nov 24, 2021

When running the optimizer, check if the filter object

This comment was generated by todo based on a todo comment in 53b61ea in #59. cc @sourcenetwork.

jsimnz commented Nov 24, 2021

AndrewSisley commented Nov 25, 2021

jsimnz commented Nov 25, 2021

todo bot commented Nov 25, 2021

parse groupby

This comment was generated by todo based on a todo comment in 1025a9e in #59. cc @sourcenetwork.

jsimnz commented Nov 25, 2021

AndrewSisley commented Nov 25, 2021

AndrewSisley left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jsimnz commented Nov 25, 2021

AndrewSisley Feb 4, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

AndrewSisley left a comment

Choose a reason for hiding this comment

codecov bot commented Feb 4, 2022 • edited Loading

Codecov Report

This comment was generated by todo based on a `todo` comment in `53b61ea` in #59. cc @sourcenetwork.

This comment was generated by todo based on a `todo` comment in `53b61ea` in #59. cc @sourcenetwork.

This comment was generated by todo based on a `todo` comment in `53b61ea` in #59. cc @sourcenetwork.

This comment was generated by todo based on a `todo` comment in `53b61ea` in #59. cc @sourcenetwork.

This comment was generated by todo based on a `todo` comment in `53b61ea` in #59. cc @sourcenetwork.

This comment was generated by todo based on a `todo` comment in `53b61ea` in #59. cc @sourcenetwork.

This comment was generated by todo based on a `todo` comment in `53b61ea` in #59. cc @sourcenetwork.

This comment was generated by todo based on a `todo` comment in `53b61ea` in #59. cc @sourcenetwork.

This comment was generated by todo based on a `TODO` comment in `53b61ea` in #59. cc @sourcenetwork.

This comment was generated by todo based on a `todo` comment in `53b61ea` in #59. cc @sourcenetwork.

This comment was generated by todo based on a `todo` comment in `53b61ea` in #59. cc @sourcenetwork.

This comment was generated by todo based on a `todo` comment in `53b61ea` in #59. cc @sourcenetwork.

This comment was generated by todo based on a `todo` comment in `1025a9e` in #59. cc @sourcenetwork.

AndrewSisley Feb 4, 2022 •

edited

Loading

codecov bot commented Feb 4, 2022 •

edited

Loading