feat: Add basic group by functionality #43

AndrewSisley · 2021-11-16T17:02:03Z

Adds basic group by functionality to DefraDB. Closes #25. Closes #46.

Note for reviewers:

* This does not add support for joins within child groups, although the gql clients may suggest that it does (see https://github.com/sourcenetwork/defradb/tree/sisley/group-by-child-join for a failing test and a partial, very WIP implementation). Please let me know your thoughts on whether this can wait or not (Group By: Support joins within _group items #46). Update: fixed in Group By: Support joins within _group items #46 and cherry-picked into here, see commits "Handle pipe nodes in addSubPlan" and "Add support for joins within groups".
* Adding non-grouped fields at the parent query level is permitted by gql, but the behaviour is undefined (it will just render the last value); e.g. `user (groupBy: [Age]) {Age, Name}`
* Commits are an absolute mess at the moment and I will clean them up once we decide that the functionality and implementation are alright to merge. They will be squashed, and then possibly re-committed in a sensible fashion. Please review all changes at once via the Files changed tab.
* Aggregates are not added here
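
To make the note above about non-grouped fields concrete, here is a minimal Go sketch of grouping rows by one field while naively carrying a non-grouped field on the parent row. It is not DefraDB's planner code: `doc`, `group`, `groupByAge`, and the sample data are all hypothetical, and the "last value wins" assignment is only one way the described behaviour could arise.

```go
package main

import "fmt"

// doc is a hypothetical stand-in for one row of a `user` collection.
type doc struct {
	Name string
	Age  int
}

// group is a hypothetical parent row produced by grouping on Age.
type group struct {
	Age   int
	Name  string // non-grouped field: overwritten by every member
	Items []doc  // roughly what `_group` would hold
}

// groupByAge groups docs by Age, keeping groups in first-seen order.
func groupByAge(docs []doc) []*group {
	index := map[int]*group{}
	var ordered []*group
	for _, d := range docs {
		g, ok := index[d.Age]
		if !ok {
			g = &group{Age: d.Age}
			index[d.Age] = g
			ordered = append(ordered, g)
		}
		g.Name = d.Name // the last member read decides the parent's Name
		g.Items = append(g.Items, d)
	}
	return ordered
}

func main() {
	docs := []doc{{"Alice", 30}, {"Bob", 25}, {"Carol", 30}}
	for _, g := range groupByAge(docs) {
		fmt.Printf("Age=%d Name=%s members=%d\n", g.Age, g.Name, len(g.Items))
	}
}
```

With this input the `Age=30` group reports `Name=Carol`, simply because Carol was read after Alice; a different read order would give a different parent `Name`, which is why the result should not be relied on.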

To do:

* Clean up commits once people are mostly happy with the implementation (previous commits perma-stashed in sisley/group-by-pre-squash)
* Clean up any other 'perma-stash' remote branches
* Proper godocs for node(s)
* Decide whether to break out the auxiliary components (such as pipeNode) from the group.go file (I'm really in two minds here, input very very welcome)
* One-to-many join tests are flaky in this branch (develop doesn't seem to have this issue): find and fix
* Squash fixup commit and inner join commits

todo · 2021-11-16T17:02:08Z

this is incorrect for joins within `_group` collections, and should be corrected when possible

defradb/query/graphql/planner/planner.go

Lines 253 to 258 in 5949c8b

    
    // @todo: this is incorrect for joins within `_group` collections, and should be corrected when possible
    // Find the first scan node in the plan, we assume that it will be for the correct collection
    scanNode := p.walkAndFindPlanType(plan.plan, &scanNode{}).(*scanNode)
    // Check for any existing pipe nodes in the plan, we should use it if there is one
    pipe, hasPipe := p.walkAndFindPlanType(plan.plan, &pipeNode{}).(*pipeNode)

This comment was generated by todo based on a `todo` comment in `5949c8b` in #43. cc @sourcenetwork.

todo · 2021-11-16T17:02:10Z

@ handle error

defradb/query/graphql/schema/generate.go

Lines 327 to 332 in 5949c8b

    
        //todo@ handle error
    }
    fields[parser.GroupFieldName] = &gql.Field{
        Type: gql.NewList(gqlType),
    }

This comment was generated by todo based on a `todo` comment in `5949c8b` in #43. cc @sourcenetwork.

AndrewSisley · 2021-11-16T19:56:32Z

More info on the flaky tests:

* Only one-to-many tests are affected (incl. non-group tests)
* The error surfaces at ~ln 90 in executor.go, from the call gql.ValidateDocument(schema, ast, nil)
* It occurs ~1 in 10 times an affected test runs
* Spamming db.SchemaManager().ResolveTypes() or gqlObject.Fields() appears to have no effect on the rate of failure
* schema.TypeMap()["author"].(*gql.Object).Fields()["published"].Args is empty when tests fail (~ln 90, executor.go), and populated when they pass

Once the go cache decides a test has passed, it can be really hard to make it fail again, so clean the cache first (go clean -testcache).

Lazy command for me to reproduce (may take a few runs, warning - contains local aliases if you are not me):
go.t.c && go.t.r TestQueryOneToManyWithNumericGreaterThanFilterOnParentAndChild

UPDATE: Issue fixed - see commit "Track expanded items by object and name" - issue may have been affecting production (if object has more than one parent record - e.g. book with an author and a publisher - might be worth adding tests for outside of this PR)
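
The fix named above ("Track expanded items by object and name") can be illustrated with a small Go sketch. This is a guess at the shape of the problem, not the actual schema-generation code: `field`, `expandKey`, `expandByObject`, `expandByObjectAndName`, and the `filter` argument are hypothetical names. The point is only that a "seen" set keyed by object type skips a second field of the same type, while a key of object plus field name does not.

```go
package main

import "fmt"

// field is a hypothetical stand-in for a GraphQL field definition.
type field struct {
	Name string
	Type string   // name of the object type the field points at
	Args []string // argument names attached during expansion
}

// expandKey identifies an expansion by target object AND field name.
type expandKey struct {
	Object string
	Field  string
}

// expandByObject tracks by object only: once an object type has been
// seen, every later field of that type is skipped and gets no args.
func expandByObject(fields []*field) {
	seen := map[string]bool{}
	for _, f := range fields {
		if seen[f.Type] {
			continue
		}
		seen[f.Type] = true
		f.Args = append(f.Args, "filter")
	}
}

// expandByObjectAndName tracks by (object, field name), so each field
// is expanded exactly once even when several share an object type.
func expandByObjectAndName(fields []*field) {
	seen := map[expandKey]bool{}
	for _, f := range fields {
		k := expandKey{Object: f.Type, Field: f.Name}
		if seen[k] {
			continue
		}
		seen[k] = true
		f.Args = append(f.Args, "filter")
	}
}

func main() {
	a := []*field{{Name: "published", Type: "book"}, {Name: "drafts", Type: "book"}}
	expandByObject(a)
	fmt.Println(len(a[0].Args), len(a[1].Args)) // second field is left without args

	b := []*field{{Name: "published", Type: "book"}, {Name: "drafts", Type: "book"}}
	expandByObjectAndName(b)
	fmt.Println(len(b[0].Args), len(b[1].Args)) // both fields get args
}
```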

AndrewSisley · 2021-11-16T20:13:01Z

this is incorrect for joins within _group collections, and should be corrected when possible

defradb/query/graphql/planner/planner.go

Lines 253 to 258 in 5949c8b

    // @todo: this is incorrect for joins within `_group` collections, and should be corrected when possible
    // Find the first scan node in the plan, we assume that it will be for the correct collection
    scanNode := p.walkAndFindPlanType(plan.plan, &scanNode{}).(*scanNode)
    // Check for any existing pipe nodes in the plan, we should use it if there is one
    pipe, hasPipe := p.walkAndFindPlanType(plan.plan, &pipeNode{}).(*pipeNode)

This comment was generated by todo based on a todo comment in 5949c8b in #43. cc @sourcenetwork.

Issue #46

jsimnz

All looks good to me, are there any final things you wanted/needed to do before this gets merged?

Is this PR also fixing those OneToMany test cases that were randomly failing? I see you added some changes to type generation stuff.

This is one of those things that might need an engineering note (or section in the DB Tech Spec) so we have a brief overview of the implementation, as well as ideas/thoughts to extend it with aggregates if you're open to writing that.

AndrewSisley · 2021-11-22T21:24:22Z

Test flakiness was fixed with commit "Track expanded items by object and name" - it might have been a production issue, but I haven't verified that.

Not before it gets merged, but I would be interested in benchmarking this with large (1,000,000+) numbers of keys at some point - I don't know about the Go implementation, but most normal map implementations can get quite slow with large numbers of records and it might be worth optimizing that at some point.

Happy to look at adding stuff to the tech spec - will do before I pick up a new task

jsimnz

👍

jsimnz · 2021-11-22T21:36:13Z

I believe Go's map implementation holds up pretty well, but we can certainly keep an eye on it
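
A rough way to start on the benchmarking idea discussed above is a throwaway Go program like the following. It is only a sketch under assumptions: `buildGroups` is a hypothetical helper that mimics grouping by a string key, the keys are synthetic, and wall-clock timing with `time.Since` is much cruder than a proper `go test -bench` benchmark against the real group node.

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

// buildGroups inserts n synthetic records into a map, spreading them
// over `buckets` distinct group keys (a stand-in for group-by values).
func buildGroups(n, buckets int) map[string][]int {
	groups := make(map[string][]int)
	for i := 0; i < n; i++ {
		key := strconv.Itoa(i % buckets)
		groups[key] = append(groups[key], i)
	}
	return groups
}

func main() {
	const n = 1_000_000
	// Vary the number of distinct keys: few large groups vs. many tiny ones.
	for _, buckets := range []int{10, 1_000, n} {
		start := time.Now()
		groups := buildGroups(n, buckets)
		fmt.Printf("buckets=%d groups=%d elapsed=%s\n", buckets, len(groups), time.Since(start))
	}
}
```

Timings will vary by machine, so no expected numbers are given here; the interesting comparison is how the elapsed time changes as the number of distinct keys grows.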

AndrewSisley · 2021-11-22T23:45:36Z

Added a short section to the tech spec under relationships

* Remove commented out test code
* Track expanded items by object and name
  Tracking by object only means that arguements on child objects only get added to the one field if more than one property requires them
* Rename variable
  'f' is used everywhere else in the file
* Add pipe node to defra db
  Allows a lazy-loaded cache with arbitrary reads
* Add data-source/arbitrary join node
  Note, does not yet fully implement the planNode interface and is only used within the groupNode
* Add group-by functionality to defra db
* Add support for joins within groups
  Test currently fails, needs more work plus cleanup. Stashing here and focusing on the main feature.
* Handle pipe nodes in addSubPlan
* FIXUP - Move child group field propogation into p.GroupBy
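
The "Add pipe node" commit above describes the node as allowing "a lazy-loaded cache with arbitrary reads". A minimal Go sketch of that general idea follows. It is not the actual pipeNode: `source`, `sliceSource`, `lazyCache`, and `At` are hypothetical names, and the real node works against planner sources rather than string slices.

```go
package main

import "fmt"

// source is a hypothetical pull-based iterator.
type source interface {
	Next() (value string, ok bool)
}

// sliceSource is a toy source that counts how many items were pulled.
type sliceSource struct {
	items []string
	pos   int
	reads int
}

func (s *sliceSource) Next() (string, bool) {
	if s.pos >= len(s.items) {
		return "", false
	}
	v := s.items[s.pos]
	s.pos++
	s.reads++
	return v, true
}

// lazyCache pulls from src only when an index beyond the cached range
// is requested, and serves earlier indexes from memory.
type lazyCache struct {
	src   source
	cache []string
	done  bool // true once src is exhausted
}

// At returns the item at index i, reading from the source only as far
// as needed. It reports false when i is out of range.
func (c *lazyCache) At(i int) (string, bool) {
	if i < 0 {
		return "", false
	}
	for !c.done && len(c.cache) <= i {
		v, ok := c.src.Next()
		if !ok {
			c.done = true
			break
		}
		c.cache = append(c.cache, v)
	}
	if i >= len(c.cache) {
		return "", false
	}
	return c.cache[i], true
}

func main() {
	src := &sliceSource{items: []string{"a", "b", "c", "d"}}
	c := &lazyCache{src: src}

	v, _ := c.At(2) // pulls "a", "b", "c" from the source
	fmt.Println(v, src.reads)

	v, _ = c.At(0) // served from the cache, no further reads
	fmt.Println(v, src.reads)
}
```

The design choice illustrated is that several consumers can read the same upstream results at arbitrary positions without re-running the source and without forcing it to be fully materialised up front.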

AndrewSisley added the feature (New feature or request) and area/query (Related to the query component) labels Nov 16, 2021

AndrewSisley requested review from jsimnz and shahzadlone November 16, 2021 17:02

AndrewSisley self-assigned this Nov 16, 2021

jsimnz linked an issue Nov 16, 2021 that may be closed by this pull request

Query GroupBy operation #25

Closed

AndrewSisley force-pushed the sisley/group-by branch from fb55cd3 to 7fe6c2b Compare November 16, 2021 21:52

AndrewSisley changed the title ~~Add basic group by functionality~~ feat: Add basic group by functionality Nov 16, 2021

AndrewSisley force-pushed the sisley/group-by branch from b97a202 to 26d25ec Compare November 16, 2021 22:11

AndrewSisley force-pushed the sisley/group-by branch from 26d25ec to 6a66e1a Compare November 16, 2021 22:12

AndrewSisley force-pushed the sisley/group-by branch from 6a66e1a to 3a24a75 Compare November 16, 2021 22:13

AndrewSisley force-pushed the sisley/group-by branch from 3a24a75 to 7423a89 Compare November 17, 2021 16:09

AndrewSisley force-pushed the sisley/group-by branch from 7423a89 to fc4d2e8 Compare November 17, 2021 17:24

AndrewSisley force-pushed the sisley/group-by branch from fc4d2e8 to 605e732 Compare November 17, 2021 17:30

AndrewSisley force-pushed the sisley/group-by branch from 605e732 to b6741dc Compare November 17, 2021 17:32

AndrewSisley force-pushed the sisley/group-by branch from b6741dc to 82bf22e Compare November 17, 2021 17:50

AndrewSisley linked an issue Nov 17, 2021 that may be closed by this pull request

Group By: Support joins within _group items #46

Closed

jsimnz requested changes Nov 19, 2021

AndrewSisley added 9 commits November 19, 2021 11:23

Remove commented out test code

bcd31f3

Track expanded items by object and name

31b3448

Tracking by object only means that arguements on child objects only get added to the one field if more than one property requires them

Rename variable

fb188d8

'f' is used everywhere else in the file

Add pipe node to defra db

807e17d

Allows a lazy-loaded cache with arbitrary reads

Add data-source/arbitrary join node

507974e

Note, does not yet fully implement the planNode interface and is only used within the groupNode

Add group-by functionality to defra db

ae2d803

Add support for joins within groups

16b43e3

Test currently fails, needs more work plus cleanup. Stashing here and focusing on the main feature.

Handle pipe nodes in addSubPlan

dc730a1

FIXUP - Move child group field propogation into p.GroupBy

69f2633

AndrewSisley force-pushed the sisley/group-by branch from 8e8206a to 69f2633 Compare November 19, 2021 16:23

AndrewSisley requested a review from jsimnz November 19, 2021 16:25

jsimnz reviewed Nov 22, 2021

jsimnz approved these changes Nov 22, 2021

jsimnz merged commit d717b8c into develop Nov 22, 2021

todo bot mentioned this pull request Nov 22, 2021

Error handling for GQL Object fieldThunk functions #53

Closed

jsimnz added the sred/merged (SR&ED activity: Merged) label Nov 22, 2021

AndrewSisley deleted the sisley/group-by branch November 22, 2021 23:47

This was referenced Dec 2, 2021

Group By: Support joins within _group items #46

Closed

Query GroupBy operation #25

Closed

orpheuslummis mentioned this pull request Jan 29, 2022

Changelog creation #151

Closed
