-
Notifications
You must be signed in to change notification settings - Fork 44
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Add support for top level aggregates #594
Conversation
76c2b69
to
e896e05
Compare
@@ -223,6 +231,25 @@ func (p *Planner) expandPlan(plan planNode, parentPlan *selectTopNode) error { | |||
case *deleteNode: | |||
return p.expandPlan(n.source, parentPlan) | |||
|
|||
case *topLevelNode: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: now we will have a topLevelNode
for these top-level aggregates which will wrap the selectTopNode
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The select top node is essentially a join (on all), similar to how aggregates behave in all other cases - most of that code hasn't changed.
Pre-render structure is roughly:
{
count: 2,
users: [{...},{...}],
}
query/graphql/planner/operations.go
Outdated
@@ -35,6 +35,7 @@ var ( | |||
_ planNode = (*typeJoinOne)(nil) | |||
_ planNode = (*updateNode)(nil) | |||
_ planNode = (*valuesNode)(nil) | |||
_ planNode = (*topLevelNode)(nil) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: Does this new node need to be explainable? I think it should be (can be out of this PR).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
very much agree that that is out of scope, will add to description
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick: I think the entire list was alphabetically sorted (pre-sort-to-order-node PR) maybe if you can sort this as following, will make my ocd happy.
_ planNode = (*averageNode)(nil)
_ planNode = (*commitSelectNode)(nil)
_ planNode = (*commitSelectTopNode)(nil)
_ planNode = (*countNode)(nil)
_ planNode = (*createNode)(nil)
_ planNode = (*dagScanNode)(nil)
_ planNode = (*deleteNode)(nil)
_ planNode = (*groupNode)(nil)
_ planNode = (*hardLimitNode)(nil)
_ planNode = (*headsetScanNode)(nil)
_ planNode = (*multiScanNode)(nil)
_ planNode = (*orderNode)(nil)
_ planNode = (*parallelNode)(nil)
_ planNode = (*pipeNode)(nil)
_ planNode = (*renderLimitNode)(nil)
_ planNode = (*scanNode)(nil)
_ planNode = (*selectNode)(nil)
_ planNode = (*selectTopNode)(nil)
_ planNode = (*sumNode)(nil)
_ planNode = (*topLevelNode)(nil)
_ planNode = (*typeIndexJoin)(nil)
_ planNode = (*typeJoinMany)(nil)
_ planNode = (*typeJoinOne)(nil)
_ planNode = (*updateNode)(nil)
_ planNode = (*valuesNode)(nil)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lol.... maybe 🤣
) | ||
|
||
// topLevelNode is a special node that represents the very top of the | ||
// plan graph. It has no source, and will only yield a single item |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: this represents the very top of the plan graph which contains top level aggregates only right? or for every plan we will have this? Also if it is the top level why would it not have source (linking to the rest of the plangraph, aren't the children the source)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just aggregates atm, this is noted in the description under stuff I consider out of scope
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
children are disinct from source. Source is typically what nodes iterate through, which is nil in this case and distinct from children.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would we want Source()
to return children (even if in other PR)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe, but I dont see that as a question to be answered in this PR. My gut says nil is better as the children are not really the source (as mentioned, they do not yield items in the same sense as other nodes)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Beauty of a PR + tests are on point! Made some comments and asked some questions. I did a round 1 review of sorts so far, but I feel somewhat unqualified to review generate.go
stuff. I will give a more detailed look when I wake up.
@@ -433,13 +439,39 @@ func getRequestables( | |||
return | |||
} | |||
|
|||
func getAggregateRequests(index int, aggregate *parser.Select) (aggregateRequest, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: A line of comment here would be great.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure how much new info I could add that isn't in the func signature, but will have a look
- doc getAggregateRequests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
leaving as is
query/graphql/planner/operations.go
Outdated
@@ -35,6 +35,7 @@ var ( | |||
_ planNode = (*typeJoinOne)(nil) | |||
_ planNode = (*updateNode)(nil) | |||
_ planNode = (*valuesNode)(nil) | |||
_ planNode = (*topLevelNode)(nil) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick: I think the entire list was alphabetically sorted (pre-sort-to-order-node PR) maybe if you can sort this as following, will make my ocd happy.
_ planNode = (*averageNode)(nil)
_ planNode = (*commitSelectNode)(nil)
_ planNode = (*commitSelectTopNode)(nil)
_ planNode = (*countNode)(nil)
_ planNode = (*createNode)(nil)
_ planNode = (*dagScanNode)(nil)
_ planNode = (*deleteNode)(nil)
_ planNode = (*groupNode)(nil)
_ planNode = (*hardLimitNode)(nil)
_ planNode = (*headsetScanNode)(nil)
_ planNode = (*multiScanNode)(nil)
_ planNode = (*orderNode)(nil)
_ planNode = (*parallelNode)(nil)
_ planNode = (*pipeNode)(nil)
_ planNode = (*renderLimitNode)(nil)
_ planNode = (*scanNode)(nil)
_ planNode = (*selectNode)(nil)
_ planNode = (*selectTopNode)(nil)
_ planNode = (*sumNode)(nil)
_ planNode = (*topLevelNode)(nil)
_ planNode = (*typeIndexJoin)(nil)
_ planNode = (*typeJoinMany)(nil)
_ planNode = (*typeJoinOne)(nil)
_ planNode = (*updateNode)(nil)
_ planNode = (*valuesNode)(nil)
query/graphql/planner/planner.go
Outdated
if _, isSelect := child.(*selectTopNode); isSelect { | ||
// We only care about expanding the child source here, it is assumed that the parent source | ||
// is expanded elsewhere/already | ||
err := p.expandPlan(child, parentPlan) | ||
if err != nil { | ||
return err | ||
} | ||
} else { | ||
switch c := child.(type) { | ||
case aggregateNode: | ||
// top-level aggregates use the top-level node as a source | ||
c.SetPlan(n) | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: Would the following not do the same exact thing ?
if _, isSelect := child.(*selectTopNode); isSelect { | |
// We only care about expanding the child source here, it is assumed that the parent source | |
// is expanded elsewhere/already | |
err := p.expandPlan(child, parentPlan) | |
if err != nil { | |
return err | |
} | |
} else { | |
switch c := child.(type) { | |
case aggregateNode: | |
// top-level aggregates use the top-level node as a source | |
c.SetPlan(n) | |
} | |
} | |
switch c := child.(type) { | |
case aggregateNode: | |
// top-level aggregates use the top-level node as a source | |
c.SetPlan(n) | |
case *selectTopNode: | |
// We only care about expanding the child source here, | |
// it is assumed that the parent source is expanded elsewhere/already. | |
err := p.expandPlan(child, parentPlan) | |
if err != nil { | |
return err | |
} | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lol, my bad - thanks, will change :D
- cleanup idiot code
query/graphql/schema/generate.go
Outdated
expandedField := &gql.InputObjectFieldConfig{ | ||
Type: g.manager.schema.TypeMap()[name+"FilterArg"], | ||
} | ||
aggregateTarget.Type.(*gql.InputObject).AddFieldConfig("filter", expandedField) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
aggregateTarget.Type.(*gql.InputObject).AddFieldConfig("filter", expandedField) | |
aggregateTarget.Type.(*gql.InputObject).AddFieldConfig(parserTypes.FilterClause, expandedField) | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cheers, will do
- const
query/graphql/schema/generate.go
Outdated
return err | ||
} | ||
|
||
objs = g.genCountInlineArrayInputs(t) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: why do we need mutation
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry - I can't tell what you mean here, could you expand?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry my bad, I was wondering why we need to re-assign to objs
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I didnt spot that cheers. will renaming it quickly
- rename
Fields: gql.InputObjectConfigFieldMap{ | ||
"_": &gql.InputObjectFieldConfig{ | ||
Type: gql.Int, | ||
Description: "Placeholder - empty object not permitted, but will have fields shortly", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: Do we want a const
string for this as it is used in a few places.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so, content here doesnt really matter
e896e05
to
1b73854
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leaving a LGTM, but feel like I wasn't qualified for some of it haha. Can do a second round if not in a rush once I wake up. Perhaps @jsimnz 's second opinion might be useful here?
01bc421
to
85bac3a
Compare
Ahh, its just code, nothing fancy. Will leave it hanging around for a couple more hours in case anyone else wants to chime in, but I have future work dependent on this and there is nothing here IMO that could have a significant impact on the health of the codebase |
// plan graph. It has no source, and will only yield a single item | ||
// containing all of its children. | ||
type topLevelNode struct { | ||
documentIterator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thought (non-blocking): I look forward to see this renamed to something more appropriate 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! I think I went over it 3 times and quite pleased with it. The only thing that I find a little weird is the isInRecurse
with the defer
switching from true
to false
. But I can't think of anything better at the moment so I think I'll let it go for now :)
query/graphql/planner/top.go
Outdated
} | ||
n.currentValue.Fields[n.childIndexes[i]] = docs | ||
default: | ||
_, err := child.Next() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: why do you ignore hasChild
here but not above? What if the child value is nil
? Not saying it's wrong. But maybe a comment would help clarify.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will add a comment - an explicit nil is desirable - it means whatever the user has requested is nil (current a nil is not possible anyway, as atm this can only be an aggregate, which never returns nil). Although having written that I think it would be better to check the bool and explicitly assign nil (even though it would technically be dead code atm).
- tweak this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
went with a comment after remembering more as to why this is like this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Had this also as a pending comment, but moving it here for consistency. I still strongly suggest to keep the bool check in place, regardless of the strongest mental gurantees we think are in place, we should never call Value()
without guaranteeing from the previous Next()
that the node indeed has a value.
And it costs literally nothing to have the bool check in place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure will change - it will mean introducing dead code though (to handle !hasValue)
- revise
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't characterize it as "dead" code. As its still enforces the expected guarantees required, so its more of a safety net.
85bac3a
to
a393d66
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Submitting my comments now, still investigating the recursive structure of topLevelNode
. Generate looks good!
@@ -61,7 +60,6 @@ func (p *Planner) Sum( | |||
|
|||
// Returns true if the value to be summed is a float, otherwise false. | |||
func (p *Planner) isValueFloat( | |||
parentDescription *client.CollectionDescription, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thought: Outside of this PR, it would be great if we look into the generalization of that descriptions repo cache you implemented for the mapper, make it more generic to be used by any part of the codebase, then can be embedded within the planner or mapper systems respectively.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Glad you support this, as I had that in mind when writing that
// This node's children may use this node as a source | ||
// this property controls the recursive flow preventing | ||
// infinate loops. | ||
isInRecurse bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: Technically this breaks the guarantees of the plan graph being a DAG (Directed Acyclical Graph).
I'm not sure I fully understand the reason for having something called "topLevelNode" that can exist not at the top of the graph.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: Technically this breaks the guarantees of the plan graph being a DAG (Directed Acyclical Graph).
For future reference: We spoke over discord about this
I'm not sure I fully understand the reason for having something called "topLevelNode" that can exist not at the top of the graph.
It does and can only exist at the top of the graph (ignoring recursive element). Making it always there is mentioned in the out-of-scope section of the PR description (item 2)
if n.isInRecurse { | ||
return | ||
} | ||
n.isInRecurse = true | ||
defer func() { | ||
n.isInRecurse = false | ||
}() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: This feels funky to me, and not sure i've seen something like this used in the context of recursion, makes me feel like there is an issue with the control flow of the code. Also is a result of the DAG violation I mentioned above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noted the same concern above. But I can't think of anything clever at the moment to help us with it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be possible to remove the recursion by adding a second new node to handle the select/join stuff, but IMO it is not worth it at the moment and is easy enough to do when expanding the feature (for explain, or multiple top-level stuff)
if n.isdone { | ||
return false, nil | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: This would prevent the node from being re-used as is. Since the isDone
var isn't reset during the Init
call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about adding that in, but decided against it as it is untestable at the moment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's testable, just not through integration tests. Which i guess inherently makes it a moot-ish point
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO if a consumer cant hit a line of code it is dead and shouldn't really be tested
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to the other comment about !hasValue
bool check. Its more about enforcing expected invariants that the planner / plangraph are supposed to have.
Some notable ones come to mind
- plan graph is a DAG
- Only call
Value()
after a succesful true call toNext()
Init()
fully re-initializes the node in question (and sub nodes)
This particular comment relates the 3. I can certainly imagine there are other violations of these invariants throughout the code which will surface through additional testing, but atm this is identified here and now.
This is about ensuring that future consumer/callers of the plan graph can develop knowing certain invariants are maintained, so they don't have to go down a debugging rabbit hole only to learn a single random node doesn't properly re-initialize, or that walking the graph, expecting it to be acyclical, randomly OOMs itself because there is a circular reference somewhere.
This kind of mindset will save time/headbanging in the future. Obviously, the priority is to make sure the requirements right now are met, but not explicitly at the expense of future devs.
At the very least there are always two developers that need to interact with your code. You and Future You.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Future me should test any code he uses - if for some odd reason a piece of functionality inits a top level node twice it should 100% add tests that cover that. I find it much safer to assume that any code that isn't tested is broken, than to assume that untested code is correct.
Regarding this line, unless unit tested (which is a whole other topic) - there is no way to assert that n.isDone = false
is correct (and useful - something that unit tests cant really assert very well).
RE (1) - there is nothing that asserts that this node forms part of a DAG is true or necessary. The same can be said of (2) - defra users dont give a monkey's about that.
(Generally) the only things I really care about in code is:
- Does it provide provably correct behaviour to externals (externals can be internal devs in some cases)
- Is it easy to change (this includes readability, cohesion, simplicity, etc)
As mentioned, the known alternative to the recursion is to introduce a second new node to handle the selects etc, this IMO would be less cohessive and more complex whilst providing no additional functionality in the present. It is not provable that it will be useful for future requirements (e.g. explain/multi-top-queries) as they have not been implemented yet, and it could end up being counter productive there costing additional time to replace with something that is provably useful.
query/graphql/planner/top.go
Outdated
} | ||
n.currentValue.Fields[n.childIndexes[i]] = docs | ||
default: | ||
_, err := child.Next() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Had this also as a pending comment, but moving it here for consistency. I still strongly suggest to keep the bool check in place, regardless of the strongest mental gurantees we think are in place, we should never call Value()
without guaranteeing from the previous Next()
that the node indeed has a value.
And it costs literally nothing to have the bool check in place.
d402aa1
to
3c8d752
Compare
Decouple them from host, cleaner for user, and allows reuse for top-level aggs
Minimal cost in re-getting, and makes it easier to call from other locations
Switch will also gain a new case shortly
Whilst they should have the same value at the moment, the disinction between the two becomes more important when introducing top-level aggregates
Will be called multiple times once top-level aggregates are introduced
93b4ba3
to
66ae0a4
Compare
Is a bug in the parser/query.go logic for aggregates that is breaking things when using a filter on a top-level-agg Update: Was some legacy code hanging around, deleting it solved the problem |
bc2612f
to
a2f7807
Compare
This has been incorrect for a while, and will cause problems for top-level aggregates
a2f7807
to
7ed7bc1
Compare
Any controversy appears to be very localised and this is blocking many of my remaining 0.3 tasks. Happy to continue the discussion post merge
* Rework count input objects Decouple them from host, cleaner for user, and allows reuse for top-level aggs * Remove sourceInfo param from Sum Minimal cost in re-getting, and makes it easier to call from other locations * Use switch instead of if for type check Switch will also gain a new case shortly * Remove unused type from createExpandedFieldAggregate * Use correct collection name Whilst they should have the same value at the moment, the disinction between the two becomes more important when introducing top-level aggregates * Extract out aggregate request logic to function Will be called multiple times once top-level aggregates are introduced * Remove legacy code This has been incorrect for a while, and will cause problems for top-level aggregates * Add support for top level aggregates
Relevant issue(s)
Resolves #98
Description
Adds support for top-level aggregates, allowing consumers to aggregate across entire collections.
I consider the following desirable, but out of scope:
Tasks
How has this been tested?
Manual type checking in the Altair client, plus int. tests.
Specify the platform(s) on which this was tested: