Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add sort support for numeric aggregates #786

Merged
merged 5 commits into from
Sep 15, 2022

Conversation

AndrewSisley
Copy link
Contributor

Relevant issue(s)

Resolves #381

Description

Adds sort support for numeric aggregates. Does not add it for count, as the sort order cannot impact the count result. Also adds a Sort enumerable type.

Will suffer from the same issue as #588 when ordering via a related item (unless matching join is rendered), I'm noting this in that issue so that they can be fixed together.

How has this been tested?

(replace) Describe the tests performed to verify the changes. Provide instructions to reproduce them.

Specify the platform(s) on which this was tested:

  • Debian Linux

@AndrewSisley AndrewSisley added feature New feature or request area/query Related to the query component area/schema Related to the schema system action/no-benchmark Skips the action that runs the benchmark. labels Sep 9, 2022
@AndrewSisley AndrewSisley added this to the DefraDB v0.3.1 milestone Sep 9, 2022
@AndrewSisley AndrewSisley requested a review from a team September 9, 2022 19:39
@AndrewSisley AndrewSisley self-assigned this Sep 9, 2022
@codecov
Copy link

codecov bot commented Sep 9, 2022

Codecov Report

Merging #786 (926c38e) into develop (0ad47d5) will increase coverage by 0.25%.
The diff coverage is 88.04%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop     #786      +/-   ##
===========================================
+ Coverage    59.28%   59.53%   +0.25%     
===========================================
  Files          153      154       +1     
  Lines        17035    17181     +146     
===========================================
+ Hits         10099    10229     +130     
- Misses        6019     6030      +11     
- Partials       917      922       +5     
Impacted Files Coverage Δ
query/graphql/schema/generate.go 82.58% <70.00%> (+0.14%) ⬆️
query/graphql/mapper/mapper.go 85.21% <83.75%> (-0.33%) ⬇️
core/enumerable/sort.go 92.50% <92.50%> (ø)
query/graphql/planner/sum.go 87.33% <100.00%> (+1.76%) ⬆️

type enumerableSort[T any] struct {
source Enumerable[T]
less func(T, T) bool
capacity int
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: Do you actually mean capacity here or length? The Sort function would take capacity as the of len(source), and then Next() would reserve the capacity for the result to that length (using the capacity, which is understandable).

suggestion: Removal, if len(source) is sufficient.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The length of source is unknown during enumeration (due to filters, limits etc) - it is a capacity (a fairly generous capacity potentially).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is capacity not always == len(source) ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can't do len(source). Enumerable[T] is an interface.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An interface, and the result of it's enumeration would be unknown until it has been enumerated - would need to use the Count function instead

query/graphql/mapper/mapper.go Show resolved Hide resolved
Comment on lines +1055 to +1100
// The order in which items should be aggregated. Affects results when used with
// limit. Optional.
order *parserTypes.OrderBy
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thought: Outside the scope of this PR, but maybe we can organize these types in one place. Right now Limit parser.Filter and parserTypes.OrderBy are in 3 different places.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

partially agreed :) is a bit odd order and filter are in different places, but they are different from limit as they have to be parsed and we use the same parsing logic as the parser here. I have a vague memory of thinking it possible to remove that, but it has since left my head (may be noted in a comment somewhere)

Comment on lines +990 to +1059
if other == nil {
return o == nil
}

if len(o.Conditions) != len(other.Conditions) {
return false
}

for i, conditionA := range o.Conditions {
conditionB := other.Conditions[i]
if conditionA.Direction != conditionB.Direction {
return false
}

if len(conditionA.FieldIndexes) != len(conditionB.FieldIndexes) {
return false
}

for j, fieldIndexA := range conditionA.FieldIndexes {
fieldIndexB := conditionB.FieldIndexes[j]
if fieldIndexA != fieldIndexB {
return false
}
}
}

return true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: Perhaps some coverage for these lines as there is no test coverage at the moment.

Copy link
Contributor Author

@AndrewSisley AndrewSisley Sep 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cheers, I'll add a test or two

  • Add tests for order matching

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

line 1005 is dead at the moment due to #588

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

other than the 1005 block are rest being hit now? (idk why codecov doesn't show them as hitting)

Copy link
Contributor Author

@AndrewSisley AndrewSisley Sep 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm context switching too much, and this is performance-only code (no tests will fail). I missed a test case for sure, but there is an existing that should cover more than this - will look more into this

  • Why is rendered test not hitting most of this
  • Rendered or average test with orderby different field

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^Might be related to Shahzad's current ticket, leave this for a few hours and see how he gets on fixing it as it might be quick

Copy link
Member

@shahzadlone shahzadlone Sep 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^Might be related to Shahzad's current ticket, leave this for a few hours and see how he gets on fixing it as it might be quick

I was able to do the fix. 1st and 2nd commit in https://github.com/sourcenetwork/defradb/pull/774/commits, but unsure if that helps this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorrted

Comment on lines +129 to +131
if err != nil {
return nil, err
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: So all or nothing kind of deal. Even if one errors out we just return error. Do you know how big g.typeDefs generally is? wondering if is waste of time or worthwhile to perhaps reserve generatedQueryFields to it's size.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am confused by what you mean here, we need to generate these and we don't want things to partially succeed and leave users with a broken database.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry should have been clearer. We either return all generated fields or none if an error occurs (the way it should be). But. what I was asking was do you see any benefit to reserving generatedQueryFields to the capacity of g.typeDefs ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, got you. I don't think worrying about that is worth the bother - this is run on schema update, not query so performance (of that scale) doesnt really matter

Comment on lines 77 to 102
func TestQueryInlineNillableIntegerArrayWithSumWithOffsetWithLimitWithOrderAsc(t *testing.T) {
test := testUtils.QueryTestCase{
Description: "Simple inline array, ordered offsetted limited sum of integer array",
Query: `query {
users {
Name
_sum(TestScores: {offset: 1, limit: 3, order: ASC})
}
}`,
Docs: map[int][]string{
0: {
`{
"Name": "Shahzad",
"TestScores": [null, 2, 5, 1, 0, 7]
}`,
},
},
Results: []map[string]interface{}{
{
"Name": "Shahzad",
// 0 + 1 + 2
"_sum": int64(3),
},
},
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: Perhaps another test where {offset: 0, limit: 3, order: ASC} and "TestScores": [null, 0, 7]

Copy link
Contributor Author

@AndrewSisley AndrewSisley Sep 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Early morning me is struggling to see why. Why?

I might move null from the zero index though, to make sure sort is done before offset

  • test tweak

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to see what happens if null is there and limit needs to fill one more number (i.e. limit > total-non-nil-numbers.

Yes sir good idea on moving null to another index.

question: when you do sort how do you handle nulls (I might have seen this somewhere on this PR but slipped my mind). Like are they less than every other number, greater than or just ignored.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to see what happens if null is there and limit needs to fill one more number

IMO that is not an order related test, and the limit tests already cover limits exceeding the length

when you do sort how do you handle nulls

Is handled in the less function, in sum.go

@AndrewSisley AndrewSisley force-pushed the sisley/feat/I381-agg-sort branch 2 times, most recently from 0b2fd91 to 60e818b Compare September 12, 2022 14:26
Copy link
Member

@shahzadlone shahzadlone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! (assuming you will be adding the appropriate tests).

Copy link
Collaborator

@fredcarle fredcarle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with a single thought. Anything else was brought up by Shahzad.

query/graphql/mapper/mapper.go Show resolved Hide resolved
@AndrewSisley AndrewSisley merged commit 57a9394 into develop Sep 15, 2022
@AndrewSisley AndrewSisley deleted the sisley/feat/I381-agg-sort branch September 15, 2022 19:36
shahzadlone pushed a commit to shahzadlone/defradb that referenced this pull request Feb 23, 2024
* Add sort enumerable

* Move resolution of object input types to before resolution of aggregates

* Add order support for inline array numeric aggs

* Add order support for relation numeric aggs
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
action/no-benchmark Skips the action that runs the benchmark. area/query Related to the query component area/schema Related to the schema system feature New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Aggregate: Add sort support
3 participants