Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add limit support for aggregates #771

Merged
merged 8 commits into from
Aug 31, 2022

Conversation

AndrewSisley
Copy link
Contributor

@AndrewSisley AndrewSisley commented Aug 31, 2022

Relevant issue(s)

Resolves #378

Description

Add limit support for aggregates. Includes the introduction of basic enumerable types to support the construction of lazily executed iteration logic (handy for limit/filter stuff when particular bits of logic are dependent on iteration-constants).

I was very tempted to break the enumerable stuff out into a child package of core, I feel like that is a better fit, but I thought there'd be plenty of talk around those types anyway and I didn't want to waste time moving it from where I first added it - please let me know your thoughts on the location.

Fully fleshing out Enumerable and friends is well out of scope IMO. There is an existing open-source package that does this, but it seems dead, and does not allow for error handling within enumerables.

Having to do limit/filter (and soon offset and sort) within the aggregate nodes (in order to support arrays and groups) is starting to become a bit clunky. I have broken this rework out to a new issue #770 as it will involve some notable head scratching and I don't want it to bog down this work (and having the extra use-case tests will be handy when attempting that ticket).

Commits should be clean.

Specify the platform(s) on which this was tested:

  • Debian Linux

@AndrewSisley AndrewSisley added feature New feature or request area/query Related to the query component action/no-benchmark Skips the action that runs the benchmark. labels Aug 31, 2022
@AndrewSisley AndrewSisley added this to the DefraDB v0.3.1 milestone Aug 31, 2022
@AndrewSisley AndrewSisley requested a review from a team August 31, 2022 17:19
@AndrewSisley AndrewSisley self-assigned this Aug 31, 2022
Copy link
Member

@jsimnz jsimnz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Had some thought/question, but doesn't require changes.

Additionally, as mentioned, we can likely split the Enumerable stuff into its own package, ideally with Option (granted I know theres export requirements on this one) and anything else that fits the Generic monad/functional pattern. Overlaps a bit with some of the dynamic type stuff im working on for a Iterable type, same semantics more or less. Future PR for this stuff, just mentioning

// executed upon iteration, allowing the enumerable to be constructed out of a
// complex set of instructions that can be evaluated in a single iteration of the
// underlying set.
type Enumerable[T any] interface {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

praise: Well defined, great documentation!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks :)

@@ -954,10 +961,22 @@ func getAggregateSources(field *parser.Select) ([]*aggregateRequestTarget, error
}
}

limitArg, hasLimitArg := tryGet(argumentValue, parserTypes.LimitClause)
if hasLimitArg {
limitValue, err := strconv.ParseInt(limitArg.Value.(*ast.IntValue).Value, 10, 64)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thought(out of scope): still not a fan of doing direct AST access in the mapper package, but TBD on solution outside of this PR

"offset": schemaTypes.NewArgConfig(gql.Int),
"having": schemaTypes.NewArgConfig(g.manager.schema.TypeMap()[typeName+"HavingArg"]),
"order": schemaTypes.NewArgConfig(g.manager.schema.TypeMap()[typeName+"OrderArg"]),
parserTypes.LimitClause: schemaTypes.NewArgConfig(gql.Int),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: Why change just this key to use the parserTypes package? Instead of the previous hardcoded string?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same Q for the other instances of this change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They have to be the same (or parsing will fail), so IMO they should use the same const (to show the connection, 1 place to change it, etc). I'd like the rest to be changed over time too.

@AndrewSisley
Copy link
Contributor Author

AndrewSisley commented Aug 31, 2022

Additionally, as mentioned, we can likely split the Enumerable stuff into its own package, ideally with Option (granted I know theres export requirements on this one) and anything else that fits the Generic monad/functional pattern. Overlaps a bit with some of the dynamic type stuff im working on for a Iterable type, same semantics more or less. Future PR for this stuff, just mentioning

I'll move the Enumerable stuff now then, as I was keen to do so (noted in description). Less fussed about option as it is simpler and less likely to grow, but I for sure see the merit in it (although worry about too many public packages) - can be done later (like you said), with greater thought/attention.

  • Move enumerable stuff

@codecov
Copy link

codecov bot commented Aug 31, 2022

Codecov Report

Merging #771 (06301ca) into develop (0f7fa15) will increase coverage by 0.23%.
The diff coverage is 94.64%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop     #771      +/-   ##
===========================================
+ Coverage    58.31%   58.55%   +0.23%     
===========================================
  Files          147      150       +3     
  Lines        16780    16884     +104     
===========================================
+ Hits          9786     9887     +101     
- Misses        6072     6074       +2     
- Partials       922      923       +1     
Impacted Files Coverage Δ
query/graphql/mapper/mapper.go 86.72% <76.92%> (-0.18%) ⬇️
core/enumerable/enumerable.go 84.61% <84.61%> (ø)
core/enumerable/take.go 100.00% <100.00%> (ø)
core/enumerable/where.go 100.00% <100.00%> (ø)
query/graphql/planner/count.go 95.60% <100.00%> (+3.46%) ⬆️
query/graphql/planner/sum.go 84.97% <100.00%> (+2.55%) ⬆️
query/graphql/schema/generate.go 84.03% <100.00%> (+0.06%) ⬆️

@AndrewSisley AndrewSisley merged commit d2b4623 into develop Aug 31, 2022
@AndrewSisley AndrewSisley deleted the sisley/feat/I378-agg-limit branch August 31, 2022 23:44
Copy link
Collaborator

@fredcarle fredcarle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although this was merged, I have a few comments that might be used for future changes.

First time reviewing a PR with generics. I think you used it well :)

count += 1
})

return count, err
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: Could we not avoid the OnEach loop here and just return the length of items?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

length is not known until full enumeration (due to filters, limits, etc)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes but you set items just above and all the OnEach does is add to count for each item. So the length of items would be known no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The constructed enumerable is not iterated through until OnEach is called. You could instead ToSlice it (func removed, but hopefully obvious), and then len(result), but that would be wasteful in that you'd copy the result into an array and then count it, instead of just counting the number of items that make it through the enumerable's filters.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fair enough 👍

}
}

func (s *enumerableSlice[T]) Next() (bool, error) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thought: This applies to all the Next methods. If they returned only a bool, it would make for a cleaner for loop with:

for source.Next() {
  ...
}

An error could end the loop and be saved and checked with source.Error() right after the loop:

if source.Error() {
  ...
}

Or even return source.Error() directly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Storing the Error on the object results in easily missed errors, it is perhaps the biggest reason generate.go is such a buggy pain in the neck to work with (the gql lib does this, and I hate this far more than the file size)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not really different than doing if err != nil within the for loop though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO it is, the linter will pick up any error returned but not handled, and it is much more obvious to any developer working with the type

}

// Take creates an `Enumerable` from the given `Enumerable` and limit. The returned
// `Enumerable` will restrict the maximum number of items yielded to the given limit.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: Since some people won't have experience with this terminology, I would encourage adding some kind of definition.

// It can be interpreted as `take x items from source`.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't think

Enumerable will restrict the maximum number of items yielded to the given limit.

covers it?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It tells us what the output will be limited to a length of limit but it's not obvious from the doc that limiting the output is all that is really happening. I feel like adding my suggestion would be a useful clarification.

You wrote it so it's obvious to you :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You wrote it so it's obvious to you :)

Always a risk :)

t's not obvious from the doc that limiting the output is all that is really happening

What else could it be doing? I dislike documenting what functions don't do, unless it is unexpectedly not doing something (len = infinity-x etc...) - is there something you expected it to do here that it doesn't do?

Copy link
Collaborator

@fredcarle fredcarle Sep 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What else could it be doing?

Exactly :)
My suggestion is not documenting something the function isn't doing. It's just that my preference is using function names that are self descriptive enough that a doc like you wrote would be enough. But in this case, Take is not obvious. I mean, it's obvious now that we discussed it but when I first looked at it, those few extra words in the doc would have helped a lot.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Take is not obvious

Sure - but what is not covered by the existing documentation? I am a bit confused here and I don't want the comment to get too wordy/repetitive as that just adds noise. Is yielded an unusual term outside of .Net? I could add a goc-doc example, but that feels overkill atm.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want to stay as concise as possible I would just replace the doc to this.

// Take limits the number of items that can be yielded by the given `Enumerable`.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can change it to that if you think it is worth the dev time (this would get it's own commit in develop, and will pop up in the change log). I still don't really see any extra clarity vs the current though as it uses the current terms (although it reads better and is more concise and for sure is an improvement IMO).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just create an issue for it and we can integrate it in a PR that uses that function.

shahzadlone pushed a commit to shahzadlone/defradb that referenced this pull request Feb 23, 2024
* Add tests for count 1-many with filter

* Use const for limit string

Needs to be the same

* Add basic enumerable types

* Add limit support for count of relations

* Add limit support for count of inline arrays

* Add limit support for numeric aggs of relations

* Add limit support for numeric aggs of inline arrays
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
action/no-benchmark Skips the action that runs the benchmark. area/query Related to the query component feature New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Aggregate: Add limit support
3 participants