[TPC-H] Query 11 and 13 memory issue #1382
This is a bit harder to detect, though; this is actually a case where the cardinality is needed to make a more informed decision.
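Here is a minimal sketch of what a cardinality-informed decision could look like. It assumes an estimate of the number of distinct groups is already available, for example from statistics or a sample. The function name `choose_split_out` and its thresholds are hypothetical and are not part of Dask or dask-expr.

```python
import math


def choose_split_out(
    estimated_groups: int,
    target_groups_per_partition: int = 1_000_000,
    max_partitions: int = 1024,
) -> int:
    """Hypothetical heuristic: how many output partitions a groupby-aggregate
    should produce, given an (assumed) estimate of the number of distinct groups."""
    if estimated_groups <= 0:
        return 1
    # Enough partitions to keep each one under the target group count...
    needed = math.ceil(estimated_groups / target_groups_per_partition)
    # ...but never fewer than one or more than the configured cap.
    return max(1, min(needed, max_partitions))


# Few groups: a single output partition is fine.
print(choose_split_out(25))          # 1
# ~30M groups: spread the result over 30 partitions.
print(choose_split_out(30_000_000))  # 30
```

Without such an estimate the optimizer can only guess. That is why the decision is harder to automate than the other fixes.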
This one turns out to be a bit trickier. There's an instance of a groupby with many unique values, as well as a join with a single value of the (partitioned) nations dataset (#1380). However, applying the trivial fixes (…
As it turns out, …
The broadcast flag wasn't properly preserved when pushing filters down; this is probably why that looked weird for @hendrikmakait. PR to fix it is here: dask/dask-expr#871. We have to rerun after that one is in.
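The bug class described here can be illustrated with a toy plan rewrite. The classes `Scan`, `Merge`, `Filter` and the function `push_filter_into_merge` are hypothetical and much simpler than dask-expr's expression classes. The plan is only loosely modelled on Query 11's filter on the nation table. The sketch shows how rebuilding a join node during filter pushdown can silently drop a broadcast hint, and how copying the node with `dataclasses.replace` keeps it. It is not a description of what dask/dask-expr#871 changes.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from typing import Union


@dataclass(frozen=True)
class Scan:
    table: str


@dataclass(frozen=True)
class Merge:
    left: Node
    right: Node
    on: str
    broadcast: bool = False  # toy plan hint: send the small side to every partition


@dataclass(frozen=True)
class Filter:
    child: Node
    predicate: str  # opaque text; this toy never evaluates it


Node = Union[Scan, Merge, Filter]


def push_filter_into_merge(filt: Filter, side: str, keep_flags: bool = True) -> Node:
    """Rewrite Filter(Merge(l, r)) into Merge(Filter(l), r) or Merge(l, Filter(r))."""
    merge = filt.child
    if not isinstance(merge, Merge):
        return filt
    if side == "left":
        left, right = Filter(merge.left, filt.predicate), merge.right
    else:
        left, right = merge.left, Filter(merge.right, filt.predicate)
    if keep_flags:
        # replace() copies every field not overridden, so broadcast survives.
        return replace(merge, left=left, right=right)
    # Buggy variant: rebuilding the node by hand resets broadcast to its default.
    return Merge(left, right, on=merge.on)


plan = Filter(
    Merge(Scan("supplier"), Scan("nation"), on="nationkey", broadcast=True),
    predicate="n_name == 'GERMANY'",
)
print(push_filter_into_merge(plan, "right").broadcast)                    # True
print(push_filter_into_merge(plan, "right", keep_flags=False).broadcast)  # False
```

Copying the existing node rather than constructing a new one means any flag added to the join later is carried through the rewrite automatically.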
@phofl: This looks much better now, thanks! https://cloud.coiled.io/clusters/383307/information?tab=Metrics
This looks like another instance of the problem in #1376. We end up with a groupby-aggregate that leaves us with ~30M groups in a single partition.
Edit (Patrick): Query 13 has exactly the same issue
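This pure-Python sketch shows why the number of output partitions matters for a groupby-aggregate. Each input partition is aggregated separately, and the partial results are then routed to output partitions by a hash of the group key. With a single output partition every group lands in one place, which is the situation described above. With more output partitions the groups are spread out. This mirrors the idea behind the `split_out` option of Dask's groupby aggregations. The function `groupby_sum` is hypothetical and is not Dask's implementation.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Row = Tuple[int, float]  # (group key, value)


def partial_sum(rows: Iterable[Row]) -> Dict[int, float]:
    # "Chunk" step: aggregate one input partition on its own.
    acc: Dict[int, float] = defaultdict(float)
    for key, value in rows:
        acc[key] += value
    return dict(acc)


def groupby_sum(partitions: List[List[Row]], split_out: int = 1) -> List[Dict[int, float]]:
    """Hypothetical groupby-sum that spreads groups over `split_out` outputs."""
    if split_out < 1:
        raise ValueError("split_out must be >= 1")
    partials = [partial_sum(p) for p in partitions]
    outputs: List[Dict[int, float]] = [defaultdict(float) for _ in range(split_out)]
    # Combine step: each key goes to exactly one output partition, chosen by hash,
    # so all partial sums for a key meet in the same place.
    for part in partials:
        for key, value in part.items():
            outputs[hash(key) % split_out][key] += value
    return [dict(o) for o in outputs]


partitions = [[(key, 1.0) for key in range(1000)] for _ in range(3)]
print([len(p) for p in groupby_sum(partitions, split_out=1)])  # [1000]
print([len(p) for p in groupby_sum(partitions, split_out=4)])  # [250, 250, 250, 250]
```

With `split_out=1` the single output holds every group, so its size grows with the number of distinct keys rather than with the data per partition.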