-
Notifications
You must be signed in to change notification settings - Fork 466
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Window functions: Reduce
- FlatMap UnnestList
fusion
#29554
Conversation
03933ff
to
01ca776
Compare
Reduce
- FlatMap UnnestList
fusion
d60d4e2
to
5404fcf
Compare
The Edit: Fixed it. |
7b34ac4
to
8548892
Compare
@@ -342,6 +342,7 @@ Reassign | |||
Recursion | |||
Recursive | |||
Redacted | |||
Reduce |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ggevay I don't see these new keywords getting used anywhere, can you help me understand what they're for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
update, nvm I see from the SQL thread that they're for a new EXPLAIN option!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good from an Adapter and SQL Council perspective
8548892
to
0439dde
Compare
Ok, thank you! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I read the change and I think I understand what's happening. I left some comments around code structure, but I think the general pattern is fine. (It shows that we need to think about re-doing reductions, because the code slowly becomes unreadable, but that's not this PR's fault.)
My main questions are around when to apply this optimization: The implementation choses to do this in lowering, but to me it seems very similar to just any other transform, so I'd like to understand why it's not a transform.
// `func`. | ||
for expr in &mut exprs { | ||
expr.permute_map(permutation); | ||
MirRelationExpr::FlatMap { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this not a regular transform? Is there something in here that we cannot express using MIR alone?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At a high level, this is more of a physical optimization, which are sometimes unpleasant to make a part of the MIR pipeline.
The specific problem here with putting this into the MIR pipeline would be that we'd need to modify MIR's semantics: MIR's Reduce
currently always emits exactly 1 row per group, but the fused Reduce-FlatMap
can emit multiple rows per group. Such semantic changes of MIR are very scary, since various parts of the optimizer assume that Reduce
emits only 1 row per group, and it would be very hard to hunt down all these parts. (For example, key inference infers the group key as a unique key.)
(Btw. the MIR pipeline currently has a physical part, where the most important thing is JoinImplementation
, but we are planning to move also JoinImplementation
to the MIR-to-LIR lowering.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, that makes sense.
src/compute/src/render/reduce.rs
Outdated
}; | ||
let arranged = | ||
partial.mz_arrange::<RowRowSpine<_, _>>(("Arranged ".to_owned() + name).as_str()); | ||
let oks = arranged.mz_reduce_abelian::<_, _, _, RowRowSpine<_, _>>(name, { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's better if we move the !fused_unnest_list
outside of the reduce closure and instead have two different mz_reduce_abelian
calls. This avoids the runtime check in the closure (or us relying on the optimizer to eliminate the unreachable branch.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, done!
@@ -119,6 +119,7 @@ optimizer_feature_flags!({ | |||
reoptimize_imported_views: bool, | |||
// Enables the value window function fusion optimization. | |||
enable_value_window_function_fusion: bool, | |||
enable_reduce_unnest_list_fusion: bool, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Documentation!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I was thinking to stop adding docs for these optimizer flags, because most of these (except for reoptimize_imported_views
) are simply the same thing as the feature flag of the same name.
Now I've added a comment that explicitly points to the feature flag of the same name.
src/compute/src/render/reduce.rs
Outdated
let datum_iter = key.to_datum_iter(); | ||
let mut datums_local = datums1.borrow(); | ||
datums_local.extend(datum_iter); | ||
let key_len = datums_local.len(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can avoid decoding the key repeatedly. This assumes that evaluating an mfp only appends datums, but never modifies datums, which I think is true at the moment.
diff --git a/src/compute/src/render/reduce.rs b/src/compute/src/render/reduce.rs
index c9ceac4470..c74a5b0e90 100644
--- a/src/compute/src/render/reduce.rs
+++ b/src/compute/src/render/reduce.rs
@@ -781,6 +781,7 @@ where
// Allocations for the two closures.
let mut datums1 = DatumVec::new();
+ let mut datums_key = DatumVec::new();
let mut datums2 = DatumVec::new();
let mfp_after1 = mfp_after.clone();
let mfp_after2 = mfp_after.filter(|mfp| mfp.could_error());
@@ -826,16 +827,17 @@ where
target.push((row, 1));
}
} else {
+ let mut datums_local = datums_key.borrow();
+ datums_local.extend(key.to_datum_iter());
+ let key_len = datums_local.len();
+
for datum in func
.eval_with_unnest_list::<_, window_agg_helpers::OneByOneAggrImpls>(
iter,
&temp_storage,
)
{
- let datum_iter = key.to_datum_iter();
- let mut datums_local = datums1.borrow();
- datums_local.extend(datum_iter);
- let key_len = datums_local.len();
+ datums_local.truncate(key_len);
datums_local.push(datum);
if let Some(row) = evaluate_mfp_after(
&mfp_after1,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, nice! Thank you!
{ | ||
name: enable_reduce_unnest_list_fusion, | ||
desc: "Enables fusing `Reduce` with `FlatMap UnnestList` for better window function performance", | ||
default: false, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why isn't this enabled by default? It's fine with me not to enable it, but I'd like to understand the reasoning!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Originally, the plan was to
- Roll this PR out as a patch release to all customers this week, in which case it seemed less risky to enable it only for GM this week.
- Enable it for all customers in the next release window. I was planning to do this in a follow-up PR this Thursday, where I'd set this default to
true
.
However, then Nikhil decided (completely understandably) that we should roll this PR out in a special patch release to only GM (tomorrow, right after the normal release goes out). So, now I think I'll modify this PR to make it enabled by default.
(Generally, LD can of course override whatever default we set here, but I think it's good if the default value here is consistent with the LD setting of the majority of customers, so that we run the entirety of CI with the common setting.)
Edit: Done, I've changed the default to true
.
9039d1e
to
46585b2
Compare
Thank you very much for the comments @antiguru! I've addressed all of them. Could you please check? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
// `func`. | ||
for expr in &mut exprs { | ||
expr.permute_map(permutation); | ||
MirRelationExpr::FlatMap { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, that makes sense.
// Note that we skip validating for negative diffs when we have a fused unnest list, | ||
// because this is already a CPU-intensive situation due to the non-incrementalness | ||
// of window functions. | ||
let validating = !fused_unnest_list; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add an issue to enable validation with fused list unnest at a later time? Thank you
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, done: #29624
CI caught some minor test issues due to changing the default value of the flag. Fixing them now. Edit: Fixed. |
46585b2
to
0ab4e92
Compare
0ab4e92
to
f59d7ab
Compare
Had a call with Petros, where he approved it. Merging (after CI completes). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Went through the code with Gabor and it LGTM!
- Factors out Reduce lowering into a separate fn. - Some code motion in FlatMap lowering. Checking the preconditions for fusion will be a multi-step process, from which we'll to bail out to the standard FlatMap lowering at multiple stages.
In addition to addressing the comments, this: - adds more tests - flips the feature flag's default to true, and addresses the test fallout. - adds a soft_assert_or_log for missing a fusion due to the complex pattern match failing.
f59d7ab
to
33f63b5
Compare
This PR implements fusing
Reduce
withFlatMap UnnestList
, to improve window function performance (#29426) to unblock https://github.com/MaterializeInc/accounts/issues/3.The first two commits are just minor refactorings in preparation for the main thing.
The third commit adds a feature flag (but doesn't wire it up to any actual functionality yet).
The fourth commit is the main thing.
We'll probably want to backport this to the release that is coming out this Thursday, because https://github.com/MaterializeInc/accounts/issues/3 is quite blocked at the moment.
Motivation
Reduce
withFlatMap UnnestList
#29426 This is currently blocking https://github.com/MaterializeInc/accounts/issues/3 with their prototyping of a new use case.Tips for reviewer
Review commit by commit.
Checklist
window_funcs.slt
has 7000+ lines of window function tests, and most of those exercise the new code when the feature flag is enabled.window-functions
with the feature flag enabled and disabled.$T ⇔ Proto$T
mapping (possibly in a backwards-incompatible way), then it is tagged with aT-proto
label.