Window functions: `Reduce` - `FlatMap UnnestList` fusion #29554

ggevay · 2024-09-16T09:53:21Z

This PR implements fusing Reduce with FlatMap UnnestList to improve window function performance (#29426), which unblocks https://github.com/MaterializeInc/accounts/issues/3.

The first two commits are just minor refactorings in preparation for the main thing.

The third commit adds a feature flag (but doesn't wire it up to any actual functionality yet).

The fourth commit is the main thing.

We'll probably want to backport this to the release that is coming out this Thursday, because https://github.com/MaterializeInc/accounts/issues/3 is quite blocked at the moment.

Motivation

This PR fixes a recognized bug: Window functions: Fuse Reduce with FlatMap UnnestList (#29426). This is currently blocking https://github.com/MaterializeInc/accounts/issues/3 in their prototyping of a new use case.

Tips for reviewer

Review commit by commit.

Checklist

This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
- window_funcs.slt has 7000+ lines of window function tests, and most of those exercise the new code when the feature flag is enabled.
  - I've run the entire slt locally with the feature flag enabled and disabled.
  - The feature flag is explicitly enabled at the beginning of the slt.
  - Added some new tests at the end of the slt where the feature flag is explicitly disabled.
  - I've checked locally that when the feature flag is enabled, then all (non-const-folding) tests in the slt run the new code, i.e., that the pattern matching in the lowering always succeeds for window functions.
- Nightly:
  - https://buildkite.com/materialize/nightly/builds/9539
  - New run, after adding the pattern match soft_assert: https://buildkite.com/materialize/nightly/builds/9618
- Locally ran RQG window-functions with the feature flag enabled and disabled.
- Did some benchmarking: there is a 1.5x speedup on this benchmark.
This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.

ggevay · 2024-09-16T11:23:04Z

The reduce_plan_protobuf_roundtrip failure is just a test issue: the test is generating some invalid plans, which can't occur in practice. I'll fix it after lunch.

Edit: Fixed it.

ParkMyCar · 2024-09-16T14:57:39Z

src/sql-lexer/src/keywords.txt

@@ -342,6 +342,7 @@ Reassign
 Recursion
 Recursive
 Redacted
+Reduce


@ggevay I don't see these new keywords getting used anywhere, can you help me understand what they're for?

update, nvm I see from the SQL thread that they're for a new EXPLAIN option!

ParkMyCar

Looks good from an Adapter and SQL Council perspective

bosconi · 2024-09-17T20:36:33Z

@ggevay I checked with @antiguru and he has time to review this tomorrow.

ggevay · 2024-09-17T21:01:14Z

Ok, thank you!

antiguru

I read the change and I think I understand what's happening. I left some comments around code structure, but I think the general pattern is fine. (It shows that we need to think about re-doing reductions, because the code slowly becomes unreadable, but that's not this PR's fault.)

My main questions are around when to apply this optimization: The implementation chooses to do this in lowering, but to me it seems very similar to just any other transform, so I'd like to understand why it's not a transform.

antiguru · 2024-09-18T07:34:43Z

src/compute-types/src/plan/lowering.rs

-                    // `func`.
-                    for expr in &mut exprs {
-                        expr.permute_map(permutation);
+            MirRelationExpr::FlatMap {


Why is this not a regular transform? Is there something in here that we cannot express using MIR alone?

At a high level, this is more of a physical optimization, and physical optimizations are sometimes unpleasant to make a part of the MIR pipeline.

The specific problem here with putting this into the MIR pipeline would be that we'd need to modify MIR's semantics: MIR's Reduce currently always emits exactly 1 row per group, but the fused Reduce-FlatMap can emit multiple rows per group. Such semantic changes of MIR are very scary, since various parts of the optimizer assume that Reduce emits only 1 row per group, and it would be very hard to hunt down all these parts. (For example, key inference infers the group key as a unique key.)

(Btw. the MIR pipeline currently has a physical part, where the most important thing is JoinImplementation, but we are planning to also move JoinImplementation to the MIR-to-LIR lowering.)

Thanks, that makes sense.
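
To make the semantic difference discussed above concrete, here is a minimal, self-contained sketch. It does not use Materialize's real MIR or LIR types: the function names, the `(key, value)` row shape, and the running-sum "window function" are all assumptions chosen for illustration. The unfused path emits exactly one row per group (a list), which a separate unnest step then expands; the fused path emits several rows per group directly, which is why the group key can no longer be treated as a unique key of its output.

```rust
use std::collections::BTreeMap;

/// Unfused plan, step 1 (illustrative only): a Reduce-like step that emits
/// exactly one output row per group, carrying the per-group result as a list.
fn reduce_to_list(input: &[(u32, i64)]) -> BTreeMap<u32, Vec<i64>> {
    let mut groups: BTreeMap<u32, Vec<i64>> = BTreeMap::new();
    for &(key, value) in input {
        groups.entry(key).or_default().push(value);
    }
    // Stand-in for a window function: a running sum within each group.
    for values in groups.values_mut() {
        let mut acc: i64 = 0;
        for v in values.iter_mut() {
            acc += *v;
            *v = acc;
        }
    }
    groups
}

/// Unfused plan, step 2 (illustrative only): a FlatMap-like step that
/// unnests each per-group list into individual rows.
fn flat_map_unnest(reduced: &BTreeMap<u32, Vec<i64>>) -> Vec<(u32, i64)> {
    reduced
        .iter()
        .flat_map(|(key, list)| list.iter().map(move |v| (*key, *v)))
        .collect()
}

/// Fused plan (illustrative only): a single step that emits multiple rows
/// per group, so the group key is not a unique key of the output.
fn fused_reduce_unnest(input: &[(u32, i64)]) -> Vec<(u32, i64)> {
    let mut groups: BTreeMap<u32, Vec<i64>> = BTreeMap::new();
    for &(key, value) in input {
        groups.entry(key).or_default().push(value);
    }
    let mut out = Vec::new();
    for (key, values) in &groups {
        let mut acc: i64 = 0;
        for v in values {
            acc += *v;
            out.push((*key, acc));
        }
    }
    out
}
```

Both paths produce the same final rows; only the intermediate shape differs, and that intermediate shape is what key inference and similar analyses rely on.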

antiguru · 2024-09-18T07:43:38Z

src/compute/src/render/reduce.rs

+        };
+        let arranged =
+            partial.mz_arrange::<RowRowSpine<_, _>>(("Arranged ".to_owned() + name).as_str());
+        let oks = arranged.mz_reduce_abelian::<_, _, _, RowRowSpine<_, _>>(name, {


I think it's better if we move the !fused_unnest_list outside of the reduce closure and instead have two different mz_reduce_abelian calls. This avoids the runtime check in the closure (and means we don't rely on the optimizer to eliminate the unreachable branch).
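
The suggestion above can be sketched in isolation as follows. This is not the real rendering code: `reduce_groups` is a hypothetical stand-in for an operator such as `mz_reduce_abelian`, and the per-group logic (sum vs. emit every element) is invented purely to give the two branches something to do. Variant A checks the flag inside the closure on every invocation; variant B branches once and passes a closure that contains no check.

```rust
/// Hypothetical stand-in for a reduce operator: calls `logic` once per group.
fn reduce_groups<F>(groups: &[Vec<i64>], mut logic: F) -> Vec<i64>
where
    F: FnMut(&[i64], &mut Vec<i64>),
{
    let mut out = Vec::new();
    for group in groups {
        logic(group, &mut out);
    }
    out
}

/// Variant A: the flag is captured by the closure and tested for every group.
fn render_checked_inside(groups: &[Vec<i64>], fused_unnest_list: bool) -> Vec<i64> {
    reduce_groups(groups, |group, out| {
        if !fused_unnest_list {
            // One output per group.
            out.push(group.iter().sum::<i64>());
        } else {
            // Several outputs per group.
            out.extend_from_slice(group);
        }
    })
}

/// Variant B: branch once outside, then make two separate operator calls,
/// each with a closure that has no runtime flag check.
fn render_hoisted(groups: &[Vec<i64>], fused_unnest_list: bool) -> Vec<i64> {
    if !fused_unnest_list {
        reduce_groups(groups, |group, out| out.push(group.iter().sum::<i64>()))
    } else {
        reduce_groups(groups, |group, out| out.extend_from_slice(group))
    }
}
```

The two variants return the same results; the hoisted form trades a little code duplication for closures that are each specialized to one case.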

antiguru · 2024-09-18T08:11:49Z

src/repr/src/optimize.rs

@@ -119,6 +119,7 @@ optimizer_feature_flags!({
    reoptimize_imported_views: bool,
    // Enables the value window function fusion optimization.
    enable_value_window_function_fusion: bool,
+    enable_reduce_unnest_list_fusion: bool,


Documentation!

Sorry, I was thinking to stop adding docs for these optimizer flags, because most of these (except for reoptimize_imported_views) are simply the same thing as the feature flag of the same name.

Now I've added a comment that explicitly points to the feature flag of the same name.

antiguru · 2024-09-18T08:27:48Z

src/compute/src/render/reduce.rs

+                        let datum_iter = key.to_datum_iter();
+                        let mut datums_local = datums1.borrow();
+                        datums_local.extend(datum_iter);
+                        let key_len = datums_local.len();


I think you can avoid decoding the key repeatedly. This assumes that evaluating an mfp only appends datums, but never modifies datums, which I think is true at the moment.

diff --git a/src/compute/src/render/reduce.rs b/src/compute/src/render/reduce.rs
index c9ceac4470..c74a5b0e90 100644
--- a/src/compute/src/render/reduce.rs
+++ b/src/compute/src/render/reduce.rs
@@ -781,6 +781,7 @@ where
         // Allocations for the two closures.
         let mut datums1 = DatumVec::new();
+        let mut datums_key = DatumVec::new();
         let mut datums2 = DatumVec::new();
         let mfp_after1 = mfp_after.clone();
         let mfp_after2 = mfp_after.filter(|mfp| mfp.could_error());
@@ -826,16 +827,17 @@ where
                         target.push((row, 1));
                     }
                 } else {
+                    let mut datums_local = datums_key.borrow();
+                    datums_local.extend(key.to_datum_iter());
+                    let key_len = datums_local.len();
+
                     for datum in func
                         .eval_with_unnest_list::<_, window_agg_helpers::OneByOneAggrImpls>(
                             iter,
                             &temp_storage,
                         )
                     {
-                        let datum_iter = key.to_datum_iter();
-                        let mut datums_local = datums1.borrow();
-                        datums_local.extend(datum_iter);
-                        let key_len = datums_local.len();
+                        datums_local.truncate(key_len);
                         datums_local.push(datum);
                         if let Some(row) = evaluate_mfp_after(
                             &mfp_after1,

Oh, nice! Thank you!
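
The truncate-and-push idea in the suggestion above can be shown with plain `Vec`s. This is only a sketch under stated assumptions: the key and values are modeled as `i64` slices rather than `Row`/`Datum`, copying the key into the buffer stands in for decoding it, and the `decodes` counter exists only to make the difference observable. As in the review comment, it assumes later processing only appends to the buffer and never modifies the key prefix.

```rust
/// Naive version (illustrative only): the key is decoded again for every value.
fn rows_decoding_key_each_time(key: &[i64], values: &[i64]) -> (Vec<Vec<i64>>, usize) {
    let mut decodes = 0;
    let mut rows = Vec::new();
    for &value in values {
        let mut buf: Vec<i64> = Vec::new();
        buf.extend(key.iter().copied()); // stand-in for decoding the key
        decodes += 1;
        buf.push(value);
        rows.push(buf);
    }
    (rows, decodes)
}

/// Improved version (illustrative only): decode the key once, then truncate
/// the buffer back to the key length before pushing each new value.
fn rows_decoding_key_once(key: &[i64], values: &[i64]) -> (Vec<Vec<i64>>, usize) {
    let mut buf: Vec<i64> = Vec::new();
    buf.extend(key.iter().copied()); // decoded exactly once
    let decodes = 1;
    let key_len = buf.len();
    let mut rows = Vec::new();
    for &value in values {
        // Drop whatever was appended after the key in the previous iteration.
        buf.truncate(key_len);
        buf.push(value);
        rows.push(buf.clone());
    }
    (rows, decodes)
}
```

Both functions build the same rows; the second performs a single key decode regardless of how many values the group produces.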

antiguru · 2024-09-18T08:30:41Z

src/sql/src/session/vars/definitions.rs

+    {
+        name: enable_reduce_unnest_list_fusion,
+        desc: "Enables fusing `Reduce` with `FlatMap UnnestList` for better window function performance",
+        default: false,


Why isn't this enabled by default? It's fine with me not to enable it, but I'd like to understand the reasoning!

Originally, the plan was to:

1. Roll this PR out as a patch release to all customers this week, in which case it seemed less risky to enable it only for GM this week.
2. Enable it for all customers in the next release window. I was planning to do this in a follow-up PR this Thursday, where I'd set this default to true.

However, then Nikhil decided (completely understandably) that we should roll this PR out in a special patch release to only GM (tomorrow, right after the normal release goes out). So, now I think I'll modify this PR to make it enabled by default.

(Generally, LD can of course override whatever default we set here, but I think it's good if the default value here is consistent with the LD setting of the majority of customers, so that we run the entirety of CI with the common setting.)

Edit: Done, I've changed the default to true.

ggevay · 2024-09-18T10:58:05Z

Thank you very much for the comments @antiguru! I've addressed all of them. Could you please check?

antiguru

Thanks!

antiguru · 2024-09-18T11:06:46Z

src/compute-types/src/plan/lowering.rs

-                    // `func`.
-                    for expr in &mut exprs {
-                        expr.permute_map(permutation);
+            MirRelationExpr::FlatMap {


Thanks, that makes sense.

antiguru · 2024-09-18T11:10:48Z

src/compute/src/render/reduce.rs

+                // Note that we skip validating for negative diffs when we have a fused unnest list,
+                // because this is already a CPU-intensive situation due to the non-incrementalness
+                // of window functions.
+                let validating = !fused_unnest_list;


Could you add an issue to enable validation with fused list unnest at a later time? Thank you

Yes, done: #29624

ggevay · 2024-09-18T12:40:37Z

CI caught some minor test issues due to changing the default value of the flag. Fixing them now.

Edit: Fixed.

ggevay · 2024-09-18T17:13:40Z

Had a call with Petros, where he approved it. Merging (after CI completes).

petrosagg

Went through the code with Gabor and it LGTM!

- Factors out Reduce lowering into a separate fn.
- Some code motion in FlatMap lowering. Checking the preconditions for fusion will be a multi-step process, from which we'll need to bail out to the standard FlatMap lowering at multiple stages.
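
The "multi-step precondition check with bail-outs" structure described in this commit message could look roughly like the sketch below. Everything here is hypothetical: `Expr`, `TableFunc`, `Lowered`, `try_fuse`, and `lower` are simplified stand-ins invented for illustration, not the real types or functions in `lowering.rs`, and the preconditions shown are a guess at the general shape rather than the actual list.

```rust
/// Hypothetical, heavily simplified plan nodes (not Materialize's real types).
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Source,
    Reduce { input: Box<Expr>, is_window_aggregate: bool },
    FlatMap { input: Box<Expr>, func: TableFunc },
}

#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum TableFunc {
    UnnestList,
    Other,
}

/// Which lowering path was taken (illustrative only).
#[derive(Debug, PartialEq)]
enum Lowered {
    Fused,
    Standard,
}

/// Checks the preconditions one at a time; any failed step returns `None`
/// so that the caller can fall back to the standard lowering.
fn try_fuse(expr: &Expr, fusion_enabled: bool) -> Option<Lowered> {
    if !fusion_enabled {
        return None;
    }
    let Expr::FlatMap { input, func } = expr else {
        return None;
    };
    if *func != TableFunc::UnnestList {
        return None;
    }
    let Expr::Reduce { is_window_aggregate, .. } = &**input else {
        return None;
    };
    if !*is_window_aggregate {
        return None;
    }
    Some(Lowered::Fused)
}

/// Tries the fused lowering first and otherwise uses the standard one.
fn lower(expr: &Expr, fusion_enabled: bool) -> Lowered {
    try_fuse(expr, fusion_enabled).unwrap_or(Lowered::Standard)
}
```

Returning `Option` from the fusion attempt keeps every bail-out point a plain early return, and the fallback to the standard path lives in exactly one place.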

In addition to addressing the comments, this:
- adds more tests
- flips the feature flag's default to true, and addresses the test fallout.
- adds a soft_assert_or_log for missing a fusion due to the complex pattern match failing.

ggevay added A-optimization Area: query optimization and transformation A-CLUSTER Topics related to the CLUSTER layer labels Sep 16, 2024

ggevay force-pushed the reduce-flatmap-fusion branch from 03933ff to 01ca776 Compare September 16, 2024 09:53

ggevay changed the title ~~Reduce flatmap fusion~~ Window functions: Reduce - FlatMap UnnestList fusion Sep 16, 2024

ggevay requested a review from petrosagg September 16, 2024 09:53

ggevay force-pushed the reduce-flatmap-fusion branch 4 times, most recently from d60d4e2 to 5404fcf Compare September 16, 2024 10:52

ggevay marked this pull request as ready for review September 16, 2024 11:13

ggevay requested review from a team as code owners September 16, 2024 11:13

ggevay requested a review from ParkMyCar September 16, 2024 11:13

ggevay force-pushed the reduce-flatmap-fusion branch 2 times, most recently from 7b34ac4 to 8548892 Compare September 16, 2024 14:20

ParkMyCar reviewed Sep 16, 2024

ParkMyCar approved these changes Sep 16, 2024

ggevay force-pushed the reduce-flatmap-fusion branch from 8548892 to 0439dde Compare September 16, 2024 18:30

antiguru self-requested a review September 17, 2024 20:34

antiguru reviewed Sep 18, 2024

ggevay force-pushed the reduce-flatmap-fusion branch 2 times, most recently from 9039d1e to 46585b2 Compare September 18, 2024 10:56

antiguru approved these changes Sep 18, 2024

ggevay mentioned this pull request Sep 18, 2024

Consider checking for negative multiplicities also in window function inputs #29624

Open

ggevay force-pushed the reduce-flatmap-fusion branch from 46585b2 to 0ab4e92 Compare September 18, 2024 12:47

ggevay force-pushed the reduce-flatmap-fusion branch from 0ab4e92 to f59d7ab Compare September 18, 2024 17:04

petrosagg approved these changes Sep 18, 2024

ggevay added 5 commits September 18, 2024 22:10

Preparations for Reduce-FlatMap fusion

cecaded

Add feature flag for Reduce-FlatMap fusion

9e24b28

Window functions: Fuse Reduce with FlatMap UnnestList

dab35b1

ggevay force-pushed the reduce-flatmap-fusion branch from f59d7ab to 33f63b5 Compare September 18, 2024 20:10

ggevay enabled auto-merge September 18, 2024 20:12

ggevay merged commit 35b3016 into MaterializeInc:main Sep 18, 2024
84 checks passed

github-actions bot locked and limited conversation to collaborators Sep 18, 2024
