Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Window functions: Reduce - FlatMap UnnestList fusion #29554

Merged
merged 5 commits into from
Sep 18, 2024

Conversation

ggevay
Copy link
Contributor

@ggevay ggevay commented Sep 16, 2024

This PR implements fusing Reduce with FlatMap UnnestList, to improve window function performance (#29426) to unblock https://github.com/MaterializeInc/accounts/issues/3.

The first two commits are just minor refactorings in preparation for the main thing.

The third commit adds a feature flag (but doesn't wire it up to any actual functionality yet).

The fourth commit is the main thing.

We'll probably want to backport this to the release that is coming out this Thursday, because https://github.com/MaterializeInc/accounts/issues/3 is quite blocked at the moment.

Motivation

Tips for reviewer

Review commit by commit.

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
    • window_funcs.slt has 7000+ lines of window function tests, and most of those exercise the new code when the feature flag is enabled.
      • I've run the entire slt locally with the feature flag enabled and disabled.
      • The feature flag is explicitly enabled at the beginning of the slt.
      • Added some new tests at the end of the slt where the feature flag is explicitly disabled.
      • I've checked locally that when the feature flag is enabled, then all (non-const-folding) tests in the slt run the new code, i.e., that the pattern matching in the lowering always succeeds for window functions.
    • Nightly:
    • Locally ran RQG window-functions with the feature flag enabled and disabled.
    • Did some benchmarking: there is an 1.5x speedup on this benchmark.
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.

@ggevay ggevay added A-optimization Area: query optimization and transformation A-CLUSTER Topics related to the CLUSTER layer labels Sep 16, 2024
@ggevay ggevay changed the title Reduce flatmap fusion Window functions: Reduce - FlatMap UnnestList fusion Sep 16, 2024
@ggevay ggevay force-pushed the reduce-flatmap-fusion branch 4 times, most recently from d60d4e2 to 5404fcf Compare September 16, 2024 10:52
@ggevay ggevay marked this pull request as ready for review September 16, 2024 11:13
@ggevay ggevay requested review from a team as code owners September 16, 2024 11:13
@ggevay
Copy link
Contributor Author

ggevay commented Sep 16, 2024

The reduce_plan_protobuf_roundtrip failure is just a test issue: the test is generating some invalid plans, which can't occur in practice. I'll fix it after lunch.

Edit: Fixed it.

@ggevay ggevay force-pushed the reduce-flatmap-fusion branch 2 times, most recently from 7b34ac4 to 8548892 Compare September 16, 2024 14:20
@@ -342,6 +342,7 @@ Reassign
Recursion
Recursive
Redacted
Reduce
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ggevay I don't see these new keywords getting used anywhere, can you help me understand what they're for?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update, nvm I see from the SQL thread that they're for a new EXPLAIN option!

Copy link
Member

@ParkMyCar ParkMyCar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good from an Adapter and SQL Council perspective

@bosconi
Copy link
Member

bosconi commented Sep 17, 2024

@ggevay I checked with @antiguru and he has time to review this tomorrow.

@ggevay
Copy link
Contributor Author

ggevay commented Sep 17, 2024

Ok, thank you!

Copy link
Member

@antiguru antiguru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I read the change and I think I understand what's happening. I left some comments around code structure, but I think the general pattern is fine. (It shows that we need to think about re-doing reductions, because the code slowly becomes unreadable, but that's not this PR's fault.)

My main questions are around when to apply this optimization: The implementation choses to do this in lowering, but to me it seems very similar to just any other transform, so I'd like to understand why it's not a transform.

// `func`.
for expr in &mut exprs {
expr.permute_map(permutation);
MirRelationExpr::FlatMap {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this not a regular transform? Is there something in here that we cannot express using MIR alone?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At a high level, this is more of a physical optimization, which are sometimes unpleasant to make a part of the MIR pipeline.

The specific problem here with putting this into the MIR pipeline would be that we'd need to modify MIR's semantics: MIR's Reduce currently always emits exactly 1 row per group, but the fused Reduce-FlatMap can emit multiple rows per group. Such semantic changes of MIR are very scary, since various parts of the optimizer assume that Reduce emits only 1 row per group, and it would be very hard to hunt down all these parts. (For example, key inference infers the group key as a unique key.)

(Btw. the MIR pipeline currently has a physical part, where the most important thing is JoinImplementation, but we are planning to move also JoinImplementation to the MIR-to-LIR lowering.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, that makes sense.

src/compute-types/src/plan/reduce.rs Outdated Show resolved Hide resolved
src/compute/src/render/reduce.rs Outdated Show resolved Hide resolved
};
let arranged =
partial.mz_arrange::<RowRowSpine<_, _>>(("Arranged ".to_owned() + name).as_str());
let oks = arranged.mz_reduce_abelian::<_, _, _, RowRowSpine<_, _>>(name, {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's better if we move the !fused_unnest_list outside of the reduce closure and instead have two different mz_reduce_abelian calls. This avoids the runtime check in the closure (or us relying on the optimizer to eliminate the unreachable branch.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, done!

src/compute/src/render/reduce.rs Outdated Show resolved Hide resolved
@@ -119,6 +119,7 @@ optimizer_feature_flags!({
reoptimize_imported_views: bool,
// Enables the value window function fusion optimization.
enable_value_window_function_fusion: bool,
enable_reduce_unnest_list_fusion: bool,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Documentation!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I was thinking to stop adding docs for these optimizer flags, because most of these (except for reoptimize_imported_views) are simply the same thing as the feature flag of the same name.

Now I've added a comment that explicitly points to the feature flag of the same name.

src/repr/src/row.rs Show resolved Hide resolved
Comment on lines 835 to 838
let datum_iter = key.to_datum_iter();
let mut datums_local = datums1.borrow();
datums_local.extend(datum_iter);
let key_len = datums_local.len();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can avoid decoding the key repeatedly. This assumes that evaluating an mfp only appends datums, but never modifies datums, which I think is true at the moment.

diff --git a/src/compute/src/render/reduce.rs b/src/compute/src/render/reduce.rs
index c9ceac4470..c74a5b0e90 100644
--- a/src/compute/src/render/reduce.rs
+++ b/src/compute/src/render/reduce.rs
@@ -781,6 +781,7 @@ where
 
         // Allocations for the two closures.
         let mut datums1 = DatumVec::new();
+        let mut datums_key = DatumVec::new();
         let mut datums2 = DatumVec::new();
         let mfp_after1 = mfp_after.clone();
         let mfp_after2 = mfp_after.filter(|mfp| mfp.could_error());
@@ -826,16 +827,17 @@ where
                         target.push((row, 1));
                     }
                 } else {
+                    let mut datums_local = datums_key.borrow();
+                    datums_local.extend(key.to_datum_iter());
+                    let key_len = datums_local.len();
+
                     for datum in func
                         .eval_with_unnest_list::<_, window_agg_helpers::OneByOneAggrImpls>(
                             iter,
                             &temp_storage,
                         )
                     {
-                        let datum_iter = key.to_datum_iter();
-                        let mut datums_local = datums1.borrow();
-                        datums_local.extend(datum_iter);
-                        let key_len = datums_local.len();
+                        datums_local.truncate(key_len);
                         datums_local.push(datum);
                         if let Some(row) = evaluate_mfp_after(
                             &mfp_after1,

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, nice! Thank you!

src/compute/src/render/reduce.rs Outdated Show resolved Hide resolved
{
name: enable_reduce_unnest_list_fusion,
desc: "Enables fusing `Reduce` with `FlatMap UnnestList` for better window function performance",
default: false,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why isn't this enabled by default? It's fine with me not to enable it, but I'd like to understand the reasoning!

Copy link
Contributor Author

@ggevay ggevay Sep 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Originally, the plan was to

  1. Roll this PR out as a patch release to all customers this week, in which case it seemed less risky to enable it only for GM this week.
  2. Enable it for all customers in the next release window. I was planning to do this in a follow-up PR this Thursday, where I'd set this default to true.

However, then Nikhil decided (completely understandably) that we should roll this PR out in a special patch release to only GM (tomorrow, right after the normal release goes out). So, now I think I'll modify this PR to make it enabled by default.

(Generally, LD can of course override whatever default we set here, but I think it's good if the default value here is consistent with the LD setting of the majority of customers, so that we run the entirety of CI with the common setting.)

Edit: Done, I've changed the default to true.

@ggevay ggevay force-pushed the reduce-flatmap-fusion branch 2 times, most recently from 9039d1e to 46585b2 Compare September 18, 2024 10:56
@ggevay
Copy link
Contributor Author

ggevay commented Sep 18, 2024

Thank you very much for the comments @antiguru! I've addressed all of them. Could you please check?

Copy link
Member

@antiguru antiguru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

// `func`.
for expr in &mut exprs {
expr.permute_map(permutation);
MirRelationExpr::FlatMap {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, that makes sense.

Comment on lines +238 to +241
// Note that we skip validating for negative diffs when we have a fused unnest list,
// because this is already a CPU-intensive situation due to the non-incrementalness
// of window functions.
let validating = !fused_unnest_list;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add an issue to enable validation with fused list unnest at a later time? Thank you

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, done: #29624

@ggevay
Copy link
Contributor Author

ggevay commented Sep 18, 2024

CI caught some minor test issues due to changing the default value of the flag. Fixing them now.

Edit: Fixed.

@ggevay
Copy link
Contributor Author

ggevay commented Sep 18, 2024

Had a call with Petros, where he approved it. Merging (after CI completes).

Copy link
Contributor

@petrosagg petrosagg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Went through the code with Gabor and it LGTM!

src/compute-types/src/plan/lowering.rs Outdated Show resolved Hide resolved
src/compute-types/src/plan/reduce.rs Outdated Show resolved Hide resolved
src/compute/src/render/reduce.rs Outdated Show resolved Hide resolved
- Factors out Reduce lowering into a separate fn.
- Some code motion in FlatMap lowering. Checking the preconditions
  for fusion will be a multi-step process, from which we'll to bail
  out to the standard FlatMap lowering at multiple stages.
In addition to addressing the comments, this:
- adds more tests
- flips the feature flag's default to true, and addresses the test
  fallout.
- adds a soft_assert_or_log for missing a fusion due to the complex
  pattern match failing.
@ggevay ggevay merged commit 35b3016 into MaterializeInc:main Sep 18, 2024
84 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Sep 18, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A-CLUSTER Topics related to the CLUSTER layer A-optimization Area: query optimization and transformation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants