Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

transform: Turn a Join with a constant input into a FlatMap or Map #29683

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

ggevay
Copy link
Contributor

@ggevay ggevay commented Sep 20, 2024

This finally implements the transform that turns Joins that have constant inputs into a FlatMap or Map. This would have been useful many times in the past few years, see links in the issue (#20637). (Most recently, there were some confusions at https://github.com/MaterializeInc/accounts/issues/3 around the manual broadcast join pattern.) Importantly, this turns a non-scalable computation (cross-join) into a scalable one (FlatMap or Map).

Notably, I've put this near the beginning of the MIR optimization pipeline, before join fusion, explained in the doc comment of join_to_flat_map.rs.

It's guarded by a feature flag, but I've enabled it by default. I think there is a relatively low risk of significant regressions.

cc @sthm

Motivation

Tips for reviewer

The first commit just adds the feature flag, which is a bit noisy, so I recommend reviewing this commit separately.

Checklist

@ggevay ggevay added A-optimization Area: query optimization and transformation A-CLUSTER Topics related to the CLUSTER layer labels Sep 20, 2024
.unwrap()
.filter(unpack_equivalences(equivalences));
return Ok(false);
}
Copy link
Contributor Author

@ggevay ggevay Sep 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic was factored out into the eliminate_trivial_join call above.

@ggevay ggevay marked this pull request as ready for review September 20, 2024 18:15
@ggevay ggevay requested review from a team as code owners September 20, 2024 18:15
@ggevay
Copy link
Contributor Author

ggevay commented Sep 20, 2024

Bringing back to draft, because Nightly is showing a possible bug.

Edit: No, see below.

@ggevay ggevay marked this pull request as draft September 20, 2024 19:23
@ggevay
Copy link
Contributor Author

ggevay commented Sep 21, 2024

Nightly https://buildkite.com/materialize/nightly/builds/9664:

  • Product limits 1: Seems unrelated, and also broken on main.
  • Full Testdrive in Cloudtest (K8s) 2: Unrelated. (Related to sinks, and I checked that the relevant query's plan didn't change with this PR.)
  • OptbenchTPCHQ14: The "8.0 TIMES more" messages looks unrelated. We have slts checking TPC-H query plans, and none of them changed in this PR. Note that the absolute number of messages in this test is quite low (32 vs. 4).
  • Output consistency (Postgres): This looked suspicious at first glance, but it's unrelated. The query plan and query results didn't change in this PR, so the same inconsistency with Postgres exists also before this PR. @nrainer-materialize, I briefly checked [Evergreen] Inconsistencies with Postgres (collection) #21981, but couldn't find a corresponding issue. Could you please check if it's indeed new?

Ready for review!

@ggevay ggevay marked this pull request as ready for review September 21, 2024 08:58
@ggevay
Copy link
Contributor Author

ggevay commented Sep 21, 2024

@frankmcsherry, may I ask you to review this one? I think you mentioned this optimization several times in the past.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-CLUSTER Topics related to the CLUSTER layer A-optimization Area: query optimization and transformation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant