Feat(optimizer): optimize pivots #1617

Merged

georgesittas merged 12 commits into main from jo/pivot_optimization

May 16, 2023

Collaborator

georgesittas commented May 14, 2023 (edited)

Fixes #1449

This PR introduces optimizer logic for handling the PIVOT operator, i.e. exploding and qualifying the columns of the table it produces where necessary. It's a first draft towards the general solution, which is more difficult because multiple JOINs, PIVOTs and UNPIVOTs may be chained together in non-trivial ways.
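To make "the columns of the table it produces" concrete, here is a minimal plain-Python sketch of how a PIVOT's output columns can be derived. It does not use sqlglot's API and is not the code from this PR; the function name `pivot_output_columns` and the `<value>_<alias>` naming rule are assumptions for illustration only, since real dialects name pivot columns differently.

```python
from typing import List, Sequence


def pivot_output_columns(
    source_columns: Sequence[str],
    aggregated_columns: Sequence[str],
    pivot_column: str,
    pivot_values: Sequence[str],
    agg_aliases: Sequence[str] = (),
) -> List[str]:
    """Hypothetical helper: list the columns a PIVOT would expose.

    Columns consumed by the aggregation(s) or by the FOR clause disappear;
    every other source column passes through (and acts as an implicit
    grouping key). One new column is generated per pivot value and
    aggregation.
    """
    consumed = set(aggregated_columns) | {pivot_column}
    passthrough = [c for c in source_columns if c not in consumed]

    if agg_aliases:
        # Assumed naming rule "<value>_<alias>"; this varies by dialect.
        generated = [f"{v}_{a}" for v in pivot_values for a in agg_aliases]
    else:
        generated = list(pivot_values)

    return passthrough + generated


# SELECT * FROM sales PIVOT(SUM(amount) FOR quarter IN ('Q1', 'Q2'))
columns = pivot_output_columns(
    source_columns=["year", "quarter", "amount"],
    aggregated_columns=["amount"],
    pivot_column="quarter",
    pivot_values=["Q1", "Q2"],
)
print(columns)
```

A `SELECT *` over the pivoted source would then be expanded to these names, each qualified with the pivot's alias when one is present.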

We could also try to transform PIVOT operators using exp.Case expressions. This seems a bit less straightforward to me if we want to get it right in all cases. For example:

* What do we GROUP BY if there are additional columns besides what's referenced in the exp.Pivot expression?
* Can we always map the aggregations trivially into projections? How would we handle stuff like COUNT(*)?
* How does this transformation behave when there are JOINs and / or other (UN)PIVOT operators applied?

The major advantage of this approach, though, would be that we'd get back a canonical query without PIVOTs.
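For the simplest case only, the CASE-based alternative could look roughly like the sketch below. This is not what the PR implements, and it builds SQL strings in plain Python rather than sqlglot expressions; the function `pivot_to_case_sql` and its parameters are hypothetical. It assumes a single aggregation over a single table, pivot values that are valid identifiers, grouping by every column not referenced in the pivot, and it treats `COUNT(*)` by counting a constant for matching rows. JOINs and chained (UN)PIVOTs are not addressed.

```python
from typing import Sequence


def pivot_to_case_sql(
    table: str,
    source_columns: Sequence[str],
    agg_func: str,
    agg_column: str,
    pivot_column: str,
    pivot_values: Sequence[str],
) -> str:
    """Hypothetical helper: rewrite a simple PIVOT as CASE-based aggregation."""
    # Every column not referenced in the pivot becomes a grouping key.
    group_cols = [c for c in source_columns if c not in {agg_column, pivot_column}]

    projections = list(group_cols)
    for value in pivot_values:
        literal = "'" + value.replace("'", "''") + "'"
        # COUNT(*) has no column to wrap, so count a constant for matching rows.
        then = "1" if agg_column == "*" else agg_column
        projections.append(
            f"{agg_func}(CASE WHEN {pivot_column} = {literal} THEN {then} END) AS {value}"
        )

    sql = f"SELECT {', '.join(projections)} FROM {table}"
    if group_cols:
        sql += f" GROUP BY {', '.join(group_cols)}"
    return sql


# Equivalent of: SELECT * FROM sales PIVOT(SUM(amount) FOR quarter IN ('Q1', 'Q2'))
rewritten = pivot_to_case_sql(
    "sales", ["year", "quarter", "amount"], "SUM", "amount", "quarter", ["Q1", "Q2"]
)
print(rewritten)
```

Even in this small form, the grouping keys have to be inferred from the source's full column list, which is why the questions above become harder once the source is not a single known table.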

References:

georgesittas requested review from tobymao and barakalon

May 14, 2023 10:03

georgesittas force-pushed the jo/pivot_optimization branch from 5b3c3cd to c279777

May 14, 2023 10:12

georgesittas commented on sqlglot/dialects/dialect.py (resolved)

georgesittas commented on sqlglot/optimizer/eliminate_subqueries.py (resolved)

georgesittas commented on sqlglot/optimizer/merge_subqueries.py (resolved)

georgesittas commented on sqlglot/optimizer/pushdown_projections.py (resolved)

georgesittas commented on sqlglot/optimizer/qualify_columns.py (resolved)

georgesittas commented on sqlglot/optimizer/qualify_columns.py (outdated, resolved)

georgesittas commented on sqlglot/optimizer/qualify_columns.py (outdated, resolved)

Collaborator Author

georgesittas commented May 14, 2023 (edited)

I left some comments on the PR for clarity (they're now marked as resolved); let me know if anything is unclear. I'm interested to hear alternatives and would love to simplify this somehow.

georgesittas commented on sqlglot/optimizer/qualify_columns.py (resolved)

georgesittas commented on sqlglot/optimizer/scope.py (resolved)

georgesittas commented on sqlglot/optimizer/qualify_tables.py (resolved)

tobymao reviewed sqlglot/optimizer/qualify_columns.py (outdated, resolved)

tobymao reviewed sqlglot/optimizer/qualify_columns.py (outdated, resolved)

georgesittas added 8 commits

May 15, 2023 20:01

81b92b1 Feat(optimizer): optimize pivots

82811a5 Fixup

6551eab Simplify

5bd1398 Cleanup

cea3a89 Fix pivot sql generation

cc231cc Fixed snowflake pivot column names, add another optimizer test

5232b85 Fixed issue with pivoted cte source, added bigquery test

f8baa73 Factor out some computations

georgesittas force-pushed the jo/pivot_optimization branch from 6703b06 to f8baa73

May 15, 2023 17:01

Collaborator Author

georgesittas commented May 15, 2023 (edited)

TODO:

Add transformation to remove pivot alias for Spark

tobymao reviewed sqlglot/optimizer/qualify_columns.py (outdated, resolved)

tobymao reviewed sqlglot/optimizer/qualify_columns.py (outdated, resolved)

tobymao reviewed sqlglot/optimizer/scope.py (outdated, resolved)

dcdadb4 Cleanup

tobymao approved these changes

georgesittas added 2 commits

May 16, 2023 01:10

54b0b0b Add transform to unalias pivot in spark, more tests

930ed17 Typo

Collaborator Author

georgesittas commented May 15, 2023

@tobymao I made a few more changes; let me know once you've taken a look:

* Moved the unqualify_pivot_columns transform to Spark, since it's the only dialect using it.
* Created an _unalias_pivot transform just for Spark that removes table aliases from pivots.
* Improved handling of the alias argument in subquery.
* Added more Spark tests and a new optimizer test that demonstrates the above transformations.


          Comment fixup

cbe8c5d

georgesittas merged commit 4b1aa02 into main

georgesittas deleted the jo/pivot_optimization branch

May 16, 2023 13:08

adrianisk pushed a commit to adrianisk/sqlglot that referenced this pull request


00f59b6 Feat(optimizer): optimize pivots (tobymao#1617)

* Feat(optimizer): optimize pivots

* Fixup

* Simplify

* Cleanup

* Fix pivot sql generation

* Fixed snowflake pivot column names, add another optimizer test

* Fixed issue with pivoted cte source, added bigquery test

* Factor out some computations

* Cleanup

* Add transform to unalias pivot in spark, more tests

* Typo

* Comment fixup
