Maximum repeatedly substituted alias size #475

Merged 2 commits on Jan 30, 2019

@j-esse j-esse commented Jan 29, 2019

https://issues.apache.org/jira/browse/SPARK-26626
apache#23556

What changes were proposed in this pull request?

This adds a spark.sql.maxRepeatedAliasSize config option, which specifies the maximum size of an aliased expression that may be substituted more than once (in CollapseProject and PhysicalOperation). The cap stops large aliased expressions from being substituted repeatedly and exploding the size of the expression tree, which can eventually OOM the driver.
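To make the blow-up concrete, here is a minimal toy model in plain Python. It is not the Spark code from this PR: the classes `Ref` and `Add` and the helpers `size`, `ref_count`, `inline`, `try_collapse` and `collapse_chain` are hypothetical names invented for illustration. The blocking condition (alias referenced more than once and larger than the cap) is an assumption based on the description above, with "size" taken to mean the number of nodes in the expression tree.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Union


@dataclass(frozen=True)
class Ref:
    """Reference to a column or alias by name (hypothetical toy node)."""
    name: str


@dataclass(frozen=True)
class Add:
    """Binary node; stands in for any two-argument operator."""
    left: "Expr"
    right: "Expr"


Expr = Union[Ref, Add]


def size(e: Expr) -> int:
    """Number of nodes in the expression tree."""
    if isinstance(e, Ref):
        return 1
    return 1 + size(e.left) + size(e.right)


def ref_count(e: Expr, name: str) -> int:
    """How many times `name` is referenced inside `e`."""
    if isinstance(e, Ref):
        return 1 if e.name == name else 0
    return ref_count(e.left, name) + ref_count(e.right, name)


def inline(e: Expr, aliases: Dict[str, Expr]) -> Expr:
    """Replace every reference to an alias with that alias's definition."""
    if isinstance(e, Ref):
        return aliases.get(e.name, e)
    return Add(inline(e.left, aliases), inline(e.right, aliases))


def try_collapse(upper: Expr, aliases: Dict[str, Expr],
                 max_size: int) -> Optional[Expr]:
    """Inline `aliases` into `upper`, or return None when an alias that is
    referenced more than once is larger than `max_size` (assumed rule)."""
    for name, definition in aliases.items():
        if ref_count(upper, name) > 1 and size(definition) > max_size:
            return None
    return inline(upper, aliases)


def collapse_chain(layers: int, max_size: int) -> int:
    """Build x1 = x0 + x0, x2 = x1 + x1, ... and collapse layer by layer.
    Returns the size of the largest expression produced along the way."""
    current: Expr = Ref("x0")  # definition of the most recent alias
    largest = size(current)
    for i in range(1, layers + 1):
        prev = f"x{i - 1}"
        upper = Add(Ref(prev), Ref(prev))
        collapsed = try_collapse(upper, {prev: current}, max_size)
        # When blocked, the alias stays in its own projection and the
        # next layer starts again from a small expression.
        current = collapsed if collapsed is not None else upper
        largest = max(largest, size(current))
    return largest
```

In this model, `collapse_chain(10, 10**9)` (effectively no cap) returns 2047, because the tree roughly doubles at every layer, while `collapse_chain(10, 100)` returns 127, because the seventh substitution is refused once the alias definition exceeds 100 nodes.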

The default config value of 100 was chosen through testing to find the best-performing value:

[image: test results]

How was this patch tested?

Added unit tests and did manual testing.

@vinooganesh
vinooganesh commented Jan 29, 2019

@j-esse will approve, but can you add this to FORK.md as well?

@j-esse
Author

j-esse commented Jan 30, 2019

@vinooganesh done!

@bulldozer-bot bulldozer-bot bot merged commit a51fa9c into master Jan 30, 2019
@bulldozer-bot bulldozer-bot bot deleted the feature/cap-alias-substitution-palantir branch January 30, 2019 21:24
@robert3005

For future - please follow same PR title as upstream

@j-esse
Author

j-esse commented Feb 1, 2019

@robert3005 ah sorry!
