Add projection to HashJoinExec to avoid unnecessary output creation #6768

Closed
Dandandan opened this issue Jun 26, 2023 · 4 comments
Labels
enhancement New feature or request performance Make DataFusion faster

Comments

@Dandandan
Contributor

Dandandan commented Jun 26, 2023

Is your feature request related to a problem or challenge?

Currently, HashJoinExec returns all the columns from both sides of the join.

We can restrict the output to only the columns that are actually needed, which saves work when gathering values (take) from batches based on the matching indices.

This is similar to #5436

Describe the solution you'd like

Add a projection to HashJoinExec and use it to reduce the output columns.
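
To illustrate where the savings would come from, here is a minimal sketch in plain Rust. It is not DataFusion or arrow-rs code: `Column`, `take`, and `join_output` are hypothetical names, and columns are modeled as `Vec<i64>` rather than Arrow arrays. It assumes the projection is a list of indices into the concatenated schema (left columns first, then right), which is an assumption made for this example only.

```rust
// Hypothetical, simplified model: a batch is a slice of columns, each a Vec<i64>.
// None of these names are taken from DataFusion or arrow-rs.
type Column = Vec<i64>;

/// Gather the rows at `indices` from one column
/// (a stand-in for a `take`-style gather kernel).
fn take(column: &Column, indices: &[usize]) -> Column {
    indices.iter().map(|&i| column[i]).collect()
}

/// Build the join output from matched row indices.
///
/// `projection` (assumed layout) indexes into the concatenated schema:
/// left columns first, then right columns. With `None`, every column
/// from both sides is gathered, which mirrors the behavior the issue describes.
fn join_output(
    left: &[Column],
    right: &[Column],
    left_indices: &[usize],
    right_indices: &[usize],
    projection: Option<&[usize]>,
) -> Vec<Column> {
    let all: Vec<usize> = (0..left.len() + right.len()).collect();
    let wanted: &[usize] = projection.unwrap_or(all.as_slice());

    wanted
        .iter()
        .map(|&c| {
            if c < left.len() {
                // Only projected left columns are gathered.
                take(&left[c], left_indices)
            } else {
                // Only projected right columns are gathered.
                take(&right[c - left.len()], right_indices)
            }
        })
        .collect()
}
```

With two columns on each side and a projection of `[1, 3]`, this sketch performs two gathers instead of four; the skipped columns are never materialized.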

Describe alternatives you've considered

No response

Additional context

No response

@Dandandan Dandandan added enhancement New feature or request performance Make DataFusion faster labels Jun 26, 2023
@Dandandan Dandandan changed the title Add projection to HashJoinExec to avoid unecessary output creation Add projection to HashJoinExec to avoid unecessary output creation Jun 26, 2023
@my-vegetable-has-exploded
Contributor

I'd like to have a try.

@eejbyfeldt
Contributor

Is there anything left to be done in this issue after #9236? My understanding is that it completes all the work currently described in the ticket.

@Dandandan
Contributor Author

It's done

@Dandandan
Contributor Author

Thanks @eejbyfeldt for the observation
