Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[PERF] Move with_column and exclude function logic to Rust side, add with_columns #2167

Merged
merged 3 commits into from
Apr 24, 2024

Conversation

kevinzwang
Copy link
Member

@kevinzwang kevinzwang commented Apr 22, 2024

This PR includes two things:

  • a performance improvement to calling (not execution of) DataFrame.with_column and DataFrame.exclude where projection conversion is done on the Rust side so that costly conversions of the dataframe fields into Python objects are no longer necessary
  • the addition of the DataFrame.with_columns method which allows adding multiple columns to a dataframe at once. Will also improve performance over calling with_column multiple times. Since I was modifying the with_column function I figured I would add this as a drive-by

I also took this opportunity to clean up some of the rust builder code.

…_columns

also clean up some of the rust builder code
@github-actions github-actions bot added documentation Improvements or additions to documentation performance labels Apr 22, 2024
@kevinzwang kevinzwang requested a review from jaychia April 22, 2024 08:36
@kevinzwang kevinzwang marked this pull request as draft April 22, 2024 08:45
Copy link

codecov bot commented Apr 22, 2024

Codecov Report

Attention: Patch coverage is 57.69231% with 11 lines in your changes are missing coverage. Please review.

❗ No coverage uploaded for pull request base (main@e8fd2de). Click here to learn what that means.
Report is 6 commits behind head on main.

❗ Current head 7545832 differs from pull request most recent head f3e5e32. Consider uploading reports for the commit f3e5e32 to get more accurate results

Additional details and impacted files

Impacted file tree graph

@@           Coverage Diff           @@
##             main    #2167   +/-   ##
=======================================
  Coverage        ?   84.97%           
=======================================
  Files           ?       68           
  Lines           ?     7374           
  Branches        ?        0           
=======================================
  Hits            ?     6266           
  Misses          ?     1108           
  Partials        ?        0           
Files Coverage Δ
daft/logical/builder.py 90.40% <90.90%> (ø)
daft/dataframe/dataframe.py 87.82% <33.33%> (ø)

@kevinzwang kevinzwang marked this pull request as ready for review April 23, 2024 05:44
@kevinzwang kevinzwang added the enhancement New feature or request label Apr 23, 2024
Copy link
Contributor

@jaychia jaychia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems fine to me, were we able to run any quick sanity benchmarking to make sure that this fixes the slowdowns?

daft/dataframe/dataframe.py Outdated Show resolved Hide resolved
src/daft-plan/src/builder.rs Outdated Show resolved Hide resolved
src/daft-plan/src/builder.rs Outdated Show resolved Hide resolved
src/daft-plan/src/builder.rs Show resolved Hide resolved
src/daft-plan/src/builder.rs Show resolved Hide resolved
@kevinzwang
Copy link
Member Author

Quick performance results:

Péter's script in the community Slack went from an average of 49.02s to 5.84s per run on my machine, averaged over 5 runs.

@kevinzwang kevinzwang enabled auto-merge (squash) April 24, 2024 23:07
@kevinzwang kevinzwang merged commit 3582aa7 into main Apr 24, 2024
29 checks passed
@kevinzwang kevinzwang deleted the kevin/rust-builder-projections branch April 24, 2024 23:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation enhancement New feature or request performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants