Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CHORE] Refactor Binary Ops #2876

Merged
merged 20 commits into from
Sep 26, 2024
Merged

[CHORE] Refactor Binary Ops #2876

merged 20 commits into from
Sep 26, 2024

Conversation

samster25
Copy link
Member

@samster25 samster25 commented Sep 21, 2024

This PR removes 1.4 million lines of llvm code and reduces the size of daft-core by 44%!

This was accomplished by dropping the SeriesBinaryOps trait which required every Array Type to implement their own binary ops which the default implementation doing a macro expand on both the the rhs type and the output type. This caused a O(Dtype^2) expansion for every Array Type. This was done as a way to let each Array define their own behavior for binary ops but we didn't really leverage that outside of a few temporal types. For example if wanted to implement Timestamp + Duration we could implement it on TimestampArray But since we may also have Duration + Timestamp, we would also have to implement it on DurationArray.

The new approach is much simpler where we dispatch to the target implementation based of left_dtype, right_dtype. the numerics path pretty much stays the same but for temporals theres only a handful of pairs to consider.

I also factored out a bunch of macros into functions, especially ones that would perform the binary ops on PythonArrays

Breakdown of llvm lines:
current main:

  Lines                  Copies               Function name
  -----                  ------               -------------
  3926120                70662                (TOTAL)
   162996 (4.2%,  4.2%)   1208 (1.7%,  1.7%)  <core::slice::iter::Iter<T> as core::iter::traits::iterator::Iterator>::fold
   135172 (3.4%,  7.6%)   1201 (1.7%,  3.4%)  <alloc::vec::Vec<T> as alloc::vec::spec_from_iter_nested::SpecFromIterNested<T,I>>::from_iter
   127022 (3.2%, 10.8%)   1136 (1.6%,  5.0%)  alloc::vec::Vec<T,A>::extend_trusted
    82450 (2.1%, 12.9%)     34 (0.0%,  5.1%)  daft_core::series::array_impl::binary_ops::SeriesBinaryOps::equal
    82450 (2.1%, 15.0%)     34 (0.0%,  5.1%)  daft_core::series::array_impl::binary_ops::SeriesBinaryOps::gt
    82450 (2.1%, 17.1%)     34 (0.0%,  5.2%)  daft_core::series::array_impl::binary_ops::SeriesBinaryOps::gte
    82450 (2.1%, 19.2%)     34 (0.0%,  5.2%)  daft_core::series::array_impl::binary_ops::SeriesBinaryOps::lt
    82450 (2.1%, 21.3%)     34 (0.0%,  5.3%)  daft_core::series::array_impl::binary_ops::SeriesBinaryOps::lte
    82450 (2.1%, 23.4%)     34 (0.0%,  5.3%)  daft_core::series::array_impl::binary_ops::SeriesBinaryOps::not_equal
    79322 (2.0%, 25.5%)     34 (0.0%,  5.4%)  daft_core::series::array_impl::binary_ops::SeriesBinaryOps::mul
    79322 (2.0%, 27.5%)     34 (0.0%,  5.4%)  daft_core::series::array_impl::binary_ops::SeriesBinaryOps::rem
    76880 (2.0%, 29.4%)     31 (0.0%,  5.4%)  daft_core::series::array_impl::binary_ops::SeriesBinaryOps::add
    72323 (1.8%, 31.3%)     31 (0.0%,  5.5%)  daft_core::series::array_impl::binary_ops::SeriesBinaryOps::sub
    67116 (1.7%, 33.0%)     34 (0.0%,  5.5%)  daft_core::series::array_impl::binary_ops::SeriesBinaryOps::and
    67116 (1.7%, 34.7%)     34 (0.0%,  5.6%)  daft_core::series::array_impl::binary_ops::SeriesBinaryOps::or
    67116 (1.7%, 36.4%)     34 (0.0%,  5.6%)  daft_core::series::array_impl::binary_ops::SeriesBinaryOps::xor
    47428 (1.2%, 37.6%)    334 (0.5%,  6.1%)  core::slice::sort::unstable::quicksort::partition_lomuto_branchless_cyclic

after:

  Lines                  Copies               Function name
  -----                  ------               -------------
  2512523                73042                (TOTAL)
   136529 (5.4%,  5.4%)   1208 (1.7%,  1.7%)  <core::slice::iter::Iter<T> as core::iter::traits::iterator::Iterator>::fold
   127090 (5.1%, 10.5%)   1201 (1.6%,  3.3%)  <alloc::vec::Vec<T> as alloc::vec::spec_from_iter_nested::SpecFromIterNested<T,I>>::from_iter
   106918 (4.3%, 14.7%)   1136 (1.6%,  4.9%)  alloc::vec::Vec<T,A>::extend_trusted
    42752 (1.7%, 16.4%)    334 (0.5%,  5.3%)  core::slice::sort::unstable::quicksort::partition_lomuto_branchless_cyclic
    39085 (1.6%, 18.0%)   1371 (1.9%,  7.2%)  core::iter::adapters::map::map_fold::{{closure}}
    37698 (1.5%, 19.5%)    383 (0.5%,  7.7%)  alloc::vec::Vec<T,A>::extend_desugared
    36300 (1.4%, 20.9%)    150 (0.2%,  7.9%)  core::slice::sort::shared::find_existing_run
    36236 (1.4%, 22.4%)    521 (0.7%,  8.6%)  core::iter::traits::iterator::Iterator::try_fold
    36018 (1.4%, 23.8%)    373 (0.5%,  9.1%)  <core::iter::adapters::GenericShunt<I,R> as core::iter::traits::iterator::Iterator>::try_fold::{{closure}}
    34050 (1.4%, 25.2%)    150 (0.2%,  9.3%)  core::slice::sort::shared::smallsort::small_sort_general_with_scratch
    33082 (1.3%, 26.5%)    278 (0.4%,  9.7%)  arrow2::array::utf8::mutable::MutableUtf8Array<O>::try_from_iter
    32417 (1.3%, 27.8%)    193 (0.3%, 10.0%)  daft_core::array::ops::utf8::substr_compute_result::{{closure}}

Copy link

codspeed-hq bot commented Sep 21, 2024

CodSpeed Performance Report

Merging #2876 will not alter performance

Comparing sammy/refactor-bin-ops (31f90cd) with main (0525bb9)

Summary

✅ 17 untouched benchmarks

@samster25 samster25 changed the title [CHORE ]Sammy/refactor bin ops [CHORE] Refactor Binary Ops Sep 26, 2024
@github-actions github-actions bot added the chore label Sep 26, 2024
Copy link

codecov bot commented Sep 26, 2024

Codecov Report

Attention: Patch coverage is 80.00000% with 67 lines in your changes missing coverage. Please review.

Project coverage is 78.39%. Comparing base (a5cb55d) to head (31f90cd).
Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
src/daft-core/src/series/utils/python_fn.rs 77.31% 27 Missing ⚠️
src/daft-core/src/series/ops/arithmetic.rs 83.44% 24 Missing ⚠️
src/daft-core/src/series/ops/logical.rs 70.58% 15 Missing ⚠️
src/daft-core/src/series/ops/comparison.rs 94.73% 1 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #2876      +/-   ##
==========================================
+ Coverage   78.37%   78.39%   +0.02%     
==========================================
  Files         596      597       +1     
  Lines       69688    69687       -1     
==========================================
+ Hits        54616    54634      +18     
+ Misses      15072    15053      -19     
Flag Coverage Δ
78.39% <80.00%> (+0.02%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
src/daft-core/src/array/ops/arithmetic.rs 90.00% <100.00%> (ø)
src/daft-core/src/series/array_impl/data_array.rs 95.95% <ø> (-3.34%) ⬇️
...c/daft-core/src/series/array_impl/logical_array.rs 91.66% <ø> (+3.66%) ⬆️
...rc/daft-core/src/series/array_impl/nested_array.rs 59.79% <ø> (+18.06%) ⬆️
src/daft-core/src/series/mod.rs 98.50% <ø> (ø)
src/daft-core/src/series/ops/between.rs 81.48% <ø> (ø)
src/daft-core/src/series/ops/is_in.rs 95.23% <ø> (ø)
src/daft-core/src/series/ops/mod.rs 100.00% <ø> (+25.00%) ⬆️
src/daft-core/src/series/ops/comparison.rs 95.00% <94.73%> (-5.00%) ⬇️
src/daft-core/src/series/ops/logical.rs 70.58% <70.58%> (ø)
... and 2 more

... and 11 files with indirect coverage changes

@samster25 samster25 marked this pull request as ready for review September 26, 2024 07:25
#[cfg(feature = "python")]
DataType::Python => run_python_binary_operator_fn(lhs, rhs, "add"),
DataType::Utf8 => {
Ok(cast_downcast_op!(lhs, rhs, &DataType::Utf8, Utf8Array, add)?.into_series())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so this was likely already an issue, but won't this cause us to do a bunch of extra work casting and downcasting when datatypes match the output type?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cast logic should check if it already matches the target dtype

Cargo.toml Outdated
@@ -82,6 +82,8 @@ parquet2 = {path = "src/parquet2"}
debug = true

[profile.dev]
debug = "line-tables-only"
opt-level = 1
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we really need opt-level=1 in dev? That seems like it'll greatly slow down the build times

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left that artifact in by mistake, I meant to push up the optimization level change for build macros

Copy link
Contributor

@colin-ho colin-ho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as @universalmind303 regarding the opt-level 1 for debug, otherwise it looks good to me!

src/daft-core/src/series/ops/logical.rs Outdated Show resolved Hide resolved
src/daft-core/src/series/ops/arithmetic.rs Outdated Show resolved Hide resolved
@samster25 samster25 merged commit 46c5a7e into main Sep 26, 2024
40 checks passed
@samster25 samster25 deleted the sammy/refactor-bin-ops branch September 26, 2024 19:21
sagiahrac pushed a commit to sagiahrac/Daft that referenced this pull request Oct 7, 2024
This PR removes 1.4 million lines of llvm code and reduces the size of
`daft-core` by 44%!

This was accomplished by dropping the `SeriesBinaryOps` trait which
required every Array Type to implement their own binary ops which the
default implementation doing a macro expand on both the the rhs type and
the output type. This caused a `O(Dtype^2)` expansion for every Array
Type. This was done as a way to let each Array define their own behavior
for binary ops but we didn't really leverage that outside of a few
temporal types. For example if wanted to implement `Timestamp +
Duration` we could implement it on `TimestampArray ` But since we may
also have `Duration + Timestamp`, we would also have to implement it on
`DurationArray`.

The new approach is much simpler where we dispatch to the target
implementation based of `left_dtype, right_dtype`. the numerics path
pretty much stays the same but for temporals theres only a handful of
pairs to consider.

I also factored out a bunch of macros into functions, especially ones
that would perform the binary ops on PythonArrays

Breakdown of llvm lines:
current main:
```
  Lines                  Copies               Function name
  -----                  ------               -------------
  3926120                70662                (TOTAL)
   162996 (4.2%,  4.2%)   1208 (1.7%,  1.7%)  <core::slice::iter::Iter<T> as core::iter::traits::iterator::Iterator>::fold
   135172 (3.4%,  7.6%)   1201 (1.7%,  3.4%)  <alloc::vec::Vec<T> as alloc::vec::spec_from_iter_nested::SpecFromIterNested<T,I>>::from_iter
   127022 (3.2%, 10.8%)   1136 (1.6%,  5.0%)  alloc::vec::Vec<T,A>::extend_trusted
    82450 (2.1%, 12.9%)     34 (0.0%,  5.1%)  daft_core::series::array_impl::binary_ops::SeriesBinaryOps::equal
    82450 (2.1%, 15.0%)     34 (0.0%,  5.1%)  daft_core::series::array_impl::binary_ops::SeriesBinaryOps::gt
    82450 (2.1%, 17.1%)     34 (0.0%,  5.2%)  daft_core::series::array_impl::binary_ops::SeriesBinaryOps::gte
    82450 (2.1%, 19.2%)     34 (0.0%,  5.2%)  daft_core::series::array_impl::binary_ops::SeriesBinaryOps::lt
    82450 (2.1%, 21.3%)     34 (0.0%,  5.3%)  daft_core::series::array_impl::binary_ops::SeriesBinaryOps::lte
    82450 (2.1%, 23.4%)     34 (0.0%,  5.3%)  daft_core::series::array_impl::binary_ops::SeriesBinaryOps::not_equal
    79322 (2.0%, 25.5%)     34 (0.0%,  5.4%)  daft_core::series::array_impl::binary_ops::SeriesBinaryOps::mul
    79322 (2.0%, 27.5%)     34 (0.0%,  5.4%)  daft_core::series::array_impl::binary_ops::SeriesBinaryOps::rem
    76880 (2.0%, 29.4%)     31 (0.0%,  5.4%)  daft_core::series::array_impl::binary_ops::SeriesBinaryOps::add
    72323 (1.8%, 31.3%)     31 (0.0%,  5.5%)  daft_core::series::array_impl::binary_ops::SeriesBinaryOps::sub
    67116 (1.7%, 33.0%)     34 (0.0%,  5.5%)  daft_core::series::array_impl::binary_ops::SeriesBinaryOps::and
    67116 (1.7%, 34.7%)     34 (0.0%,  5.6%)  daft_core::series::array_impl::binary_ops::SeriesBinaryOps::or
    67116 (1.7%, 36.4%)     34 (0.0%,  5.6%)  daft_core::series::array_impl::binary_ops::SeriesBinaryOps::xor
    47428 (1.2%, 37.6%)    334 (0.5%,  6.1%)  core::slice::sort::unstable::quicksort::partition_lomuto_branchless_cyclic
```

after:
```
  Lines                  Copies               Function name
  -----                  ------               -------------
  2512523                73042                (TOTAL)
   136529 (5.4%,  5.4%)   1208 (1.7%,  1.7%)  <core::slice::iter::Iter<T> as core::iter::traits::iterator::Iterator>::fold
   127090 (5.1%, 10.5%)   1201 (1.6%,  3.3%)  <alloc::vec::Vec<T> as alloc::vec::spec_from_iter_nested::SpecFromIterNested<T,I>>::from_iter
   106918 (4.3%, 14.7%)   1136 (1.6%,  4.9%)  alloc::vec::Vec<T,A>::extend_trusted
    42752 (1.7%, 16.4%)    334 (0.5%,  5.3%)  core::slice::sort::unstable::quicksort::partition_lomuto_branchless_cyclic
    39085 (1.6%, 18.0%)   1371 (1.9%,  7.2%)  core::iter::adapters::map::map_fold::{{closure}}
    37698 (1.5%, 19.5%)    383 (0.5%,  7.7%)  alloc::vec::Vec<T,A>::extend_desugared
    36300 (1.4%, 20.9%)    150 (0.2%,  7.9%)  core::slice::sort::shared::find_existing_run
    36236 (1.4%, 22.4%)    521 (0.7%,  8.6%)  core::iter::traits::iterator::Iterator::try_fold
    36018 (1.4%, 23.8%)    373 (0.5%,  9.1%)  <core::iter::adapters::GenericShunt<I,R> as core::iter::traits::iterator::Iterator>::try_fold::{{closure}}
    34050 (1.4%, 25.2%)    150 (0.2%,  9.3%)  core::slice::sort::shared::smallsort::small_sort_general_with_scratch
    33082 (1.3%, 26.5%)    278 (0.4%,  9.7%)  arrow2::array::utf8::mutable::MutableUtf8Array<O>::try_from_iter
    32417 (1.3%, 27.8%)    193 (0.3%, 10.0%)  daft_core::array::ops::utf8::substr_compute_result::{{closure}}
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants