[CHORE] Refactor Binary Ops #2876
Conversation
CodSpeed Performance Report: merging #2876 will not alter performance.
Codecov Report. Attention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #2876 +/- ##
==========================================
+ Coverage 78.37% 78.39% +0.02%
==========================================
Files 596 597 +1
Lines 69688 69687 -1
==========================================
+ Hits 54616 54634 +18
+ Misses 15072 15053 -19
Flags with carried forward coverage won't be shown.
```rust
#[cfg(feature = "python")]
DataType::Python => run_python_binary_operator_fn(lhs, rhs, "add"),
DataType::Utf8 => {
    Ok(cast_downcast_op!(lhs, rhs, &DataType::Utf8, Utf8Array, add)?.into_series())
```
so this was likely already an issue, but won't this cause us to do a bunch of extra work casting and downcasting when datatypes match the output type?
The cast logic should check if it already matches the target dtype.
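To illustrate the point being discussed, here is a minimal sketch of a cast helper that short-circuits when the input already has the target dtype, so same-typed operands skip the cast/downcast round-trip. All names (`DataType`, `Series`, `cast`) are simplified stand-ins, not Daft's actual API.

```rust
// Hypothetical sketch: a cast that is a no-op when the dtype already
// matches the target, so matching inputs avoid paying for a conversion.

#[derive(Clone, Debug, PartialEq)]
enum DataType {
    Int64,
    Utf8,
}

#[derive(Clone, Debug, PartialEq)]
struct Series {
    dtype: DataType,
    // payload elided for the sketch
}

impl Series {
    // Returns self untouched when the dtype already matches; only
    // performs a conversion when one is actually required.
    fn cast(self, target: &DataType) -> Series {
        if &self.dtype == target {
            return self; // no-op: nothing to convert
        }
        // ... a real conversion of the payload would happen here ...
        Series { dtype: target.clone() }
    }
}

fn main() {
    let s = Series { dtype: DataType::Utf8 };
    // Same dtype: no conversion work is done.
    let same = s.clone().cast(&DataType::Utf8);
    assert_eq!(same.dtype, DataType::Utf8);
    // Different dtype: conversion path runs.
    let converted = s.cast(&DataType::Int64);
    assert_eq!(converted.dtype, DataType::Int64);
}
```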
Cargo.toml (outdated)

```toml
@@ -82,6 +82,8 @@ parquet2 = {path = "src/parquet2"}
debug = true

[profile.dev]
debug = "line-tables-only"
opt-level = 1
```
do we really need opt-level=1 in dev? That seems like it'll greatly slow down the build times
Left that artifact in by mistake; I meant to push up the optimization level change for build macros.
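For what the author describes (optimizing build macros without slowing down regular dev builds), Cargo supports a narrower override via `[profile.dev.build-override]`, which applies only to build scripts and proc macros. A sketch of what that might look like; the exact `opt-level` value here is an assumption:

```toml
[profile.dev]
debug = "line-tables-only"

[profile.dev.build-override]
# Optimize build scripts and proc macros only; the workspace's own
# crates keep the default dev opt-level = 0, so incremental build
# times for regular code are unaffected.
opt-level = 3
```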
Same comment as @universalmind303 regarding the opt-level 1 for debug, otherwise it looks good to me!
This PR removes 1.4 million lines of LLVM code and reduces the size of `daft-core` by 44%!

This was accomplished by dropping the `SeriesBinaryOps` trait, which required every array type to implement its own binary ops, with the default implementation doing a macro expansion on both the rhs type and the output type. This caused an `O(Dtype^2)` expansion for every array type. It was done as a way to let each array define its own behavior for binary ops, but we didn't really leverage that outside of a few temporal types. For example, to implement `Timestamp + Duration` we could implement it on `TimestampArray`, but since we may also have `Duration + Timestamp`, we would also have to implement it on `DurationArray`.

The new approach is much simpler: we dispatch to the target implementation based on `(left_dtype, right_dtype)`. The numerics path pretty much stays the same, but for temporals there's only a handful of pairs to consider. I also factored out a bunch of macros into functions, especially ones that would perform the binary ops on PythonArrays.

Breakdown of llvm lines:

current main:

```
 Lines                Copies             Function name
 -----                ------             -------------
 3926120              70662              (TOTAL)
 162996 (4.2%, 4.2%)  1208 (1.7%, 1.7%)  <core::slice::iter::Iter<T> as core::iter::traits::iterator::Iterator>::fold
 135172 (3.4%, 7.6%)  1201 (1.7%, 3.4%)  <alloc::vec::Vec<T> as alloc::vec::spec_from_iter_nested::SpecFromIterNested<T,I>>::from_iter
 127022 (3.2%, 10.8%) 1136 (1.6%, 5.0%)  alloc::vec::Vec<T,A>::extend_trusted
 82450 (2.1%, 12.9%)  34 (0.0%, 5.1%)    daft_core::series::array_impl::binary_ops::SeriesBinaryOps::equal
 82450 (2.1%, 15.0%)  34 (0.0%, 5.1%)    daft_core::series::array_impl::binary_ops::SeriesBinaryOps::gt
 82450 (2.1%, 17.1%)  34 (0.0%, 5.2%)    daft_core::series::array_impl::binary_ops::SeriesBinaryOps::gte
 82450 (2.1%, 19.2%)  34 (0.0%, 5.2%)    daft_core::series::array_impl::binary_ops::SeriesBinaryOps::lt
 82450 (2.1%, 21.3%)  34 (0.0%, 5.3%)    daft_core::series::array_impl::binary_ops::SeriesBinaryOps::lte
 82450 (2.1%, 23.4%)  34 (0.0%, 5.3%)    daft_core::series::array_impl::binary_ops::SeriesBinaryOps::not_equal
 79322 (2.0%, 25.5%)  34 (0.0%, 5.4%)    daft_core::series::array_impl::binary_ops::SeriesBinaryOps::mul
 79322 (2.0%, 27.5%)  34 (0.0%, 5.4%)    daft_core::series::array_impl::binary_ops::SeriesBinaryOps::rem
 76880 (2.0%, 29.4%)  31 (0.0%, 5.4%)    daft_core::series::array_impl::binary_ops::SeriesBinaryOps::add
 72323 (1.8%, 31.3%)  31 (0.0%, 5.5%)    daft_core::series::array_impl::binary_ops::SeriesBinaryOps::sub
 67116 (1.7%, 33.0%)  34 (0.0%, 5.5%)    daft_core::series::array_impl::binary_ops::SeriesBinaryOps::and
 67116 (1.7%, 34.7%)  34 (0.0%, 5.6%)    daft_core::series::array_impl::binary_ops::SeriesBinaryOps::or
 67116 (1.7%, 36.4%)  34 (0.0%, 5.6%)    daft_core::series::array_impl::binary_ops::SeriesBinaryOps::xor
 47428 (1.2%, 37.6%)  334 (0.5%, 6.1%)   core::slice::sort::unstable::quicksort::partition_lomuto_branchless_cyclic
```

after:

```
 Lines                Copies             Function name
 -----                ------             -------------
 2512523              73042              (TOTAL)
 136529 (5.4%, 5.4%)  1208 (1.7%, 1.7%)  <core::slice::iter::Iter<T> as core::iter::traits::iterator::Iterator>::fold
 127090 (5.1%, 10.5%) 1201 (1.6%, 3.3%)  <alloc::vec::Vec<T> as alloc::vec::spec_from_iter_nested::SpecFromIterNested<T,I>>::from_iter
 106918 (4.3%, 14.7%) 1136 (1.6%, 4.9%)  alloc::vec::Vec<T,A>::extend_trusted
 42752 (1.7%, 16.4%)  334 (0.5%, 5.3%)   core::slice::sort::unstable::quicksort::partition_lomuto_branchless_cyclic
 39085 (1.6%, 18.0%)  1371 (1.9%, 7.2%)  core::iter::adapters::map::map_fold::{{closure}}
 37698 (1.5%, 19.5%)  383 (0.5%, 7.7%)   alloc::vec::Vec<T,A>::extend_desugared
 36300 (1.4%, 20.9%)  150 (0.2%, 7.9%)   core::slice::sort::shared::find_existing_run
 36236 (1.4%, 22.4%)  521 (0.7%, 8.6%)   core::iter::traits::iterator::Iterator::try_fold
 36018 (1.4%, 23.8%)  373 (0.5%, 9.1%)   <core::iter::adapters::GenericShunt<I,R> as core::iter::traits::iterator::Iterator>::try_fold::{{closure}}
 34050 (1.4%, 25.2%)  150 (0.2%, 9.3%)   core::slice::sort::shared::smallsort::small_sort_general_with_scratch
 33082 (1.3%, 26.5%)  278 (0.4%, 9.7%)   arrow2::array::utf8::mutable::MutableUtf8Array<O>::try_from_iter
 32417 (1.3%, 27.8%)  193 (0.3%, 10.0%)  daft_core::array::ops::utf8::substr_compute_result::{{closure}}
```
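The pair-based dispatch described above can be sketched as a single `match` on the `(left_dtype, right_dtype)` tuple. This is an illustrative reconstruction, not Daft's actual code; `DataType`, `add_output_dtype`, and the listed variants are simplified assumptions. The key property is that each supported pair appears once, so `Timestamp + Duration` and `Duration + Timestamp` share one arm instead of requiring trait impls on two array types.

```rust
// Minimal sketch of dispatching a binary op on the dtype pair rather
// than implementing the op on every array type.

#[derive(Clone, Copy, Debug, PartialEq)]
enum DataType {
    Int64,
    Timestamp,
    Duration,
}

// Resolve the output dtype of `lhs + rhs`, or None if unsupported.
fn add_output_dtype(lhs: DataType, rhs: DataType) -> Option<DataType> {
    use DataType::*;
    match (lhs, rhs) {
        // Numerics path: unchanged from before.
        (Int64, Int64) => Some(Int64),
        // Temporals: only a handful of pairs, all enumerated in one place.
        // Both orderings are handled by a single arm.
        (Timestamp, Duration) | (Duration, Timestamp) => Some(Timestamp),
        (Duration, Duration) => Some(Duration),
        // Any other pair is an unsupported combination.
        _ => None,
    }
}

fn main() {
    assert_eq!(
        add_output_dtype(DataType::Timestamp, DataType::Duration),
        Some(DataType::Timestamp)
    );
    // Commutative case needs no second implementation.
    assert_eq!(
        add_output_dtype(DataType::Duration, DataType::Timestamp),
        Some(DataType::Timestamp)
    );
    assert_eq!(add_output_dtype(DataType::Int64, DataType::Timestamp), None);
}
```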