
[CHORE] Remove user-facing arguments for casting to Ray's tensor type #2802

Merged: 5 commits into main on Sep 7, 2024

Conversation

@jaychia (Contributor) commented Sep 6, 2024

Summary

Cleanup PR.

  1. Removes cast_tensors_to_ray_tensor_dtype as a user-facing argument on our export methods (e.g. to_arrow, to_pandas). It is really only needed when converting a Daft DataFrame to a Ray Dataset, so there is no reason to expose it to users.
  2. Instead, the logic for casting daft.DataType.tensor data to a Ray Data tensor type now lives inside the Ray Data conversion code (_make_ray_block_from_micropartition). This contains the ickiness of that cast without it touching all of our to_arrow logic (see the sketch after this list).
  3. Also removes _trim_pyarrow_large_arrays, a legacy codepath that is no longer hit.
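
To make point 2 concrete, here is a minimal sketch, not the PR's actual code, of what containing the cast at the Ray Data boundary can look like: the export path stays flag-free and the Ray-specific cast is applied only while building the Ray block. The helper _cast_tensor_columns_to_ray, the explicit tensor_columns parameter, and the assumption that every element converts cleanly to an ndarray are illustrative; only _make_ray_block_from_micropartition, to_arrow, and ArrowTensorArray.from_numpy come from the PR itself.

```python
import numpy as np
import pyarrow as pa
from ray.data.extensions import ArrowTensorArray


def _cast_tensor_columns_to_ray(table: pa.Table, tensor_columns: list[str]) -> pa.Table:
    """Rewrite the named columns as Ray tensor extension arrays.

    tensor_columns lists the columns known to hold daft.DataType.tensor data;
    how Daft identifies them is outside the scope of this sketch.
    """
    for name in tensor_columns:
        i = table.schema.get_field_index(name)
        # Pylist roundtrip, mirroring the ArrowTensorArray.from_numpy call
        # quoted in the review thread below. Assumes each element converts
        # cleanly to a numpy ndarray.
        ndarrays = [np.asarray(v) for v in table.column(i).to_pylist()]
        table = table.set_column(i, name, ArrowTensorArray.from_numpy(ndarrays))
    return table


def _make_ray_block_from_micropartition(partition, tensor_columns: list[str]) -> pa.Table:
    # Export to Arrow with no user-facing cast argument, then apply the
    # Ray-only cast right here, at the Ray Data boundary.
    return _cast_tensor_columns_to_ray(partition.to_arrow(), tensor_columns)
```

The design point is simply that to_arrow, to_pandas, and friends never see a cast_tensors_to_ray_tensor_dtype flag; the cast stays a private detail of the Ray runner's block construction.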

@github-actions bot added the chore label on Sep 6, 2024

codspeed-hq bot commented Sep 6, 2024

CodSpeed Performance Report

Merging #2802 will degrade performance by 13.33%

Comparing jay/arrow-encode-decode (b2a1e6b) with main (e3fbf88)

Summary

⚡ 1 improvement
❌ 1 regression
✅ 14 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

Benchmark                    main      jay/arrow-encode-decode   Change
test_count[1 Small File]     20.5 ms   23.6 ms                   -13.33%
test_show[100 Small Files]   298.8 ms  50.9 ms                   ×5.9

# type since it expects all tensor elements to have the same number of dimensions, which Daft does not enforce.
# TODO(Clark): Convert directly to Ray's variable-shaped tensor extension type when all tensor
# elements have the same number of dimensions, without going through pylist roundtrip.
return ArrowTensorArray.from_numpy(self.to_pylist())
@jaychia (Contributor, Author) commented:

I omitted this logic in this refactor because I have no idea what this is doing. Also there aren't any tests to help me understand so 🤷

@jaychia (Contributor, Author) commented:

Actually, added this back in to pass tests
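
For context on the snippet and TODO above, here is a hedged illustration (not from the PR) of the constraint the comment describes: ArrowTensorArray.from_numpy can take a pylist of ndarrays whose shapes differ, but every element must have the same number of dimensions, which Daft does not enforce. The example arrays are made up, and the exact error type depends on the Ray version.

```python
import numpy as np
from ray.data.extensions import ArrowTensorArray

# Same number of dimensions (2-D) but different shapes: from_numpy accepts
# this and, in recent Ray versions, produces the variable-shaped tensor
# representation the TODO refers to.
same_ndim = [np.zeros((2, 3)), np.zeros((4, 5))]
ok = ArrowTensorArray.from_numpy(same_ndim)

# Mixed number of dimensions (1-D and 2-D) is the case the comment warns
# about: Daft allows it, so the pylist roundtrip can fail here.
mixed_ndim = [np.zeros((3,)), np.zeros((2, 2))]
try:
    ArrowTensorArray.from_numpy(mixed_ndim)
except Exception as exc:  # exact exception type varies by Ray version
    print(f"mixed-ndim tensors rejected: {exc}")
```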


codecov bot commented Sep 6, 2024

Codecov Report

Attention: Patch coverage is 96.07843% with 2 lines in your changes missing coverage. Please review.

Project coverage is 63.11%. Comparing base (6fe408c) to head (b2a1e6b).
Report is 5 commits behind head on main.

Files with missing lines     Patch %   Lines
daft/runners/ray_runner.py   91.30%    2 Missing ⚠️
Additional details and impacted files


@@           Coverage Diff            @@
##             main    #2802    +/-   ##
========================================
  Coverage   63.11%   63.11%            
========================================
  Files        1008     1007     -1     
  Lines      114269   114135   -134     
========================================
- Hits        72117    72038    -79     
+ Misses      42152    42097    -55     
Files with missing lines               Coverage Δ
daft/dataframe/dataframe.py            86.05% <100.00%> (+0.04%) ⬆️
daft/datatype.py                       91.10% <100.00%> (ø)
daft/runners/partitioning.py           81.33% <100.00%> (ø)
daft/series.py                         89.50% <100.00%> (-0.03%) ⬇️
daft/table/micropartition.py           91.07% <100.00%> (ø)
daft/table/table.py                    60.56% <100.00%> (+1.36%) ⬆️
src/daft-core/src/python/datatype.rs   81.29% <100.00%> (-0.62%) ⬇️
daft/runners/ray_runner.py             88.03% <91.30%> (+0.12%) ⬆️

... and 22 files with indirect coverage changes

@jaychia merged commit 3c2af5a into main on Sep 7, 2024
38 of 39 checks passed
@jaychia deleted the jay/arrow-encode-decode branch on September 7, 2024 at 23:04