Revert "Revert "[Datasets] [Tensor Story - 1/2] Automatically provide tensor views to UDFs and infer tensor blocks for pure-tensor datasets."" #25031

clarkzinzow · 2022-05-20T16:02:57Z

Fixes the check ingest utility to handle non-Pandas native batches.

Checks

- I've run scripts/format.sh to lint the changes in this PR.
- I've included any doc changes needed for https://docs.ray.io/en/master/.
- I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
  - Unit tests
  - Release tests
  - This PR is not tested :(

jjyao · 2022-05-20T16:19:37Z

python/ray/ml/utils/check_ingest.py

+                    elif isinstance(batch, np.ndarray):
+                        num_bytes += batch.nbytes
+                    else:
+                        # NOTE: This isn't recursive and will just return the size of


It seems this is the recommended recursive approach: https://code.activestate.com/recipes/577504/ but I think we can leave it as a TODO.

Yeah I was going to open up a separate issue for this, since we currently calculate the byte size of simple blocks with this top-level sys.getsizeof() but we might want to use a recursive recipe both there and here.

In the future we can use BlockAccessor.for_block(b).size_bytes()

@ericl For sure, but it's a batch, not a block! So:

    block = BlockAccessor.batch_to_block(batch)
    num_bytes += BlockAccessor.for_block(block).nbytes
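For reference, the recursive sizing discussed in this thread could look roughly like the sketch below. It is only an illustration in the spirit of the linked recipe, not the recipe itself and not code from this PR; `deep_getsizeof` is a hypothetical helper name, and it only descends into built-in containers, so objects such as NumPy arrays or DataFrames would still need their own `nbytes`-style handling.

```python
import sys
from collections.abc import Mapping


def deep_getsizeof(obj, _seen=None):
    """Approximate the size in bytes of obj plus the objects it contains.

    Hypothetical helper for illustration; not part of Ray.
    """
    if _seen is None:
        _seen = set()
    # Count each object once so shared references and cycles are not
    # double-counted (and so cycles do not recurse forever).
    if id(obj) in _seen:
        return 0
    _seen.add(id(obj))

    # Shallow size: this is all that a top-level sys.getsizeof() reports.
    size = sys.getsizeof(obj)
    if isinstance(obj, (str, bytes, bytearray)):
        # The payload of these types is already included in the shallow size.
        return size
    if isinstance(obj, Mapping):
        for key, value in obj.items():
            size += deep_getsizeof(key, _seen) + deep_getsizeof(value, _seen)
    elif isinstance(obj, (list, tuple, set, frozenset)):
        for item in obj:
            size += deep_getsizeof(item, _seen)
    return size
```

For a list of two 1000-byte `bytes` objects, the top-level `sys.getsizeof()` reports only the list's own pointer storage, while this sketch also adds the two payloads.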

clarkzinzow · 2022-05-20T18:16:26Z

ML tests now pass, and the failing Datasets test is the flaky test that's reverted in master, so this looks good to merge. We can wait until the macOS CI jobs complete since this isn't time-sensitive.

… provide tensor views to UDFs and infer tensor blocks for pure-tensor datasets."" (#25031)" (#25057)

Reverts #25031. It still looks to be somewhat flaky.

…atically provide tensor views to UDFs and infer tensor blocks for pure-tensor datasets."" (ray-project#25031)" (ray-project#25057)" This reverts commit fb2933a.

…ovide tensor views to UDFs and infer tensor blocks for pure-tensor datasets. (#25031)" (#25531)

Unreverts #24812, skipping the memory releasing tests that are already flaky. We have a separate issue tracking the unskipping of these memory releasing tests, once we find a more reliable way to test them.

* Revert "Revert "Revert "Revert "[Datasets] [Tensor Story - 1/2] Automatically provide tensor views to UDFs and infer tensor blocks for pure-tensor datasets."" (#25031)" (#25057)" This reverts commit fb2933a.
* Skip shuffle memory release test.

clarkzinzow added 2 commits May 20, 2022 15:49

Revert "Revert "[Datasets] [Tensor Story - 1/2] Automatically provide…

5a4da75

… tensor views to UDFs and infer tensor blocks for pure-tensor datasets. (ray-project#24812)" (ray-project#25017)" This reverts commit fbfb134.

Fix check ingest.

2b33528

clarkzinzow requested review from ericl, scv119, jjyao and maxpumperla as code owners May 20, 2022 16:02

clarkzinzow assigned ericl, jjyao and krfricke May 20, 2022

jjyao approved these changes May 20, 2022

clarkzinzow added the tests-ok The tagger certifies test failures are unrelated and assumes personal liability. label May 20, 2022

ericl merged commit 9ea5a8e into ray-project:master May 20, 2022

mwtian added a commit that referenced this pull request May 21, 2022

Revert "Revert "Revert "[Datasets] [Tensor Story - 1/2] Automatically…

aecce9c

… provide tensor views to UDFs and infer tensor blocks for pure-tensor datasets."" (#25031)" This reverts commit 9ea5a8e.

mwtian mentioned this pull request May 21, 2022

Revert "Revert "Revert "[Datasets] [Tensor Story - 1/2] Automatically provide tensor views to UDFs and infer tensor blocks for pure-tensor datasets.""" #25057

Merged
