-
Notifications
You must be signed in to change notification settings - Fork 5.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Revert "Revert "[Datasets] [Tensor Story - 1/2] Automatically provide tensor views to UDFs and infer tensor blocks for pure-tensor datasets."" #25031
Conversation
… tensor views to UDFs and infer tensor blocks for pure-tensor datasets. (ray-project#24812)" (ray-project#25017)" This reverts commit fbfb134.
elif isinstance(batch, np.ndarray): | ||
num_bytes += batch.nbytes | ||
else: | ||
# NOTE: This isn't recursive and will just return the size of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems this is the recommend recursive way: https://code.activestate.com/recipes/577504/: but I think we can leave it as a TODO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I was going to open up a separate issue for this, since we currently calculate the byte size of simple blocks with this top-level sys.getsizeof()
but we might want to use a recursive recipe both there and here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the future we can use BlockAccessor.for_block(b).size_bytes()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ericl For sure, but it's batch, not a block! So
block = BlockAccessor.batch_to_block(batch)
num_bytes += BlockAccessor.for_block(block).nbytes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it.
ML tests now pass, and failing Datasets test is the flaky test that's reverted in master, so this looks good to merge. We can wait until MacOS CI jobs complete since this isn't time-sensitive. |
…atically provide tensor views to UDFs and infer tensor blocks for pure-tensor datasets."" (ray-project#25031)" (ray-project#25057)" This reverts commit fb2933a.
…ovide tensor views to UDFs and infer tensor blocks for pure-tensor datasets. (#25031)" (#25531) Unreverts #24812, skipping the memory releasing tests that are already flaky. We have a separate issue tracking the unskipping of these memory releasing tests, once we find a more reliable way to test them. * Revert "Revert "Revert "Revert "[Datasets] [Tensor Story - 1/2] Automatically provide tensor views to UDFs and infer tensor blocks for pure-tensor datasets."" (#25031)" (#25057)" This reverts commit fb2933a. * Skip shuffle memory release test.
…ovide tensor views to UDFs and infer tensor blocks for pure-tensor datasets. (ray-project#25031)" (ray-project#25531) Unreverts ray-project#24812, skipping the memory releasing tests that are already flaky. We have a separate issue tracking the unskipping of these memory releasing tests, once we find a more reliable way to test them. * Revert "Revert "Revert "Revert "[Datasets] [Tensor Story - 1/2] Automatically provide tensor views to UDFs and infer tensor blocks for pure-tensor datasets."" (ray-project#25031)" (ray-project#25057)" This reverts commit fb2933a. * Skip shuffle memory release test.
Fixes the check ingest utility to handle non-Pandas native batches.
Checks
scripts/format.sh
to lint the changes in this PR.