[Datasets] Unrevert Arrow table copy method change. #19534
…contain tensor columns. (ray-project#19494)" (ray-project#19517)" This reverts commit a6f9c93.
LGTM if it passes the linear dataset test.
assert len(chunk) == b - a
bufs = chunk.buffers()
print(len(bufs))
assert bufs[1].address != og_bufs[1].address
nice!
…ation, guaranteeing single-copy.
@ericl I updated this to copy via chunk concatenation instead of the NumPy roundtrip. This should be guaranteed to be single-copy and should prevent any data coercions from happening. Please take another look!
Co-authored-by: Eric Liang <[email protected]>
@ericl Datasets tests and Ray Train examples are passing, and the macOS failures appear to be unrelated, so this is good to go in terms of CI.
Unreverts the reversion of #19494. This PR also changes the Arrow table copy method to a per-column copy via chunk concatenation, which
Opening the PR to let the failing example run in CI since I wasn't able to reproduce the failure locally.
Related issue number
Closes #19476
Checks
scripts/format.sh to lint the changes in this PR.