-
Notifications
You must be signed in to change notification settings - Fork 5.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Revert "[Data] Change offsets to int64 and change to LargeList for ArrowTensorArray" #46511
Revert "[Data] Change offsets to int64 and change to LargeList for ArrowTensorArray" #46511
Conversation
…rowTenso…" This reverts commit f4e8538.
@terraflops1048576 i am looking into the release test failure, let me see if there is a quick fix available, otherwise i will try to find a reproducible example. |
@scottjlee w00t i already revert it; feel free to merge it again once it's fixed |
@terraflops1048576 here is a minimal reproducible example which fails:
with this traceback:
I took an initial look, but wasn't able to find anything conclusive. Would you be able to take a look, and see if you can debug the issue? Thanks! |
@scottjlee The cause of this problem is that the example data uses the old tensor storage format, which uses lists (32-bit offset arrays), and the PR I introduced uses the new tensor storage format, which uses large lists (64-bit offset arrays). Therefore, loading from the old format is backward-incompatible without some careful consideration toward fixing it (or refreshing the examples to use the new format) I think this is possibly solvable with some conversion code, but I'll have to give it more thought. |
It might very well be the case that we have to define a different extension type ArrowLargeTensorArray for backward compatibility and do conversion, but this seems like it would pollute the codebase. |
Good find. i think for the best user experience, Ray Data should be able to handle both the old and new tensor storage format, without requiring the user to configure settings (so some form of auto-retry/fallback logic). I think either would be two viable options. What do you think? |
Reverts #45352
Breaking #46499, #46496 and #46495