[AIR - Datasets] [Hotfix] Tensor extension column concatenation fixes. #29479
Thanks for catching this issue, just a minor comment.
LGTM besides @jianoaix's comments.
cf9eec1 to c382d5c
@jianoaix Feedback implemented, PTAL. I opted to add test coverage for the particular ...
I've been pushing for full unit test coverage of the block layer for a while, would definitely still like to do that in a dedicated PR.
Failing tests appear to be unrelated, merging.
Referenced in a commit (ray-project#29479) whose message repeats the PR description. Signed-off-by: Weichen Xu
Fixes the concatenation of tensor columns (and extension types in general). Previously, concatenating tensor columns could produce a variable-shaped tensor column that was nevertheless represented with a broken homogeneous-shaped tensor type. This PR ensures that the correct tensor extension type is produced post-concatenation.
Issue Details
The core issue is that we weren’t accounting for homogeneous-shaped tensor column concatenation resulting in a heterogeneous-shaped tensor column. In master, the concatenation itself currently fails silently, and the error only surfaces when accessing the data fails somewhere downstream of the concatenation.
E.g., if you have 5 images of shape (32, 32) in one block, and 5 images of shape (64, 64) in another block, the image column in each individual block is homogeneous-shaped. But if you concatenate those blocks, you have 10 images, half with shape (32, 32) and half with shape (64, 64), which needs our heterogeneous-shaped (variable-shaped) tensor column representation.
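To make the image example concrete, here is a minimal sketch in plain NumPy. It does not use Ray's tensor extension types or reproduce this PR's implementation; `concat_tensor_columns` and the `"homogeneous"` / `"variable-shaped"` labels are hypothetical names chosen for illustration. The idea it shows is only the check itself: after concatenation, compare the per-row shapes and pick the representation accordingly.

```python
import numpy as np


def concat_tensor_columns(blocks):
    """Concatenate per-block tensor columns (hypothetical helper, not Ray's API).

    Each block is an ndarray of shape (num_rows, *tensor_shape), i.e. it is
    homogeneous-shaped on its own. The result may not be.
    """
    # Flatten the blocks into one list of per-row tensors.
    rows = [np.asarray(t) for block in blocks for t in block]
    shapes = {r.shape for r in rows}

    if len(shapes) == 1:
        # All rows share one shape: a single dense ndarray is a valid representation.
        return np.stack(rows), "homogeneous"

    # Mixed shapes: fall back to a ragged representation (an object array here,
    # standing in for a variable-shaped tensor column).
    out = np.empty(len(rows), dtype=object)
    for i, r in enumerate(rows):
        out[i] = r
    return out, "variable-shaped"


# 5 images of shape (32, 32) in one block, 5 of shape (64, 64) in another.
block_a = np.zeros((5, 32, 32))
block_b = np.ones((5, 64, 64))

column, kind = concat_tensor_columns([block_a, block_b])
same_column, same_kind = concat_tensor_columns([block_a, block_a])
```

With the two blocks above, `kind` is `"variable-shaped"` and `column` holds 10 per-row arrays, whereas concatenating `block_a` with itself stays `"homogeneous"` with a dense `(10, 32, 32)` array.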
Issue
Closes #29489
Checks
I've signed off every commit (git commit -s) in this PR.
I've run scripts/format.sh to lint the changes in this PR.