Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AIR - Datasets] Encode number of dimensions in variable-shaped tensor extension type. #29281

Merged

Conversation

clarkzinzow
Copy link
Contributor

@clarkzinzow clarkzinzow commented Oct 12, 2022

Knowing the number of dimensions in a variable-shaped tensor column is useful for e.g. inferring a ragged tensor spec when constructing a tf.data Dataset; by encoding this ndim data in the extension type, we can do this type inference based on Dataset metadata, which is required.

Note that this change will disallow variable-shaped tensor columns containing tensor elements that have a variable number of dimensions. This isn't supported by TensorFlow and Torch ragged tensors, so sacrificing this feature seems tenable.

Related issue number

Closes #29135

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@clarkzinzow clarkzinzow added the do-not-merge Do not merge this PR! label Oct 12, 2022
@clarkzinzow clarkzinzow force-pushed the datasets/fix/ragged-tensor-ndim branch from dc64bb1 to d97c6ad Compare October 20, 2022 03:03
@clarkzinzow clarkzinzow removed the do-not-merge Do not merge this PR! label Oct 20, 2022
@clarkzinzow clarkzinzow marked this pull request as ready for review October 20, 2022 03:03
@clarkzinzow clarkzinzow changed the title [AIR - Datasets] [WIP] Encode number of dimensions in variable-shaped tensor extension type. [AIR - Datasets] Encode number of dimensions in variable-shaped tensor extension type. Oct 20, 2022
Copy link
Member

@bveeramani bveeramani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, although not sure if I have enough familiarity with the tensor extension code to approve

python/ray/air/util/tensor_extensions/arrow.py Outdated Show resolved Hide resolved
@clarkzinzow clarkzinzow force-pushed the datasets/fix/ragged-tensor-ndim branch from dc53d80 to 14dc07e Compare October 21, 2022 23:03
Copy link
Contributor

@c21 c21 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

if ndim is not None and a.ndim != ndim:
raise ValueError(
"ArrowVariableShapedTensorArray only supports tensor elements that "
"all have the same number of dimensinos, but got tensor elements "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: "dimensinos" seems not fixed yet

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh weird, I thought I fixed that, good catch!

@clarkzinzow clarkzinzow merged commit 9a3c0fb into ray-project:master Oct 24, 2022
WeichenXu123 pushed a commit to WeichenXu123/ray that referenced this pull request Dec 19, 2022
…r extension type. (ray-project#29281)

Knowing the number of dimensions in a variable-shaped tensor column is useful for e.g. inferring a ragged tensor spec when constructing a tf.data Dataset; by encoding this ndim data in the extension type, we can do this type inference base on Dataset metadata, which is required.

Note that this change will disallow variable-shaped tensor columns containing tensor elements that have a variable number of dimensions. This isn't supported by TensorFlow and Torch ragged tensors, so sacrificing this feature seems tenable.

Signed-off-by: Weichen Xu <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Datasets] Make ragged tensor shape more descriptive
4 participants