[AIR] Add experimental read_images
#28256
Closed
80 commits
5e50b46 Add experimental `read_images` (bveeramani)
675ca6c Merge branch 'master' into bveeramani/read-images (bveeramani)
b8d3974 Mark as experimental (bveeramani)
4f1d5d7 Rename `PathPartitionScheme` as `Partitioning` (bveeramani)
9afc041 Update input_output.rst (bveeramani)
d6b2667 Update partitioning.py (bveeramani)
517c390 Update partitioning.py (bveeramani)
d7a2ae3 Add CSV tests (bveeramani)
9416d3c Merge remote-tracking branch 'upstream/master' into bveeramani/partition (bveeramani)
e9a9c5c Merge remote-tracking branch 'upstream/master' into bveeramani/partition (bveeramani)
644878f Support `None` field name (bveeramani)
9c65eb9 Update test_partitioning.py (bveeramani)
7372987 Merge branch 'bveeramani/dir-partitioning' into bveeramani/partition (bveeramani)
6980079 Merge stuff (bveeramani)
2253c47 Move code to `FileBasedDatasource` (bveeramani)
d34acc9 Delete tmp.csv (bveeramani)
0cfeb58 Merge remote-tracking branch 'upstream/master' into bveeramani/partition (bveeramani)
38ba956 Add files (bveeramani)
308bc68 Appease lint (bveeramani)
a8432e4 Update csv_datasource.py (bveeramani)
b5657a8 Delete test_csv_partitioning.py (bveeramani)
f96a498 Update file_based_datasource.py (bveeramani)
44ec745 Rename (bveeramani)
00aac7d Make changes (bveeramani)
a2f2ab0 Appease lint (bveeramani)
3fd0aac Update read_api.py (bveeramani)
e0cb06a Add Numpy (bveeramani)
4f08b73 Update files (bveeramani)
a839514 Update read_api.py (bveeramani)
fc087f1 Update files (bveeramani)
bca3925 Merge remote-tracking branch 'upstream/master' into bveeramani/read-i… (bveeramani)
5f7ea9f Merge branch 'bveeramani/partition' into bveeramani/read-images (bveeramani)
34b016f Update read_api.py (bveeramani)
e4eb840 Update error messages (bveeramani)
3f1c361 Temp (bveeramani)
9924029 Merge branch 'bveeramani/partition' into bveeramani/read-images (bveeramani)
5d7b7fe Update files (bveeramani)
e4a2cb9 Bug fix and lint (bveeramani)
0715fc8 Update files (bveeramani)
d7fccfa Appease lint and fix install (bveeramani)
7f88436 Merge branch 'bveeramani/partition' into bveeramani/read-images (bveeramani)
edf1b9f Fix parameter (bveeramani)
578edc2 Update creating-datasets.rst (bveeramani)
249bafc Fix test (bveeramani)
27d9a59 Address review comments (bveeramani)
c993f2d Update test_dataset_formats.py (bveeramani)
65dc78f Merge branch 'master' into bveeramani/partition (bveeramani)
92d6af5 Update test_dataset_formats.py (bveeramani)
8dc0501 Update test_dataset_formats.py (bveeramani)
343c995 Merge branch 'master' into bveeramani/partition (bveeramani)
29ed734 Update test_dataset_formats.py (bveeramani)
0ef5585 Update python/ray/data/datasource/text_datasource.py (bveeramani)
2fb3451 Update python/ray/data/tests/test_dataset_formats.py (bveeramani)
baf096e Address review comments (bveeramani)
a3d5729 Update test_partitioning.py (bveeramani)
ef2e79e Address review comments (bveeramani)
fbf2bb1 Merge remote-tracking branch 'upstream/master' into bveeramani/partition (bveeramani)
01be922 Merge branch 'master' into bveeramani/read-images (bveeramani)
6f6855d Update test_dataset_image.py (bveeramani)
c3cdf7b Merge branch 'master' into bveeramani/partition (bveeramani)
5eaa52b Tests (bveeramani)
0604d3a Delete x.npy (bveeramani)
50f99ca Appease lint (bveeramani)
b1d9b33 Merge branch 'bveeramani/partition' into bveeramani/read-images (bveeramani)
2f65750 Delete model (bveeramani)
2d23510 Update pytorch_training_e2e.py (bveeramani)
3138d7b Merge branch 'master' into bveeramani/read-images (bveeramani)
2dfd0fd Appease lint (bveeramani)
ad8f81c Minor fixes (bveeramani)
151309b Update documentation (bveeramani)
d827ccb Remove references (bveeramani)
5d6af8b Update creating-datasets.rst (bveeramani)
46f0292 Update read_benchmark.py (bveeramani)
9c1c277 Minor fixes (bveeramani)
208089b Fix CI (bveeramani)
ddd342f Update read_api.py (bveeramani)
0dc6dbe Address review comments (bveeramani)
0bf9734 Merge branch 'master' into bveeramani/read-images (bveeramani)
bdae9f4 Merge branch 'master' into bveeramani/read-images (bveeramani)
4c98cf8 Update test_dataset_image.py (bveeramani)
If we're showing off a NumPy-only UDF, we shouldn't return a pandas DataFrame; instead, we can return a single ndarray (or dict of ndarrays, if we're wanting to change to a human-readable column name), which Datasets will convert back into a tabular format. This is both better UX for the UDF developer and should be more efficient under-the-hood (Datasets will represent the imagery tensor column in an Arrow Table rather than a Pandas DataFrame, which is more reliably zero-copy and has a smaller wire footprint).
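To make the suggestion concrete, here is a minimal sketch of a NumPy-only batch UDF that returns a dict of ndarrays instead of a pandas DataFrame. The function name `scale_images`, the column name `"image"`, and the `(N, H, W, C)` batch layout are illustrative assumptions, not code from this PR. How the function is passed to Datasets is left out, since the batch-format arguments depend on the Ray version.

```python
import numpy as np


def scale_images(batch: np.ndarray) -> dict:
    """Hypothetical UDF: ndarray batch in, dict of ndarrays out."""
    # Assumed input: uint8 images stacked as (N, H, W, C).
    scaled = batch.astype(np.float32) / 255.0
    # Returning a dict lets the UDF pick a human-readable column name;
    # no pandas DataFrame is constructed anywhere.
    return {"image": scaled}


# Small synthetic batch standing in for real image data.
batch = np.full((2, 4, 4, 3), 255, dtype=np.uint8)
out = scale_images(batch)
```

Returning the bare `scaled` array instead of the dict would be the "single ndarray" variant mentioned above.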
In fact, could we match what @jiaodong is doing in their NumPy narrow waist for prediction PR, where the torchvision transform is vectorized over the input ndarray? That should be doable with the current API, just need to do the same transpose as in that PR: https://github.com/ray-project/ray/pull/28917/files#diff-e2bccb297d421f0dcff1892c4f23993064f52b17710787c41c3a2ae9dbc84159
I.e. basically this:
I can't return an `ndarray`, because then I get […]. Could we address this in a follow-up PR? I can create an issue to track.
Ah, I forgot that the preprocessor is going to be applied within the predictor, which doesn't have the NumPy narrow waist merged yet. Since we need to convert NumPy ndarray batches to pandas DataFrame batches with `read_images()` now returning a tensor dataset, I suppose this is fine as-is, with the expectation that whichever PR is merged second will need to resolve merge conflicts and converge to what I gave above (ndarray in, ndarray out, vectorized torchvision transform).
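For reference, the "ndarray in, ndarray out" shape described here might look roughly like the sketch below. It is not the snippet from the linked PR: plain NumPy stands in for the torchvision transform so the example stays self-contained, and the function name `preprocess`, the `(N, H, W, C)` input layout, and the ImageNet normalization constants are all assumptions.

```python
import numpy as np

# Assumed per-channel statistics (the ImageNet values commonly used with
# torchvision models); the PR itself does not specify them.
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def preprocess(batch: np.ndarray) -> np.ndarray:
    """Hypothetical vectorized transform over a whole image batch."""
    # Scale uint8 pixels to [0, 1] for every image at once.
    x = batch.astype(np.float32) / 255.0
    # Normalize per channel; MEAN and STD broadcast over the last axis (C).
    x = (x - MEAN) / STD
    # Transpose (N, H, W, C) -> (N, C, H, W), the layout PyTorch models expect.
    return np.ascontiguousarray(x.transpose(0, 3, 1, 2))


# Small synthetic batch of all-black images.
images = np.zeros((2, 4, 4, 3), dtype=np.uint8)
result = preprocess(images)
```

The whole batch is processed in one pass with no per-image Python loop and no DataFrame round trip, which is the point of the vectorized form.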