[Air][Data] Don't promote locality_hints for split #26647
Merged
Signed-off-by: scv119 <[email protected]>
scv119 requested review from ericl, clarkzinzow, jjyao, jianoaix and maxpumperla as code owners July 17, 2022 19:09
To disable it for AIR, we should change it in Train as well:
ray/python/ray/train/_internal/dataset_spec.py Lines 48 to 52 in 38c9e1d
ray/python/ray/train/_internal/dataset_spec.py Lines 209 to 213 in 38c9e1d
amogkam reviewed Jul 17, 2022
ericl approved these changes Jul 17, 2022
ericl added the @author-action-required label Jul 17, 2022
Signed-off-by: scv119 <[email protected]>
jianoaix approved these changes Jul 18, 2022
scv119 added a commit that referenced this pull request Jul 18, 2022
…ints (#26641)
This PR replaces the dataset.split(.., equal=True) implementation with dataset.split_at_indices(). My experiments (the script) showed that dataset.split_at_indices() has more predictable performance than dataset.split(…).
Concretely, on 10 m5.4xlarge nodes with 5000 IOPS disks:
- calling ds.split(81) on a 200GB dataset with 400 blocks: split takes 20-40 seconds, split_at_indices takes ~12 seconds.
- calling ds.split(163) on a 200GB dataset with 400 blocks: split takes 40-100 seconds, split_at_indices takes ~24 seconds.
I don't have much insight into the dataset.split implementation, but with dataset.split_at_indices() we are just doing SPREAD to num_split_at_indices tasks, which yields much more stable performance.
Note: clean up the usage of experimental locality_hints in #26647
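As a rough illustration of how an equal split can be expressed as a split at indices, here is a minimal pure-Python sketch. It is not Ray's implementation: the helper names `equal_split_indices`, `split_at_indices` and `equal_split` are hypothetical stand-ins operating on plain lists, and the sketch assumes that remainder rows are dropped so every split has the same length.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def equal_split_indices(num_rows: int, n: int) -> List[int]:
    """Return the n - 1 cut points that divide num_rows into n equal parts.

    Hypothetical helper: rows beyond (num_rows // n) * n are not covered.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    rows_per_split = num_rows // n
    return [rows_per_split * i for i in range(1, n)]


def split_at_indices(rows: Sequence[T], indices: List[int]) -> List[Sequence[T]]:
    """Split rows at the given sorted indices, yielding len(indices) + 1 parts."""
    bounds = [0] + list(indices) + [len(rows)]
    return [rows[bounds[i]:bounds[i + 1]] for i in range(len(bounds) - 1)]


def equal_split(rows: Sequence[T], n: int) -> List[Sequence[T]]:
    """Equal split built on split_at_indices (assumption: remainder is dropped)."""
    rows_per_split = len(rows) // n
    trimmed = rows[: rows_per_split * n]
    return split_at_indices(trimmed, equal_split_indices(len(trimmed), n))


if __name__ == "__main__":
    # 10 rows into 3 splits: each split gets 3 rows, the last row is dropped.
    for part in equal_split(list(range(10)), 3):
        print(part)
```

The point of the sketch is only that the cut points are computed once up front, so each output split can then be produced independently of the others.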
scv119 added tests-ok and removed @author-action-required labels Jul 18, 2022
Why are these changes needed?
Since locality_hints is an experimental feature, we stop promoting it in the docs and no longer enable it in AIR. See #26641 for more context.
Related issue number
Checks
Run scripts/format.sh to lint the changes in this PR.