Cherry-pick #42027 #42761

scottjlee · 2024-01-27T00:49:05Z

Why are these changes needed?

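Cherry-pick #42027, which makes read tasks more stable by retrying transient AWS errors. It also shuffles input images in the read_images_train_4_gpu release test.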

Related issue number

Checks

I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
I've run scripts/format.sh to lint the changes in this PR.
I've included any doc changes needed for https://docs.ray.io/en/master/.
- I've added any new APIs to the API Reference. For example, if I added a
  method in Tune, I've added it in doc/source/tune/api/ under the
  corresponding .rst file.
I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
Testing Strategy
- Unit tests
- Release tests
- This PR is not tested :(

…y-project#42027)

Shuffles input images for the read_images_train_4_gpu release test, which fixes the issue with accuracy dropping to 0.

Adds AWS Error NETWORK_CONNECTION and AWS Error ACCESS_DENIED as exception types to retry during reads, since these can be transient errors that resolve on retry.

Also makes small fixes to optional parameters in the benchmark file, used for debugging purposes.

Results of a sample release test run:

read_images_train_4_gpu:
Result of case cache-none: {'time': 11964.644112934, 'tput': 429.22158930338344, 'accuracy': 0.4667895757295709, 'extra_metrics': {}}

read_images_train_16_gpu:
Result of case cache-none: {'time': 5400.357632072, 'tput': 1593.6668981608586, 'accuracy': 0.5293150227295434, 'extra_metrics': {}}

read_images_train_16_gpu_preserve_order:
Result of case cache-none: {'time': 5566.524269388, 'tput': 1571.1312653719967, 'accuracy': 0.5295374787691078, 'extra_metrics': {}}

(The difference in accuracy is because the 4-worker test runs for only 3 epochs, while the 16-worker test runs for 5 epochs; both use the entire dataset per epoch.)

---------

Signed-off-by: Andrew Xue <[email protected]>
Signed-off-by: Scott Lee <[email protected]>
Co-authored-by: Scott Lee <[email protected]>
Co-authored-by: Scott Lee <[email protected]>
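The retry change described above (treating certain AWS errors as transient during reads) can be illustrated with a minimal, standalone sketch. This is not Ray's implementation: retry_transient_read, RETRIED_ERROR_SUBSTRINGS, and the attempt/backoff parameters are hypothetical names chosen for illustration. The sketch assumes an error counts as transient when its message contains one of the listed substrings.

```python
import time
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")

# Hypothetical: error-message substrings treated as transient (taken from the PR text).
RETRIED_ERROR_SUBSTRINGS = (
    "AWS Error NETWORK_CONNECTION",
    "AWS Error ACCESS_DENIED",
)


def retry_transient_read(
    read_fn: Callable[[], T],
    retried_errors: Sequence[str] = RETRIED_ERROR_SUBSTRINGS,
    max_attempts: int = 3,
    base_delay_s: float = 0.5,
) -> T:
    """Call read_fn, retrying only when the error message matches a retried substring."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return read_fn()
        except Exception as e:
            is_transient = any(s in str(e) for s in retried_errors)
            # Re-raise non-transient errors immediately, and transient ones
            # once the attempt budget is exhausted.
            if not is_transient or attempt == max_attempts:
                raise
            # Exponential backoff: base, 2*base, 4*base, ...
            time.sleep(base_delay_s * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

Matching on message substrings rather than on exception classes is one way to handle storage errors that surface as generic exception types. A simulated read that raises OSError("AWS Error NETWORK_CONNECTION: ...") twice and then succeeds would return its result on the third attempt, while a ValueError with an unrelated message would propagate immediately.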

Signed-off-by: Scott Lee <[email protected]>

Zandew and others added 3 commits January 26, 2024 16:44

lint

aaea3c6

Signed-off-by: Scott Lee <[email protected]>

fix

2bdb61e

Signed-off-by: Scott Lee <[email protected]>

scottjlee marked this pull request as ready for review January 27, 2024 00:50

scottjlee requested review from ericl, scv119, c21, amogkam, bveeramani, raulchen, stephanie-wang and Zandew as code owners January 27, 2024 00:50

raulchen approved these changes Jan 27, 2024

architkulkarni assigned architkulkarni and zhe-thoughts Jan 29, 2024

architkulkarni added the v2.9.2-pick label Jan 29, 2024

zhe-thoughts approved these changes Jan 29, 2024

architkulkarni merged commit ab097bd into ray-project:releases/2.9.2 Jan 29, 2024
16 checks passed
