Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[air] Add RunConfig.storage_path to replace SyncConfig.upload_dir and RunConfig.local_dir. #33463

Merged
merged 50 commits into from
Apr 5, 2023

Conversation

krfricke
Copy link
Contributor

@krfricke krfricke commented Mar 20, 2023

Why are these changes needed?

Pending acceptance of ray-project/enhancements#27, this PR changes the input API to configure persistent storage paths in Ray AIR and Ray Tune.

In a nutshell:

  • We have a unified setting RunConfig.storage_path that can be set either to a local dir or a remote path.
  • Setting the storage path to a cloud or NFS URI (e.g., s3://, or file:// that points to a NFS mount). In these cases, data will be first written to a local cache dir on the worker, and then synced to a subdirectory in the storage path designated by <experiment_name>/<trial_name>/.
  • Setting the storage path to a purely local URI (e.g., /home/foo/ray_results). In this mode, trial data is synced to the head node via the object store. We would generally recommend using a remote storage path or shared directory instead.

This PR deprecates existing input API arguments, and introduces the new storage_path configuration. The changes are fully backwards compatible. This is proven by the fact that this PR only changes very few private API calls in the test suites. All tests and examples are expected to continue to work.

In a follow-up PR, tests and examples should be adjusted to use the new API.

Related issue number

Replaces #33306

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Kai Fricke added 13 commits March 16, 2023 19:43
Signed-off-by: Kai Fricke <[email protected]>
Signed-off-by: Kai Fricke <[email protected]>
Signed-off-by: Kai Fricke <[email protected]>
Signed-off-by: Kai Fricke <[email protected]>
Signed-off-by: Kai Fricke <[email protected]>
Signed-off-by: Kai Fricke <[email protected]>
Signed-off-by: Kai Fricke <[email protected]>
Signed-off-by: Kai Fricke <[email protected]>
Signed-off-by: Kai Fricke <[email protected]>
Signed-off-by: Kai Fricke <[email protected]>
Signed-off-by: Kai Fricke <[email protected]>
@krfricke krfricke changed the title [tune] Move to storage_path [air] Add RunConfig.storage_path to replace SyncConfig.upload_dir and RunConfig.local_dir. Mar 20, 2023
Kai Fricke added 2 commits March 30, 2023 11:07
Signed-off-by: Kai Fricke <[email protected]>
This reverts commit 602eac4.
@@ -116,13 +116,22 @@
STDOUT_FILE = "__stdout_file__"
STDERR_FILE = "__stderr_file__"


def _get_defaults_results_dir() -> str:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[not related to this PR] in the future, this file and other file like trial.py should really move to an execution specific folder rather than under tune.

Signed-off-by: Kai Fricke <[email protected]>
@xwjiang2010
Copy link
Contributor

Looks like there is one related test failure? test_data_parallel_trainer_checkpointing
Approving pending fixing that one.

Also would be good to link the launch doc as well, for an understanding of gradual rollout. Thanks!

@krfricke krfricke merged commit 2efbfec into ray-project:master Apr 5, 2023
@krfricke krfricke deleted the tune/storage-path-2 branch April 5, 2023 08:51
krfricke added a commit that referenced this pull request Apr 13, 2023
Following #33463, this PR updates our tests, examples, and docs to use the new `storage_path` API.

The only locations where we continue to use the `local_dir` statement are tests where we specify both a local dir and a remote dir. For these tests, we can move to an environment-variable based wrapper in the future.

Signed-off-by: Kai Fricke <[email protected]>
vitsai pushed a commit to vitsai/ray that referenced this pull request Apr 17, 2023
…#34263)

Following ray-project#33463, this PR updates our tests, examples, and docs to use the new `storage_path` API.

The only locations where we continue to use the `local_dir` statement are tests where we specify both a local dir and a remote dir. For these tests, we can move to an environment-variable based wrapper in the future.

Signed-off-by: Kai Fricke <[email protected]>
elliottower pushed a commit to elliottower/ray that referenced this pull request Apr 22, 2023
… and `RunConfig.local_dir`. (ray-project#33463)

Pending acceptance of ray-project/enhancements#27, this PR changes the input API to configure persistent storage paths in Ray AIR and Ray Tune.

In a nutshell:

- We have a unified setting `RunConfig.storage_path` that can be set either to a local dir or a remote path.
- Setting the storage path to a cloud or NFS URI (e.g., `s3://`, or `file://` that points to a NFS mount). In these cases, data will be first written to a local cache dir on the worker, and then synced to a subdirectory in the storage path designated by `<experiment_name>/<trial_name>/`.
- Setting the storage path to a purely local URI (e.g., `/home/foo/ray_results`). In this mode, trial data is synced to the head node via the object store. We would generally recommend using a remote storage path or shared directory instead.

This PR deprecates existing input API arguments, and introduces the new `storage_path` configuration. The changes are fully backwards compatible. This is proven by the fact that this PR only changes very few private API calls in the test suites. All tests and examples are expected to continue to work.

Signed-off-by: Kai Fricke <[email protected]>
Signed-off-by: elliottower <[email protected]>
elliottower pushed a commit to elliottower/ray that referenced this pull request Apr 22, 2023
…#34263)

Following ray-project#33463, this PR updates our tests, examples, and docs to use the new `storage_path` API.

The only locations where we continue to use the `local_dir` statement are tests where we specify both a local dir and a remote dir. For these tests, we can move to an environment-variable based wrapper in the future.

Signed-off-by: Kai Fricke <[email protected]>
Signed-off-by: elliottower <[email protected]>
ProjectsByJackHe pushed a commit to ProjectsByJackHe/ray that referenced this pull request May 4, 2023
… and `RunConfig.local_dir`. (ray-project#33463)

Pending acceptance of ray-project/enhancements#27, this PR changes the input API to configure persistent storage paths in Ray AIR and Ray Tune.

In a nutshell:

- We have a unified setting `RunConfig.storage_path` that can be set either to a local dir or a remote path.
- Setting the storage path to a cloud or NFS URI (e.g., `s3://`, or `file://` that points to a NFS mount). In these cases, data will be first written to a local cache dir on the worker, and then synced to a subdirectory in the storage path designated by `<experiment_name>/<trial_name>/`.
- Setting the storage path to a purely local URI (e.g., `/home/foo/ray_results`). In this mode, trial data is synced to the head node via the object store. We would generally recommend using a remote storage path or shared directory instead.

This PR deprecates existing input API arguments, and introduces the new `storage_path` configuration. The changes are fully backwards compatible. This is proven by the fact that this PR only changes very few private API calls in the test suites. All tests and examples are expected to continue to work.

Signed-off-by: Kai Fricke <[email protected]>
Signed-off-by: Jack He <[email protected]>
ProjectsByJackHe pushed a commit to ProjectsByJackHe/ray that referenced this pull request May 4, 2023
…#34263)

Following ray-project#33463, this PR updates our tests, examples, and docs to use the new `storage_path` API.

The only locations where we continue to use the `local_dir` statement are tests where we specify both a local dir and a remote dir. For these tests, we can move to an environment-variable based wrapper in the future.

Signed-off-by: Kai Fricke <[email protected]>
Signed-off-by: Jack He <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants