Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[train+tune] Make path joining OS-agnostic by using Path.as_posix over os.path.join #42037

Merged
merged 19 commits into from
Jan 31, 2024

Conversation

n30111
Copy link
Contributor

@n30111 n30111 commented Dec 20, 2023

Why are these changes needed?

os.path.join uses an OS-dependent separator (\ on windows and / on posix systems), which causes some issues when os.path.join is used in conjunction with Path. The combination can result in a mix of / and \ when running Ray Train/Tune on windows. This runs into issues when passing these paths into pyarrow.fs, leading to issues such as FileNotFound.

This PR fixes the following issues:

While running training on Windows OS with S3 as storage system, some files are not placed in correct locations.
For example the file checkpoint_00000/.metadata.json ends up as checkpoint_00000\.metadata.json.
Also while restoring a tuner run, it fails, as tuner.pkl is not located correctly due to path issue.

Related issue number

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Copy link
Contributor

@justinvyu justinvyu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution! I think we should generally use Path.as_posix everywhere we need to do path joins to be OS agnostic, and pyarrow (which we use under the hood) does not like the mix of posix and windows separators.

I'm wondering if we want to actually go ahead and change ALL of them with this PR? I did a global search on the repo and have found a bunch more instances of os.path.join in our Ray Train/Tune code.

Another note: we basically don't have comprehensive windows tests, so support for windows for at least Ray Train/Tune is very experimental. If you are not tied down to local development on a windows machine, running on a devbox or a linux subsystem may be a good thing to consider.

python/ray/train/_checkpoint.py Outdated Show resolved Hide resolved
n3011 and others added 3 commits December 21, 2023 02:44
Co-authored-by: Justin Yu <[email protected]>
Signed-off-by: Ishant Mrinal <[email protected]>
Signed-off-by: n3011 <[email protected]>
Signed-off-by: n3011 <[email protected]>
@n30111
Copy link
Contributor Author

n30111 commented Dec 21, 2023

@justinvyu thanks, we are dependent on both linux and windows, so this changes will be useful.

@n30111
Copy link
Contributor Author

n30111 commented Dec 28, 2023

@justinvyu I have updated the PR, It would be great if we can include these changes in the next release, thanks.

@justinvyu
Copy link
Contributor

justinvyu commented Jan 8, 2024

Hey @n30111 sorry for the delay, I'll do another review on this PR, so that we can get it in for Ray 2.10.

@jbedorf
Copy link
Contributor

jbedorf commented Jan 11, 2024

@justinvyu Friendly ping on this PR.

@justinvyu
Copy link
Contributor

@n3011 @jbedorf Is it possible to "give maintainers permission to edit" the PR (should be some setting in the PR)? I've replaced os.path.join in many other place sin the code.

@n30111
Copy link
Contributor Author

n30111 commented Jan 17, 2024

@justinvyu Hi Justin, that particular option does not work for forks hosted by organizations (see: https://github.com/orgs/community/discussions/5634 ), but we just added you to the fork/repository to enable you to push the changes you made.

@justinvyu
Copy link
Contributor

@n3011 Is it possible to add a small test to confirm that this PR fixes the failure scenario you're running into? Thanks!

https://github.com/ray-project/ray/blob/master/python/ray/train/tests/test_windows.py

@@ -129,7 +130,7 @@ def get_metadata(self) -> Dict[str, Any]:

If no metadata is stored, an empty dict is returned.
"""
metadata_path = os.path.join(self.path, _METADATA_FILE_NAME)
metadata_path = Path(self.path, _METADATA_FILE_NAME).as_posix()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we create a common utility that wraps this Path/as_posix logic?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about it but I prefer just seeing this spelled out. I think the longer term goal is to replace everything with Path objects and never pass strings around.

@@ -41,8 +41,8 @@ class JsonLogger(Logger):

def _init(self):
self.update_config(self.config)
local_file = os.path.join(self.logdir, EXPR_RESULT_FILE)
self.local_out = open(local_file, "a")
local_file = Path(self.logdir, EXPR_RESULT_FILE)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this include as_posix()?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question for other instances below.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just used Path.open instead of open(str) below.

@n30111
Copy link
Contributor Author

n30111 commented Jan 18, 2024

@n3011 Is it possible to add a small test to confirm that this PR fixes the failure scenario you're running into? Thanks!

https://github.com/ray-project/ray/blob/master/python/ray/train/tests/test_windows.py

This test covers the scenario https://github.com/ray-project/ray/blob/master/python/ray/train/tests/test_trainer_restore.py#L189, is this run on windows system also?

@justinvyu justinvyu self-assigned this Jan 29, 2024
@justinvyu justinvyu changed the title [tune/train] Fix issues of files misplacement for Windows with S3 file system [train+tune] Make path joining OS-agnostic by using Path.as_posix over os.path.join Jan 31, 2024
@justinvyu justinvyu merged commit 30653f1 into ray-project:master Jan 31, 2024
9 checks passed
@justinvyu justinvyu deleted the win_path_fix branch January 31, 2024 22:33
@justinvyu
Copy link
Contributor

Thanks for this contribution @n30111! 🚀 Let me know if you get a chance to try it out on nightly.

shrekris-anyscale pushed a commit to shrekris-anyscale/ray that referenced this pull request Feb 1, 2024
…ver `os.path.join` (ray-project#42037)

`os.path.join` uses an OS-dependent separator (`\` on windows and `/` on posix systems), which causes some issues when `os.path.join` is used in conjunction with `Path`. The combination can result in a mix of `/` and `\` when running Ray Train/Tune on windows. This runs into issues when passing these paths into `pyarrow.fs`, leading to issues such as `FileNotFound`.

---------

Signed-off-by: n3011 <[email protected]>
Signed-off-by: Ishant Mrinal <[email protected]>
Signed-off-by: Justin Yu <[email protected]>
Co-authored-by: n3011 <[email protected]>
Co-authored-by: Justin Yu <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants