[RuntimeEnv, Windows] Fix working_dir, pip & conda for windows #28589

jbedorf · 2022-09-18T11:59:05Z

Why are these changes needed?

Several runtime_env options are currently broken on Windows due to the changes in an earlier PR, which removed the use of command_prefix in the context. This work restores it. However, because workers are launched differently on Linux and Windows, changes had to be made in multiple locations to ensure that lists are returned instead of strings.

To prevent this breaking in the future various tests have been fixed/enabled. However, there are some lingering issues with the tests:

- The top-level folder of the working_dir is not deleted, because a folder that is in use cannot be deleted on Windows. The tests account for this by whitelisting that particular folder.
- For the conda environment option, deletion is incomplete. Similarly, it fails because a number of files and folders are in use, typically by the ray.util.client.server process. As such not all tests are enabled for Windows as they would keep failing, and leave behind sizeable temporary files.

Other:

- Fixed an issue where Ray and the temp folder are on different drives, so the change-directory command did not work.
- Fixed a number of tests that had recursion errors on Windows.
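
The drive issue above follows from how cmd.exe handles `cd`: a plain `cd D:\path` issued while the current drive is `C:` does not switch drives, whereas `cd /d` changes both the drive and the directory. Below is a minimal sketch of building such a prefix as a list. The helper name `build_cd_prefix` is hypothetical and is not claimed to match the code in this PR.

```python
from typing import List


def build_cd_prefix(local_dir: str, is_windows: bool) -> List[str]:
    """Return command-prefix tokens that change into ``local_dir``.

    Hypothetical helper, for illustration only.
    """
    if is_windows:
        # cmd.exe: /d also switches the current drive; a plain `cd` does not.
        return ["cd", "/d", local_dir, "&&"]
    # POSIX shells have no drive concept, so a plain `cd` is enough.
    return ["cd", local_dir, "&&"]


print(build_cd_prefix(r"D:\tmp\ray\session", is_windows=True))
print(build_cd_prefix("/tmp/ray/session", is_windows=False))
```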

Related issue number

Checks

I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
I've run scripts/format.sh to lint the changes in this PR.
I've included any doc changes needed for https://docs.ray.io/en/master/.
I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
Testing Strategy
- Unit tests
- Release tests
- This PR is not tested :(

Signed-off-by: Jeroen Bédorf <[email protected]>

jbedorf · 2022-09-19T12:31:36Z

@mattip Given that you previously worked on Windows-related PRs, could you help find some reviewers? Thanks!

jbedorf · 2022-09-22T17:13:50Z

@architkulkarni Can you help move this forward? Thanks!

architkulkarni

Thanks for the contribution!

Regarding the conda tests:

> As such not all tests are enabled for Windows as they would keep failing, and leave behind sizeable temporary files.

Certainly if they're failing we shouldn't enable them in the PR, but for those tests where the only issue is leaving behind temporary files, can we try to enable them in this PR? If the temporary files somehow cause a problem, we'll see it in the CI run for this PR.

architkulkarni · 2022-09-22T18:14:05Z

python/ray/_private/runtime_env/pip.py

+        else:
+            context.command_prefix += [
+                _PathHelper.get_virtualenv_activate_command(target_dir)
+            ]


Should we make get_virtualenv_activate_command always return List[str] to avoid this if-else?

Yeah, I was thinking more about it. A cleaner solution would be to always add lists to the command_prefix item, then combine the items into a string for Linux/Mac and keep them as list items for Windows. So we would have to update it for each of the three plugins.
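
The "always lists, decide per platform at the end" idea discussed above could look roughly like the sketch below. `finalize_command` and the example prefix tokens are hypothetical names chosen for illustration; they are not the actual Ray code.

```python
from typing import List, Union


def finalize_command(
    command_prefix: List[str], worker_cmd: List[str], is_windows: bool
) -> Union[str, List[str]]:
    """Combine prefix tokens and the worker command (hypothetical helper).

    Every plugin appends a list of tokens; the platform-specific
    decision is made once, here.
    """
    tokens = [*command_prefix, *worker_cmd]
    if is_windows:
        # Keep the tokens as a list for the Windows launch path.
        return tokens
    # Linux/Mac: join into one shell string. Tokens are assumed to be
    # shell-safe already, since operators like "&&" must stay unquoted.
    return " ".join(tokens)


prefix: List[str] = []
prefix += ["cd", "/tmp/work", "&&"]  # e.g. a working_dir-style plugin
prefix += ["source", "/tmp/venv/bin/activate", "&&"]  # e.g. a pip-style plugin
print(finalize_command(prefix, ["python", "worker.py"], is_windows=False))
```

With this shape, each plugin can use `+=` with a list unconditionally, so the if-else in the diff above would no longer be needed at the call site.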

architkulkarni · 2022-09-22T18:18:50Z

python/ray/_private/test_utils.py


+            if set(items) == whitelist:


Based on the term "whitelist" I would expect subset instead of == here, what do you think?

Yep, will update.
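
The difference between the two checks can be seen in a small standalone example. The function name and folder names below are made up for illustration.

```python
from typing import Iterable, Set


def only_whitelisted_left(items: Iterable[str], whitelist: Set[str]) -> bool:
    """True if every remaining item is in the whitelist (subset semantics)."""
    return set(items) <= whitelist


whitelist = {"working_dir_files", "conda"}

# Subset check: passes when nothing, or only whitelisted entries, remain.
print(only_whitelisted_left([], whitelist))
print(only_whitelisted_left(["conda"], whitelist))
print(only_whitelisted_left(["conda", "leftover"], whitelist))

# Equality check: fails unless *all* whitelisted entries are still present.
print(set(["conda"]) == whitelist)
```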

jbedorf · 2022-09-22T19:34:42Z

> As such not all tests are enabled for Windows as they would keep failing, and leave behind sizeable temporary files.
>
> Certainly if they're failing we shouldn't enable them in the PR, but for those tests where the only issue is leaving behind temporary files, can we try to enable them in this PR? If the temporary files somehow cause a problem, we'll see it in the CI run for this PR.

The tests will pass when adding the same construct as added in some of the other tests:

        whitelist = get_local_file_whitelist(cluster, option)
        wait_for_condition(lambda: check_local_files_gced(cluster, whitelist))

Basically, this ignores the top-level folder in the conda subfolder. It worked fine, but because the temporary session folders are not cleaned up during the tests, disk usage will increase due to the various DLLs and exe files that are left behind in the conda folder. For the workdir folders that was not an issue, as those were empty. I can enable the tests and hope the CI servers don't run out of disk space next time the tests run 😬
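
The `wait_for_condition(lambda: ...)` construct above polls a predicate until it holds or a timeout expires. The sketch below is a self-contained stand-in for that pattern and assumes nothing about Ray's actual implementation. `poll_until`, its parameters, and the simulated file names are hypothetical.

```python
import time
from typing import Callable, List


def poll_until(
    predicate: Callable[[], bool], timeout: float = 10.0, interval: float = 0.1
) -> None:
    """Call ``predicate`` repeatedly until it returns True or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError("condition was not met before the timeout")
        time.sleep(interval)


# Simulated leftovers: files disappear between polls, the top-level folder stays.
leftover: List[str] = ["a.dll", "b.exe", "top_level"]
whitelist = {"top_level"}


def files_gced() -> bool:
    if len(leftover) > 1:
        leftover.pop(0)  # pretend one more file was garbage collected
    return set(leftover) <= whitelist


poll_until(files_gced, timeout=2.0, interval=0.01)
print(leftover)
```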


jbedorf · 2022-09-26T07:25:38Z

@architkulkarni Has something changed with the buildkite settings? After updating the code and merging the latest master I'm unable to view the build results/details, which makes it impossible to see why some of the steps failed.

Looking at the links, it appears the address was previously ray-project/ray-builders-pr/.. and is now ray-project/oss-ci-build-pr

architkulkarni · 2022-09-26T15:29:57Z

@jbedorf Not 100% sure if it's related, but you're right that there was a recent change to our CI pipeline, but it shouldn't require any special action on our part. I would suggest merging the latest master but it looks like you've already done that.

Which parts can you no longer see? From what I can tell at https://buildkite.com/ray-project/oss-ci-build-pr/builds/396
some runtime env tests are failing on linux/mac, which should be reproducible locally.

jbedorf · 2022-09-26T18:37:34Z

I see, I guess the project settings have changed regarding anonymous access. When I click on the current links it tells me I need a buildkite account, whereas previous builds could be accessed by anyone. You can see it by opening the buildkite links in an incognito window.

Anyway, I'll update my Linux environment and take a look at the failing tests.

pcmoritz · 2022-09-26T20:20:38Z

@jbedorf Thanks for your feedback about the buildkite visibility, this is not intended. We are looking into fixing it!

cc @simon-mo @thomasdesr

pcmoritz · 2022-09-26T21:21:14Z

I believe this is fixed now -- I tried it out via incognito mode :)

Thanks @simon-mo and @krfricke for fixing!

jbedorf · 2022-09-27T05:24:25Z

> I believe this is fixed now -- I tried it out via incognito mode :)
>
> Thanks @simon-mo and @krfricke for fixing!

Yep, it works now. Thanks!


jbedorf · 2022-09-28T06:46:30Z

@architkulkarni
Please have another look. The requested changes are made and some more tests are enabled. All related tests pass; the remaining failing tests appear to be known issues and/or are failing for other PRs as well.

For example:

Revert "Improve Ray client error message when exception can't be unpickled" #28825
python/ray/tests/test_client_proxy.py::test_delay_in_rewriting_environment (Which passes locally)

architkulkarni · 2022-09-28T16:50:05Z

https://flaky-tests.ray.io/
Windows test_traceback broken on master
Wheels and Jars timeout likely unrelated
Linkcheck failure unrelated (no doc changes)
test_usage_stats broken on master
test_client_proxy broken recently on master

architkulkarni

Looks good to me! Before I merge this, it would be good to have a review from Windows expert @mattip who should be back from OOO soon.

mattip

This is nice. There is only one small nit with changing to shell=True, but I think that is OK. I like that there are more tests enabled and the use of pytest for the tmpdir fixture.

mattip · 2022-09-28T17:26:32Z

python/ray/_private/runtime_env/context.py

@@ -82,7 +82,8 @@ def exec_worker(self, passthrough_args: List[str], language: Language):
            )
        logger.debug(f"Exec'ing worker with command: {command_str}")
        if sys.platform == "win32":
-            subprocess.run([executable, *passthrough_args])
+            cmd = [*self.command_prefix, executable, *passthrough_args]
+            subprocess.Popen(cmd, shell=True).wait()


There are some slight differences when running with shell=True. Is this intentional?

I found that the tests ran more stably with shell=True. I honestly don't understand exactly why, as the tests that fail with shell=False often pass when I run them individually. Maybe it has to do with the way pytest runs or keeps state between runs, but I couldn't pin it down exactly.
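
One concrete difference relevant to the diff above: on Windows, `subprocess` converts an argument list into a single command-line string, and with `shell=True` that string is run through the shell named by `COMSPEC` (normally cmd.exe). Prefix tokens such as `&&` or the `cd` builtin are then interpreted by the shell rather than passed to the executable as literal arguments. This does not explain the stability difference reported above; it only shows the string that results. The prefix tokens below are made up for illustration.

```python
import subprocess

# Hypothetical prefix + worker command, as a list of tokens.
cmd = ["cd", "/d", r"C:\tmp\work", "&&", "python", "worker.py"]

# list2cmdline applies the same list-to-string conversion that
# subprocess uses for argument lists on Windows.
command_line = subprocess.list2cmdline(cmd)
print(command_line)
```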

mattip · 2022-09-28T17:28:55Z

python/ray/tests/test_runtime_env_complicated.py

 @pytest.mark.skipif(
    os.environ.get("CI") and sys.platform != "linux",
    reason="This test is only run on linux CI machines.",
 )
-def test_pip_with_env_vars(start_cluster):
+def test_pip_with_env_vars(start_cluster, tmp_path):


+1, this is a welcome change
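
For context, pytest's built-in `tmp_path` fixture hands each test its own temporary directory as a `pathlib.Path`, so tests do not need to create or clean up directories by hand. A minimal sketch of the pattern follows; the helper, the test, and the package name are hypothetical and unrelated to the actual Ray tests.

```python
from pathlib import Path
from typing import List


def write_requirements(directory: Path, packages: List[str]) -> Path:
    """Write a requirements file into ``directory`` and return its path."""
    req = directory / "requirements.txt"
    req.write_text("\n".join(packages) + "\n")
    return req


def test_write_requirements(tmp_path: Path) -> None:
    # pytest injects tmp_path: a unique temporary directory for this test.
    req = write_requirements(tmp_path, ["example-package==1.0"])
    assert req.parent == tmp_path
    assert req.read_text().splitlines() == ["example-package==1.0"]
```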


jbedorf added 5 commits September 15, 2022 20:13

Fix working_dir, conda and pip options for Windows

b2dd137

Signed-off-by: Jeroen Bédorf <[email protected]>

More test fixes

02c1f00

Signed-off-by: Jeroen Bédorf <[email protected]>

More test fixes

7375818

Signed-off-by: Jeroen Bédorf <[email protected]>

More test fixes and style format fixes

cf31541

Signed-off-by: Jeroen Bédorf <[email protected]>

Style fixes

b3aa8a2

Signed-off-by: Jeroen Bédorf <[email protected]>

jbedorf changed the title ~~[RuntimeEnv, Windows] Fix runtime dir windows~~ [RuntimeEnv, Windows] Fix working_dir, pip & conda for windows Sep 18, 2022

jbedorf marked this pull request as ready for review September 19, 2022 05:37

architkulkarni assigned architkulkarni and mattip Sep 22, 2022

architkulkarni approved these changes Sep 22, 2022


architkulkarni added the @external-author-action-required Alternate tag for PRs where the author doesn't have labeling permission. label Sep 22, 2022

jbedorf added 2 commits September 25, 2022 19:12

Restructure, enable more tests

ed47f62

Signed-off-by: Jeroen Bédorf <[email protected]>

Merge remote-tracking branch 'upstream/master' into fix_runtime_dir_w…

9b6be37

…indows Signed-off-by: Jeroen Bédorf <[email protected]>

jbedorf added 4 commits September 27, 2022 07:37

Merge remote-tracking branch 'origin/master' into fix_runtime_dir_win…

0a1b4d4

…dows Signed-off-by: Jeroen Bédorf <[email protected]>

Initial fixes to tests

b3684c5

Signed-off-by: Jeroen Bédorf <[email protected]>

Fix lint errors

ff1ae19

Signed-off-by: Jeroen Bédorf <[email protected]>

Fix style

6baab38

Signed-off-by: Jeroen Bédorf <[email protected]>

architkulkarni added tests-ok The tagger certifies test failures are unrelated and assumes personal liability. and removed @external-author-action-required Alternate tag for PRs where the author doesn't have labeling permission. labels Sep 28, 2022

architkulkarni approved these changes Sep 28, 2022


mattip approved these changes Sep 28, 2022


architkulkarni merged commit de79e6d into ray-project:master Sep 28, 2022

architkulkarni mentioned this pull request Sep 29, 2022

[CI] windows://python/ray/tests:test_runtime_env_working_dir_3 is failing/flaky on master. #28816

Closed
