
[CI] windows://python/ray/tests:test_runtime_env_working_dir_3 is failing/flaky on master. #28816

Closed
architkulkarni opened this issue Sep 27, 2022 · 8 comments · Fixed by #28820 or #28898
Labels: flaky-tracker (Issue created via Flaky Test Tracker, https://flaky-tests.ray.io/)

Issue description from architkulkarni (Contributor):
```
Generated from flaky test tracker. Please do not edit the signature in this section.
DataCaseName-windows://python/ray/tests:test_runtime_env_working_dir_3-END
```

architkulkarni added the flaky-tracker label Sep 27, 2022
architkulkarni self-assigned this Sep 27, 2022
architkulkarni commented:

Likely due to #28623; we probably just need to increase a timeout somewhere.

architkulkarni commented Sep 27, 2022:

5/5 failures are in test_pin_runtime_env_uri[ray_client-tmp_working_dir-20]

Example: https://buildkite.com/ray-project/oss-ci-build-branch/builds/204#01837b47-7381-4a73-a193-98de0c9d35c1/8514-8542

architkulkarni commented:

```
___________ test_pin_runtime_env_uri[ray_client-tmp_working_dir-20] ___________

start_cluster = (<ray.cluster_utils.Cluster object at 0x00000188AC732DC0>, 'ray://localhost:10004')
source = 'C:\\Users\\ContainerAdministrator\\AppData\\Local\\Temp\\tmptp14d912'
expiration_s = 20
monkeypatch = <_pytest.monkeypatch.MonkeyPatch object at 0x00000188AC7899A0>

    @pytest.mark.parametrize("expiration_s", [0, 20])
    @pytest.mark.parametrize("source", [lazy_fixture("tmp_working_dir")])
    def test_pin_runtime_env_uri(start_cluster, source, expiration_s, monkeypatch):
        """Test that temporary GCS URI references are deleted after expiration_s."""
        monkeypatch.setenv(RAY_RUNTIME_ENV_URI_PIN_EXPIRATION_S_ENV_VAR, str(expiration_s))

        cluster, address = start_cluster

        start = time.time()
        ray.init(address, namespace="test", runtime_env={"working_dir": source})

        @ray.remote
        def f():
            pass

        # Wait for runtime env to be set up. This can be accomplished by getting
        # the result of a task that depends on it.
        ray.get(f.remote())
        ray.shutdown()

        # Need to re-connect to use internal_kv.
        ray.init(address=address)

        print("Starting Internal KV checks at time ", time.time() - start)
        if expiration_s > 0:
>           assert not check_internal_kv_gced()
E           assert not True
E            +  where True = check_internal_kv_gced()
```
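The `assert not True` shows that `check_internal_kv_gced()` already returned `True`: the URI had been garbage collected before the "still pinned" assertion ran. A plausible explanation on Windows CI is that cluster startup plus runtime env setup exceeded the 20-second pin expiration, so the assertion raced the GC. Below is a minimal sketch of a guard against that race; it is illustrative only, not necessarily the fix that landed in #28820 (`check_internal_kv_gced` is the helper from the test above, passed in as a callable to keep the sketch self-contained):

```python
import time


def assert_uri_still_pinned(check_internal_kv_gced, start: float, expiration_s: float):
    """Assert the URI has not been GCed, but only if the expiration
    window hasn't already elapsed; otherwise the assertion would race
    the garbage collector and fail spuriously on slow machines."""
    elapsed = time.time() - start
    if elapsed < expiration_s:
        assert not check_internal_kv_gced(), (
            f"URI was GCed after only {elapsed:.1f}s "
            f"(expiration_s={expiration_s})"
        )
    else:
        # Too much time has passed to distinguish a correct expiration
        # from a premature GC, so the check is skipped.
        print(f"Skipping pin check: {elapsed:.1f}s >= {expiration_s}s elapsed")
```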

architkulkarni commented Sep 29, 2022:

Newly flaky after #28589; this should be an easy fix.
https://flaky-tests.ray.io/
Example: https://buildkite.com/ray-project/oss-ci-build-branch/builds/261#018388a9-df2d-4b75-a7a7-346815b92f71

```
def get_local_file_whitelist(cluster, option):
    # On Windows the runtime directory itself is not deleted due to it being in use
    # therefore whitelist it for the tests.
    if sys.platform == "win32" and option != "py_modules":
        runtime_dir = (
            Path(cluster.list_all_nodes()[0].get_runtime_env_dir_path())
            / "working_dir_files"
        )
>           return {list(Path(runtime_dir).iterdir())[0].name}
E           IndexError: list index out of range
```
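The `IndexError` means `working_dir_files` was empty (or not yet created) when the whitelist was computed, so indexing `[0]` blew up. A defensive variant of the helper could fall back to an empty whitelist; this is a sketch under that assumption, not the committed fix:

```python
import sys
from pathlib import Path


def get_local_file_whitelist(cluster, option):
    """Return the set of directory names to ignore when checking that
    local runtime-env files were garbage collected.

    Defensive sketch: returns an empty whitelist when working_dir_files
    is missing or empty, instead of raising IndexError on `[0]`."""
    if sys.platform == "win32" and option != "py_modules":
        runtime_dir = (
            Path(cluster.list_all_nodes()[0].get_runtime_env_dir_path())
            / "working_dir_files"
        )
        entries = list(runtime_dir.iterdir()) if runtime_dir.exists() else []
        if entries:
            return {entries[0].name}
    return set()
```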


architkulkarni commented:

This test is still flaky, for a new reason:
Example: https://buildkite.com/ray-project/oss-ci-build-branch/builds/284#01838ca6-974f-41be-94d6-d21fe159416f/7284-7827
`wait_for_condition(lambda: check_local_files_gced(cluster, whitelist=whitelist))` timed out.
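Ray's test helper `wait_for_condition` polls a predicate until it returns `True` or a timeout expires (10 seconds by default). Here is a self-contained sketch of that polling pattern, modeled on `ray._private.test_utils.wait_for_condition`, together with the kind of longer timeout a fix might pass on slow Windows runners (the 30-second value is an assumption, not the actual fix):

```python
import time


def wait_for_condition(condition, timeout=10, retry_interval_ms=100):
    """Poll `condition` until it returns True; raise if `timeout`
    seconds elapse first."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        if condition():
            return
        time.sleep(retry_interval_ms / 1000.0)
    raise RuntimeError(f"The condition wasn't met before the timeout ({timeout}s) expired.")


# One possible mitigation is simply more headroom on slow platforms, e.g.:
#   wait_for_condition(
#       lambda: check_local_files_gced(cluster, whitelist=whitelist),
#       timeout=30,  # illustrative value
#   )
```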

architkulkarni added a commit that referenced this issue Oct 11, 2022
… timing out (#29007)

Several tests were recently enabled on Windows; unfortunately some of them are flaky. This PR disables the flaky tests to fix CI.

Related issue number
Addresses #28816, #29000, and #29001 but doesn't fix the root cause. We should find the root cause and re-enable these tests in the future.
rickyyx commented Oct 14, 2022:

This is timing out on the release branch as well (on a Mac machine, though). Should we cherry-pick this? https://buildkite.com/ray-project/oss-ci-build-branch/builds/528#0183d72d-8a29-4fcb-a094-b4e7cddf24c2

@architkulkarni

architkulkarni commented:
@rickyyx Can it be fixed by a restart?
[Screenshot: "Screen Shot 2022-10-14 at 1 13 46 PM"]
I don't think it's a release blocker. If skipping the test would delay the release less overall, we can cherry-pick a PR to skip it on Mac.

We should eventually figure out the root cause, though; I filed #29373 to track the Mac flakiness.

rickyyx commented Oct 14, 2022:

I see; I think it doesn't show up every single time. I can put up with the flakiness for now.

rickyyx closed this as completed Oct 14, 2022
WeichenXu123 pushed a commit to WeichenXu123/ray that referenced this issue Dec 19, 2022
… timing out (ray-project#29007)

Signed-off-by: Weichen Xu <[email protected]>