[Release Test] Remove runtime env usage from release tests #33288

rkooo567 · 2023-03-14T07:45:48Z

Why are these changes needed?

Use SDK commands for all core tests.

This is needed because there was a large performance regression after migrating to the V2 Anyscale job runner.

Related issue number

Closes #32750

Checks

I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
I've run scripts/format.sh to lint the changes in this PR.
I've included any doc changes needed for https://docs.ray.io/en/master/.
I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
Testing Strategy
- Unit tests
- Release tests
- This PR is not tested :(

Signed-off-by: SangBin Cho <[email protected]>

rkooo567 · 2023-03-14T13:57:24Z

cc @scv119 I am verifying whether this works now (for some reason, I couldn't start the cluster; I will ping shomil about this). Can you give me a list of tests to verify?

@Yard1 my assumption is that this should be fine because we sync the working dir using the file manager anyway. Is this correct?

Also, do you know exactly when the runtime env is used?

ollie-iterators · 2023-03-14T14:03:16Z

Speaking of release tests, why do the release tests show far fewer results than the other tests in https://flakey-tests.ray.io/?


Yard1 · 2023-03-14T20:55:10Z

This will cause issues with tests that import from other files in the working directory, as those will not be propagated to other nodes by the file manager. Also, we need to ensure that the working directory is set correctly for imports.

What is the reason for this PR in the first place? The runtime envs in the tests take precedence over the job runtime env anyway (this is how we have set it up here; normally it's the other way around in Ray).

Yard1 · 2023-03-14T22:55:10Z

Also it's not a trivial change to Anyscale Jobs because we can't start a cluster and upload to it separately. We'd need to add complex logic to first upload the files to S3 and then download them onto the cluster, essentially reimplementing runtime envs ourselves.


rkooo567 · 2023-03-15T00:32:20Z

We will revert to the SDK manager. Before that, I'd like to try whether eager_installs=True helps. I think the "proper solution" from the prod side (iiuc) is to include all necessary files and code inside the cluster env, which will probably take some time to implement.
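For context on the option being tried here: in Ray 2.x a job-level runtime env can carry a `working_dir` (copied to every node and used as the workers' working directory, which is what makes sibling modules importable, the concern raised above) and a `config` entry with `eager_install`, which asks Ray to set the env up at `ray.init()` time rather than when the first worker is leased. The sketch below only builds the dictionary that would be passed as `ray.init(runtime_env=...)`; the helper name `build_runtime_env` and the `"."` working dir are hypothetical, and where the release-test harness actually sets `eager_installs` is not shown in this PR.

```python
from typing import Any, Dict


def build_runtime_env(working_dir: str, eager_install: bool = True) -> Dict[str, Any]:
    """Hypothetical helper that builds a Ray job-level runtime env dict.

    - "working_dir" is uploaded to each node and used as the workers'
      working directory, so local modules inside it can be imported.
    - "config.eager_install" requests setting up the env at ray.init()
      time instead of lazily when the first worker starts.
    """
    return {
        "working_dir": working_dir,
        "config": {"eager_install": eager_install},
    }


runtime_env = build_runtime_env(".")
# Would be used as: ray.init(runtime_env=runtime_env)
print(runtime_env)
```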

rkooo567 · 2023-03-15T06:06:22Z

Setting eager install = True doesn't seem to fix the issue (I'm not sure whether it was actually used): https://buildkite.com/ray-project/release-tests-pr/builds/30942#0186e2af-fdf8-44aa-aa1a-c57d7952f906.

I am trying the regular SDK solution now.


rkooo567 · 2023-03-15T15:22:34Z

Hmm, it looks like it is not working (maybe the SDK does not work with the v2 stack?).

rkooo567 · 2023-03-15T15:22:48Z

@shomilj is the sdk_command API still available from the v2 stack?

Yard1 · 2023-03-15T17:11:05Z

@rkooo567 it should be working fine; if you get an infra error, just retry.

rkooo567 · 2023-03-15T22:04:59Z

@Yard1 it looks like wait_for_nodes fails with status code 5555 (which is pretty weird). It doesn't seem to be an infra error. Let me investigate a bit.


rkooo567 · 2023-03-15T23:00:34Z

Trying v1 + SDK commands now. We will try syncing files using the cluster env later (after this PR).


rkooo567 · 2023-03-16T20:56:30Z

https://buildkite.com/ray-project/release-tests-pr/builds/31218#0186ead3-1865-4cfc-9443-bb7c7fa9361e

The perf seems to be recovered.

cc @krfricke can you approve this PR as a code owner?

Yard1 · 2023-03-16T21:03:17Z

@rkooo567 can you just add a comment to the code change explaining why this special case is added?


rkooo567 · 2023-03-17T06:03:09Z

The release test results lgtm. Since the V1 stack will be deprecated by the end of April, we should figure out the root cause of the regression in the new job runner. It looks like it is 4X slower for some reason (and we verified it doesn't use the runtime env). I will create an issue.

rkooo567 · 2023-03-17T06:05:05Z

This PR brings test_per_seconds back up to 190 (vs. 40 in the nightly) and actors_per_second back up to 800 (vs. 240 in the nightly).
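As a quick sanity check on the "4X slower" estimate above, the two quoted metrics can be turned into ratios. The sketch uses only the figures from this comment; the function name `slowdown_ratios` is illustrative.

```python
from typing import Dict, Tuple

# Figures quoted above: (throughput with this PR, throughput in the nightly).
RESULTS: Dict[str, Tuple[float, float]] = {
    "test_per_seconds": (190, 40),
    "actors_per_second": (800, 240),
}


def slowdown_ratios(results: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Return how many times lower the nightly throughput was for each metric."""
    return {metric: recovered / nightly for metric, (recovered, nightly) in results.items()}


for metric, ratio in slowdown_ratios(RESULTS).items():
    print(f"{metric}: {ratio:.2f}x")
```

This gives about 4.75x for test_per_seconds and about 3.33x for actors_per_second, consistent with the roughly 4X regression described in the previous comment.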


trial

d7e439c

Signed-off-by: SangBin Cho <[email protected]>

rkooo567 requested review from krfricke, simon-mo and Yard1 as code owners March 14, 2023 07:45

Do file upload

f03c605

Signed-off-by: SangBin Cho <[email protected]>

enalbe eager install to see if perf has changed

a1f018a

Signed-off-by: SangBin Cho <[email protected]>

rkooo567 added 2 commits March 14, 2023 23:10

Revert to sdk commands

575ad99

Signed-off-by: SangBin Cho <[email protected]>

done

ca97961

Signed-off-by: SangBin Cho <[email protected]>

rkooo567 assigned scv119 and Yard1 Mar 15, 2023

scv119 approved these changes Mar 15, 2023


try v1 stack sdk command

e0023c2

Signed-off-by: SangBin Cho <[email protected]>

rkooo567 added 4 commits March 15, 2023 20:44

Merge branch 'master' into remove-runtime0env

e3c4d94

fix

d5bf749

Signed-off-by: SangBin Cho <[email protected]>

fix

c497d0a

Signed-off-by: SangBin Cho <[email protected]>

try staging v1

7236400

Signed-off-by: SangBin Cho <[email protected]>

Yard1 approved these changes Mar 16, 2023


add comments to explain reasoning

c44899d

Signed-off-by: SangBin Cho <[email protected]>

rkooo567 changed the title ~~[WIP] Remove runtime env usage from release tests~~ [Release Test] Remove runtime env usage from release tests Mar 17, 2023

rkooo567 merged commit 76fd2cd into ray-project:master Mar 17, 2023

rkooo567 mentioned this pull request Mar 17, 2023

[Release][Core] Migrate core test to v2 stack and Anyscale Jobs #33414

Closed
