Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] Fix flaky sample YAML tests #1590

Merged

Conversation

kevin85421
Copy link
Member

@kevin85421 kevin85421 commented Oct 31, 2023

Why are these changes needed?

test-raycluster-sample-yamls-nightly-operator becomes very flaky recently. #1556 introduces a new YAML file for a RayCluster with GCS FT. As a result, the sample YAML tests have become flaky. There are two YAMLs, ray-cluster.external-redis.yaml and ray-cluster.external-redis-uri.yaml, for GCS FT-enabled RayCluster custom resources. If the Redis Pod from the first YAML isn't deleted in time, the RayCluster from the second YAML will connect to the terminating Redis. Consequently, the RayCluster crashes when the Redis Pod is eventually deleted.

Related issue number

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • This PR is not tested :(

@kevin85421 kevin85421 marked this pull request as ready for review November 1, 2023 00:37
@kevin85421 kevin85421 added the 1.0 label Nov 1, 2023
@kevin85421
Copy link
Member Author

cc @rueian

@rueian
Copy link
Contributor

rueian commented Nov 1, 2023

Hi @kevin85421, I am sorry about that.

I thought the whole kind cluster will be recreated before each run of yaml. Wasn’t that the case?

@kevin85421
Copy link
Member Author

Hi @kevin85421, I am sorry about that.

I thought the whole kind cluster will be recreated before each run of yaml. Wasn’t that the case?

No need to apologize. It's quite common, and I also overlooked that part during the code review. I cc'd you primarily because you're planning to improve CI, and I wanted to ensure you have enough context for the change. In compatibility-test.py, the Kind cluster is recreated for each test class, but the sample YAML tests are not.

@architkulkarni
Copy link
Contributor

Out of curiosity can this situation happen in real life outside of CI? (delete one raycluster and start another one very quickly, and the second one fails because it connects to the first one's redis)

@kevin85421 kevin85421 merged commit b0096b0 into ray-project:master Nov 1, 2023
23 checks passed
@kevin85421
Copy link
Member Author

Out of curiosity can this situation happen in real life outside of CI? (delete one raycluster and start another one very quickly, and the second one fails because it connects to the first one's redis)

Yes, you can try to run both ray-cluster.external-redis.yaml and ray-cluster.external-redis-uri.yaml consecutively.

Screen Shot 2023-11-01 at 10 19 20 AM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants