[Bug] Fix flaky sample YAML tests #1590
cc @rueian
Hi @kevin85421, I am sorry about that. I thought the whole Kind cluster would be recreated before each YAML run. Wasn’t that the case?
No need to apologize. It's quite common, and I also overlooked that part during the code review. I cc'd you primarily because you're planning to improve CI, and I wanted to ensure you have enough context for the change. In compatibility-test.py, the Kind cluster is recreated for each test class, but it is not recreated for the sample YAML tests.
Out of curiosity, can this situation happen in real life outside of CI? (For example, deleting one RayCluster and starting another one very quickly, so that the second one fails because it connects to the first one's Redis.)
Yes, you can try to run both |
Why are these changes needed?
test-raycluster-sample-yamls-nightly-operator has become very flaky recently. #1556 introduced a new YAML file for a RayCluster with GCS FT, and the sample YAML tests have been flaky since then. There are two YAMLs, ray-cluster.external-redis.yaml and ray-cluster.external-redis-uri.yaml, for GCS FT-enabled RayCluster custom resources. If the Redis Pod from the first YAML isn't deleted in time, the RayCluster from the second YAML connects to the terminating Redis. That RayCluster then crashes when the Redis Pod is eventually deleted.
Related issue number
Checks
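For context on the race described under "Why are these changes needed?", here is a minimal sketch of one way a test harness could wait for the first YAML's Redis Pod to be fully deleted before applying the second YAML. It is not necessarily what this PR does. The helper names (`wait_until_gone`, `redis_pod_present`), the `default` namespace, and the `app=redis` label selector are all assumptions made for illustration.

```python
import subprocess
import time
from typing import Callable


def wait_until_gone(
    is_present: Callable[[], bool],
    timeout_s: float = 120.0,
    interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until is_present() returns False.

    Returns True once the resource is gone, or False if timeout_s elapses
    first. sleep and clock are injectable so the loop can be tested
    without real waiting.
    """
    deadline = clock() + timeout_s
    while True:
        if not is_present():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


def redis_pod_present(namespace: str = "default", selector: str = "app=redis") -> bool:
    """Hypothetical check: does any Pod matching the selector still exist?

    The namespace and label selector are assumptions; adjust them to match
    the labels used in the sample YAML under test.
    """
    result = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace, "-l", selector, "-o", "name"],
        capture_output=True,
        text=True,
        check=True,
    )
    # "-o name" prints one line per matching Pod; empty output means none remain.
    return bool(result.stdout.strip())
```

A harness would call `wait_until_gone(redis_pod_present)` after deleting the resources from one sample YAML and before applying the next, and fail the test if it returns `False`. `kubectl wait --for=delete` with a label selector and a timeout is a shell-level alternative to this kind of polling loop.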