[Bug] "should be able to update all Pods to Running" is still flaky. #894
Dang, this is unfortunate. I wonder if there's a way to more reliably get the true status of the Pods, either by forcing a local cache refresh or by bypassing the cache. Or maybe checking that the Pod statuses are Running isn't needed.
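For example, controller-runtime's manager exposes GetAPIReader(), which reads directly from the API server instead of the informer cache. A rough sketch of that route; the helper name, namespace, and Pod name here are hypothetical, not the repo's actual test code:

```go
import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/types"
	"sigs.k8s.io/controller-runtime/pkg/manager"

	. "github.com/onsi/gomega"
)

// expectPodRunningDirect reads a Pod straight from the API server via the
// manager's API reader, bypassing the informer cache entirely. The
// namespace and Pod name are placeholders for this sketch.
func expectPodRunningDirect(ctx context.Context, mgr manager.Manager) {
	pod := &corev1.Pod{}
	err := mgr.GetAPIReader().Get(ctx, types.NamespacedName{Namespace: "default", Name: "example-pod"}, pod)
	Expect(err).ToNot(HaveOccurred())
	Expect(pod.Status.Phase).To(Equal(corev1.PodRunning))
}
```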
It seems that there is a way to do that. The client.New doc says the returned client reads and writes directly from/to the API server (it doesn't use object caches). Similarly, kubebuilder's writing-tests doc notes that when making assertions in tests, you generally want to assert against the live state of the API server; if you use the client from the manager, you'd be asserting against the contents of the cache instead.
So the idea is that we should use a k8sClient created from client.New instead of from k8sManager.GetClient. The current implementation takes k8sClient from k8sManager.GetClient:

```go
k8sClient = mgr.GetClient()
Expect(k8sClient).ToNot(BeNil())
```

Thus, I removed the k8sClient from k8sManager.GetClient above and created k8sClient with client.New instead:

```go
k8sClient, err = client.New(cfg, client.Options{Scheme: scheme.Scheme})
Expect(err).ToNot(HaveOccurred())
Expect(k8sClient).ToNot(BeNil())
```

On a 2-core CPU, 7 GB RAM VM (to simulate GitHub's standard Linux runner), I ran the test 100 times: 86 runs succeeded and 14 failed with the following errors (not sure whether we should ignore this small proportion or dig into why it happens, but I think that is a separate problem). If this makes sense (also see this example; they do a similar thing), I can make a PR.
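For illustration, a rough sketch of what a cache-bypassing check could look like with such a client; the allPodsRunning helper, namespace, and return convention are assumptions, not the actual test code:

```go
import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// allPodsRunning lists Pods directly from the API server (a client built
// with client.New does not use the informer cache) and reports whether
// every Pod in the namespace has reached the Running phase.
func allPodsRunning(ctx context.Context, c client.Client, namespace string) (bool, error) {
	podList := &corev1.PodList{}
	if err := c.List(ctx, podList, client.InNamespace(namespace)); err != nil {
		return false, err
	}
	for _, pod := range podList.Items {
		if pod.Status.Phase != corev1.PodRunning {
			return false, nil
		}
	}
	return len(podList.Items) > 0, nil
}
```

A spec could then poll it, e.g. `Eventually(func() (bool, error) { return allPodsRunning(ctx, k8sClient, "default") }, 30*time.Second, 500*time.Millisecond).Should(BeTrue())`.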
Thanks, @davidxia and @Yicheng-Lu-llll!
According to this comment, writing to the Kubernetes API server is asynchronous, so a read issued immediately after a write may not observe the update yet.
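In practice that means a one-shot assertion right after a write can race with the server; the usual remedy is to poll. A hedged sketch, where the import path for rayiov1alpha1, the Status.State field access, and the 30s/500ms window are assumptions:

```go
import (
	"context"
	"time"

	"k8s.io/apimachinery/pkg/types"
	"sigs.k8s.io/controller-runtime/pkg/client"

	rayiov1alpha1 "github.com/ray-project/kuberay/ray-operator/apis/ray/v1alpha1"
	. "github.com/onsi/gomega"
)

// expectClusterState polls the live RayCluster until its .status.state
// matches want, instead of asserting once right after the write.
func expectClusterState(ctx context.Context, c client.Client, name, namespace, want string) {
	Eventually(func() (string, error) {
		cluster := &rayiov1alpha1.RayCluster{}
		err := c.Get(ctx, types.NamespacedName{Name: name, Namespace: namespace}, cluster)
		return string(cluster.Status.State), err
	}, 30*time.Second, 500*time.Millisecond).Should(Equal(want))
}
```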
Search before asking
KubeRay Component
ci
What happened + What you expected to happen
After #893 was merged, this test is still flaky (see the link for more details). Only "should be able to update all Pods to Running" fails; all Pods do become Running before the start of the test "cluster's .status.state should be updated to 'ready' shortly after all Pods are Running", so we may need to increase the timeout of the test.
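If a longer timeout turns out to be the right fix, the change would be confined to the Eventually parameters of the flaky assertion. A sketch reusing the hypothetical allPodsRunning helper from the comment above; the 3-minute/500ms values are illustrative, not the repo's actual settings:

```go
// Widen the polling window so slower runners have time to schedule and
// start all Pods; values here are illustrative only.
Eventually(func() (bool, error) {
	return allPodsRunning(ctx, k8sClient, "default")
}, 3*time.Minute, 500*time.Millisecond).Should(BeTrue(), "not all Pods reached Running in time")
```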
@Yicheng-Lu-llll will update the test and run more runs to test its stability.
Reproduction script
Run `make test` several times.

Anything else
No response
Are you willing to submit a PR?