kOps: Test timeouts are not diagnosable when run in parallel #20738
Comments
Hoping to diagnose why tests are timing out. Issue kubernetes#20738
this seems like something to fix in kubetest2? 🙃
thanks to e2e.test's One interesting clue is that all of the jobs that time out are missing their Looking at e2e.test flags, besides
I may investigate the impact of these flags. I know that --num-nodes defaults to the number of ready nodes and is used to skip certain tests, so we might no longer be skipping certain tests. Our serial job is timing out after 5 hours; we can consider extending that too.
The timeouts were fixed by increasing the prow job's memory in #20931. We also have increased visibility with per-ginkgo-node logs via ginkgo's
/close
@rifelpet: Closing this issue.
We have some test timeouts, for example: https://testgrid.k8s.io/sig-cluster-lifecycle-kops#kops-grid-calico-amzn2-k20-docker
It's very difficult to see what the cause of the timeout is. It appears that ginkgo only logs tests after they complete: it doesn't log on a signal, and it doesn't write the junit output on a signal. The ginkgo output isn't sufficiently self-descriptive to facilitate scripting.
Possibly passing --progress, --trace or -stream might help here.
We can also try a serial test to see if we can find the problem (I'll probably try this!)
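To illustrate the kind of "log on a signal" behaviour that is missing, here is a rough sketch in plain Go of a tracker that records which tests are in flight and prints them when the process receives a signal. This is not ginkgo's or kubetest2's code; the `inFlight` type and its methods are hypothetical names, and it assumes the runner has start/finish hooks to call them from.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"os/signal"
	"sort"
	"sync"
	"syscall"
	"time"
)

// inFlight (hypothetical) tracks tests that have started but not finished.
type inFlight struct {
	mu      sync.Mutex
	started map[string]time.Time
}

func newInFlight() *inFlight {
	return &inFlight{started: make(map[string]time.Time)}
}

// Start records that a test has begun running.
func (f *inFlight) Start(name string) {
	f.mu.Lock()
	defer f.mu.Unlock()
	f.started[name] = time.Now()
}

// Finish removes a test once it has completed.
func (f *inFlight) Finish(name string) {
	f.mu.Lock()
	defer f.mu.Unlock()
	delete(f.started, name)
}

// Names returns the sorted names of tests that are still running.
func (f *inFlight) Names() []string {
	f.mu.Lock()
	defer f.mu.Unlock()
	names := make([]string, 0, len(f.started))
	for n := range f.started {
		names = append(names, n)
	}
	sort.Strings(names)
	return names
}

// Dump writes one line per still-running test, with its elapsed time.
func (f *inFlight) Dump(w io.Writer) {
	f.mu.Lock()
	snapshot := make(map[string]time.Time, len(f.started))
	for n, t := range f.started {
		snapshot[n] = t
	}
	f.mu.Unlock()

	names := make([]string, 0, len(snapshot))
	for n := range snapshot {
		names = append(names, n)
	}
	sort.Strings(names)
	for _, n := range names {
		fmt.Fprintf(w, "still running: %s (for %s)\n", n, time.Since(snapshot[n]).Round(time.Millisecond))
	}
}

// DumpOnSignal calls Dump whenever one of sigs arrives. It only logs;
// the caller decides whether to exit. The returned func stops the handler.
func (f *inFlight) DumpOnSignal(w io.Writer, sigs ...os.Signal) (stop func()) {
	ch := make(chan os.Signal, 1)
	done := make(chan struct{})
	signal.Notify(ch, sigs...)
	go func() {
		for {
			select {
			case s := <-ch:
				fmt.Fprintf(w, "received %v\n", s)
				f.Dump(w)
			case <-done:
				return
			}
		}
	}()
	return func() {
		signal.Stop(ch)
		close(done)
	}
}

func main() {
	tracker := newInFlight()
	stop := tracker.DumpOnSignal(os.Stderr, syscall.SIGTERM, os.Interrupt)
	defer stop()

	// Simulated run; a real runner would call Start/Finish from its hooks.
	for _, name := range []string{"test-a", "test-b"} {
		tracker.Start(name)
		time.Sleep(10 * time.Millisecond)
		tracker.Finish(name)
	}
	fmt.Println("in flight at exit:", len(tracker.Names()))
}
```

With something like this in place, a job killed on timeout would at least leave the names of the hung tests in its log, which is the information we can't currently recover.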
cc @rifelpet