
Scale test is failing #831

Closed
rramkumar1 opened this issue Aug 21, 2019 · 5 comments

@rramkumar1
Contributor

Ref: https://k8s-testgrid.appspot.com/sig-network-ingress-gce-e2e#ingress-gce-e2e-scale

@MrHohn can you take a look?

@MrHohn
Member

MrHohn commented Aug 21, 2019

Thanks for catching this. Saw a bunch of quota exceeded errors in the logs (from the latest run): https://storage.googleapis.com/kubernetes-jenkins/logs/ci-ingress-gce-e2e-scale/1164061805203427328/artifacts/e2e-a8e3dd9eae-886f8-master/glbc.log

E0821 09:04:53.592740       1 certificates.go:115] Failed to create new sslCertificate "k8s-ssl-b3e0d16f7506ed57-874e944d06b574ed--85b8acb592fffc63" for "ingress-scale-8415-ing-scale-99--85b8acb592fffc63" - googleapi: Error 403: QUOTA_EXCEEDED - Quota 'SSL_CERTIFICATES' exceeded.  Limit: 100.0 globally.
E0821 09:04:53.948214       1 taskqueue.go:62] Requeuing "ingress-scale-8415/ing-scale-99" due to error: error running load balancer syncing routine: loadbalancer ingress-scale-8415-ing-scale-99--85b8acb592fffc63 does not exist: Cert creation failures - k8s-ssl-b3e0d16f7506ed57-874e944d06b574ed--85b8acb592fffc63 Error:googleapi: Error 403: QUOTA_EXCEEDED - Quota 'SSL_CERTIFICATES' exceeded.  Limit: 100.0 globally. (ingresses)

@MrHohn
Member

MrHohn commented Aug 21, 2019

Well, it seems we leaked exactly one SSL cert, so the last LB creation (for ing-scale-99) failed:

k8s-ssl-9c2d6ff584aac9b3-be97841f9ee50ac2--e2247c88b143658c |   | scale.ingress.com | Aug 18, 2020, 5:32:48 AM

Deleting the leaked cert, but will look more into why it was leaked.
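A quick sketch of the quota arithmetic, in Go: with the project-wide SSL_CERTIFICATES limit of 100 from the error above and one leaked cert holding a slot, only 99 new certs fit. This assumes, hypothetically, that the scale test creates 100 ingresses (ing-scale-0 through ing-scale-99), that each needs exactly one cert, that no other certs exist in the project, and that creation happens in index order; `firstFailingIngress` is an illustrative helper, not GLBC code.

```go
package main

import "fmt"

// certQuota mirrors the "Limit: 100.0 globally" value in the QUOTA_EXCEEDED error.
const certQuota = 100

// firstFailingIngress returns the index of the first ingress whose cert creation
// would exceed the quota, or -1 if every ingress fits.
// Assumption (hypothetical): one cert per ingress, created in index order.
func firstFailingIngress(numIngresses, leakedCerts int) int {
	free := certQuota - leakedCerts
	if free < 0 {
		free = 0
	}
	if numIngresses <= free {
		return -1
	}
	return free
}

func main() {
	// Assumption: the scale test creates ing-scale-0 .. ing-scale-99.
	fmt.Println(firstFailingIngress(100, 0)) // -1: all 100 certs fit exactly
	fmt.Println(firstFailingIngress(100, 1)) // 99: ing-scale-99 hits the quota
}
```

In a real run the ingresses are synced concurrently, so which one loses the race for the last slot is not guaranteed; the sketch only shows why a single leaked cert is enough to break a test sized exactly at the quota.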

@MrHohn
Member

MrHohn commented Aug 21, 2019

Also requested a quota bump on the project for the relevant resources to allow some headroom.

@MrHohn
Member

MrHohn commented Aug 21, 2019

Found the deletion error ("in use by") on that leaked SSL cert:

"The ssl_certificate resource 'projects/k8s-ingress-e2e-scale-backup/global/sslCertificates/k8s-ssl-9c2d6ff584aac9b3-be97841f9ee50ac2--e2247c88b143658c' is already being used by 'projects/k8s-ingress-e2e-scale-backup/global/targetHttpsProxies/k8s-tps-ingress-scale-3468-ing-scale-98--e2247c88b143658c'"       

Probably a bug in the GC procedure? The target proxy should be cleaned up before the SSL cert.

@MrHohn
Member

MrHohn commented Aug 22, 2019

Okay, the test is now passing: https://prow.k8s.io/view/gcs/kubernetes-jenkins/logs/ci-ingress-gce-e2e-scale/1164243519703879680

I tend to believe the resources were leaked from an aborted run (which we can't avoid), and the deletion error above came from the janitor a day after that. It failed due to an ordering issue in the janitor (deleting the cert before the target proxy), which kubernetes/test-infra#14021 is fixing.

MrHohn closed this as completed Aug 22, 2019