
Tackle Does Not Wait For a Healthy Cluster When Getting Ingresses #13492

Closed · jeremyje opened this issue Jul 17, 2019 · 4 comments · Fixed by #13597
Labels: area/prow, kind/bug, sig/testing

Comments

jeremyje (Contributor) commented Jul 17, 2019

What happened:
While using tackle to create a Prow cluster in my GCP project, I hit a number of pitfalls, but this one is a blocker. A Prow cluster appears to be unable to start up without a GitHub bot token. After one is provided, the cluster pods eventually reach a healthy state, but tackle advances too quickly and tries to get "ingresses" within a few seconds of the token being submitted.

Store your GitHub token in a file e.g. echo $TOKEN > /path/to/github/token
Input /path/to/github/token to upload into cluster: /home/c9user/token
INFO[0562] User()                                        client=github
Prow will act as jeremyje on github
Applying github token into oauth-token secret...secret/oauth-token created
Ensuring hmac secret exists at hmac-token...INFO[0563] Creating new hmac-token secret with random data... 
exists
Looking for prow's hook ingress URL... FATA[0564] Could not get ingresses                       error="the server could not find the requested resource"

Tackle then aborts, and the workflow cannot be resumed, because rerunning it gives this:

tackle
Existing kubernetes contexts:
  0: gke_jeremyedwards-gaming-dev_us-central1-a_cloud-gaming-prow
* 1: gke_jeremyedwards-gaming-dev_us-east1-b_cloud-gaming-prow (current)
  2: gke_jeremyedwards-gaming-dev_us-east1-b_prow

Choose context or [create new]: 1
Applying admin role bindings (to create RBAC rules)...
clusterrolebinding.rbac.authorization.k8s.io/prow-admin configured
Deploying prow...
Apply starter.yaml from [github upstream]: 
Loading from https://raw.githubusercontent.com/kubernetes/test-infra/master/prow/cluster/starter.yaml
Prow is already deployed to default in gke_jeremyedwards-gaming-dev_us-east1-b_cloud-gaming-prow, overwrite? [no]: 
FATA[0010] Could not deploy prow                         error="prow already deployed"

What you expected to happen:
Tackle to set up a working Prow cluster.

How to reproduce it (as minimally and precisely as possible):
Create a Prow cluster on GKE using tackle.

Please provide links to example occurrences, if any:

Anything else we need to know?:

jeremyje added the kind/bug label Jul 17, 2019
spiffxp (Member) commented Jul 24, 2019

/assign @fejta @clarketm
/area prow
Is this still an issue? If nothing else, it seems like tackle is doing too broad a check to verify whether prow is "done".

k8s-ci-robot added the area/prow label Jul 24, 2019
clarketm (Contributor) commented Jul 24, 2019

Regarding the two underlying issues here:

  1. Ingress resource retrieval failure: that specific error corresponds to a 404 on the ingress list. The logic handling this is already in a loop with a timeout, but it aborts with a Fatal error. Changing this to a warning and adding retry logic might be beneficial here (otherwise, something like kubectl wait may be an option); see the sketch after this list.

  2. Exit on error (or invalid input): generally speaking, gracefully recovering from invalid input would be helpful. From a user-experience standpoint, we might consider reprompting instead of exiting in some cases (e.g. an invalid cluster name, an invalid zone choice, an incorrect token path).
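
For item 1, a minimal sketch of the warn-and-retry shape, assuming a pre-1.17 client-go where `List` takes `ListOptions` without a context; the `waitForIngress` name, interval, and timeout are illustrative only, not the actual fix:

```go
package tackle

import (
	"time"

	"github.com/sirupsen/logrus"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/kubernetes"
)

// waitForIngress polls for an ingress instead of aborting on the first
// failed list call: transient errors are logged as warnings and the poll
// continues, so only exhausting the timeout surfaces as a failure.
func waitForIngress(kc kubernetes.Interface, ns string) error {
	return wait.PollImmediate(10*time.Second, 5*time.Minute, func() (bool, error) {
		ings, err := kc.ExtensionsV1beta1().Ingresses(ns).List(metav1.ListOptions{})
		if err != nil {
			logrus.WithError(err).Warn("Could not get ingresses yet; retrying...")
			return false, nil // swallow the error so the poll keeps going
		}
		return len(ings.Items) > 0, nil
	})
}
```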

@fejta - do you have any caveats or input on these paths to resolution?

clarketm (Contributor) commented Jul 24, 2019

Upon further investigation, it appears that the API call to list the ingress resources fails even when the ingress is ready, while the older version of the API does not (e.g. `ing, err := kc.ExtensionsV1beta1().Ingresses(ns).List(metav1.ListOptions{})`).

Here is the introduction of the API change and context around it. Since the default master version on GKE is 1.12 and the GCE ingress controller does not support the new API, I will add support for both the old and new ingress resource endpoints.

kubernetes/enhancements#758
kubernetes/kubernetes#74057
kubernetes/ingress-gce#770
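
A sketch of what supporting both endpoints could look like, under the same pre-1.17 client-go assumption; `listIngressNames` is a hypothetical helper, not the change that landed in #13597. It tries networking.k8s.io/v1beta1 first and falls back to extensions/v1beta1 when the server 404s on the newer group:

```go
package tackle

import (
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// listIngressNames lists ingresses via the networking.k8s.io/v1beta1
// endpoint, falling back to extensions/v1beta1 when the server answers 404
// for the newer group (e.g. a GKE 1.12 master). Returning names only avoids
// needing a shared Ingress struct across the two API versions.
func listIngressNames(kc kubernetes.Interface, ns string) ([]string, error) {
	var names []string
	ings, err := kc.NetworkingV1beta1().Ingresses(ns).List(metav1.ListOptions{})
	if err == nil {
		for _, ing := range ings.Items {
			names = append(names, ing.Name)
		}
		return names, nil
	}
	if !apierrors.IsNotFound(err) {
		return nil, err
	}
	// "the server could not find the requested resource": the new group is
	// not served, so retry against the legacy extensions endpoint.
	oldIngs, err := kc.ExtensionsV1beta1().Ingresses(ns).List(metav1.ListOptions{})
	if err != nil {
		return nil, err
	}
	for _, ing := range oldIngs.Items {
		names = append(names, ing.Name)
	}
	return names, nil
}
```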

clarketm pushed 5 commits to clarketm/kubernetes_test-infra that referenced this issue Jul 25, 2019
spiffxp (Member) commented Jul 26, 2019

/sig testing

k8s-ci-robot added the sig/testing label Jul 26, 2019
clarketm pushed a commit to clarketm/kubernetes_test-infra that referenced this issue Jul 26, 2019
k8s-ci-robot added a commit that referenced this issue Aug 1, 2019: Fix tackle ingress API incompatibility, #13492