
Tackle Does Not Wait For a Healthy Cluster When Getting Ingresses #13492

Closed · jeremyje opened this issue Jul 17, 2019 · 4 comments · Fixed by #13597
Labels: area/prow, kind/bug, sig/testing

Comments

jeremyje (Contributor) commented Jul 17, 2019

What happened:
While using tackle to create a Prow cluster in my GCP project, I hit a number of pitfalls, but this one is a blocker. A Prow cluster appears to be unable to start up without a GitHub bot token. After one is provided, the cluster pods eventually reach a healthy state, but tackle advances too quickly and tries to get "ingresses" within a few seconds of the token being submitted.

Store your GitHub token in a file e.g. echo $TOKEN > /path/to/github/token
Input /path/to/github/token to upload into cluster: /home/c9user/token
INFO[0562] User()                                        client=github
Prow will act as jeremyje on github
Applying github token into oauth-token secret...secret/oauth-token created
Ensuring hmac secret exists at hmac-token...INFO[0563] Creating new hmac-token secret with random data... 
exists
Looking for prow's hook ingress URL... FATA[0564] Could not get ingresses                       error="the server could not find the requested resource"

Tackle then aborts, and the workflow cannot be resumed, because rerunning it gives this:

tackle
Existing kubernetes contexts:
  0: gke_jeremyedwards-gaming-dev_us-central1-a_cloud-gaming-prow
* 1: gke_jeremyedwards-gaming-dev_us-east1-b_cloud-gaming-prow (current)
  2: gke_jeremyedwards-gaming-dev_us-east1-b_prow

Choose context or [create new]: 1
Applying admin role bindings (to create RBAC rules)...
clusterrolebinding.rbac.authorization.k8s.io/prow-admin configured
Deploying prow...
Apply starter.yaml from [github upstream]: 
Loading from https://raw.githubusercontent.com/kubernetes/test-infra/master/prow/cluster/starter.yaml
Prow is already deployed to default in gke_jeremyedwards-gaming-dev_us-east1-b_cloud-gaming-prow, overwrite? [no]: 
FATA[0010] Could not deploy prow                         error="prow already deployed"

What you expected to happen:
Tackle to set up a working Prow cluster.

How to reproduce it (as minimally and precisely as possible):
Create a Prow cluster on GKE using tackle.

Please provide links to example occurrences, if any:

Anything else we need to know?:

jeremyje added the kind/bug label Jul 17, 2019
spiffxp (Member) commented Jul 24, 2019

/assign @fejta @clarketm
/area prow
Is this still an issue? If nothing else, it seems like tackle is doing too broad a check to verify whether prow is "done".

k8s-ci-robot added the area/prow label Jul 24, 2019
clarketm (Contributor) commented Jul 24, 2019

Regarding the two underlying issues here:

  1. Ingress resource retrieval failure: that specific error corresponds to a 404 on the ingress list. The logic handling this is already in a loop with a timeout, but it aborts with a Fatal error. Changing this to a warning and adding retry logic might be beneficial here (otherwise, something like kubectl wait may be an option); see the sketch after this list.

  2. Exit on error (or invalid input): generally speaking, gracefully recovering from invalid input would be helpful. From a user-experience standpoint, we might consider reprompting instead of exiting in some cases (e.g. an invalid cluster name, an invalid zone choice, an incorrect token path).
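
For item 1, a minimal sketch of the warn-and-retry shape, assuming a pre-1.17 client-go where `List` takes `ListOptions` without a context; the `waitForIngress` name, interval, and timeout are illustrative only, not the actual fix:

```go
package tackle

import (
	"time"

	"github.com/sirupsen/logrus"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/kubernetes"
)

// waitForIngress polls for an ingress instead of aborting on the first
// failed list call: transient errors are logged as warnings and the poll
// continues, so only exhausting the timeout surfaces as a failure.
func waitForIngress(kc kubernetes.Interface, ns string) error {
	return wait.PollImmediate(10*time.Second, 5*time.Minute, func() (bool, error) {
		ings, err := kc.ExtensionsV1beta1().Ingresses(ns).List(metav1.ListOptions{})
		if err != nil {
			logrus.WithError(err).Warn("Could not get ingresses yet; retrying...")
			return false, nil // swallow the error so the poll keeps going
		}
		return len(ings.Items) > 0, nil
	})
}
```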

@fejta - do you have any caveats or input on these paths to resolution?

clarketm (Contributor) commented Jul 24, 2019

Upon further investigation, it appears that the API call to list the ingress resources fails even when the ingress is ready, while the older version of the API does not (e.g. `ing, err := kc.ExtensionsV1beta1().Ingresses(ns).List(metav1.ListOptions{})`).

Here is the introduction of the API change and context around it. Since the default master version on GKE is 1.12 and the GCE ingress controller does not support the new API, I will add support for both the old and new ingress resource endpoints.

kubernetes/enhancements#758
kubernetes/kubernetes#74057
kubernetes/ingress-gce#770
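
A sketch of what supporting both endpoints could look like, under the same pre-1.17 client-go assumption; `listIngressNames` is a hypothetical helper, not the change that landed in #13597. It tries networking.k8s.io/v1beta1 first and falls back to extensions/v1beta1 when the server 404s on the newer group:

```go
package tackle

import (
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// listIngressNames lists ingresses via the networking.k8s.io/v1beta1
// endpoint, falling back to extensions/v1beta1 when the server answers 404
// for the newer group (e.g. a GKE 1.12 master). Returning names only avoids
// needing a shared Ingress struct across the two API versions.
func listIngressNames(kc kubernetes.Interface, ns string) ([]string, error) {
	var names []string
	ings, err := kc.NetworkingV1beta1().Ingresses(ns).List(metav1.ListOptions{})
	if err == nil {
		for _, ing := range ings.Items {
			names = append(names, ing.Name)
		}
		return names, nil
	}
	if !apierrors.IsNotFound(err) {
		return nil, err
	}
	// "the server could not find the requested resource": the new group is
	// not served, so retry against the legacy extensions endpoint.
	oldIngs, err := kc.ExtensionsV1beta1().Ingresses(ns).List(metav1.ListOptions{})
	if err != nil {
		return nil, err
	}
	for _, ing := range oldIngs.Items {
		names = append(names, ing.Name)
	}
	return names, nil
}
```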

clarketm pushed 5 commits to clarketm/kubernetes_test-infra that referenced this issue Jul 25, 2019
spiffxp (Member) commented Jul 26, 2019

/sig testing

k8s-ci-robot added the sig/testing label Jul 26, 2019
clarketm pushed a commit to clarketm/kubernetes_test-infra that referenced this issue Jul 26, 2019
k8s-ci-robot added a commit that referenced this issue Aug 1, 2019: Fix tackle ingress API incompatibility, #13492