Release failure during the cut of v1.11.1 #586

Closed
foxish opened this issue Jul 17, 2018 · 10 comments
Labels
area/release-eng Issues or PRs related to the Release Engineering subproject priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. sig/release Categorizes an issue or PR as relevant to SIG Release.

@foxish

foxish commented Jul 17, 2018

When trying to run the --nomock version of the official release, I ran into the following:

Step #1: ================================================================================
Step #1: DISK SPACE CHECK  (5/15)
Step #1: ================================================================================
Step #1: 
Step #1: Checking for at least 20 GB on /workspace: OK
Step #1: [2018-Jul-17 20:30:53 UTC] common::disk_space_check in 1s
Step #1: 
Step #1: ATTENTION: Skipping prepare_tree+official step executed during staging
Step #1: ATTENTION: Skipping build_tree+official step executed during staging
Step #1: ATTENTION: Skipping local_kube_cross step executed during staging
Step #1: ATTENTION: Skipping make_cross+official step executed during staging
Step #1: ATTENTION: Skipping prepare_tree+beta step executed during staging
Step #1: ATTENTION: Skipping build_tree+beta step executed during staging
Step #1: ATTENTION: Skipping make_cross+beta step executed during staging
Step #1: ATTENTION: Skipping generate_release_notes step executed during staging
Step #1: 
Step #1: ================================================================================
Step #1: PUSH GIT OBJECTS  (11/15)
Step #1: ================================================================================
Step #1: 
Step #1: Checkout master branch to push objects: OK
Step #1: Pushing tags
Step #1: * v1.11.1: OK
Step #1: * v1.11.2-beta.0: OK
Step #1: Pushing release-1.11 branch: OK
Step #1: Checkout master branch to push objects: OK
Step #1: Rebase master branch: OK
Step #1: Pushing master branch: OK
Step #1: [2018-Jul-17 20:31:13 UTC] push_git_objects in 20s
Step #1: 
Step #1: ================================================================================
Step #1: PUSH BINARY RELEASE ARTIFACTS official (12/15)
Step #1: ================================================================================
Step #1: 
Step #1: Bucket-to-bucket copy gs://kubernetes-release/stage/v1.11.1-beta.0.115+b1b29978270dc2/v1.11.1/gcs-stage artifacts to gs://kubernetes-release/release/v1.11.1: OK
Step #1: Copy staged kubernetes.tar.gz to /workspace/anago-v1.11.1/src/k8s.io/kubernetes/_output-v1.11.1/gcs-stage/v1.11.1: OK
Step #1: Copy staged docker images to /workspace/anago-v1.11.1/src/k8s.io/kubernetes/_output-v1.11.1/release-images: OK
Step #1: [2018-Jul-17 20:33:13 UTC] copy_staged_from_gcs in 2m0s
Step #1: Send docker containers from release-images to staging-k8s.gcr.io...
Step #1: Pushing staging-k8s.gcr.io/cloud-controller-manager:v1.11.1: .....FAILED
Step #1: [2018-Jul-17 20:33:31 UTC] release::docker::release in 18s
Step #1: [2018-Jul-17 20:33:31 UTC] push_all_artifacts in 2m18s
Step #1: FAILED in push_all_artifacts.
Step #1: 
Step #1: RELEASE INCOMPLETE! Exiting...
Step #1: 
Step #1: Copying /workspace/tmp/anago.log{,.[0-9]} to /workspace/anago-v1.11.1: OK
Step #1: Copy /workspace/anago-v1.11.1 files to gs://kubernetes-release/archive/anago-v1.11.1...
Step #1: Ensure PRIVATE ACL on gs://kubernetes-release/archive/anago-v1.11.1/anago.log\*: OK
Step #1: 
Step #1: anago: DONE main on cc2d94ab0d20 Tue Jul 17 20:33:35 UTC 2018 in 3m16s
Finished Step #1
ERROR
ERROR: build step 1 "gcr.io/kubernetes-release-test/k8s-cloud-builder" failed: exit status 1

Now, the tags have been published, but upon retrying, I end up with:

Step #1: ================================================================================
Step #1: DISK SPACE CHECK  (5/15)
Step #1: ================================================================================
Step #1: 
Step #1: Checking for at least 20 GB on /workspace: OK
Step #1: [2018-Jul-17 20:30:53 UTC] common::disk_space_check in 1s
Step #1: 
Step #1: ATTENTION: Skipping prepare_tree+official step executed during staging
Step #1: ATTENTION: Skipping build_tree+official step executed during staging
Step #1: ATTENTION: Skipping local_kube_cross step executed during staging
Step #1: ATTENTION: Skipping make_cross+official step executed during staging
Step #1: ATTENTION: Skipping prepare_tree+beta step executed during staging
Step #1: ATTENTION: Skipping build_tree+beta step executed during staging
Step #1: ATTENTION: Skipping make_cross+beta step executed during staging
Step #1: ATTENTION: Skipping generate_release_notes step executed during staging
Step #1: 
Step #1: ================================================================================
Step #1: PUSH GIT OBJECTS  (11/15)
Step #1: ================================================================================
Step #1: 
Step #1: Checkout master branch to push objects: OK
Step #1: Pushing tags
Step #1: * v1.11.1: OK
Step #1: * v1.11.2-beta.0: OK
Step #1: Pushing release-1.11 branch: OK
Step #1: Checkout master branch to push objects: OK
Step #1: Rebase master branch: OK
Step #1: Pushing master branch: OK
Step #1: [2018-Jul-17 20:31:13 UTC] push_git_objects in 20s
Step #1: 
Step #1: ================================================================================
Step #1: PUSH BINARY RELEASE ARTIFACTS official (12/15)
Step #1: ================================================================================
Step #1: 
Step #1: Bucket-to-bucket copy gs://kubernetes-release/stage/v1.11.1-beta.0.115+b1b29978270dc2/v1.11.1/gcs-stage artifacts to gs://kubernetes-release/release/v1.11.1: OK
Step #1: Copy staged kubernetes.tar.gz to /workspace/anago-v1.11.1/src/k8s.io/kubernetes/_output-v1.11.1/gcs-stage/v1.11.1: OK
Step #1: Copy staged docker images to /workspace/anago-v1.11.1/src/k8s.io/kubernetes/_output-v1.11.1/release-images: OK
Step #1: [2018-Jul-17 20:33:13 UTC] copy_staged_from_gcs in 2m0s
Step #1: Send docker containers from release-images to staging-k8s.gcr.io...
Step #1: Pushing staging-k8s.gcr.io/cloud-controller-manager:v1.11.1: .....FAILED
Step #1: [2018-Jul-17 20:33:31 UTC] release::docker::release in 18s
Step #1: [2018-Jul-17 20:33:31 UTC] push_all_artifacts in 2m18s
Step #1: FAILED in push_all_artifacts.
Step #1: 
Step #1: RELEASE INCOMPLETE! Exiting...
Step #1: 
Step #1: Copying /workspace/tmp/anago.log{,.[0-9]} to /workspace/anago-v1.11.1: OK
Step #1: Copy /workspace/anago-v1.11.1 files to gs://kubernetes-release/archive/anago-v1.11.1...
Step #1: Ensure PRIVATE ACL on gs://kubernetes-release/archive/anago-v1.11.1/anago.log\*: OK
Step #1: 
Step #1: anago: DONE main on cc2d94ab0d20 Tue Jul 17 20:33:35 UTC 2018 in 3m16s
Finished Step #1
ERROR
ERROR: build step 1 "gcr.io/kubernetes-release-test/k8s-cloud-builder" failed: exit status 1

cc/ @jpbetz @ixdy

@foxish foxish added the priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. label Jul 17, 2018
@foxish
Author

foxish commented Jul 17, 2018

cc/ @calebamiles @kubernetes/sig-release-members

@foxish
Author

foxish commented Jul 17, 2018

I'm not sure how we should proceed from here, since the release seems to be wedged in an intermediate state.

@david-mcmahon
Contributor

Looking at gs://kubernetes-release/archive/anago-v1.11.1/anago.log, it looks like something has changed in the permission model. Does anyone know what might have changed and why?

It's possibly related to an out-of-date gcloud binary, or maybe something else. Has anyone updated the image?

@listx

anago::release::docker::release(): /opt/google/google-cloud-sdk/bin/gcloud docker -- push staging-k8s.gcr.io/cloud-controller-manager:v1.11.1
WARNING: `gcloud docker` will not be supported for Docker client versions above 18.03. Please use `gcloud auth configure-docker` to configure `docker` to use `gcloud` as a credential helper, then use `docker` as you would for non-GCR registries, e.g. `docker pull gcr.io/project-id/my-image`. Add `--verbosity=error` to silence this warning, e.g. `gcloud docker --verbosity=error -- pull gcr.io/project-id/my-image`. See: https://cloud.google.com/container-registry/docs/support/deprecation-notices#gcloud-docker
The push refers to repository [staging-k8s.gcr.io/cloud-controller-manager]
3574c41b535e: Preparing
8e9a7d50b12c: Preparing
denied: Token exchange failed for project 'k8s-image-staging'. Caller does not have permission 'storage.buckets.get'. To configure permissions, follow instructions at: https://cloud.google.com/container-registry/docs/access-control
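
The deprecation warning in the log above is separate from the `denied` error, but it names the replacement flow: register `gcloud` as a docker credential helper, then push with plain `docker`. A minimal sketch of that flow follows, assuming the helper covers the registry host being pushed to. The `run` and `push_with_credential_helper` helpers and the `DRY_RUN` variable are hypothetical names for illustration and are not part of anago. By default the commands are only printed.

```shell
#!/usr/bin/env bash
# Sketch only: push through the docker credential helper instead of the
# deprecated `gcloud docker -- push` wrapper.
# DRY_RUN (hypothetical variable) defaults to 1, which prints the commands
# instead of executing them.

run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

push_with_credential_helper() {
  local image="$1"
  # Registers gcloud as a credential helper in the docker client config.
  run gcloud auth configure-docker --quiet
  # After that, a plain docker push is used for the image.
  run docker push "$image"
}
```

For example, `push_with_credential_helper staging-k8s.gcr.io/cloud-controller-manager:v1.11.1` prints the two commands it would run. This would not by itself fix the missing `storage.buckets.get` permission.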

@foxish
Author

foxish commented Jul 17, 2018

@david-mcmahon, I was on 207.0.0 and now upgraded to 208.0.2. Retrying now.

@david-mcmahon
Contributor

Your desktop gcloud version isn't the issue. It's the release image's gcloud.

The fix here is:

  1. Fix the gcloud permission issue. That might be a simple update or something else. I don't know.
  2. Delete the new tags created today from GitHub.
  3. Rerun the previous release gcbmgr command.

@listx

listx commented Jul 17, 2018

I see the line

Step #1: Pushing staging-k8s.gcr.io/cloud-controller-manager:v1.11.1: .....FAILED

This is probably because staging-k8s.gcr.io now points to gcr.io/k8s-image-staging, the new staging registry for images before they are pushed to gcr.io/google-containers. This has been the case for about a week now. The fix would be to add the cloudbuilder service account that gcbmgr uses to the ACLs in k8s-image-staging. From what foxish posted, kubernetes-release looks like the GCP project, but I don't have permission to look up the service account there. Can someone tell me what it is so that I can add it?
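
A rough sketch of what adding the service account could look like, under several assumptions that the thread does not confirm: that the account is the default Cloud Build one (`PROJECT_NUMBER@cloudbuild.gserviceaccount.com`), that the registry is backed by the bucket `gs://artifacts.k8s-image-staging.appspot.com`, and that `legacyBucketWriter` (which includes `storage.buckets.get`) is an acceptable role. The `run` and `grant_gcr_push_access` helpers and `DRY_RUN` are hypothetical names. The project number must be supplied by someone with access to the build project.

```shell
#!/usr/bin/env bash
# Sketch only: grant a Cloud Build service account access to the GCS bucket
# behind a Container Registry host.
# DRY_RUN (hypothetical variable) defaults to 1, which prints the command
# instead of executing it.

run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

grant_gcr_push_access() {
  local build_project_number="$1"  # numeric project number (assumption: default Cloud Build account)
  local registry_project="$2"      # project that hosts the registry, e.g. k8s-image-staging
  local member="serviceAccount:${build_project_number}@cloudbuild.gserviceaccount.com"
  local bucket="gs://artifacts.${registry_project}.appspot.com"
  # Role choice is an assumption; it is expanded to roles/storage.legacyBucketWriter.
  run gsutil iam ch "${member}:legacyBucketWriter" "$bucket"
}
```

Calling `grant_gcr_push_access 123456789 k8s-image-staging` (with a placeholder project number) prints the `gsutil iam ch` command for review before anyone runs it with `DRY_RUN=0`.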

@listx

listx commented Jul 17, 2018

The service account has been added; the failure I pointed out should not happen again. So going by @david-mcmahon's outline, we're on step 2. David, can you or someone else delete the new tags?

@foxish
Author

foxish commented Jul 17, 2018

I just deleted the tags, and @jpbetz helped me run `gsutil -m rm -r gs://kubernetes-release/release/v1.11.1`. All set to retry whenever we're ready.
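
For reference, the cleanup described above can be sketched as a small script. It assumes the tags in question are the two that the log shows being pushed (`v1.11.1` and `v1.11.2-beta.0`) and that the remote is passed in by the caller; neither is stated explicitly in this thread. `run`, `cleanup_partial_release`, and `DRY_RUN` are hypothetical names, and the commands are only printed by default.

```shell
#!/usr/bin/env bash
# Sketch only: undo the parts of a half-finished release that block a rerun.
# DRY_RUN (hypothetical variable) defaults to 1, which prints the commands
# instead of executing them.

run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

cleanup_partial_release() {
  local remote="$1" release_tag="$2" next_beta_tag="$3"
  local tag
  # Delete the tags that the failed run already pushed.
  for tag in "$release_tag" "$next_beta_tag"; do
    run git push "$remote" --delete "refs/tags/${tag}"
  done
  # Remove the artifacts copied into the release bucket (command from this thread).
  run gsutil -m rm -r "gs://kubernetes-release/release/${release_tag}"
}
```

For example, `cleanup_partial_release origin v1.11.1 v1.11.2-beta.0` prints two `git push ... --delete` commands followed by the `gsutil` removal.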

@david-mcmahon
Contributor

Thanks @jpbetz, I forgot about that one too. You can rerun now.

@foxish
Author

foxish commented Jul 18, 2018

This issue is fixed now, but it seems like the images never got pushed to k8s.gcr.io, only staging-k8s.gcr.io. Any ideas why?

@justaugustus justaugustus added sig/release Categorizes an issue or PR as relevant to SIG Release. area/release-eng Issues or PRs related to the Release Engineering subproject labels Dec 9, 2019
5 participants