push-build.sh container image pushes should precede staging GCS artifacts and writing version markers #1693

justaugustus · 2020-11-06T16:08:18Z

What happened:

Tracking issue for https://kubernetes.slack.com/archives/CJH2GBF7Y/p1604669572198400.
Noticed in kubernetes/test-infra#19483.

Our attempts to move the ci-kubernetes-build job to Community Infra are failing because container images are not being pushed successfully.

Comment from @ameukam (kubernetes/test-infra#19483 (comment)):

do this via adding the service account e-mail address to the [email protected] group?

ci-kubernetes-build-canary still fails even after the service account was added to [email protected] (see kubernetes/k8s.io#1393): https://testgrid.k8s.io/sig-testing-canaries#build-master-canary

The prow-build service account inherits the permissions of the roles/cloudbuild.builds.editor role as a member of [email protected]:

https://github.com/kubernetes/k8s.io/blob/74bfdc5741bdde3b8f489bdd8327474101b3b5e4/infra/gcp/lib.sh#L209-L231

which is not enough for the job to succeed.

That's a credential issue that needs to be fixed in parallel.

This issue specifically tracks my expectations around push-build.sh behavior.

What you expected to happen:

Build jobs should verify access to the container image registry before proceeding

This is a fail-fast scenario.
If we know that a build is supposed to push GCR images, we should check that we're able to do that first, instead of building artifacts and only discovering the container push failure at the end of the run.
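A minimal bash sketch of the fail-fast idea: run cheap permission checks before any expensive build step, and abort on the first failure. `preflight_require` is a hypothetical helper, not part of push-build.sh, and the actual registry check it would wrap is left as an assumption.

```bash
# preflight_require DESCRIPTION COMMAND [ARGS...]
# Hypothetical fail-fast helper (not part of push-build.sh): runs COMMAND and,
# if it fails, reports DESCRIPTION on stderr and returns 1 so the caller can
# abort before building anything.
preflight_require() {
  local description=$1
  shift
  if ! "$@"; then
    echo "preflight check failed: ${description}" >&2
    return 1
  fi
}
```

In push-build.sh this would run before the build starts, for example `preflight_require "push access to $FLAGS_docker_registry" check_registry_push_access "$FLAGS_docker_registry" || common::exit 1 "Exiting..."`. Here `check_registry_push_access` is a hypothetical check. One possible implementation would push a throwaway tag to the target registry, but that is also only an assumption.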

The check for the existence of a build only checks for GCS bucket artifacts, not container images

In scenarios/kubernetes_build.py

https://github.com/kubernetes/test-infra/blob/329444781ba13be597917343cca4aa1b92366b6d/scenarios/kubernetes_build.py#L45-L84

If we consider a "complete" build to also include container images, this check should verify that those exist as well before concluding that a build is not required.
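A stricter existence check could look like the following bash sketch. It is not a port of kubernetes_build.py. The function name, the idea of passing one GCS object plus a list of image references, and the use of `gsutil -q stat` and `docker manifest inspect` as existence probes are assumptions for illustration. Both probes also fail when the object or image exists but the caller cannot read it. Older Docker CLI releases additionally require experimental features to be enabled before `docker manifest` is available.

```bash
# build_is_complete GCS_OBJECT IMAGE_REF...
# Sketch: treat a build as complete only if a GCS artifact AND every listed
# container image exist. Returns 0 if complete, 1 otherwise.
# The inputs are hypothetical; the real check in scenarios/kubernetes_build.py
# only looks at GCS.
build_is_complete() {
  local gcs_object=$1
  shift
  # `gsutil -q stat` exits non-zero when the object cannot be found.
  if ! gsutil -q stat "$gcs_object"; then
    echo "missing GCS artifact: $gcs_object" >&2
    return 1
  fi
  local image
  for image in "$@"; do
    # `docker manifest inspect` exits non-zero when the tag cannot be resolved.
    if ! docker manifest inspect "$image" >/dev/null 2>&1; then
      echo "missing container image: $image" >&2
      return 1
    fi
  done
  return 0
}
```

With this shape, a build whose GCS artifacts were uploaded but whose images never made it to the registry is reported as incomplete, so a new build would be attempted instead of skipped.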

A build should not push artifacts if it cannot guarantee that all of them will be available

The current push-build.sh logic (release/push-build.sh, lines 867 to 918 in 4c6b5aa):

    
```bash
##############################################################################
common::stepheader COPY RELEASE ARTIFACTS
##############################################################################
attempt=0
while ((attempt<max_attempts)); do
  if $USE_BAZEL; then
    release::gcs::bazel_push_build $GCS_DEST $LATEST $KUBE_ROOT/_output \
                                   $RELEASE_BUCKET && break
  else
    release::gcs::locally_stage_release_artifacts $LATEST \
                                                  $KUBE_ROOT/_output \
                                                  $FLAGS_release_kind
    if ((FLAGS_fast)); then
      BUILD_DEST="$GCS_DEST/fast"
    else
      BUILD_DEST="$GCS_DEST"
    fi
    release::gcs::push_release_artifacts \
     $KUBE_ROOT/_output/gcs-stage/$LATEST \
     gs://$RELEASE_BUCKET/$BUILD_DEST/$LATEST && break
  fi
  ((attempt++))
done
((attempt>=max_attempts)) && common::exit 1 "Exiting..."

if [[ -n "${FLAGS_docker_registry:-}" ]]; then
  ##############################################################################
  common::stepheader PUSH DOCKER IMAGES
  ##############################################################################
  # TODO: support Bazel too
  # Docker tags cannot contain '+'
  release::docker::release $FLAGS_docker_registry ${LATEST/+/_} \
    $KUBE_ROOT/_output
fi

# If not --ci, then we're done here.
((FLAGS_ci)) || common::exit 0 "Exiting..."

if ! ((FLAGS_noupdatelatest)); then
  ##############################################################################
  common::stepheader UPLOAD to $RELEASE_BUCKET
  ##############################################################################
  attempt=0
  while ((attempt<max_attempts)); do
    release::gcs::publish_version $GCS_DEST $LATEST $KUBE_ROOT/_output \
                                  $RELEASE_BUCKET $GCS_EXTRA_VERSION_MARKERS && break
    ((attempt++))
  done
  ((attempt>=max_attempts)) && common::exit 1 "Exiting..."
fi
```

Here, we should probably attempt to publish artifacts in the following order:

1. Container images
2. GCS artifacts
3. Version marker

That way, if images fail to push, then the build job fails before copying to GCS.
If there's nothing in the bucket, then the check in #1 will cause a new build to always be attempted.
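The proposed ordering can be sketched in bash as a retry helper plus a driver that stops at the first step whose retries are exhausted. `push_images`, `push_gcs_artifacts` and `publish_version_marker` are hypothetical hooks. In push-build.sh they would presumably wrap `release::docker::release`, the GCS staging/push step, and `release::gcs::publish_version`. Retrying the image push is a design choice of this sketch, since the current script does not retry it.

```bash
# retry MAX_ATTEMPTS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS attempts have failed.
retry() {
  local max_attempts=$1
  shift
  local attempt=0
  while (( attempt < max_attempts )); do
    "$@" && return 0
    # Avoid ((attempt++)): it returns status 1 when attempt is 0,
    # which aborts scripts running under `set -e`.
    attempt=$((attempt + 1))
  done
  return 1
}

# publish_in_order MAX_ATTEMPTS
# Publishes images first, then GCS artifacts, then the version marker.
# The three step functions are hypothetical hooks the caller must define.
publish_in_order() {
  local max_attempts=$1
  local step
  for step in push_images push_gcs_artifacts publish_version_marker; do
    if ! retry "$max_attempts" "$step"; then
      echo "publish step failed: $step; later steps skipped" >&2
      return 1
    fi
  done
  return 0
}
```

Because the marker step runs only after both the images and the GCS artifacts succeed, a failed image push leaves nothing new in the bucket and no updated marker. This is the behavior the issue asks for.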

@hasheddan -- I'll leave you to divide up the work as appropriate.

/assign @hasheddan @ameukam @cpanato
cc: @kubernetes/release-engineering @spiffxp
/priority critical-urgent

How to reproduce it (as minimally and precisely as possible):

See kubernetes/test-infra#19483.

Anything else we need to know?:

Environment:

Cloud provider or hardware configuration:
OS (e.g: cat /etc/os-release):
Kernel (e.g. uname -a):
Others:


justaugustus · 2020-11-06T16:11:29Z

FYI @kubernetes/ci-signal, as this broadly explains some build job failures you may be seeing.

spiffxp · 2020-11-06T20:32:50Z

There is strong overlap with kubernetes/test-infra#18808

Saving the version marker for last hopefully addresses most of the concerns that prevent us from overwriting incomplete builds.
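The "marker last" argument can be illustrated from the reader's side. The following is a minimal bash sketch. It assumes a marker object whose contents are the version string. `version_is_published` and the marker URI are hypothetical and are not taken from push-build.sh.

```bash
# version_is_published MARKER_URI VERSION
# Sketch: a build counts as published only when the version marker names it.
# Because the marker is written last, a run that dies mid-publish leaves the
# previous marker in place, so its partial artifacts are never treated as
# the current build. MARKER_URI is a hypothetical gs:// object path.
version_is_published() {
  local marker_uri=$1
  local version=$2
  local current
  # `gsutil cat` prints the object's contents and fails if it cannot be read;
  # command substitution strips the trailing newline.
  current=$(gsutil cat "$marker_uri" 2>/dev/null) || return 1
  [[ "$current" == "$version" ]]
}
```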


spiffxp · 2021-01-25T23:00:18Z

Where do we stand on trying to make this happen in the v1.21 timeframe?

fejta-bot · 2021-04-25T23:40:00Z

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

mkorbi · 2021-04-26T19:28:33Z

looks like WIP
will we get this done in 1.22? 👍
/remove-lifecycle stale

k8s-triage-robot · 2021-09-20T21:38:43Z

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue or PR as fresh with /remove-lifecycle stale
Mark this issue or PR as rotten with /lifecycle rotten
Close this issue or PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot · 2021-10-20T22:23:27Z

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue or PR as fresh with /remove-lifecycle rotten
Close this issue or PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot · 2021-11-19T22:31:27Z

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Reopen this issue or PR with /reopen
Mark this issue or PR as fresh with /remove-lifecycle rotten
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

k8s-ci-robot · 2021-11-19T22:31:46Z

@k8s-triage-robot: Closing this issue.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

justaugustus added kind/bug Categorizes issue or PR as related to a bug. sig/release Categorizes an issue or PR as relevant to SIG Release. area/release-eng Issues or PRs related to the Release Engineering subproject labels Nov 6, 2020

k8s-ci-robot assigned ameukam, cpanato and hasheddan Nov 6, 2020

k8s-ci-robot added the priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. label Nov 6, 2020

thejoycekung mentioned this issue Nov 6, 2020

[Failing Test] build-canary fails on push-build.sh (exit status 2) due to lack of permissions kubernetes/kubernetes#95936

Closed

hasheddan mentioned this issue Nov 6, 2020

Return stderr output in kubernetes_build called subprocesses kubernetes/test-infra#19870

Merged

justaugustus mentioned this issue Nov 8, 2020

[krel] Initial commit krel ci-build command #1698

Merged

justaugustus added a commit to justaugustus/release that referenced this issue Nov 8, 2020

pkg/build: Ensure images are pushed before publishing GCS artifacts

ddfcc98

ref: - kubernetes#1693 - kubernetes/test-infra#18808 Signed-off-by: Stephen Augustus <[email protected]>

This was referenced Nov 9, 2020

releng: Add a job to test creating CI builds w/o the bootstrap image kubernetes/test-infra#19887

Merged

ci-kubernetes-build-* jobs should build without requiring bootstrap.py scenarios #1711

Closed

saschagrunert mentioned this issue Apr 21, 2021

Add Roadmap and Vision kubernetes/sig-release#1529

Merged

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 25, 2021

k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 26, 2021

justaugustus assigned puerco and unassigned hasheddan Jun 22, 2021

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 20, 2021

k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Oct 20, 2021

k8s-ci-robot closed this as completed Nov 19, 2021
