
[sig-release] Umbrella issue for a job that signs artifacts and uploads them to a GCS bucket #913

Closed
11 tasks
dims opened this issue Apr 17, 2019 · 51 comments
Labels
area/release-eng Issues or PRs related to the Release Engineering subproject kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. kind/feature Categorizes issue or PR as related to a new feature. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. sig/release Categorizes an issue or PR as relevant to SIG Release.

Comments

@dims
Member

dims commented Apr 17, 2019

Currently we need Googlers to build, sign, and upload deb/rpm artifacts. We need one or more Prow jobs that can do this instead, triggerable by the sig-release team when they cut a release.

  • Find OWNERS for the deb and rpm definitions and associated scripts.
  • Define a policy for how dependencies are named inside the deb/rpm definition metadata
  • We need a CNCF-owned signing key. (We need to build a web of trust that signs this key; until then we can just use a temporary key, see kubernetes/kubernetes#70132)
  • We need one or more GCS buckets for storing the artifacts (talk to wg-k8s-infra)
  • Another GCS bucket for staging build artifacts pending approval
  • We need a directory structure for how we would store the artifacts (to accommodate daily/nightly builds in addition to what we do today)
  • Ability to inspect staged artifacts (manually? automatically?) to ensure compliance with the community-approved release process
  • We need a trusted cluster for the job(s) (talk to wg-k8s-infra; the signing key will need to be loaded onto the cluster so the jobs can access it)
  • We need a way to trigger the jobs (git-ops style; we need a design for the YAML files, guessing we will need the SHAs for the k/k and k/release repos plus version numbers; a sketch follows this list)
  • Ability for these jobs to migrate approved release artifacts to the GCS bucket for approved builds
  • Add job definitions in test-infra to run the jobs in the trusted cluster
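For the trigger-file item above, here is a purely hypothetical sketch of what a git-ops trigger could look like; every path, field name, and SHA below is an illustrative placeholder, not a proposed schema:

```bash
# Hypothetical: commit a release-trigger file to a repo watched by the trusted Prow jobs.
cat > triggers/v1.15.0.yaml <<'EOF'
kubernetesVersion: v1.15.0
kubernetesSHA: 0000000000000000000000000000000000000000   # k/k commit to build from
releaseRepoSHA: 1111111111111111111111111111111111111111  # k/release commit with deb/rpm specs
EOF
git add triggers/v1.15.0.yaml
git commit -m "Trigger signed package build for v1.15.0"
git push
```

A postsubmit watching that path could then run the build/sign/upload pipeline in the trusted cluster.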
@dims
Member Author

dims commented Apr 17, 2019

cc @timothysc
/sig release

@timothysc
Member

/assign @akutz

@justaugustus
Member

/priority critical-urgent
/milestone v1.15

@tpepper
Member

tpepper commented Apr 18, 2019

This covers a set of mechanism points, but our current issue is largely one of policy in my opinion.

What dependencies are named inside the deb/rpm definition metadata? Are they specified as '=' to a version, '>=' to a version, or without a version? If versioned, do we ensure all non-archived packages have the highest preferred version (re-packaging old builds to enforce fresh dependency versioning), or do we only build and publish new Kubernetes versions with the then-freshest dependency version? Do we reap/archive older packages from the repos?
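For concreteness, the three styles in question look like this in a deb's control metadata (package names and versions here are illustrative, not what the Kubernetes packages actually declare):

```
Depends: kubectl (= 1.15.0)
Depends: kubernetes-cni (>= 0.7.5)
Depends: cri-tools
```

The first pins an exact version, the second sets a floor, and the third floats freely; the policy question is which style each package should use and what has to happen when the pinned values change.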

For any of these intended choices of package build and publication, we need documented criteria for the choice makers...when and why do they bump a piece of metadata and what all must follow from that.

Building tools that aren't backed by a process, which itself isn't backed by shared knowledge and intent among people, doesn't change much. Changing the tools alone does not change the underlying problem. To do that we need to understand what people do and need, define a process that supports them, and build tooling that automates the process. In that order, not the opposite.

@timothysc
Member

@tpepper add it to the list.

@dims
Member Author

dims commented Apr 18, 2019

@timothysc @tpepper I added it as item number 2; presumably the OWNERS from item number 1 will help define this policy.

@akutz
Member

akutz commented Apr 18, 2019

Hi @tpepper,

I could be mistaken, but I believe the intent of documenting the issue is to facilitate a discussion that results in a community-approved, well-documented approach. At that point it is up to the community to actually participate and follow through on their agreement, which the correct processes built on the proper tooling can help ensure. For example, a staging area for artifacts that are inspected prior to being published could validate that the community-approved release process was followed.

To that end, @dims, I would like to suggest adding the following to the above checklist:

  • GCS bucket for staging build artifacts pending approval
  • Inspect staged artifacts (manually? automatically?) to ensure compliance with community-approved release process
  • Migrate approved release artifacts to GCS bucket for approved builds
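As a rough illustration of what the staging-and-inspection flow could look like (bucket names, layout, and the specific checks are all hypothetical; the real criteria would come from the approved release process):

```bash
# Hypothetical promotion gate: verify staged artifacts before copying them to the approved bucket.
VERSION=1.15.0
gsutil -m cp -r "gs://k8s-staging-packages/${VERSION}/" ./staged/
cd "./staged/${VERSION}"
sha512sum -c SHA512SUMS          # integrity: bits match what the build produced
dpkg-deb --info kubeadm_*.deb    # sanity-check package metadata before approval
gsutil -m cp -r . "gs://k8s-release-packages/${VERSION}/"
```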

@akutz
Member

akutz commented Apr 18, 2019

Hi All,

I'm wondering... should this issue be a KEP? With all due respect to @dims, should the above list be a foregone conclusion on the prescribed process? As @tpepper said, the process needs community buy-in, or the whole thing falls flat. To that end, doesn't it make sense to socialize the design and discussion in the manner in which other K8s features are handled?

@justaugustus
Member

@akutz -- I can draft one.

@dims
Member Author

dims commented Apr 18, 2019

@akutz added slight variations of the 3 points to the list.

@akutz this is a brain dump from my experience. AFAIK, there has never been a full workflow with steps proposed; this is that attempt (not trying to push for this to be authoritative).

@justaugustus thanks! please go for it.

@dims
Member Author

dims commented Apr 18, 2019

@akutz we also have a tendency to go off into the weeds very quickly, so hopefully a check list will help inform the KEP :)

@justaugustus
Member

/assign @justaugustus @tpepper

@akutz
Member

akutz commented Apr 18, 2019

Hi All,

This is likely out-of-scope of the spec, but it will be necessary as an implementation detail so I'm adding it here.

Prow currently lacks the ability for people in the OWNERS file to manually trigger post-submit and periodic jobs. The release jobs would likely be one of those. Currently the only way to manually trigger these job types is with API server access and rights. Steve Kuznetsov and I discussed this at length in Slack at https://kubernetes.slack.com/archives/C09QZ4DQB/p1553298938731900.
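For reference, the current workaround looks roughly like this (run from a k/test-infra checkout; the job name and config path are illustrative, and the kubectl step is exactly the API-server access most OWNERS members don't have):

```bash
# Render a ProwJob spec from an existing job definition, then submit it directly.
go run ./prow/cmd/mkpj \
  --config-path=config/prow/config.yaml \
  --job=some-release-packages-postsubmit > prowjob.yaml
kubectl create -f prowjob.yaml   # requires credentials for Prow's service cluster
```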

@justaugustus
Member

@akutz -- can you open an issue in k/test-infra, tag us, and cross-link it here, so it doesn't get lost in the discussion?

@timothysc
Member

PSA - I've asked @thockin and @spiffxp for a concrete backlog, which will overlap with some of the above and may expand some of the items as well.

@ttousai

ttousai commented May 31, 2019

@akutz @tpepper @justaugustus do you still expect to complete this issue for 1.15?

@soggiest

soggiest commented Jun 3, 2019

/milestone v1.16

@dims
Member Author

dims commented Jul 13, 2019

/assign @justaugustus

@justaugustus
Member

We're making headway on some of this, but we're going to see more of it land for 1.17.
/milestone v1.17
/remove-priority critical-urgent
/priority important-soon
/remove-kind bug
/kind feature cleanup

@justaugustus justaugustus transferred this issue from kubernetes/kubernetes Oct 28, 2019
@justaugustus
Member

(Migrated to k/release)

/sig release
/area release-eng
/milestone v1.17
/priority important-soon
/kind feature cleanup

@k8s-ci-robot k8s-ci-robot added sig/release Categorizes an issue or PR as relevant to SIG Release. area/release-eng Issues or PRs related to the Release Engineering subproject labels Oct 28, 2019
@k8s-ci-robot k8s-ci-robot added this to the v1.17 milestone Oct 28, 2019
@k8s-ci-robot k8s-ci-robot added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. kind/feature Categorizes issue or PR as related to a new feature. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. labels Oct 28, 2019
@saschagrunert
Member

@sftim feel free to add your suggestions directly to the spreadsheet, since it is our source of truth for now. We plan to outline the whole topic in a KEP (as usual) and request input from others via the usual workflow. You can also add your name to one of the columns if you'd like to get directly involved in one of the milestone topics. 🙏

@BenTheElder
Member

ref: #913 (comment)

I'm going through off-boarding from the Kubernetes EngProd team @ Google and found that while I've been on vacation the past couple weeks we dropped down to only two people in this rotation, and it will at least briefly be down to one (⚠️) as I exit.

@MushuEE will be onboarding 1-2 junior team members O(soon) but that's where this is at on the Google end right now ...

I think the team is hiring, but it's relatively small right now; anything involving these packages just isn't prioritized against other more pressing work – they're more historical tech debt handed off from the previous release-engineering team.

I'll do what I can to help, but my own priorities will be shifting / expanding with my new role, and I will be losing access to sign / publish to the Google infra.

@dims
Member Author

dims commented Jul 27, 2022

@BenTheElder thanks for the update. This is just an indication that we should increase the priority of this work and get it done quickly. We've known this for a while now, given that this issue has been open since 2019. ( cc @kubernetes/sig-release )

@BenTheElder
Member

I think folks in SIG Release have started on this, and I left a note in Slack as well. There's been recent activity like kubernetes/enhancements#3434 🙏 that I haven't caught up on yet, but I'm sure others are watching this issue too, since we've linked to it from requests to change how signing is done, etc. 😅

@lasomethingsomething

+1 @dims. Here is the current work plan, with milestones broken down to highlight various needs and current progress noted. kubernetes/enhancements#1731 (comment) cc @detiber @ameukam @RobertKielty @Verolop (and @castrojo, who's offered to help do some outreach on behalf of the third-party hosting services-related tasks).

@BenTheElder
Member

The rotation has been down to one person (@MushuEE) and is up to two (@MushuEE and @BenjaminKazemi) and is receiving internal pushback about how it is handled. I've talked to this team (my former team) about this recently and they do not have bandwidth to do anything further anytime soon.

For folks interested in Kubernetes-provided DEB/RPM packages, the current "packages built on the workstations of 1-2 Googlers and then published to Google Cloud's package host" setup could still really use some attention. From my point of view this has been, and remains, at serious continuity risk.

@sftim
Contributor

sftim commented Oct 10, 2022

It sounds like the bit where folks have to be a Googler, right now, is the package signing. Everything else can happen outside Google - this is an open source project after all. @dims has it right to focus on the signing aspect aside from the package build.

I wonder if a way to tackle this is to start with getting a Kubernetes-controlled private key, even a disposable one, to sign some trivial package. Maybe fetch an existing package using curl; maybe make a package that deploys /hello-world.txt.

Once we can do that part, we can look at signing something more complex, and also how to better protect the private key.
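A minimal sketch of that starting point, assuming a Debian-style flow; the key parameters, names, and the choice of dpkg-sig are just one possible way to exercise it:

```bash
# 1. Generate a disposable signing key (not the real CNCF key, just for exercising the flow).
gpg --batch --passphrase '' --quick-generate-key 'K8s Package Signing Test <test@example.com>' rsa4096 sign

# 2. Build a trivial package that installs /hello-world.txt.
mkdir -p hello/DEBIAN
cat > hello/DEBIAN/control <<'EOF'
Package: hello-world
Version: 0.0.1
Architecture: all
Maintainer: K8s Package Signing Test <test@example.com>
Description: trivial package for exercising the signing flow
EOF
echo 'hello' > hello/hello-world.txt
dpkg-deb --build hello hello-world_0.0.1_all.deb

# 3. Sign and verify it with the disposable key.
dpkg-sig --sign builder hello-world_0.0.1_all.deb
dpkg-sig --verify hello-world_0.0.1_all.deb
```

Once something like this works end to end under a Kubernetes-controlled key, the same flow can be pointed at real packages and a properly protected key.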

@BenTheElder
Member

BenTheElder commented Oct 10, 2022

I believe the current google-internal hosting infrastructure is also coupled to the signing system, and because we're not even using the product offering there, one way or another we're going to have to migrate both package signing and publishing/hosting in order to bring this to community control. The project will need to select a solution for both signing and hosting.

I think figuring out key management is definitely one of the tricky steps.

Even the Google employees with permission to sign and publish packages cannot access, add, or modify the signing keys on the existing infrastructure.

In the short term, it's also time-consuming that we conflate building packages with access to sign and publish, since those are actually distinct steps. An intermediate step that could be worked on in parallel is moving the actual package build onto the community release process. Sometimes the package building is currently broken because it is in no way run in automation / CI, consuming even more time. The package sign-and-upload process is much quicker and less error-prone than the builds.

If we moved that to the upstream release build process, then the Googlers could download, then sign and publish, those packages until community signing and publishing are available. That would reduce the time required from that small rotation, and the community signing and packaging effort would also then have package builds available to work with. It also fixes the "these are built on someone's workstation and nobody can really vet them properly" problem. After some transition period we could phase out publishing on the existing infra.

Previous discussion has been hampered by efforts to build an entirely new package build system, which we do not currently use at all. I would strongly urge that we lift-and-shift the existing build we ship releases with today and iteratively improve it later.

@saschagrunert
Member

@BenTheElder this is what we propose in kubernetes/enhancements#3434, I’ll update the KEP today so that it’s ready for review.

@BenTheElder
Member

Thanks @saschagrunert, FWIW I did not read it that way: I read the KEP as blocked on replacing the package build tools as well. (Really glad to see that it's active regardless! 🙏)

@BenTheElder
Member

BenTheElder commented Oct 12, 2022

Actually, to immediately de-risk things somewhat at even lower cost, we could simply do the following (a rough sketch follows the list):

  • split the rapture script into a build script and a sign + publish script
  • enhance the build script to upload the outputs to a storage bucket
  • enhance the sign + publish script to download from a storage bucket (we already input a k8s version)
  • release team / release managers can take over running the build
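A hedged sketch of what those bullets could look like in practice; the script and bucket names are placeholders for whatever the split-out rapture pieces end up being called:

```bash
# Stage 1 -- run by release managers, no signing access needed:
./rapture-build.sh "${K8S_VERSION}"                     # produces debs/rpms locally
gsutil -m cp -r ./packages/ "gs://k8s-staging-packages/${K8S_VERSION}/"

# Stage 2 -- run by the small Google rotation, no build environment needed:
gsutil -m cp -r "gs://k8s-staging-packages/${K8S_VERSION}/" ./to-sign/
./rapture-sign-publish.sh "./to-sign/${K8S_VERSION}"    # sign + push to the existing host
```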

The project at large can staff up the release team / release managers; it cannot staff up this two-person rotation.

I think the Google team may also have some automation options that are more viable to invest in as a stopgap, to reduce some of the pressure on them and the internal pushback, if they don't have to maintain a functioning build environment with docker etc. ...

@saschagrunert
Member

@BenTheElder generally yes, why not. 👍

  • release team / release managers can take over running the build

I see a bunch of pros and cons related to this, and I get the overall intention of that take. Still, I'd like not to increase the complexity on the release managers' side by additionally running a manual step during the release. Can we ensure that the package build script is battle-tested enough that it works on every release manager's machine and produces the same outputs?

Alternatively, automating the rapture build script into krel stage seems like a hack, because we explicitly decided to move away from running bash scripts in the release pipeline during the development of krel and the deprecation of anago.

I think we should keep up the discussion and find a good way to integrate the changes faster without introducing unnecessary tech debt.

@saschagrunert
Member

@dims do we still need this umbrella issue, given a possible merge of kubernetes/enhancements#3434 and therefore an up-to-date KEP in kubernetes/enhancements#1731?

@dims
Member Author

dims commented Oct 13, 2022

@saschagrunert nope. we can close it!

@saschagrunert
Member

Closing in favor of kubernetes/enhancements#1731

@BenTheElder
Member

BenTheElder commented Oct 13, 2022

I see a bunch of pros and cons related to this, and I get the overall intention of that take. Still, I'd like not to increase the complexity on the release managers' side by additionally running a manual step during the release. Can we ensure that the package build script is battle-tested enough that it works on every release manager's machine and produces the same outputs?

I mean, no more or less than it is for the two part-time volunteers (who can only be Googlers) that we block on today.

The requirements are well documented. Currently it requires docker and rpm, and the script has been in use for ages with occasional minor updates.

As far as producing the same outputs ... again, no more or less? Google workstations receive updates to docker and rpm too. Package building is not doing anything Google-specific, and Google workstations are not a reproducible build environment; they never have been. Signing and publishing do Google-specific things, but those happen after building.

@BenTheElder
Member

Alternatively, automating the rapture build script into krel stage seems like a hack, because we explicitly decided to move away from running bash scripts in the release pipeline during the development of krel and the deprecation of anago.

This I don't quite understand either: make release in Kubernetes is a metric ton of bash with the thinnest of Makefiles on top ... every krel pipeline is invoking a lot of bash 🙃

Invoking one more build script surely won't be a serious regression?

This package build code in the script is what we have shipped with since the first Kubernetes packages ever, up through now, 8+ years into the project.

@saschagrunert
Member

Alternatively, automating the rapture build script into krel stage seems like a hack, because we explicitly decided to move away from running bash scripts in the release pipeline during the development of krel and the deprecation of anago.

This I don't quite understand either: make release in Kubernetes is a metric ton of bash with the thinnest of Makefiles on top ... every krel pipeline is invoking a lot of bash 🙃

I know, but we in SIG Release do not maintain that code directly on a daily basis.

Invoking one more build script surely won't be a serious regression?

It will for us, because the responsibility moves with the code. We decided against bash mainly because it cannot be tested the same way we can test golang applications. That means having the same logic in golang, encapsulated in testable units (a library), could be the right path forward for building the packages. We are still evaluating OBS right now, so that discussion has to be finished first.

@BenTheElder
Member

BenTheElder commented Oct 14, 2022

I know, but we in SIG Release do not maintain that code directly on a daily basis.

But nobody has asked you to maintain it. Invoking != maintaining, as evidenced by this conversation re: make release.

Further, using something for now does not mean it cannot be swapped out later or that there is a long term commitment to maintaining it.

Build, sign, and publish are each distinct steps. Building is time-heavy and could easily be executed in an automated build environment that has all the same requirements (make release actually has more environment dependencies).


As of this week, anyone can build the Debian and RPM packages Googlers have been building, signing, and publishing, now that I've taken the straightforward step of factoring out the tiny build step into a distinct entrypoint: #2708

You need docker and bash installed. You clone the repo and run hack/rapture/build-package.sh 1.25.2, where 1.25.2 is the version of Kubernetes you wish to build packages for.
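Spelled out, assuming the entrypoint stays where #2708 put it:

```bash
# Build the deb/rpm packages for Kubernetes 1.25.2 from a k/release checkout.
git clone https://github.com/kubernetes/release
cd release
hack/rapture/build-package.sh 1.25.2
```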

Any continued dependence on my former team having time to build packages is purely artificial, and does not need to be blocked on building a golang package build tool.

This script is not difficult to run, it is very little code to maintain, and I'm readily available to answer any questions on the subject, despite no special expertise being required.

I no longer have the access to publish these, but I'm doing my best to ensure that there is an opportunity to prevent these packages from going the way of hyperkube, if folks wish to take over. If not, there are many other options for installing Kubernetes and many other problems to work on.

If I were invested in these distro packages I would reduce the dependency on the 1-2 people who have permission to publish these as quickly as possible.
