CI for CoreOS projects: 2021 plan #764
In particular a major sub-thread I want to pursue is this:
Examples:
Also, we could add an optional …
In particular, a flow I'd really like to enable that crosses all domains: as a Fedora kernel developer, test this kernel package in OKD (and the same for C8S/RHEL).
Is it possible to run tests with Prow without having it spam PR comments or making it responsible for merging?
Yes, I think if we e.g. removed the … Skimming through that file, e.g. ComplianceAsCode turned off most plugins, and looking at a random recent PR there, I don't see any GitHub comments from the Prow bot.
The "merge logic" is a whole sub-thread to this. Ideally, I'd say we have some consistency on this across at least the coreos/ repositories. The Prow logic predates the existence of GitHub-native PR approval, not to mention the even more recent GitHub-native "Merge this PR after CI is green" logic. We could maybe take a vote on the following options:
Maybe a sub-spike on this is adding a Prow job to e.g. coreos/ignition but turning off the plugins, and seeing how that goes? (I assume Ignition doesn't want the comment spam?)
Let's try out openshift/release#16706
Most notably, drop the approve and lgtm plugins, which are responsible for a lot of comment spam. We are going to experiment with using the GitHub-native methods for this. xref coreos/fedora-coreos-tracker#764
Even more sub-details around this: when configured to handle merging, Prow will re-test PRs against the latest master. This solves the "semantically but not textually conflicting PRs" problem; see https://github.com/barosl/homu#why-is-it-needed
The comment spam around failing tests isn't unique to Prow; e.g. GitHub Actions also seems to default to sending a direct email with PR results, and Travis did the same for a long time. The core problem they're solving is that if your CI takes longer than a minute, the submitter is going to context switch away, and without an async notification of failure they need to poll.
Now honestly, I've been trashing the Prow emails for a long time, and indeed I rely on polling: I periodically check on my PRs across various repos. But it's a bit awkward.
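To make the "semantically but not textually conflicting PRs" problem concrete, here is a toy Python model. It is a sketch under stated assumptions, not how Prow or homu are implemented: a repository is a dict of file contents, a PR is a set of file changes that never overlap textually, and the "test suite" is a hypothetical check that every function called from main.py is defined in lib.py. Merging PRs that were each tested only against their original base can break master. Re-testing each PR against the current master before merging, which is the approach the homu link describes, catches the conflict.

```python
from typing import Callable, Dict, List, Tuple

# Toy model (hypothetical): a tree maps file names to contents, and a PR
# is a (name, changed files) pair. Changes never overlap textually here.
Tree = Dict[str, str]
PR = Tuple[str, Tree]
SuiteFn = Callable[[Tree], bool]


def apply_changes(tree: Tree, changes: Tree) -> Tree:
    """Return a new tree with the PR's file changes applied on top."""
    merged = dict(tree)
    merged.update(changes)
    return merged


def suite_passes(tree: Tree) -> bool:
    """Hypothetical test suite: everything main.py calls must exist in lib.py."""
    defined = set(tree.get("lib.py", "").split())
    called = set(tree.get("main.py", "").split())
    return called <= defined


def merge_without_retest(master: Tree, prs: List[PR], suite: SuiteFn) -> Tuple[Tree, List[str]]:
    """Test every PR against the same old base, then merge all that passed."""
    passed = [(name, changes) for name, changes in prs
              if suite(apply_changes(master, changes))]
    for _, changes in passed:
        master = apply_changes(master, changes)
    return master, [name for name, _ in passed]


def merge_with_retest(master: Tree, prs: List[PR], suite: SuiteFn) -> Tuple[Tree, List[str]]:
    """Merge queue: re-test each PR against the *current* master before merging."""
    merged: List[str] = []
    for name, changes in prs:
        candidate = apply_changes(master, changes)
        if suite(candidate):
            master = candidate
            merged.append(name)
    return master, merged


if __name__ == "__main__":
    base = {"lib.py": "foo bar", "main.py": "foo"}
    prs = [
        ("A", {"lib.py": "foo"}),       # removes bar() from lib.py
        ("B", {"main.py": "foo bar"}),  # starts calling bar() from main.py
    ]
    tree, names = merge_without_retest(base, prs, suite_passes)
    print(names, suite_passes(tree))  # both merged, master is broken
    tree, names = merge_with_retest(base, prs, suite_passes)
    print(names, suite_passes(tree))  # only A merged, master stays green
```

The trade-off is that a queue serializes merges and needs a fresh CI run per PR, which is exactly the latency cost weighed in this thread.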
The GitHub Actions email is harmless by comparison, since it doesn't conflate humans and bots in the same thread (email or web). Prow feels like it's designed for a workflow where only the bot is thinking about the overall state of the repo, and humans are laser-focused on individual PRs. That could make sense for projects at a certain scale, but to me it feels intrusive for smaller projects. The lack of control over merge timing [1] and the assumption that automated tests have sufficient coverage [2] both seem like sources of friction.
[1] Both its tendency to encourage premature merges by accepting the first …
(In other words, I don't think Prow's fix for "semantically but not textually conflicting PRs" is worth the cost. I'm not opposed to using it for running tests.)
Hmm, I think
I usually only see that kind of latency when it's doing retesting (because of semantic PR conflicts), and on repos which have nontrivial CI (which is definitely true for many of the Kubernetes and OpenShift repos, as well as ours: we're using Prow for "heavy lifting" jobs).
Yeah... I think I agree that for probably all of our repos it's a high cost relative to the value. The other standard mitigation for this is to have "post-merge periodics" that build and test git master; we definitely do that too. Periodic and post-merge jobs are also a good place for more expensive CI.
Anyway, openshift/release#16706 merged, and you can see the effect here: no Prow comments in coreos/rpm-ostree#2655 (edit: I mean other than it replying to me around the use of …).
What do people think? I think we just have coreos-assembler and bootupd hooked up to Prow approve/lgtm. If there's rough consensus around continuing in this direction, it should be easy to do the same for those repos.
OK, on a non-Prow topic: the "GH Actions have write access" bit is making me hesitate a lot on GH Actions. To me, a whole lot of the point is that it's a zero-cost, low-friction way to run quick arbitrary CI from containers for linting-type things. I can see the use case for Actions mutating the repo; it just doesn't seem like it should be the default.
Hmm, I assume that Actions can't override required status checks and branch protection, i.e. simply having write access doesn't let them push code to git master, but they can change labels, add PR comments, and so on. That's OK I guess, but I think we should make sure we have required status checks and branch protection enabled.
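Required status checks and branch protection are configured per branch, for example through GitHub's REST API ("Update branch protection", PUT /repos/{owner}/{repo}/branches/{branch}/protection). The Python sketch below only builds that request and does not send it; sending it would need a token with admin rights on the repository. The check name is a hypothetical placeholder, and the payload uses the contexts list; newer API versions also offer a checks field, so the current API docs are the authority here.

```python
import json
from typing import Any, Dict, List, Tuple


def branch_protection_request(owner: str, repo: str, branch: str,
                              required_checks: List[str],
                              approvals: int = 1,
                              require_up_to_date: bool = False) -> Tuple[str, str, str]:
    """Build (method, path, JSON body) for GitHub's "update branch protection" call.

    Only builds the request; sending it with an admin-scoped token is left out.
    """
    path = f"/repos/{owner}/{repo}/branches/{branch}/protection"
    body: Dict[str, Any] = {
        # CI results that must be green before a PR can be merged.
        # "strict" corresponds to "require branches to be up to date before merging".
        "required_status_checks": {
            "strict": require_up_to_date,
            "contexts": required_checks,
        },
        # Apply the same rules to repository admins.
        "enforce_admins": True,
        # Require human review: the GitHub-native counterpart of /lgtm + /approve.
        "required_pull_request_reviews": {
            "required_approving_review_count": approvals,
        },
        # No additional restriction on who may push.
        "restrictions": None,
    }
    return "PUT", path, json.dumps(body)


if __name__ == "__main__":
    # "example/ci-check" is a placeholder, not a real status context.
    method, path, body = branch_protection_request(
        "coreos", "coreos-assembler", "master", ["example/ci-check"])
    print(method, path)
    print(body)
```

Setting require_up_to_date=True is GitHub's native (and heavier) answer to the semantic-conflict problem, since every PR then has to be rebased and re-tested before merging.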
Right. I meant that I prefer requiring the PR submitter to concur with merging the PR [1], which Prow doesn't do by default. That makes it easier for the submitter to make revisions based on discussion, or to wait for more review.
I'm not thrilled with it either. The obvious case is safe (jobs don't get write access if they're running on PRs submitted from a fork) but that only makes the other cases more subtle.
Yup, that's right.
[1] When the submitter has ongoing responsibility for the repo, which usually means when they have write access.
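The difference between the two merge policies discussed here can be written as two predicates. This is a hypothetical Python model, not Prow's or GitHub's actual logic. The PullRequest fields are illustrative assumptions: author_concurs stands in for the submitter clicking merge or enabling auto-merge themselves, and a "maintainers" set approximates "has write access" from footnote [1].

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class PullRequest:
    # Hypothetical model of the state a merge policy looks at.
    author: str
    ci_green: bool
    approvers: Set[str] = field(default_factory=set)
    # Stand-in for the submitter explicitly agreeing to merge now.
    author_concurs: bool = False


def reviewed_and_green(pr: PullRequest) -> bool:
    """Green CI plus at least one approval from someone other than the author."""
    return pr.ci_green and len(pr.approvers - {pr.author}) > 0


def merge_bot_policy(pr: PullRequest) -> bool:
    """Bot-driven merge: approval plus green CI is sufficient."""
    return reviewed_and_green(pr)


def submitter_concurs_policy(pr: PullRequest, maintainers: Set[str]) -> bool:
    """Also wait for the submitter if they have ongoing responsibility for the repo."""
    if not reviewed_and_green(pr):
        return False
    if pr.author in maintainers:
        return pr.author_concurs
    # Contributors without write access can't merge themselves, so approval is enough.
    return True
```

Under the second policy a maintainer's approved, green PR stays open until its author says go, which leaves room for revisions after discussion or for waiting on more review.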
Does Prow get us any CI on s390x?
Personally, if I don't want a PR to merge, I usually make it a draft.
That has different semantics, though. I read "draft" as "incomplete" and "non-draft" as "believed to be ready to merge as-is". But the latter might change due to events, and it's not great to have to leap for the "convert to draft" button.
The answer appears to be "not today", but there is a whole Multi-Arch CI effort internally doing related things, and that work might lead to supporting this for us.
Following the plan in coreos/fedora-coreos-tracker#764 we will continue to use Prow to run tests, but not as a merge bot.
I've enabled required PR reviews on the cosa master branch and marked …
OK, there's a new pain point: CI for rpm-ostree is currently failing here: https://jenkins-coreos-ci.apps.ocp.ci.centos.org/blue/organizations/jenkins/github-ci%2Fcoreos%2Frpm-ostree/detail/PR-2694/3/artifacts
Now... mainly because the quay.io/coreos-assembler stuff is opaque to me, I ended up setting up this cosa-buildroot container in api.ci. But since then, the DPTP team has started shutting off registry.svc in favor of registry.ci, which is more rigorous about what goes into it; it also requires authentication.
I think what we should be doing is this: https://docs.ci.openshift.org/docs/how-tos/mirroring-to-quay/ But to me that raises the question of whether we should be putting this into e.g. quay.io/coreos/cosa-buildroot or similar. We could of course also just write to quay.io/coreos-assembler/cosa-buildroot; we'd need to set up the secrets for that. Longer term though, at least for cosa, I think it would make a lot of sense to ensure that the …
This topic also relates to coreos/fedora-coreos-config#740 and of course the general question of where our CI builds run. Following that, perhaps it should be …
Hmm, you should have access to the image. I've also invited you to the org itself. (I'm not entirely sure what the story is there wrt why it's in a separate namespace vs just coreos/. We should probably eventually fold it back into coreos/ but... we'd probably want to keep mirroring at the old location for a while.)
No strong opinion either way. Maintenance-wise, the simplest would be to just have Quay.io build the buildroot image, to keep it consistent with how the main image is built (maybe just quay.io/coreos-assembler/buildroot, given the namespace). But if you'd like, we can exercise the new mirroring path and set up whatever secrets are needed for that.
The thing is, though, that cosa is (somewhat) a "cross" tool, whereas the buildroot is Fedora right now and we can't escape that. So let's briefly bikeshed this: how about …
Hmm, I agree that's where we want to go. But until we do coreos/fedora-coreos-config#740, it's a bit of a lie, since it's really tracking the cosa …
Meh, if we're agreed on coreos/fedora-coreos-config#740, I can just brush that up right now, and then we get it in and set up those tags.
SGTM!
That said, I think the "build in quay.io" path has some disadvantages: if you want to test and promote those images... well, that's not in Quay's scope. But it is definitely in Prow's scope.
Yeah, CI for those branches is in a sad state right now. CoreOS CI wants to build Fedora, and of course we don't have access to RHEL packages like api.ci does. (Though it's worth noting that it's slightly less broken now that CoreOS CI knows how to build images properly, which means it should at least use the right Fedora base image.) So this is definitely a good argument for building and testing in Prow.
My only hesitation would be whether we can make the registry.ci locations publicly pullable (i.e. without auth). I personally don't like the idea of introducing mirroring into the mix just to work around that, because it adds lag and complexity.
Hmm, but actually, we could just use Prow for testing RHCOS builds on PRs targeting those branches, but still have pushed commits build on Quay.io like today, right?
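The split suggested at the end of that comment can be expressed as a small routing rule: Prow tests PRs against the RHCOS branches, Quay.io keeps building pushed commits, and other PRs stay on CoreOS CI, which builds Fedora. A minimal Python sketch; the event names and the rhcos-<major>.<minor> branch pattern are assumptions for illustration, not existing configuration.

```python
import re

# Assumed naming for the RHCOS branches of coreos-assembler (hypothetical pattern).
RHCOS_BRANCH = re.compile(r"rhcos-\d+\.\d+")


def ci_system_for(event: str, branch: str) -> str:
    """Pick which CI system handles an event, per the split proposed in the comment."""
    if event == "push":
        # Pushed commits keep building on Quay.io, as today.
        return "quay.io"
    if event == "pull_request":
        if RHCOS_BRANCH.fullmatch(branch):
            # Prow has access to RHEL content, so it tests RHCOS-targeted PRs.
            return "prow"
        # Everything else stays on CoreOS CI, which builds Fedora.
        return "coreos-ci"
    raise ValueError(f"unhandled event type: {event!r}")
```

Keeping the push path on Quay.io means the registry.ci public-pull question only matters for PR testing, not for the images people actually pull.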
Do we need a promotion model for cosa other than the PR workflow? If a change breaks, we revert in git and rebuild. It seems like that model has worked pretty well so far.
Perhaps one debate to have is whether to create github.com/openshift/coreos-assembler, i.e. drop the rhcos-$x branches in github.com/coreos/coreos-assembler.
OK, we now have https://quay.io/repository/coreos-assembler/fcos-buildroot?tab=builds! We'll need to update repos that used the buildroot image to start building with that image instead. I opened coreos/coreos-ci-lib#66 to make this easier.
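Updating those repos mostly means swapping the old buildroot pullspec for the new Quay.io one. A rough Python sketch of that rewrite: the old pullspec is passed in because its exact location isn't spelled out here, and the testing-devel default tag is an assumption, so check the repository's actual tags before relying on it.

```python
import re

NEW_IMAGE = "quay.io/coreos-assembler/fcos-buildroot"


def migrate_buildroot_refs(text: str, old_image: str, new_tag: str = "testing-devel") -> str:
    """Rewrite every reference to old_image (any tag or none) to NEW_IMAGE:new_tag.

    The default tag is an assumption. Digest references (old_image@sha256:...)
    and longer image names that merely start with old_image are left alone.
    """
    pattern = re.compile(
        re.escape(old_image)
        + r"(?::\w[\w.-]*)?"  # optional ":tag"
        + r"(?![\w./:@-])"    # don't match a prefix of a longer name or a digest ref
    )
    return pattern.sub(f"{NEW_IMAGE}:{new_tag}", text)


if __name__ == "__main__":
    # Hypothetical old location, for illustration only.
    old = "registry.example.com/coreos/cosa-buildroot"
    print(migrate_buildroot_refs(f"FROM {old}:latest\n", old))
```

A shared library change such as the one in coreos/coreos-ci-lib#66 avoids touching each repo by hand; a text rewrite like this is only for the references that live in individual repos.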
Matching coreos/ignition#1182 and discussion in coreos/fedora-coreos-tracker#764
OK, I propose we actually close this issue and maintain a "living document" at e.g. github.com/coreos/fedora-coreos-tracker/doc/ci-and-pipeline.md
SGTM!
Trying to migrate content from coreos#764 which is a proposal into a "how it works" that we can maintain over time.
OK, since we have rough consensus here, closing. We can open new issues, or even PRs to the doc, for proposed plans.
I followed the link in #764 (comment) and noticed that things have changed regarding read-only tokens for GitHub Actions: GitHub has now implemented more granular permissions.
Hi, thanks for following up. Indeed, we (particularly bgilbert) have been gradually enabling those restrictions, e.g. coreos/stream-metadata-go#28 and others.
As far as I know, all of our Actions workflows (except for bootupd's) should have restrictions enabled now.
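A quick way to check which workflows still run with the default token permissions is to look for a top-level permissions key. The Python sketch below is a deliberately crude text check, not a YAML parser: it only recognizes an unindented permissions: line, so workflows that restrict permissions per job will be reported and need a manual look. .github/workflows is GitHub's standard location; the function names are hypothetical.

```python
import re
from pathlib import Path
from typing import Dict, List

# An unindented "permissions:" line is a workflow-level setting; job-level
# settings are indented and deliberately not matched by this crude check.
TOP_LEVEL_PERMISSIONS = re.compile(r"^permissions\s*:", re.MULTILINE)


def missing_permissions(workflows: Dict[str, str]) -> List[str]:
    """Return the names of workflows (name -> YAML text) with no top-level permissions."""
    return sorted(name for name, text in workflows.items()
                  if not TOP_LEVEL_PERMISSIONS.search(text))


def load_workflows(repo_root: Path) -> Dict[str, str]:
    """Read every .yml/.yaml file under .github/workflows (empty if the dir is absent)."""
    wf_dir = repo_root / ".github" / "workflows"
    files = list(wf_dir.glob("*.yml")) + list(wf_dir.glob("*.yaml"))
    return {p.name: p.read_text() for p in files}


if __name__ == "__main__":
    for name in missing_permissions(load_workflows(Path("."))):
        print(f"no top-level permissions: {name}")
```

Run from a repository checkout, it prints one line per workflow that still lacks a workflow-level setting such as permissions: read-all or a contents: read block.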
CoreOS CI 2021
Nothing in this is a firm commitment. Including "2021" in the title is intended to imply that e.g. we may change small or large aspects of this in 2022 (or earlier). Individual repository owners may choose to do different things. However, having standardized and well-maintained centralized CI flows is a huge benefit to our team.
This issue was moved from #263.
Proposal: Use a combination of CoreOS CI Jenkins, OpenShift Prow, and per-repository GitHub Actions.
CoreOS CI Jenkins (CCI)
This is what we use on various repositories, and it is how FCOS is released today. We have a lot of institutional knowledge around it, and it gives us a place where we can easily control the end-to-end interactions. Jenkins is a well-understood tool.
It is deployed in CentOS CI, which is a bare-metal OpenShift cluster with nested virtualization enabled.
Advantages:
Disadvantages:
Current and proposed use cases:
OpenShift Prow
Prow is heavily oriented towards testing OpenShift container components. However, we recently enabled nested virt on the build02 GCP cluster, which means we can create "container native" flows that still test the OS with coreos-assembler.
For Fedora CoreOS, we are independent of OpenShift release cycles. For RHEL CoreOS, we are tightly tied to them. It is really a requirement for openshift/os to integrate increasingly tightly with Prow. Specifically, for openshift/os we want to follow the same release-4.X git branches as the platform.
We also use it as a "merge bot" on some repositories with /approve and /lgtm.
Advantages of Prow:
… /test azure and /test aws to e.g. Ignition
Disadvantages of Prow:
… /hold)
Current and proposed use cases:
GitHub Actions
Free at small scale and nice to use. This is a good option for repository-specific things that don't need centralization.
Advantages:
Disadvantages:
Example use cases:
Other CI systems
Ideally, we focus on these three and share as much as possible between them. The more CI systems we have, the more overhead there is for engineers, and particularly new contributors, to understand.
On this topic, this proposal specifically calls for dropping Travis usage.
CI types
Our different repositories
Each type has different requirements and tolerances.