[WIP] Bug 1850057: update etcd followers first, use bfq on control plane #1946
Conversation
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: cgwalters. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
Force-pushed from 692c5ee to 3dd8b0a.
OK, a while ago I'd invested some time in tweaking the node controller to have useful logs around what it's doing; my first "point of contact" when looking at upgrades was its logs. But...we lose most of those on upgrade since the pod gets killed, so trying to determine success for this PR was annoying. Ended up reworking things here so the node controller emits events - currently the MCD emits useful events which can be queried afterwards (in our CI runs we dump `events.json`).
Force-pushed from a1a3f0d to 3a0a35e.
Wondering how this squares with @ericavonb's comment:
Since there have been huge efforts to decouple MCO/etcd in 4.5 and move to the etcd operator?
OK yeah I think the events are useful, e.g. with this PR I can do:
EDIT: watch me learn jq live!
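A sketch of the kind of query this enables (the namespace and the fields picked out are illustrative assumptions, not the exact command from the comment):

```bash
# Illustrative: pull the MCO-related events out of the live cluster and
# print them as a rough timeline.
oc get events -n openshift-machine-config-operator -o json \
  | jq -r '.items[] | "\(.lastTimestamp) \(.source.component) \(.reason): \(.message)"'
```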
It sounds like you're suggesting a different tactic, not a different strategy, correct? In other words, the strategy is ensuring the MCO upgrades etcd followers first; the tactic is how that's implemented. I totally agree that the way we're talking to etcd directly here is hacky, but the debate for that is over here:
Using Promecieus and looking at |
Force-pushed from 3a0a35e to 8e61010.
@cgwalters: This pull request references Bugzilla bug 1850057, which is valid. 3 validation(s) were run on this bug
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Force-pushed from 34b9bf6 to bc1d70b.
We had an event when we were starting an OS update, but nothing when it was completed - one could implicitly get that by looking at the next event, but that's a bit fragile. And since then we started doing a lot more stuff with the OS, so let's add an event emitted before and after all OS changes so we can consistently get e.g. timing information about it. Relates to openshift#1962 around getting better data about timing during upgrades.
A while ago I'd invested some time in tweaking the node controller to have useful logs around what it's doing; my first "point of contact" when looking at upgrades was its pod logs. But...we lose most of those on upgrade since the pod gets killed. Add events to the node controller too. Currently the MCD emits useful events which can be queried afterwards (in our CI runs we dump `events.json`). With this we can create a "journal/history" for upgrade/update events just by querying the event stream.
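For example, a query along these lines can turn the dumped `events.json` into a rough upgrade timeline (the reason filter is an illustrative guess, not the exact set of reasons the MCD/controller emit):

```bash
# Illustrative: build an upgrade timeline from the events.json that CI dumps.
# The reason pattern is a placeholder for the OS-update related events.
jq -r '.items[]
       | select(.reason | test("OSUpdate|Drain|Reboot"; "i"))
       | "\(.firstTimestamp) \(.involvedObject.name) \(.reason): \(.message)"' events.json
```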
Part of solving openshift#1897. A lot more details are in https://hackmd.io/WeqiDWMAQP2sNtuPRul9QA. The TL;DR is that the `bfq` I/O scheduler better respects I/O priorities, and also does a better job of handling latency-sensitive processes like `etcd` versus bulk/background I/O.
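For reference, switching a block device to `bfq` comes down to the kernel's sysfs knob; this just shows the interface involved, not how the PR wires it up (the device name is illustrative):

```bash
# Show the available schedulers for a device; the active one is in brackets,
# e.g. "mq-deadline kyber [none] bfq".
cat /sys/block/sda/queue/scheduler
# Switch the device to bfq (requires root; sda is an illustrative device name).
echo bfq > /sys/block/sda/queue/scheduler
```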
We switched rpm-ostree to do this when applying updates, but it also makes sense to do when extracting the oscontainer. Part of: openshift#1897, which is about staging OS updates more nicely while etcd is running.
The way we're talking to etcd is a bit hacky; I ended up cargo-culting some code. This would be much cleaner if the etcd operator did it. But it's critical that we update the etcd followers first, because leader elections are disruptive events and we can easily minimize that. Closes: openshift#1897
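The follower-first ordering only needs to know which member is currently the leader; the same information is visible from the command line via `etcdctl` (endpoints are placeholders, cert flags omitted):

```bash
# Illustrative: find the current leader so its node can be updated last.
# Endpoint addresses are placeholders for the control plane members.
etcdctl --endpoints=https://10.0.0.1:2379,https://10.0.0.2:2379,https://10.0.0.3:2379 \
  endpoint status -w table
# The "IS LEADER" column marks the member whose node should go last.
```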
Force-pushed from bc1d70b to 9c6b844.
@cgwalters: The following tests failed, say `/retest` to rerun all failed tests:
Full PR test history. Your PR dashboard.
I just noticed that one of the Azure runs above claimed success, but actually didn't upgrade the OS - it seems the MCO was reporting success, but clearly the MCC and MCD were still working. It's possible this bug is somehow introduced by this PR...going to go look at some existing e2e-upgrade jobs.
EDIT: See #1991
Updated some commentary in #1957. The argument for merging this one (etcd leader awareness) is basically: a useful baseline metric for disruption is the etcd leader election count, and if OS upgrades might randomly force that anywhere between 1-3 times, success becomes harder to measure. But it probably does make sense to do #1957 first (also get openshift/cluster-etcd-operator#418 in), measure a bit, then experiment with this on top.
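For reference, a sketch of how that baseline might be pulled from the in-cluster Prometheus (the route and token handling are placeholders; the metric name comes from upstream etcd):

```bash
# Illustrative: count etcd leader elections over the upgrade window.
# PROM_HOST and TOKEN are placeholders for the Prometheus route and a bearer token.
curl -sk -H "Authorization: Bearer ${TOKEN}" \
  "https://${PROM_HOST}/api/v1/query" \
  --data-urlencode 'query=increase(etcd_server_leader_changes_seen_total[1h])'
```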
Re: testing, I commented on the issue (#1897 (comment)) but maybe should have commented here. Cross-referencing...
Per #1897 (comment) we will focus on etcd latency metrics and not leader elections - so it's not worth carrying the code today to have "etcd leader awareness" for node upgrades.
@cgwalters: This pull request references Bugzilla bug 1850057. The bug has been updated to no longer refer to the pull request using the external bug tracker. In response to this:
This PR is now replaced by #1957 |
This PR rolls up a few others, including notably:
Then adds: WIP: Update etcd followers first
The way we're talking to etcd is a bit hacky; I ended up
cargo-culting some code. This would be much cleaner if the etcd
operator did it.
But it's critical that we update the etcd followers first, because
leader elections are disruptive events and we can easily minimize
that.
Closes: #1897
Updated release images for testing (v2)
I think the previous results were misleading/broken due to using the same release image as the base; new images:
Test upgrade from registry.svc.ci.openshift.org/coreos/walters-mco-upgrade-release@sha256:25deaaf3074cf984f352ab92fca5f61c4663c9b506ce05c50db8f723a5b386b7 to registry.svc.ci.openshift.org/coreos/walters-mco-upgrade-release-target@sha256:1ad5e2fb2df0e4af4abd3c1e59461a7977924201eb0eada524ce7af34d7ac1c4
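If anyone wants to reproduce, the target image can be fed to the CVO along these lines (a sketch assuming the standard `oc adm upgrade` flow; `--force`/`--allow-explicit-upgrade` are needed because CI release images are unsigned and referenced by digest):

```bash
# Illustrative: point the cluster at the target release image built for this PR.
oc adm upgrade --allow-explicit-upgrade --force \
  --to-image=registry.svc.ci.openshift.org/coreos/walters-mco-upgrade-release-target@sha256:1ad5e2fb2df0e4af4abd3c1e59461a7977924201eb0eada524ce7af34d7ac1c4
```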