Support for online upgrade #150
Merged
Conversation
This is prep for online upgrade. This adds a new status message that we will use to report the phase that the upgrade is in. This also has a fix that will retry status changes for any transient error. You may have seen the error I'm talking about -- it usually complains about an update failing because we are not using the latest copy of the object.
This pulls out some functions that will be shared with the new online image change reconciler. A new Go struct, ImageChangeInitiator, was created to handle the common logic. This also adds new parameters to the CR to control what type of upgrade you want. No reconciler has been added yet for online image change.
This sketches out the flow for the online image change. It is still not functional, but it lays out the structure of the code that I will fill in on subsequent PRs. This also does more refactoring: there was additional logic in offlineimagechange_reconcile.go that I wanted to reuse, so it was moved into ImageChangeInitiator, and that struct was renamed to ImageChangeManager.
The online-upgrade process needs more information about the subcluster. This introduces a new data structure, SubclusterHandle, that has the Subcluster struct that is stored in etcd plus additional runtime info that is needed for the online-upgrade process. There are two parts to this change. First is fetching the additional information through sc_finder. The second is flowing the new SubclusterHandle through various functions. In a subsequent PR, I will start to fill out some of the functions in onlineupgrade_reconcile.go using the data collected in SubclusterHandle.
This is another PR for online-upgrade. It handles creation and removal of the standby subcluster during the online image change process.
- New state was added to vapi.Subcluster for this. Originally I was planning to keep most of this in the SubclusterHandle struct, but we already pass around vapi.Subcluster, so it was easier to have it there.
- New status conditions for offline and online image change. These are intended to be used by the operator to know which image change to continue with once an image change has started.
- Filled out more of the logic in onlineimagechange_reconciler.go. It will scale out a new standby subcluster for each primary, then scale them down when we are finishing the image change.
- Moved more logic into imagechange.go that is common between online and offline image change.
- Restart logic was changed to allow an option to restart read-only nodes. When restarting for online, we will skip the read-only nodes. Offline restarts everything.
…128) This adds manipulation of the Service objects during an online image change. It will route client traffic to the standby while we are upgrading, then reroute the traffic back to the original subclusters when completing the upgrade. As a side effect, this also adds the ability for multiple subclusters to share the same Service object, which has benefits outside of the online image change process.
Use the term transient instead of standby. This also removes the SubclusterHandle struct.
This is the next set of changes for online image change. It will route to a temporary subcluster, called a transient, so that client connections land on an up node. It will automatically route back to the original subcluster once that subcluster is back up. It will also process secondary subclusters so that they are brought back up. This includes some rework in the onlineimagechange_reconciler to cut down on the amount of code duplication. Added the ability to specify a template for the transient subcluster; it will get created when the image change starts and cleaned up when the image change is done.
…#133) This is the next set of changes for online image change. The CR parm TransientSubclusterTemplate was renamed to TemporarySubclusterRouting. The parm can be used to provide a template of a subcluster to use for temporary routing while subclusters are down, or it can be used to name an existing subcluster. The latter option may be useful to those who just want to reuse existing subclusters. This PR also cleans up how we route traffic to the subclusters: we previously relied on a 'transient' label, but now we route by service name or subcluster name.
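As a sketch of how this parm might be used -- the template and names sub-field names are assumptions based on the description above, and the subcluster values are made up:

```yaml
spec:
  temporarySubclusterRouting:
    # Provide a template for a subcluster that is created for temporary
    # routing while the real subclusters are down (sub-field name assumed).
    template:
      name: transient
      size: 1
      isPrimary: false
    # Or point at an existing subcluster to reuse for routing instead of
    # creating a new one (sub-field name assumed):
    # names: ["sc2"]
```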
This fills in the Status.ImageChangeStatus message as we progress through the online image change. It also reworks the status message updates we do for offline image change to share the same infrastructure. I'm also including a change to keep the ssh key in the Vertica container stable between builds. The ssh key is just used for communication between the Vertica nodes. A stable key allows Vertica nodes from a different container image to be able to communicate. This becomes an issue when doing an online image change because the Vertica pods are either running the old or new image, yet they all have to talk to each other.
This adds e2e tests for online upgrade. I added a new directory (e2e-11.1) since we cannot run these yet in our GitHub CI. This directory will contain all of the e2e tests that must run on a Vertica server 11.1 or higher. We can fold these into the main e2e directory after 11.1 is GA'd. Also, we use another Vertica image in the e2e tests called BASE_VERTICA_IMG; the online upgrade tests will change the image from BASE_VERTICA_IMG to VERTICA_IMG. During the testing, a few issues were found that I have fixes for:
- ObjReconciler, DBAddNodeReconciler and DBAddSubclusterReconciler will work only on the transient subcluster. We previously added the transient to the VerticaDB, then ran these reconcilers as-is. But that can pick up other changes that interfere with the upgrade -- namely scale-out changes. So I no longer update the VerticaDB with the transient, and instead run the reconcilers with just the transient subcluster.
- Avoid creating the transient if the cluster is down. We need the cluster to be up to create the transient, so that entire part is skipped if the cluster is down.
- The restart reconciler will avoid restarting pods for the transient subcluster. This subcluster intentionally stays on the old image, and there is no way to restart a node on the old image once the primaries are already updated to the new image.
This adds rules to the webhook for online image change:
- Prevent changes to upgradePolicy when an image change is in progress.
- Transient subcluster template: isPrimary == false, the name cannot be an existing subcluster, and size > 0 if a name is present.
- The transient subcluster cannot be added/removed during an online image change.
- If multiple subclusters share a ServiceName, the service-specific settings must be common between them (serviceType, NodePort, externalIPs, etc.).
- When running AT start_db, we need to run it from one of the primary nodes. It won't work if we try from a read-only node that isn't being restarted.
- When calling re_ip, use the --force option. This option is new in 11.1.0, so we needed conditional logic to know when we can use it.
- When calling start_db, we use the host list. This option first exists in 11.0.1, so like re_ip, we needed conditional logic to know when we can use it.
One thing that isn't related to the title of this PR is some new logic needed in DBAddNodeReconciler. That reconciler will now requeue if some pods aren't yet ready. This was needed so that the upgrade properly waits for the transient subcluster to scale out. Prior to this change, it was possible for the image change to go ahead and restart the primaries before the transient was up. This should be solved now.
This adds drain logic so that we wait for active connections to disappear before taking down a subcluster. I added finer-grained messaging for imageChangeStatus so that we will have a clear idea when it is waiting for the drain of a particular subcluster. This change involves sorting the output from sc_finder, which was necessary to match up the status messages with the order in which we process the subclusters. Also included is a fix that waits for the transient pod to be in a ready state. There was a small timing window where we started to route client traffic to the transient before it was ready; the ready probe runs every 10 seconds, so there was a window where Vertica was up but k8s didn't yet know about it. A new e2e test was added to make sure draining works for the primary and secondary subclusters.
This adds upgrade logic for VerticaDBs created by older versions of the operator. We changed the selector label for pods in the sts. The selector label is immutable, so in order to upgrade to the 1.3.0 release, the sts and their pods need to be destroyed. This is handled automatically by a new reconcile actor. This means that when upgrading to the 1.3.0 release, any running Vertica instance will be stopped and then restarted, since deleting the sts causes its pods to go away. A new GitHub workflow was added so that we can exercise operator pod upgrades going forward.
This adds checking in the operator to ensure the proper upgrade path is chosen. It will catch attempts to skip released versions and prevent downgrades. A backdoor was added to the CR for those who don't want this behaviour: simply set .spec.ignoreUpgradePath to true.
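For example, opting out of the check looks something like this in the CR (only the ignoreUpgradePath field comes from the description above; the rest of the spec is omitted):

```yaml
spec:
  # Skips the upgrade-path validation, allowing version skips or downgrades.
  ignoreUpgradePath: true
```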
This changes the default behaviour for temporarySubclusterRouting. It now defaults to picking existing subclusters rather than creating a transient subcluster.
This adds support for online upgrade of the Vertica server and allows multiple subclusters to share the same Service object.
To initiate an upgrade of a Vertica cluster in Kubernetes, you simply change the name of the container image in the CR. Prior to this change, we had to drive an offline upgrade because Vertica didn't support mixed versions running at the same time. Starting in Vertica 11.1.0 (GA in Feb 2022), a mix of versions is supported. Vertica allows this by forcing nodes running the older version into read-only mode – only queries can be run on such a node, no DML or DDL. We are taking advantage of this feature to implement an online version of an upgrade.
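As a rough sketch, kicking off the upgrade is just an edit to the image field in the CR; the repository and tag below are made up for illustration:

```yaml
spec:
  # Changing this from the image that is currently running to a new one
  # initiates the upgrade. With 11.1.0 or later on both sides, the
  # operator can drive the upgrade online.
  image: vertica/vertica-k8s:11.1.1-0
```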
New externals were added to the CRD to support this mode:
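Based on the commit summaries above, the new spec fields include at least the following; the upgradePolicy values and the names sub-field shown here are assumptions:

```yaml
spec:
  # Chooses how the operator performs the image change (online vs. offline;
  # an automatic mode is assumed).
  upgradePolicy: Online
  # Controls where client traffic is routed while subclusters are down:
  # either a template for a transient subcluster or existing subclusters
  # to reuse.
  temporarySubclusterRouting:
    names: ["sc2"]
  # Backdoor to skip the upgrade-path validation.
  ignoreUpgradePath: false
```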
New status fields are provided that allow the upgrade to be monitored. A summary of the new fields is as follows:
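The only status messaging named in the commit summaries above is the imageChangeStatus field, plus the new image change conditions; a sketch of how the reported status might look, where the condition type name is an assumption:

```yaml
status:
  # Free-form phase message that is updated as the upgrade progresses,
  # e.g. while draining a particular subcluster.
  imageChangeStatus: "Draining subcluster sc1"
  conditions:
    - type: OnlineImageChangeInProgress   # assumed condition name
      status: "True"
```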
The Vertica server needs to be on version 11.1 for this to work. This includes both the current version that we are running and the new version we are upgrading to. This means online upgrade won't be usable until after the next release; for instance, you need to already be on 11.1.0 and be upgrading to 11.1.1.
Prior to this PR, we had a one-to-one mapping of subclusters to Service objects. This meant that a single Service object could only direct traffic to a single subcluster. As part of the work for online upgrade, we needed the ability for one Service object to direct traffic to multiple subclusters. We are describing this feature in a separate section to call it out, as it has uses that are not specific to online upgrade.
When defining a subcluster in the CR, a new field was added called serviceName. When the operator reconciles the subclusters in the CR, it will create a Service object using the name specified in serviceName. If you want multiple subclusters to share the same Service object, use the same name for each. The default behaviour is for each subcluster to have its own Service object; we do this by using the name of the subcluster as the serviceName.
Here is a sample CR, where multiple subclusters share the same service object:
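A minimal sketch along these lines, where the apiVersion, subcluster names, sizes, and image tag are assumptions and only the shared serviceName is the point:

```yaml
apiVersion: vertica.com/v1beta1
kind: VerticaDB
metadata:
  name: sample
spec:
  image: vertica/vertica-k8s:11.1.0-0
  subclusters:
    - name: sc1
      size: 3
      serviceName: connections   # both subclusters share this Service
    - name: sc2
      size: 3
      serviceName: connections
```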
Once created, a Service object named sample-connections will exist that routes connections between the two subclusters.