Support zero-downtime upgrading for the Trident node plugins #740
Comments
For your reference, here are the reproduction steps for this issue.
@gnarl In this situation, unfortunately, during the removal of Trident the trident-csi (node-plugin) pod remained stuck in a Terminating state. In our actual failure cases it took many hours (4-5 hours) from the time of failure to the time of recovery. We want to enhance the upgrade from delete-and-reinstall to a rolling update of Trident (see the right side of the figure).
Hi @ysakashita, thank you for this explanation of the outage you've experienced. This helps to clarify the situation your customer experienced. Our team has examined the situation and we don't believe there is a better immediate workaround than monitoring the Trident DaemonSet pods to determine whether one is stuck in the Terminating state.

The team understands the need to support rolling upgrades for the Trident DaemonSet based on your explanation. There are additional changes that need to be made to properly handle upgrading from N previous versions of Trident. This enhancement will need to be prioritized for a future Trident release.
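As a stopgap, a check along these lines can flag a node-plugin pod that has been asked to delete but has not gone away. This is only a sketch: the `trident` namespace and the `app=node.csi.trident.netapp.io` label selector are assumptions that may not match your installation, and it relies on `jq` being available.

```
# Minimal sketch (assumed namespace and label selector): list Trident node-plugin
# pods that have a deletionTimestamp set, i.e. pods that are stuck terminating.
kubectl get pods -n trident -l app=node.csi.trident.netapp.io -o json \
  | jq -r '.items[] | select(.metadata.deletionTimestamp != null) | .metadata.name'
```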
Trident 23.07.01 has been released with a fix for this issue. The fix is also present in Trident 23.10.

Closing the issue.
Describe the solution you'd like
We would like the trident operator to upgrade the Trident node plugins without downtime.
The trident operator deletes the Trident DaemonSet when updating the Trident version. This causes downtime for mounting and unmounting until the new DaemonSet pods become ready.
It becomes a serious issue when one of the plugin pods cannot be deleted for some reason. Because the trident operator deletes the DaemonSet with the foreground option, it does not create a new DaemonSet until all plugin pods have been deleted. So if even a single Trident pod cannot be deleted, no node in the cluster can mount Trident volumes.
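For illustration, the foreground cascading delete described above is roughly equivalent to the following; the `trident` namespace is an assumption here.

```
# Foreground cascading delete: the DaemonSet object is only removed once every
# dependent pod is gone, so a single stuck pod blocks the subsequent re-create.
# Namespace is an assumption; adjust to match your install.
kubectl -n trident delete daemonset trident-csi --cascade=foreground
```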
I understand the foreground deletion is there to fix issues like #444 and #487. Is it possible to patch the DaemonSet instead of deleting it in order to recreate the pods? I think that patching the DaemonSet with a dummy annotation, as
kubectl rollout restart ds
does, lets the DaemonSet controller perform a rolling update without downtime.
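For reference, a minimal sketch of what such a patch could look like, assuming the node-plugin DaemonSet is named trident-csi and lives in a trident namespace (both assumptions). This mirrors what `kubectl rollout restart` does: stamping a `kubectl.kubernetes.io/restartedAt` annotation into the pod template so the DaemonSet controller rolls the pods one by one instead of the whole DaemonSet being deleted and recreated.

```
# Minimal sketch of the dummy-annotation approach (assumed namespace/name):
# patching the pod template annotation triggers a rolling update of the pods.
kubectl -n trident patch daemonset trident-csi --type merge -p \
  "{\"spec\":{\"template\":{\"metadata\":{\"annotations\":{\"kubectl.kubernetes.io/restartedAt\":\"$(date -u +%Y-%m-%dT%H:%M:%SZ)\"}}}}}"
```

With the apps/v1 default RollingUpdate strategy, the controller replaces pods gradually rather than all at once, so nodes keep a working node plugin during the upgrade.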