[Bug v2.0.0] Volume Topology Requirements do not conform to CSI spec #333
Comments
Revert the topology segment label used for new volumes to `csi.hetzner.cloud/location`. Adding the new label was a mistake and leads to incompatibility with the CSI spec (and Nomad). We plan to fully revert the changes, but that will require user intervention to fix all volumes created with the new label. By changing the label used on new volumes, we can limit the number of volumes that need to be fixed for users that already upgraded to v2.0.0. See issue #333 for details.
Version Users of |
Revert all changes we made to our reported topology for nodes and volumes. Because we report two segments on the `NodeGetInfo` call, but only one of them on `CreateVolume`, we are not compliant with the CSI spec. This did not matter for Kubernetes, because the scheduler still worked, but it breaks Nomad. As we are not compliant with the CSI spec, we decided to revert these changes, even though it will require user intervention to fix volumes created in the meantime. For details see issue #333.
I'm confused by the recommendations. If 2.0.1 is supposed to be the first good, fixed version of 2.x, would it be fine to use on a new cluster that installs the driver for the first time? If I want to start a new cluster today, what should stop me from installing 2.0.1?
Nothing blocks you, but that version still does not follow the CSI spec, so we do not recommend using that version for a new cluster, only if you are using Version |
2h is great. Many thanks. I thought it might take another few days or more. For me this is great news; I intended to have it ready for next week.
Also removed section about cluster-autoscaler fix, as we reverted that in v2.1.0, see #333 for details.
Version New users and users of Users of |
The guideline to fix If you have any further questions, feel free to comment here or contact the Hetzner Cloud support. |
Summary
Topology labels added to new volumes in v2.0.0 do not conform to the CSI spec.
We plan to fix this ASAP, but the change will require manual fixing of affected `PersistentVolumes`. See the section Recommended Actions to learn how to proceed.
This issue will be updated as the situation progresses. I recommend subscribing to the issue to receive notifications.
Recommended Actions
`v2.0.0` was installed, reference the wrong topology label. They will still work with `v2.1.0+`, but we recommend fixing the labels.
Background
The CSI Spec supports topology requirements for volumes; this happens in two steps:
`NodeGetInfo` call
`CreateVolume` call), the driver then chooses which of the supplied topologies the volume will serve and returns those in the response.
For version 2.0.0 of the CSI driver we made a change to the topologies, to fix an issue with cluster-autoscaler and to use a standardized label, see #302 for details of this change:
The change was validated with Kubernetes and everything worked as we expected. It turns out that this does not strictly adhere to the CSI spec, because the created volume has an `accessible_topology` that does not match the node's `accessible_topology`. The `csi.hetzner.cloud/location` label is missing.
The `accessible_topology` segments are combined using AND in the spec, so theoretically this should not be schedulable in Kubernetes. In practice the Kubernetes scheduler allows this, and the volume can still be attached. Because of this we did not find the bug prior to release.
The Nomad scheduler, on the other hand, is strict and only allows scheduling to happen if all topology constraints from the volume match the ones of the node.
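To make the mismatch concrete, here is a minimal sketch of a strict topology comparison. The two segment keys are taken from this issue; the location value `fsn1`, the use of the location name as the region value, and the `strict_match` helper are assumptions for illustration only. This is not Nomad's actual implementation, just one way a scheduler could require the volume's segments to match the node's.

```python
# Illustrative sketch only. Segment keys come from the issue text; the
# value "fsn1" and the strict-equality rule are assumptions.
LOCATION_KEY = "csi.hetzner.cloud/location"
REGION_KEY = "topology.kubernetes.io/region"


def strict_match(volume_segments: dict, node_segments: dict) -> bool:
    """Hypothetical strict check: both segment maps must be identical."""
    return volume_segments == node_segments


# v2.0.0: the node reports two segments, the created volume only one.
node_v200 = {LOCATION_KEY: "fsn1", REGION_KEY: "fsn1"}
volume_v200 = {REGION_KEY: "fsn1"}

# After the revert: node and volume both report only the location segment.
node_reverted = {LOCATION_KEY: "fsn1"}
volume_reverted = {LOCATION_KEY: "fsn1"}

print(strict_match(volume_v200, node_v200))          # False
print(strict_match(volume_reverted, node_reverted))  # True
```

Under this assumed rule, the v2.0.0 volume is rejected because its segment map is missing `csi.hetzner.cloud/location`, while the reverted topology matches again.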
While we could implement a dirty hack to make this work in Nomad, we want to do the right thing and conform to the CSI spec. For this, we will revert the changes from #302 and go back to only using the topology constraint `csi.hetzner.cloud/location`.
All `PersistentVolumes` created with v2.0.0 of the csi-driver reference the `topology.kubernetes.io/region` label, which will not be reported by the CSI driver in v2.1.0+. Because of the way Kubernetes handles scheduling, the volumes will still be usable in `v2.1.0+` of the driver. We will provide a guide to fix the topology labels for these volumes ASAP.
Plan
This plan tracks the progress of individual steps to deal with the issue. The details of this plan might change.
`v2.0.0` release to point users to this issue
`v2.0.1` (fix: invalid topology label on new volumes #333 #334)
`v2.0.0`, and not for any new volumes afterwards.
`v2.1.0` (fix: revert invalid topology changes #333 #335)
`v2.0.0` still work with this version, so anyone may upgrade
`v2.0.x` to fix volumes with broken topology requirements.