
How to migrate from csi-driver 0.3 -> csi-driver csi-1.0 #296

Closed
sbskas opened this issue Mar 30, 2019 · 29 comments

Comments

@sbskas
Contributor

sbskas commented Mar 30, 2019

Since the driver name changed and PVs are tagged with the driver name, how does one migrate already-created PVs from the pre-CSI-1.0 version to the new version?

Changing the driver in the PersistentVolume fails with:
"# persistentvolumes "pvc-638476e6264211e9" was not valid:

* spec.persistentvolumesource: Forbidden: is immutable after creation"

What's the plan, then, to migrate smoothly from csi-rbdplugin to rbd.csi.ceph.com?

@Madhu-1
Collaborator

Madhu-1 commented Apr 1, 2019

@ShyamsundarR @humblec do we support migration?

@ShyamsundarR
Contributor

@sbskas Currently the PV naming scheme in 1.0 is undergoing more changes, so the naming will differ from what was present in v0.3 of the Ceph-CSI implementation. As the PV is immutable after creation, a direct upgrade from v0.3 to v1.0 is not feasible.

We also have to analyze the breaking changes from the perspective of the CSI spec itself to understand what else may break on such upgrades.

As of now we do not have any migration steps to move from 0.3 to 1.0. Possibilities are:

  • PV to PV migration of data, where both instances of the CSI plugin exist, and a pod can use the older and newer PVs to copy out the data.
  • If Kubernetes allows it, change the PV names in the backend (rbd/cephfs), create the required omaps, and change the VolID as Kubernetes knows it (this is speculative at best for now)
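The first possibility could look roughly like the following: a one-shot Pod that mounts an old-driver PVC and a new-driver PVC and copies the data across. This is a hypothetical sketch, not a tested procedure. The helper name `copy_pod_manifest`, the `busybox` image, the `migrate-<pvc>` Pod name and the `/old` and `/new` mount paths are all assumptions. It also assumes that both PVCs live in the same namespace and that the workload is stopped so the old PVC can be mounted.

```shell
#!/bin/sh
# Hypothetical helper: print a one-shot Pod that copies data from an
# old-driver PVC to a new-driver PVC. Image, name and paths are assumptions.
# Args: NAMESPACE OLD_PVC NEW_PVC
copy_pod_manifest() {
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: migrate-$2
  namespace: $1
spec:
  restartPolicy: Never
  containers:
    - name: copy
      image: busybox
      # Copy everything, preserving permissions and timestamps.
      command: ["sh", "-c", "cp -a /old/. /new/"]
      volumeMounts:
        - name: old
          mountPath: /old
        - name: new
          mountPath: /new
  volumes:
    - name: old
      persistentVolumeClaim:
        claimName: $2
    - name: new
      persistentVolumeClaim:
        claimName: $3
EOF
}
```

For example, `copy_pod_manifest default data data-latest | kubectl create -f -` would start the copy. Check that the Pod completed successfully before deleting the old PVC.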

@gman0 @rootfs thoughts?

@sbskas
Contributor Author

sbskas commented Apr 3, 2019

@ShyamsundarR, naming is not an issue. A PV has all the information it needs to work by itself, even after StorageClass deletion, as long as the finalizer, provisioner, and attacher are able to process it. My main point is indeed that PVs are immutable, and so are PVCs.
Could the driver take charge of both kinds of PV (csi-rbdplugin + rbd.csi.ceph.com)? That would be a good starting point. Then implement some kind of migration, maybe using kubernetes-csi-migration-library.
That would be a thrill. So far there's only a reference implementation for GCE, but maybe rbd <-> csi-rbd would be doable?

@ShyamsundarR
Contributor

> @ShyamsundarR, naming is not an issue. A PV has all the information it needs to work by itself, even after StorageClass deletion, as long as the finalizer, provisioner, and attacher are able to process it. My main point is indeed that PVs are immutable, and so are PVCs.
> Could the driver take charge of both kinds of PV (csi-rbdplugin + rbd.csi.ceph.com)? That would be a good starting point. Then implement some kind of migration, maybe using kubernetes-csi-migration-library.
> That would be a thrill. So far there's only a reference implementation for GCE, but maybe rbd <-> csi-rbd would be doable?

The mentioned repository is now updated to point to this one instead: https://github.com/kubernetes/kubernetes/tree/master/staging/src/k8s.io/csi-translation-lib

@ShyamsundarR
Contributor

@kfox1111 @sbskas

Question: suppose we make the following support statement for volumes (and snapshots) created using the existing 0.3 and 1.0 versions of the Ceph-CSI drivers (RBD and CephFS). Would it be sufficient to address any concerns with plugins already in use?

For volumes created using the 0.3 and existing 1.0 versions of the Ceph-CSI plugins, the following actions would be supported by a future version of the plugin:

  • DeleteVolume
  • DeleteSnapshot
  • NodePublishVolume (IOW, mounting and using the volume for required IO operations)

And, the following would be unsupported:

  • CreateVolume from snapshot source which is from an older version
  • CreateSnapshot from volume source which is from an older version

@kfox1111
Contributor

I think there may be multiple things here...

Is this issue specifically about 0.3 -> 1.0?

The driver rename happened post-1.0. I know because I got hit by it: I manually burned down all my 0.3 volumes, deployed 1.0 fresh, and then the driver rename happened.

Then there's the issue where they want to drop configmaps and push that info into Ceph directly (I support this).

But that's already potentially two issues, post-1.0, where things are breaking or close to breaking.

I'm kind of OK not having a migration path from 0.3 to 1.0, as it doesn't affect me anymore; I already took that hit. But post-1.0 that is a problem. We guarantee some stability, so we should provide some kind of migration path or ongoing support.

For the configmap -> Ceph migration, I'd be OK with some out-of-band tool that dumped the configmaps and loaded them into Ceph. Then the code doesn't have to stick around in the driver forever. Make it a migration step.

@sbskas
Contributor Author

sbskas commented Apr 25, 2019

For me, I think it should be enough.
Just one question, though: why not mutate the PV object to the new format and manage it like objects created with the new driver?

@kfox1111
Contributor

PVs can't be edited after they are created; the API server blocks changes.

@kfox1111
Contributor

I can see 3 possible solutions to that:

  1. The Kubernetes storage team provides an option to allow "unsafe" API editing of PVs.
  2. Try a procedure like: shut down the driver, back up all relevant PVs, delete them, update the driver name in the backups, and import from the backups.
  3. Shut down the API server and tweak the stored objects directly in etcd.

Option two might be the most expedient and safe.

@ShyamsundarR
Contributor

> I can see 3 possible solutions to that:
>
> 1. The Kubernetes storage team provides an option to allow "unsafe" API editing of PVs.

I would not vote for the above; doing it just for this plugin seems like a lot of work.

> 2. Try a procedure like: shut down the driver, back up all relevant PVs, delete them, update the driver name in the backups, and import from the backups.

Wouldn't the PV delete step move it to a "pending" state if the driver is not up, and hence not really delete the PV until the driver is back up again, at which point the real volume on Ceph gets deleted as well?

To prevent this, would an option in the 0.3/1.0 versions of the plugin that fakes a delete work? IOW, respond with success for a delete, but in reality not delete anything in the Ceph cluster? (Just a thought, as I attempt to run these steps on a test cluster.)

> 3. Shut down the API server and tweak the stored objects directly in etcd.
>
> Option two might be the most expedient and safe.

@kfox1111
Contributor

Yeah. Maybe an extra step is needed: update the PVs' state so that they get deleted. If the driver isn't running, it's probably OK?

@JohnStrunk

Another migration option would be to swap things around directly on the Ceph back end...

Assume: both 0.3 and 1.0 can coexist during the migration.

Starting with a 0.3 PVC/PV pair, pvc/myvol and pv/myvol:

  • Provision a new, empty volume via the 1.0 provisioner: pvc/myvol-1.0 and pv/myvol-1.0.
  • By looking at the metadata, it should be possible to figure out both the original (pv/myvol) RBD volume name and the new (pv/myvol-1.0) RBD volume name.
  • Delete the new RBD volume (directly on the back end).
  • Clone the old RBD volume into that new name.
  • (Flatten, maybe?)
  • Delete the original pvc/myvol via Kubernetes. This will cascade through and clean up the rest of the 0.3 resources.

The original data should now be visible in the new PVC: pvc/myvol-1.0.

If you really must have the same PVC name, either CSI clone it back into pvc/myvol or do the above steps again.

@ShyamsundarR
Contributor

ShyamsundarR commented Apr 26, 2019

Ran the above procedure on a test cluster, and it works as required (thanks @JohnStrunk ).

@kfox1111 and @sbskas, this would be the way to migrate without leaving older PVCs with older metadata around for a long time, and without having to copy out the data. Let me know your thoughts, and we can possibly coordinate on scripts that can help achieve the same.

Here are some more intermediate steps that can help clarify the procedure better:

  • Install and configure latest 1.0 CSI plugins into namespace "latest-ceph-csi" with driverName <name>.rbd.csi.ceph.com

    • say latest is the stateless version from master at some point in the future
    • also ensure that the socket directory is changed to reflect the driver name as above
  • Remove older 0.3/1.0 storage class
    (Stops further PV creation using the older plugin)

  • Create storageclass pointing to the "latest" instance of the CSI plugin, if needed with the same name as the just deleted storageclass

  • Gather data from existing PVCs

    • kubectl get -A pvc -o jsonpath='{range .items[*]}{@.spec.storageClassName}{","}{@.metadata.name}{","}{@.spec.volumeName}{"\n"}{end}' | grep -E "^${STORAGECLASSNAME}," > existing-pvcs.txt
    • Each line would be a comma separated <storageclass-name,pvc-name,image-name>
      • Where image-name is the name of the RBD image backing the PVC in 0.3/1.0 versions of the CSI plugin
  • Create new PVCs for the above list of existing PVCs (named, say, with a "-latest" suffix)

    • Ideally, size would also need to be gathered above to create PVCs of the same size
  • Gather data for new PVCs

    • Get the <storageclass-name,pvc-name,volume-name> (as before)
      • kubectl get -A pvc -o jsonpath='{range .items[*]}{@.spec.storageClassName}{","}{@.metadata.name}{","}{@.spec.volumeName}{"\n"}{end}' | grep -E "^${STORAGECLASSNAME}," > new-pvcs-tmp.txt
    • Further, get the Volume handle and hence the rbd image name, from the backing PV for each PVC like so,
      • for i in $(cat new-pvcs-tmp.txt); do pv=$(echo $i | cut -d ',' -f 3); imgSuffix=$(kubectl get pv $pv -o jsonpath='{@.spec.csi.volumeHandle}{"\n"}' | cut -d '-' -f 5-); j=$(echo $i | cut -d ',' -f 1,2); echo $j,csi-vol-$imgSuffix; done > new-pvcs.txt
  • Stop workloads using the PVCs under migration

  • The two lists generated above give the old and new PVCs and their corresponding image names on Ceph, which are needed for the following operations that replace the RBD image backing each new PVC:

    - rbd snap create --pool <poolname> --snap <oldpvc-volumename>-snap <oldpvc-volumename>
    - rbd snap protect <poolname>/<oldpvc-volumename>@<oldpvc-volumename>-snap
    - rbd rm <newpoolname>/<newpvc-volumename>
    - rbd clone <poolname>/<oldpvc-volumename>@<oldpvc-volumename>-snap <newpoolname>/<newpvc-volumename>
    - rbd flatten <newpoolname>/<newpvc-volumename>
    - rbd snap unprotect <poolname>/<oldpvc-volumename>@<oldpvc-volumename>-snap
    - rbd snap rm <poolname>/<oldpvc-volumename>@<oldpvc-volumename>-snap
  • Delete the existing PVCs
  • Modify the stopped workloads (pods) using the older PVCs to rename them to the new PVCs, and start the workloads
    • If this modification is intrusive, repeat the steps, as suggested above, after deleting the existing PVCs, to flip the names back
  • Delete the older 0.3/1.0 CSI plugin pods and configuration, once all PVCs are migrated
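The data-swap part of the list above can be scripted. The sketch below only prints the `rbd` commands so they can be reviewed before anything runs. It rests on several hypothetical assumptions:

- each new PVC is named `<old-pvc>-latest`;
- both list files use the `storageclass,pvc,image` format;
- `OLD_POOL`/`NEW_POOL` name the pools behind the old and new storage classes.

It also includes a variant of the volumeHandle-to-image mapping that keeps only the trailing UUID. Like the `cut` command above, it assumes that 1.0+ image names are `csi-vol-<uuid>`.

```shell
#!/bin/sh
# Sketch only: prints rbd commands for review instead of running them.
# Assumptions (hypothetical): new PVCs are named "<old-pvc>-latest", list files
# are "storageclass,pvc,image", OLD_POOL/NEW_POOL are the old/new pools.
OLD_POOL=${OLD_POOL:-rbd}
NEW_POOL=${NEW_POOL:-rbd}

# Map a 1.0+ volumeHandle to its image name. Unlike `cut -d '-' -f 5-`, this
# keeps only the last five dash-separated groups (the UUID), so it still works
# if the cluster ID inside the handle contains dashes.
image_from_handle() {
  printf '%s\n' "$1" |
    awk -F- '{ printf "csi-vol-%s-%s-%s-%s-%s\n", $(NF-4), $(NF-3), $(NF-2), $(NF-1), $NF }'
}

# Print the per-volume command sequence. Args: OLD_IMAGE NEW_IMAGE
swap_cmds() {
  old=$1
  new=$2
  snap="$old-snap"
  echo "rbd snap create --pool $OLD_POOL --snap $snap $old"
  echo "rbd snap protect $OLD_POOL/$old@$snap"
  echo "rbd rm $NEW_POOL/$new"
  echo "rbd clone $OLD_POOL/$old@$snap $NEW_POOL/$new"
  echo "rbd flatten $NEW_POOL/$new"
  echo "rbd snap unprotect $OLD_POOL/$old@$snap"
  echo "rbd snap rm $OLD_POOL/$old@$snap"
}

# Pair each old PVC with its "-latest" twin. Args: OLD_LIST NEW_LIST
plan() {
  while IFS=, read -r _sc pvc old_img; do
    new_img=$(awk -F, -v want="$pvc-latest" '$2 == want { print $3 }' "$2")
    if [ -z "$new_img" ]; then
      echo "# skipping $pvc: no matching -latest PVC" >&2
      continue
    fi
    swap_cmds "$old_img" "$new_img"
  done < "$1"
}
```

A possible workflow:

1. Set `OLD_POOL` and `NEW_POOL` in the shell.
2. Run `plan existing-pvcs.txt new-pvcs.txt > swap.sh` and review `swap.sh`.
3. Once the workloads are stopped, execute it with `sh -x swap.sh`.

Printing rather than executing keeps a human check on the destructive `rbd rm` step.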

@kfox1111
Contributor

Thanks for coming up with the procedure. looks like that was a fair amount of work.

But it is also a very long procedure that has many potential places where a mistake could be made.

I'm going to ask for advice on the k8s sig-storage channel to see what other options are available. If there were a quick way to rename the driver under the hood, that might be preferable.

@sbskas
Contributor Author

sbskas commented Apr 27, 2019

Indeed, this is what we did to migrate the PVs.
However, there are still a few roadblocks on the road:
1- We have been unable to install both drivers together. It's either 1.0 or 0.3, not both: the csinodeinfo/driver-registrar feature (or whatever it is) barfs on the old driver registration.
2- The PVs must be migrated with all the applications using them down (i.e., scaled to 0). Quite a big blackout.
3- Last, once the new PVs have been created with the old names, we saw that deleting the PVC cleans up all related resources in Kubernetes but fails to actually delete the RBD image.

Isn't there a simpler way of migrating things, like a mutating webhook to change the PV?

I was pointing to the csi-translation-library since it seems to be able to migrate old-style PVs to the new CSI without all those steps, and I was wondering how they managed to do it.
They seem to be able to mutate the nodes to the new specs in place.

@kfox1111
Contributor

@msau42 any ideas?

@msau42

msau42 commented Apr 29, 2019

The "simplest" I know of currently is to:

  1. VERY IMPORTANT: Modify all your PVs to use reclaim policy Retain!
  2. Create new copies of the PV.spec object with the new driver name. In that new copy, specify the same PV.spec.ClaimRef.Name/Namespace (but not uid).
  3. One by one, delete pod, then PVC. Recreate a new PVC (if you're not using statefulsets) with the same name. Since the newPV.spec.ClaimRef is pointing to that same PVC name, the system will automatically bind the PVC to newPV. You might be able to skip the pod deletion part if you force remove the PVC finalizer. But I haven't tried that out.
  4. Delete all the old PVs.
  5. Change your new PVs' reclaim policy back to Delete (if that is what it was previously)
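Steps 1 and 2 could be sketched as below. The helper names are hypothetical, and the functions only print a command and a manifest; nothing is applied.

- The `kubectl patch` form used for the reclaim policy is the standard one.
- `accessModes: ReadWriteOnce` is an assumption. Replace it with the old PV's value, and copy over any `csi.volumeAttributes` and secret references.
- The sketch assumes the new driver accepts the old `volumeHandle` unchanged. That stops being true once ceph-csi changes its volume ID format.

```shell
#!/bin/sh
# Sketch of steps 1-2; hypothetical helpers that only print, never apply.

# Step 1: print the command that switches an existing PV to Retain.
retain_cmd() {
  printf "kubectl patch pv %s -p '{\"spec\":{\"persistentVolumeReclaimPolicy\":\"Retain\"}}'\n" "$1"
}

# Step 2: print a replacement PV pre-bound to the same claim by name/namespace.
# Args: NEW_PV DRIVER VOLUME_HANDLE SIZE STORAGE_CLASS CLAIM_NAMESPACE CLAIM_NAME
# claimRef deliberately has no uid, so a recreated PVC with the same name binds.
# accessModes is an assumption; copy it, csi.volumeAttributes and any secret
# refs from the old PV.
new_pv_manifest() {
  cat <<EOF
apiVersion: v1
kind: PersistentVolume
metadata:
  name: $1
spec:
  capacity:
    storage: $4
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: $5
  claimRef:
    namespace: $6
    name: $7
  csi:
    driver: $2
    volumeHandle: $3
EOF
}
```

After review, the manifest can be passed to `kubectl create -f -`. `storageClassName` is included because a recreated PVC only binds to a PV whose storage class matches its own.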

@kfox1111
Contributor

Hmm.. sounds like that might work... still a good chance of messing something up in the process.

How possible would it be to make the csidriver field of the pv editable?

@msau42

msau42 commented Apr 29, 2019

That may fix this particular case, but doesn't solve the general issue of how to "upgrade" a PV.

@kfox1111
Contributor

The problem is, PVCs and PVs are the abstraction between the user (PVC) and the admin (PV) that allows workloads to scale out. If the admin has to deal with PVCs, it re-intertwines workloads with cluster maintenance. This is especially hard on multitenant clusters. Being able to have a procedure that leaves the PVCs alone would be a huge benefit.

Solving it completely generally may not be possible.

I expect renaming a CSI driver to be a relatively common issue: either people won't realize there is a naming convention and will switch once they do (like this driver), or a driver will someday change hands and its name will need to change along with it. So I'm kind of OK trying to find a solution to that separately from a more general solution that allows editing any PV. Is that reasonable?

@msau42

msau42 commented Apr 29, 2019

Name of the driver should be treated like any other API field. Once something is 1.0, then there are strict backwards compatibility rules that drivers need to follow.

Anything pre 1.0 though does not have such strict guarantees.

@msau42

msau42 commented Apr 29, 2019

On another note, you should be able to run a 0.3 driver with name X and a 1.0 driver with name Y alongside each other. They should be seen as two different drivers. However, you may need to make sure they're not clobbering each other's sockets.

@kfox1111
Contributor

so, should the driver be renamed back, since it started life in 1.0 with the other name?

@ShyamsundarR
Contributor

The 1.0 versioning was a misnomer IMO, as the version of the driver went along with the CSI spec version that it supported (it should have been a version for the driver, not the spec version).

The problem is larger than the driver name in this case, we are changing the on-disk rbd image name and also the passed back volumeID. Thus, mutating older PVs to newer ones, requires rbd changes (image names and rados maps) and also PV spec changes, to change the volumeID and related information.

@kfox1111
Contributor

I asked when backwards compatibility would be honoured, and it sounded like the answer was: once the driver hit 1.0. So I thought 1.0 had a meaning there.

I too have seen potentially breaking changes, but thus far existing deployments could be kept unbroken by telling the chart to use the old name. I was planning on trying to block further breaking changes that lack migration plans.

I'm still looking for a migration plan that would allow the driver rename to move forward.

For those running 1.0 and above, there is a non-breaking plan. For 0.3 -> 1.0+, there isn't currently a plan.

@ShyamsundarR
Contributor

I did not fully understand this comment, hence asking for clarifications.

> I asked when backwards compatibility would be honoured, and it sounded like the answer was: once the driver hit 1.0. So I thought 1.0 had a meaning there.

Could you point to the context in which this was answered? I am wondering whether I answered this in some way, or whether this was before the current breaking changes under review.

> I too have seen potentially breaking changes, but thus far existing deployments could be kept unbroken by telling the chart to use the old name. I was planning on trying to block further breaking changes that lack migration plans.

I see that using the old name would prevent any breakage so far, and the name is configurable, so using the older name is an entirely feasible workaround for the driver name change in the code.

Further, as per my understanding, I thought (and still think) we are in pre-1.0 space, and hence we were making breaking changes to the code. We did, however, stop this one, as it would have required further PV spec edits had it gone in.

> I'm still looking for a migration plan that would allow the driver rename to move forward.
>
> For those running 1.0 and above, there is a non-breaking plan. For 0.3 -> 1.0+, there isn't currently a plan.

Even for 1.0 -> 1.0+ (or maybe 2.0, depending on how it is versioned), the upgrade would be breaking once we merge the on-disk changes. The migration plan (barring "PV upgrades" or the ability to edit PV metadata) would be to swap the names as in this comment, which was tested with the 1.0+ breaking changes in place.

So we need to sort this out in some way that allows for the breaking changes, so that we can support this more elegantly in the future.

Tagging @rootfs @gman0 for their comments or observations.

@kfox1111
Contributor

kfox1111 commented Apr 29, 2019

I think the discussion was offline with some folks. So, sorry I don't have history of it. :(

#312 has not merged. This change in particular was the one I had in mind when I mentioned trying to block such changes unless there is a clear, reasonable migration plan. Its being under review isn't necessarily a bad thing, as it allows progress to be made. But merging it without a migration plan would be very bad/damaging to users.

@ShyamsundarR
Contributor

> Indeed, this is what we did to migrate the PVs.
> However, there are still a few roadblocks on the road:
> 1- We have been unable to install both drivers together. It's either 1.0 or 0.3, not both: the csinodeinfo/driver-registrar feature (or whatever it is) barfs on the old driver registration.

I did my initial experiments with 1.0 and 1.0+ code, so I repeated part of them (i.e., running two CSI instances) with 0.3 today. I am able to run both instances without much trouble. The changes are:

  • Driver name should be different
  • Namespace where the CSI pods are running is different
  • Socket path is different (like the driver names)

A patch to the v1.0 helm chart looks like this to make it run as above.
Kubernetes version: 1.13.5
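The three requirements listed above can be turned into a small pre-flight check. This is a hypothetical helper, not part of the helm chart. The default socket location `/var/lib/kubelet/plugins/<driver-name>/csi.sock` is an assumption based on the usual kubelet plugin directory layout, so pass the chart's actual socket paths if they differ.

```shell
#!/bin/sh
# Hypothetical pre-flight check for running two Ceph-CSI instances side by side.

# Assumed default socket path for a driver name (kubelet plugin directory layout).
socket_for() {
  printf '/var/lib/kubelet/plugins/%s/csi.sock\n' "$1"
}

# Args: OLD_DRIVER OLD_NS OLD_SOCK NEW_DRIVER NEW_NS NEW_SOCK
# Prints "ok" and returns 0 only if driver name, namespace and socket all differ.
check_side_by_side() {
  if [ "$1" = "$4" ]; then
    echo "driver names must differ (both are $1)" >&2
    return 1
  fi
  if [ "$2" = "$5" ]; then
    echo "namespaces must differ (both are $2)" >&2
    return 1
  fi
  if [ "$3" = "$6" ]; then
    echo "socket paths must differ (both are $3)" >&2
    return 1
  fi
  echo "ok"
}
```

For example: `check_side_by_side csi-rbdplugin <old-namespace> "$(socket_for csi-rbdplugin)" rbd.csi.ceph.com latest-ceph-csi "$(socket_for rbd.csi.ceph.com)"`.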

> 2- The PVs must be migrated with all the applications using them down (i.e., scaled to 0). Quite a big blackout.
> 3- Last, once the new PVs have been created with the old names, we saw that deleting the PVC cleans up all related resources in Kubernetes but fails to actually delete the RBD image.

Is the delete not working against the 0.3 version of the CSI plugin? Please share further details so I can test the same. Thanks.

> Isn't there a simpler way of migrating things, like a mutating webhook to change the PV?

> I was pointing to the csi-translation-library since it seems to be able to migrate old-style PVs to the new CSI without all those steps, and I was wondering how they managed to do it.
> They seem to be able to mutate the nodes to the new specs in place.

The translation library transforms the PV parameters, but it only helps when the older PV is from an in-tree provisioner (based on reading the Kubernetes code at present). Further, with the future scheme in which the VolumeID encodes the RBD image and cluster, such a simple transformation is not feasible (for example, we would need to feed the transformation engine details about the Ceph cluster and pool IDs for it to work).

This is an initial analysis of the csi translation based method.

@sbskas
Contributor Author

sbskas commented Sep 6, 2019

I'm closing the issue since we are no longer using ceph-csi 0.3 and the ceph-csi team decided not to support ceph-csi 0.3 -> ceph-csi 1.x.x migration.

@sbskas sbskas closed this as completed Sep 6, 2019
Madhu-1 pushed a commit to Madhu-1/ceph-csi that referenced this issue Jun 20, 2024
Syncing latest changes from upstream devel for ceph-csi