Add VolumeGroupReplication support #1472
base: main
Conversation
I also created this for a common scheme to process CGs across VolSync and VolRep; it relates to some of the comments here: #1483
controllers/vrg_volrep.go
Outdated
@@ -62,7 +64,7 @@ func (v *VRGInstance) reconcileVolRepsAsPrimary() {
 	}

 	// If VR did not reach primary state, it is fine to still upload the PV and continue processing
-	requeueResult, _, err := v.processVRAsPrimary(pvcNamespacedName, log)
+	requeueResult, _, err := v.processVRAsPrimary(pvcNamespacedName, pvc, log)
At some point I am thinking we should process a group here, and if the group is processed then check the PVCs reported by the group as protected against the PVCs that we decide should belong to this group.
Assume we have a list of groups, with PVCs that belong to each group. We need to process the group only once as Primary, and future PVCs can be validated as part of the group status (i.e. already protected by the group) or not.
Still need to review the vrg_volrep.go file. Other changes are reviewed; nothing major, mostly integration points that may need some additional handling.
@@ -548,6 +556,10 @@ func (v *VRGInstance) processVRG() ctrl.Result {
 		return v.invalid(err, "Failed to process list of PVCs to protect", true)
 	}

+	if err := v.updatePVCListForCG(); err != nil {
(future) At some point, for clarity, we should potentially remove the optimizations for PVC grouping into either list and let the separate function handle it. Currently we look at various conditions and optimize out calling the separate function. This is to help code readability, not functionality as such.
 		return nil
 	}

 	for idx := range v.volRepPVCs {
This becomes common for both VolSync and VolRep, i.e. PVC labeling. @BenamarMk @youhangwang (FYI)
That's correct. Currently, VolSync is not using this function (addConsistencyGroupLabel) yet. The temporary solution in the VolSync-related CG code involves the user labeling the PVCs. We liked the idea of having users label the PVCs, as it allows for multiple CGs per storageId. This is still missing from @youhangwang's PR.
I think we should implement the following for both VolSync and VolRep:

1. Provide a common CG label name, such as `ramendr.openshift.io/cg-label`, for lack of a better name.
2. If the user labels the PVC with that label, then the PVC label value for CG should be the combination of `ramendr.openshift.io/consistency-group: storageId + cg-label`.
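The proposed labeling could look roughly like this. The label keys come from the discussion above; how storageId and the user's cg-label are combined (the `-` delimiter here) is an assumption for illustration.

```go
package main

import "fmt"

const (
	userCGLabelKey = "ramendr.openshift.io/cg-label"          // set by the user on the PVC
	cgLabelKey     = "ramendr.openshift.io/consistency-group" // set by Ramen
)

// cgLabelValue derives the consistency-group label value for a PVC from the
// storageID and an optional user-provided sub-group label. The combination
// scheme shown here (delimiter, ordering) is assumed, not from this PR.
func cgLabelValue(storageID string, pvcLabels map[string]string) string {
	if sub, ok := pvcLabels[userCGLabelKey]; ok && sub != "" {
		return storageID + "-" + sub // user-labeled PVCs form sub-CGs per storageID
	}
	return storageID // no user label: all PVCs of this storageID form one CG
}

func main() {
	fmt.Println(cgLabelValue("storage-1", map[string]string{userCGLabelKey: "app-a"}))
	fmt.Println(cgLabelValue("storage-1", nil))
}
```

This keeps the default behavior (one CG per storageID) while letting a user-supplied label split PVCs into multiple CGs sharing a storageID.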
Yes, I'm waiting for this PR to be merged, then I can rebase my PR to adopt this func. Or maybe I can copy this func into my PR to handle volsyncPVCs.
> 1. Provide a common label CG name, such as `ramendr.openshift.io/cg-label` for a lack of a better name.
> 2. If the user labels the PVC with that label, then the PVC label value for CG should be the combination of `ramendr.openshift.io/consistency-group: storageId + cg-label`.
Ack! If it is required that PVCs be separated into CGs of their own, even as they are selected by a common pvcSelector provided to Ramen, this label can help create sub-CG groups as desired.
Would it be safe to assume that this would be in the future, IOW not part of this PR or the initial CG PRs? The initial set would just group all PVCs into a CG, when possible, giving no further granularity to the app owner.
Agreed
@@ -668,10 +680,18 @@ func (v *VRGInstance) updatePVCList() error {
 		return nil
 	}

 	if err := v.updateReplicationClassList(); err != nil {
 		v.log.Error(err, "Failed to get VolumeReplicationClass list")
+	if rmnutil.IsCGEnabled(v.instance.GetAnnotations()) {
@BenamarMk if the CG annotation is added to an already created and reconciled DRPC/VRG, will it cause issues here? I.e. the VRG reconciliation would now shift to CG-based processing and potentially leave stale non-CG resources (at least in the VolRep case).
IOW, CG enabling belongs to the initial reconcile from DRPC and should not be enabled later; is that somehow prevented overall by DRPC when creating the VRG?
@ShyamsundarR Yes, that would violate the CG requirement. Existing workloads are not permitted to be converted to use CG retroactively. If a user inadvertently adds the CG annotation to an already created and reconciled DRPC/VRG, it could lead to inconsistent states, given that PVCs have to be labeled to use CG as well.
CG enabling should occur only during the initial reconciliation of DRPC and should not be enabled afterward. This is not currently prevented by the DRPC when creating the VRG, so if a user adds the CG annotation later, the user would eventually encounter CG errors.
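A guard for this concern could compare the current annotation against the state recorded at first reconcile. This is purely a hypothetical sketch: the annotation key below is invented for illustration, and Ramen's real check is rmnutil.IsCGEnabled, not this helper.

```go
package main

import "fmt"

// Assumed annotation key, for illustration only (not Ramen's actual key).
const cgEnabledAnnotation = "example.ramendr.openshift.io/is-cg-enabled"

func isCGEnabled(annotations map[string]string) bool {
	return annotations[cgEnabledAnnotation] == "true"
}

// validateCGEnablement rejects toggling CG after the initial reconcile, as
// discussed above: CG must be decided when the VRG is first created.
func validateCGEnablement(initiallyEnabled bool, annotations map[string]string) error {
	if isCGEnabled(annotations) != initiallyEnabled {
		return fmt.Errorf("consistency-group enablement cannot change after initial reconcile")
	}
	return nil
}

func main() {
	anns := map[string]string{cgEnabledAnnotation: "true"}
	fmt.Println(validateCGEnablement(true, anns))  // nil: unchanged since first reconcile
	fmt.Println(validateCGEnablement(false, anns)) // error: CG enabled retroactively
}
```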
@@ -763,6 +858,17 @@ func (v *VRGInstance) separatePVCsUsingStorageClassProvisioner(pvcList *corev1.P
 		}
 	}

+	if !replicationClassMatchFound {
If CG is enabled and the SC has the replicationID label, then we should error out if we do not find a matching VGRClass, and not default the PVC to VolSync.
This either gets added here or with PR #1487.
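The fallback rule being requested can be sketched as a small decision function. The function and its inputs are illustrative, not the PR's actual code; the point is that a replicationID-labeled StorageClass with CG enabled must fail hard on a missing VolumeGroupReplicationClass.

```go
package main

import (
	"errors"
	"fmt"
)

var errNoVGRClass = errors.New("no matching VolumeGroupReplicationClass for replicationID-labeled StorageClass")

// chooseProtection sketches the decision discussed above: with CG enabled and
// a replicationID label on the SC, a missing VGRClass is an error rather than
// a silent fallback to VolSync.
func chooseProtection(cgEnabled, scHasReplicationID, vgrClassFound bool) (string, error) {
	if cgEnabled && scHasReplicationID {
		if !vgrClassFound {
			return "", errNoVGRClass // do not default the PVC to VolSync
		}
		return "volrep-cg", nil
	}
	if scHasReplicationID {
		return "volrep", nil
	}
	return "volsync", nil
}

func main() {
	_, err := chooseProtection(true, true, false)
	fmt.Println(err) // errors out instead of defaulting to VolSync
	kind, _ := chooseProtection(true, true, true)
	fmt.Println(kind)
}
```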
@@ -6,6 +6,8 @@ apiVersion: storage.k8s.io/v1
 kind: StorageClass
 metadata:
   name: rook-ceph-block
+  labels:
A similar change would be needed to the CephFS storageClass from file test/addons/rook-cephfs/kustomization.yaml ?
validateVRStatus should ensure the desired PVC is part of the VGR group status, so that we know it is protected. Currently, if a new PVC is added to the group, we will report it as protected based on the existing VGR output.
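The membership check being requested might look like the following. The status type and field names are illustrative stand-ins, not the actual VGR API.

```go
package main

import "fmt"

// Illustrative stand-in for the portion of VGR status that lists protected PVCs.
type vgrStatus struct {
	protectedPVCs []string // PVC names the VGR reports as protected
}

// pvcProtectedByVGR reports a PVC as protected only when the VGR status
// actually lists it, so a newly added group member is not marked protected
// based on stale VGR output.
func pvcProtectedByVGR(status vgrStatus, pvcName string) bool {
	for _, name := range status.protectedPVCs {
		if name == pvcName {
			return true
		}
	}
	return false
}

func main() {
	status := vgrStatus{protectedPVCs: []string{"db-data", "db-wal"}}
	fmt.Println(pvcProtectedByVGR(status, "db-data")) // true: listed in group status
	fmt.Println(pvcProtectedByVGR(status, "db-new"))  // false: not yet in group status
}
```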
@@ -67,7 +68,7 @@ func (v *VRGInstance) reconcileVolRepsAsPrimary() {
 	}

 	// If VR did not reach primary state, it is fine to still upload the PV and continue processing
-	requeueResult, _, err := v.processVRAsPrimary(pvcNamespacedName, log)
+	requeueResult, _, err := v.processVRAsPrimary(pvcNamespacedName, pvc, log)
A group request will protect all PVCs in the group, so when we invoke protection for a single PVC here, without ensuring we are done with prior steps like pvcUnprotectVolRepIfDeleted and preparePVCForVRProtection, we may introduce races that get in the way of protection.
While the code to create/update a VR or VGR is common, I think we may need to perform certain operations on the group here before we proceed with a VGR for the group.
I am thinking in parallel how best to achieve this with the existing code as well.
@@ -67,7 +68,7 @@ func (v *VRGInstance) reconcileVolRepsAsPrimary() {
 	}

 	// If VR did not reach primary state, it is fine to still upload the PV and continue processing
-	requeueResult, _, err := v.processVRAsPrimary(pvcNamespacedName, log)
+	requeueResult, _, err := v.processVRAsPrimary(pvcNamespacedName, pvc, log)
pvcUnprotectVolRepIfDeleted should remove the CG label from the PVC, such that the PVC is no longer protected as part of the group.
Currently, if the function above is traced (with the required flags enabled), it would delete the entire VGR rather than remove the PVC from the VGR.
While PVC deletion is still not the default (as other DRPC changes are required), we should be aware of this problem if it is enabled.
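The intended per-PVC unprotect could be sketched like this: drop the CG label so the PVC leaves the group, and delete the VGR only when the group is empty. The helper name is illustrative; the label key is the one discussed earlier in this PR.

```go
package main

import "fmt"

const cgLabelKey = "ramendr.openshift.io/consistency-group"

// unprotectPVCFromGroup removes the CG label and reports whether the PVC was
// part of a group. Only when no group members remain should the VGR itself be
// deleted, instead of deleting the whole VGR on any single PVC's deletion.
func unprotectPVCFromGroup(pvcLabels map[string]string) bool {
	if _, ok := pvcLabels[cgLabelKey]; !ok {
		return false
	}
	delete(pvcLabels, cgLabelKey)
	return true
}

func main() {
	labels := map[string]string{cgLabelKey: "storage-1"}
	fmt.Println(unprotectPVCFromGroup(labels)) // true: PVC removed from group
	fmt.Println(labels)                        // label gone; VGR stays for remaining PVCs
}
```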
internal/controller/vrg_volrep.go
Outdated
 	return "", fmt.Errorf("missing storageID for PVC %s/%s", pvc.GetNamespace(), pvc.GetName())
 }

+	vgrName := storageID + "-vgr"
There can be multiple workloads independently DR protected in the same namespace. These can have PVCs from the same SC with the same storageID; the name will then conflict.
I would suggest munging this with the VRG name as well. We cannot have two VRGs with the same name in the same namespace, so using that as part of the name may be an option. It does create the possibility of a longer name, so any shortening here may assist with that.
I changed the VGR name to {vrg}-{storageID}. @ShyamsundarR, what if we remove -drpc from the VRG name? It would shorten the whole name a little.
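The {vrg}-{storageID} naming with length handling could be sketched as below. Including the VRG name avoids collisions between workloads that share a storageID in one namespace; the hash-based shortening for overlong names is an assumption, not something this PR implements.

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
)

// Conservative DNS-1123 label limit; Kubernetes object names may allow more
// (up to 253 for subdomain-style names), so this bound is an assumption.
const maxNameLength = 63

// vgrName builds the {vrg}-{storageID} name, shortening with a hash suffix
// when the result would exceed the limit, to keep names unique yet readable.
func vgrName(vrgName, storageID string) string {
	name := vrgName + "-" + storageID
	if len(name) <= maxNameLength {
		return name
	}
	// Keep a readable prefix and disambiguate with a short hash suffix.
	sum := sha1.Sum([]byte(name))
	suffix := hex.EncodeToString(sum[:])[:8]
	return name[:maxNameLength-9] + "-" + suffix
}

func main() {
	fmt.Println(vgrName("busybox-drpc", "storage-1")) // busybox-drpc-storage-1
}
```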
Signed-off-by: Elena Gershkovich <[email protected]>
This version of VolumeGroupReplication support introduces code for separating PVCs created as part of a consistency group. On every VRG reconcile we iterate over all PVCs protected by volume replication and mark those that are part of a consistency group. Afterwards, during VR reconciliation, we create a VR for every PVC that is not part of a consistency group, and a VGR for all PVCs that share the same storage ID.
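The separation described above can be sketched in miniature. The types and names here are illustrative only; the real code operates on PersistentVolumeClaim objects and labels, not this simplified struct.

```go
package main

import "fmt"

// Illustrative stand-in for a protected PVC.
type pvcInfo struct {
	name      string
	storageID string
	inCG      bool // set from the consistency-group label during reconcile
}

// separatePVCs splits PVCs as the summary describes: PVCs outside any
// consistency group each get their own VR, while PVCs sharing a storage ID
// are grouped under a single VGR.
func separatePVCs(pvcs []pvcInfo) (vrPVCs []string, vgrGroups map[string][]string) {
	vgrGroups = map[string][]string{}
	for _, p := range pvcs {
		if p.inCG {
			vgrGroups[p.storageID] = append(vgrGroups[p.storageID], p.name)
		} else {
			vrPVCs = append(vrPVCs, p.name)
		}
	}
	return vrPVCs, vgrGroups
}

func main() {
	vr, vgr := separatePVCs([]pvcInfo{
		{"db-data", "storage-1", true},
		{"db-wal", "storage-1", true},
		{"cache", "storage-2", false},
	})
	fmt.Println(vr)  // one VR per ungrouped PVC: [cache]
	fmt.Println(vgr) // one VGR per storage ID: map[storage-1:[db-data db-wal]]
}
```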