KEP-3314: Changed Block Tracking With CSI VolumeSnapshotDelta #3367
Conversation
@ihcsim There is a mismatch in the
@ihcsim Thanks! I have a few review comments and will pass one more round soon.
Force-pushed from dc501d5 to 8953347.
/retest
OK, I gave a quick review of primarily just the PRR pieces. Is all development out-of-tree? If so, then I think the PRR pieces are fine. I think you need to consult with API machinery and also the API review team, though; this is... unusual.
/assign @msau42
authorised service account's secret token as the bearer token. If the user's service account is deployed with `automountServiceAccountToken` set to `false`, they will have to extract the appropriate token from their secret.
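As a minimal sketch, assuming the token is available at the usual projected path (or has been extracted from the service account's Secret and written to a file), a caller might attach it as the bearer token roughly like this; the request URL below is a placeholder, not the final API path:

```go
package main

import (
	"fmt"
	"net/http"
	"os"
	"strings"
)

func main() {
	// With automountServiceAccountToken enabled, the token sits at the usual
	// projected path; otherwise the user extracts it from the Secret and makes
	// it available to the client some other way.
	raw, err := os.ReadFile("/var/run/secrets/kubernetes.io/serviceaccount/token")
	if err != nil {
		panic(err)
	}
	token := strings.TrimSpace(string(raw))

	// Placeholder URL; a real client would also configure the cluster CA for
	// TLS verification instead of using http.DefaultClient blindly.
	req, err := http.NewRequest(http.MethodGet,
		"https://kubernetes.default.svc/apis/placeholder-group/v1alpha1/volumesnapshotdeltas", nil)
	if err != nil {
		panic(err)
	}
	req.Header.Set("Authorization", "Bearer "+token)

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println(resp.Status)
}
```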
Where does the sidecar get the cluster TLS CA certificate and key for the HTTPS server?
BTW, I think plain HTTP could be OK during alpha; still, you should have an idea of how to add HTTPS.
Yeah, I left out the TLS part for alpha. As far as the sidecar is concerned, I am thinking it can get its CA cert, ready-to-use signed cert, etc. from a Secret. Essentially, the sidecar doesn't have to worry about issuing a CSR. The bigger question is how that Secret gets created (e.g., cert-manager). Can we offer a default self-signed approach, and an advanced "bring your own certs" approach?
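As a minimal sketch of the sidecar side, assuming the Secret's cert and key are mounted at an illustrative path like `/etc/tls` (the port and handler are also assumptions):

```go
package main

import (
	"log"
	"net/http"
)

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/changedblocks", func(w http.ResponseWriter, r *http.Request) {
		// Serve the changed-block entries here.
		w.WriteHeader(http.StatusOK)
	})

	// tls.crt and tls.key come from the Secret mounted into the sidecar; how
	// that Secret gets created (cert-manager, self-signed, BYO certs) is the
	// open question discussed above.
	log.Fatal(http.ListenAndServeTLS(":8443",
		"/etc/tls/tls.crt", "/etc/tls/tls.key", mux))
}
```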
I am not sure how other Kubernetes vendors handle certificates; IMO there is no common pattern. OpenShift will generate a Secret with cert + key for a Service, if the Service has a specific annotation.
Force-pushed from 334eda4 to 53bff5c.
PRR is fine; waiting for SIG approval before I approve.
I am not sure about the usage of the aggregated API server and "virtual resources" there - I can see it is possible to implement it this way, I just don't know if it's the best option. The alternatives are well documented, so we can discuss the details in the API review. The interface between a storage backend, a CSI driver, and the aggregated API server looks OK. From a storage point of view, this KEP makes sense and is implementable. I'll let @msau42 approve (or ping me on slack).
The previous alternate design, which involves generating and returning a callback endpoint to the caller, has been superseded by the aggregation extension mechanism described in this KEP. The aggregation extension design provides a tighter integration with the Kubernetes API server, enabling the re-use of the existing Kubernetes machinery of GVR and GVK binding, URL registration, and delegated authentication and authorization.
I talked to @deads2k about the possibility of backup software talking directly to the CSI sidecar using a Service (when the backup SW is inside the cluster) or using a public route / ingress when it's outside of the cluster. I.e., we won't need the aggregated API server at all. The SW can read the `VolumeSnapshotDeltaServiceConfiguration` CR by itself to find the sidecar. The actual protocol between the sidecar and the backup SW is IMO an implementation detail, but it can re-use Kubernetes API authentication using e.g. kube-rbac-proxy.
This is close to this proposal, just with the same authn/authz as the API server and the CSI sidecar announcing its http(s) endpoint in `VolumeSnapshotDeltaServiceConfiguration`.
It all depends on the size of a regular and maximum list of changed blocks - if it's O(MiB), I'd be fine with this KEP; if it's O(GiB), then while the aggregated API server looks more Kubernetes-ish, it will just eat CPU and network bandwidth and make the overall architecture more fragile.
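For a rough, purely illustrative sense of scale (assumed numbers, not from the KEP): a 1 TiB volume tracked at 64 KiB block granularity has about 16.7 million blocks. If roughly 1% of the blocks change between snapshots and each entry carries an offset, size, and a small token (say ~32 bytes total), the list is on the order of 170k entries, or about 5 MiB; if nearly every block changed, the same encoding approaches 512 MiB, which is where the O(GiB) concern kicks in.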
/lgtm cancel
I'll let @msau42 judge whether it's good enough for alpha as it is.
Note that `/pod/logs` (and `/exec` and `/portforward`) is a kube subresource that can be O(GiB), but the traffic that flows across it is explicitly intended to be management traffic, and even then there have been multiple efforts to decouple exec/logs/portforward. I agree with @deads2k and @jsafrane - it is not appropriate to design core APIs that funnel large amounts (~O(GiB)) of non-management traffic through aggregated APIs (third parties may do this if they so choose). If a hypothetical decoupling layer were to be designed and implemented, or the amount of data is O(MiB), it becomes more reasonable.
Generally speaking, if the design of an API resulted in bandwidth requirements that did not materially change the p99 bandwidth profile of the API server as it is today (~10 MiB/s for reasonable clusters, possibly ~100 MiB/s for large ones), it would be reasonable to do it via a kube control plane API.
@jsafrane @smarterclayton thanks for the feedback. The idea of the aggregated API was definitely inspired by `/pod/logs` and `metrics-server`, where we assumed that our traffic won't be worse than theirs. Would adding APF flow control and priority-level configuration to protect the K8s API server help here, as it proxies traffic to and from our out-of-tree aggregated API server? Or is APF also meant for management traffic? The implication is that the cluster admin will be able to control the changed-block traffic, which may or may not be desirable.
If the aggregated API server isn't an appropriate choice, would SIG Architecture, SIG Apps, or both be the right forums for me to bring this design problem to? We have explored multiple alternatives over the past few months, including:
- a CRD approach where the CSI driver returns an "out-of-band" data endpoint (via the `status` subresource) to the user
- a CRD approach where we let the user specify a "callback URL" for the CSI driver to deliver the data to
- direct HTTP calls to the CSI driver (without a kube-style API)
- an aggregated API server
my first read
// This field is optional, and may be empty if no secret is required. If the
// secret object contains more than one secret, all secrets are passed.
// +optional
VolumeSnapshotDeltaSecretRef *SecretReference
This struct isn't defined - why is it a Ref and not a Name?
// Define the maximum number of entries to return in the response.
Limit uint64 `json:"limit"`

// Defines the start of the block index (in bytes).
is this the offset into the storage volume?
State VolumeSnapshotDeltaState `json:"state"`

// The limit defined in the request.
Limit uint64 `json:"limit"`
Why is it important to repeat this from spec?
// The limit defined in the request.
Limit uint64 `json:"limit"`

// The offset (in bytes) defined in the request.
same
OffsetBytes uint64 `json:"offsetBytes"`

// The size of the blocks.
BlockSizeBytes uint64 `json:"blockSizeBytes"`
storage is the only place where I actually worry that uint64 might eventually be too small - am I crazy?
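(For reference: a uint64 byte count tops out at 2^64 bytes, roughly 16 EiB, so as a per-volume offset or block size it leaves considerable headroom; whether that is enough in the long run is exactly the question raised here.)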
// The size of the blocks.
BlockSizeBytes uint64 `json:"blockSizeBytes"`

// The optional token used to retrieve the actual data block at the given
This needs more explanation, I think?
const (
	// Successfully retrieved chunks of CBT entries starting at offset, and ending
	// at offset + limit, with no more data left.
	Completed VolumeSnapshotDeltaState = "completed"
Completed - constants are UpperBumpyCaps
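If the suggestion is that the string values should follow the usual UpperCamelCase convention as well, a sketch might look like the following; the exact names and values here are assumptions, not the KEP's final API:

```go
type VolumeSnapshotDeltaState string

const (
	// Successfully retrieved chunks of CBT entries starting at offset, and
	// ending at offset + limit, with no more data left. The value matches the
	// constant name per the UpperCamelCase convention.
	Completed VolumeSnapshotDeltaState = "Completed"
)
```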
Looking at commits since my last review, PRR should still be good on this. I just can't do the prow approval until I see the SIG approval.
@thockin @johnbelamaric Thanks for the feedback. The working group still has some concerns about the proposed aggregated API server approach. There is still some work to be done here.
// The size of the block in bytes. This field is REQUIRED.
uint64 block_size_bytes = 2;

// The token and other information needed to retrieve the actual
is this for retrieving the actual bits in the block?
Yes - that's correct.
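As a sketch of that flow: a backup client would use the returned token to pull the block's actual bits from a data endpoint. The endpoint shape, query parameters, and token semantics below are illustrative assumptions only, not defined by the KEP:

```go
package cbtclient

import (
	"fmt"
	"io"
	"net/http"
)

// fetchBlock retrieves the raw data of a single changed block, authorizing
// the request with the token returned alongside the CBT entry. The parameter
// names and URL layout are hypothetical.
func fetchBlock(client *http.Client, dataEndpoint string, offsetBytes, sizeBytes uint64, token string) ([]byte, error) {
	url := fmt.Sprintf("%s?offset=%d&size=%d", dataEndpoint, offsetBytes, sizeBytes)
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+token)

	resp, err := client.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status %s", resp.Status)
	}
	return io.ReadAll(resp.Body)
}
```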
The Kubernetes project currently lacks enough contributors to adequately respond to all PRs. This bot triages PRs according to its standard rules.

/lifecycle stale
/remove-lifecycle stale
Superseded by #4082, due to authorship changes.