Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KEP-3314: Changed Block Tracking With CSI VolumeSnapshotDelta #3367

Closed
wants to merge 48 commits into from

Conversation

ihcsim
Copy link
Contributor

@ihcsim ihcsim commented Jun 8, 2022

Signed-off-by: Ivan Sim [email protected]

  • One-line PR description: Add new KEP for changed block tracking (CBT)
  • Other comments:

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory labels Jun 8, 2022
@k8s-ci-robot k8s-ci-robot added the sig/storage Categorizes an issue or PR as relevant to SIG Storage. label Jun 8, 2022
@ihcsim ihcsim changed the title [WIP] KEP-3314: Changed Block Tracking With CSI Differential Snapshot KEP-3314: Changed Block Tracking With CSI Differential Snapshot Jun 9, 2022
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 9, 2022
@PrasadG193
Copy link
Contributor

@ihcsim There is a mismatch in the callbackURL format used in DifferentialSnapshot status example in cbt-step-02.png i.e (https://stream-uri:port-number/cr-namespace/cr-name) and cbt-step-03.png which is (https://stream-uri:port-number/custom-resource-uid).

@humblec
Copy link
Contributor

humblec commented Jun 9, 2022

@ihcsim Thanks..! I have few review comments and will pass one more round soon..

@ihcsim ihcsim force-pushed the cbt-kep branch 2 times, most recently from dc501d5 to 8953347 Compare June 10, 2022 04:10
@ihcsim
Copy link
Contributor Author

ihcsim commented Jun 10, 2022

/retest

@ihcsim ihcsim changed the title KEP-3314: Changed Block Tracking With CSI Differential Snapshot KEP-3314: Changed Block Tracking With CSI VolumeSnapshotDelta Jun 17, 2022
Copy link
Member

@johnbelamaric johnbelamaric left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I gave a quick review of primarily just the PRR pieces. Is all development out-of-tree? If so then I think the PRR pieces are fine. I think you need to consult with API machinery and also the API review team though, this is...unusual.

@xing-yang
Copy link
Contributor

/assign @msau42

keps/sig-storage/3314-csi-volume-snapshot-delta/README.md Outdated Show resolved Hide resolved
keps/sig-storage/3314-csi-volume-snapshot-delta/README.md Outdated Show resolved Hide resolved
keps/sig-storage/3314-csi-volume-snapshot-delta/README.md Outdated Show resolved Hide resolved
keps/sig-storage/3314-csi-volume-snapshot-delta/README.md Outdated Show resolved Hide resolved
keps/sig-storage/3314-csi-volume-snapshot-delta/README.md Outdated Show resolved Hide resolved
keps/sig-storage/3314-csi-volume-snapshot-delta/README.md Outdated Show resolved Hide resolved
authorised service account's secret token as the bearer token. If the user's
service account is deployed with `automountServiceAccountToken` set to `false`,
they will have to extract the appropriate token from their secret.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where the sidecar gets the cluster TLS CA certificate and key for the HTTPS server?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, I think plain HTTP could be OK during alpha, still, you should have an idea how to add HTTPS.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I left out the TLS part for alpha. As far as the sidecar is concerned, I am thinking it can get its CA cert, ready-to-use signed cert etc. from a Secret. Essentially, the sidecar doesn't have to worry about issuing CSR. The bigger question is how that secret gets created (e.g., cert-manager). Can we offer a default self-signed approach, and an advance "bring your own certs" approach?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure how other Kubernetes vendors handle certificates, IMO there is no common patter. OpenShift will generate a secret with cert + key for a Service, if the Service has a specific annotation.

keps/sig-storage/3314-csi-volume-snapshot-delta/README.md Outdated Show resolved Hide resolved
keps/sig-storage/3314-csi-volume-snapshot-delta/README.md Outdated Show resolved Hide resolved
keps/sig-storage/3314-csi-volume-snapshot-delta/README.md Outdated Show resolved Hide resolved
Copy link
Member

@johnbelamaric johnbelamaric left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PRR is fine, waiting for SIG approval before I approve.

@jsafrane
Copy link
Member

jsafrane commented Oct 6, 2022

I am not sure about usage of the aggregated API server and "virtual resources" there - I can see it is possible to implement it this way, I just don't know if it's the best option. The alternatives are well documented, so we can discuss the details in the API review. The interface between a storage backend, a CSI driver and aggregated API server looks OK.

From storage point of view, this KEP makes sense and is implementable.
/lgtm

I let @msau42 approve (or ping me on slack).

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 6, 2022
Comment on lines +1424 to +1429
The previous alternate design which involves generating and returning a callback
endpoint to the caller has been superceded by the aggregation extension
mechanism described in this KEP. The aggregation extension design provides a
tighter integration with the Kubernetes API server, enabling the re-use of the
existing Kubernetes machinery of GVR and GVK binding, URL registration and
delegated authentication and authorization.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I talked to @deads2k about possibility of a backup software talking directly to CSI sidecar using a service (when the backup SW is inside a cluster) or using a public route / ingress when it's outside of the cluster. I.e. we won't need the aggregated API server at all. The SW can read VolumeSnapshotDeltaServiceConfiguration CR by itself to find the sidecar. The actual protocol between the sidecar and the backup SW is IMO implementation detail, but it can re-use Kubernetes API authentication using e.g. kube-rbac-proxy.

This is close tho this proposal, just with the same authn/authz as the API server and CSI sidecar announcing its http(s) endpoint in VolumeSnapshotDeltaServiceConfiguration.

It all depends on size of a regular and maximum list of changed blocks - if it's O(MiB), I'd be fine with this KEP, if it's O(GiB) then while the aggregated API server looks more Kubernetes-ish, it will just eat CPU and network bandwidth and it makes the overall architecture more fragile.

/lgtm cancel

I let @msau42 judge if it's good for alpha as it is.

Copy link
Contributor

@smarterclayton smarterclayton Oct 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that /pod/logs (and /exec and /portforward) is a kube subresource that can be O(GiB) but the traffic that flows across it is explicitly intended to be management traffic, and even then there have been multiple efforts to decouple exec/logs/portforward. I agree with @deads2k and @jsafrane - it is not appropriate to design core APIs that funnel large amounts (~O(GiB)) of non-management traffic through aggregated APIs (third parties may do this if they so choose). If a hypothetical decoupling layer were to be designed and implemented or the amount of data is O(MiB) it is less reasonable.

Generally speaking, if the design of an API resulted in bandwidth requirements that did not materially change the p99 bandwidth profile of the api server as it is today (~10 MiB/s for reasonable clusters, possibly ~100 MiB/s for large ones), it would be reasonable to do it via a kube control plane API.

Copy link
Contributor Author

@ihcsim ihcsim Oct 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jsafrane @smarterclayton thanks for the feedback. The idea of the aggregated API was definitely inspired by /pod/logs and metrics-server, where we assumed that our traffic won't be worse than theirs. Will adding APF flow control and priority level configuration to protect the K8s api server helps here, as it proxies traffic to-and-forth our out-of-tree agg api server? Or is APF also meant for management traffic? The implication is that cluster admin will be able to control the changed block traffic, which may or may not be desirable.

If aggregated API server isn't an appropriate choice, will SIG architecture, SIG app or both, be the right forums for me to bring this design problem to? We have explore multiple alternatives over the past few months, including:

  • a CRD approach where the CSI driver returns an "out-of-band" data endpoint (via status subresource) to the user
  • a CRD approach where we let user specify a "callback URL" for the CSI driver to deliver the data to
  • direct HTTP calls to the CSI driver (without kube-style API)
  • aggregated API server

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 6, 2022
Signed-off-by: Ivan Sim <[email protected]>
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: ihcsim
Once this PR has been reviewed and has the lgtm label, please assign johnbelamaric for approval by writing /assign @johnbelamaric in a comment. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copy link
Member

@thockin thockin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my first read

// This field is optional, and may be empty if no secret is required. If the
// secret object contains more than one secret, all secrets are passed.
// +optional
VolumeSnapshotDeltaSecretRef *SecretReference
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This struct isn't defined - why is it a Ref and not a Name ?

// Define the maximum number of entries to return in the response.
Limit uint64 `json:"limit"`

// Defines the start of the block index (in bytes).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this the offset into the storage volume?

State VolumeSnapshotDeltaState `json:"state"`

// The limit defined in the request.
Limit uint64 `json:"limit"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it important to repeat this from spec?

// The limit defined in the request.
Limit uint64 `json:"limit"`

// The offset (in bytes) defined in the request.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

OffsetBytes uint64 `json:"offsetBytes"`

// The size of the blocks.
BlockSizeBytes unit64 `json:"blockSizeBytes"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

storage is the only place where I actually worry that uint64 might eventually be too small - am I crazy?

// The size of the blocks.
BlockSizeBytes unit64 `json:"blockSizeBytes"`

// The optional token used to retrieve the actual data block at the given
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs more explanation, I think?

const (
// Successfully retrieved chunks of CBT entries starting at offset, and ending
// at offset + limit, with no more data left.
Completed VolumeSnapshotDeltaState = "completed"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Completed - constants are UpperBumpyCaps

@johnbelamaric
Copy link
Member

Looking at commits since my last review, PRR should still be good on this. I just can't do the prow approval until I see the SIG approval.

@ihcsim
Copy link
Contributor Author

ihcsim commented Feb 7, 2023

@thockin @johnbelamaric Thanks for the feedback. The working group still has some concerns on the proposed aggregated API server approach. There is still some work to be done here.

// The size of the block in bytes. This field is REQUIRED.
uint64 block_size_bytes = 2;

// The token and other information needed to retrieve the actual
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this for retrieving the actual bits in the block?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes - that's correct.

@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 22, 2023
@xing-yang
Copy link
Contributor

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 22, 2023
@ihcsim
Copy link
Contributor Author

ihcsim commented Jun 12, 2023

Superseded by #4082, due to authorship changes.

@ihcsim ihcsim closed this Jun 12, 2023
@liggitt liggitt removed the api-review Categorizes an issue or PR as actively needing an API review. label Aug 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/storage Categorizes an issue or PR as relevant to SIG Storage. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.