KEP-3314: Changed Block Tracking With CSI VolumeSnapshotDelta #3367
Conversation
@ihcsim There is a mismatch in the
@ihcsim Thanks! I have a few review comments and will pass one more round soon.
Force-pushed from dc501d5 to 8953347.
/retest
OK, I gave a quick review of primarily just the PRR pieces. Is all development out-of-tree? If so, then I think the PRR pieces are fine. I think you need to consult with API machinery and also the API review team, though; this is... unusual.
/assign @msau42
authorised service account's secret token as the bearer token. If the user's service account is deployed with `automountServiceAccountToken` set to `false`, they will have to extract the appropriate token from their secret.
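As a minimal sketch, assuming the token is available at the usual projected path (or has been extracted from the service account's Secret and written to a file), a caller might attach it as the bearer token roughly like this; the request URL below is a placeholder, not the final API path:

```go
package main

import (
	"fmt"
	"net/http"
	"os"
	"strings"
)

func main() {
	// With automountServiceAccountToken enabled, the token sits at the usual
	// projected path; otherwise the user extracts it from the Secret and makes
	// it available to the client some other way.
	raw, err := os.ReadFile("/var/run/secrets/kubernetes.io/serviceaccount/token")
	if err != nil {
		panic(err)
	}
	token := strings.TrimSpace(string(raw))

	// Placeholder URL; a real client would also configure the cluster CA for
	// TLS verification instead of using http.DefaultClient blindly.
	req, err := http.NewRequest(http.MethodGet,
		"https://kubernetes.default.svc/apis/placeholder-group/v1alpha1/volumesnapshotdeltas", nil)
	if err != nil {
		panic(err)
	}
	req.Header.Set("Authorization", "Bearer "+token)

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println(resp.Status)
}
```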
Where does the sidecar get the cluster TLS CA certificate and key for the HTTPS server?
BTW, I think plain HTTP could be OK during alpha; still, you should have an idea of how to add HTTPS.
Yeah, I left out the TLS part for alpha. As far as the sidecar is concerned, I am thinking it can get its CA cert, ready-to-use signed cert, etc. from a Secret. Essentially, the sidecar doesn't have to worry about issuing a CSR. The bigger question is how that Secret gets created (e.g., cert-manager). Can we offer a default self-signed approach, and an advanced "bring your own certs" approach?
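As a minimal sketch of the sidecar side, assuming the Secret's cert and key are mounted at an illustrative path like `/etc/tls` (the port and handler are also assumptions):

```go
package main

import (
	"log"
	"net/http"
)

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/changedblocks", func(w http.ResponseWriter, r *http.Request) {
		// Serve the changed-block entries here.
		w.WriteHeader(http.StatusOK)
	})

	// tls.crt and tls.key come from the Secret mounted into the sidecar; how
	// that Secret gets created (cert-manager, self-signed, BYO certs) is the
	// open question discussed above.
	log.Fatal(http.ListenAndServeTLS(":8443",
		"/etc/tls/tls.crt", "/etc/tls/tls.key", mux))
}
```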
I am not sure how other Kubernetes vendors handle certificates; IMO there is no common pattern. OpenShift will generate a Secret with cert + key for a Service, if the Service has a specific annotation.
Force-pushed from 334eda4 to 53bff5c.
PRR is fine; waiting for SIG approval before I approve.
I am not sure about the usage of the aggregated API server and "virtual resources" there - I can see it is possible to implement it this way, I just don't know if it's the best option. The alternatives are well documented, so we can discuss the details in the API review. The interface between a storage backend, a CSI driver, and the aggregated API server looks OK. From a storage point of view, this KEP makes sense and is implementable. I'll let @msau42 approve (or ping me on slack).
The previous alternate design, which involves generating and returning a callback endpoint to the caller, has been superseded by the aggregation extension mechanism described in this KEP. The aggregation extension design provides a tighter integration with the Kubernetes API server, enabling the re-use of the existing Kubernetes machinery of GVR and GVK binding, URL registration, and delegated authentication and authorization.
I talked to @deads2k about the possibility of backup software talking directly to the CSI sidecar using a Service (when the backup SW is inside the cluster) or using a public route / ingress when it's outside of the cluster. I.e., we won't need the aggregated API server at all. The SW can read the `VolumeSnapshotDeltaServiceConfiguration` CR by itself to find the sidecar. The actual protocol between the sidecar and the backup SW is IMO an implementation detail, but it can re-use Kubernetes API authentication using e.g. kube-rbac-proxy.
This is close to this proposal, just with the same authn/authz as the API server and the CSI sidecar announcing its http(s) endpoint in `VolumeSnapshotDeltaServiceConfiguration`.
It all depends on the size of a regular and maximum list of changed blocks - if it's O(MiB), I'd be fine with this KEP; if it's O(GiB), then while the aggregated API server looks more Kubernetes-ish, it will just eat CPU and network bandwidth and make the overall architecture more fragile.
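For a rough, purely illustrative sense of scale (assumed numbers, not from the KEP): a 1 TiB volume tracked at 64 KiB block granularity has about 16.7 million blocks. If roughly 1% of the blocks change between snapshots and each entry carries an offset, size, and a small token (say ~32 bytes total), the list is on the order of 170k entries, or about 5 MiB; if nearly every block changed, the same encoding approaches 512 MiB, which is where the O(GiB) concern kicks in.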
/lgtm cancel
I'll let @msau42 judge whether it's good enough for alpha as it is.
Note that `/pod/logs` (and `/exec` and `/portforward`) is a kube subresource that can be O(GiB), but the traffic that flows across it is explicitly intended to be management traffic, and even then there have been multiple efforts to decouple exec/logs/portforward. I agree with @deads2k and @jsafrane - it is not appropriate to design core APIs that funnel large amounts (~O(GiB)) of non-management traffic through aggregated APIs (third parties may do this if they so choose). If a hypothetical decoupling layer were to be designed and implemented, or the amount of data is O(MiB), it becomes more reasonable.
Generally speaking, if the design of an API resulted in bandwidth requirements that did not materially change the p99 bandwidth profile of the API server as it is today (~10 MiB/s for reasonable clusters, possibly ~100 MiB/s for large ones), it would be reasonable to do it via a kube control plane API.
@jsafrane @smarterclayton thanks for the feedback. The idea of the aggregated API was definitely inspired by `/pod/logs` and `metrics-server`, where we assumed that our traffic won't be worse than theirs. Would adding APF flow control and priority-level configuration to protect the K8s API server help here, as it proxies traffic to and from our out-of-tree aggregated API server? Or is APF also meant for management traffic? The implication is that the cluster admin will be able to control the changed-block traffic, which may or may not be desirable.
If the aggregated API server isn't an appropriate choice, would SIG Architecture, SIG Apps, or both be the right forums for me to bring this design problem to? We have explored multiple alternatives over the past few months, including:
- a CRD approach where the CSI driver returns an "out-of-band" data endpoint (via the `status` subresource) to the user
- a CRD approach where we let the user specify a "callback URL" for the CSI driver to deliver the data to
- direct HTTP calls to the CSI driver (without a kube-style API)
- an aggregated API server
my first read
// This field is optional, and may be empty if no secret is required. If the
// secret object contains more than one secret, all secrets are passed.
// +optional
VolumeSnapshotDeltaSecretRef *SecretReference
This struct isn't defined - why is it a Ref and not a Name?
// Define the maximum number of entries to return in the response.
Limit uint64 `json:"limit"`

// Defines the start of the block index (in bytes).
is this the offset into the storage volume?
State VolumeSnapshotDeltaState `json:"state"`

// The limit defined in the request.
Limit uint64 `json:"limit"`
Why is it important to repeat this from spec?
// The limit defined in the request.
Limit uint64 `json:"limit"`

// The offset (in bytes) defined in the request.
same
OffsetBytes uint64 `json:"offsetBytes"`

// The size of the blocks.
BlockSizeBytes uint64 `json:"blockSizeBytes"`
storage is the only place where I actually worry that uint64 might eventually be too small - am I crazy?
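(For reference: a uint64 byte count tops out at 2^64 bytes, roughly 16 EiB, so as a per-volume offset or block size it leaves considerable headroom; whether that is enough in the long run is exactly the question raised here.)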
// The size of the blocks.
BlockSizeBytes uint64 `json:"blockSizeBytes"`

// The optional token used to retrieve the actual data block at the given
This needs more explanation, I think?
const (
	// Successfully retrieved chunks of CBT entries starting at offset, and ending
	// at offset + limit, with no more data left.
	Completed VolumeSnapshotDeltaState = "completed"
Completed - constants are UpperBumpyCaps
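If the suggestion is that the string values should follow the usual UpperCamelCase convention as well, a sketch might look like the following; the exact names and values here are assumptions, not the KEP's final API:

```go
type VolumeSnapshotDeltaState string

const (
	// Successfully retrieved chunks of CBT entries starting at offset, and
	// ending at offset + limit, with no more data left. The value matches the
	// constant name per the UpperCamelCase convention.
	Completed VolumeSnapshotDeltaState = "Completed"
)
```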
Looking at commits since my last review, PRR should still be good on this. I just can't do the prow approval until I see the SIG approval.
@thockin @johnbelamaric Thanks for the feedback. The working group still has some concerns about the proposed aggregated API server approach. There is still some work to be done here.
// The size of the block in bytes. This field is REQUIRED.
uint64 block_size_bytes = 2;

// The token and other information needed to retrieve the actual
is this for retrieving the actual bits in the block?
Yes - that's correct.
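As a sketch of that flow: a backup client would use the returned token to pull the block's actual bits from a data endpoint. The endpoint shape, query parameters, and token semantics below are illustrative assumptions only, not defined by the KEP:

```go
package cbtclient

import (
	"fmt"
	"io"
	"net/http"
)

// fetchBlock retrieves the raw data of a single changed block, authorizing
// the request with the token returned alongside the CBT entry. The parameter
// names and URL layout are hypothetical.
func fetchBlock(client *http.Client, dataEndpoint string, offsetBytes, sizeBytes uint64, token string) ([]byte, error) {
	url := fmt.Sprintf("%s?offset=%d&size=%d", dataEndpoint, offsetBytes, sizeBytes)
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+token)

	resp, err := client.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status %s", resp.Status)
	}
	return io.ReadAll(resp.Body)
}
```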
The Kubernetes project currently lacks enough contributors to adequately respond to all PRs. This bot triages PRs according to its standard rules.

/lifecycle stale
/remove-lifecycle stale
Superseded by #4082, due to authorship changes.