KEP-606: finalize podresources API GA graduation #3791
/cc |
Hello, I'm reviewing PRR this time as a shadow. Thanks for updating to the latest questionnaire template. This looks OK to me, I left one small question.
###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?

In 1.18, DDOSing the API can lead to resource exhaustion. It is planned to be addressed as part of G.A.
Is this still relevant, or was it mitigated in previous work toward G.A.? Could you clarify here whether the risk still applies or has been mitigated?
To the best of my knowledge, this has not been addressed yet. I'm looking at options at the gRPC level.
Thank you for acknowledging this risk here. Please be sure the sig considers this information when making the decision about promoting to GA.
@ffromani let's add this to the GA graduation criteria then.
Is this DDOS worse than what can be achieved by querying other endpoints? If not (which would be my guess), it may deserve its own KEP.
This should be addressed, and it should be a GA blocker.
thanks @dchen1107 @SergeyKanzhelev, I'll prepare a followup KEP update to explicitly mention this requirement and make sure it's a GA blocker.
PRR looks good. The sig gets to decide about the risk/benefit for DDOS. /approve
thanks @deads2k for the review and the approval.
I'm still doing research on gRPC to figure out the details, but if we want to implement DDOS prevention, this seems like a viable option. [EDIT] the rate limit logic can be implemented reusing https://pkg.go.dev/golang.org/x/time/rate#NewLimiter - we very likely need just an adapter between the |
/assign @derekwaynecarr
There is a bug (#78628) that keeps the feature from working when the FG is enabled, so put the KEP back into implementable to be able to fix the issue. Full context: https://groups.google.com/g/kubernetes-sig-architecture/c/v4OwOaBkvVc/m/xdfLryLTGQAJ Signed-off-by: Francesco Romani <[email protected]>
The KEP template was too old, so this information was missing. Backfill alpha and beta review data with GA data. Signed-off-by: Francesco Romani <[email protected]>
This change aims to be as close to a mechanical translation as possible; as a consequence, we now have many gaps and TODOs, which will be filled in a followup PR. Signed-off-by: Francesco Romani <[email protected]>
Add a few missing details needed by the new KEP template. Signed-off-by: Francesco Romani <[email protected]>
re-uploaded to actually address a pending comment from @SergeyKanzhelev - sorry for the delay
##### e2e tests

- `k8s.io/kubernetes/test/e2e_node/podresources_test.go`: https://storage.googleapis.com/k8s-triage/index.html?test=POD%20Resources

### Graduation Criteria
sorry for the late feedback. I see an open PR to add Windows support. Should we add Windows support as a graduation criterion? I don't see any blockers there, and it should be straightforward to add.
I am OK with updating this after merging.
AFAICT support for all the platforms was an implicit requirement, but in hindsight it is indeed better to mention it explicitly, thanks for pointing this out.
@ffromani if you can add two additional items to the graduation criteria - Windows support and DDOS attack mitigation analysis - that would be great.
I don't think adding this ^^^ should block the KEP so lgtm:
/lgtm
@ffromani I think addressing the DDOS attack is a GA blocker, and it looks like that is also your plan as written in this KEP. But I don't think we need to block the KEP from being merged before you have the complete analysis and solution. /lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: dchen1107, deads2k, ffromani, SergeyKanzhelev
Clarify GA blockers as asked in kubernetes#3791 (review) kubernetes#3791 (comment)
- Explicitly added Windows support (and all the other platforms supported by device plugins) as a GA condition.
- Added DOS prevention as a GA condition, and clarified the perimeter of the DOS attack surface area.

Signed-off-by: Francesco Romani <[email protected]>
@dchen1107 @SergeyKanzhelev thanks for LGTM/approval! implemented your requests in #3863
The podresources API is a node-local gRPC API exposed by the kubelet over a UNIX-domain socket, which allows clients to query the compute resources exclusively allocated to pods (devices, cpus...).

As part of the feature's GA graduation, we identified the requirement to add rate limiting to prevent DOS from buggy or malicious clients [1][2]. This change therefore extends the KubeletConfiguration to make the rate limit parameters configurable.

The interface intentionally mimics the parameters of the golang/x/time/rate package [3], because it's simple and already used in the codebase. Because of this, the rate limiter configuration parameters depend on each other. For this reason rate limiting is optional and defaults to "no limits" for backward compatibility, but if it is specified, all the rate limit configuration values must be given (e.g. burst doesn't make much sense without frequency, see [3]).

+++
[1] kubernetes/enhancements#3791
[2] kubernetes/enhancements#3863
[3] https://pkg.go.dev/golang.org/x/time/rate#Limiter

Signed-off-by: Francesco Romani <[email protected]>
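The "all or nothing" rule described in the commit message could be validated roughly as follows. This is only a sketch: `podResourcesRateLimit`, its `QPS` and `Burst` fields, and `validateRateLimit` are hypothetical names chosen for illustration and are not the actual KubeletConfiguration fields.

```go
package main

import (
	"errors"
)

// podResourcesRateLimit is a hypothetical configuration fragment.
// Pointers distinguish "not set" from an explicit value.
type podResourcesRateLimit struct {
	QPS   *float64 // sustained requests per second (assumed name)
	Burst *int     // maximum burst size (assumed name)
}

// validateRateLimit accepts a fully unset configuration (meaning
// "no limits", the backward-compatible default) or a fully set one,
// and rejects anything in between.
func validateRateLimit(c podResourcesRateLimit) error {
	switch {
	case c.QPS == nil && c.Burst == nil:
		return nil // rate limiting disabled
	case c.QPS == nil || c.Burst == nil:
		return errors.New("rate limit: QPS and Burst must be set together")
	case *c.QPS <= 0 || *c.Burst <= 0:
		return errors.New("rate limit: QPS and Burst must be positive")
	}
	return nil
}
```

Rejecting a partially specified configuration up front avoids having to guess a burst for a given frequency, or the other way around.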
The PR is organized as follows: