Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KEP-606: finalize podresources API GA graduation #3791

Merged
merged 4 commits into from
Feb 9, 2023

Conversation

ffromani
Copy link
Contributor

  1. first, we move back the KEP to GA to be able to do the missing fixes, see here for context
  2. adapt to the most recent KEP template adding the PRR information
  3. mechanical translation from the older document format to the newest available at moment of writing. Care is taken to alter as little as possible the original content, with the only purpose to make it fit in the new format
  4. add all the new information not required when the original document was written, mostly concern with the PRR reviews.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jan 27, 2023
@k8s-ci-robot k8s-ci-robot added kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/node Categorizes an issue or PR as relevant to SIG Node. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jan 27, 2023
@swatisehgal
Copy link
Contributor

/cc

Copy link
Contributor

@jeremyrickard jeremyrickard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello, I'm reviewing PRR this time as a shadow. Thanks for updating to the latest questionnaire template. This looks OK to me, I left one small question.


###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?

In 1.18, DDOSing the API can lead to resource exhaustion. It is planned to be addressed as part of G.A.
Copy link
Contributor

@jeremyrickard jeremyrickard Feb 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this still relevant or was this mitigated in previous work toward G.A. Could you add clarification here about how this is still relevant or if it's been mitigated?

Copy link
Contributor Author

@ffromani ffromani Feb 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To the best of my knowledge, this was not addressed indeed yet. I'm looking at options at gRPC level.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for acknowledging this risk here. Please be sure the sig considers this information when making the decision about promoting to GA.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ffromani let's add this into the GA graduation criteria than.

Is this DDOS worse than what can be achieved by querying other endpoints? If not (would be my quess), it may be deserving it's own KEP.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be addressed and GA blocker.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @dchen1107 @SergeyKanzhelev , I'll prepare a followup KEP update to explicitely mention this requisite and make sure it's a GA blocker.

@jeremyrickard
Copy link
Contributor

@ffromani I believe you are missing an updated question from the PRR template (see for the change: 78200cb). Could you add that to your PRR questionnaire and provide an answer?

@ffromani
Copy link
Contributor Author

ffromani commented Feb 7, 2023

@ffromani I believe you are missing an updated question from the PRR template (see for the change: 78200cb). Could you add that to your PRR questionnaire and provide an answer?

Sorry I missed that part! added in the latest upload.

@deads2k
Copy link
Contributor

deads2k commented Feb 7, 2023

PRR looks good. The sig gets to decide about the risk/benefit for DDOS.

/approve

@ffromani
Copy link
Contributor Author

ffromani commented Feb 8, 2023

PRR looks good. The sig gets to decide about the risk/benefit for DDOS.

/approve

thanks @deads2k for the review and the approval.
@SergeyKanzhelev @klueska regarding DDOS mitigation, we can address at gRPC level with a rate limiter interceptor (https://github.com/grpc-ecosystem/go-grpc-middleware/blob/master/ratelimit/ratelimit.go). The simplest next step could look like:

  • add a kubelet option to limit podresources call per second - the limit would be initially per-call
  • install a ratelimit interceptor to implement the ratelimit
  • this way any given client will have a hard cap of calls/second through the API

I'm still doing research on gRPC to figure out the details, but if we want to implement DDOS prevention, this seems like a viable option.

[EDIT] the rate limit logic can be implemented reusing https://pkg.go.dev/golang.org/x/time/rate#NewLimiter - we very likely need just an adapter between the x/time/rate interface and the gRPC interface.
Note: we already have this dep in k/k go.mod including the ratelimit grpc package

@ffromani
Copy link
Contributor Author

ffromani commented Feb 8, 2023

/assign @derekwaynecarr
for approval per bot suggestion

There is a bug (#78628) that keep the feature from working when the FG
is enabled, so putting the KEP back into implementable to be able to
fix the issue.

Full context: https://groups.google.com/g/kubernetes-sig-architecture/c/v4OwOaBkvVc/m/xdfLryLTGQAJ

Signed-off-by: Francesco Romani <[email protected]>
The KEP template was too old, so this information was missing.
Backfill alpha and beta review data with GA data.

Signed-off-by: Francesco Romani <[email protected]>
This change wants to do as much as mechanical translation
as possible; as consequence, we have now many gaps and TODOs,
which will be filled in followup PR.

Signed-off-by: Francesco Romani <[email protected]>
Add few missing details needed by the new KEP template.

Signed-off-by: Francesco Romani <[email protected]>
@ffromani
Copy link
Contributor Author

ffromani commented Feb 8, 2023

re-uploaded to actually address a pending comment from @SergeyKanzhelev - sorry for the delay

##### e2e tests

- `k8s.io/kubernetes/test/e2e_node/podresources_test.go`: https://storage.googleapis.com/k8s-triage/index.html?test=POD%20Resources

### Graduation Criteria
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry for the late feedback. I see an open PR to add Windows support. Should we add Windows to be a graduation criteria? I don't see any blockers there and should be straightforward thing to add

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am ok to update this after the merging.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAICT support for all the platforms was a implicit requirement, but in hindisight is indeed better to mention it explicitely, thanks for pointing this out.

Copy link
Member

@SergeyKanzhelev SergeyKanzhelev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ffromani if you can add two additional items to graduation criteria - Windows support and DDOS attach mitigation analysis, that would be great.

I don't think adding this ^^^ should block the KEP so lgtm:

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 8, 2023
@dchen1107
Copy link
Member

@ffromani I think addressing DDOS attach is GA blocker, and looks like that is also your plan written in this KEP. But I don't think we need to block KEP being merged before you have the complete analysis and solution though.

/lgtm
/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dchen1107, deads2k, ffromani, SergeyKanzhelev

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 9, 2023
@dchen1107 dchen1107 added this to the v1.27 milestone Feb 9, 2023
@k8s-ci-robot k8s-ci-robot merged commit fade6bb into kubernetes:master Feb 9, 2023
@ffromani ffromani deleted the kep-606-ga branch February 9, 2023 07:56
ffromani added a commit to ffromani/enhancements that referenced this pull request Feb 9, 2023
Clarify GA blockers as asked in
kubernetes#3791 (review)
kubernetes#3791 (comment)

- Explicitely added windows support (and all the other platforms supported by
  device plugins) as GA condition.
- Added DOS prevention as GA condition, and clarified the perimeter of
  the DOS attack surface area.

Signed-off-by: Francesco Romani <[email protected]>
@ffromani
Copy link
Contributor Author

ffromani commented Feb 9, 2023

@dchen1107 @SergeyKanzhelev thanks for LGTM/approval! implemented your requests in #3863

ffromani added a commit to ffromani/enhancements that referenced this pull request Feb 9, 2023
Clarify GA blockers as asked in
kubernetes#3791 (review)
kubernetes#3791 (comment)

- Explicitely added windows support (and all the other platforms supported by
  device plugins) as GA condition.
- Added DOS prevention as GA condition, and clarified the perimeter of
  the DOS attack surface area.

Signed-off-by: Francesco Romani <[email protected]>
ffromani added a commit to ffromani/kubernetes that referenced this pull request Mar 2, 2023
The podresources API is a node-local gRPC API exposed by the kubelet
using a UNIX-domain socket which allows client to query about compute
resources exclusively allocated to pods (devices, cpus...)

As part as the feature GA graduation, we identified the
requirement to add rate limiting to prevent DOS from buggy or malicious
clients [1][2].

So this change extends the KubeletConfiguration to allow to
configure the ratelimit parameters.

The interface intentionally mimics the parameters of the
golang/x/time/rate package [3], because it's simple and already being
used in the codebase.

Because of this, there is an interdependency between the rate limiter
configuration parameters. This is the reason why the rate limiting is
optional, with defaults to "no limits" for backward compatibility, but
if specified, all the rate limit configuration values must be given
(e.g. burst doesn't make much sense without frequency, see [3]).

+++

[1] kubernetes/enhancements#3791
[2] kubernetes/enhancements#3863
[3] https://pkg.go.dev/golang.org/x/time/rate#Limiter

Signed-off-by: Francesco Romani <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/node Categorizes an issue or PR as relevant to SIG Node. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
Development

Successfully merging this pull request may close these issues.

8 participants