Promote VisibilityOnDemand to Beta #2973

Closed
3 tasks
Tracked by #3192
mimowo opened this issue Sep 4, 2024 · 25 comments · Fixed by #3008
Labels
kind/feature Categorizes issue or PR as related to a new feature.

Comments

@mimowo
Contributor

mimowo commented Sep 4, 2024

What would you like to be added:

Promote the VisibilityOnDemand to Beta.

Why is this needed:

  1. The feature is assumed to be enabled by the "POSITION IN QUEUE" column in the kueuectl list workload command
  2. The feature has been in alpha for quite a long time already
  3. Enablement currently involves two steps, which is sometimes confusing to users (see the sketch below)
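
For context, the current two-step enablement looks roughly like this; the patch target and release URL below are illustrative, and the exact steps depend on the install method:

# Step 1: enable the VisibilityOnDemand feature gate on the Kueue controller manager,
# e.g. by appending the flag to the manager container's args.
kubectl -n kueue-system patch deployment kueue-controller-manager --type=json \
  -p='[{"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"--feature-gates=VisibilityOnDemand=true"}]'

# Step 2: install the separate visibility API manifest shipped with the release
# ($VERSION is a placeholder for the Kueue release in use).
kubectl apply --server-side -f https://github.com/kubernetes-sigs/kueue/releases/download/$VERSION/visibility-api.yaml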

Completion requirements:

This enhancement requires the following artifacts:

  • Design doc
  • API change
  • Docs update

The artifacts should be linked in subsequent comments.

@mimowo added the kind/feature label Sep 4, 2024
@mimowo
Contributor Author

mimowo commented Sep 4, 2024

@mimowo
Contributor Author

mimowo commented Sep 4, 2024

From the graduation criteria:

Release the API in beta and guarantee backwards compatibility,

This is the objective of the change.

Reconsider introducing a throttling mechanism based on user and review feedback,

I'm not aware of users' feedback to justify the effort. We can move it to GA criteria.

Consider introducing FlowScheme and PriorityLevelConfiguration to allow admins to easily tune API priorities.

Similarly, I'm not aware of feedback to justify the effort; I would suggest moving it to the GA criteria.

There were also ideas for new endpoints like #2145, but it is not clear they are needed for now. I think we should be OK to graduate the API we have, and we can still add new endpoints while in Beta if needed.

@alculquicondor
Contributor

An important question when graduating is whether we want to include visibility-api.yaml as part of the main YAML.
I suppose there are no downsides.

@mbobrovskyi
Contributor

/assign

@mimowo
Contributor Author

mimowo commented Sep 6, 2024

An important question when graduating is whether we want to include visibility-api.yaml as part of the main YAML.
I suppose there are no downsides.

I see no downsides, so I would embed it for simplicity of deployment.

@mbobrovskyi
Contributor

@mimowo @alculquicondor What about QueueVisibility? Should we graduate it too?

@mimowo
Contributor Author

mimowo commented Sep 6, 2024

No, we aim to deprecate it and remove it in the long run.

@tenzen-y
Member

tenzen-y commented Sep 6, 2024

Shouldn't we add some of the guards that we planned in the design phase?
https://github.com/kubernetes-sigs/kueue/tree/main/keps/168-2-pending-workloads-visibility#graduation-criteria

IIUC, the current visibility endpoint is not guarded by any mechanism. So, at the very least, I think it would be better to provide APF configurations, and it would be nice to have a throttling mechanism.

@tenzen-y
Member

tenzen-y commented Sep 6, 2024

If we do not provide any guards for the kube-apiserver, I guess that the internet-exposed kube-apiserver faces some dangers.

Indeed, if we want to use MultiKueue, we need to create a public k8s cluster, since a private cluster is not supported, right?

@mimowo
Contributor Author

mimowo commented Sep 6, 2024

IIUC, the current visibility endpoint is not guarded by any mechanism. So, at the very least, I think it would be better to provide APF configurations, and it would be nice to have a throttling mechanism.

I think there is a basic protection mechanism by APF, assigning one unit (seat) to every request. This may not be very accurate, as listing workloads may take proportionally more time.

Indeed, if we want to use MultiKueue, we need to create a public k8s cluster, since a private cluster is not supported, right?

I'm not sure how MultiKueue changes that. Even single, regular clusters might be public, but there is authentication which would prevent external actors.
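
One way to double-check which FlowSchema and priority level handle a visibility request, assuming APF is enabled (it then reports the matched objects' UIDs in the X-Kubernetes-PF-FlowSchema-UID / X-Kubernetes-PF-PriorityLevel-UID response headers), is something like:

# Show the APF response headers for a visibility request.
kubectl get --raw "/apis/visibility.kueue.x-k8s.io/v1alpha1/clusterqueues/cluster-queue/pendingworkloads" -v=8 2>&1 | grep -i 'x-kubernetes-pf'

# Map the UIDs back to names.
kubectl get flowschemas -o custom-columns=NAME:.metadata.name,UID:.metadata.uid
kubectl get prioritylevelconfigurations -o custom-columns=NAME:.metadata.name,UID:.metadata.uid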

@mimowo
Contributor Author

mimowo commented Sep 6, 2024

But I agree it requires some extra research. In particular, it would be good to refresh our knowledge of the default protection mechanism and see how much effort it would take to provide a dedicated FlowSchema.

So, I would suggest recording the findings in the KEP first, before the code PR.

@tenzen-y
Member

I think there is a basic protection mechanism by APF, assigning one unit (seat) to every request. This may not be very accurate, as listing workloads may take proportionally more time.

I guess that it would be better to provide a dedicated FlowSchema for the kueue-system namespace so that we can prevent the kueue-manager from occupying the kube-apiserver seats when the visibility endpoints are attacked by malicious people.
But, before we introduce the FlowSchema, we need to investigate a bit whether the mechanism mitigates the problem.

Even single, regular clusters might be public, but there is authentication which would prevent external actors.

Yes, that's right. But, even if malicious people do not have credentials to access the kube-apiserver, they can still put a high load on the kube-apiserver via the visibility endpoints.

@mimowo
Contributor Author

mimowo commented Sep 11, 2024

I guess that it would be better to provide a dedicated FlowSchema for the kueue-system namespace so that we can prevent the kueue-manager from occupying the kube-apiserver seats when the visibility endpoints are attacked by malicious people.
But, before we introduce the FlowSchema, we need to investigate a bit whether the mechanism mitigates the problem.

@mbobrovskyi is investigating this and will summarize the findings, but from what we see the default global-default schema of the P&F mechanism protects the endpoint. Yes, we could introduce a custom FlowSchema for Kueue, but it is a complication and I'm not sure it is needed. We don't have any feedback from users that the built-in mechanism is not enough.

Yes, that's right. But, even if malicious people do not have credentials to access the kube-apiserver, they can still put a high load on the kube-apiserver via the visibility endpoints.

I don't think you can query the visibility endpoints without credentials; I would assume this would be bad and is not allowed. However, we may do an experiment on a live cluster to confirm.

EDIT: note that the custom endpoints are exposed by the kube-apiserver via the aggregation layer, not directly, so the authentication & authorization mechanism is in place, similarly to P&F.

@tenzen-y
Member

Anyway, we can discuss this after the summary is made public.

@mimowo
Contributor Author

mimowo commented Sep 11, 2024

I have confirmed on GKE (and @mbobrovskyi checked on Kind) that requests to the custom endpoint are authorized. In particular, I ran:

curl -kH "Authorization: Bearer $ACCESS_TOKEN" -H "Accept: application/json" "https://$CLUSTER_ENDPOINT/apis/visibility.kueue.x-k8s.io/v1alpha1/clusterqueues/cluster-queue/pendingworkloads"

This returns results, while if I remove the Authorization header I get:

{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "nodes is forbidden: User \"system:anonymous\" cannot list resource \"nodes\" in API group \"\" at the cluster scope",
  "reason": "Forbidden",
  "details": {
    "kind": "nodes"
  },
  "code": 403
}        

I get the analogous error for other built-in endpoints, such as https://$CLUSTER_ENDPOINT/api/v1/nodes

@mbobrovskyi
Contributor

Yes, the Visibility API endpoints are protected by default with the global-default FlowSchema and PriorityLevelConfiguration. Below is the configuration for both:

apiVersion: flowcontrol.apiserver.k8s.io/v1
kind: FlowSchema
metadata:
  annotations:
    apf.kubernetes.io/autoupdate-spec: "true"
  name: global-default
spec:
  distinguisherMethod:
    type: ByUser
  matchingPrecedence: 9900
  priorityLevelConfiguration:
    name: global-default
  rules:
  - nonResourceRules:
    - nonResourceURLs:
      - '*'
      verbs:
      - '*'
    resourceRules:
    - apiGroups:
      - '*'
      clusterScope: true
      namespaces:
      - '*'
      resources:
      - '*'
      verbs:
      - '*'
    subjects:
    - group:
        name: system:unauthenticated
      kind: Group
    - group:
        name: system:authenticated
      kind: Group
---
apiVersion: flowcontrol.apiserver.k8s.io/v1
kind: PriorityLevelConfiguration
metadata:
  annotations:
    apf.kubernetes.io/autoupdate-spec: "true"
  name: global-default
spec:
  limited:
    lendablePercent: 50
    limitResponse:
      queuing:
        handSize: 6
        queueLengthLimit: 50
        queues: 128
      type: Queue
    nominalConcurrencyShares: 20
  type: Limited

I found this when I tried executing a large number of parallel requests to kubectl get --raw /apis/visibility.kueue.x-k8s.io/v1alpha1/clusterqueues/cluster-queue/pendingworkloads.

kubectl get --raw /debug/api_priority_and_fairness/dump_priority_levels returns a table with a DispatchedRequests column that shows how many requests were dispatched for each configuration.

@mbobrovskyi
Contributor

mbobrovskyi commented Sep 11, 2024

The LimitResponse type in PriorityLevelConfiguration can be set to one of two options:

  • Reject: Excess traffic is immediately rejected with an HTTP 429 (Too Many Requests) error.
  • Queue: Requests exceeding the threshold are placed in a queue.

The global-default configuration uses the Queue type. This means that if there is a high volume of requests, they will be added to a queue and wait for their turn to be processed.
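
For comparison, a PriorityLevelConfiguration using the Reject type would look roughly like the sketch below (an illustrative example, not something deployed on the cluster above); excess requests then fail fast with 429 instead of waiting in a queue:

apiVersion: flowcontrol.apiserver.k8s.io/v1
kind: PriorityLevelConfiguration
metadata:
  name: reject-example   # illustrative name
spec:
  type: Limited
  limited:
    nominalConcurrencyShares: 20
    lendablePercent: 50
    limitResponse:
      type: Reject   # no queuing section when rejecting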

This is an example of what I have:

kubectl get --raw /debug/api_priority_and_fairness/dump_priority_levels 
PriorityLevelName, ActiveQueues, IsIdle, IsQuiescing, WaitingRequests, ExecutingRequests, DispatchedRequests, RejectedRequests, TimedoutRequests, CancelledRequests
catch-all,         0,            true,   false,       0,               0,                 39,                 0,                0,                0
exempt,            0,            true,   false,       0,               0,                 2290,               0,                0,                0
global-default,    4,            false,  false,       57,              368,               23614,              1533,             0,                609
leader-election,   0,            true,   false,       0,               0,                 1170,               0,                0,                0
node-high,         0,            true,   false,       0,               0,                 743,                0,                0,                0
system,            0,            true,   false,       0,               0,                 557,                0,                0,                0
workload-high,     0,            true,   false,       0,               0,                 1706,               0,                0,                0
workload-low,      0,            true,   false,       0,               0,                 4595,               0,                0,                0

And here are the apiserver_flowcontrol_rejected_requests_total metrics that I have:

# This one was rejected because the queue was full
apiserver_flowcontrol_rejected_requests_total{endpoint="https", flow_schema="global-default", instance="172.18.0.4:6443", job="apiserver", namespace="default", priority_level="global-default", reason="queue-full", service="kubernetes"} 924

# This one was rejected due to time-out
apiserver_flowcontrol_rejected_requests_total{endpoint="https", flow_schema="global-default", instance="172.18.0.4:6443", job="apiserver", namespace="default", priority_level="global-default", reason="time-out", service="kubernetes"} 719

Possible rejection reasons (see the command after this list for how to query them):

  • queue-full, indicating that too many requests were already queued.
  • concurrency-limit, indicating that the PriorityLevelConfiguration is configured to reject rather than queue excess requests.
  • time-out, indicating that the request was still in the queue when its queuing time limit expired.
  • cancelled, indicating that the request is not purge locked and has been ejected from the queue.
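
For reference, one way to pull these counters straight from the kube-apiserver, assuming your user is allowed to read the raw /metrics endpoint:

# Dump the APF rejection counters, including the reason label.
kubectl get --raw /metrics | grep apiserver_flowcontrol_rejected_requests_total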

@mbobrovskyi
Contributor

mbobrovskyi commented Sep 11, 2024

I guess that it would be better to provide a dedicated FlowSchema for the kueue-system namespace so that we can prevent the kueue-manager from occupying the kube-apiserver seats when the visibility endpoints are attacked by malicious people.
But, before we introduce the FlowSchema, we need to investigate a bit whether the mechanism mitigates the problem.

I think you're right, it should help. We will have different queues that allow us not to block kube-system or other namespaces when the kueue-system namespace is overloaded. I also tested this scenario, and it worked well.

Just a question: should we have a dedicated FlowSchema for the kube-system namespace or just for the Visibility API? WDYT?

@tenzen-y
Member

I think you're right, it should help. We will have different queues that allow us not to block kube-system or other namespaces when the kueue-system namespace is overloaded. I also tested this scenario, and it worked well.

Just a question: should we have a dedicated FlowSchema for the kube-system namespace or just for the Visibility API? WDYT?

If we can configure the FlowSchema only for the visibility endpoint, it would be great, since we want to keep Kueue operating when the visibility endpoints are attacked by malicious people.

Is it possible to configure the FlowSchema only for the visibility endpoints?

@tenzen-y
Member

The main problem is that the visibility endpoints use the global-default FlowSchema, since most of the controllers and workloads fall under the global default. So, when the global-default level is occupied by the visibility endpoints, the cluster admin and batch users will lose the ability to operate the cluster and their own Jobs.

@mbobrovskyi
Contributor

Is it possible to configure the FlowSchema only for the visibility endpoints?

Yes, we can configure it for the visibility.kueue.x-k8s.io API group. I can prepare a PR for it.
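
For illustration, a FlowSchema scoped to that API group, plus its own priority level, could look roughly like the sketch below; the names, precedence, and concurrency numbers are placeholders, not the configuration from the eventual PR:

apiVersion: flowcontrol.apiserver.k8s.io/v1
kind: FlowSchema
metadata:
  name: kueue-visibility   # illustrative name
spec:
  distinguisherMethod:
    type: ByUser
  matchingPrecedence: 9000   # lower than global-default (9900), so it matches first
  priorityLevelConfiguration:
    name: kueue-visibility
  rules:
  - resourceRules:
    - apiGroups:
      - visibility.kueue.x-k8s.io   # only the visibility endpoints
      clusterScope: true
      namespaces:
      - '*'
      resources:
      - '*'
      verbs:
      - '*'
    subjects:
    - group:
        name: system:authenticated
      kind: Group
---
apiVersion: flowcontrol.apiserver.k8s.io/v1
kind: PriorityLevelConfiguration
metadata:
  name: kueue-visibility   # illustrative name
spec:
  type: Limited
  limited:
    lendablePercent: 90   # placeholder values
    nominalConcurrencyShares: 10
    limitResponse:
      type: Queue
      queuing:
        handSize: 6
        queueLengthLimit: 50
        queues: 64

With something like this in place, a flood of requests to the visibility endpoints would compete for its own small set of seats instead of the global-default level that everything else uses.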

@tenzen-y
Member

Is it possible to configure the FlowSchema only for the visibility endpoints?

Yes, we can configure it for the visibility.kueue.x-k8s.io API group. I can prepare a PR for it.

Thank you for letting me know. Could you wait for a while?

Actually, we are investigating the cloud provider specifications for FlowSchema. After we resolve some questions, @mimowo will comment here or on the KEP PR.

@tenzen-y
Member

Is it possible to configure the FlowSchema only for the visibility endpoints?

Yes, we can configure it for the visibility.kueue.x-k8s.io API group. I can prepare a PR for it.

Thank you for letting me know. Could you wait for a while?

Actually, we are investigating the cloud provider specifications for FlowSchema. After we resolve some questions, @mimowo will comment here or on the KEP PR.

Let's continue the discussion at #3032 (comment).

@mimowo
Contributor Author

mimowo commented Sep 18, 2024

We still have a pending discussion about including the P&F config: #3043 (comment). I think we may need to rework the PR to put it in the docs, and update the KEP.
