
Improve UX of Step Resource Requests #2986

Closed
danielhelfand opened this issue Jul 22, 2020 · 24 comments
Labels
* help wanted: Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines.
* kind/feature: Categorizes issue or PR as related to a new feature.
* lifecycle/frozen: Indicates that an issue or PR should not be auto-closed due to staleness.

Comments

@danielhelfand
Member

Feature request

Currently, users have the ability to define resource requests (and limits) for each step of a Task:

    steps:
    - name: step1
      image: ...
      command: ...
      resources:
        limits:
          cpu: 500m
          memory: 1Gi
        requests:
          cpu: 500m
          memory: 1Gi
    - name: step2
      image: ...
      command: ...
      resources:
        limits:
          cpu: 500m
          memory: 1Gi
        requests:
          cpu: 500m
          memory: 1Gi

This is misleading in my opinion because it makes it seem like the user has control of setting the requests for each step. However, only the max requests from all steps are ever used as documented here.

So a user should only have to, at most, define resource requests (the max values needed for a container) once for steps. Maybe it makes sense to allow users to define CPU, memory, and ephemeral storage once as part of a TaskRun and allow this to be configurable for PipelineRuns via taskRunSpecs?
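
As a rough sketch of what this could look like at the TaskRun level (the computeResources field below is purely illustrative and not part of the current API; an actual design would need its own proposal):

    apiVersion: tekton.dev/v1beta1
    kind: TaskRun
    metadata:
      name: example-taskrun
    spec:
      taskRef:
        name: example-task
      # hypothetical field: requests declared once for the whole TaskRun
      # instead of per step
      computeResources:
        requests:
          cpu: 500m
          memory: 1Gi
          ephemeral-storage: 1Gi

For PipelineRuns, the same block could then be exposed per task via taskRunSpecs.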

Use case

Discussed in #2984: #2984 (comment)
Discussed in #2931: #2931 (comment)

The idea has come up a couple of times in issues where the experience of resource requests is confusing for users, and it might help to provide this option and clearly document it with better examples.

@danielhelfand added the kind/feature label Jul 22, 2020
@imjasonh
Member

+1, the CLI/UI could also make this clearer

Per-step limits are still valid and enforced, so maybe we should migrate to a world where requests are specified per-Task and each step can only set a limit?
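
As a purely hypothetical sketch of that split (the task-level computeResources field below does not exist in this form), a Task would declare its requests once while each step keeps only a limit:

    spec:
      # hypothetical: requests declared once for the whole Task
      computeResources:
        requests:
          cpu: 500m
          memory: 1Gi
      steps:
      - name: step1
        image: ...
        resources:
          limits:
            memory: 1Gi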

@danielhelfand
Member Author

+1, the CLI/UI could also make this clearer

Yes, I think the CLI could definitely add some kind of option for this.

Per-step limits are still valid and enforced, so maybe we should migrate to a world where requests are specified per-Task and each step can only set a limit?

I think it's reasonable to still support limits for steps.

Another benefit of this approach is that it could eliminate the need to search through all containers in a TaskRun to find the max request, since the values would be known in advance.

@danielhelfand
Member Author

Something that will still have to be worked out, though, is how to handle step limits without repeating what occurred in #1937.

@danielhelfand
Member Author

In rethinking this with regard to limits, why not have one limit for the entire Task? That way, the Task can be set up with a limit to be enforced via a TaskRun, while resource requests can be specified at run time via a request property.

Since a step has access to all resources of a pod, is there any reason for restricting individual steps?

@qu1queee
Contributor

Regarding step limits being configurable, would this also apply to the Tekton containers, like the step-create-dir-* containers found in a TaskRun?

@imjasonh
Member

Regarding step limits being configurable, would this also apply to the Tekton containers, like the step-create-dir-* containers found in a TaskRun?

I don't think so. Tekton is responsible for those steps, and should be responsible for configuring their limits.

Perhaps that step should be an initContainer anyway, so there's a clearer distinction between "step containers" (containers) and "Tekton's containers" (initContainers)

@qu1queee
Contributor

Makes sense. Do you think it would be possible for Tekton to define reasonable limit values for those containers (e.g. step-create-dir-*), instead of relying on the LimitRange default value?

@SaschaSchwarze0
Contributor

Regarding step limits being configurable, would this also apply to the Tekton containers, like the step-create-dir-* containers found in a TaskRun?

I don't think so. Tekton is responsible for those steps, and should be responsible for configuring their limits.

Perhaps that step should be an initContainer anyway, so there's a clearer distinction between "step containers" (containers) and "Tekton's containers" (initContainers)

@imjasonh I +1 on Tekton being responsible for those steps (init containers as well as containers added by Tekton) and therefore providing the resources (imo not just limits but also requests). But I suggest that Tekton provides reasonable values.

What we see (I am on the same team as @qu1queee; we are still on Tekton 0.11, about to move to 0.14, so if things have changed in the meantime, please correct me) is the following: we have a LimitRange in place in the namespace where the TaskRun (with an embedded TaskSpec) gets created. The LimitRange contains default, defaultRequest, min and max for CPU and memory.

For initContainers, Tekton takes default and defaultRequest. This is imo not correct, because in our case it requests 1 Gi of memory, which is way too much. If Tekton is responsible for those init containers, then it should specify reasonable values. Tekton knows what these containers are doing, so it should specify a request and limit that is suitable for that (while still respecting min and max of the LimitRange to create valid containers).

The same applies to Tekton's containers (step-create-dir-image, step-image-digest-exporter), where Tekton imo takes min for the request and default as the limit. Tekton here should also specify reasonable values suitable for what the containers are doing, not arbitrary values from a LimitRange that it can't control (but has to respect). And I agree, step-create-dir-image sounds more like an initContainer.
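
For reference, a LimitRange of the kind described above might look roughly like this (the values are illustrative only, not our actual configuration):

    apiVersion: v1
    kind: LimitRange
    metadata:
      name: example-limitrange
    spec:
      limits:
      - type: Container
        default:           # limit applied to containers that specify none
          cpu: "1"
          memory: 1Gi
        defaultRequest:    # request applied to containers that specify none
          cpu: 500m
          memory: 512Mi
        min:
          cpu: 100m
          memory: 128Mi
        max:
          cpu: "2"
          memory: 4Gi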

In rethinking this with regards to limits, why not have one limit for the entire Task? This way, the Task can be set up with a limit to be enforced via a TaskRun, but then resource requests can be specified at run time via a request property?

Since a step has access to all resources of a pod, is there any reason for restricting individual steps?

@danielhelfand that would be new to me. A step is a container and only has access to the memory and CPU defined for that container.

This is misleading in my opinion because it makes it seem like the user has control of setting the requests for each step. However, only the max requests from all steps are ever used as documented here.

So a user should only have to, at most, define resource requests (the max values needed for a container) once for steps. Maybe it makes sense to allow users to define CPU, memory, and ephemeral storage once as part of a TaskRun and allow this to be configurable for PipelineRuns via taskRunSpecs?

@danielhelfand No, this refers to my previous comment. As my understanding is that CPU and memory are assigned per container, they can't be assigned once per TaskRun.

I would really like to see a mechanism where the user creating the TaskSpec has control over the step resource requests AND limits. I agree with the concern outlined elsewhere that, because the containers are all started at the same time, this can easily eat the resources of a node when there are many steps (or even make the pod unschedulable). But I also think that the current algorithm can cause an issue.

Assume there are two steps, and leave all Tekton-added containers out of the picture for simplification. The first step has request = limit = 1 Gi, the second has request = limit = 2 Gi. When the LimitRange min is 128 Mi, Tekton will change the request of the first step to 128 Mi. Kubernetes can now schedule this pod to a node that currently has only 2176 Mi of memory available, because scheduling happens based on the requests (and not the limits). The first step now starts to run. If that is an operation that really needs memory up to 1 Gi, then this container will be OOMKilled because the node can't give it the necessary memory, even though the container limit is not reached.

I am not asking for the current mechanism to be generally changed, but there should at least be an option somewhere (global as an env property, a flag on the TaskSpec, something like that) that forces the step's request and limit to be set on the Pod's containers without modification.
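
To make the scenario concrete, the step containers of the resulting pod would end up roughly like this under the algorithm described above (illustrative names and values, not actual controller output):

    containers:
    - name: step-one            # declared request = limit = 1Gi
      resources:
        requests:
          memory: 128Mi         # lowered to the LimitRange min
        limits:
          memory: 1Gi
    - name: step-two            # carries the max request across steps
      resources:
        requests:
          memory: 2Gi
        limits:
          memory: 2Gi

The scheduler only sums the requests (128 Mi + 2 Gi = 2176 Mi), so the pod can land on a node that cannot actually provide the full 1 Gi the first step may need.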

@danielhelfand
Member Author

@SaschaSchwarze0 Not sure if you have read how Tekton handles step/container resource requests, but there are some differences from the traditional approach by Kubernetes. Please see the documentation below:

* https://docs.google.com/presentation/d/1-FNMMbRuxckAInO2aJPtzuINqV-9pr-M25OXQoGbV0s/edit#slide=id.p
* https://github.com/tektoncd/pipeline/blob/master/docs/taskruns.md#specifying-limitrange-values
* Also, see issue #598

@SaschaSchwarze0
Contributor

@SaschaSchwarze0 Not sure if you have read how Tekton handles step/container resource requests, but there are some differences from the traditional approach by Kubernetes. Please see the documentation below:

* https://docs.google.com/presentation/d/1-FNMMbRuxckAInO2aJPtzuINqV-9pr-M25OXQoGbV0s/edit#slide=id.p

* https://github.com/tektoncd/pipeline/blob/master/docs/taskruns.md#specifying-limitrange-values

* Also, see issue #598

Yes, I know it.

@danielhelfand
Member Author

I agree that the request is reasonable as far as being able to override step resource requests, if it is indeed the case that each step container doesn't have access to the max resources defined across all containers.

cc @imjasonh Any thoughts on this?

@qu1queee
Contributor

qu1queee commented Aug 8, 2020

folks, any update on this?

@tekton-robot
Collaborator

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.

/lifecycle stale

Send feedback to tektoncd/plumbing.

@tekton-robot added the lifecycle/stale label Nov 6, 2020
@vdemeester
Member

/remove-lifecycle stale

@tekton-robot removed the lifecycle/stale label Nov 9, 2020
@vdemeester mentioned this issue Nov 20, 2020
@erinz

erinz commented Dec 1, 2020

+1, we need this feature too.

@zhangtbj
Contributor

zhangtbj commented Dec 2, 2020

+1, I think it doesn't need to be an explicit setting in the Tekton resource definition.

But in a shared or complex Kubernetes environment, it is better to provide admins with the capability to customize these settings. Although some of the containers are created by Tekton, they also depend on end-user input, and just using the default settings may cause problems.

@tekton-robot
Collaborator

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale with a justification.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle stale

Send feedback to tektoncd/plumbing.

@tekton-robot added the lifecycle/stale label Mar 2, 2021
@tekton-robot
Collaborator

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten with a justification.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle rotten

Send feedback to tektoncd/plumbing.

@tekton-robot added the lifecycle/rotten label and removed the lifecycle/stale label Apr 1, 2021
@vdemeester
Member

/lifecycle frozen
We still need to support this.

@tekton-robot added the lifecycle/frozen label and removed the lifecycle/rotten label Apr 16, 2021
@dibyom added the help wanted label May 4, 2021
@bobcatfish
Collaborator

Quick update: @vdemeester is making some improvements to how Tekton handles LimitRanges (#4176), though I think this particular issue is also requesting support for specifying limits at the Task level.

lbernick added a commit to lbernick/community that referenced this issue Dec 20, 2021
This commit expands the scope of TEP-0094 to cover the user experience of
specifying resource requests and limits in Tasks.
Focusing only on Step and Sidecar resource requirements may be too narrow of a scope for this TEP.

This is largely motivated by tektoncd/pipeline#2986, because the solution
to this problem may involve removing the ability to specify Step resource requests.
It doesn't make sense to override Step resource requests in TaskRuns if users shouldn't be able
to specify Step resource requests in the first place.

The scope is also expanded to include parameterizing resource requests based on discussion in
tektoncd#560, around treating resource requirement parameterization
and runtime overrides as "both/and", rather than "either/or". Fixing Task's resource requirement UX
may allow us to get parameterization for free.
@lbernick
Member

As of v0.28.0, Step resource requests and limits are left unchanged, or adjusted based on limitranges present (see docs for more info)-- thanks @vdemeester!

I'm going to close this issue because the Step level behavior described is no longer true. I've opened #4470 as a FR for Task-level resource requirements.

/close

@tekton-robot
Collaborator

@lbernick: You can't close an active issue/PR unless you authored it or you are a collaborator.

In response to this:

As of v0.28.0, Step resource requests and limits are left unchanged, or adjusted based on limitranges present (see docs for more info)-- thanks @vdemeester!

I'm going to close this issue because the Step level behavior described is no longer true. I've opened #4470 as a FR for Task-level resource requirements.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@vdemeester
Member

/close
🙏🏼

@tekton-robot
Collaborator

@vdemeester: Closing this issue.

In response to this:

/close
🙏🏼

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
