Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[TEP-0094]: Rescope to Task resource UX #588

Closed
wants to merge 1 commit into from

Conversation

lbernick
Copy link
Member

This commit expands the scope of TEP-0094 to cover the user experience of
specifying resource requests and limits in Tasks.
Focusing only on Step and Sidecar resource requirements may be too narrow of a scope for this TEP.

This is largely motivated by tektoncd/pipeline#2986, because the solution
to this problem may involve removing the ability to specify Step resource requests.
It doesn't make sense to override Step resource requests in TaskRuns if users shouldn't be able
to specify Step resource requests in the first place.

The scope is also expanded to include parameterizing resource requests based on discussion in
#560, around treating resource requirement parameterization
and runtime overrides as "both/and", rather than "either/or". Fixing Task's resource requirement UX
may allow us to get parameterization for free.

/kind tep

This commit expands the scope of TEP-0094 to cover the user experience of
specifying resource requests and limits in Tasks.
Focusing only on Step and Sidecar resource requirements may be too narrow of a scope for this TEP.

This is largely motivated by tektoncd/pipeline#2986, because the solution
to this problem may involve removing the ability to specify Step resource requests.
It doesn't make sense to override Step resource requests in TaskRuns if users shouldn't be able
to specify Step resource requests in the first place.

The scope is also expanded to include parameterizing resource requests based on discussion in
tektoncd#560, around treating resource requirement parameterization
and runtime overrides as "both/and", rather than "either/or". Fixing Task's resource requirement UX
may allow us to get parameterization for free.
@tekton-robot tekton-robot added the kind/tep Categorizes issue or PR as related to a TEP (or needs a TEP). label Dec 20, 2021
@tekton-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign pritidesai
You can assign the PR to them by writing /assign @pritidesai in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@tekton-robot tekton-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Dec 20, 2021
@jerop
Copy link
Member

jerop commented Dec 21, 2021

/assign

@jerop
Copy link
Member

jerop commented Jan 10, 2022

/assign @dibyom

Comment on lines +48 to +50
### Confusing Task Resource Requirements UX
Under the hood, Tekton modifies the resource requests users specify to account for how the Tekton entrypoint runs each `Step` container.
These modifications aren't transparent to users and have caused confusion.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is kind-of not true 😛. Starting in 0.28 (I think), tekton never modifies resource request users specified, but it will add some resource request/limits on anything that have not been specified by the user, if there is a (or several) LimitRange in the namespace.

Comment on lines +54 to +57
A pod's [effective resource requests and limits](https://kubernetes.io/docs/concepts/workloads/pods/init-containers/#resources) are the higher of:

- the sum of container and sidecar resource requests/limits
- the maximum requests/limits of any init container
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't it the sum of both ? (aka the max of init containers + the sum of container — including sidecars) ?


This information is used to schedule the pod on a node, and the pod may not be scheduled if its requirements are too large.
Cluster admins may also define [resource quotas](https://kubernetes.io/docs/concepts/policy/resource-quotas/), which restrict resource consumption
per namespace, and [limit ranges](https://kubernetes.io/docs/concepts/policy/limit-range/), which restrict resource consumption per pod, container, or PVC.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LimitRange are also there to set some default value if the user doesn't set any.

Tekton `Step`s are run sequentially. This means that the amount of resources needed by the `Task`'s pod is constrained by the maximum `Step`
resource requests, rather than the sum of the resource requests of all `Step`s.

Tekton implements this by applying the maximum `Step` resource request to one container, and not applying resource requests to any other container.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not true anymore though, it doesn't zero out resource request anymore, it uses what the user gave or use LimitRange but even there, never zeros out anything.

Comment on lines +69 to +72
Tekton modifies container resource requirements to comply with any LimitRange policies. For example, if a LimitRange requires a minimum resource
request, Tekton modifies each container to request at least that minimum.
See [Presentation: Tekton Resource Requests](https://docs.google.com/presentation/d/1-FNMMbRuxckAInO2aJPtzuINqV-9pr-M25OXQoGbV0s/edit#slide=id.p)
and [Tekton LimitRange docs](https://github.com/tektoncd/pipeline/blob/main/docs/taskruns.md#specifying-limitrange-values) for more information.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does this on anything the user didn't specify. If a user specify a resource request lower than the allowed value, Tekton would just "let it fail", because it's coming from the user.

Comment on lines +74 to +76
As a result, all `Step` resource requests other than the largest are ignored by Tekton, causing confusion for users.
In addition, Tekton doesn't currently account for [ResourceQuotas] when determining container resource requests, as documented in
[Issue: Resource request not applied to init and sidecar containers](https://github.com/tektoncd/pipeline/issues/2933).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above, this is not true anymore.


### Goals

- Align `Task` resource requirements API with how `Task`s are scheduled.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure I understand that item, what does it mean ? 🤔

Comment on lines +102 to +104
- Allow parameterization of any resource requirement related fields defined in `Task`, including within `Step`s and `Sidecar`s.
- Add configuration to `TaskRun` allowing users to modify resource requirements defined in the corresponding `Task`,
including those defined in `Step`s and `Sidecar`s.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is the reason of this "rescope" but those two item can be tackled completely independently of each other (but we can also group them in this TEP).

@lbernick
Copy link
Member Author

The main reason for rescoping this TEP was that I was concerned users did not like the way of specifying Step resource requirements (described in tektoncd/pipeline#2986), so overriding it wouldn't fix that problem.
However, it looks like the behavior described in that issue has been addressed, and the issue has been left open as a FR for Task-level resource requests. Because of this, I believe the rescoping is not necessary, and we should go ahead with the original proposal in #560.

@lbernick lbernick closed this Jan 12, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/tep Categorizes issue or PR as related to a TEP (or needs a TEP). size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants