Step Resource "Request" Distribution Regression #6525

Open
skaegi opened this issue Apr 12, 2023 · 9 comments
Open

Step Resource "Request" Distribution Regression #6525

skaegi opened this issue Apr 12, 2023 · 9 comments
Labels: kind/bug, lifecycle/rotten

Comments

@skaegi (Contributor) commented Apr 12, 2023

This functionality was originally implemented in #723, but it looks like part of it was dropped during a refactor in #4176.

The now-missing functionality is:
"Since steps run sequentially we rewrite their resource "requests" to only have the max request for that resource name in one step container and zero out the requests for that resource name in the other step containers."

The result is that Tekton "requests" more resources than it needs, which significantly reduces bin packing when running a Tekton service.
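
For reference, a minimal sketch of that rewrite, reconstructed from the quoted description (illustrative names, not the actual pre-v0.28 code, which is linked in a later comment):

package pod

import (
    corev1 "k8s.io/api/core/v1"
    "k8s.io/apimachinery/pkg/api/resource"
)

// zeroNonMaxResourceRequests rewrites step containers so that, for each
// resource name, only the container with the largest request keeps it and
// every other container's request for that name is zeroed. Sidecars run
// concurrently with steps, so they are left untouched.
func zeroNonMaxResourceRequests(steps []corev1.Container) {
    // For each resource name, find the index of the step with the max request.
    maxIdx := map[corev1.ResourceName]int{}
    for i, s := range steps {
        for name, req := range s.Resources.Requests {
            if j, ok := maxIdx[name]; !ok || req.Cmp(steps[j].Resources.Requests[name]) > 0 {
                maxIdx[name] = i
            }
        }
    }
    // Zero every request that is not the max for its resource name.
    for i := range steps {
        for name := range steps[i].Resources.Requests {
            if maxIdx[name] != i {
                steps[i].Resources.Requests[name] = resource.MustParse("0")
            }
        }
    }
}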

For example...

apiVersion: tekton.dev/v1beta1
kind: TaskRun
metadata:
  name: resourcerequest-taskrun
spec:
  taskSpec:
    steps:
    - name: s1
      image: alpine
      script: sleep 1
      resources:
        requests:
          memory: "32Mi"
          cpu: "250m"
    - name: s2
      image: alpine
      script: sleep 1
      resources:
        requests:
          memory: "64Mi"
          cpu: "125m"
    sidecars:
    - name: sc1
      image: alpine
      script: sleep 30
      resources:
        requests:
          memory: "32Mi"
          cpu: "250m"
    - name: sc2
      image: alpine
      script: sleep 30
      resources:
        requests:
          memory: "64Mi"
          cpu: "125m"

Expected Behavior

Containers:
  step-s1:
    Requests:
      cpu:                250m
      memory:             0
  step-s2:
    Requests:
      cpu:                0
      memory:             64Mi
  sidecar-sc1:
    Requests:
      cpu:                250m
      memory:             32Mi
  sidecar-sc2:
    Requests:
      cpu:                125m
      memory:             64Mi

Actual Behavior

Containers:
  step-s1:
    Requests:
      cpu:                250m
      memory:             32Mi
  step-s2:
    Requests:
      cpu:                125m
      memory:             64Mi
  sidecar-sc1:
    Requests:
      cpu:                250m
      memory:             32Mi
  sidecar-sc2:
    Requests:
      cpu:                125m
      memory:             64Mi
skaegi added the kind/bug label on Apr 12, 2023
@vdemeester (Member) commented

cc @lbernick
This changed again because, in some cases, setting things to 0 is a problem.

@lbernick (Member) commented

It seems like we could have been clearer about the fact that this was a breaking change, but I'm hesitant to revert it. This behavior has been in place since 0.28 (about a year and a half), and reverting it may be confusing. Also, another user filed an issue saying that the original behavior was confusing: #2986 (comment)

I think you can achieve the behavior you're interested in with task-level compute resources. Would you consider trying this feature?

Also, to add to what Vincent said, setting compute resources to 0 doesn't interact correctly with LimitRanges, and I think some flavors of k8s don't allow it. That's why, for task-level compute resources, we chose to divide the requests between containers (tektoncd/community@2a66576).
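
For illustration, the TaskRun from the issue description could make a single task-level request instead (a sketch using the computeResources field from TEP-0104; depending on your Tekton version this may require the enable-api-fields feature flag, and it cannot be combined with per-step requests):

apiVersion: tekton.dev/v1beta1
kind: TaskRun
metadata:
  name: resourcerequest-taskrun
spec:
  # One request for the whole pod: the max each sequential step needs.
  computeResources:
    requests:
      memory: "64Mi"
      cpu: "250m"
  taskSpec:
    steps:
    - name: s1
      image: alpine
      script: sleep 1
    - name: s2
      image: alpine
      script: sleep 1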

@skaegi (Contributor, Author) commented Apr 13, 2023

Link to the previous code that had this logic: https://github.com/tektoncd/pipeline/blob/release-v0.27.x/pkg/pod/resource_request.go

re: 0 -- instead we might omit the value entirely:

Containers:
  step-s1:
    Requests:
      cpu:                250m
  step-s2:
    Requests:
      memory:             64Mi
  sidecar-sc1:
    Requests:
      cpu:                250m
      memory:             32Mi
  sidecar-sc2:
    Requests:
      cpu:                125m
      memory:             64Mi

or alternatively spread it like we do with limit ranges (see the sketch after this example):

Containers:
  step-s1:
    Requests:
      cpu:                125m
      memory:             32Mi
  step-s2:
    Requests:
      cpu:                125m
      memory:             32Mi
  sidecar-sc1:
    Requests:
      cpu:                250m
      memory:             32Mi
  sidecar-sc2:
    Requests:
      cpu:                125m
      memory:             64Mi
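
A sketch of that spreading alternative, with the same package and imports as the earlier sketch (illustrative names; maxPerResource holds the per-resource max across steps, computed as in the next comment):

// spreadRequests gives every step container an even share of the largest
// request any single step makes for each resource name, mirroring how
// limit-range defaults are spread across containers.
func spreadRequests(steps []corev1.Container, maxPerResource corev1.ResourceList) {
    if len(steps) == 0 {
        return
    }
    n := int64(len(steps))
    for name, total := range maxPerResource {
        // Divide in milli-units so fractional CPU amounts split evenly.
        share := total.DeepCopy()
        share.SetMilli(total.MilliValue() / n)
        for i := range steps {
            if steps[i].Resources.Requests == nil {
                steps[i].Resources.Requests = corev1.ResourceList{}
            }
            steps[i].Resources.Requests[name] = share.DeepCopy()
        }
    }
}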

I think we really do want to fix this. The new syntax is, well, new, and we have thousands of pre-existing pipelines that are currently requesting resources and reducing our bin packing.

@skaegi (Contributor, Author) commented Apr 13, 2023

@lbernick just educated me a bit more and it's tricky -- it looks like spread is the way, but we would also have to do the same sort of limit spreading as is done in TEP-0104. We might just reuse that logic and compute the TaskLevelComputeResource by examining the steps.
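
A sketch of that computation (illustrative name; the resulting list could feed the existing TEP-0104 division logic, or the spreadRequests sketch above):

// computeTaskLevelRequests derives a task-level request set from the steps:
// for each resource name, take the maximum request across all steps. Since
// steps run sequentially, that max is the most the pod needs at any one time.
func computeTaskLevelRequests(steps []corev1.Container) corev1.ResourceList {
    taskLevel := corev1.ResourceList{}
    for _, s := range steps {
        for name, req := range s.Resources.Requests {
            if cur, ok := taskLevel[name]; !ok || req.Cmp(cur) > 0 {
                taskLevel[name] = req
            }
        }
    }
    return taskLevel
}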

@lbernick (Member) commented

related: #4347

@lbernick (Member) commented

@skaegi did you have the chance to investigate whether task-level compute resources would work well for you? Is there any action remaining for this issue?

@tekton-robot (Collaborator) commented

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale with a justification.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle stale

Send feedback to tektoncd/plumbing.

tekton-robot added the lifecycle/stale label on Sep 28, 2023
@tekton-robot (Collaborator) commented

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten with a justification.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle rotten

Send feedback to tektoncd/plumbing.

tekton-robot added the lifecycle/rotten label and removed the lifecycle/stale label on Oct 28, 2023
@sibelius commented
What is the best approach for this?
