
[BUG] unset CPU limit is overridden with CPU request #3574

Open
flixr opened this issue Apr 6, 2023 · 5 comments
Labels
bug Something isn't working flytepropeller needs-decision Requires decision

Comments

@flixr
Contributor

flixr commented Apr 6, 2023

Describe the bug

A container task with only a CPU request and no CPU limit still gets a CPU limit applied:

from flytekit import ContainerTask, kwtypes, workflow, Resources

hello_task = ContainerTask(
    name="hello",
    image="ubuntu:20.04",
    requests=Resources(cpu="2", mem="1Gi"),
    limits=Resources(mem="2Gi"),
    command=["echo", "hello"]
)

Running this task results in a pod whose CPU limit is also set to 2 (the same as the request), even though no CPU limit should be applied.

Expected behavior

If no CPU limit is specified, none should be set implicitly, and no CPU limit should be applied in k8s.

I therefore propose copying the requests to the limits as a whole only when the limits are completely unset, instead of filling each missing limit with the corresponding request value.
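The proposed rule can be sketched in plain Python (a hypothetical illustration, not Flyte's actual code; `merge_limits` is an invented name):

```python
def merge_limits(requests: dict, limits: dict) -> dict:
    """Copy requests to limits wholesale only when *no* limits are set,
    instead of filling each missing limit from its request."""
    if not limits:
        # limits completely unset: fall back to the requests as a whole
        return dict(requests)
    # partial limits stay partial: an unset CPU limit remains unset
    return dict(limits)

# With requests={"cpu": "2", "memory": "1Gi"} and limits={"memory": "2Gi"},
# the result contains no "cpu" key, so k8s applies no CPU limit.
```

With the example task above, the pod would keep the 2Gi memory limit but run without a CPU limit.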

Additional context to reproduce

No response

Screenshots

No response

Are you sure this issue hasn't been raised already?

  • Yes

Have you read the Code of Conduct?

  • Yes
@flixr flixr added bug Something isn't working untriaged This issues has not yet been looked at by the Maintainers labels Apr 6, 2023
@ossareh
Contributor

ossareh commented Apr 11, 2023

We just encountered this today as well; in our case the request is set at the task level.

Here is the schema from flytectl's point of view; flytectl get workflow -p flytesnacks -d development test_workflow -o json --version test_version:

... snipped to the section of interest...
          "resources": {
            "requests": [
              {
                "name": "CPU",
                "value": "64"
              },
              {
                "name": "MEMORY",
                "value": "950Gi"
              }
            ]
          }

pod spec:

resources:
  limits:
    cpu: "64"
    memory: 950Gi
  requests:
    cpu: "64"
    memory: 950Gi

@flixr
Contributor Author

flixr commented May 4, 2023

For more context: in many cases setting CPU limits (as opposed to requests) wastes CPU due to throttling; see also
https://home.robusta.dev/blog/stop-using-cpu-limits.

@hamersaw hamersaw added flytepropeller needs-decision Requires decision and removed untriaged This issues has not yet been looked at by the Maintainers labels Dec 1, 2023
@hamersaw
Contributor

hamersaw commented Dec 1, 2023

Need to explore this further, but Flyte does this because of the k8s pod QoS classes. If the requests differ from the limits, the pod falls into a lower QoS class and k8s is more likely to evict it to schedule others. By setting them equal, the pod becomes Guaranteed and k8s will not prematurely delete it to schedule another.
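For reference, the k8s QoS rules in question can be approximated like this (a simplified pod-level sketch; the real rules are evaluated per container, and `qos_class` is an invented name):

```python
def qos_class(requests: dict, limits: dict) -> str:
    """Simplified approximation of the k8s pod QoS classes:
    - BestEffort: no requests or limits at all
    - Guaranteed: requests and limits set and equal for every resource
    - Burstable:  everything in between
    """
    if not requests and not limits:
        return "BestEffort"
    if requests and requests == limits:
        return "Guaranteed"
    return "Burstable"

# Copying requests into missing limits pushes pods into the Guaranteed
# class, which k8s treats most favorably under resource pressure.
```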

@fellhorn
Contributor

fellhorn commented Dec 5, 2023

We were wondering whether it is the right decision to set this by default instead of letting the user decide.

In our case most tasks are scheduled on exclusive nodes. Because the limits are implicitly set equal to the requests, users who don't manually override the limits have to request very close to the node's maximum CPU and memory; otherwise their Pod is throttled and hits OOM earlier than necessary.

We currently work around this by automatically overriding the limits on every task.
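The workaround amounts to something like the following (a hedged sketch: `Resources` here is a stand-in dataclass for flytekit's, and `with_explicit_limits` is an invented helper, not a flytekit API):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Resources:
    # stand-in for flytekit.Resources, for illustration only
    cpu: Optional[str] = None
    mem: Optional[str] = None

def with_explicit_limits(requests: Resources,
                         limits: Optional[Resources] = None) -> Resources:
    """Always produce explicit limits for a task so the platform never
    has to fill them in implicitly from the requests."""
    if limits is None:
        # deliberately mirror the requests (or pick node-sized values)
        return Resources(cpu=requests.cpu, mem=requests.mem)
    return limits

# Every task is then constructed with limits=with_explicit_limits(req),
# making any copied values an explicit, reviewable choice.
```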

@leoll2

leoll2 commented Jul 8, 2024

> We were wondering if it is the right decision to set it by default instead of letting the user decide.

My 2 cents here: letting the user decide, rather than enforcing unintuitive behavior, is never a bad choice.

There can certainly be a default, but the user should have the option to leave the limits unset (unlimited) depending on the use case. And, as @flixr pointed out, there are sometimes good reasons to do so.
