[FEAT] Improved Resource Allocation #400

Open
shinybrar opened this issue Nov 10, 2022 · 0 comments

Problem

Currently, the CANFAR Kubernetes cluster allows jobs to be created with the following underlying resources:

  • CPU [1, 16]
  • Memory [1, 192] GB
  • GPU [0, 28]

To better optimize cluster usage, we could employ one of the following strategies:

Cluster-Wide Reservation

By default, we can choose to reserve resources in proportion to the resources that are still unused.

For example, in a cluster with 100 CPUs and 1000 GB of memory, we can reserve 4 GB of memory for each vacant CPU. Here is a nominal example:

Cluster Utilization:
CPU: 70%
Memory: 100%

In a scenario like this, we always keep a buffer of (vacant cores × 4 GB) of memory in reserve in the cluster. This allows us to keep running CPU-intensive jobs even when the rest of the cluster is starved for memory.
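
The rule above can be sketched as a small calculation. This is only an illustration under stated assumptions, not the scheduler's actual logic: the names `GB_PER_VACANT_CPU`, `reserved_memory_gb`, and `schedulable_memory_gb` are hypothetical, and the cluster size and utilization figures are taken from the example above.

```python
GB_PER_VACANT_CPU = 4  # ratio proposed above; the constant name is hypothetical


def reserved_memory_gb(total_cpus: int, used_cpus: int,
                       gb_per_vacant_cpu: int = GB_PER_VACANT_CPU) -> int:
    """Memory (GB) held back so that vacant cores can still be scheduled."""
    if not 0 <= used_cpus <= total_cpus:
        raise ValueError("used_cpus must be between 0 and total_cpus")
    vacant_cpus = total_cpus - used_cpus
    return vacant_cpus * gb_per_vacant_cpu


def schedulable_memory_gb(total_memory_gb: int, total_cpus: int,
                          used_cpus: int) -> int:
    """Memory (GB) ordinary jobs may consume before touching the reserve."""
    reserve = reserved_memory_gb(total_cpus, used_cpus)
    return max(0, total_memory_gb - reserve)


# 100 CPUs at 70% utilization leaves 30 vacant cores.
print(reserved_memory_gb(total_cpus=100, used_cpus=70))   # 120
print(schedulable_memory_gb(1000, 100, 70))               # 880
```

Under these assumptions, "Memory: 100%" would mean that the 880 GB outside the reserve is fully allocated, while 120 GB stays available for jobs landing on the 30 vacant cores. The reserve shrinks as cores fill up, so no memory is held back once CPU utilization reaches 100%.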

Improved Job Reservation

Currently, when a job is spawned, it is given a fixed, maximum reservation of resources. For example, if we spawn a job with 4 CPUs and 64 GB of memory, those resources are removed from the cluster's available pool for the lifetime of the job. In reality, the job may not need all of those resources at all times. We can improve the reservation strategy by allowing jobs to define both a minimum and a maximum amount of resources they need. This will allow the cluster to make better use of its resources.

For the job described below, the user is guaranteed 1 CPU and 16 GB of memory and is allowed to use a maximum of 4 CPUs and 64 GB of memory.

apiVersion: batch/v1
kind: Job
metadata:
  name: my-job
spec:
  template:
    spec:
      containers:
        - name: my-job
          image: my-job:latest
          resources:
            # Maximum the job is allowed to use
            limits:
              cpu: 4
              memory: 64Gi
            # Minimum the job is guaranteed
            requests:
              cpu: 1
              memory: 16Gi
      restartPolicy: Never
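
To give a rough sense of the potential gain, the sketch below counts how many copies of this job fit in the example cluster (100 CPUs, 1000 GB) when capacity is reserved by `limits`, as it effectively is today, versus by `requests`. It is a back-of-the-envelope estimate only: the `Resources` class and `jobs_that_fit` function are hypothetical, the cluster is treated as one large pool (ignoring node boundaries and system overhead), and Gi is treated as GB for simplicity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    cpu: int
    memory_gb: int


def jobs_that_fit(cluster: Resources, per_job: Resources) -> int:
    """Number of identical jobs whose reservations fit in the cluster."""
    return min(cluster.cpu // per_job.cpu,
               cluster.memory_gb // per_job.memory_gb)


cluster = Resources(cpu=100, memory_gb=1000)  # example cluster from above
limits = Resources(cpu=4, memory_gb=64)       # maximum the job may use
requests = Resources(cpu=1, memory_gb=16)     # guaranteed minimum

print(jobs_that_fit(cluster, limits))    # 15 (memory-bound: 1000 // 64)
print(jobs_that_fit(cluster, requests))  # 62 (memory-bound: 1000 // 16)
```

The difference between the two numbers is overcommitment: jobs scheduled by `requests` can burst up to their `limits` only while spare capacity exists, so a job using more memory than it requested may be evicted if its node comes under memory pressure.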

shinybrar added the enhancement label on Nov 10, 2022