Support hugepages allocation in containers #2589

Open
achimnol opened this issue Jul 29, 2024 · 0 comments
Labels
comp:agent (Related to Agent component), comp:manager (Related to Manager component), urgency:3 (Must be finished within a certain time frame)

achimnol commented Jul 29, 2024

Some HPC applications may want to use hugepages (2 MiB or 1 GiB page sizes) to reduce TLB pressure.
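As a rough illustration of what is available on a host, the sketch below lists the hugepage pools per page size. It assumes the standard Linux sysfs layout (`/sys/kernel/mm/hugepages/hugepages-<size>kB/` with `nr_hugepages` and `free_hugepages` files); the function names are hypothetical and not part of Backend.AI.

```python
import re
from pathlib import Path
from typing import Dict, Optional

# Pool directories in sysfs are named like "hugepages-2048kB".
_POOL_DIR = re.compile(r"^hugepages-(\d+)kB$")


def parse_pool_dir(name: str) -> Optional[int]:
    """Return the page size in bytes for a pool directory name, or None."""
    match = _POOL_DIR.match(name)
    return int(match.group(1)) * 1024 if match else None


def read_hugepage_pools(
    root: Path = Path("/sys/kernel/mm/hugepages"),
) -> Dict[int, Dict[str, int]]:
    """Map each page size (bytes) to its total and free page counts."""
    pools: Dict[int, Dict[str, int]] = {}
    if not root.is_dir():
        # Hugepages are not exposed here (non-Linux host or unsupported kernel).
        return pools
    for entry in root.iterdir():
        size = parse_pool_dir(entry.name)
        if size is None:
            continue
        pools[size] = {
            "total": int((entry / "nr_hugepages").read_text()),
            "free": int((entry / "free_hugepages").read_text()),
        }
    return pools
```

Calling `read_hugepage_pools()` on a node returns an empty dict when no pools are exposed, which could serve as a quick check before enabling the feature there.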

In container runtimes, there are several examples to support hugepages:

Some references on hugepages:

We need to explicitly enable hugepages on part of our testing infrastructure and implement an option such as:

backend.ai create -r mem=16G --resource-opt shm=1G,huge-2Mi=512M,huge-1Gi=4G ...

or,

backend.ai create -r mem=16G -r mem.huge-2m=512M -r mem.huge-1g=1G --resource-opt shm=1G ...

The first option (resource-opt) does not prevent overlapping usage; it only allows containers to access hugepages, subject to per-container limits.
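A minimal sketch of how the first option could behave: the `--resource-opt` string is parsed into per-container limits, and nothing stops the sum of those limits from exceeding the host pool. The helpers `parse_size`, `parse_resource_opt`, and `is_overcommitted` are hypothetical, and the sketch assumes that `K`/`M`/`G` suffixes denote binary multiples.

```python
from typing import Dict, Iterable

# Assumption: K/M/G suffixes are binary multiples (KiB/MiB/GiB).
_UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(text: str) -> int:
    """Parse a size such as '512M' or '4G' into bytes."""
    text = text.strip()
    if not text:
        raise ValueError("empty size")
    suffix = text[-1].upper()
    if suffix in _UNITS:
        return int(text[:-1]) * _UNITS[suffix]
    return int(text)


def parse_resource_opt(spec: str) -> Dict[str, int]:
    """Parse 'shm=1G,huge-2Mi=512M' into a mapping of option name to bytes."""
    limits: Dict[str, int] = {}
    for item in spec.split(","):
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {item!r}")
        limits[key.strip()] = parse_size(value)
    return limits


def is_overcommitted(per_container_limits: Iterable[int], host_pool_bytes: int) -> bool:
    """With resource-opt semantics, limits may add up to more than the host pool."""
    return sum(per_container_limits) > host_pool_bytes
```

For example, two containers that each request `huge-1Gi=4G` on a host with a 6 GiB pool would both be admitted under this model, even though they cannot both use their full limit at once.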

The second option (resource-slot) treats hugepages as an accounted resource that cannot be shared between different containers. For consistency with MIG slots (cuda.mig-5g, ...), I've removed the trailing i (binary suffix) in the resource slot names.
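By contrast, the second option could be sketched as an accounted pool where each allocation is exclusive. `HugepageSlot` below is a hypothetical class for illustration only, not the actual Backend.AI resource slot implementation.

```python
from typing import Dict

GiB = 1024 ** 3


class HugepageSlot:
    """An accounted hugepage pool: allocations never exceed the capacity."""

    def __init__(self, name: str, capacity: int) -> None:
        self.name = name
        self.capacity = capacity
        self._allocations: Dict[str, int] = {}

    @property
    def remaining(self) -> int:
        """Bytes not yet assigned to any container."""
        return self.capacity - sum(self._allocations.values())

    def allocate(self, container_id: str, amount: int) -> None:
        """Reserve `amount` bytes exclusively for a container, or fail."""
        if amount > self.remaining:
            raise ValueError(
                f"{self.name}: requested {amount} bytes, only {self.remaining} left"
            )
        self._allocations[container_id] = (
            self._allocations.get(container_id, 0) + amount
        )

    def release(self, container_id: str) -> None:
        """Return a container's reservation to the pool."""
        self._allocations.pop(container_id, None)
```

With a `mem.huge-1g` slot of 4 GiB, a first container taking 3 GiB leaves 1 GiB, so a second request for 2 GiB is rejected until the first container is released.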

achimnol added the type:feature, comp:manager, comp:agent, and urgency:3 labels on Jul 29, 2024
achimnol added this to the 24.09 milestone on Jul 29, 2024
achimnol changed the title from "Support hugepages like shared memory as a resource option" to "Support hugepages allocation in containers" on Jul 29, 2024
achimnol removed the type:feature label on Oct 18, 2024