slurm plug-in: remove --cpus-per-task requirement #442

Closed
AnarManafov opened this issue May 19, 2022 · 3 comments

@AnarManafov
Contributor

The DDS slurm plug-in sets --cpus-per-task to the number of task slots per agent.
This was done intentionally, to prevent users from overbooking resources on slurm.

According to ALICE pre-production tests, this is not always convenient.
The requirement should either be made configurable (enabled by an option in dds-submit) or removed completely.

I lean toward the configurable solution.
dds-submit and its ToolsAPI should accept an "--enable-overbooking" argument, which instructs the plug-in to omit --cpus-per-task (for slurm) and the equivalent arguments in other RMS plug-ins. When the argument is specified, the responsibility for defining CPU requirements is delegated to the user.
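
A minimal sketch of the proposed behaviour, assuming the plug-in generates the job script header as text. The helper name `makeSbatchHeader` and its parameters are hypothetical and are not part of DDS; only the `#SBATCH --cpus-per-task` directive itself comes from the issue.

```cpp
#include <string>

// Hypothetical helper, not part of DDS: builds the CPU-related header line
// of a Slurm job script for one agent.
// nSlots            - number of task slots per agent
// enableOverbooking - when true, no CPU requirement is written at all
std::string makeSbatchHeader(unsigned nSlots, bool enableOverbooking)
{
    std::string header;
    if (!enableOverbooking)
    {
        // Default: reserve one CPU per task slot to prevent overbooking.
        header += "#SBATCH --cpus-per-task=" + std::to_string(nSlots) + "\n";
    }
    // With overbooking enabled the header stays empty, so CPU requirements
    // are left entirely to the user.
    return header;
}
```

With the flag off, the header carries the CPU reservation; with the flag on, the line is simply absent.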

@AnarManafov AnarManafov added this to the 3.8 milestone May 19, 2022
@AnarManafov AnarManafov self-assigned this May 19, 2022
AnarManafov added a commit to AnarManafov/DDS that referenced this issue May 26, 2022
dds-submit-slurm: Modified: Remove #SBATCH --ntasks-per-node=1. (FairRootGroupGH-444)
dds-submit-slurm: Modified: Remove #SBATCH --cpus-per-task=%DDS_NSLOTS%. (FairRootGroupGH-442)
@rbx
Member

rbx commented May 26, 2022

Would it be reasonable to configure --cpus-per-task via a submit request option:

dds::tools_api::SSubmitRequest::request_t requestInfo;
requestInfo.m_cores = 10;

?
If the option is not provided, then you can put a default of your choice, or none.
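
One way to picture this suggestion is an optional core count with a fallback. The sketch below is an assumption for illustration only: `SubmitOptions`, `cpusPerTaskLine` and the `fallback` parameter are hypothetical names, and this is not how DDS models the request.

```cpp
#include <optional>
#include <string>

// Hypothetical stand-in for a submit request; not the DDS type.
struct SubmitOptions
{
    std::optional<unsigned> cores; // empty when the user gave no value
};

// Returns the directive to emit, or an empty string when neither the
// request nor the fallback provides a core count ("none").
std::string cpusPerTaskLine(const SubmitOptions& opts, std::optional<unsigned> fallback)
{
    std::optional<unsigned> cores = opts.cores;
    if (!cores)
    {
        cores = fallback; // default chosen by the plug-in, possibly empty
    }
    if (!cores)
    {
        return {};
    }
    return "#SBATCH --cpus-per-task=" + std::to_string(*cores);
}
```

Passing `std::nullopt` as the fallback corresponds to the "or none" case: no directive is written unless the user asks for one.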

@AnarManafov
Contributor Author

@rbx, that is the plan. I don't want to give up on this protection.
For now it has been removed only temporarily, to avoid blocking ALICE.

AnarManafov added a commit to AnarManafov/DDS that referenced this issue Jun 2, 2022
dds-submit: Added: The command learned a new argument --enable-overbooking. The flag instructs DDS RMS plug-ins to not specify any CPU requirement for RMS jobs. (FairRootGroupGH-442)
dds-tools-api: Added: SSubmitRequestData supports flags. See SSubmitRequestData::setFlag and SSubmitRequestData::ESubmitRequestFlags. (FairRootGroupGH-442)
dds-slurm-plugin: Modified: The #SBATCH --cpus-per-task=%DDS_NSLOTS% requirement can now be disabled by providing the "enable-overbooking" flag (ToolsAPI or dds-submit). (FairRootGroupGH-442)
AnarManafov added a commit that referenced this issue Jun 3, 2022
dds-submit: Added: The command learned a new argument --enable-overbooking. The flag instructs DDS RMS plug-ins to not specify any CPU requirement for RMS jobs. (GH-442)
dds-tools-api: Added: SSubmitRequestData supports flags. See SSubmitRequestData::setFlag and SSubmitRequestData::ESubmitRequestFlags. (GH-442)
dds-slurm-plugin: Modified: The #SBATCH --cpus-per-task=%DDS_NSLOTS% requirement can now be disabled by providing the "enable-overbooking" flag (ToolsAPI or dds-submit). (GH-442)
@AnarManafov
Contributor Author

@rbx, the feature is implemented in master.
dds-submit (both the command and the ToolsAPI) now supports flags.
The #SBATCH --cpus-per-task=%DDS_NSLOTS% requirement can be disabled by providing the "enable-overbooking" flag (ToolsAPI or dds-submit).
API example:

SSubmitRequest::request_t requestInfo;
...
// Set flags
requestInfo.setFlag(dds::tools_api::SSubmitRequestData::ESubmitRequestFlags::enable_overbooking,
                                   <state of the flag: true/false>);
...

Please note that the flag is off by default. That means ODC needs to enable it for ALL ALICE partitions, except the ones that need a CPU-based allocation.
This is very important, because ALICE doesn't use CPU requirements for most of its runs; it mostly relies on overbooking nodes.
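
The per-partition rule above could be sketched as a small predicate whose result is passed as the flag state to `setFlag`. This is only an illustration: `shouldEnableOverbooking` and the idea of keeping the CPU-allocated partitions in a set are assumptions, not ODC or DDS code, and the partition names a caller passes in are up to the deployment.

```cpp
#include <set>
#include <string>

// Hypothetical policy helper, not part of ODC or DDS.
// Returns true when the "enable-overbooking" flag should be switched on,
// i.e. for every partition that is NOT listed as needing CPU-based allocation.
bool shouldEnableOverbooking(const std::string& partition,
                             const std::set<std::string>& cpuAllocatedPartitions)
{
    return cpuAllocatedPartitions.count(partition) == 0;
}
```

Listing the exceptions rather than the overbooked partitions keeps the list short, since most partitions are expected to run with overbooking.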
