Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add accelerator value to job cfg and extend PR comment if accelerator arg is used #280

Merged
merged 9 commits into from
Sep 17, 2024

Conversation

trz42
Copy link
Contributor

@trz42 trz42 commented Sep 10, 2024

Passes down value of accelerator to job.cfg file and uses value in comment about a new job.

README.md Outdated Show resolved Hide resolved
# however, matching CPU architectures works differently to handling
# accelerators; multiple CPU architectures defined in arch_target_map
# can match the (CPU) architecture component of a filter; in
# contrast, the value of the accelerator filter is just passed down
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this rather limiting?

How could we then send GPU builds to a GPU node/partition, when desired?

Copy link
Contributor Author

@trz42 trz42 Sep 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need a separate argument for that (e.g., node: or nodetype:) or rework the handling of arguments, e.g., have arguments that are passed down to the build script and arguments that are used by the bot to allocate the right node type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest to not add this capability in this PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, agreed

tasks/build.py Outdated
@@ -514,10 +530,12 @@ def prepare_jobs(pr, cfg, event_info, action_filter):
cpu_target = '/'.join(arch.split('/')[1:])
os_type = arch.split('/')[0]
log(f"{fn}(): arch = '{arch}' => cpu_target = '{cpu_target}' , os_type = '{os_type}'")
prepare_job_cfg(job_dir, build_env_cfg, repocfg, repo_id, cpu_target, os_type)

log(f"{fn}(): accelerator = '{accelerator}'")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not log this via log statement on line 516?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack. Changed in 8f44ba9

tasks/build.py Outdated
# obtain accelerator from job.accelerator
accelerator = job.accelerator
accelerator_spec_str = ''
if accelerator != 'none':
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we intend to keep 'none' as a magic special value, we should make it a constant?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need some value that represents that no accelerator build is requested. Maybe just 'none' is not good enough. Could be NO_ACCELERATOR or similar.

Agree, a constant would be best.

Copy link
Contributor Author

@trz42 trz42 Sep 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather used None than a constant. It may require different processing in the bot/* scripts, but that's ok as those haven't been implemented yet.

Done in 8f44ba9

tasks/build.py Outdated
@@ -474,6 +475,12 @@ def prepare_jobs(pr, cfg, event_info, action_filter):
# call to just before download_pr
year_month, pr_id, run_dir = create_pr_dir(pr, cfg, event_info)

accelerator = "none"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why use "none" instead of None here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could use None. Just need to change the processing where it is used as a string and we want a meaningful output, e.g., in a log or in a PR comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implemented in 8f44ba9

# determine accelerator from action_filter argument
accelerators = action_filter.get_filter_by_component(tools_filter.FILTER_COMPONENT_ACCEL)
if len(accelerators) > 0:
accelerator = accelerators[0]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we warn or something when the list has more than 1 element?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. Logging is maybe sufficient.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 8f44ba9

Also logs in case there is no element.

Co-authored-by: Kenneth Hoste <[email protected]>
@trz42
Copy link
Contributor Author

trz42 commented Sep 10, 2024

Looks like some tests are failing now. Will look into these and address the suggestions.

@trz42
Copy link
Contributor Author

trz42 commented Sep 10, 2024

All suggestions addressed. Pytests have been fixed. Bot build command has been tested both with and without additional argument accelerator.

Copy link
Contributor

@boegel boegel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@boegel
Copy link
Contributor

boegel commented Sep 17, 2024

Tested myself, works as designed. Diff for job.cfg created by a "build repo:eessi.io-2023.06-software arch:x86_64/amd/zen2" request vs "build repo:eessi.io-2023.06-software arch:x86_64/amd/zen2 accel:nvidia/cc80":

$ diff -u 18855/cfg/job.cfg 18856/cfg/job.cfg
--- 18855/cfg/job.cfg	2024-09-17 12:42:14.738160934 +0000
+++ 18856/cfg/job.cfg	2024-09-17 12:42:55.348634381 +0000
@@ -5,7 +5,7 @@
 shared_fs_path = /home/boegel/bot-shared

 [repository]
-repos_cfg_dir = /home/boegel/eessi-bot-software-layer/jobs/2024.09/pr_32/event_405625f0-74f2-11ef-8f66-1dae4afc0d3b/run_000/linux_x86_64_amd_zen2/eessi.io-2023.06-software/cfg
+repos_cfg_dir = /home/boegel/eessi-bot-software-layer/jobs/2024.09/pr_32/event_591068d0-74f2-11ef-9bca-f787718ea34f/run_000/linux_x86_64_amd_zen2/eessi.io-2023.06-software/cfg
 repo_id = eessi.io-2023.06-software
 container = docker://ghcr.io/eessi/build-node:debian11
 repo_name = software.eessi.io
@@ -14,5 +14,5 @@
 [architecture]
 software_subdir = x86_64/amd/zen2
 os_type = linux
-accelerator =
+accelerator = nvidia/cc80

Relevant log entries:

$ grep 'accelerator =' pyghee.log
[20240917-T12:42:14] prepare_jobs(): arch = 'linux/x86_64/amd/zen2' => cpu_target = 'x86_64/amd/zen2' , os_type = 'linux', accelerator = 'None'
accelerator =
[20240917-T12:42:55] prepare_jobs(): arch = 'linux/x86_64/amd/zen2' => cpu_target = 'x86_64/amd/zen2' , os_type = 'linux', accelerator = 'nvidia/cc80'
accelerator = nvidia/cc80

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants