Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support riscv64 test in docker containers with qemu #4307

Merged
merged 1 commit into from
Mar 30, 2023

Conversation

Accelerator1996
Copy link
Contributor

Support risc-v test in docker containers with qemu to solve the problem that someone may not have risc-v machine. We test it locally and it was fine.

Copy link
Contributor

@smlambert smlambert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @Accelerator1996 - this looks like a nice option for many tests, there are just a few other considerations:

  • what are the configuration requirements for a node with the label 'docker.qemu' (and is that a good label for such a node, so that it follows and/or sets a pattern in our labeling schema).
    • If the requirement is simply that docker is installed on a machine, then we have sw.tool.docker label that could be used.
    • action on @smlambert to document our current labeling schema
  • some tests require to run inside of docker containers, what happens when those tests are sent to these riscv emulators. If they can not be run on these, then we may need to update some other code to prevent it

@Accelerator1996
Copy link
Contributor Author

thanks @Accelerator1996 - this looks like a nice option for many tests, there are just a few other considerations:

  • what are the configuration requirements for a node with the label 'docker.qemu' (and is that a good label for such a node, so that it follows and/or sets a pattern in our labeling schema).

    • If the requirement is simply that docker is installed on a machine, then we have sw.tool.docker label that could be used.
    • action on @smlambert to document our current labeling schema
  • some tests require to run inside of docker containers, what happens when those tests are sent to these riscv emulators. If they can not be run on these, then we may need to update some other code to prevent it

Hi, @smlambert! I think the label you said is suitable, so I have modified it. The label of riscv64_linux contains sw.os.linux. So if a linux machine with docker installed, it can run in most cases with the help of multiarch/qemu-user-static. In the code, whether it is using this docker or the execution of testBuild(), it is wrapped by try-catch, so I think exceptions can be handled.

@sxa
Copy link
Member

sxa commented Feb 6, 2023

@Accelerator1996 Do you have the Dockerfile which creates the image that this PR is pulling?

@Accelerator1996
Copy link
Contributor Author

@Accelerator1996 Do you have the Dockerfile which creates the image that this PR is pulling?

Sorry, we're still sorting it out.

@smlambert
Copy link
Contributor

One other enhancement to do as part of this PR, similar to what we have done to support dynamic agents hinging on the parameter CLOUD_PROVIDER, is to prefer real nodes first if they are available, and if not available, then fire up a dynamic agent or container. (see code here)

So, for our use case we would like to have a mix of both real risc-v hardware and these emulators.

I think we can take this same approach, where we set CLOUD_PROVIDER=local (or some such value), and if that is set, check first if real nodes are available and if not, fire up these containers. I have asked for feedback/review from @sophia-guo as she added the dynamic agent support originally.

Copy link
Contributor

@smlambert smlambert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will mark this PR so it doesn't get merged before a few other details are discussed. Looking forward to leveraging the concept when it is ready.

@sophia-guo
Copy link
Contributor

Agree with @smlambert if this can be enhanced as prefer real nodes first if they are available, and if not available fire up a dynamic agent or container.

Feel like the current PR will only work by explicitly setting the LABEL as 'hw.arch.riscv&&sw.tool.docker&&....'. That is it won't run in docker containers if there is no risc-v machine( labeled with ci.role.test&&sw.os.linux&&hw.arch.riscv&&hw.bits.64 ).

The way Dynamic agents work is doing extra check , trigger a agent and run with the docker container.

@Accelerator1996
Copy link
Contributor Author

Accelerator1996 commented Feb 9, 2023

Agree with @smlambert if this can be enhanced as prefer real nodes first if they are available, and if not available fire up a dynamic agent or container.

Feel like the current PR will only work by explicitly setting the LABEL as 'hw.arch.riscv&&sw.tool.docker&&....'. That is it won't run in docker containers if there is no risc-v machine( labeled with ci.role.test&&sw.os.linux&&hw.arch.riscv&&hw.bits.64 ).

The way Dynamic agents work is doing extra check , trigger a agent and run with the docker container.

@smlambert @sophia-guo Thank you all very much! I will improve my code according to your opinions recently.

@Accelerator1996 Accelerator1996 force-pushed the riscv_qemu_branch branch 2 times, most recently from 1cbe8ed to 45c522c Compare February 13, 2023 08:27
@Accelerator1996
Copy link
Contributor Author

Hi, @smlambert @sophia-guo! Does my change meet your expectations? I have added DynamicAgents configuration. If there is no risc-v machine, by setting CLOUD_PROVIDER=local, it can run test on the machine labeled with the 'sw.os.linux&&hw.bits.64&&ci.agent.dynamic&&sw.tool.docker'. I have tested it in our pipeline and it works fine.

In addition, one of my changes is the condition of whether to start the vm. I think that the spliced label should not be used only, but the original label should be allowed to be used. Otherwise, the configuration similar to "required docker" will not be able to start the vm.

@smlambert
Copy link
Contributor

Thanks @Accelerator1996, looks good. I will await the Dockerfile for https://hub.docker.com/r/alibabadragonwelljdk/riscv-normal-qemu_6.0.0-rvv-1.0 before running some test jobs to verify this PR.

@Accelerator1996
Copy link
Contributor Author

Thanks @Accelerator1996, looks good. I will await the Dockerfile for https://hub.docker.com/r/alibabadragonwelljdk/riscv-normal-qemu_6.0.0-rvv-1.0 before running some test jobs to verify this PR.

I have written a dockerfile, but I haven’t verified it completely yet. I will submit it together after I verify it in the past few days. May I ask what directory should my dockerfile be placed in?

@smlambert
Copy link
Contributor

You do not need to place it in our repository, but it would be good to be able to see it shared somewhere publicly visible before I kick off test runs.

@Accelerator1996
Copy link
Contributor Author

You do not need to place it in our repository, but it would be good to be able to see it shared somewhere publicly visible before I kick off test runs.

👌

@sophia-guo
Copy link
Contributor

sophia-guo commented Feb 15, 2023

In addition, one of my changes is the condition of whether to start the vm. I think that the spliced label should not be used only, but the original label should be allowed to be used. Otherwise, the configuration similar to "required docker" will not be able to start the vm.

params.LABEL is normally used as the agent name when people want to run the job on specific machine ( for example, debug). If it's not available then no need to run on any other available agent. So I think without the change of ORIG_LABEL part you PR should work in your local pipeline without setting param.LABEL, correct?

@Accelerator1996
Copy link
Contributor Author

Hi, @smlambert @sxa ! This is the repository where our dockerfile is stored https://github.com/dragonwell-releng/docker-qemu-riscv64. If the local docker version is too old, you need to use "docker buildx" instead of "docker build". We tested the image we made locally and it works fine.

@Accelerator1996
Copy link
Contributor Author

In addition, one of my changes is the condition of whether to start the vm. I think that the spliced label should not be used only, but the original label should be allowed to be used. Otherwise, the configuration similar to "required docker" will not be able to start the vm.

params.LABEL is normally used as the agent name when people want to run the job on specific machine ( for example, debug). If it's not available then no need to run on any other available agent. So I think without the change of ORIG_LABEL part you PR should work in your local pipeline without setting param.LABEL, correct?

In the code, the prerequisite for your dynamic agent to start is that there is a machine with specific label. But if user sets the "required docker" to true, this place will never be called unless he manually added "sw.tool.docker" in the MAP["LABEL"]. ORIG_LABEL just lets him meet the condition, so that some backup agents can be enabled for testing when all available nodes are offline. If there are still available nodes, he will not use dynamic agent. Therefore I think ORIG_LABEL is necessary.

@smlambert
Copy link
Contributor

I did not forget about this one, I will get a chance to try some test runs tomorrow.

@smlambert
Copy link
Contributor

@Accelerator1996
Copy link
Contributor Author

https://ci.adoptium.net/view/Test_grinder/job/Grinder/6792/

Sorry, I verified it in other repo. When I sorted out the code, there was an extra "+". I have modified it and it is ok now. Sorry for the trouble.

@smlambert
Copy link
Contributor

No worries @Accelerator1996 - here is the next run https://ci.adoptium.net/view/Test_grinder/job/Grinder/6793/

I can take a closer look tomorrow, to offer suggestions.

@sophia-guo
Copy link
Contributor

params.LABEL is normally used as the agent name when people want to run the job on specific node.

@Accelerator1996 the case you mentioned user set the params.LABEL and params.DOCKER_REQUIRED is a bug. Instead of using the extra variable ORIG_LABEL as params.LABEL I would suggest to move all those extra label settings https://github.com/adoptium/aqa-tests/blob/master/buildenv/jenkins/openjdk_tests#L181-L197 to this else block

} else {
LABEL = PLATFORM_MAP[params.PLATFORM]["LABEL"]
if (params.BUILD_LIST.contains("perf")) {
def perfLabel = LABEL.minus("ci.role.test&&").concat("&&ci.role.perf")
if (areNodesWithLabelOnline(perfLabel)) {
LABEL = perfLabel
}
}
.

The dynamic part one is good in grinder https://ci.adoptium.net/view/Test_grinder/job/Grinder/6807/ for our case with real risc-v hardware available. In this grinder https://ci.adoptium.net/view/Test_grinder/job/Grinder/6807/ I didn't ask for docker required and believe that is the expected case.

@smlambert
Copy link
Contributor

Rerun with CLOUD_PROVIDER set (which I forgot to do on the earlier runs): https://ci.adoptium.net/view/Test_grinder/job/Grinder/6810

@Accelerator1996 Accelerator1996 force-pushed the riscv_qemu_branch branch 2 times, most recently from 03e1fda to dcd66a5 Compare March 8, 2023 02:27
@Accelerator1996 Accelerator1996 force-pushed the riscv_qemu_branch branch 2 times, most recently from 0f8cb23 to 1a193dc Compare March 8, 2023 03:48
@@ -192,6 +193,10 @@ timestamps{
LABEL += "&&!sw.os.ubuntu.22"
}

if (SPEC.equals("linux_riscv64") && params.CLOUD_PROVIDER.equals("local")) {
LABEL = LABEL.minus("&&hw.arch.riscv")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confused. This settings will make the dynamic agent has high priority than the hardware one if both dynamic and hardware are available?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confused. This settings will make the dynamic agent has high priority than the hardware one if both dynamic and hardware are available?

The hardware machine will be selected when required_docker is false. The dynamic machine will be selected when required_docker is false, add a value such as 'fyre' in DynamicAgents instead of 'local' and set CLOUD_PROVIDER to be 'fyre'. Docker will be selected when required_docker is true or CLOUD_PROVIDER is 'local'. It will select hardware one when both are available and not all hardware machines are busy if we set parameters as above.

@sophia-guo
Copy link
Contributor

sophia-guo commented Mar 10, 2023

We probably need to clarify two things. One is about the printout message. Some current printout message might not be clear and cause confusion. There is a PR to address that #4407.

That means output as following doesn't mean the job will run on dynamic agent - it only means none of labeled machines are online ( not same as avaialbe)
image.

The second one is I'm not sure why the change depends on if params.DOCKER_REQUIRED is set or not? I've tried four cases with the PR on our jenkins ( riscv64 static agent available) the result as following:

params.DOCKER_REQUIRED = true params.DOCKER_REQUIRED = false
params.CLOUD_PROVIDER ="" expected behavior : job run on hardware agent timeout expected : no specified agents
params.CLOUD_PROVIDER ="local" expected behavior: riscv64 static agent . actual behavior: running on other linux : ppc64le, s390x expected behavior: no static agent, dynamic triggered(even if we don't really have that environment). Actual: running on other linux : ppc64le, s390x ❌

@Accelerator1996 I'm not sure your environment so does it mean in your environment there is no other linux static agents labeled sw.os.linux? Also I feel like your case is asking for a docker required agent and then using that agent setting up the qemu container and do the test? Yes, agreed might be helpful if we could setup a short meeting to make all those questions clear.

@smlambert
Copy link
Contributor

@sophia-guo - it is likely the guidance I was giving to @Accelerator1996 . I expected our code to work this way:

  • if there is a real riscv static node available, send the job to it
  • if there is not a riscv static node available, find a node with docker installed on it (sw.tool.docker label) and use that node to fire up the qemu image (NOTE: this is likely not sufficient... mainly it will not work for 2 reasons:
    • we will try to download the TEST jdk (which is riscv tar.gz) and unpack it and run on that sw.tool.docker labelled machine instead of inside the container
    • we may also need to consider architecture (hw.arch.xxxx label)

I leave it to you both to meet and discuss.

@Accelerator1996 Accelerator1996 force-pushed the riscv_qemu_branch branch 2 times, most recently from 580292b to fdae63f Compare March 14, 2023 06:34
@Accelerator1996
Copy link
Contributor Author

Accelerator1996 commented Mar 14, 2023

Thank you very much for your work guidance @smlambert @sophia-guo . Thank you very much! I have improved my code. Since it involves resource scheduling, I would like to explain my thinking and current situation.

The current scheduling is like this. When there is no node with the given label, the project will wait for a certain period of time until there is an available node. When there is a machine with a given label and a given params.CLOUD_PROVIDER, then make a judgement. If there is an idle node, use it, otherwise use the dynamic agents.

Now the switch of the dynamic proxy is the parameter CLOUD_PROVIDER, but I think that since it is a dynamic agent, should it be used regardless of whether CLOUD_PROVIDER is set?

The following is my logic. First, condition 1 is changed to whether there is an online node or non-empty dynamicAgents setting in map. Only when neither exists, start to wait for the given label node to go online, otherwise enter the else branch. Then confirm whether it is set dynamicAgents, if it is empty, it means that it does not support, start to get the idle physical machine. Here I have modified the judgment conditions for the idle machine, because there may be cases where the machine setting concurrency is greater than 1. If all hardware machines are busy and the number of executors is full, then start to enable the dynamic agents. Then if the parameter CLOUD_PROVIDER is set, if it is not empty and in the defined list, use it, otherwise use the first one of the list which defined in list. Of course, if the list is empty, no dynamic agent will be used.

The following is my self-test report:
1. CLOUD_PROVIDER == ''
(1) There is 1 risc-v machine, the number of executors is 2, and 1 test job is triggered, use hardware machine with label 'ci.role.test&&sw.os.linux&&hw.arch.riscv'
(2) There is 1 risc-v machine, the number of executors is 2, and 2 test jobs are triggered, both use hardware machine with label 'ci.role.test&&sw.os.linux&&hw.arch.riscv'
(3) There is 1 risc-v machine, the number of executors is 2, and 3 test jobs are triggered, first two use hardware machine with label 'ci.role.test&&sw.os.linux&&hw.arch.riscv', the last one use machine with label 'ci.role.test&&sw.os.linux&&hw.arch.x86&&sw.tool.docker'
(4) There is no risc-v machine, 1 test job is triggered, use the machine with label 'ci.role.test&&sw.os.linux&&hw.arch.x86&&sw.tool.docker'
(5) There is no aarch64_mac machine which is not defined dynamicAgents in map, 1 test job is triggerd, job will be cancelled because of Could not find any nodes with 'ci.role.test&&hw.arch.aarch64&&(sw.os.osx||sw.os.mac)' label
(6) There is 1 hw.arch.x86 machine, the number of executors is 1, 1 test job is triggered, use the machine with label 'ci.role.test&&hw.arch.x86&&sw.os.linux'
(7) There is 1 hw.arch.x86 machine, the number of executors is 1, 2 test job are triggered, the last one use the machine with label 'hw.arch.x86&&sw.os.linux&&ci.agent.dynamic', and test on docker container, dynamicLabel is 'azure'

2. CLOUD_PROVIDER == 'azure'
(1) There is 1 hw.arch.x86 machine, the number of executors is 1, 1 test job is triggered, use the machine with label 'ci.role.test&&hw.arch.x86&&sw.os.linux', and test on hardware machine
(2) There is 1 hw.arch.x86 machine, the number of executors is 1, 2 test jobs are triggered, the last one use the machine with label 'hw.arch.x86&&sw.os.linux&&ci.agent.dynamic', and test on docker container, dynamicLabel is 'azure'

3. CLOUD_PROVIDER == 'fyre'
(1) There is 1 hw.arch.x86 machine, the number of executors is 1, 1 test job is triggered, use the machine with label 'ci.role.test&&hw.arch.x86&&sw.os.linux', and test on hardware machine
(2) There is 1 hw.arch.x86 machine, the number of executors is 1, 2 test jobs are triggered, the last one use the machine with label 'hw.arch.x86&&sw.os.linux&&ci.agent.dynamic', and test on hardware machine, dynamicLabel is 'fyre'

4. CLOUD_PROVIDER == 'local'
(1) There is 1 risc-v machine, the number of executors is 1, and 1 test job is triggered, use hardware machine with label 'ci.role.test&&sw.os.linux&&hw.arch.riscv'
(2) There is 1 risc-v machine, the number of executors is 1, and 2 test jobs are triggered, the last one use hardware machine with label 'ci.role.test&&sw.os.linux&&hw.arch.x86&&sw.tool.docker', and test on docker container, dynamicLabel is 'local'

5. CLOUD_PROVIDER == 'xx'
(1) There is 1 risc-v machine, the number of executors is 1, and 1 test job is triggered, use hardware machine with label 'ci.role.test&&sw.os.linux&&hw.arch.riscv'
(2) There is 1 risc-v machine, the number of executors is 1, and 2 test jobs are triggered, the last one use hardware machine with label 'ci.role.test&&sw.os.linux&&hw.arch.x86&&sw.tool.docker', and test on docker container, dynamicLabel is 'local'

@smlambert
Copy link
Contributor

smlambert commented Mar 14, 2023

Thanks for those exhaustive notes and testing @Accelerator1996 ! This design discussion and work has been very beneficial.

In my original thinking, CLOUD_PROVIDER == '' and CLOUD_PROVIDER == 'local' equated to the same thing, but I agree with you, they can/should be considered different cases.

CLOUD_PROVIDER indicates who will supply the resources being used.

CLOUD_PROVIDER == '' resource will be supplied by other appropriately labelled Jenkins nodes attached to the Jenkins server, so when resources are requested, and exceed the number of real hardware, if it is defined in Jenkinsfilebase code send the task off to a differently labeled machine (sw.tool.docker&&<extraLabelsAsRequired>)

CLOUD_PROVIDER == 'azure|fyre|xx' resource is supplied by those cloud providers by using the appropriate Jenkins plugin for each of those providers to spin up dynamic agents

CLOUD_PROVIDER == 'local', similar to azure|fyre|xx, resource is supplied by Jenkins server, by using Jenkins Docker plugin to spin up containers on the fly (not supported yet, likely has arch restrictions, need to investigate, related: https://devopscube.com/docker-containers-as-build-slaves-jenkins/).

@Accelerator1996
Copy link
Contributor Author

Thanks for those exhaustive notes and testing @Accelerator1996 ! This design discussion and work has been very beneficial.

In my original thinking, CLOUD_PROVIDER == '' and CLOUD_PROVIDER == 'local' equated to the same thing, but I agree with you, they can/should be considered different cases.

CLOUD_PROVIDER indicates who will supply the resources being used.

CLOUD_PROVIDER == '' resource will be supplied by other appropriately labelled Jenkins nodes attached to the Jenkins server, so when resources are requested, and exceed the number of real hardware, if it is defined in Jenkinsfilebase code send the task off to a differently labeled machine (sw.tool.docker&&<extraLabelsAsRequired>)

CLOUD_PROVIDER == 'azure|fyre|xx' resource is supplied by those cloud providers by using the appropriate Jenkins plugin for each of those providers to spin up dynamic agents

CLOUD_PROVIDER == 'local', similar to azure|fyre|xx, resource is supplied by Jenkins server, by using Jenkins Docker plugin to spin up containers on the fly (not supported yet, likely has arch restrictions, need to investigate, related: https://devopscube.com/docker-containers-as-build-slaves-jenkins/).

As you said, the startup of docker may be related to arch, including our qemu also needs to run on linux/amd64, so now I limit the use of 'local' to nodes with label hw.arch.x86&&sw.os.linux

@sophia-guo
Copy link
Contributor

  1. CLOUD_PROVIDER == ''
    (7) There is 1 hw.arch.x86 machine, the number of executors is 1, 2 test job are triggered, the last one use the machine with label 'hw.arch.x86&&sw.os.linux&&ci.agent.dynamic', and test on docker container, dynamicLabel is 'azure'

I have a question about this case. The label ci.agent.dynamic is configured by jenkins dynamic agent plugin. So if CLOUD_PROVIDER == '' that could mean no dynamic agent setup or even no dynamic agent plugin installed in the jenkins server so there will never be agents labeled hw.arch.x86&&sw.os.linux&&ci.agent.dynamic available?

LABEL += '&&ci.agent.dynamic'
if (params.CLOUD_PROVIDER != null && params.CLOUD_PROVIDER in dynamicAgents) {
dynamicLabel = params.CLOUD_PROVIDER
} else if (dynamicAgents.size() >= 1) {
Copy link
Contributor

@sophia-guo sophia-guo Mar 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is the case (1) --> 7 as I mentioned in #4307 (comment), which I would suggest to disregard it.

@Accelerator1996
Copy link
Contributor Author

  1. CLOUD_PROVIDER == ''
    (7) There is 1 hw.arch.x86 machine, the number of executors is 1, 2 test job are triggered, the last one use the machine with label 'hw.arch.x86&&sw.os.linux&&ci.agent.dynamic', and test on docker container, dynamicLabel is 'azure'

I have a question about this case. The label ci.agent.dynamic is configured by jenkins dynamic agent plugin. So if CLOUD_PROVIDER == '' that could mean no dynamic agent setup or even no dynamic agent plugin installed in the jenkins server so there will never be agents labeled hw.arch.x86&&sw.os.linux&&ci.agent.dynamic available?

When there is no hardware machine to meet the requirements, it will try to use dynamic agent. If there is no dynamic, then the test job will wait for available nodes. If it runs here, it actually means that the user does not actually have a hardware machine with the corresponding label. The result is the same.

@sophia-guo
Copy link
Contributor

If it runs here, it actually means that the user does not actually have a hardware machine with the corresponding label. The result is the same.

The discussion is based on the current PR.
It runs here because the condition !dynamicLabel.equals('') is used, which I think should be params.CLOUD_PROVIDER. Also result is not the same. To use dynamic it means for now the hardware machine with the correct label is not available as dynamic available we will use dynamic. It doesn't mean the hardware machine will not be available later, it can be available later. For the case I mentioned the label is changed with extra 'ci.agent.dynamic', that means the machine will never be available.

@Accelerator1996 Accelerator1996 force-pushed the riscv_qemu_branch branch 2 times, most recently from 7e87345 to 94c395a Compare March 17, 2023 17:30
@Accelerator1996
Copy link
Contributor Author

Hello, Thank you very much for your work guidance @smlambert @sophia-guo .As you said, in fact, we should distinguish between dynamicAgents and dockerAgents. Because testing with docker is different from testing with dynamicAgent, the former mainly depends on the machine environment, we only need a specific arch machine with docker to run test. But dynamicAgents depends on the plugin of jenkins, which is not available in every jenkins, so I think the latter mechanism should be retained. In view of the fact that we maybe run tests in docker containers more frequently in the future. I think a DockerAgents should be added in map, which will be a common design.

The following is my self-test report:

  1. CLOUD_PROVIDER == ''
    (1) There is no risc-v machine, 1 test job is triggered, nodeLabel: 'ci.role.test&&sw.os.linux&&hw.arch.x86&&sw.tool.docker', dockerAgentLabel: 'default', run test in docker container
    (2) There is 1 risc-v machine, the number of executors is 1, trigger 1 test job, run test on hardware machine
    (3) There is 1 risc-v machine, the number of executors is 1, trigger 2 test job, first one run test on hardware machine, the last one nodeLabel: 'ci.role.test&&sw.os.linux&&hw.arch.x86&&sw.tool.docker' and dockerAgentLabel: 'default', run test in docker container
    (4) There is 1 linux-x86 machine, the number of executors is 2, trigger 1 test job, run test on hardware machine
    (5) There is 1 linux-x86 machine, the number of executors is 2, trigger 2 test job, both run test on hardware machine
    (6) There is 1 linux-x86 machine, the number of executors is 2, trigger 3 test job, first two run test on hardware machine, the last one is waiting for next available executor on 'ci.role.test&&hw.arch.x86&&sw.os.linux'

  2. CLOUD_PROVIDER == 'azure'
    (1) There is 1 risc-v machine, the number of executors is 1, trigger 1 test job, run test on hardware machine
    (2) There is 1 risc-v machine, the number of executors is 1, trigger 2 test job, first one run test on hardware machine, the last one nodeLabel: 'ci.role.test&&sw.os.linux&&hw.arch.x86&&sw.tool.docker' and dockerAgentLabel: 'default', run test in docker container
    (3) There is 1 linux-x86 machine, the number of executors is 2, trigger 1 test job, run test on hardware machine
    (4) There is 1 linux-x86 machine, the number of executors is 2, trigger 2 test job, both run test on hardware machine
    (5) There is 1 linux-x86 machine, the number of executors is 2, trigger 3 test job, first two run test on hardware machine, the last one starting dynamic vm, nodeLabel: 'hw.arch.x86&&sw.os.linux&&ci.agent.dynamic' and dynamicLabel: 'azure'

  3. CLOUD_PROVIDER == 'default'
    (1) There is 1 risc-v machine, the number of executors is 2, trigger 1 test job, run test on hardware machine
    (2) There is 1 risc-v machine, the number of executors is 2, trigger 2 test job, both run test on hardware machine
    (3) There is 1 risc-v machine, the number of executors is 2, trigger 3 test job, first two run test on hardware machine, the last one nodeLabel: 'ci.role.test&&sw.os.linux&&hw.arch.x86&&sw.tool.docker' and dockerAgentLabel: 'default', run test in docker container
    (4) There is 1 linux-x86 machine, the number of executors is 2, trigger 1 test job, run test on hardware machine
    (5) There is 1 linux-x86 machine, the number of executors is 2, trigger 2 test job, both run test on hardware machine
    (6) There is 1 linux-x86 machine, the number of executors is 2, trigger 3 test job, first two run test on hardware machine, the last one is waiting for next available executor on 'ci.role.test&&hw.arch.x86&&sw.os.linux'

dockerAgentLabel = dockerAgents[0]
}
if (!dockerAgentLabel.equals('')) {
if (dockerAgentLabel.equals('default') && SPEC.equals('linux_riscv64')) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we combine those two ifs to one? Feels like if (!dockerAgentLabel.equals('')) is unnecessary?

if (!dockerAgentLabel.equals('')) {
                                if (dockerAgentLabel.equals('default') && SPEC.equals('linux_riscv64')) {

LABEL = LABEL.minus("ci.role.test&&")
LABEL += '&&ci.agent.dynamic'
println "Cannot find any idle nodes. Starting dynamic vm, nodeLabel: '${LABEL}', dynamicLabel: '${params.CLOUD_PROVIDER}'"
} else if (params.CLOUD_PROVIDER != null && params.CLOUD_PROVIDER in dockerAgents) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are those two cases if (params.CLOUD_PROVIDER != null && params.CLOUD_PROVIDER in dockerAgents) and else if (dockerAgents.size() >= 1) same? For both dockerAgentLabel=default?

@sophia-guo
Copy link
Contributor

LGTM other than two minor comments. For the extra property 'DockerAgents' : ['default'] there may be other suggestions about the naming, which we can easily update later.

Copy link
Contributor

@smlambert smlambert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, we can revise the naming in a later PR as required as I suspect we will continue to play in this area of code in the upcoming months. Thanks @Accelerator1996 and @sophia-guo for this enhancement and the discussions that it triggered.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Status: Done
Development

Successfully merging this pull request may close these issues.

5 participants