Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

do not hard-code the number of jobs to 1 when running ctest #578

Merged
merged 1 commit into from
Jun 26, 2020

Conversation

mattyjams
Copy link
Contributor

I'm not sure what previously required the tests to be run serially, but I don't seem to be having any issues running them in parallel on Linux.

Was there something else we needed to follow up on here before we switched multiprocessing back on?

@kxl-adsk
Copy link

kxl-adsk commented Jun 9, 2020

@mattyjams It's something we are internally looking into. We had random failures on some platforms / hardware configuration. Not ready to accept this PR at this point. We will make a switch when ready.

@kxl-adsk kxl-adsk added unit test Related to unit tests (both python or c++) do-not-merge-yet Development is not finished, PR not ready for merge labels Jun 9, 2020
@@ -285,8 +285,6 @@ def RunCMake(context, extraArgs=None, stages=None):
def RunCTest(context, extraArgs=None):
buildDir = context.buildDir
variant = BuildVariant(context)
#TODO we can't currently run tests in parallel, something to revisit.
numJobs = 1

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We would like to preserve the default behavior of running tests in serial unless the number of jobs was explicitly passed to the build script. This would match what ctest is doing where by default it runs with a single job, but you can enable parallel execution when using -j flag. Once this is done, we should be able to merge this PR and start gradually adopt it platform by platform.

Copy link
Contributor Author

@mattyjams mattyjams Jun 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I re-worked this change to only use multiple jobs for testing when you explicitly use --jobs. Otherwise if --jobs isn't given, the behavior should be the same as before where it uses the number of CPUs for building and only one for testing. Also rebased against dev.

See f2cb351.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, Matt. I added @HamedSabri-adsk to the review since he may have some feedback for the actual implementation.

@kxl-adsk kxl-adsk removed the do-not-merge-yet Development is not finished, PR not ready for merge label Jun 16, 2020
@mattyjams mattyjams force-pushed the pr/run_tests_using_context_numJobs branch from 2b1238f to f2cb351 Compare June 17, 2020 00:24
Copy link
Contributor

@HamedSabri-adsk HamedSabri-adsk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @mattyjams Your changes mostly looks great but there are couple of tiny things that I would like to change:

We always want to use all the available CPU cores for build jobs. We've never had a situation to not use all the cores when building the project. With your recent changes, numBuildJobs now depends on args.jobs which is something we don't want.

if (args.jobs is not None):
    if args.jobs <= 0:
        raise ValueError("Number of jobs must be greater than 0")

    self.numBuildJobs = args.jobs
    self.numTestJobs = args.jobs
else:
    self.numBuildJobs = GetDefaultNumBuildJobs()
    self.numTestJobs = GetDefaultNumTestJobs()

Here is something that we can do:

1- Have numBuildJobs unconditionally always use GetDefaultNumBuildJobs()

2- Rename --jobs to --ctest-jobs.

3- Introduce a new argument --ctest-args. By exposing this argument, people can also pass extra ctest arguments ( e.g -VV, -L, etc.... ).

@seando-adsk
Copy link
Collaborator

@HamedSabri-adsk Are you saying that you want the build jobs to always unconditionally always use the number max number of CPUs? I think that is a good default, but we should provide a way to override that. Why not leave the existing --jobs flag as-is and add a new flag --test-jobs?

@HamedSabri-adsk
Copy link
Contributor

Are you saying that you want the build jobs to always unconditionally always use the number max number of CPUs? I think that is a good default, but we should provide a way to override that. Why not leave the existing --jobs flag as-is and add a new flag --test-jobs?

Sure, I wasn't sure if we ever want to override it but perhaps one might need.

In that case, could we have exciting --job flag be renamed to --build-jobs and the new flag for ctest job be called --ctest-jobs?

@seando-adsk
Copy link
Collaborator

@HamedSabri-adsk I do sometimes override the jobs flag (using a number less than my nb of cores) when I want to launch a job and do something else with my computer while its building.

@mattyjams
Copy link
Contributor Author

We always want to use all the available CPU cores for build jobs. We've never had a situation to not use all the cores when building the project. With your recent changes, numBuildJobs now depends on args.jobs which is something we don't want.

@HamedSabri-adsk: I don't think that's true. Previously, GetCPUCount() was used only to provide the default value for the --jobs option. The context then directly used the args value:

self.numJobs = args.jobs

As others have identified, if you had passed a different value to the --jobs option, it should have used that number of processes when building.

If people prefer that --jobs be split into --build-jobs (or just leave that one as --jobs) and --test-jobs, I can do that. I was just avoiding changing the command line options in case they've been baked into any automated setups.

@HamedSabri-adsk
Copy link
Contributor

@mattyjams

I don't think that's true. Previously, GetCPUCount() was used only to provide the default value for the --jobs option. The context then directly used the args value:

self.numJobs = args.jobs

Hmm, not sure what part of my original comment is incorrect? If you print out self.numJobs as is right now, you can see it prints GetCPUCount(). That's what is it used as default value. I never had the need to set the jobs flags and therefore it takes the default value.

parser.add_argument("-j", "--jobs", type=int, default=GetCPUCount(),
                    help=("Number of build jobs to run in parallel. "
                          "(default: # of processors [{0}])"
                          .format(GetCPUCount())))

@mattyjams
Copy link
Contributor Author

I'm saying that numJobs/numBuildJobs has always depended on args.jobs.

If you do not pass --jobs to build.py, then yes, it uses GetCPUCount(). However if you do pass --jobs, it uses that number of processes. That behavior is the same before and after the changes here.

@kxl-adsk
Copy link

@mattyjams We discussed this internally and agree that renaming an already existing flag isn't ideal. For exactly this reason we feel we shouldn't cut corners now and provide only the ability to customize the number of jobs for tests. The end goal is to have the ability to append custom arguments to ctest. This can be a number of jobs to run in parallel, but as well regular expression for test filtering, or verbosity level, etc.

What we are proposing is to have a new ctest-args flag which will have similar behavior to build-args, i.e. arguments passed to build script will be appended to the default arguments set in the script. For example in the case of running ctest jobs in parallel, by default, we don't pass anything to j flag. But you will be able to use ctest-args to pass j=64.

Finally, the current build.py job flag will remain to only control build. It won't be used for ctest.

This allows running tests in parallel using the --ctest-args option with an
invocation like this:

build.py ... --stages test --ctest-args "--parallel 8" -- .
@mattyjams mattyjams force-pushed the pr/run_tests_using_context_numJobs branch from f2cb351 to ce0fc54 Compare June 23, 2020 23:36
@mattyjams mattyjams changed the title run ctest using numJobs from the context do not hard-code the number of jobs to 1 when running ctest Jun 23, 2020
@mattyjams
Copy link
Contributor Author

In that case, how about the simplified change in ce0fc54 which simply removes the hard-coded -j 1 option being passed to ctest?

With that change, I'm able to run tests in parallel with a command that looks something like this:

build.py ... --stages test --ctest-args "--parallel 17" -- .

@kxl-adsk
Copy link

Ha, looks like we already had it set up this way 🤦‍♂️ and by default, it will run in serial.
@HamedSabri-adsk and I are off this week...I will ask @seando-adsk to review this.

Copy link
Collaborator

@seando-adsk seando-adsk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change looks good to me. The default behavior is the same, since ctest by default uses a single job. But this gives users the ability to override.

@kxl-adsk kxl-adsk merged commit 9ff9784 into Autodesk:dev Jun 26, 2020
@mattyjams mattyjams deleted the pr/run_tests_using_context_numJobs branch June 26, 2020 19:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
unit test Related to unit tests (both python or c++)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants