Ability to set CPU configuration at runtime #451

decandia50 · 2020-10-09T18:35:21Z

I see that you can use BLIS_ARCH_DEBUG=1 to see what CPU configuration was selected at runtime, but it would be handy if you could set the CPU configuration at runtime instead of recompiling. The motivation is to get behavior like MKL's MKL_CBWR environment variable, which lets you specify the instruction set at runtime. This is useful when trying to create reproducible results across different machine types. For instance, a Haswell machine can use AVX2, but not AVX-512. If you wanted a program that ran across a heterogeneous set of Haswell and Skylake machines and produced the same result, you would need to specify that the software built on the Skylake nodes used the haswell configuration. I would like to be able to specify that at runtime so the program would only use AVX2 instructions, but leave the software configured for auto. This would allow me to run using AVX2 for some experiments, and AVX-512 for others. I went through the documentation and did not find a way to tweak this setting at runtime, so if this already exists, please point me to the proper documentation.


decandia50 · 2020-10-09T18:50:36Z

Also, #351 was merged, so you can update blis/frame/base/bli_arch.c (line 78 in 2d8ec16):

// NOTE: Change this usage of getenv() to bli_env_get_var() after

devinamatthews · 2020-10-09T19:23:27Z

@decandia50 if you configure using e.g. configure intel then it will compile in all the Intel architectures and select the proper one at runtime. While this isn't exactly the feature you asked for, it sounds like it would solve part of your problem. What you wouldn't get is e.g. using AVX2 on SkylakeX instead of AVX-512, but it's not clear why you would want to do that.

decandia50 · 2020-10-09T20:51:40Z

@devinamatthews thanks for the response, but that's exactly what I'm trying to avoid doing. As you note I can recompile down to a known set of common denominator CPU instructions, but what I'm trying to accomplish is effectively use a specific set of instructions at runtime without the need to recompile/redistribute my software (the code I need is already in there, but I can't self select it).

As a scenario: let's say I care about numerical reproducibility with respect to floating point, and that I have a large HPC cluster with heterogeneous host/CPU types. Some support AVX2, some support AVX-512. In a common scenario I will have tools like numpy linked against BLIS, and I will make a calculation like np.linalg.norm(A@B) as part of some regression test suite. What I would expect is that, given a known A and B, each host in the cluster would be able to reproduce the same result. However, because BLIS will autodetect the CPU and use AVX2 in some cases and AVX-512 in others, there is no way to specify at runtime that I care more about numerical reproducibility than performance without recompiling and redistributing the code, dependencies, and libraries to all hosts. In a large batch-like HPC system you may see many workloads. Some will desire absolute performance and will want AVX-512; others will require reproducibility and only require the lowest common instruction set. For folks who care about floating point reproducibility this is fairly important. As was once told to me, "diff is a wonderful debugging tool".
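To illustrate why the vector width alone can change the answer: floating-point addition is not associative, so accumulating into four partial sums and accumulating into eight can round differently. The sketch below is a toy model in pure Python, not BLIS code. The helper `lane_sum` and the input values are hypothetical, and the data is deliberately extreme so that the effect is visible in a few elements.

```python
def lane_sum(values, lanes):
    """Sum `values` the way a SIMD loop might: keep `lanes` running
    partial sums, then reduce the partials left to right."""
    partial = [0.0] * lanes
    for i, v in enumerate(values):
        partial[i % lanes] += v
    total = 0.0
    for p in partial:
        total += p
    return total


# Hypothetical data: two huge values that cancel, plus six small ones.
values = [1e16, 1.0, 1.0, 1.0, -1e16, 1.0, 1.0, 1.0]

four_wide = lane_sum(values, 4)   # 1e16 and -1e16 cancel inside one lane
eight_wide = lane_sum(values, 8)  # 1.0 is absorbed when added to 1e16

print(four_wide, eight_wide)      # prints "6.0 3.0"
```

Real kernels differ by a few units in the last place rather than by a factor of two, but the mechanism is the same: a different accumulation order gives a different rounding, which is enough to break a bitwise diff between hosts.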

devinamatthews · 2020-10-09T22:51:31Z

Oh, I didn't see that it's reproducibility that is the main issue. I think this feature should be relatively easy to add, but I can't hazard a guess on a timeline. Do note that, for a reproducible answer, you will also need to run with the same number of threads on each machine.

decandia50 · 2020-10-09T23:05:21Z

Do note that, for a reproducible answer, you will also need to run with the same number of threads on each machine.

Indeed. The thread count is very important for reproducibility. In many cases where reproducibility is required, the BLAS functions are run single-threaded to simplify things, e.g. BLIS_NUM_THREADS=1 BLIS_JC_NT=1 BLIS_IC_NT=1 BLIS_JR_NT=1 BLIS_IR_NT=1
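As a rough sketch of how those settings could be applied from a Python driver script: the environment variable names are the ones listed above, while the helper name `single_threaded_env` is made up for this example. The variables are placed in the environment of a child process rather than set after a BLIS-linked library has been loaded, on the assumption that they need to be visible before the library initializes.

```python
import os

# Variable names taken from the comment above; every value is "1".
SINGLE_THREAD_VARS = {
    "BLIS_NUM_THREADS": "1",
    "BLIS_JC_NT": "1",
    "BLIS_IC_NT": "1",
    "BLIS_JR_NT": "1",
    "BLIS_IR_NT": "1",
}


def single_threaded_env(base=None):
    """Return a copy of `base` (default: the current environment)
    with the single-threading variables applied."""
    env = dict(os.environ if base is None else base)
    env.update(SINGLE_THREAD_VARS)
    return env


env = single_threaded_env({"PATH": "/usr/bin"})
print(sorted(env))
```

The resulting dictionary can be passed as the `env` argument of `subprocess.run` when launching the regression test.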

Details:
- Implemented support for the user manually overriding the automatic subconfiguration selection that happens at runtime. This override can be requested by setting the BLIS_ARCH_TYPE environment variable. The variable must be set to the arch_t id (as enumerated in bli_type_defs.h) corresponding to the desired subconfiguration. If a value outside this enumerated range is given, BLIS will abort with an error message. If the value is in the valid range but corresponds to a subconfiguration that was not activated at configure-time/compile-time, BLIS will abort with a (different) error message. Thanks to decandia50 for suggesting this feature via issue #451.
- Defined a new function bli_gks_lookup_id to return the address of an internal data structure within the gks. If this address is NULL, then it indicates that the subconfig corresponding to the arch_t id passed into the function was not compiled into BLIS. This function is used in the second of the two abort scenarios described above.
- Defined the enumerated error code BLIS_UNINITIALIZED_GKS_CNTX, which is returned for the latter of the two abort scenarios mentioned above, along with a corresponding error message and a function to perform the error check.
- Added cpp macro branching to bli_env.c to support compilation of the auto-detect.x executable during configure-time. This cpp branch is similar to the cpp code already found in bli_arch.c and bli_cpuid.c.
- Cleaned up the auto_detect() function to facilitate easier maintenance going forward. Also added a convenient debug switch that outputs the compilation command for the auto-detect.x executable and exits.
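The two abort scenarios in the commit message can be pictured with a small Python model. This is only an illustration of the described behavior, not BLIS's actual C code: `NUM_ARCHS`, `COMPILED_IN`, `ArchOverrideError`, and `select_arch` are hypothetical stand-ins, the ids are invented, and the model raises an exception where BLIS aborts. How BLIS parses the variable's string value is not stated in the thread, so the `int()` call is an assumption.

```python
# Hypothetical stand-ins. Real arch_t ids are enumerated in
# bli_type_defs.h and may change between commits.
NUM_ARCHS = 4          # stand-in for BLIS_NUM_ARCHS
COMPILED_IN = {1, 2}   # ids of subconfigs assumed to be built in


class ArchOverrideError(Exception):
    """Models the two abort conditions described above."""


def select_arch(env, autodetected_id):
    """Return the arch id to use, honoring BLIS_ARCH_TYPE if present."""
    raw = env.get("BLIS_ARCH_TYPE")
    if raw is None:
        return autodetected_id          # no override requested
    arch_id = int(raw)                  # assumption: plain integer parse
    if not 0 <= arch_id < NUM_ARCHS:
        raise ArchOverrideError("arch id outside the enumerated range")
    if arch_id not in COMPILED_IN:
        # In BLIS this case is detected via bli_gks_lookup_id returning NULL.
        raise ArchOverrideError("subconfiguration not compiled in")
    return arch_id


print(select_arch({}, 2))                       # prints "2"
print(select_arch({"BLIS_ARCH_TYPE": "1"}, 2))  # prints "1"
```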

fgvanzee · 2020-10-19T00:07:27Z

@decandia50 I've added support for observing the BLIS_ARCH_TYPE environment variable to manually override the automatic subconfiguration selection mechanism. (See commit 2a0682f.) Please note that it must be set to (1) an arch_t id value within the defined range, 0 to BLIS_NUM_ARCHS-1, as defined in frame/include/bli_type_defs.h (note that these enum values may change in future commits!), and (2) an arch_t id value that corresponds to a subconfiguration that is actually compiled into the library to which the executable was linked. If either condition is not met, BLIS will abort with an error message.

Note that you can still use BLIS_ARCH_DEBUG to confirm the subconfiguration selected, whether it is configure-determined, automatic at runtime, or manually overridden.
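For anyone driving this from a script, a minimal sketch of launching a test with the override in place might look like the following. BLIS_ARCH_TYPE and BLIS_ARCH_DEBUG are the variables discussed above; the helper names `arch_override_env` and `run_with_arch` are invented for this example, and the arch_t id is left as a parameter because it has to be looked up in frame/include/bli_type_defs.h for the exact BLIS build in use.

```python
import os
import subprocess


def arch_override_env(arch_id, base=None, debug=True):
    """Return a copy of `base` (default: the current environment) that
    requests subconfiguration `arch_id` via BLIS_ARCH_TYPE."""
    env = dict(os.environ if base is None else base)
    env["BLIS_ARCH_TYPE"] = str(int(arch_id))
    if debug:
        # Ask BLIS to report which subconfiguration it selected.
        env["BLIS_ARCH_DEBUG"] = "1"
    return env


def run_with_arch(cmd, arch_id):
    """Run `cmd` (a list of arguments) with the override applied.
    Raises CalledProcessError if the child exits non-zero, which is
    presumably what a BLIS abort on an invalid id would look like."""
    return subprocess.run(cmd, env=arch_override_env(arch_id), check=True)


# Usage, with `my_arch_id` read from bli_type_defs.h for your build:
#   run_with_arch(["python", "regression_test.py"], my_arch_id)
```

Setting the variable on the child process, rather than inside an already-running interpreter, avoids depending on when the library reads it.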

Hopefully this feature, as implemented, is satisfactory for your purposes. Please test it out and pass along your feedback.

I appreciate the detail you included in your initial issue post and follow-up messages. Ultimately, this is what caught my attention and prompted me to spend part of my weekend on this. Happy early Hanukkah/Christmas/Festivus/birthday/whatever. :)

Also #351 was merged, so you can update blis/frame/base/bli_arch.c (line 78 in 2d8ec16):

// NOTE: Change this usage of getenv() to bli_env_get_var() after

Thanks for this reminder. I folded that change into 2a0682f.

devinamatthews added the enhancement label Oct 9, 2020

fgvanzee self-assigned this Oct 17, 2020
