Ability to set CPU configuration at runtime #451
Comments
@decandia50 if you configure using e.g. |
@devinamatthews thanks for the response, but that's exactly what I'm trying to avoid doing. As you note, I can recompile down to a known set of common-denominator CPU instructions, but what I'm trying to accomplish is to use a specific set of instructions at runtime without the need to recompile/redistribute my software (the code I need is already in there, but I can't select it myself). For a scenario: let's say I care about numerical reproducibility with respect to floating point, and also that I have a large HPC cluster with heterogeneous host/CPU types. Some support |
Oh, I did not see that it's reproducibility that is the main issue. I think this feature should be relatively easy to add, but I can't hazard a guess on a timeline. Do note that, for a reproducible answer, you will also need to run with the same number of threads on each machine.
Indeed. The thread count is very important for reproducibility. In many cases where reproducibility is required the BLAS functions are often run single threaded to simplify things; e.g. |
Details:
- Implemented support for the user manually overriding the automatic subconfiguration selection that happens at runtime. This override can be requested by setting the BLIS_ARCH_TYPE environment variable. The variable must be set to the arch_t id (as enumerated in bli_type_defs.h) corresponding to the desired subconfiguration. If a value outside this enumerated range is given, BLIS will abort with an error message. If the value is in the valid range but corresponds to a subconfiguration that was not activated at configure-time/compile-time, BLIS will abort with a (different) error message. Thanks to decandia50 for suggesting this feature via issue #451.
- Defined a new function bli_gks_lookup_id to return the address of an internal data structure within the gks. If this address is NULL, then it indicates that the subconfig corresponding to the arch_t id passed into the function was not compiled into BLIS. This function is used in the second of the two abort scenarios described above.
- Defined the enumerated error code BLIS_UNINITIALIZED_GKS_CNTX, which is returned for the latter of the two abort scenarios mentioned above, along with a corresponding error message and a function to perform the error check.
- Added cpp macro branching to bli_env.c to support compilation of the auto-detect.x executable during configure-time. This cpp branch is similar to the cpp code already found in bli_arch.c and bli_cpuid.c.
- Cleaned up the auto_detect() function to facilitate easier maintenance going forward. Also added a convenient debug switch that outputs the compilation command for the auto-detect.x executable and exits.
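The two abort scenarios in the commit message can be illustrated with a small, self-contained sketch. This is not the BLIS implementation: the enum values, the `compiled_in` table, and the `sketch_*` names are hypothetical stand-ins for the real arch_t ids and the gks lookup, and treating a non-numeric value as out of range is an assumption of the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical ids for illustration; the real arch_t values are
   enumerated in bli_type_defs.h and are not reproduced here. */
typedef enum { ARCH_GENERIC = 0, ARCH_HASWELL, ARCH_SKX, ARCH_COUNT } sketch_arch_t;

typedef enum {
    OVERRIDE_NONE,          /* variable unset: keep automatic selection      */
    OVERRIDE_OK,            /* valid id for a compiled-in subconfiguration   */
    OVERRIDE_OUT_OF_RANGE,  /* first abort scenario: id outside enum range   */
    OVERRIDE_NOT_COMPILED   /* second abort scenario: id valid, not built in */
} override_status_t;

/* Hypothetical table: a NULL entry means the subconfiguration was not
   enabled at configure time (here, ARCH_SKX is assumed to be missing). */
const char *const compiled_in[ARCH_COUNT] = { "generic", "haswell", NULL };

/* Returns NULL when the subconfiguration was not compiled in, in the
   spirit of the NULL check described for bli_gks_lookup_id. */
const char *sketch_lookup_id(sketch_arch_t id) {
    assert((int)id >= 0 && (int)id < ARCH_COUNT);
    return compiled_in[id];
}

/* Classifies the value of the override variable (NULL when unset). */
override_status_t sketch_parse_override(const char *value, sketch_arch_t *out) {
    if (value == NULL) return OVERRIDE_NONE;

    char *end = NULL;
    errno = 0;
    long id = strtol(value, &end, 10);

    /* Assumption: anything that is not a whole number in range is rejected. */
    if (errno != 0 || end == value || *end != '\0' || id < 0 || id >= ARCH_COUNT)
        return OVERRIDE_OUT_OF_RANGE;

    if (sketch_lookup_id((sketch_arch_t)id) == NULL)
        return OVERRIDE_NOT_COMPILED;

    *out = (sketch_arch_t)id;
    return OVERRIDE_OK;
}
```

A caller would pass `getenv("BLIS_ARCH_TYPE")` as `value`, fall back to automatic detection on `OVERRIDE_NONE`, and abort with a distinct message for each of the two error statuses.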
@decandia50 I've added support for observing the Note that you can still use Hopefully this feature, as implemented, is satisfactory for your purposes. Please test it out and pass along your feedback. I appreciate the detail you included in your initial issue post and follow-up messages. Ultimately, this is what caught my attention and prompted me to spend part of my weekend on this. Happy early Hanukkah/Christmas/Festivus/birthday/whatever. :)
Thanks for this reminder. I folded that change into 2a0682f.
I see that you can use BLIS_ARCH_DEBUG=1 to see what CPU configuration was selected at runtime, but it would be handy if you could set the CPU configuration at runtime instead of recompiling. The reasoning is to have the ability to act like MKL's MKL_CBWR environment variable, which allows you to specify the instruction set at runtime. This is useful when trying to create reproducible results across different machine types. For instance, a Haswell machine can use AVX2, but not AVX-512. If you wanted to create a program that ran across a heterogeneous set of Haswell and Skylake machines and produced the same result, you would need to specify that the software built on the Skylake nodes used the haswell configuration. I would like to be able to specify that at runtime so the program would only use AVX2 instructions but leave the software configured for auto. This would allow me to run using AVX2 for some experiments, and AVX-512 for others. I went through the documentation and did not find a way to tweak this setting at runtime, so if this already exists, please point me to the proper documentation.
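For readers wondering why the instruction set matters for reproducibility at all: floating-point addition is not associative, and a kernel that accumulates in more vector lanes adds the same numbers in a different order. The sketch below is only an illustration of that effect in plain C, not BLIS or MKL code; `lane_sum` is a hypothetical helper that mimics a kernel keeping one partial sum per lane.

```c
#include <assert.h>
#include <stddef.h>

/* Sums x[0..n) using `lanes` independent partial sums that are combined
   at the end, mimicking how a SIMD kernel with that many lanes would
   accumulate. lanes == 1 is an ordinary left-to-right sum. */
double lane_sum(const double *x, size_t n, size_t lanes) {
    assert(lanes >= 1 && lanes <= 8);

    double partial[8] = {0};
    for (size_t i = 0; i < n; ++i)
        partial[i % lanes] += x[i];   /* element i goes to lane i mod lanes */

    double total = 0.0;
    for (size_t l = 0; l < lanes; ++l)
        total += partial[l];          /* final reduction across the lanes */
    return total;
}
```

With the input `{1e16, 1.0, -1e16, 1.0}` and strict IEEE-754 double arithmetic, one lane should give 1.0 (the first 1.0 is absorbed by 1e16) while two lanes should give 2.0 (the large values cancel in their own lane). The same kind of discrepancy, on a much smaller scale, is what separates results computed with different vector widths.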