Built to detect weak external linkage of CUDA kernels that can lead to unexpected behavior.
Weak external linkage of CUDA kernels is a source of incorrect behavior that is non-obvious and incredibly hard to track. Incorrect behavior can include:
- kernel not being executed at all
__global__
function call is not configured errors- kernel for wrong SM being launched
- random segfaults
This happens due to the fact that the host side entry point for the kernel has weak linkage, and therefore if multiple DSO have that symbol only one will exist at runtime. This collapsing of symbols breaks certain assumptions in the CUDA runtime and in effect creates undefined behavior.
The detect.py tool offers a suite of options/flags to allow users to extract the specific details they are looking for. Any detect issues will cause the program to return an exit code of 1, and output json dictonary of the issues.
detect.py -m <libA> <libB> # List only kernels that exist in more than one of the provided library
detect.py -m <directory> # List only kernels that exist in more than one of the libraries inside the given directory
detect.py -m -r <exe|lib> # Look at the provided library, and all ldd (`DT_NEEDED`) dependencies
Options and Flags:
r
Scan all dependencies of the provided input(s).m
Only consider kernels that exist in multiple files ( includes dependencies )b
Only output new symbol usage that doesnt't exist in the json baseline filee
Filter out certain c++ symbols from output with provided regexu
Include public global process unique variables in the output
When detect.py outputs any issues to standard output using the following format:
{
"_Z9TrickySigPfPiPdi": {
"/path/to/file/example_A.so": [
"SASS",
"SASS",
"SASS",
"PTX"
],
"/path/to/file/example_B.so": [
"SASS"
],
"symbol": "TrickySig(float*, int*, double*, int)"
}
}
The root object contains a key/value map of all symbols where the key is the mangled symbol name and the value is a dictonary object which contains the follwing entries:
symbol
Contains the C++ demangled name of the symbol
<path>
Each file that has the symbol will have a separate key / value entry.
The file path ( including name and extension ) is the key, and the value is a list of symbol instance types.
Due to CUDA whole compilation mode it is possible for the same symbol to exist multiple times inside the same library ( de-duplicated at runtime ).
Symbol instance types:
- "PTX" Means that kernel instance will be compiled at runtime ( compute90 )
- "SASS" Means the kernel instance is for a specific for GPU version ( sm90 )
- "u" Means the symbol is a globally unique variable across the entire process space
Baseline files allow projects to easily detect when new weak symbols are introduced into a project. This can by running the detect.py
as part of a
CI pipeline, or part of a git bisect
script.
A baseline file can be generated by running detect.py
on the project from a 'good' configuration and piping the output to a file.
When detect.py
is executed with the baseline option (-b
) the following rules are applied to determine what output will be shown.
- The execution of
detect.py
is done as normal reading all input and obeyin the-r
flag - Any other filtering flags (
-m
,-e
,-u
) will be processed - The output json after all filtering will be compared to the json in the baseline file. - Symbols will be considered to match if the mangled names are the same - Files will be matched on just the filename plus extension, with any preceding path ( absolute or relateive ) ignored
A symbol will be considered to be new and part of the output, if it never existed in the baseline file, or if the baseline has smaller total number of files, or symbols per file.
detect.py -r exe
detect.py -r exe -b <path/to/baseline.json>
detect.py -r -m -u exe