Tests that run model test cases under valgrind. #148
Comments
Do you have an automated way in mind?
Valgrind can be run manually on gaea for any test case that will run on a single PE. For example:
The suppressions file tells valgrind which errors to ignore. I'll share mine once I've completed it. Then look in valgrind_log.txt to see memory errors. For test cases that need multiple PEs, valgrind generates millions of false positives from within MPI. I'm in the process of figuring out how to filter these out properly (making a suppressions file is not feasible).
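A minimal sketch of how such a single-PE run could be assembled, assuming the standard valgrind options `--suppressions`, `--log-file` and `--track-origins`. The executable name `./MOM6` and the suppressions file name `valgrind.supp` are placeholders for illustration, not the ones actually used on gaea; only `valgrind_log.txt` comes from the comment above.

```python
import shlex


def build_valgrind_command(executable, suppressions_file, log_file="valgrind_log.txt"):
    """Return the argv list for running `executable` under valgrind (memcheck)."""
    return [
        "valgrind",
        f"--suppressions={suppressions_file}",  # errors valgrind should ignore
        f"--log-file={log_file}",               # write reports here instead of stderr
        "--track-origins=yes",                  # show where uninitialised values came from
        executable,
    ]


if __name__ == "__main__":
    # Hypothetical names: "./MOM6" and "valgrind.supp" are assumptions.
    cmd = build_valgrind_command("./MOM6", "valgrind.supp")
    print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run` from the test case's run directory; the memory errors then end up in the log file.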
From what I can gather, it's going to be tricky to properly run multi-PE test cases with valgrind on gaea. Usually Valgrind would handle calls to MPI by replacing the MPI library with wrappers that do certain checks before making the actual calls. This replacement is only possible if the MPI library is dynamically linked. It seems that running executables with dynamic libraries is not easy or supported on gaea. For a start, the compute nodes don't have access to the filesystem where most dynamic libraries reside (netcdf, hdf, z, math, etc). Also, I can't find a way to make the ftn compiler link some libraries as dynamic and others as static. I still have a couple of things to try.
I've given up on running valgrind on gaea because of gaea's limitations with shared libraries. Instead I'll try to run it on raijin, a supercomputer in Canberra, Australia.
I'll run these tests on the Australian computer. The output will be published here: https://climate-cms.nci.org.au/jenkins/job/mom-ocean.org/ This is what it looks like for MOM5; I think it can be cleaned up a lot (this file is ~300 MB): https://climate-cms.nci.org.au/jenkins/job/mom-ocean.org/job/MOM5_valgrind/lastBuild/console
The Valgrind tests are not yet all running, but I thought it would be good to document any errors as I see them...
In global_ALE/z:
==20891== Invalid read of size 8
==20891== Conditional jump or move depends on uninitialised value(s)
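Since the raw logs can run to hundreds of megabytes, a per-test summary of error kinds may be easier to track than the full output. The following is a rough sketch, assuming the usual `==PID== message` layout of memcheck logs; the list of error prefixes is not exhaustive and the stack frames in the sample are made up for illustration.

```python
import re
from collections import Counter

# A report header has exactly one space after the "==PID==" marker;
# stack frames ("   at 0x...", "   by 0x...") are indented further.
LINE_RE = re.compile(r"^==\d+== (?P<msg>\S.*)$")

# Memcheck messages of interest here (not an exhaustive list).
ERROR_PREFIXES = (
    "Invalid read",
    "Invalid write",
    "Conditional jump or move depends on uninitialised value(s)",
    "Use of uninitialised value",
)


def summarize_valgrind_log(text):
    """Count how often each kind of memcheck error appears in a log."""
    counts = Counter()
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match and match.group("msg").startswith(ERROR_PREFIXES):
            counts[match.group("msg")] += 1
    return counts


if __name__ == "__main__":
    # Sample log; the frame lines are hypothetical.
    sample = (
        "==20891== Invalid read of size 8\n"
        "==20891==    at 0x4005D0: hypothetical_frame (example.F90:10)\n"
        "==20891== Conditional jump or move depends on uninitialised value(s)\n"
        "==20891==    at 0x4005F0: hypothetical_frame (example.F90:12)\n"
    )
    for message, count in summarize_valgrind_log(sample).items():
        print(count, message)
```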
These can be found here: https://climate-cms.nci.org.au/jenkins/job/mom-ocean.org/job/MOM6_runtime_analyzer/
Valgrind has been shown to be a useful tool for finding uses of uninitialized variables. Using an uninitialized variable often leads to non-reproducible results because whatever garbage happens to be in memory gets read.
This issue proposes an automated way to run the test cases under valgrind. This will allow bugs of this kind to be found quickly.
See also #149