incorrect results for OpenFOAM v10 + v11 when built with GCC 11.3.0 or 12.3.0 and -ftree-vectorize #20927

Open
boegel opened this issue Jun 28, 2024 · 10 comments · Fixed by #20958

Comments

@boegel
Member

boegel commented Jun 28, 2024

We got a report that installations of OpenFOAM (OpenFOAM-10-foss-2022a.eb, OpenFOAM-10-foss-2023a.eb, OpenFOAM-11-foss-2023a.eb) produce incorrect results; see also https://bugs.openfoam.org/view.php?id=4076.

After a lot of digging, we figured out that this problem is caused by a compiler bug: if these easyconfigs are changed to avoid the use of the -ftree-vectorize compiler option, the problem is resolved:

toolchainopts = {'vectorize': False}
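
For context, a minimal sketch of where this line sits in an easyconfig. The name, version, and toolchain values below are inferred from the filename OpenFOAM-10-foss-2022a.eb, not copied from the actual file, and all other parameters are omitted:

```python
# Hypothetical excerpt of OpenFOAM-10-foss-2022a.eb; values are inferred from
# the filename and are not copied from the real easyconfig.
name = 'OpenFOAM'
version = '10'

toolchain = {'name': 'foss', 'version': '2022a'}

# Turn off auto-vectorization so the build avoids -ftree-vectorize.
toolchainopts = {'vectorize': False}
```
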
@nicolasdlss

The incorrect results when using -ftree-vectorize are noticeable when using strict solver tolerances.
On the one hand, with -ftree-vectorize the results differ (slightly) from those obtained without it, and also from those of older versions (8 and 9).
On the other hand, and more importantly, when using -ftree-vectorize the results depend on the partitioning. For some partitionings a large error remained after convergence, several orders of magnitude larger than the solver tolerances used. See also https://bugs.openfoam.org/view.php?id=4076.
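
One way to make the partitioning dependence concrete is to compare the same field from two runs against the solver tolerance. The following is only a sketch: the function names, the flat list representation of a field, and the safety factor are assumptions for illustration and are not taken from OpenFOAM or from the reproducer.

```python
from typing import Sequence


def max_abs_difference(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest pointwise absolute difference between two equally sized fields."""
    if len(a) != len(b):
        raise ValueError("fields must have the same number of values")
    return max((abs(x - y) for x, y in zip(a, b)), default=0.0)


def partitioning_consistent(
    reference: Sequence[float],
    other: Sequence[float],
    solver_tolerance: float,
    safety_factor: float = 10.0,  # hypothetical margin, tune per case
) -> bool:
    """True if two runs agree to within a small multiple of the solver tolerance."""
    return max_abs_difference(reference, other) <= safety_factor * solver_tolerance


# Made-up values: a serial run versus two differently partitioned runs.
serial = [1.0, 2.0, 3.0]
partition_a = [1.0, 2.0 + 1e-9, 3.0]  # differs at the level of the tolerance
partition_b = [1.0, 2.001, 3.0]       # differs by orders of magnitude more

print(partitioning_consistent(serial, partition_a, solver_tolerance=1e-8))
print(partitioning_consistent(serial, partition_b, solver_tolerance=1e-8))
```

A difference that exceeds the solver tolerance by several orders of magnitude, as in the second comparison, is the kind of discrepancy described above.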

@bartoldeman
Contributor

Is the issue specific to 11.3.0 and 12.3.0 only or does it also happen with earlier/later versions of GCC?

@joris13

joris13 commented Jun 30, 2024

When testing a lid-driven cavity case with icoFoam, building with or without -ftree-vectorize makes no significant performance difference, so this option can be avoided without a heavy penalty.
[Attached image: Comparison]

@boegel
Member Author

boegel commented Jul 3, 2024

> Is the issue specific to 11.3.0 and 12.3.0 only or does it also happen with earlier/later versions of GCC?

The problem doesn't appear with OpenFOAM easyconfigs using a toolchain older than 2022a; it seems we're hitting something that was introduced in the auto-vectorizer of GCC 11.x and later.

@Micket
Contributor

Micket commented Jul 3, 2024

Related, same(?) or at least very similar tree-vectorizer issue: #15495
In that case, -O3 also "solved" the issue.

The actual issue here is that OpenFOAM doesn't have the robust test suite that it desperately needs.
I would also really like to test this with ASan enabled (-fsanitize=address). If someone could post a jobscript for running the reproducing test case, I would appreciate it.

@boegel
Member Author

boegel commented Jul 3, 2024

@nicolasdlss Can you provide some clear instructions on how to reproduce the problem, and what to look out for?

@Micket Since we already disable -ftree-vectorize for the other variant of OpenFOAM (ever since #15495), I think it makes sense to also do that for the openfoam.org variant, as proposed in #20958, especially since the performance impact seems very minimal...

@nicolasdlss

This is a fast-running test (on the order of 1 minute) with a numerical outcome that can easily be used for comparison.
Instructions on how to use the test and interpret the results are given in the README.
GitHub_reproducer.tar.gz

@boegel
Member Author

boegel commented Jul 5, 2024

Re-opening this. I would really like to have some kind of sanity check in place for this, so we don't reintroduce this problem in the future.

@nicolasdlss How feasible would it be to leverage a tutorial case that is included with OpenFOAM for this?

We could also look into integrating your minimal test case as a sanity check of course, but that's a bit messier since it involves external input files, etc.
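
As a rough sketch of what such a sanity check could look like, independent of which test case is used: take the numerical outcome printed by a run and compare it against a known-good reference value. The helper names, the assumption that the outcome is the last number in the output, and the tolerance are all hypothetical; the actual output format of the reproducer or of a tutorial case would need to be checked.

```python
import math
import re

# Matches plain and scientific-notation numbers such as 12, -0.5 or 1.2e-03.
_FLOAT_RE = re.compile(r'[-+]?(?:\d+\.\d*|\.\d+|\d+)(?:[eE][-+]?\d+)?')


def last_number(text: str) -> float:
    """Return the last number found in the text (assumed to be the outcome)."""
    matches = _FLOAT_RE.findall(text)
    if not matches:
        raise ValueError("no numerical value found in output")
    return float(matches[-1])


def sanity_check(output: str, reference: float, rel_tol: float = 1e-6) -> bool:
    """True if the outcome in the output matches the reference value."""
    return math.isclose(last_number(output), reference, rel_tol=rel_tol)


# Made-up output lines, for illustration only.
print(sanity_check("result = 0.500001", reference=0.5, rel_tol=1e-3))
print(sanity_check("result = 0.6", reference=0.5))
```

A check like this only catches regressions for the one case and reference value it encodes, so the tolerance would have to be chosen with the solver tolerances of that case in mind.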

@klust

klust commented Jul 17, 2024

After hearing about the issue in the EasyBuild conference call of July 17, 2024, I checked the setup on LUMI to see if we are also affected.

It turns out that by default OpenFOAM with the GNU compilers will use -O3 (but not -ftree-vectorize). So this was likely also what the developers used when trying to reproduce the bug reported by Nicolas and is likely the configuration they use when testing OpenFOAM.

It might be a good idea (for other packages too, actually) to check which compiler options the developers use and, if they use a reasonable level of optimization, simply configure toolchainopts to reproduce them as closely as possible to avoid such problems.

@Micket
Contributor

Micket commented Jul 21, 2024

-O3 enables -ftree-vectorize. In fact, starting with GCC 12, -O2 enables the tree vectorizer as well (though less aggressively).
The fact that observable changes in the computations happen when changing any standard compilation flags is a sign that something is wrong with the code: some undefined behavior, a race condition, or poorly written manual SIMD code.
The fact that this happens when we actually lower the optimization flags is just extra worrying. We aren't talking about fast-math or anything fragile like that.

Trying to mimic their defaults is just praying that it happens to work without testing. This will almost certainly just randomly occur again whenever someone decides to try a different compiler, version, or CPU architecture.
Given that this has been known to occur across several GCC versions (that have not been known to have any other issues in anything we build), I still believe this is just broken OpenFOAM code and it really just needs to be fixed. A good start would be some sanitizers, in particular -fsanitize=address and/or -fsanitize=undefined (though others could also be interesting).

@boegel boegel modified the milestones: 4.9.3, release after 4.9.3 Sep 11, 2024
6 participants