Consolidate regression test baseline set #1217
Allows running the regression tests without running the simulations. This is useful for testing the regression test infrastructure itself. The option already existed in the regression test scripts, but it was not connected to CTest.
@rafmudaf -- This is definitely a welcome change to the regression testing infrastructure; thanks for that! Just a few comments:
Best regards,
Thanks for the comments @jjonkman. Good points on the wording, I'll update the post to be more clear. As for precision, this is all based on double precision. I didn't test in single precision since it is known that the solution can be different for reasons aside from numerical precision such as described in #1209 (comment). That being said, I'll do a quick study to get a general idea of the differences.
These test cases are too sensitive to be used reliably for software testing
The function to determine the passing channels was incorrectly using a global minimum instead of the channel minimum to determine the absolute tolerance.
bab5941 to 92ddaf3
Thanks, @rafmudaf. What are the two curves (red, blue) in your plots on the right involving ABS(local-baseline)?
The red line is the error threshold for the regression test and the blue line is the error,
@@ -421,6 +421,7 @@ def check_error(self):
     def check_input_motions(self,nodePos,nodeVel,nodeAcc):
         # make sure number of nodes didn't change for some reason
         if self._initNumNodePts != self.numNodePts:
             # @ANDY TODO: `time` is not available here so this would be a runtime error
Good catch. This error never occurred in the test cases, so it never got triggered. I have a few other revisions to make to this file, so I'll correct it then.
I'll hold off on merging until the legend addition is complete.
The |
Feature or improvement description
This pull request changes the regression test infrastructure to support a single set of baseline solutions for all supported compiler and platform combinations. Whereas the existing method for comparing results uses a norm for each channel, the new method uses an element-wise comparison along with a tolerance level that is derived from the channel itself.
New method
The primary change is to compare the test and baseline results with two element-wise tolerance values. A relative tolerance determines the allowed deviation from the baseline, and an absolute tolerance sets the smallest values that are considered meaningful. Both tolerances are expressed as orders of magnitude less than the order of magnitude of the range of the channel. The default values are `RTOL = 2` and `ATOL = 1.9`, and these have been tuned to the existing r-test suite.
In the comparison, `atol` is a function of the range of the baseline channel. It sets the level of precision required to pass. `rtol` is a function of the baseline channel values themselves, and this can be thought of as the level of accuracy required to pass the regression test. This comparison allows the threshold to scale with the magnitude of the baseline so that, for example, a deviation of 10 passes for a data point at 1e6 but fails for a data point at 1e2.
Old method
The existing method replaced by this pull request uses a "relative L2 norm" and a single value for a tolerance:
Where $||baseline||_2 \geq 1$, the error norm is divided by the L2 norm of the baseline.
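A minimal sketch of the relative L2 norm check as it is described here, for readers who want to see the rule in code. The function names are hypothetical, and the `tolerance` argument stands in for the single default threshold, whose value this page does not state.

```python
import math


def l2_norm(values):
    """Euclidean (L2) norm of a sequence of numbers."""
    return math.sqrt(sum(v * v for v in values))


def relative_l2_norm(test, baseline):
    """L2 norm of (test - baseline) for one channel.

    The error norm is divided by the baseline norm only where that
    norm is >= 1, matching the rule stated above.
    """
    error_norm = l2_norm([t - b for t, b in zip(test, baseline)])
    baseline_norm = l2_norm(baseline)
    if baseline_norm >= 1:
        return error_norm / baseline_norm
    return error_norm


def channel_passes_old(test, baseline, tolerance):
    """Hypothetical pass/fail check: one scalar compared to one tolerance."""
    return relative_l2_norm(test, baseline) < tolerance
```

Because the whole channel is reduced to a single number, this check cannot tell where in the time series the deviation occurs.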
Methods compared
For comparison, the plot below shows the difference between Intel and GNU compiler results for the HydroMxi channel in 5MW_ITIBarge_DLL_WTurb_WavesIrr. By visual comparison, this should pass the regression test.
The old method for determining whether this passes the regression test is shown below. In this case, the test fails since the calculated norm exceeds the default threshold.
The new method for comparison is shown below. The threshold is directly related to the baseline data.
To get a sense for how much the test results are allowed to deviate from the baseline, I've plotted the upper and lower limits around the baseline channel and zoomed in to the portion of time with the most deviation. The dotted red line is the test and solid red is baseline. In this case, the bounds could be tighter around the baseline data, but it is a significant improvement over the old method.
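The upper and lower limits described above can be sketched as follows. This assumes the element-wise check has the form `|test - baseline| <= atol + rtol * |baseline|`, which is consistent with the description under "New method" but is not confirmed by this page. The function names, the way `atol` is derived from the channel range, and the guard for a constant channel are illustrative assumptions, not the r-test implementation.

```python
import math

RTOL_MAGNITUDE = 2    # default RTOL from the description
ATOL_MAGNITUDE = 1.9  # default ATOL from the description


def channel_tolerances(baseline, rtol_magnitude=RTOL_MAGNITUDE,
                       atol_magnitude=ATOL_MAGNITUDE):
    """Return (atol, rtol) for one baseline channel (hypothetical helper)."""
    channel_range = max(baseline) - min(baseline)
    # Assumption: guard against a constant channel, whose range is zero.
    range_order = math.floor(math.log10(max(channel_range, 1e-12)))
    atol = 10.0 ** (range_order - atol_magnitude)
    rtol = 10.0 ** (-rtol_magnitude)
    return atol, rtol


def tolerance_band(baseline, **kwargs):
    """Lower and upper limits around each baseline point."""
    atol, rtol = channel_tolerances(baseline, **kwargs)
    lower = [b - (atol + rtol * abs(b)) for b in baseline]
    upper = [b + (atol + rtol * abs(b)) for b in baseline]
    return lower, upper


def channel_passes(test, baseline, **kwargs):
    """True only if every test point lies inside the band (NaN fails)."""
    if len(test) != len(baseline):
        return False
    lower, upper = tolerance_band(baseline, **kwargs)
    return all(lo <= t <= hi for t, lo, hi in zip(test, lower, upper))


# Two channels with the same range (2) but different magnitudes.
near_1e2 = [100.0, 101.0, 102.0]
near_1e6 = [1e6, 1e6 + 1.0, 1e6 + 2.0]
print(channel_passes([b + 10.0 for b in near_1e6], near_1e6))  # True
print(channel_passes([b + 10.0 for b in near_1e2], near_1e2))  # False
```

With these defaults the relative term is 1% of the baseline value, so a deviation of 10 is inside the band at 1e6 and outside it at 1e2.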
OS / Compiler
This has been tested on the following OS / compiler combinations:
Disabled tests
A few tests have been consistently inconsistent across operating systems, compilers, and code changes. These tests have been disabled:
These two tests had errors that were outliers compared to all the rest. For the sake of simplifying the test suite and the effort required to update it, these tests have been disabled. However, they should be investigated and reenabled in the future.
Future work
This is a step toward improving the regression test infrastructure, and I intend to continue to iterate on this method as I get a sense for how well it captures changes to the results. Better tuning of the tolerance parameters would enable more sensitivity in the comparison. Additionally, it might be worthwhile to allow for specifying tolerances specific to each test case or channel. Further improvements will also involve improving the tests and their outputs to provide more meaningful data for comparison.
Additional supporting information
This allows regression test baseline results to be updated from a single system (i.e. my MacBook Pro or eventually automatically through GitHub Actions). In the short term, I will continue to manually inspect the differences between solutions to ensure that there are no false-passing cases.