Define a verificationExperiment annotation for experiments meant to compare results for regression testing and cross-tool comparisons #3473
I agree that such information is useful, and ideally it should only be needed for a fraction of the models. I believe that what we have used for Dymola is, instead of modifying the package, to keep that information available externally - and also to loosen the tolerance so that we don't get false positives; but I will check. As far as I can see that has a number of advantages:
I'm not 100% happy with this direction, as it means that the default will be to use the settings from the experiment annotation. I see good reasons to generate reference results according to a procedure where […]. I can imagine there could be rare situations when the automatic settings for reference result generation need to be overridden, but I have yet to see an actual example of this. Similar to what @HansOlsson says regarding Dymola, we at Wolfram also keep information for overriding the experiment settings externally. A related topic is the use of different […].
I honestly would not introduce any new annotations etc. to the language. In my opinion Modelica already has all the means one needs to achieve this, e.g., inheritance and modification. What has to be done by each individual project is to decide for ITS DOMAIN AND USE-CASES on the structure and templates for modelling all the simulation applications. For example, you can have basic user experiments in your library (i.e., examples) and in a separate regression test library refine these experiments with different settings (change tolerances, stop times, whatever). In this setup, the regression testing setup is not part of the final library distribution, which is likely how it should be (because you might add a bunch of scripts and other software for continuous integration support, resulting in huge dependencies to make your automatic tests run). You just don't want to ship all of that. And as @henrikt-ma mentions, good practice at the tool vendors is already to have separate settings for this. I think testing is a matter of project policies and templates. MSL can come up with a good scheme for its use case and improve its infrastructure without any need for language changes.
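For illustration, a minimal sketch of the extend-and-override pattern described above (library and model names are hypothetical):

```modelica
// A separate, unshipped regression-test library refines a shipped example
// by inheritance and simply provides its own simulation setup.
within MyLibraryRegressionTests;

model MyExampleRegression
  "Regression variant of MyLibrary.Examples.MyExample with stricter settings"
  extends MyLibrary.Examples.MyExample;
  annotation (experiment(
      StartTime = 0,
      StopTime = 1,        // e.g. shortened for a chaotic model
      Interval = 0.001,
      Tolerance = 1e-8));  // tightened compared to the shipped example
end MyExampleRegression;
```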
I respectfully disagree with @henrikt-ma and @christoff-buerger:
Regarding this comment
it is true that each MA project has its own responsibility, but the whole point of the MA is to provide coordinated standards and libraries. In this case, I clearly see the need for a small language change to support the work by MAP-Lib. BTW, we are talking about introducing one new annotation, which is no big deal. We could actually introduce a vendor annotation __MAP_Lib_verificationExperiment, but this really seems overkill to me; can't we just have it as a standard Modelica annotation? @GallLeo, @beutlich, @dietmarw, @AHaumer, @hubertus65, I'd like to hear your opinion.
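For concreteness, the two spellings being weighed differ only in the annotation name; a hypothetical example could carry either of the following (argument names simply mirror the experiment annotation):

```modelica
model SomeExample "Hypothetical example used for cross-tool verification"
  // ... model content ...
  annotation (
    experiment(StopTime = 10, Tolerance = 1e-6),
    // Vendor-annotation spelling (MAP-Lib specific, ignored by other tools):
    __MAP_Lib_verificationExperiment(StopTime = 10, Tolerance = 1e-8));
  // The standard-annotation spelling proposed in this ticket would instead be:
  //   verificationExperiment(StopTime = 10, Tolerance = 1e-8)
end SomeExample;
```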
First of all: we need something for MSL, but the standardized result should be easy to adopt by all library developers. I had a look at the "heuristics" proposed 10 years ago, when I started creating the first set of reference results. After the 4.0.0 release, @beutlich documented his adaptation of tolerances in the MSL wiki. So, in order to make it transparent, I'm in favor of explicitly specifying verification settings. Where to specify the explicit verification settings? @christoff-buerger proposed using a separate regression test library; @HansOlsson proposed storing the verification setup externally. Separate files are very hard to keep up to date, therefore storing verification settings in the example/test case seems to be the right place to me. Especially if we think about using signals from figures. I would add arguments to […]
I am pretty much aligned with @GallLeo here.

Example models: Using example models' simulation results as reference results for cross-tool, cross-solver or cross-MSL-version regression testing can be considered a misuse. In strict consequence there should be no directory in Modelica/Resources/Reference/ at all.

Reference models: Reference models should be taken from ModelicaTest. If example models from Modelica shall also be considered for regression testing, it might be an option to extend from them and adapt the experiment settings. This would also simplify the job of the regression tester since only one library needs to be taken into account. I also agree that we should not only specify the solver settings but also the valid reference signals. I see multiple options.
Summary: I am in favour of keeping reference models outside Modelica and only using ModelicaTest for this. I am in favour of not having reference signals and experiment settings distributed in various locations, i.e., options 2 or 3 are my preferences. I even see more advantage in option 3. In that case it is left to specify (if at all)
I still need to be convinced whether we need to have this in the specification or if it is simply up to the library developers of ModelicaTest. (Remember, it is now all in ModelicaTest and not anymore in the MSL.)
I would also be in favour of having the verification settings separate from the simulation model but referenced from within the simulation model. So basically @beutlich's option 3.
I agree that many of those rules will not be adopted by all libraries. Additionally, to me, use-cases such as this are one of the reasons we want the possibility to have the information completely outside of the library, so that the user of the library can add the things needed for testing without changing the upstream library.
IMHO @casella is right with his analysis and his idea. I'm not fond of storing, in addition to the model, the comparisonSignals.txt and separate settings for comparison / regression tests. It is much better to store this information (stop time, interval length, tolerance, what else?) within the model, either in a second annotation or as part of the experiment annotation. If not present, the settings of the experiment annotation are valid (in most cases). Additionally we could specify the tolerance for comparison. The model developer decides whether changes for comparison / regression tests are necessary or not, maybe after receiving a message that comparison / regression tests are problematic (could be from tool vendors). As this annotation would be supported by most tools, it could be adopted by all third-party libraries.
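A hypothetical sketch of the "as part of the experiment annotation" variant; the nested names (verification, comparisonTolerance) are invented here purely to illustrate the idea and are not part of any specification:

```modelica
annotation (experiment(
    StartTime = 0,
    StopTime = 10,
    Interval = 0.01,
    Tolerance = 1e-6,
    // Hypothetical sub-entry used only for reference generation and
    // regression / cross-tool comparison; anything not given here would
    // fall back to the values above:
    verification(Tolerance = 1e-8, Interval = 0.002, comparisonTolerance = 2e-3)));
```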
One thing that is unclear to me in most of the discussions above is what pieces are talking about settings for reference data creation, and what pieces are talking about settings for test runs. I'm a big proponent of "test what you ship", in the sense that one should (at least) have tests that do what a user would do. In this case it means running the examples with the settings from the experiment annotation. I can see that there are cases where this is not possible, like the chaotic systems already mentioned. In this case, a shorter stop time for result comparison is fine. Except for extremely rare cases like the chaotic system, I don't want testing to happen with special settings for test runs. If you do that, you are no longer testing what the user will experience, and models might not even simulate in the end product (with the settings in the experiment annotation). Nobody would notice if the tests run with some special configuration. I definitely see the use of special settings for reference data creation. One could use that to give a tighter tolerance, resulting in reference data closer to the "perfect" result, increasing the chances of test runs with different tools/platforms/compilers getting close to the reference. I also want to make people aware of the […]
I agree with testing what you ship (as part of testing; you can add more), and even this case involves two different things:
Note that there can be other reasons than chaos for separating testing from running, e.g.: Modelica.Mechanics.MultiBody.Examples.Loops.EngineV6 (and _analytic) says:
(Note that the experiment annotation doesn't have a number of intervals, but Interval=6e-5 corresponds to the above, and even 1e-4 will sort of work as well; but not much higher - since that will under-sample the envelope too much, leading to weird effects.)
Thank you for clarifying this. I agree the test (in a perfect world) in this case should include both these steps.
As a user of the model, I would not want to see the weird under-sampling of the envelope. The file size issue for testing can be dealt with by the tool only storing the variables you need for the comparison. For the user, if we introduce figures in the model, tools could be smarter there as well and only store (by default) what you need for plotting (and animation).
At Wolfram, where we have somewhat extensive use of figures in the models, we have the default policy to simply use all variables appearing in figures as comparison signals (@GallLeo already briefly mentioned this idea above). It has several advantages:
One small limitation of the approach is that there is no way to say that a variable in a figure should be excluded from correctness testing. I'm not aware of any model where we have needed this, but still… As @maltelenz suggested, the […]
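To picture the policy, here is a rough sketch using the figure annotations standardized in recent MLS versions (the exact placement and record fields are written from memory here and may differ in detail); every y-expression of every curve would automatically become a comparison signal:

```modelica
model TankDemo "Hypothetical example with a predefined plot"
  // ... model content declaring tank.level and valve.opening ...
  annotation (
    experiment(StopTime = 100, Tolerance = 1e-6),
    Documentation(figures = {
      Figure(title = "Tank behaviour", plots = {
        Plot(curves = {
          Curve(y = tank.level, legend = "Tank level"),
          Curve(y = valve.opening, legend = "Valve opening")})})}));
  // Under the policy above, tank.level and valve.opening are the
  // comparison signals for regression and cross-tool testing.
end TankDemo;
```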
I think it is well understood that there isn't one […]

The default for […]
I am also in favor of adding a new annotation, or an entry to the experiment annotation. Therefore, having model-specific specifications for the CI tests would be valuable. These specifications should include:
Currently we have (some of) this information spread across different files for about 1700 test models. This is due to historical reasons. It would be good to refactor this, as the current situation makes it hard for new developers to figure out what is to be specified where, and it makes it harder to explain how to port changes across maintenance branches. Having these inside the model would help.
This seems quite important; we can mention it at the next phone meeting - but we will work a lot on it outside of those meetings as well.
Please just make sure to keep the language group in the loop. Is it the MAP-Lib monthly meeting one should participate in to engage in the work, or some other forum?
Conceptually speaking, it's both things, as also mentioned by @mwetter in his comment below. Whether or not we want to put the information about how to create the reference data in the same annotation or in a different annotation is just a matter of taste, but it does not change the fundamental requirement (for me), i.e., that all the relevant information for testing a model should be embedded in the model through annotations, not spread over other files. We already have this concept for documentation; I don't really see why testing should be different. Regarding storing the names of the variables to be used for automated comparisons (possibly using regexps to keep them compact), I also think they should have a place in an annotation, rather than in a separate file, for the same reason the documentation is embedded in the model and not stored elsewhere. I agree with @henrikt-ma that we should make good use of the information provided by the figure annotations.
@henrikt-ma, with cross-tool checking of the Buildings library, this happens in a significant number of cases, and it's really a nuisance if you want to make the library truly tool-independent with a high degree of dependability. @mwetter can confirm. The solution, so far, was to tighten the tolerance in the experiment annotation, e.g. to 1e-7 for all those cases. This is really not a clean solution, because those simulations become unnecessarily slower and, most importantly, the difference in the obtained results is small, much smaller than the modelling errors inherent in the simulation model, which of course has lots of modelling approximations. As I'll argue once more in the next comment, the requirements for human inspection are completely different from the requirements for automatic cross-tool checking.
@maltelenz, I don't see the point of using a tighter tolerance for the reference creation, and then using a sloppy one to generate results that will fail the verification test because of numerical errors. If you compare results obtained with different tools (and, mind you, the reference result will be one of them!), of course the tolerance should be the same.
I re-read all the comments, and I noticed that there are two rather different opinions: according to the first one, the information to run cross-tool verifications should belong to completely separate entities, e.g. separate libraries such as ModelicaTest; according to the other, the information should be embedded in the models themselves. Let me add some extra considerations about that, which I hope can further clarify the issue.
I agree with that, with a few important remarks
Good point.
Just a quick remark: this division is not terribly surprising to me, because it can also be observed in general in programming languages' testing tools -- in my personal experience there seem to be (at least) two major "schools":
I'll refrain from discussing respective pros/cons, since much of this comes down to personal preference, and to avoid bloating the discussion here. I just want to point out that the two approaches have widespread use/merit, and which one is "better" depends on a load of factors, probably. As far as my experience is concerned, what sets Modelica somewhat apart from the ones above is the noticeable dominance of "reference testing"/golden master/regression testing. The approach of asserting smaller facts about models' behaviour, e.g. like XogenyTest proposed, seems to have very little to no use from my vantage point.
I believe the idea is that there is a "perfect" solution, if you had infinitely tight tolerance requirements. The idea is to have this perfect solution in the reference result. The numerical variations from different tools and the sloppier tolerance from the user-facing experiment annotation will then hopefully stay within the tolerance of the comparison. If we instead generate the reference result with a sloppy tolerance from the experiment annotation, the reference itself contains numerical errors of the same magnitude as the differences we are trying to detect.
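This reasoning can be made concrete with a triangle-inequality argument (notation introduced here for illustration): writing $x_\text{tool}$, $x_\text{ref}$ and $x_\text{exact}$ for the tool result, the reference and the exact solution,

$$\|x_\text{tool} - x_\text{ref}\| \le \|x_\text{tool} - x_\text{exact}\| + \|x_\text{exact} - x_\text{ref}\|.$$

With a tightly generated reference the second term is negligible, so the tube check essentially measures the tool's own numerical error; if the reference is generated at the same sloppy tolerance, both terms can be of comparable size and their sum may exceed the comparison tolerance even when each tool, on its own, is within spec.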
As a last remark (I apologize for flooding this ticket), it is my goal as the leader of MAP-Lib to eventually provide an MA-sponsored infrastructure that can test open-source libraries (starting from the MSL and ModelicaTest, of course) by running all models that can be simulated with all Modelica tools, automatically comparing their results and visualizing the outcome of the comparisons in a web interface. This would allow the library developer(s) to easily inspect the results, and eventually pick the results of any one tool as reference results, based on his/her expert judgement, so that he/she is not limited in the development to the availability of tool(s) for which he or she has a license. It will also allow the library users to pick the best tool(s) to run a certain library. I believe this is the only way we can truly foster the concept of tool-independent language and libraries, moving it from theory to practice. In order for this thing to work, we have to lower the entry barrier as much as we can, so that we get as many libraries as possible in. As I envision it, one could start by asking for his/her library to be tested. He/she will then be able to compare the results with different tools, which may also indirectly point out models that are numerically fragile, and eventually select reference trajectories among the ones that were computed, not necessarily with the tool(s) that he or she has installed on his or her computer. In most cases, assuming the tools are not buggy and that the model is kosher with respect to the language spec (a mandatory requirement IMHO), selecting any specific tool result obtained with the default experiment annotation as a reference will cause all other tool results to fall within the CSV-compare tubes, so everybody's happy. For a few corner cases (2-5%?) it will be clear that the model is numerically trickier, so the library developers may introduce a verificationExperiment annotation to determine tight enough conditions that allow the numerical errors to be reduced below the CSV-compare tool tolerance, so that all tool results are close enough. In some cases, it will be clear that the results of some tools are plain wrong, and this will provide useful information to tool developers for bug fixing. Some other times, every tool could give a different result, which may be a strong indication of a numerically fragile model, which is useful information for the library developer. This is also an essential feature for MSL development: developing the "Standard" library with a single tool is not really a viable proposition; the developer needs as much feedback as possible from all MLS-compliant tools to develop something truly standard. Once this process has converged, the library will be usable with all tools and will have all the information embedded within it to allow continuous testing of this property, which will be publicly demonstrated on the MA servers. I believe this would be a very strong demonstration of the power of the Modelica ecosystem. Now, the crucial point to make this dream come true is that the additional effort for the library developer to enter this game will need to be as low as possible, otherwise this is simply not going to happen. With my proposal, and with some reasonable heuristics to pick meaningful comparison variables absent their indication (e.g. only the state variables for dynamic models), one can run cross-tool comparisons of an open-source library with zero additional effort by the library developer.
The idea is that the majority of tested models will be fine, and it will only be necessary to provide a few custom annotations for the critical models, e.g. selecting specific variables for testing or changing the verificationExperiment annotation. BTW, this task is conceptually under the responsibility of the open-source library developer(s), who however may not have enough time or motivation to take care of it. What is nice is that other parties (i.e., the Modelica community) could help in this process by means of pull requests to the library code base that introduce such annotations where needed. These pull requests with a few added annotations can preliminarily be fed to the MA testing infrastructure, so that the library developer can see the result and accept the PRs with just one click of the mouse, if the results are fine. Which means, he or she can easily put in the judgement, without much effort. This is how we'll get things done. We could get students involved in this work, which would also promote Modelica with the younger generations. IMHO this kind of process has a much higher chance of practical success than a process where we expect that Library Officers or open-source Library Developers (who are usually volunteers doing this in their spare time) have unlimited time and resources to set up elaborate testing libraries with ad-hoc developed test models, elaborate rules, multiple files, scripts and whatnot. Ideally, this may be the best option, but in practice, this is never going to happen. We need a viable path to achieve the dream I envisioned, and I believe that the verificationExperiment annotation, along with annotations for selecting the reference variables, is a key feature for that. My 2 cts as MAP-Lib leader. 😃
I agree, except for the case of chaotic systems, for which there are theoretical reasons why even a very, very tight tolerance doesn't work in the long term, due to exponential divergence of very close trajectories.
"Hopefully" is unfortunately a problematic word here, in my experience 😃. Why hoping, if you can just tighten the tolerance also when generating the result to be compared? Anyway, the problem here is that a solution with a 2% error, obtained from a model that has 10% uncertainty on key parameters (which is pretty normal for thermal systems) may be perfectly fine for all practical purposes, except automated verification, for which we have (rightfully) set a 0.2% relative tolerance. One thing is to get a result which is good enough for some application, another thing is to verify that two different tools give the same result if the numerical approximations are good enough. Why should we unnecessarily use a tigher tolerance in the experiment annotation, hence longer simulation times, to stay within the CSV bounds, which have no consideration for the model uncertainty? The fundamental issue addressed by this ticket is that the requirements for simulations are completely different whether you want to use the results to make decisions about a system being modelled, or you want to use the simulation result for cross-tool and regression checking. Different requirements lead to different simulation setups. Hence, different annotations to specify them. |
I would partially agree with @beutlich's comment. To me ModelicaTest is not only models testing each component once, but intended as unit-tests of components; so if there's a weird flag there should be a test-model for that. As always, coverage and testing go hand in hand. It might be good if we also added some "integration tests" (in the testing meaning) - and to me that would fit naturally in ModelicaTest, but perhaps separated in some way. However, I understand that we might not have resources for that. In contrast, the Examples models in MSL are primarily constructed as Examples demonstrating how to use (and sometimes not use) the library. Thus in one sense using them for (cross-tool-)testing is misusing them, but on the other hand we have them and they should work reliably (at least in general: see #2340 for an exception, and as discussed we also have chaotic systems as another exception) - so why not test them as well? However, we should still keep the mind-set that Example-models are examples - not testing-models. After thinking through this I think that one conclusion is that since the Examples aren't primarily for testing and they shouldn't be the only tests for the library, it should be ok to reduce the simulation time considerably for specific Example-models (due to chaotic behavior or too large results). Whether such information is stored in the library or separately is a somewhat different issue. Both should work in principle, but I could see problems with getting agreement on the exact levels for different tools - and thus the need to add something outside of the library for specific cases and tools - which means that if we start inside the library we would have both.
I also need to mention an elephant in the room before we fall too deeply in love with Tolerance.
That is, we are discussing the control of local error for some integration method (with adaptive step size) that hasn't been defined. Hence, the global error – which is what matters for verification of simulation results – will generally be bigger with an increased number of steps taken. To avoid this arbitrary relation to the integration method chosen by the tool, it would make way more sense to set an accuracy requirement on the global error. This is an area where I hope that the modern and flexible DifferentialEquations.jl and the active community around it could benefit us all; that if we just keep nagging @ChrisRackauckas about this opportunity to lead the way, they will eventually set a new standard for error control that the rest of us will have to catch up with.
It is not that simple - tolerance can be local error per step or per step length (normalized or not). However, looking at global error has two issues:
However, I don't see that such a discussion is relevant here.
My point is that we must have realistic expectations on how far we can get with cross-tool verification when we haven't even agreed upon what Tolerance actually means.
@henrikt-ma my expectations are based on several years of experience trying to match Dymola-generated reference values with OpenModelica simulations on several libraries, most notably the MSL and Buildings, which contain a wide range of diverse models (mechanical, thermal, electrical, thermo-fluid, etc.). The outcome of that experience is that the current set-up works nicely in 95% of the cases, but then we always get corner cases that need to be handled. If we don't, we end up with an improper dependence of the success of the verification process on the tool that actually generated the reference data, which is something we must absolutely get rid of, if we want the core statement "Modelica is a tool-independent modelling language" to actually mean something. I am convinced that my latest proposal will make it possible to handle all these corner cases nicely. Of course this is only based on good judgement, I have no formal proof of that, so I may be wrong, but I guess we should try to make a step forward, as the current situation is not really tenable for MAP-Lib. If this doesn't work, we can always change it; the MLS is not etched in stone. As to the meaning of Tolerance:
The point of this parameter is not to be used quantitatively, but just to have a knob that you can turn to get more or less precise time integration of differential equations. In most practical cases, experience showed that 1e-4 was definitely too sloppy for cross-tool verification, 1e-6 gives satisfactory results in most cases, but in some cases you need to further tighten that by 1 to 3 orders of magnitude to avoid numerical errors playing too big a role. That's it 😅.
Sounds good. KISS 😃
I understand the requirement, but I'm not sure this is a good solution in all cases. If the default Tolerance = 1e-6, it makes perfect sense to generate reference results with Tolerance = 1e-7. But if we need to tighten it significantly, e.g. Tolerance = 1e-8, it may well be that Tolerance = 1e-9 is too tight and the simulation fails. Happened to me many times with numerically challenging thermo-fluid models. Per se, I don't think that the reference should necessarily be generated with a tighter tolerance than the verification simulation, if the latter is tight enough. Am I missing something? That was the reason for specifying the tolerance for reference generation directly, instead of doing it with some factor. At the end of the day, I guess it doesn't matter much and it's mostly a matter of taste; in fact GUIs could take care of this aspect, like giving the number of intervals, which is then translated into the Interval annotation by computing (StopTime - StartTime)/numberOfIntervals.
This was @maltelenz's requirement. He wants to run one verification simulation with the Tolerance set in the experiment annotation. If the results fall within the 0.2% CSV-compare bounds (which is 95% of the time), the verification is passed. Otherwise, we need to override the default and try a stricter one. Am I missing something?
Yes; if the library developer can't even make the simulation results match a reference generated with a more strict tolerance, I would argue that the reference shouldn't be trusted. In my experience, it doesn't need to be 100 times smaller than the Tolerance used for the actual simulation. I'd be very sceptical about using a […]
That it only looks like a poor workaround for not being able to specify a more relaxed comparison tolerance when one insists on prioritizing simulation speed over result quality. If we avoid introducing […]
For the record, here are some required tolerance changes for Buildings in order to avoid cross-tool verification issues: PR lbl-srg/modelica-buildings#3867
The problem is, we don't only want to see that. We also want to make sure that if you tighten the tolerance, you get closer to the "right" result, regardless of what the user experience is.
I'd say it makes sense to perform both tests, but not to let a test with some […]. Let's assume there is a […]. We should remember the possibility of using […].
Agreed 100%, we're now on the same page with @maltelenz here.
Ditto.
We need to experiment a bit here. Historically, we figured out that 0.002 relative tube tolerance in CSV-compare was good enough to avoid spurious regressions in 95% of the cases, and it was tight enough to guarantee that the solutions look the same to the naked eye. The remaining 5% of spurious regressions were simply not handled properly for cross-tool verification 😃 Once we have a proper two-tier verification procedure in place, we might as well require an even tighter tolerance for the base case, e.g. 1e-4 instead of 2e-3. We could try that out with MSL, and if that doesn't increase the number of "special cases" too much, we could change the default tolerance for CSV-compare accordingly. But I'm not hearing cries out there to get this. What I hear is people crying because of endless fighting with spurious false negatives.
I guess in the TestCase annotation 😃
I'd say at most two. One tight (currently 0.002, maybe less, see above), one more relaxed for user experience (5-10%). Currently these are hard-wired in the CSV-compare tool. The question is whether we want to make them explicit and possibly define default values in the MLS. I'm not sure. We could also say that we always run two tests, one tight and one relaxed. Given the current experience, I'd say that would definitely be overkill in most cases. I'm in favour of running a moderately tight verification with the default experiment annotation, and only in case it fails but passes a sloppier verification, re-try with the same moderately tight verification but tighter tolerances, as specified in the TestCase annotation. But I am ready to change my mind here if there are convincing arguments to do so.
Sure. But this requires a significant amount of extra library management work, which, as I argued here, will likely prevent a wider use of systematic cross-tool verification in practice. I understand that in general adding hard-wired stuff to the specification should be avoided in favour of doing things with the existing language, because then you do what you want, you can change it if you change your mind, and you don't need to ask for permission or get an agreement in a committee. However, adding a few annotations here and there to handle a few corner cases is sooo much more convenient in this case. See it as a form of syntactic sugar 😉
But having just a single experiment setup per model is limiting. Multiple experiment setups within the same model would be an interesting feature, but considering the impact on user interfaces related to experiments, I think it is a big task belonging to a separate discussion that shouldn't block what we need for regression testing and cross-tool comparisons. Once we have that, we can migrate the […]
Language group:
Testing both models and tools. Possible to override experiment settings with TestCase-sub-annotation:
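A hypothetical shape for such an override, reusing the existing TestCase annotation (the nested experiment entry is illustrative only, not agreed syntax):

```modelica
annotation (
  experiment(StopTime = 1.0, Interval = 1e-3, Tolerance = 1e-6),
  TestCase(
    shouldPass = true,
    // Hypothetical override used only when generating and comparing
    // reference results; values not given here fall back to the
    // experiment annotation above:
    experiment(StopTime = 0.1, Interval = 1e-4, Tolerance = 1e-8)));
```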
Note: I realized that a minor item is that any model with TestCase-annotation is not seen as part of the interface and can be updated freely without being seen as a breaking change. That may be:
(I thought that we maybe already had the first rule for Examples in MSL - but I could not find it written anywhere.)
I have another interesting use case to motivate why we may need an extra set of simulation parameters. Consider this test of the Buildings library, Buildings.Fluid.HydronicConfigurations.ActiveNetworks.Examples.InjectionTwoWayVariableReturn.diff.con.T2Sup.T: The test case in question is a day-long simulation (86400 s) with the default Dymola choice of number of intervals, i.e. 500. This means Interval = 180, i.e. three minutes. For all practical purposes, a sampling time of three minutes is good enough for a day-long simulation, since the main interest here is in slow thermal behaviour, and it helps avoid overly large simulation result files. However, as is clear from the above image, this sampling interval is definitely not short enough to represent the signal correctly if the data points are interpolated by piecewise linear interpolation, which is standard practice for verification comparisons. So, the tolerance tube gets an edgy shape (which has nothing to do with the actual system behaviour) and as a consequence the simulation points can fall slightly outside it. Of course one could have much worse cases of aliasing if the frequency of the oscillations is above Shannon's limit. I think this can motivate the need for a shorter Interval for generating reference data than the one set in the experiment annotation. This could also be set (if necessary) in the TestCase annotation.
In fact, the more I think of it, the more I believe that this could explain a lot of false negatives that take place around peaks of some variables that are not sampled fast enough. It is obvious that under-sampling can lead to significant under-estimation of the actual peak value, so that the tubes built around a severely under-sampled simulation will be incorrectly too narrow. As far as I understand, this issue cannot be handled in a satisfactory way by relaxing the tube tolerance. It can be observed by human inspection, and fixed once and for all by declaring an appropriate Interval for reference generation.
I don't fully agree with this. However, to me this indicates that the tube tolerance criterion may not be appropriate if the model has somewhat periodic behavior (as is common) - if we had used some Lp-norm of the deviations instead of the tube tolerance, it seems it would just have worked, without having to modify the Interval. It's not necessarily that those two alternative criteria should be used - it's just that instead of starting to add extra intervals for a lot of models to fit a specific criterion we should also re-evaluate that criterion. (But obviously not block the MSL 4.1.0 release for this.)
If the thing of interest is slow behavior, why then test a variable with fast behavior? |
I can think of several answers:
Two reasons:
The point of my post was simply that we should also take into account Shannon's theorem and aliasing issues when defining the sampling interval of reference trajectories, not just the tolerance of the underlying ODE solver. Building tubes, using Lp-norms, or performing any other kind of analysis on badly undersampled variables is never a good idea, simply because too much undersampling may lose crucial information. E.g., a periodic signal sampled at an integer multiple of its period will look like a constant, as is well known. IMHO this may need specific provisions different from the experiment annotation when generating reference trajectories.
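The textbook aliasing example makes the point concrete (values chosen here just for illustration):

$$x(t) = \sin\left(\frac{2\pi t}{T}\right), \qquad t_k = kT \;\Rightarrow\; x(t_k) = 0 \text{ for all } k,$$

i.e. a sinusoid stored only at multiples of its own period is indistinguishable from a constant in the result file. To resolve a component of period $T$ at all, the communication interval must satisfy Interval < T/2 (the Shannon/Nyquist condition), and in practice it should be considerably smaller when the comparison tube is built by linear interpolation between the stored points.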
Thanks for answering my maybe a bit terse and confrontational question :) These quick oscillations, or sharp transients, are exactly the kind of signals we have the most trouble with when trying to compare our results against the MSL references, so we agree on where the problems are.
For some ideas about periodic and chaotic models see #4477.
Good that you point this out. This is the reason why I added the possibility of specifying the tolerance factor for generating reference data in my first proposal. @AHaumer please provide a precise link to the model in question, so that everyone involved in this discussion can play around with it. I think we have had a lot of discussion about grand principles here but not enough about actual use cases. We should build a small library of publicly available problematic models that we can actually run with multiple tools, and then make sure that whatever we propose can deal with all of them nicely.
Modelica models include an experiment annotation that defines the time span, tolerance and communication interval for a default simulation of the system. Usually, these parameters are set in order to get meaningful results from the point of view of the modeller. Since in many cases the models are affected by significant parametric uncertainty and modelling assumptions/approximations, it typically makes little sense to seek very high precision, say rtol = 1e-8, resulting in longer simulation times, when the results are affected by maybe 1% or more error.

We are currently using these values also to generate reference results and to run simulations whose results are compared to them, both for regression testing and for cross-tool testing. This is unfortunately not a good idea, mainly for two reasons:
For testing, what we need is to select simulation parameters which somehow guarantee that the numerical solution obtained is so close to the exact one that numerical errors cannot lead to false negatives, so that a verification failure really means something has changed with the model or the way it was handled to generate simulation code.
In both cases, what we need is to clearly differentiate between the role of the experiment annotation, which is to produce meaningful results for the end-user of the model, and that of some new annotation, which is meant to produce near-exact results for comparisons.
For case 1., what one typically needs is to provide a tighter tolerance, and possibly a shorter communication interval. How much tighter and shorter depends on the specific case, and possibly also on the set of tools involved in the comparison - there is no one-size-fits-all number.
For case 2., it is necessary to choose a much shorter simulation interval (smaller StopTime), so that the effects of chaotic motion or the accumulated drift of event times don't have enough time to unfurl significantly. Again, how much shorter depends on the participants in the game, and may require some adaptation.
For this purpose, I would propose to introduce a verificationExperiment annotation, with exactly the same arguments as the experiment annotation, to be used for the purpose of generating results for verification. Of course, if some arguments (e.g. StartTime) are not supplied, or if the annotation is outright missing, the corresponding values of the experiment annotation (or their defaults) will be used instead.
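A sketch of how the proposed annotation could look on a model (the fallback behaviour is as described above; all values are purely illustrative):

```modelica
model FragileDemo "Hypothetical numerically delicate example"
  // ... model content ...
  annotation (
    // What an end user should run to get meaningful results:
    experiment(StartTime = 0, StopTime = 100, Interval = 0.1, Tolerance = 1e-6),
    // What tools should use when generating and comparing reference results;
    // missing arguments (here StartTime and Interval) fall back to the
    // experiment annotation or to its defaults:
    verificationExperiment(StopTime = 5, Tolerance = 1e-8));
end FragileDemo;
```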