-
Notifications
You must be signed in to change notification settings - Fork 542
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Tidy up w3srcemd #1010
Tidy up w3srcemd #1010
Conversation
Some code moved around to group together more logically.
Keep as draft at the moment as I have some differences in the OMP/OMPH regtests... |
Sounds good, thanks for the update @ukmo-ccbunney. |
@MatthewMasarik-NOAA would you or @JessicaMeixner-NOAA be able to run a couple of the OMP/OMPH regtests from this PR to confirm that you are also seeing differences? Not the whole regtest suite - just a couple from I am slightly worried that this might be one of those non b4b issue that occurs just because I've moved some variables around (I've seen this before). I suspect there is an uninitialized variable somewhere in WW3 that is causing this. Thanks! |
@ukmo-ccbunney -- I wouldn't be surprised either about the uninitialized variables. This clean-up is really nice. I will run the regtests - it's not a big deal to just run them all but I will try to run a few ts3 w/omp/omph first to see if we can get you a faster answer. |
Thanks @JessicaMeixner-NOAA |
I think I have found the potential issue - a missing line (my fault then) - the commit above reinstates it. Just re-testing now. Sorry... |
No worries - I'll stop my tests and pull in these changes and re-test. |
The regtests are all looking good for me now (other than the usual P.S. I have realised that I have some problems with two of the
The error message looks something like this from ww3_multi:
Since this is happening in develop too, I don't think this affects this PR and I will investigate separately. |
@ukmo-ccbunney the regtests ran yesterday and I have the compare script submitted now. The code itself is a huge improvement! Thank you so much for this clean-up effort! I'll let you know about the regtests here shortly. |
Hi @ukmo-ccbunney I got a few extra unexpected diffs for some cases:
Full list:
I'm going to re-run our develop branch twice and see if these some how got in there with out us noticing. |
Hmm. I didn't see those in my tests. What files were different? |
All in the output of HS it seems and very small diffs: output/ww3_tp2.21/work_b_metis/ww3.201809_hs.nc_diff.txt :
output/ww3_tp2.21/work_b/ww3.201809_hs.nc_diff.txt
output/ww3_tp2.21/work_mb/ww3.201809_hs.nc_diff.txt
I did double check the develop and I don't get these diffs there. I'll take a closer look at the code and see if something jumps out at me. |
@JessicaMeixner-NOAA It seems to be just the PDLIB variants of ww3_tp2.21 that have differences for you. I'll keep looking. |
@JessicaMeixner-NOAA Following up from my previous comment, I think I have found a bug in the WW3/model/src/w3profsmd_pdlib.F90 Lines 3645 to 3646 in a09335e
WW3/model/src/w3profsmd_pdlib.F90 Line 3627 in a09335e
but Lines 2029 to 2030 in a09335e
It could be unrelated to the B4B issues you saw above, but equally could be clobbering some memory?? P.S. I am not sure why
I will raise a separate issue for this. |
@ukmo-ccbunney very interesting find! I'll try seeing if fixing this will resolve the issues I'm seeing. I'll keep digging on my end as well, this is great clean-up and given that you aren't seeing this, I suspect this is one of the many weird memory issues or uninitalized variables causing problems. Thinking outloud: I wonder if trying to run this case in debug mode if it'd help find anything? |
I put a local fix in for it, but the fix itself does cause some small differences compared to develop (likely because it is no longer overwriting memory?) |
@ukmo-ccbunney i was actually afraid of it changing answers, so I actually have only started the regtets w/the change in develop and then will run your branch plus that change. |
Okay @ukmo-ccbunney - back to testing this PR to see if I can see why we're getting diffs on our machines. I'll let you know how it goes. |
@ukmo-ccbunney would you mind merging in develop when you get a chance. The typo fixes from @MatthewMasarik-NOAA that I just merged in causes some very minor conflicts. |
OK - have merged in develop. |
Okay - I'm still getting diffs in the PDLIB casees:
I ran those with debug flags and am looking into see if that can help. However, I only got this warning message:
This was with intel not gnu, so maybe I'll also try that when I get a chance. I don't think this is the root cause of the diffs we're seeing though. @aronroland also mentioned that he saw this and would have a look to see if he could help as well. |
Thanks for running the tests again @JessicaMeixner-NOAA I'll get back to you. |
@ukmo-ccbunney I'm torn between there might be something in w3srce and this might be an uninitialized variable somewhere else in the code that's just popping up here as we've seen so many times. I have quite a bit going on this week, but will keep trying to run different things. I was going to run on another machine here but of course today is maintenance, so tomorrow! I'll keep you posted as I find things. |
@ukmo-ccbunney I can just say that I looked carefully at your changes and unfortunately I cannot see anything that rings the bell. I hope you can find this problem soon since this work of @ukmo-ccbunney is really a decent piece of work. |
Thank you @aronroland for taking the time to look at this - much appreciated! |
@JessicaMeixner-NOAA
I dont see these differences when running on our other HPC. This makes me think there might be some other issue that is causing the b4b issues that we have not found yet... This is the case for the HEAD of |
@ukmo-ccbunney Thank you for this update! Our other machine is back today so I'll try that as well (including running just the develop branch twice) + I'm also trying some gnu compilers + debug options to see what I can try to find. Hopefully we can find whatever this. |
Just to add a bit more to this: I have run the same ww3_tp2.21/work_b_metis regtest on our new HPC with If I do this , then two consecutive runs are identical. This makes me thing that there is definitely an unitinialised variable lurking around somewhere. Interestingly, its only on our new HPC that i see these differences, but it does have a much newer version of GFortran... |
I would not be surprised to see an uninitialized array. It's our most common issue when we have problems like this. We typically just run with intel, so I'm trying to set-up some gnu runs and haven't had full success yet. I know @mickaelaccensi runs with GNU, so maybe he could run some tests to see if he has reproduciblity with the develop branch? With intel on our other machine, I got that the develop branch as well was reproducing itself. About to run the compare with this PR (and this PR against itself) to see if that shows anything. |
So the other computer found a difference for: Hopefully fixing some of the initialized variables that have already been found will help. |
@ukmo-ccbunney I just wanted to let you know that I haven't forgotten about this. I tried initializing the uninitialized variables mentioned in #1017 but it didn't help. I'm trying more things to see what I can figure out. @mickaelaccensi any chance you've had a chance to run regression tests on this PR? |
I note that this is another PDLIB regtest. It could be that the issue lies in the SCOTCH (or ParMETIS) library. When we compile with debugging flags for the WW3 regtest, this won't enable any debugging on the linked libraries as these are already compiled... |
@JessicaMeixner-NOAA If you run two consecutive runs of However, if I run the same regtest, but using SCOTCH (rather than ParMETIS), then they are self consistent. Makes me a bit suspicious of the ParMETIS library! |
I do get consistency between two runs, but if we can get past this by simply switching things from metis->scotch, that would be GREAT news. Let me see what I can't figure out. |
@ukmo-ccbunney so switching to SCOTCH did not solve the differences for me. I don''t really think these differences are related to your PR and maybe we just ask all community code managers to review and @aronroland since this is a change in PDLIB things and if everyone agrees we merge this in? I'd prefer to figure out what is going on first especially given where the changes are, but this is a tricky issue. If we did that I'd also want to run our coupled system here just so I know what the effects are there too. Thoughts? We need to make some sort of an issue here though, because if you can't even get self-consistent answers with the PDLIB, then that's an issue that is concerning. |
@ukmo-ccbunney when you get a chance can you resolve the merge conflict with this PR. I'm not sure why there is one, but there is a simple one to resolve. |
@ukmo-ccbunney The regtests now give me the expected diffs. I believe you do not work Mondays, so whenever you get a chance if you'll resolve the conflict from merging in develop, I'll confirm it's the way I merged it in (I doubt it would be different, if it is I'll just re-run the tests, no worries), and then we should be able to get this PR merged in! |
Thanks @JessicaMeixner-NOAA |
@ukmo-ccbunney your merge was a little different from mine, so I'm doing a sanity check set of regtests. Don't expect any issues though and anticipate this being merged by the end of the day. |
matrixCompFull.txt Diffs:
|
@ukmo-ccbunney thank you for this awesome clean up and all the help finding the bugs hiding that this update uncovered! |
* Bugfix - initialised VD and VS to zero in w3srcemd. (NOAA-EMC#1037) * More efficient test for binary files in matrix.comp (NOAA-EMC#1035) * Tidy up of pre-processor directives and unused variables in w3srcemd.F90 (NOAA-EMC#1010) * Correct typo in w3srcemd.F90 pre-processor directive. (NOAA-EMC#1039) * minor bugfix for matrix grepping on keywords (NOAA-EMC#1049) * Stop masking group 1 output where icec > icen (NOAA-EMC#1019) * Doxygen documentation added, 8th subset.(NOAA-EMC#1046) * NC4 ,F90 ,XX0 switches removed from ww3_tp2.19 regtest (NOAA-EMC#1054) * CI: Fix for Intel scripts. GNU scripts updated. (NOAA-EMC#1064) * correct the computation of QP parameter, add QKK output parameter, change UST scale factor (NOAA-EMC#1050) * correct issue with ww3_multi when requesting restart2 and using nml file instead of inp file (NOAA-EMC#1070) * correct calendar for track netcdf output (NOAA-EMC#1079) * Fix missing mod_def.ww3 file in multigrid regression tests for track output (NOAA-EMC#1091) * STAB3: fix cmake build for ST4 or ST3 (NOAA-EMC#1086) * new feature to output out_grd.ww3, out_pnt.ww3 and mod_def.ww3 both in binary and ascii format using switch ASCII. (NOAA-EMC#1089) * Update local unit number arrays (NDS, MDS) to be same size of array defined in w3odatmd (size=15). Also, defined unit numbers for NDS(14) and NDS(15). (NOAA-EMC#1098) * Removed code referencing PHIOC in output section for PHICE in ww3_ounf (NOAA-EMC#1093) * implementation of the GQM (Gaussian Quadrature Method) to replace the DIA in NL1 or NL2. (NOAA-EMC#1083) * update logic to ensure you are not accessing uninitialized dates (NOAA-EMC#1114) * Initialised S and D arrays in W3SDB1 before potential early return if zero energy. (NOAA-EMC#1115) * ww3_ounp.F90: x/y units attribute corrected from 'm' to 'km' (NOAA-EMC#1088) * Bugfix: Assign unit numbers to ASCII gridded/point output in multi-grid mode. (NOAA-EMC#1118) * correct bugs to run correctly GQM implementation (NOAA-EMC#1127) * Adding documentation to w3iopo() in preparation for code for NOAA-EMC#682. (NOAA-EMC#1131) * NCEP regtest module updates: uses spack-stack/1.5.0, includes scotch/7.0.4 (NOAA-EMC#1137) * Minor update to ncep regtests (NOAA-EMC#1138) * Updated intel workflow to install oneapi compilers from new location. (NOAA-EMC#1157) * Add unit test for points I/O code. (NOAA-EMC#1158) * Update Intel CI (relocate /usr/local; ensure intel-oneapi-mpi; use ubuntu-latest) (NOAA-EMC#1161) * remove lookup table for ST4 to speed up computation and clean up the ST4 code (NOAA-EMC#1124) Co-authored-by: Fabrice Ardhuin <[email protected]> * initialize USSP_WN for mod_def (NOAA-EMC#1165) * Introduce IC4M8 and IC4M9 to WW3 (NOAA-EMC#1176) * clean up and add ST4 variables (NOAA-EMC#1181) * w3fld1md.F90: fix divide by zero in CRIT2 parameter (NOAA-EMC#1184) * ww3_prnc.F90: fix out-of-scope grid index write statement (NOAA-EMC#1185) * Bugfix: address potential divide-by-zero in APPENDTAIL (NOAA-EMC#1188) Co-authored-by: Denise Worthen <[email protected]> * Provide initial drying of cells with depth < ZLIM for SMC grid. (NOAA-EMC#1192) * Output OMP threading info to screen when running ww3_shel/ww3_multi compiled with the OMPG switch. Also fixes truncation of build.log when running run_cmake_build. (NOAA-EMC#1191) * Added screen output showing number of threads when OMP enabled. * update build to get more info in logs (NOAA-EMC#46) --------- Co-authored-by: Jessica Meixner <[email protected]> * update run_cmake_test to catch build errors and exit (NOAA-EMC#1194) * fix merge conflicts * Fix gustiness bug, as suggst by Pieter * Change USTARsigma to WAM implementation --------- Co-authored-by: Chris Bunney <[email protected]> Co-authored-by: Mickael Accensi <[email protected]> Co-authored-by: Benoit Pouliot <[email protected]> Co-authored-by: Matthew Masarik <[email protected]> Co-authored-by: Ghazal-Mohammadpour <[email protected]> Co-authored-by: Jessica Meixner <[email protected]> Co-authored-by: Biao Zhao <[email protected]> Co-authored-by: Edward Hartnett <[email protected]> Co-authored-by: Alex Richert <[email protected]> Co-authored-by: Fabrice Ardhuin <[email protected]> Co-authored-by: W. Erick Rogers <[email protected]> Co-authored-by: Denise Worthen <[email protected]> Co-authored-by: Camille Teicheira <[email protected]>
Pull Request Summary
Tidy up
#ifdef
preprocessor lines in w3srcemd.Also remove a bunch of unused variables.
Description
Tidying up preprocessor directives after the conversion from old format
!/XXX
switches to preprocessor#ifdef
directives. This includes:Merging duplicate directives
Example:
becomes:
Merging lines with common code
Example:
becomes
Grouping nested directives onto single line, where applicable:
Example:
becomes:
Other changes
USE
statements or variable declarations that occur in an optional#ifdef
block have been moved after the non-optional entries.No change to the outputs or regression tests are expected
Issue(s) addressed
Commit Message
Tidy up of pre-processor directives and unused variables in
w3srcemd.F90
Check list
Testing
Just the usual non-b4b regtests: