Remove unnecessary SIMD instruction sets for Jet, first round of cleanup in rt.conf, initialize cld_amt to zero for regional runs (dycore) #353

@climbfuji (Collaborator) commented Dec 31, 2020

Description

1. Remove unnecessary SIMD instruction sets for Jet

When the option SIMDMULTIARCH is used at compile time (currently only on Jet), code for four SIMD instruction sets is compiled into the executable. The CMake configuration file cmake/Intel.cmake defines the corresponding flags. By reducing the instruction sets from

-axSSE4.2,AVX,CORE-AVX2,CORE-AVX512

to

-axSSE4.2,CORE-AVX2

we reduce the compile time by about 50%. Furthermore, nobody has evaluated the results (in terms of performance and accuracy) when CORE-AVX512 is used. With the two flags -axSSE4.2,CORE-AVX2, we can run the same high-performance code on the newer Jet partitions that we run on other Intel systems (NOAA RDHPC, NCAR, ...), and we keep a fallback for the older Jet platforms that do not support the AVX2 instruction set.
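To make the fallback argument concrete, here is a simplified Python model of how an executable built with -ax picks a code path at run time. It is an illustration under stated assumptions, not a description of the Intel compiler's actual dispatcher: the binary is assumed to carry a generic baseline path plus a specialized path for each listed target, and the most capable path the CPU supports is used. The names ISA_LADDER, parse_ax_flag, and select_code_path are hypothetical.

```python
# Simplified, illustrative model of Intel "-ax" multi-code-path dispatch.
# Assumption: the ISA levels form a strict ladder in which each level implies
# support for all levels below it (real CPUs report individual feature bits).
ISA_LADDER = ["SSE4.2", "AVX", "CORE-AVX2", "CORE-AVX512"]


def parse_ax_flag(flag: str) -> list[str]:
    """Split a flag such as '-axSSE4.2,CORE-AVX2' into its list of targets."""
    if not flag.startswith("-ax"):
        raise ValueError(f"not an -ax flag: {flag!r}")
    return [t for t in flag[len("-ax"):].split(",") if t]


def select_code_path(cpu_max_isa: str, ax_targets: list[str]) -> str:
    """Return the most capable compiled path this CPU can run, else 'baseline'."""
    cpu_level = ISA_LADDER.index(cpu_max_isa)
    usable = [t for t in ax_targets if ISA_LADDER.index(t) <= cpu_level]
    return max(usable, key=ISA_LADDER.index) if usable else "baseline"


old = parse_ax_flag("-axSSE4.2,AVX,CORE-AVX2,CORE-AVX512")
new = parse_ax_flag("-axSSE4.2,CORE-AVX2")

for cpu in ISA_LADDER:
    print(f"CPU up to {cpu:12s} old: {select_code_path(cpu, old):12s} new: {select_code_path(cpu, new)}")
```

Under this model, a CPU with AVX-512 runs the CORE-AVX2 path with the new flags, and a CPU without AVX2 falls back to the SSE4.2 path; generating two specialized paths instead of four is presumably where the compile-time saving comes from.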

2. First round of cleanup in rt.conf

See #352 for a detailed description. This work prepares for, and is meant to facilitate, the upcoming overhaul of the global model regression tests.

Also included:

Issue(s) addressed

Testing

Regression tests will be run on all tier-1 platforms. For systems on which the baseline is not expected to change (see below), the existing baseline will be copied to the new date tag and used for verification. For systems on which the baseline is expected to change, a new baseline will be created and the tests will be verified against it.
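Checking that a copied baseline matches the original amounts to a recursive directory comparison, which the commit log further down performs with `diff -r develop-20201214 develop-20201215`. A rough Python equivalent is sketched here; trees_identical is a hypothetical helper, and the file layout inside the develop-YYYYMMDD directories in the usage example is a placeholder.

```python
import filecmp
import shutil
import tempfile
from pathlib import Path


def trees_identical(a: Path, b: Path) -> bool:
    """Return True if two directory trees hold the same names and byte-identical files.

    Roughly what `diff -r a b` reports as "no differences" (symlinks are followed).
    """
    cmp = filecmp.dircmp(a, b, ignore=[])  # ignore=[]: don't skip .git, CVS, etc.
    if cmp.left_only or cmp.right_only or cmp.common_funny:
        return False
    # dircmp compares common files shallowly by default (matching os.stat()
    # signatures count as equal), so compare file contents explicitly.
    _, mismatch, errors = filecmp.cmpfiles(a, b, cmp.common_files, shallow=False)
    if mismatch or errors:
        return False
    return all(trees_identical(a / d, b / d) for d in cmp.common_dirs)


# Usage: copy an existing baseline to a new date tag and confirm the copy.
with tempfile.TemporaryDirectory() as tmp:
    old = Path(tmp) / "develop-20201214" / "fv3_control"  # placeholder layout
    old.mkdir(parents=True)
    (old / "atmf000.nc").write_bytes(b"placeholder")
    new_root = Path(tmp) / "develop-20201215"
    shutil.copytree(old.parent, new_root)
    print(trees_identical(old.parent, new_root))  # True
```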

No changes are expected on the following systems:

  • cheyenne.gnu (confirmed by running rt.sh against existing baseline on 12/31/2020)
  • hera.intel (confirmed by running rt.sh against existing baseline on 12/31/2020)
  • hera.gnu (confirmed by running rt.sh against existing baseline on 12/31/2020)

Changes are expected on the following systems (because additional tests are run and, on Jet only, because the compiler flags have changed):

  • jet.intel (new baseline created and verified against successfully, confirmed 1/4/2021)
  • gaea.intel (new baseline created and verified against successfully, confirmed 12/31/2020)
  • cheyenne.intel (new baseline created and verified against successfully, confirmed 12/31/2020)
  • orion.intel (new baseline created and verified against successfully, confirmed 1/4/2021)
  • wcoss_cray (no access)
  • wcoss_dell_p3 (no access)

Final regression testing on 01/06/2021: all tests passed, logs updated in the PR.

Dependencies

NCAR/ccpp-physics#539
NOAA-EMC/GFDL_atmos_cubed_sphere#49
NOAA-EMC/fv3atm#220
#353

@junwang-noaa (Collaborator)

Can we have a detailed plan of what the future rt.conf will look like? I thought we were going to reduce the total number of rt*.conf files, but now we have rt_gnu.conf, rt_acorn.conf, and rt_ccpp_dev.conf. Also, why do we remove the "-f" option, which may allow us to integrate rt_35d.conf into rt.conf? Also, we worked with other research collaborators to port the code to stampede; until we have a formal plan for supporting a wide range of developers, including people using stampede, please do not remove the file. I think we need to write a document for rt.conf development.

@climbfuji (Collaborator, Author) commented Jan 4, 2021

Can we have a detailed plan of what the future rt.conf will look like? I thought we were going to reduce the total number of rt*.conf files, but now we have rt_gnu.conf, rt_acorn.conf, and rt_ccpp_dev.conf.

That's why I started reducing the number by removing rt_stampede.conf, which, a few lines below, you ask me to put back. Keeping this file when it is not tested doesn't make sense. If needed, we can simply extract the two most standard tests from rt.conf once we identify a team that maintains and runs rt.sh on stampede, and if the full rt.conf doesn't run there for some reason.

EDIT: For the same reason, in my opinion we should delete rt_acorn.conf. rt_gnu.conf is needed because the differences between the Intel and GNU regression test sets are still too big to combine them in one file, unless we add another column that specifies the compiler(s) in the same way the machines can be specified.

rt_ccpp_dev.conf is required and maintained by DTC+GSL, but if that one file bothers some people, we can consider whether it makes sense to maintain it only in their fork. I am not sure it does, though, because the file shows users of the develop branch what kinds of tests are run routinely by the CCPP and RAP/HRRR physics developers.

Also, why do we remove the "-f" option, which may allow us to integrate rt_35d.conf into rt.conf?

Because @DusanJovic-NOAA asked me to do so, and because it does not seem to be needed. The option -l xyz.conf exists, and without -l, rt.sh uses rt.conf automatically. What does -f do on top of that?
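The behavior described here (an explicit test list via -l, otherwise rt.conf) can be sketched with Python's argparse. This is a hypothetical analogue for illustration only: rt.sh itself is a shell script, its real option handling may differ, and select_test_config is an invented name.

```python
import argparse


def select_test_config(argv: list[str]) -> str:
    """Return the test-list file: the value given with -l, otherwise rt.conf."""
    parser = argparse.ArgumentParser(
        prog="rt_sketch", description="Illustrative analogue of rt.sh test-list selection"
    )
    # -l FILE: run the tests listed in FILE instead of the default rt.conf
    parser.add_argument("-l", dest="test_list", metavar="FILE", default="rt.conf")
    return parser.parse_args(argv).test_list


print(select_test_config([]))                     # rt.conf
print(select_test_config(["-l", "rt_gnu.conf"]))  # rt_gnu.conf
```

With a default already attached to -l, a separate option that only names the config file has nothing left to do, which is the point being made above.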

Also, we worked with other research collaborators to port the code to stampede; until we have a formal plan for supporting a wide range of developers, including people using stampede, please do not remove the file.

See above.

I think we need to write a document for the rt.conf development.

Yes.

@junwang-noaa (Collaborator) commented Jan 4, 2021 via email

@climbfuji (Collaborator, Author)

Let's discuss this at the UFS infrastructure development meeting; we do have tier-2 and tier-3 platforms. We may not have enough resources to test on certain platforms, but that does not mean "leaving this file there when it is not tested"; I do think the corresponding parties can test them. In general, we need to provide a way to allow other collaborators to run UFS. Again, I just don't know where the code changes are leading us; we need a development plan for it.

Based on today's discussion, it seems appropriate to remove rt_stampede.conf, since rt.sh is only run on tier-1 platforms (which stampede is not). For tier-2 platforms, we maintain the module files and (time and access permitting) try to make sure that the model compiles, but we do not run the regression tests there.

We will retain rt_acorn.conf as an exception until that system has been adopted by NOAA, at which point it will use rt.conf like all other tier-1 platforms. We should also keep rt_ccpp_dev.conf, and we must keep rt_gnu.conf unless we can and want to run all tests in rt.conf with GNU (several tests take much longer with GNU).

We did not discuss the -f flag. Is there any reason why this flag is needed? I can't see one, but I may be missing something.

@climbfuji changed the title from "Remove unnecessary SIMD instruction sets for Jet, first round of cleanup in rt.conf" to "Remove unnecessary SIMD instruction sets for Jet, first round of cleanup in rt.conf, initialize cld_amt to zero for regional runs (dycore)" on Jan 4, 2021
@climbfuji marked this pull request as ready for review on January 4, 2021 21:44
@climbfuji added the labels "Baseline Updates" (Current baselines will be updated.) and "Waiting for Reviews" (The PR is waiting for reviews from associated component PRs.) on Jan 4, 2021
@climbfuji (Collaborator, Author)

@junwang-noaa @DusanJovic-NOAA please take a look at the modified rt.conf. Once Denise's current PR is merged, I will update this one and then we are ready to create new baselines. If the rt.conf changes are agreed upon by then, we don't have to do any more back and forth. Thanks!

@climbfuji (Collaborator, Author)

This PR is ready for review. The new regression test baseline date tag is 20210106.

Review threads on tests/rt_acorn.conf and tests/rt.conf were marked outdated and resolved.
@climbfuji force-pushed the simd_update_and_rt_cleanup_20201231 branch from 0ecfc31 to 00d34e1 on January 6, 2021 16:32
@junwang-noaa (Collaborator) commented Jan 6, 2021 via email

@junwang-noaa (Collaborator) commented Jan 6, 2021 via email

@junwang-noaa (Collaborator)

@climbfuji would you please give a short summary of what additional tests have been added on orion, jet, cheyenne, and wcoss?

@climbfuji (Collaborator, Author)

It's best to look at https://docs.google.com/spreadsheets/d/1tf7ufYW2umLXQQ2G43h64ESGw5jYb67MAqEbvGPNAm4/edit?ts=5feca219#gid=1397536520 and the updated rt.conf. The following tests have changed:

  • fv3_ccpp_ca, fv3_ccpp_lndp, fv3_ccpp_iau, fv3_ccpp_lheatstrg, fv3_ccpp_satmedmfq, fv3_ccpp_gfdlmprad_32bit_post now run on all platforms
  • fv3_ccpp_rap and fv3_ccpp_hrrr now run on all platforms except jet (known issue, which will be fixed when the next round of updates from gsl/develop is merged into the authoritative repositories)
  • all v15p2 and v16beta tests now run on all platforms (in both DEBUG and PROD mode)
  • fv3_ccpp_gfsv16_csawmg and fv3_ccpp_gfsv16_csawmgt are not run on cheyenne (they crash with a bus error, as before), but they now run on gaea and jet

That's it.
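Platform restrictions such as "all platforms except jet" are expressed in the MACHINES column of rt.conf. Per the format described in the "Update the format of rt.conf (ufs-community#349)" entry of the commit log below, the column is either empty (run on all supported machines) or starts with a - or + sign to exclude or include machines explicitly. A minimal Python sketch of that rule follows; it assumes whitespace-separated machine names after the sign, and runs_on is a hypothetical helper, not rt.sh's actual parser.

```python
def runs_on(machines_field: str, machine: str) -> bool:
    """Return True if a test with this MACHINES column text should run on `machine`.

    Assumed format (sketch only): an empty field means all supported machines,
    '- a b' excludes machines a and b, and '+ a b' runs only on a and b.
    Machine names are assumed to be whitespace-separated after the sign.
    """
    field = machines_field.strip()
    if not field:
        return True
    sign, names = field[0], set(field[1:].split())
    if sign == "-":
        return machine not in names
    if sign == "+":
        return machine in names
    raise ValueError(f"MACHINES must be empty or start with '-' or '+': {machines_field!r}")


# Hypothetical column values mirroring the changes listed above.
print(runs_on("", "cheyenne.intel"))              # True: runs on all platforms
print(runs_on("- jet.intel", "jet.intel"))        # False: e.g. fv3_ccpp_rap skips jet
print(runs_on("- cheyenne.intel", "gaea.intel"))  # True: csawmg tests now run on gaea
```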

@climbfuji (Collaborator, Author)

@DusanJovic-NOAA I think you can start creating baselines on wcoss and orion (if possible). I am cruising along on jet, gaea, cheyenne, hera.

@climbfuji (Collaborator, Author)

After the PR is merged, I will update the spreadsheet that Dusan put together.

@climbfuji added the label "Ready for Commit Queue" (The PR is ready for the Commit Queue. All checkboxes in PR template have been checked.) on Jan 6, 2021
@climbfuji (Collaborator, Author)

@junwang-noaa @DusanJovic-NOAA This PR is ready to merge. The CI tests just kicked off; they will easily be done by tomorrow morning. Thanks @DusanJovic-NOAA for your help with running the regression tests today.

module load hdf5/1.10.6
module load netcdf/4.7.4
module load pio/2.5.1
module load esmf/8_1_0_beta_snapshot_27
Collaborator

Is there no esmf debug module on jet?

@climbfuji (Collaborator, Author)

For all platforms where we haven't compiled the debug module yet, we simply had to copy the existing module so that the code would still run. This is because somebody removed the logic that says "only if a debug module exists, use it; otherwise use the standard module" (or that logic is not working as intended).

Yes, when the next HPC stack release is rolled out, we should create debug modules for all platforms.
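The fallback rule quoted above ("only if a debug module exists, use it; otherwise use the standard module") could look like the following Python sketch. It assumes the modulefiles/<machine>/fv3 and modulefiles/<machine>/fv3_debug layout referenced in the commit log below; choose_modulefile is a hypothetical helper, not the logic that was removed from the repository.

```python
import tempfile
from pathlib import Path


def choose_modulefile(modulefiles_dir: Path, machine: str, debug: bool) -> Path:
    """Pick <machine>/fv3_debug for DEBUG builds if it exists, else <machine>/fv3."""
    machine_dir = modulefiles_dir / machine
    if debug:
        debug_module = machine_dir / "fv3_debug"
        if debug_module.is_file():
            return debug_module
    # Standard module: used for PROD builds and as the fallback for DEBUG builds.
    return machine_dir / "fv3"


# Usage against a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "jet.intel").mkdir()
    (root / "jet.intel" / "fv3").write_text("")
    print(choose_modulefile(root, "jet.intel", debug=True).name)  # fv3 (fallback)
    (root / "jet.intel" / "fv3_debug").write_text("")
    print(choose_modulefile(root, "jet.intel", debug=True).name)  # fv3_debug
```

With such a fallback in place, copying the standard module to fv3_debug on platforms without a debug build would no longer be necessary.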

@climbfuji (Collaborator, Author)

FYI, it's the same for wcoss_cray, for example.

@climbfuji (Collaborator, Author)

CI tests passed; @DeniseWorthen, once you approve, we are ready to merge. Thanks, everyone, for your reviews.

@DusanJovic-NOAA merged commit 5adf1d8 into ufs-community:develop on Jan 7, 2021
DomHeinzeller pushed a commit to NOAA-GSL/ufs-weather-model that referenced this pull request Feb 23, 2021
* Updates to stochastic_physics_wrapper (ufs-community#280)

Fix to stochastic_physics_wrapper to allow random patterns to update at a longer time step than the model

Co-authored-by: Dom Heinzeller <[email protected]>

* Update for Jet, bug fixes in running with frac_grid=T and GFDL MP, and in restarting with frac_grid=T  (ufs-community#304)

Update the modulefile for jet.intel to enable UPP v10.0.0. The hpc-stack v1.0.0 pre-release is used for this. Small changes are made to tests/rt.sh for jet.intel and gaea.intel (consistency with other platforms).

The submodule pointer update for fv3atm addresses bugs in the ufs-weather-model with frac_grid=T and GFDL microphysics, and with restarting the model when frac_grid=T (from @shansun6 and @SMoorthi-emc).

* Feature/update mom6 and retain b4b results for 025x025 resolution (ufs-community#290)

* point MOM6 to a new branch corresponding to the GFDL 20201022 commit
* modify fms_files.cmake and mom6_files.cmake to reflect changes in the MOM6 code, as this version of MOM6 deletes, adds, and renames several files
* manually set MOM6 parameters in order to retain the original results for 0.25x0.25 resolution
* update MOM6 to include a bugfix so that mom6solo can be built
* modify compile.sh to allow compiling mom6solo
* modify MOM_input_template for all resolutions based on the GFDL MOM6-example main branch update of 20201022
* change executable permissions for CMakeLists.txt
* chmod 644 to 6 files Dom pointed out
* chmod for CMakeLists.txt and tests/compile.sh
* change baseline directory to 20201202 in rt.sh

* Update CICE, Move regression test input outside baseline directory (ufs-community#270)

* Updates CICE to most recent develop branch of NOAA-EMC
* Sets n_aero (number of aerosols) in ice_in_template to 0.
* removes trailing whitespace from ice_in
* moves regression test input outside baseline directory (ufs-weather PR ufs-community#312)

Co-authored-by: Dusan Jovic <[email protected]>
Co-authored-by: Dom Heinzeller <[email protected]>

* Updates to build for JEDI linking/control, add wcoss2 (ufs-community#295)

* Build on wcoss2 (acorn)
* Use -march=core-avx2 instead of -xCORE-AVX2 on wcoss2
* Updates to build for JEDI linking/control
* Removed unnecessary include files and INLINE POST setting
* Updated to address PR suggestions.
* Add rt_acorn.conf. Change /lfs/h2 to /lfs/h1.
* Update .gitmodules and submodule pointer for fv3atm for code review and testing
* regression test results
* Updated .gitmodules and removed extraneous file
* Fixed .gitmodules and updated pointer for FV3
* Updated pointer to NEMS repo
Co-authored-by: Dusan Jovic <[email protected]>
Co-authored-by: Dom Heinzeller <[email protected]>

* Final-final GFS v16 updates / restart reproducibility bugfixes (ufs-community#325)

* Update .gitmodules and submodule pointer for fv3atm for code review and testing
* Add GFS v16 beta restart test, update stochastics test
* Update regression test baseline date tag to 20201214; skip-ci
* tests/rt.conf: bugfix, add missing 'fv3' to new stochy tests; skip-ci
* Regression test logs for gaea.intel, hera.gnu, hera.intel, jet.intel, orion.intel; skip-ci
* Run GFS v16beta tests also on wcoss; regression test logs for wcoss; skip-ci
* Regression test logs for cheyenne.intel and cheyenne.gnu
* Revert change to .gitmodules and update submodule pointer for fv3atm

* Add optional bulk flux calculation in ufs-datm (ufs-community#266)


* Update NEMS DATM and CMEPS to allow the optional bulk flux formulation; add two tests using the option
* Update top-level CMakeLists.txt to have identical compile flags for MOM6 and CICE6 for ufs-cpld and ufs-datm
* Add optional configuration variable to nems.configure to specify the directory where CMEPS will write restarts
* Adds cheyenne tasking variables to default_vars and sets WW3_COMP to cheyenne for platform cheyenne.intel

NOTE: Baselines develop-20201215 exist on all platforms; regression tests were run against exactly that baseline on all systems except cheyenne.intel. On cheyenne.intel the tests were run against develop-20201214, and this baseline is identical to develop-20201215 (as per "diff -r develop-20201214 develop-20201215").

Co-authors:
@DusanJovic-NOAA
@aerorahul
@JessicaMeixner-NOAA

skip-ci

* Add 2 new tests for DATM-MOM6-CICE6 application (ufs-community#332)

* Add the following 2 tests: datm_restart_cfsr, datm_debug_cfsr
* Add wcoss_dell_p3.log.
* Add Hera log, Orion log, wcoss_dell_p3 log.

* RRTMGP and Thompson MP coupling (ufs-community#323)

* Feature branch with RRTMGP and Thompson MP
* Updated FV3/ccpp-physics. Added namelist and configuration for RRTMGP RTs using GSD physics.
* Updated FV3
* Update physics in FV3
* Updated baselines in rt.sh
* Updated RT logs. Updated FV3 physics submodule pointer.
* Updated FV3 hash and .gitmodules

* Regression test log for PR ufs-community#323 for jet.intel (ufs-community#336)

* Update modules with hpc-stack v1.1.0 (ufs-community#319)

* Update modules with hpc-stack v1.1.0
* Minor bug fixes to CCPP UGWP

Co-authored-by: Dom Heinzeller <[email protected]>

* Replace old regional SDF with FV3_GFS_v15_thompson_mynn (ufs-community#333)

* Replace old FV3_GFS_2017_gfdlmp_regional SDF for regional tests with FV3_GFS_v15_thompson_mynn.
* Final path to ICs and new results. Also, input.nml updated.
* Update RegressionTests_wcoss_dell_p3.log
* Update RegressionTests_wcoss_cray.log
* Update RegressionTests_hera.intel.log
* Update RegressionTests_jet.intel.log
* Update RegressionTests_orion.intel.log
* Update RegressionTests_cheyenne* logs.
* Update RegressionTests_hera.gnu.log

* Feature/ww3update (ufs-community#334)

This updates the WW3 submodule pointer to point to the top of the WW3 develop branch.
The path to WW3 inputs is changed to input-data-20201201/WW3_input_data_20201207/

* Remove IPD (step 1) (ufs-community#331)

Make CCPP=Y the default in tests/compile.sh. Remove CCPP=Y from tests/rt*.conf and adjust formatting.
Update submodule pointer for MOM6 to include PR ufs-community#341 ("Update MOM6 to GFDL's 20201218 commit")
Add modulefiles/wcoss_cray/fv3_debug (identical to modulefiles/wcoss_cray/fv3)
Fix broken utest (see ufs-community#348)

* Update the format of rt.conf (ufs-community#349)

Update the format of the MACHINES column in rt.conf (and other .conf files). This column can be either empty, which means a test will run on all supported machines, or start with a - or + sign to exclude or include the specified machines explicitly.

* Add checkpoint restarts for ufs-cpld (ufs-community#342)


* Adds 3 checkpoint restart tests for the ufs-cpld model
* Drops the existing c92mx025 restart test
* Adds cheyenne.intel as tested configuration for ufs-cpld and ufs-datm
* Fixes instances of srf_data* in various fv3_conf files

* add frac grid input, update and add additional cpld tests (ufs-community#354)


* Updates FV3_input_frac to add both benchmark dates and L127 files
* Adds additional tests and restart tests for coupled model
* Sets all cpld tests to use frac grid input by default
* Removes all instances of USE_LA_LI2016=True except for benchmark+wave configurations

* Remove unnecessary SIMD instruction sets for Jet, first round of cleanup in rt.conf, initialize cld_amt to zero for regional runs (dycore) (ufs-community#353)

* Reduce SIMDMULTIARCH sets from four to two in cmake/Intel.cmake
* First cleanup of regression test config tests/rt.conf
* tests/rt.sh: reduce number of build jobs on jet.intel from 10 to 5
* Remove flags -f and -s from rt.sh, remove SET logic, remove corresponding column in all rt*conf files
* Remove tests/rt_acorn.conf and run GFS v15p2 and GFS v16beta DEBUG tests on all platforms

* Implementation of CCPP timestep_init and timestep_final phases (ufs-community#337)

* Update .gitmodules and submodule pointer for fv3atm for code review and testing
* Update submodule pointer for fv3atm; skip-ci
* Don't try to compile all suites in DEBUG mode on cheyenne.intel, weird bug on compute nodes; skip-ci
* Don't try to compile all suites in DEBUG mode on wcoss_cray; skip-ci
* Regression test logs for cheyenne.gnu, cheyenne.intel, gaea.intel, hera.gnu, hera.intel, jet.intel, orion.intel; skip-ci
* Don't try to compile all suites in DEBUG mode on wcoss_dell_p3; skip-ci
* Regression test logs for wcoss_cray and wcoss_dell_p3
* Revert change to .gitmodules and update submodule pointer for fv3atm

* Update CMEPS  (ufs-community#345)


* Update CMEPS for recent changes, including the addition of new "post" run phases to eliminate redundant mapping, multiple ice sheet capability, and ocn->land ice dynamic mapping
* Add a new test fv3_gfs_v16_RRTMGP_c192L127

Co-authored-by: Jun Wang <[email protected]>

* Remove IPD steps 3 and 5 (ufs-community#357)

* Reduce SIMDMULTIARCH sets from four to two in cmake/Intel.cmake
* First cleanup of regression test config tests/rt.conf
* tests/rt.sh: reduce number of build jobs on jet.intel from 10 to 5; skip-ci
* Remove flags -f and -s from rt.sh, remove SET logic, remove corresponding column in all rt*conf files
* Update usage in rt.sh, add modulefiles/jet.intel/fv3_debug; skip-ci
* CCPP is default in cmake build
* Add debug modulefiles for linux.gnu and macosx.gnu
* Update submodule pointer for fv3atm
* Change logic in CMakeLists.txt and tests/compile.sh so that 32BIT=ON automatically sets DYN32=ON; skip-ci
* Move logic to set DYN32 - depending on 32BIT setting - to fv3atm
* Remove -DCCPP=ON from tests/compile.sh; update submodule pointer for fv3atm; skip-ci

* point fv3 to EMC develop branch (ufs-community#377)

* update cpl gfsv16 tests, rrtmgp fix and bug fixes in cmeps (ufs-community#378)

* update CMEPS, fix character length error for gnu compile
* add Dusan's fix for rt_utils.sh
* update cpl gfsv16 tests, replace seaice_newland.grb with global_slmask.t1534.3072.1536.grb, recover input.mom6.nml.IN, update input directory, update global thread and decomp tests, update fdiag for global control
* point to Dustin's rrtmgp fix branch
* update input directory

Co-authored-by: denise.worthen <[email protected]>
Co-authored-by: Jun Wang <[email protected]>

* Update develop from NOAA-GSL: RUC ice, MYNN sfclay, stochastic land perturbations (ufs-community#386)

* Update .gitmodules and submodule pointer for fv3atm for gsl/develop branch
* RUC ice for gsl/develop (replaces #47) (#49): Implementation of RUC LSM ice model in CCPP
* Squash-merge climbfuji:rucice_gfsv16dzmin into gsl/develop
* Add kice=9 to tests/tests/fv3_ccpp_rap and tests/tests/fv3_ccpp_hrrr
* Change NEW_BASELINE directory for gsl/develop to avoid conflicts with development work on the authoritative branches
* Add KICE=9 to tests/tests/fv3_ccpp_gsd_unified_ugwp and tests/tests/fv3_ccpp_gsd_drag_suite_unified_ugwp
* Revert change to .gitmodules and update submodule pointer for fv3atm
* Update gsl/develop from develop 2020/12/08 (#50)
* Updates to stochastic_physics_wrapper (ufs-community#280)
Fix to stochastic_physics_wrapper to allow random patterns to update at a longer time step than the model
* Update for Jet, bug fixes in running with frac_grid=T and GFDL MP, and in restarting with frac_grid=T  (ufs-community#304)
Update the modulefile for jet.intel to enable UPP v10.0.0. The hpc-stack v1.0.0 pre-release is used for this. Small changes are made to tests/rt.sh for jet.intel and gaea.intel (consistency with other platforms).
The submodule pointer update for fv3atm addresses bugs in the ufs-weather-model with frac_grid=T and GFDL microphysics, and with restarting the model when frac_grid=T (from @shansun6 and @SMoorthi-emc).
* Land stochastic perturbations (#57)

* dycore options to add zero-gradient BC to reconstruct interface u/v and change dz_min as input (ufs-community#369)

* Update fv3atm
* update ccpp control test forecast length to 24h
* remove rename command
* Add CI related changes
* Update RT logs
* Update RT log files
* Add the gaea RT log file
* Update the point of fv3atm
* Update fv3atm
Co-authored-by: Jun Wang <[email protected]>
Co-authored-by: MinsukJi-NOAA <[email protected]>
Co-authored-by: Jun Wang <[email protected]>

* MOM6 bugfixes, GFDL update, update CDMBGWD settings; fix for restart reproducibility (without waves) when USE_LA_LI2016=True, sign error on fprec passed to ocean, GFDL update, resolution dependent cdmbgwd settings (ufs-community#379)


* implements two MOM6 bugfixes in the NUOPC MOM6 cap to allow restart reproducibility when USE_LA_LI2016=True and to change the sign of the latent heat flux associated with frozen precipitation (fprec) exported to MOM6

* updates MOM6 to include the GFDL 20210120 main branch, which contains EMC's wave coupling code, along with some minor code standardization and documentation

* updates the cdmbgwd namelist settings for FV3 standalone tests at C96 and implements resolution dependent values for ufs-cpld tests

Co-authored-by: Ali <[email protected]>

* Remove legacy gnumake build from fv3atm and NEMS, remove legacy Python 2.7 support, rename v16beta to v16 and RT updates (ufs-community#384)

* Update .gitmodules and submodule pointers for fv3atm and NEMS
* Remove Python 2.7 support from top-level CMakeLists.txt
* Reduce forecast length of test fv3_ccpp_gfs_v16_RRTMGP_c192L127 from 24h to 12h
* Rename v16beta to v16 everywhere except the public release documentation
* Bugfixes and missing changes
* Remove 'export CCPP_LIB_DIR=ccpp/lib' from all regression tests
* Update regression test baseline date tag to 20210128; skip-ci
* Update ecflow-python environment on cheyenne and jet; skip-ci

* Update CMEPS for HAFS integration; add datm and coupled-model tests on Gaea (ufs-community#401)


* Add HAFS support in NOAA-EMC/CMEPS 
* Add coupled and datm tests for Gaea.intel

Co-authored-by: Jun Wang <[email protected]>
Co-authored-by: Bin Li <[email protected]>

* Move LSM vegetation lookup tables into CCPP, clean up RUC snow cover on ice initialization (remove IPD step 2)  (ufs-community#407)

* Regression test logs for all tier-1 platforms

* updates FMS to 2020.04.01 (ufs-community#392)

* updates FMS to 2020.04.01
* fixes fms_files.cmake
* removes extra horiz_interp
* Workaround for FMS 2020.04.01 for Cheyenne with GNU 9.1.0, incl. regression test log
Co-authored-by: Mikyung Lee <[email protected]>
Co-authored-by: Dom Heinzeller <[email protected]>

* add optional mesh in MOM6; add dz_min and min_seaice as configurable variables for coupled model (ufs-community#399)

* Implements an optional setting in the cpld and datm nems.configure files to specify whether the MOM6 cap should use a mesh or a grid

* Adds configurable settings for min_seaice to gfs_physics_nml and dz_min to fv_core_nml.

* UGWP v0 v1 combined (ufs-community#396)

- combines the changes in PRs ufs-community#360 and ufs-community#382
- adds three regression tests `fv3_ccpp_gfsv16_ugwpv1`, `fv3_ccpp_gfsv16_ugwpv1_warmstart` and `fv3_ccpp_gfsv16_ugwpv1_debug`
- contains updates and bugfixes for `nc_compare.py` and the CI tests from @MinsukJi-NOAA
- update Python3 environment on jet.intel, gaea.intel, cheyenne.{intel,gnu}
- turn off (again) test `fv3_ccpp_decomp` on jet.intel, this test didn't work in the past, but recently it "passed", because the error checking with `nc_compare.py` failed silently and we didn't notice it

Co-authored-by: valery.yudin <[email protected]>
Co-authored-by: Michael Toy <[email protected]>
Co-authored-by: MinsukJi-NOAA <[email protected]>

* Update regression tests from GFSv15+Thompson to GFSv16+Thompson, include "Add one regional regression test in DEBUG mode. (ufs-community#419)" (ufs-community#421)

* Add one regional regression test in DEBUG mode.
* Update .gitmodules and submodule pointer for fv3atm for code review and testing
* Update regression tests from GFSv15+Thompson to GFSv16+Thompson
* Combine several COMPILE lines in tests/rt.conf and tests/rt_gnu.conf
* Regression test logs for cheyenne.{gnu,intel}, gaea.intel, hera.gnu, jet.intel, hera.intel, orion.intel, wcoss_cray and wcoss_dell_p3

Co-authored-by: Phil Pegion <[email protected]>
Co-authored-by: jiandewang <[email protected]>
Co-authored-by: Denise Worthen <[email protected]>
Co-authored-by: Dusan Jovic <[email protected]>
Co-authored-by: Mark Potts <[email protected]>
Co-authored-by: BinLi-NOAA <[email protected]>
Co-authored-by: dustinswales <[email protected]>
Co-authored-by: Kyle Gerheiser <[email protected]>
Co-authored-by: RatkoVasic-NOAA <[email protected]>
Co-authored-by: Ali.Abdolali <[email protected]>
Co-authored-by: Jun Wang <[email protected]>
Co-authored-by: Jun Wang <[email protected]>
Co-authored-by: XiaqiongZhou-NOAA <[email protected]>
Co-authored-by: Ali <[email protected]>
Co-authored-by: Bin Li <[email protected]>
Co-authored-by: MiKyung Lee <[email protected]>
Co-authored-by: valery.yudin <[email protected]>
Co-authored-by: Michael Toy <[email protected]>
Co-authored-by: MinsukJi-NOAA <[email protected]>
AnningCheng-NOAA added a commit to AnningCheng-NOAA/ufs-weather-model that referenced this pull request Mar 8, 2021
* upstream/develop:
  update MOM6 to GFDL 20210224 main branch commit (ufs-community#439)
  Add GNU and Cheyenne Support to Automated RT (ufs-community#444)
  Move Noah MP init to CCPP and update Noah MP regression tests, ice flux init bug fix in CCPP (ufs-community#425)
  Feature/rt automation (ufs-community#403)
  Update ccpp-physics. Make RRTMGP thread safe (ufs-community#418)
  Update regression tests from GFSv15+Thompson to GFSv16+Thompson, include "Add one regional regression test in DEBUG mode. (ufs-community#419)" (ufs-community#421)
  UGWP v0 v1 combined (ufs-community#396)
  add optional mesh in MOM6; add dz_min and min_seaice as configurable variables for coupled model (ufs-community#399)
  updates FMS to 2020.04.01 (ufs-community#392)
  Move LSM vegetation lookup tables into CCPP, clean up RUC snow cover on ice initialization (remove IPD step 2)  (ufs-community#407)
  Update CMEPS for HAFS integration; add datm and coupled-model tests on Gaea (ufs-community#401)
  Remove legacy gnumake build from fv3atm and NEMS, remove legacy Python 2.7 support, rename v16beta to v16 and RT updates (ufs-community#384)
  MOM6 bugfixes, GFDL update, update CDMBGWD settings; fix for restart reproducibility (without waves) when USE_LA_LI2016=True, sign error on fprec passed to ocean, GFDL update, resolution dependent cdmbgwd settings (ufs-community#379)
  dycore options to add zero-gradient BC to reconstruct interface u/v and change dz_min as input (ufs-community#369)
  Update develop from NOAA-GSL: RUC ice, MYNN sfclay, stochastic land perturbations (ufs-community#386)
  update cpl gfsv16 tests, rrtmgp fix and bug fixes in cmeps (ufs-community#378)
  point fv3 to EMC develop branch (ufs-community#377)
  Remove IPD steps 3 and 5 (ufs-community#357)
  Update CMEPS  (ufs-community#345)
  Implementation of CCPP timestep_init and timestep_final phases (ufs-community#337)
  Remove unnecessary SIMD instruction sets for Jet, first round of cleanup in rt.conf, initialize cld_amt to zero for regional runs (dycore) (ufs-community#353)
  add frac grid input, update and add additional cpld tests (ufs-community#354)
  Add checkpoint restarts for ufs-cpld (ufs-community#342)
  Update the format of rt.conf (ufs-community#349)
  Remove IPD (step 1) (ufs-community#331)
  Feature/ww3update (ufs-community#334)
  Replace old regional SDF with FV3_GFS_v15_thompson_mynn (ufs-community#333)
  Update modules with hpc-stack v1.1.0 (ufs-community#319)
  Regression test log for PR ufs-community#323 for jet.intel (ufs-community#336)
  RRTMGP and Thompson MP coupling (ufs-community#323)
  Add 2 new tests for DATM-MOM6-CICE6 application (ufs-community#332)
  Add optional bulk flux calculation in ufs-datm (ufs-community#266)
  Final-final GFS v16 updates / restart reproducibility bugfixes (ufs-community#325)
  Updates to build for JEDI linking/control, add wcoss2 (ufs-community#295)
  Update CICE, Move regression test input outside baseline directory (ufs-community#270)
  Feature/update mom6 and retain b4b results for 025x025 resolution (ufs-community#290)
  Update for Jet, bug fixes in running with frac_grid=T and GFDL MP, and in restarting with frac_grid=T  (ufs-community#304)
  Updates to stochastic_physics_wrapper (ufs-community#280)
  Update develop from gsd/develop 2020/11/20: Unified gravity wave drag, updates to other GSL physics (ufs-community#297)
  Fix to allow quilting with non-factors for layout (ufs-community#250)
  rt update (ufs-community#261)
epic-cicd-jenkins pushed a commit that referenced this pull request Apr 17, 2023
* Add preamble script from global workflow.

* Call preamble script in j-jobs and ex-scripts

* Call preamble in other scripts.

* Make names of j-jobs and ex-scripts consistent.

* Working towards nco vars in table 1.

* Change default bin directory to exec

* Append FATAL ERROR to print_err_msg_exit.

* Replace some cp, cd, mkdir calls with their corresponding _vrfy versions

* Add job and jobid to the job-card.

* Add cyc and subcyc to rocoto xml

* Add a j-job preamble script for setpdy.

* Add a j-job postamble as well.

* Define some Table 1 vars in setup.

* Remove unused SRC_DIR, and rename others

* Rename CYCLE_BASEDIR to COMIN_BASEDIR

* Create the NCO root directories in setup.

* Remove source machine file wrapper.

* Bug fix in job_preamble.

* Make make_ics/lbcs use DATA directory properly.

* Make run_fcst use DATA directory properly.

* Made run_post use DATA directory properly.

* Make make_grid use DATA properly (untested).

* Make make_sfc_climo use DATA properly (untested).

* Make make_orog use DATA properly (untested).

* Bug fix for non-NCO mode.

* Don't pass arguments from j-jobs to ex-scripts.

* Make forecast and post-output go to COMOUT.

* Remove CYCLE_DIR and use COMIN instead.

* Bug fix for community mode.

* Append cyc to COMIN in NCO mode.

* Fix rocoto run_post dependency with run_fcst issue.

* Use OPSROOT instead of PTMP and STMP.

* Move nco vars in config_defaults.

* Move logdir location to COMROOT.

* Set all root directories to EXPTDIR in community mode.

* Use pgmout and pgmerr.

* Fix inline post.

* Make pgmout/err redirection work with community mode.

* Use print_err in get_obs_mrms.

* Add prep_step.

* Add post_step.

* Add dbn_alert to post-processed grib2 output.

* Download extrn files directly to COMIN.

* Make make_ics/lbcs directly output to COMIN.

* Change names of extrn_mdl_var_defns files.

* Name fixes for DO_ENSEMBLE=false, dyn/phy

* Don't create symlinks to grib2 files in NCO mode.

* Append rrfs to make_ics/lbcs output.

* Modify extrn_mdl_var_defns names.

* Move forecast output to DATA/RUN.PDY. This location
can be used to store output of other tasks as well.

* Move templates to parm.

* Fix for new parm location.

* Move metplus one level up.

* Fixes for community mode.

* Rename SCRIPTSDIR and JOBSDIR.

* Move all FIX** directories into a fix/ directory.

* Make FIXrrfs be EXPTDIR for community mode.

* Symlink upp and ufs_utils parm files to top level parm directory.

* Remove UPP_DIR and UFS_UTILS_DIR.

* Define cycle with subcyc when it is non-zero.

* Don't delete COMIN_BASEDIR if it already exists.

* Disassociate NCO mode from pre-generated grid.

* Don't choose fix location based on RUN_ENVIR.

* Bug fix in make_lbcs.

* Add flag to symlink or copy fix files.

* Change slurm log file locations

* Minor fix for inline post in nco mode.

* Add a unique workflow ID to avoid clashes between different runs while
keeping the relation between different tasks, which a PID cannot do.

* Make verification tasks NCO compliant.

* Pass RUN_ENVIR to we2e script.

* Fixes for merge conflicts.

* Add versions for wcoss2.

* Fix symlinks.

* Minor changes.

* Move grid/orog/sfcc completion files to EXPTDIR/grid/orog etc.

* Output modified namelist file with seeds in current directory.

* Fixes for unittests.

* Bugfix wrf_io version

* Fix CI issue with bin locations.

* Allow NCO root directories to be set individually.

* Don't append workflow id in community mode.

* Add helper script to rename model e.g. rrfs->aqm

* Bug fixes and naming changes for consistency.

* Replace instances of USHrrfs etc with a generic USHdir etc.

* Add unittest for whole workflow now that the merge made it possible.

* Remove unused process_args utility.

* Remove hard coded paths from configs.

* Don't replace existing var value with None.

* Add config.nco to unittest.

* Fix for Orion issue.

* Fix default OPSROOT location in run_we2e.

* Modify setup_we2e script to run fundamental tests on all machines.

* Fix conflicting ics/lbcs temp location by moving to DATA.

* Bug fix in load_modules taken from PR #353.

* Specify default shell instead of symlinking.

* Turn off grid/orog/sfc_climo tasks for NCO test cases.

* Use PDY and cyc in ex-scripts.

* Remove CDATE from xml and define it in job_preamble.

* Use machine specific list of tests if available.

* Run all tests in community mode so that the last NCO test case
gets reported as finished.

* Minor changes

* Avoid using preamble in functions.

* Use preamble in function too.

* Turn on debugging for utility functions.

* Turn on debug & verbose in CI.

* Turn off set -e for init_env
epic-cicd-jenkins pushed a commit that referenced this pull request Apr 17, 2023
* update lmod

* update lmod

* update hpc-stack and miniconda

* fix lmod-setup.sh bug for Gaea

* update files to run with new miniconda and MET VX

* fix typo

* fixed typo

* update vx task

* Update build_gaea_intel

The list of modules to be loaded needs updates.

* Update load_modules_run_task.sh

Fixed a typo

* Update load_modules_run_task.sh

* updated vx task

Co-authored-by: Parallel Works app-run user <Edward.Snyder@mgmt-edwardsnyder-pclusternoaav2-00061.pw-noaa-us-east-1.pw.local>
Co-authored-by: Parallel Works app-run user <Edward.Snyder@mgmt-edwardsnyder-pclusternoaav2-00062.pw-noaa-us-east-1.pw.local>
Co-authored-by: Parallel Works app-run user <Edward.Snyder@mgmt-edwardsnyder-pclusternoaav2-00063.pw-noaa-us-east-1.pw.local>
Co-authored-by: Parallel Works app-run user <Edward.Snyder@mgmt-edwardsnyder-pclusternoaav2-00064.pw-noaa-us-east-1.pw.local>
Co-authored-by: Natalie Perlin <[email protected]>
Labels
Baseline Updates: Current baselines will be updated.
Ready for Commit Queue: The PR is ready for the Commit Queue. All checkboxes in the PR template have been checked.
Waiting for Reviews: The PR is waiting for reviews from associated component PRs.
Development

Successfully merging this pull request may close these issues.

Simplify rt.conf
Update of jet regression testing environment