[develop] Add smoke and dust capability #1127

Open
wants to merge 64 commits into
base: develop

chan-hoo
Collaborator

@chan-hoo chan-hoo commented Sep 6, 2024

DESCRIPTION OF CHANGES:

  • Add the smoke/dust capability of the RRFS workflow.
  • Since the smoke/dust capability is available only with the production/RRFS.v1 branch of the UFS weather model, a separate Externals.cfg file should be used.
  • Currently this capability works only on Hera.

Steps to build and run for smoke/dust:

  1. Retrieve the 'smoke_dust' branch:
git clone -b smoke_dust https://github.com/chan-hoo/ufs-srweather-app
cd ufs-srweather-app
  2. Check out the external components with Externals_smoke_dust.cfg:
./manage_externals/checkout_externals -e Externals_smoke_dust.cfg
  3. Build the app with --smoke:
./devbuild.sh -p=hera --smoke
  4. Load the python environment:
module use modulefiles
module load wflow_hera
conda activate srw_app
  5. Set up the configuration:
cd ush
cp config.smoke_dust.yaml config.yaml
## replace 'ACCOUNT' with your own account
  6. Generate a workflow for smoke/dust:
./generate_FV3LAM_wflow.py
  7. Run the workflow:
cd ../../exp_dir/smoke_dust_conus3km
./launch_FV3LAM_wflow.sh
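
The steps above can be collected into a pair of shell functions. This is only a sketch and not part of the PR: `run_step`, `smoke_dust_build`, `smoke_dust_run`, and the `DRY_RUN` variable are hypothetical names, while the commands themselves are copied from the list above. It assumes the functions are sourced into an interactive shell on Hera where `module` and `conda` are already available.

```shell
# Sketch only: run_step, smoke_dust_build, smoke_dust_run, and DRY_RUN are
# hypothetical names. The commands are copied from the steps above.

# Print a command, then execute it unless DRY_RUN=1.
run_step() {
  echo "+ $*"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    return 0
  fi
  "$@"
}

# Steps 1-5: clone, check out externals, build, load the environment, and
# copy the sample configuration. The chain stops at the first failing command.
smoke_dust_build() {
  run_step git clone -b smoke_dust https://github.com/chan-hoo/ufs-srweather-app &&
  run_step cd ufs-srweather-app &&
  run_step ./manage_externals/checkout_externals -e Externals_smoke_dust.cfg &&
  run_step ./devbuild.sh -p=hera --smoke &&
  run_step module use modulefiles &&
  run_step module load wflow_hera &&
  run_step conda activate srw_app &&
  run_step cd ush &&
  run_step cp config.smoke_dust.yaml config.yaml
}

# Steps 6-7: call this from the ush directory, after 'ACCOUNT' has been
# edited by hand in config.yaml.
smoke_dust_run() {
  run_step ./generate_FV3LAM_wflow.py &&
  run_step cd ../../exp_dir/smoke_dust_conus3km &&
  run_step ./launch_FV3LAM_wflow.sh
}
```

Setting `DRY_RUN=1` before calling either function prints each command with a leading `+ ` without executing it, which is a cheap way to review the sequence before a long build. The split into two functions keeps the manual `ACCOUNT` edit between copying the configuration and generating the workflow.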

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

TESTS CONDUCTED:

  • derecho.intel
  • gaea.intel
  • hera.gnu
  • hera.intel
  • hercules.intel
  • jet.intel
  • orion.intel
  • wcoss2.intel
  • NOAA Cloud (indicate which platform)
  • Jenkins
  • fundamental test suite
  • comprehensive tests (specify which if a subset was used)

DOCUMENTATION:

Documentation will follow in a separate PR.

ISSUE:

Fixes the issue mentioned in #1126

CHECKLIST

  • My code follows the style guidelines in the Contributor's Guide
  • I have performed a self-review of my own code using the Code Reviewer's Guide
  • I have commented my code, particularly in hard-to-understand areas
  • My changes need updates to the documentation. I have made corresponding changes to the documentation
  • My changes do not require updates to the documentation (explain).
  • My changes generate no new warnings
  • New and existing tests pass with my changes
  • Any dependent changes have been merged and published

@RatkoVasic-NOAA
Collaborator

Tested smoke/dust on Hera:

       CYCLE                    TASK                       JOBID               STATE         EXIT STATUS     TRIES      DURATION
================================================================================================================================
201907220000               make_grid                    66334840           SUCCEEDED                   0         1          40.0
201907220000               make_orog                    66334973           SUCCEEDED                   0         1         363.0
201907220000          make_sfc_climo                    66335324           SUCCEEDED                   0         1          81.0
201907220000              smoke_dust                    66335412           SUCCEEDED                   0         1          40.0
201907220000               prepstart                    66335548           SUCCEEDED                   0         1          31.0
201907220000           get_extrn_ics                    66334837           SUCCEEDED                   0         1          49.0
201907220000          get_extrn_lbcs                    66334836           SUCCEEDED                   0         1          48.0
201907220000         make_ics_mem000                    66335410           SUCCEEDED                   0         1         178.0
201907220000        make_lbcs_mem000                    66335414           SUCCEEDED                   0         1          82.0
201907220000         run_fcst_mem000                    66335638           SUCCEEDED                   0         1        4469.0
201907220000        post_mem000_f000                    66338670           SUCCEEDED                   0         1         192.0
201907220000        post_mem000_f001                    66338675           SUCCEEDED                   0         1         195.0
201907220000        post_mem000_f002                    66338674           SUCCEEDED                   0         1         194.0
201907220000        post_mem000_f003                    66338677           SUCCEEDED                   0         1         205.0
201907220000        post_mem000_f004                    66338671           SUCCEEDED                   0         1         218.0
201907220000        post_mem000_f005                    66338672           SUCCEEDED                   0         1         214.0
201907220000        post_mem000_f006                    66338673           SUCCEEDED                   0         1         209.0
================================================================================================================================
201907220600              smoke_dust                    66338676           SUCCEEDED                   0         1         164.0
201907220600               prepstart                    66338787           SUCCEEDED                   0         1          94.0
201907220600           get_extrn_ics                    66334838           SUCCEEDED                   0         1          50.0
201907220600          get_extrn_lbcs                    66334839           SUCCEEDED                   0         1          49.0
201907220600         make_ics_mem000                    66335411           SUCCEEDED                   0         1         154.0
201907220600        make_lbcs_mem000                    66335413           SUCCEEDED                   0         1          82.0
201907220600         run_fcst_mem000                    66338842           SUCCEEDED                   0         1        4494.0
201907220600        post_mem000_f000                    66341952           SUCCEEDED                   0         1         195.0
201907220600        post_mem000_f001                    66341950           SUCCEEDED                   0         1         218.0
201907220600        post_mem000_f002                    66341951           SUCCEEDED                   0         1         223.0
201907220600        post_mem000_f003                    66341953           SUCCEEDED                   0         1         220.0
201907220600        post_mem000_f004                    66341954           SUCCEEDED                   0         1         220.0
201907220600        post_mem000_f005                    66341955           SUCCEEDED                   0         1         224.0
201907220600        post_mem000_f006                    66341956           SUCCEEDED                   0         1         224.0
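
A status table like the one above can be checked mechanically rather than by eye. The following is a minimal sketch, assuming the table has been saved as plain text in the column layout shown (CYCLE, TASK, JOBID, STATE, ...); `count_unsuccessful` is a hypothetical helper, not part of the SRW App.

```shell
# Hypothetical helper: count task rows whose STATE column is not SUCCEEDED.
# Assumes the whitespace-separated layout shown above, where data rows start
# with a 12-digit cycle (YYYYMMDDHHMM) and STATE is the fourth field.
# Header and separator lines are skipped because their first field is not
# a 12-digit number. Reads the named files, or standard input if none given.
count_unsuccessful() {
  awk '$1 ~ /^[0-9]+$/ && length($1) == 12 && $4 != "SUCCEEDED" { n++ }
       END { print n + 0 }' "$@"
}
```

With the table saved to a file such as `status.txt` (a hypothetical name), `count_unsuccessful status.txt` prints the number of task rows that are in any state other than SUCCEEDED.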

Fundamental SRW tests:

----------------------------------------------------------------------------------------------------
Experiment name                                                  | Status    | Core hours used
----------------------------------------------------------------------------------------------------
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta_2  COMPLETE              11.91
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2_20240  COMPLETE               8.58
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v17_p8_plot  COMPLETE              25.30
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_HRRR_suite_HRRR_2024090  COMPLETE              43.66
grid_SUBCONUS_Ind_3km_ics_HRRR_lbcs_RAP_suite_WoFS_v0_20240909183  COMPLETE              27.30
grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_GFS_v16_2024090918371  COMPLETE              45.46
----------------------------------------------------------------------------------------------------
Total                                                              COMPLETE             162.21

Approving.

@chan-hoo
Collaborator Author

chan-hoo commented Sep 9, 2024

@RatkoVasic-NOAA, thank you so much for your test and approval !!! :)

@MichaelLueken
Collaborator

@chan-hoo -

After rewinding/booting the first run_fcst task, I was able to successfully run the new smoke and dust WE2E test on Hera:

       CYCLE                    TASK                       JOBID               STATE         EXIT STATUS     TRIES      DURATION
================================================================================================================================
201907220000               make_grid                    66218006           SUCCEEDED                   0         1          45.0
201907220000               make_orog                    66218340           SUCCEEDED                   0         1         372.0
201907220000          make_sfc_climo                    66218575           SUCCEEDED                   0         1          90.0
201907220000              smoke_dust                    66218715           SUCCEEDED                   0         1          38.0
201907220000               prepstart                    66219117           SUCCEEDED                   0         1          37.0
201907220000           get_extrn_ics                    66218007           SUCCEEDED                   0         1          63.0
201907220000          get_extrn_lbcs                    66218008           SUCCEEDED                   0         1          58.0
201907220000         make_ics_mem000                    66218716           SUCCEEDED                   0         1         152.0
201907220000        make_lbcs_mem000                    66218717           SUCCEEDED                   0         1          79.0
201907220000         run_fcst_mem000                    66225732           SUCCEEDED                   0         1        4462.0
201907220000        post_mem000_f000                    66229719           SUCCEEDED                   0         1         197.0
201907220000        post_mem000_f001                    66229724           SUCCEEDED                   0         1         198.0
201907220000        post_mem000_f002                    66229720           SUCCEEDED                   0         1         202.0
201907220000        post_mem000_f003                    66229721           SUCCEEDED                   0         1         208.0
201907220000        post_mem000_f004                    66229722           SUCCEEDED                   0         1         214.0
201907220000        post_mem000_f005                    66229726           SUCCEEDED                   0         1         216.0
201907220000        post_mem000_f006                    66229723           SUCCEEDED                   0         1         222.0
================================================================================================================================
201907220600              smoke_dust                    66229725           SUCCEEDED                   0         1         171.0
201907220600               prepstart                    66230255           SUCCEEDED                   0         1         102.0
201907220600           get_extrn_ics                    66218009           SUCCEEDED                   0         1          63.0
201907220600          get_extrn_lbcs                    66218010           SUCCEEDED                   0         1          58.0
201907220600         make_ics_mem000                    66218718           SUCCEEDED                   0         1         155.0
201907220600        make_lbcs_mem000                    66218719           SUCCEEDED                   0         1          79.0
201907220600         run_fcst_mem000                    66230376           SUCCEEDED                   0         1        4520.0
201907220600        post_mem000_f000                    66330901           SUCCEEDED                   0         1         198.0
201907220600        post_mem000_f001                    66330897           SUCCEEDED                   0         1         208.0
201907220600        post_mem000_f002                    66330898           SUCCEEDED                   0         1         216.0
201907220600        post_mem000_f003                    66330899           SUCCEEDED                   0         1         221.0
201907220600        post_mem000_f004                    66330902           SUCCEEDED                   0         1         216.0
201907220600        post_mem000_f005                    66330903           SUCCEEDED                   0         1         214.0
201907220600        post_mem000_f006                    66330900           SUCCEEDED                   0         1         216.0

The fundamental tests successfully passed on Hera:

----------------------------------------------------------------------------------------------------
Experiment name                                                  | Status    | Core hours used 
----------------------------------------------------------------------------------------------------
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta_2  COMPLETE              11.55
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2_20240  COMPLETE               8.63
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v17_p8_plot  COMPLETE              25.94
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_HRRR_suite_HRRR_2024090  COMPLETE              43.71
grid_SUBCONUS_Ind_3km_ics_HRRR_lbcs_RAP_suite_WoFS_v0_20240909190  COMPLETE              26.62
grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_GFS_v16_2024090919013  COMPLETE              45.41
----------------------------------------------------------------------------------------------------
Total                                                              COMPLETE             161.86

Currently running the AQM WE2E test on Hera and will report once it completes.

Please be aware, there are several coding standard issues with the changes in this PR. In hopes of kicking off some discussion (if need be, a larger discussion will take place during the Thursday code management meeting), I have the following concerns:

  • I don't mind adding a second Externals.cfg file, but any component hashes that are merged into develop should point to the component's develop (or other default branch in the authoritative repository), rather than a production/implementation branch, release branch, or release tag.
  1. The ufs_utils hash, 33cc663, is pointing to the ufs_utils_1_12_2 release tag.
  2. The ufs-weather-model hash, ce43a6f, is pointing to the production/RRFS.v1 implementation branch.
  3. The UPP hash, fc85241, is pointing to the release/rrfs_v1 release branch.

In PR #1089, we should be able to remove the dependency on the ufs_utils_1_12_2 release tag. In the UFS_UTILS repository, PR #923 exists to add smoke mass density to the ICs and LBCs. Do you know if the RRFS-SD stakeholders have plans to update the UFS_UTILS hash to include smoke (from RAP-smoke/HRRR-smoke/RRFS-SD) in the ICs/LBCs?

Do you know which commits are still required in ufs-weather-model/develop and UPP/develop that haven't been merged yet? While changes have been added to ufs-weather-model/production/RRFS.v1 and UPP/release/rrfs_v1 over the last month, the hashes used in this PR are from June 2024. The majority of the release/implementation changes should already be in the develop branches in these components.

  • The inclusion of new Hera and Orion production build modulefiles. The issue with using the production/RRFS.v1 branch is that it requires an older version of spack-stack in order to work. This then requires loading the modules associated with spack-stack v1.5.1 rather than spack-stack v1.6.0. The build_*_intel_prod.lua modulefiles are unable to load the necessary modules from srw_common.lua, which is a requirement for the build modulefiles.
  • There is already a parm/wflow/post.yaml file. Why add a new upp_post.yaml file? Is it possible to add the changes from upp_post.yaml into the existing post.yaml and remove the newly added workflow file?
  • For JSRW_FORECAST, JSRW_UPP_POST, exsrw_forecast.sh, and exsrw_upp_post.sh, is there a reason for the addition of these j-jobs and ex-scripts? Is it possible to add the modifications in these j-jobs and ex-scripts into the original JREGIONAL_RUN_FCST, JREGIONAL_RUN_POST, exregional_run_fcst.sh, and exregional_run_post.sh? Including several ex-scripts and j-jobs that do similar things doesn't make sense.
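
For the first concern above (component hashes should be reachable from each component's develop branch), the check can be done with `git merge-base --is-ancestor`. This is a sketch under assumptions: `hash_on_branch` is a hypothetical helper, the default branch name `origin/develop` is assumed, and the checkout must already have fetched that branch.

```shell
# Hypothetical helper: succeed (exit status 0) if commit $2 is contained in
# the history of branch $3 in the git checkout at directory $1.
# The branch defaults to origin/develop, which is an assumption here.
# git exits with 1 when the commit is not an ancestor of the branch, and with
# another non-zero status on errors such as an unknown hash.
hash_on_branch() {
  repo_dir=$1
  commit=$2
  branch=${3:-origin/develop}
  git -C "$repo_dir" merge-base --is-ancestor "$commit" "$branch"
}
```

For example, `hash_on_branch sorc/ufs-weather-model ce43a6f` would report whether the ufs-weather-model hash named above is part of develop; the `sorc/ufs-weather-model` path is an assumption about where the externals are checked out.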

Collaborator

@MichaelLueken MichaelLueken left a comment

SRW coding standard issues need to be discussed before moving forward with final testing and merging this work to develop. Hopefully, discussion can take place in this PR. Otherwise, discussion will begin during Thursday's code management meeting.

@chan-hoo
Collaborator Author

chan-hoo commented Sep 9, 2024

@MichaelLueken, thanks for your review:

  1. The reason why I added another Externals.cfg is that the smoke and dust capability is available only on the production/RRFS.v1 branch of the ufs weather model. Once this capability is merged into the develop branch, we can remove this Externals_smoke_dust.cfg. (Sorry, I don't know which PRs in the ufs weather model were related to this change specifically.)
  2. Regarding the hashes of the other external components, this production branch of the ufs weather model did not compile with the module file in the SRW App. Recently I found out that this was because the smoke and dust capability requires the recent version of FMS and g2tmpl. I think we can use the same hashes of UFS_UTILS and UPP as the current SRW App. I'll test it.
  3. Regarding the new tasks srw_forecast and srw_upp_post, the critical issue is that the current regional_run_fcst and regional_run_post do not meet the NCO standards. The ex-scripts may not be a big issue, but the troublemakers are the J-job scripts. The scripts JREGIONAL_RUN_FCST and JREGIONAL_RUN_POST have very complicated vertical structures. If you are familiar with the NCO standards, please check SRW issues #1022 ("All build-related files and directories should be located in sorc") to #1032 ("Name of source code directory should be same as name of executable"). The best way is to replace JREGIONAL_RUN_FCST and JREGIONAL_RUN_POST with JSRW_FORECAST and JSRW_UPP_POST, but then all the scripts would have to be changed and all we2e tests rerun for this change. This is not what I want to do in this PR. I think that JREGIONAL_RUN_FCST and JREGIONAL_RUN_POST should eventually be replaced with JSRW_FORECAST and JSRW_UPP_POST for all other we2e tests. This might be your task :) Through this PR, I'd like to show how the other JREGIONAL_ scripts should be changed to meet the NCO standards.
    I'll try to resolve item 2 above. If you have any other concerns, we can keep discussing in this PR :)

@MichaelLueken
Collaborator

@chan-hoo -

If the smoke and dust capability is only available in the RRFS v1 production branch, and not in the develop branch in the ufs-weather-model, then we can't move forward with integrating this into the SRW develop branch at this time. The RRFS v1 and RRFS-SD implementation schedule has been suspended due to issues with severe convection encountered during the Hazardous Weather Testbed's 2024 Spring Experiment. If capabilities are available in develop branches for all component repositories, then we can move forward with introducing this work into the SRW develop branch.

Since the ICs/LBCs for the new smoke/dust WE2E test are RAP and not RRFS, I suspect that the current UFS_UTILS hash should be sufficient (if it used RRFS, then we would need to wait for Natalie's PR #1089). I'll also check the current UPP hash. While the release branch has several changes, I can't tell if those changes are for RRFS-SD or RRFS v1.

Thanks for explaining the purpose behind the new ex-scripts and j-jobs. I certainly agree with you that changing all of the ex-scripts and j-jobs to follow NCO standards lies outside the scope of this PR.

Successfully merging this pull request may close these issues.

Need to add smoke and dust capability Port RRFS-SD features
3 participants