[develop] Add new RRFS variables such as NWGES and workflow control variables #647
Conversation
@danielabdi-noaa Overall, these changes look good to me. I agree that I'm not entirely sure what the purpose of the SERVICE_ACCOUNT in the ush/config_defaults.yaml file is. I have reached out to @hu5970 to see if he can shed some light on it (or whether it even needs to be included). Also, the newly added RRFS_CONUS_3km_HRRRIC grid appears to be identical to the RRFS_CONUScompact_3km grid; the only difference is that the DT_ATMOS value was decreased recently to use less wasteful values. So, I would suggest using RRFS_CONUScompact_3km rather than RRFS_CONUS_3km_HRRRIC.
#-----------------------------------------------------------------------
MACHINE: "BIG_COMPUTER"
ACCOUNT: ""
SERVICE_ACCOUNT: '{{ user.ACCOUNT }}'
It looks like the NOAA-GSL ush/config_defaults.yaml file has both a SERVICE_ACCOUNT and an HPSS_ACCOUNT, where HPSS_ACCOUNT defaults to SERVICE_ACCOUNT if not set. @hu5970, are these entries required? They appear to have been added for Jet. Any additional information would be greatly appreciated. Thanks!
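For reference, the fallback described here could be written in the same Jinja-templated style that config_defaults.yaml already uses. This is a sketch, not the actual NOAA-GSL file:

```yaml
# Illustrative sketch only, not the NOAA-GSL config_defaults.yaml.
user:
  ACCOUNT: ""
  SERVICE_ACCOUNT: '{{ user.ACCOUNT }}'
  # HPSS_ACCOUNT falls back to SERVICE_ACCOUNT when not set explicitly.
  HPSS_ACCOUNT: '{{ user.SERVICE_ACCOUNT }}'
```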
ush/predef_grid_params.yaml
WRTCMP_lon_lwr_left: -122.17364391
WRTCMP_lat_lwr_left: 21.88588562
WRTCMP_dx: 3000.0
WRTCMP_dy: 3000.0
It looks like the RRFS_CONUS_3km_HRRRIC grid is identical to the RRFS_CONUScompact_3km grid, with the exception of DT_ATMOS being set to 36. This modification was merged at fd3c778 to use less wasteful values of DT_ATMOS. It would probably be best to have the scripts use the RRFS_CONUScompact_3km grid rather than add the new RRFS_CONUS_3km_HRRRIC grid with a different DT_ATMOS value.
@MichaelLueken Thanks for catching the predefined grid duplicates. It looks like there are no new predefined grids to add, so I have removed all changes to the predef_grid file.
@danielabdi-noaa Going through the latest changes in the
Until this PR has been merged, I will keep an eye out on further changes to the
@MichaelLueken Thanks a lot for going through the commits and summarizing the changes! I have updated this PR with most of the new variable changes, except for two,
@danielabdi-noaa I appreciate your contribution here.
I am very concerned about this approach to adding a bunch of variables that aren't used yet. We have no way to test them, and no way to know whether they will be relevant or needed when the tasks that currently use them in RRFS_dev1 are contributed.
I am also concerned that it will be difficult to communicate what does and does not work to the rest of the community. What I mean is, if someone sees a flag that says "do_something", they expect "something" to work. Given that this is not true, I do not think it's fair for us to include those sorts of flags until the feature is enabled.
This file should not be added yet. We don't have the capabilities in place to run or test any of it, and many of the settings here will not ever be supported in SRW.
Actually, you can run it even now. It will generate a workflow file that doesn't have any of the RRFS tasks.
@@ -5,7 +5,7 @@ metadata:
   description: >-
     Default configuration for an experiment. The valid values for most of the
     parameters are specified in valid_param_vals.yaml
-  version: !!str '1.0'
+  version: !!str '2.0'
It's not clear to me how we will want to bump versions, but it definitely doesn't seem like the time to do that now.
Why is this not the right time? In fact, every time a variable is added is a time for a minor version update.
It is generally unclear to me whether this should be the SRW version, the release version, etc.
#
# RESERVATION_POST:
# The reservation for post tasks.
#
Partitions and queues are now all set in the parm/wflow files. None of these are needed here. Reservations, when they are added, should also live in the parm/wflow files.
But the parm/wflow/default_workflow.yaml files get their settings from the config files. In fact, we already have partitions/queues in them, so I am just extending that for the jobs RRFS adds:
PARTITION_DEFAULT: '{{ platform.get("PARTITION_DEFAULT") }}'
PARTITION_FCST: '{{ platform.get("PARTITION_FCST") }}'
PARTITION_HPSS: '{{ platform.get("PARTITION_HPSS") }}'
QUEUE_DEFAULT: '{{ platform.get("QUEUE_DEFAULT") }}'
QUEUE_FCST: '{{ platform.get("QUEUE_FCST") }}'
QUEUE_HPSS: '{{ platform.get("QUEUE_HPSS") }}'
I'm suggesting that we don't necessarily have to set them explicitly here in config defaults. Here, the default queue is the main queue on a platform. FCST is special because it might need bigger nodes with more memory, and HPSS has internet access -- essentially it's the front-end node on platforms where that makes sense. Because each of these three functions in a separate way for the workflow, we can choose from those three which task runs under which queue in the task definition.
There is no need for a one-to-one correspondence between available config defaults variables and what values are set as entities.
So, in the definition of the post jobs, we can modify the entry to specifically be debug if we want to change it for a given user/platform.
PARTITION_POST: ""
QUEUE_POST: ""
RESERVATION: ""
RESERVATION_POST: ""
None of these will be needed here as they can be set explicitly in the parm/wflow files for the tasks when they are added to the workflow.
See above! We are not setting them directly, but getting them from the platform section of the config file.
As I argue above, we do not need them, as they generally don't serve any function on our platforms beyond what the current mechanism already allows in the XML generation.
@@ -360,6 +359,8 @@ def generate_FV3LAM_wflow(ushdir, logfile: str = "log.generate_FV3LAM_wflow", de
         "target_lon": LON_CTR,
         "target_lat": LAT_CTR,
         "nrows_blend": HALO_BLEND,
+        "regional_bcs_from_gsi": False,
+        "write_restart_with_bcs": False,
Will this impact AQM?
No idea; @chan-hoo probably knows.
RAPHRR_SOIL_ROOT: ""
FIRE_RAVE_DIR: ""
AIRCRAFT_REJECT: "/scratch2/BMC/zrtrr/rli/data/amdar_reject_lists"
SFCOBS_USELIST: "/scratch2/BMC/zrtrr/rli/data/mesonet_uselists"
I don't think this is the appropriate structure for retrieve_data.py to gather these. I believe there are other efforts to get the data paths incorporated into the workflow.
Aren't these "fixed files" that do not change from test case to test case? retrieve_data.py has so far been used to get ICS/LBCS that change from one test case to another. @EdwardSnyder-NOAA has put some of these files under the EPIC role account, and these paths can be changed as new files are added to locations that SRW can use.
These paths are a combination of fixed and observation files. We were trying to replicate the functionality of set_rrfs_config.sh found in the dev1 branch. Essentially, this script sets paths based on machine and model run type (retro/current). It made sense to add the logic here even though the retrieve_data.py script isn't used to grab these files. We are definitely open to suggestions on placing this logic elsewhere.
In addition, how often should we utilize the retrieve_data.py script? The process lightning task grabs lightning data in the job script without the retrieve_data.py script. Would we need to integrate all the RRFS obs data into the retrieve_data.py script?
I believe we should be using the retrieve_data script any time we need to pull data from disk and/or external resources.
One assumption made in RRFS_dev was that we'd always have data staged on disk for real-time or retro runs. That is not an appropriate assumption for the community who likely have different goals. SRW users working outside the real-time paradigm, or on platforms where real-time data streams aren't available, or on retrospective case studies, will need a way to pull data from archives like HPSS or AWS.
The retrieve_data tool is meant to allow a user to define whether they have staged data on disk via their own doing, or a supported data stream, or whether the workflow should go get it for them from known external locations.
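The lookup order being described, staged data on disk first, then supported data streams, then external archives, can be sketched as follows. The function and its interface are hypothetical, not the actual retrieve_data.py API:

```python
import os

# Illustrative sketch (not the real retrieve_data.py interface) of the
# fallback order described above: user-staged disk data first, then any
# supported data-stream directories, then external archive fetchers.
def find_data(filename, user_dir, stream_dirs, archive_fetchers):
    for d in [user_dir, *stream_dirs]:
        path = os.path.join(d, filename)
        if os.path.exists(path):          # data already staged on disk
            return path
    for fetch in archive_fetchers:        # e.g. callables that pull from HPSS or AWS
        path = fetch(filename)
        if path:
            return path
    raise FileNotFoundError(filename)
```

The point is that a user only has to state where their data lives (or let the tool fetch it); tasks never assume a real-time stream is present.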
@@ -58,7 +58,7 @@ def set_FV3nml_ens_stoch_seeds(cdate):
    #
    # -----------------------------------------------------------------------
    #
-    fv3_nml_ensmem_fp = f"{os.getcwd()}{os.sep}{FV3_NML_FN}"
+    fv3_nml_ensmem_fp = f"{os.getcwd()}{os.sep}{FV3_NML_FN}_base"
If the name of the file changes, it should change in config_defaults so that it's clear what it is from the beginning. One of the features of having configuration settings is that there is rarely a need to change these types of things in the scripts.
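That suggestion, deriving the filename from configuration once so scripts never repeat the suffix, could look like this. Variable names are illustrative, not the actual SRW config keys:

```python
import os

# Sketch of the reviewer's point: take the namelist filename (and any
# "_base" suffix) from configuration rather than hard-coding it in
# scripts. CONFIG mirrors config_defaults-style settings; the keys
# used here are hypothetical.
CONFIG = {"FV3_NML_FN": "input.nml", "FV3_NML_BASE_SUFFIX": "_base"}

def base_namelist_path(workdir):
    """Build the base-namelist path from configuration in one place."""
    fn = CONFIG["FV3_NML_FN"] + CONFIG["FV3_NML_BASE_SUFFIX"]
    return os.path.join(workdir, fn)
```

If the naming convention ever changes, only the configuration changes; the scripts stay untouched.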
@@ -174,7 +174,7 @@ def setUp(self):
        mkdir_vrfy("-p", EXPTDIR)
        cp_vrfy(
            os.path.join(PARMdir, "input.nml.FV3"),
-           os.path.join(EXPTDIR, "input.nml"),
+           os.path.join(EXPTDIR, "input.nml_base"),
Here, it seems like we should be referencing the configuration setting instead of repeating this information, especially if we'd like to change it.
I am not sure what your concern is here, but the way RRFS handles input namelist files is different from how SRW does it. RRFS generates four different input namelist files and symlinks one of them to input.nml depending on the configuration, as opposed to the in-place modification of a single namelist file that SRW uses. This also applies to the stochastic physics options modification, and I verified that this modification works for the WE2E test community_ensemble_2mems_stoch.
# Setup default locations for FIRE_RRFS files and update time
# FIRE_RAVE_DIR
# FIRE_RRFS_ROOT
# FIRE_RRFS_update_hour
The data: section of the machine files in SRW is currently meant to work with the retrieve_data.py script to set where to get the data on disk, and it is a structured section whose layout is strictly defined by that tool. The section should remain consistent with the tool's existing behavior.
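For context, the structured shape being referred to looks roughly like the fragment below. The paths and model keys are illustrative, not an exact machine file:

```yaml
# Illustrative sketch of a machine-file data: section consumed by
# retrieve_data.py; keys and paths here are examples, not a real file.
platform:
  data:
    ics_lbcs:
      FV3GFS: /path/to/staged/FV3GFS
      RAP: /path/to/staged/RAP
```

Free-form entries like FIRE_RAVE_DIR would not fit this schema, which is why the reviewer suggests keeping the section consistent with the tool.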
CYCL_HRS: []
CYCL_HRS_SPINSTART: []
CYCL_HRS_PRODSTART: []
CYCL_HRS_PRODSTART_ENS: []
CYCL_HRS_RECENTER: []
CYCL_HRS_STOCH: []
This is why I'm suggesting that the necessary variables be added with the tool that requires them. They should be added when they are needed, not separately, where they cannot be tested on their own or their purpose recognized.
They are included in PR #638, and I suggest that be the only place they are added.
POSTPROC_LEN_HRS: 1
POSTPROC_LONG_LEN_HRS: 1
FCST_LEN_HRS_SPINUP: 1
FCST_LEN_HRS_CYCLES: []
As you suggest, this one should not be included. A proposed change to the current solution for AQM is being discussed in Issue #709.
RRFSE_NWGES: '{{ OPSROOT }}/nwges'
RRFSE_NWGES_BASEDIR: '{{ RRFSE_NWGES }}'
ENSCTRL_NWGES: '{{ OPSROOT }}/nwges'
ENSCTRL_NWGES_BASEDIR: '{{ ENSCTRL_NWGES}}'
Putting in a clean implementation now is preferred. A main goal of the RRFS_dev1 merge is to take care of the refactoring needed to address some of the accidental complexities that have snuck in over time.
START_TIME_CONVENTIONAL: 00:40:00
START_TIME_NSSLMOSIAC: 00:45:00
START_TIME_LIGHTNINGNC: 00:45:00
START_TIME_PROCSMOKE: 00:45:00
The rocoto section of the config files is 100% configurable. I am definitely not suggesting that users edit any of the parm/wflow files. The sections defined in those parm/wflow files are completely configurable by the user in their own user config. There is no need to add the sections to the config_defaults.yaml file.
#-----------------------------------------------------------------------
#
LBCS_SEARCH_HRS: 6
EXTRN_MDL_LBCS_SEARCH_OFFSET_HRS: 0
Please see my YAGNI argument above. It is a standard, industry-wide best practice, and it does apply in this case.
DESCRIPTION OF CHANGES:
Addresses issue #654.
config_defaults.yaml
- New directory names NWGES, RRFSE_NWGES, ENSCTRL_COMIN, etc., and TAG.
- New workflow control flags DO_SPINUP, DO_DACYCLE, DO_SOIL_ADJUST.
- RRFS_dev1 defines a bunch of new cycles besides forecast, initial, etc., so those are added too.
- Changes to generate_FV3LAM_wflow.py needed for creating input namelist files for DA cycles.

I think this PR is priority number 1, since all tasks will depend on it, and it can serve as a central place to coordinate workflow variables, common task variables, etc.
Type of change
TESTS CONDUCTED:
RRFS conus 3km test case as run in PR [develop] Incorporate RRFS_dev1 workflow to SRW and import most tasks #540.
DEPENDENCIES:
DOCUMENTATION:
ISSUE:
CHECKLIST
LABELS (optional):
A Code Manager needs to add the following labels to this PR:
CONTRIBUTORS (optional):