Add workflow to run model with supplied nudging tendency #215
Conversation
Some over-arching points:
- There's some configuration currently provided using global constants which should probably be provided in a yaml file somewhere. This workflow could take them in as command-line arguments, or via the fv3config.yml.
- The "nudging" runfile seems to actually be a "forcing" runfile. I would suggest refactoring the names to reflect that you are forcing a model using saved nudging data, rather than nudging the model against a reference.
- Some of the helper methods around config dictionaries should probably be added to fv3config rather than fv3net.
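For instance, the global constants that appear later in this workflow's scripts could be grouped under a custom key in the fv3config.yml. The key names and structure below are only illustrative, not an actual fv3config schema:

```yaml
# Hypothetical fv3config.yml fragment: workflow settings moved out of
# Python global constants. Key names here are illustrative only.
nudging_workflow:
  root_url: gs://vcm-ml-data/2020-03-30-learned-nudging-FV3GFS-runs
  latlon_variables: [u_dt_nudge, v_dt_nudge, t_dt_nudge, ps_dt_nudge, delp_dt_nudge]
```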
fv3net/runtime/config.py
Outdated
def get_config():
    """Return fv3config dictionary"""
    with open("fv3config.yml") as f:
        config = yaml.safe_load(f)
    return config
This should be added to fv3config as a `config_from_yaml` routine to match `config_to_yaml`. I would imagine `fv3config.yml` should be a global constant somewhere, possibly in fv3net.runtime.
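A minimal sketch of the suggested pairing, assuming both routines simply wrap PyYAML (the real fv3config implementations may differ):

```python
import yaml  # PyYAML


def config_to_yaml(config: dict, path: str) -> None:
    """Write an fv3config configuration dictionary to a YAML file."""
    with open(path, "w") as f:
        yaml.safe_dump(config, f)


def config_from_yaml(path: str) -> dict:
    """Load an fv3config configuration dictionary from a YAML file."""
    with open(path) as f:
        return yaml.safe_load(f)
```

With a helper like this, the `get_config` function above reduces to a single call that loads the configuration from a known path.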
I added a config_to_yaml function to fv3config (will put up a PR in a moment) and added a TODO to refactor to use that once fv3net points to an updated fv3config version.
Can you just bump the patch version on fv3config also, and point fv3net to the updated fv3config? You could do the version bumping and new features in the same PR, as described in the release instructions. That would avoid having to keep track of this and fix it later.
Done.
fv3net/runtime/config.py
Outdated
def get_timestep():
    """Return model timestep in seconds"""
    return get_namelist()["coupler_nml"]["dt_atmos"]
This should be added as a helper method in fv3config (in derive).
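A hedged sketch of what such a helper could look like in fv3config's derive module, taking the config dictionary as an argument instead of reading it from disk (the function fv3config actually provides may differ):

```python
def get_timestep(config: dict) -> int:
    """Return the model timestep in seconds from an fv3config dictionary.

    Assumes the fv3config layout in which the Fortran namelist is stored
    under the "namelist" key of the configuration dictionary.
    """
    return config["namelist"]["coupler_nml"]["dt_atmos"]
```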
Added a function to fv3config, and put in TODO to refactor that usage.
Refactored workflow to use fv3config version of get_timestep().
Is it a workflow which has issues with this, or a package like fv3net or vcm? If it's a package, can this also be added to the
Noah ran into this issue in the new one-step workflow, and I ran into it in this workflow. These both use the prognostic run image, so just pinning the version there should handle it. Looks like it's already been pinned in fv3net's
FYI, I think the bug you are referring to has been temporarily accommodated in the latest release of xarray (0.15.1), pydata/xarray#3764, and will ultimately be fixed in pandas in their next release (1.1.0), pandas-dev/pandas#32905.
Thanks Oli. This looks like a lot of work, and overall quite clean.
Apart from some minor comments and FYIs below, my main concern is the number of different scripts used by this workflow. There are scripts for preparing, submitting, and preprocessing the jobs, and it is not 100% clear what the entry point of this workflow is, since it is not documented in the README. My preference would be that running `make -C workflows/run_with_learned_nudging` from the fv3net root would do everything that needs to happen (including preprocessing) for this job to complete successfully.
Also please add some explanation to HISTORY.rst.
@@ -106,6 +106,23 @@ sources:
     access: read_only
     urlpath: "gs://vcm-ml-data/2020-02-25-additional-november-C3072-simulation-C384-diagnostics/atmos_8xdaily_C3072_to_C384.zarr"

+  GFS_analysis_T85_2015_2016:
Thanks for adding this. I think this will be very helpful for us.
@@ -63,9 +62,9 @@ def _parse_categories(diagnostic_categories, rundir):
     return diagnostic_categories


-def _parse_diagnostic_dir(diagnostic_dir, rundir):
+def _get_diagnostic_dir(diagnostic_dir, rundir):
nit: This function does a very general thing, but has an overly specific name. I would just move this logic into the `run` function:

    diagnostic_dir = rundir if args.diagnostic_dir is None else args.diagnostic_dir
Done.
]


def _get_and_write_nudge_files_description_asset(
This function has an "and" in the name. I would move the listing and writing into two separate function calls in `update_config_for_nudging`.
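A sketch of the suggested split, with illustrative names that do not match the actual fv3net code: the listing and the writing become two single-purpose helpers composed inside `update_config_for_nudging`.

```python
def get_nudge_files_description(nudge_file_names):
    """Build the text listing of nudge files, one file name per line."""
    return "".join(f"{name}\n" for name in nudge_file_names)


def write_nudge_files_description(description, path):
    """Write the nudge-files listing to disk as a run-directory asset."""
    with open(path, "w") as f:
        f.write(description)


def update_config_for_nudging(config, nudge_file_names, description_path):
    """Illustrative composition: list the nudge files, then write the asset.

    The config handling itself is elided here; this only shows the
    two-call structure suggested in the review.
    """
    write_nudge_files_description(
        get_nudge_files_description(nudge_file_names), description_path
    )
    return config
```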
Done.
@@ -1,7 +1,7 @@
 ## Diagnostics-to-zarr workflow
 This workflow takes a path/url to a run directory as an input and saves zarr stores
 of the diagnostic model output to a specified location. This workflow requires a
-specific xarray version (0.14.0) and so to run locally, one must ensure your
+specific xarray version (0.15.0) and so to run locally, one must ensure your
I don't think we should mention this in the README, since it is bound to get out of sync, and you already have it in the setup.py.
Removed the reference to a particular version number.
@@ -0,0 +1,5 @@
+This workflow allows an external nudging tendency be applied to FV3GFS runs.
Can you add a title at the `##` level so that this README is correctly interpreted by the sphinx docs?
Likewise, please mention the path `workflows/run_with_learned_nudging` so people will know how to get to this folder when looking at the sphinx docs.
Done.
@@ -0,0 +1,28 @@
+IMAGE = us.gcr.io/vcm-ml/prognostic_run:mean_nudging
It is not clear where this docker image is built from this PR and what is in it. Are you planning to adjust this to point at a versioned tag of the prognostic_run image before merging?
Okay, added another Dockerfile in `fv3net/docker`. I didn't add this new docker image to the `build_images` rule in fv3net's Makefile, since it's not used by any of the steps in the main end-to-end pipeline. There are instructions in the README about building the image, though.
# constants
ROOT_URL=gs://vcm-ml-data/2020-03-30-learned-nudging-FV3GFS-runs
LATLON_VAR_LIST=DLWRFsfc,DSWRFsfc,DSWRFtoa,LHTFLsfc,PRATEsfc,SHTFLsfc,ULWRFsfc,ULWRFtoa,USWRFsfc,USWRFtoa,TMP2m,TMPsfc,SOILM,PRESsfc,ucomp,vcomp,temp,sphum,ps_dt_nudge,delp_dt_nudge,u_dt_nudge,v_dt_nudge,t_dt_nudge
There are some improvements in #153 that could avoid explicitly listing these in the future.
Good to know.
if [ "$DO_REGRID" = true ]; then
    for RUN in $RUNS; do
        # regrid certain monthly-mean variables to lat-lon grid
        argo --cluster $ARGO_CLUSTER submit workflows/fregrid_cube_netcdfs/pipeline.yaml \
If you pass `-w` to argo submit, this command will block until the job completes.
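As a sketch, the suggested change to the command above (the `-w`/`--wait` flag is from the argo CLI; the other arguments are copied from the script):

```shell
# Adding -w makes "argo submit" wait until the workflow completes
# before returning, so a later step can depend on its output.
argo --cluster $ARGO_CLUSTER submit -w workflows/fregrid_cube_netcdfs/pipeline.yaml
```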
Okay. `postprocess.sh` is the last step, so no need to block, but good to know.
config_url = os.path.abspath(config_url)
config = prepare_config(template, base_config, args.nudge_label, config_url)
fs = vcm.cloud.get_fs(config_url)
fs.mkdirs(config_url, exist_ok=True)
FYI, the new run_k8s can be passed the config dictionary directly, without needing it to be uploaded first.
Got it, thanks for the reminder. Already uploading other assets, so I'll leave this be.
@@ -0,0 +1,47 @@
+#!/bin/bash
Can you include this postprocessing as a dependency in the makefile someplace? It is currently not clear how it fits into the rest of the pipeline. Ideally, typing `make` should result in everything being run that needs to be run.
Added some comments to the README as we discussed.
Thanks for the heads up, Spencer!
LGTM, thanks for the changes!
LGTM. Thanks!
-This workflow allows an external nudging tendency be applied to FV3GFS runs.
+## Run with learned nudging workflow
+
+This workflow (in `workflows/run_with_learned_nudging`) allows an external nudging
Thanks for improving the readme.
Major changes

- Code from the `single_fv3gfs_runs` workflow has now been moved to `fv3net/pipelines/kube_jobs` and tests have been added. This code move goes against our new guideline (I did the refactor before that guideline was discussed). I think the correct ultimate solution is to move the entire `kube_jobs` set of modules to an external package, as they are now being used by four workflows (one_step_jobs, prognostic_c48_run, single_fv3gfs_run, and the new run_with_learned_nudging). I'd rather not do that now, since I don't want to mess with the core ML workflow steps.

Minor changes

- Added `get_protocol` to `vcm.cloud.__init__.py`
- Updated the `diagnostics-to-zarr` workflow so that by default it saves Zarr stores of diagnostic output in the rundir of the given experiment instead of in the parent of the rundir.