[develop] Bug fix to support the %H format in METplus via printf. #1102

Merged

Conversation

gsketefian
Collaborator

@gsketefian gsketefian commented Jul 9, 2024

DESCRIPTION OF CHANGES:

This bug was encountered when verifying forecast output whose file names contain a 2-digit forecast hour. Specifying the METplus format %H to obtain a 2-digit forecast hour in the workflow/verification configuration variable FCST_FN_TEMPLATE (and others) causes an error in the shell script eval_METplus_timestr_tmpl.sh, because bash's printf utility does not support the %H format. This PR fixes that error using an approach similar to the one already used for the %HHH format, which yields 3-digit hours.
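
For readers unfamiliar with the underlying problem, here is a minimal sketch of the general idea, not the actual code in eval_METplus_timestr_tmpl.sh. Passing %H or %HHH straight to printf fails because they are not valid printf conversions, so the METplus-style format has to be mapped to a zero-padded integer format such as %02d or %03d first. The function names `metplus_hour_fmt_to_printf` and `format_fcst_hour` are hypothetical and exist only for this illustration.

```bash
#!/usr/bin/env bash
# Illustrative sketch only: these helpers are hypothetical and are NOT taken
# from eval_METplus_timestr_tmpl.sh.

# Map a METplus-style hour format to an equivalent printf integer format.
metplus_hour_fmt_to_printf() {
  local metplus_fmt="$1"
  case "${metplus_fmt}" in
    "%H")   printf '%s\n' '%02d' ;;   # 2-digit, zero-padded hours
    "%HHH") printf '%s\n' '%03d' ;;   # 3-digit, zero-padded hours
    *)
      printf 'Unsupported hour format: %s\n' "${metplus_fmt}" >&2
      return 1
      ;;
  esac
}

# Format an integer number of hours using a METplus-style hour format.
format_fcst_hour() {
  local metplus_fmt="$1"
  local hours="$2"
  local printf_fmt
  printf_fmt="$(metplus_hour_fmt_to_printf "${metplus_fmt}")" || return 1
  # Strip leading zeros so that inputs such as "08" are not read as octal.
  hours="${hours#"${hours%%[!0]*}"}"
  hours="${hours:-0}"
  printf "${printf_fmt}\n" "${hours}"
}
```

With these definitions, `format_fcst_hour '%H' 6` would print `06` and `format_fcst_hour '%HHH' 6` would print `006`; any other format string is rejected with a non-zero exit status.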

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

TESTS CONDUCTED:

  • hera.intel
  • orion.intel
  • hercules.intel
  • cheyenne.intel
  • cheyenne.gnu
  • derecho.intel
  • gaea.intel
  • gaeac5.intel
  • jet.intel
  • wcoss2.intel
  • NOAA Cloud (indicate which platform)
  • Jenkins
  • fundamental test suite
  • comprehensive tests (specify which if a subset was used)

The full set of WE2E tests involving vx were run on Hera. These are:

MET_ensemble_verification
MET_ensemble_verification_only_vx
MET_ensemble_verification_only_vx_time_lag
MET_ensemble_verification_winter_wx
MET_verification
MET_verification_only_vx
MET_verification_winter_wx

All passed.

DEPENDENCIES:

None needed.

CHECKLIST

  • My code follows the style guidelines in the Contributor's Guide
  • I have performed a self-review of my own code using the Code Reviewer's Guide
  • I have commented my code, particularly in hard-to-understand areas
  • My changes need updates to the documentation. I have made corresponding changes to the documentation
  • My changes do not require updates to the documentation (explain).
  • My changes generate no new warnings
  • New and existing tests pass with my changes
  • Any dependent changes have been merged and published

LABELS (optional):

A Code Manager needs to add the following labels to this PR:

  • Work In Progress
  • bug
  • enhancement
  • documentation
  • release
  • high priority
  • run_ci
  • run_we2e_fundamental_tests
  • run_we2e_comprehensive_tests
  • Needs Cheyenne test
  • Needs Jet test
  • Needs Hera test
  • Needs Orion test
  • help wanted

CONTRIBUTORS (optional):

@willmayfield and @michelleharrold encountered this bug, and @mkavulich pinpointed the script it was originating from.

@gsketefian
Collaborator Author

@MichaelLueken I am running the vx WE2E tests on this now.

@MichaelLueken MichaelLueken changed the title Bug fix to support the %H format in METplus via printf. [develop] Bug fix to support the %H format in METplus via printf. Jul 9, 2024
@MichaelLueken MichaelLueken added the bug Something isn't working label Jul 9, 2024
@gsketefian
Collaborator Author

@MichaelLueken All the WE2E vx tests passed, and I noted that in the PR message. Thanks.

Collaborator

@MichaelLueken MichaelLueken left a comment


@gsketefian -

These changes look good to me! I ran the non-HPSS verification WE2E tests on Hercules and all tests successfully passed:

----------------------------------------------------------------------------------------------------
Experiment name                                                  | Status    | Core hours used 
----------------------------------------------------------------------------------------------------
MET_ensemble_verification_only_vx_20240710074823                   COMPLETE               1.97
MET_ensemble_verification_winter_wx_20240710074826                 COMPLETE             153.22
MET_verification_only_vx_20240710074828                            COMPLETE               0.59
MET_verification_20240710074830                                    COMPLETE              14.11
MET_ensemble_verification_2024071007483                            COMPLETE              29.84
----------------------------------------------------------------------------------------------------
Total                                                              COMPLETE             199.73

and the HPSS verification WE2E tests successfully passed on Hera:

----------------------------------------------------------------------------------------------------
Experiment name                                                  | Status    | Core hours used 
----------------------------------------------------------------------------------------------------
MET_ensemble_verification_only_vx_time_lag_20240710153519          COMPLETE               3.57
MET_verification_winter_wx_20240710153521                          COMPLETE              17.14
----------------------------------------------------------------------------------------------------
Total                                                              COMPLETE              20.71

Approving now.

@MichaelLueken MichaelLueken added the run_we2e_coverage_tests Run the coverage set of SRW end-to-end tests label Jul 10, 2024
@MichaelLueken
Collaborator

The Jenkins WE2E coverage tests successfully passed on all machines except Jet, where the testing phase was aborted for running longer than 8 hours. Before the tests were aborted, there were two failures: custom_ESGgrid and custom_ESGgrid_Great_Lakes_snow_8km.

Both failures appear to be due to Slurm/Node issues on the machine.

Tasks being allocated nodes and hanging until the walltime has passed:

slurmstepd: error: *** STEP 6229197.0 ON x625 CANCELLED AT 2024-07-11T04:00:24 DUE TO TIME LIMIT ***
slurmstepd: error: *** JOB 6229197 ON x625 CANCELLED AT 2024-07-11T04:00:24 DUE TO TIME LIMIT ***

Tasks not launching properly:

srun: error: timeout waiting for task launch, started 96 of 108 tasks
srun: StepId=6229110.0 aborted before step completely launched.
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
slurmstepd: error: *** STEP 6229110.0 ON x3 CANCELLED AT 2024-07-11T05:22:07 ***

Manual runs of the WE2E coverage tests were also launched on Jet yesterday. Test stages have no time-outs when run manually, so the tests ran through to completion. Three tests ultimately failed with the above errors. Once the resubmitted tests pass, this PR will be merged.

@gsketefian
Collaborator Author

@MichaelLueken Thanks for the update.

@MichaelLueken
Collaborator

The manual runs of the WE2E coverage tests successfully passed on Jet:

----------------------------------------------------------------------------------------------------
Experiment name                                                  | Status    | Core hours used
----------------------------------------------------------------------------------------------------
community_20240710185903                                           COMPLETE              17.92
custom_ESGgrid_20240710185904                                      COMPLETE             155.97
custom_ESGgrid_Great_Lakes_snow_8km_20240710185905                 COMPLETE              22.16
custom_GFDLgrid_20240710185907                                     COMPLETE               9.95
get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_nemsio_2021032018_202407  COMPLETE               9.29
get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_netcdf_2022060112_48h_20  COMPLETE              87.54
get_from_HPSS_ics_RAP_lbcs_RAP_20240710185909                      COMPLETE              16.11
grid_RRFS_AK_3km_ics_FV3GFS_lbcs_FV3GFS_suite_HRRR_20240710185910  COMPLETE             615.42
grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16_plot_20  COMPLETE              64.71
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2_20240  COMPLETE               6.93
grid_RRFS_CONUS_3km_ics_FV3GFS_lbcs_FV3GFS_suite_RRFS_v1beta_2024  COMPLETE             930.56
----------------------------------------------------------------------------------------------------
Total                                                              COMPLETE            1936.56

Moving forward with merging this work now.

@MichaelLueken MichaelLueken merged commit e5832d1 into ufs-community:develop Jul 12, 2024
3 of 5 checks passed
christinaholtNOAA added a commit to christinaholtNOAA/ufs-srweather-app that referenced this pull request Aug 12, 2024
commit b368974
Author: Christina.Holt <[email protected]>
Date:   Sat Jul 13 02:42:24 2024 +0000

    Linting.

commit 5502cdc
Author: Christina.Holt <[email protected]>
Date:   Fri Jul 12 22:58:49 2024 +0000

    This should no longer be an executable bash file.

commit 0a8cf97
Merge: 8b9eb07 e5832d1
Author: Christina.Holt <[email protected]>
Date:   Fri Jul 12 22:51:03 2024 +0000

    Merge branch 'develop' of https://github.com/ufs-community/ufs-srweather-app into to_yaml

commit 8b9eb07
Merge: 2ed368f f360da3
Author: Christina.Holt <[email protected]>
Date:   Fri Jul 12 22:47:59 2024 +0000

    Merge branch 'to_yaml' of https://github.com/ChristinaholtNOAA/ufs-srweather-app into to_yaml

commit 2ed368f
Author: Christina.Holt <[email protected]>
Date:   Fri Jul 12 22:47:04 2024 +0000

    Update docs for default values.

commit b31f6c7
Author: Christina.Holt <[email protected]>
Date:   Fri Jul 12 22:41:29 2024 +0000

    Applying suggested changes from review.

commit e5832d1
Author: gsketefian <[email protected]>
Date:   Fri Jul 12 06:57:23 2024 -0600

    [develop] Bug fix to support the %H format in METplus via printf. (ufs-community#1102)

    This bug was encountered when verifying forecast output that has a 2-digit forecast hour in its name. It turns out specifying the METplus format %H to obtain a 2-digit forecast hour in the workflow/verification configuration variable FCST_FN_TEMPLATE (and others) causes an error in the shell script eval_METplus_timestr_tmpl.sh because bash's printf utility does not support the %H format. This fixes that error using a similar approach to the %HHH format for obtaining 3-digit hours.

commit f360da3
Author: Christina Holt <[email protected]>
Date:   Fri Jun 28 08:24:21 2024 -0600

    Update jobs/JREGIONAL_MAKE_OROG

    Co-authored-by: Michael Lueken <[email protected]>

commit 241ef5d
Author: Christina.Holt <[email protected]>
Date:   Tue Jun 18 23:13:55 2024 +0000

    A bit of clean up

commit e242490
Author: Christina.Holt <[email protected]>
Date:   Tue Jun 18 16:50:48 2024 +0000

    Add nco section to be sourced always.

commit 253590f
Author: Christina.Holt <[email protected]>
Date:   Mon Jun 17 19:41:55 2024 +0000

    WIP

commit d2a0bc2
Author: Christina.Holt <[email protected]>
Date:   Tue Jun 11 02:16:32 2024 +0000

    Source from a function.

commit 874af4b
Author: Christina.Holt <[email protected]>
Date:   Tue Jun 11 02:16:10 2024 +0000

    Bump uwtools version.

commit 4ef89f6
Merge: 94d7970 81be59e
Author: Christina.Holt <[email protected]>
Date:   Tue Jun 11 01:46:17 2024 +0000

    Merge remote-tracking branch 'origin/develop' into to_yaml

commit 94d7970
Author: Christina.Holt <[email protected]>
Date:   Tue Jun 11 01:40:37 2024 +0000

    WIP

commit f08e3ef
Author: Christina.Holt <[email protected]>
Date:   Sat Apr 27 01:02:41 2024 +0000

    Boolified.

commit a6513cb
Merge: 0d86ab0 c7e093d
Author: Christina.Holt <[email protected]>
Date:   Tue Apr 23 12:45:01 2024 +0000

    Merge remote-tracking branch 'origin/develop' into to_yaml

commit 0d86ab0
Author: Christina.Holt <[email protected]>
Date:   Fri Apr 19 00:19:18 2024 +0000

    Fix all the subshell problems.

commit 4bec6b7
Author: Christina.Holt <[email protected]>
Date:   Fri Apr 19 00:12:14 2024 +0000

    WIP

commit a12f6fb
Author: Christina.Holt <[email protected]>
Date:   Fri Apr 19 00:10:10 2024 +0000

    WIP

commit 444f1f0
Author: Christina.Holt <[email protected]>
Date:   Thu Apr 18 23:45:42 2024 +0000

    WIP

commit 16da1ef
Author: Christina Holt <[email protected]>
Date:   Thu Apr 18 14:47:15 2024 -0600

    Cleaning up old functionality.

commit e5b28da
Author: Christina Holt <[email protected]>
Date:   Thu Apr 18 14:35:59 2024 -0600

    Ordering for sourced files.

commit 8496c68
Author: Christina Holt <[email protected]>
Date:   Thu Apr 18 12:29:56 2024 -0600

    Order jobs.

commit 38f9daa
Author: Christina.Holt <[email protected]>
Date:   Thu Apr 18 18:19:44 2024 +0000

    Order docs.

commit 446f6eb
Author: Christina.Holt <[email protected]>
Date:   Thu Apr 18 14:13:22 2024 +0000

    Finished up first round of ex-scripts

commit 59d07d4
Author: Christina.Holt <[email protected]>
Date:   Thu Apr 18 02:39:49 2024 +0000

    WIP

commit 87e549f
Author: Christina.Holt <[email protected]>
Date:   Thu Apr 11 23:03:43 2024 +0000

    Keep same environment behavior for AQM jobs.

commit 46c4805
Author: Christina.Holt <[email protected]>
Date:   Thu Apr 11 22:01:26 2024 +0000

    Add machine to command

commit 55b500b
Author: Christina.Holt <[email protected]>
Date:   Thu Apr 11 21:56:41 2024 +0000

    WIP

    Use UW-compliant YAML for var_defns.sh file.
    Update loading tasks script for not having all required variables in
    environment.
natalie-perlin pushed a commit to natalie-perlin/ufs-srweather-app that referenced this pull request Aug 15, 2024
Labels
bug Something isn't working run_we2e_coverage_tests Run the coverage set of SRW end-to-end tests