Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SEEPS Quality Assurance (QA) code review #2882

Closed
1 of 15 tasks
mpm-meto opened this issue May 9, 2024 · 10 comments · Fixed by #2987
Closed
1 of 15 tasks

SEEPS Quality Assurance (QA) code review #2882

mpm-meto opened this issue May 9, 2024 · 10 comments · Fixed by #2987
Assignees
Labels
alert: NEED ACCOUNT KEY Need to assign an account key to this issue component: code cleanup Code cleanup and maintenance issue MET: Grid-to-Grid Verification MET: Grid-to-Point Verification requestor: UK Met Office United Kingdom Met Office required: FOR OFFICIAL RELEASE Required to be completed in the official release for the assigned milestone type: bug Fix something that is not working
Milestone

Comments

@mpm-meto
Copy link

mpm-meto commented May 9, 2024

Following the addition of the SEEPS code to PointStat and GridStat a science quality assurance (QA) review of the code needs to be carried out.

Met Office will conduct this code review. Changes implemented by DTC.

Assignee

  • Select engineer(s) or no engineer required
  • Select scientist(s) or no scientist required

Checklist

See the METplus Workflow for details.

  • Complete the issue definition above, including the Time Estimate and Funding Source.
  • Fork this repository or create a branch of main_<Version>.
    Branch name: bugfix_<Issue Number>_main_<Version>_<Description>
  • Fix the bug and test your changes.
  • Add/update log messages for easier debugging.
  • Add/update unit tests.
  • Add/update documentation.
  • Push local changes to GitHub.
  • Submit a pull request to merge into main_<Version>.
    Pull request: bugfix <Issue Number> main_<Version> <Description>
  • Define the pull request metadata, as permissions allow.
    Select: Reviewer(s) and Development issue
    Select: Milestone as the next bugfix version
    Select: Coordinated METplus-X.Y Support project for support of the current coordinated release
  • Iterate until the reviewer(s) accept and merge your changes.
  • Delete your fork or branch.
  • Complete the steps above to fix the bug on the develop branch.
    Branch name: bugfix_<Issue Number>_develop_<Description>
    Pull request: bugfix <Issue Number> develop <Description>
    Select: Reviewer(s) and Development issue
    Select: Milestone as the next official version
    Select: MET-X.Y.Z Development project for development toward the next official release
  • Close this issue.
@mpm-meto mpm-meto added type: bug Fix something that is not working alert: NEED MORE DEFINITION Not yet actionable, additional definition required alert: NEED ACCOUNT KEY Need to assign an account key to this issue alert: NEED CYCLE ASSIGNMENT Need to assign to a release development cycle labels May 9, 2024
@mpm-meto mpm-meto self-assigned this May 9, 2024
@mpm-meto
Copy link
Author

mpm-meto commented May 9, 2024

QA review 99% complete, and almost ready to hand over code changes. Pull request will be a combination of bug fixes and code tidy/clean up to make it less ambiguous. One of the key issues has been that this code has been translated between so many different languages, some of which are row-dominant/major and others are column-dominant/major. In these situations using matrix referencing becomes highly ambiguous and some issues with data being used in a (wrongly) transposed form were found.

@JohnHalleyGotway
Copy link
Collaborator

JohnHalleyGotway commented May 9, 2024

Recommend serving up the gridded SEEPS climatology data via this DTCenter Zenodo community. @mpm-meto and @RachelNorth, please sign into Zenodo and send me you usernames so that I can add you as curators to the DTCenter community.

Once this data is posted to Zenodo, we'll need to update the MET User's Guide with instructions for finding/using it.

@jprestop
Copy link
Collaborator

jprestop commented May 9, 2024

Will also want to test with data once accessible; @hsoh-u worked on SEEPS previously and should be able to help.

@JohnHalleyGotway JohnHalleyGotway added the required: FOR OFFICIAL RELEASE Required to be completed in the official release for the assigned milestone label May 15, 2024
@mpm-meto
Copy link
Author

We don't have permissions to create a branch. Please can this be fixed or someone create a branch for us?

@mpm-meto
Copy link
Author

Rachel and I have found a way to edit using a fork. This creates a branch but it does create two branches so we're not editing the same branch. The branches will need to be merged. We don't want to create extra work but we don't know enough about git (yet) to know how to do that.

RachelNorth added a commit to RachelNorth/MET that referenced this issue Jul 29, 2024
Changes to variable names to make them less ambiguous and bug fixes associated with SEEPS QA in dtcenter#2882
@JohnHalleyGotway
Copy link
Collaborator

@mpm-meto and @RachelNorth, please find the results of our work today in this feature branch on the MET repo:
https://github.com/dtcenter/MET/tree/feature_2882_seeps_qa

As a reminder, it currently fails to compile in pair_data_point.cc, as shown below:

pair_data_point.cc: In member function 'void VxPairDataPoint::add_point_obs(float*, const char*, const char*, unixtime, const char*, float*, Grid&, const char*, const DataPlane*)':
pair_data_point.cc:1569:49: error: cannot convert 'double' to 'SeepsScore*' in assignment
 1569 |                seeps = pd[i][j][k].compute_seeps(hdr_sid_str, fcst_v, obs_v, hdr_ut);
      |                        ~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      |                                                 |
      |                                                 double

@JohnHalleyGotway JohnHalleyGotway removed the alert: NEED CYCLE ASSIGNMENT Need to assign to a release development cycle label Jul 29, 2024
@JohnHalleyGotway
Copy link
Collaborator

@RachelNorth and @mpm-meto, this is a reminder about 2 topics. Based on our recent work together, it looks like we'll be modifying the output column names in the SEEPS line type, replacing S12, S13, and so on with more descriptive names.

If the PR for this work, does change those column names...

  1. Remember to update them in this table of the documentation. That can be found in the docs/Users_Guide/point_stat.rst.
  2. Remember to create a new METdataio enhancement issue to update the METdataio schema to handle the new column names... and rename columns in existing databases.

@bikegeek, I'm tagging you here to put this on your radar.

JohnHalleyGotway added a commit that referenced this issue Aug 14, 2024
…d revisiting the volume of SEEPS-related Debug log messages and reducing them once its fully tested.
@JohnHalleyGotway
Copy link
Collaborator

JohnHalleyGotway commented Aug 14, 2024

@mpm-meto and @RachelNorth, FYI, I got this feature branch compiling again with the commit listed right above this comment. I did merge the latest changes from the develop into this branch and manually resolved a few conflicts. I'm 85% certain it's doing what you want it to do, but please do review and test further!

That commit includes a lot of standardizing of debug log messages as well. Although, once you're satisfied with the functionality, I'd recommend removing some of them... since there are so many. It's good to have useful debugging, but each one slows down the code slightly. Even if it's not printed by the logger, it still takes time to construct the message, pass it to the logger, and have the logger decide whether or not to print it.

I did also manually trigger this testing.yml GitHub action run. It compiled the code and ran all the unit tests, one of which failed with:

WARNING: 
WARNING: 
WARNING: get_nc_var(var_name) --> The variable "odfl_00" does not exist!
WARNING: 
WARNING: 
WARNING: get_nc_var(var_name) --> The variable "odfh_00" does not exist!
WARNING: 
WARNING: 
WARNING: get_nc_var(var_name) --> The variable "olfd_00" does not exist!
WARNING: 
WARNING: 
WARNING: get_nc_var(var_name) --> The variable "olfh_00" does not exist!
WARNING: 
WARNING: 
WARNING: get_nc_var(var_name) --> The variable "ohfd_00" does not exist!
WARNING: 
WARNING: 
WARNING: get_nc_var(var_name) --> The variable "ohfl_00" does not exist!
WARNING: 
ERROR  : 
ERROR  : SeepsClimoGrid::read_seeps_climo_grid() -> Did not get odfl_00 (s12)
ERROR  : 

So clearly, I we need to update the variable names in ${MET_TEST_INPUT}/climatology_data/seeps/PPT24_seepsweights_grid.nc. I'll go work on that now.

@robdarvell hopefully the following commands will help in getting this branch compiled there:

# I like including the branch name in the directory (as below), but its not required
git clone https://github.com/dtcenter/MET MET-feature_2882_seeps_qa
cd MET-feature_2882_seeps_qa
git checkout feature_2882_seeps_qa
# Confirm that your compilation environment is set
printenv | grep MET_
# Run configure, adding whatever options you normally use. Below, I'm compiling all the options (--enable-all) in the current directory (--prefix)
./configure --prefix=`pwd` --enable-all
make install test >& make.log &
tail -f make.log
# Type CTRL-C to exit the tail

Then @mpm-meto and @RachelNorth will make edits as needed in the source code.

  • Run git status to see what your current modifications as compared to the contents of the repo.
  • Run git diff /path/to/modified/file to see the actual differences line-by-line.
  • Run git diff origin/develop --name-only to see which of your local files differ from those in the develop branch.
  • Run git diff origin/develop /path/to/modified/file to see those actual differences line-by-line.
  • Run git add /path/to/modified/file to stage a file to be committed.
  • Run git commit -m "Per #2882, change <add details here>" to commit your staged changes with a message describing them.
  • Run git push to push the commits from your local repository back up to GitHub.

@JohnHalleyGotway JohnHalleyGotway self-assigned this Aug 14, 2024
@JohnHalleyGotway JohnHalleyGotway removed the alert: NEED MORE DEFINITION Not yet actionable, additional definition required label Aug 14, 2024
@JohnHalleyGotway
Copy link
Collaborator

JohnHalleyGotway commented Aug 14, 2024

@mpm-meto and @RachelNorth, FYI, I ran the following ncrename command to rename the SEEPS climo variables to use the new names:

# Create a new version with v12.0 in the filename.
# Once this feature branch is merged into develop, we can replace the existing PPT24_seepsweights_grid.nc with the v12.0 version.
ncrename -h -v s12_00,odfl_00 -v s13_00,odfh_00 \
   -v s21_00,olfd_00 -v s23_00,olfh_00 \
   -v s31_00,ohfd_00 -v s32_00,ohfl_00 \
   PPT24_seepsweights_grid.nc PPT24_seepsweights_grid_v12.0.nc

The result can be found in:
https://dtcenter.ucar.edu/dfiles/code/METplus/MET/MET_unit_test/unit_test/climatology_data/seeps/PPT24_seepsweights_grid_v12.0.nc

I also re-triggered another regression test run for the MET-feature_2882_seeps_qa branch. That compiled the MET code, saw that there's a new timestamp on the input data tarfile, automatically pulled the updated inputs, ran all the unit tests, and then diffed against our previous output.

The good news is that all the tests ran. However some differences were flagged, as can be seen by downloading this diff artifact tar file.

You'll find files in there with the TRUTH and OUTPUT suffixes. Spot checking some, I see that some SEEPS values that were non-zero are now very close to zero. Based on the diffs I'm seeing, I'm guessing more work is needed.

For example, comparing grid_stat/grid_stat_SEEPS_000000L_20211202_000000V_seeps_TRUTH.txt and _OUTPUT.txt, I see:

musial6:diff johnhg$ cat grid_stat/grid_stat_SEEPS_000000L_20211202_000000V_seeps_TRUTH.txt | cut -c272-600
TOTAL  S12        S13      S21         S23      S31      S32       PF1     PF2     PF3     PV1     PV2     PV3      MEAN_FCST MEAN_OBS SEEPS
575576 6048.16295 24.63754 21215.92302 22.81579 2369.768 467.28489 0.61274 0.27837 0.10888 0.75553 0.21359 0.030882   3.55237  0.74543 0.59112
musial6:diff johnhg$ cat grid_stat/grid_stat_SEEPS_000000L_20211202_000000V_seeps_OUTPUT.txt | cut -c272-600
TOTAL  S12         S13        S21         S23         S31         S32         PF1     PF2     PF3     PV1     PV2     PV3      MEAN_FCST MEAN_OBS SEEPS
575576 2.3715e-322 6.926e-310 4.6542e-310 3.4804e-317 4.6542e-310 4.6542e-310 0.61274 0.27837 0.10888 0.75553 0.21359 0.030882   3.55237  0.74543 0.62105

Those numbers differ a lot.

And I also assume that we need to modify the actual output columns names of S12, S13, and so on?

Please take a look and let me know what the next steps are.

Would you like me to change the output columns names on this feature branch?

JohnHalleyGotway added a commit that referenced this issue Aug 15, 2024
….nc file name. Rename as _v12.0.nc for the updated version with the new names so that the existing regressions tests and nightly builds for main_v11.1 and develop continue to work. We can remove the _v12.0 once this feature branch is merged into develop but for the time being, we need both versions to exist.
@JohnHalleyGotway
Copy link
Collaborator

JohnHalleyGotway commented Sep 20, 2024

When meeting on 9/20/24, @mpm-meto confirmed that @JohnHalleyGotway should proceed with the changes to the output column names described above and submit a PR to include these changes in the MET-12.0.0-beta6 development release.

@mpm-meto will coordinate with @RachelNorth to confirm that no additional changes are needed at this time. However, they will plan to test this functionality in MET-12.0.0-beta6 for accuracy and will submit a bugfix issue if they encounter any unexpected behvior.

JohnHalleyGotway added a commit that referenced this issue Sep 20, 2024
… to the more descriptive ODFL, ODFH, OLFD, OLFH, OHFD, OHFL names.
JohnHalleyGotway added a commit that referenced this issue Sep 20, 2024
… like the SEEPS score itself so that they're handled in a consistent manner. Note however that it's hard-coded to NOT write the weighted means/score, only the unweighted ones.
JohnHalleyGotway added a commit that referenced this issue Sep 20, 2024
JohnHalleyGotway added a commit that referenced this issue Sep 24, 2024
JohnHalleyGotway added a commit that referenced this issue Sep 24, 2024
…loat and double inputs to satisfy compiler pb2nc compiler warnings.
@JohnHalleyGotway JohnHalleyGotway linked a pull request Oct 8, 2024 that will close this issue
17 tasks
JohnHalleyGotway added a commit that referenced this issue Oct 9, 2024
…on number from the gridded SEEPS climo file name ci-skip-all
JohnHalleyGotway added a commit that referenced this issue Oct 9, 2024
* Update seeps.h

Change variable names to reduce ambiguity for interpretation and aid useability.

* Update seeps.cc

Pull through variable name changes and renaming of functions to aid legibility and clarity. Introduced some additional debug print statements.

* Update grid-stat.rst

Add documentation about the location of the gridded climatology files for SEEPS and which environment variable to use.

* Replace read_seeps_scores() with get_seeps_climo_grid()

* Manually merging Rachel's patch-1 changes.

* Getting close to getting these seeps changes to compile. But it's failing in pair_data_point.cc

* Per #2882, get branch feature_2882_seeps_qa compiling again. Recommend revisiting the volume of SEEPS-related Debug log messages and reducing them once its fully tested.

* Per #2882, need to update the handling of the PPT24_seepsweights_grid.nc file name. Rename as _v12.0.nc for the updated version with the new names so that the existing regressions tests and nightly builds for main_v11.1 and develop continue to work. We can remove the _v12.0 once this feature branch is merged into develop but for the time being, we need both versions to exist.

* Per #2882, rename the SEEPS columns from S12, S13, S21, S23, S31, S32 to the more descriptive ODFL, ODFH, OLFD, OLFH, OHFD, OHFL names.

* Per #2882, update SEEPS details

* Per #2882, store and report the weighted mean fcst and mean obs, just like the SEEPS score itself so that they're handled in a consistent manner. Note however that it's hard-coded to NOT write the weighted means/score, only the unweighted ones.

* Per #2882, change SEEPS debug log levels and correct the storage of mean_fcst and mean_obs values.

* Per #2882, correct SEEPS column name lookups

* Per #2882, call is_bad_data() instead of is_eq(..., -9999.0) to get rid of compiler warning message.

* Per #2882, add 2 more variations of the is_eq() function with mixed float and double inputs to satisfy compiler pb2nc compiler warnings.

* Per #2882, switch from dynamically allocated arrays to std::vector

* Per #2882, enhance Stat-Analysis to write the SEEPS line type to an output .stat file.

* Per #2882, update the aggregated seeps computation to use better-initialized vectors.

* Per #2882, resolve a few more SonarQube code smells.

* Per #2882, now that this PR is ready to merge, remove the v12.0 version number from the gridded SEEPS climo file name ci-skip-all

---------

Co-authored-by: mpm-meto <[email protected]>
JohnHalleyGotway added a commit that referenced this issue Oct 11, 2024
* Per #2887, update NumArray::vals() to return a reference to the vector rather a pointer to doubles.

* Per #2887, switch over the whole ContingencyTable class heirarchy from storing integer counts to storing double-precision weights.

* Add ContingencyTable::is_integer() member function to check whether the table contains all integers

* Per #2887, update parse_stat_line.cc to get it to compile after changing PCT to store thresholds in a std::vector.

* Per #2887, update PCTInfo::clear() logic.

* Per #2887, update ctc_by_row() logic to create reproducible results with the develop branch.

* Per #2887, update logic of define_prob_bins() to add a final >=1.0 threshold if needed. While ==0.1 works fine, I found that ==0.05 did not because the last >=1.0 threshold was missing likely do to floating point precision issues. This change should fix that problem.

* Per #2887, update roc_auc() function to match the develop branch

* Per #2887, fix bug if computation of far()

* Per #2887, replaced all ==0 integer equality checks with calls to is_eq() instead and fix a couple of equations to snuff out diffs in some CTS statistics.

* Per #2887, address some of the 34 SonarQube code smells flagged for this PR. Note that the compute_ci.h/.cc changes are necessary and good since we should be computing CI's using doubles instead of integer counts.

* Per #2887, update run_sonarqube.sh to specify the target CXX standard as 11. The hope is that that will limit the findings to only those features available in the C++11 standard.

* Per #2887, update to SonarQube version 6.1.0.4477 released on 6/27/2024.

* Per #2887, updating build_met_sonarqube.sh to specify --std=c++11 since c++17 is used by default

* Per #2887, swap in a much simpler implementation of the ORSS statistic to match the equation listed in the MET User's Guide.

* Per #2887, update grid_stat and library code to actually apply the grid_weight_flag settings to the computation of contingency table counts and statistics.

* Per #2887, fix the handling of bad data in the ORSS equation.

* Per #2887, add Npairs member to the ContingencyTable class, eliminate the n() accessor function, and carefully replace references to n() with n_pairs() for the integer number of matched pairs or total() with the double-precision sum of the weights.

* Per #2887, reset Npairs = 0 for ContingencyTable::zero_out()

* Per #2883, need to call set_n_pairs() in a few spots to set ECLV TOTAL column correctly ci-run-unit

* Per #2887, call set_n_pairs() when aggregating PCT data in Series-Analysis ci-run-unit

* Per #2887, update stat_analysis to parse the TOTAL column for the PCT and MCTC line types.

* Pet #2882, call set_n_pairs() after set_size() ci-run-unit

* Per #2887, reconfigure existing Ensemble-Stat unit test to request probabilistic output to see that it's impacted by the grid_weight_flag setting.

* Per #2887, update Ensemble-Stat test to provide climo stdev data

* Per #2887, add grid_weight_flag to the list of config options for Grid-Stat and Ensemble-Stat.

* Per #2887, disable FHO output if grid_weight_flag != NONE.

* Per #2887, revise the existing unit_grid_weight.xml unit tests for Grid-Stat to write CTC/CTS/MCTC/MCTS output and for the DESC column to be populated to indicate the type of grid weighting that was applied.

* Per #2887, relatively small changes to drive down SonarQube code smells. Also, switch from total() to n_pairs() when computing confidence intervals.

* Per #2887, more SonarQube tweaks

* Per #2887, more SonarQube tweaks.

* Per #2887, more SonarQube tweaks.

* Per #2887, whitespace only changes.

* Per #2287, fix path the seeps climo grid.

* Per #2887, update the grid_weight_flag documentation.

* Per #2887, tweak the wording.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
alert: NEED ACCOUNT KEY Need to assign an account key to this issue component: code cleanup Code cleanup and maintenance issue MET: Grid-to-Grid Verification MET: Grid-to-Point Verification requestor: UK Met Office United Kingdom Met Office required: FOR OFFICIAL RELEASE Required to be completed in the official release for the assigned milestone type: bug Fix something that is not working
Projects
Status: 🏁 Done
Development

Successfully merging a pull request may close this issue.

3 participants