Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature #1283 mvmode mods #2569

Merged
merged 44 commits into from
Jun 16, 2023
Merged

Feature #1283 mvmode mods #2569

merged 44 commits into from
Jun 16, 2023

Conversation

davidalbo
Copy link
Contributor

Expected Differences

  • Do these changes introduce new tools, command line arguments, or configuration file options? [Yes]
    3 new config options: multivar_intensity, multivar_name, multivar_level. The latter 2 can be set independently for fcst and obs dictionaries.

  • Do these changes modify the structure of existing or add new output data types (e.g. statistic line types or NetCDF variables)? [No]
    But.. the output file organization (no more numbered subdirectories) and file naming has changed.


**Changes to be aware of that have not been 'vetted' by anyone or seem possibly incomplete, and are things the reviewer might want to consider before approving:

  1. Went with 0's inside superobjects for 'all intensities false' to represent superobjects without intensities instead of setting missing data everywhere inside and outside superobjects.
  2. Made up some replacements for when the config has level=(*,*). The output files names now include the levels. I replaced(*,*) with -X,X- in those output names, as the wildcard characters are problematic for file names. Do wildcards in levels make sense for multivariate mode? (The unit test for mvmode does have the wildcard levels, and does run and produce output with the wildcards).
  3. Disallowed many configuration settings for multivariate mode, which hopefully will not cause problems for users, such as quilting and engine merging. Also everything must be the same length (number of files (fcst and obs), multivar_logic, multivar_intensity, and fcst and obs field array lengths).
  4. the metplus changes to support the new config options was added by @georgemccabe but has not been tested on this version of mode.
  5. documentation is only partially updated to describe the changes**

Pull Request Testing

  • Describe testing already performed for these changes:
    Extensive testing on a test case provided by @hertneky . Unit tests have been run manually.

  • Recommend testing for the reviewer(s) to perform, including the location of input datasets, and any additional instructions:
    Have a look in /scratch/dave/mvmode_test_for_pull_request. In that directory two .bsh files can be run.
    run-multi.bsh should run a setup with intensities computed for the last 2 fields, with output going to subdirectory output.
    run-super.bsh should run a setup with no intensities, just superobject output, into subdirectory output-super


  • Do these changes include sufficient documentation updates, ensuring that no errors or warnings exist in the build of the documentation? [No] It might work, but not sure.

  • Do these changes include sufficient testing updates? [Not quite] More testing of traditional mode would be good, as would testing on a variety of multivar examples.

  • Will this PR result in changes to the test suite? [Yes]

    If yes, describe the new output and/or changes to the existing output:
    Unit test output file names have changed, and additional unit testing is now in place. The test is internal/test_unit/xml/unit_mode_multivar.xml, which now runs two instances of multivariate mode.
    New output files that should replace what were previously the output files for this test:
    mode_Fcst_ALPHA-X,X-_Obs_ALPHA-X,X-_300000L_20120410_180000V_060000A_cts.txt
    mode_Fcst_ALPHA-X,X-_Obs_ALPHA-X,X-_300000L_20120410_180000V_060000A_obj.nc
    mode_Fcst_ALPHA-X,X-_Obs_ALPHA-X,X-_300000L_20120410_180000V_060000A_obj.txt
    mode_Fcst_ALPHA-X,X-_Obs_ALPHA-X,X-_300000L_20120410_180000V_060000A.ps
    mode_Fcst_BETA-X,X-_Obs_BETA-X,X-_300000L_20120410_180000V_060000A_cts.txt
    mode_Fcst_BETA-X,X-_Obs_BETA-X,X-_300000L_20120410_180000V_060000A_obj.nc
    mode_Fcst_BETA-X,X-_Obs_BETA-X,X-_300000L_20120410_180000V_060000A_obj.txt
    mode_Fcst_BETA-X,X-_Obs_BETA-X,X-_300000L_20120410_180000V_060000A.ps
    mode_Fcst_GAMMA-X,X-_Obs_GAMMA-X,X-_300000L_20120410_180000V_060000A_cts.txt
    mode_Fcst_GAMMA-X,X-_Obs_GAMMA-X,X-_300000L_20120410_180000V_060000A_obj.nc
    mode_Fcst_GAMMA-X,X-_Obs_GAMMA-X,X-_300000L_20120410_180000V_060000A_obj.txt
    mode_Fcst_GAMMA-X,X-_Obs_GAMMA-X,X-_300000L_20120410_180000V_060000A.ps
    mode_Fcst_Super_LO_Obs_Super_LO_300000L_20120410_180000V_060000A_cts.txt
    mode_Fcst_Super_LO_Obs_Super_LO_300000L_20120410_180000V_060000A_obj.nc
    mode_Fcst_Super_LO_Obs_Super_LO_300000L_20120410_180000V_060000A_obj.txt
    mode_Fcst_Super_LO_Obs_Super_LO_300000L_20120410_180000V_060000A.ps
    These files are here: /scratch/dave/mvmode_test/unit_test_output/


  • Please complete this pull request review by [Fill in date].

Pull Request Checklist

See the METplus Workflow for details.

  • Review the source issue metadata (required labels, projects, and milestone).
  • Complete the PR definition above.
  • Ensure the PR title matches the feature or bugfix branch name.
  • Define the PR metadata, as permissions allow.
    Select: Reviewer(s)
    Select: Organization level software support Project or Repository level development cycle Project
    Select: Milestone as the version that will include these changes
  • After submitting the PR, select Development issue with the original issue number.
  • After the PR is approved, merge your changes. If permissions do not allow this, request that the reviewer do the merge.
  • Close the linked issue and delete your feature or bugfix branch from GitHub.

JohnHalleyGotway and others added 30 commits February 2, 2023 12:17
…data in memory, and implemented masking fields using superobjects.. a first cut, much more to clean up
… plus checks for the multivar_logic, both to make sure things are of the correct length and in bounds
…mode enum to distinguish single mode from multivar 1st and 2nd pass
…s of mode multivar to not do merging and matching
…re unimplemented situations and error exit for those
@davidalbo davidalbo added this to the MET 11.1.0 milestone Jun 12, 2023
@JohnHalleyGotway JohnHalleyGotway removed the priority: high High Priority label Jun 12, 2023
@JohnHalleyGotway JohnHalleyGotway changed the title Feature 1283 mvmode mods Feature #1283 mvmode mods Jun 12, 2023
Copy link
Contributor

@hertneky hertneky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no suggested changes. This looks good.

JohnHalleyGotway and others added 2 commits June 12, 2023 13:36
…nfig_constants.h. Make each error message consistently begin with 1 newline and end with 2. Use lookup_bool_array() defined in dictionary.cc and remove the redefinition in mode_conf_info.cc
Copy link
Collaborator

@JohnHalleyGotway JohnHalleyGotway left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@davidalbo I think there's an issue with the convolution radius and thresholds being applied when defining the objects for each of the components. The config file settings indicate that the conv_radius = 2; and conv_thresh = >=5.0;. However, I don't see those settings in the output.

Downloading the differences flagged in this GHA run, I see the following.

All of the TRUTH data reports the expected convolution radius and threshold:

cat `find ./ -name "*_obj_TRUTH.txt"` | awk '{print $12,$13}' | sort -ru
FCST_RAD FCST_THR
2 >=5.0

While the newly generated OUTPUT data reports what I'm guessing are default convolution radius and threshold values:

cat `find ./ -name "*_obj_OUTPUT.txt"` | awk '{print $12,$13}' | sort -ru
FCST_RAD FCST_THR
0 !=-9999

I realize that the output file NAMES have changed, but I don't think the object definition criteria or geometry should change. Based on what I'm seeing, perhaps the logic for parsing the convolution radius and threshold from the input configuration file isn't working as expected?

Please take another look at this.

@JohnHalleyGotway
Copy link
Collaborator

@davidalbo note that running unit_mode_multivar.xml manually with at least verbosity level 3 (-v 3), I see that the conv radius and threshold ARE being parsed from the config file:

egrep -i "radius|threshold" run.log | sort -u
DEBUG 3: Applying circular convolution of radius 2 to the forecast field.
DEBUG 3: Applying circular convolution of radius 2 to the observation field.
DEBUG 3: Applying convolution threshold >=5.0 resulted in 1 simple forecast objects.
DEBUG 3: Applying convolution threshold >=5.0 resulted in 1 simple observation objects.

So perhaps they're being parsed and use correctly but not written to the output file correctly?

Another thing that's throwing me for a loop is the PostScript output. All of these PostScript files:

mode_Fcst_ALPHA_all_all_Obs_ALPHA_all_all_300000L_20120410_180000V_060000A_OUTPUT.ps
mode_Fcst_BETA_all_all_Obs_BETA_all_all_300000L_20120410_180000V_060000A_OUTPUT.ps
mode_Fcst_GAMMA_all_all_Obs_GAMMA_all_all_300000L_20120410_180000V_060000A_OUTPUT.ps
mode_Fcst_Super_LO_Obs_Super_LO_300000L_20120410_180000V_060000A_OUTPUT.ps

Show the same super objects. I was kind of expecting the ALPHA would show me the first object, BETA would show me the second, and GAMMA would show me the third. But I think I understand why they don't. This is displaying the raw intensity values for the ALPHA, BETA, and GAMMA input fields. It just so happens that our raw input data is so uninteresting (0's or 100's), that it's causing a little confusion for me.

@davidalbo
Copy link
Contributor Author

@JohnHalleyGotway I believe I've found and fixed the incorrect conv radius and thresh in the output problem. As for the other concern, I thinkAlpha, Beta, Gamma, Supershould show the same objects. In each case what we are looking at is the data restricted to inside the same super objects. The data being Alpha, Beta, or Gamma (with intensities) andsuper(without intensities). Am I understanding the concern? If so, I'm ready for another push to github.

…ecified in output files for multivariate mode
@JohnHalleyGotway
Copy link
Collaborator

JohnHalleyGotway commented Jun 14, 2023

Dave, Tracy and I met briefly and have the following requests:

  1. You already updated the FCST/OBS_RAD and FCST/OBS_THR columns in the _obj.txt file to indicate the radius and threshold applied to that field, which is great.
  2. Please also populate the FCST/OBS_UNITS and FCST/OBS_LEV columns in the _obj.txt file to indicate the units and level for that field as well.
  3. In the FCST/OBS_VAR column, pre-pend the super object name (i.e. multivar_name config file entry). For example, change FCST_VAR = ALPHA to FCST_VAR = Super_ALPHA.
  4. In the PostScript output, can we plot the FULL raw field of each component rather than the raw field masked to the area of the super object? If so, that'd be more consistent with regular mode output. If it's too difficult, this isn't a big deal.
  5. Recommend that we change the MODE config files used by the unit tests to request matching and make the max_centroid_dist bigger, as follows:
git diff config/MODEConfig_multivar_fake_data
diff --git a/internal/test_unit/config/MODEConfig_multivar_fake_data b/internal/test_unit/config/MODEConfig_multivar_fake_data
index c687d39c4..c8941f8b9 100644
--- a/internal/test_unit/config/MODEConfig_multivar_fake_data
+++ b/internal/test_unit/config/MODEConfig_multivar_fake_data
@@ -107,12 +107,12 @@ mask_missing_flag = NONE;
 //
 // Match objects between the forecast and observation fields
 //
-match_flag = NONE;
+match_flag = MERGE_BOTH;
 
 //
 // Maximum centroid distance for objects to be compared
 //
-max_centroid_dist = 800.0/grid_res;
+max_centroid_dist = 800.0;

And:

git diff config/MODEConfig_multivar_fake_data_with_intensities 
diff --git a/internal/test_unit/config/MODEConfig_multivar_fake_data_with_intensities b/internal/test_unit/config/MODEConfig_multivar_fake_data_with_intensities
index 8f646002e..af6d1720a 100644
--- a/internal/test_unit/config/MODEConfig_multivar_fake_data_with_intensities
+++ b/internal/test_unit/config/MODEConfig_multivar_fake_data_with_intensities
@@ -107,12 +107,12 @@ mask_missing_flag = NONE;
 //
 // Match objects between the forecast and observation fields
 //
-match_flag = NONE;
+match_flag = MERGE_BOTH;
 
 //
 // Maximum centroid distance for objects to be compared
 //
-max_centroid_dist = 800.0/grid_res;
+max_centroid_dist = 800.0;

@davidalbo
Copy link
Contributor Author

For 2. above, I think for individual intensities this is already in place. Looking at Tracy's test case output I see for WIND: units = m/s and level = Z10, and for VIS: units = m and level = LO.

For the unit test it seems that units = NA, this comes from the input data, and level = (,) from the config, which is what we see in the Alpha, Beta, Gamma individual intensity output txt files.

For the super object only case (no intensities), I'm setting units = NA and level to LO because the inputs can be a mix of different values. For the Tracy test case, the inputs that made the super objects have units m/s, m and levels LO and Z10. What values should go into the txt file in this case? I went with NA and LO arbitrarily, open for suggestions.

@JohnHalleyGotway
Copy link
Collaborator

Dave, I'm glad to hear that you're getting meaningful units/level strings from Tracy. I do understand what you're saying, but there is some difference occurring. When MET reads data from NetCDF files the units and level variable attributes (if present) are used to define those strings. For example:

	float ALPHA(lat, lon) ;
		ALPHA:name = "ALPHA" ;
		ALPHA:long_name = "alpha" ;
		ALPHA:level = "SFC" ;
		ALPHA:units = "kg/m^2" ;

Running this through traditional MODE, it grabs the level and units strings from here. So I would expect to see those same strings in the output from multi-variate MODE.

@davidalbo
Copy link
Contributor Author

davidalbo commented Jun 14, 2023

Dave, I'm glad to hear that you're getting meaningful units/level strings from Tracy. I do understand what you're saying, but there is some difference occurring. When MET reads data from NetCDF files the units and level variable attributes (if present) are used to define those strings. For example:

	float ALPHA(lat, lon) ;
		ALPHA:name = "ALPHA" ;
		ALPHA:long_name = "alpha" ;
		ALPHA:level = "SFC" ;
		ALPHA:units = "kg/m^2" ;

Running this through traditional MODE, it grabs the level and units strings from here. So I would expect to see those same strings in the output from multi-variate MODE.

Might this be an artifact from the presence of level=(*,*) in the mode config file for the unit test? For the Tracy example, the level is specified explicitly in the config file.

@JohnHalleyGotway
Copy link
Collaborator

Dave, this line of code is relevant. The users requests (*,*) in the config file to tell MET what data to grab. That line of library code stores the level and units variable attributes, if defined, in the VarInfo object:

      if(info->level_att.length()     > 0) vinfo.set_level_name(info->level_att.c_str());
      if(info->units_att.length()     > 0) vinfo.set_units(info->units_att.c_str());

So this is how traditional MODE is populating the FCST_LEV and FCST_UNITS columns.

@davidalbo
Copy link
Contributor Author

davidalbo commented Jun 14, 2023

Dave, this line of code is relevant. The users requests (*,*) in the config file to tell MET what data to grab. That line of library code stores the level and units variable attributes, if defined, in the VarInfo object:

      if(info->level_att.length()     > 0) vinfo.set_level_name(info->level_att.c_str());
      if(info->units_att.length()     > 0) vinfo.set_units(info->units_att.c_str());

So this is how traditional MODE is populating the FCST_LEV and FCST_UNITS columns.

I have implemented a fix for this. I do not understand why the case from Tracy without wildcards did hold onto units and levels between pass1 (simple objects) and pass2 (super object masked), whereas the unit test with wildcard did not, but the fix does work for both tests. I'm grabbing the units/levels during the first pass where they are correct, saving them, and using them in the second pass.
Breaking news: It still doesn't work, only works for ALPHA, not BETA or GAMMA. Will revisit tomorrow AM.

… to variable names in output files, display full raw field in netcdf and postscript output files, changed unit test to include matching
@JohnHalleyGotway JohnHalleyGotway self-requested a review June 15, 2023 19:13
Copy link
Collaborator

@JohnHalleyGotway JohnHalleyGotway left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I approve of these changes. Thanks @davidalbo!

I downloaded/reviewed the diff files and see that you made all the changes I previously requested. I reviewed several of the source code changes and did make a small commit to the branch with whitespace tweaks for consistent code formatting. But I didn't review every change in the 50 modified files. But the output looks great.

I'll go ahead and merge this PR and proceed with cutting the MET-11.1.0-RC1 release.

@JohnHalleyGotway JohnHalleyGotway merged commit 9f9b4cd into develop Jun 16, 2023
JohnHalleyGotway pushed a commit that referenced this pull request Jun 19, 2023
Co-authored-by: John Halley Gotway <[email protected]>
Co-authored-by: jprestop <[email protected]>
Co-authored-by: Seth Linden <[email protected]>
Co-authored-by: Daniel Adriaansen <[email protected]>
Co-authored-by: John and Cindy <[email protected]>
Co-authored-by: rgbullock <[email protected]>
Co-authored-by: Randy Bullock <[email protected]>
Co-authored-by: Dave Albo <[email protected]>
Co-authored-by: davidalbo <[email protected]>
Co-authored-by: Julie Prestopnik <[email protected]>
Co-authored-by: George McCabe <[email protected]>
Co-authored-by: Lisa Goodrich <[email protected]>
Co-authored-by: Howard Soh <[email protected]>
Co-authored-by: hsoh-u <[email protected]>
Co-authored-by: lisagoodrich <[email protected]>
Co-authored-by: Seth Linden <[email protected]>
Co-authored-by: metplus-bot <[email protected]>
Co-authored-by: j-opatz <[email protected]>
fix #2449 develop pdf (#2464)
fix #2402 develop sonarqube (#2468)
fix #2426 develop buoy (#2475)
fix 2518 dtypes appf docs (#2519)
fix 2531 compilation errors (#2533)
fix #2531 compilation_errors_configure (#2535)
fix #2514 develop clang (#2563)
fix #2575 develop python_convert (#2576)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Enhance Multivariate MODE to generate object statistics for each input field requested by the user.
3 participants