Feature #1283 mvmode mods #2569

davidalbo · 2023-06-12T14:40:48Z

Expected Differences

Do these changes introduce new tools, command line arguments, or configuration file options? [Yes]
3 new config options: multivar_intensity, multivar_name, multivar_level. The latter 2 can be set independently for fcst and obs dictionaries.
Do these changes modify the structure of existing or add new output data types (e.g. statistic line types or NetCDF variables)? [No]
But.. the output file organization (no more numbered subdirectories) and file naming has changed.

**Changes to be aware of that have not been 'vetted' by anyone or seem possibly incomplete, and are things the reviewer might want to consider before approving:

Went with 0's inside superobjects for 'all intensities false' to represent superobjects without intensities instead of setting missing data everywhere inside and outside superobjects.
Made up some replacements for when the config has level=(*,*). The output files names now include the levels. I replaced(*,*) with -X,X- in those output names, as the wildcard characters are problematic for file names. Do wildcards in levels make sense for multivariate mode? (The unit test for mvmode does have the wildcard levels, and does run and produce output with the wildcards).
Disallowed many configuration settings for multivariate mode, which hopefully will not cause problems for users, such as quilting and engine merging. Also everything must be the same length (number of files (fcst and obs), multivar_logic, multivar_intensity, and fcst and obs field array lengths).
the metplus changes to support the new config options was added by @georgemccabe but has not been tested on this version of mode.
documentation is only partially updated to describe the changes**

Pull Request Testing

Describe testing already performed for these changes:
Extensive testing on a test case provided by @hertneky . Unit tests have been run manually.
Recommend testing for the reviewer(s) to perform, including the location of input datasets, and any additional instructions:
Have a look in /scratch/dave/mvmode_test_for_pull_request. In that directory two .bsh files can be run.
run-multi.bsh should run a setup with intensities computed for the last 2 fields, with output going to subdirectory output.
run-super.bsh should run a setup with no intensities, just superobject output, into subdirectory output-super

Do these changes include sufficient documentation updates, ensuring that no errors or warnings exist in the build of the documentation? [No] It might work, but not sure.
Do these changes include sufficient testing updates? [Not quite] More testing of traditional mode would be good, as would testing on a variety of multivar examples.
Will this PR result in changes to the test suite? [Yes]

If yes, describe the new output and/or changes to the existing output:
Unit test output file names have changed, and additional unit testing is now in place. The test is internal/test_unit/xml/unit_mode_multivar.xml, which now runs two instances of multivariate mode.
New output files that should replace what were previously the output files for this test:
mode_Fcst_ALPHA-X,X-_Obs_ALPHA-X,X-_300000L_20120410_180000V_060000A_cts.txt
mode_Fcst_ALPHA-X,X-_Obs_ALPHA-X,X-_300000L_20120410_180000V_060000A_obj.nc
mode_Fcst_ALPHA-X,X-_Obs_ALPHA-X,X-_300000L_20120410_180000V_060000A_obj.txt
mode_Fcst_ALPHA-X,X-_Obs_ALPHA-X,X-_300000L_20120410_180000V_060000A.ps
mode_Fcst_BETA-X,X-_Obs_BETA-X,X-_300000L_20120410_180000V_060000A_cts.txt
mode_Fcst_BETA-X,X-_Obs_BETA-X,X-_300000L_20120410_180000V_060000A_obj.nc
mode_Fcst_BETA-X,X-_Obs_BETA-X,X-_300000L_20120410_180000V_060000A_obj.txt
mode_Fcst_BETA-X,X-_Obs_BETA-X,X-_300000L_20120410_180000V_060000A.ps
mode_Fcst_GAMMA-X,X-_Obs_GAMMA-X,X-_300000L_20120410_180000V_060000A_cts.txt
mode_Fcst_GAMMA-X,X-_Obs_GAMMA-X,X-_300000L_20120410_180000V_060000A_obj.nc
mode_Fcst_GAMMA-X,X-_Obs_GAMMA-X,X-_300000L_20120410_180000V_060000A_obj.txt
mode_Fcst_GAMMA-X,X-_Obs_GAMMA-X,X-_300000L_20120410_180000V_060000A.ps
mode_Fcst_Super_LO_Obs_Super_LO_300000L_20120410_180000V_060000A_cts.txt
mode_Fcst_Super_LO_Obs_Super_LO_300000L_20120410_180000V_060000A_obj.nc
mode_Fcst_Super_LO_Obs_Super_LO_300000L_20120410_180000V_060000A_obj.txt
mode_Fcst_Super_LO_Obs_Super_LO_300000L_20120410_180000V_060000A.ps
These files are here: /scratch/dave/mvmode_test/unit_test_output/

Please complete this pull request review by [Fill in date].

Pull Request Checklist

See the METplus Workflow for details.

Review the source issue metadata (required labels, projects, and milestone).
Complete the PR definition above.
Ensure the PR title matches the feature or bugfix branch name.
Define the PR metadata, as permissions allow.
Select: Reviewer(s)
Select: Organization level software support Project or Repository level development cycle Project
Select: Milestone as the version that will include these changes
After submitting the PR, select Development issue with the original issue number.
After the PR is approved, merge your changes. If permissions do not allow this, request that the reviewer do the merge.
Close the linked issue and delete your feature or bugfix branch from GitHub.

…e web and update local data

…data in memory, and implemented masking fields using superobjects.. a first cut, much more to clean up

… plus checks for the multivar_logic, both to make sure things are of the correct length and in bounds

…mode enum to distinguish single mode from multivar 1st and 2nd pass

…s of mode multivar to not do merging and matching

…e_flag=THRESH when logic includes a union

…nged to multivar mode naming

…e_mods

…re unimplemented situations and error exit for those

…e_mods

hertneky

I have no suggested changes. This looks good.

…nfig_constants.h. Make each error message consistently begin with 1 newline and end with 2. Use lookup_bool_array() defined in dictionary.cc and remove the redefinition in mode_conf_info.cc

JohnHalleyGotway

@davidalbo I think there's an issue with the convolution radius and thresholds being applied when defining the objects for each of the components. The config file settings indicate that the conv_radius = 2; and conv_thresh = >=5.0;. However, I don't see those settings in the output.

Downloading the differences flagged in this GHA run, I see the following.

All of the TRUTH data reports the expected convolution radius and threshold:

cat `find ./ -name "*_obj_TRUTH.txt"` | awk '{print $12,$13}' | sort -ru
FCST_RAD FCST_THR
2 >=5.0

While the newly generated OUTPUT data reports what I'm guessing are default convolution radius and threshold values:

cat `find ./ -name "*_obj_OUTPUT.txt"` | awk '{print $12,$13}' | sort -ru
FCST_RAD FCST_THR
0 !=-9999

I realize that the output file NAMES have changed, but I don't think the object definition criteria or geometry should change. Based on what I'm seeing, perhaps the logic for parsing the convolution radius and threshold from the input configuration file isn't working as expected?

Please take another look at this.

JohnHalleyGotway · 2023-06-13T00:19:28Z

@davidalbo note that running unit_mode_multivar.xml manually with at least verbosity level 3 (-v 3), I see that the conv radius and threshold ARE being parsed from the config file:

egrep -i "radius|threshold" run.log | sort -u
DEBUG 3: Applying circular convolution of radius 2 to the forecast field.
DEBUG 3: Applying circular convolution of radius 2 to the observation field.
DEBUG 3: Applying convolution threshold >=5.0 resulted in 1 simple forecast objects.
DEBUG 3: Applying convolution threshold >=5.0 resulted in 1 simple observation objects.

So perhaps they're being parsed and use correctly but not written to the output file correctly?

Another thing that's throwing me for a loop is the PostScript output. All of these PostScript files:

mode_Fcst_ALPHA_all_all_Obs_ALPHA_all_all_300000L_20120410_180000V_060000A_OUTPUT.ps
mode_Fcst_BETA_all_all_Obs_BETA_all_all_300000L_20120410_180000V_060000A_OUTPUT.ps
mode_Fcst_GAMMA_all_all_Obs_GAMMA_all_all_300000L_20120410_180000V_060000A_OUTPUT.ps
mode_Fcst_Super_LO_Obs_Super_LO_300000L_20120410_180000V_060000A_OUTPUT.ps

Show the same super objects. I was kind of expecting the ALPHA would show me the first object, BETA would show me the second, and GAMMA would show me the third. But I think I understand why they don't. This is displaying the raw intensity values for the ALPHA, BETA, and GAMMA input fields. It just so happens that our raw input data is so uninteresting (0's or 100's), that it's causing a little confusion for me.

davidalbo · 2023-06-13T18:36:49Z

@JohnHalleyGotway I believe I've found and fixed the incorrect conv radius and thresh in the output problem. As for the other concern, I thinkAlpha, Beta, Gamma, Supershould show the same objects. In each case what we are looking at is the data restricted to inside the same super objects. The data being Alpha, Beta, or Gamma (with intensities) andsuper(without intensities). Am I understanding the concern? If so, I'm ready for another push to github.

…ecified in output files for multivariate mode

JohnHalleyGotway · 2023-06-14T18:19:17Z

Dave, Tracy and I met briefly and have the following requests:

You already updated the FCST/OBS_RAD and FCST/OBS_THR columns in the _obj.txt file to indicate the radius and threshold applied to that field, which is great.
Please also populate the FCST/OBS_UNITS and FCST/OBS_LEV columns in the _obj.txt file to indicate the units and level for that field as well.
In the FCST/OBS_VAR column, pre-pend the super object name (i.e. multivar_name config file entry). For example, change FCST_VAR = ALPHA to FCST_VAR = Super_ALPHA.
In the PostScript output, can we plot the FULL raw field of each component rather than the raw field masked to the area of the super object? If so, that'd be more consistent with regular mode output. If it's too difficult, this isn't a big deal.
Recommend that we change the MODE config files used by the unit tests to request matching and make the max_centroid_dist bigger, as follows:

git diff config/MODEConfig_multivar_fake_data
diff --git a/internal/test_unit/config/MODEConfig_multivar_fake_data b/internal/test_unit/config/MODEConfig_multivar_fake_data
index c687d39c4..c8941f8b9 100644
--- a/internal/test_unit/config/MODEConfig_multivar_fake_data
+++ b/internal/test_unit/config/MODEConfig_multivar_fake_data
@@ -107,12 +107,12 @@ mask_missing_flag = NONE;
 //
 // Match objects between the forecast and observation fields
 //
-match_flag = NONE;
+match_flag = MERGE_BOTH;
 
 //
 // Maximum centroid distance for objects to be compared
 //
-max_centroid_dist = 800.0/grid_res;
+max_centroid_dist = 800.0;

And:

git diff config/MODEConfig_multivar_fake_data_with_intensities 
diff --git a/internal/test_unit/config/MODEConfig_multivar_fake_data_with_intensities b/internal/test_unit/config/MODEConfig_multivar_fake_data_with_intensities
index 8f646002e..af6d1720a 100644
--- a/internal/test_unit/config/MODEConfig_multivar_fake_data_with_intensities
+++ b/internal/test_unit/config/MODEConfig_multivar_fake_data_with_intensities
@@ -107,12 +107,12 @@ mask_missing_flag = NONE;
 //
 // Match objects between the forecast and observation fields
 //
-match_flag = NONE;
+match_flag = MERGE_BOTH;
 
 //
 // Maximum centroid distance for objects to be compared
 //
-max_centroid_dist = 800.0/grid_res;
+max_centroid_dist = 800.0;

davidalbo · 2023-06-14T19:57:32Z

For 2. above, I think for individual intensities this is already in place. Looking at Tracy's test case output I see for WIND: units = m/s and level = Z10, and for VIS: units = m and level = LO.

For the unit test it seems that units = NA, this comes from the input data, and level = (,) from the config, which is what we see in the Alpha, Beta, Gamma individual intensity output txt files.

For the super object only case (no intensities), I'm setting units = NA and level to LO because the inputs can be a mix of different values. For the Tracy test case, the inputs that made the super objects have units m/s, m and levels LO and Z10. What values should go into the txt file in this case? I went with NA and LO arbitrarily, open for suggestions.

JohnHalleyGotway · 2023-06-14T20:20:21Z

Dave, I'm glad to hear that you're getting meaningful units/level strings from Tracy. I do understand what you're saying, but there is some difference occurring. When MET reads data from NetCDF files the units and level variable attributes (if present) are used to define those strings. For example:

	float ALPHA(lat, lon) ;
		ALPHA:name = "ALPHA" ;
		ALPHA:long_name = "alpha" ;
		ALPHA:level = "SFC" ;
		ALPHA:units = "kg/m^2" ;

Running this through traditional MODE, it grabs the level and units strings from here. So I would expect to see those same strings in the output from multi-variate MODE.

davidalbo · 2023-06-14T20:28:16Z

Dave, I'm glad to hear that you're getting meaningful units/level strings from Tracy. I do understand what you're saying, but there is some difference occurring. When MET reads data from NetCDF files the units and level variable attributes (if present) are used to define those strings. For example:
	float ALPHA(lat, lon) ;
		ALPHA:name = "ALPHA" ;
		ALPHA:long_name = "alpha" ;
		ALPHA:level = "SFC" ;
		ALPHA:units = "kg/m^2" ;
Running this through traditional MODE, it grabs the level and units strings from here. So I would expect to see those same strings in the output from multi-variate MODE.

Might this be an artifact from the presence of level=(*,*) in the mode config file for the unit test? For the Tracy example, the level is specified explicitly in the config file.

JohnHalleyGotway · 2023-06-14T20:51:05Z

Dave, this line of code is relevant. The users requests (*,*) in the config file to tell MET what data to grab. That line of library code stores the level and units variable attributes, if defined, in the VarInfo object:

      if(info->level_att.length()     > 0) vinfo.set_level_name(info->level_att.c_str());
      if(info->units_att.length()     > 0) vinfo.set_units(info->units_att.c_str());

So this is how traditional MODE is populating the FCST_LEV and FCST_UNITS columns.

davidalbo · 2023-06-14T22:53:14Z

Dave, this line of code is relevant. The users requests (*,*) in the config file to tell MET what data to grab. That line of library code stores the level and units variable attributes, if defined, in the VarInfo object:
      if(info->level_att.length()     > 0) vinfo.set_level_name(info->level_att.c_str());
      if(info->units_att.length()     > 0) vinfo.set_units(info->units_att.c_str());
So this is how traditional MODE is populating the FCST_LEV and FCST_UNITS columns.

I have implemented a fix for this. I do not understand why the case from Tracy without wildcards did hold onto units and levels between pass1 (simple objects) and pass2 (super object masked), whereas the unit test with wildcard did not, but the fix does work for both tests. I'm grabbing the units/levels during the first pass where they are correct, saving them, and using them in the second pass.
Breaking news: It still doesn't work, only works for ALPHA, not BETA or GAMMA. Will revisit tomorrow AM.

… to variable names in output files, display full raw field in netcdf and postscript output files, changed unit test to include matching

…for internal consistency.

JohnHalleyGotway

I approve of these changes. Thanks @davidalbo!

I downloaded/reviewed the diff files and see that you made all the changes I previously requested. I reviewed several of the source code changes and did make a small commit to the branch with whitespace tweaks for consistent code formatting. But I didn't review every change in the 50 modified files. But the output looks great.

I'll go ahead and merge this PR and proceed with cutting the MET-11.1.0-RC1 release.

Co-authored-by: John Halley Gotway <[email protected]> Co-authored-by: jprestop <[email protected]> Co-authored-by: Seth Linden <[email protected]> Co-authored-by: Daniel Adriaansen <[email protected]> Co-authored-by: John and Cindy <[email protected]> Co-authored-by: rgbullock <[email protected]> Co-authored-by: Randy Bullock <[email protected]> Co-authored-by: Dave Albo <[email protected]> Co-authored-by: davidalbo <[email protected]> Co-authored-by: Julie Prestopnik <[email protected]> Co-authored-by: George McCabe <[email protected]> Co-authored-by: Lisa Goodrich <[email protected]> Co-authored-by: Howard Soh <[email protected]> Co-authored-by: hsoh-u <[email protected]> Co-authored-by: lisagoodrich <[email protected]> Co-authored-by: Seth Linden <[email protected]> Co-authored-by: metplus-bot <[email protected]> Co-authored-by: j-opatz <[email protected]> fix #2449 develop pdf (#2464) fix #2402 develop sonarqube (#2468) fix #2426 develop buoy (#2475) fix 2518 dtypes appf docs (#2519) fix 2531 compilation errors (#2533) fix #2531 compilation_errors_configure (#2535) fix #2514 develop clang (#2563) fix #2575 develop python_convert (#2576)

JohnHalleyGotway and others added 30 commits February 2, 2023 12:17

Per #2426, port bugfix over for the develop branch.

c76c548

initial commit of a script designed to pull NDBC station info from th…

8bca0e9

…e web and update local data

Redesigned python scrip build_ndbc_stations_from_web.py to use a class

a09e43d

Implemented the approach suggested by Gwen Chen NOAA

0212205

Added new NDBC stations updating script to make

a975a77

Updated ndbc_stations.xml file

dcb1494

Added documentation about the NDBC stations update script option

c3d5c89

Changed to not write out an elevation when input elevation is not found

9634cc9

For multivariate mode, modified to do everything in one run, keeping …

673b1fa

…data in memory, and implemented masking fields using superobjects.. a first cut, much more to clean up

Bug fix

fbb2db0

added the multivar_intensity_flag logic, added config checks for that…

5d4f8ad

… plus checks for the multivar_logic, both to make sure things are of the correct length and in bounds

compile bug fixes

36a7fc3

Merging develop into feature_1283_mvmode_mods

b1209a3

refactored the code for readability

0197bbe

Cosmetic changes

34638c2

Changed indentation style for mode source code

ade2c09

add checks for non-implemented multivar mode configs, add processing …

bccc761

…mode enum to distinguish single mode from multivar 1st and 2nd pass

put BoolArray in the libs along with a config lookup, changed 1st pas…

f1ba975

…s of mode multivar to not do merging and matching

changed output file names and organization, print a warning with merg…

a1b1ffc

…e_flag=THRESH when logic includes a union

Bugfix in which single var mode output file names had incorrectly cha…

4d7165d

…nged to multivar mode naming

Cleanup and a few tweaks

1741a81

Merge remote-tracking branch 'origin/develop' into feature_1283_mvmod…

0133a47

…e_mods

Change to create the output directory recursively if needed

9c58d59

Bug fix

39e0a8d

changed actions taken in pass1/pass2 for multivariate mode, detect mo…

0a4c832

…re unimplemented situations and error exit for those

merging algorithm changes for multivar mode prior to cleanup and testing

14ab55f

multivar_mode changes with some testing and cleanup

dc725ee

Merge branch 'develop' into feature_1282_mvmode_mods

4e3f915

tweak after merging develop into this branch

f1d8da1

test

1812a05

davidalbo added this to the MET 11.1.0 milestone Jun 12, 2023

davidalbo requested review from JohnHalleyGotway and hertneky June 12, 2023 14:40

davidalbo linked an issue Jun 12, 2023 that may be closed by this pull request

Enhance Multivariate MODE to generate object statistics for each input field requested by the user. #1283

Closed

21 tasks

JohnHalleyGotway removed the priority: high High Priority label Jun 12, 2023

JohnHalleyGotway changed the title ~~Feature 1283 mvmode mods~~ Feature #1283 mvmode mods Jun 12, 2023

Merge remote-tracking branch 'origin/develop' into feature_1283_mvmod…

f9279f0

…e_mods

hertneky reviewed Jun 12, 2023

View reviewed changes

hertneky approved these changes Jun 12, 2023

View reviewed changes

JohnHalleyGotway and others added 2 commits June 12, 2023 13:36

Per #1283, move multivar mode config file setting definitions into co…

3e9ca75

…nfig_constants.h. Make each error message consistently begin with 1 newline and end with 2. Use lookup_bool_array() defined in dictionary.cc and remove the redefinition in mode_conf_info.cc

Changed output file names for '(*,*)' to '_all_all_'

ce129d8

JohnHalleyGotway mentioned this pull request Jun 12, 2023

Enhancement: Enhance the MODE wrapper to handle updated multivar configuration options dtcenter/METplus#2210

Closed

21 tasks

JohnHalleyGotway requested changes Jun 12, 2023

View reviewed changes

Bug fix in which convolution thresholds and radii were incorrectly sp…

41c48bb

…ecified in output files for multivariate mode

Fixed units/level problem in output files, prepended superobject name…

ccfcdfc

… to variable names in output files, display full raw field in netcdf and postscript output files, changed unit test to include matching

JohnHalleyGotway self-requested a review June 15, 2023 19:13

Per #1283, ci-skip-all, no real code changes. Just whitespace tweaks …

0e05d4f

…for internal consistency.

JohnHalleyGotway approved these changes Jun 16, 2023

View reviewed changes

JohnHalleyGotway merged commit 9f9b4cd into develop Jun 16, 2023

JohnHalleyGotway mentioned this pull request Jun 19, 2023

Update develop-ref after #2567 and #2569 #2579

Merged

georgemccabe mentioned this pull request Jun 30, 2023

Enhancement: Update multi-variate MODE use case to match enhancements to tool dtcenter/METplus#2230

Closed

22 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feature #1283 mvmode mods #2569

Feature #1283 mvmode mods #2569

davidalbo commented Jun 12, 2023

hertneky left a comment

JohnHalleyGotway left a comment

JohnHalleyGotway commented Jun 13, 2023

davidalbo commented Jun 13, 2023

JohnHalleyGotway commented Jun 14, 2023 •

edited

Loading

davidalbo commented Jun 14, 2023

JohnHalleyGotway commented Jun 14, 2023

davidalbo commented Jun 14, 2023 •

edited

Loading

JohnHalleyGotway commented Jun 14, 2023

davidalbo commented Jun 14, 2023 •

edited

Loading

JohnHalleyGotway left a comment

Feature #1283 mvmode mods #2569

Feature #1283 mvmode mods #2569

Conversation

davidalbo commented Jun 12, 2023

Expected Differences

Pull Request Testing

Pull Request Checklist

hertneky left a comment

Choose a reason for hiding this comment

JohnHalleyGotway left a comment

Choose a reason for hiding this comment

JohnHalleyGotway commented Jun 13, 2023

davidalbo commented Jun 13, 2023

JohnHalleyGotway commented Jun 14, 2023 • edited Loading

davidalbo commented Jun 14, 2023

JohnHalleyGotway commented Jun 14, 2023

davidalbo commented Jun 14, 2023 • edited Loading

JohnHalleyGotway commented Jun 14, 2023

davidalbo commented Jun 14, 2023 • edited Loading

JohnHalleyGotway left a comment

Choose a reason for hiding this comment

JohnHalleyGotway commented Jun 14, 2023 •

edited

Loading

davidalbo commented Jun 14, 2023 •

edited

Loading

davidalbo commented Jun 14, 2023 •

edited

Loading