Feature 1733 exc (#1734)
* Per #1733, add column_exc_name, column_exc_val, init_exc_name, and init_exc_val options to the TCStat config files.

* Per #1733, enhance tc_stat to support the column_exc and init_exc config file and job command filtering options.

* Per #1733, update stat_analysis to support the -column_exc job filtering option. Still need to update documentation and add unit tests.

* Per #1773, update the user's guide with the new config and job command options.

* Per #1733, add call to stat_analysis to exercise -column_str and -column_exc options.

* Per #1733, I ran into a namespace conflict in tc_stat where -init_exc was used to filter both by time AND by string value. So I switched to using -init_str_exc instead, and made the corresponding change to -column_str_exc in stat_analysis and tc_stat. Also changed internal variable names to use IncMap and ExcMap to keep the logic clear.

* Per #1733, tc_stat config file updates to switch from column_exc and init_exc to column_str_exc and init_str_exc.

* Per #1733, add tc_stat and stat_analysis jobs to exercise the string filtering options.
JohnHalleyGotway authored Mar 29, 2021
1 parent 8dfd7c0 commit e2f77e4
Showing 16 changed files with 562 additions and 303 deletions.
12 changes: 12 additions & 0 deletions met/data/config/TCStatConfig_default
@@ -111,6 +111,12 @@ column_thresh_val = [];
column_str_name = [];
column_str_val = [];

//
// Stratify by excluding strings in non-numeric data columns.
//
column_str_exc_name = [];
column_str_exc_val = [];

//
// Similar to the column_thresh options above
//
@@ -123,6 +129,12 @@ init_thresh_val = [];
init_str_name = [];
init_str_val = [];

//
// Similar to the column_str_exc options above
//
init_str_exc_name = [];
init_str_exc_val = [];

//
// Stratify by the ADECK and BDECK distances to land.
//
16 changes: 9 additions & 7 deletions met/docs/Users_Guide/config_options.rst
@@ -3748,17 +3748,19 @@ Where "job_name" is set to one of the following:
Job command FILTERING options that may be used only when -line_type
has been listed once. These options take two arguments: the name of the
data column to be used and the min, max, or exact value for that column.
If multiple column eq/min/max/str options are listed, the job will be
If multiple column eq/min/max/str/exc options are listed, the job will be
performed on their intersection:

.. code-block:: none
"-column_min col_name value" e.g. -column_min BASER 0.02
"-column_max col_name value"
"-column_eq col_name value"
"-column_thresh col_name threshold" e.g. -column_thresh FCST '>273'
"-column_str col_name string" separate multiple filtering strings
with commas
"-column_min col_name value" e.g. -column_min BASER 0.02
"-column_max col_name value"
"-column_eq col_name value"
"-column_thresh col_name threshold" e.g. -column_thresh FCST '>273'
"-column_str col_name string" separate multiple filtering strings
with commas
"-column_str_exc col_name string" separate multiple filtering strings
with commas
Job command options to DEFINE the analysis job. Unless otherwise noted,
52 changes: 45 additions & 7 deletions met/docs/Users_Guide/config_options_tc.rst
@@ -517,8 +517,8 @@ For example:

Stratify by performing string matching on non-numeric data columns.
Specify a comma-separated list of columns names and values
to be checked. May add using the "-column_str name string" job command
options.
to be included in the analysis.
May add using the "-column_str name string" job command options.

For example:

@@ -531,6 +531,23 @@ For example:
column_str_name = [];
column_str_val = [];
**column_str_exc_name, column_str_exc_val**

Stratify by performing string matching on non-numeric data columns.
Specify a comma-separated list of columns names and values
to be excluded from the analysis.
May add using the "-column_str_exc name string" job command options.

For example:

| column_str_exc_name = [ "LEVEL" ];
| column_str_exc_val = [ "TD" ];
|
.. code-block:: none
column_str_exc_name = [];
column_str_exc_val = [];
**init_thresh_name, init_thresh_val**

@@ -567,6 +584,23 @@ For example:
init_str_name = [];
init_str_val = [];
**init_str_exc_name, init_str_exc_val**

Just like the column_str_exc options above, but apply the string matching only
when lead = 0. If lead = 0 string does match, discard the entire track.
May add using the "-init_str_exc name thresh" job command options.

For example:

| init_str_exc_name = [ "LEVEL" ];
| init_str_exc_val = [ "HU" ];
|
.. code-block:: none
init_str_exc_name = [];
init_str_exc_val = [];
**water_only**

Stratify by the ADECK and BDECK distances to land. Once either the ADECK or
@@ -747,8 +781,10 @@ Where "job_name" is set to one of the following:
"-track_watch_warn name"
"-column_thresh name thresh"
"-column_str name string"
"-column_str_exc name string"
"-init_thresh name thresh"
"-init_str name string"
"-init_str_exc name string"
Additional filtering options that may be used only when -line_type
has been listed only once. These options take two arguments: the name
@@ -758,11 +794,13 @@ Where "job_name" is set to one of the following:

.. code-block:: none
"-column_min col_name value" For example: -column_min TK_ERR 100.00
"-column_max col_name value"
"-column_eq col_name value"
"-column_str col_name string" separate multiple filtering strings
with commas
"-column_min col_name value" For example: -column_min TK_ERR 100.00
"-column_max col_name value"
"-column_eq col_name value"
"-column_str col_name string" separate multiple filtering strings
with commas
"-column_str_exc col_name string" separate multiple filtering strings
with commas
Required Args: -dump_row

4 changes: 2 additions & 2 deletions met/docs/Users_Guide/gsi-tools.rst
@@ -230,7 +230,7 @@ The GSID2MPR tool writes the same set of MPR output columns for the conventional
- PRS_MAX_WGT
- Pressure of the maximum weighing function

The gsid2mpr output may be passed to the Stat-Analysis tool to derive additional statistics. In particular, users should consider running the **aggregate_stat** job type to read MPR lines and compute partial sums (SL1L2), continuous statistics (CNT), contingency table counts (CTC), or contingency table statistics (CTS). Stat-Analysis has been enhanced to parse any extra columns found at the end of the input lines. Users can filter the values in those extra columns using the **-column_thresh** and **-column_str** job command options.
The gsid2mpr output may be passed to the Stat-Analysis tool to derive additional statistics. In particular, users should consider running the **aggregate_stat** job type to read MPR lines and compute partial sums (SL1L2), continuous statistics (CNT), contingency table counts (CTC), or contingency table statistics (CTS). Stat-Analysis has been enhanced to parse any extra columns found at the end of the input lines. Users can filter the values in those extra columns using the **-column_thresh**, **-column_str**, and **-column_str_exc** job command options.

An example of the Stat-Analysis calling sequence is shown below:

@@ -425,7 +425,7 @@ The GSID2MPR tool writes the same set of ORANK output columns for the convention
- TZFND
- d(Tz)/d(Tr)

The gsidens2orank output may be passed to the Stat-Analysis tool to derive additional statistics. In particular, users should consider running the **aggregate_stat** job type to read ORANK lines and ranked histograms (RHIST), probability integral transform histograms (PHIST), and spread-skill variance output (SSVAR). Stat-Analysis has been enhanced to parse any extra columns found at the end of the input lines. Users can filter the values in those extra columns using the **-column_thresh** and **-column_str** job command options.
The gsidens2orank output may be passed to the Stat-Analysis tool to derive additional statistics. In particular, users should consider running the **aggregate_stat** job type to read ORANK lines and ranked histograms (RHIST), probability integral transform histograms (PHIST), and spread-skill variance output (SSVAR). Stat-Analysis has been enhanced to parse any extra columns found at the end of the input lines. Users can filter the values in those extra columns using the **-column_thresh**, **-column_str**, and **-column_str_exc** job command options.

An example of the Stat-Analysis calling sequence is shown below:

15 changes: 8 additions & 7 deletions met/docs/Users_Guide/stat-analysis.rst
@@ -522,13 +522,14 @@ This job command option is extremely useful. It can be used multiple times to sp

.. code-block:: none
-column_min col_name value
-column_max col_name value
-column_eq col_name value
-column_thresh col_name thresh
-column_str col_name string
The column filtering options may be used when the **-line_type** has been set to a single value. These options take two arguments, the name of the data column to be used followed by a value, string, or threshold to be applied. If multiple column_min/max/eq/thresh/str options are listed, the job will be performed on their intersection. Each input line is only retained if its value meets the numeric filtering criteria defined or matches one of the strings defined by the **-column_str** option. Multiple filtering strings may be listed using commas. Defining thresholds in MET is described in :numref:`config_options`.
-column_min col_name value
-column_max col_name value
-column_eq col_name value
-column_thresh col_name thresh
-column_str col_name string
-column_str_exc col_name string
The column filtering options may be used when the **-line_type** has been set to a single value. These options take two arguments, the name of the data column to be used followed by a value, string, or threshold to be applied. If multiple column_min/max/eq/thresh/str options are listed, the job will be performed on their intersection. Each input line is only retained if its value meets the numeric filtering criteria defined, matches one of the strings defined by the **-column_str** option, or does not match any of the string defined by the **-column_str_exc** option. Multiple filtering strings may be listed using commas. Defining thresholds in MET is described in :numref:`config_options`.

.. code-block:: none
24 changes: 21 additions & 3 deletions met/docs/Users_Guide/tc-stat.rst
@@ -251,7 +251,16 @@ _________________________
column_str_name = [];
column_str_val = [];
The **column_str_name** and **column_str_val** fields stratify by performing string matching on non-numeric data columns. Specify a comma-separated list of columns names and values to be checked. The length of the **column_str_val** should match that of the **column_str_name**. Using the **-column_str name val** option within the job command lines may further refine these selections.
The **column_str_name** and **column_str_val** fields stratify by performing string matching on non-numeric data columns. Specify a comma-separated list of columns names and values to be **included** in the analysis. The length of the **column_str_val** should match that of the **column_str_name**. Using the **-column_str name val** option within the job command lines may further refine these selections.

_________________________

.. code-block:: none
column_str_exc_name = [];
column_str_exc_val = [];
The **column_str_exc_name** and **column_str_exc_val** fields stratify by performing string matching on non-numeric data columns. Specify a comma-separated list of columns names and values to be **excluded** from the analysis. The length of the **column_str_exc_val** should match that of the **column_str_exc_name**. Using the **-column_str_exc name val** option within the job command lines may further refine these selections.

_________________________

@@ -260,7 +269,7 @@ _________________________
init_thresh_name = [];
init_thresh_val = [];
The **init_thresh_name** and **init_thresh_val** fields stratify by applying thresholds to numeric data columns only when lead = 0. If lead =0, but the value does not meet the threshold, discard the entire track. The length of the **init_thresh_val** should match that of the **init_thresh_name**. Using the **-init_thresh name val** option within the job command lines may further refine these selections.
The **init_thresh_name** and **init_thresh_val** fields stratify by applying thresholds to numeric data columns only when lead = 0. If lead = 0, but the value does not meet the threshold, discard the entire track. The length of the **init_thresh_val** should match that of the **init_thresh_name**. Using the **-init_thresh name val** option within the job command lines may further refine these selections.

_________________________

@@ -269,7 +278,16 @@ _________________________
init_str_name = [];
init_str_val = [];
The **init_str_name** and **init_str_val** fields stratify by performing string matching on non-numeric data columns only when lead = 0. If lead =0, but the string does not match, discard the entire track. The length of the **init_str_val** should match that of the **init_str_name**. Using the **-init_str name val** option within the job command lines may further refine these selections.
The **init_str_name** and **init_str_val** fields stratify by performing string matching on non-numeric data columns only when lead = 0. If lead = 0, but the string **does not** match, discard the entire track. The length of the **init_str_val** should match that of the **init_str_name**. Using the **-init_str name val** option within the job command lines may further refine these selections.

_________________________

.. code-block:: none
init_str_exc_name = [];
init_str_exc_val = [];
The **init_str_exc_name** and **init_str_exc_val** fields stratify by performing string matching on non-numeric data columns only when lead = 0. If lead = 0, and the string **does** match, discard the entire track. The length of the **init_str_exc_val** should match that of the **init_str_exc_name**. Using the **-init_str_exc name val** option within the job command lines may further refine these selections.
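
The track-level behavior described for **init_str_exc** can be illustrated with a minimal sketch. This is not the tc_stat implementation: `TrackPoint`, `keep_track`, and the exact (case-sensitive) comparison are hypothetical names and assumptions introduced only for illustration, and the sketch assumes a track with no lead = 0 point is kept.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical stand-in for one point along a track.
struct TrackPoint {
    int lead_hours;      // forecast lead time; 0 marks the initialization point
    std::string level;   // non-numeric column value, e.g. "HU" or "TS"
};

// Returns false (discard the whole track) if any lead = 0 point has a
// value that appears in the exclusion list; points at other leads are
// never inspected.
bool keep_track(const std::vector<TrackPoint>& track,
                const std::vector<std::string>& exc_levels) {
    for (const TrackPoint& p : track) {
        if (p.lead_hours != 0) continue;  // only test the initialization point
        if (std::find(exc_levels.begin(), exc_levels.end(), p.level)
                != exc_levels.end()) {
            return false;                 // excluded at init: drop entire track
        }
    }
    return true;
}
```

Under these assumptions, a track that is "HU" at lead = 0 is dropped when "HU" is excluded, while a track that only becomes "HU" at a later lead is retained.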

_________________________

4 changes: 4 additions & 0 deletions met/src/basic/vx_config/config_constants.h
@@ -1037,10 +1037,14 @@ static const char conf_key_column_thresh_name[] = "column_thresh_name";
static const char conf_key_column_thresh_val[] = "column_thresh_val";
static const char conf_key_column_str_name[] = "column_str_name";
static const char conf_key_column_str_val[] = "column_str_val";
static const char conf_key_column_str_exc_name[] = "column_str_exc_name";
static const char conf_key_column_str_exc_val[] = "column_str_exc_val";
static const char conf_key_init_thresh_name[] = "init_thresh_name";
static const char conf_key_init_thresh_val[] = "init_thresh_val";
static const char conf_key_init_str_name[] = "init_str_name";
static const char conf_key_init_str_val[] = "init_str_val";
static const char conf_key_init_str_exc_name[] = "init_str_exc_name";
static const char conf_key_init_str_exc_val[] = "init_str_exc_val";
static const char conf_key_water_only[] = "water_only";
static const char conf_key_rirw_track[] = "rirw.track";
static const char conf_key_rirw_time_adeck[] = "rirw.adeck.time";
77 changes: 64 additions & 13 deletions met/src/libcode/vx_analysis_util/stat_job.cc
@@ -172,7 +172,8 @@ void STATAnalysisJob::clear() {
wmo_fisher_stats.clear();

column_thresh_map.clear();
column_str_map.clear();
column_str_inc_map.clear();
column_str_exc_map.clear();

by_column.clear();

@@ -306,7 +307,8 @@ void STATAnalysisJob::assign(const STATAnalysisJob & aj) {
wmo_fisher_stats = aj.wmo_fisher_stats;

column_thresh_map = aj.column_thresh_map;
column_str_map = aj.column_str_map;
column_str_inc_map = aj.column_str_inc_map;
column_str_exc_map = aj.column_str_exc_map;

by_column = aj.by_column;

@@ -507,9 +509,16 @@ void STATAnalysisJob::dump(ostream & out, int depth) const {
thr_it->second.dump(out, depth + 1);
}

out << prefix << "column_str_map ...\n";
for(map<ConcatString,StringArray>::const_iterator str_it = column_str_map.begin();
str_it != column_str_map.end(); str_it++) {
out << prefix << "column_str_inc_map ...\n";
for(map<ConcatString,StringArray>::const_iterator str_it = column_str_inc_map.begin();
str_it != column_str_inc_map.end(); str_it++) {
out << prefix << str_it->first << ": \n";
str_it->second.dump(out, depth + 1);
}

out << prefix << "column_str_exc_map ...\n";
for(map<ConcatString,StringArray>::const_iterator str_it = column_str_exc_map.begin();
str_it != column_str_exc_map.end(); str_it++) {
out << prefix << str_it->first << ": \n";
str_it->second.dump(out, depth + 1);
}
@@ -948,15 +957,27 @@ int STATAnalysisJob::is_keeper(const STATLine & L) const {
//
// column_str
//
for(map<ConcatString,StringArray>::const_iterator str_it = column_str_map.begin();
str_it != column_str_map.end(); str_it++) {
for(map<ConcatString,StringArray>::const_iterator str_it = column_str_inc_map.begin();
str_it != column_str_inc_map.end(); str_it++) {

//
// Check if the current value is in the list for the column
//
if(!str_it->second.has(L.get_item(str_it->first.c_str(), false))) return(0);
}

//
// column_str_exc
//
for(map<ConcatString,StringArray>::const_iterator str_it = column_str_exc_map.begin();
str_it != column_str_exc_map.end(); str_it++) {

//
// Check if the current value is not in the list for the column
//
if(str_it->second.has(L.get_item(str_it->first.c_str(), false))) return(0);
}

//
// For MPR lines, check mask_grid, mask_poly, and mask_sid
//
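
The combined include/exclude check in the is_keeper hunk above can be sketched in isolation. The following is a simplified illustration, not the MET code: `Row`, `StrFilter`, `keep_line`, and the helper functions are hypothetical stand-ins using the standard library in place of `STATLine`, `ConcatString`, and `StringArray`. It assumes case-insensitive matching (mirroring the `set_ignore_case(1)` call in the parsing code) and treats a missing column as an empty string.

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for one input line: column name -> string value.
using Row = std::map<std::string, std::string>;
// Hypothetical stand-in for column_str_inc_map / column_str_exc_map.
using StrFilter = std::map<std::string, std::vector<std::string>>;

// Upper-case a copy of the string so comparisons ignore case.
static std::string upper(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
    return s;
}

// True if value equals any entry of list, ignoring case.
static bool has(const std::vector<std::string>& list, const std::string& value) {
    const std::string v = upper(value);
    return std::any_of(list.begin(), list.end(),
                       [&](const std::string& s) { return upper(s) == v; });
}

// Look up a column value; a missing column yields "" (an assumption).
static std::string get_item(const Row& row, const std::string& col) {
    auto it = row.find(col);
    return it == row.end() ? std::string() : it->second;
}

// Keep the line only if every inclusion filter matches and no
// exclusion filter matches, i.e. the intersection of all filters.
bool keep_line(const Row& row, const StrFilter& inc, const StrFilter& exc) {
    for (const auto& f : inc)
        if (!has(f.second, get_item(row, f.first))) return false;
    for (const auto& f : exc)
        if (has(f.second, get_item(row, f.first))) return false;
    return true;
}
```

The two loops differ only in the negation of the membership test, which is why the commit renames the original map to `column_str_inc_map` to keep the pairing with `column_str_exc_map` explicit.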
@@ -1125,7 +1146,10 @@ void STATAnalysisJob::parse_job_command(const char *jobstring) {
column_thresh_map.clear();
}
else if(jc_array[i] == "-column_str" ) {
column_str_map.clear();
column_str_inc_map.clear();
}
else if(jc_array[i] == "-column_str_exc" ) {
column_str_exc_map.clear();
}
else if(jc_array[i] == "-set_hdr" ) {
hdr_name.clear();
@@ -1376,12 +1400,30 @@ void STATAnalysisJob::parse_job_command(const char *jobstring) {
col_value.add_css(jc_array[i+2]);

// If the column name is already present in the map, add to it
if(column_str_map.count(col_name) > 0) {
column_str_map[col_name].add(col_value);
if(column_str_inc_map.count(col_name) > 0) {
column_str_inc_map[col_name].add(col_value);
}
// Otherwise, add a new map entry
else {
column_str_map.insert(pair<ConcatString, StringArray>(col_name, col_value));
column_str_inc_map.insert(pair<ConcatString, StringArray>(col_name, col_value));
}
i+=2;
}
else if(jc_array[i] == "-column_str_exc") {

// Parse the column name and value
col_name = to_upper((string)jc_array[i+1]);
col_value.clear();
col_value.set_ignore_case(1);
col_value.add_css(jc_array[i+2]);

// If the column name is already present in the map, add to it
if(column_str_exc_map.count(col_name) > 0) {
column_str_exc_map[col_name].add(col_value);
}
// Otherwise, add a new map entry
else {
column_str_exc_map.insert(pair<ConcatString, StringArray>(col_name, col_value));
}
i+=2;
}
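
The parse-and-merge step for `-column_str_exc` can be sketched with standard containers. This is an illustrative sketch only: `split_css` and `add_filter` are hypothetical helpers standing in for `StringArray::add_css` and the count/insert branches in the hunk above, and the sketch assumes empty list entries are skipped.

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for column_str_inc_map / column_str_exc_map.
using StrFilter = std::map<std::string, std::vector<std::string>>;

// Upper-case a copy of the string, as the job parser does for column names.
static std::string upper(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
    return s;
}

// Split a comma-separated string into its pieces, skipping empty pieces.
std::vector<std::string> split_css(const std::string& css) {
    std::vector<std::string> out;
    std::stringstream ss(css);
    std::string item;
    while (std::getline(ss, item, ',')) {
        if (!item.empty()) out.push_back(item);
    }
    return out;
}

// Add the values for one "-column_str_exc col_name string" option.
// operator[] creates an empty entry when the column is new, so the
// "already present" and "new entry" cases collapse into one path.
void add_filter(StrFilter& filter, const std::string& col, const std::string& css) {
    std::vector<std::string>& list = filter[upper(col)];
    for (const std::string& v : split_css(css)) list.push_back(v);
}
```

Repeating an option for the same column therefore accumulates values rather than replacing them, which matches the "add to it" comment in the hunk.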
@@ -2461,14 +2503,23 @@ ConcatString STATAnalysisJob::get_jobstring() const {
}

// column_str
for(map<ConcatString,StringArray>::const_iterator str_it = column_str_map.begin();
str_it != column_str_map.end(); str_it++) {
for(map<ConcatString,StringArray>::const_iterator str_it = column_str_inc_map.begin();
str_it != column_str_inc_map.end(); str_it++) {

for(i=0; i<str_it->second.n(); i++) {
js << "-column_str " << str_it->first << " " << str_it->second[i] << " ";
}
}

// column_str_exc
for(map<ConcatString,StringArray>::const_iterator str_it = column_str_exc_map.begin();
str_it != column_str_exc_map.end(); str_it++) {

for(i=0; i<str_it->second.n(); i++) {
js << "-column_str_exc " << str_it->first << " " << str_it->second[i] << " ";
}
}

// by_column
if(by_column.n() > 0) {
for(i=0; i<by_column.n(); i++)
3 changes: 2 additions & 1 deletion met/src/libcode/vx_analysis_util/stat_job.h
@@ -227,7 +227,8 @@ class STATAnalysisJob {
map<ConcatString,ThreshArray> column_thresh_map;

// ASCII column string matching
map<ConcatString,StringArray> column_str_map;
map<ConcatString,StringArray> column_str_inc_map;
map<ConcatString,StringArray> column_str_exc_map;

StringArray hdr_name;
StringArray hdr_value;