Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature 1735 stat_analysis #1754

Merged
merged 9 commits into from
Apr 16, 2021
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 5 additions & 3 deletions met/docs/Users_Guide/stat-analysis.rst
Original file line number Diff line number Diff line change
Expand Up @@ -44,9 +44,11 @@ The Stat-Analysis “aggregate” job aggregates values from multiple STAT lines
Aggregate STAT lines and produce aggregated statistics
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The Stat-Analysis “aggregate-stat” job aggregates multiple STAT lines of the same type together and produces relevant statistics from the aggregated line. This may be done in the same manner listed above in :numref:`StA_Aggregated-values-from`. However, rather than writing out the aggregated STAT line itself, the relevant statistics generated from that aggregated line are provided in the output. Specifically, if a contingency table line type (FHO, CTC, PCT, MCTC, or NBRCTC) has been aggregated, a contingency table statistics (CTS, PSTD, MCTS, or NBRCTS) line type will be written out. If a partial sums line type (SL1L2 or SAL1L2) has been aggregated, a continuous statistics (CNT) line type will be written out. If a vector partial sums line type (VL1L2) has been aggregated, the vector continuous statistics (VCNT) line type will be written out. For ensembles, the ORANK line type can be accumulated into ECNT, RPS, RHIST, PHIST, RELP, or SSVAR output. If the matched pair line type (MPR) has been aggregated, the user may choose the line type to be output (FHO, CTC, CTS, CNT, MCTC, MCTS, SL1L2, SAL1L2, VL1L2, VCNT, WDIR, PCT, PSTD, PJC, PRC, or ECLV).
The Stat-Analysis “aggregate-stat” job aggregates multiple STAT lines of the same type together and produces relevant statistics from the aggregated line. This may be done in the same manner listed above in :numref:`StA_Aggregated-values-from`. However, rather than writing out the aggregated STAT line itself, the relevant statistics generated from that aggregated line are provided in the output. Specifically, if a contingency table line type (FHO, CTC, PCT, MCTC, or NBRCTC) has been aggregated, contingency table statistics (CTS, ECLV, PSTD, MCTS, or NBRCTS) line types can be computed. If a partial sums line type (SL1L2 or SAL1L2) has been aggregated, the continuous statistics (CNT) line type can be computed. If a vector partial sums line type (VL1L2) has been aggregated, the vector continuous statistics (VCNT) line type can be computed. For ensembles, the ORANK line type can be accumulated into ECNT, RPS, RHIST, PHIST, RELP, or SSVAR output. If the matched pair line type (MPR) has been aggregated, may output line types (FHO, CTC, CTS, CNT, MCTC, MCTS, SL1L2, SAL1L2, VL1L2, VCNT, WDIR, PCT, PSTD, PJC, PRC, or ECLV) can be computed. Multiple output line types may be specified for each “aggregate-stat” job, as long as each output is derivable from the input.

When aggregating the matched pair line type (MPR) and computing an output contingency table statistics (CTS) or continuous statistics (CNT) line type, the bootstrapping method is applied for computing confidence intervals. The bootstrapping method is applied here in the same way that it is applied in the statistics tools. For a set of n matched forecast-observation pairs, the matched pairs are resampled with replacement many times. For each replicated sample, the corresponding statistics are computed. The confidence intervals are derived from the statistics computed for each replicated sample.
When aggregating the matched pair line type (MPR), additional required job command options are determined by the requested output line type(s). For example, the “-out_thresh” (or “-out_fcst_thresh” and “-out_obs_thresh” options) are required to compute contingnecy table counts (FHO, CTC) or statistics (CTS). Those same job command options can also specify filtering thresholds when computing continuous partial sums (SL1L2, SAL1L2) or statistics (CNT). Output is written for each threshold specified.

When aggregating the matched pair line type (MPR) and computing an output contingency table statistics (CTS) or continuous statistics (CNT) line type, the bootstrapping method can be applied to compute confidence intervals. The bootstrapping method is applied here in the same way that it is applied in the statistics tools. For a set of n matched forecast-observation pairs, the matched pairs are resampled with replacement many times. For each replicated sample, the corresponding statistics are computed. The confidence intervals are derived from the statistics computed for each replicated sample.

.. _StA_Skill-Score-Index:

Expand Down Expand Up @@ -541,7 +543,7 @@ Each analysis job is performed over a subset of the input data. Filtering the in

-out_line_type name

This option specifies the desired output line type for the **aggregate_stat** job type.
This option specifies the desired output line type(s) for the **aggregate_stat** job type.

.. code-block:: none

Expand Down
50 changes: 46 additions & 4 deletions met/src/basic/vx_util/ascii_table.cc
Original file line number Diff line number Diff line change
Expand Up @@ -331,8 +331,6 @@ if ( e.size() != NRC ) {

}

//for (j=0; j<NRC; ++j) e[j] = "";

Nrows = NR;
Ncols = NC;

Expand All @@ -346,8 +344,6 @@ if ( !ColWidth.size() ) {

}

//for (j=0; j<Ncols; ++j) ColWidth[j] = 0;

InterColumnSpace.resize(Ncols - 1, default_ics);
InterRowSpace.resize(Nrows - 1, default_irs);

Expand Down Expand Up @@ -438,6 +434,52 @@ return;
////////////////////////////////////////////////////////////////////////


void AsciiTable::expand(const int NR, const int NC)

{

//
// already big enough, nothing to do
//

if ( NR < Nrows && NC < Ncols ) return;

//
// stash existing table
//

AsciiTable at = *this;

//
// resize and copy over data
//

set_size(max(Nrows, NR), max(Ncols, NC));

int r, c;

for (r=0; r<at.nrows(); ++r) {

for (c=0; c<at.ncols(); ++c) {

set_entry(r, c, at(r, c));

}

}

//
// done
//

return;

}


////////////////////////////////////////////////////////////////////////


void AsciiTable::set_ics(int value)

{
Expand Down
1 change: 1 addition & 0 deletions met/src/basic/vx_util/ascii_table.h
Original file line number Diff line number Diff line change
Expand Up @@ -193,6 +193,7 @@ class AsciiTable {

virtual void set_size(const int NR, const int NC);
virtual void add_rows(const int NR);
virtual void expand (const int NR, const int NC);

virtual void set_entry(const int r, const int c, const char*);
virtual void set_entry(const int r, const int c, const ConcatString &);
Expand Down
161 changes: 104 additions & 57 deletions met/src/libcode/vx_analysis_util/stat_job.cc
Original file line number Diff line number Diff line change
Expand Up @@ -186,6 +186,8 @@ void STATAnalysisJob::clear() {
if(dump_row) { delete [] dump_row; dump_row = (char *) 0; }
if(stat_file) { delete [] stat_file; stat_file = (char *) 0; }

stat_row = 0;

out_line_type.clear();

out_fcst_thresh.clear();
Expand Down Expand Up @@ -307,8 +309,8 @@ void STATAnalysisJob::assign(const STATAnalysisJob & aj) {
wmo_fisher_stats = aj.wmo_fisher_stats;

column_thresh_map = aj.column_thresh_map;
column_str_inc_map = aj.column_str_inc_map;
column_str_exc_map = aj.column_str_exc_map;
column_str_inc_map = aj.column_str_inc_map;
column_str_exc_map = aj.column_str_exc_map;

by_column = aj.by_column;

Expand Down Expand Up @@ -354,6 +356,8 @@ void STATAnalysisJob::assign(const STATAnalysisJob & aj) {
mask_poly = aj.mask_poly;
mask_sid = aj.mask_sid;

stat_row = aj.stat_row;

set_dump_row (aj.dump_row);
set_stat_file(aj.stat_file);

Expand Down Expand Up @@ -1880,6 +1884,8 @@ void STATAnalysisJob::open_stat_file() {

close_stat_file();

stat_row = 0;

if(!stat_file) return;

stat_out = new ofstream;
Expand All @@ -1904,7 +1910,7 @@ void STATAnalysisJob::setup_stat_file(int n_row, int n) {
int i, c, n_col;

//
// Nothing to do unless output STAT file stream is defined
// Nothing to do if no output STAT file stream is defined
//
if(!stat_out) return;

Expand Down Expand Up @@ -1970,63 +1976,104 @@ void STATAnalysisJob::setup_stat_file(int n_row, int n) {
n_col += n_header_columns;

//
// Setup the STAT table
//
stat_at.set_size(n_row, n_col);
justify_stat_cols(stat_at);
stat_at.set_precision(precision);
stat_at.set_bad_data_value(bad_data_double);
stat_at.set_bad_data_str(na_str);
stat_at.set_delete_trailing_blank_rows(1);

//
// Write the STAT header row
//
switch(out_lt) {
case stat_sl1l2: write_header_row (sl1l2_columns, n_sl1l2_columns, 1, stat_at, 0, 0); break;
case stat_sal1l2: write_header_row (sal1l2_columns, n_sal1l2_columns, 1, stat_at, 0, 0); break;
case stat_vl1l2: write_header_row (vl1l2_columns, n_vl1l2_columns, 1, stat_at, 0, 0); break;
case stat_val1l2: write_header_row (val1l2_columns, n_val1l2_columns, 1, stat_at, 0, 0); break;
case stat_fho: write_header_row (fho_columns, n_fho_columns, 1, stat_at, 0, 0); break;
case stat_ctc: write_header_row (ctc_columns, n_ctc_columns, 1, stat_at, 0, 0); break;
case stat_cts: write_header_row (cts_columns, n_cts_columns, 1, stat_at, 0, 0); break;
case stat_mctc: write_mctc_header_row (1, n, stat_at, 0, 0); break;
case stat_mcts: write_header_row (mcts_columns, n_mcts_columns, 1, stat_at, 0, 0); break;
case stat_cnt: write_header_row (cnt_columns, n_cnt_columns, 1, stat_at, 0, 0); break;
case stat_vcnt: write_header_row (vcnt_columns, n_vcnt_columns, 1, stat_at, 0, 0); break;
case stat_pct: write_pct_header_row (1, n, stat_at, 0, 0); break;
case stat_pstd: write_pstd_header_row (1, n, stat_at, 0, 0); break;
case stat_pjc: write_pjc_header_row (1, n, stat_at, 0, 0); break;
case stat_prc: write_prc_header_row (1, n, stat_at, 0, 0); break;
case stat_eclv: write_eclv_header_row (1, n, stat_at, 0, 0); break;
case stat_mpr: write_header_row (mpr_columns, n_mpr_columns, 1, stat_at, 0, 0); break;
case stat_nbrctc: write_header_row (nbrctc_columns, n_sl1l2_columns, 1, stat_at, 0, 0); break;
case stat_nbrcts: write_header_row (nbrcts_columns, n_sl1l2_columns, 1, stat_at, 0, 0); break;
case stat_nbrcnt: write_header_row (nbrcnt_columns, n_sl1l2_columns, 1, stat_at, 0, 0); break;
case stat_grad: write_header_row (grad_columns, n_grad_columns, 1, stat_at, 0, 0); break;
case stat_isc: write_header_row (isc_columns, n_isc_columns, 1, stat_at, 0, 0); break;
case stat_wdir: write_header_row (job_wdir_columns, n_job_wdir_columns, 1, stat_at, 0, 0); break;
case stat_ecnt: write_header_row (ecnt_columns, n_ecnt_columns, 1, stat_at, 0, 0); break;
case stat_rps: write_header_row (rps_columns, n_rps_columns, 1, stat_at, 0, 0); break;
case stat_rhist: write_rhist_header_row (1, n, stat_at, 0, 0); break;
case stat_phist: write_phist_header_row (1, n, stat_at, 0, 0); break;
case stat_relp: write_relp_header_row (1, n, stat_at, 0, 0); break;
case stat_orank: write_header_row (orank_columns, n_orank_columns, 1, stat_at, 0, 0); break;
case stat_ssvar: write_header_row (ssvar_columns, n_ssvar_columns, 1, stat_at, 0, 0); break;
case stat_genmpr: write_header_row (genmpr_columns, n_genmpr_columns, 1, stat_at, 0, 0); break;
// Create table from scratch
//
if(stat_row == 0) {

//
// Multiply the number of rows by the number of
// output line types to avoid resizing later
//
n_row *= max(1, out_sa.n());

//
// Write only header columns for unspecified line type
// Setup the STAT table
//
stat_at.set_size(n_row, n_col);
justify_stat_cols(stat_at);
stat_at.set_precision(precision);
stat_at.set_bad_data_value(bad_data_double);
stat_at.set_bad_data_str(na_str);
stat_at.set_delete_trailing_blank_rows(1);

//
case no_stat_line_type:
write_header_row ((const char **) 0, 0, 1, stat_at, 0, 0); break;

default:
mlog << Error << "\nSTATAnalysisJob::setup_stat_file() -> "
<< "unexpected stat line type \"" << statlinetype_to_string(out_lt)
<< "\"!\n\n";
exit(1);
break;
// Write the STAT header row
//
switch(out_lt) {
case stat_sl1l2: write_header_row (sl1l2_columns, n_sl1l2_columns, 1, stat_at, 0, 0); break;
case stat_sal1l2: write_header_row (sal1l2_columns, n_sal1l2_columns, 1, stat_at, 0, 0); break;
case stat_vl1l2: write_header_row (vl1l2_columns, n_vl1l2_columns, 1, stat_at, 0, 0); break;
case stat_val1l2: write_header_row (val1l2_columns, n_val1l2_columns, 1, stat_at, 0, 0); break;
case stat_fho: write_header_row (fho_columns, n_fho_columns, 1, stat_at, 0, 0); break;
case stat_ctc: write_header_row (ctc_columns, n_ctc_columns, 1, stat_at, 0, 0); break;
case stat_cts: write_header_row (cts_columns, n_cts_columns, 1, stat_at, 0, 0); break;
case stat_mctc: write_mctc_header_row (1, n, stat_at, 0, 0); break;
case stat_mcts: write_header_row (mcts_columns, n_mcts_columns, 1, stat_at, 0, 0); break;
case stat_cnt: write_header_row (cnt_columns, n_cnt_columns, 1, stat_at, 0, 0); break;
case stat_vcnt: write_header_row (vcnt_columns, n_vcnt_columns, 1, stat_at, 0, 0); break;
case stat_pct: write_pct_header_row (1, n, stat_at, 0, 0); break;
case stat_pstd: write_pstd_header_row (1, n, stat_at, 0, 0); break;
case stat_pjc: write_pjc_header_row (1, n, stat_at, 0, 0); break;
case stat_prc: write_prc_header_row (1, n, stat_at, 0, 0); break;
case stat_eclv: write_eclv_header_row (1, n, stat_at, 0, 0); break;
case stat_mpr: write_header_row (mpr_columns, n_mpr_columns, 1, stat_at, 0, 0); break;
case stat_nbrctc: write_header_row (nbrctc_columns, n_sl1l2_columns, 1, stat_at, 0, 0); break;
case stat_nbrcts: write_header_row (nbrcts_columns, n_sl1l2_columns, 1, stat_at, 0, 0); break;
case stat_nbrcnt: write_header_row (nbrcnt_columns, n_sl1l2_columns, 1, stat_at, 0, 0); break;
case stat_grad: write_header_row (grad_columns, n_grad_columns, 1, stat_at, 0, 0); break;
case stat_isc: write_header_row (isc_columns, n_isc_columns, 1, stat_at, 0, 0); break;
case stat_wdir: write_header_row (job_wdir_columns, n_job_wdir_columns, 1, stat_at, 0, 0); break;
case stat_ecnt: write_header_row (ecnt_columns, n_ecnt_columns, 1, stat_at, 0, 0); break;
case stat_rps: write_header_row (rps_columns, n_rps_columns, 1, stat_at, 0, 0); break;
case stat_rhist: write_rhist_header_row (1, n, stat_at, 0, 0); break;
case stat_phist: write_phist_header_row (1, n, stat_at, 0, 0); break;
case stat_relp: write_relp_header_row (1, n, stat_at, 0, 0); break;
case stat_orank: write_header_row (orank_columns, n_orank_columns, 1, stat_at, 0, 0); break;
case stat_ssvar: write_header_row (ssvar_columns, n_ssvar_columns, 1, stat_at, 0, 0); break;
case stat_genmpr: write_header_row (genmpr_columns, n_genmpr_columns, 1, stat_at, 0, 0); break;

//
// Write only header columns for unspecified line type
//
case no_stat_line_type:
write_header_row ((const char **) 0, 0, 1, stat_at, 0, 0); break;

default:
mlog << Error << "\nSTATAnalysisJob::setup_stat_file() -> "
<< "unexpected stat line type \"" << statlinetype_to_string(out_lt)
<< "\"!\n\n";
exit(1);
break;
}

//
// Increment row counter
//
stat_row++;
}
//
// Expand the table, if needed
//
else {

//
// Determine the required dimensions
//
int need_rows = max(stat_at.nrows(), stat_row + n_row);
int need_cols = max(stat_at.ncols(), n_col);

if(need_rows > stat_at.nrows() || need_cols > stat_at.ncols()) {

//
// Resize the STAT table
//
stat_at.expand(need_rows, need_cols);
justify_stat_cols(stat_at);
stat_at.set_precision(precision);
stat_at.set_bad_data_value(bad_data_double);
stat_at.set_bad_data_str(na_str);
stat_at.set_delete_trailing_blank_rows(1);
}
}

return;
Expand Down
1 change: 1 addition & 0 deletions met/src/libcode/vx_analysis_util/stat_job.h
Original file line number Diff line number Diff line change
Expand Up @@ -249,6 +249,7 @@ class STATAnalysisJob {
char *stat_file; // dump output statistics to a STAT file
ofstream *stat_out; // output file stream for -out_stat
AsciiTable stat_at; // AsciiTable for buffering output STAT data
int stat_row; // Counter for the current stat row

StringArray out_line_type; // output line types
ThreshArray out_fcst_thresh; // output forecast threshold(s)
Expand Down
Loading