Update develop-ref after #2694 and #2705 (#2714)

Co-authored-by: jprestop <[email protected]>
Co-authored-by: Seth Linden <[email protected]>
Co-authored-by: John Halley Gotway <[email protected]>
Co-authored-by: Daniel Adriaansen <[email protected]>
Co-authored-by: John and Cindy <[email protected]>
Co-authored-by: rgbullock <[email protected]>
Co-authored-by: Randy Bullock <[email protected]>
Co-authored-by: Dave Albo <[email protected]>
Co-authored-by: Howard Soh <[email protected]>
Co-authored-by: George McCabe <[email protected]>
Co-authored-by: hsoh-u <[email protected]>
Co-authored-by: MET Tools Test Account <[email protected]>
Co-authored-by: Seth Linden <[email protected]>
Co-authored-by: lisagoodrich <[email protected]>
Co-authored-by: davidalbo <[email protected]>
Co-authored-by: Lisa Goodrich <[email protected]>
Co-authored-by: metplus-bot <[email protected]>
Co-authored-by: j-opatz <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Jonathan Vigh <[email protected]>
Co-authored-by: Tracy Hertneky <[email protected]>
Co-authored-by: David Albo <[email protected]>
Co-authored-by: Dan Adriaansen <[email protected]>

fix 2518 dtypes appf docs (#2519)
fix 2531 compilation errors (#2533)
fix #2531 compilation_errors_configure (#2535)
fix #2514 develop clang (#2563)
fix #2575 develop python_convert (#2576)
Fix Python environment issue (#2407)
fix definitions of G172 and G220 based on comments in NOAA-EMC/NCEPLIBS-w3emc#157. (#2406)
fix #2380 develop override (#2382)
fix #2408 develop empty config (#2410)
fix #2390 develop compile zlib (#2404)
fix #2412 develop climo (#2422)
fix #2437 develop convert (#2439)
fix for develop, for #2437, forgot one reference to the search_parent for a dictionary lookup.
fix #2452 develop airnow (#2454)
fix #2449 develop pdf (#2464)
fix #2402 develop sonarqube (#2468)
fix #2426 develop buoy (#2475)
fix 2596 main v11.1 rpath compilation (#2614)
fix #2514 main_v11.1 clang (#2628)
fix #2644 develop percentile (#2647)
dtcenter · Oct 13, 2023 · 9fce8d1
1 parent 80afcac
Showing 57 changed files with 851 additions and 543 deletions.
diff --git a/data/config/PlotPointObsConfig_default b/data/config/PlotPointObsConfig_default
@@ -79,7 +79,6 @@ point_data = [
 
 ////////////////////////////////////////////////////////////////////////////////
 
-tmp_dir = "/tmp";
 version = "V12.0.0";
 
 ////////////////////////////////////////////////////////////////////////////////
diff --git a/data/config/TCPairsConfig_default b/data/config/TCPairsConfig_default
@@ -145,7 +145,7 @@ diag_info_map = [
       diag_source    = "CIRA_DIAG_RT";
       track_source   = "GFS";
       field_source   = "GFS_0p50";
-      match_to_track = [ "GFS" ];
+      match_to_track = [];
       diag_name      = [];
    },
    { 

diff --git a/docs/Contributors_Guide/dev_details/index.rst b/docs/Contributors_Guide/dev_details/index.rst
@@ -0,0 +1,11 @@
+*******************
+Development Details
+*******************
+
+This chapter provides specific details about select topics within the
+MET code base. The list of topics is certainly not comprehensive.
+
+.. toctree::
+   :titlesonly:
+
+   tmp_file_use
diff --git a/docs/Contributors_Guide/dev_details/tmp_file_use.rst b/docs/Contributors_Guide/dev_details/tmp_file_use.rst
@@ -0,0 +1,169 @@
+.. _tmp_file_use:
+
+Use of Temporary Files
+======================
+
+The MET application and library code uses temporary files in several
+places. Each specific use of temporary files is described below. The
+directory in which temporary files are stored is configurable, as
+described in :numref:`User's Guide Section %s <config_tmp_dir>`.
+
+Whenever a MET application is run, the operating system assigns it a
+process identification number (PID). All temporary files created by
+MET include the PID in the file name so that multiple instances can
+run concurrently without conflict. In addition, when creating a
+temporary file name, the :code:`make_temp_file_name(...)` utility
+function appends :code:`_0` to the PID, checks to see if the
+corresponding file name is already in use, and if so, tries
+:code:`_1`, :code:`_2` and so on, until an available file name is
+found.
+
+Note that creating, reading, and deleting temporary files from the
+local filesystem is much more efficient than performing these
+operations across a network filesystem. Using the default
+:code:`/tmp` directory is recommended, unless prohibited by policies
+on your system.
+
+In general, MET applications delete any temporary files they create
+when they are no longer needed. However, if the application exits
+abnormally, the temporary files may remain.
+
+.. _tmp_files_pb2nc:
+
+PB2NC Tool
+^^^^^^^^^^
+
+The PB2NC tool reads input binary files in the BUFR or PrepBUFR
+format, extracts and/or derives observations from them, filters
+those observations, and writes the result to a NetCDF output file.
+
+PB2NC creates the following temporary files when running:
+
+* :code:`tmp_pb2nc_blk_{PID}`, :code:`tmp_pb2nc_meta_blk_{PID}`,
+  :code:`tmp_pb2nc_tbl_blk_{PID}`
+
+  PB2NC assumes that each input binary file requires Fortran
+  blocking prior to being read by the BUFRLIB library. It applies
+  Fortran blocking, writes the result to this temporary file, and
+  uses BUFRLIB to read its contents.
+
+* :code:`tmp_pb2nc_bufr_{PID}_tbl`: PB2NC extracts BUFR table data
+  that is embedded in input files and writes it to this temporary
+  file for later use.
+
+.. note::
+   The first three files listed above are identical. They are all
+   Fortran-blocked versions of the same input file. Consider
+   modifying the logic to apply Fortran blocking only once.
+
+.. _tmp_files_point2grid:
+
+Point2Grid Tool
+^^^^^^^^^^^^^^^
+
+The Point2Grid tool reads point observations from a variety of
+inputs and summarizes them on a grid. When processing GOES input
+files, a temporary NetCDF file is created to store the mapping of
+input pixel locations to output grid cells unless the
+MET_GEOSTATIONARY_DATA environment variable defines an existing grid
+navigation file to be used.
+
+If that temporary geostationary grid mapping file already exists, it
+is used directly and not recreated. If not, it is created as needed.
+
+Note that this temporary file is *not* deleted by the Point2Grid
+tool. Once created, it is intended to be reused in future runs.
+
+.. _tmp_files_bootstrap:
+
+Bootstrap Confidence Intervals
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Several MET tools support the computation of bootstrap confidence
+intervals, as described in :numref:`User's Guide Section %s <config_boot>`
+and :numref:`User's Guide Appendix D, Section %s <App_D-Confidence-Intervals>`.
+When bootstrap confidence intervals are requested, up to two
+temporary files are created for each CNT, CTS, MCTS, NBRCNT, or
+NBRCTS line type written to the output.
+
+* :code:`tmp_{LINE_TYPE}_i_{PID}`: When the BCA bootstrapping method
+  is requested, jackknife resampling is applied to the input matched
+  pairs. Statistics are computed for each jackknife resample and
+  written to this temporary file.
+
+* :code:`tmp_{LINE_TYPE}_r_{PID}`: For each bootstrap replicate
+  computed from the input matched pairs, statistics are computed
+  and written to this temporary file.
+
+In these file names, {LINE_TYPE} is :code:`cnt`, :code:`cts`,
+:code:`mcts`, :code:`nbrcnt`, or :code:`nbrcts`.
+
+.. note::
+   Consider whether or not it's realistic to hold the resampled
+   statistics in memory rather than writing them to temporary files.
+   If so, that would reduce the I/O.
+
+.. _tmp_files_stat_analysis:
+
+Stat-Analysis Tool
+^^^^^^^^^^^^^^^^^^
+
+The Stat-Analysis tool reads ASCII output created by the MET
+statistics tools. A single job can be specified on the command line
+or one or more jobs can be specified in an optional configuration
+file. When a configuration file is provided, any filtering options
+specified are applied to all entries in the :code:`jobs` array.
+
+Rather than reading all of the input data for each job, Stat-Analysis
+reads all the input data once, applies any common filtering options,
+and writes the result to a temporary file.
+
+* :code:`tmp_stat_analysis_{PID}`: Stat-Analysis reads all of the
+  input data, applies common filtering logic, and writes the result
+  to this temporary file. All of the specified jobs read data from
+  this temporary file, apply any additional job-specific filtering
+  criteria, and perform the requested operation.
+
+.. note::
+   Consider revising the logic to use a temporary file only when
+   actually necessary, that is, when multiple jobs are specified
+   along with non-empty common filtering logic.
+
+.. _tmp_files_python_embedding:
+
+Python Embedding
+^^^^^^^^^^^^^^^^
+
+As described in
+:numref:`User's Guide Appendix F, Section %s <appendixF>`, when the
+:code:`MET_PYTHON_EXE` environment variable is set, the MET tools run
+any Python embedding commands using the specified Python executable.
+
+* :code:`tmp_mpr_{PID}`: When Python embedding of matched pair data
+  is performed, a Python wrapper is run to execute the user-specified
+  Python script and write the result to this temporary ASCII file.
+
+* :code:`tmp_met_nc_{PID}`: When Python embedding of gridded data or
+  point observations is performed, a Python wrapper is run to
+  execute the user-specified Python script and write the result to
+  this temporary NetCDF file.
+
+The compile-time Python instance is run to read data from these
+temporary files.
+
+.. _tmp_files_tc_diag:
+
+TC-Diag Tool
+^^^^^^^^^^^^
+
+The TC-Diag tool requires the use of Python embedding. It processes
+one or more ATCF tracks and computes model diagnostics. For each
+track point, it converts gridded model data to cylindrical
+coordinates centered at that point, writes it to a temporary NetCDF
+file, and passes it to Python scripts to compute model diagnostics.
+
+* :code:`tmp_met_nc_{PID}`: Cylindrical coordinate model data is
+  written to this temporary NetCDF file for each track point
+  and passed to Python scripts to compute diagnostics. If requested,
+  these temporary NetCDF files for each track point are combined into
+  a single NetCDF cylindrical coordinates output file for each track.
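The naming scheme described in tmp_file_use.rst above (a prefix, the PID, then `_0`, `_1`, `_2`, and so on until an unused name is found) can be sketched in Python. This is only an illustration of the documented behavior, not MET's C++ implementation: the Python function below borrows the name `make_temp_file_name` from the text, but its parameters (`prefix`, `tmp_dir`, `pid`) are assumptions made for this example.

```python
import os
import tempfile


def make_temp_file_name(prefix, tmp_dir="/tmp", pid=None):
    """Return the first unused path of the form {tmp_dir}/{prefix}_{PID}_{N}.

    Hypothetical Python stand-in for the utility described in the text;
    the real MET function is C++ and its signature may differ.
    """
    pid = os.getpid() if pid is None else pid
    n = 0
    while True:
        candidate = os.path.join(tmp_dir, f"{prefix}_{pid}_{n}")
        # Try _0 first, then _1, _2, ... until the name is not in use.
        if not os.path.exists(candidate):
            return candidate
        n += 1


if __name__ == "__main__":
    # Use a scratch directory so the demo does not touch the real /tmp.
    with tempfile.TemporaryDirectory() as scratch:
        first = make_temp_file_name("tmp_stat_analysis", scratch)
        open(first, "w").close()  # claim the first name
        second = make_temp_file_name("tmp_stat_analysis", scratch)
        print(os.path.basename(first))
        print(os.path.basename(second))
```

Note that a check-then-create loop like this one leaves a small window in which another process could claim the same name. Including the PID in the name, as the text describes, is what keeps concurrent runs from colliding in practice.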
diff --git a/docs/Contributors_Guide/index.rst b/docs/Contributors_Guide/index.rst
@@ -7,9 +7,11 @@ Welcome to the Model Evaluation Tools (MET) Contributor's Guide.
 .. toctree::
    :titlesonly:
    :numbered:
+   :maxdepth: 1
 
    coding_standards
    dev_env
+   dev_details/index 
    github_workflow
    testing
    continuous_integration

diff --git a/docs/Users_Guide/appendixF.rst b/docs/Users_Guide/appendixF.rst
@@ -56,7 +56,7 @@ If a user attempts to invoke Python embedding with a version of MET that was not
 Controlling Which Python MET Uses When Running
 ==============================================
 
-When MET is compiled with Python embedding support, MET uses the Python executable in that Python installation by default when Python embedding is used. However, for users of highly configurable Python environments, the Python instance set at compilation time may not be sufficient. Users may want to use an alternate Python installation if they need additional packages not available in the Python installation used when compiling MET. In MET versions 9.0+, users have the ability to use a different Python executable when running MET than the version used when compiling MET by setting the environment variable **MET_PYTHON_EXE**.
+When MET is compiled with Python embedding support, MET uses the Python executable in that Python installation by default when Python embedding is used. However, for users of highly configurable Python environments, the Python instance set at compilation time may not be sufficient. Users may want to use an alternate Python installation if they need additional packages not available in the Python installation used when compiling MET. In MET versions 9.0+, users have the ability to use a different Python executable when running MET than the version used when compiling MET by setting the environment variable **MET_PYTHON_EXE**. Whenever **MET_PYTHON_EXE** is set, MET writes a temporary file, as described in :numref:`Contributor's Guide Section %s <tmp_files_python_embedding>`.
 
 If a user's Python script requires packages that are not available in the Python installation used when compiling the MET software, they will encounter a runtime error when using MET. In this instance, the user will need to change the Python MET is using to a different installation with the required packages for their script. It is the responsibility of the user to manage this Python installation, and one popular approach is to use a custom Anaconda (Conda) Python environment. Once the Python installation meeting the user's requirements is available, the user can force MET to use it by setting the **MET_PYTHON_EXE** environment variable to the full path of the Python executable in that installation. For example:
 

diff --git a/docs/Users_Guide/config_options.rst b/docs/Users_Guide/config_options.rst
@@ -533,6 +533,8 @@ override the default value set in ConfigConstants.
 		
   output_precision = 5;
 
+.. _config_tmp_dir:
+
 tmp_dir
 ^^^^^^^
 
@@ -546,6 +548,9 @@ Some tools override the temporary directory by the command line argument
 		
   tmp_dir = "/tmp";
 
+A description of the use of temporary files in MET can be found in
+:numref:`Contributor's Guide Section %s <tmp_file_use>`.
+
 message_type_group_map
 ^^^^^^^^^^^^^^^^^^^^^^
 
@@ -1684,6 +1689,8 @@ interval.
 		
   ci_alpha = [ 0.05, 0.10 ];
 
+.. _config_boot:
+
 boot
 ^^^^
 

diff --git a/docs/Users_Guide/plotting.rst b/docs/Users_Guide/plotting.rst
@@ -86,7 +86,6 @@ ________________________
 
 .. code-block:: none
 
-  tmp_dir        = "/tmp";
   version        = "VN.N";
 
 The configuration options listed above are common to multiple MET tools and are described in :numref:`config_options`.

diff --git a/docs/Users_Guide/point-stat.rst b/docs/Users_Guide/point-stat.rst
@@ -197,7 +197,7 @@ For continuous fields (e.g., temperature), it is possible to estimate confidence
 
 For the measures relating the two fields (i.e., mean error, correlation and standard deviation of the errors), confidence intervals are based on either the joint distributions of the two fields (e.g., with correlation) or on a function of the two fields. For the correlation, the underlying assumption is that the two fields follow a bivariate normal distribution. In the case of the mean error and the standard deviation of the mean error, the assumption is that the errors are normally distributed, which for continuous variables, is usually a reasonable assumption, even for the standard deviation of the errors.
 
-Bootstrap confidence intervals for any verification statistic are available in MET. Bootstrapping is a nonparametric statistical method for estimating parameters and uncertainty information. The idea is to obtain a sample of the verification statistic(s) of interest (e.g., bias, ETS, etc.) so that inferences can be made from this sample. The assumption is that the original sample of matched forecast-observation pairs is representative of the population. Several replicated samples are taken with replacement from this set of forecast-observation pairs of variables (e.g., precipitation, temperature, etc.), and the statistic(s) are calculated for each replicate. That is, given a set of n forecast-observation pairs, we draw values at random from these pairs, allowing the same pair to be drawn more than once, and the statistic(s) is (are) calculated for each replicated sample. This yields a sample of the statistic(s) based solely on the data without making any assumptions about the underlying distribution of the sample. It should be noted, however, that if the observed sample of matched pairs is dependent, then this dependence should be taken into account somehow. Currently, the confidence interval methods in MET do not take into account dependence, but future releases will support a robust method allowing for dependence in the original sample. More detailed information about the bootstrap algorithm is found in the :numref:`Appendix D, Section %s. <appendixD>`
+Bootstrap confidence intervals for any verification statistic are available in MET. Bootstrapping is a nonparametric statistical method for estimating parameters and uncertainty information. The idea is to obtain a sample of the verification statistic(s) of interest (e.g., bias, ETS, etc.) so that inferences can be made from this sample. The assumption is that the original sample of matched forecast-observation pairs is representative of the population. Several replicated samples are taken with replacement from this set of forecast-observation pairs of variables (e.g., precipitation, temperature, etc.), and the statistic(s) are calculated for each replicate. That is, given a set of n forecast-observation pairs, we draw values at random from these pairs, allowing the same pair to be drawn more than once, and the statistic(s) is (are) calculated for each replicated sample. This yields a sample of the statistic(s) based solely on the data without making any assumptions about the underlying distribution of the sample. It should be noted, however, that if the observed sample of matched pairs is dependent, then this dependence should be taken into account somehow. Currently, the confidence interval methods in MET do not take into account dependence, but future releases will support a robust method allowing for dependence in the original sample. More detailed information about the bootstrap algorithm is found in the :numref:`Appendix D, Section %s <appendixD>`. Note that MET writes temporary files whenever bootstrap confidence intervals are computed, as described in :numref:`Contributor's Guide Section %s <tmp_files_bootstrap>`. 
 
 Confidence intervals can be calculated from the sample of verification statistics obtained through the bootstrap algorithm. The most intuitive method is to simply take the appropriate quantiles of the sample of statistic(s). For example, if one wants a 95% CI, then one would take the 2.5 and 97.5 percentiles of the resulting sample. This method is called the percentile method, and has some nice properties. However, if the original sample is biased and/or has non-constant variance, then it is well known that this interval is too optimistic. The most robust, accurate, and well-behaved way to obtain accurate CIs from bootstrapping is to use the bias corrected and adjusted percentile method (or BCa). If there is no bias, and the variance is constant, then this method will yield the usual percentile interval. The only drawback to the approach is that it is computationally intensive. Therefore, both the percentile and BCa methods are available in MET, with the considerably more efficient percentile method being the default.
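The percentile method described in the paragraph above can be sketched as follows. This is a minimal illustration under stated assumptions, not MET's implementation: the function names, the default of 1000 replicates, the fixed seed, and the linear-interpolation percentile rule are all choices made for this example.

```python
import random


def percentile(sorted_vals, q):
    """Percentile q (0 to 100) of a sorted list, using linear interpolation."""
    if not sorted_vals:
        raise ValueError("empty sample")
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def bootstrap_percentile_ci(pairs, stat, n_rep=1000, alpha=0.05, seed=1):
    """Percentile-method bootstrap confidence interval for stat(pairs).

    pairs is a list of (forecast, observation) tuples and stat maps such
    a list to a single number. Hypothetical helper for illustration.
    """
    rng = random.Random(seed)
    n = len(pairs)
    reps = []
    for _ in range(n_rep):
        # Draw n pairs at random with replacement from the original sample.
        sample = [pairs[rng.randrange(n)] for _ in range(n)]
        reps.append(stat(sample))
    reps.sort()
    # For alpha = 0.05 these are the 2.5 and 97.5 percentiles.
    return (percentile(reps, 100.0 * alpha / 2.0),
            percentile(reps, 100.0 * (1.0 - alpha / 2.0)))


def mean_error(pairs):
    """Mean of forecast minus observation over all matched pairs."""
    return sum(f - o for f, o in pairs) / len(pairs)


if __name__ == "__main__":
    matched = [(2.1, 1.8), (0.4, 0.9), (3.3, 2.7), (1.0, 1.4), (2.6, 2.0)]
    lower, upper = bootstrap_percentile_ci(matched, mean_error)
    print(f"95% CI for mean error: [{lower:.3f}, {upper:.3f}]")
```

The BCa method mentioned in the text adjusts these percentile cutoffs using bias and acceleration estimates; that adjustment is not shown here.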
 
@@ -911,7 +911,7 @@ The first set of header columns are common to all of the output files generated
   * - 26
     - N_CAT
     - Dimension of the contingency table
-  * - 28
+  * - 27
     - Fi_Oj
     - Count of events in forecast category i and observation category j, with the observations incrementing first (repeated)
   * - \*

diff --git a/docs/Users_Guide/reformat_point.rst b/docs/Users_Guide/reformat_point.rst
@@ -108,6 +108,7 @@ ____________________
 		version    = "VN.N";
 
 The configuration options listed above are common to many MET tools and are described in :numref:`config_options`.
+The use of temporary files in PB2NC is described in :numref:`Contributor's Guide Section %s <tmp_files_pb2nc>`.
 
 _____________________
 
@@ -1082,7 +1083,7 @@ Optional arguments for point2grid
 
 Only 4 interpolation methods are applied to the field variables: MIN/MAX/MEDIAN/UW_MEAN. The GAUSSIAN method is applied to the probability variable only. Unlike regrid_data_plane, with the MAXGAUSS method the MAX method is applied to the field variable and the Gaussian method to the probability variable. If the probability variable is not requested, the MAXGAUSS method is the same as the MAX method.
 
-For the GOES-16 and GOES-17 data, the computing lat/long is time consuming. So the computed coordinate (lat/long) is saved into the NetCDF file to the environment variable MET_TMP_DIR or */tmp* if MET_TMP_DIR is not defined. The computing lat/long step can be skipped if the coordinate file is given through the environment variable MET_GEOSTATIONARY_DATA. The grid mapping to the target grid is saved to MET_TMP_DIR to save the execution time. Once this file is created, the MET_GEOSTATIONARY_DATA is ignored. The grid mapping file should be deleted manually in order to apply a new MET_GEOSTATIONARY_DATA environment variable or to re-generate the grid mapping file. An example of call point2grid to process GOES-16 AOD data is shown below:
+For GOES-16 and GOES-17 data, computing the lat/long coordinates is time consuming. The computed coordinates (lat/long) are saved to a temporary NetCDF file, as described in :numref:`Contributor's Guide Section %s <tmp_files_point2grid>`. The lat/long computation step can be skipped if the coordinate file is given through the environment variable MET_GEOSTATIONARY_DATA. The grid mapping to the target grid is saved to MET_TMP_DIR to save execution time. Once this file is created, MET_GEOSTATIONARY_DATA is ignored. The grid mapping file should be deleted manually in order to apply a new MET_GEOSTATIONARY_DATA environment variable or to re-generate the grid mapping file. An example of calling point2grid to process GOES-16 AOD data is shown below:
 
 .. code-block:: none