
Commit

Per #2690, tweak tmp_file docs and add links FROM the User's Guide TO the Contributor's Guide.
JohnHalleyGotway committed Sep 27, 2023
1 parent d9e236f commit b42a4de
Showing 7 changed files with 41 additions and 33 deletions.
59 changes: 33 additions & 26 deletions docs/Contributors_Guide/dev_details/tmp_file_use.rst
@@ -5,8 +5,8 @@ Use of Temporary Files

The MET application and library code uses temporary files in several
places. Each specific use of temporary files is described below. The
-directory in which temporary files are stored is configurable as
-described in the User's Guide :numref:`config_tmp_dir`.
+directory in which temporary files are stored is configurable, as
+described in :numref:`User's Guide Section %s <config_tmp_dir>`.

Whenever a MET application is run, the operating system assigns it a
process identification number (PID). All temporary files created by
@@ -28,6 +28,8 @@ In general, MET applications delete any temporary files they create
when they are no longer needed. However, if the application exits
abnormally, the temporary files may remain.
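The PID-based naming convention described above can be sketched as follows. This is a minimal illustration, not MET source code; the helper name and argument layout are assumptions:

```python
import os

def tmp_file_name(tmp_dir, tool, suffix):
    # Embed the process ID (PID) in the name, e.g. tmp_pb2nc_bufr_12345_tbl,
    # so that concurrent runs of the same tool do not collide.
    return os.path.join(tmp_dir, "tmp_{}_{}_{}".format(tool, os.getpid(), suffix))
```

For example, :code:`tmp_file_name("/tmp", "pb2nc_bufr", "tbl")` yields a path unique to the current process.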

+.. _tmp_files_pb2nc:

PB2NC Tool
^^^^^^^^^^

@@ -46,40 +48,40 @@ PB2NC creates the following temporary files when running:
uses BUFRLIB to read its contents.

* :code:`tmp_pb2nc_bufr_{PID}_tbl`: PB2NC extracts Bufr table data
-that is embedded in input files, applies Fortran blocking, and
-writes it to this temporary file for later use.
+that is embedded in input files and writes it to this temporary
+file for later use.

.. note::
The first 3 files listed above are identical. They are all
-blocked versions of the same input file. Recommend modifying the
-logic to only block the input file once.
+Fortran-blocked versions of the same input file. Recommend
+modifying the logic to only apply Fortran blocking once.
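For reference, Fortran blocking frames each record with 4-byte length markers so that Fortran unformatted sequential reads (as used by BUFRLIB) can parse the data. A minimal sketch, assuming little-endian markers (the actual byte order depends on the platform and compiler):

```python
import struct

def fortran_block(records):
    # Frame each record with a leading and trailing 4-byte length
    # marker, mimicking Fortran unformatted sequential I/O.
    out = bytearray()
    for rec in records:
        marker = struct.pack("<i", len(rec))
        out += marker + rec + marker
    return bytes(out)
```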

+.. _tmp_files_point2grid:

Point2Grid Tool
^^^^^^^^^^^^^^^

The Point2Grid tool reads point observations from a variety of
inputs and summarizes them on a grid. When processing GOES input
files, a temporary NetCDF file is created to store the mapping of
-input pixel locations to output grid cells.
+input pixel locations to output grid cells unless the
+MET_GEOSTATIONARY_DATA environment variable defines an existing grid
+navigation file to be used.

-If that geostationary grid mapping file already exists, it is used
-directly and not recreated. If not, it is created as needed.
+If that temporary geostationary grid mapping file already exists, it
+is used directly and not recreated. If not, it is created as needed.

Note that this temporary file is *not* deleted by the Point2Grid
tool. Once created, it is intended to be reused in future runs.
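The reuse-or-create logic can be sketched as follows. The function and cached file names are illustrative assumptions, not the actual MET implementation:

```python
import os

def grid_nav_file(tmp_dir, grid_id, create_mapping):
    # Prefer a navigation file supplied via MET_GEOSTATIONARY_DATA;
    # otherwise reuse a previously cached mapping file, creating it
    # only when it does not yet exist. The cached file is not deleted.
    user_file = os.environ.get("MET_GEOSTATIONARY_DATA")
    if user_file and os.path.exists(user_file):
        return user_file
    cached = os.path.join(tmp_dir, "geostationary_{}.nc".format(grid_id))
    if not os.path.exists(cached):
        create_mapping(cached)  # expensive pixel-to-grid-cell mapping
    return cached
```

Calling it a second time returns the cached file without recomputing the mapping.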

-.. note::
-Should this grid navigation file actually be written to the
-temporary directory or should it be written to an output
-directory instead since its intended to be reused across multiple
-runs?
+.. _tmp_files_bootstrap:

Bootstrap Confidence Intervals
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Several MET tools support the computation of bootstrap confidence
-intervals as described in the User's Guide :numref:`config_boot`
-and :numref:`Appendix D, Section %s <App_D-Confidence-Intervals>`.
+intervals, as described in :numref:`User's Guide Section %s <config_boot>`
+and :numref:`User's Guide Appendix D, Section %s <App_D-Confidence-Intervals>`.
When bootstrap confidence intervals are requested, up to two
temporary files are created for each CNT, CTS, MCTS, NBRCNT, or
NBRCTS line type written to the output.
@@ -97,9 +99,11 @@ Where {LINE_TYPE} is :code:`cnt`, :code:`cts`, :code:`mcts`,
:code:`nbrcnt`, or :code:`nbrcts`.

.. note::
-Consider whether or not its realistic to hold the resampled
-statistics all in memory. If so, that'd save a lot of time in
-I/O.
+Consider whether or not it's realistic to hold the resampled
+statistics in memory rather than writing them to temporary files.
+If so, that would reduce the I/O.

+.. _tmp_files_stat_analysis:

Stat-Analysis Tool
^^^^^^^^^^^^^^^^^^
@@ -125,11 +129,13 @@ and writes the result to a temporary file.
necessary, when multiple jobs are specified along with non-empty
common filtering logic.

+.. _tmp_files_python_embedding:

Python Embedding
^^^^^^^^^^^^^^^^

-As described in the User's Guide
-:numref:`Appendix F, Section %s <appendixF>`, when the
+As described in
+:numref:`User's Guide Appendix F, Section %s <appendixF>`, when the
:code:`MET_PYTHON_EXE` environment variable is set, the MET tools run
any Python embedding commands using the specified Python executable.

@@ -145,18 +151,19 @@ any Python embedding commands using the specified Python executable.
The compile-time Python instance is run to read data from these
temporary files.
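The handoff can be sketched as follows: the user-specified Python executable writes its result to a temporary file, which the calling process then reads back. JSON and the helper name are purely illustrative assumptions; MET exchanges the data in its own format:

```python
import json
import os
import subprocess
import tempfile

def run_user_python(user_python, payload_expr):
    # Launch the user-specified Python executable, have it write its
    # result to a temporary file, then read that file back here.
    fd, tmp_path = tempfile.mkstemp(prefix="tmp_met_python_", suffix=".json")
    os.close(fd)
    script = "import json, sys; json.dump({}, open(sys.argv[1], 'w'))".format(payload_expr)
    try:
        subprocess.run([user_python, "-c", script, tmp_path], check=True)
        with open(tmp_path) as f:
            return json.load(f)
    finally:
        os.remove(tmp_path)  # clean up the temporary exchange file
```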

+.. _tmp_files_tc_diag:

TC-Diag Tool
^^^^^^^^^^^^

The TC-Diag tool requires the use of Python embedding. It processes
one or more ATCF tracks and computes model diagnostics. For each
-track point, it converts gridded model data to cyclindrical
+track point, it converts gridded model data to cylindrical
coordinates centered at that point, writes it to a temporary NetCDF
-file, and passes it to Python scripts to compute the model
-diagnostics.
+file, and passes it to Python scripts to compute model diagnostics.

* :code:`tmp_met_nc_{PID}`: Cylindrical coordinate model data is
written to this temporary NetCDF file for each track point
and passed to Python scripts to compute diagnostics. If requested,
-the temporary NetCDF files for each track point are combined into
-a single output NetCDF cylindrical coordinates file for each track.
+these temporary NetCDF files for each track point are combined into
+a single NetCDF cylindrical coordinates output file for each track.
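A nearest-neighbor sketch of the range-azimuth transform described above (the uniform grid spacing, azimuth convention, and sampling method are simplifying assumptions):

```python
import math

def to_cylindrical(grid, center_x, center_y, n_range, n_azimuth, dr):
    # Sample a 2-D field onto a range-azimuth grid centered on the
    # track point: radius ir*dr in grid units, azimuth stepped evenly
    # through 360 degrees, nearest-neighbor sampling.
    out = []
    for ir in range(n_range):
        ring = []
        for ia in range(n_azimuth):
            theta = math.radians(ia * 360.0 / n_azimuth)
            x = center_x + ir * dr * math.sin(theta)
            y = center_y + ir * dr * math.cos(theta)
            ring.append(grid[int(round(y))][int(round(x))])
        out.append(ring)
    return out
```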
2 changes: 1 addition & 1 deletion docs/Users_Guide/appendixF.rst
@@ -56,7 +56,7 @@ If a user attempts to invoke Python embedding with a version of MET that was not
Controlling Which Python MET Uses When Running
==============================================

-When MET is compiled with Python embedding support, MET uses the Python executable in that Python installation by default when Python embedding is used. However, for users of highly configurable Python environments, the Python instance set at compilation time may not be sufficient. Users may want to use an alternate Python installation if they need additional packages not available in the Python installation used when compiling MET. In MET versions 9.0+, users have the ability to use a different Python executable when running MET than the version used when compiling MET by setting the environment variable **MET_PYTHON_EXE**.
+When MET is compiled with Python embedding support, MET uses the Python executable in that Python installation by default when Python embedding is used. However, for users of highly configurable Python environments, the Python instance set at compilation time may not be sufficient. Users may want to use an alternate Python installation if they need additional packages not available in the Python installation used when compiling MET. In MET versions 9.0+, users have the ability to use a different Python executable when running MET than the version used when compiling MET by setting the environment variable **MET_PYTHON_EXE**. Whenever **MET_PYTHON_EXE** is set, MET writes a temporary file, as described in :numref:`Contributor's Guide Section %s <tmp_files_python_embedding>`.

If a user's Python script requires packages that are not available in the Python installation used when compiling the MET software, they will encounter a runtime error when using MET. In this instance, the user will need to change the Python MET is using to a different installation with the required packages for their script. It is the responsibility of the user to manage this Python installation, and one popular approach is to use a custom Anaconda (Conda) Python environment. Once the Python installation meeting the user's requirements is available, the user can force MET to use it by setting the **MET_PYTHON_EXE** environment variable to the full path of the Python executable in that installation. For example:

4 changes: 2 additions & 2 deletions docs/Users_Guide/config_options.rst
@@ -548,8 +548,8 @@ Some tools override the temporary directory by the command line argument
tmp_dir = "/tmp";
-A description of the use of temporary files in MET can be found in the
-Contributor's Guide section :numref:`tmp_file_use`.
+A description of the use of temporary files in MET can be found in
+:numref:`Contributor's Guide Section %s <tmp_file_use>`.

message_type_group_map
^^^^^^^^^^^^^^^^^^^^^^
2 changes: 1 addition & 1 deletion docs/Users_Guide/point-stat.rst
@@ -197,7 +197,7 @@ For continuous fields (e.g., temperature), it is possible to estimate confidence

For the measures relating the two fields (i.e., mean error, correlation and standard deviation of the errors), confidence intervals are based on either the joint distributions of the two fields (e.g., with correlation) or on a function of the two fields. For the correlation, the underlying assumption is that the two fields follow a bivariate normal distribution. In the case of the mean error and the standard deviation of the mean error, the assumption is that the errors are normally distributed, which for continuous variables, is usually a reasonable assumption, even for the standard deviation of the errors.

-Bootstrap confidence intervals for any verification statistic are available in MET. Bootstrapping is a nonparametric statistical method for estimating parameters and uncertainty information. The idea is to obtain a sample of the verification statistic(s) of interest (e.g., bias, ETS, etc.) so that inferences can be made from this sample. The assumption is that the original sample of matched forecast-observation pairs is representative of the population. Several replicated samples are taken with replacement from this set of forecast-observation pairs of variables (e.g., precipitation, temperature, etc.), and the statistic(s) are calculated for each replicate. That is, given a set of n forecast-observation pairs, we draw values at random from these pairs, allowing the same pair to be drawn more than once, and the statistic(s) is (are) calculated for each replicated sample. This yields a sample of the statistic(s) based solely on the data without making any assumptions about the underlying distribution of the sample. It should be noted, however, that if the observed sample of matched pairs is dependent, then this dependence should be taken into account somehow. Currently, the confidence interval methods in MET do not take into account dependence, but future releases will support a robust method allowing for dependence in the original sample. More detailed information about the bootstrap algorithm is found in the :numref:`Appendix D, Section %s. <appendixD>`
+Bootstrap confidence intervals for any verification statistic are available in MET. Bootstrapping is a nonparametric statistical method for estimating parameters and uncertainty information. The idea is to obtain a sample of the verification statistic(s) of interest (e.g., bias, ETS, etc.) so that inferences can be made from this sample. The assumption is that the original sample of matched forecast-observation pairs is representative of the population. Several replicated samples are taken with replacement from this set of forecast-observation pairs of variables (e.g., precipitation, temperature, etc.), and the statistic(s) are calculated for each replicate. That is, given a set of n forecast-observation pairs, we draw values at random from these pairs, allowing the same pair to be drawn more than once, and the statistic(s) is (are) calculated for each replicated sample. This yields a sample of the statistic(s) based solely on the data without making any assumptions about the underlying distribution of the sample. It should be noted, however, that if the observed sample of matched pairs is dependent, then this dependence should be taken into account somehow. Currently, the confidence interval methods in MET do not take into account dependence, but future releases will support a robust method allowing for dependence in the original sample. More detailed information about the bootstrap algorithm is found in the :numref:`Appendix D, Section %s <appendixD>`. Note that MET writes temporary files whenever bootstrap confidence intervals are computed, as described in :numref:`Contributor's Guide Section %s <tmp_files_bootstrap>`.

Confidence intervals can be calculated from the sample of verification statistics obtained through the bootstrap algorithm. The most intuitive method is to simply take the appropriate quantiles of the sample of statistic(s). For example, if one wants a 95% CI, then one would take the 2.5 and 97.5 percentiles of the resulting sample. This method is called the percentile method, and has some nice properties. However, if the original sample is biased and/or has non-constant variance, then it is well known that this interval is too optimistic. The most robust, accurate, and well-behaved way to obtain accurate CIs from bootstrapping is to use the bias corrected and adjusted percentile method (or BCa). If there is no bias, and the variance is constant, then this method will yield the usual percentile interval. The only drawback to the approach is that it is computationally intensive. Therefore, both the percentile and BCa methods are available in MET, with the considerably more efficient percentile method being the default.
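The percentile method described above can be sketched as follows. This is a simplified illustration, not the MET implementation (which also supports BCa):

```python
import random

def percentile_bootstrap_ci(pairs, stat, n_rep=1000, alpha=0.05, seed=0):
    # Resample the matched forecast-observation pairs with replacement,
    # recompute the statistic for each replicate, and take the alpha/2
    # and 1-alpha/2 quantiles of the replicates as confidence bounds.
    rng = random.Random(seed)
    reps = sorted(
        stat([rng.choice(pairs) for _ in range(len(pairs))])
        for _ in range(n_rep)
    )
    lo = reps[int(n_rep * alpha / 2)]
    hi = reps[int(n_rep * (1 - alpha / 2)) - 1]
    return lo, hi
```

For example, with :code:`stat` computing the mean error of (forecast, observation) pairs, the returned bounds form a 95% percentile interval by default.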

3 changes: 2 additions & 1 deletion docs/Users_Guide/reformat_point.rst
@@ -108,6 +108,7 @@ ____________________
version = "VN.N";
The configuration options listed above are common to many MET tools and are described in :numref:`config_options`.
+The use of temporary files in PB2NC is described in :numref:`Contributor's Guide Section %s <tmp_files_pb2nc>`.

_____________________

@@ -1082,7 +1083,7 @@ Optional arguments for point2grid

Only 4 interpolation methods are applied to the field variables: MIN/MAX/MEDIAN/UW_MEAN. The GAUSSIAN method is applied to the probability variable only. Unlike regrid_data_plane, with the MAXGAUSS method the MAX method is applied to the field variable and the GAUSSIAN method to the probability variable. If the probability variable is not requested, the MAXGAUSS method is the same as the MAX method.

-For the GOES-16 and GOES-17 data, the computing lat/long is time consuming. So the computed coordinate (lat/long) is saved into the NetCDF file to the environment variable MET_TMP_DIR or */tmp* if MET_TMP_DIR is not defined. The computing lat/long step can be skipped if the coordinate file is given through the environment variable MET_GEOSTATIONARY_DATA. The grid mapping to the target grid is saved to MET_TMP_DIR to save the execution time. Once this file is created, the MET_GEOSTATIONARY_DATA is ignored. The grid mapping file should be deleted manually in order to apply a new MET_GEOSTATIONARY_DATA environment variable or to re-generate the grid mapping file. An example of call point2grid to process GOES-16 AOD data is shown below:
+For GOES-16 and GOES-17 data, computing the lat/long coordinates is time consuming. The computed coordinates (lat/long) are saved to a temporary NetCDF file, as described in :numref:`Contributor's Guide Section %s <tmp_files_point2grid>`. The lat/long computation step can be skipped if the coordinate file is given through the environment variable MET_GEOSTATIONARY_DATA. The grid mapping to the target grid is saved to MET_TMP_DIR to save execution time. Once this file is created, MET_GEOSTATIONARY_DATA is ignored. The grid mapping file should be deleted manually in order to apply a new MET_GEOSTATIONARY_DATA environment variable or to re-generate the grid mapping file. An example of calling point2grid to process GOES-16 AOD data is shown below:

.. code-block:: none
2 changes: 1 addition & 1 deletion docs/Users_Guide/stat-analysis.rst
@@ -324,7 +324,7 @@ The configuration file for the Stat-Analysis tool is optional. Users may find it

Most of the user-specified parameters listed in the Stat-Analysis configuration file are used to filter the ASCII statistical output from the MET statistics tools down to a desired subset of lines over which statistics are to be computed. Only output that meets all of the parameters specified in the Stat-Analysis configuration file will be retained.

-The Stat-Analysis tool actually performs a two step process when reading input data. First, it stores the filtering information defined top section of the configuration file. It applies that filtering criteria when reading the input STAT data and writes the filtered data out to a temporary file. Second, each job defined in the **jobs** entry reads data from that temporary file and performs the task defined for the job. After all jobs have run, the Stat-Analysis tool deletes the temporary file.
+The Stat-Analysis tool actually performs a two-step process when reading input data. First, it stores the filtering information defined in the top section of the configuration file. It applies those filtering criteria when reading the input STAT data and writes the filtered data out to a temporary file, as described in :numref:`Contributor's Guide Section %s <tmp_files_stat_analysis>`. Second, each job defined in the **jobs** entry reads data from that temporary file and performs the task defined for the job. After all jobs have run, the Stat-Analysis tool deletes the temporary file.

This two step process enables the Stat-Analysis tool to run more efficiently when many jobs are defined in the configuration file. If only operating on a small subset of the input data, the common filtering criteria can be applied once rather than re-applying it for each job. In general, filtering criteria common to all tasks defined in the **jobs** entry should be moved to the top section of the configuration file.
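The two-step process above can be sketched as follows (the helper names are illustrative assumptions, not the actual MET implementation):

```python
import os
import tempfile

def run_stat_jobs(stat_lines, common_filter, jobs):
    # Step 1: apply the common filtering criteria once and cache the
    # surviving STAT lines in a temporary file.
    fd, tmp_path = tempfile.mkstemp(prefix="tmp_stat_analysis_")
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.writelines(line + "\n" for line in stat_lines
                           if common_filter(line))
        # Step 2: run each job against the much smaller temporary file.
        results = []
        for job in jobs:
            with open(tmp_path) as tmp:
                results.append(job(tmp.read().splitlines()))
        return results
    finally:
        os.remove(tmp_path)  # temporary file deleted after all jobs run
```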

2 changes: 1 addition & 1 deletion docs/Users_Guide/tc-diag.rst
@@ -24,7 +24,7 @@ Originally developed for the Statistical Hurricane Intensity Prediction Scheme (

TC-Diag is run once for each initialization time to produce diagnostics for each user-specified combination of TC tracks and model fields. The user provides track data (such as one or more ATCF a-deck track files), along with track filtering criteria as needed, to select one or more tracks to be processed. The user also provides gridded model data from which diagnostics should be computed. Gridded data can be provided for multiple concurrent storms, multiple models, and/or multiple domains (i.e. parent and nest) in a single run.

-TC-Diag first determines the list of valid times that appear in any one of the tracks. For each valid time, it processes all track points for that time. For each track point, it reads the gridded model fields requested in the configuration file and transforms the gridded data to a range-azimuth cylindrical coordinates grid. For each domain, it writes the range-azimuth data to a temporary NetCDF file.
+TC-Diag first determines the list of valid times that appear in any one of the tracks. For each valid time, it processes all track points for that time. For each track point, it reads the gridded model fields requested in the configuration file and transforms the gridded data to a range-azimuth cylindrical coordinates grid. For each domain, it writes the range-azimuth data to a temporary NetCDF file, as described in :numref:`Contributor's Guide Section %s <tmp_files_tc_diag>`.

.. note:: The current version of the tool does not yet include the capabilities described in the next three paragraphs. These additional capabilities are planned to be added in the MET v12.0.0 release later in 2023.
