Build and RT/UT documentation for user's guide #593

295 changes: 275 additions & 20 deletions doc/UsersGuide/source/BuildingAndRunning.rst
Expand Up @@ -57,13 +57,12 @@ set up specifically for issues related to build dependencies.
Downloading the Weather Model Code
==================================

To clone the ufs-weather-model repository for this v2.0.0 release, execute the following commands:
To clone the ufs-weather-model repository, execute the following commands:

.. code-block:: console

git clone https://github.com/ufs-community/ufs-weather-model.git ufs-weather-model
cd ufs-weather-model
git checkout ufs-v2.0.0
git submodule update --init --recursive

Compiling the model will take place within the `ufs-weather-model` directory you just created.
Expand All @@ -86,9 +85,8 @@ that, these environment variables need to be set, as shown in :numref:`Table %s
+------------------+-----------------------------------------------------------------+
| **NCEP Library** | **Environment Variables** |
+==================+=================================================================+
| nemsio | export NEMSIO_INC=<path_to_nemsio_include_dir> |
+------------------+-----------------------------------------------------------------+
| | export NEMSIO_LIB=<path_to_nemsio_lib_dir>/libnemsio<version>.a |
| nemsio || export NEMSIO_INC=<path_to_nemsio_include_dir> |
| || export NEMSIO_LIB=<path_to_nemsio_lib_dir>/libnemsio<version>.a|
+------------------+-----------------------------------------------------------------+
| bacio | export BACIO_LIB4=<path_to_bacio_lib_dir>/libbacio<version>.a |
+------------------+-----------------------------------------------------------------+
Expand Down Expand Up @@ -116,15 +114,13 @@ that, these environment variables need to be set, as shown in :numref:`Table %s
The following are a few different ways to set the required environment variables to the correct values.
If you are running on one of the `pre-configured platforms
<https://github.com/ufs-community/ufs/wiki/Supported-Platforms-and-Compilers>`_, you can set them using
modulefiles. Modulefiles for all supported platforms are located in ``modulefiles/<platform>/fv3``. To
modulefiles. Modulefiles for all supported platforms are located in ``modulefiles/ufs_<platform>.<compiler>``. To
load the modules from the `ufs-weather-model` directory on hera:

.. code-block:: console

cd modulefiles/hera.intel
module use $(pwd)
module load fv3
cd ../..
module use modulefiles
module load ufs_hera.intel

Note that loading this module file will also set the CMake environment variables shown in
:numref:`Table %s <CMakeEnv>`.
Expand Down Expand Up @@ -154,19 +150,56 @@ to build the prerequisite libraries, there is a script in the ``NCEPLIBS-ufs-v2.

Of course, you can also set the values of these variables yourself if you know where the paths are on your system.

--------------------------------------------
Setting the CCPP_SUITES environment variable
--------------------------------------------
-------------------------------------------------------------
Setting the CMAKE_FLAGS and CCPP_SUITES environment variables
-------------------------------------------------------------

You need to use the ``CMAKE_FLAGS`` environment variable to specify which application to build.
In order to have one or more CCPP physics suites available at runtime, you also need to select those suites at
build time by setting the ``CCPP_SUITES`` environment variable. Multiple suites can be set as a
comma-separated list. The following examples are for the bash shell.

For the ufs-weather-model ATM app (standalone ATM):

.. code-block:: console

export CMAKE_FLAGS="-DAPP=ATM"
export CCPP_SUITES="FV3_GFS_v16"

In order to have one or more CCPP physics suites available at runtime, you need to select those suites at
build time by setting the ``CCPP_SUITES`` environment variable. Multiple suites can be set, as shown below
in an example for the bash shell:
For the ufs-weather-model ATM app (standalone ATM) in 32-bit:

.. code-block:: console

export CCPP_SUITES="FV3_GFS_v15p2,FV3_GFS_v16beta"
export CMAKE_FLAGS="-DAPP=ATM -D32BIT=ON"
export CCPP_SUITES="FV3_GFS_v16"

For the ufs-weather-model ATMW app (standalone ATM with wave):

.. code-block:: console

export CMAKE_FLAGS="-DAPP=ATMW"
export CCPP_SUITES="FV3_GFS_v16"

For the ufs-weather-model S2S app (atm/ice/ocean):

.. code-block:: console

export CMAKE_FLAGS="-DAPP=S2S"
export CCPP_SUITES="FV3_GFS_2017_coupled,FV3_GFS_2017_satmedmf_coupled,FV3_GFS_v15p2_coupled,FV3_GFS_v16_coupled,FV3_GFS_v16_couplednsst"

For the ufs-weather-model S2S app (atm/ice/ocean) with debugging flags turned on, with verbose build messages:

.. code-block:: console

export CMAKE_FLAGS="-DAPP=S2S -DDEBUG=ON"
export CCPP_SUITES="FV3_GFS_2017_coupled,FV3_GFS_2017_satmedmf_coupled,FV3_GFS_v15p2_coupled,FV3_GFS_v16_coupled,FV3_GFS_v16_couplednsst"

For the ufs-weather-model S2SW app (atm/ice/ocean/wave):

.. code-block:: console

If ``CCPP_SUITES`` is not set, the default is set to ``‘FV3_GFS_v15p2’`` in ``build.sh``.
export CMAKE_FLAGS="-DAPP=S2SW"
export CCPP_SUITES="FV3_GFS_2017_coupled,FV3_GFS_v15p2_coupled,FV3_GFS_v16_coupled,FV3_GFS_v16_coupled_noahmp"
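
The per-application settings above could be wrapped in a small shell function. The following is
a minimal sketch and not part of the repository: the function name ``set_ufs_build_env`` is
hypothetical, and the flag and suite values are copied from the examples above.

```shell
# Hypothetical helper: export the build settings for a given application name.
# The CMAKE_FLAGS and CCPP_SUITES values are copied from the examples above.
set_ufs_build_env() {
    case "$1" in
        ATM|ATMW)
            export CMAKE_FLAGS="-DAPP=$1"
            export CCPP_SUITES="FV3_GFS_v16"
            ;;
        S2S)
            export CMAKE_FLAGS="-DAPP=S2S"
            export CCPP_SUITES="FV3_GFS_2017_coupled,FV3_GFS_2017_satmedmf_coupled,FV3_GFS_v15p2_coupled,FV3_GFS_v16_coupled,FV3_GFS_v16_couplednsst"
            ;;
        *)
            # Any other name is rejected rather than guessed.
            echo "unknown application: $1" >&2
            return 1
            ;;
    esac
}

set_ufs_build_env ATMW
echo "CMAKE_FLAGS=${CMAKE_FLAGS}"
echo "CCPP_SUITES=${CCPP_SUITES}"
```

Keeping the application name and its suites in one place reduces the chance of building an application with a suite list intended for another.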

------------------
Building the model
Expand Down Expand Up @@ -195,5 +228,227 @@ set up specifically for issues related to the Weather Model.
=================
Running the model
=================
The `UFS Weather Model wiki <https://github.com/ufs-community/ufs-weather-model/wiki>`_ includes a simple
test case that illustrates how the model can be run.

.. _UsingRegressionTest:

--------------------------------
Using the regression test script
--------------------------------
The regression test script ``rt.sh`` in the tests/ directory can be
used to run a number of preconfigured test cases. It is the top-level script
that calls lower-level scripts to build, set up environments and run tests.
On `Tier-1 platforms <https://github.com/ufs-community/ufs-weather-model/wiki/Regression-Test-Policy-for-Weather-Model-Platforms-and-Compilers>`_, it can
be as simple as editing the ``rt.conf`` file and subsequently executing

.. code-block:: console

./rt.sh -l rt.conf

The following discussion is general, but users may not be able to execute the
script successfully as is unless they are on one of the Tier-1 platforms.

Each line in the pipe-separated values (PSV) file ``rt.conf`` either builds the
model or runs a test. The ``COMPILE`` line specifies the application to build (e.g.
``APP=S2S``), CCPP suite to use (e.g. ``SUITES=FV3_GFS_2017_coupled``), and
additional build options (e.g. ``DEBUG=Y``) as necessary. The ``RUN`` line
specifies the name of a test to run. The test name should match the name of one
of the test files in the tests/tests/ directory or, if the user is adding a new
test, the name of the new test file. The order of lines in ``rt.conf`` matters
since ``rt.sh`` processes them sequentially; a ``RUN`` line should be preceded
by a ``COMPILE`` line that builds the model used in the test. The following example
``rt.conf`` file builds the Subseasonal to Seasonal (S2S) model and then runs the
``cpld_control`` test:

.. code-block:: console

COMPILE | APP=S2S SUITES=FV3_GFS_2017_coupled | | fv3
RUN | cpld_control | | fv3

The third column of ``rt.conf`` relates to the platform; if left blank, the test
runs on all Tier-1 platforms. The fourth column deals with baseline creation (more
on this later) and ``fv3`` means the test will be included during baseline creation.
The ``rt.conf`` file includes a large number of tests. If the user wants to run
only a specific test, s/he can either comment out (using the ``#`` prefix) the
tests to be skipped, or create a new file, e.g. ``my_rt.conf``, then execute
``./rt.sh -l my_rt.conf``.
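
As an illustration of the file layout, the sketch below writes a two-line ``my_rt.conf`` using the
example lines shown earlier and then splits each line on the ``|`` separator. The function name
``list_rt_entries`` is hypothetical and only demonstrates how the columns are arranged; it does not
reproduce how ``rt.sh`` itself reads the file.

```shell
# Write a two-line rt.conf-style file using the example lines from the text.
cat > my_rt.conf <<'EOF'
COMPILE | APP=S2S SUITES=FV3_GFS_2017_coupled | | fv3
RUN | cpld_control | | fv3
EOF

# Hypothetical helper: split each line on "|" and report the first two columns.
# Empty lines and lines starting with "#" (commented-out entries) are skipped.
list_rt_entries() {
    while IFS='|' read -r action spec platform baseline; do
        case "$action" in
            \#*|"") continue ;;
        esac
        # Strip the padding spaces around each column.
        action=$(echo "$action" | xargs)
        spec=$(echo "$spec" | xargs)
        echo "$action: $spec"
    done < "$1"
}

list_rt_entries my_rt.conf
# The file can then be passed to the script: ./rt.sh -l my_rt.conf
```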

The regression test generates a number of log files. The summary log file
``RegressionTests_<machine>.<compiler>.log`` in the tests/ directory compares
the results of each test against the platform-specific baseline and
reports the outcome (hence, the 'regression' test). 'Missing file' means that
the expected files from the simulation were not found, which typically occurs
when the simulation did not run to completion; 'OK' means that the simulation
results are bit-for-bit identical to those of the baseline; 'NOT OK' means that
the results are not bit-for-bit identical; and 'Missing baseline' means that there
is no baseline data to compare against.
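
A quick way to scan a long summary log for problems is to count the failure outcomes. The sketch
below assumes that the outcome labels quoted above appear verbatim in the log lines; the exact log
layout is not described here, and the function name ``summarize_rt_log`` is hypothetical.

```shell
# Hypothetical helper: count the failure outcomes in a summary log.
# Assumption: the labels 'Missing file', 'Missing baseline' and 'NOT OK'
# appear verbatim in the log lines.
summarize_rt_log() {
    log_file="$1"
    for outcome in "Missing file" "Missing baseline" "NOT OK"; do
        # grep -c prints the number of matching lines (0 if there are none).
        count=$(grep -c -- "$outcome" "$log_file")
        printf '%s: %s\n' "$outcome" "$count"
    done
}
```

It would be called from the tests/ directory as, for example, ``summarize_rt_log RegressionTests_hera.intel.log``, where the file name follows the ``RegressionTests_<machine>.<compiler>.log`` pattern.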

More detailed log files are found in the tests/log_<machine>.<compiler>/ directory.
In particular, the run directory path, given as the value of the ``RUNDIR``
variable in the ``run_<test-name>`` file, is often useful. ``$RUNDIR`` is a
self-contained (i.e. sandboxed) directory with the executable file, initial
conditions, model configuration files, environment setup scripts and a batch job
submission script. The user can run the test by changing into ``$RUNDIR`` and
invoking the command ``sbatch job_card``. Note that ``$RUNDIR`` is automatically
deleted at the end of a successful regression test; specifying the ``-k`` option
retains the ``$RUNDIR``, e.g. ``./rt.sh -l rt.conf -k``.

Found inside the ``$RUNDIR`` directory are the model configuration files
``data_table``, ``diag_table``, ``ice_in``, ``input.nml``, ``model_configure``
and ``nems.configure``. They are generated by ``rt.sh`` from the template files
in the tests/parm/ directory. Specific values used to fill in the template files
depend on the test being run, and are set in two stages: default values are
specified in ``tests/default_vars.sh`` and the default values are overridden if
necessary by those specified in a test file ``tests/tests/<test-name>``. For
example, the variable ``DT_ATMOS``, which is substituted into the template file
``model_configure.IN`` to generate ``model_configure``, is initially assigned
1800 in the function ``export_fv3`` of the script ``default_vars.sh``, but the
test file ``tests/tests/control`` overrides it by reassigning 720 to the variable.
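
The two-stage assignment can be pictured with the following sketch. The function bodies are
stand-ins written for illustration, not the contents of ``default_vars.sh`` or
``tests/tests/control``; only the variable name ``DT_ATMOS`` and the values 1800 and 720 come from
the description above, and ``apply_control_overrides`` is a hypothetical name.

```shell
# Stage 1: default values (stand-in for export_fv3 in default_vars.sh).
export_fv3() {
    export DT_ATMOS=1800
}

# Stage 2: test-specific overrides (stand-in for tests/tests/control).
apply_control_overrides() {
    export DT_ATMOS=720
}

export_fv3
apply_control_overrides

# The value that would be substituted into the model_configure template.
echo "DT_ATMOS=${DT_ATMOS}"   # prints DT_ATMOS=720
```

Because the test file is applied after the defaults, the last assignment wins and no special merge logic is needed.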

Also found inside the ``$RUNDIR`` directory are the files ``fv3_run`` and
``job_card``, which are generated from the template files in the tests/fv3_conf/
directory. The latter is a platform-specific batch job submission script, while
the former prepares the initial conditions by copying relevant data from the
input data directory of a given platform to the ``$RUNDIR`` directory.
:numref:`Table %s <RTSubDirs>` summarizes the subdirectories discussed above.

.. _RTSubDirs:

.. table:: *Regression test subdirectories*

+-----------------+--------------------------------------------------------------------------------------+
| **Name** | **Description** |
+=================+======================================================================================+
| tests/ | Regression test root directory. Contains rt-related scripts and the summary log file |
+-----------------+--------------------------------------------------------------------------------------+
| tests/tests/ | Contains specific test files |
+-----------------+--------------------------------------------------------------------------------------+
| tests/parm/ | Contains templates for model configuration files |
+-----------------+--------------------------------------------------------------------------------------+
| tests/fv3_conf/ | Contains templates for setting up initial conditions and a batch job |
+-----------------+--------------------------------------------------------------------------------------+
| tests/log_*/ | Contains fine-grained log files |
+-----------------+--------------------------------------------------------------------------------------+

There are a number of command line options available to the ``rt.sh`` script.
The user can execute ``./rt.sh`` to see information on these options. A couple
of them are discussed here. When running a large number (tens or hundreds) of
tests, the ``-e`` option to use the ecFlow workflow manager can significantly
decrease the testing time by queuing the jobs according to dependencies and
running them concurrently. The ``-n`` option can be used to run a single test;
for example, ``./rt.sh -n cpld_control`` will build the S2S model and run the
``cpld_control`` test. The ``-c`` option is used to create a new baseline. New
baselines are needed when code changes lead to result changes, so that the results
deviate from existing baselines on a bit-for-bit basis.

When a developer needs to create a new test for his/her implementation, the
first step would be to identify a test in the tests/tests/ directory that can
be used as a basis and to examine the variables defined in the test file. As
mentioned above, some of the variables may be overrides for those defined in
``default_vars.sh``; others may be new variables that are needed specifically
for the test. Default variables and their values are defined in the ``export_fv3``
function of the ``default_vars.sh`` script for the ATM application, the ``export_cpl``
function for the S2S application and the ``export_datm`` function for the GODAS application.
Also, the names of template files for model configuration and initial conditions
can be identified via variables ``INPUT_NML``, ``NEMS_CONFIGURE`` and ``FV3_RUN``;
for example, by trying ``grep -n INPUT_NML *`` inside the tests/ and tests/tests/
directories.

.. _UsingUnitTest:

--------------------------
Using the unit test script
--------------------------
The unit test script ``utest`` in the tests/ directory can also be used to run
tests. Given the name of a test, ``utest`` carries out a suite of test cases.
Each test case addresses an aspect of the requirements new implementations
should satisfy, which are shown in :numref:`Table %s <ImplementationRequirement>`.
For the following discussions on utest, the user should note the distinction between
'test name' and 'test case': examples of test name are ``control``, ``cpld_control``
and ``regional_control``, which are all found in the tests/tests/ directory, whereas
test case refers to any one of ``thr``, ``mpi``, ``dcp``, ``rst``, ``bit`` and ``dbg``.

.. _ImplementationRequirement:

.. table:: *Implementation requirements*

+----------+------------------------------------------------------------------------+
| **Case** | **Description** |
+==========+========================================================================+
| thr | Varying the number of threads produces the same results |
+----------+------------------------------------------------------------------------+
| mpi | Varying the number of MPI tasks reproduces |
+----------+------------------------------------------------------------------------+
| dcp | Varying the decomposition (i.e. tile layout of FV3) reproduces |
+----------+------------------------------------------------------------------------+
| rst | Restarting reproduces |
+----------+------------------------------------------------------------------------+
| bit | Model can be compiled in double/single precision and run to completion |
+----------+------------------------------------------------------------------------+
| dbg | Model can be compiled and run to completion in debug mode |
+----------+------------------------------------------------------------------------+

The unit test uses the same testing framework used by the regression
test, and therefore it is recommended that the user first read the
:numref:`Section on regression test %s <UsingRegressionTest>`. All the files in
the subdirectories shown in :numref:`Table %s <RTSubDirs>` are relevant to the
unit test except that the ``utest`` script replaces ``rt.sh`` and the
``utest.bld`` file replaces ``rt.conf``. The tests/utests/ directory contains
utest-specific lower-level scripts used to set up run configurations.

On `Tier-1 platforms <https://github.com/ufs-community/ufs-weather-model/wiki/Regression-Test-Policy-for-Weather-Model-Platforms-and-Compilers>`_, tests can
be run by first modifying the PSV file ``utest.bld`` to specify the build options
and then invoking

.. code-block:: console

./utest -n <test-name>

For example, including in the ``utest.bld`` file the following line

.. code-block:: console

cpld_control | APP=S2S SUITES=FV3_GFS_2017_coupled

and then executing ``./utest -n cpld_control`` performs all six test cases
listed in :numref:`Table %s <ImplementationRequirement>` for the ``cpld_control``
test. At the end of the run, a log file ``UnitTests_<machine>.<compiler>.log``
is generated in the tests/ directory, which informs the user whether each test case
passed or failed. The user can choose to run a specific test case by invoking

.. code-block:: console

./utest -n <test-name> -c <test-case>

where ``<test-case>`` is one or
more comma-separated values selected from ``thr``, ``mpi``, ``dcp``, ``rst``,
``bit``, ``dbg``. For example, ``./utest -n cpld_control -c thr,rst`` runs the
``cpld_control`` test and checks the reproducibility of threading and restart.
The user can see different command line options available to ``utest`` by
executing ``./utest -h``; frequently used options are ``-e`` to use the ecFlow
workflow manager, and ``-k`` to keep the ``$RUNDIR``. In the following,
comparisons are made between the regression and unit tests on how they handle
different reproducibility tests.
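
Since a misspelled test case is easy to overlook, the value intended for ``-c`` could be checked
before the run is launched. The following is a minimal sketch and not part of ``utest``: the
function name ``check_test_cases`` is hypothetical, and the list of valid cases is taken from
:numref:`Table %s <ImplementationRequirement>`.

```shell
# Test cases listed in the implementation requirements table.
valid_cases="thr mpi dcp rst bit dbg"

# Hypothetical pre-check for a comma-separated value intended for "utest -c".
check_test_cases() {
    old_ifs=$IFS
    IFS=,
    # Split the argument on commas into the function's positional parameters.
    set -- $1
    IFS=$old_ifs
    for c in "$@"; do
        case " $valid_cases " in
            *" $c "*) ;;   # known test case
            *) echo "unknown test case: $c" >&2; return 1 ;;
        esac
    done
    return 0
}

# Example: only launch the run if every requested case is known.
if check_test_cases "thr,rst"; then
    echo "would run: ./utest -n cpld_control -c thr,rst"
fi
```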

As discussed in :numref:`Section %s <UsingRegressionTest>`, the variables and
values used to configure model parameters and to set up initial conditions in the
``$RUNDIR`` directory are set up by the regression test script ``rt.sh`` in two
stages: first, ``tests/default_vars.sh`` defines default values; then a specific
test file in the tests/tests/ subdirectory either overrides the default values or
creates new variables if required by the test. The regression test treats
the different test cases shown in :numref:`Table %s <ImplementationRequirement>`
as different tests. Therefore, each test case requires a test file in the tests/tests/
subdirectory; examples are ``control_2threads``, ``control_decomp``,
``control_restart`` and ``control_debug``, which are just variations of the ``control``
test to check various reproducibilities. There are two potential issues with this
approach. First, if several different variations of a given test were to be created
and included in the ``rt.conf`` file, there would be too many tests to run. Second, if
a new test is added by the user, s/he will also have to create these variations.
The idea behind the unit test is to automatically configure and run these variations,
or test cases, given a test file. For example, ``./utest -n control`` will run all
six test cases in :numref:`Table %s <ImplementationRequirement>` based on a single
``control`` test file. Similarly, if the user adds a new test ``new_test``, then
``./utest -n new_test`` will run all test cases. This is done by the unit test script
``utest`` by adding a third stage of variable overrides, and the related scripts can
be found in the tests/utests/ directory.