Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updates on 04_21 #2

Merged
merged 253 commits into from
Apr 15, 2021
Merged

Updates on 04_21 #2

merged 253 commits into from
Apr 15, 2021

Conversation

OlgaSergienko
Copy link
Owner

No description provided.

wrongkindofdoctor and others added 30 commits February 5, 2019 12:59
Merge in latest dev/gfdl updates
Merge in latest MOM6 dev/gfdl updates
Merge in latest MOM dev/gfdl updates
Merge in latest MOM6 updates
Merge in latest dev/gfdl updates
Merge in latest dev/gfdl updates
merge latest updates into dev/gfdl
Merge in latest dev/gfdl commits
Merge in latest dev/gfdl updates
Merge in latest dev/gfdl updates
merge in latest dev/gfdl changes
merge in latest dev/gfdl updates
Merge in latest MOM6 updates
Merge in latest dev/gfdl updates
Merge in latest round of dev/gfdl updates
merge in latest dev/gfdl updates
Merge in latest dev/gfdl updates
Merge in latest dev/gfdl updates
Merge in latest dev/gfdl updates
Merge in latest updates from dev/gfdl
Merge in latest dev/gfdl updates
Merge in updates to remap_all_state_vars
* TC4 integration into test suite

This patch renames the tc4 test to activate it in the test suite.  It
also modifies the Makefile to build the input field test scripts.

It also modifies the Python build scripts to be PEP8-conformant.

We temporarily disable tc4 in the restart tests, since they currently
fail.  This needs to be addressed before we can merge this into the main
branch.

The patch does not enable the necessary Python modules for running on
Travis, that will also be addressed later.

* Travis python support; tc4 Makefile

The custom TC4 Makefile has been added (oops), and the presumed Python
Ubuntu packages have been added for Travis.

* Verify ENABLE_THERMODYNAMICS is True before posting C_p diagnostic

* Make tc4 faster

* remove trailing whitespace

* add unit scaling

* fix restart fail for tc4 and some cleanup

* remove trailiny ws

* Enable tc4.restart test

* +Pass timeesteps to tracer diagnostics in [T]

  Pass timeesteps to the tracer diagnistics routines post_tracer_diagnostics and
postALE_tracer_diagnostics and to adiabatic in units of [T}.  All answers are
bitwise identical.

* +Rescaled tracer advective flux diagnostics

  Rescaled the internal units of the tracer advective flux diagnostics to units
of [conc H L2 T-1] for code simplicity and dimensional consistency testing.
Also corrected the units of some tracer fluxes as documented in comments and
commented out unused elements of the tracer_type.  All answers are bitwise
identical.

* +Pass timesteps to ALE_main in [T]

  Pass the timesteps to ALE_main, ALE_main_offline, and ALE_main_accelerated in
units of [T] for code simplicity and dimensional consistency testing.  This also
includes the rescaling of remapping-driven tracer tendencies.  All answers and
diagnostics are bitwise identical.

* +Pass timesteps to tracer column_physics in [T]

  Pass timesteps to the various tracer column_physics routines in [T] for
dimensional consistency testing.  Also added a new unit_scale_type argument to
these routines.  All answers are bitwise identical, but there are minor
interface changes to 13 subroutines.

* +Pass timesteps to applyTracerBoundaryFluxesInOut in [T]

  Pass timesteps to applyTracerBoundaryFluxesInOut in [T], and use units of
[T-1] for internal source and decay rates for the oil tracer and in fluxes of
CFCs.  Also modified extract_offline_main to return timesteps as real values
with units of [T].  Also there is a new unit_scale_type argument to
register_oil_tracer.   All answers in the MOM6_examples test cases and
regression tests are bitwise identical.

* Simplified expressions in MOM_PointAccel

  Simplified expressions inside of MOM_PointAccel, taking into account that all
velocities use the same units of [L T-1].  All answers are bitwise identical.

* Corrected dimensional epsilons in downscaling

  Added distinct negligible volumes, face areas, horizonal areas and lengths
with proper dimensional rescaling in the downsample field routines.  With these
changes, downscaled diagnostics should now pass the dimensional rescaling tests,
whereas previously there would have been a problem when the numbers used to
represent lengths are smaller than about 1e-8 times their MKS values.  All
answers are bitwise identical without dimensional rescaling.

* Simplified expressions in MOM_offline_aux

  Simplified expressions in distribute_residual_uh_barotropic.  All answers are
bitwise identical.

* Revised wave_speed to return speed in [L T-1]

  Revised wave_speed to return the internal wave speed in units of [L T-1] and
to use mono_N2_depth in units of [Z] for code simplification and expanded
dimensional consistency testing.  Also revised the internal units of some
related diagnostics in calculate_diagnostic_fields.  All answers and diagnostics
are bitwise identical.

* Rescaled internal variables in wave_speed

  Rescale internal calculations in wave_speed and wave_speeds for greater
robustness and dimensional consistency testing.  All answers are bitwise
identical and pass dimensional scaling tests.

* +Changed the units of minimum_forcing_depth to [H]

  Changed the units of minimum_forcing_depth passed to applyBoundaryFluxesInOut
and applyTracerBoundaryFluxesInOut to [H].  All answers are bitwise identical.

* Correction of documented units in comments

  Corrected some units in comments and eliminated some unused variables.
All answers are bitwise identical.

* Adiabatic clock ID bugfix

This patch fixes an initialization bug of the diabatic timer, which was
being used to measure adiabatic time but was never initialized if the
experiment was configured as adiabatic.

We fix this by introducing a separate timer for the adiabatic solver.
Although we could have reused the diabatic timer, the addition of a new
variable should not add any overhead on modern compilers.

* Corrected an OMP declaration

  Added a variable to an OMP declaration.  All answers are bitwise identical,
and a recently added compile-time error with openMP was fixed.

* Update MOM.F90

Fixed Alistair's embarrassing error.

* Dimensional rescaling in MOM_open_boundary.F90

  Added rescaling for dimensional consistency testing in MOM_open_boundary.F90,
including splitting variables with different units that had previously shared
the same variable and adding more extensive documentation of variables.  Also
changed the dimensions of the timesteps passed to radiation_open_bdry_conds and
update_segment_tracer_reservoirs to [T] and added vertical_grid_type and
unit_scale_type arguments to open_boundary_init and open_boundary_test_extern_h.
All answers are bitwise identical, although some probably bugs have been noted
in comments and there are new or altered arguments to several routines.

* (*)Fixed invariance bugs in MOM_open_boundary.F90

  Corrected dimensional consistency bugs in update_segment_tracer_reservoirs and
horizontal indexing and related bugs in gradient_at_q_points with oblique_grad
OBCs.  These will both change answers in test cases that use some open boundary
condition options, but not in any of the MOM6-examples test cases.
* TC4 integration into test suite

This patch renames the tc4 test to activate it in the test suite.  It
also modifies the Makefile to build the input field test scripts.

It also modifies the Python build scripts to be PEP8-conformant.

We temporarily disable tc4 in the restart tests, since they currently
fail.  This needs to be addressed before we can merge this into the main
branch.

The patch does not enable the necessary Python modules for running on
Travis, that will also be addressed later.

* Travis python support; tc4 Makefile

The custom TC4 Makefile has been added (oops), and the presumed Python
Ubuntu packages have been added for Travis.

* Verify ENABLE_THERMODYNAMICS is True before posting C_p diagnostic

* Make tc4 faster

* remove trailing whitespace

* add unit scaling

* fix restart fail for tc4 and some cleanup

* remove trailiny ws

* Enable tc4.restart test

* +Pass timeesteps to tracer diagnostics in [T]

  Pass timeesteps to the tracer diagnistics routines post_tracer_diagnostics and
postALE_tracer_diagnostics and to adiabatic in units of [T}.  All answers are
bitwise identical.

* +Rescaled tracer advective flux diagnostics

  Rescaled the internal units of the tracer advective flux diagnostics to units
of [conc H L2 T-1] for code simplicity and dimensional consistency testing.
Also corrected the units of some tracer fluxes as documented in comments and
commented out unused elements of the tracer_type.  All answers are bitwise
identical.

* +Pass timesteps to ALE_main in [T]

  Pass the timesteps to ALE_main, ALE_main_offline, and ALE_main_accelerated in
units of [T] for code simplicity and dimensional consistency testing.  This also
includes the rescaling of remapping-driven tracer tendencies.  All answers and
diagnostics are bitwise identical.

* +Pass timesteps to tracer column_physics in [T]

  Pass timesteps to the various tracer column_physics routines in [T] for
dimensional consistency testing.  Also added a new unit_scale_type argument to
these routines.  All answers are bitwise identical, but there are minor
interface changes to 13 subroutines.

* +Pass timesteps to applyTracerBoundaryFluxesInOut in [T]

  Pass timesteps to applyTracerBoundaryFluxesInOut in [T], and use units of
[T-1] for internal source and decay rates for the oil tracer and in fluxes of
CFCs.  Also modified extract_offline_main to return timesteps as real values
with units of [T].  Also there is a new unit_scale_type argument to
register_oil_tracer.   All answers in the MOM6_examples test cases and
regression tests are bitwise identical.

* Simplified expressions in MOM_PointAccel

  Simplified expressions inside of MOM_PointAccel, taking into account that all
velocities use the same units of [L T-1].  All answers are bitwise identical.

* Corrected dimensional epsilons in downscaling

  Added distinct negligible volumes, face areas, horizonal areas and lengths
with proper dimensional rescaling in the downsample field routines.  With these
changes, downscaled diagnostics should now pass the dimensional rescaling tests,
whereas previously there would have been a problem when the numbers used to
represent lengths are smaller than about 1e-8 times their MKS values.  All
answers are bitwise identical without dimensional rescaling.

* Simplified expressions in MOM_offline_aux

  Simplified expressions in distribute_residual_uh_barotropic.  All answers are
bitwise identical.

* Revised wave_speed to return speed in [L T-1]

  Revised wave_speed to return the internal wave speed in units of [L T-1] and
to use mono_N2_depth in units of [Z] for code simplification and expanded
dimensional consistency testing.  Also revised the internal units of some
related diagnostics in calculate_diagnostic_fields.  All answers and diagnostics
are bitwise identical.

* Rescaled internal variables in wave_speed

  Rescale internal calculations in wave_speed and wave_speeds for greater
robustness and dimensional consistency testing.  All answers are bitwise
identical and pass dimensional scaling tests.

* +Changed the units of minimum_forcing_depth to [H]

  Changed the units of minimum_forcing_depth passed to applyBoundaryFluxesInOut
and applyTracerBoundaryFluxesInOut to [H].  All answers are bitwise identical.

* Correction of documented units in comments

  Corrected some units in comments and eliminated some unused variables.
All answers are bitwise identical.

* Adiabatic clock ID bugfix

This patch fixes an initialization bug of the diabatic timer, which was
being used to measure adiabatic time but was never initialized if the
experiment was configured as adiabatic.

We fix this by introducing a separate timer for the adiabatic solver.
Although we could have reused the diabatic timer, the addition of a new
variable should not add any overhead on modern compilers.

* Corrected an OMP declaration

  Added a variable to an OMP declaration.  All answers are bitwise identical,
and a recently added compile-time error with openMP was fixed.

* Update MOM.F90

Fixed Alistair's embarrassing error.

* Dimensional rescaling in MOM_open_boundary.F90

  Added rescaling for dimensional consistency testing in MOM_open_boundary.F90,
including splitting variables with different units that had previously shared
the same variable and adding more extensive documentation of variables.  Also
changed the dimensions of the timesteps passed to radiation_open_bdry_conds and
update_segment_tracer_reservoirs to [T] and added vertical_grid_type and
unit_scale_type arguments to open_boundary_init and open_boundary_test_extern_h.
All answers are bitwise identical, although some probably bugs have been noted
in comments and there are new or altered arguments to several routines.

* (*)Fixed invariance bugs in MOM_open_boundary.F90

  Corrected dimensional consistency bugs in update_segment_tracer_reservoirs and
horizontal indexing and related bugs in gradient_at_q_points with oblique_grad
OBCs.  These will both change answers in test cases that use some open boundary
condition options, but not in any of the MOM6-examples test cases.
Hallberg-NOAA and others added 29 commits March 31, 2021 18:28
  Added additional domain routine interfaces that are needed by SIS2, including
the new function same_domain, which tests whether two domains use the same
layout and conforming computational domain sizes, and a new 4d-array variant of
redistribute_array because SIS2 uses thickness categories as a 4th dimension.
All answers are bitwise identical.
Remove inappropriate timelevel arguments
- previous code had averaging instead of summation for
  SPP (x:sum,y:point,z:point) diagnostics
- corrects an issue where these diagnostics were incorrect
  by approximately a factor of 2.
- Orginially found when analyzing the depth-integrated
  temperature advection diagnostic (T_ady_2d)
MOM_hor_visc: Variables moved to stack
This patch restructures the CorAdCalc function so that the loops are
more easily vectorized on a broader range of systems. The number of
memory access has also been slightly reduced.

We observed a 1.75x speedup on a modern consumer AMD processor (Ryzen 5
2600) and a 1.24x speedup on Gaea's Intel Xeons (E5-2697 v4).
Description

There are two major changes:

- An if-block testing for `Area_q` was removed, and the `h_neglect *
  Area_q` term was replaced with a new `vol_neglect` term.

  This term is intended to prevent division by zero when the hArea_q is
  zero.  Otherwise, it is meant to be below roundoff and have no impact
  on the calculation.

  Previously, a zero value of Area_q would force a division by zero.
  Using vol_neglect ensures that the denominator is always nonzero.

  The value is set to use `H_subroundoff` times an area of 0.1 mm2,
  suggested by Robert Hallberg as a hypothetical Kolmogorov scale.
  Numerical results are intended to be independent of this choice.

- Two separated loops associated with the bounded Coriolis term were
  combined into a single loop, which reduced both the number of internal
  if-blocks and avoided redundant memory load/stores.

Other if-blocks inside of do-loops were moved outside of the loops.

I can provide two potential explanations for the difference in Intel and
AMD performance:

* Masking instructions have a lower latency on Intel CPUs, which permit
  limited vectorization of if-blocks. Similar vectorization is not
  possible on AMD CPUs. So Intel is less likely to benefit from if-block
  re-ordering.

* Our Intel nodes on Gaea have a lower RAM bandwidth, and see a smaller
  benefit from vectorization, which must required greater bandwidth.
  This speedup may be greater on a more modern Intel platform.

Although the code has been vectorized on both Intel and AMD platforms,
there are still many memory accesses per operation, which is limiting
performance.

The changes below are not expected to change any answers, and none were
detected. But since we are changing a core component, I'd suggest
reviewing this carefully.

Sample timings are provided below.

Runtime measurements
--------------------

AMD Before:

(Ocean Coriolis & mom advection)      1.091571
(Ocean Coriolis & mom advection)      1.086183
(Ocean Coriolis & mom advection)      1.091197
(Ocean Coriolis & mom advection)      1.087709
(Ocean Coriolis & mom advection)      1.086990

AMD After:

(Ocean Coriolis & mom advection)      0.619346
(Ocean Coriolis & mom advection)      0.624106
(Ocean Coriolis & mom advection)      0.625438
(Ocean Coriolis & mom advection)      0.630169
(Ocean Coriolis & mom advection)      0.621736

----

Intel Before:

(Ocean Coriolis & mom advection)      0.981367
(Ocean Coriolis & mom advection)      0.982316
(Ocean Coriolis & mom advection)      0.986633
(Ocean Coriolis & mom advection)      0.981260
(Ocean Coriolis & mom advection)      0.982810

Intel After:

(Ocean Coriolis & mom advection)      0.788747
(Ocean Coriolis & mom advection)      0.797684
(Ocean Coriolis & mom advection)      0.786874
(Ocean Coriolis & mom advection)      0.792120
(Ocean Coriolis & mom advection)      0.795373
…03-26

dev/gfdl main candidate 2021-03-26
- Fixes for SSP (x:sum;y:sum,z:point) and
  PSP (x:point,y:sum,z:point) diagnostics
- Removed unused `total_weight` arrays in these cases
  Corrected comments describing the various CT_copy_data routines, following
suggestions in a review by Keith Lindsay.  All answers are bitwise identical.
+Add infrastructure interfaces needed by SIS2
Fixed downsampling for x:sum y:point z:point diags
Coriolis: Improved coradcalc vectorization
  Added support for new IO capabilities that are needed by SIS2 to use the MOM6
framework and infrastructure code, but should also be useful within MOM6
itself.  These new capabilities include writing global attributes to files,
using create_file named axes that are not derived from a MOM6 grid type, and new
options and elements in the vardesc type to support a wider array of axes and to
provide the position of the grid staggering via an integer position variable
instead of the short character strings that had been used.

  As a part of this commit, there are the new opaques type axis_info and
attribute_info, and the new routines set_axis_info, delete_axis_info,
set_attribute_info and delete_attribute_info to facilitate these new
capabilities, as well as the publicly visible function position_from_horgrid to
translate the vardesc%hor_grid character strings into the integer position flag
used elsewhere in the MOM6 and FMS codes.  Within the MOM_io_infra, there is a
new variant of the overloaded interface write_meta to handle writing global
attributes. There are also two new optional arguments to create_file and
reopen_file, and two new optional arguments to var_desc, modify_vardesc, and
query_vardesc.  All answers and output are bitwise identical.
  Restructured the code slightly so that the output files that are generated
when input_filename = 'F' is exactly the same as if it is 'n' if there are no
restart files in the restart_input_dir, or as if it is 'r' if the restart files
are there.  Previously, the solutions with 'F' worked this way, but no
ocean_geometry.nc or Vertical_grid.nc files were written when WRITE_GEOM=1,
regardless of the presence or absence of the restart files, and the
MOM_parameter_doc.all files differed slightly between the 'n' and 'F' or 'r'
cases.  As a part of these changes, the determination of whether this is a new
run is moved earlier in the algorithm, and now sits outside of
MOM_initialize_state.  All solutions are bitwise identical, but there are
changes in the position of the PARALLEL_RESTARTFILES and REFERENCE_HEIGHT
entries in most MOM_parameter_doc.all files.
+Enhanced support for novel axes in MOM_io
  Change the types returned from the 5 rotated_field_chksum from integer to
integer(kind=int64), so that the full 64-byte checksums are returned.  Without
this change, the checksums that are written to MOM6 restart files or interpreted
from them are truncated to the latter half of their length.  This changes the
checksums that are written to the restart files, but both before and after this
change the values that are written are the same as those that are generated
after reading the restart with the same version of the code.  The code can run
across this change by setting RESTART_CHECKSUMS_REQUIRED = False for the run
segment where the transition occurs.  The solutions themselves are bitwise
identical.
  Changed the declarations of the vardesc and fields arrays to allocatable in
write_ocean_geometry_files, primarily to get one of the TC test cases to run
properly with the gcc compiler by shifting the memory for these arrays from
stack to heap.  The reason why this change works is not clear.  Some comments
describing these variables were also added.  All answers are bitwise identical.
+More consistent treatment of input_filename = 'F'
(*)Write full checksums to restarts
@OlgaSergienko OlgaSergienko merged commit 7eb92f6 into OlgaSergienko:dev/gfdl Apr 15, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.