Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GFDL to main candidate branch (2022-04-04) #1563

Closed

Conversation

marshallward
Copy link
Collaborator

This patch includes the usual mix of new features, bugfixes, and code refactors.

New features include internal tides updates (both solver and new diagnostics), Laplacian diffusion of interface heights, Hybgen/HYCOM compatibility, and ice shelf dynamics.

MOM_input now supports longer line lengths, primarily to support complex OBC specifications.

Several minor bugfixes are also included in this PR, and are detailed below.

New parameters (based on regression suite; not necessarily complete):

  • THICKNESS_TOLERANCE
  • KH_ETA_CONST
  • KH_ETA_VEL_SCALE
  • ITD_LANDFAST
  • VELOCITY_REMAPPING_SCHEME
  • PARTIAL_CELL_VELOCITY_REMAP
  • REMAP_VEL_MASK_BBL_THICK

Features:

Bugfixes and Calculation Improvements:

Refactoring and Testing:

Contributors:

Hallberg-NOAA and others added 30 commits February 15, 2022 16:19
  Added two new modules, the MOM6-specific MOM_check_scaling.F90 and the generic
framework module MOM_scaling_check.F90, to assess the uniqueness of the unit
scaling factors for all of the variables used by MOM6.  If there are overlaps in
scaling factors for different units, this also identifies and suggests alternate
scaling factors with less overlaps.  This commit includes the introduction of
the new publicly visible routines check_scaling_factors(), scales_to_powers()
and check_MOM6_scaling_factors.  This new capability does not do anything for
sufficiently low levels of model verbosity, and it is silent if the scaling
factors are unique, or if less than 2 dimensions are being rescaled.  All
answers and output files are bitwise identical, but there can be additional
messages to stdout.
  Renamed the framework module MOM_scaling_check.F90 to MOM_unique_scales.F90 to
help differentiate it from MOM_check_scaling.F90, and renamed the subroutine
check_scaling_factors() as check_scaling_uniqueness().  Also added
_Dimensional_consistency.dox to describe the dimensional consistency testing.
This commit should address the issues raised in the review of MOM6 PR #49.  All
answers and output are bitwise identical.
  Corrected memory declarations in MOM_regridding.F90 in cases where the
vertical size of the input and output grids are not the same.  Although this is
not know to have caused any particular problems, these inconsistencies could
lead to segmentation faults in cases where the target grid (e.g., diagnostic
output) is larger than the input grid (e.g., the model's native grid).  In some
cases, certain grid generation options were only written to work with the same
size of input and output grids, and error handling has been added to these cases
to gracefully bring down the model if they are used with different grid sizes.
All answers are bitwise identical in the MOM6-examples test suite, but it is
conceivable that this could correct subtle (memory-related) issues in some
configurations.
 - default behavior returns field size using fms1 format.
…65)

* Eliminate GET_ALL_PARAMS in hor_visc_init

  Added do_not_log arguments to get_param calls in MOM_hor_visc.F90 that are
only used conditionally, and eliminated the unlogged GET_ALL_PARAMS runtime
parameter and get_all variable in hor_visc_init().  By design, all logging of
parameters after this commit is identical to before, even for variables that are
inactive and therefore should not be logged.  In several places, there were some
problems, mostly with the GME code, that have been noted in comments marked with
'###'.  Also cleaned up the code alignment and eliminated unneeded temporary
variables in a few places in hor_visc().  All solutions are bitwise identical,
and no output is changed.

* Move call to get get_KH outside k-loop

The call to get the 3-d GME diffusivity arrays and the subsequent
blocking halo update was moved outside of the k-loop.

* Increase loop range for calculation of GME fluxes

* Makes GME filter rotationally invariant

* Makes the GME filter rotationally invariant
* Adds a runtime param to allow the user to control how many
smoothing passes should be done.

* Rearranges the get_param calls related to Leith

The get_param calls for Leith were not in the correct location.
This commit fixes that.

* Adding halo updates

* Fixes do loops indices and adds diagnostics

* Adds option to save barotropic tension and strain;

* Fixes many i and j loops indices associated with GME to avoid halo
problems and unnecessary halo updates. With these changes, the
model is now conserving mass and tracers when USE_GME = True.

* Fixes issues related to the merging with dev/gfdl

* Fixes calculation of FrictWork_GME

The calculation now mimics the calculation of FrictWork and it
includes the energy diffusion term.

* Removes dependency of FrictWork_GME calculation on MEKE

* Adding sh_xy_sq and sh_xx_sq in the OMP directives

Co-authored-by: Robert Hallberg <[email protected]>
Subroutine adjustEtaToFitBathymetry had a hard-wired parameter
(hTolerance = 0.1) controlling the tolerance when adjusting the
thickness to fit the bathymetry. This patch adds an user-controlled
parameter (THICKNESS_TOLERANCE), which replaces hTolerance.
THICKNESS_TOLERANCE is only activated when ADJUST_THICKNESS=True.
  Actually calculate the mean temperature and salinity reported by
MOM_state_stats().  Due to an oversight, these means were always being reported
as 0.  This changes the output when the debugging flag DEBUG_CONSERVATION=True.
All answers are bitwise identical.
  Added the new function global_mass_int_EFP(), which is analogous to
global_mass_integral but returns its result in extended fixed point (EFP_type)
format and always uses reproducing sums, to facilitate layout-invariant
global integrals but with the potential for deferred global reductions so that
this last step can be combined for various global reductions for efficiency.
All answers are bitwise identical, but there is a new public interface.
  Use global_mass_integral for the debugging diagnostics of the tracer amounts
before and after diffusion in lateral_boundary_diffusion, and replaced a call to
write(*,*) with a call to MOM_mesg to actually write the message.  The
global_mass_integral uses reproducing sums, and is invariant to layout, while
MOM_mesg is preferable for output because it will allow us to more cleanly
control how output is handled and which processors do the writing.  All
solutions are bitwise identical, although some debugging output will change.
  Use reproducing sums for tabulating tracer stocks, and move the global sum for
the tracer stocks form write_energy into call_tracer_stocks.  This involves
changes to the type of an argument (from real to EFP_type) for two arguments to
the internal routine store_stocks.  Existing tracer stock packages will still
work, but to benefit from the reproducing sums, they will also have to change
their reported values from real to EFP_type.  This is demonstrated for two
packages (advection_test_tracer and ideal_age_example), where the stocks are now
found with calls to global_mass_int_EFP(), replacing the previous explicit
sums.  With this change, the reported stock values from these packages are
identical for different PE layouts and can be much more accurate than before,
but they are different from the previously reported values at roundoff (for
positive-definite tracers), but it could be larger for tracers with a near-zero
mean value.  All solutions are bitwise identical, but output changes.
  Modified the remaining tracer packages to use the reproducing stocks.  The
reported stock values from these packages will have changed slightly, but they
now reproduce across PE layouts.  All solutions are bitwise identical, but
output changes.
  Removed trailing white space on two lines.  All answers are bitwise identical.
  Added new tmp_scale optional arguments to global_area_mean, global_layer_mean,
global_area_mean_u, global_area_mean_v, global_area_integral, global_volume_mean
and global_mass_integral.  This argument allows the reproducing sums to work
with rescaled variables without undoing the scaling of the returned variable.
Also corrected the area_rescaling in global_area_mean_u and global_area_mean_v,
which were substantially changing answers when horizontal distances were being
rescaled (i.e., if L_RESCALE_POWER is not 0).  These routines had only been
added recently, so this change is only changing answers with HOMOGENIZE_FORCINGS
= True.
  Add optional tmp_scale arguments to the various homogenize_field routines and
use them in numerous calls in homogenize_forcing and homogenize_mech_forcing, so
that the answers will pass the dimensional rescaling tests for large rescaling
factors when HOMOGENIZE_FORCINGS = True.  Among other changes is the addition of
a verticalGrid_type argument to homogenize_forcing.  All answers are bitwise
identical in the existing MOM6-examples test suite, but it will change answers
in other cases with dimensional rescaling active and homogenized forcing.
This patch includes several minor changes to the MOM_file_parser and
supporting modules in order to accommodate stronger unit testing.

It includes the following API changes:

- Removal of `static_value` from `get_param`

- Redefined `link_parameter` and `parameter_block` as private

- New functions: `all_across_PEs()`, `any_across_PEs()`

`static_value` was not used in any known experiments (outside of
internal GFDL testing), and the two derived types describe internal
operations within `MOM_file_parser`, so we do not expect any disruptions
from these changes.

A detailed summary of the changes are listed below.

- `assert()` is now used to detect same files with different IO units.

  Detection of reopenend files of the same name but different IO unit
  has been changed from `MOM_error(FATAL, ...)` to `assert()`, to
  reflect that this should be a logically impossible result.

- Bugfix: Reopened files are now reported to all PEs.

  If an open file is re-opened, then only the root PE will detect this
  and will `return` immediately.  However, the others will proceed into
  `populate_param_data` and will get stuck in a broadcast waiting for
  root.

  We fix this by communicating the reopened state to all PEs and allow
  all ranks to return before re-processing the data.

  Note that this could also be resolved by allowing all ranks to track
  IO unit numbers, but for now we do not attempt to change this
  behavior.

- `newunit=` used to generate parameter file IO unit

  The parameter IO unit is now generated by `newunit=` rather than an
  explicit search for an unused IO unit.  Note that this is a Fortran
  2008 feature.

  Testing around available IO units has also been removed.

- Removal of generic IO error handling

  Generic "IO error" tests, and corresponding `err=` arguments, have
  been removed in most cases.  We now rely on the Fortran runtime to
  provide diagnostics on these errors, which should typically exceed any
  information that MOM6 could provide.

- Removal of purported `namelist` support

  There were several blocks of code provided to support namelist syntax,
  but did not appear to be working, nor was there any known instance of
  it being used by anyone, so it has been removed.

- `#define/undef/=` syntax testing across ranks

  Previously, only the root PE would test for consistency of the
  #define-like syntax, even though all ranks have this information.
  This required a second, awkwardly placed syntax test later in the
  subroutine.

  This test is redefined to run over all ranks, and the subsequent test
  has been removed.

- `define/override` test reordering

  The `found_override` test when coupled to a `#define`-like declaration
  was unreachable due to the presence of an even stronger test related
  to valid syntax.

  This test has been moved to provide more detailed information about
  the nature of the error.

- `link_parameter`, `parameter_block` defined as private

  Internal derived types of `MOM_file_parser` are redefined as private.
  This preserves the integrity of instances of these types, and also
  prevents creation of implicit object code required to access them
  externally.

- Removal of `static_value` from `get_param` interface

  The `static_value` argument of `get_param` has been removed, since it
  is functionally equivalent to `default`.  While this is an API change,
  there is no known case of anyone using this argument.

- The `param_type%doc` fields are now properly deallocated after closed.

- Quotes have been added around some filename error warnings, to help
  detect issues related to whitespace.

- `any_across_PEs` and `all_across_PEs`

   New functions for calling `any()` and `all()` across PE ranks have
   been added.  Behavior is in line with other functions, such as
   `min_across_PEs`.
  Added a number of options to MOM_ALE to mimic options that are found in
Hycom.  By default, all answers are bitwise identical, but there are several
new runtime parameters.  The code changes include:

 . Added the ability to use a different remapping scheme for velocities than
   for tracers, using the new runtime parameter VELOCITY_REMAPPING_SCHEME

 . Added the new runtime option PARTIAL_CELL_VELOCITY_REMAP to use partial cell
   thicknesses for remapping at velocity points, which triggers a call to the
   new internal routine apply_partial_cell_mask.

 . Added the new internal routine mask_near_bottom_vel to allow MOM6 to mimic
   Hycom in zeroing out the velocities in thin layers in a bottom boundary
   layer with a thickness given by the new runtime parameter
   REMAP_VEL_MASK_BBL_THICK, while the definition of thin is specified by
   REMAP_VEL_MASK_H_THIN.  Setting these to be negative (as is the default)
   avoids this.

 . Modified the interface to remap_all_state_vars to take just the ALE_CS, which
   then provides the remapping control structure that is appropriate for the
   tracers or velocities, rather than also passing this in as an argument.

 . Eliminated some unnecessary internal variables, and added others to be
   more explicit avoid array syntax copies in arguments.
  Enforce a minimum thickness of 0.5*Angstrom in the mixed_layer_restrat
routines.  The streamfunctions in these routines attempt to limit the
thicknesses to exceed Angstrom, but they can be less than this due to roundoff.
The new limit prevents thicknesses from becoming negative when Angstrom is set
to 0, but should not change any answers for test cases with positive values of
Angstrom.  Also added some comments describing variables and their units and
simplified the OMP directives.

  Also corrected error messages in MOM_diabatic_aux.F90 to identify the file or
module where these messages come from, and modified an error message in
applyTracerBoundaryFluxesInOut so that it is written if the localized fault does
not happen to occur on the root PE.

  All answers in the existing MOM6-examples regression suite are bitwise
identical.
This patch redefines `h_new` to match the shape of a center-point rather
than a v-face point.

Dynamic memory builds would have been unaffected by this, and older GCC
(~7.3.0) compilers appear to have not objected to the shape mismatch,
but this raised an error in newer GCCs (~9.3.0).

Since this redefines `h_new` from zero-based to 1-based indexing in the
y-axis, answer changes are very possible.
bugfix: static h_new shape remaining_transport_sum
  Stop logging the deprecated run-time parameter NEW_SPONGES, and always log
INTERPOLATE_SPONGE_TIME_SPACE as if NEW_SPONGES were not used.  This commit will
address MOM6 issue #46, which can be closed it is accepted.  This will change
the MOM_parameter_doc entries in some cases, but all answers are bitwise
identical.
- Enabling pipelines for SIS2 required making the MOM6_SRC macro mandatory.
  It was set in .gitlab-ci.yml for all targets except nolibs build.
+(*)Fix dimensional rescaling with HOMOGENIZE_FORCINGS
  This commit adds the ability to specify three remapping options derived from
Hycom's hybgen code.

 - Adds the new module MOM_hybgen_remap, which contains the new subroutines
   hybgen_plm_coefs, hybgen_ppm_coefs, and hybgen_weno_coefs

 - Adds code to handle PLM_HYBGEN, PPM_HYBGEN and WENO_HYBGEN as valid entries
   for specifying the ALE remapping schemes.  Also added descriptions of units
   to some internal remapping variables.

 - Adds the optional argument PCM_cell to regridding_main, remapping_core_h, and
   remap_all_state_vars to specify layers that should use piecewise constant
   remapping, regardless of the overall remapping scheme, to follow the approach
   used in Hycom.  ALE_main uses these new optional arguments, although until
   the Hybgen regridding code is added, they will always be set to false.

 - Makes 7 character strings longer in 5 files to accommodate the new
   remapping options.

All answers are bitwise identical, but there are changes to some entries in the
MOM_parameter_doc files.
Co-authored-by: Robert Hallberg <[email protected]>
@codecov
Copy link

codecov bot commented Apr 4, 2022

Codecov Report

Merging #1563 (78c574b) into main (399a7db) will increase coverage by 0.03%.
The diff coverage is 24.48%.

@@            Coverage Diff             @@
##             main    #1563      +/-   ##
==========================================
+ Coverage   28.94%   28.98%   +0.03%     
==========================================
  Files         242      246       +4     
  Lines       71591    72344     +753     
==========================================
+ Hits        20723    20966     +243     
- Misses      50868    51378     +510     
Impacted Files Coverage Δ
config_src/drivers/solo_driver/MOM_driver.F90 67.33% <0.00%> (-0.68%) ⬇️
src/ALE/MOM_hybgen_remap.F90 0.00% <0.00%> (ø)
src/core/MOM_checksum_packages.F90 30.82% <0.00%> (-0.48%) ⬇️
src/core/MOM_isopycnal_slopes.F90 44.93% <0.00%> (ø)
src/diagnostics/MOM_obsolete_params.F90 76.19% <ø> (ø)
src/diagnostics/MOM_sum_output.F90 62.56% <ø> (-0.07%) ⬇️
src/diagnostics/MOM_wave_speed.F90 23.97% <ø> (ø)
src/framework/MOM_coms.F90 55.83% <ø> (ø)
src/framework/posix.F90 0.00% <0.00%> (ø)
src/ice_shelf/MOM_ice_shelf.F90 0.00% <0.00%> (ø)
... and 81 more

📣 Codecov can now indicate which changes are the most critical in Pull Requests. Learn more

@jiandewang
Copy link
Collaborator

@marshallward
summary from UFS testing:
(1) all 27 MOM6 related jobs are fine when using infra/FMS1 code.
(2) 26 jobs are fine when using infra/FMS2 but "iau" job failed to retain current result.
(3) currently dev/emc is one commit ahead of main branch (FMS mixed mode code change in *time_manager.F90), tried merging 20220404 candidate into an early version which is the same as current MOM main, still got the same situation as (1) and (2)
(4) compared with all other 26 successful jobs, the unique feature in "iau" job is that it will read in an extra netcdf fixed file for incremental, kind of suspect this maybe the cause. I put this file on GAEA at /lustre/f2/dev/ncep/Jiande.Wang/For-Marshall/IAU

@marshallward
Copy link
Collaborator Author

Thanks @jiandewang, that is useful information. We did fix some netCDF issues on our end, so it's very possible we created a new problem.

Also reassuring it is restricted to the infra/FMS2 section, and (hopefully) unrelated to the mixed precision changes.

I can see the file. I don't see anything odd, but noticed it does not have a time axis. Maybe we are doing something which assumes an unlimited time axis. I have seen similar errors in the past.

Do you know where this file is read in the IAU model?

@jiandewang
Copy link
Collaborator

Thanks @jiandewang, that is useful information. We did fix some netCDF issues on our end, so it's very possible we created a new problem.

Also reassuring it is restricted to the infra/FMS2 section, and (hopefully) unrelated to the mixed precision changes.

I can see the file. I don't see anything odd, but noticed it does not have a time axis. Maybe we are doing something which assumes an unlimited time axis. I have seen similar errors in the past.

Do you know where this file is read in the IAU model?

yes it is not related to FMS mixed mode
code: https://github.com/NOAA-GFDL/MOM6/blob/dev-gfdl-main-candidate-2022-04-04/src/initialization/MOM_state_initialization.F90#L2220

I also put MOM_input at the same location on GAEA, see line 82

@klindsay28
Copy link
Contributor

Not a full-fledged review...
I noticed in a quick skim some new unused local and imported variables.
These are not errors per se, but do lead to code that is less tidy than necessary.
I attempted to look more methodically and came up with the following.
I wouldn't be surprised if I missed some.

new unused local variables

src/ALE/MOM_ALE.F90
subroutine ALE_main
ntr

src/framework/MOM_error_handler.F90
subroutine disable_fatal_errors
rc
subroutine enable_fatal_errors
rc

variables newly imported via use, but not used

src/core/MOM_check_scaling.F90
MOM_error
FATAL
WARNING
assert
MOM_get_verbosity

src/framework/MOM_unique_scales.F90
MOM_error
FATAL
WARNING

@marshallward
Copy link
Collaborator Author

@klindsay28 Thanks for catching those. We need to investigate the netCDF issue anyway, so we have an opportunity to also clean these up.

@jiandewang
Copy link
Collaborator

Thanks @jiandewang, that is useful information. We did fix some netCDF issues on our end, so it's very possible we created a new problem.
Also reassuring it is restricted to the infra/FMS2 section, and (hopefully) unrelated to the mixed precision changes.
I can see the file. I don't see anything odd, but noticed it does not have a time axis. Maybe we are doing something which assumes an unlimited time axis. I have seen similar errors in the past.
Do you know where this file is read in the IAU model?

yes it is not related to FMS mixed mode code: https://github.com/NOAA-GFDL/MOM6/blob/dev-gfdl-main-candidate-2022-04-04/src/initialization/MOM_state_initialization.F90#L2220

I also put MOM_input at the same location on GAEA, see line 82

@marshallward it's possible this is the cause:
https://github.com/NOAA-GFDL/MOM6/pull/64/files
removing the added part here made "iau" job retained result. I am testing all other jobs but our machine is saturated at this moment, will provide update to you tomorrow.

@jiandewang
Copy link
Collaborator

Thanks @jiandewang, that is useful information. We did fix some netCDF issues on our end, so it's very possible we created a new problem.
Also reassuring it is restricted to the infra/FMS2 section, and (hopefully) unrelated to the mixed precision changes.
I can see the file. I don't see anything odd, but noticed it does not have a time axis. Maybe we are doing something which assumes an unlimited time axis. I have seen similar errors in the past.
Do you know where this file is read in the IAU model?

yes it is not related to FMS mixed mode code: https://github.com/NOAA-GFDL/MOM6/blob/dev-gfdl-main-candidate-2022-04-04/src/initialization/MOM_state_initialization.F90#L2220
I also put MOM_input at the same location on GAEA, see line 82

@marshallward it's possible this is the cause: https://github.com/NOAA-GFDL/MOM6/pull/64/files removing the added part here made "iau" job retained result. I am testing all other jobs but our machine is saturated at this moment, will provide update to you tomorrow.

update: all 27 jobs passed RT

@marshallward
Copy link
Collaborator Author

marshallward commented Apr 7, 2022

@jiandewang Your explanation looks very likely, and the comment suggests that the block was added to handle 2d+time fields. It looks like this is padding the field with 1-length vertical axis. I could see that causing problems if it thinks the vertical axis is a time axis!

Maybe we can fix this by adding an "unlimited" check. I will need to check with @MJHarrison-GFDL to see what our options are here. I think that some convention changed from FMS1 to FMS2 so our options may be limited.

@jiandewang
Copy link
Collaborator

@jiandewang Your explanation looks very likely, and the comment suggests that the block was added to handle 2d+time fields. It looks like this is padding the field with 1-length vertical axis. I could see that causing problems if it thinks the vertical axis is a time axis!

Maybe we can fix this by adding an "unlimited" check. I will need to check with @MJHarrison-GFDL to see what our options are here. I think that some convention changed from FMS1 to FMS2 so our options may be limited.

@marshallward in my previous "iau" run which failed in UFS regression test, the job finished without error but the results differ a lot from baseline, this means the netcdf file is read in incorrectly.

@marshallward
Copy link
Collaborator Author

@jiandewang We've looked over the problem, and agreed that it is due to field_size, which calls the code block which you linked.

This function is receiving an array of size four (siz(4)) and is reading 3d fields (e.g. h_fg). Because siz is one larger, it is inserting a third axis, as if it is inserting a single vertical level. So something like this:

(x, y, k) -> (x, y, 1, k)

This feature is needed for other parts of the model (e.g. OBCs) and works if we are reading a 2d+time array, but not a 3d array with no time axis.

We believe the issue is that FMS1 used to check the axes and its metadata to decide if this sort of resize should occur. But FMS2 no longer provides this feature, so we need to figure it out on the MOM6 side. Since the axis metadata was not based on CF or any other standard, we were reluctant to reintroduce a similar check in MOM6.

We clearly need a better solution for field_size, but it's not yet clear how we can fix this in a way which preserves the old FMS1 behavior.

However, none of this happens if siz matches the size of the fields. So we can fix this in the ODA by passing siz(:3) into your field_size() calls in initialize_oda_incupd_file. But this also assumes that this file will always contain 3d fields, and will never contain a time axis.

I don't know what the format of these file should be. Can we assume that ODA_INCUPD_FILE always contains 3d (x,y,k) fields? Or could it ever vary?

@jiandewang
Copy link
Collaborator

@marshallward Phil is going to chime in for the format of that file.

@jiandewang
Copy link
Collaborator

jiandewang commented Apr 7, 2022

@marshallward Phil is going to chime in for the format of that file. If we add t-axis in the fixed file, will that work ? Phil: what do you think ?

@pjpegion
Copy link
Contributor

pjpegion commented Apr 7, 2022

@marshallward I created the increment without a time-dimension, but I see that other people have included a time dimension. Would it make it easier if I created the increment file with a time-dimension of size 1?

@pjpegion
Copy link
Contributor

pjpegion commented Apr 7, 2022

@abozec Should also have a say in this

@marshallward
Copy link
Collaborator Author

I don't know if the file needs to be changed. If the current file works, then I think we have some obligation to continue supporting it.

The bigger question to me is what other possible formats the file could take? If we know that, then we can modify the ODA code to support these other possibilities.

If the answer is "all of the above", then it will be a messier solution, but we still ought be able to solve it.

@marshallward
Copy link
Collaborator Author

marshallward commented Apr 7, 2022

@pjpegion To answer your specific question: If a time axis had been in the file, then I believe it would have worked.

It looks like we should ought to be prepared to handle both 3d and 3d+time. I will look into a solution.

@marshallward
Copy link
Collaborator Author

@pjpegion @jiandewang Can one of you try this patch? I don't have any ODA configurations and am not set up to test this one, but I think it will fix your issue.

It uses the ndims output argument of field_size which (hopefully) reports that the fields are 3d and will revert any modifications to the siz array.

We need to address the underlying problem at some stage, and perhaps more generally how we want to handle inputs with different axes, but I think this will address your particular problem.

diff --git a/src/initialization/MOM_state_initialization.F90 b/src/initialization/MOM_state_initialization.F90
index 4e8dd4f18..459fc6641 100644
--- a/src/initialization/MOM_state_initialization.F90
+++ b/src/initialization/MOM_state_initialization.F90
@@ -2198,6 +2198,7 @@ subroutine initialize_oda_incupd_file(G, GV, US, use_temperature, tv, h, u, v, p
   integer :: isd, ied, jsd, jed
 
   integer, dimension(4) :: siz
+  integer :: inc_ndims
   integer :: nz_data  ! The size of the sponge source grid
   logical :: oda_inc  ! input files are increments (true) or full fields (false)
   logical :: save_inc ! save increments if using full fields
@@ -2265,7 +2266,16 @@ subroutine initialize_oda_incupd_file(G, GV, US, use_temperature, tv, h, u, v, p
   if (.not.file_exists(filename, G%Domain)) &
     call MOM_error(FATAL, " initialize_oda_incupd: Unable to open "//trim(filename))
 
-  call field_size(filename,h_var,siz,no_domain=.true.)
+  call field_size(filename,h_var,siz,no_domain=.true., ndims=inc_ndims)
+  ! NOTE: if siz is larger than h_var, then field_size assumes that h_var is
+  !   2d+time and pads siz with a single vertical level.  Currently, increment
+  !   fields are either 3d or 3d+time, so this correction is an error.
+  ! Here, we revert this correction and move siz(4) back to siz(3).
+  ! TODO: Remove this if the `field_size` behavior is resolved.
+  if (inc_ndims == 3) then
+    siz(3:4) = [siz(4), 0]
+  endif
+
   if (siz(1) /= G%ieg-G%isg+1 .or. siz(2) /= G%jeg-G%jsg+1) &
          call MOM_error(FATAL,"initialize_oda_incupd_file: Array size mismatch for oda data.")
   nz_data = siz(3)

@jiandewang
Copy link
Collaborator

@marshallward I will test that. Where is your branch ?

@marshallward
Copy link
Collaborator Author

Sorry, there was no branch, just that patch.

I have added the patch to this branch: https://github.com/marshallward/MOM6/tree/incsize_patch

@jiandewang
Copy link
Collaborator

Sorry, there was no branch, just that patch.

I have added the patch to this branch: https://github.com/marshallward/MOM6/tree/incsize_patch

got it, will test it and get back to you

@jiandewang
Copy link
Collaborator

Sorry, there was no branch, just that patch.
I have added the patch to this branch: https://github.com/marshallward/MOM6/tree/incsize_patch

got it, will test it and get back to you

@marshallward the patch caused the iau job failed when using infra/FMS1 code.
FATAL from PE 0: fms_io(read_data_3d_new), field h_fg in file INPUT/mom6_increment.nc: field size mismatch 1
job using infra/FMS2 code is still waiting on the queue

@abozec
Copy link
Collaborator

abozec commented Apr 8, 2022

sorry to be late in the conversation... we had a discussion with @awallcraft about that time dimension when we were implementing the incup and at that time, we did not see an application that would use a incremental update that was changing in time ... Usually, assimilation frameworks work sequentially with model running then assimilation software creating an increment at a specific time that is implemented once the model runs again ... If you have a way to seamlessly accommodate 3 and 4 dims, that would be fine but if it is too much problem, I would stick with 3 dimensions (x,y,z).. That is my 2-cents.

@marshallward
Copy link
Collaborator Author

@marshallward the patch caused the iau job failed when using infra/FMS1 code. FATAL from PE 0: fms_io(read_data_3d_new), field h_fg in file INPUT/mom6_increment.nc: field size mismatch 1 job using infra/FMS2 code is still waiting on the queue

Right, of course the FMS1 would already be correct. Sorry about that... hopefuly FMS2 answer is OK.

Let me try to come up with something (and perhaps will try to test myself this time).

@marshallward
Copy link
Collaborator Author

marshallward commented Apr 8, 2022

@jiandewang I have finished looking into this. If the ndims argument is provided, then the behavior is completely different.

When using FMS2 on a 3d array (nx, ny, nz):

  • call field_size(filename, var, size) sets size to (nx, ny, 1, nz)
  • call field_size(filename, var, size, ndims=ndims) sets sizeto (nx, ny, nz, 0) andndims` to 3.

For FMS1, the behavior is largely the same: (nx, ny, nz, _). (Sometimes the final value is set to 0, but not relevant here.)

We could "fix" this by adding the ndims argument, but I am more concerned that this change to field_size is too volatile to keep in the model. The reshaping of (nx, ny, nz) to (nx, ny, 1, nz) is just too particular.

I will discuss this with the others on Monday, but I expect we will have to revert this change before this PR can be considered.
@marshallward infra/FMS2 job finally got chance to run, but failed with the same error as when using infra/FMS1

@marshallward
Copy link
Collaborator Author

After some discussion, we've decided that this change to field_size() is a bit too volatile to patch over, and we'd rather try to find a more robust fix before merging to main, so we're going to withdraw this PR and will re-submit it soon.

Apologies to those who have already reviewed it, but the work is appreciated.

@adcroft adcroft deleted the dev-gfdl-main-candidate-2022-04-04 branch December 7, 2022 22:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.