Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Broken diagnostics (T_adx, T_ady, S_adx, S_ady) in symmetric memory mode #646

Closed
breichl opened this issue May 21, 2024 · 8 comments
Closed
Labels
bug Something isn't working

Comments

@breichl
Copy link

breichl commented May 21, 2024

Diagnostics including T_adx, T_ady, S_adx, and S_ady give stripes of no values on CPU boundaries in symmetric memory mode (vertical stripes in X_adx and horizontal stripes in X_ady). The stripes are not present in non-symmetric memory. We have not yet confirmed if it impacts other diagnostics. It does not impact the prognostic model solutions. The stripes emerge in code versions from PR #564 and later.

@adcroft
Copy link
Member

adcroft commented May 21, 2024

Not sure (haven't tried) but it looks like the shift calculation was dropped at https://github.com/NOAA-GFDL/MOM6/pull/564/files#diff-b4d8d593199025815b566920316560d920d617186e26b3464e2a8b2f17176ad9L388

@adcroft
Copy link
Member

adcroft commented May 21, 2024

Apparently (slack thread) this is not apparent in other variables so the newer loop structure in MOM_diag_remap.F90 is probably OK. We'll have to do a DEBUG=True for before and after...

@breichl
Copy link
Author

breichl commented May 23, 2024

It looks like it was this commit that introduced the striping: 6153d97
I reverted this change:
if (CS%usePPM) stencil = 3
back to this:
if (CS%usePPM .and. .not. CS%useHuynh) stencil = 3
and the striping went away. (So when the stencil is 2 there is no striping.)

Still unsure what the fix should be.

@breichl
Copy link
Author

breichl commented May 23, 2024

I have now figured out that I can get rid of the striping with the newer version of the code with 2 changes:
A) increasing the halo size in OM4 from 4 to 6
or
B) Changing this line to isv = is-1-nsten_halo * stencil ; jsv = js-1-nsten_halo * stencil

(A works independent of B and B works independent of A)

I'm not sure if "B" is the correct patch for this yet, but just offering these as clues. I wonder if @Hallberg-NOAA or @kshedstrom might have thoughts?

@Hallberg-NOAA
Copy link
Member

Making the halos or the work space wider is a good clue, but it will be a potentially expensive (and in the case of the wider halos unreliable) fix for this problem. I think that the underlying problem here is with how the do_i(i,j) is being declared, set and used. In fact, I think that you already identified this problem, @breichl , when you added the comment on Line 1069 of MOM_tracer_advect.F90, when you say:
! (The logical test could be "(do_i(i,j) .or. do_i(i+1,j))" to be clearer, but not needed)

To fix the problem, I suspect that the logic surrounding the calculation of the tracer advection diagnostics should be revised to do i=is,ie ; if (do_i(i,j) .or. do_i(i,j+1)) then, in both lines 1070 and 1078 and do I=is-1,ie ; if (do_i(i,j) .or. do_i(i+1,j)) then on lines 669 and 691.

In addition, I believe that the declarations for do_i on lines 383 and 759 should be changed from logical :: do_i(SZIB_(G), SZJ_(G)) to logical :: do_i(SZI_(G), SZJ_(G)) to reflect that these logical branches apply at tracer points.

In addition, I suspect that @marshallward might advise us that it would just be faster to avoid the logical tests around these diagnostics of net advective fluxes and exploit the fact that we don't change anything by adding in 0-fluxes. (We do need the logical tests for the tracer advection itself at about lines 653 and 1041, because even when the fluxes are 0, multiplying a tracer concentration by a thickness and then the reciprocal of that thickness changes the tracer concentration at roundoff.)

@breichl
Copy link
Author

breichl commented May 23, 2024

Thanks @Hallberg-NOAA! I tested this again with the revised logic and indeed that seems to also fix the striping on its own. Pr is in #649

Another thing I noticed while debugging this is the inconsistency between the logic change at L127 with the existing logic at L396 and L768
In my configuration I can update these three lines to all have the updated logic and get the same results. It is separate from the present issue, but just wanted to point it out.

@Hallberg-NOAA
Copy link
Member

@breichl, now that PR #649 has been merged into dev/gfdl, do you think that this issue can be closed, or would you like this to be kept open as a reminder to deal with the inconsistent logic you noted where the halo-sizes are being set?

@Hallberg-NOAA Hallberg-NOAA added the bug Something isn't working label May 24, 2024
@breichl
Copy link
Author

breichl commented May 24, 2024

I'm fine to close it now, the bug that was impacting OM5 is now resolved. It seems we have a couple things to potentially revisit later in this code w/ the other being the logicals inside the loop setting the diagnostics.

@breichl breichl closed this as completed May 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants