Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use adjust_ps=false in Cxx theta-l_kokkos code #6063

Merged
merged 10 commits into from
Mar 21, 2024

Conversation

tcclevenger
Copy link
Contributor

@tcclevenger tcclevenger commented Nov 10, 2023

This is an incremental change to introduce adjustment of dp3d instead of adjustment of ps.

adjust_ps is still used for preqx (forcing was never rewritten), rsplit=0 (only adjust_ps is possible), or EAM (to not touch every test).

Fixed bugs in forcing_ut init, testing it only with adjust_ps=false option.

[non-BFB] for HOMME tests with baselines (with moist forcing and some without nontrivial forcing, because even zero forcing is processed differently for adjustment of dp3d).

@tcclevenger tcclevenger added the non-BFB PR makes roundoff changes to answers. label Nov 10, 2023
@rljacob
Copy link
Member

rljacob commented Nov 10, 2023

Maybe we want adjust_ps to be false in v3? @golaz or @wlin7 ?

@rljacob
Copy link
Member

rljacob commented Nov 10, 2023

This adjust_ps logical has come up in other PRs (#4717) and the comment is always "we're not going to change it for EAM because we don't want to change v2 answers". We're assembling v3 now so time to reconsider that?

Also this issue E3SM-Project/scream#1343 says "E3SM should also migrate to adjust_ps=.false., but this is on hold in order to preserve buggy V2 behavior."

Comment on lines -70 to +73
, m_policy_dp3d_lim (Homme::get_default_team_policy<ExecSpace,TagDp3dLimiter>(m_num_elems))
, m_tu(m_policy_dp3d_lim)
, m_dp3d_thresh(params.dp3d_thresh)
, m_vtheta_thresh(params.vtheta_thresh)
, m_policy_dp3d_lim (Homme::get_default_team_policy<ExecSpace,TagDp3dLimiter>(m_num_elems))
, m_tu(m_policy_dp3d_lim)
Copy link
Contributor Author

@tcclevenger tcclevenger Nov 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is unrelated, but gives us a warning in EAMxx that I would like to remove. Let me know if I need to revert for the PR.

@mt5555
Copy link
Contributor

mt5555 commented Nov 10, 2023

To Rob's question: Yes, EAM also has this bug and EAM should adopt adjust_ps=.false. But I didn't want to suggest it since it's a small effect and we are so close to the V3 deadlines.

@ambrad
Copy link
Member

ambrad commented Nov 13, 2023

Will this be answer changing for SCREAM simulations?

@tcclevenger
Copy link
Contributor Author

tcclevenger commented Nov 13, 2023

Will this be answer changing for SCREAM simulations?

When we merge for EAMxx it will be answer changing for C++, not otherwise (#ifdef SCREAM here still has same behavior).

@ambrad
Copy link
Member

ambrad commented Nov 13, 2023

Ok, and everyone has agreed to that? I ask because it looks like the old method is being removed from C++, so we won't be able to switch back if there are issues.

@oksanaguba
Copy link
Contributor

I was about to ask whether we want climo for this change, to document it, even if we can do climo only in eam. I ran climo for this long time ago, so the best would be to rerun https://acme-climate.atlassian.net/wiki/spaces/COM/pages/1632864081/Climo+6-year+runs+for+ps+adjustment+dp3d+adjustment+T+adjustment+choices .

@mt5555
Copy link
Contributor

mt5555 commented Nov 13, 2023

Correct - (removing old code). This has been tested in SCREAM v0.1, but then the bug was accidentally turned back on in v1. so it hasn't had extensive testing in SCREAM v1, but it is low risk.

@oksanaguba
Copy link
Contributor

Also, can we change the description a little? Something like

This PR addresses the issue that the moist pressure forcing was applied to ps instead of dp3d (design inherited from CAM).
The change affects all runs in standalone homme with forcing, but adjusting ps is hardcoded for EAM runs with F executable for now.
When this is pulled into scream, it will use dp3d adjustment in EAMXX, not ps adjustment.

nonbfb for homme runs with forcings against baselines, should be bfb for cxx-vs-f tests.

@tcclevenger did you run homme suites for this? if so, would you please post the output from chrysalis and weaver here? thanks!

@oksanaguba
Copy link
Contributor

Though my comment above is suggestive, i actually think we should rerun climo for this change. If @tcclevenger does not want to, i can do it.

@tcclevenger
Copy link
Contributor Author

@oksanaguba I have run all theta-f* tests on my workstation. I am struggling with weaver at the moment. Have you successfully run there since the drivers and modules were updated? I will continue to try, or move to summit.

And I do not have access to chrysalis.

@oksanaguba
Copy link
Contributor

@tcclevenger i haven't used weaver for a while.

@rljacob
Copy link
Member

rljacob commented Dec 7, 2023

Waiting on @tcclevenger to test on GPU.

@tcclevenger
Copy link
Contributor Author

An update: I now have homme building on weaver, but the F90 theta-f* tests I'm trying to run appear to be hanging, so I'm not able to produce baselines or run the cxx_vs_f90 tests. I'm working on figuring this out.

As for summit, I'm running into an internal compiler error whose solution doesn't seem trivial. I'll most likely continue on the weaver path.

@ambrad
Copy link
Member

ambrad commented Dec 11, 2023

Is the Summit ICE for the standalone-Homme build? Does it occur with this configuration?

machinefile=$e3sm/components/homme/cmake/machineFiles/summit-bfb.cmake
cmake -C $machinefile $e3sm/components/homme

This configuration worked for me with the master branch of a few weeks ago; you might check the master branch in addition to your branch.

You should also merge this PR into the SCREAM repo and run the CIME SCREAMv1 test suite, e.g., the specific test ERS_Ln90.ne30pg2_ne30pg2.F2010-SCREAMv1.

Another point: The ICE might depend on modules. You can source the .env_mach_specific.sh file from a CIME test to get an environment to use when building standalone Homme.

@tcclevenger
Copy link
Contributor Author

@oksanaguba @ambrad Looks like my issue on summit was the modules! Loading them through CIME has it building without error. Thanks for the help!

@tcclevenger
Copy link
Contributor Author

Ran the theta-f* tests on summit, all cxx_vs_f90 tests pass, the following tests fail with diffs in 5 of 21 fields (kokkos and f90 are identical).

  • theta-f1-tt10-hvs1-hvst0-r2-qz10-nutopoff-GB-sl-ne2-nu3.4e18-ndays1
    • Diff range (normalized): [4e-15, 2e-11]
  • theta-fhs1-ne2-nu3.4e18-ndays1
    • Diff range (normalized): [4e-10, 2e-6]
  • theta-fhs2-ne2-nu3.4e18-ndays1
    • Diff range (normalized): [1e-9, 2e-4]
  • theta-fhs3-ne2-nu3.4e18-ndays1
    • Diff range (normalized): [1e-10, 2e-6]
      So all tests give small diffs in same fields.

I also merged into EAMxx and ran AT test suite on weaver and mappy. Weaver passed, Mappy failed with diffs in CIME cases (expected) with normalized diffs in range [2e-5, 1]. I also ran same CIME cases on summit and diffs existed in the same fields as CPU and the same values.

@mt5555 Do these numbers seem reasonable (if that is possible to know), or are there any simulations I need to run to test the output?

@oksanaguba can you run the Chrysalis tests. I don't have access yet.

@oksanaguba
Copy link
Contributor

Ran climo for model-vs-model for EAM with the default setup (adjust_ps = true, master branch) and with adjust_ps = false. The climo is here https://web.lcrc.anl.gov/public/e3sm/diagnostic_output/onguba/theta/eam-ne30-dpadj.443759.0002-0006/def/viewer/ , not sure if these diffs are considered small, but they are documented.

@tcclevenger
Copy link
Contributor Author

@oksanaguba Do you know if there are cxx vs. F90 tests using adjust_ps=false in nightly tests?

@tcclevenger
Copy link
Contributor Author

@oksanaguba Ran ERS test in SCREAM with these changes in homme and we pass.

@tcclevenger
Copy link
Contributor Author

@oksanaguba What still needs to be looked at for this? Is there anything I can do on my end?

@oksanaguba
Copy link
Contributor

@tcclevenger i want to check one more time which code line(s) actually make adjust_ps vs adjust_dp cases diverge when forcing is zero. that is something i tried to do before, but did not succeed. i plan to get back to this at the end of this week.

@oksanaguba
Copy link
Contributor

Regarding nonbfb behavior for adjust_ps=true vs false with use_moisture=true, it is due to this code

#ifdef MODEL_THETA_L
   if (use_moisture) then
      ! compute updated pnh and exner
      if (adjust_ps) then
         ! recompute hydrostatic pressure from ps
         do k=1,nlev
            phydro(:,:,k)=hvcoord%ps0*hvcoord%hyam(k) + ps(:,:)*hvcoord%hybm(k)
         enddo
      else
         ! recompute hydrostatic pressure from dp3d
         call get_hydro_pressure(phydro,elem%state%dp3d(:,:,:,np1),hvcoord)
      endif

that is, phydro is computed differently. when using test theta-f1-tt10-hvs1-hvst0-r2-qz10-nutopoff-GB-ne2-nu3.4e18-ndays1 these differences not show up during the 1st time step (not sure why, not like the values are represented as decimals)

OG phydro in forcing  4  3  8  0.58593750000000000E+04
OG phydro in forcing  4  3  9  0.66406250000000000E+04
OG phydro in forcing  4  3 10  0.74218750000000000E+04

but show up in the 2nd time step

934459,934460c934459,934460
< OG phydro in forcing  1  1  8  0.58593940740563385E+04
< OG phydro in forcing  1  1  9  0.66406466172638511E+04
---
> OG phydro in forcing  1  1  8  0.58593940740563394E+04
> OG phydro in forcing  1  1  9  0.66406466172638502E+04

I do not see anything else that would be a concern.

@oksanaguba
Copy link
Contributor

i am running homme suite on chrysalis, after that this will be ready.

@oksanaguba
Copy link
Contributor

@tcclevenger forcing test passed for me when running a few times by hand, but failed in test suite, with seed 1206257716 . So i hardcoded the seed into forcing_ut.cpp and it does fail, with nans (not like results differ). last time we saw it it was about init of ps and init of dp3d mismatch (dp3d did not get init-ed from A,B). you fixed that, so i am not sure what else is there. one of us would need to debug.

@oksanaguba
Copy link
Contributor

@tcclevenger i believe one issue with the new code is that this init of dp is not what we want:

       Kokkos::parallel_for(Kokkos::ThreadVectorRange(kv.team,NUM_LEV),
                            [&](const int ilev) {
        dp(ie,tl,igp,jgp,ilev) = ps0*hybrid_am(ilev)
                               + ps(ie,tl,igp,jgp)*hybrid_bm(ilev);

See, for example, this correct code for dp from A,B:

  do k = 1 , nlev
    dp_ref(k) = ( hvcoord%hyai(k+1) - hvcoord%hyai(k) ) * hvcoord%ps0 + &
         ( hvcoord%hybi(k+1) - hvcoord%hybi(k) ) * ps  !Reference pressure difference

I tried this fix:

+++ b/components/homme/src/theta-l_kokkos/cxx/ElementsState.cpp
@@ -285,8 +285,15 @@ void ElementsState::randomize(const int seed,
   auto dp = m_dp3d;
   auto ps = m_ps_v;
   auto ps0 = hvcoord.ps0;
-  auto hybrid_am = hvcoord.hybrid_am; 
-  auto hybrid_bm = hvcoord.hybrid_bm; 
+  //auto hybrid_am = hvcoord.hybrid_am; 
+  //auto hybrid_bm = hvcoord.hybrid_bm; 
+
+  // Create local copies, to avoid issue of lambda on GPU
+  auto hyai = hvcoord.hybrid_ai_packed;
+  auto hybi = hvcoord.hybrid_bi_packed;
+  auto dhyai = hvcoord.hybrid_ai_delta;
+  auto dhybi = hvcoord.hybrid_bi_delta;
+
   const auto tu = m_tu;
   Kokkos::parallel_for(m_policy, KOKKOS_LAMBDA(const TeamMember& team) {
     KernelVariables kv(team, tu);
@@ -297,11 +304,21 @@ void ElementsState::randomize(const int seed,
                          [&](const int idx) {
       const int igp  = idx / NP;
       const int jgp  = idx % NP;
+
+      ColumnOps::compute_midpoint_delta(kv,hyai,dhyai);
+      ColumnOps::compute_midpoint_delta(kv,hybi,dhybi);
+
       Kokkos::parallel_for(Kokkos::ThreadVectorRange(kv.team,NUM_LEV),
                            [&](const int ilev) {
-        dp(ie,tl,igp,jgp,ilev) = ps0*hybrid_am(ilev)
-                               + ps(ie,tl,igp,jgp)*hybrid_bm(ilev);
+
+        dp(ie,tl,igp,jgp,ilev) = ps0*dhyai(ilev)
+                               + ps(ie,tl,igp,jgp)*dhybi(ilev);

but it also did not work (meaning it seems that there is still a mismatch between this dp and dp as computed from p(A,B,ps)). So I printed averages of Ai vs Am and for me they do not match (but they should). We can re-iterate on Tuesday in case i made mistakes.

@tcclevenger
Copy link
Contributor Author

Thanks for looking at this, @oksanaguba. I can try to do some debugging this week!

@oksanaguba
Copy link
Contributor

This resolved it for me, that is, now hydro pressure computed from dp coincides with the pressure from A,B:

diff --git a/components/homme/src/share/cxx/HybridVCoord.cpp b/components/homme/src/share/cxx/HybridVCoord.cpp
index 569eef1d68..199b6bc4f7 100644
--- a/components/homme/src/share/cxx/HybridVCoord.cpp
+++ b/components/homme/src/share/cxx/HybridVCoord.cpp
@@ -158,8 +158,8 @@ void HybridVCoord::random_init(int seed) {
 
     Errors::runtime_check(curr>prev,"Error! hybrid_a+hybrid_b is not increasing.\n", -1);
 
-    host_hybrid_am_real(i-1) = (host_hybrid_ai(i) + host_hybrid_ai(i))/2.0;
-    host_hybrid_bm_real(i-1) = (host_hybrid_bi(i) + host_hybrid_bi(i))/2.0;
+    host_hybrid_am_real(i-1) = (host_hybrid_ai(i) + host_hybrid_ai(i-1))/2.0;
+    host_hybrid_bm_real(i-1) = (host_hybrid_bi(i) + host_hybrid_bi(i-1))/2.0;
   }

Please commit the changes from above (hv init and dp calculations), run on gpu standalone forcing test (i assume there is no need to run ERS test, since the above does not touch forcing functor), and then i will re-test multiple times on chrysalis. Thanks.

@tcclevenger
Copy link
Contributor Author

@oksanaguba Implemented your changes, test with that specific seed before and after and went from failing to passing. Tested forcing ut on weaver V100 gpu and builds and runs without errors and test passes.

@rljacob
Copy link
Member

rljacob commented Feb 29, 2024

discussion: Oksana will test one more time.

@oksanaguba
Copy link
Contributor

Running hommebfb on chrysalis as expected

	 30 - thetanhwet-TC (Failed)
	 39 - thetanh-moist-bubble (Failed)
	 43 - thetah-sl-dcmip16_test1pg2 (Failed)
	 47 - thetanh-moist-bubble-kokkos (Failed)
	 53 - thetanh-moist-bubble-sl (Failed)
	 54 - thetanh-moist-bubble-sl-kokkos (Failed)
	 55 - thetanh-moist-bubble-sl-pg2 (Failed)
	 56 - thetanh-moist-bubble-sl-pg2-kokkos (Failed)
	 75 - theta-f1-tt10-hvs1-hvst0-r2-qz10-nutopoff-GB-sl-ne2-nu3.4e18-ndays1 (Failed)
	 76 - theta-f1-tt10-hvs1-hvst0-r2-qz10-nutopoff-GB-sl-kokkos-ne2-nu3.4e18-ndays1 (Failed)
	 77 - theta-fhs1-ne2-nu3.4e18-ndays1 (Failed)
	 78 - theta-fhs1-kokkos-ne2-nu3.4e18-ndays1 (Failed)
	 79 - theta-fhs2-ne2-nu3.4e18-ndays1 (Failed)
	 80 - theta-fhs2-kokkos-ne2-nu3.4e18-ndays1 (Failed)
	 81 - theta-fhs3-ne2-nu3.4e18-ndays1 (Failed)
	 82 - theta-fhs3-kokkos-ne2-nu3.4e18-ndays1 (Failed)
	114 - theta-f1-tt10-hvs1-hvst0-r2-qz10-nutopoff-GB-sl-ne6-nu1.25e17-ndays1 (Failed)
	115 - theta-f1-tt10-hvs1-hvst0-r2-qz10-nutopoff-GB-sl-kokkos-ne6-nu1.25e17-ndays1 (Failed)
	116 - theta-fhs1-ne6-nu1.25e17-ndays1 (Failed)
	117 - theta-fhs1-kokkos-ne6-nu1.25e17-ndays1 (Failed)
	118 - theta-fhs2-ne6-nu1.25e17-ndays1 (Failed)
	119 - theta-fhs2-kokkos-ne6-nu1.25e17-ndays1 (Failed)
	120 - theta-fhs3-ne6-nu1.25e17-ndays1 (Failed)
	121 - theta-fhs3-kokkos-ne6-nu1.25e17-ndays1 (Failed)

@oksanaguba
Copy link
Contributor

also forcing up passed 100 times:

[ac.onguba@chr-0115 thetal_kokkos_ut]$ for i in $(seq 1 100); do srun -n 1 ./forcing_ut | grep "All tests passed" ; done | wc -l
100

@oksanaguba
Copy link
Contributor

This PR is ready.

@oksanaguba
Copy link
Contributor

  ERS.ne4pg2_oQU480.F2010.chrysalis_intel.eam-thetahy_ftype2 (Overall: PASS) details:
    PASS ERS.ne4pg2_oQU480.F2010.chrysalis_intel.eam-thetahy_ftype2 CREATE_NEWCASE
    PASS ERS.ne4pg2_oQU480.F2010.chrysalis_intel.eam-thetahy_ftype2 XML
    PASS ERS.ne4pg2_oQU480.F2010.chrysalis_intel.eam-thetahy_ftype2 SETUP
    PASS ERS.ne4pg2_oQU480.F2010.chrysalis_intel.eam-thetahy_ftype2 SHAREDLIB_BUILD time=179
    PASS ERS.ne4pg2_oQU480.F2010.chrysalis_intel.eam-thetahy_ftype2 NLCOMP
    PASS ERS.ne4pg2_oQU480.F2010.chrysalis_intel.eam-thetahy_ftype2 MODEL_BUILD time=1082
    PASS ERS.ne4pg2_oQU480.F2010.chrysalis_intel.eam-thetahy_ftype2 SUBMIT
    PASS ERS.ne4pg2_oQU480.F2010.chrysalis_intel.eam-thetahy_ftype2 RUN time=93
    PASS ERS.ne4pg2_oQU480.F2010.chrysalis_intel.eam-thetahy_ftype2 COMPARE_base_rest
    PASS ERS.ne4pg2_oQU480.F2010.chrysalis_intel.eam-thetahy_ftype2 BASELINE master:
    FAIL ERS.ne4pg2_oQU480.F2010.chrysalis_intel.eam-thetahy_ftype2 MEMCOMP [Errno 2] No such file or directory: '/lcrc/group/e3sm/baselines/chrys/intel/master/ERS.ne4pg2_oQU480.F2010.chrysalis_intel.eam-thetahy_ftype2/cpl-mem.log'
    FAIL ERS.ne4pg2_oQU480.F2010.chrysalis_intel.eam-thetahy_ftype2 TPUTCOMP [Errno 2] No such file or directory: '/lcrc/group/e3sm/baselines/chrys/intel/master/ERS.ne4pg2_oQU480.F2010.chrysalis_intel.eam-thetahy_ftype2/cpl-tput.log'
    PASS ERS.ne4pg2_oQU480.F2010.chrysalis_intel.eam-thetahy_ftype2 MEMLEAK
    PASS ERS.ne4pg2_oQU480.F2010.chrysalis_intel.eam-thetahy_ftype2 SHORT_TERM_ARCHIVER

@rljacob
Copy link
Member

rljacob commented Mar 14, 2024

@oksanaguba you're the integrator on this. Go ahead and start merging it.

oksanaguba added a commit that referenced this pull request Mar 15, 2024
This is an incremental change to introduce adjustment of dp3d
instead of adjustment of ps.

adjust_ps is still used for preqx (forcing was never rewritten),
rsplit=0 (only adjust_ps is possible), or EAM (to not touch every test).

Fixed bugs in forcing_ut init, testing it only
with adjust_ps=false option.

[non-BFB] for HOMME tests with baselines (with moist forcing
and some without nontrivial forcing, because even zero forcing
is processed differently for adjustment of dp3d).
@oksanaguba
Copy link
Contributor

cdash is still affected by chrysalis, i cannot sort out yet whcih tests diff-ed because of this PR.

@rljacob
Copy link
Member

rljacob commented Mar 19, 2024

I think you are good. This PR only caused diffs in: ERS_D.ne4pg2_oQU480.F2010.mach_comp.eam-hommexx, SMS_Ln5.ne30pg2_ne30pg2.F2010-SCREAM-LR-DYAMOND2, and SMS_Ln9_P24x1.ne4_ne4.FDPSCREAM-ARM97

@oksanaguba
Copy link
Contributor

I just realized we have a deficiency here -- we have no choice but to keep both options, adjust_ps and adjust_dp3d, therefore we need to print in the logs which one is active. i am (again) looking at the code instead of just reading a log file.

#endif

#ifdef CAM
adjust_ps=.true. ! For CAM runs, require forcing to stay on reference levels
#endif
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this last change is the problem for cdahs -- DYAMOND and DPSCREAM tests ( SMS_Ln5.ne30pg2_ne30pg2.F2010-SCREAM-LR-DYAMOND2.chrysalis_intel and SMS_Ln9_P24x1.ne4_ne4.FDPSCREAM-ARM97.chrysalis_intel ) both have -DCAM and -DSCREAM in ./cmake-bld/cmake/atm/CMakeFiles/atm.dir/flags.make defined. We code like only one is active. Previously v0 scream was using adjust_dp3d, but now it is switched to adjust_ps .

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to only forcing adjust_ps=.true. is CAM is defined and SCREAM is not defined.

@oksanaguba
Copy link
Contributor

Not quite sure what to do -- DYAMOND and DPSCREAM v0 tests fail because they were accidentally switched to adjust_ps . hommexx test fails because it was switched from adjust_ps to adjust_dp3d. The latter is not an issue, but v0 tests probably need to be addressed. I won't be able to do this today, maybe @tcclevenger can address this?

Separately, we need a follow up Pr that would print which option is used, in all cases -- in homme and hommexx output.

@rljacob
Copy link
Member

rljacob commented Mar 20, 2024

Its not a problem for E3SMv3 that SCREAMv0 answers change. Up to @PeterCaldwell I guess.

@oksanaguba
Copy link
Contributor

@tcclevenger please push your change to your branch and i will test now.

@tcclevenger
Copy link
Contributor Author

@oksanaguba Done. Should be good to test now.

oksanaguba added a commit that referenced this pull request Mar 20, 2024
This is an incremental change to introduce adjustment of dp3d
instead of adjustment of ps.

adjust_ps is still used for preqx (forcing was never rewritten),
rsplit=0 (only adjust_ps is possible), or EAM (to not touch every test).

Fixed bugs in forcing_ut init, testing it only
with adjust_ps=false option.

[non-BFB] for HOMME tests with baselines (with moist forcing
and some without nontrivial forcing, because even zero forcing
is processed differently for adjustment of dp3d).
@oksanaguba
Copy link
Contributor

re-merged to next after confirming that DY and DP tests now pass.

@oksanaguba oksanaguba merged commit b4ce2b2 into E3SM-Project:master Mar 21, 2024
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Atmosphere HOMME non-BFB PR makes roundoff changes to answers.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants