Restart reproducibility for S2SW (was Target improvements/changes for Prototype 5) #739

Closed
AvichalMehra-NOAA opened this issue Mar 7, 2020 · 40 comments · Fixed by #1131

@AvichalMehra-NOAA

  1. Switch mediator from NEMS to CMEPS.
  2. Swap CICE 5 with CICE 6.
  3. Include fractional grid capability.

To be completed by 08/15/20.

@AvichalMehra-NOAA AvichalMehra-NOAA self-assigned this Mar 7, 2020
@JessicaMeixner-NOAA
Collaborator

Additional details:

  1. Wave coupling will continue to go through connectors, which means atm-ocn-ice-wav restarts will not be bit reproducible; restart reproducibility will not be a target for this prototype.
  2. The CICE6 cap to be used is the one located here: https://github.com/CICE-Consortium/CICE/tree/master/cicecore/drivers/nuopc/cmeps

@DeniseWorthen
Collaborator

We have a NOAA-EMC fork of the CICE-Consortium repository, which I have been keeping up to date w/ master: NOAA-EMC/CICE

@junwang-noaa
Collaborator

@AvichalMehra-NOAA @JessicaMeixner-NOAA may I ask some questions:

  1. I saw that in benchmark4 we will run a benchmark with coupled FV3-MOM6-CICE5-WW3 using CCPP. Assuming that with WW3 the model restart is not bit reproducible, are we still going to run the 4-component FV3-MOM6-CICE5-WW3 in benchmark4, and then run the 3-component FV3-MOM6-CICE5 in benchmark5? I'm not sure what it means that atm-ocn-ice-wav will not be a target for this prototype.
  2. If I understand correctly, Mariana said CMEPS has CICE6 and fractional grid capability, so in benchmark5, are we going to use CMEPS for CICE6 coupling and the fractional grid capability?

@AvichalMehra-NOAA
Author

For (1), reproducible restarts will not be an initial focus for prototype 5. We will consider them once we have finalized the role of mediator with CMEPS for atm-ocn-ice-wav coupling.

Yes on (2).

@JessicaMeixner-NOAA
Collaborator

@junwang-noaa Benchmark 5 will include four components, FV3-MOM6-CICE6-WW3, with the CMEPS mediator used for exchanges among FV3, MOM6 and CICE6, and WW3 connecting to the other models via connectors.

@junwang-noaa
Collaborator

@JessicaMeixner-NOAA @AvichalMehra-NOAA If changing the WW3 connector to the CMEPS mediator will be included in benchmark5, who will be working on this task, and what is the timeline for this work?

@junwang-noaa junwang-noaa reopened this Jun 3, 2020
@DeniseWorthen
Collaborator

Sorry, I think it was auto-closed when I merged.

@junwang-noaa
Collaborator

According to Avichal at the coupled model meeting on 6/3/2020, changing the WW3 connector to the CMEPS mediator is not a requirement for benchmark5. We will work on it if all the required benchmark5 components are finished before the deadline; otherwise it will move to benchmark 6.

@DeniseWorthen DeniseWorthen transferred this issue from ufs-community/ufs-s2s-model Aug 6, 2021
@DeniseWorthen DeniseWorthen changed the title Target improvements/changes for Prototype 5 Restart reproducibility for S2SW (was Target improvements/changes for Prototype 5) Aug 6, 2021
@DeniseWorthen
Collaborator

DeniseWorthen commented Nov 1, 2021

Working in close collaboration w/ @mvertens at NCAR, I've been able to run a version of cpld_control_wave_p7 through the CMEPS mediator using:

  • a modified CESM-WW3 NUOPC cap using ifdef CESMCOUPLED to separate code specific to CESM.
  • a 2/3deg MOM6 tripole configuration for WW3 provided by Alper Altuntas at NCAR (@alperaltuntas).
  • import of the following fields, all mapped with nearest-source-to-destination conservative mapping
    • u10m, v10m and Tbot from the ATM
    • u, v and SST from the OCN
    • ice fraction from the ICE
  • export of the wave roughness length to the ATM (shown below after 12 hours in the coupler history):

[Screenshot, 2021-10-31: exported wave roughness length Sw_z0 in the coupler history after 12 hours]

When testing restarts, Sw_z0 does not reproduce. I attempted to add the charn field to the WW3 restart file; the model restarts, but the initial value of Sw_z0 still does not reproduce. I will continue working on this.

@junwang-noaa
Collaborator

junwang-noaa commented Nov 1, 2021 via email

@DeniseWorthen
Collaborator

I've been able to achieve restart reproducibility (6hr/6hr/12hr) in the current setup by adding conditional logic for restarts to the firstCall statement in the calculation of the export roughness length. I've also removed the additional field I was writing to the restart file, since it turned out to be unnecessary. In the current setup I am not using the WRST switch.
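To illustrate why a first-call branch can break a 6hr/6hr/12hr restart test, here is a toy sketch in Python. It is not the WW3 or CMEPS code: the model, `DEFAULT_Z0`, `CHARNOCK`, `step` and `run` are all hypothetical, and the sketch assumes the exported roughness feeds back into the model state. The point it shows is that the first call after a restart has to compute the export from the restored state, exactly as the continuous run does at that time, rather than fall back to a cold-start default.

```python
# Toy illustration only (NOT WW3 code): a "first call" branch that ignores
# restarts breaks 6h/6h/12h restart reproducibility.
DEFAULT_Z0 = 1.0e-3  # hypothetical cold-start roughness length
CHARNOCK = 0.011     # hypothetical coefficient for the toy export


def step(state, z0):
    """Advance the toy model one step; the new state depends on the exported z0."""
    return 0.9 * state + 0.5 * z0 + 0.01


def run(state, nsteps, cold_start, restart_aware):
    """Run nsteps from `state`; return the final state and the exported z0 values."""
    exports = []
    first_call = True
    for _ in range(nsteps):
        # Without restart awareness, the first call always uses the default,
        # even when the run was started from a restart.
        if first_call and (cold_start or not restart_aware):
            z0 = DEFAULT_Z0
        else:
            z0 = CHARNOCK * state
        first_call = False
        exports.append(z0)
        state = step(state, z0)
    return state, exports


def restart_test(restart_aware):
    """Compare a 12-step control with a 6-step run plus a 6-step restart run."""
    s0 = 1.0
    control, _ = run(s0, 12, cold_start=True, restart_aware=restart_aware)
    s6, _ = run(s0, 6, cold_start=True, restart_aware=restart_aware)
    restarted, _ = run(s6, 6, cold_start=False, restart_aware=restart_aware)
    return control == restarted  # bit-for-bit comparison of the final state


print("restart-aware first call reproduces:", restart_test(True))
print("naive first call reproduces:", restart_test(False))
```

With `restart_aware=False`, the restarted run exports the cold-start default on its first call, so its final state drifts away from the 12-step control.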

The bug fix for deactivated sea points in w3wavemd discussed at the coupled tag-up did not solve the restart issue.

I'll also note that for the 12h test I've been running, the current setup gives a consistent wall clock time of ~340 s, about 100 s faster than the typical wall time for the current cpld_control_wave_p7, even though the ocean resolution is higher (2/3 deg vs 1 deg).

@JessicaMeixner-NOAA
Collaborator

That's awesome @DeniseWorthen !!!!

The difference in timing might be explained by the spectral resolution rather than the geographic resolution. Also, multi has a small overhead compared to shel, but given the timings you mention, I'd suspect it's the spectral resolution, which needs to be decreased (#822).

@junwang-noaa
Collaborator

junwang-noaa commented Nov 4, 2021 via email

@DeniseWorthen
Collaborator

The test I'm running is C96mx100 for the ATM-OCN-ICE, but the WAV model runs on the 2/3deg MOM6 grid instead of the 1deg rectilinear grid used in our standard test.

@junwang-noaa
Collaborator

junwang-noaa commented Nov 4, 2021 via email

@DeniseWorthen
Collaborator

I believe I am using a spectral resolution of 25 in the CMEPS test; the cpld_control_p7_wave is currently using 50.

@aliabdolali
Collaborator

Great news @DeniseWorthen, did WRST cause a problem for restarts?

@JessicaMeixner-NOAA
Collaborator

@aliabdolali I don't think WRST is necessarily used/needed due to the different NUOPC cap & mediator

@DeniseWorthen That explains the difference. We are going to put together a PR with an updated 1 deg spectral resolution and fixes for the other bugs, so that the new resolution will be available for everyone to use; we should not have 50 there.

@JessicaMeixner-NOAA
Collaborator

Speaking of new WW3 grids, @DeniseWorthen, it's still unclear to me whether you just needed the 1 deg rectilinear WW3 grid to have fewer masked-out regions, or whether you wanted a curvilinear 1 deg WW3 grid that matches MOM6's tripole grid but stops at 80 deg N or something like that.

@DeniseWorthen
Collaborator

DeniseWorthen commented Nov 4, 2021

@aliabdolali As Jessica notes, it appears that WRST is not required for restart reproducibility when you're coupling through a mediator.

My change in the CalcRoughl subroutine did result in roughness lengths over ice fraction > 0.5 having a uniform value of 968.6, which results in strange (large) values being mapped to the ATM. I also need to look at how the ATM actually applies the roughness length near the sea ice edge.

@SMoorthi-emc Does the ATM apply some ice fraction cutoff to the values it receives?

@junwang-noaa
Collaborator

junwang-noaa commented Nov 4, 2021 via email

@DeniseWorthen
Collaborator

DeniseWorthen commented Nov 4, 2021

@JessicaMeixner-NOAA As to the grids, I think it is up to how the coupled group wishes to validate WW3 coupling through the mediator vs connectors. Ultimately I think it makes the most sense to run waves on the same grid as the ocean and ice. But since we're currently using a rectilinear grid for waves (w/ the connectors), is that an acceptable difference for validation? Or would you prefer being able to do a "clean" validation where the only difference was connector vs mediator?

@DeniseWorthen
Collaborator

@junwang-noaa Thanks for the explanation on zorl. I guess that explains why it didn't blow up w/ the large mapped values.

@mvertens

mvertens commented Nov 4, 2021 via email

@DeniseWorthen
Collaborator

I've updated my test configuration to run on the MOM6 1-deg tripole grid and turned on export of the Stokes drift partition fields to MOM6. The 6h/6h/12h restart test still passes, except for 6 points where the roughness length passed to the ATM does not reproduce. Those 6 points, however, are "out of range" for what the ATM uses (they are ~1000). Since they are not used by the ATM, restart reproducibility is maintained.
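The reasoning above, that a mismatch only matters if the receiving model actually uses the value, can be written as a small comparison sketch. It assumes the ATM ignores any roughness value above some cutoff `valid_max`; the actual cutoff and how the ATM applies it are not stated in this thread, and the function name, cutoff and sample values below are hypothetical (only 968.6 comes from the discussion).

```python
def nonreproducing_points(control, restart, valid_max):
    """Split mismatching points into those the receiving model would use and
    those it ignores because both runs exported an out-of-range value."""
    if len(control) != len(restart):
        raise ValueError("fields must have the same number of points")
    used, ignored = [], []
    for i, (a, b) in enumerate(zip(control, restart)):
        if a == b:
            continue  # bit-for-bit identical at this point
        if a > valid_max and b > valid_max:
            ignored.append(i)  # both values lie outside the accepted range
        else:
            used.append(i)  # a genuine reproducibility failure
    return used, ignored


# Hypothetical sample: one mismatch, with both values ~1000 (out of range).
control = [0.0012, 0.0008, 968.6, 0.0031]
restart = [0.0012, 0.0008, 1001.2, 0.0031]
used, ignored = nonreproducing_points(control, restart, valid_max=1.0)
print("used mismatches:", used, "ignored mismatches:", ignored)
```

A point is only treated as ignorable when both runs are out of range; if one run exports a usable value and the other does not, the ATM would see different inputs, so that counts as a real failure.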

Using the same 1-deg MOM6 tripole grid, I also ran the current connector version after enabling the export field writing. The following figures show the export fields from WAV at the end of 6 hours (not all exported fields are shown):

[Figures: wavImp_Sw_z0, wavImp_Sw_vstokes3, wavImp_Sw_ustokes3]

@DeniseWorthen
Collaborator

DeniseWorthen commented Nov 29, 2021

@aliabdolali @JessicaMeixner-NOAA I noticed some behaviour when testing my latest code updates. I then ran cpld_control_p7 from the current develop branch and noticed the same thing (I turned on the field dumping in wmesmf). Basically, the stokes3 components (e.g., y3pstk) are now very, very small (~1e-13) everywhere. Is there an explanation for this?

@JessicaMeixner-NOAA
Collaborator

Did your latest code updates update WW3? If so what were the before/after differences and code versions? I have not noticed this but haven't printed out the values in a while either.

@DeniseWorthen
Collaborator

DeniseWorthen commented Nov 29, 2021

My code is behaving the same as the current develop branch, using the cpld_control_p7 test. A run directory is here:

/scratch1/NCEPDEV/stmp2/Denise.Worthen/FV3_RT/rt_9619/cpld_control_p7

@JessicaMeixner-NOAA
Collaborator

@DeniseWorthen Okay, I double-checked what I suspected and confirmed it's just the lack of ICs. I checked this by running the benchmark regression test twice:

Without wave ICs: /scratch1/NCEPDEV/stmp2/Jessica.Meixner/FV3_RT/rt_239203/cpld_bmark_p7
with wave ICs: /scratch1/NCEPDEV/stmp2/Jessica.Meixner/FV3_RT/rt_73541/cpld_bmark_p7

The run with ICs has much higher values of the partitioned Stokes drift, so I think everything you are seeing is normal. If you want me to create an IC for your testing, just send me the ww3_grid.inp file and an IC date of your choice; it'll take me about a day or so.

@DeniseWorthen
Collaborator

Hm. I thought I remembered you telling me that the wave model spins up really quickly, so I ran the cpld_control_p7 case out for 5 days. This is the mean value (including all the land points) of the y1 (black), y2 (red) and y3 (green) fields:

[Screenshot, 2021-11-29: 5-day time series of the mean y1 (black), y2 (red) and y3 (green) Stokes drift partitions]

Why isn't the y3 field spinning up?

@JessicaMeixner-NOAA
Collaborator

@DeniseWorthen Wind seas spin up much more quickly than swells. Also, I think this is more of an issue of our having reduced the frequency space for speed of the tests, and the three bands are based on frequency. If I increase the number of frequencies, we get more in the 3rd partition. See my test here, which uses more frequency and direction points: /scratch1/NCEPDEV/stmp2/Jessica.Meixner/FV3_RT/rt_5412/cpld_control_p7

@DeniseWorthen
Collaborator

Thanks, that makes more sense. I understand about reducing the number of frequencies to speed up the tests.

I want to understand more what is happening though. If I read the "option 2" description in CALC_U3STOKES and look at the code, I see that partitions don't need to match the spectral freq grid of WW3, that it will essentially "bin" the contributions into the three partitions given.

So when you reduce the number of frequencies, does it truncate the spectrum rather than just making each "band" in the spectrum wider?

In other words, in the glo_1deg grid.inp, I see where it sets "freq increment factor, first freq and number of frequencies". If I look in the older glo_1deg inp file in the input data, I see

1.07 0.035 50 36 0.5

and the newer one I see

1.07 0.035 25 24 0.5

So the "increment" is the same, but the spectrum only extends out half as far (25 vs 50). So the third partition ends up being empty. Is that how it works? I'm sure I'm getting some of the terminology wrong, hopefully I'm making sense.

@JessicaMeixner-NOAA
Collaborator

Yes, you can see the description of the variables here:
https://github.com/NOAA-EMC/WW3/blob/develop/model/inp/ww3_grid.inp#L8-L15

So the first frequency is 0.035, the second frequency is 0.035*1.07, and so on. The Stokes frequency bands were chosen based on the standard frequency grid, so they are not perfect for this, but I don't think we should change them right now either, especially as we do not have a consistent way to deal with the various values across the different places.
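The exchange above can be checked numerically: with a fixed increment factor, cutting the number of frequencies from 50 to 25 lowers the highest resolved frequency instead of widening each band. The Python sketch below builds the geometric grid f_n = f1 * xfr^(n-1) from the two ww3_grid.inp lines quoted earlier and counts how many frequency bins land in each of three bands. Two parts are assumptions, not WW3's actual Stokes partition settings: the deep-water dispersion relation used to convert frequency to wavenumber, and the band edges in `K_EDGES`, which are hypothetical values chosen for illustration. It also prints NK*NTH, the number of spectral points per grid cell, which relates to the timing difference discussed earlier.

```python
import math

G = 9.81  # gravitational acceleration (m/s^2)
K_EDGES = [0.075, 0.2]  # hypothetical interior band edges (rad/m), 3 bands


def frequencies(xfr, f1, nk):
    """Geometric frequency grid: f_n = f1 * xfr**(n-1) for n = 1..nk (Hz)."""
    return [f1 * xfr ** n for n in range(nk)]


def deep_water_k(f):
    """Wavenumber from the deep-water dispersion relation (assumption): k = (2*pi*f)**2 / g."""
    return (2.0 * math.pi * f) ** 2 / G


def count_per_band(freqs, k_edges):
    """Count frequency bins whose wavenumber falls in each band defined by k_edges."""
    counts = [0] * (len(k_edges) + 1)
    for f in freqs:
        k = deep_water_k(f)
        band = sum(1 for edge in k_edges if k >= edge)  # index of the band containing k
        counts[band] += 1
    return counts


# (number of frequencies, number of directions) from the two grid.inp lines
for nk, nth in [(50, 36), (25, 24)]:
    fr = frequencies(1.07, 0.035, nk)
    print(f"NK={nk}: f_max={fr[-1]:.3f} Hz, k_max={deep_water_k(fr[-1]):.3f} rad/m, "
          f"bins per band={count_per_band(fr, K_EDGES)}, spectral points={nk * nth}")
```

Under these assumptions, the 25-frequency grid stops near 0.18 Hz, so nothing falls in the highest band, whereas the 50-frequency grid extends close to 1 Hz.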

@DeniseWorthen
Collaborator

I have built wave-on versions of the c96, c192 and c384 regression control/restart tests, where in each case waves run on the MOM6 tripole grid (1 deg, 1/2 deg and 1/4 deg). All control/restart tests pass if the WW3 restart file itself is not included in the file comparison.

However, for the mx050 and mx025 case, the wave restart file itself does not reproduce in the restart run (but all other restart files do reproduce). I believe something similar is noted in this issue thread. My branch is up-to-date with the current dev/ufs-weather-model branch of WW3.

I also did a memory profiling test for the c96mx100 case. This test can be compared to the one in Discussion 779, though it is not a strictly 'apples-to-apples' comparison.

[Screenshot, 2021-12-01: memory profile for the c96mx100 case]

@JessicaMeixner-NOAA
Collaborator

Remove the "WRST" line (https://github.com/NOAA-EMC/WW3/blob/develop/model/esmf/switch#L9) in the switch file WW3/model/esmf/switch, and you should also be able to compare WW3 restarts successfully. It's a known issue, which I will explain later; the feature that WRST brings should not be needed when coupled via CMEPS.
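A minimal sketch of that edit in Python, assuming the switch file is plain text made of whitespace-separated switch names, possibly spread over several lines. The function name `remove_switch` and the sample switch strings are hypothetical; only the switch name WRST comes from the thread.

```python
def remove_switch(switch_text, name="WRST"):
    """Return switch_text with every whitespace-separated token equal to `name` removed.

    Matching is exact per token, so a longer switch that merely contains
    `name` as a substring is left alone. A line that held only the removed
    switch is dropped entirely; originally blank lines are kept.
    """
    out = []
    for line in switch_text.splitlines():
        tokens = line.split()
        kept = [t for t in tokens if t != name]
        if tokens and not kept:
            continue  # the line contained only the removed switch
        out.append(" ".join(kept) if tokens else line)
    return "\n".join(out) + "\n"


# Hypothetical single-line switch file contents
print(remove_switch("NOGRB DIST MPI WRST O0 O1\n"), end="")
```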

@DeniseWorthen
Collaborator

I am not using the WRST switch.

@JessicaMeixner-NOAA
Collaborator

Is it just the first restart file during the second restart run or all of the wave restart files?

@DeniseWorthen
Collaborator

Right now, wave restarts are written every 3 hours in my test setup. For the c192 setup, the restart test produces wave restarts at 20210322.18, 20210322.21, 20210323.00, etc. All of them differ from the same files in the control run.

So the first restart that the restart run writes at startup (20210322.180000.restart.ww3), which should be identical to the restart.ww3 file it is using from the control, is actually different.
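A comparison like the one described can be scripted. The Python sketch below generates the expected WW3 restart file names (the YYYYMMDD.HHMMSS.restart.ww3 pattern is taken from the file name above) at a fixed interval and reports which ones differ byte-for-byte between a control run directory and a restart run directory. The function names and the directory layout are hypothetical; `filecmp.cmp(..., shallow=False)` from the Python standard library does the byte comparison.

```python
import filecmp
import os
from datetime import datetime, timedelta


def restart_names(start, interval_hours, count):
    """Restart file names of the form YYYYMMDD.HHMMSS.restart.ww3."""
    return [
        (start + timedelta(hours=i * interval_hours)).strftime("%Y%m%d.%H%M%S")
        + ".restart.ww3"
        for i in range(count)
    ]


def differing_restarts(control_dir, restart_dir, names):
    """Names whose files are missing in either run or differ byte-for-byte."""
    bad = []
    for name in names:
        a = os.path.join(control_dir, name)
        b = os.path.join(restart_dir, name)
        if not (os.path.isfile(a) and os.path.isfile(b)):
            bad.append(name)  # missing in one of the runs
        elif not filecmp.cmp(a, b, shallow=False):
            bad.append(name)  # contents differ
    return bad


# Restart run starting at 2021-03-22 18Z with 3-hourly wave restarts
names = restart_names(datetime(2021, 3, 22, 18), 3, 3)
print(names)
```

Usage would be something like `differing_restarts(control_rundir, restart_rundir, names)`, where both run directories are whatever paths the regression test used.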

@JessicaMeixner-NOAA
Collaborator

The first restart being different (i.e., 20210322.180000.restart.ww3 vs restart.ww3) wouldn't surprise me too much, although I thought removing the WRST switch was sufficient. The other ones being different does surprise me a little, but it reminds me of a similar recent issue, NOAA-EMC/WW3#452, where the restart files do not reproduce but the answers do. I'm still looking into that issue but haven't figured anything out yet.

6 participants