Implement ESMF-managed threading for all coupled components #824
Comments
@junwang-noaa there is a build of ESMF 820bs20 available on Hera for testing.
If you cannot use these modules directly, you should be able to override existing modules for testing on Hera by setting this and leaving the rest of the modules unchanged:
Can you please use this to run the RTs against this snapshot of ESMF? |
|
Apologies @DusanJovic-NOAA. Can @mark-a-potts please help? |
Something is wrong with the modulefile:
I should have hpc-stack modules loaded first but even then |
Weird. When I do a module show, /opt/modules is not there. The module file does contain this, though: `local pkgName = myModuleName() local hierA = hierarchyA(pkgNameVer,2) conflict(pkgName) local opt = os.getenv("HPC_OPT") or os.getenv("OPT") or "/opt/modules" local base = pathJoin(opt,compNameVerD,mpiNameVerD,pkgName,pkgVersion) prepend_path("PATH", pathJoin(base,"bin")) setenv( "ESMF_ROOT", base)` Do you have HPC_OPT or OPT defined in your environment? |
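For anyone checking their own environment against that fallback, here is a minimal shell sketch of the same lookup order (HPC_OPT, then OPT, then /opt/modules). The function name `resolve_opt_base` is made up for illustration and is not part of hpc-stack or Lmod.

```shell
# Hypothetical helper: prints the base directory the modulefile's
# fallback chain would select in the current environment.
resolve_opt_base() {
    # Mirrors: os.getenv("HPC_OPT") or os.getenv("OPT") or "/opt/modules"
    # The '-' form (no colon) falls back only when the variable is unset,
    # matching Lua, where an empty string still counts as a value.
    printf '%s\n' "${HPC_OPT-${OPT-/opt/modules}}"
}
```

Running `resolve_opt_base` after loading the stack shows which of the three locations ends up as the prefix of ESMF_ROOT.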
In
After I load ufs_hera.intel I see:
|
Looks like combining modules from two different stacks does not work. |
If you only want to use the new version of ESMF from my install, you can unload the esmf module (or comment out the load in ufs_common) and then set ESMFMKFILE to point to my install: /scratch2/NCEPDEV/nwprod/hpc-stack/libs/hpc-stack/scratch1-NCEPDEV-da-Mark.Potts-sandbox-hpc-modules-modulefiles/mpi/intel/18.0.5.274/impi/2018.0.4/esmf/8_2_0_beta_snapshot_20/lib/esmf.mk
I recently got rt.sh to find the right library by doing that and adding "-DESMFMKFILE=/scratch2/NCEPDEV/nwprod/hpc-stack/libs/hpc-stack/scratch1-NCEPDEV-da-Mark.Potts-sandbox-hpc-modules-modulefiles/mpi/intel/18.0.5.274/impi/2018.0.4/esmf/8_2_0_beta_snapshot_20/lib/esmf.mk" to the cmake options in rt.conf. |
That file (/scratch2/NCEPDEV/nwprod/hpc-stack/libs/hpc-stack/scratch1-NCEPDEV-da-Mark.Potts-sandbox-hpc-modules-modulefiles/mpi/intel/18.0.5.274/impi/2018.0.4/esmf/8_2_0_beta_snapshot_20/lib/esmf.mk) does not exist. See my comment above. Was your compilation successful? |
Sorry, I pasted in the wrong path. Use this instead: /scratch1/NCEPDEV/da/Mark.Potts/sandbox/hpc-modules/intel-18.0.5.274/impi-2018.0.4/esmf/8_2_0_beta_snapshot_20/lib/esmf.mk
I was able to successfully compile and run the cpld_bmark_wave_v16 test with rt.sh, but that was the only one I tried. |
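Since a wrong path is easy to paste, a small guard can confirm that esmf.mk exists before the flag goes into rt.conf. This is only a sketch: `esmfmkfile_flag` is a hypothetical helper, not something rt.sh provides, and it only formats the `-DESMFMKFILE=` option mentioned above.

```shell
# Hypothetical helper: validates an esmf.mk path and prints the
# matching cmake option; it is not part of rt.sh or the UFS build.
esmfmkfile_flag() {
    mkfile=$1
    # Refuse to emit a flag for a file that is not there.
    if [ ! -f "$mkfile" ]; then
        echo "esmf.mk not found: $mkfile" >&2
        return 1
    fi
    printf '%s%s\n' '-DESMFMKFILE=' "$mkfile"
}
```

Calling it with the esmf.mk path prints the option to add to the cmake options, or fails with a message on stderr if the file is missing.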
I added:
to modulefiles/ufs_hera.intel, removed the load of the old esmf from ufs_common, and ran the control test. In the compile log I see:
and compilation was successful, but the run failed with these errors:
Obviously something in my environment is different. |
Hmm. I was able to get the first cpld_control_wave_p7 to run, but then it failed on the restart. You can check out how I have things set up here: /scratch1/NCEPDEV/da/Mark.Potts/sandbox/tmp/ufs-weather-model
What happened to the cpld_bmark_wave_v16 test? |
Mark, the coupled tests with wave do not have restart reproducibility.
Please run cpld_control_p7 (without wave) for the restart test.
|
@mark-a-potts With today's commit, the coupled regression tests have all been updated to the P7 configuration. The old cpld_bmark_wave_v16 test is now called cpld_bmark_p7. |
@DusanJovic-NOAA are you able to build now using the updated ESMF path from Mark? |
Yes. The compilation was successful, but the model crashed; see the runtime error above. |
@DusanJovic-NOAA that looks like a linking error, so it is probably still related to the build itself. We need a clear approach for updating the version of ESMF for testing, given an existing HPC stack. I don't understand why @mark-a-potts was able to get the test to run but Dusan could not. |
I think in the RT test, we did module purge first. So the environment
should be clean. Mark can you show us how you run the cpld_control_wave_p7
case? In your code directory I only see:
/scratch1/NCEPDEV/da/Mark.Potts/sandbox/tmp/ufs-weather-model
modified: ../modulefiles/ufs_common
modified: RegressionTests_hera.intel.log
modified: fv3_conf/fv3_slurm.IN_hera
***@***.*** tests]$ git diff ../modulefiles/ufs_common
diff --git a/modulefiles/ufs_common b/modulefiles/ufs_common
index 49bdf59..81d0af7 100644
-module load esmf/8_2_0_beta_snapshot_14
+#module load esmf/8_2_0_beta_snapshot_14
Then I see you have:
COMPILE | -DAPP=S2SW -DCCPP_SUITES=FV3_GFS_2017_coupled,FV3_GFS_v16_coupled,FV3_GFS_v16_coupled_nsstNoahmpUGWPv1 -DESMFMKFILE=/scratch1/NCEPDEV/da/Mark.Potts/sandbox/hpc-modules/intel-18.0.5.274/impi-2018.0.4/esmf/8_3_0_beta_snapshot_00/lib/esmf.mk | - wcoss_cray wcoss2 | fv3 |
RUN | cpld_control_wave_p7 | - wcoss_cray wcoss2 | fv3 |
I am not sure if the module file will be updated in the RT test directory.
Can you run the test case through the RT script with the new esmf lib added
in ufs_common? Thanks.
|
A small update is needed in HYCOM before updating to ESMF8bs20. |
@junwang-noaa @DusanJovic-NOAA
See logs here: This requires merging this HYCOM PR first: Please let me know what else is needed to update ufs-weather-model to ESMF 8.2.0bs20. For further testing, you can use this build of ESMF on Hera:
|
@rsdunlapiv Thanks for the testing. Is the HYCOM PR backward compatible? Do we need to commit the HYCOM PR along with the ESMF update? |
The HYCOM PR is backward compatible, so it can go in first with the current
version of ESMF.
|
We decided to go ahead and update to the official release ESMF_8_2_0 instead of the 820bs20. |
Description
To achieve optimal performance of coupled UFS applications, the number of threads needs to be tuned separately for each component.
Solution
ESMF recently introduced flexible threading options that allow each component model to independently set its own threading level. This was discussed at a recent UFS/CMEPS call (see slides).
UFS will need to first be updated to ESMF 820bs20+.
@theurich and @mark-a-potts have started this work on branches.
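To illustrate what tuning threads per component means in terms of resources, here is a rough sketch of the bookkeeping only, assuming the usual hybrid MPI/OpenMP relation (tasks = cores / threads). It does not use any ESMF API; the helper name `tasks_for` and the example numbers are hypothetical.

```shell
# Hypothetical helper: number of tasks a component would run with
# when given a fixed set of cores and a per-component thread count.
tasks_for() {
    cores=$1
    threads=$2
    # Reject thread counts that are non-positive or do not divide the cores evenly.
    if [ "$threads" -le 0 ] || [ $((cores % threads)) -ne 0 ]; then
        echo "cores ($cores) must be a multiple of a positive thread count ($threads)" >&2
        return 1
    fi
    echo $((cores / threads))
}
```

For example, `tasks_for 120 2` prints 60, while `tasks_for 120 1` prints 120, so two components sharing the same nodes could end up with different task counts depending on their chosen threading level.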
Alternatives
There are some options for machine-specific threading layouts. However, these are not portable between machines and do not support setting per-component threading levels when the components are running on the same nodes.
Related to
Depends on: NOAA-EMC/HYCOM-src#1