-
Notifications
You must be signed in to change notification settings - Fork 119
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[develop] Integrate 'online-cmaq' into 'develop' branch: Part 2, build script and module files #549
[develop] Integrate 'online-cmaq' into 'develop' branch: Part 2, build script and module files #549
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chan-hoo From the Contributor's Guide, all externals live in a single Externals.cfg
file. From your detailed write-up, I understand why you added Externals_AQM.cfg
(different hashes between SRW App
and Online-CMAQ
), but this goes against the current coding standards. I would like to hear from others to see if they are okay with this before approving myself.
Having said that, I was able to successfully clone your fork's feature/aqm_part2 branch, checkout the necessary components and build the Online-CMAQ
using the instructions that you provided, which completes the work associated with this PR.
@MichaelLueken, thank you for the comment. I understand. Practically, the Online-CMAQ and the current develop branch (FV3-LAM) only share 'UFS-UTILS' in Externals.cfg. The hash of the ufs weather model in the develop branch is not up-to-date mostly. The major issue when we try to update the hash of the ufs weather model in Externals.cfg is that we (EMC) don't have access to other HPC platforms. So it will take long to update the hash. In addition, we don't want to check out 'upp', 'rrfs-utls', and 'gsi' for Online-CMAQ. I think this will be the same to FV3-LAM. The users of FV3-LAM would not want to check out the components only for Online-CMAQ. I agree with you. We should hear from others. |
@chan-hoo If you decide to go the route of one Externals.cfg file, you can check out the components you need by listing them after the checkout_externals command, where all the components have lowercase names. I don't have an opinion on which route is better. |
@danielabdi-noaa, Thanks for the comment. In my opinion, one Externals.cfg file for one app is better for the reason I described above (maintenance at EMC). For example, 'Externals.cfg' for FV3-LAM (or RRFS) and 'Externals_AQM.cfg' for Online-CMAQ. If other applications are incorporated in the ufs srweather app in the future, I think, this approach will be more convenient. However, I'll follow the majority decision :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not so concerned about the use of an additional Externals.cfg
file as long as it is for the right reasons. In this case, one of the stated reasons was to circumvent basic development processes because testing updating of ufs-weather-model will take too long. That is not an acceptable justification. The required updates need to be made to make it all work with develop
. The objection to checking out components that aren't needed does make sense, however.
Another concern I have is that the build process is becoming too complex. There is a comment in the devbuild.sh
script that says:
NOTE: This script is for internal developer use only;
But that is now a lie. I would like to see the sequence of commands needed to build the online-cmaq
without using devbuild.sh
because devbuild.sh
was originally meant only to be a "convenience" script that experts could use to short-cut the process and make their work go faster. But, now, that appears no longer to be the case. One of the most difficult and insidious problems of modeling systems infrastructure developed over the past 20 years was the very non-portable and incomprehensible build systems that were developed. That has largely been mitigated in the current UFS and I would urge us to resist adding anything that takes the UFS backwards in that arena. What I'm saying is, it's starting to get to the point where a user can't possibly build the UFS without a very complicated, hard-to-understand (for an end-user), script with ever growing maintenance costs. It should be possible, without too much difficulty to build CMAQ without devbuild.sh
and I'd like to see how that is done.
@christopherwharrop-noaa, I agree with you in principle. If the hash of the ufs weather model is updated with the latest hash of the authoritative repository, we won't have to maintain it separately. However, as you may know, the hash of the ufs weather model has not been updated frequently as expected. This is always a big issue on our side. I described it above from the practical point of view. I would like to hear from others if they also think this is not an acceptable justification to them either :) I thought it was very acceptable justification to me for having a separate Externals.cfg file for Online-CMAQ. |
I wanted to check where we are with this PR. My understanding is that it is waiting for a PR to the weather model that will move to a different HPC-Stack location that will allow the App to also use the same stack location? Is that correct ? Is there a time line on that? |
@arunchawla-NOAA - Yes, you are correct. Applications are not supposed to move ahead from the ufs-weather-model with respect to HPC-stack modules. The ufs-weather-model repo has opened PR #1596 to bring in the EPIC AUS team's new HPC-stack locations for Tier-1 machines. I'm ready to assist @chan-hoo with updating this PR with the new stack locations once this PR has been opened and merged. Unfortunately, due to issues encountered with the regression testing on Gaea and Orion, I have no ETA on when the weather model's PR will be moved to ready to review and ultimately merged. |
A note for posterity: this branch was also affected by the rebase to remove large files introduced in PR #556. If you are working on a local copy of this repository it may be necessary to re-download a fresh copy. Please contact me if you have any questions about this. |
what are the issues on regression testing on Gaea and Orion ? |
Waiting on @natalie-perlin comments at ufs-community/ufs-weather-model#1596. |
@arunchawla-NOAA - Looking at the conversations taking place in the WM's PR, it looks like the WM is failing to build with the following message on Orion:
and on Gaea:
|
@arunchawla-NOAA We're currently working to test the new HPC-Stack locations, using the RT suite, and tracking communication/progress in this issue created by Natalie.: #1465 The cmake related error experienced on Orion (referred to above by Mike) was also experienced on Jet. The stack has recently been rebuilt by AUS on Jet to resolve this issue, and I'm re-testing on Jet now. AUS is currently working to rebuild the stack as well on Orion/Gaea. |
… into feature/aqm_part2
…still submitting in Jenkins testing.
@chan-hoo The weather model updated their modulefiles earlier today. I have pushed updated modulefiles to this PR. I will now kick off the Jenkins tests once again. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chan-hoo Finally, I have some very good news! Following my update of the modulefiles, the rest of the Jenkins tests are now passing! Please note that the Hera tests are still associated with this PR's build on Jenkins, so the abort in the details is due to needing to abort the Hera tests.
I have one minor request in devbuild.sh
- if you would rather not move the highlighted section, that's fine, but I think that the Check out external components
does need to be renamed, because that is no longer happening within devbuild.sh
. Additionally, since wgrib2
is loaded in modulefiles/srw_common.lua
, please remove the entry from modulefiles/build_hera_intel.lua
.
Once you have made the changes, I'll be ready to give my approval to these changes.
Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chan-hoo Thanks for making the requested changes! Approved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me!
Hi @christopherwharrop-noaa - |
Thanks, @MichaelLueken , @danielabdi-noaa !! |
For additional discussion on Externals.cfg, please see and discuss further in #602. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for taking the time to address all the comments.
@MichaelLueken - I'm good with moving forward at this point. Thank you for asking and helping to get things synced up. |
Machine: hera |
The devbuild.sh is unable to build the rrfs_utils app even if it is being called from the command line like in: ./devbuild.sh -p=<platform> all or ./devbuild.sh -p=<platform> gsi rrfs_utils. After some investigation it appears the -DBUILD_RRFS_UTILS option was accidentally removed from the cmake settings in PR #549. Adding this line back in causes the rrfs_utils app to build again. Co-authored-by: Edward Snyder <[email protected]>
DESCRIPTION OF CHANGES:
Currently, the Online-CMAQ for air quality modeling is separately managed in the 'online-cmaq' branch of the authoritative UFS SRW App repository. The main target of the current 'online-cmaq' branch is the implementation of AQM ver.7 and delivery to NCO in late March. Therefore, it has been developed and tested on WCOSS2 and Hera. The Online-CMAQ checks out the same ufs weather model as the 'develop' branch. However, since it couples another component 'AQM', the compile process is different from that of the 'develop' branch.
This 'online-cmaq' branch has beeen integrating into the 'develop' branch with PR #536, this PR, and the following PR. For the convenience of the reviewers, we split the 'online-cmaq' branch into three parts as follows:
Part 1: j-job and ex- scripts (PR #536)
Part 2: build scripts and module files (this PR)
Part 3: workflow set-up, generation, run scripts
For Online-CMAQ, a separate Externals.cfg file named
Externals_AQM.cfg
is introduced and it is automatically read by runningdevbuild.sh
with the argument-a=ATMAQ
. This is because Online-CMAQ may use a different hash of the ufs weather model unliess its hash in 'Externals.cfg' is always up to date.As described above, in case of different hashes between 'Externals.cfg' and 'Externals_AQM.cfg', the versions of the modules would not be the same. Online-CMAQ has been developed and tested on wcoss2 and Hera. To avoid any conflicts on other HPC platforms, the versions of the modules are separately controlled by the version file `build.ver.[platform] in the 'versions' directory.
This PR should not affect any existing configuration and test cases.
Type of change
TESTS CONDUCTED:
WE2E tests on wcoss2:
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_HRRR_suite_RRFS_v1beta
MET_verification
community_ensemble_2mems_stoch
hera.intel
orion.intel
cheyenne.intel
cheyenne.gnu
gaea.intel
jet.intel
wcoss2.intel
NOAA Cloud (indicate which platform)
Jenkins
fundamental test suite
comprehensive tests (specify which if a subset was used)
ISSUE:
Fixes issue mentioned in #534.
CHECKLIST