
intel.yml CI workflow must change to support new distribution method, new LLVM compilers #1156

Open
edwardhartnett opened this issue Dec 26, 2023 · 12 comments
Labels
bug Something isn't working

Comments

@edwardhartnett
Contributor

Describe the bug
Intel has changed the way it distributes its free compilers. The current Intel CI only works because the compiler installation is cached in the CI build; as soon as the cache keys change and a fresh install is attempted, the build will fail.

To Reproduce
Create a new branch and change the cache keys in the intel.yml build.

Expected behavior
We need the Intel build to work, of course.

Intel has also introduced a new set of LLVM-based compilers, which we should use in CI builds as well.

@edwardhartnett edwardhartnett added the bug Something isn't working label Dec 26, 2023
@edwardhartnett
Contributor Author

@AlexanderRichert-NOAA and @MatthewMasarik-NOAA when I switch from ubuntu-20.04 to ubuntu-latest, the build fails. Do you know why? See https://github.com/edwardhartnett/WW3/actions/runs/7330891012/job/19962722122.

@MatthewMasarik-NOAA
Collaborator

@AlexanderRichert-NOAA and @MatthewMasarik-NOAA when I switch from ubuntu-20.04 to ubuntu-latest, the build fails. Do you know why? See https://github.com/edwardhartnett/WW3/actions/runs/7330891012/job/19962722122.

Hi @edwardhartnett, sorry, I don't know why, but I can say that for the WW3 Intel workflow we moved from ubuntu-latest back to ubuntu-20.04 a while ago to fix CI crashes, while the GNU CI workflow is still on ubuntu-latest. I tried to get Intel and GNU onto the same version, either one, but ultimately couldn't, so they are currently split.

@edwardhartnett
Contributor Author

OK, we definitely want to learn what is causing the problem between ubuntu-20.04 and ubuntu-latest. That's exactly the kind of thing that causes portability issues.

We also want to add testing with the new Intel compilers: https://www.intel.com/content/www/us/en/developer/articles/technical/getting-to-know-llvm-based-oneapi-compilers.html#gs.2oksgd. @AlexanderRichert-NOAA did this for NCEPLIBS but I can't figure out how to do it for WW3. When he gets back from leave we can ask for help.

I also see that the oneAPI compilers can be installed with spack: https://spack.readthedocs.io/en/latest/build_systems/inteloneapipackage.html. This is probably how we want to do things, but when I tried, everything broke. ;-)

@aidanheerdegen

As you're using spack to install your dependencies, is there a reason you're not also building WW3 with spack? We (https://github.com/ACCESS-NRI) will want to utilise spack builds for WW3 in the future, so would certainly be interested in collaborating on this capability.

By utilising spack you can leverage its ability to determine and build all dependencies automatically, and so create model-agnostic CI workflows. For example, this is what we've done:

https://github.com/ACCESS-NRI/build-ci

See the in-depth developer docs for more detail.

One goal with this approach was to be less sensitive to changes in the GitHub runner software stack, and control the build environment as much as possible.

We're still in the early stages, but we're happy with how it is performing so far.

@AlexanderRichert-NOAA
Contributor

@edwardhartnett see https://github.com/AlexanderRichert-NOAA/WW3/blob/intel_ci_fix_jan24/.github/workflows/intel.yml for fixes (see CI output here). In short, in addition to updating to ubuntu-latest, I am

  • relocating /usr/local because there are stray copies of glibc under there (for Android development and such) that cmake picks up on, and it breaks everything 😠😠😠;
  • making it use the apt-installed intel-oneapi-mpi (I don't think there's any benefit to installing it through Spack, and I think for some reason the Spack-installed one was causing the issue you were seeing, though I don't know why); and
  • installing cmake in the "build" job (which is needed because currently it's using the one under the now-disabled /usr/local).

@MatthewMasarik-NOAA
Collaborator

@edwardhartnett see https://github.com/AlexanderRichert-NOAA/WW3/blob/intel_ci_fix_jan24/.github/workflows/intel.yml for fixes (see CI output here). In short, in addition to updating to ubuntu-latest, I am

* relocating /usr/local because there are stray copies of glibc under there (for Android development and such) that cmake picks up on, and it breaks everything 😠😠😠;

* making it use the apt-installed intel-oneapi-mpi (I don't think there's any benefit to installing it through Spack, and I think for some reason the Spack-installed one was causing the issue you were seeing, though I don't know why); and

* installing cmake in the "build" job (which is needed because currently it's using the one under the now-disabled /usr/local).

Thank you much, @AlexanderRichert-NOAA. We definitely appreciate your work in sorting out what was going on under the hood here.

@MatthewMasarik-NOAA
Collaborator

Hi @AlexanderRichert-NOAA, I wanted to check in with you on your fix. Did you want to submit a PR with your branch, or would you prefer us to submit one? Either is fine; I just didn't want to act before confirming your plans.

@AlexanderRichert-NOAA
Contributor

I went ahead and created a PR with my updates (it's current with develop and the CI tests pass): #1161

@MatthewMasarik-NOAA
Collaborator

As you're using spack to install your dependencies, is there a reason you're not also building WW3 with spack? We (https://github.com/ACCESS-NRI) will want to utilise spack builds for WW3 in the future, so would certainly be interested in collaborating on this capability.

By utilising spack you can leverage its ability to determine and build all dependencies automatically, and so create model-agnostic CI workflows. For example, this is what we've done:

https://github.com/ACCESS-NRI/build-ci

See the in-depth developer docs for more detail.

One goal with this approach was to be less sensitive to changes in the GitHub runner software stack, and control the build environment as much as possible.

We're still in the early stages, but we're happy with how it is performing so far.

Hi @aidanheerdegen, thanks for your interest and for pointing us to your spack CI work. We moved to a cmake-based build and have started using spack for our dependencies somewhat recently. I'll admit I'm just getting up to speed with spack, so I'm probably not well-versed enough yet to answer your question directly. Though I did check out the approach you're using in ACCESS-NRI/spack-packages, mom5 in particular. It looks like the Mom5 class creates and then builds from a Makefile. Since cmake automates the 3 stages corresponding to your phases routines (edit, build, install), is there a reason for choosing traditional make over cmake?
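For anyone else getting up to speed on spack packaging like I am, a phases-based recipe looks roughly like the generic sketch below. This is not the actual Mom5 package; the class name, URLs, and branch are placeholders.

```python
# Generic illustration of a phases-based (edit/build/install) Spack recipe.
# Not the ACCESS-NRI Mom5 package; all names and URLs are placeholders.
from spack.package import *


class Mymodel(MakefilePackage):
    """Hypothetical Makefile-built model, shown only to illustrate the phases."""

    homepage = "https://example.org/mymodel"    # placeholder
    git = "https://example.org/mymodel.git"     # placeholder

    version("main", branch="main")

    def edit(self, spec, prefix):
        # Phase 1 ("edit"): generate or patch the Makefile for this compiler/spec.
        makefile = FileFilter("Makefile")
        makefile.filter(r"^FC\s*=.*", "FC = {0}".format(spack_fc))

    # Phase 2 ("build") defaults to running `make`;
    # phase 3 ("install") defaults to `make install` into the package prefix.
```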

@MatthewMasarik-NOAA
Collaborator

And actually, if you're fine with having cmake/3.19 as a dependency, then you could really easily create a WW3 class similar to the Mom5(Makefile) class. Since the WW3 cmake build system is set up, you don't need to write a Makefile; you can just make use of the calls to cmake. For a WW3(<cmake_generator>) class, the functions edit, build, and install would just be wrappers around: cmake <...>, cmake --build <...>, and cmake --install <...>, respectively.
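For example, a WW3 recipe along those lines might look roughly like the sketch below. This is only a sketch: the URLs, version, and dependency list are placeholders that would need to be checked against the actual WW3 cmake build.

```python
# Rough sketch of a hypothetical CMake-based Spack recipe for WW3.
# URLs, versions, and dependencies are placeholders, not a tested package.
from spack.package import *


class Ww3(CMakePackage):
    """Hypothetical Spack recipe for WAVEWATCH III using its CMake build."""

    homepage = "https://github.com/NOAA-EMC/WW3"   # placeholder
    git = "https://github.com/NOAA-EMC/WW3.git"    # placeholder

    version("develop", branch="develop")

    depends_on("cmake@3.19:", type="build")
    depends_on("mpi")
    depends_on("netcdf-fortran")   # assumed dependency; adjust to the real build

    def cmake_args(self):
        # CMakePackage already drives the three stages (cmake, cmake --build,
        # cmake --install); the recipe only supplies extra -D options here.
        return []
```

With something like that in place, spack install ww3 would run the configure, build, and install stages end to end.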

@aidanheerdegen

We moved to a cmake-based build and have started using spack for our dependencies somewhat recently. I'll admit I'm just getting up to speed with spack, so I'm probably not well-versed enough yet to answer your question directly.

Hi @MatthewMasarik-NOAA. Agreed, spack can take some getting used to, as can finding the best way to configure and use it. There are some benefits to creating a spack package for WW3 and using it to build the software; the most obvious is that you get reproducible builds with full build provenance.

Using spack, we're building our CI containers in two stages. The first stage is the base container with a version of spack and an (Intel) compiler; we've set it up as a matrix build so we could do combinations of spack and compiler versions, though we currently don't. The second stage installs the model dependencies. By keeping these separate we can update the model dependencies without incurring the penalty of reinstalling spack and the compiler. We can then use that versioned container for CI, running a spack install of the modified model code. The really cool thing is that it is basically model-agnostic: with very little effort (and hopefully little to no change) the same workflow could be used to do build CI on any model. That is the goal, so we can scale and have CI testing for many models without the onerous burden of fixing broken CI pipelines (and I know that pain in a very real way).

There is a nice explanation in the docs here

https://github.com/ACCESS-NRI/build-ci/blob/main/README-DEV.md#ci-workflow-run-through

Having said that, there is currently a wrinkle with the way GitHub Actions mounts containers that we're trying to fix right now.

Though I did check out the approach you're using in ACCESS-NRI/spack-packages, mom5 in particular. It looks like the Mom5 class creates and then builds from a Makefile. Since cmake automates the 3 stages corresponding to your phases routines (edit, build, install), is there a reason for choosing traditional make over cmake?

I'm a big fan of CMake for Fortran projects: it has excellent dependency resolution, the best I have used (I wrote some experimental CMake build infra for MOM5). We didn't use CMake for MOM5 because we wanted to match the existing Makefile (mkmf) build system as closely as possible, to try to get builds that produced bit-reproducible model output. We achieved this goal. If MOM5 were an active development target we might then have moved to supporting a CMake build, but at this stage we're only targeting legacy support and backwards compatibility with previous builds. MOM6 is the actively developed ocean model component, and we're collaborating with COSIMA to get that "fully spackified". It is ACCESS-OM3 (with MOM6) that is using WW3.

Thanks for engaging, and I'm sorry for hijacking this issue. When we start work on a spack package for WW3 I'll create a separate issue and reference this one. I'd imagine this would be within the next few months, but my predictions are notoriously unreliable.

@MatthewMasarik-NOAA
Collaborator

Hi @aidanheerdegen, thanks for the explanation; I see what you're saying about supporting the legacy system in the MOM5 example. If my understanding is correct, implementing a WW3 class/package with cmake should not be too difficult. If you go that route and need any pointers on potential arguments to the cmake calls, feel free to tag me. Best of luck with your efforts.
