(WIP) added CUDA 11.0.2 and related recipes #10935

Closed
wants to merge 16 commits into from


Contributor

@mboisson mboisson commented Jul 7, 2020

No description provided.

@mboisson
Contributor Author

mboisson commented Jul 7, 2020

I'm not sure what this failure means

Traceback (most recent call last):
  File "/home/runner/work/easybuild-easyconfigs/easybuild-easyconfigs/test/easyconfigs/easyconfigs.py", line 773, in test_changed_files_pull_request
    self.check_sanity_check_paths(changed_ecs)
  File "/home/runner/work/easybuild-easyconfigs/easybuild-easyconfigs/test/easyconfigs/easyconfigs.py", line 650, in check_sanity_check_paths
    self.assertFalse(failing_checks, '\n'.join(failing_checks))
AssertionError: ['No custom sanity_check_paths found in CUDA-11.0-GCC-9.3.0.eb', 'No custom sanity_check_paths found in CUDA-11.0-iccifort-2020.1.217.eb'] is not false : No custom sanity_check_paths found in CUDA-11.0-GCC-9.3.0.eb
No custom sanity_check_paths found in CUDA-11.0-iccifort-2020.1.217.eb

But there are no sanity_check_paths in any of the other CUDA recipes either?

@mboisson
Contributor Author

mboisson commented Jul 7, 2020

Ah, it's because it's using a Bundle EasyBlock, which is not in here :

whitelist = ['BuildEnv', 'CrayToolchain', 'GoPackage', 'ModuleRC', 'PythonBundle', 'PythonPackage',

Why are PythonBundle excluded, but not Bundle ?

@Micket
Contributor

Micket commented Jul 7, 2020

Test report by @Micket
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in this PR)
hebbe-c1 - Linux centos linux 7.8.2003, x86_64, Intel Core Processor (Haswell, no TSX), Python 2.7.5
See https://gist.github.com/4c6265827ff87a3f08452518f6b763c0 for a full test report.

(this was a test build using the fixed cuda.py block)

@Micket
Contributor

Micket commented Jul 7, 2020

Why are PythonBundle excluded, but not Bundle ?

PythonPackage and PythonBundle are excluded because they contain default implementations for sanity checks. It checks for the site-packages directory
Bundles typically always contain stuff though, this is really a.. uh. well I don't know really. The only cases that are module-only are actually ModuleRC (which wouldn't work) and Toolchain as far as I know. (oh, and BuildEnv, but that's not right either)

@mboisson
Contributor Author

mboisson commented Jul 7, 2020

Why are PythonBundle excluded, but not Bundle ?

PythonPackage and PythonBundle are excluded because they contain default implementations for sanity checks. It checks for the site-packages directory
Bundles typically always contain stuff though, this is really a.. uh. well I don't know really. The only cases that are module-only are actually ModuleRC (which wouldn't work) and Toolchain as far as I know. (oh, and BuildEnv, but that's not right either)

I can try to create an empty sanity_check_paths ?

@mboisson
Contributor Author

mboisson commented Jul 7, 2020

Ah, no, I need to add to

bundles_whitelist = ['Autotools', 'GCC']

@mboisson
Contributor Author

mboisson commented Jul 7, 2020

Ah, no, I need to add to

bundles_whitelist = ['Autotools', 'GCC']

That is done in
#10936
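
The two-level whitelisting discussed above can be sketched roughly as follows. This is a minimal illustration and not the actual test code: `find_missing_sanity_checks` and the dict-based easyconfig records are hypothetical, and the first list only contains the entries visible in this thread (the quoted line is truncated).

```python
# Entries copied from the thread; the first list is truncated there,
# so it is incomplete here as well.
WHITELIST = ['BuildEnv', 'CrayToolchain', 'GoPackage', 'ModuleRC', 'PythonBundle', 'PythonPackage']
BUNDLES_WHITELIST = ['Autotools', 'GCC']


def find_missing_sanity_checks(easyconfigs, whitelist=WHITELIST, bundles_whitelist=BUNDLES_WHITELIST):
    """Return one message per easyconfig lacking a custom sanity_check_paths (hypothetical helper)."""
    failing = []
    for ec in easyconfigs:
        # Easyblocks that ship their own sanity check, or install nothing, are skipped.
        if ec['easyblock'] in whitelist:
            continue
        # Bundles are only skipped when the software name is explicitly whitelisted.
        if ec['easyblock'] == 'Bundle' and ec['name'] in bundles_whitelist:
            continue
        if not ec.get('sanity_check_paths'):
            failing.append('No custom sanity_check_paths found in %s' % ec['filename'])
    return failing


# Simplified stand-ins for parsed easyconfigs (illustrative only).
changed = [
    {'name': 'CUDA', 'easyblock': 'Bundle', 'filename': 'CUDA-11.0-GCC-9.3.0.eb'},
    {'name': 'GCC', 'easyblock': 'Bundle', 'filename': 'GCC-9.3.0.eb'},
]
print(find_missing_sanity_checks(changed))
print(find_missing_sanity_checks(changed, bundles_whitelist=BUNDLES_WHITELIST + ['CUDA']))
```

Under these assumptions, the CUDA bundle is reported until its name is added to the bundles whitelist, which matches the failure seen earlier in this thread.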

@Micket
Contributor

Micket commented Jul 7, 2020

(this is still the most reliable way to re-test a PR)

@Micket Micket closed this Jul 7, 2020
@Micket
Contributor

Micket commented Jul 7, 2020

Re-running tests

@Micket Micket reopened this Jul 7, 2020

@mboisson
Contributor Author

mboisson commented Jul 8, 2020

I'm not sure why the tests fail with

Traceback (most recent call last):
  File "/home/runner/work/easybuild-easyconfigs/easybuild-easyconfigs/test/easyconfigs/easyconfigs.py", line 147, in test_conflicts
    self.process_all_easyconfigs()
  File "/home/runner/work/easybuild-easyconfigs/easybuild-easyconfigs/test/easyconfigs/easyconfigs.py", line 120, in process_all_easyconfigs
    EasyConfigTest.ordered_specs = resolve_dependencies(EasyConfigTest.parsed_easyconfigs, modules_tool(), retain_all_deps=True)
  File "/opt/hostedtoolcache/Python/3.5.9/x64/lib/python3.5/site-packages/easybuild/tools/robot.py", line 460, in resolve_dependencies
    raise_error_missing_deps(totally_missing, extra_msg="no easyconfig file or existing module found")
  File "/opt/hostedtoolcache/Python/3.5.9/x64/lib/python3.5/site-packages/easybuild/tools/robot.py", line 323, in raise_error_missing_deps
    raise EasyBuildError(error_msg)
easybuild.tools.build_log.EasyBuildError: 'Missing dependencies: CUDAcore/11.0.2-GCC-9.3.0, CUDAcore/11.0.2-iccifort-2020.1.217 (no easyconfig file or existing module found)'

The recipes do not call for CUDAcore/11.0.2-GCC-9.3.0. Shouldn't EasyBuild automatically work out that the dependencies are at the SYSTEM level?
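
For context, a simplified model of how a dependency tuple maps to a module name under EasyBuild's default naming scheme is sketched below. It is an illustration under stated assumptions, not framework code: `dep_module_name` is a hypothetical helper, `SYSTEM` is a tuple stand-in for EasyBuild's constant of the same name, and subtoolchain resolution is ignored entirely. The point is only that a dependency listed without a fourth element picks up the parent's toolchain, which would produce the name seen in the error.

```python
# Stand-in for EasyBuild's SYSTEM constant (simplified to a tuple for this sketch).
SYSTEM = ('system', 'system')


def dep_module_name(dep, parent_toolchain):
    """Return the module name a dependency tuple would map to (simplified, hypothetical)."""
    name, version = dep[0], dep[1]
    versionsuffix = dep[2] if len(dep) > 2 else ''
    # Without an explicit fourth element, the parent's toolchain is assumed.
    tc_name, tc_version = dep[3] if len(dep) > 3 else parent_toolchain
    if (tc_name, tc_version) == SYSTEM:
        # System-toolchain modules carry no toolchain label.
        return '%s/%s%s' % (name, version, versionsuffix)
    return '%s/%s-%s-%s%s' % (name, version, tc_name, tc_version, versionsuffix)


parent = ('GCC', '9.3.0')
inherited = dep_module_name(('CUDAcore', '11.0.2'), parent)
explicit = dep_module_name(('CUDAcore', '11.0.2', '', SYSTEM), parent)
print(inherited)
print(explicit)
```

In an actual easyconfig, the explicit form would be written as `('CUDAcore', '11.0.2', '', SYSTEM)` in the `dependencies` list.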

@mboisson mboisson closed this Jul 8, 2020
@mboisson mboisson reopened this Jul 8, 2020
@mboisson
Contributor Author

mboisson commented Jul 8, 2020

Looks like the build simply takes too long?

@bartoldeman
Contributor

Travis has issues from time to time. I'll try again.

@bartoldeman bartoldeman closed this Jul 8, 2020
@bartoldeman bartoldeman reopened this Jul 8, 2020
@bartoldeman
Contributor

UCX should be built with CUDA support too; Open MPI in here is using the non-CUDA-enabled UCX. But that would need the gcccorecuda toolchain, as explained in Slack.

@mboisson
Contributor Author

mboisson commented Jul 9, 2020 via email

@bartoldeman
Contributor

No new easyblock is needed, but a gcccorecuda toolchain needs to be added to the framework.

Framework PR filed here: easybuilders/easybuild-framework#3385

@boegelbot
Collaborator

Travis test report: 2/2 runs failed - see https://travis-ci.org/easybuilders/easybuild-easyconfigs/builds/706594322

Only showing partial log for 1st failed test suite run 23191.1;
full log at https://travis-ci.org/easybuilders/easybuild-easyconfigs/jobs/706594323

...
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/build/easybuilders/easybuild-easyconfigs/test/easyconfigs/easyconfigs.py", line 492, in test_sanity_check_paths
    self.process_all_easyconfigs()
  File "/home/travis/build/easybuilders/easybuild-easyconfigs/test/easyconfigs/easyconfigs.py", line 120, in process_all_easyconfigs
    EasyConfigTest.ordered_specs = resolve_dependencies(EasyConfigTest.parsed_easyconfigs, modules_tool(), retain_all_deps=True)
  File "/home/travis/virtualenv/python2.7.15/lib/python2.7/site-packages/easybuild/tools/robot.py", line 436, in resolve_dependencies
    processed_ecs = process_easyconfig(path, validate=not retain_all_deps, hidden=hidden)
  File "/home/travis/virtualenv/python2.7.15/lib/python2.7/site-packages/easybuild/framework/easyconfig/easyconfig.py", line 1966, in process_easyconfig
    raise EasyBuildError("Failed to process easyconfig %s: %s", spec, err.msg)
EasyBuildError: 'Failed to process easyconfig /home/travis/build/easybuilders/easybuild-easyconfigs/easybuild/easyconfigs/u/UCX/UCX-1.8.0-gcccorecuda-2020a.eb: Toolchain gcccorecuda not found, available toolchains: giolfc,gpsmpi,intel,iomkl,dummy,gimpic,goblf,pompi,cgmpich,gomkl,ismkl,iimpi,system,gsmpi,xlmvapich2,cgmpolf,gmpich2,cgoolf,gqacml,CrayPGI,gmvapich2,iimkl,impich,iiqmpi,xlmpich2,foss,iompi,goolf,iomklc,GCCcore,gmpich,xlcxlf,golf,iompic,giolf,gcccuda,cgmvapich2,iimklc,xlompi,gompi,gomklc,CrayCCE,iccifort,intel-para,ipsmpi,gmpolf,GNU,iccifortcuda,ictce,gmkl,goalf,goolfc,gimpi,gmvolf,intelcuda,gmklc,CrayIntel,impmkl,iqacml,CrayGNU,golfc,gpsolf,cgmvolf,GCC,iimpic,ClangGCC,xlmpich,pomkl,gmacml,gimkl,cgompi,PGI,fosscuda,pmkl,gsolf,gompic'

======================================================================
FAIL: test_changed_files_pull_request (test.easyconfigs.easyconfigs.EasyConfigTest)
Specific checks only done for the (easyconfig) files that were changed in a pull request.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/build/easybuilders/easybuild-easyconfigs/test/easyconfigs/easyconfigs.py", line 768, in test_changed_files_pull_request
    self.assertTrue(False, error_msg)
AssertionError: Failed to find parsed easyconfig for UCX-1.8.0-gcccorecuda-2020a.eb (and could not isolate it in easyconfigs archive either)

----------------------------------------------------------------------
Ran 10170 tests in 365.249s

FAILED (failures=1, errors=3)
ERROR: Not all tests were successful.
The command "python -O -m test.easyconfigs.suite" exited with 2.
$ unset PYTHONPATH
The command "unset PYTHONPATH" exited with 0.
$ cd $HOME; pip install $TRAVIS_BUILD_DIR
DEPRECATION: Python 2.7 will reach the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 won't be maintained after that date. A future version of pip will drop support for Python 2.7.
Processing ./build/easybuilders/easybuild-easyconfigs
Building wheels for collected packages: easybuild-easyconfigs
  Building wheel for easybuild-easyconfigs (setup.py) ... done
  Stored in directory: /home/travis/.cache/pip/wheels/60/83/5a/68a83b743a3b5a96bdb88de299d7630c9f272b0dec140d3e5d
Successfully built easybuild-easyconfigs
Installing collected packages: easybuild-easyconfigs
Successfully installed easybuild-easyconfigs-4.2.3.dev0
The command "cd $HOME; pip install $TRAVIS_BUILD_DIR" exited with 0.
$ export EB_PYTHON=python
The command "export EB_PYTHON=python" exited with 0.
$ eb --show-config | tee eb_show_config.out
#
# Current EasyBuild configuration
# (C: command line argument, D: default value, E: environment variable, F: configuration file)
#
buildpath      (D) = /home/travis/.local/easybuild/build
containerpath  (D) = /home/travis/.local/easybuild/containers
installpath    (D) = /home/travis/.local/easybuild
repositorypath (D) = /home/travis/.local/easybuild/ebfiles_repo
robot-paths    (D) = /home/travis/virtualenv/python2.7.15/easybuild/easyconfigs
sourcepath     (D) = /home/travis/.local/easybuild/sources
The command "eb --show-config | tee eb_show_config.out" exited with 0.
$ grep "^robot-paths .*/easybuild/easyconfigs" eb_show_config.out
robot-paths    (D) = /home/travis/virtualenv/python2.7.15/easybuild/easyconfigs
The command "grep "^robot-paths .*/easybuild/easyconfigs" eb_show_config.out" exited with 0.
$ eb --search 'TensorFlow-1.14.*.eb' | tee eb_search_TF.out
 * /home/travis/virtualenv/python2.7.15/easybuild/easyconfigs/t/TensorFlow/TensorFlow-1.14.0-foss-2019a-Python-3.7.2.eb
 * /home/travis/virtualenv/python2.7.15/easybuild/easyconfigs/t/TensorFlow/TensorFlow-1.14.0-fosscuda-2019a-Python-3.7.2.eb
The command "eb --search 'TensorFlow-1.14.*.eb' | tee eb_search_TF.out" exited with 0.
$ grep '/TensorFlow-1.14.0-foss-2019a-Python-3.7.2.eb$' eb_search_TF.out
 * /home/travis/virtualenv/python2.7.15/easybuild/easyconfigs/t/TensorFlow/TensorFlow-1.14.0-foss-2019a-Python-3.7.2.eb
The command "grep '/TensorFlow-1.14.0-foss-2019a-Python-3.7.2.eb$' eb_search_TF.out" exited with 0.
$ eb --search '^foss-2018b.eb' | tee eb_search_foss.out
 * /home/travis/virtualenv/python2.7.15/easybuild/easyconfigs/f/foss/foss-2018b.eb
The command "eb --search '^foss-2018b.eb' | tee eb_search_foss.out" exited with 0.
$ grep '/foss-2018b.eb$' eb_search_foss.out
 * /home/travis/virtualenv/python2.7.15/easybuild/easyconfigs/f/foss/foss-2018b.eb
The command "grep '/foss-2018b.eb$' eb_search_foss.out" exited with 0.
$ eb --prefix /tmp/$USER M4-1.4.18.eb
== temporary log file in case of crash /tmp/eb-_YdiHf/easybuild-mM1Gw8.log
== processing EasyBuild easyconfig /home/travis/virtualenv/python2.7.15/easybuild/easyconfigs/m/M4/M4-1.4.18.eb
== building and installing M4/1.4.18...
== fetching files...
== creating build dir, resetting environment...
== unpacking...
== patching...
== preparing...
== configuring...
== building...
== testing...
== installing...
== taking care of extensions...
== restore after iterating...
== postprocessing...
== sanity checking...
== cleaning up...
== creating module...
== permissions...
== packaging...
== COMPLETED: Installation ended successfully (took 25 sec)
== Results of the build can be found in the log file(s) /tmp/travis/software/M4/1.4.18/easybuild/easybuild-M4-1.4.18-20200709.164149.log
== Build succeeded for 1 out of 1
== Temporary log file(s) /tmp/eb-_YdiHf/easybuild-mM1Gw8.log* have been removed.
== Temporary directory /tmp/eb-_YdiHf has been removed.
The command "eb --prefix /tmp/$USER M4-1.4.18.eb" exited with 0.


Done. Your build exited with 1.

*bleep, bloop, I'm just a bot (boegelbot v20180813.01)* Please talk to my owner @boegel if you notice me acting stupid, or submit a pull request to https://github.com/boegel/boegelbot to fix the problem.

dependencies = [
    ('zlib', '1.2.11'),
    ('hwloc', '2.2.0'),
    ('UCX', '1.8.0', '', ('gcccorecuda', '2020a')),
Member

There should be a comment here to explain why we do this.
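
One possible shape for such a comment is sketched below. The wording is only a suggestion, based on the earlier remark in this thread that Open MPI should use a CUDA-enabled UCX; the list is closed here for readability, and any entries beyond those quoted in the review are not shown in the thread.

```python
dependencies = [
    ('zlib', '1.2.11'),
    ('hwloc', '2.2.0'),
    # Use the UCX built with the gcccorecuda toolchain so that Open MPI
    # picks up a CUDA-enabled UCX rather than the non-CUDA-enabled one.
    ('UCX', '1.8.0', '', ('gcccorecuda', '2020a')),
    # (any further entries are not shown in the thread)
]
```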

bartoldeman added a commit to ComputeCanada/easybuild-easyconfigs that referenced this pull request Sep 16, 2020
Like easybuilders#10935 this introduces a new CUDAcore easyconfig to share
CUDA between use of GCC and Intel compilers, but
unlike easybuilders#10935 this uses a versionsuffix for UCX+CUDA so it does
not need any framework changes or MODULEPATH adjustments.
@boegel boegel modified the milestones: 4.3.1 (next release), 4.x Oct 26, 2020
@mboisson
Contributor Author

This is no longer relevant I guess.

@mboisson mboisson closed this Oct 13, 2022
@mboisson mboisson deleted the cuda-11.0.2 branch October 13, 2022 17:07
5 participants