PoC: Add groundwork for separating actual cuda parts in OpenMPI. WIP WIP WIP #14919
@akesandgren: Tests failed in GitHub Actions, see https://github.com/easybuilders/easybuild-easyconfigs/actions/runs/1790411763
bleep, bloop, I'm just a bot (boegelbot v20200716.01) |
@akesandgren Did you open PRs upstream for these changes? If so, please drop the PR URLs in the PR description |
I haven't done that yet. Was kind of hoping for some constructive criticism here first... |
@akesandgren: Tests failed in GitHub Actions, see https://github.com/easybuilders/easybuild-easyconfigs/actions/runs/2195839723
bleep, bloop, I'm just a bot (boegelbot v20200716.01) |
@boegelbot Please build @ generoso |
Got message " Please build @ generoso - notification for comment with ID 1104186093 processed Message to humans: this is just bookkeeping information for me, |
@boegelbot Please test @ generoso |
@akesandgren: Request for testing this PR well received on login1 PR test command '
Test results coming soon (I hope)... - notification for comment with ID 1104201893 processed Message to humans: this is just bookkeeping information for me, |
Test report by @boegelbot |
I think I have a simpler solution, which is to include those parts of cuda.h that are necessary to build Open MPI. I'm attaching my patch here, which can still use a bit of cleanup. The resulting Open MPI should be equivalent to one built with CUDA as a builddependency, so no special OMPI_mca environment variables need to be set, and there's just one Open MPI, with two UCXes, as before. The only concern left is then the potential performance issue for small transfers since there is always a runtime check for CUDA. |
This is unexpected to me. Isn't OpenMPI linking to |
No, what happens is that the code in |
I see. Well, |
Simplified patch attached |
EasyBuild silliness really, since gcccuda toolchains and friends add |
I think it's called sideloading or something, OpenMPI does that for lots of libraries that it may or may not use. |
@bartoldeman Will you open a PR for this approach? |
@Micket the patch introduces a new potential value for the option |
Looking at the easyblock, I don't think there's any need for a change to the easyblock. You just add the patch and set
|
I didn't think any of the patches we have discussed would induce any more or less overhead than what we have always had in the old The only downside of enabling this CUDA-awareness is that it induces an overhead even for those who aren't using CUDA at all, due to the extra branch + function call. Quoting @bartoldeman from the Slack thread:
At that point, I don't even think it's worth caring about disabling CUDA-awareness. It might already be possible by simply using:
if you don't care about CUDA and don't want to pay even the <1% overhead. |
@akesandgren This can be closed now that #15528 is merged? |
Jup, that it can, closing. |
(created using eb --new-pr)
NOTE: PoC only (even if it actually works)
This shows how we can separate the CUDA parts in OpenMPI in such a way that we can handle OpenMPI the same way we do UCX and UCX-CUDA.
To build the CUDA-split base version (i.e. OpenMPI), configure with --with-cuda=enable and no CUDA dependencies.
To build the fully CUDA-enabled OpenMPI-CUDA, configure with --with-cuda=$EBROOTCUDA (as usual) and only install into OpenMPI-CUDA.../lib and lib/openmpi.
Finally, make sure OMPI_MCA_mca_component_path lists OpenMPI-CUDA's lib and lib/openmpi first, and then the base OpenMPI's lib and lib/openmpi.
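The ordering requirement for OMPI_MCA_mca_component_path can be sketched as follows. This is only an illustration: the helper name `component_path` and both install prefixes are hypothetical, and it assumes the variable takes a colon-separated list of directories that is searched from left to right.

```python
import posixpath


def component_path(cuda_prefix, base_prefix):
    """Build a colon-separated component search path, CUDA overlay first.

    Both prefixes are hypothetical install roots; each contributes its
    lib and lib/openmpi directories, in that order.
    """
    dirs = []
    for prefix in (cuda_prefix, base_prefix):
        lib = posixpath.join(prefix, "lib")
        dirs.append(lib)
        dirs.append(posixpath.join(lib, "openmpi"))
    return ":".join(dirs)


# Hypothetical prefixes, chosen only for illustration.
value = component_path("/opt/sw/OpenMPI-CUDA", "/opt/sw/OpenMPI")
print("export OMPI_MCA_mca_component_path=" + value)
```

Listing the OpenMPI-CUDA directories first is what lets the CUDA-enabled components take precedence over the same-named components in the base installation.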
The OpenMPI-4.1.1_fix_missing_OPAL_CUDA_GDR_SUPPORT_protection.patch doesn't really have anything to do with this; it's just a fix for an actual bug in OpenMPI that this work uncovered.
And CUDA_GDR support is still not handled since that will also need manual intervention during configure.
Depends on: easybuilders/easybuild-easyblocks#2710