
set specific GPU architectures for builds #160

Merged 7 commits on Oct 10, 2024

Conversation

jameslamb
Member

@jameslamb jameslamb commented Oct 8, 2024

Contributes to #115

While inspecting the size of the conda packages, I was surprised to see that they were tiny (2.5 MB for the GPU variant on Python 3.12). I investigated and realized that in CI builds, legate-boost is not currently targeting a specific set of GPU architectures.

It's falling back to native. Since builds are done on runners without a GPU, the packages end up built for whatever single default target architecture the version of CMake being pulled in chooses (see the CMake docs on that).

This PR proposes matching legate and cunumeric's behavior, setting CUDAARCHS="all-major". @RAMitchell @trivialfis if you'd prefer to hard-code a specific set of architectures like 80;86;90 or similar, let me know.
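For reference, a minimal sketch of how that setting flows into a CMake build. This is an assumed invocation for illustration, not the project's actual CI script:

```shell
# CMake >= 3.20 reads the CUDAARCHS environment variable to initialize
# CMAKE_CUDA_ARCHITECTURES. The "all-major" value (CMake >= 3.23) compiles
# for every supported major architecture, instead of "native", which targets
# only whatever GPU (or compiler default) the build machine happens to have.
export CUDAARCHS="all-major"
cmake -B build -S .

# Equivalent explicit form, without the environment variable:
cmake -B build -S . -DCMAKE_CUDA_ARCHITECTURES="all-major"
```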

Notes for Reviewers

Bad (good?) timing... as I opened this, pre-commit 4.0 came out, breaking some of the configuration here, which relied on hooks incompatible with that version. The pre-commit-related changes you see in the diff fix that.

Mainly:

@jameslamb jameslamb added improvement Improves an existing functionality non-breaking Introduces a non-breaking change labels Oct 8, 2024
@jameslamb jameslamb changed the title WIP: set specific GPU architectures for builds set specific GPU architectures for builds Oct 10, 2024
@jameslamb jameslamb marked this pull request as ready for review October 10, 2024 15:37
@RAMitchell
Contributor

Following cunumeric is perfect, although I suspect they might be compiling for architectures they don't support (@Jacobfaib, does this make sense?). The downside is large binaries; this can get pretty annoying for users when it gets beyond 300 MB.

@jameslamb
Member Author

The downside is large binaries - this can get pretty annoying for users when it gets > 300mb.

Ah yeah! I should have mentioned... with this change, legate-boost conda packages only grow to around 6 MB.

legate-boost 0.1.0 cuda12_py312_0_gpu
-------------------------------------
file name   : legate-boost-0.1.0-cuda12_py312_0_gpu.tar.bz2
name        : legate-boost
version     : 0.1.0
build       : cuda12_py312_0_gpu
build number: 0
size        : 6.2 MB

(build link)

@Jacobfaib
Contributor

Although I suspect they might be compiling for architectures they don't support

Entirely likely. I'll be honest, I don't think we thought that hard about it. Someone at some point said "you should compile for all-major" and so we did. Downside of course is that the binaries are downright hefty.

I'd be careful with big binaries, though, if you are doing ABI-versioned symlinks and shipping Python wheels. Python wheels don't support symlinks; instead, they store deep copies of everything. So what used to be

libfoo.so.1.2.3 # 200MB
libfoo.so.1 -> libfoo.so.1.2.3
libfoo.so -> libfoo.so.1.2.3

will now be

libfoo.so.1.2.3 # 200MB
libfoo.so.1 # 200MB
libfoo.so # 200MB

That's partly the reason why cunumeric is such a big install...
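The symlink-dereferencing behavior above is easy to demonstrate, since wheels are zip archives underneath. A small sketch (hypothetical file names, not legate-boost's actual packaging code):

```python
import os
import tempfile
import zipfile

# Wheels are zip archives, and the zip format (as written by Python's
# zipfile module) does not represent symlinks: archiving a symlink
# stores a full copy of the target's bytes instead.
with tempfile.TemporaryDirectory() as d:
    real = os.path.join(d, "libfoo.so.1.2.3")
    with open(real, "wb") as f:
        f.write(b"\x00" * (1024 * 1024))  # stand-in for a 1 MiB shared library

    link = os.path.join(d, "libfoo.so.1")
    os.symlink(real, link)  # on disk: a tiny link, near-zero extra space

    archive = os.path.join(d, "pkg.zip")
    with zipfile.ZipFile(archive, "w") as z:
        # ZipFile.write() opens the path, so the symlink is dereferenced
        # and a second full copy of the library lands in the archive.
        z.write(real, "libfoo.so.1.2.3")
        z.write(link, "libfoo.so.1")

    with zipfile.ZipFile(archive) as z:
        sizes = {info.filename: info.file_size for info in z.infolist()}

print(sizes)  # both entries report the full 1 MiB uncompressed size
```

Ship three ABI-versioned names that way and the wheel carries three full copies of the library, which is exactly the blow-up described above.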

@jameslamb
Copy link
Member Author

Thanks for that note @Jacobfaib , it's an important one!

For now, legate-boost is targeting publishing conda packages only, not wheels, and not worrying about exporting its shared library for use by other projects.

@jameslamb jameslamb merged commit fd04bbc into rapidsai:main Oct 10, 2024
9 checks passed
@jameslamb jameslamb deleted the ci/architectures branch October 10, 2024 16:33