-
Notifications
You must be signed in to change notification settings - Fork 532
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update fmt (to 11.0.2) and spdlog (to 1.14.1), add those libraries to libcuml conda host dependencies #6071
Conversation
ci/test_python_common.sh
Outdated
@@ -4,6 +4,7 @@ | |||
set -euo pipefail | |||
|
|||
. /opt/conda/etc/profile.d/conda.sh | |||
source ./ci/use_conda_packages_from_prs.sh |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not specific to this line, but just choosing a place to start a threaded conversation.
All conda-based tests are failing with issues like this:
Fatal Python error: Aborted
Thread 0x00007f1903eea640 (most recent call first):
File "/opt/conda/envs/test/lib/python3.12/site-packages/execnet/gateway_base.py", line 534 in read
File "/opt/conda/envs/test/lib/python3.12/site-packages/execnet/gateway_base.py", line 567 in from_io
File "/opt/conda/envs/test/lib/python3.12/site-packages/execnet/gateway_base.py", line 1160 in _thread_receiver
File "/opt/conda/envs/test/lib/python3.12/site-packages/execnet/gateway_base.py", line 341 in run
File "/opt/conda/envs/test/lib/python3.12/site-packages/execnet/gateway_base.py", line 411 in _perform_spawn
Current thread 0x00007f1904b0e740 (most recent call first):
File "/opt/conda/envs/test/lib/python3.12/site-packages/cuml/internals/api_decorators.py", line 190 in wrapper
...
Extension modules: cython.cimports.libc.math, Cython.Utils, numpy._core._multiarray_umath, numpy._core._multiarray_tests
...
.[gw13] node down: Not properly terminated
I'll investigate this right now. I suspect it might be something related to symbol visibility (like rapidsai/cudf#15483 (comment)).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah!! It looks like either I missed some dependencies in the test scripts, or something else is holding back the fmt
/ spdlog
version?
I see that RAPIDS libraries are coming from rapidsai-nightly
and not the CI artifacts, and that the older versions of fmt
and spdlog
are being used.
...
cudf 24.10.00a369 cuda11_py310_240923_g9b4c4c721c_369 rapidsai-nightly
cuml 24.10.00a70 cuda11_py310_240923_ga1d76d4da_70 file:///tmp/python_channel
...
distributed-ucxx 0.40.00a py3.10_240923_ga07a40f_28 rapidsai-nightly
...
fmt 10.2.1 h00ab1b0_0 conda-forge
...
Will try replicating that conda solve locally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Testing locally, I found that if I added constraints like fmt>=11.0.2, spdlog>=1.14.1
in the conda environment, the solver was able to solve with all of cuml
's runtime dependencies... including the versions of cudf
, rmm
, ucxx
, etc. produced from the PRs for rapidsai/build-planning#56.
I think that libcuml
conda packages need host:
dependencies on spdlog
and fmt
.
- commit adding that: 5b812f0
- build link to watch: https://github.com/rapidsai/cuml/actions/runs/10998530224/job/30536663624?pr=6071
That library's code directly uses both of those:
cuml/cpp/src/common/logger.cpp
Lines 20 to 21 in 7de8831
#include <spdlog/sinks/stdout_color_sinks.h> // NOLINT | |
#include <spdlog/spdlog.h> // NOLINT |
std::string msg_string = fmt::to_string(formatted); |
And having those dependencies would prevent these situations during development where the solver is able to choose to fall back to older RAPIDS nightlies from within the same release (but supporting different fmt
/ spdlog
).
cudf
is doing the same thing: https://github.com/rapidsai/cudf/blob/9b4c4c721c399bae9e88733da79daa1a10644481/conda/recipes/libcudf/meta.yaml#L69-L71
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is probably the right solution, thanks for investigating.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This ended up not being enough 😭
Look at https://github.com/rapidsai/cuml/actions/runs/10998530224/job/30541086357?pr=6071.
The environment is still solving with fmt==10.2.1
, spdlog==1.12.0
, and older RAPIDS 24.10.*
nightlies that support those (build link)... even though it used the cuml
/ libcuml
packages from CI here.
cudf 24.10.00a371 cuda11_py310_240923_g625590686f_371 rapidsai-nightly
cuml 24.10.00a71 cuda11_py310_240923_g5b812f0e9_71 file:///tmp/python_channel
...
fmt 10.2.1 h00ab1b0_0 conda-forge
...
libcudf 24.10.00a371 cuda11_240923_g625590686f_371 rapidsai-nightly
...
libcuml 24.10.00a71 cuda11_240923_g5b812f0e9_71 file:///tmp/cpp_channel
libcumlprims 24.10.00a cuda11_240923_gddfb9dd_8 rapidsai-nightly
...
rmm 24.10.00a40 cuda11_py310_240923_g58039d3c_40 rapidsai-nightly
...
spdlog 1.12.0 hd2e6256_2 conda-forge
...
I just pushed another commit explicitly pinning fmt
and spdlog
in the conda test environments, to force the solver to use the newer packages.
- commit: 7724113
- build link: https://github.com/rapidsai/cuml/actions/runs/11000193616/job/30542188379?pr=6071
This should be safe to do... as soon as there are librmm
packages published to rapids-nightly
containing the changes from rapidsai/rmm#1678, every environment containing the latest librmm
/ rmm
will have those same pins in it.
If that works, that's evidence that the fmt
/ spdlog
update is safe and we can start moving forward.
I think both of those changes are worth merging here
- adding the
host:
dependencies = to make CPM unnecessary forspdlog
and avoid unnecessary vendoring and therefore file clobbering - explicitly pinning in test environments = make such fallbacks during development less likely the next time we do an update like this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🎉 I see tests now passing and fmt==11.0.2
+ spdlog==1.14.1
getting installed!
Build link: https://github.com/rapidsai/cuml/actions/runs/11000193616/job/30544094053?pr=6071
Another thing that I think happened here.... it's been several days since the cudf
CI artifacts being pulled in here were produced, so the rapidsai-nightly
channel's versions are now newer.
e.g. rapidsai/cudf#16806 produces cudf==24.10.00a364
but rapidsai-nightly
now contains cudf==24.10.00a371
. That would also explain why some of the CI artifacts are being ignored, in favor of those from rapidsai-nightly
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See here: #6071 (comment)
- fmt {{ fmt_version }} | ||
- libcumlprims ={{ minor_version }} | ||
- libraft ={{ minor_version }} | ||
- libraft-headers ={{ minor_version }} | ||
- librmm ={{ minor_version }} | ||
- spdlog {{ spdlog_version }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are in the global list for the split recipe but also need to be in the libcuml
package below in order to have run-exports on the appropriate packages. That would eliminate the need for test-time pinnings. I'm not sure which direction we want to go: do we add run-exported dependencies on these to "runtime" packages (these aren't "dev" packages) to enforce consistency, or use test-time pinnings to only ensure that we get compatible builds of librmm, etc. which are exporting these dependencies (and with which we need to avoid conflicts)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah ok! I'm not that familiar with the ways that split recipes are different from single-package ones, thank you for pointing this out.
I'm not sure precisely what you mean about those two choices... but I can say at least that I think cuml
/ libcuml
should explicitly pin fmt
and spdlog
, given that it directly uses them: #6071 (comment) (instead of getting that pinning transitively from librmm
or similar).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We talked offline and decided to add these pinnings in the top-level host environment and the build environment created from the common_build:
list in dependencies.yaml
, but to not mess with run exports or adding explicit run dependencies on fmt
/ spdlog
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving but we may have a cleanup task related to host dependencies / run-exports.
🎉 this is working as expected
|
/merge |
Description
Contributes to rapidsai/build-planning#56
fmt
andspdlog
to newer versions, to match the rest of RAPIDSfmt
andspdlog
tohost:
dependencies forlibcuml
conda packages (see Update fmt (to 11.0.2) and spdlog (to 1.14.1), add those libraries to libcuml conda host dependencies #6071 (comment))Now that most of
conda-forge
has been updated tofmt >=11.0.1,<12
andspdlog>=1.14.1,<1.15
(rapidsai/build-planning#56 (comment)), we're attempting to upgrade RAPIDS to similar versions of those libraries.This improves the likelihood that RAPIDS will be installable alongside newer versions of its
dependencies and complementary packages on conda-forge.
Notes for Reviewers
This PR is testing changes made in rapidsai/rapids-cmake#689.