undefined reference to `ucm_set_global_opts' #476

Closed
zasdfgbnm opened this issue Apr 14, 2022 · 8 comments · Fixed by #480

@zasdfgbnm
Contributor

zasdfgbnm commented Apr 14, 2022

I am seeing the following error when building UCC in NVIDIA's PyTorch container:

  CCLD     ucc_info
/usr/bin/ld: /usr/local/lib/libucs.so: undefined reference to `ucm_set_global_opts'
/usr/bin/ld: /usr/local/lib/libucs.so: undefined reference to `ucm_mmap_hook_modes'

To reproduce, run the following docker container:

docker run -it nvcr.io/nvidia/pytorch:22.03-py3

And inside the container, run the following script:

#!/bin/bash

set -ex

export UCX_HOME="/usr"
export UCC_HOME="/usr"

install_ucx() {
    set -ex
    echo "Will install ucx at: $UCX_HOME"
    rm -rf ucx
    git clone --recursive https://github.com/openucx/ucx.git
    pushd ucx
    ./autogen.sh
    ./configure --prefix=$UCX_HOME      \
        --without-bfd                   \
        --enable-mt                     \
        --with-cuda=/usr/local/cuda/    \
        --enable-profiling              \
        --enable-stats
    make -j
    make install
    popd
}

install_ucc() {
    set -ex
    echo "Will install ucc at: $UCC_HOME"
    rm -rf ucc
    git clone --recursive https://github.com/openucx/ucc.git
    pushd ucc
    ./autogen.sh
    ./configure --prefix=$UCC_HOME      \
        --with-ucx=$UCX_HOME            \
        --with-nccl=/usr                \
        --with-cuda=/usr/local/cuda/
    make -j
    make install
    popd
}

install_ucx
install_ucc
@zasdfgbnm
Contributor Author

cc @ptrblck

@Sergei-Lebedev
Contributor

Confirmed, I was able to reproduce this issue with pytorch:22.03-py3. However, it's not clear whether we need to resolve this at the UCC level. The error happens because the linker picks up the libucm required by libucs from a different directory: libucs is taken from $UCX_HOME, while libucm comes from the HPCX installation in /opt. A proper container environment configuration should resolve the issue.
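
One way to check for the mixed installation described above is to list every copy of libucm that is visible in the usual library directories. The sketch below is only an illustration: `find_libucm` is a hypothetical helper name, and the candidate directories are assumptions taken from the paths mentioned in this thread, so adjust them to your container.

```shell
#!/bin/bash
# List every libucm shared object found directly inside the given directories.
# More than one hit means libucs could be paired with the wrong copy.
find_libucm() {
    local dir
    for dir in "$@"; do
        [ -d "$dir" ] || continue   # skip directories that do not exist
        find "$dir" -maxdepth 1 -name 'libucm.so*' -print
    done
}

# Candidate directories are assumptions based on paths mentioned in this thread.
find_libucm /usr/lib /usr/local/lib /usr/lib/x86_64-linux-gnu /opt/hpcx/ucx/lib
```

If several copies show up, running `nm -D --defined-only` on each one and searching the output for `ucm_set_global_opts` should show which copy actually exports the missing symbol.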

@zasdfgbnm
Contributor Author

I confirm that using the UCX from HPCX gets rid of the error. But it is strange that the build system does not respect the UCX_HOME environment variable.

@vspetrov
Collaborator

> I confirm that using the UCX from HPCX gets rid of the error. But it is strange that the build system does not respect the UCX_HOME environment variable.

Agreed. We need to fix that.

@Sergei-Lebedev
Contributor

OK, I will prepare a fix for this.

@s1ddok

s1ddok commented Oct 16, 2023

I'm still getting this issue on newer container versions.

Inspecting libucs with ldd shows this:

avolodin@winstation:/workspace/permutohedral_encoding$ ldd /opt/hpcx/ucx/lib/libucs.so.0
        linux-vdso.so.1 (0x00007fff15f1c000)
        libm.so.6 => /usr/lib/x86_64-linux-gnu/libm.so.6 (0x00007f2d9b48a000)
        libucm.so.0 => /usr/lib/x86_64-linux-gnu/libucm.so.0 (0x00007f2d9b470000)
        libz.so.1 => /usr/lib/x86_64-linux-gnu/libz.so.1 (0x00007f2d9b454000)
        libc.so.6 => /usr/lib/x86_64-linux-gnu/libc.so.6 (0x00007f2d9b22c000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f2d9b717000)

How can I overcome this? @Sergei-Lebedev
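
The mismatch visible in the `ldd` output above (libucs under /opt/hpcx/ucx/lib, libucm resolved from /usr/lib/x86_64-linux-gnu) can also be detected with a small script. This is a minimal sketch, not part of UCX or UCC: `dep_dir` and `ucm_matches_ucs` are hypothetical helper names, and the check assumes that libucs and libucm from one installation live in the same directory.

```shell
#!/bin/bash
# Print the directory a dependency resolves to, reading `ldd` output on stdin.
# $1 is a soname prefix such as "libucm.so".
dep_dir() {
    local path
    path=$(awk -v n="$1" 'index($1, n) == 1 && $2 == "=>" { print $3; exit }')
    if [ -n "$path" ]; then
        dirname "$path"
    fi
}

# Succeed only when libucm resolves to the directory that holds the inspected libucs.
# $1 is the libucs path, $2 is the captured `ldd` output for that file.
ucm_matches_ucs() {
    [ "$(dirname "$1")" = "$(printf '%s\n' "$2" | dep_dir libucm.so)" ]
}

# Example call (path taken from the output above):
#   lib=/opt/hpcx/ucx/lib/libucs.so.0
#   ucm_matches_ucs "$lib" "$(ldd "$lib")" || echo "libucm comes from another directory"
```

For the output shown above, the check would fail, because libucm resolves to /usr/lib/x86_64-linux-gnu rather than /opt/hpcx/ucx/lib.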

@Sergei-Lebedev
Contributor

@s1ddok can you please provide more details: what container do you use, what UCC version, and how to reproduce?

@s1ddok

s1ddok commented Oct 17, 2023

I'm using nvcr.io/nvidia/pytorch:23.09-py3. It happens when I try to compile this repo: https://github.com/RaduAlexandru/permuto_sdf

What helps is doing this:

export LD_LIBRARY_PATH=/opt/hpcx/ucx/lib:$LD_LIBRARY_PATH

But it was quite hard to figure out. I wonder if it is possible to make things work out of the box?
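
If the workaround above ends up in a shell profile or a build script that may be sourced more than once, a guarded variant avoids adding the same directory repeatedly. This is a minimal sketch under that assumption; `prepend_ld_path` is a hypothetical helper name, and /opt/hpcx/ucx/lib is the path reported in this thread, which may differ in other container versions.

```shell
#!/bin/bash
# Prepend a directory to LD_LIBRARY_PATH unless it is already listed there.
prepend_ld_path() {
    local dir="$1"
    case ":${LD_LIBRARY_PATH:-}:" in
        *":$dir:"*) ;;   # already present, leave the variable unchanged
        *) export LD_LIBRARY_PATH="$dir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" ;;
    esac
}

# Path taken from the comment above.
prepend_ld_path /opt/hpcx/ucx/lib
```

Putting the directory first means the loader searches it before the other entries in LD_LIBRARY_PATH and before the default system directories, which is what makes libucs find the matching libucm.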
