v4.1.5 UCX_NET_DEVICES not selecting TCP devices correctly #12785

Open

bertiethorpe opened this issue Aug 30, 2024 · 2 comments

@bertiethorpe
Details of the problem

  • OS version (e.g. Linux distro)
    • Rocky Linux release 9.4 (Blue Onyx)
  • Driver version:
    • rdma-core-2404mlnx51-1.2404066.x86_64
    • MLNX_OFED_LINUX-24.04-0.6.6.0

Setting UCX_NET_DEVICES to target only TCP devices while RoCE is available seems to be ignored in favour of some fallback.

I'm running a two-node IMB-MPI1 PingPong to benchmark RoCE against regular TCP ethernet.

Setting UCX_NET_DEVICES=all or mlx5_0:1 gives the optimal performance and uses RDMA as expected.
Setting UCX_NET_DEVICES=eth0, eth1, or anything else still appears to use RoCE, with only slightly higher latency.
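
One way to check which transports and devices UCX itself detects, independent of OpenMPI, is ucx_info (a sketch; eth0 is just an example interface name):

# List every transport/device pair UCX can see on this node
ucx_info -d | grep -E 'Transport|Device'

# Show the transports UCX would actually select for tag matching
# under the same device restriction (per the UCX FAQ)
UCX_NET_DEVICES=eth0 ucx_info -e -u t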

HW information from the ibstat or ibv_devinfo -vv commands:

        hca_id: mlx5_0
        transport:                      InfiniBand (0)
        fw_ver:                         20.36.1010
        node_guid:                      fa16:3eff:fe4f:f5e9
        sys_image_guid:                 0c42:a103:0003:5d82
        vendor_id:                      0x02c9
        vendor_part_id:                 4124
        hw_ver:                         0x0
        board_id:                       MT_0000000224
        phys_port_cnt:                  1
                port:   1
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                4096 (5)
                        active_mtu:             1024 (3)
                        sm_lid:                 0
                        port_lid:               0
                        port_lmc:               0x00
                        link_layer:             Ethernet
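
The link_layer: Ethernet line confirms mlx5_0 is a RoCE port. To double-check which kernel interface it maps to (a sketch; ibdev2netdev ships with MLNX_OFED, while the sysfs path works on any rdma-core install):

ibdev2netdev
# or, without OFED tooling:
ls /sys/class/infiniband/mlx5_0/device/net/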

How ompi is configured, from ompi_info | grep Configure:

 Configured architecture: x86_64-pc-linux-gnu
 Configured by: abuild
 Configured on: Thu Aug  3 14:25:15 UTC 2023
 Configure command line: '--prefix=/opt/ohpc/pub/mpi/openmpi4-gnu12/4.1.5'
                                             '--disable-static' '--enable-builtin-atomics'
                                             '--with-sge' '--enable-mpi-cxx'
                                             '--with-hwloc=/opt/ohpc/pub/libs/hwloc'
                                             '--with-libfabric=/opt/ohpc/pub/mpi/libfabric/1.18.0'
                                             '--with-ucx=/opt/ohpc/pub/mpi/ucx-ohpc/1.14.0'
                                             '--without-verbs' '--with-tm=/opt/pbs/'

Following the advice from here, this is apparently due to a higher priority of OpenMPI's btl/openib component, but I don't think that can be the case here: the build uses --without-verbs, and openib does not appear in the output of ompi_info | grep btl.

As suggested in the UCX issue, adding -mca pml_ucx_tls any -mca pml_ucx_devices any to my mpirun invocation has fixed the problem, but I was wondering what in the MCA precisely causes this behaviour.
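
As far as I can tell, those two parameters gate whether pml/ucx enables itself at all; their build-time defaults can be inspected on any installation (a sketch; output format and defaults vary between OpenMPI versions):

# Show the pml/ucx selection parameters and their current defaults
ompi_info --param pml ucx --level 9 | grep -E 'pml_ucx_(tls|devices)'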

Here's my batch script:

#!/usr/bin/env bash

#SBATCH --ntasks=2
#SBATCH --ntasks-per-node=1
#SBATCH --output=%x.%j.out
#SBATCH --error=%x.%j.out
#SBATCH --exclusive
#SBATCH --partition=standard

module load gnu12 openmpi4 imb

export UCX_NET_DEVICES=mlx5_0:1

echo SLURM_JOB_NODELIST: $SLURM_JOB_NODELIST
echo SLURM_JOB_ID: $SLURM_JOB_ID
echo UCX_NET_DEVICES: $UCX_NET_DEVICES

export UCX_LOG_LEVEL=data
mpirun -mca pml_ucx_tls any -mca pml_ucx_devices any IMB-MPI1 pingpong -iter_policy off
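
For comparison, a minimal variant of the run line that pins UCX to plain TCP (a sketch: eth0 is an example interface name, and UCX_TLS=tcp,self,sm restricts UCX to TCP plus the intra-node transports):

export UCX_TLS=tcp,self,sm
export UCX_NET_DEVICES=eth0
mpirun -mca pml_ucx_tls any -mca pml_ucx_devices any IMB-MPI1 pingpong -iter_policy off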
@evgeny-leksikov

@bertiethorpe I can't reproduce the described behavior with ompi and ucx built from source (see below). What am I missing?

  1. I removed libfabric and pbs
  2. I used osu instead of IMB, but it should not make a difference:
$ <path>/ompi_install/bin/ompi_info | grep Configure
 Configured architecture: x86_64-pc-linux-gnu
           Configured by: evgenylek
           Configured on: Tue Oct  1 17:07:14 UTC 2024
  Configure command line: '--prefix=<path>/ompi_install' '--disable-static' '--enable-builtin-atomics' '--with-sge' '--enable-mpi-cxx' '--without-verbs'

$ ibdev2netdev | grep Up
mlx5_0 port 1 ==> ib0 (Up)
mlx5_2 port 1 ==> ib2 (Up)
mlx5_3 port 1 ==> enp129s0f1np1 (Up)
mlx5_4 port 1 ==> ib3 (Up)

$ mpirun -H host1,host2 -n 2 /osu-micro-benchmarks-5.8/mpi/pt2pt/osu_latency -m 0:128
# OSU MPI Latency Test v5.8
# Size          Latency (us)
0                       0.89
1                       0.89
2                       0.89
4                       0.89
8                       0.88
16                      0.89
32                      0.91
64                      1.03
128                     1.07

$ mpirun -x UCX_NET_DEVICES=mlx5_0:1 -H host1,host2 -n 2 /osu-micro-benchmarks-5.8/mpi/pt2pt/osu_latency -m 0:128
# OSU MPI Latency Test v5.8
# Size          Latency (us)
0                       0.89
1                       0.89
2                       0.88
4                       0.88
8                       0.88
16                      0.89
32                      0.91
64                      1.02
128                     1.07

$ mpirun -x UCX_NET_DEVICES=mlx5_3:1 -H host1,host2 -n 2 /osu-micro-benchmarks-5.8/mpi/pt2pt/osu_latency -m 0:128
# OSU MPI Latency Test v5.8
# Size          Latency (us)
0                       1.33
1                       1.34
2                       1.34
4                       1.34
8                       1.34
16                      1.34
32                      1.38
64                      1.60
128                     1.67

$ mpirun -x UCX_NET_DEVICES=enp129s0f1np1 -H host1,host2 -n 2 /osu-micro-benchmarks-5.8/mpi/pt2pt/osu_latency -m 0:128
# OSU MPI Latency Test v5.8
# Size          Latency (us)
0                      55.89
1                      56.11
2                      56.15
4                      56.29
8                      56.09
16                     56.12
32                     56.14
64                     56.62
128                    56.86

$ mpirun -x UCX_NET_DEVICES=eno1 -H host1,host2 -n 2 /osu-micro-benchmarks-5.8/mpi/pt2pt/osu_latency -m 0:128
# OSU MPI Latency Test v5.8
# Size          Latency (us)
0                      60.95
1                      61.04
2                      61.11
4                      61.12
8                      61.05
16                     61.10
32                     61.16
64                     61.43
128                    61.69

@yosefe
Contributor

yosefe commented Oct 6, 2024

@bertiethorpe can you please increase the verbosity of OpenMPI by adding -mca pml_ucx_verbose 99 after mpirun (along with -x UCX_NET_DEVICES=eth0), and post the resulting output? Thanks!
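
For reference, the requested invocation would look something like this (a sketch; eth0 and the IMB arguments are carried over from the original report):

mpirun -mca pml_ucx_verbose 99 -x UCX_NET_DEVICES=eth0 IMB-MPI1 pingpong -iter_policy off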
