btl/uct: clear related flags when there is no rdma transport layer #6164

Open
wants to merge 1 commit into
base: main

ggouaillardet
Contributor

This fixes the use of btl/uct with the tcp memory_domain

Signed-off-by: Gilles Gouaillardet [email protected]
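The idea in the title (clearing capability flags when no RDMA transport layer is present) can be sketched roughly as follows. This is only an illustration: `demo_module_t`, the `DEMO_FLAG_*` values, and `demo_fixup_flags()` are hypothetical names, not the btl/uct data structures, and which flags the actual patch clears is not stated in this thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capability bits; NOT the real Open MPI BTL flag values. */
enum {
    DEMO_FLAG_SEND   = 0x1,
    DEMO_FLAG_PUT    = 0x2,
    DEMO_FLAG_GET    = 0x4,
    DEMO_FLAG_ATOMIC = 0x8
};

/* Hypothetical module: rdma_tl is NULL when no RDMA transport layer exists. */
typedef struct {
    uint32_t flags;
    const void *rdma_tl;
} demo_module_t;

/* Stop advertising one-sided capabilities that cannot be served
 * without an RDMA transport layer; leave send/recv untouched. */
void demo_fixup_flags(demo_module_t *module)
{
    if (NULL == module->rdma_tl) {
        module->flags &= ~(uint32_t)(DEMO_FLAG_PUT | DEMO_FLAG_GET | DEMO_FLAG_ATOMIC);
    }
}
```

With flags adjusted this way, an upper layer that inspects the flags would not try to schedule put/get operations on a module that cannot perform them.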

@ggouaillardet
Contributor Author

@hjelmn can you please review this PR ?
@hoopoepg if you are using the tcp memory domain with btl/uct then you need this patch (otherwise Open MPI crashes). This is unrelated to #6137 though.

and correctly handle when there is no rdma transport layer.

This fixes the use of btl/uct with the tcp memory_domain

Signed-off-by: Gilles Gouaillardet <[email protected]>
@ggouaillardet
Contributor Author

:bot:retest

@ggouaillardet
Contributor Author

This patch is only good enough for eager messages (i.e. small enough messages).

Function
main (osu_bw.c:117)
  PMPI_Isend (pisend.c:95)
    mca_pml_ob1_isend (pml_ob1_isend.c:198)
      mca_pml_ob1_send_request_start_seq (pml_ob1_sendreq.h:467)
        mca_pml_ob1_send_request_start_btl (pml_ob1_sendreq.h:432)
          mca_pml_ob1_send_request_start_rndv (pml_ob1_sendreq.c:811)
            mca_pml_ob1_rendezvous_hdr_prepare (pml_ob1_hdr.h:147)
              mca_pml_ob1_match_hdr_prepare (pml_ob1_hdr.h:102)
                mca_pml_ob1_common_hdr_prepare (pml_ob1_hdr.h:70)

MPI_Send() crashes with larger messages: in mca_pml_ob1_send_request_start_rndv() we end up passing a NULL hdr to mca_pml_ob1_rendezvous_hdr_prepare().
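For illustration, here is a minimal sketch of the failure pattern and one defensive way to handle it. It assumes the NULL header comes from a descriptor that could not be allocated on the rendezvous path, which this thread does not confirm. All names (`demo_hdr_t`, `demo_hdr_alloc()`, `demo_send_start()`, the `DEMO_*` constants) are hypothetical and are not the ob1 or btl/uct API.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical header and status codes, for illustration only. */
typedef struct { int type; size_t length; } demo_hdr_t;

enum { DEMO_SUCCESS = 0, DEMO_ERR_OUT_OF_RESOURCE = -1 };
enum { DEMO_HDR_MATCH = 1, DEMO_HDR_RNDV = 2 };

/* Stand-in for a transport allocator: returns NULL when `available` is 0. */
static demo_hdr_t *demo_hdr_alloc(int available)
{
    return available ? malloc(sizeof(demo_hdr_t)) : NULL;
}

/* Dereferences hdr unconditionally, like a typical header-prepare helper. */
static void demo_hdr_prepare(demo_hdr_t *hdr, int type, size_t length)
{
    hdr->type = type;
    hdr->length = length;
}

/* Eager path for small messages, rendezvous path for larger ones.
 * The NULL check keeps a failed allocation from reaching demo_hdr_prepare(). */
int demo_send_start(size_t msg_len, size_t eager_limit, int rndv_available)
{
    int is_eager = (msg_len <= eager_limit);
    demo_hdr_t *hdr = demo_hdr_alloc(is_eager ? 1 : rndv_available);

    if (NULL == hdr) {
        return DEMO_ERR_OUT_OF_RESOURCE;
    }
    demo_hdr_prepare(hdr, is_eager ? DEMO_HDR_MATCH : DEMO_HDR_RNDV, msg_len);
    free(hdr);
    return DEMO_SUCCESS;
}
```

In this sketch a large message on a transport without rendezvous support returns an error code instead of dereferencing a NULL pointer; whether the real fix should report an error or fall back to another protocol is a separate design question.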

@hjelmn
Member

hjelmn commented Apr 1, 2019

I will see about checking this out in the next several weeks.

@hjelmn
Member

hjelmn commented Apr 1, 2019

Note that I didn't really intend for btl/uct to be used with non-RDMA networks. It is worth fixing though.

@AboorvaDevarajan
Member

Can one of the admins verify this patch?

@ibm-ompi

ibm-ompi commented Feb 6, 2020

The IBM CI (GNU/Scale) build failed! Please review the log, linked below.

Gist: https://gist.github.com/1e23e7debe86706c7143d238a89b2b3c

@ibm-ompi

The IBM CI (PGI) build failed! Please review the log, linked below.

Gist: https://gist.github.com/33d2372e313a2c9fba70b755f02dade8

@ibm-ompi

The IBM CI (PGI) build failed! Please review the log, linked below.

Gist: https://gist.github.com/4eaaaa47f2a6244ce675ced588af4178
