Deal with EOPNOTSUPP returned from getsockopt() #5716

Merged 1 commit on Sep 18, 2018

Conversation

mkuron (Contributor) commented Sep 16, 2018

getsockopt() can return EOPNOTSUPP when running under QEMU user-mode emulation, which does not support getsockopt(..., SOL_SOCKET, SO_RCVTIMEO, ..., ...). I discovered this with Open MPI 2.1.1 on Ubuntu 18.04 while emulating armhf on amd64, but the issue seems to persist in newer versions.

The actual error is in the PMIx library (patch at openpmix/openpmix#836), but fc32ae4 copied one instance of that error into an Open MPI file.
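For readers unfamiliar with the pattern, here is a minimal sketch of how a caller might tolerate this error. It is not the code from this patch or from PMIx; the helper names `timeout_option_unavailable` and `save_recv_timeout` are hypothetical. The idea is to treat EOPNOTSUPP the same way as ENOPROTOOPT: the receive timeout cannot be read, so the caller skips saving and restoring it instead of failing.

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/socket.h>
#include <sys/time.h>

/* Hypothetical helper: true if the errno value means "this socket option
 * is not available here" rather than a real failure of the socket. */
bool timeout_option_unavailable(int err)
{
    return ENOPROTOOPT == err || EOPNOTSUPP == err;
}

/* Hypothetical helper, not taken from Open MPI or PMIx.
 * Tries to read the current receive timeout of socket sd into *saved.
 * Returns 0 on success or when the option is merely unavailable,
 * and -1 on any other error. *supported tells the caller whether
 * *saved holds a valid timeout that can be restored later. */
int save_recv_timeout(int sd, struct timeval *saved, bool *supported)
{
    socklen_t len = sizeof(*saved);

    *supported = true;
    if (0 == getsockopt(sd, SOL_SOCKET, SO_RCVTIMEO, saved, &len)) {
        return 0;
    }
    if (timeout_option_unavailable(errno)) {
        /* e.g. QEMU user-mode emulation: carry on without a timeout */
        *supported = false;
        return 0;
    }
    return -1;
}
```

A caller would check the return value first and only attempt to set or restore SO_RCVTIMEO when `supported` comes back true.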

Without this patch, the following error messages appear and initialization fails:

$ mpiexec -n 1 --mca pmix_base_verbose 100 ./helloworld
[...]
[0086e7d4ea41:26421] pmix: init called
[0086e7d4ea41:26421] posting notification recv on tag 0
[0086e7d4ea41:26421] Security SPC include: native
[0086e7d4ea41:26421] sec: native init
[0086e7d4ea41:26421] sec: SPC native active
[0086e7d4ea41:26421] usock_peer_try_connect: attempting to connect to server
[0086e7d4ea41:26421] usock_peer_try_connect: attempting to connect to server on socket 15
[0086e7d4ea41:26421] pmix: SEND CONNECT ACK
[0086e7d4ea41:26414] listen_thread: new connection: (27, 0)
[0086e7d4ea41:26421] send blocking of 33 bytes to socket 15
[0086e7d4ea41:26421] blocking send complete to socket 15
[0086e7d4ea41:26421] pmix: RECV CONNECT ACK FROM SERVER
[0086e7d4ea41:26414] connection_handler: new connection: 27
getsockopt level=1 optname=20 not yet supported
[0086e7d4ea41:26414] RECV CONNECT ACK FROM PEER ON SOCKET 27
[0086e7d4ea41:26414] waiting for blocking recv of 12 bytes
[0086e7d4ea41:26421] PMIX ERROR: UNREACHABLE in file src/client/pmix_client.c at line 205
[0086e7d4ea41:26414] blocking receive complete from remote
[0086e7d4ea41:26414] waiting for blocking recv of 21 bytes
[0086e7d4ea41:26414] blocking receive complete from remote
[0086e7d4ea41:26414] connect-ack recvd from peer 1524039681:0:1.2.1
[0086e7d4ea41:26414] sec: native validate_cred NULL
[0086e7d4ea41:26414] sec:native checking getsockopt for peer credentials
[0086e7d4ea41:26414] sec: native credential 0:0 valid
[0086e7d4ea41:26414] client credential validated
[0086e7d4ea41:26414] send blocking of 4 bytes to socket 27
[0086e7d4ea41:26421] sec: native finalize
[0086e7d4ea41:26414] usock_peer_send_blocking: send() to socket 27 failed: Broken pipe (32)
[0086e7d4ea41:26414] PMIX ERROR: UNREACHABLE in file src/server/pmix_server_listener.c at line 507
[...]
  ompi_mpi_init: ompi_rte_init failed
  --> Returned "(null)" (-43) instead of "Success" (0)
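
The line `getsockopt level=1 optname=20 not yet supported` in the log above is where the unsupported option shows up. As a rough aid for decoding such messages, the sketch below maps a (level, optname) pair back to a name using the constants from the headers it is compiled against. The function `sockopt_name` is hypothetical and only covers the two timeout options. The numeric values are platform specific; on Linux for amd64 and armhf with the traditional time_t layout, SOL_SOCKET is 1 and SO_RCVTIMEO is 20, which matches the message.

```c
#include <sys/socket.h>

/* Hypothetical helper for illustration: names a (level, optname) pair
 * using the constants of the platform this file is compiled for.
 * Only the socket-level timeout options are recognized. */
const char *sockopt_name(int level, int optname)
{
    if (SOL_SOCKET != level) {
        return "unknown level";
    }
    switch (optname) {
    case SO_RCVTIMEO:
        return "SOL_SOCKET/SO_RCVTIMEO";
    case SO_SNDTIMEO:
        return "SOL_SOCKET/SO_SNDTIMEO";
    default:
        return "SOL_SOCKET/other";
    }
}
```

Calling `sockopt_name(SOL_SOCKET, SO_RCVTIMEO)` returns the string "SOL_SOCKET/SO_RCVTIMEO" on any platform; passing raw numbers from a log only gives a meaningful answer when the code is built for the same target as the emulated program.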

This can be returned when running on QEMU user-mode emulation,
which does not support getsockopt with SO_RCVTIMEO.

Signed-off-by: Michael Kuron <[email protected]>
@ompiteam-bot

Can one of the admins verify this patch?

rhc54 (Contributor) commented Sep 16, 2018

ok to test

@rhc54 rhc54 merged commit 8a200ba into open-mpi:master Sep 18, 2018
5 participants