
Default binding policy #4799

Closed · artpol84 opened this issue Feb 8, 2018 · 3 comments

artpol84 (Contributor) commented Feb 8, 2018

OMPI version: v2.1

I was recently investigating an issue with PMIx_Get latency when using dstore. I was running on a single node and observed the latency grow as the PPN count increased. I was using the default binding policy, assuming it defaults to bind-to core.
The bottleneck was attributed to the thread-shift part:
openpmix/openpmix#665 (comment).

Debugging showed that the scheduler assigned the PMIx service thread to a core different from the main thread's, which was causing the performance issues. You can see on the plot that starting from 4 procs the performance degrades noticeably. This is because, IIRC, mpirun binds to core for up to 2 processes and to socket beyond that.
perf confirmed that guess:

  • the CPU number is enclosed in brackets, e.g. [0004];
  • pmix_intra_perf[164802] is the main thread;
  • pmix_intra_perf[164807/164802] is the service thread.
$ perf sched timehist
...
  648540.416283 [0004]  pmix_intra_perf[164802]             0.005      0.000      0.005
  648540.416289 [0008]  pmix_intra_perf[164807/164802]      0.003      0.000      0.007
  648540.416294 [0004]  pmix_intra_perf[164802]             0.004      0.000      0.006
  648540.416299 [0008]  pmix_intra_perf[164807/164802]      0.003      0.000      0.006
...
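
Side note: perf sched timehist replays scheduler events that must be recorded first. The recording step is not shown above; a typical invocation (the exact flags I used may differ) is:

$ perf sched record -a -- sleep 10   # record scheduler events system-wide for ~10 s
$ perf sched timehist                # replay them as a per-task timeline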

In the 4 PPN case the processes remained on their CPUs the whole time (cpu4 and cpu8). But starting from 16 PPN they began to migrate actively, which caused even more rapid latency growth:

$ perf sched timehist
...
  649086.369911 [0019]  pmix_intra_perf[165820/165811]      0.004      0.001      0.016
  649086.369914 [0017]  pmix_intra_perf[165811]             0.012      0.000      0.006
  649086.369921 [0019]  pmix_intra_perf[165820/165811]      0.001      0.000      0.007
  649086.369925 [0017]  pmix_intra_perf[165811]             0.005      0.000      0.005
  649086.369933 [0019]  pmix_intra_perf[165820/165811]      0.003      0.000      0.008
  649086.369941 [0023]  pmix_intra_perf[165811]             0.006      0.000      0.009
  649086.369948 [0019]  pmix_intra_perf[165820/165811]      0.006      0.001      0.008
  649086.369953 [0023]  pmix_intra_perf[165811]             0.005      0.000      0.006
  649086.369961 [0019]  pmix_intra_perf[165820/165811]      0.004      0.001      0.008
  649086.369966 [0023]  pmix_intra_perf[165811]             0.005      0.000      0.007
  649086.369984 [0019]  pmix_intra_perf[165820/165811]      0.012      0.009      0.010
  649086.369994 [0027]  pmix_intra_perf[165811]             0.016      0.001      0.011
  649086.369999 [0019]  pmix_intra_perf[165820/165811]      0.008      0.000      0.007
  649086.370004 [0027]  pmix_intra_perf[165811]             0.004      0.000      0.006
  649086.370012 [0019]  pmix_intra_perf[165820/165811]      0.004      0.000      0.008
...

After forcing bind-to core, performance stabilized (yellow dashed curve):
openpmix/openpmix#665 (comment)
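
For completeness, the override amounts to something like the command below; the benchmark name is taken from the traces above, and --report-bindings prints each rank's binding so the pinning can be verified:

$ mpirun --bind-to core --report-bindings -np 16 ./pmix_intra_perf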

I see this as additional input on the impact the default binding policy may have. My suggestion is to consider this at the next OMPI dev meeting.

ggouaillardet (Contributor) commented

@artpol84 IIRC, the rationale for binding to socket (instead of core) is to be friendly to those who run hybrid MPI+OpenMP applications but fail to ask for n cpus per MPI task.
And the rationale for binding to core by default when there are no more than 2 MPI tasks is simply to get better out-of-the-box performance when comparing Open MPI to another MPI library.
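
For example (a sketch with a hypothetical binary, using the PE modifier of --map-by), a hybrid job that does ask for n cpus per task would look like:

$ export OMP_NUM_THREADS=4
$ mpirun -np 2 --map-by socket:PE=4 ./hybrid_app
  # each MPI task is mapped to its own socket and bound to 4 cores,
  # leaving room for its 4 OpenMP threads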

artpol84 (Contributor, Author) commented Feb 9, 2018

And when those decisions were made, the circumstances I highlighted here weren't taken into consideration.
I discussed this with @rhc54 in the context of PMIx performance, and he suggested that OMPI defaults might need to be revisited in light of these findings.

rhc54 (Contributor) commented Mar 20, 2018

This was discussed at the devel meeting, and the conclusion was that the need to adequately support multi-threaded applications overrides this issue. We don't know of any way to force the kernel to keep one thread local to another as they move around the socket.

For performance tests like the one you are running, you should override the default binding policy with bind-to core. For OMPI, we feel that the current defaults are the correct ones to use.
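
For reference, the override can be given per run, or set as a default via the corresponding MCA parameter (the environment-variable form assumes the usual OMPI_MCA_ prefix convention):

$ mpirun --bind-to core ./app                      # per run
$ export OMPI_MCA_hwloc_base_binding_policy=core   # default for subsequent runs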

rhc54 closed this as completed on Mar 20, 2018