don't try to bind tasks when direct launched by an external launcher that did no binding #1386

Closed · wants to merge 3 commits

Conversation

@plesn (Contributor) commented Feb 19, 2016

When using an external launcher like srun, srun handles the binding. Sometimes it intentionally does not bind, e.g. for a hybrid job.

If we set a binding policy in OMPI (I like to have mpirun bind by default), it is usually not applied under srun, because there is a check to see whether the processes were externally bound. But when srun intentionally left them unbound, the check reports them as unbound, so we bind them anyway.

I find this behaviour violates the principle of least surprise: by default we should rather keep whatever the external launcher did.

So what I propose here is to not apply the binding policy for direct-launched jobs. For those who really want to enforce the binding policy when the launcher did nothing, I propose an MCA flag, hwloc_base_bind_direct_launched=true; a sketch of the resulting logic follows.
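
A minimal sketch of that guard, with stand-in names (this is an illustration, not the actual Open MPI source; direct_launched stands for however the runtime detects a direct launch):

    /* Illustrative sketch only -- not the actual Open MPI code. */
    #include <stdbool.h>

    static bool should_apply_binding_policy(bool direct_launched,
                                            bool bind_direct_launched)
    {
        /* Direct-launched and flag unset: keep whatever (non-)binding
         * the external launcher (e.g. srun) set up. */
        if (direct_launched && !bind_direct_launched) {
            return false;
        }
        /* mpirun launch, or flag explicitly set: apply the policy. */
        return true;
    }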

So for an unbound OpenMP job, with the original behaviour (or the new MCA param) we get:

   $ env OMPI_MCA_hwloc_base_binding_policy=core OMPI_MCA_hwloc_base_report_bindings=1 OMPI_MCA_hwloc_base_bind_direct_launched=true srun --exclusive -p bulldozer --label  -N1 -n1 --cpu_bind=none,verbose  omp_hybrid_hello
   0: [btp3:08632] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/././././././././././././././.][./././././././././././././././.]
   0: 0.13 on btp3.8632.8654 bound to Core#0 (pu: {0}) running on 0; details: '[Socket#0(32GB)].Core{0}.PU{0}'
   0: 0.19 on btp3.8632.8660 bound to Core#0 (pu: {0}) running on 0; details: '[Socket#0(32GB)].Core{0}.PU{0}'
   0: 0.14 on btp3.8632.8655 bound to Core#0 (pu: {0}) running on 0; details: '[Socket#0(32GB)].Core{0}.PU{0}'

With the new behaviour, we maintain what srun did:

   $ env OMPI_MCA_hwloc_base_binding_policy=core OMPI_MCA_hwloc_base_report_bindings=1 srun --exclusive -p bulldozer --label  -N1 -n1 --cpu_bind=none,verbose  omp_hybrid_hello
   0: [btp3:08703] MCW rank 0 is not bound (or bound to all available processors)
   0: 0.24 on btp3.8703.8736 bound to Machine#0(64GB) (pu: {0-31}) running on 31; details: 'Socket{0-1}.Core{0-7}.PU{0-31}'
   0: 0.9 on btp3.8703.8721 bound to Machine#0(64GB) (pu: {0-31}) running on 31; details: 'Socket{0-1}.Core{0-7}.PU{0-31}'
   0: 0.0 on btp3.8703.8703 bound to Machine#0(64GB) (pu: {0-31}) running on 24; details: 'Socket{0-1}.Core{0-7}.PU{0-31}'
   [...]

What do you think of this?

@rhc54 (Contributor) commented Feb 19, 2016

I'll put this on the "to-be-discussed" list for next week's developers' meeting. The initial idea was to have mpirun and direct launch wind up with the same behavior in the absence of any directives. This proposal would seem to break that model; I'm not sure which causes the most surprise.

@plesn (Contributor, author) commented Feb 22, 2016

I totally agree that a common model between mpirun and direct launch is very desirable! By the way, is it really as easy to get the right placement with srun? To my knowledge we don't have a mapping MCA param that applies under srun.

The surprise here is that an explicit command-line option (srun --cpu_bind=none) is overridden by a configuration option (the Open MPI binding_policy), especially as srun has its own default binding. Preventing this would probably require the launcher to pass the user's binding request on to us via PMI…

@rhc54 (Contributor) commented Feb 22, 2016

I would be inclined to agree that an explicit call to bind a certain way (or not to bind at all) should be respected. The difficulty is in detecting it, as you note. We cannot wait for PMIx directives as that might come too late to ensure that memory is aligned with the location, so we likely do have to look for RM-specific directives. Merits a little thought as to the best method for solving the problem.

IIRC, we were supposed to check for an externally-applied binding (i.e., that we are already bound in some fashion) before auto-binding ourselves, so I think we are okay when the user specifies a binding pattern. Thus the question really comes down to: how do we detect an explicit do-not-bind directive given to the RM? That is a trickier problem.

@plesn (Contributor, author) commented Feb 23, 2016

The detection of whether we were bound externally is also not quite right, IMHO, as it essentially tests whether we are filling everything we are allowed to use (allowed_cpuset != bound_cpuset).

This means that when SLURM allocates only one socket on a node and then binds us to that socket, we think we were not bound and apply our policy anyway.

That's why I think that, as long as we don't know what the user asked for, it is safer to keep by default whatever binding the direct launcher made. The heuristic and its failure mode look roughly like the sketch below.
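
A standalone sketch of that heuristic against the hwloc C API (an illustration, not the actual Open MPI code path):

    #include <hwloc.h>
    #include <stdbool.h>

    /* Guess whether something external already bound us: conclude
     * "bound" only when our current binding is narrower than the set
     * of PUs we are allowed to use.  Failure mode described above: if
     * the RM also restricts the allowed set (e.g. SLURM confining the
     * job to one socket via cgroups), then bound == allowed and we
     * wrongly conclude we are unbound. */
    static bool guess_externally_bound(hwloc_topology_t topo)
    {
        hwloc_bitmap_t bound = hwloc_bitmap_alloc();
        bool result = false;

        if (NULL != bound &&
            0 == hwloc_get_cpubind(topo, bound, HWLOC_CPUBIND_PROCESS)) {
            result = !hwloc_bitmap_isequal(
                bound, hwloc_topology_get_allowed_cpuset(topo));
        }
        if (NULL != bound) {
            hwloc_bitmap_free(bound);
        }
        return result;
    }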

@rhc54 (Contributor) commented Feb 24, 2016

@plesn We talked about this at some length today. There was consensus that we should do a better job of detecting that we were externally bound so we don't change that binding pattern. This includes the case where the user explicitly told the RM "do not bind". @artpol84 found the right SLURM envars to support that behavior, and we'll have to see if others can provide us with similar "flags".
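
The exact envars aren't named here; as one hedged illustration, srun does export SLURM_CPU_BIND_TYPE into each task's environment, so a detection sketch might look like this (whether this is the envar that was actually chosen is an assumption):

    #include <stdbool.h>
    #include <stdlib.h>
    #include <string.h>

    /* Sketch: detect an explicit "--cpu_bind=none" request via SLURM's
     * environment.  SLURM_CPU_BIND_TYPE is assumed to carry "none" in
     * that case; the real detection may rely on different envars. */
    static bool slurm_user_said_do_not_bind(void)
    {
        const char *type = getenv("SLURM_CPU_BIND_TYPE");
        return NULL != type && NULL != strstr(type, "none");
    }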

There was general disagreement over the right behavior in the case where the user specified nothing on the RM cmd line. We currently bind by default due to user complaints over performance differences between jobs launched via mpirun vs srun. So in this case, we are going to stick with the current behavior of binding to some pre-defined pattern, or to the pattern specified via MCA param. We note that the MCA param offers the "bind-to-none" option, which the user can set for direct launch and the sys admin can include in the default MCA parameter file. So users and admins are free to customize this behavior today.
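
For example, a user can already opt out of binding for a direct-launched job with that param's documented "none" value (./a.out here stands for any application):

   $ env OMPI_MCA_hwloc_base_binding_policy=none srun -N1 -n1 ./a.out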

However, we did recognize that a system wide default param would apply both to direct launch and launch via mpirun, and that this might not be desirable. So we will look into providing sectional headings in the default MCA parameter file that will let admins specify a different behavior for direct vs mpirun launch. Not sure when this will be done, but it would seem a better alternative to adding yet another param to control binding.
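
A hypothetical sketch of what such a sectioned default param file could look like (the syntax is invented for illustration; no such sections existed at the time):

   # $prefix/etc/openmpi-mca-params.conf -- hypothetical sectioned form
   [mpirun]
   hwloc_base_binding_policy = core

   [direct]
   hwloc_base_binding_policy = none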

Hope that makes sense.

@plesn (Contributor, author) commented Feb 25, 2016

OK, I understand, and I agree that it makes sense to:

  • have an envvar or PMI attribute from SLURM indicating the binding the user selected, since we can't really guess it (pmix/s2 is already SLURM-specific anyway);
  • have sections for direct-launch-specific and mpirun-specific options, which would be a more general solution to this issue;
  • bind when neither the user nor srun did anything, for benchmarking-comparison purposes…

In the meantime, for our internal usage at Bull, we'll temporarily use this patch with the MCA param default changed to bind_direct_launched=true, so that we stay compatible with mainline behaviour.

@jsquyres (Member) commented:
@plesn @rhc54 Is this PR still relevant?

@rhc54 (Contributor) commented Mar 25, 2016

Not for the master branch; it has been fixed there, albeit in a different way.

@jsquyres closed this Mar 25, 2016