
RuntimeDiscussion_20180718

Geoffrey Paulsen edited this page Aug 1, 2018 · 1 revision

Open MPI Weekly Telecon


  • Dialup Info: (Do not post to public mailing list or public wiki)

Attendees

  • Geoff Paulsen
  • Josh Hursey
  • Ralph Castain
  • Geoffroy Vallee
  • Todd Kordenbrock
  • Shinji Sumimoto
  • Takahiro Kawashima
  • Maurali (LLNL)

Overall Runtime Discussion (talking v5.0 timeframe, 2019)

  • Three options:
    1. Keep going on our current path, continuing to take updates to ORTE, etc.

      • Two problems:
        1. OPAL abstraction layer: every time you want to expose a new PMIx function, you have to do it three times:
            1. in PMIx,
            2. in the OPAL abstraction layer, and
            3. in ORTE itself.
          • This is a problem both because of the extra redundant work and in terms of bugs.
          • Potential solution: redo the OPAL abstraction layer and use PMIx as the internal layer in OMPI itself.
            • Would have to figure out how to write a SLURM PMI1 or PMI2 interface.
            • Could call the PMIx API and convert to the PMI1 or PMI2 protocol for SLURM or ALPS.
              • Eventually this will go away, as SLURM and ALPS implement the PMIx APIs and no longer need PMI1 or PMI2 layers.
            • Could say that Open MPI v5.0 will only supply a PMIx API, and those who need PMI1 or PMI2 can stay on OMPI v4.0.
            • Need to decide how hard a line we want to take.
            • SLURM already has a PMIx implementation, but older SLURM versions will be the issue.
            • At the moment, Cray does not yet have a PMIx version of ALPS.
          • Tools: PMI1 and PMI2 don't have tool interfaces.
        2. MPIR, if Open MPI chooses not to remove it in v5.0.
          • Orthogonal to the OPAL abstraction layer issue.
          • Touches the ORTE and OMPI layers; partially broken right now.
          • Historically we haven't worried about it, and someone fixes the bugs.
    2. Shuffle our code a bit (a new ompi_rte framework, merged with the orte_pmix framework, moved down and renamed).

      • OPAL used to be a single-process abstraction, but that is not as true anymore.
      • The API of foo looks pretty much like the PMIx API.
        • We would still have PMIx v2.0, PMI2, or other components (all retooled for the new framework to use PMIx).
      • To use it, just call opal_foo.spawn(), etc., and you get whatever component is underneath.
      • What about mpirun? That is where PRTE comes in; it's the server side of the PMIx stuff.
        • Could use PRTE's prun and wrap it in a new mpirun wrapper.
      • PRTE doesn't just replace ORTE. PRTE and the OMPI layer don't really interact with each other; they both call the same OPAL layer (which contains PMIx and other OPAL stuff).
        • prun has a lamboot-like approach.
      • Work involved: build-system changes around OPAL, etc., code shuffling, and retooling of components.
      • We want to leverage the work the PMIx community is doing correctly.
      • ORNL OSHMEM is having a similar discussion, so this approach should work for OSHMEM as well.
      • ORTED would go through the OPAL abstraction as well.
    3. PRTE: the third approach looks like lamboot; simply move from being inside OMPI to being inside PRTE.

      • This only makes sense if there is a more active community working on PRTE.
      • Any hope of this becoming true? Not really; we'd be surprised.
      • Thought that when resource managers adopted
      • The OSHMEM community needs to have a solution. Right now they extract ORTE from Open MPI.
        • OSHMEM is interested in having its own prted for its launching.
        • Thought some resources were becoming available, but it's a bit confusing now.

A slightly different question: separating the runtime project from Open MPI, either PRTE or ORTE.

  • One benefit of using a separate runtime project is that it's easier to integrate.
  • Like the idea of pulling the runtime away from Open MPI as a separate project.
  • Then the runtime itself can follow its own path and its own release cycle.
  • Then Open MPI can pick a version of the runtime based on quality requirements.
  • Having this separate project be PRTE has some advantages.

  • Fujitsu (process manager): currently implementing and debugging PMIx in their resource manager.
  • Does Open MPI want a launcher at all?

    • It used to be like this with lamboot. Users would boot something, and then
    1. In this path, we would say that Open MPI doesn't have a resource manager (might package PRTE).
    2. The other path is that we ARE going to have a runtime, but who's going to have it?
    • Right now, because the runtime is integrated in Open MPI, everyone has to work within this context.
      • If we split the two completely,
    • ORTE had to adjust for direct launch for SLURM and other direct launchers.
  • Three big questions:

  1. Should OMPI and OPAL move to using PMIx directly (without the OPAL abstraction layer)?
    • This is internal code reordering; if done correctly, it would be transparent.
      • Actually rather simple: literally copy the OPAL modex send/recv macros from PRTE and put them into a header in OMPI or OPAL.
      • Already done in PRTE.
    • At some point we would need PMI1 and PMI2 conversion components; some users might feel this pain.
    • Any reason NOT to do this? The PMI1 and PMI2 components don't have owners.
      • Can define this work.
  2. Should Open MPI contain ORTE as it does today, or should we pull it out into a separate product (separate release cycles, etc.)?
    • How do we make progress on this question?
    • What do we gain by doing this?
      • Life is easier for those who don't need the runtime.
      • Life is easier for those who don't need MPI.
      • Customers can update the runtime independently of Open MPI releases (this has been helpful for other launchers).
      • The runtime could have its own quality requirements for release.
      • Would like to have separate runtime tests.
    • This is the main decision.
    • How do we get the stakeholders to the meeting?
      • Let's have another meeting like this in a month?
    • How can we get a credible answer to "What's the path forward?"
    • Nobody has any resources to put on it. No matter what we decide, no one can do it.
    • Need a clear decision from the community.
    • Do we need statements of intent?
      1. Take ORTE out, and need a 3rd party launcher in some env.
      2. Leave ORTE in, and people have to step up and
        • Do we have everyone call PMIx directly? That puts a burden on non-PMIx environments.
  3. If we do separate it out, what (if anything) do we make the default?
    • Delay until we can answer #2.

We've got three big questions; how do we make progress?

Chicken-and-egg problem: people don't see the priority yet because they don't feel the pain yet.

  • One solution is to "expose the pain" in small increments.

  • ECP: the Exascale Computing Project for the labs.

  • If we do this, we still need people to do runtime work over in PRTE.

    • In some ways it might be harder to get resources from management for yet another project.
    • Nice to have a componentized interface without moving the runtime to a third-party project.
    • Need to think about it.
  • Concerns with the work of adding ORTE PMIx integration.

  • Want to know the state of the SLURM PMIx plugin with PMIx v3.x.

    • It should build, and work with v3. They only implemented about 5 interfaces, and they haven't changed.
  • A few attendees related to the OMPIx project are discussing how much to contribute to this effort.

    • How to factor in the requirements of OSHMEM (who use our runtimes and are already doing things to adapt).
    • Would be nice to support both groups with a straightforward component that handles both.
  • Thinking about how much effort this will be, and how to manage these tasks in a timely manner.

  • Testing: we will need to discuss how best to test all of this.

Today (Geoffroy Vallee)

  • Let's take a stance and let the community react?

    • Move the runtime outside of the Open MPI tree, into its own project.
    • The runtime would have its own release schedule and meetings.
    • Could drop an initial release right away.
    • Switch our code to use PMIx directly rather than the OPAL abstraction layer.
    • If people still want a way to start jobs, they either download a third-party package, or we as a community provide a packaged version of the software that gives them everything at once.
      • Could be packaged as two RPMs (one with the RTE and one without).
    • Push this out as the direction we're thinking of going, and let the community respond with concerns.
      • Could even call the runtime ORTE when we move it out, if we use language carefully.
    • Need to discuss with packagers after community has come to consensus.

    Geoffroy Vallee will send this writeup to devel-core by the same time next week. Follow-up meeting two weeks from now, same time.
