
WeeklyTelcon_20170404

Geoffrey Paulsen edited this page Jan 9, 2018 · 1 revision

Open MPI Weekly Telcon


  • Dialup Info: (Do not post to public mailing list or public wiki)

Attendees

  • Josh Hursey (IBM)
  • Howard (LANL)
  • Artem Polyakov (Mellanox)
  • Joshua Ladd (Mellanox)
  • Edgar Gabriel (UH)
  • Todd Kordenbrock (Sandia)
  • Ralph (Intel)
  • Sylvain Jeaugey (NVIDIA)
  • Brian Barrett (Amazon)
  • David Bernholdt (ORNL)
  • Geoffroy Vallee (ORNL)
  • Thomas Naughton (ORNL)
  • George (UTK)

Agenda

  • Issue #3267 - OSU crashing in v2.x series
    • PR #3274 needs to be PR'ed to the v2.x series
    • Issue should be re-opened until it has been PR'ed to all of the correct places.

Review v2.0.x

  • No specific update

Review v2.1.x

  • PR #3193 - RMs need to review and discuss a bit more.
  • PR for Issue #3267 needs to be PR'ed to the v2.x series
  • Issue #3268 -

Review v3.0.0

  • PR #3271 - Update hwloc to 1.11.6
    • Pushed to the next release
    • Add a release note about this issue, and possible workarounds (e.g., external hwloc)
  • Disabling CMA
    • PR #3272 vs PR #3270 : Only one of these should be picked

Review PMIx v2.0

  • v2.0 APIs are all in master - we are API complete
    • Open MPI master has been updated (PR #3273)
  • Timeline:
    • Target PMIx v2.0 release end of April
    • Commit this to master; once MTT is stable, PR to v3.0
  • No update

AWS Testing Setup

  • Jenkins setup in progress
    • Jenkins Builder "Open MPI CI"
      • Builds after every push to ompi (not PRs yet)
      • Basic build running at UH
      • Infrastructure coming along
  • MTT Setup
    • Will test using many of the open-source batch schedulers (e.g., SLURM, Torque)
  • Need more participation in MTT testing
    • In particular those pushing for a faster release cycle
    • IBM: To turn this back on this week.
    • Mellanox: To start getting this set up soon.
  • Absoft has a compiler failure
    • Ralph to reach out to check on that
  • Cisco:
    • Disabled one-sided tests
    • oshmem still failing

MTT Development status:

  • No update

Exceptional topics

  • If you have a patch that needs to go to a release branch
    • Label the PR to master with the Target X tag so it doesn't get missed
    • Do not close the Issue until the patch has been PR'ed and merged into all of the required release branches.
      • Be careful of auto-closing issues from your PR commit message.
  • Next Face-to-Face meeting
    • Doodle poll
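
To illustrate the auto-closing caution under "Exceptional topics" above: GitHub closes an issue when a merged commit or PR description pairs a closing keyword (close/closes/closed, fix/fixes/fixed, resolve/resolves/resolved) with an issue reference. The sketch below is a hypothetical helper, not part of any Open MPI tooling, that flags such references in a commit message so an author can reword them (e.g., to "Refs #NNNN") until the patch has landed on every required release branch. The function name and the regular expression are assumptions made for illustration and may not cover every form GitHub recognizes.

```python
import re
from typing import List

# GitHub's documented closing keywords; matching is case-insensitive.
CLOSING_KEYWORDS = (
    "close", "closes", "closed",
    "fix", "fixes", "fixed",
    "resolve", "resolves", "resolved",
)

# Keyword, optional colon, whitespace, then "#123" or "owner/repo#123".
# This pattern is an approximation written for this sketch.
_PATTERN = re.compile(
    r"\b(?:" + "|".join(CLOSING_KEYWORDS) + r")\b:?\s+((?:[\w.-]+/[\w.-]+)?#\d+)",
    re.IGNORECASE,
)


def find_auto_close_refs(message: str) -> List[str]:
    """Return issue references in `message` that would likely auto-close."""
    return [match.group(1) for match in _PATTERN.finditer(message)]


if __name__ == "__main__":
    msg = "osc: fix crash in OSU benchmark\n\nFixes #3267"
    print(find_auto_close_refs(msg))
```

A check like this could run in a local commit-msg hook; a message that only says "Refs #3267" yields an empty list and leaves the issue open.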

Status Updates:

  • Mellanox
    • PMIx, v2.x ongoing work
    • Working on adding an osc/ucx component
  • Sandia
    • Issue #173 - Very old ticket on Portals MTL
    • MTT - Trying to figure out why some tests are running very slowly
  • Intel
    • PMIx v2.0
      • Cross version work
      • Solidifying master/v2.x branch
    • OpenMP/MPI coordination
    • Launch scaling work

Status Update Rotation

  1. Mellanox, Sandia, Intel
  2. LANL, Houston, IBM, Fujitsu
  3. Cisco, IBM, ORNL, UTK, NVIDIA, Amazon

