WeeklyTelcon_20170829

Geoffrey Paulsen edited this page Jan 9, 2018 · 1 revision

Open MPI Weekly Telcon


  • Dialup Info: (Do not post to public mailing list or public wiki)

Attendees

  • Geoff Paulsen (IBM)
  • Artem (Mellanox)
  • Ralph Castain (Intel)
  • Nathan Hjelm
  • Howard
  • Brian Barrett
  • Todd Kordenbrock
  • Jeff Squyres (Cisco)
  • Geoffroy Vallee
  • Joshua Ladd
  • Joshua Hursey

Agenda

Review v2.0.x Milestones v2.0.4

  • Nothing new to report.

Review v2.x Milestones v2.1.2

  • v2.1.2
    • Howard put out an RC last week.
    • Big-endian support changes need to go in here; 4105 needs review.
  • PR4059 - NEWS
    • Some discussion about the -xrc issue: is this a regression in v2.x, or has it always existed?
    • Artem tested:
      • Still an openib issue, only on the v2.x branch. v2.0.x and v3.0.x do NOT reproduce it.
      • Not reproducible does not mean it's not there.
    • Leave it disabled on v2.x, on master, and on v3.0.x.
    • If someone fixes it, they can re-enable it.
  • New RC tomorrow.
    • Howard noticed that autogen blew up in configure (with .tarball) in PMIx on v2.x. Opened Issue 4109.

Review v3.0.x Milestones v3.0

  • Will put out RC5 on Thursday morning.
  • Remove the proof-of-concept f08 module: PR4070.
  • Josh Hursey's patch for dynamic components linking against their project library.
  • RE: Issue4126
    • We will integrate the mpi4py tests into our CI tests, to catch things like this.
    • Brian will close this and file a new ticket.
    • Could be fairly easy to fix. Possibly the lower-level error is not translated into MPI_ERROR_NO_KEY?
    • PR3743 was a recent change.

Review Master Pull Requests

  • It does appear we have a number of PMI install failures on master.
    • May already have been fixed, since the report is dated the 12th. Cisco / Absoft.
      • 'nm_check_prefix' - this test requires an ENV to run. There is a directive in Makefile.am to run it.
        • This test is used to report exported symbols that shouldn't be exported.
        • For some environments this check isn't working. Mark (IBM) will look at it again.
    • IBM will file a separate PR today that takes the check out, and use the existing PR to discuss the correct fix.
  • libpmix failed to link against libopal.
    • opal_atomic: a 64-bit atomic in a 32-bit build? In PMIx?
    • Need to look at this some more.
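
For readers unfamiliar with the kind of check 'nm_check_prefix' performs, the idea of reporting exported symbols that lack an expected prefix can be sketched roughly as follows. This is a minimal illustration, not the actual test: the allow-list of prefixes, the function names, and the use of GNU nm's `-g --defined-only` options are all assumptions made for the example.

```python
import subprocess

# Hypothetical allow-list; the prefixes used by the real check are not
# stated in these notes.
ALLOWED_PREFIXES = ("ompi_", "opal_", "orte_", "MPI_", "PMPI_")


def exported_symbols(nm_output):
    """Yield names of defined, globally visible symbols from `nm` text output."""
    for line in nm_output.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # blank lines and "file.o:" headers
        sym_type, name = fields[-2], fields[-1]
        # Uppercase type letters mark global symbols; 'U' means undefined
        # (imported), so it is not something this library exports.
        if len(sym_type) == 1 and sym_type.isupper() and sym_type != "U":
            yield name


def unprefixed_symbols(nm_output, allowed=ALLOWED_PREFIXES):
    """Return the sorted exported symbols that do not start with an allowed prefix."""
    return sorted(n for n in exported_symbols(nm_output) if not n.startswith(allowed))


def check_library(path):
    """Run GNU nm on a library (assumes `nm` is on PATH) and report offenders."""
    result = subprocess.run(
        ["nm", "-g", "--defined-only", path],
        capture_output=True, text=True, check=True,
    )
    return unprefixed_symbols(result.stdout)
```

Keeping the parsing separate from the `nm` invocation means the filter can be exercised on captured output, which may help when a check like this behaves differently across environments.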

MTT / Jenkins Testing

MTT Dev status:

  • Added some Spawn tests to MTT in ibm-dynamic.
    • The executable can't be found because of the current working directory. C works, but Fortran fails.
  • Thanks to Mellanox for renaming MTT tests so they all follow the same naming convention.
  • Amazon started submitting 'intel_mpi' tests to our MTT.
    • These are there for the purpose of comparison.
    • Want to do the same for MPICH.

Jenkins CI

Next Week Discussion Points.

  • Schedule for NEXT v3.x release (Branch and Ship)
  • Would like to enable -Werror. We've gotten sloppy.
    • Could target this for v3.1.0. (or later, need to discuss).

Master

  • Some things are missing from SHMEM - Issue4098
  • The users list is discussing a RoCE issue.
  • PR4121 - fixes component linking model to link against project level library.
    • Couldn't figure out something that would work well with automake.
    • Only adding one library to each component.
    • Josh wrote a script to inject these libraries across the whole tree

Master testing

  • Cisco added a bunch of new MTT platforms, but it's not clear why the v3.0.x tests didn't run in the last 48 hours.

Exceptional topics

  • Container folks contacted Ralph about issues.

    • Open MPI issues with containers.
    • Will Open MPI consider supporting building with one version and running with another version through a major series?
      • We've always said that if you build an app with v2.0.0, it will run with all versions of v2.x.x.
    • Some containers put orte inside the container and some put it outside.
    • In Docker, people tend to put the orted in the container.
    • v2.0.x and v2.1.x have the same version of PMIx. The ABI stayed the same, but a handshake or something similar may differ.
    • Starting with v3.0.0, we should build tests with v3.0.0 and do testing to ensure they run with future v3.x.x for ABI.
  • Next face-to-face meeting

    • Jan / Feb
    • San Jose, Portland, Albuquerque, Dallas

Status Updates:

Status Update Rotation

  1. Mellanox, Sandia, Intel
  2. LANL, Houston, IBM, Fujitsu
  3. Amazon,
  4. Cisco, ORNL, UTK, NVIDIA
