
VERSION: Changing master to v3.2.0 #4401

Closed · wants to merge 1 commit

Conversation

gpaulsen
Member

There are currently no known binary incompatible changes
on master that would require a first digit change.

Signed-off-by: Geoffrey Paulsen <[email protected]>
@bwbarrett
Member

I think the intermittent cray connectivity issue struck again. Bad Jenkins.

bot:ompi:retest

@bwbarrett
Member

bot:ompi:retest

Not sure why Jenkins got angry there...

@bwbarrett
Member

Not sure what's going on with the pull request checker; the build was successful. It looks like the API call from Jenkins to GitHub just didn't set the status properly. @gpaulsen, please go ahead and merge this.

@gpaulsen gpaulsen requested review from bwbarrett and removed request for hppritcha December 15, 2017 16:14
@gpaulsen
Member Author

bot:retest

@gpaulsen
Member Author

I'd love to get this into master soon, before we forget that our intent is to release a v3.2 and not a v4.0.

@bwbarrett
Member

Are we sure that the next version will be 3.2 instead of 4.0? I believe we wanted to remove the mxm mtl, since no one is supporting or testing it, but that would require bumping the version to 4.0. We should have removed it for 3.0, but apparently I sucked. So we can make it 3.2 and then bump to 4 if we do the removal, but I'm not sure how that plays into your internal release.

@rhc54
Contributor

rhc54 commented Dec 20, 2017

Both @jsquyres and I seem to recall that we needed to go to 4.0 next time. I don't believe we can do a 3.2.

@gpaulsen
Member Author

Well, we should have some way to tell users whether the API has changed and they need to recompile, versus whether we simply no longer support a certain interconnect.

IBM very much wants to maintain "forwards compatibility" for users' compiled applications (on our platforms) but is also in favor of culling older interconnect support that is no longer tested or used.

If we can distinguish between the two, we can help ensure the former while not precluding the latter.

@rhc54
Contributor

rhc54 commented Dec 21, 2017

I think what we need is a better understanding of why IBM seems so concerned about staying at 3.x instead of moving to 4.0. I get maintaining ABI for SpectrumMPI users, but that surely is just a corporate decision on what level to base your code on, and not a general OMPI community issue.

Are there things in master that you want in a 3.x release, but aren't scheduled for 3.1 inclusion? If so, why doesn't IBM just backport them in SpectrumMPI? We know you are maintaining your own patches (bug fix and feature) - isn't this just another set?

@gpaulsen
Member Author

gpaulsen commented Jan 2, 2018

I believe this is an issue for the Open MPI community as a whole. Every time the MPI library's .so name changes, end users must relink (and probably recompile and relink, to be safe) their applications, which in turn generally requires re-validation of their entire software stack.

In production environments, even if it's trivial to rebuild/relink an end user's application, policies can require days or weeks of validation after a rebuild. Policies usually allow for minor version updates which include changes that don't change .so version numbers, thereby allowing customers to upgrade to the next minor version at a much lower cost (in terms of validation testing).

Therefore I'm advocating a "let's not rev the major version numbers (or the user-facing MPI library .so versions) unless it's absolutely needed and planned for" strategy. Even in cases where we thought we needed to break ABI, we've found creative solutions to prevent the breakage or to delay it until a planned release.

@rhc54
Contributor

rhc54 commented Jan 2, 2018

Now I am further confused. We did deliberately plan to make a 4.0 break in 2018; it was openly discussed in the last devel meeting. The feeling was that enough changes were occurring to justify it, and that the 3.x series was completing its life with the 3.1.x releases (which are expected to continue throughout 2018).

Since many of us come from the national lab environment, we fully grok the validation issue, though I think you overstate it here 😄 Production codes never picked up the latest x.0 release right away, but stayed back one from there. So in this case, the labs will likely stay in the 3.0 series for at least 2018, and then move to 3.1 in 2019, going to 4.x (x > 0) in 2019/2020. Note that we (at least while I was there) always posted the newer releases so those wanting/needing access to the new features could use them.

So what precisely is your point of concern? You were one of the orgs pushing for a time-based release schedule; why is 2+ years of 3.x not adequate? Why would we want to distort the code base with workarounds simply to avoid a major release? And why is the labs' strategy not adequate for the customers you are concerned about?

@bwbarrett
Member

I think I agree with everyone on this thread, which means my head has exploded :).

A couple of notes / thoughts...

  1. Bumping the major version number of Open MPI does not mean we have to bump the shared library version to force a recompile.
  2. We have generally used the major version of Open MPI to indicate backwards-incompatible changes to either the library interface (i.e., bumping the shared library version to force a recompile) or the user interface (mpirun, removing transports, etc.).

So I'm not sure what we want to do with the release that follows 3.1.x. It seems like we're going around and around here; perhaps we should bring this topic up at the next telecon and see if we can make more progress there?
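The distinction in point 1 above (package major version vs. shared library version) can be sketched with libtool's `-version-info`, which drives the installed soname independently of the package version. A hypothetical `Makefile.am` fragment, not Open MPI's actual build files:

```makefile
# Hypothetical Makefile.am fragment: the package can move from 3.x to
# 4.0 while the library's interface version stays unchanged.
lib_LTLIBRARIES = libexample.la
libexample_la_SOURCES = example.c

# -version-info current:revision:age is libtool's *interface* version,
# not the package version. Leaving it unchanged across a 3.x -> 4.0
# package bump keeps the installed soname (and thus the ABI contract
# recorded in linked applications) the same.
libexample_la_LDFLAGS = -version-info 40:0:0
```

Under libtool's rules, `current` is bumped only when interfaces are added, removed, or changed, so a major package release with no interface changes need not touch it.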

@rhc54
Contributor

rhc54 commented Jan 2, 2018

Yeah, I think that makes sense. IIRC, the rationale here was that we planned to remove some things (e.g., the sm btl, mxm mtl) and have new options. One could argue that these could be delayed, but I'm trying to understand why, as the historical way of dealing with these version changes has seemed adequate and acceptable.

@jsquyres
Member

jsquyres commented Jan 3, 2018

I was still out of the office yesterday; I don't know if you had the Tuesday webex this week or not to discuss this stuff.

Here's my $0.02:

  • If we make backwards-incompatible changes, we need to bump the major version.
  • If we do not make backwards-incompatible changes, we (really really) should not change the major version.
  • Removing components and/or changing CLI or MCA parameters are backwards-incompatible changes.

Meaning: as @rhc54 pointed out, if we remove those components and/or change CLI/MCA params, then the next series needs to be 4.0.x. If we delay all those things (and no other backwards-incompatible changes occur relative to v3.0.x and v3.1.x), then the next series needs to be v3.2.x.

@rhc54
Contributor

rhc54 commented Jan 3, 2018

We did not meet this week, so this will get discussed next week (and likely run into the devel meeting before getting resolved). We all are in violent agreement over what you said. The issue is whether or not there should be a backwards-incompatible release in the first half of 2018. I think IBM is advocating for "no", but I still fail to grok the reasoning behind that request.

@gpaulsen
Member Author

gpaulsen commented Jan 3, 2018

@bwbarrett suggests that it's possible to rev to a new major version to incorporate backwards incompatible changes (like mpirun command line changes, or removal of components), but to NOT rev the user lib .so versions. This would support pre-built MPI apps, and more accurately describe that the change in Open MPI did not affect our ABI. It seems a somewhat confusing message, but perhaps this is a solution.

As @jsquyres said: If we do not make backwards-incompatible changes, we (really really) should not change the major version.

But how strong should that "really really" be? The beauty of Open MPI's component architecture is that there is a lot of flexibility to change the internals of Open MPI without affecting the layer above.

@rhc54
Contributor

rhc54 commented Jan 3, 2018

Again, I "really, really" want to understand what problem you are trying to solve. The user community has had a way of dealing with this that was considered acceptable and adequate for nearly 14 years. What precisely is the issue now driving us to modify our methods?

People argued (rather loudly) that our feature/stable release methods should be replaced by time-based releases, and that we would let the major version indicate breaks in compatibility. This was defined as broader than what is now being suggested - specifically, it included changes in command line options and behavioral mods that would be apparent to a user.

Revving the library is a totally different question - there are strict libtool rules that govern those versions, and they have absolutely nothing to do with the release versioning. So I don't understand why this conversation is even bringing those into the thread.

@jsquyres
Member

jsquyres commented Jan 3, 2018

BTW, #4635 and #4638 (currently pending for master) will definitely change the OSHMEM ABI.

@gpaulsen
Member Author

We discussed at our weekly meeting: https://github.com/open-mpi/ompi/wiki/WeeklyTelcon_20180109

The decision was to keep master/the next release at v4.0, but not break .so versioning unless an audit determines it's needed, on a library-by-library basis.

@gpaulsen gpaulsen closed this Jan 10, 2018
@gpaulsen gpaulsen deleted the version branch June 5, 2019 13:24