One method of debugger attach broken #1225

Closed
jsquyres opened this issue Dec 15, 2015 · 9 comments · Fixed by #1480

Comments

@jsquyres
Member

In master and v2.x, debugger attach for TotalView is currently broken. The problem is that when we upgraded to PMIx, we removed the OOB support from apps as it was no longer necessary. However, mpirun still sends a message to rank0 indicating that the debugger is ready, which releases rank0 to complete a barrier. Without OOB support in the apps, that message can no longer be delivered.

There are several ways of fixing this; there's ongoing discussion to pick the best one.

Just to summarize:

  • STAT: works (anything based on LaunchMON works)
  • DDT: works
  • TotalView: doesn't work
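
To illustrate the handshake described above, here is a minimal Python sketch. It is not Open MPI code: threads stand in for ranks, a `threading.Event` stands in for the "debugger is ready" message from mpirun, and `simulate_attach`, `release_delivered`, and `timeout` are hypothetical names made up for this example. The timeout exists only so that the simulation terminates; a real job would simply hang.

```python
import threading

def simulate_attach(num_ranks, release_delivered, timeout=0.2):
    """Return the sorted ranks that got past the init barrier."""
    debugger_ready = threading.Event()      # stand-in for the mpirun -> rank0 message
    barrier = threading.Barrier(num_ranks)  # stand-in for the barrier during init
    passed = []
    lock = threading.Lock()

    def rank(rank_id):
        if rank_id == 0 and not debugger_ready.wait(timeout):
            # The release never arrived. A real job would hang here; the
            # simulation breaks the barrier instead so that it can finish.
            barrier.abort()
            return
        try:
            barrier.wait()
        except threading.BrokenBarrierError:
            return
        with lock:
            passed.append(rank_id)

    threads = [threading.Thread(target=rank, args=(r,)) for r in range(num_ranks)]
    for t in threads:
        t.start()
    if release_delivered:
        debugger_ready.set()  # "mpirun" reports that the debugger is ready
    for t in threads:
        t.join()
    return sorted(passed)
```

With the release delivered, `simulate_attach(4, True)` returns all four ranks. With it lost, `simulate_attach(4, False)` returns an empty list, because no rank can leave the barrier until rank0 is released.
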
@hppritcha
Member

@rhc54 have you had a chance to look at this?

@rhc54
Contributor

rhc54 commented Jan 4, 2016

I've looked at it and have been working on a fix, but no ETA for committing it.

@jsquyres
Member Author

jsquyres commented Jan 5, 2016

Discussion from today's call...

There was a Webex to discuss this issue and what to do about it. Two main options emerged:

  1. Restore OOB functionality (including usock component).
  2. Use the PMIx notification system.

Restoring the OOB functionality for this one message seems like a lot of work, and it also seems like a step backwards.

The consensus seems to be to move forward and use the PMIx notification system (which means: finish implementing the PMIx notification system). @rhc54 is working on it. This will likely take a little time to finish and test. @gpaulsen thinks that IBM may be able to contribute some resources to help.

Note, too, that the PMIx notification stuff will be part of PMIx v1.2. Meaning: if we want TotalView attach to work in Open MPI v2.0.0 (as of today: we do), we'll need to update the v2.x branch with PMIx v1.

@hppritcha hppritcha modified the milestones: v2.1.0, v2.0.0 Jan 19, 2016
@gpaulsen
Member

From Telcon Call: https://github.com/open-mpi/ompi/wiki/WeeklyTelcon_20160112
Decided this is NOT a blocker for 2.0.0, but we want it fixed in the next release.

@jsquyres
Member Author

When this issue is fixed, please also revert the change (on v2.x) from open-mpi/ompi-release#905.

@jsquyres
Member Author

Per discussion on 24 Feb 2016, moving this milestone back to v2.0.1. Rationale: it's a bug fix, and it does not affect our backwards compatibility promises for the 2.x series.

@jsquyres jsquyres modified the milestones: v2.0.1, v2.1.0 Feb 24, 2016
@rhc54
Contributor

rhc54 commented Feb 26, 2016

@jsquyres @hppritcha @hjelmn Just an FYI: I noticed that something is broken in the pmix120 component - I'm getting hangs during regular init (i.e., no debugger) on my Mac. Not sure what may have broken, but I'll fix it on Fri.

@gpaulsen
Member

gpaulsen commented Mar 3, 2016

Any update? Is this a 2.0 blocker?

@jsquyres
Member Author

jsquyres commented Mar 3, 2016

We're iterating with TotalView. The first version we sent to them didn't work; we sent them another one last night.

jsquyres pushed a commit to jsquyres/ompi that referenced this issue Aug 23, 2016
…thus fixing show_help aggregation.

Fixes open-mpi#1467

Restore debugger attach operations

Fixes open-mpi#1225

(cherry picked from commit open-mpi/ompi@c146c49)

Fix the debugger attach - previous commit had fixed one instance of a check prior to sending the release message, but there was a second code path that included a similar check that was missed. Thanks to John DelSignore for spotting it!

(cherry picked from commit open-mpi/ompi@4a62377)

Very minor typo

(cherry picked from commit open-mpi/ompi@6e6bbfd)
jsquyres pushed a commit to jsquyres/ompi that referenced this issue Aug 23, 2016
bosilca pushed a commit to bosilca/ompi that referenced this issue Oct 3, 2016
…supported by the pmix120 component, which is not selected by default. All other components will ignore error registration requests, and thus do not support debugger attach when launched via mpirun. Note that direct launched applications will support such attachment, but may not do so in a scalable fashion.

Fixes #open-mpi#1225
bosilca pushed a commit to bosilca/ompi that referenced this issue Oct 3, 2016
…thus fixing show_help aggregation.

Fixes open-mpi#1467

Restore debugger attach operations

Fixes open-mpi#1225