Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

opal/fifo: fix 128-bit atomic fifo on Power9 #5374

Merged
merged 1 commit into from
Jul 10, 2018

Conversation

hjelmn
Copy link
Member

@hjelmn hjelmn commented Jul 3, 2018

This commit updates the atomic fifo code to fix a consistency issue
observed on Power9 systems when builtin atomics are used. The cause
was two things: 1) a missing write memory barrier in fifo push, and 2)
a read ordering issue when reading the fifo head non-atomically. This
commit fixes both issues and appears to correct then inconsistency.

Signed-off-by: Nathan Hjelm [email protected]

@hjelmn
Copy link
Member Author

hjelmn commented Jul 3, 2018

@jsquyres Turns out that Power 9 can support 128-bit compare-exchange. It has LL/SC instructions for reading/writing quad words.

@hjelmn
Copy link
Member Author

hjelmn commented Jul 3, 2018

FYI @gpaulsen. Targeted for v3.0.x and v3.1.x.

@gpaulsen
Copy link
Member

gpaulsen commented Jul 3, 2018

Our CI is power8, but I think Josh disabled make-check.
I'm out rest of the week, but next week I can try it on our power8 and power9 by hand.

This commit updates the atomic fifo code to fix a consistency issue
observed on Power9 systems when builtin atomics are used. The cause
was two things: 1) a missing write memory barrier in fifo push, and 2)
a read ordering issue when reading the fifo head non-atomically. This
commit fixes both issues and appears to correct then inconsistency.

Signed-off-by: Nathan Hjelm <[email protected]>
@jsquyres
Copy link
Member

jsquyres commented Jul 9, 2018

@gpaulsen Any news? 😄

@gpaulsen
Copy link
Member

gpaulsen commented Jul 9, 2018

I'll give it a go.

@gpaulsen
Copy link
Member

gpaulsen commented Jul 10, 2018

Build and runs fine on power8 and power9, both with RHEL 7.5 and gcc 4.8.5

'make check' reported all PASSES (including opal_fifo).

NOTE: mpool_memkind was skipped because configure was not able to build it (no memkind.h at configure time). I assume this is not related to this opal_fifo change.

Thanks again @hjelmn

@gpaulsen gpaulsen self-requested a review July 10, 2018 21:27
Copy link
Member

@gpaulsen gpaulsen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't see any obvious issues, though I am not a memory barrier expert.
runs well on power8 and power9.

@hjelmn hjelmn merged commit 8b09010 into open-mpi:master Jul 10, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants