Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[release/7.0] Fix macOS floating point corruption #75467

Merged
merged 1 commit into from
Sep 12, 2022

Conversation

github-actions[bot]
Copy link
Contributor

@github-actions github-actions bot commented Sep 12, 2022

Backport of #75440 to release/7.0

/cc @janvorli

Customer Impact

Very rare and extremely difficult to diagnose corruption of floating point values.

Testing

CI tests, local coreclr tests and especially running the test Regression/coreclr/GitHub_16833 where that problem was occurring randomly in our CI and that can reproduce the issue once in several thousands iterations.

Risk

Very low, it only changes the order of setting floating point / general purpose + control context parts on a thread that is returning from hardware exception handling.

There is a race condition between activation signal handling and returning
from a hardware exception handler on macOS. It shows up intermittently in
the Regression/coreclr/GitHub_16833 test in the CI and I am able to repro
it on my local mac once in several thousands of iterations of the test when
running with GC stress C.

It turned out the issue is caused by the order in which we set parts of the
context in the thread when returning from the hardware exception handler.
MacOS can only set the floating point and control / integer portions separately.
We were setting the control / integer portion first and the floating point
portion after that. In the race condition, the signal handling code in the
macOS extracts the context that contains the new control registers, but the
old floating point ones (which is the state of those in the PAL_DispatchException).
The signal handler for the activation injection then gets executed and when it
returns later to our managed code, the floating point registers get restored
to the wrong values.

The fix is to change the context setting to first set the floating point
registers and then the control / integer portion of the context.

Close #66568
@janvorli janvorli self-assigned this Sep 12, 2022
@janvorli janvorli added this to the 7.0.0 milestone Sep 12, 2022
Copy link
Member

@jeffschwMSFT jeffschwMSFT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

approved. once we get a green ci we can merge

@carlossanlop
Copy link
Member

CI is green. Approved and signed off. Ready to merge. :shipit:

@carlossanlop carlossanlop merged commit 1fc3b9e into release/7.0 Sep 12, 2022
@carlossanlop carlossanlop deleted the backport/pr-75440-to-release/7.0 branch September 12, 2022 22:07
@ghost ghost locked as resolved and limited conversation to collaborators Oct 13, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants