[release/6.0] Fix stress issues around multiple threads throwing the same exceptions #57959

Merged: 2 commits merged into release/6.0 from backport/pr-57684-to-release/6.0 on Aug 23, 2021


github-actions[bot] commented Aug 23, 2021

Backport of #57684 to release/6.0

/cc @davidwrighton

Customer Impact

In very rare situations, throwing an exception could cause the runtime to hit an access violation (AV) by dereferencing NULL, which fails fast the runtime. The issue was found through internally reported failures under GCStress, where this failure mode is somewhat more likely; however, that GCStress failure mode can only occur on a checked (chk) build of the runtime.

Testing

Manual testing of a scenario known to cause problems in rare circumstances did not reproduce the failure, and there appears to be no impact on any other CI test. However, because the failure is difficult to pin down, it's entirely plausible that the issue isn't completely fixed.

Risk

Low.

…s

- The Watson codebase manipulates the following fields on Exception without locks when multiple threads throw the same exception:
  - _stackTrace
  - _stackTraceString
  - _remoteStackTraceString
  - _watsonBuckets
  - _ipForWatsonBuckets
- By design, these APIs only need to be "mostly" correct: they are used only in fatal shutdown scenarios, so exact correctness is not required for correct program execution.
- However, some race conditions have been seen recently in testing:
  1. In some circumstances the value is read more than once: a first read checks for NULL, and a second read fetches the actual value and uses it. If a racing thread sets the value to NULL between the two reads, the runtime can crash. To fix this, the code paths that could lead to crashes are refactored to perform a single read and carry the read value to where it is needed.
  2. The C++ memory model generally allows a single read written in C++ to be turned into multiple reads when the compiler can prove that the read does not cross a lock or memory barrier, so the C++ compiler may inject multiple reads where the logic naturally has only one. The fix is to use the VolatileLoadWithoutBarrier API to require that a read happens exactly once where multiple reads could cause a problem.
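The double-read race in item 1 and the single-read fix can be sketched in standalone C++. This is a minimal illustration under stated assumptions, not the runtime's code: `g_sharedField`, `kTrace`, `TraceLengthSingleRead`, and `RaceSingleRead` are hypothetical names, and a relaxed `std::atomic` load stands in for CoreCLR's VolatileLoadWithoutBarrier as a portable way to guarantee the field is read exactly once without adding a barrier. The field only ever points at a string literal, so a concurrently cleared value never leaves the reader with a dangling pointer.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <thread>

// Hypothetical stand-in for an exception field such as _stackTraceString.
// Writers may set it, or clear it to nullptr, concurrently and without a lock.
// It only ever points at a string literal, so the pointee is never freed.
static std::atomic<const char*> g_sharedField{nullptr};
static const char kTrace[] = "hypothetical stack trace text";

// Racy pattern (NOT used below): two separate reads of the same field.
//   if (g_sharedField.load(...) != nullptr)
//       return std::strlen(g_sharedField.load(...)); // may now be nullptr -> AV
//
// Fixed pattern: read the field once into a local and use that local for
// both the NULL check and the actual use.
std::size_t TraceLengthSingleRead() {
    // The atomic load happens exactly once; the compiler cannot re-read the
    // field and hand us a different value. Relaxed order adds no barrier.
    const char* value = g_sharedField.load(std::memory_order_relaxed);
    if (value == nullptr) {
        return 0;
    }
    return std::strlen(value);
}

// Races one writer (toggling the field) against the reader for `iterations`
// rounds and returns how many non-empty values the reader observed.
int RaceSingleRead(int iterations) {
    std::thread writer([iterations] {
        for (int i = 0; i < iterations; ++i) {
            g_sharedField.store((i % 2 == 0) ? kTrace : nullptr,
                                std::memory_order_relaxed);
        }
    });
    int nonEmpty = 0;
    for (int i = 0; i < iterations; ++i) {
        std::size_t len = TraceLengthSingleRead();
        // Each observation is self-consistent: either unset or the full string.
        assert(len == 0 || len == std::strlen(kTrace));
        if (len != 0) {
            ++nonEmpty;
        }
    }
    writer.join();
    return nonEmpty;
}
```

The key design point is that the NULL check and the use share one local copy, so a concurrent clear can only make the reader see "unset", never crash it.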

Finally, test45929 tended to fail under GC stress because it takes a very long time to run under GC stress or on some hardware. Adjust it so that it shuts down after about 2.5 minutes.
- Do this instead of disabling the test under GC stress, as there is evidence that bugs may have been seen during GC stress runs.
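The time-bounded approach (keep the test running under GC stress, but cap its wall-clock time) can be sketched as follows. This is an illustrative C++ sketch, not the actual change to test45929 (which is a managed test); `RunTimeBoundedStress`, `kDefaultBudget`, and the work-item callback are hypothetical. Each worker stops either when it finishes its iteration count or when a shared deadline, defaulting to roughly 2.5 minutes, has passed.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical default budget matching "about 2.5 minutes" (150 seconds).
constexpr std::chrono::milliseconds kDefaultBudget{150000};

// Runs `workItem` on `threadCount` threads until each thread has done
// `maxIterationsPerThread` iterations or the time budget expires, whichever
// comes first. Returns the total number of iterations actually executed.
std::uint64_t RunTimeBoundedStress(int threadCount,
                                   std::uint64_t maxIterationsPerThread,
                                   std::chrono::milliseconds budget,
                                   void (*workItem)()) {
    assert(threadCount > 0);
    // steady_clock is monotonic, so the deadline is unaffected by clock changes.
    const auto deadline = std::chrono::steady_clock::now() + budget;
    std::atomic<std::uint64_t> completed{0};

    std::vector<std::thread> workers;
    workers.reserve(threadCount);
    for (int t = 0; t < threadCount; ++t) {
        workers.emplace_back([&] {
            for (std::uint64_t i = 0; i < maxIterationsPerThread; ++i) {
                // Stop early once the budget is spent, so a slow environment
                // (e.g. GC stress) ends the run instead of timing out.
                if (std::chrono::steady_clock::now() >= deadline) {
                    return;
                }
                workItem();
                completed.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    // All captured locals outlive the workers because we join here.
    for (auto& worker : workers) {
        worker.join();
    }
    return completed.load();
}
```

On fast hardware the iteration count is reached first and the run behaves as before; under GC stress the deadline is hit first and the test still exercises the racy paths rather than being skipped.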

Fixes #46803
@dotnet-issue-labeler

I couldn't figure out the best area label to add to this PR. If you have write-permissions please help me learn by adding exactly one area label.


jeffschwMSFT left a comment


Approved. Please get a code review and once there is a green CI we can merge.

@davidwrighton

@jeffschwMSFT I don't have permissions to merge in this branch. Do we need to set a label so that this gets looked at by a special merge person?

jeffschwMSFT merged commit b7df9ff into release/6.0 on Aug 23, 2021
danmoseley deleted the backport/pr-57684-to-release/6.0 branch on August 23, 2021 23:11
ghost locked as resolved and limited conversation to collaborators on Sep 23, 2021

3 participants