Refactor STM par asym tests and increment count #377
Merged
This PR should stabilize the output of the asym tests, which have been noisier since #368 was merged, switching from `Semaphore.Binary` to an `int Atomic.t`. The end result (bumping the count) is not as exciting as the story of how I got there.

I first factored out the negative `asym` tests into a separate file, to be sure that state from the preceding non-asym `ref` tests was not affecting the outcome, and then ran focused tests on them.

Across all architectures on both CI systems, 20/20 negative tests succeeded, except on GitHub Actions for macOS 5.1 and macOS trunk:
The last one had several pairs with both the int and int64 versions failing.
This prompted me to run stats using the hackish `generic-stats` branch. I did so both locally and across both CI systems.

The results are fascinating:
Based on these, one can then prove (at 95% confidence) that on both Linux and macOS 5.1, `int ref` and `int64 ref` work better locally than on GA. For example:

More interestingly, the CI error rates from the stats do not line up with the error rates that I was observing on the focused tests!
Overall, the `int Atomic.t` trick works great locally (and on ocaml-ci machines), provably better than on GitHub Actions.

As a consequence, increasing the count to 5000 for this negative test has no visible effect on any other platform, but it stabilizes the output for the sensitive macOS runners on GitHub Actions 🤓