Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor STM par asym tests and increment count #377

Merged
merged 2 commits into from
Jul 15, 2023
Merged

Conversation

jmid
Copy link
Collaborator

@jmid jmid commented Jul 14, 2023

This PR should stabilize the output of the asym tests which have added noise since the merge of #368 that switched from Semaphore.Binary to an int Atomic.t. The end result (bumping the count) is not as exciting as the story to get there.

I first factored out the negative asym tests into a separate file to be sure that the state from the previous non-asym ref tests were not affecting the outcome, and then ran focused tests on them.

Across all architectures on both CI systems 20/20 negative tests succeeded, except for on GitHub actions, macOS 5.1 and macOS trunk:

 macOS 5.1:     6 / 20+20  (int+int64)
 macOS trunk:  15 / 20+20  (int+int64)

The last one had several pairs with both the int and int64 versions failing.

This prompted me to run stats using the hackish generic-stats branch. I did so both locally and across both CI systems.
The results are fascinating:

  Local:               int ref      int64 ref
   Linux 5.1         1106 / 5000   1115 / 5000
   Linux 5.0         1120 / 5000   1208 / 5000
   MacOS 5.1          783 / 5000    809 / 5000
   macOS 5.0          799 / 5000    801 / 5000

  Multicoretest-CI:
   linux-arm64-5.0   1022 / 5000   1018 / 5000
   linux-arm64-5.1    948 / 5000    961 / 5000
   linux-arm64-5.2   1059 / 5000   1107 / 5000
   linux-s390x-5.1    302 / 5000    242 / 5000
   linux-s390x-5.2    238 / 5000    236 / 5000
   linux-ppc64le-5.2 1078 / 5000   1111 / 5000
   macos-arm64-5.0    671 / 5000    626 / 5000
   macos-arm64-5.1    659 / 5000    624 / 5000
   macos-arm64-5.2    701 / 5000    667 / 5000

  GitHub actions:
   Linux 5.1          202 / 5000    209 / 5000
   Linux trunk        158 / 5000    177 / 5000
   Linux 5.1 debug    211 / 5000    220 / 5000
   Linux trunk debug  329 / 5000    358 / 5000
   32bit 5.1          418 / 5000    409 / 5000
   32bit trunk        448 / 5000    434 / 5000
   Bytecode 5.1       457 / 5000    546 / 5000
   Bytecode trunk     320 / 5000    335 / 5000
   macOS 5.1          631 / 5000    644 / 5000
   macOS trunk        443 / 5000    420 / 5000
   Windows 5.1        258 / 5000    246 / 5000
   Win bytecode 5.1   378 / 5000    362 / 5000
   Windows trunk      467 / 5000    494 / 5000
   Win bytecode trunk 287 / 5000    285 / 5000

Based on it one can then prove (at 95% confidence) that both Linux and macOS 5.1 int ref and int64 ref works better locally than on GA. For example,

$ dune exec src/statistics/z_test.exe 5000 202 5000 1106
z-test of two proportions         
z = -26.810487
Is |z| = |-26.810487| > z_alpha2 = 1.960000 ?
Yes, null hypothesis rejected

More interestingly, the CI error rates from the stats do not line up with the error rates that I was observing on the focused tests!

Overall

  • there's a wide spread in error rates across platforms: 158-1111 out of 5000
  • the int Atomic.t trick works great locally (and on ocaml-ci machines) - provably better than on GitHub actions
  • the CI stats do not reflect the error rates of actual CI test runs (this echos a concern you had on Add the possibility to compute statistics #369 @shym!)
  • the macOS GitHub action runners seem load sensitive

As a consequence, increasing the count to 5000 for this negative test has no visible effect on any other platform - but it stabilizes the output for sensitive macOS-runners on GitHub actions 🤓

@jmid
Copy link
Collaborator Author

jmid commented Jul 15, 2023

CI summary:

Most importantly: no asym failures in sight, so I'll merge 🤞

@jmid jmid merged commit 077da06 into main Jul 15, 2023
44 of 47 checks passed
@jmid jmid deleted the stm-par-asym-refactor branch July 15, 2023 07:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

'STM _ ref test parallel asymmetric' failure to trigger
1 participant