Refactor STM par asym tests and increment count #377

jmid · 2023-07-14T16:03:53Z

This PR should stabilize the output of the asym tests which have added noise since the merge of #368 that switched from Semaphore.Binary to an int Atomic.t. The end result (bumping the count) is not as exciting as the story to get there.

I first factored out the negative asym tests into a separate file to be sure that the state from the previous non-asym ref tests were not affecting the outcome, and then ran focused tests on them.

Across all architectures on both CI systems 20/20 negative tests succeeded, except for on GitHub actions, macOS 5.1 and macOS trunk:

 macOS 5.1:     6 / 20+20  (int+int64)
 macOS trunk:  15 / 20+20  (int+int64)

The last one had several pairs with both the int and int64 versions failing.

This prompted me to run stats using the hackish generic-stats branch. I did so both locally and across both CI systems.
The results are fascinating:

  Local:               int ref      int64 ref
   Linux 5.1         1106 / 5000   1115 / 5000
   Linux 5.0         1120 / 5000   1208 / 5000
   MacOS 5.1          783 / 5000    809 / 5000
   macOS 5.0          799 / 5000    801 / 5000

  Multicoretest-CI:
   linux-arm64-5.0   1022 / 5000   1018 / 5000
   linux-arm64-5.1    948 / 5000    961 / 5000
   linux-arm64-5.2   1059 / 5000   1107 / 5000
   linux-s390x-5.1    302 / 5000    242 / 5000
   linux-s390x-5.2    238 / 5000    236 / 5000
   linux-ppc64le-5.2 1078 / 5000   1111 / 5000
   macos-arm64-5.0    671 / 5000    626 / 5000
   macos-arm64-5.1    659 / 5000    624 / 5000
   macos-arm64-5.2    701 / 5000    667 / 5000

  GitHub actions:
   Linux 5.1          202 / 5000    209 / 5000
   Linux trunk        158 / 5000    177 / 5000
   Linux 5.1 debug    211 / 5000    220 / 5000
   Linux trunk debug  329 / 5000    358 / 5000
   32bit 5.1          418 / 5000    409 / 5000
   32bit trunk        448 / 5000    434 / 5000
   Bytecode 5.1       457 / 5000    546 / 5000
   Bytecode trunk     320 / 5000    335 / 5000
   macOS 5.1          631 / 5000    644 / 5000
   macOS trunk        443 / 5000    420 / 5000
   Windows 5.1        258 / 5000    246 / 5000
   Win bytecode 5.1   378 / 5000    362 / 5000
   Windows trunk      467 / 5000    494 / 5000
   Win bytecode trunk 287 / 5000    285 / 5000

Based on it one can then prove (at 95% confidence) that both Linux and macOS 5.1 int ref and int64 ref works better locally than on GA. For example,

$ dune exec src/statistics/z_test.exe 5000 202 5000 1106
z-test of two proportions         
z = -26.810487
Is |z| = |-26.810487| > z_alpha2 = 1.960000 ?
Yes, null hypothesis rejected

More interestingly, the CI error rates from the stats do not line up with the error rates that I was observing on the focused tests!

Overall

there's a wide spread in error rates across platforms: 158-1111 out of 5000
the int Atomic.t trick works great locally (and on ocaml-ci machines) - provably better than on GitHub actions
the CI stats do not reflect the error rates of actual CI test runs (this echos a concern you had on Add the possibility to compute statistics #369 @shym!)
the macOS GitHub action runners seem load sensitive

As a consequence, increasing the count to 5000 for this negative test has no visible effect on any other platform - but it stabilizes the output for sensitive macOS-runners on GitHub actions 🤓

jmid · 2023-07-15T07:39:43Z

CI summary:

Windows bytecode 5.1 timeout due to a threadomain timeout [ocaml5-issue] Windows failures on threadomain #203
Windows 5.1 timeout during to an excessive 12182.5s shrink run of Lin DSL In_channel test with Domain Long In_channel shrink runs causing timeout #373
macOS arm64 5.1 failed to trigger STM int64 ref test with Thread Improve reproducibility of Lin/STM Thread modes #338

Most importantly: no asym failures in sight, so I'll merge 🤞

jmid added 2 commits July 14, 2023 14:56

Factor out asym ref tests

881ea21

Bump asym count to account for sensitive GA runners

a3a67a7

jmid linked an issue Jul 14, 2023 that may be closed by this pull request

'STM _ ref test parallel asymmetric' failure to trigger #364

Open

jmid mentioned this pull request Jul 15, 2023

'STM _ ref test parallel asymmetric' failure to trigger #364

Open

jmid merged commit 077da06 into main Jul 15, 2023
44 of 47 checks passed

jmid deleted the stm-par-asym-refactor branch July 15, 2023 07:40

jmid mentioned this pull request Aug 18, 2023

[ocaml5-issue] General Cygwin slowness #384

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactor STM par asym tests and increment count #377

Refactor STM par asym tests and increment count #377

jmid commented Jul 14, 2023

jmid commented Jul 15, 2023

Refactor STM par asym tests and increment count #377

Refactor STM par asym tests and increment count #377

Conversation

jmid commented Jul 14, 2023

jmid commented Jul 15, 2023