Measure statistical significance of Domain setup #47

jmid · 2022-04-16T23:34:52Z

There's one remaining usage of cpu_relax: spinning in the first domain while waiting for the second domain to start up:
https://github.com/jmid/multicoretests/blob/8a9a2327e06036f06ca5ef4b1321129ccff557d6/lib/lin.ml#L122-L124
Now that we have statistics in place, it would be natural to give this Domain setup a run-down to see which aspects actually influence the bug-finding ability, similar to what I did for Thread recently: https://github.com/jmid/multicoretests/blob/8a9a2327e06036f06ca5ef4b1321129ccff557d6/src/statistics/README.md?plain=1#L129-L143

For Thread, a wait loop had a significant effect. For Domain, it would be nice to confirm this, and also to investigate whether there are better ways to accomplish it. In the tests for the work-stealing deque that has now been pulled out of domainslib, the spinning did not work at all to trigger issues on MacOSX, so I ended up going with a binary semaphore: https://github.com/jmid/multicoretests/blob/8a9a2327e06036f06ca5ef4b1321129ccff557d6/src/domainslib/ws_deque_test.ml#L131-L133
The simpler, the better. A combination of a Mutex and a Condition variable may also be sufficient.
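
To make the three options concrete, here is a minimal sketch of each start-up handshake, assuming OCaml 5 (where `Domain`, `Atomic`, `Mutex`, `Condition` and `Semaphore` are all in the standard library). The names `run_spin`, `run_sem` and `run_cond` are hypothetical helpers made up for illustration; they are not the code in `lin.ml` or `ws_deque_test.ml`.

```ocaml
(* Each helper spawns a second domain running [g], waits until that domain
   has signalled that it is up, then runs [f] in the current domain. *)

(* Variant 1: spin on an atomic flag with Domain.cpu_relax. *)
let run_spin f g =
  let started = Atomic.make false in
  let d = Domain.spawn (fun () -> Atomic.set started true; g ()) in
  while not (Atomic.get started) do Domain.cpu_relax () done;
  let x = f () in
  let y = Domain.join d in
  (x, y)

(* Variant 2: a binary semaphore that starts out unavailable. *)
let run_sem f g =
  let sem = Semaphore.Binary.make false in
  let d = Domain.spawn (fun () -> Semaphore.Binary.release sem; g ()) in
  Semaphore.Binary.acquire sem;   (* blocks until the child has released *)
  let x = f () in
  let y = Domain.join d in
  (x, y)

(* Variant 3: a Mutex plus a Condition variable guarding a flag. *)
let run_cond f g =
  let m = Mutex.create () in
  let c = Condition.create () in
  let started = ref false in
  let d =
    Domain.spawn (fun () ->
        Mutex.lock m;
        started := true;
        Condition.signal c;
        Mutex.unlock m;
        g ())
  in
  Mutex.lock m;
  while not !started do Condition.wait c m done;  (* loop guards against spurious wake-ups *)
  Mutex.unlock m;
  let x = f () in
  let y = Domain.join d in
  (x, y)
```

All three return the pair of results, so they can be swapped for one another when comparing how often each handshake exposes a bug.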

Originally posted by @jmid in #43 (comment)


n-osborne · 2022-09-27T13:08:10Z

I've been trying to get some numbers comparing bug-triggering with cpu_relax and with a semaphore. I have some strange results (no buggy programs found in 10000 runs, while the CI is happy with 1000...) that I don't understand yet, but it seems that synchronization with a semaphore is a bit faster than with cpu_relax:

$ dune exec -- src/neg_tests/conclist_stm_tests.exe
random seed: 138767447
generated error  fail  pass / total     time test name
[✓] 10000     0     0 10000 / 10000   103.1s STM int64 CList with cpu_relax
[✓] 10000     0     0 10000 / 10000    78.5s STM int64 CList with semaphore
================================================================================
success (ran 2 tests)
relax : 0 / 10000
semap : 0 / 10000

Code is here: https://github.com/n-osborne/multicoretests/blob/domain-stats/src/neg_tests/conclist_stm_tests.ml#L53 and here: https://github.com/n-osborne/multicoretests/blob/domain-stats/lib/STM.ml#L391

jmid · 2022-09-27T15:55:16Z

It is indeed interesting that the Semaphore is faster than the "Atomic waiting loop" 👍 🤔

I had a quick look:

* When an exception is raised `mk_prop` does not increase the counter (I think it should)
* I also noticed that the stats tests are not using `repeat`. To be comparable to the CI's 1000 iterations I would try to use it here too.
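
As an illustration of the first point, a counting wrapper could treat an exception like any other failed run. This is only a sketch: `count_failures` is a hypothetical name, not the `mk_prop` from the linked branch, and returning `true` in every case is an assumption made so that the test run continues and the counter carries the statistic.

```ocaml
(* [count_failures counter prop input] runs [prop input] and increments
   [counter] when the property returns [false] or raises an exception.
   It always returns [true] (an assumption, see above). *)
let count_failures counter prop input =
  match prop input with
  | true -> true
  | false -> Atomic.incr counter; true
  | exception _ -> Atomic.incr counter; true
```

An `Atomic.t` counter is used here only to keep the sketch safe if the wrapper is ever called from more than one domain; a plain `ref` would do when it is called from a single domain.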

n-osborne · 2022-09-28T14:38:18Z

> * When an exception is raised `mk_prop` does not increase the counter (I think it should)

Yes, that works better that way.

> * I also noticed that the stats tests are not using `repeat`. To be comparable to the CI's 1000 iterations I would try to use it here too.

That was just to have something a bit more accurate for speed.

So semaphores are indeed faster, but they spot far fewer buggy programs:

This is with `repeat 25 prop`.

$ dune exec -- src/neg_tests/conclist_stm_tests.exe
random seed: 300478220
generated error  fail  pass / total     time test name
[✓] 10000     0     0 10000 / 10000  3302.4s STM int64 CList with cpu_relax
[✓] 10000     0     0 10000 / 10000  1970.2s STM int64 CList with semaphore
================================================================================
success (ran 2 tests)
relax : 36868 / 10000
semap : 8 / 10000

jmid · 2022-09-28T16:18:08Z

Ah, that is indeed quite a difference! 😮

I'm surprised by the number 36868 though!
Because of the way Util.repeat is implemented it should stop early on the first failed property.
I would thus expect it to increment the counter at most once for each of the 25 repetitions and hence reach at most 10000. 🤔
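
The reasoning can be checked with a small sketch. The `repeat` below is a stand-in written for illustration, assumed to short-circuit like `Util.repeat`; the two wrappers and `count_with` are hypothetical. It shows that a counter stays at or below one increment per input only if the wrapped property actually reports the failure; a wrapper that counts but always returns `true` never triggers the early stop and can increment up to 25 times per input. Whether that is what happens in the linked branch is not established in this thread.

```ocaml
(* Stand-in for a short-circuiting repeat: stops at the first [false]. *)
let rec repeat n prop input =
  if n < 0 then invalid_arg "repeat: negative repetition count";
  n = 0 || (prop input && repeat (n - 1) prop input)

let failures = ref 0

(* Counts a failure and reports it, so [repeat] stops early. *)
let counted prop input =
  let ok = prop input in
  if not ok then incr failures;
  ok

(* Counts a failure but hides it, so [repeat] keeps going. *)
let counted_hidden prop input =
  ignore (counted prop input);
  true

(* Run 25 repetitions of an always-failing property on one input
   and return how many times the counter was incremented. *)
let count_with wrapper =
  failures := 0;
  ignore (repeat 25 (wrapper (fun () -> false)) ());
  !failures

let () =
  Printf.printf "reported: %d, hidden: %d\n"
    (count_with counted) (count_with counted_hidden)
```

With the reporting wrapper the counter is incremented once per input, which over 10000 inputs gives the expected upper bound of 10000.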

jmid mentioned this issue Apr 16, 2022

Unnecessary calls to cpu_relax when sequentially executing commands #43

Closed

jmid mentioned this issue Jun 8, 2023

Expand on stats to guide improvements #362

Open
