Measure statistical significance of Domain setup #47

Open
jmid opened this issue Apr 16, 2022 · 4 comments

Comments

@jmid
Collaborator

jmid commented Apr 16, 2022

There's one remaining usage of cpu_relax, in spinning the first domain while waiting for the second domain to start up:
https://github.com/jmid/multicoretests/blob/8a9a2327e06036f06ca5ef4b1321129ccff557d6/lib/lin.ml#L122-L124
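For reference, a minimal sketch of such a spin-wait handshake, assuming OCaml 5's `Domain` and `Atomic` modules. The function name `run_par_spin` is hypothetical, and this is not the code at the linked lines, only an illustration of the technique:

```ocaml
(* Hypothetical sketch: run [left] and [right] in parallel on two domains.
   The first domain spin-waits until the second one has started. *)
let run_par_spin (left : unit -> 'a) (right : unit -> 'b) : 'a * 'b =
  let started = Atomic.make false in
  let d1 =
    Domain.spawn (fun () ->
        (* Busy-wait, hinting to the CPU that we are spinning. *)
        while not (Atomic.get started) do
          Domain.cpu_relax ()
        done;
        left ())
  in
  let d2 =
    Domain.spawn (fun () ->
        (* Signal that the second domain is up and running. *)
        Atomic.set started true;
        right ())
  in
  let a = Domain.join d1 in
  let b = Domain.join d2 in
  (a, b)

let () =
  let x, y = run_par_spin (fun () -> 1 + 1) (fun () -> "done") in
  (* prints "2 done" *)
  Printf.printf "%d %s\n" x y
```

The first domain keeps a core busy until the flag flips, which likely makes the two domains begin their work close together in time.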
Now that we have statistics in place, it would be natural to give this Domain setup a similar run-down, to see which aspects actually influence its bug-finding ability, as I did for Thread recently: https://github.com/jmid/multicoretests/blob/8a9a2327e06036f06ca5ef4b1321129ccff557d6/src/statistics/README.md?plain=1#L129-L143

For Thread, a wait loop had a significant effect. For Domain it would be nice to confirm this, and also to investigate whether there are better ways to accomplish it. In the tests for the work-stealing deque (which has since been pulled out of domainslib), spinning did not trigger any issues at all on macOS, so I ended up using a binary semaphore: https://github.com/jmid/multicoretests/blob/8a9a2327e06036f06ca5ef4b1321129ccff557d6/src/domainslib/ws_deque_test.ml#L131-L133
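A comparable sketch using the standard library's `Semaphore.Binary`. Again, `run_par_sem` is a hypothetical name, and this is not the code in `ws_deque_test.ml`:

```ocaml
(* Hypothetical sketch: same handshake, but the first domain blocks on a
   binary semaphore instead of spinning. *)
let run_par_sem (left : unit -> 'a) (right : unit -> 'b) : 'a * 'b =
  (* Initially unavailable: the first acquire blocks until a release. *)
  let sem = Semaphore.Binary.make false in
  let d1 =
    Domain.spawn (fun () ->
        Semaphore.Binary.acquire sem;
        left ())
  in
  let d2 =
    Domain.spawn (fun () ->
        Semaphore.Binary.release sem;
        right ())
  in
  let a = Domain.join d1 in
  let b = Domain.join d2 in
  (a, b)

let () =
  let x, y = run_par_sem (fun () -> 1 + 1) (fun () -> "done") in
  (* prints "2 done" *)
  Printf.printf "%d %s\n" x y
```

Here the waiting domain blocks instead of spinning. That frees the CPU, but the domain has to be woken up after `release`, so the two domains may start their work further apart in time than with spinning.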
The simpler, the better. A combination of a Mutex and a Condition variable may also be sufficient.
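A Mutex and Condition variable version could look like the following sketch. It assumes OCaml 5, where `Mutex` and `Condition` are part of the standard library, and `run_par_cond` is a hypothetical name. The `while` loop re-checks the flag to guard against spurious wake-ups:

```ocaml
(* Hypothetical sketch: the first domain waits on a condition variable
   until the second domain sets [started] and signals. *)
let run_par_cond (left : unit -> 'a) (right : unit -> 'b) : 'a * 'b =
  let m = Mutex.create () in
  let c = Condition.create () in
  let started = ref false in (* protected by [m] *)
  let d1 =
    Domain.spawn (fun () ->
        Mutex.lock m;
        while not !started do
          Condition.wait c m
        done;
        Mutex.unlock m;
        left ())
  in
  let d2 =
    Domain.spawn (fun () ->
        Mutex.lock m;
        started := true;
        Condition.signal c;
        Mutex.unlock m;
        right ())
  in
  let a = Domain.join d1 in
  let b = Domain.join d2 in
  (a, b)

let () =
  let x, y = run_par_cond (fun () -> 1 + 1) (fun () -> "done") in
  (* prints "2 done" *)
  Printf.printf "%d %s\n" x y
```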

Originally posted by @jmid in #43 (comment)

@n-osborne
Contributor

I've been trying to get some numbers comparing how often bugs are triggered with cpu_relax versus a semaphore. I'm getting some strange results that I don't understand yet (no buggy programs found in 10000 runs, while the CI is happy with 1000...), but it seems that synchronizing with a semaphore is a bit faster than with cpu_relax:

$ dune exec -- src/neg_tests/conclist_stm_tests.exe
random seed: 138767447
generated error  fail  pass / total     time test name
[✓] 10000     0     0 10000 / 10000   103.1s STM int64 CList with cpu_relax
[✓] 10000     0     0 10000 / 10000    78.5s STM int64 CList with semaphore
================================================================================
success (ran 2 tests)
relax : 0 / 10000
semap : 0 / 10000

Code is here: https://github.com/n-osborne/multicoretests/blob/domain-stats/src/neg_tests/conclist_stm_tests.ml#L53 and here: https://github.com/n-osborne/multicoretests/blob/domain-stats/lib/STM.ml#L391

@jmid
Collaborator Author

jmid commented Sep 27, 2022

It's indeed interesting that the Semaphore is faster than the "Atomic waiting loop" 👍 🤔

I had a quick look:

  • When an exception is raised, mk_prop does not increment the counter (I think it should)
  • I also noticed that the stats tests are not using repeat. To be comparable with the CI's 1000 iterations, I would use it here too.
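To illustrate the first point: a failure counter can treat a raised exception as a failure and still re-raise it, so the test framework keeps reporting the error as before. The name `count_failures` is hypothetical, and the actual `mk_prop` in the linked branch may be structured differently:

```ocaml
(* Hypothetical wrapper: increment [counter] whenever [prop] fails,
   whether it returns false or raises an exception. *)
let count_failures (counter : int Atomic.t) (prop : 'a -> bool) (input : 'a) : bool =
  match prop input with
  | true -> true
  | false ->
      Atomic.incr counter;
      false
  | exception e ->
      (* Count the failure, then re-raise so it is still reported as an error. *)
      Atomic.incr counter;
      raise e

let () =
  let failures = Atomic.make 0 in
  let prop x = if x < 0 then invalid_arg "negative" else x mod 2 = 0 in
  List.iter
    (fun x ->
      try ignore (count_failures failures prop x) with Invalid_argument _ -> ())
    [ 2; 3; -1 ];
  (* prints "failures: 2" *)
  Printf.printf "failures: %d\n" (Atomic.get failures)
```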

@n-osborne
Contributor

* When an exception is raised `mk_prop` does not increase the counter (I think it should)

Yes, that works better that way.

* I also noticed that the stats tests are not using `repeat`. To be comparable to the CI's 1000 iterations I would try to use it here too.

That was just to get a somewhat more accurate speed comparison.

So semaphores are indeed faster, but they spot far fewer buggy programs.

This is with `repeat 25 prop`:

$ dune exec -- src/neg_tests/conclist_stm_tests.exe
random seed: 300478220
generated error  fail  pass / total     time test name
[✓] 10000     0     0 10000 / 10000  3302.4s STM int64 CList with cpu_relax
[✓] 10000     0     0 10000 / 10000  1970.2s STM int64 CList with semaphore
================================================================================
success (ran 2 tests)
relax : 36868 / 10000
semap : 8 / 10000

@jmid
Collaborator Author

jmid commented Sep 28, 2022

Ah, that is indeed quite a difference! 😮

I'm surprised by the number 36868, though!
Because of the way Util.repeat is implemented, it should stop early at the first property failure.
I would thus expect it to increment the counter at most once per generated test (across its 25 repetitions), and hence reach at most 10000. 🤔
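The expectation can be checked with a small sketch. Here `repeat` is a hypothetical stand-in for `Util.repeat` that, as described above, stops at the first failed run. Because of the early exit, a counter incremented on each failure moves at most once per generated input:

```ocaml
(* Hypothetical stand-in for a repeat combinator: run [prop] on the same
   input up to [n] times, stopping at the first failure. *)
let repeat (n : int) (prop : 'a -> bool) (input : 'a) : bool =
  let rec loop i = i >= n || (prop input && loop (i + 1)) in
  loop 0

let () =
  let failures = Atomic.make 0 in
  (* A property that fails (and counts the failure) on multiples of 3. *)
  let fails_on_multiples_of_3 x =
    if x mod 3 = 0 then (Atomic.incr failures; false) else true
  in
  let inputs = List.init 10 Fun.id in
  List.iter (fun x -> ignore (repeat 25 fails_on_multiples_of_3 x)) inputs;
  (* prints "failures: 4 / 10": one increment each for 0, 3, 6 and 9 *)
  Printf.printf "failures: %d / %d\n" (Atomic.get failures) (List.length inputs)
```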
