
Improve BRLA get_min_sample_size() efficiency #31

Open · wants to merge 1 commit into master
Conversation

sarahmorin
Contributor

I had a thought about this a while ago and the conversation yesterday about minimum sample sizes reminded me about it.

The get_min_sample_size() method in brla gets very expensive for contests with large total ballot counts because the total ballot count sets the upper bound of the binary search, causing many unnecessary, expensive distribution computations. I generated the following minimum sample sizes to determine whether we could create a more informed search policy.

| alpha \ contest ballots | 100 | 1,000 | 5,000 | 10,000 | 50,000 | 100,000 | 500,000 | 1,000,000 | 10,000,000 |
|---|---|---|---|---|---|---|---|---|---|
| 0.0001 | 15 | 17 | 17 | 17 | 17 | 17 | 17 | 17 | 17 |
| 0.001 | 12 | 13 | 13 | 13 | 13 | 13 | 13 | 13 | 13 |
| 0.01 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 |
| 0.02 | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 8 |
| 0.05 | 6 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 |
| 0.1 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 |
| 0.2 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 |

As I expected, the minimum sample sizes from brla do not have a large range. I hardcoded the binary search to look between 1 and 30 (instead of up to the maximum number of ballots to draw), which greatly improved initialization time for large contests. I also added a minimum sample size test file for the above data; it includes all the risk limits, but only contests up to 500,000 total ballots.
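The change can be sketched as follows. This is a minimal illustration, not the real r2b2 code: `stopping_risk()` is a hypothetical stand-in for BRLA's expensive per-round distribution computation, and `MAX_MIN_SAMPLE_SIZE` is the hardcoded bound this PR introduces.

```python
MAX_MIN_SAMPLE_SIZE = 30  # empirical cap; observed maximum was 17 at alpha = 0.0001

def stopping_risk(n: int, alpha: float) -> float:
    """Hypothetical placeholder for BRLA's expensive risk computation.

    Toy model: risk decays geometrically with sample size. The real method
    evaluates the audit's stopping probability distributions.
    """
    return 0.5 ** n

def get_min_sample_size(alpha: float) -> int:
    """Smallest n in [1, MAX_MIN_SAMPLE_SIZE] with stopping_risk(n, alpha) <= alpha."""
    lo, hi = 1, MAX_MIN_SAMPLE_SIZE  # fixed right bound instead of total ballots
    while lo < hi:
        mid = (lo + hi) // 2
        if stopping_risk(mid, alpha) <= alpha:
            hi = mid       # mid satisfies the risk limit; try smaller samples
        else:
            lo = mid + 1   # mid is too risky; search larger samples
    return lo
```

Because the search space is now a constant 30 rather than the total ballot count, the number of expensive risk evaluations during initialization is O(log 30) regardless of contest size.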

Updates

  • Improved binary search efficiency
  • Added min sample size test for brla

The brla get_min_sample_size() binary search gets very expensive for
large contests because the total ballot count increases the search
space. After generating many minimum sample sizes for brla, I found the
highest minimum sample size was 17 (for a 0.01% risk limit up to
10,000,000 ballots), so I hard-coded the right bound of the binary search
to 30 (just in case), which significantly reduces the search space for
all contests, making initialization much more efficient for large
contests. I also added a test file to ensure the minimums are all still
correct. The test file only goes up to 500,000 ballots.
@codecov

codecov bot commented Aug 11, 2020

Codecov Report

Merging #31 into master will decrease coverage by 0.33%.
The diff coverage is 86.66%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #31      +/-   ##
==========================================
- Coverage   89.08%   88.74%   -0.34%     
==========================================
  Files          18       18              
  Lines        1374     1386      +12     
  Branches      172      175       +3     
==========================================
+ Hits         1224     1230       +6     
- Misses        117      119       +2     
- Partials       33       37       +4     
| Impacted Files | Coverage Δ |
|---|---|
| src/r2b2/brla.py | 81.03% <33.33%> (-1.73%) ⬇️ |
| tests/test_brla.py | 100.00% <100.00%> (ø) |
| src/r2b2/audit.py | 89.60% <0.00%> (-1.99%) ⬇️ |

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0db2b07...90426e3. Read the comment docs.
