
Improve BRLA get_min_sample_size() efficiency #31

Open · wants to merge 1 commit into master
Conversation

sarahmorin
Contributor

I had a thought about this a while ago and the conversation yesterday about minimum sample sizes reminded me about it.

The get_min_sample_size() method in brla gets very expensive for contests with large total ballot counts because the total ballot count sets the upper bound of the binary search, causing many unnecessary, expensive distribution computations. I generated the following minimum sample sizes to determine whether we could create a more informed search policy.

| alpha \ contest ballots | 100 | 1,000 | 5,000 | 10,000 | 50,000 | 100,000 | 500,000 | 1,000,000 | 10,000,000 |
|---|---|---|---|---|---|---|---|---|---|
| 0.0001 | 15 | 17 | 17 | 17 | 17 | 17 | 17 | 17 | 17 |
| 0.001 | 12 | 13 | 13 | 13 | 13 | 13 | 13 | 13 | 13 |
| 0.01 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 |
| 0.02 | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 8 |
| 0.05 | 6 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 |
| 0.1 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 |
| 0.2 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 |

As I expected, the minimum sample sizes from brla do not have a large range. I hardcoded the binary search to look between 1 and 30 (instead of up to the maximum number of ballots to draw), which greatly improved initialization time for large contests. I also added a minimum sample size test file for the above data; it includes all the risk limits, but only contests up to 500,000 total ballots.
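The change can be sketched as follows. This is a minimal illustration, not the real r2b2 code: `stopping_risk()` is a hypothetical stand-in for BRLA's expensive per-round distribution computation, and `MAX_MIN_SAMPLE_SIZE` is the hardcoded bound this PR introduces.

```python
MAX_MIN_SAMPLE_SIZE = 30  # empirical cap; observed maximum was 17 at alpha = 0.0001

def stopping_risk(n: int, alpha: float) -> float:
    """Hypothetical placeholder for BRLA's expensive risk computation.

    Toy model: risk decays geometrically with sample size. The real method
    evaluates the audit's stopping probability distributions.
    """
    return 0.5 ** n

def get_min_sample_size(alpha: float) -> int:
    """Smallest n in [1, MAX_MIN_SAMPLE_SIZE] with stopping_risk(n, alpha) <= alpha."""
    lo, hi = 1, MAX_MIN_SAMPLE_SIZE  # fixed right bound instead of total ballots
    while lo < hi:
        mid = (lo + hi) // 2
        if stopping_risk(mid, alpha) <= alpha:
            hi = mid       # mid satisfies the risk limit; try smaller samples
        else:
            lo = mid + 1   # mid is too risky; search larger samples
    return lo
```

Because the search space is now a constant 30 rather than the total ballot count, the number of expensive risk evaluations during initialization is O(log 30) regardless of contest size.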

Updates

  • Improved binary search efficiency
  • Added min sample size test for brla

The brla get_min_sample_size() binary search gets very expensive for
large contests because the total ballot count increases the search
space. After generating many minimum sample sizes for brla, I found the
highest minimum sample size was 17 (for a 0.01% risk limit up to
10,000,000 ballots), so I hard-coded the right bound of the binary search
to 30 (just in case), which significantly reduces the search space for
all contests, making initialization much more efficient for large
contests. I also added a test file to ensure the minimums are all still
correct. The test file only goes up to 500,000 ballots.
@codecov

codecov bot commented Aug 11, 2020

Codecov Report

Merging #31 into master will decrease coverage by 0.33%.
The diff coverage is 86.66%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #31      +/-   ##
==========================================
- Coverage   89.08%   88.74%   -0.34%     
==========================================
  Files          18       18              
  Lines        1374     1386      +12     
  Branches      172      175       +3     
==========================================
+ Hits         1224     1230       +6     
- Misses        117      119       +2     
- Partials       33       37       +4     
| Impacted Files | Coverage Δ |
|---|---|
| src/r2b2/brla.py | 81.03% <33.33%> (-1.73%) ⬇️ |
| tests/test_brla.py | 100.00% <100.00%> (ø) |
| src/r2b2/audit.py | 89.60% <0.00%> (-1.99%) ⬇️ |

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0db2b07...90426e3. Read the comment docs.
