Optimize select_bit #52

Open
wants to merge 3 commits into
base: master

Commits on Aug 21, 2023

  1. select_bit: Optimize for AArch64

    ARM NEON has a byte-wise popcount instruction, which helps to optimize
    `select_bit` and `PopCount::count`.  Use it for AArch64 (64-bit ARM).
    
    15% speedup for `Rank1`, 4% for `Select0` and 3% for `Select1`.
    (60% for `PopCount::count` itself.)
    jmr committed Aug 21, 2023
    fc0fe8f
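
The NEON idea in commit 1 can be sketched as follows. This is a minimal illustration, not the PR's actual code: the helper name `popcount64` is hypothetical, and the non-AArch64 branch is a generic SWAR fallback added only so the sketch compiles everywhere. On AArch64, `vcnt_u8` counts the set bits in each of the 8 bytes and `vaddv_u8` sums the lanes.

```cpp
#include <cstdint>

#if defined(__aarch64__)
#include <arm_neon.h>
#endif

// Hypothetical helper (not from the PR): number of set bits in a 64-bit word.
inline unsigned popcount64(std::uint64_t x) {
#if defined(__aarch64__)
  // vcreate_u8 reinterprets the word as 8 byte lanes,
  // vcnt_u8 counts the set bits of each lane (CNT),
  // vaddv_u8 adds the 8 lanes together (ADDV).
  return vaddv_u8(vcnt_u8(vcreate_u8(x)));
#else
  // Portable SWAR fallback: 2-bit, 4-bit, then 8-bit partial sums.
  x = x - ((x >> 1) & 0x5555555555555555ULL);
  x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
  x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;
  // Multiplying by 0x01...01 accumulates all byte counts in the top byte.
  return static_cast<unsigned>((x * 0x0101010101010101ULL) >> 56);
#endif
}
```

For example, `popcount64(0xF0F0)` counts the eight set bits of the two `0xF0` bytes.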
  2. select_bit: Use byte-serial version for 32-bit

    This gives a 9% speedup on `select0` and 7% on `select1`.
    (Tested on Pixel 3 in armeabi-v7a mode.)
    
    This is likely because the branches of this unrolled linear
    search are more predictable than the binary search that was
    used previously.
    jmr committed Aug 21, 2023
    b69d476
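
A byte-serial select, as described in commit 2, might look roughly like the sketch below. It is an assumption about the general shape only, not the PR's code: the name `select_bit_serial` is hypothetical, and the loop is written rolled for brevity, whereas the commit message describes an unrolled linear search. The search walks the word one byte at a time, skipping bytes whose popcount is too small, then scans the bits of the byte that holds the target.

```cpp
#include <cstdint>

// Hypothetical sketch (not from the PR): position of the i-th (0-based)
// set bit of x. Returns 64 if x has i or fewer set bits.
inline unsigned select_bit_serial(std::uint64_t x, unsigned i) {
  unsigned pos = 0;
  for (int b = 0; b < 8; ++b) {
    unsigned byte = static_cast<unsigned>(x & 0xFF);
    // Popcount of a single byte via SWAR partial sums.
    unsigned c = byte - ((byte >> 1) & 0x55);
    c = (c & 0x33) + ((c >> 2) & 0x33);
    c = (c + (c >> 4)) & 0x0F;
    if (i < c) {
      // The target bit lies in this byte: scan its 8 bits.
      for (int k = 0; k < 8; ++k) {
        if (byte & 1) {
          if (i == 0) return pos;
          --i;
        }
        byte >>= 1;
        ++pos;
      }
    }
    // Not in this byte: skip its set bits and move to the next byte.
    i -= c;
    x >>= 8;
    pos += 8;
  }
  return 64;
}
```

Each step compares against a small count and usually falls through, which fits the commit's suggestion that these branches are easier to predict than those of a binary search.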
  3. select_bit: Use a lookup table

    Instead of computing `(counts | MASK_80) - ((i + 1) * MASK_01)`,
    we pre-compute a lookup table
    ```
    PREFIX_SUM_OVERFLOW[i] = (0x80 - (i + 1)) * MASK_01 = (0x7F - i) * MASK_01
    ```
    then use `counts + PREFIX_SUM_OVERFLOW[i]`.
    
    This uses a `UInt64[64]` lookup table (512 bytes, or 0.5 KiB). The trick is from:
    Gog, Simon and Matthias Petri. “Optimized succinct data structures for
    massive data.” Software: Practice and Experience 44 (2014): 1287 - 1314.
    
    https://www.semanticscholar.org/paper/Optimized-succinct-data-structures-for-massive-data-Gog-Petri/c7e7f02f441ebcc0aeffdcad2964185926551ec3
    
    This gives a 2-3% speedup for `BitVector::select0`/`select1`.
    jmr committed Aug 21, 2023
    b3ebfc7
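
The equivalence used in commit 3 can be checked with a small sketch. The names `MASK_01`, `MASK_80` and `PREFIX_SUM_OVERFLOW` come from the commit message; everything else (`byte_prefix_sums`, `overflow_direct`, `overflow_table`, `select_byte`, and the accessor `prefix_sum_overflow()`) is hypothetical and only for illustration. It assumes `counts` holds per-byte prefix popcounts, each at most 64, so that `counts | MASK_80` equals `counts + MASK_80` and neither form carries or borrows across bytes.

```cpp
#include <array>
#include <cstdint>

const std::uint64_t MASK_01 = 0x0101010101010101ULL;
const std::uint64_t MASK_80 = 0x8080808080808080ULL;

// PREFIX_SUM_OVERFLOW[i] = (0x7F - i) * MASK_01, built once on first use.
inline const std::array<std::uint64_t, 64>& prefix_sum_overflow() {
  static const std::array<std::uint64_t, 64> table = [] {
    std::array<std::uint64_t, 64> t{};
    for (unsigned i = 0; i < 64; ++i) t[i] = (0x7FULL - i) * MASK_01;
    return t;
  }();
  return table;
}

// Byte j of the result = number of set bits in bytes 0..j of x.
inline std::uint64_t byte_prefix_sums(std::uint64_t x) {
  x = x - ((x >> 1) & 0x5555555555555555ULL);
  x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
  x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;
  return x * MASK_01;
}

// Original form: high bit of byte j is set iff counts[j] >= i + 1.
inline std::uint64_t overflow_direct(std::uint64_t counts, unsigned i) {
  return (counts | MASK_80) - ((i + 1) * MASK_01);
}

// Table form: one load and one add instead of OR, multiply, subtract.
inline std::uint64_t overflow_table(std::uint64_t counts, unsigned i) {
  return counts + prefix_sum_overflow()[i];
}

// Index of the byte holding the i-th (0-based) set bit, or 8 if none.
inline unsigned select_byte(std::uint64_t x, unsigned i) {
  std::uint64_t ov = overflow_table(byte_prefix_sums(x), i) & MASK_80;
  for (unsigned j = 0; j < 8; ++j) {
    if ((ov >> (8 * j + 7)) & 1) return j;
  }
  return 8;
}
```

Per byte, both forms compute `c + 0x7F - i`, so the two functions return identical words for any valid `counts` and `i` in 0..63; the first byte whose high bit is set is the byte that contains the requested bit.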