Implement `ones` using `trailing_zeros` #18

vks · 2018-02-22T13:53:34Z

The iter_ones_all_zeros benchmark improved a lot, the results for iter_ones_all_ones are too noisy on my machine to tell. Will update when I have better numbers.

Fixes #17.

vks · 2018-02-22T17:54:15Z

I ported the benchmarks to criterion (see #19), hoping to get less noisy results. I still don't trust them, but here they are:

iter ones: all zeros/default                        
                        time:   [10.350 us 10.395 us 10.440 us]
                        change: [-57.692% -57.491% -57.299%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

iter ones: all ones/default                        
                        time:   [433.47 us 433.99 us 434.56 us]
                        change: [+19.300% +21.654% +23.611%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
  3 (3.00%) high mild
  5 (5.00%) high severe

iter ones: random/default                        
                        time:   [447.96 us 448.54 us 449.20 us]
                        change: [-83.733% -83.682% -83.634%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  9 (9.00%) high mild
  3 (3.00%) high severe

The actual noise is at least 2%. The last benchmark is newly introduced and uses (uniformly) random bits. For the "all ones" benchmark, the slice-based implementation is best:

iter ones: all ones/slice directly                        
                        time:   [367.65 us 368.20 us 368.80 us]
                        change: [-1.6927% -1.4273% -1.1618%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  8 (8.00%) high mild
  3 (3.00%) high severe

(Note that the change in performance is bogus, the code was not changed.)

akraines · 2021-01-11T11:54:22Z

src/lib.rs

            }
+            self.bitset = self.remaining_blocks[0];


Instead of immediately setting bitset to remaining_blocks[0] and updating remaining blocks, why don't you search for the first none zero block and only then update remaining_blocks? (just to save some assignments)

(the PR has been opened for almost 3 years, so I guess nothing needs be done until the maintainer shows interest, but I figured I'd add the comment in case something comes of it)

I'm not sure that this is faster, because it might mess up vectorization. Could be interesting to benchmark.

bluss · 2021-01-22T16:40:34Z

Looks good to me. Sorry for the long wait, I'm not sure if there was a reason for not including this

Fixes petgraph#17.

vks · 2021-01-22T23:40:26Z

I rebased it on master.

bluss · 2021-01-22T23:45:59Z

src/lib.rs

-                    return None;
-                }
+        while self.bitset == 0 {
+            if self.remaining_blocks.is_empty() {


We can accomplish all of is_empty, [0] and [1..] in one method call - self.remaining_blocks.split_first() for a presumed minor win in terms of bounds checking overhead 🙂

Or for that matter, reading split_first's implmentation with slice patterns - but I think that's a much more recent feature if let [first, tail @ ..] = self { Some((first, tail)) } else { None } but it's good to read and use elsewhere if not here.

bluss · 2021-01-22T23:46:59Z

Looks good to me.

What you think @jrraymond? If you say go I can maybe look at releasing unless you want to do it

jrraymond · 2021-01-23T14:29:18Z

lgtm as well. I don't mind releasing this morning.

I dumped the asm because I was curious to see if trailing zeroes does produce better asm and it does:

<old::Ones as core::iter::traits::iterator::Iterator>::next:
        mov     ecx, dword ptr [rdi + 32]
        mov     rsi, qword ptr [rdi]
        xor     eax, eax
.LBB8_1:
        mov     rdx, rsi
        mov     esi, ecx
        shr     ecx
        test    sil, 1
        jne     .LBB8_5
        lea     rsi, [rdx + 1]
        test    ecx, ecx
        jne     .LBB8_1
        mov     rsi, qword ptr [rdi + 24]
        test    rsi, rsi
        je      .LBB8_6
        mov     rdx, qword ptr [rdi + 16]
        add     rsi, -1
        mov     ecx, dword ptr [rdx]
        add     rdx, 4
        mov     qword ptr [rdi + 16], rdx
        mov     qword ptr [rdi + 24], rsi
        mov     rsi, qword ptr [rdi + 8]
        add     rsi, 1
        mov     qword ptr [rdi + 8], rsi
        shl     rsi, 5
        jmp     .LBB8_1
.LBB8_5:
        mov     dword ptr [rdi + 32], ecx
        lea     rax, [rdx + 1]
        mov     qword ptr [rdi], rax
        mov     eax, 1
.LBB8_6:
        ret

<new::Ones as core::iter::traits::iterator::Iterator>::next:
        mov     r8d, dword ptr [rdi + 24]
        test    r8d, r8d
        je      .LBB7_3
        mov     rdx, qword ptr [rdi]
        jmp     .LBB7_2
.LBB7_3:
        mov     rsi, qword ptr [rdi + 16]
        add     rsi, -1
        xor     eax, eax
.LBB7_4:
        cmp     rsi, -1
        je      .LBB7_5
        mov     rdx, qword ptr [rdi]
        mov     rcx, qword ptr [rdi + 8]
        mov     r8d, dword ptr [rcx]
        mov     dword ptr [rdi + 24], r8d
        add     rcx, 4
        mov     qword ptr [rdi + 8], rcx
        mov     qword ptr [rdi + 16], rsi
        add     rdx, 1
        mov     qword ptr [rdi], rdx
        add     rsi, -1
        test    r8d, r8d
        je      .LBB7_4
.LBB7_2:
        bsf     eax, r8d   ; bit scan forward \o/
        lea     esi, [r8 - 1]
        and     esi, r8d
        mov     dword ptr [rdi + 24], esi
        shl     rdx, 5
        or      rdx, rax
        mov     eax, 1
        ret
.LBB7_5:
        ret

vks mentioned this pull request Feb 22, 2018

Port benchmarks to criterion #19

Closed

akraines reviewed Jan 11, 2021

View reviewed changes

vks added 2 commits January 23, 2021 00:35

Remove trailing whitespace

1ffc9cb

Implement ones() using trailing_zeros()

6c79c7c

Fixes petgraph#17.

vks force-pushed the faster-iter branch from ed9597d to 6c79c7c Compare January 22, 2021 23:39

bluss reviewed Jan 22, 2021

View reviewed changes

jrraymond merged commit 1563358 into petgraph:master Jan 23, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement `ones` using `trailing_zeros` #18

Implement `ones` using `trailing_zeros` #18

vks commented Feb 22, 2018

vks commented Feb 22, 2018 •

edited

Loading

akraines Jan 11, 2021 •

edited

Loading

akraines Jan 11, 2021 •

edited

Loading

vks Jan 22, 2021

bluss commented Jan 22, 2021

vks commented Jan 22, 2021

bluss Jan 22, 2021

bluss Jan 22, 2021

bluss commented Jan 22, 2021 •

edited

Loading

jrraymond commented Jan 23, 2021 •

edited

Loading

Implement ones using trailing_zeros #18

Implement ones using trailing_zeros #18

Conversation

vks commented Feb 22, 2018

vks commented Feb 22, 2018 • edited Loading

akraines Jan 11, 2021 • edited Loading

Choose a reason for hiding this comment

akraines Jan 11, 2021 • edited Loading

Choose a reason for hiding this comment

vks Jan 22, 2021

Choose a reason for hiding this comment

bluss commented Jan 22, 2021

vks commented Jan 22, 2021

bluss Jan 22, 2021

Choose a reason for hiding this comment

bluss Jan 22, 2021

Choose a reason for hiding this comment

bluss commented Jan 22, 2021 • edited Loading

jrraymond commented Jan 23, 2021 • edited Loading

Implement `ones` using `trailing_zeros` #18

Implement `ones` using `trailing_zeros` #18

vks commented Feb 22, 2018 •

edited

Loading

akraines Jan 11, 2021 •

edited

Loading

akraines Jan 11, 2021 •

edited

Loading

bluss commented Jan 22, 2021 •

edited

Loading

jrraymond commented Jan 23, 2021 •

edited

Loading