
branchless .filter(_).count() #39107

Merged — 4 commits merged into rust-lang:master on Feb 5, 2017

Conversation

@llogiq (Contributor) commented Jan 16, 2017

I found that the branchless version is only slower when there are little to no branch misses, which usually isn't the case. I noticed speedups between -5% (perfect prediction) and 60% (real-world data).
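For context, a minimal standalone sketch of the distinction being discussed (illustrative names and signatures, not the exact diff in this PR): the branching version increments inside an if, while the branchless version adds the predicate's boolean result cast to usize, so there is no conditional jump for the predictor to miss.

fn count_branching<I, P>(iter: I, mut predicate: P) -> usize
where
    I: Iterator,
    P: FnMut(&I::Item) -> bool,
{
    let mut count = 0;
    for elt in iter {
        // Conditional increment: a data-dependent branch the predictor can miss.
        if predicate(&elt) {
            count += 1;
        }
    }
    count
}

fn count_branchless<I, P>(iter: I, mut predicate: P) -> usize
where
    I: Iterator,
    P: FnMut(&I::Item) -> bool,
{
    let mut count = 0;
    for elt in iter {
        // `bool as usize` is 0 or 1, so the increment needs no conditional jump.
        count += predicate(&elt) as usize;
    }
    count
}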

@rust-highfive (Collaborator) commented:

r? @sfackler

(rust_highfive has picked a reviewer for you, use r? to override)

    // branchless count
    c += (&mut predicate)(&x) as usize;
}
c
@bluss (Member) commented on the diff, Jan 17, 2017

Maybe one doesn't want to argue style over trivial things, but I find the code cramped when something simple would work just as well:

fn count(mut self) -> usize {
    // Attempt to produce branchless count if possible
    let mut count = 0;
    for elt in self.iter {
        count += (self.predicate)(&elt) as usize;
    }
    count
}

The surrounding methods use .by_ref(), which seems archaic (just &mut self.iter would be fine if only a mutable reference is required).
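For reference, a small standalone sketch (on a plain range rather than the Filter adaptor itself) showing that .by_ref() and a plain &mut borrow drive the same underlying iterator without consuming it:

fn main() {
    let mut it = 0..10;

    // The style the surrounding methods currently use:
    for x in it.by_ref().take(3) {
        println!("by_ref: {}", x);
    }

    // The equivalent with a plain mutable reference:
    for x in (&mut it).take(3) {
        println!("&mut:   {}", x);
    }

    // Both loops advanced the same iterator; it resumes at 6.
    assert_eq!(it.next(), Some(6));
}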

@llogiq (Contributor, Author) commented Jan 17, 2017

Good idea, will change shortly (perhaps also change the other methods en passant).

@bluss (Member) left a comment:

Implementation looks good to me. I'll wait for others to see if they agree with this special case.

@sfackler added the T-libs-api label (relevant to the library API team, which will review and decide on the PR/issue) on Jan 17, 2017
@sfackler (Member) commented:

Thoughts @alexcrichton @aturon? I feel somewhat conflicted.

@alexcrichton (Member) commented:

Seems fine to me, although it'd be nice to have some tests specifically for this combinator now that we're specializing the implementation.

@BurntSushi (Member) commented:

Only one of the jobs failed, but the error looks related. Strange.

error[E0369]: binary operation `%` cannot be applied to type `&&{integer}`
   --> /checkout/src/libcore/../libcoretest/iter.rs:197:37
    |
197 |     assert_eq!(xs.iter().filter(|x| x % 2 == 0).count(), 5);
    |                                     ^
    |
note: an implementation of `std::ops::Rem` might be missing for `&&{integer}`
   --> /checkout/src/libcore/../libcoretest/iter.rs:197:37
    |
197 |     assert_eq!(xs.iter().filter(|x| x % 2 == 0).count(), 5);
    |       
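A note on the error above: xs.iter() yields references, and filter passes its items to the closure by reference again, so the closure parameter is &&{integer}. A hypothetical fixed form of such a test (the real xs is not shown in the excerpt, so the data here is illustrative) dereferences either in the pattern or in the body:

#[test]
fn filter_count_even() {
    // Illustrative data; the actual test's `xs` is not shown above.
    let xs: Vec<i32> = (0..10).collect();

    // Dereference in the closure pattern...
    assert_eq!(xs.iter().filter(|&&x| x % 2 == 0).count(), 5);

    // ...or dereference explicitly in the body.
    assert_eq!(xs.iter().filter(|x| **x % 2 == 0).count(), 5);
}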

@BurntSushi (Member) commented:

I think my concern with this is not the specific change, but rather, what's to stop someone from removing this specialization in the future because their particular data runs faster without it? Is a comment documenting it enough?

@bluss (Member) commented Jan 24, 2017

Only one job actually compiles Rust and runs the tests, so that's expected.

@BurntSushi Do you mean that if this is added, it becomes part of the guaranteed behaviour? I don't think it should be, and it should remain possible to reevaluate changes like this.

@BurntSushi (Member) commented:

@bluss No, I mean: what's to prevent someone, say a year from now, from submitting a PR that removes this specialization for reasons similar to the ones @llogiq provided? I'm not referring to the behavior, but rather to the process of adding and removing optimizations like this.

I suppose the best answer to my concern is an executable benchmark, but I'm not sure we have the infrastructure in place for that?

(And maybe this concern just doesn't matter much in practice.)
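A hedged sketch of what such an executable benchmark could look like, using the unstable test crate on nightly. The benchmark name and the data generation (a small xorshift generator chosen so the predicate is hard to predict, the case where the branchless count is claimed to win) are assumptions for illustration, not part of this PR:

#![feature(test)]
extern crate test;

use test::{black_box, Bencher};

// Hypothetical micro-benchmark: pseudo-random input defeats the branch
// predictor on the filter predicate.
#[bench]
fn bench_filter_count_unpredictable(b: &mut Bencher) {
    // xorshift32 keeps the sketch free of external crates.
    let mut state: u32 = 0x1234_5678;
    let data: Vec<u32> = (0..10_000)
        .map(|_| {
            state ^= state << 13;
            state ^= state >> 17;
            state ^= state << 5;
            state
        })
        .collect();

    b.iter(|| black_box(data.iter().filter(|&&x| x > u32::MAX / 2).count()));
}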

@alexcrichton (Member) commented:

@BurntSushi optimizations like this can be added or removed as we see fit; I don't think we're making a binding contract to implement this method exactly this way, so we're still open to future improvements if necessary.

In any case this looks good to go, so

@bors: r+

@bors (Contributor) commented Feb 4, 2017

📌 Commit bfabe81 has been approved by alexcrichton

frewsxcv added a commit to frewsxcv/rust that referenced this pull request Feb 5, 2017
…lexcrichton

branchless .filter(_).count()

I found that the branchless version is only slower if we have little to no branch misses, which usually isn't the case. I notice speedups between -5% (perfect prediction) and 60% (real world data).
bors added a commit that referenced this pull request Feb 5, 2017
@bors merged commit bfabe81 into rust-lang:master on Feb 5, 2017
@bors (Contributor) commented Feb 5, 2017

⌛ Testing commit bfabe81 with merge 9c8cdb2...

@llogiq deleted the branchless_filter_count branch on February 5, 2017 at 19:21
Labels: T-libs-api
7 participants