Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up leb128 encoding and decoding for unsigned values. #46919

Merged
merged 1 commit into from
Jan 20, 2018

Conversation

michaelwoerister
Copy link
Member

Make the implementation for some leb128 functions potentially faster.

@Mark-Simulacrum, could you please trigger a perf.rlo run?

@rust-highfive
Copy link
Collaborator

r? @aidanhs

(rust_highfive has picked a reviewer for you, use r? to override)

@kennytm
Copy link
Member

kennytm commented Dec 21, 2017

@bors try

@kennytm kennytm added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Dec 21, 2017
@bors
Copy link
Contributor

bors commented Dec 21, 2017

⌛ Trying commit 43ad4fd with merge 2d54fb61881368133d872d8f878adcde8621da7f...

@bors
Copy link
Contributor

bors commented Dec 21, 2017

☀️ Test successful - status-travis
State: approved= try=True

@aidanhs
Copy link
Member

aidanhs commented Dec 21, 2017

r? @sfackler

@rust-highfive rust-highfive assigned sfackler and unassigned aidanhs Dec 21, 2017
macro_rules! impl_read_unsigned_leb128 {
($fn_name:ident, $int_ty:ident) => (
#[inline]
pub fn $fn_name(data: &[u8], start_position: usize) -> ($int_ty, usize) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was here before, bu t why does this take a slice and an offset instead of just a slice?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. Maybe there was a reason in the past. I'll change it.

@Mark-Simulacrum
Copy link
Member

Perf run queued.

@michaelwoerister
Copy link
Member Author

http://perf.rust-lang.org/compare.html?start=eff3de0927c36e6483ccb8a35c3d2da6e063de0b&end=2d54fb61881368133d872d8f878adcde8621da7f&stat=wall-time

Huh, that's very different from what the microbenchmarks showed. Seems like I need to iterate some more on this.

@kennytm kennytm added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Dec 23, 2017
@alexcrichton
Copy link
Member

ping @michaelwoerister, just wanna make sure this doesn't fall off your radar!

@michaelwoerister
Copy link
Member Author

I'm still in the process of setting up some good benchmarks that work with real-world data: https://github.com/michaelwoerister/encoding-bench

It's a bit of a side-project, so it will take a while.

@michaelwoerister
Copy link
Member Author

@bors try

@bors
Copy link
Contributor

bors commented Jan 9, 2018

⌛ Trying commit 53c2f44 with merge c70d579...

bors added a commit that referenced this pull request Jan 9, 2018
Speed up leb128 encoding and decoding for unsigned values.

Make the implementation for some leb128 functions potentially faster.

@Mark-Simulacrum, could you please trigger a perf.rlo run?
@bors
Copy link
Contributor

bors commented Jan 9, 2018

☀️ Test successful - status-travis
State: approved= try=True

@michaelwoerister
Copy link
Member Author

@Mark-Simulacrum, could you do another perf run please?

@Mark-Simulacrum
Copy link
Member

The try commit is done, we're waiting for perf to collect data for the previous auto branch commit -- should be next in queue.

@michaelwoerister
Copy link
Member Author

http://perf.rust-lang.org/compare.html?start=2e33c89ff1518359c4bd5fbed1571ea00cb3b146&end=c70d5799e8445489a1db19a30f0e2a1f4aa31789&stat=instructions%3Au

So ... this looks very good in all cases, except for style-servo-opt - clean incremental-opt -- which regresses 19%? That kind of looks like a fluke. Let's wait for #47243 and #47181 to be merged and then do another test run, I guess. Both of these should change the distribution of values fed into leb128 quite a bit.

@kennytm kennytm added S-blocked Status: Marked as blocked ❌ on something else such as an RFC or other implementation work. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Jan 10, 2018
@michaelwoerister
Copy link
Member Author

@bors try

@kennytm kennytm added the S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. label Jan 14, 2018
@kennytm kennytm removed the S-blocked Status: Marked as blocked ❌ on something else such as an RFC or other implementation work. label Jan 14, 2018
@Mark-Simulacrum
Copy link
Member

@bors try

@Mark-Simulacrum
Copy link
Member

@bors retry

@bors
Copy link
Contributor

bors commented Jan 15, 2018

⌛ Trying commit 53c2f44 with merge 8def86f...

bors added a commit that referenced this pull request Jan 15, 2018
Speed up leb128 encoding and decoding for unsigned values.

Make the implementation for some leb128 functions potentially faster.

@Mark-Simulacrum, could you please trigger a perf.rlo run?
@bors
Copy link
Contributor

bors commented Jan 15, 2018

☀️ Test successful - status-travis
State: approved= try=True

@michaelwoerister
Copy link
Member Author

OK, success. Let's see how it does now.

@Mark-Simulacrum
Copy link
Member

Alright, queued. Should be a couple hours.

@michaelwoerister
Copy link
Member Author

@michaelwoerister
Copy link
Member Author

@Mark-Simulacrum Hm, the link doesn't seem to be working. Did I do something wrong?

@michaelwoerister
Copy link
Member Author

The perf.rlo link works now. Numbers look good, I think.

re-r? @sfackler

@kennytm kennytm added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Jan 17, 2018
macro_rules! impl_write_unsigned_leb128 {
($fn_name:ident, $int_ty:ident) => (
#[inline]
pub fn $fn_name(out: &mut Vec<u8>, start_position: usize, mut value: $int_ty) -> usize {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be more verbose, but another strategy I've seen for this is just branching on the size of the value and avoiding the loop. Not sure which would be faster in rustc though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, running some tests shows that the following implementation for u32 is 10% faster when encoding metadata (while showing no improvement for the query-cache and the dep-graph):

#[inline]
pub fn write_leb128_u32(out: &mut Vec<u8>, start_position: usize, value: u32) -> usize {

    if value <= (1 << 7) {
        write_to_vec(out, start_position, value as u8);
        1
    } else if value <= (1 << 14) {
        write_to_vec(out, start_position, (value as u8) | 0x80);
        write_to_vec(out, start_position + 1, (value >> 7) as u8);
        2
    } else if value <= (1 << 21) {
        write_to_vec(out, start_position, (value as u8) | 0x80);
        write_to_vec(out, start_position + 1, ((value >> 7) as u8) | 0x80);
        write_to_vec(out, start_position + 2, (value >> 14) as u8);
        3
    } else if value <= (1 << 28) {
        write_to_vec(out, start_position, (value as u8) | 0x80);
        write_to_vec(out, start_position + 1, ((value >> 7) as u8) | 0x80);
        write_to_vec(out, start_position + 2, (value >> 14) as u8 | 0x80);
        write_to_vec(out, start_position + 3, (value >> 21) as u8);
        4
    } else {
        write_to_vec(out, start_position, (value as u8) | 0x80);
        write_to_vec(out, start_position + 1, ((value >> 7) as u8) | 0x80);
        write_to_vec(out, start_position + 2, (value >> 14) as u8 | 0x80);
        write_to_vec(out, start_position + 3, (value >> 21) as u8 | 0x80);
        write_to_vec(out, start_position + 4, (value >> 28) as u8);
        5
    }
}

A similar implementation for usize does a lot worse than the one from the PR. Not sure if it's worth the trouble since my test data is only from one crate.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, thanks for checking it out!

@sfackler
Copy link
Member

@bors r+

@bors
Copy link
Contributor

bors commented Jan 18, 2018

📌 Commit 53c2f44 has been approved by sfackler

@kennytm kennytm added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jan 18, 2018
@bors
Copy link
Contributor

bors commented Jan 20, 2018

⌛ Testing commit 53c2f44 with merge 816d765...

bors added a commit that referenced this pull request Jan 20, 2018
Speed up leb128 encoding and decoding for unsigned values.

Make the implementation for some leb128 functions potentially faster.

@Mark-Simulacrum, could you please trigger a perf.rlo run?
@bors
Copy link
Contributor

bors commented Jan 20, 2018

☀️ Test successful - status-appveyor, status-travis
Approved by: sfackler
Pushing 816d765 to master...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants