Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some perf tweaks to the tokenizer. #112

Merged
merged 5 commits into from
Jan 21, 2017
Merged

Some perf tweaks to the tokenizer. #112

merged 5 commits into from
Jan 21, 2017

Conversation

emilio
Copy link
Member

@emilio emilio commented Jan 18, 2017

This makes parsing quite faster, by stripping UTF-8 logic and using tables instead of branching everywhere.

We may be able to tweak it a bit more (sometimes the table may be overkill? I don't know).

I've written the table macro so you can skip it easily if you want.

In any case, benchmark results:

Before:

test tests::big_stylesheet ... bench: 10,392,017 ns/iter (+/- 1,954,644)
test tests::unquoted_url ... bench: 261,854 ns/iter (+/- 53,335)

After:

test tests::big_stylesheet ... bench: 8,638,215 ns/iter (+/- 381,980)
test tests::unquoted_url ... bench: 211,863 ns/iter (+/- 73,418)

Which is quite good if you ask me.


This change is Reviewable

@emilio
Copy link
Member Author

emilio commented Jan 18, 2017

The stylesheet is a mix of MozReview, The Guardian, and Facebook, so I can remove it if there's any concern about having it in-repo (we can find others also).

cc @heycam @bholley, I believe this may improve our page load times, is there any way to measure Gecko's tokenizer in isolation? (to see if we should aim for more improvement or not)

r? @SimonSapin

@Ms2ger
Copy link
Contributor

Ms2ger commented Jan 18, 2017

The stylesheet is a mix of MozReview, The Guardian, and Facebook, so I can remove it if there's any concern about having it in-repo (we can find others also).

Yes, please remove.

@emilio
Copy link
Member Author

emilio commented Jan 18, 2017

Yes, please remove.

Done

@emilio
Copy link
Member Author

emilio commented Jan 20, 2017

Review ping @SimonSapin?

@SimonSapin
Copy link
Member

Reviewed 1 of 1 files at r1, 7 of 7 files at r2, 3 of 3 files at r3, 1 of 1 files at r4, 1 of 1 files at r5, 1 of 1 files at r7.
Review status: all files reviewed at latest revision, 13 unresolved discussions.


.travis.yml, line 13 at r6 (raw file):

  - cargo test --features heapsize
  - cargo test --features dummy_match_byte
  - cargo bench --features bench

Why is it useful to run benchmarks on CI? They take time, and the CPU speed of VMs can vary a lot (e.g if another Travis job is running on the same CPU at the same time).


build.rs, line 1 at r2 (raw file):

Needs a copyright header.


src/tokenizer.rs, line 408 at r1 (raw file):

        return None
    }
    let c = tokenizer.next_byte_unchecked();

Please rename c to b or byte.


src/tokenizer.rs, line 556 at r1 (raw file):

                consume_ident_like(tokenizer)
            } else {
                let ret = Delim(tokenizer.next_char());

Since we’re already excluding non-ASCII, c as u8 can be used instead of next_char(). (And ret becomes unnecessary.)


src/tokenizer.rs, line 409 at r2 (raw file):

    }
    let c = tokenizer.next_byte_unchecked();
    let token = match_byte! { c,

Have you checked that match_byte! generates better code than match?


src/macros/match_byte.rs, line 1 at r2 (raw file):

Needs a copyright header.


src/macros/match_byte.rs, line 53 at r2 (raw file):

}

fn parse_match_bytes_macro(tts: Vec<syn::TokenTree>) -> (Vec<syn::TokenTree>, Vec<u8>, Vec<Case>, Option<syn::Ident>) {

This could use some docs. What is the expected "shape" of the input tokens (maybe give an example)? What are the respective components of the return value?


src/macros/match_byte.rs, line 82 at r2 (raw file):

    let mut tts = tts.peekable();
    let mut case_id: u8 = 1;
    let mut binding = None;

The parser allows multiple bindings but only the last one will work. But since this macro is not public and only used with a known set of inputs maybe that’s ok.


src/macros/match_byte.rs, line 108 at r2 (raw file):

/// The `binding` parameter is the identifier that is used by the wildcard
/// pattern.
fn parse_case(tts: &mut iter::Peekable<vec::IntoIter<syn::TokenTree>>,

This accepts some inputs that it shouldn’t (the empty input, an ident followed by a byte pattern, …) but since this macro is not public and only used with a known set of inputs maybe that’s ok.


src/macros/match_byte.rs, line 182 at r2 (raw file):

    assert!(!to_be_matched.is_empty());
    assert_eq!(table.len(), 256);

Replace Vec<u8> with [u8; 256] to make this assert unnecessary?


src/macros/mod.rs, line 1 at r2 (raw file):

Needs a copyright header.


src/macros/visit.rs, line 1 at r2 (raw file):

// Copyright 2016 The html5ever Project Developers. See the

I believe I wrote all of this code, and I agree to re-licensing it under the MPL 2. Please change this license header to the same as the rest of cssparser.


src/macros/visit.rs, line 14 at r2 (raw file):

use syn;

pub trait Visitor {

In html5ever I initially did that but then switched to only parsing token trees and avoiding the AST entirely, so that a visitor trait is not needed. Was there a reason not to do that here?


Comments from Reviewable

@emilio
Copy link
Member Author

emilio commented Jan 20, 2017

I think all your comments are addressed now :)


Review status: 0 of 8 files reviewed at latest revision, 13 unresolved discussions.


.travis.yml, line 13 at r6 (raw file):

Previously, SimonSapin (Simon Sapin) wrote…

Why is it useful to run benchmarks on CI? They take time, and the CPU speed of VMs can vary a lot (e.g if another Travis job is running on the same CPU at the same time).

Only to prevent them from breaking. I can back this out if you want.


build.rs, line 1 at r2 (raw file):

Previously, SimonSapin (Simon Sapin) wrote…

Needs a copyright header.

Good catch.


src/tokenizer.rs, line 408 at r1 (raw file):

Previously, SimonSapin (Simon Sapin) wrote…

Please rename c to b or byte.

Done


src/tokenizer.rs, line 556 at r1 (raw file):

Previously, SimonSapin (Simon Sapin) wrote…

Since we’re already excluding non-ASCII, c as u8 can be used instead of next_char(). (And ret becomes unnecessary.)

Good catch.


src/tokenizer.rs, line 409 at r2 (raw file):
Not manually by inspection and diffing, but I added a feature to compare both and using it made the large url test faster from:

test tests::unquoted_url ... bench: 269,799 ns/iter (+/- 20,056)

without the custom macro expansion, to:

test tests::unquoted_url ... bench: 213,494 ns/iter (+/- 15,135)

with it.


src/macros/match_byte.rs, line 1 at r2 (raw file):

Previously, SimonSapin (Simon Sapin) wrote…

Needs a copyright header.

Right


src/macros/match_byte.rs, line 82 at r2 (raw file):

Previously, SimonSapin (Simon Sapin) wrote…

The parser allows multiple bindings but only the last one will work. But since this macro is not public and only used with a known set of inputs maybe that’s ok.

Yeah, I only focused in making something that kept enough idiomatic code working.


src/macros/match_byte.rs, line 182 at r2 (raw file):

Previously, SimonSapin (Simon Sapin) wrote…

Replace Vec<u8> with [u8; 256] to make this assert unnecessary?

Yup, fine. I didn't want to pass it on the stack, but it's small anyway.


src/macros/visit.rs, line 1 at r2 (raw file):

Previously, SimonSapin (Simon Sapin) wrote…

I believe I wrote all of this code, and I agree to re-licensing it under the MPL 2. Please change this license header to the same as the rest of cssparser.

Ok, thanks.


src/macros/visit.rs, line 14 at r2 (raw file):

Previously, SimonSapin (Simon Sapin) wrote…

In html5ever I initially did that but then switched to only parsing token trees and avoiding the AST entirely, so that a visitor trait is not needed. Was there a reason not to do that here?

Well, we use it to visit expressions and statements in the AST. I can definitely rework it, but I think this way works quite fine.


Comments from Reviewable

@emilio
Copy link
Member Author

emilio commented Jan 20, 2017

Review status: 0 of 8 files reviewed at latest revision, 13 unresolved discussions.


src/tokenizer.rs, line 409 at r2 (raw file):

Previously, emilio (Emilio Cobos Álvarez) wrote…

Not manually by inspection and diffing, but I added a feature to compare both and using it made the large url test faster from:

test tests::unquoted_url ... bench: 269,799 ns/iter (+/- 20,056)

without the custom macro expansion, to:

test tests::unquoted_url ... bench: 213,494 ns/iter (+/- 15,135)

with it.

After conversations in rust-internals (cc @nox), it seems that rustc doesn't generate the best possible code if we use ranges, but could if we moved the ranges to the bottom or expanded it.

We can try to make the match_byte! macro do any of those, and verify the performance is similar than with the table. I don't know if it's totally worth though, and I don't think I'll have the time to do it this evening :)


Comments from Reviewable

@SimonSapin
Copy link
Member

Reviewed 1 of 6 files at r8, 3 of 3 files at r9, 1 of 1 files at r11, 1 of 1 files at r13, 4 of 4 files at r14, 1 of 1 files at r15.
Review status: all files reviewed at latest revision, 4 unresolved discussions.


.travis.yml, line 13 at r6 (raw file):

Previously, emilio (Emilio Cobos Álvarez) wrote…

Only to prevent them from breaking. I can back this out if you want.

I just checked: cargo test runs one iteration of benchmarks and fails if they panic. So cargo test --features bench does what you want but doesn’t take as much time.


src/tokenizer.rs, line 409 at r2 (raw file):

Previously, emilio (Emilio Cobos Álvarez) wrote…

After conversations in rust-internals (cc @nox), it seems that rustc doesn't generate the best possible code if we use ranges, but could if we moved the ranges to the bottom or expanded it.

We can try to make the match_byte! macro do any of those, and verify the performance is similar than with the table. I don't know if it's totally worth though, and I don't think I'll have the time to do it this evening :)

This is already an improvement that I’m happy with, so up to you to land it as-is or investigate alternatives first.


src/macros/match_byte.rs, line 182 at r2 (raw file):

Previously, emilio (Emilio Cobos Álvarez) wrote…

Yup, fine. I didn't want to pass it on the stack, but it's small anyway.

Could also be Box<[u8; 256]>. Anyway, this is compile-time code that is only used a few times and with small inputs, so I’m not worried about its perfomance.


src/macros/visit.rs, line 14 at r2 (raw file):

Previously, emilio (Emilio Cobos Álvarez) wrote…

Well, we use it to visit expressions and statements in the AST. I can definitely rework it, but I think this way works quite fine.

Yeah, both work. It’s just less code to maintain, or to update if the AST needs to change: servo/html5ever@2ddcd53


Comments from Reviewable

@emilio
Copy link
Member Author

emilio commented Jan 21, 2017

Review status: 4 of 7 files reviewed at latest revision, 2 unresolved discussions.


.travis.yml, line 13 at r6 (raw file):

Previously, SimonSapin (Simon Sapin) wrote…

I just checked: cargo test runs one iteration of benchmarks and fails if they panic. So cargo test --features bench does what you want but doesn’t take as much time.

Oh, right! done :)


src/tokenizer.rs, line 409 at r2 (raw file):

Previously, SimonSapin (Simon Sapin) wrote…

This is already an improvement that I’m happy with, so up to you to land it as-is or investigate alternatives first.

Ok, I think I want to land this as-is for now.


src/macros/match_byte.rs, line 182 at r2 (raw file):

Previously, SimonSapin (Simon Sapin) wrote…

Could also be Box<[u8; 256]>. Anyway, this is compile-time code that is only used a few times and with small inputs, so I’m not worried about its perfomance.

Yup, agreed.


src/macros/visit.rs, line 14 at r2 (raw file):

Previously, SimonSapin (Simon Sapin) wrote…

Yeah, both work. It’s just less code to maintain, or to update if the AST needs to change: servo/html5ever@2ddcd53

Thanks for the pointers :)

I made it work on token trees, I've left the last fixup commit intentionally so you can review it.


Comments from Reviewable

@SimonSapin
Copy link
Member

Squash as desired, then r=me :)


Reviewed 1 of 1 files at r16, 3 of 3 files at r17.
Review status: all files reviewed at latest revision, all discussions resolved.


Comments from Reviewable

This increases the performance of the stylesheet tokenization test about 20%,
and now one of the hottest instructions is the sign extension rust does to index
in the array.
This was causing unaligned moves (movups instructions), for some reason.
@emilio
Copy link
Member Author

emilio commented Jan 21, 2017

@bors-servo r=SimonSapin

@bors-servo
Copy link
Contributor

📌 Commit 21f8573 has been approved by SimonSapin

@bors-servo
Copy link
Contributor

⌛ Testing commit 21f8573 with merge 7c8a9a6...

bors-servo pushed a commit that referenced this pull request Jan 21, 2017
Some perf tweaks to the tokenizer.

This makes parsing quite faster, by stripping UTF-8 logic and using tables instead of branching everywhere.

We may be able to tweak it a bit more (sometimes the table may be overkill? I don't know).

I've written the table macro so you can skip it easily if you want.

In any case, benchmark results:

Before:

> test tests::big_stylesheet ... bench:  10,392,017 ns/iter (+/- 1,954,644)
> test tests::unquoted_url   ... bench:     261,854 ns/iter (+/- 53,335)

After:

> test tests::big_stylesheet ... bench:   8,638,215 ns/iter (+/- 381,980)
> test tests::unquoted_url   ... bench:     211,863 ns/iter (+/- 73,418)

Which is quite good if you ask me.

<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/rust-cssparser/112)
<!-- Reviewable:end -->
@bors-servo
Copy link
Contributor

☀️ Test successful - status-travis

@bors-servo bors-servo merged commit 21f8573 into master Jan 21, 2017
@emilio
Copy link
Member Author

emilio commented Jan 21, 2017

Thanks for the review @SimonSapin! :)

@emilio emilio deleted the perf branch January 21, 2017 20:32
@nox
Copy link
Contributor

nox commented Mar 13, 2017

Part of this (or maybe all of it) can be reverted once rust-lang/rust#39456 is in Rust stable.

@SimonSapin
Copy link
Member

#128

@SimonSapin SimonSapin mentioned this pull request Jun 25, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants