Some perf tweaks to the tokenizer. #112

emilio · 2017-01-18T01:03:35Z

This makes parsing quite faster, by stripping UTF-8 logic and using tables instead of branching everywhere.

We may be able to tweak it a bit more (sometimes the table may be overkill? I don't know).

I've written the table macro so you can skip it easily if you want.

In any case, benchmark results:

Before:

test tests::big_stylesheet ... bench: 10,392,017 ns/iter (+/- 1,954,644)
test tests::unquoted_url ... bench: 261,854 ns/iter (+/- 53,335)

After:

test tests::big_stylesheet ... bench: 8,638,215 ns/iter (+/- 381,980)
test tests::unquoted_url ... bench: 211,863 ns/iter (+/- 73,418)

Which is quite good if you ask me.

This change is

emilio · 2017-01-18T01:06:50Z

The stylesheet is a mix of MozReview, The Guardian, and Facebook, so I can remove it if there's any concern about having it in-repo (we can find others also).

cc @heycam @bholley, I believe this may improve our page load times, is there any way to measure Gecko's tokenizer in isolation? (to see if we should aim for more improvement or not)

r? @SimonSapin

Ms2ger · 2017-01-18T10:15:48Z

The stylesheet is a mix of MozReview, The Guardian, and Facebook, so I can remove it if there's any concern about having it in-repo (we can find others also).

Yes, please remove.

emilio · 2017-01-18T13:14:53Z

Yes, please remove.

Done

emilio · 2017-01-20T00:41:52Z

Review ping @SimonSapin?

SimonSapin · 2017-01-20T09:40:53Z

Reviewed 1 of 1 files at r1, 7 of 7 files at r2, 3 of 3 files at r3, 1 of 1 files at r4, 1 of 1 files at r5, 1 of 1 files at r7.
Review status: all files reviewed at latest revision, 13 unresolved discussions.

.travis.yml, line 13 at r6 (raw file):

  - cargo test --features heapsize
  - cargo test --features dummy_match_byte
  - cargo bench --features bench

Why is it useful to run benchmarks on CI? They take time, and the CPU speed of VMs can vary a lot (e.g if another Travis job is running on the same CPU at the same time).

build.rs, line 1 at r2 (raw file):

Needs a copyright header.

src/tokenizer.rs, line 408 at r1 (raw file):

        return None
    }
    let c = tokenizer.next_byte_unchecked();

Please rename c to b or byte.

src/tokenizer.rs, line 556 at r1 (raw file):

                consume_ident_like(tokenizer)
            } else {
                let ret = Delim(tokenizer.next_char());

Since we’re already excluding non-ASCII, c as u8 can be used instead of next_char(). (And ret becomes unnecessary.)

src/tokenizer.rs, line 409 at r2 (raw file):

    }
    let c = tokenizer.next_byte_unchecked();
    let token = match_byte! { c,

Have you checked that match_byte! generates better code than match?

src/macros/match_byte.rs, line 1 at r2 (raw file):

Needs a copyright header.

src/macros/match_byte.rs, line 53 at r2 (raw file):

}

fn parse_match_bytes_macro(tts: Vec<syn::TokenTree>) -> (Vec<syn::TokenTree>, Vec<u8>, Vec<Case>, Option<syn::Ident>) {

This could use some docs. What is the expected "shape" of the input tokens (maybe give an example)? What are the respective components of the return value?

src/macros/match_byte.rs, line 82 at r2 (raw file):

    let mut tts = tts.peekable();
    let mut case_id: u8 = 1;
    let mut binding = None;

The parser allows multiple bindings but only the last one will work. But since this macro is not public and only used with a known set of inputs maybe that’s ok.

src/macros/match_byte.rs, line 108 at r2 (raw file):

/// The `binding` parameter is the identifier that is used by the wildcard
/// pattern.
fn parse_case(tts: &mut iter::Peekable<vec::IntoIter<syn::TokenTree>>,

This accepts some inputs that it shouldn’t (the empty input, an ident followed by a byte pattern, …) but since this macro is not public and only used with a known set of inputs maybe that’s ok.

src/macros/match_byte.rs, line 182 at r2 (raw file):

    assert!(!to_be_matched.is_empty());
    assert_eq!(table.len(), 256);

Replace Vec<u8> with [u8; 256] to make this assert unnecessary?

src/macros/mod.rs, line 1 at r2 (raw file):

Needs a copyright header.

src/macros/visit.rs, line 1 at r2 (raw file):

// Copyright 2016 The html5ever Project Developers. See the

I believe I wrote all of this code, and I agree to re-licensing it under the MPL 2. Please change this license header to the same as the rest of cssparser.

src/macros/visit.rs, line 14 at r2 (raw file):

use syn;

pub trait Visitor {

In html5ever I initially did that but then switched to only parsing token trees and avoiding the AST entirely, so that a visitor trait is not needed. Was there a reason not to do that here?

Comments from Reviewable

emilio · 2017-01-20T15:42:31Z

I think all your comments are addressed now :)

Review status: 0 of 8 files reviewed at latest revision, 13 unresolved discussions.

.travis.yml, line 13 at r6 (raw file):

Previously, SimonSapin (Simon Sapin) wrote…

Why is it useful to run benchmarks on CI? They take time, and the CPU speed of VMs can vary a lot (e.g if another Travis job is running on the same CPU at the same time).

Only to prevent them from breaking. I can back this out if you want.

build.rs, line 1 at r2 (raw file):

Previously, SimonSapin (Simon Sapin) wrote…

Needs a copyright header.

Good catch.

src/tokenizer.rs, line 408 at r1 (raw file):

Previously, SimonSapin (Simon Sapin) wrote…

Please rename c to b or byte.

Done

src/tokenizer.rs, line 556 at r1 (raw file):

Previously, SimonSapin (Simon Sapin) wrote…

Since we’re already excluding non-ASCII, c as u8 can be used instead of next_char(). (And ret becomes unnecessary.)

Good catch.

src/tokenizer.rs, line 409 at r2 (raw file):
Not manually by inspection and diffing, but I added a feature to compare both and using it made the large url test faster from:

test tests::unquoted_url ... bench: 269,799 ns/iter (+/- 20,056)

without the custom macro expansion, to:

test tests::unquoted_url ... bench: 213,494 ns/iter (+/- 15,135)

with it.

src/macros/match_byte.rs, line 1 at r2 (raw file):

Previously, SimonSapin (Simon Sapin) wrote…

Needs a copyright header.

Right

src/macros/match_byte.rs, line 82 at r2 (raw file):

Previously, SimonSapin (Simon Sapin) wrote…

The parser allows multiple bindings but only the last one will work. But since this macro is not public and only used with a known set of inputs maybe that’s ok.

Yeah, I only focused in making something that kept enough idiomatic code working.

src/macros/match_byte.rs, line 182 at r2 (raw file):

Previously, SimonSapin (Simon Sapin) wrote…

Replace Vec<u8> with [u8; 256] to make this assert unnecessary?

Yup, fine. I didn't want to pass it on the stack, but it's small anyway.

src/macros/visit.rs, line 1 at r2 (raw file):

Previously, SimonSapin (Simon Sapin) wrote…

I believe I wrote all of this code, and I agree to re-licensing it under the MPL 2. Please change this license header to the same as the rest of cssparser.

Ok, thanks.

src/macros/visit.rs, line 14 at r2 (raw file):

Previously, SimonSapin (Simon Sapin) wrote…

In html5ever I initially did that but then switched to only parsing token trees and avoiding the AST entirely, so that a visitor trait is not needed. Was there a reason not to do that here?

Well, we use it to visit expressions and statements in the AST. I can definitely rework it, but I think this way works quite fine.

Comments from Reviewable

emilio · 2017-01-20T15:47:10Z

Review status: 0 of 8 files reviewed at latest revision, 13 unresolved discussions.

src/tokenizer.rs, line 409 at r2 (raw file):

Previously, emilio (Emilio Cobos Álvarez) wrote…

Not manually by inspection and diffing, but I added a feature to compare both and using it made the large url test faster from:

test tests::unquoted_url ... bench: 269,799 ns/iter (+/- 20,056)

without the custom macro expansion, to:

test tests::unquoted_url ... bench: 213,494 ns/iter (+/- 15,135)

with it.

After conversations in rust-internals (cc @nox), it seems that rustc doesn't generate the best possible code if we use ranges, but could if we moved the ranges to the bottom or expanded it.

We can try to make the match_byte! macro do any of those, and verify the performance is similar than with the table. I don't know if it's totally worth though, and I don't think I'll have the time to do it this evening :)

Comments from Reviewable

SimonSapin · 2017-01-20T16:28:56Z

Reviewed 1 of 6 files at r8, 3 of 3 files at r9, 1 of 1 files at r11, 1 of 1 files at r13, 4 of 4 files at r14, 1 of 1 files at r15.
Review status: all files reviewed at latest revision, 4 unresolved discussions.

.travis.yml, line 13 at r6 (raw file):

Previously, emilio (Emilio Cobos Álvarez) wrote…

Only to prevent them from breaking. I can back this out if you want.

I just checked: cargo test runs one iteration of benchmarks and fails if they panic. So cargo test --features bench does what you want but doesn’t take as much time.

src/tokenizer.rs, line 409 at r2 (raw file):

Previously, emilio (Emilio Cobos Álvarez) wrote…

After conversations in rust-internals (cc @nox), it seems that rustc doesn't generate the best possible code if we use ranges, but could if we moved the ranges to the bottom or expanded it.

We can try to make the match_byte! macro do any of those, and verify the performance is similar than with the table. I don't know if it's totally worth though, and I don't think I'll have the time to do it this evening :)

This is already an improvement that I’m happy with, so up to you to land it as-is or investigate alternatives first.

src/macros/match_byte.rs, line 182 at r2 (raw file):

Previously, emilio (Emilio Cobos Álvarez) wrote…

Yup, fine. I didn't want to pass it on the stack, but it's small anyway.

Could also be Box<[u8; 256]>. Anyway, this is compile-time code that is only used a few times and with small inputs, so I’m not worried about its perfomance.

src/macros/visit.rs, line 14 at r2 (raw file):

Previously, emilio (Emilio Cobos Álvarez) wrote…

Well, we use it to visit expressions and statements in the AST. I can definitely rework it, but I think this way works quite fine.

Yeah, both work. It’s just less code to maintain, or to update if the AST needs to change: servo/html5ever@2ddcd53

Comments from Reviewable

emilio · 2017-01-21T13:06:47Z

Review status: 4 of 7 files reviewed at latest revision, 2 unresolved discussions.

.travis.yml, line 13 at r6 (raw file):

Previously, SimonSapin (Simon Sapin) wrote…

I just checked: cargo test runs one iteration of benchmarks and fails if they panic. So cargo test --features bench does what you want but doesn’t take as much time.

Oh, right! done :)

src/tokenizer.rs, line 409 at r2 (raw file):

Previously, SimonSapin (Simon Sapin) wrote…

This is already an improvement that I’m happy with, so up to you to land it as-is or investigate alternatives first.

Ok, I think I want to land this as-is for now.

src/macros/match_byte.rs, line 182 at r2 (raw file):

Previously, SimonSapin (Simon Sapin) wrote…

Could also be Box<[u8; 256]>. Anyway, this is compile-time code that is only used a few times and with small inputs, so I’m not worried about its perfomance.

Yup, agreed.

src/macros/visit.rs, line 14 at r2 (raw file):

Previously, SimonSapin (Simon Sapin) wrote…

Yeah, both work. It’s just less code to maintain, or to update if the AST needs to change: servo/html5ever@2ddcd53

Thanks for the pointers :)

I made it work on token trees, I've left the last fixup commit intentionally so you can review it.

Comments from Reviewable

SimonSapin · 2017-01-21T20:06:26Z

Squash as desired, then r=me :)

Reviewed 1 of 1 files at r16, 3 of 3 files at r17.
Review status: all files reviewed at latest revision, all discussions resolved.

Comments from Reviewable

This increases the performance of the stylesheet tokenization test about 20%, and now one of the hottest instructions is the sign extension rust does to index in the array.

This was causing unaligned moves (movups instructions), for some reason.

emilio · 2017-01-21T20:25:06Z

@bors-servo r=SimonSapin

bors-servo · 2017-01-21T20:25:07Z

📌 Commit 21f8573 has been approved by SimonSapin

bors-servo · 2017-01-21T20:25:10Z

⌛ Testing commit 21f8573 with merge 7c8a9a6...

Some perf tweaks to the tokenizer. This makes parsing quite faster, by stripping UTF-8 logic and using tables instead of branching everywhere. We may be able to tweak it a bit more (sometimes the table may be overkill? I don't know). I've written the table macro so you can skip it easily if you want. In any case, benchmark results: Before: > test tests::big_stylesheet ... bench: 10,392,017 ns/iter (+/- 1,954,644) > test tests::unquoted_url ... bench: 261,854 ns/iter (+/- 53,335) After: > test tests::big_stylesheet ... bench: 8,638,215 ns/iter (+/- 381,980) > test tests::unquoted_url ... bench: 211,863 ns/iter (+/- 73,418) Which is quite good if you ask me.  --- This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/rust-cssparser/112)

bors-servo · 2017-01-21T20:28:23Z

☀️ Test successful - status-travis

emilio · 2017-01-21T20:32:31Z

Thanks for the review @SimonSapin! :)

nox · 2017-03-13T15:58:39Z

Part of this (or maybe all of it) can be reverted once rust-lang/rust#39456 is in Rust stable.

SimonSapin · 2017-03-13T17:02:43Z

#128

Use less UTF-8 logic when not needed.

afcf259

emilio force-pushed the perf branch from 09e0255 to 6105db6 Compare January 18, 2017 13:14

emilio force-pushed the perf branch from 6105db6 to dee6833 Compare January 20, 2017 15:13

emilio added 4 commits January 21, 2017 21:23

Use a procedural macro to create jump tables instead of doing branching.

4676208

This increases the performance of the stylesheet tokenization test about 20%, and now one of the hottest instructions is the sign extension rust does to index in the array.

Change the signature of next_token to match Tokenizer::next()

b509dc0

This was causing unaligned moves (movups instructions), for some reason.

Remove most of the unnecessary UTF-8 logic from the tokenizer.

52e6527

travis: Test with and without procedural macro.

21f8573

emilio force-pushed the perf branch from 48e09f2 to 21f8573 Compare January 21, 2017 20:24

bors-servo merged commit 21f8573 into master Jan 21, 2017

emilio deleted the perf branch January 21, 2017 20:32

SimonSapin mentioned this pull request Mar 13, 2017

Reconsider match_byte macro if plain match becomes optimized enough #128

Open

SimonSapin mentioned this pull request Jun 25, 2017

Make it fast! #164

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Some perf tweaks to the tokenizer. #112

Some perf tweaks to the tokenizer. #112

emilio commented Jan 18, 2017 •

edited by larsbergstrom

Loading

emilio commented Jan 18, 2017

Ms2ger commented Jan 18, 2017

emilio commented Jan 18, 2017

emilio commented Jan 20, 2017

SimonSapin commented Jan 20, 2017

emilio commented Jan 20, 2017

emilio commented Jan 20, 2017

SimonSapin commented Jan 20, 2017

emilio commented Jan 21, 2017

SimonSapin commented Jan 21, 2017

emilio commented Jan 21, 2017

bors-servo commented Jan 21, 2017

bors-servo commented Jan 21, 2017

bors-servo commented Jan 21, 2017

emilio commented Jan 21, 2017

nox commented Mar 13, 2017

SimonSapin commented Mar 13, 2017

Some perf tweaks to the tokenizer. #112

Some perf tweaks to the tokenizer. #112

Conversation

emilio commented Jan 18, 2017 • edited by larsbergstrom Loading

emilio commented Jan 18, 2017

Ms2ger commented Jan 18, 2017

emilio commented Jan 18, 2017

emilio commented Jan 20, 2017

SimonSapin commented Jan 20, 2017

emilio commented Jan 20, 2017

emilio commented Jan 20, 2017

SimonSapin commented Jan 20, 2017

emilio commented Jan 21, 2017

SimonSapin commented Jan 21, 2017

emilio commented Jan 21, 2017

bors-servo commented Jan 21, 2017

bors-servo commented Jan 21, 2017

bors-servo commented Jan 21, 2017

emilio commented Jan 21, 2017

nox commented Mar 13, 2017

SimonSapin commented Mar 13, 2017

emilio commented Jan 18, 2017 •

edited by larsbergstrom

Loading