
Speed up parsing with table-driven predicates. #276

Merged
merged 1 commit into ron-rs:master from table-predicates on Sep 2, 2020

Conversation

nnethercote
Contributor

When loading a capture in wrench, startup time is dominated by RON
parsing, and most of that time is spent in simple byte predicates such
as "is this a whitespace byte?"

These predicates currently use `slice::contains` to search through a
list of possible matching bytes. The list lengths range from 4 to 63.

This commit changes these predicates to use a table. Each predicate is
now just an array lookup plus a bit mask. This more than doubles the
parsing speed for the wrench capture I have been looking at.

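For illustration, here is a minimal sketch of the kind of table-driven predicate the commit message describes; the class names, bit layout, and exact byte sets below are assumptions for the example, not the PR's actual code. Each byte indexes a 256-entry table whose entries are bitsets of character classes, so a predicate becomes a single load plus a bit mask instead of a `slice::contains` scan:

```rust
// Illustrative sketch only: the class names, bit assignments, and table
// contents here are hypothetical, not the PR's actual code.
const WS: u8 = 1 << 0;    // whitespace bytes
const DIGIT: u8 = 1 << 1; // ASCII digit bytes

// One entry per possible byte value; each entry is a bitset of classes.
const fn build_table() -> [u8; 256] {
    let mut t = [0u8; 256];
    t[b' ' as usize] |= WS;
    t[b'\t' as usize] |= WS;
    t[b'\n' as usize] |= WS;
    t[b'\r' as usize] |= WS;
    let mut b = b'0';
    while b <= b'9' {
        t[b as usize] |= DIGIT;
        b += 1;
    }
    t
}

const TABLE: [u8; 256] = build_table();

// Before: something like WHITE_SPACE.contains(&b), a linear scan over up
// to 63 candidate bytes. After: one indexed load plus one bit mask.
#[inline]
fn is_whitespace_byte(b: u8) -> bool {
    TABLE[b as usize] & WS != 0
}

#[inline]
fn is_digit_byte(b: u8) -> bool {
    TABLE[b as usize] & DIGIT != 0
}

fn main() {
    assert!(is_whitespace_byte(b'\n'));
    assert!(!is_whitespace_byte(b'x'));
    assert!(is_digit_byte(b'7'));
}
```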
@kvark
Collaborator

kvark commented Sep 2, 2020

I'm happy to see the loading hot path optimized! I've been staring at WR captures loading for quite a bit :)
Before reviewing, I have a question: did you measure time in release? Can you share any numbers before/after the change, preferably in debug and release (since the former is most useful)?

@kvark kvark merged commit 2f44ef8 into ron-rs:master Sep 2, 2020
@nnethercote
Contributor Author

Here's a partial Cachegrind profile before, showing instruction counts:

--------------------------------------------------------------------------------
Ir                     
--------------------------------------------------------------------------------
1,791,647,600 (100.0%)  PROGRAM TOTALS

--------------------------------------------------------------------------------
Ir                    file:function
--------------------------------------------------------------------------------
250,463,413 (13.98%)  /rustc/8d69840ab92ea7f4d323420088dd8c9775f180cd//src/libcore/slice/mod.rs:<i8 as core::slice::SliceContains>::slice_contains
205,357,185 (11.46%)  /rustc/8d69840ab92ea7f4d323420088dd8c9775f180cd//src/libcore/slice/memchr.rs:<i8 as core::slice::SliceContains>::slice_contains
183,977,818 (10.27%)  /rustc/8d69840ab92ea7f4d323420088dd8c9775f180cd//src/libcore/iter/traits/iterator.rs:<i8 as core::slice::SliceContains>::slice_contains
180,738,660 (10.09%)  ???:???
128,940,782 ( 7.20%)  /home/njn/moz/ron/src/parse.rs:ron::parse::Bytes::skip_ws
 94,233,937 ( 5.26%)  /build/glibc-YYA7BZ/glibc-2.31/string/../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:__memcpy_avx_unaligned_erms
 55,896,204 ( 3.12%)  /rustc/8d69840ab92ea7f4d323420088dd8c9775f180cd/src/libcore/slice/mod.rs:ron::parse::Bytes::skip_ws
 46,453,884 ( 2.59%)  /rustc/8d69840ab92ea7f4d323420088dd8c9775f180cd//src/libcore/ptr/mod.rs:<i8 as core::slice::SliceContains>::slice_contains

And after:

--------------------------------------------------------------------------------
Ir
--------------------------------------------------------------------------------
957,105,658 (100.0%)  PROGRAM TOTALS
 
--------------------------------------------------------------------------------
Ir                    file:function
--------------------------------------------------------------------------------
180,319,309 (18.84%)  ???:???
123,210,617 (12.87%)  /home/njn/moz/ron/src/parse.rs:ron::parse::Bytes::skip_ws
 93,937,965 ( 9.81%)  /build/glibc-YYA7BZ/glibc-2.31/string/../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:__memcpy_avx_unaligned_erms
 40,092,835 ( 4.19%)  ???:llvm::PMTopLevelManager::setLastUser(llvm::ArrayRef<llvm::Pass*>, llvm::Pass*)
 24,953,670 ( 2.61%)  /home/njn/.cargo/registry/src/github.com-1ecc6299db9ec823/crossbeam-deque-0.7.3/src/lib.rs:crossbeam_deque::Stealer<T>::steal
 22,326,638 ( 2.33%)  /build/glibc-YYA7BZ/glibc-2.31/elf/dl-lookup.c:do_lookup_x
 20,365,249 ( 2.13%)  /rustc/8d69840ab92ea7f4d323420088dd8c9775f180cd/src/libcore/option.rs:ron::parse::Bytes::skip_ws
 19,668,728 ( 2.06%)  /rustc/8d69840ab92ea7f4d323420088dd8c9775f180cd//src/libcore/str/mod.rs:core::str::from_utf8

I only measured release builds, but this is a simple enough change that I am confident that it would be a significant win on debug builds too.
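(For scale: the program total in the profiles above drops from 1,791,647,600 to 957,105,658 instructions, roughly a 1.9x reduction across the entire run, and the `slice_contains` entries disappear from the top of the profile entirely.)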

@nnethercote
Contributor Author

This needs to make it into a RON release and then WebRender needs to be updated to use it, of course :)

@nnethercote nnethercote deleted the table-predicates branch September 2, 2020 05:42
@kvark
Collaborator

kvark commented Sep 2, 2020

The general workflow for getting patches into our projects is to file a separate PR against the branch of the released version (v0.6 in our case, but you could also do it for v0.5, etc.) and add a commit that bumps the crate's patch version and adds a small changelog entry. Would you be interested in making such a PR?

@nnethercote
Contributor Author

Is that a non-standard release model? I admit I don't exactly follow the steps required in your description.

I was previously confused as to why the current Cargo.toml on master says 0.6.0 when the current release on crates.io is 0.6.1. Perhaps this non-standard release model explains that?

@kvark
Collaborator

kvark commented Sep 7, 2020

What is the standard release model that you are referring to?

@nnethercote
Contributor Author

It seems pretty common for people to just release off trunk.

@kvark
Collaborator

kvark commented Sep 7, 2020

Trunk (or master here) is supposed to be the default branch that all changes target. If we released from it, we'd have to either forbid breaking changes on this branch or cut a release as soon as a breaking change lands. That still wouldn't solve the problem of needing to patch previously released versions, so branches for them would be needed anyway.

Anyhow, this is how we roll. Unless you have a specific reason for us to do it differently, let's keep this model; it has proved itself useful (and I'm open to talking more!).

kvark pushed a commit that referenced this pull request Sep 10, 2020
Speed up parsing with table-driven predicates
@kvark
Collaborator

kvark commented Sep 10, 2020

I ported it to v0.6 manually; it's part of ron-0.6.2.
