
Speed up parsing with table-driven predicates. #276

Merged
merged 1 commit into ron-rs:master from table-predicates on Sep 2, 2020

Conversation

nnethercote
Contributor

When loading a capture in wrench, startup time is dominated by RON
parsing, and most of that time is spent in simple byte predicates such
as "is this a whitespace byte?"

These predicates currently use `slice::contains` to search through a
list of possible matching bytes. The list lengths range from 4 to 63.

This commit changes these predicates to use a table. Each predicate is
now just an array lookup plus a bit mask. This more than doubles the
parsing speed for the wrench capture I have been looking at.

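For illustration, here is a minimal sketch of the kind of table-driven predicate the commit message describes; the class names, bit layout, and exact byte sets below are assumptions for the example, not the PR's actual code. Each byte indexes a 256-entry table whose entries are bitsets of character classes, so a predicate becomes a single load plus a bit mask instead of a `slice::contains` scan:

```rust
// Illustrative sketch only: the class names, bit assignments, and table
// contents here are hypothetical, not the PR's actual code.
const WS: u8 = 1 << 0;    // whitespace bytes
const DIGIT: u8 = 1 << 1; // ASCII digit bytes

// One entry per possible byte value; each entry is a bitset of classes.
const fn build_table() -> [u8; 256] {
    let mut t = [0u8; 256];
    t[b' ' as usize] |= WS;
    t[b'\t' as usize] |= WS;
    t[b'\n' as usize] |= WS;
    t[b'\r' as usize] |= WS;
    let mut b = b'0';
    while b <= b'9' {
        t[b as usize] |= DIGIT;
        b += 1;
    }
    t
}

const TABLE: [u8; 256] = build_table();

// Before: something like WHITE_SPACE.contains(&b), a linear scan over up
// to 63 candidate bytes. After: one indexed load plus one bit mask.
#[inline]
fn is_whitespace_byte(b: u8) -> bool {
    TABLE[b as usize] & WS != 0
}

#[inline]
fn is_digit_byte(b: u8) -> bool {
    TABLE[b as usize] & DIGIT != 0
}

fn main() {
    assert!(is_whitespace_byte(b'\n'));
    assert!(!is_whitespace_byte(b'x'));
    assert!(is_digit_byte(b'7'));
}
```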
@kvark
Collaborator

kvark commented Sep 2, 2020

I'm happy to see the loading hot path optimized! I've been staring at WR captures loading for quite a bit :)
Before reviewing, I have a question: did you measure time in release? Can you share any numbers before/after the change, preferably in debug and release (since the former is most useful)?

@kvark kvark merged commit 2f44ef8 into ron-rs:master Sep 2, 2020
@nnethercote
Contributor Author

Here's a partial Cachegrind profile before, showing instruction counts:

--------------------------------------------------------------------------------
Ir                     
--------------------------------------------------------------------------------
1,791,647,600 (100.0%)  PROGRAM TOTALS

--------------------------------------------------------------------------------
Ir                    file:function
--------------------------------------------------------------------------------
250,463,413 (13.98%)  /rustc/8d69840ab92ea7f4d323420088dd8c9775f180cd//src/libcore/slice/mod.rs:<i8 as core::slice::SliceContains>::slice_contains
205,357,185 (11.46%)  /rustc/8d69840ab92ea7f4d323420088dd8c9775f180cd//src/libcore/slice/memchr.rs:<i8 as core::slice::SliceContains>::slice_contains
183,977,818 (10.27%)  /rustc/8d69840ab92ea7f4d323420088dd8c9775f180cd//src/libcore/iter/traits/iterator.rs:<i8 as core::slice::SliceContains>::slice_contains
180,738,660 (10.09%)  ???:???
128,940,782 ( 7.20%)  /home/njn/moz/ron/src/parse.rs:ron::parse::Bytes::skip_ws
 94,233,937 ( 5.26%)  /build/glibc-YYA7BZ/glibc-2.31/string/../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:__memcpy_avx_unaligned_erms
 55,896,204 ( 3.12%)  /rustc/8d69840ab92ea7f4d323420088dd8c9775f180cd/src/libcore/slice/mod.rs:ron::parse::Bytes::skip_ws
 46,453,884 ( 2.59%)  /rustc/8d69840ab92ea7f4d323420088dd8c9775f180cd//src/libcore/ptr/mod.rs:<i8 as core::slice::SliceContains>::slice_contains

And after:

--------------------------------------------------------------------------------
Ir
--------------------------------------------------------------------------------
957,105,658 (100.0%)  PROGRAM TOTALS
 
--------------------------------------------------------------------------------
Ir                    file:function
--------------------------------------------------------------------------------
180,319,309 (18.84%)  ???:???
123,210,617 (12.87%)  /home/njn/moz/ron/src/parse.rs:ron::parse::Bytes::skip_ws
 93,937,965 ( 9.81%)  /build/glibc-YYA7BZ/glibc-2.31/string/../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:__memcpy_avx_unaligned_erms
 40,092,835 ( 4.19%)  ???:llvm::PMTopLevelManager::setLastUser(llvm::ArrayRef<llvm::Pass*>, llvm::Pass*)
 24,953,670 ( 2.61%)  /home/njn/.cargo/registry/src/github.com-1ecc6299db9ec823/crossbeam-deque-0.7.3/src/lib.rs:crossbeam_deque::Stealer<T>::steal
 22,326,638 ( 2.33%)  /build/glibc-YYA7BZ/glibc-2.31/elf/dl-lookup.c:do_lookup_x
 20,365,249 ( 2.13%)  /rustc/8d69840ab92ea7f4d323420088dd8c9775f180cd/src/libcore/option.rs:ron::parse::Bytes::skip_ws
 19,668,728 ( 2.06%)  /rustc/8d69840ab92ea7f4d323420088dd8c9775f180cd//src/libcore/str/mod.rs:core::str::from_utf8

I only measured release builds, but this is a simple enough change that I am confident that it would be a significant win on debug builds too.
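(For scale: the program total in the profiles above drops from 1,791,647,600 to 957,105,658 instructions, roughly a 1.9x reduction across the entire run, and the `slice_contains` entries disappear from the top of the profile entirely.)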

@nnethercote
Contributor Author

This needs to make it into a RON release and then WebRender needs to be updated to use it, of course :)

@nnethercote nnethercote deleted the table-predicates branch September 2, 2020 05:42
@kvark
Collaborator

kvark commented Sep 2, 2020

The general workflow for getting patches into our projects is to file a separate PR against the branch of the released version (v0.6 in our case, but you could also do it for v0.5, etc.) and add a commit that bumps the crate's patch version and adds a small changelog entry. Would you be interested in making such a PR?

@nnethercote
Contributor Author

Is that a non-standard release model? I admit I don't exactly follow the steps required in your description.

I was previously confused as to why the current Cargo.toml on master says 0.6.0 when the current release on crates.io is 0.6.1. Perhaps this non-standard release model explains that?

@kvark
Collaborator

kvark commented Sep 7, 2020

What is the standard release model that you are referring to?

@nnethercote
Contributor Author

It seems pretty common for people to just release off trunk.

@kvark
Collaborator

kvark commented Sep 7, 2020

Trunk (or master here) is supposed to be the default branch that all changes target. If we released from it, we'd have to either forbid breaking changes on this branch or cut a release as soon as a breaking change lands. That still wouldn't solve the problem of needing to patch previously released versions, so branches for them would be needed anyway.

Anyhow, this is how we roll. Unless you have a specific reason for us to do it differently, let's keep this model; it has proved itself useful (and I'm open to talking more!).

kvark pushed a commit that referenced this pull request Sep 10, 2020
Speed up parsing with table-driven predicates
@kvark
Collaborator

kvark commented Sep 10, 2020

I ported it to v0.6 manually; it's part of ron-0.6.2.
