automata/nfa/backtrack: fix memory usage #1028

Merged
BurntSushi merged 1 commit into master from ag/fix-backtrack-memusage
Jul 7, 2023

BurntSushi
Member

This fixes a memory usage regression where the backtracker would eagerly allocate its entire capacity up-front. In this case, it meant a minimum of 256KB for every regex.

Prior to regex 1.9, 256KB was treated as a maximum, and we only allocated what we needed. This change ports that strategy to regex-automata as well. It probably comes with a latency cost (I'll run rebar to be sure it isn't horrendous), but we definitely can't eagerly allocate 256KB for every regex. If the latency ends up being an issue, we can investigate fixing that in other ways.
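
To make the difference between "capacity as a minimum" and "capacity as a maximum" concrete, here is a minimal sketch. It is not the code in this PR: `Visited`, `setup`, `insert`, and `memory_usage` are hypothetical names, and it assumes the backtracker tracks one bit per (NFA state, haystack offset) pair. The point is only that the configured capacity is checked as an upper bound, while the storage is sized to what the current search needs.

```rust
/// Number of bits in one storage block.
const BLOCK_BITS: usize = usize::BITS as usize;

/// Hypothetical visited set for a bounded backtracker: one bit per
/// (NFA state, haystack offset) pair.
struct Visited {
    /// Bitset storage. Starts empty instead of being sized to the capacity.
    blocks: Vec<usize>,
    /// Offsets tracked per state for the current search (haystack length + 1).
    stride: usize,
}

impl Visited {
    /// Creates an empty set. Nothing is allocated until a search is set up.
    fn new() -> Visited {
        Visited { blocks: Vec::new(), stride: 0 }
    }

    /// Prepares the set for one search. `capacity_bytes` is only an upper
    /// bound: the set is resized to what this search needs, and the call
    /// fails if that would exceed the bound.
    fn setup(
        &mut self,
        states: usize,
        haystack_len: usize,
        capacity_bytes: usize,
    ) -> Result<(), String> {
        let stride = haystack_len
            .checked_add(1)
            .ok_or_else(|| "haystack too long".to_string())?;
        let bits = states
            .checked_mul(stride)
            .ok_or_else(|| "visited set size overflows usize".to_string())?;
        // Round up to a whole number of blocks.
        let needed_blocks = bits / BLOCK_BITS + usize::from(bits % BLOCK_BITS != 0);
        let max_blocks = capacity_bytes / std::mem::size_of::<usize>();
        if needed_blocks > max_blocks {
            return Err(format!(
                "search needs {} blocks, capacity allows {}",
                needed_blocks, max_blocks
            ));
        }
        self.stride = stride;
        // Reset every bit and size the storage to exactly what is needed.
        self.blocks.clear();
        self.blocks.resize(needed_blocks, 0);
        Ok(())
    }

    /// Marks (state, at) as visited. Returns false if it was already marked.
    fn insert(&mut self, state: usize, at: usize) -> bool {
        let bit = state * self.stride + at;
        let (index, mask) = (bit / BLOCK_BITS, 1usize << (bit % BLOCK_BITS));
        if self.blocks[index] & mask != 0 {
            return false;
        }
        self.blocks[index] |= mask;
        true
    }

    /// Bytes occupied by the blocks currently in use.
    fn memory_usage(&self) -> usize {
        self.blocks.len() * std::mem::size_of::<usize>()
    }
}
```

With this shape, a small regex searching a short haystack only pays for the bits it uses. For example, 10 states over a 99-byte haystack need about 1000 bits, which is far below a 256KB cap. The eager variant would instead size `blocks` to the full capacity in `new`.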

Fixes #1027

@BurntSushi
Member Author

Benchmarks on just the backtracking engine suggest any difference is within noise:

$ rebar diff tmp/old.csv tmp/new.csv
benchmark                              engine                tmp/old.csv         tmp/new.csv
---------                              ------                -----------         -----------
opt/backtrack/words-english            rust/regex/backtrack  56.6 MB/s (1.00x)   56.3 MB/s (1.01x)
opt/backtrack/words-russian            rust/regex/backtrack  44.0 MB/s (1.00x)   44.1 MB/s (1.00x)
opt/onepass/fn-predicate               rust/regex/backtrack  371.2 MB/s (1.00x)  354.1 MB/s (1.05x)
opt/onepass/first-three-words-english  rust/regex/backtrack  236.8 MB/s (1.00x)  232.1 MB/s (1.02x)
opt/onepass/first-three-words-russian  rust/regex/backtrack  260.0 MB/s (1.00x)  255.5 MB/s (1.02x)
opt/onepass/word-boundary-english      rust/regex/backtrack  603.0 MB/s (1.00x)  567.9 MB/s (1.06x)
opt/onepass/word-boundary-russian      rust/regex/backtrack  715.4 MB/s (1.00x)  686.1 MB/s (1.04x)

I also ran the full regex engine on the curated set of benchmarks and all seems well there too:

$ rebar diff record/all/2023-07-02/rust-regex.csv tmp/curated-rust-regex.csv -t 1.00001
benchmark                                       engine      record/all/2023-07-02/rust-regex.csv  tmp/curated-rust-regex.csv
---------                                       ------      ------------------------------------  --------------------------
curated/01-literal/sherlock-en                  rust/regex  31.5 GB/s (1.04x)                     32.7 GB/s (1.00x)
curated/01-literal/sherlock-casei-en            rust/regex  10.0 GB/s (1.12x)                     11.2 GB/s (1.00x)
curated/01-literal/sherlock-ru                  rust/regex  31.3 GB/s (1.03x)                     32.1 GB/s (1.00x)
curated/01-literal/sherlock-casei-ru            rust/regex  8.7 GB/s (1.03x)                      9.0 GB/s (1.00x)
curated/01-literal/sherlock-zh                  rust/regex  40.3 GB/s (1.00x)                     39.5 GB/s (1.02x)
curated/02-literal-alternate/sherlock-en        rust/regex  12.5 GB/s (1.01x)                     12.7 GB/s (1.00x)
curated/02-literal-alternate/sherlock-casei-en  rust/regex  2.9 GB/s (1.00x)                      2.9 GB/s (1.00x)
curated/02-literal-alternate/sherlock-ru        rust/regex  6.5 GB/s (1.02x)                      6.6 GB/s (1.00x)
curated/02-literal-alternate/sherlock-zh        rust/regex  12.6 GB/s (1.19x)                     15.1 GB/s (1.00x)
curated/03-date/ascii                           rust/regex  163.3 MB/s (1.00x)                    162.2 MB/s (1.01x)
curated/03-date/unicode                         rust/regex  162.2 MB/s (1.00x)                    161.2 MB/s (1.01x)
curated/03-date/compile-unicode                 rust/regex  5.27ms (1.00x)                        5.36ms (1.02x)
curated/04-ruff-noqa/real                       rust/regex  1676.1 MB/s (1.00x)                   1655.5 MB/s (1.01x)
curated/04-ruff-noqa/tweaked                    rust/regex  1620.1 MB/s (1.00x)                   1562.1 MB/s (1.04x)
curated/04-ruff-noqa/compile-real               rust/regex  53.74us (1.01x)                       53.10us (1.00x)
curated/05-lexer-veryl/single                   rust/regex  9.3 MB/s (1.01x)                      9.4 MB/s (1.00x)
curated/05-lexer-veryl/compile-single           rust/regex  272.44us (1.01x)                      270.92us (1.00x)
curated/06-cloud-flare-redos/original           rust/regex  583.1 MB/s (1.00x)                    579.8 MB/s (1.01x)
curated/07-unicode-character-data/parse-line    rust/regex  330.0 MB/s (1.01x)                    331.8 MB/s (1.00x)
curated/07-unicode-character-data/compile       rust/regex  28.04us (1.00x)                       28.02us (1.00x)
curated/08-words/all-english                    rust/regex  119.9 MB/s (1.01x)                    120.8 MB/s (1.00x)
curated/08-words/all-russian                    rust/regex  19.6 MB/s (1.00x)                     18.7 MB/s (1.05x)
curated/08-words/long-russian                   rust/regex  33.7 MB/s (1.00x)                     32.7 MB/s (1.03x)
curated/09-aws-keys/full                        rust/regex  1782.1 MB/s (1.00x)                   1764.8 MB/s (1.01x)
curated/09-aws-keys/quick                       rust/regex  1854.6 MB/s (1.01x)                   1868.0 MB/s (1.00x)
curated/09-aws-keys/compile-full                rust/regex  87.28us (1.03x)                       84.91us (1.00x)
curated/09-aws-keys/compile-quick               rust/regex  15.13us (1.00x)                       15.07us (1.00x)
curated/10-bounded-repeat/letters-en            rust/regex  715.0 MB/s (1.01x)                    721.7 MB/s (1.00x)
curated/10-bounded-repeat/letters-ru            rust/regex  642.7 MB/s (1.00x)                    643.6 MB/s (1.00x)
curated/10-bounded-repeat/context               rust/regex  99.6 MB/s (1.00x)                     96.9 MB/s (1.03x)
curated/10-bounded-repeat/compile-context       rust/regex  61.49us (1.05x)                       58.81us (1.00x)
curated/10-bounded-repeat/compile-capitals      rust/regex  61.33us (1.02x)                       60.36us (1.00x)
curated/11-unstructured-to-json/extract         rust/regex  109.5 MB/s (1.02x)                    112.1 MB/s (1.00x)
curated/11-unstructured-to-json/compile         rust/regex  20.06us (1.03x)                       19.54us (1.00x)
curated/12-dictionary/single                    rust/regex  714.9 MB/s (1.00x)                    714.7 MB/s (1.00x)
curated/12-dictionary/multi                     rust/regex  187.0 MB/s (1.00x)                    180.8 MB/s (1.03x)
curated/12-dictionary/compile-single            rust/regex  7.74ms (1.00x)                        8.26ms (1.07x)
curated/12-dictionary/compile-multi             rust/regex  15.87ms (1.00x)                       16.51ms (1.04x)
curated/13-noseyparker/single                   rust/regex  132.8 MB/s (1.00x)                    128.8 MB/s (1.03x)
curated/13-noseyparker/multi                    rust/regex  104.0 MB/s (1.00x)                    101.4 MB/s (1.03x)
curated/13-noseyparker/compile-single           rust/regex  2.44ms (1.01x)                        2.41ms (1.00x)
curated/14-quadratic/1x                         rust/regex  17.8 MB/s (1.00x)                     17.7 MB/s (1.00x)
curated/14-quadratic/10x                        rust/regex  1706.3 KB/s (1.00x)                   1706.5 KB/s (1.00x)

@BurntSushi BurntSushi merged commit 4e89cbf into master Jul 7, 2023
@BurntSushi BurntSushi deleted the ag/fix-backtrack-memusage branch July 7, 2023 17:42
@BurntSushi BurntSushi restored the ag/fix-backtrack-memusage branch July 7, 2023 17:46
@BurntSushi BurntSushi deleted the ag/fix-backtrack-memusage branch July 7, 2023 17:46
Successfully merging this pull request may close these issues.

Increased runtime memory usage in 1.9