Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch to using thread_local regular expressions to stop mutext contention #996

Merged
merged 1 commit into from
Aug 26, 2023

Conversation

orf
Copy link
Contributor

@orf orf commented Aug 25, 2023

Fixes #986. This is also rebased on top of #995 to allow me to run the benchmarks.

Summary

When calling libcst_native::parse_module from multiple threads the parsing performance drops dramatically. This is somewhat documented in the regex_automata crate and in other places around the internet:

You need to work-around contention issues with sharing a regex across multiple threads. The meta::Regex::search_with API permits bypassing any kind of synchronization at all by requiring the caller to provide the mutable scratch spaced needed during a search.

Benchmarks

I've added a multi-threaded benchmark using Rayon. The use of thread locals improves the throughput by 22% and reduces the time by 18%:

parse_into_cst_parallel/all
                        time:   [476.86 ms 482.42 ms 488.68 ms]
                        thrpt:  [5.9343 Kelem/s 6.0113 Kelem/s 6.0814 Kelem/s]
                 change:
                        time:   [-19.567% -18.431% -17.452%] (p = 0.00 < 0.05)
                        thrpt:  [+21.142% +22.595% +24.327%]
                        Performance has improved.

However the throughput increase I've seen with high-volume "real-world" parsing is much, much higher.

Benchmarks

These are the results of the benchmark added in this change, with and without the thread_local! change. It charts the time taken to parse the fixture files. The number of files is multiplied by the number of threads, which ranges from 1 to 10. At 10 threads the average time taken is 50% faster.

Benchmark: Before

Screenshot 2023-08-25 at 21 09 00

Benchmark: Before

Screenshot 2023-08-25 at 21 09 04

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Aug 25, 2023
@orf orf force-pushed the use-thread-locals branch 5 times, most recently from f215978 to 77269e7 Compare August 25, 2023 18:53
@codecov-commenter
Copy link

codecov-commenter commented Aug 26, 2023

Codecov Report

Patch and project coverage have no change.

Comparison is base (b28777e) 90.93% compared to head (3f34a66) 90.93%.

❗ Your organization is not using the GitHub App Integration. As a result you may experience degraded service beginning May 15th. Please install the Github App Integration for your organization. Read more.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #996   +/-   ##
=======================================
  Coverage   90.93%   90.93%           
=======================================
  Files         254      254           
  Lines       25972    25972           
=======================================
  Hits        23617    23617           
  Misses       2355     2355           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Member

@zsol zsol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! This looks great to me. Happy to merge once #995 is in.

@zsol
Copy link
Member

zsol commented Aug 26, 2023

Oh, one thing though: you'll have to update the MSRV to (looks like 1.70), but that should be fine.

@orf
Copy link
Contributor Author

orf commented Aug 26, 2023

Done @zsol! I also bumped some dependencies. notably the regex bump also improves parsing speed quite a lot (the benchmarks above used the latest regex version). Let me know if you want me to split them out from this.

@orf orf force-pushed the use-thread-locals branch 2 times, most recently from ab536ea to 4ce22ba Compare August 26, 2023 12:21
@zsol zsol merged commit 75b6331 into Instagram:main Aug 26, 2023
21 checks passed
@zsol
Copy link
Member

zsol commented Aug 26, 2023

Thank you!

@orf orf deleted the use-thread-locals branch August 26, 2023 14:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Performance issues when using multiple threads due to regex scratch space contention
4 participants