Cache parsed object names #675
Conversation
Looking up the fstring table isn't that cheap. If we assume most object names are likely to re-appear, it makes sense to keep a small cache of them on the stack, e.g. `[{"foo": 1, "bar": 2}, {"foo": 3, "bar": 4}]`.

In terms of implementation, we use a simple sorted array with binary search, as it's the most compact and guarantees a decent `O(log n)`. The comparison is first done on the string length, and then falls back to `memcmp`.

This helps quite a bit on the `activitypub` and `twitter` benchmarks, however it is worse on `citm_catalog` because it has a number of unique object names of the same size, so it behaves poorly with the chosen sorting algorithm. It's a bit of a double-edged sword, but perhaps a smarter comparison function may solve it.

Before:

```
== Parsing activitypub.json (58160 bytes)
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
        json   714.000 i/100ms
          oj   801.000 i/100ms
  Oj::Parser   943.000 i/100ms
   rapidjson   632.000 i/100ms
Calculating -------------------------------------
        json     7.135k (± 0.7%) i/s  (140.16 μs/i) - 35.700k in 5.003978s
          oj     7.991k (± 0.2%) i/s  (125.14 μs/i) - 40.050k in 5.012044s
  Oj::Parser     9.611k (± 0.2%) i/s  (104.04 μs/i) - 48.093k in 5.003723s
   rapidjson     6.318k (± 0.2%) i/s  (158.29 μs/i) - 31.600k in 5.001896s

Comparison:
        json:   7134.7 i/s
  Oj::Parser:   9611.5 i/s - 1.35x faster
          oj:   7990.8 i/s - 1.12x faster
   rapidjson:   6317.6 i/s - 1.13x slower

== Parsing twitter.json (567916 bytes)
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
        json    57.000 i/100ms
          oj    62.000 i/100ms
  Oj::Parser    78.000 i/100ms
   rapidjson    56.000 i/100ms
Calculating -------------------------------------
        json   573.527 (± 1.6%) i/s  (1.74 ms/i) - 2.907k in 5.070094s
          oj   619.368 (± 1.6%) i/s  (1.61 ms/i) - 3.100k in 5.006550s
  Oj::Parser   770.095 (± 0.9%) i/s  (1.30 ms/i) - 3.900k in 5.064768s
   rapidjson   560.601 (± 0.4%) i/s  (1.78 ms/i) - 2.856k in 5.094597s

Comparison:
        json:   573.5 i/s
  Oj::Parser:   770.1 i/s - 1.34x faster
          oj:   619.4 i/s - 1.08x faster
   rapidjson:   560.6 i/s - 1.02x slower

== Parsing citm_catalog.json (1727030 bytes)
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
        json    31.000 i/100ms
          oj    34.000 i/100ms
  Oj::Parser    46.000 i/100ms
   rapidjson    38.000 i/100ms
Calculating -------------------------------------
        json   319.842 (± 0.3%) i/s  (3.13 ms/i) - 1.612k in 5.040026s
          oj   329.315 (± 2.4%) i/s  (3.04 ms/i) - 1.666k in 5.061887s
  Oj::Parser   452.725 (± 1.1%) i/s  (2.21 ms/i) - 2.300k in 5.080996s
   rapidjson   358.160 (± 0.8%) i/s  (2.79 ms/i) - 1.824k in 5.093054s

Comparison:
        json:   319.8 i/s
  Oj::Parser:   452.7 i/s - 1.42x faster
   rapidjson:   358.2 i/s - 1.12x faster
          oj:   329.3 i/s - 1.03x faster
```

After:

```
== Parsing activitypub.json (58160 bytes)
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
        json   810.000 i/100ms
          oj   794.000 i/100ms
  Oj::Parser   936.000 i/100ms
   rapidjson   630.000 i/100ms
Calculating -------------------------------------
        json     7.759k (± 1.2%) i/s  (128.88 μs/i) - 38.880k in 5.011657s
          oj     7.945k (± 1.2%) i/s  (125.87 μs/i) - 40.494k in 5.097765s
  Oj::Parser     9.658k (± 1.3%) i/s  (103.54 μs/i) - 48.672k in 5.040614s
   rapidjson     6.321k (± 0.3%) i/s  (158.20 μs/i) - 32.130k in 5.082985s

Comparison:
        json:   7759.2 i/s
  Oj::Parser:   9657.8 i/s - 1.24x faster
          oj:   7944.6 i/s - same-ish: difference falls within error
   rapidjson:   6321.1 i/s - 1.23x slower

== Parsing twitter.json (567916 bytes)
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
        json    66.000 i/100ms
          oj    62.000 i/100ms
  Oj::Parser    78.000 i/100ms
   rapidjson    56.000 i/100ms
Calculating -------------------------------------
        json   661.653 (± 0.6%) i/s  (1.51 ms/i) - 3.366k in 5.087447s
          oj   621.674 (± 1.3%) i/s  (1.61 ms/i) - 3.162k in 5.087171s
  Oj::Parser   775.859 (± 0.3%) i/s  (1.29 ms/i) - 3.900k in 5.026718s
   rapidjson   563.961 (± 0.4%) i/s  (1.77 ms/i) - 2.856k in 5.064230s

Comparison:
        json:   661.7 i/s
  Oj::Parser:   775.9 i/s - 1.17x faster
          oj:   621.7 i/s - 1.06x slower
   rapidjson:   564.0 i/s - 1.17x slower

== Parsing citm_catalog.json (1727030 bytes)
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
        json    30.000 i/100ms
          oj    34.000 i/100ms
  Oj::Parser    47.000 i/100ms
   rapidjson    38.000 i/100ms
Calculating -------------------------------------
        json   303.318 (± 0.3%) i/s  (3.30 ms/i) - 1.530k in 5.044285s
          oj   328.832 (± 2.7%) i/s  (3.04 ms/i) - 1.666k in 5.069899s
  Oj::Parser   458.235 (± 0.4%) i/s  (2.18 ms/i) - 2.303k in 5.025895s
   rapidjson   381.204 (± 0.5%) i/s  (2.62 ms/i) - 1.938k in 5.084065s

Comparison:
        json:   303.3 i/s
  Oj::Parser:   458.2 i/s - 1.51x faster
   rapidjson:   381.2 i/s - 1.26x faster
          oj:   328.8 i/s - 1.08x faster
```
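To make the approach above concrete, here is a minimal sketch of a sorted-array name cache with binary search, comparing by length first and only falling back to `memcmp` on equal lengths. All identifiers here (`name_cache`, `cache_fetch`, the capacity constant) and the plain `malloc` copy standing in for the interned Ruby string are assumptions for illustration, not the actual code in `name_cache.h`:

```
#include <stdlib.h>
#include <string.h>

#define NAME_CACHE_CAPACITY 63  /* hypothetical: small enough to live on the stack */

typedef struct {
    const char *entries[NAME_CACHE_CAPACITY];
    int length;
} name_cache;

/* Compare by length first (a single integer comparison), and only fall back
 * to memcmp when the lengths match. */
static int name_cmp(const char *str, long len, const char *entry)
{
    long entry_len = (long)strlen(entry);
    if (len != entry_len) {
        return len < entry_len ? -1 : 1;
    }
    return memcmp(str, entry, (size_t)len);
}

/* Binary-search the sorted array; on a miss, copy the name (a stand-in for the
 * interned Ruby string in the real extension) and insert it so the array stays sorted. */
static const char *cache_fetch(name_cache *cache, const char *str, long len)
{
    int low = 0, high = cache->length - 1;

    while (low <= high) {
        int mid = low + (high - low) / 2;
        int cmp = name_cmp(str, len, cache->entries[mid]);
        if (cmp == 0) {
            return cache->entries[mid]; /* hit: reuse the previously seen name */
        } else if (cmp < 0) {
            high = mid - 1;
        } else {
            low = mid + 1;
        }
    }

    char *copy = malloc((size_t)len + 1);
    memcpy(copy, str, (size_t)len);
    copy[len] = '\0';

    if (cache->length < NAME_CACHE_CAPACITY) {
        /* Shift the tail one slot right so the array stays sorted. */
        memmove(&cache->entries[low + 1], &cache->entries[low],
                (size_t)(cache->length - low) * sizeof(const char *));
        cache->entries[low] = copy;
        cache->length++;
    }
    return copy;
}
```

The length-first comparison is also what hurts `citm_catalog`: with many distinct names of identical length, nearly every probe falls through to `memcmp`.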
Based on how JSON documents are generally constructed, a numeric object name is much less likely to be repeated in the document.

```
== Parsing activitypub.json (58160 bytes)
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
        json   779.000 i/100ms
          oj   793.000 i/100ms
  Oj::Parser   956.000 i/100ms
   rapidjson   620.000 i/100ms
Calculating -------------------------------------
        json     7.849k (± 0.3%) i/s  (127.41 μs/i) - 39.729k in 5.061875s
          oj     7.953k (± 0.3%) i/s  (125.74 μs/i) - 40.443k in 5.085293s
  Oj::Parser     9.652k (± 0.2%) i/s  (103.61 μs/i) - 48.756k in 5.051573s
   rapidjson     6.198k (± 0.1%) i/s  (161.35 μs/i) - 31.000k in 5.001787s

Comparison:
        json:   7848.7 i/s
  Oj::Parser:   9651.7 i/s - 1.23x faster
          oj:   7953.0 i/s - 1.01x faster
   rapidjson:   6197.8 i/s - 1.27x slower

== Parsing twitter.json (567916 bytes)
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
        json    66.000 i/100ms
          oj    62.000 i/100ms
  Oj::Parser    79.000 i/100ms
   rapidjson    56.000 i/100ms
Calculating -------------------------------------
        json   676.145 (± 0.3%) i/s  (1.48 ms/i) - 3.432k in 5.075895s
          oj   631.864 (± 0.3%) i/s  (1.58 ms/i) - 3.162k in 5.004287s
  Oj::Parser   791.409 (± 0.3%) i/s  (1.26 ms/i) - 4.029k in 5.090947s
   rapidjson   566.597 (± 0.2%) i/s  (1.76 ms/i) - 2.856k in 5.040641s

Comparison:
        json:   676.1 i/s
  Oj::Parser:   791.4 i/s - 1.17x faster
          oj:   631.9 i/s - 1.07x slower
   rapidjson:   566.6 i/s - 1.19x slower

== Parsing citm_catalog.json (1727030 bytes)
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
        json    34.000 i/100ms
          oj    32.000 i/100ms
  Oj::Parser    43.000 i/100ms
   rapidjson    37.000 i/100ms
Calculating -------------------------------------
        json   356.414 (± 0.3%) i/s  (2.81 ms/i) - 1.802k in 5.055963s
          oj   349.397 (± 2.0%) i/s  (2.86 ms/i) - 1.760k in 5.039216s
  Oj::Parser   478.342 (± 0.4%) i/s  (2.09 ms/i) - 2.408k in 5.034158s
   rapidjson   395.222 (± 0.8%) i/s  (2.53 ms/i) - 1.998k in 5.055669s

Comparison:
        json:   356.4 i/s
  Oj::Parser:   478.3 i/s - 1.34x faster
   rapidjson:   395.2 i/s - 1.11x faster
          oj:   349.4 i/s - same-ish: difference falls within error
```
4a04252 solves the regression. It's simple: we just don't cache names that don't start with a letter. This may sound like benchmark gaming, but I think it makes sense as a heuristic given how most JSON documents are constructed.
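A minimal sketch of that heuristic, with a made-up helper name: only names starting with an ASCII letter are considered for the cache, everything else is interned directly.

```
#include <ctype.h>

/* Hypothetical guard: names starting with a digit or other non-letter character
 * are typically generated identifiers (IDs, timestamps, ...) that rarely repeat,
 * so they bypass the cache and get interned directly. */
static int cacheable_name_p(const char *str, long length)
{
    return length > 0 && isalpha((unsigned char)str[0]);
}
```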
ext/json/ext/parser/name_cache.h
Outdated
```
static VALUE rstring_cache_fetch(rstring_cache *cache, const char *str, const long length)
{
    if (RB_UNLIKELY(length > JSON_RSTRING_CACHE_MAX_ENTRY_LENGTH)) {
        rb_enc_interned_str(str, length, rb_utf8_encoding());
```
Is this missing a `return`?
Fixed in the next commit, but I wonder if it affects the benchmark results for this commit.
Yeah, it was a mistake. I don't think we ever hit that branch on any of the benchmarks.
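For reference, the fix presumably amounts to returning the interned string from that early branch; a sketch of the shape of the fix, not the exact commit:

```
if (RB_UNLIKELY(length > JSON_RSTRING_CACHE_MAX_ENTRY_LENGTH)) {
    return rb_enc_interned_str(str, length, rb_utf8_encoding());
}
```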
@eregon while you're here, I'm surprised to see TruffleRuby is missing `rb_enc_interned_str`.
https://github.com/oracle/truffleruby/blob/07b978e4e0d43827ca13c600af447c98ac477a71/src/main/c/cext/string.c#L435-L442 looks like TruffleRuby has rb_enc_interned_str_cstr() and rb_str_to_interned_str() but not rb_enc_interned_str() yet -> oracle/truffleruby#3703
Also micro-optimize a bit further.

```
== Parsing activitypub.json (58160 bytes)
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
        json   832.000 i/100ms
          oj   799.000 i/100ms
  Oj::Parser   969.000 i/100ms
   rapidjson   636.000 i/100ms
Calculating -------------------------------------
        json     8.020k (± 0.3%) i/s  (124.70 μs/i) - 40.768k in 5.083607s
          oj     7.942k (± 1.7%) i/s  (125.92 μs/i) - 39.950k in 5.031909s
  Oj::Parser     9.515k (± 4.4%) i/s  (105.10 μs/i) - 47.481k in 5.001202s
   rapidjson     6.282k (± 2.1%) i/s  (159.20 μs/i) - 31.800k in 5.064719s

Comparison:
        json:   8019.6 i/s
  Oj::Parser:   9514.5 i/s - 1.19x faster
          oj:   7941.9 i/s - same-ish: difference falls within error
   rapidjson:   6281.6 i/s - 1.28x slower

== Parsing twitter.json (567916 bytes)
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
        json    67.000 i/100ms
          oj    62.000 i/100ms
  Oj::Parser    79.000 i/100ms
   rapidjson    55.000 i/100ms
Calculating -------------------------------------
        json   670.935 (± 2.7%) i/s  (1.49 ms/i) - 3.417k in 5.096850s
          oj   618.937 (± 3.2%) i/s  (1.62 ms/i) - 3.100k in 5.014800s
  Oj::Parser   768.894 (± 1.7%) i/s  (1.30 ms/i) - 3.871k in 5.036093s
   rapidjson   556.882 (± 2.7%) i/s  (1.80 ms/i) - 2.805k in 5.040970s

Comparison:
        json:   670.9 i/s
  Oj::Parser:   768.9 i/s - 1.15x faster
          oj:   618.9 i/s - 1.08x slower
   rapidjson:   556.9 i/s - 1.20x slower

== Parsing citm_catalog.json (1727030 bytes)
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
        json    35.000 i/100ms
          oj    32.000 i/100ms
  Oj::Parser    43.000 i/100ms
   rapidjson    39.000 i/100ms
Calculating -------------------------------------
        json   382.262 (± 2.6%) i/s  (2.62 ms/i) - 1.925k in 5.039336s
          oj   348.721 (± 0.6%) i/s  (2.87 ms/i) - 1.760k in 5.047265s
  Oj::Parser   478.294 (± 0.6%) i/s  (2.09 ms/i) - 2.408k in 5.034798s
   rapidjson   398.740 (± 0.8%) i/s  (2.51 ms/i) - 2.028k in 5.086365s

Comparison:
        json:   382.3 i/s
  Oj::Parser:   478.3 i/s - 1.25x faster
   rapidjson:   398.7 i/s - 1.04x faster
          oj:   348.7 i/s - 1.10x slower
```
I fixed the TruffleRuby and Ruby 2.7 support, and squeezed some more micro-optimizations:
I'll squash and merge.
This is somewhat dead code, as unless you are using `JSON::Parser.new` directly we never allocate `JSON::Ext::Parser` anymore. But still, we should mark all its references in case some code out there uses that. Followup: ruby#675
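As a sketch of what "marking all its references" means for a C extension (the struct and its fields below are illustrative, not the exact `JSON::Ext::Parser` layout): every Ruby `VALUE` stored in the C struct must be reported to the GC from the mark function.

```
#include <ruby.h>

/* Illustrative struct: the real parser struct has more fields. */
typedef struct {
    VALUE Vsource;
    VALUE create_id;
    VALUE object_class;
    VALUE array_class;
} hypothetical_parser;

static void parser_mark(void *ptr)
{
    hypothetical_parser *p = (hypothetical_parser *)ptr;
    /* Every VALUE held by the struct must be marked, otherwise the GC may
     * free or move objects the parser still references. */
    rb_gc_mark(p->Vsource);
    rb_gc_mark(p->create_id);
    rb_gc_mark(p->object_class);
    rb_gc_mark(p->array_class);
}
```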
Heavily inspired by ruby/json#675. When parsing documents with the same structure repeatedly, a lot of time can be saved by keeping a small cache of the map keys encountered.

Using the existing `bench/bench.rb`, comparing with and without the cache shows a 30% performance improvement:

```
Calculating -------------------------------------
     unpack-pooled   960.380k (± 1.4%) i/s - 4.865M in 5.066600s
  unpack-key-cache     1.245M (± 1.6%) i/s - 6.232M in 5.009060s

Comparison:
     unpack-pooled:    960379.8 i/s
  unpack-key-cache:   1244517.6 i/s - 1.30x (± 0.00) faster
```

However, on the same benchmark but with the cache filled with other keys, the performance is notably degraded:

```
Calculating -------------------------------------
     unpack-pooled   926.849k (± 2.1%) i/s - 4.639M in 5.007333s
  unpack-key-cache   822.266k (± 2.4%) i/s - 4.113M in 5.004645s

Comparison:
     unpack-pooled:    926849.2 i/s
  unpack-key-cache:    822265.6 i/s - 1.13x (± 0.00) slower
```

So this feature is powerful but situational.