
perf(NODE-5910): optimize small byte copies #651

Merged 1 commit into main on Feb 27, 2024

Conversation

@nbbeeken (Contributor) commented on Feb 22, 2024

Description

What is changing?

  • For small byte sequences, copy the values "manually" rather than using TypedArray.set
Is there new documentation needed for these changes?

No.

What is the motivation for this change?

  • TypedArray.set can be slower than a plain for loop that assigns bytes into the array one at a time, for copies of 16 bytes or fewer (see the sketch after this list).
  • ObjectIds are 12 bytes, while UUID and Decimal128 values are 16 bytes each.
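
As a rough illustration of the idea (not the library's actual internal code; the helper name, the 16-byte threshold constant, and the call site below are made up for this sketch):

```ts
// Illustrative sketch only: copy `source` into `target` at `targetOffset`.
// For tiny copies (ObjectId = 12 bytes, UUID/Decimal128 = 16 bytes) a plain
// for loop avoids the fixed overhead of calling TypedArray.prototype.set.
function copySmallBytes(target: Uint8Array, source: Uint8Array, targetOffset: number): number {
  if (source.byteLength <= 16) {
    for (let i = 0; i < source.byteLength; i++) {
      target[targetOffset + i] = source[i];
    }
  } else {
    // Larger copies still go through .set, which wins for big buffers.
    target.set(source, targetOffset);
  }
  return source.byteLength;
}

// Example: writing a 12-byte ObjectId payload into a serialization buffer at offset 7.
const buffer = new Uint8Array(64);
const objectIdBytes = new Uint8Array(12).fill(0xab);
copySmallBytes(buffer, objectIdBytes, 7);
```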

benchmarks

Release Highlight

Improve the performance of small byte copies

When serializing ObjectId, Decimal128, and UUID values, we can get better performance by writing the byte-copying logic as plain JavaScript for loops rather than using the TypedArray.set API. ObjectId serialization is now 1.5x-2x faster.

Double-check the following

  • Ran npm run check:lint script
  • Self-review completed using the steps outlined here
  • PR title follows the correct format: type(NODE-xxxx)[!]: description
    • Example: feat(NODE-1234)!: rewriting everything in coffeescript
  • Changes are covered by tests
  • New TODOs have a related JIRA ticket

@nbbeeken nbbeeken marked this pull request as ready for review February 26, 2024 16:49
@durran durran self-assigned this Feb 27, 2024
@durran durran added the "Primary Review" (In Review with primary reviewer, not yet ready for team's eyes) label Feb 27, 2024
@durran durran (Member) left a comment

Nice improvement on the 3 types.

@durran durran added the "Team Review" (Needs review from team) label and removed the "Primary Review" label Feb 27, 2024
@durran durran merged commit 24d035e into main Feb 27, 2024
4 checks passed
@durran durran deleted the NODE-5910-oid-perf branch February 27, 2024 17:24
@SeanReece (Contributor) commented

@nbbeeken I love seeing these general performance improvements 🎉 Good work.

I don't have access to the linked Google doc. Could you update the PR with the details from the benchmarks? I'm also curious if we've compared performance in Node 18, 20, and 21.

@nbbeeken (Contributor, Author) commented on Mar 1, 2024

Hey @SeanReece thank you! I appreciate the community taking notice! 😄

First, about Node.js versions: I only tested manually on Node.js v20.11.0, but our CI benchmarks run on v18.16.0 and show similar improvements. This change, and the others we have made, are general JavaScript optimizations that apply even to Node versions older than our current support matrix, so I'm confident we moved the needle in the right direction across the matrix.

And sure! I'll make a note to post benchmark details for this improvement soon. Since this is already merged, I'll compare the library with and without just this change, so the other improvements we've made will still be included in both versions being compared.

@nbbeeken (Contributor, Author) commented on Mar 1, 2024

Microbenchmarks only get us so far, and there's variance in the results to take into account, but I think the message is clear on the serialize front! 🙂

Node.js v20.11.0
OS: linux
CPUs: 8
Arch: x64
RAM: 33.105743872 GB

========== test single ObjectId ==========
bson_deserialize_current_main x 3,525,016 ops/sec ±0.60% (391 runs sampled)
bson_deserialize_reverted_small_bytes_copy x 3,167,018 ops/sec ±3.23% (394 runs sampled)
bson_serialize_current_main x 2,200,393 ops/sec ±0.34% (392 runs sampled)
bson_serialize_reverted_small_bytes_copy x 1,184,140 ops/sec ±2.42% (362 runs sampled)
Fastest to slowest is:
  - serialize on current_main is faster by 60.05%
  - deserialize on current_main is faster by 10.70%

========== test 1000 ObjectId ==========
bson_deserialize_current_main x 5,507 ops/sec ±0.44% (393 runs sampled)
bson_deserialize_reverted_small_bytes_copy x 5,780 ops/sec ±0.18% (395 runs sampled)
bson_serialize_current_main x 8,384 ops/sec ±0.48% (390 runs sampled)
bson_serialize_reverted_small_bytes_copy x 3,254 ops/sec ±0.33% (393 runs sampled)
Fastest to slowest is:
  - serialize on current_main is faster by 88.16%
  - deserialize on reverted_small_bytes_copy is faster by 4.83%

========== test single Decimal128 ==========
bson_deserialize_current_main x 2,095,102 ops/sec ±2.38% (392 runs sampled)
bson_deserialize_reverted_small_bytes_copy x 2,258,844 ops/sec ±0.31% (394 runs sampled)
bson_serialize_current_main x 1,947,324 ops/sec ±0.43% (393 runs sampled)
bson_serialize_reverted_small_bytes_copy x 1,738,469 ops/sec ±0.55% (394 runs sampled)
Fastest to slowest is:
  - serialize on current_main is faster by 11.33%
  - deserialize on reverted_small_bytes_copy is faster by 7.52%

========== test 1000 Decimal128 ==========
bson_deserialize_current_main x 3,154 ops/sec ±0.43% (394 runs sampled)
bson_deserialize_reverted_small_bytes_copy x 3,171 ops/sec ±0.19% (395 runs sampled)
bson_serialize_current_main x 6,381 ops/sec ±1.23% (383 runs sampled)
bson_serialize_reverted_small_bytes_copy x 3,624 ops/sec ±3.11% (394 runs sampled)
Fastest to slowest is:
  - serialize on current_main is faster by 55.11%
  - deserialize on reverted_small_bytes_copy is faster by 0.53%

========== test single UUID ==========
bson_deserialize_current_main x 1,356,950 ops/sec ±3.45% (393 runs sampled)
bson_deserialize_reverted_small_bytes_copy x 1,633,847 ops/sec ±2.66% (393 runs sampled)
bson_serialize_current_main x 1,890,476 ops/sec ±0.43% (393 runs sampled)
bson_serialize_reverted_small_bytes_copy x 1,713,948 ops/sec ±2.49% (384 runs sampled)
Fastest to slowest is:
  - serialize on current_main is faster by 9.80%
  - deserialize on reverted_small_bytes_copy is faster by 18.52%

========== test 1000 UUID ==========
bson_deserialize_current_main x 2,169 ops/sec ±3.22% (356 runs sampled)
bson_deserialize_reverted_small_bytes_copy x 2,282 ops/sec ±2.59% (393 runs sampled)
bson_serialize_current_main x 6,223 ops/sec ±0.32% (390 runs sampled)
bson_serialize_reverted_small_bytes_copy x 5,416 ops/sec ±1.44% (383 runs sampled)
Fastest to slowest is:
  - serialize on current_main is faster by 13.87%
  - deserialize on reverted_small_bytes_copy is faster by 5.06%
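
For anyone who wants to reproduce a similar comparison locally, a minimal sketch using the `benchmark` npm package might look like the following (the package choice, names, and document shape are illustrative assumptions, not the exact harness that produced the numbers above):

```ts
// Minimal ops/sec comparison sketch (assumes `bson`, `benchmark`, and
// `@types/benchmark` are installed); not the harness used for the results above.
import * as BSON from 'bson';
import Benchmark from 'benchmark';

const doc = { _id: new BSON.ObjectId() };
const bytes = BSON.serialize(doc);

new Benchmark.Suite()
  .add('bson_serialize_single_objectid', () => {
    BSON.serialize(doc);
  })
  .add('bson_deserialize_single_objectid', () => {
    BSON.deserialize(bytes);
  })
  .on('cycle', (event: Benchmark.Event) => {
    // Prints lines like "bson_serialize_single_objectid x 2,200,393 ops/sec ±0.34% (392 runs sampled)"
    console.log(String(event.target));
  })
  .run();
```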
