tailsamplingprocessor: Optimize tag mutator memory allocations #27889
Since each `tailSamplingSpanProcessor` instance is not called concurrently by the ticker worker (it's a 1-to-1 relationship), we can safely reuse a slice for the tag mutators used in `makeDecision`. Additionally, the tag mutators themselves were causing a lot of allocations, and since they are static, we created constants for them, preventing allocations on each execution of `makeDecision`.

This improved the `makeDecision` benchmark by ~31%.

```
benchstat old.txt new.txt
name         old time/op  new time/op  delta
Sampling-10  51.8µs ± 1%  35.7µs ± 1%  -30.94%  (p=0.008 n=5+5)
```
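For readers who want to see the shape of the change, here is a minimal sketch of the two techniques described above: a per-instance slice that is reused across calls, and mutators that are built once instead of on every call. It is an illustration under assumptions, not the processor's actual code. The `tagMutator` type, the `processor` struct, and the key/value strings are hypothetical stand-ins; only the `mutatorsBuf` field name and the `makeDecision` name come from this thread.

```go
package main

import "fmt"

// tagMutator is a hypothetical stand-in for the processor's tag mutator type.
type tagMutator struct {
	key, value string
}

// Static mutators are built once at package initialization instead of on
// every call. Go has no struct constants, so package-level variables play
// that role here. The key and value strings are made up for illustration.
var (
	mutatorSampled    = tagMutator{key: "decision", value: "sampled"}
	mutatorNotSampled = tagMutator{key: "decision", value: "not_sampled"}
)

// processor owns a scratch buffer. Reusing it is safe only under the
// assumption stated in the PR: a single ticker worker calls makeDecision
// for a given instance, so there are no concurrent callers.
type processor struct {
	mutatorsBuf []tagMutator
}

// makeDecision truncates the buffer to length zero and appends into it,
// so the backing array is reused across calls rather than reallocated.
func (p *processor) makeDecision(sampled bool) []tagMutator {
	p.mutatorsBuf = p.mutatorsBuf[:0]
	if sampled {
		p.mutatorsBuf = append(p.mutatorsBuf, mutatorSampled)
	} else {
		p.mutatorsBuf = append(p.mutatorsBuf, mutatorNotSampled)
	}
	return p.mutatorsBuf
}

func main() {
	// Preallocate capacity once, when the processor is constructed.
	p := &processor{mutatorsBuf: make([]tagMutator, 0, 1)}
	fmt.Println(p.makeDecision(true))
	fmt.Println(p.makeDecision(false))
}
```

Note the trade-off in this sketch: the slice returned by `makeDecision` is only valid until the next call on the same instance, because the next call overwrites the shared backing array. A struct literal that omits the buffer still works with `append` (a nil slice is fine), but it allocates on first use.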
I followed this live, and I'm good with the changes as long as the CI is passing (I'm not sure the tests were executed).
This looks cool. I jumped into this PR from the YouTube video as well. Nice job.
A test is failing on a place that is related to the changes:
Anyone willing to pick this up? I can probably take a look eventually, but not right now.
…ield mutatorsBuf and panic in CICD. Added this field to all struct in ut.
I appreciate the optimizations made by the original author. The unit test issue only requires a very small change, but since I am unable to push to the original branch, I have opened a separate PR, #28597, which could be merged after #27889. But yes, @brancz, feel free to add those fix lines to the original branch and I will delete the new PR ^_^
I think I was able to add your commits to this PR, @jiekun. Thank you!
Codecov Report: All modified and coverable lines are covered by tests ✅
... and 46 files with indirect coverage changes
…telemetry#27889)

**Description:**
Since each `tailSamplingSpanProcessor` instance is not called concurrently by the ticker worker (it's a 1-to-1 relationship), we can safely reuse a slice for the tag mutators used in `makeDecision`. Additionally, the tag mutators themselves were causing a lot of allocations, and since they are static, we created constants for them, preventing allocations on each execution of `makeDecision`.

This improved the `makeDecision` benchmark by ~31%.

```
benchstat old.txt new.txt
name         old time/op  new time/op  delta
Sampling-10  51.8µs ± 1%  35.7µs ± 1%  -30.94%  (p=0.008 n=5+5)
```

**Testing:** Unit tests unchanged; added a benchmark.

**Documentation:** Perf improvement, so no documentation changes needed.

This was all based on production profiling data at Polar Signals running the collector. Here is a snapshot of the original profiling data we started with: https://pprof.me/52a7fab/

Judging by the production profiling data, a 31% improvement on the `makeDecision` codepath should translate roughly into a 6% baseline CPU improvement in our production deployment of the OpenTelemetry collector.

The profiling data after improving: https://pprof.me/58c0e84/

This improvement was done as part of the Let's Profile Livestream, where we optimize popular open-source projects live: https://www.youtube.com/watch?v=vkMQRjiNTHM

---------

Co-authored-by: Jiekun <[email protected]>
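As a rough sanity check on the 31% to 6% estimate: if `makeDecision` accounts for a fraction $f$ of the collector's total CPU time and its cost drops by about 31%, the overall saving is about $0.31 f$. The share $f$ below is inferred from the two quoted numbers, not read off the linked profiles, so treat it as an approximation.

$$
\Delta_{\text{CPU}} \approx 0.31 \times f \approx 0.06
\quad\Longrightarrow\quad
f \approx \frac{0.06}{0.31} \approx 0.19
$$

In other words, the estimate is consistent with `makeDecision` taking somewhere around a fifth of CPU time in that deployment before the change.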