tailsamplingprocessor: Optimize tag mutator memory allocations #27889

Merged: 5 commits into open-telemetry:main on Oct 26, 2023

Conversation

@brancz (Contributor) commented on Oct 20, 2023

Description:

Since each `tailSamplingSpanProcessor` instance is not called concurrently
by the ticker worker (it's a 1-to-1 relationship), we can safely reuse a
slice for the tag mutators used in `makeDecision`. Additionally, the tag
mutators themselves were causing a lot of allocations; since they are
static, we created constants for them, preventing allocations on each
execution of `makeDecision`.

This improved the `makeDecision` benchmark by ~31%.

```
benchstat old.txt new.txt
name         old time/op  new time/op  delta
Sampling-10  51.8µs ± 1%  35.7µs ± 1%  -30.94%  (p=0.008 n=5+5)
```
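To make the change concrete, here is a minimal, self-contained Go sketch of the two ideas (package-level mutator values plus a reusable per-processor buffer). The type, field, and value names are illustrative stand-ins, not the actual code in this PR.

```go
package main

import "fmt"

// tagMutator stands in for the sampling tag mutators passed to makeDecision;
// in the real processor these come from the internal sampling package.
type tagMutator struct{ key, value string }

// Created once at package level instead of being rebuilt (and heap-allocated)
// on every call. The PR description calls these "constants"; in Go they are
// effectively package-level singletons.
var (
	mutatorSampled    = tagMutator{key: "tail_sampled", value: "true"}
	mutatorNotSampled = tagMutator{key: "tail_sampled", value: "false"}
)

// mutatorsBuf can be reused across calls because each processor instance is
// driven by a single ticker goroutine, so there is no concurrent access.
type tailSamplingSpanProcessor struct {
	mutatorsBuf []tagMutator
}

func newProcessor() *tailSamplingSpanProcessor {
	return &tailSamplingSpanProcessor{
		// Allocate once; makeDecision overwrites element 0 on every call.
		mutatorsBuf: make([]tagMutator, 1),
	}
}

func (p *tailSamplingSpanProcessor) makeDecision(sampled bool) []tagMutator {
	if sampled {
		p.mutatorsBuf[0] = mutatorSampled
	} else {
		p.mutatorsBuf[0] = mutatorNotSampled
	}
	return p.mutatorsBuf
}

func main() {
	p := newProcessor()
	fmt.Println(p.makeDecision(true))  // [{tail_sampled true}]
	fmt.Println(p.makeDecision(false)) // [{tail_sampled false}]
}
```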

Testing: Unit tests unchanged; added a benchmark
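For reference, a benchmark over the illustrative types in the sketch above might look roughly like this; the real benchmark added in this PR drives the actual processor and policies, which is not reproduced here (add `import "testing"` to run it):

```go
// Hypothetical benchmark shape; run with `go test -bench=. -benchmem`
// and compare before/after runs with benchstat, as in the output above.
func BenchmarkMakeDecision(b *testing.B) {
	p := newProcessor()
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		_ = p.makeDecision(i%2 == 0)
	}
}
```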

Documentation: Perf improvement so no documentation changes needed.

This was all based on production profiling data from running the collector at Polar Signals. Here is a snapshot of the original profiling data we started with: https://pprof.me/52a7fab/

Judging by the production profiling data, a 31% improvement on the `makeDecision` code path should translate into roughly a 6% baseline CPU improvement for our production deployment of the OpenTelemetry Collector.

The profiling data after improving: https://pprof.me/58c0e84/

This improvement was done as part of the Let's Profile Livestream where we optimize popular open-source projects live: https://www.youtube.com/watch?v=vkMQRjiNTHM

@brancz brancz requested a review from a team October 20, 2023 15:50
@github-actions github-actions bot added the processor/tailsampling Tail sampling processor label Oct 20, 2023
@jpkrohling (Member) left a comment


I followed this live, and I'm good with the changes as long as the CI is passing (I'm not sure the tests were executed).

https://www.youtube.com/watch?v=vkMQRjiNTHM

@jiekun (Member) commented Oct 21, 2023

This looks cool. I jumped into this PR from the YouTube video as well. Nice job.

@jpkrohling (Member) commented:

A test is failing in a place related to the changes:

```
--- FAIL: TestTraceIntegrity (0.00s)
panic: runtime error: index out of range [0] with length 0 [recovered]
	panic: runtime error: index out of range [0] with length 0

goroutine 15 [running]:
testing.tRunner.func1.2({0x112df00, 0xc00016a5a0})
	/opt/hostedtoolcache/go/1.20.10/x64/src/testing/testing.go:1526 +0x372
testing.tRunner.func1()
	/opt/hostedtoolcache/go/1.20.10/x64/src/testing/testing.go:1529 +0x650
panic({0x112df00, 0xc00016a5a0})
	/opt/hostedtoolcache/go/1.20.10/x64/src/runtime/panic.go:890 +0x263
github.com/open-telemetry/opentelemetry-collector-contrib/processor/tailsamplingprocessor.(*tailSamplingSpanProcessor).makeDecision(0xc000326180, {0x1, 0x2, 0x3, 0x4, 0x0, 0x0, 0x0, 0x0, 0x0, ...}, ...)
	/home/runner/work/opentelemetry-collector-contrib/opentelemetry-collector-contrib/processor/tailsamplingprocessor/processor.go:305 +0x14b9
github.com/open-telemetry/opentelemetry-collector-contrib/processor/tailsamplingprocessor.(*tailSamplingSpanProcessor).samplingPolicyOnTick(0xc000326180)
```

Anyone willing to pick this up? I can probably take a look eventually, but not right now.
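For context, the panic in that trace is the standard runtime error for indexing a nil (zero-length) slice; a tiny standalone reproduction, independent of the processor code itself:

```go
package main

import "fmt"

func main() {
	var mutatorsBuf []string // nil slice, length 0, like an uninitialized buffer
	defer func() {
		// Prints the same runtime error seen in the CI failure.
		if r := recover(); r != nil {
			fmt.Println("recovered:", r) // index out of range [0] with length 0
		}
	}()
	mutatorsBuf[0] = "sampled" // panics: the slice was never allocated
}
```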

…ield mutatorsBuf and panic in CICD. Added this field to all struct in ut.
@jiekun (Member) commented Oct 25, 2023

> A test is failing in a place related to the changes: […panic trace quoted above…]
>
> Anyone willing to pick this up? I can probably take a look eventually, but not right now.

I appreciate the optimizations made by the original author.

The unit test issue only requires a very small change, but since I am unable to submit it on the original branch, I have opened a separate PR #28597, which could be merged after #27889. That said, @brancz, feel free to add those fix lines to the original branch and I will delete the new PR ^_^
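In terms of the illustrative sketch earlier in this thread, the fix amounts to making sure every test-constructed processor initializes the reusable buffer the same way the production constructor now does; the helper name below is hypothetical, not the exact test code:

```go
// Illustrative only, reusing the sketch's types from the PR description above.
// Test helpers that build a tailSamplingSpanProcessor by hand must also
// initialize mutatorsBuf; otherwise makeDecision's write to mutatorsBuf[0]
// panics exactly as in the CI log.
func newTestProcessor() *tailSamplingSpanProcessor {
	return &tailSamplingSpanProcessor{
		mutatorsBuf: make([]tagMutator, 1),
	}
}
```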

@jpkrohling (Member) commented:
I think I was able to add your commits to this PR, @jiekun. Thank you!

codecov bot commented Oct 26, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

| Files | Coverage Δ |
| --- | --- |
| processor/tailsamplingprocessor/processor.go | 90.54% <100.00%> (+0.25%) ⬆️ |

... and 46 files with indirect coverage changes


@jpkrohling jpkrohling merged commit acae6fe into open-telemetry:main Oct 26, 2023
163 of 164 checks passed
@github-actions github-actions bot added this to the next release milestone Oct 26, 2023
sigilioso pushed a commit to carlossscastro/opentelemetry-collector-contrib that referenced this pull request Oct 27, 2023
jmsnll pushed a commit to jmsnll/opentelemetry-collector-contrib that referenced this pull request Nov 12, 2023
RoryCrispin pushed a commit to ClickHouse/opentelemetry-collector-contrib that referenced this pull request Nov 24, 2023
Labels
processor/tailsampling Tail sampling processor
4 participants