Switch to uint64 hash for identifier #886

Merged
merged 4 commits into from
Sep 4, 2024

dashpole
Contributor

@dashpole dashpole commented Aug 27, 2024

goos: linux
goarch: amd64
pkg: github.com/GoogleCloudPlatform/opentelemetry-operations-go/exporter/collector/internal/datapointstorage
cpu: AMD EPYC 7B12
             │ bench2.txt  │             bench3.txt             │
             │   sec/op    │   sec/op     vs base               │
Identifier-2   4.805µ ± 4%   2.539µ ± 0%  -47.16% (p=0.002 n=6)

             │ bench2.txt  │            bench3.txt             │
             │    B/op     │    B/op     vs base               │
Identifier-2   2208.0 ± 0%   632.0 ± 0%  -71.38% (p=0.002 n=6)

             │ bench2.txt │            bench3.txt            │
             │ allocs/op  │ allocs/op   vs base              │
Identifier-2   47.00 ± 0%   44.00 ± 0%  -6.38% (p=0.002 n=6)

I'm still working on testing edge cases to make sure identifiers don't collide.

codecov bot commented Aug 27, 2024

Codecov Report

Attention: Patch coverage is 91.11111% with 4 lines in your changes missing coverage. Please review.

Project coverage is 62.97%. Comparing base (4caace7) to head (7de96dd).
Report is 38 commits behind head on main.

Files with missing lines Patch % Lines
...ctor/internal/normalization/disabled_normalizer.go 0.00% 4 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #886      +/-   ##
==========================================
+ Coverage   61.03%   62.97%   +1.93%     
==========================================
  Files          56       57       +1     
  Lines        5903     6012     +109     
==========================================
+ Hits         3603     3786     +183     
+ Misses       2143     2064      -79     
- Partials      157      162       +5     


@bwplotka

bwplotka commented Aug 29, 2024

FYI: Check out the amazing https://pkg.go.dev/golang.org/x/perf/cmd/benchstat tool for easier comparison of those results 🤗

Also, you can use the following command to (1) save the output to a file, (2) run enough repetitions for benchstat's statistical significance check to work (reliability), (3) customize the duration (1s by default), and (4) customize how many CPUs you want to use:

export bench=bench1 && go test \
    -run '^$' -bench '^BenchmarkIdentifier' \
    -benchtime 10s -count 6 -cpu 2 -benchmem \
    -memprofile=${bench}.mem.pprof -cpuprofile=${bench}.cpu.pprof \
  | tee ${bench}.txt
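
For reference, here is a minimal sketch of the kind of benchmark that command would pick up. Everything in it is an assumption for illustration: `fakeIdentifier` is a hypothetical stand-in for the real function, and the labels are made up. It uses `testing.Benchmark` so that it also runs as a plain program; in a real `_test.go` file the function would be named `BenchmarkIdentifier` so that `-bench '^BenchmarkIdentifier'` matches it.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"testing"
)

// fakeIdentifier is a hypothetical stand-in for the function under test.
// It is NOT the exporter's Identifier; it only gives the benchmark some work.
func fakeIdentifier(name string, labels []string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(name))
	for _, l := range labels {
		h.Write([]byte{0}) // separator between fields
		h.Write([]byte(l))
	}
	return h.Sum64()
}

// sink keeps the compiler from optimizing the call away.
var sink uint64

// benchmarkIdentifier has the standard benchmark shape: set up inputs,
// reset the timer, then run the measured code b.N times.
func benchmarkIdentifier(b *testing.B) {
	labels := []string{"zone", "us-east1", "pod", "frontend"}
	b.ReportAllocs() // report B/op and allocs/op, like -benchmem
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		sink = fakeIdentifier("request_count", labels)
	}
}

func main() {
	res := testing.Benchmark(benchmarkIdentifier)
	fmt.Println(res.String(), res.MemString())
}
```

Running the command above once per code version (for example with `bench=bench2` and then `bench=bench3`) produces the two text files that benchstat compares.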

@@ -212,27 +212,57 @@ func (c *Cache) gc(shutdown <-chan struct{}, tickerCh <-chan time.Time) bool {
}

// Identifier returns the unique string identifier for a metric.
-func Identifier(resource *monitoredrespb.MonitoredResource, extraLabels map[string]string, metric pmetric.Metric, attributes pcommon.Map) string {
-	var b strings.Builder
+func Identifier(resource *monitoredrespb.MonitoredResource, extraLabels map[string]string, metric pmetric.Metric, attributes pcommon.Map) (uint64, error) {

I don't have context on other parts of the OTel Collector code, but storing those in maps is generally not ideal for efficiency, especially as those attribute sets are short enough that you can simply iterate over them to find the items you are looking for.

This is why Prometheus's primary labels code started as an array and is now a tightly optimized, interned and encoded string 🙈 https://github.com/prometheus/prometheus/blob/main/model/labels/labels_dedupelabels.go#L27

Just ideas for the future, if applicable at all.
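
To make the idea concrete, here is a minimal sketch of labels kept as a sorted slice and looked up by plain iteration. The `Label` and `Labels` types and the `FromMap` and `Get` helpers are hypothetical names for illustration; this is not the Prometheus implementation and not code from this PR.

```go
package main

import (
	"fmt"
	"sort"
)

// Label is one name/value pair (hypothetical type for illustration).
type Label struct {
	Name, Value string
}

// Labels is a slice of pairs kept sorted by name, so iteration order is
// deterministic and no per-call key sorting is needed when hashing.
type Labels []Label

// FromMap converts a map into a sorted Labels slice.
func FromMap(m map[string]string) Labels {
	ls := make(Labels, 0, len(m))
	for k, v := range m {
		ls = append(ls, Label{Name: k, Value: v})
	}
	sort.Slice(ls, func(i, j int) bool { return ls[i].Name < ls[j].Name })
	return ls
}

// Get scans the slice linearly; for a handful of labels this is cheap.
func (ls Labels) Get(name string) (string, bool) {
	for _, l := range ls {
		if l.Name == name {
			return l.Value, true
		}
	}
	return "", false
}

func main() {
	ls := FromMap(map[string]string{"zone": "us-east1", "pod": "frontend"})
	v, ok := ls.Get("zone")
	fmt.Println(v, ok)
}
```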

@bwplotka bwplotka left a comment

This is a solid start 👍🏽 Good work!

@dashpole
Contributor Author

dashpole commented Sep 3, 2024

added delimiters to make sure we don't get collisions between the different maps
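
A rough sketch of why the delimiters matter, under stated assumptions: `hashMaps` is a hypothetical helper, FNV-1a from the standard library stands in for whatever hash the PR actually uses, and the separator bytes `0xff` and `0xfe` are picked only because they cannot occur in valid UTF-8 strings. Without separators, different splits of the same bytes across keys, values, and maps feed an identical byte stream to the hash.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// hashMaps hashes several label maps into one uint64 (hypothetical helper).
// When delimit is true, a separator byte follows every key, every value,
// and every map, so field boundaries become part of the hashed stream.
func hashMaps(delimit bool, maps ...map[string]string) uint64 {
	h := fnv.New64a()
	for _, m := range maps {
		// Sort keys so map iteration order does not change the hash.
		keys := make([]string, 0, len(m))
		for k := range m {
			keys = append(keys, k)
		}
		sort.Strings(keys)
		for _, k := range keys {
			h.Write([]byte(k))
			if delimit {
				h.Write([]byte{0xff}) // end of key
			}
			h.Write([]byte(m[k]))
			if delimit {
				h.Write([]byte{0xff}) // end of value
			}
		}
		if delimit {
			h.Write([]byte{0xfe}) // end of map
		}
	}
	return h.Sum64()
}

func main() {
	first := []map[string]string{{"ab": "c"}, {}}
	second := []map[string]string{{}, {"a": "bc"}}

	fmt.Println("undelimited equal:",
		hashMaps(false, first...) == hashMaps(false, second...))
	fmt.Println("delimited equal:",
		hashMaps(true, first...) == hashMaps(true, second...))
}
```

The undelimited comparison is true because both inputs serialize to the same bytes `abc`. With delimiters the byte streams differ, so the hashes should differ as well, short of a genuine 64-bit hash collision.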

@dashpole dashpole marked this pull request as ready for review September 3, 2024 19:59
@dashpole dashpole requested a review from a team as a code owner September 3, 2024 19:59
@dashpole dashpole requested a review from damemi September 3, 2024 19:59
@dashpole dashpole merged commit 0f06c76 into GoogleCloudPlatform:main Sep 4, 2024
29 checks passed
@dashpole dashpole deleted the optimize_identifier branch September 4, 2024 01:21
3 participants