Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unmarshal identical dedicatedColumns fields less often #3952

Merged
merged 3 commits into from
Aug 9, 2024

Conversation

zalegrala
Copy link
Contributor

@zalegrala zalegrala commented Aug 9, 2024

What this PR does:

Here we cache the unmarshaled DedicatedColumns in a package level map variable which is keyed by the bytes that were used to unmarshal.

  • This works because we are not dealing with maps, which are unordered. Otherwise the json should be in a consistent order.
  • This has the potential to grow unbounded, but there are not an unbounded set of dedicated columns, so this is probably fine.
  • The string of the json is used as the map key, which could be large.

A reasonable follow up might be to do the same kind of caching, but for each dedicated column. This would mean that we use the cached set of columns first, and if that isn't available, then use the cache for a single column.

Which issue(s) this PR fixes:
Fixes #

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@zalegrala
Copy link
Contributor Author

goos: linux
goarch: amd64
pkg: github.com/grafana/tempo/tempodb/backend
cpu: AMD Ryzen 7 PRO 7840U w/ Radeon 780M Graphics
                  │  main.txt   │             next.txt              │
                  │   sec/op    │   sec/op     vs base              │
IndexMarshal-16     174.5µ ± 7%   171.5µ ± 6%       ~ (p=0.878 n=8)
IndexUnmarshal-16   41.32µ ± 2%   38.73µ ± 4%  -6.25% (p=0.000 n=8)
geomean             84.91µ        81.51µ       -4.01%

                  │   main.txt   │              next.txt              │
                  │     B/op     │     B/op      vs base              │
IndexMarshal-16     1.033Mi ± 0%   1.033Mi ± 0%       ~ (p=0.234 n=8)
IndexUnmarshal-16   47.59Ki ± 0%   45.75Ki ± 0%  -3.87% (p=0.000 n=8)
geomean             224.4Ki        220.0Ki       -1.93%

                  │  main.txt   │             next.txt              │
                  │  allocs/op  │ allocs/op   vs base               │
IndexMarshal-16      68.00 ± 1%   69.00 ± 1%        ~ (p=0.619 n=8)
IndexUnmarshal-16   109.00 ± 1%   63.00 ± 0%  -42.20% (p=0.000 n=8)
geomean              86.09        65.93       -23.42%

tempodb/backend/block_meta.go Show resolved Hide resolved
import "sync"

var (
dedicatedColumnsKeeper = map[string]*DedicatedColumns{}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree there's nothing better than a global var b/c of the way Unmarshal/Marshal works, but i'd like this data to have a lifecycle.

Should we add a "ClearDedicatedColumnsCache()" (or whatever) method that wipes this map called right before polling?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call. I put this in the defer of the poller.Do so that we clear the memory when we're done with polling. I can see putting it at the beginning instead, but then we'd hold the memory even after we're done polling. Could do both I suppose, but this is probably okay.

@zalegrala zalegrala merged commit 50631af into grafana:main Aug 9, 2024
16 checks passed
@zalegrala zalegrala deleted the dedicatedColumnsKeeper branch August 9, 2024 20:10
@joe-elliott joe-elliott mentioned this pull request Sep 3, 2024
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants