reduce cost of pipeline stages by pooling maps we create for labels #11484
Conversation
Signed-off-by: Callum Styan <[email protected]>
Trivy scan found the following vulnerabilities:
Benchmark for latest commit. Additional pooling of the map at the beginning of
pkg/logql/log/fmt.go
Outdated

```diff
@@ -380,7 +385,12 @@ func (lf *LabelsFormatter) Process(ts int64, l []byte, lbs *LabelsBuilder) ([]by
 	lf.currentLine = l
 	lf.currentTs = ts

-	var data interface{}
+	var data = stringMapPool.Get().(map[string]string)
+	clear(data)
```
(nit) Would it make a difference if the clear() is done in the defer?
Unless I am misunderstanding how some of this works under the hood, it seems like unfortunately the text template execution is using references to the data in the map, so if we clear at the exit of this function rather than when we first grab something from the pool the tests will fail.
I've modified things so that there's a wrapper struct for the pool that calls clear
for us when we grab something from the pool, just to make things cleaner.
Can you share a couple profiles showing this? (probably best to export the pprof files from pyroscope and attach here instead of linking, for the sake of the community)
pkg/logql/log/labels.go
Outdated

```diff
@@ -437,6 +438,32 @@ func (b *LabelsBuilder) UnsortedLabels(buf labels.Labels, categories ...LabelCat
 	return buf
 }

+var stringMapPool = sync.Pool{
```
Would it make sense to wrap this in a specific type so we don't have to typecast when we decide to use it?
funnily enough, I did exactly this already but forgot to push it
This seems like a cool idea, in general. I've left a comment with a potential QOL improvement for someone using it.
remove mistakenly committed files (benchmarking related) from #11484 Signed-off-by: Callum Styan <[email protected]>
reverts pooling that causes a panic for concurrent map writes. Signed-off-by: Callum Styan <[email protected]>
…(grafana#11484)

If we look at profiling of ingesters in some of our production deployments during which we have high ingester load from querying, we see that we spend roughly 15-20% of our CPU time on GC-related work.

Looking at different `runtime` memory-related calls, it looks like ~50% of all CPU time during Pipeline stage processing is the creation of the map for labels that is then passed to text template parsing. This change makes the pipeline processing itself ~15% faster and should also slightly reduce the impact of GC in heavy query load situations.

```
goos: linux
goarch: amd64
pkg: github.com/grafana/loki/pkg/logql/log
cpu: AMD Ryzen 9 5950X 16-Core Processor

                                     │     base     │            mapPool             │
                                     │    sec/op    │    sec/op     vs base          │
_Pipeline/pipeline_bytes-32            7.749µ ± ∞ ¹   6.464µ ± ∞ ¹  ~ (p=1.000 n=1) ²
_Pipeline/pipeline_string-32           7.691µ ± ∞ ¹   6.549µ ± ∞ ¹  ~ (p=1.000 n=1) ²
_Pipeline/line_extractor_bytes-32      7.922µ ± ∞ ¹   6.882µ ± ∞ ¹  ~ (p=1.000 n=1) ²
_Pipeline/line_extractor_string-32     7.900µ ± ∞ ¹   6.878µ ± ∞ ¹  ~ (p=1.000 n=1) ²
_Pipeline/label_extractor_bytes-32     7.952µ ± ∞ ¹   6.976µ ± ∞ ¹  ~ (p=1.000 n=1) ²
_Pipeline/label_extractor_string-32    7.940µ ± ∞ ¹   7.102µ ± ∞ ¹  ~ (p=1.000 n=1) ²
geomean                                7.858µ         6.805µ        -13.41%
¹ need >= 6 samples for confidence interval at level 0.95
² need >= 4 samples to detect a difference at alpha level 0.05

                                     │     base      │            mapPool              │
                                     │     B/op      │     B/op       vs base          │
_Pipeline/pipeline_bytes-32            4.963Ki ± ∞ ¹   1.413Ki ± ∞ ¹  ~ (p=1.000 n=1) ²
_Pipeline/pipeline_string-32           5.025Ki ± ∞ ¹   1.476Ki ± ∞ ¹  ~ (p=1.000 n=1) ²
_Pipeline/line_extractor_bytes-32      5.030Ki ± ∞ ¹   1.478Ki ± ∞ ¹  ~ (p=1.000 n=1) ²
_Pipeline/line_extractor_string-32     5.028Ki ± ∞ ¹   1.478Ki ± ∞ ¹  ~ (p=1.000 n=1) ²
_Pipeline/label_extractor_bytes-32     5.027Ki ± ∞ ¹   1.478Ki ± ∞ ¹  ~ (p=1.000 n=1) ²
_Pipeline/label_extractor_string-32    5.027Ki ± ∞ ¹   1.478Ki ± ∞ ¹  ~ (p=1.000 n=1) ²
geomean                                5.017Ki         1.466Ki        -70.77%
¹ need >= 6 samples for confidence interval at level 0.95
² need >= 4 samples to detect a difference at alpha level 0.05

                                     │    base     │            mapPool            │
                                     │  allocs/op  │  allocs/op   vs base          │
_Pipeline/pipeline_bytes-32            37.00 ± ∞ ¹   35.00 ± ∞ ¹  ~ (p=1.000 n=1) ²
_Pipeline/pipeline_string-32           38.00 ± ∞ ¹   36.00 ± ∞ ¹  ~ (p=1.000 n=1) ²
_Pipeline/line_extractor_bytes-32      37.00 ± ∞ ¹   35.00 ± ∞ ¹  ~ (p=1.000 n=1) ²
_Pipeline/line_extractor_string-32     37.00 ± ∞ ¹   35.00 ± ∞ ¹  ~ (p=1.000 n=1) ²
_Pipeline/label_extractor_bytes-32     37.00 ± ∞ ¹   35.00 ± ∞ ¹  ~ (p=1.000 n=1) ²
_Pipeline/label_extractor_string-32    37.00 ± ∞ ¹   35.00 ± ∞ ¹  ~ (p=1.000 n=1) ²
geomean                                37.16         35.16        -5.38%
¹ need >= 6 samples for confidence interval at level 0.95
² need >= 4 samples to detect a difference at alpha level 0.05
```

Signed-off-by: Callum Styan <[email protected]>