Global sampling service telemetry logging configuration options #4554

rmfitzpatrick · 2021-12-14T16:00:54Z

Is your feature request related to a problem? Please describe.
Some components use warn- and info-level statements on paths that can be hit frequently, like the batch processor when configured with a small sending queue (opentelemetry-collector/processor/batchprocessor/batch_processor.go, line 185 at commit 9d3a8a4):

    bp.logger.Warn("Sender failed", zap.Error(err))

This can lead to collectors logging hundreds or thousands of messages a second, much of which is redundant information.

Describe the solution you'd like
The ServiceTelemetryLogs config should provide fields for zap's sampling initial and thereafter settings, used when instantiating all component loggers, with defaults equivalent to disabling sampling:

    sampling_initial: 1
    sampling_thereafter: 1
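To make the proposed defaults concrete, the sketch below models the rule these two fields would feed into zap's sampler (zapcore.NewSamplerWithOptions takes a tick plus first and thereafter counts): for each level and message within one tick, the first initial entries are written, then only every thereafter-th one. This is a stdlib-only Go model, not zap's code; the sampler type and the emitted helper are hypothetical names used for illustration.

```go
package main

import "fmt"

// sampler is a hypothetical, stdlib-only model of zap-style sampling for a
// single level+message key within one tick. It is not zap's implementation.
type sampler struct {
	initial    int // sampling_initial: entries always written per tick
	thereafter int // sampling_thereafter: afterwards, write every Nth entry
	count      int // entries seen so far in the current tick
}

// allow reports whether the next entry should be written.
func (s *sampler) allow() bool {
	s.count++
	if s.count <= s.initial {
		return true
	}
	// In this model, thereafter == 0 drops everything after the initial entries.
	return s.thereafter > 0 && (s.count-s.initial)%s.thereafter == 0
}

// emitted counts how many of n identical entries survive sampling.
func emitted(initial, thereafter, n int) int {
	s := &sampler{initial: initial, thereafter: thereafter}
	kept := 0
	for i := 0; i < n; i++ {
		if s.allow() {
			kept++
		}
	}
	return kept
}

func main() {
	fmt.Println(emitted(1, 1, 1000))    // proposed defaults: 1000, nothing dropped
	fmt.Println(emitted(10, 100, 1000)) // first 10, then every 100th: 19
}
```

With initial and thereafter both set to 1, every entry passes, which is why those values amount to disabling sampling.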

Describe alternatives you've considered
Given how components use their zap loggers, I'm not aware of an alternative sampling approach within the collector process. The only other courses I see are setting an arbitrarily high log level, which means missing important statements, or demoting valid warning/info scenarios to a lower log level, which has the same effect (or no effect if the configured log level is low enough).

Additional context
Would potentially resolve #1061

tigrannajaryan · 2021-12-14T16:08:41Z

I think this is a reasonable request, but I would like to better understand why we are logging so many messages per second.

This can lead to collectors logging hundreds or thousands of messages a second, much of which is redundant information.

This is a bit unexpected to me. We use exponential backoff in senders, so presumably after a few failures to send, the rate of failures should slow down a lot. Is this not happening? What message is logged thousands of times a second?

Also, the queued_retry component itself uses a sampled logger, precisely to avoid logging too much. Is the batch processor the guilty component?

rmfitzpatrick · 2021-12-14T16:14:34Z

On a quick pass I don't see where the batch processor uses a sampling config, and right, it is flooding the logs when extended export failures occur:

2021-12-14T15:26:49.497Z	warn	batchprocessor/batch_processor.go:185	Sender failed	{"kind": "processor", "name": "batch", "error": "sending_queue is full"}
2021-12-14T15:26:49.508Z	warn	batchprocessor/batch_processor.go:185	Sender failed	{"kind": "processor", "name": "batch", "error": "sending_queue is full"}
2021-12-14T15:26:49.525Z	warn	batchprocessor/batch_processor.go:185	Sender failed	{"kind": "processor", "name": "batch", "error": "sending_queue is full"}
2021-12-14T15:26:49.588Z	warn	batchprocessor/batch_processor.go:185	Sender failed	{"kind": "processor", "name": "batch", "error": "sending_queue is full"}
2021-12-14T15:26:49.597Z	warn	batchprocessor/batch_processor.go:185	Sender failed	{"kind": "processor", "name": "batch", "error": "sending_queue is full"}
2021-12-14T15:26:49.614Z	warn	batchprocessor/batch_processor.go:185	Sender failed	{"kind": "processor", "name": "batch", "error": "sending_queue is full"}
2021-12-14T15:26:49.640Z	warn	batchprocessor/batch_processor.go:185	Sender failed	{"kind": "processor", "name": "batch", "error": "sending_queue is full"}
2021-12-14T15:26:49.649Z	warn	batchprocessor/batch_processor.go:185	Sender failed	{"kind": "processor", "name": "batch", "error": "sending_queue is full"}
2021-12-14T15:26:49.673Z	warn	batchprocessor/batch_processor.go:185	Sender failed	{"kind": "processor", "name": "batch", "error": "sending_queue is full"}
2021-12-14T15:26:49.689Z	warn	batchprocessor/batch_processor.go:185	Sender failed	{"kind": "processor", "name": "batch", "error": "sending_queue is full"}
2021-12-14T15:26:49.707Z	warn	batchprocessor/batch_processor.go:185	Sender failed	{"kind": "processor", "name": "batch", "error": "sending_queue is full"}
2021-12-14T15:26:49.771Z	warn	batchprocessor/batch_processor.go:185	Sender failed	{"kind": "processor", "name": "batch", "error": "sending_queue is full"}
2021-12-14T15:26:49.809Z	warn	batchprocessor/batch_processor.go:185	Sender failed	{"kind": "processor", "name": "batch", "error": "sending_queue is full"}
2021-12-14T15:26:49.854Z	warn	batchprocessor/batch_processor.go:185	Sender failed	{"kind": "processor", "name": "batch", "error": "sending_queue is full"}
2021-12-14T15:26:49.872Z	warn	batchprocessor/batch_processor.go:185	Sender failed	{"kind": "processor", "name": "batch", "error": "sending_queue is full"}
2021-12-14T15:26:49.883Z	warn	batchprocessor/batch_processor.go:185	Sender failed	{"kind": "processor", "name": "batch", "error": "sending_queue is full"}
2021-12-14T15:26:49.886Z	warn	batchprocessor/batch_processor.go:185	Sender failed	{"kind": "processor", "name": "batch", "error": "sending_queue is full"}
2021-12-14T15:26:49.925Z	warn	batchprocessor/batch_processor.go:185	Sender failed	{"kind": "processor", "name": "batch", "error": "sending_queue is full"}
2021-12-14T15:26:49.935Z	warn	batchprocessor/batch_processor.go:185	Sender failed	{"kind": "processor", "name": "batch", "error": "sending_queue is full"}
2021-12-14T15:26:49.943Z	warn	batchprocessor/batch_processor.go:185	Sender failed	{"kind": "processor", "name": "batch", "error": "sending_queue is full"}
2021-12-14T15:26:49.974Z	warn	batchprocessor/batch_processor.go:185	Sender failed	{"kind": "processor", "name": "batch", "error": "sending_queue is full"}

There may be other issues with the exporter helpers as well and I will open another issue if I determine what it is.

tigrannajaryan · 2021-12-14T16:26:34Z

On a quick pass I'm not seeing where the batch processor uses a sampling config

AFAIK, it doesn't. queued_retry does.

I think it is a good idea to have general sampling for all logs (configurable, at least).

Ideally I would prefer that components are given 2 loggers: one to use during startup which will not be sampled (or will have higher sampling thresholds) to make sure all critical messages during startup are printed, and another logger to be used after startup to make sure long-running processes don't generate huge volumes of logs.
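The two-logger idea can be sketched as a pair of loggers sharing one output, where only the runtime logger is wrapped with sampling. All types here (logger, sink, sampledLogger, componentLoggers) are hypothetical and stdlib-only, and the tick window is omitted for brevity; with zap, the runtime logger could presumably be derived from the startup one via logger.WithOptions(zap.WrapCore(...)), wrapping the core in zapcore.NewSamplerWithOptions.

```go
package main

import "fmt"

// logger is a hypothetical minimal interface for this sketch; collector
// components actually receive *zap.Logger values.
type logger interface {
	Warn(msg string)
}

// sink records written lines so the sketch can be checked.
type sink struct{ lines []string }

func (s *sink) Warn(msg string) { s.lines = append(s.lines, msg) }

// sampledLogger keeps the first `initial` entries per message, then every
// `thereafter`-th one (no tick window, for brevity).
type sampledLogger struct {
	next       logger
	initial    int
	thereafter int
	counts     map[string]int
}

func (s *sampledLogger) Warn(msg string) {
	s.counts[msg]++
	n := s.counts[msg]
	if n <= s.initial || (s.thereafter > 0 && (n-s.initial)%s.thereafter == 0) {
		s.next.Warn(msg)
	}
}

// componentLoggers is the hypothetical pair: Startup is never sampled so
// critical startup messages always appear; Runtime is sampled.
type componentLoggers struct {
	Startup logger
	Runtime logger
}

func newComponentLoggers(out logger, initial, thereafter int) componentLoggers {
	return componentLoggers{
		Startup: out,
		Runtime: &sampledLogger{next: out, initial: initial, thereafter: thereafter, counts: map[string]int{}},
	}
}

func main() {
	out := &sink{}
	l := newComponentLoggers(out, 1, 100)
	for i := 0; i < 3; i++ {
		l.Startup.Warn("startup warning") // all 3 are written
	}
	for i := 0; i < 250; i++ {
		l.Runtime.Warn("Sender failed") // only entries 1, 101 and 201 are written
	}
	fmt.Println(len(out.lines)) // 6
}
```

Keeping the unsampled logger for startup means a misconfiguration warning is never dropped, while a long-running failure loop is bounded by the sampled logger.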

bogdandrutu mentioned this issue Jan 5, 2022

[Proposal] Move batching to exporterhelper #4646

Open

bogdandrutu added a commit to bogdandrutu/opentelemetry-collector that referenced this issue Oct 25, 2022

Allow to configure sampling config for logs

8fccc4a

Fixes open-telemetry#4554 Signed-off-by: Bogdan <[email protected]>

bogdandrutu mentioned this issue Oct 25, 2022

Allow to configure sampling config for logs #6404

Merged

bogdandrutu added a commit to bogdandrutu/opentelemetry-collector that referenced this issue Nov 7, 2022

Allow to configure sampling config for logs

276c1c9

Fixes open-telemetry#4554 Signed-off-by: Bogdan <[email protected]>

bogdandrutu closed this as completed in #6404 Nov 7, 2022

bogdandrutu added a commit that referenced this issue Nov 7, 2022

Allow to configure sampling config for logs (#6404)

2b03abd

Fixes #4554 Signed-off-by: Bogdan <[email protected]> Signed-off-by: Bogdan <[email protected]>

dmitryax mentioned this issue Sep 8, 2023

[telemetrySettting] Create sampled Logger #8134

Merged
