Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(sdk-trace-base): add diagnostic logging when spans are dropped #3610

Merged
merged 3 commits into from
Feb 22, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,7 @@ For experimental package changes, see the [experimental CHANGELOG](experimental/
* feat(sdk-metrics): apply binary search in histogram recording [#3539](https://github.com/open-telemetry/opentelemetry-js/pull/3539) @legendecas
* perf(propagator-jaeger): improve deserializeSpanContext performance [#3541](https://github.com/open-telemetry/opentelemetry-js/pull/3541) @doochik
* feat: support TraceState in SamplingResult [#3530](https://github.com/open-telemetry/opentelemetry-js/pull/3530) @raphael-theriault-swi
* feat(sdk-trace-base): add diagnostic logging when spans are dropped [#3610](https://github.com/open-telemetry/opentelemetry-js/pull/3610) @neoeinstein

### :bug: (Bug Fix)

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -44,6 +44,7 @@ export abstract class BatchSpanProcessorBase<T extends BufferConfig>
private _finishedSpans: ReadableSpan[] = [];
private _timer: NodeJS.Timeout | undefined;
private _shutdownOnce: BindOnceFuture<void>;
private _droppedSpansCount: number = 0;

constructor(private readonly _exporter: SpanExporter, config?: T) {
const env = getEnv();
Expand Down Expand Up @@ -117,8 +118,23 @@ export abstract class BatchSpanProcessorBase<T extends BufferConfig>
private _addToBuffer(span: ReadableSpan) {
if (this._finishedSpans.length >= this._maxQueueSize) {
// limit reached, drop span

if (this._droppedSpansCount === 0) {
diag.debug('maxQueueSize reached, dropping spans');
}
this._droppedSpansCount++;
Comment on lines +122 to +125
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be very chatty in a system overloaded enough to be dropping spans. Do you think it's needed when the warn log on line 132 is already doing this job?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For a given stretch of dropped spans, this message will only be reported once, just like the summary warning, so should have the same cardinality as that one. This debug diagnostic can be used to identify at what time those drops started. It's recorded as debug, which should be able to be filtered out if debug diagnostics aren't interesting or desired.

I found it useful as there are two knobs that can be adjusted to prevent dropping spans: queue size and the export interval. The warning message makes it easier to adjust the queue size, but the debug message can make it apparent how often one should be exporting if they don't want to carry a larger memory load associated with a larger queue.

I'm not particularly attached to this message, so if we want it removed, that's fine with me.


return;
}

if (this._droppedSpansCount > 0) {
// some spans were dropped, log once with count of spans dropped
diag.warn(
`Dropped ${this._droppedSpansCount} spans because maxQueueSize reached`
);
this._droppedSpansCount = 0;
}

this._finishedSpans.push(span);
this._maybeStartTimer();
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -462,6 +462,40 @@ describe('BatchSpanProcessorBase', () => {
}
assert.equal(processor['_finishedSpans'].length, 6);
});
it('should count and report dropped spans', done => {
const debugStub = sinon.spy(diag, 'debug');
const warnStub = sinon.spy(diag, 'warn');
const span = createSampledSpan('test');
for (let i = 0, j = 6; i < j; i++) {
processor.onStart(span, ROOT_CONTEXT);
processor.onEnd(span);
}
assert.equal(processor['_finishedSpans'].length, 6);
assert.equal(processor['_droppedSpansCount'], 0);
sinon.assert.notCalled(debugStub);

processor.onStart(span, ROOT_CONTEXT);
processor.onEnd(span);

assert.equal(processor['_finishedSpans'].length, 6);
assert.equal(processor['_droppedSpansCount'], 1);
sinon.assert.calledOnce(debugStub);

processor.forceFlush().then(() => {
processor.onStart(span, ROOT_CONTEXT);
processor.onEnd(span);

assert.equal(processor['_finishedSpans'].length, 1);
assert.equal(processor['_droppedSpansCount'], 0);

sinon.assert.calledOnceWithExactly(
warnStub,
'Dropped 1 spans because maxQueueSize reached'
);

done();
});
});
});
});

Expand Down