FIX1958: Error code 400 when input trace is too large (WIP) #2612
Conversation
@@ -186,7 +186,7 @@ func (i *instance) push(ctx context.Context, id, traceBytes []byte) error {
 	prevSize := int(i.traceSizes[tkn])
 	reqSize := len(traceBytes)
 	if prevSize+reqSize > maxBytes {
-		return status.Errorf(codes.FailedPrecondition, (newTraceTooLargeError(id, i.instanceID, maxBytes, reqSize).Error()))
+		return status.Errorf(codes.InvalidArgument, (newTraceTooLargeError(id, i.instanceID, maxBytes, reqSize).Error()))
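For context, a minimal sketch (not part of the PR) of what this change does on the wire: `status.Errorf` attaches a canonical gRPC code that a client recovers with `status.FromError`, and the diff only swaps FailedPrecondition (9) for InvalidArgument (3); both are gRPC codes, not HTTP statuses.

```go
package main

import (
	"fmt"

	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

func main() {
	// Build the same kind of error the ingester returns above
	// (message shortened here for illustration).
	err := status.Errorf(codes.InvalidArgument, "TRACE_TOO_LARGE: max size of trace exceeded")

	// A gRPC client recovers the canonical code from the error.
	st, _ := status.FromError(err)
	fmt.Println(st.Code())                        // InvalidArgument
	fmt.Println(uint32(codes.InvalidArgument))    // 3
	fmt.Println(uint32(codes.FailedPrecondition)) // 9
}
```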
Have you tested this? Does this really cause the distributor to return a 400?
This is WIP. Updated the title.
How do I test it? I have created a small Spring Boot app and configured it with OTel to send traces to a Tempo instance (the image is based on my local Tempo codebase).
I face two issues:
- I am not able to configure max_bytes_per_trace via overrides. When I do, Tempo doesn't start, even though it is a valid config parameter per the official documentation (see the sketch after this list). In the main config I have:

      overrides:
        per_tenant_override_config: /etc/overrides.yaml

  In overrides.yaml, I have:

      overrides:
        max_bytes_per_trace: 500

- As a workaround, I hardcoded this limit in instance.go. But that stops even other requests from being accepted (I see some requests of around 10064 bytes at repeated intervals; I think they are compaction-related activity).
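One guess at the startup failure (a sketch based on how the Cortex-family runtime override files are usually laid out; the tenant ID and value are examples): the per-tenant override file keys each override block by tenant ID, so a flat max_bytes_per_trace directly under overrides may fail to parse.

```yaml
# tempo.yaml (relevant section only)
overrides:
  per_tenant_override_config: /etc/overrides.yaml
```

```yaml
# /etc/overrides.yaml — note the tenant-ID level; "single-tenant" is
# the default tenant when auth is disabled, if I remember correctly.
overrides:
  "single-tenant":
    max_bytes_per_trace: 500
```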
> How do I test it?

I would try this: https://github.com/equinix-labs/otel-cli
Hopefully it gives you the info you need. If not, with some small modification you can likely dump it.
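For instance, something like the following should push an oversized trace at a local OTLP endpoint (a sketch; the flag names are from my reading of the otel-cli README, so verify with `otel-cli span --help`):

```sh
# Send one span padded with a large attribute so the trace exceeds a
# small max_bytes_per_trace (endpoint and sizes are examples).
otel-cli span \
  --endpoint grpc://localhost:4317 \
  --service test-app \
  --name big-span \
  --attrs "padding=$(head -c 2048 /dev/zero | tr '\0' 'x')"
```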
In the collector logs I see:

    2023-07-12 19:05:03 2023-07-12T13:35:03.418Z error exporterhelper/queued_retry.go:391 Exporting failed. The error is not retryable. Dropping data. {"kind": "exporter", "data_type": "traces", "name": "otlp", "error": "Permanent error: rpc error: code = InvalidArgument desc = TRACE_TOO_LARGE: max size of trace (500) exceeded while adding 5024 bytes to trace 64aeac0154073ef25bea20d464cc57d3 for tenant single-tenant", "dropped_items": 4}
Seems good to me.
Is there a way to turn off compaction?
Perhaps try something like:

    compactor:
      compaction:
        max_block_bytes: 1
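If I understand max_block_bytes correctly, a 1-byte cap means any compacted output block would exceed the limit, so the compactor should stop combining blocks; it would not stop the ingester from cutting and flushing its own blocks, though, so some background traffic may remain (my assumption, not verified).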
In the collector logs, I see an error message containing 'The error is not retryable. Dropping data' no matter which code I use in the ingester, FailedPrecondition or InvalidArgument.
Right, I'm afraid the error is not propagating correctly. This change may be a lot more involved than initially thought.
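That both codes behave identically is consistent with the OTLP spec, as far as I can tell: it defines a fixed set of gRPC codes as retryable, and FailedPrecondition and InvalidArgument both fall outside it, so the collector drops the data either way. A rough sketch of that classification (my reading of the spec, not the collector's actual source):

```go
package main

import (
	"fmt"

	"google.golang.org/grpc/codes"
)

// retryable lists the gRPC codes the OTLP spec treats as transient;
// everything else is permanent and the exporter drops the data.
var retryable = map[codes.Code]bool{
	codes.Canceled:          true,
	codes.DeadlineExceeded:  true,
	codes.ResourceExhausted: true, // per the spec, only with RetryInfo attached
	codes.Aborted:           true,
	codes.OutOfRange:        true,
	codes.Unavailable:       true,
	codes.DataLoss:          true,
}

func main() {
	for _, c := range []codes.Code{codes.FailedPrecondition, codes.InvalidArgument} {
		fmt.Printf("%v retryable: %v\n", c, retryable[c]) // both false
	}
}
```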
If I explicitly set code 400,

    return status.Errorf(400, (newTraceTooLargeError(id, i.instanceID, maxBytes, reqSize).Error()))

then the log is:

    2023-07-19T12:33:30.773Z error exporterhelper/queued_retry.go:391 Exporting failed. The error is not retryable. Dropping data. {"kind": "exporter", "data_type": "traces", "name": "otlp", "error": "Permanent error: rpc error: code = Code(400) desc = TRACE_TOO_LARGE: max size of trace (700) exceeded while adding 5024 bytes to trace 64b7d81a3295b734efbe7a6a1c9d4f60 for tenant single-tenant", "dropped_items": 4}

Maybe this helps.
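The Code(400) in that log fits how the Cortex-family services pass HTTP statuses over gRPC: the weaveworks/common httpgrpc helpers (which Tempo pulls in, if I'm not mistaken) store the HTTP status directly as the gRPC status code. A minimal sketch, assuming that package is available:

```go
package main

import (
	"fmt"

	"github.com/weaveworks/common/httpgrpc"
)

func main() {
	// httpgrpc.Errorf stores the HTTP status (here 400) as the gRPC
	// status code, so httpgrpc-aware intermediaries can translate it
	// back into an HTTP response.
	err := httpgrpc.Errorf(400, "TRACE_TOO_LARGE: max size of trace exceeded")
	fmt.Println(err) // prints: rpc error: code = Code(400) desc = TRACE_TOO_LARGE: ...
}
```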
This PR has been automatically marked as stale because it has not had any activity in the past 60 days.
Return status code 400 when the input trace is too large.
Currently, the status code returned for traces larger than the permissible size is 500, which makes clients retry. To avoid this, return status code 400 instead.
Fixes #1958
Checklist
- CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]