Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FIX1958: Error code 400 when input trace is too large(WIP) #2612

Closed
wants to merge 3 commits into from

Conversation

mghildiy
Copy link
Contributor

@mghildiy mghildiy commented Jul 5, 2023

Return status code 400 when input trace is too large:

Currently, status code returned for traces larger than permissible size is 500, which makes client to retry. To avoid this, return status code 400 instead.:
Fixes #1958

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@@ -186,7 +186,7 @@ func (i *instance) push(ctx context.Context, id, traceBytes []byte) error {
prevSize := int(i.traceSizes[tkn])
reqSize := len(traceBytes)
if prevSize+reqSize > maxBytes {
return status.Errorf(codes.FailedPrecondition, (newTraceTooLargeError(id, i.instanceID, maxBytes, reqSize).Error()))
return status.Errorf(codes.InvalidArgument, (newTraceTooLargeError(id, i.instanceID, maxBytes, reqSize).Error()))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

have you tested this. does this really cause the distributor to return a 400?

Copy link
Contributor Author

@mghildiy mghildiy Jul 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is WIP. Updated the title.

Copy link
Contributor Author

@mghildiy mghildiy Jul 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do I test it? I have created a small spring boot app, and configured it with otel to send traces to tempo instance(image based on my local tempo codebase).
I face 2 issues:

  1. I am not able to configure max_bytes_per_trace for overrides. When I do it, tempo doesnt start, though its a valid config parameter as per official documentation.
overrides:
  per_tenant_override_config: /etc/overrides.yaml

In overrides.yaml, I have:

overrides:
  max_bytes_per_trace: 500
  1. As a workaround, I hardcode this config in instance.go. But this stops even other requests from being accepted(I see some requests with bytes size around 10064 at repeated intervals. I think they are compaction related activity)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do I test it?

I would try this: https://github.com/equinix-labs/otel-cli
Hopefully it gives you the info you need. If not with some small modification you can likely dump it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In collector logs I see:

023-07-12 19:05:03 2023-07-12T13:35:03.418Zerrorexporterhelper/queued_retry.go:391Exporting failed. The error is not retryable. Dropping data.{"kind": "exporter", "data_type": "traces", "name": "otlp", "error": "Permanent error: rpc error: code = InvalidArgument desc = TRACE_TOO_LARGE: max size of trace (500) exceeded while adding 5024 bytes to trace 64aeac0154073ef25bea20d464cc57d3 for tenant single-tenant", "dropped_items": 4}

Seems good to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to turn off compaction?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps try something like:

compactor:
  compaction:
    max_block_bytes: 1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In collector logs, I see error message containing 'The error is not retryable. Dropping data' no matter which code I use in ingester, FailedPrecondition OR InvalidArgument.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right, i'm afraid the error is not propagating correctly. this change may be a lot more involved then initially thought.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I explicitly set code 400,

return status.Errorf(400, (newTraceTooLargeError(id, i.instanceID, maxBytes, reqSize).Error()))

then log is:

2023-07-19T12:33:30.773Zerrorexporterhelper/queued_retry.go:391Exporting failed. The error is not retryable. Dropping data.{"kind": "exporter", "data_type": "traces", "name": "otlp", "error": "Permanent error: rpc error: code = Code(400) desc = TRACE_TOO_LARGE: max size of trace (700) exceeded while adding 5024 bytes to trace 64b7d81a3295b734efbe7a6a1c9d4f60 for tenant single-tenant", "dropped_items": 4}

Maybe this helps.

@mghildiy mghildiy changed the title FIX1958: Error code 400 when input trace is too large FIX1958: Error code 400 when input trace is too large(WIP) Jul 6, 2023
@github-actions
Copy link
Contributor

This PR has been automatically marked as stale because it has not had any activity in the past 60 days.
The next time this stale check runs, the stale label will be removed if there is new activity. This pull request will be closed in 15 days if there is no new activity.
Please apply keepalive label to exempt this Pull Request.

@github-actions github-actions bot added the stale Used for stale issues / PRs label Sep 18, 2023
@github-actions github-actions bot closed this Oct 3, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stale Used for stale issues / PRs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Fix distributor http status codes
2 participants