Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(otlp-exporter-base)!: ensure we do not retry after the timeout has elapsed #4889

Conversation

pichlermarc
Copy link
Member

@pichlermarc pichlermarc commented Jul 31, 2024

Which problem is this PR solving?

This is a fix for an unreleased bug (therefore skipping changelog). While working on a follow up for the web exporters, I realized that the RetryingTransport I introduced in #4743 it does not correctly handle timeouts. Instead of computing remaining time it does not take care of timeouts at all, meaning that consecutive send calls can actually take too long when retrying. This changes the transport interface so that we can set a timeout for each send call, and computes the remaining time to pass to the underlying send call.

Example (correct behavior):

  • call send with a timeout of 1000ms, retrying transport calls send on the underlying transport, with timeout of 1000ms
  • transport returns 503, in 10ms, says to retry-after 500ms
  • now, the actual time left for the retry is 1000ms (original timeout) - 500ms (the time that was spent on waiting) - 10ms (time of the original call) = 490ms
  • transport again returns 503, in 10ms, says to retry-after 500ms
  • we should not retry as we only have 480ms left, but the sever tells us to wait for 500ms, which will be after our timeout.

Example (current behavior):

  • call send with a timeout of 1000ms, retrying transport calls send on the underlying transport, with timeout of 1000ms
  • transport returns 503, in 10ms, says to retry-after 500ms
  • wait 500ms, try again with a timeout of 1000ms
  • transport again returns 503, in 10ms, says to retry-after 500ms
  • wait 500ms, try again with a timeout of 1000ms (now we've actually exceeded our timeout already)
  • this carries on until we reached max timeouts

Enables #4895

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

How Has This Been Tested?

  • Existing tests
  • Added unit test

@pichlermarc pichlermarc requested a review from a team July 31, 2024 12:24
@pichlermarc pichlermarc added bug Something isn't working Skip Changelog labels Jul 31, 2024
@pichlermarc pichlermarc added the target:next-minor-release This PR is in scope for the next minor release (`main` branch) label Aug 7, 2024
@pichlermarc pichlermarc added this pull request to the merge queue Aug 9, 2024
Merged via the queue into open-telemetry:main with commit 5cc3dc2 Aug 9, 2024
19 checks passed
@pichlermarc pichlermarc deleted the fix/no-retry-after-deadline branch August 9, 2024 07:35
Zirak pushed a commit to Zirak/opentelemetry-js that referenced this pull request Sep 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working Skip Changelog target:next-minor-release This PR is in scope for the next minor release (`main` branch)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants