Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Try to understand and fix test_shutdown_wait_last_export #4042

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

LarsMichelsen
Copy link

Description

TL;DR The time assert in the test was wrong.

Looking at the original commit (c84ba9) of this test, the delay of the mocked backoff was 4 seconds fixed with 6 iterations. So the time until shutdown returned was around 24 seconds. The default timeout of the lock wait in shutdown did not hit before the retry of the exporter was over.

The line self.assertGreaterEqual(now, (start_time + 30 / 1000)) seemed to be wrong from the beginning. It just ensured that the shutdown call took at least 30ms. I assume it was supposed to do something else. But to me it's not clear what exactly it should do.

One explanation would be that it was intended to do self.assertGreaterEqual(now, (start_time + 30 * 1000)) instead. But that would mean: Shutdown should need at least 30 seconds to pass, which does not make sense imho.

Another option would be that it was intended to do self.assertLessEqual(now, (start_time + 30 * 1000)) instead. That would mean: Shutdown should need at maximum 30 seconds to pass, which actually could make more sense. It would ensure that the call of self.exporter.shutdown returns after 30 seconds.

My biggest issue here is, that it tests the happy path where the retry is over before the timeout while there is no mechanism to ensure that the exporter thread is stopped in a sensible time in case it's current exponential backoff is in a longer sleep already while the shutdown flag is set. If I understand it correctly, it can hit the beginning of a 32 second sleep, which could mean that a process needs a) 30 seconds to signal the shutdown and additional 32 seconds to finish the exporter thread. But that's something for another change.

After all these words, looking at the current HEAD, the situation of the test case is basically the same as originally. The retry interval has been changed recently to be way shorter. The default lock timeout stayed the same at 30 seconds, but that does not matter too much since the retry mechanism is finished long before that. And the aforementioned assert is still wrong, with some additional problem from #4014.

Fixes the issue mentioned here: #4014 (comment)

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration

  • Run the changed tests

Does This PR Require a Contrib Repo Change?

  • Yes. - Link to PR:
  • No.

Checklist:

  • Followed the style guidelines of this project
  • Changelogs have been updated
  • Unit tests have been added
  • Documentation has been updated

TL;DR The time assert in the test was wrong.

Looking at the original commit (c84ba9) of this test, the delay of the
mocked backoff was 4 seconds fixed with 6 iterations. So the time until
shutdown returned was around 24 seconds. The default timeout of the lock
wait in shutdown did not hit before the retry of the exporter was over.

The line `self.assertGreaterEqual(now, (start_time + 30 / 1000))` seemed
to be wrong from the beginning. It just ensured that the shutdown call
took at least 30ms. I assume it was supposed to do something else. But
to me it's not clear what exactly it should do.

One explantion would be that it was intended to do
`self.assertGreaterEqual(now, (start_time + 30 * 1000))` instead. But
that would mean: Shutdown should need at least 30 seconds to pass, which
does not make sense imho.

Another option would be that it was intended to do
`self.assertLessEqual(now, (start_time + 30 * 1000))` instead. That
would mean: Shutdown should need at maximum 30 seconds to pass, which
actually could make more sense. It would ensure that the call of
`self.exporter.shutdown` returns after 30 seconds.

My biggest issue here is, that it tests the happy path where the retry
is over before the timeout while there is no mechanism to ensure that the
exporter thread is stopped in a sensible time in case it's current
exponential backoff is in a longer sleep already while the shutdown flag
is set. If I understand it correctly, it can hit the beginning of a 32
second sleep, which could mean that a process needs a) 30 seconds to
signal the shutdown and additional 32 seconds to finish the exporter
thread. But that's something for another change.

After all these words, looking at the current HEAD, the situation of the
test case is basically the same as originally. The retry interval has
been changed recently to be way shorter. The default lock timeout stayed
the same at 30 seconds, but that does not matter too much since the
retry mechanism is finished long before that. And the aforementioned
assert is still wrong, with some additional problem from open-telemetry#4014.
@LarsMichelsen LarsMichelsen requested a review from a team July 10, 2024 20:09
Copy link

linux-foundation-easycla bot commented Jul 10, 2024

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: LarsMichelsen / name: Lars (acf919c)
  • ✅ login: emdneto / name: Emídio Neto (93bf0ee, b29c0a1)

@xrmx xrmx added the Skip Changelog PRs that do not require a CHANGELOG.md entry label Jul 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Skip Changelog PRs that do not require a CHANGELOG.md entry
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants