Test Failure: disaster_recovery_2 #390

Closed
linh2931 opened this issue Jul 22, 2024 · 3 comments · Fixed by #426, #428, #484, #486 or #494
Labels: 👍 lgtm, OCI (Work exclusive to OCI team), test-instability (tag issues for flaky tests, high priority to address)

Comments

@linh2931
Member

https://github.com/AntelopeIO/spring/actions/runs/10048012787/job/27771514855?pr=382#step:4:1595

Traceback (most recent call last):
  File "/__w/spring/spring/build/tests/disaster_recovery_2.py", line 140, in <module>
    assert node.waitForLibToAdvance(), "Node did not advance LIB after relaunch"

@heifner
Member

heifner commented Jul 23, 2024

The test restarts 5 nodes from a snapshot. It failed because node0 could not connect to any node when it was launched, so it waited 30 seconds before trying again. During those 30 seconds it happened to be producing blocks. Once it was able to connect, it was busy feeding those blocks to the other node.
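A rough model of why the default wait was too tight: assuming the LIB wait starts at launch and each failed connection attempt costs one full reconnect interval, the time left for syncing and LIB advancement is the wait timeout minus the time spent waiting to reconnect. The function name and this linear model are illustrative assumptions, not code from the test harness, and the 90-second value is hypothetical.

```python
def sync_budget_seconds(wait_timeout: float, reconnect_interval: float, failed_attempts: int) -> float:
    """Hypothetical model: seconds left for syncing/LIB advancement after reconnecting.

    Assumes the LIB wait starts at launch and each failed connection attempt
    delays the next one by a full reconnect interval. Returns 0.0 if the wait
    expires before a connection can be made.
    """
    remaining = wait_timeout - failed_attempts * reconnect_interval
    return max(0.0, float(remaining))


if __name__ == "__main__":
    # Defaults mentioned in this issue: 30 s LIB wait, 30 s between reconnect attempts.
    print(sync_budget_seconds(30, 30, 1))  # one failed attempt leaves no time at all
    # A hypothetical 90 s wait leaves 60 s after one failed attempt.
    print(sync_budget_seconds(90, 30, 1))
```

Under this model, any failed first connection consumes the entire default wait, which matches the failure described above.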

Options:

  • Modify the test to wait longer than the default 30 seconds for LIB to advance. Note that the default interval between reconnect attempts is also 30 seconds.
  • Modify net_plugin to attempt connections more often than every 30 seconds at launch, then back off to the configured connection-monitor interval after a number of connection attempts.
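The second option amounts to a two-phase retry schedule: a few quick attempts right after launch, then a fallback to the configured connection-monitor interval. This is a sketch of the schedule only. net_plugin itself is C++, and the parameter names (`fast_delay`, `fast_attempts`, `monitor_interval`) and values here are hypothetical, not actual net_plugin options.

```python
from itertools import islice
from typing import Iterator


def reconnect_delays(fast_delay: float, fast_attempts: int, monitor_interval: float) -> Iterator[float]:
    """Yield the delay (seconds) before each reconnect attempt.

    Hypothetical schedule: `fast_attempts` retries spaced `fast_delay` apart,
    then retries every `monitor_interval` seconds indefinitely.
    """
    for _ in range(fast_attempts):
        yield fast_delay
    while True:
        yield monitor_interval


if __name__ == "__main__":
    # e.g. five 1-second retries, then back off to a 30-second monitor interval
    schedule = list(islice(reconnect_delays(1.0, 5, 30.0), 7))
    print(schedule)  # [1.0, 1.0, 1.0, 1.0, 1.0, 30.0, 30.0]
```

Keeping the long interval as the steady state preserves the existing behavior for nodes whose peers are genuinely down, while shortening the window in which a freshly launched node runs isolated.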

arhag added the 👍 lgtm and test-instability labels and removed the triage label Jul 26, 2024
arhag added this to the Savanna: Production-Ready milestone Jul 26, 2024
heifner self-assigned this Jul 29, 2024
heifner added the OCI label Jul 29, 2024
heifner added a commit that referenced this issue Jul 29, 2024
[1.0-beta4] Test: Provide more time for connections to be established
heifner added a commit that referenced this issue Jul 29, 2024
[1.0-beta4 -> main] Test: Provide more time for connections to be established
@spoonincode
Member

appears to still exist?
https://github.com/AntelopeIO/spring/actions/runs/10272637950/job/28425737014
reopening for now but if felt strongly can open new one instead

spoonincode reopened this Aug 6, 2024
heifner added a commit that referenced this issue Aug 6, 2024
heifner linked a pull request Aug 6, 2024 that will close this issue
heifner added a commit that referenced this issue Aug 7, 2024
[1.0-beta4] P2P: Fix handling of known pending block
heifner added a commit that referenced this issue Aug 7, 2024
heifner added a commit that referenced this issue Aug 7, 2024
[1.0-beta4 -> main] P2P: Fix handling of known pending block
heifner added a commit that referenced this issue Aug 7, 2024
heifner added a commit that referenced this issue Aug 7, 2024
[5.0] P2P: Fix handling of known pending block
heifner added a commit that referenced this issue Aug 7, 2024
heifner added a commit that referenced this issue Aug 7, 2024
[5.0 -> main] P2P: Fix handling of known pending block
@spoonincode
Member

noticed again on
https://github.com/AntelopeIO/spring/actions/runs/10376337986/job/28728623008
though on a UBSAN build, which of course runs on the slow side. Should we reopen this one or open a new issue?
