Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test_multi_threaded_executor timer_over_take is flaky due to timer jitter #1008

Closed
brawner opened this issue Mar 1, 2020 · 4 comments · Fixed by #1628
Closed

test_multi_threaded_executor timer_over_take is flaky due to timer jitter #1008

brawner opened this issue Mar 1, 2020 · 4 comments · Fixed by #1628
Assignees
Labels
Linux Linux support macOS macOS support tests Failing or missing tests Windows Windows support

Comments

@brawner
Copy link
Contributor

brawner commented Mar 1, 2020

This seems to be a reoccurring flaky test, as a test introduced in #383 has been relaxed a couple of times (#501, and #907), but the test may still fail on different platforms. The measured period between timer callbacks is supposed to be 0.1s, but the linux test result below shows that the period could be less than half of that. While the test is not meant to measure timer jitter, the results suggest that the timer jitter cannot be reliably bounded.

See:
https://ci.ros2.org/view/nightly/job/nightly_linux_repeated/1780/testReport/(root)/projectroot/test_multi_threaded_executor/
https://ci.ros2.org/view/nightly/job/nightly_osx_repeated/1866/testReport/(root)/projectroot/test_multi_threaded_executor/
https://ci.ros2.org/view/nightly/job/nightly_windows-container_repeated/6/testReport/(root)/projectroot/test_multi_threaded_executor/

@ros-discourse
Copy link

This issue has been mentioned on ROS Discourse. There might be relevant details there:

https://discourse.ros.org/t/ros-2-tsc-meeting-minutes-2020-03-18/13313/1

@chapulina chapulina added Linux Linux support macOS macOS support tests Failing or missing tests Windows Windows support labels Nov 4, 2020
@chapulina
Copy link
Contributor

chapulina commented Feb 22, 2021

I think this may be a duplicate of #751.

This test is still flaky one year later. Here's the failure

[ RUN      ] TestMultiThreadedExecutor.timer_over_take
/Users/osrf/jenkins-agent/workspace/nightly_osx_debug/ws/src/ros2/rclcpp/rclcpp/test/rclcpp/executors/test_multi_threaded_executor.cpp:91: Failure
Expected: (diff) > (PERIOD - TOLERANCE), actual: 0.732793 vs 0.75
[  FAILED  ] TestMultiThreadedExecutor.timer_over_take (4201 ms)

On #751, there was a suggestion to "remove the upper bound condition on the test", which I think may refer exactly to the offending expectation, and @clalancette agreed at the time (2 years ago!) that it was a good idea.

ASSERT_GT(diff, PERIOD - TOLERANCE);


https://github.com/osrf/buildfarmer/issues/161

@clalancette
Copy link
Contributor

Actually, I don't think we have the upper bound anymore. I'm going to close out #751.

When I've looked into this in the past, it does look like a real bug in the multithreaded implementation. That is, we are setting up a timer to expire once a second. We should reasonably expect that timer to execute no more than once a second. However, these failures show that sometimes callbacks happen much earlier than that, even down to 0.5 seconds. So something is obviously wrong there, but it is a hard one to debug.

@ivanpauno
Copy link
Member

There was some investigation done here.
#1628 will (hopefully) fix the issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Linux Linux support macOS macOS support tests Failing or missing tests Windows Windows support
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants