test_multi_threaded_executor timer_over_take is flaky due to timer jitter #1008

brawner · 2020-03-01T00:44:33Z

This seems to be a recurring flaky test: the test introduced in #383 has been relaxed a couple of times (#501 and #907), but it may still fail on different platforms. The measured period between timer callbacks is supposed to be 0.1s, but the Linux test result below shows that the period can be less than half of that. While the test is not meant to measure timer jitter, the results suggest that the timer jitter cannot be reliably bounded.

See:
https://ci.ros2.org/view/nightly/job/nightly_linux_repeated/1780/testReport/(root)/projectroot/test_multi_threaded_executor/
https://ci.ros2.org/view/nightly/job/nightly_osx_repeated/1866/testReport/(root)/projectroot/test_multi_threaded_executor/
https://ci.ros2.org/view/nightly/job/nightly_windows-container_repeated/6/testReport/(root)/projectroot/test_multi_threaded_executor/

ros-discourse · 2020-03-20T20:40:51Z

This issue has been mentioned on ROS Discourse. There might be relevant details there:

https://discourse.ros.org/t/ros-2-tsc-meeting-minutes-2020-03-18/13313/1

chapulina · 2021-02-22T22:08:36Z

I think this may be a duplicate of #751.

This test is still flaky one year later. Here's the failure:

[ RUN      ] TestMultiThreadedExecutor.timer_over_take
/Users/osrf/jenkins-agent/workspace/nightly_osx_debug/ws/src/ros2/rclcpp/rclcpp/test/rclcpp/executors/test_multi_threaded_executor.cpp:91: Failure
Expected: (diff) > (PERIOD - TOLERANCE), actual: 0.732793 vs 0.75
[  FAILED  ] TestMultiThreadedExecutor.timer_over_take (4201 ms)

In #751, there was a suggestion to "remove the upper bound condition on the test", which I think may refer to exactly the offending expectation below, and @clalancette agreed at the time (2 years ago!) that it was a good idea.

rclcpp/rclcpp/test/rclcpp/executors/test_multi_threaded_executor.cpp

Line 91 in c2a75f0

ASSERT_GT(diff, PERIOD - TOLERANCE);

https://github.com/osrf/buildfarmer/issues/161

clalancette · 2021-02-22T22:57:38Z

Actually, I don't think we have the upper bound anymore. I'm going to close out #751.

When I've looked into this in the past, it does look like a real bug in the multithreaded implementation. That is, we are setting up a timer to expire once a second. We should reasonably expect that timer to execute no more than once a second. However, these failures show that sometimes callbacks happen much earlier than that, even down to 0.5 seconds. So something is obviously wrong there, but it is a hard one to debug.
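To make that expectation concrete, here is a minimal standalone sketch of the lower-bound check. It is not the rclcpp test itself: `first_early_callback` is a hypothetical helper, and the values `period = 1.0` and `tolerance = 0.25` are assumptions inferred from the "once a second" description and the `0.75` threshold in the failure log. It applies the same comparison as `ASSERT_GT(diff, PERIOD - TOLERANCE)` to a list of callback timestamps.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of rclcpp): scans callback timestamps, in
// seconds, and returns the index of the first callback that fired too early,
// i.e. whose gap from the previous callback is not greater than
// period - tolerance. Returns -1 if every gap passes the check.
std::ptrdiff_t first_early_callback(
  const std::vector<double> & stamps_sec, double period, double tolerance)
{
  for (std::size_t i = 1; i < stamps_sec.size(); ++i) {
    const double diff = stamps_sec[i] - stamps_sec[i - 1];
    // Same lower-bound comparison as ASSERT_GT(diff, PERIOD - TOLERANCE).
    if (!(diff > period - tolerance)) {
      return static_cast<std::ptrdiff_t>(i);
    }
  }
  return -1;
}
```

For example, with made-up timestamps `{0.0, 1.0, 1.732793, 2.75}`, a period of 1.0 and a tolerance of 0.25, the helper returns 2, because the gap before that callback is about 0.732793 s, which is below the 0.75 s threshold, just like the failure reported above.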

ivanpauno · 2021-04-12T21:24:22Z

There was some investigation done here.
#1628 will (hopefully) fix the issue.

claireyywang assigned gonzodepedro Mar 12, 2020

chapulina added the Linux (Linux support), macOS (macOS support), tests (Failing or missing tests), and Windows (Windows support) labels Nov 4, 2020

clalancette mentioned this issue Mar 30, 2021

test_multithreaded_executor is flaky #1552

Closed

This was referenced Apr 6, 2021

Mark test_multi_threaded_executor as xfail #1624

Closed

Use a different implementation of mutex two priorities #1628

Merged

ivanpauno closed this as completed in #1628 Apr 13, 2021

ivanpauno mentioned this issue Apr 16, 2021

Foxy: backport mutex_two_priorities #1516 #1636

Merged
