Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NetworkClock: unblock all waiting threads if a network reset is detected #2494

Merged
merged 1 commit into from
Mar 8, 2021

Conversation

Giulero
Copy link
Contributor

@Giulero Giulero commented Feb 12, 2021

This PR implements the suggestions of @traversaro (see #800 (comment)) to handle the time reset.

This fix should avoid the phenomena of threads stopping and resuming after the time at which the clock was previously reset has passed.

@Giulero Giulero requested a review from drdanz as a code owner February 12, 2021 11:10
@update-docs

This comment has been minimized.

@traversaro
Copy link
Member

Thanks @Giulero ! Can you add a file in https://github.com/robotology/yarp/tree/yarp-3.4/doc/release/yarp_3_4 to document this change? Thanks!

Copy link
Member

@traversaro traversaro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some improvements suggestions.

@traversaro
Copy link
Member

@xEnVrE if you use the YARP_CLOCK for Gazebo simulations, I guess you could be interested also in this.

@Giulero
Copy link
Contributor Author

Giulero commented Feb 12, 2021

Thanks @traversaro for the suggestions!

@drdanz
Copy link
Member

drdanz commented Feb 15, 2021

I have a very small concern about this patch.

If, for example, someone defines a clock, similar to what @PeterBowman did in #2488 (comment) which is based on a non-monotonic clock, and the clock adjusts of a few microseconds, this patch will notice that _time < oldTime, and will unlock all threads. This might lead to unexpected behaviours, which may be very hard to debug and to reproduce.

I'd rather keep the thread blocked, and update the wake up time in this way: waiter_it->first = _time + (waiter_it->first - oldTime).

@Giulero and @traversaro what do you think?

@traversaro
Copy link
Member

I think this make sense @drdanz ! @Giulero could you try this modification?

@Giulero
Copy link
Contributor Author

Giulero commented Feb 16, 2021

I agree! I'll try the modification.

@Giulero
Copy link
Contributor Author

Giulero commented Feb 19, 2021

@drdanz @traversaro I tried the modification but maybe I did something wrong.

Correct me if I'm wrong. From what I understood, a delay (of 1) is requested (waiter_it->first evolves in a quantized fashion) every time the condition waiter_it->first - _time < 1E-12 is fulfilled, checking which threads are sleeping.
If a reset occurs, waiter_if->first remains blocked on the last value (for example 100) until _time (starting now from 0) reaches 100.

Updating waiter_it->first = _time + (waiter_it->first - oldTime) will close the gap between waiter_it->first and the "new" _time when the reset occurs. However, waiter_it->first will grow by a delta=_time - oldTime every cycle (not in a quantized way as before) The condition waiter_it->first - _time < 1E-12 will not be met since waiter_it->first is delayed always by 1 w.r.t _time.

Even if it does not work, I commit the modification (f216964), since I probably just misunderstood the suggestion.

@traversaro
Copy link
Member

Even if it does not work, I commit the modification (f216964), since I probably just misunderstood the suggestion.

Should't we simply modify first if a network reset is detected, and leave it as it is in the rest of the cycles?

@Giulero
Copy link
Contributor Author

Giulero commented Feb 19, 2021

Should't we simply modify first if a network reset is detected, and leave it as it is in the rest of the cycles?

I think the same! Even if it seems that the condition clockResetDetected is just moved somewhere else, in this way.
In any case, we should define what network reset means. _time < oldTime doesn't work for non-monotonic clocks. Maybe something more generic as abs(_time - oldTime) > eps?

I also tried to modify first if a reset is detected.
What I checked is that first is modified when a reset is detected, but the cycle after goes back to the previous value (jumping from 0 to 100, for instance). I guess that another process is setting the value of first and we are modifying it just in that scope.

src/libYARP_os/src/yarp/os/NetworkClock.cpp Outdated Show resolved Hide resolved
src/libYARP_os/src/yarp/os/NetworkClock.cpp Outdated Show resolved Hide resolved
Copy link
Member

@drdanz drdanz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Giulero Friendly ping 😜
I think this patch does not require much work in order to be merged, please do not forget about this.

src/libYARP_os/src/yarp/os/NetworkClock.cpp Outdated Show resolved Hide resolved

#### `os`

* If in a `yarp::os::NetworkClock` a clock reset is detected, unblock all the threads that are blocked waiting on a `yarp::os::NetworkClock::delay` call on that network clock. A network clock reset is defined a jump in the past of the time published by the network clock port. This avoid that if `YARP_CLOCK` is set, `yarp::os::PeriodicThread` silently stops to run if the network clock is reset.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The description here should also be updated...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @drdanz :)
Done in ab17f0f.

@Giulero Giulero temporarily deployed to code-analysis March 7, 2021 11:46 Inactive
@Giulero Giulero temporarily deployed to code-analysis March 7, 2021 11:46 Inactive
@Giulero Giulero temporarily deployed to code-analysis March 7, 2021 11:47 Inactive
@Giulero Giulero temporarily deployed to code-analysis March 7, 2021 12:01 Inactive
@Giulero Giulero temporarily deployed to code-analysis March 7, 2021 12:01 Inactive
@Giulero Giulero temporarily deployed to code-analysis March 7, 2021 12:01 Inactive
@Giulero
Copy link
Contributor Author

Giulero commented Mar 7, 2021

@drdanz sorry for the delay! Your suggestions implemented in 97ce64d.

@drdanz drdanz added the PR Status: Continuous Integration - OK Continuous Integration for this PR passed (invalid if commits were added or modified after this) label Mar 8, 2021
Copy link
Member

@drdanz drdanz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM Now, thanks.

@drdanz

This comment has been minimized.

@drdanz drdanz temporarily deployed to code-analysis March 8, 2021 13:13 Inactive
@drdanz drdanz temporarily deployed to code-analysis March 8, 2021 13:13 Inactive
@drdanz drdanz temporarily deployed to code-analysis March 8, 2021 13:13 Inactive
… reset is detected

Co-authored-by: Daniele E. Domenichelli <[email protected]>
@drdanz drdanz merged commit 285401f into robotology:yarp-3.4 Mar 8, 2021
@drdanz
Copy link
Member

drdanz commented Mar 8, 2021

Merged, thanks.

@drdanz drdanz temporarily deployed to code-analysis March 8, 2021 13:16 Inactive
@drdanz drdanz temporarily deployed to code-analysis March 8, 2021 13:16 Inactive
@drdanz drdanz temporarily deployed to code-analysis March 8, 2021 13:17 Inactive
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component: Library - YARP_os PR Status: Continuous Integration - OK Continuous Integration for this PR passed (invalid if commits were added or modified after this) PR Type: Bugfix This PR fixes some bug Target: YARP v3.4.4
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants