Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Critical synchronization bug results into going back in time #48

Open
Tracked by #3
bruno-f-cruz opened this issue May 8, 2024 · 6 comments
Open
Tracked by #3
Assignees
Labels
bug Something isn't working

Comments

@bruno-f-cruz
Copy link
Member

bruno-f-cruz commented May 8, 2024

Consider a rig where a FLIR camera is being externally triggered by an harp behavior board using DO0. When looking at the difference between the timestamps of adjacent frames and triggers, I get the following plot:

image

As you can see, while the camera operates at a steady 8.3ms per frame, two frames in >400k seem to have a lower-than-expected timestamp.

  • Set framerate matches the average frame rate of the camera (120fps)
  • The first frame is emitted AFTER the start trigger and the last before the stop trigger event.
  • The number of frames and number of triggers match.

The two "faulty" frames look like:

Frame 187419 has a timestamp difference of 0.00669 s
                FrameAcquired MessageType     Diff
Time                                              
2992896.986816           True       EVENT 0.008352
2992896.995136           True       EVENT 0.008320
2992897.001824           True       EVENT 0.006688
2992897.010176           True       EVENT 0.008352


Frame 341022 has a timestamp difference of 0.00621 s
                FrameAcquired MessageType     Diff
Time                                              
2994176.990112           True       EVENT 0.008352
2994176.998432           True       EVENT 0.008320
2994177.004640           True       EVENT 0.006208
2994177.012960           True       EVENT 0.008320

One import point, is that the bug seems to happen around the second roll-over so I am wondering if this is a problem related with the clock synchronization. It is kinda weird that according to the camera the period remains stable but not according to the behavior board. I suspect this might be because the PWM task is running on the MCU clock and not on the Harp clock. Once the clock synchronizes (and corrects the harp clock) it results in a mis-timestamped trigger as seen by the non-linear harp clock. While this seems like an ok behavior in general if the correction of the harp timestamp is small, it seems weird that such a big gap is found (2ms) between the timestamped reported by the camera and the one reported by harp.

@bruno-f-cruz
Copy link
Member Author

Updating this.
Interestingly enough this seems to be a problem with the clock source (TimestampGeneratorGen3). The same problem is present in other registers of the same device (e.g. Analog Data)

image

But also in other devices (Olfactometer) that were acquiring data in the same experiment:
image

@bruno-f-cruz bruno-f-cruz transferred this issue from harp-tech/device.behavior May 10, 2024
@bruno-f-cruz bruno-f-cruz changed the title Potential timestamping bug in Camera0Frame register Critical synchronization bug results into going back in time May 10, 2024
@bruno-f-cruz
Copy link
Member Author

More data on this bug

Other sessions produce a rather weird bug where the jump in time is actually closer to 1 full second:

                0      diff
Time                       
2487628.957536  1  0.008352
2487628.965888  1  0.008320
2487628.974208  1  0.008352
2487628.982560  1  0.008320
2487628.990880  1 -0.974720
2487628.016160  1  0.991392
2487629.007552  1  0.008320
2487629.015872  1  0.008352

I am not sure if this is the same bug or a mixture of two bugs (the one described above + non atomic update of the two time keeping registers).

@bruno-f-cruz
Copy link
Member Author

@filcarv suggested connecting a clock generator and a subordinate board. Measure both synch led outputs. Unplug the synch cable and check how long it takes to drift the heartbeat.

@glopesdev
Copy link
Collaborator

@bruno-f-cruz @filcarv We also just found this on long-term data collection epochs. We've seen two different asymmetrical patterns, but essentially same features as @bruno-f-cruz described.

It's interesting that this was not present in our earlier datasets. We used to have a rare bug where PWM rise times would update non-atomically but this would manifest as a jump forward in time, followed by a jump backwards in time of the same amount. It's more concerning if we have now gaps which are non-symmetric.

@Poofjunior
Copy link

Poofjunior commented Sep 12, 2024

Here's the Pico strategy for periodic behavior.
harp_sync_drawing_reference

Heartbeats are scheduled 1s into the future relative to the current Harp Time whole second. If Harp time is additionally updated within that interval, it is ignored. When the interval has elapsed, a message is issued with the current Harp time, which may be more up-to-date than when the heartbeat event was scheduled.

Note

local system time is monotonic, but Harp time is not necessarily monotonic.

@Poofjunior
Copy link

Poofjunior commented Sep 12, 2024

Updated image:
image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

4 participants