-
-
Notifications
You must be signed in to change notification settings - Fork 142
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(subscriber): use monotonic Instant
s for all timestamps
#288
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
## Motivation Currently, the `console-subscriber` crate records all timestamps as `SystemTime`s. This is because they are eventually sent over the wire as protobuf `Timestamp`s, which can be constructed from a `SystemTime`. They cannot be constructed from `Instant`s, because `Instant` is opaque, and does not expose access to the underlying OS time. However, using `SystemTime` is not really correct for our use case. We only use timestamps for calculating durations; we only have to serialize them because some durations are calculated in the console UI rather than in-process. We *don't* need timestamps that are globally consistent with a shared timebase, but we *do* need monotonicity --- using `SystemTime` leaves us vulnerable to clock skew, if (for example), an NTP clock skew adjustment causes the system clock to run backwards far enough that a poll appears to end "before" it started (as in issue #286). If we were using monotonic `Instant`s, all polls should always have positive durations, but with `SystemTime`s, this isn't necessarily the case. Furthermore, `Instant::now()` may have less performance overhead than `SystemTime::now()`, at least on some platforms. ## Solution This branch changes `console-subscriber` to always take timestamps using `Instant::now()` rather than using `SystemTime::now()`, and store all timestamps as `Instant`s. In order to convert these `Instant`s into `SystemTime`s that can be sent over the wire, we construct a reference `TimeAnchor`, consisting of a paired `Instant` and `SystemTime` recorded at the same time when the `ConsoleLayer` is constructed. We can then construct "system times" that are monotonic, by calculating the duration between a given `Instant` and the anchor `Instant`, and adding that duration to the anchor `SystemTime`. These are not *real* system timestamps, as they will never run backwards if the system clock is adjusted; they are relative only to the base process start time as recorded by the anchor. However, they *are* monotonic, and all durations calculated from them will be reasonable. This is part of the change I proposed in #254. I'm not going to close that issue yet, though, as it also described potentially switching to use the `quanta` crate rather than `std::time::Instant` to reduce the overhead of recording monotonic timestamps. Fixes #286
seanmonstar
approved these changes
Feb 17, 2022
hawkw
added a commit
that referenced
this pull request
Feb 17, 2022
## Motivation Currently, when constructing an update message to send over the wire, a timestamp is taken first, and then the protobuf data is constructed. This can lead to issues where the "now" timestamp is actually _before_ timestamps present in the stats sent in that update, since the stats for a particular task/resource/async op might be updated on another thread after taking the update's "now" timestamp. This results in issues like #266. ## Solution There's no actual reason to take those timestamps *before* we assemble the update. This branch changes the aggregator to build all the various data updates in an update message, and *then* record the update's "now" timestamp. Any timestamps for tasks/resources/async ops that are recorded after the update's "now" timestamp will now be included in the *next* update. While I was making this change, I also did a little refactoring. I factored out the code that's shared between constructing the initial update and subsequent updates into methods that are called with an `Include` enum, so we can reuse them when building the first update for a subscription and when building subsequent updates in `publish`. IMO this makes the code much more readable. Fixes #266 Depends on #288
Signed-off-by: Eliza Weisman <[email protected]>
hawkw
force-pushed
the
eliza/no-system-time
branch
from
February 17, 2022 18:33
9b442a9
to
c868b03
Compare
Signed-off-by: Eliza Weisman <[email protected]>
hawkw
added a commit
that referenced
this pull request
Feb 17, 2022
## Motivation Currently, when constructing an update message to send over the wire, a timestamp is taken first, and then the protobuf data is constructed. This can lead to issues where the "now" timestamp is actually _before_ timestamps present in the stats sent in that update, since the stats for a particular task/resource/async op might be updated on another thread after taking the update's "now" timestamp. This results in issues like #266. ## Solution There's no actual reason to take those timestamps *before* we assemble the update. This branch changes the aggregator to build all the various data updates in an update message, and *then* record the update's "now" timestamp. Any timestamps for tasks/resources/async ops that are recorded after the update's "now" timestamp will now be included in the *next* update. While I was making this change, I also did a little refactoring. I factored out the code that's shared between constructing the initial update and subsequent updates into methods that are called with an `Include` enum, so we can reuse them when building the first update for a subscription and when building subsequent updates in `publish`. IMO this makes the code much more readable. Fixes #266 Depends on #288
hawkw
added a commit
that referenced
this pull request
Feb 17, 2022
## Motivation Currently, when constructing an update message to send over the wire, a timestamp is taken first, and then the protobuf data is constructed. This can lead to issues where the "now" timestamp is actually _before_ timestamps present in the stats sent in that update, since the stats for a particular task/resource/async op might be updated on another thread after taking the update's "now" timestamp. This results in issues like #266. ## Solution There's no actual reason to take those timestamps *before* we assemble the update. This branch changes the aggregator to build all the various data updates in an update message, and *then* record the update's "now" timestamp. Any timestamps for tasks/resources/async ops that are recorded after the update's "now" timestamp will now be included in the *next* update. Fixes #266 Depends on #288
hawkw
added a commit
that referenced
this pull request
Feb 17, 2022
## Motivation Currently, when constructing an update message to send over the wire, a timestamp is taken first, and then the protobuf data is constructed. This can lead to issues where the "now" timestamp is actually _before_ timestamps present in the stats sent in that update, since the stats for a particular task/resource/async op might be updated on another thread after taking the update's "now" timestamp. This results in issues like #266. ## Solution There's no actual reason to take those timestamps *before* we assemble the update. This branch changes the aggregator to build all the various data updates in an update message, and *then* record the update's "now" timestamp. Any timestamps for tasks/resources/async ops that are recorded after the update's "now" timestamp will now be included in the *next* update. Fixes #266 Depends on #288
hawkw
added a commit
that referenced
this pull request
Feb 17, 2022
* refac(subscriber): factor out update construction This branch refactors how the aggregator task builds update messages. In particular, it factors out the shared code between building the initial update for a subscription and building the subsequent updates in `publish` into methods that are called with an `Include` enum to determine whether the update should include all tasks/resources/ops, or just the ones since the last update. This makes the code in `add_instrument_subscription` and in `publish` much easier to read. This is just the refactor part from #289 split out into a separate branch; this should make no functional change. If we decide that #289 is the correct way to fix the bug, we can rebase that branch onto this, so that it _just_ includes the bugfix. This way, we can land the refactor anyway, because I think it's nice. * fix(subscriber): record timestamps for updates last ## Motivation Currently, when constructing an update message to send over the wire, a timestamp is taken first, and then the protobuf data is constructed. This can lead to issues where the "now" timestamp is actually _before_ timestamps present in the stats sent in that update, since the stats for a particular task/resource/async op might be updated on another thread after taking the update's "now" timestamp. This results in issues like #266. ## Solution There's no actual reason to take those timestamps *before* we assemble the update. This branch changes the aggregator to build all the various data updates in an update message, and *then* record the update's "now" timestamp. Any timestamps for tasks/resources/async ops that are recorded after the update's "now" timestamp will now be included in the *next* update. Fixes #266 Depends on #288
hawkw
added a commit
that referenced
this pull request
Feb 18, 2022
## 0.1.3 (2022-02-18) #### Bug Fixes * record timestamps for updates last (#289) ([703f1aa](703f1aa), closes [#266](266)) * use monotonic `Instant`s for all timestamps (#288) ([abc0830](abc0830), closes [#286](286)) * bail rather than panic when encountering clock skew (#287) ([24db8c6](24db8c6), closes [#286](286)) * fix compilation on targets without 64-bit atomics (#282) ([5590fdb](5590fdb), closes [#279](279))
hawkw
added a commit
that referenced
this pull request
Feb 18, 2022
#### Features * add `Builder::filter_env_var` builder parameter (#276) ([dbdb149](dbdb149), closes [#206](206)) #### Bug Fixes * record timestamps for updates last (#289) ([703f1aa](703f1aa), closes [#266](266)) * use monotonic `Instant`s for all timestamps (#288) ([abc0830](abc0830), closes [#286](286)) * bail rather than panic when encountering clock skew (#287) ([24db8c6](24db8c6), closes [#286](286)) * fix compilation on targets without 64-bit atomics (#282) ([5590fdb](5590fdb), closes [#279](279))
hawkw
added a commit
that referenced
this pull request
Feb 18, 2022
#### Features * add `Builder::filter_env_var` builder parameter (#276) ([dbdb149](dbdb149), closes [#206](206)) #### Bug Fixes * record timestamps for updates last (#289) ([703f1aa](703f1aa), closes [#266](266)) * use monotonic `Instant`s for all timestamps (#288) ([abc0830](abc0830), closes [#286](286)) * bail rather than panic when encountering clock skew (#287) ([24db8c6](24db8c6), closes [#286](286)) * fix compilation on targets without 64-bit atomics (#282) ([5590fdb](5590fdb), closes [#279](279))
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Motivation
Currently, the
console-subscriber
crate records all timestamps asSystemTime
s. This is because they are eventually sent over the wire asprotobuf
Timestamp
s, which can be constructed from aSystemTime
.They cannot be constructed from
Instant
s, becauseInstant
is opaque,and does not expose access to the underlying OS time.
However, using
SystemTime
is not really correct for our use case. Weonly use timestamps for calculating durations; we only have to serialize
them because some durations are calculated in the console UI rather than
in-process. We don't need timestamps that are globally consistent with
a shared timebase, but we do need monotonicity --- using
SystemTime
leaves us vulnerable to clock skew, if (for example), an NTP clock skew
adjustment causes the system clock to run backwards far enough that a
poll appears to end "before" it started (as in issue #286). If we were
using monotonic
Instant
s, all polls should always have positivedurations, but with
SystemTime
s, this isn't necessarily the case.Furthermore,
Instant::now()
may have less performance overhead thanSystemTime::now()
, at least on some platforms.Solution
This branch changes
console-subscriber
to always take timestamps usingInstant::now()
rather than usingSystemTime::now()
, and store alltimestamps as
Instant
s. In order to convert theseInstant
s intoSystemTime
s that can be sent over the wire, we construct a referenceTimeAnchor
, consisting of a pairedInstant
andSystemTime
recordedat the same time when the
ConsoleLayer
is constructed. We can thenconstruct "system times" that are monotonic, by calculating the duration
between a given
Instant
and the anchorInstant
, and adding thatduration to the anchor
SystemTime
. These are not real systemtimestamps, as they will never run backwards if the system clock is
adjusted; they are relative only to the base process start time as
recorded by the anchor. However, they are monotonic, and all durations
calculated from them will be reasonable.
This is part of the change I proposed in #254. I'm not going to close
that issue yet, though, as it also described potentially switching to
use the
quanta
crate rather thanstd::time::Instant
to reduce theoverhead of recording monotonic timestamps.
Fixes #286