-
Notifications
You must be signed in to change notification settings - Fork 206
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(cosmic-swingset): poll timer after swingset quiescent #7423
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please make sure @btulloh and @rowgraus are cool with this change, as it means things like time-triggered liquidations will be deferred in favor of pre-existing work (actions that have already started, but haven't finished yet, not new actions coming from the queue).
Assuming that, I'm happy with this change as long as you remove the if (addedToQueue)
condition.
`polled; blockTime:${blockTime}, h:${blockHeight}; ADDED =`, | ||
addedToQueue, | ||
); | ||
if (addedToQueue) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to runSwingset()
independently of the return value of timer.poll()
.
I made timer.poll
return "there is work to be done" so that solo apps could do less work in the no-work-to-be-done case. But I remember we had determinism problems (maybe during replay/restart?) when we allowed it to avoid work in a consensus-based kernel, so we stopped using the return value.
I'll look for the relevant issue/change, but to be safe, I'd recommend removing the if (addedToQueue)
condition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
7703aa7#diff-cb472ee053cfbc36ebbae1abba0c95dc1f6c2775c90dc563e0b8f8307ac48e4fR134 is one relevant fix (three years ago, wow). At that time, we did the timer.poll()
during BEGIN_BLOCK
, and ran the kernel immediately iff it added anything. That was broken because running the kernel is also what committed the state, and a timer.poll will change state (recording the new time) even if nothing is triggered.
I'll keep poking around for others.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just re-read the timer-device.js
code, and now I think making c.run() conditional on the return value of timer.poll() is probably ok. The important constraint is to always always commit the kernel DB. timer.poll() has state side-effects that need to be committed, even if it doesn't result in work that needs to be done (so no controller.run()), I think that was the mistake we made last time. I'm not 100% sure how those state changes (kvStore entries for device-timer's state.now) get incorporated into the crankhash/activityhash if we don't do controller.run, that's the one remaining thing I need to investigate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the very least swingstore exports are not flushed until a crank ends, which means we must attempt to run the kernel even if nothing was queued to generate an empty crank. It's likely activityHash would also not get updated correctly without it.
So yeah we must run unconditionally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(for my reference later: "swingstore exports" refers to the "swingstore export" feature we added to support state-sync snapshots: the "Export Data" associated with swing-store changes are streamed to a callback function, but we deliberately delay those updates if we think we might unwind the changes associated with a delivery. In this case, these changes were being delayed until some subsequent actual delivery. This also raises the concern that the first delivery of the subsequent run might get rewound, thus losing the unrelated device state changes that were made before c.run()
started)
38a50db
to
c6df9ad
Compare
refs: #6964
Description
Drain the kernel queue before polling the timer
Security Considerations
None
Scaling Considerations
This should prevent userland logic creating faulty timers to compound their problems by not advancing the time and invoking new callbacks while previous timer triggered logic is still running.
Documentation Considerations
None
Testing Considerations
Unsure how to test this. As with most cosmic-swingset changes, we rely mostly on integration tests.