RRR: Reconnection & Request Reissuance #2181
Conversation
@mattsse PTAL - I'll cut a release after we get this PR in
going to move all previous WS behavior under the legacy-ws feature
LGTM. Much needed feature. Looking for a pass from @mattsse before merging.
if I understood this right, a WsClient now has two different tasks?
I haven't looked into it much, but that's kind of weird I guess.
the manager, backend, and backenddriver naming is a bit confusing.
it would be great to have some docs on the various channel fields because there are like 4 different channels used in this setup.
this now auto-reconnects on any error caused by the ws connection?
None => if let Err(e) = self.reconnect().await {
    break Err(e);
}
I could totally be wrong here because it wasn't obvious to which component to_handle is wired here, but it looks like the other half belongs to WsBackend?
I believe there's an issue with how the WsBackend task resolves, which could cause two reconnects if the task exits because of an error? First the channel will be None, then self.backend.error receives the error message and reconnects again?
I believe there's an issue with how the WsBackend task resolves, which could cause two reconnects if the task exits because of an error?
No, only 1 select branch is reachable in any pass through the loop. So if the to_handle channel is dead, the WS will reconnect. Then by the time the loop restarts, the BackendDriver will be different, and the error channel will be fresh and good.
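Roughly, the shape of that loop looks like the sketch below; the channel names are stand-ins for the manager's real fields, not the actual implementation.

```rust
use tokio::sync::{mpsc, oneshot};

// Sketch only: `to_handle` carries work from the frontend; `error` is the
// current backend's failure signal.
async fn manager_loop(
    mut to_handle: mpsc::UnboundedReceiver<String>,
    mut error: oneshot::Receiver<()>,
) {
    loop {
        tokio::select! {
            item = to_handle.recv() => match item {
                // Normal case: forward the work to the backend.
                Some(work) => println!("dispatching {work}"),
                // Frontend channel closed: reconnect. By the next pass the
                // driver (and therefore its error channel) would be fresh.
                None => {
                    println!("reconnect: to_handle closed");
                    break;
                }
            },
            // Backend reported a failure: reconnect from this branch instead.
            // Only one select! branch completes per pass, so a single failure
            // cannot fire both paths.
            _ = &mut error => {
                println!("reconnect: backend error");
                break;
            }
        }
    }
}
```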
I could totally be wrong here because it wasn't obvious to which component to_handle is wired here. but it looks like the other half belongs to WsBackend?
The naming convention is: you send a chunk of work to the handler via the handler channel; the handler accepts work that it should handle via to_handle. The same convention applies to dispatcher and to_dispatch: you send to the dispatching task via the dispatcher channel, and the dispatcher accepts the requests via to_dispatch.
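A tiny sketch of that convention, with hypothetical structs rather than the real ones:

```rust
use tokio::sync::mpsc;

// The sending half is named for the task you are talking *to* ("handler",
// "dispatcher"); the receiving half held by that task says what it does with
// the messages ("to_handle", "to_dispatch").
struct Frontend {
    handler: mpsc::UnboundedSender<String>, // send work to the handler task
}

struct HandlerTask {
    to_handle: mpsc::UnboundedReceiver<String>, // work this task should handle
}

fn wire_up() -> (Frontend, HandlerTask) {
    let (handler, to_handle) = mpsc::unbounded_channel();
    (Frontend { handler }, HandlerTask { to_handle })
}
```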
You can think of the BackendDriver as control and ownership of the running backend task. It has the ability to shut down that task and send/receive messages from it, and dropping the driver causes the task to shut down automatically.
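A rough sketch of what such a driver handle can look like; the fields and method here are illustrative, not the real struct:

```rust
use tokio::sync::{mpsc, oneshot};

// The driver owns the channel ends into the running backend task; dropping it
// closes them, which the task treats as a shutdown signal.
struct BackendDriver {
    dispatcher: mpsc::UnboundedSender<String>,  // send requests to the task
    to_handle: mpsc::UnboundedReceiver<String>, // receive items back from it
    shutdown: oneshot::Sender<()>,              // explicit shutdown signal
}

impl BackendDriver {
    fn shut_down(self) {
        // Consuming `self` drops every channel end; the backend task sees the
        // closed channels (or the explicit signal) and winds itself down.
        let _ = self.shutdown.send(());
    }
}
```

Because the manager holds one of these, swapping to a new backend is just a field assignment.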
The communication flow is
client ----instructions---> manager
manager -----requests-----> backend
backend <-----WS Protocol-----> server
backend -----PubSubItem----> manager
manager ---sub notifications---> sub channels
manager ---sub channels---> client
manager ---responses----> request future channels
This would go well in a swim-lane diagram, but I don't have time to make one right now.
yeah, because reconnection requires the connection to be disposable, there must be >=2 tasks: one for the connection, and one for detecting and fixing connection failures.
added comments to all struct fields
yes, up to a fixed number of retries. It also includes a keepalive loop when not running in wasm.
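A minimal sketch of a keepalive loop of that kind, assuming a hypothetical interval and message type:

```rust
use std::time::Duration;
use tokio::sync::mpsc;

// Periodically nudge the connection task so an idle server doesn't close us.
#[cfg(not(target_arch = "wasm32"))]
async fn keepalive(ws_sender: mpsc::UnboundedSender<&'static str>) {
    let mut ticker = tokio::time::interval(Duration::from_secs(10));
    loop {
        ticker.tick().await;
        // A failed send means the connection task is gone; the manager's
        // retry logic (bounded by the configured reconnect count) takes over.
        if ws_sender.send("ping").is_err() {
            break;
        }
    }
}
```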
Co-authored-by: Georgios Konstantopoulos <[email protected]>
@mattsse merging cautiously; if anybody experiences issues they can use the legacy-ws feature. This is a bit hard to unit test further, open to ideas here.
        #[cfg(not(target_arch = "wasm32"))]
        tokio::spawn(fut);
    }
}
nit: could add a reconnects(..) builder method to override the default, but can do in a followup
@prestwich Is it reasonable to try to keep a subscription through [...]? If yes, would the best way be to just instantiate a new [...]?
Yes, it's pretty reasonable.
You should just set an arbitrarily high reconnection count :) u64::MAX should be fine
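For instance, something along these lines; this sketch assumes a Ws::connect_with_reconnects constructor taking a retry count, so check the crate docs for the exact name and signature:

```rust
use ethers::providers::{Provider, Ws};

// Sketch: allow an effectively unlimited number of reconnect attempts so the
// subscription survives dropped connections. Constructor name and parameter
// type are assumptions.
async fn connect() -> Result<Provider<Ws>, Box<dyn std::error::Error>> {
    let ws = Ws::connect_with_reconnects("wss://example.com", usize::MAX).await?;
    Ok(Provider::new(ws))
}
```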
* wip: ws2
* ws2 backend compiles
* refactor: rename PubSubItem and BackendDriver
* feature: dispatch request to end subscription
* refactor: move ws2 to ws, fix reconnection and deser on subs
* chore: improve use of tracing in manager
* refactor: feature legacy_ws to enable backwards compatibility
* nit: mod file ordering
* docs: copy PR description to ws structs
* fixes: remove unused macros file, remove err formats
* docs: add comments to struct fields
* docs: comment client struct fields
* chore: changelog
* fix: unused imports in ws_errors test
* docs: missing comment

Co-authored-by: Georgios Konstantopoulos <[email protected]>

* fix: legacy-ws feature in root crate, hyphen not underscore
* fix: a couple bad imports/exports

---------

Co-authored-by: Georgios Konstantopoulos <[email protected]>
dropping events seems fine to me but out of curiosity why is it unavoidable? couldn't [...]?
Adds automatic reconnection and request-reissuance logic to WebSockets
Motivation
Users keep making a fuss about it and also it's generally a good thing to have :)
Solution
WS now consists of 2 tasks:

- WsBackend - manages dispatching raw values to the WebSocket
- RequestManager - manages caching request bodies until the request has resolved

WsBackend dispatches requests and routes responses and notifications. It also has a simple ping-based keepalive (when not compiled to wasm), to prevent inactivity from triggering server-side closes.

RequestManager holds a BackendDriver, to communicate with the current backend. Reconnection is accomplished by instantiating a new WsBackend and swapping out the manager's BackendDriver.
The RequestManager holds copies of all pending requests (as InFlight), and active subscriptions (as ActiveSub). When reconnection occurs, all pending requests are re-dispatched to the new backend, and all active subs are re-subscribed.
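A compressed sketch of that reissuance step, using stand-in types for InFlight, ActiveSub, and BackendDriver (the real structs carry more state):

```rust
use std::collections::HashMap;
use tokio::sync::mpsc;

// Stand-ins for the real types described above.
struct InFlight { request_body: String }
struct ActiveSub { subscribe_body: String }
struct BackendDriver { dispatcher: mpsc::UnboundedSender<String> }

struct RequestManager {
    backend: BackendDriver,
    reqs: HashMap<u64, InFlight>,  // pending requests, keyed by request id
    subs: HashMap<u64, ActiveSub>, // active subscriptions, keyed by request id
}

impl RequestManager {
    // After swapping in a fresh backend, resend every pending request and
    // every subscription request so the caller never notices the drop.
    fn reissue(&mut self, fresh: BackendDriver) {
        self.backend = fresh;
        for req in self.reqs.values() {
            let _ = self.backend.dispatcher.send(req.request_body.clone());
        }
        for sub in self.subs.values() {
            let _ = self.backend.dispatcher.send(sub.subscribe_body.clone());
        }
    }
}
```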
In order to provide continuity of subscription IDs to the client, the RequestManager also keeps a SubscriptionManager. This struct manages the relationship between the u64 request ID and the U256 server-side subscription ID. It does this by aliasing the server ID to the request ID, and returning the request ID to the caller (hiding the server ID in the SubscriptionManager internals). Giving the caller a "fake" subscription ID allows the subscription to behave consistently across reconnections.
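A minimal sketch of the aliasing idea, with invented field and method names:

```rust
use std::collections::HashMap;
use ethers::types::U256;

// The caller only ever sees the u64 request ID; the manager remembers which
// server-side U256 ID it currently corresponds to, and can re-point the alias
// after a reconnection without the caller noticing.
#[derive(Default)]
struct SubscriptionManager {
    by_server_id: HashMap<U256, u64>, // server subscription ID -> request ID
}

impl SubscriptionManager {
    fn alias(&mut self, server_id: U256, request_id: u64) {
        self.by_server_id.insert(server_id, request_id);
    }

    // Route an incoming notification to the ID the caller knows about.
    fn client_id_for(&self, server_id: &U256) -> Option<u64> {
        self.by_server_id.get(server_id).copied()
    }
}
```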
The behavior is accessed by the WsClient frontend, which implements JsonRpcClient. The WsClient is cloneable (no need for an Arc 😄). It communicates with the request manager via a channel, and receives notifications in a shared map for the client to retrieve.
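In channel terms, that frontend pattern looks roughly like this (illustrative names only, not the real ethers types):

```rust
use tokio::sync::{mpsc, oneshot};

// Each client is a cheap clone of a sender; every request carries its own
// one-shot reply channel for the manager to answer on.
#[derive(Clone)]
struct Client {
    instructions: mpsc::UnboundedSender<Instruction>,
}

struct Instruction {
    method: String,
    respond: oneshot::Sender<String>,
}

impl Client {
    async fn request(&self, method: &str) -> Option<String> {
        let (respond, rx) = oneshot::channel();
        self.instructions
            .send(Instruction { method: method.to_owned(), respond })
            .ok()?;
        // When every Client clone has been dropped, the manager sees its
        // receiving end close and can shut down, which is the lifecycle
        // described in the next paragraph.
        rx.await.ok()
    }
}
```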
The RequestManager shuts down and drops when all WsClient instances have been dropped (because all UnboundedSender instances will have dropped). The WsBackend shuts down when instructed to by the RequestManager, or when the RequestManager drops (because the inbound channel will close).

Note on use of mutexes
This work does NOT use async mutexes, as we never need to hold a lock past an await point. Because each use of the mutex is fast (a simple hashmap insert/remove), we're not concerned about blocking on locking.
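The typical pattern is along these lines (hypothetical helper, not the actual code):

```rust
use std::collections::HashMap;
use std::sync::Mutex;

// A std (non-async) Mutex is fine here because the guard is created and
// dropped within one synchronous statement, never held across an .await.
fn record_pending(pending: &Mutex<HashMap<u64, String>>, id: u64, body: String) {
    pending.lock().unwrap().insert(id, body);
}
```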
Limitations:

- subscriptions must be created via the eth_subscribe call so that the manager can inspect the sub id

Follow-up work:
Because BackendDriver does not contain any WS-specific behavior, we can add a RequestManager in front of IPC channels to add the same reconnection + reissuance logic. This doesn't seem high priority, as WS is much more prone to ephemeral failures than IPC.

PR Checklist