-
-
Notifications
You must be signed in to change notification settings - Fork 2.1k
Faster room joins: avoid blocking when pulling events with missing prevs #13355
Faster room joins: avoid blocking when pulling events with missing prevs #13355
Conversation
When a room has the partial state flag, we may not have an accurate `m.room.member` event for event senders in the room's current state, and so cannot perform soft fail checks correctly. Skip the soft fail check entirely in this case. As an alternative, we could block until we have full state, but that would prevent us from receiving incoming events over federation, which is undesirable. Signed-off-by: Sean Quah <[email protected]>
Previously, `partial_state=False` either meant "full state" or "calculate the flag yourself", depending on whether `state_ids_before_event` was provided. Split out the latter meaning into `partial_state=None`. Signed-off-by: Sean Quah <[email protected]>
Signed-off-by: Sean Quah <[email protected]>
…t can return partial state Signed-off-by: Sean Quah <[email protected]>
…rwise we can block when receiving or backfilling events over federation Signed-off-by: Sean Quah <[email protected]>
…e race Now that we persist events with partial state, we have to handle `PartialStateConflictError`s, by retrying just once. Signed-off-by: Sean Quah <[email protected]>
32e5e09
to
b879d8c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
generally looks great!
# We ought to have full state now, barring some unlikely race where we left and | ||
# rejoned the room in the background. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we check that partial_state
is false?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, keeping in mind that it won't catch the "calculate-the-state-yourself" case. I can't think of how to handle a second partial state apart from raising an exception.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry yes, I'd envisaged raising an exception, on the grounds we could give a clearer idea of what's going wrong.
But if you'd expect it to behave correctly (ish) in the case where we've rejoined, then I guess that's not appropriate and we should just leave this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've yet to figure out what happens when we rejoin (not that we can even leave right now!) so am happy to raise an exception.
synapse/handlers/federation_event.py
Outdated
if context.partial_state: | ||
|
||
context = None | ||
if not partial_state: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd find it helpful to enumerate the possibilities here. AIUI, they are:
state_ids
andpartial_state
are bothNone
, if we had all the prev_events (some of which may have partial state).state_ids
is non-empty, andpartial_state
isFalse
, meaning we were missing some or all prev_events, and have calculated the state after the prev_eventsstate_ids
is non-empty, andpartial_state
isTrue
, meaning we were missing some prev_events, but others had partial-state. We have calculated the state after all the prev_events, but it is still partial. There is no point building an event context in this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... I can't help feeling that the interface for _resolve_state_at_missing_prevs
is excessively baroque and coupled to that of compute_event_context
. (Should we replace _resolve_state_at_missing_prevs
with something that returns an EventContext
, and we pass an EventContext
into _process_received_pdu
instead of state_ids
and partial_state
?)
But that probably requires a bunch of refactoring that isn't justified right now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added a big comment listing the cases, borrowing some of your wording here.
I agree that _resolve_state_at_missing_prevs
is really unwieldy after this change. Is the proposal to have it *always* return an EventContext
? I'll try that refactor in a follow on PR.
@@ -82,13 +82,15 @@ async def get_state_group_delta( | |||
return state_group_delta.prev_group, state_group_delta.delta_ids | |||
|
|||
async def get_state_groups_ids( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as a random point for future reference: there is a convention that the subject line of a git commit message should be limited to 50 characters. That helps avoid this sort of thing:
There's an art to writing such a pithy summary, of course. In this case I would probably just say:
`get_state_groups_ids`: add `await_full_state` parameter
... and put "so that it can return partial state" in the body of the commit message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
interesting, I wasn't aware that the subject line had a shorter recommended length.
Up until now I've been using a limit of 72 characters in merge commits (and ignoring it in commits that get squashed, like the one above). 50 characters certainly makes things tricky!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ignoring it in commits that get squashed, like the one above
Yeah, the problem comes when we're doing a commit-by-commit review: it's much nicer if there's a pithy summary for each commit. It's certainly not worth stressing overmuch over though.
50 characters certainly makes things tricky!
Indeed. github actually seems to truncate at 66 chars, so that seems a more reasonable compromise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
Co-authored-by: Richard van der Hoff <[email protected]>
…evs (matrix-org#13355) Avoid blocking on full state in `_resolve_state_at_missing_prevs` and return a new flag indicating whether the resolved state is partial. Thread that flag around so that it makes it into the event context. Co-authored-by: Richard van der Hoff <[email protected]>
Synapse 1.65.0 (2022-08-16) =========================== No significant changes since 1.65.0rc2. Synapse 1.65.0rc2 (2022-08-11) ============================== Internal Changes ---------------- - Revert 'Remove the unspecced `room_id` field in the `/hierarchy` response. ([\matrix-org#13365](matrix-org#13365))' to give more time for clients to update. ([\matrix-org#13501](matrix-org#13501)) Synapse 1.65.0rc1 (2022-08-09) ============================== Features -------- - Add support for stable prefixes for [MSC2285 (private read receipts)](matrix-org/matrix-spec-proposals#2285). ([\matrix-org#13273](matrix-org#13273)) - Add new unstable error codes `ORG.MATRIX.MSC3848.ALREADY_JOINED`, `ORG.MATRIX.MSC3848.NOT_JOINED`, and `ORG.MATRIX.MSC3848.INSUFFICIENT_POWER` described in [MSC3848](matrix-org/matrix-spec-proposals#3848). ([\matrix-org#13343](matrix-org#13343)) - Use stable prefixes for [MSC3827](matrix-org/matrix-spec-proposals#3827). ([\matrix-org#13370](matrix-org#13370)) - Add a new module API method to translate a room alias into a room ID. ([\matrix-org#13428](matrix-org#13428)) - Add a new module API method to create a room. ([\matrix-org#13429](matrix-org#13429)) - Add remote join capability to the module API's `update_room_membership` method (in a backwards compatible manner). ([\matrix-org#13441](matrix-org#13441)) Bugfixes -------- - Update the version of the LDAP3 auth provider module included in the `matrixdotorg/synapse` DockerHub images and the Debian packages hosted on packages.matrix.org to 0.2.2. This version fixes a regression in the module. ([\matrix-org#13470](matrix-org#13470)) - Fix a bug introduced in Synapse v1.41.0 where the `/hierarchy` API returned non-standard information (a `room_id` field under each entry in `children_state`). ([\matrix-org#13365](matrix-org#13365)) - Fix a bug introduced in Synapse 0.24.0 that would respond with the wrong error status code to `/joined_members` requests when the requester is not a current member of the room. Contributed by @andrewdoh. ([\matrix-org#13374](matrix-org#13374)) - Fix bug in handling of typing events for appservices. Contributed by Nick @ Beeper (@Fizzadar). ([\matrix-org#13392](matrix-org#13392)) - Fix a bug introduced in Synapse 1.57.0 where rooms listed in `exclude_rooms_from_sync` in the configuration file would not be properly excluded from incremental syncs. ([\matrix-org#13408](matrix-org#13408)) - Fix a bug in the experimental faster-room-joins support which could cause it to get stuck in an infinite loop. ([\matrix-org#13353](matrix-org#13353)) - Faster room joins: fix a bug which caused rejected events to become un-rejected during state syncing. ([\matrix-org#13413](matrix-org#13413)) - Faster room joins: fix error when running out of servers to sync partial state with, so that Synapse raises the intended error instead. ([\matrix-org#13432](matrix-org#13432)) Updates to the Docker image --------------------------- - Make Docker images build on armv7 by installing cryptography dependencies in the 'requirements' stage. Contributed by Jasper Spaans. ([\matrix-org#13372](matrix-org#13372)) Improved Documentation ---------------------- - Update the 'registration tokens' page to acknowledge that the relevant MSC was merged into version 1.2 of the Matrix specification. Contributed by @moan0s. ([\matrix-org#11897](matrix-org#11897)) - Document which HTTP resources support gzip compression. ([\matrix-org#13221](matrix-org#13221)) - Add steps describing how to elevate an existing user to administrator by manipulating the database. ([\matrix-org#13230](matrix-org#13230)) - Fix wrong headline for `url_preview_accept_language` in documentation. ([\matrix-org#13437](matrix-org#13437)) - Remove redundant 'Contents' section from the Configuration Manual. Contributed by @dklimpel. ([\matrix-org#13438](matrix-org#13438)) - Update documentation for config setting `macaroon_secret_key`. ([\matrix-org#13443](matrix-org#13443)) - Update outdated information on `sso_mapping_providers` documentation. ([\matrix-org#13449](matrix-org#13449)) - Fix example code in module documentation of `password_auth_provider_callbacks`. ([\matrix-org#13450](matrix-org#13450)) - Make the configuration for the cache clearer. ([\matrix-org#13481](matrix-org#13481)) Internal Changes ---------------- - Extend the release script to automatically push a new SyTest branch, rather than having that be a manual process. ([\matrix-org#12978](matrix-org#12978)) - Make minor clarifications to the error messages given when we fail to join a room via any server. ([\matrix-org#13160](matrix-org#13160)) - Enable Complement CI tests in the 'latest deps' test run. ([\matrix-org#13213](matrix-org#13213)) - Fix long-standing bugged logic which was never hit in `get_pdu` asking every remote destination even after it finds an event. ([\matrix-org#13346](matrix-org#13346)) - Faster room joins: avoid blocking when pulling events with partially missing prev events. ([\matrix-org#13355](matrix-org#13355)) - Instrument `/messages` for understandable traces in Jaeger. ([\matrix-org#13368](matrix-org#13368)) - Remove an unused argument to `get_relations_for_event`. ([\matrix-org#13383](matrix-org#13383)) - Add a `merge-back` command to the release script, which automates merging the correct branches after a release. ([\matrix-org#13393](matrix-org#13393)) - Adding missing type hints to tests. ([\matrix-org#13397](matrix-org#13397)) - Faster Room Joins: don't leave a stuck room partial state flag if the join fails. ([\matrix-org#13403](matrix-org#13403)) - Refactor `_resolve_state_at_missing_prevs` to compute an `EventContext` instead. ([\matrix-org#13404](matrix-org#13404), [\matrix-org#13431](matrix-org#13431)) - Faster Room Joins: prevent Synapse from answering federated join requests for a room which it has not fully joined yet. ([\matrix-org#13416](matrix-org#13416)) - Re-enable running Complement tests against Synapse with workers. ([\matrix-org#13420](matrix-org#13420)) - Prevent unnecessary lookups to any external `get_event` cache. Contributed by Nick @ Beeper (@Fizzadar). ([\matrix-org#13435](matrix-org#13435)) - Add some tracing to give more insight into local room joins. ([\matrix-org#13439](matrix-org#13439)) - Rename class `RateLimitConfig` to `RatelimitSettings` and `FederationRateLimitConfig` to `FederationRatelimitSettings`. ([\matrix-org#13442](matrix-org#13442)) - Add some comments about how event push actions are stored. ([\matrix-org#13445](matrix-org#13445), [\matrix-org#13455](matrix-org#13455)) - Improve rebuild speed for the "synapse-workers" docker image. ([\matrix-org#13447](matrix-org#13447)) - Fix `@tag_args` being off-by-one with the arguments when tagging a span (tracing). ([\matrix-org#13452](matrix-org#13452)) - Update type of `EventContext.rejected`. ([\matrix-org#13460](matrix-org#13460)) - Use literals in place of `HTTPStatus` constants in tests. ([\matrix-org#13463](matrix-org#13463), [\matrix-org#13469](matrix-org#13469)) - Correct a misnamed argument in state res v2 internals. ([\matrix-org#13467](matrix-org#13467))
Avoid blocking on full state in
_resolve_state_at_missing_prevs
andreturn a new flag indicating whether the resolved state is partial.
Thread that flag around so that it makes it into the event context.
Resolves #13002.
Corresponding Complement tests: matrix-org/complement#419
Best reviewed commit by commit.
The first commit is from #13354.