Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Immediately retry any requests that have backed off when a server comes back online. #12500

Merged
merged 8 commits into from
May 10, 2022

Conversation

erikjohnston
Copy link
Member

Otherwise it can take up to a minute for any in-flight /send requests to be retried.

@erikjohnston erikjohnston changed the title Immediate retry any requests that have backed off when a server comes back online. Immediately retry any requests that have backed off when a server comes back online. Apr 19, 2022
@erikjohnston erikjohnston force-pushed the erikj/server_wake_up branch 3 times, most recently from 262c4ec to ee54ddc Compare April 19, 2022 15:19
@erikjohnston erikjohnston marked this pull request as ready for review May 5, 2022 12:35
@erikjohnston erikjohnston requested a review from a team as a code owner May 5, 2022 12:35
synapse/util/async_helpers.py Show resolved Hide resolved
def wake(self, name: str) -> None:
"""Wake everything related to `name` that is currently sleeping."""
stream_set = self._streams.pop(name, set())
for deferred in set(stream_set):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this extra set construction necessary? AFAICS we're not mutating stream_set below, and self._streams should only contain sets for its values?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, so, turns out this was due to another part of the code storing a stream_set across a yield point, and so popping from self._streams wasn't enough to ensure nothing else would try and mutate it. I think I've fixed that now though.

synapse/util/async_helpers.py Show resolved Hide resolved
tests/util/test_async_helpers.py Show resolved Hide resolved
Copy link
Contributor

@DMRobertson DMRobertson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any danger that immediately retrying a bunch of requests ends up spamming the server with shedloads of stuff at once and brings it down again?

@erikjohnston
Copy link
Member Author

Is there any danger that immediately retrying a bunch of requests ends up spamming the server with shedloads of stuff at once and brings it down again?

I don't think so. We already clear the backoff timers when we see the host is back up, so this will only cause in flight requests to be immediately retried (and since we can only have one in flight transaction per remote host, that shouldn't be toooo bad)

Copy link
Contributor

@DMRobertson DMRobertson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many thanks. LGTM

@erikjohnston erikjohnston merged commit 8dd3e0e into develop May 10, 2022
@erikjohnston erikjohnston deleted the erikj/server_wake_up branch May 10, 2022 09:39
DMRobertson pushed a commit that referenced this pull request May 10, 2022
Synapse 1.59.0rc1 (2022-05-10)
==============================

This release makes several changes that server administrators should be aware of:

- Device name lookup over federation is now disabled by default. ([\#12616](#12616))
- The `synapse.app.appservice` and `synapse.app.user_dir` worker application types are now deprecated. ([\#12452](#12452), [\#12654](#12654))

See [the upgrade notes](https://github.com/matrix-org/synapse/blob/develop/docs/upgrade.md#upgrading-to-v1590) for more details.

Additionally, this release removes the non-standard `m.login.jwt` login type from Synapse. It can be replaced with `org.matrix.login.jwt` for identical behaviour. This is only used if `jwt_config.enabled` is set to `true` in the configuration. ([\#12597](#12597))

Features
--------

- Support [MSC3266](matrix-org/matrix-spec-proposals#3266) room summaries over federation. ([\#11507](#11507))
- Implement [changes](matrix-org/matrix-spec-proposals@4a77139) to [MSC2285 (hidden read receipts)](matrix-org/matrix-spec-proposals#2285). Contributed by @SimonBrandner. ([\#12168](#12168), [\#12635](#12635), [\#12636](#12636), [\#12670](#12670))
- Extend the [module API](https://github.com/matrix-org/synapse/blob/release-v1.59/synapse/module_api/__init__.py) to allow modules to change actions for existing push rules of local users. ([\#12406](#12406))
- Add the `notify_appservices_from_worker` configuration option (superseding `notify_appservices`) to allow a generic worker to be designated as the worker to send traffic to Application Services. ([\#12452](#12452))
- Add the `update_user_directory_from_worker` configuration option (superseding `update_user_directory`) to allow a generic worker to be designated as the worker to update the user directory. ([\#12654](#12654))
- Add new `enable_registration_token_3pid_bypass` configuration option to allow registrations via token as an alternative to verifying a 3pid. ([\#12526](#12526))
- Implement [MSC3786](matrix-org/matrix-spec-proposals#3786): Add a default push rule to ignore `m.room.server_acl` events. ([\#12601](#12601))
- Add new `mau_appservice_trial_days` configuration option to specify a different trial period for users registered via an appservice. ([\#12619](#12619))

Bugfixes
--------

- Fix a bug introduced in Synapse 1.48.0 where the latest thread reply provided failed to include the proper bundled aggregations. ([\#12273](#12273))
- Fix a bug introduced in Synapse 1.22.0 where attempting to send a large amount of read receipts to an application service all at once would result in duplicate content and abnormally high memory usage. Contributed by Brad & Nick @ Beeper. ([\#12544](#12544))
- Fix a bug introduced in Synapse 1.57.0 which could cause `Failed to calculate hosts in room` errors to be logged for outbound federation. ([\#12570](#12570))
- Fix a long-standing bug where status codes would almost always get logged as `200!`, irrespective of the actual status code, when clients disconnect before a request has finished processing. ([\#12580](#12580))
- Fix race when persisting an event and deleting a room that could lead to outbound federation breaking. ([\#12594](#12594))
- Fix a bug introduced in Synapse 1.53.0 where bundled aggregations for annotations/edits were incorrectly calculated. ([\#12633](#12633))
- Fix a long-standing bug where rooms containing power levels with string values could not be upgraded. ([\#12657](#12657))
- Prevent memory leak from reoccurring when presence is disabled. ([\#12656](#12656))

Updates to the Docker image
---------------------------

- Explicitly opt-in to using [BuildKit-specific features](https://github.com/moby/buildkit/blob/master/frontend/dockerfile/docs/syntax.md) in the Dockerfile. This fixes issues with building images in some GitLab CI environments. ([\#12541](#12541))
- Update the "Build docker images" GitHub Actions workflow to use `docker/metadata-action` to generate docker image tags, instead of a custom shell script. Contributed by @henryclw. ([\#12573](#12573))

Improved Documentation
----------------------

- Update SQL statements and replace use of old table `user_stats_historical` in docs for Synapse Admins. ([\#12536](#12536))
- Add missing linebreak to `pipx` install instructions. ([\#12579](#12579))
- Add information about the TCP replication module to docs. ([\#12621](#12621))
- Fixes to the formatting of `README.rst`. ([\#12627](#12627))
- Fix docs on how to run specific Complement tests using the `complement.sh` test runner. ([\#12664](#12664))

Deprecations and Removals
-------------------------

- Remove unstable identifiers from [MSC3069](matrix-org/matrix-spec-proposals#3069). ([\#12596](#12596))
- Remove the unspecified `m.login.jwt` login type and the unstable `uk.half-shot.msc2778.login.application_service` from
  [MSC2778](matrix-org/matrix-spec-proposals#2778). ([\#12597](#12597))
- Synapse now requires at least Python 3.7.1 (up from 3.7.0), for compatibility with the latest Twisted trunk. ([\#12613](#12613))

Internal Changes
----------------

- Use supervisord to supervise Postgres and Caddy in the Complement image to reduce restart time. ([\#12480](#12480))
- Immediately retry any requests that have backed off when a server comes back online. ([\#12500](#12500))
- Use `make_awaitable` instead of `defer.succeed` for return values of mocks in tests. ([\#12505](#12505))
- Consistently check if an object is a `frozendict`. ([\#12564](#12564))
- Protect module callbacks with read semantics against cancellation. ([\#12568](#12568))
- Improve comments and error messages around access tokens. ([\#12577](#12577))
- Improve docstrings for the receipts store. ([\#12581](#12581))
- Use constants for read-receipts in tests. ([\#12582](#12582))
- Log status code of cancelled requests as 499 and avoid logging stack traces for them. ([\#12587](#12587), [\#12663](#12663))
- Remove special-case for `twisted` logger from default log config. ([\#12589](#12589))
- Use `getClientAddress` instead of the deprecated `getClientIP`. ([\#12599](#12599))
- Add link to documentation in Grafana Dashboard. ([\#12602](#12602))
- Reduce log spam when running multiple event persisters. ([\#12610](#12610))
- Add extra debug logging to federation sender. ([\#12614](#12614))
- Prevent remote homeservers from requesting local user device names by default. ([\#12616](#12616))
- Add a consistency check on events which we read from the database. ([\#12620](#12620))
- Remove use of the `constantly` library and switch to enums for `EventRedactBehaviour`. Contributed by @andrewdoh. ([\#12624](#12624))
- Remove unused code related to receipts. ([\#12632](#12632))
- Minor improvements to the scripts for running Synapse in worker mode under Complement. ([\#12637](#12637))
- Move `pympler` back in to the `all` extras. ([\#12652](#12652))
- Fix spelling of `M_UNRECOGNIZED` in comments. ([\#12665](#12665))
- Release script: confirm the commit to be tagged before tagging. ([\#12556](#12556))
- Fix a typo in the announcement text generated by the Synapse release development script. ([\#12612](#12612))

- Fix scripts-dev to pass typechecking. ([\#12356](#12356))
- Add some type hints to datastore. ([\#12485](#12485))
- Remove unused `# type: ignore`s. ([\#12531](#12531))
- Allow unused `# type: ignore` comments in bleeding edge CI jobs. ([\#12576](#12576))
- Remove redundant lines of config from `mypy.ini`. ([\#12608](#12608))
- Update to mypy 0.950. ([\#12650](#12650))
- Use `Concatenate` to better annotate `_do_execute`. ([\#12666](#12666))
- Use `ParamSpec` to refine type hints. ([\#12667](#12667))
- Fix mypy against latest pillow stubs. ([\#12671](#12671))
Comment on lines +251 to +263

if self.notifier:
# Inform the relevant places that the remote server is back up.
self.notifier.notify_remote_server_up(self.destination)

if self.replication_client:
# If we're on a worker we try and inform master about this. The
# replication client doesn't hook into the notifier to avoid
# infinite loops where we send a `REMOTE_SERVER_UP` command to
# master, which then echoes it back to us which in turn pokes
# the notifier.
self.replication_client.send_remote_server_up(self.destination)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This path appears to get taken when we fail to contact the remote server too?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #16506.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants