Sliding Sync: Slight optimization when fetching state for the room (`get_events_as_list(...)`) #17718

MadLittleMods · 2024-09-16T23:55:53Z

Spawning from @kegsay pointing out that the Sliding Sync endpoint doesn't handle a large room with a lot of state well on initial sync (requesting all state via required_state: [ ["*","*"] ]) (it just takes forever).

After investigating further, the slow part is just get_events_as_list(...) fetching all of the current state ID's out for the room (which can be 100k+ events for rooms with a lot of membership). This is just a slow thing in Synapse in general and the same thing happens in Sync v2 or the /state endpoint.

The only idea I had to improve things was to use batch_iter to only try fetching a fixed amount at a time instead of working with large maps, lists, and sets. This doesn't seem to have much effect though.

There is already a batch_iter(event_ids, 200) in _fetch_event_rows(...) for when we actually have to touch the database and that's inside a queue to deduplicate work.

I did notice one slight optimization to use get_events_as_list(...) directly instead of get_events(...). get_events(...) just turns the result from get_events_as_list(...) into a dict and since we're just iterating over the events, we don't need the dict/map.

Dev notes

Run the new tests

SYNAPSE_TEST_LOG_LEVEL=INFO poetry run trial tests.storage.databases.main.test_events_worker.GetEventsTestCase

SYNAPSE_POSTGRES=1 SYNAPSE_POSTGRES_USER=postgres SYNAPSE_TEST_LOG_LEVEL=INFO poetry run trial tests.storage.databases.main.test_events_worker.GetEventsTestCase

Enable Jaeger tracing in Twisted Trial tests:

Run Jaeger locally with Docker: https://www.jaegertracing.io/docs/1.6/getting-started/#all-in-one-docker-image

Add the following to tests/unittest.py in setUp(...):

        from synapse.logging.opentracing import init_tracer

        # Start the tracer
        init_tracer(self.hs)  # noqa

Add the following to your tests (in tests/rest/client/test_asdf.py for example):

    def default_config(self) -> JsonDict:
        config = super().default_config()
        config["opentracing"] = {
            "enabled": True,
            "jaeger_config": {
                "sampler": {"type": "const", "param": 1},
                "logging": False,
            },
        }
        return config

Add a sleep to the end of the test you're running so the traces have a chance to flow to Jaeger before the process exits.

import time
time.sleep(6)

If you're not tracing an endpoint/servlet (which will already have a logging context and tracing span), you will need to wrap your function under test in a logging context for the tracing to be happy. Also be sure to add @trace decorators to what you want to trace downstream.

def test_asdf(self) -> None:
    with LoggingContext("test") as _ctx:
        my_custom_function()

Run your tests.

Visit http://localhost:16686/ and choose the test master service and find the traces you're interesting in.

Pull Request Checklist

Pull request is based on the develop branch
Pull request includes a changelog file. The entry should:
- Be a short description of your change which makes sense to users. "Fixed a bug that prevented receiving messages from other servers." instead of "Moved X method from EventStore to EventWorkerStore.".
- Use markdown where necessary, mostly for code blocks.
- End with either a period (.) or an exclamation mark (!).
- Start with a capital letter.
- Feel free to credit yourself, by adding a sentence "Contributed by @github_username." or "Contributed by [Your Name]." to the end of the entry.
Code style is correct
(run the linters)

…anyway

MadLittleMods · 2024-09-16T23:58:55Z

synapse/handlers/sliding_sync/__init__.py

+        events = await self.store.get_events_as_list(list(state_ids.values()))

        state_map = {}
-        for key, event_id in state_ids.items():
-            event = event_map.get(event_id)
-            if event:
-                state_map[key] = event
+        for event in events:
+            state_map[(event.type, event.state_key)] = event


One slight optimization to use get_events_as_list(...) directly instead of get_events(...). get_events(...) just turns the result from get_events_as_list(...) into a dict and since we're just iterating over the events, we don't need the dict/map.

MadLittleMods · 2024-09-17T00:11:38Z

tests/storage/databases/main/test_events_worker.py

+    def prepare(self, reactor: MemoryReactor, clock: Clock, hs: HomeServer) -> None:
+        self.store: EventsWorkerStore = hs.get_datastores().main
+
+    def test_get_lots_of_messages(self) -> None:


New test because I used it as a test bench and get consistent traces (previous iteration with tracing). The tracing stuff has been removed.

Feel free to nit to remove the whole thing.

MadLittleMods · 2024-09-17T00:12:19Z

synapse/storage/databases/main/events_worker.py

    @cancellable
    async def get_unredacted_events_from_cache_or_db(
        self,
-        event_ids: Iterable[str],
+        event_ids: Collection[str],


Collection because we iterate over event_ids multiple times (even before I added in the tracing tag)

MadLittleMods · 2024-09-17T00:13:15Z

synapse/storage/databases/main/events_worker.py

    async def _get_events_from_external_cache(
-        self, events: Iterable[str], update_metrics: bool = True
+        self, events: Collection[str], update_metrics: bool = True


This one was a proper Iterable before and only used once but updated to be a Collection now that we also read from it for the tracing tag.

MadLittleMods · 2024-10-14T15:25:39Z

Thanks for the review and merge @erikjohnston 🐸

MadLittleMods added 4 commits September 16, 2024 16:09

Add tracing and test

fbae11f

More tracing and test

fed7502

Reduce test event count

d3e67f1

No need to make an intermediate map if we are just iterating over it …

c84ee8f

…anyway

MadLittleMods added the A-Sync label Sep 16, 2024

MadLittleMods changed the title ~~Sliding Sync: Investigate get_events_as_list(...) taking a long time~~ Sliding Sync: Investigate get_events_as_list(...) taking a long time for large rooms Sep 16, 2024

MadLittleMods commented Sep 16, 2024

View reviewed changes

MadLittleMods added 5 commits September 16, 2024 19:00

Remove tracing specifics

4b62546

Fix typo

be025a4

Add changelog

55c5811

Remove tracing generic things that could stick around

58550d0

Add test description

ff6e51f

MadLittleMods commented Sep 17, 2024

View reviewed changes

MadLittleMods changed the title ~~Sliding Sync: Investigate get_events_as_list(...) taking a long time for large rooms~~ Sliding Sync: Slight optimization when fetching state for the room (get_events_as_list(...)) Sep 17, 2024

MadLittleMods marked this pull request as ready for review September 17, 2024 00:38

MadLittleMods requested a review from a team as a code owner September 17, 2024 00:38

Merge branch 'develop' into madlittlemods/get_a_lot_of_events

4d1b3e4

erikjohnston approved these changes Oct 14, 2024

View reviewed changes

erikjohnston merged commit adda2a4 into develop Oct 14, 2024
39 checks passed

erikjohnston deleted the madlittlemods/get_a_lot_of_events branch October 14, 2024 12:47

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Sliding Sync: Slight optimization when fetching state for the room (`get_events_as_list(...)`) #17718

Sliding Sync: Slight optimization when fetching state for the room (`get_events_as_list(...)`) #17718

MadLittleMods commented Sep 16, 2024 •

edited

Loading

MadLittleMods Sep 16, 2024

MadLittleMods Sep 17, 2024

MadLittleMods Sep 17, 2024 •

edited

Loading

MadLittleMods Sep 17, 2024 •

edited

Loading

MadLittleMods commented Oct 14, 2024

Sliding Sync: Slight optimization when fetching state for the room (get_events_as_list(...)) #17718

Sliding Sync: Slight optimization when fetching state for the room (get_events_as_list(...)) #17718

Conversation

MadLittleMods commented Sep 16, 2024 • edited Loading

Dev notes

Run the new tests

Enable Jaeger tracing in Twisted Trial tests:

Pull Request Checklist

MadLittleMods Sep 16, 2024

Choose a reason for hiding this comment

MadLittleMods Sep 17, 2024

Choose a reason for hiding this comment

MadLittleMods Sep 17, 2024 • edited Loading

Choose a reason for hiding this comment

MadLittleMods Sep 17, 2024 • edited Loading

Choose a reason for hiding this comment

MadLittleMods commented Oct 14, 2024

Sliding Sync: Slight optimization when fetching state for the room (`get_events_as_list(...)`) #17718

Sliding Sync: Slight optimization when fetching state for the room (`get_events_as_list(...)`) #17718

MadLittleMods commented Sep 16, 2024 •

edited

Loading

MadLittleMods Sep 17, 2024 •

edited

Loading

MadLittleMods Sep 17, 2024 •

edited

Loading