Make the backfill linearizer lock smarter to not block as often #13619
Labels

- A-Messages-Endpoint: `/messages` client API endpoint (`RoomMessageListRestServlet`) (which also triggers /backfill)
- A-Performance: Performance, both client-facing and admin-facing
- O-Uncommon: Most users are unlikely to come across this or unexpected workflow
- S-Minor: Blocks non-critical functionality, workarounds exist.
- T-Enhancement: New features, changes in functionality, improvements in performance, or user-facing enhancements.
Mentioned in internal doc about speeding up `/messages`. Also see "1. Backfill linearizer lock takes forever" in #13356.

The linearizer lock on backfill (introduced in #10116) is only used to de-duplicate work happening at the same time; it has nothing to do with data integrity. But the linearizer is very simplistic, and we can make it smarter so it doesn't block so often.
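For context, the guard is roughly shaped like the sketch below. This is a simplified approximation, assuming Synapse's `Linearizer` from `synapse.util.async_helpers` with `queue(key)` used as an async context manager; the handler name and surrounding details here are illustrative rather than exact Synapse code.

```python
# Simplified sketch of the current behaviour (approximation, not exact
# Synapse code): every backfill attempt for a room queues behind whatever
# backfill is already running for that room.
from synapse.util.async_helpers import Linearizer

_room_backfill = Linearizer(name="room_backfill")


async def maybe_backfill(room_id: str, current_depth: int) -> None:
    # All backfill attempts for the room queue behind the previous one,
    # even if they target a completely different part of the room's history.
    async with _room_backfill.queue(room_id):
        ...  # fetch and persist older events from other servers
```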
Improvements
Block by ranges of `depth` per-room

Currently, the linearizer blocks per-room, so you're not able to backfill in two separate locations in a room. We could update it to have locks on ranges of `depth` for the room.
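A minimal sketch of that idea (hypothetical class, not Synapse's existing `Linearizer`): track the depth ranges currently being backfilled per room, and only make a new request wait when its range overlaps one already in flight.

```python
import asyncio
from contextlib import asynccontextmanager
from typing import Dict, List, Tuple


class DepthRangeLinearizer:
    """Per-room locks over depth ranges instead of one lock per room."""

    def __init__(self) -> None:
        # room_id -> list of (min_depth, max_depth, event set on release)
        self._in_flight: Dict[str, List[Tuple[int, int, asyncio.Event]]] = {}

    def _overlapping(self, room_id: str, lo: int, hi: int):
        return [
            (rlo, rhi, done)
            for rlo, rhi, done in self._in_flight.get(room_id, [])
            if not (hi < rlo or lo > rhi)
        ]

    @asynccontextmanager
    async def queue(self, room_id: str, lo: int, hi: int):
        # Wait only for backfills whose depth range overlaps ours.
        while overlapping := self._overlapping(room_id, lo, hi):
            await asyncio.gather(*(done.wait() for _, _, done in overlapping))

        entry = (lo, hi, asyncio.Event())
        self._in_flight.setdefault(room_id, []).append(entry)
        try:
            yield
        finally:
            self._in_flight[room_id].remove(entry)
            entry[2].set()
```

With something like this, a request backfilling depths 100–200 would only wait on other backfills touching that window, while one filling depths 900–1000 in the same room could proceed immediately.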
De-duplicate the same request/work
Currently, the linearizer has no de-duplication, so if you send the same request 10 times in a row, they will all just rack up and do the same work over and over in sequence. We could instead share the response from the first one with all the requests that are waiting.
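A rough sketch of that (a hypothetical helper, not an existing Synapse class): key the work by the request parameters, so everyone after the first caller awaits the same in-flight task instead of repeating it.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, Hashable


class DeduplicatingRunner:
    """Share the result of in-flight work between identical requests."""

    def __init__(self) -> None:
        self._in_flight: Dict[Hashable, "asyncio.Task[Any]"] = {}

    async def run(self, key: Hashable, func: Callable[[], Awaitable[Any]]) -> Any:
        task = self._in_flight.get(key)
        if task is None:
            # First caller: start the work and remember it so identical
            # requests arriving while it runs can share the result.
            task = asyncio.create_task(func())
            task.add_done_callback(lambda _: self._in_flight.pop(key, None))
            self._in_flight[key] = task
        # Shield so a cancelled waiter doesn't cancel the shared work.
        return await asyncio.shield(task)
```

Ten identical requests keyed on, say, `(room_id, from_depth, limit)` would then trigger one backfill, with the other nine simply awaiting its result.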
The queue can build up above our timeout threshold
If we see that items in the queue are older than 180s, we should just cancel them, because the underlying request to Synapse has timed out anyway. There's no need to do work for a request that is no longer running.
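A sketch of the check (the names are hypothetical, and the 180s figure is assumed to match the request timeout mentioned above): stamp each queued attempt with its enqueue time, and drop it before starting if the request that asked for it must already have timed out.

```python
import time

# Assumed to match the client request timeout referenced above.
REQUEST_TIMEOUT_S = 180.0


class StaleQueuedRequest(Exception):
    """The queued backfill outlived the request that asked for it."""


def assert_still_wanted(enqueued_at: float) -> None:
    # Called just before a queued backfill finally gets its turn: if the
    # originating request has already timed out, nobody is left to receive
    # the result, so don't bother doing the work.
    if time.monotonic() - enqueued_at > REQUEST_TIMEOUT_S:
        raise StaleQueuedRequest("queued longer than the request timeout")
```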
Linearizer shared across all worker processes
Currently, the linearizer is per-process, so if there are multiple client reader worker processes, we are potentially duplicating work across them. We do try to send traffic for a given room to the same worker, but there is no guarantee.
This is probably one of the optimizations to prioritize last.
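If it ever becomes worth doing, one very rough sketch (not an existing Synapse mechanism; the key name, TTL, and helper names here are made up) would be a shared Redis key with an expiry, so a second worker can see that a room is already being backfilled elsewhere.

```python
import uuid
from typing import Optional

from redis.asyncio import Redis


async def try_acquire_backfill_lock(
    redis: Redis, room_id: str, ttl_s: int = 180
) -> Optional[str]:
    token = uuid.uuid4().hex
    # SET NX EX: only succeeds if no other worker holds the lock, and the
    # expiry stops a crashed worker from holding it forever.
    acquired = await redis.set(f"backfill_lock:{room_id}", token, nx=True, ex=ttl_s)
    return token if acquired else None


async def release_backfill_lock(redis: Redis, room_id: str, token: str) -> None:
    key = f"backfill_lock:{room_id}"
    # Best-effort check that we still own the lock before deleting it
    # (a Lua script would make the check-and-delete atomic).
    if await redis.get(key) == token.encode():
        await redis.delete(key)
```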