This repository has been archived by the owner on Apr 26, 2024. It is now read-only.
There are a number of problems with the logic in keyring.py for fetching server keys:
each request that needs a key for a given server gets queued up, and the queue for a given server can get quite long. If the lookup is successful, that's ok. However, if it fails (which may take many minutes while we wait for timeouts), then we try again for each request in the queue, so we can rapidly fall very far behind. When we want key X for server Y and there is already a request in the queue for that key, we should just use its result, even if it fails.
relatedly, the queueing logic might never complete. If a given request wants keys from server A and server B, and a lookup is already in progress for A, it waits for that to complete. By that time, another request might be doing a lookup for B, so it waits for that to complete. Then we might be waiting for A again, and so on. We should immediately start lookups for those servers which aren't already in progress, rather than waiting for the complete set.
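To make the two suggestions above concrete, here is a minimal sketch of the intended behaviour. It is not the code in keyring.py: it uses asyncio rather than Twisted so that it runs standalone, and `InFlightKeyFetcher`, `KeyFetchError`, `get_keys` and the `fetch` callback are all hypothetical names invented for illustration. Every caller that wants the same (server, key) pair shares one in-flight lookup, including its failure, and lookups that are not yet in progress are started immediately instead of after waiting for other servers.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Iterable, Tuple, Union

# Hypothetical types for illustration only; not Synapse's actual API.
KeyRequest = Tuple[str, str]  # (server_name, key_id)


class KeyFetchError(Exception):
    """Raised by the fetch callback when a key lookup fails."""


class InFlightKeyFetcher:
    """Shares one lookup per (server, key_id) between concurrent callers."""

    def __init__(self, fetch: Callable[[str, str], Awaitable[str]]) -> None:
        self._fetch = fetch
        self._in_flight: Dict[KeyRequest, asyncio.Task] = {}

    def _lookup(self, request: KeyRequest) -> asyncio.Task:
        existing = self._in_flight.get(request)
        if existing is not None:
            # Someone is already fetching this key: reuse their lookup,
            # whether it ends in success or failure.
            return existing
        # Nobody is fetching this key yet, so start right away.
        task = asyncio.ensure_future(self._fetch(*request))
        self._in_flight[request] = task
        # Forget the lookup once it finishes, so later callers start afresh.
        task.add_done_callback(lambda _: self._in_flight.pop(request, None))
        return task

    async def get_keys(
        self, requests: Iterable[KeyRequest]
    ) -> Dict[KeyRequest, Union[str, BaseException]]:
        # Join or start every lookup *before* waiting on any of them, so a
        # slow lookup for server A never delays starting one for server B.
        tasks = {request: self._lookup(request) for request in requests}
        if tasks:
            # asyncio.wait does not cancel the shared tasks if this caller
            # is itself cancelled, so other waiters are unaffected.
            await asyncio.wait(set(tasks.values()))
        results: Dict[KeyRequest, Union[str, BaseException]] = {}
        for request, task in tasks.items():
            error = task.exception()
            results[request] = error if error is not None else task.result()
        return results
```

With this shape, a failed lookup is reported once to everyone who was waiting on it rather than being retried once per queued request. The sketch only shares results while a lookup is in flight; whether failures should also be remembered for some time afterwards is a separate decision that it does not address.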
This is a huge problem while we are joining a room, and is a huge contributor to #1211. In particular:
we do a send_join
servers start sending us federation transactions, which means we need to fetch their keys, so we take out key-fetch locks for those servers
we try to verify the results of the send_join, so we have to fetch hundreds of keys. Some of those servers are already locked due to the above, so we wait
This (really the tight looping of the "Waiting for existing lookups" logging in #5435) has come up several times in the past couple of months. When a homeserver hits this, it is effectively unresponsive until it gets restarted.
see also #3819 (store_server_verify_keys shouldn't need to lock the table) and #3818 (get_keys_from_store should do one big lookup, not hundreds of tiny ones)