Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Fetch less state when we _compute_event_context_with_maybe_missing_prevs #15616

Closed

Conversation

MadLittleMods
Copy link
Contributor

@MadLittleMods MadLittleMods commented May 17, 2023

Fetch less state when we _compute_event_context_with_maybe_missing_prevs

Part of making /messages faster: #13356

Background

When we have to _compute_event_context_with_maybe_missing_prevs, _get_state_groups_from_groups currently takes up most of the time. It makes the following recursive query for each state group given which goes up the entire state_group chain and returns the complete distinct state which means thousands of membership events at the very least. For example, with a random /messages request in the Matrix HQ room we did 10x of these queries which each took 0.5-2 seconds and return roughly 88k events every time totaling 14s.

WITH RECURSIVE sgs(state_group) AS (
  VALUES 
    (? :: bigint) 
  UNION ALL 
  SELECT 
    prev_state_group 
  FROM 
    state_group_edges e, sgs s 
  WHERE 
    s.state_group = e.state_group
) 
SELECT 
  DISTINCT ON (type, state_key) type, state_key, event_id 
FROM 
  state_groups_state 
WHERE 
  state_group IN (
    SELECT state_group FROM sgs
  ) 
ORDER BY type, state_key, state_group DESC

https://explain.depesz.com/s/OuOe
https://explain.dalibo.com/plan/ef6bc290f2dced17

EXPLAIN ANAYLZE query output
EXPLAIN ANALYZE WITH RECURSIVE sgs(state_group) AS (
  VALUES 
    (739988088 :: bigint) 
  UNION ALL 
  SELECT 
    prev_state_group 
  FROM 
    state_group_edges e, sgs s 
  WHERE 
    s.state_group = e.state_group
) 
SELECT 
  DISTINCT ON (type, state_key) type, state_key, event_id 
FROM 
  state_groups_state 
WHERE 
  state_group IN (
    SELECT state_group FROM sgs
  ) 
ORDER BY type, state_key, state_group DESC;
                                                                                 QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=7161.42..7199.29 rows=5050 width=91) (actual time=1084.809..1099.816 rows=88180 loops=1)
   CTE sgs
     ->  Recursive Union  (cost=0.00..284.28 rows=101 width=8) (actual time=0.004..21.247 rows=59 loops=1)
           ->  Result  (cost=0.00..0.01 rows=1 width=8) (actual time=0.002..0.002 rows=1 loops=1)
           ->  Nested Loop  (cost=0.57..28.23 rows=10 width=8) (actual time=0.358..0.359 rows=1 loops=59)
                 ->  WorkTable Scan on sgs s  (cost=0.00..0.20 rows=10 width=8) (actual time=0.000..0.000 rows=1 loops=59)
                 ->  Index Only Scan using state_group_edges_unique_idx on state_group_edges e  (cost=0.57..2.79 rows=1 width=16) (actual time=0.357..0.357 rows=1 loops=59)
                       Index Cond: (state_group = s.state_group)
                       Heap Fetches: 0
   ->  Sort  (cost=6877.14..6889.76 rows=5050 width=91) (actual time=1084.806..1089.100 rows=88221 loops=1)
         Sort Key: state_groups_state.type, state_groups_state.state_key, state_groups_state.state_group DESC
         Sort Method: quicksort  Memory: 14782kB
         ->  Nested Loop  (cost=3.11..6566.51 rows=5050 width=91) (actual time=23.522..890.988 rows=88221 loops=1)
               ->  HashAggregate  (cost=2.27..3.28 rows=101 width=8) (actual time=21.348..21.383 rows=59 loops=1)
                     Group Key: sgs.state_group
                     Batches: 1  Memory Usage: 24kB
                     ->  CTE Scan on sgs  (cost=0.00..2.02 rows=101 width=8) (actual time=0.007..21.282 rows=59 loops=1)
               ->  Index Scan using state_groups_state_type_idx on state_groups_state  (cost=0.84..64.48 rows=50 width=91) (actual time=0.959..14.462 rows=1495 loops=59)
                     Index Cond: (state_group = sgs.state_group)
 Planning Time: 4.485 ms
 Execution Time: 1105.539 ms
(21 rows)

Time: 1112.886 ms (00:01.113)

Full Jaeger trace JSON

Questions/ideas

Do we really need all of this state?

Can we get away with adding a state_filter that only fetches the auth event types out? StateFilter.from_types(event_auth.auth_types_for_event(event.room_version, event))

Or maybe we can avoid fetching all m.room.member events except for the one that pertains to the event sender?

Another idea, since we fetch state for many state_groups, can we re-use the work from one to the other? Perhaps we can have one recursive query starting at the max state_group given and somehow assembles what we want for each of the subsequent state groups.


For reference, we previously optimized the SQL query in filter_events_for_client -> _get_state_groups_from_groups in #14527 but that was a case where we already had a state_filter to only grab the m.room.history_visibility and m.room.member events.

Dev notes

Grabbing all of the state via _get_state_groups_from_groups is the slowest part of _compute_event_context_with_maybe_missing_prevs

  • _compute_event_context_with_maybe_missing_prevs
    • TODO: ... _get_state_groups_from_groups 🐢

Of all the state we grab in _compute_event_context_with_maybe_missing_prevs, we only ever use TODO

  • _process_pulled_event
    • _compute_event_context_with_maybe_missing_prevs
    • _process_received_pdu
      • _check_event_auth
        • _state_resolution_handler.resolve_events_with_store(...)
          • does full state res
        • -> Looks for auth events in the state (event_auth.auth_types_for_event(event.room_version, event)) 🎈
      • _check_for_soft_fail
        -> Looks for auth events again 🎈
      • _run_push_actions_and_persist_event
        -> Looks for m.room.power_levels 🎈
        • filter_event_for_clients_with_state
          -> EventTypes.RoomHistoryVisibility 🎈

Pull Request Checklist

  • Pull request is based on the develop branch
  • Pull request includes a changelog file. The entry should:
    • Be a short description of your change which makes sense to users. "Fixed a bug that prevented receiving messages from other servers." instead of "Moved X method from EventStore to EventWorkerStore.".
    • Use markdown where necessary, mostly for code blocks.
    • End with either a period (.) or an exclamation mark (!).
    • Start with a capital letter.
    • Feel free to credit yourself, by adding a sentence "Contributed by @github_username." or "Contributed by [Your Name]." to the end of the entry.
  • Pull request includes a sign off
  • Code style is correct
    (run the linters)

@MadLittleMods MadLittleMods added A-Performance Performance, both client-facing and admin-facing T-Enhancement New features, changes in functionality, improvements in performance, or user-facing enhancements. A-Database DB stuff like queries, migrations, new/remove columns, indexes, unexpected entries in the db labels May 17, 2023
@MadLittleMods MadLittleMods added the A-Messages-Endpoint /messages client API endpoint (`RoomMessageListRestServlet`) (which also triggers /backfill) label Jul 5, 2023
@clokep
Copy link
Member

clokep commented Sep 25, 2023

This seems to have been a placeholder.

@clokep clokep closed this Sep 25, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A-Database DB stuff like queries, migrations, new/remove columns, indexes, unexpected entries in the db A-Messages-Endpoint /messages client API endpoint (`RoomMessageListRestServlet`) (which also triggers /backfill) A-Performance Performance, both client-facing and admin-facing T-Enhancement New features, changes in functionality, improvements in performance, or user-facing enhancements.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants