Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

events table uses a 32-bit integer for stream_ordering #8255

Closed
richvdh opened this issue Sep 4, 2020 · 11 comments
Closed

events table uses a 32-bit integer for stream_ordering #8255

richvdh opened this issue Sep 4, 2020 · 11 comments
Labels
S-Critical Blocks development, potential data loss, more than 25% of users possibly affected, no workarounds. T-Task Refactoring, removal, replacement, enabling or disabling functionality, other engineering tasks. z-bug (Deprecated Label)

Comments

@richvdh
Copy link
Member

richvdh commented Sep 4, 2020

2 billion events isn't actually that many; current estimate for exhaustion on matrix.org is Q3 2021, but that's an optimistic estimate as the rate of events is increasing.

nb there are other tables that use a standard int too.

@richvdh
Copy link
Member Author

richvdh commented Jun 25, 2021

ETA for event 2B is now 2021-07-04.

@richvdh richvdh added S-Critical Blocks development, potential data loss, more than 25% of users possibly affected, no workarounds. T-Task Refactoring, removal, replacement, enabling or disabling functionality, other engineering tasks. labels Jun 25, 2021
@richvdh
Copy link
Member Author

richvdh commented Jun 25, 2021

The main difficulty with fixing is doing it in a way that doesn't involve synchronously rewriting the table (or at least the index).

From https://www.postgresql.org/docs/10/sql-altertable.html:

... changing the type of an existing column will require the entire table and its indexes to be rewritten. As an exception when changing the type of an existing column, if ... the old type is either binary coercible to the new type or an unconstrained domain over the new type, a table rewrite is not needed; but any indexes on the affected columns must still be rebuilt.

Ideas for solving this:

    • As a single db migration:
      • create a new table events2 with a bigint stream_ordering. Take the opportunity to drop unused columns such as depth, content, unrecognized_keys, processed, and outlier.
      • rename events to events1
      • create a view called events which queries from both tables.
    • have the code populate events2 instead of events
    • Run a background db process which moves rows from events1 to events2, and eventually drops events1 and the view and renames events2 back to events (and has the code populate events again).

    A big advantage of this approach is that it gets us out of the hole immediately, without having to wait for the background processes to run.

    My main concern is whether the queries that we do against events will be efficient when made against a view.

    Another problem is that it will get us into a position where it will be impossible to roll back Synapse to earlier versions (because we'll be writing to the wrong table). But maybe that doesn't matter too much if we do it as an incremental update on v1.37, with no other changes.

    • Add a (nullable) column stream_ordering2 bigint
    • Start populating stream_ordering2 with the same value as stream_ordering
    • Run a background process to update existing rows with stream_ordering2
    • Run a create unique index concurrently on events(stream_ordering2);
    • Drop the old column, rename the new one, and stop populating stream_ordering2.

    This leaves us with stream_ordering being nullable - changing that requires a table scan even though there is an index which could be used to determine there are no null values :/

@ptman
Copy link
Contributor

ptman commented Jun 27, 2021

I'm in favour of option 1. In addition to dropping unused columns, it allows for reordering columns: https://www.2ndquadrant.com/en/blog/on-rocks-and-sand/

The lack of rollback is indeed an issue. It can be scripted, but then that would have to be tested...

@richvdh
Copy link
Member Author

richvdh commented Jun 28, 2021

Having experimented somewhat with this, I don't think option 1 is going to work. The two different types for stream_ordering mean that we can't efficiently query against that field. For example, the following results in a table-scan of events:

SELECT * FROM 
   (SELECT stream_ordering, event_id FROM events 
    UNION 
    SELECT stream_ordering, event_id FROM events2)
WHERE stream_ordering > 2000000000 LIMIT 1;

Being able to query efficiently for ranges of stream_ordering is kinda the whole reason we have a stream_ordering in the first place, so I think that's a show-stopper.

Rather, I am going to pursue option 2. We can follow up with option 1 later to clean up the table.

@callahad
Copy link
Contributor

Do we also need to worry about last_stream_ordering on the pushers table?

@richvdh
Copy link
Member Author

richvdh commented Jun 29, 2021

Do we also need to worry about last_stream_ordering on the pushers table?

As I wrote over on #10264, yes!

Here's a complete list of regular INTEGER columns in the database:

# needs an urgent fix
 federation_stream_position.stream_id
 pushers.last_stream_ordering

# probably needs to be made a BIGINT, but not this week:
 application_services_state.last_txn
 application_services_state.presence_stream_id
 application_services_state.read_receipt_stream_id
 application_services_txns.txn_id
 room_depth.min_depth

# can safely remain an INTEGER for the foreseeable future
 applied_schema_deltas.version
 background_updates.ordering
 e2e_room_keys.first_message_index
 e2e_room_keys.forwarded_count
 event_json.format_version
 instance_map.instance_id
 local_media_repository.media_length
 local_media_repository_thumbnails.thumbnail_height
 local_media_repository_thumbnails.thumbnail_length
 local_media_repository_thumbnails.thumbnail_width
 local_media_repository_url_cache.response_code
 push_rules.priority
 push_rules_stream.priority
 received_transactions.response_code
 remote_media_cache.media_length
 remote_media_cache_thumbnails.thumbnail_height
 remote_media_cache_thumbnails.thumbnail_length
 remote_media_cache_thumbnails.thumbnail_width
 room_memberships.forgotten
 room_stats_current.banned_members
 room_stats_current.current_state_events
 room_stats_current.invited_members
 room_stats_current.joined_members
 room_stats_current.knocked_members
 room_stats_current.left_members
 room_stats_current.local_users_in_room
 schema_compat_version.compat_version
 schema_version.version

@richvdh
Copy link
Member Author

richvdh commented Jun 30, 2021

These are all fixed in #10286:

# needs an urgent fix
 federation_stream_position.stream_id
 pushers.last_stream_ordering

# probably needs to be made a BIGINT, but not this week:
 application_services_state.last_txn
 application_services_state.presence_stream_id
 application_services_state.read_receipt_stream_id
 application_services_txns.txn_id

And this one is fixed in #10289.

 room_depth.min_depth

@richvdh
Copy link
Member Author

richvdh commented Jul 1, 2021

Somehow I seem to have missed one from the "ought to fix, but not urgent" category:

application_services_txns.txn_id

I've just been back through the list and double-checked: this is the only int4 not listed above. For the record, the incantation I used is:

select relname||'.'||attname from pg_attribute att
   left join pg_type t1 on t1.oid=att.atttypid 
   left join pg_class rel on rel.oid=attrelid 
    LEFT JOIN pg_namespace n ON n.oid = rel.relnamespace
WHERE nspname='public' AND typname='int4' and relkind IN ('r','p')
ORDER BY relname, attname;

@richvdh
Copy link
Member Author

richvdh commented Jul 1, 2021

It was pointed out that I had better include sequences. Indeed, the only thing we want to exclude is indexes (i) since they follow the types of the indexed tables:

select relname||'.'||attname from pg_attribute att
   left join pg_type t1 on t1.oid=att.atttypid 
   left join pg_class rel on rel.oid=attrelid 
    LEFT JOIN pg_namespace n ON n.oid = rel.relnamespace
WHERE nspname='public' AND typname='int4' and relkind <> 'i'
ORDER BY relname, attname;

It doesn't find anything not already covered above.

@richvdh
Copy link
Member Author

richvdh commented Jul 9, 2021

Somehow I seem to have missed one from the "ought to fix, but not urgent" category:

application_services_txns.txn_id

This is handled in #10349.

@richvdh
Copy link
Member Author

richvdh commented Jul 9, 2021

I'm going to close this issue, because the major problem of events.stream_ordering is fixed in Synapse 1.38, and the remaining open PRs (#10349 and #10289) are just cleanup.

@richvdh richvdh closed this as completed Jul 9, 2021
babolivier pushed a commit that referenced this issue Sep 1, 2021
* commit '7513006b0':
  In light of #8255, use BIGINTs for destination_rooms (#8256)
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
S-Critical Blocks development, potential data loss, more than 25% of users possibly affected, no workarounds. T-Task Refactoring, removal, replacement, enabling or disabling functionality, other engineering tasks. z-bug (Deprecated Label)
Projects
None yet
Development

No branches or pull requests

4 participants