Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Emphasize the right reasons to use (room_id, event_id) in a schema #13915

Merged
merged 5 commits into from
Sep 27, 2022
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions changelog.d/13915.doc
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
Emphasize the right reasons when to use `(room_id, event_id)` in a database schema.
31 changes: 16 additions & 15 deletions docs/development/database_schema.md
Original file line number Diff line number Diff line change
Expand Up @@ -195,23 +195,24 @@ There are three separate aspects to this:

## `event_id` global uniqueness

In room versions `1` and `2` it's possible to end up with two events with the
same `event_id` (in the same or different rooms). After room version `3`, that
can only happen with a hash collision, which we basically hope will never
happen.

There are several places in Synapse and even Matrix APIs like [`GET
`event_id`'s can be considered globally unique although there has been a lot of
debate on this topic in places like
https://github.com/matrix-org/matrix-spec-proposals/issues/2779 and
richvdh marked this conversation as resolved.
Show resolved Hide resolved
[MSC2848](https://github.com/matrix-org/matrix-spec-proposals/pull/2848) which
has no resolution yet (as of 2022-09-01). There are several places in Synapse
and even in the Matrix APIs like [`GET
/_matrix/federation/v1/event/{eventId}`](https://spec.matrix.org/v1.1/server-server-api/#get_matrixfederationv1eventeventid)
where we assume that event IDs are globally unique.

But hash collisions are still possible, and by treating event IDs as room
scoped, we can reduce the possibility of a hash collision. When scoping
`event_id` in the database schema, it should be also accompanied by `room_id`
(`PRIMARY KEY (room_id, event_id)`) and lookups should be done through the pair
`(room_id, event_id)`.
When scoping `event_id` in a database schema, it is often nice to accompany it
with `room_id` (`PRIMARY KEY (room_id, event_id)` and a `FOREIGN KEY(room_id)
REFERENCES rooms(room_id)`) which makes flexible lookups easy. For example it
makes it very easy to find and clean up everything in a room when it needs to be
purged (no need to use sub-`select` query or join from the `events` table).

A note on collisions: In room versions `1` and `2` it's possible to end up with
two events with the same `event_id` (in the same or different rooms). After room
version `3`, that can only happen with a hash collision, which we basically hope
will never happen (SHA256 has a massive big key space).

There has been a lot of debate on this in places like
https://github.com/matrix-org/matrix-spec-proposals/issues/2779 and
[MSC2848](https://github.com/matrix-org/matrix-spec-proposals/pull/2848) which
has no resolution yet (as of 2022-09-01).