Consider changing `/messages` to use stream ordering for events #852

timokoesters · 2021-07-02T13:10:44Z

Today the /messages endpoint documentation does not say anything about the
ordering of the events. /sync hints at one possible ordering method:

Events are ordered in this API according to the arrival time of the event on
the homeserver. This can conflict with other APIs which order events based on
their partial ordering in the event graph.

The current concept of ordering based on event graph is not ideal for the following reasons:

It's not clearly documented

Which event is picked when an event has multiple prev events?
What happens if a prev event is not known to the server
What happens if an event has no prev events, or builds a circle of prev events
or is it's own prev event. This is not checked in the federation api.

You can't notify the client about updates

Clients might cache events from /messages, but this list of events could be
incomplete because we don't know about events of a fork
/sync can't be used to send these updates because it's response will be limited
if there are too many events

Possible solution:

/messages uses the same order as /sync, which is the order at which events
arrive at the server, not sorted at all.

This has the advantage of being a lot simpler and therefore doesn't have many
edge cases.

A problem is that Matrix events often arrive in batches from other servers, so
conversations might become unreadable because you first see all messages from
one user and then all messages from the other user.

This can be improved by adding more logic in the client (for example sort the
events by timestamp) and by improving the /messages api (maybe it can send prev
events to the client to improve ordering or call /context on them).

timokoesters · 2021-07-02T13:12:18Z

This can be very problematic with MSC2716 matrix-org/matrix-spec-proposals#2716, because it relies heavily on the ordering of events.

richvdh · 2021-07-05T10:26:47Z

I think this is basically a duplicate of https://github.com/matrix-org/matrix-doc/issues/574.

timokoesters · 2021-07-15T17:32:14Z

@richvdh The issue you posted is just about clarifying the current behavior. In this issue I also try to argue why the current behavior should be changed.

richvdh · 2021-09-20T17:58:12Z

related: matrix-org/matrix-spec-proposals#474

ShadowJonathan · 2021-09-28T09:49:23Z

As said in #spec, I think a proper solution would be an extra parameter on every endpoint that asserts ordering of events, to either receive a "canonical" timeline (as how it'd be ordered in a utopian world where everything is up 100% and there's no latency), or an "as-received" timeline (as how events are received/fetched by the server).

Sync could be altered to have timeline events, which'd technically fall "outside of view", arrive in a different key, to then have the client decide how to present these to the user (possibly with a small pop-up on the side saying "hey, there's delayed events in a conversation you participated in" or some kind)

ShadowJonathan · 2022-04-11T20:33:50Z

After talking about this issue in the P2P Matrix room, I think a server can note (at least) two things to a client;

If the timestamp on the event is older than the "last" (DAG-fork-wise) event from that node
- This allows detecting jumping clocks / timestamp forgery
The timestamp at which it received/fetched the event over federation (if the event is coming down sync)
- This could allow clients to push the event to the room timeline anyways, hide it, defer it in a side-tab, or notify the user in some other way

dkasak · 2022-05-28T11:44:14Z

/messages uses the same order as /sync, which is the order at which events
arrive at the server, not sorted at all.

This can't work for backfill (history before your HS joined the room), since these events were never received by your HS in the first place. Meaning you have to ask another HS for the events, but the ordering would be different depending on which HS you backfilled from.

and by improving the /messages api (maybe it can send prev events to the client to improve ordering or call /context on them).

You can already entice a HS to send you the event in federation format using filters, via the event_format parameter. This naturally includes all prev_events.

This also partially solves this problem, at least with well-behaved homeservers:

Clients might cache events from /messages, but this list of events could be
incomplete because we don't know about events of a fork

Because you'll eventually receive an event whose prev_events lists events from all the DAG forks.

This problem mostly boils down to the fact that /sync, through which you're supposed to be notified about the new stuff, can be limited (gappy). At this point you're advised to use /messages to retrieve the stuff that ended up in the gap. But since /messages uses a different ordering, you'd basically have to traverse the whole room history so that you'd be sure you've seen it all. I've called this the Gap Problem in the past.

I think the solution would require to be able to paginate through the /sync-style ordering (stream ordering) and not just the topological ordering, so that when a gappy sync happens, you'd be able to retrieve everything that happened in the gap reliably.

MadLittleMods · 2023-04-14T20:14:41Z

This problem mostly boils down to the fact that /sync, through which you're supposed to be notified about the new stuff, can be limited (gappy). At this point you're advised to use /messages to retrieve the stuff that ended up in the gap. But since /messages uses a different ordering, you'd basically have to traverse the whole room history so that you'd be sure you've seen it all. I've called this the Gap Problem in the past.

/messages can also be gappy. Related MSC3871: Gappy timelines

dkasak · 2023-07-12T10:49:08Z

For cross-linking purposes (because I keep having to search for it), this comment outlines the problem of using separate orderings between the different APIs well, as well as explaining how it prevents efficient collection of the entire room timeline client-side.

andybalaam · 2023-07-12T10:50:58Z

Related: matrix-org/matrix-spec-proposals#4033 which argues that we need to be explicit about message order for receipts to make sense.

timokoesters · 2024-01-29T09:47:18Z

This can't work for backfill

Why not? Backfilled events are permanently stored, so the order will be consistent from the perspective of the client (it will not be consistent with other homeservers, but that is not important)

/messages can also be gappy

Conduit solves this by always using /sync ordering, so there are no gaps.

matrix-org/matrix-spec-proposals#4033

On first glance, this looks like a good solution, but it's unfortunate that we have to add yet another field to events. I will look at it in more detail later.

dkasak · 2024-01-29T12:37:16Z

Why not? Backfilled events are permanently stored, so the order will be consistent from the perspective of the client (it will not be consistent with other homeservers, but that is not important)

I'm actually not sure what it is exactly that you were proposing. Was it that issuing a backfill would append all newly retrieved events to the given room's sync ordering?

If that's the case, then the sync ordering will be vastly different from a reasonable ordering that should be shown by the client, which I think goes contrary to how most clients work today (am I wrong?). I think that what you are proposing is to more heavily rely on client-side sorting of the sync ordering using event timestamps.

Also, I wouldn't a priori dismiss having an (eventually) consistent view across homeservers as unimportant.

timokoesters · 2024-02-26T09:52:16Z

Was it that issuing a backfill would append all newly retrieved events to the given room's sync ordering?

Sorry I see why I was unclear. Backfilled events with Conduit will never arrive over sync and can only be seen if you call /messages far enough back. So the timeline is actually a linear list of events that can be expanded at both ends.

richvdh · 2024-02-26T11:03:31Z

I'm still not entirely following the proposal, so let's try to clarify it with an example.

Suppose your server has a DAG that looks like this, with assigned stream orderings in parentheses:

A(1) <- B(2) <- C(3)

... and an event F arrives over federation, which references a prev_event E (that you don't have, but maybe pull in via POST /_matrix/federation/v1/get_missing_events/{roomId} before you serve F to the client). The DAG segments that you have now looks like this:

A(1) <- B(2) <- C(3)

E(4) <- F(5)

Now, E's prev_event happens to be D. Let's assume that something happens to make your server backfill it via GET /_matrix/federation/v1/backfill/{roomId}. And let's assume that there is another event, D between the two which gets backfilled on a second request.

Synapse assigns backfilled events a negative stream ordering, and I don't think you're proposing to change that, so our room now looks like this:

A(1) <- B(2) <- C(3) <- D(-2) <- E(4) <- F(5)

So, I think your proposal is that /messages should return these events in the order D, A, B, C, E, F?

... And, extending the example, let's suppose that a later event H gets backfilled later. Now the order is: H, D, A, B, C, E, F? And, in general, more-recently-backfilled events are always prepended to that list, regardless of the actual causality implied by the DAG?

timokoesters · 2024-02-26T11:29:21Z

A(1) <- B(2) <- C(3)

Okay.

and an event F arrives over federation, which references a prev_event E
Now, E's prev_event happens to be D

Conduit will fetch as many prev_events as possible before accepting F. That means, it will recursively fetch all prev_events to get D, E and F, sort them with topological ordering and then insert them in the correct order, this gives

A(1) <- B(2) <- C(3) <- D(4) <- E(5) <- F(6)

Conduit only uses /backfill to load events from before our server joined the room, that means if a user scrolls up to event A and the asks /messages, Conduit will backfill events Z, Y, X and insert them like this:

X(-3) <- Y(-2) <- Z(-1) <- A(1) <- B(2) <- C(3) <- D(4) <- E(5) <- F(6)

Does that make things more clear?

richvdh · 2024-02-26T11:42:53Z

Conduit will fetch as many prev_events as possible before accepting F.

Out of interest, how does it fetch them? With /get_missing_events ?

That means, it will recursively fetch all prev_events to get D, E and F, sort them with topological ordering and then insert them in the correct order, this gives

A(1) <- B(2) <- C(3) <- D(4) <- E(5) <- F(6)

Conduit only uses /backfill to load events from before our server joined the room, that means if a user scrolls up to event A and the asks /messages, Conduit will backfill events Z, Y, X and insert them like this:

X(-3) <- Y(-2) <- Z(-1) <- A(1) <- B(2) <- C(3) <- D(4) <- E(5) <- F(6)

Does that make things more clear?

kinda, but it raises more questions than it answers. What happens if D is unavailable (eg, you can't reach any servers that will provide it) at the point you receive F? Or what happens if the gap between C and F is so large that it takes hours to fill it? Do you just keep going forever?

timokoesters · 2024-02-26T11:54:48Z

Good questions!

Out of interest, how does it fetch them?

It's just /event for each unknown event I think.

What happens if D is unavailable

These cases are rare, because if a server uses D as a prev_events, they can usually provide D. In the case they don't, this will lead to missing events. If D is referenced again in the future, we will try loading it again.

so large that it takes hours to fill it

There is a configurable limit of how many prev_events should be loaded per incoming event, the default is ~100 which works well in normal cases.
This can become a problem if your server was down for a while and you will receive out-of-order messages for a bit until things stabilize again. I think this can be improved by detecting this kind of downtime and using a different algorithm to fetch prev events. Alternatively, clients can hide these cases by collapsing out of order events as "100 messages from 3 days ago"

richvdh · 2024-02-26T12:12:58Z

These cases are rare, because if a server uses D as a prev_events, they can usually provide D

Empirically, they're not as rare as you might imagine, particularly in the case of transitive relationships (eg, you are receiving event F from a server, which can provide the prev_event, but not the prev_event's prev_event).

In the case they don't, this will lead to missing events. If D is referenced again in the future, we will try loading it again.

Ok, and what stream ordering is it assigned if it is successfully loaded on the second attempt? One that puts it at the start of the timeline, right?

so large that it takes hours to fill it

There is a configurable limit of how many prev_events should be loaded per incoming event, the default is ~100 which works well in normal cases.

Ok, but this is recursive, right? You need to fetch the prev_events of the prev_events? There must be a limit to the recursion as well?

In any case, the point here is: you can end up with gaps in your DAG, which could get filled later. The relevance to this discussion is: how do you fit such events into your linear timeline?

timokoesters · 2024-02-26T12:28:10Z

which can provide the prev_event, but not the prev_event's prev_event

In practice this seems to work well enough. To increase our chances, we could also ask some other servers that were active in that part of the timeline.

Ok, and what stream ordering is it assigned if it is successfully loaded on the second attempt? One that puts it at the start of the timeline, right?

What is the "start" for you? It will insert it as one of the most recent events:
If D could not be loaded the first time, but was also referred to by an event G, then that would result in

A(1) <- B(2) <- C(3) <- E(5) <- F(6) <- D(7) <-G(8)

Ok, but this is recursive, right?

Conduit does a breath-first search and takes the first 100 events.

how do you fit such events into your linear timeline?

In general, incoming events are appended at the bottom of the timeline, so that it arrives over /sync. When there are missing prev_events, we will handle them first before handling the incoming event.

richvdh · 2024-02-26T12:42:41Z

What is the "start" for you? It will insert it as one of the most recent events: If D could not be loaded the first time, but was also referred to by an event G, then that would result in

A(1) <- B(2) <- C(3) <- E(5) <- F(6) <- D(7) <-G(8)

Sidebar: this is not an accurate representation of the DAG. The actual DAG looks like this:

A(1) <- B(2) <- C(3) <- D(7) <- E(5) <- F(6) 
                          `-------------------------G(8)

Anyway, this seems like a difference between Synapse and Conduit. With Synapse, each time a client makes a /messages request that would touch the "gap" in the dag before D, Synapse will attempt to fill it, whereas Conduit will never attempt to fill that gap once it exists (unless an event like G appears)?

... which presumably means that, if your server is offline for more than 100 events, there will be missing events that can never be filled?

[Also: if you end up pulling in state events, which might be very old, in order to handle incoming events in the timeline -- for example, because they are referenced as auth_events -- are they added to the bottom of the timeline?]

timokoesters · 2024-02-26T12:58:20Z

if your server is offline for more than 100 events

It's not quite this bad, because when other servers send events to you, they start with the oldest events, so the gap is not quite that large.

And in theory, the origin servers of events should send you every event before they send newer ones. So that alone should make all events arrive. However, I know that Synapse breaks this promise if a server is offline for too long.

richvdh · 2024-02-26T13:05:59Z

It's not quite this bad, because when other servers send events to you, they start with the oldest events

Er, I don't think that's correct.

And in theory, the origin servers of events should send you every event before they send newer ones. So that alone should make all events arrive. However, I know that Synapse breaks this promise if a server is offline for too long.

I don't think the spec says that's a thing that is required, and the decision not to send older events is very deliberate in both Synapse and the spec. It's generally considered that new events are more important than those you may have missed from days or months ago.

timokoesters · 2024-02-26T13:11:53Z

That's unfortunate, because it would make my idea of ordering much more stable. So what are your thoughts on the approach? Do you think there is a way to make it happen?

richvdh · 2024-02-26T13:29:22Z

I don't know to be honest. The problem is that we are currently trying to satisfy two conflicting requirements:

When events arrive, they should be shown at the bottom of the timeline so that users see them
When viewing historical messages, events should be shown an order that best reflects causality, so that messages are shown in context and the conversation is easier to follow.

(and we currently address that conflict, badly, by using different orderings for /sync and /messages and allowing /messages to be inconsistent.)

If I were forced to give up one of those two requirements, I think it would be the first. (If a message was sent 6 months ago, it probably shouldn't pop up in the user's timeline as if it had been sent in the middle of today's conversation.) Indeed I think there was some experimental work about having Element show up such "late" messages in a separate panel or something. Not sure what happened to that.

Anyway, the point is: I think if I were changing things, then rather than trying to make /messages more like /sync, I would make /sync more like /messages. So... I guess I'm pretty dubious?

timokoesters · 2024-02-26T14:05:59Z

The problem with (2) is that there is no way of telling clients about those historical events, so it only works if they regularly clear their caches and load it from the server again.

We should ask for more opinions.

dkasak · 2024-02-26T16:21:46Z

@andybalaam should be interested in this since I discussed this with him a couple of times and I'll ping him to voice his opinion instead of me repeating it badly.

My gut feeling is to side with Rich and aim for a DAG-based ordering, but there are non-trivial problems with this approach.

andybalaam · 2024-02-27T10:45:21Z

My opinions are biased by my work on receipts, which crucially depend on message ordering, but don't care about historical events. (Where by historical events I mean ones that have a negative stream order in Synapse.)

My strongest opinion is that we need a way to page through events in an order that is sensible for clients.

I believe that the best order for clients is "stream" ordering, except in the case of historical events. My reasoning: the user sees the events in the stream order. Even when they are late-arriving, I still think that it is sensible to display them to the user in the order they arrived for that user. Anything else is deeply confusing.

It is very unfortunate, but true, that we need to agree on an ordering that all clients are happy with, and provide that ordering on the server. The reason for this is that if the clients present events in a different order, receipts work extremely counter-intuitively. If receipts did not use a before/after relationship (e.g. there was one receipt per message) then we would have the freedom for different clients to use different orderings, and we could ask the server to provide enough information to allow this. But our current design for receipts constrains us to pick one true ordering.

As for historical events, that is very sad. It makes me want to design some kind of hybrid ordering that says "stream order except where the event graph contradicts it". This would mean that late-arriving messages because of a netsplit would appear at the bottom of the timeline (since no event graph relationship relates any events across the split) but historical messages would be placed at the beginning (since the latest event is listed as a prevevent of an old event). But this is difficult or impossible because to do it consistently you would need all events, and sorting events would need knowledge of all events - there would be no way to compare two individual events without referring to all the others just in case there is an indirect event graph relationship.

…remental sync (#17510) Use `stream_ordering` based `timeline` pagination for incremental `/sync` in Sliding Sync. Previously, we were always using a `topological_ordering` but we should only be using that for historical scenarios (initial `/sync`, newly joined, or haven't sent the room down the connection before). This is slightly different than what the [spec suggests](https://spec.matrix.org/v1.10/client-server-api/#syncing) > Events are ordered in this API according to the arrival time of the event on the homeserver. This can conflict with other APIs which order events based on their partial ordering in the event graph. This can result in duplicate events being received (once per distinct API called). Clients SHOULD de-duplicate events based on the event ID when this happens. But we've had a [discussion below in this PR](#17510 (comment)) and this matches what Sync v2 already does and seems like it makes sense. Created a spec issue matrix-org/matrix-spec#1917 to clarify this. Related issues: - matrix-org/matrix-spec#1917 - matrix-org/matrix-spec#852 - matrix-org/matrix-spec-proposals#4033

timokoesters added the clarification An area where the expected behaviour is understood, but the spec could do with being more explicit label Jul 2, 2021

richvdh closed this as completed Jul 5, 2021

richvdh reopened this Jul 16, 2021

richvdh added feature Suggestion for a significant extension which needs considerable consideration and removed clarification An area where the expected behaviour is understood, but the spec could do with being more explicit labels Jul 16, 2021

timokoesters mentioned this issue Jul 16, 2021

MSC2716: Incrementally importing history into existing rooms matrix-org/matrix-spec-proposals#2716

Open

richvdh mentioned this issue Aug 23, 2021

Clarifications to pagination parameters for various APIs matrix-org/matrix-spec-proposals#3353

Merged

richvdh mentioned this issue Sep 20, 2021

We need to figure out how to linearize history for client consumption. (SPEC-65) matrix-org/matrix-spec-proposals#474

Closed

MadLittleMods mentioned this issue Oct 18, 2021

Fix historical messages backfilling in random order on remote homeservers (MSC2716) matrix-org/synapse#11114

Merged

11 tasks

ShadowJonathan mentioned this issue Feb 1, 2022

Use SledStore to store full timeline matrix-org/matrix-rust-sdk#288

Closed

ShadowJonathan mentioned this issue Feb 17, 2022

Backfill causes wrong membership state to be sent to clients matrix-org/synapse#11874

Closed

richvdh transferred this issue from matrix-org/matrix-spec-proposals Mar 2, 2022

DMRobertson mentioned this issue Nov 14, 2022

CS API refers to undefined terms stream ordering and topological ordering #1334

Open

t3chguy mentioned this issue Apr 27, 2023

Several places in the code use timestamp to determine event order matrix-org/matrix-js-sdk#3325

Open

richvdh changed the title ~~Event ordering~~ Consider changing /messages to use stream ordering for events Apr 27, 2023

robintown mentioned this issue May 12, 2023

How should clients order a timeline composed of events from multiple rooms? #1520

Open

clokep mentioned this issue Oct 11, 2023

/sync doesn't inform clients when there is a gap in the timeline matrix-org/synapse#16463

Closed

jplatte mentioned this issue Dec 6, 2023

Tracking issue for timeline API matrix-org/matrix-rust-sdk#1103

Closed

29 tasks

This was referenced Jul 31, 2024

Sliding Sync: Use stream_ordering based timeline pagination for incremental sync element-hq/synapse#17510

Merged

Clarify order of initial vs incremental /sync #1917

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Consider changing `/messages` to use stream ordering for events #852

Consider changing `/messages` to use stream ordering for events #852

timokoesters commented Jul 2, 2021

timokoesters commented Jul 2, 2021

richvdh commented Jul 5, 2021

timokoesters commented Jul 15, 2021

richvdh commented Sep 20, 2021

ShadowJonathan commented Sep 28, 2021 •

edited

Loading

ShadowJonathan commented Apr 11, 2022 •

edited

Loading

dkasak commented May 28, 2022

MadLittleMods commented Apr 14, 2023

dkasak commented Jul 12, 2023

andybalaam commented Jul 12, 2023

timokoesters commented Jan 29, 2024

dkasak commented Jan 29, 2024

timokoesters commented Feb 26, 2024

richvdh commented Feb 26, 2024

timokoesters commented Feb 26, 2024

richvdh commented Feb 26, 2024

timokoesters commented Feb 26, 2024

richvdh commented Feb 26, 2024

timokoesters commented Feb 26, 2024

richvdh commented Feb 26, 2024

timokoesters commented Feb 26, 2024

richvdh commented Feb 26, 2024

timokoesters commented Feb 26, 2024

richvdh commented Feb 26, 2024

timokoesters commented Feb 26, 2024

dkasak commented Feb 26, 2024 •

edited

Loading

andybalaam commented Feb 27, 2024

Consider changing /messages to use stream ordering for events #852

Consider changing /messages to use stream ordering for events #852

Comments

timokoesters commented Jul 2, 2021

timokoesters commented Jul 2, 2021

richvdh commented Jul 5, 2021

timokoesters commented Jul 15, 2021

richvdh commented Sep 20, 2021

ShadowJonathan commented Sep 28, 2021 • edited Loading

ShadowJonathan commented Apr 11, 2022 • edited Loading

dkasak commented May 28, 2022

MadLittleMods commented Apr 14, 2023

dkasak commented Jul 12, 2023

andybalaam commented Jul 12, 2023

timokoesters commented Jan 29, 2024

dkasak commented Jan 29, 2024

timokoesters commented Feb 26, 2024

richvdh commented Feb 26, 2024

timokoesters commented Feb 26, 2024

richvdh commented Feb 26, 2024

timokoesters commented Feb 26, 2024

richvdh commented Feb 26, 2024

timokoesters commented Feb 26, 2024

richvdh commented Feb 26, 2024

timokoesters commented Feb 26, 2024

richvdh commented Feb 26, 2024

timokoesters commented Feb 26, 2024

richvdh commented Feb 26, 2024

timokoesters commented Feb 26, 2024

dkasak commented Feb 26, 2024 • edited Loading

andybalaam commented Feb 27, 2024

Consider changing `/messages` to use stream ordering for events #852

Consider changing `/messages` to use stream ordering for events #852

ShadowJonathan commented Sep 28, 2021 •

edited

Loading

ShadowJonathan commented Apr 11, 2022 •

edited

Loading

dkasak commented Feb 26, 2024 •

edited

Loading