MSC2946: Spaces Summary #2946

Merged
merged 77 commits into from
Oct 31, 2021
Changes from 13 commits
059f324
Spaces Summary
kegsay Jan 7, 2021
14eb56c
MSC2946
kegsay Jan 7, 2021
aea5336
Clarity
kegsay Jan 7, 2021
7618ed6
More clarity
kegsay Jan 7, 2021
6e051d5
Clarify what no room data means for clients
kegsay Jan 7, 2021
619c100
Federation API
kegsay Jan 13, 2021
104cf78
Update 2946-spaces-summary.md
kegsay Jan 13, 2021
8f6fb9d
auto_join filter
kegsay Jan 13, 2021
200147c
Blurb on auth for fed api
kegsay Jan 13, 2021
f2457cf
Update to reflect MSC1772 changes
kegsay Jan 15, 2021
1430661
Mention auth chain on federation api
kegsay Jan 15, 2021
09b0848
Add 'version' field
kegsay Jan 18, 2021
7b2f3dc
Stripped state; remove room versions
kegsay Jan 19, 2021
6224859
Update 2946-spaces-summary.md
kegsay Feb 25, 2021
211e5e6
Update proposals/2946-spaces-summary.md
kegsay Mar 15, 2021
725277e
Replace with link to draft doc.
richvdh Mar 23, 2021
0fd8d8d
Add a preamble and copy the current draft API.
clokep Apr 13, 2021
dee9040
Switch to using stable identifiers (and add an unstable identifiers s…
clokep Apr 13, 2021
a3b62a8
Updates / clarifications.
clokep Apr 13, 2021
f28ad9b
Fix typo.
clokep Apr 14, 2021
d911c82
Clean-ups.
clokep Apr 14, 2021
74f12d5
Update proposals/2946-spaces-summary.md
ara4n Apr 30, 2021
8b1fe00
Drop unstable identifiers from MSC1772.
clokep May 3, 2021
8fdbfb1
Various updates and clarifications.
clokep May 4, 2021
f145fa3
Include the origin_server_ts in the response, as needed by MSC1772.
clokep May 5, 2021
f9c00a5
Rename a parameter for clarity.
clokep May 5, 2021
9c2e85a
Fix typo.
clokep May 5, 2021
760cda8
Various clarifications based on feedback.
clokep May 5, 2021
a5ad9a4
Add auth / rate-limiting info.
clokep May 6, 2021
4c10e02
Combine some double spaces.
clokep May 6, 2021
ad5af4d
Use only GET endpoints.
clokep May 6, 2021
dba41f9
Add notes about DoS potential.
clokep May 6, 2021
8a968eb
Tweaks from review.
clokep May 6, 2021
b379c42
Add context about why stripped events are returned.
clokep May 6, 2021
27f526c
Remove some implementation details.
clokep May 6, 2021
c142433
Add notes on ordering.
clokep May 10, 2021
af8c7b0
Remove unnecessary data.
clokep May 10, 2021
bcde9e0
Clarify the server-server API.
clokep May 10, 2021
328ae81
More clarifications.
clokep May 10, 2021
518db51
Remove obsolete note.
clokep May 11, 2021
3b0051f
Some clarifications to what accessible means.
clokep May 19, 2021
5cd8270
Update notes about sorting to include the origin_server_ts of the m.s…
clokep May 19, 2021
797dda4
Only consider `m.space` rooms and do not return links to nowhere.
clokep Jun 11, 2021
105fd93
Updates based on MSC3173 merging and updates to MSC3083.
clokep Jun 25, 2021
a7a08eb
Updates per MSC2403.
clokep Jul 2, 2021
094de30
Remove field which is not part of the C-S API.
clokep Jul 27, 2021
c0a63ab
Rewrite the proposal.
clokep Jul 28, 2021
3d7769f
Handle todo comments.
clokep Jul 28, 2021
5627721
Update URLs.
clokep Jul 29, 2021
7d0c8f6
Rename field.
clokep Jul 29, 2021
420d698
Updates based on implementation.
clokep Aug 9, 2021
5cd0db4
Clarify the state which is persisted.
clokep Aug 10, 2021
14bdc42
Expand notes about errors.
clokep Aug 10, 2021
00d3d67
Update MSC with pagination parameter.
clokep Aug 11, 2021
e39eac3
Fix wrong endpoint.
clokep Aug 13, 2021
6823998
Clarifications based on implementation.
clokep Aug 27, 2021
5a5a404
Remove empty section.
clokep Sep 7, 2021
0174348
Fix typo.
clokep Sep 10, 2021
545ff90
Rename field in example.
clokep Sep 21, 2021
42ba46e
Clarify error code.
clokep Sep 21, 2021
dee3b2d
Clarify ordering changes.
clokep Sep 21, 2021
a9803c3
Clarify wording.
clokep Sep 28, 2021
e9592ff
Fix typos.
clokep Sep 29, 2021
8978562
Clarify that rooms do not belong to servers.
clokep Sep 29, 2021
622f8ed
Fix example to use correct URL.
clokep Sep 29, 2021
6ad9ead
Clarify using local vs. remote data.
clokep Sep 29, 2021
482597e
Clarify bits aboud stripped state.
clokep Sep 29, 2021
41bfaa5
Clarify access control of federation responses.
clokep Sep 29, 2021
f6f41b1
Clarify error code.
clokep Sep 29, 2021
828c076
Be less prescriptive about expiring data.
clokep Oct 5, 2021
7fd45e5
Limit must be non-zero.
clokep Oct 6, 2021
902a124
Rate limiting.
clokep Oct 6, 2021
bf5f84d
Add a note about room upgrades.
clokep Oct 6, 2021
b43333a
Update stable URLs per MSC2844.
clokep Oct 12, 2021
c74adf7
Clarify federation return values.
clokep Oct 12, 2021
fe5a9a7
Clarify `origin_server_ts`.
clokep Oct 26, 2021
a9f6cf6
Tweak wording around `inaccessible_children`.
clokep Oct 26, 2021
259 changes: 259 additions & 0 deletions proposals/2946-spaces-summary.md
@@ -0,0 +1,259 @@
## Spaces Summary API

*This MSC depends on [MSC1772](https://github.com/matrix-org/matrix-doc/pull/1772).*

Spaces are rooms with `m.space` as the [room type](https://github.com/matrix-org/matrix-doc/pull/1840).
Spaces can include state events to specify parent/child relationships.
These relationships point to other rooms, which may themselves be spaces.
This means spaces can have subspaces and rooms. This creates a graph: a space directory.

This MSC defines a new endpoint which can be used to reveal information about the space directory.

Consider the graph:
```
  A
  ^
  |___
  |   |
  V   V
  B   R1
  ^
  |
  V
  R2

R1,R2 = rooms
A,B = spaces
<--> = parent/child relationship events
```
This MSC aims to create a way for clients to produce a tree view along the lines of:
```
Space A
  |
  |___ Room 1
  |
  Space B
    |
    |___ Room 2
```
Clients are able to do this currently by peeking into all of these rooms
(assuming they have permission to) but this is costly and slow.

### Client API

```
POST /_matrix/client/r0/rooms/{roomID}/spaces
{
  "max_rooms_per_space": 5, // The maximum number of rooms/subspaces to return for a given space, if negative unbounded. default: -1.
  "auto_join_only": true, // If true, only return m.space.child events with auto_join:true, default: false, which returns all events.
  "limit": 100, // The maximum number of rooms/subspaces to return, server can override this, default: 100.
  "batch": "opaque_string" // A token to use if this is a subsequent HTTP hit, default: "".
}
```

which returns:

```
{
  "next_batch": "opaque string",
  "rooms": [
    {
      "aliases": [
        "#murrays:cheese.bar"
      ],
      "avatar_url": "mxc://bleeker.street/CHEDDARandBRIE",
      "guest_can_join": false,
      "name": "CHEESE",
      "num_joined_members": 37,
      "room_id": "!ol19s:bleecker.street",
      "topic": "Tasty tasty cheese",
      "world_readable": true,

      "num_refs": 42,
      "room_type": "m.space"
    },
    { ... }
  ],
  "events": [
    {
      "type": "m.space.child",
      "state_key": "!efgh:example.com",
      "content": {
        "via": ["example.com"],
        "present": true,
        "order": "abcd",
        "auto_join": true
      },
      "room_id": "!ol19s:bleecker.street",
      "sender": "@alice:bleecker.street"
    },
    {
      "type": "m.space.parent",
      "state_key": "!space:example.com",
      "content": {
        "via": ["example.com"]
      },
      "room_id": "!ol19s:bleecker.street",
      "sender": "@alice:bleecker.street"
    }
  ]
}
```

Justifications for the request API shape are as follows:
- The HTTP path: Spaces are scoped to a specific room to act as an anchor point for
navigating the directory. Alternatives are `/r0/spaces` with `room_id` inside the
body, but this feels less idiomatic for room-scoped requests.
- The HTTP method: there's a lot of data to provide to the server, and GET requests
shouldn't have an HTTP body, hence opting for POST. The same request can produce
different results over time so PUT isn't acceptable as an alternative.
- `max_rooms_per_space`: UIs can only display a set number of rooms per space, so allowing
clients to specify this limit is desirable. Subsequent rooms can be obtained by paginating.
The graph has 2 distinct types of nodes, and some UIs may want to weight one type above
the other. However, it's impossible to always know what type of node a given room ID falls
under because the server may not be joined to that room (to determine the room type) or the
caller may not have permission to see this information.
- `limit`: The maximum number of events to return in `events`. It is desirable for clients
and servers to be able to put a maximum cap on the amount of data returned to the client.
**This limit may be exceeded if the root room has `> limit` rooms.**
- `auto_join_only`: If `true`, only a subset of the graph is returned based on the presence
of `auto_join: true` in the `content` field of `m.space.child`. Some clients may only
care about the "main" or "default" rooms, which are rooms with this flag set. This does
not affect parent state events: they are still returned. This does not modify the value
of `num_refs`.
- `batch`: Required for pagination. Could be a query parameter but it's easier if
request data is in one place.
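
As a rough illustration of how a client might drive `batch`/`next_batch` pagination, the sketch below keeps requesting pages until the response carries no `next_batch`. The `fetch_page` callable is hypothetical: it stands in for whatever HTTP client issues the `POST` and returns the decoded JSON response body.

```python
from typing import Any, Callable, Dict, List, Tuple

Page = Dict[str, Any]


def fetch_all(
    fetch_page: Callable[[Dict[str, Any]], Page],
    limit: int = 100,
    max_rooms_per_space: int = -1,
) -> Tuple[List[dict], List[dict]]:
    """Collect every room and event by following next_batch tokens.

    fetch_page is a hypothetical stand-in for the HTTP POST: it takes the
    request body and returns the decoded response body.
    """
    rooms: List[dict] = []
    events: List[dict] = []
    body: Dict[str, Any] = {
        "limit": limit,
        "max_rooms_per_space": max_rooms_per_space,
    }
    while True:
        page = fetch_page(body)
        rooms.extend(page.get("rooms", []))
        events.extend(page.get("events", []))
        token = page.get("next_batch")
        if not token:
            # No next_batch: the server has nothing more to return.
            return rooms, events
        # Echo the opaque token back as "batch" on the next request.
        body = {**body, "batch": token}
```

A real client would more likely fetch further pages lazily (for example when the user expands a space) rather than draining the whole directory up front.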

Justifications for the response API shape are as follows:
- `rooms`: These are the nodes of the graph. The objects in the array are exactly the same as `PublicRoomsChunk` in the
[specification](https://matrix.org/docs/spec/client_server/r0.6.0#post-matrix-client-r0-publicrooms)
as the information displayed to users is the same. There are two _additional_ keys
which are:
* `num_refs` which is the total number of state events which point to or from this room (inbound/outbound edges).
This includes all `m.space.child` events in the room, _in addition to_ `m.space.parent` events which point to
this room as a parent.
* `room_type` which is the room type, which is `m.space` for subspaces. It can be omitted if there is no room type
in which case it should be interpreted as a normal room.
- `events`: These are the edges of the graph. The objects in the array are stripped `m.space.parent`
or `m.space.child` events. This means that they only contain the `type`, `state_key`, `content`, `room_id` and `sender`
keys, similar to `invite_state` in the `/sync` API.
- `next_batch`: Its presence indicates that there are more results to return.
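
To make the "stripped" event shape concrete, here is a minimal sketch of reducing a full event to the five keys listed above. The function and constant names are illustrative only and are not part of the proposal.

```python
STRIPPED_KEYS = ("type", "state_key", "content", "room_id", "sender")
SPACE_EVENT_TYPES = ("m.space.child", "m.space.parent")


def strip_space_event(event: dict) -> dict:
    """Return only the keys that a stripped space event carries."""
    if event.get("type") not in SPACE_EVENT_TYPES:
        raise ValueError("not an m.space.child or m.space.parent event")
    # Anything else (event_id, origin_server_ts, prev_events, ...) is dropped.
    return {key: event[key] for key in STRIPPED_KEYS if key in event}
```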

Server behaviour:
- Extract the room ID from the request. Sanity check request data. Begin walking the graph
starting with the room ID in the request in a queue of unvisited rooms according to the
following rules:
* If this room has already been processed, skip. NB: do not remember this between calls,
as servers will need to visit the same room more than once to return additional events.
* Mark this room as processed.
* Is the caller currently joined to the room or is the room `world_readable`?
If no, skip this room. If yes, continue.
* If this room has not ever been in `rooms` (across multiple requests), extract the
`PublicRoomsChunk` for this room.
* Get all `m.space.child` and `m.space.parent` state events for the room. *In addition*, get
all `m.space.child` and `m.space.parent` state events which *point to* (via `state_key`)
this room. This requires servers to store reverse lookups. Add the total number of events
to `PublicRoomsChunk` under `num_refs`. Add `PublicRoomsChunk` to `rooms`.
Do NOT include state events which are missing the `content.via` field, as this indicates
a redacted link. These events do not contribute to `num_refs` and should not be returned
to the caller.
* If this is the root room from the original request, insert all these events into `events` if
they haven't been added before (across multiple requests).
* Else add them to `events` honouring the `limit` and `max_rooms_per_space` values. If either
is exceeded, stop adding events. If the event has already been added, do not add it again.
* For each referenced room ID in the events being returned to the caller (both parent and child)
add the room ID to the queue of unvisited rooms. Loop from the beginning.
- This guarantees that all edges for the root node are given to the client. Not all edges of subspaces
will be returned, nor will edges of all rooms be returned. This can be detected by clients in two ways:
* Comparing `num_refs` with the *total number* of edges pointing to/from the room.
* Comparing the number of `m.space.child` state events in the room with `max_rooms_per_space`, where
`max_rooms_per_space` is 1 greater than the actual desired maximum value.
- If not all events were returned due to reaching a `limit` or `max_rooms_per_space`, return a
`next_batch` token. The server SHOULD NOT return duplicate events or rooms on subsequent
requests: this can be achieved by remembering the event/room IDs returned to the caller between calls.
This results in each request uncovering more nodes/edges until the entire tree has been explored.
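
The walk described above can be sketched as a breadth-first traversal. This is a simplified, single-request illustration under several assumptions that are not part of the proposal: state is held in an in-memory dict (`state`, mapping room ID to its stripped events), access control is reduced to a set of room IDs (`accessible`), `max_rooms_per_space` is approximated by counting events added per room, and pagination state across calls is omitted.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

LINK_TYPES = ("m.space.child", "m.space.parent")


def is_link(event: dict) -> bool:
    # Events without content.via are treated as redacted links and ignored.
    return event.get("type") in LINK_TYPES and "via" in event.get("content", {})


def walk_spaces(
    root: str,
    state: Dict[str, List[dict]],
    accessible: Set[str],
    limit: int = 100,
    max_rooms_per_space: int = -1,
) -> Tuple[List[dict], List[dict], bool]:
    rooms: List[dict] = []
    events: List[dict] = []
    seen_events = set()
    processed = set()
    queue = deque([root])
    truncated = False  # True if a limit stopped us; a real server would return next_batch
    while queue:
        room_id = queue.popleft()
        if room_id in processed:
            continue
        processed.add(room_id)
        if room_id not in accessible:
            continue
        outbound = [e for e in state.get(room_id, []) if is_link(e)]
        # Reverse lookup: events in other rooms whose state_key points here.
        inbound = [
            e
            for other, evs in state.items()
            if other != room_id
            for e in evs
            if is_link(e) and e["state_key"] == room_id
        ]
        refs = outbound + inbound
        rooms.append({"room_id": room_id, "num_refs": len(refs)})
        added = 0
        for e in refs:
            key = (e["room_id"], e["type"], e["state_key"])
            if key in seen_events:
                continue
            over_limit = len(events) >= limit
            over_per_space = 0 <= max_rooms_per_space <= added
            # The root room's events are always returned, even past the limits.
            if room_id != root and (over_limit or over_per_space):
                truncated = True
                break
            seen_events.add(key)
            events.append(e)
            added += 1
            # Visit both ends of every edge that is returned to the caller.
            queue.extend((e["room_id"], e["state_key"]))
    return rooms, events, truncated
```

Scanning every room for inbound events is only workable for a toy data set; the reverse-lookup storage mentioned above exists to avoid exactly that scan.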


Client behaviour:
- Decide which room should be the root of the tree, then call this endpoint with the root room ID.
- The data in `rooms` determines _what_ to show. The events in `events` determine _where_ to show it.
Take all the data in `rooms` and key them by room ID.
- Loop through the `events` and keep track of parent->child relationships by looking at the `state_key`
which is the child room ID. Clients may want to treat child->parent relationships
(`m.space.parent` events) the same way or differently. Treating them the
same way will guarantee that the entire graph is exposed on the UI, but can cause issues because it
can result in multiple roots (a child can refer to a new unknown parent). If a child->parent relationship
exists but a corresponding parent->child relationship does not exist, this room is a "secret" room which
should be indicated as such. If a parent->child relationship exists but a corresponding child->parent
relationship does not exist, this room is a "user-curated collection" and should be indicated as such.
Persist the mappings in a map: one child can have multiple parents and one parent can have multiple
children.
- Starting at the root room ID:
* Compare the `num_refs` value in `rooms.$room_id` to the total number of events which reference this
room in `events` (across all rooms). If they differ, a partial response has been returned for this
space and additional results should be loaded when required. The API guarantees that *all* events for
the root room ID will be returned, regardless of how many events there are (even if they exceed `limit`).
* Lookup all children for this room ID. For each child:
- If there is no corresponding room data for this room ID then the client cannot tell whether this child is a
subspace or a room: either the room is not world readable or the server does not have any information about it. Clients
MAY be able to join this room by issuing a `/join` request.
- If the child is a room (not a space, check the `room_type` field), look up the room data from
`rooms` and render it.
- Else the child is a space, render the space as a heading (using the room name/topic) and
restart the lookup using the new space room ID.
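
A minimal sketch of the client steps above, assuming the response has already been decoded into `rooms` and `events` lists. It follows only parent->child (`m.space.child`) edges; the function names and the text rendering are illustrative, and a real client would also decide how to treat `m.space.parent` events.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List


def is_partial(room_id: str, rooms_by_id: Dict[str, dict], events: List[dict]) -> bool:
    """True if fewer edges were returned for this room than num_refs claims."""
    seen = sum(1 for e in events if room_id in (e["room_id"], e["state_key"]))
    return seen != rooms_by_id[room_id]["num_refs"]


def build_tree(root_id: str, rooms: List[dict], events: List[dict]) -> List[str]:
    """Render the directory as indented lines, one per room or space."""
    rooms_by_id = {room["room_id"]: room for room in rooms}
    children = defaultdict(list)
    for event in events:
        if event["type"] == "m.space.child":
            # state_key is the child room ID; room_id is the parent space.
            children[event["room_id"]].append(event["state_key"])

    def render(room_id: str, depth: int, path: FrozenSet[str]) -> List[str]:
        indent = "  " * depth
        room = rooms_by_id.get(room_id)
        if room is None:
            # Not world readable, or unknown to the server.
            return [f"{indent}{room_id} (no room data)"]
        label = room.get("name", room_id)
        if room.get("room_type") != "m.space":
            return [f"{indent}{label}"]
        lines = [f"{indent}[{label}]"]
        if room_id in path:
            return lines  # guard against cycles in the graph
        for child in children.get(room_id, []):
            lines.extend(render(child, depth + 1, path | {room_id}))
        return lines

    return render(root_id, 0, frozenset())
```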


### Federation API


Servers may not be joined to all subspaces in the graph. If this happens, they will lack the room state to form a response.
Servers may get this information by peeking into the room, but this includes a live stream of events which is unnecessary and
is a single request per room in the graph. It would be preferable if there was a federation endpoint which included this
information and nothing more. This is more performant and is a single request per _server_ (which may have many nodes
of the graph). Effectively, this federation API requests the view of the graph from the point of view of the destination
server.

```
POST /_matrix/federation/v1/spaces/{roomID}
{
  "exclude_rooms": ["!a:b", "!b:c"], // Optional. Do not return state events in these rooms, nor include these rooms in `rooms`.
  "max_rooms_per_space": 5, // The maximum number of rooms/subspaces to return for a given space, if negative unbounded. default: -1.
  "limit": 100, // The maximum number of rooms/subspaces to return, server can override this, default: 100.
  "batch": "opaque_string" // A token to use if this is a subsequent HTTP hit, default: "".
}
```

Justifications for the request API shape are the same as before, with the following exceptions:
- The HTTP path: Per-room federation endpoints are not put under `/rooms` so this proposal doesn't either.
- The `exclude_rooms` parameter: In order to stop redundant information being sent to the server, this field allows requesting
servers the ability to suppress node/edge information on a per-room basis. If a room ID is present in this list,
the server should not return node information under `rooms` nor should it return _any state events in this room_. NB: state
events which _point to_ this room should still be included.

The response body remains unchanged from the client format. Servers are unable to verify the auth chain of the returned events
as they are typically not joined to the rooms returned. Servers MUST NOT persist these events in any potential room DAG that
may be created if the server were to join the room. The decision to use stripped state events instead of the actual events
was made because:
- Clients just care about the data, and servers shouldn't be persisting the unverified events in the DAG, meaning data like
`prev_events` and `auth_events` would be useless.
- Events deserialise differently based on the room version which would need to be injected into the response if we decided
to use full events. In addition, because this endpoint returns events from multiple rooms then servers would need to partially
deserialise the event to extract the `room_id` field to work out which room version to use. This is bad because it relies on
the `room_id` field never changing in a future room version.

Sending server behaviour:
- When walking the spaces graph, if the server is not joined to a given room, remember the `via` server names and the room ID.
- Send a federated request to a server in `via` for the unknown room, marking rooms the server is already joined to
in `exclude_rooms`.
- Servers MAY eagerly request graph information and SHOULD cache the response for a configurable duration. This proposal recommends
1 hour.
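
As an illustration of the sending side, the sketch below assembles the destination, path and body for one federated request. Picking the first `via` entry, the helper name and the way joined rooms are passed in are all assumptions made for the example; request signing, retries against other `via` servers and caching are left out.

```python
from typing import Any, Dict, Iterable, List, Tuple
from urllib.parse import quote


def build_federation_request(
    room_id: str,
    via: List[str],
    joined_rooms: Iterable[str],
    limit: int = 100,
    max_rooms_per_space: int = -1,
) -> Tuple[str, str, Dict[str, Any]]:
    """Return (destination server, request path, request body)."""
    if not via:
        raise ValueError("no candidate servers for " + room_id)
    destination = via[0]  # assumption: try the first listed server
    # Room IDs contain '!' and ':' and so are percent-encoded in the path.
    path = "/_matrix/federation/v1/spaces/" + quote(room_id, safe="")
    body = {
        # Rooms we are already joined to: no need to receive them again.
        "exclude_rooms": sorted(joined_rooms),
        "max_rooms_per_space": max_rooms_per_space,
        "limit": limit,
    }
    return destination, path, body
```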

Receiving server behaviour:
- Validate the request and check sender signatures.
- Walk the graph in the same way as the CS API endpoint, remembering to exclude rooms in `exclude_rooms`. "Exclude" in this
context merely means do not add the room or state events in that room to the response. The room itself MUST still be walked
so that servers can discover transitive rooms, e.g. given `A -> B -> C`, a request for `room_id: A, exclude_rooms: [B]`
must still return `C`.
- Servers are authorised to see node/edge information if they are either joined to the room or the room is `world_readable`.
A well-behaved server will not send requests for rooms it is already joined to, so it should only be shown `world_readable`
rooms.
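
The `exclude_rooms` handling on the receiving side can be sketched as follows. This is a toy model that only follows `m.space.child` edges held in an in-memory dict and skips signature checks, authorisation and limits; the function name and data layout are assumptions for illustration.

```python
from collections import deque
from typing import Dict, Iterable, List, Tuple


def federation_walk(
    root: str,
    child_events: Dict[str, List[dict]],
    exclude_rooms: Iterable[str],
) -> Tuple[List[str], List[dict]]:
    """child_events maps a room ID to the stripped m.space.child events in it."""
    excluded = set(exclude_rooms)
    rooms: List[str] = []
    events: List[dict] = []
    processed = set()
    queue = deque([root])
    while queue:
        room_id = queue.popleft()
        if room_id in processed:
            continue
        processed.add(room_id)
        links = child_events.get(room_id, [])
        if room_id not in excluded:
            # Only non-excluded rooms contribute a node and their own events.
            rooms.append(room_id)
            events.extend(links)
        # Excluded rooms are still walked so their children are discovered.
        for event in links:
            queue.append(event["state_key"])
    return rooms, events
```

With the `A -> B -> C` example and `exclude_rooms: [B]`, this returns rooms `A` and `C`, and the only event is the one in `A` that points to `B`.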