MSC2326: Label based filtering #2326

Open
wants to merge 22 commits into
base: old_master
92 changes: 92 additions & 0 deletions proposals/2326-label-based-filtering.md
@@ -0,0 +1,92 @@
# Label based filtering
Contributor

@babolivier, Nov 25, 2019

@Bubu

> This is really handy for finding stuff, especially the images and links part. As this indexing needs to be done server side and also in encrypted rooms, we'd need a way to attach labels to messages which state that the message contains an image, at least one link, an attachment, etc.
>
> The labels for this would need to be interpreted by the server, so for encrypted rooms we'd need to use a set of well-known labels.
> This will obviously leak metadata, so this would probably need to be an optional feature for encrypted rooms.

Agreed that this would be a handy feature, and that listing all labels in a room is a nice thing to have generally. One possible way to do this would be to add an endpoint that exposes the list of labels the server knows have been used in the room. That should be fairly easy, given the server will probably already store (event_id, label) tuples in its database for efficiency. For encrypted rooms, this would return a list of hashes, which is what the server considers to be the list of labels for that room, since it doesn't know the actual labels. Clients would then be able to resolve those hashes by calling /messages with a filter containing the labels to resolve, and extracting the labels from the response (which contains events that the client should be able to decrypt). This would allow such a feature to work well without having to leak more metadata.

wdyt?

Contributor

Sounds good!
As for metadata concerns, these are still there, as we are working on a set of ~6 fixed labels for this use case. But this is basically already covered in the "Security Considerations" section here.

Whether or not the actual usage of these tags for images/links/etc. will become optional in E2EE chats is not part of this MSC, I believe.

Member

can we resolve this thread?

Contributor

I'll do that once I've updated the MSC to describe this solution, which I haven't had time to do yet.


## Problem

Rooms often contain overlapping conversations, which Matrix should help users
navigate.

## Context

We already have the concept of 'Replies' to define which messages are
responses to which. MSC1849 proposes extending this into a generic mechanism
for defining threads which could (in future) be paginated both depth-wise and
breadth-wise. Meanwhile, MSC1198 is an alternate proposal for threading,
which separates conversations into high-level "swim lanes" with a new `POST
/rooms/{roomId}/thread` API.

However, fully generic threading (which could be used to implement forum or
email style semantics) runs a risk of being overly complicated to specify and
implement and could result in feature creep. This is doubly true if you try
to implement retrospective threading (e.g. to allow moderators to split off
messages into their own thread, as you might do in a forum or to help manage
conversation in a busy chatroom).

Therefore, this is a simpler proposal to allow messages in a room to be
filtered based on a given label in order to give basic one-layer-deep
threading functionality.

## Proposal

We let users attach an optional `m.label` field to the events they send
(outside of E2E contents), which provides a list of freeform text labels for
those events. Clients can use these to filter the overlapping conversations in
a room into different topics. The labels could also be used when bridging as a
hashtag to help manage the disconnect which can happen when bridging a
threaded room to an unthreaded one.

Example:

```json
{
    "type": "m.room.message",
    "content": {
        "body": "who wants to go down the pub?",
        "msgtype": "m.text",
        "m.label": [ "#fun" ]
    }
}
```

```json
{
    "type": "m.room.encrypted",
    "content": {
        "algorithm": "m.megolm.v1.aes-sha2",
        "ciphertext": "AwgAEpABm6.......",
        "device_id": "SOLZHNGTZT",
        "sender_key": "FRlkQA1enABuOH4xipzJJ/oD8fxiQHj6jrAyyrvzSTY",
        "session_id": "JPWczbhnAivenK3qRwqLLBQu4W13fz1lqQpXDlpZzCg",
        "m.label": [ "#work" ]
    }
}
```
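
As a rough illustration of the examples above, a client might attach labels to
an event's content as sketched below. The helper name `with_labels` is
hypothetical, and dropping duplicate labels is an assumption of this sketch
rather than something this proposal requires.

```python
import json
from typing import Any, Dict, List

LABEL_KEY = "m.label"  # field name used by this proposal


def with_labels(content: Dict[str, Any], labels: List[str]) -> Dict[str, Any]:
    """Return a copy of `content` carrying `labels` under `m.label`.

    Hypothetical helper: the name and the de-duplication are assumptions.
    """
    labelled = dict(content)
    if labels:
        # Drop duplicates while keeping the order the user chose.
        labelled[LABEL_KEY] = list(dict.fromkeys(labels))
    return labelled


content = with_labels(
    {"body": "who wants to go down the pub?", "msgtype": "m.text"},
    ["#fun", "#fun"],
)
print(json.dumps({"type": "m.room.message", "content": content}, indent=2))
```

When no labels are given the field is omitted entirely, which matches the
field being optional.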

Labels which are prefixed with `#` are expected to be user-visible and exposed
to the user as a hashtag, letting the user filter their current room by the
various hashtags present within it.

Clients are expected to explicitly set the label on a message if the user's
intention is to respond as part of a given labelled topic. For instance, if
the user is currently filtered to only view messages with a given label, then
new messages sent should use the same label. Similarly if the user sends a
reply to a given message, that reply should typically use the same labels as
the message being replied to.
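
One possible way for a client to pick the labels of an outgoing message is
sketched below. The function name and its parameters are hypothetical, and
letting the replied-to message's labels take precedence over the active filter
is an assumption; the text above does not say which should win when both
apply.

```python
from typing import Any, Dict, List, Optional


def labels_for_new_message(
    active_filter_labels: Optional[List[str]] = None,
    replied_to_event: Optional[Dict[str, Any]] = None,
) -> List[str]:
    """Choose labels for a message the user is about to send."""
    # A reply typically reuses the labels of the message being replied to.
    if replied_to_event is not None:
        inherited = replied_to_event.get("content", {}).get("m.label", [])
        if inherited:
            return list(inherited)
    # Otherwise, reuse the label(s) the user is currently filtering on.
    return list(active_filter_labels or [])


parent = {"type": "m.room.message", "content": {"body": "hi", "m.label": ["#work"]}}
print(labels_for_new_message(["#fun"], parent))
print(labels_for_new_message(["#fun"]))
```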

The convention is to use hashtag-style human-visible labels prefixed with a #,
but one could also use a unique ID (e.g. a thread ID bridged from another
platform) without a # prefix.
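
A client could separate the labels it shows as hashtags from opaque IDs along
these lines. This is only a sketch; `split_labels` is a hypothetical name, and
the example ID is made up for illustration.

```python
from typing import List, Tuple


def split_labels(labels: List[str]) -> Tuple[List[str], List[str]]:
    """Split labels into (user-visible hashtags, opaque IDs)."""
    hashtags = [label for label in labels if label.startswith("#")]
    opaque_ids = [label for label in labels if not label.startswith("#")]
    return hashtags, opaque_ids


# "thread-1234" stands in for an ID bridged from another platform.
visible, hidden = split_labels(["#fun", "thread-1234"])
print(visible, hidden)
```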

When a user wants to filter a room to given label(s), their client defines a
filter for use with /sync or /messages to limit the results appropriately. This
is done by adding new `labels` and `not_labels` fields to the `EventFilter`
object, which specify a list of labels to include or exclude in the given
filter.
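
The intended matching behaviour can be sketched as follows. The exact
semantics are an assumption made by analogy with the existing `types` and
`not_types` filter fields: an event matches if it carries at least one label
from `labels` (when that field is present) and none from `not_labels`, with
exclusion taking priority. The function name is hypothetical.

```python
from typing import Any, Dict


def matches_label_filter(event: Dict[str, Any], event_filter: Dict[str, Any]) -> bool:
    """Return True if `event` passes the label part of `event_filter`."""
    event_labels = set(event.get("content", {}).get("m.label", []))
    wanted = event_filter.get("labels")
    unwanted = set(event_filter.get("not_labels", []))
    # Assumption: exclusion wins, as it does for `not_types`.
    if event_labels & unwanted:
        return False
    # Assumption: an absent `labels` field means "do not restrict".
    if wanted is not None and not (event_labels & set(wanted)):
        return False
    return True


# The label-related fields of an EventFilter; other fields are omitted.
event_filter = {"labels": ["#fun"], "not_labels": ["#work"]}

pub = {"type": "m.room.message", "content": {"body": "pub?", "m.label": ["#fun"]}}
both = {"type": "m.room.message", "content": {"body": "x", "m.label": ["#fun", "#work"]}}
plain = {"type": "m.room.message", "content": {"body": "no labels"}}

print([matches_label_filter(e, event_filter) for e in (pub, both, plain)])
```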

## Problems

Do we care about internationalising hashtags?

Too many threading APIs?

## Unstable prefix

Unstable implementations should use `org.matrix.label` rather than `m.label`.