MSC2918: Refresh tokens #2918

Merged on Sep 28, 2021 (22 commits):

- ab50b62 Refresh tokens MSC (sandhose, Dec 18, 2020)
- f8dad2a MSC2918: minor changes (sandhose, Jan 14, 2021)
- 0e615f7 MSC2918: access token expiration as milliseconds (sandhose, May 20, 2021)
- 870cded MSC2918: account registration API changes (sandhose, May 20, 2021)
- 6530ecc MSC2918: fix `expires_in_ms` example (sandhose, May 20, 2021)
- b320001 MSC2918: add precision about token revocation (sandhose, Jun 3, 2021)
- d433e3b MSC2918: specify error codes for the refresh API (sandhose, Jun 3, 2021)
- 87566c3 MSC2918: clarify that the change also applies to ASes (sandhose, Jun 3, 2021)
- 269fcac Apply suggestions from code review (sandhose, Jul 1, 2021)
- 4d73b7e MSC2918: clarify what problem this MSC solves (sandhose, Jul 15, 2021)
- db8ceab MSC2918: minor formatting and rephrasing (sandhose, Jul 15, 2021)
- 9bbb4c5 MSC2918: clarify ratelimiting, masquerading and authentication on ref… (sandhose, Jul 15, 2021)
- a050dc3 MSC2918: make expires_in_ms/refresh_token optional (sandhose, Jul 15, 2021)
- 2c11e6f MSC2918: soft logout in refresh token API (sandhose, Jul 15, 2021)
- 4cd94e3 MSC2918: add detailed rationale (sandhose, Aug 12, 2021)
- 04ae1c3 MSC2918: minor fix (sandhose, Aug 12, 2021)
- 488e9e1 MSC2918: clarifications on backward compatibility (sandhose, Sep 9, 2021)
- 4cf821c MSC2918: advertise support in the request body (sandhose, Sep 10, 2021)
- c076763 MSC2918: clarify on what happen when token expire (sandhose, Sep 10, 2021)
- a157cc3 MSC2918: remove redundant precision about token expiration and lifetime (sandhose, Sep 23, 2021)
- ed54213 MSC2918: minor clarification (sandhose, Sep 23, 2021)
- 70b2dfc MSC2918: soft logout when using expired token (sandhose, Sep 23, 2021)

Changes: proposals/2918-refreshtokens.md (163 additions, 0 deletions)

# MSC2918: Refresh tokens

In Matrix, requests to the Client-Server API are currently authenticated using non-expiring, revocable access tokens.
An access token might leak for various reasons, including:

- leaking from the server database (and its backups)
- intercepting it with a man-in-the-middle attack
- leaking from the client storage (and its backups)

In the OAuth 2.0 world, this attack vector is partly mitigated by using short-lived access tokens along with rotating refresh tokens to renew them.
This MSC adds support for expiring access tokens and introduces refresh tokens to renew them.
A more [detailed rationale](#detailed-rationale) of the kinds of attacks it mitigates can be found at the end of this document.

## Proposal

Homeservers can choose to have access tokens expire after a short amount of time, forcing the client to renew them with a refresh token.
A refresh token is issued on login and is rotated on each use.

This allows homeservers to opt for signed, non-revocable access tokens (JWTs, macaroons, etc.) for performance reasons, provided their lifetime is short enough (less than 5 minutes).

Clients are strongly encouraged to support refreshing tokens for additional security.
They can advertise their support by adding a `"refresh_token": true` field in the request body on the `/login` and `/register` APIs.

Handling of clients that do *not* support refreshing access tokens is up to individual homeserver deployments.
For example, server administrators may choose to support such clients for backwards-compatibility, or to expire access tokens anyway for improved security at the cost of inferior user experience in legacy clients.

If a client uses an access token that has expired, the server will respond with an `M_UNKNOWN_TOKEN` error, preferably with the `soft_logout` parameter set to `true` to improve the user experience in legacy clients.
Thus, if a client receives an `M_UNKNOWN_TOKEN` error, and it has a refresh token available, it should no longer assume that it has been logged out, and instead attempt to refresh the token.
If the client was in fact logged out, then the server will respond with an `M_UNKNOWN_TOKEN` error to the token refresh request, possibly with the `soft_logout` parameter set.
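
The client-side decision described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not part of the proposal: `ClientSession` and `on_unknown_token` are hypothetical names, and the returned strings stand in for whatever actions a real client would take.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClientSession:
    """Hypothetical client-side credential store."""
    access_token: str
    refresh_token: Optional[str] = None  # None if the server issued none


def on_unknown_token(error_body: dict, session: ClientSession,
                     during_refresh: bool = False) -> str:
    """Decide how to react to an M_UNKNOWN_TOKEN error response."""
    if error_body.get("errcode") != "M_UNKNOWN_TOKEN":
        raise ValueError("not an M_UNKNOWN_TOKEN error")
    if not during_refresh and session.refresh_token is not None:
        # The access token may simply have expired: do not assume a logout,
        # try the token refresh API first.
        return "refresh"
    # Either no refresh token is available, or the refresh request itself
    # was rejected: the session has ended, possibly as a soft logout.
    if error_body.get("soft_logout", False):
        return "soft_logout"
    return "logged_out"
```

The `during_refresh` flag is what prevents a loop: an `M_UNKNOWN_TOKEN` returned by the refresh request itself is treated as a real logout.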

### Login API changes

The login API returns two additional fields:

- `expires_in_ms`: The lifetime in milliseconds of the access token.
- `refresh_token`: The refresh token, which can be used to obtain new access tokens.

This also applies to logins done by application services.

Both fields are optional.
If `expires_in_ms` is missing, the client can assume the access token won't expire.
If `refresh_token` is missing but `expires_in_ms` is present, the client can assume the access token will expire, and that it has no way to renew it other than logging in again.

Clients advertise their support for refreshing tokens by setting the `refresh_token` field to `true` in the request body.
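
As a rough illustration of the login changes, the Python sketch below builds a password login body that opts in to refresh tokens and classifies the response by which optional fields are present. The `m.login.password` body shape follows the existing login API; the helper names and the returned labels are hypothetical, and the stable field name is used (see the unstable prefix section).

```python
def password_login_body(user: str, password: str) -> dict:
    """Request body for POST /login, advertising refresh token support."""
    return {
        "type": "m.login.password",
        "identifier": {"type": "m.id.user", "user": user},
        "password": password,
        "refresh_token": True,  # new in this MSC: the client can refresh
    }


def classify_login_response(response: dict) -> str:
    """Interpret the optional expires_in_ms / refresh_token fields."""
    if "expires_in_ms" not in response:
        # No lifetime given: the access token is assumed not to expire.
        return "non-expiring"
    if "refresh_token" in response:
        return "expiring-refreshable"
    # The token expires, and the only way to get a new one is to log in again.
    return "expiring-login-again"
```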

### Account registration API changes

Unless `inhibit_login` is `true`, the account registration API returns two additional fields:

Review comment (Member):

Shouldn't this be if it's false? I.e. we include the params when we return a valid access token?

Review comment (Member Author):

The spec says:

inhibit_login: If true, an access_token and device_id should not be returned from this call, therefore preventing an automatic login. Defaults to false.

So that sentence seems correct? If inhibit_login is true, it will not return the additional fields

Review comment (Member):

it's not the easiest thing to grok, but @sandhose is right, and @erikjohnston is confused.


- `expires_in_ms`: The lifetime in milliseconds of the access token.
- `refresh_token`: The refresh token, which can be used to obtain new access tokens.

Review comment (Member):

Does the refresh token expire? How does one manually expire it?

Review comment (Member):

also would be interested to see how this plays with soft logout: if the access token expires, but the refresh token is still live, should the server be using soft_logout: true in expiration responses? If so, how should the server clean up the device once the refresh token also expires?

Review comment (Member Author):

Refresh token don't expire, but they get invalidated on use.
By the way, the current implementation in Synapse is incompatible with the session_lifetime config parameter, which was the one that made access token expire and led to soft logouts.

also would be interested to see how this plays with soft logout: if the access token expires, but the refresh token is still live, should the server be using soft_logout: true in expiration responses?

I was not aware of how soft_logout works. :)
But also clients should now be aware when the token expires, so soft_logout on access token expiration should not happen as often, but then I'd clarify that soft_logout also applies when using the /refresh endpoint

Review comment (Member):

Refresh token don't expire, but they get invalidated on use.

This particular piece is a bit concerning, as it means that refresh tokens are hanging around waiting to give access back to the account. On the other hand, this somewhat fixes the scripts usecase as it can then store the refresh token and use that on the next run.

Review comment (Member Author):

It still heavily mitigates the impact of token leakage, since they are rotating.
This is what an IETF document on security best practices for OAuth 2.0 says about refresh tokens:

> *Refresh token rotation:* the authorization server issues a new refresh token with every access token refresh response. The previous refresh token is invalidated but information about the relationship is retained by the authorization server. If a refresh token is compromised and subsequently used by both the attacker and the legitimate client, one of them will present an invalidated refresh token, which will inform the authorization server of the breach. The authorization server cannot determine which party submitted the invalid refresh token, but it will revoke the active refresh token. This stops the attack at the cost of forcing the legitimate client to obtain a fresh authorization grant.

tl;dr: if there is an attempt to use an old refresh token, there might be a token leak somewhere and the whole session should be invalidated. This could be mentioned in the MSC and/or in the spec, and implemented in Synapse if you think it makes sense.

Review comment (Member Author):

Also clarified about soft_logout in the refresh token API in 2c11e6f

Review comment (Member):

ah ha, so the server would revoke the access token when a refresh token is used twice?

Review comment (Member Author):

It could, although not strictly enforced by this MSC (and the current implementation in Synapse does not do that)

Review comment (Member):

@turt2live is this clear enough now?


This also applies to registrations done by application services.

As in the login API, both fields are optional.

Clients advertise their support for refreshing tokens by setting the `refresh_token` field to `true` in the request body.
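
Because the interaction with `inhibit_login` caused some confusion, here is a minimal server-side sketch of when the new fields appear in a registration response. It is an assumption-laden illustration, not Synapse's code: `registration_response` and the `issue_session` callback are hypothetical.

```python
from typing import Callable


def registration_response(user_id: str, request: dict,
                          issue_session: Callable[[bool], dict]) -> dict:
    """Build a /register response body (hypothetical server-side helper).

    `issue_session(wants_refresh)` is an assumed callback that creates the
    device and returns access_token, device_id and, optionally,
    expires_in_ms / refresh_token.
    """
    response = {"user_id": user_id}
    if request.get("inhibit_login", False):
        # No automatic login: no access token, so no expiry or refresh token.
        return response
    wants_refresh = request.get("refresh_token") is True
    response.update(issue_session(wants_refresh))
    return response
```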

### Token refresh API

This API lets the client refresh the access token.
A new refresh token is also issued.
The existing refresh token remains valid until the new access token (or refresh token) is used, at which point it is revoked.
This ensures the client keeps access if the response to the refresh request gets lost in transit.
The Matrix server can revoke the old access token right away, but does not have to, since its short lifetime means it will expire soon anyway.
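
The rotation rules above can be modelled with a small in-memory store. This Python sketch is one possible reading of them, not a reference implementation: `SessionStore`, `Session` and `UnknownToken` are hypothetical, tokens are random strings (a server could equally use signed tokens), and the old access token is revoked eagerly even though the proposal makes that optional. The device ID is attached to the session rather than to a single access token, in line with the device handling section.

```python
import secrets
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


class UnknownToken(Exception):
    """Would be reported as HTTP 401 with errcode M_UNKNOWN_TOKEN."""


@dataclass
class Session:
    device_id: str       # stays the same across refreshes
    access_token: str
    refresh_token: str
    # Tokens issued by the last refresh that the client has not used yet.
    pending: Optional[Tuple[str, str]] = None


def _new_token() -> str:
    return secrets.token_urlsafe(16)


class SessionStore:
    def __init__(self) -> None:
        self.by_access: Dict[str, Session] = {}
        self.by_refresh: Dict[str, Session] = {}

    def login(self, device_id: str) -> Session:
        s = Session(device_id, _new_token(), _new_token())
        self.by_access[s.access_token] = s
        self.by_refresh[s.refresh_token] = s
        return s

    def _confirm(self, s: Session) -> None:
        # The client used one of the new tokens: revoke the old ones.
        access, refresh = s.pending
        self.by_refresh.pop(s.refresh_token, None)
        self.by_access.pop(s.access_token, None)
        s.access_token, s.refresh_token, s.pending = access, refresh, None

    def authenticate(self, access_token: str) -> Session:
        s = self.by_access.get(access_token)
        if s is None:
            raise UnknownToken()
        if s.pending is not None and access_token == s.pending[0]:
            self._confirm(s)
        return s

    def refresh(self, refresh_token: str) -> Tuple[str, str]:
        s = self.by_refresh.get(refresh_token)
        if s is None:
            raise UnknownToken()  # never existed, or already revoked
        if s.pending is not None:
            if refresh_token == s.pending[1]:
                self._confirm(s)
            else:
                # Retry with the old token (the previous response was lost):
                # drop the unused tokens before issuing fresh ones.
                self.by_access.pop(s.pending[0], None)
                self.by_refresh.pop(s.pending[1], None)
        s.pending = (_new_token(), _new_token())
        self.by_access[s.pending[0]] = s
        self.by_refresh[s.pending[1]] = s
        return s.pending
```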

`POST /_matrix/client/r0/refresh`

```json
{
  "refresh_token": "aaaabbbbccccdddd"
}
```

Response:

```json
{
  "access_token": "xxxxyyyyzzz",
  "expires_in_ms": 60000,
  "refresh_token": "eeeeffffgggghhhh"
}
```

If the `refresh_token` is missing from the response, the client can assume the refresh token has not changed and use the same token in subsequent token refresh API requests.
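
On the client side, the request and the "missing `refresh_token` means unchanged" rule might look like the following Python sketch. It uses only the standard library; the helper names are hypothetical and the homeserver URL in any usage is a placeholder.

```python
import json
import urllib.request
from typing import Optional, Tuple

REFRESH_PATH = "/_matrix/client/r0/refresh"


def build_refresh_request(homeserver: str,
                          refresh_token: str) -> urllib.request.Request:
    """Prepare POST /refresh; no Authorization header is needed."""
    body = json.dumps({"refresh_token": refresh_token}).encode("utf-8")
    return urllib.request.Request(
        homeserver + REFRESH_PATH,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def apply_refresh_response(response: dict, old_refresh_token: str
                           ) -> Tuple[str, str, Optional[int]]:
    """Return (access_token, refresh_token, expires_in_ms)."""
    return (
        response["access_token"],
        # A missing refresh_token means the old one stays in use.
        response.get("refresh_token", old_refresh_token),
        response.get("expires_in_ms"),
    )
```

A client would send the request with `urllib.request.urlopen` and pass the decoded JSON body to `apply_refresh_response`; a `401` reply surfaces as `urllib.error.HTTPError`, whose body carries the error code.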

The `refresh_token` parameter can be invalid for two reasons:

- if it does not exist
- if it was already used once

In both cases, the server must reply with a `401` HTTP status code and an `M_UNKNOWN_TOKEN` error code.
This new use case of the `M_UNKNOWN_TOKEN` error code must be reflected in the spec.
As with other endpoints, the server can include an extra `soft_logout` parameter in the response to signal to the client that it should perform a soft logout.
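
For illustration, the error reply for an invalid refresh token could be assembled as below. This is a hypothetical Python helper; the human-readable `error` text is free-form and chosen here only as an example.

```python
from typing import Tuple


def unknown_token_error(soft_logout: bool = False) -> Tuple[int, dict]:
    """(HTTP status, JSON body) for an invalid refresh token."""
    body = {
        "errcode": "M_UNKNOWN_TOKEN",
        "error": "Unknown or already used refresh token",  # free-form text
    }
    if soft_logout:
        # Lets the client re-authenticate while keeping its local data.
        body["soft_logout"] = True
    return 401, body
```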

This new API should be rate-limited. It does not require authentication, since the `refresh_token` parameter is the only credential needed.
Identity assertion via the `user_id` query parameter as defined by the Application Service API specification is disabled on this endpoint.

### Device handling

The current spec states that "Matrix servers should record which device each access token is assigned to".
This must be updated to reflect that devices are bound to a session, which is created during login and stays the same after the token is refreshed.

## Potential issues

Rotating the refresh token on each refresh is strongly recommended in the OAuth 2.0 world for unauthenticated clients, to prevent token replay attacks.
However, this can make deploying CLI tools for Matrix a bit harder, since the credentials can no longer be statically defined.
This is not an issue in OAuth 2.0, because CLI tools usually use the client credentials flow, also known as service accounts.
An alternative would be to make the refresh token non-rotating for now but recommend clients to support rotation of refresh tokens and enforce it later on.

## Alternatives

This MSC defines a new endpoint for token refresh, but it could also be integrated as a new authentication mechanism.

Review comment (Member):

to expand on this, and the "potential issues" section above, what are the concerns with introducing it as some form of opt-in (or opt-out) mechanism for things like long-lived bots or scripts which do not easily have a refresh opportunity? For example, a nightly batch job to prune rooms/events/etc could use a static access token instead of having to login, do the work, then log out again, which would put the password near the script rather than a single revocable token.

Review comment (Member Author):

I think for both use cases (bots and scripts) I'd rather make use of the org.matrix.login.jwt login type with some KMS signing it for the initial login and still have the token refresh. Storing long-lived access tokens without proper secret handling is at least as bad as storing the login/pass of the bot IMO, especially if the user has admin access.
If we still need some kind of static access token, I'd rather have that in Synapse (in the config or something) than in the spec

Review comment (Member):

Fair, that sounds reasonable. Just wanted to expand on the potential usecase, but agreed that scripts can find other ways to authenticate (or better yet: be replaced by features within the protocol/homeserver implementation)

Review comment (Contributor):

I think an authentication option for scripts needs to be in the spec. I have a lot of scripts that push notifications or upload files from CI jobs for example. Those use access tokens, because CI jobs do sometimes get compromised (happened once because of codecov) and that way the access token can be easily rotated without being a homeserver admin. If the script used username and password instead, an attacker would have been able to get past UIA and change the password and just in general do much more nasty stuff than with an access token. The jobs also can't refresh the access token, since they may be running concurrently and can't change CI variables.

What would be my alternative for that use case, that works independent of the specific homeserver implementation?

Review comment (Member Author):

There is an ongoing effort to rework the whole authentication process, with use cases like scripts running in CI in mind. This MSC also prepares clients for the eventual migration to this new authentication stack without forcing them to log out all their existing sessions.

The login API with non-expiring token will hopefully stay until this new auth stack is ready, so when you would need to migrate you will have a proper alternative.

In the meantime, if you still want to adopt refresh tokens and you are an admin of your homeserver, I suggest you look into the org.matrix.login.jwt login type in Synapse. Even though it is not standard, it will let you log in using a JWT signed by some party.
It might still need some changes in Synapse to allow restricting the tokens (like not being able to use them for UIA, to avoid letting the account be nuked), but I prefer to go that route rather than adding these kinds of special cases to this MSC if it will be superseded by something else soon-ish.


## Security considerations

The time to live (TTL) of access tokens isn't enforced in this MSC but is advised to be kept relatively short.
Servers might choose to have stateless, digitally signed access tokens (JWTs are a good example), which makes them non-revocable.
The TTL of access tokens should be around 15 minutes if they are revocable and should not exceed 5 minutes if they are not.
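
A homeserver could check a configured TTL against this advice at startup. The Python sketch below is hypothetical; in particular, treating "around 15 minutes" as a soft upper bound for revocable tokens is an interpretation made for the example, not a rule in the proposal.

```python
from typing import Optional

MINUTE_MS = 60 * 1000
REVOCABLE_TARGET_MS = 15 * MINUTE_MS   # "around 15 minutes"
NON_REVOCABLE_MAX_MS = 5 * MINUTE_MS   # "should not exceed 5 minutes"


def ttl_warning(ttl_ms: int, revocable: bool) -> Optional[str]:
    """Return a warning if an access token TTL exceeds the advice above."""
    if not revocable and ttl_ms > NON_REVOCABLE_MAX_MS:
        return "non-revocable access tokens should not exceed 5 minutes"
    if revocable and ttl_ms > REVOCABLE_TARGET_MS:
        # Soft bound chosen for this sketch only.
        return "revocable access tokens should stay around 15 minutes"
    return None
```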

## Unstable prefix

While this MSC is not in a released version of the specification, clients should use the `org.matrix.msc2918.refresh_token` field in place of the `refresh_token` field in requests to the login and registration endpoints.
The refresh token endpoint should be served and used using the unstable prefix: `POST /_matrix/client/unstable/org.matrix.msc2918/refresh`.
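
A client supporting both the unstable and the eventual stable names might switch on a single flag, as in this Python sketch. The `stable` flag and the helper names are hypothetical; how a client decides which variant a server supports is outside this section.

```python
STABLE_REFRESH_PATH = "/_matrix/client/r0/refresh"
UNSTABLE_REFRESH_PATH = "/_matrix/client/unstable/org.matrix.msc2918/refresh"


def refresh_path(stable: bool) -> str:
    return STABLE_REFRESH_PATH if stable else UNSTABLE_REFRESH_PATH


def refresh_opt_in(stable: bool) -> dict:
    """Field to merge into /login and /register request bodies."""
    key = "refresh_token" if stable else "org.matrix.msc2918.refresh_token"
    return {key: True}
```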

## Detailed rationale

This MSC does not aim to protect against a completely compromised client.
More specifically, it does not protect against an attacker that managed to distribute an alternate, compromised version of the client to users.
In contrast, it protects against a whole range of attacks where the access token and/or refresh token get leaked but the client isn't completely compromised.

For example, those tokens can leak from user backups (a user backs up their device to a NAS, the NAS gets compromised, and a backup of the client's secret storage leaks), but such backups are likely to be at least 5 minutes old.
If the leak only includes the access token, it is useless to the attacker since it would have expired.
If it also includes the refresh token, it is useless *if* the token was refreshed before (which will happen if the user just opens their Matrix client in between).

In the worst case, the leaked refresh token is still valid: the attacker can consume it to get a valid access token, but when the original client tries to use the same refresh token, the homeserver can detect the reuse, consider the session compromised, end the session and warn the user.

This kind of attack also applies to leakage from the server, which could happen from database backups, for example.

The important point is that, while this does not completely prevent attacks in case of a token leak, it makes this range of attacks much more time-sensitive and detectable.
A homeserver will notice if a refresh token is being used twice.
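
Reuse detection along the lines of the IETF guidance can be sketched with two maps, one for current refresh tokens and one for rotated-out ones. This Python example is hypothetical and deliberately simplified: it treats a token as used as soon as it is rotated, ignoring the grace period described in the token refresh API section, and the proposal does not require servers to end the session.

```python
from typing import Dict, Set


class RefreshTokenLedger:
    """Hypothetical bookkeeping for detecting refresh token reuse."""

    def __init__(self) -> None:
        self.active: Dict[str, str] = {}  # current refresh token -> session
        self.used: Dict[str, str] = {}    # rotated-out token -> session
        self.ended: Set[str] = set()      # sessions ended after a reuse

    def issue(self, session_id: str, token: str) -> None:
        self.active[token] = session_id

    def rotate(self, token: str, new_token: str) -> str:
        if token in self.used:
            # A token that was already rotated was presented again: either
            # the attacker or the legitimate client holds a stale copy.
            self.end_session(self.used[token])
            return "reuse-detected"
        session_id = self.active.pop(token, None)
        if session_id is None:
            return "unknown"
        self.used[token] = session_id
        self.active[new_token] = session_id
        return "ok"

    def end_session(self, session_id: str) -> None:
        self.ended.add(session_id)
        self.active = {t: s for t, s in self.active.items() if s != session_id}
```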

The IETF has interesting [guidelines for refresh tokens](https://datatracker.ietf.org/doc/html/draft-ietf-oauth-security-topics#section-4.13.2).
They recommend that either:

- refresh tokens are sender-bound and require client authentication (making a token leak completely useless unless the client credentials leak at the same time), or
- refresh tokens rotate, which makes the attack a lot harder, as described just above.

Since all clients are "public" in the Matrix world, there are no client-bound credentials that could be used, hence the rotation of refresh tokens.

---

The other scenario where this change makes sense is to ease further changes in homeservers.
A good recent example is Synapse v1.34.0, which [moved away from macaroons for access tokens](https://github.com/matrix-org/synapse/pull/5588) to shorter, random tokens saved in the database, similar to [what GitHub did recently](https://github.blog/2021-04-05-behind-githubs-new-authentication-token-formats/).

Because the Client-Server API has no refresh token mechanism, most Synapse instances now have a mix of both token formats, and will keep that mix for a long time.
The new token format cannot be enforced without invalidating all existing sessions, which makes it impossible to roll out changes like a web application firewall in front of Synapse that verifies the shape and checksums of tokens before they even reach Synapse.

---

Lastly, expiring tokens already exist in Synapse (via the `session_lifetime` configuration parameter).
Before this MSC, clients had no idea when the session would end, and relied on the server replying to an arbitrary request with a 401 error and `soft_logout: true` to trigger a soft logout and go through the authentication process again.
A side effect of this MSC (although it could have been introduced separately) is that login responses can now include an `expires_in_ms` field to inform clients when the token will expire.
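
With `expires_in_ms` available, a client can refresh proactively instead of waiting for an error. A minimal Python sketch follows; the 30-second safety margin is an arbitrary assumption, not a value from the proposal.

```python
from typing import Optional


def refresh_at_ms(issued_at_ms: int, expires_in_ms: Optional[int],
                  margin_ms: int = 30_000) -> Optional[int]:
    """When to refresh proactively, or None if the token does not expire."""
    if expires_in_ms is None:
        return None
    # Refresh a little before expiry, but never before the token was issued.
    return issued_at_ms + max(0, expires_in_ms - margin_ms)
```

For a 60-second token issued at t = 1000 ms this schedules the refresh at t = 31000 ms; very short lifetimes collapse to refreshing immediately.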