Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JEP: Websocket token authentication with subprotocols #121

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

minrk
Copy link
Member

@minrk minrk commented Apr 30, 2024

JEP for #119

121-token-auth/token-auth.md Outdated Show resolved Hide resolved
121-token-auth/token-auth.md Outdated Show resolved Hide resolved
121-token-auth/token-auth.md Outdated Show resolved Hide resolved
121-token-auth/token-auth.md Outdated Show resolved Hide resolved
121-token-auth/token-auth.md Outdated Show resolved Hide resolved
121-token-auth/token-auth.md Outdated Show resolved Hide resolved

### Clients

Websocket clients SHALL transmit API tokens in the `Sec-Websocket-Protocol` header.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the use of SHALL in some sentences and MUST in other sentences significant?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These SHALLs should be SHOULDs. I had rfc2119 in mind, where MUST is a requirement, while SHOULD is a recommendation. Sending tokens this way is not required because all existing mechanisms still work, but it is recommended where supported.

Copy link
Contributor

@vidartf vidartf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Positive for adding this support, as it's opt-in on both the server and client side. One technical question in diff comments.

#### Backward compatibility

This mechanism does not replace any other mechanisms, it is purely additional.
A server that does not support the new scheme may reject a websocket connection with e.g. status 403, as if no token was provided.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ref here for next comment.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For reference, this bit describes current behavior. We can propose new behavior that doesn't support the new scheme (e.g. 401 on no creds), but if we are talking about backward compatibility, we have to handle what implementations do today without modification.


- `url_token` SHOULD be extracted and url-decoded (e.g. `token = unquote('{url_token}')`)
- `token` SHOULD be handled identically to if it were sent via `Authorization: Bearer {token}`
- If `token` is invalid or rejected, connection request MUST fail with status 403.
Copy link
Contributor

@vidartf vidartf Jun 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will it be clear to clients if the server didn't support the subprotocol or whether there was something wrong with the token? Ref other comment above, it seems like an overload of 403. If my token is expired, I don't want the client to fall back on trying a less secure method.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, I don't think so. It would perhaps have been better to fail with 401 when no recognized credentials are provided, but it doesn't really make sense to define new behavior for not supporting the new scheme.

Perhaps there is a header we can set on the error response that might be readable by the client error handler, so it can know the token was rejected, not unsupported? Initial poking around suggests that the error handler doesn't preserve a handle on the response (yet again confirming that browsers consistently have the least capable websocket implementation for some reason), so unfortunately that doesn't seem to be an option.

If we had another status code to use for "token recognized and rejected" that would also work, but I don't think there is one, and 403 is really correct for "recognized but not authorized."

If we can't do it feasibly on the response, we may need to have explicit capability detection somewhere, either:

  • a dedicated 'capabilities' endpoint in the server spec
  • or detect and declare support (for JupyterLab, at least) via PageConfig
  • maybe a preflight (or post) OPTIONS request on the ws endpoint or a neighbor?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realize the comment below is related to this: browsers don't report status codes, so there's no distinguishable difference for clients between 403, 404, 500, or any other reason a websocket request could fail (e.g. not a websocket endpoint at all). So status codes are not helpful for browsers (it's still the right thing to do for the server to log the right error code).

@SylvainCorlay
Copy link
Member

I guess that the retry pattern will just be a bit more complicated when dealing with the aligned kernel subprotocol?

@minrk
Copy link
Member Author

minrk commented Jun 18, 2024

I guess that the retry pattern will just be a bit more complicated when dealing with the aligned kernel subprotocol?

I'll need to check. In all cases, the first request should include the kernel subprotocol and the token subprotocol, with the kernel subprotocol first indicating highest priority. There will be these possible cases to handle in terms of server support:

  • supports both:
    • first request succeeds, subprotocol set to kernel subprotocol
  • supports token but not kernel:
    • first request succeeds, subprotocol is set to token subprotocol - this indicates kernel subprotocol is not supported
  • supports kernel subprotocol but not token (this is currently released servers):
    • if cookie authentication available, succeeds and kernel subprotocol is set
    • if cookie authentication is not available, fails with 403. If this were a normal HTTP request, that would be easy to identify, but websockets obfuscate errors so it may be hard to distinguish from other failures
      • second request with kernel subprotocol and token in URL succeeds
  • supports neither token nor kernel:
    • if cookie authentication available, fails with no supported subprotocol
      • (if auth failure is indistinguishable, which I believe it is) second request with token in url fails with no supported subprotocol
      • third request with token in url and no subprotocol succeeds
    • if cookie authentication is not available, fails with 403 (indistinguishable, I think)
      • same as above for the rest

Which means the first request reliably determines whether the token can be in the subprotocol. If the first request fails, the token shall be in the URL parameter. The second request then determines kernel subprotocol support (the first request in current jupyterlab), and in the event of a server supporting neither subprotocol, a third and final request is needed to make a successful connection.

So if token auth is supported by the server, retries are no longer needed to check for support of the kernel subprotocol, which is a plus, I suppose. But in order to handle all possible cases, there could be up to 2 retries instead of the current 1, assuming I'm correct that clients can't meaningfully distinguish between a websocket that fails due to missing auth vs unsupported protocols. It could be simplified if the two failures turn out to be distinguishable in the client.

It may have been a good idea to define a subprotocol version string for the older wire format so that servers could explicitly declare that they only support the older wire format. But I'm not sure if that's worth discussing at this point, because that would reduce the number of cases where the retry is needed.

@minrk
Copy link
Member Author

minrk commented Jul 3, 2024

I've run some tests, and browsers don't appear to record the http response (essentially, the WebSocket in browser seems to pretend that websockets are not built on top of HTTP, so expose nothing about the HTTP requests/responses to client code). So clients need to treat all connection failures as indistinguishable:

  • no supported protocol
  • auth error
  • 404
  • 500
  • not a websocket endpoint at all
  • lost connection

I'm not necessarily proposing we do this, but so folks can have an informed opinion, if the client knows whether the token subprotocol is supported before attempting websocket requests, the conditions look like this:

  • server supports token protocol:
    • first request fails: means auth error, no retry
    • first request succeeds:
      • selected subprotocol will be kernel if kernel subprotocol supported, token if kernel subprotcol not supported
  • server doesn't support token protocol (token in url, same as now):
    • first request fails: auth error or unsupported protocol
    • second request without kernel protocol:
      • succeeds if kernel subprotocol supported
      • fails if it was really an auth error or other problem

So it is simpler, especially since the current subprotocol retries can be eliminated if the token subprocol is known or assumed to be supported. But it adds the preflight specification to somehow communicate that token-authenticated websockets are supported, which we haven't decided on, and don't currently have a mechanism for.

@vidartf
Copy link
Contributor

vidartf commented Jul 3, 2024

@minrk is it possible to (ab)use the reason field in the close event? Or will that undermine some of the security considerations of websockets?

@minrk
Copy link
Member Author

minrk commented Jul 3, 2024

is it possible to (ab)use the reason field in the close event?

Yeah, I hadn't thought of that, but we could. I don't think it's abuse, I think it's actually what close code/reason are intended for. So instead of not accepting the connection in the first place, accept the connection and immediately close with a code (e.g. 4403 - 4000 + status code, since unregistered websocket codes should be in 4000-4999). This has an advantage in that it would actually give us a place for communicating the reason for the close, which is helpful, and maybe what we should have been doing all along.

There are backward-compatibility downsides to the transition, at least:

  • existing clients don't do this, so connection errors must still be handled by clients. But at least we should know that a connection error doesn't mean supporting token protocol + unauthorized token, if servers behave as intended
  • from a Python perspective, it may be complicated to properly accept and immediately close connections without triggering unauthorized on_open/get/pre_get side effects, or accidentally accept unauthorized connections because currently open is assumed to be protected to only be called on authorized requests, and with such a change, open would end up being called (my idea here is to patch out self.open = self.open_and_immediately_close and self.get = WebSocketHandler.get, but if overridden close assumes open has been called, errors are likely and hard to avoid.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants