Add login using ORCID oauth #2276

matkaczmarek · 2024-08-20T12:22:18Z

Change introduces three new endpoints:

authorcid_login -> which exchanges token with ORCID using Oauth and logs in the user if ORCID is already connected to the existing user or redirects to orcid_register for new users
orcid_register -> present a form to register new users and links ORCID to it once registered
orcid_init_login -> builds URL and redirects to ORCID for authorization

This PR also introduces changes in the login view and SSO login view:

physionet-django/sso/templates/sso/login.html

bemoody · 2024-09-19T17:52:01Z

There are a few issues I see here - some small, some larger - but first we need to understand the high level functionality.

We currently have the option to add an ORCID ID to a user's profile. Is it your intention that someone who has already done that should now be able to log in via ORCID?

What does "log in via ORCID" mean, exactly? What is the process that the user goes through (explain the steps for somebody who doesn't understand OAuth) and how does this process prove that the person sitting at the keyboard is the person with that ORCID ID?

You also need to explain how "registration" works: describe at a high level what steps are taken, from when a new user first visits the site, until they are fully registered and logged-in.

matkaczmarek · 2024-09-23T13:33:49Z

We currently have the option to add an ORCID ID to a user's profile. Is it your intention that someone who has already done that should now be able to log in via ORCID?

Yes, it was my intention that someone who has already linked an ORCID ID profile should now be able to log in using ORCID. In my opinion, those shouldn't be separate, user still have the option to log in using a username and password. Moreover, linking ORCID ID uses the same Oauth mechanism to authorize a user. It would be a bad user experience that once accounts are linked, users cannot use all linked account benefits and have to link ORCID one more time just to log in.

What does "log in via ORCID" mean, exactly? What is the process that the user goes through (explain the steps for somebody who doesn't understand OAuth) and how does this process prove that the person sitting at the keyboard is the person with that ORCID ID?

From a user's perspective, once the user clicks Log in via ORCID (maybe we should rephrase it to Log in using ORCID), they are redirected to the ORCID authorization page where the user logs in to ORCID platform using ORCID credentials (here they are proving that the person sitting at the keyboard is the person with that ORCID ID). Once it's done we are redirected to our Physionet page and logged in if authorization was correct.

From a technical point of view, when the user is redirected to Physionet page after authorization, Physionet and Orcid exchange first the code, then the token, which consists of encrypted Orcid user data, such as ORCID ID. This token could also be used to access more ORCID user data, but we are not using this functionality. On our site, we are validating the token against malicious behavior. If everything is correct and this ORCID ID is linked to some user in Physionet then this user is logged in.

You also need to explain how "registration" works: describe at a high level what steps are taken, from when a new user first visits the site, until they are fully registered and logged-in.

If, after the authorization process, we see that this ORCID ID is not linked to any account, then we redirect users to a standard registration form, where they have to create a Physionet account providing a username, email, and password. Orcid profile is automatically linked to the new account using data from the Orcid authorization token. If an account is not created then nothing happens.

bemoody · 2024-09-23T16:42:00Z

On our site, we are validating the token against malicious behavior.

What is validated? What kind of malicious behavior? Please be specific.

This article seems like a good place to start: https://oauth.net/articles/authentication/

matkaczmarek · 2024-09-24T09:48:30Z

What is validated? What kind of malicious behavior? Please be specific.

This article seems like a good place to start: https://oauth.net/articles/authentication/

Thanks for a great article. Referring to it, when I was writing On our site, we are validating the token against malicious behavior, I had in mind Injection of access tokens which was resolved by retrieving the access token directly from ORCID endpoint not from request parameters: https://github.com/MIT-LCP/physionet-build/pull/2276/files#diff-2f424d21ad5aad51520be5d93166608c26e0df7a678f42851d72ad8b62fd9f15R536-R547

Also when implementing ORCID authentication, I followed their documentation (Implement OAuth, 3 legged OAuth): https://info.orcid.org/documentation/api-tutorials/api-tutorial-get-and-authenticated-orcid-id/ which results in authenticated ORCID user.

matkaczmarek · 2024-09-26T12:41:12Z

Hi @bemoody, after our last call I two changes:

I added id_token verification by checking the signature of the ID token using ORCID JSON Web Key Set (ORCID public keys used to verify JWT)
Log in using ORCID is now only presented to the user if ORCID_SCOPE contains an openid scope.

bemoody · 2024-09-27T20:47:32Z

If somebody's ORCID account was compromised on Monday, and they reset their ORCID password on Tuesday, and they try to log in to PhysioNet using ORCID on Wednesday, how do we know whether it's the real person or the attacker?

If they logged in to PhysioNet using ORCID on Monday, and their ORCID account was compromised on Tuesday, and they reset their ORCID password on Wednesday, and they visit a PhysioNet page on Thursday using the session cookie they received on Monday, how do we know whether it's the real person or the attacker?

matkaczmarek · 2024-09-30T15:29:58Z

If somebody's ORCID account was compromised on Monday, and they reset their ORCID password on Tuesday, and they try to log in to PhysioNet using ORCID on Wednesday, how do we know whether it's the real person or the attacker?

If the password is changed on Tuesday then all the active ORCID sessions should be invalidated including this Monday's one, therefor if the attacker tries to log in to Physionet on Thursday they will have to go through the OAuth authentication flow again which requires knowledge of the new ORCID password

If they logged in to PhysioNet using ORCID on Monday, and their ORCID account was compromised on Tuesday, and they reset their ORCID password on Wednesday, and they visit a PhysioNet page on Thursday using the session cookie they received on Monday, how do we know whether it's the real person or the attacker?

It depends on how long we keep the session on our side. When the ORCID account is compromised, Physionet itself cannot automatically know this. Unfortunately, ORCID (like most of the OAuth providers) does not notify relying parties about password resets or compromises. It is on our side to invalidate session cookies as often as possible (24 hours seems reasonable). If the attacker gets the session cookie to Physionet, he/she will be able to access the attacked account until the cookie expires. Also, if someone's ORCID account is compromised, good practice is to change the passwords to all websites this ORCID account was linked.

Those are very extreme scenarios. As website creators, we often cannot determine whether the account was compromised. Currently, in our flow, if an attacker gets access to a user's email, then Physionet cannot do anything, as the attacker can change the password (using reset password form) and log in to Physionet using a new password. In both scenarios, we trust the users that they can keep their email/ORCID accounts safe.

bemoody · 2024-09-30T19:00:01Z

If the password is changed on Tuesday then all the active ORCID sessions should be invalidated including this Monday's one, therefor if the attacker tries to log in to Physionet on Thursday they will have to go through the OAuth authentication flow again which requires knowledge of the new ORCID password

Right. So how do you know that the person went through the ORCID login - how do you know that the "code" they gave you was generated recently?

It is on our side to invalidate session cookies as often as possible (24 hours seems reasonable). If the attacker gets the session cookie to Physionet, he/she will be able to access the attacked account until the cookie expires

PhysioNet session IDs are valid for 14 days. This is not a security problem, because when you change your password, all other sessions are invalidated.

Having sessions that expire without warning or recourse is a significant usability problem (which is why I've insisted that we don't change from the Django defaults.) For example, I don't think it's okay if, after somebody spends 8 hours uploading a file, the upload is rejected because they had logged in 31 hours earlier. In fact, we could and should do better than we're doing now.

bemoody · 2024-09-30T19:00:44Z

I've now done a bunch of reading on OIDC, looked at some existing libraries and was unimpressed. ;) But it seems like there are a couple of possibilities:

OIDC providers may allow the RP to refresh the id_token in the same way as refreshing the access_token. This seems ideal if it works.
Embed in the page a hidden iframe that, every N minutes, redirects itself to the ORCID authentication page (using the query parameter prompt=none) - thereby ensuring that as long as you keep the browser open, the tokens are regularly refreshed.

matkaczmarek · 2024-10-01T14:35:02Z

Right. So how do you know that the person went through the ORCID login - how do you know that the "code" they gave you was generated recently?

It doesn't say in the documentation but I checked it experimentally and code is valid for no more than 10 minutes. After this time I got The authorization code is invalid or has expired error and it prevented me from exchanging it for access token and as a result I was not able to log in to Physionet

PhysioNet session IDs are valid for 14 days. This is not a security problem, because when you change your password, all other sessions are invalidated.

Having sessions that expire without warning or recourse is a significant usability problem (which is why I've insisted that we don't change from the Django defaults.) For example, I don't think it's okay if, after somebody spends 8 hours uploading a file, the upload is rejected because they had logged in 31 hours earlier. In fact, we could and should do better than we're doing now.

That's a valid case, but as said on a meeting the attackers will have access to the Physionet as long as they will have valid session. If we want to keep session alive as long as the ORCID session they it will last couple of hours (no more than one day).

bemoody · 2024-10-03T20:36:01Z

To elaborate on my previous comment:

Naturally, we only know what happens on our server. Verifying the id_token lets us know that the client was authenticated to ORCID servers at a particular time, but it doesn't tell us whether the client is still authenticated.

We want to allow authenticated clients to remain logged-in to PhysioNet for a reasonably long time (say, at least 24 hours), and we also want unauthenticated clients to be rejected within a reasonably short time period after they cease to be authenticated (say, at most one hour.)

That means that we need to regularly receive updated information from the ORCID service to tell us whether that client is still authenticated from their point of view.

There is no clever way to make this work without polling the ORCID server in some way.

making the client do the polling

One way to poll the ORCID server is to tell the client to go and get a new authorization code. It's possible that this could be done by embedding an iframe (e.g., in navbar.html, with appropriate CSS to make it invisible), where the content of the iframe contains:

<meta http-equiv="refresh" content="600; url=https://orcid.org/oauth/authorize?response_type=code&scope=openid&prompt=none&client_id=...&redirect_uri=...&state=...">

Then, after 600 seconds, the client will request a new auth code from orcid.org; orcid.org will give either a new auth code or an error; and we tell the client to do the same thing 600 seconds later.

Some logic is needed to ensure that if the user is authenticated via ORCID, and we have not received a positive response from the ORCID server recently, then the user is treated as anonymous. It might be possible to do this by expiring the Django session, but it might be better to use a separate middleware for this purpose.

Of course, we can also add logic to the user login view to say: if the client was previously authenticated via ORCID and has now expired, then redirect directly to ORCID without showing a PhysioNet login form.

polling on the server side

The other way to poll the ORCID server is to use the refresh_token, as described here: https://openid.net/specs/openid-connect-core-1_0.html#RefreshTokens

This might or might not work, depending on the OIDC provider. The advantage is that it's seamless from the client's POV - it works in places where client-side redirection is impossible.

This would have to be done by middleware, I would think. The Django auth backend API doesn't seem to provide a way to do it.

matkaczmarek · 2024-10-04T08:45:55Z

Thanks @bemoody for explaining what you want to achieve. It makes sense but let me verify what options we have and whether they are programmaticaly doable. Thanks also for the tip with iframe and refresh_token. I will come back to you with potential solutions.

rafalkosla101 · 2024-10-14T14:23:19Z

Since i have partly taken over this one:
As far as I have tested the implementation orcid is preventing his pages from being loaded in any kind of iframes to prevent security issues such as clickjacking attacks. We could be probably instead of iframe using a popup window that is somewhere hidden and checking what happens inside of that window but it is definitely not a elegant way to handle it and it would still be limiting the time of usage to a couple of hours

Of course, we can also add logic to the user login view to say: if the client was previously authenticated via ORCID and has now expired, then redirect directly to ORCID without showing a PhysioNet login form.

That in a way is quite limiting since a user would be trapped into "orcid login" way of accessing our platform whenever he/she connects orcid to the physionet itself, we could be adding some sort of disconnect feature but that kind of defeats a purpose. We could be also adding a button to end all active sessions which would to some extent minimalize the risk of attacker having 14 days of free physionet access

polling on the server side

The other way to poll the ORCID server is to use the refresh_token, as described here: https://openid.net/specs/openid-connect-core-1_0.html#RefreshTokens

Refresh token is not deactivated when user changes password and the token by itself has quite long expiry date which means that the attacker would be able to access the site anyway.

[T-CAIREM 1243]

matkaczmarek force-pushed the feature/orcid_login_1243 branch 2 times, most recently from d2dc9b3 to 03a3e01 Compare August 20, 2024 15:01

rafalkosla101 reviewed Aug 21, 2024

View reviewed changes

physionet-django/sso/templates/sso/login.html Outdated Show resolved Hide resolved

matkaczmarek marked this pull request as ready for review August 22, 2024 14:59

tompollard changed the title ~~Add logging using ORCID oauth~~ Add login using ORCID oauth Sep 5, 2024

matkaczmarek force-pushed the feature/orcid_login_1243 branch from 04185f1 to 8a5df40 Compare September 25, 2024 13:08

matkaczmarek force-pushed the feature/orcid_login_1243 branch 3 times, most recently from ed65f4f to 4e2602a Compare October 4, 2024 10:17

matkaczmarek added 8 commits October 28, 2024 10:16

Add logging using ORCID oauth

2fef032

[T-CAIREM 1243]

Remove debug leftover

27378e6

[T-CAIREM 1243]

Avoid linking two users with one ORCID account

abee0e2

Skip ORCID tests

555d817

Refactor code

cfd739f

Fix bug with not existing profile

08ebfde

Add ORCID id_token validation

0ade802

Show ORCID log in only if openid scope is enabled

7a2f12b

matkaczmarek added 3 commits October 28, 2024 10:16

Resolve poetry conflicts

e25ef82

Move JWKS URL to env variable

983003f

Rebase and resolve conflicts in poetry.lock

f4f6b1e

matkaczmarek force-pushed the feature/orcid_login_1243 branch from 4e2602a to f4f6b1e Compare October 28, 2024 09:23

Add feature flag to enable/disable orcid login

604a45b

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add login using ORCID oauth #2276

Add login using ORCID oauth #2276

matkaczmarek commented Aug 20, 2024 •

edited

Loading

bemoody commented Sep 19, 2024

matkaczmarek commented Sep 23, 2024

bemoody commented Sep 23, 2024

matkaczmarek commented Sep 24, 2024

matkaczmarek commented Sep 26, 2024

bemoody commented Sep 27, 2024

matkaczmarek commented Sep 30, 2024 •

edited

Loading

bemoody commented Sep 30, 2024

bemoody commented Sep 30, 2024

matkaczmarek commented Oct 1, 2024

bemoody commented Oct 3, 2024

matkaczmarek commented Oct 4, 2024

rafalkosla101 commented Oct 14, 2024 •

edited

Loading

polling on the server side

Add login using ORCID oauth #2276

Are you sure you want to change the base?

Add login using ORCID oauth #2276

Conversation

matkaczmarek commented Aug 20, 2024 • edited Loading

bemoody commented Sep 19, 2024

matkaczmarek commented Sep 23, 2024

bemoody commented Sep 23, 2024

matkaczmarek commented Sep 24, 2024

matkaczmarek commented Sep 26, 2024

bemoody commented Sep 27, 2024

matkaczmarek commented Sep 30, 2024 • edited Loading

bemoody commented Sep 30, 2024

bemoody commented Sep 30, 2024

matkaczmarek commented Oct 1, 2024

bemoody commented Oct 3, 2024

making the client do the polling

polling on the server side

matkaczmarek commented Oct 4, 2024

rafalkosla101 commented Oct 14, 2024 • edited Loading

polling on the server side

matkaczmarek commented Aug 20, 2024 •

edited

Loading

matkaczmarek commented Sep 30, 2024 •

edited

Loading

rafalkosla101 commented Oct 14, 2024 •

edited

Loading