Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OIDC: Use 'sub' claim to identify the user #759 #772

Conversation

fflorent
Copy link
Collaborator

Context

Resolves #759

When a user change their email in the OIDC Identity Provider and connect again to Grist, a new account is created and they cannot access to their workspace anymore.

Technical considerations

Credits to @illode for the idea.

The sub claim in OIDC spec is used to uniquely identify a user, instead of their email (which can be changed).

We store it in the connect_id field of the users table which seems to be made for that and take advantage of it to update the email when required.

Florent FAYOLLE and others added 2 commits November 24, 2023 11:48
Using the 'sub' claim allows the user to change their email
in the OIDC Identity Provider without loosing their work.
name: this._extractName(userInfo)
name: this._extractName(userInfo),
connectId: userInfo.sub,
loginMethod: 'External',
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assumed the loginMethod: 'External' is meant to be used for such case.

I can also add this property for UserProfiles created SAMLConfig.ts if that's relevant.

@georgegevoian georgegevoian requested review from georgegevoian and removed request for berhalak November 27, 2023 14:41
@paulfitz paulfitz self-requested a review November 27, 2023 14:51
@georgegevoian
Copy link
Contributor

Thanks for opening the PR @fflorent.

The connect_id column and External login method were originally added as part of the implementation for GristConnect. Some of the source code isn't in this repo; you can find it here, under /ext.

Since they're already being used for a different authentication method, I think it would be safer to add a new column and login method, to avoid unintended consequences in installations that have used both OIDC and GristConnect. @paulfitz what do you think?

@fflorent
Copy link
Collaborator Author

Thanks for the information @georgegevoian! 🙏

The connect_id column and External login method were originally added as part of the implementation for GristConnect.

Regarding the connect_id column which seems to be populated with the external_id of the DiscourseConnect endpoint. This field is documented as being:

external_id is any string unique to the user that will never change, even if their email, name, etc change. The suggested value is your database’s ‘id’ row number.

I feel like the meaning of external_id is the same as sub for OIDC.

Since they're already being used for a different authentication method, I think it would be safer to add a new column and login method, to avoid unintended consequences in installations that have used both OIDC and GristConnect

If I understand correctly, you talk about a case where an admin has switched from one to the other without purging the users table.
OIDC requiring sub to just be locally unique (source), that's a very good catch!

Instead of introducing another column which would have much an analogous role, and having to duplicate much of the logic of ensureExternalUser, would it make sense to rather store the login system used to authenticate the user (DiscourseConnect, SAML, OIDC, ...) and either:

  • raise an error at startup if there exist entries of users (or logins) whose login system does not match the one being configured (modulo we cannot use 2 different login system at the same time);
  • change the query to discriminate the records that don't have the same login systems than the one used to authenticate the user;

?

We can imagine, if that's reasonable, to set the value of the login_system (or whatever) field to DiscourseConnect for each record having a value for connect_id (if that's the only case we expect this field populated).

@georgegevoian
Copy link
Contributor

Thanks for the information @georgegevoian! 🙏

The connect_id column and External login method were originally added as part of the implementation for GristConnect.

Regarding the connect_id column which seems to be populated with the external_id of the DiscourseConnect endpoint. This field is documented as being:

external_id is any string unique to the user that will never change, even if their email, name, etc change. The suggested value is your database’s ‘id’ row number.

I feel like the meaning of external_id is the same as sub for OIDC.

Since they're already being used for a different authentication method, I think it would be safer to add a new column and login method, to avoid unintended consequences in installations that have used both OIDC and GristConnect

If I understand correctly, you talk about a case where an admin has switched from one to the other without purging the users table. OIDC requiring sub to just be locally unique (source), that's a very good catch!

Instead of introducing another column which would have much an analogous role, and having to duplicate much of the logic of ensureExternalUser, would it make sense to rather store the login system used to authenticate the user (DiscourseConnect, SAML, OIDC, ...) and either:

  • raise an error at startup if there exist entries of users (or logins) whose login system does not match the one being configured (modulo we cannot use 2 different login system at the same time);
  • change the query to discriminate the records that don't have the same login systems than the one used to authenticate the user;

?

We can imagine, if that's reasonable, to set the value of the login_system (or whatever) field to DiscourseConnect for each record having a value for connect_id (if that's the only case we expect this field populated).

Agree that duplication isn't ideal.

Related to your idea: currently, the users and logins tables effectively have a one-to-one relationship (in reality, it's defined as one-to-many in the model, but as you've probably already seen in places like User.loginEmail(), there's a tendency to do [0] when indexing into logins). It's easy to imagine an approach where more than one login can exist for a given user. Then, connect_id could live in logins instead of users, making it possible to be populated by different login methods, from which your idea to use the login for the system that authenticated the user may follow.

Something like that idea was something we considered in the past (for a feature to improve account linking), but since it was a fairly big change with a lot of cases to think through (for example, how to pick which email to use for loginEmail, since they won't necessarily be the same across all login methods), we set it aside for later. Perhaps now is the right time to do it.

@fflorent
Copy link
Collaborator Author

fflorent commented Nov 28, 2023

for example, how to pick which email to use for loginEmail, since they won't necessarily be the same across all login methods

I see! We may still be able to do a not-so-big refactoring by:

  • keeping that one-login-system at startup (we cannot mix, for example, OIDC and DiscourseConnect);
  • storing what login system is being used somewhere in the code;
  • make the tuple user_id + login_system unique;
  • join users table with logins using user_id in the join pivot but also filtering the logins records so they match with the currently used login system;

If that make sense, I can start working on it. My biggest apprehension is about the regressions, I don't know how easy it is to find every place to adapt for this work (maybe searching for every login[ occurrences?).

PS: Also i guess that we would need to change the UNIQUE(email) constraint to UNIQUE(email, login_system) for the logins table. As a user may use the same email address in 2 different login systems.

@dsagal
Copy link
Member

dsagal commented Dec 3, 2023

I tried to understand sub a little better. Am I right in understanding that even before userinfo() call, one could use tokenSet.claims() to get ID Token claims? These would include iss (issuer) and sub (subject), which together should identify a login uniquely.

I don't think we can introduce an assumption of a single login system. Grist SaaS allows two ways to log in already (with email + password, and with Google). As far as the structure of logins, @georgegevoian's idea to add the identifier to the logins table makes sense -- I imagine adding iss and sub (we've also referred to these elsewhere as authProvider and authSubject). But it's harder to think through the consequences of allowing the email address to change.

@fflorent
Copy link
Collaborator Author

fflorent commented Dec 11, 2023

But it's harder to think through the consequences of allowing the email address to change.

As you seem concerned about such a change, I guess that's not a simple task.

At the beginning I thought I could manage to implement that quickly (or more or less), however it looks like more ambitious that I thought.

May I close the PR? (letting me or anyone reopen-it later when the issue becomes relevant to address for they)

@paulfitz
Copy link
Member

May I close the PR? (letting me or anyone reopen-it later when the issue becomes relevant to address for they)

Hi @fflorent, ok to close the PR. Thanks for having taken a shot at this.

@fflorent
Copy link
Collaborator Author

Thanks for your feedback @dsagal and @paulfitz

@fflorent fflorent closed this Dec 19, 2023
@vviers vviers mentioned this pull request Sep 23, 2024
2 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Using OIDC, the email is used to identify the user instead of the 'sub' claim
4 participants