Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

user_id format and uniqueness guarantees #404

Closed
soxofaan opened this issue Jun 29, 2021 · 15 comments
Closed

user_id format and uniqueness guarantees #404

soxofaan opened this issue Jun 29, 2021 · 15 comments
Assignees
Labels
documentation user management incl. authorization, account management and billing
Milestone

Comments

@soxofaan
Copy link
Member

openeo-api/openapi.yaml

Lines 5288 to 5295 in f303d65

user_id:
type: string
description: >-
Unique identifier of the user. MUST match the specified pattern.
This is usually a randomly generated internal identifier from the provider
not meant for displaying purposes.
pattern: '^[\w\-\.~]+$'
example: john_doe

generated internal identifier from the provider

In case of OpenID Connect in general and EGI Check-in in particular (as used for openEO Platform), the only guaranteed claim that is usable as unique identifier is the sub claim, which is not guaranteed to follow the regex ^[\w\-\.~]+$.
For example, EGI Check-in returns an email-looking sub, e.g. [email protected].

The only official specification I found so far for the sub claim is in https://openid.net/specs/openid-connect-core-1_0.html#UserInfoResponse:

It MUST NOT exceed 255 ASCII characters in length.

Which is pretty liberal, and could be even read as "any byte string of at most 255 bytes will do". No real-world OIDC provider will probably use weird ASCII control characters, but they still might use normal characters that cause trouble when used in certain parts of the API (URL paths, process namespaces, ...) because there are used as separators or something like that (/, :, @ ...)

Another related, but smaller, more theoretical problem is that openEO back-ends are allowed to support multiple OIDC providers, which could break the uniqueness of the user ids across multiple providers (and/or basic authentication).

@m-mohr
Copy link
Member

m-mohr commented Jun 29, 2021

When writing this I expected that the back-ends have separate user storage databases or similar where they create new users once a new user registers (logins) via an external IP. In this case, a valid ID can be generated. Is that not true here? You need to store user-related data (name? contact details? billing? plan? settings? ...) anyway or do you directly store all user-related (meta-)data with EGI?

@soxofaan
Copy link
Member Author

The API spec currently states vaguely

This is usually a randomly generated internal identifier from the provider

So the problem is not in the API spec itself, and it's up to the back-end implementer to workaround the problems mentioned above, but it's still a bit misleading and hiding some implementation details that are worth mentioning.

@m-mohr
Copy link
Member

m-mohr commented Jun 29, 2021

This is usually a randomly generated internal identifier from the provider

This is actually not an ideal remark in the spec. Those user IDs were meant to be used in the future for sharing etc and thus something "readable" would be great, like @soxofaan or @m-mohr here on GitHub. Maybe we should change the wording with regards to that. I doubt we should use the OIDC subs for such things.

@soxofaan
Copy link
Member Author

separate user storage databases or similar where they create new users once a new user registers (logins) via an external IP. In this case, a valid ID can be generated. Is that not true here?

Probably but not necessarily (e.g. the current version of openeo-aggregator backend is stateless and just uses/forwards user identification tokens provided by OIDC).
Also, the whole point of OIDC is to offload all the challenges of user management to an OIDC provider, and if that OIDC provider provides a reliable unique user identifier why couldn't a backend use that directly?

@m-mohr
Copy link
Member

m-mohr commented Jun 29, 2021

Probably but not necessarily (e.g. the current version of openeo-aggregator backend is stateless and just uses/forwards user identification tokens provided by OIDC). Also, the whole point of OIDC is to offload all the challenges of user management to an OIDC provider ...

Yes, but the aggregator just proxies to other back-ends, which itself have to handle users somehow. I mean you need to check whether someone has paid their bills or store billing info etc. That's usually something you don't do via an IP (EGI might be an exception here, but think about login via Google, FB, GH etc).

if that OIDC provider provides a reliable unique user identifier why couldn't a backend use that directly?

Sure, if that's a good identifier that complies to the rules you can use it. But what happens if I delete my account at EGI and want to use another auth provider instead? That happens from time to time if you login via multiple IP providers. At least there are setups like that.

@soxofaan
Copy link
Member Author

This is usually a randomly generated internal identifier from the provider
This is actually not an ideal remark in the spec.

Actually: the "provider" in that statement: does that refer to OIDC provider or to the openEO back-end as provider?

Those user IDs were meant to be used in the future for sharing etc and thus something "readable" would be great, ... I doubt we should use the OIDC subs for such things.

note that the description currently also states this as well:

not meant for displaying purposes.

@m-mohr
Copy link
Member

m-mohr commented Jun 29, 2021

Actually: the "provider" in that statement: does that refer to OIDC provider or to the openEO back-end as provider?

back-end

note that the description currently also states this as well:

not meant for displaying purposes.

Yeah, that was not written very well. What I meant to say here that for the response { user_id 'jon_doe123', name: 'John Doe'} the UIs should display 'John Doe" and not the id.

@soxofaan
Copy link
Member Author

To give a bit of context why I'm bringing this up: I am working on UDP sharing (just simple non-fine-grained public sharing) so I am looking for some kind of user handle that is guaranteed to be unique and does not mess with the syntax of URLs, namespaces, ...

@m-mohr
Copy link
Member

m-mohr commented Jun 29, 2021

So it was meant to be (and should be clarified to be?) something like:

A unique user identifier at the back-end provider. It is meant to be used as an identifier in URIs (e.g. for sharing purposes). To display a name in a UI where the actual name of the user should be shown, prefer to show the name instead of the id.

@m-mohr
Copy link
Member

m-mohr commented Jun 29, 2021

For your use-case, the user id was meant to be used, indeed. That's why the regex is in place.

@soxofaan
Copy link
Member Author

Actually: the "provider" in that statement: does that refer to OIDC provider or to the openEO back-end as provider?
back-end

ok good to know, I was reading it as OIDC provider

@m-mohr
Copy link
Member

m-mohr commented Jun 29, 2021

Okay, I'll prepare a PR that will clarify this. Probably an improved version of what I've quickly drafted an posted above: #404 (comment)

@m-mohr m-mohr added this to the 1.2.0 milestone Jun 29, 2021
@m-mohr m-mohr self-assigned this Jun 29, 2021
@m-mohr m-mohr added documentation user management incl. authorization, account management and billing labels Jun 29, 2021
@soxofaan
Copy link
Member Author

Thanks for the feedback and clarification.

Overall I think there is some kind of missing link in the API spec or its documentation:

  • if I understand correctly: a back-end implementation SHOULD have some kind of user registration mechanism:
    • in addition to standard OIDC authentication/authorization, where at least some kind of "user id" is constructed and possibly other properties (billing, privacy agreements, ...) are set up
    • (or to support basic authentication)
  • how and when should this happen (before first usage of a backend, automatically on first access, ...)?

Of course that whole additional registration flow should not be part of the openEO API,
but apparently some aspects should be specified or documented in some way, e.g.

  • the fact that there SHOULD be some kind of additional registration (e.g. not only for technical but also for legal reasons)
  • user_id regex
  • the behavior of API when user is authenticated with OIDC but not yet "registered" with back-end? Should there be HTTP forwarding mechanisms or dedicated error codes so that clients can streamline the user experience?

@m-mohr
Copy link
Member

m-mohr commented Jun 29, 2021

  • if I understand correctly: a back-end implementation SHOULD have some kind of user registration mechanism:

Depends on the back-end structure. If there's an existing user pool, it could be that no additional things are required (example: GEE).

By the way, have you read https://open-eo.github.io/openeo-api/draft/index.html#tag/Account-Management ? It also mentions OpenID Connect User Registration (although that's more useful if back-end = IP).

  • how and when should this happen (before first usage of a backend, automatically on first access, ...)?

Depends. back-ends are free to choose here. What probably doesn't work very well in openEO in contrast to other OIDC services is the "on-demand" registration once a new OIDC sub is approaching the back-end. Simply not possible because we have multiple clients and the clients don't know how the registration for the back-end should work or look like. So a user must be either automatically registered or register before the back-end is actually used. The API does not mandate any behavior here.

  • the fact that there SHOULD be some kind of additional registration (e.g. not only for technical but also for legal reasons)

see the link above

  • user_id regex

What's the to-do here?

  • the behavior of API when user is authenticated with OIDC but not yet "registered" with back-end? Should there be HTTP forwarding mechanisms or dedicated error codes so that clients can streamline the user experience?

see above

m-mohr added a commit that referenced this issue Jul 27, 2021
@m-mohr
Copy link
Member

m-mohr commented Jul 27, 2021

The commit above adds a clarification for user_id.

Additionally, new Relation types were added:

  • GET /: create-form to link to the registration page
  • GET /: recovery-form to link to the credentials recovery page.
  • GET /me: New Relation types alternate and related for user-specific external pages. #404

I hope this covers (most/all) aspects of this issue. Otherwise, please let me know.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation user management incl. authorization, account management and billing
Projects
None yet
Development

No branches or pull requests

2 participants