user_id format and uniqueness guarantees #404

soxofaan · 2021-06-29T14:11:15Z

Lines 5288 to 5295 in f303d65

    
               user_id: 
        
                 type: string 
        
                 description: >- 
        
                   Unique identifier of the user. MUST match the specified pattern. 
        
                   This is usually a randomly generated internal identifier from the provider 
        
                   not meant for displaying purposes. 
        
                 pattern: '^[\w\-\.~]+$' 
        
                 example: john_doe

generated internal identifier from the provider

In case of OpenID Connect in general and EGI Check-in in particular (as used for openEO Platform), the only guaranteed claim that is usable as unique identifier is the sub claim, which is not guaranteed to follow the regex ^[\w\-\.~]+$.
For example, EGI Check-in returns an email-looking sub, e.g. [email protected].

The only official specification I found so far for the sub claim is in https://openid.net/specs/openid-connect-core-1_0.html#UserInfoResponse:

It MUST NOT exceed 255 ASCII characters in length.

Which is pretty liberal, and could be even read as "any byte string of at most 255 bytes will do". No real-world OIDC provider will probably use weird ASCII control characters, but they still might use normal characters that cause trouble when used in certain parts of the API (URL paths, process namespaces, ...) because there are used as separators or something like that (/, :, @ ...)

Another related, but smaller, more theoretical problem is that openEO back-ends are allowed to support multiple OIDC providers, which could break the uniqueness of the user ids across multiple providers (and/or basic authentication).

The text was updated successfully, but these errors were encountered:

m-mohr · 2021-06-29T14:18:22Z

When writing this I expected that the back-ends have separate user storage databases or similar where they create new users once a new user registers (logins) via an external IP. In this case, a valid ID can be generated. Is that not true here? You need to store user-related data (name? contact details? billing? plan? settings? ...) anyway or do you directly store all user-related (meta-)data with EGI?

soxofaan · 2021-06-29T14:20:26Z

The API spec currently states vaguely

This is usually a randomly generated internal identifier from the provider

So the problem is not in the API spec itself, and it's up to the back-end implementer to workaround the problems mentioned above, but it's still a bit misleading and hiding some implementation details that are worth mentioning.

m-mohr · 2021-06-29T14:24:01Z

This is usually a randomly generated internal identifier from the provider

This is actually not an ideal remark in the spec. Those user IDs were meant to be used in the future for sharing etc and thus something "readable" would be great, like @soxofaan or @m-mohr here on GitHub. Maybe we should change the wording with regards to that. I doubt we should use the OIDC subs for such things.

soxofaan · 2021-06-29T14:29:07Z

separate user storage databases or similar where they create new users once a new user registers (logins) via an external IP. In this case, a valid ID can be generated. Is that not true here?

Probably but not necessarily (e.g. the current version of openeo-aggregator backend is stateless and just uses/forwards user identification tokens provided by OIDC).
Also, the whole point of OIDC is to offload all the challenges of user management to an OIDC provider, and if that OIDC provider provides a reliable unique user identifier why couldn't a backend use that directly?

m-mohr · 2021-06-29T14:33:52Z

Probably but not necessarily (e.g. the current version of openeo-aggregator backend is stateless and just uses/forwards user identification tokens provided by OIDC). Also, the whole point of OIDC is to offload all the challenges of user management to an OIDC provider ...

Yes, but the aggregator just proxies to other back-ends, which itself have to handle users somehow. I mean you need to check whether someone has paid their bills or store billing info etc. That's usually something you don't do via an IP (EGI might be an exception here, but think about login via Google, FB, GH etc).

if that OIDC provider provides a reliable unique user identifier why couldn't a backend use that directly?

Sure, if that's a good identifier that complies to the rules you can use it. But what happens if I delete my account at EGI and want to use another auth provider instead? That happens from time to time if you login via multiple IP providers. At least there are setups like that.

soxofaan · 2021-06-29T14:34:43Z

This is usually a randomly generated internal identifier from the provider
This is actually not an ideal remark in the spec.

Actually: the "provider" in that statement: does that refer to OIDC provider or to the openEO back-end as provider?

Those user IDs were meant to be used in the future for sharing etc and thus something "readable" would be great, ... I doubt we should use the OIDC subs for such things.

note that the description currently also states this as well:

not meant for displaying purposes.

m-mohr · 2021-06-29T14:36:57Z

Actually: the "provider" in that statement: does that refer to OIDC provider or to the openEO back-end as provider?

back-end

note that the description currently also states this as well:

not meant for displaying purposes.

Yeah, that was not written very well. What I meant to say here that for the response { user_id 'jon_doe123', name: 'John Doe'} the UIs should display 'John Doe" and not the id.

soxofaan · 2021-06-29T14:39:00Z

To give a bit of context why I'm bringing this up: I am working on UDP sharing (just simple non-fine-grained public sharing) so I am looking for some kind of user handle that is guaranteed to be unique and does not mess with the syntax of URLs, namespaces, ...

m-mohr · 2021-06-29T14:39:03Z

So it was meant to be (and should be clarified to be?) something like:

A unique user identifier at the back-end provider. It is meant to be used as an identifier in URIs (e.g. for sharing purposes). To display a name in a UI where the actual name of the user should be shown, prefer to show the name instead of the id.

m-mohr · 2021-06-29T14:39:42Z

For your use-case, the user id was meant to be used, indeed. That's why the regex is in place.

soxofaan · 2021-06-29T14:40:57Z

Actually: the "provider" in that statement: does that refer to OIDC provider or to the openEO back-end as provider?
back-end

ok good to know, I was reading it as OIDC provider

m-mohr · 2021-06-29T14:43:11Z

Okay, I'll prepare a PR that will clarify this. Probably an improved version of what I've quickly drafted an posted above: #404 (comment)

soxofaan · 2021-06-29T17:28:10Z

Thanks for the feedback and clarification.

Overall I think there is some kind of missing link in the API spec or its documentation:

if I understand correctly: a back-end implementation SHOULD have some kind of user registration mechanism:
- in addition to standard OIDC authentication/authorization, where at least some kind of "user id" is constructed and possibly other properties (billing, privacy agreements, ...) are set up
- (or to support basic authentication)
how and when should this happen (before first usage of a backend, automatically on first access, ...)?

Of course that whole additional registration flow should not be part of the openEO API,
but apparently some aspects should be specified or documented in some way, e.g.

the fact that there SHOULD be some kind of additional registration (e.g. not only for technical but also for legal reasons)
user_id regex
the behavior of API when user is authenticated with OIDC but not yet "registered" with back-end? Should there be HTTP forwarding mechanisms or dedicated error codes so that clients can streamline the user experience?

m-mohr · 2021-06-29T18:38:50Z

if I understand correctly: a back-end implementation SHOULD have some kind of user registration mechanism:

Depends on the back-end structure. If there's an existing user pool, it could be that no additional things are required (example: GEE).

By the way, have you read https://open-eo.github.io/openeo-api/draft/index.html#tag/Account-Management ? It also mentions OpenID Connect User Registration (although that's more useful if back-end = IP).

how and when should this happen (before first usage of a backend, automatically on first access, ...)?

Depends. back-ends are free to choose here. What probably doesn't work very well in openEO in contrast to other OIDC services is the "on-demand" registration once a new OIDC sub is approaching the back-end. Simply not possible because we have multiple clients and the clients don't know how the registration for the back-end should work or look like. So a user must be either automatically registered or register before the back-end is actually used. The API does not mandate any behavior here.

the fact that there SHOULD be some kind of additional registration (e.g. not only for technical but also for legal reasons)

see the link above

user_id regex

What's the to-do here?

the behavior of API when user is authenticated with OIDC but not yet "registered" with back-end? Should there be HTTP forwarding mechanisms or dedicated error codes so that clients can streamline the user experience?

see above

…ry and other user account related tasks #404

m-mohr · 2021-07-27T15:11:03Z

The commit above adds a clarification for user_id.

Additionally, new Relation types were added:

GET /: create-form to link to the registration page
GET /: recovery-form to link to the credentials recovery page.
GET /me: New Relation types alternate and related for user-specific external pages. #404

I hope this covers (most/all) aspects of this issue. Otherwise, please let me know.

…ry and other user account related tasks #404

m-mohr added this to the 1.2.0 milestone Jun 29, 2021

m-mohr self-assigned this Jun 29, 2021

m-mohr added documentation user management incl. authorization, account management and billing labels Jun 29, 2021

m-mohr added a commit that referenced this issue Jul 27, 2021

Clarify user_id, add relation types for registration, password recove…

da78f61

…ry and other user account related tasks #404

m-mohr closed this as completed Jul 27, 2021

m-mohr mentioned this issue Jul 27, 2021

Add support for new account related relation types Open-EO/openeo-web-editor#182

Closed

m-mohr added a commit that referenced this issue Jul 27, 2021

Clarify user_id, add relation types for registration, password recove…

b5a73fc

…ry and other user account related tasks #404

This was referenced May 3, 2023

Release v1.2.0 #492

Merged

Release openEO API v1.2.0 Open-EO/PSC#20

Closed

zcernigoj mentioned this issue Nov 27, 2023

openeo api version hardcoded to 1.0.0 Open-EO/openeo-sentinelhub-python-driver#56

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

user_id format and uniqueness guarantees #404

user_id format and uniqueness guarantees #404

soxofaan commented Jun 29, 2021

m-mohr commented Jun 29, 2021 •

edited

Loading

soxofaan commented Jun 29, 2021

m-mohr commented Jun 29, 2021 •

edited

Loading

soxofaan commented Jun 29, 2021

m-mohr commented Jun 29, 2021 •

edited

Loading

soxofaan commented Jun 29, 2021

m-mohr commented Jun 29, 2021 •

edited

Loading

soxofaan commented Jun 29, 2021

m-mohr commented Jun 29, 2021

m-mohr commented Jun 29, 2021 •

edited

Loading

soxofaan commented Jun 29, 2021

m-mohr commented Jun 29, 2021

soxofaan commented Jun 29, 2021

m-mohr commented Jun 29, 2021 •

edited

Loading

m-mohr commented Jul 27, 2021 •

edited

Loading

user_id format and uniqueness guarantees #404

user_id format and uniqueness guarantees #404

Comments

soxofaan commented Jun 29, 2021

m-mohr commented Jun 29, 2021 • edited Loading

soxofaan commented Jun 29, 2021

m-mohr commented Jun 29, 2021 • edited Loading

soxofaan commented Jun 29, 2021

m-mohr commented Jun 29, 2021 • edited Loading

soxofaan commented Jun 29, 2021

m-mohr commented Jun 29, 2021 • edited Loading

soxofaan commented Jun 29, 2021

m-mohr commented Jun 29, 2021

m-mohr commented Jun 29, 2021 • edited Loading

soxofaan commented Jun 29, 2021

m-mohr commented Jun 29, 2021

soxofaan commented Jun 29, 2021

m-mohr commented Jun 29, 2021 • edited Loading

m-mohr commented Jul 27, 2021 • edited Loading

m-mohr commented Jun 29, 2021 •

edited

Loading

m-mohr commented Jun 29, 2021 •

edited

Loading

m-mohr commented Jun 29, 2021 •

edited

Loading

m-mohr commented Jun 29, 2021 •

edited

Loading

m-mohr commented Jun 29, 2021 •

edited

Loading

m-mohr commented Jun 29, 2021 •

edited

Loading

m-mohr commented Jul 27, 2021 •

edited

Loading