
Add some kind of "document type" field to "/" (capabilities) #289

Closed
soxofaan opened this issue Jun 8, 2020 · 18 comments

Comments

@soxofaan
Member

soxofaan commented Jun 8, 2020

@jdries fixed Open-EO/openeo-python-client#142 (.well-known/openeo discovery), but I was wondering some more about it: couldn't we add a required field (e.g. "doctype") with a fixed value (e.g. "openeo-capabilities") to the capabilities document? Then a client could easily detect, without heuristics, that a given URL is the root of the API. Currently there is no explicit way to detect whether a returned document is the capabilities document, a well-known document, or something else.

@m-mohr
Member

m-mohr commented Jun 8, 2020

Why do you need to know that? Shouldn't that be clear from context?

The requested URL contains /.well-known/openeo => well-known document (or invalid response)
Otherwise: Capabilities including the required fields such as api_version (or invalid response)

@soxofaan
Member Author

soxofaan commented Jun 8, 2020

Otherwise: Capabilities including the required fields such as api_version (or invalid response)

That's indeed the only thing we can currently do.

But IMHO the presence of an api_version field is a very weak indicator that we are handling an openEO capabilities document. You can of course add other heuristics, like checking for stac_version or endpoints, but those are still weak indicators.

E.g. when one tries to connect to https://example.com (an example from the docs) and the response has a JSON field "api_version", is that enough of an indicator that we are already handling the capabilities document? Or should the client first inspect https://example.com/.well-known/openeo and work from there?
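To illustrate the difference, here is a minimal Python sketch contrasting a key-presence heuristic with an explicit type claim. The key set is an assumption based on the fields mentioned above, and the `doctype` check only models the field proposed in this issue; it is not part of the openEO API.

```python
from typing import Any, Mapping

# Assumed heuristic: keys a capabilities document is expected to contain.
CAPABILITIES_HINT_KEYS = ("api_version", "stac_version", "endpoints")


def looks_like_capabilities(doc: Mapping[str, Any]) -> bool:
    """Heuristic: only checks that the expected keys are present."""
    return all(key in doc for key in CAPABILITIES_HINT_KEYS)


def claims_capabilities(doc: Mapping[str, Any]) -> bool:
    """Hypothetical explicit claim via the proposed "doctype" field."""
    return doc.get("doctype") == "openeo-capabilities"


# Any JSON document that happens to carry these keys passes the heuristic.
unrelated = {"api_version": "2", "stac_version": "x", "endpoints": []}
print(looks_like_capabilities(unrelated))  # True
print(claims_capabilities(unrelated))      # False
```

The heuristic cannot tell a real capabilities document from an unrelated one with overlapping keys, which is the weakness described above.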

@m-mohr
Member

m-mohr commented Jun 8, 2020

Or should the client first inspect https://example.com/.well-known/openeo and work from there?

Yes. The intended behavior is:

  1. User inputs the back-end URL, which is meant to be the URL of the well-known document minus /.well-known/openeo. That usually makes for nice, easy-to-remember URLs such as earthengine.openeo.org or openeo.vito.be.
  2. Client appends /.well-known/openeo and requests the details.
  3. From there, clients by default connect to the most recent production-ready version (or users can choose a version if they want).
  4. Client retrieves the Capabilities document from the URL in the well-known document.

In this case there should be no issues, and no such "document type" field is required.

If that workflow doesn't work because a user directly used a versioned URL (e.g. earthengine.openeo.org/v1.0, which should never happen), then the client should ask for the well-known document there, see that it doesn't exist (404), and then expect that the URL leads to a Capabilities document. If that document can't be read, the URL should be rejected.
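The four steps and the 404 fallback can be sketched in Python roughly as follows. This is a minimal sketch under assumptions, not the client's actual implementation: `fetch_json` is a hypothetical helper returning the parsed JSON body, or `None` for a 4xx status; version ordering is simplified (numeric parts, pre-releases sorted below releases); and version selection falls back to the most recent non-production version, as agreed later in this thread.

```python
from typing import Any, Callable, Dict, List, Optional

# Hypothetical helper type: returns parsed JSON, or None on a 4xx response.
FetchJson = Callable[[str], Optional[Dict[str, Any]]]


def version_key(api_version: str):
    # "1.0.0-rc.2" -> ((1, 0, 0), False); releases sort above pre-releases.
    main, _, pre = api_version.partition("-")
    return tuple(int(part) for part in main.split(".")), pre == ""


def pick_version(versions: List[Dict[str, Any]]) -> Dict[str, Any]:
    # Prefer production-ready versions (a missing flag counts as
    # non-production here, an assumption); otherwise consider all versions.
    production = [v for v in versions if v.get("production", False)]
    candidates = production or versions
    return max(candidates, key=lambda v: version_key(v["api_version"]))


def connect(backend_url: str, fetch_json: FetchJson) -> Dict[str, Any]:
    base = backend_url.rstrip("/")
    # Steps 1-2: request the well-known document under the user-supplied URL.
    well_known = fetch_json(base + "/.well-known/openeo")
    if well_known is None:
        # Fallback: no well-known document, expect capabilities at the URL itself.
        api_root = base
    else:
        # Step 3: pick a version from the well-known document.
        api_root = pick_version(well_known["versions"])["url"]
    # Step 4: retrieve the Capabilities document; reject the URL if unreadable.
    capabilities = fetch_json(api_root)
    if capabilities is None or "api_version" not in capabilities:
        raise ValueError(f"No openEO capabilities document at {api_root}")
    return capabilities
```

Injecting `fetch_json` keeps the flow independent of any particular HTTP library and makes it easy to test with canned responses.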

@soxofaan
Member Author

soxofaan commented Jun 9, 2020

I understand the high-level flow, but there are some details that make things shaky.

user directly used a versioned url

What does this mean in practice? That the URL has numbers in it? That the URL has a non-empty path component (e.g. https://example.com/foo/bar)?
What if a backend provides the API root at the empty path /?

For example: the EURAC backend currently has this .well-known/openeo:

{"versions": [
    {
        "production": false,
        "api_version": "0.4.2",
        "url": "https://openeo.eurac.edu"
    },
    {
        "production": false,
        "api_version": "0.3.1",
        "url": "http://saocompute.eurac.edu/openEO_0_3_0/openeo/"
    }
]}

Note that both have "production": false.

So a client that uses the flow you describe won't be able to connect to https://openeo.eurac.edu: there is no production-ready version in .well-known/openeo, and https://openeo.eurac.edu is not a "versioned" URL, so the client should not assume it can connect directly.

@m-mohr
Member

m-mohr commented Jun 9, 2020

What does this mean in practice? That the URL has numbers in it? That the URL has a non-empty path component (e.g. https://example.com/foo/bar)?

No. "Versioned URL" is maybe a bit misleading, but it means you can't access a well-known document for the given URL (or the client already knows it is a versioned URL because it's listed in the well-known document).

What if a backend provides the API root at the empty path /?

Not a problem. You connect to ex.com/.well-known/openeo, which lists ex.com/ as a versioned URL, and then you can read the Capabilities there.

For example: the EURAC backend currently has this .well-known/openeo:
[...]
Note that both have "production": false.
So a client that uses the flow you describe won't be able to connect to https://openeo.eurac.edu because there is no production ready version in .well-known/openeo

Then use the most recent non-production version. I really thought that was part of the OpenAPI description, but it's not. I'll fix that.

https://openeo.eurac.edu is not a "versioned" url, so the client should not assume it can connect directly.

It is a versioned URL. "Versioned URL", as described above, is a bit misleading here. It just means there's an openEO API compliant with a specific version behind it.

@soxofaan
Member Author

soxofaan commented Jun 9, 2020

"versioned url" ... means you can't access a well-known document for the given url (or the client is already aware of it being a versioned url as it's listed in the well-known document)

Versioned URL, ... It just means there's an openEO API compliant to a specific version behind it.

Hmm, I'm still confused by these recursive/circular definitions.

E.g. the https://openeo.eurac.edu example: following your first rule it is not versioned, because you can access a well-known document for it; following your second rule it is versioned, because there is a specific version behind it.

Also, see Open-EO/openeo-python-driver#40: e.g. we want the VITO backend to support API root URLs that are not exposed in the well-known document (e.g. for legacy/custom use cases that require a pinned-down version). How do you detect that a URL is "versioned" if it is not described in the well-known document? If there were something like a "doctype" field in the capabilities response, there would be an explicit claim to check that we are working with a capabilities response, with no need for brittle heuristics.

@m-mohr
Member

m-mohr commented Jun 9, 2020

Okay, forget my definitions above, it seems they were not 100% spot on. Let's go through two examples here:

Each back-end has a URL that is used to connect to the back-end. Depending on the implementation it may be "virtual" and not return any openEO API specific response. In this example we use:

Clients can use both of these URLs through the (always unversioned!) well-known discovery mechanism:

Those URLs comply with all API versions (except 0.3, as the mechanism was introduced later).

From the well-known discovery you get to know the versioned URLs that return responses compliant to a specific openEO API version, for example:

All these are versioned URLs as they return a document from the openEO API, which is specific to a version.

So there are basically three types of URLs:

  • "virtual" URL used by a user in a client to connect to a back-end
  • versioned URLs returning responses compliant with a specific openEO API version (i.e. everything bundled under the capabilities document, which the well-known document doesn't belong to)
  • well-known URL: an (unversioned) URL used for well-known discovery, returning the versioned URLs

So yes, for historical reasons https://openeo.eurac.edu is unfortunately used for multiple things. These URL types are not necessarily mutually exclusive.

How do you detect that a url is "versioned" if it is not described in the well-known document?

Let's say your non-exposed instance is running at https://openeo.vito.be/openeo/2.0/. Your user passes this URL to the client, and it tries (as always) first to connect to https://openeo.vito.be/openeo/2.0/.well-known/openeo. It receives a 4xx HTTP status code, so it can assume the user wants to connect directly to the unexposed document, and it should then read https://openeo.vito.be/openeo/2.0/ and expect a Capabilities response. To clarify, connecting to non-exposed back-ends is not recommended, and therefore the API doesn't cater for this use case. If you follow this behavior (also described above), you shouldn't run into issues. At least it's working in the other clients.

If there would be something like a "doctype" field in the capabilities response, then there is a explicit claim to check that we are working with a capabilities response, no need to brittle heuristics.

That doesn't work, as your legacy back-end doesn't implement doctype. We can't add that field to an old API release.

I hope this makes it clear; otherwise I'd suggest hopping on a quick call later.

@m-mohr
Member

m-mohr commented Jun 9, 2020

Maybe this issue is related to the Python implementation. I just had a look at Open-EO/openeo-python-client@a126571, and it seems you are doing the connect process the other way round: it first checks for capabilities and then for well-known discovery. It should be the reverse: first check the well-known discovery, then call capabilities.

@jdries

jdries commented Jun 9, 2020

That was only the first commit :-), I fixed it right after that:
https://github.com/Open-EO/openeo-python-client/blob/d4ad052ed7b39a0929afac3b06480b2d4a1d18f6/openeo/rest/connection.py#L318

m-mohr added a commit that referenced this issue Jun 9, 2020
@m-mohr
Member

m-mohr commented Jun 9, 2020

So a client that uses the flow you describe won't be able to connect to https://openeo.eurac.edu because there is no production ready version in .well-known/openeo

Then use the most recent non-production version. I really thought that's part of the openAPI description, but it's not. I'll fix that.

Fixed that in 83af45d

@soxofaan
Member Author

soxofaan commented Jun 9, 2020

to the client and it tries (as always) first to connect to https://openeo.vito.be/openeo/2.0/.well-known/openeo

Ah, this might also contribute to my confusion: my understanding of RFC 5785 (well-known URIs) is that the path component of these URLs must start with /.well-known and cannot be prefixed by e.g. /openeo/2.0. Also see https://tools.ietf.org/html/rfc5785 Appendix B: FAQ "Why aren't per-directory well-known locations defined?"

This property is also explicitly stressed in the openeo docs https://openeo.org/documentation/1.0/developers/api/reference.html#operation/connect .
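The two interpretations can be contrasted with Python's standard `urllib.parse.urljoin`: joining an absolute path reference resolves against the host root, which matches the RFC 5785 location, while plain string concatenation yields the per-directory location quoted above. The function names are illustrative only.

```python
from urllib.parse import urljoin


def rfc5785_well_known(url: str) -> str:
    # An absolute path reference replaces the whole path of the base URL.
    return urljoin(url, "/.well-known/openeo")


def per_directory_well_known(url: str) -> str:
    # Appending to the existing path: not a standard well-known location.
    return url.rstrip("/") + "/.well-known/openeo"


url = "https://openeo.vito.be/openeo/2.0/"
print(rfc5785_well_known(url))        # https://openeo.vito.be/.well-known/openeo
print(per_directory_well_known(url))  # https://openeo.vito.be/openeo/2.0/.well-known/openeo
```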

Is it then intended that a client should blindly request a non-standard .well-known URL?
Or can it take a shortcut and only use .well-known in a standard way, e.g. in pseudocode:

if $url has path component "/":
    # "Automatic version discovery" mode
    request $url/.well-known/openeo
    $api_root = highest production version (or highest version when all are non-production)
else:
    # "User picks API version" mode
    $api_root = $url
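A runnable Python rendering of this pseudocode might look as follows. It is a sketch under assumptions: "has path component /" is read as an empty or root-only path, `fetch_well_known` is a hypothetical helper returning the parsed well-known document, and "highest version" only compares the numeric part of `api_version`, ignoring pre-release suffixes.

```python
from typing import Any, Callable, Dict
from urllib.parse import urlparse


def _numeric_version(api_version: str):
    # Simplification: "1.0.0-rc.2" -> (1, 0, 0); pre-release suffixes ignored.
    return tuple(int(p) for p in api_version.partition("-")[0].split("."))


def resolve_api_root(url: str, fetch_well_known: Callable[[str], Dict[str, Any]]) -> str:
    if urlparse(url).path in ("", "/"):
        # "Automatic version discovery" mode
        versions = fetch_well_known(url.rstrip("/") + "/.well-known/openeo")["versions"]
        production = [v for v in versions if v.get("production", False)]
        # Highest production version, or highest version when all are non-production.
        candidates = production or versions
        best = max(candidates, key=lambda v: _numeric_version(v["api_version"]))
        return best["url"]
    # "User picks API version" mode
    return url
```

Compared with always probing `<url>/.well-known/openeo`, this never requests a per-directory well-known location, at the cost of not supporting back-ends hosted under a sub-path.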

(Also note that the openEO docs use the wrong RFC number in the text: "Well-Known URI (see RFC 57855) for"; the link is correct, however.)

m-mohr added a commit that referenced this issue Jun 9, 2020
@m-mohr
Member

m-mohr commented Jun 9, 2020

Yes, https://openeo.vito.be/openeo/2.0/.well-known/openeo should be an invalid URL and should return 404, which is in line with RFC 5785, I think. However, I can't prevent providers from hosting the well-known document outside the root if there are reasons for it (e.g. not having direct access to a domain, see below).

Is it then intended that a client should blindly do request to non-standard .well-known url?

Yes, I blindly request it (JS client).

Or can it take shortcut, and only use .well-known in a standard way, e.g. in pseudo code:

Not sure. That really depends on whether all API implementations can support this; e.g. for us as a university it's always a stretch to get new domains, so services are usually hosted in sub-directories. I don't have a strong opinion here; there are pros and cons on both sides.

I fixed the RFC number.

@soxofaan
Member Author

Yes, https://openeo.vito.be/openeo/2.0/.well-known/openeo should be an invalid URL and should return 404, which is in-line with RFC 5785,

FYI: the EODC backend apparently doesn't fail on this currently:

$ curl https://openeo.eodc.eu/v1.0/.well-known/openeo
{
  "versions": [
    {
      "api_version": "1.0.0-rc.2", 
      "production": false, 
      "url": "https://openeo.eodc.eu/v1.0"
    }, 
    {
      "api_version": "0.4.2", 
      "production": false, 
      "url": "https://openeo.eodc.eu/v0.4"
    }
  ]
}

@m-mohr
Member

m-mohr commented Jun 10, 2020

I'm aware of that, but that's a back-end issue to be fixed by EODC. @lforesta

@lforesta

@m-mohr I missed that, we'll fix it soon, thanks.

@soxofaan
Member Author

Another data point: the Microsoft identity platform (currently used by EURAC) uses well-known URLs outside of the domain root for the OIDC discovery document, e.g. for EURAC: https://login.microsoftonline.com/92513267-03e3-401a-80d4-c58ed6674e3b/v2.0/.well-known/openid-configuration

@m-mohr
Member

m-mohr commented Jun 16, 2020

Yeah, that's logical, as they host multiple instances and you can't have that in a single location. I could imagine hiding that behind a subdomain in the settings or so, but on the other hand it doesn't really matter whether it's at the root or not. Same for openEO: if all clients follow the same procedure, it shouldn't be an issue. By the way, is this issue now resolved, and can it be closed?

@soxofaan
Member Author

Yes, it's fine.

4 participants