Returning errors for individual queries #69

Open
wetneb opened this issue Jun 3, 2021 · 7 comments

@wetneb
Member

wetneb commented Jun 3, 2021

Sometimes, reconciliation queries can be invalid, for all sorts of reasons:

  • their JSON structure does not fit the schema
  • they contain references to objects that do not exist (for instance, invalid property or type id)
  • some text fields could be too long?
  • the service could have a temporary failure in resolving that particular query

Because queries are generally sent in batches, we currently do not have a good way to return an error in those cases. The service can decide to return an HTTP 401 error (for instance) for the whole batch, but that is not very useful: the client does not know which of the queries caused the error, and it could otherwise have used the reconciliation results for the non-failing queries in the same batch.

So we should devise a way to expose errors for individual queries. It should mostly be about finding a JSON syntax for it and specifying it in https://reconciliation-api.github.io/specs/latest/#reconciliation-query-responses

Originally brought up here: wetneb/openrefine-wikibase#116

@osma
Contributor

osma commented Jun 3, 2021

Thanks for bringing this up @wetneb !

I suggest basing the JSON return format on the RFC 7807 application/problem+json specification. The simplest possible example is a JSON object with the HTTP status code and a human-readable title:

{
  "title": "Not Found",
  "status": 404
}

There can also be a detail field with more information such as "Requested resource could not be found". There are other predefined fields such as type and instance.

It is also possible to add custom fields. For the batch case, it could be useful to add a field whose value is an array of the individual errors within the batch (perhaps those could also be expressed as Problem JSON objects - a bit of recursion never hurts, eh?).
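
As a rough illustration of the batch idea, a Problem JSON object with a custom member holding the per-query problems might be built as follows. The member name `errors`, the `query_id` key, and both helper functions are assumptions made for this sketch; none of them come from RFC 7807 or the reconciliation spec.

```python
import json


def problem(title, status, detail=None, **extensions):
    """Build an RFC 7807-style problem object as a plain dict."""
    doc = {"title": title, "status": status}
    if detail is not None:
        doc["detail"] = detail
    doc.update(extensions)  # extension members, as RFC 7807 permits
    return doc


def batch_problem(failures):
    """Wrap per-query problems in one batch-level problem.

    `failures` maps a query id (e.g. "q1") to a problem dict.
    The "errors" member and the "query_id" key are hypothetical names.
    """
    return problem(
        "Bad Request",
        400,
        detail=f"{len(failures)} of the queries in the batch failed",
        errors=[dict(p, query_id=qid) for qid, p in sorted(failures.items())],
    )


example = batch_problem({
    "q1": problem("Bad Request", 400, detail="Missing 'query' key"),
    "q3": problem("Not Found", 404, detail="Unknown type id"),
})
print(json.dumps(example, indent=2))
```

Nesting the same object shape inside `errors` keeps a single parser for both levels, at the cost of the outer status code being only a summary.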

I've used Problem JSON in the Annif REST API, but only in a very simple form.

There is also zalando/problem, a Java library for handling Problem JSON objects which may be useful for some implementations.

@wetneb
Member Author

wetneb commented Jun 3, 2021

OK, so with this solution, if one query fails, we would not return query results for any of the other queries, right? I was initially thinking of a solution where you would be able to be more granular (return reconciliation results for the successful queries and errors for the unsuccessful ones), but perhaps that is not so clean and standard…

For instance, see ElasticSearch's bulk API, which supports that:
https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html

@osma
Contributor

osma commented Jun 3, 2021

Ah, that makes sense. Then perhaps the response for the whole batch could contain both results and errors like you say, and the errors could be represented as Problem JSON objects - or if that's not possible for some reason, then at least something very similar?
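
One way such a mixed response could look, sketched under assumptions: each query id maps either to the usual `result` list or to an `error` member holding a Problem JSON-like object. The `error` key, the `reconcile_batch` and `toy_reconcile` helpers, and the candidate values are all hypothetical and only serve to show the shape.

```python
def reconcile_batch(queries, reconcile_one):
    """Run each query independently and collect a result or a problem per query id."""
    response = {}
    for qid, query in queries.items():
        try:
            response[qid] = {"result": reconcile_one(query)}
        except ValueError as exc:
            # Hypothetical "error" member carrying a Problem JSON-style object.
            response[qid] = {
                "error": {"title": "Bad Request", "status": 400, "detail": str(exc)}
            }
    return response


def toy_reconcile(query):
    """Stand-in for a real reconciliation backend (placeholder data only)."""
    if "query" not in query:
        raise ValueError("Missing 'query' key")
    return [{"id": "example-id", "name": query["query"], "score": 100, "match": True}]


batch = {"q0": {"query": "some name"}, "q1": {"type": "some-type"}}
response = reconcile_batch(batch, toy_reconcile)
```

With this shape a client can keep the candidates for `q0` and report the problem for `q1`, instead of losing the whole batch.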

@jetztgradnet
Member

jetztgradnet commented Jun 3, 2021

We had a very similar discussion just yesterday: our application metaphactory supports reconciliation from local graph database(s) but also federated reconciliation from multiple sources, e.g. a local graph database and Wikidata.
The big question is how to handle errors in some members: would one rather fail the entire lookup operation, or return partial results without forwarding the information about some errors to the user?
Having some means to send partial results AND the information about some errors / warnings would be great!
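
A minimal sketch of that federated case, assuming each source is a callable that either returns candidates or raises. The `warnings` member, the `source` key, and the function names are invented for illustration and do not describe how metaphactory or any existing service works.

```python
def federated_lookup(query, sources):
    """Query every source; keep partial results and record failures as warnings.

    `sources` maps a source name to a callable returning a list of candidates.
    """
    candidates, warnings = [], []
    for name, lookup in sources.items():
        try:
            candidates.extend(lookup(query))
        except Exception as exc:  # a real service would catch narrower errors
            warnings.append({
                "title": "Source unavailable",
                "status": 503,
                "detail": str(exc),
                "source": name,  # hypothetical extension member
            })
    return {"result": candidates, "warnings": warnings}


def local_graph(query):
    """Placeholder source that always succeeds."""
    return [{"id": "local:1", "name": query}]


def failing_remote(query):
    """Placeholder source that always fails."""
    raise TimeoutError("remote source timed out")


out = federated_lookup(
    "some name", {"local": local_graph, "remote": failing_remote}
)
```

Here the caller still receives the local candidates and can show the user that the remote source did not answer.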

@thadguidry
Contributor

I think borrowing or using OpenAPI can be useful here. With OpenAPI a default response is how you describe errors collectively, not individually. Also with OpenAPI, there's a description of an empty body response, like 204 No Content.

  1. https://swagger.io/docs/specification/describing-responses/
  2. https://swagger.io/specification/#responses-object

But going by link 1 above, I think a $ref at the operation level might help overall: it lets you say that there are error responses which all share the same status code and response body.
Imagine that 50% of the queries are OK with status code 200, but the other 50% failed (for example, they all included an extra parameter that the service did not understand).

@epaulson

epaulson commented Jun 4, 2021

I know it's underspecified in the spec, but what do any of the servers do now? If one of the queries in a batch is invalid, like maybe 'q1' doesn't include a 'query' key, do servers just fail the whole batch?

@wetneb
Member Author

wetneb commented Jun 4, 2021

For Wikidata it depends on the sort of error - some will fail the whole batch, some will return an empty list of results for the failing query.
