-
Notifications
You must be signed in to change notification settings - Fork 41
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add REST API method batch-suggest #664
Conversation
Codecov ReportPatch coverage:
Additional details and impacted files@@ Coverage Diff @@
## main #664 +/- ##
========================================
Coverage 99.57% 99.58%
========================================
Files 88 89 +1
Lines 6146 6268 +122
========================================
+ Hits 6120 6242 +122
Misses 26 26
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report in Codecov by Sentry. |
Would it make sense to limit the number of documents in a single REST call to the minibatch size (currently 32)? After all, there would be little benefit (except maybe avoiding some HTTP overhead) in processing more than one minibatch on the backend side. |
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
#682 introduced Schemathesis to automate the testing of the actual API, which was previously done with manually written tests. Manually written tests allowed to test specific things, e.g. which error code arises for which (malformed) request. For example now there is the limit of 32 documents for Schemathesis uses the examples from OpenAPI specification and some random inputs in path and query parameters. The requests to
Long list of request logs
But there is no request with 400 code, which is the code for |
fcba876
to
6f9f944
Compare
afad237
to
d819246
Compare
d819246
to
74980c0
Compare
Finally I managed to drop the unnecessary merge and revert commits. I recreated the PR branch from the current main and then cherry-picked the good commits from a backup PR branch. I don't know why rebasing did not work: there ended up also all commits made to main not matter what I tried. Changing base branch back and forth did not help as usually. But now there is a problem installing just released version of a dependency in GH Actions (but not on my laptop):
|
This seemed to be caused by issue sighingnow/libclang#46, which got fixed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tested this locally and it seems to work well. A couple of points to consider:
- I suggested a little change to the wording of the method summary
- I verified that the method fails if given more than 32 documents, as it should. But the error message is a bit confusing:
{
"detail": "[{'text': 'Olipa kerran'}, {'text': 'Olipa kerran'}, {'text': 'Olipa kerran'}, {'text': 'Olipa kerran'}, {'text': 'Olipa kerran'}, {'text': 'Olipa kerran'}, {'text': 'Olipa kerran'}, {'text': 'Olipa kerran'}, {'text': 'Olipa kerran'}, {'text': 'Olipa kerran'}, {'text': 'Olipa kerran'}, {'text': 'Olipa kerran'}, {'text': 'Olipa kerran'}, {'text': 'Olipa kerran'}, {'text': 'Olipa kerran'}, {'text': 'Olipa kerran'}, {'text': 'Olipa kerran'}, {'text': 'Olipa kerran'}, {'text': 'Olipa kerran'}, {'text': 'Olipa kerran'}, {'text': 'Olipa kerran'}, {'text': 'Olipa kerran'}, {'text': 'Olipa kerran'}, {'text': 'Olipa kerran'}, {'text': 'Olipa kerran'}, {'text': 'Olipa kerran'}, {'text': 'Olipa kerran'}, {'text': 'Olipa kerran'}, {'text': 'Olipa kerran'}, {'text': 'Olipa kerran'}, {'text': 'Olipa kerran'}, {'text': 'Olipa kerran'}, {'text': 'plaa'}] is too long - 'documents'",
"status": 400,
"title": "Bad Request",
"type": "about:blank"
}
Basically the "detail" field includes the whole request, which could be extremely long; and it ends with "is too long - 'documents'" which is not that helpful. Is this something we could change or is this coming directly from Connexion so we can't do anything about it? I would like to see a more helpful message, which wouldn't include the whole request body, just a message stating that there were too many documents.
- Related to the above, can we write (for example using schemathesis) unit tests that verify that the API accepts 32 documents, but doesn't accept 33 documents? I'm worried that if we change the implementation later, the limit of 32 will be dropped and it could lead to problems on the backend side.
If you can fix at least some of the above it would be great, but if points 2 and/or 3 are too difficult, I think it's also OK to merge this as it is.
Done.
I added This seems to be the recommended way to modify the validation, found it via spec-first/connexion#558.
I added tests for cases 32 and 33 documents, and restored the manually written Swagger/OpenAPI tests that I removed in #682. These test do not rely on Schemathesis but on the |
Required as jsonschema is now imported directly
annif/openapi/validation.py
Outdated
"""Validate the request body against the schema.""" | ||
|
||
if self.is_null_value_valid and is_null(data): | ||
return None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line is not covered by unit tests - can we do something about it? If nothing else, mark it with a # noqa
annotation so it doesn't show up in coverage reports...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just marked the line with # noqa
. The validation mechanism in soon-to-be-released Connexion 3 is changed (spec-first/connexion#1610), so this needs to be addressed again when/if upgrading to Connexion 3.
Kudos, SonarCloud Quality Gate passed! 0 Bugs No Coverage information |
Adds a new
/v1/projects/{project_id}/suggest-batch
REST API method. Based on the branch of PR #663, implements the REST API part of #579.The method accepts in json the documents (max. 32) with the optional
document_id
field:The
limit
,threshold
andlanguage
parameters are optional as for the regularsuggest
method and can be given as URL query parameters:An example response is:
The
document_id
is null in the response if the document in the request does not have one. It is similar to theexternal_id
field of MonkeyLearn classifier and to theindex
AWS Comprehend BatchDetectKeyPhrases.