Data Endpoint: Support pagination of XML Response #2000

Closed
DavisRayM opened this issue Jan 22, 2021 · 4 comments · Fixed by #2005 or #2039

Comments

@DavisRayM
Contributor

DavisRayM commented Jan 22, 2021

Suggested Feature / Enhancement

Benefits of implementing the feature/enhancement

Suggested implementation plan (steps to be taken to implement the feature)

@DavisRayM DavisRayM added this to the Week 4 - 5 (2021) milestone Jan 22, 2021
@gstuder-ona

One option that would meet ETL requirements would be a query parameter to the paginated JSON endpoint which just attached the XML as text in a JSON field. The paginated XML would need to be searchable in the same way as the JSON, so maybe this is a fast workaround.

@gstuder-ona

As discussed in the product meeting, the requirements for the paginated XML endpoint would include pagination by the last_modified field, which is not included in the XForm instance XML, so it would need to be attached as XML metadata. The form_id is also not included in the XForm instance XML and would likewise need to be attached as metadata.

Summary: What is needed is:

form id
form last modified
xform form submission (which should include form version)

@gstuder-ona

gstuder-ona commented Jan 29, 2021

@DavisRayM what's up on stage-api looks good. As we built more of the pipeline, we realized we also need to know the latest server time. Could we wrap the batch of results like:

<root><submission-batch serverTime="[iso]"><submission-item ...></submission-item></submission-batch></root>

This will allow us to be much more efficient (and safer) pulling the latest submissions from a form, as now we're never quite sure how recent the timestamps are compared to where the server time actually is.
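
A minimal sketch of the wrapper proposed above, assuming Python's standard `xml.etree.ElementTree`. The function name `build_batch` and the attribute names `formId` and `lastModified` are placeholders for illustration, not names confirmed by the endpoint, and the outer `<root>` element is left out for brevity.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def build_batch(submissions, server_time=None):
    """Wrap submission XML strings in a <submission-batch> element.

    `submissions` is an iterable of (attributes, instance_xml) pairs, where
    `attributes` is a dict of string metadata for each <submission-item>.
    """
    if server_time is None:
        server_time = datetime.now(timezone.utc)
    batch = ET.Element("submission-batch", {"serverTime": server_time.isoformat()})
    for attributes, instance_xml in submissions:
        item = ET.SubElement(batch, "submission-item", attributes)
        # Parse the stored instance XML and nest it under its wrapper element.
        item.append(ET.fromstring(instance_xml))
    # Serializing the whole tree always emits a closing tag, even when empty.
    return ET.tostring(batch, encoding="unicode")


# Hypothetical example data: one submission with two metadata attributes.
example = build_batch(
    [({"formId": "558517", "lastModified": "2021-01-15T08:06:54+00:00"},
      "<data><name>a</name></data>")],
    server_time=datetime(2021, 1, 29, 12, 0, tzinfo=timezone.utc),
)
```

A client could then read `serverTime` from the batch and use it as the upper bound for the next pull instead of guessing from the newest submission timestamp.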

@gstuder-ona

gstuder-ona commented Mar 12, 2021

I took a quick look - I'm not sure we can use the endpoint as is? There seem to be some issues with the lastModified field - here's an example:

https://api.ona.io/api/v1/data/558517.xml?query={"_date_modified":{"$gt":"2000-01-22T11:42:20"}}&limit=3
<submission-batch serverTime="2021-03-12T12:30:02.993349+00:00">
<submission-item created="2020-11-19T07:44:24.301684+00:00" formVersion="202011190742" lastModified="2021-03-11T12:31:58.498648+00:00" objectID="69885102">
...
</submission-item>
<submission-item created="2021-01-15T08:06:54.031132+00:00" formVersion="202011190801" lastModified="2021-03-11T12:31:59.005477+00:00" objectID="72915824">
...
</submission-item>
<submission-item created="2021-01-15T08:11:15.064601+00:00" formVersion="202101150803" lastModified="2021-03-11T12:31:58.979827+00:00" objectID="72915896">
...
</submission-item>
</submission-batch>

The JSON endpoint returns:

https://api.ona.io/api/v1/data/558517?query={"_date_modified":{"$gt":"2000-01-22T11:42:20"}}&limit=3
[{
	"_id": 69885102,
  ...
	"_date_modified": "2021-01-22T08:50:58",
  ...
}, {
	"_id": 72915824,
  ...
	"_date_modified": "2021-01-15T08:06:54",
  ...
}, {
	"_id": 72915896,
  ...
	"_date_modified": "2021-01-15T08:11:15",
  ...
}]

I'm not sure where the lastModified date in the XML batch is being pulled from, but it isn't sorted and it doesn't match the _date_modified values the search returns. Is there another query we need to run?
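
The ordering problem can be checked mechanically. The sketch below, assuming Python's standard library, parses the XML batch shown above (submission bodies omitted) and tests whether the `lastModified` attributes are in ascending order. The helper names `last_modified_values` and `is_sorted` are made up for this example.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# The batch from the example above, with the submission bodies left out.
BATCH = """<submission-batch serverTime="2021-03-12T12:30:02.993349+00:00">
<submission-item created="2020-11-19T07:44:24.301684+00:00" formVersion="202011190742" lastModified="2021-03-11T12:31:58.498648+00:00" objectID="69885102"/>
<submission-item created="2021-01-15T08:06:54.031132+00:00" formVersion="202011190801" lastModified="2021-03-11T12:31:59.005477+00:00" objectID="72915824"/>
<submission-item created="2021-01-15T08:11:15.064601+00:00" formVersion="202101150803" lastModified="2021-03-11T12:31:58.979827+00:00" objectID="72915896"/>
</submission-batch>"""


def last_modified_values(batch_xml):
    """Return the lastModified timestamps in document order."""
    root = ET.fromstring(batch_xml)
    return [
        datetime.fromisoformat(item.get("lastModified"))
        for item in root.iter("submission-item")
    ]


def is_sorted(values):
    """True if every value is <= the one that follows it."""
    return all(a <= b for a, b in zip(values, values[1:]))


batch_is_sorted = is_sorted(last_modified_values(BATCH))  # False for this batch
```

The second item (12:31:59.005477) is later than the third (12:31:58.979827), so the batch fails the check.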

Two other issues - empty batches return bad XML:

https://api.ona.io/api/v1/data/558517.xml?query={"_date_modified":{"$gt":"2030-01-22T11:42:20"}}&limit=3
<?xml version="1.0" encoding="utf-8"?>
<submission-batch serverTime="2021-03-12T12:40:29.814571+00:00">

Note the lack of a closing tag, which breaks the parsing.
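
To illustrate the failure, here is a small sketch assuming a client that parses the response body with Python's `xml.etree.ElementTree`. `parse_batch` is a hypothetical helper, and `WELL_FORMED` is what an empty batch would look like with the closing tag present.

```python
import xml.etree.ElementTree as ET

# Response body as returned for the empty batch above (no closing tag).
TRUNCATED = (
    b'<?xml version="1.0" encoding="utf-8"?>\n'
    b'<submission-batch serverTime="2021-03-12T12:40:29.814571+00:00">\n'
)
# The same body with the missing closing tag added.
WELL_FORMED = TRUNCATED + b"</submission-batch>\n"


def parse_batch(xml_bytes):
    """Return the <submission-item> elements, or raise ValueError if malformed."""
    try:
        root = ET.fromstring(xml_bytes)
    except ET.ParseError as exc:
        raise ValueError(f"malformed submission batch: {exc}") from exc
    return list(root.iter("submission-item"))


items = parse_batch(WELL_FORMED)  # an empty list
try:
    parse_batch(TRUNCATED)
    truncated_ok = True
except ValueError:
    truncated_ok = False  # the unclosed element is rejected by the parser
```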

Also, the JSON endpoint supports a "sort" parameter - in the JSON endpoint, you must specify the sort, otherwise your results will not be sorted and so are not useful for pagination.

https://api.ona.io/api/v1/data/558517?query={"_date_modified":{"$gt":"2000-01-22T11:42:20"}}&limit=3&sort=_date_modified
[{
	"_id": 72915824,
	"_date_modified": "2021-01-15T08:06:54",
}, {
	"_id": 72915896,
	"_date_modified": "2021-01-15T08:11:15",
}, {
	"_id": 72916460,
	"_date_modified": "2021-01-15T08:37:42",
}]

Note that the results here are now correctly sorted and different from the non-sort results.

The XML endpoint does not support sort; only an incomplete XML document is returned:

https://api.ona.io/api/v1/data/558517.xml?query={"_date_modified":{"$gt":"2000-01-22T11:42:20"}}&limit=3&sort=_date_modified
<?xml version="1.0" encoding="utf-8"?>
<submission-batch serverTime="2021-03-12T12:44:13.866463+00:00">

I think the ask here is to fully support the following queries:

  • https://api.ona.io/api/v1/data/558517.xml?query={"_date_modified":{"$gt":"2000-01-22T11:42:20"}}&limit=3&sort=_date_modified
    (for backwards-compatibility with older forms, we also need)
  • https://api.ona.io/api/v1/data/558517?query={"_submission_time":{"$gt":"2000-01-22T11:42:20"}, "_edited":{"$lt":"true"}}&limit=3&sort=_submission_time
  • https://api.ona.io/api/v1/data/558517?query={"_last_edited":{"$gt":"2000-01-22T11:42:20"}}&limit=3&sort=_last_edited

and that the "lastModified" attribute in submission-item match _date_modified, assuming that's the appropriate thing to do.
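
For reference, a sketch of how a client might build these queries and page through sorted results, using only the Python standard library. The helpers `build_data_url` and `iter_pages`, and the injected `fetch` callable, are assumptions for illustration and not part of any Ona client. The sketch uses the plain `sort=<field>` form shown above and expects `fetch` to return a list of dicts, as the JSON endpoint does.

```python
import json
from urllib.parse import urlencode

BASE_URL = "https://api.ona.io/api/v1/data"


def build_data_url(form_id, field, after, limit=3, fmt="xml"):
    """Build a data URL filtered and sorted on the same timestamp field."""
    query = json.dumps({field: {"$gt": after}}, separators=(",", ":"))
    params = urlencode({"query": query, "limit": limit, "sort": field})
    suffix = f".{fmt}" if fmt else ""
    return f"{BASE_URL}/{form_id}{suffix}?{params}"


def iter_pages(fetch, form_id, field, start, limit=3):
    """Yield pages of records, advancing the cursor to the last value seen.

    `fetch(url)` is supplied by the caller and must return a list of dicts
    sorted ascending by `field`.
    """
    cursor = start
    while True:
        records = fetch(build_data_url(form_id, field, cursor, limit, fmt=None))
        if not records:
            return
        yield records
        # Relies on the sort: the last record carries the largest timestamp.
        cursor = records[-1][field]


url = build_data_url(558517, "_date_modified", "2000-01-22T11:42:20")
```

Because the cursor uses `$gt`, records that share the exact timestamp of the last item on a page could be skipped; a real client would need to account for that. The loop also only works if the endpoint honors `sort`, which is the point of the request.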
