Enhance performance when exporting data on endpoint api/v1/data/<form_id>.<format>
#2460
the default ordering by id is making queries run extremely slow when exporting large amounts of data. To sort data by id in ascending order, the query parameter sort={"_id":1} will be used. For more info read https://github.com/onaio/onadata/blob/main/docs/data.rst#sort-submitted-data-of-a-specific-form-using-existing-fields
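As a rough illustration of the sort parameter described above, the sketch below builds export URLs with the JSON sort document URL-encoded. Only the endpoint shape and the sort values come from this PR; the host, the form id and the helper name build_export_url are placeholders made up for the example.

```python
import json
from urllib.parse import urlencode


def build_export_url(base_url, form_id, fmt, sort=None):
    """Return an export URL, optionally carrying a JSON-encoded `sort` parameter.

    Hypothetical helper for illustration; it is not part of onadata.
    """
    url = f"{base_url}/api/v1/data/{form_id}.{fmt}"
    if sort is not None:
        # Compact separators keep the JSON in the documented form, e.g. {"_id":1}.
        query = urlencode({"sort": json.dumps(sort, separators=(",", ":"))})
        url = f"{url}?{query}"
    return url


# Placeholder host and form id.
ascending = build_export_url("https://example.invalid", 123, "csv", {"_id": 1})
descending = build_export_url("https://example.invalid", 123, "csv", {"_id": -1})
unsorted_url = build_export_url("https://example.invalid", 123, "csv")
```

Leaving sort out produces a URL with no ordering requested, which is the case this PR aims to make faster.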
Notes are excluded when exporting the csv. The column _notes is usually added in the CSV but it's always overridden to be blank as per the implementation in the CSVDataFrameBuilder class. So this test case is futile
fix flaky tests by ensuring queryset is always ordered
fixing the rules would require a lot of refactoring, so disabling the rules will suffice for now
@@ -815,6 +844,7 @@ def generate_external_export(  # noqa C901
     filter_query = options.get("query")
     meta = options.get("meta")
     token = options.get("token")
+    sort = options.get("sort")
@kelvin-muchiri Could we add documentation for this change?
@KipSigei Added the documentation
@@ -653,76 +658,114 @@ def list(self, request, *args, **kwargs):

         return Response(serializer.data)

-        return custom_response_handler(request, xform, query, export_type)
+        return custom_response_handler(request, xform, query, export_type, sort=sort)
I don't think it's necessary to have a new sort param since you can get it from the request
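A minimal sketch of what the reviewer seems to suggest: read sort from the request's query parameters inside the handler instead of passing it as an extra argument. The function names here are hypothetical and a plain dict stands in for the request's query parameters; this is not the actual onadata code.

```python
import json


def sort_from_query_params(query_params):
    """Parse the `sort` query parameter; return None if absent or not valid JSON.

    `query_params` is any mapping with a `.get` method (a plain dict here).
    """
    raw = query_params.get("sort")
    if not raw:
        return None
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return None


def handle_export(query_params):
    """Hypothetical stand-in for a handler that already receives the request."""
    # The handler derives the ordering itself, so callers need no `sort` argument.
    sort = sort_from_query_params(query_params)
    return {"sort": sort}


result = handle_export({"sort": '{"_id":-1}'})
```

The trade-off is that the handler's signature stays unchanged, while an explicit argument makes the dependency visible to callers.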
Changes / Features implemented
- The default ordering by id on api/v1/data/<form_id>.<format> is making queries run extremely slow when exporting large amounts of data. To sort data by id in ascending order, the query parameter sort={"_id":1} will be applied to the endpoint, while to sort data by id in descending order the query parameter is sort={"_id":-1}. For further reference read Sort submitted data of a specific form using existing fields (https://github.com/onaio/onadata/blob/main/docs/data.rst#sort-submitted-data-of-a-specific-form-using-existing-fields).
- The CSVDataFrameBuilder._query_data method was querying the database twice despite data being passed to CSVDataFrameBuilder.to_flat_csv_export. Make use of the data passed to ensure loose coupling between the utility class CSVDataFrameBuilder and the code responsible for querying the data.
- Test case onadata.apps.api.tests.viewsets.test_note_viewset.TestNoteViewSet.test_csv_export_form_w_notes: notes are excluded when exporting the CSV. The column _notes is usually added in the CSV, but the value in each record is always overridden to be blank as per the implementation in the CSVDataFrameBuilder class. So this test case is futile.
Steps taken to verify this change does what is intended
Side effects of implementing this change
Faster when exporting data on endpoint api/v1/data/<form_id>.csv
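The loose coupling mentioned under Changes / Features implemented (the CSV builder using the data it is handed rather than querying again) can be sketched roughly as follows. FlatCsvWriter and CountingSource are hypothetical stand-ins written for this example and do not mirror the real CSVDataFrameBuilder API.

```python
import csv
import io


class FlatCsvWriter:
    """Hypothetical CSV builder that only consumes data passed in by the caller."""

    def __init__(self, columns):
        self.columns = list(columns)

    def to_flat_csv(self, data):
        """Write `data` (an iterable of dicts) as CSV text and return it."""
        buffer = io.StringIO()
        writer = csv.DictWriter(
            buffer,
            fieldnames=self.columns,
            extrasaction="ignore",  # drop keys that are not exported columns
            lineterminator="\n",
        )
        writer.writeheader()
        for record in data:
            writer.writerow(record)
        return buffer.getvalue()


class CountingSource:
    """Pretend data source that counts how many times it is queried."""

    def __init__(self, records):
        self.records = records
        self.queries = 0

    def fetch(self):
        self.queries += 1
        return iter(self.records)


source = CountingSource([{"_id": 1, "name": "a"}, {"_id": 2, "name": "b"}])
csv_text = FlatCsvWriter(["_id", "name"]).to_flat_csv(source.fetch())
```

Because the writer never touches the source, the data is fetched once by the caller and the writer can be tested with plain lists.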
Closes #