Performance analysis #74

Closed
jpmckinney opened this issue Oct 6, 2020 · 2 comments

@jpmckinney
Member

jpmckinney commented Oct 6, 2020

Follow-up to open-contracting/lib-cove-oc4ids#23

I edited test_basic.py so that it can load the same test file as in open-contracting/lib-cove-oc4ids#23 (saved as data.json):

diff --git a/cove_oc4ids/test_basic.py b/cove_oc4ids/test_basic.py
index e960a3a..55c572c 100644
--- a/cove_oc4ids/test_basic.py
+++ b/cove_oc4ids/test_basic.py
@@ -11,10 +11,15 @@ from django.core.files.base import ContentFile
     'null',
     '1',
     '{}',
+    None,
 ])
 def test_explore_page(client, json_data):
+    if json_data is None:
+        with open('data.json') as f:
+            json_data = f.read()
+
     data = SuppliedData.objects.create()
     data.original_file.save('test.json', ContentFile(json_data))

I also wrote a small run.py file:

import pytest

# Run only the parametrized case that loads data.json (json_data=None).
pytest.main(['cove_oc4ids/test_basic.py::test_explore_page[None]'])

Then I ran:

pip install memory_profiler matplotlib
env DJANGO_SETTINGS_MODULE=cove_project.settings mprof run run.py

Memory usage is way higher:

[Screenshot, 2020-10-06 5:41 PM: mprof memory-usage plot]

Next, I'll add @profile decorators to the suspect functions and run:

env DJANGO_SETTINGS_MODULE=cove_project.settings python -m memory_profiler run.py
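memory_profiler's @profile prints a line-by-line table of process memory. For a quicker per-function check that needs no extra install, the standard library's tracemalloc can report the peak allocation of a single call. This is a minimal sketch: `trace_peak` and `build_strings` are hypothetical names, and tracemalloc only counts memory allocated through Python's allocators, so its numbers will not match the RSS figures memory_profiler reports.

```python
import functools
import tracemalloc


def trace_peak(func):
    """Hypothetical decorator: print the peak traced allocation of each call.

    A coarse, stdlib-only stand-in for memory_profiler's @profile: it reports
    one number per call instead of a line-by-line table, and it counts only
    memory allocated through Python's allocators, not the process RSS that
    memory_profiler measures. Not safe to nest (the inner call stops tracing).
    """

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        tracemalloc.start()
        try:
            return func(*args, **kwargs)
        finally:
            # Peak (in bytes) is measured from the moment tracing started.
            _current, peak = tracemalloc.get_traced_memory()
            tracemalloc.stop()
            wrapper.last_peak_mib = peak / 2**20
            print(f"{func.__qualname__}: peak {wrapper.last_peak_mib:.3f} MiB")

    wrapper.last_peak_mib = None
    return wrapper


@trace_peak
def build_strings(n):
    """Toy workload standing in for a view function (hypothetical)."""
    return [str(i) for i in range(n)]


strings = build_strings(100_000)
```

Because tracing is switched on only for the duration of the decorated call, the overhead is confined to the functions you are investigating, which matters when the code under test already takes hundreds of MiB.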

@jpmckinney
Member Author

jpmckinney commented Oct 6, 2020

Commenting out the conversion from JSON to Excel (done by Flatten Tool) reduces memory consumption by 3 GB:

[Screenshot, 2020-10-06 6:53 PM: memory-usage plot with the JSON-to-Excel conversion commented out]

Line-by-line:

Filename: /Users/james/Sites/remote/open-contracting/cove-oc4ids/cove_oc4ids/views.py

Line #    Mem usage    Increment   Line Contents
================================================
    31  273.348 MiB  273.348 MiB   @cove_web_input_error
    32                             @profile
    33                             def explore_oc4ids(request, pk):
    34  273.387 MiB    0.039 MiB       context, db_data, error = explore_data_context(request, pk)
    35  273.387 MiB    0.000 MiB       if error:
    36                                     return error
    37                             
    38  273.387 MiB    0.000 MiB       lib_cove_oc4ids_config = LibCoveOC4IDSConfig(config=settings.COVE_CONFIG)
    39                             
    40  273.387 MiB    0.000 MiB       upload_dir = db_data.upload_dir()
    41  273.387 MiB    0.000 MiB       upload_url = db_data.upload_url()
    42  273.387 MiB    0.000 MiB       file_name = db_data.original_file.file.name
    43  273.387 MiB    0.000 MiB       file_type = context['file_type']
    44                             
    45  273.387 MiB    0.000 MiB       if file_type == 'json':
    46                                     # open the data first so we can inspect for record package
    47  273.391 MiB    0.004 MiB           with open(file_name, encoding='utf-8') as fp:
    48  273.391 MiB    0.000 MiB               try:
    49  578.477 MiB  305.086 MiB                   json_data = json.load(fp, parse_float=Decimal)
    50                                         except ValueError as err:
    51                                             raise CoveInputDataError(context={
    52                                                 'sub_title': _("Sorry, we can't process that data"),
    53                                                 'link': 'index',
    54                                                 'link_text': _('Try Again'),
    55                                                 'msg': _(format_html('We think you tried to upload a JSON file, but it is not well formed JSON.'
    56                                                                      '\n\n<span class="glyphicon glyphicon-exclamation-sign" aria-hidden="true">'
    57                                                                      '</span> <strong>Error message:</strong> {}', err)),
    58                                                 'error': format(err)
    59                                             })
    60                             
    61  578.477 MiB    0.000 MiB               if not isinstance(json_data, dict):
    62                                             raise CoveInputDataError(context={
    63                                                 'sub_title': _("Sorry, we can't process that data"),
    64                                                 'link': 'index',
    65                                                 'link_text': _('Try Again'),
    66                                                 'msg': _('OC4IDS JSON should have an object as the top level, the JSON you supplied does not.'),
    67                                             })
    68                             
    69  578.480 MiB    0.004 MiB           schema_oc4ids = SchemaOC4IDS(lib_cove_oc4ids_config=lib_cove_oc4ids_config)
    70                             
    71                                     # Flatten Tool has catastrophically bad performance on even a 50 MB file (uses 3 GB). In the last 14 days as of
    72                                     # 2020-10-06, `flattened.xlsx` has been requested only once. As such, this feature is disabled.
    73                                     # context.update(convert_json(upload_dir, upload_url, file_name, lib_cove_oc4ids_config,
    74                                     #                             schema_url=schema_oc4ids.schema_url, replace=True,
    75                                     #                             request=request, flatten=True))
    76                                 else:
    77                                     schema_oc4ids = SchemaOC4IDS(lib_cove_oc4ids_config=lib_cove_oc4ids_config)
    78                                     context.update(convert_spreadsheet(
    79                                             upload_dir, upload_url,
    80                                             file_name, file_type,
    81                                             lib_cove_oc4ids_config,
    82                                             schema_url=schema_oc4ids.schema_url,
    83                                             pkg_schema_url=schema_oc4ids.pkg_schema_url))
    84                             
    85                                     with open(context['converted_path'], encoding='utf-8') as fp:
    86                                         json_data = json.load(fp, parse_float=Decimal)
    87                             
    88  578.480 MiB    0.000 MiB       context = common_checks_oc4ids(context, upload_dir, json_data,
    89  549.324 MiB    0.000 MiB                                      schema_oc4ids, lib_cove_oc4ids_config)
    90                             
    91  549.324 MiB    0.000 MiB       if not db_data.rendered:
    92  549.324 MiB    0.000 MiB           db_data.rendered = True
    93  549.324 MiB    0.000 MiB       db_data.save()
    94                             
    95  549.324 MiB    0.000 MiB       template = 'cove_oc4ids/explore.html'
    96                             
    97  835.574 MiB  286.250 MiB       return render(request, template, context)

It renders an 18 MB HTML file.
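In the listing above, `json.load(fp, parse_float=Decimal)` on line 49 adds about 305 MiB and `render` on line 97 adds about 286 MiB. One possible contributor to the first figure, offered as a hypothesis rather than something measured here, is `parse_float=Decimal`: every non-integer number becomes a `Decimal` object, which is several times larger than a `float`. The sketch below uses tracemalloc to compare peak allocation when parsing the same document both ways. The input is synthetic (hypothetical shape and size, not the file from lib-cove-oc4ids#23), and `peak_parse_mib` is an illustrative name.

```python
import json
import tracemalloc
from decimal import Decimal


def peak_parse_mib(text, **load_kwargs):
    """Return the peak memory (MiB) tracemalloc sees while json.loads parses `text`."""
    tracemalloc.start()
    try:
        json.loads(text, **load_kwargs)
        _current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak / 2**20


# Synthetic input (hypothetical): 50,000 small objects, each holding
# one non-integer number, so parse_float is called 50,000 times.
sample = json.dumps(
    {"projects": [{"id": str(i), "value": {"amount": i + 0.25}} for i in range(50_000)]}
)

peak_float = peak_parse_mib(sample)  # default: numbers become float
peak_decimal = peak_parse_mib(sample, parse_float=Decimal)  # as in views.py line 49
print(f"parse_float=float:   {peak_float:.1f} MiB")
print(f"parse_float=Decimal: {peak_decimal:.1f} MiB")
```

Whether dropping `parse_float=Decimal` is acceptable is a separate question: `Decimal` preserves the exact digits of each number, which `float` does not, so any saving would have to be weighed against that.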

@jpmckinney
Member Author

The biggest issue, the JSON-to-Excel conversion, is fixed in 686643f#diff-785956484ea755ecf104f701159d514b

I'm not sure why memory increases so much (about 286 MiB) in the render call. Someone more familiar with the views will have to improve their performance.

Follow-up issue: open-contracting/cove-ocds#90
