Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Downloadable spreadsheet to stay up to date with any metadata updates that are made #13

Closed
18 tasks done
ofanobilbao opened this issue Oct 2, 2020 · 15 comments · Fixed by ebi-ait/ingest-broker#16 or ebi-ait/ingest-client#22
Assignees

Comments

@ofanobilbao
Copy link

ofanobilbao commented Oct 2, 2020

Implementation - generate a spreadsheet with the data filled in as represented in Ingest.

Sub-tasks: (added 14/07/2021, issues found last sprint demo)

  • Fix caching and exported spreadsheet cleanup

  • Add the missing rows (the 1st 4 rows from the original spreadsheet with the column names and static text)

  • Add all the other data types (Karoly)

    • Processes
    • Protocols
    • Data (files)
  • Fix tab names --- shouldn't be encounter module worksheets (Alegria)

  • integration / unit test (download & upload spreadsheet) (Karoly?)
    https://github.com/ebi-ait/ingest-broker/pull/17/files

  • bug in spreadsheet generated - some header columns are missing like 'project.contributors.project_role.ontology_label' (added 15/07/2021) Alegria & Karoly

  • metadata row values are not aligned to correct column headers (added 16/07/2021) Alegria & Karoly

  • data rows should start at row 6 (16/07/2021) Karoly

  • project worksheet should go first before "project - contributors" etc (added 19/07/2021) - Karoly

  • submission errors are not being cleared (added 20/07/2021)

  • couldn't import spreadsheets with module worksheets like collection protocol - reagents see 10x test spreadsheet flattener shouldn't add this as a separate module worksheet (added 20/07/2021)

  • run test plans with a wrangler

  • docs - wrangler will help. See PR for SOP

  • testing scenarios

@amnonkhen
Copy link
Contributor

See miro board for architecture.

image.png

@aaclan-ebi
Copy link

aaclan-ebi commented Jul 16, 2021

Bug: Some column headers missing

Steps to replicate:

  1. Upload a test spreadsheet https://github.com/HumanCellAtlas/metadata-schema/blob/master/infrastructure_testing_files/current/dcp_integration_test_metadata_1_SS2_bundle.xlsx
  2. Go to the project's view
  3. Edit project and update 1 contributor to have a Contributor role ontology
  4. Download spreadsheet

Actual Behavior:
In the Project - Contributor worksheet there are cells which has no headers like project.contributors.project_role.ontology_label

Expected Behavior:
There should be no missing header columns.

@ke4
Copy link

ke4 commented Jul 23, 2021

Integration test subtask - I uploaded a work in progress version here:
ebi-ait/ingest-broker#17
Hopefully the test flow is understandable.

Missing/TODO:

  • We need to add JWT token for ingest API
  • ingest_api_url would be better to come from an environment variable. Currently it is hardcoded to use the dev environment

@ke4
Copy link

ke4 commented Jul 23, 2021

Sub-task: couldn't import spreadsheets with module worksheets like collection protocol - reagents see 10x test spreadsheet

In the downloaded spreadsheet the collection protocol has not got a UUID column. Also missing other properties if you compare the UI with the worksheet. The problem could lies around there, but I would need time to debug this more.
The code that generates the input JSON for the spreadsheet generation is in ingest-client. It is in the ingest/downloader/flattener.py class.

Step to reproduce the error:

  1. Download the following XLSX to your local computer: https://github.com/HumanCellAtlas/metadata-schema/blob/master/infrastructure_testing_files/current/dcp_integration_test_metadata_1_optimus_10X_bundle.xlsx
  2. Login in to Ingest UI
  3. Go to All Submissions tab
  4. Click on Upload New Submission button and upload the previously saved XLSX
  5. On the opened submission click on the Download submission button (Down arrow)
  6. Try to re-upload it Spreadsheet tab of the submission.

You are going to get this error:

'NoneType' object has no attribute 'add_module_entity'

Here is the stack trace:

Traceback (most recent call last):
  File "/app/src/hca-ingest/ingest/importer/importer.py", line 62, in import_file
    spreadsheet_json, template_mgr, errors = self.generate_json(file_path, is_update, project_uuid=project_uuid)
  File "/app/src/hca-ingest/ingest/importer/importer.py", line 44, in generate_json
    spreadsheet_json, errors = workbook_importer.do_import(ingest_workbook, is_update, project_uuid)
  File "/app/src/hca-ingest/ingest/importer/importer.py", line 220, in do_import
    registry.import_modules()
  File "/app/src/hca-ingest/ingest/importer/importer.py", line 166, in import_modules
    submittable_entity.add_module_entity(module_entity)
AttributeError: 'NoneType' object has no attribute 'add_module_entity'

@aaclan-ebi
Copy link

aaclan-ebi commented Jul 23, 2021

@ke4 and I tried to pair on fixing the collection protocol issue:

ebi-ait/ingest-client#24

We haven't fully tested this yet with the broker.

@jacobwindsor
Copy link

jacobwindsor commented Aug 5, 2021

@ke4 @aaclan-ebi I have ran the above steps using broker and client hash 99816a2 (PR 24). No errors seen in broker and seems to work as expected.

@aaclan-ebi
Copy link

@ESapenaVentura
Copy link

Tested this feature - kept metadata changes

@ESapenaVentura
Copy link

AI

@gabsie to create a removal ticket
@ke4 and @aaclan-ebi to work on integration test/rest of the items
@Wkt8 and @ESapenaVentura to look at SOP

@aaclan-ebi
Copy link

Remaining tasks to be filed in specific tickets:

  1. Bulk updates - removal of values for metadata Bulk updates - removal of values for metadata #437
  2. Integration test for Bulk Updates Integration test for Bulk Updates #438

@Wkt8 Wkt8 self-assigned this Sep 8, 2021
@Wkt8
Copy link

Wkt8 commented Sep 8, 2021

Picking up test scenarios and looking at the SOP again.

@aaclan-ebi
Copy link

Dev notes:

While developing this feature, @ke4 and I made the decision to prioritise the functionality and to scope out the generation of header rows and ensuring the order is the same as the generated spreadsheet:

Screenshot 2021-09-08 at 12 10 25

I believe we haven't rewritten the code for Spreadsheet template generation. We haven't duplicated what it's doing. It's just that we lack the time to study and understand that codebase to reuse or extend it so that we can also populate the header rows and generate the worksheets and columns in the same order as to how they are generated in the template.

Spreadsheet template generation:
The implementation is focused on reading the metadata schema and generating the module/concrete type entities as worksheets and properties as header rows/columns in the spreadsheet (generates the blue part of the spreadsheet)
location: https://github.com/ebi-ait/ingest-client/tree/master/ingest/template

Download Spreadsheet:
The implementation is focused on reading a list of metadata and flattening it (generates the yellow part/rows of the spreadsheet)
location: https://github.com/ebi-ait/ingest-client/tree/master/ingest/downloader

I can see how we can combine/merge these 2 separate groups of codes by mapping the property columns once we work further on this. @gabsie already made a ticket to support this #417

cc @amnonkhen

@Wkt8
Copy link

Wkt8 commented Sep 9, 2021

Summary of testing:

  • removing a field directly. This does not work. I could not remove values from a single cell, or remove values from a column of cells (it would not update when uploading the update spreadsheet) and no error message was tracked. This would be a very useful feature, and is probably highest priority. (Bulk updates - removal of values for metadata #437)
  • file's filename cannot be edited. This should be a ticket for a new feature. However, it is my understanding that no error message occurs when changing the file's filename via spreadsheet. If it does not take too long, a task for showing an error message should be prioritised. Otherwise low priority until we have evidence of a direct project being blocked by it. (file's filename cannot be edited #442)
  • Schema version. Supporting this could be one way of supporting migration, but apart from that I see no reason this needs to be a feature.

@Wkt8
Copy link

Wkt8 commented Sep 14, 2021

tagging @ami-day for reference

@Wkt8
Copy link

Wkt8 commented Sep 14, 2021

Removing a field directly partially works: see (#437)

@gabsie gabsie closed this as completed Nov 17, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
9 participants