#13 Spreadsheet download #22

Merged
merged 53 commits into from
Jul 21, 2021
Changes from 11 commits
Commits
afc8e27
WIP - create workbook from given JSON
ke4 Jul 8, 2021
578b89e
Added test cases
Jul 9, 2021
357cd6c
Support object list with modules
Jul 9, 2021
f09205d
Optimise conversion of scalar list
Jul 9, 2021
f606b50
Made some functions private
Jul 9, 2021
eeee548
Convert to user-friendly worksheet names
Jul 9, 2021
5072a1c
Handle multiple entities of different types
Jul 9, 2021
db17cce
Updated test data to have multiple entities of same type
Jul 9, 2021
6fd08d7
Test workbook creation with more than 1 data row
ke4 Jul 9, 2021
fb4938b
Add test for creating multiple worksheets
ke4 Jul 12, 2021
a16c966
Create workbook from given JSON
ke4 Jul 12, 2021
37aa895
Initial version of submission data collector
ke4 Jul 13, 2021
29e1834
Prefer if condition which controls flow to be done first in the for loop
Jul 13, 2021
74b103b
Refactor arrange part of unit tests
Jul 13, 2021
8168a9f
Updated with latest changes from master
Jul 13, 2021
9259ef0
Add biomaterials to the collected data
ke4 Jul 13, 2021
99470a4
Merge branch '13_spreadsheet-download' of https://github.com/ebi-ait/…
ke4 Jul 13, 2021
c0efbac
Remove json call when gathering the submission data
ke4 Jul 13, 2021
2399420
Fix errors in collecting data for a submission
Jul 13, 2021
7d0b653
Remove validation of uuid columns until module entities are being han…
Jul 13, 2021
a10cf5b
Fix failing tests as implementation changed
ke4 Jul 14, 2021
bb7b2ce
Add get related projects to IngestApi service class
ke4 Jul 14, 2021
34c3271
Write data header on row 4
ke4 Jul 15, 2021
456a034
Gathers data from protocols by submission id
ke4 Jul 15, 2021
72fbb24
Gathers data for files by submission id
ke4 Jul 15, 2021
d000bc5
Gathers data for processes by submission id
ke4 Jul 15, 2021
600a913
Fix getting of related project
Jul 15, 2021
b567a7c
Pass submission id not uuid
Jul 15, 2021
6e82048
Further fixes on getting the related project
Jul 15, 2021
ba9bc9a
Fix data collector test
ke4 Jul 15, 2021
03b5336
Added tests for ontology properties which should be in the concrete e…
Jul 15, 2021
7f3ba1a
Implement handling of list of ontology objects in metadata
Jul 15, 2021
8fb0560
Refactor flattening of object
Jul 15, 2021
63ecd2b
Grouped private functions for flattening logic
Jul 15, 2021
321c5b8
Extracted flattening logic for lists
Jul 15, 2021
2b9bcd1
Extracted flattening logic to a different class
Jul 15, 2021
7abbdcf
Set back helper functions as private
Jul 16, 2021
6269a3d
Undo unnecessary change
Jul 16, 2021
0807fec
Changing the spreadsheet json
Jul 16, 2021
78e732d
Updating tests with the new output format
Jul 16, 2021
e757244
Updated flattener tests expected output
Jul 16, 2021
dd56ad4
Add test for entity rows with different columns
Jul 16, 2021
e782e6d
Refactor flatten method
Jul 16, 2021
099e5fc
Adjust the spreadsheet generation to the new JSON contract
ke4 Jul 16, 2021
781007f
Fix 1st data row number in the spreadsheet
ke4 Jul 19, 2021
da14c58
Project worksheet should go as first sheet
ke4 Jul 19, 2021
9342eb6
Use a constant var for delimiter chars
Jul 20, 2021
2ccc5bd
Applied more PR comments
Jul 20, 2021
96cf299
Code review fixes
ke4 Jul 20, 2021
0b4d5a8
Merge branch '13_spreadsheet-download' of https://github.com/ebi-ait/…
ke4 Jul 20, 2021
0c89235
enumeration fix
ke4 Jul 20, 2021
b0fba31
Remove row instance variable and use enumeration instead
ke4 Jul 21, 2021
09d7a8f
Change variable naming
ke4 Jul 21, 2021
98 changes: 98 additions & 0 deletions ingest/downloader/downloader.py
@@ -0,0 +1,98 @@
from typing import List

from openpyxl import Workbook
from openpyxl.worksheet.worksheet import Worksheet

EXCLUDE_KEYS = ['describedBy', 'schema_type']


class XlsDownloader:
    def __init__(self):
        self.workbook = {}
        self.row = 1

    def convert_json(self, entity_list: List[dict]):
        self._flatten_object_list(entity_list)
        return self.workbook

    def _flatten_object_list(self, object_list: List[dict], object_key: str = ''):
        for entity in object_list:
            worksheet_name = object_key
            row = {}
            content = entity

            if not object_key:
                content = entity['content']
                worksheet_name = self.get_concrete_entity(content)
                row = {f'{worksheet_name}.uuid': entity['uuid']['uuid']}

            if not worksheet_name:
                raise Exception('There should be a worksheet name')

            user_friendly_worksheet_name = self._format_worksheet_name(worksheet_name)

            rows = self.workbook.get(user_friendly_worksheet_name, [])
            self.workbook[user_friendly_worksheet_name] = rows
            self._flatten_object(content, row, parent_key=worksheet_name)
            rows.append(row)

    def _flatten_object(self, object: dict, flattened_object: dict, parent_key: str = ''):
        if isinstance(object, dict):
            for key in object:
                full_key = f'{parent_key}.{key}' if parent_key else key
                if key in EXCLUDE_KEYS:
                    continue
                value = object[key]
                if isinstance(value, dict) or isinstance(value, list):
                    self._flatten_object(value, flattened_object, parent_key=full_key)
                else:
                    flattened_object[full_key] = str(value)
        elif isinstance(object, list):
            if self.is_object_list(object):
                self._flatten_object_list(object, parent_key)
            else:
                stringified = [str(e) for e in object]
                flattened_object[parent_key] = '||'.join(stringified)

    def _format_worksheet_name(self, worksheet_name):
        names = worksheet_name.split('.')
        names = [n.replace('_', ' ') for n in names]
        new_worksheet_name = ' - '.join([n.capitalize() for n in names])
        return new_worksheet_name

    def is_object_list(self, content):
        return content and isinstance(content[0], dict)

    @staticmethod
    def get_concrete_entity(content):
        return content.get('describedBy').rsplit('/', 1)[-1]

    def create_workbook(self, input_json: dict) -> Workbook:
        workbook = Workbook()
        workbook.remove(workbook.active)

        for ws_title, ws_elements in input_json.items():
            worksheet: Worksheet = workbook.create_sheet(title=ws_title)
            self.add_worksheet_content(worksheet, ws_elements)

        return workbook

    def add_worksheet_content(self, worksheet, ws_elements: dict):
        is_header = True
        if isinstance(ws_elements, list):
            for content in ws_elements:
                self.add_row_content(worksheet, content, is_header)
                is_header = False
                self.row += 1
        else:
            self.add_row_content(worksheet, ws_elements)

    def add_row_content(self, worksheet, content, is_header=True):
        col = 1
        for header, cell_value in content.items():
            if is_header:
                self.row = 1
                worksheet.cell(row=self.row, column=col, value=header)
                self.row += 1
            worksheet.cell(row=self.row, column=col, value=cell_value)
            col += 1
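
Review note: `add_row_content` resets and increments the shared `self.row` counter inside the column loop, which makes the row position hard to follow. Below is a minimal sketch of the same header-then-data layout driven by `enumerate`. It assumes cells are collected in a plain dict keyed by `(row, column)` instead of being written to an openpyxl worksheet, so it runs without third-party packages. `layout_rows` and `header_row` are hypothetical names, not part of this PR.

```python
from typing import Dict, List, Tuple

Cell = Tuple[int, int]  # (row, column), both 1-based like openpyxl coordinates


def layout_rows(rows: List[dict], header_row: int = 1) -> Dict[Cell, str]:
    """Map flattened rows to cell coordinates without shared mutable state."""
    # Collect headers in first-seen order so rows with different keys
    # still land in a consistent column.
    headers: List[str] = []
    for row in rows:
        for key in row:
            if key not in headers:
                headers.append(key)

    cells: Dict[Cell, str] = {}
    # Header cells go on the header row.
    for col, header in enumerate(headers, start=1):
        cells[(header_row, col)] = header

    # Data rows start directly below the header row.
    for row_number, row in enumerate(rows, start=header_row + 1):
        for col, header in enumerate(headers, start=1):
            if header in row:
                cells[(row_number, col)] = row[header]
    return cells


funders = [
    {"project.funders.grant_id": "U01 MH105989",
     "project.funders.organization": "NIMH NIH HHS"},
    {"project.funders.grant_id": "NA",
     "project.funders.organization": "Howard Hughes Medical Institute"},
]
cells = layout_rows(funders)
```

Unlike the code in this diff, which takes the headers from the first row only, the sketch builds the header list from the union of keys across all rows. That is a design choice made here for illustration, not something the PR states.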
87 changes: 87 additions & 0 deletions tests/unit/downloader/project-list-flattened.json
@@ -0,0 +1,87 @@
{
  "Project": [
    {
      "project.uuid": "3e329187-a9c4-48ec-90e3-cc45f7c2311c",
      "project.project_core.project_short_name": "kriegsteinBrainOrganoids",
      "project.project_core.project_title": "Establishing Cerebral Organoids as Models of Human-Specific Brain Evolution",
      "project.project_core.project_description": "Direct comparisons of human and non-human primate brain tissue have the potential to reveal molecular pathways underlying remarkable specializations of the human brain. However, chimpanzee tissue is largely inaccessible during neocortical neurogenesis when differences in brain size first appear. To identify human-specific features of cortical development, we leveraged recent innovations that permit generating pluripotent stem cell-derived cerebral organoids from chimpanzee. First, we systematically evaluated the fidelity of organoid models to primary human and macaque cortex, finding organoid models preserve gene regulatory networks related to cell types and developmental processes but exhibit increased metabolic stress. Second, we identified 261 genes differentially expressed in human compared to chimpanzee organoids and macaque cortex. Many of these genes overlap with human-specific segmental duplications and a subset suggest increased PI3K/AKT/mTOR activation in human outer radial glia. Together, our findings establish a platform for systematic analysis of molecular changes contributing to human brain development and evolution. Overall design: Single cell mRNA sequencing of iPS-derived neural and glial progenitor cells using the Fluidigm C1 system This series includes re-analysis of publicly available data in accessions: phs000989.v3, GSE99951, GSE86207, GSE75140. Sample metadata and accession IDs for the re-analyzed samples are included in the file \"GSE124299_metadata_on_processed_samples.xlsx\" available on the foot of this record. The following samples have no raw data due to data loss: GSM3569728, GSM3569738, GSM3571601, GSM3571606, GSM3571615, GSM3571621, GSM3571625, and GSM3571631",
      "project.insdc_project_accessions": "SRP180337",
      "project.geo_series_accessions": "GSE124299",
      "project.insdc_study_accessions": "PRJNA515930"
    }
  ],
  "Project - Contributors": [
    {
      "project.contributors.name": "Alex A,,Pollen",
      "project.contributors.email": "[email protected]",
      "project.contributors.institution": "University of California, San Francisco (UCSF)",
      "project.contributors.laboratory": "Department of Neurology",
      "project.contributors.country": "USA",
      "project.contributors.corresponding_contributor": "True",
      "project.contributors.project_role.text": "experimental scientist",
      "project.contributors.project_role.ontology": "EFO:0009741",
      "project.contributors.project_role.ontology_label": "experimental scientist"
    },
    {
      "project.contributors.name": "Parisa,,Nejad",
      "project.contributors.email": "[email protected]",
      "project.contributors.institution": "University of California, Santa Cruz",
      "project.contributors.laboratory": "Human Cell Atlas Data Coordination Platform",
      "project.contributors.country": "USA",
      "project.contributors.corresponding_contributor": "False",
      "project.contributors.project_role.text": "data wrangler",
      "project.contributors.project_role.ontology": "EFO:0009737",
      "project.contributors.project_role.ontology_label": "data curator"
    },
    {
      "project.contributors.name": "Schwartz,,Rachel",
      "project.contributors.email": "[email protected]",
      "project.contributors.institution": "University of California, Santa Cruz",
      "project.contributors.laboratory": "Human Cell Atlas Data Coordination Platform",
      "project.contributors.country": "USA",
      "project.contributors.corresponding_contributor": "False",
      "project.contributors.project_role.text": "data wrangler",
      "project.contributors.project_role.ontology": "EFO:0009737",
      "project.contributors.project_role.ontology_label": "data curator"
    }
  ],
  "Project - Publications": [
    {
      "project.publications.authors": "Pollen AA||Bhaduri A||Andrews MG||Nowakowski TJ||Meyerson OS||Mostajo-Radji MA||Di Lullo E||Alvarado B||Bedolli M||Dougherty ML||Fiddes IT||Kronenberg ZN||Shuga J||Leyrat AA||West JA||Bershteyn M||Lowe CB||Pavlovic BJ||Salama SR||Haussler D||Eichler EE||Kriegstein AR",
      "project.publications.title": "Establishing Cerebral Organoids as Models of Human-Specific Brain Evolution.",
      "project.publications.doi": "10.1016/j.cell.2019.01.017",
      "project.publications.pmid": "30735633",
      "project.publications.url": "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6544371/"
    }
  ],
  "Project - Funders": [
    {
      "project.funders.grant_id": "U01 MH105989",
      "project.funders.organization": "NIMH NIH HHS"
    },
    {
      "project.funders.grant_id": "R35 NS097305",
      "project.funders.organization": "NINDS NIH HHS"
    },
    {
      "project.funders.grant_id": "T32 HD007470",
      "project.funders.organization": "NICHD NIH HHS"
    },
    {
      "project.funders.grant_id": "T32 GM007266",
      "project.funders.organization": "NIGMS NIH HHS"
    },
    {
      "project.funders.grant_id": "F32 NS103266",
      "project.funders.organization": "NINDS NIH HHS"
    },
    {
      "project.funders.grant_id": "NA",
      "project.funders.organization": "Howard Hughes Medical Institute"
    },
    {
      "project.funders.grant_id": "P51 OD011132",
      "project.funders.organization": "NIH HHS"
    }
  ]
}
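
Review note: the fixture above has one worksheet per entity type and one extra worksheet per list-of-objects module, with dotted column names and `||`-joined scalar lists. Below is a minimal standalone sketch of how a nested entity could map to that shape. It is not the `XlsDownloader` code from this PR: `flatten`, `sheet_title` and `SCALAR_LIST_DELIMITER` are hypothetical names, and the input `content` is a small made-up entity (including a placeholder `describedBy` URL) that reuses a few values from the fixture.

```python
from typing import Dict, List

EXCLUDE_KEYS = ("describedBy", "schema_type")
SCALAR_LIST_DELIMITER = "||"


def sheet_title(dotted_name: str) -> str:
    """'project.funders' -> 'Project - Funders'."""
    parts = [part.replace("_", " ").capitalize() for part in dotted_name.split(".")]
    return " - ".join(parts)


def flatten(content: dict, prefix: str, sheets: Dict[str, List[dict]], row: dict) -> None:
    """Flatten `content` into `row`; lists of objects become rows on their own sheet."""
    for key, value in content.items():
        if key in EXCLUDE_KEYS:
            continue
        full_key = f"{prefix}.{key}"
        if isinstance(value, dict):
            # Nested object: keep flattening into the same row.
            flatten(value, full_key, sheets, row)
        elif isinstance(value, list) and value and isinstance(value[0], dict):
            # List of objects: one row per object on a separate worksheet.
            for item in value:
                child_row: dict = {}
                flatten(item, full_key, sheets, child_row)
                sheets.setdefault(sheet_title(full_key), []).append(child_row)
        elif isinstance(value, list):
            # List of scalars: join into a single cell.
            row[full_key] = SCALAR_LIST_DELIMITER.join(str(v) for v in value)
        else:
            row[full_key] = str(value)


# Hypothetical input entity, loosely reconstructed from the fixture values.
content = {
    "describedBy": "https://example.org/schema/project",  # placeholder URL
    "schema_type": "project",
    "project_core": {"project_short_name": "kriegsteinBrainOrganoids"},
    "insdc_project_accessions": ["SRP180337"],
    "funders": [
        {"grant_id": "U01 MH105989", "organization": "NIMH NIH HHS"},
        {"grant_id": "NA", "organization": "Howard Hughes Medical Institute"},
    ],
}

sheets: Dict[str, List[dict]] = {}
project_row = {"project.uuid": "3e329187-a9c4-48ec-90e3-cc45f7c2311c"}
# Register the entity sheet first so it precedes its module sheets.
sheets[sheet_title("project")] = [project_row]
flatten(content, "project", sheets, project_row)
```

Registering the entity worksheet before flattening keeps `Project` ahead of `Project - Funders` in dict order, which matches the ordering of the fixture.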