Use GET /datasets endpoint for builder #378

Merged
ebezzi merged 2 commits into main from ebezzi/use-datasets-endpoint-for-builder
Apr 17, 2023

Conversation

ebezzi
Member

@ebezzi ebezzi commented Apr 13, 2023


codecov bot commented Apr 13, 2023

Codecov Report

❗ No coverage uploaded for pull request base (main@281ffdf).
The diff coverage is 90.24%.

@@           Coverage Diff           @@
##             main     #378   +/-   ##
=======================================
  Coverage        ?   88.33%           
=======================================
  Files           ?       50           
  Lines           ?     2727           
  Branches        ?        0           
=======================================
  Hits            ?     2409           
  Misses          ?      318           
  Partials        ?        0           
Flag Coverage Δ
unittests 88.33% <90.24%> (?)

Impacted Files Coverage Δ
...llxgene_census_builder/build_soma/source_assets.py 69.23% <0.00%> (ø)
...cellxgene_census_builder/tests/anndata/conftest.py 100.00% <ø> (ø)
tools/cellxgene_census_builder/tests/conftest.py 97.95% <ø> (ø)
...rc/cellxgene_census_builder/build_soma/manifest.py 95.38% <90.90%> (ø)
...rc/cellxgene_census_builder/build_soma/datasets.py 95.34% <100.00%> (ø)
...llxgene_census_builder/build_soma/validate_soma.py 89.72% <100.00%> (ø)
...ls/cellxgene_census_builder/tests/test_manifest.py 100.00% <100.00%> (ø)
...llxgene_census_builder/tests/test_source_assets.py 95.00% <100.00%> (ø)

Collaborator

@atolopko-czi atolopko-czi left a comment

LGTM, with some optional, minor suggestions. Before merging, can you perform a test build (at least to the point of staging all the datasets) and note the result on the PR?


d = Dataset(
    dataset_id=dataset_id,
    corpora_asset_h5ad_uri=asset_h5ad_uri,
Collaborator

@danieljhegeman title is a required (non-null) field, right?


return [Dataset(**d) for d in datasets.values()]
continue
asset_h5ad_uri = assets[0]["url"]
Collaborator

Worth asserting that len(assets) == 1 as a sanity check? Or at least log this unexpected case as a warning.

Member Author

I added a logging.error and a continue.
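For readers following the thread, the "log an error and continue" handling could look roughly like the sketch below. It is not the code merged in this PR: the helper names `select_h5ad_uri` and `collect_h5ad_uris`, the `dataset_id` key, and the filtering on `filetype == "H5AD"` are all assumptions based on the record shape visible in the test fixture.

```python
import logging
from typing import Any, Dict, List, Optional

logger = logging.getLogger(__name__)


def select_h5ad_uri(dataset: Dict[str, Any]) -> Optional[str]:
    """Return the URL of the dataset's single H5AD asset, or None if the count is unexpected.

    Hypothetical helper: the record layout ("assets", "filetype", "url") is assumed.
    """
    assets = dataset.get("assets") or []
    h5ad_assets = [a for a in assets if a.get("filetype") == "H5AD"]
    if len(h5ad_assets) != 1:
        # Unexpected case: report it and let the caller skip this dataset.
        logger.error(
            "Dataset %s: expected exactly 1 H5AD asset, found %d; skipping",
            dataset.get("dataset_id"),
            len(h5ad_assets),
        )
        return None
    return h5ad_assets[0]["url"]


def collect_h5ad_uris(datasets: List[Dict[str, Any]]) -> Dict[str, str]:
    """Map dataset_id to H5AD URL, skipping datasets with an unexpected asset count."""
    uris: Dict[str, str] = {}
    for dataset in datasets:
        uri = select_h5ad_uri(dataset)
        if uri is None:
            continue  # already logged in select_h5ad_uri
        uris[dataset["dataset_id"]] = uri
    return uris
```

Logging and skipping keeps one malformed record from aborting a whole build, whereas an `assert` would stop at the first surprise; which is preferable depends on how strict the build should be.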

"collection_doi": None,
"title": "dataset #2",
"schema_version": "3.0.0",
"assets": [{"filesize": 456, "filetype": "H5AD", "url": "https://fake.url/dataset_id_2.h5ad"}],
Collaborator

worth adding a non-H5AD asset as well, either here or in an appropriate test
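One way to cover that suggestion is a fixture that mixes asset types, as in the sketch below. The `h5ad_urls` helper, the `dataset_id` key, and the `"RDS"` filetype value are hypothetical stand-ins for illustration, not names taken from the builder code.

```python
from typing import Any, Dict, List


def h5ad_urls(dataset: Dict[str, Any]) -> List[str]:
    """Hypothetical helper: return the URLs of all H5AD assets in a dataset record."""
    return [a["url"] for a in dataset.get("assets", []) if a.get("filetype") == "H5AD"]


def test_non_h5ad_assets_are_ignored() -> None:
    # Same shape as the fixture in this PR, plus one non-H5AD asset (assumed filetype "RDS").
    dataset = {
        "dataset_id": "dataset_id_2",
        "collection_doi": None,
        "title": "dataset #2",
        "schema_version": "3.0.0",
        "assets": [
            {"filesize": 456, "filetype": "H5AD", "url": "https://fake.url/dataset_id_2.h5ad"},
            {"filesize": 789, "filetype": "RDS", "url": "https://fake.url/dataset_id_2.rds"},
        ],
    }
    assert h5ad_urls(dataset) == ["https://fake.url/dataset_id_2.h5ad"]
```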

)
response.append(d)

logging.info(f"Found {len(datasets)} datasets")
Collaborator

out-of-scope, but it would be nice to report all the warning type counts here (datasets excluded for schema, missing h5ad asset)
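A possible shape for such a summary, using `collections.Counter`, is sketched below. The reason labels, the `SUPPORTED_SCHEMA_VERSIONS` set, and both function names are assumptions made for illustration; the builder's actual exclusion rules are not shown in this thread.

```python
import logging
from collections import Counter
from typing import Any, Dict, Iterable, Optional

# Assumption: the set of accepted schema versions; the real value lives in the builder.
SUPPORTED_SCHEMA_VERSIONS = {"3.0.0"}


def exclusion_reason(dataset: Dict[str, Any]) -> Optional[str]:
    """Return a short label for why a dataset is excluded, or None if it is kept."""
    if dataset.get("schema_version") not in SUPPORTED_SCHEMA_VERSIONS:
        return "unsupported_schema"
    if not any(a.get("filetype") == "H5AD" for a in dataset.get("assets", [])):
        return "missing_h5ad_asset"
    return None


def summarize_exclusions(datasets: Iterable[Dict[str, Any]]) -> Counter:
    """Count kept and excluded datasets by reason and log a one-line summary."""
    counts: Counter = Counter()
    for dataset in datasets:
        counts[exclusion_reason(dataset) or "included"] += 1
    summary = ", ".join(f"{reason}={n}" for reason, n in sorted(counts.items()))
    logging.info("Found %d datasets (%s)", sum(counts.values()), summary)
    return counts
```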


d = Dataset(
    dataset_id=dataset_id,
    corpora_asset_h5ad_uri=asset_h5ad_uri,
Collaborator

Seems like a good time to rename corpora to dataset

Member Author

ebezzi commented Apr 14, 2023

Did a run with 5 datasets and it worked (after a bugfix!). Should work fine on Monday's run.

@ebezzi ebezzi merged commit 92764bb into main Apr 17, 2023
@ebezzi ebezzi deleted the ebezzi/use-datasets-endpoint-for-builder branch April 17, 2023 12:56