Use GET /datasets endpoint for builder #378

ebezzi · 2023-04-13T16:04:41Z

codecov · 2023-04-13T16:16:08Z

Codecov Report

❗ No coverage uploaded for pull request base (main@281ffdf).
The diff coverage is 90.24%.

@@           Coverage Diff           @@
##             main     #378   +/-   ##
=======================================
  Coverage        ?   88.33%           
=======================================
  Files           ?       50           
  Lines           ?     2727           
  Branches        ?        0           
=======================================
  Hits            ?     2409           
  Misses          ?      318           
  Partials        ?        0

Flag	Coverage Δ
unittests	`88.33% <90.24%> (?)`

Flags with carried forward coverage won't be shown.

Impacted Files	Coverage Δ
...llxgene_census_builder/build_soma/source_assets.py	`69.23% <0.00%> (ø)`
...cellxgene_census_builder/tests/anndata/conftest.py	`100.00% <ø> (ø)`
tools/cellxgene_census_builder/tests/conftest.py	`97.95% <ø> (ø)`
...rc/cellxgene_census_builder/build_soma/manifest.py	`95.38% <90.90%> (ø)`
...rc/cellxgene_census_builder/build_soma/datasets.py	`95.34% <100.00%> (ø)`
...llxgene_census_builder/build_soma/validate_soma.py	`89.72% <100.00%> (ø)`
...ls/cellxgene_census_builder/tests/test_manifest.py	`100.00% <100.00%> (ø)`
...llxgene_census_builder/tests/test_source_assets.py	`95.00% <100.00%> (ø)`

atolopko-czi

LGTM, with some optional, minor suggestions. Before merging, can you perform a test build (at least to the point of staging all the datasets) and note the result on the PR?

atolopko-czi · 2023-04-14T11:52:26Z

tools/cellxgene_census_builder/src/cellxgene_census_builder/build_soma/manifest.py

+
+        d = Dataset(
+            dataset_id=dataset_id,
+            corpora_asset_h5ad_uri=asset_h5ad_uri,


@danieljhegeman `title` is a required (non-null) field, right?

atolopko-czi · 2023-04-14T11:54:10Z

tools/cellxgene_census_builder/src/cellxgene_census_builder/build_soma/manifest.py

-
-    return [Dataset(**d) for d in datasets.values()]
+            continue
+        asset_h5ad_uri = assets[0]["url"]


Worth asserting that `len(assets) == 1` as a sanity check? Or at least logging this unexpected case as a warning?

ebezzi: I added a `logging.error` and a `continue`.
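
For illustration only, the check discussed in this thread might look roughly like the sketch below. It is not the code from this PR: `select_h5ad_uri` is a hypothetical helper, the asset dictionaries are assumed to have the `filetype` and `url` keys seen in the test fixture, and the strict "exactly one H5AD asset" rule is the reviewer's suggestion rather than a confirmed behavior of the builder.

```python
import logging
from typing import Any, Dict, List, Optional

logger = logging.getLogger(__name__)


def select_h5ad_uri(dataset_id: str, all_assets: List[Dict[str, Any]]) -> Optional[str]:
    """Return the URL of the dataset's H5AD asset, or None if it should be skipped.

    Hypothetical helper: a caller looping over datasets would `continue`
    whenever this returns None.
    """
    # Keep only H5AD assets; other file types are ignored.
    assets = [a for a in all_assets if a.get("filetype") == "H5AD"]
    if len(assets) != 1:
        # Assumption: zero or multiple H5AD assets is unexpected, so log and skip.
        logger.error(
            "Dataset %s: expected exactly 1 H5AD asset, found %d; skipping",
            dataset_id,
            len(assets),
        )
        return None
    return assets[0]["url"]
```

Logging and skipping, rather than asserting, keeps one malformed dataset from aborting the whole build.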

atolopko-czi · 2023-04-14T11:56:20Z

tools/cellxgene_census_builder/tests/test_manifest.py

+                "collection_doi": None,
+                "title": "dataset #2",
+                "schema_version": "3.0.0",
+                "assets": [{"filesize": 456, "filetype": "H5AD", "url": "https://fake.url/dataset_id_2.h5ad"}],


Worth adding a non-H5AD asset as well, either here or in another appropriate test.
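
A test along these lines could cover the suggestion. This is a sketch under assumptions: the `RDS` file type, the `.rds` URL, the `dataset_id` field, and the `h5ad_urls` helper are all hypothetical stand-ins, and the real test would exercise the builder's own manifest-loading function instead.

```python
from typing import Any, Dict, List

# Hypothetical fixture entry: same fields as the fixture above, plus a non-H5AD asset.
DATASET_WITH_MIXED_ASSETS: Dict[str, Any] = {
    "dataset_id": "dataset_id_2",
    "collection_doi": None,
    "title": "dataset #2",
    "schema_version": "3.0.0",
    "assets": [
        {"filesize": 123, "filetype": "RDS", "url": "https://fake.url/dataset_id_2.rds"},
        {"filesize": 456, "filetype": "H5AD", "url": "https://fake.url/dataset_id_2.h5ad"},
    ],
}


def h5ad_urls(dataset: Dict[str, Any]) -> List[str]:
    """Stand-in for the builder's asset filtering: keep only H5AD asset URLs."""
    return [a["url"] for a in dataset["assets"] if a["filetype"] == "H5AD"]


def test_non_h5ad_assets_are_ignored() -> None:
    # Only the H5AD asset should survive, regardless of its position in the list.
    assert h5ad_urls(DATASET_WITH_MIXED_ASSETS) == ["https://fake.url/dataset_id_2.h5ad"]
```

Placing the non-H5AD asset first also guards against code that blindly takes the first entry of the unfiltered list.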

atolopko-czi · 2023-04-14T12:00:33Z

tools/cellxgene_census_builder/src/cellxgene_census_builder/build_soma/manifest.py

+        )
+        response.append(d)
+
+    logging.info(f"Found {len(datasets)} datasets")


Out of scope, but it would be nice to report counts for each warning type here (datasets excluded for schema version, datasets missing an H5AD asset).
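
One possible way to do this, sketched with `collections.Counter`. Everything here is an assumption made for illustration: the `summarize_datasets` helper, the reason labels, and the `SUPPORTED_SCHEMA_VERSIONS` set do not come from the PR.

```python
import logging
from collections import Counter
from typing import Any, Dict, List

logger = logging.getLogger(__name__)

# Assumption for illustration; the builder defines its own supported versions.
SUPPORTED_SCHEMA_VERSIONS = {"3.0.0"}


def summarize_datasets(datasets: List[Dict[str, Any]]) -> Dict[str, int]:
    """Count accepted datasets and each reason a dataset was excluded."""
    reasons: Counter = Counter()
    for d in datasets:
        if d.get("schema_version") not in SUPPORTED_SCHEMA_VERSIONS:
            reasons["unsupported_schema"] += 1
        elif not any(a.get("filetype") == "H5AD" for a in d.get("assets", [])):
            reasons["missing_h5ad_asset"] += 1
        else:
            reasons["accepted"] += 1
    # A single summary line instead of one warning per skipped dataset.
    logger.info("Found %d datasets: %s", len(datasets), dict(reasons))
    return dict(reasons)
```

Each dataset is counted under exactly one label, so the counts always sum to the number of datasets returned by the endpoint.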

atolopko-czi · 2023-04-14T12:00:56Z

tools/cellxgene_census_builder/src/cellxgene_census_builder/build_soma/manifest.py

+
+        d = Dataset(
+            dataset_id=dataset_id,
+            corpora_asset_h5ad_uri=asset_h5ad_uri,


Seems like a good time to rename `corpora` to `dataset`.

ebezzi · 2023-04-14T19:21:04Z

Did a test build with 5 datasets and it worked (after a bugfix!). It should work fine on Monday's run.

Use GET /datasets endpoint for builder

a766137

ebezzi requested review from danieljhegeman and atolopko-czi April 13, 2023 16:04

atolopko-czi approved these changes Apr 14, 2023

various changes

9ea3714

ebezzi merged commit 92764bb into main Apr 17, 2023

ebezzi deleted the ebezzi/use-datasets-endpoint-for-builder branch April 17, 2023 12:56
