-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use GET /datasets endpoint for builder #378
Conversation
Codecov Report
@@ Coverage Diff @@
## main #378 +/- ##
=======================================
Coverage ? 88.33%
=======================================
Files ? 50
Lines ? 2727
Branches ? 0
=======================================
Hits ? 2409
Misses ? 318
Partials ? 0
Flags with carried forward coverage won't be shown. Click here to find out more.
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, with some optional, minor suggestions. Before merging, can you perform a test build (at least to the point of staging all the datasets) and note on the PR?
|
||
d = Dataset( | ||
dataset_id=dataset_id, | ||
corpora_asset_h5ad_uri=asset_h5ad_uri, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@danieljhegeman title is a required (non-null) field, right?
|
||
return [Dataset(**d) for d in datasets.values()] | ||
continue | ||
asset_h5ad_uri = assets[0]["url"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
worth asserting that len(assets) == 1
as a sanity check? Or at least logging this unexpected case as a warning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a logging.error
and a continue
.
"collection_doi": None, | ||
"title": "dataset #2", | ||
"schema_version": "3.0.0", | ||
"assets": [{"filesize": 456, "filetype": "H5AD", "url": "https://fake.url/dataset_id_2.h5ad"}], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
worth adding a non-H5AD asset as well, either here or in an appropriate test
) | ||
response.append(d) | ||
|
||
logging.info(f"Found {len(datasets)} datasets") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
out-of-scope, but it would be nice to report all the warning type counts here (datasets excluded for schema, missing h5ad asset)
|
||
d = Dataset( | ||
dataset_id=dataset_id, | ||
corpora_asset_h5ad_uri=asset_h5ad_uri, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like a good time to rename corpora
to dataset
Did a run with 5 datasets and it worked (after a bugfix!). Should work fine on Monday's run. |
#332