[R] CellCensus package MVP #206

Merged
merged 17 commits into from
Feb 28, 2023

Conversation

mlin
Contributor

@mlin mlin commented Feb 19, 2023

Provides CellCensus::open_soma(census_version='latest') to get the top-level tiledbsoma::SOMACollection, as well as underlying helper methods for loading the release directory JSON.
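For readers unfamiliar with the release directory, here is a rough, language-neutral sketch in Python of how a version string such as 'latest' might be resolved against the directory JSON. It is not the R package's implementation. The JSON layout (an alias is a string naming another key, a release is an object), the example URI, and the function name `resolve_census_version` are all assumptions made for illustration.

```python
import json

# Hypothetical release directory, shaped as an assumption for this sketch:
# an alias maps to another key (a string), a release maps to a record (a dict).
EXAMPLE_DIRECTORY_JSON = """
{
  "latest": "2023-02-13",
  "2023-02-13": {
    "release_build": "2023-02-13",
    "soma": {"uri": "s3://example-bucket/2023-02-13/soma/"}
  }
}
"""


def resolve_census_version(directory, census_version="latest"):
    """Follow aliases until a concrete release record is reached."""
    seen = set()
    key = census_version
    while True:
        if key in seen:
            raise ValueError(f"alias cycle at {key!r}")
        seen.add(key)
        if key not in directory:
            raise KeyError(f"unknown census version {key!r}")
        entry = directory[key]
        if isinstance(entry, dict):
            return key, entry
        key = entry  # entry is an alias naming another key


directory = json.loads(EXAMPLE_DIRECTORY_JSON)
version, record = resolve_census_version(directory)
print(version, record["soma"]["uri"])
```

Resolving the alias before opening anything means the caller always learns which concrete release it ended up with, which is useful when 'latest' moves.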

@codecov

codecov bot commented Feb 19, 2023

Codecov Report

Merging #206 (9cb394d) into main (ccfc1f0) will increase coverage by 1.31%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##             main     #206      +/-   ##
==========================================
+ Coverage   82.43%   83.75%   +1.31%     
==========================================
  Files          28       29       +1     
  Lines        1560     1619      +59     
==========================================
+ Hits         1286     1356      +70     
+ Misses        274      263      -11     
Flag Coverage Δ
unittests 83.75% <ø> (+1.31%) ⬆️

Impacted Files Coverage Δ
tools/cell_census_builder/mp.py 60.00% <0.00%> (-17.28%) ⬇️
tools/cell_census_builder/validate.py 88.96% <0.00%> (-3.15%) ⬇️
tools/cell_census_builder/summary_cell_counts.py 94.11% <0.00%> (-2.18%) ⬇️
tools/cell_census_builder/util.py 63.01% <0.00%> (-1.37%) ⬇️
tools/cell_census_builder/experiment_builder.py 94.27% <0.00%> (-0.49%) ⬇️
tools/cell_census_builder/datasets.py 97.56% <0.00%> (-0.27%) ⬇️
tools/cell_census_builder/globals.py 100.00% <0.00%> (ø)
tools/cell_census_builder/census_summary.py 100.00% <0.00%> (ø)
tools/cell_census_builder/tests/conftest.py 100.00% <0.00%> (ø)
api/python/cell_census/tests/test_directory.py 100.00% <0.00%> (ø)
... and 16 more

@mlin mlin mentioned this pull request Feb 21, 2023
@mlin mlin changed the title [WIP] bootstrap R package [R] R CellCensus package MVP Feb 26, 2023
@mlin mlin marked this pull request as ready for review February 26, 2023 06:36
@mlin mlin changed the title [R] R CellCensus package MVP [R] CellCensus package MVP Feb 26, 2023
@@ -0,0 +1,28 @@
name: cell_census R package checks
Contributor

now that we are multi-lingual, we may want to reorganize the other workflows (or at least rename them so it is clear they are Python-specific).

Optional idea: rename each with a language prefix?

@atolopko-czi - any thoughts or preferences on this?

I'm also OK if we defer this to a future PR/project.

Collaborator

@atolopko-czi atolopko-czi Feb 27, 2023

either language-specific prefixes or subdirs (if that's supported) seems reasonable to me; agree that we should differentiate languages

Contributor Author

Will do a small follow-up PR on this.

@bkmartinjr
Contributor

@mlin @pablo-gar - how will demo/doc notebooks be organized now that we are multi-lingual? Will they still exist in language-specific sub-folders alongside the cell-census package, or be promoted into another location?

I ask because this decision may affect how the contents of api/{r,python}/* are organized.

@@ -0,0 +1,22 @@
Version: 1.0
Contributor

I don't pretend to know R ecosystem versioning conventions, but should an early release such as this start as a 1.0? Or 0.?

Member

Also should we sync this to the Python versioning? We should probably consolidate the behavior with tiledb-soma anyway.

Contributor Author

This 1.0 is a red herring -- it describes the format/schema version of the .Rproj file, not of the project/package itself. The package version is written in the DESCRIPTION file and defaulted to 0.0.0.9000 (I don't know where the 9000 comes from!). Agree we will want some coordinated version numbers, hopefully controlled by git tags, when we're settled enough to make coherent release versions. Looking forward to that =)

@@ -0,0 +1,12 @@
test_that("open_soma", {
coll <- open_soma("2023-02-13")
Contributor

@bkmartinjr bkmartinjr Feb 27, 2023

this build tag will eventually go away (weeks to months from now). The only durable tag (currently) is the latest tag.

I'm not quite sure what to suggest - and you likely know the above already :-)

If we want to preserve a well-known tag for testing, we could easily add it to the release manifest, and keep it around semi-permanently (perhaps even re-aliasing it as needed).

Collaborator

IMO, if tests depend upon having specific data, those tests should use a test fixture (dynamically built, ideally) rather than the live census. And we should use latest if it's the latter case.

Member

I think fixtures are ideal, but they will probably take quite a bit of time to write, so we should add them after we ship this MVP. Using this as a sanity check is a good idea, at least for now. Using latest might be dangerous, since a new census build could cause tests to start failing in main.

Collaborator

Agreed that fixtures are fine for future consideration.

I think it's acceptable if tests fail even if the census data is the cause. These are essentially system tests, rather than unit tests. Consider that many of the Python "unit" tests, which are really system tests, are using pytest markers (annotations) to denote that they depend upon live data. If we ever have a failure that is census build-specific, it's probably an indication that we're missing an important builder validation. And we can then improve the validator as needed.
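To make the idea of flagging live-data tests concrete, here is a minimal sketch of the gating pattern. It deliberately uses the standard library's unittest skip mechanism as a stand-in rather than pytest markers, so it is not the repository's actual setup. The environment variable `CENSUS_LIVE_TESTS`, the decorator `requires_live_census`, and the helper `run_and_summarize` are hypothetical names.

```python
import os
import unittest

# Hypothetical switch for this sketch; the repository's pytest marker
# configuration is not reproduced here.
LIVE_DATA_ENV = "CENSUS_LIVE_TESTS"


def requires_live_census(test_item):
    """Skip the decorated test unless live-data tests were enabled.

    The environment variable is read once, when the decorator is applied.
    """
    enabled = os.environ.get(LIVE_DATA_ENV) == "1"
    reason = f"set {LIVE_DATA_ENV}=1 to run tests against the live census"
    return unittest.skipUnless(enabled, reason)(test_item)


class OpenSomaSystemTest(unittest.TestCase):
    @requires_live_census
    def test_latest_opens(self):
        # Placeholder body: a real system test would open the live census here.
        self.assertTrue(True)


def run_and_summarize(case):
    """Run one TestCase class and return (tests run, skipped, failures + errors)."""
    result = unittest.TestResult()
    unittest.defaultTestLoader.loadTestsFromTestCase(case).run(result)
    failed = len(result.failures) + len(result.errors)
    return result.testsRun, len(result.skipped), failed


print(run_and_summarize(OpenSomaSystemTest))
```

Marking the dependency explicitly keeps a failure caused by census data distinguishable from a failure in the package code, which is the distinction the system-test framing relies on.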

Contributor Author

Agree all. I changed some things to use latest where it didn't really matter. Where a specific version is needed (e.g. checking the respective row of the release directory dataframe), I changed the several occurrences of 2023-02-13 to refer to a single hardcoded constant, so that it'll be easy to update if/when that version goes away.

Contributor

@bkmartinjr bkmartinjr left a comment

A couple of minor nits (most significant is the version number - do we want to use 1.0 at this early date?)

But I think it is completely sufficient as a first bootstrap. Thank you!

on:
  pull_request:
    paths-ignore:
      - "apis/python/**"
Member

We can probably add the builder here: tools/cell_census_builder

@@ -0,0 +1,2 @@
.Rproj.user
Member

Any reason for not having this in the root .gitignore?

Member

@ebezzi ebezzi left a comment

Looks great! Added a couple of nitpicks but they can be addressed at a later time (if needed).

@pablo-gar
Contributor

> @mlin @pablo-gar - how will demo/doc notebooks be organized now that we are multi-lingual? Will they still exist in language-specific sub-folders alongside the cell-census package, or be promoted into another location?
>
> I ask because this decision may affect how the contents of api/{r,python}/* are organized.

@bkmartinjr The main location for users to access the notebooks should be the doc-site. As such, the Python notebooks can stay as they are, and the R tutorials should live in the vignettes folder of the R package (cc @mlin) -- these get automatically rendered in the doc-site via pkgdown (cc @ebezzi)

@mlin mlin merged commit 9097162 into main Feb 28, 2023
@mlin mlin deleted the mlin/bootstrap-r-pkg branch February 28, 2023 10:29
5 participants