-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[R] CellCensus package MVP #206
Conversation
Codecov Report
@@ Coverage Diff @@
## main #206 +/- ##
==========================================
+ Coverage 82.43% 83.75% +1.31%
==========================================
Files 28 29 +1
Lines 1560 1619 +59
==========================================
+ Hits 1286 1356 +70
+ Misses 274 263 -11
Flags with carried forward coverage won't be shown. Click here to find out more.
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
@@ -0,0 +1,28 @@ | |||
name: cell_census R package checks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
now that we are multi-lingual, we may want to reorganize the other workflows (or at least rename them so it is clear they are Python-specific).
Optional idea: rename each with a language prefix?
@atolopko-czi - any thoughts or preferences on this?
I'm also OK if we defer this to a future PR/project.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
either language-specific prefixes or subdirs (if that's supported) seems reasonable to me; agree that we should differentiate languages
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do a small follow-up PR on this.
@mlin @pablo-gar - how will demo/doc notebooks be organized now that we are multi-lingual? Will they still exist in language-specific sub-folders alongside the cell-census package, or be promoted into another location? I ask because this decision may effect how the contents of api/{r,python}/* are organized. |
@@ -0,0 +1,22 @@ | |||
Version: 1.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't pretend to know R ecosystem versioning conventions, but should an early release such as this start as a 1.0? Or 0.?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also should we sync this to the Python versioning? We should probably consolidate the behavior with tiledb-soma anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This 1.0
is a red herring -- it describes the format/schema version of the .Rproj
file, not of the project/package itself. The package version is written in the DESCRIPTION
file and defaulted to 0.0.0.9000
(I don't know where the 9000 comes from!). Agree we will want to some some coordinated version numbers, hopefully controlled by git tags, when we're settled enough to make coherent release versions. Looking forward to that =)
@@ -0,0 +1,12 @@ | |||
test_that("open_soma", { | |||
coll <- open_soma("2023-02-13") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this build tag will eventually go away (weeks to months from now). The only durable tag (currently) is the latest
tag.
I'm not quite sure what to suggest - and you likely know the above already :-)
If we want to preserve a well-known tag for testing, we could easily add it to the release manifest, and keep it around semi-permanently (perhaps even re-aliasing it as needed).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO, if tests depend upon having specific data, those tests should use a test fixture (dynamically built, ideally) rather than the live census. And we should use latest
if it's the latter case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think fixtures are ideal but they will probably take quite a bit of time to write and we should probably add them after we ship this MVP. I think using this as a sanity check is a good idea at least for now. Using latest
might be dangerous since a build could cause tests to start failing in main after a build.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed that fixtures are fine for future consideration.
I think it's acceptable if tests fail even if the census data is the cause. These are essentially system tests, rather than unit tests. Consider that many of the Python "unit" tests, which are really system tests, are using pytest markers (annotations) to denote that they depend upon live data. If we ever have a failure that is census build-specific, it's probably an indication that we're missing an important builder validation. And we can then improve the validator as needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree all. I changed some things to use latest
where it didn't really matter, and where needed (e.g. checking the respective row of the release directory dataframe), I changed the several occurrences of 2022-02-13
to refer to a single hardcoded constant so that it'll be easy to update if/when that version goes away.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of minor nits (most significant is the version number - do we want to use 1.0
at this early date?)
But I think it is completely sufficient as a first bootstrap. Thank you!
on: | ||
pull_request: | ||
paths-ignore: | ||
- "apis/python/**" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can probably add the builder here: tools/cell_census_builder
api/r/CellCensus/.gitignore
Outdated
@@ -0,0 +1,2 @@ | |||
.Rproj.user |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason for not having this in the root .gitignore
?
@@ -0,0 +1,22 @@ | |||
Version: 1.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also should we sync this to the Python versioning? We should probably consolidate the behavior with tiledb-soma anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! Added a couple of nitpicks but they can be addressed at a later time (if needed).
@bkmartinjr The main location for users to access the notebooks should be in the doc-site. As such the python notebooks can stay as they are, and the R tutorials should live in the vignettes folder of the R package (cc @mlin) -- these get automatically render in the doc-site via |
Provides
CellCensus::open_soma(census_version='latest')
to get the top-leveltiledbsoma::SOMACollection
, as well as underlying helper methods for loading the release directory JSON.