[R] CellCensus package MVP #206

mlin · 2023-02-19T02:00:26Z

Provides CellCensus::open_soma(census_version='latest') to get the top-level tiledbsoma::SOMACollection, as well as underlying helper methods for loading the release directory JSON.

Initial README guidance is to install the package directly from GitHub for now.
Initially depends on bleeding-edge builds of tiledbsoma from r-universe, per @eddelbuettel's suggestion.
Packaging strategy follows the R Packages book.
Includes GH CI workflow using some nice R Actions.
Some early roxygen docstrings, mostly copy pasta from the Python equivalents.

codecov · 2023-02-19T02:04:50Z

Codecov Report

Merging #206 (9cb394d) into main (ccfc1f0) will increase coverage by 1.31%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##             main     #206      +/-   ##
==========================================
+ Coverage   82.43%   83.75%   +1.31%     
==========================================
  Files          28       29       +1     
  Lines        1560     1619      +59     
==========================================
+ Hits         1286     1356      +70     
+ Misses        274      263      -11

| Flag | Coverage Δ | |
| --- | --- | --- |
| unittests | `83.75% <ø> (+1.31%)` | ⬆️ |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ | |
| --- | --- | --- |
| tools/cell_census_builder/mp.py | `60.00% <0.00%> (-17.28%)` | ⬇️ |
| tools/cell_census_builder/validate.py | `88.96% <0.00%> (-3.15%)` | ⬇️ |
| tools/cell_census_builder/summary_cell_counts.py | `94.11% <0.00%> (-2.18%)` | ⬇️ |
| tools/cell_census_builder/util.py | `63.01% <0.00%> (-1.37%)` | ⬇️ |
| tools/cell_census_builder/experiment_builder.py | `94.27% <0.00%> (-0.49%)` | ⬇️ |
| tools/cell_census_builder/datasets.py | `97.56% <0.00%> (-0.27%)` | ⬇️ |
| tools/cell_census_builder/globals.py | `100.00% <0.00%> (ø)` | |
| tools/cell_census_builder/census_summary.py | `100.00% <0.00%> (ø)` | |
| tools/cell_census_builder/tests/conftest.py | `100.00% <0.00%> (ø)` | |
| api/python/cell_census/tests/test_directory.py | `100.00% <0.00%> (ø)` | |

... and 16 more

bkmartinjr · 2023-02-27T16:23:57Z

.github/workflows/Rcheck.yml

@@ -0,0 +1,28 @@
+name: cell_census R package checks


now that we are multi-lingual, we may want to reorganize the other workflows (or at least rename them so it is clear they are Python-specific).

Optional idea: rename each with a language prefix?

@atolopko-czi - any thoughts or preferences on this?

I'm also OK if we defer this to a future PR/project.

atolopko-czi · 2023-02-27

Either language-specific prefixes or subdirs (if that's supported) seem reasonable to me; agree that we should differentiate languages.

mlin · 2023-02-28

Will do a small follow-up PR on this.

bkmartinjr · 2023-02-27T16:29:34Z

@mlin @pablo-gar - how will demo/doc notebooks be organized now that we are multi-lingual? Will they still exist in language-specific sub-folders alongside the cell-census package, or be promoted into another location?

I ask because this decision may affect how the contents of api/{r,python}/* are organized.

bkmartinjr · 2023-02-27T16:48:20Z

api/r/CellCensus/CellCensus.Rproj

@@ -0,0 +1,22 @@
+Version: 1.0


I don't pretend to know R ecosystem versioning conventions, but should an early release such as this start as a 1.0? Or 0.?

ebezzi · 2023-02-27

Also should we sync this to the Python versioning? We should probably consolidate the behavior with tiledb-soma anyway.

mlin · 2023-02-28

This 1.0 is a red herring -- it describes the format/schema version of the .Rproj file, not of the project/package itself. The package version is written in the DESCRIPTION file and defaulted to 0.0.0.9000 (I don't know where the 9000 comes from!). Agree we will want some coordinated version numbers, hopefully controlled by git tags, when we're settled enough to make coherent release versions. Looking forward to that =)

bkmartinjr · 2023-02-27T17:02:02Z

api/r/CellCensus/tests/testthat/test-open.R

@@ -0,0 +1,12 @@
+test_that("open_soma", {
+  coll <- open_soma("2023-02-13")


this build tag will eventually go away (weeks to months from now). The only durable tag (currently) is the latest tag.

I'm not quite sure what to suggest - and you likely know the above already :-)

If we want to preserve a well-known tag for testing, we could easily add it to the release manifest, and keep it around semi-permanently (perhaps even re-aliasing it as needed).
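To make the aliasing idea above concrete, here is a minimal Python sketch. It assumes the release manifest is a mapping in which a tag is either a concrete release description or a string naming another tag. The `test-fixture` alias, the function name, and the example URI are hypothetical, invented for illustration; they are not taken from the actual manifest or package.

```python
from typing import Any, Dict

# Hypothetical, trimmed-down release directory. Keys, alias names, and the
# URI are placeholders for illustration, not the real manifest contents.
EXAMPLE_DIRECTORY: Dict[str, Any] = {
    "latest": "2023-02-13",        # alias: a string naming another key
    "test-fixture": "2023-02-13",  # hypothetical long-lived alias for tests
    "2023-02-13": {
        "release_build": "2023-02-13",
        "soma": {"uri": "s3://example-bucket/2023-02-13/soma/"},
    },
}


def resolve_census_version(directory: Dict[str, Any], tag: str) -> Dict[str, Any]:
    """Follow alias strings until a concrete release description is reached."""
    seen = set()
    current = tag
    while True:
        if current in seen:
            raise ValueError(f"alias cycle detected at {current!r}")
        seen.add(current)
        if current not in directory:
            raise KeyError(f"unknown census version {current!r}")
        entry = directory[current]
        if isinstance(entry, dict):
            return entry  # concrete release description
        current = entry  # entry is an alias; keep following it
```

Under this assumption, re-aliasing means changing one string value in the manifest, and tests that refer to the alias would not need to change when a dated build is retired.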

atolopko-czi · 2023-02-27

IMO, if tests depend upon having specific data, those tests should use a test fixture (dynamically built, ideally) rather than the live census. And we should use latest if it's the latter case.

ebezzi · 2023-02-27

I think fixtures are ideal, but they will probably take quite a bit of time to write, and we should probably add them after we ship this MVP. I think using this as a sanity check is a good idea, at least for now. Using latest might be dangerous, since a new census build could cause tests in main to start failing.

atolopko-czi · 2023-02-27

Agreed that fixtures are fine for future consideration.

I think it's acceptable if tests fail even if the census data is the cause. These are essentially system tests, rather than unit tests. Consider that many of the Python "unit" tests, which are really system tests, are using pytest markers (annotations) to denote that they depend upon live data. If we ever have a failure that is census build-specific, it's probably an indication that we're missing an important builder validation. And we can then improve the validator as needed.
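The comment above refers to pytest markers in the Python tests. As a rough analogue using only the standard library, the sketch below gates live-data tests with `unittest.skipUnless`. This is a swapped-in technique rather than the project's actual markers, and the environment variable name, decorator name, and test class are hypothetical.

```python
import os
import unittest

# Hypothetical opt-in switch; the project does not define this variable.
LIVE_ENV_VAR = "CENSUS_LIVE_TESTS"


def live_tests_enabled(environ=os.environ) -> bool:
    """Return True when the caller has opted in to tests that hit live data."""
    return environ.get(LIVE_ENV_VAR, "") == "1"


def requires_live_census(test_item):
    """Mark a test as a system test: skipped unless live tests are enabled."""
    reason = f"set {LIVE_ENV_VAR}=1 to run tests against the live census"
    return unittest.skipUnless(live_tests_enabled(), reason)(test_item)


class OpenSomaSystemTest(unittest.TestCase):
    @requires_live_census
    def test_open_latest(self):
        # Placeholder body: a real system test would open the live census here.
        self.assertTrue(True)

    def test_version_string_shape(self):
        # Pure unit test: no network or live data involved, so it always runs.
        self.assertEqual(len("2023-02-13".split("-")), 3)
```

Keeping the live-data dependency explicit in a decorator makes it easy to tell which failures could be caused by a census build rather than by the client code.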

mlin · 2023-02-28

Agree all. I changed some things to use latest where it didn't really matter. Where a specific version is needed (e.g. checking the respective row of the release directory dataframe), I changed the several occurrences of 2023-02-13 to refer to a single hardcoded constant, so that it'll be easy to update if/when that version goes away.

bkmartinjr

A couple of minor nits (most significant is the version number - do we want to use 1.0 at this early date?)

But I think it is completely sufficient as a first bootstrap. Thank you!

ebezzi · 2023-02-27T17:26:45Z

.github/workflows/Rcheck.yml

+on:
+  pull_request:
+    paths-ignore:
+      - "apis/python/**"


We can probably add the builder here: tools/cell_census_builder

ebezzi · 2023-02-27T17:27:49Z

api/r/CellCensus/.gitignore

@@ -0,0 +1,2 @@
+.Rproj.user


Any reason for not having this in the root .gitignore?

ebezzi · 2023-02-27T17:28:36Z

api/r/CellCensus/CellCensus.Rproj

@@ -0,0 +1,22 @@
+Version: 1.0


Also should we sync this to the Python versioning? We should probably consolidate the behavior with tiledb-soma anyway.

ebezzi

Looks great! Added a couple of nitpicks but they can be addressed at a later time (if needed).

pablo-gar · 2023-02-27T18:47:41Z

> @mlin @pablo-gar - how will demo/doc notebooks be organized now that we are multi-lingual? Will they still exist in language-specific sub-folders alongside the cell-census package, or be promoted into another location?
>
> I ask because this decision may affect how the contents of api/{r,python}/* are organized.

@bkmartinjr The main location for users to access the notebooks should be the doc-site. As such, the Python notebooks can stay as they are, and the R tutorials should live in the vignettes folder of the R package (cc @mlin) -- these get automatically rendered in the doc-site via pkgdown (cc @ebezzi)

In the beginning

4de790c

mlin added 6 commits February 18, 2023 16:19

add Rcheck workflow

dfbe224

fix Rcheck workflow

2aa914d

fix Rcheck workflow

dff15b0

styler

1f75d7f

styler

2594478

styler

3edaea9

mlin mentioned this pull request Feb 21, 2023

Create an MVP of the R API #142

Closed

mlin added 7 commits February 21, 2023 08:18

Merge remote-tracking branch 'origin/main' into mlin/bootstrap-r-pkg

a0d5a8a

rename cellcensus to CellCensus

f1172e1

rename cellcensus to CellCensus

000250a

add open_soma

90ce5a0

add extra-repositories to fix CI

ea20f1d

stub readme

7b7bf22

update README.md

7b9508b

mlin changed the title ~~[WIP] bootstrap R package~~ [R] R CellCensus package MVP Feb 26, 2023

mlin marked this pull request as ready for review February 26, 2023 06:36

mlin changed the title ~~[R] R CellCensus package MVP~~ [R] CellCensus package MVP Feb 26, 2023

mlin requested review from pablo-gar, maniarathi, bkmartinjr, 0seastar0 and ebezzi February 26, 2023 06:49

bkmartinjr reviewed Feb 27, 2023

bkmartinjr approved these changes Feb 27, 2023

ebezzi reviewed Feb 27, 2023

ebezzi approved these changes Feb 27, 2023

mlin added 3 commits February 27, 2023 23:15

clean up .gitignore

288c22c

Rcheck workflow: ignore cell_census_builder

fd42254

hardcode known census version in only one place

9cb394d

mlin merged commit 9097162 into main Feb 28, 2023

mlin deleted the mlin/bootstrap-r-pkg branch February 28, 2023 10:29
