Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[R] CellCensus package MVP #206

Merged
merged 17 commits into from
Feb 28, 2023
28 changes: 28 additions & 0 deletions .github/workflows/Rcheck.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,28 @@
name: cell_census R package checks
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now that we are multi-lingual, we may want to reorganize the other workflows (or at least rename them so it is clear they are Python-specific).

Optional idea: rename each with a language prefix?

@atolopko-czi - any thoughts or preferences on this?

I'm also OK if we defer this to a future PR/project.

Copy link
Collaborator

@atolopko-czi atolopko-czi Feb 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

either language-specific prefixes or subdirs (if that's supported) seems reasonable to me; agree that we should differentiate languages

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do a small follow-up PR on this.


on:
pull_request:
paths-ignore:
- "apis/python/**"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can probably add the builder here: tools/cell_census_builder

push:
branches: [main]

jobs:
build:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- uses: r-lib/actions/setup-r@v2
with:
extra-repositories: https://tiledb-inc.r-universe.dev
- uses: r-lib/actions/setup-r-dependencies@v2
with:
working-directory: ./api/r/CellCensus
extra-packages: any::rcmdcheck, any::styler
needs: check
- name: styler
run: Rscript -e 'library("styler"); style_pkg("api/r/CellCensus", dry="fail")'
- uses: r-lib/actions/check-r-package@v2
with:
working-directory: ./api/r/CellCensus
args: 'c("--no-manual", "--as-cran")'
2 changes: 2 additions & 0 deletions .github/workflows/build.yml
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,8 @@ name: cell_census build

on:
pull_request:
paths-ignore:
- "apis/r/**"
push:
branches: [main]
workflow_dispatch:
Expand Down
2 changes: 2 additions & 0 deletions .github/workflows/formatting.yml
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,8 @@ name: Python Linting

on:
pull_request:
paths-ignore:
- "apis/r/**"
push:
branches: [main]

Expand Down
2 changes: 2 additions & 0 deletions .github/workflows/unittests.yml
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,8 @@ name: cell_census Python package unit tests

on:
pull_request:
paths-ignore:
- "apis/r/**"
push:
branches: [main]

Expand Down
1 change: 1 addition & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -134,3 +134,4 @@ temp

# ruff
.ruff_cache
.Rproj.user
4 changes: 4 additions & 0 deletions api/r/CellCensus/.Rbuildignore
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
^CellCensus\.Rproj$
^\.Rproj\.user$
^LICENSE\.md$
^README\.Rmd$
2 changes: 2 additions & 0 deletions api/r/CellCensus/.gitignore
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
.Rproj.user
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason for not having this in the root .gitignore?

.Rhistory
22 changes: 22 additions & 0 deletions api/r/CellCensus/CellCensus.Rproj
Original file line number Diff line number Diff line change
@@ -0,0 +1,22 @@
Version: 1.0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't pretend to know R ecosystem versioning conventions, but should an early release such as this start as a 1.0? Or 0.?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also should we sync this to the Python versioning? We should probably consolidate the behavior with tiledb-soma anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This 1.0 is a red herring -- it describes the format/schema version of the .Rproj file, not of the project/package itself. The package version is written in the DESCRIPTION file and defaulted to 0.0.0.9000 (I don't know where the 9000 comes from!). Agree we will want to some some coordinated version numbers, hopefully controlled by git tags, when we're settled enough to make coherent release versions. Looking forward to that =)


RestoreWorkspace: No
SaveWorkspace: No
AlwaysSaveHistory: Default

EnableCodeIndexing: Yes
UseSpacesForTab: Yes
NumSpacesForTab: 2
Encoding: UTF-8

RnwWeave: Sweave
LaTeX: pdfLaTeX

AutoAppendNewline: Yes
StripTrailingWhitespace: Yes
LineEndingConversion: Posix

BuildType: Package
PackageUseDevtools: Yes
PackageInstallArgs: --no-multiarch --with-keep.source
PackageRoxygenize: rd,collate,namespace
23 changes: 23 additions & 0 deletions api/r/CellCensus/DESCRIPTION
Original file line number Diff line number Diff line change
@@ -0,0 +1,23 @@
Package: CellCensus
Title: CZI Science Cell Census API
Version: 0.0.0.9000
Authors@R:
person("Chan Zuckerberg Initiative", email = "[email protected]",
role = c("aut", "cre", "cph", "fnd"))
Description: API to facilitate use of the CZI Science Cell Census. The Cell
Census is a versioned container for the single-cell data hosted at CELLxGENE
Discover.
License: MIT + file LICENSE
URL: https://github.com/chanzuckerberg/cell-census
BugReports: https://github.com/chanzuckerberg/cell-census/issues
Encoding: UTF-8
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.2.3
Additional_repositories: https://tiledb-inc.r-universe.dev
Imports:
jsonlite,
tiledbsoma,
tiledb
Suggests:
testthat (>= 3.0.0)
Config/testthat/edition: 3
2 changes: 2 additions & 0 deletions api/r/CellCensus/LICENSE
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
YEAR: 2023
COPYRIGHT HOLDER: Chan Zuckerberg Initiative
21 changes: 21 additions & 0 deletions api/r/CellCensus/LICENSE.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,21 @@
# MIT License

Copyright (c) 2023 Chan Zuckerberg Initiative

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
8 changes: 8 additions & 0 deletions api/r/CellCensus/NAMESPACE
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
# Generated by roxygen2: do not edit by hand

export(get_census_version_description)
export(get_census_version_directory)
export(open_soma)
importFrom(jsonlite,fromJSON)
importFrom(tiledb,tiledb_ctx)
importFrom(tiledbsoma,SOMACollection)
31 changes: 31 additions & 0 deletions api/r/CellCensus/R/open.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,31 @@
DEFAULT_TILEDB_CONFIGURATION <- c(
"py.init_buffer_bytes" = paste(1 * 1024**3),
"soma.init_buffer_bytes" = paste(1 * 1024**3)
)

#' Open the Cell Census
#'
#' @param census_version The version of the Census, e.g., "latest"
#' @param uri The URI containing the Census SOMA objects. If specified, takes
#' precedence over `census_version`.
#'
#' @return Top-level `tiledbsoma::SOMACollection` object
#' @importFrom tiledbsoma SOMACollection
#' @importFrom tiledb tiledb_ctx
#' @export
#'
#' @examples
open_soma <- function(census_version = "latest", uri = "") {
cfg <- DEFAULT_TILEDB_CONFIGURATION

if (uri == "") {
description <- get_census_version_description(census_version)
uri <- description$soma.uri
if ("soma.s3_region" %in% names(description) &&
description$soma.s3_region != "") {
cfg <- c(cfg, c("vfs.s3.region" = description$soma.s3_region))
}
}

return(tiledbsoma::SOMACollection$new(uri, ctx = tiledb::tiledb_ctx(cfg)))
}
59 changes: 59 additions & 0 deletions api/r/CellCensus/R/release_directory.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,59 @@
CELL_CENSUS_RELEASE_DIRECTORY_URL <- "https://s3.us-west-2.amazonaws.com/cellxgene-data-public/cell-census/release.json"


#' Get release description for given census version
#'
#' @param census_version The census version name.
#'
#' @return List with the release location and metadata
#' @export
#'
#' @examples
get_census_version_description <- function(census_version) {
census_directory <- get_census_version_directory()
description <- census_directory[census_version, ]
if (nrow(description) == 0) {
stop(paste("unknown Cell Census version:", census_version))
}
ans <- as.list(description)
ans$census_version <- census_version
return(ans)
}

#' Get the directory of cell census releases currently available
#'
#' @return Data frame of available cell census releases, including location and
#' metadata.
#' @importFrom jsonlite fromJSON
#' @export
#'
#' @examples
get_census_version_directory <- function() {
raw <- jsonlite::fromJSON(CELL_CENSUS_RELEASE_DIRECTORY_URL)

# Resolve all aliases for easier use
for (field in names(raw)) {
points_at <- raw[[field]]
while (is.character(points_at)) {
points_at <- raw[[points_at]]
}
raw[[field]] <- points_at
}

# Replace NULLs with empty string to facilitate data frame conversion
raw <- simple_rapply(raw, function(x) ifelse(is.null(x), "", x))

# Convert nested list to data frame
df <- do.call(rbind, lapply(raw, as.data.frame))
rownames(df) <- names(raw)
return(df)
}

# https://stackoverflow.com/a/38950304
simple_rapply <- function(x, fn) {
if (is.list(x)) {
lapply(x, simple_rapply, fn)
} else {
fn(x)
}
}
38 changes: 38 additions & 0 deletions api/r/CellCensus/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,38 @@

# CELLxGENE Cell Census

<!-- badges: start -->
<!-- badges: end -->

The `CellCensus` package provides an API to facilitate use of the CZI Science Cell Census. The Cell Census is a versioned container for the single-cell data hosted at [CELLxGENE Discover](https://cellxgene.cziscience.com/).

**Status**: Pre-release, under rapid development. Expect API changes.

For more information, see the [cell_census repo](https://github.com/chanzuckerberg/cell-census/).

## Installation

You can install the development version of CellCensus from [GitHub](https://github.com/) with:

``` r
# install.packages("devtools")
devtools::install_github("chanzuckerberg/cell-census/api/r/CellCensus")
print(CellCensus::open_soma())
```

(minimal apt dependencies: r-base cmake git)

## Example

This is a basic example which shows you how to solve a common problem:

``` r
library(CellCensus)
## basic example code
```

## For More Help

For more help, please file a issue on the repo, or contact us at <[email protected]>

If you believe you have found a security issue, we would appreciate notification. Please send email to <[email protected]>.
17 changes: 17 additions & 0 deletions api/r/CellCensus/man/get_census_version_description.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

15 changes: 15 additions & 0 deletions api/r/CellCensus/man/get_census_version_directory.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

20 changes: 20 additions & 0 deletions api/r/CellCensus/man/open_soma.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

12 changes: 12 additions & 0 deletions api/r/CellCensus/tests/testthat.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,12 @@
# This file is part of the standard setup for testthat.
# It is recommended that you do not modify it.
#
# Where should you do additional test configuration?
# Learn more about the roles of various files in:
# * https://r-pkgs.org/tests.html
# * https://testthat.r-lib.org/reference/test_package.html#special-files

library(testthat)
library(CellCensus)

test_check("CellCensus")
12 changes: 12 additions & 0 deletions api/r/CellCensus/tests/testthat/test-open.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,12 @@
test_that("open_soma", {
coll <- open_soma("2023-02-13")
Copy link
Contributor

@bkmartinjr bkmartinjr Feb 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this build tag will eventually go away (weeks to months from now). The only durable tag (currently) is the latest tag.

I'm not quite sure what to suggest - and you likely know the above already :-)

If we want to preserve a well-known tag for testing, we could easily add it to the release manifest, and keep it around semi-permanently (perhaps even re-aliasing it as needed).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, if tests depend upon having specific data, those tests should use a test fixture (dynamically built, ideally) rather than the live census. And we should use latest if it's the latter case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think fixtures are ideal but they will probably take quite a bit of time to write and we should probably add them after we ship this MVP. I think using this as a sanity check is a good idea at least for now. Using latest might be dangerous since a build could cause tests to start failing in main after a build.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed that fixtures are fine for future consideration.

I think it's acceptable if tests fail even if the census data is the cause. These are essentially system tests, rather than unit tests. Consider that many of the Python "unit" tests, which are really system tests, are using pytest markers (annotations) to denote that they depend upon live data. If we ever have a failure that is census build-specific, it's probably an indication that we're missing an important builder validation. And we can then improve the validator as needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree all. I changed some things to use latest where it didn't really matter, and where needed (e.g. checking the respective row of the release directory dataframe), I changed the several occurrences of 2022-02-13 to refer to a single hardcoded constant so that it'll be easy to update if/when that version goes away.

expect_equal(coll$uri, "s3://cellxgene-data-public/cell-census/2023-02-13/soma/")
expect_true(coll$exists())
expect_true(coll$get("census_data")$get("homo_sapiens")$exists())
})

test_that("open_soma latest/default", {
coll_default <- open_soma()
coll_latest <- open_soma("latest")
expect_equal(coll_default$uri, coll_latest$uri)
})
19 changes: 19 additions & 0 deletions api/r/CellCensus/tests/testthat/test-release_directory.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,19 @@
test_that("get_census_version_description", {
desc <- get_census_version_description("2023-02-13")
expect_equal(desc$release_build, "2023-02-13")
expect_equal(desc$soma.uri, "s3://cellxgene-data-public/cell-census/2023-02-13/soma/")

# alias resolution
desc <- get_census_version_description("latest")
expect_true(is.list(desc))
expect_true(is.character(desc$release_build))
expect_true(is.character(desc$soma.uri))
})

test_that("get_census_version_directory", {
df <- get_census_version_directory()
expect_true(is.data.frame(df))
desc <- as.list(df["2023-02-13", ])
expect_equal(desc$release_build, "2023-02-13")
expect_equal(desc$soma.uri, "s3://cellxgene-data-public/cell-census/2023-02-13/soma/")
})