-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[R] CellCensus package MVP #206
Changes from 14 commits
4de790c
dfbe224
2aa914d
dff15b0
1f75d7f
2594478
3edaea9
a0d5a8a
f1172e1
000250a
90ce5a0
ea20f1d
7b7bf22
7b9508b
288c22c
fd42254
9cb394d
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,28 @@ | ||
name: cell_census R package checks | ||
|
||
on: | ||
pull_request: | ||
paths-ignore: | ||
- "apis/python/**" | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We can probably add the builder here: |
||
push: | ||
branches: [main] | ||
|
||
jobs: | ||
build: | ||
runs-on: ubuntu-latest | ||
steps: | ||
- uses: actions/checkout@v3 | ||
- uses: r-lib/actions/setup-r@v2 | ||
with: | ||
extra-repositories: https://tiledb-inc.r-universe.dev | ||
- uses: r-lib/actions/setup-r-dependencies@v2 | ||
with: | ||
working-directory: ./api/r/CellCensus | ||
extra-packages: any::rcmdcheck, any::styler | ||
needs: check | ||
- name: styler | ||
run: Rscript -e 'library("styler"); style_pkg("api/r/CellCensus", dry="fail")' | ||
- uses: r-lib/actions/check-r-package@v2 | ||
with: | ||
working-directory: ./api/r/CellCensus | ||
args: 'c("--no-manual", "--as-cran")' |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -2,6 +2,8 @@ name: Python Linting | |
|
||
on: | ||
pull_request: | ||
paths-ignore: | ||
- "apis/r/**" | ||
push: | ||
branches: [main] | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -134,3 +134,4 @@ temp | |
|
||
# ruff | ||
.ruff_cache | ||
.Rproj.user |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
^CellCensus\.Rproj$ | ||
^\.Rproj\.user$ | ||
^LICENSE\.md$ | ||
^README\.Rmd$ |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
.Rproj.user | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Any reason for not having this in the root |
||
.Rhistory |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,22 @@ | ||
Version: 1.0 | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't pretend to know R ecosystem versioning conventions, but should an early release such as this start as a 1.0? Or 0.? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Also should we sync this to the Python versioning? We should probably consolidate the behavior with tiledb-soma anyway. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This |
||
|
||
RestoreWorkspace: No | ||
SaveWorkspace: No | ||
AlwaysSaveHistory: Default | ||
|
||
EnableCodeIndexing: Yes | ||
UseSpacesForTab: Yes | ||
NumSpacesForTab: 2 | ||
Encoding: UTF-8 | ||
|
||
RnwWeave: Sweave | ||
LaTeX: pdfLaTeX | ||
|
||
AutoAppendNewline: Yes | ||
StripTrailingWhitespace: Yes | ||
LineEndingConversion: Posix | ||
|
||
BuildType: Package | ||
PackageUseDevtools: Yes | ||
PackageInstallArgs: --no-multiarch --with-keep.source | ||
PackageRoxygenize: rd,collate,namespace |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,23 @@ | ||
Package: CellCensus | ||
Title: CZI Science Cell Census API | ||
Version: 0.0.0.9000 | ||
Authors@R: | ||
person("Chan Zuckerberg Initiative", email = "[email protected]", | ||
role = c("aut", "cre", "cph", "fnd")) | ||
Description: API to facilitate use of the CZI Science Cell Census. The Cell | ||
Census is a versioned container for the single-cell data hosted at CELLxGENE | ||
Discover. | ||
License: MIT + file LICENSE | ||
URL: https://github.com/chanzuckerberg/cell-census | ||
BugReports: https://github.com/chanzuckerberg/cell-census/issues | ||
Encoding: UTF-8 | ||
Roxygen: list(markdown = TRUE) | ||
RoxygenNote: 7.2.3 | ||
Additional_repositories: https://tiledb-inc.r-universe.dev | ||
Imports: | ||
jsonlite, | ||
tiledbsoma, | ||
tiledb | ||
Suggests: | ||
testthat (>= 3.0.0) | ||
Config/testthat/edition: 3 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
YEAR: 2023 | ||
COPYRIGHT HOLDER: Chan Zuckerberg Initiative |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,21 @@ | ||
# MIT License | ||
|
||
Copyright (c) 2023 Chan Zuckerberg Initiative | ||
|
||
Permission is hereby granted, free of charge, to any person obtaining a copy | ||
of this software and associated documentation files (the "Software"), to deal | ||
in the Software without restriction, including without limitation the rights | ||
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell | ||
copies of the Software, and to permit persons to whom the Software is | ||
furnished to do so, subject to the following conditions: | ||
|
||
The above copyright notice and this permission notice shall be included in all | ||
copies or substantial portions of the Software. | ||
|
||
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR | ||
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, | ||
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE | ||
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER | ||
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, | ||
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE | ||
SOFTWARE. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
# Generated by roxygen2: do not edit by hand | ||
|
||
export(get_census_version_description) | ||
export(get_census_version_directory) | ||
export(open_soma) | ||
importFrom(jsonlite,fromJSON) | ||
importFrom(tiledb,tiledb_ctx) | ||
importFrom(tiledbsoma,SOMACollection) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,31 @@ | ||
DEFAULT_TILEDB_CONFIGURATION <- c( | ||
"py.init_buffer_bytes" = paste(1 * 1024**3), | ||
"soma.init_buffer_bytes" = paste(1 * 1024**3) | ||
) | ||
|
||
#' Open the Cell Census | ||
#' | ||
#' @param census_version The version of the Census, e.g., "latest" | ||
#' @param uri The URI containing the Census SOMA objects. If specified, takes | ||
#' precedence over `census_version`. | ||
#' | ||
#' @return Top-level `tiledbsoma::SOMACollection` object | ||
#' @importFrom tiledbsoma SOMACollection | ||
#' @importFrom tiledb tiledb_ctx | ||
#' @export | ||
#' | ||
#' @examples | ||
open_soma <- function(census_version = "latest", uri = "") { | ||
cfg <- DEFAULT_TILEDB_CONFIGURATION | ||
|
||
if (uri == "") { | ||
description <- get_census_version_description(census_version) | ||
uri <- description$soma.uri | ||
if ("soma.s3_region" %in% names(description) && | ||
description$soma.s3_region != "") { | ||
cfg <- c(cfg, c("vfs.s3.region" = description$soma.s3_region)) | ||
} | ||
} | ||
|
||
return(tiledbsoma::SOMACollection$new(uri, ctx = tiledb::tiledb_ctx(cfg))) | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,59 @@ | ||
CELL_CENSUS_RELEASE_DIRECTORY_URL <- "https://s3.us-west-2.amazonaws.com/cellxgene-data-public/cell-census/release.json" | ||
|
||
|
||
#' Get release description for given census version | ||
#' | ||
#' @param census_version The census version name. | ||
#' | ||
#' @return List with the release location and metadata | ||
#' @export | ||
#' | ||
#' @examples | ||
get_census_version_description <- function(census_version) { | ||
census_directory <- get_census_version_directory() | ||
description <- census_directory[census_version, ] | ||
if (nrow(description) == 0) { | ||
stop(paste("unknown Cell Census version:", census_version)) | ||
} | ||
ans <- as.list(description) | ||
ans$census_version <- census_version | ||
return(ans) | ||
} | ||
|
||
#' Get the directory of cell census releases currently available | ||
#' | ||
#' @return Data frame of available cell census releases, including location and | ||
#' metadata. | ||
#' @importFrom jsonlite fromJSON | ||
#' @export | ||
#' | ||
#' @examples | ||
get_census_version_directory <- function() { | ||
raw <- jsonlite::fromJSON(CELL_CENSUS_RELEASE_DIRECTORY_URL) | ||
|
||
# Resolve all aliases for easier use | ||
for (field in names(raw)) { | ||
points_at <- raw[[field]] | ||
while (is.character(points_at)) { | ||
points_at <- raw[[points_at]] | ||
} | ||
raw[[field]] <- points_at | ||
} | ||
|
||
# Replace NULLs with empty string to facilitate data frame conversion | ||
raw <- simple_rapply(raw, function(x) ifelse(is.null(x), "", x)) | ||
|
||
# Convert nested list to data frame | ||
df <- do.call(rbind, lapply(raw, as.data.frame)) | ||
rownames(df) <- names(raw) | ||
return(df) | ||
} | ||
|
||
# https://stackoverflow.com/a/38950304 | ||
simple_rapply <- function(x, fn) { | ||
if (is.list(x)) { | ||
lapply(x, simple_rapply, fn) | ||
} else { | ||
fn(x) | ||
} | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,38 @@ | ||
|
||
# CELLxGENE Cell Census | ||
|
||
<!-- badges: start --> | ||
<!-- badges: end --> | ||
|
||
The `CellCensus` package provides an API to facilitate use of the CZI Science Cell Census. The Cell Census is a versioned container for the single-cell data hosted at [CELLxGENE Discover](https://cellxgene.cziscience.com/). | ||
|
||
**Status**: Pre-release, under rapid development. Expect API changes. | ||
|
||
For more information, see the [cell_census repo](https://github.com/chanzuckerberg/cell-census/). | ||
|
||
## Installation | ||
|
||
You can install the development version of CellCensus from [GitHub](https://github.com/) with: | ||
|
||
``` r | ||
# install.packages("devtools") | ||
devtools::install_github("chanzuckerberg/cell-census/api/r/CellCensus") | ||
print(CellCensus::open_soma()) | ||
``` | ||
|
||
(minimal apt dependencies: r-base cmake git) | ||
|
||
## Example | ||
|
||
This is a basic example which shows you how to solve a common problem: | ||
|
||
``` r | ||
library(CellCensus) | ||
## basic example code | ||
``` | ||
|
||
## For More Help | ||
|
||
For more help, please file a issue on the repo, or contact us at <[email protected]> | ||
|
||
If you believe you have found a security issue, we would appreciate notification. Please send email to <[email protected]>. |
Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,12 @@ | ||
# This file is part of the standard setup for testthat. | ||
# It is recommended that you do not modify it. | ||
# | ||
# Where should you do additional test configuration? | ||
# Learn more about the roles of various files in: | ||
# * https://r-pkgs.org/tests.html | ||
# * https://testthat.r-lib.org/reference/test_package.html#special-files | ||
|
||
library(testthat) | ||
library(CellCensus) | ||
|
||
test_check("CellCensus") |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,12 @@ | ||
test_that("open_soma", { | ||
coll <- open_soma("2023-02-13") | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. this build tag will eventually go away (weeks to months from now). The only durable tag (currently) is the I'm not quite sure what to suggest - and you likely know the above already :-) If we want to preserve a well-known tag for testing, we could easily add it to the release manifest, and keep it around semi-permanently (perhaps even re-aliasing it as needed). There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. IMO, if tests depend upon having specific data, those tests should use a test fixture (dynamically built, ideally) rather than the live census. And we should use There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think fixtures are ideal but they will probably take quite a bit of time to write and we should probably add them after we ship this MVP. I think using this as a sanity check is a good idea at least for now. Using There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Agreed that fixtures are fine for future consideration. I think it's acceptable if tests fail even if the census data is the cause. These are essentially system tests, rather than unit tests. Consider that many of the Python "unit" tests, which are really system tests, are using pytest markers (annotations) to denote that they depend upon live data. If we ever have a failure that is census build-specific, it's probably an indication that we're missing an important builder validation. And we can then improve the validator as needed. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Agree all. I changed some things to use |
||
expect_equal(coll$uri, "s3://cellxgene-data-public/cell-census/2023-02-13/soma/") | ||
expect_true(coll$exists()) | ||
expect_true(coll$get("census_data")$get("homo_sapiens")$exists()) | ||
}) | ||
|
||
test_that("open_soma latest/default", { | ||
coll_default <- open_soma() | ||
coll_latest <- open_soma("latest") | ||
expect_equal(coll_default$uri, coll_latest$uri) | ||
}) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,19 @@ | ||
test_that("get_census_version_description", { | ||
desc <- get_census_version_description("2023-02-13") | ||
expect_equal(desc$release_build, "2023-02-13") | ||
expect_equal(desc$soma.uri, "s3://cellxgene-data-public/cell-census/2023-02-13/soma/") | ||
|
||
# alias resolution | ||
desc <- get_census_version_description("latest") | ||
expect_true(is.list(desc)) | ||
expect_true(is.character(desc$release_build)) | ||
expect_true(is.character(desc$soma.uri)) | ||
}) | ||
|
||
test_that("get_census_version_directory", { | ||
df <- get_census_version_directory() | ||
expect_true(is.data.frame(df)) | ||
desc <- as.list(df["2023-02-13", ]) | ||
expect_equal(desc$release_build, "2023-02-13") | ||
expect_equal(desc$soma.uri, "s3://cellxgene-data-public/cell-census/2023-02-13/soma/") | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
now that we are multi-lingual, we may want to reorganize the other workflows (or at least rename them so it is clear they are Python-specific).
Optional idea: rename each with a language prefix?
@atolopko-czi - any thoughts or preferences on this?
I'm also OK if we defer this to a future PR/project.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
either language-specific prefixes or subdirs (if that's supported) seems reasonable to me; agree that we should differentiate languages
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do a small follow-up PR on this.