Incorporates comms edits and updates links to new landing site #421

Merged
merged 10 commits into from
Apr 28, 2023
Expand Up @@ -76,7 +76,7 @@
"source": [
"## Census organization\n",
"\n",
"The [Census schema](https://cellxgene-census.readthedocs.io/en/latest/schema.html) defines the structure of the Census. In short, you can think of the Census as a structured collection of items that stores different pieces of information. All of these items and the parent collection are SOMA objects of various types and can all be accessed with the [`TileDB-SOMA` API](https://github.com/single-cell-data/TileDB-SOMA) ([documentation](https://tiledbsoma.readthedocs.io/en/latest/)).\n",
"The [Census schema](https://chanzuckerberg.github.io/cellxgene-census/cellxgene_census_docsite_schema.html) defines the structure of the Census. In short, you can think of the Census as a structured collection of items that stores different pieces of information. All of these items and the parent collection are SOMA objects of various types and can all be accessed with the [`TileDB-SOMA` API](https://github.com/single-cell-data/TileDB-SOMA) ([documentation](https://tiledbsoma.readthedocs.io/en/latest/)).\n",
"\n",
"\n",
"The `cellxgene_census` package contains some convenient wrappers of the `TileDB-SOMA` API. An example of this is the function we used to open the Census: `cellxgene_census.open_soma()`\n",
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -18,7 +18,7 @@
"\n",
"## Opening the census\n",
"\n",
"First we open the Census, if you are not familiar with the basics of the Census API you should take a look at notebook [Learning about the CZ CELLxGENE Census](https://cellxgene-census.readthedocs.io/en/latest/notebooks/analysis_demo/comp_bio_census_info.html)"
"First we open the Census, if you are not familiar with the basics of the Census API you should take a look at notebook [Learning about the CZ CELLxGENE Census](https://chanzuckerberg.github.io/cellxgene-census/notebooks/analysis_demo/comp_bio_census_info.html)"
]
},
{
Expand Down
2 changes: 1 addition & 1 deletion api/r/cellxgene.census/vignettes/census_query_extract.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -60,7 +60,7 @@ To learn what metadata columns are available for fetching and filtering we can d
census$get("census_data")$get("homo_sapiens")$obs$colnames()
```

`soma_joinid` is a special `SOMADataFrame` column that is used for join operations. The definition for all other columns can be found at the [Census schema](https://cellxgene-census.readthedocs.io/en/latest/cellxgene_census_docsite_schema.html).
`soma_joinid` is a special `SOMADataFrame` column that is used for join operations. The definition for all other columns can be found at the [Census schema](https://chanzuckerberg.github.io/cellxgene-census/cellxgene_census_docsite_schema.html).

All of these can be used to fetch specific columns or specific rows matching a condition. For the latter we need to know the values we are looking for *a priori*.

Expand Down
53 changes: 23 additions & 30 deletions docs/cellxgene_census_docsite_FAQ.md
Original file line number Diff line number Diff line change
Expand Up @@ -19,27 +19,25 @@ Last updated: Apr, 2023.

## Why should I use the Census?

The Census provides efficient low-latency access via Python and R APIs to most single-cell RNA data from [CZ CELLxGENE Discover](https://cellxgene.cziscience.com/).
The Census provides efficient low-latency access via Python and R APIs to most single-cell RNA data from [CZ CELLxGENE Discover](https://cellxgene.cziscience.com/). To accelerate computational research, the Census enables researchers to:

To accelerate your computational research, **you should use the Census if you want to**:

- Easily get slices of data from more than 400 single-cell datasets spanning about 50 M cells from >60 K genes from human or mouse.
- Get these data with standardized and harmonized cell and gene metadata.
- Access slices of data from more than 500 single-cell datasets spanning about 50 M cells and >60 K genes from human or mouse.
- Access data with standardized cell and gene metadata and harmonized labels.
- Easily load multi-dataset slices into Scanpy or Seurat.
- Implement out-of-core (a.k.a online) operations for larger-than-memory processes.


For example you could easily get "*all T-cells from Lung with COVID-19*" into an [AnnData](https://anndata.readthedocs.io/en/latest/), [Seurat](https://satijalab.org/seurat/), or into memory-sufficient data chunks via [PyArrow](https://arrow.apache.org/docs/python/index.html) or [R Arrow](https://arrow.apache.org/docs/r/).
For example, a user can easily get *all T-cells from Lung with COVID-19* into [AnnData](https://anndata.readthedocs.io/en/latest/), [Seurat](https://satijalab.org/seurat/), or into memory-sufficient data chunks via [PyArrow](https://arrow.apache.org/docs/python/index.html) or [R Arrow](https://arrow.apache.org/docs/r/).
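
As a minimal sketch of how such a slice might be requested from Python: the query is expressed as a filter over cell metadata columns and passed to `cellxgene_census.get_anndata`. The helper `build_obs_filter` and the function `fetch_slice` are hypothetical names introduced for illustration, and the label values (`T cell`, `lung`, `COVID-19`) are assumptions that should be checked against the actual values in the Census cell metadata.

```python
def build_obs_filter(**conditions: str) -> str:
    """Join `column == 'value'` clauses with `and` (hypothetical helper)."""
    return " and ".join(
        f"{column} == '{value}'" for column, value in conditions.items()
    )


def fetch_slice(obs_filter: str):
    """Fetch matching human cells as an AnnData object.

    Needs network access and the `cellxgene_census` package, so it is
    defined here but not called.
    """
    import cellxgene_census  # imported lazily so the helper above works without it

    with cellxgene_census.open_soma() as census:
        return cellxgene_census.get_anndata(
            census,
            organism="Homo sapiens",
            obs_value_filter=obs_filter,
        )


# Assumed label values; verify them against the Census cell metadata.
obs_filter = build_obs_filter(
    cell_type="T cell",
    tissue_general="lung",
    disease="COVID-19",
)
print(obs_filter)
```

Calling `fetch_slice(obs_filter)` would then return the slice as AnnData; keeping the filter construction separate makes it easy to inspect the query before sending it.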


**You should not use the Census if you want to:**
The Census is not suited for:

- Access non-standardized cell metadata and gene metadata available in the original [datasets](https://cellxgene.cziscience.com/datasets).
- Access the author-contributed normalized expression values or embeddings.
- Access all data from a single dataset.
- Access non-RNA or spatial data present in CZ CELLxGENE Discover as it is not yet supported in the Census.
- Access to non-standardized cell metadata and gene metadata available in the original [datasets](https://cellxgene.cziscience.com/datasets).
- Access to the author-contributed normalized expression values or embeddings.
- Access to all data from just one dataset.
- Access to non-RNA or spatial data present in CZ CELLxGENE Discover, as these are not yet supported in the Census.

For all of these cases you should perform web downloads from the [CZ CELLxGENE Discover site](https://cellxgene.cziscience.com/datasets), you can find instructions to do so [here](https://cellxgene.cziscience.com/docs/03__Download%20Published%20Data).
If you’d like to perform any of the above tasks, you can access web downloads directly from the [CZ CELLxGENE Discover Datasets](https://cellxgene.cziscience.com/datasets) feature. [Click here](https://cellxgene.cziscience.com/docs/03__Download%20Published%20Data) for more information about downloading published data on CELLxGENE Discover.

## What data is contained in the Census?

Expand All @@ -59,18 +57,17 @@ The Census does not have normalized counts or embeddings because:

If you have any suggestions for methods that our team should explore please share them with us via a [feature request in the github repository](https://github.com/chanzuckerberg/cellxgene-census/issues/new?assignees=&labels=user+request&template=feature-request.md&title=).

## How does the Census differentiate from other services?
## How does the Census differ from other tools?

The Census differentiates from existing single-cell services by providing access to the largest corpus of standardized single-cell data via [TileDB-SOMA](https://github.com/single-cell-data/TileDB-SOMA/issues/new/choose).
The Census differs from existing single-cell tools by providing fast, efficient access to the largest corpus of standardized single-cell data, CZ CELLxGENE Discover, via [TileDB-SOMA](https://github.com/single-cell-data/TileDB-SOMA/issues/new/choose). Thus, single-cell data from about 50 M cells across >60 K genes, with 11 standardized cell metadata variables and harmonized GENCODE annotations, are ready for:

Thus, single-cell data from about 50 M cells across >60 K genes, with 11 standardized cell metadata variables and harmonized GENCODE annotations is at your finger tips to:
* Opening and reading data at low latency from the cloud.
* Querying and accessing data using metadata filters.
* Loading and creating AnnData objects.
* Loading and creating Seurat objects.
* From Python, creating PyArrow objects, SciPy sparse matrices, NumPy arrays, and Pandas data frames.
* From R, creating R Arrow objects, sparse matrices (via the Matrix package), and standard data frames and (dense) matrices.

- Open and read data at low latency from the cloud.
- Query and access data using metadata filters.
- Load and create AnnData objects.
- Load and create Seurat objects.
- From Python create PyArrow objects, SciPy sparse matrices, NumPy arrays, and Pandas data frames.
- From R create R Arrow objects, sparse matrices (via the Matrix package), and standard data frames and (dense) matrices.

## Can I query human and mouse data in a single query?

Expand All @@ -82,7 +79,7 @@ The Census data is publicly hosted free-of-cost in an Amazon Web Services (AWS)

## Can I retrieve the original H5AD datasets from which the Census was built?

Yes, you can use the API function `download_source_h5ad` to do so. For usage see the reference documentation at the [doc-site](https://cellxgene-census.readthedocs.io/en/) or directly from Python or R:
Yes, you can use the API function `download_source_h5ad` to do so. For usage, please see the reference documentation at the [doc-site](https://chanzuckerberg.github.io/cellxgene-census/) or directly from Python or R:

Python

Expand All @@ -100,14 +97,12 @@ library(cellxgene.census)

## How can I increase the performance of my queries?

Since the access patterns are via the internet, usually the main limiting step for data queries is bandwidth and client location.

We recommend the following to increase query efficency:
Because the data is accessed over the internet, the main limiting factors for data queries are usually bandwidth and client location. We recommend the following tactics to increase query efficiency:

- Utilize a computer connected to high-speed internet.
- Utilize an ethernet connection and not a wifi connection.
- If possible, utilize online computing located on the west coast of the US.
- Highly recommended: [EC2 AWS instances](https://aws.amazon.com/ec2/) in the `us-west-2` region.
- Highly recommended: [EC2 AWS instances](https://aws.amazon.com/ec2/) in the `us-west-2` region.

## Can I use conda to install the Census Python API?

Expand All @@ -121,17 +116,15 @@ pip install cellxgene-census

## How can I ask for support?

You can either submit a [github issue](https://github.com/chanzuckerberg/cellxgene-census/issues/new/choose) or post in the slack channel `#cellxgene-census-users` at the [CZI Slack community](https://cziscience.slack.com/join/shared_invite/zt-czl1kp2v-sgGpY4RxO3bPYmFg2XlbZA#/shared-invite/email).
You can either submit a [GitHub issue](https://github.com/chanzuckerberg/cellxgene-census/issues/new/choose), or for quick support you can join the CZI Science Community on Slack ([czi.co/science-slack](https://czi.co/science-slack)) and ask questions in the `#cellxgene-census-users` channel.

## How can I ask for new features?

You can submit a [feature request in the github repository](https://github.com/chanzuckerberg/cellxgene-census/issues/new?assignees=&labels=user+request&template=feature-request.md&title=).

## How can I contribute my data to the Census?

To inquire about submitting your data to CZ CELLxGENE Discover you need to follow these [instructions](https://cellxgene.cziscience.com/docs/032__Contribute%20and%20Publish%20Data).

If you data request is accepted, upon submission the data will automatically get included in the Census if it meets the [biological criteria defined in the Census schema](https://github.com/chanzuckerberg/cellxgene-census/blob/main/docs/cellxgene_census_schema.md#data-included).
To inquire about submitting your data to CZ CELLxGENE Discover, [click here](https://cellxgene.cziscience.com/docs/032__Contribute%20and%20Publish%20Data). If your data request is accepted, the data will automatically be included in the Census if it meets the [biological criteria defined in the Census schema](https://github.com/chanzuckerberg/cellxgene-census/blob/main/docs/cellxgene_census_schema.md#data-included).

## Why do I get an `ArraySchema` error when opening the Census?

Expand Down
42 changes: 23 additions & 19 deletions docs/cellxgene_census_docsite_landing.md
Original file line number Diff line number Diff line change
@@ -1,31 +1,25 @@
❗ **R API in beta.**


# CZ CELLxGENE Discover Census

![image](cellxgene_census_docsite_workflow.svg)

The CZ CELLxGENE Discover **Census** provides efficient computational tooling to access, query, and analyze all single-cell RNA data from CZ CELLxGENE Discover.

Using a **new access paradigm of cell-based slicing and querying**, you can interact with the data across datasets through [TileDB-SOMA](https://github.com/single-cell-data/TileDB-SOMA), or get slices in [AnnData](https://anndata.readthedocs.io/) or [Seurat](https://satijalab.org/seurat/) objects.
CZ CELLxGENE Census provides efficient computational tooling to **access, query, and analyze all single-cell RNA data from CZ CELLxGENE Discover**. Using a new access paradigm of cell-based slicing and querying, you can interact with the data through TileDB-SOMA, or get slices in AnnData or Seurat objects, thus accelerating your research by significantly minimizing data wrangling.

Get started on using the Census:

- [Installation](cellxgene_census_docsite_installation.md)
- [R & Python quick start](cellxgene_census_docsite_quick_start.md)
- [Quick start (Python and R)](cellxgene_census_docsite_quick_start.md)
- [Census data and schema](cellxgene_census_docsite_schema.md)
- [FAQ](cellxgene_census_docsite_FAQ.md)
- [Python tutorials](examples.rst)
- R tutorials. *Coming soon.*
- *Coming soon: R tutorials.*

![image](cellxgene_census_docsite_workflow.svg)

## Citing the Census

Please follow the [citation guidelines](https://cellxgene.cziscience.com/docs/08__Cite%20cellxgene%20in%20your%20publications) offered by CZ CELLxGENE Discover.

## Census Capabilities

The Census is a data object publicly hosted online and a convenience API to open it. The object is built using the [SOMA](https://github.com/single-cell-data/SOMA) API and data model via its implementation [TileDB-SOMA](https://github.com/single-cell-data/TileDB-SOMA). As such, the Census has all the data capabilities offered by TileDB-SOMA including:
The Census is a data object publicly hosted online and an API to open it. The object is built using the [SOMA](https://github.com/single-cell-data/SOMA) API specification and data model, and it is implemented via [TileDB-SOMA](https://github.com/single-cell-data/TileDB-SOMA). As such, the Census has all the data capabilities offered by TileDB-SOMA including:

**Data access at scale**

Expand All @@ -43,28 +37,38 @@ The Census is a data object publicly hosted online and a convenience API to open
- From Python create [PyArrow](https://arrow.apache.org/docs/python/index.html) objects, SciPy sparse matrices, NumPy arrays, and pandas data frames.
- From R create [R Arrow](https://arrow.apache.org/docs/r/index.html) objects, sparse matrices (via the [Matrix](https://cran.r-project.org/package=Matrix) package), and standard data frames and (dense) matrices.

## Census Data

A description of the Census data and its schema is detailed [here](cellxgene_census_docsite_schema.md).

:warning: Note that the data includes:

* **Full-gene sequencing reads** (e.g. Smart-Seq2) and **molecule counts** (e.g. 10X).
* **Duplicate cells** present across multiple datasets; these can be filtered in or out using the cell metadata variable `is_primary_data`.

## Census Data Releases

The Census data release plans are detailed [here](cellxgene_census_docsite_data_release_info.md).

Shortly, starting in May 15, 2023, Census long-term supported data releases will be published every 6 months and will be publicly accessible for at least 5 years. In addition, weekly releases are published without any guarantee of permanence.

Starting May 15th, 2023, Census data releases with long-term support will be published every six months. These releases will be publicly accessible for at least five years. In addition, weekly releases may be published without any guarantee of permanence.

## Questions, feedback and issues

- Check out the [FAQ](cellxgene_census_docsite_FAQ.md).
- Questions: we encourage you to ask questions via [github issues](https://github.com/chanzuckerberg/cellxgene-census/issues). Alternatively, for quick support you can join the [CZI Science Community](https://czi.co/science-slack) on Slack and join the `#cellxgene-census-users` channel
- Bugs: please submit a [github issue](https://github.com/chanzuckerberg/cellxgene-census/issues).
- Security issues: if you believe you have found a security issue, in lieu of filing an issue please responsibly disclose it by contacting <[email protected]>.
- You can send any other feedback to <[email protected]>
- Users are encouraged to submit questions and feature requests about the Census via [github issues](https://github.com/chanzuckerberg/cellxgene-census/issues).
- For quick support, you can join the CZI Science Community on Slack ([czi.co/science-slack](https://czi.co/science-slack)) and ask questions in the `#cellxgene-census-users` channel.
- Users are encouraged to share their feedback by emailing <[email protected]>.
- Bugs can be submitted via [github issues](https://github.com/chanzuckerberg/cellxgene-census/issues).
- If you believe you have found a security issue, please disclose it by contacting <[email protected]>.
- Additional FAQs can be found [here](cellxgene_census_docsite_FAQ.md).


## Coming soon

- We are currently working on creating the tooling necessary to perform data modeling at scale with seamless integration of the Census and [PyTorch](https://pytorch.org/).
- To increase the usability of the Census for research, in 2023 and 2024 we are planning to explore the following areas:
- Include organism-wide normalized layers.
- Include Organism-wide embeddings.
- Include organism-wide embeddings.
- On-demand information-rich subsampling.

## Projects and tools using Census
Expand Down
4 changes: 2 additions & 2 deletions docs/cellxgene_census_docsite_quick_start.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
# Quick start

This page provides some examples to get you going with using the Census. For more detailed usage please take a look at the [Python tutorials](examples.rst) or R vignettes (*coming soon*).
This page provides details to start using the Census. Click [here](examples.rst) for more detailed Python tutorials (R vignettes coming soon).

**Contents**

Expand All @@ -11,7 +11,7 @@ This page provides some examples to get you going with using the Census. For mor
## Installation


First make sure to install the Census API following the [installation instructions.](cellxgene_census_docsite_installation.md)
Install the Census API by following [these instructions.](cellxgene_census_docsite_installation.md)

## Python quick start

Expand Down
6 changes: 6 additions & 0 deletions docs/cellxgene_census_docsite_schema.md
Original file line number Diff line number Diff line change
Expand Up @@ -47,6 +47,12 @@ All data from [CZ CELLxGENE Discover](https://cellxgene.cziscience.com/) that ad
- Raw counts.
- Only standardized cell and gene metadata as described in the CELLxGENE Discover dataset [schema](https://github.com/chanzuckerberg/single-cell-curation/blob/main/schema/3.0.0/schema.md).

:warning: Note that the data includes:

* **Full-gene sequencing reads** (e.g. Smart-Seq2) and **molecule counts** (e.g. 10X).
* **Duplicate cells** present across multiple datasets; these can be filtered in or out using the cell metadata variable `is_primary_data`.
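
To illustrate the duplicate-cell note, here is a small sketch of filtering on `is_primary_data`. It uses made-up toy records rather than real Census data, so the `dataset_id` values and row contents are assumptions for illustration only. When querying the Census itself, the same condition would typically be written as a metadata filter string such as `is_primary_data == True`.

```python
# Toy stand-in for rows of the Census cell metadata (values are made up).
cells = [
    {"soma_joinid": 0, "dataset_id": "A", "is_primary_data": True},
    {"soma_joinid": 1, "dataset_id": "B", "is_primary_data": False},
    {"soma_joinid": 2, "dataset_id": "B", "is_primary_data": True},
]

# Keep only cells flagged as primary, dropping duplicated copies.
primary_cells = [cell for cell in cells if cell["is_primary_data"]]
primary_ids = [cell["soma_joinid"] for cell in primary_cells]
print(primary_ids)  # [0, 2]
```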


## SOMA objects

You can find the full SOMA specification [here](https://github.com/single-cell-data/SOMA/blob/main/abstract_specification.md#foundational-types).
Expand Down