The data stored in the X data matrix is the data that is viewable in CELLxGENE Explorer. It MUST be encoded as a scipy.sparse.csr_matrix with zero values encoded as implicit zeros.
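As a rough illustration of this encoding requirement, the sketch below converts a matrix to `scipy.sparse.csr_matrix` and drops any explicitly stored zeros so that zeros remain implicit. The helper name `to_csr_implicit_zeros` is hypothetical and is not part of any CELLxGENE tooling.

```python
import numpy as np
from scipy import sparse


def to_csr_implicit_zeros(matrix):
    """Return a CSR copy of `matrix` with no explicitly stored zeros.

    Hypothetical helper for illustration only.
    """
    # copy=True so that the caller's matrix is never modified in place.
    csr = sparse.csr_matrix(matrix, copy=True)
    # Remove stored entries whose value is 0 so zeros are implicit.
    csr.eliminate_zeros()
    return csr


# A CSC matrix built from coordinates, one of which holds a zero value.
data = np.array([1.0, 0.0, 2.0])
rows = np.array([0, 1, 2])
cols = np.array([0, 1, 2])
csc = sparse.csc_matrix((data, (rows, cols)), shape=(3, 3))

x = to_csr_implicit_zeros(csc)
print(x.format, x.nnz)
```

The conversion is shown for curators preparing a dataset; per the discussion in this issue, the validator itself is expected to reject non-CSR sparse input rather than convert it.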
CELLxGENE's matrix layer requirements are tailored to optimize data reuse. Because each assay has different characteristics, the requirements differ by assay type. In general, CELLxGENE requires submission of "raw" data suitable for computational reuse when a standard raw matrix format exists for an assay. It is STRONGLY RECOMMENDED to also include a "normalized" matrix with processed values ready for data analysis and suitable for visualization in CELLxGENE Explorer. The schema imposes the following requirements:
All matrix layers MUST have the same shape, and have the same cell labels and gene labels.
...
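The shape requirement above can be sketched as a simple comparison across layers. This is a minimal example under stated assumptions: `layers` is a plain dictionary mapping a layer name to its matrix, and `mismatched_layers` is a hypothetical helper. It compares shapes only; cell and gene labels are not checked here.

```python
import numpy as np
from scipy import sparse


def mismatched_layers(layers):
    """Return the names of layers whose shape differs from the first layer's.

    Hypothetical helper; `layers` maps layer name -> matrix.
    """
    shapes = {name: matrix.shape for name, matrix in layers.items()}
    if not shapes:
        return []
    expected = next(iter(shapes.values()))
    return [name for name, shape in shapes.items() if shape != expected]


layers = {
    "X": sparse.csr_matrix(np.eye(3)),
    "raw.X": sparse.csr_matrix(np.ones((3, 3))),
    "bad": sparse.csr_matrix(np.ones((2, 3))),
}
bad = mismatched_layers(layers)
print(bad)
```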
Context
Per conversation on the September 25 DP Call, @nayib-jose-gloria proposed:
CSC matrices: only found 2 in the corpus, and they are memory inefficient compared to CSR/COO. How do we feel about converting existing cases and codifying in the schema that we won’t take CSC? Or putting a size limit on CSC submissions?
“Backed” mode is prohibitively slow for validating CSC, requiring us to read into memory; we allocate processing resources relative to the worst case scenario, and we are quite close to making backed reads/writes happen in constant memory for CSR/COO matrices.
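One way to see why row-wise chunked reads suit CSR: in CSR, the stored values for a block of consecutive rows occupy one contiguous slice of the `data` and `indices` arrays, delimited by `indptr`. In CSC the same rows are spread across every column's segment. The sketch below only illustrates that storage layout; it is not the validator's backed-mode implementation, and `row_chunk_extents` is a hypothetical name.

```python
import numpy as np
from scipy import sparse


def row_chunk_extents(indptr, chunk_size):
    """Yield (row_start, row_stop, nz_start, nz_stop) for consecutive row chunks.

    `nz_start:nz_stop` is the contiguous slice of the CSR `data`/`indices`
    arrays that holds every stored value for rows `row_start:row_stop`.
    """
    n_rows = len(indptr) - 1
    for row_start in range(0, n_rows, chunk_size):
        row_stop = min(row_start + chunk_size, n_rows)
        yield row_start, row_stop, int(indptr[row_start]), int(indptr[row_stop])


dense = np.array([[1, 0, 2], [0, 0, 3], [4, 5, 0], [0, 6, 0]])
csr = sparse.csr_matrix(dense)
extents = list(row_chunk_extents(csr.indptr, chunk_size=2))
for row_start, row_stop, nz_start, nz_stop in extents:
    print(row_start, row_stop, csr.data[nz_start:nz_stop])
```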
@jahilton @brianraymor My understanding is that for existing datasets with CSC, they should be converted (or in the process of converting) to CSR. For any new datasets the validator sees with CSC, the expectation is that the validator should raise a failure message rather than converting to CSR.
It also seems reasonable to me that this requirement should extend to .raw.X
My understanding is that for existing datasets with CSC, they must (not "should") be converted by curators ... And @jahilton is addressing that per earlier comments.
the expectation is that the validator must (not "should") raise a failure message rather than converting to CSR.
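The fail-rather-than-convert behavior discussed above might look like the following sketch. Everything here is an assumption for illustration: the function name `check_sparse_encoding`, the error wording, and the choice to let dense matrices pass (dense support is still being discussed in this thread). The same check could be applied to both X and raw.X.

```python
from scipy import sparse


def check_sparse_encoding(matrix, name):
    """Return a list of error messages; an empty list means the encoding passes.

    Hypothetical check: sparse matrices must be CSR. The input is never
    converted. Dense matrices are not flagged by this sketch.
    """
    errors = []
    if sparse.issparse(matrix) and matrix.format != "csr":
        errors.append(
            f"Sparse matrix '{name}' is encoded as {matrix.format}; "
            "it MUST be encoded as scipy.sparse.csr_matrix."
        )
    return errors


csc = sparse.csc_matrix([[0, 1], [2, 0]])
errors = check_sparse_encoding(csc, "raw.X")
print(errors)
```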
@brianraymor I think for layers with low sparsity we'll still want to support dense matrices as well; we just want to limit the encoding options for sparse matrices. That said, I can follow up and see if our initial review of the corpus covered how many dense X/raw.X matrices actually exist.
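To make "low sparsity" concrete, the fraction of zero entries in a dense matrix can be measured as in this small sketch. The helper name `zero_fraction` is hypothetical, and no cutoff between "dense is fine" and "should be sparse" is implied, since this thread does not state one.

```python
import numpy as np


def zero_fraction(dense):
    """Return the fraction of entries in a dense array that are zero."""
    dense = np.asarray(dense)
    if dense.size == 0:
        return 0.0
    return 1.0 - np.count_nonzero(dense) / dense.size


x = np.array([[0, 0, 3], [0, 5, 0]])
frac = zero_fraction(x)
print(round(frac, 3))
```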
Design
X (Matrix Layers)
The data stored in the X data matrix is the data that is viewable in CELLxGENE Explorer. It MUST be encoded as a scipy.sparse.csr_matrix with zero values encoded as implicit zeros.
CELLxGENE's matrix layer requirements are tailored to optimize data reuse. Because each assay has different characteristics, the requirements differ by assay type. In general, CELLxGENE requires submission of "raw" data suitable for computational reuse when a standard raw matrix format exists for an assay. It is STRONGLY RECOMMENDED to also include a "normalized" matrix with processed values ready for data analysis and suitable for visualization in CELLxGENE Explorer. The schema imposes the following requirements:
Also see continuing conversation in cell-sci-platform.