Release v1.13.0 · chanzuckerberg/cellxgene-census

New embeddings API

Census embeddings can now be accessed using a new, simplified API. Check the notebooks for collaboration and hosted models for more information.

obs columns are now categorical instead of strings

Starting from the 2024-04-01 Census build, a subset of the columns in the obs dataframe are now categorical instead of strings.

For Python users, note that pandas will encode these columns as pandas.Categorical, so some downstream operations may need to be adapted. See this link for more details. In particular:

Series methods like Series.value_counts() will use all categories, even if some categories are not present in the data.

DataFrame methods like sum, groupby, pivot, value_counts also show “unused” categories when observed=False, which is the default.
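The two behaviors above can be sketched on a toy dataframe. This is an illustration only: the column names and values are made up and do not come from a real Census build, and observed is passed explicitly so the result does not depend on the default of the installed pandas version.

```python
import pandas as pd

# Toy stand-in for a Census obs dataframe (hypothetical values).
# "neuron" is a declared category that no row actually uses.
obs = pd.DataFrame(
    {
        "cell_type": pd.Categorical(
            ["B cell", "T cell", "B cell", "T cell"],
            categories=["B cell", "T cell", "neuron"],
        ),
        "n_genes": [100, 200, 150, 50],
    }
)

# value_counts() reports every category, including the unused one (count 0).
counts_all = obs["cell_type"].value_counts()

# Dropping unused categories first restores the string-like behavior.
counts_used = obs["cell_type"].cat.remove_unused_categories().value_counts()

# groupby with observed=False keeps a group for the unused category.
grouped_all = obs.groupby("cell_type", observed=False)["n_genes"].sum()

# observed=True keeps only the categories present in the data.
grouped_observed = obs.groupby("cell_type", observed=True)["n_genes"].sum()

# Converting back to plain strings is another option for code that expects them.
as_str = obs["cell_type"].astype(str)
```

Which adaptation fits best depends on the downstream code: observed=True is the smallest change for groupby-style calls, while astype(str) gives up the memory savings of the categorical encoding.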

For R users, note that these columns will be encoded as factor, and downstream operations may similarly need to be adapted. See this link for more details.

For Python and R users interfacing with arrow, these columns will be encoded as dictionary. See more details for R in this link and for Python in this link.

Additions

[builder] enable Arrow Dictionary feature flag by @bkmartinjr in #1064
[python] New embeddings API by @ebezzi in #1023
[docs] New embeddings API notebooks by @ebezzi in #1070

Full Changelog: v1.12.0...v1.13.0
