Releases: chanzuckerberg/cellxgene-census
v1.16.2
What's Changed
- [python] Fix vector search import by @ivirshup in #1287
- [python] Pin against tiledbsoma 1.14.1 by @ivirshup in #1290
- [docs]: Add banner and newsletter signup (SCE-29) by @kaloster in #1286
- [python] Correct docs on default behaviour for DataLoader no longer including soma_joinid by @ivirshup in #1292
Full Changelog: v1.16.1...v1.16.2
v1.16.1
v1.16.0
What's Changed
- [python] Embeddings search experimental API by @mlin in #1164
- [python] Add comp_bio_embedding_search notebook by @mlin @pablo-gar in #1241 #1270
- [docs] Update landing page with latest LTS data and API release. "Adds Release notes" link by @pablo-gar in #1239
- [python] Update minimum versions of dependencies + python versions following SPEC-0 by @ivirshup in #1189
- [python] Bump minimum tiledbsoma version to 1.12.3 by @ivirshup in #1235
- [docs] Fix model names in metrics article by @pablo-gar in #1248
- [docs] Fix date on cellxgene_census_docsite_data_release_info.md by @pablo-gar in #1259
- feat: Update navbar to include Differential Expression by @tihuan in #1264
Full Changelog: v1.15.0...v1.16.0
v1.15.0
What's Changed
- [python] Shuffle multiple SOMA chunks by @atolopko-czi in #1103
- [python] Relax pin: tiledbsoma~=1.11.4 by @ryan-williams in #1187
- [pytorch] Set shuffle as default in pytorch, use new algorithm by @ebezzi in #1188
- [python] Remove pytorch pin by @ebezzi in #1190
- [builder] Vector indexing pipeline for Census embeddings by @mlin in #1122
- [builder] Add collection_doi_label to the datasets dataframe by @ebezzi in #1200
- [python] Require scikit-misc < 0.4 by @ebezzi in #1213
- [builder] Upgrade to CELLxGENE schema 5.1 by @Bento007 in #1192
- [docs] Update schema doc for version 2.1.0 by @ebezzi in #1211
- [python] Update scvi pipeline for the June LTS training by @ebezzi in #1173
- [python] Check census version for get_all_available_embeddings by @ivirshup in #1207
- [python] Move to {obs/var} specific arguments for get_anndata by @ivirshup in #1149
- [python] Stop including "do_not_delete" key in result of get_census_version_directory by @ivirshup in #1215
- [python] Add package specific user-agent by @ivirshup in #1193
- [python] Update docs for get_obs, get_var by @ivirshup in #1170
- [python] Geneformer updates for July 2024 LTS by @mlin in #961
- [python] Support custom obs encoders by @ebezzi in #1191
- [python] Fix ExperimentDataPipe length by @ebezzi in #1221
- [python] dataloader optimization (picking up 1169) by @ivirshup in #1224
- [docs] Update census data release page with upcoming LTS information by @pablo-gar in #1210
- [docs] Add PyTorch loaders article release by @pablo-gar in #1214
- [python] Geneformer CLS embeddings by @mlin in #1225
- [docs] Update the Logistic Regression pytorch notebook by @ebezzi in #1230
- [docs] Add metrics article for LTS 2024-12-15 by @pablo-gar in #1233
New Contributors
- @ryan-williams made their first contribution in #1187
Full Changelog: v1.14.1...v1.15.0
v1.14.1
What's Changed
- Upgrade to tiledbsoma 1.11.4 by @ebezzi in #1185
- [python] Make docs searchable by @ivirshup in #1166
- [python] Progress bar for download_source_h5ad by @ivirshup in #1153
- [docs] Add cell guide notebook for queries on cell type descendants by @pablo-gar in #1163
- [docs] Update links to gget website and author email by @lauraluebbert in #1171
- [docs] Remove newsletter banner by @ebezzi in #1161
- [python] Use pyproject.toml for doc env by @ivirshup in #1172
- [python] Fix wrong parameters in ExperimentDataPipe by @ebezzi in #1178
Full Changelog: v1.14.0...v1.14.1
v1.14.0
Major changes
- Upgrade to tiledbsoma 1.11.3
- Add a cellxgene_census.get_obs utility function that can be used to return an obs dataframe in Pandas format more easily.
What's Changed
- [builder] Update list of accepted assays and bump schema to 2.0.1 by @ebezzi in #1117
- [docs] update scvi notebook to use latest model by @pablo-gar in #1137
- [docs] Update geneformer notebook to latest API changes by @pablo-gar in #1138
- [docs] Add development setup notes for Census API by @prathapsridharan in #1111
- [python] Add get_obs by @ivirshup in #1151
- [python] share census objects in tests for faster test time by @ivirshup in #1156
- [builder] Bump requests from 2.31.0 to 2.32.0 in /tools/cellxgene_census_builder by @dependabot in #1154
- [builder] Add ClientResponseError to the builder managed exceptions by @ebezzi in #1126
- [python] [R] Bump tiledbsoma to 1.11.3
Full Changelog: v1.13.1...v1.14.0
v1.13.1
Changes
This release pins pytorch to 2.2.0 to work around an incompatibility with torchdata introduced by the latest pytorch release. Therefore, if you're using the Census pytorch loaders and you're seeing dependency issues, you should upgrade to this version.
What's Changed
- [docs] add news article for categoricals by @pablo-gar in #1086
- [docs] Add 2024 articles to docsite by @pablo-gar in #1088
- [misc] Remove MacOS support claim for census builder by @prathapsridharan in #1084
- [builder] cellxgene_ontology_guide integration by @ebezzi in #1094
- [builder] Remove datasets from "HTAN VUMC" Collection from the blocklist by @ebezzi in #1099
- [docs] Update embedding notebook by @pablo-gar in #1124
- [ci] Fix torch dependency issue and pause tests on Mac/Python3.8 by @ebezzi in #1118
- [python] Fix pytorch imports by @ebezzi in #1129
Full Changelog: v1.13.0...v1.13.1
v1.13.0
New embeddings API
Census embeddings can now be accessed using a new, simplified API. Check the notebooks for collaboration and hosted models for more information.
obs columns are now categorical instead of strings
Starting from the 2024-04-01 Census build, a subset of the columns in the obs dataframe are now categorical instead of strings.
For Python users, note that Pandas will encode these columns as pandas.Categorical, for which some downstream operations may need to be adapted. See this link for more details. In particular:
- Series methods like Series.value_counts() will use all categories, even if some categories are not present in the data.
- DataFrame methods like sum, groupby, pivot, value_counts also show “unused” categories when observed=False, which is the default.
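The effect on downstream code can be seen on a small toy dataframe. This is a minimal sketch, not Census data: the column names (cell_type, n_genes) and values are made up for illustration, and observed is passed explicitly so the result does not depend on the default of the installed pandas version.

```python
import pandas as pd

# Toy stand-in for a Census obs dataframe; names and values are illustrative only.
obs = pd.DataFrame(
    {
        "cell_type": pd.Categorical(
            ["B cell", "T cell", "B cell"],
            categories=["B cell", "T cell", "neuron"],
        ),
        "n_genes": [100, 200, 300],
    }
)

# value_counts() reports every category, including the unused "neuron".
counts_all = obs["cell_type"].value_counts()

# Dropping unused categories first restricts the counts to values that occur.
counts_present = obs["cell_type"].cat.remove_unused_categories().value_counts()

# observed=False keeps a group for the unused category;
# observed=True keeps only the categories present in the data.
sum_all = obs.groupby("cell_type", observed=False)["n_genes"].sum()
sum_observed = obs.groupby("cell_type", observed=True)["n_genes"].sum()

# Converting back to plain strings restores the pre-categorical behaviour.
as_strings = obs["cell_type"].astype(str)
```

Code written against string columns can usually be adapted either by passing observed=True to groupby, by calling .cat.remove_unused_categories() after filtering, or by casting the column back to strings.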
For R users, note that these columns will be encoded as factor, and similarly downstream operations may need to be adapted. See this link for more details.
For Python and R users interfacing with arrow, these columns will be encoded as dictionary; see more details for R in this link and Python in this link.
Additions
- [builder] enable Arrow Dictionary feature flag by @bkmartinjr in #1064
- [python] New embeddings API by @ebezzi in #1023
- [docs] New embeddings API notebooks by @ebezzi in #1070
Full Changelog: v1.12.0...v1.13.0
v1.12.0
Census uses CELLxGENE schema version 5.0.0
The Census builder is now using Census schema 2.0.0 which in turn uses CELLxGENE dataset schema version 5.0.0. All Census data releases starting on 2024-03-26 will adhere to this schema.
- Update to require CELLxGENE schema version 5.0.0. Includes breaking changes.
- Expanded list of assays included in the Census.
- Expanded the list of assays defined as full-gene sequencing assays, which have special normalized layer handling.
- Clarified handling of datasets which are multi-species on the obs or var axis.
Additions
- [builder] CxG schema 5 / Census schema 2 by @bkmartinjr in #1024
What's Changed
- [misc] 'native cell' -> 'unknown' on geneformer label_blocklist by @danieljhegeman in #999
- [misc] Prepare for 1.12.0 release by @prathapsridharan in #1061
New Contributors
- @danieljhegeman made their first contribution in #999
Full Changelog: v1.11.1...v1.12.0