v1.13.0
New embeddings API
Census embeddings can now be accessed using a new, simplified API. Check the notebooks for collaboration and hosted models for more information.
obs columns are now categorical instead of strings
Starting from the 2024-04-01 Census build, a subset of the columns in the obs dataframe are now categorical instead of strings.
For Python users, note that pandas will encode these columns as pandas.Categorical, so some downstream operations may need to be adapted. See this link for more details. In particular:
- Series methods like Series.value_counts() will use all categories, even if some categories are not present in the data.
- DataFrame methods like sum, groupby, pivot, value_counts also show “unused” categories when observed=False, which is the default.
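As an illustration of the two points above, here is a minimal sketch using a toy dataframe rather than a real Census query. The column names and values are made up for the example; only the pandas calls are real.

```python
import pandas as pd

# Toy stand-in for a slice of the Census obs dataframe (hypothetical values).
obs = pd.DataFrame(
    {
        "cell_type": pd.Categorical(
            ["T cell", "B cell", "T cell"],
            categories=["B cell", "T cell", "neuron"],
        ),
        "n_genes": [100, 200, 300],
    }
)

# value_counts() reports every category, including the unused "neuron".
counts_all = obs["cell_type"].value_counts()

# Dropping unused categories first limits the result to values that occur.
counts_used = obs["cell_type"].cat.remove_unused_categories().value_counts()

# Passing observed=True to groupby keeps only categories present in the data.
sums_observed = obs.groupby("cell_type", observed=True)["n_genes"].sum()

# Converting back to plain strings is another option if categoricals are unwanted.
as_str = obs["cell_type"].astype(str)
```

Which option fits best depends on the downstream code: `observed=True` is the least invasive, while `astype(str)` gives up the memory savings of the categorical encoding.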
For R users, note that these columns will be encoded as factor, and similarly downstream operations may need to be adapted. See this link for more details.
For Python and R users interfacing with arrow, these columns will be encoded as dictionary; see more details for R in this link and Python in this link.
Additions
- [builder] enable Arrow Dictionary feature flag by @bkmartinjr in #1064
- [python] New embeddings API by @ebezzi in #1023
- [docs] New embeddings API notebooks by @ebezzi in #1070
Full Changelog: v1.12.0...v1.13.0