(feat): xarray with experimental backed reading #1247

Open: wants to merge 291 commits into base: main

ilan-gold (Contributor) commented Nov 30, 2023

This PR is a lighter-weight version of #947: it keeps the original AnnData class and uses it to hold obs and var as xr.Dataset objects.

codecov bot commented Dec 7, 2023

Codecov Report

Attention: Patch coverage is 88.81356% with 33 lines in your changes missing coverage. Please review.

Project coverage is 84.57%. Comparing base (34e9783) to head (bf710d0).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/anndata/experimental/backed/_lazy_arrays.py | 86.15% | 9 Missing ⚠️ |
| src/anndata/_core/storage.py | 37.50% | 5 Missing ⚠️ |
| src/anndata/tests/helpers.py | 75.00% | 5 Missing ⚠️ |
| src/anndata/_io/specs/lazy_methods.py | 91.30% | 4 Missing ⚠️ |
| src/anndata/experimental/backed/_compat.py | 86.20% | 4 Missing ⚠️ |
| src/anndata/experimental/backed/_io.py | 89.47% | 4 Missing ⚠️ |
| src/anndata/_core/aligned_df.py | 80.00% | 1 Missing ⚠️ |
| src/anndata/_core/index.py | 80.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1247      +/-   ##
==========================================
- Coverage   86.87%   84.57%   -2.31%     
==========================================
  Files          39       44       +5     
  Lines        6036     6308     +272     
==========================================
+ Hits         5244     5335      +91     
- Misses        792      973     +181     
Flag Coverage Δ
82.81% <84.06%> (?)

Flags with carried forward coverage won't be shown.

| Files with missing lines | Coverage Δ |
|---|---|
| src/anndata/_core/anndata.py | 83.77% <100.00%> (+0.04%) ⬆️ |
| src/anndata/_core/merge.py | 83.91% <ø> (-11.08%) ⬇️ |
| src/anndata/_core/views.py | 85.71% <100.00%> (-5.40%) ⬇️ |
| src/anndata/_io/specs/__init__.py | 100.00% <ø> (ø) |
| src/anndata/_io/specs/registry.py | 95.53% <100.00%> (-0.50%) ⬇️ |
| src/anndata/_io/zarr.py | 83.75% <100.00%> (+0.20%) ⬆️ |
| src/anndata/_types.py | 85.29% <100.00%> (ø) |
| src/anndata/experimental/__init__.py | 100.00% <100.00%> (ø) |
| src/anndata/experimental/backed/__init__.py | 100.00% <100.00%> (ø) |
| src/anndata/experimental/backed/_xarray.py | 100.00% <100.00%> (ø) |
... and 8 more

... and 4 files with indirect coverage changes

ilan-gold added this to the 0.11.0 milestone Jul 2, 2024
ilan-gold self-assigned this Jul 2, 2024
@ivirshup
Member

/azp run

Pull request contains merge conflicts.

src/anndata/experimental/backed/_io.py (outdated)
f = h5py.File(store, mode="r")

def callback(func, elem_name: str, elem, iospec):
    if iospec.encoding_type == "anndata" or elem_name.endswith("/"):
Member

What's the elem_name.endswith("/") case?

Contributor Author

Presumably the root store; we use this check in all the other read methods...
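
For illustration, the condition from the diff can be pulled out and tried on a few element names. This is only a sketch: `FakeIOSpec` and `is_group_like` are hypothetical stand-ins rather than anndata API, and it assumes the root element is named "/" while nested elements such as "/obs" carry no trailing slash.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeIOSpec:
    """Hypothetical stand-in exposing only the attribute the check reads."""

    encoding_type: str


def is_group_like(elem_name: str, iospec: FakeIOSpec) -> bool:
    # Same condition as in the callback from the diff.
    return iospec.encoding_type == "anndata" or elem_name.endswith("/")


# Assumed naming: the root is "/", children have no trailing slash.
print(is_group_like("/", FakeIOSpec("")))               # True
print(is_group_like("/obs", FakeIOSpec("dataframe")))   # False
print(is_group_like("/raw", FakeIOSpec("anndata")))     # True
```

Under that naming assumption, the `endswith("/")` branch only fires for the root, which would match the "root store" reading above.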

Comment on lines +12 to +24
try:
    import xarray as xr
except ImportError:
    xr = None


try:
    from xarray.backends.zarr import ZarrArrayWrapper
except ImportError:

    class ZarrArrayWrapper:
        def __repr__(self) -> str:
            return "mock ZarrArrayWrapper"
Member

This code means that you get a really confusing error if you run read_backed without xarray installed.

Contributor Author

Yes, but unfortunately I think this compat layer is needed so that core stays compatible. Maybe we can add a check within read_backed?
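
One possible shape for such a check, as a minimal sketch: `require_optional` is a hypothetical helper name, not something in this PR, and the error wording is an assumption. It only uses `importlib.util.find_spec` from the standard library, so it can run before any xarray-dependent code is touched.

```python
import importlib.util


def require_optional(module_name: str, caller: str) -> None:
    """Raise a clear ImportError if an optional dependency is missing.

    Hypothetical helper: the name and message are illustrative only.
    """
    if importlib.util.find_spec(module_name) is None:
        raise ImportError(
            f"{caller} requires the optional dependency {module_name!r}, "
            f"which is not installed."
        )


# An installed module passes silently.
require_optional("json", caller="read_backed")

# A missing module fails early with an explicit message.
try:
    require_optional("surely_not_an_installed_module_xyz", caller="read_backed")
except ImportError as err:
    print(err)
```

Calling something like this at the top of read_backed would keep the compat shims for core while turning the confusing downstream failure into an explicit one.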

ebezzi commented Sep 18, 2024

Issue: if you run adata.obs["cell_type"].to_pandas() on a dataset from cellxgene, the output should be categorical, but it actually has dtype object:

obs_names
0                                   macrophage
1                                 stromal cell
2       CD8-positive, alpha-beta memory T cell
3       CD8-positive, alpha-beta memory T cell
4        CD141-positive myeloid dendritic cell
                         ...                  
9419                              stromal cell
9420                              stromal cell
9421                              stromal cell
9422    CD8-positive, alpha-beta memory T cell
9423                              stromal cell
Length: 9424, dtype: object
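
A quick way to tell the two cases apart in plain pandas, as a sketch: the labels below are made-up sample values and `is_categorical` is a hypothetical helper; nothing here touches anndata or the backed reader.

```python
import pandas as pd

# Made-up sample labels, standing in for a cell_type column.
labels = ["macrophage", "stromal cell", "stromal cell"]

as_object = pd.Series(labels, dtype=object)        # what was reported
as_categorical = pd.Series(pd.Categorical(labels))  # what was expected


def is_categorical(series: pd.Series) -> bool:
    # Hypothetical helper: checks the dtype rather than the values.
    return isinstance(series.dtype, pd.CategoricalDtype)


print(is_categorical(as_object))       # False
print(is_categorical(as_categorical))  # True

# Converting after the fact recovers a categorical dtype, but the
# categories are then inferred from the values, not read from the file.
restored = as_object.astype("category")
print(is_categorical(restored))        # True
```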

@@ -757,6 +757,7 @@ def np_bool_to_pd_bool_array(df: pd.DataFrame):
    return df


# TODO: concat for xarray
Member

This would be nice to have with this PR. What are your thoughts on this @ilan-gold?

Contributor Author

Yea......I agree......

Contributor Author

Ok, sorry, I remember now. I looked into this...it's a lot of work...but maybe I'll try my hand at it....sometimes you get fatigue just from how daunting something seems

I can try to help if you open a PR showing where you need input.

Contributor Author

Ah @dcherian I am actually almost 100% sure this is already implemented in xarray, but we need to get the functionality here! I think it's all good on your end but I will let you know if we need anything!

Contributor Author

ilan-gold commented Sep 19, 2024

@ivirshup @ebezzi I filed pydata/xarray#9519, but IMO it should not block this PR if it is not released before this is.

And now: pydata/xarray#9520
