TypeError: Object dtype dtype('O') has no native HDF5 equivalent #636

Open
AB1995UCSF opened this issue Nov 1, 2021 · 6 comments

AB1995UCSF commented Nov 1, 2021

Hi there,

I am running into this error when I try to write an h5ad file from my AnnData object. I downloaded a dataset from the Allen Brain Atlas [1] and loaded it using this code:

import anndata
import pandas as pd

# Expression matrix, indexed by its first column
M1_matrix = pd.read_csv('/path/matrix.csv', index_col=0)
# Gene annotations, indexed by gene name
M1_rows = pd.read_csv('/path/human_MTG_2018-06-14_genes-rows.csv')
M1_rows.index = M1_rows['gene']
# Cell metadata, indexed by sample name
M1_columns = pd.read_csv('/path/Human_M1_data/metadata.csv')
M1_columns.index = M1_columns['sample_name']
adata = anndata.AnnData(X=M1_matrix.to_numpy(), obs=M1_columns, var=M1_rows)

I then ran the following to try to convert object columns to strings:

adata.obs.columns = adata.obs.columns.astype(str)
adata.var.columns = adata.var.columns.astype(str)

adata.var=adata.var.convert_dtypes()
adata.obs=adata.obs.convert_dtypes()

When I then tried to write it with:

adata.write('/path/M1.h5ad')

I got the following error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
~/utils/miniconda3/envs/scanpy/lib/python3.9/site-packages/anndata/_io/utils.py in func_wrapper(elem, key, val, *args, **kwargs)
    208         try:
--> 209             return func(elem, key, val, *args, **kwargs)
    210         except Exception as e:

~/utils/miniconda3/envs/scanpy/lib/python3.9/site-packages/anndata/_io/h5ad.py in write_array(f, key, value, dataset_kwargs)
    184         value = _to_hdf5_vlen_strings(value)
--> 185     f.create_dataset(key, data=value, **dataset_kwargs)
    186 

~/utils/miniconda3/envs/scanpy/lib/python3.9/site-packages/h5py/_hl/group.py in create_dataset(self, name, shape, dtype, data, **kwds)
    148 
--> 149             dsid = dataset.make_new_dset(group, shape, dtype, data, name, **kwds)
    150             dset = dataset.Dataset(dsid)

~/utils/miniconda3/envs/scanpy/lib/python3.9/site-packages/h5py/_hl/dataset.py in make_new_dset(parent, shape, dtype, data, name, chunks, compression, shuffle, fletcher32, maxshape, compression_opts, fillvalue, scaleoffset, track_times, external, track_order, dcpl, allow_unknown_filter)
     88             dtype = numpy.dtype(dtype)
---> 89         tid = h5t.py_create(dtype, logical=1)
     90 

h5py/h5t.pyx in h5py.h5t.py_create()

h5py/h5t.pyx in h5py.h5t.py_create()

h5py/h5t.pyx in h5py.h5t.py_create()

TypeError: Object dtype dtype('O') has no native HDF5 equivalent

The above exception was the direct cause of the following exception:

TypeError                                 Traceback (most recent call last)
~/utils/miniconda3/envs/scanpy/lib/python3.9/site-packages/anndata/_io/utils.py in func_wrapper(elem, key, val, *args, **kwargs)
    208         try:
--> 209             return func(elem, key, val, *args, **kwargs)
    210         except Exception as e:

~/utils/miniconda3/envs/scanpy/lib/python3.9/site-packages/anndata/_io/h5ad.py in write_series(group, key, series, dataset_kwargs)
    288     else:
--> 289         write_array(group, key, series.values, dataset_kwargs=dataset_kwargs)
    290 

~/utils/miniconda3/envs/scanpy/lib/python3.9/site-packages/anndata/_io/utils.py in func_wrapper(elem, key, val, *args, **kwargs)
    211             parent = _get_parent(elem)
--> 212             raise type(e)(
    213                 f"{e}\n\n"

TypeError: Object dtype dtype('O') has no native HDF5 equivalent

Above error raised while writing key 'cluster_order' of <class 'h5py._hl.group.Group'> from /.

The above exception was the direct cause of the following exception:

TypeError                                 Traceback (most recent call last)
~/utils/miniconda3/envs/scanpy/lib/python3.9/site-packages/anndata/_io/utils.py in func_wrapper(elem, key, val, *args, **kwargs)
    208         try:
--> 209             return func(elem, key, val, *args, **kwargs)
    210         except Exception as e:

~/utils/miniconda3/envs/scanpy/lib/python3.9/site-packages/anndata/_io/h5ad.py in write_dataframe(f, key, df, dataset_kwargs)
    262     for col_name, (_, series) in zip(col_names, df.items()):
--> 263         write_series(group, col_name, series, dataset_kwargs=dataset_kwargs)
    264 

~/utils/miniconda3/envs/scanpy/lib/python3.9/site-packages/anndata/_io/utils.py in func_wrapper(elem, key, val, *args, **kwargs)
    211             parent = _get_parent(elem)
--> 212             raise type(e)(
    213                 f"{e}\n\n"

TypeError: Object dtype dtype('O') has no native HDF5 equivalent

Above error raised while writing key 'cluster_order' of <class 'h5py._hl.group.Group'> from /.

Above error raised while writing key 'cluster_order' of <class 'h5py._hl.group.Group'> from /.

The above exception was the direct cause of the following exception:

TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_29925/3214377814.py in <module>
----> 1 adata.write('/wynton/group/pollen/arnar/Scanpy/Scanpy/data/Human_M1_data/Human_M1_data.h5ad')

~/utils/miniconda3/envs/scanpy/lib/python3.9/site-packages/anndata/_core/anndata.py in write_h5ad(self, filename, compression, compression_opts, force_dense, as_dense)
   1903             filename = self.filename
   1904 
-> 1905         _write_h5ad(
   1906             Path(filename),
   1907             self,

~/utils/miniconda3/envs/scanpy/lib/python3.9/site-packages/anndata/_io/h5ad.py in write_h5ad(filepath, adata, force_dense, as_dense, dataset_kwargs, **kwargs)
    109         else:
    110             write_attribute(f, "raw", adata.raw, dataset_kwargs=dataset_kwargs)
--> 111         write_attribute(f, "obs", adata.obs, dataset_kwargs=dataset_kwargs)
    112         write_attribute(f, "var", adata.var, dataset_kwargs=dataset_kwargs)
    113         write_attribute(f, "obsm", adata.obsm, dataset_kwargs=dataset_kwargs)

~/utils/miniconda3/envs/scanpy/lib/python3.9/functools.py in wrapper(*args, **kw)
    875                             '1 positional argument')
    876 
--> 877         return dispatch(args[0].__class__)(*args, **kw)
    878 
    879     funcname = getattr(func, '__name__', 'singledispatch function')

~/utils/miniconda3/envs/scanpy/lib/python3.9/site-packages/anndata/_io/h5ad.py in write_attribute_h5ad(f, key, value, *args, **kwargs)
    128     if key in f:
    129         del f[key]
--> 130     _write_method(type(value))(f, key, value, *args, **kwargs)
    131 
    132 

~/utils/miniconda3/envs/scanpy/lib/python3.9/site-packages/anndata/_io/utils.py in func_wrapper(elem, key, val, *args, **kwargs)
    210         except Exception as e:
    211             parent = _get_parent(elem)
--> 212             raise type(e)(
    213                 f"{e}\n\n"
    214                 f"Above error raised while writing key {key!r} of {type(elem)}"

TypeError: Object dtype dtype('O') has no native HDF5 equivalent

Above error raised while writing key 'cluster_order' of <class 'h5py._hl.group.Group'> from /.

Above error raised while writing key 'cluster_order' of <class 'h5py._hl.group.Group'> from /.

Above error raised while writing key 'obs' of <class 'h5py._hl.files.File'> from /.

Thank you so much for your help!

Dataset
[1] https://portal.brain-map.org/atlases-and-data/rnaseq/human-m1-10x

AB1995UCSF commented Nov 1, 2021

Just FYI, here is the output from logging.print_versions():
anndata     0.7.6
scanpy      1.8.1
sinfo       0.3.4
-----
PIL                 8.4.0
beta_ufunc          NA
binom_ufunc         NA
bottleneck          1.3.2
cycler              0.10.0
cython_runtime      NA
dateutil            2.8.2
h5py                3.5.0
igraph              0.9.7
joblib              1.0.1
kiwisolver          1.3.1
leidenalg           0.8.8
llvmlite            0.37.0
matplotlib          3.4.3
mkl                 2.4.0
mpl_toolkits        NA
natsort             7.1.1
nbinom_ufunc        NA
numba               0.54.1
numexpr             2.7.3
numpy               1.20.1
packaging           21.0
pandas              1.3.3
pkg_resources       NA
pyexpat             NA
pyparsing           2.4.7
pytz                2021.3
scipy               1.7.1
six                 1.16.0
sklearn             1.0.1
tables              3.6.1
texttable           1.6.4
threadpoolctl       2.2.0
wcwidth             0.2.5
-----
Python 3.9.7 (default, Sep 16 2021, 13:09:58) [GCC 7.5.0]
Linux-3.10.0-1160.36.2.el7.x86_64-x86_64-with-glibc2.17
32 logical CPU cores, x86_64

LuckyMD commented Nov 2, 2021

Hey!
You can check which columns are causing this issue by running adata.obs.dtypes and adata.var.dtypes and looking for the columns whose dtype is 'object'. You can cast those to integers or strings using adata.obs[col_name] = adata.obs[col_name].astype(int) or .astype(str).

ivirshup (Member) commented Nov 9, 2021

I think this is related to #504, but it is a bit different: I don't think the column giving the error is pd.Int64Dtype, though it doesn't have any null values. We could either:

@AB1995UCSF, you should be fine to write this if you just don't call .convert_dtypes(). E.g. just:

adata = anndata.AnnData(
    ...,
    obs=pd.read_csv('/path/Human_M1_data/metadata.csv').set_index("sample_name"),
    ...,
)
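
To illustrate the difference this makes, here is a small sketch that assumes a two-row stand-in for metadata.csv (the CSV text and its values are hypothetical). It compares the dtypes from a plain read with those produced by .convert_dtypes(), and shows one possible way to cast a nullable column back to a NumPy dtype when it contains no missing values.

```python
import io

import pandas as pd

# Hypothetical stand-in for metadata.csv
csv_text = (
    "sample_name,cluster_order,cluster_label\n"
    "cell_a,1,Exc\n"
    "cell_b,2,Inh\n"
)

# Plain read: the integer column keeps a NumPy dtype
obs = pd.read_csv(io.StringIO(csv_text)).set_index("sample_name")
plain_dtype = str(obs["cluster_order"].dtype)

# After convert_dtypes(): the same column becomes a pandas nullable extension dtype
converted = obs.convert_dtypes()
converted_dtype = str(converted["cluster_order"].dtype)

# Casting back works here because the column has no missing values
restored = converted["cluster_order"].astype("int64")
restored_dtype = str(restored.dtype)

print(plain_dtype, converted_dtype, restored_dtype)
```

The cast back to int64 would raise if the column contained missing values, so skipping .convert_dtypes() in the first place, as suggested above, is the simpler route.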

github-actions (bot) commented

This issue has been automatically marked as stale because it has not had recent activity.
Please add a comment if you want to keep the issue open. Thank you for your contributions!

flying-sheep (Member) commented

@ivirshup any idea what solution we want to go with?

github-actions bot added the stale label on Aug 25, 2023