Saving raw h5ad files upon running snap.pp.import_data #344
Comments
It will be a full copy, not a reference. The fact that it is smaller is expected. This has to do with the design of the HDF5 format. Once you store something in an HDF5 file, you cannot really get rid of it. Instead, you simply mask it so that it is no longer accessible, but it still takes up space. When you save the HDF5 object to a new file, only the necessary parts are written.
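To make the size difference concrete, here is a toy model of the "mask, don't reclaim" behaviour described above. It is only an analogy in plain Python, not HDF5's actual on-disk layout, and the class and method names (`AppendOnlyStore`, `repacked`) are made up for illustration.

```python
class AppendOnlyStore:
    """Toy container: bytes are only ever appended, never reclaimed."""

    def __init__(self):
        self.blob = bytearray()  # stands in for the file contents
        self.index = {}          # name -> (offset, length) of live entries

    def put(self, name, data):
        # New data always goes at the end of the blob.
        self.index[name] = (len(self.blob), len(data))
        self.blob.extend(data)

    def delete(self, name):
        # "Masking": drop the index entry, leave the bytes where they are.
        del self.index[name]

    def get(self, name):
        offset, length = self.index[name]
        return bytes(self.blob[offset:offset + length])

    def size(self):
        return len(self.blob)

    def repacked(self):
        # Writing to a fresh store copies only the entries still indexed.
        fresh = AppendOnlyStore()
        for name in self.index:
            fresh.put(name, self.get(name))
        return fresh


store = AppendOnlyStore()
store.put("cells_pass_qc", b"x" * 100)
store.put("cells_fail_qc", b"y" * 900)
store.delete("cells_fail_qc")   # filtered out, but the space is still used
compact = store.repacked()      # analogous to saving to a new file
```

In this model the original store still occupies 1000 bytes after the delete, while the repacked one holds only the 100 live bytes, which mirrors why a freshly written file can be smaller than the one that was filtered in place.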
Just use ".write()" at any point you want to save the current state.
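If you would rather keep raw copies at the file level (copying the freshly created h5ad files into a raw directory), a minimal sketch using only the Python standard library could look like the following. It assumes the copy is made right after import, before any filtering, and while no backed object still has the file open for writing. The helper names `snapshot_raw` and `sha256_of` and the directory layout are hypothetical and not part of snapATAC2.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot_raw(h5ad_path, raw_dir):
    """Copy an .h5ad file into raw_dir and mark the copy read-only.

    Assumption: the source file is closed and has not been modified yet.
    """
    src = Path(h5ad_path)
    dest_dir = Path(raw_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / src.name
    if dest.exists():
        # Never silently replace an existing raw snapshot.
        raise FileExistsError(f"raw snapshot already exists: {dest}")
    shutil.copy2(src, dest)
    if sha256_of(src) != sha256_of(dest):
        raise OSError(f"copy of {src} does not match the original")
    dest.chmod(0o444)  # read-only, to guard against accidental edits
    return dest
```

Later QC experiments would then run on a working copy of the snapshot (for example, another `shutil.copy2` into the processing directory), so the file in the raw directory is never opened in a mode that can modify it.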
Thank you for the prompt response! Just to make sure I understand: if I import N fragment files to create N h5ad objects and I want to immediately save them to a raw state dir, I'd just have to do something like

And then, say I wanted to load the raw files later to play around with some QC thresholds but would still like to keep a raw version of them: should I copy those h5ad files before loading them, or load them and use

Sorry if this is a bit of a redundant question, but I just want to make sure we're handling files at different states of the processing workflow appropriately. Thanks!
Hey Kai,
Thanks so much for developing snapATAC2. I've been using it for a bit and really like it! I do have one question regarding working with files in backed mode. When we import fragment files and work in backed mode, I realize the on-disk files get updated as we do something to the adatas (e.g., filtering low-QC cells). However, if we wanted to go back to the raw h5ad files, those would no longer be "raw", right? Ideally, I'd love to save the raw h5ad files so I don't have to re-import data.
I wasn't sure what the best-practice workflow for that would be. I naively decided to copy the newly created h5ad files into a raw dir when I run snap.pp.import_data for new datasets, so I can re-use the raw h5ad files later if needed. However, when checking the files after the fact, it looks like the files I copied into the raw dir are the smaller ones (likely updated after QC), rather than the ones I keep in the processing dir and meant to update. So I have a couple of questions:
If I run snap.pp.import_data and make a copy of the h5ad files, would the adata object reference the original or the copied h5ad file on disk?
Thanks!