Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Read bam with split reads #259

Open
lauradmartens opened this issue Mar 14, 2024 · 5 comments
Open

Read bam with split reads #259

lauradmartens opened this issue Mar 14, 2024 · 5 comments
Labels
enhancement ✨ New feature or request

Comments

@lauradmartens
Copy link

Hi, first of all thanks a lot for this great tool!

I was wondering if it would be possible to have the make fragment files function from bam files also consider split reads in addition to clipped reads? Do you think this would be difficult to implement?

Thanks a lot for your help!

@kaizhang
Copy link
Owner

I'm not sure how to treat split reads. Suppose we have split reads as follows:

Read1: chr1, 200, 400
Read2: chr1, 500, 600
Split read: chr1, 2000, 2100

What are the fragments supposed to be generated here?

image

@lauradmartens
Copy link
Author

That's a good point! I guess I thought of it rather in the context of RNA coverage rather than ATAC because it would be nice to also use your storage format + functionality for that.

@manzt
Copy link

manzt commented Mar 17, 2024

Hi there. I worked with @lauradmartens on this the other day and we came up with a proof of concept main...manzt:SnapATAC2:split-reads

This seems to work well for our use case: generating fragments .bed for the RNA coverage, and then loading the fragments in AnnData with import_fragments. Props to the API design that this just works! Would you have any interest in supporting in extending the current API to support our use case? We would like to contribute if possible, since having this functionality in snapatac2 (python) would be great for some of our collaborators.

It seems specialized enough that I'm not sure whether adding a flag to make_fragment_file (e.g., expand_split_reads) or introducing a specialized function (e.g., make_rna_coverage_fragment_file) would be desirable. Thanks for your consideration!

@kaizhang
Copy link
Owner

Thanks! I'll take a look at your implementation this weekend, and decide how to proceed.

@kaizhang kaizhang added the enhancement ✨ New feature or request label Mar 19, 2024
@kaizhang
Copy link
Owner

The approach you are taking works when the reads are single-ended and your goal is to generate bigwig coverage file. To make it works for paired-end reads as well and to be compatible with other functions, I'm thinking to store these reads in a new slot in obsm, e.g., 'split_reads'. The downstream functions can then choose to use it or not.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement ✨ New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants