run_whole_alignment doesn't output chromosomal BAMs #806

Open
cmarkello opened this issue Apr 24, 2020 · 4 comments

@cmarkello
Collaborator

@glennhickey
The run_whole_alignment function in vg_map.py with bam_output=True or surject=True doesn't appear to process and return merged BAMs by contig; instead it just merges all BAM chunks from all contigs together.

https://github.com/vgteam/toil-vg/blob/master/src/toil_vg/vg_map.py#L401

@glennhickey
Collaborator

Yup, doesn't look like that logic's implemented for BAM. Surprised it doesn't crash looking at the code. It'd need the BAM equivalent of this function https://github.com/vgteam/toil-vg/blob/master/src/toil_vg/vg_map.py#L653-L655 to be run.

@cmarkello
Collaborator Author

Wouldn't the following also need to be refactored in order to take into account the chunked bam output of run_chunk_alignment?

https://github.com/vgteam/toil-vg/blob/master/src/toil_vg/vg_map.py#L456-L457
https://github.com/vgteam/toil-vg/blob/master/src/toil_vg/vg_map.py#L463-L464

@glennhickey
Collaborator

I imagine so. Shuffling the GAM chunks by chromosome was a necessity for the old vg call pipeline, not so much anymore. A whole-genome GAM can now be split into (unsorted) chromosome chunks with vg chunk -M in a few hours (I think). You can extract a chromosome out of a whole-genome BAM even faster with samtools view. Though, using samtools at least, you'd probably need to index the BAM to efficiently do many chromosomes at once.
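
For illustration, the samtools route could look roughly like the sketch below. It assumes the whole-genome BAM is already coordinate-sorted (needed for indexing) and that `samtools` is on the PATH. The function names and the `<prefix>_<contig>.bam` naming are hypothetical, not anything in toil-vg.

```python
import subprocess


def split_commands(bam_path, contigs, out_prefix):
    """Build samtools commands that index a BAM, then extract each contig.

    Assumes bam_path is coordinate-sorted. out_prefix and the
    '<prefix>_<contig>.bam' naming are illustrative only.
    """
    # Index once so every per-contig query can seek instead of scanning.
    commands = [["samtools", "index", bam_path]]
    for contig in contigs:
        out_path = "{}_{}.bam".format(out_prefix, contig)
        # -b writes BAM output; the trailing argument is the region to keep.
        commands.append(
            ["samtools", "view", "-b", "-o", out_path, bam_path, contig]
        )
    return commands


def run_commands(commands):
    """Run each command in order, raising if any of them fails."""
    for cmd in commands:
        subprocess.check_call(cmd)
```

Calling `run_commands(split_commands("wg.bam", ["chr1", "chr2"], "sample"))` would index `wg.bam` once and then write `sample_chr1.bam` and `sample_chr2.bam`.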

@cmarkello cmarkello self-assigned this Apr 25, 2020
@cmarkello
Collaborator Author

Yeah, I've been working on a solution that does chromosome splitting per chunk. But I agree, I think splitting by contig after merging the BAM would be more efficient and simpler than merging chunked contig BAMs together.
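
As a rough sketch of the merge-first approach: concatenate the per-chunk BAMs, then coordinate-sort the result so it can be indexed and split by contig afterwards. This assumes all chunk BAMs share the same header (same reference sequences) and that `samtools` is on the PATH. `merge_commands` and its parameters are hypothetical names, not the toil-vg implementation.

```python
def merge_commands(chunk_bams, merged_path, sorted_path, threads=1):
    """Build samtools commands that concatenate chunk BAMs and sort the result.

    Assumes every chunk BAM has a compatible header. All paths and the
    function name are illustrative only.
    """
    chunk_bams = list(chunk_bams)
    if not chunk_bams:
        raise ValueError("need at least one chunk BAM to merge")
    return [
        # 'samtools cat' joins BAMs without re-sorting their records.
        ["samtools", "cat", "-o", merged_path] + chunk_bams,
        # Coordinate-sort so the merged BAM can be indexed and queried by contig.
        ["samtools", "sort", "-@", str(threads), "-o", sorted_path, merged_path],
    ]
```

Each returned command is a plain argument list, so it can be passed to `subprocess.check_call` or to whatever command runner the workflow already uses.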
