
BAM output support #5

Open
peterjc opened this issue Jul 17, 2018 · 5 comments
Labels
enhancement New feature or request in progress Currently working on this feature/bug

Comments

@peterjc

peterjc commented Jul 17, 2018

The README says BAMnostic is a pure Python, OS-agnostic Binary Alignment Map (BAM) file parser and random access tool.

Are you intending to support BAM output, or is this only possible through a workaround like outputting SAM format and piping it through a command line tool like samtools?

@peterjc
Author

peterjc commented Jul 17, 2018

This isn't that complicated; see, e.g., my implementation at https://github.com/peterjc/biopython/blob/SamBam2015/Bio/Sequencing/SamBam/__init__.py#L1714

@betteridiot
Owner

To make it more formal: I am currently working on supporting BAM output in bamnostic. I will keep this issue open until that PR is complete.

@betteridiot betteridiot added the in progress Currently working on this feature/bug label Jul 20, 2018
betteridiot added a commit that referenced this issue Sep 20, 2018
This should be the first steps towards fully supporting BAM output mentioned in #5
@betteridiot
Owner

@peterjc 3051e0b should address this issue. It is heavily based on the code you pointed me to. However, I made some changes to how reads that border full blocks are handled:
If a read is not the first read of the block and will not fit in the block, it starts a new block. Otherwise, it follows the normal flow you had set up.
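The block-boundary rule above can be sketched as a small BGZF writer. Everything here is a hypothetical illustration rather than bamnostic's actual code: the names `bgzf_block` and `SketchBgzfWriter`, the 0xFF00-byte payload limit per block (the value htslib uses), and the reading of "the normal flow" as letting an oversized first record spill over into the following blocks. Each BGZF block is an ordinary gzip member whose `BC` extra field stores the block size, and an empty block doubles as the standard end-of-file marker.

```python
import io
import struct
import zlib

MAX_BLOCK = 0xFF00  # assumed uncompressed payload limit per block (htslib's value)


def bgzf_block(data: bytes) -> bytes:
    """Compress `data` into one BGZF block: a gzip member with a 'BC' extra field."""
    comp = zlib.compressobj(6, zlib.DEFLATED, -15)  # raw deflate, no zlib wrapper
    cdata = comp.compress(data) + comp.flush()
    bsize = 18 + len(cdata) + 8  # 18-byte header + deflate data + CRC32/ISIZE trailer
    # ID1, ID2, CM, FLG=FEXTRA, MTIME, XFL, OS, XLEN=6, 'B', 'C', SLEN=2, BSIZE-1
    header = struct.pack("<4BI2BH2BHH", 31, 139, 8, 4, 0, 0, 255, 6, 66, 67, 2, bsize - 1)
    trailer = struct.pack("<II", zlib.crc32(data) & 0xFFFFFFFF, len(data))
    return header + cdata + trailer


class SketchBgzfWriter:
    """Hypothetical writer illustrating the block-boundary rule described above."""

    def __init__(self, handle, max_block: int = MAX_BLOCK):
        self.handle = handle
        self.max_block = max_block
        self.buffer = bytearray()

    def flush(self) -> None:
        """Compress whatever is buffered into a block of its own."""
        if self.buffer:
            self.handle.write(bgzf_block(bytes(self.buffer)))
            self.buffer.clear()

    def write_record(self, record: bytes) -> None:
        # Not the first record in this block and it will not fit: start a new block.
        if self.buffer and len(self.buffer) + len(record) > self.max_block:
            self.flush()
        self.buffer += record
        # Then the usual flow applies: whenever a full block is buffered it is
        # written out, so a record larger than one block is split across blocks.
        while len(self.buffer) >= self.max_block:
            self.handle.write(bgzf_block(bytes(self.buffer[: self.max_block])))
            del self.buffer[: self.max_block]

    def close(self) -> None:
        self.flush()
        self.handle.write(bgzf_block(b""))  # empty block == standard BGZF EOF marker


# Usage: write one record to an in-memory buffer.
buf = io.BytesIO()
writer = SketchBgzfWriter(buf)
writer.write_record(b"\x00" * 40)
writer.close()
```

Starting a fresh block for a record that would straddle a boundary keeps most records inside a single block, which makes their virtual file offsets simple; only a record bigger than a whole block has to be split.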

I also added support for directly copying a BAM file's header if it has been opened by bamnostic.
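For reference, the uncompressed BAM header that has to be written (or copied) before any alignment records has a fixed binary layout in the SAM/BAM specification: the magic `BAM\1`, the length-prefixed SAM header text, then the reference dictionary. A minimal sketch of serialising it, assuming the header text and a list of `(name, length)` pairs are already at hand; `encode_bam_header` is a hypothetical helper, not part of bamnostic's API.

```python
import struct
from typing import Sequence, Tuple


def encode_bam_header(text: str, references: Sequence[Tuple[str, int]]) -> bytes:
    """Serialise a BAM header: magic, SAM header text, then the reference list."""
    text_bytes = text.encode("ascii")
    out = bytearray(b"BAM\x01")  # magic string
    out += struct.pack("<i", len(text_bytes)) + text_bytes  # l_text, text
    out += struct.pack("<i", len(references))  # n_ref
    for name, length in references:
        name_bytes = name.encode("ascii") + b"\x00"  # l_name counts the NUL terminator
        out += struct.pack("<i", len(name_bytes)) + name_bytes
        out += struct.pack("<i", length)  # l_ref
    return bytes(out)
```

Copying an already-parsed header amounts to producing these same bytes, which are then compressed like any other data.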

Additionally, instead of having to reconstruct the byte sequence of the read, each AlignedSegment has a _raw_stream attribute that is a direct copy of the whole record. This can be written quickly to a file handle without compression.
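Keeping raw bytes works because every BAM alignment record starts with a little-endian `int32` `block_size` giving the length of the rest of the record, so whole records can be sliced out of the decompressed stream without decoding any field. A sketch, assuming the stream is already decompressed and `offset` points just past the header; `iter_raw_records` is hypothetical, and whether bamnostic's `_raw_stream` includes the `block_size` prefix is an assumption here.

```python
import struct
from typing import Iterator


def iter_raw_records(body: bytes, offset: int = 0) -> Iterator[bytes]:
    """Yield each alignment record as raw bytes, block_size prefix included."""
    pos = offset
    while pos < len(body):
        if pos + 4 > len(body):
            raise ValueError("truncated block_size field")
        (block_size,) = struct.unpack_from("<i", body, pos)
        end = pos + 4 + block_size
        if end > len(body):
            raise ValueError("truncated BAM record")
        yield body[pos:end]  # exact bytes, ready to be written back unchanged
        pos = end
```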

@peterjc
Author

peterjc commented Sep 21, 2018

I don't yet see how you will use ._raw_stream, but it does make sense to try it as an optimisation when writing alignment data back to disk without modifications.

@betteridiot
Owner

Ideally, a new AlignedSegment method called to_bam(<bgzf.BamWriter>) will be written, which would use _raw_stream to write the read to the file. There is still some API stuff to hash out, but I just wanted to point out that it is almost there.
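The proposed `to_bam` method could be as thin as handing the stored bytes to the writer. A sketch under stated assumptions: `SketchSegment` is a hypothetical stand-in for `AlignedSegment`, the writer is duck-typed as anything with a `write(bytes)` method (the real `bgzf.BamWriter` interface was still being worked out), and `_raw_stream` is assumed to hold the full record including its `block_size` prefix.

```python
import io
import struct


class SketchSegment:
    """Hypothetical stand-in for AlignedSegment that keeps its record bytes."""

    def __init__(self, raw_stream: bytes):
        self._raw_stream = raw_stream

    def to_bam(self, writer) -> int:
        """Write the unmodified record to `writer`; return the number of bytes written."""
        # Sanity check, assuming _raw_stream starts with the int32 block_size field.
        (block_size,) = struct.unpack_from("<i", self._raw_stream, 0)
        if block_size + 4 != len(self._raw_stream):
            raise ValueError("raw record length does not match its block_size")
        writer.write(self._raw_stream)
        return len(self._raw_stream)


# Usage: copy records straight through without re-encoding any fields.
out = io.BytesIO()
for seg in [SketchSegment(struct.pack("<i", 3) + b"xyz")]:
    seg.to_bam(out)
```

A modified read would no longer match its `_raw_stream`, so this shortcut only applies to pass-through copies.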

Additionally, I have added CSI support to bamnostic (in case you were wondering).


2 participants