I'm contemplating switching over csaw to use GenomicAlignments for BAM file access, rather than continuing to wrestle with Rhtslib and its poor documentation/unreliability on Windows.
I've been looking at the readGAlignments and readGAlignmentPairs functions, which should be swap-in replacements for my main internal BAM reading functions (see LTLA/csaw#4). Currently, csaw runs on sorted BAM files on a chromosome-by-chromosome basis, so I would be calling these functions with which= set to an interval spanning the current chromosome in each iteration.
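The chromosome-by-chromosome access pattern described above can be sketched roughly as follows. This is a minimal illustration rather than csaw's actual code: the BAM file is the small indexed example shipped with Rsamtools, and the `counts` vector is a hypothetical stand-in for whatever per-chromosome processing would really be done.

```r
library(Rsamtools)
library(GenomicAlignments)

# Example indexed BAM file shipped with Rsamtools (assumption for illustration).
bam <- system.file("extdata", "ex1.bam", package = "Rsamtools")

# Reference sequence names and lengths, taken from the BAM header.
chr_lengths <- scanBamHeader(bam)[[1]]$targets

counts <- integer(0)
for (chr in names(chr_lengths)) {
    # Interval spanning the whole of the current chromosome.
    region <- GRanges(chr, IRanges(1, chr_lengths[[chr]]))
    param <- ScanBamParam(which = region)

    # Only alignments overlapping 'region' are read in this iteration.
    reads <- readGAlignments(bam, param = param)
    counts[chr] <- length(reads)
}
```

For paired-end data, `readGAlignmentPairs` accepts the same `param=` argument, which is where the concern about mate retrieval comes in.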
However, I am concerned about the performance implications of readGAlignmentPairs being obliged to retrieve mates from other chromosomes. Even with indexing, this would seem to require many additional file accesses that are unnecessary for my use case.
Would it be worthwhile to add an option to ignore paired reads whose mates are not on the same reference sequence? Alternatively, this could feasibly be part of an expanded suite of ScanBamParam options based on the isize (insert size) field, e.g., keeping only reads with non-zero isize values.
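To make the two proposed criteria concrete, here is a rough sketch of how they could be expressed today as a filter applied after reading, using fields that `scanBam` already exposes (`rname`, `mrnm`, `isize`). This is only an illustration of the criteria: it does not avoid any file accesses inside `readGAlignmentPairs`, and the example file is again the one shipped with Rsamtools.

```r
library(Rsamtools)

# Example BAM file shipped with Rsamtools (assumption for illustration).
bam <- system.file("extdata", "ex1.bam", package = "Rsamtools")

# Request only the fields needed to evaluate the filters, for paired reads.
param <- ScanBamParam(
    what = c("rname", "mrnm", "isize"),
    flag = scanBamFlag(isPaired = TRUE)
)
fields <- scanBam(bam, param = param)[[1]]

# Criterion 1: mate is on the same reference sequence as the read.
rname <- as.character(fields$rname)
mrnm <- as.character(fields$mrnm)
same_chr <- !is.na(rname) & !is.na(mrnm) & rname == mrnm

# Criterion 2: non-zero insert size.
nonzero_isize <- !is.na(fields$isize) & fields$isize != 0

keep <- same_chr & nonzero_isize
```

A built-in option would presumably apply the same test while scanning, so that mates on other chromosomes are never looked up in the first place.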