Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support reading CRAM files with embedded references #307

Merged
merged 1 commit into from
May 22, 2024

Conversation

athos
Copy link
Member

@athos athos commented Apr 23, 2024

Note: This PR will be marked as ready for review after #306 has been merged.

This PR allows the CRAM reader to handle files with embedded references. With this update, we no longer need to specify any reference file to read CRAM files with embedded references.

Specification

According to the CRAM specification, if the encoder embeds a portion of the reference sequence into a slice, it allocates a dedicated block to store the raw (ASCII character) base sequence data for the reference sequence covering that slice. The content ID of this allocated block is pointed to by the "embedded reference bases block content ID" field in the slice header. If there is no embedded reference for the slice, this field's value will be -1. (See §8.5 for more details)

Implementation

In this change, the CRAM reader first checks the "embedded reference bases block content ID" field in the slice header to identify the block allocated for the embedded reference. If this value is positive, it creates a dedicated seq resolver based on the content of the block associated with the specified content ID. This new seq resolver is responsible for resolving reference sequences covered by the embedded reference and helps with the decoding of CRAM records in the slice.

Notes

This PR also makes some slight changes to the signature of the seq resolver protocol method to make the new seq resolver for embedded reference a little bit more efficient.

Note that this PR does not include additional test code, although you can easily verify the change using a simple snippet with a test file 0600_mapped.cram from hts-specs (which is documented as being encoded with embedded references) as follows:

(require '[cljam.io.cram :as cram])

(with-open [cram-rdr (cram/reader "0600_mapped.cram")]
  (= ["ATTTTTCGGGTTTTTTGAAAATGTAGCTACAGAAAGTTTGTTTAAATATCTTGTTTTCTTGCACTTTGTGCAGAATT"
      "CCCTTTCAGAAAAATTATTTTTAAGAATTTTTCATRYTAGGAATATTGTTATTTCAGAAAATAGCTAAATGTGATTTCTGTAATTTTGCCTGCTAAAGGG"]
     (map :seq (cram/read-alignments cram-rdr))))

@athos athos self-assigned this Apr 23, 2024
Copy link

codecov bot commented Apr 23, 2024

Codecov Report

Attention: Patch coverage is 47.36842% with 10 lines in your changes are missing coverage. Please review.

Project coverage is 88.40%. Comparing base (11d39b0) to head (fae8525).

Files Patch % Lines
src/cljam/io/cram/reader.clj 33.33% 10 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #307      +/-   ##
==========================================
- Coverage   88.50%   88.40%   -0.11%     
==========================================
  Files          95       95              
  Lines        8214     8228      +14     
  Branches      506      506              
==========================================
+ Hits         7270     7274       +4     
- Misses        438      448      +10     
  Partials      506      506              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@athos athos force-pushed the feature/cram-index-access branch from 4795708 to 4fdc68b Compare April 24, 2024 06:32
Base automatically changed from feature/cram-index-access to master April 24, 2024 06:39
@athos athos force-pushed the feature/embedded-reference-support branch from db98dac to 21bd22e Compare April 24, 2024 06:44
@athos athos requested review from alumi, a team and matsutomo81 and removed request for a team April 24, 2024 07:32
@athos athos marked this pull request as ready for review April 24, 2024 07:33
@athos athos requested a review from a team as a code owner April 24, 2024 07:33
@athos athos removed the request for review from a team April 24, 2024 08:28
Copy link
Member

@alumi alumi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for working on this feature! I added some comments. Please take a look 🙏

src/cljam/io/cram/reader.clj Outdated Show resolved Hide resolved
src/cljam/io/cram/reader.clj Outdated Show resolved Hide resolved
Copy link
Member

@alumi alumi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for updating! LGTM 👍

@athos athos force-pushed the feature/embedded-reference-support branch from 01eaab0 to f1837a8 Compare May 14, 2024 06:45
@athos
Copy link
Member Author

athos commented May 14, 2024

Rebased onto the latest master branch.

@athos athos force-pushed the feature/embedded-reference-support branch from f1837a8 to 7d8671b Compare May 17, 2024 00:14
@athos
Copy link
Member Author

athos commented May 17, 2024

Rebased onto the latest master branch.

Copy link
Contributor

@matsutomo81 matsutomo81 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this update. LGTM 👍

I haven't merged it because I thought you might want to handle the commit integration.

@athos athos force-pushed the feature/embedded-reference-support branch from 7d8671b to fae8525 Compare May 21, 2024 23:59
@athos athos merged commit 82e16ed into master May 22, 2024
17 checks passed
@athos athos deleted the feature/embedded-reference-support branch May 22, 2024 00:06
@athos
Copy link
Member Author

athos commented May 22, 2024

Thank you both for reviewing! I've squashed the commits and merged the change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants