Write CRAM on Spark. #1488

Merged
cmnbroad merged 1 commit into master from cn_spark_write_cram
Mar 3, 2016

Conversation

cmnbroad
Collaborator

#1270. Depends on #1469, which should be reviewed first and then I can remove the first commit from this branch.

@droazen droazen self-assigned this Feb 16, 2016
@cmnbroad cmnbroad force-pushed the cn_spark_write_cram branch 3 times, most recently from b926e6b to 42af603 Compare February 24, 2016 16:29
@cmnbroad
Collaborator Author

@droazen This is ready whenever you are. In addition to writing CRAM on Spark, there are a couple of changes for #1087, though we can't completely fix that yet due to the NM/MD tag issue.

@@ -83,24 +88,58 @@ public SparkHeaderlessBAMOutputFormat() {
}
}

// Output format class for writing CRAM files through saveAsNewAPIHadoopFile. Must be public.
public static class SparkCRAMOutputFormat extends KeyIgnoringCRAMOutputFormat<NullWritable> {
Collaborator

This class is identical to SparkBAMOutputFormat -- is it only a separate class because setHeader() is static and we might want a different header for the two output formats, or is there some deeper reason?

Collaborator Author

It's only because the two classes have different base classes that live in Hadoop-BAM. I don't see any useful way to consolidate them here, though going forward we may want to change AnySAMOutputFormat in Hadoop-BAM to delegate based on file extension, similar to the way AnySAMInputFormat does.
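
For illustration only, a minimal sketch of what delegating on file extension could look like. Every name here (OutputFormatByExtension, Format, fromPath) is hypothetical and is not part of Hadoop-BAM or this PR; the real AnySAMOutputFormat may need to be changed quite differently.

```java
import java.util.Locale;

// Hypothetical sketch: none of these names exist in Hadoop-BAM or GATK.
public class OutputFormatByExtension {

    // The output formats a delegating writer might choose between.
    enum Format { SAM, BAM, CRAM }

    // Chooses a format from the path's extension, ignoring case.
    // Throws IllegalArgumentException if the extension is not recognized.
    static Format fromPath(String path) {
        String lower = path.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".cram")) {
            return Format.CRAM;
        }
        if (lower.endsWith(".bam")) {
            return Format.BAM;
        }
        if (lower.endsWith(".sam")) {
            return Format.SAM;
        }
        throw new IllegalArgumentException("Unrecognized extension: " + path);
    }

    public static void main(String[] args) {
        // A delegating output format would pick its record writer from this value.
        System.out.println(fromPath("output/reads.cram"));
    }
}
```

With something along these lines, a single output format class could select the CRAM or BAM writer at runtime, and the two Spark-side subclasses might no longer need to be separate.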

@droazen
Collaborator

droazen commented Feb 25, 2016

Review complete -- back to @cmnbroad. Merge after addressing comments.

cmnbroad added a commit that referenced this pull request Mar 3, 2016
@cmnbroad cmnbroad merged commit 3176a26 into master Mar 3, 2016
@cmnbroad cmnbroad deleted the cn_spark_write_cram branch March 3, 2016 21:20