Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added SAM headers to ModelSegments CNV pipeline output. #3914

Merged
merged 10 commits into from
Dec 15, 2017

Conversation

samuelklee
Copy link
Contributor

@samuelklee samuelklee commented Dec 5, 2017

Comprises the commits after 7992f64. The only commit with real substance is Updated metadata and abstract collection classes.. The rest of the commits simply update calling code, related tests, and test files. These updates were slightly less trivial for the plotting classes, so these are also split off into separate commits.

Again, probably could be engineered better (there are two parallel class hierarchies for metadata and collection classes, which is kind of gross), but we can refactor later if needed.

@asmirnov239 please review. Again, lower priority than gCNV VCF, but the sooner this is in master the easier it will be to get things into FireCloud. Let's try for early next week. I'll start doc updates concurrently.

@codecov-io
Copy link

codecov-io commented Dec 5, 2017

Codecov Report

Merging #3914 into master will decrease coverage by 0.075%.
The diff coverage is 75.542%.

@@               Coverage Diff               @@
##              master     #3914       +/-   ##
===============================================
- Coverage     79.074%   78.999%   -0.075%     
- Complexity     17690     17692        +2     
===============================================
  Files           1163      1166        +3     
  Lines          63662     63620       -42     
  Branches       10025     10038       +13     
===============================================
- Hits           50340     50259       -81     
- Misses          9443      9480       +37     
- Partials        3879      3881        +2
Impacted Files Coverage Δ Complexity Δ
...broadinstitute/hellbender/utils/IntervalUtils.java 91.107% <ø> (ø) 168 <0> (ø) ⬇️
...r/tools/exome/plotting/PlotSegmentedCopyRatio.java 81.944% <ø> (ø) 22 <0> (ø) ⬇️
...ellbender/cmdline/StandardArgumentDefinitions.java 0% <ø> (ø) 0 <0> (ø) ⬇️
...pynumber/denoising/SVDReadCountPanelOfNormals.java 100% <ø> (ø) 1 <0> (ø) ⬇️
...llbender/tools/exome/plotting/PlotACNVResults.java 84.615% <ø> (ø) 22 <0> (ø) ⬇️
.../segmentation/MultidimensionalKernelSegmenter.java 0% <0%> (ø) 0 <0> (ø) ⬇️
...ute/hellbender/tools/copynumber/ModelSegments.java 0% <0%> (ø) 0 <0> (ø) ⬇️
...er/tools/copynumber/denoising/GCBiasCorrector.java 90.385% <0%> (-3.615%) 18 <0> (ø)
...umber/formats/records/MultidimensionalSegment.java 25% <0%> (ø) 4 <0> (ø) ⬇️
.../formats/collections/ModeledSegmentCollection.java 69.767% <0%> (ø) 3 <0> (ø) ⬇️
... and 61 more

@@ -89,8 +112,11 @@ public final int size() {
* Writes the records to file.
*/
public void write(final File outputFile) {
try (final RecordCollectionWriter writer = new RecordCollectionWriter(outputFile)) {
writer.writeAllRecords(records);
try (final FileWriter writer = new FileWriter(outputFile, true)) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be modified to first overwrite existing files with the header before appending the records. Currently, if the file exists, this will just append everything.

@samuelklee samuelklee mentioned this pull request Dec 14, 2017
24 tasks
@samuelklee
Copy link
Contributor Author

samuelklee commented Dec 14, 2017

Just a note to self: now that sl_wgs_acnv is in, I did some rebasing.

My local sl_wgs_acnv_headers is rebased on master, but I haven't pushed it; will push after @asmirnov239 completes his review.
Both local and remote sl_delete_cnv_legacy are rebased on local sl_wgs_acnv_headers.
Both local and remote sl_wgs_acnv_headers_docs are rebased on top of that.
At some point sl_gcnv_ploidy_cli should be rebased as well, but currently it's rebased on pre-review sl_wgs_acnv -> sl_wgs_acnv_headers.
The forthcoming branch for adding old padding behavior to PreprocessIntervals will need to be rebased on top of sl_gcnv_ploidy_cli.

@LeeTL1220 @davidbenjamin Hopefully all the branches under review will get merged in soon, but ideally any C30/FC updates to the WDL will happen after sl_wgs_acnv_headers_docs (which I'm still working on) gets merged.

Copy link
Collaborator

@asmirnov239 asmirnov239 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done with the review @samuelklee. Back to you!

@@ -135,8 +129,19 @@ protected Object doWork() {
//check that the number of copy-ratio/allele-fraction points in each segment are correct
validateNumPoints();

//load contig names and lengths from the sequence dictionary into a LinkedHashMap
final Map<String, Integer> contigLengthMap = PlottingUtils.getContigLengthMap(sequenceDictionaryFile, minContigLength, logger);
//validate sequence dictionaries and load contig names and lengths into a LinkedHashMap
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could extract this validation to a method in PlottingUtils

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

eh, it would not be used anywhere else.

* Represents an immutable collection of records, a set of
* mandatory column headers given by a {@link TableColumnCollection}, and lambdas for
* reading and writing records.
* Represents {@link METADATA} (which can be represented as a {@link SAMFileHeader},
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right parenthesis missing

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

public abstract class RecordCollection<RECORD> {
public abstract class AbstractRecordCollection<METADATA extends Metadata, RECORD> {
private final METADATA metadata;
private final SAMFileHeader header;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we just access it through metadata field?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

}

@Override
public int hashCode() {
return records.hashCode();
int result = header.hashCode();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you implement the hashCode() for metadata you could use it here instead

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


@Override
protected boolean isCommentLine(final String[] line) {
return line.length > 0 && line[0].startsWith(COMMENT_PREFIX);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we completely getting rid of comments starting with "#"? We could optionally have both symbols serve as an comment deliminator, to distinguish between SAMheader and other miscellaneous comments such as command invoked

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, for now. I think the right way to do this would be with @PG lines.

*
* @author Samuel Lee &lt;[email protected]&gt;
*/
public interface SampleLocatableMetadata extends LocatableMetadata {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there is a need for this interface. The concrete implementations could just implement both LocatableMetadata and/or SampleMetadata. We could make an issue for that and refactor the class hierarchy later

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

filed #3976

*
* @author Samuel Lee &lt;[email protected]&gt;
*/
public abstract class AbstractSampleRecordCollection<RECORD> extends AbstractRecordCollection<SampleMetadata, RECORD> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could test this class similarly to AbstractSampleLocatableCollection

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

punting, expanded #3916.

private static final String SAMPLE_NAME_EXPECTED = "test";

private static final SampleLocatableMetadata METADATA_EXPECTED = new SimpleSampleLocatableMetadata(
"test-sample",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So will the exception be thrown now, if the interval is not compatible with the sequence dictionary(e.g. non of the contigs in the dictionary has interval's chromosom)? If yes, can we add a test for that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, good catch! Added validation to CopyNumberArgumentValidationUtils.validateIntervals.

final AlleleFractionPrior prior) {
Utils.nonNull(allelicCounts);
Utils.nonEmpty(segments);
Utils.nonNull(segments);
Utils.validateArg(allelicCounts.getMetadata().getSequenceDictionary().equals(segments.getMetadata().getSequenceDictionary()),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we be more explicit here, and say that the sequence dictionaries of allelic count collection and interval collection do not match?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Utils.nonEmpty(segments);
Utils.nonNull(segments);
Utils.validateArg(copyRatios.getMetadata().getSequenceDictionary().equals(segments.getMetadata().getSequenceDictionary()),
"Metadata do not match.");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here as above

/**
* This caller is loosely based on the legacy ReCapSeg caller that was originally implemented in ReCapSeg v1.4.5.0,
* but introduces major changes. The method is as follows:
* 1) use the non-log2 mean copy ratio to determine copy-neutral segments,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is going to be published as javadoc it needs to use html list elements, e.g. <ol><li>...<li>...</ol>. Otherwise this will all be formatted as a single paragraph, with lines wrapped together.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @pshapiro4broad, doc updates are going to be handled in a separate PR. Also, I think we are typically less careful about formatting javadoc that is not for CLI classes, since these will be geared more towards developers.

}

private Statistics calculateCallingStatistics() {
//get the segments that fall within the copy-neutral region
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// comments need a space before the first comment text char. Also IMO it's good practice to write them as complete sentences, and capitalize/punctuate them as such.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. We are aiming to get this PR in ASAP so I'm not going to address style issues here.

@samuelklee
Copy link
Contributor Author

Thanks @asmirnov239. Addressed comments and rebased. @LeeTL1220 needs to make some changes on top of this branch, so please take a quick look. Will merge once tests pass unless you catch anything major.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants