-
Notifications
You must be signed in to change notification settings - Fork 592
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added SAM headers to ModelSegments CNV pipeline output. #3914
Conversation
b26e831
to
632bac7
Compare
Codecov Report
@@ Coverage Diff @@
## master #3914 +/- ##
===============================================
- Coverage 79.074% 78.999% -0.075%
- Complexity 17690 17692 +2
===============================================
Files 1163 1166 +3
Lines 63662 63620 -42
Branches 10025 10038 +13
===============================================
- Hits 50340 50259 -81
- Misses 9443 9480 +37
- Partials 3879 3881 +2
|
@@ -89,8 +112,11 @@ public final int size() { | |||
* Writes the records to file. | |||
*/ | |||
public void write(final File outputFile) { | |||
try (final RecordCollectionWriter writer = new RecordCollectionWriter(outputFile)) { | |||
writer.writeAllRecords(records); | |||
try (final FileWriter writer = new FileWriter(outputFile, true)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to be modified to first overwrite existing files with the header before appending the records. Currently, if the file exists, this will just append everything.
Just a note to self: now that sl_wgs_acnv is in, I did some rebasing. My local sl_wgs_acnv_headers is rebased on master, but I haven't pushed it; will push after @asmirnov239 completes his review. @LeeTL1220 @davidbenjamin Hopefully all the branches under review will get merged in soon, but ideally any C30/FC updates to the WDL will happen after sl_wgs_acnv_headers_docs (which I'm still working on) gets merged. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done with the review @samuelklee. Back to you!
@@ -135,8 +129,19 @@ protected Object doWork() { | |||
//check that the number of copy-ratio/allele-fraction points in each segment are correct | |||
validateNumPoints(); | |||
|
|||
//load contig names and lengths from the sequence dictionary into a LinkedHashMap | |||
final Map<String, Integer> contigLengthMap = PlottingUtils.getContigLengthMap(sequenceDictionaryFile, minContigLength, logger); | |||
//validate sequence dictionaries and load contig names and lengths into a LinkedHashMap |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could extract this validation to a method in PlottingUtils
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
eh, it would not be used anywhere else.
* Represents an immutable collection of records, a set of | ||
* mandatory column headers given by a {@link TableColumnCollection}, and lambdas for | ||
* reading and writing records. | ||
* Represents {@link METADATA} (which can be represented as a {@link SAMFileHeader}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right parenthesis missing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
public abstract class RecordCollection<RECORD> { | ||
public abstract class AbstractRecordCollection<METADATA extends Metadata, RECORD> { | ||
private final METADATA metadata; | ||
private final SAMFileHeader header; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we just access it through metadata
field?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
} | ||
|
||
@Override | ||
public int hashCode() { | ||
return records.hashCode(); | ||
int result = header.hashCode(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you implement the hashCode()
for metadata
you could use it here instead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
|
||
@Override | ||
protected boolean isCommentLine(final String[] line) { | ||
return line.length > 0 && line[0].startsWith(COMMENT_PREFIX); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we completely getting rid of comments starting with "#"? We could optionally have both symbols serve as an comment deliminator, to distinguish between SAMheader and other miscellaneous comments such as command invoked
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, for now. I think the right way to do this would be with @PG
lines.
* | ||
* @author Samuel Lee <[email protected]> | ||
*/ | ||
public interface SampleLocatableMetadata extends LocatableMetadata { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think there is a need for this interface. The concrete implementations could just implement both LocatableMetadata
and/or SampleMetadata
. We could make an issue for that and refactor the class hierarchy later
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
filed #3976
* | ||
* @author Samuel Lee <[email protected]> | ||
*/ | ||
public abstract class AbstractSampleRecordCollection<RECORD> extends AbstractRecordCollection<SampleMetadata, RECORD> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could test this class similarly to AbstractSampleLocatableCollection
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
punting, expanded #3916.
private static final String SAMPLE_NAME_EXPECTED = "test"; | ||
|
||
private static final SampleLocatableMetadata METADATA_EXPECTED = new SimpleSampleLocatableMetadata( | ||
"test-sample", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So will the exception be thrown now, if the interval is not compatible with the sequence dictionary(e.g. non of the contigs in the dictionary has interval's chromosom)? If yes, can we add a test for that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, good catch! Added validation to CopyNumberArgumentValidationUtils.validateIntervals.
final AlleleFractionPrior prior) { | ||
Utils.nonNull(allelicCounts); | ||
Utils.nonEmpty(segments); | ||
Utils.nonNull(segments); | ||
Utils.validateArg(allelicCounts.getMetadata().getSequenceDictionary().equals(segments.getMetadata().getSequenceDictionary()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we be more explicit here, and say that the sequence dictionaries of allelic count collection and interval collection do not match?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
Utils.nonEmpty(segments); | ||
Utils.nonNull(segments); | ||
Utils.validateArg(copyRatios.getMetadata().getSequenceDictionary().equals(segments.getMetadata().getSequenceDictionary()), | ||
"Metadata do not match."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here as above
/** | ||
* This caller is loosely based on the legacy ReCapSeg caller that was originally implemented in ReCapSeg v1.4.5.0, | ||
* but introduces major changes. The method is as follows: | ||
* 1) use the non-log2 mean copy ratio to determine copy-neutral segments, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is going to be published as javadoc it needs to use html list elements, e.g. <ol><li>...<li>...</ol>
. Otherwise this will all be formatted as a single paragraph, with lines wrapped together.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @pshapiro4broad, doc updates are going to be handled in a separate PR. Also, I think we are typically less careful about formatting javadoc that is not for CLI classes, since these will be geared more towards developers.
} | ||
|
||
private Statistics calculateCallingStatistics() { | ||
//get the segments that fall within the copy-neutral region |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
//
comments need a space before the first comment text char. Also IMO it's good practice to write them as complete sentences, and capitalize/punctuate them as such.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. We are aiming to get this PR in ASAP so I'm not going to address style issues here.
d841cd7
to
0bd294e
Compare
Thanks @asmirnov239. Addressed comments and rebased. @LeeTL1220 needs to make some changes on top of this branch, so please take a quick look. Will merge once tests pass unless you catch anything major. |
Comprises the commits after 7992f64. The only commit with real substance is
Updated metadata and abstract collection classes.
. The rest of the commits simply update calling code, related tests, and test files. These updates were slightly less trivial for the plotting classes, so these are also split off into separate commits.Again, probably could be engineered better (there are two parallel class hierarchies for metadata and collection classes, which is kind of gross), but we can refactor later if needed.
@asmirnov239 please review. Again, lower priority than gCNV VCF, but the sooner this is in master the easier it will be to get things into FireCloud. Let's try for early next week. I'll start doc updates concurrently.