Update tests and javadoc.
cmnbroad committed Aug 20, 2024
1 parent ad48eed commit 0184bb5
Showing 2 changed files with 137 additions and 56 deletions.
112 changes: 90 additions & 22 deletions src/main/java/org/broadinstitute/hellbender/tools/CreateBundle.java
@@ -3,7 +3,6 @@
import htsjdk.beta.io.bundle.*;
import htsjdk.beta.plugin.registry.HaploidReferenceResolver;
import htsjdk.beta.plugin.variants.VariantsBundle;
import htsjdk.io.HtsPath;
import htsjdk.samtools.util.FileExtensions;
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;
@@ -23,44 +22,113 @@
/**
* Create a bundle (JSON) file for use with a GATK tool.
*
* Since most bundles will contain a primary resource plus at least one secondary resource (typically an index),
* Since most bundles need to contain a primary resource plus at least one secondary resource (typically an index),
* the tool will attempt to infer standard secondary resource(s) for a given primary resource if no secondary resource
* is explicitly provided on the command line. Inferred secondary resources are automatically added to the resulting
* bundle. Secondary resource inference can be suppressed by using the --suppress-resource-resolution argument.
*
* Each resource in a bundle must have an associated content type tag. Content types for each resource are either
* specified on the command line via argument tags, or inferred by the tool. For the primary and secondary resources,
* when no content type argument tag is provided, the tool will attempt to infer the content type from the file
* extension. However, the content types for "other" resources (resources that are neither primary nor secondary resources)
* are NEVER inferred, and must always be supplied via a content type argument tag.
* extension. However, the content types for "other" resources (resources that are neither primary nor secondary
* resources) are NEVER inferred, and must always be supplied via a content type argument tag.
*
* Bundle output file names must end with the suffix ".json".
*
* Common examples:
* In general, content types can be any string, but there are well-known content types that must be used when creating
* bundles for tools that expect well-known resource types, such as a VCF, a VCF index, a .fasta file, or a reference
* dictionary file. The common well-known content types are:
*
* - "CT_VARIANT_CONTEXTS": a VCF file
* - "CT_VARIANTS_INDEX": a VCF index file
*
* - "CT_HAPLOID_REFERENCE": fasta reference file
* - "CT_HAPLOID_REFERENCE_INDEX": fasta index file
* - "CT_HAPLOID_REFERENCE_DICTIONARY": fasta dictionary file
*
* Common bundle creation examples:
*
* VCF Bundles:
*
* 1) Create a resource bundle for a VCF. Let the tool determine the content types, and resolve the secondary resources
* (which for vcfs is the companion index) automatically by finding a sibling index file. If the sibling file cannot
* be found, an exception will be thrown:
* 1) Create a resource bundle for a VCF from just the VCF, letting the tool resolve the secondary (index) resource by
* automatically finding the sibling index file, and letting the tool determine the content types. If the sibling index
* file cannot be found, an exception will be thrown. Resulting bundle contains the VCF and associated index.
*
* CreateBundle \
* --primary path/to/my.vcf \
* --output mybundle.json
*
* The exact same bundle could be created manually by specifying both the resources and the content types explicitly:
*
* CreateBundle \
* --primary:CT_VARIANT_CONTEXTS path/to/my.vcf \
* --secondary:CT_VARIANTS_INDEX path/to/my.vcf.idx \
* --output mybundle.json
*
* 2) Create a resource bundle for a VCF from just the VCF, but suppress automatic resolution of the secondary
* resources. Let the tool determine the content types. The resulting bundle will contain only the vcf resource:
*
* CreateBundle \
* --primary path/to/my.vcf \
* --suppress-resource-resolution \
* --output mybundle.json
*
* 3) Create a resource bundle for a VCF, but specify the VCF AND the secondary index resource explicitly (which
* suppresses automatic secondary resolution). This is useful when the VCF and index are not in the same directory.
* Let the tool determine the content types. The resulting bundle will contain the VCF and index resources:
*
* CreateBundle \
* --primary path/to/my.vcf \
* --secondary some/other/path/to/vcf.idx \
* --output mybundle.json
*
* 4) Create a resource bundle for a VCF, but specify the VCF AND the secondary index resource explicitly (this
* is useful when the VCF and index are not in the same directory), and specify the content types explicitly via
* command line argument tags. The resulting bundle will contain the VCF and index resources.
*
* CreateBundle \
* --primary:CT_VARIANT_CONTEXTS path/to/my.vcf \
* --secondary:CT_VARIANTS_INDEX some/other/path/to/vcf.idx \
* --output mybundle.json
*
* Reference bundles:
*
* 1) Create a resource bundle for a reference from just the .fasta, letting the tool resolve the secondary
* (index and dictionary) resources by automatically finding the sibling files, and determining the content types.
* If the sibling index file cannot be found, an exception will be thrown. The resulting bundle will contain the
* reference, index, and dictionary.
*
* CreateBundle --primary path/to/my.vcf --output mybundle.json
* CreateBundle \
* --primary path/to/my.fasta \
* --output mybundle.json
*
* 2) Create a resource bundle for a VCF. Let the tool determine the content types, but suppress resolution of the secondary
* resources (which for vcfs is the companion index). The resulting bundle will contain only the vcf resource:
* 2) Create a resource bundle for a reference from just the .fasta, but suppress resolution of the secondary index and
* dictionary resources. Let the tool determine the content type. The resulting bundle will contain only the .fasta
* resource:
*
* CreateBundle --primary path/to/my.vcf --output mybundle.json
* CreateBundle \
* --primary path/to/my.fasta \
* --suppress-resource-resolution \
* --output mybundle.json
*
* 3) Create a resource bundle for a VCF. Let the tool determine the content type, but specify the secondary
* index resource explicitly (which suppresses secondary resolution). The resulting bundle will contain the vcf
* and index resources:
* 3) Create a resource bundle for a fasta, but specify the fasta AND the secondary index and dictionary resources
* explicitly (which suppresses automatic secondary resolution). Let the tool determine the content types. The
* resulting bundle will contain the fasta, index and dictionary resources:
*
* CreateBundle --primary path/to/my.vcf --secondary some/other/path/to/vcf.idx --output mybundle.json
* CreateBundle \
* --primary path/to/my.fasta \
* --secondary some/other/path/to/my.fai \
* --secondary some/other/path/to/my.dict \
* --output mybundle.json
*
* Reference bundles: create a bundle using explicitly provided values and content types for the primary and
* secondary resources:
* 4) Create a resource bundle for a fasta, but specify the fasta, index and dictionary resources and the content
* types explicitly. The resulting bundle will contain the fasta, index and dictionary resources:
*
* CreateBundle --primary: path/to/my.fa
* CreateBundle \
* --primary:CT_HAPLOID_REFERENCE path/to/my.fasta \
* --secondary:CT_HAPLOID_REFERENCE_INDEX some/other/path/to/my.fai \
* --secondary:CT_HAPLOID_REFERENCE_DICTIONARY some/other/path/to/my.dict \
* --output mybundle.json
*/
@DocumentedFeature
@CommandLineProgramProperties(
@@ -110,7 +178,7 @@ public class CreateBundle extends CommandLineProgram {
private enum BundleType {
VCF,
REFERENCE,
OTHER
CUSTOM
}
private BundleType outputBundleType;

@@ -129,7 +197,7 @@ protected Object doWork() {
final Bundle bundle = switch (outputBundleType) {
case VCF -> createVCFBundle();
case REFERENCE -> createHaploidReferenceBundle();
case OTHER -> createOtherBundle();
case CUSTOM -> createOtherBundle();
};
writer.write(BundleJSON.toJSON(bundle));
} catch (final IOException e) {
@@ -153,7 +221,7 @@ private BundleType determinePrimaryContentType() {
logger.info(String.format("Primary input content type %s for %s not recognized. A bundle will be created using content types from the provided argument tags.",
primaryContentTag,
primaryResource));
bundleType = BundleType.OTHER;
bundleType = BundleType.CUSTOM;
}
} else {
logger.info(String.format("A content type for the primary input was not provided. Attempting to infer the content type from the %s extension.", primaryResource));
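
Note on the hunk above: the branch structure visible in determinePrimaryContentType() (an explicit content type tag that is not recognized yields BundleType.CUSTOM, while a missing tag triggers inference from the file extension) can be sketched as a small standalone class. This is a hypothetical illustration, not the GATK code. The class name BundleTypeSketch, the method classify, the recognized extensions (.vcf, .fasta, .fa), and the exception thrown for an unrecognized extension are all assumptions; only the content type strings and the enum constants come from the diff.

```java
import java.util.Locale;

// Hypothetical, self-contained sketch of the classification step; not the GATK implementation.
final class BundleTypeSketch {

    // Enum constants as they appear in the diff after the OTHER -> CUSTOM rename.
    enum BundleType { VCF, REFERENCE, CUSTOM }

    // Well-known primary content type strings listed in the javadoc.
    static final String CT_VARIANT_CONTEXTS = "CT_VARIANT_CONTEXTS";
    static final String CT_HAPLOID_REFERENCE = "CT_HAPLOID_REFERENCE";

    /**
     * Classify a primary resource.
     *
     * @param contentTag  the content type argument tag, or null if none was given
     * @param primaryPath the path of the primary resource
     */
    static BundleType classify(final String contentTag, final String primaryPath) {
        if (contentTag != null) {
            // An explicit tag always wins; unrecognized tags produce a CUSTOM bundle.
            if (CT_VARIANT_CONTEXTS.equals(contentTag)) {
                return BundleType.VCF;
            }
            if (CT_HAPLOID_REFERENCE.equals(contentTag)) {
                return BundleType.REFERENCE;
            }
            return BundleType.CUSTOM;
        }

        // No tag: fall back to the file extension (assumed extension list).
        final String lower = primaryPath.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".vcf")) {
            return BundleType.VCF;
        }
        if (lower.endsWith(".fasta") || lower.endsWith(".fa")) {
            return BundleType.REFERENCE;
        }

        // Assumption: the sketch rejects inputs it cannot classify.
        throw new IllegalArgumentException(
                "Cannot infer a content type from the extension of " + primaryPath);
    }
}
```

Under these assumptions, classify(null, "path/to/my.vcf") returns VCF, and classify("MY_TYPE", "path/to/my.vcf") returns CUSTOM because the (made-up) tag MY_TYPE is not one of the well-known content types, even though the extension would otherwise be recognized.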