Changing References
Some vg formats, like GBZ, distinguish between paths of different types, or "senses".
For example, we can put a P-line path chr1 and W-line haplotypes for samples sample1 and sample2 on contig chr1 into a GBZ file:
cat >demo.gfa <<EOF
H VN:Z:1.1
S 1 GATTACT
S 2 A
S 3 T
S 4 CATTAG
L 1 + 2 + *
L 1 + 3 + *
L 2 + 4 + *
L 3 + 4 + *
P chr1 1+,2+,4+ *
W sample1 0 chr1 0 14 >1>3>4
W sample2 0 chr1 0 14 >1>2>4
EOF
vg gbwt -G demo.gfa --gbz-format -g demo.gbz
rm demo.gfa
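As a sanity check on the W-lines, the end coordinate 14 should equal the total length of the segments each walk visits. The following is a minimal sketch with awk, assuming the GFA 1.1 W-line column order (sample, haplotype index, sequence name, start, end, walk). It works on a copy of the relevant records held in a hypothetical shell variable named records, not on the deleted file:

```shell
# Copy of the S- and W-records from the demo GFA above (whitespace-separated here).
records='S 1 GATTACT
S 2 A
S 3 T
S 4 CATTAG
W sample1 0 chr1 0 14 >1>3>4
W sample2 0 chr1 0 14 >1>2>4'

# For each walk, sum the lengths of the visited segments and print the sum
# next to the span implied by the start and end columns.
walk_lengths=$(printf '%s\n' "$records" | awk '
    $1 == "S" { len[$2] = length($3) }
    $1 == "W" {
        n = split($7, steps, /[<>]/)   # steps[1] is empty: the walk starts with ">"
        total = 0
        for (i = 2; i <= n; i++) total += len[steps[i]]
        print $2, total, $6 - $5
    }')
printf '%s\n' "$walk_lengths"
```

Each output line shows the sample name, the summed segment length, and the end-minus-start span; the two numbers should agree for a consistent walk.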
Then we can inspect the path metadata as a TSV with vg paths -M:
vg paths -M -x demo.gbz
The result will be:
#NAME SENSE SAMPLE HAPLOTYPE LOCUS PHASE_BLOCK SUBRANGE
chr1 SENSE_GENERIC NO_SAMPLE_NAME NO_HAPLOTYPE chr1 NO_PHASE_BLOCK NO_SUBRANGE
sample1#0#chr1#0 SENSE_HAPLOTYPE sample1 0 chr1 0 NO_SUBRANGE
sample2#0#chr1#0 SENSE_HAPLOTYPE sample2 0 chr1 0 NO_SUBRANGE
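Because the metadata is tabular, it can be filtered by sense with standard text tools. The following is a minimal sketch that selects the haplotype-sense paths with awk, assuming tab-separated columns in the order shown above. The rows are rebuilt by hand into a hypothetical variable named metadata so the example runs without vg:

```shell
# Metadata rows copied from the output above, rebuilt with tab separators.
# printf reuses the seven-field format for each group of seven arguments.
metadata=$(printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
    '#NAME' SENSE SAMPLE HAPLOTYPE LOCUS PHASE_BLOCK SUBRANGE \
    chr1 SENSE_GENERIC NO_SAMPLE_NAME NO_HAPLOTYPE chr1 NO_PHASE_BLOCK NO_SUBRANGE \
    'sample1#0#chr1#0' SENSE_HAPLOTYPE sample1 0 chr1 0 NO_SUBRANGE \
    'sample2#0#chr1#0' SENSE_HAPLOTYPE sample2 0 chr1 0 NO_SUBRANGE)

# Keep only rows whose second column is SENSE_HAPLOTYPE and print the path name.
haplotype_paths=$(printf '%s\n' "$metadata" \
    | awk -F '\t' '$2 == "SENSE_HAPLOTYPE" { print $1 }')
printf '%s\n' "$haplotype_paths"
```

With a real graph, the same awk filter could presumably be fed directly from vg paths -M -x demo.gbz through a pipe.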
Tools like vg surject won't operate on haplotype paths by default. To turn a haplotype path into a reference path, we can apply a transformation when converting formats that promotes the haplotypes for a sample:
vg convert -a --ref-sample sample1 demo.gbz >demo-promoted.vg
If we check the path metadata again:
vg paths -M -x demo-promoted.vg
The result will be:
#NAME SENSE SAMPLE HAPLOTYPE LOCUS PHASE_BLOCK SUBRANGE
chr1 SENSE_GENERIC NO_SAMPLE_NAME NO_HAPLOTYPE chr1 NO_PHASE_BLOCK NO_SUBRANGE
sample1#chr1 SENSE_REFERENCE sample1 NO_HAPLOTYPE chr1 NO_PHASE_BLOCK NO_SUBRANGE
sample2#0#chr1#0 SENSE_HAPLOTYPE sample2 0 chr1 0 NO_SUBRANGE
Note that the sense of the path for sample sample1 has changed to reference, and that its haplotype and phase block information have been removed. (It is safe to remove the haplotype because it is 0, the value used for a haploid haplotype. If we were using a diploid sample, 1 and 2 would have been used for the haplotypes, and they would have been preserved.)
If we decide we no longer want the original chr1 path in the graph, we can remove all paths whose names start with the prefix chr1:
vg paths -x demo-promoted.vg -d -Q chr1 >demo-removed.vg
Then we can list the path names:
vg paths -L -x demo-removed.vg
And we will see that it is no longer there:
sample1#chr1
sample2#0#chr1#0
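Only the generic path was dropped because the other two names contain chr1 but do not start with it. The following is a minimal sketch of that selection, assuming the prefix is compared as a plain string against the start of the full path name, as the text above describes. It imitates the matching with a shell case pattern and does not call vg:

```shell
# Classify each path name from the promoted graph by whether it starts with "chr1".
decisions=$(
    for name in 'chr1' 'sample1#chr1' 'sample2#0#chr1#0'; do
        case $name in
            chr1*) echo "drop $name" ;;
            *)     echo "keep $name" ;;
        esac
    done
)
printf '%s\n' "$decisions"
```

Under the same assumption, a prefix of chr1 would also match names such as chr10 in a larger graph, so a longer or more specific prefix may be needed there.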
Converting the resulting graph back to GBZ format, for use with vg giraffe, is not yet implemented. However, the modified graph can be used with alignments against the original graph, since the node IDs are all the same.
To clean up:
rm demo.gbz demo-promoted.vg demo-removed.vg