VCF out of order or chromosome not present #13
Comments
Here's the sex-specific map I'm using (compressed).
Thanks for posting! If you remove all chromosomes except 22 from the refined_mf.simmap file, it will work. This may change, but for now, although you can omit later chromosomes, Ped-sim requires the VCF's first and successive chromosomes to match the first and successive chromosomes in the genetic map. So it's looking for chromosome 1 here.
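To make the matching rule concrete, here is a minimal Python sketch of the check and of the suggested fix. It is not Ped-sim's own code. It assumes the map is a whitespace-delimited text file whose first column is the chromosome name, that header lines start with `#`, and that the VCF and the map spell chromosome names identically. The function names and the toy rows are hypothetical.

```python
def chromosome_runs(lines):
    """List chromosome names (first column) in file order, one entry per run."""
    runs = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header/comment and blank lines
        chrom = line.split()[0]
        if not runs or runs[-1] != chrom:
            runs.append(chrom)
    return runs


def vcf_lines_up_with_map(vcf_chroms, map_chroms):
    """True when the VCF's chromosomes equal the map's leading chromosomes."""
    return vcf_chroms == map_chroms[:len(vcf_chroms)]


def keep_only(lines, chrom):
    """Keep '#' lines plus rows whose first column is exactly `chrom`."""
    return [ln for ln in lines
            if ln.startswith("#") or ln.split()[:1] == [chrom]]


# Toy map rows (made-up positions and cM values), not the real refined_mf.simmap.
toy_map = [
    "#chr\tpos\tmale_cM\tfemale_cM\n",
    "1\t1000\t0.0\t0.0\n",
    "1\t2000\t0.1\t0.2\n",
    "22\t3000\t0.0\t0.0\n",
    "22\t4000\t0.3\t0.5\n",
]
vcf_chroms = ["22"]  # a VCF holding chromosome 22 only

# Full map: its first chromosome is 1, so a chr22-only VCF does not line up.
print(vcf_lines_up_with_map(vcf_chroms, chromosome_runs(toy_map)))

# Map reduced to chromosome 22: the first chromosomes now agree.
chr22_map = keep_only(toy_map, "22")
print(vcf_lines_up_with_map(vcf_chroms, chromosome_runs(chr22_map)))
```

With the toy rows the first check prints `False` and the second prints `True`, which mirrors why trimming the map to chromosome 22 resolves the error.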
Note that when using sex-specific maps, simulating all chromosomes simultaneously is necessary to ensure that the parent sexes are the same on all chromosomes, so if you can, try to merge the autosomal data into one VCF. (Support for chrX is coming hopefully in March.)
Thanks for the quick reply! Removing the non-chr22 entries from that map should be an easy fix. Regarding the second point: my initial intent was to simulate only a single chromosome's worth of data, not to simulate genome-wide data one chromosome at a time. That should still work, right?
For sure, should work! Hope the documentation helps.
Hello! Thanks for making this open-source and for putting together such an informative README. I'm still wrapping my brain around branching and the pedigree def file, but for now, I'm using your examples to try to simulate genetic data from a VCF file, and I'm getting an unexpected error about chromosomes in my VCF being either out of order or not present.
I first created the map file exactly as specified in the README and gave it the same name, refined_mf.simmap. My ped def file (sims.def) is pretty simple:
When not using an input VCF I don't run into any issue:
However, when I do try to simulate variants using an input VCF, I run into a problem:
Prior to running this I checked that there were no SNPs in the VCF outside the range defined in the map. I created the VCF with minimal pre-processing starting from the GRCh37 1000 Genomes chromosome 22 VCF. I first collected sample IDs for 20 unrelated samples from GBR, then subsetted the VCF to keep only those samples and only variants on chromosome 22 in the region 17152611-51175626. The filter -v snps -m2 -M2 -i 'INFO/AF>0.05' keeps only biallelic SNPs with a global MAF>0.05; finally, the output is sorted and written to a new VCF.
Here's that VCF in case you want it for a reproducible example:
gbr.22.snps.vcf.gz
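For readers who want to see what that filter selects, the following is a rough pure-Python approximation, not the command actually run and not a replacement for it. It assumes uncompressed, tab-delimited VCF text with an `AF` entry in the INFO column. The function names and the toy records are hypothetical. Like the `INFO/AF>0.05` expression, it tests the ALT allele frequency as annotated rather than folding it to a minor allele frequency.

```python
def is_common_biallelic_snp(record, min_af=0.05):
    """Approximate check: single-base REF, one single-base ALT, INFO AF > min_af."""
    fields = record.rstrip("\n").split("\t")
    ref, alt, info = fields[3], fields[4], fields[7]
    # A multi-allelic ALT such as "G,T" has length > 1 and is rejected here too.
    if len(ref) != 1 or len(alt) != 1 or alt in {".", "*"}:
        return False
    for entry in info.split(";"):
        if entry.startswith("AF="):
            return float(entry[3:]) > min_af
    return False  # no AF annotation: treat the record as failing the filter


def filter_vcf_lines(lines, min_af=0.05):
    """Pass header lines through and keep only records that pass the check."""
    return [ln for ln in lines
            if ln.startswith("#") or is_common_biallelic_snp(ln, min_af)]


# Toy records with made-up IDs, alleles, and frequencies.
toy_vcf = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "22\t17152611\tsnp1\tA\tG\t100\tPASS\tAC=5;AF=0.10\n",    # kept
    "22\t17152700\tsnp2\tA\tG,T\t100\tPASS\tAF=0.20,0.10\n",  # multi-allelic
    "22\t17152800\tindel1\tAT\tA\t100\tPASS\tAF=0.30\n",      # indel
    "22\t17152900\tsnp3\tC\tT\t100\tPASS\tAF=0.01\n",         # too rare
]
kept = filter_vcf_lines(toy_vcf)
print(len(kept))  # header line plus the one passing record
```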
Looking at the first few and last few chromosome 22 entries on the sex-specific map:
... shows that subsetting the VCF to the region 22:17152611-51175626 shouldn't result in any variants outside the range defined by the map. Indeed, checking the VCF itself shows that the first record is at position 17152611 and the last is at position 51175626, both included in the sex-specific map (refined_mf.simmap).
Thanks, and please let me know if I can help with reproducing this issue on your end!
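The range check described above can be sketched in Python as follows. This is an illustration only: it assumes the map's first column is the chromosome and its second column is the physical position, with header lines starting with `#`. The function names are hypothetical, and the cM values in the toy rows are made up; only the two endpoint positions come from the report.

```python
def map_span(map_lines, chrom):
    """Return (min, max) physical position listed for `chrom`, or None."""
    positions = []
    for line in map_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header/comment and blank lines
        fields = line.split()
        if fields[0] == chrom:
            positions.append(int(fields[1]))
    return (min(positions), max(positions)) if positions else None


def outside_span(vcf_positions, span):
    """List the VCF positions that fall outside the inclusive map span."""
    lo, hi = span
    return [p for p in vcf_positions if p < lo or p > hi]


# Toy chromosome 22 map rows; the cM columns are placeholders.
toy_map = [
    "#chr\tpos\tmale_cM\tfemale_cM\n",
    "22\t17152611\t0.0\t0.0\n",
    "22\t51175626\t1.0\t2.0\n",
]
span = map_span(toy_map, "22")

print(outside_span([17152611, 30000000, 51175626], span))  # nothing outside
print(outside_span([17152611, 51175627], span))            # one position past the end
```

An empty list from `outside_span` corresponds to the situation in the report: every VCF position lies within the map's range, so the error has to come from something else, here the chromosome order.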