snp-pileup for mm10/GRCm38 mouse #203

Open
ahwanpandey opened this issue Nov 3, 2024 · 4 comments

ahwanpandey commented Nov 3, 2024

Hello,

Could you point me to a suitable VCF for running snp-pileup on mm10/GRCm38 WGS data?

I downloaded 00-All.vcf.gz, but it contains 71,202,368 SNPs, which seems far too many for snp-pileup.

I have also been looking at the strain-specific VCFs below. The files are much smaller and seem more reasonable, but there is no file for exactly "C57BL/6J", which is the strain we are using. There is, however, one for "C57BL/6NJ".
https://ftp.ebi.ac.uk/pub/databases/mousegenomes/REL-1505-SNPs_Indels/strain_specific_vcfs/

Thanks for your help!

Best,
Ahwan

ahwanpandey (Author) commented Nov 4, 2024

I've also found this file, which has 8,213,470 SNPs:
https://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/snp142Common.txt.gz

It is described in the following announcement:
"Common SNPs (142): uniquely mapped variants that appear in at least 1% of the population"
https://groups.google.com/a/soe.ucsc.edu/g/genome-announce/c/VuZlU_vCPx4

Please advise on the best SNP reference for mouse samples!

veseshan (Collaborator) commented Nov 4, 2024

Although the VCF has 71 million SNPs, you can process it to reduce the number. Remove any polymorphism that is not a single-nucleotide variant, and also remove any site with more than one alternate allele. The VCF also has many columns that snp-pileup does not need; removing them reduces the file size.

ahwanpandey (Author) commented

Thanks for your input @veseshan. I will try the 00-All.vcf.gz file with the following filter, which leaves 70,672,993 SNPs:

bcftools view --types snps --max-alleles 2 orig/dbsnp.vcf.gz | cut -f 1-5 | bgzip -cf > dbsnp.snps_only.no_multi_allele.vcf.gz
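For illustration only, the same three filters veseshan describes (single-nucleotide variants only, at most one alternate allele, keep only the first five columns) can be sketched in plain Python for a machine without bcftools. This is a hypothetical helper, not part of FACETS or snp-pileup: `filter_biallelic_snvs` and `filter_vcf_file` are made-up names. It assumes a gzip- or bgzip-compressed VCF whose REF/ALT alleles are uppercase A/C/G/T, and it only approximates `bcftools view --types snps --max-alleles 2`.

```python
import gzip
from typing import Iterable, Iterator

# Assumption: only these four bases count as single-nucleotide alleles.
BASES = {"A", "C", "G", "T"}


def filter_biallelic_snvs(lines: Iterable[str]) -> Iterator[str]:
    """Yield header lines plus biallelic SNV records, trimmed to
    CHROM, POS, ID, REF, ALT (mirroring `cut -f 1-5`)."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##"):
            yield line  # meta-information lines pass through unchanged
            continue
        fields = line.split("\t")
        if line.startswith("#"):
            yield "\t".join(fields[:5])  # #CHROM header, trimmed like cut -f 1-5
            continue
        if len(fields) < 5:
            continue  # malformed record: skip it
        ref, alt = fields[3].upper(), fields[4].upper()
        if "," in alt:
            continue  # a comma in ALT means more than one alternate allele
        # Single-base REF and ALT only: drops indels, MNPs and ALT "."
        if ref in BASES and alt in BASES:
            yield "\t".join(fields[:5])


def filter_vcf_file(in_path: str, out_path: str) -> int:
    """Filter a gzip/bgzip-compressed VCF into an uncompressed VCF.
    Returns the number of variant records kept."""
    kept = 0
    # gzip.open reads bgzip output too, since BGZF is multi-member gzip.
    with gzip.open(in_path, "rt") as src, open(out_path, "w") as dst:
        for out in filter_biallelic_snvs(src):
            dst.write(out + "\n")
            if not out.startswith("#"):
                kept += 1
    return kept
```

The output of `filter_vcf_file` is plain text, so it would still need to be compressed with `bgzip` before use. A streaming generator keeps memory flat even for a 71-million-record file.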

I also found this resource that might be useful to try:
https://kharchenkolab.github.io/numbat/articles/mouse.html

ahwanpandey (Author) commented

Hi again @veseshan

I am new to mouse WGS analysis, and I have just read a discussion suggesting that inbred mice (with a pure genetic background) lack heterozygous SNPs:
kharchenkolab/numbat#198

Does this mean I won't be able to run FACETS on our pure "C57BL/6J" mice either?

Thanks,
Ahwan
