-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
selection and pcadapt output questions #90
Comments
Hi, Thank you for reaching out. Could you provide with some more information (sample size, number of SNPs) and maybe output log from pcangsd? :-) Best, |
Hi Jonas, Thanks so much for your response. Just to follow up on the GIF and make sure I get the correct order of operations- I do the GIF calculation to the output of pcangsd pcadapt before calculating pvalues, and then I can still use the same script for pcadapt to calculate pvalues, and then I can control for false discovery rate, is that correct? I guess I didn't save the output log unfortunately. Here was my logfile for pcangsd: The input file was beagle output from angsd that was LD-pruned to 6691644 SNPs (at minMaf 0.01). I didn't change the default for pcangsd, so it filtered down to 2942397 with minMaf 0.05, and this is for a sample size of 298 trees sampled across the range. Here are screenshots of the histograms of pvalues, the pvals.sel.pcX are the histograms of p-values associated with the --selection method, and in contrast you can see the histogram output from pcadapt. It also occurred to me to visualize the snp weights but I realized I didn't output them in a file. For more general background info- this is low coverage whole genome data from a long-lived desert tree species. From visualizing the pca and admix results it looks like the first pc axis basically explains the east-west differentiation which indeed explains differences between the two closely related sister species, the eastern and western trees, pc2 seems to explain differences between the northern and southern populations of the western trees, and pc3 explains north/south differences in the eastern trees... |
Hi I have two questions one about --pcadapt and one about --selection.
pcadapt- In the paper you describe adjusting the output by the genome inflation factor, how can I do that with my output? I was able to calculate the p-values based on the output with the associated script included in pcangsd.
I also ran --selection without specifying the number of pcs, and got output for 3 pc axes which was what I expected given my admix results. I calculated p-values from the output for each pc axis using your recommendations in R:
pvals.pc1 <- pchisq(selection[,1], 1, lower.tail=F)
after applying Benjamini–Hochberg, the interesting thing is that none of the pvalues are significant along pc1 or pc3, and only ~22K are significant along pc2. I also looked at the distribution of p-values and they look kind of weird- along pc1 the histogram is hump shaped, and for the other two there is more of a U shape. This is in contrast to the pcadapt output which after Benjamini–Hochberg yielded ~360K significant SNPs from nearly 3 million that were input, and the histogram of pvalues is more uniform especially at the right side of the plot.
I'm wondering why the results between the two methods might be so different and if something strange is going on with one or the other method? Any insights would be helpful, hopefully these questions are appropriate to post here and if not I'm sorry and thanks!
The text was updated successfully, but these errors were encountered: