Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

selection and pcadapt output questions #90

Open
lkrichardson opened this issue Jul 3, 2024 · 2 comments
Open

selection and pcadapt output questions #90

lkrichardson opened this issue Jul 3, 2024 · 2 comments

Comments

@lkrichardson
Copy link

Hi I have two questions one about --pcadapt and one about --selection.

  1. pcadapt- In the paper you describe adjusting the output by the genome inflation factor, how can I do that with my output? I was able to calculate the p-values based on the output with the associated script included in pcangsd.

  2. I also ran --selection without specifying the number of pcs, and got output for 3 pc axes which was what I expected given my admix results. I calculated p-values from the output for each pc axis using your recommendations in R:
    pvals.pc1 <- pchisq(selection[,1], 1, lower.tail=F)
    after applying Benjamini–Hochberg, the interesting thing is that none of the pvalues are significant along pc1 or pc3, and only ~22K are significant along pc2. I also looked at the distribution of p-values and they look kind of weird- along pc1 the histogram is hump shaped, and for the other two there is more of a U shape. This is in contrast to the pcadapt output which after Benjamini–Hochberg yielded ~360K significant SNPs from nearly 3 million that were input, and the histogram of pvalues is more uniform especially at the right side of the plot.

I'm wondering why the results between the two methods might be so different and if something strange is going on with one or the other method? Any insights would be helpful, hopefully these questions are appropriate to post here and if not I'm sorry and thanks!

@Rosemeis
Copy link
Owner

Rosemeis commented Jul 4, 2024

Hi,

Thank you for reaching out.
The genomic inflation factor is calculated as follows in R: GIF=median(obs, dof)/qchisq(0.5, dof), where dof is the degrees of freedom.

Could you provide with some more information (sample size, number of SNPs) and maybe output log from pcangsd? :-)
You can also attach the histograms if possible.

Best,
Jonas

@lkrichardson
Copy link
Author

lkrichardson commented Jul 5, 2024

Hi Jonas, Thanks so much for your response.

Just to follow up on the GIF and make sure I get the correct order of operations- I do the GIF calculation to the output of pcangsd pcadapt before calculating pvalues, and then I can still use the same script for pcadapt to calculate pvalues, and then I can control for false discovery rate, is that correct?

I guess I didn't save the output log unfortunately. Here was my logfile for pcangsd:
PCAngsd v1.11
Time: 19/06/2024 09:26:57
Directory: /home/lkr
Options:
-beagle /home/lkr/jtgi/angsd/unlinkedSNPs/unlinkedSNPs.beagle.gz
-out /home/lkr/jtgi/angsd/unlinkedSNPs/pcangsdSelection/pcSelmaf05
-selection
-pcadapt
-sites_save

The input file was beagle output from angsd that was LD-pruned to 6691644 SNPs (at minMaf 0.01). I didn't change the default for pcangsd, so it filtered down to 2942397 with minMaf 0.05, and this is for a sample size of 298 trees sampled across the range. Here are screenshots of the histograms of pvalues, the pvals.sel.pcX are the histograms of p-values associated with the --selection method, and in contrast you can see the histogram output from pcadapt. It also occurred to me to visualize the snp weights but I realized I didn't output them in a file.

For more general background info- this is low coverage whole genome data from a long-lived desert tree species. From visualizing the pca and admix results it looks like the first pc axis basically explains the east-west differentiation which indeed explains differences between the two closely related sister species, the eastern and western trees, pc2 seems to explain differences between the northern and southern populations of the western trees, and pc3 explains north/south differences in the eastern trees...

pvalsHist

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants