[Support] Using PCAngsdv2 to determine kinship #58

Open
fidibidi opened this issue Jan 24, 2022 · 0 comments
Hi All!

We have been trying to set up PCAngsd/ANGSD to test for relatedness within our clinical datasets.
We have been following the methods used in the paper "Whole genome analysis sheds light on the genetic origin of Huns, Avars and conquering Hungarians".

In particular this section:

"Presence of close relatives in the dataset interferes with unsupervised ADMIXTURE and population genetic analysis, therefore we identified close kins and just one of them was left in the dataset (Supplementary Table 9). We performed kinship analysis using the 1240K data set and the PCAangsd software (version 0.931)(Meisner and Albrechtsen 2018) from the ANGSD package with the “-inbreed 1 -kinship” options. We used the R (version 4.1.2); the RcppCNPy R package (version 0.2.10) to import the Numpy output files of PCAangsd."

Based on this, I took a trio and ran it through ANGSD to generate a BEAGLE file.

angsd -GL 1 -out CA0346.angsd -nThreads 4 -doGlf 2 -doMajorMinor 1 -doMaf 2 -SNP_pval 1e-6 -bam CA0346.bamslist

The resulting beagle file looks like this:

marker allele1 allele2 Ind0 Ind0 Ind0 Ind1 Ind1 Ind1 Ind2 Ind2 Ind2
chr1_14907 2 0 0.000793 0.999207 0.000000 0.003548 0.996452 0.000000 0.000000 0.999997 0.000003
chr1_14930 0 2 0.000000 0.999953 0.000047 0.000000 0.998436 0.001564 0.105530 0.894445 0.000025
chr1_14976 2 0 0.000000 1.000000 0.000000 1.000000 0.000000 0.000000 1.000000 0.000000 0.000000
chr1_15118 0 2 0.000005 0.999995 0.000000 1.000000 0.000000 0.000000 0.000000 1.000000 0.000000
chr1_15211 2 3 0.000000 1.000000 0.000000 0.000011 0.999989 0.000000 0.000000 1.000000 0.000000
chr1_15274 3 2 0.000000 1.000000 0.000000 0.000790 0.999210 0.000000 0.000001 0.999999 0.000000
chr1_49272 2 0 0.051665 0.948335 0.000000 0.001532 0.998468 0.000000 0.001330 0.998670 0.000000
chr1_49298 1 3 0.441472 0.558528 0.000000 0.000054 0.999946 0.000000 0.003694 0.996306 0.000000
chr1_51803 3 1 0.000000 1.000000 0.000000 0.000000 1.000000 0.000000 0.000000 1.000000 0.000000
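
As a sanity check on the file above, here is a minimal Python sketch that parses a few of these rows and reports the most likely genotype per individual. It assumes the usual ANGSD beagle layout (allele1 is the major allele, allele2 the minor, each individual has three likelihoods in the order major/major, major/minor, minor/minor, and alleles are coded 0=A, 1=C, 2=G, 3=T). The helper name `parse_beagle` is made up for illustration and is not part of ANGSD or PCAngsd.

```python
import io

# First rows of the beagle file shown above (whitespace-separated).
BEAGLE_TEXT = """\
marker allele1 allele2 Ind0 Ind0 Ind0 Ind1 Ind1 Ind1 Ind2 Ind2 Ind2
chr1_14907 2 0 0.000793 0.999207 0.000000 0.003548 0.996452 0.000000 0.000000 0.999997 0.000003
chr1_14930 0 2 0.000000 0.999953 0.000047 0.000000 0.998436 0.001564 0.105530 0.894445 0.000025
chr1_14976 2 0 0.000000 1.000000 0.000000 1.000000 0.000000 0.000000 1.000000 0.000000 0.000000
"""

ALLELE_CODES = "ACGT"  # assumption: ANGSD codes alleles as 0=A, 1=C, 2=G, 3=T
GENOTYPE_LABELS = ("major/major", "major/minor", "minor/minor")  # assumed column order


def parse_beagle(handle):
    """Hypothetical helper: return (n_individuals, list of parsed sites)."""
    header = handle.readline().split()
    n_ind = (len(header) - 3) // 3  # three likelihood columns per individual
    sites = []
    for line in handle:
        fields = line.split()
        if not fields:
            continue
        marker, a1, a2 = fields[0], int(fields[1]), int(fields[2])
        values = [float(x) for x in fields[3:]]
        triples = [tuple(values[3 * i:3 * i + 3]) for i in range(n_ind)]
        # Normalised likelihoods should sum to roughly 1 per individual.
        for triple in triples:
            assert abs(sum(triple) - 1.0) < 1e-4, (marker, triple)
        sites.append((marker, ALLELE_CODES[a1], ALLELE_CODES[a2], triples))
    return n_ind, sites


n_ind, sites = parse_beagle(io.StringIO(BEAGLE_TEXT))
calls_by_marker = {}
for marker, major, minor, triples in sites:
    # Index of the largest likelihood gives the most likely genotype.
    calls = [GENOTYPE_LABELS[max(range(3), key=t.__getitem__)] for t in triples]
    calls_by_marker[marker] = calls
    print(marker, f"major={major}", f"minor={minor}", calls)
```

For the real file, the same function could be given a handle from `gzip.open(path, "rt")` instead of the in-memory string.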

Then taking this beagle file, I ran pcangsd on it:

python pcangsd.py -beagle ~/data/CA0346.angsd.beagle.gz -o ~/data/Test-inbreed-3 -inbreed 1 -kinship -threads 4

This produced the following output files (printed from a Jupyter notebook for clarity):

inbreed:
[-0.40427923 -0.84753537 -0.85559165]
kinship:
[[ 0.15704742 -0.0919658 -0.08373455]
[-0.0919658 0.05300118 0.05168671]
[-0.08373455 0.05168671 0.04196488]]
covariance:
[[ 0.33447766 -0.15413524 -0.18028733]
[-0.15413524 0.75423872 -0.59264392]
[-0.18028733 -0.59264392 0.77522612]]
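
For anyone wanting to reproduce the printout above, this is roughly how a NumPy output file can be loaded and the pairwise values listed. It is only a sketch: the file name `example.kinship.npy` is a placeholder (the actual names depend on the `-o` prefix and the PCAngsd version), and the array is written to a temporary directory first so the snippet runs on its own.

```python
import tempfile
from pathlib import Path

import numpy as np

# Kinship matrix exactly as printed above.
kinship = np.array([
    [0.15704742, -0.0919658, -0.08373455],
    [-0.0919658, 0.05300118, 0.05168671],
    [-0.08373455, 0.05168671, 0.04196488],
])

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "example.kinship.npy"  # hypothetical file name
    np.save(path, kinship)   # stand-in for the file PCAngsd would write
    loaded = np.load(path)   # this is the call needed on a real output file

# A kinship matrix should be square and symmetric.
assert loaded.shape[0] == loaded.shape[1]
assert np.allclose(loaded, loaded.T)

# List each pair of individuals once (upper triangle, diagonal excluded).
rows, cols = np.triu_indices(loaded.shape[0], k=1)
pairs = [(int(i), int(j), float(loaded[i, j])) for i, j in zip(rows, cols)]
for i, j, value in pairs:
    print(f"Ind{i} - Ind{j}: {value:+.4f}")
```

Only the off-diagonal entries are listed because those are the pairwise estimates; how the diagonal should be read is not covered here.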

Unfortunately, the PCAngsd documentation does not explain the output files.

If anyone could help us interpret this output and verify that our commands and process are correct, that would be a tremendous help!

We were expecting two first-degree relationships (child-parent) and one unrelated pair (the two parents).
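
To make that expectation concrete, here is a small sketch comparing the off-diagonal values from the kinship matrix above with the theoretical kinship coefficients for a trio: 0.25 for each parent-child pair and 0 for the unrelated parents. Two things are assumptions here: that the PCAngsd estimates are on the same scale as the theoretical coefficient, and which of Ind0, Ind1, Ind2 is the child (the order in the bam list is not given above), so the sketch simply tries each possibility.

```python
from itertools import combinations

# Off-diagonal entries of the kinship matrix printed above.
observed = {
    (0, 1): -0.0919658,
    (0, 2): -0.08373455,
    (1, 2): 0.05168671,
}

PARENT_CHILD = 0.25  # theoretical kinship coefficient, parent-child
UNRELATED = 0.0      # theoretical kinship coefficient, unrelated pair


def expected_for_child(child, n=3):
    """Expected pairwise kinship if individual `child` is the offspring."""
    return {
        pair: (PARENT_CHILD if child in pair else UNRELATED)
        for pair in combinations(range(n), 2)
    }


def max_abs_deviation(child):
    """Largest gap between observed and expected values for this labelling."""
    expected = expected_for_child(child)
    return max(abs(observed[pair] - expected[pair]) for pair in expected)


for child in range(3):
    print(f"child = Ind{child}: max |observed - expected| = "
          f"{max_abs_deviation(child):.3f}")
```

A labelling that matched the expected trio would give a deviation near zero for all three pairs.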

Thank you for this cool software, and have a good one!
Fidi
