Calculate gene similarity on the HPO #60

Open
stefanucci-luca opened this issue Nov 2, 2020 · 2 comments

Comments

@stefanucci-luca

Dear Kevin,

I would like to calculate the pairwise similarity for a set of genes (~2000).
I annotated these genes with the HPO codes from the human phenotype ontology webpage (http://compbio.charite.de/jenkins/job/hpo.annotations/lastSuccessfulBuild/artifact/util/annotation/genes_to_phenotype.txt).
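For anyone who wants to build this kind of input, here is a minimal sketch of collapsing gene-to-term annotation rows into one tab-separated line per gene (gene symbol, a `.` placeholder, then `|`-joined HPO IDs), matching the example file in this post. The column holding the gene symbol (`gene_col=1`) is an assumption about the annotation file layout, and both function names are hypothetical. HPO IDs are picked up by pattern rather than by column position, so the sketch does not depend on where the ID column sits.

```python
import re
from collections import defaultdict

# An HPO term ID looks like "HP:" followed by seven digits.
HPO_ID = re.compile(r"^HP:\d{7}$")


def collapse_annotations(lines, gene_col=1):
    """Group HPO IDs by gene symbol, keeping first-seen order and dropping duplicates.

    gene_col is an ASSUMPTION about which tab-separated column holds the
    gene symbol; check it against the header of your annotation file.
    """
    terms_by_gene = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= gene_col:
            continue  # skip malformed rows
        gene = fields[gene_col]
        for field in fields:
            if HPO_ID.match(field) and field not in terms_by_gene[gene]:
                terms_by_gene[gene].append(field)
    return terms_by_gene


def to_records(terms_by_gene):
    """Format one 'GENE<TAB>.<TAB>HP:...|HP:...' line per gene with at least one term."""
    return [
        f"{gene}\t.\t{'|'.join(terms)}"
        for gene, terms in sorted(terms_by_gene.items())
        if terms
    ]


# Small made-up rows, only to show the shape of the input and output.
example_rows = [
    "#gene_id\tgene_symbol\thpo_id",
    "1\tA4GALT\tHP:0010970",
    "1\tA4GALT\tHP:0000006",
    "2\tAAAS\tHP:0040281",
]
for record in to_records(collapse_annotations(example_rows)):
    print(record)
```

Writing the resulting lines to a text file gives a file of the same shape as the one passed to `phenopy score` in this post.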

I reshaped the file and obtained something like this:

A4GALT	.	HP:0010970|HP:0000006
AAAS	.	HP:0040281|HP:0040282|HP:0040283|HP:0011463|HP:0001278|HP:0000972|HP:0012332|HP:0008259|HP:0004322|HP:0001251|HP:0000648|HP:0000007|HP:0002571|HP:0004319|HP:0001263|HP:0008163|HP:0001249|HP:0009916|HP:0003487|HP:0007002|HP:0000252|HP:0001347|HP:0000522|HP:0003676|HP:0000649|HP:0001324|HP:0000953|HP:0001260|HP:0000846|HP:0001250|HP:0007440|HP:0000505|HP:0000982|HP:0001761|HP:0010486|HP:0000830|HP:0007556|HP:0002093|HP:0001430|HP:0001252|HP:0002376|HP:0000612|HP:0000407
AASS	.	HP:0000119|HP:0000752|HP:0001083|HP:0001903|HP:0003593|HP:0001250|HP:0002161|HP:0000736|HP:0001252|HP:0100543|HP:0000007|HP:0001256|HP:0000750|HP:0001249
ABAT	.	HP:0025356|HP:0000278|HP:0000098|HP:0007291|HP:0000007|HP:0002415|HP:0001321|HP:0000494|HP:0001347|HP:0006829|HP:0001263|HP:0001274|HP:0001250|HP:0001254|HP:0025430|HP:0003819
ABCA4	.	HP:0040280|HP:0040281|HP:0040282|HP:0040283|HP:0040284|HP:0000006|HP:0007663|HP:0000662|HP:0001133|HP:0000608|HP:0000512|HP:0000543|HP:0000007|HP:0007737|HP:0007722|HP:0000510|HP:0007984|HP:0007843|HP:0000548|HP:0000580|HP:0000572|HP:0008035|HP:0000639|HP:0000618|HP:0000405|HP:0000603|HP:0000135|HP:0000493|HP:0000463|HP:0001249|HP:0007703|HP:0000613|HP:0000987|HP:0030329|HP:0000649|HP:0000648|HP:0000551|HP:0008046|HP:0000407|HP:0007704|HP:0007814|HP:0008736|HP:0000035|HP:0008002|HP:0007675|HP:0000431|HP:0000610|HP:0000518|HP:0000602|HP:0001513|HP:0008059|HP:0000501|HP:0000563|HP:0000842|HP:0030500|HP:0001347|HP:0000505|HP:0005978|HP:0011504|HP:0011462|HP:0011463|HP:0003621|HP:0007994
ABCB11	.	HP:0040283|HP:0000989|HP:0002014|HP:0003155|HP:0000952|HP:0001081|HP:0003593|HP:0001394|HP:0001744|HP:0001046|HP:0002240|HP:0002630|HP:0002908|HP:0000007|HP:0003819|HP:0004322|HP:0001508|HP:0001406|HP:0001402

which I think is the correct format for phenopy. I then used the command:

phenopy score gene_lists_with_HPO.txt --threads 12 --self

and I got output like this:

#query	entity_id	score
A4GALT	A4GALT	1.0
A4GALT	ABCD1	0.0
A4GALT	ACAT1	0.010405043493187662
A4GALT	ACVRL1	0.03336405048957507
A4GALT	ADGRG1	0.0
A4GALT	AGXT	0.009234121604447244
A4GALT	AKT1	0.003509945769583653
A4GALT	ALG1	0.0
A4GALT	AMER1	0.0

However, the self-similarity score for some genes is not 1, as I was expecting. For instance:

ABCB7 ABCB7 0.5558528984777618

Would you expect something like this? How would you explain it?
Should I use a different --summarization-method?
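To see how many genes are affected, the self-comparison rows can be filtered out of the score table. This is a minimal sketch that assumes the three-column output shown above (`#query`, `entity_id`, `score`); the function name and the tolerance are hypothetical choices, not part of phenopy.

```python
def self_scores_below_one(lines, tol=1e-9):
    """Return {entity: score} for rows where query == entity and score < 1."""
    hits = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the '#query ...' header
        parts = line.split()  # accepts tabs or spaces between columns
        if len(parts) != 3:
            continue  # skip rows that do not have exactly three columns
        query, entity, score = parts
        if query == entity and float(score) < 1.0 - tol:
            hits[query] = float(score)
    return hits


# Rows copied from this post, used here as sample input.
sample = [
    "#query\tentity_id\tscore",
    "A4GALT\tA4GALT\t1.0",
    "A4GALT\tABCD1\t0.0",
    "ABCB7 ABCB7 0.5558528984777618",
]
print(self_scores_below_one(sample))
```

On a real run, the same function can be fed the lines of the saved score file instead of the `sample` list.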

Best regards,

Luca

@arvkevi
Contributor

arvkevi commented Nov 3, 2020

Hi Luca,

Thank you for checking out the repo. It looks like you have successfully run phenopy on your input files, that's great! The behavior you describe is expected: it is a property of the HRSS semantic similarity scoring algorithm, which scales similarity scores to reward comparisons between nodes further down the ontology. The way the algorithm is implemented here, even a phenotype compared with itself only scores 1.0 under HRSS when beta_ic is 0.0, which is the case for leaf nodes. Does this explanation help?
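A minimal sketch of why this happens, using the general form of the published HRSS measure, `1 / (1 + gamma) * alpha / (alpha + beta)`. It is not phenopy's actual code, and phenopy may compute the individual terms differently (in particular gamma). The assumptions here are that alpha is the information content (IC) of the most informative common ancestor, beta is the average IC gap between each term and its most informative leaf, and gamma is the IC distance between the two terms. The function and argument names are hypothetical.

```python
def hrss(ic_u, ic_v, ic_mica, ic_mil_u, ic_mil_v):
    """HRSS-style similarity between terms u and v (illustrative sketch).

    ic_u, ic_v   : information content of each term
    ic_mica      : IC of their most informative common ancestor
    ic_mil_u/_v  : IC of the most informative leaf below each term
    """
    alpha = ic_mica
    if alpha == 0:
        return 0.0  # the terms share only the root, so nothing specific in common
    beta = ((ic_mil_u - ic_u) + (ic_mil_v - ic_v)) / 2.0
    gamma = (ic_u - ic_mica) + (ic_v - ic_mica)
    return (1.0 / (1.0 + gamma)) * (alpha / (alpha + beta))


# Self-comparison of an internal node: gamma is 0, but beta is positive
# because a more informative leaf exists below it, so the score is below 1.
internal = hrss(ic_u=2.0, ic_v=2.0, ic_mica=2.0, ic_mil_u=4.0, ic_mil_v=4.0)

# Self-comparison of a leaf node: the term is its own most informative
# leaf, so beta is 0 and the score is exactly 1.
leaf = hrss(ic_u=3.0, ic_v=3.0, ic_mica=3.0, ic_mil_u=3.0, ic_mil_v=3.0)

print(internal, leaf)
```

Under these assumptions a term compared with itself scores `IC(term) / IC(most informative leaf)`, which reaches 1.0 only when beta is 0.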

@viktorzou

So how would I set a network-cutoff value, then, if identical terms might not score 1.0? Also, is there any way to introduce my own scores if I have frequency values attached to phenotypes?
