Commit

Results comparison v2 (#141)
* scikit

* kappa
gromdimon authored Jul 10, 2024
1 parent 8defd0e commit 8248fb5
Showing 9 changed files with 1,072 additions and 179 deletions.
3 changes: 3 additions & 0 deletions Pipfile
@@ -39,6 +39,9 @@ jupyterlab = "*"
pandas = "*"
pandas-stubs = "*"
matplotlib = "*"
scikit-learn = "*"
scipy = "*"
seaborn = "*"

[requires]
python_version = "3.12"
105 changes: 95 additions & 10 deletions Pipfile.lock

16 changes: 16 additions & 0 deletions src/bench/cohens_kappa_results.csv
@@ -0,0 +1,16 @@
Criteria,AutoACMG Kappa,Intervar Kappa,Genebe Kappa
PVS1,0.9059419881723458,0.23490212411495204,0.9529709940861729
PS1,0.34816549570647926,0.0,0.7080419580419581
PM1,-0.005815160955348064,0.32813678652250433,0.6100233518950962
PM2,-0.3840759368137707,0.5030122668215142,0.5405444540978859
PM4,1.0,0.0,0.660569105691057
PM5,0.6311549954527738,0.10098385000928167,0.6754315765405281
PP2,0.0,-0.06556655665566558,0.8062817881681092
PP3,0.28326180257510725,0.5914587264975187,0.7549882629107981
BA1,0.0,0.10098385000928167,0.7562636828022379
BS1,-0.011178388448998433,-0.06099110546378639,0.5831253120319521
BS2,-0.011588095865156633,0.10199479618386809,0.49288796547866387
BP1,0.0,0.34232050893668586,0.9267543859649123
BP3,0.7970838396111786,0.0,0.0
BP4,-0.20113023375288974,0.6171480972031178,0.9327245870820464
BP7,0.7277469840234757,0.7874158937988726,0.9129150008691118
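The table above reports one Cohen's kappa per ACMG criterion for each tool. The diff does not show how the scores were produced; the sketch below assumes each criterion is scored as present/absent per variant and compared against the expected (ClinGen) criteria. Kappa is (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance from each rater's marginal label frequencies: 1 means perfect agreement, 0 chance-level agreement, and negative values (as in several AutoACMG rows) agreement below chance. For unweighted kappa this should agree with scikit-learn's `sklearn.metrics.cohen_kappa_score` (scikit-learn is added to the Pipfile in this commit); the pure-Python version only makes the arithmetic explicit. `presence_vectors` is a hypothetical helper, not a function from the repository.

```python
import math
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equal-length label sequences."""
    if not rater_a or len(rater_a) != len(rater_b):
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement p_o: share of items given the same label by both raters.
    p_o = sum(x == y for x, y in zip(rater_a, rater_b)) / n
    # Chance agreement p_e from each rater's marginal label counts.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical label for every item: kappa is 0/0.
        return math.nan
    return (p_o - p_e) / (1.0 - p_e)


def presence_vectors(expected, predicted, criterion):
    """Hypothetical helper: 0/1 indicators of `criterion` for shared variants."""
    shared = sorted(expected.keys() & predicted.keys())
    a = [int(criterion in expected[v]) for v in shared]
    b = [int(criterion in predicted[v]) for v in shared]
    return a, b
```

For example, if the expected labels mark PVS1 on two of four variants and a tool marks it on only one of them, the vectors are `[1, 1, 0, 0]` and `[1, 0, 0, 0]`, giving p_o = 0.75, p_e = 0.5 and a kappa of 0.5.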
6 changes: 4 additions & 2 deletions src/bench/comparison_criteria_custom.csv
@@ -176,5 +176,7 @@ NM_001754.4(RUNX1):c.952T>G,BA1;BP4,"This missense variant is present in gnomAD
NM_001754.4(RUNX1):c.183G>A,BA1;BP2;BP4,"This synonymous variant is present in gnomAD (v2 and v3) at an allele frequency >0.15% with at least >5 alleles in any general continental population (BA1); in addition, the variant was found in homozygosity in the population database (BP2). Although evolutionary conservation prediction algorithms predict the site as being moderately conserved (PhyloP score: 3.03 > 0.1 [-14.1;6.4]) and the variant is not the reference nucleotide in one primate and/or three mammal species, it is predicted by SSF and MES to lead to either an increase in the canonical splice site score or a decrease of the canonical splice site score by no more than 10% and no putative cryptic splice sites are created (BP4). In summary, the clinical significance of this variant is benign. ACMG/AMP criteria applied, as specified by the ClinGen Myeloid Malignancy Variant Curation Expert Panel for RUNX1: BA1, BP2, and BP4."
NM_001754.4(RUNX1):c.144C>T,BA1;BP2;BP4;BP7,"This synonymous variant is present in gnomAD (v2 and v3) at an allele frequency >0.15% with at least 5 alleles in any general continental population (BA1); in addition, the variant was found in homozygosity in the population database (BP2). The variant is predicted by SSF and MES to lead to either an increase in the canonical splice site score or a decrease of the canonical splice site score by no more than 10% and no putative cryptic splice sites are created, and evolutionary conservation prediction algorithms predict the site as being not highly conserved (PhyloP score: 1.01 [-14.1;6.4]) (BP4; BP7). In summary, the clinical significance of this variant is benign. ACMG/AMP criteria applied, as specified by the ClinGen Myeloid Malignancy Variant Curation Expert Panel for RUNX1: BA1, BP2, BP4 and BP7."
NM_000212.2(ITGB3):c.342T>C,BA1;BP4;BP7,"The NM_000212.2:c.342T>C variant, which leads to a synonymous change, Ile114Ile, is reported at a high frequency in the African population in gnomAD and ExAC (0.05). In-silico splicing predictors do not predict splicing impact. PMID: 27469266 reports on this and other polymorphic, non-causal variants found in linkage disequilibrium with deleterious mutations in GT patients. Ile114Ile is classified as a benign variant. GT-specific criteria applied: BA1, BP4, and BP7."


#BP3,,
NM_005249.4(FOXG1):c.209_232del24,BA1;BS2;BP3;BP5,"The allele frequency of the p.Q70_P77del variant in FOXG1 is 0.03% in gnomAD, which is high enough to be classified as benign based on thresholds defined by the ClinGen Rett/Angelman-like Expert Panel for Rett/AS-like conditions (BA1). The p.Q70_P77del variant is observed in at least 2 unaffected individuals (internal database) (BS2). The p.Q70_P77del variant is an in-frame deletion present in a repetitive region of FOXG1 (BP3). The p.Q70_P77del variant is found in at least 3 patients with an alternate molecular basis of disease (internal database) (BP5_strong). In summary, the p.Q70_P77del variant in FOXG1 is classified as benign based on the ACMG/AMP criteria (BA1, BS2, BP3, BP5_strong)."
NM_005249.5(FOXG1):c.209_235del,BA1;BS2;BP3,"The allele frequency of the c.209_235del variant in FOXG1 is 0.17% in Ashkenazi Jewish sub population in gnomAD, which is high enough to be classified as benign based on thresholds defined by the ClinGen Rett/Angelman-like Expert Panel for Rett/AS-like conditions BA1). The p.Gln70_Pro78del variant is observed in at least 2 unaffected individuals (GeneDx internal database) (BS2). The p.Gln70_Pro78del variant is an in-frame deletion present in a repetitive region of FOXG1 (BP3). In summary, the p.Gln70_Pro78del variant in FOXG1 is classified as benign based on the ACMG/AMP criteria (BA1, BS2, BP3)."
NM_005249.5(FOXG1):c.237_239del,BA1;BP3,"The highest population minor allele frequency of the c.237_239del (p.Pro80del) variant in FOXG1 in gnomAD v4.1 is 0.00054 in the East Asian population, which is higher than the ClinGen Rett and Angelman-like Disorders VCEP threshold (≥0.0003) for BA1, and therefore meets this criterion (BA1). The p.Pro80del variant is an in-frame deletion present in a repetitive region of FOXG1 (BP3). In summary, the p.Pro80del variant in FOXG1 is classified as a benign variant based on the ACMG/AMP criteria (BA1, BP3)."
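Each row above holds a variant in HGVS notation, a semicolon-separated list of expected ACMG criteria, and a quoted free-text curation summary. Because the summaries contain commas, splitting on `,` would break the rows; a real CSV parser is needed. A minimal sketch using the standard `csv` module follows. It assumes the three-column layout visible in the hunk, treats blank rows and rows starting with `#` (such as `#BP3,,`) as section markers, and is not the repository's own loader. The file's header row is not visible in this hunk; if one exists, skip it first (for example with `next(reader)`). Note that strength modifiers such as `BP5_strong` appear only in the summary text, while the criteria column uses the plain code `BP5`.

```python
import csv
import io


def read_expected_criteria(text):
    """Map each variant to its set of expected ACMG criteria.

    Assumes rows of the form: variant,criteria,comment, with criteria
    separated by ';'. Blank rows and rows whose first field starts with
    '#' (e.g. '#BP3,,') are skipped as section markers.
    """
    expected = {}
    for row in csv.reader(io.StringIO(text)):
        if len(row) < 2 or row[0].startswith("#"):
            continue
        variant, criteria = row[0], row[1]
        expected[variant] = {c for c in criteria.split(";") if c}
    return expected
```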
6 changes: 4 additions & 2 deletions src/bench/comparison_v3.py
@@ -100,9 +100,10 @@ def intervar_response(variant: str):

url = (
f"http://wintervar.wglab.org/api_new.php?"
f"queryType=position&chr={chromosome}&pos={position}"
f"queryType={position}&chr={chromosome}&pos={position}"
f"&ref={reference}&alt={alternative}&build=hg19"
)
print("Requesting:", url)
backend_resp = requests.get(url)
backend_resp.raise_for_status()
return backend_resp.json()
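The hunk above shows two versions of the `queryType` line: one passes the literal `position`, the other interpolates the numeric coordinate. A minimal sketch of building the same InterVar query with `urllib.parse.urlencode`, which also escapes the values, follows. It assumes the endpoint expects the literal value `position`; the function name `intervar_url` is hypothetical.

```python
from urllib.parse import urlencode

# Base endpoint as it appears in comparison_v3.py.
INTERVAR_API = "http://wintervar.wglab.org/api_new.php"


def intervar_url(chromosome, position, reference, alternative, build="hg19"):
    """Build an InterVar position query string from the variant fields."""
    params = {
        "queryType": "position",  # assumed: literal query type, not the coordinate
        "chr": chromosome,
        "pos": position,
        "ref": reference,
        "alt": alternative,
        "build": build,
    }
    # urlencode keeps dict insertion order and percent-escapes each value.
    return f"{INTERVAR_API}?{urlencode(params)}"
```

Alternatively, passing the same dict as `requests.get(INTERVAR_API, params=params)` lets `requests` assemble the query string.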
@@ -132,6 +133,7 @@ def genebe_response(variant: str):
f"chr={chromosome}&pos={position}"
f"&ref={reference}&alt={alternative}&genome=hg38"
)
print("Requesting:", url)
backend_resp = requests.get(url)
backend_resp.raise_for_status()
return backend_resp.json()
@@ -237,7 +239,7 @@ def eval_genebe(resp, expected):
]
)

for i, var in enumerate(variants):
for i, var in enumerate(variants[-3:]):
# Save the stats every 10 variants
if i % 50 == 0:
print(f"Processed {i} variants")
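In the hunk above, the comment says stats are saved every 10 variants while the check uses `i % 50`. A minimal sketch of a periodic checkpoint loop that keeps the interval in one named parameter, so the comment and the check cannot drift apart, follows. The function name, the `evaluate` and `save_stats` callables, and the 50-variant default are hypothetical stand-ins, not the script's actual code.

```python
SAVE_EVERY = 50  # hypothetical interval; one constant for both comment and check


def run_with_checkpoints(variants, evaluate, save_stats, every=SAVE_EVERY):
    """Evaluate each variant, saving accumulated stats every `every` variants.

    `evaluate` and `save_stats` are hypothetical callables standing in for
    the per-tool evaluation and the stats writer.
    """
    stats = []
    for i, var in enumerate(variants):
        # Skip i == 0 so the first checkpoint fires after `every` variants.
        if i and i % every == 0:
            print(f"Processed {i} variants")
            save_stats(stats)
        stats.append(evaluate(var))
    save_stats(stats)  # final save so results after the last checkpoint are kept
    return stats
```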
5 changes: 5 additions & 0 deletions src/bench/correlation_matrix.csv
@@ -0,0 +1,5 @@
,AutoACMG,Intervar,Genebe,ClinGen
AutoACMG,1.0,0.008763199951854865,0.28831332823829237,0.11973512248675762
Intervar,0.008763199951854816,1.0,0.6729186674369368,0.7965249163017614
Genebe,0.28831332823829237,0.6729186674369367,1.0,0.8001863441995085
ClinGen,0.11973512248675768,0.7965249163017614,0.8001863441995085,1.0