Commit

Add run_goea quiet mode example. Add examples of custom GOEA print summaries.

#133 (comment)
dvklopfenstein committed Mar 13, 2020
1 parent f08732d commit afae190
Showing 1 changed file with 137 additions and 36 deletions.
173 changes: 137 additions & 36 deletions notebooks/goea_nbt3102.ipynb
@@ -100,7 +100,7 @@
 "name": "stdout",
 "output_type": "stream",
 "text": [
-"go-basic.obo: fmt(1.2) rel(2019-04-17) 47,398 GO Terms\n"
+"go-basic.obo: fmt(1.2) rel(2020-01-01) 47,337 GO Terms\n"
 ]
 }
 ],
@@ -126,11 +126,10 @@
 "name": "stdout",
 "output_type": "stream",
 "text": [
-"HMS:0:00:06.278703 364,039 annotations READ: gene2go \n",
-"1 taxids stored: 10090\n",
-"MF 16,802 annotated mouse genes\n",
-"CC 18,927 annotated mouse genes\n",
-"BP 17,737 annotated mouse genes\n"
+"HMS:0:00:07.991907 367,335 annotations, 24,267 genes, 18,190 GOs, 1 taxids READ: gene2go \n",
+"CC 18,824 annotated mouse genes\n",
+"MF 16,721 annotated mouse genes\n",
+"BP 17,859 annotated mouse genes\n"
 ]
 }
 ],
@@ -194,15 +193,15 @@
 "\n",
 "Load BP Gene Ontology Analysis ...\n",
 "fisher module not installed. Falling back on scipy.stats.fisher_exact\n",
-" 59% 16,747 of 28,212 population items found in association\n",
+" 60% 16,820 of 28,212 population items found in association\n",
 "\n",
 "Load CC Gene Ontology Analysis ...\n",
 "fisher module not installed. Falling back on scipy.stats.fisher_exact\n",
-" 65% 18,276 of 28,212 population items found in association\n",
+" 64% 18,171 of 28,212 population items found in association\n",
 "\n",
 "Load MF Gene Ontology Analysis ...\n",
 "fisher module not installed. Falling back on scipy.stats.fisher_exact\n",
-" 58% 16,418 of 28,212 population items found in association\n"
+" 58% 16,336 of 28,212 population items found in association\n"
 ]
 }
 ],
@@ -235,7 +234,7 @@
 "name": "stdout",
 "output_type": "stream",
 "text": [
-"400 genes READ: /mnt/c/Users/note2/Data/git/tmp/goatools/goatools/test_data/nbt_3102/nbt.3102-S4_GeneIDs.xlsx\n"
+"400 genes READ: /mnt/c/Users/note2/Data/git/goatools/goatools/test_data/nbt_3102/nbt.3102-S4_GeneIDs.xlsx\n"
 ]
 }
 ],
@@ -278,37 +277,34 @@
 "output_type": "stream",
 "text": [
 "\n",
-"Run BP Gene Ontology Analysis: current study set of 400 IDs ...\n",
-" 93% 357 of 382 study items found in association\n",
+"Run BP Gene Ontology Analysis: current study set of 400 IDs ... 94% 358 of 382 study items found in association\n",
 " 96% 382 of 400 study items found in population(28212)\n",
-"Calculating 12,189 uncorrected p-values using fisher_scipy_stats\n",
-" 12,189 GO terms are associated with 16,747 of 28,212 population items\n",
-" 2,068 GO terms are associated with 357 of 400 study items\n",
+"Calculating 12,253 uncorrected p-values using fisher_scipy_stats\n",
+" 12,253 GO terms are associated with 16,820 of 28,212 population items\n",
+" 2,086 GO terms are associated with 358 of 400 study items\n",
 " METHOD fdr_bh:\n",
-" 70 GO terms found significant (< 0.05=alpha) ( 68 enriched + 2 purified): statsmodels fdr_bh\n",
-" 230 study items associated with significant GO IDs (enriched)\n",
+" 74 GO terms found significant (< 0.05=alpha) ( 72 enriched + 2 purified): statsmodels fdr_bh\n",
+" 236 study items associated with significant GO IDs (enriched)\n",
 " 4 study items associated with significant GO IDs (purified)\n",
 "\n",
-"Run CC Gene Ontology Analysis: current study set of 400 IDs ...\n",
-" 98% 376 of 382 study items found in association\n",
+"Run CC Gene Ontology Analysis: current study set of 400 IDs ... 98% 376 of 382 study items found in association\n",
 " 96% 382 of 400 study items found in population(28212)\n",
 "Calculating 1,724 uncorrected p-values using fisher_scipy_stats\n",
-" 1,724 GO terms are associated with 18,276 of 28,212 population items\n",
-" 445 GO terms are associated with 376 of 400 study items\n",
+" 1,724 GO terms are associated with 18,171 of 28,212 population items\n",
+" 449 GO terms are associated with 376 of 400 study items\n",
 " METHOD fdr_bh:\n",
-" 92 GO terms found significant (< 0.05=alpha) ( 92 enriched + 0 purified): statsmodels fdr_bh\n",
+" 89 GO terms found significant (< 0.05=alpha) ( 89 enriched + 0 purified): statsmodels fdr_bh\n",
 " 373 study items associated with significant GO IDs (enriched)\n",
 " 0 study items associated with significant GO IDs (purified)\n",
 "\n",
-"Run MF Gene Ontology Analysis: current study set of 400 IDs ...\n",
-" 88% 338 of 382 study items found in association\n",
+"Run MF Gene Ontology Analysis: current study set of 400 IDs ... 89% 339 of 382 study items found in association\n",
 " 96% 382 of 400 study items found in population(28212)\n",
-"Calculating 4,128 uncorrected p-values using fisher_scipy_stats\n",
-" 4,128 GO terms are associated with 16,418 of 28,212 population items\n",
-" 581 GO terms are associated with 338 of 400 study items\n",
+"Calculating 4,146 uncorrected p-values using fisher_scipy_stats\n",
+" 4,146 GO terms are associated with 16,336 of 28,212 population items\n",
+" 580 GO terms are associated with 339 of 400 study items\n",
 " METHOD fdr_bh:\n",
-" 56 GO terms found significant (< 0.05=alpha) ( 54 enriched + 2 purified): statsmodels fdr_bh\n",
-" 273 study items associated with significant GO IDs (enriched)\n",
+" 55 GO terms found significant (< 0.05=alpha) ( 53 enriched + 2 purified): statsmodels fdr_bh\n",
+" 277 study items associated with significant GO IDs (enriched)\n",
 " 0 study items associated with significant GO IDs (purified)\n"
 ]
 }
@@ -324,13 +320,118 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"## 6. Write results to an Excel file and to a text file"
+"### 5a. Quietly Run Gene Ontology Enrichment Analysis (GOEA)\n",
+"GOEAs can be run quietly using `prt=None`:\n",
+"```\n",
+"goea_results = goeaobj.run_study(geneids_study, prt=None)\n",
+"```\n",
+"#### No output is printed if `prt=None`:"
 ]
 },
 {
 "cell_type": "code",
 "execution_count": 9,
 "metadata": {},
+"outputs": [],
+"source": [
+"goea_quiet_all = goeaobj.run_study(geneids_study, prt=None)\n",
+"goea_quiet_sig = [r for r in goea_quiet_all if r.p_fdr_bh < 0.05]"
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"#### Print customized results summaries\n",
+"##### Example 1: Significant v All GOEA results"
+]
+},
+{
+"cell_type": "code",
+"execution_count": 10,
+"metadata": {},
+"outputs": [
+{
+"name": "stdout",
+"output_type": "stream",
+"text": [
+"218 of 18,123 results were significant\n"
+]
+}
+],
+"source": [
+"print('{N} of {M:,} results were significant'.format(\n",
+" N=len(goea_quiet_sig),\n",
+" M=len(goea_quiet_all)))"
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"##### Example 2: Enriched v Purified GOEA results"
+]
+},
+{
+"cell_type": "code",
+"execution_count": 11,
+"metadata": {},
+"outputs": [
+{
+"name": "stdout",
+"output_type": "stream",
+"text": [
+"Significant results: 214 enriched, 4 purified\n"
+]
+}
+],
+"source": [
+"print('Significant results: {E} enriched, {P} purified'.format(\n",
+" E=sum(1 for r in goea_quiet_sig if r.enrichment=='e'),\n",
+" P=sum(1 for r in goea_quiet_sig if r.enrichment=='p')))"
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"##### Example 3: Significant GOEA results by namespace"
+]
+},
+{
+"cell_type": "code",
+"execution_count": 12,
+"metadata": {},
+"outputs": [
+{
+"name": "stdout",
+"output_type": "stream",
+"text": [
+"Significant results[218] = 74 BP + 55 MF + 89 CC\n"
+]
+}
+],
+"source": [
+"import collections as cx\n",
+"ctr = cx.Counter([r.NS for r in goea_quiet_sig])\n",
+"print('Significant results[{TOTAL}] = {BP} BP + {MF} MF + {CC} CC'.format(\n",
+" TOTAL=len(goea_quiet_sig),\n",
+" BP=ctr['BP'], # biological_process\n",
+" MF=ctr['MF'], # molecular_function\n",
+" CC=ctr['CC'])) # cellular_component"
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"## 6. Write results to an Excel file and to a text file"
+]
+},
+{
+"cell_type": "code",
+"execution_count": 13,
+"metadata": {},
 "outputs": [
 {
 "name": "stdout",
@@ -358,16 +459,16 @@
 },
 {
 "cell_type": "code",
-"execution_count": 10,
+"execution_count": 14,
 "metadata": {},
 "outputs": [
 {
 "name": "stdout",
 "output_type": "stream",
 "text": [
-" 70 usr 494 GOs WROTE: nbt3102_BP.png\n",
-" 92 usr 195 GOs WROTE: nbt3102_CC.png\n",
-" 56 usr 157 GOs WROTE: nbt3102_MF.png\n"
+" 74 usr 506 GOs WROTE: nbt3102_BP.png\n",
+" 89 usr 155 GOs WROTE: nbt3102_CC.png\n",
+" 55 usr 156 GOs WROTE: nbt3102_MF.png\n"
 ]
 }
 ],
@@ -402,7 +503,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 11,
+"execution_count": 15,
 "metadata": {},
 "outputs": [
 {
@@ -445,7 +546,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 12,
+"execution_count": 16,
 "metadata": {},
 "outputs": [
 {

0 comments on commit afae190
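
The three summary cells added in this commit could be folded into one helper. What follows is a minimal sketch, not code from the notebook: `Rec` is a hypothetical stand-in for a GOEA result record that models only the attributes the notebook cells read (`NS`, `enrichment`, `p_fdr_bh`), and `summarize` is an illustrative name. With goatools installed, the list returned by `goeaobj.run_study(geneids_study, prt=None)` would presumably be passed in place of the demo records.

```python
import collections as cx

# Hypothetical stand-in for a GOEA result record. Only the three attributes
# used by the notebook's summary cells are modeled here.
Rec = cx.namedtuple('Rec', 'NS enrichment p_fdr_bh')


def summarize(results, alpha=0.05):
    """Return the three summary lines for a list of GOEA-like records."""
    # Keep records whose FDR-corrected p-value is below alpha
    sig = [r for r in results if r.p_fdr_bh < alpha]
    by_dir = cx.Counter(r.enrichment for r in sig)  # 'e' enriched, 'p' purified
    by_ns = cx.Counter(r.NS for r in sig)           # 'BP', 'MF', 'CC'
    return [
        '{N} of {M:,} results were significant'.format(
            N=len(sig), M=len(results)),
        'Significant results: {E} enriched, {P} purified'.format(
            E=by_dir['e'], P=by_dir['p']),
        'Significant results[{TOTAL}] = {BP} BP + {MF} MF + {CC} CC'.format(
            TOTAL=len(sig), BP=by_ns['BP'], MF=by_ns['MF'], CC=by_ns['CC']),
    ]


# Made-up demo records (not data from the study)
demo = [
    Rec('BP', 'e', 0.001),
    Rec('BP', 'p', 0.020),
    Rec('MF', 'e', 0.300),  # not significant at alpha=0.05
    Rec('CC', 'e', 0.049),
]
for line in summarize(demo):
    print(line)
```

The counts recorded in the notebook outputs are consistent with each other: 74 BP + 55 MF + 89 CC = 218 significant results, and the per-namespace verbose output gives 72 + 89 + 53 = 214 enriched and 2 + 0 + 2 = 4 purified.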
