treynr/ness-poc: Heterogeneous graph aggregation and analysis using large-scale biological datasets.

This is the proof-of-concept version of NESS. See https://github.com/treynr/ness for the production version.

Network Enhanced Similarity Search (NESS)


Advances in genomics have led to improved classifications of biological categories and concepts, such as disease, via the analysis of high-throughput studies and experimental datasets. The application of concepts from information theory (e.g., semantic similarity) is a well-established means of quantitatively assessing the similarity of concepts represented in structured biological vocabularies. However, the gene and variant annotations that serve as critical components of these evaluations are often sparse, potentially biased due to uneven representation in the literature, and constrained to the single species from which they were derived. These deficiencies limit the utility of information content approaches.

NESS is a dual data integration and analysis approach. It is designed to harmonize large-scale functional genomics data sources into a single cross-species heterogeneous network. By employing a random walk with restart (RWR) over this harmonized graph structure, NESS provides data-driven similarity measures of biological concepts across multiple domains and species. This flexible approach enables cross-species analyses that leverage model organism data when datasets for the species of interest are sparse. Applications include gene/variant-disease prioritization, disease classification, and semantic similarity analysis.
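To make the RWR idea concrete, the following is a minimal, generic sketch of random walk with restart by power iteration. It is not the nessw implementation. It assumes the network is held as an adjacency list (a dict mapping each node to its neighbours), that the restart distribution is uniform over the seed nodes, and that dangling nodes (no edges) send their mass back to the seeds; the function name `rwr` and the toy node labels are hypothetical.

```python
from typing import Dict, Hashable, Iterable, List

# Hypothetical representation: node -> list of neighbours
# (an undirected edge appears in both neighbour lists).
Graph = Dict[Hashable, List[Hashable]]


def rwr(graph: Graph, seeds: Iterable[Hashable], restart: float = 0.15,
        tol: float = 1e-10, max_iter: int = 1000) -> Dict[Hashable, float]:
    """Generic random walk with restart, solved by power iteration.

    At each step the walker follows a uniformly chosen edge with probability
    (1 - restart) or jumps back to a seed with probability `restart`.
    """
    seeds = list(seeds)
    nodes = list(graph)
    # Restart distribution: uniform over the seed nodes.
    restart_vec = {n: 0.0 for n in nodes}
    for s in seeds:
        restart_vec[s] += 1.0 / len(seeds)

    p = dict(restart_vec)
    for _ in range(max_iter):
        nxt = {n: restart * restart_vec[n] for n in nodes}
        for u in nodes:
            nbrs = graph[u]
            if nbrs:
                # Spread the walking mass evenly over u's neighbours.
                share = (1.0 - restart) * p[u] / len(nbrs)
                for v in nbrs:
                    nxt[v] += share
            else:
                # Dangling node (assumption): return its mass to the seeds.
                for n in nodes:
                    nxt[n] += (1.0 - restart) * p[u] * restart_vec[n]
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Toy heterogeneous network: one ontology term linked to three genes.
toy = {
    "GO:A": ["gene1", "gene2"],
    "gene1": ["GO:A", "gene2"],
    "gene2": ["GO:A", "gene1", "gene3"],
    "gene3": ["gene2"],
}
scores = rwr(toy, ["GO:A"], restart=0.15)
for node, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{node}\t{score:.4f}")
```

The scores always sum to 1. In this toy graph the seed GO:A receives the highest score and the peripheral gene3 the lowest, which is what lets the resulting vector act as a network-based similarity to the seed.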

Getting started

NESS is organized into two components: nessb, the heterogeneous network builder, and nessw, the RWR implementation.

nessb is designed to harmonize disparate data sources (e.g., ontologies, gene networks, differential expression studies) from multiple species into a single network structure. The network builder is relatively fast: building a network with ~90K nodes and 4.6M edges on a 56-core machine (Intel Xeon CPU E5-2695 v3 @ 2.30GHz) with 504GB of RAM took about 33 seconds:

$ time nessb -a go-annotations.tsv -e gene-networks.tsv -g gene-sets.tsv -o go-relations.tsv sample-net.al

real    0m33.178s
user    0m32.177s
sys     0m0.893s

See the build documentation for instructions on compiling and using nessb.
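nessb's internals are not described here, so the following is only a minimal Python sketch of the general idea of harmonizing several edge sources into one graph. The in-memory edge lists, the type-prefixed node labels, and the `build_network` function are all hypothetical; they do not reflect nessb's actual input file formats or the sample-net.al output format.

```python
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str]


def build_network(
    sources: Dict[str, Iterable[Edge]],
) -> Tuple[Dict[str, int], Dict[int, Set[int]]]:
    """Merge several edge lists into one undirected graph with integer IDs.

    Node labels are assumed to carry their own type prefix (e.g. "GO:" or
    "gene:"), so the same label appearing in two sources becomes one node.
    Returns an entity map (label -> integer ID) and an adjacency list.
    """
    entity_map: Dict[str, int] = {}
    adjacency: Dict[int, Set[int]] = {}

    def node_id(label: str) -> int:
        # Assign the next free integer ID the first time a label is seen.
        if label not in entity_map:
            entity_map[label] = len(entity_map)
            adjacency[entity_map[label]] = set()
        return entity_map[label]

    for _name, edges in sources.items():
        for u, v in edges:
            a, b = node_id(u), node_id(v)
            if a != b:  # ignore self-loops
                adjacency[a].add(b)  # sets also drop duplicate edges
                adjacency[b].add(a)
    return entity_map, adjacency


# Hypothetical inputs, loosely mirroring the nessb input files above.
sources = {
    "annotations": [("GO:0001", "gene:A"), ("GO:0001", "gene:B")],
    "gene_networks": [("gene:A", "gene:B"), ("gene:B", "gene:C")],
    "gene_sets": [("geneset:1", "gene:A"), ("geneset:1", "gene:C")],
    "ontology_relations": [("GO:0001", "GO:0002")],
}
entity_map, adjacency = build_network(sources)
n_edges = sum(len(nbrs) for nbrs in adjacency.values()) // 2
print(f"{len(entity_map)} nodes, {n_edges} edges")  # prints: 6 nodes, 7 edges
```

Giving every entity a single integer ID keeps the adjacency list compact, with a separate label-to-ID map for translating results back. The separate entity map file in the nessw example below hints at a similar split, but that is an inference, not documented behaviour.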

nessw applies an RWR algorithm over the harmonized data structure, and any biological concept(s) in the network can be used as seed nodes. The RWR implementation is also fairly fast: on the same 56-core machine, calculating the proximity vector of a single seed node over the ~90K-node, 4.6M-edge network with a restart probability of 0.15 took about 7 seconds:

$ time nessw -a -r 0.15 sample-net.al entity-map-sample-net.al single-seed.txt scores.tsv

real    0m6.681s
user    0m6.358s
sys     0m0.301s

Calculating the complete proximity matrix takes roughly 10 to 15 minutes when distributed across a cluster. See the walk documentation for instructions on compiling and using nessw.
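Why the complete matrix distributes so well can be seen from the standard RWR formulation; this is the textbook technique, offered as an assumption rather than a description of nessw's internals. Let A be the adjacency matrix, d_j the degree of node j, W the column-normalized transition matrix with W_{ij} = A_{ij}/d_j (assuming no isolated nodes), r the restart probability (0.15 in the example above), and e_s the indicator vector of seed s. The stationary proximity vector p_s then satisfies:

```latex
\begin{aligned}
\mathbf{p}_s &= r\,\mathbf{e}_s + (1-r)\,W\,\mathbf{p}_s
  \quad\Longrightarrow\quad
  \mathbf{p}_s = r\,\bigl(I - (1-r)\,W\bigr)^{-1}\mathbf{e}_s, \\
P &= \bigl[\,\mathbf{p}_1 \;\cdots\; \mathbf{p}_n\,\bigr]
   = r\,\bigl(I - (1-r)\,W\bigr)^{-1}, \\
\mathbf{p}_S &= \frac{1}{|S|}\sum_{s \in S} \mathbf{p}_s
   \quad\text{for a seed set } S \text{ with uniform restart.}
\end{aligned}
```

Because 0 < r and W is column-stochastic, I - (1-r)W is always invertible. Each column of P depends only on its own seed, so the n walks can be split into independent batches across cluster nodes; and because p_s is linear in the restart vector, a multi-seed proximity vector is simply the average of the corresponding single-seed columns.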

Funding

NESS is part of the GeneWeaver data repository and analysis platform. For a detailed description, see this article.

This work has been supported by joint funding from the NIAAA and NIDA, NIH [R01 AA18776]; and The Jackson Laboratory (JAX) Center for Precision Genetics of the NIH [U54 OD020351].
