Gap2F

Background

Genome-scale metabolic networks integrate, following a bottom-up approach, genomic and biochemical information about an organism. These networks make it possible to predict and analyze properties that arise from interactions between the components of a metabolic system. They are initially generated automatically, but then require a comprehensive curation process that includes, among other things, refining the connectivity between the metabolites and the metabolic pathways of the network. Part of this process is a step known as gap find and gap fill, which is usually done manually or with tools that require commercial licenses. The first part, GapFind, consists of identifying the set of disconnected reactions, evidenced by metabolites that are consumed in one reaction but not produced by any other reaction in the system. GapFill is the process of adding reactions to resolve the gaps identified. Neither process transfers directly from one organism to another, since different reactions occur in different organisms, often with different biochemical directionalities.

In recent years the number of plant metabolic networks has been increasing, owing to their importance to agricultural science. However, the curation process is long and tedious, so open source tools that allow efficient curation of plant metabolic models are needed. In particular, we want to develop an R package that performs gap find and gap fill automatically and quickly for plant species, a necessary step toward better prediction of the behaviour of plant metabolic networks. The package will be flexible enough to let the user select the specific species to work with.
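To make the GapFind definition concrete, the following is a minimal sketch in base R, not the package's implementation. It assumes the network is available as a stoichiometric matrix (rows are metabolites, columns are reactions, negative coefficients mean consumption) plus a logical vector marking reversible reactions. The function name `find_dead_ends` and the toy network are hypothetical. As a simplification, any metabolite taking part in a reversible reaction is treated as both producible and consumable.

```r
# Toy stoichiometric matrix (hypothetical): rows = metabolites, columns = reactions.
# Negative coefficient = consumed, positive coefficient = produced.
S <- matrix(
  c(-1,  0,  0,   # A: consumed by R1, never produced
     1, -1,  0,   # B: produced by R1, consumed by R2
     0,  1, -1,   # C: produced by R2, consumed by R3
     0,  0,  1),  # D: produced by R3, never consumed
  nrow = 4, byrow = TRUE,
  dimnames = list(c("A", "B", "C", "D"), c("R1", "R2", "R3"))
)
reversible <- c(R1 = FALSE, R2 = FALSE, R3 = FALSE)

# Hypothetical helper: report metabolites that are only consumed or only produced.
find_dead_ends <- function(S, reversible) {
  # Simplification: a reversible reaction can run both ways, so each of its
  # participants counts as both produced and consumed.
  via_reversible <- rowSums(S[, reversible, drop = FALSE] != 0) > 0
  produced <- rowSums(S > 0) > 0 | via_reversible
  consumed <- rowSums(S < 0) > 0 | via_reversible
  list(
    not_produced = rownames(S)[consumed & !produced],
    not_consumed = rownames(S)[produced & !consumed]
  )
}

gaps <- find_dead_ends(S, reversible)
print(gaps)
```

In this toy network `A` is reported as consumed but never produced, which is the kind of gap that GapFill would then try to resolve by adding a producing reaction.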

Related work

Several R packages exist for the construction and analysis of metabolic networks, such as Sybil, NetPathMiner, BiGGR and RbioRXN, among others. However, these packages do not support gap find and gap fill on metabolic models, so this is a missing and highly desired feature.

Details of your coding project

The general idea is to develop an R package that performs the following functions:

Identify the dead-end metabolites in the network.
Parse plant metabolic and biochemical databases to generate the in-house databases that will be used for the actual gap filling.
Find the dead-end metabolites in those databases, taking into account the synonyms of each metabolite.
Return a data frame with the biochemical reactions in which the dead-end metabolites are involved, together with information associated with each reaction, such as gene, EC number, metabolic pathway and plant species. This data frame proposes candidate reactions to be introduced into the network to fill the gaps. Since this is a knowledge-driven decision, the researcher can either let the system introduce the candidate reactions automatically or manually select the most suitable reaction(s).
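The lookup and reporting steps listed above could look roughly like the sketch below. Everything in it is an assumption made for illustration: the layout of the in-house reference table (one row per metabolite per reaction), the synonym table, the function name `propose_candidates`, and all values, which are placeholders rather than real database entries.

```r
# Hypothetical in-house reference table; all values are placeholders.
reference <- data.frame(
  reaction   = c("RXN1", "RXN1", "RXN2", "RXN2", "RXN3", "RXN3"),
  metabolite = c("glucose", "D-glucose 6-phosphate",
                 "glucose 6-phosphate", "fructose 6-phosphate",
                 "glucose 1-phosphate", "G6P"),
  role       = c("substrate", "product", "substrate", "product",
                 "substrate", "product"),
  ec_number  = c("EC-x", "EC-x", "EC-y", "EC-y", "EC-z", "EC-z"),
  pathway    = c("pathway 1", "pathway 1", "pathway 1", "pathway 1",
                 "pathway 2", "pathway 2"),
  species    = c("species A", "species A", "species A", "species A",
                 "species B", "species B"),
  stringsAsFactors = FALSE
)

# Hypothetical synonym table mapping alternative names to one canonical name.
synonyms <- data.frame(
  synonym   = c("G6P", "D-glucose 6-phosphate"),
  canonical = c("glucose 6-phosphate", "glucose 6-phosphate"),
  stringsAsFactors = FALSE
)

propose_candidates <- function(dead_ends, reference, synonyms, species = NULL) {
  # Map every name to its canonical form (case-insensitive); unknown names
  # are kept as they are, lower-cased.
  normalise <- function(x) {
    key <- tolower(trimws(x))
    hit <- match(key, tolower(synonyms$synonym))
    ifelse(is.na(hit), key, tolower(synonyms$canonical[hit]))
  }
  ref <- reference
  ref$canonical <- normalise(ref$metabolite)
  # A metabolite that is never produced needs a reaction listing it as a product.
  hits <- ref[ref$canonical %in% normalise(dead_ends) & ref$role == "product", ]
  if (!is.null(species)) {
    hits <- hits[hits$species %in% species, ]
  }
  hits[, c("canonical", "reaction", "ec_number", "pathway", "species")]
}

candidates <- propose_candidates("G6P", reference, synonyms)
candidates_a <- propose_candidates("G6P", reference, synonyms, species = "species A")
print(candidates)
```

The returned data frame is only a proposal. Whether its rows are added to the model automatically or reviewed one by one is left to the researcher, as described above.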

The package should also have a function to include exchange and transport reactions of metabolites that were not solved by gap filling.
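One possible shape for such a function is sketched below, again under assumptions: the model is represented as a stoichiometric matrix with named rows and columns, and an exchange reaction is written as a single column with coefficient -1 for the metabolite (a common convention in constraint-based modelling, not something this proposal specifies). The name `add_exchange_reactions` and the `EX_` prefix are illustrative.

```r
# Toy model (hypothetical): metabolite A is consumed by R1 but never produced.
S <- matrix(
  c(-1,
     1),
  nrow = 2,
  dimnames = list(c("A", "B"), "R1")
)

# Hypothetical helper: append one exchange column per unresolved metabolite.
add_exchange_reactions <- function(S, metabolites) {
  unknown <- setdiff(metabolites, rownames(S))
  if (length(unknown) > 0) {
    stop("Metabolites not in the model: ", paste(unknown, collapse = ", "))
  }
  ex <- matrix(
    0, nrow = nrow(S), ncol = length(metabolites),
    dimnames = list(rownames(S), paste0("EX_", metabolites))
  )
  # Assumed convention: the exchange reaction "metabolite <=> " has coefficient -1.
  ex[cbind(match(metabolites, rownames(S)), seq_along(metabolites))] <- -1
  cbind(S, ex)
}

S_ext <- add_exchange_reactions(S, "A")
print(S_ext)
```

Transport reactions between compartments would need one row per metabolite per compartment and are not covered by this sketch.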

In future package updates we expect to include databases for other taxonomic groups.

Expected impact

The scientific community has made efforts to integrate multiple functional genomic characterizations and biochemical knowledge within genome-scale metabolic networks. Plant metabolic models have great potential in agricultural sciences, especially to address the food security challenges imposed by climate change. An R package that efficiently curates metabolic network connectivity will be useful to the scientific community, primarily to research groups that do not have the resources to purchase commercial licenses. In addition, it will speed up the curation process, which can take a long time depending on the complexity of the reconstruction.

Mentors

Professor Andres Pinzon Velasco, Ph.D., Bioinformatics and Systems Biology Laboratory, Institute for Genetics, National University of Colombia.

Professor Silvia Restrepo, Ph.D., Mycology and Phytopathology Laboratory, Biological Sciences Department, Los Andes University.

Tests

Easy: Install sybil, sybilSBML and RbioRXN from CRAN.

Medium: Load the glycolysis file and identify which reactions require H2O. https://gist.github.com/ampinzonv/c763c7a9d147aecec721

Hard: Load the glycolysis file and identify which reactions yield H2O as a product.

Solution of tests

Rudolfs Petrovs https://github.com/rp08026/Gap2F_tests/edit/master/README.md

Sazia Mahfuz https://github.com/SaziaM/Gap2F_tests/tree/SaziaM-patch-1
