Merge branch 'release'

zhanglab · Jun 30, 2017 · dc42784
2 parents dc670ff + 053e3b4

Showing 14 changed files with 1,740 additions and 28 deletions.
diff --git a/NEWS.md b/NEWS.md
@@ -1,3 +1,13 @@
+v0.31 (2017-06-30)
+------------------
+
+- The `psamm-import` tool has been moved from the `psamm-import` package to
+  the main PSAMM package, so the `psamm-import` package is no longer needed
+  to import SBML files. It is still needed for the model-specific Excel
+  importers, and with this release of PSAMM it should be updated to at least
+  0.16.
+- The tutorial was updated with additional sections on using gap-filling
+  procedures on models.
 
 v0.30 (2017-06-23)
 ------------------

diff --git a/README.rst b/README.rst
@@ -38,15 +38,7 @@ Use ``pip`` to install (it is recommended to use a Virtualenv_):
 
     $ pip install psamm
 
-The ``psamm-import`` tool is developed in `a separate repository`_. After
-installing PSAMM the ``psamm-import`` tool can be installed using:
-
-.. code:: shell
-
-    $ pip install git+https://github.com/zhanglab/psamm-import.git
-
 .. _Virtualenv: https://virtualenv.pypa.io/
-.. _a separate repository: https://github.com/zhanglab/psamm-import
 
 Documentation
 -------------

diff --git a/docs/install.rst b/docs/install.rst
@@ -25,8 +25,11 @@ the environment by running
 
     $ source env/bin/activate
 
-The *psamm-import* tool is developed in a separate Git repository. After
-installing PSAMM, the *psamm-import* tool can be installed using:
+The *psamm-import* tool is included in the main PSAMM repository. Some
+additional model-specific importers for Excel format models associated
+with publications are maintained in a separate repository. After
+installing PSAMM, support for these importers can be added by
+installing this additional package:
 
 .. code-block:: shell
 

diff --git a/docs/tutorial/curation.rst b/docs/tutorial/curation.rst
@@ -473,11 +473,126 @@ a network based optimization to identify metabolites with no production pathways
     (psamm-env) $ psamm-model gapcheck --method gapfind --unrestricted-exchange
 
 These methods included in the ``gapcheck`` function can be used to identify various kinds of
-'gaps' in a metabolic model network. `PSAMM` also includes two functions for filling these gaps
+'gaps' in a metabolic model network. `PSAMM` also includes three functions for filling these gaps
 through the addition of artificial reactions or reactions from a supplied database. The
-functions ``gapfill`` and ``fastgapfill`` can be used to perform these gapfilling procedures
+functions ``gapfill``, ``fastgapfill``, and ``completepath`` can be used to perform these gapfilling procedures
 during the process of generating and curating a model.
 
+GapFill
+~~~~~~~
+
+The ``gapfill`` function in PSAMM applies a GapFill algorithm based on [Kumar07]_ to a metabolic model to identify
+reactions that can be added to unblock the production of a specific compound or set of compounds. To demonstrate
+the ``gapfill`` function, a version of the E. coli core model is provided in the ``tutorial-part-2/Gapfilling_Model/``
+directory. This version includes a small, incomplete pathway that contains the following reactions:
+
+.. code-block:: yaml
+
+    - id: rxn1
+      equation: succ_c[c] => a[c]
+    - id: rxn3
+      equation: b[c] => c[c] + d[c]
+
+This small additional pathway converts succinate to an artificial compound 'a', and its other reaction converts
+compound 'b' to 'c' and 'd'. However, no reaction converts 'a' to 'b', so this missing step is a metabolic gap.
+A separate reaction database, which is not part of the model itself, contains one additional reaction:
+
+.. code-block:: yaml
+
+    - id: rxn2
+      equation: a[c] => b[c]
+
+If added, this reaction would unblock the production of 'c' and 'd' by allowing compound 'a' to be converted to 'b'.
+In practice, gap-filling is usually performed with a much larger database of non-model reactions. In this test case,
+the production of compound 'd[c]' can be unblocked by running the following command:
+
+.. code-block:: shell
+
+    (psamm-env) $ psamm-model gapfill --compound d[c]
+
+The output first lists all of the reactions in the original metabolic model, then the added gap-filling reactions
+with their associated penalty values, and finally any reactions for which the gap-filling result suggests changing
+the flux bounds. A sample of the output is shown below::
+
+    ....
+    TPI	Model	0	Dihydroxyacetone-phosphate[c] <=> Glyceraldehyde-3-phosphate[c]
+    rxn1	Model	0	Succinate[c] => a[c]
+    rxn3	Model	0	b[c] => c[c] + d[c]
+    rxn2	Add	1	a[c] => b[c]
+
+Additional options can be used to refine the gap-filling. The first is the ``--no-implicit-sinks`` option. When
+this option is used, gap-filling is performed without implicit sinks for compounds, meaning that every compound
+that is produced must be consumed by other reactions in the metabolic model. By default, when this option is not
+given, implicit sinks are added for all compounds in the model, so any compound that is produced in excess can be
+removed through the added sinks.
+
+Gap-filling can also be refined by defining penalty values for adding reactions from different
+sources. Penalties for specific reactions in a gap-filling database can be set in a tab-separated
+file provided with the ``--penalty`` option. A penalty value for all database reactions can be set
+using the ``--db-penalty`` option followed by the value. Similarly, penalty values can be assigned
+to added transport reactions using the ``--tp-penalty`` option and to added exchange reactions
+using the ``--ex-penalty`` option. For example, the following command applies these penalties to a
+gap-filling simulation:
+
+.. code-block:: shell
+
+    (psamm-env) $ psamm-model gapfill --compound d[c] --ex-penalty 100 --tp-penalty 10 --db-penalty 1
+
+The ``gapfill`` function in PSAMM can be used throughout the model construction process to help identify potential
+new reactions to add to a model and to explore how metabolic gaps affect the capabilities of a metabolic
+network.
+
+FastGapFill
+~~~~~~~~~~~
+
+The ``fastgapfill`` function in `PSAMM` is a different gap-filling method that uses the FastGapFill algorithm
+to attempt to generate a gap-filled model that is entirely flux consistent [Thiele14]_. The implementation
+of this algorithm in `PSAMM` can be used to unblock an entire metabolic model or specific reactions in a
+network. Unblocking all of the reactions in a model at the same time often does not produce the most meaningful
+or easily interpretable results, so it is usually preferable to run this function on a subset of reactions.
+To do this, use the ``--subset`` option to provide a file that contains a list of reactions to unblock. In this
+example, that list would look like this:
+
+.. code-block:: text
+
+    rxn1
+    rxn3
+
+
+This file can be provided to the command to unblock the small artificial pathway that was added to the E. coli core
+model:
+
+
+.. code-block:: shell
+
+    (psamm-env) $ psamm-model fastgapfill --subset subset.tsv
+
+In this case the output from this command will show the following::
+
+    ....
+    TPI	Model	0	Dihydroxyacetone-phosphate[c] <=> Glyceraldehyde-3-phosphate[c]
+    rxn1	Model	0	Succinate[c] => a[c]
+    rxn3	Model	0	b[c] => c[c] + d[c]
+    EX_c[e]	Add	1	c[e] <=>
+    EX_d[e]	Add	1	d[e] <=>
+    EX_succ_c[e]	Add	1	Succinate[e] <=>
+    TP_c[c]_c[e]	Add	1	c[c] <=> c[e]
+    TP_d[c]_d[e]	Add	1	d[c] <=> d[e]
+    TP_succ_c[c]_succ_c[e]	Add	1	Succinate[c] <=> Succinate[e]
+    rxn2	Add	1	a[c] => b[c]
+
+The output first lists the model reactions, which are labeled with the ``Model`` tag in the second column
+of the output. `PSAMM` then lists any artificial exchange and transport reactions, as well as any gap-filling
+reactions included from the larger database; these are labeled with the ``Add`` tag in the second column.
+Compared to the ``gapfill`` results from the previous section, the ``fastgapfill`` result also suggests
+artificial transporters and exchange reactions for some compounds. This is because this method tries to find
+a flux consistent gap-filling solution.
+
+Penalty values can be assigned to different types of reactions in the same way as in the ``gapfill``
+command: ``--ex-penalty`` for artificial exchange reactions, ``--tp-penalty`` for artificial transporters,
+``--db-penalty`` for new database reactions, and a penalty file provided with the ``--penalty`` option
+for specific reactions.
+
 Search Functions in PSAMM
 -------------------------
 
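The ``gapfill`` walkthrough in ``docs/tutorial/curation.rst`` can be illustrated with a toy, self-contained Python sketch. It does not use PSAMM's API and is not the MILP formulation from [Kumar07] that ``psamm-model gapfill`` implements. Instead it swaps in simple network expansion (a reaction fires once all of its substrates are available) and a brute-force search for the cheapest set of database reactions that makes a target compound reachable. Stoichiometry, reversibility, and implicit sinks are ignored; the reaction dictionaries, the ``toy_gapfill`` helper, and the default penalty of 1 are hypothetical.

```python
from itertools import combinations

# Toy reactions: ID -> (substrates, products). Treated as irreversible and
# stoichiometry-free; these dictionaries are hypothetical, not PSAMM objects.
MODEL = {
    "rxn1": (frozenset({"succ_c[c]"}), frozenset({"a[c]"})),
    "rxn3": (frozenset({"b[c]"}), frozenset({"c[c]", "d[c]"})),
}
DATABASE = {"rxn2": (frozenset({"a[c]"}), frozenset({"b[c]"}))}


def reachable(reactions, seeds):
    """Return every compound producible from `seeds` by network expansion."""
    available = set(seeds)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions.values():
            # Fire a reaction once all of its substrates are available.
            if substrates <= available and not products <= available:
                available |= products
                changed = True
    return available


def toy_gapfill(model, database, seeds, target, penalties=None):
    """Return the lowest-penalty tuple of database reaction IDs that makes
    `target` reachable, or None. Brute force: only viable for tiny databases."""
    penalties = penalties or {}
    best = None  # (total_penalty, reaction_ids)
    ids = sorted(database)
    for size in range(len(ids) + 1):
        for subset in combinations(ids, size):
            # Assumption: reactions without an explicit penalty cost 1.
            cost = sum(penalties.get(r, 1) for r in subset)
            if best is not None and cost >= best[0]:
                continue
            combined = dict(model)
            combined.update((r, database[r]) for r in subset)
            if target in reachable(combined, seeds):
                best = (cost, subset)
    return None if best is None else best[1]


seeds = {"succ_c[c]"}
print("d[c]" in reachable(MODEL, seeds))            # False: blocked by the gap
print(toy_gapfill(MODEL, DATABASE, seeds, "d[c]"))  # ('rxn2',)
```

Seeding only succinate stands in for the rest of the E. coli core network that would produce it; the exhaustive subset search is what a real MILP solver avoids on databases of realistic size.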

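The sample ``gapfill`` and ``fastgapfill`` outputs in the tutorial show one reaction per line with four columns: reaction ID, a ``Model``/``Add`` tag, a penalty, and the equation. A minimal Python sketch for extracting the suggested additions, assuming the columns are tab-separated as they appear in the samples; ``OUTPUT`` is an excerpt copied from the ``fastgapfill`` sample, and ``added_reactions`` is a hypothetical helper, not part of PSAMM. It keeps only rows tagged ``Add``.

```python
import csv
import io

# Excerpt of the `psamm-model fastgapfill` sample output from the tutorial;
# assumed to be tab-separated: reaction ID, tag, penalty, equation.
OUTPUT = (
    "rxn1\tModel\t0\tSuccinate[c] => a[c]\n"
    "rxn3\tModel\t0\tb[c] => c[c] + d[c]\n"
    "EX_c[e]\tAdd\t1\tc[e] <=>\n"
    "TP_c[c]_c[e]\tAdd\t1\tc[c] <=> c[e]\n"
    "rxn2\tAdd\t1\ta[c] => b[c]\n"
)


def added_reactions(text):
    """Return (reaction_id, penalty) for each row tagged 'Add'."""
    rows = csv.reader(io.StringIO(text), delimiter="\t")
    return [(row[0], float(row[2]))
            for row in rows if len(row) >= 4 and row[1] == "Add"]


added = added_reactions(OUTPUT)
for reaction_id, penalty in added:
    print(reaction_id, penalty)
print("total penalty:", sum(penalty for _, penalty in added))
```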
diff --git a/docs/tutorial/import_export.rst b/docs/tutorial/import_export.rst
@@ -36,11 +36,25 @@ The ``psamm-import`` program supports the import of models in various formats.
 For the SBML format, it supports the COBRA-compliant SBML specifications, the FBC
 specifications, and the basic SBML specifications in levels 1, 2, and 3;
 for the JSON format, it supports the import of JSON files directly from the
-`BiGG`_ database or from locally downloaded versions;
-the support for importing from Excel file is model specific and are available
-for 17 published models. There is also a generic Excel import for models
-produced by the ModelSEED pipeline. To see a list of these models or model
-formats that are supported, use the command:
+`BiGG`_ database or from locally downloaded versions.
+
+Support for importing from Excel files is model-specific and is available
+for 17 published models. These importers require installing the separate
+``psamm-import`` package. There is also a generic Excel importer for models
+that were produced by older versions of ModelSEED. Models from the
+current version of ModelSEED can be imported in the SBML format.
+
+To install the ``psamm-import`` package for Excel format models, use the following
+command:
+
+.. code-block:: shell
+
+    (psamm-env) $ pip install git+https://github.com/zhanglab/psamm-import.git
+
+Installing this package makes the Excel importers available through the
+``psamm-import`` program on the command line.
+
+To see a list of the models or model formats that are supported for import, use the command:
 
 .. _BiGG: http://bigg.ucsd.edu
 

diff --git a/psamm/command.py b/psamm/command.py
@@ -489,7 +489,7 @@ def main(command_class=None, args=None):
     If no command class is specified the user will be able to select a specific
     command through the first command line argument. If the ``args`` are
     provided, these should be a list of strings that will be used instead of
-    ``sys.argv[1]``. This is mostly useful for testing.
+    ``sys.argv[1:]``. This is mostly useful for testing.
     """
 
     # Set up logging for the command line interface
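The docstring fix in ``psamm/command.py`` matters because an explicit ``args`` list replaces the whole argument vector after the program name, ``sys.argv[1:]``, not the single element ``sys.argv[1]``. The standard library's ``argparse.ArgumentParser.parse_args`` follows the same convention when called with ``None``. A minimal sketch of that convention with a hypothetical ``demo`` parser; it is not PSAMM's ``main`` and assumes nothing about how ``main`` builds its parser:

```python
import argparse
import sys


def main(args=None):
    """Parse arguments; `args` replaces sys.argv[1:] when given (for tests)."""
    parser = argparse.ArgumentParser(prog="demo")  # hypothetical CLI
    parser.add_argument("command")
    parser.add_argument("--compound")
    # Everything after the program name, not just sys.argv[1].
    argv = sys.argv[1:] if args is None else args
    return parser.parse_args(argv)


# A test passes the same list a user would type after the program name.
namespace = main(["gapfill", "--compound", "d[c]"])
print(namespace.command, namespace.compound)
```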

diff --git a/psamm/datasource/sbml.py b/psamm/datasource/sbml.py
@@ -1403,12 +1403,9 @@ def convert_model_entries(
     Args:
         model: :class:`NativeModel`.
     """
-    compartment_map = {}
-    compound_map = {}
-    reaction_map = {}
-
-    def find_new_ids(entries, id_map, type_name):
+    def find_new_ids(entries):
         """Create new IDs for entries."""
+        id_map = {}
         new_ids = set()
         for entry in entries:
             new_id = convert_id(entry)
@@ -1418,15 +1415,17 @@ def find_new_ids(entries, id_map, type_name):
                 else:
                     raise ValueError(
                         'Entity ID {!r} is not unique after conversion'.format(
-                            type_name, entry.id))
+                            entry.id))
 
             id_map[entry.id] = new_id
             new_ids.add(new_id)
 
+        return id_map
+
     # Find new IDs for all entries
-    find_new_ids(model.compartments, compartment_map, 'Compartment')
-    find_new_ids(model.compounds, compound_map, 'Compound')
-    find_new_ids(model.reactions, reaction_map, 'Reaction')
+    compartment_map = find_new_ids(model.compartments)
+    compound_map = find_new_ids(model.compounds)
+    reaction_map = find_new_ids(model.reactions)
 
     # Create new compartment entries
     new_compartments = []
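The refactor in ``convert_model_entries`` makes ``find_new_ids`` build and return its own map, so the three maps no longer need to be pre-declared and passed in. It also fixes the error message: the old call ``'Entity ID {!r} ...'.format(type_name, entry.id)`` filled the single ``{!r}`` placeholder with ``type_name``, so the message named the entity type instead of the offending ID. A standalone sketch of the new pattern follows; ``Entry`` and ``convert_id`` are hypothetical stand-ins, and because the diff elides the branch that precedes ``else: raise ValueError``, the sketch simply treats any collision as an error.

```python
import re
from collections import namedtuple

# Hypothetical stand-in for model entries; only `id` is used here.
Entry = namedtuple("Entry", ["id"])


def convert_id(entry):
    """Hypothetical conversion: replace characters outside [A-Za-z0-9_]."""
    return re.sub(r"[^A-Za-z0-9_]", "_", entry.id)


def find_new_ids(entries):
    """Create new IDs for entries and return a map from old to new ID."""
    id_map = {}
    new_ids = set()
    for entry in entries:
        new_id = convert_id(entry)
        if new_id in new_ids:
            # Simplification: any collision after conversion is an error.
            raise ValueError(
                'Entity ID {!r} is not unique after conversion'.format(
                    entry.id))
        id_map[entry.id] = new_id
        new_ids.add(new_id)
    return id_map


compound_map = find_new_ids([Entry("succ-c"), Entry("a")])
print(compound_map)  # {'succ-c': 'succ_c', 'a': 'a'}
```

Returning a fresh dictionary per call keeps the compartment, compound, and reaction maps independent, which is what the three assignments in the new code rely on.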