diff --git a/NEWS.md b/NEWS.md
index df7048d0..a2e38d10 100644
--- a/NEWS.md
+++ b/NEWS.md
@@ -1,3 +1,13 @@
+v0.31 (2017-06-30)
+------------------
+
+- The `psamm-import` tool has been moved from the `psamm-import` package to
+ the main PSAMM package. This means that to import SBML files the
+ `psamm-import` package is no longer needed. To use the model-specific Excel
+ importers, the `psamm-import` package is still needed. With this release
+ of PSAMM, the `psamm-import` package should be updated to at least 0.16.
+- The tutorial was updated with additional sections on using gap-filling
+ procedures on models.
v0.30 (2017-06-23)
------------------
diff --git a/README.rst b/README.rst
index c8af42ca..63f90aaa 100644
--- a/README.rst
+++ b/README.rst
@@ -38,15 +38,7 @@ Use ``pip`` to install (it is recommended to use a Virtualenv_):
$ pip install psamm
-The ``psamm-import`` tool is developed in `a separate repository`_. After
-installing PSAMM the ``psamm-import`` tool can be installed using:
-
-.. code:: shell
-
- $ pip install git+https://github.com/zhanglab/psamm-import.git
-
.. _Virtualenv: https://virtualenv.pypa.io/
-.. _a separate repository: https://github.com/zhanglab/psamm-import
Documentation
-------------
diff --git a/docs/install.rst b/docs/install.rst
index 629eed2a..a2b494e7 100644
--- a/docs/install.rst
+++ b/docs/install.rst
@@ -25,8 +25,11 @@ the environment by running
$ source env/bin/activate
-The *psamm-import* tool is developed in a separate Git repository. After
-installing PSAMM, the *psamm-import* tool can be installed using:
+The *psamm-import* tool is included in the main PSAMM repository. Some
+additional model-specific importers for Excel format models associated
+with publications are maintained in a separate repository. After
+installing PSAMM, support for these importers can be added by installing
+this additional package:
.. code-block:: shell
diff --git a/docs/tutorial/curation.rst b/docs/tutorial/curation.rst
index d13c7236..871279b7 100644
--- a/docs/tutorial/curation.rst
+++ b/docs/tutorial/curation.rst
@@ -473,11 +473,126 @@ a network based optimization to identify metabolites with no production pathways
(psamm-env) $ psamm-model gapcheck --method gapfind --unrestricted-exchange
These methods included in the ``gapcheck`` function can be used to identify various kinds of
-'gaps' in a metabolic model network. `PSAMM` also includes two functions for filling these gaps
+'gaps' in a metabolic model network. `PSAMM` also includes three functions for filling these gaps
through the addition of artificial reactions or reactions from a supplied database. The
-functions ``gapfill`` and ``fastgapfill`` can be used to perform these gapfilling procedures
+functions ``gapfill``, ``fastgapfill``, and ``completepath`` can be used to perform these gapfilling procedures
during the process of generating and curating a model.
+GapFill
+~~~~~~~
+The ``gapfill`` function in PSAMM applies a GapFill algorithm based on [Kumar07]_ to a metabolic model
+to identify reactions that can be added to the model to unblock the production of a specific
+compound or set of compounds. To provide an example of how to use this function, a version of
+the E. coli core model has been provided in the `tutorial-part-2/Gapfilling_Model/` directory. This version of
+the E. coli core model contains a small additional, incomplete pathway made up of the following reactions:
+
+.. code-block:: yaml
+
+ - id: rxn1
+ equation: succ_c[c] => a[c]
+ - id: rxn3
+ equation: b[c] => c[c] + d[c]
+
+
+This small additional pathway converts succinate to an artificial compound 'a'. The other reaction converts compound
+'b' to 'c' and 'd'. There is no reaction to convert 'a' to 'b', however, and this can be considered a metabolic gap.
+An additional reaction database, not included in the model itself, contains the following reaction:
+
+.. code-block:: yaml
+
+    - id: rxn2
+      equation: a[c] => b[c]
+
+
+This reaction, if added, would be capable of unblocking the production of 'c' or 'd' by allowing for the conversion
+of compound 'a' to 'b'. In most cases, gap-filling on a model would use a larger database of non-model reactions.
+For this test case the production of compound 'd[c]' can be unblocked by running the following command:
+
+.. code-block:: shell
+
+    (psamm-env) $ psamm-model gapfill --compound d[c]
+
+This produces output that first lists all of the reactions from the original metabolic model, then lists the
+added gap-filling reactions with their associated penalty values, and finally lists any reactions whose
+flux bounds the gap-filling result suggests changing. A sample of the output is shown below::
+
+ ....
+ TPI Model 0 Dihydroxyacetone-phosphate[c] <=> Glyceraldehyde-3-phosphate[c]
+ rxn1 Model 0 Succinate[c] => a[c]
+ rxn3 Model 0 b[c] => c[c] + d[c]
+ rxn2 Add 1 a[c] => b[c]
+
+Some additional options can be used to refine the gap-filling. The first is the ``--no-implicit-sinks``
+option. If this option is used then the gap-filling is performed with no
+implicit sinks for compounds, meaning that all compounds produced must be consumed by other reactions in the
+metabolic model. By default, if this option is not used, implicit sinks are added for all
+compounds in the model, meaning that any compound produced in excess can be removed through the added sinks.
+
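+For instance, reusing the ``gapfill`` command from the example above, the same
+gap-fill can be run without implicit sinks by adding the flag:
+
+.. code-block:: shell
+
+    (psamm-env) $ psamm-model gapfill --compound d[c] --no-implicit-sinks
+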
+The other way to refine the gap-filling procedure is to define specific penalty values for the addition of
+reactions from different sources. Penalties can be set for specific reactions in a gap-filling database
+through a tab-separated file provided with the ``--penalty`` option. Additionally, penalty values
+for all database reactions can be set using the ``--db-penalty`` option followed by a penalty value. Similarly,
+penalty values can be assigned to added transport reactions using the ``--tp-penalty`` option and to added
+exchange reactions using the ``--ex-penalty`` option. An example of a command that applies these penalties
+to a gap-filling simulation is as follows:
+
+.. code-block:: shell
+
+ (psamm-env) $ psamm-model gapfill --compound d[c] --ex-penalty 100 --tp-penalty 10 --db-penalty 1
+
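+A penalty file for the ``--penalty`` option is simply a tab-separated list of
+reaction IDs and penalty values, one pair per line. A hypothetical example (the
+reaction IDs here are illustrative only):
+
+.. code-block:: shell
+
+    rxn2	5
+    rxn_example	100
+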
+The ``gapfill`` function in PSAMM can be used throughout the model construction process to help identify potential
+new reactions to add to a model and to explore how metabolic gaps affect the capabilities of a metabolic
+network.
+
+FastGapFill
+~~~~~~~~~~~
+
+The ``fastgapfill`` function in `PSAMM` is a different gap-filling method that uses the FastGapFill algorithm
+to attempt to generate a gap-filled model that is entirely flux consistent [Thiele14]_. The implementation
+of this algorithm in `PSAMM` can be used to unblock an entire metabolic model or to unblock
+specific reactions in a network. Unblocking all of the reactions in a model at once
+often does not produce the most meaningful or easily understood results, so performing this function on only a
+subset of reactions is preferable. To do this, the ``--subset`` option can be used to provide a file that
+contains a list of reactions to unblock. In this example that list would look like this:
+
+.. code-block:: shell
+
+ rxn1
+ rxn3
+
+
+This file can be provided to the command to unblock the small artificial pathway that was added to the E. coli core
+model:
+
+
+.. code-block:: shell
+
+ (psamm-env) $ psamm-model fastgapfill --subset subset.tsv
+
+In this case the output from this command will show the following::
+
+ ....
+ TPI Model 0 Dihydroxyacetone-phosphate[c] <=> Glyceraldehyde-3-phosphate[c]
+ rxn1 Model 0 Succinate[c] => a[c]
+ rxn3 Model 0 b[c] => c[c] + d[c]
+ EX_c[e] Add 1 c[e] <=>
+ EX_d[e] Add 1 d[e] <=>
+ EX_succ_c[e] Add 1 Succinate[e] <=>
+ TP_c[c]_c[e] Add 1 c[c] <=> c[e]
+ TP_d[c]_d[e] Add 1 d[c] <=> d[e]
+ TP_succ_c[c]_succ_c[e] Add 1 Succinate[c] <=> Succinate[e]
+ rxn2 Add 1 a[c] => b[c]
+
+The output will first list the model reactions, which are labeled with the 'Model' tag in the second column
+of the output. `PSAMM` will then list any artificial exchange reactions and transporters, as well as any gap
+reactions included from the larger database. These are labeled with the `Add` tag in the second column. When
+compared to the ``gapfill`` results from the previous section, it can be seen that the ``fastgapfill`` result
+suggests some artificial transporters and exchange reactions for certain compounds. This is because the method
+tries to find a flux consistent gap-filling solution.
+
+Penalty values can be assigned to the different types of reactions in the same way as in the ``gapfill``
+command: ``--ex-penalty`` for artificial exchange reactions, ``--tp-penalty`` for artificial transporters,
+``--db-penalty`` for new database reactions, and a penalty file provided with the ``--penalty`` option
+for penalties on specific reactions.
+
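+As a combined example (assuming the ``subset.tsv`` file from above), the
+penalty options can be supplied alongside the subset:
+
+.. code-block:: shell
+
+    (psamm-env) $ psamm-model fastgapfill --subset subset.tsv --ex-penalty 100 --tp-penalty 10 --db-penalty 1
+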
Search Functions in PSAMM
-------------------------
diff --git a/docs/tutorial/import_export.rst b/docs/tutorial/import_export.rst
index 5d9842c1..f4e1ccd3 100644
--- a/docs/tutorial/import_export.rst
+++ b/docs/tutorial/import_export.rst
@@ -36,11 +36,25 @@ The ``psamm-import`` program supports the import of models in various formats.
For the SBML format, it supports the COBRA-compliant SBML specifications, the FBC
specifications, and the basic SBML specifications in levels 1, 2, and 3;
for the JSON format, it supports the import of JSON files directly from the
-`BiGG`_ database or from locally downloaded versions;
-the support for importing from Excel file is model specific and are available
-for 17 published models. There is also a generic Excel import for models
-produced by the ModelSEED pipeline. To see a list of these models or model
-formats that are supported, use the command:
+`BiGG`_ database or from locally downloaded versions.
+
+The support for importing from Excel files is model specific and is available
+for 17 published models. This import requires the installation of the separate
+psamm-import package. There is also a generic Excel importer for models
+that were produced by older versions of ModelSEED. Models from the
+current ModelSEED can be imported in the SBML format.
+
+To install the ``psamm-import`` package for Excel format models use the following
+command:
+
+.. code-block:: shell
+
+ (psamm-env) $ pip install git+https://github.com/zhanglab/psamm-import.git
+
+Installing this package makes the Excel importers available from the command line
+when the ``psamm-import`` program is called.
+
+To see a list of the models or model formats that are supported for import, use the command:
.. _BiGG: http://bigg.ucsd.edu
diff --git a/psamm/command.py b/psamm/command.py
index 7beea830..a069f49f 100644
--- a/psamm/command.py
+++ b/psamm/command.py
@@ -489,7 +489,7 @@ def main(command_class=None, args=None):
If no command class is specified the user will be able to select a specific
command through the first command line argument. If the ``args`` are
provided, these should be a list of strings that will be used instead of
- ``sys.argv[1]``. This is mostly useful for testing.
+ ``sys.argv[1:]``. This is mostly useful for testing.
"""
# Set up logging for the command line interface
diff --git a/psamm/datasource/sbml.py b/psamm/datasource/sbml.py
index feee6627..32c4fb40 100644
--- a/psamm/datasource/sbml.py
+++ b/psamm/datasource/sbml.py
@@ -1403,12 +1403,9 @@ def convert_model_entries(
Args:
model: :class:`NativeModel`.
"""
- compartment_map = {}
- compound_map = {}
- reaction_map = {}
-
- def find_new_ids(entries, id_map, type_name):
+ def find_new_ids(entries):
"""Create new IDs for entries."""
+ id_map = {}
new_ids = set()
for entry in entries:
new_id = convert_id(entry)
@@ -1418,15 +1415,17 @@ def find_new_ids(entries, id_map, type_name):
else:
raise ValueError(
'Entity ID {!r} is not unique after conversion'.format(
- type_name, entry.id))
+ entry.id))
id_map[entry.id] = new_id
new_ids.add(new_id)
+ return id_map
+
# Find new IDs for all entries
- find_new_ids(model.compartments, compartment_map, 'Compartment')
- find_new_ids(model.compounds, compound_map, 'Compound')
- find_new_ids(model.reactions, reaction_map, 'Reaction')
+ compartment_map = find_new_ids(model.compartments)
+ compound_map = find_new_ids(model.compounds)
+ reaction_map = find_new_ids(model.reactions)
# Create new compartment entries
new_compartments = []
diff --git a/psamm/importer.py b/psamm/importer.py
new file mode 100644
index 00000000..415e2941
--- /dev/null
+++ b/psamm/importer.py
@@ -0,0 +1,813 @@
+# This file is part of PSAMM.
+#
+# PSAMM is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# PSAMM is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with PSAMM. If not, see <http://www.gnu.org/licenses/>.
+#
+# Copyright 2015-2017 Jon Lund Steffensen
+
+"""Entry points and functionality for importing models.
+
+This module contains entry points for the psamm-import programs and
+functionality to assist in importing external file formats, converting
+them into proper native models and writing those models to files.
+"""
+
+from __future__ import print_function, unicode_literals
+
+import sys
+import os
+import argparse
+import logging
+import math
+import re
+import json
+import codecs
+from itertools import product
+from collections import OrderedDict, Counter
+from decimal import Decimal
+
+import yaml
+import pkg_resources
+from six import iteritems, itervalues, text_type
+from six.moves.urllib.request import urlopen
+from six.moves.urllib.parse import quote as url_quote
+
+from .datasource.native import ModelWriter
+from .datasource.reaction import (parse_reaction,
+ ParseError as ReactionParseError)
+from .datasource.entry import DictCompartmentEntry
+from .datasource import sbml
+from .expression import boolean
+from . import formula
+
+from .util import mkdir_p
+
+# Threshold for putting reactions into subsystem files
+_MAX_REACTION_COUNT = 3
+
+# Threshold for converting reactions into dictionary representation.
+_MAX_REACTION_LENGTH = 10
+
+logger = logging.getLogger(__name__)
+
+
+class ImportError(Exception):
+ """Exception used to signal a general import error."""
+
+
+class ModelLoadError(ImportError):
+ """Exception used to signal an error loading the model files."""
+
+
+class ParseError(ImportError):
+ """Exception used to signal an error parsing the model files."""
+
+
+class Importer(object):
+ """Base importer class."""
+
+ def _try_parse_formula(self, compound_id, s):
+ """Try to parse the given compound formula string.
+
+ Logs a warning if the formula could not be parsed.
+ """
+ s = s.strip()
+ if s == '':
+ return None
+
+ try:
+ # Do not return the parsed formula. For now it is better to keep
+ # the original formula string unchanged in all cases.
+ formula.Formula.parse(s)
+ except formula.ParseError:
+ logger.warning('Unable to parse compound formula {}: {}'.format(
+ compound_id, s))
+
+ return s
+
+ def _try_parse_reaction(self, reaction_id, s,
+ parser=parse_reaction, **kwargs):
+ """Try to parse the given reaction equation string.
+
+ Returns the parsed Reaction object, or raises an error if the reaction
+ could not be parsed.
+ """
+ try:
+ return parser(s, **kwargs)
+ except ReactionParseError as e:
+ if e.indicator is not None:
+ logger.error('{}\n{}\n{}'.format(
+ str(e), s, e.indicator))
+ raise ParseError('Unable to parse reaction {}: {}'.format(
+ reaction_id, s))
+
+ def _try_parse_gene_association(self, reaction_id, s):
+ """Try to parse the given gene association rule.
+
+ Logs a warning if the association rule could not be parsed and returns
+ the original string. Otherwise, returns the boolean.Expression object.
+ """
+ s = s.strip()
+ if s == '':
+ return None
+
+ try:
+ return boolean.Expression(s)
+ except boolean.ParseError as e:
+ msg = 'Failed to parse gene association for {}: {}'.format(
+ reaction_id, text_type(e))
+ if e.indicator is not None:
+ msg += '\n{}\n{}'.format(s, e.indicator)
+ logger.warning(msg)
+
+ return s
+
+
+# Define custom dict representers for YAML
+# This allows reading/writing Python OrderedDicts in the correct order.
+# See: https://stackoverflow.com/questions/5121931/in-python-how-can-you-load-yaml-mappings-as-ordereddicts # noqa
+def _dict_representer(dumper, data):
+ return dumper.represent_dict(iteritems(data))
+
+
+def _dict_constructor(loader, node):
+ return OrderedDict(loader.construct_pairs(node))
+
+
+def _set_representer(dumper, data):
+ return dumper.represent_list(iter(data))
+
+
+def _boolean_expression_representer(dumper, data):
+ return dumper.represent_unicode(text_type(data))
+
+
+def _decimal_representer(dumper, data):
+ # Code from float_representer in PyYAML.
+ if data % 1 == 0:
+ return dumper.represent_int(int(data))
+ elif math.isnan(data):
+ value = '.nan'
+ elif data == float('inf'):
+ value = '.inf'
+ elif data == float('-inf'):
+ value = '-.inf'
+ else:
+ value = text_type(data).lower()
+ if '.' not in value and 'e' in value:
+ value = value.replace('e', '.0e', 1)
+ return dumper.represent_scalar('tag:yaml.org,2002:float', value)
+
+
+def get_default_compartment(model):
+ """Return what the default compartment should be set to.
+
+    If some compounds have no compartment, a unique compartment
+    name is returned to avoid collisions.
+ """
+ default_compartment = 'c'
+ default_key = set()
+ for reaction in model.reactions:
+ equation = reaction.equation
+ if equation is None:
+ continue
+
+ for compound, _ in equation.compounds:
+ default_key.add(compound.compartment)
+
+ if None in default_key and default_compartment in default_key:
+ suffix = 1
+ while True:
+ new_key = '{}_{}'.format(default_compartment, suffix)
+ if new_key not in default_key:
+ default_compartment = new_key
+ break
+ suffix += 1
+
+ if None in default_key:
+ logger.warning(
+ 'Compound(s) found without compartment, default'
+ ' compartment is set to {}.'.format(default_compartment))
+ return default_compartment
+
+
+def detect_best_flux_limit(model):
+ """Detect the best default flux limit to use for model output.
+
+    The default flux limit does not change the model, but selecting a good
+    value reduces the amount of output produced and reduces clutter in the
+    output files.
+ """
+ flux_limit_count = Counter()
+
+ for reaction in model.reactions:
+ if reaction.id not in model.limits:
+ continue
+
+ equation = reaction.properties['equation']
+ if equation is None:
+ continue
+
+ _, lower, upper = model.limits[reaction.id]
+ if upper is not None and upper > 0 and equation.direction.forward:
+ flux_limit_count[upper] += 1
+ if lower is not None and -lower > 0 and equation.direction.reverse:
+ flux_limit_count[-lower] += 1
+
+ if len(flux_limit_count) == 0:
+ return None
+
+ best_flux_limit, _ = flux_limit_count.most_common(1)[0]
+ return best_flux_limit
+
+
+def reactions_to_files(model, dest, writer, split_subsystem):
+ """Turn the reaction subsystems into their own files.
+
+ If a subsystem has a number of reactions over the threshold, it gets its
+ own YAML file. All other reactions, those that don't have a subsystem or
+ are in a subsystem that falls below the threshold, get added to a common
+ reaction file.
+
+ Args:
+ model: :class:`psamm_import.model.MetabolicModel`.
+ dest: output path for model files.
+ writer: :class:`psamm.datasource.native.ModelWriter`.
+ split_subsystem: Divide reactions into multiple files by subsystem.
+ """
+ def safe_file_name(origin_name):
+ safe_name = re.sub(
+ r'\W+', '_', origin_name, flags=re.UNICODE)
+ safe_name = re.sub(
+ r'_+', '_', safe_name.lower(), flags=re.UNICODE)
+ safe_name = safe_name.strip('_')
+ return safe_name
+
+ common_reactions = []
+ reaction_files = []
+ if not split_subsystem:
+ common_reactions = sorted(model.reactions, key=lambda r: r.id)
+ if len(common_reactions) > 0:
+ reaction_file = 'reactions.yaml'
+ with open(os.path.join(dest, reaction_file), 'w') as f:
+ writer.write_reactions(f, common_reactions)
+ reaction_files.append(reaction_file)
+ else:
+ subsystems = {}
+ for reaction in sorted(model.reactions, key=lambda r: r.id):
+ if 'subsystem' in reaction.properties:
+ subsystem_file = safe_file_name(
+ reaction.properties['subsystem'])
+ subsystems.setdefault(subsystem_file, []).append(reaction)
+ else:
+ common_reactions.append(reaction)
+
+ subsystem_folder = 'reactions'
+        sub_existence = False
+        for subsystem_file, reactions in iteritems(subsystems):
+            if len(reactions) < _MAX_REACTION_COUNT:
+                for reaction in reactions:
+                    common_reactions.append(reaction)
+            else:
+                if len(reactions) > 0:
+                    mkdir_p(os.path.join(dest, subsystem_folder))
+                    subsystem_file = os.path.join(
+                        subsystem_folder, '{}.yaml'.format(subsystem_file))
+
+                    with open(os.path.join(dest, subsystem_file), 'w') as f:
+                        writer.write_reactions(f, reactions)
+                    reaction_files.append(subsystem_file)
+                    sub_existence = True
+
+        reaction_files.sort()
+        if sub_existence:
+ reaction_file = os.path.join(
+ subsystem_folder, 'other_reactions.yaml')
+ else:
+ reaction_file = 'reactions.yaml'
+ if len(common_reactions) > 0:
+ with open(os.path.join(dest, reaction_file), 'w') as f:
+ writer.write_reactions(f, common_reactions)
+ reaction_files.append(reaction_file)
+
+ return reaction_files
+
+
+def _get_output_limit(limit, default_limit):
+ """Return limit to output given default limit."""
+ return limit if limit != default_limit else None
+
+
+def _generate_limit_items(lower, upper):
+ """Yield key, value pairs for limits dictionary.
+
+ Yield pairs of key, value where key is ``lower``, ``upper`` or ``fixed``.
+ A key, value pair is emitted if the bounds are not None.
+ """
+ # Use value + 0 to convert any -0.0 to 0.0 which looks better.
+ if lower is not None and upper is not None and lower == upper:
+ yield 'fixed', upper + 0
+ else:
+ if lower is not None:
+ yield 'lower', lower + 0
+ if upper is not None:
+ yield 'upper', upper + 0
+
+
+def model_exchange(model):
+ """Return exchange definition as YAML dict."""
+ # Determine the default flux limits. If the value is already at the
+ # default it does not need to be included in the output.
+ lower_default, upper_default = None, None
+ if model.default_flux_limit is not None:
+ lower_default = -model.default_flux_limit
+ upper_default = model.default_flux_limit
+
+ compounds = []
+ for compound, reaction_id, lower, upper in sorted(
+ itervalues(model.exchange)):
+ d = OrderedDict([('id', compound.name)])
+ if reaction_id is not None:
+ d['reaction'] = reaction_id
+
+ lower = _get_output_limit(lower, lower_default)
+ upper = _get_output_limit(upper, upper_default)
+ d.update(_generate_limit_items(lower, upper))
+
+ compounds.append(d)
+
+ return OrderedDict([('compounds', compounds)])
+
+
+def model_reaction_limits(model):
+ """Yield model reaction limits as YAML dicts."""
+ for reaction in sorted(model.reactions, key=lambda r: r.id):
+ equation = reaction.properties.get('equation')
+ if equation is None:
+ continue
+
+ # Determine the default flux limits. If the value is already at the
+ # default it does not need to be included in the output.
+ lower_default, upper_default = None, None
+ if model.default_flux_limit is not None:
+ if equation.direction.reverse:
+ lower_default = -model.default_flux_limit
+ else:
+ lower_default = 0.0
+
+ if equation.direction.forward:
+ upper_default = model.default_flux_limit
+ else:
+ upper_default = 0.0
+
+ lower_flux, upper_flux = None, None
+ if reaction.id in model.limits:
+ _, lower, upper = model.limits[reaction.id]
+ lower_flux = _get_output_limit(lower, lower_default)
+ upper_flux = _get_output_limit(upper, upper_default)
+
+ if lower_flux is not None or upper_flux is not None:
+ d = OrderedDict([('reaction', reaction.id)])
+ d.update(_generate_limit_items(lower_flux, upper_flux))
+
+ yield d
+
+
+def infer_compartment_entries(model):
+ """Infer compartment entries for model based on reaction compounds."""
+ compartment_ids = set()
+ for reaction in model.reactions:
+ equation = reaction.equation
+ if equation is None:
+ continue
+
+ for compound, _ in equation.compounds:
+ compartment = compound.compartment
+ if compartment is None:
+ compartment = model.default_compartment
+
+ if compartment is not None:
+ compartment_ids.add(compartment)
+
+ for compartment in compartment_ids:
+ if compartment in model.compartments:
+ continue
+
+ entry = DictCompartmentEntry(dict(id=compartment))
+ model.compartments.add_entry(entry)
+
+
+def infer_compartment_adjacency(model):
+ """Infer compartment adjacency for model based on reactions."""
+ def reaction_compartments(seq):
+ for compound, _ in seq:
+ compartment = compound.compartment
+ if compartment is None:
+ compartment = model.default_compartment
+
+ if compartment is not None:
+ yield compartment
+
+ for reaction in model.reactions:
+ equation = reaction.equation
+ if equation is None:
+ continue
+
+ left = reaction_compartments(equation.left)
+ right = reaction_compartments(equation.right)
+ for c1, c2 in product(left, right):
+ if c1 == c2:
+ continue
+ model.compartment_boundaries.add((c1, c2))
+ model.compartment_boundaries.add((c2, c1))
+
+
+def count_genes(model):
+ """Count the number of distinct genes in model reactions."""
+ genes = set()
+ for reaction in model.reactions:
+ if reaction.genes is None:
+ continue
+
+ if isinstance(reaction.genes, boolean.Expression):
+ genes.update(v.symbol for v in reaction.genes.variables)
+ else:
+ genes.update(reaction.genes)
+
+ return len(genes)
+
+
+def write_yaml_model(model, dest='.', convert_exchange=True,
+ split_subsystem=True):
+ """Write the given MetabolicModel to YAML files in dest folder.
+
+ The parameter ``convert_exchange`` indicates whether the exchange reactions
+ should be converted automatically to an exchange file.
+ """
+ yaml.SafeDumper.add_representer(OrderedDict, _dict_representer)
+ yaml.SafeDumper.add_representer(set, _set_representer)
+ yaml.SafeDumper.add_representer(frozenset, _set_representer)
+ yaml.SafeDumper.add_representer(
+ boolean.Expression, _boolean_expression_representer)
+ yaml.SafeDumper.add_representer(Decimal, _decimal_representer)
+
+ yaml.SafeDumper.ignore_aliases = lambda *args: True
+
+ yaml.add_constructor(yaml.resolver.BaseResolver.DEFAULT_MAPPING_TAG,
+ _dict_constructor)
+
+ yaml_args = {'default_flow_style': False,
+ 'encoding': 'utf-8',
+ 'allow_unicode': True,
+ 'width': 79}
+
+ # The ModelWriter from PSAMM is not yet able to write the full model but
+ # only reactions and compounds.
+ writer = ModelWriter()
+
+ with open(os.path.join(dest, 'compounds.yaml'), 'w+') as f:
+ writer.write_compounds(f, sorted(model.compounds, key=lambda c: c.id))
+
+ if model.default_flux_limit is None:
+ model.default_flux_limit = detect_best_flux_limit(model)
+
+ if model.extracellular_compartment is None:
+ model.extracellular_compartment = (
+ sbml.detect_extracellular_compartment(model))
+
+ if model.default_compartment is None:
+ model.default_compartment = get_default_compartment(model)
+
+ if model.default_flux_limit is not None:
+ logger.info('Using default flux limit of {}'.format(
+ model.default_flux_limit))
+
+ if convert_exchange:
+ logger.info('Converting exchange reactions to exchange file')
+ sbml.convert_exchange_to_compounds(model)
+
+ if len(model.compartments) == 0:
+ infer_compartment_entries(model)
+ logger.info('Inferred {} compartments: {}'.format(
+ len(model.compartments),
+ ', '.join(c.id for c in model.compartments)))
+
+ if len(model.compartments) != 0 and len(model.compartment_boundaries) == 0:
+ infer_compartment_adjacency(model)
+
+ reaction_files = reactions_to_files(model, dest, writer, split_subsystem)
+
+ if len(model.exchange) > 0:
+ with open(os.path.join(dest, 'exchange.yaml'), 'w+') as f:
+ yaml.safe_dump(model_exchange(model), f, **yaml_args)
+
+ reaction_limits = list(model_reaction_limits(model))
+ if len(reaction_limits) > 0:
+ with open(os.path.join(dest, 'limits.yaml'), 'w+') as f:
+ yaml.safe_dump(reaction_limits, f, **yaml_args)
+
+ model_d = OrderedDict()
+ if model.name is not None:
+ model_d['name'] = model.name
+
+ if model.biomass_reaction is not None:
+ model_d['biomass'] = model.biomass_reaction
+ if model.default_flux_limit is not None:
+ model_d['default_flux_limit'] = model.default_flux_limit
+ if model.extracellular_compartment != 'e':
+ model_d['extracellular'] = model.extracellular_compartment
+ if model.default_compartment != 'c':
+ model_d['default_compartment'] = model.default_compartment
+
+ if len(model.compartments) > 0:
+ adjacency = {}
+ for c1, c2 in model.compartment_boundaries:
+ adjacency.setdefault(c1, set()).add(c2)
+ adjacency.setdefault(c2, set()).add(c1)
+
+ compartment_list = []
+ for compartment in sorted(model.compartments, key=lambda c: c.id):
+ adjacent = adjacency.get(compartment.id)
+ if adjacent is not None and len(adjacent) == 1:
+ adjacent = next(iter(adjacent))
+ compartment_list.append(writer.convert_compartment_entry(
+ compartment, adjacent))
+
+ model_d['compartments'] = compartment_list
+
+ model_d['compounds'] = [{'include': 'compounds.yaml'}]
+ model_d['reactions'] = []
+ for reaction_file in reaction_files:
+ model_d['reactions'].append({'include': reaction_file})
+
+ if len(model.exchange) > 0:
+ model_d['exchange'] = [{'include': 'exchange.yaml'}]
+
+ if len(reaction_limits) > 0:
+ model_d['limits'] = [{'include': 'limits.yaml'}]
+
+ with open(os.path.join(dest, 'model.yaml'), 'w+') as f:
+ yaml.safe_dump(model_d, f, **yaml_args)
+
+
+def main(importer_class=None, args=None):
+ """Entry point for import program.
+
+ If the ``args`` are provided, these should be a list of strings that will
+ be used instead of ``sys.argv[1:]``. This is mostly useful for testing.
+ """
+ parser = argparse.ArgumentParser(
+ description='Import from external model formats')
+ parser.add_argument('--source', metavar='path', default='.',
+ help='Source directory or file')
+ parser.add_argument('--dest', metavar='path', default='.',
+ help='Destination directory (default is ".")')
+ parser.add_argument('--no-exchange', action='store_true',
+ help=('Disable importing exchange reactions as'
+ ' exchange compound file.'))
+ parser.add_argument('--split-subsystem', action='store_true',
+ help='Enable splitting reaction files by subsystem')
+ parser.add_argument('--merge-compounds', action='store_true',
+                        help=('Merge identical compounds occurring in various'
+ ' compartments.'))
+ parser.add_argument('--force', action='store_true',
+ help='Enable overwriting model files')
+
+ if importer_class is None:
+ parser.add_argument(
+ 'format', help='Format to import ("list" to see all)')
+
+ args = parser.parse_args(args)
+
+ # Set up logging for the command line interface
+ if 'PSAMM_DEBUG' in os.environ:
+ level = getattr(logging, os.environ['PSAMM_DEBUG'].upper(), None)
+ if level is not None:
+ logging.basicConfig(level=level)
+ else:
+ logging.basicConfig(
+ level=logging.INFO, format='%(levelname)s: %(message)s')
+
+ if importer_class is None:
+ # Discover all available model importers
+ importers = {}
+ for importer_entry in pkg_resources.iter_entry_points(
+ 'psamm.importer'):
+ canonical = importer_entry.name.lower()
+ if canonical not in importers:
+ importers[canonical] = importer_entry
+ else:
+ logger.warning('Importer {} was found more than once!'.format(
+ importer_entry.name))
+
+ # Print list of importers
+ if args.format in ('list', 'help'):
+ if len(importers) == 0:
+ logger.error('No importers found!')
+ else:
+ importer_classes = []
+ for name, entry in iteritems(importers):
+ importer_class = entry.load()
+ title = getattr(importer_class, 'title', None)
+ generic = getattr(importer_class, 'generic', False)
+ if title is not None:
+ importer_classes.append(
+ (title, generic, name, importer_class))
+
+ print('Generic importers:')
+ for title, _, name, importer_class in sorted(
+ c for c in importer_classes if c[1]):
+ print('{:<12} {}'.format(name, title))
+
+ print()
+ print('Model-specific importers:')
+ for title, _, name, importer_class in sorted(
+ c for c in importer_classes if not c[1]):
+ print('{:<12} {}'.format(name, title))
+ sys.exit(0)
+
+ importer_name = args.format.lower()
+ if importer_name not in importers:
+ logger.error('Importer {} not found!'.format(importer_name))
+ logger.info('Use "list" to see available importers.')
+ sys.exit(-1)
+
+ importer_class = importers[importer_name].load()
+
+ importer = importer_class()
+
+ try:
+ model = importer.import_model(args.source)
+ except ModelLoadError as e:
+ logger.error('Failed to load model!', exc_info=True)
+ importer.help()
+ parser.error(text_type(e))
+ except ParseError as e:
+ logger.error('Failed to parse model!', exc_info=True)
+ logger.error(text_type(e))
+ sys.exit(-1)
+
+ if args.merge_compounds:
+ compounds_before = len(model.compounds)
+ sbml.merge_equivalent_compounds(model)
+ if len(model.compounds) < compounds_before:
+ logger.info(
+ 'Merged {} compound entries into {} entries by'
+ ' removing duplicates in various compartments'.format(
+ compounds_before, len(model.compounds)))
+
+ print('Model: {}'.format(model.name))
+ print('- Biomass reaction: {}'.format(model.biomass_reaction))
+ print('- Compartments: {}'.format(len(model.compartments)))
+ print('- Compounds: {}'.format(len(model.compounds)))
+ print('- Reactions: {}'.format(len(model.reactions)))
+ print('- Genes: {}'.format(count_genes(model)))
+
+ # Check if dest directory is empty. If we get an error assume that the
+ # directory does not exist.
+ dest_is_empty = False
+ try:
+ dest_is_empty = len(os.listdir(args.dest)) == 0
+ except OSError:
+ dest_is_empty = True
+
+ if not dest_is_empty:
+ if not args.force:
+ logger.error('Destination directory is not empty. Use --force'
+ ' option to proceed anyway, overwriting any existing'
+ ' files in {}'.format(args.dest))
+ return 1
+ else:
+ logger.warning('Destination directory is not empty, overwriting'
+ ' existing files in {}'.format(args.dest))
+
+ # Create destination directory if not exists
+ dest = args.dest
+ mkdir_p(dest)
+
+ convert_exchange = not args.no_exchange
+ write_yaml_model(model, dest, convert_exchange=convert_exchange,
+ split_subsystem=args.split_subsystem)
+
+
+def main_bigg(args=None, urlopen=urlopen):
+ """Entry point for BiGG import program.
+
+ If the ``args`` are provided, these should be a list of strings that will
+ be used instead of ``sys.argv[1:]``. This is mostly useful for testing.
+ """
+ parser = argparse.ArgumentParser(
+ description='Import from BiGG database')
+ parser.add_argument('--dest', metavar='path', default='.',
+ help='Destination directory (default is ".")')
+ parser.add_argument('--no-exchange', action='store_true',
+ help=('Disable importing exchange reactions as an'
+ ' exchange compound file.'))
+ parser.add_argument('--split-subsystem', action='store_true',
+ help='Enable splitting reaction files by subsystem')
+ parser.add_argument('--merge-compounds', action='store_true',
+ help=('Merge identical compounds occurring in various'
+ ' compartments.'))
+ parser.add_argument('--force', action='store_true',
+ help='Enable overwriting model files')
+ parser.add_argument('id', help='BiGG model to import ("list" to see all)')
+
+ args = parser.parse_args(args)
+
+ # Set up logging for the command line interface
+ if 'PSAMM_DEBUG' in os.environ:
+ level = getattr(logging, os.environ['PSAMM_DEBUG'].upper(), None)
+ if level is not None:
+ logging.basicConfig(level=level)
+ else:
+ logging.basicConfig(
+ level=logging.INFO, format='%(levelname)s: %(message)s')
+
+ # Print list of available models
+ if args.id == 'list':
+ print('Available models:')
+ f = urlopen('http://bigg.ucsd.edu/api/v2/models')
+ doc = json.loads(f.read().decode('utf-8'))
+ results = doc['results']
+ id_width = min(max(len(result['bigg_id']) for result in results), 16)
+ for result in sorted(results, key=lambda x: x.get('organism')):
+ print('{} {}'.format(
+ result.get('bigg_id').ljust(id_width), result.get('organism')))
+ return 0
+
+ importer_entry = None
+ try:
+ importer_entry = next(
+ pkg_resources.iter_entry_points('psamm.importer', 'JSON'))
+ except StopIteration:
+ logger.error('Failed to locate the COBRA JSON model importer!')
+ sys.exit(-1)
+
+ importer_class = importer_entry.load()
+ importer = importer_class()
+
+ try:
+ f = urlopen(
+ 'http://bigg.ucsd.edu/api/v2/models/{}/download'.format(
+ url_quote(args.id)))
+ model = importer.import_model(codecs.getreader('utf-8')(f))
+ except ModelLoadError as e:
+ logger.error('Failed to load model!', exc_info=True)
+ importer.help()
+ parser.error(text_type(e))
+ except ParseError as e:
+ logger.error('Failed to parse model!', exc_info=True)
+ logger.error(text_type(e))
+ sys.exit(-1)
+
+ if args.merge_compounds:
+ compounds_before = len(model.compounds)
+ sbml.merge_equivalent_compounds(model)
+ if len(model.compounds) < compounds_before:
+ logger.info(
+ 'Merged {} compound entries into {} entries by'
+ ' removing duplicates in various compartments'.format(
+ compounds_before, len(model.compounds)))
+
+ print('Model: {}'.format(model.name))
+ print('- Biomass reaction: {}'.format(model.biomass_reaction))
+ print('- Compartments: {}'.format(len(model.compartments)))
+ print('- Compounds: {}'.format(len(model.compounds)))
+ print('- Reactions: {}'.format(len(model.reactions)))
+ print('- Genes: {}'.format(count_genes(model)))
+
+ # Check if dest directory is empty. If we get an error assume that the
+ # directory does not exist.
+ dest_is_empty = False
+ try:
+ dest_is_empty = len(os.listdir(args.dest)) == 0
+ except OSError:
+ dest_is_empty = True
+
+ if not dest_is_empty:
+ if not args.force:
+ logger.error('Destination directory is not empty. Use --force'
+ ' option to proceed anyway, overwriting any existing'
+ ' files in {}'.format(args.dest))
+ return 1
+ else:
+ logger.warning('Destination directory is not empty, overwriting'
+ ' existing files in {}'.format(args.dest))
+
+ # Create destination directory if not exists
+ dest = args.dest
+ mkdir_p(dest)
+
+ convert_exchange = not args.no_exchange
+ write_yaml_model(model, dest, convert_exchange=convert_exchange,
+ split_subsystem=args.split_subsystem)
diff --git a/psamm/importers/__init__.py b/psamm/importers/__init__.py
new file mode 100644
index 00000000..465ef578
--- /dev/null
+++ b/psamm/importers/__init__.py
@@ -0,0 +1,18 @@
+# This file is part of PSAMM.
+#
+# PSAMM is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# PSAMM is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with PSAMM. If not, see <http://www.gnu.org/licenses/>.
+#
+# Copyright 2014-2015 Jon Lund Steffensen
+
+"""Importers for various external model formats."""
diff --git a/psamm/importers/cobrajson.py b/psamm/importers/cobrajson.py
new file mode 100644
index 00000000..ba54ddde
--- /dev/null
+++ b/psamm/importers/cobrajson.py
@@ -0,0 +1,180 @@
+# This file is part of PSAMM.
+#
+# PSAMM is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# PSAMM is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with PSAMM. If not, see <http://www.gnu.org/licenses/>.
+#
+# Copyright 2015-2017 Jon Lund Steffensen
+
+"""Importer for the COBRApy JSON format."""
+
+import os
+import glob
+import json
+import logging
+import decimal
+
+from six import iteritems
+
+from ..reaction import Reaction, Compound, Direction
+from ..datasource import native
+from ..datasource.entry import (DictCompoundEntry as CompoundEntry,
+ DictReactionEntry as ReactionEntry,
+ DictCompartmentEntry as CompartmentEntry)
+
+from ..importer import Importer as BaseImporter, ModelLoadError
+
+logger = logging.getLogger(__name__)
+
+
+def _float_parser(num_str):
+ num = float(num_str)
+ if num.is_integer():
+ return int(num)
+ else:
+ return decimal.Decimal(num_str)
+
+
+class Importer(BaseImporter):
+ """Read metabolic model from COBRApy JSON format."""
+
+ name = 'json'
+ title = 'COBRApy JSON'
+ generic = True
+
+ def help(self):
+ """Print help text for importer."""
+ print('Source must contain the model definition in COBRApy JSON'
+ ' format.\n'
+ 'Expected files in source directory:\n'
+ '- *.json')
+
+ def _resolve_source(self, source):
+ """Resolve source to filepath if it is a directory."""
+ if os.path.isdir(source):
+ sources = glob.glob(os.path.join(source, '*.json'))
+ if len(sources) == 0:
+ raise ModelLoadError('No .json file found in source directory')
+ elif len(sources) > 1:
+ raise ModelLoadError(
+ 'More than one .json file found in source directory')
+ return sources[0]
+ return source
+
+ def _import(self, file):
+ model_doc = json.load(file, parse_float=_float_parser)
+ model = native.NativeModel()
+ model.compartments.update(self._read_compartments(model_doc))
+ model.compounds.update(self._read_compounds(model_doc))
+
+ for reaction, lower, upper in self._read_reactions(model_doc):
+ model.reactions.add_entry(reaction)
+ if lower is not None and upper is not None:
+ model.limits[reaction.id] = reaction.id, lower, upper
+
+ model.name = model_doc.get('id', 'COBRA JSON model')
+
+ biomass_reaction = None
+ objective_reactions = set()
+ for reaction in model_doc['reactions']:
+ if reaction.get('objective_coefficient', 0) != 0:
+ objective_reactions.add(reaction['id'])
+
+ if len(objective_reactions) == 1:
+ biomass_reaction = next(iter(objective_reactions))
+ logger.info('Detected biomass reaction: {}'.format(
+ biomass_reaction))
+ elif len(objective_reactions) > 1:
+ logger.warning(
+ 'Multiple reactions are used as the'
+ ' biomass reaction: {}'.format(objective_reactions))
+
+ model.biomass_reaction = biomass_reaction
+
+ # Set compartment in reaction compounds
+ compound_compartment = {}
+ for compound in model.compounds:
+ compartment = compound.properties.pop('compartment')
+ compound_compartment[compound.id] = compartment
+
+ for reaction in model.reactions:
+ equation = reaction.equation
+ if equation is None:
+ continue
+
+ # Translate compartments in reaction
+ left = ((c.in_compartment(compound_compartment[c.name]), v) for
+ c, v in equation.left)
+ right = ((c.in_compartment(compound_compartment[c.name]), v) for
+ c, v in equation.right)
+ reaction.properties['equation'] = Reaction(
+ equation.direction, left, right)
+
+ return model
+
+ def _read_compartments(self, doc):
+ for compartment, name in iteritems(doc.get('compartments', {})):
+ yield CompartmentEntry(dict(id=compartment, name=name))
+
+ def _read_compounds(self, doc):
+ for compound in doc['metabolites']:
+ entry = CompoundEntry(compound)
+ if 'formula' in entry.properties:
+ formula = self._try_parse_formula(
+ entry.id, entry.properties['formula'])
+ if formula is not None:
+ entry.properties['formula'] = formula
+ yield entry
+
+ def _parse_reaction_equation(self, entry):
+ metabolites = entry.properties.pop('metabolites')
+ compounds = ((Compound(metabolite), value)
+ for metabolite, value in iteritems(metabolites))
+ if (entry.properties.get('lower_bound') == 0 and
+ entry.properties.get('upper_bound') != 0):
+ direction = Direction.Forward
+ elif (entry.properties.get('lower_bound') != 0 and
+ entry.properties.get('upper_bound') == 0):
+ direction = Direction.Reverse
+ else:
+ direction = Direction.Both
+ return Reaction(direction, compounds)
+
+ def _read_reactions(self, doc):
+ for reaction in doc['reactions']:
+ entry = ReactionEntry(reaction)
+
+ entry.properties['equation'] = (
+ self._parse_reaction_equation(entry))
+
+ lower, upper = None, None
+
+ if 'lower_bound' in entry.properties:
+ lower = entry.properties.pop('lower_bound')
+ if 'upper_bound' in entry.properties:
+ upper = entry.properties.pop('upper_bound')
+
+ if 'gene_reaction_rule' in entry.properties:
+ genes = self._try_parse_gene_association(
+ entry.id, entry.properties.pop('gene_reaction_rule'))
+ if genes is not None:
+ entry.properties['genes'] = genes
+
+ yield entry, lower, upper
+
+ def import_model(self, source):
+ """Import and return model instance."""
+ if not hasattr(source, 'read'): # Not a file-like object
+ with open(self._resolve_source(source), 'r') as f:
+ return self._import(f)
+ else:
+ return self._import(source)
diff --git a/psamm/importers/sbml.py b/psamm/importers/sbml.py
new file mode 100644
index 00000000..354c5f2f
--- /dev/null
+++ b/psamm/importers/sbml.py
@@ -0,0 +1,123 @@
+# This file is part of PSAMM.
+#
+# PSAMM is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# PSAMM is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with PSAMM. If not, see <http://www.gnu.org/licenses/>.
+#
+# Copyright 2015-2017 Jon Lund Steffensen
+
+"""SBML importers."""
+
+import os
+import glob
+import logging
+
+
+from ..datasource import sbml
+from ..datasource.context import FilePathContext
+from ..datasource.entry import (DictCompartmentEntry, DictCompoundEntry,
+ DictReactionEntry)
+
+from ..importer import Importer, ParseError, ModelLoadError
+
+logger = logging.getLogger(__name__)
+
+
+class BaseImporter(Importer):
+ """Base importer for reading metabolic model from an SBML file."""
+
+ def help(self):
+ """Print importer help text."""
+ print('Source must contain the model definition in SBML format.\n'
+ 'Expected files in source directory:\n'
+ '- *.sbml')
+
+ def _resolve_source(self, source):
+ """Resolve source to filepath if it is a directory."""
+ if os.path.isdir(source):
+ sources = glob.glob(os.path.join(source, '*.sbml'))
+ if len(sources) == 0:
+ raise ModelLoadError('No .sbml file found in source directory')
+ elif len(sources) > 1:
+ raise ModelLoadError(
+ 'More than one .sbml file found in source directory')
+ return sources[0]
+ return source
+
+ def _open_reader(self, f):
+ raise NotImplementedError('Subclasses must implement _open_reader()')
+
+ def import_model(self, source):
+ """Import and return model instance."""
+ source = self._resolve_source(source)
+ self._context = FilePathContext(source)
+ with self._context.open() as f:
+ self._reader = self._open_reader(f)
+
+ return self._reader.create_model()
+
+
+class StrictImporter(BaseImporter):
+ """Read metabolic model from an SBML file using strict parser."""
+
+ name = 'SBML-strict'
+ title = 'SBML model (strict)'
+ generic = True
+
+ def _open_reader(self, f):
+ try:
+ return sbml.SBMLReader(f, strict=True, ignore_boundary=True)
+ except sbml.ParseError as e:
+ raise ParseError(e)
+
+ def _translate_compartment(self, entry, new_id):
+ return DictCompartmentEntry(entry, new_id)
+
+ def _translate_compound(self, entry, new_id, compartment_map):
+ return DictCompoundEntry(entry, new_id)
+
+ def _translate_reaction(
+ self, entry, new_id, compartment_map, compound_map):
+ return DictReactionEntry(entry, new_id)
+
+ def import_model(self, source):
+ """Import and return model instance."""
+ model = super(StrictImporter, self).import_model(source)
+
+ # Translate entries into dict-based entries.
+ sbml.convert_model_entries(
+ model, convert_id=lambda entry: entry.id,
+ translate_compartment=self._translate_compartment,
+ translate_compound=self._translate_compound,
+ translate_reaction=self._translate_reaction)
+
+ return model
+
+
+class NonstrictImporter(BaseImporter):
+ """Read metabolic model from an SBML file using non-strict parser."""
+
+ name = 'SBML'
+ title = 'SBML model (non-strict)'
+ generic = True
+
+ def _open_reader(self, f):
+ try:
+ return sbml.SBMLReader(f, strict=False, ignore_boundary=True)
+ except sbml.ParseError as e:
+ raise ParseError(e)
+
+ def import_model(self, source):
+ """Import and return model instance."""
+ model = super(NonstrictImporter, self).import_model(source)
+ sbml.convert_sbml_model(model)
+ return model
diff --git a/psamm/tests/test_importer.py b/psamm/tests/test_importer.py
new file mode 100644
index 00000000..7b3788e7
--- /dev/null
+++ b/psamm/tests/test_importer.py
@@ -0,0 +1,401 @@
+# This file is part of PSAMM.
+#
+# PSAMM is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# PSAMM is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with PSAMM. If not, see <http://www.gnu.org/licenses/>.
+#
+# Copyright 2017 Jon Lund Steffensen
+
+from __future__ import unicode_literals
+
+import os
+import tempfile
+import shutil
+import json
+import unittest
+from decimal import Decimal
+
+from six import BytesIO
+
+from psamm import importer
+from psamm.datasource.native import NativeModel
+from psamm.datasource.entry import (DictCompoundEntry as CompoundEntry,
+ DictReactionEntry as ReactionEntry,
+ DictCompartmentEntry as CompartmentEntry)
+from psamm.datasource.reaction import parse_reaction
+from psamm.reaction import Reaction, Compound
+from psamm.formula import Formula
+from psamm.expression import boolean
+
+from psamm.importers.sbml import StrictImporter as SBMLStrictImporter
+from psamm.importers.sbml import NonstrictImporter as SBMLNonstrictImporter
+from psamm.importers.cobrajson import Importer as CobraJSONImporter
+
+
+class TestImporterBaseClass(unittest.TestCase):
+ def setUp(self):
+ self.importer = importer.Importer()
+
+ def test_try_parse_formula(self):
+ result = self.importer._try_parse_formula('cpd_1', ' ')
+ self.assertIsNone(result)
+
+ # Should not return parsed result!
+ result = self.importer._try_parse_formula('cpd_1', 'CO2')
+ self.assertEqual(result, 'CO2')
+
+ result = self.importer._try_parse_formula('cpd_1', 'XYZ...\u03c0')
+ self.assertEqual(result, 'XYZ...\u03c0')
+
+ def test_try_parse_reaction(self):
+ result = self.importer._try_parse_reaction('rxn_1', 'A => (3) B')
+ self.assertIsInstance(result, Reaction)
+
+ with self.assertRaises(importer.ParseError):
+ self.importer._try_parse_reaction('rxn_1', '+ + ==> \u03c0 5')
+
+ def test_try_parse_gene_association(self):
+ result = self.importer._try_parse_gene_association(
+ 'rxn_1', 'g1 and (g2 or g3)')
+ self.assertIsInstance(result, boolean.Expression)
+
+ result = self.importer._try_parse_gene_association('rxn_1', ' ')
+ self.assertIsNone(result)
+
+ result = self.importer._try_parse_gene_association('rxn_1', '123 and')
+ self.assertEqual(result, '123 and')
+
+
+class TestImporterCheckAndWriteModel(unittest.TestCase):
+ def setUp(self):
+ self.dest = tempfile.mkdtemp()
+
+ self.model = NativeModel({
+ 'id': 'test_mode',
+ 'name': 'Test model',
+ })
+
+ # Compounds
+ self.model.compounds.add_entry(CompoundEntry({
+ 'id': 'cpd_1',
+ 'formula': Formula.parse('CO2')
+ }))
+ self.model.compounds.add_entry(CompoundEntry({
+ 'id': 'cpd_2',
+ }))
+ self.model.compounds.add_entry(CompoundEntry({
+ 'id': 'cpd_3',
+ }))
+
+ # Compartments
+ self.model.compartments.add_entry(CompartmentEntry({
+ 'id': 'c'
+ }))
+ self.model.compartments.add_entry(CompartmentEntry({
+ 'id': 'e'
+ }))
+
+ # Reactions
+ self.model.reactions.add_entry(ReactionEntry({
+ 'id': 'rxn_1',
+ 'equation': parse_reaction('(2) cpd_1[c] <=> cpd_2[e] + cpd_3[c]'),
+ 'genes': boolean.Expression('g_1 and (g_2 or g_3)'),
+ 'subsystem': 'Some subsystem'
+ }))
+ self.model.reactions.add_entry(ReactionEntry({
+ 'id': 'rxn_2',
+ 'equation': parse_reaction(
+ '(0.234) cpd_1[c] + (1.34) cpd_2[c] => cpd_3[c]'),
+ 'subsystem': 'Biomass'
+ }))
+
+ self.model.biomass_reaction = 'rxn_2'
+ self.model.limits['rxn_1'] = 'rxn_1', Decimal(-42), Decimal(1000)
+ self.model.exchange[Compound('cpd_2', 'e')] = (
+ Compound('cpd_2', 'e'), 'EX_cpd_2', -10, 0)
+
+ def tearDown(self):
+ shutil.rmtree(self.dest)
+
+ def test_get_default_compartment(self):
+ importer.get_default_compartment(self.model)
+
+ def test_detect_best_flux_limit(self):
+ flux_limit = importer.detect_best_flux_limit(self.model)
+ self.assertEqual(flux_limit, 1000)
+
+ def test_infer_compartment_entries(self):
+ self.model.compartments.clear()
+ importer.infer_compartment_entries(self.model)
+
+ self.assertEqual(len(self.model.compartments), 2)
+ self.assertEqual({c.id for c in self.model.compartments}, {'c', 'e'})
+
+ def test_get_output_limit(self):
+ self.assertIsNone(importer._get_output_limit(1000, 1000))
+ self.assertEqual(importer._get_output_limit(-5, 0), -5)
+
+ def test_generate_limit_items(self):
+ item = dict(importer._generate_limit_items(None, None))
+ self.assertEqual(item, {})
+
+ item = dict(importer._generate_limit_items(None, 400))
+ self.assertEqual(item, {'upper': 400})
+
+ item = dict(importer._generate_limit_items(-56, None))
+ self.assertEqual(item, {'lower': -56})
+
+ item = dict(importer._generate_limit_items(-224, 0))
+ self.assertEqual(item, {'lower': -224, 'upper': 0})
+
+ item = dict(importer._generate_limit_items(-5, -5))
+ self.assertEqual(item, {'fixed': -5})
+
+ def test_count_genes(self):
+ self.assertEqual(importer.count_genes(self.model), 3)
+
+ def test_write_yaml_model(self):
+ importer.write_yaml_model(self.model, self.dest)
+
+
+class TestSBMLImporter(unittest.TestCase):
+ def setUp(self):
+ self.dest = tempfile.mkdtemp()
+ with open(os.path.join(self.dest, 'model.sbml'), 'wb') as f:
+ f.write('''<?xml version="1.0" encoding="UTF-8"?>
+<sbml xmlns="http://www.sbml.org/sbml/level3/version1/core" level="3" version="1">
+ <model id="test_model" name="Test model">
+ <notes>
+ <body xmlns="http://www.w3.org/1999/xhtml">
+ <p>This is model information intended to be seen by humans.</p>
+ </body>
+ </notes>
+ <listOfCompartments>
+ <compartment id="C_c" name="cell" constant="true"/>
+ <compartment id="C_b" name="boundary" constant="true"/>
+ </listOfCompartments>
+ <listOfSpecies>
+ <species id="M_cpd_1" name="Compound 1" compartment="C_c" constant="false" boundaryCondition="false" hasOnlySubstanceUnits="false">
+ <notes>
+ <body xmlns="http://www.w3.org/1999/xhtml">
+ <p>This is compound information intended to be seen by humans.</p>