diff --git a/NEWS.md b/NEWS.md index df7048d0..a2e38d10 100644 --- a/NEWS.md +++ b/NEWS.md @@ -1,3 +1,13 @@ +v0.31 (2017-06-30) +------------------ + +- The `psamm-import` tool has been moved from the `psamm-import` package to + the main PSAMM package. This means that the `psamm-import` package is no + longer needed to import SBML files. The `psamm-import` package is still + needed to use the model-specific Excel importers. With this release + of PSAMM, the `psamm-import` package should be updated to at least 0.16. +- The tutorial was updated with additional sections on using gap-filling + procedures on models. v0.30 (2017-06-23) ------------------ diff --git a/README.rst b/README.rst index c8af42ca..63f90aaa 100644 --- a/README.rst +++ b/README.rst @@ -38,15 +38,7 @@ Use ``pip`` to install (it is recommended to use a Virtualenv_): $ pip install psamm -The ``psamm-import`` tool is developed in `a separate repository`_. After -installing PSAMM the ``psamm-import`` tool can be installed using: - -.. code:: shell - - $ pip install git+https://github.com/zhanglab/psamm-import.git - .. _Virtualenv: https://virtualenv.pypa.io/ -.. _a separate repository: https://github.com/zhanglab/psamm-import Documentation ------------- diff --git a/docs/install.rst b/docs/install.rst index 629eed2a..a2b494e7 100644 --- a/docs/install.rst +++ b/docs/install.rst @@ -25,8 +25,11 @@ the environment by running $ source env/bin/activate -The *psamm-import* tool is developed in a separate Git repository. After -installing PSAMM, the *psamm-import* tool can be installed using: +The *psamm-import* tool is included in the main PSAMM repository. Some +additional model-specific importers for Excel format models associated +with publications are maintained in a separate repository. After +installing PSAMM, support for these import functions can be added by +installing this additional package: + .. 
code-block:: shell diff --git a/docs/tutorial/curation.rst b/docs/tutorial/curation.rst index d13c7236..871279b7 100644 --- a/docs/tutorial/curation.rst +++ b/docs/tutorial/curation.rst @@ -473,11 +473,126 @@ a network based optimization to identify metabolites with no production pathways (psamm-env) $ psamm-model gapcheck --method gapfind --unrestricted-exchange These methods included in the ``gapcheck`` function can be used to identify various kinds of -'gaps' in a metabolic model network. `PSAMM` also includes two functions for filling these gaps +'gaps' in a metabolic model network. `PSAMM` also includes three functions for filling these gaps through the addition of artificial reactions or reactions from a supplied database. The -functions ``gapfill`` and ``fastgapfill`` can be used to perform these gapfilling procedures +functions ``gapfill``, ``fastgapfill``, and ``completepath`` can be used to perform these gap-filling procedures during the process of generating and curating a model. + +GapFill +~~~~~~~ + +The ``gapfill`` function in PSAMM can be used to apply a GapFill algorithm based on [Kumar07]_ to a metabolic model +to search for and identify reactions that can be added into a model to unblock the production of a specific +compound or set of compounds. To provide an example of how to utilize this ``gapfill`` function, a version of +the E. coli core model has been provided in the `tutorial-part-2/Gapfilling_Model/` directory. In this directory +is the E. coli core model with a small additional incomplete pathway added that contains the following reactions: + +.. code-block:: yaml + + - id: rxn1 + equation: succ_c[c] => a[c] + - id: rxn3 + equation: b[c] => c[c] + d[c] + + +This small additional pathway converts succinate to an artificial compound 'a'. The other reaction can convert compound +'b' to 'c' and 'd'. There is, however, no reaction to convert 'a' to 'b', and this can be considered a metabolic gap. 
+An additional reaction database, not included in the model itself, contains one more reaction: + +.. code-block:: yaml + + - id: rxn2 + equation: a[c] => b[c] + +This reaction, if added, would unblock the production of 'c' or 'd' by allowing the conversion +of compound 'a' to 'b'. In most cases, gap-filling on a model would use a larger database of non-model reactions. +For this test case, the production of compound 'd[c]' can be unblocked by running the following command: + +.. code-block:: shell + + (psamm-env) $ psamm-model gapfill --compound d[c] + +This command produces an output that first lists all of the reactions from the original metabolic model, then the +included gap-filling reactions with their associated penalty values, and lastly any reactions whose flux bounds the +gap-filling result suggests changing. A sample of the output is shown below:: + + .... + TPI Model 0 Dihydroxyacetone-phosphate[c] <=> Glyceraldehyde-3-phosphate[c] + rxn1 Model 0 Succinate[c] => a[c] + rxn3 Model 0 b[c] => c[c] + d[c] + rxn2 Add 1 a[c] => b[c] + +Some additional options can be used to refine the gap-filling. The first of these is the ``--no-implicit-sinks`` +option, which can be added to the command. If this option is used, the gap-filling will be performed with no +implicit sinks for compounds, meaning that all compounds produced need to be consumed by other reactions in the +metabolic model. By default, implicit sinks are added for all +compounds in the model, meaning that any compound that is produced in excess can be removed through the added sinks. + +The other way to refine the gap-filling procedure is through defining specific penalty values for the addition of +reactions from different sources. 
Penalties can be set for specific reactions in a gap-filling database +through a tab-separated file provided in the command using the ``--penalty`` option. Additionally, penalty values +for all database reactions can be set using the ``--db-penalty`` option followed by a penalty value. Similarly, +penalty values can be assigned to added transport reactions using the ``--tp-penalty`` option and to added +exchange reactions using the ``--ex-penalty`` option. An example of a command that applies these penalties +to a gap-filling simulation is as follows: + +.. code-block:: shell + + (psamm-env) $ psamm-model gapfill --compound d[c] --ex-penalty 100 --tp-penalty 10 --db-penalty 1 + +The ``gapfill`` function in PSAMM can be used throughout the model construction process to help identify potential +new reactions to add to a model and to explore how metabolic gaps affect the capabilities of a metabolic +network. + +FastGapFill +~~~~~~~~~~~ + +The ``fastgapfill`` function in `PSAMM` is a different gap-filling method that uses the FastGapFill algorithm +to attempt to generate a gap-filled model that is entirely flux consistent [Thiele14]_. The implementation +of this algorithm in `PSAMM` can be used to unblock an entire metabolic model or to unblock +specific reactions in a network. Often, unblocking all of the reactions in a model at the same time +will not produce the most meaningful and easy-to-understand results, so performing this function on only a +subset of reactions is preferable. To do this, the ``--subset`` option can be used to provide a file that +contains a list of reactions to unblock. In this example, that list would look like this: + +.. code-block:: shell + + rxn1 + rxn3 + + +This file can be provided to the command to unblock the small artificial pathway that was added to the E. coli core +model: + + +.. 
code-block:: shell + + (psamm-env) $ psamm-model fastgapfill --subset subset.tsv + +In this case, the output from this command will show the following:: + + .... + TPI Model 0 Dihydroxyacetone-phosphate[c] <=> Glyceraldehyde-3-phosphate[c] + rxn1 Model 0 Succinate[c] => a[c] + rxn3 Model 0 b[c] => c[c] + d[c] + EX_c[e] Add 1 c[e] <=> + EX_d[e] Add 1 d[e] <=> + EX_succ_c[e] Add 1 Succinate[e] <=> + TP_c[c]_c[e] Add 1 c[c] <=> c[e] + TP_d[c]_d[e] Add 1 d[c] <=> d[e] + TP_succ_c[c]_succ_c[e] Add 1 Succinate[c] <=> Succinate[e] + rxn2 Add 1 a[c] => b[c] + +The output first lists the model reactions, which are labeled with the 'Model' tag in the second column +of the output. `PSAMM` then lists any artificial exchange reactions and transporters, as well as any gap reactions +included from the larger database; these are labeled with the `Add` tag in the second column. When compared +to the ``gapfill`` results from the previous section, it can be seen that the ``fastgapfill`` result suggests +some artificial transporters and exchange reactions for certain compounds. This is because this method +tries to find a flux-consistent gap-filling solution. + +Penalty values can be assigned for different types of reactions in the same way as in the ``gapfill`` +command: ``--ex-penalty`` for artificial exchange reactions, ``--tp-penalty`` for artificial transporters, +``--db-penalty`` for new database reactions, and a penalty file provided +with the ``--penalty`` option for penalties on specific reactions. + Search Functions in PSAMM ------------------------- diff --git a/docs/tutorial/import_export.rst b/docs/tutorial/import_export.rst index 5d9842c1..f4e1ccd3 100644 --- a/docs/tutorial/import_export.rst +++ b/docs/tutorial/import_export.rst @@ -36,11 +36,25 @@ The ``psamm-import`` program supports the import of models in various formats. 
For the SBML format, it supports the COBRA-compliant SBML specifications, the FBC specifications, and the basic SBML specifications in levels 1, 2, and 3; for the JSON format, it supports the import of JSON files directly from the -`BiGG`_ database or from locally downloaded versions; -the support for importing from Excel file is model specific and are available -for 17 published models. There is also a generic Excel import for models -produced by the ModelSEED pipeline. To see a list of these models or model -formats that are supported, use the command: +`BiGG`_ database or from locally downloaded versions. + +Support for importing from Excel files is model specific and is available +for 17 published models. This import requires installing the separate +psamm-import package. There is also a generic Excel import for models +that were produced by older versions of ModelSEED. Models from the +current ModelSEED can be imported in the SBML format. + +To install the ``psamm-import`` package for Excel format models, use the following +command: + +.. code-block:: shell + + (psamm-env) $ pip install git+https://github.com/zhanglab/psamm-import.git + +This install makes the Excel importers available from the command line when the +``psamm-import`` program is called. + +To see a list of the models or model formats that are supported for import, use the command: .. _BiGG: http://bigg.ucsd.edu diff --git a/psamm/command.py b/psamm/command.py index 7beea830..a069f49f 100644 --- a/psamm/command.py +++ b/psamm/command.py @@ -489,7 +489,7 @@ def main(command_class=None, args=None): If no command class is specified the user will be able to select a specific command through the first command line argument. If the ``args`` are provided, these should be a list of strings that will be used instead of - ``sys.argv[1]``. This is mostly useful for testing. + ``sys.argv[1:]``. This is mostly useful for testing. 
""" # Set up logging for the command line interface diff --git a/psamm/datasource/sbml.py b/psamm/datasource/sbml.py index feee6627..32c4fb40 100644 --- a/psamm/datasource/sbml.py +++ b/psamm/datasource/sbml.py @@ -1403,12 +1403,9 @@ def convert_model_entries( Args: model: :class:`NativeModel`. """ - compartment_map = {} - compound_map = {} - reaction_map = {} - - def find_new_ids(entries, id_map, type_name): + def find_new_ids(entries): """Create new IDs for entries.""" + id_map = {} new_ids = set() for entry in entries: new_id = convert_id(entry) @@ -1418,15 +1415,17 @@ def find_new_ids(entries, id_map, type_name): else: raise ValueError( 'Entity ID {!r} is not unique after conversion'.format( - type_name, entry.id)) + entry.id)) id_map[entry.id] = new_id new_ids.add(new_id) + return id_map + # Find new IDs for all entries - find_new_ids(model.compartments, compartment_map, 'Compartment') - find_new_ids(model.compounds, compound_map, 'Compound') - find_new_ids(model.reactions, reaction_map, 'Reaction') + compartment_map = find_new_ids(model.compartments) + compound_map = find_new_ids(model.compounds) + reaction_map = find_new_ids(model.reactions) # Create new compartment entries new_compartments = [] diff --git a/psamm/importer.py b/psamm/importer.py new file mode 100644 index 00000000..415e2941 --- /dev/null +++ b/psamm/importer.py @@ -0,0 +1,813 @@ +# This file is part of PSAMM. +# +# PSAMM is free software: you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation, either version 3 of the License, or +# (at your option) any later version. +# +# PSAMM is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with PSAMM. 
If not, see <http://www.gnu.org/licenses/>. +# +# Copyright 2015-2017 Jon Lund Steffensen + +"""Entry points and functionality for importing models. + +This module contains entry points for the psamm-import programs and +functionality to assist in importing external file formats, converting +them into proper native models and writing those models to files. +""" + +from __future__ import print_function, unicode_literals + +import sys +import os +import argparse +import logging +import math +import re +import json +import codecs +from itertools import product +from collections import OrderedDict, Counter +from decimal import Decimal + +import yaml +import pkg_resources +from six import iteritems, itervalues, text_type +from six.moves.urllib.request import urlopen +from six.moves.urllib.parse import quote as url_quote + +from .datasource.native import ModelWriter +from .datasource.reaction import (parse_reaction, + ParseError as ReactionParseError) +from .datasource.entry import DictCompartmentEntry +from .datasource import sbml +from .expression import boolean +from . import formula + +from .util import mkdir_p + +# Threshold for putting reactions into subsystem files. +_MAX_REACTION_COUNT = 3 + +# Threshold for converting reactions into dictionary representation. +_MAX_REACTION_LENGTH = 10 + +logger = logging.getLogger(__name__) + + +class ImportError(Exception): + """Exception used to signal a general import error.""" + + +class ModelLoadError(ImportError): + """Exception used to signal an error loading the model files.""" + + +class ParseError(ImportError): + """Exception used to signal an error parsing the model files.""" + + +class Importer(object): + """Base importer class.""" + + def _try_parse_formula(self, compound_id, s): + """Try to parse the given compound formula string. + + Logs a warning if the formula could not be parsed. + """ + s = s.strip() + if s == '': + return None + + try: + # Do not return the parsed formula. 
For now it is better to keep + # the original formula string unchanged in all cases. + formula.Formula.parse(s) + except formula.ParseError: + logger.warning('Unable to parse compound formula {}: {}'.format( + compound_id, s)) + + return s + + def _try_parse_reaction(self, reaction_id, s, + parser=parse_reaction, **kwargs): + """Try to parse the given reaction equation string. + + Returns the parsed Reaction object, or raises an error if the reaction + could not be parsed. + """ + try: + return parser(s, **kwargs) + except ReactionParseError as e: + if e.indicator is not None: + logger.error('{}\n{}\n{}'.format( + str(e), s, e.indicator)) + raise ParseError('Unable to parse reaction {}: {}'.format( + reaction_id, s)) + + def _try_parse_gene_association(self, reaction_id, s): + """Try to parse the given gene association rule. + + Logs a warning if the association rule could not be parsed and returns + the original string. Otherwise, returns the boolean.Expression object. + """ + s = s.strip() + if s == '': + return None + + try: + return boolean.Expression(s) + except boolean.ParseError as e: + msg = 'Failed to parse gene association for {}: {}'.format( + reaction_id, text_type(e)) + if e.indicator is not None: + msg += '\n{}\n{}'.format(s, e.indicator) + logger.warning(msg) + + return s + + +# Define custom dict representers for YAML +# This allows reading/writing Python OrderedDicts in the correct order. 
+# See: https://stackoverflow.com/questions/5121931/in-python-how-can-you-load-yaml-mappings-as-ordereddicts # noqa +def _dict_representer(dumper, data): + return dumper.represent_dict(iteritems(data)) + + +def _dict_constructor(loader, node): + return OrderedDict(loader.construct_pairs(node)) + + +def _set_representer(dumper, data): + return dumper.represent_list(iter(data)) + + +def _boolean_expression_representer(dumper, data): + return dumper.represent_unicode(text_type(data)) + + +def _decimal_representer(dumper, data): + # Code from float_representer in PyYAML. + if data % 1 == 0: + return dumper.represent_int(int(data)) + elif math.isnan(data): + value = '.nan' + elif data == float('inf'): + value = '.inf' + elif data == float('-inf'): + value = '-.inf' + else: + value = text_type(data).lower() + if '.' not in value and 'e' in value: + value = value.replace('e', '.0e', 1) + return dumper.represent_scalar('tag:yaml.org,2002:float', value) + + +def get_default_compartment(model): + """Return what the default compartment should be set to. + + If some compounds have no compartment, a unique compartment + name is returned to avoid collisions. + """ + default_compartment = 'c' + default_key = set() + for reaction in model.reactions: + equation = reaction.equation + if equation is None: + continue + + for compound, _ in equation.compounds: + default_key.add(compound.compartment) + + if None in default_key and default_compartment in default_key: + suffix = 1 + while True: + new_key = '{}_{}'.format(default_compartment, suffix) + if new_key not in default_key: + default_compartment = new_key + break + suffix += 1 + + if None in default_key: + logger.warning( + 'Compound(s) found without compartment, default' + ' compartment is set to {}.'.format(default_compartment)) + return default_compartment + + +def detect_best_flux_limit(model): + """Detect the best default flux limit to use for model output. 
+ + The default flux limit does not change the model, but selecting a good + value reduces the amount of output produced and reduces clutter in the + output files. + """ + flux_limit_count = Counter() + + for reaction in model.reactions: + if reaction.id not in model.limits: + continue + + equation = reaction.properties['equation'] + if equation is None: + continue + + _, lower, upper = model.limits[reaction.id] + if upper is not None and upper > 0 and equation.direction.forward: + flux_limit_count[upper] += 1 + if lower is not None and -lower > 0 and equation.direction.reverse: + flux_limit_count[-lower] += 1 + + if len(flux_limit_count) == 0: + return None + + best_flux_limit, _ = flux_limit_count.most_common(1)[0] + return best_flux_limit + + +def reactions_to_files(model, dest, writer, split_subsystem): + """Turn the reaction subsystems into their own files. + + If a subsystem has a number of reactions over the threshold, it gets its + own YAML file. All other reactions, those that don't have a subsystem or + are in a subsystem that falls below the threshold, get added to a common + reaction file. + + Args: + model: :class:`psamm_import.model.MetabolicModel`. + dest: output path for model files. + writer: :class:`psamm.datasource.native.ModelWriter`. + split_subsystem: Divide reactions into multiple files by subsystem. 
+ """ + def safe_file_name(origin_name): + safe_name = re.sub( + r'\W+', '_', origin_name, flags=re.UNICODE) + safe_name = re.sub( + r'_+', '_', safe_name.lower(), flags=re.UNICODE) + safe_name = safe_name.strip('_') + return safe_name + + common_reactions = [] + reaction_files = [] + if not split_subsystem: + common_reactions = sorted(model.reactions, key=lambda r: r.id) + if len(common_reactions) > 0: + reaction_file = 'reactions.yaml' + with open(os.path.join(dest, reaction_file), 'w') as f: + writer.write_reactions(f, common_reactions) + reaction_files.append(reaction_file) + else: + subsystems = {} + for reaction in sorted(model.reactions, key=lambda r: r.id): + if 'subsystem' in reaction.properties: + subsystem_file = safe_file_name( + reaction.properties['subsystem']) + subsystems.setdefault(subsystem_file, []).append(reaction) + else: + common_reactions.append(reaction) + + subsystem_folder = 'reactions' + sub_existance = False + for subsystem_file, reactions in iteritems(subsystems): + if len(reactions) < _MAX_REACTION_COUNT: + for reaction in reactions: + common_reactions.append(reaction) + else: + if len(reactions) > 0: + mkdir_p(os.path.join(dest, subsystem_folder)) + subsystem_file = os.path.join( + subsystem_folder, '{}.yaml'.format(subsystem_file)) + + with open(os.path.join(dest, subsystem_file), 'w') as f: + writer.write_reactions(f, reactions) + reaction_files.append(subsystem_file) + sub_existance = True + + reaction_files.sort() + if sub_existance: + reaction_file = os.path.join( + subsystem_folder, 'other_reactions.yaml') + else: + reaction_file = 'reactions.yaml' + if len(common_reactions) > 0: + with open(os.path.join(dest, reaction_file), 'w') as f: + writer.write_reactions(f, common_reactions) + reaction_files.append(reaction_file) + + return reaction_files + + +def _get_output_limit(limit, default_limit): + """Return limit to output given default limit.""" + return limit if limit != default_limit else None + + +def 
_generate_limit_items(lower, upper): + """Yield key, value pairs for limits dictionary. + + Yield pairs of key, value where key is ``lower``, ``upper`` or ``fixed``. + A key, value pair is emitted if the bounds are not None. + """ + # Use value + 0 to convert any -0.0 to 0.0 which looks better. + if lower is not None and upper is not None and lower == upper: + yield 'fixed', upper + 0 + else: + if lower is not None: + yield 'lower', lower + 0 + if upper is not None: + yield 'upper', upper + 0 + + +def model_exchange(model): + """Return exchange definition as YAML dict.""" + # Determine the default flux limits. If the value is already at the + # default it does not need to be included in the output. + lower_default, upper_default = None, None + if model.default_flux_limit is not None: + lower_default = -model.default_flux_limit + upper_default = model.default_flux_limit + + compounds = [] + for compound, reaction_id, lower, upper in sorted( + itervalues(model.exchange)): + d = OrderedDict([('id', compound.name)]) + if reaction_id is not None: + d['reaction'] = reaction_id + + lower = _get_output_limit(lower, lower_default) + upper = _get_output_limit(upper, upper_default) + d.update(_generate_limit_items(lower, upper)) + + compounds.append(d) + + return OrderedDict([('compounds', compounds)]) + + +def model_reaction_limits(model): + """Yield model reaction limits as YAML dicts.""" + for reaction in sorted(model.reactions, key=lambda r: r.id): + equation = reaction.properties.get('equation') + if equation is None: + continue + + # Determine the default flux limits. If the value is already at the + # default it does not need to be included in the output. 
+ lower_default, upper_default = None, None + if model.default_flux_limit is not None: + if equation.direction.reverse: + lower_default = -model.default_flux_limit + else: + lower_default = 0.0 + + if equation.direction.forward: + upper_default = model.default_flux_limit + else: + upper_default = 0.0 + + lower_flux, upper_flux = None, None + if reaction.id in model.limits: + _, lower, upper = model.limits[reaction.id] + lower_flux = _get_output_limit(lower, lower_default) + upper_flux = _get_output_limit(upper, upper_default) + + if lower_flux is not None or upper_flux is not None: + d = OrderedDict([('reaction', reaction.id)]) + d.update(_generate_limit_items(lower_flux, upper_flux)) + + yield d + + +def infer_compartment_entries(model): + """Infer compartment entries for model based on reaction compounds.""" + compartment_ids = set() + for reaction in model.reactions: + equation = reaction.equation + if equation is None: + continue + + for compound, _ in equation.compounds: + compartment = compound.compartment + if compartment is None: + compartment = model.default_compartment + + if compartment is not None: + compartment_ids.add(compartment) + + for compartment in compartment_ids: + if compartment in model.compartments: + continue + + entry = DictCompartmentEntry(dict(id=compartment)) + model.compartments.add_entry(entry) + + +def infer_compartment_adjacency(model): + """Infer compartment adjacency for model based on reactions.""" + def reaction_compartments(seq): + for compound, _ in seq: + compartment = compound.compartment + if compartment is None: + compartment = model.default_compartment + + if compartment is not None: + yield compartment + + for reaction in model.reactions: + equation = reaction.equation + if equation is None: + continue + + left = reaction_compartments(equation.left) + right = reaction_compartments(equation.right) + for c1, c2 in product(left, right): + if c1 == c2: + continue + model.compartment_boundaries.add((c1, c2)) + 
model.compartment_boundaries.add((c2, c1)) + + +def count_genes(model): + """Count the number of distinct genes in model reactions.""" + genes = set() + for reaction in model.reactions: + if reaction.genes is None: + continue + + if isinstance(reaction.genes, boolean.Expression): + genes.update(v.symbol for v in reaction.genes.variables) + else: + genes.update(reaction.genes) + + return len(genes) + + +def write_yaml_model(model, dest='.', convert_exchange=True, + split_subsystem=True): + """Write the given MetabolicModel to YAML files in dest folder. + + The parameter ``convert_exchange`` indicates whether the exchange reactions + should be converted automatically to an exchange file. + """ + yaml.SafeDumper.add_representer(OrderedDict, _dict_representer) + yaml.SafeDumper.add_representer(set, _set_representer) + yaml.SafeDumper.add_representer(frozenset, _set_representer) + yaml.SafeDumper.add_representer( + boolean.Expression, _boolean_expression_representer) + yaml.SafeDumper.add_representer(Decimal, _decimal_representer) + + yaml.SafeDumper.ignore_aliases = lambda *args: True + + yaml.add_constructor(yaml.resolver.BaseResolver.DEFAULT_MAPPING_TAG, + _dict_constructor) + + yaml_args = {'default_flow_style': False, + 'encoding': 'utf-8', + 'allow_unicode': True, + 'width': 79} + + # The ModelWriter from PSAMM is not yet able to write the full model but + # only reactions and compounds. 
+ writer = ModelWriter() + + with open(os.path.join(dest, 'compounds.yaml'), 'w+') as f: + writer.write_compounds(f, sorted(model.compounds, key=lambda c: c.id)) + + if model.default_flux_limit is None: + model.default_flux_limit = detect_best_flux_limit(model) + + if model.extracellular_compartment is None: + model.extracellular_compartment = ( + sbml.detect_extracellular_compartment(model)) + + if model.default_compartment is None: + model.default_compartment = get_default_compartment(model) + + if model.default_flux_limit is not None: + logger.info('Using default flux limit of {}'.format( + model.default_flux_limit)) + + if convert_exchange: + logger.info('Converting exchange reactions to exchange file') + sbml.convert_exchange_to_compounds(model) + + if len(model.compartments) == 0: + infer_compartment_entries(model) + logger.info('Inferred {} compartments: {}'.format( + len(model.compartments), + ', '.join(c.id for c in model.compartments))) + + if len(model.compartments) != 0 and len(model.compartment_boundaries) == 0: + infer_compartment_adjacency(model) + + reaction_files = reactions_to_files(model, dest, writer, split_subsystem) + + if len(model.exchange) > 0: + with open(os.path.join(dest, 'exchange.yaml'), 'w+') as f: + yaml.safe_dump(model_exchange(model), f, **yaml_args) + + reaction_limits = list(model_reaction_limits(model)) + if len(reaction_limits) > 0: + with open(os.path.join(dest, 'limits.yaml'), 'w+') as f: + yaml.safe_dump(reaction_limits, f, **yaml_args) + + model_d = OrderedDict() + if model.name is not None: + model_d['name'] = model.name + + if model.biomass_reaction is not None: + model_d['biomass'] = model.biomass_reaction + if model.default_flux_limit is not None: + model_d['default_flux_limit'] = model.default_flux_limit + if model.extracellular_compartment != 'e': + model_d['extracellular'] = model.extracellular_compartment + if model.default_compartment != 'c': + model_d['default_compartment'] = model.default_compartment + + if 
len(model.compartments) > 0: + adjacency = {} + for c1, c2 in model.compartment_boundaries: + adjacency.setdefault(c1, set()).add(c2) + adjacency.setdefault(c2, set()).add(c1) + + compartment_list = [] + for compartment in sorted(model.compartments, key=lambda c: c.id): + adjacent = adjacency.get(compartment.id) + if adjacent is not None and len(adjacent) == 1: + adjacent = next(iter(adjacent)) + compartment_list.append(writer.convert_compartment_entry( + compartment, adjacent)) + + model_d['compartments'] = compartment_list + + model_d['compounds'] = [{'include': 'compounds.yaml'}] + model_d['reactions'] = [] + for reaction_file in reaction_files: + model_d['reactions'].append({'include': reaction_file}) + + if len(model.exchange) > 0: + model_d['exchange'] = [{'include': 'exchange.yaml'}] + + if len(reaction_limits) > 0: + model_d['limits'] = [{'include': 'limits.yaml'}] + + with open(os.path.join(dest, 'model.yaml'), 'w+') as f: + yaml.safe_dump(model_d, f, **yaml_args) + + +def main(importer_class=None, args=None): + """Entry point for import program. + + If the ``args`` are provided, these should be a list of strings that will + be used instead of ``sys.argv[1:]``. This is mostly useful for testing. 
+ """ + parser = argparse.ArgumentParser( + description='Import from external model formats') + parser.add_argument('--source', metavar='path', default='.', + help='Source directory or file') + parser.add_argument('--dest', metavar='path', default='.', + help='Destination directory (default is ".")') + parser.add_argument('--no-exchange', action='store_true', + help=('Disable importing exchange reactions as' + ' exchange compound file.')) + parser.add_argument('--split-subsystem', action='store_true', + help='Enable splitting reaction files by subsystem') + parser.add_argument('--merge-compounds', action='store_true', + help=('Merge identical compounds occuring in various' + ' compartments.')) + parser.add_argument('--force', action='store_true', + help='Enable overwriting model files') + + if importer_class is None: + parser.add_argument( + 'format', help='Format to import ("list" to see all)') + + args = parser.parse_args(args) + + # Set up logging for the command line interface + if 'PSAMM_DEBUG' in os.environ: + level = getattr(logging, os.environ['PSAMM_DEBUG'].upper(), None) + if level is not None: + logging.basicConfig(level=level) + else: + logging.basicConfig( + level=logging.INFO, format='%(levelname)s: %(message)s') + + if importer_class is None: + # Discover all available model importers + importers = {} + for importer_entry in pkg_resources.iter_entry_points( + 'psamm.importer'): + canonical = importer_entry.name.lower() + if canonical not in importers: + importers[canonical] = importer_entry + else: + logger.warning('Importer {} was found more than once!'.format( + importer_entry.name)) + + # Print list of importers + if args.format in ('list', 'help'): + if len(importers) == 0: + logger.error('No importers found!') + else: + importer_classes = [] + for name, entry in iteritems(importers): + importer_class = entry.load() + title = getattr(importer_class, 'title', None) + generic = getattr(importer_class, 'generic', False) + if title is not None: + 
importer_classes.append( + (title, generic, name, importer_class)) + + print('Generic importers:') + for title, _, name, importer_class in sorted( + c for c in importer_classes if c[1]): + print('{:<12} {}'.format(name, title)) + + print() + print('Model-specific importers:') + for title, _, name, importer_class in sorted( + c for c in importer_classes if not c[1]): + print('{:<12} {}'.format(name, title)) + sys.exit(0) + + importer_name = args.format.lower() + if importer_name not in importers: + logger.error('Importer {} not found!'.format(importer_name)) + logger.info('Use "list" to see available importers.') + sys.exit(-1) + + importer_class = importers[importer_name].load() + + importer = importer_class() + + try: + model = importer.import_model(args.source) + except ModelLoadError as e: + logger.error('Failed to load model!', exc_info=True) + importer.help() + parser.error(text_type(e)) + except ParseError as e: + logger.error('Failed to parse model!', exc_info=True) + logger.error(text_type(e)) + sys.exit(-1) + + if args.merge_compounds: + compounds_before = len(model.compounds) + sbml.merge_equivalent_compounds(model) + if len(model.compounds) < compounds_before: + logger.info( + 'Merged {} compound entries into {} entries by' + ' removing duplicates in various compartments'.format( + compounds_before, len(model.compounds))) + + print('Model: {}'.format(model.name)) + print('- Biomass reaction: {}'.format(model.biomass_reaction)) + print('- Compartments: {}'.format(len(model.compartments))) + print('- Compounds: {}'.format(len(model.compounds))) + print('- Reactions: {}'.format(len(model.reactions))) + print('- Genes: {}'.format(count_genes(model))) + + # Check if dest directory is empty. If we get an error assume that the + # directory does not exist. 
+    dest_is_empty = False
+    try:
+        dest_is_empty = len(os.listdir(args.dest)) == 0
+    except OSError:
+        dest_is_empty = True
+
+    if not dest_is_empty:
+        if not args.force:
+            logger.error('Destination directory is not empty. Use --force'
+                         ' option to proceed anyway, overwriting any existing'
+                         ' files in {}'.format(args.dest))
+            return 1
+        else:
+            logger.warning('Destination directory is not empty, overwriting'
+                           ' existing files in {}'.format(args.dest))
+
+    # Create destination directory if not exists
+    dest = args.dest
+    mkdir_p(dest)
+
+    convert_exchange = not args.no_exchange
+    write_yaml_model(model, dest, convert_exchange=convert_exchange,
+                     split_subsystem=args.split_subsystem)
+
+
+def main_bigg(args=None, urlopen=urlopen):
+    """Entry point for BiGG import program.
+
+    If the ``args`` are provided, these should be a list of strings that will
+    be used instead of ``sys.argv[1:]``. This is mostly useful for testing.
+    """
+    parser = argparse.ArgumentParser(
+        description='Import from BiGG database')
+    parser.add_argument('--dest', metavar='path', default='.',
+                        help='Destination directory (default is ".")')
+    parser.add_argument('--no-exchange', action='store_true',
+                        help=('Disable importing exchange reactions as'
+                              ' exchange compound file.'))
+    parser.add_argument('--split-subsystem', action='store_true',
+                        help='Enable splitting reaction files by subsystem')
+    parser.add_argument('--merge-compounds', action='store_true',
+                        help=('Merge identical compounds occurring in various'
+                              ' compartments.'))
+    parser.add_argument('--force', action='store_true',
+                        help='Enable overwriting model files')
+    parser.add_argument('id', help='BiGG model to import ("list" to see all)')
+
+    args = parser.parse_args(args)
+
+    # Set up logging for the command line interface
+    if 'PSAMM_DEBUG' in os.environ:
+        level = getattr(logging, os.environ['PSAMM_DEBUG'].upper(), None)
+        if level is not None:
+            logging.basicConfig(level=level)
+    else:
+        logging.basicConfig(
+
level=logging.INFO, format='%(levelname)s: %(message)s') + + # Print list of available models + if args.id == 'list': + print('Available models:') + f = urlopen('http://bigg.ucsd.edu/api/v2/models') + doc = json.loads(f.read().decode('utf-8')) + results = doc['results'] + id_width = min(max(len(result['bigg_id']) for result in results), 16) + for result in sorted(results, key=lambda x: x.get('organism')): + print('{} {}'.format( + result.get('bigg_id').ljust(id_width), result.get('organism'))) + return 0 + + importer_entry = None + try: + importer_entry = next( + pkg_resources.iter_entry_points('psamm.importer', 'JSON')) + except StopIteration: + logger.error('Failed to locate the COBRA JSON model importer!') + sys.exit(-1) + + importer_class = importer_entry.load() + importer = importer_class() + + try: + f = urlopen( + 'http://bigg.ucsd.edu/api/v2/models/{}/download'.format( + url_quote(args.id))) + model = importer.import_model(codecs.getreader('utf-8')(f)) + except ModelLoadError as e: + logger.error('Failed to load model!', exc_info=True) + importer.help() + parser.error(text_type(e)) + except ParseError as e: + logger.error('Failed to parse model!', exc_info=True) + logger.error(text_type(e)) + sys.exit(-1) + + if args.merge_compounds: + compounds_before = len(model.compounds) + sbml.merge_equivalent_compounds(model) + if len(model.compounds) < compounds_before: + logger.info( + 'Merged {} compound entries into {} entries by' + ' removing duplicates in various compartments'.format( + compounds_before, len(model.compounds))) + + print('Model: {}'.format(model.name)) + print('- Biomass reaction: {}'.format(model.biomass_reaction)) + print('- Compartments: {}'.format(len(model.compartments))) + print('- Compounds: {}'.format(len(model.compounds))) + print('- Reactions: {}'.format(len(model.reactions))) + print('- Genes: {}'.format(count_genes(model))) + + # Check if dest directory is empty. If we get an error assume that the + # directory does not exist. 
+ dest_is_empty = False + try: + dest_is_empty = len(os.listdir(args.dest)) == 0 + except OSError: + dest_is_empty = True + + if not dest_is_empty: + if not args.force: + logger.error('Destination directory is not empty. Use --force' + ' option to proceed anyway, overwriting any existing' + ' files in {}'.format(args.dest)) + return 1 + else: + logger.warning('Destination directory is not empty, overwriting' + ' existing files in {}'.format(args.dest)) + + # Create destination directory if not exists + dest = args.dest + mkdir_p(dest) + + convert_exchange = not args.no_exchange + write_yaml_model(model, dest, convert_exchange=convert_exchange, + split_subsystem=args.split_subsystem) diff --git a/psamm/importers/__init__.py b/psamm/importers/__init__.py new file mode 100644 index 00000000..465ef578 --- /dev/null +++ b/psamm/importers/__init__.py @@ -0,0 +1,18 @@ +# This file is part of PSAMM. +# +# PSAMM is free software: you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation, either version 3 of the License, or +# (at your option) any later version. +# +# PSAMM is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with PSAMM. If not, see . +# +# Copyright 2014-2015 Jon Lund Steffensen + +"""Importers for various external model formats.""" diff --git a/psamm/importers/cobrajson.py b/psamm/importers/cobrajson.py new file mode 100644 index 00000000..ba54ddde --- /dev/null +++ b/psamm/importers/cobrajson.py @@ -0,0 +1,180 @@ +# This file is part of PSAMM. 
+# +# PSAMM is free software: you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation, either version 3 of the License, or +# (at your option) any later version. +# +# PSAMM is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with PSAMM. If not, see . +# +# Copyright 2015-2017 Jon Lund Steffensen + +"""Importer for the COBRApy JSON format.""" + +import os +import glob +import json +import logging +import decimal + +from six import iteritems + +from ..reaction import Reaction, Compound, Direction +from ..datasource import native +from ..datasource.entry import (DictCompoundEntry as CompoundEntry, + DictReactionEntry as ReactionEntry, + DictCompartmentEntry as CompartmentEntry) + +from ..importer import Importer as BaseImporter, ModelLoadError + +logger = logging.getLogger(__name__) + + +def _float_parser(num_str): + num = float(num_str) + if num.is_integer(): + return int(num) + else: + return decimal.Decimal(num_str) + + +class Importer(BaseImporter): + """Read metabolic model from COBRApy JSON format.""" + + name = 'json' + title = 'COBRApy JSON' + generic = True + + def help(self): + """Print help text for importer.""" + print('Source must contain the model definition in COBRApy JSON' + ' format.\n' + 'Expected files in source directory:\n' + '- *.json') + + def _resolve_source(self, source): + """Resolve source to filepath if it is a directory.""" + if os.path.isdir(source): + sources = glob.glob(os.path.join(source, '*.json')) + if len(sources) == 0: + raise ModelLoadError('No .json file found in source directory') + elif len(sources) > 1: + raise ModelLoadError( + 'More than one .json file found in source directory') + 
return sources[0] + return source + + def _import(self, file): + model_doc = json.load(file, parse_float=_float_parser) + model = native.NativeModel() + model.compartments.update(self._read_compartments(model_doc)) + model.compounds.update(self._read_compounds(model_doc)) + + for reaction, lower, upper in self._read_reactions(model_doc): + model.reactions.add_entry(reaction) + if lower is not None and upper is not None: + model.limits[reaction.id] = reaction.id, lower, upper + + model.name = model_doc.get('id', 'COBRA JSON model') + + biomass_reaction = None + objective_reactions = set() + for reaction in model_doc['reactions']: + if reaction.get('objective_coefficient', 0) != 0: + objective_reactions.add(reaction['id']) + + if len(objective_reactions) == 1: + biomass_reaction = next(iter(objective_reactions)) + logger.info('Detected biomass reaction: {}'.format( + biomass_reaction)) + elif len(objective_reactions) > 1: + logger.warning( + 'Multiple reactions are used as the' + ' biomass reaction: {}'.format(objective_reactions)) + + model.biomass_reaction = biomass_reaction + + # Set compartment in reaction compounds + compound_compartment = {} + for compound in model.compounds: + compartment = compound.properties.pop('compartment') + compound_compartment[compound.id] = compartment + + for reaction in model.reactions: + equation = reaction.equation + if equation is None: + continue + + # Translate compartments in reaction + left = ((c.in_compartment(compound_compartment[c.name]), v) for + c, v in equation.left) + right = ((c.in_compartment(compound_compartment[c.name]), v) for + c, v in equation.right) + reaction.properties['equation'] = Reaction( + equation.direction, left, right) + + return model + + def _read_compartments(self, doc): + for compartment, name in iteritems(doc.get('compartments', {})): + yield CompartmentEntry(dict(id=compartment, name=name)) + + def _read_compounds(self, doc): + for compound in doc['metabolites']: + entry = 
CompoundEntry(compound) + if 'formula' in entry.properties: + formula = self._try_parse_formula( + entry.id, entry.properties['formula']) + if formula is not None: + entry.properties['formula'] = formula + yield entry + + def _parse_reaction_equation(self, entry): + metabolites = entry.properties.pop('metabolites') + compounds = ((Compound(metabolite), value) + for metabolite, value in iteritems(metabolites)) + if (entry.properties.get('lower_bound') == 0 and + entry.properties.get('upper_bound') != 0): + direction = Direction.Forward + elif (entry.properties.get('lower_bound') != 0 and + entry.properties.get('upper_bound') == 0): + direction = Direction.Reverse + else: + direction = Direction.Both + return Reaction(direction, compounds) + + def _read_reactions(self, doc): + for reaction in doc['reactions']: + entry = ReactionEntry(reaction) + + entry.properties['equation'] = ( + self._parse_reaction_equation(entry)) + + lower, upper = None, None + + if 'lower_bound' in entry.properties: + lower = entry.properties.pop('lower_bound') + if 'upper_bound' in entry.properties: + upper = entry.properties.pop('upper_bound') + + if 'gene_reaction_rule' in entry.properties: + genes = self._try_parse_gene_association( + entry.id, entry.properties.pop('gene_reaction_rule')) + if genes is not None: + entry.properties['genes'] = genes + + yield entry, lower, upper + + def import_model(self, source): + """Import and return model instance.""" + if not hasattr(source, 'read'): # Not a File-like object + with open(self._resolve_source(source), 'r') as f: + return self._import(f) + else: + return self._import(source) diff --git a/psamm/importers/sbml.py b/psamm/importers/sbml.py new file mode 100644 index 00000000..354c5f2f --- /dev/null +++ b/psamm/importers/sbml.py @@ -0,0 +1,123 @@ +# This file is part of PSAMM. 
+# +# PSAMM is free software: you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation, either version 3 of the License, or +# (at your option) any later version. +# +# PSAMM is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with PSAMM. If not, see . +# +# Copyright 2015-2017 Jon Lund Steffensen + +"""SBML importers.""" + +import os +import glob +import logging + + +from ..datasource import sbml +from ..datasource.context import FilePathContext +from ..datasource.entry import (DictCompartmentEntry, DictCompoundEntry, + DictReactionEntry) + +from ..importer import Importer, ParseError, ModelLoadError + +logger = logging.getLogger(__name__) + + +class BaseImporter(Importer): + """Base importer for reading metabolic model from an SBML file.""" + + def help(self): + """Print importer help text.""" + print('Source must contain the model definition in SBML format.\n' + 'Expected files in source directory:\n' + '- *.sbml') + + def _resolve_source(self, source): + """Resolve source to filepath if it is a directory.""" + if os.path.isdir(source): + sources = glob.glob(os.path.join(source, '*.sbml')) + if len(sources) == 0: + raise ModelLoadError('No .sbml file found in source directory') + elif len(sources) > 1: + raise ModelLoadError( + 'More than one .sbml file found in source directory') + return sources[0] + return source + + def _get_reader(self, f): + raise NotImplementedError('Subclasses must implement _get_reader()') + + def import_model(self, source): + """Import and return model instance.""" + source = self._resolve_source(source) + self._context = FilePathContext(source) + with self._context.open() as f: + self._reader = 
self._open_reader(f) + + return self._reader.create_model() + + +class StrictImporter(BaseImporter): + """Read metabolic model from an SBML file using strict parser.""" + + name = 'SBML-strict' + title = 'SBML model (strict)' + generic = True + + def _open_reader(self, f): + try: + return sbml.SBMLReader(f, strict=True, ignore_boundary=True) + except sbml.ParseError as e: + raise ParseError(e) + + def _translate_compartment(self, entry, new_id): + return DictCompartmentEntry(entry, new_id) + + def _translate_compound(self, entry, new_id, compartment_map): + return DictCompoundEntry(entry, new_id) + + def _translate_reaction( + self, entry, new_id, compartment_map, compound_map): + return DictReactionEntry(entry, new_id) + + def import_model(self, source): + """Import and return model instance.""" + model = super(StrictImporter, self).import_model(source) + + # Translate entries into dict-based entries. + sbml.convert_model_entries( + model, convert_id=lambda entry: entry.id, + translate_compartment=self._translate_compartment, + translate_compound=self._translate_compound, + translate_reaction=self._translate_reaction) + + return model + + +class NonstrictImporter(BaseImporter): + """Read metabolic model from an SBML file using non-strict parser.""" + + name = 'SBML' + title = 'SBML model (non-strict)' + generic = True + + def _open_reader(self, f): + try: + return sbml.SBMLReader(f, strict=False, ignore_boundary=True) + except sbml.ParseError as e: + raise ParseError(e) + + def import_model(self, source): + """Import and return model instance.""" + model = super(NonstrictImporter, self).import_model(source) + sbml.convert_sbml_model(model) + return model diff --git a/psamm/tests/test_importer.py b/psamm/tests/test_importer.py new file mode 100644 index 00000000..7b3788e7 --- /dev/null +++ b/psamm/tests/test_importer.py @@ -0,0 +1,401 @@ +# This file is part of PSAMM. 
+# +# PSAMM is free software: you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation, either version 3 of the License, or +# (at your option) any later version. +# +# PSAMM is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with PSAMM. If not, see . +# +# Copyright 2017 Jon Lund Steffensen + +from __future__ import unicode_literals + +import os +import tempfile +import shutil +import json +import unittest +from decimal import Decimal + +from six import BytesIO + +from psamm import importer +from psamm.datasource.native import NativeModel +from psamm.datasource.entry import (DictCompoundEntry as CompoundEntry, + DictReactionEntry as ReactionEntry, + DictCompartmentEntry as CompartmentEntry) +from psamm.datasource.reaction import parse_reaction +from psamm.reaction import Reaction, Compound +from psamm.formula import Formula +from psamm.expression import boolean + +from psamm.importers.sbml import StrictImporter as SBMLStrictImporter +from psamm.importers.sbml import NonstrictImporter as SBMLNonstrictImporter +from psamm.importers.cobrajson import Importer as CobraJSONImporter + + +class TestImporterBaseClass(unittest.TestCase): + def setUp(self): + self.importer = importer.Importer() + + def test_try_parse_formula(self): + result = self.importer._try_parse_formula('cpd_1', ' ') + self.assertIsNone(result) + + # Should not return parsed result! 
+ result = self.importer._try_parse_formula('cpd_1', 'CO2') + self.assertEqual(result, 'CO2') + + result = self.importer._try_parse_formula('cpd_1', 'XYZ...\u03c0') + self.assertEqual(result, 'XYZ...\u03c0') + + def test_try_parse_reaction(self): + result = self.importer._try_parse_reaction('rxn_1', 'A => (3) B') + self.assertIsInstance(result, Reaction) + + with self.assertRaises(importer.ParseError): + self.importer._try_parse_reaction('rxn_1', '+ + ==> \u03c0 5') + + def test_try_parse_gene_association(self): + result = self.importer._try_parse_gene_association( + 'rxn_1', 'g1 and (g2 or g3)') + self.assertIsInstance(result, boolean.Expression) + + result = self.importer._try_parse_gene_association('rxn_1', ' ') + self.assertIsNone(result) + + result = self.importer._try_parse_gene_association('rxn_1', '123 and') + self.assertEqual(result, '123 and') + + +class TestImporterCheckAndWriteModel(unittest.TestCase): + def setUp(self): + self.dest = tempfile.mkdtemp() + + self.model = NativeModel({ + 'id': 'test_mode', + 'name': 'Test model', + }) + + # Compounds + self.model.compounds.add_entry(CompoundEntry({ + 'id': 'cpd_1', + 'formula': Formula.parse('CO2') + })) + self.model.compounds.add_entry(CompoundEntry({ + 'id': 'cpd_2', + })) + self.model.compounds.add_entry(CompoundEntry({ + 'id': 'cpd_3', + })) + + # Compartments + self.model.compartments.add_entry(CompartmentEntry({ + 'id': 'c' + })) + self.model.compartments.add_entry(CompartmentEntry({ + 'id': 'e' + })) + + # Reactions + self.model.reactions.add_entry(ReactionEntry({ + 'id': 'rxn_1', + 'equation': parse_reaction('(2) cpd_1[c] <=> cpd_2[e] + cpd_3[c]'), + 'genes': boolean.Expression('g_1 and (g_2 or g_3)'), + 'subsystem': 'Some subsystem' + })) + self.model.reactions.add_entry(ReactionEntry({ + 'id': 'rxn_2', + 'equation': parse_reaction( + '(0.234) cpd_1[c] + (1.34) cpd_2[c] => cpd_3[c]'), + 'subsystem': 'Biomass' + })) + + self.model.biomass_reaction = 'rxn_2' + self.model.limits['rxn_1'] = 'rxn_1', 
Decimal(-42), Decimal(1000)
+        self.model.exchange[Compound('cpd_2', 'e')] = (
+            Compound('cpd_2', 'e'), 'EX_cpd_2', -10, 0)
+
+    def tearDown(self):
+        shutil.rmtree(self.dest)
+
+    def test_get_default_compartment(self):
+        importer.get_default_compartment(self.model)
+
+    def test_detect_best_flux_limit(self):
+        flux_limit = importer.detect_best_flux_limit(self.model)
+        self.assertEqual(flux_limit, 1000)
+
+    def test_infer_compartment_entries(self):
+        self.model.compartments.clear()
+        importer.infer_compartment_entries(self.model)
+
+        self.assertEqual(len(self.model.compartments), 2)
+        self.assertEqual({c.id for c in self.model.compartments}, {'c', 'e'})
+
+    def test_get_output_limit(self):
+        self.assertIsNone(importer._get_output_limit(1000, 1000))
+        self.assertEqual(importer._get_output_limit(-5, 0), -5)
+
+    def test_generate_limit_items(self):
+        item = dict(importer._generate_limit_items(None, None))
+        self.assertEqual(item, {})
+
+        item = dict(importer._generate_limit_items(None, 400))
+        self.assertEqual(item, {'upper': 400})
+
+        item = dict(importer._generate_limit_items(-56, None))
+        self.assertEqual(item, {'lower': -56})
+
+        item = dict(importer._generate_limit_items(-224, 0))
+        self.assertEqual(item, {'lower': -224, 'upper': 0})
+
+        item = dict(importer._generate_limit_items(-5, -5))
+        self.assertEqual(item, {'fixed': -5})
+
+    def test_count_genes(self):
+        self.assertEqual(importer.count_genes(self.model), 3)
+
+    def test_write_yaml_model(self):
+        importer.write_yaml_model(self.model, self.dest)
+
+
+class TestSBMLImporter(unittest.TestCase):
+    def setUp(self):
+        self.dest = tempfile.mkdtemp()
+        with open(os.path.join(self.dest, 'model.sbml'), 'wb') as f:
+            f.write('''
+<sbml xmlns="http://www.sbml.org/sbml/level3/version1/core"
+      level="3" version="1">
+ <model id="test_model" name="Test model">
+  <notes>
+   <body xmlns="http://www.w3.org/1999/xhtml">
+    <p>This is model information intended to be seen by humans.</p>
+   </body>
+  </notes>
+  <listOfCompartments>
+   <compartment id="C_c" name="Cytosol" constant="true"/>
+   <compartment id="C_e" name="Extracellular" constant="true"/>
+  </listOfCompartments>
+  <listOfSpecies>
+   <species id="M_cpd_1" name="Compound 1" compartment="C_c"
+            hasOnlySubstanceUnits="false" boundaryCondition="false"
+            constant="false">
+    <notes>
+     <body xmlns="http://www.w3.org/1999/xhtml">
+      <p>This is compound information intended to be seen by humans.</p>
+     </body>
+    </notes>
+   </species>
+   <species id="M_cpd_2" name="Compound 2" compartment="C_e"
+            hasOnlySubstanceUnits="false" boundaryCondition="false"
+            constant="false"/>
+  </listOfSpecies>
+  <listOfReactions>
+   <reaction id="R_rxn_1" reversible="true" fast="false">
+    <notes>
+     <body xmlns="http://www.w3.org/1999/xhtml">
+      <p>This is reaction information intended to be seen by humans.</p>
+     </body>
+    </notes>
+    <listOfReactants>
+     <speciesReference species="M_cpd_1" stoichiometry="2" constant="true"/>
+    </listOfReactants>
+    <listOfProducts>
+     <speciesReference species="M_cpd_2" stoichiometry="1" constant="true"/>
+    </listOfProducts>
+   </reaction>
+  </listOfReactions>
+ </model>
+</sbml>
+'''.encode('utf-8'))
+
+    def tearDown(self):
+        shutil.rmtree(self.dest)
+
+    def test_create_strict_importer(self):
+        importer = SBMLStrictImporter()
+        importer.help()
+        importer.import_model(os.path.join(self.dest, 'model.sbml'))
+
+    def test_strict_importer_main(self):
+        output_dir = os.path.join(self.dest, 'output')
+        os.mkdir(output_dir)
+        importer.main(importer_class=SBMLStrictImporter, args=[
+            '--source', os.path.join(self.dest, 'model.sbml'),
+            '--dest', output_dir
+        ])
+
+    def test_create_nonstrict_importer(self):
+        importer = SBMLNonstrictImporter()
+        importer.help()
+        importer.import_model(os.path.join(self.dest, 'model.sbml'))
+
+    def test_nonstrict_importer_main(self):
+        output_dir = os.path.join(self.dest, 'output')
+        os.mkdir(output_dir)
+        importer.main(importer_class=SBMLNonstrictImporter, args=[
+            '--source', os.path.join(self.dest, 'model.sbml'),
+            '--dest', output_dir
+        ])
+
+
+class TestCobraJSONImporter(unittest.TestCase):
+    def setUp(self):
+        self.dest = tempfile.mkdtemp()
+        with open(os.path.join(self.dest, 'model.json'), 'w') as f:
+            json.dump({
+                'id': 'test_model',
+                'compartments': {
+                    'c': 'Cytosol',
+                    'e': 'Extracellular',
+                },
+                'metabolites': [
+                    {
+                        'id': 'cpd_1',
+                        'name': 'Compound 1',
+                        'formula': 'CO2',
+                        'compartment': 'c'
+                    },
+                    {
+                        'id': 'cpd_2',
+                        'name': 'Compound 2',
+                        'charge': -2,
+                        'compartment': 'e'
+                    },
+                    {
+                        'id': 'cpd_3',
+                        'compartment': 'c'
+                    }
+                ],
+                'reactions': [
+                    {
+                        'id': 'rxn_1',
+                        'metabolites': {
+                            'cpd_1': -1,
+                            'cpd_2': 2,
+                            'cpd_3': 1,
+                        },
+                        'lower_bound': -20,
+                        'upper_bound': 100,
+                        'gene_reaction_rule': 'gene_1 and (gene_2 or gene_3)'
+                    }
+                ]
+            }, f)
+
+    def tearDown(self):
+        shutil.rmtree(self.dest)
+
+    def test_create_importer(self):
+        importer = CobraJSONImporter()
+        importer.help()
+        importer.import_model(os.path.join(self.dest, 'model.json'))
+
+    def test_cobra_json_main(self):
+        output_dir = os.path.join(self.dest, 'output')
+
os.mkdir(output_dir) + importer.main(importer_class=CobraJSONImporter, args=[ + '--source', os.path.join(self.dest, 'model.json'), + '--dest', output_dir + ]) + + def test_cobra_json_main_bigg(self): + def mock_urlopen(url): + return open(os.path.join(self.dest, 'model.json'), 'rb') + + output_dir = os.path.join(self.dest, 'output') + os.mkdir(output_dir) + importer.main_bigg( + args=['mock_model', '--dest', output_dir], urlopen=mock_urlopen) + + +class TestBiGGImportMain(unittest.TestCase): + def test_list_models(self): + def mock_urlopen(url): + return BytesIO(b''' +{"results_count": 4, + "results": [ + {"organism": "Escherichia coli str. K-12 substr. MG1655", + "metabolite_count": 72, + "bigg_id": "e_coli_core", + "gene_count": 137, + "reaction_count": 95}, + {"organism": "Homo sapiens", + "metabolite_count": 342, + "bigg_id": "iAB_RBC_283", + "gene_count": 346, + "reaction_count": 469}, + {"organism": "Escherichia coli str. K-12 substr. MG1655", + "metabolite_count": 1668, + "bigg_id": "iAF1260", + "gene_count": 1261, + "reaction_count": 2382}]}''') + + importer.main_bigg(['list'], urlopen=mock_urlopen) diff --git a/psamm/util.py b/psamm/util.py index 07c131f0..75689f80 100644 --- a/psamm/util.py +++ b/psamm/util.py @@ -19,6 +19,8 @@ from __future__ import unicode_literals +import os +import errno import re import math import subprocess @@ -28,6 +30,15 @@ from six import iteritems, text_type +def mkdir_p(path): + """Make directory path if it does not already exist.""" + try: + os.makedirs(path) + except OSError as e: + if e.errno != errno.EEXIST or not os.path.isdir(path): + raise + + class LoggerFile(object): """File-like object that forwards to a logger. 
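The `mkdir_p` helper added to `psamm/util.py` above mirrors `mkdir -p`: creating a directory that already exists is a silent no-op, while any other `OSError` (permissions, a file in the way) still propagates. A standalone sketch of the same idiom:

```python
import errno
import os
import tempfile


def mkdir_p(path):
    """Make directory path if it does not already exist."""
    try:
        os.makedirs(path)
    except OSError as e:
        # Only swallow "already exists" for actual directories; permission
        # errors and file/directory name clashes still raise.
        if e.errno != errno.EEXIST or not os.path.isdir(path):
            raise


base = tempfile.mkdtemp()
target = os.path.join(base, 'a', 'b')
mkdir_p(target)  # creates intermediate directories
mkdir_p(target)  # second call does not raise
```

This idempotence is what lets both `main()` and `main_bigg()` call `mkdir_p(dest)` unconditionally after the destination-directory check.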
diff --git a/setup.py b/setup.py index 0912e612..9b076573 100755 --- a/setup.py +++ b/setup.py @@ -16,15 +16,41 @@ # # Copyright 2014-2017 Jon Lund Steffensen +from __future__ import print_function + +import sys from setuptools import setup, find_packages +import pkg_resources # Read long description with open('README.rst') as f: long_description = f.read() +# Test whether psamm-import is currently installed. Since the psamm-import +# functionality was moved to this package (except Excel importers), only newer +# versions of psamm-import are compatible with recent versions of PSAMM. +try: + pkg_resources.get_distribution('psamm-import <= 0.15.2') +except (pkg_resources.DistributionNotFound, + pkg_resources.VersionConflict): + pass +else: + msg = ( + 'Please upgrade or uninstall psamm-import before upgrading psamm:\n' + '$ pip install --upgrade psamm-import\n' + ' OR\n' + '$ pip uninstall psamm-import' + '\n\n' + ' The functionality of the psamm-import package has been moved into' + ' the psamm package, and the psamm-import package now only contains' + ' the model-specific Excel importers.') + print(msg, file=sys.stderr) + sys.exit(1) + + setup( name='psamm', - version='0.30', + version='0.31', description='PSAMM metabolic modeling tools', maintainer='Jon Lund Steffensen', maintainer_email='jon_steffensen@uri.edu', @@ -50,6 +76,8 @@ psamm-model = psamm.command:main psamm-sbml-model = psamm.command:main_sbml psamm-list-lpsolvers = psamm.lpsolver.generic:list_solvers + psamm-import = psamm.importer:main + psamm-import-bigg = psamm.importer:main_bigg [psamm.commands] chargecheck = psamm.commands.chargecheck:ChargeBalanceCommand @@ -72,6 +100,11 @@ sbmlexport = psamm.commands.sbmlexport:SBMLExport search = psamm.commands.search:SearchCommand tableexport = psamm.commands.tableexport:ExportTableCommand + + [psamm.importer] + JSON = psamm.importers.cobrajson:Importer + SBML = psamm.importers.sbml:NonstrictImporter + SBML-strict = psamm.importers.sbml:StrictImporter ''', 
test_suite='psamm.tests',
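The guard added to `setup.py` relies on `pkg_resources` raising `DistributionNotFound` when a requirement is not installed at all and `VersionConflict` when the installed version falls outside the requested range, so installation only aborts when an old `psamm-import` (0.15.2 or earlier) is actually present. A minimal sketch of the same check (the function name is illustrative, not part of PSAMM):

```python
import pkg_resources


def old_psamm_import_installed():
    """Return True only if an incompatible psamm-import (<= 0.15.2) exists."""
    try:
        pkg_resources.get_distribution('psamm-import <= 0.15.2')
    except (pkg_resources.DistributionNotFound,
            pkg_resources.VersionConflict):
        # Either not installed, or installed at a newer, compatible version.
        return False
    return True
```

On a clean environment this returns `False` and `setup()` proceeds normally.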