
OpenPredict Translator API 0.1.0

@vemonet released this 12 Dec 16:37

Major refactor of OpenPredict

  1. OpenPredict now uses dvc (Data Version Control) and DagsHub (a platform for publishing data) to handle all the data files required to train the models and run the predictions. Instead of having half of the files committed to git and the other half downloaded from ad-hoc servers, every CSV input and model output is now stored in the data/ folder at the root of the repository.

dvc and DagsHub work and look a lot like git and GitHub, but are specialized for large data files (they can also be used to store metadata about your runs). The repository size limit for open source projects on DagsHub is 10 GB.

You can find the data used for OpenPredict (prediction + similarity + evidence path + drkg model) at https://dagshub.com/vemonet/translator-openpredict
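Assuming a standard dvc setup (the exact remote configuration may differ, and the commands below are a sketch rather than official project instructions), fetching the data locally looks like this:

```shell
# Clone the repository, which contains dvc pointers to the data files
git clone https://dagshub.com/vemonet/translator-openpredict
cd translator-openpredict
# Install dvc, then download the tracked data files into the data/ folder
pip install dvc
dvc pull
```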

  2. There is now a decorator @trapi_predict to mark functions that return predictions that can be integrated automatically into a TRAPI query. It lets the developer specify which relations the prediction function can resolve in a TRAPI query. The developer only needs to ensure that the prediction function takes the expected input and returns predictions in the expected format (cf. the code example below; note that the format is largely inspired by the ElasticSearch and BioThings return formats).
    The predictions generated by this function can then be integrated automatically into our TRAPI API, and a simple GET endpoint to query the predictions individually is also generated automatically.
```python
from openpredict import trapi_predict, PredictOptions, PredictOutput

@trapi_predict(path='/predict',
    name="Get predicted targets for a given entity",
    description="Return the predicted targets for a given entity: drug (DrugBank ID) or disease (OMIM ID), with confidence scores.",
    relations=[
        {
            'subject': 'biolink:Drug',
            'predicate': 'biolink:treats',
            'object': 'biolink:Disease',
        },
        {
            'subject': 'biolink:Disease',
            'predicate': 'biolink:treated_by',
            'object': 'biolink:Drug',
        },
    ]
)
def get_predictions(
        input_id: str, options: PredictOptions
    ) -> PredictOutput:
    # Add the code to load the model and compute predictions here
    predictions = {
        "hits": [
            {
                "id": "DB00001",
                "type": "biolink:Drug",
                "score": 0.12345,
                "label": "Lepirudin",
            }
        ],
        "count": 1,
    }
    return predictions
```
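The mechanics behind such a decorator can be sketched as follows. This is a minimal illustration, not the actual openpredict implementation; the attribute name `_trapi_metadata` is hypothetical, chosen only to show how the API layer could discover decorated functions and register their relations:

```python
# Minimal sketch: a decorator in this style attaches the TRAPI metadata to
# the function object, so the API layer can later discover decorated
# functions and route matching TRAPI queries to them.
def trapi_predict(path, name=None, description=None, relations=None):
    def decorator(func):
        # Hypothetical attribute; the real library may store this differently
        func._trapi_metadata = {
            "path": path,
            "name": name,
            "description": description,
            "relations": relations or [],
        }
        return func
    return decorator

@trapi_predict(
    path="/predict",
    relations=[{
        "subject": "biolink:Drug",
        "predicate": "biolink:treats",
        "object": "biolink:Disease",
    }],
)
def get_predictions(input_id, options=None):
    # A real model would be loaded and queried here
    return {"hits": [{"id": "DB00001", "score": 0.12}], "count": 1}
```

The decorated function stays a plain callable, so it can be tested directly, while the attached metadata tells the TRAPI layer which subject/predicate/object triples it can resolve.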
  3. When someone wants to add a new prediction model to the Translator OpenPredict API, they can either create a new folder under src/ in the existing translator-openpredict repository and add all the Python files needed to train and run the prediction (using the decorator to annotate the prediction function),
    or do it in a separate repository published to GitHub, with the data stored using dvc, so the code and data required to run the prediction can easily be imported into the OpenPredict API. There is a template repository to help people get started with the recommended architecture: https://github.com/MaastrichtU-IDS/cookiecutter-openpredict-api
```shell
pip install cookiecutter
cookiecutter https://github.com/MaastrichtU-IDS/cookiecutter-openpredict-api
```
  4. The build process now uses hatch instead of poetry.
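For contributors used to the old poetry workflow, the typical hatch equivalents look roughly like this (a sketch of standard hatch commands, not project-specific instructions; the project's own scripts may differ):

```shell
# Install hatch (replaces installing poetry)
pip install hatch
# Build the sdist and wheel into dist/ (replaces `poetry build`)
hatch build
# Run a command inside the project's managed environment (replaces `poetry run`)
hatch run pytest
```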

Full Changelog: v0.0.8...v0.1.0