Structure visualization via UMAP embedding of MCES distances

This repository contains the datasets from the paper "Small molecule machine learning: All models are wrong, some may not even be helpful", alongside Jupyter notebooks for visualizing MCES distances. Files too large for GitHub are hosted at OSF.

| file | description |
| --- | --- |
| biostructures.csv | Biomolecular structures (SMILES and InChI-key first block) |
| biostructures_20k.csv | Subsample of biomolecular structures used throughout the paper |
| subsampled_instances_20k.csv | Subsample of pairs of biomolecular structures used for runtime and threshold evaluations |
| mces_distances.npz | Compressed numpy-object containing all computed MCES distances alongside SMILES. Hosted externally at doi:10.17605/OSF.IO/5SXFE. |
| umap_df.csv | Computed UMAP embeddings for various datasets |
| umap_embedding_biostructures.pkl | umap-learn object allowing projection of new structures onto the computed UMAP embedding. Hosted externally at doi:10.17605/OSF.IO/5SXFE. |
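
The array names stored inside mces_distances.npz are not documented here, so it can help to inspect the archive before using it. The following is a minimal sketch, assuming only that the file is a standard NumPy .npz archive; the helper name summarize_npz is hypothetical and not part of this repository.

```python
import numpy as np


def summarize_npz(path):
    """Return {array_name: (shape, dtype)} for each array in an .npz archive.

    Hypothetical helper for illustration; it makes no assumption about which
    array names the archive actually contains.
    """
    summary = {}
    # allow_pickle=False avoids executing pickled code from the file.
    with np.load(path, allow_pickle=False) as archive:
        for name in archive.files:
            try:
                array = archive[name]
                summary[name] = (array.shape, str(array.dtype))
            except ValueError:
                # Object arrays cannot be read without allow_pickle=True;
                # report them as None instead of failing.
                summary[name] = None
    return summary
```

Calling summarize_npz("mces_distances.npz") after downloading the file from OSF lists what is available. An entry reported as None is stored as an object array and would need np.load(..., allow_pickle=True), which should only be used on files from a trusted source.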

Visualization

Precomputed UMAP embeddings, as well as embeddings of new structures, can be visualized with the Python script umap_vis.py. If you only want to use the visualization, download this repository and run python umap_vis.py.

To project MCES distances of a new dataset onto the existing UMAP embedding, use the Jupyter Notebook umap_embedding.ipynb.
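
Outside the notebook, the pickled embedding object can be loaded with Python's pickle module. The sketch below is an assumption about usage, not the notebook's code: load_fitted_model is a hypothetical helper that unpickles the file and checks that the object exposes the method needed for projection (transform for umap-learn models). Unpickling most likely requires an umap-learn installation compatible with the one that wrote the file.

```python
import pickle
from pathlib import Path


def load_fitted_model(path, required_method="transform"):
    """Unpickle a fitted model and verify it offers `required_method`.

    Hypothetical helper for illustration. Only unpickle files from a
    trusted source, since pickle can execute arbitrary code on load.
    """
    with Path(path).open("rb") as handle:
        model = pickle.load(handle)
    if not callable(getattr(model, required_method, None)):
        raise TypeError(
            f"{type(model).__name__} has no callable '{required_method}' method"
        )
    return model
```

A call such as load_fitted_model("umap_embedding_biostructures.pkl") would return the fitted object. The input that its transform method expects depends on how the embedding was fitted, so umap_embedding.ipynb remains the reference for the actual projection step.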

A Python installation with version >= 3.9 is required (3.9.18 was used in development). Required packages:

umap-learn=0.5.3
numba=0.53.1
scipy=1.7.1
pandas
numpy
plotly
rdkit
dash
gunicorn
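
To check an existing installation against this list, a small sketch using the standard library's importlib.metadata is shown here. The function check_environment is hypothetical, and it assumes that each package registers Python metadata under the name listed above; conda packages that ship without such metadata would be reported as missing even when they are importable.

```python
import sys
from importlib import metadata

# Versions pinned in the package list above.
PINNED = {"umap-learn": "0.5.3", "numba": "0.53.1", "scipy": "1.7.1"}
# Packages listed without a version constraint.
UNPINNED = ["pandas", "numpy", "plotly", "rdkit", "dash", "gunicorn"]


def check_environment(pinned=PINNED, unpinned=UNPINNED, min_python=(3, 9)):
    """Return a list of human-readable problems; empty if none were found."""
    problems = []
    if sys.version_info < min_python:
        wanted = ".".join(str(part) for part in min_python)
        problems.append(f"Python >= {wanted} required")
    for name in list(pinned) + list(unpinned):
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed")
            continue
        expected = pinned.get(name)
        if expected is not None and installed != expected:
            problems.append(f"{name}: found {installed}, expected {expected}")
    return problems
```

Running print(check_environment()) inside the target environment prints an empty list when everything matches.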

A conda (or mamba) environment with all necessary packages installed can be created with

conda env create -f conda_env.yml
# to activate:
conda activate umap_mces

Visualization example

Docker

A Docker image for the visualization can be built with the provided Dockerfile.

When self-hosting the Docker container behind a reverse proxy, the environment variable PROXY_PREFIX_REQUESTS may need to be set, for example with docker run -e PROXY_PREFIX_REQUESTS='...' ....
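
How umap_vis.py consumes this variable is not described here. As an illustration only, the sketch below assumes the value is used as a URL path prefix of the kind Dash accepts in its requests_pathname_prefix argument, which must start and end with "/". The helper proxy_prefix_from_env is hypothetical.

```python
import os


def proxy_prefix_from_env(environ=None, variable="PROXY_PREFIX_REQUESTS"):
    """Read a path prefix from the environment and normalize its slashes.

    Returns None when the variable is unset or empty, otherwise a string
    that starts and ends with "/". Hypothetical helper for illustration.
    """
    environ = os.environ if environ is None else environ
    raw = environ.get(variable, "").strip()
    if not raw:
        return None
    inner = raw.strip("/")
    # A bare "/" means "no sub-path"; avoid producing "//".
    return f"/{inner}/" if inner else "/"
```

With this normalization, both 'umap' and '/umap/' yield '/umap/', so the value passed via docker run -e is tolerant of missing slashes.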