This directory contains scripts to sample molecules from Molecule Chef and to compute the metrics reported in Table 1 on the generated molecules.
This contains the scripts to sample reactants and put them in a format ready for the Molecular Transformer. To generate molecules, follow these steps:
- Run
scripts/evaluate/generation/generate_for_mchef/create_reactant_bags.py
to generate tokenized reactant bags for the transformer.
- Feed these through the transformer to get tokenized product bags. You can run this translation with the Molecular Transformer code using a command such as:
python translate.py -model <transformer-weight-path> \
-src <path-to-tokenized-reactants> \
-output <path-for-tokenized-products> \
-batch_size 300 -replace_unk -max_length 500 -fast -gpu 1 -n_best 5
- Use the script
scripts/evaluate/put_together_molecular_transformer_predictions.py
to combine the tokenized predictions into a single file of generated SMILES. This file can be put in the generated_smiles folder.
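The last step above can be illustrated with a minimal sketch. It is not the logic of put_together_molecular_transformer_predictions.py, which is not shown here; it only assumes that the translation output holds one space-separated prediction per line and that the -n_best 5 hypotheses for each input bag appear on consecutive lines, best first. The function names are hypothetical.

```python
from typing import Iterable, List

N_BEST = 5  # assumed to match the -n_best 5 flag in the translate command


def detokenize(line: str) -> str:
    """Drop the whitespace between tokens to recover a plain SMILES string."""
    return "".join(line.split())


def group_predictions(lines: Iterable[str], n_best: int = N_BEST) -> List[List[str]]:
    """Group consecutive runs of n_best predictions, one group per input bag."""
    cleaned = [detokenize(line) for line in lines]
    if len(cleaned) % n_best != 0:
        raise ValueError("number of prediction lines is not a multiple of n_best")
    return [cleaned[i:i + n_best] for i in range(0, len(cleaned), n_best)]


def top_predictions(lines: Iterable[str], n_best: int = N_BEST) -> List[str]:
    """Keep only the first (assumed highest-ranked) prediction of each group."""
    return [group[0] for group in group_predictions(lines, n_best)]
```

With a file of tokenized products, `top_predictions(open(path))` would give one SMILES string per reactant bag under these assumptions; checking the result with a cheminformatics toolkit before evaluation is still advisable.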
Stores generated SMILES strings from the models.
This folder contains the scripts to evaluate the molecules generated by a model.
Modify tables_spec.json
to control what metrics are evaluated.
Then run python evaluate_metrics.py
to create the table.
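Which metrics tables_spec.json selects is not listed here. As a rough illustration of the kind of set-based metric commonly reported for generated molecules, the sketch below computes uniqueness and novelty. The function names are hypothetical, and SMILES are compared as raw strings; this is an assumption for brevity, since a real evaluation would canonicalize them first.

```python
from typing import Iterable, Set


def uniqueness(generated: Iterable[str]) -> float:
    """Fraction of generated SMILES strings that are distinct."""
    generated = list(generated)
    if not generated:
        return 0.0
    return len(set(generated)) / len(generated)


def novelty(generated: Iterable[str], training: Set[str]) -> float:
    """Fraction of distinct generated SMILES that are absent from the training set."""
    unique = set(generated)
    if not unique:
        return 0.0
    return len(unique - training) / len(unique)
```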
Quality Filters
The quality filters require the rd_filters
package to run.
This can be installed, for instance, with: pip install git+https://github.com/PatWalters/rd_filters.git
The rules and alerts that we use come from the supplementary information of GuacaMol [1], which can be found on the publication
web page.
[1] Nathan Brown, Marco Fiscato, Marwin H.S. Segler, and Alain C. Vaucher. GuacaMol: Benchmarking Models for de Novo Molecular Design. Journal of Chemical Information and Modeling 2019, 59 (3), 1096-1108. DOI: 10.1021/acs.jcim.8b00839