murcko scaffold split #18
@karinazad How do you feel about I believe it’s clearer from a discovery perspective and it matches the underlying data structure.
    shuffle: bool = True,
    include_chirality: bool = False,
) -> tuple[Subset, Subset]:
    """Split a dataset based on Murcko scaffold splitting based
Prepend r
Can you elaborate? Is that for the input SMILES strings? Escape characters shouldn't be present in valid SMILES anyway and isn't it that if a string is formed with escape chars without r initially, prepending it won't change the format? Or maybe I'm just misunderstanding
No, prepend r to the docstring.
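For context, the r prefix only matters when a docstring contains backslashes (LaTeX markup, escape-like sequences); whether this docstring does is not visible in the excerpt. A minimal sketch of the effect, using a hypothetical function and docstring that are not part of the PR:

```python
def clamp_fraction(fraction: float) -> float:
    r"""Clamp ``fraction`` to the closed interval :math:`[0, 1]`.

    With the raw-string prefix, text such as :math:`\frac{a}{b}` keeps its
    backslash, so documentation tools receive the markup unchanged.
    """
    return min(max(fraction, 0.0), 1.0)


# Without the ``r`` prefix, ``\f`` would be read as a form-feed character.
assert "\\frac{a}{b}" in clamp_fraction.__doc__
```

The prefix changes how the literal is parsed, not the resulting type, so it is safe to add even when no backslash is present yet.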
@karinazad You’ll also need to add this to the docs.
    for ind, s in enumerate(smiles):
        mol = Chem.MolFromSmiles(s)
        if mol is not None:
            scaffold = MurckoScaffoldSmiles(mol=mol, includeChirality=include_chirality)
Weird function? It looks like a wrapper around:
Chem.MolToSmiles(GetScaffoldForMol(mol), includeChirality)
No clue about this:
It looks autogenerated?
Yeah, GetScaffoldForMol looks strange. Should we also change the function name?
from unittest.mock import MagicMock, patch

import pytest
from beignet.subsets._murcko_scaffold_split import (
from beignet.subsets import (...)
@@ -0,0 +1,71 @@
from importlib.util import find_spec
from unittest.mock import MagicMock, patch
from unittest.mock import MagicMock
import unittest.mock
Use unittest.mock.patch explicitly.
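A small sketch of what the fully qualified form looks like at a call site. The function under test and the patched target (os.getcwd) are illustrative stand-ins, not code from this PR:

```python
import os
import unittest.mock


def current_directory_name() -> str:
    """Return the last path component of the working directory."""
    return os.path.basename(os.getcwd())


# Spelling out ``unittest.mock.patch`` makes the origin of ``patch``
# obvious wherever it is used in the test module.
with unittest.mock.patch("os.getcwd", return_value="/tmp/example") as mocked:
    name = current_directory_name()

assert name == "example"
mocked.assert_called_once_with()
```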
@@ -0,0 +1,71 @@
from importlib.util import find_spec
Make explicit, i.e., import importlib.util and call importlib.util.find_spec.
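A sketch of the explicit form, assuming find_spec is being used here to detect an optional dependency such as RDKit. The helper name is_installed is hypothetical:

```python
import importlib.util


def is_installed(package: str) -> bool:
    """Report whether a top-level package can be found without importing it."""
    # ``find_spec`` returns ``None`` when a top-level package is absent.
    return importlib.util.find_spec(package) is not None


# Hypothetical flag a test module could use to skip RDKit-dependent tests.
HAS_RDKIT = is_installed("rdkit")
```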
@@ -0,0 +1,122 @@
import math
import random
from collections import defaultdict
Make explicit, i.e., import collections and use collections.defaultdict.
    https://doi.org/10.1021/jm9602928
    - "RDKit: Open-source cheminformatics. https://www.rdkit.org"
    """
    train_idx, test_idx = _murcko_scaffold_split_indices(
Inline?
    scaffolds = defaultdict(list)

    for ind, s in enumerate(smiles):
Avoid abbreviations, e.g., use index and sequence.
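A sketch of the grouping loop with the suggested names and the explicit collections import. To keep it runnable without RDKit, the scaffold computation is replaced by a caller-supplied key function; group_indices and the toy key are assumptions for illustration only:

```python
import collections
from typing import Callable, Hashable, Sequence


def group_indices(
    sequences: Sequence[str],
    key: Callable[[str], Hashable],
) -> dict[Hashable, list[int]]:
    """Group the positions of ``sequences`` by the value of ``key``."""
    groups = collections.defaultdict(list)
    for index, sequence in enumerate(sequences):
        groups[key(sequence)].append(index)
    return dict(groups)


# Toy key (first character) standing in for the Murcko scaffold SMILES.
groups = group_indices(
    ["CCO", "CCN", "c1ccccc1"],
    key=lambda sequence: sequence[0],
)
```

In the PR itself the key would be the scaffold SMILES computed from each molecule; only the naming and import style are the point here.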
Chem, MurckoScaffoldSmiles = None, None


def murcko_scaffold_split(
Add a generator: Generator argument.
    seed: int = 0xDEADBEEF,
    shuffle: bool = True,
    include_chirality: bool = False,
) -> tuple[Subset, Subset]:
I think a better design is returning List[Subset] based on a lengths parameter, e.g., see torch.utils.data.random_split.
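One way a lengths-driven interface could work, sketched in plain Python so it runs without torch. The function name split_groups, the greedy assignment rule, and the restriction to fractional lengths are all assumptions; it returns lists of indices, which the real function would presumably wrap in Subset objects:

```python
import math
from typing import Sequence


def split_groups(
    groups: Sequence[Sequence[int]],
    lengths: Sequence[float],
) -> list[list[int]]:
    """Assign whole groups to splits sized by fractional ``lengths``."""
    if not math.isclose(sum(lengths), 1.0):
        raise ValueError("lengths must sum to 1")

    total = sum(len(group) for group in groups)
    targets = [fraction * total for fraction in lengths]
    splits: list[list[int]] = [[] for _ in lengths]

    position = 0
    for group in groups:
        # Advance once the current split has reached its target size,
        # but never move past the final split.
        while (
            position < len(splits) - 1
            and len(splits[position]) >= targets[position]
        ):
            position += 1
        # A group is never divided, so a split may overshoot its target.
        splits[position].extend(group)
    return splits


train, test = split_groups([[0, 1, 2], [3, 4], [5], [6, 7], [8, 9]], [0.5, 0.5])
```

Because scaffold groups stay intact, the resulting sizes only approximate the requested fractions, which is a difference from random_split worth documenting if this design is adopted.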
Add |
    if shuffle:
        if seed is not None:
            random.Random(seed).shuffle(scaffolds)
Use generator.
Implements Murcko scaffold split based on RDKit