Add Blood-Brain Barrier Database (B3DB) to TDC #215
Merged
Based on discussion in #174, adding the Blood-Brain Barrier Database (B3DB) to TDC. Currently not adding B3DB to the ADMET benchmark group, but this could be added later (@kexinhuang12345, please see ac35e01).
Dataset Description
The Blood-Brain Barrier Database (B3DB) is a curated resource of 7,807 small molecules classified as either BBB permeable (BBB+) or BBB non-permeable (BBB-); 4,956 of these molecules were originally labeled BBB+ and 2,851 BBB-. BBB permeability is measured by the logarithm of the brain-plasma concentration ratio:
$$\log{BB} = \log{\frac{C_{\text{brain}}}{C_{\text{plasma}}}}$$
Numerical $\log{BB}$ data was originally included for 1,058 of the 7,807 molecules in the dataset.
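As a quick illustration of the definition above, here is a minimal Python sketch that computes $\log{BB}$ from a pair of measured concentrations. It assumes a base-10 logarithm (the usual convention for $\log{BB}$) and positive concentrations in comparable units; the function name `log_bb` is hypothetical and not part of TDC or B3DB.

```python
import math


def log_bb(c_brain: float, c_plasma: float) -> float:
    """Return logBB = log10(C_brain / C_plasma).

    Assumes a base-10 logarithm and positive concentrations in comparable
    units, so that the ratio is dimensionless.
    """
    if c_brain <= 0 or c_plasma <= 0:
        raise ValueError("concentrations must be positive")
    return math.log10(c_brain / c_plasma)


# Brain concentration 10x the plasma concentration, and the reverse.
print(log_bb(10.0, 1.0))
print(log_bb(1.0, 10.0))
```

A brain-to-plasma ratio of 10 gives $\log{BB} = 1$ and a ratio of 0.1 gives $\log{BB} = -1$, so positive values mean the compound reaches a higher concentration in brain than in plasma.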
Data Processing
After removing duplicates and molecules with `NA` IUPAC identifiers, classification data remain for 6,167 molecules and regression data for 942 molecules. The data processing script is available at: https://gist.github.com/ayushnoori/af42cc651856f347614d0bd2a8fe7def
Data Description
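The deduplication and `NA` filtering from Data Processing, followed by construction of the classification and regression datasets listed next, can be sketched roughly as follows. This is not the linked gist; it is a minimal standard-library sketch under stated assumptions. The field names (`IUPAC_name`, `SMILES`, `BBB+/BBB-`, `logBB`), keeping the first record per identifier, and the encoding of BBB+ as 1 and BBB- as 0 are all hypothetical choices. The output rows mirror the `Drug_ID` / `Drug` / `Y` layout used by TDC's single-instance prediction datasets.

```python
# Hypothetical field names, loosely modeled on the raw B3DB tables (assumption).
NAME, SMILES, LABEL, LOGBB = "IUPAC_name", "SMILES", "BBB+/BBB-", "logBB"
# Values treated as missing (assumption: "NA" marks a missing value in the raw data).
MISSING = {None, "", "NA"}


def clean(records: list[dict]) -> list[dict]:
    """Drop records whose IUPAC identifier is missing, then keep only the
    first record seen for each identifier (deduplication)."""
    seen: set = set()
    kept = []
    for rec in records:
        name = rec.get(NAME)
        if name in MISSING or name in seen:
            continue
        seen.add(name)
        kept.append(rec)
    return kept


def build_classification(raw: list[dict]) -> list[dict]:
    """Binary task: Y = 1 for BBB+ and 0 for BBB- (hypothetical encoding)."""
    return [
        {"Drug_ID": r[NAME], "Drug": r[SMILES], "Y": 1 if r[LABEL] == "BBB+" else 0}
        for r in clean(raw)
        if r.get(LABEL) in ("BBB+", "BBB-")
    ]


def build_regression(raw: list[dict]) -> list[dict]:
    """Regression task: Y is the numerical logBB value; rows without one are dropped."""
    return [
        {"Drug_ID": r[NAME], "Drug": r[SMILES], "Y": float(r[LOGBB])}
        for r in clean(raw)
        if r.get(LOGBB) not in MISSING
    ]


# Toy records with made-up names and labels, for illustration only.
raw_cls = [
    {NAME: "mol_a", SMILES: "CCO", LABEL: "BBB+"},
    {NAME: "mol_a", SMILES: "CCO", LABEL: "BBB+"},      # duplicate -> dropped
    {NAME: "NA", SMILES: "CCC", LABEL: "BBB-"},          # missing identifier -> dropped
    {NAME: "mol_b", SMILES: "CC(=O)O", LABEL: "BBB-"},
]
raw_reg = [
    {NAME: "mol_a", SMILES: "CCO", LOGBB: "-0.16"},
    {NAME: "mol_c", SMILES: "CCN", LOGBB: "NA"},         # no logBB value -> dropped
]
classification = build_classification(raw_cls)
regression = build_regression(raw_reg)
print(len(classification), len(regression))
```

Each raw table is cleaned separately, which matches the two datasets ending up with different molecule counts.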
We add two new datasets:
`b3db_classification`: Binary permeability label (BBB+ or BBB-) for all 6,167 small molecules.
`b3db_regression`: Numerical $\log{BB}$ values for all 942 molecules.
Reference
Meng, F., Xi, Y., Huang, J. & Ayers, P. W. A curated diverse molecular database of blood-brain barrier permeability with chemical descriptors. Sci Data 8, 289 (2021).
DOI: 10.1038/s41597-021-01069-5
GitHub: https://github.com/theochem/B3DB
Harvard DataVerse: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/1RVMJ0