A BioBERT-based NLP model that performs relation extraction (RE) and named entity recognition (NER) to identify directional drug-drug interactions (DDIs). Model weights are from BioBERT-Large v1.1. The training and validation scripts are modified from https://github.com/kamalkraj/BERT-NER-TF and implemented with TensorFlow v2.
- python3
- tensorflow (version >= 2.0)
- fastprogress (version >= 0.1.21)
- seqeval (version >= 0.0.5)
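The minimum versions listed above can be checked before starting a long training run. The following is a minimal sketch, not part of the repository: the helper names (`version_tuple`, `meets_minimum`, `check_requirements`) are hypothetical, and it assumes the packages were installed under the distribution names `tensorflow`, `fastprogress`, and `seqeval`.

```python
import re
from importlib import metadata

# Minimum versions as listed in the requirements above.
# Assumption: packages are installed under exactly these distribution names.
REQUIREMENTS = {
    "tensorflow": "2.0",
    "fastprogress": "0.1.21",
    "seqeval": "0.0.5",
}


def version_tuple(version):
    """Turn a version string such as '2.10.0rc1' into (2, 10, 0).

    Only the leading digits of each dot-separated piece are used, so
    suffixes such as 'rc1' are ignored. Parsing stops at the first piece
    that does not start with a digit.
    """
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True if the installed version is at least the minimum."""
    have = version_tuple(installed)
    need = version_tuple(minimum)
    # Pad the shorter tuple with zeros so '2.0' and '2.0.0' compare equal.
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need


def check_requirements(requirements=REQUIREMENTS):
    """Map each package name to 'ok', 'missing', or a 'too old' message."""
    report = {}
    for name, minimum in requirements.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        if meets_minimum(installed, minimum):
            report[name] = "ok"
        else:
            report[name] = f"too old ({installed} < {minimum})"
    return report


if __name__ == "__main__":
    for name, status in check_requirements().items():
        print(f"{name}: {status}")
```

A TensorFlow build installed under a different distribution name (for example a CPU-only or GPU-specific variant) would be reported as missing by this sketch even though `import tensorflow` works, so treat a "missing" result as a prompt to check by hand.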
- `BERT-TF-master` contains the main scripts for training and validation
- `DDI_data` contains the training and validation datasets for RE and NER steps
- Download BioBERT-Large v1.1. Save as a sub-directory, e.g. 'biobert_large'.
- Convert the TensorFlow version 1 model weights to TensorFlow version 2 model weights; follow the procedure in `tf1_convert_tf2.sh`.
- First, run training and validation for the RE step. To do this, run `myrun_re.py` under the `BERT-TF-master` directory. An example bash script, `example_re.sh`, shows the various command line arguments supplied to `myrun_re.py`.
- Second, run training and validation for the NER step. To do this, run `myrun_ner.py` under the `BERT-TF-master` directory, followed by `myner_detokenize.py` under the `BERT-TF-master/biocodes` directory. An example bash script, `example_ner.sh`, shows the various command line arguments supplied to both these scripts.
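The order of the steps above can be summarized in a small pre-flight check. This is a rough sketch under stated assumptions, not part of the repository: `PIPELINE`, `missing_scripts`, and `describe_pipeline` are hypothetical names, the paths are taken relative to the repository root, and the sketch deliberately does not launch the scripts because their command line arguments are defined in the example bash scripts rather than here.

```python
from pathlib import Path

# Pipeline order as described in the steps above.
# Assumption: paths are relative to the repository root.
PIPELINE = [
    ("RE training and validation",
     Path("BERT-TF-master") / "myrun_re.py"),
    ("NER training and validation",
     Path("BERT-TF-master") / "myrun_ner.py"),
    ("NER detokenization",
     Path("BERT-TF-master") / "biocodes" / "myner_detokenize.py"),
]


def missing_scripts(repo_root="."):
    """Return the pipeline scripts (as POSIX-style paths) not found on disk."""
    root = Path(repo_root)
    return [
        script.as_posix()
        for _, script in PIPELINE
        if not (root / script).is_file()
    ]


def describe_pipeline(repo_root="."):
    """Return one line per step, in run order, noting whether the script exists."""
    absent = set(missing_scripts(repo_root))
    lines = []
    for number, (label, script) in enumerate(PIPELINE, start=1):
        status = "MISSING" if script.as_posix() in absent else "found"
        lines.append(f"{number}. {label}: {script.as_posix()} [{status}]")
    return lines


if __name__ == "__main__":
    # Dry run only: print the plan instead of executing the scripts.
    for line in describe_pipeline("."):
        print(line)
```

Running this from the repository root prints the three steps in the order they should be executed; the actual invocations, including all arguments, should be taken from `example_re.sh` and `example_ner.sh`.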
This software and documentation were developed by the authors in their capacities as Oak Ridge Institute for Science and Education (ORISE) research fellows at the U.S. Food and Drug Administration (FDA).
FDA assumes no responsibility whatsoever for use by other parties of the Software, its source code, documentation or compiled executables, and makes no guarantees, expressed or implied, about its quality, reliability, or any other characteristic. Further, FDA makes no representations that the use of the Software will not infringe any patent or proprietary rights of third parties. The use of this code in no way implies endorsement by the FDA or confers any advantage in regulatory decisions.