This repository contains code and models for replicating results from the following publication:
- Dependency or Span, End-to-End Uniform Semantic Role Labeling
- To appear in AAAI 2019
Part of the codebase is extended from lsgn.

Requirements:
- Python 2.7
- TensorFlow 1.8.0
- pyhocon (for parsing the configurations)
- tensorflow_hub (for loading ELMo)
- tcsh (sudo apt-get install tcsh; only required for processing the CoNLL-2005 data)
- Download the GloVe embeddings, the srlconll scripts, and the eval09 script:
./scripts/fetch_required_data.sh
- Build the custom kernels:
./scripts/build_custom_kernels.sh
(Please adjust the script according to your OS and gcc version.)
- Some of our models are trained with the ELMo embeddings. We use the ELMo model loaded by tensorflow_hub. Download the module file and decompress it into the /elmo directory (a loading sketch follows below).
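A minimal sketch of how such a module is typically loaded with tensorflow_hub under TensorFlow 1.x; the /elmo path is the directory from this step, the input sentence is illustrative, and this is not necessarily the repository's exact loading code:

```python
# Minimal TF 1.x sketch: load the decompressed ELMo module via tensorflow_hub.
import tensorflow as tf
import tensorflow_hub as hub

elmo = hub.Module("/elmo", trainable=False)  # decompressed module directory
embeddings = elmo(
    ["the cat sat on the mat"],  # illustrative input sentence
    signature="default",
    as_dict=True)["elmo"]  # contextual embeddings, shape [batch, tokens, 1024]

with tf.Session() as sess:
    sess.run([tf.global_variables_initializer(), tf.tables_initializer()])
    print(sess.run(embeddings).shape)
```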
For replicating results on the CoNLL-2005, CoNLL-2009, and CoNLL-2012 datasets, please follow the steps below.
CoNLL-2005: The data is provided by the CoNLL-2005 Shared Task, but the original words are from the Penn Treebank, which is not publicly available. If you have the PTB corpus, you can run:
./scripts/fetch_and_make_conll05_data.sh /path/to/ptb/
CoNLL-2009: The data is provided by the CoNLL-2009 Shared Task. Run:
./scripts/make_conll2009_data.sh /path/to/conll-2009
CoNLL-2012: Follow the official CoNLL-2012 instructions to obtain the data; this results in a directory called /path/to/conll-formatted-ontonotes-5.0. Then run:
./scripts/make_conll2012_data.sh /path/to/conll-formatted-ontonotes-5.0
Training instructions:
- Experiment configurations are found in experiments.conf.
- Choose an experiment that you would like to run, e.g. conll2012_best (a configuration-inspection sketch follows below).
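Because the configurations are parsed with pyhocon (listed in the requirements above), an experiment can also be inspected programmatically; a minimal sketch, assuming experiments.conf sits in the working directory:

```python
# Minimal sketch: parse experiments.conf with pyhocon and print one experiment.
from pyhocon import ConfigFactory

config = ConfigFactory.parse_file("experiments.conf")
experiment = config["conll2012_best"]  # experiment name chosen above
for key, value in experiment.items():
    print("{} = {}".format(key, value))
```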
- For a single-machine experiment, run the following two commands in parallel (training and continuous evaluation):
python singleton.py <experiment>
python evaluator.py <experiment>
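For example, with the conll2012_best experiment chosen above:
python singleton.py conll2012_best
python evaluator.py conll2012_best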
- Results are stored in the logs directory and can be viewed via TensorBoard (see the command after the evaluation steps below).
- For final evaluation of the checkpoint with the maximum dev F1:
- Run python test_single.py <experiment> for the single-model evaluation. For example:
python test_single.py conll2012_final
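Training curves and evaluation results can be viewed by pointing TensorBoard at the logs directory:
tensorboard --logdir logs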
- If you want to use GPUs, add the command-line parameter -gpu 0 (see the example below).
- The evaluator should not be run on GPUs, since evaluating full documents does not fit within GPU memory constraints.
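For example, to train on the first GPU (pairing the flag with singleton.py here is illustrative):
python singleton.py conll2012_best -gpu 0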
- The training runs indefinitely and needs to be terminated manually. The model generally converges at about 300k steps and within 12-36 hours.
- At test time, the code loads the entire GloVe 300D embedding file at start-up, which can take a while (a loading sketch follows below).
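For reference, a minimal sketch of what loading a GloVe embedding file typically looks like; the file name here is hypothetical, and this is not the repository's exact loading code:

```python
# Minimal sketch: read GloVe vectors into a dict. A 300D file with millions of
# rows takes a while to parse, which explains the start-up delay at test time.
import numpy as np

def load_glove(path):
    embeddings = {}
    with open(path) as f:
        for line in f:
            parts = line.rstrip().split(" ")
            embeddings[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return embeddings

# Hypothetical file name; substitute the file fetched by fetch_required_data.sh.
vectors = load_glove("glove.840B.300d.txt")
```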