This repository accompanies the COLING 2020 paper Comparison by Conversion: Reverse-Engineering UCCA from Syntax and Lexical Semantics:
@inproceedings{hershcovich-etal-2020-comparison,
title = "Comparison by Conversion: Reverse-Engineering {UCCA} from Syntax and Lexical Semantics",
author = "Hershcovich, Daniel and
Schneider, Nathan and
Dvir, Dotan and
Prange, Jakob and
de Lhoneux, Miryam and
Abend, Omri",
booktitle = "Proceedings of the 28th International Conference on Computational Linguistics",
month = dec,
year = "2020",
address = "Barcelona, Spain (Online)",
publisher = "International Committee on Computational Linguistics",
url = "https://www.aclweb.org/anthology/2020.coling-main.264",
pages = "2947--2966",
abstract = "Building robust natural language understanding systems will require a clear characterization of whether and how various linguistic meaning representations complement each other. To perform a systematic comparative analysis, we evaluate the mapping between meaning representations from different frameworks using two complementary methods: (i) a rule-based converter, and (ii) a supervised delexicalized parser that parses to one framework using only information from the other as features. We apply these methods to convert the STREUSLE corpus (with syntactic and lexical semantic annotations) to UCCA (a graph-structured full-sentence meaning representation). Both methods yield surprisingly accurate target representations, close to fully supervised UCCA parser quality{---}indicating that UCCA annotations are partially redundant with STREUSLE annotations. Despite this substantial convergence between frameworks, we find several important areas of divergence.",
}
The code is based on the HIT-SCIR parser repository, which accompanies the paper HIT-SCIR at MRP 2019: A Unified Pipeline for Meaning Representation Parsing via Efficient Training and Effective Encoding from the CoNLL 2019 Shared Task on Cross-Framework Meaning Representation Parsing (MRP), and provides code to train models and pre/post-process the MRP dataset.
Changes from the original implementation are:
- Deletion of non-UCCA parsing code, for simplicity. The original code also targeted DM, PSD, EDS and AMR.
- Addition of scripts for interoperability with the UCCA XML format, under bash/mrp2xml.sh. The original code only supports the MRP format.
- Support for additional features in the input to [the parser model](modules/transition_parser_ucca.py).
- Modification of the preprocessing scripts and data reader, so that preprocessing now parses the CoNLL-U format and saves the attributes in a dict rather than as a CoNLL-U string. The data reader therefore does not need the conllu library and can simply read the attributes from the companion field.
- Fix to read edge attributes from the MRP data rather than edge properties (following the renaming of this element in the MRP format).
- Experiments with various settings, differing by input features (listed in the paper), under config/.
See REPLICATING.md for instructions on replicating the experiments reported in the paper.
Requirements:
- Python 3.6
- AllenNLP 0.9.0
The full MRP training data is available at mrp-data. Specifically, we use the publicly available UCCA data in MRP format.
After creating a Conda environment or a virtualenv, run
pip install -r requirements.txt
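For example, a minimal Conda-based setup could look like the following; the environment name ucca-mrp is arbitrary, and any Python 3.6 environment works:
conda create -n ucca-mrp python=3.6
conda activate ucca-mrp
pip install -r requirements.txt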
The parser uses BERT Large. To get the BERT checkpoints, run
cd bert/
make
To get the data, augment it with the companion data, and split it into training/validation/evaluation sets, run
cd data/
make split
For evaluation data given only as input text in MRP format, you need to convert the companion data to CoNLL-U format:
python3 toolkit/preprocess_eval.py \
udpipe.mrp \
input.mrp \
--outdir /path/to/output
The parser is built on AllenNLP; the training command looks like this:
CUDA_VISIBLE_DEVICES=${gpu_id} \
TRAIN_PATH=${train_set} \
DEV_PATH=${dev_set} \
BERT_PATH=${bert_path} \
WORD_DIM=${bert_output_dim} \
LOWER_CASE=${whether_bert_is_uncased} \
BATCH_SIZE=${batch_size} \
allennlp train \
-s ${model_save_path} \
--include-package utils \
--include-package modules \
--file-friendly-logging \
${config_file}
Refer to bash/train.sh for more detailed examples.
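For illustration only, a filled-in training invocation might look like the following. The data paths, BERT checkpoint directory, batch size, model save path and config file name are placeholders rather than fixed names from this repository, so substitute the actual files under data/, bert/ and config/ (WORD_DIM=1024 matches the hidden size of BERT Large, and LOWER_CASE should reflect whether the chosen checkpoint is uncased):
CUDA_VISIBLE_DEVICES=0 \
TRAIN_PATH=data/train.mrp \
DEV_PATH=data/dev.mrp \
BERT_PATH=bert/cased_L-24_H-1024_A-16 \
WORD_DIM=1024 \
LOWER_CASE=false \
BATCH_SIZE=8 \
allennlp train \
    -s checkpoints/ucca_bert \
    --include-package utils \
    --include-package modules \
    --file-friendly-logging \
    config/ucca_bert.jsonnet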
The prediction command looks like this:
CUDA_VISIBLE_DEVICES=${gpu_id} \
allennlp predict \
--cuda-device 0 \
--output-file ${output_path} \
--predictor ${predictor_class} \
--include-package utils \
--include-package modules \
--batch-size ${batch_size} \
--silent \
${model_save_path} \
${test_set}
More examples can be found in bash/predict.sh.
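For illustration only, assuming a model trained as above under checkpoints/ucca_bert and a test set at data/test.mrp (both placeholders), a prediction run could look like this; the predictor class name below is an assumption, so check bash/predict.sh for the exact value used in the experiments:
CUDA_VISIBLE_DEVICES=0 \
allennlp predict \
    --cuda-device 0 \
    --output-file output/test_pred.mrp \
    --predictor transition_predictor_ucca \
    --include-package utils \
    --include-package modules \
    --batch-size 8 \
    --silent \
    checkpoints/ucca_bert \
    data/test.mrp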
- bash/: command pipelines and examples
- config/: Jsonnet config files
- metrics/: metrics used in training and evaluation
- modules/: implementations of modules
- toolkit/: external libraries and dataset tools
- utils/: code for input/output and pre/post-processing
We thank the developers of the HIT-SCIR parser.