This repository contains data for the WMT 2020 Quality Estimation Shared Task:
http://www.statmt.org/wmt20/quality-estimation-task.html
The data is in the 'data' folder.
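A minimal sketch of reading one of the files in the 'data' folder. It assumes, although this README does not say so, that each file is a UTF-8, tab-separated table with a header row. The column names `original`, `translation` and `z_mean`, and the helper names `read_qe_rows` and `scores`, are hypothetical placeholders: check the real header before relying on them.

```python
import csv
import io
from typing import Dict, Iterable, List

# ASSUMPTION: tab-separated lines with a header row. The column names used in
# this sketch ("original", "translation", "z_mean") are hypothetical.


def read_qe_rows(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Parse tab-separated lines into one dict per segment, keyed by header."""
    # QUOTE_NONE: quote characters inside sentences are kept as literal text
    # instead of being treated as field delimiters.
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    return list(reader)


def scores(rows: List[Dict[str, str]], column: str = "z_mean") -> List[float]:
    """Extract one numeric column (hypothetical name) as floats."""
    return [float(row[column]) for row in rows]


if __name__ == "__main__":
    # In-memory sample standing in for a real data file.
    sample = "original\ttranslation\tz_mean\nHello .\tHallo .\t0.5\n"
    rows = read_qe_rows(io.StringIO(sample))
    print(rows[0]["translation"], scores(rows))
```

To read a real file, pass an open handle instead of the sample, e.g. `open(path, encoding="utf-8", newline="")`, where `path` is whichever file you choose from the folder.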
The NMT models are in the 'nmt-models' folder.
For further details about the shared task, see http://www.statmt.org/wmt20/quality-estimation-task.html
Training data for the NMT models:
- Europarl v9
- ParaCrawl v3
- Common Crawl corpus
- News Commentary v14
- Wiki Titles v1
- Document-split Rapid corpus
- News Commentary v14
- Wiki Titles v1
- UN Parallel Corpus V1.0
- CWMT Corpus (casia2015, datum2015, datum2017, NEU)
- Europarl v8
- Rapid corpus of EU press releases
- Flores Iterative Back Translation
- Flores Iterative Back Translation
If you use this data in your work, please cite:
@article{tacl2020,
title = {Unsupervised Quality Estimation for Neural Machine Translation},
author = {Fomicheva, Marina and Sun, Shuo and Yankovskaya, Lisa and Blain, Frédéric and Guzmán, Francisco and Fishel, Mark and Aletras, Nikolaos and Chaudhary, Vishrav and Specia, Lucia},
journal = {Transactions of the Association for Computational Linguistics},
volume = {8},
pages = {539--555},
year = {2020}
}
Updates:
- 2020-03-15: Added details about the training data for the NMT models
- 2020-03-19: Released the dataset
The dataset is licensed under CC-BY-SA; see the LICENSE file for details.