
Overgeneralisation

This folder contains the data for the overgeneralisation test described in section 4.3 of the paper.

This folder also contains an evaluation script (evaluate.py), a visualisation notebook (visualise.ipynb), a pickled file with our substitutivity results for the paper (results.pickle), and a list of the idioms that we considered together with their frequencies in OPUS (idioms.tsv and idiom_frequencies.tsv).

Idioms

An overview can be found in the file idioms.tsv, which contains four columns:

idiom: the idiom itself, in the multiple variations that we searched for.
english_keywords: English words that could be used to find partial matches in OPUS. We did not use them.
dutch_keywords: potential translations of the English keywords that a model would output in a literal Dutch translation.
sub_clause: the subordinate clause taken from OPUS and inserted into the synthetic and semi-natural corpora.
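
As a minimal sketch of how the overview could be read, the snippet below parses a tab-separated file into one dictionary per idiom. The column names come from the list above; the function name `load_idioms`, the `has_header` flag, and the presence of a header row in idioms.tsv are assumptions, so check the file before relying on them.

```python
import csv
from pathlib import Path

# Column names as listed in this README.
COLUMNS = ["idiom", "english_keywords", "dutch_keywords", "sub_clause"]


def load_idioms(path, has_header=True):
    """Read a tab-separated idiom overview into a list of dicts keyed by COLUMNS.

    `has_header` is an assumption: set it to False if the first line
    of the file is already an idiom entry rather than column names.
    """
    with Path(path).open(encoding="utf-8", newline="") as handle:
        # QUOTE_NONE keeps quotation marks inside idioms untouched.
        rows = list(csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE))
    if has_header and rows:
        rows = rows[1:]
    # Skip empty lines; zip() ignores any unexpected extra columns.
    return [dict(zip(COLUMNS, row)) for row in rows if row]
```

Calling `load_idioms("idioms.tsv")` from inside this folder would then return one entry per row, and `entry["dutch_keywords"]` gives the literal-translation keywords for that idiom.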

Data

The three data sources are in the respective subfolders:

synthetic: contains one subfolder per template (1 - 10); each subfolder holds 500 samples per idiom (0 - 19). There are no target translations.
semi_natural: contains one subfolder per template (1 - 10); each subfolder holds 500 samples per idiom (0 - 19). There are no target translations.
natural: contains one file per idiom (0 - 19), with as many exact matches as we could find in OPUS, given the syntactic variations of the idioms listed in idioms.tsv. OPUS target translations are available with the extension .nl.
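
The layout above can be explored with a short sketch like the following, which collects the source files of one condition and leaves out the .nl target translations. The helper name `list_source_files` is hypothetical, and because the exact file names inside the subfolders are not stated here, the sketch simply walks the folder recursively instead of assuming a naming pattern.

```python
from pathlib import Path

# The three conditions named in this README.
CONDITIONS = ("synthetic", "semi_natural", "natural")


def list_source_files(root, condition):
    """Return all source-side files for one condition, sorted by path.

    Files ending in .nl are treated as OPUS target translations and skipped.
    """
    if condition not in CONDITIONS:
        raise ValueError(f"unknown condition: {condition!r}")
    base = Path(root) / condition
    return sorted(
        path
        for path in base.rglob("*")
        if path.is_file() and path.suffix != ".nl"
    )
```

For example, `list_source_files(".", "natural")` run from this folder would list the per-idiom source files, while the same call with `"synthetic"` would descend into the template subfolders.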

Usage

To run the test for a specific setup and condition (synthetic, semi_natural or natural), use your model to translate all files in the respective folder. You can then use the evaluation script to compare the translations systematically and compute consistency scores, and the visualisation notebook to visualise your results. Run as is, the visualisation notebook visualises the systematicity results from the paper.
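
The translation step could look roughly like the sketch below. Everything in it is illustrative: `translate_folder` and the `translate_lines` callback are hypothetical names, the one-sentence-per-line file format and the `.pred` output suffix are assumptions, and the input layout that evaluate.py expects is not described here, so consult that script before settling on an output location.

```python
from pathlib import Path


def translate_folder(src_dir, out_dir, translate_lines, skip_suffix=".nl"):
    """Translate every source file under src_dir, mirroring the tree under out_dir.

    `translate_lines` is a user-supplied callable (hypothetical) that takes a
    list of source sentences and returns a list of translations of equal length.
    """
    src_dir, out_dir = Path(src_dir), Path(out_dir)
    written = []
    # Materialise the file list first so newly written outputs are never re-read.
    for src in sorted(p for p in src_dir.rglob("*") if p.is_file()):
        if src.suffix == skip_suffix:
            continue  # reference translations, not model input
        # Assumption: one sentence per line.
        sentences = src.read_text(encoding="utf-8").splitlines()
        translations = translate_lines(sentences)
        if len(translations) != len(sentences):
            raise ValueError(
                f"{src}: expected {len(sentences)} translations, "
                f"got {len(translations)}"
            )
        dst = out_dir / src.relative_to(src_dir)
        dst = dst.with_name(dst.name + ".pred")  # hypothetical naming convention
        dst.parent.mkdir(parents=True, exist_ok=True)
        dst.write_text("\n".join(translations) + "\n", encoding="utf-8")
        written.append(dst)
    return written
```

A call such as `translate_folder("synthetic", "output/synthetic", my_model_translate)` would then produce one output file per input file, where `my_model_translate` wraps whatever translation model you are testing.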
