This folder contains the data for the systematicity tests, which are described in Section 4.1 of the paper.
This folder also contains a script for evaluation (evaluate.py), a notebook to visualise results (visualise.ipynb), and a pickled file of our systematicity results for the paper.
The systematicity test has two different setups:
The S --> S conj S setup considers the systematic recombination of two sentences into a new sentence. This setup has two conditions:
- Systematic recombinations of sentences that are minimally different. In this condition, the consistency of the translation of `S2` across translations of `S1 and S2` and `S1' and S2` is considered, where `S1` and `S1'` are synthetic sentences that differ in only one noun.
- Systematic recombinations of sentences that are very different. In this condition, the consistency of the translation of `S2` across translations of `S1 and S2` and `S3 and S2` is considered, where `S1` and `S3` are different synthetic sentences.
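The idea behind both conditions can be illustrated with a minimal sketch. It is not the logic of evaluate.py: it assumes that the translated second conjunct can be isolated by splitting on a target-language conjunction token, and the names `second_conjunct` and `consistency_score` are hypothetical.

```python
from typing import List, Optional


def second_conjunct(translation: str, conjunction: str) -> Optional[str]:
    """Return the text after the first conjunction token, or None if it is absent.

    Assumption: whitespace tokenisation and a single known conjunction token.
    """
    tokens = translation.split()
    if conjunction not in tokens:
        return None
    split_at = tokens.index(conjunction)
    return " ".join(tokens[split_at + 1:])


def consistency_score(with_s1: List[str], with_other: List[str], conjunction: str) -> float:
    """Fraction of pairs whose translated second conjunct is identical.

    with_s1 holds translations of "S1 and S2"; with_other holds translations of
    "S1' and S2" (or "S3 and S2"), aligned line by line.
    """
    if len(with_s1) != len(with_other):
        raise ValueError("Both lists must contain the same number of translations.")
    if not with_s1:
        return 0.0
    consistent = 0
    for first, second in zip(with_s1, with_other):
        part_a = second_conjunct(first, conjunction)
        part_b = second_conjunct(second, conjunction)
        # A pair only counts as consistent if the conjunct was found and matches.
        if part_a is not None and part_a == part_b:
            consistent += 1
    return consistent / len(with_s1)
```

For example, with toy translations `["a b en c d", "e f en g h"]` and `["x b en c d", "e f en g i"]` and the conjunction `"en"`, the first pair agrees on the second conjunct and the second pair does not.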
For this setup, three data sources are available, which can be found in the respective subfolders. Each file contains a concatenation of a synthetic sentence template and a sentence from the indicated data source:
- `synthetic` contains three files per template (1 - 10): one with the original sentence (`S1`), one with the minimally different first sentence (`S1'`) and one with a different first sentence (`S3`). There are no target translations.
- `semi_natural` follows the same pattern, but for semi-natural data.
- `natural` follows the same pattern, but for natural data.
The S --> NP VP setup considers the systematic recombination of noun and verb phrases. Because this test requires control over the sentence structure and properties to ensure that the recombination is correct, it cannot be conducted with natural data. Two data sources are available, which can be found in the respective subfolders:
- `synthetic` contains three files per template (1 - 10): one with the original sentence, one in which a noun in the NP is adapted and one in which a noun in the VP is adapted.
- `semi_natural` follows the same pattern, but for semi-natural data and for the NP only.
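One plausible way to compare the translation of an original sentence with that of its adapted counterpart is to count word-level differences. The sketch below assumes whitespace tokenisation and treats a pair as consistent when the two translations have the same length and differ in at most one position. Both the criterion and the function names are assumptions for illustration; the criterion actually used is defined in the paper and in evaluate.py.

```python
from typing import List


def differs_in_at_most_one_word(original: str, adapted: str) -> bool:
    """True if both translations have equal length and differ in at most one token."""
    tokens_a = original.split()
    tokens_b = adapted.split()
    if len(tokens_a) != len(tokens_b):
        return False
    mismatches = sum(a != b for a, b in zip(tokens_a, tokens_b))
    return mismatches <= 1


def one_word_consistency(originals: List[str], adapted: List[str]) -> float:
    """Fraction of aligned translation pairs that satisfy the criterion above."""
    if len(originals) != len(adapted):
        raise ValueError("Both lists must contain the same number of translations.")
    if not originals:
        return 0.0
    hits = sum(differs_in_at_most_one_word(a, b) for a, b in zip(originals, adapted))
    return hits / len(originals)
```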
To run the test for a specific setup and condition (`synthetic`, `semi_natural` or `natural`), use your model to translate all files in the respective folder.
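Translating a whole folder could look roughly like the sketch below. It assumes that each data file holds one source sentence per line and that your model is wrapped in a function that maps a list of sentences to a list of translations. The `translate_folder` helper and the choice to reuse the input file names for the output are hypothetical; check evaluate.py for the file layout it expects.

```python
from pathlib import Path
from typing import Callable, List


def translate_folder(
    source_dir: str,
    output_dir: str,
    translate: Callable[[List[str]], List[str]],
) -> List[Path]:
    """Translate every file in source_dir and write one output file per input file."""
    source = Path(source_dir)
    target = Path(output_dir)
    target.mkdir(parents=True, exist_ok=True)

    written = []
    for src_file in sorted(p for p in source.iterdir() if p.is_file()):
        # Assumption: one sentence per line.
        sentences = src_file.read_text(encoding="utf-8").splitlines()
        translations = translate(sentences)
        if len(translations) != len(sentences):
            raise ValueError(f"Translation count mismatch for {src_file.name}")
        out_file = target / src_file.name  # hypothetical naming scheme
        out_file.write_text("\n".join(translations) + "\n", encoding="utf-8")
        written.append(out_file)
    return written
```

Keeping the translations line-aligned with the source files makes it straightforward to pair up the outputs for the different versions of each template afterwards.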
After that, use the evaluation script (evaluate.py) to compare the translations and compute consistency scores, and the visualisation notebook (visualise.ipynb) to visualise your results. Run as is, the notebook visualises the systematicity results from the paper.