This repository contains the resources for our papers *Exploring Large Language Models for Classical Philology* and *Graecia capta ferum victorem cepit. Detecting Latin Allusions to Ancient Greek Literature*.
By request, we provide simple pipelines for:
Please note that while the general setup is similar to the one used in our paper, this version prioritizes readability and flexibility over exact replication.
We use pdm for easy dependency management. To install the required packages:

- Install pdm.
- Run the following command:

```bash
pdm install
```

This will take care of all the necessary dependencies.
To download the Universal Dependencies treebanks, you can use the following commands:
```bash
wget -P data/ https://lindat.mff.cuni.cz/repository/xmlui/bitstream/handle/11234/1-5502/ud-treebanks-v2.14.tgz
tar --extract --file data/ud-treebanks-v2.14.tgz -C data/
rm data/ud-treebanks-v2.14.tgz
```
This will download the treebanks, extract them into the data/ directory, and remove the downloaded archive.
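The extracted treebanks are CoNLL-U files: one token per line with ten tab-separated columns, blank lines between sentences, and comment lines starting with `#`. For quickly inspecting the data, a minimal reader could look like the sketch below. It is not part of this repository; the names `parse_conllu` and `read_conllu` are hypothetical, and it keeps only a few columns (ID, FORM, LEMMA, UPOS, HEAD, DEPREL).

```python
from __future__ import annotations

from typing import Iterable, Iterator


def parse_conllu(lines: Iterable[str]) -> Iterator[list[dict]]:
    """Yield one sentence at a time as a list of token dicts (hypothetical helper)."""
    sentence: list[dict] = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():  # a blank line ends the current sentence
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):  # sentence-level comments (sent_id, text, ...)
            continue
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue  # skip multiword-token ranges ("1-2") and empty nodes ("3.1")
        sentence.append({
            "id": int(cols[0]),
            "form": cols[1],
            "lemma": cols[2],
            "upos": cols[3],
            "head": int(cols[6]),  # 0 means the token attaches to the root
            "deprel": cols[7],
        })
    if sentence:  # the file may not end with a blank line
        yield sentence


def read_conllu(path: str) -> list[list[dict]]:
    """Read all sentences of a CoNLL-U file into memory."""
    with open(path, encoding="utf-8") as f:
        return list(parse_conllu(f))
```

Assuming the usual UD directory layout, a call such as `read_conllu("data/ud-treebanks-v2.14/UD_Latin-Perseus/la_perseus-ud-train.conllu")` would return that file's sentences; adjust the path to the treebank you actually use.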
In the configs/ directory, you will find default configurations for the three tasks. Feel free to adjust them to your needs or create new ones. To run a script, use the following command format:
```bash
pdm run python src/ancient-language-models/script_name.py configs/config-name.py
```
For example, to run the unlabeled parsing script:
```bash
pdm run python src/ancient-language-models/unlabeled_parsing.py configs/unlabeled_parsing-config.py
```
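Each script receives the path to a Python config file as its argument. If you write your own scripts or configs, one common way to load such a file by path is the standard library's `importlib`. The sketch below is hypothetical and not necessarily how the repository's scripts load their configs; the function name `load_config` and the convention of returning all public top-level names as a dict are assumptions.

```python
from __future__ import annotations

import importlib.util
from pathlib import Path
from types import ModuleType
from typing import Any


def load_config(path: str | Path) -> dict[str, Any]:
    """Execute a Python config file and return its public top-level names (hypothetical)."""
    path = Path(path)
    # The module name is arbitrary; hyphens in the file name are fine here.
    spec = importlib.util.spec_from_file_location(path.stem, str(path))
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot load config from {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the config file's code
    return {
        name: value
        for name, value in vars(module).items()
        # Skip private/dunder names and imported modules such as `import math`.
        if not name.startswith("_") and not isinstance(value, ModuleType)
    }
```

A script could then call `load_config(sys.argv[1])` at start-up. Because configs are plain Python, they can compute values (e.g. `batch_size = 2 * 16`), which is the main advantage over static formats like JSON.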
The scripts include small wandb (Weights & Biases) sweeps, which you may want to extend or adjust to your needs. For example, the dependency parsing script only computes uncorrected predictions, i.e. heads are predicted without enforcing tree constraints, so there is no guarantee that the output is a valid tree: it may contain cycles or multiple root nodes. To address this, you might consider adding a decoding algorithm such as Chu-Liu/Edmonds to ensure a valid tree structure.
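A minimal plain-Python sketch of such a decoder is shown below. It assumes an (n + 1) x (n + 1) matrix `scores[h][d]` of arc scores for one sentence, with node 0 as the artificial root; this convention and all function names are assumptions for illustration, not the repository's API. Chu-Liu/Edmonds itself rules out cycles but still allows several tokens to attach to the root, so the `decode_single_root` wrapper adds a simple extra step: it tries each token as the sole root child and keeps the best-scoring tree. This is a straightforward recursive version, fine for sentence-length inputs but not optimized.

```python
from __future__ import annotations

import math

NEG_INF = -math.inf


def _find_cycle(heads: list[int]) -> list[int] | None:
    """Return the nodes of one cycle in a head assignment, or None if acyclic."""
    state = [0] * len(heads)  # 0 = unvisited, 1 = on current path, 2 = finished
    for start in range(1, len(heads)):
        path, node = [], start
        while node != 0 and state[node] == 0:
            state[node] = 1
            path.append(node)
            node = heads[node]
        if node != 0 and state[node] == 1:
            return path[path.index(node):]  # walked back into the current path
        for v in path:
            state[v] = 2
    return None


def chu_liu_edmonds(scores: list[list[float]]) -> list[int]:
    """Maximum spanning arborescence rooted at node 0.

    scores[h][d] scores attaching dependent d to head h. Returns heads with
    heads[d] = head of d and heads[0] = -1.
    """
    n = len(scores)
    # 1. Greedily pick the best-scoring head for every non-root node.
    heads = [-1] + [
        max((h for h in range(n) if h != d), key=lambda h: scores[h][d])
        for d in range(1, n)
    ]
    cycle = _find_cycle(heads)
    if cycle is None:
        return heads

    # 2. Contract the cycle into a single new node with index m.
    in_cycle = set(cycle)
    rest = [v for v in range(n) if v not in in_cycle]  # rest[0] is the root
    m = len(rest)
    sub = [[NEG_INF] * (m + 1) for _ in range(m + 1)]
    enter: dict[int, int] = {}  # outside head i -> cycle node it would attach to
    leave: dict[int, int] = {}  # outside dependent i -> cycle node heading it
    for i, u in enumerate(rest):
        for j, w in enumerate(rest):
            if i != j:
                sub[i][j] = scores[u][w]
        # Entering the cycle at v replaces v's current cycle edge.
        v_in = max(cycle, key=lambda v: scores[u][v] - scores[heads[v]][v])
        sub[i][m] = scores[u][v_in] - scores[heads[v_in]][v_in]
        enter[i] = v_in
        v_out = max(cycle, key=lambda v: scores[v][u])
        sub[m][i] = scores[v_out][u]
        leave[i] = v_out

    # 3. Solve the smaller problem recursively, then expand the cycle again.
    sub_heads = chu_liu_edmonds(sub)
    result = heads[:]  # cycle nodes keep their cycle heads by default
    for j in range(1, m):
        h = sub_heads[j]
        result[rest[j]] = leave[j] if h == m else rest[h]
    i = sub_heads[m]  # the outside node that heads the contracted cycle
    result[enter[i]] = rest[i]  # break the cycle where that edge enters
    return result


def tree_score(scores: list[list[float]], heads: list[int]) -> float:
    return sum(scores[heads[d]][d] for d in range(1, len(heads)))


def decode_single_root(scores: list[list[float]]) -> list[int]:
    """Best tree with exactly one child of the root (needs at least one token)."""
    n = len(scores)
    best, best_score = None, NEG_INF
    for r in range(1, n):
        constrained = [row[:] for row in scores]
        for d in range(1, n):
            if d != r:
                constrained[0][d] = NEG_INF  # forbid every other root attachment
        heads = chu_liu_edmonds(constrained)
        score = tree_score(scores, heads)
        if score > best_score:
            best, best_score = heads, score
    return best
```

How to obtain the score matrix from the parsing script's model outputs depends on the script and is not shown here.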
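The pre-trained models introduced in *Exploring Large Language Models for Classical Philology*:

| | Greek | Latin | Multilingual |
|---|---|---|---|
| Encoder-only | GrεBERTa | LaBERTa | PhilBERTa |
| Encoder-decoder | GrεTa | LaTa | PhilTa |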
In our paper Graecia capta ferum victorem cepit. Detecting Latin Allusions to Ancient Greek Literature, we introduce SPhilBERTa, a Sentence Transformer model to identify cross-lingual references between Latin and Ancient Greek texts. SPhilBERTa can be found here.
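Sentence Transformer models map each sentence to a fixed-size vector, and embeddings produced this way are commonly compared with cosine similarity. As an illustration of how candidate references could be ranked once a Latin query and Greek passages have been embedded with the same model (for example via the `sentence-transformers` library's `SentenceTransformer.encode`), here is a small sketch operating on precomputed vectors. The helper names `cosine` and `rank_by_cosine` are hypothetical, and this is not presented as the exact procedure from the paper.

```python
from __future__ import annotations

import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_by_cosine(
    query: Sequence[float],
    candidates: Sequence[Sequence[float]],
    top_k: int | None = None,
) -> list[tuple[int, float]]:
    """Return (candidate index, similarity) pairs, most similar first."""
    scored = [(i, cosine(query, c)) for i, c in enumerate(candidates)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]  # top_k=None keeps every candidate
```

In practice the vectors would come from the model rather than being written by hand, and for large corpora a vectorized implementation (e.g. with NumPy) would be much faster than these pure-Python loops.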
If you have any questions or problems, feel free to reach out.
```bibtex
@inproceedings{riemenschneider-frank-2023-exploring,
    title = "Exploring Large Language Models for Classical Philology",
    author = "Riemenschneider, Frederick and
      Frank, Anette",
    editor = "Rogers, Anna and
      Boyd-Graber, Jordan and
      Okazaki, Naoaki",
    booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = jul,
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.acl-long.846",
    doi = "10.18653/v1/2023.acl-long.846",
    pages = "15181--15199",
}
```