Implementation of "An Efficient Self-Supervised Cross-View Training For Sentence Embedding" (TACL 2023).
```bibtex
@article{10.1162/tacl_a_00620,
    author  = {Limkonchotiwat, Peerat and Ponwitayarat, Wuttikorn and Lowphansirikul, Lalita and Udomcharoenchaikit, Can and Chuangsuwanich, Ekapol and Nutanong, Sarana},
    title   = "{An Efficient Self-Supervised Cross-View Training For Sentence Embedding}",
    journal = {Transactions of the Association for Computational Linguistics},
    volume  = {11},
    pages   = {1572-1587},
    year    = {2023},
    month   = {12},
    issn    = {2307-387X},
    doi     = {10.1162/tacl_a_00620},
    url     = {https://doi.org/10.1162/tacl\_a\_00620},
    eprint  = {https://direct.mit.edu/tacl/article-pdf/doi/10.1162/tacl\_a\_00620/2196817/tacl\_a\_00620.pdf},
}
```
```shell
git clone https://github.com/mrpeerat/SCT
cd SCT
pip install -e .
```
Available distillation models:
- SCT-Distillation-BERT-Tiny
- SCT-Distillation-BERT-Mini
- SCT-Distillation-BERT-Small
- SCT-Distillation-BERT-Base
We use the training data from the BSL paper.
We use the STS-B development set from Sentence-Transformers.
Self-supervised:
Models | Reference Temp | Student Temp | Queue Size | Learning Rate |
---|---|---|---|---|
BERT-Tiny | 0.03 | 0.04 | 131072 | 5e-4 |
BERT-Mini | 0.01 | 0.03 | 131072 | 3e-4 |
BERT-Small | 0.02 | 0.03 | 65536 | 3e-4 |
BERT-Base | 0.04 | 0.05 | 65536 | 5e-4 |
BERT-Large | 0.04 | 0.05 | 16384 | 5e-4 |
Distillation:
Models | Reference Temp | Student Temp | Queue Size | Learning Rate |
---|---|---|---|---|
BERT-Tiny | 0.03 | 0.04 | 131072 | 5e-4 |
BERT-Mini | 0.04 | 0.05 | 65536 | 1e-4 |
BERT-Small | 0.04 | 0.05 | 131072 | 1e-4 |
BERT-Base | 0.04 | 0.05 | 65536 | 1e-4 |
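In both settings, each row pairs a sharpened reference (teacher) distribution with a student distribution computed over a large queue of negatives. The following is a minimal NumPy sketch of that temperature-scaled distribution-matching idea; the function name and toy data are illustrative, and this is a generic sketch, not the exact SCT objective from the paper:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def distribution_match_loss(student_sim, reference_sim,
                            student_temp=0.05, reference_temp=0.04):
    """Cross-entropy between a temperature-sharpened reference distribution
    and the student distribution over similarities to a negative queue.

    student_sim / reference_sim: (batch, queue_size) similarity scores.
    A generic illustration of teacher/student temperatures, not SCT itself.
    """
    p_ref = softmax(reference_sim / reference_temp, axis=-1)          # target
    log_p_student = np.log(softmax(student_sim / student_temp, axis=-1) + 1e-12)
    return -(p_ref * log_p_student).sum(axis=-1).mean()

# Toy similarities standing in for encoder outputs against the queue.
rng = np.random.default_rng(0)
sims = rng.standard_normal((4, 16))
loss = distribution_match_loss(sims, sims)
```

A lower reference temperature sharpens the target distribution, which is why the tables tune the two temperatures separately.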
Please set the model parameters (see the tables above) before training.
```shell
bash Running_distillation_script.sh  # distillation
bash Running_script.sh               # self-supervised
```
For fine-tuning, we search over the following hyperparameter grids:
```shell
learning_rate_all=(1e-4 3e-4 5e-4)
queue_sizes=(131072 65536 16384)
teacher_temps=(0.01 0.02 0.03 0.04 0.05 0.06 0.07)
student_temps=(0.01 0.02 0.03 0.04 0.05 0.06 0.07)
```
Our evaluation code for sentence embeddings is based on modified versions of SentEval and SimCSE.
Before evaluation, please download the evaluation datasets by running:
```shell
cd SentEval
pip install -e .
cd data/downstream/
bash download_dataset.sh
```
Please see the example notebooks.
```shell
python evaluation.py \
    --model_name_or_path "your-model-path" \
    --task_set sts \
    --mode test
```
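STS benchmarks score a model by the Spearman correlation between the cosine similarities of sentence-pair embeddings and human judgments. A self-contained NumPy sketch of that metric on toy data (not the SentEval pipeline; names and data are illustrative):

```python
import numpy as np

def cosine_sim(a, b):
    # Row-wise cosine similarity between two embedding matrices.
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return (a * b).sum(axis=1)

def spearman(x, y):
    # Spearman correlation = Pearson correlation on ranks
    # (no tie handling; fine for a sketch).
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    return float(np.corrcoef(rx, ry)[0, 1])

# Toy embeddings standing in for model outputs on sentence pairs.
rng = np.random.default_rng(0)
emb_a = rng.standard_normal((5, 8))
emb_b = emb_a + 0.1 * rng.standard_normal((5, 8))
gold = cosine_sim(emb_a, emb_b)  # pretend gold scores track cosine here
score = spearman(cosine_sim(emb_a, emb_b), gold)  # 1.0 by construction
```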
Self-supervised:
Models | STS (Avg.) |
---|---|
SCT-BERT-Tiny | 69.73 |
SCT-BERT-Mini | 69.59 |
SCT-BERT-Small | 72.56 |
SCT-BERT-Base | 75.55 |
SCT-BERT-Large | 78.16 |
Distillation:
Models | STS (Avg.) |
---|---|
SCT-Distillation-BERT-Tiny | 76.43 |
SCT-Distillation-BERT-Mini | 77.58 |
SCT-Distillation-BERT-Small | 78.16 |
SCT-Distillation-BERT-Base | 79.58 |
Self-supervised:
Models | Reranking (Avg.) | NLI (Avg.) |
---|---|---|
SCT-BERT-Tiny | 55.29 | 71.89 |
SCT-BERT-Small | 58.59 | 75.70 |
SCT-BERT-Base | 60.97 | 77.93 |
SCT-BERT-Large | 63.02 | 79.55 |
Distillation:
Models | Reranking (Avg.) | NLI (Avg.) |
---|---|---|
SCT-Distillation-BERT-Tiny | 61.14 | 78.53 |
SCT-Distillation-BERT-Small | 61.94 | 80.44 |
SCT-Distillation-BERT-Base | 64.63 | 80.97 |