GLUE-X

We collect 14 publicly available datasets as out-of-distribution (OOD) test data and evaluate widely used models on 8 classic NLP tasks. Our findings confirm that OOD accuracy in NLP tasks deserves more attention: in all settings, performance drops significantly compared with in-distribution (ID) accuracy.

Fine-tune your language model

Please check out these examples from Hugging Face Transformers to fine-tune your custom models.

Out-of-Domain Tests (OOD)

The data for all OOD tests can be found here.
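
Once a fine-tuned model has produced predictions on both an ID test set and an OOD test set, the performance decay can be quantified as a simple accuracy gap. The following is a minimal sketch in plain Python, not the repository's evaluation code: the function names `accuracy` and `ood_gap` are hypothetical, and the labels and predictions are toy values made up for illustration.

```python
from typing import Dict, Sequence


def accuracy(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of predictions that match the gold labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if len(labels) == 0:
        raise ValueError("cannot compute accuracy on an empty test set")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def ood_gap(id_acc: float, ood_acc: float) -> Dict[str, float]:
    """Absolute and relative accuracy drop from the ID to the OOD test set."""
    absolute = id_acc - ood_acc
    relative = absolute / id_acc if id_acc > 0 else 0.0
    return {"absolute_drop": absolute, "relative_drop": relative}


# Toy labels and predictions (hypothetical, for illustration only).
id_labels = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
id_preds = [1, 0, 1, 1, 0, 1, 0, 1, 1, 1]    # 9 of 10 correct
ood_labels = [1, 1, 0, 0, 1, 0, 1, 0, 1, 0]
ood_preds = [1, 0, 0, 1, 1, 0, 0, 0, 1, 1]   # 6 of 10 correct

id_acc = accuracy(id_preds, id_labels)
ood_acc = accuracy(ood_preds, ood_labels)
gap = ood_gap(id_acc, ood_acc)

print(f"ID accuracy:   {id_acc:.2f}")
print(f"OOD accuracy:  {ood_acc:.2f}")
print(f"Absolute drop: {gap['absolute_drop']:.2f}")
print(f"Relative drop: {gap['relative_drop']:.2%}")
```

Reporting the relative drop alongside the absolute one makes gaps easier to compare across tasks whose ID accuracies differ widely.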

Main Contributors

Shuibai Zhang (Code work and Experiments Implementation); Linyi Yang (Guidance and Experiments Design); Wei Zhou (Website Implementation)

Citation

If you find this work helpful for your research, please consider citing the paper as follows.

@article{yang2022glue,
  title={GLUE-X: Evaluating Natural Language Understanding Models from an Out-of-distribution Generalization Perspective},
  author={Yang, Linyi and Zhang, Shuibai and Qin, Libo and Li, Yafu and Wang, Yidong and Liu, Hanmeng and Wang, Jindong and Xie, Xing and Zhang, Yue},
  journal={arXiv preprint arXiv:2211.08073},
  year={2022}
}
