skswldndi/KpopMT

KpopMT: Translation Dataset with Terminology for Kpop Fandom

We propose the KpopMT dataset, which targets precise terminology translation. We chose Kpop fandom as the first social group to cover, given its global popularity. Expert translators provide 1k English translations of Korean posts and comments, each annotated with the terminology specific to the group's language system. We evaluate existing translation systems, including GPT models, on KpopMT to identify their failure cases. The results show low scores overall, underscoring how hard it is to reflect group-specific terminology and style in translation. We plan to expand KpopMT to other social groups, such as sports and global movie communities.

Important

We provide three kinds of datasets: parallel (tagged.lang.txt), monolingual (fan-monolingual.lang), and termbase (termbase-category). Details are in the paper.
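
As a rough illustration of how the parallel files might be read, here is a minimal sketch. It assumes that the `lang` part of `tagged.lang.txt` stands for a language code (for example `ko` and `en`), that the two files are UTF-8 text, and that line *i* of one file is the translation of line *i* of the other. None of these assumptions is confirmed by this README, and `load_parallel` is a hypothetical helper, not part of the repository; check the paper and the files themselves before relying on it.

```python
from pathlib import Path
from typing import List, Tuple


def read_lines(path: Path) -> List[str]:
    """Read a UTF-8 text file and return its lines without trailing newlines."""
    with path.open(encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


def load_parallel(src_path: str, tgt_path: str) -> List[Tuple[str, str]]:
    """Pair up two files line by line (assumed layout, not confirmed by the README)."""
    src = read_lines(Path(src_path))
    tgt = read_lines(Path(tgt_path))
    if len(src) != len(tgt):
        # A length mismatch means the line-aligned assumption does not hold.
        raise ValueError(
            f"Line count mismatch: {len(src)} source vs {len(tgt)} target lines"
        )
    return list(zip(src, tgt))


# Hypothetical usage; the file names below are assumed, not taken from the repo:
# pairs = load_parallel("tagged.ko.txt", "tagged.en.txt")
# print(len(pairs), pairs[0])
```

The length check is deliberate: if the files turn out not to be line-aligned, the helper fails loudly instead of silently pairing unrelated sentences.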

Citation

@misc{kim2024kpopmttranslationdatasetterminology,
      title={KpopMT: Translation Dataset with Terminology for Kpop Fandom}, 
      author={JiWoo Kim and Yunsu Kim and JinYeong Bak},
      year={2024},
      eprint={2407.07413},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2407.07413}, 
}
