Skip to content

Latest commit

 

History

History
61 lines (35 loc) · 3 KB

Readme.md

File metadata and controls

61 lines (35 loc) · 3 KB

Poem2Poem

Machine translation in poems domain preserving rhythm and rhyme

Inspired by DeepSpeare project:

https://arxiv.org/abs/1807.03491

https://github.com/jhlau/deepspeare

And by YSDA NLP course:

https://github.com/yandexdataschool/nlp_course

Also there is research paper by Marjan Ghazvininejad, Yejin Choi and Kevin Knight:

https://aclweb.org/anthology/N18-2011

But I have heard about it only after finishing this project

Goal

This project is aimed at writing a program which can automatically translate poems in one language to another (English -> Russian as example) so that translated text has rhythm and rhyme. Ideally, the program would capture poetic meter and rhyme patterns from original poem and reproduce them in translated one. However, note that such perfectly translated version generally doesn't exist in nature at all. It means that we have to find some trade-off between rhythm, rhyme, fluency, adequacy, etc. of translation.

Examples of results

"The Road Not Taken"
by Robert Frost
Automatically generated translation
containing some rhyme and rhythm
Two roads diverged in a yellow wood,
And sorry I could not travel both
And be one traveler, long I stood
And looked down one as far as I could
To where it bent in the undergrowth;
две дороги поднялись на желтые дрова
и жаль что я мог не путешествовать
и одним путником я был со мной
и взглянула вниз вниз так как я смог
туда где он погнулся под букетом

Method

Model consists of two parts: translation model and poetic meter model. They are trained simultaneously using multitask approach. Both are implemented as Encoder-Decoder architecture with Attention mechanism. Despite this fact, actually they differ much. Idea of Meter model and most of code of its loss function is borrowed from DeepSpeare project. Inference function is highly flexible and supports many modes that can be combined to each other independently:

  • With or without rhythm
  • With or without rhyme
  • Type of rhyme
  • Type of sampling and number of samples

Data

There is no parallel corpus of poems large enough to be used for training such model. Four datasets were used to train and fine-tune this model:

  • OpenSubtitles parallel English->Russian corpus
  • Parallel corpus of songs parsed by me from the Internet special for this project (Sourse text is English, target text is Russian; Russian translation of song doesn't contain rhyme and rhythm)
  • Parallel corpus of Russian classic poems translated to English by the model with the same architecture but trained on OpenSubtitles from Russian to English - Backtranslation approach.
  • Small corpus of Shakespeare sonnets translated to Russian by Marshak

How to use

This section will be added soon

Acknowledgements

https://github.com/jhlau/deepspeare

https://github.com/yandexdataschool/nlp_course

https://github.com/IlyaGusev/rupo