Adding the language id twice to the question before passing it to mGEN #3

styx97 · 2022-11-28T21:02:52Z

Hi,

Thanks for uploading CORA to GitHub! I am trying to use your package in my project, and wanted to check whether the authors intended the language ID to be added twice to the mDPR outputs before they are passed to mGEN.

In mDPR/dense_retriever.py, the method parse_qa_jsonlines_file adds the two-letter language ID to the question when encoding it for mDPR. Assuming this was intentional, the same two-letter language ID is then added again when the mDPR outputs are converted to the seq2seq format (here).

As a result, by the time the input sequence is sent to mGEN, the question has the same language ID appended to its end twice. We will follow the same format if that is what the authors intended.
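
To make the behavior concrete, here is a minimal sketch of what I believe is happening. The two function names and the bracketed tag format are placeholders I made up for illustration; they are not the actual CORA code, and the real tag format may differ. The last helper shows one possible guard that appends the tag only when it is missing, in case the duplication turns out to be unintended.

```python
def add_lang_for_retrieval(question: str, lang: str) -> str:
    # Hypothetical stand-in for the step in parse_qa_jsonlines_file:
    # append the two-letter language ID before encoding for mDPR.
    return f"{question} [{lang}]"


def to_seq2seq_source(question: str, lang: str) -> str:
    # Hypothetical stand-in for the mDPR-to-seq2seq conversion:
    # append the language ID again without checking for an existing tag.
    return f"{question} [{lang}]"


def append_lang_once(question: str, lang: str) -> str:
    # Possible guard (an assumption, not the authors' method):
    # only append the tag when the question does not already end with it.
    tag = f"[{lang}]"
    stripped = question.rstrip()
    return stripped if stripped.endswith(tag) else f"{stripped} {tag}"


q = "Who wrote Kokoro?"
after_retrieval = add_lang_for_retrieval(q, "ja")
mgen_input = to_seq2seq_source(after_retrieval, "ja")
print(mgen_input)  # Who wrote Kokoro? [ja] [ja]
print(append_lang_once(after_retrieval, "ja"))  # Who wrote Kokoro? [ja]
```

If the released mGEN checkpoints were trained on inputs with the duplicated tag, keeping the duplication at inference time is probably the safer choice, which is why I am asking before changing anything.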
