Thanks for uploading CORA to GitHub! I am trying to use your package in my project and wanted to check whether the authors intended the language_id to be added twice to the mDPR outputs before they are passed to mGEN.
In mDPR/dense_retriever.py, the method parse_qa_jsonlines_file adds the two-letter language ID to the question when the question is encoded for mDPR. Assuming this is intentional, the same two-letter language ID is then added again when the mDPR outputs are converted to the seq2seq format (here).
As a result, by the time the input sequence is sent to mGEN, the question is followed by the same language ID twice. We will follow the same format if that is what the authors intended.
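To make the behaviour concrete, here is a minimal sketch of what I think is happening. It is not the CORA code: the helper names `add_lang_tag` and `add_lang_tag_once` are hypothetical, and the `[xx]` tag format is only an assumption for illustration.

```python
def add_lang_tag(question: str, lang: str) -> str:
    """Append a language tag to the question (assumed tag format: ' [xx]')."""
    return f"{question} [{lang}]"


def add_lang_tag_once(question: str, lang: str) -> str:
    """Append the tag only if the question does not already end with it."""
    tag = f"[{lang}]"
    if question.rstrip().endswith(tag):
        return question
    return f"{question} {tag}"


question = "Who wrote Hamlet?"

# Step 1 (retrieval side): the tag is added when the question is parsed.
retriever_question = add_lang_tag(question, "en")

# Step 2 (conversion to seq2seq): the tag is added again to the same string.
generator_input = add_lang_tag(retriever_question, "en")
print(generator_input)  # Who wrote Hamlet? [en] [en]

# A guarded variant would leave the already-tagged question unchanged.
print(add_lang_tag_once(retriever_question, "en"))  # Who wrote Hamlet? [en]
```

If the duplicate tag is not intended, a guard like the second helper at the conversion step would be one way to avoid it. If it is intended, we will keep both tags so that our inputs match the format the model was trained on.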