
Adding the language id twice to the question before passing it to mGEN #3

Open
styx97 opened this issue Nov 28, 2022 · 0 comments

styx97 commented Nov 28, 2022

Hi,

Thanks for uploading CORA to GitHub! I am trying to use your package in my project and wanted to check whether the authors intended the language ID to be added twice to the mDPR outputs before they are passed to mGEN.

In mDPR/dense_retriever.py, the method parse_qa_jsonlines_file appends the two-letter language ID to the question when the question is encoded for mDPR. Assuming this was intentional, the same two-letter language ID is then added again when the mDPR outputs are converted to the seq2seq format (here).

As a result, by the time the input sequence is sent to mGEN, the question ends with the same language ID twice. If this is what the authors intended, we will follow the same format.
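
A minimal sketch of the behavior described above, not the actual CORA code: the function names `add_lang_tag`, `retriever_stage`, `seq2seq_stage`, and `add_lang_tag_once`, as well as the `[xx]` tag format, are assumptions made for illustration only. It shows how tagging the question in both stages yields a duplicated language ID, and one possible way to keep the tagging idempotent if the duplication turns out to be unintended.

```python
def add_lang_tag(question: str, lang: str) -> str:
    """Append a two-letter language ID to the question (hypothetical tag format)."""
    return f"{question} [{lang}]"


def retriever_stage(question: str, lang: str) -> str:
    # Stand-in for the step that tags the question before mDPR encoding.
    return add_lang_tag(question, lang)


def seq2seq_stage(question: str, lang: str) -> str:
    # Stand-in for the mDPR-to-seq2seq conversion, which tags the question again.
    return add_lang_tag(question, lang)


def add_lang_tag_once(question: str, lang: str) -> str:
    """Idempotent variant: only append the tag if the question does not already end with it."""
    tag = f"[{lang}]"
    if question.rstrip().endswith(tag):
        return question
    return add_lang_tag(question, lang)


question, lang = "who wrote hamlet", "en"

# Tagging in both stages duplicates the language ID.
doubled = seq2seq_stage(retriever_stage(question, lang), lang)
print(doubled)

# Tagging idempotently in the second stage keeps a single language ID.
single = add_lang_tag_once(retriever_stage(question, lang), lang)
print(single)
```

Whether the single or the doubled form is correct depends on the input format mGEN was trained on, which is exactly the question for the authors.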
