Hi, thanks for sharing your code.
I'm currently trying to reproduce your unsupervised NMT results. You mentioned that tokenized sentences with more than 175 tokens are filtered out, but I couldn't find any code in the data processing script get-data-nmt.sh that does this.
Can you confirm that the data script is up to date?
Also, I'm using the pretraining script you provided in other issues, and I noticed that the data loader in your code removes long sequences, with the limit set to 100 sub-tokens by default.
Is the 175-token filtering applied there instead?
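To clarify what I mean: if the filtering is expected to happen on the data side, I was going to add it myself after tokenization with something like the sketch below (filenames are placeholders, and the 175-token threshold is just the number you mentioned; I'm also not sure whether it should apply before or after BPE):

```sh
# Keep only lines with at most 175 whitespace-separated tokens
# in a monolingual training file (filenames are placeholders).
awk 'NF <= 175' train.en.tok > train.en.tok.filt

# For parallel valid/test data, drop a pair if either side is too long,
# so the two files stay line-aligned.
paste valid.en.tok valid.fr.tok \
  | awk -F'\t' 'split($1, a, " ") <= 175 && split($2, b, " ") <= 175' \
  > valid.filt.tsv
cut -f1 valid.filt.tsv > valid.en.tok.filt
cut -f2 valid.filt.tsv > valid.fr.tok.filt
```

Is that roughly what you did, or is the length limit meant to be handled entirely by the loader?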
Looking forward to your reply. Thanks!