file not found #2
Comments
Hi, you can get the file here, for example: https://github.com/nawnoes/data-preprocess/blob/master/WikiExtractor.py. Note that you don't actually have to download, extract and process the wiki dumps yourself. We have also released the processed dumps used to train our system here: https://github.com/ufal/multilexnorm2021/releases/tag/v1.0.0
Thanks a lot for your help. I have another question. After synthetic pre-training, I need to load the saved checkpoint and fine-tune it on the hand-annotated training data. Is this procedure right? At the moment I fine-tune the byt5 model directly on the hand-annotated training data, and I only get an ERR of 70.15 on English.
That sounds alright. I'm not sure what validation dataset you use, but reducing the error by 70% seems good to me :)
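For reference, here is a minimal sketch of how an ERR figure like the one above can be read. It assumes the usual definition of error reduction rate, i.e. word-level accuracy normalized against a "leave everything as is" baseline, and it assumes flat, already-aligned token lists. The function name `error_reduction_rate` and the toy tokens are made up for illustration; the official evaluation script may differ in details such as casing or file handling.

```python
from typing import Sequence


def error_reduction_rate(
    raw: Sequence[str], gold: Sequence[str], pred: Sequence[str]
) -> float:
    """ERR = (acc_system - acc_baseline) / (1 - acc_baseline).

    The baseline is the accuracy obtained by copying the raw input unchanged.
    All three sequences must be aligned token by token.
    """
    if not (len(raw) == len(gold) == len(pred)):
        raise ValueError("raw, gold and pred must have the same length")
    if len(gold) == 0:
        raise ValueError("cannot compute ERR on an empty input")

    total = len(gold)
    # Tokens that are already correct without any normalization.
    baseline_correct = sum(r == g for r, g in zip(raw, gold))
    # Tokens the system got right.
    system_correct = sum(p == g for p, g in zip(pred, gold))

    acc_baseline = baseline_correct / total
    acc_system = system_correct / total
    if acc_baseline == 1.0:
        raise ValueError("nothing needs normalization, ERR is undefined")
    return (acc_system - acc_baseline) / (1.0 - acc_baseline)


# Hypothetical toy example: 3 of 5 tokens need normalization, the system fixes 2.
raw = ["u", "r", "so", "gr8", "lol"]
gold = ["you", "are", "so", "great", "lol"]
pred = ["you", "r", "so", "great", "lol"]
err = error_reduction_rate(raw, gold, pred)
```

Under this definition, an ERR of 0 means the system is no better than leaving the text untouched, and a negative ERR means it introduced more errors than it fixed.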
As for the validation dataset, I simply use the test file at /data/multilexnorm/test/eval/test/intrinsic_evaluation/en/test.norm.masked, and I am trying to reach the performance reported in the paper (73.8 on English).
I can't find the file /utility/WikiExtractor.py, which is used in initialize.sh. It seems to be important for synthetic pre-training.