Could you share the processed all.txt? #23
Comments
Thanks for using my code!
Thanks! The errors were something like 403 Forbidden.
Hmm, that looks tough, and I'm not familiar with network connections in China.
In the download_*.py you're using, try a fix like
If that changes nothing, I'm out of ideas!
Haha, I'll try. Thank you very much for the instant reply!
Did you succeed with wget? My guess is that some kind of IP block happened.
Yes, I was able to download using
Actually, no, it fails. I'm on a us-east AWS EC2 instance.
Thank you for the information. @thudzj, by the way, as shown in my comment (#24 (comment)), you can try the unknown file on Google Drive (at your own risk).
Hi Sosuke,
Thanks a lot for the wonderful work! I hoped to obtain the bookcorpus dataset with your crawler, but I failed to crawl the articles owing to some network errors, and I am afraid that I cannot get a complete dataset. Could you please share the dataset you have, e.g. the all.txt? My email address is [email protected]. Thanks!
Zhijie