[Help needed] file size mismatch #2938
Comments
Upon further investigation, this doesn't seem to be an issue with gallery-dl, but rather with the site. Once I let gallery-dl retry, it eventually succeeds in downloading the files. They look fine and the videos play back fine; I'm just wondering whether there might be some corruption. I can't tell.
I tried several things. What does work is sending
Thank you very much for testing. If you look at my extractor, I'm requesting pages much as a user would (root of the site, then board, then thread), with the goal of having the cookies generate. I noticed gallery-dl uses Just to make sure I understand, do you think
I think I figured it out. We can get both cookies by sending a request to Afterwards those cookies would either need to be refreshed every few minutes, or maybe it would also work to unset or extend the
Yeah, they do. Not sending them causes 8chan to periodically drop the connection during downloads and we get file size mismatch errors, which does not happen when they are included in a request. Just try it out.
I think that's unnecessary. Just remove those two requests.
edit: Using a several hours old
edit2: For reference, I'm using these as
headers = {
"Accept": "video/webm,video/ogg,video/*;q=0.9,application/ogg;q=0.7,audio/*;q=0.6,*/*;q=0.5",
"Accept-Language": "en-US,en;q=0.5",
"Range": "bytes=0-",
"DNT": "1",
"Connection": "keep-alive",
"Referer": "https://8chan.moe/v/res/673938.html",
"Cookie": "captchaexpiration=Tue, 20 Sep 2022 16:29:07 GMT; captchaid=6329e99ff54ecb27cafcdbda3ZS8RplhsSLtWoxq8dLyrTvswKEgK2jE2w+u4LirJVr3qnpfsPVwpetZVMTkKHhR6BlaL/Ox9E3QB+voGG7T0A==",
"Sec-Fetch-Dest": "video",
"Sec-Fetch-Mode": "no-cors",
"Sec-Fetch-Site": "same-origin",
"TE": "trailers",
}
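Since the `Cookie` header above carries a `captchaexpiration` value that looks like an HTTP date, one way to decide when the cookies need refreshing is to parse that date and compare it with the current time. The following is only a sketch under assumptions: that `captchaexpiration` really is the expiry time of the captcha cookies (inferred from its name, not confirmed in this thread), and the helper names `parse_cookie_header` and `captcha_cookie_expired` are hypothetical and not part of gallery-dl.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_cookie_header(header):
    """Split a 'Cookie' header value into a {name: value} dict."""
    cookies = {}
    for part in header.split(";"):
        # Split on the first '=' only, so values ending in '==' stay intact.
        name, sep, value = part.strip().partition("=")
        if sep:  # skip fragments that are not name=value pairs
            cookies[name] = value
    return cookies


def captcha_cookie_expired(header, now=None):
    """Return True if 'captchaexpiration' is missing, unparsable, or in the past.

    Assumption: the cookie value is an HTTP date marking when the
    captcha cookies stop being accepted.
    """
    raw = parse_cookie_header(header).get("captchaexpiration")
    if raw is None:
        return True
    try:
        expires = parsedate_to_datetime(raw)
    except (TypeError, ValueError):
        return True
    if expires.tzinfo is None:
        # Treat a date without zone information as UTC.
        expires = expires.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return now >= expires


# Shortened stand-in for the cookie string shown above (captchaid is a placeholder).
COOKIE = "captchaexpiration=Tue, 20 Sep 2022 16:29:07 GMT; captchaid=example"
```

An extractor could call something like `captcha_cookie_expired(COOKIE)` before each download and request fresh cookies only when it returns `True`, instead of refreshing on a fixed timer.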
I took your code from #2938 (comment), modified it quite a bit, and eventually ended up with a working extractor that does not get interrupted while downloading: 1696f68.
I'm running into an issue where gallery-dl occasionally chokes on partial content (code 206). It outputs "file size mismatch" after each of the five tries and then gives up. I read the HttpDownloader source, but I'm none the wiser. I would appreciate your help with debugging.
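To make the symptom concrete, here is a rough sketch of how a size check on a 206 response can work in general: the `Content-Range` header announces which bytes the server intends to send, and a dropped connection leaves fewer bytes on disk than announced. This is not gallery-dl's actual HttpDownloader code; the function names `parse_content_range`, `size_mismatch`, and `resume_range` are hypothetical and only illustrate the idea.

```python
import re

# Matches e.g. "bytes 0-999/5000" or "bytes 0-999/*" (total size unknown).
_CONTENT_RANGE = re.compile(r"^bytes (\d+)-(\d+)/(\d+|\*)$")


def parse_content_range(value):
    """Parse 'bytes START-END/TOTAL' into (start, end, total); total is None for '*'."""
    match = _CONTENT_RANGE.match(value.strip())
    if not match:
        raise ValueError(f"unsupported Content-Range: {value!r}")
    start, end, total = match.groups()
    return int(start), int(end), None if total == "*" else int(total)


def size_mismatch(content_range, bytes_received):
    """Return True if the number of received bytes differs from what was announced."""
    start, end, _total = parse_content_range(content_range)
    expected = end - start + 1  # byte ranges are inclusive on both ends
    return bytes_received != expected


def resume_range(bytes_on_disk):
    """Build a Range header value that asks for everything after the bytes we already have."""
    return f"bytes={bytes_on_disk}-"
```

For example, if the server announces `bytes 0-999/5000` but the connection drops after 640 bytes, `size_mismatch` reports a mismatch, and a retry could send `Range: bytes=640-` to continue from where the transfer stopped.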
The site is https://8chan.moe/
I was testing using this thread (mostly SFW): https://8chan.moe/v/res/640607.html
I think the site serves all somewhat large files in chunks; at least, all the files it choked on were larger ones.
This thread has a lot of large files (mostly SFW): https://8chan.moe/v/res/673938.html
If you run the extractor, I recommend setting sleep to 1 to make sure you don't get rate limited.
You can let the extractor run and it should choke on some file.
Here is my extractor: