[deviantArt] [bug] Downloading/Reparsing many journals results in "API responded with 429 Too Many Requests" despite using own account, while art download works fine #2702

Closed
a-washing-machine opened this issue Jun 25, 2022 · 7 comments

Comments

@a-washing-machine

a-washing-machine commented Jun 25, 2022

For each artist's gallery I download, I also grab their journals. I run my queue every other month, with -A=8 as the abort criterion.

I haven't set that up in my config; I still do this using:
gallery-dl_1.22.1.exe -A=8 --config my_config.conf https://www.deviantart.com/ARTISTNAME/gallery/?catpath=/
gallery-dl_1.22.1.exe -A=8 --config my_config.conf https://www.deviantart.com/ARTISTNAME/gallery/?catpath=scraps
gallery-dl_1.22.1.exe -A=8 --config my_config.conf https://www.deviantart.com/ARTISTNAME/journal/

And while galleries and scraps run fine, after a while, journal-downloads keep running into this issue:

gallery-dl_1.22.1.exe -A=8 --config my_config.conf https://www.deviantart.com/ARTISTNAME/journal/
[deviantart][warning] API responded with 429 Too Many Requests. Using 1s delay.
[deviantart][warning] API responded with 429 Too Many Requests. Using 2s delay.
[deviantart][warning] API responded with 429 Too Many Requests. Using 3s delay.
[deviantart][warning] API responded with 429 Too Many Requests. Using 4s delay.
. . .
[deviantart][warning] API responded with 429 Too Many Requests. Using 17s delay.

...And yes, gallery-downloads themselves (mostly) still run fine even when this occurs. In fact, downloading a large amount of new artworks even provides a "cool down period" for the journal API requests!
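
The warnings above show the delay growing by one second for each consecutive 429 response. A minimal sketch of that kind of linear backoff is below. It assumes a hypothetical `request` callable that returns a `(status_code, payload)` pair; it is not gallery-dl's actual code, only an illustration of the pattern visible in the log.

```python
import time


def call_with_backoff(request, sleep=time.sleep, max_attempts=20):
    """Retry request() while it returns 429, adding 1s of delay per retry.

    `request` is a hypothetical callable returning (status_code, payload);
    `sleep` is injectable so the behaviour can be checked without waiting.
    """
    delay = 0
    for _ in range(max_attempts):
        status, payload = request()
        if status != 429:
            return payload
        # Each consecutive 429 lengthens the wait by one second.
        delay += 1
        print(f"API responded with 429 Too Many Requests. Using {delay}s delay.")
        sleep(delay)
    raise RuntimeError(f"still rate limited after {max_attempts} attempts")
```

With a backoff like this, a server that keeps answering 429 makes every retry slower than the last, which would match the queue slowing down more and more once the limit is hit.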

This bug has actually been around for a long time, but I originally thought I could just circumvent the issue on my end. No need to bother someone to fix something if I can just make the problem go away myself, I figured.

Finally got around to trying it, and...

Nope, turns out, rearranging my download queue so all the journal-downloads aren't right next to each other anymore didn't fix the issue.

The issue is easiest to replicate by downloading from someone with 800 - 1000 journals (yes, apparently those exist), or from multiple artists with several hundred journals each, in direct succession. (Or even by just reparsing them, with the files themselves already downloaded.)

Example user with 1900+ journals:
https://www.deviantart.com/c-puff/posts/journals

Keep in mind that with -A=8 it usually doesn't come to this extreme. It still hits the issue eventually, though, and once it does, it keeps hitting it whether new journals are found or not, slowing down my entire download queue by at least a day or two every time. :/


Hilariously, I also encountered this:

gallery-dl_1.22.1.exe -A=8 --config my_config.conf https://www.deviantart.com/ARTIST_WITH_NO_JOURNALS/journal/
[deviantart][warning] API responded with 429 Too Many Requests. Using 1s delay.
[deviantart][warning] API responded with 429 Too Many Requests. Using 2s delay.
[deviantart][warning] API responded with 429 Too Many Requests. Using 3s delay.
[deviantart][warning] API responded with 429 Too Many Requests. Using 4s delay.
. . .
[deviantart][warning] API responded with 429 Too Many Requests. Using 17s delay.
[deviantart][info] No results for https://www.deviantart.com/ARTIST_WITH_NO_JOURNALS/journal/

...So, even when there AREN'T any journals, the issue still persists.


...Are journals perhaps not using the correct credentials, or is their wait timer set too low compared to artwork downloads?

@rautamiekka
Contributor

Yeah, for me it started doing that at offset 600 for the c-puff one, but then again my settings do additional queries for completeness, so I'm likely doing three times as many queries.

@a-washing-machine
Author

Out of curiosity, what "other queries for completeness" are we talking about here?

I vaguely remember reading somewhere about a check to see if a larger resolution than the already saved image was available, or something to that effect. Is it that?

@rautamiekka
Contributor

https://github.com/mikf/gallery-dl/blob/master/docs/configuration.rst#extractordeviantartextra
https://github.com/mikf/gallery-dl/blob/master/docs/configuration.rst#extractor-deviantart-metadata

cuz I save the description, tags/keywords, and the JSON data of the post; the folders are normally not saved, but the description file does have it predefined just in case.

@a-washing-machine
Author

Ah, I see, that does seem like useful information to save somewhere. :)

@mikf
Owner

mikf commented Jun 27, 2022

I can give you an explanation as to why this happens, but not a real solution.

The main "culprit" here is the change from commit 65b1cb7, which makes gallery-dl use a private access token when fetching journal texts. This is necessary when a journal can only be accessed by logged-in users, so, to make things easier to implement, I just enabled this behavior in general.

The problem with that is that private tokens have a much lower rate limit than public tokens.

You could try using a rather high --sleep-request value, but I should also at least implement an option to disable the behavior from 65b1cb7.
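
To illustrate what a per-request sleep does in general, here is a small sketch of a throttle that enforces a minimum interval between requests. The class name `RequestThrottle` and its parameters are hypothetical, and this is not how gallery-dl implements `--sleep-request`; it only shows the idea of spacing calls out so that a low rate limit is hit less often.

```python
import time


class RequestThrottle:
    """Enforce a minimum interval (in seconds) between successive requests."""

    def __init__(self, interval, clock=time.monotonic, sleep=time.sleep):
        self.interval = interval
        self._clock = clock      # injectable for testing
        self._sleep = sleep      # injectable for testing
        self._last = None        # time of the previous request, if any

    def wait(self):
        """Block until at least `interval` seconds passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling `wait()` before every API request keeps the request rate at or below one per `interval` seconds. Which interval is low enough for a private token is not stated in this thread, so it would have to be found by experiment.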

> Hilariously, I also encountered this:
> ...

gallery-dl does 2 API calls here: the first with a public token, followed by the same call with a private token to check for potential private journals/deviations, which then triggers the 429 error.

@a-washing-machine
Author

Sorry for the late reply

Hmm. Didn't know there was such a thing as "private journals". Do those happen often?

What would happen with "private journals" if gallery-dl used the public key for journals instead?

Would gallery-dl just not "see" them, or get some kind of detectable "access denied" error that it could react to by (temporarily) switching to the "private key"?

Or would that mess up the "abort" parameter?

Also, what'd be a good --sleep-request value? At a guess, I mean. 1 second, 5 seconds, 20 seconds? Then I can take that and experiment if I can go lower or need to go higher.

mikf added a commit that referenced this issue Jul 31, 2022
and retry with a private token if needed
@mikf
Owner

mikf commented Aug 29, 2022

This is fixed in v1.23.0
gallery-dl now uses a public token for journals by default and only falls back to a private one if a journal needs a login to be viewed.
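
The public-first, private-on-demand behavior described here can be sketched roughly as follows. Everything in this block is hypothetical (the `LoginRequired` exception, the `fetch` callable, and the token arguments); it is an illustration of the fallback pattern, not the code from the referenced commit.

```python
class LoginRequired(Exception):
    """Raised by the hypothetical fetch function when a journal needs a login."""


def fetch_journal(journal_id, fetch, public_token, private_token):
    """Fetch a journal with the public token, retrying privately only if needed.

    `fetch(journal_id, token)` is a hypothetical callable that returns the
    journal text or raises LoginRequired for login-only journals.
    """
    try:
        # Public tokens have the higher rate limit, so try them first.
        return fetch(journal_id, public_token)
    except LoginRequired:
        # Only login-only journals cost a request against the private token.
        return fetch(journal_id, private_token)
```

Under this scheme the lower private rate limit only comes into play for journals that actually require a login, rather than for every journal request.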

mikf closed this as completed Aug 29, 2022