[Kemono] 429 Too Many Requests #5160

Closed
shinji257 opened this issue Feb 4, 2024 · 20 comments

Comments

@shinji257
Contributor

After a short period of time I get a 429 error. It lasts for a bit (I'd say 30 seconds?) and then goes away. I was hoping someone knows a way to mitigate this in the configuration so I can actually complete a run without hitting it.

As of right now, when a 429 is hit for Kemono, it doesn't even slow down. It just keeps going, skipping over artists until it gets to the end of the list.

 shinj@Tinym P:\....\gallery-dl  gallery-dl.exe --cookies-from-browser brave/kemono.su --user-agent "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36" https://kemono.su/favorites -v
[gallery-dl][debug] Version 1.26.7 - Executable
[gallery-dl][debug] Python 3.8.10 - Windows-10-10.0.22635
[gallery-dl][debug] requests 2.31.0 - urllib3 2.1.0
[gallery-dl][debug] Configuration Files ['%APPDATA%\\gallery-dl\\config.json']
[gallery-dl][debug] Starting DownloadJob for 'https://kemono.su/favorites'
[kemonoparty][debug] Using KemonopartyFavoriteExtractor for 'https://kemono.su/favorites'
[cookies][debug] Extracting cookies from C:\Users\shinj\AppData\Local\BraveSoftware\Brave-Browser\User Data\Default\Network\Cookies
[cookies][debug] Found Local State file at 'C:\Users\shinj\AppData\Local\BraveSoftware\Brave-Browser\User Data\Local State'
[cookies][info] Extracted 2 cookies from Brave
[cookies][debug] Cookie version breakdown: {'v10': 2, 'other': 0, 'unencrypted': 0}
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): kemono.su:443
[urllib3.connectionpool][debug] https://kemono.su:443 "GET /api/v1/account/favorites?type=artist HTTP/1.1" 200 None
[kemonoparty][debug] Using KemonopartyUserExtractor for 'https://kemono.su/fanbox/user/10036203'
[kemonoparty][debug] Using cached cookies from ('brave', '', '', '', 'kemono.su')
[urllib3.connectionpool][debug] https://kemono.su:443 "GET /fanbox/user/10036203 HTTP/1.1" 200 10578
[urllib3.connectionpool][debug] https://kemono.su:443 "GET /api/v1/fanbox/user/10036203?o=0 HTTP/1.1" 200 83027
[urllib3.connectionpool][debug] https://kemono.su:443 "GET /fanbox/user/10036203/post/2177931 HTTP/1.1" 200 7205
[urllib3.connectionpool][debug] https://kemono.su:443 "GET /fanbox/user/10036203/dms HTTP/1.1" 200 2751
[kemonoparty][debug] Using download archive 'C:\Users\shinj\AppData\Roaming/gallery-dl/.archives/kemonoparty.sqlite3'
[snipping]
[kemonoparty][debug] Using download archive 'C:\Users\shinj\AppData\Roaming/gallery-dl/.archives/kemonoparty.sqlite3'
[kemonoparty][debug] Active postprocessor modules: [MetadataPP]
[urllib3.connectionpool][debug] Resetting dropped connection: kemono.su
[urllib3.connectionpool][debug] https://kemono.su:443 "GET /fanbox/user/10052645/post/6135366 HTTP/1.1" 200 None
[urllib3.connectionpool][debug] https://kemono.su:443 "GET /fanbox/user/10052645/post/6076249 HTTP/1.1" 429 555
[kemonoparty][error] HttpError: '429 Too Many Requests' for 'https://kemono.su/fanbox/user/10052645/post/6076249'
[kemonoparty][debug] Using KemonopartyUserExtractor for 'https://kemono.su/fanbox/user/10187738'
[kemonoparty][debug] Using cached cookies from ('brave', '', '', '', 'kemono.su')
[urllib3.connectionpool][debug] https://kemono.su:443 "GET /fanbox/user/10187738 HTTP/1.1" 429 555
[kemonoparty][error] HttpError: '429 Too Many Requests' for 'https://kemono.su/fanbox/user/10187738'
[kemonoparty][debug] Using KemonopartyUserExtractor for 'https://kemono.su/fanbox/user/10200427'
[kemonoparty][debug] Using cached cookies from ('brave', '', '', '', 'kemono.su')
[urllib3.connectionpool][debug] https://kemono.su:443 "GET /fanbox/user/10200427 HTTP/1.1" 429 555
[kemonoparty][error] HttpError: '429 Too Many Requests' for 'https://kemono.su/fanbox/user/10200427'
[kemonoparty][debug] Using KemonopartyUserExtractor for 'https://kemono.su/fanbox/user/10387886'
[kemonoparty][debug] Using cached cookies from ('brave', '', '', '', 'kemono.su')
[snipped repeating messages]
@Skaronator

They definitely changed their rate-limiting config (or even activated it in the first place).

Just configure sleep and sleep-request https://github.com/mikf/gallery-dl/blob/master/docs/options.md

I'm playing around and [0, 2] is not enough.

@Skaronator

Skaronator commented Feb 4, 2024

sleep-request: 5, 10 seemed to work fine at first. (It stopped working after 20 minutes.)

Trying 10, 30 seconds now.

@shinji257
Contributor Author

shinji257 commented Feb 4, 2024

I'm actually more hoping for "Hey we hit a rate limit. Wait 30 seconds or longer then try the last request again" vs a relatively static 10-30 second range for sleep-request. I think we currently do something like that with Twitter.
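
The behaviour asked for here, pausing and then repeating the same request after a 429 instead of skipping it, could look roughly like the sketch below. This is not gallery-dl's code: `request_with_backoff`, `fetch`, and the 30-second default are hypothetical names and values chosen only to illustrate the idea.

```python
import time
from typing import Callable


def request_with_backoff(
    fetch: Callable[[], int],
    wait_seconds: float = 30.0,
    max_tries: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call fetch() until it returns something other than 429.

    fetch is a hypothetical callable that performs one HTTP request and
    returns its status code. sleep is injectable so the loop can be
    exercised without actually waiting.
    """
    status = fetch()
    tries = 1
    while status == 429 and tries < max_tries:
        sleep(wait_seconds)  # back off before repeating the same request
        status = fetch()
        tries += 1
    return status
```

With `max_tries` exhausted, the last status (still 429) is returned, so the caller can decide whether to skip the item or abort.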

Tried to set sleep-request to 10, 30 and no dice. Still got it eventually.

@github-userx

@shinji257 @Skaronator My understanding is that, like bunkr.su, they added (or strengthened) their DDoS-Guard protection, which now often requires solving a captcha before you can access the site or download files?

cyberdrop-dl has added cookie extraction for this, I believe.

@shinji257
Contributor Author

Doesn't look like it is that. They just seem to heavily throttle the available connections per client. I get this even during normal browsing from time to time. I think if a captcha is hit, you usually get a 403 error or similar.

@Doofy420

This happens with other sites that have rate limiting as well, like desuarchive. Gdl just skips any posts that give a 429.

@shinji257
Contributor Author

Yes, except it ends up skipping all but maybe the first two artists, since it gets the 429 even when making requests to the API.

@biggestsonicfan

biggestsonicfan commented Mar 11, 2024

I'm trying to resume a kemono grab, but having gallery-dl go through everything up to that point triggers the 429 error.

Can we get a sleep-retry config? The retries seem to be happening instantly...

@Hrxn
Contributor

Hrxn commented Mar 11, 2024

Why wouldn't sleep-request work for you here?

@biggestsonicfan

No idea, I have it set to 10.0 but the retries are less than a second apart.

@mikf
Owner

mikf commented Mar 11, 2024

Sounds like this setting isn't actually being used. Check with -E.

edit: Retries for extractor HTTP requests, i.e. not file downloads, do indeed wait for at least sleep-request seconds. (code)


You can pass kemono's o=… query parameter to gallery-dl to skip that number of posts, although this only worked in multiples of 50 last time I checked.

https://kemono.su/fanbox/user/12345?o=350
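
A small helper for building such a URL might look like the sketch below. `with_offset` and `PAGE_SIZE` are hypothetical names, and rounding down to a multiple of 50 is an assumption based only on the remark above that offsets worked in multiples of 50.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption taken from the comment above: offsets only work in steps of 50.
PAGE_SIZE = 50


def with_offset(url: str, skip: int) -> str:
    """Return url with its o= parameter set to skip, rounded down to a multiple of PAGE_SIZE."""
    offset = (skip // PAGE_SIZE) * PAGE_SIZE
    parts = urlsplit(url)
    # Drop any existing o= value so the new one replaces it.
    query = [(k, v) for k, v in parse_qsl(parts.query) if k != "o"]
    query.append(("o", str(offset)))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

For example, asking to skip 370 posts on the user page above yields the `?o=350` URL shown in the comment.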

@biggestsonicfan

In theory that would work, but there are no "skippable" posts when downloading a discord server...

@mikf
Owner

mikf commented Mar 11, 2024

Discord channels also have an o parameter, although it is currently always set to 0 with no way of changing it (other than changing the code itself).

@biggestsonicfan

Interesting, did not know that. Also, it appears just waiting until the next day managed to reset my "too many tries" for now.

@shinji257
Contributor Author

shinji257 commented Apr 4, 2024

So no way to put a pause between downloads? I think just a few seconds would be enough based on my experience with the site.

@shinji257
Contributor Author

This has been working so far for me.

            "sleep-request": [0.5, 1.5],
            "sleep-extractor": [0.5, 1.5],

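A two-element value such as `[0.5, 1.5]` appears to act as a range from which each delay is picked at random (the later comment in this thread speaks of "0.5-1.5 random seconds"). The sketch below shows one way such a value could be turned into a delay. `pick_delay` and `Duration` are hypothetical names, and the uniform distribution is an assumption, not a statement about gallery-dl's internals.

```python
import random
from typing import Sequence, Union

# A single number of seconds, or a [min, max] pair.
Duration = Union[float, Sequence[float]]


def pick_delay(value: Duration) -> float:
    """Return a delay in seconds for a number or a [min, max] pair."""
    if isinstance(value, (int, float)):
        return float(value)
    low, high = value
    # Assumed behaviour: draw uniformly between the two bounds.
    return random.uniform(low, high)
```

A randomized delay spreads requests out unevenly, which may look less like an automated client than a fixed interval would.
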
@kattjevfel
Contributor

The problem I'm having is that even with a decent delay of 10 seconds, it will occasionally skip over one artist due to a 429, and with hundreds of artists and --abort 5, it will potentially skip hundreds of files.

As a result, I sometimes have to do a "full" run without --abort, which then takes roughly 2 full days with the delays.

I'd really just like for it to be able to wait X minutes instead of skipping when it encounters 429, and maybe then I can finally set a delay below 60 seconds...

@shinji257
Contributor Author

shinji257 commented Apr 5, 2024

We can have it retry 429, but there is no way to set a delay on that. I know that (at least) DeviantArt is set up so that if a 429 is hit and the active sleep is lower than 30 seconds, it adds a second and tries again after that delay. Maybe we can duplicate that here to resolve the issue?
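
The adjustment described for DeviantArt could be sketched as follows. This only restates the rule from the paragraph above (add one second after a 429 while the delay is still under 30 seconds). `next_delay`, `step`, and `ceiling` are hypothetical names, and this is not the extractor's actual code.

```python
def next_delay(current: float, status: int,
               step: float = 1.0, ceiling: float = 30.0) -> float:
    """Return the delay to use before the next attempt.

    After a 429, the delay grows by `step` as long as it is still below
    `ceiling`; any other status leaves it unchanged.
    """
    if status == 429 and current < ceiling:
        return current + step
    return current
```

Starting from a 0.5-second delay, two consecutive 429 responses would raise it to 1.5 and then 2.5 seconds.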

EDIT:
I added "retry-codes": [429], to the list in the kemono section of my config. This should have it retry. I don't know exactly how the code works, but it should (I hope) wait another random 0.5-1.5 seconds before doing so, which hopefully gives the next attempt a chance to succeed. I did notice some 429 errors pop up, but each was followed by a success right after.

@shinji257
Contributor Author

So it has been running for a day or two. It has hit 429 every now and again, but since I added it as a retry code, it seems to be counted toward the application's "5 tries". In the end it manages to get the file thanks to the delays I added. Example logging...

(NSFW Warning)

* .\gallery-dl\kemonoparty\fanbox\14149030\3372601_22_ea340e11-fef5-4073-b30f-9ba354f92dd6.jpg
* .\gallery-dl\kemonoparty\fanbox\14149030\3372601_23_3fe4dc89-9a10-4c54-8f3b-c2373c3388b6.jpg
* .\gallery-dl\kemonoparty\fanbox\14149030\3372601_24_e19f95c2-266c-4484-b3a3-5707169e82df.jpg
[downloader.http][warning] HTTPSConnectionPool(host='c6.kemono.su', port=443): Read timed out. (read timeout=30) (1/5)
[downloader.http][warning] HTTPSConnectionPool(host='c6.kemono.su', port=443): Read timed out. (read timeout=30) (2/5)
[downloader.http][warning] '429 Too Many Requests' for 'https://kemono.su/data/05/f7/05f7c103d97fb33566d84cac8436d2531a522cd6ad1cfbbfcc632286e3501596.jpg' (3/5)
* .\gallery-dl\kemonoparty\fanbox\14149030\3372601_25_c42bd277-2e18-4bad-b658-d795cc0335bc.jpg
* .\gallery-dl\kemonoparty\fanbox\14149030\3372601_26_4ce212f7-037c-47eb-bb60-14dba1f0f827.jpg
* .\gallery-dl\kemonoparty\fanbox\14149030\3372601_27_46969f1f-8b66-4ff7-8a52-319659195104.jpg
* .\gallery-dl\kemonoparty\fanbox\14149030\3372601_28_b25bfb6a-2075-4e5b-99f2-054d7f71f60d.jpg
* .\gallery-dl\kemonoparty\fanbox\14149030\3372601_29_c71eb867-039c-45e4-9aed-084397b76261.jpg
* .\gallery-dl\kemonoparty\fanbox\14149030\3372601_30_e2f0dca8-1e42-469e-8dca-8f90b7fdf11b.jpg
* .\gallery-dl\kemonoparty\fanbox\14149030\3372601_31_f8bccef6-2256-4d78-9db6-66b32b16eebe.jpg
* .\gallery-dl\kemonoparty\fanbox\14149030\3372601_32_ca7f1fa4-a309-4cf2-8432-3a9e3851be08.jpg
[downloader.http][warning] HTTPSConnectionPool(host='c6.kemono.su', port=443): Read timed out. (read timeout=30) (1/5)
[downloader.http][warning] HTTPSConnectionPool(host='c6.kemono.su', port=443): Read timed out. (read timeout=30) (2/5)
[downloader.http][warning] '429 Too Many Requests' for 'https://kemono.su/data/8a/3a/8a3a7d8b5c24eb59ac4f92540a92ab0f3223eba2c651e8734f650f2f9f6b9f6b.jpg' (3/5)
* .\gallery-dl\kemonoparty\fanbox\14149030\3372601_33_8f91936f-ec7b-4c5c-8bc0-2566bd1639a5.jpg
* .\gallery-dl\kemonoparty\fanbox\14149030\3372601_34_93df6664-0bc6-4ea0-b953-5e4b67bd0d60.jpg
* .\gallery-dl\kemonoparty\fanbox\14149030\3372601_35_5070e6dd-cbd4-4f85-a6c2-0c45d99f8220.jpg
* .\gallery-dl\kemonoparty\fanbox\14149030\3372601_36_ff85ad54-b5d5-4592-b8e9-d8ab946d7745.jpg

mikf added a commit that referenced this issue Apr 16, 2024
@mikf
Owner

mikf commented Apr 16, 2024

Added a sleep-429 option that lets you set a custom sleep time for 429 Too Many Requests responses (defaults to 60 seconds for now): 566472f
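
Putting the options from this thread together, a config fragment could be generated as in the sketch below. The option names (`sleep-request`, `sleep-extractor`, `retry-codes`, `sleep-429`) come from the comments above; placing them under `extractor` / `kemonoparty` is an assumption based on the category name in the debug log, and the values are illustrative rather than recommended. `sleep-429` would only be recognized by versions that include commit 566472f.

```python
import json

# Illustrative values only; tune them to what the site tolerates.
config = {
    "extractor": {
        "kemonoparty": {
            "sleep-request": [0.5, 1.5],    # delay range between HTTP requests
            "sleep-extractor": [0.5, 1.5],  # delay range before each extractor starts
            "retry-codes": [429],           # treat 429 as retryable
            "sleep-429": 60,                # seconds to wait after a 429
        }
    }
}


def render(cfg: dict) -> str:
    """Serialize the fragment as JSON text for merging into config.json."""
    return json.dumps(cfg, indent=4)


print(render(config))
```

The printed JSON is meant to be merged by hand into an existing config.json. How `retry-codes` and `sleep-429` interact is not stated in this thread.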

JackTildeD added a commit to JackTildeD/gallery-dl-forked that referenced this issue Apr 24, 2024
* save cookies to tempfile, then rename

avoids wiping the cookies file if the disk is full

* [deviantart:stash] fix 'index' metadata (mikf#5335)

* [deviantart:stash] recognize 'deviantart.com/stash/…' URLs

* [gofile] fix extraction

* [kemonoparty] add 'revision_count' metadata field (mikf#5334)

* [kemonoparty] add 'order-revisions' option (mikf#5334)

* Fix imagefap extrcator

* [twitter] add 'birdwatch' metadata field (mikf#5317)

should probably get a better name,
but this is what it's called internally by Twitter

* [hiperdex] update URL patterns & fix 'manga' metadata (mikf#5340)

* [flickr] add 'contexts' option (mikf#5324)

* [tests] show full path for nested values

'user.name' instead of just 'name' when testing for
"user": { … , "name": "…", … }

* [bluesky] add 'instance' metadata field (mikf#4438)

* [vipergirls] add 'like' option (mikf#4166)

* [vipergirls] add 'domain' option (mikf#4166)

* [gelbooru] detect returned favorites order (mikf#5220)

* [gelbooru] add 'date_favorited' metadata field

* Update fapello.py

get fullsize image instead resized

* fapello.py Fullsize image

by remove ".md" and ".th" in image url, it will download fullsize of images

* [formatter] fix local DST datetime offsets for ':O'

'O' would get the *current* local UTC offset and apply it to all
'datetime' objects it gets applied to.
This would result in a wrong offset if the current offset includes
DST and the target 'datetime' does not or vice-versa.

'O' now determines the correct local UTC offset while respecting DST for
each individual 'datetime'.

* [subscribestar] fix 'date' metadata

* [idolcomplex] support new pool URLs

* [idolcomplex] fix metadata extraction

- replace legacy 'id' vales with alphanumeric ones, since the former are
  no longer available
- approximate 'vote_average', since the real value is no longer
  available
- fix 'vote_count'

* [bunkr] remove 'description' metadata

album descriptions are no longer available on album pages
and the previous code erroneously returned just '0'

* [deviantart] improve 'index' extraction for stash files (mikf#5335)

* [kemonoparty] fix exception for '/revision/' URLs

caused by 03a9ce9

* [steamgriddb] raise proper exception for deleted assets

* [tests] update extractor results

* [pornhub:gif] extract 'viewkey' and 'timestamp' metadata (mikf#4463)

mikf#4463 (comment)

* [tests] use 'datetime.timezone.utc' instead of 'datetime.UTC'

'datetime.UTC' was added in Python 3.11
and is not defined in older versions.

* [gelbooru] add 'order-posts' option for favorites (mikf#5220)

* [deviantart] handle CloudFront blocks in general (mikf#5363)

This was already done for non-OAuth requests (mikf#655)
but CF is now blocking OAuth API requests as well.

* release version 1.26.9

* [kemonoparty] fix KeyError for empty files (mikf#5368)

* [twitter] fix pattern for single tweet (mikf#5371)

- Add optional slash
- Update tests to include some non-standard tweet URLs

* [kemonoparty:favorite] support 'sort' and 'order' query params (mikf#5375)

* [kemonoparty] add 'announcements' option (mikf#5262)

mikf#5262 (comment)

* [wikimedia] suppress exception for entries without 'imageinfo' (mikf#5384)

* [docs] update defaults of 'sleep-request', 'browser', 'tls12'

* [docs] complete Authentication info in supportedsites.md

* [twitter] prevent crash when extracting 'birdwatch' metadata (mikf#5403)

* [workflows] build complete docs Pages only on gdl-org/docs

deploy only docs/oauth-redirect.html on mikf.github.io/gallery-dl

* [docs] document 'actions' (mikf#4543)

or at least attempt to

* store 'match' and 'groups' in Extractor objects

* [foolfuuka] improve 'board' pattern & support pages (mikf#5408)

* [reddit] support comment embeds (mikf#5366)

* [build] add minimal pyproject.toml

* [build] generate sdist and wheel packages using 'build' module

* [build] include only the latest CHANGELOG entries

The CHANGELOG is now at a size where it takes up roughly 50kB or 10% of
an sdist or wheel package.

* [oauth] use Extractor.request() for HTTP requests (mikf#5433)

Enables using proxies and general network options.

* [kemonoparty] fix crash on posts with missing datetime info (mikf#5422)

* restore LD_LIBRARY_PATH for PyInstaller builds (mikf#5421)

* remove 'contextlib' imports

* [pp:ugoira] log errors for general exceptions

* [twitter] match '/photo/' Tweet URLs (mikf#5443)

fixes regression introduced in 40c0553

* [pp:mtime] do not overwrite '_mtime' for None values (mikf#5439)

* [wikimedia] fix exception for files with empty 'metadata'

* [wikimedia] support wiki.gg wikis

* [pixiv:novel] add 'covers' option (mikf#5373)

* [tapas] add 'creator' extractor (mikf#5306)

* [twitter] implement 'relogin' option (mikf#5445)

* [docs] update docs/configuration links (mikf#5059, mikf#5369, mikf#5423)

* [docs] replace AnchorJS with custom script

use it in rendered .rst documents as well as in .md ones

* [text] catch general Exceptions

* compute tempfile path only once

* Add warnings flag

This commit adds a warnings flag

It can be combined with -q / --quiet to display warnings.
The intent is to provide a silent option that still surfaces
warning and error messages so that they are visible in logs.

* re-order verbose and warning options

* [gelbooru] improve pagination logic for meta tags (mikf#5478)

similar to 494acab

* [common] add Extractor.input() method

* [twitter] improve username & password login procedure (mikf#5445)

- handle more subtasks
- support 2FA
- support email verification codes

* [common] update Extractor.wait() message format

* [common] simplify 'status_code' check in Extractor.request()

* [common] add 'sleep-429' option (mikf#5160)

* [common] fix NameError in Extractor.request()

… when accessing 'code' after an requests exception was raised.

Caused by the changes in 566472f

* [common] show full URL in Extractor.request() error messages

* [hotleak] download files with 404 status code (mikf#5395)

* [pixiv] change 'sanity_level' debug message to a warning (mikf#5180)

* [twitter] handle missing 'expanded_url' fields (mikf#5463, mikf#5490)

* [tests] allow filtering extractor result tests by URL or comment

python test_results.py twitter:+/i/web/
python test_results.py twitter:~twitpic

* [exhentai] detect CAPTCHAs during login (mikf#5492)

* [output] extend 'output.colors' (mikf#2566)

allow specifying ANSI colors for all loglevels
(debug, info, warning, error)

* [output] enable colors by default

* add '--no-colors' command-line option

---------

Co-authored-by: Luc Ritchie
Co-authored-by: Mike Fährmann
Co-authored-by: Herp
Co-authored-by: wankio
Co-authored-by: fireattack
Co-authored-by: Aidan Harris
@mikf mikf closed this as completed Jun 1, 2024