Two bug fixes #129

Open · wants to merge 27 commits into base: main

Commits (27)
- `1d85518` delete part file when file not match to server before retry (Ovear, Aug 31, 2022)
- `8eecbb8` overwrite part file directly when not resume (Ovear, Aug 31, 2022)
- `711ed9c` Fix #137 #139 (Ovear, Sep 29, 2022)
- `de56804` Introduce api unittest (Ovear, Sep 29, 2022)
- `b2e1531` Feat: Check file hash before skip (Ovear, Sep 29, 2022)
- `ea01064` switch to utf8 when write file to avoid BOM (Ovear, Sep 29, 2022)
- `973af5e` fix: attachments download (Ovear, Sep 30, 2022)
- `00c71e2` feat: keep file with 416 code and no hash file to new location (Ovear, Sep 30, 2022)
- `99de1f8` bugfix: fix date format when compare (Ovear, Oct 5, 2022)
- `1f77d6e` Add option to skip local hash check (Ovear, Oct 17, 2022)
- `520d99a` fix readme (Ovear, Jan 4, 2023)
- `1c96211` Fix inline image location of content html (Ovear, Jan 4, 2023)
- `26d9a0e` introduce adaptive chunk size since different chunk sizes are applied (Ovear, Jan 5, 2023)
- `dd4fd69` Change debug.log to utf8 (Ovear, Jan 15, 2023)
- `97001ec` Update extenstion to get cookies.txt (Ovear, Mar 30, 2023)
- `54d8a66` handle img tag properly when extract inline images (Ovear, Jul 10, 2023)
- `634bf2f` feat: support new domains of kemono and commer. improved cookie file … (Ovear, Jul 10, 2023)
- `59d5d36` feat: update to latest api (Ovear, Apr 27, 2024)
- `617e72f` fix: minor bug fix when download icon/banner (Ovear, May 1, 2024)
- `bc27c63` fix: adadpt new api to download single post (Ovear, May 1, 2024)
- `877dbae` feat: introduces proxy agent for scenarios not suitable for standard … (Ovear, May 1, 2024)
- `c42a657` fix: try to fix #14 hope it won't break anything (Ovear, May 1, 2024)
- `e107671` feat: handle http 429 globally (Ovear, May 1, 2024)
- `1e3891a` feat: implement global request interval limit to avoid 429 errors (Ovear, May 1, 2024)
- `6628a6e` fix: offset fixed to 50 according to new api document (Ovear, May 2, 2024)
- `a9de53e` feat: show progress bar once download started (Ovear, May 2, 2024)
- `738c84a` fix: enhance attachments download compatibility and clarify debug logs (Ovear, May 3, 2024)
17 changes: 12 additions & 5 deletions README.md
@@ -7,7 +7,7 @@ A downloader tool for kemono.party and coomer.party.
3. Then install requirements with `pip install -r requirements.txt`
- If the command doesn't run try adding `python -m`, `python3 -m`, or `py -m` to the front
4. Get a cookie.txt file from kemono.party/coomer.party
- - You can get a cookie text file on [Chrome](https://chrome.google.com/webstore/detail/get-cookiestxt/bgaddhkoddajcdgocldbbfleckgcbcid?hl=en) with this extension.
+ - You can get a cookie text file on [Chrome](https://chrome.google.com/webstore/detail/get-cookiestxt-locally/cclelndahbckbenkjhflpdbgdldlbecc) or [Firefox](https://addons.mozilla.org/firefox/addon/cookies-txt/) with this extension.
- A cookie.txt file is required to use downloader!
5. Run `python kemono-dl.py --cookies "cookie.txt" --links https://kemono.party/SERVICE/user/USERID`
- If the script doesn't run try replacing `python` with `python3` or `py`
@@ -26,13 +26,13 @@ Takes in a url or list of urls separated by a comma.
`--from-file FILE`
Reads in a file with urls separated by new lines. Lines starting with # will not be read in.
`--kemono-fav-users SERVICE`
- Downloads favorite users from kemono.party of specified type or types separated by a comma. Types include: all, patreon, fanbox, gumroad, subscribestar, dlsite, fantia. Your cookie file must have been gotten while logged in to work.
+ Downloads favorite users from kemono.party/su of specified type or types separated by a comma. Types include: all, patreon, fanbox, gumroad, subscribestar, dlsite, fantia. Your cookie file must have been gotten while logged in to work.
`--coomer-fav-users SERVICE`
- Downloads favorite users from coomer.party of specified type or types separated by a comma. Types include: all, onlyfans. Your cookie file must have been gotten while logged in to work.
+ Downloads favorite users from coomer.party/su of specified type or types separated by a comma. Types include: all, onlyfans. Your cookie file must have been gotten while logged in to work.
`--kemono-fav-posts`
- Downloads favorite posts from kemono.party. Your cookie file must have been gotten while logged in to work.
+ Downloads favorite posts from kemono.party/su. Your cookie file must have been gotten while logged in to work.
`--coomer-fav-posts`
- Downloads favorite posts from coomer.party. Your cookie file must have been gotten while logged in to work.
+ Downloads favorite posts from coomer.party/su. Your cookie file must have been gotten while logged in to work.

## What files to download

@@ -56,6 +56,8 @@ Download the users profile banner. Only works when a user url is passed.
Try to download the post embed with yt-dlp.
`--skip-attachments`
Do not download post attachments.
`--skip-local-hash`
Do not check hash for downloaded local files.
`--overwrite`
Overwrite any previously created files.

@@ -121,6 +123,11 @@ The time in seconds to wait between downloading posts. (default: 0)
The amount of times to retry / resume downloading a file. (default: 5)
`--ratelimit-sleep SEC`
The time in seconds to wait after being ratelimited (default: 120)
`--ratelimit-ms MS`
The minimum time in milliseconds between requests (default: 300)

`--proxy-agent https://agent/proxy`
Proxy agent URL. This is NOT a standard HTTP/HTTPS proxy: each request URL is passed to the agent through the 'u' query parameter. Disabled by default. When enabled, kemono and coomer cannot be downloaded in the same run.
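The rewrite applied under `--proxy-agent` can be sketched in a few lines; `wrap_with_agent` below is an illustrative helper, not a function in this repository:

```python
from urllib.parse import urlparse, urlencode, parse_qs, urlunparse

def wrap_with_agent(agent_url, target_url):
    # Append the original request URL as the 'u' query parameter of the
    # agent URL, mirroring how each request is rewritten by --proxy-agent.
    u = urlparse(agent_url)
    q = parse_qs(u.query)
    q['u'] = target_url
    return urlunparse(u._replace(query=urlencode(q, doseq=True)))

print(wrap_with_agent('https://example.com/agent', 'https://kemono.su/api/creators/'))
# → https://example.com/agent?u=https%3A%2F%2Fkemono.su%2Fapi%2Fcreators%2F
```

The agent is then expected to fetch whatever URL it receives in `u` and relay the response.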

# Notes
- Excepted link formats:
78 changes: 78 additions & 0 deletions src/api_test.py
@@ -0,0 +1,78 @@
import unittest
import requests
from numbers import Number
from requests.adapters import HTTPAdapter, Retry

class ApiTest(unittest.TestCase):
    site = ''
    timeout = 5
    headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36'}

    def getSession(self):
        retries = Retry(
            total = 3,
            backoff_factor = 0.1,
            status_forcelist = [ 500, 502, 503, 504 ]
        )
        session = requests.Session()
        session.mount('https://', HTTPAdapter(max_retries=retries))
        session.mount('http://', HTTPAdapter(max_retries=retries))
        return session

    def callApi(self, url):
        print(f'Request {url}')
        response = self.getSession().get(url=url, headers=self.headers, timeout=self.timeout)
        if response.status_code == 401:
            raise Exception(f'HTTP 401 bad cookie | {response.status_code} {response.reason}')
        elif not response.ok:
            raise Exception(f'HTTP not ok | {response.status_code} {response.reason}')
        print(f'Response {response.text}')
        return response.json()

class KemonoApiTest(ApiTest, unittest.TestCase):
    site = 'kemono.party'
    patreonUser = '35150295'
    patreonPost = '65210116'

    def test_creators(self):
        print('Start test for creators')
        creators = self.callApi(url=f'https://{self.site}/api/creators/')
        self.assertGreaterEqual(len(creators), 1, 'creators can not be empty')
        creator = creators[0]
        self.assertTrue(isinstance(creator['favorited'], Number), 'favorited must be number')
        self.assertTrue(isinstance(creator['indexed'], Number), 'indexed must be number')
        self.assertTrue(isinstance(creator['updated'], Number), 'updated must be number')
        self.assertTrue(isinstance(creator['id'], str), 'id must be str')
        self.assertTrue(isinstance(creator['name'], str), 'name must be str')
        self.assertTrue(isinstance(creator['service'], str), 'service must be str')

    def test_patreon_post(self):
        print('Start test for Patreon post api')
        post = self.callApi(url=f"https://{self.site}/api/patreon/user/{self.patreonUser}/post/{self.patreonPost}")
        self.assertEqual(len(post), 1, 'Post list must equal to 1')
        post = post[0]
        self.assertEqual(post['added'], 'Thu, 28 Apr 2022 03:16:21 GMT', 'added not equal')
        self.assertEqual(post['attachments'], [{'name': 'Nelves_Moonwell_Final.jpg',
                         'path': '/59/ca/59ca91127d30cd44c85a8fd71a7a560b74c4eb7e0a2873065057fe20f7e3c5b8.jpg'}],
                         'attachment not equal')
        self.assertEqual(post['content'], "<p>Made a quick render of night elves \"bathing\" \
in a moonwell while waiting for simulations on the Miss Fortune animation. I'm in the polishing stage of the Miss \
Fortune animation and will be hiring a VA soon as well as beginning the render!</p><p>Hope you all enjoy this scene :)</p>",
                         'content not equal')
        self.assertEqual(post['edited'], 'Sat, 16 Apr 2022 14:03:04 GMT', 'edited not equal')
        self.assertEqual(post['embed'], {}, 'embed not equal')
        self.assertEqual(post['file'], {'name': 'Nelves_Moonwell_Final.jpg',
                         'path': '/59/ca/59ca91127d30cd44c85a8fd71a7a560b74c4eb7e0a2873065057fe20f7e3c5b8.jpg'},
                         'file not equal')
        self.assertEqual(post['id'], '65210116', 'post id not equal')
        self.assertEqual(post['published'], 'Sat, 16 Apr 2022 14:03:04 GMT', 'published not equal')
        self.assertEqual(post['title'], 'Moonwell Bathing and Miss Fortune Update', 'title not equal')
        self.assertEqual(post['user'], self.patreonUser, 'user must be same')
        self.assertEqual(post['service'], 'patreon', 'service must be patreon')
        self.assertFalse(post['shared_file'], 'shared file must be false in this post')


class CoomerApiTest(ApiTest, unittest.TestCase):
    site = 'coomer.party'


if __name__ == '__main__':
    unittest.main()
64 changes: 58 additions & 6 deletions src/args.py
@@ -3,6 +3,7 @@
import re
import argparse
from http.cookiejar import MozillaCookieJar, LoadError
from urllib.parse import urlparse, urlunparse

from .version import __version__

@@ -26,19 +27,19 @@ def get_args():

    ap.add_argument("--kemono-fav-users",
        metavar="SERVICE", type=str, default=None,
-        help="Downloads favorite users from kemono.party of specified type or types separated by a comma. Types include: all, patreon, fanbox, gumroad, subscribestar, dlsite, fantia. Your cookie file must have been gotten while logged in to work.")
+        help="Downloads favorite users from kemono.party/su of specified type or types separated by a comma. Types include: all, patreon, fanbox, gumroad, subscribestar, dlsite, fantia. Your cookie file must have been gotten while logged in to work.")

    ap.add_argument("--coomer-fav-users",
        metavar="SERVICE", type=str, default=None,
-        help="Downloads favorite users from coomer.party of specified type or types separated by a comma. Types include: all, onlyfans. Your cookie file must have been gotten while logged in to work.")
+        help="Downloads favorite users from coomer.party/su of specified type or types separated by a comma. Types include: all, onlyfans. Your cookie file must have been gotten while logged in to work.")

    ap.add_argument("--kemono-fav-posts",
        action='store_true', default=False,
-        help="Downloads favorite posts from kemono.party. Your cookie file must have been gotten while logged in to work.")
+        help="Downloads favorite posts from kemono.party/su. Your cookie file must have been gotten while logged in to work.")

    ap.add_argument("--coomer-fav-posts",
        action='store_true', default=False,
-        help="Downloads favorite posts from coomer.party. Your cookie file must have been gotten while logged in to work.")
+        help="Downloads favorite posts from coomer.party/su. Your cookie file must have been gotten while logged in to work.")



Expand Down Expand Up @@ -82,6 +83,10 @@ def get_args():
        action='store_true', default=False,
        help="Do not download post attachments.")

    ap.add_argument("--skip-local-hash",
        action='store_true', default=False,
        help="Do not check hash for downloaded local files.")

    ap.add_argument("--overwrite",
        action='store_true', default=False,
        help="Overwrite any previously created files.")
@@ -196,28 +201,58 @@ def get_args():
        metavar="SEC", type=int, default=120,
        help="The time in seconds to wait after being ratelimited (default: 120)")

    ap.add_argument("--ratelimit-ms",
        metavar="MS", type=int, default=300,
        help="The minimum time in milliseconds between requests (default: 300)")

    ap.add_argument("--user-agent",
        metavar="UA", type=str, default='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36',
        help="Set a custom user agent")

    ap.add_argument("--proxy-agent",
        metavar="https://agent/proxy", type=str, default=None,
        help="Proxy agent URL. This is NOT a standard HTTP/HTTPS proxy: each request URL is passed to the agent through the 'u' query parameter. "
             "Disabled by default. When enabled, kemono and coomer cannot be downloaded in the same run.")

    args = vars(ap.parse_args())
    args['cookie_domains'] = {'kemono': None, 'coomer': None}

    # takes a comma-separated list of cookie files and loads them into a cookie jar
    if args['cookies']:
        cookie_files = [s.strip() for s in args["cookies"].split(",")]
        args['cookies'] = MozillaCookieJar()
        loaded_cookies = MozillaCookieJar()
        loaded = 0
        for cookie_file in cookie_files:
            try:
-                args['cookies'].load(cookie_file)
+                loaded_cookies.load(cookie_file)
+                loaded += 1
            except LoadError:
                print(F"Unable to load cookie {cookie_file}")
            except FileNotFoundError:
                print(F"Unable to find cookie {cookie_file}")
        if loaded == 0:
            print("No cookies loaded | exiting"), exit()

        # make sure cookies are wildcard for better compatibility
        for cookie in loaded_cookies:
            args['cookie_domains']['kemono'] = args['cookie_domains']['kemono'] or (
                match := re.search(r'^(?:www)?\.?(kemono\.(?:party|su))$', cookie.domain)) and match.group(1)
            args['cookie_domains']['coomer'] = args['cookie_domains']['coomer'] or (
                match := re.search(r'^(?:www)?\.?(coomer\.(?:party|su))$', cookie.domain)) and match.group(1)

            if cookie.domain.startswith('www.'):
                cookie.domain = cookie.domain[3:]
                cookie.domain_specified = True
                cookie.domain_initial_dot = True
            elif not cookie.domain.startswith('.'):
                cookie.domain = f'.{cookie.domain}'
                cookie.domain_specified = True
                cookie.domain_initial_dot = True
            args['cookies'].set_cookie(cookie)

    if (not args['cookie_domains']['kemono'] and (args['kemono_fav_users'] or args['kemono_fav_posts'])) or (
            not args['cookie_domains']['coomer'] and (args['coomer_fav_users'] or args['coomer_fav_posts'])):
        print(f"Bad cookie file | Unable to detect domain when downloading favorites"), exit()
    # takes a comma-separated string of links and converts them to a list
    if args['links']:
        args['links'] = [s.strip().split('?')[0] for s in args["links"].split(",")]
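The cookie-domain normalization a few lines above can be distilled into a standalone sketch (`wildcard_domain` is an illustrative helper, not part of this PR):

```python
def wildcard_domain(domain):
    # Normalize a cookie domain so it matches every subdomain:
    # 'www.kemono.su' -> '.kemono.su', 'coomer.party' -> '.coomer.party'.
    if domain.startswith('www.'):
        domain = domain[3:]  # drop 'www' and keep the following dot
    elif not domain.startswith('.'):
        domain = f'.{domain}'
    return domain

print(wildcard_domain('www.kemono.su'))   # → .kemono.su
print(wildcard_domain('coomer.party'))    # → .coomer.party
print(wildcard_domain('.kemono.party'))   # → .kemono.party (already wildcard)
```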
@@ -306,4 +341,21 @@ def check_size(args, key):
            print(f"--coomer-fav-users no valid options were passed")
        args['coomer_fav_users'] = temp

    if args['proxy_agent']:
        u = urlparse(args['proxy_agent'])
        if not u.netloc or not u.path:
            print(f"Bad proxy agent URL | URL should be something like https://example.com/agent"), exit()
        if not u.scheme:
            # ParseResult is immutable, so build a new one instead of assigning
            u = u._replace(scheme='http')
        args['proxy_agent'] = urlunparse(u)

        # we should change cookie domain to proxy agent
        new_cookies = MozillaCookieJar()
        for cookie in args['cookies']:
            cookie.domain = f'.{u.netloc}'
            cookie.domain_specified = True
            cookie.domain_initial_dot = True
            new_cookies.set_cookie(cookie)
        args['cookies'] = new_cookies

    return args
72 changes: 67 additions & 5 deletions src/helper.py
@@ -2,20 +2,29 @@
import hashlib
import os
import time
import requests
from urllib.parse import urlparse, urlencode, parse_qs, urlunparse

from .args import get_args
from .logger import logger

running_args = get_args()

def parse_url(url):
    # parse urls
-    downloadable = re.search(r'^https://(kemono\.party|coomer\.party)/([^/]+)/user/([^/]+)($|/post/([^/]+)$)',url)
+    downloadable = re.search(r'^https://((?:kemono|coomer)\.(?:party|su))/([^/]+)/user/([^/]+)($|/post/([^/]+)$)',url)
    if not downloadable:
        return None
    return downloadable.group(1)

# create path from template pattern
def compile_post_path(post_variables, template, ascii):
    drive, tail = os.path.splitdrive(template)
-    tail = tail[1:] if tail[0] in {'/','\\'} else tail
+    tail_trimmed = tail[0] in {'/','\\'}
+    tail = tail[1:] if tail_trimmed else tail
    tail_split = re.split(r'\\|/', tail)
-    cleaned_path = drive + os.path.sep if drive else ''
+    cleaned_path = (drive + os.path.sep if drive else
+                    (os.path.sep if tail_trimmed else ''))
    for folder in tail_split:
        if ascii:
            cleaned_path = os.path.join(cleaned_path, restrict_ascii(clean_folder_name(folder.format(**post_variables))))
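A minimal sketch of the fix above: `compile_path` stands in for `compile_post_path` with the `clean_folder_name`/`restrict_ascii` sanitization omitted, so only the absolute-template handling is shown:

```python
import os
import re

def compile_path(template, variables):
    # Remember whether the template began with a separator so an
    # absolute template still produces an absolute path after re-joining.
    drive, tail = os.path.splitdrive(template)
    tail_trimmed = tail[:1] in ('/', '\\')
    tail = tail[1:] if tail_trimmed else tail
    parts = re.split(r'[\\/]', tail)
    path = drive + os.path.sep if drive else (os.path.sep if tail_trimmed else '')
    for folder in parts:
        path = os.path.join(path, folder.format(**variables))
    return path

print(compile_path('/downloads/{service}/{user}', {'service': 'patreon', 'user': '123'}))
# on POSIX → /downloads/patreon/123
```

Without the `tail_trimmed` bookkeeping the leading separator is lost and an absolute template silently becomes a relative path.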
@@ -87,7 +96,7 @@ def print_download_bar(total:int, downloaded:int, resumed:int, start):

    rate = (downloaded-resumed)/time_diff

-    eta = time.strftime("%H:%M:%S", time.gmtime((total-downloaded) / rate))
+    eta = time.strftime("%H:%M:%S", time.gmtime((total-downloaded) / rate)) if rate else '99:99:99'

    if rate/2**10 < 100:
        rate = (round(rate/2**10, 1), 'KB')
@@ -141,4 +150,57 @@ def print_download_bar(total:int, downloaded:int, resumed:int, start):
# latest_version = datetime.datetime.strptime(latest_tag, r'%Y.%m.%d.%H')
# if current_version < latest_version:
# logger.debug(f"Using kemono-dl {__version__} while latest release is kemono-dl {latest_tag}")
# logger.warning(f"A newer version of kemono-dl is available. Please update to the latest release at https://github.com/AplhaSlayer1964/kemono-dl/releases/latest")


# doesn't support multithreading
def function_rate_limit(func):
    last_call_times = {}

    def wrapper(*args, **kwargs):
        nonlocal last_call_times
        func_name = func.__name__
        t = time.time()
        last_call_time = last_call_times.get(func_name, 0)
        if (t - last_call_time) * 1000 < running_args['ratelimit_ms']:
            time.sleep(running_args['ratelimit_ms'] / 1000 - (t - last_call_time))
        last_call_times[func_name] = time.time()
        return func(*args, **kwargs)

    return wrapper
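A self-contained demo of the decorator above, with the CLI value replaced by a local `RATELIMIT_MS` constant (an assumption for the sketch; the real code reads `running_args['ratelimit_ms']`):

```python
import time

RATELIMIT_MS = 50  # stand-in for running_args['ratelimit_ms']

def function_rate_limit(func):
    # Per-function-name timestamp table; not thread-safe, like the original.
    last_call_times = {}
    def wrapper(*args, **kwargs):
        t = time.time()
        last = last_call_times.get(func.__name__, 0)
        if (t - last) * 1000 < RATELIMIT_MS:
            time.sleep(RATELIMIT_MS / 1000 - (t - last))
        last_call_times[func.__name__] = time.time()
        return func(*args, **kwargs)
    return wrapper

@function_rate_limit
def fetch():
    return time.time()

start = time.time()
stamps = [fetch() for _ in range(3)]
elapsed = time.time() - start
print(f'{elapsed:.2f}s')  # at least ~0.10s: two enforced 50 ms gaps
```

The first call goes through immediately; each later call sleeps just long enough to keep at least `RATELIMIT_MS` between invocations.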

class RefererSession(requests.Session):
    def __init__(self, *args, **kwargs):
        self.proxy_agent = kwargs.pop('proxy_agent', None)
        self.max_retries_429 = kwargs.pop('max_retries_429', 3)
        self.sleep_429 = kwargs.pop('sleep_429', 120)

        super().__init__(*args, **kwargs)

    def rebuild_auth(self, prepared_request, response):
        super().rebuild_auth(prepared_request, response)
        u = urlparse(response.url)
        prepared_request.headers["Referer"] = f'{u.scheme}://{u.netloc}/'

    @function_rate_limit
    def get(self, url, **kwargs):
        old_url = url
        retry_429 = kwargs.pop('retry_429', True)
        max_retries_429 = kwargs.pop('max_retries_429', self.max_retries_429)

        if self.proxy_agent:
            u = urlparse(self.proxy_agent)
            q_params = parse_qs(u.query)
            q_params['u'] = url
            # doseq=True is needed because parse_qs returns list values
            u = u._replace(query=urlencode(q_params, doseq=True))
            url = urlunparse(u)

        resp = super().get(url, **kwargs)
        max_retries_429 -= 1
        if resp.status_code != 429 or not retry_429 or max_retries_429 < 1:
            return resp

        # need retry
        logger.warning(f"Failed to access: {url if self.proxy_agent else old_url} | {resp.status_code} Too Many Requests | Sleeping for {self.sleep_429} seconds")
        time.sleep(self.sleep_429)
        return self.get(old_url, retry_429=retry_429, max_retries_429=max_retries_429, **kwargs)
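The retry budget in `get()` can be exercised without touching the network; `FakeResp` and `get_with_429_retry` below are illustrative stand-ins that mirror the decrement-then-check logic above:

```python
import time

class FakeResp:
    def __init__(self, status_code):
        self.status_code = status_code

def get_with_429_retry(fetch, max_retries_429=3, sleep_429=0):
    # Mirror RefererSession.get: decrement the budget after each attempt
    # and stop retrying once it drops below 1 or the status is not 429.
    while True:
        resp = fetch()
        max_retries_429 -= 1
        if resp.status_code != 429 or max_retries_429 < 1:
            return resp
        time.sleep(sleep_429)

responses = iter([FakeResp(429), FakeResp(429), FakeResp(200)])
final = get_with_429_retry(lambda: next(responses))
print(final.status_code)  # → 200

# With nothing but 429s, the budget of 3 caps the total attempts at three.
responses = iter([FakeResp(429)] * 5)
capped = get_with_429_retry(lambda: next(responses))
print(capped.status_code)  # → 429
```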
2 changes: 1 addition & 1 deletion src/logger.py
@@ -23,7 +23,7 @@
file_format = logging.Formatter('%(asctime)s:%(levelname)s:%(message)s')
stream_format = logging.Formatter('%(levelname)s:%(message)s')

-file_handler = logging.FileHandler('debug.log', encoding="utf-16")
+file_handler = logging.FileHandler('debug.log', encoding="utf-8")
file_handler.setFormatter(file_format)

stream_handler = logging.StreamHandler()