
The HTTP chunk size is too small, causing a bottleneck in download speeds #3143

Closed

trodiz opened this issue Nov 2, 2022 · 7 comments

trodiz commented Nov 2, 2022

The variable in question:

self.chunk_size = 16384

When you set a rate limit, e.g. -r 3M, instead of getting a download rate close to the specified value you get a much lower one. In my case it was throttled to about 1 MiB/s. The very small chunk_size creates overhead on each iteration of the for loop in the _receive_rate function. I'd recommend the much more commonly used chunk size of 1,048,576 bytes (1 MiB).
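
For reference, the rate-limited download loop is roughly of this shape (a simplified sketch, not gallery-dl's exact code): every chunk pays a fixed per-iteration cost (timer reads, arithmetic, a sleep call), so shrinking the chunk size multiplies that overhead.

import time

def receive_rate(response, fp, rate, chunk_size=16384):
    # Simplified rate-limited copy loop: each iteration handles one
    # chunk, so the fixed bookkeeping cost is paid size/chunk_size times.
    t1 = time.time()
    for chunk in response.iter_content(chunk_size):
        fp.write(chunk)
        t2 = time.time()
        expected = len(chunk) / rate   # seconds this chunk should take
        elapsed = t2 - t1              # seconds it actually took
        if elapsed < expected:
            time.sleep(expected - elapsed)
            t2 = time.time()
        t1 = t2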

mikf (Owner) commented Nov 2, 2022

Are you sure it's not the download server that's limiting your download speed? I just tested a bit with large Danbooru files and -r/--limit-rate works as expected there.

The chunk size is rather small and there should definitely be an option to configure it, but I don't think that is what's primarily limiting your speeds.

Hrxn (Contributor) commented Nov 2, 2022

Can't say that I've ever noticed gallery-dl bottlenecking my download rate.

The site/server is usually the limiting factor (or a proxy, if one is used); on many occasions gallery-dl has saturated my download bandwidth just fine.

trodiz (Author) commented Nov 2, 2022

Thanks for the reply, folks. I'm having this issue on all sites I've tested, such as Reddit, Imgur, Gfycat, etc. If it's allowed, I can share a link here.

Changing the chunk_size in the code does fix the issue for me. Also, without the rate limit (no -r flag) everything downloads at full speed, as expected; in that case the receive function is called, which also uses the default chunk_size. So it must be the hundreds of iterations inside the _receive_rate function that cause the slowdown: each pass does a few math operations and an I/O call to write the progress bar to the console. There's a good chance all of this is only noticeable because my computer is very old and slow.
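
To put "hundreds of iterations" into numbers, a quick back-of-the-envelope check (assuming a 16 MiB file):

file_size = 16 * 2**20                 # 16 MiB

for chunk_size in (16384, 1048576):    # 16 KiB vs. 1 MiB
    loops = file_size // chunk_size
    print(f"chunk_size={chunk_size:>7} -> {loops:>4} iterations")

# chunk_size=  16384 -> 1024 iterations
# chunk_size=1048576 ->   16 iterations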

PS: I'm using the latest version of the library (gallery-dl==1.23.5).

mikf added a commit that referenced this issue Nov 2, 2022:
…and double the previous default from 16384 (2**14) to 32768 (2**15)
mikf (Owner) commented Nov 2, 2022

> due to my computer being very old and slow.

That might be the reason for your problem, given how "fast" Python is.
The simple receive() is only used when there is no rate limit and no download progress; otherwise the more computationally expensive _receive_rate is used.

Anyway, commit bca9f96 adds a chunk-size option with which you can configure this value however you want without touching any actual code, and it also increases the default chunk size from 2^14 to 2^15 = 32 KiB. 1 MiB is far too much as a default value, I think.
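
For example, with the option exposed as downloader.http.chunk-size (check the commit and docs for the exact key), a config file entry would look like this:

{
    "downloader": {
        "http": {
            "chunk-size": 1048576
        }
    }
}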

trodiz (Author) commented Nov 3, 2022

@mikf Thanks for adding the option. Meanwhile I profiled the code and narrowed the issue down to this:

>>> import time
>>> def do():
...     t1 = time.perf_counter()
...     for i in range(1000):
...         time.sleep(0.01)
...     t2 = time.perf_counter()
...     dt = t2 - t1
...     print(f'{dt=}', f'exp={0.01*1000}')
...
>>> do()
dt=15.758051600001636 exp=10.0

Turns out that on my machine time.sleep is only accurate down to the ~10 ms range, so each iteration of the for loop sleeps a tiny bit longer than it needs to, and this error accumulates over hundreds of loops. A bigger chunk size reduces the number of iterations and solves the problem. Here are the relevant lines from the profiling output.

With a 16 KiB chunk size:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1170   16.095    0.014   16.095    0.014 {built-in method time.sleep}

And with a 1 MiB chunk size:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       19    5.855    0.308    5.855    0.308 {built-in method time.sleep}

This is specific to my computer and probably not an issue on most modern machines.
Once again, thanks for your help.

mikf (Owner) commented Nov 7, 2022

@trodiz I think I found a solution for inaccurate time.sleep() durations, but I'm not sure it actually works outside my own tests. Could you try the following patch and report back whether it works for you even with a small chunk-size?

t2 = time.time() was meant to counteract an inaccurate time.sleep() duration, but apparently it does the opposite.

diff --git a/gallery_dl/downloader/http.py b/gallery_dl/downloader/http.py
index 2e7e76e6..69dc4813 100644
--- a/gallery_dl/downloader/http.py
+++ b/gallery_dl/downloader/http.py
@@ -301,7 +301,7 @@ class HttpDownloader(DownloaderBase):
                 if elapsed < expected:
                     # sleep if less time elapsed than expected
                     time.sleep(expected - elapsed)
-                    t2 = time.time()
+                    t2 += expected - elapsed
 
             t1 = t2
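
To illustrate the idea in isolation (a standalone sketch, not the actual downloader code): re-reading the clock after time.sleep() silently absorbs each sleep's overshoot into the schedule, while advancing t2 by the intended amount keeps a drift-free reference, so later iterations can sleep less (or skip sleeping) to catch up.

import time

def limited_copy(chunk_sizes, rate, compensate):
    # Push chunks through a rate limiter; return total wall-clock time.
    start = t1 = time.time()
    for size in chunk_sizes:
        t2 = time.time()
        expected = size / rate        # seconds this chunk should take
        elapsed = t2 - t1
        if elapsed < expected:
            time.sleep(expected - elapsed)
            if compensate:
                t2 += expected - elapsed  # patched: advance by intent
            else:
                t2 = time.time()          # old: re-read clock, keep overshoot
        t1 = t2
    return time.time() - start

chunks = [32768] * 512                    # a 16 MiB file in 32 KiB chunks
for compensate in (False, True):
    total = limited_copy(chunks, rate=3 * 2**20, compensate=compensate)
    print(f"compensate={compensate}: {total:.2f}s (ideal: 5.33s)")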

trodiz (Author) commented Nov 7, 2022

Although it eliminated the inaccuracy in time.time, it still did not compensate for the error in time.sleep: any sleep interval below ~15 ms is simply not possible on Windows 10 unless I adjust the timer resolution manually.

Here's an example of a file I'm trying to download:

| File size (bytes) | Chunk size (bytes) | Loops | Limited rate (bytes/s) | Unlimited rate (bytes/s) | Sleep per loop (ms) |
| ----------------- | ------------------ | ----- | ---------------------- | ------------------------ | ------------------- |
| 16,777,216        | 32,768             | 512   | 3,145,728              | 7,340,032                | 5.952               |
  • A 16 MiB file should take 5.33 s to finish at a rate of 3 MiB/s
  • At full speed it would finish in 2.29 s (7 MiB/s is the max speed I can get)
  • Therefore an additional 3.05 s needs to be slept, i.e. 5.95 ms per loop (the arithmetic is verified below)
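
A quick verification of those numbers:

file_size = 16 * 2**20                   # 16,777,216 bytes
loops = file_size // 32768               # 512 chunks of 32 KiB

t_limited   = file_size / (3 * 2**20)    # 5.33 s at 3 MiB/s
t_unlimited = file_size / (7 * 2**20)    # 2.29 s at 7 MiB/s

sleep_per_loop = (t_limited - t_unlimited) / loops
print(f"{sleep_per_loop * 1000:.3f} ms per loop")   # 5.952 ms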

Here's a solution that should work on all Windows machines:

import ctypes
import platform

if platform.system() == "Windows":
    # NtSetTimerResolution(DesiredResolution, SetResolution, &CurrentResolution)
    # takes the desired resolution in 100 ns units; 10000 means 1 ms.
    ntdll = ctypes.WinDLL("NTDLL.DLL")
    resolution = ctypes.c_ulong()
    ntdll.NtSetTimerResolution(10000, 1, ctypes.byref(resolution))

This improves the timer resolution to 1 ms or better, and everything runs as expected.

trodiz closed this as completed Nov 9, 2022
mikf added a commit that referenced this issue Nov 11, 2022:
(#3143)

Reverts part of c59b98c by going back to using a global timer
instead of a per-chunk one.

Reintroduces the issue of ignoring rate limits after
suspending and resuming the process.