[BUG?] very slow in creating library #427

Closed
Cantello opened this issue Aug 31, 2024 · 14 comments
Labels
bug Something isn't working

Comments

@Cantello

Cantello commented Aug 31, 2024

Describe the bug
When adding a library, the media entities are sometimes added quickly, but thumbnails take forever or are not generated at all; sometimes the entities are not created at all.

To Reproduce

  1. Start the docker container (on Debian 12); docker-compose.yml is below
  stump:
    image: aaronleopold/stump
    container_name: stump
    volumes:
      - /opt/stump:/config
      - /mnt/_nfs/Stuff:/data_nfs
      - /tmp:/temporary
    ports:
      - 10801:10801
    environment:
      - STUMP_CONFIG_DIR=/config
      - PUID=1000
      - PGID=1000
      - STUMP_VERBOSITY=2
    restart: unless-stopped
  2. Initialize the server
  3. Create a library (single folder, 40 .cbz files, series-based, thumbnails enabled with 200x350)
  4. The library is either created but thumbnails are not generated (or generated extremely slowly, like 1 per minute), OR
  5. The library is not created at all ("No books to display"; the series is created but with 0 books in it, although the files are there in the Files tab within the series)

Expected behavior
Library is created with thumbnails

Screenshots

(Three screenshots attached: 2024-08-31 18_08_46-Stump _ Test, 2024-08-31 18_09_03-Stump, 2024-08-31 18_07_38-Stump _ Home)

Desktop (please complete the following information):
Accessed from Windows with Vivaldi as the browser; Stump is hosted in Docker on Debian 12.

Additional context
Log file here (maybe not enough and/or too chatty, was using verbosity 3):

https://logpaste.com/6lqmqRvU

The container stats are also interesting, nearly no CPU usage (1-2%) but huge memory usage (2.5GB after 15 minutes and rising).

@Cantello Cantello added the bug Something isn't working label Aug 31, 2024
@aaronleopold
Collaborator

aaronleopold commented Aug 31, 2024

Hey 👋 Thanks for the detailed report!

thumbnails take forever or are not generated at all

Coincidentally, someone else on Discord reported the really slow thumbnail generation. I'm not sure if you're on there, but if you're interested, the discussions can be found here.

There is already a draft PR up which I'm hoping will help with this.

Create library (single folder, 40 .cbz files, series-based, thumbnails enabled with 200x350)

Can you share additional details about your library structure? You said a single folder, so something like this:

Root
├── Single Folder of Stuff
│   ├── 40 .cbz files

OR

Root
├── 40 .cbz files

Or something else? I'm curious if perhaps you are being affected by a recently fixed bug in the scanner (not in latest yet): #423

Library is not created at all ("No books to display", the series is created but with 0 books in it, the files are there in the files tab within the series)

This is an odd scenario. I won't have time to review the logs you provided today, but do you notice the server doing anything when this happens? I'm curious if there is a panic happening somewhere.

The container stats are also interesting, nearly no CPU usage (1-2%) but huge memory usage (2.5GB after 15 minutes and rising).

If I had to guess, CPU usage is low during a scan but huge during thumbnail generation. A scan doesn't load much of the files into memory, but generating thumbnails means we must load a (potentially large) image into memory, convert it to another format, and apply transformations (scale, quality, etc.). This is another area I'm hopeful the aforementioned PR will help with, though.
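For illustration, the shape of that work looks roughly like the following (a minimal sketch assuming the `image` crate; not Stump's actual implementation):

use image::{imageops::FilterType, ImageFormat};
use std::io::Cursor;

// Hypothetical helper: decode a full page, resize it, and re-encode it as a JPEG
// thumbnail. Both the decoded page and the resized copy live in memory.
fn make_thumbnail(page_bytes: &[u8], width: u32, height: u32) -> image::ImageResult<Vec<u8>> {
    let page = image::load_from_memory(page_bytes)?; // full-size decode
    let thumb = page.resize(width, height, FilterType::Triangle); // second, smaller buffer
    let mut out = Cursor::new(Vec::new());
    thumb.write_to(&mut out, ImageFormat::Jpeg)?; // re-encode to the target format
    Ok(out.into_inner())
}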

Edit to add: I assume you are also using a network drive for your books (/mnt/_nfs/Stuff:/data_nfs). This is going to slow things down even more, in general.

@github-project-automation github-project-automation bot moved this to Backlog in v0.1.x Aug 31, 2024
@aaronleopold aaronleopold moved this from Backlog to Triage in v0.1.x Aug 31, 2024
@Cantello
Author

Thanks for the quick reply, I will just experiment around. Stump is not (yet) my main comic reader, but it will probably become that, as Komga is probably never going to implement smart filters.

As for your remarks:

Can you share additional details about your library structure? You said a single folder, so something like?:

The first test library was just 40 books in the root folder; my regular library is roughly as follows, one folder per series:

.
└── Comics/
    ├── English/
    │   ├── A Gift for a Ghost (2020)
    │   ├── Alan Moore's Neonomicon (2010-2011)
    │   ├── Alan Moore's The Courtyard (2004)
    │   ├── Aliens - Rescue (2019)
    │   └── American Gods/
    │       ├── American Gods - My Ainsel (2018)
    │       ├── American Gods - Shadows (2017)
    │       └── American Gods - The Moment of the Storm (2019-2020)
    └── German/
        ├── Ab Irato (2016)
        ├── Adler ohne Krallen (2012)
        ├── Akira (1991)
        ├── Aldebaran (2010)
        ├── Alice im Wunderland (2010)
        ├── Alice Matheson (2016-2020)
        └── Allein (2008-)

This is an odd scenario. I won't have time to review the logs you provided today, but do you notice the server doing anything when this happens? I'm curious if there is a panic happening somewhere.

Nothing that I am aware of, any idea where I could be looking?

Edit to add: I assume you are also using a network drive for your books (/mnt/_nfs/Stuff:/data_nfs). This is going to slow things down even more, in general.

I know that this could be a bottleneck (which is why I switched to NFS instead of SMB; the files are served by a Synology NAS), but other comic readers I tried (Kavita & Komga) currently do not suffer from any problems.

P.S.: memory usage is now up to 5.8GB, which somehow does not seem quite right... Is this due to the very verbose logging?

@aaronleopold
Collaborator

The first test library was just 40 books in the root folder, my regular library is roughly as follows, one folder per

This sounds like it would trigger that bug I mentioned. The bug is specifically for media files existing in the root of a library. Since it was just a test, it shouldn't otherwise affect you (unless you have files in the library root elsewhere). It should be fixed as of the last nightly.

Nothing that I am aware of, any idea where I could be looking?

Nothing specific; I really am curious what the entire scan would output, though. No books at all is odd, but it happening only sporadically is even stranger. In general, keywords to look for in logs include error, warn, or panic.

P.S.: memory usage is now up to 5.8GB, this does somehow not seem quite right... Is this due to the very verbose logging?

I wouldn't think so; you're mapping to a directory on your host where the logs are written. I am not sure how long the logs would remain in a buffer somewhere if nobody is attached to it.

Would you be open to spinning up a test container if I were to push an image with the changes from #426? I think your setup is a good one for working out some of these issues.

@Cantello
Author

Cantello commented Aug 31, 2024

Would you be open to spinning up a test container if I were to push an image with the changes from #426? I think your setup is a good one for working out some of these issues.

Absolutely, just tell me what to do. Might take until Monday, though.

One thing that I noticed after starting a new library scan with reduced log verbosity (the only real error I found in the log):

2024-08-31T18:59:04.918447Z  INFO stump_core::job: Starting job, job_id: 532f68d0-50c8-419f-8b7b-17fdd35e65c0, job_name: "library_scan"
    at core/src/job/mod.rs:619
2024-08-31T18:59:22.479440Z  INFO stump_core::filesystem::scanner::library_scan_job: Executing the init task for library scan
    at core/src/filesystem/scanner/library_scan_job.rs:269
2024-08-31T18:59:22.638665Z  INFO stump_core::filesystem::scanner::library_scan_job: Executing the walk series task for library scan
    at core/src/filesystem/scanner/library_scan_job.rs:407
2024-08-31T19:00:10.251108Z ERROR tower_http::trace::on_failure: response failed, classification: Status code: 500 Internal Server Error, latency: 0 ms
    at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tower-http-0.4.4/src/trace/on_failure.rs:93

@aaronleopold
Collaborator

aaronleopold commented Aug 31, 2024

Absolutely, just tell me what to do. Might take until Monday, though.

I appreciate it! No rush on my end. I'll be pushing a perf-experimental tag for each commit on that PR. The first job should finish shortly: https://github.com/stumpapp/stump/actions/runs/10647624711/job/29515774379?pr=426

Edit to add: it just got pushed https://hub.docker.com/r/aaronleopold/stump/tags

There are two new configuration options to try out (a rough sketch of how such a concurrency cap typically works follows the list):

  • STUMP_MAX_SCANNER_CONCURRENCY: The maximum number of concurrent files which may be processed by a scanner (default 200)
  • STUMP_MAX_THUMBNAIL_CONCURRENCY: The maximum number of concurrent files which may be processed by a thumbnail generator (default 50)
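For context, a cap like this is commonly enforced with a semaphore. A rough sketch (assuming tokio and a hypothetical generate_thumbnail function; not Stump's actual code):

use std::sync::Arc;
use tokio::sync::Semaphore;

async fn generate_all(paths: Vec<String>) {
    // Read the documented option, falling back to its default of 50.
    let limit: usize = std::env::var("STUMP_MAX_THUMBNAIL_CONCURRENCY")
        .ok()
        .and_then(|v| v.parse().ok())
        .unwrap_or(50);
    let permits = Arc::new(Semaphore::new(limit));

    let mut handles = Vec::new();
    for path in paths {
        let permits = Arc::clone(&permits);
        handles.push(tokio::spawn(async move {
            // At most `limit` tasks hold a permit (and their image buffers) at once.
            let _permit = permits.acquire_owned().await.expect("semaphore closed");
            generate_thumbnail(&path).await;
        }));
    }
    for handle in handles {
        let _ = handle.await;
    }
}

async fn generate_thumbnail(_path: &str) { /* decode, resize, encode */ }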

One thing that I noticed after starting a new library scan with reduced log verbosity (the only real error I found in the log):

Without more verbose logs it's hard to say what could have triggered a 500. I wouldn't necessarily attribute it to the scanner, though; it's likely a red herring. When you start a scan via an HTTP call, it sends an internal message to queue the job. The HTTP request should end by the time the job would start. The only time a 500 would happen is if the message to queue the job failed, but since the log shows some info-level lines from the scanner, I don't think that is what is happening.
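Roughly, the flow described above looks like this (a hypothetical sketch assuming an axum-style handler and a tokio channel; names and types are illustrative, not Stump's actual API):

use axum::{extract::State, http::StatusCode};
use tokio::sync::mpsc;

#[derive(Clone)]
struct Ctx {
    job_tx: mpsc::Sender<JobRequest>, // channel into the job worker
}

struct JobRequest {
    library_id: String,
}

// The HTTP request only enqueues the job and returns; the scan itself runs later on a worker.
async fn scan_library(State(ctx): State<Ctx>, library_id: String) -> StatusCode {
    match ctx.job_tx.send(JobRequest { library_id }).await {
        Ok(_) => StatusCode::OK,
        // Only a failure to enqueue would surface as a 500 from this endpoint itself.
        Err(_) => StatusCode::INTERNAL_SERVER_ERROR,
    }
}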

@aaronleopold
Collaborator

See #426 (comment) for some more context

@Cantello
Author

Cantello commented Sep 4, 2024

I tried the experimental docker image, but I guess there is something more afoot here with my setup than just slow thumbnail creation. Somehow, even after two days, the library creation & scanning process was still running and the status just showed 0% complete (0 of 1745 books), 0 in progress. It also seemed as if the frontend was still working but the backend did not provide any updates, e.g. the settings page was not displayed even though the web page switched to settings. According to docker ps the container was still running, and after logging into the container and installing procps, stump showed as running there as well.

I recreated the container after deleting all user data and it is currently working: 2/125 tasks done after 50 minutes for a library with ~700 books/100GB, and memory consumption is steadily rising with VERBOSITY=2 (now at 1.2GB RAM).
edit: restarted the container with verbosity=0 and still, after 2 hours, memory consumption rose to 6-7GB and the container was unresponsive.

@aaronleopold
Collaborator

aaronleopold commented Sep 4, 2024

To be clear, when you say you tried the experimental docker image you mean the literal experimental tag or the perf-experimental tag I pushed with (potential) improvements?

It also seemed as if the frontend was still working but the backend did not provide any updates, e.g. the settings page was not displayed even though the web page switched to settings

Yeah, that sounds like it was overwhelmed, e.g. no resources available to issue the queries that display the misc settings sections, so the suspense boundary stays up (nothing gets rendered). If you don't have the Live refetch preference enabled (settings > appearance), that is supposed to help. I agree, there is clearly something going on here with how your setup is interacting with Stump.

Is there any way for you to profile what is eating the memory, specifically? E.g. if the process is holding any files open, etc.

@Cantello
Author

Cantello commented Sep 5, 2024

To be clear, when you say you tried the experimental docker image you mean the literal experimental tag or the perf-experimental tag I pushed with (potential) improvements?

I was using perf-experimental.

Is there any way for you to profile what is eating the memory, specifically? E.g. if the process is holding any files open, etc.

Sorry, I'm not much of a developer (not at all) and thus cannot really help there. If you have any advice on what you'd need, I can go from there. From what I tried:

root@3e6fdecd9b67:/# free -h
              total        used        free      shared  buff/cache   available
Mem:           15Gi       6.4Gi       3.4Gi        80Mi       5.7Gi       8.4Gi
Swap:            0B          0B          0B
root@3e6fdecd9b67:/# ps aux
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root           1  0.0  0.0   3736  1852 ?        Ss   06:05   0:00 /bin/bash /entrypoint.sh
root          36  0.0  0.0   6144  1408 ?        S    06:05   0:00 su stump -c /app/stump
stump         37  1.0  5.3 3232524 871560 ?      Ssl  06:05   1:09 /app/stump
root         107  0.0  0.0   3976  2952 pts/0    Ss   06:08   0:00 /bin/bash
root         715  0.0  0.0   7640  2584 pts/0    R+   07:57   0:00 ps aux
root@3e6fdecd9b67:/# lsof -p 37
COMMAND PID  USER   FD      TYPE DEVICE SIZE/OFF NODE NAME
stump    37 stump  cwd   unknown                      /proc/37/cwd (readlink: Permission denied)
stump    37 stump  rtd   unknown                      /proc/37/root (readlink: Permission denied)
stump    37 stump  txt   unknown                      /proc/37/exe (readlink: Permission denied)
stump    37 stump    0   unknown                      /proc/37/fd/0 (readlink: Permission denied)
stump    37 stump    1   unknown                      /proc/37/fd/1 (readlink: Permission denied)
stump    37 stump    2   unknown                      /proc/37/fd/2 (readlink: Permission denied)
[... lots more lines, 45 file descriptors in total]
root@3e6fdecd9b67:/var/log# lsof -p 36
COMMAND PID USER   FD   TYPE DEVICE SIZE/OFF     NODE NAME
su       36 root  cwd    DIR   0,79     4096  4915529 /
su       36 root  rtd    DIR   0,79     4096  4915529 /
su       36 root  txt    REG   0,79    63568  7117593 /bin/su
su       36 root  mem    REG    8,1           7117593 /bin/su (path dev=0,79)
su       36 root  mem    REG    8,1           7117908 /lib/x86_64-linux-gnu/security/pam_permit.so (path dev=0,79)
su       36 root  mem    REG    8,1           7117888 /lib/x86_64-linux-gnu/security/pam_deny.so (path dev=0,79)
su       36 root  mem    REG    8,1           7117837 /lib/x86_64-linux-gnu/libnsl-2.28.so (path dev=0,79)
su       36 root  mem    REG    8,1           7117811 /lib/x86_64-linux-gnu/libcrypt-2.28.so (path dev=0,79)
su       36 root  mem    REG    8,1           7117924 /lib/x86_64-linux-gnu/security/pam_unix.so (path dev=0,79)
su       36 root  mem    REG    8,1           7117899 /lib/x86_64-linux-gnu/security/pam_limits.so (path dev=0,79)
su       36 root  mem    REG    8,1           7117903 /lib/x86_64-linux-gnu/security/pam_mail.so (path dev=0,79)
su       36 root  mem    REG    8,1           7117890 /lib/x86_64-linux-gnu/security/pam_env.so (path dev=0,79)
su       36 root  mem    REG    8,1           7117860 /lib/x86_64-linux-gnu/libpthread-2.28.so (path dev=0,79)
su       36 root  mem    REG    8,1           7117859 /lib/x86_64-linux-gnu/libpcre.so.3.13.3 (path dev=0,79)
su       36 root  mem    REG    8,1           7117866 /lib/x86_64-linux-gnu/libselinux.so.1 (path dev=0,79)
su       36 root  mem    REG    8,1           7117911 /lib/x86_64-linux-gnu/security/pam_rootok.so (path dev=0,79)
su       36 root  mem    REG    8,1           7117843 /lib/x86_64-linux-gnu/libnss_files-2.28.so (path dev=0,79)
su       36 root  mem    REG    8,1           7117808 /lib/x86_64-linux-gnu/libcap-ng.so.0.0.0 (path dev=0,79)
su       36 root  mem    REG    8,1           7117813 /lib/x86_64-linux-gnu/libdl-2.28.so (path dev=0,79)
su       36 root  mem    REG    8,1           7117799 /lib/x86_64-linux-gnu/libaudit.so.1.0.0 (path dev=0,79)
su       36 root  mem    REG    8,1           7117805 /lib/x86_64-linux-gnu/libc-2.28.so (path dev=0,79)
su       36 root  mem    REG    8,1           7117880 /lib/x86_64-linux-gnu/libutil-2.28.so (path dev=0,79)
su       36 root  mem    REG    8,1           7117854 /lib/x86_64-linux-gnu/libpam_misc.so.0.82.1 (path dev=0,79)
su       36 root  mem    REG    8,1           7117852 /lib/x86_64-linux-gnu/libpam.so.0.84.2 (path dev=0,79)
su       36 root  mem    REG    8,1           7117791 /lib/x86_64-linux-gnu/ld-2.28.so (path dev=0,79)
su       36 root    0u   CHR    1,3      0t0        5 /dev/null
su       36 root    1w  FIFO   0,13      0t0 46841355 pipe
su       36 root    2w  FIFO   0,13      0t0 46841356 pipe

edit: silly me thought that the percentage indicator in the upper right corner was the progress of the library build, but it is actually the reading completion...

@aaronleopold
Collaborator

Sorry, I'm not much of a developer (not at all) and thus cannot really help there

No worries! I appreciate what you provided. Unfortunately nothing really sticks out to me, aside from curiosity around those Permission denieds. However, I was finally able to reproduce your issues! It's my fault; I should have really tried my initial investigations in a docker container, as that turned out to be the key part, but everything I tried was local for convenience.

I was able to get quickly growing memory usage in pretty much any scenario (host mount, SMB, NFS), easily reaching 10GB within 20 minutes. When cancelling the scan, the memory stabilizes and the OS does not free it; I just killed the container after it sat around 10GB idle for 30 minutes. I'm not 100% sure what the next steps should be, TBH. Since I am only able to reproduce this in docker, I'm thinking:

  • Experiment with allocators (e.g. https://crates.io/crates/jemallocator) when running in docker (see the snippet after this list)
  • Experiment with different base image(s)
  • Remove all IO operations and slowly add them back (this is extremely tedious since I have to rebuild an image each time, so it should be a last resort)
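For reference, swapping in jemalloc via the jemallocator crate mentioned above only takes a few lines (this is the standard usage from that crate's documentation):

use jemallocator::Jemalloc;

// Route all heap allocations through jemalloc instead of the system allocator.
#[global_allocator]
static GLOBAL: Jemalloc = Jemalloc;

fn main() {
    // Everything in the program now allocates via jemalloc.
}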

@aaronleopold
Collaborator

aaronleopold commented Sep 5, 2024

Okay so I might have found something!

I think the root cause of this is an issue in the zip crate. I initially became suspicious of this when:

  1. Using a different allocator (tried mimalloc and jemalloc) didn't help or really change anything
  2. Changing to distroless image didn't help
  3. Scanning a (granted, much smaller) ebook library didn't trigger the issue
  4. Refreshing the books page for a library filled with ZIP files just kept increasing memory usage
  5. Refreshing the books page for a library filled with ebooks didn't increase memory usage 👀

It's important to note that the epub-rs library uses a different, older version of the zip crate. The zip crate was updated in this project a couple of months ago (#353). So, to verify that there is something funky going on with the zip crate, I downgraded and tested twice:

  1. Downgraded the zip crate to 0.6.6
  2. Downgraded the zip crate to 1.1.3 (the version used in epub-rs)

The results were pretty much the same between the two versions. Memory usage definitely did still increase over the duration of the scan, but we're talking orders of magnitude less than before. I was getting to over 10GB of usage within 20 minutes of scanning the NFS mount, but with the downgraded zip crate I was at 256.2MiB after 8 minutes, at which point the scan was complete. Memory then dropped back down by about 10MiB, so it's still holding on to some memory. In case the numbers aren't clear wrt the timing either: the NFS mount took 8 minutes to scan 1546 books after downgrading. If you recall/read any of the comments in the linked PR, this is a substantial difference to say the least lol

I think the next step will be to push another perf-experimental image with the downgraded zip crate and see if the issue is improved at all on your end. I might eagerly downgrade for the 0.0.6 release I was planning on doing this weekend, as well. As a side note, I'm contemplating keeping the distroless base image since it shrank the image size by 67%; IDK if anyone has thoughts on that.

I'll also try to find time to create a good reproduction to hand off to the zip crate maintainers to hopefully figure out what might be happening. Edit to add that I found an issue which might be relevant: zip-rs/zip2#138
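For anyone following along, the kind of archive access a .cbz scan performs looks roughly like this (a minimal sketch against the zip crate's standard API, not Stump's actual code); every page read decompresses into an in-memory buffer, which is where a regression in the crate or allocator shows up as steadily growing memory:

use std::{fs::File, io::Read};
use zip::ZipArchive;

// Hypothetical helper: pull the first image entry out of a .cbz archive.
fn first_page_bytes(path: &str) -> zip::result::ZipResult<Vec<u8>> {
    let file = File::open(path)?;
    let mut archive = ZipArchive::new(file)?;
    for i in 0..archive.len() {
        let mut entry = archive.by_index(i)?;
        let name = entry.name().to_lowercase();
        if name.ends_with(".jpg") || name.ends_with(".jpeg") || name.ends_with(".png") {
            // The entry is decompressed into this buffer in full.
            let mut buf = Vec::with_capacity(entry.size() as usize);
            entry.read_to_end(&mut buf)?;
            return Ok(buf);
        }
    }
    Err(zip::result::ZipError::FileNotFound)
}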

aaronleopold added a commit that referenced this issue Sep 5, 2024
aaronleopold added a commit that referenced this issue Sep 6, 2024
* ⬇️ Downgrade `zip` crate to `1.1.3`

Relates to #427

* fix test
@aaronleopold
Collaborator

@Cantello I'm going to plan on merging the linked PR sometime over the weekend. If you're able to, let me know whether the latest perf-experimental improves the situation on your end at all

@Cantello
Author

Cantello commented Sep 9, 2024

@Cantello I'm going to plan on merging the linked PR sometime over the weekend. If you're able to, let me know whether the latest perf-experimental improves the situation on your end at all

I just tried the latest perf-experimental container again and yes, it has greatly improved the situation here. The whole library (for one language: 700 files, 100GB) was scanned and thumbnails were created in ~14 minutes. Memory usage is high (1.8GB) but not dramatically so.
So, thanks for the extremely quick fix and for your hard work identifying this problem! Two thumbs up!

@aaronleopold
Collaborator

Awesome, thanks for re-testing with the updates! I'll go ahead and close out this issue now that the slowness aspect has been resolved. I'll try to keep an eye on the memory usage in docker; it's odd that running Stump outside docker doesn't have the same issue. If you find the memory usage worsens, feel free to create a separate issue.

Thanks for outlining your experience and pointing the slowness out! You directly helped in figuring out the huge performance regression 🤝

@github-project-automation github-project-automation bot moved this from Triage to Done in v0.1.x Sep 9, 2024