[Feature] Autotag Optimizations #2366

kermieisinthehouse · 2022-03-06T21:19:48Z

Autotag performance currently leaves me unable to run autotag on my image library (just under 14 million images).

I'm requesting some specific optimizations so that autotag can be greatly sped up. Note that the old bulk autotag implementation (sqlite regex based) was much faster, but it was less configurable and used more memory.

In scope:

- A checkbox for tags / performers / studios that enables / disables eligibility for autotag. This can reduce the search space considerably AND allow people to run autotag without worrying about pollution from e.g. single name performers.
- Memoization of compiled RE objects during an autotag run to save CPU.
- Folding all alias regexes into a single regex: all aliases match the same underlying ID, and assuming they are somewhat similar, we can quickly check them all in one optimized regexp call. We can generate the matching regex for each alias, then concat them into a master regex of "REGEX1 | REGEX2 | REGEX3 | ...". The final compiled object will optimize this pretty well, especially if the tags share substrings, like "example tag phrase" and "example tag phrases".
- A dedicated query function for the autotag task that doesn't sort by title. On large collections, sqlite sorts use crazy amounts of CPU time even when the order doesn't matter. The internal ROWID order is consistent enough.
- Indexes for the images / scenes tables as necessary for the new query.
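
To make the memoization and alias-folding items concrete, here is a minimal Python sketch. It is not the project's implementation: the separator rules, the boundary checks, and the names `name_pattern`, `compiled_matcher` and `matches` are all assumptions made for illustration. It builds one alternation per entity (name plus aliases) and caches the compiled object for the lifetime of the process.

```python
from __future__ import annotations

import re
from functools import lru_cache

# Assumption: characters that may separate the words of a name inside a path.
SEPARATORS = r"[ ._\-]*"


def name_pattern(name: str) -> str:
    """Build a regex fragment for one name or alias (illustrative rules only)."""
    words = [re.escape(word) for word in name.lower().split()]
    return SEPARATORS.join(words)


@lru_cache(maxsize=None)
def compiled_matcher(names: tuple[str, ...]) -> re.Pattern[str]:
    """Fold a name and its aliases into one alternation and compile it once.

    lru_cache memoizes the compiled object, so repeated calls with the same
    tuple of names reuse it instead of recompiling.
    """
    body = "|".join(f"(?:{name_pattern(n)})" for n in names)
    # Require a non-alphanumeric character (or string edge) on both sides.
    return re.compile(rf"(?<![a-z0-9])(?:{body})(?![a-z0-9])", re.IGNORECASE)


def matches(path: str, names: tuple[str, ...]) -> bool:
    """Return True if any of the names occurs in the path."""
    return compiled_matcher(names).search(path) is not None
```

With `names = ("example tag phrase", "example tag phrases")`, a single `matches(path, names)` call checks both aliases against a path. How much work the engine actually shares between similar alternatives depends on the regex implementation, so the gain from folding should be measured rather than assumed.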
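
The dedicated query item could look roughly like the following sketch, which pages through rows by integer primary key instead of sorting by title. This is keyset pagination, chosen here for illustration and not necessarily what the eventual implementation does. The table name `images`, its `id` and `path` columns, and the function `iter_batches` are hypothetical. In SQLite an `INTEGER PRIMARY KEY` column is an alias for the rowid, so ordering by it follows the table's storage order and needs no separate sort step.

```python
import sqlite3
from typing import Iterator, List, Tuple


def iter_batches(
    conn: sqlite3.Connection, batch_size: int = 1000
) -> Iterator[List[Tuple[int, str]]]:
    """Yield (id, path) rows in batches, resuming after the last id seen."""
    last_id = 0
    while True:
        # Hypothetical schema: images(id INTEGER PRIMARY KEY, path TEXT).
        rows = conn.execute(
            "SELECT id, path FROM images WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            return
        yield rows
        # Continue from the highest id in this batch.
        last_id = rows[-1][0]
```

Because each batch starts from the last id seen rather than from an offset, the cost per batch stays roughly constant no matter how far into the table the task has progressed.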

Not in scope: I am unsure of how much time the tag narrowing strategy currently saves. Is tokenizing the string and querying the database really faster than just naively doing all regex comparisons, especially if we only compile them once during task lifetime?

This was partially started in #1927, but it is not necessary to reuse any of it.

I am willing to put a decent bounty on this issue, as it is somewhat large and very useful to me.

WithoutPants · 2022-03-07T06:41:29Z

Bounty placed for $251.

kermieisinthehouse added the feature and feature request labels Mar 6, 2022

kermieisinthehouse added this to the "Soon" milestone Mar 6, 2022

WithoutPants mentioned this issue Mar 7, 2022

Autotag optimisation #2368

Merged

WithoutPants added the bounty label Mar 7, 2022

WithoutPants modified the milestones: "Soon", Version 0.14.0 Mar 7, 2022

WithoutPants mentioned this issue Mar 29, 2022

Add ignore autotag flag #2439

Merged

WithoutPants closed this as completed in #2439 Apr 4, 2022
