Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Import is slow with inline field in aunique #4145

Closed
jcassette opened this issue Nov 13, 2021 · 6 comments · Fixed by #4147
Closed

Import is slow with inline field in aunique #4145

jcassette opened this issue Nov 13, 2021 · 6 comments · Fixed by #4147
Labels
feature features we would like to implement

Comments

@jcassette
Copy link
Contributor

Problem

Running this command in verbose (-vv) mode:

[julien@julien-nas /mnt/tank/julien]$ python3 -m cProfile -s tottime /home/julien/.local/bin/beet -vv import -A /mnt/tank/julien/Album/
user configuration: /mnt/tank/julien/beets/config.yaml
data directory: /mnt/tank/julien/beets
plugin paths:
Sending event: pluginload
inline: adding item field multidisc
inline: adding album field media
inline: adding album field format
library database: /mnt/tank/julien/beets/library.db
library directory: /mnt/tank/julien/music
Sending event: library_opened
Sending event: import_begin
Sending event: import_task_created
/mnt/tank/julien/Album
0 of 10 items replaced
Sending event: database_change
Sending event: database_change
Sending event: database_change
Sending event: database_change
Sending event: database_change
Sending event: database_change
Sending event: database_change
Sending event: database_change
Sending event: database_change
Sending event: database_change
Sending event: database_change
Sending event: database_change
Sending event: database_change
Sending event: database_change
Sending event: database_change
Sending event: database_change
Sending event: database_change
Sending event: database_change
Sending event: database_change
Sending event: database_change
Sending event: database_change
Sending event: database_change
Sending event: before_item_moved
Sending event: item_moved
Sending event: database_change
Sending event: database_change
Sending event: before_item_moved
Sending event: item_moved
Sending event: database_change
Sending event: database_change
Sending event: before_item_moved
Sending event: item_moved
Sending event: database_change
Sending event: database_change
Sending event: before_item_moved
Sending event: item_moved
Sending event: database_change
Sending event: database_change
Sending event: before_item_moved
Sending event: item_moved
Sending event: database_change
Sending event: database_change
Sending event: before_item_moved
Sending event: item_moved
Sending event: database_change
Sending event: database_change
Sending event: before_item_moved
Sending event: item_moved
Sending event: database_change
Sending event: database_change
Sending event: before_item_moved
Sending event: item_moved
Sending event: database_change
Sending event: database_change
Sending event: before_item_moved
Sending event: item_moved
Sending event: database_change
Sending event: database_change
Sending event: before_item_moved
Sending event: item_moved
Sending event: database_change
Sending event: database_change
Sending event: database_change
Sending event: database_change
Sending event: database_change
Sending event: database_change
Sending event: database_change
Sending event: database_change
Sending event: database_change
Sending event: database_change
Sending event: database_change
Sending event: database_change
Sending event: import_task_files
Sending event: album_imported
Sending event: import
Sending event: cli_exit

Led to this problem:

         4173933 function calls (4027432 primitive calls) in 120.860 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     2857  109.761    0.038  109.761    0.038 {method 'execute' of 'sqlite3.Connection' objects}
    ...

Setup

  • OS: FreeBSD julien-nas 12.2-RELEASE-p6 FreeBSD 12.2-RELEASE-p6 facd5d71c97(HEAD) TRUENAS amd64
  • Python version: Python version 3.8.12
  • beets version: beets version 1.5.0 + commit b67c25a
  • Turning off plugins made problem go away (yes/no): yes see below
[julien@julien-nas /mnt/tank/julien]$ python3 -m cProfile -s tottime /home/julien/.local/bin/beet -vv --plugins= import -A /mnt/tank/julien/Album/
user configuration: /mnt/tank/julien/beets/config.yaml
data directory: /mnt/tank/julien/beets
plugin paths:
Sending event: pluginload
library database: /mnt/tank/julien/beets/library.db
library directory: /mnt/tank/julien/music
Sending event: library_opened
Sending event: import_begin
Sending event: import_task_created
/mnt/tank/julien/Album
0 of 10 items replaced
Sending event: database_change
Sending event: database_change
Sending event: database_change
Sending event: database_change
Sending event: database_change
Sending event: database_change
Sending event: database_change
Sending event: database_change
Sending event: database_change
Sending event: database_change
Sending event: database_change
Sending event: database_change
Sending event: database_change
Sending event: database_change
Sending event: database_change
Sending event: database_change
Sending event: database_change
Sending event: database_change
Sending event: database_change
Sending event: database_change
Sending event: database_change
Sending event: database_change
Sending event: before_item_moved
Sending event: item_moved
Sending event: database_change
Sending event: database_change
Sending event: before_item_moved
Sending event: item_moved
Sending event: database_change
Sending event: database_change
Sending event: before_item_moved
Sending event: item_moved
Sending event: database_change
Sending event: database_change
Sending event: before_item_moved
Sending event: item_moved
Sending event: database_change
Sending event: database_change
Sending event: before_item_moved
Sending event: item_moved
Sending event: database_change
Sending event: database_change
Sending event: before_item_moved
Sending event: item_moved
Sending event: database_change
Sending event: database_change
Sending event: before_item_moved
Sending event: item_moved
Sending event: database_change
Sending event: database_change
Sending event: before_item_moved
Sending event: item_moved
Sending event: database_change
Sending event: database_change
Sending event: before_item_moved
Sending event: item_moved
Sending event: database_change
Sending event: database_change
Sending event: before_item_moved
Sending event: item_moved
Sending event: database_change
Sending event: database_change
Sending event: database_change
Sending event: database_change
Sending event: database_change
Sending event: database_change
Sending event: database_change
Sending event: database_change
Sending event: database_change
Sending event: database_change
Sending event: database_change
Sending event: database_change
Sending event: import_task_files
Sending event: album_imported
Sending event: import
Sending event: cli_exit
         604108 function calls (598012 primitive calls) in 2.915 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      135    1.085    0.008    1.085    0.008 {method 'execute' of 'sqlite3.Connection' objects}
    ...

My configuration (output of beet config) is:

directory: /mnt/tank/julien/music

plugins: inline
threaded: no

import:
    copy: no
    move: yes
    write: no
    log: /mnt/tank/julien/beets/log.txt
per_disc_numbering: yes

aunique:
    keys: albumartist album year format
    disambiguators: albumtype label catalognum albumdisambig releasegroupdisambig

paths:
    default: $albumartist - $year - $album [$format]%aunique{}/%if{$multidisc,Disc $disc/}$track - $title
item_fields:
    multidisc: 1 if disctotal > 1 else 0
album_fields:
    media: sorted(set(item.media for item in items))
    format: sorted(set(item.format for item in items))
acoustid:
    apikey: ********
discogs:
    user_token: ********
lastgenre:
    canonical: yes
    whitelist: /mnt/tank/julien/beets/genres.txt
pathfields: {}
@sampsyo sampsyo added the needinfo We need more details or follow-up from the filer before this can be tagged "bug" or "feature." label Nov 13, 2021
@sampsyo
Copy link
Member

sampsyo commented Nov 13, 2021

I think it's likely that this slowness arises from the interaction of inline-defined fields and their use in %aunique. Could you perhaps check whether leaving %aunique in its default configuration, or using all built-in fields for it, is less slow?

@jcassette
Copy link
Contributor Author

Hello and thanks for the suggestion, the import is much faster with the format field removed of aunique:

         426065 function calls (420533 primitive calls) in 2.904 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      135    1.094    0.008    1.094    0.008 {method 'execute' of 'sqlite3.Connection' objects}

@jcassette jcassette changed the title Import is slow with inline plugin Import is slow with inline field in aunique Nov 15, 2021
@jcassette
Copy link
Contributor Author

Since inline fields need slow queries I guess we could separate the matching in aunique in two parts:

  1. check if there is a duplicate only with keys using fast queries
  2. if a duplicate was found, re-check with all keys

@sampsyo
Copy link
Member

sampsyo commented Nov 16, 2021

Huh, that's a pretty good idea! %aunique matching is a pretty complex thing inherently, but that two-phase approach could really help in this specific case. Good thinking!

@sampsyo sampsyo added feature features we would like to implement and removed needinfo We need more details or follow-up from the filer before this can be tagged "bug" or "feature." labels Nov 16, 2021
@jcassette
Copy link
Contributor Author

Thank you 😄 I will try to implement that and compare the performance.

@jcassette
Copy link
Contributor Author

I did the modification of aunique and there was no improvement in the case of importing a duplicate album. (I did not try the case of an album with identical fixed attributes and different flexible attributes.)

However I found out that dbcore.AndQuery was not using short-circuit evaluation!

So I fixed that and the performance is much better in the first case I described:

[julien@julien-nas /mnt/tank/julien]$ python3 -m cProfile -s tottime /home/julien/.local/bin/beet -vv import -A /mnt/tank/julien/Album/
user configuration: /mnt/tank/julien/beets/config.yaml
data directory: /mnt/tank/julien/beets
plugin paths:
Sending event: pluginload
inline: adding item field multidisc
inline: adding album field media
inline: adding album field format
library database: /mnt/tank/julien/beets/library.db
library directory: /mnt/tank/julien/music
Sending event: library_opened
Sending event: import_begin
Sending event: import_task_created
/mnt/tank/julien/Album
0 of 10 items replaced
Sending event: database_change
Sending event: database_change
Sending event: database_change
Sending event: database_change
Sending event: database_change
Sending event: database_change
Sending event: database_change
Sending event: database_change
Sending event: database_change
Sending event: database_change
Sending event: database_change
Sending event: database_change
Sending event: database_change
Sending event: database_change
Sending event: database_change
Sending event: database_change
Sending event: database_change
Sending event: database_change
Sending event: database_change
Sending event: database_change
Sending event: database_change
Sending event: database_change
Sending event: before_item_moved
Sending event: item_moved
Sending event: database_change
Sending event: database_change
Sending event: before_item_moved
Sending event: item_moved
Sending event: database_change
Sending event: database_change
Sending event: before_item_moved
Sending event: item_moved
Sending event: database_change
Sending event: database_change
Sending event: before_item_moved
Sending event: item_moved
Sending event: database_change
Sending event: database_change
Sending event: before_item_moved
Sending event: item_moved
Sending event: database_change
Sending event: database_change
Sending event: before_item_moved
Sending event: item_moved
Sending event: database_change
Sending event: database_change
Sending event: before_item_moved
Sending event: item_moved
Sending event: database_change
Sending event: database_change
Sending event: before_item_moved
Sending event: item_moved
Sending event: database_change
Sending event: database_change
Sending event: before_item_moved
Sending event: item_moved
Sending event: database_change
Sending event: database_change
Sending event: before_item_moved
Sending event: item_moved
Sending event: database_change
Sending event: database_change
Sending event: database_change
Sending event: database_change
Sending event: database_change
Sending event: database_change
Sending event: database_change
Sending event: database_change
Sending event: database_change
Sending event: database_change
Sending event: database_change
Sending event: database_change
Sending event: import_task_files
Sending event: album_imported
Sending event: import
Sending event: cli_exit
         553149 function calls (545816 primitive calls) in 3.344 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      145    1.487    0.010    1.487    0.010 {method 'execute' of 'sqlite3.Connection' objects}

sampsyo added a commit that referenced this issue Nov 17, 2021
Use short-circuit evaluation in AndQuery and OrQuery (fix #4145)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature features we would like to implement
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants