Improvements to query performance #3258

SimonTeixidor · 2019-05-08T20:06:09Z

As discussed in PR #2388, this PR makes gives some performance improvements for querying. On my database of around 14k songs, these commits make beet ls > /dev/null run in 6.4 seconds, down from 15.5.

The second commit confuses me. For some reason, inlining the call to formatted() makes the difference between 11 seconds and 6.4 seconds. I was confused for quite some time, thinking that FormattedMapping itself was doing something slow, but it turns out that the function call overhead was causing it?!

In this PR I have used lru_cache to cache the formatting templates. I would have preferred to explicitly create the template at the call site, but I can't seem to figure it out. The function list_items, in ui/commands.py, receives an argument fmt. I figured that this argument would contain the formatting string, but it appears to be empty. Whether I specify a formatting string on the command line or not does not seem to matter. Any pointers here would be great. In it's current state, I imagine that this PR would break Python2 support. I don't know if you intend to support Python 2, given that it will be deprecated in half a year now.

Finally, I'm not sure how to run the tests? I run the tox command, but it complains about some interpreters missing. Am I expected to have every version of Python installed on my computer for this to work? Sorry, I'm not really a pythonista :)

wisp3rwind · 2019-05-08T20:35:14Z

beets/dbcore/db.py

@@ -581,16 +581,6 @@ def add(self, db=None):
                    self._dirty.add(key)
            self.store()

-    # Formatting and templating.
-
-    _formatter = FormattedMapping


This change is not as innocuous as it seems: The _formatter attribute is used by the beets.library.Item subclass of Model which replaces FormattedMapping by FormattedItemMapping. I would strongly suspect that this is the reason for the performance improvement, not having an additional factory method wrapping it. Without reading the code, I don't know what the exact implications are, but this commit must surely be breaking some of beets functionality.

There might of course be room for optimization in FormattedItemMapping.

Ah, thanks! I couldn't figure out why this would impact performance so much. I guess I'll investigate this further.

wisp3rwind · 2019-05-08T20:58:58Z

Thanks for tackling this long-standing issue! However, I believe the overall role of templates in beets should be investigated somwhat further before deciding on a strategy for fixing it. In particular, @sampsyo made a few suggestions in 2030: There does not seem to be the obvious way to approach caching templates.
One additional point is that I think it might be good to keep the general code layout in mind: Templates are not really an integral part of dbcore (IIRC, there is the loose idea that it would be nice to make it completely independent of the rest of beets). Thus, it might be preferable not to entangle this two parts of beets even more deeply.

EDIT: I might personally be a little too perfectionist in that regard, its probably fine to just choose one of the approaches that have been proposed and go for it, in particular since there are no user-facing changes.

In this PR I have used lru_cache to cache the formatting templates. I would have preferred to explicitly create the template at the call site, but I can't seem to figure it out. The function list_items, in ui/commands.py, receives an argument fmt. I figured that this argument would contain the formatting string, but it appears to be empty. Whether I specify a formatting string on the command line or not does not seem to matter. Any pointers here would be great. In it's current state, I imagine that this PR would break Python2 support. I don't know if you intend to support Python 2, given that it will be deprecated in half a year now.

You already mentioned #2030 in your comments in #2388: I think one of the two approaches I benchmarked is very close to what you intended to do. Note the comments about the default template.

I believe there are plans to drop Python 2 support in the near future,but as long as we still support it, this change must at least be disabled when on Python 2 (which might be an acceptable approach if Python 2 support is indeed removed soon-ish).

Finally, I'm not sure how to run the tests? I run the tox command, but it complains about some interpreters missing. Am I expected to have every version of Python installed on my computer for this to work? Sorry, I'm not really a pythonista :)

Without any traceback, I cannot really tell what the issue is. Note that you can run individual test environments using tox -e.

SimonTeixidor · 2019-05-08T21:31:10Z

You already mentioned #2030 in your comments in #2388: I think one of the two approaches I benchmarked is very close to what you intended to do. Note the comments about the default template.

Thanks, I should have read the issue more carefully. So the suggestion to create a functemplate.template() function that is cached, and instantiates templates seems reasonable to me. I've pushed a commit with this change. I still used lru_cache, but if we agree that the approach is good I will look into how to make it python2 compatible.

I looked into the performance of FormattedItemMapping. The problem seems to be that the formatter fetches the related album for every item, even when it is not needed. One solution to this is to fetch the album lazily, only when we really need it. This makes the default beet ls fast, whereas beet ls -f '$artpath' runs a bit slower, as it needs to fetch the album. For most attributes, the album does not seem to be needed, so this should speed up "simple" of queries. I've pushed a commit with this change as well.

sampsyo · 2019-05-09T16:01:27Z

Awesome work so far! This is super interesting. It seems like there are two main things here: LRU-caching the templates, and making FormattedItemMapping lazy. (The latter is a great insight—we didn't previously know this layer was costing so much.) Any chance I could convince you to split one of them off into a separate pull request for review (since they seem orthogonal)?

I don't see any big downsides to using @lru_cache (we can definitely figure out the Python 2 story). And I'd also be happy to dig into why you're seeing that mysterious behavior that was preventing explicit reuse…

We were previously doing calls to Template() directly, sometimes in a loop. This caused the same template to be recompiled over and over. This commit introduces a function template() which caches the results, so that multiple calls with the same template string does not require recompilation.

SimonTeixidor · 2019-05-09T16:35:43Z

@sampsyo - Sure, I created PR #3260 for the changes to FormattedItemMapping.

SimonTeixidor · 2019-05-09T17:46:40Z

I fixed the flake8 errors, and disabled lru_cache on python2, so tests are passing now.

sampsyo · 2019-05-09T18:14:46Z

I can confirm the performance impact on my machine! Here are three time runs without the change:

beet ls > /dev/null  4.94s user 0.38s system 99% cpu 5.355 total
beet ls > /dev/null  4.90s user 0.36s system 99% cpu 5.288 total
beet ls > /dev/null  4.87s user 0.36s system 99% cpu 5.255 total

And with it:

beet ls > /dev/null  6.99s user 0.41s system 99% cpu 7.424 total
beet ls > /dev/null  7.06s user 0.41s system 99% cpu 7.492 total
beet ls > /dev/null  7.09s user 0.43s system 99% cpu 7.589 total

So that's a change from about 7.0 seconds to about 4.9 seconds, or a 1.4x speedup. 🎉

sampsyo · 2019-05-09T18:18:52Z

To answer your question:

Finally, I'm not sure how to run the tests? I run the tox command, but it complains about some interpreters missing. Am I expected to have every version of Python installed on my computer for this to work? Sorry, I'm not really a pythonista :)

Yep, tox depends on Python interpreters for each version you want to test under. You can always choose a specific environment by typing something like tox -e py37.

Improvements to query performance

Don't restrict to Python 2 precisely.

SimonTeixidor changed the title ~~Improvements to query performance #2388~~ Improvements to query performance May 8, 2019

SimonTeixidor mentioned this pull request May 8, 2019

Performance: Common operations are slow. #2388

Open

wisp3rwind reviewed May 8, 2019

View reviewed changes

SimonTeixidor force-pushed the performance-2388 branch from 38939ea to c5075b2 Compare May 9, 2019 16:28

SimonTeixidor mentioned this pull request May 9, 2019

Lazily fetch album in FormattedItemMapping, it's not needed in most cases #3260

Merged

Fix formatting, and add python2 support.

7df4e23

sampsyo merged commit 7df4e23 into beetbox:master May 9, 2019

sampsyo added a commit that referenced this pull request May 9, 2019

Merge pull request #3258 from SimonPersson/performance-2388

38bbb7d

Improvements to query performance

sampsyo added a commit that referenced this pull request May 9, 2019

Changelog refinement for #3258

d236e1e

sampsyo added a commit that referenced this pull request May 9, 2019

Refine @cached decorator from #3258

ff1d43d

Don't restrict to Python 2 precisely.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improvements to query performance #3258

Improvements to query performance #3258

SimonTeixidor commented May 8, 2019

wisp3rwind May 8, 2019

SimonTeixidor May 8, 2019

wisp3rwind commented May 8, 2019 •

edited

Loading

SimonTeixidor commented May 8, 2019

sampsyo commented May 9, 2019

SimonTeixidor commented May 9, 2019

SimonTeixidor commented May 9, 2019

sampsyo commented May 9, 2019

sampsyo commented May 9, 2019

Improvements to query performance #3258

Improvements to query performance #3258

Conversation

SimonTeixidor commented May 8, 2019

wisp3rwind May 8, 2019

Choose a reason for hiding this comment

SimonTeixidor May 8, 2019

Choose a reason for hiding this comment

wisp3rwind commented May 8, 2019 • edited Loading

SimonTeixidor commented May 8, 2019

sampsyo commented May 9, 2019

SimonTeixidor commented May 9, 2019

SimonTeixidor commented May 9, 2019

sampsyo commented May 9, 2019

sampsyo commented May 9, 2019

wisp3rwind commented May 8, 2019 •

edited

Loading