Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Works search performance #1163

Merged

Conversation

darrell-k
Copy link
Contributor

Add parameter to create search helper tables with a primary key. This is the magic bullet.

Works Query: replace joins to library_track with EXISTS check - runs much faster with large libraries. The Works search query can be made even faster by using different SQL when $search is passed in, but this is pretty good already and avoids possible inadvertent logic changes.

Tested with a large library (25,000 albums, 288,000 tracks, 8400 works, 20,000 contributors).

The livesearch still takes about 30 seconds when restricting to local music only (on an old Chromebook running Debian) to resolve for example "beethoven 5" with the above library because as we know it runs 5 queries for every typed character.

It's substantially faster when searching all music (ie without the library_track check) so that might be improved further with more work on join columns and/or indexes. But I think we should go with this and see what the users think.

Copy link
Member

@michaelherger michaelherger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting! Would it make sense to simplify things and always create the primary key? I guess the potential drawback would be that creating the index might be more of a penalty than the win of a faster lookup could be?

I think the reason why I changed the SQL in most other queries to be based on the search results probably were the result of similar challenges. And that wouldn't take advantage of the index. Would you mind measuring performance of those searches with and without the primary key? It would simplify the change - I don't like additional parameters unless they're really needed.

@@ -4722,8 +4727,7 @@ sub worksQuery {
}

if (defined $libraryID) {
$sql .= 'JOIN library_track ON library_track.track = tracks.id ';
push @{$w}, 'library_track.library = ?';
push @{$w}, 'EXISTS (SELECT * FROM library_album WHERE library_album.album = albums.id AND library_album.library = ?)';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to SELECT 1 ... instead of SELECT * ..., as we don't really need the data? Similar to when they say that SELECT COUNT(1) was faster than SELECT COUNT(*).

Comment on lines 291 to 297
my $searchSQL = "SELECT SUBSTR(fulltext.id, 33) AS id, FULLTEXTWEIGHT(matchinfo(fulltext)) AS fulltextweight FROM fulltext WHERE fulltext MATCH 'type:$type $tokens' $orderOrLimit";
if ($createPrimaryKey) {
$dbh->do("CREATE $temp TABLE $name (id integer primary key, fulltextweight integer)");
$searchSQL = "INSERT INTO $name $searchSQL";
} else {
$searchSQL = "CREATE $temp TABLE $name AS $searchSQL";
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer if you did a separate CREATE $temp in all cases, so we can have the actual $searchSQL identical.

@michaelherger
Copy link
Member

I'd re-consider reversing the SQL from starting on tracks to start on the search result. So instead of this:

	my $sql = 'SELECT %s FROM tracks
		JOIN worksSearch ON works.id = worksSearch.id
...

do something like

	my $sql = 'SELECT %s FROM worksSearch
		JOIN tracks ON tracks.id = worksSearch.id
...

? That would solve the index creation challenge, wouldn't it?

That's not the actual code change, but you get the idea. Start with the select depending on whether you do a search or not, then add the other join statements and all the rest.

@darrell-k
Copy link
Contributor Author

I've rerun my tests and done some extra ones. See below.

Caveat: These results are in sqllitebrowser, running sqlite 3.39.2 (my LMS runs sqlite 3.22). For some reason the queries run faster in LMS/DBI, but the relative performance improvements are similar.

Based on this and your suggestion, I'll go ahead and rewrite the SQL in worksQuery when $search is passed in (and bypass processing other parameters, just in case).

I'll also address the other points you've made. Particularly see my test of the albumsQuery search with/without creating a primary key on the temporary table: based on this I'll make the temp table primary key creation unconditional.

worksSearch:

SQL 1 (same as standard Works query)

SELECT works.title, works.id, composer.name, c	omposer.id, composer.namesort, works.titlesort, GROUP_CONCAT(DISTINCT albums.artwork), GROUP_CONCAT(DISTINCT albums.id) 
FROM tracks
		JOIN contributor_track composer_track ON composer_track.track = tracks.id AND composer_track.role = 2
		JOIN contributors composer ON composer.id = composer_track.contributor
		JOIN contributor_track ON contributor_track.track = tracks.id
		JOIN contributors ON contributors.id = contributor_track.contributor
		JOIN works ON tracks.work = works.id AND composer.id = works.composer
		JOIN albums ON tracks.album = albums.id 
		JOIN worksSearch ON works.id = worksSearch.id 
		JOIN albumsSearch ON albums.id = albumsSearch.id 
		WHERE tracks.work IS NOT NULL  
AND EXISTS (SELECT * FROM library_album WHERE library_album.album = albums.id AND library_album.library = '91b427f1')
		GROUP BY tracks.work, composer.id
		ORDER BY composer.namesort, works.titlesort

SQL 2 (starting with workssearch, also omitting the dobule join to contributors because I don't think we ever run worksQuery with both $search and non-composer $artist/$roleID)

SELECT works.title, works.id, composer.name, composer.id, composer.namesort, works.titlesort, 
GROUP_CONCAT(DISTINCT albums.artwork), GROUP_CONCAT(DISTINCT albums.id)
FROM workssearch
join works on works.id=workssearch.id
join tracks on tracks.work=workssearch.id
join albums on albums.id=tracks.album
join albumsSearch on albumsSearch.id=albums.id
join contributors composer on composer.id=works.composer
WHERE EXISTS (SELECT * FROM library_album WHERE library_album.album = albums.id AND library_album.library = '91b427f1')
group by workssearch.id, composer.id
order by composer.namesort, works.titlesort

Large result set(6705 works):
	Without primary key:
		Table creation time: 100ms
		SQL 1: > 1 minute (canceled)
		SQL 2: > 1 minute (canceled)
	With primary key:
		Table creation time: 160ms
		SQL 1: 4.2 seconds
		SQL 2: 1.3 seconds
	
Small result set (6 works):
	Without primary key:
		Table creation time: 40ms
		SQL 1: 3.3 seconds
		SQL 2: 120 ms
	With primary key:
		Table creation time: 50ms
		SQL 1: 1.4 seconds
		SQL 2: 50 ms

Live Search

(example, Albums. Searching for "b" - 24,268 rows in temporary table):

Without primary key:
	Table creation: 200 ms
	Query runtime: > 1 minute (canceled)
With primary key:
	Table creation: 300 ms
	Query runtime: 900 ms

And albumsQuery with $search

Large (2910 rows in temporary table):
	Without primary key: 
		Table creation: 24 ms
		Query runtime: 209 ms
	With primary key:
		Table creation: 120 ms
		Query runtime: 70 ms 

Smaill (57 rows in temporary table):
	Without primary key: 
		Table creation: 5 ms
		Query runtime: 38 ms
	With primary key:
		Table creation: 14 ms
		Query runtime: 22 ms 

@michaelherger
Copy link
Member

Nice job - thanks a lot!

@darrell-k
Copy link
Contributor Author

Changes pushed.

Copy link
Member

@michaelherger michaelherger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's give that a try! Thanks a lot!

@michaelherger michaelherger merged commit b4c87de into LMS-Community:public/9.0 Sep 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants