-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Query with asymmetric matching for accented characters #3882
Comments
The |
Thanks for the suggestion but, unfortunately, I have just tried it, with default config settings. Unfortunately it seems to fail to even match many identical, unaccented strings: I have six tracks with title containing the unaccented string 'Noel' but using It seems to work even less well for accents. I have 12 tracks with Noël and fuzzy returns no results for However, I guess that a similar plugin approach would be a good idea for implementation of this feature. Create a new plugin similar to |
Info for anyone considering implementing this (maybe even a future me!):
|
Hello! Yes, I think this would be a cool thing to have, and similar requests come up from time to time (see #2811, for example). It was the original reason we introduced the An easy way to go about this would be to reuse our existing Unfortunately, I think it's inevitable that this is going to be a slow kind of query. SQLite does not support this sort of ASCIIficiation, so we'd need to do the matching in Python. We have robust built-in support for this kind of slow query, but that doesn't make it fast. |
I have created a quick and dirty plugin (called "bareasc") to experiment with bare ASCII matches using It seems to work quite well with the dvořák case but needs some more experimentation. And I haven't looked into performance at all. Would be interested to hear comments from anyone else. The draft PR is #3883. |
Apart from my idiocy last night with using I did a quick-and-dirty performance test... on my database (currently around 3200 tracks), a simple If it turns out that doing the comparison in sqlite is actually faster, then I can imagine a way to make that happen: every database record write could add an extra internal column which contains ALL the text, from all fields, just appended together into one string and put through |
The PR #3883 has now been accepted into beets - thanks to all who helped me with that. I think the |
Use case
OK - let me admit it before I start... I am a dumb native English speaker and I don't like letters with accents and don't even know how to enter most of them on my keyboard. Yep, I know: the rest of the world despises me.
However, for people like me (and, to be fair, some other cases where accented letter use varies even between speakers of the same or very closely related languages, or has differed at various historical times), it would be very nice to have the optional ability to do simplified comparisons when searching in beets.
Today, in beets, if I use
beet ls -a dvořák
I get 5 matches in my database. However, if I usebeet ls -a dvorak
I get no matches. Although I support MusicBrainz's requirement to use correct spelling, as a non-speaker of the language I would have to search online to look up the correct accents to use. It would be useful if beets allowed (perhaps optionally, based on a config setting) less strict searching. The case is even worse with Noel, which appears in various places in my database in both the forms Noël and Noel - similar problems exist for other partly non-naturalised words which are sometimes spelt in English with accents and sometimes not.Solution
A popular, powerful and standardised by the Unicode consortium approach, is called asymmetric matching. In this case, speakers choose a language (e.g. English) and can write search terms using the characters of that language: the system then matches accented letters not present in that language with the unaccented forms in the query. However, if the user actually enters an accent in the search term, then only the same accented letter matches. The link above includes some examples of using the term "resume" to match the many words spelt "resume" but with accents on various different letters.
This solution feels very natural and is easy for both speakers and non-speakers of the language to use. Unfortunately, I have not found an existing Python implementation (but I haven't looked particularly hard).
Alternatives
Regex searches can provide some capabilities but they require particular fields to be specified and require the use of the arcane regex format. For example, the Dvorak search would probably need to be entered as `artist::Dvo.a.' and requires me to know which letters are accented. It would also miss cases where the composer is mentioned in the title or albumtitle but not as an artist.
The text was updated successfully, but these errors were encountered: