More 'asciify' documentation #929

Closed
gwern opened this issue Sep 3, 2014 · 2 comments · Fixed by #4067
Labels
docs; needinfo: We need more details or follow-up from the filer before this can be tagged "bug" or "feature."

Comments

@gwern
Contributor

gwern commented Sep 3, 2014

The documentation for the asciify_paths option says:

Convert all non-ASCII characters in paths to ASCII equivalents. For example, if your path template for singletons is singletons/$title and the title of a track is “Café”, then the track will be saved as singletons/Cafe.mp3.

This is clear enough for a Latin script (one would expect 'é' to be converted to 'e'). It is unclear what this option would do to text in entirely different scripts or writing systems, such as Japanese kanji. (It says 'all': does that mean such characters would be deleted, since they have no ASCII equivalents?) I have no idea what it might do, and I am too terrified to let this option anywhere near my files to find out empirically. More documentation would help in deciding whether to use it.
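To make the question concrete, here is a minimal stdlib-only sketch of one common way to "convert to ASCII": apply Unicode NFKD decomposition, then drop every code point outside 0x00 to 0x7F. This is a stand-in technique, not what beets does. The replies point to Unidecode transliteration instead. The names `naive_ascii_fold` and `render_path` are hypothetical, and `string.Template` only stands in for beets' own `$field` template language. The extension is written into the template here for the demo, whereas the documented template leaves it off. This approach reproduces the Café example, but it deletes kana and kanji outright, which is exactly the outcome the question worries about.

```python
import unicodedata
from string import Template


def naive_ascii_fold(text: str) -> str:
    """Split accented letters into base letter + combining mark (NFKD),
    then drop every code point outside 0x00-0x7F. Characters with no
    ASCII base letter, such as kana or kanji, are silently removed."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


def render_path(template: str, **fields: str) -> str:
    # Hypothetical helper: fold each field value, then fill the template.
    folded = {name: naive_ascii_fold(value) for name, value in fields.items()}
    return Template(template).substitute(folded)


print(render_path("singletons/$title.mp3", title="Café"))     # singletons/Cafe.mp3
print(render_path("singletons/$title.mp3", title="パイソン"))  # singletons/.mp3
```

The second call shows the risk of a pure decomposition approach: the whole title disappears. A transliteration table avoids that by mapping each character to a pronounceable ASCII approximation.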

@andriykohut
Contributor

You can check out the Unidecode description. It just does transliteration:

function unidecode() takes Unicode data and tries to represent it in ASCII characters (i.e., the universally displayable characters between 0x00 and 0x7F), where the compromises taken when mapping between two character sets are chosen to be near what a human with a US keyboard would choose.

Something like:

```python
from unidecode import unidecode  # imports the function, not the module

unidecode(u'パイソン')  # 'paison'
unidecode(u'蠎')  # 'Mang ' (CJK readings come back capitalized, with a trailing space)
```
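Roughly speaking, Unidecode works as a per-character lookup: each code point maps to an ASCII string chosen to approximate it. The toy sketch below illustrates only that idea. `TOY_TABLE` and `toy_transliterate` are hypothetical and have only a handful of entries, whereas the real library ships far larger tables. In this toy, non-ASCII characters without an entry are simply dropped.

```python
# Hypothetical toy table; a stand-in for Unidecode's real data files.
TOY_TABLE = {
    "é": "e",
    "パ": "pa",
    "イ": "i",
    "ソ": "so",
    "ン": "n",
}


def toy_transliterate(text: str) -> str:
    """Map each character through TOY_TABLE. ASCII passes through
    unchanged; other characters without an entry are dropped."""
    out = []
    for ch in text:
        if ord(ch) < 0x80:
            out.append(ch)
        else:
            out.append(TOY_TABLE.get(ch, ""))
    return "".join(out)


print(toy_transliterate("Café"))     # Cafe
print(toy_transliterate("パイソン"))  # paison
```

Unlike a decomposition-only fold, a table-driven approach can keep non-Latin titles readable. Whether a given character survives depends entirely on whether the table has an entry for it.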

@sampsyo sampsyo added the docs label Sep 4, 2014
@sampsyo
Member

sampsyo commented Sep 4, 2014

Thanks for chiming in, @andriykohut.

@gwern, if you do some investigation here, could you please consider adding what you find to the docs? The Unidecode mapping is pretty straightforward and there's no reason we shouldn't spend a sentence or two giving more detail.

@sampsyo sampsyo added the needinfo label Sep 7, 2014
@sampsyo sampsyo closed this as completed Nov 25, 2014
emiham added a commit to emiham/beets that referenced this issue Sep 17, 2021
This was proposed in beetbox#929 but never dealt with, so after going through some confusion around this myself I figured it's about time.

3 participants