Detect language when composing, prompt user if it seems incorrect #287

Closed
nikclayton opened this issue Dec 1, 2023 · 2 comments

Comments

@nikclayton
Contributor

Is your feature request related to a problem? Please describe.

Mastodon server-side translation relies on the author of a status correctly setting the language they've used. This doesn't always happen, so you get posts written in, e.g., German but tagged as English.

If you're reading such a post and your account is set to English, the server refuses to perform the translation and returns a 403.

On-device translation will solve this problem for the recipient, but it would be helpful if Pachli could prompt the author when the text they've written does not seem to match the language they've chosen (with an option to disable the prompt).

Describe the solution you'd like

When composing a post, perform language detection on the text. If it doesn't appear to match the configured language for the post, change the language icon to indicate this, e.g., change the colour to yellow (to match the "You haven't added alt text" indicator) or to red.
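As a rough illustration of the check described above, here is a minimal sketch of the "does the icon need to change?" decision. Everything in it is an assumption made for illustration: the `LanguageIndicator` class, the `State` names, the confidence threshold, and the minimum text length are hypothetical and are not taken from Pachli or from any detection library. The detected language tag and confidence are assumed to come from whichever detector ends up being used.

```java
import java.util.Locale;

/** Hypothetical sketch: decide how the compose screen's language icon should look. */
public class LanguageIndicator {
    /** Illustrative icon states; WARNING would map to the yellow (or red) icon. */
    public enum State { NORMAL, WARNING }

    /** Assumed threshold: detections below this confidence are ignored. */
    static final double MIN_CONFIDENCE = 0.6;

    /** Assumed minimum length: very short text is treated as undetectable. */
    static final int MIN_LENGTH = 20;

    /**
     * @param text        the text typed so far
     * @param selectedTag BCP 47 tag the author chose for the post, e.g. "en" or "en-GB"
     * @param detectedTag BCP 47 tag reported by the detector, or null / "und" if unknown
     * @param confidence  detector confidence in the range 0.0 to 1.0
     */
    public static State stateFor(String text, String selectedTag,
                                 String detectedTag, double confidence) {
        if (text.trim().length() < MIN_LENGTH) return State.NORMAL;
        if (detectedTag == null || "und".equals(detectedTag)) return State.NORMAL;
        if (confidence < MIN_CONFIDENCE) return State.NORMAL;

        // Compare only the primary language subtag, so "en-GB" matches "en".
        String selected = Locale.forLanguageTag(selectedTag).getLanguage();
        String detected = Locale.forLanguageTag(detectedTag).getLanguage();
        return selected.equals(detected) ? State.NORMAL : State.WARNING;
    }
}
```

Ignoring short or low-confidence input is a design choice in this sketch, intended to avoid flashing a warning while the author has only typed a few words.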

If the language still appears incorrect when the user hits the post button, pop up a dialog that warns them and shows the detected languages. Options are probably:

  1. Use the suggested language, and post
  2. Keep the existing language, and post
  3. As (2), but with a "don't prompt me again" suffix

"Don't prompt me again" is saved as a preference, and exposed through the preferences UI so the user can adjust it outside of the context of posting a new status.
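The dialog flow above could be sketched roughly as follows. This is only an illustration under stated assumptions: `PostLanguageCheck`, `Choice`, and the `Prefs` class with its `promptOnLanguageMismatch` field are hypothetical names standing in for the real dialog and preference storage, which the issue does not specify.

```java
/** Hypothetical sketch of the post-time language check and the three dialog options. */
public class PostLanguageCheck {
    /** The three options listed above. */
    public enum Choice { USE_SUGGESTED, KEEP_EXISTING, KEEP_AND_DONT_ASK_AGAIN }

    /** Stand-in for the stored preference; a real app would persist this. */
    public static class Prefs {
        public boolean promptOnLanguageMismatch = true;
    }

    /** True if the warning dialog should be shown before posting. */
    public static boolean shouldPrompt(Prefs prefs, String selected, String detected) {
        return prefs.promptOnLanguageMismatch
                && detected != null
                && !detected.equals(selected);
    }

    /** Applies the user's choice and returns the language to post with. */
    public static String resolve(Prefs prefs, Choice choice,
                                 String selected, String suggested) {
        switch (choice) {
            case USE_SUGGESTED:
                return suggested;
            case KEEP_AND_DONT_ASK_AGAIN:
                // Option 3: remember the decision so the dialog is skipped next time.
                prefs.promptOnLanguageMismatch = false;
                return selected;
            case KEEP_EXISTING:
            default:
                return selected;
        }
    }
}
```

Because the flag lives in the preferences object rather than in the dialog, the same value can be exposed in the preferences UI and changed outside of posting.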

https://github.com/pemistahl/lingua is one option for performing the on-device language detection (Apache 2.0)

https://developers.google.com/ml-kit/language/identification/android is the Google option (free, but not open source, so it can't be used in F-Droid builds).

There may be others.

Lingua has a set of test data (https://github.com/pemistahl/lingua/tree/main/src/accuracyReport/resources/language-testdata) that could be used to compare the accuracy of Lingua and Google's ML Kit. If they're roughly on par, use Lingua exclusively. If ML Kit performs better, give users the choice on non-F-Droid builds and use only Lingua on F-Droid.
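One way such a comparison might be structured is sketched below. It deliberately does not call Lingua or ML Kit: each detector is represented by a plain `Function<String, String>` from text to a language code, which is an assumption made here so the harness stays self-contained. `DetectorAccuracy`, `offerChoice`, and the "roughly on par" margin are likewise hypothetical.

```java
import java.util.Map;
import java.util.function.Function;

/** Hypothetical harness for comparing two language detectors on labelled text. */
public class DetectorAccuracy {
    /**
     * Returns the fraction of samples whose detected language equals the label.
     *
     * @param detector maps text to a language code (wraps the real library)
     * @param labelled maps sample text to its expected language code
     */
    public static double accuracy(Function<String, String> detector,
                                  Map<String, String> labelled) {
        if (labelled.isEmpty()) return 0.0;
        int correct = 0;
        for (Map.Entry<String, String> sample : labelled.entrySet()) {
            if (sample.getValue().equals(detector.apply(sample.getKey()))) {
                correct++;
            }
        }
        return (double) correct / labelled.size();
    }

    /**
     * True if ML Kit beats Lingua by more than the margin, i.e. the two are
     * not "roughly on par" and users should be offered the choice.
     * The margin is an assumed tuning value.
     */
    public static boolean offerChoice(double linguaAccuracy, double mlKitAccuracy,
                                      double margin) {
        return mlKitAccuracy - linguaAccuracy > margin;
    }
}
```

In practice the labelled samples would be loaded from the test data linked above, and each `Function` would wrap the corresponding library's detection call.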

@nikclayton nikclayton added this to the 2.x milestone Dec 1, 2023
@nikclayton
Contributor Author

Some experiments with Lingua suggest that it's a non-starter: a trivial test app OOMs trying to do language detection. This is not a slight on Lingua; it's not designed for environments as resource-constrained as phones.

@nikclayton
Contributor Author

Implemented in #792

Projects
Status: Done