Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(ml): better multilingual search with nllb models #13567

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

mertalev
Copy link
Contributor

@mertalev mertalev commented Oct 17, 2024

Description

The NLLB models have the highest quality results for multilingual text in our model catalog. However, these models expect the input to explicitly indicate the language of the text. Since we don't provide this, the results are worse than other multilingual models.

This PR makes the machine learning service accept a language option that it maps to the corresponding FLORES200 token that NLLB expects. The server is updated to accept this parameter and forward it to the machine learning service, while web and mobile are updated to provide the language based on current user settings. Search still works without it, and the option only has an impact when using an NLLB model.

How Has This Been Tested?

I tested on web and can confirm the results for Turkish are significantly better. Changing the language in the user settings has a marked effect on the ranking. I haven't tested mobile so I'm not sure if all the language codes there line up with the map in the machine learning service.

Copy link
Contributor

github-actions bot commented Oct 18, 2024

📖 Documentation deployed to pr-13567.preview.immich.app

Copy link
Contributor

@jrasm91 jrasm91 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome

@@ -66,6 +66,65 @@
SUPPORTED_PROVIDERS = ["CUDAExecutionProvider", "OpenVINOExecutionProvider", "CPUExecutionProvider"]


WEBLATE_TO_FLORES200 = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder where this mapping should live. To some extent the server knows which model it is using when it sends the request so you could argue this belongs there. What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say it's better for it to live here - it makes things easier if you query ML directly or use the models in another context like in notebooks.

@@ -144,6 +144,7 @@
page: nextPage,
withExif: true,
isVisible: true,
language: $lang,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add a new setting? I like to have my UI in English, but I prefer to use my native language for searching.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, we could have a Search Language user preference separate from the UI language.

Copy link
Contributor Author

@mertalev mertalev Oct 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only thing is that the setting only applies to four non-default models, so I wonder if it'd be confusing to users. Maybe we can hide the setting if they're not using a relevant model?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be great

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to also have this be something the admin can override?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean making it use a certain search language by default for all users? That could be nice, possibly doing the same for the UI language as well. But I think we should avoid making too many changes at once in this PR, so the admin stuff can be done separately.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants