Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: handling synonyms and text fields more efficiently #234

Merged
merged 39 commits into from
Oct 24, 2024

Conversation

alexgarel
Copy link
Member

@alexgarel alexgarel commented Aug 21, 2024

  • Using synonyms capabilities of ES to avoid storing taxonomies fields in the index
  • Better handling of full text queries that support them within any expression
  • make boost_phrase a separate parameter
  • Raise errors if the query is not well understood or do not pass some sanity checks
  • Use main translation of taxonomy for facets values (instead of a random synonym)
  • Better handling of global config to avoid treacherous patterns
  • Unify parameters for Get and Post (better use of pydantic)
  • Error on extraneous search parameters to avoid hard to debug issues with typos

Part of: #193

@raphael0202 revealed some pending issues:

  • make labels:"fair-trade" work (it does not work because of the "-" being splet in two words)
  • make labels:"appellation d'origine protégée" work (it does not work because of apostrophe

It might be a change in analyzer and/or synonyms file.
Apart from adding new test in test_search.py, I need to add a test on ES analysis, as it will help document behaviour.

@github-actions github-actions bot added 🐳 docker Pull requests that update Docker code CLI Taxonomies labels Aug 21, 2024
@alexgarel alexgarel marked this pull request as draft August 21, 2024 16:39
@alexgarel alexgarel changed the title feat: handling synonyms more efficiently feat: handling synonyms and text fields more efficiently Aug 26, 2024
@github-actions github-actions bot added 📚 Documentation Improvements or additions to documentation API labels Oct 8, 2024
app/_import.py Outdated Show resolved Hide resolved
app/_import.py Outdated Show resolved Hide resolved
app/indexing.py Outdated
number_of_fields,
)

log = logging.getLogger(__name__)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We name the logger logger in the rest of the project, and use app.utils.get_logger function (to set-up the right LOG_LEVEL)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did this because the logging.getLogger is very strange… if you give it a name, it resets log level to NOT_SET !

Here it wasn't used anymore so I removed it.

I will do a refactor on that, some time ;-)

app/indexing.py Outdated Show resolved Hide resolved
app/utils/analyzers.py Outdated Show resolved Hide resolved
app/openfoodfacts.py Outdated Show resolved Hide resolved
app/postprocessing.py Outdated Show resolved Hide resolved
tests/int/conftest.py Outdated Show resolved Hide resolved
@raphael0202
Copy link
Contributor

raphael0202 commented Oct 24, 2024

Great work! 🎉
I tested it locally, it works well :)

@raphael0202 raphael0202 self-requested a review October 24, 2024 09:55
@alexgarel alexgarel merged commit f427a5e into main Oct 24, 2024
8 checks passed
@alexgarel alexgarel deleted the fix-text-handling branch October 24, 2024 12:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API CLI 🐳 docker Pull requests that update Docker code 📚 Documentation Improvements or additions to documentation indexation Taxonomies
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

3 participants