feat: define filters and sort schemas for search #4270

@jfcalvo jfcalvo commented Nov 17, 2023

Description

This is a proposal for adding filter and sort attributes to the record search endpoints.

The changes allow us to define the following schema for searching:

from argilla.server.schemas.v1.datasets import SearchRecordsQuery

SearchRecordsQuery.parse_obj({
  "query": {
    "text": {
      "q": "query"
    },
  },
  "filters": {
    "and": [
      {"type": "terms", "scope": {"entity": "suggestion", "question": "sentiment"}, "values": ["positive"]},
      {"type": "terms", "scope": {"entity": "suggestion", "question": "sentiment", "property": "agent"}, "values": ["chat-gpt3.5", "chat-gpt-4.0"]},
      {"type": "terms", "scope": {"entity": "response", "question": "topic"}, "values": ["politics", "news"]},
      {"type": "range", "scope": {"entity": "response", "question": "quality"}, "gte": 0.5, "lte": 0.9},
      {"type": "range", "scope": {"entity": "metadata", "metadata_property": "price"}, "gte": 100.0},
    ]
  },
  "sort": [
    {"scope": {"entity": "suggestion", "question": "sentiment", "property": "score"}, "order": "asc"},
    {"scope": {"entity": "response", "question": "quality"}, "order": "desc"},
  ]
})

# This is an old proposal for the schema; leaving it here for reference.
SearchRecordsQuery.parse_obj({
  "query": {
    "text": {
      "q": "query"
    },
  },
  "filters": {
    "and": [
      {"type": "terms", "field": "suggestion.sentiment.value", "values": ["positive"]},
      {"type": "terms", "field": "suggestion.sentiment.agent", "values": ["chat-gpt3.5", "chat-gpt-4.0"]},
      {"type": "terms", "field": "response.topic.value", "values": ["politics", "news"]},
      {"type": "range", "field": "response.quality.value", "gte": 0.5, "lte": 0.9},
    ]
  },
  "sort": [
    {"field": "suggestion.sentiment.score", "order": "asc"},
    {"field": "response.quality.value", "order": "desc"},
  ]
})
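
As a rough illustration of what validating the new scope-based filters involves, here is a minimal stdlib-only sketch. It is not the implementation in this PR (which relies on `SearchRecordsQuery`); the helper names `validate_scope`, `validate_filter`, and `validate_filters`, and the exact rules they enforce, are assumptions inferred from the example payload above.

```python
from typing import Any, Dict, List

# Assumption: these are the only entities, as seen in the example payload.
ALLOWED_ENTITIES = {"suggestion", "response", "metadata"}


def validate_scope(scope: Dict[str, Any]) -> None:
    """Check that a scope names a known entity and the key that entity needs."""
    entity = scope.get("entity")
    if entity not in ALLOWED_ENTITIES:
        raise ValueError(f"unknown entity: {entity!r}")
    if entity == "metadata":
        if "metadata_property" not in scope:
            raise ValueError("metadata scope requires 'metadata_property'")
    elif "question" not in scope:
        raise ValueError(f"{entity} scope requires 'question'")


def validate_filter(f: Dict[str, Any]) -> None:
    """Check a single "terms" or "range" filter (hypothetical rules)."""
    validate_scope(f.get("scope", {}))
    if f.get("type") == "terms":
        if not f.get("values"):
            raise ValueError("terms filter requires a non-empty 'values' list")
    elif f.get("type") == "range":
        if "gte" not in f and "lte" not in f:
            raise ValueError("range filter requires 'gte' and/or 'lte'")
    else:
        raise ValueError(f"unknown filter type: {f.get('type')!r}")


def validate_filters(filters: Dict[str, Any]) -> int:
    """Validate every filter under the "and" key; return how many were checked."""
    items: List[Dict[str, Any]] = filters.get("and", [])
    for item in items:
        validate_filter(item)
    return len(items)
```

Under these assumed rules, a range filter with neither `gte` nor `lte` is rejected with a `ValueError`, while the five filters in the first example all pass.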

Note some changes from the Notion documentation:

  • I'm using field instead of path because we are already using field in the query parameter.
  • I have added the possibility of specifying floats in values, in case we want to filter by numbers.
  • I'm adding a new term (singular) filter so that filtering by one specific value is more explicit (this is optional; we can remove it if not needed).

With this PR we should ask several questions:

  • Is this functionality enough for the changes required on the frontend for filtering responses and suggestions?
  • Is this format flexible enough that we can add more boolean operators and extend the search in the future without deprecating anything?
  • Do we need additional boolean operators for now, or is and enough?
    • and will be enough for the frontend requirements for now.
  • What field format should we follow?
    • Making the "value" of the field to filter implicit, like suggestion.sentiment instead of suggestion.sentiment.value?
    • Making the "value" of the field to filter explicit, like suggestion.sentiment.value instead of suggestion.sentiment?
      • We chose this solution, and we will always use value (instead of values) at the end.

@damianpumar should help us to answer the first question of this list.
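
To make the relationship between the two proposals concrete, the sketch below maps an explicit dotted field (old proposal) to a scope object (new proposal). The function `field_to_scope` is hypothetical and not part of this PR; it assumes that a trailing `value` corresponds to a scope without a `property` key, which is inferred from comparing the two examples above.

```python
from typing import Dict


def field_to_scope(field: str) -> Dict[str, str]:
    """Convert "entity.question.property" into a scope dict (illustrative only)."""
    parts = field.split(".")
    if len(parts) != 3:
        raise ValueError(f"expected 'entity.question.property', got {field!r}")
    entity, question, prop = parts
    scope = {"entity": entity, "question": question}
    # Assumption: "value" is the default property, so it is left implicit in the scope.
    if prop != "value":
        scope["property"] = prop
    return scope


print(field_to_scope("suggestion.sentiment.value"))
print(field_to_scope("suggestion.sentiment.agent"))
```

The first call yields a scope with only `entity` and `question`; the second adds `"property": "agent"`, matching the first two filters of the new schema.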

Type of change

  • New feature (non-breaking change which adds functionality)

How Has This Been Tested

  • Running tests locally.

Checklist

  • I added relevant documentation
  • follows the style guidelines of this project
  • I did a self-review of my code
  • I made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • I filled out the contributor form (see text above)
  • I have added relevant notes to the CHANGELOG.md file (See https://keepachangelog.com/)


codecov bot commented Nov 17, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (be16ace) 64.66% compared to head (07c96e3) 91.72%.
Report is 10 commits behind head on feature/responses-and-suggestion-filter.

❗ Current head 07c96e3 differs from pull request most recent head 9c9c248. Consider uploading reports for the commit 9c9c248 to get more accurate results

Additional details and impacted files
@@                             Coverage Diff                              @@
##           feature/responses-and-suggestion-filter    #4270       +/-   ##
============================================================================
+ Coverage                                    64.66%   91.72%   +27.05%     
============================================================================
  Files                                          321      322        +1     
  Lines                                        18513    18558       +45     
============================================================================
+ Hits                                         11972    17022     +5050     
+ Misses                                        6541     1536     -5005     



The URL of the deployed environment for this PR is https://argilla-quickstart-pr-4270-ki24f765kq-no.a.run.app

@damianpumar damianpumar reopened this Nov 19, 2023
@damianpumar

@jfcalvo Yes, for me it's ok.

Thanks Jose!


damianpumar commented Nov 20, 2023

I found some issues in your object schema:

{"type": "terms", "field": "suggestion.sentiment.values", "values": ["positive"]},

This example does not exist 👇
{"type": "range", "field": "suggestion.sentiment.score", "gte": 0.8},

This example 👇
{"type": "term", "field": "suggestion.sentiment.score", "value": 0.6},
must be
{"type": "range", "field": "suggestion.sentiment.score", "gte": 0.5, "lte": 0.9},

This example must be:
{"type": "terms", "field": "suggestion.sentiment.values", "values": ["positive"]},


damianpumar commented Nov 20, 2023

  • For suggestions:
    • Score:
      • {"type": "range", "field": "suggestion.sentiment.score", "gte": 0.5, "lte": 0.9}
    • Agent:
      • {"type": "terms", "field": "suggestion.sentiment.agent", "values": ["chat-gpt3.5", "chat-gpt-4.0"]}
    • Values:
      • OR:
        • {"type": "terms", "field": "suggestion.sentiment.values", "values": ["positive", "negative"]} TBD: value or values
      • AND:
        • {"type": "terms", "field": "suggestion.sentiment.values", "values": ["positive"]}
        • {"type": "terms", "field": "suggestion.sentiment.values", "values": ["negative"]}
  • For responses:
    • {"type": "terms", "field": "response.topic.values", "values": ["politics", "news"]}

The sort key is not necessary for this iteration.
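
The OR and AND cases listed above can be illustrated with a small in-memory sketch. This is not how the search engine evaluates filters; the record layout (a dict mapping a field name to its list of values) and the helpers `matches_terms` and `matches_all` are assumptions made only to show the intended semantics: one terms filter with several values behaves as OR, while several filters combined under and must all match.

```python
from typing import Any, Dict, List


def matches_terms(record_values: List[str], filter_values: List[str]) -> bool:
    """A single terms filter matches if the record has ANY listed value (OR)."""
    return any(value in record_values for value in filter_values)


def matches_all(record: Dict[str, List[str]], filters: List[Dict[str, Any]]) -> bool:
    """Filters combined under "and" must ALL match the record."""
    return all(
        matches_terms(record.get(f["field"], []), f["values"]) for f in filters
    )


field = "suggestion.sentiment.values"
or_filters = [{"type": "terms", "field": field, "values": ["positive", "negative"]}]
and_filters = [
    {"type": "terms", "field": field, "values": ["positive"]},
    {"type": "terms", "field": field, "values": ["negative"]},
]

only_positive = {field: ["positive"]}
print(matches_all(only_positive, or_filters))   # True: one of the values is present
print(matches_all(only_positive, and_filters))  # False: "negative" is missing
```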

@frascuchon frascuchon self-requested a review November 20, 2023 15:22
@frascuchon frascuchon marked this pull request as draft November 20, 2023 15:22

@frascuchon frascuchon left a comment

I think this way is much more readable, comprehensible, parseable, and easy to validate

@jfcalvo jfcalvo marked this pull request as ready for review November 21, 2023 08:47
@jfcalvo jfcalvo merged commit e95c1e1 into feature/responses-and-suggestion-filter Nov 21, 2023
20 checks passed
@jfcalvo jfcalvo deleted the feature/define-filters-and-sort-schemas-for-search branch November 21, 2023 08:52