feat: define filters and sort schemas for search #4270

jfcalvo · 2023-11-17T16:57:39Z

Description

This is a proposal for adding filter and sort attributes to the record search endpoints.

The changes allow us to define the following schema for searching:

from argilla.server.schemas.v1.datasets import SearchRecordsQuery

SearchRecordsQuery.parse_obj({
  "query": {
    "text": {
      "q": "query"
    },
  },
  "filters": {
    "and": [
      {"type": "terms", "scope": {"entity": "suggestion", "question": "sentiment"}, "values": ["positive"]},
      {"type": "terms", "scope": {"entity": "suggestion", "question": "sentiment", "property": "agent"}, "values": ["chat-gpt3.5", "chat-gpt-4.0"]},
      {"type": "terms", "scope": {"entity": "response", "question": "topic"}, "values": ["politics", "news"]},
      {"type": "range", "scope": {"entity": "response", "question": "quality"}, "gte": 0.5, "lte": 0.9},
      {"type": "range", "scope": {"entity": "metadata", "metadata_property": "price"}, "gte": 100.0},
    ]
  },
  "sort": [
    {"scope": {"entity": "suggestion", "question": "sentiment", "property": "score"}, "order": "asc"},
    {"scope": {"entity": "response", "question": "quality"}, "order": "desc"},
  ]
})

# This is an old proposal of the schema, left here for reference.
SearchRecordsQuery.parse_obj({
  "query": {
    "text": {
      "q": "query"
    },
  },
  "filters": {
    "and": [
      {"type": "terms", "field": "suggestion.sentiment.value", "values": ["positive"]},
      {"type": "terms", "field": "suggestion.sentiment.agent", "values": ["chat-gpt3.5", "chat-gpt-4.0"]},
      {"type": "terms", "field": "response.topic.value", "values": ["politics", "news"]},
      {"type": "range", "field": "response.quality.value", "gte": 0.5, "lte": 0.9},
    ]
  },
  "sort": [
    {"field": "suggestion.sentiment.score", "order": "asc"},
    {"field": "response.quality.value", "order": "desc"},
  ]
})
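To make the relationship between the two proposals concrete, here is a minimal sketch of how an old dotted `field` string could map onto the newer `scope` object. The helper names `field_to_scope` and `migrate_filter` are hypothetical and not part of Argilla. The sketch assumes that `value` is the implicit default and is therefore left out of the scope, which is what the examples above suggest. It only covers the `suggestion` and `response` entities, since the old proposal shows no metadata field.

```python
from typing import Any, Dict


def field_to_scope(field: str) -> Dict[str, str]:
    """Hypothetical helper: map '<entity>.<question>.<property>' to a scope dict."""
    parts = field.split(".")
    if len(parts) != 3:
        raise ValueError(f"expected '<entity>.<question>.<property>', got {field!r}")
    entity, question, prop = parts
    if entity not in ("suggestion", "response"):
        raise ValueError(f"unsupported entity: {entity!r}")
    scope = {"entity": entity, "question": question}
    # Assumption: "value" is the default target, so it carries no "property" key,
    # matching the scope-based examples in the description.
    if prop != "value":
        scope["property"] = prop
    return scope


def migrate_filter(old_filter: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of an old-style filter with `field` replaced by `scope`."""
    new_filter = {key: val for key, val in old_filter.items() if key != "field"}
    new_filter["scope"] = field_to_scope(old_filter["field"])
    return new_filter
```

For example, `field_to_scope("suggestion.sentiment.agent")` yields a scope with `entity`, `question`, and `property` keys, while `field_to_scope("response.topic.value")` yields only `entity` and `question`.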

Notice some changes from the Notion documentation:

I'm using field instead of path because we already use field in the query parameter.
I have added the option of specifying floats in values, in case we want to filter by numbers.
~~I'm adding a new term (singular) filter so is more explicit to filter by one specific value (this is optional we can remove it if not needed).~~

With this PR we should answer several questions:

Is this functionality enough for the frontend changes required to filter responses and suggestions?
- Reviewed with @damianpumar and feedback applied.
Is this format flexible enough that we can add more boolean operators and extend the search in the future without deprecating anything?
Do we need additional boolean operators for now, or is and enough?
- and will be enough for the frontend requirements for now.
What field format should we follow?
- ~~Making the "value" of the field to filter implicit like suggestion.sentiment instead of suggestion.sentiment.value?~~
- Making the "value" of the field to filter explicit, like suggestion.sentiment.value instead of suggestion.sentiment?
  - We chose this solution and will always use value (instead of values) at the end.

@damianpumar should help us answer the first question on this list.

Type of change

New feature (non-breaking change which adds functionality)

How Has This Been Tested

Running tests locally.

Checklist

I added relevant documentation
My code follows the style guidelines of this project
I did a self-review of my code
I made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
I filled out the contributor form (see text above)
I have added relevant notes to the CHANGELOG.md file (See https://keepachangelog.com/)

codecov · 2023-11-17T17:27:39Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (be16ace) 64.66% compared to head (07c96e3) 91.72%.
Report is 10 commits behind head on feature/responses-and-suggestion-filter.

❗ Current head 07c96e3 differs from pull request most recent head 9c9c248. Consider uploading reports for the commit 9c9c248 to get more accurate results

Additional details and impacted files

@@                             Coverage Diff                              @@
##           feature/responses-and-suggestion-filter    #4270       +/-   ##
============================================================================
+ Coverage                                    64.66%   91.72%   +27.05%     
============================================================================
  Files                                          321      322        +1     
  Lines                                        18513    18558       +45     
============================================================================
+ Hits                                         11972    17022     +5050     
+ Misses                                        6541     1536     -5005

github-actions · 2023-11-17T17:27:54Z

The URL of the deployed environment for this PR is https://argilla-quickstart-pr-4270-ki24f765kq-no.a.run.app

damianpumar · 2023-11-19T16:14:18Z

@jfcalvo Yes, for me it's ok.

Thanks Jose!

damianpumar · 2023-11-20T08:07:25Z

I found some issues in your object schema:

{"type": "terms", "field": "suggestion.sentiment.values", "values": ["positive"]},

This example does not exist 👇
{"type": "range", "field": "suggestion.sentiment.score", "gte": 0.8},

This example 👇
{"type": "term", "field": "suggestion.sentimen.score", "value": 0.6},
must be
{"type": "range", "field": "suggestion.sentiment.score", "gte": 0.5, "lte": 0.9},

This example must be:
{"type": "terms", "field": "suggestion.sentiment.values", "values": ["positive"]},

damianpumar · 2023-11-20T10:12:13Z

For suggestions:
Score:
- {"type": "range", "field": "suggestion.sentiment.score", "gte": 0.5, "lte": 0.9}
Agent:
- {"type": "terms", "field": "suggestion.sentiment.agent", "values": ["chat-gpt3.5", "chat-gpt-4.0"]}
Values:
- OR:
  - {"type": "terms", "field": "suggestion.sentiment.values", "values": ["positive", "negative"]} TBD: value or values
- AND:
  - {"type": "terms", "field": "suggestion.sentiment.values", "values": ["positive"]}
  - {"type": "terms", "field": "suggestion.sentiment.values", "values": ["negative"]}
For responses:
- {"type": "terms", "field": "response.topic.values", "values": ["politics", "news"]}

The sort key is not necessary for this iteration.
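The OR and AND cases listed above can be illustrated with a small in-memory sketch. It assumes the semantics the examples imply: a single terms filter matches when the record holds at least one of the listed values (OR), while several filters combined under "and" must all match (AND). The functions `matches_terms` and `matches_and`, and the flat record layout keyed by field name, are hypothetical and only serve the illustration. They do not reflect how the search engine evaluates filters.

```python
from typing import Any, Dict, List

# Hypothetical record layout: field name -> list of stored values.
Record = Dict[str, List[Any]]


def matches_terms(record: Record, terms_filter: Dict[str, Any]) -> bool:
    """OR semantics: true when any stored value is among the filter's values."""
    stored = record.get(terms_filter["field"], [])
    return any(value in terms_filter["values"] for value in stored)


def matches_and(record: Record, filters: List[Dict[str, Any]]) -> bool:
    """AND semantics: every filter in the list must match the record."""
    return all(matches_terms(record, f) for f in filters)


FIELD = "suggestion.sentiment.values"

# One filter with two values: positive OR negative.
or_filter = {"type": "terms", "field": FIELD, "values": ["positive", "negative"]}

# Two filters with one value each: positive AND negative.
and_filters = [
    {"type": "terms", "field": FIELD, "values": ["positive"]},
    {"type": "terms", "field": FIELD, "values": ["negative"]},
]

only_positive: Record = {FIELD: ["positive"]}
both_labels: Record = {FIELD: ["positive", "negative"]}
```

With these definitions, `only_positive` satisfies `or_filter` but not `and_filters`, whereas `both_labels` satisfies both.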

src/argilla/server/schemas/v1/datasets.py

frascuchon

I think this way is much more readable, comprehensible, parseable, and easy to validate

feat: add initial proposal for filtering and sorting on records search endpoints

e0d7d6b

jfcalvo requested a review from frascuchon November 17, 2023 16:57

jfcalvo marked this pull request as ready for review November 17, 2023 16:57

jfcalvo mentioned this pull request Nov 17, 2023

[FEATURE] Support filter and sort as part of the request body for search endpoints #4227

Closed

damianpumar closed this Nov 19, 2023

damianpumar reopened this Nov 19, 2023

feat: apply feedback from code review

07c96e3

frascuchon approved these changes Nov 20, 2023

frascuchon reviewed Nov 20, 2023

frascuchon self-requested a review November 20, 2023 15:22

frascuchon marked this pull request as draft November 20, 2023 15:22

jfcalvo added 3 commits November 20, 2023 16:46

feat: add additional validations to gte and lte fields for RangeFilter

ddc0858

feat: add entity for search record filters

3604d50

feat: small improvements on filter key names

9c9c248
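The commit "feat: add additional validations to gte and lte fields for RangeFilter" does not spell out its checks in this conversation. As a rough sketch of what such validation might look like, the class below assumes two rules: at least one bound must be given, and `gte` must not exceed `lte`. Both rules are assumptions, and `RangeBounds` is a hypothetical stand-in written with the standard library rather than the project's actual schema class.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RangeBounds:
    """Hypothetical stand-in for the bounds of a range filter."""

    gte: Optional[float] = None
    lte: Optional[float] = None

    def __post_init__(self) -> None:
        # Assumed rule 1: a range with no bounds filters nothing, so reject it.
        if self.gte is None and self.lte is None:
            raise ValueError("at least one of 'gte' or 'lte' must be provided")
        # Assumed rule 2: the lower bound cannot be above the upper bound.
        if self.gte is not None and self.lte is not None and self.gte > self.lte:
            raise ValueError("'gte' must be less than or equal to 'lte'")
```

Under these assumptions, `RangeBounds(gte=0.5, lte=0.9)` and `RangeBounds(gte=100.0)` are accepted, while `RangeBounds()` and `RangeBounds(gte=0.9, lte=0.5)` raise `ValueError`.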

frascuchon approved these changes Nov 21, 2023

jfcalvo marked this pull request as ready for review November 21, 2023 08:47

jfcalvo merged commit e95c1e1 into feature/responses-and-suggestion-filter Nov 21, 2023
20 checks passed

jfcalvo deleted the feature/define-filters-and-sort-schemas-for-search branch November 21, 2023 08:52
