feat: Support for update records from SDK #3946

frascuchon · 2023-10-14T23:51:29Z

Description

This PR introduces support for updating records from the Python SDK. Records can be updated in two ways, and this PR supports both:
Single update:

records = ds.records[:100]
for record in records:
    record.metadata.update({"new": "metadata"})
    record.update()

or batch update:

records = ds.records[:100]
for record in records:
    record.metadata.update({"new": "metadata"})
    record.suggestions = [...]

ds.update_records(records)

This is still compatible with the previous record update functionality (where only suggestions could be updated):

for record in records:
    record.update(suggestions=[...])

To support this, records are no longer immutable. The goal is to offer the same update pattern for local and remote records (change the data at record level, then call the update_records method), since the record.update method is only available for synced/remote entities.
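
The two workflows described above can be illustrated with a minimal, self-contained sketch. `SketchRecord` and `SketchDataset` are hypothetical stand-ins, not the Argilla classes, and the `pushed` list only simulates what would be sent to the server.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Union


@dataclass
class SketchRecord:
    """Hypothetical stand-in for a remote record; not the Argilla class."""

    id: int
    metadata: Dict[str, Any] = field(default_factory=dict)
    suggestions: List[Any] = field(default_factory=list)
    # Simulated server state: every call to update() appends a snapshot here.
    pushed: List[Dict[str, Any]] = field(default_factory=list, repr=False)

    def update(self) -> None:
        # Push the current local state (metadata and suggestions) to the "server".
        self.pushed.append(
            {"metadata": dict(self.metadata), "suggestions": list(self.suggestions)}
        )


class SketchDataset:
    """Hypothetical stand-in for a remote dataset."""

    def __init__(self, records: List[SketchRecord]) -> None:
        self.records = records

    def update_records(self, records: Union[SketchRecord, List[SketchRecord]]) -> None:
        # Accept a single record or a list, mirroring the signature in this PR.
        if not isinstance(records, list):
            records = [records]
        for record in records:
            record.update()


ds = SketchDataset([SketchRecord(id=i) for i in range(3)])

# Single update: mutate one record locally, then push it.
first = ds.records[0]
first.metadata.update({"new": "metadata"})
first.update()

# Batch update: mutate several records locally, then push them together.
batch = ds.records[1:]
for record in batch:
    record.metadata.update({"new": "metadata"})
ds.update_records(batch)
```

In both cases the record is changed locally first and pushed afterwards, which is the pattern that removing immutability is meant to enable.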

Metadata validation is still missing on record update and will be tackled in a separate PR, since it may require extra changes.

Refs #3748

Type of change

New feature (non-breaking change which adds functionality)
Refactor (change restructuring the codebase without changing functionality)
Improvement (change adding some improvement to an existing functionality)

How Has This Been Tested

All the flows described above have been tested locally.

Checklist

I added relevant documentation
I followed the style guidelines of this project
I did a self-review of my code
I made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
I filled out the contributor form (see text above)
I have added relevant notes to the CHANGELOG.md file (See https://keepachangelog.com/)

Records can be updated by assigning content and then calling the `record.update` method. Suggestions are still supported, so users can update a record by passing the suggestions. But a more general way should be:

```python
record.metadata.update({"new": "metadata"})
record.suggestions = (Suggestion....)
record.update()
```

…` workflow

Record suggestions can be modified locally to prepare changes, and then `ds.update_records` is called with the modified records. The `record.update` method still supports suggestions.

```python
records = ds.records[:10]
for record in records:
    record.suggestions = [SuggestionSchema(...)]
    record.metadata.update({"new": "metadata"})

# Apply all local changes to remote records
ds.update_records(records)
```

frascuchon · 2023-10-14T23:55:30Z

src/argilla/client/feedback/schemas/remote/records.py

        validate_assignment = True

-    def __update_suggestions(


Here, the old __update_suggestions method has been split into two steps: (1) validate and normalize/filter the suggestions, and (2) prepare and call the update endpoints.
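
A rough sketch of that two-step shape follows. Everything here is an assumption made for illustration: the function names, the dict-based suggestion format, the "last suggestion per question wins" rule, and the `send` callable standing in for the endpoint call are not taken from the Argilla code.

```python
from typing import Any, Callable, Dict, List


def normalize_suggestions(
    suggestions: List[Dict[str, Any]], question_name_to_id: Dict[str, str]
) -> List[Dict[str, Any]]:
    """Step 1 (assumed rules): validate question names and keep only the
    last suggestion given for each question."""
    by_question: Dict[str, Dict[str, Any]] = {}
    for suggestion in suggestions:
        name = suggestion["question_name"]
        if name not in question_name_to_id:
            raise ValueError(f"unknown question: {name!r}")
        by_question[name] = {
            "question_id": question_name_to_id[name],
            "value": suggestion["value"],
        }
    return list(by_question.values())


def push_suggestions(
    normalized: List[Dict[str, Any]], send: Callable[[Dict[str, Any]], None]
) -> int:
    """Step 2: call the (hypothetical) endpoint once per normalized suggestion."""
    for item in normalized:
        send(item)
    return len(normalized)


name_to_id = {"quality": "q-1", "topic": "q-2"}
raw = [
    {"question_name": "quality", "value": "bad"},
    {"question_name": "topic", "value": "sports"},
    {"question_name": "quality", "value": "good"},
]
normalized = normalize_suggestions(raw, name_to_id)
sent: List[Dict[str, Any]] = []
count = push_suggestions(normalized, sent.append)
```

Keeping validation separate from the network call means the first step can be unit tested without a server.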

github-actions · 2023-10-15T00:42:32Z

The URL of the deployed environment for this PR is https://argilla-quickstart-pr-3946-ki24f765kq-no.a.run.app

alvarobartt · 2023-10-16T13:04:55Z

src/argilla/client/feedback/dataset/remote/dataset.py

+    def update_records(self, records: Union[RemoteFeedbackRecord, List[RemoteFeedbackRecord]]) -> None:
+        if not isinstance(records, list):
+            records = [records]
+
+        # TODO: Use the batch version of endpoint once is implemented
+        for record in records:
+            record.update()
+


In which scenario would someone modify a RemoteFeedbackRecord and then push it other than via RemoteFeedbackRecord.update? Are we allowing assignment there, e.g. record.metadata = {"a": 1}? If so, won't this cause conflicts?

Once we have the batch version of the records update endpoint, updates should go through it, since it performs better than per-record updates. The record.update method is a way to support the current behaviour, but I think it should be deprecated and removed.
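
As a sketch of why a batch endpoint reduces the number of requests, the helper below groups records into chunks and makes one call per chunk. The `send_batch` callable and the batch size of 2 are hypothetical; the PR itself still loops over `record.update()` because the batch endpoint does not exist yet.

```python
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def chunked(items: Sequence[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive chunks of at most `size` items."""
    if size < 1:
        raise ValueError("size must be >= 1")
    for start in range(0, len(items), size):
        yield list(items[start : start + size])


def update_in_batches(
    records: Sequence[T],
    send_batch: Callable[[List[T]], None],  # hypothetical batch endpoint call
    batch_size: int = 2,  # arbitrary value chosen for illustration
) -> int:
    """Send records in chunks and return the number of requests made."""
    requests = 0
    for batch in chunked(records, batch_size):
        send_batch(batch)
        requests += 1
    return requests


sent: List[List[str]] = []
n_requests = update_in_batches(["r1", "r2", "r3", "r4", "r5"], sent.append)
```

Five records go out in three requests here, where the per-record loop would need five.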

alvarobartt · 2023-10-16T13:05:35Z

src/argilla/client/feedback/dataset/remote/dataset.py

+    @property
+    def _question_id_to_name(self) -> Dict["UUID", str]:
+        return self.dataset._question_id_to_name_id
+
+    @property
+    def _question_name_to_id(self) -> Dict[str, "UUID"]:
+        return self.dataset._question_name_to_id
+


Do we actually need to wrap those properties under the same name? IMO we can just re-use those from self.dataset

There are several places using this variable. The idea should be to remove them and start using the dataset ones, but for this, we need to refactor some remote schemas first. I would like to keep this PR with minimal changes

alvarobartt · 2023-10-16T13:06:51Z

src/argilla/client/feedback/schemas/records.py

@@ -88,6 +88,7 @@ def to_server_payload(self) -> Dict[str, Any]:
        """Method that will be used to create the payload that will be sent to Argilla
        to create a `ResponseSchema` for a `FeedbackRecord`."""
        return {
+            # UUID is not json serializable!!!


Suggested change

# UUID is not json serializable!!!

Yes, it's not. For the moment we're checking the user_id in the Python SDK in add_records, but we can review this later and just convert it to a str in the to_server_payload method.

alvarobartt · 2023-10-16T13:07:48Z

src/argilla/client/feedback/schemas/records.py

-    suggestions: Union[Tuple[SuggestionSchema], List[SuggestionSchema]] = Field(
-        default_factory=tuple, allow_mutation=False
-    )
+    suggestions: Union[Tuple[SuggestionSchema], List[SuggestionSchema]] = Field(default_factory=tuple)


I think I already mentioned this, but the tuple may be confusing; I'm more comfortable with the list. In any case, responses is still a list, so we should align that at some point.

I agree, but we need to do this taking into account that tuples have been used in current releases. My change here just removes the allow_mutation=False. Anything else should be tackled in separate PRs. Otherwise, a lot of unneeded changes could end up here.
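
The confusion around the tuple default can be shown with a small stand-in. `TupleDefaultRecord` is a hypothetical stdlib dataclass, not the pydantic schema from the diff; it only mirrors the `default_factory=tuple` choice.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union


@dataclass
class TupleDefaultRecord:
    """Hypothetical stand-in; the real schema is a pydantic model."""

    suggestions: Union[Tuple[str, ...], List[str]] = field(default_factory=tuple)


record = TupleDefaultRecord()

# Reassigning the field works.
record.suggestions = ["s1"]
reassigned = record.suggestions

# In-place mutation fails while the value is still the default tuple.
fresh = TupleDefaultRecord()
try:
    fresh.suggestions.append("s1")  # tuples have no append()
    append_failed = False
except AttributeError:
    append_failed = True
```

With a tuple default, a field that can be reassigned still cannot be appended to until a list has been assigned, which is the mismatch with responses raised in the comment above.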

alvarobartt · 2023-10-16T13:10:15Z

src/argilla/client/feedback/schemas/remote/records.py

        validate_assignment = True

-    def __update_suggestions(
+    def __normalize_suggestions_to_update(


I think we should carefully review the suggestions update/addition workflow, because I think we're adding too much complexity here that can probably be simplified

Yes, but in a separate PR. I didn't change the internal logic; I just split the method into two.

alvarobartt · 2023-10-16T13:15:22Z

src/argilla/client/workspaces.py

@@ -262,7 +262,7 @@ def __active_client() -> "httpx.Client":
            raise RuntimeError(f"The `rg.active_client()` is not available or not respoding.") from e

    @classmethod
-    def __new_instance(
+    def _new_instance(


Why this change? Is there any strong reason for it?

We need to create workspaces in unit tests, and there is no easy way since the __init__ cannot be used.
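
The practical difference between the two names comes from Python's name mangling of double-underscore attributes inside a class body. The sketch below uses a hypothetical `WorkspaceSketch` class, not the Argilla `Workspace`.

```python
class WorkspaceSketch:
    """Hypothetical stand-in for a class whose __init__ cannot be used directly."""

    def __init__(self) -> None:
        raise RuntimeError("use the factory classmethods instead")

    @classmethod
    def __mangled_factory(cls, name: str) -> "WorkspaceSketch":
        # Double leading underscore: stored on the class as
        # _WorkspaceSketch__mangled_factory.
        instance = cls.__new__(cls)  # bypass __init__
        instance.name = name
        return instance

    @classmethod
    def _plain_factory(cls, name: str) -> "WorkspaceSketch":
        # Single leading underscore: private by convention, name unchanged.
        instance = cls.__new__(cls)
        instance.name = name
        return instance


# From outside the class, the double-underscore name is not reachable as written.
has_mangled = hasattr(WorkspaceSketch, "__mangled_factory")
# It is only reachable through the mangled name.
via_mangled = WorkspaceSketch._WorkspaceSketch__mangled_factory("ws-a")
# The single-underscore method can be called directly, e.g. from a unit test.
via_plain = WorkspaceSketch._plain_factory("ws-b")
```

A single underscore keeps the method private by convention while letting tests call it without spelling out the mangled name.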

tests/integration/client/feedback/dataset/remote/test_dataset.py

tests/unit/client/feedback/dataset/remote/test_dataset.py

tests/unit/client/feedback/dataset/test_base.py

gabrielmbmb · 2023-10-16T13:22:25Z

src/argilla/client/feedback/schemas/records.py

+            # UUID is not json serializable!!!
            "user_id": self.user_id,


this should be str(self.user_id), right?

It should be, but a lot of tests must be changed to support this. I added the comment so we don't forget, and will tackle it in a separate PR.
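
The serialization issue can be reproduced with the standard library alone. This sketch does not touch the Argilla payload code; the `user_id` key is only borrowed from the diff for readability.

```python
import json
from uuid import UUID, uuid4

user_id: UUID = uuid4()

# json.dumps cannot encode a UUID object and raises TypeError.
try:
    json.dumps({"user_id": user_id})
    raised = False
except TypeError:
    raised = True

# Converting to str first produces a serializable payload that round-trips.
payload = json.dumps({"user_id": str(user_id)})
restored = UUID(json.loads(payload)["user_id"])
```

The string form is lossless, so the receiving side can rebuild the same UUID.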

frascuchon added 22 commits October 13, 2023 14:56

fix: Using utcnow datetime

1ce3feb

tests: Remove date creation

86a3ea1

Merge branch 'feature/support-for-metadata-filtering-and-sorting' int…

ce580db

…o feat/support-for-update-records-from-SDK

feat: Define update_records for base feedback dataset class

83bac6f

Also, this class defines a generic type `R` for records.

refactor: Implement update_records method for local datasets

b10ea26

The implementation will show a warning with an explicit message

feat: Implement update_records method based on record.update

bbd4236

Also, the question (id -> name) and question (name -> id) maps are computed from the original dataset

chore: Fix ArgillaRecordsMixin method signatures

bcdbe9c

chore: Add some TODO reminders

1a464f3

feat: call record update endpoint

1daff38

tests: Adapt Test base dataset including missing abstract methods to …

b28c45f

…test dataset class

tests: Add unit test for local.update_records workflow

c42b963

chore: Move set_suggestions function to records.py API module

2a94c0e

refactor: Control record suggestions updates from `update(suggestions…

ab77ecb

…=...) and `record.update()` The suggestions will be filtered before updating them if suggestions were provided in the `record.update` method. Otherwise, the record suggestions will be sent as new suggestions.

refactor: Define the workspace instance creation method private for b…

4b0ccbd

…etter integration with unit tests (a code review is needed so that classes are not modified just because of the tests)

fix: Indentation return

256f797

tests: Remove raise check for suggestions immutability

547b641

chore: Adapt imports

5d416d7

tests: fixture for mock httpx client

a85279a

tests: Unit tests for update records with and without suggestions

2787be3

tests: Integration tests for updating records

99e5d00

frascuchon changed the title ~~Feat/support for update records from sdk~~ feat: Support for update records from SDK Oct 14, 2023

[pre-commit.ci] auto fixes from pre-commit.com hooks

db1f466

for more information, see https://pre-commit.ci

frascuchon commented Oct 14, 2023

frascuchon added 3 commits October 15, 2023 02:00

chore: Fix method signature

f219d7b

Merge branch 'feat/support-for-update-records-from-SDK' of github.com…

04c3829

…:argilla-io/argilla into feat/support-for-update-records-from-SDK

chore: Update changelog

02b5c29

frascuchon marked this pull request as ready for review October 15, 2023 00:04

frascuchon requested a review from alvarobartt October 15, 2023 00:04

frascuchon requested review from gabrielmbmb and davidberenstein1957 October 15, 2023 00:04

frascuchon mentioned this pull request Oct 15, 2023

⏳ Filtering and sorting using custom metadata info #3748

Closed

alvarobartt linked an issue Oct 16, 2023 that may be closed by this pull request

[FEATURE] Update metadata for a FeedbackRecord in Argilla #3897

Closed

alvarobartt assigned frascuchon Oct 16, 2023

alvarobartt added type: enhancement Indicates new feature requests client labels Oct 16, 2023

alvarobartt added this to the v1.17.0 milestone Oct 16, 2023

frascuchon added 3 commits October 16, 2023 11:14

Merge branch 'feature/support-for-metadata-filtering-and-sorting' int…

7537bd7

…o feat/support-for-update-records-from-SDK

Merge branch 'feature/support-for-metadata-filtering-and-sorting' int…

15d36df

…o feat/support-for-update-records-from-SDK

ci: Show file system description

2e74cd0

alvarobartt approved these changes Oct 16, 2023

gabrielmbmb approved these changes Oct 16, 2023

Apply suggestions from code review

c24c65e

Co-authored-by: Alvaro Bartolome <[email protected]>

frascuchon merged commit ac3eeb9 into feature/support-for-metadata-filtering-and-sorting Oct 16, 2023
5 of 7 checks passed

frascuchon deleted the feat/support-for-update-records-from-SDK branch October 16, 2023 15:07

frascuchon modified the milestones: v1.17.0, v1.18.0 Oct 19, 2023
