Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Support for update records from SDK #3946

Conversation

frascuchon
Copy link
Member

@frascuchon frascuchon commented Oct 14, 2023

Description

This PR introduces support for updating records from Python SDK. The records update workflow has 2 ways to be implemented. In this PR both are supported:
Single update:

records = ds.records[:100]
for record in records:
   record.metadata.update({"new": "metadata"})
   record.update() 

or batch update:

records = ds.records[:100]
for record in records:
   record.metadata.update({"new": "metadata"})
   record.suggestions = [...]

ds.update_records(records)

This is still compatible with previous record update functionality (where only suggestions could be updated)

for record in records:
   record.update(suggestions=[...])

In order to support this, some changes have been introduced affecting records immutability, which is now removed. The reason behind this is to provide a similar way to update local and remote records (change your data at record level and then call the update_records method) since the record.update method is only available for synced/remote entities.

The metadata validation is still missing on the record update and will be tackled as a separate PR since extra changes could be potentially introduced.

Refs #3748

Type of change

(Please delete options that are not relevant. Remember to title the PR according to the type of change)

  • New feature (non-breaking change which adds functionality)
  • Refactor (change restructuring the codebase without changing functionality)
  • Improvement (change adding some improvement to an existing functionality)

How Has This Been Tested

(Please describe the tests that you ran to verify your changes. And ideally, reference tests)

All the flows described below have been tested locally.

Checklist

  • I added relevant documentation
  • I followed the style guidelines of this project
  • I did a self-review of my code
  • I made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • I filled out the contributor form (see text above)
  • I have added relevant notes to the CHANGELOG.md file (See https://keepachangelog.com/)

Also, this class defines an generic type `R` for records.
The implementation will show a warning with an explicit message
Also, the question (id -> name) and question (name -> id) maps are computed from the original dataset
Records can be updated by assigning content and then call the `record.update` method.

Suggestions are still supported, so users can update a record by passing the suggestions. But a more general way should be:

```python
record.metadata.update({"new": "metadata"})
record.suggestions = (Suggestion....)

record.update()
```
…` workflow

Record suggestions can be modified locally to prepare changes and then call the `ds.updated_records` with modified suggestions. The `record.update` still support suggestions

```python
records = ds.records[:10]

for record in records:
   record.suggestions = [SuggestionSchema(...)]
   record.metadata.update({"new": "metadata"})

# Apply all local changes to remote records
ds.update_records(records)

```
…=...) and `record.update()`

The suggestions will be filtered before update them if suggestions where provided in the `record.update` method. Otherwise, the record suggestions will be sent as new suggestions
…etter integration with unit tests

(A code review must be taken in order to not modify a class because the tests)
@frascuchon frascuchon changed the title Feat/support for update records from sdk feat: Support for update records from SDK Oct 14, 2023
validate_assignment = True

def __update_suggestions(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, the old __update_suggestions method has been split into 2 steps: 1. validate and normalize/filter suggestions and 2. prepare and call the update endpoints.

@frascuchon frascuchon marked this pull request as ready for review October 15, 2023 00:04
@github-actions
Copy link

The URL of the deployed environment for this PR is https://argilla-quickstart-pr-3946-ki24f765kq-no.a.run.app

Comment on lines +322 to +329
def update_records(self, records: Union[RemoteFeedbackRecord, List[RemoteFeedbackRecord]]) -> None:
if not isinstance(records, list):
records = [records]

# TODO: Use the batch version of endpoint once is implemented
for record in records:
record.update()

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which is the scenario where someone modifies a RemoteFeedbackRecord and then pushes it if not via RemoteFeedbackRecord.update? Are we allowing the assignment there? e.g. record.metadata = {"a": 1}, if so, won't this be conflictive?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once we have the batch version of records update, the updates should be done using the batch version, since it has a better performance than the per-record update. The record.update is a way to support the current behaviour but i think it should be deprecated and removed

Comment on lines +108 to +115
@property
def _question_id_to_name(self) -> Dict["UUID", str]:
return self.dataset._question_id_to_name_id

@property
def _question_name_to_id(self) -> Dict[str, "UUID"]:
return self.dataset._question_name_to_id

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we actually need to wrap those properties under the same name? IMO we can just re-use those from self.dataset

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are several places using this variable. The idea should be to remove them and start using the dataset ones, but for this, we need to refactor some remote schemas first. I would like to keep this PR with minimal changes

@@ -88,6 +88,7 @@ def to_server_payload(self) -> Dict[str, Any]:
"""Method that will be used to create the payload that will be sent to Argilla
to create a `ResponseSchema` for a `FeedbackRecord`."""
return {
# UUID is not json serializable!!!
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# UUID is not json serializable!!!

Yes, it's not, for the moment we're checking the user_id in the Python SDK in add_records, but we can review this later to just parse it as a str in the to_server_payload method

suggestions: Union[Tuple[SuggestionSchema], List[SuggestionSchema]] = Field(
default_factory=tuple, allow_mutation=False
)
suggestions: Union[Tuple[SuggestionSchema], List[SuggestionSchema]] = Field(default_factory=tuple)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I already mentioned this, but the tuple may be confusing, I'm more comfortable with the list, in any case, responses is still a list, so we should align that at some point

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, but we need to do this taking into account that tuples have been used in current releases. My change here just removes the allow_mutation=True. Other extra things should be tackled in separate PRs. Otherwise, a lot of changes could be included here without a need.

validate_assignment = True

def __update_suggestions(
def __normalize_suggestions_to_update(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should carefully review the suggestions update/addition workflow, because I think we're adding too much complexity here that can probably be simplified

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, but separate PR. I didn't change the internal logic, just separated the method in 2 different ones.

@@ -262,7 +262,7 @@ def __active_client() -> "httpx.Client":
raise RuntimeError(f"The `rg.active_client()` is not available or not respoding.") from e

@classmethod
def __new_instance(
def _new_instance(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this change? Is there any strong reason for it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need to create workspaces in unit tests, a there is no easy way since the __init__ cannot be used.

tests/unit/client/feedback/dataset/test_base.py Outdated Show resolved Hide resolved
Comment on lines +91 to 92
# UUID is not json serializable!!!
"user_id": self.user_id,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be str(self.user_id), right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be, but a lot of tests must be changed to support this. I put the comment to don't forget and tackle in a separate PR

Co-authored-by: Alvaro Bartolome <[email protected]>
@frascuchon frascuchon merged commit ac3eeb9 into feature/support-for-metadata-filtering-and-sorting Oct 16, 2023
5 of 7 checks passed
@frascuchon frascuchon deleted the feat/support-for-update-records-from-SDK branch October 16, 2023 15:07
@frascuchon frascuchon modified the milestones: v1.17.0, v1.18.0 Oct 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: enhancement Indicates new feature requests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEATURE] Update metadata for a FeedbackRecord in Argilla
3 participants