-
Notifications
You must be signed in to change notification settings - Fork 377
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Support for update records from SDK #3946
feat: Support for update records from SDK #3946
Conversation
…o feat/support-for-update-records-from-SDK
Also, this class defines an generic type `R` for records.
The implementation will show a warning with an explicit message
Also, the question (id -> name) and question (name -> id) maps are computed from the original dataset
Records can be updated by assigning content and then call the `record.update` method. Suggestions are still supported, so users can update a record by passing the suggestions. But a more general way should be: ```python record.metadata.update({"new": "metadata"}) record.suggestions = (Suggestion....) record.update() ```
…` workflow Record suggestions can be modified locally to prepare changes and then call the `ds.updated_records` with modified suggestions. The `record.update` still support suggestions ```python records = ds.records[:10] for record in records: record.suggestions = [SuggestionSchema(...)] record.metadata.update({"new": "metadata"}) # Apply all local changes to remote records ds.update_records(records) ```
…test dataset class
…=...) and `record.update()` The suggestions will be filtered before update them if suggestions where provided in the `record.update` method. Otherwise, the record suggestions will be sent as new suggestions
…etter integration with unit tests (A code review must be taken in order to not modify a class because the tests)
for more information, see https://pre-commit.ci
validate_assignment = True | ||
|
||
def __update_suggestions( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here, the old __update_suggestions
method has been split into 2 steps: 1. validate and normalize/filter suggestions and 2. prepare and call the update endpoints.
…:argilla-io/argilla into feat/support-for-update-records-from-SDK
The URL of the deployed environment for this PR is https://argilla-quickstart-pr-3946-ki24f765kq-no.a.run.app |
…o feat/support-for-update-records-from-SDK
…o feat/support-for-update-records-from-SDK
def update_records(self, records: Union[RemoteFeedbackRecord, List[RemoteFeedbackRecord]]) -> None: | ||
if not isinstance(records, list): | ||
records = [records] | ||
|
||
# TODO: Use the batch version of endpoint once is implemented | ||
for record in records: | ||
record.update() | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which is the scenario where someone modifies a RemoteFeedbackRecord
and then pushes it if not via RemoteFeedbackRecord.update
? Are we allowing the assignment there? e.g. record.metadata = {"a": 1}
, if so, won't this be conflictive?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once we have the batch version of records update, the updates should be done using the batch version, since it has a better performance than the per-record update. The record.update
is a way to support the current behaviour but i think it should be deprecated and removed
@property | ||
def _question_id_to_name(self) -> Dict["UUID", str]: | ||
return self.dataset._question_id_to_name_id | ||
|
||
@property | ||
def _question_name_to_id(self) -> Dict[str, "UUID"]: | ||
return self.dataset._question_name_to_id | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we actually need to wrap those properties under the same name? IMO we can just re-use those from self.dataset
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are several places using this variable. The idea should be to remove them and start using the dataset ones, but for this, we need to refactor some remote schemas first. I would like to keep this PR with minimal changes
@@ -88,6 +88,7 @@ def to_server_payload(self) -> Dict[str, Any]: | |||
"""Method that will be used to create the payload that will be sent to Argilla | |||
to create a `ResponseSchema` for a `FeedbackRecord`.""" | |||
return { | |||
# UUID is not json serializable!!! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# UUID is not json serializable!!! |
Yes, it's not, for the moment we're checking the user_id
in the Python SDK in add_records
, but we can review this later to just parse it as a str
in the to_server_payload
method
suggestions: Union[Tuple[SuggestionSchema], List[SuggestionSchema]] = Field( | ||
default_factory=tuple, allow_mutation=False | ||
) | ||
suggestions: Union[Tuple[SuggestionSchema], List[SuggestionSchema]] = Field(default_factory=tuple) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I already mentioned this, but the tuple
may be confusing, I'm more comfortable with the list
, in any case, responses
is still a list, so we should align that at some point
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, but we need to do this taking into account that tuples have been used in current releases. My change here just removes the allow_mutation=True
. Other extra things should be tackled in separate PRs. Otherwise, a lot of changes could be included here without a need.
validate_assignment = True | ||
|
||
def __update_suggestions( | ||
def __normalize_suggestions_to_update( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should carefully review the suggestions
update/addition workflow, because I think we're adding too much complexity here that can probably be simplified
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, but separate PR. I didn't change the internal logic, just separated the method in 2 different ones.
@@ -262,7 +262,7 @@ def __active_client() -> "httpx.Client": | |||
raise RuntimeError(f"The `rg.active_client()` is not available or not respoding.") from e | |||
|
|||
@classmethod | |||
def __new_instance( | |||
def _new_instance( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this change? Is there any strong reason for it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need to create workspaces in unit tests, a there is no easy way since the __init__
cannot be used.
# UUID is not json serializable!!! | ||
"user_id": self.user_id, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should be str(self.user_id)
, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be, but a lot of tests must be changed to support this. I put the comment to don't forget and tackle in a separate PR
Co-authored-by: Alvaro Bartolome <[email protected]>
ac3eeb9
into
feature/support-for-metadata-filtering-and-sorting
Description
This PR introduces support for updating records from Python SDK. The records update workflow has 2 ways to be implemented. In this PR both are supported:
Single update:
or batch update:
This is still compatible with previous record update functionality (where only suggestions could be updated)
In order to support this, some changes have been introduced affecting records immutability, which is now removed. The reason behind this is to provide a similar way to update local and remote records (change your data at record level and then call the
update_records
method) since therecord.update
method is only available for synced/remote entities.The metadata validation is still missing on the record update and will be tackled as a separate PR since extra changes could be potentially introduced.
Refs #3748
Type of change
(Please delete options that are not relevant. Remember to title the PR according to the type of change)
How Has This Been Tested
(Please describe the tests that you ran to verify your changes. And ideally, reference
tests
)All the flows described below have been tested locally.
Checklist
CHANGELOG.md
file (See https://keepachangelog.com/)