[FEATURE] Review record flatten process #4936
Labels: area: python sdk
I think we should double-check the generated structure of |
nataliaElv added the area: python sdk label Jun 11, 2024
Current nested structure
next(iter(dataset.records(with_responses=True))).to_dict()
{'id': UUID('25b8ca21-d0fb-4135-815d-0887393007b8'),
'fields': {'post': 'Another clown in favour of more tax in this country. Blows my mind people can be this stupid.'},
'metadata': {},
'suggestions': {'is_toxic': {'value': '1', 'score': None, 'agent': None},
'toxic_spans': {'value': [{'label': 'insult',
'start': 86,
'end': 92,
'score': 0.6666666666666666},
{'label': 'insult', 'start': 8, 'end': 13, 'score': 0.6666666666666666}],
'score': None,
'agent': None}},
'responses': defaultdict(list, {}),
'vectors': {},
'_server_id': '4836f966-0a7f-4d6e-b017-398270265c95'}
Current Flattened structure
from pprint import pprint
pprint(dataset.records.to_list(flatten=False))
[{'_server_id': '0f3fd360-7776-464e-82a2-9654da527212',
'fields': {'question': 'What is the capital of France?'},
'id': '1',
'metadata': {},
'responses': defaultdict(<class 'list'>, {}),
'suggestions': {'answer': {'agent': None, 'score': None, 'value': 'F'}},
'vectors': {}},
{'_server_id': '5aa81ed0-3cd2-4208-bae0-b8fb0ba8fd1d',
'fields': {'question': 'What is the capital of Germany?'},
'id': '2',
'metadata': {},
'responses': defaultdict(<class 'list'>, {}),
'suggestions': {'answer': {'agent': None, 'score': None, 'value': 'Berlin'}},
'vectors': {}}]
@frascuchon @burtenshaw This is the issue I mentioned regarding the |
Also, @MoritzLaurer found errors when creating HF datasets from partially annotated records. We need to review this to:
This was referenced Jul 1, 2024
frascuchon added a commit that referenced this issue Jul 3, 2024
…st (#5137)

This PR changes the structure generated by `to_list(flatten=True)` to simplify reading responses. The response content is split into values and users, so no user ID is part of the column name.

The result for the following record:

```python
import uuid

import argilla as rg

# Placeholder user IDs for illustration; real IDs come from the Argilla server.
user_a, user_b, user_c = uuid.uuid4(), uuid.uuid4(), uuid.uuid4()

record = rg.Record(
    fields={"field": "The field"},
    metadata={"key": "value"},
    responses=[
        rg.Response(question_name="q1", value="value", user_id=user_a),
        rg.Response(question_name="q2", value="value", user_id=user_a),
        rg.Response(question_name="q2", value="value", user_id=user_b),
        rg.Response(question_name="q1", value="value", user_id=user_c),
    ],
    suggestions=[
        rg.Suggestion(question_name="q1", value="value", score=0.1, agent="test"),
        rg.Suggestion(question_name="q2", value="value", score=0.9),
    ],
)
```

is:

```python
{
    "id": <record_id>,
    "_server_id": None,
    "field": "The field",
    "key": "value",
    "q1.responses": ["value", "value"],
    "q1.responses.users": [str(user_a), str(user_c)],
    "q2.responses": ["value", "value"],
    "q2.responses.users": [str(user_a), str(user_b)],
    "q1.suggestion": "value",
    "q1.suggestion.score": 0.1,
    "q1.suggestion.agent": "test",
    "q2.suggestion": "value",
    "q2.suggestion.score": 0.9,
    "q2.suggestion.agent": None,
}
```

Refs #4936

**Type of change**

- Improvement (change adding some improvement to an existing functionality)

**How Has This Been Tested**

**Checklist**

- I added relevant documentation
- follows the style guidelines of this project
- I did a self-review of my code
- I made corresponding changes to the documentation
- I confirm my changes generate no new warnings
- I have added tests that prove my fix is effective or that my feature works
- I have added relevant notes to the CHANGELOG.md file (See https://keepachangelog.com/)

---------

Co-authored-by: burtenshaw <[email protected]>
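To make the mapping from the nested structure to the flattened column names concrete, here is a rough standalone sketch. It is not Argilla's implementation: `flatten_record` is a hypothetical helper, and the shape assumed for each response (a dict with `value` and `user_id` keys) is an assumption, since the nested examples in this issue only show empty responses.

```python
from typing import Any


def flatten_record(record: dict[str, Any]) -> dict[str, Any]:
    """Flatten a nested record dict into dotted column names (illustrative only)."""
    flat: dict[str, Any] = {
        "id": record.get("id"),
        "_server_id": record.get("_server_id"),
    }
    # Fields and metadata are lifted to top-level columns under their own names.
    flat.update(record.get("fields", {}))
    flat.update(record.get("metadata", {}))

    # Responses: one list of values plus a parallel list of user IDs per question,
    # so no user ID ends up inside a column name.
    # Assumption: each response is a dict with "value" and "user_id" keys.
    for question, responses in record.get("responses", {}).items():
        flat[f"{question}.responses"] = [r["value"] for r in responses]
        flat[f"{question}.responses.users"] = [str(r["user_id"]) for r in responses]

    # Suggestions: value, score and agent each get their own column.
    for question, suggestion in record.get("suggestions", {}).items():
        flat[f"{question}.suggestion"] = suggestion["value"]
        flat[f"{question}.suggestion.score"] = suggestion.get("score")
        flat[f"{question}.suggestion.agent"] = suggestion.get("agent")
    return flat


# Hypothetical nested input mirroring the record in the commit message.
nested = {
    "id": "1",
    "_server_id": None,
    "fields": {"field": "The field"},
    "metadata": {"key": "value"},
    "responses": {
        "q1": [
            {"value": "value", "user_id": "user_a"},
            {"value": "value", "user_id": "user_c"},
        ],
        "q2": [
            {"value": "value", "user_id": "user_a"},
            {"value": "value", "user_id": "user_b"},
        ],
    },
    "suggestions": {
        "q1": {"value": "value", "score": 0.1, "agent": "test"},
        "q2": {"value": "value", "score": 0.9, "agent": None},
    },
}

flat = flatten_record(nested)
```

Keeping values and users in two parallel lists means every record produces the same set of column names regardless of who answered, which is the property the commit message describes.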