feat: add target_context to dataset columns #266

oyangz · 2024-04-26T21:43:07Z

Issue #, if available:

Description of changes:

Update: Initially we planned for the target_context dataset column to take a list of strings. This does not work with Ray operations such as map_batches because of issues including ray-project/ray#39559 and other unsupported-data-type errors. The target_context dataset column has therefore been changed to take a string, and we will use string concatenation when there are multiple target contexts, similar to the existing target_output field.
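A minimal sketch of the concatenation approach described in the update, for illustration only. The delimiter value and both helper names are assumptions, not taken from this PR; the point is just that several contexts can share one string-valued column and be recovered later.

```python
from typing import List

# Assumed delimiter, chosen for illustration; the project may use a different one.
CONTEXT_DELIMITER = "<OR>"


def join_target_contexts(contexts: List[str]) -> str:
    """Collapse several ground-truth contexts into one string-valued cell."""
    for context in contexts:
        # A context that already contains the delimiter could not be split back correctly.
        if CONTEXT_DELIMITER in context:
            raise ValueError(f"context contains the delimiter {CONTEXT_DELIMITER!r}")
    return CONTEXT_DELIMITER.join(contexts)


def split_target_contexts(cell: str) -> List[str]:
    """Recover the individual contexts from a concatenated cell."""
    return cell.split(CONTEXT_DELIMITER)
```

Keeping the column as a plain string means each row holds a scalar value, which avoids the list-typed column that caused the errors mentioned above.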

Description (updated):

Adds support for loading target_context for evaluation of RAG "ground truth" context provided in a dataset. The target_context for each dataset sample is a ~~list of~~ string.
~~Modifies json_parser to accept lists of strings by updating JMESPath output validation and string casting~~.
Adds unit tests for JSON and JSONLINES cases.
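As a rough illustration of the JSONLINES case, the sketch below reads one string-valued target_context per line and rejects any other type. The function name, the default key, and the error message are hypothetical; this is not the json_parser implementation from the PR, which locates fields with JMESPath.

```python
import json
from typing import List


def load_target_contexts(jsonl_text: str, key: str = "target_context") -> List[str]:
    """Read one string-valued context per JSON Lines record (hypothetical helper)."""
    contexts: List[str] = []
    for line_number, line in enumerate(jsonl_text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        value = record[key]
        # The column is expected to hold a string, not a list of strings.
        if not isinstance(value, str):
            raise TypeError(
                f"line {line_number}: expected a string for {key!r}, "
                f"got {type(value).__name__}"
            )
        contexts.append(value)
    return contexts
```

A unit test for this case might feed in a two-line JSONLINES document and check that both contexts come back as strings, plus one record with a list value to confirm it is rejected.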

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

danielezhu

Non-blocking thought: is target context the only column type that we anticipate being a special case (i.e., requiring different logic for JSON parsing)? If we end up having other columns that behave similarly to target context, we will want to group these column types together.

franluca · 2024-04-29T08:01:11Z

> Non-blocking thought: is target context the only column type that we anticipate being a special case (i.e., requiring different logic for JSON parsing)? If we end up having other columns that behave similarly to target context, we will want to group these column types together.

Just to double check, is the different logic there because we expect the context to be a list? If so:

- What happens if it is actually not a list?
- This is pretty similar to the situation with multiple ground truths, for which we had the workaround of concatenating them using the "". Is this no longer necessary? If so, I find it a bit undesirable to have different treatment for the same case.

xiaoyi-cheng · 2024-04-29T09:40:10Z

> This is pretty similar to the situation with multiple ground truths, for which we had the workaround of concatenating them using the "". Is this no longer necessary? If so, I find it a bit undesirable to have different treatment for the same case.

Interesting. Let's hold off on this PR and discuss offline. Maybe we should keep target_context as a string and concatenate the strings together.

feat: add target_context to dataloaders

48a7bf1

oyangz changed the title ~~feat: add target_context to data_loaders~~ feat: add target_context to dataset columns Apr 26, 2024

danielezhu previously approved these changes Apr 26, 2024

oyangz requested a review from xiaoyi-cheng April 26, 2024 22:58

xiaoyi-cheng requested review from franluca and polaschwoebel April 29, 2024 06:35

lucfra approved these changes Apr 29, 2024

franluca previously approved these changes Apr 29, 2024

xiaoyi-cheng changed the title ~~feat: add target_context to dataset columns~~ [DO NOT MERGE] feat: add target_context to dataset columns May 1, 2024

add target_context as strings instead of list of strings

079bc9b

oyangz dismissed stale reviews from franluca and danielezhu via 079bc9b May 20, 2024 18:09

oyangz changed the title ~~[DO NOT MERGE] feat: add target_context to dataset columns~~ feat: add target_context to dataset columns May 20, 2024

Merge branch 'main' into target_context

2d7e056

danielezhu approved these changes May 20, 2024

xiaoyi-cheng approved these changes May 21, 2024

oyangz merged commit 6190b49 into aws:main May 21, 2024
3 checks passed

oyangz deleted the target_context branch May 21, 2024 16:02

xiaoyi-cheng mentioned this pull request Jul 11, 2024

feat: update context to take lists and rename context field #305

Merged
