[BUG] TextClassification dataset displays the labels in wrong order #3828

Closed
gabrielmbmb opened this issue Sep 26, 2023 · 5 comments · Fixed by #4332
@gabrielmbmb
Member

Describe the bug
When setting the label_schema for a TextClassification dataset the order in which the labels were provided is lost.

Stacktrace and Code to create the bug

import argilla as rg

rg.set_workspace("admin")

settings = rg.TextClassificationSettings(label_schema=[
    "1 (extremely positive/supportive)",
    "2 (positive/supportive)",
    "3 (neutral)",
    "4 (hateful/unsupportive)",
    "5 (extremely hateful/unsupportive)",
    "6 (can't say!)"
])

rg.log(rg.TextClassificationRecord(text="blablabla"), name="test-order-labels")

rg.configure_dataset_settings(name="test-order-labels", settings=settings)

Expected behavior
The labels are displayed in the order in which they were provided.

Environment:

  • Argilla Version [e.g. 1.0.0]: 1.16.0
  • ElasticSearch Version [e.g. 7.10.2]: 8.8.2
  • Docker Image (optional) [e.g. argilla:v1.0.0]:
@gabrielmbmb gabrielmbmb added type: bug Indicates an unexpected problem or unintended behavior area: api Indicates that an issue or pull request is related to the Fast API server or REST endpoints labels Sep 26, 2023
@gabrielmbmb gabrielmbmb added this to the v1.17.0 milestone Sep 26, 2023
@gabrielmbmb gabrielmbmb self-assigned this Sep 26, 2023
@frascuchon frascuchon modified the milestones: v1.17.0, v1.18.0 Oct 19, 2023
@davidberenstein1957
Member

@frascuchon do we want to fix this or just allow for this in the FeedbackDataset?

@dvsrepo
Member

dvsrepo commented Oct 30, 2023

I think @gabrielmbmb fixed this for a community user. Please confirm.

@gabrielmbmb gabrielmbmb modified the milestones: v1.18.0, v1.19.0 Nov 2, 2023
@jfcalvo
Member

jfcalvo commented Nov 8, 2023

@gabrielmbmb please can you confirm that this is already solved? If that's the case please close the issue.

@gabrielmbmb gabrielmbmb modified the milestones: v1.19.0, v1.20.0 Nov 8, 2023
@gabrielmbmb
Member Author

@dvsrepo @jfcalvo this is still pending to be fixed, will work on it soon.

@davidberenstein1957
Member

@gabrielmbmb, this might be an issue in the TokenClassification dataset in terms of visualizing the label order in the UI.

gabrielmbmb added a commit that referenced this issue Nov 28, 2023
# Description

This PR fixes a bug where the order of the labels for a Text
Classification dataset provided in the class
`TextClassificationSettings` was not preserved. This was happening
because the `label_schema` attribute had the `Set[str]` type to ensure
there are no duplicate labels, but `set` doesn't preserve the order.

Instead of using `set` to ensure there are no duplicates, `label_schema`
now has the `List[str]` type and a basic for loop has been added to
ensure there are no duplicates.
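
As a rough illustration of the approach described above (not the actual
code from this PR), an order-preserving deduplication loop might look
like the sketch below. The helper name `unique_in_order` is
hypothetical, and the sketch assumes duplicates are dropped silently
rather than rejected with an error.

```python
from typing import Iterable, List


def unique_in_order(labels: Iterable[str]) -> List[str]:
    """Hypothetical helper: drop duplicate labels, keeping first-seen order."""
    seen = set()  # used only for fast membership checks, never for ordering
    result: List[str] = []
    for label in labels:
        if label not in seen:
            seen.add(label)
            result.append(label)
    return result


labels = [
    "1 (extremely positive/supportive)",
    "3 (neutral)",
    "6 (can't say!)",
    "6 (can't say!)",
]
deduplicated = unique_in_order(labels)
```

On Python 3.7+, `list(dict.fromkeys(labels))` gives the same result,
since `dict` keeps insertion order.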

Closes #3828

**Type of change**

- [x] Bug fix (non-breaking change which fixes an issue)

**How Has This Been Tested**

```python
import argilla as rg

rg.set_workspace("argilla")

settings = rg.TextClassificationSettings(label_schema=[
    "1 (extremely positive/supportive)",
    "2 (positive/supportive)",
    "3 (neutral)",
    "4 (hateful/unsupportive)",
    "5 (extremely hateful/unsupportive)",
    "6 (can't say!)",
    "6 (can't say!)",
    "6 (can't say!)",
])

rg.log(rg.TextClassificationRecord(text="blablabla"), name="test-order-labels")

rg.configure_dataset_settings(name="test-order-labels", settings=settings)
```

After that, go to the UI and check that the labels appear in the
provided order.

**Checklist**

- [x] I followed the style guidelines of this project
- [x] I did a self-review of my code
- [x] My changes generate no new warnings
- [x] I have added tests that prove my fix is effective or that my
feature works
- [ ] I filled out [the contributor form](https://tally.so/r/n9XrxK)
(see text above)
- [x] I have added relevant notes to the `CHANGELOG.md` file (See
https://keepachangelog.com/)

---------

Co-authored-by: Francisco Aranda <[email protected]>
davidberenstein1957 pushed a commit that referenced this issue Nov 29, 2023