[BUG] Labels are swapped after exporting splits (test, train, val) in YOLOv5 format #1952

Closed
2 of 6 tasks
abaybektursun opened this issue Jul 18, 2022 · 3 comments · Fixed by #1953
Labels
bug Bug fixes core Issues related to Core features

Comments

@abaybektursun

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 18.04
  • FiftyOne installed from (pip or source): pip
  • FiftyOne version (run fiftyone --version): FiftyOne v0.16.5, Voxel51, Inc.
  • Python version: Python 3.8.10

Commands to reproduce

  1. Import libraries
import fiftyone.utils.splits as fous
import fiftyone as fo
from pathlib import Path
  2. Load a COCO formatted dataset (not split)
coco_path = Path("datasets/coco_custom_v1.1")
coco_dataset = fo.Dataset.from_dir(
    dataset_type=fo.types.COCODetectionDataset,
    data_path=coco_path / "data",
    labels_path=coco_path / "labels.json",
)
  3. Start the FiftyOne server and make sure the dataset labels look good
session = fo.launch_app(coco_dataset, remote=True, port=6262)
  4. Create split tags
fous.random_split(coco_dataset, {"train": 0.65, "test": 0.2, "val": 0.15})
  5. BUG HERE -> Export all the splits to YOLOv5 format
for split in ["train", "test", "val"]:
    split_view = coco_dataset.match_tags(split)
    split_view.export(
        export_dir="datasets/test-split-yolo",
        dataset_type=fo.types.YOLOv5Dataset,
        split=split,
    )
  6. Import the dataset we just exported to look at the labels
coco_split_imported_dataset = fo.Dataset("coco_split_imported_dataset")
for split in ["train", "test", "val"]:
    coco_split_imported_dataset.add_dir(
        dataset_dir="datasets/test-split-yolo",
        dataset_type=fo.types.YOLOv5Dataset,
        split=split,
        tags=split,
    )
  7. Observe that for the samples tagged 'train' the class labels are swapped

Describe the problem

I have an object detection task with 2 classes. After training a YOLO model I noticed that my confusion matrix looked weird: the model performed poorly on the matching classes but well on the opposite classes, meaning the labels in my validation set were swapped. After hours of debugging I pinned it down to a bug in FiftyOne. When you split a dataset and then export it in YOLOv5 format, it swaps the labels for the train set.

Code to reproduce issue

Described in the Commands to reproduce section in detail.

Other info / logs

My dataset has 2 classes.

What areas of FiftyOne does this bug affect?

  • App: FiftyOne application issue
  • Core: Core fiftyone Python library issue
  • Server: Fiftyone server issue

Willingness to contribute

The FiftyOne Community encourages bug fix contributions. Would you or another
member of your organization be willing to contribute a fix for this bug to the
FiftyOne codebase?

  • Yes. I can contribute a fix for this bug independently.
  • Yes. I would be willing to contribute a fix for this bug with guidance
    from the FiftyOne community.
  • No. I cannot contribute a bug fix at this time.
@abaybektursun abaybektursun added the bug Bug fixes label Jul 18, 2022
@brimoor
Contributor

brimoor commented Jul 18, 2022

Hi @abaybektursun, the problem is that when exporting in YOLOv5 format you must ensure that all splits use exactly the same classes list, because YOLOv5 stores integer targets in the .txt files and those integers are indices into that list.
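To make the integer-target point concrete, here is a minimal sketch in plain Python (no FiftyOne calls). The class names and the helper decode_yolo_line are made up for illustration and are not taken from the reported dataset. It assumes the usual YOLO label row layout of a class index followed by a normalized box:

```python
def decode_yolo_line(line, classes):
    """Resolve one YOLO label row against a classes list.

    Hypothetical helper for illustration. A row is assumed to be
    "<class_index> <x_center> <y_center> <width> <height>".
    """
    index, *box = line.split()
    return classes[int(index)], [float(value) for value in box]


# The same row, as it would be written to a .txt label file
row = "0 0.5 0.5 0.2 0.3"

# Two splits exported with the same classes in a different order
train_classes = ["dog", "cat"]
val_classes = ["cat", "dog"]

train_label, train_box = decode_yolo_line(row, train_classes)
val_label, val_box = decode_yolo_line(row, val_classes)

# The box is identical, but the class index resolves to a different label
print(train_label, val_label)
```

The row itself carries no class name, so whichever classes list is used when reading it decides what index 0 means.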

This is achieved in the example YOLOv5 export snippet from the FiftyOne docs by providing the same classes list via the classes argument when exporting:

# The splits to export
splits = ["train", "val"]

# All splits must use the same classes list
classes = ["list", "of", "classes"]

# The dataset or view to export
# We assume the dataset uses sample tags to encode the splits to export
dataset_or_view = fo.Dataset(...)

# Export the splits
for split in splits:
    split_view = dataset_or_view.match_tags(split)
    split_view.export(
        export_dir=export_dir,
        dataset_type=fo.types.YOLOv5Dataset,
        label_field=label_field,
        split=split,
        classes=classes,
    )

You can use distinct() to conveniently retrieve a sorted list of all observed labels in a given field, e.g.:

classes = dataset.distinct("ground_truth.detections.label")
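Putting the two snippets together for the reproduction above, a sketch might look like the following. The function name export_splits is hypothetical, and the usage comment assumes the detections were loaded into a field named "ground_truth", which may differ for your dataset. Only distinct(), match_tags() and export() with the arguments shown earlier in this thread are used:

```python
def export_splits(dataset, export_dir, splits, label_field, dataset_type):
    """Export every split with one shared classes list.

    Hypothetical helper: `dataset` is expected to behave like a FiftyOne
    dataset or view (distinct(), match_tags(), export()).
    """
    # Compute the classes once, over the whole dataset, not per split
    classes = dataset.distinct(f"{label_field}.detections.label")

    for split in splits:
        split_view = dataset.match_tags(split)
        split_view.export(
            export_dir=export_dir,
            dataset_type=dataset_type,
            label_field=label_field,
            split=split,
            classes=classes,  # identical list for every split
        )

    return classes


# Usage with the dataset from the report (field name is an assumption):
#
#   import fiftyone as fo
#   export_splits(
#       coco_dataset,
#       "datasets/test-split-yolo",
#       ["train", "test", "val"],
#       "ground_truth",
#       fo.types.YOLOv5Dataset,
#   )
```

Computing the list from the full dataset rather than from each split view is the point of the design: a class that happens to be missing from one split still keeps its index.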

@brimoor brimoor changed the title from "[BUG] Labels are swapped after exporting splits (test, train, val)" to "[BUG] Labels are swapped after exporting splits (test, train, val) in YOLOv5 format" Jul 18, 2022
@brimoor brimoor added the core Issues related to Core features label Jul 18, 2022
@brimoor
Contributor

brimoor commented Jul 18, 2022

I do think we should update the YOLOv5 exporter to raise an error when different class lists are encountered when exporting multiple splits into the same directory. That should never be allowed because it will lead to the label mismatch observed here.

If you don't provide the classes argument, the classes list is computed dynamically for each split from the observed values. The resulting lists may not match if some splits don't contain all values or if the implementation doesn't sort in a deterministic order. In any case, manually providing the classes list is the correct approach for formats such as YOLOv5.
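As a rough illustration of the kind of check described above, here is a plain-Python sketch. It is not the FiftyOne exporter's code: both function names are hypothetical, and deriving the classes in order of first appearance is an assumption chosen only to show how per-split lists can diverge:

```python
def observed_classes(labels):
    """Classes in order of first appearance (an assumption for illustration)."""
    return list(dict.fromkeys(labels))


def check_consistent_classes(classes_by_split):
    """Raise if any split's classes list differs from the first split's."""
    items = iter(classes_by_split.items())
    reference_split, reference = next(items)
    for split, classes in items:
        if classes != reference:
            raise ValueError(
                f"Split '{split}' uses classes {classes}, but split "
                f"'{reference_split}' uses {reference}"
            )
    return reference


# Made-up labels: both splits contain the same two classes,
# but they are first seen in a different order
labels_by_split = {
    "train": ["cat", "dog", "cat"],
    "val": ["dog", "cat"],
}

classes_by_split = {
    split: observed_classes(labels)
    for split, labels in labels_by_split.items()
}
print(classes_by_split)
```

Calling check_consistent_classes(classes_by_split) on these lists raises a ValueError, which is the behavior suggested for exports of multiple splits into the same directory.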

@Ikiselev7

@abaybektursun Providing classes to export worked for me as a workaround in this case.
