Adding support for label field dicts when importing labeled datasets #1864

brimoor · 2022-06-10T04:20:16Z

Adds support for passing label_field as a dict mapping label keys to field names when importing datasets using Dataset.from_dir() and all related factory methods.

As the example below illustrates, this provides more fine-grained support for directly specifying the field names to use when importing multitask dataset formats such as BDD:

import random
import os

import fiftyone as fo
import fiftyone.zoo as foz

#
# Export a dataset with detections and classifications
#

dataset = foz.load_zoo_dataset("quickstart")
data_path = os.path.dirname(dataset.first().filepath)

for sample in dataset.select_fields().iter_samples(progress=True):
    sample["weather"] = fo.Classification(label=random.choice(["sunny", "cloudy"]))
    sample.save()

dataset.export(
    labels_path="/tmp/bdd.json",
    dataset_type=fo.types.BDDDataset,
    label_field=["ground_truth", "weather"],
)
print(dataset)

# Now import with default label field names
dataset2 = fo.Dataset.from_dir(
    data_path=data_path,
    labels_path="/tmp/bdd.json",
    dataset_type=fo.types.BDDDataset,
)
print(dataset2)

# New syntax: import with one field name customized
dataset3 = fo.Dataset.from_dir(
    data_path=data_path,
    labels_path="/tmp/bdd.json",
    dataset_type=fo.types.BDDDataset,
    label_field={"detections": "ground_truth"},
)
print(dataset3)
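
To make the naming behavior above concrete, here is a minimal pure-Python sketch of how a `label_field` value might be turned into sample field names. It is not the code from this PR: `resolve_field_name` is a hypothetical helper, the prefixing rule for string values is an assumption, and the `"attributes"` key is illustrative only.

```python
def resolve_field_name(label_key, label_field="ground_truth"):
    """Hypothetical helper: map an importer's label key to a sample field name."""
    if isinstance(label_field, dict):
        # Keys present in the dict are renamed; any other key keeps its own
        # name (assumed fallback, not confirmed by the PR text)
        return label_field.get(label_key, label_key)

    # Assumed behavior for a plain string: use it as a prefix for each key
    return label_field + "_" + label_key


# Illustrative label keys, as a multitask importer might emit them
for key in ["detections", "attributes"]:
    default_name = resolve_field_name(key)
    custom_name = resolve_field_name(key, {"detections": "ground_truth"})
    print(key, "->", default_name, "|", custom_name)
```

Under these assumptions, only the keys named in the dict are remapped, which is what lets a single field be customized without touching the others.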

ehofesmann

LGTM

Though this could get confusing, since the label fields an importer produces change depending on what is being imported. For example, in this issue ground_truth turns into detections when coco_id is provided while importing a COCO dataset (I see you are working on updating this for COCO; we might as well update it for all formats). This might be a good time to consider moving away from defaulting to "ground_truth" and always requiring an importer to return a dict of labels.

This could let us get rid of label_cls when defining importers, which is a bit confusing for new contributors. Without it, the importer could just return a dict of whatever labels it wants for each sample, which is much easier to implement. Something like OpenLABEL, where all fields are parsed directly from the annotations, already has label_cls returning None. A couple of changes would be needed, primarily getting rid of this optimization; I have no idea how much it actually helps.

brimoor · 2022-06-17T14:25:28Z

Just documenting that @ehofesmann makes some good points above, which were discussed IRL

brimoor added 3 commits June 9, 2022 23:58

allowing label_field to be a dict

086bad2

fixing broken notes

8cfd062

allowing samples not in DB

eaf653e

brimoor added the enhancement label Jun 10, 2022

brimoor requested review from ehofesmann and a team June 10, 2022 04:20

brimoor self-assigned this Jun 10, 2022

ehofesmann approved these changes Jun 10, 2022

brimoor merged commit 5cd9186 into develop Jun 17, 2022

brimoor deleted the feature/label-dict-imports branch June 17, 2022 14:26
