Adding support for label field dicts when importing labeled datasets #1864

Merged
merged 3 commits into from
Jun 17, 2022


brimoor
Contributor

@brimoor brimoor commented Jun 10, 2022

Adds support for passing label_field as a dict mapping label keys to field names when importing datasets using Dataset.from_dir() and all related factory methods.

As the example below illustrates, this provides more fine-grained support for directly specifying the field names to use when importing multitask dataset formats such as BDD:

import random
import os

import fiftyone as fo
import fiftyone.zoo as foz

#
# Export a dataset with detections and classifications
#

dataset = foz.load_zoo_dataset("quickstart")
data_path = os.path.dirname(dataset.first().filepath)

for sample in dataset.select_fields().iter_samples(progress=True):
    sample["weather"] = fo.Classification(label=random.choice(["sunny", "cloudy"]))
    sample.save()

dataset.export(
    labels_path="/tmp/bdd.json",
    dataset_type=fo.types.BDDDataset,
    label_field=["ground_truth", "weather"],
)
print(dataset)

# Now import with default label field names
dataset2 = fo.Dataset.from_dir(
    data_path=data_path,
    labels_path="/tmp/bdd.json",
    dataset_type=fo.types.BDDDataset,
)
print(dataset2)

# New syntax: import with one field name customized
dataset3 = fo.Dataset.from_dir(
    data_path=data_path,
    labels_path="/tmp/bdd.json",
    dataset_type=fo.types.BDDDataset,
    label_field={"detections": "ground_truth"},
)
print(dataset3)
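The mapping semantics can be sketched in plain Python without FiftyOne installed. The helper `resolve_field_name` below is hypothetical and not part of the library. It assumes that a label key missing from the dict keeps its own name, and that a string `label_field` acts as a prefix for each label key; both are assumptions for illustration and should be checked against the merged code. The label keys `"detections"` and `"weather"` are illustrative.

```python
def resolve_field_name(label_field, label_key):
    """Hypothetical helper: pick the sample field name for one label key.

    Assumed rules (not confirmed against FiftyOne's implementation):
    - dict: use the mapped name, else fall back to the label key itself
    - str: use the string as a prefix, joined with an underscore
    - None: use the label key unchanged
    """
    if isinstance(label_field, dict):
        return label_field.get(label_key, label_key)
    if label_field is not None:
        return f"{label_field}_{label_key}"
    return label_key


# Illustrative label keys for a multitask format
label_keys = ["detections", "weather"]

# Only one key is customized; the other keeps its own name
mapping = {"detections": "ground_truth"}
fields = {key: resolve_field_name(mapping, key) for key in label_keys}
print(fields)
```

Under these assumptions, a partial dict such as `{"detections": "ground_truth"}` renames only the listed key and leaves the rest alone, which matches the "one field name customized" case above.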

@brimoor brimoor added the enhancement Code enhancement label Jun 10, 2022
@brimoor brimoor requested review from ehofesmann and a team June 10, 2022 04:20
@brimoor brimoor self-assigned this Jun 10, 2022
Member

@ehofesmann ehofesmann left a comment


LGTM

Though this could get confusing, because the label fields that an importer produces change depending on what is being imported. For example, in this issue, ground_truth turns into detections when coco_id is provided while importing a COCO dataset (I see you are working on updating this for COCO; we might as well update it for all formats). This might be a good time to consider moving away from defaulting to "ground_truth" and always requiring an importer to return a dict of labels.

This could let us get rid of label_cls when defining importers, which is a bit confusing for new contributors. Without it, the importer could just return a dict of whatever labels it wants for each sample, which is much easier to implement. Something like OpenLABEL, where all fields are parsed directly from the annotations, already has label_cls returning None. A couple of changes would be needed, primarily getting rid of this optimization; I have no idea how much it actually helps.
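As a rough illustration of the "always return a dict of labels" idea, here is a toy sketch in plain Python. `ToyDictImporter`, the record layout, and the label keys are all hypothetical; this is not FiftyOne's importer interface, and labels are plain Python values rather than FiftyOne label objects.

```python
class ToyDictImporter:
    """Hypothetical importer that always yields a dict of labels per sample.

    No label_cls declaration is needed: each sample simply carries
    whichever label keys were found in its annotations.
    """

    def __init__(self, records):
        self._records = records

    def __iter__(self):
        for record in self._records:
            labels = {}
            if "objects" in record:
                labels["detections"] = record["objects"]
            if "weather" in record:
                labels["weather"] = record["weather"]
            yield record["filepath"], labels


# Made-up annotation records; the second has no objects
records = [
    {"filepath": "a.jpg", "objects": ["car"], "weather": "sunny"},
    {"filepath": "b.jpg", "weather": "cloudy"},
]

samples = list(ToyDictImporter(records))
for filepath, labels in samples:
    print(filepath, sorted(labels))
```

The point of the sketch is only that the set of label keys can vary per sample, so the caller decides how keys map to field names instead of the importer declaring a single label type up front.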

@brimoor
Contributor Author

brimoor commented Jun 17, 2022

Just documenting that @ehofesmann makes some good points above, which we discussed IRL.

@brimoor brimoor merged commit 5cd9186 into develop Jun 17, 2022
@brimoor brimoor deleted the feature/label-dict-imports branch June 17, 2022 14:26