Adding support for dynamic embedded document field schemas #1825

brimoor · 2022-06-06T05:12:45Z

Change log

Added support for declaring dynamic embedded document fields on the dataset's schema via add_sample_field() and add_frame_field()
Added support for selecting/excluding embedded document fields via select_fields() and exclude_fields()
Added a dynamic=True flag for dataset factory methods that automatically adds every dynamic embedded document attribute encountered to the dataset's schema
Added a schema() aggregation that can be used to compute the observed type(s) of arbitrarily nested embedded documents
Added get_dynamic_field_schema() and add_dynamic_sample_fields() methods for automatically detecting and declaring dynamic sample fields
Added get_dynamic_frame_field_schema() and add_dynamic_frame_fields() methods for detecting and declaring dynamic frame fields
Added flat=True option to get_field_schema() and get_frame_field_schema() methods that returns all embedded document fields as top-level keys

Notes

The only default behavior that this PR changes is that evaluate_detections() will automatically add the dynamic attributes that it populates to the dataset's schema
Dynamic attributes are not declared by default by add_samples(), from_dir(), etc.
As a result, there is no decrease in performance in the default case

Example usage

Previously undeclared dynamic attributes can now be declared:

import fiftyone as fo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset("quickstart")
fo.pprint(dataset.get_dynamic_field_schema())

# Declare dynamic attributes
dataset.add_dynamic_sample_fields()
fo.pprint(dataset.get_dynamic_field_schema())

# Verify that they exist in the dataset's schema
fo.pprint(dataset.get_field_schema(flat=True))

# Dynamic attributes are available in the App for filtering
session = fo.launch_app(dataset)

# Dynamic attributes are carried over to patches views too
session.view = dataset.to_patches("ground_truth")

Detection evaluation automatically declares the dynamic attributes that it populates:

import fiftyone as fo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset("quickstart")

dataset.evaluate_detections("predictions", gt_field="ground_truth", eval_key="eval")
fo.pprint(dataset.get_field_schema(flat=True))

session = fo.launch_app(dataset)

# Dynamic attributes can be excluded
# This syntax selects only the default fields on the detections
session.view = dataset.select_fields(
    ["predictions.detections.label", "ground_truth.detections.label"]
)

Design documentation

Any field(s) of your FiftyOne datasets that contain DynamicEmbeddedDocument values can have arbitrary custom attributes added to their instances.

For example, all Label and Metadata classes are dynamic, so you can add custom attributes to them as follows:

# Provide some default attributes
label = fo.Classification(label="cat", confidence=0.98)

# Add custom attributes
label["int"] = 5
label["float"] = 51.0
label["list"] = [1, 2, 3]
label["bool"] = True
label["dict"] = {"key": ["list", "of", "values"]}

By default, dynamic attributes are not included in a dataset's schema, which means that these attributes may contain arbitrary heterogeneous values across the dataset's samples.

However, FiftyOne provides methods that you can use to formally declare custom dynamic attributes, which allows you to enforce type constraints, filter by these custom attributes in the App, and more.

You can use get_dynamic_field_schema() to detect the names and type(s) of any undeclared dynamic embedded document attributes on a dataset:

import fiftyone as fo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset("quickstart")

print(dataset.get_dynamic_field_schema())

{
    'ground_truth.detections.iscrowd': <fiftyone.core.fields.FloatField>,
    'ground_truth.detections.area': <fiftyone.core.fields.FloatField>,
}

You can then use add_sample_field() to declare a specific dynamic embedded document attribute:

dataset.add_sample_field("ground_truth.detections.iscrowd", fo.FloatField)

or you can use the add_dynamic_sample_fields() method to declare all dynamic embedded document attribute(s) that contain values of a single type:

dataset.add_dynamic_sample_fields()

Pass the add_mixed=True option to add_dynamic_sample_fields() if you wish to declare all dynamic attributes that contain mixed values using a generic Field type.
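
To illustrate the distinction between single-type and mixed-type attributes, here is a minimal pure-Python sketch. It is not FiftyOne's implementation: the helper names `infer_attribute_types()` and `split_by_consistency()` and the plain-dict documents are hypothetical, and Python types stand in for FiftyOne field types.

```python
from collections import defaultdict


def infer_attribute_types(documents, declared=()):
    """Maps each undeclared attribute name to the set of value types observed."""
    observed = defaultdict(set)
    for doc in documents:
        for name, value in doc.items():
            # Skip attributes already in the schema, and missing values
            if name in declared or value is None:
                continue
            observed[name].add(type(value))
    return dict(observed)


def split_by_consistency(observed):
    """Separates attributes with one observed type from those with several."""
    single = {k: next(iter(v)) for k, v in observed.items() if len(v) == 1}
    mixed = {k: v for k, v in observed.items() if len(v) > 1}
    return single, mixed


# Hypothetical label documents; `label` is treated as already declared
docs = [
    {"label": "cat", "mood": "surly", "age": "bad-value"},
    {"label": "dog", "mood": "happy", "age": 5},
]

observed = infer_attribute_types(docs, declared={"label"})
single, mixed = split_by_consistency(observed)
```

In this toy model, `single` holds the attributes that could be declared with a concrete type, while `mixed` holds those that would only fit a generic field type.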

You can provide the optional flat=True option to get_field_schema() to retrieve a flattened version of a dataset's schema that includes all embedded document attributes as top-level keys:

print(dataset.get_field_schema(flat=True))

{
    'id': <fiftyone.core.fields.ObjectIdField>,
    'filepath': <fiftyone.core.fields.StringField>,
    'tags': <fiftyone.core.fields.ListField>,
    'metadata': <fiftyone.core.fields.EmbeddedDocumentField>,
    'metadata.size_bytes': <fiftyone.core.fields.IntField>,
    'metadata.mime_type': <fiftyone.core.fields.StringField>,
    'metadata.width': <fiftyone.core.fields.IntField>,
    'metadata.height': <fiftyone.core.fields.IntField>,
    'metadata.num_channels': <fiftyone.core.fields.IntField>,
    'ground_truth': <fiftyone.core.fields.EmbeddedDocumentField>,
    'ground_truth.detections': <fiftyone.core.fields.ListField>,
    'ground_truth.detections.id': <fiftyone.core.fields.ObjectIdField>,
    'ground_truth.detections.tags': <fiftyone.core.fields.ListField>,
    ...
    'ground_truth.detections.iscrowd': <fiftyone.core.fields.FloatField>,
    'ground_truth.detections.area': <fiftyone.core.fields.FloatField>,
    ...
}
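
The flattening itself can be pictured as a recursive walk that joins nested names with dots. The sketch below is an illustration only, assuming a hypothetical nested representation in which each field is a `(type_name, children)` tuple; FiftyOne's internal schema objects are not modeled here.

```python
def flatten_schema(schema, prefix=""):
    """Returns a dict whose keys are dotted paths into a nested schema.

    Each node in `schema` is a (type_name, children) tuple, where `children`
    is a dict of nested nodes, or None for leaf fields.
    """
    flat = {}
    for name, (type_name, children) in schema.items():
        path = prefix + name
        flat[path] = type_name
        if children:
            # Recurse so that embedded attributes become top-level keys
            flat.update(flatten_schema(children, prefix=path + "."))
    return flat


# Hypothetical nested schema using field type names from the output above
nested = {
    "filepath": ("StringField", None),
    "metadata": (
        "EmbeddedDocumentField",
        {"width": ("IntField", None), "height": ("IntField", None)},
    ),
}

flat = flatten_schema(nested)
```

The embedded document field itself stays in the result alongside its attributes, which matches the shape of the flattened output shown above.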

By default, dynamic attributes are not declared on a dataset's schema when samples are added to it:

import fiftyone as fo

sample = fo.Sample(
    filepath="/path/to/image.jpg",
    ground_truth=fo.Detections(
        detections=[
            fo.Detection(
                label="cat",
                bounding_box=[0.1, 0.1, 0.4, 0.4],
                mood="surly",
            ),
            fo.Detection(
                label="dog",
                bounding_box=[0.5, 0.5, 0.4, 0.4],
                mood="happy",
            )
        ]
    )
)

dataset = fo.Dataset()
dataset.add_sample(sample)

schema = dataset.get_field_schema(flat=True)

assert "ground_truth.detections.mood" not in schema

However, methods such as add_samples() and from_dir() accept an optional dynamic=True argument that automatically declares any dynamic embedded document attributes encountered while importing data:

dataset = fo.Dataset()

dataset.add_sample(sample, dynamic=True)
schema = dataset.get_field_schema(flat=True)

assert "ground_truth.detections.mood" in schema

Note that, when declaring dynamic attributes on non-empty datasets, you must ensure that the attribute's type is consistent with any existing values in that field, e.g., by first running get_dynamic_field_schema() to check the existing type(s). Methods like add_sample_field() and add_samples(..., dynamic=True) do not validate the types of newly declared fields against existing field values:

import fiftyone as fo

sample1 = fo.Sample(
    filepath="/path/to/image1.jpg",
    ground_truth=fo.Classification(
        label="cat",
        mood="surly",
        age="bad-value",
    ),
)

sample2 = fo.Sample(
    filepath="/path/to/image2.jpg",
    ground_truth=fo.Classification(
        label="dog",
        mood="happy",
        age=5,
    ),
)

dataset = fo.Dataset()

dataset.add_sample(sample1)

# Either of these is problematic
dataset.add_sample(sample2, dynamic=True)
dataset.add_sample_field("ground_truth.age", fo.IntField)

sample1.reload()  # ValidationError: bad-value could not be converted to int
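
One way to guard against this is to inspect the existing values before declaring a type. The following is a minimal sketch under stated assumptions: `find_incompatible_values()` is a hypothetical helper, not part of FiftyOne, and the hard-coded list stands in for values you would collect from the dataset (for example via `dataset.values("ground_truth.age")`).

```python
def find_incompatible_values(values, expected_type):
    """Returns the values that are neither None nor instances of expected_type."""
    return [
        v for v in values
        if v is not None and not isinstance(v, expected_type)
    ]


# Hypothetical existing values of `ground_truth.age`, mirroring the samples above
existing_ages = ["bad-value", 5, None]

bad = find_incompatible_values(existing_ages, int)
if bad:
    print("Not safe to declare as an integer field:", bad)
```

If the returned list is non-empty, either clean up the offending values first or declare the attribute with a generic field type.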

If you declare a dynamic attribute with a type that is not compatible with existing values in that field, you will need to remove that field from the dataset's schema using remove_dynamic_sample_field() in order for the dataset to be usable again:

# Removes dynamic field from dataset's schema without deleting the values
dataset.remove_dynamic_sample_field("ground_truth.age")

You can use select_fields() and exclude_fields() to create views that select/exclude specific dynamic attributes from your dataset and its schema:

dataset.add_sample_field("ground_truth.age", fo.Field)
sample = dataset.first()

assert "ground_truth.age" in dataset.get_field_schema(flat=True)
assert sample.ground_truth.has_field("age")

# Omits the `age` attribute from the `ground_truth` field
view = dataset.exclude_fields("ground_truth.age")
sample = view.first()

assert "ground_truth.age" not in view.get_field_schema(flat=True)
assert not sample.ground_truth.has_field("age")

# Only include `mood` (and default) attributes of the `ground_truth` field
view = dataset.select_fields("ground_truth.mood")
sample = view.first()

assert "ground_truth.age" not in view.get_field_schema(flat=True)
assert not sample.ground_truth.has_field("age")
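
The effect of these views on a flattened schema can be modeled as filtering dotted paths. This is a toy sketch, not FiftyOne's implementation: `exclude_paths()` and `select_paths()` are hypothetical helpers, and the default fields are passed in explicitly as an assumption rather than derived from the label classes.

```python
def exclude_paths(flat_schema, excluded):
    """Drops each excluded path and anything nested beneath it."""
    def is_excluded(path):
        return any(path == e or path.startswith(e + ".") for e in excluded)

    return {p: t for p, t in flat_schema.items() if not is_excluded(p)}


def select_paths(flat_schema, selected, defaults=()):
    """Keeps selected and default paths, their ancestors, and their children."""
    keep = set(selected) | set(defaults)

    def is_kept(path):
        return any(
            path == k                    # the kept path itself
            or path.startswith(k + ".")  # nested beneath a kept path
            or k.startswith(path + ".")  # an ancestor of a kept path
            for k in keep
        )

    return {p: t for p, t in flat_schema.items() if is_kept(p)}


# Hypothetical flat schema for the classification example above
schema = {
    "filepath": "StringField",
    "ground_truth": "EmbeddedDocumentField",
    "ground_truth.label": "StringField",
    "ground_truth.mood": "StringField",
    "ground_truth.age": "Field",
}

without_age = exclude_paths(schema, ["ground_truth.age"])
only_mood = select_paths(
    schema,
    ["ground_truth.mood"],
    defaults=["filepath", "ground_truth.label"],
)
```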

benjaminpkane

LGTM

brimoor added 17 commits July 21, 2021 15:49

adding a Schema aggregation

7896e7d

return FiftyOne field types

5f96080

adding method to get schema of Label attributes

a449545

tweaking name

b52d5fa

Merge branch 'develop' into schema-aggregation

3f3c41a

Merge branch 'develop' into schema-aggregation

0d314a4

Merge branch 'develop' into schema-aggregation

0d6a93c

Merge branch 'develop' into schema-aggregation

65aac60

Merge branch 'develop' into schema-aggregation

b76e9c2

Merge branch 'develop' into schema-aggregation

7f58ddf

Merge branch 'develop' into schema-aggregation

778a723

Merge branch 'develop' into schema-aggregation

81b721b

update

60c8e30

Merge branch 'develop' into schema-aggregation

ef2660b

updating ObjectIdField logic

5fd28af

adding support for declaring dynamic embedded document fields

ffc23dd

cleanup

91d8993

brimoor added the feature (Work on a feature request) label Jun 6, 2022

brimoor requested a review from a team June 6, 2022 05:12

brimoor self-assigned this Jun 6, 2022

brimoor mentioned this pull request Jun 6, 2022

[WIP] Adding dynamic fields to schema #2 #1826

Closed

fixing bug

b93509f

brimoor mentioned this pull request Jul 26, 2022

[FR] Add App sliders for parameters other than confidence for detections #1368

Open

brimoor mentioned this pull request Aug 27, 2022

Add support for rendering attributes of detections in expanded sample view #452

Closed

brimoor added 6 commits September 28, 2022 00:47

Merge branch 'develop' into add-dynamic-fields1

e6c3804

linting

c8c3dc8

finishing refactor

968678b

handling edge cases

a3c7dab

don't rely on mongo error types

1462a12

Merge branch 'develop' into add-dynamic-fields1

ec54698

benjaminpkane added 3 commits November 3, 2022 09:26

Merge branch 'add-dynamic-fields1' into add-dynamic-fields-ben

4f721a1

distribution work

96ae411

distribution work

8432bb4

benjaminpkane approved these changes Nov 4, 2022

brimoor and others added 24 commits November 4, 2022 18:03

Merge branch 'develop' into add-dynamic-fields1

a4af4ed

work

31ebde9

cleanup

24825ab

Merge branch 'add-dynamic-fields1' into add-dynamic-fields-ben

6e6186e

Merge branch 'develop' into add-dynamic-fields1

e3a515c

Merge branch 'develop' into add-dynamic-fields1

e4b6745

Merge branch 'add-dynamic-fields1' into add-dynamic-fields-ben

63cf54c

fixing kwarg bug

509ab04

descending order by count

67e3a1b

allowing datetimes to be inferred

f301f06

Merge branch 'add-dynamic-fields1' into add-dynamic-fields-ben

952dd84

adding flat option

0537219

respect filtered fields

a4d7168

bugs

45f11d9

Merge branch 'add-dynamic-fields-ben' of github.com:voxel51/fiftyone …

5f4a198

…into add-dynamic-fields-ben

update location state

f1c4d42

lint

ef40b6f

Merge branch 'iss-2229' into add-dynamic-fields-ben

ebf1b75

tick update

1e7be24

handling db fields for embedded paths

c28f0c1

simplify

07aac2e

Merge branch 'add-dynamic-fields1' into add-dynamic-fields-ben

7024d72

allowing dynamic attributes to NOT be added to schema

6432812

Merge pull request #2239 from voxel51/add-dynamic-fields-ben

503d357

Custom embedded fields in the App

brimoor merged commit ed202e6 into develop Nov 7, 2022

brimoor deleted the add-dynamic-fields1 branch November 7, 2022 05:59
