Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stop passing classes and default_classes to export and evaluation methods #1858

Merged
merged 2 commits into from
Jun 10, 2022

Conversation

brimoor
Copy link
Contributor

@brimoor brimoor commented Jun 9, 2022

Resolves #1722.

This is scratching a pretty deep itch, and it feels oh so good 🥇

The fact that certain API methods would silently search for class lists on Dataset.classes and Dataset.default_classes often caused unexpected behavior for users that had first used something like filter_labels() to select a subset of classes to work with (see #1722 for details).

Lesson learned: don't be fancy, let the user store class lists, but prefer relying on the user to explicitly request certain behavior:

# bad: silently access dataset.default_classes to locate classes
dataset.some_method(...)

# good: let user explicitly tell us what to do
dataset.some_method(..., classes=dataset.default_classes)

NOTE: there's still one important place that will silently infer class lists: the annotation API. My reasoning there is, even if the user first did something like filter_labels() to extract a view of interest to annotate, they are very likely to want all possible classes from their schema available for use when annotating.

^removed default classes/mask targets from the annotation API too

@brimoor brimoor added the enhancement Code enhancement label Jun 9, 2022
@brimoor brimoor requested review from ehofesmann and a team June 9, 2022 20:04
@brimoor brimoor self-assigned this Jun 9, 2022
Copy link
Member

@ehofesmann ehofesmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM, but I would argue that this could be extended even to the annotation API. If you filter out some classes from the view, it still uses the underlying dataset (and the view) to get the entire list of classes so you don't need to be concerned about missing labels:

return sorted(
set(samples._dataset.distinct(label_path))
| set(samples.distinct(label_path))
)

Also, I have serious doubts that many users actually use or even know about the existence of dataset.classes and dataset.default_classes, so in the case that they do want to specify classes that don't exist yet, they would probably be providing them manually anyway. Though for the same reason, there's also not much of a reason to change the way the annotation API works currently. It would only trip people up if they have default classes defined for one field and go to annotate another field with different classes, which I suppose could happen but I've never seen it yet.

@brimoor
Copy link
Contributor Author

brimoor commented Jun 10, 2022

This LGTM, but I would argue that this could be extended even to the annotation API. If you filter out some classes from the view, it still uses the underlying dataset (and the view) to get the entire list of classes so you don't need to be concerned about missing labels

@ehofesmann you're a wise man 🧙‍♂️
I removed default classes/mask targets from the annotation API too 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Code enhancement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FR] Stop passing classes and default_classes to export and evaluation methods
2 participants