📋 [TASK] Dataclass-based Input/Output in Anomalib #2257

Closed
2 of 4 tasks
Tracked by #2258 ...
samet-akcay opened this issue Aug 19, 2024 · 1 comment · Fixed by #2098

@samet-akcay
Contributor

Implement Dataclass-based Input/Output in Anomalib

Background

Currently, anomalib handles inputs and outputs using dictionaries. While functional, this approach has several drawbacks, including lack of type safety, potential for typos in key names, and reduced code readability. We propose transitioning to a dataclass-based solution to address these issues.

Proposed Change

Replace the current dictionary-based system with a hierarchy of dataclasses:

  1. DatasetItem: Represents a single item in the dataset
  2. Batch: Contains multiple DatasetItem objects
  3. Dataset: Contains multiple Batch objects
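The three-level hierarchy can be sketched with standard-library dataclasses. This is a minimal illustration, not anomalib's implementation: the `gt_label` field, the `items`/`batches` attributes and the `num_items` helper are hypothetical, and `Any` stands in for the tensor types a real implementation would use.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Iterator


@dataclass
class DatasetItem:
    """A single sample (illustrative fields only)."""

    image: Any  # e.g. an image tensor or a video clip
    gt_mask: Any | None = None  # ground-truth mask, if available
    gt_label: int | None = None  # hypothetical label field


@dataclass
class Batch:
    """Several DatasetItem objects grouped together."""

    items: list[DatasetItem] = field(default_factory=list)

    def __len__(self) -> int:
        return len(self.items)

    def __iter__(self) -> Iterator[DatasetItem]:
        return iter(self.items)


@dataclass
class Dataset:
    """Several Batch objects grouped together."""

    batches: list[Batch] = field(default_factory=list)

    def num_items(self) -> int:
        # Total number of DatasetItem objects across all batches.
        return sum(len(batch) for batch in self.batches)


dataset = Dataset(
    batches=[
        Batch([DatasetItem(image=[[0.0]]), DatasetItem(image=[[1.0]], gt_label=1)]),
        Batch([DatasetItem(image=[[2.0]])]),
    ]
)
print(dataset.num_items())  # 3
```

Storing a `Batch` as a list of items follows the wording of the proposal; stacking each field into one batched tensor would be an alternative layout.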

Example Transformation

For example, the current approach in a video dataset is as follows (the same change will be applied to other modalities):

{
    "image": clip,
    "mask": self.get_mask(idx),
    "video_path": video_path,
    "frames": clip_pts,
    "last_frame": self.last_frame_idx(video_idx),
}

Proposed approach:

DatasetItem(
    image=clip,
    gt_mask=self.get_mask(idx),
    video_path=video_path,
    frames=clip_pts,
    last_frame=self.last_frame_idx(video_idx),
)
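The proposed `DatasetItem` call implies a class whose fields match the keys of the old dictionary, with `mask` renamed to `gt_mask`. A minimal sketch, assuming a hypothetical class name `VideoDatasetItem` and loose `Any` types for the clip, mask and frame data. It also shows that a misspelled keyword fails when the object is built, whereas a dictionary silently accepts any key.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass
class VideoDatasetItem:
    """Hypothetical video sample mirroring the keys of the legacy dict."""

    image: Any  # the video clip
    gt_mask: Any | None  # was the "mask" key
    video_path: str
    frames: Any  # frame identifiers of the clip (clip_pts in the example)
    last_frame: int


item = VideoDatasetItem(
    image=[[0.0]],
    gt_mask=None,
    video_path="videos/01.avi",
    frames=[0, 1, 2],
    last_frame=2,
)

# A dict accepts a misspelled key without complaint ...
legacy = {"image": [[0.0]], "msak": None}
print(legacy.get("mask"))  # None: the typo goes unnoticed

# ... while the dataclass rejects an unknown keyword immediately.
try:
    VideoDatasetItem(image=[[0.0]], gt_msak=None, video_path="", frames=[], last_frame=0)
except TypeError as err:
    print(f"rejected: {err}")
```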

Benefits

  1. Type Safety: Dataclass fields carry type annotations, so static type checkers and IDEs can flag mismatched types before runtime.
  2. Code Readability: Explicit attribute names improve code comprehension.
  3. IDE Support: Better autocomplete and refactoring capabilities.
  4. Validation: Opportunity to add custom validation logic within dataclasses.
  5. Extensibility: Easier to add methods or derived properties to data structures.
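The extensibility point can be illustrated with a derived property: logic that would otherwise be repeated wherever a dictionary is consumed lives on the class itself. The sketch assumes a hypothetical `ImageItem` and a hypothetical rule that a sample is anomalous when its mask has any nonzero pixel; nested lists stand in for tensors. Misspelled attribute names also fail loudly with `AttributeError`.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ImageItem:
    """Illustrative item; nested lists stand in for image and mask tensors."""

    image: list[list[float]]
    gt_mask: list[list[int]] | None = None

    @property
    def is_anomalous(self) -> bool:
        # Hypothetical rule: any nonzero mask pixel marks the sample as anomalous.
        if self.gt_mask is None:
            return False
        return any(any(row) for row in self.gt_mask)


normal = ImageItem(image=[[0.1, 0.2]])
defect = ImageItem(image=[[0.1, 0.9]], gt_mask=[[0, 1]])
print(normal.is_anomalous, defect.is_anomalous)  # False True

# Attribute typos raise instead of returning a silent default.
try:
    defect.gt_msak
except AttributeError as err:
    print(f"caught: {err}")
```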

Implementation Steps

  1. Define the dataclass structures (DatasetItem, Batch, Dataset).
  2. Update dataset loading and processing functions to use these new structures.
  3. Modify model input/output handling to work with the new dataclass objects.
  4. Update documentation and examples to reflect the new approach.
  5. Implement backward compatibility layer (if necessary) for existing code.
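For step 3, one option is to have models return an output dataclass that extends the input one with prediction fields, so downstream code reads predictions as attributes instead of dictionary keys. A minimal sketch under stated assumptions: `InputItem`, `OutputItem`, the `anomaly_map`/`pred_score` field names and the toy `predict` function (which simply treats the image as its anomaly map) are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, fields


@dataclass
class InputItem:
    image: list[list[float]]
    gt_mask: list[list[int]] | None = None


@dataclass
class OutputItem(InputItem):
    """Input fields plus hypothetical prediction fields."""

    anomaly_map: list[list[float]] | None = None
    pred_score: float | None = None


def predict(item: InputItem) -> OutputItem:
    """Stand-in model: the 'anomaly map' is a copy of the image."""
    anomaly_map = [row[:] for row in item.image]
    pred_score = max(max(row) for row in anomaly_map)
    # Carry every InputItem field over by name, then attach the predictions.
    inputs = {f.name: getattr(item, f.name) for f in fields(InputItem)}
    return OutputItem(**inputs, anomaly_map=anomaly_map, pred_score=pred_score)
```

Subclassing keeps every input field available on the output object. Because the parent class has a defaulted field, the added fields must also have defaults.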

Potential Challenges

  • Ensuring backward compatibility with existing code and APIs
  • Performance considerations (if any) when switching from dictionaries to dataclasses
  • Updating all relevant parts of the codebase to use the new structures
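The backward-compatibility and performance concerns can both be handled in a thin shim. The sketch below is an assumption, not a planned API: `LEGACY_KEYS`, `CompatItem` and `to_dict` are hypothetical names. `__getitem__` lets code that still does `item["mask"]` keep working while emitting a `DeprecationWarning`, and `to_dict` exports fields shallowly. That matters because `dataclasses.asdict` recurses into field values and falls back to `copy.deepcopy` for other objects, which could copy large tensors.

```python
from __future__ import annotations

import warnings
from dataclasses import dataclass, fields
from typing import Any

# Hypothetical mapping from legacy dictionary keys to new field names.
LEGACY_KEYS = {"mask": "gt_mask"}


@dataclass
class CompatItem:
    image: Any
    gt_mask: Any = None

    def __getitem__(self, key: str) -> Any:
        # Dictionary-style read access for code that still uses item["key"].
        name = LEGACY_KEYS.get(key, key)
        if name not in {f.name for f in fields(self)}:
            raise KeyError(key)
        if name != key:
            warnings.warn(
                f"key {key!r} is deprecated; use the {name!r} attribute",
                DeprecationWarning,
                stacklevel=2,
            )
        return getattr(self, name)

    def to_dict(self) -> dict[str, Any]:
        # Shallow export: values are shared with the item, not copied.
        return {f.name: getattr(self, f.name) for f in fields(self)}
```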

Discussion Points

  • Should we include any additional fields in the dataclasses?
  • Do we need to implement custom __post_init__ methods for validation?
  • How should we handle optional fields?
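On the `__post_init__` and optional-field questions, one possible approach is sketched below, assuming a hypothetical `ValidatedItem` with nested lists in place of tensors. Optional fields default to `None`; `__post_init__` rejects a mask whose size differs from the image and fills `gt_label` from the mask when the caller leaves it out. The shape check and labelling rule are illustrative assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass


def _shape(grid: list[list]) -> tuple[int, int]:
    # (height, width) of a nested-list "image"; width is 0 for an empty grid.
    return len(grid), (len(grid[0]) if grid else 0)


@dataclass
class ValidatedItem:
    image: list[list[float]]
    gt_mask: list[list[int]] | None = None  # optional: normal samples may lack a mask
    gt_label: int | None = None  # optional: derived from the mask when omitted

    def __post_init__(self) -> None:
        if self.gt_mask is not None:
            if _shape(self.gt_mask) != _shape(self.image):
                raise ValueError(
                    f"mask shape {_shape(self.gt_mask)} does not match "
                    f"image shape {_shape(self.image)}"
                )
            if self.gt_label is None:
                # 1 if any mask pixel is nonzero, else 0.
                self.gt_label = int(any(any(row) for row in self.gt_mask))
```

Because `__post_init__` runs on every construction, heavy checks add per-item cost; keeping them to cheap shape comparisons limits that.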

Next Steps

  • Create a detailed technical design document
  • Implement a proof of concept in a separate branch
  • Discuss potential impacts on existing features and integrations
  • Plan for gradual rollout and testing strategy
samet-akcay added this to the v1.2.0 milestone Aug 19, 2024
samet-akcay linked a pull request Aug 19, 2024 that will close this issue
samet-akcay changed the title from "Feature: Implement Dataclass-based Input/Output in Anomalib" to "Feature: Dataclass-based Input/Output in Anomalib" Aug 19, 2024
samet-akcay modified the milestones: v1.2.0, v2.0 Oct 14, 2024
samet-akcay changed the title from "Feature: Dataclass-based Input/Output in Anomalib" to "📋 [TASK] Dataclass-based Input/Output in Anomalib" Oct 14, 2024
@samet-akcay
Contributor Author

PR Link: #2098

Projects
Status: ✅ Done