Supported mask formats with Albumentations #1973

Open
PRFina opened this issue Oct 7, 2024 · 1 comment
Labels
question Further information is requested

PRFina commented Oct 7, 2024

Your Question

From the documentation, both the API reference and the [user guide](https://albumentations.ai/docs/getting_started/mask_augmentation/), it is not straightforward to understand which mask formats are supported and, more importantly, whether different mask formats can lead to different transformation outputs because of internal implementation details.
Take, for example, a semantic segmentation task with 3 classes, A, B, and C, where each class has an associated mask (Ma, Mb, Mc) stored as a separate file. Besides RLE encoding and similar sparse formats, the most basic ways to encode a dense mask and augment a sample are:

  • Read Ma, Mb, and Mc as np arrays and store them in a Python list, e.g. masks. The transform API then lets me call transformed = transform(image=image, masks=masks) and get the augmented image and masks.
  • Read Ma, Mb, and Mc as np arrays and stack them into a single mask np array of shape (H, W, C), where C=3 and each element is True or False. Let's refer to this as one-hot boolean encoding. The transform API then lets me call transformed = transform(image=image, mask=mask) and get the augmented image and mask pair.
  • Read Ma, Mb, and Mc as np arrays and encode them in a single mask array of shape (H, W), where each element is the class index (0, 1, 2). Let's refer to this as integer tensor encoding. I can then call transformed = transform(image=image, mask=mask) and get the augmented image and mask pair.
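To make the three encodings concrete, here is a minimal NumPy-only sketch that builds each of them from per-class masks. The helper names `to_one_hot` and `to_class_index` are hypothetical, the tiny arrays stand in for the files Ma, Mb, and Mc, and the class-index conversion assumes the classes are mutually exclusive and cover every pixel.

```python
import numpy as np


def to_one_hot(masks):
    """Stack per-class (H, W) masks into one (H, W, C) boolean array."""
    return np.stack([m.astype(bool) for m in masks], axis=-1)


def to_class_index(masks):
    """Collapse per-class masks into one (H, W) array of class indices.

    Assumption: every pixel belongs to exactly one class, so argmax
    over the channel axis picks the single True channel.
    """
    return np.argmax(to_one_hot(masks), axis=-1).astype(np.uint8)


# Toy 2x3 masks standing in for Ma, Mb, and Mc read from disk.
ma = np.array([[1, 0, 0], [0, 0, 1]], dtype=np.uint8)
mb = np.array([[0, 1, 0], [1, 0, 0]], dtype=np.uint8)
mc = np.array([[0, 0, 1], [0, 1, 0]], dtype=np.uint8)

masks = [ma, mb, mc]                 # encoding 1: list of (H, W) arrays
one_hot = to_one_hot(masks)          # encoding 2: (H, W, C) boolean array
index_mask = to_class_index(masks)   # encoding 3: (H, W) class indices

print(one_hot.shape, index_mask.shape)
```

If the masks can overlap, the class-index encoding loses information, so only the first two encodings would be interchangeable in that case.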

Now, my questions are:

  • Does Albumentations support all of the 3 types of encodings for every transform?
  • Does the encoding type affect the output of a given transformation?
  • Is one approach better than another in terms of performance?
PRFina added the question label on Oct 7, 2024
Collaborator

ternaus commented Oct 8, 2024

Albumentations supports all 3. Performance is similar, and the results will be the same, because the same transforms are used under the hood. The only difference is the back-and-forth format conversion.

Thank you for the question, will update the docs.
