Feature : Multimask Training #29

kshitijrajsharma · 2024-03-24T09:20:09Z

What does this PR do ?

Previously we used two classes: building (1) and background (0), i.e. binary masks. This works well when buildings are separated from each other, but it struggles when buildings are closely attached. We often observe this in places such as slums and dense city areas. To solve this, I came up with a multimask approach instead of binary masks; it teaches the model about the nature of building boundaries so that it can separate buildings better than before.

This PR introduces new multimask labels that are used in training for RAMP.

Consideration

The multimask labels use the following classes:

"background" - 0
"buildings" - 1
"boundary" - 2
"close_contact" - 3

These classes are derived from RAMP utils. This PR also introduces two new preprocessing parameters: input_contact_spacing and input_boundary_width.

Definition

input_contact_spacing : contact_spacing deals with the interaction between two separate building shapes. This concept uses a positive buffer, extending outward from the edges of a building's shape, to see if and where it intersects with the buffer of another building.

input_boundary_width: boundary_width refers to creating a specific type of border or margin around the original shape of the building. This is achieved by applying a "negative buffer" to the building's shape. A negative buffer essentially shrinks the original shape inward by a specified distance, creating a smaller shape within the original. The space between the original shape and this smaller, inwardly adjusted shape forms the boundary.
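A minimal sketch of how the two parameters could act on a tiny pixel grid. This is not the RAMP implementation: it swaps the vector buffers described above for a square-neighbourhood erosion (negative buffer) and dilation (positive buffer) on raster cells, and every name in it (`rect`, `erode`, `dilate`, `multimask`) is hypothetical. It also assumes that only background pixels are relabelled as contact, which this PR does not state.

```python
# Illustrative raster approximation; class ids follow the PR, everything else is assumed.
BACKGROUND, BUILDING, BOUNDARY, CONTACT = 0, 1, 2, 3


def rect(r0, r1, c0, c1):
    """Hypothetical helper: all (row, col) cells of an inclusive rectangle."""
    return {(r, c) for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)}


def offsets(radius):
    """Square neighbourhood of the given radius, in pixels."""
    span = range(-radius, radius + 1)
    return [(dr, dc) for dr in span for dc in span]


def erode(cells, radius):
    """Negative buffer: keep cells whose whole neighbourhood lies inside the shape."""
    return {
        (r, c)
        for (r, c) in cells
        if all((r + dr, c + dc) in cells for dr, dc in offsets(radius))
    }


def dilate(cells, radius):
    """Positive buffer: grow the shape outward by `radius` pixels."""
    return {(r + dr, c + dc) for (r, c) in cells for dr, dc in offsets(radius)}


def multimask(buildings, height, width, boundary_width, contact_spacing):
    mask = [[BACKGROUND] * width for _ in range(height)]
    # Boundary: the ring between each shape and its inward-shrunk copy.
    for cells in buildings:
        interior = erode(cells, boundary_width)
        for r, c in cells:
            mask[r][c] = BUILDING if (r, c) in interior else BOUNDARY
    # Contact: where the outward buffers of two different buildings overlap.
    buffers = [dilate(cells, contact_spacing) for cells in buildings]
    for i in range(len(buffers)):
        for j in range(i + 1, len(buffers)):
            for r, c in buffers[i] & buffers[j]:
                inside = 0 <= r < height and 0 <= c < width
                if inside and mask[r][c] == BACKGROUND:  # assumption: never overwrite buildings
                    mask[r][c] = CONTACT
    return mask


# Two 4x4 buildings separated by a one-pixel gap.
demo_mask = multimask(
    [rect(1, 4, 1, 4), rect(1, 4, 6, 9)],
    height=6, width=11, boundary_width=1, contact_spacing=1,
)
for row in demo_mask:
    print("".join(str(v) for v in row))
```

In the demo, the one-pixel gap between the two buildings is labelled 3 (contact), the outer ring of each building is labelled 2 (boundary), and the remaining interior is labelled 1.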

Why those options ?

There are two main concerns here. One: how to distinguish a building from the background with accurate tracing of its edges (boundary). Two: how to make sure buildings are delineated correctly when they are very close to each other (contact).

Visualization :

Color	Band Value	Class Name
Black	0	background
White	1	footprint
Blue	2	boundary
Red	3	contact
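As a rough illustration of the table above, the band values can be mapped to colors for a quick visual check. The exact RGB triplets are an assumption (pure black, white, blue and red), and `PALETTE` and `colorize` are hypothetical names, not part of this PR.

```python
# Assumed RGB values for the colors named in the table.
PALETTE = {
    0: (0, 0, 0),        # background -> black
    1: (255, 255, 255),  # footprint  -> white
    2: (0, 0, 255),      # boundary   -> blue
    3: (255, 0, 0),      # contact    -> red
}


def colorize(mask):
    """Turn a 2D list of band values into a 2D list of RGB tuples."""
    return [[PALETTE[value] for value in row] for row in mask]


preview = colorize([[0, 1], [2, 3]])
```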

What is pixel unit ?

Technically, when working with rasters, even when we specify units in meters we need to do the calculation in terms of pixels. The size of a pixel in meters depends on the resolution of the image. Each TMS can have a different resolution, and pixel width differs with zoom level; that is the exact reason why the input parameters are in pixels, to maintain consistency between different zoom levels.

Formula :

$$\text{Real-world width (m)} = \text{Pixel width (px)} \times \text{Resolution (m/px)}$$
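A small worked example of the formula. The resolution helper assumes standard Web Mercator tiles of 256 pixels, which is a common TMS setup but is not stated in this PR; the function names are hypothetical.

```python
import math

# Ground resolution (m/px) of a 256-px Web Mercator tile at zoom 0 on the equator.
INITIAL_RESOLUTION = 2 * math.pi * 6378137 / 256


def resolution_m_per_px(zoom, latitude_deg=0.0):
    """Approximate meters per pixel at a zoom level and latitude (Web Mercator assumed)."""
    return INITIAL_RESOLUTION * math.cos(math.radians(latitude_deg)) / (2 ** zoom)


def pixels_to_meters(pixel_width, resolution):
    """Real-world width (m) = pixel width (px) x resolution (m/px)."""
    return pixel_width * resolution


def meters_to_pixels(width_m, resolution):
    """Inverse of the formula above."""
    return width_m / resolution


res_z19 = resolution_m_per_px(19)
res_z20 = resolution_m_per_px(20)
print(f"3 px at zoom 19 is about {pixels_to_meters(3, res_z19):.2f} m")
print(f"3 px at zoom 20 is about {pixels_to_meters(3, res_z20):.2f} m")
```

The same pixel width covers half the ground distance at each additional zoom level, which is why a value in meters has to be converted per zoom level before it is applied to the raster.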

Screenshot

Predictions with -

Binary masks training :

Multimasks training :

You can see clear separation of buildings in the second screenshot. The model is now able to distinguish buildings more accurately than before, as it has learned what building boundaries look like.

How to test ?

Find related model here : https://fair-dev.hotosm.org/start-mapping/121

Publish two different trainings and compare outputs

Training with multimasks : Training 438
Training with binary masks : Training 425

What next ?

I haven't worked on inference yet. You should see a difference with the binary inference method too: at this point the model outputs extra classes, yet it should still be able to distinguish building footprints as binary masks do (both approaches keep the same band value for buildings, which is 1, and the burn value is 255). It would be nice to have multimask prediction too; it would help to compare the different classes.

With this multimask approach, future integration of models that support multimasks, such as YOLO, becomes easier. Check out the development going on in another open PR within this repo.
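For illustration only, this is one way a multimask class map could be collapsed back to a binary footprint using the band value 1 and burn value 255 mentioned above. The function name and the `building_values` parameter are hypothetical; whether boundary pixels (2) should also count as building at inference time is an open choice that this PR does not settle.

```python
def to_binary_footprint(class_mask, building_values=(1,), burn_value=255):
    """Burn pixels whose class is in `building_values`; everything else becomes 0."""
    return [
        [burn_value if value in building_values else 0 for value in row]
        for row in class_mask
    ]


prediction = [
    [0, 2, 1, 1, 2, 3],
    [0, 2, 2, 2, 2, 3],
]
footprint_only = to_binary_footprint(prediction)
with_boundary = to_binary_footprint(prediction, building_values=(1, 2))
```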

kshitijrajsharma added 8 commits March 13, 2024 13:47

Added multimasks logics and method for preprocessing and training

e45d586

feat(multimasks): binary or multimasks option in training

40d6519

docs(multimasks): descriptive summary of arguments to be supplied

11e0f48

ci(test_app): test with multimasks option

1032e70

docs(multimasks_from_polygons): added better console msg

fe22183

docs(multimasks-from-polygons): improved logs

2ccdb72

perf(preprocess): accepts input in meters instead of pixel width for better user understanding

e3c628e

docs(preprocess): definition correction

8089e4c

kshitijrajsharma mentioned this pull request Mar 24, 2024

Feature : Multimasks training for RAMP hotosm/fAIr#240

Merged

kshitijrajsharma added the enhancement New feature or request label Mar 24, 2024

kshitijrajsharma marked this pull request as ready for review March 25, 2024 06:14

kshitijrajsharma requested a review from omranlm March 25, 2024 06:14

kshitijrajsharma added 2 commits March 27, 2024 17:47

fix(multimasks): fix bug on inconsistency between different zoom levels

4cb7264

Fix multimasks error in feedback

3ddb004

kshitijrajsharma merged commit c876f9e into master Sep 3, 2024
1 check passed

kshitijrajsharma deleted the feature/multimasks branch September 3, 2024 09:10
