
Commit e102738

Merge pull request #55 from IGNF/dev

Integration of Entropy in decision process

CharlesGaydon authored Mar 28, 2022 · 2 parents 27d048d + e23043b
Showing 22 changed files with 271 additions and 2,529 deletions.
27 changes: 19 additions & 8 deletions .github/workflows/cicd.yaml
@@ -25,17 +25,28 @@ jobs:
run: docker run lidar_prod_im pytest --ignore=actions-runner --ignore="notebooks"

- name: Full module run on LAS subset
run: docker run -v /var/data/cicd/CICD_github_assets:/CICD_github_assets lidar_prod_im

- name: Evaluate decisions using optimization code on a single, corrected LAS
run: >
docker run -v /var/data/cicd/CICD_github_assets:/CICD_github_assets lidar_prod_im
python lidar_prod/run.py print_config=true +task='optimize'
docker run
-v /var/data/cicd/CICD_github_assets/M8.4/inputs/:/inputs/
-v /var/data/cicd/CICD_github_assets/M8.4/outputs/:/outputs/ lidar_prod_im
python lidar_prod/run.py
print_config=true
paths.src_las=/inputs/730000_6360000.subset.prototype_format202.las
paths.output_dir=/outputs/
- name: Evaluate decisions using optimization task (debug mode, on a single, corrected LAS)
run: >
docker run
-v /var/data/cicd/CICD_github_assets/M8.4/inputs/evaluation/:/inputs/
-v /var/data/cicd/CICD_github_assets/M8.4/outputs/evaluation/:/outputs/ lidar_prod_im
python lidar_prod/run.py
print_config=true
+task='optimize'
+building_validation.optimization.debug=true
building_validation.optimization.todo='prepare+evaluate+update'
building_validation.optimization.paths.input_las_dir=/CICD_github_assets/M8.0/20220204_building_val_V0.0_model/20211001_buiding_val_val/
building_validation.optimization.paths.results_output_dir=/CICD_github_assets/opti/
building_validation.optimization.paths.building_validation_thresholds_pickle=/CICD_github_assets/M8.3B2V0.0/optimized_thresholds.pickle
building_validation.optimization.paths.input_las_dir=/inputs/
building_validation.optimization.paths.results_output_dir=/outputs/
building_validation.optimization.paths.building_validation_thresholds_pickle=/inputs/optimized_thresholds.pickle
- name: clean the server for further uses
if: always() # always do it, even if something failed
19 changes: 8 additions & 11 deletions dockerfile → Dockerfile
@@ -14,7 +14,6 @@ RUN apt-get update && apt-get upgrade -y && apt-get install -y \
wget \
git \
postgis \
pdal \
libgl1-mesa-glx libegl1-mesa libxrandr2 libxrandr2 libxss1 libxcursor1 libxcomposite1 libasound2 libxi6 libxtst6 # packages needed for anaconda

# install anaconda
@@ -38,17 +37,15 @@ SHELL ["conda", "run", "-n", "lidar_prod", "/bin/bash", "-c"]
RUN echo "Make sure pdal is installed:"
RUN python -c "import pdal"

# the entrypoint guarantees that all commands will be run in the conda environment
ENTRYPOINT ["conda", \
"run", \
"-n", \
# the entrypoint guarantees that all commands will be run in the conda environment
ENTRYPOINT ["conda", \
"run", \
"-n", \
"lidar_prod"]

# cmd for a normal run (non-evaluation)
CMD ["python", \
"lidar_prod/run.py", \
CMD ["python", \
"lidar_prod/run.py", \
"print_config=true", \
"paths.src_las=/CICD_github_assets/M8.0/20220204_building_val_V0.0_model/subsets/871000_6617000_subset_with_probas.las", \
"paths.output_dir=/CICD_github_assets/app/", \
"data_format.codes.building.candidates=[202]", \
"building_validation.application.building_validation_thresholds_pickle=/CICD_github_assets/M8.3B2V0.0/optimized_thresholds.pickle"]
"paths.src_las=your_las.las", \
"paths.output_dir=./path/to/outputs/"]
31 changes: 19 additions & 12 deletions README.md
@@ -35,23 +35,29 @@ Goal: Confirm or refute groups of candidate building points when possible, mark

1) Clustering of _candidate buildings points_ into connected components.
2) Point-level decision
1) Decision at the point-level based on probabilities: `confirmed` if p>=`C1` / `refuted` if (1-p)>=`R1`
2) Identification of points that are `overlayed` by a building vector from the database.
1) Identification of points with ambiguous probability: `high entropy` if entropy $\geq$ `E1`
2) Identification of points that are `overlayed` by a building vector from the database.
3) Decision at the point-level based on probabilities:
1) `confirmed` if:
1) p$\geq$`C1`, or
2) `overlayed` and p$\geq$ (`C1` * `Cr`), where `Cr` is a relaxation factor that lowers the confidence required to confirm a point when it is overlayed by a building vector.
2) `refuted` if (1-p)$\geq$`R1`
3) Group-level decision:
1) Confirmation: if proportion of `confirmed` points >= `C2` OR if proportion of `overlayed` points >= `O1`
2) Refutation: if proportion of `refuted` points >= `R2` AND proportion of `overlayed` points < `O1`
3) Uncertainty: otherwise.
1) Uncertain due to high entropy: if proportion of `high entropy` points $\geq$ `E2`
2) Confirmation: if proportion of `confirmed` points $\geq$ `C2` OR if proportion of `overlayed` points $\geq$ `O1`
3) Refutation: if proportion of `refuted` points $\geq$ `R2` AND proportion of `overlayed` points < `O1`
4) Uncertainty: otherwise (this is a safeguard: uncertain groups should already have been captured via their entropy)
4) Update of the point cloud classification

Decision thresholds `C1`, `C2`, `R1`, `R2`, `O1` are chosen via a multi-objective hyperparameter optimization that aims to maximize automation, precision, and recall of the decisions. Right now we have automation=90%, precision=98%, recall=98% on a validation dataset. Illustration comes from older version.
Decision thresholds `E1`, `E2`, `C1`, `C2`, `R1`, `R2`, `O1` are chosen via a multi-objective hyperparameter optimization that aims to maximize automation, precision, and recall of the decisions. We currently reach automation=91%, precision=98.5%, recall=98.1% on a validation dataset. The illustration below comes from an older version.

![](assets/img/LidarBati-BuildingValidationM7.1V2.0.png)
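For readers who prefer code, here is a minimal sketch of the decision rules above, assuming NumPy arrays and using the optimized thresholds from `configs/building_validation/application/default.yaml` (rounded); the function and argument names are illustrative, not the module's actual API:

```python
import numpy as np

def group_decision(p, entropy, overlayed,
                   E1=0.88, E2=0.73,           # min_entropy_uncertainty, min_frac_entropy_uncertain
                   C1=0.64, Cr=0.59, C2=0.78,  # min_confidence_confirmation, relaxation factor, min_frac_confirmation
                   R1=0.75, R2=0.80,           # min_confidence_refutation, min_frac_refutation
                   O1=0.50):                   # min_uni_db_overlay_frac
    """Decide the fate of one cluster of candidate building points."""
    p = np.asarray(p, dtype=float)
    high_entropy = np.asarray(entropy, dtype=float) >= E1
    overlayed = np.asarray(overlayed, dtype=bool)
    # Point-level decisions; overlayed points get a relaxed confirmation bar.
    confirmed = (p >= C1) | (overlayed & (p >= C1 * Cr))
    refuted = (1.0 - p) >= R1
    # Group-level decisions, in order of precedence.
    if high_entropy.mean() >= E2:
        return "unsure (high entropy)"
    if confirmed.mean() >= C2 or overlayed.mean() >= O1:
        return "confirmed"
    if refuted.mean() >= R2 and overlayed.mean() < O1:
        return "refuted"
    return "unsure"  # safeguard for groups not caught by the entropy rule
```

Here `entropy` is the per-point prediction entropy output by the deep learning model alongside the building probability; for class probabilities $p_c$ it is typically $-\sum_c p_c \log p_c$, highest when the model hesitates between classes.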

#### B) Building Completion

Goal: Confirm points that were too isolated to make up a group but that nevertheless have a high enough probability (e.g. walls).

Identify _candidate buildings points_ that have not been clustered in the previous step AND have high enough probability (p>=0.5).
Among _candidate buildings points_ that were not clustered in the previous step, identify those which nevertheless meet the requirements to be `confirmed`.
Cluster them together with previously confirmed building points in a relaxed, vertical fashion (higher tolerance, XY plane).
For each cluster, if some points were confirmed, the others are considered to belong to the same building, and are
therefore confirmed as well.
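A minimal sketch of that propagation rule, assuming the relaxed clustering has already produced a `ClusterID`-style index (0 meaning unclustered, as in pdal's `filters.cluster`); the function name is illustrative:

```python
import numpy as np

def complete_buildings(cluster_id, confirmed):
    """Propagate confirmation within clusters of isolated + confirmed points.

    cluster_id: per-point cluster index (0 = not clustered).
    confirmed:  per-point boolean, True if already confirmed.
    Returns an updated mask: any point sharing a cluster with at least one
    confirmed point becomes confirmed too.
    """
    cluster_id = np.asarray(cluster_id)
    out = np.asarray(confirmed, dtype=bool).copy()
    for cid in np.unique(cluster_id):
        if cid == 0:
            continue  # 0 means "no cluster" in pdal's ClusterID convention
        members = cluster_id == cid
        if out[members].any():
            out[members] = True
    return out
```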
@@ -63,7 +69,9 @@ therefore confirmed as well.

Goal: Highlight potential buildings that were missed by the rule-based algorithm, for human inspection.

Clustering of points that have a probability of beind a building p>=`C1` AND are **not** _candidate buildings points_. This clustering defines a LAS extra dimensions (default name `Group`).
Among points that were **not** _candidate buildings points_, identify those which meet the requirements to be `confirmed`, and cluster them.

This clustering defines a LAS extra dimension (`Group`) which indexes the newly found clusters that may be missed buildings.

![](assets/img/LidarBati-BuildingIdentification.png)
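As an illustration only, such a clustering could be expressed as a python-pdal pipeline along these lines — the file names, `tolerance` and `min_points` values are assumptions, and the actual `BuildingIdentifier` keeps all points rather than filtering the rest away:

```python
import pdal

C1 = 0.64  # min_confidence_confirmation from the application config
pipeline = pdal.Reader(type="readers.las", filename="prepared.las")
# Keep points that meet the confirmation bar and are NOT rule-based candidates.
pipeline |= pdal.Filter(type="filters.range", limits=f"building[{C1}:1]")
pipeline |= pdal.Filter(type="filters.range", limits="F_CandidateB[0:0]")
# Cluster what remains; filters.cluster writes a ClusterID dimension.
pipeline |= pdal.Filter(type="filters.cluster", min_points=10, tolerance=0.5)
# Expose the result under the configured output dimension name.
pipeline |= pdal.Filter(type="filters.ferry", dimensions="ClusterID => Group")
pipeline |= pdal.Writer(
    type="writers.las", filename="identified.las", forward="all",
    extra_dims="all", minor_version=4, dataformat_id=8,
)
pipeline.execute()
```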

@@ -100,7 +108,7 @@ To run the module from anywhere, you can install it as a package in your virtual
conda activate lidar_prod

# install the package
pip install --upgrade https://github.com/IGNF/lidar-prod-quality-control/tarball/main # from github directly
pip install --upgrade https://github.com/IGNF/lidar-prod-quality-control/tarball/prod # from github directly, using production branch
pip install -e . # from local sources
```

@@ -153,13 +161,12 @@ conda activate lidar_prod
python lidar_prod/run.py +task=optimize building_validation.optimization.todo='prepare+evaluate+update' building_validation.optimization.paths.input_las_dir=[path/to/labelled/test/dataset/] building_validation.optimization.paths.results_output_dir=[path/to/save/results] building_validation.optimization.paths.building_validation_thresholds_pickle=[path/to/optimized_thresholds.pickle]
```

### CICD, Releases and versions
### CICD and versions

New features are staged in the `dev` branch, and the CICD workflow runs when a pull request to merge is created.
In Actions, check the output of a full evaluation on a single LAS to spot potential regressions. The app is also run
on a subset of a LAS, which can be visually inspected before merging - there can always be surprises.

Package version follows semantic versioning conventions and is defined in `setup.py`.

Releases are generated when new high-level functionalities are implemented (e.g. a new step in the production process) or
when key parameters are changed. Generally speaking, the latest release `Vx.y.z` is the one to use in production.
Releases are generated when new high-level functionalities are implemented (e.g. a new step in the production process), and serve a documentation role. Production-ready code is fast-forwarded into the `prod` branch when needed.
3 changes: 2 additions & 1 deletion bash/setup_environment/requirements.yml
@@ -11,7 +11,8 @@ dependencies:
- isort # import sorting
- flake8 # code analysis
# --------- geo --------- #
- conda-forge:python-pdal
- conda-forge:pdal==2.3.*
- conda-forge:python-pdal==3.0.*
- conda-forge:laspy==2.1.*
- numpy
- scikit-learn
14 changes: 8 additions & 6 deletions configs/building_validation/application/default.yaml
@@ -22,9 +22,11 @@ bd_uni_request:

# TODO: update min_frac_confirmation_factor_if_bd_uni_overlay and others after optimization...
thresholds:
min_confidence_confirmation: 0.697
min_frac_confirmation: 0.384
min_frac_confirmation_factor_if_bd_uni_overlay: 0.808
min_uni_db_overlay_frac: 0.508
min_confidence_refutation: 0.973
min_frac_refutation: 0.285
min_confidence_confirmation: 0.6400365762003571 # min proba to confirm a point
min_frac_confirmation: 0.779844069887882 # min fraction of confirmed points per group for confirmation
min_frac_confirmation_factor_if_bd_uni_overlay: 0.5894477997785892 # relaxation factor on min proba when a point is under a BDUni vector
min_uni_db_overlay_frac: 0.5041941489707767 # min fraction of points under a BDUni vector per group for confirmation
min_confidence_refutation: 0.7477148092712739 # min proba to refute a point
min_frac_refutation: 0.7979734453001499 # min fraction of refuted points per group for refutation
min_entropy_uncertainty: 0.884546947499147 # min entropy to flag a point as uncertain
min_frac_entropy_uncertain: 0.7271206406484895 # min fraction of uncertain points (based on entropy) per group to flag the group as uncertain
6 changes: 3 additions & 3 deletions configs/building_validation/optimization/default.yaml
@@ -33,10 +33,10 @@ study:
directions: ["maximize","maximize","maximize"]
sampler:
_target_: optuna.samplers.NSGAIISampler
population_size: 30
population_size: 50
mutation_prob: 0.25
crossover_prob: 0.8
swapping_prob: 0.5
crossover_prob: 0.1
swapping_prob: 0.1
seed: 12345
constraints_func:
_target_: functools.partial
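For context, here is roughly what this sampler configuration resolves to once instantiated by Hydra; the objective below is a placeholder, not the module's real `prepare+evaluate` loop, and the `constraints_func` entry (truncated above) is omitted:

```python
import optuna

sampler = optuna.samplers.NSGAIISampler(
    population_size=50,
    mutation_prob=0.25,
    crossover_prob=0.1,
    swapping_prob=0.1,
    seed=12345,
)
study = optuna.create_study(
    directions=["maximize", "maximize", "maximize"],  # automation, precision, recall
    sampler=sampler,
)

def objective(trial):
    # Suggest the decision thresholds, run the evaluation on the corrected
    # LAS dataset, and return the three objectives. Placeholder values here.
    C1 = trial.suggest_float("min_confidence_confirmation", 0.0, 1.0)
    return 0.9, 0.98, 0.98

study.optimize(objective, n_trials=100)
```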
7 changes: 0 additions & 7 deletions configs/data_format/cleaning/default.yaml

This file was deleted.

35 changes: 25 additions & 10 deletions configs/data_format/default.yaml
@@ -5,35 +5,50 @@ crs: 2154
# Those names connect the logics between successive tasks
las_dimensions:
# input
classification: classification #las format
classification: classification # las format

# Extra dims
# ATTENTION: If extra dimensions are added, you may want to add them to the cleaning.input parameter as well.
ai_building_proba: building # user-defined - output by deep learning model
entropy: entropy # user-defined - output by deep learning model

# intermediary channels
# Intermediary channels
cluster_id: ClusterID # pdal-defined -> created by clustering operations
uni_db_overlay: BDTopoOverlay # user-defined -> a 0/1 flag for presence of a BDUni vector
candidate_buildings_flag: F_CandidateB # -> a 0/1 flag identifying candidate buildings found by rules-based classification
ClusterID_candidate_building: CID_CandidateB # -> Cluster index from BuildingValidator, 0 if no cluster, 1-n otherwise
ClusterID_isolated_plus_confirmed: CID_IsolatedOrConfirmed # -> Cluster index from BuildingCompletor, 0 if no cluster, 1-n otherwise


# additionnal output channel
# Additional output channel
ai_building_identified: Group

cleaning:
input:
_target_: lidar_prod.tasks.cleaning.Cleaner
extra_dims:
- "${data_format.las_dimensions.ai_building_proba}=float"
- "${data_format.las_dimensions.entropy}=float"
output:
# Extra dims that are kept when cleaning dimensions
# You can override with "all" to keep all extra dimensions at development time.
_target_: lidar_prod.tasks.cleaning.Cleaner
extra_dims:
- "${data_format.las_dimensions.ai_building_identified}=uint"
- "${data_format.las_dimensions.ai_building_proba}=float"

codes:
building:
candidates: [202] # found by rules-based classification (TerraScan)
detailed: # used for detailed output when doing threshold optimization
unsure_by_entropy: 200 # unsure (based on entropy)
unclustered: 202 # refuted
ia_refuted: 110 # refuted
ia_refuted_and_db_overlayed: 111 # unsure
both_unsure: 112 # unsure
ia_refuted_but_under_db_uni: 111 # unsure
both_unsure: 112 # unsure (otherwise)
ia_confirmed_only: 113 # confirmed
db_overlayed_only: 114 # confirmed
both_confirmed: 115 # confirmed
final: # used at the end of the building process
unsure: 214 # unsure
not_building: 208 # refuted
building: 6 # confirmed

defaults:
- cleaning: default.yaml
building: 6 # confirmed
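For intuition, a minimal sketch of what a `Cleaner` matching these `cleaning` entries could look like — an assumption about the implementation, not the actual source of `lidar_prod.tasks.cleaning.Cleaner`:

```python
import pdal

class Cleaner:
    """Keep only the listed extra dimensions when copying a LAS file."""

    def __init__(self, extra_dims):
        # e.g. ["building=float", "entropy=float"], or "all" at development time
        self.extra_dims = extra_dims

    def run(self, in_f: str, out_f: str):
        keep = (self.extra_dims if isinstance(self.extra_dims, str)
                else ",".join(self.extra_dims))
        pipeline = pdal.Reader(type="readers.las", filename=in_f)
        pipeline |= pdal.Writer(
            type="writers.las", filename=out_f, forward="all",
            extra_dims=keep, minor_version=4, dataformat_id=8,
        )
        pipeline.execute()
```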
42 changes: 0 additions & 42 deletions flake8_output.txt

This file was deleted.

20 changes: 14 additions & 6 deletions lidar_prod/application.py
@@ -29,23 +29,31 @@ def apply(config: DictConfig):
"""
assert os.path.exists(config.paths.src_las)
in_f = config.paths.src_las
out_f = osp.join(config.paths.output_dir, osp.basename(in_f))
IN_F = config.paths.src_las
OUT_F = osp.join(config.paths.output_dir, osp.basename(IN_F))

with TemporaryDirectory() as td:
# Temporary LAS file for intermediary results.
temp_f = osp.join(td, osp.basename(in_f))
temp_f = osp.join(td, osp.basename(IN_F))

# Removes unnecessary input dimensions to reduce memory usage
cl: Cleaner = hydra.utils.instantiate(config.data_format.cleaning.input)
cl.run(IN_F, temp_f)

# Validate buildings (unsure/confirmed/refuted) on a per-group basis.
bv: BuildingValidator = hydra.utils.instantiate(
config.building_validation.application
)
bv.run(in_f, temp_f)
bv.run(temp_f, temp_f)

# Complete buildings with non-candidates that were nevertheless confirmed
bc: BuildingCompletor = hydra.utils.instantiate(config.building_completion)
bc.run(temp_f, temp_f)

# Define groups of confirmed building points among non-candidates
bi: BuildingIdentifier = hydra.utils.instantiate(config.building_identification)
bi.run(temp_f, temp_f)

cl: Cleaner = hydra.utils.instantiate(config.data_format.cleaning)
cl.run(temp_f, out_f)
# Remove unnecessary intermediary dimensions
cl: Cleaner = hydra.utils.instantiate(config.data_format.cleaning.output)
cl.run(temp_f, OUT_F)
7 changes: 6 additions & 1 deletion lidar_prod/tasks/building_completion.py
@@ -119,7 +119,12 @@ def prepare(self, in_f: str, out_f: str):
value=f"{self.data_format.las_dimensions.cluster_id} = 0"
)
pipeline |= pdal.Writer(
type="writers.las", filename=out_f, forward="all", extra_dims="all"
type="writers.las",
filename=out_f,
forward="all",
extra_dims="all",
minor_version=4,
dataformat_id=8,
)
os.makedirs(osp.dirname(out_f), exist_ok=True)
pipeline.execute()
7 changes: 6 additions & 1 deletion lidar_prod/tasks/building_identification.py
@@ -86,7 +86,12 @@ def prepare(self, in_f: str, out_f: str) -> None:
)

pipeline |= pdal.Writer(
type="writers.las", filename=out_f, forward="all", extra_dims="all"
type="writers.las",
filename=out_f,
forward="all",
extra_dims="all",
minor_version=4,
dataformat_id=8,
)
os.makedirs(osp.dirname(out_f), exist_ok=True)
pipeline.execute()
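The new `minor_version=4` / `dataformat_id=8` writer options (here and in `building_completion.py`) matter for the classification codes above: LAS point formats 0-5 store Classification on 5 bits (values up to 31), while LAS 1.4 formats 6-10 use a full byte. A quick way to check this, with assumed file names:

```python
import numpy as np
import pdal

# One synthetic point carrying a detailed code (214 = final "unsure").
arr = np.zeros(1, dtype=[("X", np.float64), ("Y", np.float64),
                         ("Z", np.float64), ("Classification", np.uint8)])
arr["Classification"] = 214
# Without minor_version=4 and dataformat_id >= 6, a value above 31 cannot be
# represented in the LAS Classification field.
writer = pdal.Writer(type="writers.las", filename="check.las",
                     minor_version=4, dataformat_id=8)
writer.pipeline(arr).execute()
```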