Improve models generation #96

HealthyPear · 2021-01-25T16:38:11Z

Requirements

Upgrade all other notebooks and their docs version #76

Summary of expected modifications

Refactoring of the I/O from protopipe.scripts.build_model to protopipe.mva.io
Implementation of Random Forest energy regressor
Inclusion of missing features from CTAMARS related to energy regressor
Explicit inclusion of all model class attributes from scikit-learn
Improvement of configuration files to maximize usage
Final testing
Update documentation
BONUS feature: testing via integration tests

codecov · 2021-01-25T18:04:32Z

Codecov Report

Merging #96 (d10dac3) into master (166fbe0) will increase coverage by 0.00%.
The diff coverage is 77.77%.

@@           Coverage Diff           @@
##           master      #96   +/-   ##
=======================================
  Coverage   48.92%   48.93%           
=======================================
  Files          22       23    +1     
  Lines        2001     2058   +57     
=======================================
+ Hits          979     1007   +28     
- Misses       1022     1051   +29

Impacted Files	Coverage Δ
protopipe/scripts/write_dl2.py	4.91% <3.84%> (+<0.01%) ⬆️
protopipe/scripts/build_model.py	86.32% <86.30%> (-0.29%) ⬇️
protopipe/mva/utils.py	30.65% <87.50%> (+0.22%) ⬆️
protopipe/mva/__init__.py	100.00% <100.00%> (ø)
protopipe/mva/io.py	100.00% <100.00%> (ø)
protopipe/mva/train_model.py	64.00% <100.00%> (-24.24%) ⬇️
protopipe/scripts/data_training.py	94.87% <100.00%> (+0.27%) ⬆️
protopipe/scripts/tests/test_pipeline.py	100.00% <100.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
kosack

Nice! Had just a few small inline changes to suggest.

A general question: is there a way to include another YAML file in a YAML file, via some pre-processor perhaps? (jinja2 maybe?) Right now if you make any changes, you have a lot of YAML files to update, and many of them have a lot of the same text in them, with only the regressor changing. So having a file for parameters and another with the regressor config would simplify that a lot. That is not necessary for this PR, but is something to think about as a refactoring.
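One way to approach the split suggested above, without a template pre-processor, is to load each file separately and merge the resulting mappings. The sketch below is only an illustration under stated assumptions: plain dictionaries stand in for what a YAML loader such as `yaml.safe_load` would return, and the section and key names (`General`, `Split`, `Method`, and so on) are hypothetical, not protopipe's actual configuration schema.

```python
from copy import deepcopy


def merge_configs(base, override):
    """Return a new dict with `override` merged recursively on top of `base`."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them key by key.
            merged[key] = merge_configs(merged[key], value)
        else:
            # Scalars, lists, or new keys: the override wins.
            merged[key] = deepcopy(value)
    return merged


# Hypothetical content of a shared parameters file (illustrative names only).
shared = {
    "General": {"cam_id_list": ["LSTCam", "NectarCam"]},
    "Split": {"train_fraction": 0.8},
}

# Hypothetical content of a model-specific file.
random_forest = {
    "Method": {"name": "RandomForestRegressor"},
    "Split": {"train_fraction": 0.5},
}

config = merge_configs(shared, random_forest)
```

With this layout only the model-specific mapping changes between regressors, while the shared parameters live in one place. The inputs are left untouched because the merge works on copies.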

docs/scripts/model_building.rst

protopipe/scripts/write_dl2.py

protopipe/aux/example_config_files/AdaBoostRegressor.yaml

protopipe/mva/train_model.py

kosack · 2021-04-15T08:53:43Z

protopipe/mva/utils.py

-    except:
-        pass
+    except KeyError as e:
+        print(e)


Should this really fail silently (well, partially silently, since it just prints the error and continues)? If so, maybe add a comment explaining why the error is caught but nothing is done with it. Otherwise, use a warning or raise the exception.
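The two alternatives mentioned in this comment could look roughly like the following. This is a generic sketch, not the code from protopipe/mva/utils.py: the function name `get_feature` and the `strict` flag are made up for illustration.

```python
import warnings


def get_feature(row, key, strict=False):
    """Look up `key` in `row`; warn (or raise, if `strict`) when it is missing."""
    try:
        return row[key]
    except KeyError as err:
        if strict:
            # Fail loudly, keeping the original exception as the cause.
            raise KeyError(f"required feature {key!r} is missing") from err
        # Deliberately non-fatal: the caller can proceed without this feature,
        # but the problem is reported through the warnings machinery.
        warnings.warn(f"feature {key!r} is missing, skipping it", UserWarning)
        return None
```

Compared with `print(e)`, a warning can be filtered, logged, or turned into an error by the caller, and the comment next to it documents why execution continues.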

Looking at it a second time, I decided that it was not done well, so I rewrote it.

Basically, now:

if label is None, we are building an energy regressor, so this whole code block is skipped

if it is not None, label is always added to the dataframe, and the 2 energy keys are checked for existence, since our reference analysis needs energy as one of the features

Unfortunately, this is currently needed to make DL2 work, so I raise an error if the model is a classifier and those keys are not selected for use (or I could always add them?)
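The behaviour described in this reply can be sketched as follows. It is a simplified stand-in and not the actual `prepare_data` implementation: a dictionary of columns replaces the dataframe, and the function name `add_label` and the `"label"` column name are assumptions made for the example.

```python
REQUIRED_ENERGY_FEATURES = ("log10_reco_energy", "log10_reco_energy_tel")


def add_label(columns, derived_features, label=None):
    """Return the columns, with a label column added when training a classifier.

    `columns` maps column names to lists of values (a stand-in for a dataframe).
    """
    if label is None:
        # Energy regressor: nothing to add and nothing to check.
        return columns

    # Classifier: the reference analysis uses energy as a feature,
    # so both energy keys must have been selected.
    missing = [f for f in REQUIRED_ENERGY_FEATURES if f not in derived_features]
    if missing:
        raise ValueError(f"classifier requires the features: {missing}")

    n_rows = len(next(iter(columns.values()), []))
    labelled = dict(columns)
    labelled["label"] = [label] * n_rows
    return labelled
```

Because the test is `label is None` rather than a truthiness check, a class label of 0 still goes through the classifier branch.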

kosack · 2021-04-15T11:49:46Z

protopipe/mva/utils.py

+        # This is needed because our reference analysis uses energy as
+        # feature for classification
+        # We should probably support a more elastic choice in the future.
+        if not all(i in derived_features for i in ["log10_reco_energy", "log10_reco_energy_tel"]):


This could also be written as

if not {"log10_reco_energy", "log10_reco_energy_tel"}.issubset(set(derived_features)):

Do you really need this to fail if missing? Isn't it just a choice in the config file whether or not to use energy in the feature list?

Yes, this is what I was referring to above: right now the DL2 script expects these 2 parameters as features, because this is how our reference analysis works, but I plan to make it more flexible (the refactored version should be as well).

I will leave this for later and, for now, just rely on the error message.
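For reference, the two spellings of the check discussed in this thread are equivalent. A small sketch, with helper names invented for the comparison:

```python
REQUIRED = {"log10_reco_energy", "log10_reco_energy_tel"}


def has_energy_features_all(derived_features):
    # Form used in the diff: test each required name individually.
    return all(name in derived_features for name in REQUIRED)


def has_energy_features_subset(derived_features):
    # Suggested form: set.issubset accepts any iterable,
    # so an explicit set(...) conversion is optional.
    return REQUIRED.issubset(derived_features)


complete = ["log10_reco_energy", "log10_reco_energy_tel", "width"]
partial = ["log10_reco_energy", "width"]
```

Both functions return the same answer for any list or dictionary of feature names. The set-based form states the intent ("all required names are present") more directly.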

HealthyPear added 3 commits January 25, 2021 17:23

Refactor model script I/O

0efd817

Improve input features and add missing CTAMARS ones

71f59ea

Started improving model script (see complete commit message for details)

e5b64bb

- refactored I/O into protopipe.mva.io
- add Random Forest energy regressor
- make all class options from scikit-learn explicit
- better organized information
- smaller fixes

HealthyPear added the enhancement (New feature or request) and machine learning labels Jan 25, 2021

HealthyPear added this to the v0.5.0 milestone Jan 25, 2021

HealthyPear linked an issue Feb 23, 2021 that may be closed by this pull request

Better handling of model features #90

Closed

HealthyPear added 10 commits April 12, 2021 15:23

Update from master and solve conflicts

2f045ab

clarify CLI help

8819bde

small format changes to protopipe.mva.utils.prepare_data

e070b06

simplify a condition in TrainModel

b554334

Test improvement of models initialization

07fcb9e

allow fit of single model (no GridSearchCV)

03f8c97

small formatting change

22b264c

Add example configuration file for RandomForestRegressor

de38037

Add example configuration file for RandomForestClassifier

fb26a1c

fix input signal file name key

a55c861

kosack previously approved these changes Apr 13, 2021

HealthyPear added 6 commits April 14, 2021 11:09

Add testing files for RandomForestClassifier and RandomForestRegressor

1ab50d7

Add test configuration file for AdaBoostRegressor (replaces regressor)

9fb476f

Add AdaBoostRegressor configuration file

861e61b

Update model output

52104a5

Update example config files for RandomForest-based algorithms

beb5562

Improve protopipe.mva.utils.prepare_data

e3fecb5

HealthyPear added 7 commits April 14, 2021 19:28

Improve and simplify protopipe-MODEL

608d0f5

Modify protopipe-TRAINING according to new version of protopipe-MODEL

7fad104

Modify protopipe-DL2 according to modification to protopipe-MODEL

dc92517

Update test configuration files

b5539e4

Update test pipeline

44a9317

Remove obsolete MVA example/test configuration files

2e01972

Update documentation

c76eee6

HealthyPear dismissed kosack’s stale review via c76eee6 April 14, 2021 17:54

HealthyPear marked this pull request as ready for review April 14, 2021 18:01

HealthyPear requested a review from kosack April 14, 2021 18:01

HealthyPear mentioned this pull request Apr 14, 2021

Implement DL2 integration tests #126

Merged

1 task

kosack requested changes Apr 15, 2021

HealthyPear added 8 commits April 15, 2021 11:02

Rename some regressor features

a68cab6

Remove code leftovers from DL2 script

45bc7d9

Fix check for classification features

47c522b

Improve check for model type

f589fc8

Remove old test configuration files for regressor and classifier

8411fe7

Fix comment/description in configuration files

44368ec

Fix names of energy-related features

95cff80

Check if label is explicitly None because it can also be 0

d10dac3
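The last commit in this group addresses a common Python pitfall: a truthiness test cannot distinguish "no label" from a label equal to 0. A minimal illustration, with function names invented for the example and assuming integer class labels such as 0 and 1:

```python
def is_regressor_truthy(label):
    # Buggy: `not 0` is True, so a class label of 0 looks like "no label".
    return not label


def is_regressor_explicit(label):
    # Correct: only None means that no label was given.
    return label is None
```

With the explicit comparison, a classifier trained with label 0 is no longer mistaken for an energy regressor.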

HealthyPear requested a review from kosack April 15, 2021 10:09

kosack approved these changes Apr 15, 2021

HealthyPear merged commit 4928ae4 into cta-observatory:master Apr 15, 2021

HealthyPear deleted the feature-add_RF_for_energy branch April 15, 2021 12:01
