Small improvements to modeling script #114

HealthyPear · 2021-03-19T13:28:02Z

This PR collects some improvements that came out of developing the integration tests for the modeling part.

The split between train and test data has been simplified/improved by:
- using the corresponding scikit-learn function,
- splitting on images instead of obs_id (a more "democratic" choice),
- shuffling the images to counter shower reuse in CORSIKA.
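As a rough illustration of the image-wise split described above, here is a minimal sketch using scikit-learn's `train_test_split`. The table layout, the column names, and the 80/20 fraction are assumptions made for this example and are not taken from the protopipe code.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical image table: one row per image, several images per obs_id.
# Column names and values are illustrative only.
images = pd.DataFrame(
    {
        "obs_id": np.repeat([1, 2, 3, 4, 5], 4),
        "tel_id": np.tile([1, 2, 3, 4], 5),
        "hillas_intensity": np.linspace(50.0, 1000.0, 20),
    }
)

# Split on rows (images), not on obs_id; shuffle=True mixes the images so
# that neighbouring rows coming from reused showers are spread across sets.
train, test = train_test_split(
    images,
    train_size=0.8,   # assumed fraction, for illustration
    shuffle=True,
    random_state=42,  # fixed seed so the split is reproducible
)

print(len(train), len(test))
```

Because the split is done per image, images from the same obs_id can end up in both sets; that is the trade-off of the "democratic" choice compared with splitting on obs_id.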

The I/O of the script has been improved:
- input and output information can be overwritten from the CLI, which has lower priority than the config file,
- camera IDs can be read in a similar way, with the following priorities (from highest to lowest):
  - from the config file
  - from the training file
  - from the CLI
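The priority order listed above for the camera IDs could be sketched as a simple "first defined value wins" lookup. The helper `first_defined` and the variable names are hypothetical, and the camera names are placeholder values; this is not the script's actual implementation.

```python
def first_defined(*candidates):
    """Return the first candidate that is not None, or None if all are unset."""
    for value in candidates:
        if value is not None:
            return value
    return None


# Hypothetical sources, listed from highest to lowest priority.
config_cam_ids = None                             # not set in the config file
training_file_cam_ids = ["LSTCam", "NectarCam"]   # found in the training file
cli_cam_ids = ["FlashCam"]                        # passed on the command line

cam_ids = first_defined(config_cam_ids, training_file_cam_ids, cli_cam_ids)
print(cam_ids)  # -> ['LSTCam', 'NectarCam']
```

Here the config file leaves the value unset, so the training file wins over the CLI, matching the order given in the list.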

codecov · 2021-03-19T13:33:03Z

Codecov Report

Merging #114 (e87557c) into master (e708fe7) will decrease coverage by 0.61%.
The diff coverage is 9.43%.

@@            Coverage Diff             @@
##           master     #114      +/-   ##
==========================================
- Coverage   39.22%   38.61%   -0.62%     
==========================================
  Files          22       22              
  Lines        1912     1950      +38     
==========================================
+ Hits          750      753       +3     
- Misses       1162     1197      +35

Impacted Files	Coverage Δ
protopipe/mva/train_model.py	`15.68% <ø> (ø)`
protopipe/scripts/build_model.py	`11.29% <2.50%> (-3.93%)`	⬇️
protopipe/pipeline/utils.py	`50.90% <22.22%> (-1.66%)`	⬇️
protopipe/mva/utils.py	`12.31% <50.00%> (+0.97%)`	⬆️

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update e708fe7...e87557c.

HealthyPear added 2 commits March 19, 2021 14:21

Split train and test using scikit-learn API

9c479d6

Improve build_models script

e53d78d

HealthyPear added the enhancement, machine learning, and input / output labels Mar 19, 2021

HealthyPear mentioned this pull request Mar 19, 2021

Improve train-test splitting #115

Open

2 tasks

Update docs

618a08d

HealthyPear marked this pull request as ready for review March 19, 2021 14:30

HealthyPear requested a review from kosack March 19, 2021 14:30

Merge branch 'master' into feature-improve_build_models_and_mva

e87557c

HealthyPear mentioned this pull request Mar 25, 2021

Setup of pipeline integration testing up to modeling #116

Merged

2 tasks

kosack approved these changes Mar 25, 2021

HealthyPear merged commit 7632ef2 into cta-observatory:master Mar 26, 2021

HealthyPear deleted the feature-improve_build_models_and_mva branch March 26, 2021 08:56
